Who can deposit data to the Repository
Anyone with a valid Cambridge CRSid can upload data by logging into Symplectic Elements. Anyone who has a legitimate reason to submit data to the University of Cambridge data repository (see the repository terms of use) but does not have a valid CRSid should e-mail us to request external user access.
What can be deposited
Research data (and code or software) connected to the University of Cambridge can be deposited in the repository. This includes: datasets created by current or former University of Cambridge researchers, research students or staff members; datasets resulting from research conducted at the University of Cambridge; datasets that appear in a journal published, or a conference hosted, by the University; or datasets resulting from research undertaken using University facilities. Data made available in the repository are publicly visible. If the data derive from human participants, data that identify individuals, either directly or indirectly, cannot be deposited in the repository (exceptions exist where participants have given explicit consent for this). Where the data are de-identified, the depositor must provide evidence that participants have consented to the pseudonymised or anonymised data being shared publicly.
How to deposit a dataset
Click here for step-by-step instructions on how to deposit your research data (software or code) via your Symplectic Elements account.
Information required when depositing data
Information about the nature of the data
When submitting your data we will ask you if your data contain any personal/sensitive, commercially sensitive or other forms of confidential/restricted information, and whether you have the rights to share these data via the repository.
Information about personal/sensitive information
For information about personal/sensitive information have a look here. If you have any doubts about this question, please consult the University of Cambridge Ethics pages or e-mail the University’s Research Governance and Integrity Officer.
Information about other forms of confidential/restricted information
Examples of other forms of confidential/restricted information include cases where sponsorship or collaboration agreements contain confidentiality/publication restrictions, or where research data are subject to UK export control law.
Rights to share the data
Depositors must confirm that they have the authority or permission to deposit data. This includes obtaining permission to share the data from any third parties who might hold rights over this dataset.
If you have any doubts about your rights to deposit and share your data, please consult (in the given order):
- the Principal Investigator responsible for this study (if you are not the Principal Investigator)
- your Departmental Administrator (or equivalent)
- the appropriate Contracts Manager at the Research Operations Office.
Compulsory information about your dataset
When you submit your data to us we will ask for:
- Title of the dataset
- The authors of the dataset
- Information about the publications (or thesis) associated with your dataset (if applicable):
  - If your data support a publication (or publications), we will also ask you for details of the publication (title of the publication, DOI of the publication)
- Embargo options:
  - If you wish, you will be able to select the option to embargo your dataset. Note that we will only embargo datasets until the associated publication has been published. Data files will not be publicly available while an embargo is in place, but the metadata will be publicly visible.
- Description of the data
  - This is important contextual information about the dataset and documenting your data comprehensively is an essential part of sharing your data well. Where are these data from? How were the data generated? Give details about your data that will help someone else understand your data and reuse it effectively. You may wish to embed discipline-specific metadata in your dataset documentation – the Digital Curation Centre provides examples of disciplinary metadata. We recommend that you deposit a Readme file alongside your dataset, or multiple Readme files if your dataset consists of multiple files and folders.
- Keywords
  - Choose keywords to make your data discoverable via search engines.
- Sponsorship information and grant IDs
  - Link your dataset deposit to all sources of funding and sponsorship that directly relate to the research that produced the dataset.
- Information on file formats:
  - You need to list all file formats present in your dataset.
  - You are strongly encouraged to submit your dataset in open formats, to facilitate long-term preservation and accessibility of your data. We recognise that it is not always possible to export all data files into open formats; therefore, research data in proprietary file formats are also accepted into the repository but you will need to provide information about software required to read and process your files.
  - If applicable, the same data can be provided in both proprietary and non-proprietary (open) formats.
  - Guidance on choosing file formats is available here.
- Software:
  - Information about the software needed to read your files or any other information that someone might find useful in order to open/process your data files.
- Licence for your data:
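For the file-format information requested in the list above, it may help to draw up an inventory of the formats in your dataset folder before you fill in the form. The following is a minimal Python sketch, not part of the repository's own tooling: the function name `list_file_formats` is hypothetical, and it treats the file extension as a stand-in for the file format, which will not always be accurate.

```python
from collections import Counter
from pathlib import Path


def list_file_formats(folder):
    """Count files per extension (lower-cased) under `folder`, including sub-folders."""
    counts = Counter()
    for path in Path(folder).rglob("*"):
        if path.is_file():
            # Files without an extension are grouped under a placeholder label.
            counts[path.suffix.lower() or "(no extension)"] += 1
    return dict(sorted(counts.items()))
```

Calling `list_file_formats("my_dataset")` on a hypothetical folder returns a dictionary mapping each extension to the number of files that carry it. You can use this as a starting point for the file-format and software fields.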
Optional information
- Related resources
  - If you would like us to link your dataset with other relevant existing resources (for example, your other existing publications, other datasets, external reports, webpages, news articles etc.), please provide URLs here.
- ORCID
  - Open Researcher and Contributor ID (ORCID) provides each academic with a unique identifier, and is increasingly required by publishers and by data repositories at the stage of research output submission. The use of ORCID ensures that each academic’s research activities are distinguished from those of others with similar names.
  - See Research Information for how to link your research outputs in Symplectic Elements to your ORCID.
- Name of the Principal Investigator
  - If you are not the Principal Investigator, you will be asked to indicate who was the Principal Investigator.
- Additional information:
  - Use this field to provide notes for the Research Data team. This information is for internal use only and will not be published alongside your dataset.
Compulsory administrative information
We will also collect some administrative information about you in order to process and preserve your research data:
- Your name and surname
- Your e-mail address
- Your department/institute
Data files
Finally, you will be asked to upload your files. Only submit those files that are the final versions for publication as you will not be able to delete files. If you have mistakenly uploaded the wrong file(s) then contact us immediately. You are responsible for consulting the guidance on file formats before submitting your data to the data repository.
Note that the maximum file size for individual files that can be uploaded via Symplectic Elements is 2GB. In addition, you will only be able to deposit up to a total of 2GB at a time. To deposit more data than this, you will need to return to the dataset record and redeposit additional files.
If the total size of your dataset exceeds 20GB, there will be a one-off charge of £4 per GB (e.g. a dataset of 24GB will cost £96 to deposit). Datasets up to 20GB are free to deposit.
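The charge can be estimated with a short calculation. The Python sketch below is illustrative only: the function and constant names are hypothetical, and it follows the 24GB example above in applying the £4 rate to the whole dataset rather than only to the part above 20GB. How fractions of a GB are rounded is not stated here, so the sketch does no rounding.

```python
FREE_THRESHOLD_GB = 20  # datasets up to this size are free to deposit
RATE_GBP_PER_GB = 4     # one-off charge per GB once the threshold is exceeded


def deposit_charge_gbp(total_size_gb):
    """Return the estimated one-off deposit charge in pounds for a dataset."""
    if total_size_gb < 0:
        raise ValueError("dataset size cannot be negative")
    if total_size_gb <= FREE_THRESHOLD_GB:
        return 0
    # The rate applies to the full dataset size, as in the 24GB -> £96 example.
    return RATE_GBP_PER_GB * total_size_gb


print(deposit_charge_gbp(24))  # 96
```

Please treat the result as an estimate and confirm the actual charge with the Research Data team.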
What happens after you have deposited your dataset
We will respond to you within three working days following your dataset submission. The DOI for your dataset will be visible in the Symplectic Elements record for your dataset immediately after you have made your submission (find this in the 'DOI' field in the 'Apollo' section of the 'Data sources' box). If the dataset is associated with a publication (or publications), you should cite your dataset DOI in the data availability statement in the associated publication(s) and provide the full dataset citation in the reference list.
If you have specified that your dataset contains sensitive information, or if we suspect sensitive content based on the information provided, we shall contact you for more information.
If you have submitted your dataset as a placeholder record, we will contact you and confirm the dataset DOI and provide instructions on how to finalise your dataset. We will wait for you to finalise your dataset but please note that we expect datasets to be finalised prior to or in time with the release of the associated manuscript by the publisher.
If you have submitted your dataset as the final version (or after you have finalised your placeholder dataset), we then review your data submission before uploading it to the repository. Please note that our DOI policy states that datasets cannot be changed (e.g. files added, removed or amended) after they have been approved into the repository. Only finalised datasets will be processed into the repository. If you wish to submit a dataset in draft form to be finalised later (e.g. after peer review) then choose to submit your dataset as a placeholder record.
When we review a finalised dataset, we check the following before approving the dataset into the repository:
- Is this dataset submitted by (or on behalf of) a current/former University of Cambridge researcher, research student, or staff member?
- Does the submitter have the rights to share the data via the Repository? See ‘Responsibilities of Depositors’ in the Repository Terms of Use.
- Does the dataset contain any confidential/restricted information?
- Do the files open without errors?
- Is the submission accompanied by appropriate metadata description? This includes keywords, a detailed dataset description, software instructions and, if applicable, additional documentation files such as readme file(s), a codebook or data dictionary.
- Are the file formats suitable for long-term access and preservation of data files? If not, could the files be exported to a different file format, more suitable for preservation?
- If applicable, has the title of the publication associated with the dataset been provided?
- Has an external email address (i.e. not a cam.ac.uk address) been provided?
If your dataset contains information pertaining to human participants (including de-identified data) we will contact you to ask if you have the correct consent to allow your data to be made publicly available in the Repository. We will also ask you to send us a copy of the consent forms and/or participant information sheets for our review. All data deposited in the Repository are made publicly available.
We expect datasets to have appropriate metadata supplied so that the contents of the dataset can be understood and reused by others. If we think that any information is missing, we will get in touch with you to request the missing information.
If all the information is provided, we will upload your data into the repository and send you confirmation that your dataset has been published. Your dataset will be linked to the DOI of the associated publication, either at the point of deposit in the repository or at a later date if unavailable beforehand. We will also link your dataset to the corresponding manuscript in the repository (e.g. the accepted manuscript submitted to the Open Access team) and vice versa. These steps help to increase the findability of your dataset, enhancing opportunities for dataset citation. The recommended citation for your dataset (author(s), publication date, title and DOI) is provided on the page for your dataset in the repository.
If you have selected to embargo your dataset, access to the data files can be granted only by the dataset author via our request a copy service while the files are under embargo. Although the data files are not publicly available while datasets are embargoed, please note that metadata for your dataset are publicly visible and findable via search engines. Embargoes on datasets associated with articles are removed as soon as we are aware that the article has been published.
Quality assurance
Our quality and approval checks for datasets and software/code outputs have developed over time. The last four years have seen various process improvements. The Depositor’s Checklist was first made available in November 2022 and aims to assist research staff and students to deposit their data in Apollo and ensure its optimal quality. We will not publish a dataset in Apollo that does not meet our quality standards except under exceptional circumstances. If we do publish a dataset that does not meet our standards, or if we are informed about any quality issue concerning an existing dataset, then we will provide metadata in the dataset record to describe areas of concern. This is to ensure transparency. Notification of any issues with a record in Apollo can be raised by contacting us, stating the issues and providing the record’s DOI.
Frequently asked questions about depositing a dataset
Q. I need a link (a DOI) for my dataset to include in my manuscript but I’m not ready to upload the final version of my dataset
A. Submit your dataset as a ‘placeholder’ record and you will automatically receive a DOI to cite in your manuscript. Reference the dataset DOI in the data availability statement and the dataset's full citation in the reference list. To submit a placeholder dataset, complete the mandatory fields in the form, mark the status of the record as ‘placeholder’ and agree to the repository licence agreement. Do not upload any data files unless these will be the final versions for publication in the repository – you will not be able to delete files once uploaded. You can finalise your dataset later but the dataset must be finalised before or closely in line with the publication of any manuscripts (e.g. journal articles) associated with the dataset.
Q. My dataset contains information derived from human participants. What do I need to consider when depositing my data?
A. Data that identify individuals, either directly or indirectly, cannot be deposited in the repository. There are rare exceptions but these are deposited only if evidence is provided that explicit consent has been obtained from participants that permits sharing the data publicly. The Research Data team checks data files for any personal/sensitive information but it is your responsibility as the dataset author to ensure that no personal/sensitive data are present. You will need to provide us with evidence (e.g. copies of consent forms, participant information sheets, privacy statements) to prove that you have the correct permissions to share the data publicly. We are particularly interested in the wording used regarding data sharing in the documentation for participants. This is relevant to anonymised as well as pseudonymised data. You will need to provide us with this evidence before citing the DOI for your dataset anywhere just in case the consents preclude publication of your dataset in the repository.
Q. The article linked to my dataset is currently under review and I’m not sure if I’ll need to make any changes to my dataset as a result of the review process.
A. Submit your data as a placeholder record and finalise your data at a later date after the paper has been reviewed (but before the paper is published). You will not be able to update the final version of your dataset (e.g. to amend, add, remove files) after it has been published in the repository so it is important that the final file versions are uploaded. You can provide an updated version of the same dataset at a later date but this will need to be submitted as a new dataset record, which will have a new DOI. We will not be able to remove any previous versions of your dataset from the repository but we will create links between the different versions so that the most recent one is easy to find.
Q. I’m trying to upload a file that’s bigger than 2GB but it’s not working. How do I deposit my data?
A. It is not possible to use Symplectic Elements to upload a single data file larger than 2GB. You will receive an error message if you try to do so and your data will not be deposited. You have two options: (1) if it does not compromise the integrity of your data you can split the large file into smaller files of less than 2GB each – you will need to deposit less than 2GB at a time, returning to your dataset publication details in order to redeposit the additional files; (2) contact the Research Data team at info@data.cam.ac.uk and we can deposit the >2GB file into the repository on your behalf. See the next question (‘My dataset is bigger than 20GB. How do I deposit it?’) for specific steps to take for each option.
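If you choose option (1), one possible approach is a simple byte-level split. The Python sketch below is an illustration under stated assumptions, not a method prescribed by the repository: the function name `split_file` and the `.partNNN` naming are hypothetical, and the parts are only usable once they are joined back together in order. For tabular or other structured data, splitting along logical boundaries (for example, by rows or by year) may serve reusers better. Whichever approach you take, describe it in your README file.

```python
from pathlib import Path


def split_file(path, part_size, block_size=64 * 1024 * 1024):
    """Copy consecutive byte ranges of `path` into numbered part files.

    Each part holds at most `part_size` bytes. Data are streamed in blocks of
    `block_size` bytes so that a whole part is never held in memory at once.
    Returns the list of part paths in the order needed to rebuild the file.
    """
    if part_size <= 0:
        raise ValueError("part_size must be positive")
    src = Path(path)
    parts = []
    with src.open("rb") as source:
        while True:
            part = src.with_name(f"{src.name}.part{len(parts) + 1:03d}")
            written = 0
            with part.open("wb") as out:
                while written < part_size:
                    block = source.read(min(block_size, part_size - written))
                    if not block:
                        break  # end of the source file
                    out.write(block)
                    written += len(block)
            if written == 0:
                part.unlink()  # nothing was left to copy; discard the empty part
                break
            parts.append(part)
    return parts
```

For example, `split_file("scan.tif", part_size=1_900_000_000)` on a hypothetical file would produce `scan.tif.part001`, `scan.tif.part002` and so on, each below 2GB. Concatenating the parts in order reproduces the original file.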
Q. My dataset is bigger than 20GB. How do I deposit it?
A. If your dataset exceeds 20GB in size then there are costs associated with depositing your data. This consists of a one-off charge of £4/GB; for example, a dataset of 40GB will incur a single non-recurring charge of £160. Because of the size of the dataset, we will deposit the dataset directly into the repository for you. You should contact us, explaining that the dataset exceeds 20GB and providing us with a link to your dataset so that we can download the files (e.g. OneDrive, Dropbox, Google Drive link). You will still need to fill in the dataset form in Symplectic Elements and agree to the Repository licence terms. It is essential that you create and upload a README file, or a series of README files, to describe the content of your dataset. See here for guidance on README file creation.
Q. My dataset is over 2GB in size and I have several files to deposit. An error message stops me from depositing all the files in my dataset at once. What can I do?
A. Only up to 2GB of data can be deposited at a time. For example, you may have one file of 1GB and another of 0.5GB – you can deposit these together but you will not be able to deposit at the same time an additional 1GB file as this brings the total to 2.5GB, exceeding the system limit. Deposit files that fall under the 2GB limit. You will then be able to add additional files by redepositing to the same dataset record. To do this, take the following steps once you have deposited the initial set of files: (1) click “View your publication details”, which will bring you back to your dataset record; (2) click “View” within the box that displays the files already deposited; (3) this brings you to the ‘Redeposit publication’ page where you can choose the additional files to upload; (4) click the ‘Redeposit’ button and your files will upload. You will need to repeat this process until all your data files have been uploaded. If your dataset exceeds 20GB in total, or if you experience any problems, then contact the Research Data team.
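If you have many files, it can help to plan the deposit rounds in advance. The Python sketch below is a hypothetical helper, not part of Symplectic Elements. It assumes that "2GB" means 2 × 10^9 bytes and that a round may total exactly 2GB; if in doubt, pass a slightly lower `limit`. Files are grouped in the order given, so the plan is not necessarily the smallest possible number of rounds.

```python
BATCH_LIMIT_BYTES = 2 * 10**9  # assumption: "2GB" read as 2 x 10^9 bytes


def plan_deposits(file_sizes, limit=BATCH_LIMIT_BYTES):
    """Group files into deposit rounds whose combined size stays within `limit`.

    `file_sizes` maps file name -> size in bytes; files are taken in the given order.
    """
    batches, current, current_total = [], [], 0
    for name, size in file_sizes.items():
        if size > limit:
            raise ValueError(f"{name} is larger than the limit and cannot be uploaded as is")
        if current and current_total + size > limit:
            # Adding this file would exceed the limit, so start a new round.
            batches.append(current)
            current, current_total = [], 0
        current.append(name)
        current_total += size
    if current:
        batches.append(current)
    return batches


GB = 10**9
example = {"a.csv": 1 * GB, "b.csv": GB // 2, "c.csv": 1 * GB}
print(plan_deposits(example))  # [['a.csv', 'b.csv'], ['c.csv']]
```

The example mirrors the case described above: the 1GB and 0.5GB files go in the first deposit, and the additional 1GB file is added in a second round via the ‘Redeposit’ step.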
Q. My dataset consists of several folders with a number of sub-folders and files. How can I retain the structure?
A. To retain the folder structure you can compress the folder (or series of folders) and upload the compressed (e.g. .zip) file. We strongly recommend that you create and upload a README file that describes in detail the contents of your dataset. This will help others to understand your dataset and interpret its contents, and will enable it to be reused effectively.
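Most operating systems can create a .zip file from a folder directly. If you prefer to script it, the following Python sketch shows one way to do so with the standard library; the function name `zip_folder` is hypothetical. Note that the resulting .zip file is still subject to the 2GB upload limit.

```python
import shutil
from pathlib import Path


def zip_folder(folder):
    """Create <folder>.zip next to the folder, keeping its sub-folder layout."""
    folder = Path(folder).resolve()
    if not folder.is_dir():
        raise NotADirectoryError(str(folder))
    archive = shutil.make_archive(
        base_name=str(folder),         # archive path without the .zip suffix
        format="zip",
        root_dir=str(folder.parent),   # paths in the archive are relative to this
        base_dir=folder.name,          # keep the top-level folder inside the archive
    )
    return Path(archive)
```

For a hypothetical folder `my_dataset`, `zip_folder("my_dataset")` writes `my_dataset.zip` alongside it, with all sub-folders and files stored under `my_dataset/` inside the archive.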
If you have any additional questions about data submissions or data sharing, please also see our FAQ page. If you cannot find the answer to your question, please contact us.