This page provides guidance for dataset depositors on:

Information required when depositing data

What happens after you have deposited your data

Frequently asked questions about depositing data

Visit this page to deposit your dataset via Symplectic Elements and for brief step-by-step instructions on how to do this.

Who can deposit data to the Repository

Anyone with a valid Cambridge CRSid can upload data by logging in to Symplectic Elements. Anyone who has a legitimate reason to submit data to the University of Cambridge data repository (see the repository terms of use) but does not have a valid CRSid should e-mail us to request external user access.

What can be deposited

Research data (including code, software or methods) connected to the University of Cambridge can be deposited in the repository. This includes: data created by current or former University of Cambridge researchers, research students or staff members; data resulting from research conducted at the University of Cambridge; data that appear in a journal published, or a conference hosted, by the University; or data resulting from research undertaken using University facilities. Data made available in the repository are publicly visible. If the data derive from human participants, data that identify individuals, either directly or indirectly, cannot be deposited in the repository unless explicit consent to do so has been obtained from participants. Where the data are de-identified, the depositor must provide evidence that participant consent permits sharing the pseudonymised or anonymised data publicly.

How to deposit data

Click here for step-by-step instructions on how to deposit your research data (including software, code and methods) via your Symplectic Elements account.

Information required when depositing data

Information about the nature of the data

When submitting your data we will ask you if your data contain any personal/sensitive, commercially sensitive or other forms of confidential/restricted information, and whether you have the rights to share these data via the repository.

Information about personal/sensitive information

For information about personal/sensitive information have a look here. If you have any doubts about this question, please consult the University of Cambridge Ethics pages or e-mail the University’s Research Governance and Integrity Officer.

Information about other forms of confidential/restricted information

Examples of other forms of confidential/restricted information might include cases where there are confidentiality/publication restrictions in sponsorship or collaborations agreements, or, for example, where research data are subject to UK export control law.

Rights to share the data

Depositors must confirm that they have the authority or permission to deposit data. This includes obtaining permission to share the data from any third parties who might hold rights over this data.

If you have any doubts about your rights to deposit and share your data, please consult (in the given order):

the Principal Investigator responsible for this study (if you are not the Principal Investigator)
your Departmental Administrator (or equivalent)
the appropriate Contracts Manager at the Research Operations Office.

Compulsory information about your data

When you submit your data to us we will ask for:

Type of research output: dataset, software/code or method
Title of the data
The authors of the data
Information about the publications (or thesis) associated with your data (if applicable):
- If your data supports a publication (or publications), we will also ask you for details of the publication (title of the publication, DOI of the publication)
Embargo options:
- If you wish, you will be able to select the option to embargo your data (not available for method output type). Note that we will only embargo data until the associated publication has been published. Data files will not be publicly available while an embargo is in place, but the metadata will be publicly visible.
Description of the data
- This is important contextual information about the data; documenting your data comprehensively is an essential part of sharing your data well. Where are these data from? How were the data generated? Give details that will help someone else understand your data and reuse them effectively. You may wish to embed discipline-specific metadata in your data documentation – the Digital Curation Centre provides examples of disciplinary metadata. We recommend that you deposit a Readme file alongside your data, or multiple Readme files if your data consist of multiple files and folders.
Keywords
- Choose keywords to make your data discoverable via search engines.
Sponsorship information and grant IDs
- Link your data deposit to all sources of funding and sponsorship that directly relate to the research that produced the data.
Information on file formats:
- You need to list all file formats present in your data.
- You are strongly encouraged to submit your data in open formats, to facilitate long-term preservation and accessibility of your data. We recognise that it is not always possible to export all data files into open formats; therefore, research data in proprietary file formats are also accepted into the repository but you will need to provide information about software required to read and process your files.
- If applicable, the same data can be provided in both proprietary and non-proprietary (open) formats.
- Guidance on choosing file formats is available here.
Software:
- Information about the software needed to read your files or any other information that someone might find useful in order to open/process your data files.
Licence for your data:
- You will be required to indicate what type of licence you would like to apply to your research data. You can read information about available licences here and a nice graphic explaining licence types can be viewed here.

Optional information

Related resources
- If you would like us to link your data with other relevant existing resources (for example, your other existing publications, other datasets, external reports, webpages, news articles etc.), please provide URLs here.
ORCID
- Open Researcher and Contributor ID (ORCID) provides each academic with a unique identifier, and is increasingly required by publishers and by data repositories at the stage of research output submission. The use of ORCID ensures that each academic’s research activities are distinguished from those of others with similar names.
- See Research Information for how to link your research outputs in Symplectic Elements to your ORCID.
Name of the Principal Investigator
- If you are not the Principal Investigator, you will be asked to indicate who the Principal Investigator was.
Additional information:
- Use this field to provide notes for the Research Data team. This information is for internal use only and will not be published alongside your data.

Compulsory administrative information

We will also collect some administrative information about you in order to process and preserve your research data:

Your name and surname
Your e-mail address
Your department/institute

Data files

Finally, you will be asked to upload your files. Only submit those files that are the final versions for publication as you will not be able to delete files. If you have mistakenly uploaded the wrong file(s) then contact us immediately. You are responsible for consulting the guidance on file formats before submitting your data to the data repository.

Note that the maximum file size for individual files that can be uploaded via Symplectic Elements is 2GB. In addition, you will only be able to deposit up to a total of 2GB at a time. To deposit more data than this, you will need to return to the deposit record and redeposit additional files.

Data up to 20GB are free to deposit. If the total size of your data exceeds 20GB, there will be a one-off charge of £4 per GB (e.g. a dataset of 24GB will cost £96 to deposit).
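
The charging rule can be sketched as a short calculation. This is an illustrative sketch only, not an official calculator: the function name and constants are hypothetical, and it assumes, as the 24GB example implies, that the £4 rate applies to the whole dataset once it exceeds 20GB rather than only to the excess. How fractional gigabytes are charged is not stated in this guidance, so no rounding is applied.

```python
FREE_LIMIT_GB = 20     # deposits up to this total size are free
RATE_GBP_PER_GB = 4    # one-off rate once the free limit is exceeded


def deposit_charge_gbp(total_size_gb: float) -> float:
    """Estimate the one-off deposit charge in pounds.

    Assumption: the rate applies to the full dataset size (24GB -> £96),
    not only to the portion above the 20GB free limit.
    """
    if total_size_gb < 0:
        raise ValueError("total_size_gb must be non-negative")
    if total_size_gb <= FREE_LIMIT_GB:
        return 0.0
    return float(total_size_gb) * RATE_GBP_PER_GB


print(deposit_charge_gbp(24))  # 96.0, matching the example in the guidance
```

Please contact the Research Data team to confirm the actual charge before budgeting for a large deposit.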

What happens after you have deposited your data

We will respond to you within three working days following your data submission. The DOI for your dataset will be visible in the Symplectic Elements record for your deposit immediately after you have made your submission (find this in the 'DOI' field in the 'Apollo' section of the 'Data sources' box). If the data is associated with a publication (or publications), you should cite your data DOI in the data availability statement in the associated publication(s) and provide the full citation data in the reference list.

If you have specified that your data contains sensitive information, or if we suspect sensitive content based on the information provided, we shall contact you for more information.

If you have submitted your data as a placeholder record (not available for method output types), we will contact you and confirm the data DOI and provide instructions on how to finalise your deposit. We will wait for you to finalise your data but please note that we expect data to be finalised prior to or in time with the release of the associated manuscript by the publisher.

If you have submitted your data as the final version (or after you have finalised your placeholder deposit), we then review your data submission before uploading it to the repository. Please note that our DOI policy states that data cannot be changed (e.g. files added, removed or amended) after they have been approved into the repository. Only finalised data will be processed into the repository. If you wish to submit data in draft form to be finalised later (e.g. after peer review) then choose to submit your data as a placeholder record.

When we review a finalised deposit, we check the following before approving the data into the repository:

Is this data submitted by (or on behalf of) a current/former University of Cambridge researcher, research student, or staff member?
Does the submitter have the rights to share the data via the Repository? See ‘Responsibilities of Depositors’ in the Repository Terms of Use.
Does the data contain any confidential/restricted information?
Do the files open without errors?
Is the submission accompanied by appropriate metadata description? This includes keywords, a detailed data description, software instructions and, if applicable, additional documentation files such as readme file(s), a codebook or data dictionary.
Are the file formats suitable for long-term access and preservation of data files? If not, could the files be exported to a different file format, more suitable for preservation?
If applicable, has the title of the publication associated with the data been provided?
Has an external email address (i.e. not a cam.ac.uk address) been provided?

If your data contains information pertaining to human participants (including de-identified data) we will contact you to ask if you have the correct consent to allow your data to be made publicly available in the Repository. We will also ask you to send us a copy of the consent forms and/or participant information sheets for our review. All data deposited in the Repository are made publicly available.

We expect data to have appropriate metadata supplied so that the contents of the data can be understood and reused by others. If we think that any information is missing, we will get in touch with you to request the missing information.

If all the information is provided, we will upload your data into the repository and send you confirmation that your data has been published. Your data will be linked to the DOI of the associated publication, either at the point of deposit in the repository or at a later date if unavailable beforehand. We will also link your data to the corresponding manuscript in the repository (e.g. the accepted manuscript submitted to the Open Access team) and vice versa. These steps help to increase the findability of your data, enhancing opportunities for data citation. The recommended citation for your data (author(s), publication date, title and DOI) is provided on the page for your data in the repository.

If you have selected to embargo your data, access to the data files can be granted only by the data author via our request a copy service while the files are under embargo. Although the data files are not publicly available while data are embargoed, please note that metadata for your data are publicly visible and findable via search engines. Embargoes on data associated with articles are removed as soon as we are aware that the article has been published.

Quality assurance

Our quality and approval checks for datasets, software/code and method outputs have developed over time. The last four years have seen various process improvements. The Depositor’s Checklist was first made available in November 2022 and aims to assist research staff and students to deposit their data in Apollo and ensure its optimal quality. We will not publish data in Apollo that does not meet our quality standards except under exceptional circumstances. If we do publish a dataset that does not meet our standards, or if we are informed about any quality issue concerning an existing dataset, then we will provide metadata in the data record to describe areas of concern. This is to ensure transparency. Notification of any issues with a record in Apollo can be raised by contacting us, stating the issues and providing the record’s DOI.

Frequently asked questions about depositing data

Q. I need a link (a DOI) for my data to include in my manuscript but I’m not ready to upload the final version of my data

A. Submit your data as a ‘placeholder’ record and you will receive a DOI to cite in your manuscript. Reference the data DOI in the data availability statement and the full citation data in the reference list. To submit a placeholder record, complete the mandatory fields in the Elements deposit form, mark the status of the record as ‘placeholder’ and agree to the repository licence agreement. Do not upload any data files unless these will be the final versions for publication in the repository – you will not be able to delete files once uploaded. You can finalise your data later but the data must be finalised before or closely in line with the publication of any manuscripts (e.g. journal articles) associated with the data.

Q. My data contains information derived from human participants. What do I need to consider when depositing my data?

A. Data that identify individuals, either directly or indirectly, cannot be deposited in the repository. There are rare exceptions but these are deposited only if evidence is provided that explicit consent has been obtained from participants that permits sharing the data publicly. The Research Data team checks data files for any personal/sensitive information but it is your responsibility as the data author to ensure that no personal/sensitive data are present. You will need to provide us with evidence (e.g. copies of consent forms, participant information sheets, privacy statements) to prove that you have the correct permissions to share the data publicly. We are particularly interested in the wording used regarding data sharing in the documentation for participants. This is relevant to anonymised as well as pseudonymised data. You will need to provide us with this evidence before citing the DOI for your data anywhere just in case the consents preclude publication of your data in the repository.

Q. The article linked to my data is currently under review and I’m not sure if I’ll need to make any changes to my data as a result of the review process.

A. Submit your data as a placeholder record and finalise your data at a later date after the paper has been reviewed (but before the paper is published). You will not be able to update the final version of your data (e.g. to amend, add, remove files) after it has been published in the repository so it is important that only the final file versions are uploaded.

If you want to update existing files that have already been approved into Apollo, you can do this with DOI versioning. Please email the Research Data Team with the updated files and a description of the update and they will provide you with a new DOI version number which is linked to the original version. We will not be able to remove any previous versions of your data from the repository.

Q. I’m trying to upload a file that’s bigger than 2GB but it’s not working. How do I deposit my data?

A. It is not possible to use Symplectic Elements to upload a single data file larger than 2GB. You will receive an error message if you try to do so and your data will not be deposited. You have two options: (1) if it does not compromise the integrity of your data you can split the large file into smaller files of less than 2GB each – you will need to deposit less than 2GB at a time, returning to your deposit record in order to redeposit the additional files; (2) contact the Research Data team at info@data.cam.ac.uk and we can deposit the >2GB file into the repository on your behalf. See the next question (‘My data is bigger than 20GB. How do I deposit it?’) for specific steps to take for each option.
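
For option (1), one possible way to split a file is sketched below in Python. This is a sketch under stated assumptions, not a tool provided by the repository: the function names and the `.partNNN` naming scheme are hypothetical, and the default part size of 1.9 billion bytes is simply a value chosen to stay under "2GB" however the system counts it. Splitting at raw byte boundaries only suits files that re-users can re-join before opening, so describe the re-joining step in your README.

```python
import shutil
from pathlib import Path

DEFAULT_PART_BYTES = 1_900_000_000  # assumed safe margin below the 2GB limit
BUFFER_BYTES = 64 * 1024 * 1024     # copy in 64 MiB pieces to limit memory use


def split_file(source: Path, part_size: int = DEFAULT_PART_BYTES) -> list[Path]:
    """Write `source` as numbered parts, each at most `part_size` bytes."""
    if part_size <= 0:
        raise ValueError("part_size must be positive")
    parts: list[Path] = []
    index = 1
    with source.open("rb") as src:
        while True:
            part_path = source.with_name(f"{source.name}.part{index:03d}")
            written = 0
            with part_path.open("wb") as dst:
                while written < part_size:
                    data = src.read(min(BUFFER_BYTES, part_size - written))
                    if not data:
                        break
                    dst.write(data)
                    written += len(data)
            if written == 0:
                # Nothing left to copy: discard the empty part and stop.
                part_path.unlink()
                break
            parts.append(part_path)
            index += 1
    return parts


def join_files(parts: list[Path], target: Path) -> None:
    """Concatenate the parts, in the given order, back into one file."""
    with target.open("wb") as dst:
        for part in parts:
            with part.open("rb") as src:
                shutil.copyfileobj(src, dst)
```

A call such as `split_file(Path("recording.h5"))` (a hypothetical file name) would write `recording.h5.part001`, `recording.h5.part002` and so on next to the original, leaving the original untouched.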

Q. My data is bigger than 20GB. How do I deposit it?

A. If your data exceeds 20GB in size then there are costs associated with depositing your data. This consists of a one-off charge of £4/GB; for example, a dataset of 40GB will incur a single non-recurring charge of £160. Because of the size of the data, we will deposit the data directly into the repository for you. You should contact us, explaining that the data exceeds 20GB and providing us with a link to your data so that we can download the files (e.g. OneDrive, Dropbox, Google Drive link). You will still need to fill in the deposit form in Symplectic Elements and agree to the Repository licence terms. It is essential that you create and upload a README file, or a series of README files, to describe the content of your data. See here for guidance on README file creation.

Q. My data is over 2GB in size and I have several files to deposit. An error message stops me from depositing all the files in my deposit at once. What can I do?

A. Only up to 2GB of data can be deposited at a time. For example, you may have one file of 1GB and another of 0.5GB – you can deposit these together but you will not be able to deposit at the same time an additional 1GB file as this brings the total to 2.5GB, exceeding the system limit. Deposit files that fall under the 2GB limit. You will then be able to add additional files by redepositing to the same deposit record. To do this, take the following steps once you have deposited the initial set of files: (1) click “View your publication details”, which will bring you back to your deposit record; (2) click “View” within the box that displays the files already deposited; (3) this brings you to the ‘Redeposit publication’ page where you can choose the additional files to upload; (4) click the ‘Redeposit’ button and your files will upload. You will need to repeat this process until all your data files have been uploaded. If your data exceeds 20GB in total, or if you experience any problems, then contact the Research Data team.
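
If you have many files, it can help to plan in advance which files go into each deposit step. The sketch below is one possible approach, not part of Symplectic Elements: the function name is hypothetical, "2GB" is assumed to mean 2,000,000,000 bytes, and each batch is kept strictly below that figure as a cautious reading of the limit. It uses a first-fit decreasing strategy, placing the largest files first into the first batch with enough room.

```python
BATCH_LIMIT_BYTES = 2 * 1000**3  # assumed meaning of "2GB"; exact byte limit not stated


def plan_batches(file_sizes: dict[str, int],
                 limit: int = BATCH_LIMIT_BYTES) -> list[list[str]]:
    """Group files into deposit batches whose totals stay strictly below `limit`.

    `file_sizes` maps file name to size in bytes.
    """
    batches: list[dict] = []  # each entry: {"used": bytes so far, "names": [...]}
    largest_first = sorted(file_sizes.items(), key=lambda item: item[1], reverse=True)
    for name, size in largest_first:
        if size >= limit:
            raise ValueError(f"{name} is too large to upload in a single deposit")
        for batch in batches:
            if batch["used"] + size < limit:
                batch["used"] += size
                batch["names"].append(name)
                break
        else:
            # No existing batch has room, so start a new deposit step.
            batches.append({"used": size, "names": [name]})
    return [batch["names"] for batch in batches]


# The example from the answer above: 1GB and 0.5GB together, then another 1GB.
plan = plan_batches({
    "a.csv": 1_000_000_000,
    "b.csv": 500_000_000,
    "c.csv": 1_000_000_000,
})
print(plan)  # [['a.csv', 'b.csv'], ['c.csv']]
```

Each inner list corresponds to one deposit or redeposit step in the procedure described above.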

Q. My data consists of several folders with a number of sub-folders and files. How can I retain the structure?

A. To retain the folder structure you can compress the folder (or series of folders) and upload the compressed (e.g. .zip) file. We strongly recommend that you create and upload a README file that describes in detail the contents of your data. This will help others to understand your data and interpret its contents, and will enable it to be reused effectively.
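
Most operating systems can create a .zip file from the file manager. If you prefer a script, the following minimal sketch uses Python's standard `shutil.make_archive`; the function name `zip_folder` is hypothetical and the archive is written next to the folder being compressed. The resulting .zip file is still subject to the 2GB upload limit described above.

```python
import shutil
from pathlib import Path


def zip_folder(folder: Path) -> Path:
    """Create <folder>.zip beside `folder`, keeping its sub-folder structure."""
    folder = folder.resolve()
    if not folder.is_dir():
        raise NotADirectoryError(str(folder))
    archive = shutil.make_archive(
        base_name=str(folder.parent / folder.name),  # ".zip" is appended automatically
        format="zip",
        root_dir=str(folder.parent),  # paths in the archive are relative to this
        base_dir=folder.name,         # so every entry starts with the folder name
    )
    return Path(archive)
```

For example, `zip_folder(Path("my_dataset"))` (a hypothetical folder name) would produce `my_dataset.zip` in which every entry begins with `my_dataset/`, so the layout is restored when the archive is extracted.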

If you have any additional questions about data submissions or data sharing, please also see our FAQ page. If you cannot find the answer to your question, please contact us.
