Depositor's checklist

Ask yourself the questions in the Depositor's Checklist below as you plan to deposit your research data, and take action to deposit your files in Apollo, the University of Cambridge repository. By following the guidance provided you will enhance the quality of your research data deposit and ensure that you follow any ethical or legal obligations when you upload your files for inclusion in Apollo. Your research data will be easy to find, understand and reuse, minimising risks that your work will be misunderstood by others and increasing the likelihood that it will be cited. You will improve the reproducibility of your research (if relevant to your field) and, more broadly, demonstrate transparency: these are critical contributors to your research integrity.

After you have deposited your files in their final form, we will check your metadata and files (information on our review process). We will contact you if there are any problems with your files, their metadata or documentation. There are unlikely to be any problems if you adhere to the Depositor’s Checklist, and the process of publishing your files in Apollo will be smoother and faster.

How to use the Depositor’s Checklist

Run through the questions in the checklist sequentially. Not all questions will be relevant to you. Select the question to find out more information: the key message for you is summarised first, followed up by more detailed information if needed.

The questions are ordered according to the deposit process, and include what you need to consider:

Before creating a record for your deposit – Questions 1–9
Completing the form – Questions 10–15
Uploading your files – Questions 16–27
After you've deposited your files – Question 28

1. Are the files you want to deposit research data?

Research data are information (quantitative and qualitative) gathered during your research that support your findings, and are produced by all disciplines (arts, humanities, social sciences and STEM fields). You may have research data (to submit as a ‘dataset’ record), or software or code (to submit as a ‘software/code’ output type) or a method or protocol (to submit as a 'method' output type). If associated, code can be submitted with data in the dataset record.

More detailed information:

For example, research data can be measurements, digital images, sound recordings, movies, artwork, survey data, fieldwork observations, interview transcripts, texts, methodology. Research data can be the raw (primary) data that have been collected or measured directly. They can also be secondary data, processed or deriving from existing sources. Research data are produced by researchers across all academic disciplines and via a wide variety of methods (e.g. via experiments, observations, interviews, archival work).
You may have code or software that you wish to publish in Apollo. Set the output type to ‘software/code’ if you have software or source code. If you have code connected directly to the dataset then it is simpler to deposit this with your data files. Depositing as a separate ‘software/code’ item will give you a separate DOI for your code and software-specific licensing options.
If you want to publish a method or protocol in Apollo, set the output type to 'method'. This may include, for example, experimental or study protocols, workflows, instruction manuals, survey templates, safety checklists and operational procedures.

2. Can you access your Symplectic Elements account?

You will need to use Elements to deposit into Apollo. If you are an alumnus and the work was produced while you were at Cambridge, contact us.

More detailed information:

If you are a current member of the University of Cambridge and are research staff or a post-graduate student then you should be able to access your Elements account and deposit files from there. See 'Signing in to Elements' for help and our step-by-step instructions on how to deposit via Symplectic Elements.
Contact us if you do not have an active Raven account but do have an affiliation with the University of Cambridge and would like to publish your research data in Apollo. Check the repository terms of use for your eligibility.
See:
- How to use Symplectic Elements.

3. Have you checked your funder’s data sharing policy?

Some funders require you to use a particular repository – check your funder’s data sharing policy. From first to last choice, your chosen repository should be funder-specific (if applicable), discipline-specific (if a trusted repository is available), institutional repository (Apollo), general repository (e.g. Figshare, Zenodo, Dryad).

More detailed information:

Some funders require you to deposit your research data into a specified repository (e.g. NERC require you to use one of their data centres). It is important to be aware of your funder’s data sharing policy (most major funders have these) and meet their requirements. You can use a trustworthy discipline-specific repository, if one exists for your field or type of data. Your next choice should be Apollo as your institutional repository. We recommend these over general-purpose repositories as the latter do not offer discipline-specific metadata standards or provide the same level of deposit review and checks. The absence of these can affect the quality of the published research data record.
See:
- Repository selection guidance
- Funder policies.

4. Do you have the rights to publish the research data in the repository?

You must confirm that you have the authority or permission to deposit your research data in Apollo. You must adhere to ethical agreements (e.g. relating to consent), agreements with commercial sponsors or industry partners, and your funder’s data sharing policy. If your files contain data from a third party then you must ensure you have permission to publish these.

More detailed information:

Subject to any other contracts (such as with your funders, data subjects and academic or external collaborators), University researchers and students retain non-patentable intellectual property (IP) rights (e.g. database rights, copyright) resulting from activities undertaken during the course of employment or study at the University. This includes IP rights in databases and software/code, and is in accordance with Chapter XIII of the University’s Statutes and Ordinances on Finance and Property (subsection, Intellectual Property Rights). Exceptions may exist and it is essential to check contracts or agreements with funders, sponsors, collaborators or other third parties. Cambridge Enterprise provide detailed information on Intellectual Property Rights and are available to advise researchers.
There may exist contractual obligations with your funder/sponsor or other collaborators whereby third parties hold either the rights over the data (e.g. commercial sponsors, Government-funded agencies, charities) or you are contractually obliged to seek their permission to publish. You will need to check contracts and possibly seek permission before sharing the data publicly. As a first step, consult the project PI, your departmental administrator, or the appropriate Contracts Manager at the Research Operations Office – they will be able to check the terms of any contracts (e.g. student sponsorship agreements).
Most major funders have data sharing policies in which they state that their funded researchers are required to make openly available the data that support research findings. Exceptions exist for ethical or legal reasons. The University of Cambridge also supports this stance in its Research Data Management Policy Framework and Open Research Position Statement.
If your research data contains data that originate from a different source then you need to check that you have the rights to share these data within your research data under your chosen licence.
If your data contains information that relates to human subjects then you need to demonstrate that you have consent for data sharing.
See:
- Intellectual Property Rights
- UK Data Service guidance on rights.

5. Does your research output incorporate pre-existing data that originate from another source?

If any third-party data, not collected or generated by you or your research data's co-authors, is present in your data then you need to establish if you have permissions to include these data as part of your research output and, if you do, to reference these data appropriately.

More detailed information:

Your research may have used pre-existing data (e.g. numerical, images, text, audio etc.) and these third-party data may be present in the work you wish to deposit. If this is the case, you need to ensure you have the rights to share these data. Check the licence and terms assigned to the data.
Ensure that the existence of any reused or third-party data in your research output is appropriately documented (e.g. correctly cited with associated terms allowing redistribution provided). This information can be added to a Readme file or the ‘Detailed description’ field in Elements, or uploaded as a separate document (see the variable information log template provided by the UK Data Service for this purpose).
If you do not have the rights to share these data then you need to either obtain permission to share the data from the original data owners or redact the data from your research output, providing metadata to explain the redaction method. Either way, you should cite any reused data sources in the research output’s metadata and in your manuscript.
See:
- UK Data Service guidance on data sharing rights, copyright and documenting sources.

6. Does your research data contain personal, pseudonymised or anonymised data relating to living humans?

You will need to provide proof that the consent obtained from your participants regarding data sharing allows you to publish the data in Apollo as open (publicly available) data.

More detailed information:

If your data derive from living individuals then you need to check that consent has been given that allows you to share the data in your chosen format (i.e. as identifiable individuals, pseudonymised data, or anonymised data). We will ask you to confirm that you have all necessary consents and to supply us with supporting evidence (e.g. participant information sheets, blank consent form, ethics documents, privacy statements).
Currently, we do not provide a service that manages or controls access to restricted deposits: all research outputs in Apollo are made publicly available either at the point of publication or after a finite embargo period (dataset embargoes are lifted when a dataset’s associated manuscript is published). There are alternative repositories that offer this service, such as the UK Data Service.
If your data are anonymised, and consent obtained does not prevent you from sharing these data publicly, then check that all direct and indirect identifiers have been removed. It is the research output author’s responsibility to ensure that data are correctly anonymised.
If your data contain ID numbers or pseudonyms and a key is being securely held by yourself, your institution, partner institution, or data collection agency that connects these IDs/pseudonyms to named individuals, then your data is pseudonymised and not anonymised. We will not accept pseudonymised data in Apollo unless participants have consented to this. There is a risk of reidentification as long as the associated key exists and the IDs/pseudonyms correlate directly with that key – the data can be anonymised by destroying the key or by replacing the IDs with different (preferably random) codes that do not link to the key. If the latter option is chosen then it should not be possible at any point to recombine the key and the data to identify individuals. Pseudonymised data is still personal data (see ICO guidance).
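The second anonymisation option above (replacing IDs with random codes that do not link to the key) could be sketched as follows. This is a minimal illustration, not a procedure endorsed by the repository: the function name `recode_ids` and the field name `participant_id` are hypothetical, and the sketch assumes the records are already loaded as a list of dictionaries. The mapping from old to new codes exists only inside the function and is discarded on return, so it cannot later be recombined with the original key.

```python
import secrets


def recode_ids(records, id_field="participant_id"):
    """Return copies of the records with each ID replaced by a random code.

    The old-to-new mapping is a local variable only; it is never returned
    or saved, so the new codes cannot be traced back to the original key.
    """
    mapping = {}      # old ID -> new random code (discarded on return)
    used = set()      # codes already handed out, to avoid collisions
    recoded = []
    for record in records:
        old_id = record[id_field]
        if old_id not in mapping:
            code = secrets.token_hex(4)   # 8 random hexadecimal characters
            while code in used:
                code = secrets.token_hex(4)
            used.add(code)
            mapping[old_id] = code
        new_record = dict(record)         # leave the input record unchanged
        new_record[id_field] = mapping[old_id]
        recoded.append(new_record)
    return recoded
```

Replacing the codes does not remove other direct or indirect identifiers, and an unchanged row order could still allow records to be matched against the original file, so those checks remain the author's responsibility.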
If you think that anonymising your data will reduce its value and utility then consider an alternative repository that offers restricted access to data.
See:
- General guidance on personal and sensitive data and links to other resources
- University policy on the ethics of research involving human participants and personal data
- Data protection in a research context
- UK Data Service guidance on anonymising quantitative data, anonymising qualitative data and consent for data sharing
- Looking for a restricted access repository? Search the registry of research data repositories, which can be filtered by data access options.

7. Are your data commercially sensitive?

If your research is funded by a commercial or industry sponsor then you need to check that your contract with them allows you to share your data publicly. If you intend to commercialise your data (for example by licensing it commercially to an end-user or to another company, filing for IP rights such as patents, creating a start-up company with your data as an asset, or carrying out a consultancy using your data as a tool) then seek advice from us and Cambridge Enterprise before making your data openly available.

More detailed information:

You may be required not to disclose the data publicly. This may be for a fixed period or until approved by a sponsor or funder. Check your contracts regarding data sharing if your research is funded by a commercial/industry sponsor. Contact the project Principal Investigator and appropriate Research Contracts team. You may be registering a patent, in which case it may be necessary to delay the data’s release.
If you wish to commercialise your data then you need to discuss your options with Cambridge Enterprise prior to sharing your data under your chosen licence.
Your research may also receive support from a funding body, such as a research council or charity. It is important to check your funder’s data sharing policy to ensure that you are meeting their requirements.
See:
- Intellectual Property Rights in our data management guide
- Chapter XIII of the University’s Statutes and Ordinances on Finance and Property, subsection Intellectual Property Rights
- Cambridge Enterprise information on Intellectual Property and practical guidance.

8. Does your research data contain environmentally or culturally sensitive information?

If your research data contains sensitive environmental or cultural information, where disclosure of the data has the potential to result in harm to the environment or cultural heritage, then you may need to redact elements of your data and justify this redaction prior to depositing your data.

More detailed information:

Your data may contain information that can be considered sensitive for environmental or cultural reasons. An example could be location information (e.g. geographical co-ordinates) of endangered plant and animal species, or of archaeological sites that necessitate protection. You can still share these data openly but you may need to redact some information, providing a general description of what has been redacted and why.

9. Have you discussed your research data deposit with any of the co-authors?

If your dataset has co-authors, are they aware that you are depositing the data in Apollo and are they happy with the content, description and licence?

More detailed information:

You may be depositing the data on behalf of the data's author and any co-authors. If so, ensure that you have been given permission to do so.
If you are an author of the data to be deposited and have co-authors (or co-creators) then ensure that they are aware that you are depositing the data in Apollo. This gives your co-authors the opportunity to raise any issues regarding the data prior to publication, and to raise any concerns regarding Intellectual Property Rights, contractual obligations, funder obligations, and licence choice.

10. Have you given your research output a meaningful title?

Name your research output so that it is informative and corresponds to any associated publication. The title will be part of its citation, to be used by you and others. The title must not be identical to the title of any associated publication but it can incorporate the publication title verbatim.

More detailed information:

Your research output title, along with names of the authors and date of publication in Apollo, will form the basis of the citation for your data. This citation will be used by you in the reference list of the article associated with your data, and by others who cite your data.
For data supporting articles we recommend the following convention for your titles:
- ‘Research data supporting [enter title of article/chapter/thesis]’
- ‘Software code supporting [enter title of article/chapter/thesis]’
- ‘Method supporting [enter title of article/chapter/thesis]’.
The metadata is indexed by Google so having an informative title helps your data to be discovered during internet searches.

11. Have you described your data so that others are able to understand and reuse your data effectively?

Good data documentation avoids the risk that your data will be misinterpreted and misused by others. Ensure your data is well-described in the Elements form so that others can understand what the data contains, the study it relates to, and how the data were created. We encourage you to upload additional documentation files such as a Readme file to facilitate understanding and reuse of your data.

More detailed information:

We expect you to provide comprehensive information about your data in the Elements form. This metadata is published alongside your data and is indexed by Google, making your data easier to find in internet searches and more likely to be used and cited.
Use the ‘Detailed description’ field in the form to provide: a description of the dataset or software/code’s content (e.g. files, folders, raw data, processed data, and supporting information such as code, protocols or data dictionaries); information about where the study data derives from; methodologies used to produce and process the data (including instruments and any relevant settings); any limitations of the data; details of any third-party data (e.g. attribution, citations, licence terms).
If the data you are depositing meets a specific metadata standard or schema, please include a reference (URI) to the standard or schema in the data documentation. Incorporating the correct metadata terms into your data documentation (either in the Elements form or in a separate file, such as a Readme file) will help you to meet quality standards for your discipline. The Digital Curation Centre provides examples of disciplinary metadata and you can also search Fairsharing.org for metadata standards.
For some datasets the above information may be insufficient to describe your dataset adequately, particularly if the dataset is complex and requires more detailed information to allow others to understand and reuse your data. It may be necessary to upload additional documents to contextualise your data. Depending on the study and resulting data, these may include: Readme files, codebooks, data dictionaries, user guides, protocols, electronic research notebook sections, surveys/questionnaires, participant information sheets, blank consent forms, instrument settings or supporting code.
See:
- Data documentation and metadata guidance
- UK Data Service guidance on documenting datasets and documentation templates
- Downloadable Readme file template (from Cornell University).

12. Have you described the software required to access your files?

Provide software-related instructions that will help others to access and read your files. This is especially important if you have used proprietary, bespoke or uncommon software. If proprietary, what open-source or free software can be used to access your data?

More detailed information:

Information relating to relevant software should be added to the ‘software/usage instructions’ field.
It is particularly necessary to provide software usage instructions and relevant background information when your data are in uncommon formats that derive from specialist software. Relevant information to include are the software version or equipment used to generate the data and any details of alternative software options that allow free access to the data.

13. Have you selected the right licence for your research output?

Creative Commons licences are most commonly used for data. We recommend a CC BY licence for data. Other options are available for data and software/code. This licence selector tool can help you choose.

More detailed information:

We recommend a CC BY (Creative Commons Attribution) licence for datasets. This licence means that the data can be reused as long as correct attribution is given to the dataset and its authors.
Other licences are available for you to choose from for data and software/code, which may be more appropriate. If your dataset contains any third-party data then you need to first check that the terms associated with these data give you rights to redistribute, and if you do then you need to establish if you can share these under your chosen licence.
If your dataset has co-authors, check that you all agree on the licence choice.
See:
- Licensing data
- Licensing software
- Guidance on how to share and licence your data
- Detailed information on licensing data from the Digital Curation Centre.

14. Have you linked to outputs directly related to your data?

Linking your data to its related outputs helps your data to be found via internet searches and contextualises your data more fully, increasing the likelihood that others will view, cite and reuse your data.

More detailed information:

If your data is associated with a specific publication, provide the title(s) and DOI(s) (if known) in the ‘Details of associated publication’ field.
If there are any other related works (e.g. GitHub repositories, other versions of the dataset, pre-prints), add URLs to the ‘Related resources’ field.
Some deposits are standalone research outputs but most are associated with a publication.

15. Have you linked to funding?

Acknowledge any funders and sponsors in the deposit form and by linking your deposit record to specific grants listed in Elements.

More detailed information:

Create a relationship between your deposit record and related grants by using the ‘Relationships’ section to link to a funding source. Those already attributed to you will be listed for you to select from or you can search for a grant by title or code.
You can acknowledge additional sources in the Elements form under ‘Sponsorship and other sources of funding’.
See:
- Grants in Elements, from Research Information (Raven login required).

16. Have you decided what files to upload?

Upload the data files that support your research findings (e.g. associated with a journal article) or your research project, or elements of your project (e.g. as a standalone research output). Data files must be well-organised and well-documented to maximise understanding and reuse potential.

More detailed information:

Your research output can be comprised of files that support the research findings reported in the associated publication(s) or it can be a standalone research output not linked to a specific publication.
If applicable, you may include raw data as well as the corresponding processed data.
We expect your data files to be well organised and well documented (e.g. with Readme files) so that your data retain value, utility and integrity.
See:
- Guidance on what data to share.

17. Are your files in formats that can be opened by others who do not have access to specialist or proprietary software?

Where possible, data files in proprietary formats should also be made available in open or more common formats. If this is not feasible then instructions should be provided on how to access the data without the user incurring a financial cost.

More detailed information:

If your data are in a proprietary format, you can deposit two copies of the same data: one in the original proprietary format and the other as an open format file exported from within the proprietary software. For example, .csv versions of SPSS (.sav) or Stata (.dta) data can be deposited alongside any corresponding codebooks or metadata.
See:
- Choosing file formats.

18. Have you organised and named the files in a way that is easy to understand and reuse?

If your data is structured into folders, then you can upload folders only if they are compressed into files (e.g. a .zip file). Use file names that are informative for the user and provide a description of the contents of each file/folder.

More detailed information:

It is only possible to upload a folder of files if the folder is compressed into, for example, a .zip or .tar.gz file. Alternatively, you may upload the files individually if you do not need to maintain the structure of your data directory (or hierarchical filing system).
The Apollo record for your deposit will display a list of file names so it is important that the file name is concise but informative. Provide a list of file names and a description of the associated contents in the ‘Detailed description’ metadata and/or in a separate Readme file.
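As an illustration of the two points above, the following sketch compresses a data folder into a .zip file and builds a plain list of file names and sizes that could be pasted into the ‘Detailed description’ field or a Readme file. It uses only the Python standard library; the function name `zip_folder_with_manifest` and the tab-separated layout of the listing are assumptions made for this example, and descriptions of each file's contents still need to be written by hand.

```python
import shutil
from pathlib import Path


def zip_folder_with_manifest(folder, archive_stem):
    """Zip `folder` and return (archive path, list of 'name<TAB>size' lines).

    `archive_stem` is the output path without the .zip extension. It should
    sit outside `folder` so the archive does not try to include itself.
    """
    folder = Path(folder)
    # Collect every file (not directories), in a stable sorted order.
    files = sorted(p for p in folder.rglob("*") if p.is_file())
    manifest = [
        f"{p.relative_to(folder).as_posix()}\t{p.stat().st_size} bytes"
        for p in files
    ]
    # make_archive appends ".zip" itself and preserves the folder structure.
    archive_path = shutil.make_archive(str(archive_stem), "zip", root_dir=folder)
    return Path(archive_path), manifest
```

For example, `zip_folder_with_manifest("my_data", "my_data_deposit")` would be expected to write `my_data_deposit.zip` next to a hypothetical `my_data` folder and return the listing for documentation.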
See:
- Naming and organising files.

19. Have you provided a Readme file to describe your data?

Ideally, your data should contain a ‘Readme’ file that describes your data. Others can download this together with your data files, helping them to understand and correctly acknowledge and reuse your data.

More detailed information:

For simple data files (e.g. a single .csv file), it may be possible to capture all information about the data in the deposit form in Elements but you may still wish to have a Readme file that users can download together with your data files. We recommend this regardless of data complexity.
For more complex (and large) data files we expect you to provide a Readme file, or a series of Readme files if there are multiple folders. If this is absent, we will contact you to request that you create and upload one to sit alongside your data.
Readme files are normally .txt format but your data may require relatively extensive documentation, possibly with images and tables. If this is the case, then the file could be provided as a .docx or PDF/A file.
See:
- We recommend the Readme file template and guidance provided by Cornell University.

20. Have you described your variables?

If you have data for specific variables then ensure the variables and any related values are defined. This will help others to understand and reuse your data and avoids the risk of others misinterpreting your data.

More detailed information:

If you have qualitative and/or quantitative data for specific variables then you will need to provide a codebook (or data dictionary) to define, for example, the variables, measurement units, value labels, missing values, acronyms, abbreviations.
Depending on the software that holds your data, you may be able to create a codebook automatically (e.g. SPSS offers this).
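Where your software cannot generate a codebook for you, a starting point can be produced from the data file itself. The sketch below is one possible approach, assuming the data are in a UTF-8 .csv file with a header row; the function name `data_dictionary_skeleton` and the column headings of the output are hypothetical. It lists each variable with a few example values and a count of empty cells, and leaves the description and units blank for you to complete.

```python
import csv


def data_dictionary_skeleton(csv_path, max_examples=3):
    """Build one dictionary per CSV column as a codebook starting point."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        fieldnames = reader.fieldnames or []
        stats = {name: {"missing": 0, "examples": []} for name in fieldnames}
        for row in reader:
            for name in fieldnames:
                value = (row.get(name) or "").strip()
                if value == "":
                    stats[name]["missing"] += 1      # empty cell
                elif (value not in stats[name]["examples"]
                      and len(stats[name]["examples"]) < max_examples):
                    stats[name]["examples"].append(value)
    return [
        {
            "variable": name,
            "description": "",     # to be written by the data author
            "units": "",           # to be written by the data author
            "example_values": "; ".join(info["examples"]),
            "missing_count": info["missing"],
        }
        for name, info in stats.items()
    ]
```

The returned rows can be written out with `csv.DictWriter` and then completed with definitions, measurement units, value labels and the meaning of any missing-value codes.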
See:
- Guide on ‘How to create a data dictionary’.
- ‘Create a codebook’ from the Data Documentation Initiative (DDI).

21. Do you have any code to support the data?

Code relating to your data can be uploaded together with your data files, or you can deposit code as a separate ‘software/code’ output type in Elements.

More detailed information:

You may have code (e.g. Python scripts, R scripts, or SPSS or SAS syntax) that aids representation, understanding and reuse of your data.
Relevant code can be archived in Apollo alongside the dataset. Alternatively, you can deposit code as a separate ‘software/code’ output type in Elements. You may wish to do this if the code is a standalone output and you want to apply a software-related licence. Once published in Apollo, it won’t be possible to amend the code (or data) files but you can deposit a new version using DOI versioning.

22. Do you have any additional supporting documentation that relates to your data?

Upload alongside your data files any additional files necessary for contextualising and reusing your data.

More detailed information:

There may be other files in addition to a standard Readme file and possibly other forms of documentation (e.g. a codebook) that provide valuable information about your data and related research findings. The relevance of this question depends on the nature of your research but may include, for example, copies of surveys, questionnaires, participant information sheets, blank consent forms, instrument settings, protocols, data management plans, excerpts of electronic research notebooks. These files can be uploaded alongside your data files. You can submit methodology related data as a separate 'method' output type in Elements.
Alternatively, you can link to associated documentation located elsewhere by providing the relevant URLs in the dataset form under ‘Related resources’. To ensure long-term preservation of any related outputs, we recommend that any related resources are archived under persistent identifiers (e.g. a DOI) and, preferably, that a copy is also preserved with your dataset.

23. Are your data files the final versions for publication?

Upload only the files for publication in Apollo as it is not possible to change the contents of the files post-publication. You must contact us as soon as possible if you have uploaded incorrect files not meant for publication.

More detailed information:

It is important that the files you deposit are the final versions for publication in Apollo as it is not possible to change the files once published (e.g. to amend, delete or add files). If, in the future, you have an updated version of your dataset then you will need to deposit the new version as a new DOI version.
If you have uploaded the wrong files then contact us as soon as possible and we will remove them for you. The incorrect files must be deleted before your data is published in the repository. The process of publishing data in the repository is not automatic/instant – a member of the Research Data team will review your data and check its contents before approving the record into Apollo.
If you don’t have all the data files ready yet then you can still create a record for your data, make the deposit with or without files, and add your files to the record at a later date. If this is the case, then mark the status of the record as ‘Placeholder’ rather than ‘Final’. You will still receive a DOI for your data that you can cite.

24. Can I update my data after it has been deposited in Apollo?

If you want to update any files that have already been approved into Apollo, then you can do this with DOI versioning. Please contact the Research Data Team and send the updated files and a description of how they differ from the original version. The Research Data Team will create the new DOI version that will be linked to the original version. We will not be able to remove any previous versions of your data from the repository.

25. Have you read the repository terms of use and deposit licence agreement?

You are agreeing to the Repository Deposit Licence Agreement when you deposit your data in Apollo.

More detailed information:

The repository terms of use include information on what can be deposited in Apollo and your responsibilities.
See:
- Apollo governance and policies.

26. Is your data bigger than 2GB?

You can deposit data that exceeds 2GB but you can only deposit up to 2GB of data files at a time. To deposit more than 2GB you will need to return to the same deposit record in Elements and select to redeposit the additional files to that record.

More detailed information:

Only up to 2GB of data can be deposited at a time. For example, you may have one file of 1GB and another of 0.5GB – you can deposit these together but you will not be able to deposit at the same time an additional 1GB file as this brings the total to 2.5GB, exceeding the system limit. Deposit files that fall under the 2GB limit. You will then be able to add additional files by redepositing to the same deposit record. To do this, take the following steps once you have deposited the initial set of files: (1) click “View your publication details”, which will bring you back to your deposit record; (2) click “View” within the box that displays the files already deposited; (3) this brings you to the ‘Redeposit publication’ page where you can choose the additional files to upload; (4) click the ‘Redeposit’ button and your files will upload. You will need to repeat this process until all your data files have been uploaded.
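Planning which files to include in each deposit round can be sketched as a simple grouping exercise. The example below is illustrative only: the function name `plan_batches` is hypothetical, 1GB is assumed to mean 1000³ bytes, and each batch is kept strictly under the limit because the guidance does not say whether a batch of exactly 2GB is accepted. Files are placed largest first into the first batch that still has room.

```python
LIMIT_BYTES = 2 * 1000**3  # assumption: 2GB counted as 2 x 1000^3 bytes


def plan_batches(file_sizes, limit=LIMIT_BYTES):
    """Group files into deposit batches whose totals stay under `limit`.

    `file_sizes` maps file name -> size in bytes.
    Returns (batches, oversized), where `oversized` lists files that cannot
    fit in any batch on their own.
    """
    batches = []     # each entry is [running total in bytes, [file names]]
    oversized = []
    by_size = sorted(file_sizes.items(), key=lambda item: item[1], reverse=True)
    for name, size in by_size:
        if size >= limit:
            oversized.append(name)
            continue
        for batch in batches:
            if batch[0] + size < limit:   # strictly under the limit
                batch[0] += size
                batch[1].append(name)
                break
        else:
            batches.append([size, [name]])   # no existing batch had room
    return [names for _, names in batches], oversized
```

With the sizes from the example above (1GB, 0.5GB and a further 1GB), this grouping puts one 1GB file with the 0.5GB file and leaves the other 1GB file for a second round.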
If you want to deposit a single data file that is larger than 2GB then you have two options. Option 1: if it does not compromise the integrity of your data, split the large file into smaller files of less than 2GB each – you will need to deposit less than 2GB at a time, returning to the same deposit record to redeposit the additional files. Option 2: contact the Research Data team and we can deposit the >2GB file into the repository on your behalf.
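For Option 1, one way to split a large file is by raw bytes, as in the sketch below. This is an assumption-laden illustration rather than a recommended method: the function name `split_file` and the `.part001` naming pattern are hypothetical, and byte-level splitting only preserves integrity if users rejoin the parts in order before opening the file. For tabular data, splitting into self-contained files (for example by rows) may be more useful to others.

```python
from pathlib import Path


def split_file(path, part_bytes, buffer_bytes=1024 * 1024):
    """Split `path` into numbered parts of at most `part_bytes` bytes each.

    Data are copied in small blocks so the whole file is never held in memory.
    Returns the list of part paths in the order needed to rejoin them.
    """
    path = Path(path)
    part_paths = []
    with path.open("rb") as source:
        while True:
            first = source.read(min(buffer_bytes, part_bytes))
            if not first:
                break   # nothing left to copy
            part_path = path.with_name(f"{path.name}.part{len(part_paths) + 1:03d}")
            with part_path.open("wb") as target:
                target.write(first)
                written = len(first)
                while written < part_bytes:
                    block = source.read(min(buffer_bytes, part_bytes - written))
                    if not block:
                        break
                    target.write(block)
                    written += len(block)
            part_paths.append(part_path)
    return part_paths
```

If you use an approach like this, explain in your Readme file how the parts are named and that concatenating them in numerical order restores the original file.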

27. Is your data bigger than 20GB?

If your data exceeds 20GB in size then there is a charge of £4/GB to deposit your data. Contact us and we will guide you through the deposit process.

More detailed information:

There is a one-off charge of £4/GB to deposit data that exceeds 20GB; for example, data of 40GB will incur a single non-recurring charge of £160.
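The arithmetic in the example above can be written out as a short calculation. This sketch assumes, consistent with the 40GB example (40 × £4 = £160), that once the 20GB threshold is exceeded the charge applies to the full size of the data; the function name `deposit_charge_gbp` is hypothetical, and how fractional gigabytes are rounded is not stated in the guidance.

```python
def deposit_charge_gbp(size_gb, rate_per_gb=4, threshold_gb=20):
    """Estimate the one-off deposit charge in pounds for data of `size_gb`.

    Assumes no charge at or below the threshold and, above it, a charge on
    the full size, as in the 40GB example in the guidance.
    """
    if size_gb <= threshold_gb:
        return 0
    return size_gb * rate_per_gb
```

Treat the result as an estimate only and confirm the actual charge with the Research Data team.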
Because of the size of the data, we will deposit the data directly into Apollo for you. You should contact us to explain that the data exceeds 20GB and provide us with a link (e.g. OneDrive, Dropbox, Google Drive) to your data so that we can download the files. We will provide instructions on how to pay the charge.
You will still need to fill in the deposit form in Elements and agree to the Repository licence agreement by making a deposit, with or without files.
It is essential that you upload at least one Readme file to describe the content of your data.
See:
- Information on repository charges
- Readme file template and guidance.

28. Have you written your data access statement?

If your data relates to a publication then you must reference the data in your paper’s data access statement, providing the dataset DOI in the statement and the full data citation (including the DOI) in your paper’s reference list.

More detailed information:

Those who are funded by UKRI or any of its councils must include a data access statement in their papers.
The University of Cambridge Research Data Management policy framework also states that research staff and students are responsible for providing a statement in research articles describing how and on what terms any supporting research data may be accessed.