Work through the questions in the Depositor’s Checklist below as you plan and carry out the deposit of your dataset in Apollo, the University of Cambridge repository. Following this guidance will enhance the quality of your dataset and help you meet any ethical or legal obligations when you upload it for inclusion in Apollo. Your dataset will be easier to find, understand and reuse, which reduces the risk that your data will be misunderstood by others and increases the likelihood that it will be cited. You will improve the reproducibility of your research (if relevant to your field) and, more broadly, demonstrate transparency: both are critical contributors to your research integrity.

After you have deposited your dataset in its final form, we will check your metadata and files (information on our review process). We will contact you if there are any problems with your dataset, its metadata and documentation. There are unlikely to be any problems if you adhere to the Depositor’s Checklist, and the process of publishing your dataset in Apollo will be smoother and faster.

How to use the Depositor’s Checklist

Run through the questions in the checklist sequentially. Not all questions will be relevant to you. Select the question to find out more information: the key message for you is summarised first, followed up by more detailed information if needed.

The questions are ordered according to the data deposit process, and include what you need to consider:

Before creating a record for your dataset – Questions 1–9
Completing the dataset form – Questions 10–15
Uploading your files – Questions 16–26
After you've deposited your dataset – Question 27

 

1. Are the files you want to deposit research data?

Research data are information (quantitative and qualitative) gathered during your research that support your findings, and are produced by all disciplines (arts, humanities, social sciences and STEM fields). You may have research data (to submit as a ‘dataset’ record), or software or code (to submit as a ‘software/code’ output type). If associated, code can be submitted with data in the dataset record.

More detailed information:

  • For example, research data can be measurements, digital images, sound recordings, movies, artwork, survey data, fieldwork observations, interview transcripts, texts. Research data can be the raw (primary) data that have been collected or measured directly. They can also be secondary data, processed or deriving from existing sources. Research data are produced by researchers across all academic disciplines and via a wide variety of methods (e.g. via experiments, observations, interviews, archival work).
  • You may have code or software that you wish to publish in Apollo. Set the output type to ‘software/code’ if you have software or source code. If you have code connected directly to the dataset then it is simpler to deposit this with your data files. Depositing as a separate ‘software/code’ item will give you a separate DOI for your code and software-specific licensing options. 
2. Can you access your Symplectic Elements account?

You will need to use Elements to deposit your dataset. If you are an alumnus and the data were produced while you were at Cambridge, contact us.

More detailed information:

  • If you are a current member of the University of Cambridge and are research staff or a postgraduate student then you should be able to access your Elements account and deposit a dataset from there. See 'Signing in to Elements' for help and our step-by-step instructions on how to deposit a dataset.
  • Contact us if you do not have an active Raven account but do have an affiliation with the University of Cambridge and would like to publish your dataset in Apollo. Check the repository terms of use for your eligibility.
3. Have you checked your funder’s data sharing policy?

Some funders require you to use a particular repository, so check your funder’s data sharing policy. In order of preference, your chosen repository should be: funder-specific (if applicable), discipline-specific (if a trusted repository is available), the institutional repository (Apollo), or a general repository (e.g. Figshare, Zenodo, Dryad).

More detailed information:

  • Some funders require you to deposit your research data into a specified repository (e.g. NERC require you to use one of their data centres). It is important to be aware of your funder’s data sharing policy (most major funders have these) and meet their requirements. You can use a trustworthy discipline-specific repository, if one exists for your field or type of data. Your next choice should be Apollo as your institutional repository. We recommend these over general-purpose repositories as the latter do not offer discipline-specific metadata standards or provide the same level of dataset review and checks. The absence of these can affect the quality of the published dataset record.  
4. Do you have the rights to publish the dataset in the repository?

You must confirm that you have the authority or permission to deposit your dataset in Apollo. You must adhere to ethical agreements (e.g. relating to consent), agreements with commercial sponsors or industry partners, and your funder’s data sharing policy. If your dataset contains data from a third party then you must ensure you have permission to publish these within your dataset. 

More detailed information:

  • Subject to any other contracts (such as with your funders, data subjects and academic or external collaborators), University researchers and students retain non-patentable intellectual property (IP) rights (e.g. database rights, copyright) resulting from activities undertaken during the course of employment or study at the University. This includes IP rights in databases and software/code, and is in accordance with Chapter XIII of the University’s Statutes and Ordinances on Finance and Property (subsection, Intellectual Property Rights). Exceptions may exist and it is essential to check contracts or agreements with funders, sponsors, collaborators or other third parties. Cambridge Enterprise provide detailed information on Intellectual Property Rights and are available to advise researchers.
  • There may exist contractual obligations with your funder/sponsor or other collaborators whereby third parties hold either the rights over the data (e.g. commercial sponsors, Government-funded agencies, charities) or you are contractually obliged to seek their permission to publish. You will need to check contracts and possibly seek permission before sharing the data publicly. As a first step, consult the project PI, your departmental administrator, or the appropriate Contracts Manager at the Research Operations Office – they will be able to check the terms of any contracts (e.g. student sponsorship agreements). 
  • Most major funders have data sharing policies in which they state that their funded researchers are required to make openly available the data that support research findings. Exceptions exist for ethical or legal reasons. The University of Cambridge also supports this stance in its Research Data Management Policy Framework and Open Research Position Statement.
  • If your dataset contains data that originate from a different source then you need to check that you have the rights to share these data within your dataset under your chosen licence.
  • If your dataset contains information that relates to human subjects then you need to demonstrate that you have consent for data sharing.
5. Does your dataset incorporate pre-existing data that originate from another source?

If any third-party data, not collected or generated by you or your dataset’s co-authors, are present in your dataset then you need to establish whether you have permission to include these data as part of your research output and, if you do, to reference these data appropriately.

More detailed information:

  • Your research may have used pre-existing data (e.g. numerical, images, text, audio etc.) and these third-party data may be present in the dataset you wish to deposit. If this is the case, you need to ensure you have the rights to share these data. Check the licence and terms assigned to the data. 
  • Ensure that the existence of any reused or third-party data in your dataset is appropriately documented (e.g. correctly cited with associated terms allowing redistribution provided). This information can be added to a Readme file or the ‘Detailed description’ field in Elements, or uploaded as a separate document (see the variable information log template provided by the UK Data Service for this purpose). 
  • If you do not have the rights to share these data then you need to either obtain permission to share the data from the original data owners or redact the data from your dataset, providing metadata to explain the redaction method. Either way, you should cite any reused data sources in the dataset’s metadata and in your manuscript.
6. Does your dataset contain personal, pseudonymised or anonymised data relating to living humans?

You will need to provide proof that the consent obtained from your participants regarding data sharing allows you to publish the data in Apollo as open (publicly available) data.

More detailed information:

  • If your data derive from living individuals then you need to check that consent has been given that allows you to share the data in your chosen format (i.e. as identifiable individuals, pseudonymised data, or anonymised data). We will ask you to confirm that you have all necessary consents and to supply us with supporting evidence (e.g. participant information sheets, blank consent form, ethics documents, privacy statements). 
  • Currently, we do not provide a service that manages or controls access to restricted datasets: all datasets in Apollo are made publicly available either at the point of publication or after a finite embargo period (dataset embargoes are lifted when a dataset’s associated manuscript is published). There are alternative repositories that offer this service, such as the UK Data Service.
  • If your data are anonymised, and consent obtained does not prevent you from sharing these data publicly, then check that all direct and indirect identifiers have been removed. It is the dataset author’s responsibility to ensure that data are correctly anonymised. 
  • If your data contain ID numbers or pseudonyms, and a key that connects these IDs/pseudonyms to named individuals is being securely held by yourself, your institution, a partner institution, or a data collection agency, then your data are pseudonymised and not anonymised. We will not accept pseudonymised data in Apollo unless participants have consented to this. There is a risk of reidentification as long as the associated key exists and the IDs/pseudonyms correlate directly with that key. The data can be anonymised by destroying the key or by replacing the IDs with different (preferably random) codes that do not link to the key. If the latter option is chosen then it should not be possible at any point to recombine the key and the dataset to identify individuals. Pseudonymised data are still personal data (see ICO guidance). 
  • If you think that anonymising your data will reduce its value and utility then consider an alternative repository that offers restricted access to data.
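
The option of replacing IDs with random codes, described above, can be sketched in Python. This is a minimal illustration under stated assumptions, not an approved anonymisation procedure: the function name and the `participant_id` field are hypothetical, and the records are assumed to be held as a list of dictionaries. The mapping from old to new IDs exists only while the function runs and is never returned or saved, so the new codes cannot be joined back to the original key.

```python
import secrets


def replace_ids_with_random_codes(records, id_field="participant_id"):
    """Return copies of the records with each ID swapped for a random code.

    The old-to-new mapping is local to this function and is discarded on
    return, so the new codes do not link to any existing key.
    """
    mapping = {}        # old ID -> new random code (kept in memory only)
    used_codes = set()  # guards against a rare random collision
    anonymised = []
    for record in records:
        old_id = record[id_field]
        if old_id not in mapping:
            code = secrets.token_hex(4)
            while code in used_codes:
                code = secrets.token_hex(4)
            used_codes.add(code)
            mapping[old_id] = code
        new_record = dict(record)  # leave the input record untouched
        new_record[id_field] = mapping[old_id]
        anonymised.append(new_record)
    return anonymised
```

Swapping the IDs does not by itself make a dataset anonymous. Direct and indirect identifiers in the other fields still need to be removed, and any retained copy of the pseudonymised file could allow the records to be matched up again.
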
7. Are your data commercially sensitive?

If your research is funded by a commercial or industry sponsor then you need to check that your contract with them allows you to share your data publicly. If you intend to commercialise your dataset (for example by licensing it commercially to an end-user or to another company, filing for IP rights such as patents, creating a start-up company with your data as an asset, or carrying out a consultancy using your data as a tool) then seek advice from us and Cambridge Enterprise before making your data openly available. 

8. Do your data contain environmentally or culturally sensitive information?

If your data contain sensitive environmental or cultural information whose disclosure could harm the environment or cultural heritage, then you may need to redact elements of your data and justify this redaction prior to depositing your dataset. 

More detailed information:

  • Your data may contain information that can be considered sensitive for environmental or cultural reasons. An example could be location information (e.g. geographical co-ordinates) of endangered plant and animal species, or of archaeological sites that necessitate protection. You can still share these data openly but you may need to redact some information, providing a general description of what has been redacted and why. 
9. Have you discussed your dataset deposit with any co-authors of the dataset?

If your dataset has co-authors, are they aware that you are depositing a dataset in Apollo and are they happy with the dataset’s content, description and licence?

More detailed information:

  • You may be depositing the dataset on behalf of the dataset author and any co-authors. If so, ensure that you have been given permission to do so.
  • If you are an author of the dataset and have dataset co-authors (or co-creators) then ensure that they are aware that you are depositing the dataset in Apollo. This gives your co-authors the opportunity to raise any issues regarding the dataset prior to publication, and to raise any concerns regarding Intellectual Property Rights, contractual obligations, funder obligations, and dataset licence choice.
10. Have you given your dataset a meaningful title?

Name your dataset so that it is informative and corresponds to any associated publication. The dataset title will be part of its citation, to be used by you and others. The dataset title must not be identical to the title of any associated publication but it can incorporate the publication title verbatim. 

More detailed information:

  • Your dataset title, along with names of the dataset’s authors and date of publication in Apollo, will form the basis of the citation for your dataset. This citation will be used by you in the reference list of the article associated with your dataset, and by others who cite your dataset. 
  • For datasets supporting articles we recommend the following convention for your dataset title: ‘Research data supporting [enter title of article/chapter/thesis]’. Your dataset metadata is indexed by Google so having an informative title helps your dataset to be discovered during internet searches.
11. Have you described your dataset so that others are able to understand and reuse your data effectively?

Good data documentation avoids the risk that your data will be misinterpreted and misused by others. Ensure your dataset is well-described in the Elements form so that others can understand what the dataset contains, the study it relates to, and how the data were created. We encourage you to upload additional dataset documentation files to facilitate understanding and reuse of your data. 

More detailed information:

  • We expect you to provide comprehensive information about your dataset in the Elements form. This metadata is published alongside your dataset and is indexed by Google, making your dataset easier to find in internet searches and more likely to be used and cited. 
  • Use the ‘Detailed description’ field in the form to provide: a description of the dataset’s content (e.g. files, folders, raw data, processed data, and supporting information such as code, protocols or data dictionaries); information about the study the dataset derives from; methodologies used to produce and process the data (including instruments and any relevant settings); any limitations of the dataset; details of any third-party data (e.g. attribution, citations, licence terms).
  • If the data you are depositing meet a specific metadata standard or schema, please include a reference (URI) to the standard or schema in the data documentation. Incorporating the correct metadata terms into your data documentation (either in the Elements form or in a separate file, such as a Readme file) will help you to meet quality standards for your discipline. The Digital Curation Centre provides examples of disciplinary metadata and you can also search FAIRsharing.org for metadata standards.
  • For some datasets the above information may be insufficient to describe your dataset adequately, particularly if the dataset is complex and requires more detailed information to allow others to understand and reuse your data. It may be necessary to upload additional documents to contextualise your data. Depending on the study and resulting data, these may include: Readme files, codebooks, data dictionaries, user guides, protocols, electronic research notebook sections, surveys/questionnaires, participant information sheets, blank consent forms, instrument settings or supporting code. 
12. Have you described the software required to access your data files?

Provide software-related instructions that will help others to access and read your data. This is especially important if you have used proprietary, bespoke or uncommon software. If the software is proprietary, state which open-source or free software can be used to access your data. 

More detailed information:

  • Information relating to relevant software should be added to the ‘software/usage instructions’ field.
  • It is particularly important to provide software usage instructions and relevant background information when your data are in uncommon formats that derive from specialist software. Relevant information includes the software version or equipment used to generate the data and details of any alternative software that allows free access to the data. 
13. Have you selected the right licence for your dataset?

Creative Commons licences are most commonly used for data. We recommend a CC BY licence for data. Other options are available for data and software/code. This licence selector tool can help you choose.

More detailed information:

  • We recommend a CC BY (Creative Commons Attribution) licence for datasets. This licence means that the data can be reused as long as correct attribution is given to the dataset and its authors. 
  • Other licences are available for you to choose from for data and software/code, which may be more appropriate. If your dataset contains any third-party data then you need to first check that the terms associated with these data give you rights to redistribute, and if you do then you need to establish if you can share these under your chosen licence.
  • If your dataset has co-authors, check that you all agree on the licence choice.
14. Have you linked to outputs directly related to your data?

Linking your dataset to its related outputs helps your dataset to be found via internet searches and contextualises your dataset more fully, increasing the likelihood that others will view, cite and reuse your data. 

More detailed information:

  • If your dataset is associated with a specific publication, provide the title(s) and DOI(s) (if known) in the ‘Details of associated publication’ field. 
  • If there are any other related works (e.g. GitHub repositories, other versions of the dataset, pre-prints), add URLs to the ‘Related resources’ field.
  • Some datasets are standalone research outputs but most are associated with a publication. 
15. Have you linked to funding?

Acknowledge any funders and sponsors in the dataset form and by linking your dataset record to specific grants listed in Elements. 

More detailed information:

  • Create a relationship between your dataset record and related grants by using the ‘Relationships’ section to link to a funding source. Grants already attributed to you will be listed for you to select from, or you can search for a grant by title or code. 
  • You can acknowledge additional sources in the Elements form under ‘Sponsorship and other sources of funding’. 
16. Have you decided what files to upload?

Upload the data files that support your research findings (e.g. associated with a journal article) or your research project, or elements of your project (e.g. as a standalone dataset). Datasets must be well-organised and well-documented to maximise understanding and reuse potential. 

More detailed information:

  • Your dataset can be comprised of files that support the research findings reported in the associated publication(s) or it can be a standalone dataset not linked to a specific publication.
  • If applicable, you may include raw data as well as the corresponding processed data. 
  • We expect your data files to be well organised and well documented (e.g. with Readme files) so that your data retain value, utility and integrity. 
17. Are your files in formats that can be opened by others who do not have access to specialist or proprietary software?

Where possible, data files in proprietary formats should also be made available in open or more common formats. If this is not feasible then instructions should be provided on how to access the data without the user incurring a financial cost. 

More detailed information:

  • If your data are in a proprietary format, you can deposit two copies of the same data: one in the original proprietary format and the other as an open-format file exported from within the proprietary software. For example, .csv versions of SPSS (.sav) or Stata (.dta) data can be deposited alongside any corresponding codebooks or metadata.
18. Have you organised and named the files in your dataset in a way that is easy to understand and reuse?

If your dataset is structured into folders, then you can upload folders only if they are compressed into files (e.g. a .zip file). Use file names that are informative for the user and provide a description of the contents of each file/folder.

More detailed information:

  • It is only possible to upload a folder of files if the folder is compressed into, for example, a .zip or .tar.gz file. Alternatively, you may upload the files individually if you do not need to maintain the structure of your data directory (or hierarchical filing system). 
  • The Apollo record for your dataset will display a list of file names so it is important that the file name is concise but informative. Provide a list of file names and a description of the associated contents in the ‘Detailed description’ metadata and/or in a separate Readme file.
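
Compressing a folder so that its directory structure survives the upload can be done with most archive tools. As one illustration, here is a minimal Python sketch using the standard `zipfile` module; the function name and the example folder names are hypothetical. The top-level folder name is kept inside the archive so that the hierarchy is restored when the file is unpacked.

```python
import zipfile
from pathlib import Path


def zip_folder(folder, archive_path):
    """Compress `folder` (including subfolders) into a .zip file.

    Paths inside the archive start with the folder's own name, so the
    directory hierarchy is preserved when the archive is extracted.
    """
    folder = Path(folder)
    archive_path = Path(archive_path)
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(folder.rglob("*")):
            # Skip directories and, if the archive is written inside the
            # folder being compressed, skip the archive itself.
            if path.is_file() and path.resolve() != archive_path.resolve():
                arcname = path.relative_to(folder.parent).as_posix()
                archive.write(path, arcname)
    return archive_path
```

For example, `zip_folder("my_dataset", "my_dataset.zip")` would produce a single file suitable for upload, with entries such as `my_dataset/raw/run1.csv`.
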
19. Have you provided a Readme file to describe your dataset?

Ideally, your dataset should contain a ‘Readme’ file that describes your dataset. Others can download this together with your data files, helping them to understand and correctly acknowledge and reuse your dataset. 

More detailed information:

  • For simple datasets (e.g. a single .csv file), it may be possible to capture all information about the dataset in the dataset form in Elements but you may still wish to have a Readme file that users can download together with your dataset files. We recommend this regardless of dataset complexity.
  • For more complex datasets (and large datasets) we expect you to provide a Readme file, or a series of Readme files if there are multiple folders. If this is absent, we will contact you to request that you create and upload one to sit alongside your dataset. 
  • Readme files are normally .txt format but your dataset may require relatively extensive documentation, possibly with images and tables. If this is the case, then the file could be provided as a .docx or PDF/A file.
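
A plain-text Readme can be started from a generated skeleton and then completed by hand. The Python sketch below is only an illustration: the function name is hypothetical, and the section headings are an assumption loosely based on the topics suggested for the ‘Detailed description’ field, not a required Apollo template. It lists every file in the dataset folder with a placeholder for its description.

```python
from pathlib import Path

# Assumed headings; adapt them to your discipline and dataset.
SECTIONS = [
    "Description of contents",
    "Study information",
    "Methodology",
    "Limitations",
    "Third-party data and licence terms",
]


def build_readme(dataset_dir, title):
    """Return Readme text with empty sections and a list of the files."""
    dataset_dir = Path(dataset_dir)
    lines = [f"Title: {title}", ""]
    for heading in SECTIONS:
        lines += [f"{heading}:", "  TODO", ""]
    lines.append("Files:")
    files = sorted(p for p in dataset_dir.rglob("*") if p.is_file())
    for path in files:
        rel = path.relative_to(dataset_dir).as_posix()
        size = path.stat().st_size
        lines.append(f"  {rel} ({size} bytes): TODO describe contents")
    return "\n".join(lines) + "\n"
```

The returned text can be saved as a .txt file alongside the data and each TODO replaced with a real description before deposit.
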
20. Have you described your variables?

If you have data for specific variables then ensure the variables and any related values are defined. This will help others to understand and reuse your data and avoids the risk of others misinterpreting your data.  

More detailed information:

  • If you have qualitative and/or quantitative data for specific variables then you will need to provide a codebook (or data dictionary) to define, for example, the variables, measurement units, value labels, missing values, acronyms, abbreviations.
  • Depending on the software that holds your data, you may be able to create a codebook automatically (e.g. SPSS offers this). 
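
If your software cannot generate a codebook for you, a skeleton can be produced from the data file and then filled in manually. The following Python sketch assumes the data are in CSV form with a header row; the function name, the output fields and the choice of missing-value markers (an empty cell or `NA`) are all assumptions made for illustration.

```python
import csv
import io


def codebook_skeleton(csv_text, missing_markers=("", "NA")):
    """Return one codebook entry per column of the CSV text.

    Descriptions, units and value labels are left as TODO placeholders
    for the dataset author to complete.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    entries = []
    for name in reader.fieldnames or []:
        values = [row[name] for row in rows]
        n_missing = sum(
            1 for v in values if v is None or v.strip() in missing_markers
        )
        entries.append(
            {
                "variable": name,
                "description": "TODO",
                "units": "TODO",
                "value_labels": "TODO",
                "n_missing": n_missing,
            }
        )
    return entries
```

Each entry still needs a human-written definition: the script only records which variables exist and how many values are missing under the assumed markers.
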
21. Do you have any code to support the data?

Code relating to your data can be uploaded together with your data files, or you can deposit code as a separate ‘software/code’ output type in Elements.

More detailed information:

  • You may have code (e.g. Python scripts, R scripts, or SPSS or SAS syntax) that aids representation, understanding and reuse of your data.
  • Relevant code can be archived in Apollo alongside the dataset. Alternatively, you can deposit code as a separate ‘software/code’ output type in Elements. You may wish to do this if the code is a standalone output and you want to apply a software-related licence. Once published in Apollo, it won’t be possible to amend the code (or data) files but you can deposit a new version as a separate record if required. (Note: DOI versioning is not currently available in Apollo but this will be possible in 2023.)
22. Do you have any additional supporting documentation that relates to your data?

Upload alongside your data files any additional files necessary for contextualising and reusing your dataset. 

More detailed information:

  • There may be other files in addition to a standard Readme file and possibly other forms of documentation (e.g. a codebook) that provide valuable information about your dataset and related research findings. The relevance of this question depends on the nature of your research but may include, for example, copies of surveys, questionnaires, participant information sheets, blank consent forms, instrument settings, protocols, data management plans, excerpts of electronic research notebooks. These files can be uploaded alongside your data files.
  • Alternatively, you can link to associated documentation located elsewhere by providing the relevant URLs in the dataset form under ‘Related resources’. To ensure long-term preservation of any related outputs, we recommend that any related resources are archived under persistent identifiers (e.g. a DOI) and, preferably, that a copy is also preserved with your dataset. 
23. Are your data files the final versions for publication?

Upload only the files for publication in Apollo as it is not possible to change the contents of a dataset post-publication. You must contact us as soon as possible if you have uploaded incorrect files not meant for publication.

More detailed information:

  • It is important that the files you deposit are the final versions for publication in Apollo as it is not possible to change datasets once published (e.g. to amend, delete or add files). If, in the future, you have an updated version of your dataset then you will need to deposit the new version as a new record – this will have a new DOI but we will link different versions of the same dataset so that the most recent version is easy to find. (Note: DOI versioning is not currently available in Apollo but this will be possible in 2023.)
  • If you have uploaded the wrong files then contact us as soon as possible and we will remove them for you. The incorrect files must be deleted before your dataset is published in the repository. The process of publishing datasets in the repository is not automatic/instant – a member of the Research Data team will review your dataset and check its contents before approving the record into Apollo.
  • If you don’t have all the data files ready yet then you can still create a record for your dataset, make the deposit with or without files, and add your files to the record at a later date. If this is the case, then mark the status of the record as ‘Placeholder’ rather than ‘Final’. You will still receive a DOI for your dataset that you can cite. 
24. Have you read the repository terms of use and deposit licence agreement?

You are agreeing to the Repository Deposit Licence Agreement when you deposit your dataset in Apollo. 

25. Is your dataset bigger than 2GB?

You can deposit a dataset that exceeds 2GB but you can only deposit up to 2GB of data files at a time. To deposit more than 2GB you will need to return to the same dataset record in Elements and select to redeposit the additional files to that record.

More detailed information:

  • Only up to 2GB of data can be deposited at a time. For example, you may have one file of 1GB and another of 0.5GB – you can deposit these together but you will not be able to deposit at the same time an additional 1GB file as this brings the total to 2.5GB, exceeding the system limit. Deposit files that fall under the 2GB limit. You will then be able to add additional files by redepositing to the same dataset record. To do this, take the following steps once you have deposited the initial set of files: (1) click “View your publication details”, which will bring you back to your dataset record; (2) click “View” within the box that displays the files already deposited; (3) this brings you to the ‘Redeposit publication’ page where you can choose the additional files to upload; (4) click the ‘Redeposit’ button and your files will upload. You will need to repeat this process until all your data files have been uploaded.
  • If you want to deposit a single data file that is larger than 2GB then you have two options. Option 1: if it does not compromise the integrity of your data, split the large file into smaller files of less than 2GB each – you will need to deposit less than 2GB at a time, returning to your dataset publication details to redeposit the additional files. Option 2: contact the Research Data team and we can deposit the >2GB file into the repository on your behalf. 
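
Planning which files to deposit together can be worked out in advance. The Python sketch below groups files into batches that each stay within the 2GB limit, taking the files in the order given. It rests on several assumptions: 1GB is treated as 1024³ bytes, a batch of exactly 2GB is treated as acceptable, and the function name is hypothetical. Single files above the limit are reported separately, since they need to be split or handled by the Research Data team.

```python
GB = 1024 ** 3        # assumption: binary gigabytes
LIMIT_BYTES = 2 * GB  # per-deposit limit described above


def plan_deposit_batches(file_sizes, limit_bytes=LIMIT_BYTES):
    """Group files into deposit batches that each fit within the limit.

    `file_sizes` maps file name -> size in bytes. Returns a tuple
    (batches, oversized): a list of lists of file names, plus the names
    of single files that exceed the limit on their own.
    """
    oversized = []
    batches = []  # each item: {"files": [...], "total": bytes}
    for name, size in file_sizes.items():
        if size > limit_bytes:
            oversized.append(name)
            continue
        # First fit: place the file in the first batch with enough room.
        for batch in batches:
            if batch["total"] + size <= limit_bytes:
                batch["files"].append(name)
                batch["total"] += size
                break
        else:
            batches.append({"files": [name], "total": size})
    return [batch["files"] for batch in batches], oversized
```

With files of 1GB, 0.5GB and 1GB, in that order, this gives two batches: the first two files together, then the third on its own, matching the example above. The first batch is deposited initially and each later batch is added by redepositing to the same record.
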
26. Is your dataset bigger than 20GB?

If your dataset exceeds 20GB in size then there is a charge of £4/GB to deposit your data. Contact us and we will guide you through the deposit process. 

More detailed information:

  • There is a one-off charge of £4/GB to deposit a dataset that exceeds 20GB; for example, a dataset of 40GB will incur a single non-recurring charge of £160. 
  • Because of the size of the dataset, we will deposit the dataset directly into Apollo for you. You should contact us to explain that the dataset exceeds 20GB and provide us with a link (e.g. OneDrive, Dropbox, Google Drive) to your dataset so that we can download the files. We will provide instructions on how to pay the charge.
  • You will still need to fill in the dataset form in Elements and agree to the Repository licence agreement by making a deposit, with or without files. 
  • It is essential that you upload at least one Readme file to describe the content of your dataset.
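
The charge described above is simple arithmetic. A minimal Python sketch follows, assuming (from the 40GB example) that the £4/GB rate applies to the whole dataset once it exceeds 20GB, rather than only to the portion above 20GB. The function name is hypothetical, and because the guidance does not say how fractional gigabytes are rounded, no rounding is applied.

```python
THRESHOLD_GB = 20    # datasets up to this size are not charged
RATE_GBP_PER_GB = 4  # one-off rate for datasets above the threshold


def deposit_charge_gbp(size_gb):
    """Return the one-off deposit charge in pounds for a dataset."""
    if size_gb <= THRESHOLD_GB:
        return 0
    # Assumption: the rate applies to the full dataset size.
    return size_gb * RATE_GBP_PER_GB
```

For instance, `deposit_charge_gbp(40)` returns 160, in line with the £160 example, while a 20GB dataset incurs no charge. Confirm the actual amount with the Research Data team before budgeting.
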
27. Have you written your data access statement?

If your dataset relates to a publication then you must reference the dataset in your paper’s data access statement, providing the dataset DOI in the statement and the full dataset citation (including the DOI) in your paper’s reference list. 
