Q. Can I submit my data to any repository I want?
You need to check if your funder mandates deposit of your data in a specific repository (you can check your funder’s policy here: http://www.data.cam.ac.uk/funders). For example, if you are funded by ESRC you should normally submit your data to the UK Data Service ReShare repository, while if you are funded by NERC you may need to deposit in one of a number of data centres provided for specific types of environmental data. Otherwise, it is up to you and your research group to choose the most appropriate repository for your data. We provide guidance about what to consider when looking for a suitable data repository, together with information on repositories available to you here: http://www.data.cam.ac.uk/repository.
Q. How do I submit my data to the University of Cambridge data repository?
You can submit your data to us via our simple webform: http://www.data.cam.ac.uk/upload. We will upload your data to the repository and provide you with a persistent link within three working days.
Q. I have big data files - how do I upload these to the repository and share them?
You can submit your data to us via our simple webform: http://www.data.cam.ac.uk/upload. Note that the maximum size of a file that can be submitted is 2GB. This limit applies to individual files (your file collection as a whole can be larger). If you would like to upload larger data files (exceeding 2GB) and you are unable to split them into smaller chunks or compress them, please e-mail us at firstname.lastname@example.org and we will discuss alternative options with you.
When it comes to sharing big files via the repository, remember also that the end user will need to download your files in order to re-use them. In other words, in order to be re-usable, your data needs to be downloadable. To help with this, always consider ways of reducing the size of your files (for example, by compressing them) before sharing.
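As a rough illustration of checking files against the 2GB per-file limit before submitting, the following Python sketch lists files that exceed the limit and writes a gzip-compressed copy of a file. It is only a sketch under stated assumptions: the function names are hypothetical, and "2GB" is read here as 2 × 1024³ bytes, which may not match how the repository counts the limit.

```python
import gzip
import shutil
from pathlib import Path

# Assumption: "2GB" is interpreted as 2 * 1024**3 bytes.
MAX_FILE_BYTES = 2 * 1024**3


def oversized_files(folder, limit=MAX_FILE_BYTES):
    """Return the files under `folder` whose individual size exceeds `limit` bytes."""
    return sorted(
        p for p in Path(folder).rglob("*")
        if p.is_file() and p.stat().st_size > limit
    )


def gzip_file(path):
    """Write a gzip-compressed copy next to `path` and return the new path."""
    src = Path(path)
    dst = src.with_name(src.name + ".gz")
    with open(src, "rb") as f_in, gzip.open(dst, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)  # streams the data, so large files are not loaded into memory
    return dst
```

How much a file shrinks depends on its content: plain text and csv files usually compress well, while formats that are already compressed (such as png) may barely change.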
Q. Do I need to pay for sharing my research data via the University data repository?
The current one-off charge for long-term curated data storage at the University of Cambridge data repository is £4/GB for datasets above 20GB. This cost should be budgeted into all future grant applications.
This charge of £4/GB will cover the cost of:
- Storage in a curated and managed server
- Hardware and curation
- Providing a display and search mechanism for your data
- Backing up your data at three different locations
- Protection, storage and sharing of your data for as long as it is required by your funder (or for as long as your data are used by others).
Please note that this price is reviewed regularly and might change in the future. If you have any questions, please contact us.
Q. How will you process the payment for data storage at the University of Cambridge data repository?
If the size of your dataset is above 20GB, we will ask you for the grant code that should be charged for your data submission. The total charge will be £4/GB multiplied by the total number of GB of data you are submitting to us. So, for example, if you are submitting 45GB of data, we will invoice your grant for £180. We will contact the finance manager at your Department to deal with the invoicing (we will cc you on the e-mail), so you will not need to worry about this.
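The calculation described above can be sketched in a few lines of Python. This is an illustration only: the function name is hypothetical, and it assumes that a dataset of exactly 20GB or less is not charged and that fractional GB are not rounded, neither of which is stated explicitly here.

```python
RATE_GBP_PER_GB = 4   # one-off charge of £4/GB
THRESHOLD_GB = 20     # charge applies to datasets above 20GB


def storage_charge(total_gb):
    """Return the one-off charge in pounds for a dataset of `total_gb` GB."""
    if total_gb < 0:
        raise ValueError("dataset size cannot be negative")
    if total_gb <= THRESHOLD_GB:
        return 0  # assumption: datasets of 20GB or less are not charged
    # The charge covers the total size, not only the part above the threshold,
    # which matches the 45GB example (45 x 4 = 180).
    return RATE_GBP_PER_GB * total_gb


print(storage_charge(45))  # 180
```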
Q. What if I cannot pay for my data?
Recovering the costs of data storage is crucial for the sustainability of the University of Cambridge data repository. If you do not have sufficient funds left on your grant, please contact your department, which might be able to cover the cost of sharing your research data.
If you are unable to pay for your data from your research grant and your department is unable to help, please e-mail us indicating how much data you wish to submit, and we will see if we can help. We will consider every request on a case-by-case basis.
Q. What will happen to my data if it has not been accessed for 10 years?
If your data has not been accessed for 10 years, we will need to decide (in consultation with researchers) whether or not to continue keeping your data in the University of Cambridge data repository. When making the decision, we will think about the value of your data: Can it be regenerated? How broad is the community of potential users of your data? How likely is it that these data will be used in the future? Is the software necessary to read and process your data still available?
Even if it is decided that your data should not be kept, the metadata record will be kept.
Q. What file formats are accepted by the University of Cambridge data repository?
We will accept any file formats, but wherever possible, we encourage you to submit your data in open file formats, for example gif, png, txt, csv, and others. Open file formats make your data more accessible to others, who might not have the proprietary software to read your files. Additionally, open file formats are less likely to become obsolete in the future.
However, it might be that using proprietary file formats allows richer description or better functionality of your files. In this case, we would encourage you to submit your data in both proprietary and open file formats. Please provide information about the software required to read and process your files.
If you cannot (easily) export your data to open file formats, please submit your data to us in proprietary format, and provide information about the software that is needed to read and process your files.
Q. How do I get the link to my supporting data?
After a data submission is received via the website upload form (www.data.cam.ac.uk/upload), the Research Data Team will review the submission and upload it to the data repository (which will generate a unique DOI). Subsequently, we will contact the data submitter with the link to the data (within 3 working days from receiving the initial data submission).
Q. Do I need a separate persistent link for every supporting file?
You need a link for each record in the repository. Note that every record can contain several items. Typically researchers create a separate dataset record (with a DOI assigned) for each publication.
Q. Can I get a persistent link for my data before the data is completely ready?
Yes. In order to do this, please submit your data to us by filling in the data submission form (www.data.cam.ac.uk/upload) with as much information about your data as possible. If the information is not yet available, please write 'TBC'. The form will also ask you to submit data files together with the form. Please submit a placeholder record, like the example attached here.
We will send you a persistent link to your data within three working days. You will need to send us your 'real' data files before the time of publication of the corresponding research paper. Replacing the data files will not affect the persistent URL.
Q. How can I cite somebody else's data?
When citing data you should include enough information so that it can be easily located and identified, ideally by using a persistent link to this data (Digital Object Identifier - DOI). You might wish to cite data using the following format:
Contributing Authors. Publication Year. Dataset Title [format and/or medium]. Publisher/Repository. DOI.
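If you cite many datasets, the format above can be assembled programmatically. The following Python sketch shows one way to do this; the function name and the example values (authors, title, DOI) are hypothetical placeholders, not a real dataset.

```python
def format_data_citation(authors, year, title, medium, repository, doi):
    """Build a citation in the form:
    Contributing Authors. Publication Year. Dataset Title [format and/or medium]. Publisher/Repository. DOI
    """
    # Strip a trailing full stop from the author list (e.g. "Roe, R.") to avoid "..".
    authors = authors.rstrip(".")
    # The final full stop is left off so that it is not mistaken for part of the DOI link.
    return f"{authors}. {year}. {title} [{medium}]. {repository}. {doi}"


# Hypothetical example values, for illustration only.
citation = format_data_citation(
    authors="Doe, J. and Roe, R.",
    year=2015,
    title="Example dataset title",
    medium="dataset",
    repository="University of Cambridge data repository",
    doi="https://doi.org/10.xxxx/example",
)
print(citation)
```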
Q. How should I refer to my data deposited at the University of Cambridge data repository?
If your data stored at the University of Cambridge data repository supports a publication, you should add a statement like this to your publication to link to your data:
"Additional data related to this publication is available at the University of Cambridge data repository (INSERT LINK: paste the DOI link to your data, which you can find together with your record at the University of Cambridge data repository)."
If you or someone else refers to your data somewhere other than in your original publication, the following convention can be used:
"Contributing Authors. Publication Year. Dataset Title [format and/or medium]. Publisher/Repository. Link to data."
Some more examples are available from Nature here.
Q. Where in the publication should I add the statement on data availability?
Many journals have dedicated sections on supplementary information. If your journal has a section like this, add your statement there. If your journal does not have a section on supplementary information, add a statement under the heading 'Data access', before the section where you acknowledge your funders.
Some more examples are available from Nature here.
Q. How do I link to my data? Do you have any template statements that I could adapt for my publication?
Yes, we provide several example statements that you might want to adapt for use in your publication. Please remember to adjust these statements to your situation.
Statements for data openly available
- Additional data related to this publication is available at the xxxxxxx data repository (add the url to your data).
- All data accompanying this publication are directly available within the publication.
- Additional research data supporting this publication are available as 'supplementary files' at the journal's website (add the link to supplementary files)
- Multiple datasets freely available at various data repositories were used in the publication. All of them are referred to in 'References' section of the paper.
- Overall statistical analysis of research data underpinning this publication is available at the xxxxxxx data repository (add the url to your data). Additional raw data related to this publication cannot be openly released; the raw data contains transcripts of interviews, which were impossible to anonymise.
- Processed, qualitative data from this study is available at the xxxxxxx data repository (add the url to your data). Additional raw data related to this publication cannot be openly released; the raw data contains transcripts of interviews, but none of the interviewees consented to data sharing.
IP protection/commercial data
- Additional data related to this publication contain xxxxxxx, but cannot be released. The data contains confidential information, protected by a non-disclosure agreement (enter the agreement number if available) with (enter the company name if possible). This data can be made available, subject to a non-disclosure agreement.
Data too expensive to be shared
- The publication is accompanied by a representative sample from the experiment (see xxxxxx - add the URL to your data). Detailed procedures explaining how this representative sample was selected, and how this experiment can be repeated, are provided in the Materials and Methods section. Additional raw data underlying this publication contain (xxxx - add number) additional sample images. These additional images are not shared online due to size of the images (xxxxGB/image); public sharing of these images is not cost-efficient, and the experiment can be easily reproduced.
Non-digital data, or data not readily available
- Supporting data for this publication is available at the xxxxxxx data repository (add the url to your data); however, the data is available only in a proprietary file format xxxxx (name of the file format), which can only be opened with xxxxxx software.
- Additional supporting data to this publication contains thirteen non-digital samples of the raw materials tested. Samples are stored at a safe location at xxxxxxxxx (e.g. name your department/institution), and can be made available on request, subject to the requestor travelling to xxxxxxxx (location of the samples).
No new data generated
- This is a review article, and therefore all data underlying this study is cited in references.
If you need any additional guidance on these statements, please get in touch with us.