
Research Data Management

 

Collaboration

Q. I am involved in an international collaboration with people who are not RCUK funded and who do not want to share data. What should I do?

Ideally (and in future collaborations) you should inform potential collaborators that, because you are publicly funded, you are expected to share research data as openly as possible. For your current research project, agree with your collaborator which data can be shared and which cannot, and describe this in your data management plan. If (some) research data need to be restricted, provide a statement in your publication explaining why access to the data is restricted. Effective data management planning from the outset of a proposed project will help you identify such restrictions early. For more guidance on sample statements please see the FAQ "How do I link to my data? Do you have any template statements that I could adapt for my publication?".

Q. During this research project I realised I had a competitor. Instead of scooping each other, we started collaborating towards a joint publication. My collaborator does not want to share research data. I did not consider this in my data management p...

In this case you should try to convince your collaborator to share your research data.
The fundamental principle is that published research should be open to scrutiny by others. It may help to ask yourself: ‘If I don’t share this data, under any circumstances, and others question the validity of my published findings, will I have to tell them “you just have to take my word for it, I refuse to share data”?’ This is clearly a situation you would wish to avoid. Therefore, it would be ideal if you could convince your collaborator to share data that is needed to validate research described in your publication, or at least describe the conditions under which you would agree to make it available to anyone wishing to test the robustness of your methodology and hence your published findings.
If your collaborator persistently refuses to share data, contact your funder as soon as possible to explain that you will have to restrict access to your data, and why.

Costs

Q. Wellcome Trust encourages data sharing and data re-use, but does not allow for costs of long-term data preservation to be budgeted in grant applications. This does not make sense to me.

Wellcome Trust would usually expect costs associated with routine data storage to be met by the institution. They will only consider storage costs associated with large or complex datasets which exceed standard institutional allowances.

Q. Is there any transition fund to pay for data sharing if it has not been budgeted in the grant application?

Unfortunately, there is no central transition fund to pay for sharing research data if this was not budgeted in the grant application. Your department/institute might be able to help you pay for this expense.

Q. Can I ask in my grant for a staff member to help me with data management?

Yes, this is an eligible cost on grant applications: you can request a salary to support a research data manager for your research project, as long as it is justified.

Q. According to CRUK policy, costs for data sharing can be budgeted in grant applications only from August 2015. What about research data from older projects, when these costs were not eligible in grant applications? Is there any transition fund avai...

Unfortunately, there are no additional funds to pay for these costs. Researchers who have older datasets that might be of significant value to the community should contact CRUK – all requests for support will be considered on a case by case basis.

Data Protection

Q. Who can help me with intellectual property rights questions?

Queries concerning IPR conditions in the sponsorship or funding agreement under which your research at the University is undertaken may be directed in the first instance to the appropriate Contracts Manager at the Research Operations Office.

For general questions on IPR, contact the Legal Services Office:

e-mail: legal@admin.cam.ac.uk

For questions touching on commercialisation, contact Cambridge Enterprise:

e-mail: enquiries@enterprise.cam.ac.uk

Q. What online courses are available on data protection?

The University of Cambridge offers self-taught web courses on Data Protection. Following these courses offers a good way of gauging your basic knowledge about the Data Protection Act and your responsibilities (note: you need a Raven login to enrol). They will only take you 30 minutes to complete and are a good resource to get a general understanding of data protection issues.

There is also a group workshop on Data Protection and FOI, delivered regularly by the University Information Compliance Officer: Data Protection and FOI: An Introduction

Q. What is personal and sensitive data?

Personal data is data relating to a living individual, which allows the individual to be identified from the information itself or from the information plus any other information held by the 'data controller' (or from information available in the public domain). The University of Cambridge as a whole is the data controller.
Sensitive data is personal data about:

  • racial or ethnic origin
  • political opinions
  • religious beliefs
  • Trade Union membership
  • physical and mental health
  • sexual life
  • criminal offences and court proceedings about these

If you would like to learn more about personal and sensitive data and do some practical exercises on identifying these data types, the University of Cambridge offers short (30-minute) online courses on personal and sensitive data. You can also register for face-to-face training on Data Protection and FOI, delivered by the University of Cambridge Information Compliance Officer.

Q. What does the law require me to do with data protection?

The Data Protection Act of 1998 gives individuals certain rights, and imposes obligations on those who record and use personal information to be open about how information is used and to follow eight data protection principles. Personal data must be:

  • processed fairly and lawfully
  • obtained for specified and lawful purposes
  • adequate, relevant and not excessive
  • accurate and, where necessary, kept up-to-date
  • not kept for longer than necessary
  • processed in accordance with the subject's rights
  • kept secure
  • not transferred abroad without adequate protection

If you would like to learn more about personal and sensitive data and do some practical exercises on identifying these data types, the University of Cambridge offers short (30-minute) online courses on personal and sensitive data. You can also register for face-to-face training on Data Protection and FOI, delivered by the University of Cambridge Information Compliance Officer.

Q. How should I store my sensitive or confidential data?

You should limit physical access to sensitive data or encrypt it (speak with your local IT/Computing Officer or the University Information Services Help Desk for help in doing this). 
To avoid accidentally compromising the data at some future date, you should always store information about the data's sensitivity and any available information on participants' consent or use agreements from your data provider with the data itself (i.e. put information about lawful and ethical data use in your data documentation or metadata description).
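
One way to keep this information with the data is a small "sidecar" file saved next to each dataset. The sketch below is illustrative only: the file names, field names and values are assumptions made for the example, not a University-prescribed schema.

```python
import json
import tempfile
from pathlib import Path


def write_sidecar(data_file: Path, sensitivity: str, consent: str, conditions: str) -> Path:
    """Write a JSON sidecar describing how the dataset may lawfully be used."""
    record = {
        "data_file": data_file.name,
        "sensitivity": sensitivity,      # hypothetical field names
        "consent": consent,
        "use_conditions": conditions,
    }
    sidecar = data_file.with_name(data_file.name + ".meta.json")
    sidecar.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return sidecar


# Demonstration in a temporary folder with an invented dataset
workdir = Path(tempfile.mkdtemp())
data_file = workdir / "interviews.csv"
data_file.write_text("participant_id,answer\nP01,yes\n", encoding="utf-8")

sidecar = write_sidecar(
    data_file,
    sensitivity="personal data",
    consent="consent given for anonymised sharing",
    conditions="do not share raw transcripts",
)
loaded = json.loads(sidecar.read_text(encoding="utf-8"))
print(sidecar.name, "->", loaded["sensitivity"])
```

Because the sidecar shares the dataset's file name, the two are likely to stay together when the folder is copied or archived.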

Q. How do I share or publish my findings for research using sensitive or confidential data?

There can be a potential conflict between abiding by data protection legislation and ethical guidelines and, at the same time, fulfilling funders' and individuals' requirements to make research results available. Ethics committees may believe that any personal or sensitive data should remain confidential. It is important therefore to distinguish between personal and more general data gathered during research. 
Personal data can be disclosed or shared if the individual has given explicit consent and specified the level at which this should be done. You should always consult with your Faculty Ethics Committee if you are unsure whether the data you wish to share or publish can be used. The University of Cambridge has an Ethics in Research website, which explains when to seek an ethics review and what body to consult. That page includes a handy Ethics Review Flow Chart, the University Guidelines on Ethics in Research, information on applying for ethical approval and information on consent forms.
In some cases, you may be able to anonymise your data in order to share and publish it in more detail. The UK Data Service provides brief Guidance on Anonymisation.
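
Removing names is often not enough to anonymise data, because combinations of other attributes can still single a person out. The toy check below counts how many records share each combination of indirect identifiers. The column names, records and threshold are invented for illustration, and a check like this is no substitute for the UK Data Service guidance or an ethics review.

```python
from collections import Counter

# Hypothetical records with the names already removed
records = [
    {"age_band": "30-39", "postcode_area": "CB1", "occupation": "nurse"},
    {"age_band": "30-39", "postcode_area": "CB1", "occupation": "nurse"},
    {"age_band": "40-49", "postcode_area": "CB2", "occupation": "teacher"},
    {"age_band": "60-69", "postcode_area": "CB4", "occupation": "mayor"},
]
QUASI_IDENTIFIERS = ("age_band", "postcode_area", "occupation")


def risky_combinations(rows, keys, k=2):
    """Return attribute combinations shared by fewer than k records."""
    counts = Counter(tuple(row[key] for key in keys) for row in rows)
    return [combo for combo, n in counts.items() if n < k]


risky = risky_combinations(records, QUASI_IDENTIFIERS)
for combo in risky:
    print("Potentially identifying:", combo)
```

Here the two nurse records share a combination, while the teacher and the mayor are each unique and could be re-identified despite having no name attached.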

Q. Does my project need a review by a university ethics board?

The University of Cambridge has an Ethics in Research web page, which explains when to seek an ethics review and what body to consult. 
That page includes a handy Ethics Review Flow Chart and the University Guidelines on Ethics in Research.

Q. Data supporting my research is personal or sensitive. How do I share these data?

If your research involves human participants, you need to consider the ethical aspects of your research carefully before the start of your project. You should address these considerations in your data management plan. In most research projects of this type, you ask your participants to fill in a consent form. If you are considering sharing data, the consent form should inform the participants about your plans for research data processing, storage and sharing. For example, you can inform your participants that anonymised data will be shared via the University of Cambridge data repository.

There is good guidance on consent forms at the UK Data Archive. The UK Data Archive also provides a sample consent form.

Further guidance on various aspects of personal and sensitive data is available.

Formats

Q. What image format should I use?

Some image formats are better for particular purposes than others. For example, TIFFs preserve digital image information well, but users cannot view them with internet browsers and they take up a lot of computer storage space. 
Click here to view pros and cons, along with uses for the most common image formats.

Q. What formats are best for storing files in the short- or medium-term?

Some of the best formats for ensuring that your data remain available in the longer term make it more difficult to extract or alter the raw data (e.g. a PDF versus a Word document). 
If you are actively working in a format that is not good for long-term accessibility (see the question on formats for preserving files in the long term), you should save a copy of your most important files in a long-lived format. You can do this either at the end of the project, or intermittently. 
If you are nearing the end of a project and don't have space to store multiple formats or all of your files (or time to convert them), pick your most vital files, and be sure to keep the version in the longer-lived format. You may have to re-format or re-copy it later, but you will have a smaller chance of losing the information altogether. 

Q. What formats are best for preserving files in the long term?

Popular formats such as those produced by Microsoft Office products (e.g. Word documents or Excel spreadsheets) are very likely to have reasonable longevity, but be aware that they are proprietary (owned by someone) and so will not necessarily exist forever or remain easily readable. It might be better to store important information in open, non-proprietary formats – for example, PDF rather than Microsoft Word, CSV rather than Excel, TIFF rather than Photoshop files, or as XML rather than a database. 
However, open formats may not support all the functionality found within a proprietary format, or they might result in larger files because they offer less efficient compression of files. Sometimes, you will want to store your data in its original format and also in a more open or accessible format for sharing, archiving, or future use.
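
As a small illustration of keeping an open-format copy, the sketch below writes tabular data to CSV with Python's standard library and reads it back. The column names and values are invented, and an in-memory buffer stands in for a real file; exporting from an actual spreadsheet would normally be done from the spreadsheet application or with a dedicated library.

```python
import csv
import io

rows = [
    {"sample": "A1", "temperature_c": "21.5"},
    {"sample": "A2", "temperature_c": "22.1"},
]

# Write the table as CSV text
buffer = io.StringIO(newline="")
writer = csv.DictWriter(buffer, fieldnames=["sample", "temperature_c"])
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()

# Read it back to confirm the values survive the round trip
restored = list(csv.DictReader(io.StringIO(csv_text, newline="")))
print(restored == rows)
```

CSV stores plain values only, so formulas, formatting and multiple sheets are not carried over. This is one reason to keep the original file alongside the open copy.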

Q. What do I need to know about JPEGs?

JPEGs use something called 'lossy' compression to keep your files from being too large. This means that every time you re-save a particular JPEG file, it will lose some information. This will make your image look blurrier and blurrier over time. So, why use JPEGs at all? Answer: JPEG compression allows you to have smaller images for purposes such as web delivery and document embedding, so these are still quite useful.

For important images, or images which you may re-use, you should always keep a master copy in a non-lossy format, such as TIFF or PNG.
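
The difference between lossless and lossy storage can be shown with a toy example. This is not JPEG's actual algorithm: it uses zlib (the DEFLATE compression that PNG also relies on) for the lossless case, and crude rounding of pixel values as a stand-in for lossy compression. The pixel values are invented.

```python
import zlib

pixels = bytes([12, 13, 15, 200, 201, 203, 90, 91])

# Lossless: compress and decompress, every value comes back unchanged
restored = zlib.decompress(zlib.compress(pixels))

# Lossy stand-in: round each value down to a multiple of 16
lossy = bytes((value // 16) * 16 for value in pixels)

print("lossless copy identical:", restored == pixels)
print("lossy copy identical:", lossy == pixels)
print("lossy values:", list(lossy))
```

Once the fine differences between neighbouring values have been rounded away, they cannot be recovered from the lossy copy, which is why a lossless master copy is worth keeping.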

Q. What are 'non-proprietary' or 'open' formats, and why would I use them?

In the simplest cases, a non-proprietary format is a format which does not have restrictions on its use and over which no one claims intellectual property rights. For example, Microsoft Office products, such as Microsoft Word, are proprietary, while Open Office products are non-proprietary (and open source). 
For long-term access to files, digital preservation experts tend to recommend 'non-proprietary' and 'open' formats. The logic here is that if the code behind the software is publicly available (i.e. open source), then that format/software will be supported so long as at least one competent tinkerer still finds it interesting or useful. 
In contrast, a private software company can go out of business or stop producing a compatible version of the software in whose format your data was saved, and no one will have the rights or knowledge to provide it anymore. 

Funder policies

Q. Where can I find more information about funders’ requirements for data sharing and help available at the University of Cambridge?

We regularly run Open Data information sessions, which provide information about funders’ requirements for data sharing and the support available at the University of Cambridge. You can register for these sessions here:

http://www.data.cam.ac.uk/info-sessions

If you are unable to come to the session, get in touch with us – we can organise a dedicated information session at your department and at your convenience.

Q. When I accepted my EPSRC award in 2010, they did not have expectations on data sharing. Do I need to share my data from that research? If I knew about these expectations, perhaps I would not have accepted the award.

Yes, you do. Since 2011 the expectation has been that publicly-funded research data will be shared, and it is vital that published research findings are by default made open to scrutiny by sharing the underpinning data on which they rely (see also the Royal Society’s report ‘Science as an open enterprise’ and the government’s ‘Open Data’ white paper, both published in 2012). The terms and conditions of Research Council awards state that they might change, so by accepting an award you also agreed that the conditions might change. If there are reasons why you are unable to share your research data, you will need to make these clear.

Q. My research is partially funded by a commercial company, which does not want research data to be released. What should I do?

Ideally (and in future collaborations) you will already have a collaboration agreement in place which identifies if any data provided by the company is confidential, and which clarifies ownership of exploitation rights to any intellectual property arising from the project. If one is not already in place you should determine with your collaborating company which data can be shared and which cannot, and take steps to reach such an agreement as soon as possible (the Research Operations Office can help with this).
You should always inform any commercial company with which you wish to collaborate that due to your public funding, you are expected to share research data as openly as possible. Effective data management planning from the outset of a proposed project will help you determine if (some) research data needs to be restricted, and to provide appropriate statements in any publications which arise explaining the reason why access to data is restricted. For more guidance on sample statements please see the FAQ "How do I link to my data? Do you have any template statements that I could adapt for my publication?".

Q. I am funded by EPSRC - can I restrict access to my data?

The EPSRC expects you to make your research data publicly available, with as few restrictions as possible. However, there are some exemptions to this. The access to the following types of data can be restricted:

  • Personal data should not be released, unless consent of the person is given; otherwise the data will need to be properly anonymised. Anonymisation can be more complex and time consuming than simply removing someone’s name, so plan ahead (guidance on personal and sensitive data is available).
  • Sensitive data (that would compromise intellectual property, or security) should only be released under carefully controlled conditions and once any necessary permissions are obtained (guidance on personal and sensitive data is available).
  • Reasonable delays/restrictions to data publishing are acceptable if necessary to protect intellectual property or commercially confidential data.
  • If data preservation is not possible or cost-effective, it is acceptable not to publish the data, as long as the ability to validate published research findings is not compromised. For example, suitably documented research methodology and initial conditions allow others in principle to produce an equivalent dataset sufficient to validate the published work.

Q. How do I inform the EPSRC about possible problems with data sharing? They don’t require a data management plan.

Even though the EPSRC does not evaluate data management plans as part of the grant application process, they are clear that all well managed publicly-funded research should include, from the outset, consideration of any potential issues with future data sharing. It is therefore good practice to prepare a data management plan. If applicable, your data management plan should also describe possible solutions to problems with data sharing. Deciding on a research data sharing strategy from the outset of your research may spare you difficulties towards the end of your project. We also encourage you to outline your data sharing intentions, and any constraints that may apply, in your grant application, as even though the EPSRC does not require your plan, this will potentially demonstrate additional value in your proposed research to those who peer review your application.

Licensing

Q. Why can’t I make my data only available to UK taxpayers? Research Councils are paid by UK taxpayers, why should other people benefit from our data sharing? It puts the UK at a competitive disadvantage.

First, researchers should always be willing to provide the evidence that substantiates their published research findings. Second, the move towards Open Science and Open Data is a global cultural change. Like the UK Research Councils, the European Commission, the NIH, the Bill & Melinda Gates Foundation and many others have policies on research data sharing. Open Science and Open Data are intended to drive innovation through knowledge exchange across the globe, and this can be achieved only if data is shared as freely as possible.

Q. Which licence should I choose for my data?

The following licences are offered for you to choose from when uploading your research data to us:

  •  CC BY
  •  CC BY-SA
  •  CC BY-ND
  •  CC BY-NC
  •  CC BY-NC-SA
  •  CC BY-NC-ND
  •  GNU GPL v3

Funders do not prescribe any particular licences for datasets, so you may decide for yourself which licence best suits your dataset. You can read more about the available licences here and there is also a nice graphic explaining the different licence types which can be viewed here. You might also find the licence selector tool useful.
Our recommended licence is CC BY. CC BY requires end users to cite your data but also allows your dataset to be re-used for multiple purposes (thus maximising the impact of your dataset and the potential number of citations).

Q. What if someone uses my data improperly?

That’s unfortunately always possible and cannot be avoided. Every time you publish, you risk being misinterpreted – that’s also true in the traditional publication process. Publishing research data underpinning your publication actually decreases the risk of your work being misinterpreted or misused, as you can be more transparent about your research findings.

Q. What can other people do with my data?

This will depend on the licence that you choose for your data. When you submit your files to the University of Cambridge data repository, we will ask you how you want to license your data. It is important that you think about this carefully, as this will determine what others can and cannot do with your data. You can read more about available licences on our website.

Q. Can you recommend a licence for my dataset?

Our recommended licence is CC BY. CC BY requires end users to cite your data but also allows your dataset to be re-used for multiple purposes (thus maximising the impact of your dataset and the potential number of citations).
Funders do not prescribe any particular licences for datasets, so you may decide for yourself which licence best suits your dataset. You can read more about the available licences here and there is also a nice graphic explaining the different licence types which can be viewed here.

Q. Can I embargo data that supports my publication (after the article is published)?

Various funders have different policies on this and you should consult the policy of your funder directly. For example, the EPSRC says that research data “is expected to be accessible online no later than the date of first online publication of the article”, whereas if you are funded by the STFC, you can embargo your research data for the period commonly accepted within your community.
There is a list of the policies for the top 20 funders to the University. If your funder is not listed there, you can try searching for the policy of your funder on Sherpa/Juliet website. If your funder’s policy is unavailable on Sherpa/Juliet, you should get in touch with your funder directly.

Management

Q. Where can I get advice on good research data management?

Information on various research data management resources and training is available here: http://www.data.cam.ac.uk/support. Additionally, we can organise a tailored session for you, to meet your needs. To arrange this, you can either e-mail us, or fill in the training request form: http://www.data.cam.ac.uk/training-request.

Q. What is research data?

Almost every funder has its own definition of research data to reflect disciplinary differences. Every research area is different: various types of data are generated or consulted, and these exist in multiple forms and formats. This means the definition of research data also differs.
A cross-disciplinary definition of research data is information that is collected or created to develop claims made in the academic literature.
This includes quantitative information or qualitative statements collected by researchers in the course of their work by experimentation, observation, interview or other methods, or information derived from existing sources. Data can be:

  • Raw or primary data (e.g. direct from measurement or collection)
  • Secondary data processed from primary data for subsequent analysis or interpretation (e.g. cleaned up or as an extract from a larger data set)
  • Derived from existing sources where the copyright can be externally held

In addition to measurements recording physical conditions/attributes, examples of data are a spreadsheet of statistics, a collection of digital images, a sound recording, transcripts of an interview, survey data and fieldwork observations with appropriate annotations, an artwork, archives, published texts, a manuscript etc.
The essence of ‘data’ in terms of open research data is that they are the information necessary to support or validate a research project’s observations, findings or outputs.

Q. What is metadata? Can you give some examples?

Metadata is the description of data. We provide detailed explanation about what metadata is here: http://www.data.cam.ac.uk/data-management-guide/organising-your-data#Metadata
Discipline-specific examples of metadata are provided by the Digital Curation Centre, and can be found here: http://www.dcc.ac.uk/resources/metadata-standards
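
As an illustration, a minimal descriptive record might look like the sketch below. The field names follow common Dublin Core elements, but the selection of fields, the list of "required" ones and all values are assumptions made for this example; use the standard recommended for your discipline.

```python
import json

metadata = {
    "title": "Example survey of laboratory temperatures",
    "creator": "A. Researcher",
    "date": "2015-06-01",
    "description": "Hourly temperature readings from three rooms.",
    "format": "text/csv",
    "rights": "CC BY 4.0",
}

# Hypothetical minimum set of fields to check before depositing
REQUIRED = ("title", "creator", "date", "description", "rights")
missing = [field for field in REQUIRED if not metadata.get(field)]

print(json.dumps(metadata, indent=2))
print("Missing fields:", missing)
```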

Q. What if my question is not addressed in FAQ?

Please e-mail us, and we will get back to you shortly.

Q. What data do I need to keep?

The fundamental principle is that published research should be open to scrutiny by others. It may help to ask yourself: ‘If I delete or don’t share this data, under any circumstances, and others question the validity of my published findings, will I have to tell them “you just have to take my word for it, I no longer have the data or refuse to share it”?’ This is clearly a situation you would wish to avoid. Therefore, it would be ideal if you could share data that is needed to validate research described in your publication, or at least describe the conditions under which you would agree to make it available to anyone wishing to test the robustness of your methodology and hence your published findings.

Q. I am funded by EPSRC - what happens if I am not compliant with EPSRC expectations?

The EPSRC began checking compliance with its expectations on research data management in 2015 by checking the availability of data underpinning research papers published after 1st May 2015, examining the following aspects:

  1. Does the published research paper include a statement describing how to access underlying data? (this has been an RCUK-wide requirement since 2013)
  2. If there is no statement – where is the data?
  3. Is the right type of data available?

Where the checks give rise to cause for concern, individual researchers will be contacted. EPSRC will also investigate any complaints about research data not being managed in line with EPSRC expectations.
EPSRC aims to embed compliance checking as part of regular grant assessment by the Research Councils Audit and Assurance Services Group (AASG). AASG might perform thorough checks on randomly selected grants for their compliance with EPSRC expectations on data sharing.

Q. Does the University recommend any Electronic Lab Notebooks?

The University does not currently recommend any particular Electronic Lab Notebook solution. However, there are many options available externally and some are being investigated.

Q. Do I need to make my data intelligible to others?

It is best practice to make your research data intelligible to others, as this facilitates data re-use. Your data needs to be sufficiently well described to allow validation of your research findings. For more information about good data description practices look here.

Ownership

Q. Who owns my dataset?

University researchers and students retain intellectual property rights where they arise, or the right to apply for such rights, from the results of activities undertaken by University staff in the course of their employment by the University and by students in the course of their study at the University in accordance with Chapter XIII of the University’s Statutes and Ordinances on Finance and Property, subsection Intellectual Property Rights - this also includes datasets.

In other words, unless the contract with your funder (or your collaborators) states otherwise, if you are the creator of your dataset, you will be the primary owner of the intellectual property rights.

Publications

Q. Where in the publication should I add the statement on data availability?

Many journals have dedicated sections on supplementary information. If your journal has a section like this, add your statement there. If your journal does not have a section on supplementary information, add a statement under a heading 'Data access', before the section where you acknowledge your funders.

Some more examples are available from Nature here.

Q. My publication has already been accepted and I did not provide a statement about data. What shall I do?

First, if you have not shared your research data yet, share it as soon as possible via a suitable data repository. We provide guidance about what to consider when looking for the most suitable data repository here: http://www.data.cam.ac.uk/repository. If you would like to share your data via the institutional repository, you can simply upload your data via Symplectic Elements. Subsequently, ask your publisher if you can add a data availability statement to your publication, which they will typically allow. We will be able to add a link to your paper in the data record in the repository.

You might find this decision tree helpful to guide you through the process.

Q. If my dataset supports a paper, does the publisher own it?

Most of the time when a publisher accepts a manuscript for a publication they will ask you to sign a copyright transfer agreement. In practice these agreements usually mean that you will transfer your copyright to the publisher and you will no longer own any copyright over the published version of your article.

However, if you submit your dataset supporting the publication to the University repository, the University is the publisher of your dataset, but NOT the publisher of your corresponding paper. This means that you can decide under what conditions to make your dataset available.

If you have any questions about data licensing, please email us.

Q. I am funded by EPSRC and I would like to publish several papers out of my data – if I release my data with the first publication, I won’t be able to publish anymore. What shall I do?

If you have a precise plan of future publications, you might indicate in your first publication that the underpinning research data will be made available for validation in the subsequent publication (to be published within xxxxx months).

Q. How do I link to my data? Do you have any template statements that I could adapt for my publication?

Yes, we provide several example statements that you might want to adapt for use in your publication. Please remember to adjust these statements to your situation.

Statements for data openly available

  • Additional data related to this publication is available at the xxxxxxx data repository (add the url to your data).
  • All data accompanying this publication are directly available within the publication.
  • Additional research data supporting this publication are available as 'supplementary files' at the journal's website (add the link to supplementary files)
  • Multiple datasets freely available at various data repositories were used in the publication. All of them are referred to in 'References' section of the paper.

Ethical constraints

  • Overall statistical analysis of research data underpinning this publication is available at the xxxxxxx data repository (add the url to your data). Additional raw data related to this publication cannot be openly released; the raw data contains transcripts of interviews, which were impossible to anonymise.
  • Processed, qualitative data from this study is available at the xxxxxxx data repository (add the url to your data). Additional raw data related to this publication cannot be openly released; the raw data contains transcripts of interviews, but none of the interviewees consented to data sharing.

IP protection/commercial data

  • Additional data related to this publication contain xxxxxxx, but cannot be released. The data contains confidential information, protected by a non-disclosure agreement (enter the agreement number if available) with (enter the company name if possible). This data can be made available, subject to a non-disclosure agreement.

Data too expensive to be shared

  • The publication is accompanied by a representative sample from the experiment (see xxxxxx - add the URL to your data). Detailed procedures explaining how this representative sample was selected, and how this experiment can be repeated, are provided in the Materials and Methods section. Additional raw data underlying this publication contain (xxxx - add number) additional sample images. These additional images are not shared online due to the size of the images (xxxxGB/image); public sharing of these images is not cost-efficient, and the experiment can be easily reproduced.

Non-digital data, or data not readily available

  • Supporting data for this publication is available at the xxxxxxx data repository (add the url to your data); however, the data is available only in a proprietary file format xxxxx (name of the file format), which can only be opened with xxxxxx software.
  • Additional supporting data to this publication contains thirteen non-digital samples of the raw materials tested. Samples are stored at a safe location at xxxxxxxxx (e.g. name your department/institution), and can be made available on request, subject to the requestor travelling to xxxxxxxx (location of the samples).

No new data generated

  • This is a review article, and therefore all data underlying this study is cited in references.

If you need any additional guidance on these statements, please get in touch with us.

Repositories

Q. What will happen to my data if it has not been accessed for 10 years?

If your data has not been accessed for 10 years, we will need to decide (in consultation with researchers) whether to continue keeping your data in the University of Cambridge data repository or not. When making the decision, we will think about the value of your data: can it be regenerated? How broad is the community of potential users of your data? How likely is it that these data will be used in the future? Is the software necessary to read and process your data still available?

Even if it is decided that your data should not be kept, the metadata record will be kept.

Q. What if I cannot pay for my data?

Recovering the costs of data storage is crucial for the sustainability of the University of Cambridge data repository. If you do not have sufficient funds left on your grant, please contact your department – your department might be able to cover the cost of sharing your research data.
If you are unable to pay for your data from your research grant and your department is unable to help, please e-mail us indicating how much data you wish to submit, and we will see if we can help. We will consider every request on a case-by-case basis.

Q. What file formats are accepted by the University of Cambridge data repository?

We will accept any file formats, but wherever possible, we encourage you to submit your data in open file formats, for example gif, png, txt, csv, and others. Open file formats make your data more accessible to others, who might not have the proprietary software to read your files. Additionally, open file formats are less likely to become obsolete in the future.
However, it might be that using proprietary file formats allows richer description or better functionality of your files. In this case, we would encourage you to submit your data in both proprietary and open file formats. Please provide information about the software required to read and process your files.
If you cannot (easily) export your data to open file formats, please submit your data to us in proprietary format, and provide information about the software that is needed to read and process your files.

Q. I have big data files - how do I upload these to the repository and share them?

You can submit your data to us via Symplectic Elements. Note that the maximum size of an individual file that can be submitted is 2GB, and the maximum size of an entire dataset is 20GB. If you would like to upload files larger than 2GB and cannot split them into smaller chunks or compress them, or if your entire dataset is larger than 20GB, please e-mail us at info@data.cam.ac.uk and we will discuss alternative options with you.
When it comes to sharing big files via the repository, remember also that the end user will need to download your files in order to re-use them. In other words, in order to be re-usable, your data needs to be downloadable. To help with this, always consider ways of reducing the size of your files (for example, by compressing them) before sharing.
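
If you would like to check a dataset against these limits before uploading, a short script can help. The Python sketch below is illustrative only: the function name check_submission, the dictionary input and the reading of 1GB as 10^9 bytes are our assumptions for this example, and the script is not part of Symplectic Elements or the repository.

```python
# Illustrative pre-upload check; not part of Symplectic Elements.
GB = 10**9                   # assumption: 1GB is treated as 10^9 bytes
MAX_FILE_BYTES = 2 * GB      # per-file limit stated in this FAQ
MAX_DATASET_BYTES = 20 * GB  # whole-dataset limit stated in this FAQ


def check_submission(file_sizes):
    """Check a mapping of file name -> size in bytes against the limits.

    Returns a tuple (oversized_files, dataset_too_large).
    """
    # Files strictly larger than 2GB need splitting or compressing.
    oversized = sorted(name for name, size in file_sizes.items()
                       if size > MAX_FILE_BYTES)
    # The dataset as a whole must not exceed 20GB.
    dataset_too_large = sum(file_sizes.values()) > MAX_DATASET_BYTES
    return oversized, dataset_too_large


if __name__ == "__main__":
    # Hypothetical file names and sizes, for illustration only.
    sizes = {"images.zip": 3 * GB, "readme.txt": 2_000}
    print(check_submission(sizes))
```

For real files, the sizes could be collected with os.path.getsize from the Python standard library. If either check fails and you cannot reduce the size, please e-mail us as described above.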

Q. How will you process the payment for data storage at the University of Cambridge data repository?

If the size of your dataset is above 20GB, we will ask you for the grant code that should be charged for your data submission. The total charge will be £4/GB multiplied by the total number of GB you are submitting to us. So, for example, if you are submitting 45GB of data, we will invoice your grant for £180. We will contact the finance manager at your Department to deal with the invoicing (we will cc you on the e-mail), so you will not need to worry about this.
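
The charge calculation can be written out as a short Python sketch for budgeting purposes. The function name storage_charge_gbp is hypothetical, and how fractions of a GB are rounded is not specified here, so the sketch simply multiplies the size it is given by the rate.

```python
RATE_GBP_PER_GB = 4     # one-off charge per GB stated in this FAQ
FREE_THRESHOLD_GB = 20  # only datasets above this size are charged


def storage_charge_gbp(dataset_gb):
    """Return the one-off charge in pounds for a dataset of the given size."""
    if dataset_gb <= FREE_THRESHOLD_GB:
        return 0
    # The rate applies to the whole dataset, not only the part above 20GB,
    # which matches the 45GB -> 180 pounds example in the text.
    return RATE_GBP_PER_GB * dataset_gb


if __name__ == "__main__":
    print(storage_charge_gbp(45))  # the 45GB example from the text
```

Please remember that the price is reviewed regularly, so treat the rate in this sketch as the figure quoted at the time of writing.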

Q. How should I refer to my data deposited at the University of Cambridge data repository?

If your data stored at the University of Cambridge data repository supports a publication, you should add a statement like this to your publication to link to your data:
"Additional data related to this publication is available at the University of Cambridge data repository (INSERT LINK: paste the DOI link to your data, which you can find together with your record at the University of Cambridge data repository)."
If you/someone else refers to your data elsewhere than in your original publication, the following convention can be used:
"Contributing Authors. Publication Year. Dataset Title [format and/or medium]. Publisher/Repository. Link to data."
Some more examples are available from Nature here.

Q. How do I submit my data to the University of Cambridge data repository?

You can submit your data to us via Symplectic Elements. We will check your data before making it live on the repository. You will immediately receive a placeholder DOI upon submission, which will resolve once the dataset has been made live in the repository. Guidance about how to upload is available as step-by-step instructions (Moodle login required) or in a video.

Q. How do I get the link to my supporting data?

As soon as you have submitted the data via Symplectic Elements you will be emailed with a placeholder DOI for the data. The DOI won't work until after the Research Data Team have reviewed your submission and made it live in the repository but you will be able to add the DOI to your publication if required.

Q. How can I cite somebody else's data?

When citing data you should include enough information so that it can be easily located and identified, ideally by using a persistent link to this data (Digital Object Identifier - DOI). You might wish to cite data using the following format:
Contributing Authors. Publication Year. Dataset Title [format and/or medium]. Publisher/Repository. DOI.
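
If you manage citations programmatically, the format above can be assembled with a few lines of Python. This is a minimal sketch: the function name format_data_citation and all example values are hypothetical placeholders, not a real dataset. The only external fact relied on is that a DOI can be turned into a link by prefixing it with https://doi.org/.

```python
def format_data_citation(authors, year, title, medium, publisher, doi):
    """Build a citation string following the convention given above."""
    # Avoid a doubled full stop when the author list ends with an initial.
    authors = authors.rstrip(".")
    # https://doi.org/ is the standard resolver prefix for DOIs.
    link = doi if doi.startswith("http") else f"https://doi.org/{doi}"
    return f"{authors}. {year}. {title} [{medium}]. {publisher}. {link}"


if __name__ == "__main__":
    # Placeholder values only; replace with the details of the real dataset.
    print(format_data_citation(
        authors="Researcher, A. and Colleague, B.",
        year=2016,
        title="Example dataset title",
        medium="dataset",
        publisher="Example Repository",
        doi="10.1234/example",
    ))
```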

Q. Do I need to pay for sharing my research data via the University data repository?

The current one-off charge for long-term curated data storage at the University of Cambridge data repository is £4/GB for datasets above 20GB. This cost should be budgeted into all future grant applications.
This charge of £4/GB will cover the cost of:

  • Storage in a curated and managed server
  • Hardware and curation
  • Providing a display and search mechanism for your data
  • Backing up your data at three different locations
  • Protection, storage and sharing of your data for as long as it is required by your funder (or for as long as your data are used by others).

Please note that this price is being regularly reviewed and might change in the future. If you have any questions, please contact us.

Q. Do I need a separate persistent link for every supporting file?

You need a link for each record in the repository. Note that every record can contain several items. Typically researchers create a separate dataset record (with a DOI assigned) for each publication.

Q. Can I submit my data to any repository I want?

You need to check if your funder mandates deposit of your data in a specific repository (you can check your funder’s policy here: http://www.data.cam.ac.uk/funders). For example, if you are funded by ESRC you should normally submit your data to the UK Data Service ReShare repository, while if you are funded by NERC you may need to deposit in one of a number of data centres provided for specific types of environmental data. Otherwise, it is up to you and your research group to choose the most appropriate repository for your data. We provide guidance about what to consider when looking for a suitable data repository, together with information on repositories available to you here: http://www.data.cam.ac.uk/repository.

Q. Can I get a persistent link for my data before the data is completely ready?

Yes. In order to do this, please submit your data via Symplectic Elements with as much information about your data as possible. If the information is not yet available, please write 'TBC'. For the 'Status' field please select 'Placeholder record'. When asked to upload data files, you can simply upload a placeholder file, like the example attached here, if your data is not ready yet. You will be sent a DOI for the record as soon as you finish the submission.
You will be able to edit the submission whilst the Status is still set to 'Placeholder record'. Once your submission has been finalised you will need to change the Status to 'Final record'. This will prompt the Research Data Team to review your submission and make it live on the repository. The DOI will resolve within 48 hours of the record going live.

Sharing

Q. What data, and at what level, needs to be shared?

As a minimum, you should share research data which is needed to validate findings described in your publication. You, the researcher, are the expert on your own research data and you are in the best position to decide which data is valuable to others, and needed to validate your findings.

For more details you should consult the policy of your funder. There is a list of the policies for the top 20 funders to the University: http://www.data.cam.ac.uk/funders. If your funder is not listed there, you can try searching for the policy of your funder on the Sherpa/Juliet website: http://www.sherpa.ac.uk/juliet/. If your funder’s policy is unavailable on Sherpa/Juliet, you should get in touch with your funder directly.

Q. If I share my data via a repository and people can simply download my data, I can no longer collaborate with them to work on the data and I have lost the possibility of getting credit for my data.

Nobody wants to prevent new collaborations from happening. A solution might be to add a statement that you are willing to collaborate in the description of your data. Your data requestor might be interested in collaborating, simply because you know your data the best. Funders also expect that the data re-used by others is appropriately acknowledged/cited, and they want to ensure that due credit results from the secondary use of data.

Q. I own my data - do I need to share it?

Yes, if the sponsor or funder of your research requires you to share your data, you have the responsibility and are obliged to do so.

If an agreement is reached with an external sponsor of research or a third party on behalf of and with the knowledge of the University staff and students, as a condition of sponsorship or research funding the University staff and students must abide by that sponsor’s or third party’s terms and conditions, inclusive of intellectual property rights, research data management and data dissemination procedures.

Q. Do I need to share data underpinning my PhD thesis?

PhD students are encouraged to share research data from their PhD research, providing that:

a). The research process is not damaged by premature and/or inappropriate release of research data. Examples might include:

- if your research is sponsored in part by an industrial collaborator and you are bound by a confidentiality agreement not to disclose some of your data they have provided to you;
- if your research relies on personal data from participants who have not consented to the release of their personal data.

b). The research data has been generated in accordance with the University’s Research Policies, the University’s Research Integrity and Ethics guidelines and in accordance with policies of research funders.

In general it is advised that supervisors are always consulted before any research data underpinning PhD research is released.

Q. Do I need to publish data underlying conference publications?

You are always encouraged to publish research data supporting your conference publications, and you are expected to publish your supporting data if the conference publication is peer-reviewed.

Q. Do I have to share all of my research data?

You should share research data which:

  • Is necessary to validate findings described in your publication;
  • Might be valuable to others;
  • Cannot be re-generated (for example, data coming from environmental observations).

Q. Could I just share research data only when asked for it?

Yes, but only provided that there are legitimate reasons why you cannot make your data openly available. A possible reason might be data containing personal/sensitive information. When data is made available via managed access (upon request), the data access controls and the criteria that must be met for access to be granted have to be made clear in the metadata description.

For more guidance on managed access to research data please see the EAGDA report on data access governance.

Q. Am I expected to share large datasets resulting from bigger projects (databases, long-term datasets) or data supporting individual publications?

Research data that supports individual publications should be made available with a hyperlink to the data. Researchers should also consider and plan more broadly how they can make data assets of value resulting from our funded research available to others in a timely and appropriate manner.

Software

Q. What reuse license should I choose for my source code?

It is important to take care when deciding on the reuse license you wish to attach to your source code as this affects the way others can use your software. The Software Sustainability Institute has created excellent guidelines on open source licenses for software.
If you share your source code via Apollo, the University of Cambridge repository, the default licenses we suggest are: the MIT License, Apache License 2.0, GNU General Public License 3.0, GNU General Public License 2.0, BSD (3-clause) License 2.0.
There are further resources and licenses suggested at the Open Source Initiative.

Q. We have used our own software to generate data, but because this software was not generated with public funding it is not publicly available and we don’t want to share it. The supporting data files cannot be opened in any other software. What shal...

The fundamental principle is that published research should be open to scrutiny by others. It may help if you ask yourself ‘If I don’t make this code available to anybody else, under any circumstances, and others question the validity of my published findings, will I have to tell them “you just have to take my word for it”?’ – clearly a situation you would wish to avoid. Therefore, it would be ideal if you could share your software, or at least describe the conditions under which you would agree to make it available to anyone wishing to test the robustness of your methodology and hence your published findings. If there are compelling reasons not to share your software, then you should at least share the supporting data files. Perhaps at some point alternative software solutions to process these files will become available.

Q. Most software cannot be patented – how shall I decide if there is a commercial potential and if I should restrict the access to my source code?

You should consider what is more economically beneficial: to share your software openly and allow people to innovate using it, or to commercialise it. You should justify your decision in the data management plan. If you are unsure about what to do, you should consult the Knowledge Transfer Facilitator at your department. The University of Cambridge has a list of Knowledge Transfer Facilitators, or you can ask Cambridge Enterprise.

Q. Do I need to share my source code?

If your source code is necessary to validate your research findings, then you are expected to share it. Please read the guidelines on software sharing published by the Software Sustainability Institute (SSI), written by Neil Chue Hong (with input from Stephen Eglen from the University of Cambridge and Ben Ryan from EPSRC).

Q. Can I use GitHub to share my source code?

GitHub’s terms of service mean that it does not meet EPSRC’s expectations as a suitable option for the long-term storage/preservation of code.
You might use GitHub as a useful service in real time for work in progress; however, it is recommended that you share your software via a repository that is more suitable for long-term preservation, for example, Zenodo. Zenodo additionally offers a GitHub plugin, which allows researchers to easily share their GitHub software via Zenodo.

Q. Can I share my source code as binary files instead of executable files?

If binary files are sufficient to validate your findings, you may share your source code as binary files.