 

This section explains the whys and hows of sharing research data. Research data are the evidence (digital or non-digital, qualitative or quantitative) that supports the answers to your research questions. These data can take many forms, such as texts, images, artworks, movies, sound recordings, interview transcripts, observations, numerical measurements, and so on.

Here, we are talking about sharing data widely, and not just with others who are connected to your research project (such as colleagues or collaborators). In this context, data sharing means making your data available to a broader audience, either by making them publicly accessible as ‘open data’ or by making them available via restricted access if ethical or legal reasons necessitate this. The latter may be the case if the data derive from human participants, are sensitive for commercial, cultural or environmental reasons, or fall under copyright restrictions.

These concepts are summarised in the Concordat on Open Research Data (2016), which states that:

“Open research data are those research data that can be freely accessed, used, modified, and shared, provided that there is appropriate acknowledgement if required;

Not all research data can be open and the concordat recognises that access may need to be managed in order to maintain confidentiality, guard against unreasonable cost, protect individuals’ privacy, respect consent terms, as well as managing security or other risks.”
 

This concordat is relevant to all members of the UK research community and outlines a set of principles and best practice expectations. It is not a rulebook; however, if your research is funded then do familiarise yourself with your funder’s data sharing policy, if they have one.

A forthcoming contribution to this data management guide will provide information specifically for research staff and students who work with sensitive data. Those conducting research involving personal data are strongly encouraged to read University of Cambridge guidance on academic research involving personal data. While most of this section focuses on open data sharing, there are principles, practices and information within that are equally relevant to those who work with data requiring access restrictions. 

In this section:

What data do I share?

Why share your data?

How to share your data?

Data access statements

Additional resources

 


What data do I share?

At this stage, you may be asking yourself the question, ‘what research datasets do I share?’ As a minimum, anticipate sharing the data that support your published findings; in other words, the raw data needed to recreate your findings (and accompanying code, if applicable). This may be quantitative data, qualitative data, images, movies, texts, and so on, that are necessary to confirm the integrity of your reported conclusions at the very least.

Alternatively, you may have a large corpus of data relating to a specific project that you wish to share, with this forming a major research output in its own right. Perhaps you have data that you wish to constitute a collection of items to be explored, cited, reused and built upon by others or yourself. 

If the data are sensitive and cannot be shared openly, consider what aspects of your dataset can be made publicly available. For example, you may have raw data comprising responses to a questionnaire. In this case, even if the raw data from the participants cannot be shared, you can still share the questionnaire and related metadata so that others can see what you did and build on your work.

It is important to consider any potential costs of sharing your data. This is where developing a data management plan early on in the research project’s lifecycle is essential, at the very least so that you can budget for any data storage, management and sharing costs in grant applications. 



Why share your data?

There are several factors worth considering. Before we explore some of these, here are a few questions to ask yourself in relation to the research project (or projects) that you have undertaken, are currently working on, or are planning to do:

  • What data do I need to preserve in the long term so that they are safe and can continue to be of value in the future?
  • How can I enhance the impact of my research through sharing data? 
  • What data does my project funder expect me to share?
  • What data does the University of Cambridge expect me to share?

Here is some information to help you to answer these or to explore further.

Ensuring long-term preservation and value

Sharing data in a suitable repository is the best way to keep data safe and accessible in the long term. This ensures that your data are accessible to both yourself and others in the years to come, and you need not worry about your data becoming lost or destroyed. Sharing data via project websites or your article’s supplementary information is not recommended for long-term data preservation. For how to share your data in a repository and what this entails, see How to share your data. If you are unable to share your data (e.g. for ethical or legal reasons), or if it does not make sense (or is unfeasible) to deposit all of your research data in a repository, then you will still need to take steps to:

Broadening impact

Sharing the data underlying your findings enables your research to be built upon, giving your data greater intrinsic value and avoiding unnecessary research duplication. Furthermore, access to datasets for secondary use can benefit those researchers who otherwise lack the resources to meet the costs of collecting new data. Data that are made publicly available can have an impact beyond academic contexts, for example through use in education, policy, media or other innovative ways. Find out about cases where University of Cambridge researchers have had their data reused in these blog posts on the Mammographic Image Society database and open data sharing and reuse.

There is also the advantage that a dataset published in a repository with a permanent identifier (such as a DOI) is both citeable and trackable, making it possible to see how a dataset has been used. In fact, there is evidence to indicate that articles that link to their accompanying datasets in a repository are more likely to be cited. For example, Colavizza and colleagues (2020) demonstrated that articles that include in the data access statement a link (e.g. a DOI) to the accompanying data in a repository are associated with a citation advantage of up to c.25%. This supports the suggestion that making research data openly available has the potential to enhance a researcher’s (or research group’s) visibility, networks and collaborations. Although the journey to recognise open research practices is in the early stages, it is perhaps wise to start this journey sooner rather than later. 

Last but not least, inherent in sharing data supporting research findings and, importantly, in sharing it well, are reputational advantages – it shows academic rigour and integrity, supporting research reproducibility (if applicable to the discipline) and research transparency (applicable to all disciplines). 

Funder requirements 

If your research is funded then it is important to know if your funder has a policy on data sharing, and if they do, what you are expected to do. You can search your funder’s webpages for this information, or look in Sherpa Juliet, but to help you, we provide a list of major funders’ data sharing requirements. The underlying premise of many funders is that publicly funded research is for the public good and therefore research data should be shared openly, where possible (i.e. where there are no ethical or legal reasons to prevent public release of the data). There are also economic advantages – research offers better value for money if all related outputs are shared so that they can be built upon by future research. Most major funders expect researchers in receipt of their grants to share the raw data in support of published findings, not only as a means to make data available for future research but also to give credence to research findings – in other words, the integrity of the research is enhanced via open and transparent practices.

University of Cambridge expectations 

The University of Cambridge Research Data Management Policy Framework outlines the responsibilities of research staff and students when it comes to both research data management and data sharing. Related to this is the University of Cambridge Open Research Position Statement. It is important that all are aware of the content of the policy framework and the position statement, regardless of whether or not the research is funded, or funded by a funder with a specific data sharing policy. The data sharing expectations of the University are the same as those of most major funders: data are to be made as open as possible and as closed as necessary. The same principles apply to additional output types, such as protocols, software and code. Of significance here is the San Francisco Declaration on Research Assessment (DORA), which the University of Cambridge signed in 2019. This shows a commitment to DORA’s principles, one of which is to recognise all research outputs for their value and impact, such as datasets and software, so that recognition does not rest solely on research publications.



How to share your data?

Take these quick steps:

  1. Ensure your dataset adheres to the FAIR principles
  2. Deposit your dataset in an appropriate data repository (this will help you to meet some of the FAIR principles)
  3. Include a Data Access Statement in your publication and cite your dataset in your publication’s reference list

If you have followed good research data management throughout the course of a research project then your data will be well-organised, clearly documented and exist in open (non-proprietary) or common file formats. This will make sharing your data relatively easy when the time comes, although it is never too late to implement good data management. In order to share your data, best practice is to deposit your data in an appropriate data repository. Even if there are valid reasons to restrict access to some data (e.g. for data protection, ethical or legal reasons), associated metadata, other documentation and perhaps a subset of the dataset (i.e. data requiring no restrictions) can still be deposited.

You may be asking ‘What exactly is a data repository and how do I choose one for my data?’. If so, go to our Repository page, where you will find advice on how to choose an appropriate repository. The University of Cambridge has its own institutional repository for Cambridge staff and students, called Apollo, and we provide instructions on how to deposit your dataset in Apollo.

The FAIR Principles

Sharing data in a repository is the best way to meet the FAIR principles. Applying the FAIR principles to your research data means making your data Findable, Accessible, Interoperable and Reusable. The FAIR principles can seem science-focused but they are equally relevant to arts, humanities and social sciences research. The following figure provides a simplified view of what the FAIR principles mean and what you need to do to meet them.

[Figure: FAIR data for researchers]

While the FAIR principles should be put into practice throughout the lifecycle of a research project, they are particularly relevant when it comes to sharing your data. It’s important to realise that all datasets can be made FAIR, including those that cannot be shared publicly as open data. The FAIR principles apply equally to datasets requiring restricted access (e.g. for ethical/legal reasons) – in these instances, metadata can be shared even if the dataset cannot, and access conditions should be clearly outlined (see data access statements).

The table below describes six steps to take during the course of your research and when sharing your data in order to make your data FAIR. Putting your data in an appropriate repository can go a long way toward helping you to do this. 

| Six steps to FAIR data | What the researcher needs to do | What the repository will do |
| --- | --- | --- |
| Documentation | Document your data by providing any information necessary for the data to be understood by others, or by you in future years. This may include providing code that accompanies your data. You know your data best and are responsible for documenting your data. Create Readme files to describe your data (Readme template). | Some repositories curate dataset submissions and will ask you for additional data documentation if deemed necessary. |
| Metadata | Ensure metadata is provided with your dataset so that the data are easier to discover, understand and reuse. Recording metadata is part of the data documentation process. You may wish to explore metadata standards for your discipline. | Most repositories will collect additional standard metadata as part of the data submission process; some of this you will need to supply when you deposit your data (e.g. by filling in a form) and some will be automatically generated by the repository. |
| File formats | Save your data files in appropriate formats (open or common file formats) so that your data can be used easily by others, integrated with other datasets, are easily machine-readable and are accessible in the long term by you and others. You are responsible for providing your data in common/open formats. | Some repositories curate submissions and may request that data are provided in open formats. This is important for long-term data preservation. |
| Data access | Make your data accessible by depositing your data as open data in a trusted repository so that anyone with an internet connection will be able to access the data. If access needs to be restricted, be clear to what extent and how your data can be accessed by others. Include this information in a metadata-only repository record and/or in the associated publication’s Data Access Statement. | Some repositories provide managed access to datasets containing sensitive information (e.g. UKDS) where access to data needs to be authorised. |
| Link to the data | Obtain a link to your dataset and cite it. Deposit the data in an appropriate repository to ensure that there is a permanent link to your dataset. Ensure you choose a repository that provides persistent identifiers (e.g. DOI). Cite the dataset plus its link in the Data Access Statement and in the manuscript's reference list. | Persistent dataset identifiers should be provided by the repository, which you can cite in research outputs. |
| Licence | When making your data openly available, choose a licence for your dataset that indicates how it can be reused by others. | Once you’ve chosen the licence, the repository should handle the rest by including the licence in the metadata. |

 

Taking these six steps helps to avoid the problems so aptly revealed in this data management horror story created by NYU Health Science Libraries.
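As an illustration of the documentation and file format steps, the following is a minimal Python sketch, not an official tool or template. The set of "open or common" file extensions, the README fields and the example file names are all assumptions made for illustration; adapt them to your discipline and to the requirements of your chosen repository.

```python
from pathlib import Path

# Assumed, illustrative set of open or common file extensions.
# This is not an official list; adjust it for your discipline.
OPEN_OR_COMMON_SUFFIXES = {
    ".csv", ".txt", ".json", ".xml", ".pdf", ".png", ".tif", ".tiff", ".md",
}


def check_formats(filenames):
    """Return the filenames whose extension is not in the assumed open/common set."""
    return [
        name for name in filenames
        if Path(name).suffix.lower() not in OPEN_OR_COMMON_SUFFIXES
    ]


def build_readme(title, creator, description, filenames, licence):
    """Build the text of a minimal README describing a dataset (hypothetical layout)."""
    lines = [
        f"Title: {title}",
        f"Creator: {creator}",
        f"Licence: {licence}",
        "",
        "Description:",
        description,
        "",
        "Files:",
    ]
    lines += [f"  - {name}" for name in sorted(filenames)]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical file names used only for this example.
    files = ["survey_responses.csv", "codebook.txt", "analysis.sav"]
    print("Consider converting:", check_formats(files))
    print(build_readme(
        "Example survey dataset",
        "A. Researcher",
        "Anonymised questionnaire responses.",
        files,
        "CC BY 4.0",
    ))
```

Files flagged by `check_formats` are only candidates for conversion; some proprietary formats may still be the most appropriate choice, in which case the README should say which software is needed to open them.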

 

How can I make it easier for others to re-use the materials that I produce?

One relatively simple way to make it easier for others to re-use tools, data or other content that you produce is to add a Creative Commons licence. For example ‘By-Attribution, Non-Commercial’ (CC BY-NC) is a common Creative Commons licence – when you mark your file, image, or information with this, it means that anyone can use your information in any way they like, so long as they attribute it to you and don’t use it for commercial purposes. There are other types of Creative Commons licences, including those that allow commercial use and those that do not require the re-user to attribute the creator. Creative Commons licences are often used for materials released online, but you can also include these in printed materials if your publisher does not own the rights. For additional information about Creative Commons licence options, visit the Creative Commons website.

To license something with a Creative Commons licence, you don't need to file any paperwork – just publish (in print or on the web) your materials along with a notification that you are using a particular licence.
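As a small illustration of such a notification, the Python sketch below builds a one-line licence notice that could be placed in a README or on a web page. The function name and the wording of the notice are hypothetical, and only a few Creative Commons 4.0 licences are included; check the Creative Commons website for the licence that suits your materials.

```python
# Deed URLs for a few Creative Commons 4.0 licences (not an exhaustive list).
CC_LICENCES = {
    "CC BY 4.0": "https://creativecommons.org/licenses/by/4.0/",
    "CC BY-NC 4.0": "https://creativecommons.org/licenses/by-nc/4.0/",
    "CC BY-SA 4.0": "https://creativecommons.org/licenses/by-sa/4.0/",
}


def licence_notice(work_title, creator, year, licence):
    """Return a one-line licence notice (hypothetical wording) for a work."""
    if licence not in CC_LICENCES:
        raise ValueError(f"Unknown licence in this sketch: {licence!r}")
    url = CC_LICENCES[licence]
    return f'"{work_title}" © {year} {creator}. Licensed under {licence} ({url}).'


if __name__ == "__main__":
    # Hypothetical title and creator used only for this example.
    print(licence_notice("Example survey dataset", "A. Researcher", 2024, "CC BY-NC 4.0"))
```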

IMPORTANT NOTE: Creative Commons licences are 'irrevocable' so do not add a Creative Commons licence unless you are sure that:

  1. You have the right to publish this information
  2. You will not want to revoke it later on for any reason.

If you have plans to commercialise your research then we recommend that you contact Cambridge Enterprise for advice before applying a licence to your data, software or code.  

Find out more: see our FAQs on licensing data and licensing software.



Data Access Statements

Data access (or availability) statements are applicable to research in all disciplines – arts, humanities, social sciences and STEMM subjects. These are statements included in a section of a publication that specify where the research data (e.g. code, software, numerical data, qualitative data, textual records, images, sounds, objects, manuscripts) associated with the paper can be found and how they can be accessed. 

Many publishers will ask authors to include a data access statement in their paper, whereas some publishers do not actively promote this or have processes in place yet to incorporate these in articles. The University of Cambridge Research Data Management policy framework also states that research staff and students are responsible for “Providing a statement in research articles describing how and on what terms any supporting research data may be accessed (or a statement that all data is contained within the article, if there is no supporting research data)”. 

Those who are funded by UKRI or any of its councils must include a data access statement in their papers. This applies to all articles, even those where there are no data associated with the article or the data cannot be accessed (e.g. because they are commercially sensitive or contain personal or sensitive human participant data). Reasons for limiting access to data must be provided in the statement. If applicable, the data access statement should include a link to the data (e.g. a DOI for the dataset in a repository). This has been a requirement for all UKRI grant holders since the new UKRI Open Access policy came into effect on 1 April 2022. For more information, see Cambridge’s Open Access website for details of the new UKRI requirements.

To help all research staff and students, regardless of funding held or academic discipline, the examples below show data access statements that you might want to use and tailor for your publication (areas for tailoring are marked with xxxxx or noted in parentheses). You can also find additional information in the ‘Publications’ section of our FAQs page. We strongly recommend that you include the DOI of your dataset in your data access statement and the full citation for your dataset in your publication’s reference list. Try inputting your dataset DOI into DataCite search (e.g. 10.17863/CAM.xxxxx) and looking under ‘Cite’ for a formatted dataset citation to insert into your publication.

Openly available data
  • Additional data related to this publication are available at the xxxxx data repository at the following link (add the url to your data - preferably this will be a permanent identifier such as a DOI).
  • All data accompanying this publication are available within the publication.
  • Additional research data supporting this publication are available as 'supplementary files' at the journal's website (add the link to supplementary files).
  • Multiple datasets freely available at various data repositories were used in the publication. All of them are referred to in the 'References' section of the paper.
Ethical constraints
  • Overall statistical analysis of research data underpinning this publication is available at the xxxxx data repository (add the url to your data, e.g. the DOI). Additional raw data related to this publication cannot be openly released; the raw data contain transcripts of interviews, which were impossible to anonymise.
  • Processed, qualitative data from this study are available at the xxxxx data repository (add the url to your data, e.g. the DOI). Additional raw data related to this publication cannot be openly released; the raw data contain transcripts of interviews, but none of the interviewees consented to data sharing.
IP protection or commercial data
  • Additional data related to this publication contain xxxxx (describe the data), but these data cannot be released publicly. The data contain confidential information, protected by a non-disclosure agreement (enter the agreement number if available) with (enter the company name if possible). These data can be made available, subject to a non-disclosure agreement.
Rights held elsewhere prevent data sharing
  • The raw data used in this study cannot be shared as the copyright is held elsewhere (e.g. of digital images, maps, texts, audio-visual material). Instead, metadata describing the raw data have been created as part of this study and are available at the xxxxx data repository at the following link (add the url to your data - preferably this will be a permanent identifier such as a DOI).
  • This study involved the secondary analysis of pre-existing datasets. All datasets are referred to in the text and cited in the reference list. Licensing restrictions prevent sharing of these data in association with this study. 
Data too expensive to be shared
  • The publication is accompanied by a representative sample from the experiment (see xxxxx - add the url to your data - preferably this will be a permanent identifier such as a DOI). Detailed procedures explaining how this representative sample was selected, and how this experiment can be repeated, are provided in xxxxx (e.g. Materials and Methods section, publicly available protocol [include link to protocol, e.g. DOI]). Additional raw data underlying this publication consist of (describe the data that is too sizeable to share). These additional files are not shared online due to their size (e.g. sample images of xxxGB/image); public sharing of these images is not cost-efficient, and the experiment can be easily reproduced. 
Non-digital data or data not readily available
  • Supporting data for this publication are available at the xxxxx data repository (add the url to your data - preferably this will be a permanent identifier such as a DOI); however, data are available only in a proprietary file format xxxxx (name of the file format), which can be opened only with xxxxx software.
  • Additional supporting data for this publication consist of xxxxx (describe the number of samples and what they consist of). Samples are stored at a safe location at xxxxx (e.g. name your department/institution), and can be made available on request, subject to the requestor travelling to xxxxx (location of the samples).
  • This study involved examination of physical records (e.g. of museum objects) held at (e.g. name of museum/institute). Relevant accession/reference numbers and associated information are provided in documentation archived at the xxxxx data repository at the following link (add the url to your data - preferably this will be a permanent identifier such as a DOI).
No new data generated
  • This is a review article and generated no new data. All data underlying this study are cited in the references.
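
If you prepare statements for several publications, a template like the first 'Openly available data' example above can be filled in programmatically. The Python sketch below is an illustration only: the function names are hypothetical, the DOI pattern is a loose check of the common "10.registrant/suffix" shape rather than a full validation, and the DOI shown is the placeholder used earlier in this section, not a real dataset.

```python
import re

# Loose check for the common "10.<registrant>/<suffix>" DOI shape.
# This is an assumption for illustration, not a complete DOI validator.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def doi_to_url(doi):
    """Turn a bare DOI, a 'doi:' form or a doi.org link into an https://doi.org/ URL."""
    cleaned = re.sub(
        r"^(https?://(dx\.)?doi\.org/|doi:)", "", doi.strip(), flags=re.IGNORECASE
    )
    if not DOI_PATTERN.match(cleaned):
        raise ValueError(f"Does not look like a DOI: {doi!r}")
    return f"https://doi.org/{cleaned}"


def open_data_statement(repository, doi):
    """Fill in the 'openly available data' example statement with a repository and DOI link."""
    return (
        "Additional data related to this publication are available at the "
        f"{repository} data repository at the following link: {doi_to_url(doi)}"
    )


if __name__ == "__main__":
    # Placeholder DOI taken from the guidance text; replace with your dataset's DOI.
    print(open_data_statement("Apollo", "doi:10.17863/CAM.xxxxx"))
```

Writing the link in the `https://doi.org/` form keeps it resolvable in a browser, while the DOI itself stays the same wherever the dataset is cited.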



Additional resources

If you want to further explore any of the issues raised in this section of the data management guide, then you might find the following useful:
