This section explains the whys and hows of sharing research data. Research data are the evidence (digital or non-digital, qualitative or quantitative) that supports the answers to your research questions. These data can take many forms, such as texts, images, artworks, movies, sound recordings, interview transcripts, observations, numerical measurements, and so on.
Here, we are talking about sharing data widely, and not just with others who are connected to your research project (such as colleagues or collaborators). In this context, data sharing means making your data available to a broader audience, either by making it publicly accessible as ‘open data’ or available via restricted access if ethical or legal reasons necessitate this. The latter may be the case if the data derive from human participants, are sensitive for commercial, cultural or environmental reasons, or if the data fall under copyright restrictions.
These concepts are summarised in the Concordat on Open Research Data (2016), which states that:
“Open research data are those research data that can be freely accessed, used, modified, and shared, provided that there is appropriate acknowledgement if required;
Not all research data can be open and the concordat recognises that access may need to be managed in order to maintain confidentiality, guard against unreasonable cost, protect individuals’ privacy, respect consent terms, as well as managing security or other risks.”
This concordat is relevant to all members of the UK research community and outlines a set of principles and best practice expectations. It is not a rulebook; however, if your research is funded then do familiarise yourself with your funder’s data sharing policy, if they have one.
A forthcoming contribution to this data management guide will provide information specifically for research staff and students who work with sensitive data. Those conducting research involving personal data are strongly encouraged to read University of Cambridge guidance on academic research involving personal data. While most of this section focuses on open data sharing, there are principles, practices and information within that are equally relevant to those who work with data requiring access restrictions.
What data do I share?
At this stage, you may be asking yourself the question, ‘what research datasets do I share?’ As a minimum, anticipate sharing the data that support your published findings; in other words, the raw data needed to recreate your findings (and accompanying code, if applicable). These may be quantitative data, qualitative data, images, movies, texts, and so on: whatever is necessary to confirm the integrity of your reported conclusions.
Alternatively, you may have a large corpus of data relating to a specific project that you wish to share, with this forming a major research output in its own right. Perhaps you have data that you wish to constitute a collection of items to be explored, cited, reused and built upon by others or yourself.
If the data are sensitive and cannot be shared openly, consider what aspects of your dataset can be made publicly available. For example, you may have raw data comprising responses to a questionnaire. In this case, even if the raw data from the participants cannot be shared, you can still share the questionnaire and related metadata so that others can see what you did and build on your work.
It is important to consider any potential costs of sharing your data. This is where developing a data management plan early on in the research project’s lifecycle is essential, at the very least so that you can budget for any data storage, management and sharing costs in grant applications.
Why share your data?
There are several factors worth considering. Before we explore some of these, here are a few questions to ask yourself in relation to the research project (or projects) that you have undertaken, are currently working on, or are planning to do:
- What data do I need to preserve in the long term so that it is safe and can continue to be of value in the future?
- How can I enhance the impact of my research through sharing data?
- What data does my project funder expect me to share?
- What data does the University of Cambridge expect me to share?
Here is some information to help you to answer these or to explore further.
Ensuring long-term preservation and value
Sharing data in a suitable repository is the best way to keep data safe and accessible in the long term. It ensures that your data are accessible to both yourself and others in the years to come, and you need not worry about your data becoming lost or destroyed. Sharing data via project websites or your article’s supplementary information is not recommended for long-term data preservation. For how to share your data in a repository and what this entails, see How to share your data. If you are unable to share your data (e.g. for ethical or legal reasons), or if it does not make sense (or is unfeasible) to deposit all of your research data in a repository, then you will still need to take steps to:
Broadening impact
Sharing the data underlying your findings enables your research to be built upon, giving your data greater intrinsic value and avoiding unnecessary research duplication. Furthermore, access to datasets for secondary use can benefit those researchers who otherwise lack the resources to meet the costs of collecting new data. Data that are made publicly available can also have an impact beyond academic contexts, for example in education, policy, the media or other innovative uses. Find out about cases where University of Cambridge researchers have had their data reused in these blog posts on the Mammographic Image Analysis Society database and open data sharing and reuse.
There is also the advantage that a dataset published in a repository with a permanent identifier (such as a DOI) is both citable and trackable, making it possible to see how a dataset has been used. In fact, there is evidence to indicate that articles that link to their accompanying datasets in a repository are more likely to be cited. For example, Colavizza and colleagues (2020) demonstrated that articles that include in the data access statement a link (e.g. a DOI) to the accompanying data in a repository are associated with a citation advantage of up to c.25%. This supports the suggestion that making research data openly available has the potential to enhance a researcher’s (or research group’s) visibility, networks and collaborations. Although the journey to recognise open research practices is in the early stages, it is perhaps wise to start this journey sooner rather than later.
Last but not least, sharing the data that support your research findings and, importantly, sharing them well, brings reputational advantages: it shows academic rigour and integrity, and it supports research reproducibility (if applicable to the discipline) and research transparency (applicable to all disciplines).
Funder requirements
If your research is funded then it is important to know if your funder has a policy on data sharing, and if they do, what you are expected to do. You can search on your funder’s webpages for this information, or look in Sherpa Juliet, but to help you, we provide a list of the data sharing requirements of a number of major funders. The underlying premise of many funders is that publicly funded research is for the public good and therefore research data should be shared openly, where possible (i.e. where there are no ethical or legal reasons to prevent public release of the data). There are also economic advantages – research has better value for money if all related outputs are shared so that they can be built upon by future research. Most major funders expect researchers in receipt of their grants to share the raw data in support of published findings, not only as a means to make data available for future research but also to give credence to research findings – in other words, the integrity of the research is enhanced via open and transparent practices.
University of Cambridge expectations
The University of Cambridge Research Data Management Policy Framework outlines the responsibilities of research staff and students when it comes to both research data management and data sharing. Related to this is the University of Cambridge Open Research Position Statement. It is important that all are aware of the content of the policy framework and the position statement, regardless of whether or not the research is funded, or funded by a funder with a specific data sharing policy. The data sharing expectations of the University are the same as those of most major funders: data are to be made as open as possible and as closed as necessary. The same principles apply to additional output types, such as protocols, software and code. Of significance here is the San Francisco Declaration on Research Assessment (DORA), which the University of Cambridge signed in 2019. This shows a commitment to DORA’s principles, one of which is to recognise all research outputs for their value and impact, such as datasets and software, so that recognition does not rest solely on research publications.
How to share your data
Take these quick steps:
- Ensure your dataset adheres to the FAIR principles
- Deposit your dataset in an appropriate data repository (this will help you to meet some of the FAIR principles)
- Include a Data Access Statement in your publication and cite your dataset in your publication’s reference list
If you have followed good research data management throughout the course of a research project then your data will be well-organised, clearly documented and exist in open (non-proprietary) or common file formats. This will make sharing your data relatively easy when the time comes, although it is never too late to implement good data management. In order to share your data, best practice is to deposit your data in an appropriate data repository. Even if there are valid reasons to restrict access to some data (e.g. for data protection, ethical or legal reasons), the associated metadata, other documentation and perhaps a subset of the dataset (i.e. data requiring no restrictions) can still be deposited.
You may be asking ‘What exactly is a data repository and how do I choose one for my data?’. If so, go to our Repository page, where you will find advice on how to choose an appropriate repository. The University of Cambridge has its own institutional repository for Cambridge staff and students, called Apollo, and we provide instructions on how to deposit your dataset in Apollo.
The FAIR Principles
Sharing data in a repository is the best way to meet the FAIR principles. Applying the FAIR principles to your research data means making your data Findable, Accessible, Interoperable and Reusable. The FAIR principles can seem science-focused but they are equally relevant to arts, humanities and social sciences research. The following figure provides a simplified view of what the FAIR principles mean and what you need to do to meet them.
While the FAIR principles should be put into practice throughout the lifecycle of a research project, they are particularly relevant when it comes to sharing your data. It’s important to realise that all datasets can be made FAIR, including those that cannot be shared publicly as open data. The FAIR principles apply equally to datasets requiring restricted access (e.g. for ethical/legal reasons) – in these instances, metadata can be shared even if the dataset cannot, and access conditions should be clearly outlined (see data access statements).
The table below describes six steps to take during the course of your research and when sharing your data in order to make your data FAIR. Putting your data in an appropriate repository can go a long way toward helping you to do this.
Six steps to FAIR data | What the researcher needs to do | What the repository will do |
---|---|---|
Documentation | Document your data by providing any information necessary for the data to be understood by others, or by you in future years. This may include providing code that accompanies your data. You know your data best and are responsible for documenting your data. Create Readme files to describe your data (Readme template). | Some repositories curate dataset submissions and will ask you for additional data documentation if deemed necessary. |
Metadata | Ensure metadata is provided with your dataset so that the data are easier to discover, understand and reuse. Recording metadata is part of the data documentation process. You may wish to explore metadata standards for your discipline. | Most repositories will collect additional standard metadata as part of the data submission process. Some of these you will need to supply when you deposit your data (e.g. by filling in a form) and some will be automatically generated by the repository. |
File formats | Save your data files in appropriate formats (open or common file formats) so that your data can be used easily by others, integrated with other datasets and read by machines, and remain accessible in the long term to you and others. | You are responsible for providing your data in common/open formats. Some repositories curate submissions and may request that data are provided in open formats. This is important for long-term data preservation. |
Data access | Make your data accessible by depositing your data as open data in a trusted repository so that anyone with an internet connection will be able to access the data. If access needs to be restricted, be clear to what extent and how your data can be accessed by others. Include this information in a metadata-only repository record and/or in the associated publication’s Data Access Statement. | Some repositories provide managed access to datasets containing sensitive information (e.g. UKDS) where access to data needs to be authorised. |
Link to the data | Obtain a link to your dataset and cite it. Deposit the data in an appropriate repository to ensure that there is a permanent link to your dataset. Ensure you choose a repository that provides persistent identifiers (e.g. DOI). Cite the dataset plus its link in the Data Access Statement and in the manuscript's reference list. | Persistent dataset identifiers should be provided by the repository, which you can cite in research outputs. |
Licence | When making your data openly available, choose a licence for your dataset that indicates how it can be reused by others. | Once you’ve chosen the licence, the repository should handle the rest by including the licence in the metadata. |
Taking these six steps helps to avoid the problems so aptly revealed in this data management horror story created by NYU Health Science Libraries.
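As an illustration of the documentation and file format steps, here is a minimal Python sketch that drafts a Readme for a folder of data files. The function names, the Readme fields and the short list of ‘open or common’ file extensions are assumptions made for this example, not a University of Cambridge or repository standard; adapt them to the Readme template and the formats used in your discipline.

```python
import hashlib
from pathlib import Path

# Illustrative only: a short, hypothetical list of open or common extensions.
OPEN_OR_COMMON = {".csv", ".txt", ".json", ".xml", ".pdf", ".png", ".tif", ".tiff"}


def sha256_of(path: Path) -> str:
    """Return the SHA-256 checksum of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_readme(folder, title, creator, licence, description) -> str:
    """Draft Readme text: basic metadata followed by a file inventory."""
    folder = Path(folder)
    lines = [
        f"Title: {title}",
        f"Creator: {creator}",
        f"Licence: {licence}",
        "",
        "Description:",
        description,
        "",
        "Files:",
    ]
    for path in sorted(p for p in folder.rglob("*") if p.is_file()):
        if path.name.lower() == "readme.txt":
            continue  # do not list an existing Readme in its own inventory
        # Flag files whose extension is not in the illustrative list above.
        flag = ""
        if path.suffix.lower() not in OPEN_OR_COMMON:
            flag = "  [check: consider an open or common format]"
        lines.append(
            f"- {path.relative_to(folder).as_posix()} "
            f"({path.stat().st_size} bytes, sha256 {sha256_of(path)}){flag}"
        )
    return "\n".join(lines) + "\n"
```

You could call `build_readme("my_dataset", "Survey responses", "A. Researcher", "CC BY 4.0", "What the data are and how they were collected.")` and save the result as `README.txt` alongside your files. The checksums let others confirm that the files they download match the ones you deposited; the description itself still needs to be written by you, since you know your data best.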
How can I make it easier for others to re-use the materials that I produce?
One relatively simple way to make it easier for others to re-use tools, data or other content that you produce is to add a Creative Commons licence. For example, ‘By-Attribution, Non-Commercial’ (CC BY-NC) is a common Creative Commons licence: when you mark your file, image, or information with this, it means that anyone can use your information in any way they like, so long as they attribute it to you and don’t use it for commercial purposes. There are other types of Creative Commons licences, including licences that allow commercial use and licences that do not require the re-user to attribute the creator. Creative Commons licences are often used for materials released online, but you can also include these in printed materials if your publisher does not own the rights. For additional information about Creative Commons licence options, visit the Creative Commons website.
To license something with a Creative Commons licence, you don't need to file any paperwork – just publish (in print or on the web) your materials along with a notification that you are using a particular licence.
IMPORTANT NOTE: Creative Commons licences are 'irrevocable' so do not add a Creative Commons licence unless you are sure that:
- You have the right to publish this information
- You will not want to revoke it later on for any reason.
If you have plans to commercialise your research then we recommend that you contact Cambridge Enterprise for advice before applying a licence to your data, software or code.
Find out more: see our FAQs on licensing data and licensing software.
Data Access Statements
Data access (or availability) statements are applicable to research in all disciplines – arts, humanities, social sciences and STEMM subjects. These are statements included in a section of a publication that specify where the research data (e.g. code, software, numerical data, qualitative data, textual records, images, sounds, objects, manuscripts) associated with the paper can be found and how they can be accessed.
Many publishers will ask authors to include a data access statement in their paper, whereas some publishers do not actively promote this or have processes in place yet to incorporate these in articles. The University of Cambridge Research Data Management policy framework also states that research staff and students are responsible for “Providing a statement in research articles describing how and on what terms any supporting research data may be accessed (or a statement that all data is contained within the article, if there is no supporting research data)”.
Those who are funded by UKRI or any of its councils must include a data access statement in their papers. This has been mandatory for all UKRI grant holders since the new UKRI Open Access policy came into effect on 1 April 2022, and it applies to all articles, even those where there are no data associated with the article or the data cannot be accessed (e.g. because there are commercially sensitive or personal/sensitive human participant data). Reasons for limiting access to data must be provided in the statement. If applicable, the data access statement should include a link to the data (e.g. a DOI for the dataset in a repository). For more information, see Cambridge’s Open Access website for details of the new UKRI requirements.
To help all research staff and students, regardless of funding held or academic discipline, the table below gives some examples of data access statements that you might want to use and tailor for your publication (areas for tailoring are noted in italics). You can also find additional information in the ‘Publications’ section of our FAQs page. We strongly recommend that you include the DOI of your dataset in your data access statement and the full citation for your dataset in your publication’s reference list. Try entering your dataset DOI (e.g. 10.17863/CAM.xxxxx) into DataCite search and looking under ‘Cite’ for a formatted dataset citation to insert into your publication.
Scenario | Example data access statement |
---|---|
Openly available data | |
Ethical constraints | |
IP protection or commercial data | |
Rights held elsewhere prevent data sharing | |
Data too expensive to be shared | |
Non-digital data or data not readily available | |
No new data generated | |
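If you prefer to retrieve a formatted dataset citation programmatically, one option is DOI content negotiation, a service of the doi.org resolver that returns a formatted citation when a request asks for the `text/x-bibliography` media type. The following is a minimal Python sketch, not a University of Cambridge tool: the function names are hypothetical, the citation style `apa` is just one example, and the DOI shown is the placeholder used above, so it will not resolve until you replace it with your own.

```python
from urllib.parse import quote
from urllib.request import Request, urlopen

DOI_RESOLVER = "https://doi.org/"


def normalise_doi(doi: str) -> str:
    """Strip whitespace and a leading resolver URL or 'doi:' prefix, if present."""
    doi = doi.strip()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    return doi


def citation_request(doi: str, style: str = "apa") -> Request:
    """Build (but do not send) a request for a formatted citation."""
    url = DOI_RESOLVER + quote(normalise_doi(doi), safe="/")
    accept = f"text/x-bibliography; style={style}"
    return Request(url, headers={"Accept": accept})


def fetch_citation(doi: str, style: str = "apa", timeout: float = 10.0) -> str:
    """Send the request and return the citation text (needs internet access)."""
    with urlopen(citation_request(doi, style), timeout=timeout) as response:
        return response.read().decode("utf-8").strip()
```

Calling `fetch_citation("10.17863/CAM.xxxxx")` with your real dataset DOI should return a citation string that you can check and paste into your reference list. Always compare the result against your publisher’s citation style, as the automatically generated text may need tailoring.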
Additional resources
If you want to further explore any of the issues raised in this section of the data management guide, then you might find the following useful:
- University of Cambridge Open Research website
- Posts with recordings from Unlocking Research, the Open Research at Cambridge blog:
  - Open Research at Cambridge – conference opening session (with talks by Professor Anne Ferguson-Smith, Professor Steve Russell, Mandy Hill and Dr Neal Spencer)
  - Practical steps toward more reproducible research
  - Open data sharing and reuse
  - Who are the winners and losers of good data practices?
  - Who is reusing data? Successes and future trends
- For more information on the FAIR data principles:
  - OpenAIRE provide a useful guide for researchers on the FAIR principles
  - ‘How FAIR are your data?’ checklist – use this to assess the extent to which your data meet the FAIR principles and to identify areas that may need attention
  - Article by Wilkinson et al. (2016) outlining ‘The FAIR Guiding Principles for scientific data management and stewardship’
  - GO FAIR initiative
- The Research Data Management toolkit for Life Sciences - RDM kit