Q. What is research data?
Almost every funder has its own definition of research data to reflect disciplinary differences. Every research area is different: various types of data are generated or consulted, and these exist in multiple forms and formats. As a result, the definition of research data also differs.
A cross-disciplinary definition of research data is information that is collected or created to develop claims made in the academic literature.
This includes quantitative information or qualitative statements collected by researchers in the course of their work by experimentation, observation, interview or other methods, or information derived from existing sources. Data can be:
• Raw or primary data (e.g. direct from measurement or collection)
• Secondary data processed from primary data for subsequent analysis or interpretation (e.g. cleaned up or as an extract from a larger data set)
• Data derived from existing sources, where the copyright may be externally held
In addition to measurements recording physical conditions/attributes, examples of data are a spreadsheet of statistics, a collection of digital images, a sound recording, transcripts of an interview, survey data and fieldwork observations with appropriate annotations, an artwork, archives, published texts, a manuscript etc.
The essence of ‘data’ in terms of open research data is that they are the information necessary to support or validate a research project’s observations, findings or outputs.
Q. What data, and at what level, needs to be shared?
As a minimum, you should share the research data needed to validate the findings described in your publication. You, the researcher, are the expert on your own research data, and you are in the best position to decide which data is valuable to others and which is needed to validate your findings.
For more details you should consult the policy of your funder. A list of the policies of the University’s top 20 funders is available at: http://www.data.cam.ac.uk/funders. If your funder is not listed there, you can try searching for its policy on the Sherpa/Juliet website: http://www.sherpa.ac.uk/juliet/. If your funder’s policy is unavailable on Sherpa/Juliet, you should get in touch with your funder directly.
Q. What data do I need to keep?
The fundamental principle is that published research should be open to scrutiny by others. It may help to ask yourself: ‘If I delete this data, or refuse to share it under any circumstances, and others question the validity of my published findings, will I have to tell them “you just have to take my word for it; I no longer have, or refuse to share, the data”?’ That is clearly a situation you would wish to avoid. Ideally, therefore, you should share the data needed to validate the research described in your publication, or at least describe the conditions under which you would agree to make it available to anyone wishing to test the robustness of your methodology and hence your published findings.
Q. Do I have to share all of my research data?
You should share research data which:
- is necessary to validate findings described in your publication;
- might be valuable to others;
- cannot be re-generated (for example, data coming from environmental observations).
Q. I own my data - do I need to share it?
Yes. If the sponsor or funder of your research requires you to share your data, you are obliged to do so.
If an agreement is reached with an external sponsor of research or a third party on behalf of, and with the knowledge of, University staff and students, then as a condition of sponsorship or research funding those staff and students must abide by the sponsor’s or third party’s terms and conditions, including those on intellectual property rights, research data management and data dissemination procedures.
Q. I am funded by EPSRC - what happens if I am not compliant with EPSRC expectations?
The EPSRC will begin checking compliance with its expectations on research data management after the summer break 2015. It will do this by checking the availability of data underpinning research papers published after 1st May 2015, examining the following aspects:
- Does the published research paper include a statement describing how to access underlying data? (this has been an RCUK-wide requirement since 2013)
- If there is no statement – where is the data?
- Is the right type of data available?
Where the checks give rise to cause for concern, individual researchers will be contacted. EPSRC will also investigate any complaints about research data not being managed in line with EPSRC expectations.
EPSRC aims to embed compliance checking as part of regular grant assessment by the Research Councils Audit and Assurance Services Group (AASG). AASG might perform thorough checks on randomly selected grants for their compliance with EPSRC expectations on data sharing.
Q. Do I need to share data underpinning my PhD thesis?
PhD students are encouraged to share research data from their PhD research, providing that:
a). The research process is not damaged by premature and/or inappropriate release of research data. Examples might include:
- if your research is sponsored in part by an industrial collaborator and you are bound by a confidentiality agreement not to disclose some of the data they have provided to you;
- if your research relies on personal data from participants who have not consented to the release of their personal data.
b). The research data has been generated in accordance with the University’s Research Policies, the University’s Research Integrity and Ethics guidelines and in accordance with policies of research funders.
In general it is advised that supervisors are always consulted before any research data underpinning PhD research is released.
Q. Do I need to make my data intelligible to others?
It is best practice to make your research data intelligible to others, as this facilitates data re-use. Your data needs to be sufficiently well described to allow validation of your research findings. For more information about good data description practices, look here.
Q. What is metadata? Can you give some examples?
Metadata is the description of data. We provide a detailed explanation of what metadata is here: http://www.data.cam.ac.uk/data-management-guide/organising-your-data#Metadata
Discipline-specific examples of metadata are provided by the Digital Curation Centre, and can be found here: http://www.dcc.ac.uk/resources/metadata-standards
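As a rough illustration of what a simple metadata record might look like in practice, the sketch below describes a dataset with a handful of descriptive fields and checks that none of them is empty. The field names, the helper functions and all the values are hypothetical and chosen only for illustration; they are not taken from any particular metadata standard or funder policy, so you should check the standards used in your own discipline.

```python
import json

# Hypothetical minimal set of descriptive fields; not taken from any
# specific metadata standard or funder policy.
REQUIRED_FIELDS = (
    "title",
    "creator",
    "date_created",
    "description",
    "file_format",
    "licence",
)


def missing_fields(record):
    """Return the required fields that are absent or empty in the record."""
    return [f for f in REQUIRED_FIELDS if not str(record.get(f, "")).strip()]


def to_json(record):
    """Serialise the record so it can be stored alongside the data files."""
    return json.dumps(record, indent=2, sort_keys=True)


# Illustrative values only; every name and value here is invented.
example_record = {
    "title": "Soil temperature readings, example field site",
    "creator": "A. Researcher",
    "date_created": "2015-06-01",
    "description": "Hourly soil temperature in degrees Celsius at 10 cm depth.",
    "file_format": "CSV",
    "licence": "CC BY 4.0",
}

if __name__ == "__main__":
    # An empty list means every required field has been filled in.
    print(missing_fields(example_record))
    print(to_json(example_record))
```

Saving a record like this as a plain-text file next to the data is one possible way to keep the description and the data together; a repository may ask for the same kind of information through its own deposit form.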
Q. My publication has already been accepted and I did not provide a statement about data. What shall I do?
First, if you have not shared your research data yet, share it as soon as possible via a suitable data repository. We provide guidance about what to consider when looking for the most suitable data repository here: http://www.data.cam.ac.uk/repository. If you would like to share your data via the institutional repository, you can simply send your data to us by filling in the form available at www.data.cam.ac.uk/upload. Then ask your publisher if you can add a statement about research data to your publication (for sample statements and information on where to put them within the publication, have a look here: http://www.data.cam.ac.uk/research-data-policies/funders-policies/epsrc-funded-researchers). Typically, publishers will allow this statement to be added to the publication. If your publisher does not allow you to make changes to your publication, let us know; we can manually link your data to your publication in the University of Cambridge data repository.
You might find this decision tree helpful to guide you through the process: http://www.data.cam.ac.uk/files/epsrctree.pdf.
Q. Do I need to publish data underlying conference publications?
You are always encouraged to publish research data supporting your conference publications, and you are expected to publish your supporting data if the conference publication is peer-reviewed.
Q. Am I expected to share large datasets resulting from bigger projects (databases, long-term datasets) or data supporting individual publications?
Research data that supports individual publications should be made available with a hyperlink to the data. Researchers should also consider and plan more broadly how they can make data assets of value resulting from our funded research available to others in a timely and appropriate manner.