Metadata: What is it and why is it important?

In simple terms, metadata is data about data. It helps anyone who might be interested in your data work out what it is and how they can use it. You can think of it as an instruction manual that contains all the details of your data, so others can easily discover it and understand how to use it without having to contact you.

Why is metadata important?

Detailed metadata allows data to be understood by other researchers, discovered more easily and integrated for reuse. This increased transparency improves research integrity, which is essential for advancing research. As a researcher, it can also help to increase your citations, your credibility and your chances of getting funded.

What should your metadata include?

There are discipline-specific differences, but these five general rules can be followed by all:

1. Organisation

Details of how files and folders are structured with a key to the file names. It can be helpful to use a README file to document organisation – this should make sense to someone from outside your project.

2. Data Collection

Types of files with details on how, when and where the data was collected. You could also include information on who collected and analysed the data and how to contact them.

3. Methodology

What software or equipment was used. Detail any permissions, ethics and consent obtained.

4. Analysis

What software, scripts or methods were used to analyse the data.

5. Sharing

Licence information for reuse. For example, does the licence choice allow derivatives of your data to be made and distributed?

Metadata Q&A

What is metadata?

Metadata is the data that is used to describe your data so it can be interpreted correctly, placing it in context and making it clear, easy to access and reuse.

Why should I care about good metadata?

Many funders and publishers now require you to share your data. The expectation is that your data should be well curated, facilitating findability and reusability as described by the FAIR principles. Good metadata improves the chances that others will find and reuse your data. This is good not only for research and research integrity, but also for your reputation as a researcher: the more findable and reusable your data are, the more citations you are likely to receive.

What information should I include?

As a minimum, you should consider providing the following details about your data:

Title – ensure this is meaningful to someone outside your research group
Date(s) of creation
Who contributed to data creation/collection?
File formats and software (or hardware) required to access and use the files
Consistent folder and file naming convention. The TLS Document Naming Convention is used by the Research Data Team at the University of Cambridge
Data dictionary – a collection of information about the data objects, tables, fields or other elements in the dataset/database.
Rights and Licence – who holds the rights to the data and what rights and limitations apply to researchers who want to use the data.
Contact information – name and means of contact for the person who is responsible for the data (usually the data owner)
Identifier – a DOI or other stable and unique marker

You could also include: definition of variables; vocabulary (full names of all abbreviations and meanings); units of measurement; any assumptions made; any further necessary or useful information to help interpretation of the data.
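The minimum details listed above can be captured in a small structured record. The following is a minimal sketch in Python, assuming hypothetical field names (such as `dates_created` and `rights_and_licence`) and placeholder values. It is not a prescribed schema, and a repository may ask for different fields.

```python
import json

# Hypothetical field names chosen for illustration; not a formal standard.
REQUIRED_FIELDS = [
    "title",
    "dates_created",
    "contributors",
    "file_formats",
    "naming_convention",
    "data_dictionary",
    "rights_and_licence",
    "contact",
    "identifier",
]


def missing_fields(record):
    """Return the required fields that are absent or empty in the record."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


# Placeholder values only; replace with details of your own dataset.
example_record = {
    "title": "Example dataset title",
    "dates_created": ["2024-01-15"],
    "contributors": ["A. Researcher"],
    "file_formats": ["CSV"],
    "naming_convention": "Described in README.txt",
    "data_dictionary": "data_dictionary.csv",
    "rights_and_licence": "CC BY 4.0",
    "contact": "a.researcher@example.org",
    "identifier": "",  # left empty until a DOI has been assigned
}

print(missing_fields(example_record))
print(json.dumps(example_record, indent=2))
```

A check like `missing_fields` can be run before depositing data to spot details that still need to be filled in.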

Who should the metadata be aimed at?

Ideally this should be written so that someone in your research area (outside your group) could reuse the data or build on the research without speaking to you.

What is a README file?

A README file is a text file that sits alongside your data to document the contents, naming convention and structure of the files and folders, and the project they relate to. It should include a description of the file formats and the software needed to use the files. You can also document details of the licence or any restrictions placed on the data. Cornell University offers some good advice on how to write README files.
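As an illustration of what such a file might contain, here is a minimal sketch in Python that fills in a plain-text README. The section headings and the `build_readme` helper are assumptions made for this example, not a required layout.

```python
# Hypothetical section headings; adapt them to your own project.
README_TEMPLATE = """\
Title: {title}
Contact: {contact}

Project description
{description}

Folder and file structure
{structure}

File formats and software required
{formats}

Licence and restrictions
{licence}
"""


def build_readme(title, contact, description, structure, formats, licence):
    """Fill the template with plain-text details about the dataset."""
    return README_TEMPLATE.format(
        title=title,
        contact=contact,
        description=description,
        structure=structure,
        formats=formats,
        licence=licence,
    )


# Placeholder values only.
readme_text = build_readme(
    title="Example dataset title",
    contact="a.researcher@example.org",
    description="Temperature readings collected for an example project.",
    structure="data/ holds the CSV files; docs/ holds the data dictionary.",
    formats="CSV files, readable in any spreadsheet or text editor.",
    licence="CC BY 4.0",
)
print(readme_text)
```

The resulting text can be saved as README.txt in the top-level folder of the dataset.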

What are metadata standards?

Metadata are more useful when standards are used, because standards make the data easier for researchers in the relevant discipline to use and compare. If you are sharing your data in a repository, note that the metadata standards used vary between repositories.

There are generic metadata standards (e.g. Dublin Core and MODS) and discipline-specific ones (e.g. Darwin Core – biological specimens). FAIRsharing is a curated, searchable portal of inter-related data standards, databases, and policies in the life, environmental, and biomedical sciences. See the Research Data Alliance Metadata Standards Catalogue.
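To show what a generic standard looks like in practice, the sketch below expresses a few details using Dublin Core element names (`title`, `creator`, `date`, `format`, `rights`) with Python's standard library. The `<metadata>` wrapper element and the `to_dublin_core` helper are assumptions for this example; a repository will normally define its own container format and generate the record for you.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def to_dublin_core(fields):
    """Wrap a dict of Dublin Core element names and values in XML.

    The <metadata> wrapper element is an assumption for this sketch;
    repositories define their own container formats.
    """
    root = ET.Element("metadata")
    for name, value in fields.items():
        # Allow a single string or a list of strings per element.
        values = value if isinstance(value, list) else [value]
        for item in values:
            element = ET.SubElement(root, f"{{{DC_NS}}}{name}")
            element.text = item
    return ET.tostring(root, encoding="unicode")


# Placeholder values only.
xml_text = to_dublin_core({
    "title": "Example dataset title",
    "creator": ["A. Researcher", "B. Researcher"],
    "date": "2024-01-15",
    "format": "text/csv",
    "rights": "CC BY 4.0",
})
print(xml_text)
```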

When should I document my metadata?

It is good practice to start documenting your data at the very beginning of your research project and to keep adding information as the project progresses. Include procedures for documentation in your data planning.

How should I present metadata?

If you are uploading to a repository, metadata will often be required as part of the submission record.
In a README file – this is particularly useful if you have a large or complex dataset, or if detailed instructions are required for accessing/reusing your data. Cornell University offers some good advice on how to write README files.
Other options – metadata can also be included in the data files (e.g. a data dictionary in a spreadsheet page). However, it can be better to have your metadata readily accessible outside your data files, so that researchers can read it without having to download the dataset.

How can I make my metadata machine/computer-readable?

Making your metadata machine-readable is better for discoverability and reusability. Some repositories may capture metadata in machine-readable formats. Make sure the data itself is machine-readable, when possible, for example:

Use structured formats (e.g., CSV, XML, JSON, MS Excel) rather than free-text formats (e.g., PDF, MS Word). Of course, this depends on your specific research area.
Don’t use colours to store information (e.g., in MS Excel); use coded variables instead.
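The two points above can be sketched as follows: information that might otherwise be shown by cell colour is stored in an explicit coded column, and the codes are explained in a separate data dictionary. The column names, codes and values here are hypothetical, chosen only for illustration.

```python
import csv
import io

# Hypothetical codes replacing cell colours; the meanings are documented
# in a data dictionary rather than implied by formatting.
QUALITY_CODES = {
    "0": "measurement passed quality checks",
    "1": "measurement flagged for review",
    "9": "measurement missing",
}

# Placeholder data rows.
rows = [
    {"sample_id": "S001", "temperature_c": "21.4", "quality_flag": "0"},
    {"sample_id": "S002", "temperature_c": "35.9", "quality_flag": "1"},
    {"sample_id": "S003", "temperature_c": "", "quality_flag": "9"},
]


def to_csv(records, fieldnames):
    """Serialise a list of dicts as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


data_csv = to_csv(rows, ["sample_id", "temperature_c", "quality_flag"])

# One data dictionary row per code, so the meaning travels with the data.
dictionary_rows = [
    {"variable": "quality_flag", "code": code, "meaning": meaning}
    for code, meaning in QUALITY_CODES.items()
]
dictionary_csv = to_csv(dictionary_rows, ["variable", "code", "meaning"])

print(data_csv)
print(dictionary_csv)
```

Keeping the codes in their own column means that software can filter or count flagged rows, which is not possible when the same information exists only as a cell colour.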

Are there any tools to help track metadata?

The Digital Curation Centre maintains a list of Metadata Tools that can help you track metadata.

Credits: This information was a result of a collaboration between Cambridge University Libraries, Cambridge University Press & Assessment (CUP&A) and the Cambridge Data Champions, funded by the CUP&A University Collaboration Budget. With special thanks to Kevin Symonds, Simon Carrignon, Irene Fabry-Tehranchi, Lucy Woolhouse, Curtis Sharma, Kiera McNeice, Monica Moniz.
