Tips on looking after and sharing your data
You’ve invested a lot of time and effort in creating your data, so keep it safe. Learn how to select what to keep and how to store it carefully. Discover why and how to back it up to make sure it is not lost. Find out how to preserve your data and back-ups, and consider how you can get the most from your data, perhaps through re-use and sharing.
This short video from Digital Preservation Europe illustrates the value of good data management and presents a few key best practices.
Long-Term Storage and Preservation
Selection - Choosing What to Keep
Sharing - How to Make Your Data Easily Re-usable
Digital Repositories - a dedicated page with detailed guidance on various data repositories
Choosing the right way to store your data can help you work more flexibly, easily and quickly. Thoughtful storage solutions can also simplify version control and collaboration with others. You may be required by your PI or funder to store your data in a particular place, or you may have more choices available. No matter which solution you use, the two golden rules of storage apply:
- Where possible, only store what you need to keep.
- Store crucial data in more than one secure location.
Can I use portable storage media (e.g. memory sticks, external hard drives)?
Portable storage media such as memory sticks (USB sticks) are riskier than networked storage because they are easily lost or damaged. Computing officers will not back them up or support them centrally. It is important not to rely on them as your only copy of important data.
They are very convenient, though, and useful for:
- temporary copies or moving files, e.g. taking a presentation to a conference
- secondary or back-up copies
- files only one person at a time needs access to
- data you can afford to lose
Nearly everyone who has experienced serious data loss did not think it would happen to them - but it does happen periodically. The results can be catastrophic for your research project, or for you personally. However, you can prevent data loss by following good backup practices.
I usually store my data on a department or college network drive. Does that mean it is backed up?
Many computer networks within the University back up files automatically, but some do not.
Ask your local computing officer or network administrator:
- whether files on the network are automatically backed up, and, if so
- which folders or drives on the network are backed up automatically
- how frequently the backups happen, and
- how long backups are stored for
What is the best practice for backing up data?
IT professionals strongly recommend that:
- you make two, or even three, back-ups of all important documents and data not stored on a networked file server (failure rates for storage media are probably higher than you think!)
- you store one back-up in a different location from the others (to keep your files safe in case of fire, flood, burglary, etc.)
- you use multiple different types of storage media or storage media from different manufacturers (to protect against multiple media failures, e.g. a bad batch of discs)
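Part of this routine can be automated. The sketch below is a minimal example, assuming Python 3 and hypothetical folder paths: it copies every file in a project folder into several back-up locations (for example, folders on different drives or media), then compares a SHA-256 checksum of each copy against the original so that a failed or corrupted copy is noticed. Checksum verification is an extra step not mentioned in the guidance above, and the function names `sha256_of` and `back_up` are illustrative only.

```python
from __future__ import annotations

import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(source: Path, destinations: list[Path]) -> dict[Path, list[Path]]:
    """Copy every file under `source` into each destination folder.

    Returns, for each destination, the copies whose checksum does not
    match the original (an empty list means every copy was verified).
    """
    # Checksum each original once, before any copying starts.
    originals = {p: sha256_of(p) for p in source.rglob("*") if p.is_file()}
    mismatches: dict[Path, list[Path]] = {}
    for dest_root in destinations:
        bad: list[Path] = []
        for original, checksum in originals.items():
            # Keep the same sub-folder structure inside each back-up location.
            target = dest_root / original.relative_to(source)
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(original, target)  # copy2 also keeps modification times
            if sha256_of(target) != checksum:
                bad.append(target)
        mismatches[dest_root] = bad
    return mismatches
```

For example, `back_up(Path("my_project/data"), [Path("/mnt/network_drive/backup"), Path("/media/external_disk/backup")])` would copy to a (hypothetical) network drive and external disk. Keeping one of those copies in a different physical location still has to be arranged separately.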
How should I choose what back-up storage media to use?
Your choice of storage media for back-up will depend on the quantity and type of data you have: memory sticks and online backup services (e.g. FTP servers) may be convenient for small amounts of data, whereas hard drives or magnetic tapes may be more appropriate for large volumes or when you need to store data offline for security reasons.
How should I choose what to back up and when?
Back-up can be time-consuming or expensive if your files take up a lot of space, or if you keep different files in different locations. To help you decide what to back up and when, think about which files you would need in order to re-create or restore your work in the case of loss, and which data are crucial for your work.
You may choose to only back up certain data, or to back up files you use every day more regularly than others. The basic rule of thumb is:
- The more important the data and the more often they change, the more regularly they need to be backed up
- If your files take up a large amount of space and backing up all of them (or backing them up frequently enough) would be difficult or expensive, you may want to focus on backing up the key information, programs, algorithms or documentation that you would need to re-create the data in case of loss.
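The rule of thumb above can also be applied mechanically. As a minimal sketch, assuming Python 3 and that a file's modification time is a good enough signal that it has changed, the two hypothetical helpers below list the files changed since the last back-up and check whether a back-up is due for a chosen interval, so that important, frequently changing folders can be given a shorter interval than rarely changing ones.

```python
from __future__ import annotations

import time
from pathlib import Path


def files_changed_since(folder: Path, last_backup: float) -> list[Path]:
    """List files under `folder` modified after `last_backup` (seconds since the epoch)."""
    return sorted(
        p for p in folder.rglob("*")
        if p.is_file() and p.stat().st_mtime > last_backup
    )


def backup_due(last_backup: float, interval_hours: float, now: float | None = None) -> bool:
    """Return True if at least `interval_hours` have passed since `last_backup`."""
    current = time.time() if now is None else now
    return current - last_backup >= interval_hours * 3600
```

For instance, a folder of analysis scripts edited every day might use `interval_hours=24`, while raw data that rarely changes might use a weekly interval; these values are illustrative, not a recommendation.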
The term 'preservation' means ensuring something can still be seen or used over time. In the context of digital data, long-term preservation is the process of maintaining data over time so that they can still be found, understood, accessed, and used in the future.
Why does preservation matter to me?
You may think that by saving your data in one or more places you have made sure it is effectively preserved, but with digital technology developing so quickly, your digital data are at risk from one or more of the following problems:
- file formats might not be compatible with future software, and therefore unreadable
- even if a document can still be opened with new software, it may be altered to such a degree that it is no longer understandable or reliable enough for continued research
- storage media may degrade, get scratched or break, especially portable media such as USB sticks, so information might be lost
- the files or data will not be understood because there is no supporting documentation or metadata, or because it has not been preserved correctly.
What can I do to ensure my data are usable in the future?
When creating, organising and storing your data, you can take a few initial steps to help ensure your data remain usable and understandable in the future:
- effectively document your data so that it can be understood in the future
- periodically move data to new storage media (drives degrade over time)
- keep more than one copy of data, and on a variety of storage media
- migrate data to new software versions, or use a format that can easily be imported to various software programs.
Ideally this should be covered in a data management plan at the start of a project, so that you can factor any associated time and resources into your budget.
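As one illustration of using a format that can easily be imported into various software programs, the sketch below, assuming Python 3 and hypothetical column names, writes tabular records to CSV, a plain-text format that spreadsheet programs, statistics packages and programming languages can all read. It uses only the standard `csv` module; the helper name `export_to_csv` is illustrative.

```python
from __future__ import annotations

import csv
from pathlib import Path


def export_to_csv(rows: list[dict[str, object]], path: Path) -> None:
    """Write a list of records to a UTF-8 CSV file with a header row."""
    if not rows:
        raise ValueError("nothing to export")
    # Use the keys of the first record as the column order.
    fieldnames = list(rows[0].keys())
    with path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
```

CSV keeps the values but not formatting, formulas or units, so a short documentation file describing each column is still needed alongside it.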
It is tempting to keep everything, just in case you need it in the future, but keeping all your files for the foreseeable future costs money, and makes it more difficult to find the truly important things. It is also worth remembering that if you have something on file, then it might be subject to a Freedom of Information (FOI) request.
What does selection involve?
Choosing what to keep and what can be disposed of or deleted is always going to involve a subjective judgement, as nobody knows exactly what information is going to be wanted in the future.
All we can do is think the matter through carefully, abide by the policies we need to (e.g. from funders) and document decisions made and the reasons for them. It will not be a perfect process, but should at least be a sensible one.
Can't I just keep everything?
There are some good reasons why selection is worth doing:
- because storage costs money and requires effort and staff time, and storing massive amounts of data makes it harder to find and access the truly useful material.
- because Freedom of Information laws mean that what you keep on file may have to be disclosed, if requested.
How do I know what to keep and what to delete?
The following questions, based on material devised by the Digital Curation Centre, can help you decide what you should keep and what can be deleted:
- does my funder or the university need me to keep this data and / or make it available for a certain amount of time?
- does this data constitute the 'vital records' of a project, organisation or consortium and therefore need to be retained indefinitely?
- do I have the legal and intellectual property rights to keep and re-use this data? If not, can these be negotiated?
- does sufficient documentation and descriptive information (‘metadata’) exist to explain the data, and allow the data or record to be found wherever it ends up being stored?
- if I need to pay to keep the data, can I afford it?
Once you have sorted through your files and asked these questions, you need to:
- check your data protection responsibilities
- prepare documentation for each file
- find out how to deposit in a data repository.
How can I make it easier for others to re-use the materials that I produce?
One relatively simple way to make it easier for others to re-use tools, data or other content that you produce is to add a Creative Commons license. For example, ‘Attribution-NonCommercial’ (CC BY-NC) is a common Creative Commons license: when you mark your file, image or information with it, anyone can use your material in any way they like, as long as they attribute it to you and do not use it for commercial purposes. Other Creative Commons licenses also allow commercial use, and the related CC0 public domain dedication lets you release material without requiring re-users to attribute you. Creative Commons licenses are often used for materials released online, but you can also include them in printed materials if your publisher does not own the rights. For additional information about Creative Commons license options, visit their website or watch the short video below:
To license something with a Creative Commons license, you don't need to file any paperwork - just publish (in print or on the web) your materials along with a notification that you are using a particular license.
IMPORTANT NOTE: Creative Commons licenses are 'irrevocable', so do not add a Creative Commons license unless you are sure that:
- you have the right to publish this information
- you will not want to revoke it later on for any reason.