Organising your data
Once you create, gather, or start manipulating data and files, they can quickly become disorganised. To save time and prevent errors later on, you and your colleagues should decide how you will name and structure files and folders. Including documentation (or 'metadata') will allow you to add context to your data so that you and others can understand it in the short, medium and long term.
Choosing a logical and consistent way to name and organise your files allows you and others to locate and use them easily. The best time to think about how to name and structure the documents and directories you create is at the start of a project.
Agreeing on a naming convention provides consistency, which makes it easier to find and correctly identify your files and helps prevent version control problems when working on files collaboratively. Organising your files carefully will save you time and frustration by helping you and your colleagues find what you need when you need it.
How should I organise my files?
Whether you are working on a standalone computer or on a networked drive, establishing a system that lets you access your files, avoid duplication and ensure that your data can be backed up takes a little planning. A good place to start is to develop a logical folder structure. The following tips should help you develop such a system:
- Use folders - group files within folders so information on a particular topic is located in one place
- Adhere to existing procedures - check for established approaches in your team or department which you can adopt
- Name folders appropriately - name folders after the areas of work to which they relate and not after individual researchers or students. This avoids confusion in shared workspaces if a member of staff leaves, and makes the file system easier to navigate for new people joining the workspace
- Be consistent – when developing a naming scheme for your folders, decide on a method and then stick to it. If you can, agree on a naming scheme at the outset of your research project
- Structure folders hierarchically - start with a limited number of folders for the broader topics, and then create more specific folders within these
- Separate ongoing and completed work - as you start to create lots of folders and files, it is a good idea to start thinking about separating your older documents from those you are currently working on
- Try to keep your ‘My Documents’ folder for files you are actively working on, and every month or so, move the files you are no longer working on to a different folder or location, such as a folder on your desktop, a special archive folder or an external hard drive
- Backup – ensure that your files, whether they are on your local drive, or on a network drive, are backed up
- Review records - assess materials regularly or at the end of a project to ensure files are not kept needlessly. Put a reminder in your calendar so you do not forget!
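The routine of moving files you are no longer working on to an archive every month or so can be partly automated. The sketch below is one minimal way to do it in Python, assuming a hypothetical "active" folder and "archive" folder, and treating a file's last-modified time as a rough proxy for "no longer working on".

```python
import shutil
import time
from pathlib import Path


def archive_inactive_files(active_dir, archive_dir, max_age_days=30, now=None):
    """Move files in active_dir not modified for max_age_days into archive_dir.

    The subfolder structure is preserved so archived files stay grouped by topic.
    Returns the list of destination paths of the moved files.
    """
    active = Path(active_dir)
    archive = Path(archive_dir)
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 24 * 60 * 60  # age threshold in seconds

    moved = []
    # sorted() builds the full list first, so moving files cannot disturb the walk
    for path in sorted(active.rglob("*")):
        if path.is_file() and path.stat().st_mtime < cutoff:
            target = archive / path.relative_to(active)
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.move(str(path), str(target))
            moved.append(target)
    return moved
```

Keeping the same subfolder layout inside the archive means completed work remains organised by topic, in line with the hierarchical folder advice above.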
What do I need to consider when creating a file name?
Decide on a file naming convention at the start of your project.
Useful file names:
- are meaningful to you and your colleagues
- allow you to find the file easily.
It is useful if your department/project agrees on the following elements of a file name:
- Vocabulary – choose a standard vocabulary for file names, so that everyone uses a common language
- Punctuation – decide on conventions for if and when to use punctuation symbols, capitals, hyphens and spaces
- Dates – agree on a date format that sorts chronologically, such as YYYY-MM-DD
- Order - confirm which element should go first, so that files on the same theme are listed together and can therefore be found easily
- Numbers – specify the number of digits to be used in numbering so that files are listed in numerical order, e.g. 01 or 001
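To make these agreements concrete, here is a minimal Python sketch of a helper that assembles a file name from agreed elements. The element order (project, date, description, number), the separators and the three-digit padding are example choices rather than a standard, and the function name is hypothetical.

```python
import datetime
import re


def build_file_name(project, description, date, number, extension, digits=3):
    """Assemble a name of the form project_YYYY-MM-DD_description_NNN.ext."""

    def clean(text):
        # Example punctuation rule: lowercase, no spaces, hyphens within an element
        text = re.sub(r"[^a-z0-9]+", "-", text.strip().lower())
        return text.strip("-")

    # Underscores separate elements; the number is zero-padded to `digits` places
    return f"{clean(project)}_{date.isoformat()}_{clean(description)}_{number:0{digits}d}.{extension}"


print(build_file_name("Soil Survey", "Site photos", datetime.date(2013, 8, 16), 7, "jpg"))
# soil-survey_2013-08-16_site-photos_007.jpg
```

Because the date uses YYYY-MM-DD and the number is zero-padded, a plain alphabetical sort also lists the files chronologically and numerically; without padding, `scan_10` would sort before `scan_2`.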
How should I name my files, so that I know which document is the most recent version?
Very few documents are drafted by one person in one sitting. More often there will be several people involved in the process and it will occur over an extended period of time. Without proper controls this can quickly lead to confusion as to which version is the most recent. Here is a suggestion of one way to avoid this:
- Use a 'revision' numbering system. Major changes to a file can be indicated by whole numbers: for example, v01 would be the first version and v02 the second. Minor changes can be indicated by adding a second number after an underscore: for example, v01_01 indicates that a minor change has been made to the first version, and v03_01 that a minor change has been made to the third version.
- When draft documents are sent out for amendments, upon return they should carry additional information to identify the individual who has made the amendments. Example: a file with the name datav01_20130816_SJ indicates that a colleague (SJ) has made amendments to the first version on the 16th August 2013. The lead author would then add those amendments to version v01 and rename the file following the revision numbering system.
- Include a 'version control table' for each important document, noting changes and their dates alongside the appropriate version number of the document. If helpful, you can include the file names themselves along with (or instead of) the version number.
- Agree who will sign off final versions and mark them as 'final'.
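The revision numbering system above is regular enough to handle programmatically, for instance to work out which file in a folder is the latest or what the next version label should be. A minimal Python sketch, assuming each file name contains exactly one vNN or vNN_NN label as in the examples above (the function names are hypothetical):

```python
import re

# vNN is a major version; an optional _NN is a minor change to it (v01, v03_01).
# (?!\d) stops a date suffix such as _20130816 being read as a minor number.
VERSION_RE = re.compile(r"v(\d{2})(?:_(\d{2})(?!\d))?")


def parse_version(file_name):
    """Return (major, minor) for a name like 'report_v03_01.docx'."""
    match = VERSION_RE.search(file_name)
    if match is None:
        raise ValueError(f"no version label in {file_name!r}")
    return int(match.group(1)), int(match.group(2) or 0)


def format_version(major, minor=0):
    """Format as vNN, or vNN_NN when there is a minor revision."""
    return f"v{major:02d}" if minor == 0 else f"v{major:02d}_{minor:02d}"


def next_version(file_name, major_change):
    """Label for the next revision: a major change resets the minor number."""
    major, minor = parse_version(file_name)
    return format_version(major + 1) if major_change else format_version(major, minor + 1)


def latest(file_names):
    """Pick the most recent revision by comparing (major, minor) numerically."""
    return max(file_names, key=parse_version)


print(parse_version("datav01_20130816_SJ"))  # (1, 0): amendments to v01
print(next_version("report_v03_01.docx", major_change=False))  # v03_02
```

Comparing (major, minor) as numbers avoids depending on alphabetical order, although with consistent two-digit padding both give the same result.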
There are also numerous external resources that offer guidance on file naming conventions.
To ensure that you understand your own data and that others may find, use and properly cite your data, it helps to add documentation and metadata (data about data) to the documents and datasets you create.
What are 'documentation' and 'metadata'?
The term 'documentation' encompasses all the information necessary to interpret, understand and use a given dataset or set of documents. On this website, we use 'documentation' and 'metadata' (data about data - usually embedded in the data files/documents themselves) interchangeably.
When and how do I include documentation/metadata?
It is good practice to begin documenting your data at the very beginning of your research project and to continue adding information as the project progresses. Include procedures for documentation in your data planning.
There are a number of ways you can add documentation to your data:
Embedded documentation is information about a file or dataset that is included within the data or document itself. For digital datasets, the documentation can be integrated into the data file(s), as a header or at specified locations in the file, or it can sit in closely associated files (for example, text files). Examples of embedded documentation include:
- code, field and label descriptions
- descriptive headers or summaries
- recording information in the Document Properties function of a file (Microsoft)
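As an illustration of a descriptive header, the sketch below writes a small CSV file whose first lines, marked with `#`, describe the dataset, its fields and its codes, and then reads the two parts back separately. The dataset, field names and the `#` convention are assumptions for the example; not every program skips such lines automatically (pandas, for instance, needs `comment="#"` in `read_csv`).

```python
import csv
import io

# Hypothetical dataset description: title, creator, fields and codes are illustrative
HEADER_LINES = [
    "# Title: Example soil moisture readings (illustrative data)",
    "# Creator: A. Researcher",
    "# site_id: site code (text)",
    "# moisture_pct: volumetric soil moisture, percent",
    "# flag: 0 = ok, 1 = sensor fault",
]


def write_with_header(rows, fieldnames, handle):
    """Write the descriptive '#' lines, then ordinary CSV data."""
    for line in HEADER_LINES:
        handle.write(line + "\n")
    writer = csv.DictWriter(handle, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)


def read_with_header(handle):
    """Return (metadata_lines, rows), separating '#' lines from the data."""
    metadata, data_lines = [], []
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith("#"):
            metadata.append(line)
        else:
            data_lines.append(line)
    return metadata, list(csv.DictReader(data_lines))


buffer = io.StringIO()
write_with_header([{"site_id": "A1", "moisture_pct": 23.5, "flag": 0}],
                  ["site_id", "moisture_pct", "flag"], buffer)
buffer.seek(0)
metadata, rows = read_with_header(buffer)
```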
Supporting documentation is information in separate files that accompanies data in order to provide context, explanation, or instructions on confidentiality and data use or reuse. Examples of supporting documentation include:
- Working papers or laboratory books
- Questionnaires or interview guides
- Final project reports and publications
- Catalogue metadata
Catalogue metadata in particular should be structured so that it can be used to identify and locate the data via a web browser or web-based catalogue. It is usually structured according to an international standard and associated with the data by repositories or data centres when materials are deposited.
The Digital Curation Centre provides examples of discipline-specific metadata.
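Catalogue metadata is normally created by the repository at deposit, but keeping a structured record alongside your data from the start makes that step easier. A minimal Python sketch that writes a JSON "sidecar" file using element names from the Dublin Core Metadata Element Set; the sidecar naming convention, the choice of required fields and the example values are assumptions (Dublin Core itself makes every element optional), so check which standard your repository expects.

```python
import json
from pathlib import Path

# Element names come from Dublin Core; treating these five as required is an
# assumption for this example, not part of the standard.
REQUIRED = ("title", "creator", "date", "description", "identifier")


def write_sidecar(data_file, record):
    """Write record as JSON next to data_file (survey.csv -> survey.csv.metadata.json)."""
    missing = [key for key in REQUIRED if not record.get(key)]
    if missing:
        raise ValueError(f"missing metadata fields: {missing}")
    sidecar = Path(str(data_file) + ".metadata.json")
    sidecar.write_text(json.dumps(record, indent=2, sort_keys=True), encoding="utf-8")
    return sidecar


example_record = {
    "title": "Soil moisture survey (illustrative)",
    "creator": "A. Researcher",
    "date": "2013-08-16",
    "description": "Hypothetical example record",
    "identifier": "example-0001",  # placeholder, not a real persistent identifier
}
```

Refusing to write an incomplete record is a deliberate choice here: gaps are easier to fill while the project is running than at deposit time.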
Tools for metadata tracking and data standards
ISA Tools - metadata tracking tools for life sciences
The open source ISA metadata tracking tools help to manage an increasingly diverse set of life science, environmental and biomedical experiments that employ one or a combination of technologies.
Built around the ‘Investigation’ (the project context), ‘Study’ (a unit of research) and ‘Assay’ (analytical measurement) general-purpose tabular format (ISA-Tab), the ISA tools help you to provide a rich description of experimental metadata (i.e. sample characteristics, technology and measurement types, sample-to-data relationships) so that the resulting data and discoveries are reproducible and reusable.
BioSharing - searchable portal of inter-related data standards, databases, and policies for life sciences
BioSharing is a curated, searchable portal of inter-related data standards, databases, and policies in the life, environmental, and biomedical sciences.
Projects can last for months or years, and it is easy to lose track of which piece of information came from which source. It can be a challenge to have to reconstruct half of your citations in the scramble at the end of the project! Your future self may not remember everything that seems obvious in the present, so it is important to take clear notes about your sources.
What is 'reference management software'?
Reference management software helps you keep track of your citations as you work, and partially automates the process of constructing bibliographies when it is time to publish. The University of Cambridge also offers support and training on several referencing systems.
Who can help me with reference conventions and formats for my academic discipline or particular project?
Your departmental librarian can help you choose the right format for references and will probably know about useful search and management tools that you have not used before. Feel free to ask them for advice.
Your college librarian is also a very good resource and is there to help.
Find your departmental and college librarian on the University's Libraries Directory.
Most people now routinely send and receive many messages every day, so their inbox can quickly become overloaded with hundreds of personal and work-related emails. Setting aside some time to organise your email will ensure that information can be found quickly and easily and is stored securely.
Why should I organise my email?
Apart from the obvious frustration and time wasted looking for that email you remember sending to someone last month, email is increasingly used to store important documents and data, often with information about the attachments in the body of the email itself. Without proper controls in place, these messages can be deleted by mistake. It is also important to remember that your work email is subject to the Data Protection Act 1998 and the Freedom of Information Act 2000, so your emails are potentially open to scrutiny.
What are the first steps to organising my email?
If your emails have got out of control there are a number of immediate steps you can take to control the problem:
- Archive your old emails. If you have hundreds of emails hanging around from more than a month ago, move them into a new folder called something like "Archive". You can always come back to these at a later date.
- Now go through your remaining inbox email by email. If an email is useless, delete it. If not, ask yourself: is it "active" - is there a specific action you, or someone else, need to take, or do you just vaguely think it is worth keeping? If the latter, move it to the archive.
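If your mail is stored in a standard mbox file (many desktop clients provide their own archive features instead), the "move old emails to an archive" step can be scripted with Python's standard `mailbox` module. A minimal sketch, assuming hypothetical inbox and archive mbox paths and using each message's Date header as its age; messages without a readable date are left in place.

```python
import mailbox
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime


def archive_old_messages(inbox_path, archive_path, days=30, now=None):
    """Move messages older than `days` from one mbox file to another.

    Returns the number of messages moved.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=days)
    inbox = mailbox.mbox(inbox_path)
    archive = mailbox.mbox(archive_path)  # created if it does not exist
    inbox.lock()
    archive.lock()
    try:
        old_keys = []
        for key, message in inbox.items():
            try:
                sent = parsedate_to_datetime(message["Date"])
            except (TypeError, ValueError):
                continue  # missing or unparseable Date header: keep in the inbox
            if sent.tzinfo is None:
                sent = sent.replace(tzinfo=timezone.utc)  # assume UTC for "-0000" dates
            if sent < cutoff:
                old_keys.append(key)
        for key in old_keys:
            archive.add(inbox[key])  # copy to the archive first...
            inbox.remove(key)        # ...then remove from the inbox
        archive.flush()
        inbox.flush()
    finally:
        archive.close()  # close() flushes and releases the lock
        inbox.close()
    return len(old_keys)
```

Copying before deleting, and flushing the archive before the inbox, means an interruption is more likely to leave a duplicate than to lose a message.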
How can I ensure my emails remain organised?
Here are some general tips to ensure your email remains organised in the long term:
- Delete emails you do not need. Remove any trivial or old messages from your inbox and sent items on a regular (ideally daily) basis.
- Use folders to store messages. Establish a structured set of folders by subject, activity or project.
- Separate personal emails. Set up a separate folder for these. Ideally, you should not receive any personal emails to your work email account.
- Limit the use of attachments. Use alternative and more secure methods to exchange data where possible (see ‘data sharing’ for options). If attachments are used, exercise version control and save important attachments to other places, such as a network drive.