Choosing Formats

In planning a research project, it is important that you consider which file formats you will use to store your data. In some cases, this will be dictated by the software you are using or the conventions of your discipline. In other cases you may have to make a choice between several options.

These are likely to be some of the key factors in your decision-making:

  • what software and formats you or your colleagues have used in past projects
  • any discipline-specific norms (and any peer support that comes with them)
  • what software is compatible with hardware you already have
  • whether you have funding for new software
  • how you plan to analyse, sort, or store your data

But you should also consider:

  • what formats will be easiest to share with colleagues for future projects
  • what formats are at risk of obsolescence, because of new versions or their dependence on particular software
  • what formats will allow you to open and read your data in the future
  • what formats will be the easiest to annotate with metadata so that you and others can interpret them days, months, or years in the future

In some cases, it might be best to use one format for data collection and analysis, and then convert your data to another format for archiving once your project is complete.
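
As one illustration of converting working data to an archival format, the sketch below exports a table from a SQLite database to CSV using only the Python standard library. It assumes SQLite as the working format purely for the sake of example; the function name `export_table_to_csv` and any file or table names passed to it are hypothetical.

```python
import csv
import sqlite3


def export_table_to_csv(db_path, table, csv_path):
    """Write every row of one SQLite table to a CSV file with a header row.

    Illustrative helper only: the name and arguments are assumptions,
    not part of any particular archiving workflow.
    """
    # Quote the table name so names containing spaces or quotes still work.
    quoted = '"' + table.replace('"', '""') + '"'
    conn = sqlite3.connect(db_path)
    try:
        cursor = conn.execute(f"SELECT * FROM {quoted}")
        # cursor.description holds one entry per column; item 0 is the name.
        header = [column[0] for column in cursor.description]
        # newline="" lets the csv module manage line endings itself.
        with open(csv_path, "w", newline="", encoding="utf-8") as handle:
            writer = csv.writer(handle)
            writer.writerow(header)
            writer.writerows(cursor)
    finally:
        conn.close()
    return header
```

CSV stores only the cell values, so column types, units, and the meaning of each field would still need to be recorded separately, for example in an accompanying plain-text file.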

Best formats for preservation

If you are not aware of any disciplinary standards, these are some good file formats for the preservation of the most common data types:

  • Textual data: XML, TXT, HTML, PDF/A (Archival PDF)
  • Tabular data (including spreadsheets): CSV
  • Databases: XML, CSV
  • Images: TIFF, PNG, JPEG (note: JPEG is a 'lossy' format that loses information each time it is re-saved, so use it only if you are not concerned about image quality)
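  • Audio: FLAC, WAV, MP3 (note: MP3 is also a 'lossy' format, so prefer FLAC or WAV if you are concerned about audio quality)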
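
The list above can also be used as a simple checklist when preparing files for archiving. The following is a minimal sketch, assuming that file extensions reliably indicate format (which is not always true); the mapping labels and the function name `needs_conversion` are hypothetical and only illustrate the idea.

```python
from pathlib import Path

# Extensions drawn from the list above; the labels are illustrative only.
PRESERVATION_FORMATS = {
    ".xml": "text or database",
    ".txt": "text",
    ".html": "text",
    ".pdf": "text (only if saved as PDF/A)",
    ".csv": "tabular or database",
    ".tif": "image",
    ".tiff": "image",
    ".png": "image",
    ".jpg": "image (lossy)",
    ".jpeg": "image (lossy)",
    ".flac": "audio",
    ".wav": "audio",
    ".mp3": "audio",
}


def needs_conversion(paths):
    """Return the paths whose extension is not on the preservation list."""
    return [
        p for p in paths
        if Path(p).suffix.lower() not in PRESERVATION_FORMATS
    ]
```

For example, `needs_conversion(["results.xlsx", "notes.txt"])` would flag only `results.xlsx`. A check like this cannot tell an ordinary PDF from PDF/A, so files of that kind would still need to be inspected by hand.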

Frequently Asked Questions About Data Formats