This is a draft. We need your help to make it better. Get involved, learn more, and help us improve the Open Definition:
The Open Definition has three key requirements for a work to be open: an open license, open access, and an open format. This page focuses on the open format:
Section 1.3 Open Format from the Open Definition version 2.0 states:
The work must be provided in a convenient and modifiable form such that there are no unnecessary technological obstacles to the performance of the licensed rights. Specifically, data should be machine-readable, available in bulk, and provided in an open format (i.e., a format with a freely available published specification which places no restrictions, monetary or otherwise, upon its use) or, at the very least, can be processed with at least one free/libre/open-source software tool.
We want to create open knowledge. To help achieve this, the Open Format requires:
The work must be provided in a convenient format so that it is easy to reuse. This requires the work to be published in a format that maximises knowledge sharing and reuse. The format may vary for different media types (e.g. image, text, tabular or geographic data).
The work must be provided in a modifiable format so it can be reused in different ways, in part or in whole. What is an appropriate modifiable form?
(contribution needed)
Data is machine-readable if it is in a format that can be easily read, written, parsed and displayed by a computer.
For example:
As another example:
The appropriate machine-readable format may vary by data type. For example, a machine-readable format for geographic data may be different from a format for tabular data.
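As a rough illustration of what "easily read and parsed by a computer" can look like in practice, the sketch below reads a small tabular dataset and a small geographic dataset using only the Python standard library. The sample data, the choice of CSV and GeoJSON, and the function names `total_budget` and `point_coordinates` are assumptions made for this example; they are not taken from the Open Definition.

```python
import csv
import io
import json

# Hypothetical tabular data: a tiny budget table in CSV form.
BUDGET_CSV = """department,year,amount
Health,2015,1200
Education,2015,950
"""

# Hypothetical geographic data: one point feature in GeoJSON form.
PLACE_GEOJSON = """{
  "type": "Feature",
  "geometry": {"type": "Point", "coordinates": [0.1218, 52.2053]},
  "properties": {"name": "Example office"}
}"""


def total_budget(csv_text):
    """Sum the 'amount' column of a CSV document."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return sum(int(row["amount"]) for row in rows)


def point_coordinates(geojson_text):
    """Return (longitude, latitude) of a GeoJSON Point feature."""
    feature = json.loads(geojson_text)
    lon, lat = feature["geometry"]["coordinates"]
    return lon, lat


print(total_budget(BUDGET_CSV))
print(point_coordinates(PLACE_GEOJSON))
```

Both datasets can be parsed in a few lines because their structure is explicit in the format itself. The same figures locked inside a scanned image or a page layout would need manual re-keying or error-prone extraction first.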
This section is based on an archived OKFN glossary and an Open Definition discussion about a harmonised Open Format definition.
See also https://www.data.gov/developers/blog/primer-machine-readability-online-documents-and-data
Providing the work in bulk means that the data can easily be downloaded as a whole in one request.
This requirement complements the Access section of the Open Definition and together they require that:
However, your data can still be open if you publish it as many individual files, although it could be argued that you are then not publishing it in a convenient form.
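One possible way to offer many individual files in bulk is to bundle them into a single archive that can be fetched in one request. The sketch below does this in memory with Python's standard `zipfile` module. The file names, their contents, and the helper names `build_bulk_archive` and `read_bulk_archive` are hypothetical. ZIP is used only as a familiar container, and whether it meets the Open Format requirements is not assessed here.

```python
import io
import zipfile

# Hypothetical dataset published as several individual files.
FILES = {
    "budget-2014.csv": "department,amount\nHealth,1100\n",
    "budget-2015.csv": "department,amount\nHealth,1200\n",
}


def build_bulk_archive(files):
    """Pack every file into one ZIP archive and return its bytes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, text in files.items():
            archive.writestr(name, text)
    return buffer.getvalue()


def read_bulk_archive(data):
    """Unpack a ZIP archive back into a {filename: text} mapping."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        return {
            name: archive.read(name).decode("utf-8")
            for name in archive.namelist()
        }


bulk = build_bulk_archive(FILES)
print(sorted(read_bulk_archive(bulk)))
```

A publisher could keep the individual files available as well, so that users who need only one year are not forced to download everything.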
An Open Data Format is a format with “a freely available published specification which places no restrictions, monetary or otherwise, upon its use”.
A freely available published specification allows:
If an open data format has no restrictions, monetary or otherwise, upon its use, then:
An Open Format is a format that “can be processed with at least one free/libre/open-source software tool”.
If a free software tool is available to process the data, then the data can be reused without anyone needing to implement their own software.
The Open Format definitions for data above allow tabular data (e.g. a National Budget) to be published as a PDF (an open format according to the definition). However, a PDF is not a convenient form for this type of data, and the Open Definition requires that “the work must be provided in a convenient and modifiable form such that there are no unnecessary technological obstacles to the performance of the licensed rights”.
So, is a PDF of a National Budget open?
Under Tim Berners-Lee’s 5 Star Open Data scheme, it is open and earns 1 star.
Based on the definition of machine-readable above, a PDF of a National Budget isn’t open. (contribution needed - is this the intent?)
It could be argued that, because the second sentence of the Open Format begins with “Specifically, data should…”, non-data works may, but are not required to:
(Contribution needed - Is it OK that these requirements are all optional for non-data works?)
Some words in the Open Definition have special meaning and are shown in bold or italics. Their meaning is defined below:
Work - denotes the item or piece of knowledge being transferred. Examples of a work include, but are not limited to: data, music, art, images, video, literary compositions, web pages and software.
Must, Required, or Shall - an absolute requirement (RFC 2119).
Must Not or Shall Not - an absolute prohibition (RFC 2119).
Should or Recommended - there may be valid reasons to ignore this requirement but the full implications must be understood and carefully weighed before choosing a different course (RFC 2119).
Should Not or Not Recommended - there may be valid reasons when the particular behaviour is acceptable or even useful, but the full implications should be understood and the case carefully weighed before implementing any behaviour described with this label (RFC 2119).
May or Optional - the item is truly optional (RFC 2119).
These improvement ideas mainly come from conversations on the discussion list.
A specification that describes an open format should be:
The work should be published in a lossless and uncompressed open format so all the original detail is retained.
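To illustrate what "all the original detail is retained" means, the sketch below contrasts a round trip that preserves every value with one that discards detail. The sample figures and the helper names `lossless_copy` and `lossy_copy` are hypothetical. JSON and rounding stand in for lossless and lossy encodings purely as an assumption for this example.

```python
import json

# Hypothetical measurements with fractional detail.
ORIGINAL = [1200.456, 950.125, 87.999]


def lossless_copy(values):
    """Serialise to JSON text and back; every value survives unchanged."""
    return json.loads(json.dumps(values))


def lossy_copy(values):
    """Round to whole numbers; the fractional detail cannot be recovered."""
    return [round(v) for v in values]


print(lossless_copy(ORIGINAL) == ORIGINAL)
print(lossy_copy(ORIGINAL) == ORIGINAL)
```

A reuser who receives only the rounded values cannot reconstruct the originals, which is the kind of loss this improvement idea aims to avoid.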
Tools like the Open Data Census and Open Data Certificates test whether data is published using an open format. This improvement idea seeks to harmonise the definition of the Open Format for data so that these tools could all point to the Open Definition, in the same way that they currently point to it for a definition of an open license and a list of conformant licenses.
Do you have another resource you’d like added below? Make the list better.
These links provide some alternate perspectives on open formats:
These lists of open formats have not been assessed as being conformant with the Open Definition: