Creating documentation throughout your research project is an important component of the research process. At the very least, you will need this information to describe your research outputs in future presentations or publications. To preserve your data in a repository or share it with others, you will be expected to provide supplemental information such as citation details, an explanation of the survey methodology, sampling information, question context and coding, how and why derived variables were created, and more.
Many long-term preservation platforms and services require varying amounts of structured metadata to accompany deposited data files.
Metadata refers to the data or information that supports the discovery, understanding, and management of your research data. Good quality metadata is essential for accurate and informed usage, especially if data is to be reused or shared in the future. Metadata initiated during the planning stages of a research project decreases risks associated with data loss during and after project completion. It is critical, therefore, to start documenting your data from the very beginning of your project.
Good documentation serves several purposes:
- To enable others to reuse your data
- To facilitate preservation
- To allow replication at a later date
- To make the data understandable to others
The level of structure used to document your data will depend on the complexity of the project or data collected and the number of people involved in the project. Consider documenting the following information:
Project Level
- Document creators, collaborators, funders, and rights
- Outline the research question and rationale
- Document the date the data was gathered or analysed
- Describe the survey methodology
- Describe the sampling frame
- Describe the instruments, instrument settings, or measures used
File or Database Level
- Describe the relationship between files
- Document information contained within the files
- Identify the format files are stored in
- List and document tests or analyses performed on the file(s)
Use a readme.txt file to document information at the file or folder level
- Include information on file naming, abbreviations or acronyms used as well as contents of the file(s)
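As an illustration of the readme.txt advice above, the following minimal Python sketch builds README text that lists each file in a folder with its format and a short description. The function name `build_readme`, the layout of the output, and the idea of passing descriptions in a dictionary are assumptions made for this example, not a required format.

```python
from pathlib import Path


def build_readme(folder, notes):
    """Return README text listing each file in `folder` with its format and a note.

    `notes` maps file names to short descriptions (hypothetical structure).
    """
    folder = Path(folder)
    lines = [f"README for folder: {folder.name}", "", "Files:"]
    for path in sorted(folder.iterdir()):
        # Skip sub-folders and any existing readme so it does not describe itself.
        if path.is_file() and path.name.lower() != "readme.txt":
            file_format = path.suffix.lstrip(".").upper() or "unknown"
            description = notes.get(path.name, "no description recorded")
            lines.append(f"- {path.name} ({file_format}): {description}")
    return "\n".join(lines) + "\n"
```

The returned text could then be saved as readme.txt in the same folder and extended by hand with abbreviations, acronyms, and file naming rules.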
Variable Level
- Document not only each variable's name but also its label, which explains the variable's meaning, unit of measure, sample weighting, etc.
- This information can be recorded in a codebook
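A codebook can be as simple as a table with one row per variable. The sketch below, in Python, writes such a table as CSV text and checks whether every column in a data file has an entry. The variable names, labels, and column layout are hypothetical and only illustrate the kind of information described above.

```python
import csv
import io

# Hypothetical codebook entries; replace with the variables in your own data.
CODEBOOK = [
    {"name": "age", "label": "Age of respondent at interview", "unit": "years", "values": ""},
    {"name": "q1", "label": "Satisfaction with service", "unit": "", "values": "1=Low; 2=Medium; 3=High"},
    {"name": "wt", "label": "Sample weight", "unit": "", "values": ""},
]


def codebook_to_csv(entries):
    """Serialise codebook entries to CSV text with a fixed column order."""
    fields = ["name", "label", "unit", "values"]
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    writer.writerows(entries)
    return buffer.getvalue()


def undocumented(columns, entries):
    """Return the data columns that have no codebook entry."""
    documented = {entry["name"] for entry in entries}
    return [column for column in columns if column not in documented]
```

Running `undocumented` against the column names of a data file before deposit is one way to catch variables that were added late and never described.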
Sources of Metadata Information
- Standard information submitted in a Research Ethics Board (REB) request
- Laboratory notebooks & experimental protocols
- Questionnaires, codebooks, data dictionaries
- Software syntax and output files
- Information about equipment settings & instrument calibration
- Database schema
- Methodology reports
- Provenance information about sources of derived data
Using Standards, Taxonomies, Classification Systems
When preserving or sharing data, you can use standards, taxonomies, or classification systems to categorize and document data or other information in a widely understood way. Data repositories usually request that you use an international metadata standard. Examples include:
- Dublin Core (DC): a general-purpose standard for basic element description
- Data Documentation Initiative (DDI): an XML-based standard for describing social and behavioural science data sets
- Federal Geographic Data Committee (FGDC): a standard for geospatial description
Ontologies, Taxonomies, and Classifications
It is important to use discipline-specific ontologies, vocabularies, taxonomies, and classification systems when creating and documenting your data. Doing so standardizes information into relational schemas and ensures that concepts and descriptions are widely understood.
Ontologies
- Formal models that define concepts and their relationships within a specific domain
- A simple ontology could be created by developing naming conventions for your files and folders
Taxonomies
- Used primarily in the sciences to depict hierarchical relationships
- Example: biology (phylum, family, genus...)
Classifications
- Used extensively by governments to depict hierarchical relationships and standard descriptions of specific classes such as goods, crops, geographical units, industries, and occupations
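The point above about building a simple ontology from file and folder naming conventions can be sketched in Python as follows. The convention shown (`<project>_<datatype>_<YYYYMMDD>_v<NN>.<ext>`) and the three data types are hypothetical choices for this example; any convention works as long as it is written down and applied consistently.

```python
import re

# Hypothetical convention: <project>_<datatype>_<YYYYMMDD>_v<NN>.<ext>
NAME_PATTERN = re.compile(
    r"^(?P<project>[a-z0-9]+)_"
    r"(?P<datatype>raw|clean|derived)_"
    r"(?P<date>\d{8})_"
    r"v(?P<version>\d{2})\."
    r"(?P<ext>[a-z0-9]+)$"
)


def parse_name(filename):
    """Return the parts of a conforming file name, or None if it does not conform."""
    match = NAME_PATTERN.match(filename)
    return match.groupdict() if match else None
```

A name such as `soilstudy_raw_20240115_v01.csv` is split into its project, data type, date, version, and extension, while a name such as `final data (2).xlsx` returns `None` and can be flagged for renaming.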
Example of a Metadata Standard: Dublin Core Metadata Element Set
This fifteen-term vocabulary is considered the core set of elements for describing a resource. It is part of a larger set of vocabularies known as the DCMI Metadata Terms. The fifteen-element set is published as an ISO standard [ISO15836] and an ANSI/NISO standard [NISOZ3985].
| Element | Definition |
|---|---|
| Contributor | An entity responsible for making contributions to the resource |
| Coverage | The spatial or temporal topic of the resource, the spatial applicability of the resource, or the jurisdiction under which the resource is relevant |
| Creator | An entity primarily responsible for making the resource |
| Date | A point or period of time associated with an event in the lifecycle of the resource |
| Description | An account of the resource |
| Format | The file format, physical medium, or dimensions of the resource |
| Identifier | An unambiguous reference to the resource within a given context |
| Language | A language of the resource |
| Publisher | An entity responsible for making the resource available |
| Relation | A related resource |
| Rights | Information about the rights held in and over the resource |
| Source | A related resource from which the described resource is derived |
| Subject | The topic of the resource |
| Title | A name given to the resource |
| Type | The nature or genre of the resource |
This table was compiled from the Dublin Core Metadata Element Set, Version 1.1 document and is used under a Creative Commons Attribution 3.0 Unported Licence.
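To show how the elements in the table might be used in practice, here is a minimal Python sketch that serialises a few of them as XML using the Dublin Core element namespace (`http://purl.org/dc/elements/1.1/`). The wrapping `metadata` element, the function name, and all values are assumptions for illustration; a repository will normally prescribe its own record structure.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# The fifteen elements from the table above, in lower case as used in XML.
DC_ELEMENTS = {
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
}


def dublin_core_record(fields):
    """Build a <metadata> element with one dc:* child per (element, value) pair.

    The <metadata> wrapper is a hypothetical container chosen for this sketch.
    """
    root = ET.Element("metadata")
    for element, value in fields:
        if element not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {element}")
        child = ET.SubElement(root, f"{{{DC_NS}}}{element}")
        child.text = value
    return root


# Hypothetical values; elements may be repeated (for example, two creators).
record = dublin_core_record([
    ("title", "Example Household Survey, 2020"),
    ("creator", "Example Research Group"),
    ("date", "2020-06-30"),
    ("format", "text/csv"),
    ("rights", "CC BY 4.0"),
])
xml_text = ET.tostring(record, encoding="unicode")
```

Passing a list of pairs rather than a dictionary keeps repeated elements possible, since every Dublin Core element is optional and repeatable.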
We would like to thank the UK Data Service for use of their training materials in the creation of these modules.
We would also like to thank EDINA and Data Library, University of Edinburgh, for use of materials from the Research Data MANTRA [online course] in the creation of these modules.