What is “Data Publication”?
The concept of data publication is simple in theory: rather than relying on journal articles alone for scholarly communication, datasets can be published as first-class scholarly products, either alongside the journal articles that use them or as standalone objects with inherent value. In this sense, “publication” entails making data publicly available. Data publication is linked to the idea of “open data”, i.e., that some data should be freely available to everyone to use as they wish, without restrictions.
Why publish data?
- New mandates: Funders, journals, institutions, and other stakeholders in the research world are beginning to require that researchers make their data publicly accessible. See, for example, the data management plan requirement from the National Science Foundation, or the White House Office of Science and Technology Policy memorandum.
- Gives credit to data producers and curators (via data citation and emerging altmetrics)
- Encourages reuse of datasets and discourages duplication of effort
- Encourages proper curation and management of data
- Ensures completeness of the scientific record, as well as transparency and reproducibility of research
- Improves discoverability of datasets
Methods for publishing data
There are several options available for those interested in making their data openly available:
- A community repository. These repositories may be housed within a disciplinary organization (e.g., a professional society) or at an institution (e.g., a university with a repository for its scholars, such as the Merritt Repository that serves the UC system). Ideally, they have preservation and access plans in place that guarantee the long-term availability of the data they host.
- A “data journal.” Researchers publish their data in a journal intended solely for datasets (see this list of data journals). This option is similar to the repository option above, with the potential for additional features such as enhanced data discovery.
- Supplementary to a journal article. The data accompany a primary journal article as supporting information. Note, however, that access to such supplemental materials can be compromised years after publication due to broken links.
- A personal or lab webpage. This is a common choice for researchers who wish to share data while maintaining control of their datasets. However, such pages raise concerns about the stability and persistence of the data, as well as how easily other researchers can find them.
Is all data that is published “open data”?
In short, no. Openness implies that there are no restrictions. Researchers new to data publishing are often eager to put restrictions on the viewing, access, and use of their datasets. This is not the best approach for scientific progress, since such restrictions limit what data consumers can do with the data. For example, if an author decrees that he or she must be contacted before the data can be used and does not maintain up-to-date contact information, the dataset is not open. To ensure that a dataset is truly open, consider placing it in the public domain. Researchers can still receive recognition through the scholarly norm of citation in the bibliography, just as authors of traditional journal articles are credited. To learn more, watch this short video, “Open Data Explained”, from OERIPR Support.
Tools for publishing data
- Plan for data management and publication: DMPTool
- Create identifiers to cite and share data: EZID
- Perform a quality check, describe, and share tabular data: DataUp
- Create metadata and share data: DataShare (currently available for UC San Francisco only)
- Choose a repository to host your dataset:
For more information about data management, sharing, publishing, and archiving, follow the Data Pub blog from the California Digital Library (CDL).