How to share research data

There are multiple ways to share your research data. A common, yet not particularly suitable, approach is sending the data via e-mail upon request. This approach not only complicates access to the data for users but also burdens the authors, who have to manage the requests individually. Additionally, there is a risk that the researcher might lose the data due to hardware failure or human error, or, if a long time has passed since the publication of the article, might no longer remember which file was used for the data analysis. Sharing research data via e-mail also does not allow the user to cite the data properly. Therefore, it is recommended to use a different method for sharing data.

This section takes a closer look at three approaches to sharing data:

Supplementary Materials to a research article
Data repositories
Data journals

Supplementary Materials to a research article

Some publishers offer the option of attaching Supplementary Materials to an article. In this way, you can add, for example, raw data or code to your article; the publisher then publishes the materials online and links to them from your published article. If you choose this option, keep in mind that unless the publication is Open Access, the copyright might be transferred to the publisher. Moreover, this option would not allow readers to reuse the data or cite it independently of the main publication.

This method of sharing might be suitable, for instance, if you would like to share a larger table with measured values that would normally be a part of the article but would not fit onto a journal page. For publishing other types of data, it would be more suitable to deposit the data in a data repository and add a link to the dataset to your paper.

Data repositories

The best way to preserve your data - whether you decide to share them or not - is to deposit them in a data repository.

To increase the impact of your data, you should deposit them in a subject-specific repository. Subject-specific repositories bring together researchers from a particular scientific discipline who share their data with one another, and they are usually better equipped to meet the needs of that community, for example, when it comes to the types of data the researchers share. An example of a subject-specific repository is LINDAT/CLARIAH-CZ for linguistic data and tools, which is developed at the Institute of Formal and Applied Linguistics at MFF UK.

When choosing a suitable subject-specific repository, it is a good idea to use one that is already established in your research domain - you can ask your colleagues where they deposit their data, or use the international registry of data repositories re3data.org.

If you cannot find a suitable subject-specific repository, you can deposit your data in a general-purpose repository, which stores data from all scientific disciplines. The most commonly used general-purpose repositories are Zenodo, Figshare, and Dryad. The Generalist Repository Comparison Chart can help you select a general-purpose repository.

When choosing a suitable repository, check the following as well:

Does the repository assign a persistent and unique identifier (e.g., DOI)? Thanks to a persistent identifier, your data are more easily findable and citable.
Is the repository certified as a ‘trusted data repository’? If the repository is certified, it is more likely that your data will be well looked after.
Does the repository enable open access to your data? If you decide to share your data, this is key information.
Does the repository license your data? Does it offer clear terms and conditions for data reuse? It is important that others know what they can and cannot do with your data.
Does the repository provide a landing page for your dataset with metadata? Metadata help others find your data, understand what they are, and cite them.
Does the repository enable versioning? If you update your dataset, you can upload it as a new version of the original dataset. The new version is given its own identifier, and users can easily find out which version is the latest or which version was used in a particular study.

You can easily check some of this information at re3data.org. In the upper right corner of each entry, there is a series of pictograms that tell you, for example, whether the repository uses persistent identifiers and whether it is open or certified. You can find more information in the detailed entry or on the website of the repository.
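
When comparing several candidate repositories, it can help to record the answers to the checklist above in a structured form. The following is only a minimal Python sketch of that idea: the class RepositoryProfile, its field names, and the function unmet_criteria are hypothetical names invented for this illustration, and the example answers are made up rather than taken from any real repository.

```python
from dataclasses import dataclass, fields


@dataclass
class RepositoryProfile:
    """Yes/no answers to the checklist questions (hypothetical structure)."""
    persistent_identifier: bool       # assigns a DOI or similar identifier
    certified: bool                   # certified as a trusted data repository
    open_access: bool                 # enables open access to the data
    clear_licence: bool               # licenses data, clear terms of reuse
    landing_page_with_metadata: bool  # dataset landing page with metadata
    versioning: bool                  # supports dataset versions


def unmet_criteria(profile):
    """Return the names of checklist items answered with 'no'."""
    return [f.name for f in fields(profile) if not getattr(profile, f.name)]


# Made-up answers for an imaginary repository, for illustration only.
example = RepositoryProfile(
    persistent_identifier=True,
    certified=False,
    open_access=True,
    clear_licence=True,
    landing_page_with_metadata=True,
    versioning=False,
)

print(unmet_criteria(example))
```

The list of unmet criteria is a prompt for a closer look at the repository's detailed entry, not a verdict; which criteria matter most depends on your data and your discipline.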

An example of the entry for LINDAT/CLARIN in re3data.org

This method of sharing is suitable for both standalone datasets and for underlying data for published articles. If you are publishing underlying data for published articles, remember to include a link to the dataset in your publication, and a link to the publication in the metadata of your data (preferably in the form of a persistent identifier, such as DOI).
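
The cross-linking described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the metadata format of any particular repository: the field names (title, identifier, related_identifiers, relation) and the relation value isSupplementTo are chosen for the example, and both DOIs are placeholders that do not refer to real records. The only fixed convention used is that a DOI can be written as a link by prefixing it with https://doi.org/.

```python
import json


def doi_to_url(doi):
    """Turn a DOI, with or without a common prefix, into an https://doi.org/ link."""
    doi = doi.strip()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    return "https://doi.org/" + doi


def dataset_metadata(title, dataset_doi, article_doi):
    """Build a minimal metadata record that points back to the article.

    The field names are hypothetical; a real repository defines its own schema.
    """
    return {
        "title": title,
        "identifier": doi_to_url(dataset_doi),
        "related_identifiers": [
            {
                "relation": "isSupplementTo",  # assumed relation label
                "identifier": doi_to_url(article_doi),
            }
        ],
    }


# Placeholder DOIs for illustration only.
record = dataset_metadata(
    "Example measurement data",
    "10.1234/example.dataset",
    "doi:10.1234/example.article",
)
print(json.dumps(record, indent=2))
```

In practice, most repositories offer a field for related publications in their deposit form, so the same link can usually be entered there without writing any code.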

Data journals

Data journals mirror the traditional model of scientific publication through articles and combine data sharing in repositories with publishing a data paper (also known as a data descriptor or a data note). A data paper is analogous to a traditional research paper: it can be cited and can be reported in OBD (the current research information system used at Charles University).

The structure of data papers may vary depending on the requirements of individual journals, but a fundamental characteristic is that they describe a specific, publicly accessible dataset, not an analysis conducted on it. A data paper should provide information on what the data are, how and where they were created and so on, and the paper should also contain a link back to the dataset (ideally via a persistent identifier such as DOI). Generally, the publishers should not host the data; instead, the dataset should be deposited in a trusted open access repository so that even if the paper might have restricted access, the dataset would still be available.

Just like traditional publications, data journals follow a standard peer review process. However, there may be differences in what the reviewers are asked to assess and in whether the peer review process is open.

Listed below are some examples of data journals and you can find further examples here.

Scientific Data - mainly natural science disciplines
Earth System Science Data - geosciences
Journal of Open Archaeology Data - archaeology
Biodiversity Data Journal - biodiversity
GigaScience - life and biomedical sciences
Journal of Open Research Software - research software

Useful Resources

Chavan, Vishwas & Lyubomir Penev. 2011. The data paper: a mechanism to incentivize data publishing in biodiversity science. BMC Bioinformatics 12, S2. https://doi.org/10.1186/1471-2105-12-S15-S2

Gould, Julie. 2014. How to publish your data in a data journal. Naturejobs Blog. http://blogs.nature.com/naturejobs/2014/12/04/how-to-publish-your-data-in-a-data-journal/

Stall, Shelley, Maryann E. Martone, et al. 2020. Generalist Repository Comparison Chart. Zenodo. http://doi.org/10.5281/zenodo.3946720

Last change: August 20, 2024 09:59
