Institute of the Czech National Corpus
Faculty of Arts
ORCID: 0000-0003-3977-2393
ResearchID: J-7184-2017
GitHub: vaclavcvrcek
Twitter: @CvrcekV
I have had a positive attitude to open science ever since I began working with the Czech National Corpus project at the Faculty of Arts, Charles University. As a research infrastructure, the Czech National Corpus strives to create and make accessible extensive databases of texts intended for research (linguistic, literary, historical, etc.). The concept of language corpora, as it has developed worldwide since the 1960s, has been relatively close to open science from the start, because it was clear that the enormous effort put into creating a collection of hundreds of millions of texts must not be wasted by keeping the result, a language corpus, in a vault and making it available only to a small group of its creators.
This is primarily work with publicly available language resources, of which we already have a large range at the Czech National Corpus (see www.korpus.cz). I am glad that the implementation team agrees on a policy of maximum openness: where laws and copyright permit, we try to provide data to as many users as possible with a minimum of restrictions (e.g. requiring only free registration). Unfortunately, the rules in the Czech Republic on sharing texts for academic and pedagogical purposes are still relatively restrictive, so we cannot always be as open as we would like.
There are several advantages, and together with J. Chromý's team we have tried to describe them in more detail for linguistics in the joint article Linguistics as an open and transparent discipline, Naše řeč 104(1). Among the main ones, I would mention two: synergy, since data created by one researcher can still be useful in other research; and replicability, since verification of results is key to the healthy development of any discipline.
Most importantly, don’t be afraid. It’s easier than it may seem. Services such as OSF.io or various repositories are very intuitive today, and storing data there takes no more time than you would spend cleaning out your own hard drive. It also pays off whenever you need to return to earlier research and its data.
The second important aspect is that sharing the knowledge we create in our work is our mission. For this reason, we publish articles and monographs. It’s the same with sharing data, procedures, and tools. The only difference is that we give colleagues and the public more opportunities to look into our “kitchen”.
I think the biggest obstacle is still mistrust. From some of my colleagues’ reactions, I have the impression that they are afraid that someone will call them out for mistakes or that someone else will discover some great result (a diamond hidden in the data). In my opinion, both concerns are misplaced. We all make mistakes, and unless a mistake is intentional (falsifying research), there is nothing dishonourable in pointing one out. It is the only way our knowledge can move forward. It is not pleasant for anyone to find out that they were wrong, but that belongs to our profession. The second concern is, in my opinion, a myth: a ground-breaking discovery from data that have already been used once is only possible if the data are combined with other data or approached with a radically different analytical view or method. Both, however, involve considerable added value and are definitely not achieved “without work”. We should rather be pleased that we have helped make a significant discovery.
For me, open science is especially an appeal for greater humility towards the scientific method and a joint effort to reproduce our knowledge of the world.