By Julien Colomb | July 4, 2018
Prof. Tuuli Toivonen on 5 June 2017, University of Helsinki
It was frustrating to know that valuable data had been collected but there was no access to it. The frustration was particularly intense if the data was collected by publicly-funded institutions that did not have the resources to really analyse the data. We argued then that publicly-produced data should be mobilised for research. Or even better, we argued, it should be available for everyone without restrictions, to reduce the administrative workload and to reduce the barriers for collaboration and innovation between scientists, companies and active citizens.
Since those times I have strongly felt that if I produce data in a publicly-funded research project, it is my positive duty to share it with others, to advance further use of our research results and to reduce duplication of effort in society. Of course it is also pleasing to see your own data starting to feed into town planning or take new shapes in the hands of other researchers.
Closed data sets increase duplication of effort, add administrative hassle, allow false results to get published more easily, and reduce the speed of scientific advancements. They also increase the likelihood of eroding the resources that we have. Data with high future value may get lost on the disks of individual researchers. Also, errors might accumulate if research results cannot later be verified or tested.
It needs to be much easier than it is at the moment. It takes time for the infrastructure and practices for sharing data to become more established, and researcher training needs to pay attention to this from early on. At the moment, one needs to be enthusiastic to start sharing data because it seems difficult and messy at first. Also, a change in research culture, including journal practices and the merit system, is needed to make sharing data a default rather than an exception.
Ari Asmi, University of Helsinki
In my dissertation I collected datasets and trend analysis of particulates. As I collected the material, I had to contact many people around the world. I had heard rumors that someone might have some datasets somewhere. There was an awful lot of negotiation with each and every one about how data could be used and under what conditions. It was an enormous operation. It was then that I became aware of the importance of data issues.
When I started working with open data, it was more a philosophical question for me: when practicing science, the results must be verifiable. The idea of transparency of publicly-funded research was also important. No one should be able to reserve research material for his or her own use simply because that person thinks he/she might write another paper on the basis of the material later.
The question is not just about philosophy of science; openness of data can also serve as proof that the data has not been manipulated unethically. Transparency makes your argumentation much stronger. Furthermore, it makes it possible to show where your conclusions have been derived from.
The most frustrating thing is when researchers ask what personal benefit they gain from openness. I am often forced to justify the openness with very far-reaching arguments. Thus, the biggest problem at the moment is that there is no reward for an individual researcher. When recruiting new staff, a university could ask what kind of open datasets the researcher has published, how much those datasets have been used, and how relevant they have been to the discipline. At present, reference databases and published articles are thoroughly checked over, but datasets are just additional information at best. Currently, research data is not considered part of demonstrating a researcher’s competence.
Another positive thing is the students’ response to the current situation. They have been surprised that very often research data is not openly available. It’s strange to them, and they ask why it is not open. I then have to answer that it is because of practical reasons.
The development of the tools must also be taken into account. Tools for data processing are improving all the time. And as tools get better, publishing data becomes easier. Eventually, the effort needed for publishing might diminish considerably.