In 2016, our paper “Quality Assessment of Linked Data: A Survey” was published in the Semantic Web Journal. This paper was a systematic review (see this blog I wrote on how to conduct such reviews) in which we analyzed 30 core articles on Linked Data quality assessment and provided a comprehensive list of 18 quality dimensions and 69 metrics. The paper has over 300 citations (according to Google Scholar).
I have presented this work several times over the past years and got the opportunity to present it again yesterday at the ConQuaire workshop on Data Quality and Reproducibility. Michel was invited to give a talk, but he already had other commitments at the same time, and since the topic was closely related to my work, he recommended me and the organizers gladly agreed :).
The workshop started with presentations on the results of the ConQuaire project (Continuous quality control for research data to ensure reproducibility), followed by a talk about the GO FAIR initiative and my presentation on the different data quality dimensions, metrics, and tools, and how these relate to the FAIR principles. Here are my slides: https://www.slideshare.net/amrapalijz/data-quality-and-the-fair-principles. There were two very interesting talks about using GitLab and Jupyter notebooks as a means to enable reproducibility. The last talk was about “Open Reproducible Research in the Geosciences: Obstacles, Solutions, and Incentives”, see slides here.
After the talks there was a very interesting panel discussion on “Is there a crisis of computational reproducibility in science? If yes: How do we solve it?”, which I was part of along with four other panelists and a moderator. I spoke about the “Artificial intelligence faces reproducibility crisis” article, which looked at 400 algorithms and found that only 6% shared code, only a third shared data, and only half shared pseudocode! The reasons were that the code was a work in progress, was owned by a company, was held back by a researcher facing competition, or was simply lost. There were some interesting questions and discussions on the panel that I jotted down:
- “It is not a crisis but a problem,” said one panelist, and “it’s not a problem per se, but something definitely needs to be done about it,” said another
- Is it a technological or social problem, or both? There are technological advances that need to happen to, for example, automatically check whether code is reproducible
- In terms of a social problem, it is not necessarily the journals that should be made responsible for ensuring that papers contain reproducible components, but instead the scientific community (reviewers), who should create a set of guidelines to ensure reproducibility
- Then there was a discussion around rewarding those who abide by the reproducibility “rules” and giving them due credit/reputation points as a means of incentivizing people to produce reproducible content
- There was a question of whether it is necessary to be “open” in order to be reproducible
- The moderator asked us about our personal experiences the first time we were asked to reproduce our own data, which led to rather embarrassing and disappointing stories
- This also led to the question of whether and why reproducibility is only gaining momentum now, when it would already have been a “problem” 20 years ago
- An audience member raised an important question: is code sometimes not published because people are embarrassed by its poor quality or by negative results?
There aren’t necessarily answers to all of these questions, but we are at the right moment to think about reproducibility, especially with the boom of data and technologies, and something needs to be done about it!
And now, here are some tweets from the participants!