Defining Data Provenance
In the summer of 1911, Leonardo da Vinci’s smiling portrait, the Mona Lisa, vanished from the Louvre Museum. By the end of the twentieth century, art crime had become one of the most profitable criminal activities, partly because the technology of the time was not advanced enough to detect a forgery and partly because art was so easy to transport (unlike weapons or drugs, which sniffer dogs could easily detect). After all, what trained canine could really tell the difference between a knock-off da Vinci and a genuine one?
Derived from the French word provenir ("to come from"), provenance refers to the history of ownership of a resource or document. It has long been tracked in art history and digital libraries, and now in research data. The last of these has become increasingly important for data users, who must account for the input data in their research. As a type of metadata, data provenance wears a heavy crown: it must confirm authenticity and foster trust among data users.
Data provenance is not only about history. It also covers the interactions between the data and the other entities that contributed to its creation, serving as a witness to the data’s journey. Why and how was the data produced? When, where and by whom? Data provenance answers these questions.
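To make those questions concrete, here is a minimal sketch of what a provenance record could look like in Python. The class name, field names and sample values are all hypothetical, chosen only to mirror the why, how, when, where and by whom questions above. They are not taken from any particular metadata standard or product.

```python
from dataclasses import dataclass, asdict
from datetime import date


@dataclass(frozen=True)
class ProvenanceRecord:
    """Hypothetical provenance metadata attached to one dataset."""
    dataset: str               # what the record describes
    produced_by: str           # by whom
    produced_on: date          # when
    location: str              # where
    method: str                # how
    purpose: str               # why
    derived_from: tuple = ()   # upstream datasets, if any


def unanswered(record):
    """Return the names of provenance questions the record leaves blank."""
    return [name for name, value in asdict(record).items()
            if name != "derived_from" and value in ("", None)]


# Illustrative values only; the "where" question is deliberately left blank.
record = ProvenanceRecord(
    dataset="clinic_visits_2021",
    produced_by="Example Clinic",
    produced_on=date(2021, 6, 30),
    location="",
    method="self-reported questionnaire",
    purpose="service evaluation",
)

print(unanswered(record))  # ['location']
```

A check like unanswered() is one simple way to flag a dataset whose journey is only partly documented before anyone builds an analysis on it.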
A Foundation of Trust
The information we have available to us is vast and ever expanding. With this breadth of data, it has become crucial to know the source of the data we’re interacting with and whether such data can be trusted.
We have spoken many times in this series about the need for trustworthy data and why it is a prerequisite for deriving value and usability from data. In data research, data users are unlikely to be the producers of the data they work with. Knowing who collected the data, and whether it was machine generated, transcribed, self-reported or verified by a professional, has a huge impact on the value of a dataset.
Comparing Apples and Oranges
Often the whole is greater than the sum of its parts. Linking datasets can have huge benefits and often it’s the only way to get insight into diverse populations and modalities. However, different data creators apply different methodologies and approaches to collecting and processing data. The question analysts have to ask themselves is whether they are ‘comparing apples with apples’ or ‘apples with oranges’. Here is where data provenance flexes its muscles: by being the bedrock for determining the trustworthiness of results, reproducibility of those results and reusability of the data itself.
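As a rough illustration of the apples-and-oranges check, the sketch below compares the provenance of two datasets before they are linked. The function name, the field names (collection_method, unit, population) and the sample records are assumptions made for this example. Real provenance metadata will carry whatever fields its creators chose to record.

```python
def provenance_mismatches(a, b, fields=("collection_method", "unit", "population")):
    """Return the fields on which two provenance records disagree."""
    return {f: (a.get(f), b.get(f)) for f in fields if a.get(f) != b.get(f)}


# Hypothetical provenance for two datasets an analyst wants to link.
survey = {
    "collection_method": "self-reported",
    "unit": "kg",
    "population": "adults",
}
registry = {
    "collection_method": "clinician-verified",
    "unit": "kg",
    "population": "adults",
}

diff = provenance_mismatches(survey, registry)
print(diff)  # {'collection_method': ('self-reported', 'clinician-verified')}
```

An empty result suggests apples with apples on the fields checked. A non-empty one does not forbid linking, but it tells the analyst exactly which differences in methodology need to be accounted for.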
The analysis that data users produce, and the accountability of their research, rest heavily on the trustworthiness and credibility of their data inputs. Provenance metadata allows users to assess the quality, credibility and trustworthiness of input data. For data users, provenance is not an option but a necessity.