Thoughts

Redefining the Importance of Data Provenance

by Adam Milward , March 22nd, 2021

Context & credibility the role of data provenance

Why is context & credibility important to create data provenance?

In the vibrant summer of 1911, the captivating smile of Leonardo da Vinci‚Äôs Mona Lisa mysteriously disappeared from the Louvre Museum. By the close of the 20th century, art crime had burgeoned into a flourishing criminal enterprise. This was partially because the then-existing technology couldn’t effectively detect forgeries and partly because art was effortless to transport, unlike weapons or drugs that were more easily detected. Yet, how could a trained dog distinguish between a counterfeit da Vinci and a genuine one?

Data provenance, drawn from the French term ‘provenir’, denotes the historical ownership of an asset or document. Initially practised within the realm of art history and digital libraries, its significance has now permeated into the domain of data research. As a facet of metadata, data provenance bears a significant responsibility: verifying authenticity and cultivating trust among data users.

But data provenance isn’t merely about historical context; it’s also about how the data interacts with other entities, contributing to its formation and chronicling its journey. It addresses the why, how, when, where, and who associated with data production.

Cultivating Trust through Data Provenance

The wealth of information accessible to us is both vast and continually expanding. Consequently, knowing the origin of the data we’re dealing with and whether it can be trusted has become essential.

Throughout this series, we’ve frequently highlighted the necessity for reliable data and why it forms the bedrock for deriving data value and usability. In data research, it’s seldom that data users are also the data producers. Hence, knowing who collected the data and whether it was machine-generated, transcribed, self-reported, or verified by a professional substantially affects the worth of a dataset.

Balancing Comparisons: The Power of Data Provenance

Often, the whole surpasses the sum of its parts. Combining datasets can yield immense benefits and is frequently the sole method of gaining insight into diverse populations and modalities. However, data creators have varying methodologies and approaches for data collection and processing. Data analysts must ascertain if they are comparing like with like, a concept best described as ‘comparing apples with apples’ versus ‘comparing apples with oranges’. Herein lies the power of data provenance: it forms the foundation for determining the trustworthiness of results, their reproducibility, and the reusability of the data itself.

The basis of data users’ analysis and the accountability of their research heavily hinges on the reliability and credibility of their data inputs. The provenance of metadata enables users to assess the quality, credibility, and reliability of input data. For data users, provenance is not just an optional feature; it’s a necessity.

METADATAWORKS has empowered some of the most notable data-driven organizations to produce quality analysis and forecasts by facilitating simple yet effective data value management.

Keen to witness how we can significantly enhance the quality of your data?

Engage with our team today.

Related Insights

We use Cookies

We use tools, such as cookies, to enable essential services and functionality on our site and to collect data on how visitors interact with our site, products and services. By opting in or continuing to use this site, you agree to our use of these tools for advertising and analytics.