Machine learning is on the shopping list for many seeking IT solutions today. The 2020 Gartner Magic Quadrant for Metadata Management Solutions report acknowledged that, whilst market expectations for augmentation and machine learning in metadata management solutions are accelerating, the market has largely yet to realise the potential use cases and benefits of this technology.
MetadataWorks, supported by funding from Innovate UK, is investigating the potential for machine learning to support the metadata onboarding process and contribute to the standardisation of dataset attributes across organisations.
Specifically, MetadataWorks is investigating the ability of unsupervised natural language processing algorithms and neural networks to suggest mappings between the data attributes of a dataset (extracted using open-source data profiling tools) and common data models, such as OMOP or the NHS Data Dictionary.
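To make the idea of a suggested mapping concrete, here is a minimal sketch in Python. It is not the approach under investigation: it swaps the NLP algorithms and neural networks for a much simpler stand-in, cosine similarity over character trigrams of attribute names. The function names, the source attribute names and the short list of target fields are all assumptions chosen for illustration.

```python
from collections import Counter
from math import sqrt


def trigrams(name: str) -> Counter:
    """Count character trigrams in a lower-cased, space-padded attribute name."""
    text = "  " + name.lower().replace("_", " ") + "  "
    return Counter(text[i:i + 3] for i in range(len(text) - 2))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two trigram count vectors."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def suggest_mappings(source_attrs, target_attrs, top_n=1):
    """Rank target fields for each source attribute by name similarity."""
    target_vecs = {t: trigrams(t) for t in target_attrs}
    suggestions = {}
    for attr in source_attrs:
        vec = trigrams(attr)
        ranked = sorted(((cosine(vec, tv), t) for t, tv in target_vecs.items()), reverse=True)
        suggestions[attr] = [(t, round(score, 2)) for score, t in ranked[:top_n]]
    return suggestions


# Hypothetical profiled attributes and an illustrative handful of target fields.
source = ["PersonID", "birth_year"]
target = ["person_id", "gender_concept_id", "year_of_birth", "birth_datetime"]
for attr, candidates in suggest_mappings(source, target).items():
    print(attr, "->", candidates)
```

A name-only comparison like this misses synonyms and abbreviations, which is one reason to look at language models and the profiled values themselves. Either way, the output is a ranked suggestion for a person to review, not an automatic mapping.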
If successful, these tools could help data analysts and data scientists generate data documentation. They could also demonstrate the potential to transform datasets to industry and international standards, reducing data complexity and increasing data utilisation.
In an alternative use case, these machine learning techniques could help organisations identify the range of data attributes used across their data, and the potential for consolidation and standardisation.
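As a rough illustration of what identifying consolidation candidates could look like, the sketch below groups attribute names from several datasets by a normalised spelling. This is a deliberately simple stand-in that assumes exact matching after normalisation; a machine learning approach would replace it with a learned similarity measure. The dataset names, attribute names and function names are hypothetical.

```python
import re
from collections import defaultdict


def normalise(name: str) -> str:
    """Lower-case a name and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def consolidation_candidates(datasets):
    """Group (dataset, attribute) pairs that share a normalised name.

    Only groups whose attribute is spelled in more than one way are
    returned, since those are the candidates for standardisation.
    """
    groups = defaultdict(set)
    for dataset, attrs in datasets.items():
        for attr in attrs:
            groups[normalise(attr)].add((dataset, attr))
    return {
        key: sorted(members)
        for key, members in groups.items()
        if len({attr for _, attr in members}) > 1
    }


# Hypothetical attribute inventories for three organisational datasets.
inventory = {
    "admissions": ["Patient_ID", "AdmitDate"],
    "labs": ["patientId", "result"],
    "billing": ["patient id", "admit_date"],
}
for key, members in consolidation_candidates(inventory).items():
    print(key, members)
```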
We’re looking forward to presenting the results of this work in mid-2021. If you’re interested in seeing how these tools could work with your data, please get in touch!