Transformational Data Governance & Linked Data Integration
Our primary objective was to enable Kaluza to establish a robust data governance framework, improve data integration processes, and enhance data quality. Our team aimed to centralise data efforts, decentralise mapping processes, and introduce new technologies and methodologies to optimise the data platform.
Amarti collaborated with the senior leadership team and the Director of Data to perform a comprehensive discovery phase. Our team assisted in formulating and implementing the organisation’s data strategy. We advocated for necessary cultural and organisational changes to support the data initiatives and ensure operational readiness.
Data Observability and Quality:
Amarti led the requirements-gathering, education, and vendor-selection process for data observability, data quality tooling, and data catalogue solutions. We designed and implemented Kafka journey recording to ensure comprehensive tracking of data flow.
We established a data lineage framework to track data from individual Kafka services to the data warehouse and BI tools.
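To illustrate the journey-recording and lineage idea, the toy sketch below appends every component a message passes through to a per-message lineage record. All component and topic names are hypothetical, not Kaluza's actual services:

```python
import uuid
from dataclasses import dataclass, field

@dataclass
class LineageRecord:
    """One message's journey from a source service to the BI layer."""
    message_id: str
    hops: list = field(default_factory=list)

    def record_hop(self, component: str) -> None:
        """Append the component this message just passed through."""
        self.hops.append(component)

# Simulate a message travelling through the pipeline.
record = LineageRecord(message_id=str(uuid.uuid4()))
for component in ["billing-service", "kafka:billing-events",
                  "warehouse:billing_fact", "bi:billing_dashboard"]:
    record.record_hop(component)

print(" -> ".join(record.hops))
```

In a real deployment this record would travel as message metadata (for example, Kafka headers) rather than an in-process list, so lineage survives across service boundaries.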
Data Governance and Ownership:
We formed a data governance group based on the Data Mesh and Data as a Product organisational patterns, involving 16 domain champions, one from each engineering team.
Agile workshops were conducted to define the purpose, responsibilities, and boundaries of the data governance group. Collaborative ways of working were facilitated within the governance group to ensure effective data management.
Semantic Integration and Linked Data:
Amarti introduced the concept of a semantic integration layer utilising linked data, knowledge graphs, and Elasticsearch in Neptune. We also developed a web-based application for discovering and browsing data and data models. We implemented complete provenance information for each data point, enabling tracking from journey start to the data warehouse.
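In linked-data terms, per-data-point provenance is naturally expressed as triples using the W3C PROV vocabulary: a data point is generated by a journey, and the journey used a source topic. The minimal sketch below uses a plain Python set of triples rather than a real triple store, and all identifiers are hypothetical:

```python
# Toy triple store illustrating per-data-point provenance.
PROV = "http://www.w3.org/ns/prov#"

triples = {
    ("kz:reading/42", PROV + "wasGeneratedBy", "kz:journey/meter-ingest"),
    ("kz:reading/42", PROV + "atLocation",     "kz:store/warehouse"),
    ("kz:journey/meter-ingest", PROV + "used", "kz:topic/meter-readings"),
}

def objects(subject, predicate):
    """Return all objects for a (subject, predicate) pair."""
    return [o for s, p, o in triples if s == subject and p == predicate]

# Walk provenance backwards: warehouse data point -> journey -> Kafka topic.
journey = objects("kz:reading/42", PROV + "wasGeneratedBy")[0]
source = objects(journey, PROV + "used")[0]
print(journey, source)
```

The same walk expressed as a SPARQL property path over a graph database such as Neptune is what makes end-to-end tracking a single query rather than a manual trawl through logs.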
End-to-End Data Provenance Solution:
We integrated existing Kafka topics with the Linked Data solution using RML (RDF Mapping Language). We demonstrated the loading of RML mappings, data, metadata, and schema definitions for comprehensive data provenance tracking. We helped develop a web application and configured a production-ready system in AWS for seamless integration.
Data Normalisation and Mapping Efficiency:
We showcased the benefits of using RML to centralise data normalisation and mapping efforts. Amarti demonstrated how decentralising mapping efforts and domain model design among engineering teams could improve efficiency. We introduced the Kaluza platform as a means to reduce overall engineering efforts and enhance data mapping processes.
Architectural Guidance and Technological Advancements:
We provided architectural guidance for Kaluza’s Kafka infrastructure, and proposed leveraging tools such as Apache Flink/Beam for stateful stream processing and streaming databases. Additionally, we introduced relevant startups and emerging technologies for the data platform, methodology, and tooling.
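The core idea behind stateful stream processing, which engines such as Apache Flink and Beam implement with distributed, fault-tolerant state, can be shown with a toy pure-Python operator that keeps a per-key running total (this is an illustration of the pattern only, not Flink/Beam API code; names are hypothetical):

```python
from collections import defaultdict

def running_totals(events):
    """Toy stateful stream operator: maintain a per-key running sum,
    emitting the updated total after each incoming event."""
    state = defaultdict(float)  # the operator's keyed state
    for key, value in events:
        state[key] += value
        yield key, state[key]

events = [("meter-1", 1.5), ("meter-2", 2.0), ("meter-1", 0.5)]
output = list(running_totals(events))
print(output)  # [('meter-1', 1.5), ('meter-2', 2.0), ('meter-1', 2.0)]
```

A production engine adds what this sketch omits: partitioning of keyed state across workers, checkpointing for recovery, and event-time windowing.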
The work carried out by Amarti resulted in significant improvements in Kaluza’s data governance, integration processes, and overall data management capabilities. The following outcomes and benefits were achieved:
Strengthened Data Governance:
The establishment of a data governance group based on the Data Mesh and Data as a Product organisational patterns proved to be a successful approach. With 16 domain champions, one from each engineering team, the governance group fostered ownership, collaboration, and responsibility for data management.
Agile workshops conducted by Amarti facilitated the definition of purpose, responsibilities, and boundaries of the governance group. This led to a more streamlined and efficient data governance framework within the organisation.
Enhanced Data Integration and Provenance Tracking:
The introduction of a semantic integration layer utilising linked data, knowledge graphs, and Elasticsearch in Neptune significantly improved data integration processes. It enabled the organisation to search, discover, and browse data and data models effectively.
The implementation of a web-based application provided complete provenance information for each data point, offering end-to-end tracking from journey start to the data warehouse. This increased transparency and data reliability.