
ANWB - Solution Architect DataHub
Introduction of a new data integration concept and realisation of the self-service platform to support all data initiatives within the ANWB
ANWB - AWS Solution Architect DataHub
Project Overview
At the beginning of 2019 I introduced the DataHub concept at ANWB. As Solution Architect I provided technical leadership to a newly established team of cloud and data engineers to realise this concept as a self-service platform.
This data integration concept brings together worlds that have traditionally been kept apart: ETL, Data Warehouses and Business Intelligence on one side, and transactional and operational systems on the other. It can also serve the emerging disciplines of Machine Learning, Data Science and AI. Traditionally, data for all kinds of applications is exported, made available and replicated multiple times without real visibility of where it goes and what it is used for. The platform makes integration easier, gives immediate insight into where data goes, and supports data governance.
The platform started as a proof of concept: by realising 4 use cases we showed that high-quality data integration solutions could easily be created with AWS Serverless technology, as an alternative to the creaking infrastructure of the enterprise service bus (ESB) and the ETL platform.
The platform largely consisted of two frameworks: a streaming framework and a file/batch framework. Data imported via streaming processes could, if the receiving party required it, be exported again via file processing. All import and export processes existed and ran independently of each other. Because we only used serverless technology, scaling was never a problem. Transactional databases could be replicated via Change Data Capture and further processed as streaming data or as files.
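The idea that a Change Data Capture feed can be handled either as a stream or as a file can be illustrated with a small sketch. The event shape (`op`, `key`, `row`) and the function names `apply_changes` and `to_ndjson` are assumptions made for this illustration and are not the platform's actual format or code.

```python
import json


def apply_changes(snapshot, changes):
    """Replay CDC events, in order, onto a dict keyed by primary key.

    Hypothetical event shape: {"op": "insert" | "update" | "delete",
    "key": <primary key>, "row": <full row image, absent on delete>}.
    """
    for change in changes:
        key = change["key"]
        if change["op"] == "delete":
            snapshot.pop(key, None)
        else:
            # Inserts and updates both carry the full row image here.
            snapshot[key] = change["row"]
    return snapshot


def to_ndjson(snapshot):
    """Render the current state as newline-delimited JSON for a file export."""
    rows = (row for _, row in sorted(snapshot.items()))
    return "\n".join(json.dumps(row, sort_keys=True) for row in rows)


changes = [
    {"op": "insert", "key": 1, "row": {"id": 1, "name": "a"}},
    {"op": "insert", "key": 2, "row": {"id": 2, "name": "b"}},
    {"op": "update", "key": 1, "row": {"id": 1, "name": "c"}},
    {"op": "delete", "key": 2},
]
state = apply_changes({}, changes)
export = to_ndjson(state)
```

The same list of change events is consumed one at a time, as a stream would deliver them, and the resulting state is then written out in a file-friendly format for a receiving party that prefers batches.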
Self Service Platform
All components were set up in such a way that data suppliers and users could create and modify them themselves. 75% was configuration; the rest consisted of either SQL queries to select the data or a framework Lambda function that you fill in to transform streaming events.
The platform is fully documented, from first steps and tutorials to a description of all available components for creating integrations with all kinds of data technologies: OData, CDC, FTP, SMB, Kinesis, S3, Kafka, Snowflake and SQL. In addition, one could create custom integrations, for example with vendor APIs.
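As a rough sketch of what filling in a framework function for streaming events could look like, the Python example below separates a user-written `transform` hook from a generic Kinesis-triggered Lambda handler. The hook name, the field names (`memberId`, `status`) and the filtering rule are hypothetical; only the layout of the Kinesis event (base64-encoded payload under `Records[].kinesis.data`) follows the format AWS Lambda delivers.

```python
import base64
import json
from typing import Optional


def transform(event: dict) -> Optional[dict]:
    """Hypothetical user-supplied hook: reshape one event, or drop it."""
    if event.get("status") == "test":
        return None  # returning None filters the event out
    return {"member_id": event["memberId"], "status": event["status"].upper()}


def handler(lambda_event, context=None):
    """Generic part: decode each Kinesis record and apply the hook."""
    results = []
    for record in lambda_event["Records"]:
        payload = base64.b64decode(record["kinesis"]["data"])
        transformed = transform(json.loads(payload))
        if transformed is not None:
            results.append(transformed)
    return results
```

In a split like this, the user only touches `transform`, while decoding, iteration and delivery stay in the shared framework code.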
AWS Autonomy
The use of AWS was still in its infancy within ANWB at the start of this trajectory. During the preparatory phase I started by setting up guardrails and automatically hardening new AWS accounts. This allowed my team and me to take full ownership of the technology stack of our solutions without having to submit change requests to central infra teams with every release.
Ultimately this basis grew within ANWB into a full-fledged Cloud Platform team that uses AWS best practices such as Control Tower to roll out accounts for new teams or projects.
Technology Stack
Python, CDK, API Gateway, Lambda, S3, Glue, Athena, DynamoDB, Step Functions, Kinesis, SNS, SQS