
Jacek Cieślak
Service Manager at APEX Service Managers project
Three of our APEX teammates took on a huge responsibility: they took the floor at this year's Data Science Summit at the PGE Narodowy stadium. It is the largest independent data science conference in the CEE region, so Raiffeisen Tech Poland could not miss it.
A venue full of people waiting to hear what we have to say. All eyes on us. Several cameras aimed in our direction, capturing our every move, and microphones weighing a ton in our hands. This is definitely not the world we live in every day. But a great opportunity came along to get to know this world from the inside. There is no better way to discover the true nature of a tech conference than to give a speech at one. So brace yourselves for the story of how we held on tight at the top of the technological wave... for a full 30 minutes!
DSS 2024 was the very first conference of this scale in which we participated as official representatives and speakers of Raiffeisen Tech Poland. Speaking of us, it's high time we introduced ourselves: Jacek Cieślak, Robert Marek, and Mateusz Wujec.
Attending the Data Science Summit was a great opportunity to explore the latest trends in data management, machine learning, and digital transformation. We had a chance to exchange experiences with experts from various industries and learn about innovative solutions used by market leaders. We particularly appreciated the possibility of participating in workshop sessions, which allowed us to learn about the latest tools and technologies used in analytical processes and data management.
During Data Science Summit 2024, we gave a talk on ‘Navigating the Data Storm – Our Journey from Data Lake on AWS to Databricks’. We shared our experiences related to the transformation of data infrastructure. In our presentation, we discussed the challenges, strategies, and benefits of migrating from a traditional Data Lake on AWS to the Databricks platform. Below is a detailed summary of each of the three parts of the talk, with emphasis on the crucial aspects discussed.
The first part of the presentation was delivered by me, Jacek. I discussed the transition from the traditional Data Lake on AWS to Databricks, emphasizing that we opted for Databricks because our existing internal solutions were not integrating seamlessly. The RBI group needed a unified platform capable of handling data ingestion and processing, and of supporting machine learning and AI applications. Additionally, we chose to decentralize the solutions developed at Head Office, recognizing that both Head Office and Network Units should operate independently, own their processes and data, and maintain their own governance.
The benefits of adopting Databricks are:
I also discussed the problems associated with the rapid growth of the platform and its integration with legacy or on-premise solutions. These usually create obstacles such as slower provisioning of and access to resources, complicated management of network connectivity and permission models across the group, and difficulties with data sharing between Head Office and NWB projects.
Finally, I referred to the Self-Service solution, which is currently under extensive development and will be delivered to users soon. But more on that in Robert's part.
During the second part of the presentation, the stage belonged to Robert Marek, who outlined the need for self-service in the context of autonomous data ingestion, data processing, machine learning, and access management. He discussed two alternative development paths for the area, weighing up the advantages and disadvantages of each.
Exploring alternative approaches:
Robert also presented tools to support the self-service approach, including alerts for monitoring data delays to enable more efficient management of data. He also noted the importance of standardizing processes to avoid unnecessary delays and complexity.
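To illustrate the idea of an alert on data delays, here is a minimal Python sketch. It is not the implementation used on our platform: the `find_delayed_datasets` helper, the dataset names, the timestamps, and the thresholds are all hypothetical and serve only to show the principle of comparing the latest arrival time of each dataset with an allowed delay.

```python
from datetime import datetime, timedelta, timezone


def find_delayed_datasets(last_arrivals, max_delays, now=None):
    """Return {dataset: delay} for every dataset that is later than allowed.

    last_arrivals: dataset name -> timestamp of the most recent data received
    max_delays:    dataset name -> maximum tolerated delay (timedelta)
    A dataset with no recorded arrival is reported with a delay of None.
    """
    now = now or datetime.now(timezone.utc)
    delayed = {}
    for name, allowed in max_delays.items():
        last_seen = last_arrivals.get(name)
        if last_seen is None:
            delayed[name] = None  # nothing has arrived yet
            continue
        delay = now - last_seen
        if delay > allowed:
            delayed[name] = delay
    return delayed


# Hypothetical example data (names and times are made up for illustration).
now = datetime(2024, 11, 20, 12, 0, tzinfo=timezone.utc)
last_arrivals = {
    "transactions": now - timedelta(hours=1),
    "customers": now - timedelta(hours=30),
}
max_delays = {
    "transactions": timedelta(hours=2),
    "customers": timedelta(hours=24),
    "rates": timedelta(hours=6),
}

for name, delay in find_delayed_datasets(last_arrivals, max_delays, now).items():
    status = "no data received yet" if delay is None else f"{delay} behind"
    print(f"ALERT: {name} is delayed ({status})")
```

In a real setup, the arrival timestamps would come from the platform's own metadata and the alert would be routed to a notification channel rather than printed.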
Lastly, he pointed out that although the platform was designed for seamless data access and user-friendly management, it is inherently complex and must operate in accordance with the requirements of our RBI group. This necessitates meticulous planning and careful management.
Last but not least, Mateusz Wujec discussed the challenges of data quality management and testing from the perspective of a data engineer. He emphasized the importance of regular data analysis, highlighting several key tools and processes:
Mateusz also indicated that the biggest challenge was managing the quality of data from external systems. He stressed that full quality improvement requires not only better tools but also collaboration with data providers.
One of the key issues raised by Mateusz was the testing of CI/CD processes. With tools such as Terraform and GitHub Actions, our infrastructure as code has improved greatly. Nevertheless, testing streams in Databricks notebooks remains a challenge that the team continues to work on.
No one will be surprised that the presentation caused us some stress. Despite some jitters, everything went according to plan. The presentation concluded with a Q&A session, which allowed us to explore the topics further. The discussion that ensued showed how popular the topic of data infrastructure transformation is. Together with other conference participants, we saw how important flexibility, collaboration, and process standardisation are in today's world of data management. Despite some difficulties, the migration to Databricks proved to be a key step in optimizing data processing. It has given our teams greater autonomy while ensuring a high level of data quality and security. It was a very challenging but also rewarding day!
We shared not only the challenges we encountered during this transformation, but also the solutions that proved to be crucial to the success of the entire process. We described the entire journey through the data storm – from planning to implementation to the final result that revolutionised our approach to data processing.
If you are curious about migrating from a Data Lake on AWS to Databricks and want to know the details of our technological journey, we encourage you to watch the video available on our Tech Blog and on YouTube. We also invite you to explore the English-language presentation that we walked through during our talk.