Jakub Ratajewicz
Business Analyst in the GAIA project
In the era of digitalization and sustainable development, innovative technological solutions play a crucial role in the success of financial institutions. At Raiffeisen Tech, we are committed to the green revolution and strive to introduce groundbreaking solutions that contribute to environmental protection and sustainable development.
One of our latest projects is the ESG Data Hub, implemented on the APEX platform. This initiative, along with the EU Taxonomy Tool (GAIA), has led to the formation of a new Tribe called "Accelerate Data Governance & Green Data Analytics (ADGGDA)." Interested in the details of our endeavor and the benefits it offers to clients and the entire financial sector? Read on!
ESG stands for Environmental, Social, and Corporate Governance. These three categories are the foundation for evaluating businesses, countries, and other organizations in terms of sustainability. The ESG Data Hub is designed to be a central repository providing various teams within the bank with data necessary for conducting such assessments.
We are building the ESG Data Hub on the APEX platform, an internal RBI project leveraging Databricks technology. APEX offers an environment and tools for creating data engineering and machine learning solutions. Many projects are developed on APEX concurrently, and Raiffeisen Tech has played a significant role in advancing the platform itself. Our team is developing a Data Hub for the Head Office, with the long-term goal that each bank within the Group will be responsible for its own Hub, based on our solution.
In the current, initial phase of the project, we are focused on implementing the Data Hub itself, represented in blue on the diagram. The primary data processing workflow is implemented as a Databricks Workflow. We construct this workflow declaratively, defining its configuration in YAML files and utilizing Databricks Asset Bundles. One major advantage of Databricks Workflows is the ability to easily schedule the execution of processes and set up notifications, such as via email or Teams, in case of failures.
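To make the declarative approach more concrete, here is a minimal sketch in Python of the kind of information such a job definition carries. It is not our actual configuration. The field names (`schedule`, `quartz_cron_expression`, `timezone_id`, `email_notifications`, `tasks`, `task_key`, `depends_on`) follow the Databricks Jobs settings that a bundle's YAML mirrors, while the job name, task names, schedule, and e-mail address are hypothetical. Task types (notebook, Python script, pipeline) are left out for brevity.

```python
from graphlib import TopologicalSorter
from typing import Any


def build_job_settings(notify_email: str) -> dict[str, Any]:
    """Return a job definition shaped like Databricks Jobs settings.

    All concrete values below are hypothetical placeholders.
    """
    return {
        "name": "esg_data_hub_workflow",
        "schedule": {
            # Quartz cron syntax: run every day at 06:00
            "quartz_cron_expression": "0 0 6 * * ?",
            "timezone_id": "Europe/Warsaw",
        },
        # Who gets an e-mail when a run fails
        "email_notifications": {"on_failure": [notify_email]},
        "tasks": [
            {"task_key": "ingest"},
            {"task_key": "clean", "depends_on": [{"task_key": "ingest"}]},
            {"task_key": "publish", "depends_on": [{"task_key": "clean"}]},
        ],
    }


def execution_order(settings: dict[str, Any]) -> list[str]:
    """Derive a valid run order from the declared task dependencies."""
    graph = {
        task["task_key"]: {dep["task_key"] for dep in task.get("depends_on", [])}
        for task in settings["tasks"]
    }
    unknown = {dep for deps in graph.values() for dep in deps} - graph.keys()
    if unknown:
        raise ValueError(f"depends_on refers to undefined tasks: {sorted(unknown)}")
    # TopologicalSorter raises CycleError if the dependencies form a loop
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    settings = build_job_settings("esg-team@example.com")
    print(execution_order(settings))
```

The point of the declarative style is visible here: the definition only states what depends on what, and the order of execution is derived from it rather than coded by hand.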
Each step is written in Python. Once the data has been acquired, we process it with Apache Spark, the primary tool for such tasks on Databricks. Initially, we deal with three data sources:
In the first step, data is ingested (from APIs or AWS S3 buckets), parsed, and stored in the Hub as CSV/Parquet files. We then use Delta Live Tables pipelines to incrementally clean the incoming data, normalize it, and transform it into a consistent structure. Each pipeline consists of several steps and stores the data at each processing stage in database tables, following the Medallion Architecture model.
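The staged idea behind the Medallion Architecture (raw "bronze" data, cleaned "silver" data, consolidated "gold" data) can be illustrated with a small sketch. This is plain Python working on lists of dictionaries, not Spark or Delta Live Tables code, and it processes a whole batch at once rather than incrementally. The field names `client_id` and `emissions` and the source labels are assumptions made for the example only.

```python
def to_bronze(raw_rows, source):
    """Bronze: keep records as received, only tagging where they came from."""
    return [{**row, "_source": source} for row in raw_rows]


def to_silver(bronze_rows):
    """Silver: drop unusable rows, normalize formats and types, deduplicate."""
    seen = set()
    cleaned = []
    for row in bronze_rows:
        client_id = str(row.get("client_id") or "").strip().upper()
        raw_value = row.get("emissions")
        if not client_id or raw_value in (None, ""):
            continue  # cannot be attributed to a client or has no value
        try:
            # Accept both "12.5" and "12,5" as decimal notation
            emissions = float(str(raw_value).replace(",", "."))
        except ValueError:
            continue  # value is not numeric
        key = (client_id, row["_source"])
        if key in seen:
            continue  # keep only the first record per client and source
        seen.add(key)
        cleaned.append(
            {"client_id": client_id, "emissions": emissions, "source": row["_source"]}
        )
    return cleaned


def to_gold(silver_rows):
    """Gold: one consolidated entry per client, with values kept per source."""
    consolidated = {}
    for row in silver_rows:
        consolidated.setdefault(row["client_id"], {})[row["source"]] = row["emissions"]
    return consolidated


if __name__ == "__main__":
    api_rows = [{"client_id": " c1 ", "emissions": "12,5"}, {"client_id": "", "emissions": "3"}]
    s3_rows = [{"client_id": "c2", "emissions": 7}]
    bronze = to_bronze(api_rows, "api") + to_bronze(s3_rows, "s3")
    print(to_gold(to_silver(bronze)))
```

Keeping the intermediate stages separate means a faulty cleaning rule can be corrected and re-run from the bronze data without fetching anything from the sources again.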
The cleaned data is then consolidated into a single table, the primary output of the process, which other teams will use. The final step involves sending a message to REDA (Rice Event Driven Architecture) about the addition or update of data in the Data Hub. REDA utilizes Confluent Kafka, and we send messages via Spark Structured Streaming. These messages tell data consumers what data is available and when they can retrieve it from the Data Hub.
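As an illustration of what such a notification might contain, the sketch below builds a key and a value for a single message. Spark's Kafka sink writes the `key` and `value` columns of a DataFrame, so the function returns exactly that pair as bytes. The event fields (`event_type`, `table`, `version`, `updated_at`) and the table name are hypothetical and do not represent the actual REDA message contract.

```python
import json
from datetime import datetime, timezone


def build_data_available_event(table: str, version: int, updated_at: datetime):
    """Build the (key, value) pair for a 'data updated' notification.

    The payload layout is an assumption made for this example.
    """
    if updated_at.tzinfo is None:
        raise ValueError("updated_at must be timezone-aware")
    payload = {
        "event_type": "data_updated",
        "table": table,
        "version": version,
        # Always publish timestamps in UTC, ISO 8601 format
        "updated_at": updated_at.astimezone(timezone.utc).isoformat(),
    }
    # Using the table name as the key keeps events for one table
    # in the same partition, and therefore in order
    key = table.encode("utf-8")
    value = json.dumps(payload, sort_keys=True).encode("utf-8")
    return key, value


if __name__ == "__main__":
    key, value = build_data_available_event(
        "esg_hub.gold.client_esg",  # hypothetical table name
        3,
        datetime(2024, 1, 2, 3, 4, 5, tzinfo=timezone.utc),
    )
    print(key, value)
```

A consumer that receives this message knows which table changed and when, without having to poll the Data Hub.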
The final component is Denodo. Denodo enables the creation of APIs based on various data sources, such as relational databases or Databricks, in a partially automated manner. Once consumers receive a message from REDA about new data, they can fetch that data from the ESG Data Hub through Denodo.
At this stage, we focus on acquiring and handling data to be made available to specialized teams. Once this phase has stabilized, we will be able to extract aggregate information from the collected data and share it more broadly within our organization.
ESG data is a broad concept. Each of the categories mentioned above covers numerous characteristics, which creates significant potential and opportunities in the finance sector. The data provided by our MVP will be used to assign each client a rating based on environmental impact. This rating will influence the terms and incentives available to companies that transform their business or undertake initiatives such as emission reduction efforts. Regulators already require, or will soon require, financial institutions to report client ratings and other ESG characteristics. This data will also allow us to report on and assess our Bank's portfolio. Highly emissive clients may pose future risks: as regulations become stricter, such clients are likely to face higher costs.
Currently, we are beginning to collect new data, focusing on:
The development of the ESG Data Hub is a significant step towards sustainable development at Raiffeisen Tech. The new functionalities will enhance the efficiency and transparency of financial processes, supporting our mission of environmental protection and social responsibility.