
Robert Marek
System Analyst at Raiffeisen Tech
APEX is a great example of collaboration between engineers from Poland, Kosovo, and Romania. In this article, I will describe the fundamentals of how the data layer functions within the platform and introduce some key terms related to APEX.
APEX stands for The Analytics Platform EXperience. The platform is the successor to the RBI Group Advanced Analytics Data Lake and Workspace.
The APEX platform is distinguished by several key features that set it apart from traditional services provided by the Head Office (HO). First and foremost, it is characterized by a decentralized architecture, meaning it is distributed both logically and physically. In the context of tenancy, within each Network Unit, every team operates as a fully independent tenant, providing greater flexibility and autonomy in management. APEX is also a self-service platform, offering dedicated services that enable user management, data sharing with other tenants, and data importing and publishing.
Importantly, the platform is managed by the APEX team, which means that users do not need advanced technical knowledge of cloud infrastructure to effectively use its available functionalities.
APEX was developed to replace the previous data lake model used within Raiffeisen Tech, which relied on AWS along with processing via EMR and Airflow. This model required a very rigid approach to processing structures, making cooperation between Data Scientists and Data Engineers more difficult. The APEX platform aims to standardize the organization's approach to data, facilitating automation and scalability across the entire data science cycle (model training, testing, validation, etc.).
The initiative started in 2021, and the platform is continuously evolving. Some of the key milestones in its development include:
APEX is used by many project teams, including RBI Head Office and Raiffeisen Tech Poland’s team.
All of these teams use the APEX platform to develop their products. The integrated development environment, along with broad access to methods for automating data engineering and model creation, accelerates the delivery of new product versions. Integration with the organization's internal data ecosystem, along with the Data Unit and Data Share systems, enables seamless collaboration between Data Scientists and Data Engineers. The ability to scale computing resources makes it possible to optimize technical solutions for price/performance ratio.
The APEX platform is tightly integrated with the AWS cloud ecosystem. It also supports integration with external systems, such as Power BI, through standard data consumption methods like JDBC/ODBC or SFTP.
One of the key components of the APEX ecosystem is the Databricks platform, provided through a WebUI panel.
Within APEX, Databricks enables, among others:
The concepts of Data Unit and Data Share are an important part of the data layer. Understanding them is crucial for understanding how data is shared between tenants.
A Data Unit, in simple terms, is a representation of a data location. In the context of APEX, it refers to a location in the AWS cloud (S3), either within a given Network Unit account or within the Data Lake.
The primary types of Data Units are:
Data Share is a logical representation of data sharing for a given tenant. Simply creating a Data Unit does not grant data access to any tenants; a dedicated Data Share is required.
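The relationship between the two concepts can be illustrated with a minimal Python sketch. This is not the APEX API: the class names, the `Catalog` helper, the tenant names, and the S3 URI are all hypothetical and only model the rule that a Data Unit alone grants no access, while a Data Share does.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataUnit:
    """Hypothetical model of a Data Unit: a named data location in S3."""
    name: str
    s3_uri: str


@dataclass(frozen=True)
class DataShare:
    """Hypothetical model of a Data Share: exposes one Data Unit to one tenant."""
    unit_name: str
    tenant: str


@dataclass
class Catalog:
    """Illustrative in-memory registry; not part of APEX."""
    units: dict = field(default_factory=dict)
    shares: set = field(default_factory=set)

    def create_unit(self, unit: DataUnit) -> None:
        # Registering a location does not grant access to anyone.
        self.units[unit.name] = unit

    def create_share(self, unit_name: str, tenant: str) -> None:
        # A share can only point at a Data Unit that already exists.
        if unit_name not in self.units:
            raise KeyError(f"unknown Data Unit: {unit_name}")
        self.shares.add(DataShare(unit_name, tenant))

    def can_read(self, tenant: str, unit_name: str) -> bool:
        # Access is decided purely by the presence of a matching share.
        return DataShare(unit_name, tenant) in self.shares


# Usage with made-up names.
catalog = Catalog()
catalog.create_unit(DataUnit("sales", "s3://example-bucket/sales/"))
before_share = catalog.can_read("team-b", "sales")

catalog.create_share("sales", "team-b")
after_share = catalog.can_read("team-b", "sales")
other_tenant = catalog.can_read("team-c", "sales")
```

In this sketch, `before_share` is `False` because only the Data Unit exists, `after_share` is `True` once the share for `team-b` is created, and `team-c` still has no access. How the owning tenant reads its own data is outside the scope of the sketch.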
In simple terms:
A few words about the APEX user interface. Key functionalities are described, showing how users can interact with the platform. Screenshots can be added or configuration options described.
The current priority is the integration of Unity Catalog, which will provide many additional capabilities to platform users, such as:
In addition, the self-service layer is being actively developed, enabling the delegation of workspace management tasks to end users, without the need to contact the Head Office. Soon, the option to add a vector database to selected workspaces will also be made available.
Although this article provides only a general overview of the APEX data layer architecture, we hope it has brought some clarity to how the platform operates. With its decentralized architecture and flexible data management options, APEX is a modern and versatile solution for analytical needs within the RBI Group. More information about data acquisition and processing in APEX will be covered in the next article.