Enterprise application integration vs data integration
Enterprise application integration (EAI) and data integration are two areas that are often confused. How do we distinguish between the two and choose the right middleware for the job? This post answers that question, and I hope it helps you navigate this space.
EAI refers to the process of combining multiple software applications, systems, and data sources into a single unified system to improve efficiency and data sharing. This often involves connecting disparate applications using APIs or middleware to enable the exchange of data and information between them. The goal of application integration is to eliminate the need for manual data entry, reduce data duplication and errors, and enable a more seamless flow of information across an organization.
Some refer to this as an ESB (Enterprise Service Bus). An ESB is a pattern, not a tool, and the message-oriented middleware you have may or may not support the pattern. These days, message-oriented middleware is often used to create and manage microservices.
Data integration is often referred to as ETL (Extract, Transform and Load) or ELT (Extract, Load and Transform). These are two different approaches that are typically used for data warehousing, business intelligence and reporting.
For the sake of this post I will refer to EAI as message-oriented middleware (the ESB side) and to data integration as data-oriented middleware (the ETL and ELT side).
Message-oriented middleware
- Provides a communication layer between applications and services within an organization, enabling them to exchange data and messages in a loosely coupled manner. It acts as a central hub to route, transform, and manage data and messages, promoting service integration and reuse. Service-oriented architecture (SOA) and ESB are the typical terms used.
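To make the "central hub" idea concrete, here is a minimal in-process sketch of routing and transformation. It is not how any particular ESB product works; the `MessageBus` class, the `order.created` topic and the default currency are all hypothetical names chosen for illustration.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Message = Dict[str, object]
Handler = Callable[[Message], None]


class MessageBus:
    """Toy hub: publishers and subscribers only share a topic name."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Handler]] = defaultdict(list)
        self._transformers: Dict[str, Callable[[Message], Message]] = {}

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def set_transformer(self, topic: str, fn: Callable[[Message], Message]) -> None:
        self._transformers[topic] = fn

    def publish(self, topic: str, message: Message) -> int:
        """Transform the message if configured, route it, return the delivery count."""
        transform = self._transformers.get(topic)
        if transform is not None:
            message = transform(message)
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(dict(message))  # each subscriber receives its own copy
        return len(handlers)


# Two hypothetical applications listen to the same topic without knowing the sender.
bus = MessageBus()
billing_inbox: List[Message] = []
shipping_inbox: List[Message] = []
bus.subscribe("order.created", billing_inbox.append)
bus.subscribe("order.created", shipping_inbox.append)

# The hub enriches the message so every consumer sees the same shape.
bus.set_transformer("order.created", lambda m: {**m, "currency": m.get("currency", "EUR")})

delivered = bus.publish("order.created", {"order_id": 42, "amount": 99.5})
```

The publisher never references billing or shipping directly, which is the loose coupling described above. Real middleware adds persistence, security and error handling on top of this idea.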
Data-oriented middleware
- Is used to extract data from multiple sources, transform it into a desired format, and load it into a target system, typically a data warehouse, for reporting and analysis. Data-oriented middleware is mainly focused on the integration of data from different systems and its preparation for analysis. ETL and ELT tools fall into this category.
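The extract, transform and load steps can be sketched in a few lines. This is only an illustration under stated assumptions: the two source lists (`CRM_ROWS`, `SHOP_ROWS`) are made-up sample data, and an in-memory SQLite database stands in for the data warehouse.

```python
import sqlite3
from typing import Dict, Iterable, List, Tuple

# Hypothetical source extracts; in practice these come from files, APIs or databases.
CRM_ROWS = [
    {"customer": " Alice ", "amount": "120.50"},
    {"customer": "bob", "amount": "80"},
    {"customer": "", "amount": "15"},        # missing customer name
]
SHOP_ROWS = [
    {"customer": "ALICE", "amount": "30.00"},
    {"customer": "Bob", "amount": "n/a"},    # invalid amount
]


def extract() -> List[Dict[str, str]]:
    """Pull rows from every source into one list."""
    return CRM_ROWS + SHOP_ROWS


def transform(rows: Iterable[Dict[str, str]]) -> List[Tuple[str, float]]:
    """Cleanse names, drop unusable rows, and aggregate totals per customer."""
    totals: Dict[str, float] = {}
    for row in rows:
        name = row["customer"].strip().title()
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # skip rows whose amount is not numeric
        if not name:
            continue  # skip rows without a customer
        totals[name] = totals.get(name, 0.0) + amount
    return sorted(totals.items())


def load(conn: sqlite3.Connection, rows: List[Tuple[str, float]]) -> None:
    """Write the prepared rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customer_totals (customer TEXT PRIMARY KEY, total REAL)"
    )
    conn.executemany("INSERT OR REPLACE INTO customer_totals VALUES (?, ?)", rows)
    conn.commit()


warehouse = sqlite3.connect(":memory:")  # stand-in for a real warehouse
load(warehouse, transform(extract()))
result = warehouse.execute(
    "SELECT customer, total FROM customer_totals ORDER BY customer"
).fetchall()
```

Note that all cleansing and aggregation happens before anything reaches the target, which is the defining trait of ETL.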
Put simply, use message-oriented middleware when you need to integrate different applications and services within an organization, facilitating communication and data exchange. Use data-oriented middleware when you need to extract data from multiple sources, transform it, and load it into a target system for reporting and analysis.
They both have their own strengths and weaknesses, and the choice between them will depend on the specific requirements of a given project.
It is (typically) more appropriate to choose a data-oriented approach:
- When the main focus is on data integration and preparation for reporting and analysis.
- When dealing with large amounts of data from multiple sources that need to be transformed and loaded into a centralized data repository.
- When the data integration requirements are well defined and stable, and the focus is mainly on batch processing.
- When cleansing of data is required.
It is (typically) more appropriate to choose a message-oriented approach:
- When the main focus is on real-time communication and integration between applications and services within an organization.
- When there is a need for a flexible and scalable communication infrastructure that can handle a high volume of messages and data.
- When the integration requirements are dynamic, and there is a need for a more agile approach to integration.
If your project focuses more on data integration and preparation for analysis, a data-oriented (ETL / ELT) approach may be the better choice. If it focuses more on real-time communication and integration between services, a message-oriented (SOA / ESB) approach may be the better choice.
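The two checklists above can be read as a rough scoring exercise. The sketch below encodes them as a hypothetical `suggest_approach` helper; the field names and the simple "count the ticks" rule are my own simplification, not a formal method, and a tie is deliberately reported as a case for considering both.

```python
from dataclasses import dataclass


@dataclass
class Requirements:
    # Criteria pointing towards a data-oriented approach
    reporting_focus: bool = False
    large_batch_volumes: bool = False
    stable_requirements: bool = False
    needs_cleansing: bool = False
    # Criteria pointing towards a message-oriented approach
    real_time_between_apps: bool = False
    high_message_volume: bool = False
    dynamic_requirements: bool = False


def suggest_approach(req: Requirements) -> str:
    """Count the ticked criteria on each side; a rule of thumb, not a verdict."""
    data_score = sum(
        [req.reporting_focus, req.large_batch_volumes,
         req.stable_requirements, req.needs_cleansing]
    )
    message_score = sum(
        [req.real_time_between_apps, req.high_message_volume,
         req.dynamic_requirements]
    )
    if data_score > message_score:
        return "data-oriented"
    if message_score > data_score:
        return "message-oriented"
    return "consider both"


reporting_project = Requirements(reporting_focus=True, needs_cleansing=True)
suggestion = suggest_approach(reporting_project)
```

A helper like this is only a conversation starter for the requirements discussion.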
It is not a clear-cut choice though...
- High volume data transfer: When there is a large volume of data that needs to be moved quickly, ETL can be a more effective solution as it can use batch processing and parallel processing to speed up the data transfer. This is changing with integration platforms as a service (iPaaS), as they tend to handle this very well.
- Complex data transformations: ETL can handle complex data transformations that may be difficult to implement in an SOA implementation. ETL is the better option especially if data needs to be cleaned, reformatted AND aggregated. Transformations can be done in either; it is the combination that wins for ETL.
- Integration with legacy systems: SOA / ESB could be more effective for integrating with legacy systems that may have limited APIs or may be difficult to integrate with an ETL approach. However, the opposite can be true as well; it depends.
- Data warehousing and Business Intelligence: ETL is well suited for data warehousing and Business Intelligence use cases, where large amounts of data need to be processed and transformed for analysis.
It's worth noting that these are not mutually exclusive, and some organizations may choose to use a combination of both to meet their specific needs. This is a good idea.
We now have a grasp of the differences and can elaborate on the details:
- An Enterprise Service Bus (ESB) is better at providing a centralized infrastructure for communication and data exchange between different applications in a service-oriented architecture (SOA) environment. An ESB acts as an intermediary between applications and supports many protocols, data formats and message exchange patterns, while also providing features like routing, transformation, security and error handling. This is the message-oriented approach.
- An Extract, Transform, Load (ETL) tool, on the other hand, is better at extracting data from multiple sources, transforming it into the desired format and loading it into a target system, usually a data warehouse, for analysis and reporting. ETL is typically used for large-scale data migration, aggregation and data warehousing, but is not designed to provide communication and integration between applications in real-time. This is the data-oriented approach.
Both SOA / ESB and ETL can extract, transform and load data, but an SOA / ESB is better suited for real-time communication and data exchange between applications, while an ETL approach is better suited for batch processing and data warehousing. This holds in some cases, not always; I am purposely creating a distinction here.
On the data side we have (at least) two approaches, ETL and ELT, and there is a distinction here as well.
When do I choose ELT or ETL?
- Real-time analytics: ELT is better suited for real-time analytics as the data can be loaded into the target system quickly and transformed there.
- Hybrid cloud architecture: ELT is better suited for hybrid cloud architecture as it allows the data to be processed in the cloud and transformed using cloud resources.
However, ETL is still preferred if the data requires extensive cleansing before it is loaded into the target system. Nothing is black and white, though: cloud ELT tools are getting pretty good here as well.
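The difference in ordering is easiest to see side by side with ETL. In the ELT sketch below the raw rows are loaded untouched and the transformation runs as SQL inside the target. The sample rows and table names (`raw_sales`, `sales_by_customer`) are hypothetical, and in-memory SQLite stands in for a cloud warehouse.

```python
import sqlite3
from typing import List, Tuple

# Hypothetical raw extract: nothing is cleaned before loading.
RAW_SALES: List[Tuple[str, str, str]] = [
    ("alice", "2023-10-01", "120.50"),
    ("ALICE", "2023-10-02", "30.00"),
    ("bob", "2023-10-02", "80"),
    ("bob", "2023-10-03", "n/a"),   # bad amount is loaded as-is
]


def load_raw(conn: sqlite3.Connection, rows: List[Tuple[str, str, str]]) -> None:
    """E and L: land the data in a staging table exactly as received."""
    conn.execute("CREATE TABLE raw_sales (customer TEXT, sold_on TEXT, amount TEXT)")
    conn.executemany("INSERT INTO raw_sales VALUES (?, ?, ?)", rows)
    conn.commit()


def transform_in_target(conn: sqlite3.Connection) -> None:
    """T: let the target engine cleanse and aggregate using its own resources."""
    conn.execute(
        """
        CREATE TABLE sales_by_customer AS
        SELECT lower(customer) AS customer,
               SUM(CAST(amount AS REAL)) AS total
        FROM raw_sales
        WHERE amount GLOB '[0-9]*'      -- keep only amounts that start with a digit
        GROUP BY lower(customer)
        """
    )
    conn.commit()


target = sqlite3.connect(":memory:")  # stand-in for the target system
load_raw(target, RAW_SALES)
transform_in_target(target)

raw_count = target.execute("SELECT COUNT(*) FROM raw_sales").fetchone()[0]
summary = target.execute(
    "SELECT customer, total FROM sales_by_customer ORDER BY customer"
).fetchall()
```

Because the raw table is kept, the transformation can be rerun or changed later without extracting from the sources again. The trade-off is that dirty data does reach the target first.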
I mentioned service-oriented architecture earlier. SOA is a widely used approach for building and organizing software systems in a modular, scalable, and flexible manner. SOA allows for the integration of independent services, each with a well-defined interface, to create larger, more complex applications. While the term "SOA" has fallen out of fashion in recent years, the principles and practices behind it remain relevant and widely used in modern software development. We now build microservices, which are basically a subset of what we did in an SOA (oversimplified, I know).
To put it all into context, some commonly used integration patterns (or integration styles, if you prefer) are:
- Batch Processing: This is a simple pattern that involves collecting data in batches and processing it in one go. It's best used when there's no real-time requirement and the data can wait for a scheduled run; as noted earlier, batch runs are also how ETL moves large volumes.
- Real-time Streaming: This pattern involves sending data in real-time as soon as it becomes available. It's best used when data needs to be processed as soon as it's generated and low latency is a requirement.
- Change Data Capture (CDC): This pattern involves capturing changes to data in a source system, for example an ERP system, and processing only the changed data in real-time. It's best used when there's a high volume of data and real-time requirements.
- Message Queuing: This pattern involves sending messages from a source system, for example the ERP, to a queue and processing them as soon as resources are available. It's best used when there's a high volume of data and real-time requirements, but data consistency is not a strict requirement.
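The last two patterns combine naturally, so here is a small sketch that captures changes and hands them to a queue. It is a simplification: production CDC tools usually read the database transaction log, whereas this example just compares two snapshots. The `capture_changes` function and the inventory rows are hypothetical.

```python
import queue
from typing import Dict, List, Tuple

Row = Dict[str, object]
Snapshot = Dict[int, Row]            # primary key -> row
Change = Tuple[str, int, Row]        # (operation, key, row)


def capture_changes(before: Snapshot, after: Snapshot) -> List[Change]:
    """Return only what changed between two snapshots of a source table."""
    changes: List[Change] = []
    for key, row in after.items():
        if key not in before:
            changes.append(("insert", key, row))
        elif before[key] != row:
            changes.append(("update", key, row))
    for key, row in before.items():
        if key not in after:
            changes.append(("delete", key, row))
    return changes


# Hypothetical inventory table at two points in time.
before: Snapshot = {
    1: {"sku": "A", "qty": 10},
    2: {"sku": "B", "qty": 5},
    3: {"sku": "C", "qty": 7},
}
after: Snapshot = {
    1: {"sku": "A", "qty": 8},    # updated
    2: {"sku": "B", "qty": 5},    # unchanged, so never sent
    4: {"sku": "D", "qty": 1},    # inserted
}

# Message queuing: the producer enqueues changes and moves on.
outbox: "queue.Queue[Change]" = queue.Queue()
for change in capture_changes(before, after):
    outbox.put(change)

# The consumer drains the queue whenever it has capacity.
processed: List[Change] = []
while not outbox.empty():
    processed.append(outbox.get())
    outbox.task_done()
```

Only three of the four source rows produce a message, which is the point of CDC: the unchanged row is never moved. The queue decouples the pace of the producer from the pace of the consumer.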
The best integration approach for a specific scenario will depend on the specific requirements and constraints of the systems involved. As we can see, each type of tooling fits differently, and if one uses the wrong tool it will NOT deliver on whatever promise led to the investment in the first place.
Message-oriented and data-oriented approaches are both popular for integrating and processing data in modern enterprise architecture. Each approach has its own advantages and disadvantages, and the best approach depends on the specific requirements of a project.
The choice of approach should be based on the goals of the project, the type of data, and the processing requirements. If you have an integration problem you would like to discuss, I would be happy to pitch in.
October 24, 2023