Outbox Pattern vs Debezium

Asked by Kshitij Kohli on 12 May 2023 at 07:58 (961 views)

I am trying to understand whether there is a fundamental difference between what the two approaches are trying to achieve. My use case is landing my Postgres data in a data lake, and these are the two paved-road approaches available to me:

Option 1: Create an outbox table in my database and write to it in the same transaction as my main tables; a tool, Ceres, then picks up the change (CDC) and publishes it to Kafka.

Option 2: Connect my Postgres database to a Debezium connector; Debezium reads my WAL and continuously publishes the changes in my DB to the data lake.

At first sight, Option 2 looks like the neater and cleaner approach, with none of the overhead of writing to an outbox table. Is my deduction correct? Is the outbox pattern a legacy pattern that is now redundant, since we can accomplish the same thing more simply with Debezium?


There are 3 best solutions below

Lumix On 18 May 2023 at 05:29

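The outbox pattern is a way to solve the dual-write problem (updating your database and publishing a message atomically) without needing a two-phase commit across the database and the message broker. One way to implement it is with Debezium connectors; another is to poll the outbox table.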
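To make the polling alternative concrete, here is a minimal sketch of an outbox relay. It assumes sqlite3 as a stand-in for Postgres so it runs anywhere, a hypothetical outbox table with topic, payload and published_at columns, and a publish callback standing in for a real Kafka producer.

```python
import json
import sqlite3

# sqlite3 stands in for Postgres; the outbox columns are hypothetical.
SCHEMA = """
CREATE TABLE outbox (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    topic TEXT NOT NULL,
    payload TEXT NOT NULL,
    published_at TEXT
);
"""


def relay_once(conn, publish, batch_size=100):
    """Publish pending outbox rows in insertion order; return how many were sent."""
    rows = conn.execute(
        "SELECT id, topic, payload FROM outbox"
        " WHERE published_at IS NULL ORDER BY id LIMIT ?",
        (batch_size,),
    ).fetchall()
    for row_id, topic, payload in rows:
        # `publish` would be a Kafka producer send in a real relay.
        publish(topic, json.loads(payload))
        # Mark the row as sent only after publish returns. A crash between
        # the two steps re-sends the row next time: at-least-once delivery.
        with conn:
            conn.execute(
                "UPDATE outbox SET published_at = CURRENT_TIMESTAMP WHERE id = ?",
                (row_id,),
            )
    return len(rows)
```

A real relay would call relay_once on a timer. Because delivery is at-least-once, consumers should be prepared to see the same event twice; a log-based CDC reader such as Debezium avoids the polling query but gives the same at-least-once guarantee.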

You do not need the outbox pattern to use Debezium, though; for example, you can monitor your entity tables directly with a Debezium connector.
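As a rough illustration of monitoring entity tables directly, the sketch below builds the request body you would POST to Kafka Connect's /connectors endpoint for a Debezium PostgreSQL connector. The host, database, user, password and table names are placeholders; the property names are standard Debezium connector settings, but check them against the Debezium version you run.

```python
import json


def entity_table_connector(name, host, dbname, tables):
    """Kafka Connect request body for a Debezium Postgres connector that
    streams changes from the listed entity tables directly (no outbox).
    All argument values are placeholders for your own setup."""
    return {
        "name": name,
        "config": {
            "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
            # pgoutput is the logical decoding plugin built into Postgres 10+.
            "plugin.name": "pgoutput",
            "database.hostname": host,
            "database.port": "5432",
            "database.user": "debezium",        # hypothetical user
            "database.password": "change-me",   # hypothetical secret
            "database.dbname": dbname,
            # Debezium 2.x property (1.x used database.server.name instead).
            # Topics are named <topic.prefix>.<schema>.<table>.
            "topic.prefix": dbname,
            "table.include.list": ",".join(tables),
        },
    }


if __name__ == "__main__":
    body = entity_table_connector(
        "shop-cdc", "db.internal", "shop", ["public.orders", "public.customers"]
    )
    print(json.dumps(body, indent=2))
```

With these placeholder values, changes to public.orders would land on the topic shop.public.orders, shaped like the table rows rather than like any business event.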

To use Debezium connectors, you need to enable CDC on the database (for Postgres, this means setting wal_level to logical). CDC simply means Change Data Capture: a way to capture data changes in your database.

Debezium itself has a good article on using its connectors to implement the outbox pattern: https://debezium.io/blog/2019/02/19/reliable-microservices-data-exchange-with-the-outbox-pattern/
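In that approach, the connector captures only the outbox table and applies Debezium's Outbox Event Router single message transform (SMT). A minimal sketch of the extra settings follows, assuming a hypothetical public.outbox table that uses the SMT's default column names (id, aggregatetype, aggregateid, type, payload). The two route settings shown are the SMT's defaults, written out for clarity.

```python
def with_outbox_router(base_config, outbox_table="public.outbox"):
    """Return a copy of a Debezium Postgres connector config that captures
    only the outbox table and routes each row with the Outbox Event Router.
    `outbox_table` is a hypothetical name; the input dict is not modified."""
    return {
        **base_config,
        # Capture only the outbox table, not the entity tables.
        "table.include.list": outbox_table,
        "transforms": "outbox",
        "transforms.outbox.type": "io.debezium.transforms.outbox.EventRouter",
        # SMT defaults, spelled out: route on the aggregatetype column and
        # send each event to the topic outbox.event.<aggregatetype>.
        "transforms.outbox.route.by.field": "aggregatetype",
        "transforms.outbox.route.topic.replacement": "outbox.event.${routedByValue}",
    }
```

With these defaults, a row whose aggregatetype is "order" would be published to outbox.event.order with its payload column as the message body, so the table, not the CDC tool, defines the message shape.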

Ramiro González Maciel On 17 November 2023 at 12:49

As you mentioned, both options will work. I think the difference is that the first option makes the use of events explicit in your domain/services: you have to create the outbox event table, define which entities/aggregates can be published, add the "insert event" logic to your code, and so on.
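Here is a small sketch of what that explicit "insert event" logic can look like. It again uses sqlite3 as a stand-in for Postgres, and the orders table, the OrderPlaced event and the column names are all hypothetical. The point is that the event is a deliberately defined type, and that it is written in the same local transaction as the business row.

```python
import json
import sqlite3
from dataclasses import asdict, dataclass

# sqlite3 stands in for Postgres; table and event names are hypothetical.
SCHEMA = """
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer TEXT NOT NULL,
    total_cents INTEGER NOT NULL
);
CREATE TABLE outbox (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    aggregatetype TEXT NOT NULL,
    aggregateid TEXT NOT NULL,
    type TEXT NOT NULL,
    payload TEXT NOT NULL
);
"""


@dataclass(frozen=True)
class OrderPlaced:
    # The published contract, defined separately from the orders table shape.
    order_id: int
    customer: str
    total_cents: int


def place_order(conn, order_id, customer, total_cents):
    event = OrderPlaced(order_id, customer, total_cents)
    # Both inserts share one transaction: `with conn` commits on success
    # and rolls back if either statement raises.
    with conn:
        conn.execute(
            "INSERT INTO orders (id, customer, total_cents) VALUES (?, ?, ?)",
            (order_id, customer, total_cents),
        )
        conn.execute(
            "INSERT INTO outbox (aggregatetype, aggregateid, type, payload)"
            " VALUES (?, ?, ?, ?)",
            ("order", str(order_id), type(event).__name__, json.dumps(asdict(event))),
        )
```

If the order insert fails (for example, a duplicate id), no event row is written either, so the outbox never announces a change that did not happen.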

The first approach seems more appropriate for communication between microservices, where publishing an event is part of a service's logic and you therefore model it explicitly.

The second option seems better suited to data lake needs like yours, where you want to collect data into a data lake but are not particularly interested in modeling events.

Nitin Gaur On 31 December 2023 at 08:58

Yes, Option 2 seems the neater and cleaner approach. However, the benefit of an outbox table is that it can represent your message structure. Otherwise, you may end up introducing the message model into your main tables or hiding message-creation logic inside the CDC tool. In that sense, Option 1 is cleaner! So it depends on which style you prefer.
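To illustrate where that message-creation logic ends up with Option 2: consumers receive Debezium change events shaped like table rows (before, after, op, and so on), so something downstream has to map them to a business message. A minimal sketch, assuming a hypothetical orders table with id and customer_name columns and hypothetical message type names:

```python
def to_order_message(change_event):
    """Map a Debezium change event for a hypothetical `orders` table to the
    message consumers actually care about, hiding table internals."""
    op = change_event["op"]  # "c" create, "u" update, "d" delete, "r" snapshot read
    if op == "d":
        # With Postgres's default REPLICA IDENTITY, `before` on a delete
        # carries only the primary key columns.
        return {"type": "OrderDeleted", "orderId": change_event["before"]["id"]}
    row = change_event["after"]
    # Only selected columns are exposed; any other columns stay private.
    return {"type": "OrderUpserted", "orderId": row["id"], "customer": row["customer_name"]}
```

With an outbox table, this mapping is done by the application when it writes the event, instead of living in a consumer or in CDC tool configuration.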
