I am trying to understand whether there is a fundamental difference between what these two approaches are trying to achieve. My use case is landing my Postgres data in a data lake, and these are the two paved-road approaches available to me.
Option 1. Create an outbox table in my database and write to it in the same transaction as my main tables. A tool called Ceres then picks up the change (CDC) and publishes it to Kafka.
Option 2. Connect my Postgres database to a Debezium connector. Debezium automatically reads my WAL and continuously publishes the changes in my DB to the data lake.
At first sight, Option 2 looks like the neater and cleaner approach, with no overhead of writing to an outbox table. Is my deduction correct? The Outbox pattern looks like a legacy pattern that could now be redundant, since we can accomplish the same thing in a simpler, neater way using Debezium?
The Outbox pattern is a way to solve the dual-write problem (updating your database and publishing a message atomically) without needing a two-phase commit. One way to implement it is with Debezium connectors; another is to poll the outbox table.
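To make the polling variant concrete, here is a minimal sketch. It uses Python's built-in `sqlite3` as a stand-in for Postgres so it runs anywhere, and the table names, the `place_order` and `poll_outbox` helpers, and the `publish` callback are all hypothetical, not part of Ceres or Debezium.

```python
import json
import sqlite3

# sqlite3 stands in for Postgres here; the schema is a made-up example.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT NOT NULL);
    CREATE TABLE outbox (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        event_type TEXT NOT NULL,
        payload TEXT NOT NULL,
        published INTEGER NOT NULL DEFAULT 0
    );
""")

def place_order(conn, order_id, item):
    # One transaction: the business row and the outbox row
    # are committed together or rolled back together.
    with conn:
        conn.execute("INSERT INTO orders (id, item) VALUES (?, ?)", (order_id, item))
        conn.execute(
            "INSERT INTO outbox (event_type, payload) VALUES (?, ?)",
            ("OrderCreated", json.dumps({"id": order_id, "item": item})),
        )

def poll_outbox(conn, publish):
    # Polling relay: hand unpublished rows to `publish` in insertion
    # order, then mark each one as published.
    rows = conn.execute(
        "SELECT id, event_type, payload FROM outbox WHERE published = 0 ORDER BY id"
    ).fetchall()
    for row_id, event_type, payload in rows:
        publish(event_type, payload)
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))
    return len(rows)

sent = []  # stands in for a Kafka producer
place_order(conn, 1, "book")
poll_outbox(conn, lambda event_type, payload: sent.append((event_type, json.loads(payload))))
print(sent)
```

Because the row is published before it is marked, a crash between the two steps causes a re-send, so consumers of this sketch should expect at-least-once delivery. A log-based reader such as Debezium replaces `poll_outbox` but leaves `place_order` unchanged.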
You do not need the Outbox pattern to use Debezium, though. For example, you can monitor your entity tables directly with a Debezium connector.
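As a rough illustration of monitoring entity tables directly, the sketch below builds a registration payload for the Debezium Postgres connector. The property names are the ones I believe the connector documents (`topic.prefix` is the Debezium 2.x name); the hostname, credentials, database name, table names, and the `entity_table_connector` helper are placeholders I made up.

```python
import json

def entity_table_connector(name, tables):
    # Hypothetical helper: returns a Kafka Connect registration payload.
    # All connection values below are placeholders, not real settings.
    return {
        "name": name,
        "config": {
            "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
            "plugin.name": "pgoutput",
            "database.hostname": "localhost",
            "database.port": "5432",
            "database.user": "debezium",
            "database.password": "change-me",
            "database.dbname": "appdb",
            "topic.prefix": "app",
            # Capture the entity tables themselves; no outbox table involved.
            "table.include.list": ",".join(tables),
        },
    }

connector = entity_table_connector("orders-cdc", ["public.orders", "public.customers"])
print(json.dumps(connector, indent=2))
```

A payload like this is normally sent to the Kafka Connect REST API (`POST /connectors`). Note that consumers then see row-level changes in your internal table schema, which is the main trade-off against publishing purpose-built events through an outbox table.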
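If you want to use Debezium connectors, you need to enable CDC on the database. CDC simply means Change Data Capture: a way to capture data changes in your database. For Postgres, that means turning on logical decoding (`wal_level = logical`) so that Debezium can read the changes from the WAL.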
Debezium itself has a good article about using their connectors to implement the outbox pattern: https://debezium.io/blog/2019/02/19/reliable-microservices-data-exchange-with-the-outbox-pattern/
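For orientation, combining the two looks roughly like the sketch below: the connector watches only the outbox table and Debezium's outbox event router transform turns each inserted row into an event. The `transforms.*` keys and the `EventRouter` class name are taken from my reading of the Debezium documentation, so double-check them against your version; the connection values, the `public.outbox` table name, and the `outbox_connector` helper are assumptions of mine.

```python
import json

def outbox_connector(name, outbox_table="public.outbox"):
    # Hypothetical helper; connection values are placeholders.
    return {
        "name": name,
        "config": {
            "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
            "plugin.name": "pgoutput",
            "database.hostname": "localhost",
            "database.port": "5432",
            "database.user": "debezium",
            "database.password": "change-me",
            "database.dbname": "appdb",
            "topic.prefix": "app",
            # Only the outbox table is captured, not the entity tables.
            "table.include.list": outbox_table,
            # Route each outbox row to a topic using the event router transform.
            "transforms": "outbox",
            "transforms.outbox.type": "io.debezium.transforms.outbox.EventRouter",
        },
    }

payload = outbox_connector("orders-outbox")
print(json.dumps(payload, indent=2))
```

With its default settings the event router expects particular column names in the outbox table, so the table layout should follow the article and the connector documentation rather than this sketch.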