Embedded Debezium configurations for dynamic table and filters addition without connector restart


I'm using an embedded Debezium engine (v1.8.1) within my Spring Boot application. Our application lets users create workflows for their businesses and define triggers that start those workflows. A trigger can be based on a data change in the database: users can watch any table and define filter criteria on any of its columns. We use embedded Debezium to capture all data change events (CRUD) for that table, and when a row changed by a CRUD operation matches the column condition, a trigger fires and starts the workflow. For example, in an order and delivery system, the orders table is watched on the column 'STATUS', and when it matches the value 'COMPLETED', the delivery workflow is started for all customers whose orders are in completed status.

To achieve the above functionality, I have the following questions:

  1. Which snapshot mode/settings are efficient/optimal for the above functionality?

  2. How can tables be added dynamically? Users can add a new table to be watched at any point in time. I read about signal tables and the following configuration required to use them:

"signal.data.collection": "schemaname.debezium_signal",
"table.include.list": "schemaname.tb1,schemaname.tb2,schemaname.debezium_signal"

I plan to create the signal table and add it to table.include.list before the connector is brought up for the first time. From then on, whenever a user adds a new table to be watched (say schemaname.tb3), our application would insert a row into the signal table (insert into debezium_signal values (1,'execute-snapshot','{"data-collections": ["schemaname.tb3"]}')) so that the table is picked up for an incremental snapshot and watched for data change events afterwards. However, it seems that table.include.list must also be updated to include the new table (schemaname.tb3) in order to capture its change events, and that requires a connector restart. Is there a way to add tables dynamically without restarting the connector?

  3. How can different transforms/filters be applied to each table? I want different user-defined filters or criteria to be matched for each table when selecting rows. For example, schemaname.tb1.STATUS == 'COMPLETED' for tb1 and schemaname.tb2.delivered_date > some_date for tb2. Currently, the filters that we define in the configuration are applied to all tables, and we don't have the option to configure them for specific tables. These criteria or filters should also be added dynamically whenever the user defines them in the application. How do we achieve this without a connector restart? Moreover, the documentation says that the search criteria column should be an indexed column (for efficient performance).

  4. When Debezium runs on Kafka Connect, the connector configuration can be updated through REST endpoints. Is there anything similar for the embedded engine, or do we need to build our own REST APIs to update the configuration?

  5. Since we are going to use an embedded Debezium engine, what are the optimal heap memory and other JVM options for processing change events from a database with millions of records without crashing the connector?

  6. I'm planning to use the handleBatch() handler to handle batch events when more than one row satisfies the user-defined criteria. Is there any drawback to using the batch handler compared with the usual handleEvent()?
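To make the per-table filter question more concrete, one option I am considering is to keep the criteria out of the connector configuration and evaluate them in the application as each change event arrives. Below is a minimal sketch of that idea. TableFilterRegistry is a hypothetical class of my own, not a Debezium API, and it assumes each change event has already been reduced to a table name plus a map of column values. It only addresses changing filters at runtime, not the table.include.list problem.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Predicate;

/**
 * Hypothetical in-application registry of user-defined row filters, keyed by
 * fully qualified table name. Not part of Debezium.
 */
final class TableFilterRegistry {

    // table name -> predicate over the changed row's column values
    private final Map<String, Predicate<Map<String, Object>>> filters =
            new ConcurrentHashMap<>();

    /** Adds or replaces the filter for a table at runtime. */
    void register(String table, Predicate<Map<String, Object>> filter) {
        filters.put(table, filter);
    }

    /** Removes the filter for a table, if one is registered. */
    void unregister(String table) {
        filters.remove(table);
    }

    /** Returns true only if a filter is registered for the table and the row satisfies it. */
    boolean matches(String table, Map<String, Object> row) {
        Predicate<Map<String, Object>> filter = filters.get(table);
        return filter != null && filter.test(row);
    }
}
```

With something like this, a trigger for tb1 could be registered as registry.register("schemaname.tb1", row -> "COMPLETED".equals(row.get("STATUS"))), and the event handler would call registry.matches(table, row) before starting a workflow. The trade-off is that every change event for a watched table still reaches the application and is filtered there.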

Kindly advise.
