How to configure a Kafka cluster for batch processing

I am trying to deliver only a certain number of new rows at a time to a DB (the consumer) using Kafka Connect, and I've configured the source connector accordingly.

This is how source.properties looks:

    name=source-postgres
    connector.class=io.confluent.connect.jdbc.JdbcSourceConnector
    tasks.max=1
    batch.max.rows = 10
    connection.url=jdbc:postgresql://<URL>/postgres?user=postgres&password=post
    mode=timestamp+incrementing
    timestamp.column.name=updated_at
    incrementing.column.name=id
    topic.prefix=postgres_

This is the content of the sink properties file:

    name=dbx-sink
    batch.size=5
    connector.class=io.confluent.connect.jdbc.JdbcSinkConnector
    tasks.max=1

    # The topics to consume from - required for sink connectors like this one
    topics=postgres_users

    # Configuration specific to the JDBC sink connector.
    # We want to connect to the PostgreSQL database and auto-create tables.
    connection.url=jdbc:postgresql://<URL>:35000/postgres?user=dba&password=nopasswd
    auto.create=true

But this doesn't have any effect: whenever a new row is available, it is immediately inserted into the DB (the consumer). So I added another config parameter to the sink, batch.size=10. That also has no effect.

When I start the connect-standalone.sh script, I can see batch.max.rows = 10 printed on the console.
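
For reference, this is roughly how the worker is started and how the live configuration can be double-checked; the worker config path and the default REST port 8083 are assumptions based on a stock Kafka/Confluent setup:

    # Start a standalone Connect worker with both connector configs.
    bin/connect-standalone.sh config/connect-standalone.properties \
        source.properties sink.properties

    # In another shell: ask the worker's REST API which settings the
    # running connector actually picked up.
    curl -s http://localhost:8083/connectors/source-postgres/config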

What am I doing wrong, and how can I fix it?

Answer:

batch.max.rows sets the maximum number of rows the source connector returns in a single batch when it polls the table, so with batch.max.rows = 10 the rows are sent 10 at a time; it won't limit the number of rows sent in total. Likewise, the sink's batch.size only controls how many records the connector tries to group into a single insert, not an overall cap.
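
As a minimal sketch of how those source settings interact (assuming the Confluent JDBC source connector, whose polling frequency poll.interval.ms defaults to 5000 ms):

    # Illustrative values only.
    batch.max.rows=10
    poll.interval.ms=5000
    # If 25 new rows are pending, all 25 are still delivered; they simply
    # arrive in batches of at most 10 rows each (e.g. 10, 10, 5).
    # Capping the total would require logic outside these two settings.

In other words, batch.max.rows on the source and batch.size on the sink only shape how records are chunked in flight; neither acts as a quota on how many rows ultimately reach the destination table.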