I'm planning to join two topics as KStreams over a long window (~1 week). Assuming hundreds of millions of records accumulate in this window, how long will the joining consumer take to restart? I'm asking because I could not find any information on how many of the window's records are stored in the consumer cache.
Kafka KStream to KStream join | restart performance
192 Views Asked by Atom At
By default, data buffered in a window is stored in RocksDB, i.e., on local disk. Hence, on a restart on the same machine, nothing needs to be reloaded because the data is already available.
If you restart on a different machine, the whole content of the store must be re-read from the Kafka changelog topic that backs up the store to guarantee fault tolerance. How long this takes depends on many factors and is hard to estimate. You can, however, register a "restore callback" to monitor the restore process. That lets you run experiments and see how long a restore may take in your setup.
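As a rough sketch of such an experiment: to my understanding, the restore callback in current Kafka Streams versions is the `StateRestoreListener` interface, registered with `KafkaStreams#setGlobalStateRestoreListener` before calling `start()`. Its `onRestoreStart`, `onBatchRestored` and `onRestoreEnd` methods report offsets and record counts per store and changelog partition. The `RestoreProgress` class below is a hypothetical helper, not part of Kafka, and uses only the JDK. It turns those numbers into a throughput and a remaining-time estimate. It assumes that the offset range roughly equals the number of records, which is only an approximation for changelog topics.

```java
/**
 * Hypothetical helper (not a Kafka class): tracks restore progress for one
 * changelog partition from the values a restore callback reports.
 *
 * Assumed wiring, inside a StateRestoreListener implementation:
 *   onRestoreStart(tp, store, startingOffset, endingOffset)
 *       -> new RestoreProgress(startingOffset, endingOffset, System.nanoTime())
 *   onBatchRestored(tp, store, batchEndOffset, numRestored)
 *       -> progress.addBatch(numRestored)
 */
final class RestoreProgress {
    private final long totalToRestore; // approximated by the offset range
    private final long startNanos;
    private long restored = 0;

    RestoreProgress(long startingOffset, long endingOffset, long startNanos) {
        this.totalToRestore = Math.max(0, endingOffset - startingOffset);
        this.startNanos = startNanos;
    }

    /** Adds the record count reported for one restored batch. */
    void addBatch(long numRestored) {
        restored += numRestored;
    }

    /** Records still to restore, never negative. */
    long remaining() {
        return Math.max(0, totalToRestore - restored);
    }

    /** Average records per second since the restore started; 0 if no time has passed. */
    double recordsPerSecond(long nowNanos) {
        double elapsedSeconds = (nowNanos - startNanos) / 1e9;
        return elapsedSeconds <= 0 ? 0.0 : restored / elapsedSeconds;
    }

    /** Estimated seconds until the restore finishes; -1 if no rate is known yet. */
    double etaSeconds(long nowNanos) {
        double rate = recordsPerSecond(nowNanos);
        return rate <= 0 ? -1.0 : remaining() / rate;
    }
}
```

Logging `recordsPerSecond` and `etaSeconds` from each batch callback during a test restore on an empty machine should give a measured throughput for your record sizes and hardware, which you can then scale to the expected size of the one-week window.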