Is watermark in Structured Streaming always set using processing time or event time or both?
Is watermark based on processing time or event time or both?
Asked by Jacek Laskowski
In Structured Streaming 2.2, the streaming watermark is tracked based on event time, as defined by the eventTime column passed to the Dataset.withWatermark operator. That gives you an event-time watermark by default.
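As a minimal sketch of the event-time case, the following assumes Spark 2.3 or later (for the built-in rate source) and treats the source's timestamp column as a stand-in for a real event-time column. The object name, window size, and delay threshold are illustrative choices, not something the question prescribes.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, window}

// Hypothetical object name, used only for this illustration.
object EventTimeWatermarkSketch {

  // Declares `timestamp` as the event-time column and counts rows per window.
  // The "10 seconds" delay and "5 seconds" window are arbitrary example values.
  def windowedCounts(events: DataFrame): DataFrame =
    events
      .withWatermark("timestamp", "10 seconds")
      .groupBy(window(col("timestamp"), "5 seconds"))
      .count()

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("event-time-watermark-sketch")
      .getOrCreate()

    // The rate source emits rows with `timestamp` and `value` columns.
    // Assumption: `timestamp` plays the role of the event time here.
    val events = spark.readStream
      .format("rate")
      .option("rowsPerSecond", "5")
      .load()

    val query = windowedCounts(events).writeStream
      .format("console")
      .outputMode("update")
      .option("truncate", "false")
      .start()

    // Run for a short while, then stop, so the sketch terminates on its own.
    query.awaitTermination(15000)
    query.stop()
    spark.stop()
  }
}
```

With a real source you would pass the name of your own event-time column to withWatermark instead of timestamp.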
But your initial Dataset may have no event-time column, in which case you can generate one at processing time using the current_date or current_timestamp functions, or in some other way. That would give you a processing-time watermark (based on the custom-generated column).

In the most generic solution, using KeyValueGroupedDataset.flatMapGroupsWithState, you can pick one of the pre-defined strategies or write a custom one. That's why it is called a solution for Arbitrary Stateful Aggregations in Structured Streaming.
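The two options above can be sketched as follows, again assuming Spark 2.3 or later for the rate source. The first helper adds a column generated with current_timestamp and uses it as the watermark column. The second keeps a running count per key with flatMapGroupsWithState and GroupStateTimeout.ProcessingTimeTimeout. The object name, column name, key function, and durations are hypothetical and chosen only for illustration.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}
import org.apache.spark.sql.functions.{col, current_timestamp, window}
import org.apache.spark.sql.streaming.{GroupState, GroupStateTimeout, OutputMode}

// Hypothetical object name, used only for this illustration.
object ProcessingTimeSketch {

  // Adds a column generated at processing time and uses it as the watermark column.
  def withProcessingTimeWatermark(input: DataFrame): DataFrame =
    input
      .withColumn("processing_time", current_timestamp())
      .withWatermark("processing_time", "10 seconds")

  // Running count per key, kept in arbitrary state with a processing-time timeout.
  def countWithTimeout(
      key: Long,
      values: Iterator[Long],
      state: GroupState[Long]): Iterator[(Long, Long)] =
    if (state.hasTimedOut) {
      // No new rows arrived for this key within the timeout:
      // emit the last count and drop the state.
      val last = state.get
      state.remove()
      Iterator((key, last))
    } else {
      val updated = state.getOption.getOrElse(0L) + values.size
      state.update(updated)
      state.setTimeoutDuration("30 seconds")
      Iterator((key, updated))
    }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("processing-time-sketch")
      .getOrCreate()
    import spark.implicits._

    // Keep only `value`, so the input has no event-time column of its own.
    val values = spark.readStream.format("rate").load().select("value")

    // Option 1: windowed counts over the generated processing-time column.
    val windowed = withProcessingTimeWatermark(values)
      .groupBy(window(col("processing_time"), "5 seconds"))
      .count()

    // Option 2: arbitrary stateful aggregation with a pre-defined timeout strategy.
    val perKey: Dataset[(Long, Long)] = values.as[Long]
      .groupByKey(_ % 3)
      .flatMapGroupsWithState(
        OutputMode.Update(),
        GroupStateTimeout.ProcessingTimeTimeout())(countWithTimeout _)

    val q1 = windowed.writeStream.format("console").outputMode("update").start()
    val q2 = perKey.writeStream.format("console").outputMode("update").start()

    // Run for a short while, then stop, so the sketch terminates on its own.
    spark.streams.awaitAnyTermination(15000)
    q1.stop()
    q2.stop()
    spark.stop()
  }
}
```

Swapping GroupStateTimeout.ProcessingTimeTimeout() for GroupStateTimeout.EventTimeTimeout() would require a watermark on the input and a call to state.setTimeoutTimestamp instead of state.setTimeoutDuration.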