Apache Storm and Samza guarantee at-least-once delivery, which means that duplicates may appear during computation. Do we need to remove the duplicates ourselves (including writing the deduplication logic in our own code)? Take the word count problem as an example. Suppose the word 'boy' appears only once, but because of a failure or latency Storm replays it, so 'boy' is processed twice. Is the resulting count for 'boy' two? Or does Storm remove the duplicate for us, so that the result is one?
Do we need to remove duplicates ourselves with at-least-once delivery?
Asked by SherleyZ
There is 1 best solution below
Storm won't remove duplicates for you. You have to check at the start of your stream (i.e. in your spout) whether you have already processed the root message, so that you don't send it through your topology again and corrupt your counters.
The Idempotent Consumer pattern is what you should look at. One way to achieve it is to store hashes of the most recently fetched events so that you can ignore them if they are accidentally sent again. An in-memory ConcurrentHashMap can do that, as can an external cache such as Redis. Don't forget to evict entries from these structures once you are certain there is no risk of receiving the event again.
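As a rough illustration of the idempotent consumer idea applied to the word count example, here is a minimal plain-Java sketch. It is not Storm or Samza API: the class `DedupWordCounter`, its methods, and the notion that every message carries a `messageId` that stays the same across replays (for example a source offset or a content hash) are all assumptions made for this example. A bounded `LinkedHashMap` stands in for the in-memory cache and evicts the oldest remembered ID once the limit is exceeded.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical word counter that skips replayed messages (idempotent consumer sketch). */
public class DedupWordCounter {
    // Remembered message IDs, oldest first; bounded so memory does not grow forever.
    private final Map<String, Boolean> seenIds;
    private final Map<String, Long> counts = new HashMap<>();

    public DedupWordCounter(final int maxRememberedIds) {
        this.seenIds = new LinkedHashMap<String, Boolean>(16, 0.75f, false) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Boolean> eldest) {
                // Evict the oldest ID once the bound is exceeded.
                return size() > maxRememberedIds;
            }
        };
    }

    /**
     * Counts the word unless this message ID was already seen.
     * Assumption: messageId is identical for the original and any replay.
     * Returns true if the word was counted, false if it was skipped as a duplicate.
     */
    public synchronized boolean process(String messageId, String word) {
        if (seenIds.containsKey(messageId)) {
            return false; // replayed message: do not count it again
        }
        seenIds.put(messageId, Boolean.TRUE);
        counts.merge(word, 1L, Long::sum);
        return true;
    }

    public synchronized long count(String word) {
        return counts.getOrDefault(word, 0L);
    }

    public static void main(String[] args) {
        DedupWordCounter counter = new DedupWordCounter(10_000);
        counter.process("msg-1", "boy");
        counter.process("msg-1", "boy"); // replay of the same message
        counter.process("msg-2", "girl");
        System.out.println("boy=" + counter.count("boy")); // prints boy=1
    }
}
```

The bound is the trade-off to watch: if an ID is evicted before its replay arrives, the word is counted twice again, so the limit (or a time-based expiry in something like Redis) should cover the longest replay window you expect. State kept only in memory is also lost when the worker restarts, which is one reason to consider an external cache.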