Do I need to wait for background compaction to finish after creating test data to do a good read benchmark?

I am doing some benchmarks with RocksDB Java on my own application data, and I would like to be sure the created data is stored as optimally as possible before I start measuring read performance (i.e. if any background compaction or similar work is going on during or after the inserts, I would like to wait for it to complete). Is this something I need to be concerned about, and if so, how can I programmatically know when it is OK to start my read benchmark?
This is a tricky subject. The overly general answer is: test what you care about. If you care about overall system performance under a mixed read-write workload, that is probably what you should test. If you care about read performance under those conditions, then you should test reads under those conditions. (Note that the RocksDB LOG file reports operation counts and latency statistics when statistics are enabled, though those numbers don't include overhead added by the Java/JNI layer.) However, it can take hours of testing under such chaotic conditions to get reliable data about a single aspect of performance, such as read latency or maximum throughput.
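For reference, here is a minimal sketch of enabling those internal statistics and reading a latency histogram directly from Java, rather than scraping the LOG file. It assumes a reasonably recent RocksJava version (where `Options.setStatistics` exists), and the `/tmp/bench-db` path is just a placeholder:

```java
import org.rocksdb.HistogramData;
import org.rocksdb.HistogramType;
import org.rocksdb.Options;
import org.rocksdb.RocksDB;
import org.rocksdb.Statistics;

public class StatsSketch {
    public static void main(String[] args) throws Exception {
        RocksDB.loadLibrary();
        try (Statistics stats = new Statistics();
             Options options = new Options()
                     .setCreateIfMissing(true)
                     .setStatistics(stats); // collect internal counters and latency histograms
             RocksDB db = RocksDB.open(options, "/tmp/bench-db")) { // placeholder path

            db.put("key".getBytes(), "value".getBytes());
            db.get("key".getBytes());

            // Latencies as measured inside RocksDB, in microseconds;
            // JNI/Java overhead is not included in these numbers.
            HistogramData get = stats.getHistogramData(HistogramType.DB_GET);
            System.out.println("Get p50: " + get.getMedian());
            System.out.println("Get p99: " + get.getPercentile99());
        }
    }
}
```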
If you are willing to sacrifice some statistical validity for more reliability (i.e. faster, more repeatable measurements), you can exercise just the read path. As you note, you then want to keep background compactions from running during the benchmark so that the read path is consistently isolated. For this I recommend re-opening the database as read-only and then performing your reads. Alternatively, you can wait for pending compactions to finish by periodically polling the DB property kNumRunningCompactions until it is zero (perhaps several times in a row, since one compaction finishing can trigger another). Either approach generally leaves the LSM in some random, average-ish state that reflects how reads will perform in an active read-write system; since the particular LSM state can vary considerably from run to run, you might want to average over several such states. Both options are sketched below.
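A rough sketch of both options in RocksJava follows. `kNumRunningCompactions` is exposed to `getProperty` as the string `"rocksdb.num-running-compactions"`; the polling interval, the three-consecutive-zeroes rule, and the DB path are my own choices, not anything prescribed:

```java
import org.rocksdb.Options;
import org.rocksdb.RocksDB;
import org.rocksdb.RocksDBException;

public class ReadBenchSetup {
    static final String DB_PATH = "/tmp/bench-db"; // placeholder path

    // Option 1: poll until no compactions are running, several times in a
    // row, since one compaction finishing can kick off the next.
    static void awaitCompactions(RocksDB db)
            throws RocksDBException, InterruptedException {
        int quietChecks = 0;
        while (quietChecks < 3) {
            long running = Long.parseLong(
                    db.getProperty("rocksdb.num-running-compactions"));
            quietChecks = (running == 0) ? quietChecks + 1 : 0;
            Thread.sleep(500);
        }
    }

    public static void main(String[] args) throws Exception {
        RocksDB.loadLibrary();
        // Option 2: re-open read-only. No flushes or compactions can run,
        // so the LSM shape stays frozen exactly as the writer left it.
        try (Options options = new Options();
             RocksDB db = RocksDB.openReadOnly(options, DB_PATH)) {
            // ... run the read benchmark against db ...
        }
    }
}
```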
The problem with running a full compaction before testing read performance is that it leaves the LSM in an "optimized" state, so reads will be about as fast as they can possibly be. If your actual workload really is read-only after a bulk load and compaction, then by all means test this way, but for most workloads it has low validity as a predictor of real-world read performance.
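If that best-case state is in fact what you want, a full compaction from Java might look like this sketch: flush first so memtable contents are included, then `compactRange()` with no arguments, which compacts the entire key space and blocks until done (the path is again a placeholder):

```java
import org.rocksdb.FlushOptions;
import org.rocksdb.Options;
import org.rocksdb.RocksDB;

public class FullCompaction {
    public static void main(String[] args) throws Exception {
        RocksDB.loadLibrary();
        try (Options options = new Options();
             RocksDB db = RocksDB.open(options, "/tmp/bench-db")) { // placeholder path
            // Persist memtable contents to SST files first.
            try (FlushOptions flushOptions = new FlushOptions().setWaitForFlush(true)) {
                db.flush(flushOptions);
            }
            // Compact the whole key range; returns only when compaction
            // is finished, leaving the LSM in its best-case read shape.
            db.compactRange();
        }
    }
}
```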
If you are doing A-B testing on a change that doesn't affect how the DB is written, then the best approach is to build a single DB and test read performance under both A and B configurations on that DB, opened read-only. You can even run the A and B tests simultaneously so that each is similarly affected by any noise from other processes on the system.
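As a sketch of that setup: both configurations can be opened read-only against the same pre-built DB. To my understanding, read-only opens don't take the DB's write lock, so the two handles can coexist; the specific option varied here (`cacheIndexAndFilterBlocks`) is just an illustrative stand-in for whatever change B is testing:

```java
import org.rocksdb.BlockBasedTableConfig;
import org.rocksdb.Options;
import org.rocksdb.RocksDB;

public class AbReadBench {
    public static void main(String[] args) throws Exception {
        RocksDB.loadLibrary();
        String path = "/tmp/bench-db"; // placeholder: the one shared, pre-built DB

        try (Options optsA = new Options(); // configuration A: defaults
             Options optsB = new Options() // configuration B: the change under test
                     .setTableFormatConfig(new BlockBasedTableConfig()
                             .setCacheIndexAndFilterBlocks(true));
             RocksDB dbA = RocksDB.openReadOnly(optsA, path);
             RocksDB dbB = RocksDB.openReadOnly(optsB, path)) {
            // ... run the A and B read benchmarks, interleaved or in
            // parallel, so both see the same system noise ...
        }
    }
}
```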
And of course one of the big challenges is that performance characteristics can change dramatically for small DBs vs. large DBs, and large DBs take a very long time to construct.