I have a Cassandra table and the keys look like this:
PRIMARY KEY (("k1", "k2"), "c1", "c2")) WITH CLUSTERING ORDER BY ("c1" DESC, "c2" DESC);
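For context, the full table definition would look roughly like this (the column types are my assumption, inferred from the literals in the traced queries; the table name `feed` comes from the trace):

```sql
CREATE TABLE feed (
    k1 text,
    k2 text,
    c1 text,   -- assumed; the ISO-8601 literal suggests it could also be a timestamp
    c2 text,
    PRIMARY KEY ((k1, k2), c1, c2)
) WITH CLUSTERING ORDER BY (c1 DESC, c2 DESC);
```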
When I fully constrain a query, it takes significantly longer than when I leave out the last clustering key. It also performs an "Adding to feed memtable" step that the less-constrained query does not. Why is this? I know this query previously did not add the entry to the memtable, because I have custom code that runs whenever things are added to the memtable. That code should only run when things are inserted or modified, but it started running when I was merely querying items.
Edit: I should have mentioned both queries return 1 row and it is the same record.
activity | timestamp | source | source_elapsed | client
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------+---------------+----------------+------------
Execute CQL3 query | 2017-09-05 18:09:37.456000 | **.***.**.237 | 0 | ***.**.*.4
Parsing select c2 from feed where k1 = 'AAA' and k2 = 'BBB' and c1 = '2017-09-05T16:09:00.222Z' and c2 = 'CCC'; [SharedPool-Worker-1] | 2017-09-05 18:09:37.456000 | **.***.**.237 | 267 | ***.**.*.4
Preparing statement [SharedPool-Worker-1] | 2017-09-05 18:09:37.456000 | **.***.**.237 | 452 | ***.**.*.4
Executing single-partition query on feed [SharedPool-Worker-3] | 2017-09-05 18:09:37.457000 | **.***.**.237 | 1253 | ***.**.*.4
Acquiring sstable references [SharedPool-Worker-3] | 2017-09-05 18:09:37.457000 | **.***.**.237 | 1312 | ***.**.*.4
Merging memtable contents [SharedPool-Worker-3] | 2017-09-05 18:09:37.457000 | **.***.**.237 | 1370 | ***.**.*.4
Key cache hit for sstable 22 [SharedPool-Worker-3] | 2017-09-05 18:09:37.463000 | **.***.**.237 | 6939 | ***.**.*.4
Key cache hit for sstable 21 [SharedPool-Worker-3] | 2017-09-05 18:09:37.463000 | **.***.**.237 | 7077 | ***.**.*.4
Key cache hit for sstable 12 [SharedPool-Worker-3] | 2017-09-05 18:09:37.463000 | **.***.**.237 | 7137 | ***.**.*.4
Key cache hit for sstable 6 [SharedPool-Worker-3] | 2017-09-05 18:09:37.463000 | **.***.**.237 | 7194 | ***.**.*.4
Key cache hit for sstable 3 [SharedPool-Worker-3] | 2017-09-05 18:09:37.463000 | **.***.**.237 | 7249 | ***.**.*.4
Merging data from sstable 10 [SharedPool-Worker-3] | 2017-09-05 18:09:37.463000 | **.***.**.237 | 7362 | ***.**.*.4
Key cache hit for sstable 10 [SharedPool-Worker-3] | 2017-09-05 18:09:37.463001 | **.***.**.237 | 7429 | ***.**.*.4
Key cache hit for sstable 9 [SharedPool-Worker-3] | 2017-09-05 18:09:37.463001 | **.***.**.237 | 7489 | ***.**.*.4
Key cache hit for sstable 4 [SharedPool-Worker-3] | 2017-09-05 18:09:37.463001 | **.***.**.237 | 7628 | ***.**.*.4
Key cache hit for sstable 7 [SharedPool-Worker-3] | 2017-09-05 18:09:37.463001 | **.***.**.237 | 7720 | ***.**.*.4
Defragmenting requested data [SharedPool-Worker-3] | 2017-09-05 18:09:37.463001 | **.***.**.237 | 7779 | ***.**.*.4
Adding to feed memtable [SharedPool-Worker-4] | 2017-09-05 18:09:37.464000 | **.***.**.237 | 7896 | ***.**.*.4
Read 1 live and 4 tombstone cells [SharedPool-Worker-3] | 2017-09-05 18:09:37.464000 | **.***.**.237 | 7932 | ***.**.*.4
Request complete | 2017-09-05 18:09:37.464092 | **.***.**.237 | 8092 | ***.**.*.4
activity | timestamp | source | source_elapsed | client
-------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------+---------------+----------------+------------
Execute CQL3 query | 2017-09-05 18:09:44.703000 | **.***.**.237 | 0 | ***.**.*.4
Parsing select c2 from feed where k1 = 'AAA' and k2 = 'BBB' and c1 = '2017-09-05T16:09:00.222Z'; [SharedPool-Worker-1] | 2017-09-05 18:09:44.704000 | **.***.**.237 | 508 | ***.**.*.4
Preparing statement [SharedPool-Worker-1] | 2017-09-05 18:09:44.704000 | **.***.**.237 | 717 | ***.**.*.4
Executing single-partition query on feed [SharedPool-Worker-2] | 2017-09-05 18:09:44.704000 | **.***.**.237 | 1377 | ***.**.*.4
Acquiring sstable references [SharedPool-Worker-2] | 2017-09-05 18:09:44.705000 | **.***.**.237 | 1499 | ***.**.*.4
Key cache hit for sstable 10 [SharedPool-Worker-2] | 2017-09-05 18:09:44.705000 | **.***.**.237 | 1730 | ***.**.*.4
Skipped 8/9 non-slice-intersecting sstables, included 5 due to tombstones [SharedPool-Worker-2] | 2017-09-05 18:09:44.705000 | **.***.**.237 | 1804 | ***.**.*.4
Key cache hit for sstable 22 [SharedPool-Worker-2] | 2017-09-05 18:09:44.705000 | **.***.**.237 | 1858 | ***.**.*.4
Key cache hit for sstable 21 [SharedPool-Worker-2] | 2017-09-05 18:09:44.705000 | **.***.**.237 | 1908 | ***.**.*.4
Key cache hit for sstable 12 [SharedPool-Worker-2] | 2017-09-05 18:09:44.705000 | **.***.**.237 | 1951 | ***.**.*.4
Key cache hit for sstable 6 [SharedPool-Worker-2] | 2017-09-05 18:09:44.705001 | **.***.**.237 | 2002 | ***.**.*.4
Key cache hit for sstable 3 [SharedPool-Worker-2] | 2017-09-05 18:09:44.705001 | **.***.**.237 | 2037 | ***.**.*.4
Merged data from memtables and 6 sstables [SharedPool-Worker-2] | 2017-09-05 18:09:44.705001 | **.***.**.237 | 2252 | ***.**.*.4
Read 1 live and 4 tombstone cells [SharedPool-Worker-2] | 2017-09-05 18:09:44.705001 | **.***.**.237 | 2307 | ***.**.*.4
Request complete | 2017-09-05 18:09:44.705458 | **.***.**.237 | 2458 | ***.**.*.4
cqlsh> show version
[cqlsh 5.0.1 | Cassandra 3.7 | CQL spec 3.4.2 | Native protocol v4]
This is a great question, and you've (helpfully) provided all of the information we need to answer it!
Your first query is a point lookup, because you specify both clustering keys; the second is a slice.
If we look at the traces, the obvious difference is in how the sstables are selected. The fully constrained (point) query logs

Merging memtable contents

followed by a key cache hit for every sstable, while the slice query logs

Skipped 8/9 non-slice-intersecting sstables, included 5 due to tombstones

That's a pretty good hint that the two queries take different read paths. You could use it as a starting point for a code dive, but long story short: the filter you use for a point read means the memtable and sstables are queried in a different order. For point reads, Cassandra visits them in timestamp order, newest first, stopping early once no remaining sstable can contain newer data; for slices, it first eliminates sstables whose clustering ranges can't intersect the slice. In the 3.x source this is the split between the two single-partition read paths in SinglePartitionReadCommand (a timestamp-ordered names read versus a slice read that skips non-intersecting sstables), and the comments on those methods describe exactly this.
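To make the difference concrete, here is a toy sketch of the two strategies in plain Python. This is not Cassandra code; the sstable representation, the names, and the data layout are all invented for illustration.

```python
# Toy model of the two single-partition read strategies.
# An "sstable" here is a dict with a max write timestamp, a clustering-key
# range [min_key, max_key], and rows mapping key -> (timestamp, value).

def point_read(key, memtable, sstables):
    """Point (names) read: visit sources newest-first and stop as soon as
    the value we already have is newer than anything a remaining source
    could contain."""
    sources = [memtable] + sorted(sstables, key=lambda s: s["max_ts"], reverse=True)
    best = None       # (timestamp, value) of the newest cell seen so far
    touched = 0       # how many sources we actually had to open
    for src in sources:
        # Early exit: no remaining source can beat what we already have.
        if best is not None and best[0] >= src["max_ts"]:
            break
        touched += 1
        if key in src["rows"]:
            ts, val = src["rows"][key]
            if best is None or ts > best[0]:
                best = (ts, val)
    return best, touched

def slice_read(lo, hi, memtable, sstables):
    """Slice read: first drop sstables whose clustering range cannot
    intersect [lo, hi], then merge the survivors with the memtable."""
    survivors = [s for s in sstables
                 if not (s["max_key"] < lo or s["min_key"] > hi)]
    merged = dict(memtable["rows"])
    for s in survivors:
        for k, (ts, val) in s["rows"].items():
            if lo <= k <= hi and (k not in merged or ts > merged[k][0]):
                merged[k] = (ts, val)
    rows = {k: v for k, v in merged.items() if lo <= k <= hi}
    return rows, len(survivors)
```

The point read can stop after the memtable if the memtable holds the newest value, while the slice read never even opens sstables outside the requested range - two different ways of avoiding work, which is why the traces look so different.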
In your case, if the returned row happened to be in the memtable, the first (point) read would be faster. Also, since this partition spans 8-9 sstables, you're likely using STCS or TWCS; with LCS you'd likely compact that partition down to ~5 sstables and (again) get more predictable read performance.
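For reference, switching the table to LCS is a one-line schema change (assuming the table is named `feed`, as in the trace):

```sql
ALTER TABLE feed
WITH compaction = {'class': 'LeveledCompactionStrategy'};
```

Expect a burst of compaction activity after the change while existing sstables are re-leveled.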
Neither read path should add anything to the memtable by default, unless you're read repairing - that is, unless the values mismatch between replicas, or the background read repair chance fires. Note that a slice query is FAR more likely to mismatch than a point query, because it's scan-based: you'll read-repair any and all of the deletion markers (tombstones) with matching values of
c1 = '2017-09-05T16:09:00.222Z'
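If you want to rule out background read repair as the trigger, you can zero out the probabilistic table options (this is the 3.x option set, if I recall correctly; it does not disable the blocking repair that happens on a digest mismatch):

```sql
ALTER TABLE feed
WITH read_repair_chance = 0
AND dclocal_read_repair_chance = 0;
```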
Edit: I missed one line in your first trace:

Defragmenting requested data

That's the real answer here: it indicates that you're using STCS and the read touched too many sstables, so the whole row was copied into the memtable to make future reads faster (hence the "Adding to feed memtable" step you noticed). It's a little-known optimization in STCS that kicks in when a point read touches too many sstables; you can work around it by using LCS.