Delta Import Causing Solr Response Time to Double or Even Worse


Solr version: 6.0.0

Spikes at delta import - every 2 hours

A spike can be observed every 2 hours, i.e. whenever the delta import runs:

  1. Response time doubles.
  2. CPU load average doubles, and if the import runs close to peak hour it can increase up to 10 times.

My cache settings are:

    <filterCache class="solr.FastLRUCache"
                 size="5000"
                 initialSize="512"
                 autowarmCount="128"/>

    <queryResultCache class="solr.FastLRUCache"
                      size="10000"
                      initialSize="512"
                      autowarmCount="128"/>

    <documentCache class="solr.FastLRUCache"
                   size="100000"
                   initialSize="10000"
                   autowarmCount="0"/>

    <cache name="perSegFilter"
           class="solr.search.LRUCache"
           size="10"
           initialSize="0"
           autowarmCount="10"
           regenerator="solr.NoOpRegenerator"/>

    <enableLazyFieldLoading>true</enableLazyFieldLoading>

    <queryResultWindowSize>20</queryResultWindowSize>
    <queryResultMaxDocsCached>200</queryResultMaxDocsCached>
    <useColdSearcher>false</useColdSearcher>

Most of my queries are CPU-centric, as they involve lots of IN and NOT IN style clauses and also an if() condition for scoring. Assume my queries will continue to be CPU-centric.
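
Roughly, a typical query has this shape, with the IN list as a terms filter query, the NOT IN as a negated filter, and an if() function for scoring (field names and values below are only placeholders for illustration):

    defType=edismax
    q=*:*
    fq={!terms f=country_id}1,4,7,23
    fq=-status_id:(3 OR 5)
    boost=if(exists(last_login),1.2,1.0)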

Help

What am I doing wrong that my delta import causes such a spike in response time?

  • Index size: 2GB
  • Servers: 4
  • Serving: 100k per hour

Also, the delta update touches around 200k records out of 1 million, as one of the Solr fields (last login) changes frequently.

My delta import comprises three parts:

a) delete -- around 100

b) insert -- around 30k

c) updates -- around 1.9k (one or two columns)

For inserts and updates I am posting to /update?overwrite=true&wt=json.

For deletes I send stream.body=id:1 stream.body=id:1 ... and then an update request.

At the end I call /update?commit=true&wt=json.
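
Put together, one import cycle looks roughly like this (mycore and the fields are placeholders, URL-encoding is omitted, the delete step assumes Solr's standard XML delete command in stream.body, and the {"set": ...} form assumes atomic updates are enabled, i.e. updateLog and stored fields):

    deletes, one per id:
        /solr/mycore/update?stream.body=<delete><id>1</id></delete>&wt=json

    inserts and atomic updates, posted as a JSON array:
        /solr/mycore/update?overwrite=true&wt=json
        [ { "id": "42", "last_login": "2016-05-01T00:00:00Z" },
          { "id": "43", "last_login": { "set": "2016-05-01T01:00:00Z" } } ]

    final commit:
        /solr/mycore/update?commit=true&wt=json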

Is there a more optimized way of doing this? (DIH is better for convenience, but not optimized in terms of performance.)

1 Answer


You have very large caches and large autowarm settings. So when your import hits the commit, the index readers have to be reopened and the caches rebuilt. And with useColdSearcher=false, you get a response delay while all those caches warm up.

You could try changing that setting to useColdSearcher=true; queries will then be slower during the warm-up, but they will not be blocked by it.
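
For example, something along these lines in solrconfig.xml (the autowarmCount values are only illustrative and should be tuned against your query mix):

    <useColdSearcher>true</useColdSearcher>

    <filterCache class="solr.FastLRUCache"
                 size="5000"
                 initialSize="512"
                 autowarmCount="32"/>

    <queryResultCache class="solr.FastLRUCache"
                      size="10000"
                      initialSize="512"
                      autowarmCount="32"/>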

You could also experiment with soft vs. hard commit settings, but remember that soft commits make content visible when they run, so if you delete records first and a soft commit is triggered part-way through the reindex, you may see partial results. This is less of an issue with partial updates.
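
As a sketch, the relevant knobs live in the updateHandler section of solrconfig.xml; the intervals below are only examples, not recommendations for your load:

    <autoCommit>
      <maxTime>600000</maxTime>           <!-- hard commit: flushes to disk -->
      <openSearcher>false</openSearcher>  <!-- keeps the current searcher, so no cache warm-up -->
    </autoCommit>

    <autoSoftCommit>
      <maxTime>120000</maxTime>           <!-- soft commit: opens a new searcher at most this often -->
    </autoSoftCommit>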

The other option, as MatsLindh said, is to do a full index offline and then switch cores over using aliases or the core-swap mechanism. You could even keep a reference core permanently offline that you index into (and even optimize), and then copy the resulting index into production.
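
For the switch-over itself, the CoreAdmin SWAP call looks roughly like this (core and collection names are placeholders; on SolrCloud you would point a collection alias at the freshly built collection instead):

    /solr/admin/cores?action=SWAP&core=live&other=rebuild

    /solr/admin/collections?action=CREATEALIAS&name=live&collections=products_20160501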