Apache Solr 8.6.0 and later - Joining across multiple collections with multiple shards in each collection


Please consider Solr version 8.6.0 or later for this question. There are many existing questions on this issue, but they all predate 8.6.0, when Solr did not support joins across multiple shards and multiple collections.

The links below state that Solr now supports joins across multiple collections and multiple shards. See "Solr 8.6.0 Release Highlights" on the news page: https://solr.apache.org/news.html

Solr also provides guidance on how to configure this. See "Join Query Parser Definition in solrconfig.xml" at https://solr.apache.org/guide/8_6/other-parsers.html

I have also followed this article for the configuration: https://kmwllc.com/index.php/2020/07/15/the-cross-collection-join-query/

Please review the configuration of the local and remote collections, the schema structure, and the join query below.

Local collection name: loc_1

Other (remote) collection name: oth_1

loc_id (loc_1 collection) is the join key to oth_id (oth_1 collection).

  1. loc_1 Collection (local collection) -

schema.xml

<field name="loc_id" type="int" indexed="true" stored="true"/> 
<field name="loc_auth" type="int" indexed="true" stored="true"/>

solrconfig.xml file

<queryParser name="join" class="org.apache.solr.search.JoinQParserPlugin">
    <str name="routerField">loc_id</str>
</queryParser>

  2. oth_1 Collection (remote collection) -

schema.xml

<field name="oth_id" type="int" indexed="true" stored="true" />
<field name="oth_auth" type="int" indexed="true" stored="true" multiValued="true" />

solrconfig.xml file

<cache name="hash_oth_id"  
       class="solr.CaffeineCache"
       size="128"
       initialSize="0"
       regenerator="solr.NoOpRegenerator"/>

I can successfully execute the query below in my local environment, which has a single shard only, and retrieve data from both collections. I pass it to the local collection as fq.

loc_auth:123 OR {!join from=oth_id to=loc_id fromIndex=oth_1}oth_auth:123

However, the query above fails with multiple shards in the QA environment. Following the links above, I modified solrconfig.xml as shown and then ran the same query with method=crossCollection added. This is the cross-collection join query I pass in fq:

loc_auth:123 OR {!join method=crossCollection from=oth_id to=loc_id fromIndex=oth_1}oth_auth:123
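The two fq values above differ only in the method local param. The following is a minimal sketch of how such a filter could be assembled and URL-encoded for a /select request. The helper names and the base URL http://localhost:8983/solr are assumptions for illustration and are not part of the original setup.

```python
from urllib.parse import urlencode


def build_join_filter(from_field, to_field, from_index, inner_query, method=None):
    """Assemble a {!join ...} clause; 'method' is added only when given."""
    parts = ["!join"]
    if method:
        parts.append(f"method={method}")
    parts += [f"from={from_field}", f"to={to_field}", f"fromIndex={from_index}"]
    return "{" + " ".join(parts) + "}" + inner_query


def build_select_url(base_url, collection, fq, q="*:*"):
    """URL-encode q and fq so braces, '!' and spaces survive the HTTP request."""
    return f"{base_url}/{collection}/select?" + urlencode({"q": q, "fq": fq})


# Field, collection, and value names are taken from the question.
join_clause = build_join_filter(
    "oth_id", "loc_id", "oth_1", "oth_auth:123", method="crossCollection"
)
fq = "loc_auth:123 OR " + join_clause

# Hypothetical base URL; replace with the real Solr node address.
url = build_select_url("http://localhost:8983/solr", "loc_1", fq)
print(url)
```

Omitting the method argument produces the single-shard form of the filter shown earlier.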

It throws the error shown below. Can anyone tell me whether I am missing some configuration, need to change the shard configuration, or need to re-create the shard structure?

<?xml version="1.0" encoding="UTF-8"?>
<response>

<lst name="responseHeader">
  <bool name="zkConnected">true</bool>
  <int name="status">500</int>
  <int name="QTime">8</int>
</lst>
<lst name="error">
  <lst name="metadata">
    <str name="error-class">org.apache.solr.common.SolrException</str>
    <str name="root-error-class">java.lang.RuntimeException</str>
  </lst>
  <str name="msg">java.io.IOException: java.util.concurrent.ExecutionException: java.io.IOException: params method=crossCollection&amp;q=oth_auth:123&amp;fl=oth_id&amp;sort=oth_id+asc&amp;qt=/export&amp;wt=javabin&amp;distrib=false</str>
  <str name="trace">org.apache.solr.common.SolrException: java.io.IOException: java.util.concurrent.ExecutionException: java.io.IOException: params method=crossCollection&amp;q=oth_auth:123&amp;fl=oth_id&amp;sort=oth_id+asc&amp;qt=/export&amp;wt=javabin&amp;distrib=false
    at org.apache.solr.search.join.CrossCollectionJoinQuery$CrossCollectionJoinQueryWeight.getDocSet(CrossCollectionJoinQuery.java:300)
    at org.apache.solr.search.join.CrossCollectionJoinQuery$CrossCollectionJoinQueryWeight.scorer(CrossCollectionJoinQuery.java:311)
    at org.apache.lucene.search.Weight.scorerSupplier(Weight.java:148)
    at org.apache.lucene.search.BooleanWeight.scorerSupplier(BooleanWeight.java:379)
    at org.apache.lucene.search.BooleanWeight.scorer(BooleanWeight.java:344)
    at org.apache.solr.search.SolrIndexSearcher$FilterImpl$FilterSet.iterator(SolrIndexSearcher.java:2366)
    at org.apache.solr.search.Filter$1.scorer(Filter.java:133)
    at org.apache.lucene.search.Weight.bulkScorer(Weight.java:182)
    at org.apache.lucene.search.ConstantScoreQuery$1.bulkScorer(ConstantScoreQuery.java:121)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:656)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:443)
    at org.apache.solr.search.SolrIndexSearcher.buildAndRunCollectorChain(SolrIndexSearcher.java:211)
    at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1595)
    at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1412)
    at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:596)
    at org.apache.solr.handler.component.QueryComponent.doProcessUngroupedSearch(QueryComponent.java:1500)
    at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:390)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:369)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:216)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:2637)
    at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:794)
    at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:567)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:427)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:357)
    at org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:201)
    at org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1601)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:548)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:602)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
    at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1624)
    at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1435)
    at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:501)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1594)
    at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1350)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:191)
    at org.eclipse.jetty.server.handler.InetAccessHandler.handle(InetAccessHandler.java:177)
    at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:146)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
    at org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:322)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
    at org.eclipse.jetty.server.Server.handle(Server.java:516)
    at org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:388)
    at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:633)
    at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:380)
    at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:277)
    at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)
    at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105)
    at org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:336)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:313)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:171)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:129)
    at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:383)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:882)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1036)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: java.util.concurrent.ExecutionException: java.io.IOException: params method=crossCollection&amp;q=oth_auth:123&amp;fl=oth_id&amp;sort=oth_id+asc&amp;qt=/export&amp;wt=javabin&amp;distrib=false
    at org.apache.solr.client.solrj.io.stream.CloudSolrStream.openStreams(CloudSolrStream.java:398)
    at org.apache.solr.client.solrj.io.stream.CloudSolrStream.open(CloudSolrStream.java:274)
    at org.apache.solr.client.solrj.io.stream.PushBackStream.open(PushBackStream.java:71)
    at org.apache.solr.client.solrj.io.stream.ReducerStream.open(ReducerStream.java:200)
    at org.apache.solr.client.solrj.io.stream.UniqueStream.open(UniqueStream.java:151)
    at org.apache.solr.search.join.CrossCollectionJoinQuery$CrossCollectionJoinQueryWeight.getDocSet(CrossCollectionJoinQuery.java:286)
    ... 61 more
Caused by: java.util.concurrent.ExecutionException: java.io.IOException: params method=crossCollection&amp;q=oth_auth:123&amp;fl=oth_id&amp;sort=oth_id+asc&amp;qt=/export&amp;wt=javabin&amp;distrib=false
    at java.util.concurrent.FutureTask.report(FutureTask.java:122)
    at java.util.concurrent.FutureTask.get(FutureTask.java:192)
    at org.apache.solr.client.solrj.io.stream.CloudSolrStream.openStreams(CloudSolrStream.java:392)
    ... 66 more
Caused by: java.io.IOException: params method=crossCollection&amp;q=oth_auth:123&amp;fl=oth_id&amp;sort=oth_id+asc&amp;qt=/export&amp;wt=javabin&amp;distrib=false
    at org.apache.solr.client.solrj.io.stream.SolrStream.open(SolrStream.java:132)
    at org.apache.solr.client.solrj.io.stream.CloudSolrStream$StreamOpener.call(CloudSolrStream.java:504)
    at org.apache.solr.client.solrj.io.stream.CloudSolrStream$StreamOpener.call(CloudSolrStream.java:493)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:218)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    ... 1 more
Caused by: java.lang.RuntimeException: Invalid version (expected 2, but 123) or the data in not in 'javabin' format
    at org.apache.solr.common.util.JavaBinCodec._init(JavaBinCodec.java:213)
    at org.apache.solr.common.util.JavaBinCodec.initRead(JavaBinCodec.java:202)
    at org.apache.solr.client.solrj.io.stream.JavabinTupleStreamParser.&lt;init&gt;(JavabinTupleStreamParser.java:44)
    at org.apache.solr.client.solrj.io.stream.SolrStream.constructParser(SolrStream.java:314)
    at org.apache.solr.client.solrj.io.stream.SolrStream.open(SolrStream.java:130)
    ... 7 more
</str>
  <int name="code">500</int>
</lst>
</response>
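One observation on the innermost exception: the javabin parser reads the first byte of the response as a version number, and 123 is the ASCII code for `{`. That suggests the shard answered the internal /export request with something that starts like JSON, possibly an error body, rather than javabin. The following is a minimal sketch for checking this: it decodes the byte and rebuilds the request parameters from the error message with wt=json, so the underlying message can be read directly. The core URL is a hypothetical placeholder, and sending the request to the /export path instead of passing qt=/export is an assumption.

```python
from urllib.parse import urlencode


def first_byte_as_text(version_byte: int) -> str:
    """Show which ASCII character the 'invalid version' byte corresponds to."""
    return bytes([version_byte]).decode("ascii")


def build_export_url(core_url: str) -> str:
    """Rebuild the failing request from the error message, but ask for JSON."""
    params = {
        "q": "oth_auth:123",   # from the error message
        "fl": "oth_id",        # from the error message
        "sort": "oth_id asc",  # from the error message
        "distrib": "false",    # from the error message
        "wt": "json",          # changed from javabin so the body is readable
    }
    return f"{core_url}/export?" + urlencode(params)


print(first_byte_as_text(123))

# Hypothetical replica core URL; use a real core of oth_1 from the Solr admin UI.
print(build_export_url("http://localhost:8983/solr/oth_1_shard1_replica_n1"))
```

Opening the printed URL in a browser or with curl should show what the shard actually returns. If the body mentions docValues, note that the /export handler requires docValues on the fields used in fl and sort; whether oth_id has them depends on the definition of the `int` field type, which is not shown in this question.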