How do you configure the /export requestHandler in SolrCloud to use all shards?


I'm using Solr 4.10.2. I got an /export handler working to export large datasets. When I deployed the config to my Solr cluster environment, I noticed that the export was missing some records.

If I run the same query string through /select and /export, I get fewer records from the /export call.

Is there anything special you need to do to get the /export to work in a SolrCloud environment?

  <requestHandler name="/export" class="solr.SearchHandler">
    <lst name="invariants">
      <str name="rq">{!xport}</str>
      <str name="wt">xsort</str>
      <str name="distrib">false</str>
    </lst>

    <arr name="components">
      <str>query</str>
    </arr>
  </requestHandler>

I tried changing the "distrib" attribute to true hoping that would help, but that caused other errors.

Any suggestions?


There are 2 best solutions below

0
On BEST ANSWER

The /export handler only returns documents from the local core on the node it is called on, which is why records held by other shards are missing. The Streaming Expressions API (available under /stream without any further configuration) is built on top of /export and is meant to be the SolrCloud alternative.

This also allows you to process the content when requesting it, if applicable.

The required parameters for /stream are the same as for /export (q, fl and sort); they are passed inside a search() streaming expression in the expr parameter.

But since you're on 4.10.2, which does not have /stream, you'll have to request clusterstate.json from ZooKeeper, query each node individually, and then merge the results locally.

You can retrieve this file by connecting to Zookeeper:

zkCli.sh -server ip:2181

and then retrieve the clusterstate:

get /clusterstate.json

You'll find a list of shards and their replicas for each collection, and you can then iterate over those values and retrieve your results from the /export handler on each server.
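
The local merge step mentioned above could look roughly like the following. This is only a minimal sketch, assuming each shard's /export response has already been fetched and parsed into a list that is sorted by the same sort field used in the request. The class name ShardResultMerger and the method merge are hypothetical and not part of Solr or SolrJ; the technique is an ordinary k-way merge with a priority queue.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical helper (not part of SolrJ): merges per-shard result lists
// that are each already sorted by the export sort order.
class ShardResultMerger {

    // One cursor per shard: the current head record plus an iterator over the rest.
    private static final class Cursor<T> {
        T head;
        final Iterator<T> rest;

        Cursor(Iterator<T> it) {
            this.rest = it;
            this.head = it.next();
        }
    }

    static <T> List<T> merge(List<List<T>> shardResults, final Comparator<T> order) {
        // The queue always exposes the shard whose next record sorts first.
        PriorityQueue<Cursor<T>> queue = new PriorityQueue<Cursor<T>>(
                Math.max(1, shardResults.size()),
                new Comparator<Cursor<T>>() {
                    @Override
                    public int compare(Cursor<T> a, Cursor<T> b) {
                        return order.compare(a.head, b.head);
                    }
                });
        for (List<T> shard : shardResults) {
            if (!shard.isEmpty()) {
                queue.add(new Cursor<T>(shard.iterator()));
            }
        }

        List<T> merged = new ArrayList<T>();
        while (!queue.isEmpty()) {
            Cursor<T> cursor = queue.poll();
            merged.add(cursor.head);
            if (cursor.rest.hasNext()) {
                // Advance this shard and put it back into the queue.
                cursor.head = cursor.rest.next();
                queue.add(cursor);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        // Stand-in data: each inner list plays the role of one shard's sorted sort-key values.
        List<List<Integer>> shards = new ArrayList<List<Integer>>();
        shards.add(Arrays.asList(1, 4, 9));
        shards.add(Arrays.asList(2, 3, 10));
        Comparator<Integer> ascending = new Comparator<Integer>() {
            @Override
            public int compare(Integer a, Integer b) {
                return a.compareTo(b);
            }
        };
        System.out.println(merge(shards, ascending));
    }
}
```

In a real export the list elements would be the parsed documents and the comparator would compare their sort field. For large exports you would probably want to merge the response streams lazily rather than hold every shard's full result in memory.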

0
On

Here is some code that will get what is described above:

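import java.util.ArrayList;
import java.util.List;
import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.common.cloud.ClusterState;
import org.apache.solr.common.cloud.Replica;

// Connect through ZooKeeper and read the current cluster state.
final CloudSolrServer server = new CloudSolrServer(zooKeeperEndpoint);
server.connect();
final ClusterState clusterState = server.getZkStateReader().getClusterState();

// Look up the leader replica of each shard in the "search_index" collection.
Replica leader1 = clusterState.getLeader("search_index", "shard1");
Replica leader2 = clusterState.getLeader("search_index", "shard2");
Replica leader3 = clusterState.getLeader("search_index", "shard3");

// Collect the core name of each leader (the "base_url" property holds the address of its node).
List<String> listOfNodes = new ArrayList<String>();
listOfNodes.add((String) leader1.get("core"));
listOfNodes.add((String) leader2.get("core"));
listOfNodes.add((String) leader3.get("core"));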

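Then loop over the list, calling /export on each core of the Solr index:

for (String nodeEndpoint : listOfNodes) {
    // URL-encode the parameter values (spaces, quotes) before sending the request.
    String solrURL = "http://mysolrserver/solr/" + nodeEndpoint + "/export?q=*:*&fq=text:\"*SEARCHSTRING*\"&fl=field1,field2&sort=sortFieldId asc";
}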
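
Because the sort and fq values contain spaces and quotes, it may be safer to build the URL with encoded parameter values. The following is a rough sketch only: ExportUrlBuilder and buildExportUrl are hypothetical names, and the host names and core names in main are made up. It assumes each core URL has the form of a replica's base_url followed by its core name.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not part of SolrJ): builds an /export URL for one core.
class ExportUrlBuilder {

    static String buildExportUrl(String coreUrl, Map<String, String> params) {
        StringBuilder url = new StringBuilder(coreUrl);
        if (!coreUrl.endsWith("/")) {
            url.append('/');
        }
        url.append("export");

        char separator = '?';
        for (Map.Entry<String, String> param : params.entrySet()) {
            try {
                // Encode only the value; the parameter names here are plain ASCII.
                url.append(separator)
                   .append(param.getKey())
                   .append('=')
                   .append(URLEncoder.encode(param.getValue(), "UTF-8"));
            } catch (UnsupportedEncodingException e) {
                // UTF-8 is always available on a standard JVM.
                throw new IllegalStateException(e);
            }
            separator = '&';
        }
        return url.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the parameters in insertion order.
        Map<String, String> params = new LinkedHashMap<String, String>();
        params.put("q", "*:*");
        params.put("fq", "text:\"*SEARCHSTRING*\"");
        params.put("fl", "field1,field2");
        params.put("sort", "sortFieldId asc");

        // Assumed core URLs (base_url + "/" + core); replace with values from the cluster state.
        List<String> coreUrls = Arrays.asList(
                "http://host1:8983/solr/search_index_shard1_replica1",
                "http://host2:8983/solr/search_index_shard2_replica1");

        for (String coreUrl : coreUrls) {
            System.out.println(buildExportUrl(coreUrl, params));
        }
    }
}
```

Using each leader's own base_url instead of a single fixed host matters when the shard leaders live on different nodes, since a core can only be reached on the node that hosts it.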