Uneven load distribution after data import to DSE Search cluster

Question

414 Views Asked by Leon At 17 August 2025 at 08:56

I am experimenting with DataStax Enterprise Search. I have a two-node cluster and I am importing data using the Dataimport capability of the Solr console. Virtual nodes are disabled (num_tokens = 1 in cassandra.yaml), as per the "Configuring Solr" doc (http://www.datastax.com/docs/datastax_enterprise3.2/solutions/dse_search_schema#configuring-solr). My simplified schema is as follows:

<schema name="spatial" version="1.1">

<types>
    <fieldType name="string" class="solr.StrField" omitNorms="true"/>
    <fieldType name="boolean" class="solr.BoolField" omitNorms="true"/>
    <fieldType name="sint" class="solr.SortableIntField" sortMissingLast="true" omitNorms="true"/> 
    <fieldType name="tint" class="solr.TrieIntField" precisionStep="8" omitNorms="true" positionIncrementGap="0"/>
    <fieldType name="tfloat" class="solr.TrieFloatField" omitNorms="true"/>
    <fieldType name="tdouble" class="solr.TrieDoubleField" precisionStep="8" omitNorms="true" positionIncrementGap="0"/>
    <fieldType name="tdate" class="solr.TrieDateField" omitNorms="true"/>
    <fieldType name="binary" class="solr.BinaryField"/>

    <!-- A specialized field for geospatial search. If indexed, this fieldType must not be multivalued. -->
    <fieldType name="location" class="solr.LatLonType" subFieldSuffix="_coordinate"/>
</types>

  <fields>
      <field name="id"  type="string" indexed="true"  stored="true"/>
      <field name="objectid" type="tint" indexed="true" stored="true" required="true" multiValued="false" />
      <field name="guwi" type="string" indexed="true" stored="true" required="false" multiValued="false" />
      <field name="country" type="string" indexed="true" stored="true" required="false" multiValued="false" />
      <field name="region" type="string" indexed="true" stored="true" required="false" multiValued="false" />
      <field name="latlong" type="location" indexed="true" stored="false"/>
  </fields>
  <defaultSearchField>objectid</defaultSearchField>
  <uniqueKey>id</uniqueKey>
</schema>

Data import succeeds. However, when I run "nodetool status", I can see that the load is not evenly distributed across my two nodes; it is all concentrated on the node I used to perform the data import. I tried changing uniqueKey to a composite key, such as (id,latlong), or even just latlong, but it does not seem to change the load distribution. Am I missing something?

Thanks, Leon

**RussS** · Answer 1

Your problem, as seen in the nodetool output, is that the two nodes have tokens that are too close together. Because of this, node (10.30.161.137) is responsible for 94% of the token range.
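To illustrate why closely spaced tokens produce such a lopsided split, here is a minimal Python sketch. It relies on the rule that each node owns the ring segment from the previous node's token (exclusive) up to its own token (inclusive). The function name `ownership`, the ring size of 2**64 (which assumes Murmur3Partitioner), and the example tokens are all hypothetical; they are not the values from the asker's cluster.

```python
def ownership(tokens, ring_size=2**64):
    """Return {token: fraction of the ring owned} for a single-token-per-node ring.

    Assumption: ring_size=2**64 matches Murmur3Partitioner; for
    RandomPartitioner the ring size would be 2**127 instead.
    """
    ordered = sorted(tokens)
    if len(set(ordered)) != len(ordered):
        raise ValueError("tokens must be unique")
    shares = {}
    for idx, tok in enumerate(ordered):
        prev = ordered[idx - 1]  # idx 0 wraps around to the last token
        # A node owns (prev, tok]; a lone node owns the whole ring.
        span = (tok - prev) % ring_size or ring_size
        shares[tok] = span / ring_size
    return shares


if __name__ == "__main__":
    # Hypothetical tokens only 6% of the ring apart.
    token_a = 0
    token_b = 2**64 * 6 // 100
    for tok, share in ownership([token_a, token_b]).items():
        print(f"token {tok}: {share:.0%} of the ring")
```

With these made-up tokens, the node holding `token_b` covers only the short segment between the two tokens (about 6%), while the node holding `token_a` covers the rest of the ring (about 94%).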

This is most likely because when you set num_tokens = 1 you did not also set initial_token. When initial_token isn't set, undesirable values may be assigned.

initial_token (Default: disabled) Used in the single-node-per-token architecture, where a node owns exactly one contiguous range in the ring space. If you haven't specified num_tokens or have set it to the default value of 1, you should always specify this parameter when setting up a production cluster for the first time and when adding capacity. For more information, see this parameter in the Cassandra 1.1 Node and Cluster Configuration documentation.

Configuring Cassandra

A token calculator is available here: Token Generator.
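If the token calculator is not at hand, evenly spaced initial_token values are easy to compute. The following is a rough Python sketch, not the official tool: the function name `initial_tokens` is made up, and which partitioner applies is an assumption you need to check against the `partitioner` setting in your own cassandra.yaml (Murmur3Partitioner tokens span -2**63 to 2**63-1, RandomPartitioner tokens span 0 to 2**127-1).

```python
def initial_tokens(node_count, partitioner="murmur3"):
    """Return evenly spaced tokens, one per node, for a single-token-per-node ring.

    Assumption: 'murmur3' means Murmur3Partitioner and 'random' means
    RandomPartitioner; verify which one your cluster actually uses.
    """
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    if partitioner == "murmur3":
        ring_size, offset = 2**64, -(2**63)
    elif partitioner == "random":
        ring_size, offset = 2**127, 0
    else:
        raise ValueError(f"unknown partitioner: {partitioner}")
    # Integer arithmetic keeps the large token values exact.
    return [i * ring_size // node_count + offset for i in range(node_count)]


if __name__ == "__main__":
    for name in ("murmur3", "random"):
        print(name)
        for node, token in enumerate(initial_tokens(2, name)):
            print(f"  node {node}: initial_token: {token}")
```

Each node would then get one of the printed values as its initial_token in cassandra.yaml. On a cluster that already holds data, editing the file alone is probably not enough; a token change on a live node is normally done with nodetool move.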
