How to manage very large Solr indexes

I'm trying to plan a SolrCloud implementation, and given current index sizes from testing, my estimated physical index size for 1 billion documents is roughly 20 terabytes. So far, I've been unable to find a cloud host that can support a single volume of this size. I was hoping somebody could provide some guidance with regard to managing an index this large. Is a 20TB index absurd? Is there something I'm missing with regard to SolrCloud architecture? Most of the guidelines I've seen indicate that the entire index, regardless of shard count, should be replicated on every machine to guarantee redundancy, so every node would require a 20TB storage device. If there's anyone out there who can shed some light, I would greatly appreciate it.
611 Views · Asked by LandonC
There is 1 answer below.
I'm not sure where you read such guidelines.
It is completely normal for each shard to hold only a portion of the index (each shard having one master and a number of replicas), so no single node needs to store the entire index.
You will need to decide how to shard your index, either using the built-in routing, which distributes documents by a hash of their ID, or providing your own routing.
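As a rough sketch of what that looks like with SolrJ (the collection name, config name, shard and replica counts, and ZooKeeper addresses below are all placeholders, and this assumes SolrJ 7.x/8.x):

```java
import java.util.Arrays;
import java.util.Optional;

import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.request.CollectionAdminRequest;
import org.apache.solr.common.SolrInputDocument;

public class CreateShardedCollection {
    public static void main(String[] args) throws Exception {
        // Connect to the cluster through its ZooKeeper ensemble (placeholder hosts).
        try (CloudSolrClient client = new CloudSolrClient.Builder(
                Arrays.asList("zk1:2181", "zk2:2181", "zk3:2181"), Optional.empty()).build()) {

            // Split the collection into 40 shards with 2 replicas each (80 cores total).
            // Each node stores only the shard replicas assigned to it, not the full 20 TB.
            CollectionAdminRequest.createCollection("bigindex", "bigindex_conf", 40, 2)
                    .setMaxShardsPerNode(8)   // let one node host several shard replicas
                    .process(client);

            // With the default compositeId router, each document goes to exactly one
            // shard based on a hash of its uniqueKey; no manual routing is needed.
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "doc-0000001");
            doc.addField("title", "example document");
            client.add("bigindex", doc);
            client.commit("bigindex");
        }
    }
}
```

With 20 TB split across 40 shards, each shard comes to roughly 500 GB, which is far easier to place on ordinary cloud volumes than a single 20 TB device.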
Edit: if I understand correctly, you are assuming that every node in the cluster must hold either the master or a replica of EVERY shard. If so, the answer is no. To provide resilience, you need a master and at least one replica of every shard somewhere in the cluster, but a given node N can hold nothing at all from shard S, as long as S has its master and replica(s) on other nodes.
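To make the placement point concrete, here is a small follow-on fragment (reusing the client from the sketch above; the shard and node names are hypothetical): replicas are created per shard on whichever nodes you choose, so nothing forces every node to carry every shard.

```java
// Add one extra replica of a single shard and pin it to a particular node;
// the remaining nodes never store any data from that shard.
CollectionAdminRequest.addReplicaToShard("bigindex", "shard7")
        .setNode("solr-node-03:8983_solr")   // hypothetical node name
        .process(client);

// Inspect which nodes actually hold which shard replicas.
System.out.println(
        CollectionAdminRequest.getClusterStatus().process(client).getResponse());
```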