When to query Cassandra directly vs. using an indexer


Would it be a good idea to read big data (a query that returns billions of results) using an indexer (Elasticsearch/Solr) on top of Cassandra? Or would it be more performant to query Cassandra directly? I am only asking about reading data, not about updating and deleting.

Should indexers only be used for searches that return smaller sets of data?

I guess, in a nutshell, my question is: when is it better to query an indexer on top of a big data database (more specifically Cassandra)? Is it when the query narrows down the potential results? Does this mean that if the query returns a wide range of results, it would be better to query Cassandra directly?

Best answer

Would it be a good idea to read big data (a query that returns billions of results) using an indexer (Elasticsearch/Solr) on top of Cassandra? Or would it be more performant to query Cassandra directly? I am only asking about reading data, not about updating and deleting.

Do you mean reading the data, indexing it, and then reading it again from the index? Then reading once, i.e. asking Cassandra directly, would definitely be better, unless you want to use Elasticsearch's linguistic capabilities. If your query doesn't involve natural language, read directly from Cassandra.

Should indexers only be used for searches that return smaller sets of data?

Yes, search engines are optimized for these types of queries. Search engines solve two main issues:
1. Returning relevant results through various types of filtering and natural-language capabilities, e.g. searching for "USA" and finding "United States of America".
2. Scoring the results in such a way that the most relevant ones (by some ranking function such as TF-IDF or BM25) come first.
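
To make the scoring point concrete, here is a minimal sketch of BM25 ranking in plain Python. It is not how Elasticsearch or Solr implement it internally; the function name `bm25_scores`, the toy documents, and the parameter values `k1=1.5` and `b=0.75` are assumptions chosen for illustration. It only covers ranking, not the synonym-style matching ("USA" vs. "United States of America") mentioned above.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized document against the query terms with BM25."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: in how many documents each term appears.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Rare terms get a higher inverse document frequency.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency, dampened and normalized by document length.
            norm = tf[term] * (k1 + 1) / (
                tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            )
            score += idf * norm
        scores.append(score)
    return scores


# Hypothetical toy corpus, already tokenized.
docs = [
    "united states of america".split(),
    "cassandra stores wide rows".split(),
    "states and provinces of america and canada".split(),
]
scores = bm25_scores(["states", "america"], docs)
# Document indices ordered from most to least relevant.
ranked = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
```

The ordering only matters when a user reads the top few hits. A query that is meant to return billions of rows gains nothing from this work, which is one reason such reads fit Cassandra better.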

When a search query is executed, only the IDs of the matching documents are returned at first. The documents themselves are then assembled from the stored part of the index, which is the most expensive search engine operation (besides optimizing, perhaps :P).
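
The two phases can be sketched with a toy inverted index. This is an illustrative model only, not Lucene's actual data structures; the class `TinyIndex`, its methods, and the sample documents are all hypothetical.

```python
from collections import defaultdict


class TinyIndex:
    """Toy inverted index with a separate store for the original documents."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> set of document IDs
        self.stored = {}                  # document ID -> original text
        self.fetches = 0                  # how many stored documents were loaded

    def add(self, doc_id, text):
        self.stored[doc_id] = text
        for term in text.lower().split():
            self.postings[term].add(doc_id)

    def query_ids(self, terms):
        """Phase 1: intersect posting lists; only IDs come back."""
        sets = [self.postings.get(t.lower(), set()) for t in terms]
        return sorted(set.intersection(*sets)) if sets else []

    def fetch(self, doc_ids):
        """Phase 2: load the stored documents; cost grows with each hit."""
        self.fetches += len(doc_ids)
        return [self.stored[i] for i in doc_ids]


index = TinyIndex()
index.add(1, "Cassandra wide rows")
index.add(2, "Elasticsearch inverted index")
index.add(3, "Cassandra secondary index")

ids = index.query_ids(["index"])  # cheap: IDs only
first_page = index.fetch(ids[:1])  # expensive part, limited to one page
```

Fetching one page keeps the second phase cheap. Fetching every hit of a query that matches billions of documents pays that cost for each of them, on top of reading the data that already lives in Cassandra.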

I guess, in a nutshell, my question is: when is it better to query an indexer on top of a big data database (more specifically Cassandra)? Is it when the query narrows down the potential results? Does this mean that if the query returns a wide range of results, it would be better to query Cassandra directly?

In a nutshell, if you can narrow the results in Cassandra in the same way as an Elasticsearch query would, then you don't need Elasticsearch.