How often should you reindex an Elasticsearch cluster?
I have an OpenSearch domain with 40 data nodes. There is currently only one index in the whole cluster. We are a delete-heavy cluster: we are constantly deleting HTML documents and adding new ones. We currently have about 200,000,000 searchable documents and 160,000,000 deleted documents. Would reindexing be a good idea? Also, are there tools you can use to estimate the time it would take to reindex a domain?
Asked by Sean
Reindexing is not the only option. If you can pause document ingestion for a few hours (or maybe days), you can run one of the following:
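- split on the index. The source index has to be made read-only before it can be split, which is why ingestion must be paused. If you split by a factor of 6, you will no longer have segments larger than 5 GB (the default maximum merged segment size), so the cluster can merge segments and, at the same time, free the disk space held by deleted documents. The split factor must also be compatible with index.number_of_routing_shards: by default an index can only be split by factors of 2, so a factor of 6 may be rejected unless the index was created with a suitable value. This option also requires a lot of free disk space, so read the documentation carefully.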
- forcemerge on the index. I think you will have to specify a value for max_num_segments and/or only_expunge_deletes. Warning: a force merge operation cannot be cancelled and can take hours on shards this large.

Ideally, in the future, you should try to avoid having only one big index, because big indices are harder to operate. It is usually possible to distribute documents across multiple indices with a switch (for example, on the first letters of the HTML page's domain name).
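The split and force merge options above each come down to one or two REST calls. The sketch below only builds those requests as (method, path, body) tuples; it does not send them, so you can pass them to whatever HTTP or OpenSearch client you already use. The index names, the helper function names and the choice of only_expunge_deletes as the default are illustrative assumptions, not part of the original answer. The routing-shard check encodes the documented rule that the target shard count must be a multiple of the source shard count and must divide index.number_of_routing_shards.

```python
from urllib.parse import urlencode


def build_split_requests(index, target, source_shards, factor, routing_shards=None):
    """Hypothetical helper: requests to write-block `index` and split it into `target`."""
    if factor < 2:
        raise ValueError("split factor must be at least 2")
    # The target shard count is always a multiple of the source shard count.
    target_shards = source_shards * factor
    # If known, the target shard count must divide index.number_of_routing_shards.
    if routing_shards is not None and routing_shards % target_shards != 0:
        raise ValueError(
            f"{target_shards} target shards is incompatible with "
            f"{routing_shards} routing shards"
        )
    return [
        # 1. Make the source index read-only (ingestion must be paused).
        ("PUT", f"/{index}/_settings", {"settings": {"index.blocks.write": True}}),
        # 2. Split into a new index with `factor` times as many primary shards.
        ("POST", f"/{index}/_split/{target}",
         {"settings": {"index.number_of_shards": target_shards}}),
    ]


def build_forcemerge_request(index, only_expunge_deletes=True, max_num_segments=None):
    """Hypothetical helper: force merge request, defaulting to expunging deletes."""
    params = {}
    if only_expunge_deletes:
        params["only_expunge_deletes"] = "true"
    if max_num_segments is not None:
        params["max_num_segments"] = str(max_num_segments)
    query = f"?{urlencode(params)}" if params else ""
    return ("POST", f"/{index}/_forcemerge{query}")
```

For example, build_forcemerge_request("html-docs") yields ("POST", "/html-docs/_forcemerge?only_expunge_deletes=true"). Keeping request construction separate from sending makes it easy to review the exact calls before running irreversible, non-cancellable operations on a large index.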