So first we get a list of term vectors, which contain all the tokens, and then we build a map<token, frequency in the document>.
Then the method createQueue scores the terms. It drops stop words and words that do not occur often enough. It computes each word's idf and multiplies it by the token's frequency in the document, which gives that token's score. Finally we keep the 25 best ones. But what happens after that? How is the document compared against the whole index? I read http://cephas.net/blog/2008/03/30/how-morelikethis-works-in-lucene/ but it didn't explain this, or I missed the point.
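The term-selection step described in the question can be sketched as follows. This is a minimal, self-contained illustration, not Lucene's createQueue, and it rests on several assumptions:

- The class and method names (MltTermSelection, selectTerms) are hypothetical.
- Document frequencies are passed in as a plain map instead of being read from an index.
- The idf is assumed to be the classic Lucene TF-IDF formula, 1 + ln(numDocs / (docFreq + 1)).
- The thresholds mirror the MoreLikeThis defaults as I understand them: minimum term frequency 2, minimum document frequency 5, and 25 query terms, passed as maxTerms.

Other filters such as maximum document frequency and word-length limits are left out.

```java
import java.util.*;

// Hypothetical sketch of MoreLikeThis-style term selection (not Lucene's code).
class MltTermSelection {
    // Thresholds assumed to mirror the MoreLikeThis defaults.
    static final int MIN_TERM_FREQ = 2;
    static final int MIN_DOC_FREQ = 5;

    record ScoredTerm(String term, float score) {}

    // Assumed classic TF-IDF idf: 1 + ln(numDocs / (docFreq + 1)).
    static float idf(long docFreq, long numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }

    // Scores each distinct token by tf * idf and returns the maxTerms best, highest first.
    static List<ScoredTerm> selectTerms(List<String> tokens, Map<String, Integer> docFreqs,
                                        long numDocs, Set<String> stopWords, int maxTerms) {
        // Step 1: map<token, frequency in the document>.
        Map<String, Integer> termFreqs = new HashMap<>();
        for (String token : tokens) {
            termFreqs.merge(token, 1, Integer::sum);
        }

        // Step 2: min-heap bounded to maxTerms entries; the weakest kept term sits on top.
        PriorityQueue<ScoredTerm> best =
                new PriorityQueue<>(Comparator.comparingDouble(ScoredTerm::score));
        for (Map.Entry<String, Integer> e : termFreqs.entrySet()) {
            String term = e.getKey();
            int tf = e.getValue();
            if (stopWords.contains(term) || tf < MIN_TERM_FREQ) {
                continue; // stop word, or too infrequent in this document
            }
            int df = docFreqs.getOrDefault(term, 0);
            if (df < MIN_DOC_FREQ) {
                continue; // too rare in the index to be a useful signal
            }
            best.add(new ScoredTerm(term, tf * idf(df, numDocs)));
            if (best.size() > maxTerms) {
                best.poll(); // evict the lowest-scoring term
            }
        }

        // Step 3: return the kept terms, highest score first.
        List<ScoredTerm> result = new ArrayList<>(best);
        result.sort(Comparator.comparingDouble(ScoredTerm::score).reversed());
        return result;
    }
}
```

A bounded min-heap keeps only the maxTerms best-scoring words without sorting every candidate. That is the same idea as the fixed-size priority queue MoreLikeThis uses internally.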
How does More Like This in Elasticsearch work (against the whole index)?
Asked by mel
Related Questions in INDEXING
- Why does mysql stop using indexes when date ranges are added to the query?
- MySQL: Using natural primary index or adding surrogate when tables are given
- How does MongoDB process unsupported languages?
- Error in indicies while unsetting Sessions
- How to index a field with mongodb-erlang
- How to force use of indices in MongoDB?
- Hint indexes to mysql on Join
- Lucene get all non deleted document from index file
- Querydsl generated sql query wrong sql type (nvarchar instead of varchar)
- Numpy Indexing: Get every second coloumn for each even row
- Simpler, safer string manipulation Python
- Understanding "ValueError: need more than 1 value to unpack" w/without enumerate()
- Poor performance with mongo array index
- Is it possible to skip IndexRebuilder in the startup process of mongodb 2.6?
- Does PostgreSQL self join ignore indexes?
Related Questions in ELASTICSEARCH
- Elasticsearch schema for multiple versions of the same text
- Elasticsearch nested filter query
- Elasticsearch data model
- search with filter by token count
- Usage of - operator in elasticsearch
- Running multiprocessing on two different functions in Python 2.7
- How to get an Elasticsearch aggregation with multiple fields
- How to implement custom sort in elasticsearch?
- Custom Analyzer not working Elasticsearch
- How to implement full text search using Elasticsearch in Rails?
- UnresolvedAddressException in Logstash+elasticsearch
- Elasticsearch Fiddler No DNS
- Monolithic ETL to distributed/scalable solution and OLAP cube to Elasticsearch/Solr
- how to disable page query in Spring-data-elasticsearch
- Create Custom Analyzer after index has been created
Related Questions in LUCENE
- Do 'reduce' with results from Cloudant search?
- How can I integrate Solr5.1.0 with Nutch1.10
- Exact word not boosting much Solr
- Solr stopped with Error opening new searcher at org.apache.solr.core
- How to get parsed terms of a document - Lucene
- Lucene get all non deleted document from index file
- ElasticSearch synonym and word delimiter analyzer are not compatible
- solrException. XML parser doesn't support XInclude option
- Solr Negative Boost Query result containing Some Specific Words
- lucene 5.1.0 delete document from index with specific id
- how to add new Fields into solr schema
- How to find duplicates in lucene documents?
- Upgrading SOLR from 3.5 to 5.2
- Search for nodes in Neo4j with schema index
- How to wisely combine shingles and edgeNgram to provide flexible full text search?
Related Questions in COMPARISON
- Cell comparison and row inserts
- How to speed up string comparisons in an array with a for loop?
- Count number of ones in a array of characters
- why the following code gives the first mobile no. irrespective of name
- Comparing two decimals
- Groovy comparison chaining
- Numpy matrix row comparison
- What is faster: equal check or sign check
- In JavaScript, is there any difference between typeof x == 'y' and typeof x === 'y'?
- Using comparable to compare different variables
- count how many times characters from a range occur in a string
- MySQL Comparison Query
- Dynamically comparing two tables from two different databases and serves in SQL Server Management Studio
- Compare two DATETIME fields from multidimensional array and return a value or index
- Bash string comparison
Related Questions in MORELIKETHIS
- how does More_like_this elasticsearch work (into the whole index)
- Boosting in more like this elasticsearch
- Solr MoreLikeThis boosting query fields
- Sunspot / Solr / Lucene : Find similar article
- Difference between MoreLikeThisHandler and search MoreLikeThisComponent in SOLR?
- How to get full documents via MoreLikeThis search in solr?
- How to always recommend different documents (files) in Elasticsearch
- SOLR \ More Like This Feature \ how to do loose search of similar text, and have some freedom degree
- How to constrain/filter More Like This results in Solr?
- How to find similar documents
- Does Solr's "More like this" support facet queries?
- How to filter a MoreLikeThis Query
- How do I make use of Solr's MoreLikeThis feature with a MultiCore setup?
- How to Filter by Id
- SOLR MoreLikeThis with date field returns Invalid Date String
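It creates a TermQuery out of each of those terms and chucks them all into a simple BooleanQuery. When term boosting is enabled (it is off by default), each term is boosted by its previously calculated tf-idf score: boostFactor * myScore / bestScore, where boostFactor can be set by the user. That BooleanQuery is then executed like any ordinary query against the whole index. Documents matching the term clauses are scored with the normal similarity, and the best-scoring ones are returned as the similar documents. Here is the source (version 5.0):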
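The following self-contained toy is not that source and does not use the Lucene API. The class MltQuerySketch, its methods and the in-memory index map are all hypothetical. It illustrates the two steps the answer describes:

- turning the selected terms into per-term boosts of boostFactor * myScore / bestScore;
- running the resulting disjunction over every document in the index.

Scoring is deliberately simplified: a matching clause contributes only its boost, whereas Lucene multiplies the boost into a full similarity score (tf, idf, length norms). With only SHOULD clauses, a document needs to match at least one term. Elasticsearch's more_like_this query additionally applies minimum_should_match (documented default: 30%), which this sketch ignores.

```java
import java.util.*;

// Hypothetical toy model of what happens after term selection (not the Lucene API).
class MltQuerySketch {
    record ScoredTerm(String term, float score) {}
    record Hit(String docId, float score) {}

    // One boost per selected term: boostFactor * myScore / bestScore.
    // Assumes terms are sorted best-first, so the first term gets exactly boostFactor.
    static Map<String, Float> clauseBoosts(List<ScoredTerm> terms, float boostFactor) {
        Map<String, Float> boosts = new LinkedHashMap<>();
        if (terms.isEmpty()) {
            return boosts;
        }
        float bestScore = terms.get(0).score();
        for (ScoredTerm t : terms) {
            boosts.put(t.term(), boostFactor * t.score() / bestScore);
        }
        return boosts;
    }

    // Toy stand-in for running a disjunction of term clauses over every document.
    // Simplification: each matching clause adds only its boost to the document's score.
    static List<Hit> search(Map<String, Set<String>> index, Map<String, Float> boosts, int topN) {
        List<Hit> hits = new ArrayList<>();
        for (Map.Entry<String, Set<String>> doc : index.entrySet()) {
            float score = 0f;
            boolean matched = false;
            for (Map.Entry<String, Float> clause : boosts.entrySet()) {
                if (doc.getValue().contains(clause.getKey())) {
                    score += clause.getValue();
                    matched = true;
                }
            }
            if (matched) { // with only optional clauses, a hit must match at least one term
                hits.add(new Hit(doc.getKey(), score));
            }
        }
        hits.sort(Comparator.comparingDouble(Hit::score).reversed());
        return new ArrayList<>(hits.subList(0, Math.min(topN, hits.size())));
    }
}
```

Every document in the index is a candidate, but only those sharing at least one selected term get a score. Documents containing more of the highly boosted terms rank higher.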