Whoosh Phrase Frequency in One Document

I am trying to count how often a phrase occurs in the text. When a document contains the phrase several times, Whoosh still counts the whole document as a single hit instead of counting each occurrence of the phrase. Example:

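from whoosh.analysis import StandardAnalyzer
from whoosh.fields import Schema, STORED, TEXT
from whoosh.index import create_in
from whoosh.qparser import QueryParser

self.analyzer = StandardAnalyzer(expression=r'([.,!?;:]+|\w+((\-|\'|\.)?\w+)*)', minsize=1, stoplist=[])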
self.schema = Schema(tag=STORED, content=TEXT(analyzer=self.analyzer))
self.index = create_in("index", self.schema)
self.parser = QueryParser('content', self.index.schema)
writer = self.index.writer()
writer.add_document(tag=u"tag1", content=u"One two Search Phrase three four Search Phrase")
writer.add_document(tag=u"tag2", content=u"Foo bar Search Phrase foo bar")
writer.commit()
self.searcher = self.index.searcher()

query = self.parser.parse('"Search Phrase"')  # the phrase we need to find
results = self.searcher.search(query, limit=None)

# This gives only 2 hits, because each document that contains the phrase counts once.
# How can we get 3, the total number of phrase occurrences?
res_count = len(results)

For single terms, the searcher provides frequency counts:

# Number of times content:wobble appears in all documents
freq = searcher.frequency("content", "wobble")

# Number of documents containing content:wobble
docfreq = searcher.doc_frequency("content", "wobble")

But these methods do not work with phrases. Is there something similar for phrases, or am I missing something? I have not found anything useful in the documentation. Any help is highly appreciated!
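
To make the expected numbers concrete, here is a minimal sketch of the count I am after, done in plain Python outside of Whoosh. It does not use the index at all. It reuses the token expression from the analyzer above and lowercases tokens, since StandardAnalyzer lowercases by default. The names tokenize and count_phrase are hypothetical helpers written for this illustration, not part of Whoosh.

```python
import re

# Same token expression as the analyzer in the question.
TOKEN_RE = re.compile(r"([.,!?;:]+|\w+((\-|\'|\.)?\w+)*)")


def tokenize(text):
    """Split text into lowercased tokens (hypothetical helper, not a Whoosh API)."""
    return [m.group(0).lower() for m in TOKEN_RE.finditer(text)]


def count_phrase(text, phrase):
    """Count how many times the token sequence of `phrase` occurs in `text`.

    Overlapping occurrences are counted separately.
    """
    tokens = tokenize(text)
    wanted = tokenize(phrase)
    n = len(wanted)
    if n == 0:
        return 0
    # Slide a window of n tokens over the text and compare it to the phrase.
    return sum(1 for i in range(len(tokens) - n + 1) if tokens[i:i + n] == wanted)


# The two documents from the question, keyed by their tag.
docs = {
    "tag1": "One two Search Phrase three four Search Phrase",
    "tag2": "Foo bar Search Phrase foo bar",
}

per_doc = {tag: count_phrase(text, "Search Phrase") for tag, text in docs.items()}
total = sum(per_doc.values())
print(per_doc, total)
```

For the two documents above this gives 2 occurrences for tag1 and 1 for tag2, so 3 in total, which is the number I would like to get from the index itself. Scanning the raw text like this only works if the original content is available (the schema above does not store it), so it is a workaround rather than a real answer.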
