Pytextrank - avoid lowercasing tags into key phrases extraction

147 Views Asked by Sross Gupta At 14 July 2020 at 14:02

I want to avoid lowercasing tags in pytextrank. Any suggestions on how that can be achieved?

There is 1 best solution below

Paco On 01 March 2021 at 01:44

As of PyTextRank version 2.1.0 (released on 2021-01-31), when an application iterates through the ranked phrases, such as:

for phrase in doc._.phrases[:10]:
    print(phrase.text)

... the default text for each phrase is its most frequent instance in the document. That's what gets set in the text field of the Phrase data class.

However, check out the chunks field, which holds all instances of the phrase that occur in the document. Since these are extracted from the document's raw text, they do not get forced to lowercase.
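To make that concrete, here is a minimal sketch of collecting the original-case surface forms from a phrase's chunks. It assumes each entry in chunks exposes a .text attribute (as a spaCy Span does). The helper name original_case_forms and the stand-in demo_phrase object are hypothetical, made up for illustration so the snippet runs without spaCy or a language model installed. They are not part of PyTextRank.

```python
from collections import Counter
from types import SimpleNamespace


def original_case_forms(phrase):
    """Return a phrase's surface forms as they appear in the raw text,
    most frequent first. Hypothetical helper, not a PyTextRank API."""
    # Assumption: every chunk has a `.text` attribute, like a spaCy Span.
    counts = Counter(chunk.text for chunk in phrase.chunks)
    return [text for text, _ in counts.most_common()]


# Stand-in for one ranked phrase, so the sketch runs on its own.
demo_phrase = SimpleNamespace(
    text="natural language processing",
    chunks=[
        SimpleNamespace(text="Natural Language Processing"),
        SimpleNamespace(text="natural language processing"),
        SimpleNamespace(text="Natural Language Processing"),
    ],
)

print(original_case_forms(demo_phrase))
# → ['Natural Language Processing', 'natural language processing']
```

With a real pipeline, the same helper could be called on each item of doc._.phrases, taking the first returned form as a case-preserving tag.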

OTOH, when the algorithm constructs its internal lemma graph data structure, the lemmatized tokens are forced to lowercase. However, you don't need to use the lemma graph as the end result. Perhaps that is the source of the confusion?
