Extract Wikipedia Entities from Text


Is there any way to extract all the Wikipedia entities from a text using Wikipedia2Vec? Or is there another way to do the same?

Example:

Text : "Scarlett Johansson is an American actress."  
Entities : [ 'Scarlett Johansson' , 'American' ]

I want to do this in Python.

Thanks


There are 2 best solutions below

Jindřich

You can use spaCy:

import spacy
import en_core_web_sm
nlp = en_core_web_sm.load()
doc = nlp('Scarlett Johansson is an American actress.')
print([(X.text, X.label_) for X in doc.ents])

And you get:

[('Scarlett Johansson', 'PERSON'), ('American', 'NORP')]

Find more in the spaCy documentation.
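
If you only want the entity strings (to match the output format in the question), you can take ent.text from each entity. A minimal sketch, assuming the small English model has been installed with python -m spacy download en_core_web_sm:

import spacy

# Load the small English pipeline (install it first with:
#   python -m spacy download en_core_web_sm)
nlp = spacy.load('en_core_web_sm')

doc = nlp('Scarlett Johansson is an American actress.')

# Keep only the surface text of each recognized entity.
entities = [ent.text for ent in doc.ents]
print(entities)  # expected: ['Scarlett Johansson', 'American']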

alvas

Here's an NLTK version (it may not be as accurate as spaCy):

from nltk import Tree
from nltk import ne_chunk, pos_tag, word_tokenize

def get_continuous_chunks(text, chunk_func=ne_chunk):
    # Tokenize, POS-tag, then chunk named entities into subtrees.
    chunked = chunk_func(pos_tag(word_tokenize(text)))
    continuous_chunk = []
    current_chunk = []

    for subtree in chunked:
        if type(subtree) == Tree:
            # Collect the tokens of this named-entity subtree.
            current_chunk.append(" ".join(token for token, pos in subtree.leaves()))
        elif current_chunk:
            # A non-entity token ends the current chunk; flush it.
            named_entity = " ".join(current_chunk)
            if named_entity not in continuous_chunk:
                continuous_chunk.append(named_entity)
            current_chunk = []

    # Flush any chunk left over at the end of the sentence.
    if current_chunk:
        named_entity = " ".join(current_chunk)
        if named_entity not in continuous_chunk:
            continuous_chunk.append(named_entity)

    return continuous_chunk


text = 'Scarlett Johansson is an American actress.'
print(get_continuous_chunks(text))
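
If you haven't used NLTK's tagger and chunker before, you may also need to download the required resources once (resource names can vary slightly between NLTK versions):

import nltk

# One-time downloads for tokenization, POS tagging, and NE chunking.
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
nltk.download('maxent_ne_chunker')
nltk.download('words')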