spaCy Matcher is not finding any matches for countries


I am trying to build a spaCy matcher that extracts country names, including abbreviations. For example, Kenya, KE, and KEN should all be matched as Kenya. I built a simple matcher, but it returns no matches.

Here is the simple code I tried in a Jupyter notebook:

import spacy
import pycountry
from spacy.matcher import Matcher

nlp = spacy.load("en_core_web_sm")
matcher = Matcher(nlp.vocab)

for country in pycountry.countries:
    name = country.name
    pattern1 = [{'LOWER': name}]
    pattern2 = [{'LOWER': country.alpha_2}]
    pattern3 = [{'LOWER': country.alpha_3}]
    patterns = [pattern1, pattern2, pattern3]
    matcher.add(name, patterns)
doc = nlp(u"Kenya is a beautiful country. It is next to Somalia. KEN is in Africa. China is making investments there. It is near the UAE and SAU")
found_matches = matcher(doc)
print(found_matches)
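
A likely cause is that the `LOWER` attribute compares the lowercased token text against the pattern value, so a value containing capitals ("Kenya", "KE", "KEN") can never match. A `Matcher` pattern also needs one dict per token, so a multi-word name such as "Saudi Arabia" cannot be matched with a single dict. The following is a minimal sketch of one way to handle both points, not a definitive fix. It assumes spaCy v3, uses `spacy.blank("en")` instead of `en_core_web_sm` because only the tokenizer is needed, and uses a small hand-written `COUNTRIES` list (a hypothetical stand-in for `pycountry.countries`, whose entries expose the same `name`, `alpha_2`, and `alpha_3` values). The codes are matched case-sensitively with `ORTH` here, since lowercasing them would make ordinary words such as "it", "is", and "in" match Italy, Iceland, and India.

```python
import spacy
from spacy.matcher import Matcher

# Hypothetical stand-in for pycountry.countries: (name, alpha_2, alpha_3)
COUNTRIES = [
    ("Kenya", "KE", "KEN"),
    ("Somalia", "SO", "SOM"),
    ("China", "CN", "CHN"),
    ("Saudi Arabia", "SA", "SAU"),
    ("United Arab Emirates", "AE", "ARE"),
]

# A blank pipeline is enough: the Matcher only needs tokenized text.
nlp = spacy.blank("en")
matcher = Matcher(nlp.vocab)

for name, alpha_2, alpha_3 in COUNTRIES:
    # One dict per token, with a lowercase value so that LOWER can match.
    name_pattern = [{"LOWER": tok.lower_} for tok in nlp.make_doc(name)]
    # Codes are matched on their exact uppercase form.
    patterns = [name_pattern, [{"ORTH": alpha_2}], [{"ORTH": alpha_3}]]
    matcher.add(name, patterns)

doc = nlp(
    "Kenya is a beautiful country. It is next to Somalia. KEN is in Africa. "
    "China is making investments there. It is near the UAE and SAU"
)

# Resolve each match ID back to the country name used as the pattern key.
found = [
    (nlp.vocab.strings[match_id], doc[start:end].text)
    for match_id, start, end in matcher(doc)
]
print(found)
```

In this sketch "UAE" is not matched, because the codes listed for the United Arab Emirates are AE and ARE; a common abbreviation like that would need its own pattern.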
