TypeError: lemmatize() missing 3 required positional arguments: 'index', 'exceptions', and 'rules'


I'm preprocessing text data at the moment (the text is in French).

Here's my code so far:

import re

import pandas as pd
from nltk.corpus import stopwords
from nltk.tokenize import RegexpTokenizer
from spacy.lang.fr import French

df = pd.read_csv('file.csv', sep=';')

stop_words = set(stopwords.words('french'))
tokenizer = RegexpTokenizer(r'\w+')
lemmatizer = French.Defaults.create_lemmatizer()


def clean_text(text):
    text = text.lower()
    text = tokenizer.tokenize(text)
    text = [word for word in text if word not in stop_words]
    text = [lemmatizer.lemmatize(word) for word in text]
    final_text = ' '.join([w for w in text if len(w) > 2])
    return final_text

df['comms_clean'] = df['comms'].apply(clean_text)

But I get this error:

TypeError: lemmatize() missing 3 required positional arguments: 'index', 'exceptions', and 'rules'

I'm used to working with English data, so this is the first time I've used this kind of package and I'm quite lost. What should I do to fix this?

1 Answer

The error is telling you that those arguments are missing from your call: lemmatize() expects the string plus index, exceptions and rules, but you are only passing one argument, the word itself.

Here is the official documentation: https://spacy.io/api/lemmatizer#_title

And here is the implementation of the Lemmatizer class, where you can find the lemmatize() method: https://github.com/explosion/spaCy/blob/master/spacy/lemmatizer.py

def lemmatize(self, string, index, exceptions, rules):
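
In practice you normally don't call lemmatize() yourself; spaCy supplies the index, exception and rule tables internally. Below is a minimal sketch of two alternatives, assuming spaCy 2.x (where calling the lemmatizer object takes a string and a coarse POS tag) and that the fr_core_news_sm model has been downloaded; the function and variable names here are illustrative, not taken from your code:

# Option 1: call the lemmatizer object itself; it fills in
# index/exceptions/rules for you but needs a part-of-speech tag.
lemma = lemmatizer('chevaux', 'NOUN')[0]   # the call returns a list of candidate lemmas

# Option 2: let a full French pipeline handle tokenization,
# stop words and lemmatization in one pass.
import spacy

nlp = spacy.load('fr_core_news_sm')        # python -m spacy download fr_core_news_sm

def clean_text_spacy(text):
    doc = nlp(text.lower())
    lemmas = [tok.lemma_ for tok in doc
              if tok.is_alpha and not tok.is_stop and len(tok.lemma_) > 2]
    return ' '.join(lemmas)

df['comms_clean'] = df['comms'].apply(clean_text_spacy)

The second option also avoids mixing NLTK and spaCy: tok.is_stop uses spaCy's own French stop-word list and tok.lemma_ comes from the loaded model, so no manual lemmatizer calls are needed.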