Is there a way to change the tokenizer in AllenNLP's coreference resolution model?


Does anyone know how to change the tokenizer in AllenNLP's coreference resolution model? By default it uses spaCy, and I would like to use a whitespace tokenizer instead, so that the text is split only on whitespace and punctuation stays attached to the words.

This is what I have tried so far, but it does not seem to work:

review = """Judging from previous posts this used to be a good place, but not any longer.
        We, there were four of us, arrived at noon - the place was empty - 
        and the staff acted like we were imposing on them and they were very rude. 
        They never brought us complimentary noodles, ignored repeated requests for sugar, and threw our dishes on the table.
        The food was lousy - too sweet or too salty and the portions tiny.
        After all that, they complained to me about the small tip.
        Avoid this place!"""

from allennlp.data.tokenizers.whitespace_tokenizer import WhitespaceTokenizer
from allennlp.predictors.predictor import Predictor

predictor = Predictor.from_path("https://storage.googleapis.com/allennlp-public-models/coref-spanbert-large-2020.02.27.tar.gz")
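# try to swap the predictor's internal tokenizer for a whitespace tokenizer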
predictor._tokenizer = WhitespaceTokenizer()

pred = predictor.predict(document=review)

# expected output: 'Judging', 'from', 'previous', 'posts', 'this', 'used', 'to', 'be', 'a', 'good', 'place,', 'but', 'not', 'any', 'longer.'
print(pred['document'])
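
As a sanity check, the whitespace tokenizer on its own produces exactly the tokens I expect, so the problem seems to be in how I attach it to the predictor rather than in the tokenizer itself:

from allennlp.data.tokenizers.whitespace_tokenizer import WhitespaceTokenizer

tokenizer = WhitespaceTokenizer()
tokens = tokenizer.tokenize(review)

# ['Judging', 'from', 'previous', 'posts', 'this', 'used', 'to', 'be', 'a', 'good', 'place,', ...]
# punctuation stays attached to the words, which is what I want
print([token.text for token in tokens])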

I found the documentation on tokenizers here, but I don't know whether they can be used with other models, such as the coreference resolution model.
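
If there is no way to override the predictor's tokenizer directly, something like the sketch below is what I am hoping for, i.e. passing an already-split list of words straight to the model. I am guessing at a predict_tokenized method here; I have not confirmed it exists with this signature, so treat the call as an assumption:

# assumption: CorefPredictor exposes predict_tokenized(tokenized_document: List[str])
tokens = [token.text for token in WhitespaceTokenizer().tokenize(review)]
pred = predictor.predict_tokenized(tokens)

# hoping this keeps my whitespace tokens, e.g. 'place,' and 'longer.' stay intact
print(pred['document'])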
