Is there a way to change the tokenizer in AllenNLP's coreference resolution model?


Does anyone know how to change the tokenizer in AllenNLP's coreference resolution model? By default it uses spaCy, and I would like to use a whitespace tokenizer instead, so that the text is split only on whitespace and punctuation stays attached to the words.

This is what I have tried so far, but it does not seem to work:

review = """Judging from previous posts this used to be a good place, but not any longer.
        We, there were four of us, arrived at noon - the place was empty - 
        and the staff acted like we were imposing on them and they were very rude. 
        They never brought us complimentary noodles, ignored repeated requests for sugar, and threw our dishes on the table.
        The food was lousy - too sweet or too salty and the portions tiny.
        After all that, they complained to me about the small tip.
        Avoid this place!"""

from allennlp.data.tokenizers.whitespace_tokenizer import WhitespaceTokenizer
from allennlp.predictors.predictor import Predictor

predictor = Predictor.from_path("https://storage.googleapis.com/allennlp-public-models/coref-spanbert-large-2020.02.27.tar.gz")
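# try to swap the predictor's internal tokenizer for a whitespace tokenizer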
predictor._tokenizer = WhitespaceTokenizer()

pred = predictor.predict(document=review)

# expected output: 'Judging', 'from', 'previous', 'posts', 'this', 'used', 'to', 'be', 'a', 'good', 'place,', 'but', 'not', 'any', 'longer.'
print(pred['document'])
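
As a sanity check, the whitespace tokenizer on its own produces exactly the tokens I expect, so the problem seems to be in how I attach it to the predictor rather than in the tokenizer itself:

from allennlp.data.tokenizers.whitespace_tokenizer import WhitespaceTokenizer

tokenizer = WhitespaceTokenizer()
tokens = tokenizer.tokenize(review)

# ['Judging', 'from', 'previous', 'posts', 'this', 'used', 'to', 'be', 'a', 'good', 'place,', ...]
# punctuation stays attached to the words, which is what I want
print([token.text for token in tokens])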

I found the documentation on tokenizers here, but I don't know whether they can be used with other models, such as the coreference resolution model.
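
If there is no way to override the predictor's tokenizer directly, something like the sketch below is what I am hoping for, i.e. passing an already-split list of words straight to the model. I am guessing at a predict_tokenized method here; I have not confirmed it exists with this signature, so treat the call as an assumption:

# assumption: CorefPredictor exposes predict_tokenized(tokenized_document: List[str])
tokens = [token.text for token in WhitespaceTokenizer().tokenize(review)]
pred = predictor.predict_tokenized(tokens)

# hoping this keeps my whitespace tokens, e.g. 'place,' and 'longer.' stay intact
print(pred['document'])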
