I am trying to create a Japanese-English translation model following this Medium article: https://arusl.medium.com/japanese-english-language-translation-with-transformer-using-pytorch-243738146806

Everything runs perfectly until the second-to-last cell, where I get an error running the translate function. The error is specifically on this line:
    tokens = [BOS_IDX] + [src_vocab.stoi[tok] for tok in src_tokenizer.encode(src, out_type=str)] + [EOS_IDX]
The error: AttributeError: 'Vocab' object has no attribute 'stoi'. Since the article was written, the .stoi attribute has been replaced by the method get_stoi() → Dict[str, int], according to the torchtext documentation (https://pytorch.org/text/stable/vocab.html). When I change the line to the following, however, I get the error "'Counter' object has no attribute 'get_stoi'":
    tokens = [BOS_IDX] + [src_vocab.get_stoi()[tok] for tok in src_tokenizer.encode(src, out_type=str)] + [EOS_IDX]
The same goes for itos and the get_itos() method. Any help with making this work would be greatly appreciated, as I'm dumbfounded at the moment.
A similar question was asked here ("'Vocab' object has no attribute 'itos'"), but I don't see how to implement the answer or make it work in this case.
Edit: This function seems suspect as it is creating vocab out of a counter... is there a better way to do this?
    def build_vocab(sentences, tokenizer):
        counter = Counter()
        for sentence in sentences:
            counter.update(tokenizer.encode(sentence, out_type=str))
        return Vocab(counter)
Thank you!
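After reading the Medium article, I think you're encountering this error because torchtext changed its vocabulary API. The 'Vocab' object no longer has the 'stoi' attribute; it has been replaced by the 'get_stoi()' method. Similarly, 'itos' is now 'get_itos()'. The "'Counter' object has no attribute 'get_stoi'" message points at build_vocab: in current torchtext releases the Vocab constructor expects an already-built vocabulary object rather than a Counter, so Vocab(counter) simply wraps the Counter and get_stoi() gets forwarded to it. Building the vocabulary with the torchtext.vocab.vocab() factory function instead should avoid this.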
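For the build_vocab function from the question, a minimal sketch along these lines might work. It assumes a torchtext release that provides the vocab() factory with the specials argument. The special-token names, the `tokenize` callable, and the helper names count_tokens and sort_by_frequency are my own assumptions for illustration, not taken from your code.

```python
from collections import Counter, OrderedDict


def count_tokens(sentences, tokenize):
    """Count how often each token appears across all sentences."""
    counter = Counter()
    for sentence in sentences:
        counter.update(tokenize(sentence))
    return counter


def sort_by_frequency(counter):
    """Order tokens from most to least frequent (ties broken alphabetically)."""
    return OrderedDict(sorted(counter.items(), key=lambda kv: (-kv[1], kv[0])))


def build_vocab(sentences, tokenize,
                specials=("<unk>", "<pad>", "<bos>", "<eos>")):
    """Build a torchtext Vocab via the factory function, not Vocab(counter)."""
    # Imported lazily so the two helpers above stay usable without torchtext.
    from torchtext.vocab import vocab as make_vocab

    ordered = sort_by_frequency(count_tokens(sentences, tokenize))
    # Assumed special tokens; adjust them to whatever your model expects.
    v = make_vocab(ordered, specials=list(specials), special_first=True)
    # Unknown tokens map to <unk> instead of raising an error.
    v.set_default_index(v["<unk>"])
    return v
```

With a SentencePiece tokenizer as in the question, the call would look something like build_vocab(sentences, lambda s: tokenizer.encode(s, out_type=str)). After that, get_stoi() and get_itos() are called on a real Vocab rather than on a Counter.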
To fix the error, update your code:
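Original code:
    tokens = [BOS_IDX] + [src_vocab.stoi[tok] for tok in src_tokenizer.encode(src, out_type=str)] + [EOS_IDX]
Updated code:
    tokens = [BOS_IDX] + [src_vocab.get_stoi()[tok] for tok in src_tokenizer.encode(src, out_type=str)] + [EOS_IDX]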
Make the same change from 'itos' to 'get_itos()' wherever it appears.
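One thing to watch for: get_stoi() inside a list comprehension is called once per token, so it may be worth fetching the mapping once and reusing it. The sketch below uses plain Python stand-ins for the dict returned by get_stoi() and the list returned by get_itos(). The helper names encode_tokens and decode_indices, the toy vocabulary, and the fallback to an unknown-token index are assumptions of mine, not part of the article.

```python
def encode_tokens(tokens, stoi, bos_idx, eos_idx, unk_idx):
    """Map string tokens to indices and wrap them in BOS/EOS markers."""
    return [bos_idx] + [stoi.get(tok, unk_idx) for tok in tokens] + [eos_idx]


def decode_indices(indices, itos, skip=()):
    """Map indices back to string tokens, dropping any index listed in `skip`."""
    return [itos[i] for i in indices if i not in skip]


# Toy stand-ins for what src_vocab.get_stoi() and tgt_vocab.get_itos() return.
itos = ["<unk>", "<pad>", "<bos>", "<eos>", "hello", "world"]
stoi = {tok: idx for idx, tok in enumerate(itos)}

ids = encode_tokens(["hello", "there", "world"], stoi,
                    bos_idx=2, eos_idx=3, unk_idx=0)
print(ids)  # [2, 4, 0, 5, 3]
print(decode_indices(ids, itos, skip={2, 3}))  # ['hello', '<unk>', 'world']
```

In the translate function this would amount to calling stoi = src_vocab.get_stoi() once and then passing src_tokenizer.encode(src, out_type=str) as the tokens, with BOS_IDX and EOS_IDX from your notebook.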