Does SentencePiece always encode the same string to the same result?

34 Views Asked by Zip At 07 June 2025 at 16:30

Does a SentencePiece tokenizer always produce the same encoding for a given string, regardless of the context in which the string appears?

For example:

sentence 1: abc bcde aa
sentence 2: nnnabc bcde zks

Will the overlapping substring "abc bcde" be encoded to the same tokens in both sentences?
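
One way to check this empirically is to encode the substring on its own and then test whether its tokens appear as a contiguous run inside the tokens of each full sentence. The sketch below is only an illustration: `toy_encode` and `TOY_VOCAB` are hypothetical stand-ins (a greedy longest-match tokenizer that marks word starts with "▁"), not SentencePiece's actual algorithm or vocabulary. With a real model, the `encode` argument could instead wrap `SentencePieceProcessor.encode(text, out_type=str)`; the model path shown in the comment is an assumption.

```python
from typing import Callable, List, Sequence

SPACE = "\u2581"  # the "▁" word-boundary marker, used here in the same spirit as SentencePiece

# Hypothetical toy vocabulary, chosen only to make the context effect visible.
TOY_VOCAB = {SPACE + "abc", SPACE + "bcde", SPACE + "aa", SPACE + "nnna", "bc"}
MAX_LEN = max(len(piece) for piece in TOY_VOCAB)


def toy_encode(text: str) -> List[str]:
    """Greedy longest-match tokenizer (a stand-in, NOT SentencePiece itself)."""
    s = SPACE + text.replace(" ", SPACE)
    tokens, i = [], 0
    while i < len(s):
        # Try the longest candidate first; fall back to a single character.
        for length in range(min(MAX_LEN, len(s) - i), 0, -1):
            piece = s[i:i + length]
            if length == 1 or piece in TOY_VOCAB:
                tokens.append(piece)
                i += length
                break
    return tokens


def contains_run(haystack: Sequence[str], needle: Sequence[str]) -> bool:
    """True if `needle` occurs as a contiguous run inside `haystack`."""
    n = len(needle)
    return any(list(haystack[i:i + n]) == list(needle)
               for i in range(len(haystack) - n + 1))


def same_encoding_in_context(encode: Callable[[str], List[str]],
                             substring: str, sentence: str) -> bool:
    """Does the stand-alone encoding of `substring` survive inside `sentence`?"""
    return contains_run(encode(sentence), encode(substring))


# With a real model (path is an assumption), something like this could be passed instead:
#   import sentencepiece as spm
#   sp = spm.SentencePieceProcessor(model_file="m.model")
#   encode = lambda text: sp.encode(text, out_type=str)

if __name__ == "__main__":
    for sentence in ("abc bcde aa", "nnnabc bcde zks"):
        print(sentence, "->", toy_encode(sentence),
              same_encoding_in_context(toy_encode, "abc bcde", sentence))
```

With this toy vocabulary the check passes for sentence 1 but fails for sentence 2, because "abc" is no longer at the start of a word once "nnn" is attached in front of it. Whether a particular SentencePiece model behaves the same way depends on its vocabulary and settings, so the check is worth running against the actual model.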
