Does SentencePiece always encode the same string to the same tokens?

Will a SentencePiece tokenizer always produce the same encoding for a given string, regardless of the context in which that string appears?

For example:

sentence 1: abc bcde aa
sentence 2: nnnabc bcde zks

Will the shared substring "abc bcde" be encoded to the same tokens in both sentences?
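One way to test this empirically is to encode the substring on its own and check whether its tokens appear as a contiguous run inside each sentence's encoding. The sketch below is only an illustration: `contains_sublist`, `substring_encoding_is_stable`, and `toy_encode` are hypothetical names, and `toy_encode` is a whitespace splitter standing in for a real model, not SentencePiece itself. With the `sentencepiece` package and a trained model, `encode` could instead wrap `SentencePieceProcessor.encode(text, out_type=str)`, as shown in the comment (the model path there is a placeholder).

```python
from typing import Callable, List, Sequence


def contains_sublist(haystack: Sequence[str], needle: Sequence[str]) -> bool:
    """Return True if `needle` occurs as a contiguous run inside `haystack`."""
    n = len(needle)
    if n == 0:
        return True
    return any(
        list(haystack[i:i + n]) == list(needle)
        for i in range(len(haystack) - n + 1)
    )


def substring_encoding_is_stable(
    encode: Callable[[str], List[str]],
    substring: str,
    sentences: Sequence[str],
) -> List[bool]:
    """For each sentence, report whether the substring's standalone
    token sequence appears unchanged inside the sentence's tokens."""
    target = encode(substring)
    return [contains_sublist(encode(s), target) for s in sentences]


def toy_encode(text: str) -> List[str]:
    """Toy stand-in (NOT SentencePiece): one token per whitespace-separated
    word, prefixed with U+2581 to imitate a word-boundary marker."""
    return ["\u2581" + word for word in text.split()]


# With a real model (assumed path, requires the `sentencepiece` package):
#   import sentencepiece as spm
#   sp = spm.SentencePieceProcessor(model_file="m.model")
#   encode = lambda t: sp.encode(t, out_type=str)

if __name__ == "__main__":
    sentences = ["abc bcde aa", "nnnabc bcde zks"]
    print(substring_encoding_is_stable(toy_encode, "abc bcde", sentences))
```

With the toy encoder the check passes for sentence 1 and fails for sentence 2, because there "abc" sits in the middle of the word "nnnabc" rather than at a word boundary. Whether a real SentencePiece model behaves the same way depends on its vocabulary and settings, so the check needs to be run against the actual model.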
