Will the SentencePiece tokenizer always produce the same encoding for the same string, regardless of the context in which the string appears?
For example:
sentence 1: abc bcde aa
sentence 2: nnnabc bcde zks
Will the overlapping substring "abc bcde" get the same SentencePiece encoding in both sentences?
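
One way to check this empirically is to encode the substring on its own and then look for that exact run of pieces inside each sentence's encoding. The sketch below is only an illustration: `toy_encode` and `TOY_VOCAB` are hypothetical stand-ins (a greedy longest-match tokenizer over a made-up vocabulary, not SentencePiece's actual unigram or BPE algorithm). It imitates just one real SentencePiece behaviour, namely that whitespace is turned into the "▁" marker and a "▁" is prepended to the input by default.

```python
from typing import Callable, List

# Hypothetical toy vocabulary, invented for this illustration only.
TOY_VOCAB = {"▁abc", "▁bcde", "▁aa", "▁nnn", "▁zks", "abc", "bcde"}


def toy_encode(text: str) -> List[str]:
    """Greedy longest-match stand-in tokenizer (NOT SentencePiece's algorithm).

    Spaces become the "▁" marker and one "▁" is prepended, imitating
    SentencePiece's default whitespace handling.
    """
    s = "▁" + text.replace(" ", "▁")
    pieces: List[str] = []
    i = 0
    while i < len(s):
        # Try the longest candidate first; fall back to a single character.
        for j in range(len(s), i, -1):
            if s[i:j] in TOY_VOCAB or j == i + 1:
                pieces.append(s[i:j])
                i = j
                break
    return pieces


def contains_run(pieces: List[str], run: List[str]) -> bool:
    """Return True if `run` occurs as a contiguous slice of `pieces`."""
    n = len(run)
    return any(pieces[i:i + n] == run for i in range(len(pieces) - n + 1))


def same_in_context(encode: Callable[[str], List[str]], sub: str, sentence: str) -> bool:
    """Check whether the standalone encoding of `sub` reappears inside `sentence`."""
    return contains_run(encode(sentence), encode(sub))


if __name__ == "__main__":
    for sentence in ("abc bcde aa", "nnnabc bcde zks"):
        print(sentence, toy_encode(sentence), same_in_context(toy_encode, "abc bcde", sentence))
```

With this toy tokenizer the two sentences do not agree: in sentence 1 "abc" starts a word and carries the "▁" marker, while in sentence 2 it sits in the middle of "nnnabc" and does not. To run the same check against a real model, `toy_encode` could be replaced by something like `lambda t: sp.encode(t, out_type=str)`, assuming `sp` is a `sentencepiece.SentencePieceProcessor` with a trained model already loaded.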