Purpose of using special tokens in Donut


In this code, the following new special tokens are defined: ['<s_total>', '</s_total>', '<s_date>', '</s_date>', '<s_company>', '</s_company>', '<s_address>', '</s_address>', '<s>', '</s>']

They are added to the tokenizer via processor.tokenizer.add_special_tokens({"additional_special_tokens": new_special_tokens + [task_start_token] + [eos_token]})

The text that is then tokenized looks like this: <s><s_total>$6.90</s_total><s_date>27 MAR 2018</s_date><s_company>UNIHAKKA INTERNATIONAL SDN BHD</s_company><s_address>12, JALAN TAMPOI 7/4,KAWASAN PARINDUSTRIAN TAMPOI,81200 JOHOR BAHRU,JOHOR</s_address></s>

What is the purpose of these special tokens? Do they go to the decoder part of BART or to the encoder part of BART?
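For illustration, here is a minimal sketch of what registering the tags as special tokens changes. It uses a toy tokenizer written only for this example (toy_tokenize and sequence_to_fields are hypothetical helpers, not Donut or Hugging Face functions), so the real subword splits will differ. The idea it shows is that a registered tag stays a single vocabulary entry instead of being broken into pieces, which lets the generated sequence be read back into fields. In Donut the image is consumed by the Swin Transformer encoder and this token sequence is the target of the BART-style text decoder, so the added tokens belong to the decoder's tokenizer.

```python
import re

# Tags taken from the question; in the real code they are registered with
# processor.tokenizer.add_special_tokens(...).
SPECIAL_TOKENS = [
    "<s_total>", "</s_total>", "<s_date>", "</s_date>",
    "<s_company>", "</s_company>", "<s_address>", "</s_address>",
    "<s>", "</s>",
]


def toy_tokenize(text, special_tokens=()):
    """Toy stand-in for a tokenizer (assumption: NOT Donut's real one).

    Registered special tokens are kept whole; all other text is split
    into word runs and single punctuation characters.
    """
    if special_tokens:
        # Try longer tags first so no tag is shadowed by a shorter one.
        ordered = sorted(special_tokens, key=len, reverse=True)
        pattern = "(" + "|".join(re.escape(tok) for tok in ordered) + ")"
        chunks = re.split(pattern, text)
    else:
        chunks = [text]

    tokens = []
    for chunk in chunks:
        if chunk in special_tokens:
            tokens.append(chunk)  # atomic: one token per tag
        else:
            tokens.extend(re.findall(r"\w+|[^\w\s]", chunk))
    return tokens


def sequence_to_fields(sequence):
    """Read flat <s_key>value</s_key> pairs back into a dict."""
    pairs = re.findall(r"<s_(\w+)>(.*?)</s_\1>", sequence)
    return {key: value.strip() for key, value in pairs}


snippet = "<s_total>$6.90</s_total>"

# Without registration the tag itself is shredded into pieces.
print(toy_tokenize(snippet))
# ['<', 's_total', '>', '$', '6', '.', '90', '<', '/', 's_total', '>']

# With registration each tag is one token.
print(toy_tokenize(snippet, SPECIAL_TOKENS))
# ['<s_total>', '$', '6', '.', '90', '</s_total>']

target = "<s><s_total>$6.90</s_total><s_date>27 MAR 2018</s_date></s>"
print(sequence_to_fields(target))
# {'total': '$6.90', 'date': '27 MAR 2018'}
```

Because every tag is a single token, the decoder only has to predict one token to open or close a field, and the output can be converted to key/value pairs with a simple pass like sequence_to_fields.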

I have searched everywhere but could not find a satisfactory answer.
