I have a text dataset on which I trained a tokenizer, called "bert_tokenizer". I then try to pass in a new word and get its word embedding out.
from transformers import RobertaConfig
config = RobertaConfig(
vocab_size=tokenizer.get_vocab_size(),
max_position_embeddings=514,
num_attention_heads=12,
num_hidden_layers=6,
type_vocab_size=1,)
#re-create tokenizer in transformers
from transformers import RobertaTokenizerFast
tokenizer = RobertaTokenizerFast.from_pretrained("bert_tokenizer", output_hidden_states =True, max_len=512)
#initialise model
from transformers import RobertaForMaskedLM
model = RobertaForMaskedLM(config=config)
model.eval()
import torch
#encode a test phrase into token ids
word = tokenizer.encode('test test')
input = torch.LongTensor(word)
out = model(input_ids=input)
The last line, out = model(input_ids=input), fails immediately. Error: kernel died.
My training dataset is very small. Could that be the problem, or is there another reason?
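For anyone trying to narrow this down, here is a minimal sanity check that could be run on the encoded ids before the model call. It is only a sketch of two things that can plausibly go wrong in the embedding lookup, not a confirmed diagnosis: a token id that is not smaller than the vocab_size given to the config, and a sequence that is too long for max_position_embeddings. The helper names check_input_ids and as_batch are hypothetical, and padding_idx=1 assumes the default RoBERTa pad token id.

```python
def check_input_ids(ids, vocab_size, max_position_embeddings, padding_idx=1):
    """Return a list of problems found in one encoded sequence (empty if none)."""
    problems = []

    # Each id selects a row of the word embedding matrix,
    # so it has to lie in [0, vocab_size).
    bad = sorted({i for i in ids if not 0 <= i < vocab_size})
    if bad:
        problems.append(f"token ids outside [0, {vocab_size}): {bad}")

    # Assumption: RoBERTa position ids start at padding_idx + 1, so the
    # longest usable sequence is max_position_embeddings - padding_idx - 1
    # (512 tokens for max_position_embeddings=514).
    max_len = max_position_embeddings - padding_idx - 1
    if len(ids) > max_len:
        problems.append(f"sequence length {len(ids)} exceeds {max_len}")

    return problems


def as_batch(ids):
    """Wrap one sequence in an outer list so the shape is [1, seq_len]."""
    return [list(ids)]


if __name__ == "__main__":
    # Hypothetical ids standing in for tokenizer.encode('test test').
    ids = [0, 7, 7, 2]
    print(check_input_ids(ids, vocab_size=10, max_position_embeddings=514))
    print(as_batch(ids))
```

In the code above this would be called as check_input_ids(word, config.vocab_size, config.max_position_embeddings). The model also expects a batch dimension, so torch.LongTensor(as_batch(word)) gives a tensor of shape [1, seq_len] instead of a flat one.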
I am following the tutorial here: https://github.com/BramVanroy/bert-for-inference/blob/master/introduction-to-bert.ipynb
Thank you.