transformers: how to use the Hugging Face EncoderDecoderModel for a machine translation task?

I have trained an EncoderDecoderModel from Hugging Face on an English-to-German translation task. I tried to overfit a small dataset (100 parallel sentences) and used model.generate() followed by tokenizer.decode() to perform the translation. However, while the output looks like proper German, it is definitely not the correct translation.

Here is the code for building the model:

from transformers import BertConfig, EncoderDecoderConfig, EncoderDecoderModel

# Randomly initialized BERT2BERT encoder-decoder model
encoder_config = BertConfig()
decoder_config = BertConfig()
config = EncoderDecoderConfig.from_encoder_decoder_configs(encoder_config, decoder_config)
model = EncoderDecoderModel(config=config)
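
For reference, the training setup is a plain seq2seq fine-tuning loop along these lines (a minimal sketch; the tokenizer checkpoint, optimizer, learning rate, and the batches iterable are placeholders rather than the exact values used):

import torch
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-multilingual-cased")  # placeholder checkpoint

# EncoderDecoderModel uses these config values to build decoder_input_ids from labels
model.config.decoder_start_token_id = tokenizer.cls_token_id
model.config.pad_token_id = tokenizer.pad_token_id

model.train()
model.to("cuda")
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

for epoch in range(100):
    for src_texts, tgt_texts in batches:  # hypothetical iterable over the 100 parallel sentences
        enc = tokenizer(src_texts, return_tensors="pt", padding=True).to("cuda")
        labels = tokenizer(tgt_texts, return_tensors="pt", padding=True).input_ids.to("cuda")
        labels[labels == tokenizer.pad_token_id] = -100  # mask padding out of the loss
        loss = model(input_ids=enc.input_ids,
                     attention_mask=enc.attention_mask,
                     labels=labels).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()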

Here is the code for testing the model:

import torch

model.eval()
input_ids = torch.tensor(tokenizer.encode(input_text)).unsqueeze(0)  # add batch dimension
output_ids = model.generate(input_ids.to('cuda'), decoder_start_token_id=model.config.decoder.pad_token_id)
output_text = tokenizer.decode(output_ids[0])
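
The decoded string can also be cleaned of special tokens; this is purely cosmetic (it strips markers such as [CLS], [SEP] and [PAD]) and does not change the translation itself:

output_text = tokenizer.decode(output_ids[0], skip_special_tokens=True)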

Example input: "iron cement is a ready for use paste which is laid as a fillet by putty knife or finger in the mould edges ( corners ) of the steel ingot mould ."

Ground truth translation: "iron cement ist eine gebrauchs ##AT##-##AT## fertige Paste , die mit einem Spachtel oder den Fingern als Hohlkehle in die Formecken ( Winkel ) der Stahlguss -Kokille aufgetragen wird ."

What the model outputs after training for 100 epochs: "[S] wenn sie den unten stehenden link anklicken, sehen sie ein video uber die erstellung ansprechender illustrationen in quarkxpress" (roughly: "if you click the link below, you will see a video about creating appealing illustrations in QuarkXPress"), which is complete nonsense as a translation of the input.

Where is the problem?
