Training a German NER model with Flair: dev F1 score almost 0.0 — why doesn't the model learn?

I first trained my NER model with spaCy and got a micro F1 score of 64.7% (8 classes). As a next step I wanted to train Flair, hoping for better results. The spaCy-format data was converted to a proper Flair corpus with some custom code.
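The conversion followed roughly this pattern (a minimal sketch of the idea, not my exact custom code; the naive whitespace tokenizer and the `to_bioes` name are my own simplifications):

```python
def to_bioes(text, entities):
    """Convert spaCy-style (start, end, label) character-offset entities
    into Flair's two-column BIOES format, one "token tag" pair per line.
    Tokenization here is naive whitespace splitting."""
    # collect tokens with their character offsets
    tokens, pos = [], 0
    for tok in text.split():
        start = text.index(tok, pos)
        tokens.append([tok, start, start + len(tok), "O"])
        pos = start + len(tok)

    # assign S-/B-/I-/E- tags to the tokens covered by each entity span
    for e_start, e_end, label in entities:
        inside = [t for t in tokens if t[1] >= e_start and t[2] <= e_end]
        if not inside:
            continue
        if len(inside) == 1:
            inside[0][3] = f"S-{label}"
        else:
            inside[0][3] = f"B-{label}"
            inside[-1][3] = f"E-{label}"
            for t in inside[1:-1]:
                t[3] = f"I-{label}"

    return "\n".join(f"{tok} {tag}" for tok, _, _, tag in tokens)


print(to_bioes("Kartoffeln im Vorauflauf",
               [(0, 10, "Kultur"), (14, 24, "BBCH_Stadium")]))
# Kartoffeln S-Kultur
# im O
# Vorauflauf S-BBCH_Stadium
```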

Info about the input data: "Corpus: 4037 train + 840 dev + 448 test sentences"

In the training set: 'Kultur' (1512), 'Erreger' (1376), 'Mittel' (1083), 'Auftreten' (583), 'Zeit' (285), 'Witterung' (238), 'BBCH_Stadium' (214), 'Ort' (161)

In the test set: 'Erreger' (390), 'Mittel' (311), 'Kultur' (221), 'BBCH_Stadium' (148), 'Auftreten' (54), 'Witterung' (54), 'Ort' (53), 'Zeit' (40)
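The per-class counts above were obtained by tallying the span-opening tags in each column file — in BIOES, every entity span starts with exactly one S- or B- tag. A sketch of the counter (`count_entities` is my own helper name):

```python
from collections import Counter

def count_entities(lines):
    """Count entity spans per class in Flair column-format lines
    by tallying S- (single-token) and B- (span-opening) tags."""
    counts = Counter()
    for line in lines:
        parts = line.split()
        if len(parts) == 2 and parts[1][:2] in ("S-", "B-"):
            counts[parts[1][2:]] += 1
    return counts

# usage: count_entities(open("bb_train.txt", encoding="utf-8"))
```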

The corpus looks like this:

Der O
Schwerpunkt O
der O
Unkrautbekämpfung O
in O
Kartoffeln S-Kultur
liegt O
im O
Vorauflauf S-BBCH_Stadium
. O

Sind O
die O
mechanischen O
Maßnahmen O
abgeschlossen O
, O
kann O
die O
erste O
Herbizidbehandlung O
auf O
gut O
abgesetzten O
Dämmen O
, O
je O
nach O
Produkt O
bis O
kurz B-BBCH_Stadium
vor I-BBCH_Stadium
dem I-BBCH_Stadium
Durchstoßen E-BBCH_Stadium
der O
Kartoffeln S-Kultur
( O
kvD O
) O
, O
erfolgen O
. O
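To rule out a malformed tagging scheme — a common cause of near-zero span F1 in sequence taggers, since one broken transition invalidates the whole span — one can sanity-check the BIOES transitions per sentence. A minimal sketch (the `check_bioes` helper is my own):

```python
def check_bioes(tags):
    """Return positions where a BIOES tag sequence is ill-formed:
    I-/E- must continue a B-/I- of the same label, and an open
    B-/I- span must be closed by an E- of the same label."""
    errors = []
    prev = "O"
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes open spans
        prefix, _, label = tag.partition("-")
        p_prefix, _, p_label = prev.partition("-")
        # I-/E- without a matching open span before it
        if prefix in ("I", "E") and not (p_prefix in ("B", "I") and p_label == label):
            errors.append(i)
        # B-/I- span left open (not continued or closed properly)
        if p_prefix in ("B", "I") and not (prefix in ("I", "E") and p_label == label):
            errors.append(i - 1)
        prev = tag
    return errors
```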

The training code:

import time
from typing import List

import gensim
from flair.data import Corpus
from flair.datasets import ColumnCorpus
from flair.embeddings import TokenEmbeddings, WordEmbeddings, StackedEmbeddings, FlairEmbeddings
from flair.models import SequenceTagger
from flair.trainers import ModelTrainer

start_time = time.time()

# column format: token in the first column, NER tag in the second
columns = {0: 'text', 1: 'ner'}

data_path = "path/to/data"

# initialize the corpus (the dev set is split off from the train set
# automatically when no dev_file is given)
corpus: Corpus = ColumnCorpus(data_path, columns,
                              train_file='bb_train.txt',
                              # dev_file='bb_test_sm_sm.txt',
                              test_file='bb_test_sm_sm.txt',
                              )

# tag to predict
tag_type = 'ner'
tag_dictionary = corpus.make_tag_dictionary(tag_type=tag_type)

# convert the word2vec binary to gensim format so Flair can load it
word_vectors = gensim.models.KeyedVectors.load_word2vec_format('german.model', binary=True)
word_vectors.save('german.model.gensim')
german_embedding = WordEmbeddings('german.model.gensim')

# init forward and backward Flair embeddings for German
flair_embedding_forward = FlairEmbeddings('de-forward')
flair_embedding_backward = FlairEmbeddings('de-backward')

embedding_types: List[TokenEmbeddings] = [
    german_embedding,
    flair_embedding_forward,
    flair_embedding_backward,
]

embeddings: StackedEmbeddings = StackedEmbeddings(embeddings=embedding_types)

tagger: SequenceTagger = SequenceTagger(hidden_size=256,
                                        embeddings=embeddings,
                                        tag_dictionary=tag_dictionary,
                                        tag_type=tag_type,
                                        use_crf=True)

trainer: ModelTrainer = ModelTrainer(tagger, corpus)

trainer.train('resources/taggers/ner_bb',
              learning_rate=0.01,
              mini_batch_size=64,
              max_epochs=5,
              )

print(f"It took {time.time() - start_time}")

The loss log is:

EPOCH   TIMESTAMP   BAD_EPOCHS  LEARNING_RATE   TRAIN_LOSS  DEV_LOSS    DEV_PRECISION   DEV_RECALL  DEV_F1
1   10:42:23    0   0.0100  42.28642028570175   28.403223037719727  0.0197  0.0748  0.0312
2   10:43:48    0   0.0100  17.928552985191345  14.348283767700195  0.3089  0.0312  0.0567
3   10:45:10    0   0.0100  10.604630261659622  13.98863697052002   0.3089  0.0312  0.0567
4   10:46:36    1   0.0100  10.26459190249443   13.614569664001465  0.3579  0.0279  0.0518
5   10:47:55    2   0.0100  9.987788125872612   13.339178085327148  0.3333  0.0164  0.0313
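The DEV_F1 column is just the harmonic mean of DEV_PRECISION and DEV_RECALL, so the collapse is driven almost entirely by recall (precision reaches ~0.31–0.36 while recall stays near 0.03). A quick check of the epoch 2 row:

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# epoch 2 row: P=0.3089, R=0.0312
print(round(f1(0.3089, 0.0312), 4))  # -> 0.0567, matching DEV_F1
```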

Why is the score so low? The model doesn't seem to learn anything. I also tried up to 10 epochs with the same results.

Do I need to tune some parameters? Is something wrong with my corpus?

Thank you in advance if you have experience with this.
