I am currently fine-tuning the Donut transformer (https://huggingface.co/docs/transformers/model_doc/donut) for the following task: I want it to extract only the paragraphs of a text document, in this format:
"<> Text of the paragraph <>".
For this, I used the notebooks for fine-tuning Donut on document parsing (https://github.com/NielsRogge/Transformers-Tutorials/tree/master/Donut/CORD) with my own dataset of roughly 5,000 training documents (from DocLayNet).
For training, I chose 20 epochs, a learning rate of 3e-7, and a train batch size of 8. My training and validation losses are both decreasing, but my tree edit distance (based on Levenshtein distance) is increasing, whereas I want it to be near 0.
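To make the metric concrete, here is a minimal sketch of a normalized Levenshtein distance between a predicted string and its target. This is an assumption about what such a metric looks like, not the exact implementation from the notebooks: the function names `levenshtein` and `normalized_edit_distance` are hypothetical, and it compares plain strings at character level rather than parsed trees.

```python
def levenshtein(a: str, b: str) -> int:
    """Character-level Levenshtein distance using two rolling rows."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,         # deletion
                    current[j - 1] + 1,      # insertion
                    previous[j - 1] + cost,  # substitution
                )
            )
        previous = current
    return previous[-1]


def normalized_edit_distance(prediction: str, target: str) -> float:
    """Edit distance divided by the longer length: 0.0 is a perfect match."""
    longest = max(len(prediction), len(target))
    if longest == 0:
        return 0.0  # two empty strings are identical
    return levenshtein(prediction, target) / longest
```

With a metric of this shape, a prediction made up only of quote characters and spaces scores close to 1.0 against any real paragraph, so the metric can get worse even while the token-level loss keeps going down.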
What surprises me most is how bad Donut's predictions are. It outputs things like this:
Prediction: """""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""" "" "" "" "" "" "" " "" " "" " "" " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " "
My question is: do you think I did something wrong, or is Donut simply not designed for this task?
Thanks a lot.