```python
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("text-classification", model="savasy/bert-base-turkish-sentiment-cased")
sentence = "Bakan Varank Milli elektrikli tren 29 Mayıs'ta test edilmeye başlanacak. Sanayi ve Teknoloji Bakanı Mustafa Varank yaptığı son dakika açıklamasında Milli elektrikli tren, 29 Mayıs'ta raylara indirilip test edilmeye başlanacak. Testlere göre, eylül ayında bu trenler vatandaşlarımızca kullanılmaya başlanacak dedi."
sentiment_result = pipe(sentence)
print(sentiment_result)
```
It prints:
```
[{'label': 'negative', 'score': 0.6390795707702637}]
```
It should have been positive. What preprocessing can I do to get a better score and label? Would it be better to tokenize the Turkish sentence myself, or to apply some other preprocessing?
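Machine-translated into English (for example with Google Translate), it reads roughly:
> Minister Varank: the national electric train will begin testing on May 29. Minister of Industry and Technology Mustafa Varank said in a breaking-news statement: "The national electric train will be put on the rails and begin testing on May 29. Based on the tests, these trains will start being used by our citizens in September."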
That sounds neutral to me, neither particularly positive nor negative. Of course, if the new train will make your commute better, it is positive; if you work for a company maintaining the old diesel(?) trains, it could be seen as negative. So this is a key point: sentiment is subjective.
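One hedged way to act on this with the existing binary model, without retraining, is to treat low-confidence predictions as neutral. The sketch below assumes the pipeline output format shown in the question (a list holding one `{'label': ..., 'score': ...}` dict) and uses a hypothetical cut-off of 0.75; that threshold is an assumption you would want to tune on a few hand-labelled sentences, not a value taken from the model card.

```python
# Hypothetical post-processing: map a binary sentiment prediction to a
# three-way label by treating low-confidence predictions as "neutral".
# The 0.75 threshold is an illustrative assumption, not a tuned value.

def to_three_way(prediction, threshold=0.75):
    """Return 'positive', 'negative' or 'neutral' for one pipeline result.

    `prediction` is the list returned by a text-classification pipeline
    for a single input, e.g. [{'label': 'negative', 'score': 0.64}].
    """
    top = prediction[0]  # by default the pipeline returns only the top class
    if top["score"] < threshold:
        return "neutral"
    return top["label"]


print(to_three_way([{"label": "negative", "score": 0.6390795707702637}]))
print(to_three_way([{"label": "positive", "score": 0.97}]))
```

With a two-class model the top score is always at least 0.5, so a score like 0.639 means the model is only somewhat more inclined to one class than the other; a threshold like this turns that rough uncertainty into a neutral band.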
The model card at https://huggingface.co/savasy/bert-base-turkish-sentiment-cased says it was trained on movie reviews and tweets. You will get the best results from a model trained on a similar domain, with similar expectations of what counts as positive or negative.
So you should get better results if you fine-tune on such a dataset. It need not be large: 20-30 positive government-related news reports, and the same number of negative ones, might be enough for the above sentence to start giving the result you expect. (A Google search for "how to fine-tune a huggingface sentiment model" turns up plenty of tutorials on how to do this.)
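A minimal fine-tuning sketch using the `transformers` `Trainer` and the `datasets` library might look like the following. It assumes you supply your own labelled news sentences (no data is given here), that `torch`, `datasets` and `accelerate` are installed, and that the hyperparameters (3 epochs, learning rate 2e-5, max length 128) are reasonable starting points rather than tuned values. `encode_labels` and `fine_tune` are hypothetical helper names, and the sketch holds out no evaluation set, so check the result on a few sentences you did not train on.

```python
# Sketch: fine-tune the Turkish sentiment model on your own labelled news
# sentences. Hyperparameters below are illustrative assumptions.

MODEL_NAME = "savasy/bert-base-turkish-sentiment-cased"


def encode_labels(labels, id2label):
    """Map string labels ('positive'/'negative') to the model's class ids."""
    label2id = {name: idx for idx, name in id2label.items()}
    return [label2id[label] for label in labels]


def fine_tune(texts, labels, output_dir="turkish-news-sentiment"):
    # Imported lazily so the helper above can be used without these packages.
    from datasets import Dataset
    from transformers import (
        AutoModelForSequenceClassification,
        AutoTokenizer,
        Trainer,
        TrainingArguments,
    )

    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)

    # Reuse the model's existing label mapping so ids match its output layer.
    dataset = Dataset.from_dict(
        {"text": texts, "label": encode_labels(labels, model.config.id2label)}
    )
    # Tokenize every example to a fixed length so batches stack cleanly.
    dataset = dataset.map(
        lambda batch: tokenizer(
            batch["text"], truncation=True, padding="max_length", max_length=128
        ),
        batched=True,
    )

    args = TrainingArguments(
        output_dir=output_dir,
        num_train_epochs=3,
        per_device_train_batch_size=8,
        learning_rate=2e-5,
    )
    trainer = Trainer(model=model, args=args, train_dataset=dataset)
    trainer.train()

    # Save both model and tokenizer so a pipeline can load the directory.
    trainer.save_model(output_dir)
    tokenizer.save_pretrained(output_dir)
    return output_dir
```

After `fine_tune(my_texts, my_labels)`, where `my_texts` and `my_labels` are your own lists, you could load the result with `pipeline("text-classification", model="turkish-news-sentiment")` and rerun the original sentence.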
The other idea I had was to machine-translate the text into English and use one of the available English sentiment models, of which there is a wider choice.
I pasted the translated text into https://huggingface.co/j-hartmann/sentiment-roberta-large-english-3-classes. As the name suggests, this model has a third class, so it can classify text as positive, neutral, or negative. It came out as neutral, with a score of 0.999.
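The translate-then-classify idea can also be scripted instead of pasting text into the model page. In the sketch below the translator and classifier are passed in as arguments, so the logic can be tested without downloading models; in real use you would pass two `transformers` pipelines. The Turkish-to-English model `Helsinki-NLP/opus-mt-tr-en` is an assumption for the translation step (any tr→en model should do), and a different translation may not reproduce the 0.999 neutral score reported above.

```python
# Sketch: translate Turkish text to English, then classify it with an
# English sentiment model. The translator and classifier are passed in
# so the function does not depend on any particular model.

def classify_via_english(text, translator, classifier):
    """Return (english_text, prediction) for one Turkish input.

    `translator` is expected to behave like a transformers translation
    pipeline (returns [{'translation_text': ...}]) and `classifier` like a
    text-classification pipeline (returns [{'label': ..., 'score': ...}]).
    """
    english = translator(text)[0]["translation_text"]
    prediction = classifier(english)[0]
    return english, prediction


def build_pipelines():
    # Lazy import; the tr->en model name is an assumption, not from the post.
    from transformers import pipeline

    translator = pipeline("translation", model="Helsinki-NLP/opus-mt-tr-en")
    classifier = pipeline(
        "text-classification",
        model="j-hartmann/sentiment-roberta-large-english-3-classes",
    )
    return translator, classifier
```

Usage would be `translator, classifier = build_pipelines()` followed by `classify_via_english(sentence, translator, classifier)`. Keeping the two models as parameters also makes it easy to try a different translation or sentiment model without touching the function.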
So it could be a good solution, though it might not solve the problem as you perceive it.