How can I edit a Turkish sentence to get better, more consistent sentiment analysis from a pre-trained model?

43 Views Asked by halil At 08 September 2023 at 09:00

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-classification", model="savasy/bert-base-turkish-sentiment-cased")

sentence = "Bakan Varank Milli elektrikli tren 29 Mayıs'ta test edilmeye başlanacak. Sanayi ve Teknoloji Bakanı Mustafa Varank yaptığı son dakika açıklamasında Milli elektrikli tren, 29 Mayıs'ta raylara indirilip test edilmeye başlanacak. Testlere göre, eylül ayında bu trenler vatandaşlarımızca kullanılmaya başlanacak dedi."

sentiment_result = pipe(sentence)
print(sentiment_result)

It prints this: [{'label': 'negative', 'score': 0.6390795707702637}]

It should have been positive. What preprocessing can I do to get a better score and label? Would it help to tokenize the Turkish sentence myself, or to apply some other step?

Darren Cook On 08 September 2023 at 15:29

According to Google translate, this is what it says in English:

Minister Varank The national electric train will start testing on May 29. In his last minute statement, Minister of Industry and Technology Mustafa Varank said that the National electric train will be put on the rails and tested on May 29. "According to the tests, these trains will start to be used by our citizens in September," he said.

That sounds neutral to me, neither particularly positive nor negative. Of course, if the new train will make your commute better, it is positive; if you work for a company maintaining the old diesel(?) trains, it could be seen as negative. So this is a key point: sentiment is subjective.

https://huggingface.co/savasy/bert-base-turkish-sentiment-cased says it has been trained on movie reviews and tweets. You'll get the best results from a model trained on a similar domain, with similar expectations of what counts as positive or negative, to the text you are classifying.

So you will get better results if you fine-tune on a dataset from your own domain. It won't have to be huge: you might just need 20-30 positive government-related news reports, and the same number of negative news reports, for the above sentence to start giving the result you expect. (A Google search for "how to fine-tune a huggingface sentiment model" brought up plenty of tutorials on how to do this.)
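To give a rough idea of what those tutorials do, here is a minimal sketch using the `Trainer` API from `transformers` with PyTorch. It assumes you have already collected your own labelled news texts. The helper names (`invert_id2label`, `fine_tune`), the output directory and the hyperparameters are all placeholders of mine, not tuned values, and I have not run this against your data.

```python
from typing import Dict, List


def invert_id2label(id2label: Dict[int, str]) -> Dict[str, int]:
    """Turn the model's {id: name} label map into {name: id}."""
    return {name: idx for idx, name in id2label.items()}


def fine_tune(
    texts: List[str],
    labels: List[str],
    model_name: str = "savasy/bert-base-turkish-sentiment-cased",
    output_dir: str = "finetuned-news-sentiment",  # hypothetical path
) -> str:
    """Continue training the sentiment model on a small labelled set."""
    # Heavy imports are kept inside the function so the helper above
    # can be used without loading torch or downloading the model.
    import torch
    from transformers import (
        AutoModelForSequenceClassification,
        AutoTokenizer,
        Trainer,
        TrainingArguments,
    )

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(model_name)

    # Read the label names from the model config instead of guessing ids.
    label2id = invert_id2label(model.config.id2label)
    label_ids = [label2id[name] for name in labels]
    encodings = tokenizer(texts, truncation=True, padding=True, max_length=256)

    class NewsDataset(torch.utils.data.Dataset):
        def __len__(self) -> int:
            return len(label_ids)

        def __getitem__(self, i: int) -> dict:
            item = {k: torch.tensor(v[i]) for k, v in encodings.items()}
            item["labels"] = torch.tensor(label_ids[i])
            return item

    args = TrainingArguments(
        output_dir=output_dir,
        num_train_epochs=3,              # placeholder, not tuned
        per_device_train_batch_size=8,   # placeholder, not tuned
        learning_rate=2e-5,              # placeholder, not tuned
    )
    trainer = Trainer(model=model, args=args, train_dataset=NewsDataset())
    trainer.train()

    # Save model and tokenizer together so pipeline() can load the folder.
    trainer.save_model(output_dir)
    tokenizer.save_pretrained(output_dir)
    return output_dir
```

You would call it with something like `fine_tune(my_texts, my_labels)`, where `my_labels` holds the same label strings the pipeline prints (e.g. 'negative'), and then point `pipeline("text-classification", model="finetuned-news-sentiment")` at the saved folder. Hold back a few examples to check that the fine-tuned model actually behaves better on text it has not seen.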

The other idea I had was to machine-translate the text into English and use one of the available English sentiment models, of which there is a wider choice.

I pasted the above text into https://huggingface.co/j-hartmann/sentiment-roberta-large-english-3-classes. As the name suggests, this one has a third class, so it can classify as positive, neutral or negative. It came out as neutral, with a score of 0.999.

So it could be a good solution, though it might not solve the problem as you perceive it.
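If you want to script the translate-then-classify idea rather than paste text into web pages, something like the sketch below might work. I used Google Translate by hand above; the `Helsinki-NLP/opus-mt-tr-en` translation model here is my assumption as a stand-in (it also needs the `sentencepiece` package), and the function names are made up for illustration. Translation quality will affect the result, so treat this as an experiment rather than a recommendation.

```python
from typing import Callable, Dict, List, Tuple


def classify_via_translation(
    text: str,
    translate: Callable[[str], List[Dict]],
    classify: Callable[[str], List[Dict]],
) -> Dict:
    """Translate the text to English, then run an English sentiment model.

    `translate` and `classify` are expected to behave like transformers
    pipelines: each takes a string and returns a list with one dict.
    """
    english = translate(text)[0]["translation_text"]
    result = classify(english)[0]
    return {"english": english, "label": result["label"], "score": result["score"]}


def load_pipelines() -> Tuple[Callable, Callable]:
    """Build the two pipelines (downloads both models on first use)."""
    from transformers import pipeline

    # Assumption: a Turkish-to-English model standing in for Google Translate.
    translator = pipeline("translation", model="Helsinki-NLP/opus-mt-tr-en")
    classifier = pipeline(
        "text-classification",
        model="j-hartmann/sentiment-roberta-large-english-3-classes",
    )
    return translator, classifier


# Example usage (not run here, since it downloads two models):
#   translator, classifier = load_pipelines()
#   print(classify_via_translation(sentence, translator, classifier))
```

Passing the two pipelines in as arguments keeps the function easy to try with a different translator or a different English model.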
