I'm applying several sentiment analysis techniques to a set of Twitter data I have acquired. They are lexicon-based (VADER and SentiWordNet) and as such require no pre-labeled data.
I was wondering whether there is a method (like F-score or ROC/AUC) to measure the accuracy of these classifiers. Most of the methods I know require a target to compare the results against.
The short answer is no, I don't think so. (So, I'd be very interested if someone else posts a method.)
With some unsupervised machine learning techniques you can get a measure of error. For example, an autoencoder gives you an MSE (mean squared error), which represents how accurately the lower-dimensional representation can be reconstructed back into the original higher-dimensional input.
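To show how such an error can be computed without any labels, here is a minimal sketch that swaps the trained autoencoder for a fixed linear encoder/decoder: each 2-D point is compressed to its projection onto a hand-picked direction and then reconstructed from that single number. The points, the directions and the function names are hypothetical and chosen only for illustration.

```python
import math

# Toy 2-D points (hypothetical); they lie close to the line y = x.
POINTS = [(1.0, 1.0), (2.0, 2.1), (3.0, 2.9)]


def unit(v):
    """Scale a 2-D vector to length 1."""
    norm = math.hypot(v[0], v[1])
    return (v[0] / norm, v[1] / norm)


def encode(point, direction):
    """Compress a 2-D point to one number: its projection onto `direction`."""
    return point[0] * direction[0] + point[1] * direction[1]


def decode(code, direction):
    """Expand the 1-D code back to 2-D along the same direction."""
    return (code * direction[0], code * direction[1])


def reconstruction_mse(points, direction):
    """Mean squared reconstruction error, averaged over all coordinates."""
    total = 0.0
    for p in points:
        r = decode(encode(p, direction), direction)
        total += (p[0] - r[0]) ** 2 + (p[1] - r[1]) ** 2
    return total / (2 * len(points))


if __name__ == "__main__":
    # A direction that matches the data reconstructs it well; a poor one does not.
    print(reconstruction_mse(POINTS, unit((1.0, 1.0))))  # small error
    print(reconstruction_mse(POINTS, unit((1.0, 0.0))))  # large error
```

A real autoencoder learns its (usually nonlinear) encoder and decoder instead of using a fixed direction, but its reported error is computed the same way, from the inputs alone. That is also why it doesn't transfer to sentiment: a low reconstruction error says nothing about whether a sentiment label is correct.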
But for sentiment analysis, all I can think of is to use multiple algorithms and measure the agreement between them on the same data. Where they all agree on a particular sentiment, you mark it as a more reliable prediction; where they all disagree, you mark it as an unreliable prediction. (This relies on none of the algorithms having the same biases, which is probably unlikely.)
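A minimal sketch of the agreement idea, assuming each method's scores have already been mapped to the same discrete labels (`pos`, `neg`, `neu`, hypothetical names) for the same tweets in the same order. The third method name `other` and all label values are hypothetical. It puts each tweet into a reliable (all methods agree), unreliable (all disagree) or mixed bucket, and adds Cohen's kappa, a standard chance-corrected agreement statistic, for each pair of methods.

```python
from collections import Counter
from itertools import combinations

# Hypothetical per-tweet labels from three methods, aligned by tweet index.
PREDICTIONS = {
    "vader":        ["pos", "neg", "neu", "pos", "neg"],
    "sentiwordnet": ["pos", "neg", "pos", "neu", "neg"],
    "other":        ["pos", "neu", "neu", "neg", "neg"],
}


def majority_votes(predictions):
    """For each tweet, return (most common label, number of methods that gave it)."""
    columns = list(predictions.values())
    return [Counter(col[i] for col in columns).most_common(1)[0]
            for i in range(len(columns[0]))]


def reliability(count, n_methods):
    """Map a vote count to the reliable / unreliable / mixed buckets."""
    if count == n_methods:
        return "reliable"      # every method gave the same label
    if count == 1:
        return "unreliable"    # no two methods agreed
    return "mixed"


def cohen_kappa(a, b):
    """Cohen's kappa: agreement between two label sequences, corrected for chance."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        return 1.0             # both sides used one identical label throughout
    return (observed - expected) / (1 - expected)


def pairwise_kappa(predictions):
    """Cohen's kappa for every pair of methods."""
    return {(m1, m2): cohen_kappa(predictions[m1], predictions[m2])
            for m1, m2 in combinations(predictions, 2)}


if __name__ == "__main__":
    n = len(PREDICTIONS)
    for i, (label, count) in enumerate(majority_votes(PREDICTIONS)):
        print(i, label, reliability(count, n))
    print(pairwise_kappa(PREDICTIONS))
```

Keep in mind that kappa measures consistency between methods, not correctness: two lexicon-based methods with the same blind spots can agree closely and still both be wrong, which is exactly the bias caveat above.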
The usual approach is to label some percentage of your data by hand, compute the usual metrics against those labels, and assume (or hope) that the labeled sample is representative of the whole dataset.
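Hand-labeling a sample gives you the target that F-score and similar metrics need. A minimal sketch, assuming three classes named `pos`, `neg` and `neu` (hypothetical names) and hand labels for a uniformly random sample of tweets; the sample sizes and label values are hypothetical. It uses only the standard library, though scikit-learn's `f1_score` (with `average="macro"`) computes the same macro-averaged score.

```python
import random

LABELS = ("pos", "neg", "neu")


def draw_sample(n_items, k, seed=0):
    """Pick k distinct item indices uniformly at random for manual labeling."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(n_items), k))


def f1_per_class(gold, predicted, labels=LABELS):
    """One-vs-rest F1 for each label, comparing classifier output to hand labels."""
    scores = {}
    pairs = list(zip(gold, predicted))
    for lab in labels:
        tp = sum(g == lab and p == lab for g, p in pairs)
        fp = sum(g != lab and p == lab for g, p in pairs)
        fn = sum(g == lab and p != lab for g, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        scores[lab] = (2 * precision * recall / (precision + recall)
                       if precision + recall else 0.0)
    return scores


def macro_f1(gold, predicted, labels=LABELS):
    """Unweighted mean of the per-class F1 scores."""
    per_class = f1_per_class(gold, predicted, labels)
    return sum(per_class.values()) / len(labels)


if __name__ == "__main__":
    # 1. Choose which tweets to annotate, e.g. 200 out of 10,000 (hypothetical sizes).
    to_label = draw_sample(10_000, 200)
    # 2. After annotating, compare hand labels with the classifier's output
    #    for those same tweets (hypothetical values shown here).
    gold = ["pos", "neg", "neu", "pos", "neg"]
    vader = ["pos", "neg", "pos", "neu", "neg"]
    print(f1_per_class(gold, vader), macro_f1(gold, vader))
```

Sampling uniformly at random is what makes the representativeness assumption plausible; the first N tweets, or tweets from a single hashtag, may not be. A sample of a few hundred tweets will also leave some sampling error in the estimate.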