I am using the Snorkel labeling package to programmatically label my unlabeled training data. I followed this tutorial: https://www.snorkel.org/use-cases/01-spam-tutorial
You write several labeling functions, for example:
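from snorkel.labeling import labeling_function
from textblob import TextBlob

@labeling_function()
def lf_sent_blob(x):
    # TextBlob polarity lies in [-1.0, 1.0]; its sign decides the label.
    sent = TextBlob(x.text).sentiment.polarity
    if sent > 0:
        return positive
    elif sent < 0:
        return negative
    else:
        # return neutral
        return ABSTAIN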
Then you define the list of labeling functions and apply it to your unlabeled data, like this:
from snorkel.labeling.model import LabelModel
from snorkel.labeling import PandasLFApplier
from snorkel.labeling import LFAnalysis
lfs = [
    lf_sent_emoji,
    lf_has_special,
    lf_has_capital,
    lf_not_ABSTAIN_v2,
    lf_not_ABSTAIN_v1,
    lf_sent_blob,
]
# Apply the LFs to the unlabeled training data
applier = PandasLFApplier(lfs)
L_train = applier.apply(df)
# here you analyze your labels
LFAnalysis(L=L_train, lfs=lfs).lf_summary()
Here is my problem. If I define the class label values at the beginning as shown below, the analysis results make sense. For example, the coverage and the polarity reported for the lf_sent_blob labeling function look right. (The documentation describes polarity as: "Infer the polarities of each LF based on evidence in a label matrix.")
positive = 1
negative = 0
ABSTAIN = -1 # -1 should be reserved for abstain!?
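To make clear what I mean by coverage and polarity, here is a minimal pure-Python sketch of how I understand those two columns, assuming that -1 marks an abstain. The helper names `coverage` and `polarity` and the toy matrix `L_toy` are my own illustration, not Snorkel's implementation.

```python
ABSTAIN = -1  # assumption: -1 means "no vote", as in the tutorial


def coverage(L, abstain=ABSTAIN):
    """Fraction of rows on which each LF (column) does not abstain."""
    n_rows = len(L)
    n_lfs = len(L[0])
    return [sum(row[j] != abstain for row in L) / n_rows for j in range(n_lfs)]


def polarity(L, abstain=ABSTAIN):
    """Sorted distinct non-abstain labels emitted by each LF (column)."""
    n_lfs = len(L[0])
    return [sorted({row[j] for row in L if row[j] != abstain}) for j in range(n_lfs)]


# Toy label matrix: 4 data points (rows) x 2 hypothetical LFs (columns).
L_toy = [
    [1, -1],
    [0, -1],
    [-1, 1],
    [1, 1],
]

cov = coverage(L_toy)
pol = polarity(L_toy)
print(cov)  # [0.75, 0.5]
print(pol)  # [[0, 1], [1]]
```

With these label values, the first LF votes on three of four rows and emits both classes, which is the kind of output I would expect to see in the summary.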
However, when I change the class label values to the following, which are more meaningful to me:
positive = 1
negative = -1
ABSTAIN = 0
I get a result that is not correct.
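My guess at what is happening can be sketched as follows, assuming the analysis treats -1 as "no vote" regardless of the name I give it. This is plain Python with a hypothetical helper `summarize` and made-up LF outputs, not Snorkel's own code.

```python
def summarize(column, abstain=-1):
    """Coverage and polarity of one LF column, treating `abstain` as no vote."""
    votes = [v for v in column if v != abstain]
    return len(votes) / len(column), sorted(set(votes))


# Hypothetical outputs of one sentiment LF on five texts:
# positive, negative, neutral, positive, neutral
first_encoding = [1, 0, -1, 1, -1]   # positive=1, negative=0, ABSTAIN=-1
second_encoding = [1, -1, 0, 1, 0]   # positive=1, negative=-1, ABSTAIN=0

summary_first = summarize(first_encoding)
summary_second = summarize(second_encoding)
print(summary_first)   # (0.6, [0, 1])
print(summary_second)  # (0.8, [0, 1])
```

Under that assumption, the second encoding silently drops my negative votes and counts every neutral text as a vote for class 0, so the coverage changes even though the labeling function itself did not.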
So my question is: is the integer value -1 reserved for the ABSTAIN class in Snorkel? If so, everything makes sense. But if so, why is it not mentioned in the tutorials or the documentation?