Set disjoint labels for Multi-Label classification with Simpletransformers


I need to classify text into emotion labels. I'm using multi-label classification because the same text can contain more than one emotion, but I want some labels to be mutually exclusive (disjoint), such as happy/sad or calm/angry.

Let's imagine that I have this code in Python:

from simpletransformers.classification import (
    MultiLabelClassificationModel, MultiLabelClassificationArgs
)

model_args = MultiLabelClassificationArgs(num_train_epochs=1)

# Create a MultiLabelClassificationModel with four output labels
model = MultiLabelClassificationModel(
    "roberta", "roberta-base", num_labels=4,
    args=model_args,  # pass the args, otherwise they are never used
)

with this sample:

train_data = [
    ["AAA", [1, 0, 0, 1]],
    ["BBB", [0, 1, 1, 1]],
    ["CCC", [1, 0, 1, 1]],
]

and I want the first and second labels to be disjoint. How could I do that?

There is 1 solution below

I suggest putting the effort into post-processing (i.e. after obtaining the prediction logits).

Assuming you have already selected a threshold for each label (for example from a ROC curve), one possible approach is:

  1. Separate the labels into disjoint groups (e.g. happy/sad, calm/angry)
  2. Within each group, decide the priority of each label (e.g. Happy > Sad)
  3. Obtain the final labels by first deciding whether the text is Happy, and then deciding whether it is Sad only when it is not Happy
  4. Re-tune the thresholds calculated earlier so that the metrics of all labels satisfy your needs
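
The steps above could be sketched roughly as follows. This is only a minimal illustration in plain Python, not part of Simple Transformers: the label names, their order, the 0.5 thresholds, and the helper `resolve_disjoint` are all assumptions made up for the example, and the input is assumed to be one per-label score (probability) per label for a single text.

```python
from typing import Dict, List, Sequence

# Hypothetical label order, groups and thresholds (illustration only).
LABELS = ["happy", "sad", "calm", "angry"]
# Each group holds mutually exclusive labels, highest priority first.
DISJOINT_GROUPS = [["happy", "sad"], ["calm", "angry"]]
THRESHOLDS = {"happy": 0.5, "sad": 0.5, "calm": 0.5, "angry": 0.5}


def resolve_disjoint(
    probs: Sequence[float],
    labels: Sequence[str] = LABELS,
    groups: Sequence[Sequence[str]] = DISJOINT_GROUPS,
    thresholds: Dict[str, float] = THRESHOLDS,
) -> List[int]:
    """Threshold each label, then keep at most one label per disjoint group."""
    index = {name: i for i, name in enumerate(labels)}

    # Independent per-label decision, as in ordinary multi-label prediction.
    decided = [int(p >= thresholds[name]) for name, p in zip(labels, probs)]

    # Within each group, the first (highest-priority) positive label wins
    # and every lower-priority label in that group is switched off.
    for group in groups:
        winner_found = False
        for name in group:
            i = index[name]
            if winner_found:
                decided[i] = 0
            elif decided[i]:
                winner_found = True
    return decided


# Both "happy" and "sad" pass the threshold, so only "happy" is kept.
print(resolve_disjoint([0.9, 0.8, 0.2, 0.7]))
```

With Simple Transformers, the scores would presumably come from the second value returned by `model.predict(texts)`; check that these are probabilities rather than raw logits before applying thresholds. After changing the priorities, the thresholds in `THRESHOLDS` would need to be re-tuned on validation data, as step 4 suggests.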

In conclusion, be flexible and focus on your "business" objectives (or higher-level objectives) once you have the logits from the ML model! :)