I have set up a bagging classifier in PySpark, in which a binary classifier trains on the positive samples plus an equal number of randomly sampled unlabeled samples (labeled 1 for positive and 0 for unlabeled). The model then predicts the out-of-bag samples, the process repeats, and I now plan to take the average prediction per sample.
My question concerns the model output in PySpark: the `probability` column is a vector of per-class probabilities. For example, the output for binary classification looks like:
model.transform(test_data).show()
+-----+--------------------+
|label|         probability|
+-----+--------------------+
|    0|      [0.294, 0.706]|
|    1|        [0.65, 0.35]|
+-----+--------------------+
To perform positive-unlabeled learning from a binary classifier with this output, do I need to drop the probability predicted for the negative class and use only the model's probability that each unlabeled sample is positive?
Yes. For each unlabeled sample, the second element of the probability vector is the model's estimated probability that the point is positive; the first element is redundant, since the two entries sum to 1. Keep only the positive-class probability and average it across iterations.
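A minimal sketch of that aggregation step in plain Python, assuming you have already collected each sample's out-of-bag probability vectors from the bagging iterations (the `oob_probs` structure below is hypothetical). Within PySpark itself, the same idea could be expressed by extracting index 1 of the vector (e.g. via `pyspark.ml.functions.vector_to_array`, available in Spark 3.0+) and averaging with a `groupBy(...).agg(avg(...))`:

```python
# Hypothetical collected results: sample id -> list of probability
# vectors, one per bag in which the sample was out-of-bag. As in
# Spark's binary-classifier output, probability[0] is the
# unlabeled/negative class and probability[1] is the positive class.
oob_probs = {
    "a": [[0.30, 0.70], [0.20, 0.80], [0.25, 0.75]],
    "b": [[0.90, 0.10], [0.85, 0.15]],
}

def average_positive_score(prob_vectors):
    """Keep only the positive-class entry (index 1) and average it."""
    positives = [p[1] for p in prob_vectors]
    return sum(positives) / len(positives)

scores = {sid: average_positive_score(vs) for sid, vs in oob_probs.items()}
print(scores)  # -> {'a': 0.75, 'b': 0.125}
```

The final score per sample is then a single number in [0, 1] that you can threshold or rank to decide which unlabeled samples are likely positives.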