Removing unseen labels in testing set


I am trying to run naive Bayes, but my testing set has a large number of unseen labels. I thought the easiest way to fix this would be to remove them from the dataset, but I get this error: TypeError: unhashable type: 'set'

unseen_labels = {{'2926', '4254', '684', '1617', '3851', '9849', '9579', '6876', '5220', '5421', '1415', '1766', '4310', '8552', '7259', '5492', '6562', '9969', '5287', '9657', '6837', '9534', '789', '7646', '8045', '3408', '4798', '4685', '9805', '5410', '2428', '9734', '5301', '1347', '5096', '2546', '9909', '10518', '3921', '6225', '1677', '4781', '9045', '6215', '6044', '5789', '9152', '6315', '5635', '6144', '6853', '3293', '2271', '5519', '1414', '7736', '10980', '3151', '11289', '10063', '8611', '10418', '3559', '2347', '6198', '5400', '773', '4197', '2751'...}}  # Replace ... with the rest of your unseen labels

# Convert the 'LABEL' column to numeric (assuming the labels are numeric)
Test1Lables['LABEL'] = pd.to_numeric(df['LABEL'], errors='coerce')

# Remove rows with unseen labels
Test1Lables = Test1Lables[~Test1Lables['LABEL'].isin(unseen_labels)]

What is the best way to handle unseen labels, especially when there are so many?
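For reference, a minimal sketch of the filtering step under some stated assumptions. The error message matches what Python raises for a set nested inside a set, which is what the double braces in `{{...}}` create. Separately, if the labels in the set are strings while the column has been converted to numbers, `isin` will not match anything, so the sketch keeps both sides as strings. The small `train_labels` series, the `TEXT` column, and the sample values are hypothetical stand-ins. Only the `LABEL` column name and the `Test1Lables` variable come from the question.

```python
import pandas as pd

# Hypothetical stand-ins for the real data (assumption, not from the question).
train_labels = pd.Series(['1', '2', '3'], name='LABEL')
Test1Lables = pd.DataFrame({
    'LABEL': ['1', '2926', '3', '4254'],
    'TEXT': ['a', 'b', 'c', 'd'],
})

# A set literal takes ONE pair of braces. Writing {{...}} asks Python to put
# a set inside a set, which raises TypeError: unhashable type: 'set'.
unseen_labels = {'2926', '4254'}

# Compare like with like: the labels above are strings, so keep the column
# as strings instead of converting it to numbers.
Test1Lables['LABEL'] = Test1Lables['LABEL'].astype(str)

# Option 1: drop rows whose label is in the hand-written unseen set.
filtered = Test1Lables[~Test1Lables['LABEL'].isin(unseen_labels)]

# Option 2: skip the hand-written list and keep only labels that occur
# in the training labels.
seen_labels = set(train_labels.astype(str))
filtered_by_seen = Test1Lables[Test1Lables['LABEL'].isin(seen_labels)]
```

Option 2 may scale better when there are many unseen labels, since the set is derived from the training data rather than typed out. Whether dropping those rows is appropriate depends on how the model will be evaluated.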
