How does scikit's one-hot encoding assign dummies?

149 Views Asked by yogz123 At 06 June 2025 at 21:57

For a research paper, I will be using a lasso model to perform classification and feature selection. I am preparing to use one-hot encoding to process my categorical data, and I need to figure out which encoded feature maps to which original categorical value so that I can determine which features were ultimately selected for the final model. I've been googling this question for a while but have not found an answer.

How does scikit's one-hot encoding assign values? For example, say my categorical values for a certain variable are {1, 2, 3, 4}. Does one-hot encoding organize them into dummies in numerical order (i.e. it drops 1, makes the first dummy for value 2, the second dummy for value 3, and the third dummy for value 4)? Or does it assign them based on the order in which it finds different categorical values as it scans down the rows (e.g. the first observation has value 3 and the second observation has value 2, so 3 is dropped and the first dummy becomes value 2)?
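To make the two possibilities concrete, here is a small plain-Python sketch of the two candidate orderings. It is only an illustration with made-up data (`values` is hypothetical) and does not call or reproduce scikit-learn's code.

```python
# Hypothetical column of categorical values, in row order.
values = [3, 2, 4, 1, 3, 2]

# Possibility 1: categories ordered by value (1 would be the dropped level).
sorted_order = sorted(set(values))

# Possibility 2: categories ordered by first appearance while scanning
# down the rows (3 would be the dropped level).
first_seen_order = list(dict.fromkeys(values))

print("sorted by value:    ", sorted_order)
print("first-seen in rows: ", first_seen_order)
```

The two orderings only coincide when the data happens to arrive already sorted, which is why the distinction matters for mapping dummies back to values.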

Thanks!


There is 1 best solution below

Oliver Dain On 27 December 2016 at 00:36

From a quick look at the source, it appears to me that the dummies do end up ordered by integer value. However, since this is not documented, you cannot count on it: it's not part of the contract. If you need to know which value ends up where, I suggest writing your own one-hot implementation. It shouldn't be too hard, and then you can rely on its behavior when you upgrade to new versions, etc.
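A minimal sketch of what such a hand-rolled encoder could look like, assuming plain Python lists and a sorted category order. The function names `fit_categories` and `one_hot`, the `drop_first` parameter, and the sample data are all made up for illustration; this is not scikit-learn's implementation.

```python
def fit_categories(values):
    """Return the distinct values in sorted order; this fixes the column order."""
    return sorted(set(values))


def one_hot(values, categories, drop_first=True):
    """Encode values as rows of 0/1, one column per kept category.

    Returns (rows, kept), where kept[i] is the original value behind column i.
    """
    kept = categories[1:] if drop_first else list(categories)
    known = set(categories)
    rows = []
    for v in values:
        if v not in known:
            raise ValueError(f"unseen category: {v!r}")
        rows.append([1 if v == c else 0 for c in kept])
    return rows, kept


values = [3, 2, 4, 1]  # hypothetical data
categories = fit_categories(values)
encoded, kept = one_hot(values, categories)

# Explicit, stable mapping from dummy column to original value.
for column, category in enumerate(kept):
    print(f"column {column} -> value {category}")
```

Because the category list is computed once and passed in explicitly, the column-to-value mapping stays the same regardless of row order, so a coefficient the lasso keeps can be traced back to its original value through `kept`.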
