I'm performing data analysis on a dataset whose categorical labels are interrelated. The labels track experimental conditions: each one stands for a combination of concentrations of two chemicals, and each combination produces an output measured by n features.
Is it best practice to use the categorical labels in place of the two chemical concentrations, or is there a better method?
Here's a sample of the mapping between each categorical label and the real-life condition it represents:
| Condition | Chemical1 | Chemical2 |
|---|---|---|
| 1 | 1 | 0 |
| 2 | 2 | 0 |
| 3 | 0 | 1 |
| 4 | 0 | 2 |
| 5 | 1 | 1 |
| 6 | 1 | 2 |
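To make the "interrelated" part concrete, here is a minimal sketch, assuming the six rows above are the conditions of interest and treating the concentrations as plain numbers (no units are given). The helper names `one_hot`, `concentrations` and `sq_dist` are hypothetical. It contrasts what a model sees when each condition is an unordered category versus when it is described by its two concentrations.

```python
# Condition label -> (Chemical1, Chemical2), copied from the table above.
CONDITIONS = {
    1: (1, 0),
    2: (2, 0),
    3: (0, 1),
    4: (0, 2),
    5: (1, 1),
    6: (1, 2),
}
LABELS = sorted(CONDITIONS)


def one_hot(label):
    """Encode a condition as an unordered category (what a plain classifier sees)."""
    return [1 if label == other else 0 for other in LABELS]


def concentrations(label):
    """Encode a condition by the two chemical concentrations it stands for."""
    return CONDITIONS[label]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Under one-hot encoding every pair of distinct conditions is equally far apart.
print(sq_dist(one_hot(1), one_hot(2)), sq_dist(one_hot(1), one_hot(4)))  # 2 2
# The concentration encoding keeps the structure: condition 1 = (1, 0) is
# closer to condition 2 = (2, 0) than to condition 4 = (0, 2).
print(sq_dist(concentrations(1), concentrations(2)),
      sq_dist(concentrations(1), concentrations(4)))  # 1 5
```

The condition numbers themselves carry no ordering or distance; only the concentration columns do.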
The effectiveness of either approach depends heavily on the quality and volume of your data. Consider how many samples you have and how complex the phenomenon you are modeling is, since both determine how well a model can generalize. I recommend trying both approaches and comparing them to see which suits your data best.
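Note that the two options lead to different tasks: the categorical labels call for a classification model, while the concentrations call for a regression model with one output per chemical (or one regression model per chemical). Unlike the arbitrary condition numbers, the concentrations preserve which conditions are similar to each other, which a regression model can exploit.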
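One way to try both approaches is sketched below in plain Python, under loudly stated assumptions: the data are **hypothetical** (three made-up features generated as a noisy, invented function of the concentrations), and k-nearest neighbours stands in for whatever model you actually use. All function names are illustrative. The same neighbours are used either to vote on the condition label (classification) or to average the `(Chemical1, Chemical2)` pairs (regression). Both are then scored with leave-one-out cross-validation on the same two metrics: label accuracy, and mean squared error in concentration space. To compute accuracy for the regression model, its output is snapped to the nearest tabulated condition.

```python
import math
import random
from collections import Counter

# Condition label -> (Chemical1, Chemical2), copied from the table in the question.
CONDITIONS = {1: (1, 0), 2: (2, 0), 3: (0, 1), 4: (0, 2), 5: (1, 1), 6: (1, 2)}


def simulate(n_per_condition=5, seed=0):
    """HYPOTHETICAL data: 3 features as an invented noisy function of the concentrations."""
    rng = random.Random(seed)
    rows = []
    for label, (c1, c2) in CONDITIONS.items():
        for _ in range(n_per_condition):
            features = (c1 + rng.gauss(0, 0.2),
                        c2 + rng.gauss(0, 0.2),
                        c1 * c2 + rng.gauss(0, 0.2))
            rows.append((features, label))
    return rows


def neighbours(train, x, k):
    """Return the k training rows closest to feature vector x (Euclidean distance)."""
    return sorted(train, key=lambda row: math.dist(row[0], x))[:k]


def knn_classify(train, x, k=3):
    """Classification target: majority vote over the neighbours' condition labels."""
    votes = Counter(label for _, label in neighbours(train, x, k))
    return votes.most_common(1)[0][0]


def knn_regress(train, x, k=3):
    """Regression target: mean (Chemical1, Chemical2) pair of the neighbours."""
    near = neighbours(train, x, k)
    return tuple(sum(CONDITIONS[label][i] for _, label in near) / k for i in range(2))


def nearest_condition(conc):
    """Map a predicted concentration pair back to the closest tabulated condition."""
    return min(CONDITIONS, key=lambda label: math.dist(CONDITIONS[label], conc))


def leave_one_out(rows, k=3):
    """Score both approaches on label accuracy and concentration MSE."""
    scores = {"classify": [0, 0.0], "regress": [0, 0.0]}  # [correct labels, summed sq. error]
    for i, (x, label) in enumerate(rows):
        train = rows[:i] + rows[i + 1:]  # hold out row i
        true_conc = CONDITIONS[label]
        cls_label = knn_classify(train, x, k)
        reg_conc = knn_regress(train, x, k)
        for name, pred_label, pred_conc in (
            ("classify", cls_label, CONDITIONS[cls_label]),
            ("regress", nearest_condition(reg_conc), reg_conc),
        ):
            scores[name][0] += int(pred_label == label)
            scores[name][1] += math.dist(pred_conc, true_conc) ** 2
    n = len(rows)
    return {name: (hits / n, sse / n) for name, (hits, sse) in scores.items()}


rows = simulate()
for name, (accuracy, mse) in leave_one_out(rows).items():
    print(f"{name}: accuracy={accuracy:.2f}, concentration MSE={mse:.3f}")
```

Scoring both models on both metrics keeps the comparison fair. A classifier can only predict conditions it has seen, whereas the regression model can, in principle, interpolate to concentration combinations that are not in the table. With real data you would swap in your own feature matrix and a model suited to n features.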