How to apply tf-idf to multiple predictors without concatenating them into a single column

I have two predictors and want to vectorize each of them with tf-idf. I don't want to concatenate them first, since each needs its own vocabulary. Should I apply a separate tf-idf vectorizer to each and then join the resulting features?

For example, if applying tf-idf to predictor1 gives 100 features and applying it to predictor2 gives 200, the training data would simply have 300 (100 + 200) features. Am I thinking about this correctly?

I will get two matrices from this (one per predictor). Can I concatenate them using NumPy functions and use the result as features?

There is 1 best solution below

Your approach is correct. The most common way to combine two feature vectors like this is to concatenate them column-wise into one longer vector per sample and feed that to the model.
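As a rough illustration of the concatenation, here is a minimal sketch that assumes scikit-learn's `TfidfVectorizer` (the question does not say which library is in use). The toy texts and variable names are hypothetical. Because that vectorizer returns SciPy sparse matrices, the sketch stacks them with `scipy.sparse.hstack` instead of a NumPy function.

```python
from scipy.sparse import hstack
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy data: two text predictors per training row.
predictor1 = ["red apple pie", "green apple tart", "red cherry pie"]
predictor2 = ["baked in oven", "served cold", "baked and served warm"]

# One vectorizer per predictor, so each learns its own vocabulary.
vec1 = TfidfVectorizer()
vec2 = TfidfVectorizer()
X1 = vec1.fit_transform(predictor1)
X2 = vec2.fit_transform(predictor2)

# Stack the two sparse matrices side by side: one row per sample,
# columns of X1 followed by columns of X2.
X_train = hstack([X1, X2]).tocsr()

# For new data, reuse the fitted vectorizers with transform (not fit_transform)
# so the columns line up with the training features.
new1 = ["green cherry tart"]
new2 = ["served warm"]
X_new = hstack([vec1.transform(new1), vec2.transform(new2)]).tocsr()
```

The column count of `X_train` is the sum of the two vocabulary sizes, which matches the 100 + 200 = 300 reasoning in the question. If the predictors live in a DataFrame, scikit-learn's `ColumnTransformer` may be a tidier way to get the same result.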

If, for some reason, this doesn't work out for you, we can explore alternatives based on what your constraints are.

For example, if your constraint is the total number of dimensions, one way to reduce it would be to train a multilayer perceptron (MLP) autoencoder:

  • We train it with the combined vectors as both input and target output until the reconstruction error stops improving
  • Subsequently, we use the activations of an intermediate layer (typically the narrowest one) as the lower-dimensional input to our model
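The two steps above can be sketched as follows. This is only an illustration under assumptions: it uses scikit-learn's `MLPRegressor` as a stand-in autoencoder, random data in place of the combined tf-idf features, and a hypothetical helper `encode` that replays the forward pass up to the chosen layer. The layer sizes are arbitrary.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
# Hypothetical stand-in for the combined tf-idf features (20 rows, 30 columns).
X = rng.random((20, 30))

# Autoencoder: the target equals the input. The 5-unit middle layer
# is the bottleneck whose activations we want to reuse.
autoencoder = MLPRegressor(
    hidden_layer_sizes=(16, 5, 16),
    activation="relu",
    max_iter=500,
    random_state=0,
)
autoencoder.fit(X, X)


def encode(model, data, n_layers):
    """Forward pass through the first n_layers hidden layers (ReLU)."""
    activations = data
    for weights, bias in zip(model.coefs_[:n_layers], model.intercepts_[:n_layers]):
        activations = np.maximum(0.0, activations @ weights + bias)
    return activations


# Activations of the second hidden layer: 5 features per row instead of 30.
X_reduced = encode(autoencoder, X, n_layers=2)
```

`X_reduced` can then be passed to the downstream model in place of the full concatenated matrix. A dedicated deep learning framework would give more control over training than this sketch does.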

It would be easier to suggest a solution if you can describe your constraints in the question.