I have been given the task of building a sentiment prediction model based on movie reviews. Along with the feature `movie_review`, the train dataset contains other features such as `movie_name`, `release_date`, etc., and the target sentiment (positive or negative).
I have vectorized the feature `movie_review` using sklearn's `TfidfVectorizer()`. Now I am trying to concatenate two dataframes:
- The `train` dataset, which is of shape (156311, 5)
- The dataframe I got as the output of `TfidfVectorizer()` after vectorizing the `movie_review` feature column of the `train` dataset. The shape of this dataframe is (156311, 65220). It was produced roughly as in the sketch below.
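For reference, the vectorization step looks roughly like this (variable names are illustrative; I converted the sparse output to a dense dataframe):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
import pandas as pd

vectorizer = TfidfVectorizer()

# fit_transform returns a scipy.sparse matrix of shape (156311, 65220)
tfidf_matrix = vectorizer.fit_transform(train['movie_review'])

# densify it into a DataFrame so it can be concatenated with `train`
review_vectorized = pd.DataFrame(
    tfidf_matrix.toarray(),
    columns=vectorizer.get_feature_names_out(),
)
```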
In order to concatenate the two dataframes, I use the following call:

```python
pd.concat([train, review_vectorized], axis=1)
```
The problem is that every time I try to run this, RAM runs out and Google Colab crashes.
Is there a more efficient way of concatenating dataframes? Or even better, is there a way to vectorize the text column 'in place', so that we wouldn't need to create a separate dataframe with the vectorized text and then concatenate it with the original dataframe?
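I have seen suggestions to avoid densifying at all and instead keep everything sparse, e.g. with `scipy.sparse.hstack`. A minimal sketch of what I imagine that would look like (the `select_dtypes` filtering of the non-text columns is my own guess, untested):

```python
from scipy.sparse import hstack, csr_matrix

# keep the TF-IDF output sparse and stack the remaining
# numeric columns of `train` next to it, column-wise
other_features = train.drop(columns=['movie_review']).select_dtypes(include='number')
combined = hstack([csr_matrix(other_features.values), tfidf_matrix])

# `combined` is a sparse matrix, not a DataFrame -- is that acceptable
# for downstream sklearn models, or do I lose something by not having
# a DataFrame here?
```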