How to extract keywords using TFIDF for each row in python?

Question


2.1k views · Asked by Monika Iyer on 18 August 2025 at 02:25

I have a DataFrame column that contains only text. I need to extract the top keywords from each row using TF-IDF.

Example Input:

df['Text']
'I live in India',
'My favourite colour is Red', 
'I Love Programming'

Expected output:

df['Text']                          df['Keywords']
'I live in India'                   'live', 'India'
'My favourite colour is Red'        'favourite', 'colour', 'red'
'I Love Programming'                'love', 'programming'

How do I get this? I tried the following code:

tfidf = TfidfVectorizer(max_features=300, ngram_range = (2,2))
Y = df['Text'].apply(lambda x: tfidf.fit_transform(x))

but I get the following error: Iterable over raw text documents expected, string object received.


There are 3 best solutions below

Tom Ron on 20 April 2020 at 07:34

TfidfVectorizer's fit_transform method expects an iterable (e.g. a list, a set, etc.) of sentences/documents to fit the TF-IDF scores on, not a single string. Inside your apply, each x is one string, which is why you get that error.

So what you should do instead is:

Y = tfidf.fit_transform(df['Text'])

Marko Tankosic on 19 August 2020 at 08:58

As some people have pointed out already, there are several issues with your code and approach. The first is that you should not use TF-IDF for this task: it is not meant to be used on small corpora, because the IDF weights are estimated from how many documents contain each word, and a handful of short sentences gives them very little to work with. You'll be better off using RAKE or flashtext's KeywordProcessor.
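RAKE (Rapid Automatic Keyword Extraction) scores phrases within a single document, so it does not need a corpus. The following is a minimal from-scratch sketch of its scoring idea, not the API of any RAKE package: candidate phrases are runs of words between stop words or punctuation, each word scores degree/frequency, and a phrase scores the sum of its word scores. The tiny STOP_WORDS set and the function names are hypothetical:

```python
import re
from collections import defaultdict

# Hypothetical, deliberately tiny stop-word list; a real setup would use a full
# list such as NLTK's stopwords.words('english').
STOP_WORDS = {'i', 'in', 'is', 'my', 'a', 'an', 'the', 'and', 'of', 'to'}

def candidate_phrases(text):
    """Split text into runs of consecutive non-stop words (RAKE's candidates)."""
    phrases, current = [], []
    for token in re.findall(r"[a-z0-9']+|[^\sa-z0-9']", text.lower()):
        if token in STOP_WORDS or not re.match(r"[a-z0-9']", token):
            # A stop word or a punctuation mark ends the current phrase
            if current:
                phrases.append(tuple(current))
                current = []
        else:
            current.append(token)
    if current:
        phrases.append(tuple(current))
    return phrases

def rake(text, top_n=3):
    """Rank candidate phrases by the sum of degree(w) / freq(w) over their words."""
    phrases = candidate_phrases(text)
    freq, degree = defaultdict(int), defaultdict(int)
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            degree[word] += len(phrase)  # co-occurrences in the phrase, including itself
    word_score = {w: degree[w] / freq[w] for w in freq}
    scored = {' '.join(p): sum(word_score[w] for w in p) for p in set(phrases)}
    return sorted(scored, key=scored.get, reverse=True)[:top_n]

for sentence in ['I live in India', 'My favourite colour is Red', 'I Love Programming']:
    print(sentence, '->', rake(sentence))
```

Unlike the expected output in the question, RAKE returns multi-word phrases such as 'favourite colour' whenever adjacent words are not separated by a stop word or punctuation.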

Another issue with your code is that you are trying to get 'unigrams' from your text, yet you have set up the ngram_range in your vectorizer to (2,2), meaning it will only find 'bigrams' (phrases consisting of two words).

If you insist on your chosen approach, first make sure each row of df['Text'] holds exactly one sentence (you can use part of @ManojK's solution for that), then pass each row's text to the vectorizer wrapped in a list:

Y = df['Text'].apply(lambda x: tfidf.fit_transform([x]))

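However, if you want to extract the feature names (which are essentially your keywords), you'll need to write a function that calls get_feature_names_out() (get_feature_names() in scikit-learn versions before 1.0) after each fit, and apply that function instead of the bare lambda. Because the same tfidf object is refit on every row, its vocabulary after the apply reflects only the last row.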

**ManojK** · Accepted Answer

Try the code below if you want to tokenize your sentences; it splits each sentence into words and drops English stop words:

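import nltk
import pandas as pd
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

# One-time download of tokenizer data (newer NLTK releases need 'punkt_tab') and stop words
for resource in ('punkt', 'punkt_tab', 'stopwords'):
    nltk.download(resource, quiet=True)

df = pd.DataFrame({'Text':['I live in India', 'My favourite colour is Red', 'I Love Programming']})
df['Keywords'] = df.Text.apply(word_tokenize)  # split each sentence into word tokens
stops = set(stopwords.words('english'))  # a set makes membership checks fast
# Drop English stop words (compared case-insensitively)
df['Keywords'] = df['Keywords'].apply(lambda x: [item for item in x if item.lower() not in stops])
df['Keywords'] = df['Keywords'].apply(', '.join)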

print(df)

                         Text                Keywords
0             I live in India             live, India
1  My favourite colour is Red  favourite, colour, Red
2          I Love Programming       Love, Programming
