Python Pandas concatenate a Series of strings into one string


In Python pandas, there is a Series (here, a DataFrame column) of str values to combine into one long string:

import pandas as pd

df = pd.DataFrame({'text': pd.Series(['Hello', 'world', '!'], index=['a', 'b', 'c'])})

Goal: 'Hello world !'

So far, methods such as df['text'].apply(lambda x: ' '.join(x)) only return another Series rather than a single string.
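A quick sketch of why this happens: Series.apply calls the function once per element, and str.join applied to a single string inserts the separator between that string's characters. The frame is the one from the question; nothing else is assumed.

```python
import pandas as pd

df = pd.DataFrame({'text': pd.Series(['Hello', 'world', '!'], index=['a', 'b', 'c'])})

# Series.apply passes each element (one str at a time) to the lambda,
# so ' '.join('Hello') iterates over the characters of 'Hello'
per_element = df['text'].apply(lambda x: ' '.join(x))

# per_element is still a Series with the original index:
# a -> 'H e l l o', b -> 'w o r l d', c -> '!'
```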

What is the best way to get the concatenated goal string?


Best answer:

You can call str.join on the Series directly, because iterating over a Series yields its values:

In [3]: ' '.join(df['text'])
Out[3]: 'Hello world !'
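A small sketch of the same idea, plus one caveat: str.join only accepts strings, so a Series holding numbers (or NaN) has to be converted first. The `numbers` Series below is a hypothetical example, not part of the question.

```python
import pandas as pd

df = pd.DataFrame({'text': pd.Series(['Hello', 'world', '!'], index=['a', 'b', 'c'])})

# Iterating over a Series yields its values, so str.join consumes them directly
sentence = ' '.join(df['text'])  # 'Hello world !'

# Hypothetical non-string Series: ' '.join(numbers) would raise TypeError,
# because join requires every item to be a str
numbers = pd.Series([1, 2, 3])
joined_numbers = ' '.join(numbers.astype(str))  # '1 2 3'

# Any separator works the same way
csv_line = ','.join(df['text'])  # 'Hello,world,!'
```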
Answer 2:

Apart from join, you could also use the pandas string method .str.cat:

In [171]: df.text.str.cat(sep=' ')
Out[171]: 'Hello world !'

However, join() is generally faster, because .str.cat does extra work (such as checking for missing values) before it joins.
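The flip side of that extra work is how missing values are handled. A sketch, assuming a hypothetical Series with a None entry: .str.cat leaves missing values out by default and substitutes na_rep when it is given, whereas a plain ' '.join would raise a TypeError on the non-string entry unless the missing values are dropped first.

```python
import pandas as pd

# Hypothetical Series with a missing value
s = pd.Series(['Hello', None, 'world'])

# By default (na_rep=None), missing values are omitted from the result
skipped = s.str.cat(sep=' ')  # 'Hello world'

# na_rep substitutes a placeholder for each missing value
filled = s.str.cat(sep=' ', na_rep='?')  # 'Hello ? world'

# Equivalent with plain join: drop the missing values first
joined = ' '.join(s.dropna())  # 'Hello world'
```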

Answer 3:

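Your code is returning a Series because Series.apply calls the function on each element separately, so ' '.join only ever sees one string at a time. Call apply on the DataFrame instead, which passes each whole column to the function along the chosen axis: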

df.apply(' '.join, axis=0)
text    Hello world !
dtype: object

Specifying axis=0 (which is also the default for DataFrame.apply) passes each column to ' '.join as a Series, so all the values in a column are combined into a single string. The return type is a Series whose index labels are the column names and whose values are the corresponding joined strings. This is particularly useful if you want to join several columns at once, each into its own string.
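To illustrate the multi-column case and how to get a plain string back out: the second column `more` below is hypothetical, added only for the example. Indexing the resulting Series by column label returns the joined str itself.

```python
import pandas as pd

# Question's data plus a hypothetical second column
df = pd.DataFrame({
    'text': ['Hello', 'world', '!'],
    'more': ['foo', 'bar', 'baz'],
}, index=['a', 'b', 'c'])

# axis=0 (the default): ' '.join receives each column as a Series
per_column = df.apply(' '.join, axis=0)
# per_column is indexed by column name:
# text -> 'Hello world !', more -> 'foo bar baz'

# Pull out a single column's string as a plain str
text_string = per_column['text']
```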

Generally, I find it confusing to work out which axis you need when using apply, so if it doesn't behave the way you expect, try applying along the other axis too.
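For comparison, a sketch of both axes on a hypothetical two-column frame (the `more` column is invented for illustration): axis=1 passes each row to the function, so you get one joined string per row instead of one per column.

```python
import pandas as pd

# Hypothetical frame with two string columns
df = pd.DataFrame({
    'text': ['Hello', 'world', '!'],
    'more': ['foo', 'bar', 'baz'],
}, index=['a', 'b', 'c'])

# axis=1: ' '.join receives each row as a Series (values in column order)
per_row = df.apply(' '.join, axis=1)
# indexed by row label: a -> 'Hello foo', b -> 'world bar', c -> '! baz'

# axis=0 (the default): one string per column
per_column = df.apply(' '.join, axis=0)
# indexed by column name: text -> 'Hello world !', more -> 'foo bar baz'
```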