Why do the following two apply() calls, with the same intended functionality, give different results?


The objective is to convert every string in a pandas DataFrame (df) to uppercase. I used the following 3 approaches, but I'm not sure why the 1st one does NOT work. Please refer to the code snippet and the 3 resulting DataFrames, df1, df2 and df3. Using pandas.__version__ == '2.1.1'.

import pandas as pd

# create a sample dataframe
df = pd.DataFrame({'C0': [1, 2, 3], 'C1': ['a', 'b', 'c']})

# apply lambda function to each element of the dataframe
df1 = df.apply(lambda x: x.upper() if isinstance(x, str) else x)
print(df1, '\n')

# convert all elements of the dataframe to strings and then convert them to uppercase
df2 = df.apply(lambda x: x.astype(str).str.upper())
print(df2, '\n')

# applymap applies the lambda to each element of the dataframe (deprecated since pandas 2.1.0 in favour of DataFrame.map)
df3 = df.applymap(lambda x: x.upper() if isinstance(x, str) else x)
print(df3, '\n')


   C0 C1
0   1  a
1   2  b
2   3  c 

  C0 C1
0  1  A
1  2  B
2  3  C 

   C0 C1
0   1  A
1   2  B
2   3  C 
  1. Why do the outputs of df1 and df2 differ?
  2. Why do we need to cast each column to string, as in df2?
  3. Why doesn't apply() work in df1, when applymap() works in df3, where isinstance() checks whether each element is a string?

Answer by RomanPerekhrest

Why doesn't apply() work in df1?

df.apply by default applies a function along axis=0, i.e. to each column, and each column is passed in as a pd.Series. A Series is not a str, so it fails the isinstance(x, str) check and is returned unchanged.
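A quick way to see this (a small sketch reusing the question's sample frame) is to have the lambda report what it receives instead of transforming it:

```python
import pandas as pd

df = pd.DataFrame({'C0': [1, 2, 3], 'C1': ['a', 'b', 'c']})

# With the default axis=0, apply() calls the function once per column,
# passing the whole column as a pd.Series, never a single cell value.
received = df.apply(lambda x: type(x).__name__)
print(received)

# Because x is a Series, isinstance(x, str) is False for every column,
# so the original lambda returns each column unchanged.
checks = df.apply(lambda x: isinstance(x, str))
print(checks)
```

Both results are Series indexed by the column names C0 and C1, one entry per call.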

Why do we need to cast each column to string as in df2?

You don't actually have to, since you only want to change one specific column; the cast only matters if you want to change multiple columns at once. For a single column, use .str.upper() on it directly:

df1 = df.assign(C1=df['C1'].str.upper())
print(df1, '\n')

   C0 C1
0   1  A
1   2  B
2   3  C 
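A short sketch (reusing the question's sample frame) of why df2 needs the cast: apply() passes every column, including the integer column C0, and the .str accessor raises AttributeError on non-string data. With the cast it works, but C0 is turned into strings. The last part shows one possible way to uppercase several string columns at once while leaving numeric columns alone; str_cols and df_up are illustrative names, and picking columns with pandas.api.types.is_string_dtype is an assumption about how you want to detect text columns.

```python
import pandas as pd
from pandas.api.types import is_string_dtype

df = pd.DataFrame({'C0': [1, 2, 3], 'C1': ['a', 'b', 'c']})

# 1) Without the cast: apply() also passes the int64 column C0,
#    and the .str accessor raises AttributeError on non-string data.
try:
    df.apply(lambda x: x.str.upper())
    cast_needed = False
except AttributeError as exc:
    cast_needed = True
    print('AttributeError:', exc)

# 2) With the cast it works, but C0 now holds the strings '1', '2', '3'.
df2 = df.apply(lambda x: x.astype(str).str.upper())
print(df2['C0'].tolist())

# 3) Illustrative alternative: uppercase only the string columns,
#    so numeric columns keep their values and dtype.
str_cols = [c for c in df.columns if is_string_dtype(df[c])]
df_up = df.copy()
for c in str_cols:
    df_up[c] = df_up[c].str.upper()
print(df_up)
```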

Answer by itprorh66

Since the second and third examples show the same values, I assume you'd like to understand why the first approach doesn't work. The short answer is that in the statement df.apply(lambda x: x.upper() if isinstance(x, str) else x) the variable x is not a single item but a whole column (or a whole row, with axis=1), passed as a pd.Series. Since x is not a string, x.upper() is never called and every column is returned unchanged.
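If you want to keep the original element-wise lambda, one option (a sketch, not the only way) is to make sure it runs per cell: inside apply(), map it over each column with Series.map, or, in pandas 2.1 and later, call DataFrame.map, the new name for the deprecated applymap. The helper upper_if_str is a hypothetical name introduced here.

```python
import pandas as pd

df = pd.DataFrame({'C0': [1, 2, 3], 'C1': ['a', 'b', 'c']})

def upper_if_str(v):
    # Hypothetical helper: runs on a single cell, so isinstance(v, str) is meaningful.
    return v.upper() if isinstance(v, str) else v

# apply() still hands over whole columns; Series.map then visits each cell.
df1_fixed = df.apply(lambda col: col.map(upper_if_str))
print(df1_fixed)

# DataFrame.map only exists in pandas >= 2.1, hence the guard.
if hasattr(pd.DataFrame, 'map'):
    print(df.map(upper_if_str))
```

Mapping per cell keeps C0 as integers, unlike the astype(str) approach in df2.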