Dynamically accessing a pandas dataframe column

Question

Dynamically accessing a pandas dataframe column

5.9k Views Asked by ℕʘʘḆḽḘ At 17 August 2025 at 03:45

Consider this simple example

import pandas as pd

df = pd.DataFrame({'one' : [1,2,3],
                   'two' : [1,0,0]})

df 
Out[9]: 
   one  two
0    1    1
1    2    0
2    3    0

I want to write a function that takes as inputs a dataframe df and a column mycol.

Now this works:

df.groupby('one').two.sum()
Out[10]: 
one
1    1
2    0
3    0
Name: two, dtype: int64

this works too:

 def okidoki(df,mycol):
    return df.groupby('one')[mycol].sum()

okidoki(df, 'two')
Out[11]: 
one
1    1
2    0
3    0
Name: two, dtype: int64

but this FAILS

def megabug(df,mycol):
    return df.groupby('one').mycol.sum()

megabug(df, 'two')
 AttributeError: 'DataFrameGroupBy' object has no attribute 'mycol'

What is wrong here?

I am worried that okidoki uses some chaining that might create some subtle bugs (https://pandas.pydata.org/pandas-docs/stable/indexing.html#why-does-assignment-fail-when-using-chained-indexing).

How can I still keep the syntax groupby('one').mycol? Can the mycol string be converted to something that might work that way? Thanks!

Original Q&A

There are 2 best solutions below

jezrael On 28 August 2017 at 14:39

I think you need [] for select column by column name what is general solution for selecting columns, because select by attributes have many exceptions:

You can use this access only if the index element is a valid python identifier, e.g. s.1 is not allowed. See here for an explanation of valid identifiers.

The attribute will not be available if it conflicts with an existing method name, e.g. s.min is not allowed.

Similarly, the attribute will not be available if it conflicts with any of the following list: index, major_axis, minor_axis, items, labels.

In any of these cases, standard indexing will still work, e.g. s['1'], s['min'], and s['index'] will access the corresponding element or column.

def megabug(df,mycol):
    return df.groupby('one')[mycol].sum()

print (megabug(df, 'two'))

one
1    1
2    0
3    0
Name: two, dtype: int64

**this be Shiva** · Accepted Answer

You pass a string as the second argument. In effect, you're trying to do something like:

df.'two'

Which is invalid syntax. If you're trying to dynamically access a column, you'll need to use the index notation, [...] because the dot/attribute accessor notation doesn't work for dynamic access.

Dynamic access on its own is possible. For example, you can use getattr (but I don't recommend this, it's an antipattern):

In [674]: df
Out[674]: 
   one  two
0    1    1
1    2    0
2    3    0

In [675]: getattr(df, 'one')
Out[675]: 
0    1
1    2
2    3
Name: one, dtype: int64

Dynamically selecting by attribute from a groupby call can be done, something like:

In [677]: getattr(df.groupby('one'), mycol).sum() 
Out[677]: 
one
1    1
2    0
3    0
Name: two, dtype: int64

But don't do it. It is a horrid anti pattern, and much more unreadable than df.groupby('one')[mycol].sum().

Dynamically accessing a pandas dataframe column

There are 2 best solutions below

Related Questions in PYTHON

Related Questions in PANDAS

Related Questions in DATAFRAME

Related Questions in DYNAMIC

Related Questions in ACCESSOR

Trending Questions

Popular # Hahtags

Popular Questions