Why/How does Pandas use square brackets with .loc and .iloc?


So .loc and .iloc are not your typical functions. They somehow use [ and ] to surround the arguments, so that it looks like normal array indexing. However, I have never seen this in another library (that I can think of; maybe numpy has something like this that I'm blanking on), and I have no idea how it technically works or is defined in the Python code.

Are the brackets in this case just syntactic sugar for a function call? If so, how would one make an arbitrary function use brackets instead of parentheses? Otherwise, what is special about their use/definition in Pandas?



Best answer

Note: The first part of this answer is a direct adaptation of my answer to this other question, that was answered before this question was reopened. I expand on the "why" in the second part.

So .loc and .iloc are not your typical functions

Indeed, they are not functions at all. I'll use loc in the examples; iloc is analogous (it just uses different internal classes). The simplest way to check what loc actually is, is to print its class:

import pandas as pd
df = pd.DataFrame()
print(df.loc.__class__)

which prints

<class 'pandas.core.indexing._LocIndexer'>

This tells us that df.loc is an instance of the _LocIndexer class. The loc[] syntax works because _LocIndexer defines __getitem__ and __setitem__*, which are the methods Python calls whenever you use square-bracket syntax on an object.

So yes, the brackets are technically syntactic sugar for a method call, just not a call to the function you thought it was. (There are of course many reasons why Python is designed this way; I won't go into the details here because 1) I am not sufficiently expert to give an exhaustive answer and 2) there are plenty of better resources on the web on this topic.)

*Technically, it's the base class _LocationIndexer that defines those methods; I'm simplifying a bit here.
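To make the mechanism concrete (and to answer the question's "how would one make something use brackets"), here is a minimal sketch of the same pattern in plain Python. It does not reproduce pandas' internals: `Table` and `_LabelIndexer` are hypothetical names that only mimic how `df.loc` hands back an object whose `__getitem__`/`__setitem__` do the actual work. Exposing the indexer through a property is one simple way to do it, assumed here for illustration.

```python
class _LabelIndexer:
    """Hypothetical stand-in for an indexer like _LocIndexer: it keeps a
    reference to its parent and implements the bracket protocol."""

    def __init__(self, parent):
        self._parent = parent

    def __getitem__(self, label):
        # Called for: table.loc[label]
        return self._parent._data[label]

    def __setitem__(self, label, value):
        # Called for: table.loc[label] = value
        self._parent._data[label] = value


class Table:
    """Hypothetical container keyed by labels."""

    def __init__(self, data):
        self._data = dict(data)

    @property
    def loc(self):
        # Attribute access returns an indexer object bound to this table,
        # so "t.loc" is an object, not a method waiting to be called.
        return _LabelIndexer(self)


t = Table({"x": 1, "y": 2})
print(type(t.loc).__name__)  # _LabelIndexer
print(t.loc["y"])            # 2
t.loc["z"] = 3               # routed to _LabelIndexer.__setitem__
print(t.loc["z"])            # 3
```

The brackets never touch `Table` itself; they are applied to whatever object `t.loc` returns, which is exactly why `df.loc` prints as an indexer instance rather than a bound method.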


Why does Pandas use square brackets with .loc and .iloc?

I'm entering speculation territory here, because I couldn't find any document that explicitly discusses design choices in Pandas. That said, I see at least two good reasons for choosing square brackets.

The first, and most important, reason is that you simply can't do everything with a function call that you can do with square-bracket notation, because assigning to a function call is a syntax error in Python:

# contrived example to show this can't work
a = []
def f():
  global a
  return a
f().append(1) # OK
f() = dict() # SyntaxError: cannot assign to function call
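One way to check this without crashing a script is to hand both forms to Python's built-in `compile()`: assigning to a call is rejected at compile time, while assigning to a subscription (even a subscription of a call's result) is accepted. The helper name `compiles` is just for illustration, and `f`/`df` are never evaluated, only parsed.

```python
def compiles(source):
    """Return True if `source` is syntactically valid Python, else False."""
    try:
        compile(source, "<demo>", "exec")
        return True
    except SyntaxError:
        return False


print(compiles("f() = dict()"))      # False: cannot assign to a call
print(compiles("f()[0] = dict()"))   # True: the target is a subscription
print(compiles("df.loc['a'] = 1"))   # True: the form .loc relies on
print(compiles("df.loc('a') = 1"))   # False: a call-based .loc could not be assigned to
```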

Using round brackets for a "function" call invokes the object's __call__ method. ("Function" call is a loose term here: any class that defines __call__ is callable, and Python doesn't care whether something is a function or just behaves like one.)
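A short illustration of that point: `Doubler` is a hypothetical class, not a function, yet round brackets work on its instances because it defines `__call__`.

```python
class Doubler:
    """Not a function, but callable because it defines __call__."""

    def __call__(self, x):
        return 2 * x


d = Doubler()
print(callable(d))         # True
print(d(21))               # 42, equivalent to d.__call__(21)
print(type(d).__name__)    # Doubler, not 'function'
```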

Square brackets, instead, call either __getitem__ or __setitem__ depending on where the expression appears: __setitem__ when it is the target of an assignment (on the left of =), __getitem__ in any other case (and __delitem__ for a del statement). There is no way to mimic this behaviour with a function call: you'd need a separate setter method to modify the data in the DataFrame, and even then you couldn't use it as the target of an assignment:

# imaginary method-based alternative to the square bracket notation:
my_data = df.get_loc(my_index)
df.set_loc(my_index, my_data*2)
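The dispatch rule described above can be observed directly with a small tracing class (a hypothetical helper, not part of pandas) that records which special method each bracket expression triggers. Note that an augmented assignment such as `x[k] += 1` uses both: a read through `__getitem__`, then a write through `__setitem__`.

```python
class Trace:
    """Records which special method each bracket expression triggers."""

    def __init__(self):
        self.data = {}
        self.calls = []

    def __getitem__(self, key):
        self.calls.append(("__getitem__", key))
        return self.data[key]

    def __setitem__(self, key, value):
        self.calls.append(("__setitem__", key))
        self.data[key] = value

    def __delitem__(self, key):
        self.calls.append(("__delitem__", key))
        del self.data[key]


t = Trace()
t["a"] = 1      # target of an assignment -> __setitem__
_ = t["a"]      # any other use -> __getitem__
t["a"] += 1     # augmented assignment -> __getitem__, then __setitem__
del t["a"]      # del statement -> __delitem__
for name, key in t.calls:
    print(name, key)
```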

This example brings me to the second reason: consistency. You can access elements of a DataFrame via square brackets:

something = df['a']
df['b'] = 2*something

When using loc you're still referring to items in the DataFrame, so it's more consistent to use the same syntax instead of asking the user to call getter and setter methods (it's also, I believe, "more pythonic", but that's a fuzzy concept I'd rather stay away from).
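To compare the two styles side by side, here is a toy class offering both. `ToyFrame`, `get_loc` and `set_loc` are hypothetical names mirroring the imaginary methods above, not pandas API. The bracket protocol simply forwards to the same getter and setter, but it lets an update be written as a single (augmented) assignment instead of a read followed by a separate write.

```python
class ToyFrame:
    """Hypothetical container offering a method API and a bracket API."""

    def __init__(self, data):
        self._data = dict(data)

    # Method-based style: an explicit getter and setter.
    def get_loc(self, key):
        return self._data[key]

    def set_loc(self, key, value):
        self._data[key] = value

    # Bracket style: the subscription protocol reuses the same methods.
    __getitem__ = get_loc
    __setitem__ = set_loc


a = ToyFrame({"x": 10})
a.set_loc("x", a.get_loc("x") * 2)   # method style: read, then write back
# a.get_loc("x") *= 2 would be a SyntaxError: you can't assign to a call

b = ToyFrame({"x": 10})
b["x"] *= 2                          # bracket style: one augmented assignment

print(a.get_loc("x"), b["x"])        # 20 20
```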

Other answer

Under the covers, both use the __getitem__ and __setitem__ methods.