Is the pandas df.loc function limited by the size of the dataframe? It works for small indexes but not for large ones

I have a pretty big 3D map as a dataframe (rawDF: 1392640 rows by 3 columns, named 'X', 'Y', 'Z'). I want to select a Y value and analyze the corresponding X-Z profile.

I'm using pandas (pd) df.loc function and matplotlib.pyplot.plot (plt) to check the extracted profile:

profile_2d = rawDF.loc[rawDF['Y'] == 2.58]
plt.plot(profile_2d.X, profile_2d.Z)

This works just fine, as expected. profile_2d is a 1360-row by 3-column dataframe with the same value (2.58) repeated in the Y column. In this case, I can proceed with the analysis without problems.

However, for higher Y values (in this case, the rows at index 1358612 to 1359971):

profile_2d = rawDF.loc[rawDF['Y'] == 2574.84]
plt.plot(profile_2d.X, profile_2d.Z)

This returns an empty profile_2d (0 rows by 3 columns), and the plot is empty.

This method is run in a loop, once per 2D profile. Right now the loop runs without errors for all points, but I have no way of seeing what is happening in the profiles with high Y values.

I have searched online for a similar problem, but have not found it. I have tried forcing the dataframe to be "float" type:

rawDF = rawDF.astype(float)

But it does not work (I think it's read as float to begin with).

I'm out of ideas and I think the problem is somehow related to the df.loc function. Does anyone know what is happening? I can provide the dataset for testing by email/link if required.

There is 1 best solution below

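If you're using floats, chances are rawDF['Y'] == 2574.84 is never true: the stored values are binary floating-point numbers and may differ from the literal 2574.84 in the last few digits, so an exact comparison can miss them. You'd need a tolerance, e.g.: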

profile = rawDF.loc[rawDF['Y'].between(2574.84 - 0.01, 2574.84 + 0.01)]

(or whatever is your desired precision).
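
To illustrate why the exact comparison can come back empty, here is a minimal sketch on a toy dataframe. The data, the name raw_df and the tolerance tol are made up for the example and are not taken from the asker's dataset; np.isclose is shown as one possible alternative to between, not as the only way to do it.

```python
import numpy as np
import pandas as pd

# Toy stand-in for rawDF (values invented for illustration).
# 0.1 + 0.2 evaluates to 0.30000000000000004, which is not equal to 0.3.
raw_df = pd.DataFrame({
    "X": [0.0, 1.0, 2.0, 3.0],
    "Y": [0.1 + 0.2, 0.1 + 0.2, 2.58, 2.58],
    "Z": [10.0, 11.0, 12.0, 13.0],
})

target = 0.3

# Exact comparison: selects nothing, because no stored Y is bit-for-bit 0.3.
exact = raw_df.loc[raw_df["Y"] == target]

# Quick diagnostic: distance from the target to the nearest stored Y.
# A tiny non-zero value points to rounding rather than missing data.
nearest_gap = (raw_df["Y"] - target).abs().min()

# Tolerance-based selection. tol is an assumption: pick something well
# below the spacing between neighbouring Y values in your own data.
tol = 1e-6
close = raw_df.loc[np.isclose(raw_df["Y"], target, rtol=0.0, atol=tol)]

# The same selection written with Series.between, as in the answer above.
between = raw_df.loc[raw_df["Y"].between(target - tol, target + tol)]

print(len(exact), len(close), len(between), nearest_gap)
```

If the Y values in the real data sit on a regular grid, the tolerance should stay well below the grid spacing so that neighbouring profiles are not merged into one selection.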