Why does df.loc[] return the same row twice, when there is only one?


I am currently working with a few stock price CSV files, but .loc[] behaves strangely after I use pd.read_csv to load the data into a DataFrame. I am doing this with 10 files and wanted to process many more before I ran into this problem, which happens with only ONE of the files.

I want to subset each df to show only the data between 9:30 and 16:00. It is a simple operation that has always worked without an issue:

import datetime as dt
open = dt.time(hour=9, minute=30)    # note: this name shadows the built-in open()
close = dt.time(hour=15, minute=59)

But when I call:

 df.loc[open]

I get:

                       Open   High     Low   Close  Volume
Date
2017-12-29 09:30:00  119.46  119.6  119.42  119.57     480
2017-12-29 09:30:00  119.46  119.6  119.42  119.57     480

However, there are no duplicates in the CSV, and when I print parts of the DataFrame or pause the debugger to inspect the df in memory, there are no duplicates there either.
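Since eyeballing a large frame can miss a repeated timestamp, one way to test the "no duplicates" observation is to ask the index directly. This is only a minimal sketch: the small frame below is a hypothetical stand-in for the problematic file (the values are made up to resemble the output above), and it uses `Index.is_unique` and `Index.duplicated` from pandas.

```python
import pandas as pd

# Hypothetical stand-in for the problematic file: one timestamp appears twice.
idx = pd.to_datetime([
    "2017-12-29 09:30:00",
    "2017-12-29 09:30:00",
    "2017-12-29 09:31:00",
])
df = pd.DataFrame(
    {"Open": [119.46, 119.46, 119.50], "Volume": [480, 480, 300]},
    index=idx,
)
df.index.name = "Date"

# False means at least one index label occurs more than once.
index_is_unique = df.index.is_unique

# keep=False marks every occurrence of a repeated label, not only the later ones.
dup_mask = df.index.duplicated(keep=False)
duplicated_rows = df[dup_mask]

# One possible cleanup: keep only the first occurrence of each timestamp.
deduped = df[~df.index.duplicated(keep="first")]

print(index_is_unique)
print(duplicated_rows)
print(len(deduped))
```

If `index_is_unique` comes back `True` for the real frame, the repeated row would have to come from somewhere other than a repeated index label, which would at least narrow things down.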

This happens with any time I pass and with any column names I add to the .loc[] call, but only with ONE of the DataFrames.

This also breaks other parts of my script. For instance, when I retrieve a value from a row and use it in a calculation, it throws an error because the lookup returns a Series when it should return a single value.
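The Series-versus-single-value symptom can be reproduced in isolation. The sketch below is an illustration under assumptions, not the original script: both frames are hypothetical, and the lookup uses a full `pd.Timestamp` label where the code above uses a `dt.time`. With a unique index, `.loc[label, column]` gives a scalar; with a repeated label it gives a Series.

```python
import pandas as pd

ts = pd.Timestamp("2017-12-29 09:30:00")

# Hypothetical frames: same data, but the second one repeats the index label.
unique_df = pd.DataFrame({"Close": [119.57]}, index=pd.DatetimeIndex([ts]))
dup_df = pd.DataFrame({"Close": [119.57, 119.57]}, index=pd.DatetimeIndex([ts, ts]))

single = unique_df.loc[ts, "Close"]    # scalar, because the label is unique
repeated = dup_df.loc[ts, "Close"]     # Series, because the label occurs twice

# Defensive retrieval: a list selector always yields a Series, so taking
# the first element works whether or not the label is repeated.
first_close = dup_df.loc[[ts], "Close"].iloc[0]

print(type(single).__name__)
print(type(repeated).__name__, len(repeated))
print(first_close)
```

Taking `.iloc[0]` only hides the duplication, so it is a workaround for the calculation step and not a fix for the underlying data.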

I have used .loc with a DatetimeIndex many times before but never encountered this. I tried resetting the index, using different times, and making copies of the DataFrames. Nothing works; it keeps behaving as if every row exists twice (in this one particular DataFrame), which is not the case.

Thank you to anyone who tries to help.
