Dask: get only the first result of .loc (like iloc[0])

Sample dask dataframe:

import pandas as pd
import dask
import dask.dataframe as dd

df = pd.DataFrame({'col_1': [1,2,3,4,5,6,7], 'col_2': list('abcdefg')}, 
                  index=pd.Index([0,0,1,2,3,4,5]))
df = dd.from_pandas(df, npartitions=2)

Now I would like to get back only the first result (based on the index), like this in pandas:

# iloc[[0]] (list form) keeps the result as a one-row DataFrame, as printed below
df.loc[df.col_1 > 3].iloc[[0]]
   col_1 col_2
2      4     d

I know dask does not support positional row indexing with iloc, but is it possible to limit the query to one result, as with LIMIT 1 in SQL?

Best answer

Got it, though I'm not sure how efficient this is:

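tmp = df.loc[df.col_1 > 3]  # lazy filter, nothing is computed yet
# compute the smallest matching index label, then keep only the rows carrying it
tmp.loc[tmp.index == tmp.index.min().compute()].compute()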