Sample dask dataframe:
import pandas as pd
import dask
import dask.dataframe as dd
df = pd.DataFrame({'col_1': [1,2,3,4,5,6,7], 'col_2': list('abcdefg')},
index=pd.Index([0,0,1,2,3,4,5]))
df = dd.from_pandas(df, npartitions=2)
Now I would like to only get first (based on the index) result back - like this in pandas:
df.loc[df.col_1 >3].iloc[0]
col_1 col_2
2 4 d
I know there is no positional row indexing in dask using iloc
, but I wonder if it would be possible to limit the query to 1 result like in SQL?
Got it - But not sure about the efficiency here: