I'm wondering if there is an efficient way to do the following in Julia:
I have a DataFrame of the following form:
julia> using DataFrames

julia> df1 = DataFrame(var1=["a","a","a","b","b","b","c","c","c"],
                       var2=["p","q","r","p","p","r","q","p","p"],
                       var3=[1,2,3,2,5,4,6,7,8])
9×3 DataFrame
│ Row │ var1   │ var2   │ var3  │
│     │ String │ String │ Int64 │
├─────┼────────┼────────┼───────┤
│ 1   │ a      │ p      │ 1     │
│ 2   │ a      │ q      │ 2     │
│ 3   │ a      │ r      │ 3     │
│ 4   │ b      │ p      │ 2     │
│ 5   │ b      │ p      │ 5     │
│ 6   │ b      │ r      │ 4     │
│ 7   │ c      │ q      │ 6     │
│ 8   │ c      │ p      │ 7     │
│ 9   │ c      │ p      │ 8     │
I want to return a DataFrame that contains the same columns but only the rows where var3 has its minimum value within each group defined by var1.
I have tried using the split-apply-combine approach but couldn't seem to find a way to filter the rows while returning all columns.
Appreciate any help on this.
An alternative way to do it if you do not have duplicates in :var3 per group is:

If you may have duplicates then use:

instead.
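As a rough sketch of the two cases above (not necessarily the exact snippets meant here), assuming a reasonably recent DataFrames.jl where `combine` accepts a function applied to each group of a `groupby` result: the first call keeps the single row at `argmin` of `var3`, the second keeps every row tied for the group minimum. The names `gdf`, `res_unique` and `res_ties` are made up for illustration.

```julia
using DataFrames

df1 = DataFrame(var1=["a","a","a","b","b","b","c","c","c"],
                var2=["p","q","r","p","p","r","q","p","p"],
                var3=[1,2,3,2,5,4,6,7,8])

gdf = groupby(df1, :var1)

# No duplicated minimum within a group: take the one row at argmin.
# Each group contributes exactly one row to the result.
res_unique = combine(sdf -> sdf[argmin(sdf.var3), :], gdf)

# Possible ties: keep every row whose var3 equals the group minimum.
res_ties = combine(sdf -> sdf[sdf.var3 .== minimum(sdf.var3), :], gdf)
```

For `df1` both calls should return the rows (a, p, 1), (b, p, 2) and (c, q, 6) with all three columns, since no group has a tied minimum.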
Another example handling duplicates correctly is:

It is not very efficient in this case but shows you how you can work with groupby in DataFrames.jl.
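One way to work with the groups directly, sketched here under the assumption that iterating a `groupby` result yields one sub-data-frame per group (the names `parts` and `res` are illustrative, and this may differ from the snippet originally intended):

```julia
using DataFrames

df1 = DataFrame(var1=["a","a","a","b","b","b","c","c","c"],
                var2=["p","q","r","p","p","r","q","p","p"],
                var3=[1,2,3,2,5,4,6,7,8])

# Iterate over the groups and, in each one, keep all rows
# whose var3 equals that group's minimum (ties are kept).
parts = [sdf[sdf.var3 .== minimum(sdf.var3), :] for sdf in groupby(df1, :var1)]

# Stack the per-group pieces back into a single DataFrame.
res = reduce(vcat, parts)
```

This allocates a small DataFrame per group before concatenating, which is why it tends to be slower than letting `combine` do the work.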