How can I group_by and sum values in a data frame?


I have this data frame (see the table below):

| State       | County   | Homicides
|-------------|----------|----------
| Ags         | Calvillo | 4
| Mexico City | Alvaro O | 2
| Mexico City | Alvaro O | 3
| Mexico City | Miguel H | 2
| Gto         | Leon     | 1
| Gto         | Leon     | 1

What I want to do is group by County and sum the values of Homicides. For example:

| State       | County   | Homicides
|-------------|----------|----------
| Ags         | Calvillo | 4
| Mexico City | Alvaro O | 5
| Mexico City | Miguel H | 2
| Gto         | Leon     | 2

As you can see, I summed the Homicides values of the rows that share the same county name.

This was my attempt:

df1 >> group_by("County") >> summarize(County = X.County)

But it is not doing what I want. Can someone guide me on this, please?

Thanks


There are 2 best solutions below

0
coding On

With your help, these were the final lines of code that solved the issue:

df1 = df1.groupby(['State', 'County']).agg('sum')
df1 = df1.reset_index()
df1

This was my result

| State       | County   | Homicides
|-------------|----------|----------
| Ags         | Calvillo | 4
| Mexico City | Alvaro O | 5
| Mexico City | Miguel H | 2
| Gto         | Leon     | 2
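
For anyone who wants to reproduce this, here is a minimal self-contained sketch. It assumes the data frame holds exactly the six rows from the question. The `as_index=False` and `sort=False` arguments are my own additions, not part of the code above: the first keeps State and County as regular columns so `reset_index()` is not needed, and the second keeps the groups in order of first appearance instead of sorting them alphabetically.

```python
import pandas as pd

# Sample data copied from the table in the question.
df1 = pd.DataFrame({
    "State": ["Ags", "Mexico City", "Mexico City", "Mexico City", "Gto", "Gto"],
    "County": ["Calvillo", "Alvaro O", "Alvaro O", "Miguel H", "Leon", "Leon"],
    "Homicides": [4, 2, 3, 2, 1, 1],
})

# Group on both keys so State survives in the output, then sum Homicides.
# as_index=False: keep the group keys as ordinary columns.
# sort=False: keep groups in order of first appearance.
result = df1.groupby(["State", "County"], as_index=False, sort=False)["Homicides"].sum()

print(result)
```

Grouping on both State and County also keeps two counties with the same name in different states from being merged into one row.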


0
johanna On

I had the same struggle doing group_by and summation with dfply. I think the description of the group_by function in the dfply README (https://github.com/kieferk/dfply/blob/master/README.md) is not detailed enough.

Try the following:

from dfply import *  # provides X, group_by, summarize
df1 >> group_by(X.County) >> summarize(sum_Homicides = X.Homicides.sum())
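
If dfply is not installed, the same idea (group by County alone and put the Homicides total in a named column) can be sketched in plain pandas with named aggregation. This is only an illustration under assumptions: the data is the six rows from the question, and the output column name `sum_Homicides` is hypothetical, chosen here for the example.

```python
import pandas as pd

# Sample data copied from the table in the question.
df1 = pd.DataFrame({
    "State": ["Ags", "Mexico City", "Mexico City", "Mexico City", "Gto", "Gto"],
    "County": ["Calvillo", "Alvaro O", "Alvaro O", "Miguel H", "Leon", "Leon"],
    "Homicides": [4, 2, 3, 2, 1, 1],
})

# Named aggregation: output_column=(input_column, aggregation_function).
# sort=False keeps counties in order of first appearance.
totals = (
    df1.groupby("County", sort=False)
       .agg(sum_Homicides=("Homicides", "sum"))
       .reset_index()
)

print(totals)
```

Note that grouping by County alone drops the State column from the result; add "State" to the grouping keys if it should be kept.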