Find the longest group after groupby on normalized json in pandas

Question

72 Views Asked by 0004 At 27 July 2025 at 17:24

My code below groups the rows by actor.profileId and builds, for each ID, a list holding the length of each parameters array. How can I return the ID whose list has the largest sum, along with that sum?

Original JSON read into the df (not the same data as in the printed output below, because the full data was too long):

{
   "kind":"admin#reports#activities",
   "etag":"\"5g8\"",
   "nextPageToken":"A:1651795128914034:-4002873813067783265:151219070090:C02f6wppb",
   "items":[
      {
         "kind":"admin#reports#activity",
         "id":{
            "time":"2022-05-05T23:59:39.421Z",
            "uniqueQualifier":"5526793068617678141",
            "applicationName":"token",
            "customerId":"cds"
         },
         "etag":"\"jkYcURYoi8\"",
         "actor":{
            "email":"[email protected]",
            "profileId":"1323"
         },
         "ipAddress":"107.178.193.87",
         "events":[
            {
               "type":"auth",
               "name":"activity",
               "parameters":[
                  {
                     "name":"api_name",
                     "value":"admin"
                  },
                  {
                     "name":"method_name",
                     "value":"directory.users.list"
                  },
                  {
                     "name":"client_id",
                     "value":"722230783769-dsta4bi9fkom72qcu0t34aj3qpcoqloq.apps.googleusercontent.com"
                  },
                  {
                     "name":"num_response_bytes",
                     "intValue":"7158"
                  },
                  {
                     "name":"product_bucket",
                     "value":"GSUITE_ADMIN"
                  },
                  {
                     "name":"app_name",
                     "value":"Untitled project"
                  },
                  {
                     "name":"client_type",
                     "value":"WEB"
                  }
               ]
            }
         ]
      },
      {
         "kind":"admin#reports#activity",
         "id":{
            "time":"2022-05-05T23:58:48.914Z",
            "uniqueQualifier":"-4002873813067783265",
            "applicationName":"token",
            "customerId":"df"
         },
         "etag":"\"5T53xK7dpLei95RNoKZd9uz5Xb8LJpBJb72fi2HaNYM/9DTdB8t7uixvUbjo4LUEg53_gf0\"",
         "actor":{
            "email":"[email protected]",
            "profileId":"1324"
         },
         "ipAddress":"54.80.168.30",
         "events":[
            {
               "type":"auth",
               "name":"activity",
               "parameters":[
                  {
                     "name":"api_name",
                     "value":"gmail"
                  },
                  {
                     "name":"method_name",
                     "value":"gmail.users.messages.list"
                  },
                  {
                     "name":"client_id",
                     "value":"927538837578.apps.googleusercontent.com"
                  },
                  {
                     "name":"num_response_bytes",
                     "intValue":"2"
                  },
                  {
                     "name":"product_bucket",
                     "value":"GMAIL"
                  },
                  
                  {
                     "name":"client_type",
                     "value":"WEB"
                  }
               ]
            }
         ]
      }
   ]
}

current code:

import pandas as pd

df = pd.json_normalize(response['items'])
df['test'] = df.groupby('actor.profileId')['events'].apply(lambda x: [len(x.iloc[i][0]['parameters']) for i in range(len(x))])

output:

ID
1002306    [7, 7, 7, 5]
1234444    [3,5,6]
1222222    [1,3,4,5]

desired output

id       total
1002306  26

**AudioBubble** · Accepted Answer

There’s no need to construct the intermediate df and run groupby on it. You can pass the record and meta paths to json_normalize to flatten the JSON data directly, which gives one row per parameter. The sum of the list lengths for an actor.profileId is then simply the number of rows for that ID, so the job reduces to counting the rows per actor.profileId and taking the maximum.

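# One row per parameter; the whole 'actor' dict is carried along as meta
df = pd.json_normalize(response['items'], ['events','parameters'], ['actor'])
# Pull the profile ID out of each 'actor' dict
df['actor.profileId'] = df['actor'].str['profileId']
# value_counts sorts in descending order, so the first entry is the largest group
out = df.value_counts('actor.profileId').pipe(lambda x: x.iloc[[0]])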

Output:

actor.profileId
1323    7
dtype: int64
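For anyone who wants to try this without the API response, here is a minimal self-contained sketch of the same idea. The `items` list is a made-up, trimmed stand-in for `response['items']` (only the fields used are kept), and the names `flat`, `totals`, `top` and `n_params` are just illustrative. It also shows that summing the per-item lengths on the unflattened frame gives the same totals, which is closer to what the question was attempting.

```python
import pandas as pd

# Hypothetical, trimmed stand-in for response['items'] (not the real data).
items = [
    {
        "actor": {"profileId": "1323"},
        "events": [{"parameters": [{"name": "api_name"},
                                   {"name": "method_name"},
                                   {"name": "client_id"}]}],
    },
    {
        "actor": {"profileId": "1324"},
        "events": [{"parameters": [{"name": "api_name"},
                                   {"name": "method_name"}]}],
    },
    {
        "actor": {"profileId": "1323"},
        "events": [{"parameters": [{"name": "api_name"}]}],
    },
]

# Flatten to one row per parameter, carrying the 'actor' dict along as meta.
flat = pd.json_normalize(items, record_path=["events", "parameters"], meta=["actor"])
flat["profileId"] = flat["actor"].map(lambda actor: actor["profileId"])

# Rows per profile == total number of parameters for that profile.
totals = flat.groupby("profileId").size()

# Keep only the largest total, shaped like the desired id/total output.
top = totals.nlargest(1).rename_axis("id").reset_index(name="total")
print(top)

# Alternative without flattening: sum the parameter counts per item, then group.
# Unlike the code in the question, this counts every event, not only the first.
df = pd.json_normalize(items)
df["n_params"] = df["events"].map(lambda evs: sum(len(e["parameters"]) for e in evs))
alt = df.groupby("actor.profileId")["n_params"].sum()
print(alt.idxmax(), alt.max())
```

Note that ties are not handled specially here: `nlargest(1)` and `idxmax()` each return a single ID even if several profiles share the maximum.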
