Performing the equivalent of pd.Grouper() in the Pandas API on Spark

I've been trying to transition some of our codebases from pure pandas to the Pandas API on Spark (in Databricks), and one function I've had trouble replicating so far is pd.Grouper().

Specifically, the existing code has many situations where we have a table like the following (simplified for this example):

ds          segment  value
11-12-2023  A        1
11-13-2023  B        2
12-11-2023  A        3
12-12-2023  B        5
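
For reference, the sample table can be reproduced with something like the following (column names and the month-day-year date format are taken from the example above; pd.Grouper needs 'ds' to be datetime-like):

import pandas as pd

df = pd.DataFrame({
    'ds': ['11-12-2023', '11-13-2023', '12-11-2023', '12-12-2023'],
    'segment': ['A', 'B', 'A', 'B'],
    'value': [1, 2, 3, 5],
})
# convert 'ds' to datetime so pd.Grouper can bucket it by frequency
df['ds'] = pd.to_datetime(df['ds'], format='%m-%d-%Y')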

We then use the following code to aggregate by segment and month:

import pandas as pd

# group by segment and by month-end buckets of the 'ds' datetime column
monthly = df.groupby(['segment', pd.Grouper(key='ds', freq='M')]).sum()
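
With the sample data above, monthly should come out with one row per (segment, month-end) pair, e.g. ('A', 2023-11-30) = 1, ('B', 2023-11-30) = 2, ('A', 2023-12-31) = 3 and ('B', 2023-12-31) = 5, since freq='M' labels each group by its month end.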

How could we accomplish the same thing, grouping on different frequencies without creating a new helper column for every frequency we want to group on? pd.Grouper supports the full list of offset aliases, and we actively rely on several of them.
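
For concreteness, the helper-column workaround we are trying to avoid looks roughly like this (a sketch only: psdf is the pyspark.pandas version of the sample df, the 'month' column name is made up for illustration, and it assumes Series.dt.strftime is available in pyspark.pandas):

import pyspark.pandas as ps

psdf = ps.from_pandas(df)
# one derived grouping column per frequency, which is what we want to avoid repeating
psdf['month'] = psdf['ds'].dt.strftime('%Y-%m')
monthly = psdf.groupby(['segment', 'month'])['value'].sum()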

I've tried the same pd.Grouper approach after swapping pandas for pyspark.pandas, but Grouper is not available there:

import pyspark.pandas as ps

# fails: pyspark.pandas does not provide a Grouper equivalent
monthly = psdf.groupby(['segment', ps.Grouper(key='ds', freq='M')]).sum()