Pandas UDF StructField return

I am trying to return a struct (a StructType made of several StructFields) from a pandas UDF in PySpark, used in an aggregation, with the following function signature:

def parcel_to_polygon(geom:pd.Series,entity_ids:pd.Series) -> Tuple[int,str,List[List[str]]]:

But it turns out that this return type is not supported. Is there an alternative way to achieve the same result? I can write three pandas UDFs that each return a primitive type, and that works, but the function logic is then repeated in all three functions, which is what I am trying to avoid (I assume a single function would also be a bit more performant, but maybe I'm wrong here).

There is 1 best solution below

You can return all the values as a pandas DataFrame and declare a StructType as the UDF's return type, like this:

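import pandas as pd
from pyspark.sql.functions import pandas_udf
from pyspark.sql.types import StructType, StructField, DoubleType

# Struct return type: each field maps to one column of the returned DataFrame.
schema = StructType([
    StructField('X', DoubleType()),
    StructField('Y', DoubleType()),
])


@pandas_udf(schema)
def polygon(Longitude: pd.Series, Latitude: pd.Series) -> pd.DataFrame:
    # The DataFrame column names must match the struct field names.
    return pd.DataFrame({"X": Longitude, "Y": Latitude})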
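Note that the UDF above maps rows to rows rather than aggregating. Since the question is about aggregation, and grouped-aggregate pandas UDFs are, as far as I know, limited to primitive return types, one possible alternative is `groupBy(...).applyInPandas(...)`, which runs the logic once per group and lets you return several columns at once. The following is only a minimal sketch: the column names (`group_id`, `geom`, `entity_id`), the output schema, the helper `aggregate_parcels`, and the placeholder logic inside `parcel_to_polygon` are all assumptions, not taken from the question. Whether the nested `array<array<string>>` column converts cleanly may also depend on your Spark and PyArrow versions.

```python
import pandas as pd

# Hypothetical output schema as a DDL string; all column names are illustrative.
RESULT_SCHEMA = (
    "group_id long, n_parcels int, entity_ids string, rings array<array<string>>"
)


def parcel_to_polygon(pdf: pd.DataFrame) -> pd.DataFrame:
    """Reduce all rows of one group to a single row holding three result values."""
    # Placeholder logic: the real geometry processing would go here, written once.
    n_parcels = len(pdf)
    entity_ids = ",".join(pdf["entity_id"].astype(str))
    rings = [[str(g)] for g in pdf["geom"]]
    # One record -> one output row; keys must match the names in RESULT_SCHEMA.
    return pd.DataFrame([{
        "group_id": pdf["group_id"].iloc[0],
        "n_parcels": n_parcels,
        "entity_ids": entity_ids,
        "rings": rings,
    }])


def aggregate_parcels(sdf):
    """Apply parcel_to_polygon once per group of a Spark DataFrame (Spark 3.0+)."""
    return sdf.groupBy("group_id").applyInPandas(
        parcel_to_polygon, schema=RESULT_SCHEMA
    )
```

Because `parcel_to_polygon` is a plain pandas function, it can be tried on a small pandas DataFrame before it is handed to Spark through `aggregate_parcels`.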