In Palantir Foundry, how should I get the current SparkSession in a Transform?


I'm writing a Python Transform and need to get the SparkSession so I can construct a DataFrame.

How should I do this?


Add a parameter named ctx to your compute function. Foundry passes in the TransformContext for that parameter, and its spark_session attribute is the current SparkSession.

from pyspark.sql.types import StringType, StructField, StructType
from transforms.api import Output, transform


@transform(
    output=Output('/path/to/first/output/dataset'),
)
def my_compute_function(ctx, output):
    # type: (TransformContext, TransformOutput) -> None

    # In this example, the Spark session is used to create an empty data frame.
    columns = [
        StructField("col_a", StringType(), True)
    ]
    empty_df = ctx.spark_session.createDataFrame([], schema=StructType(columns))

    # Write the (empty) data frame to the transform's output dataset.
    output.write_dataframe(empty_df)

This example can also be found in the Foundry documentation here: https://www.palantir.com/docs/foundry/transforms-python/transforms-python-api/#transform
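
If you need a DataFrame with actual rows rather than an empty one, the same session can build it from in-memory data. The following is only a minimal sketch: the helper name build_example_df, the sample rows, and the column names are hypothetical and not taken from the Foundry docs. It assumes the object you pass in is the SparkSession obtained from ctx.spark_session, and it uses a DDL-formatted schema string, which is one of the schema forms createDataFrame accepts.

```python
# Hypothetical helper: the name, rows, and columns are illustrative only.
def build_example_df(spark_session):
    """Build a small two-column DataFrame from in-memory rows."""
    rows = [("a", 1), ("b", 2)]
    # A DDL-formatted string is one of the schema forms createDataFrame accepts.
    return spark_session.createDataFrame(rows, schema="col_a string, col_b int")


# Inside a transform like the one in the answer above, this could be called as:
#     output.write_dataframe(build_example_df(ctx.spark_session))
```

Keeping the DataFrame construction in a plain function that takes the session as an argument also makes it easier to exercise outside Foundry, for example with a local SparkSession in a unit test.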