Spark: extract values from a JSON struct


I have a Spark DataFrame column (custHeader) in the format below, and I want to extract the value for the key phone into a separate column. I am trying to use the from_json function, but it gives me a null value.

valArr:array
   element:struct
      key:string
      value:string

   ex:    [{"key": "BusinessName", "value": "RoadWay Diner"}, {"key": "Center", "value": "Denver"}, {"key": "phone", "value": "123456789"}, {"key": "customer", "value": "alex"}, {"key": "type", "value": "restaurant"}]

The code below also gives me a null value. The result I want is the value for the key phone extracted into a separate column.

val sch = ArrayType(StructType(Array(
  StructField("key", StringType),
  StructField("value", BinaryType))))

val df02 = df01.withColumn("ext_val", from_json(col("custHeader").cast(StringType), sch))

1 Answer


Assuming there is only one phone entry per row, the following approach can be used. Since custHeader is already an array of structs, there is no need for from_json at all; the filter higher-order function can select the matching struct directly.

import pyspark.sql.functions as F

...
# keep only the struct whose key is "phone", then unwrap its value
df = df.withColumn('ext_val', F.expr('filter(custHeader, x -> x.key = "phone")[0].value'))