Passing GeoJSON to ArangoDB via the Spark datasource


I'm trying to write a dataframe to ArangoDB in which one of the columns is a GeoJSON object. I have tried passing it as a string, but the double quotes get escaped, so ArangoDB won't interpret it as GeoJSON. If I make it a geometry column via Sedona instead, the resulting JSON field looks completely wrong.
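For reference, the Sedona attempt was roughly the following (a sketch: it assumes Sedona's ST_GeomFromGeoJSON function and that the Sedona extensions are already registered on the session):

from pyspark.sql import functions as F

# Parse the GeoJSON string into a Sedona geometry column;
# this is the variant that produced the mangled JSON field.
df_geom = df.withColumn("geom", F.expr("ST_GeomFromGeoJSON(geojson)"))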

Has anyone had any luck writing GeoJSON to ArangoDB from pyspark?

Thanks for any pointers in resolving this issue.

My initial dataframe looks like this:

+------+------+--------------------+--------------------+--------------------+--------------------+
|alpha2|alpha3|                name|             geojson|             tmpJson|                _key|
+------+------+--------------------+--------------------+--------------------+--------------------+
|    CF|   CAF|CENTRAL AFRICAN REP.|{"type":"MultiPol...| {'type':'Point',...|002b3cfc5aaa6eaa5...|
|    CA|   CAN|              CANADA|{"type":"MultiPol...| {'type':'Point',...|010885e44f48a9947...|
|    AS|   ASM|      AMERICAN SAMOA|{"type":"MultiPol...| {'type':'Point',...|0174caa6734ea8d20...|
+------+------+--------------------+--------------------+--------------------+--------------------+
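The write itself goes through the ArangoDB Spark datasource, roughly like this (a sketch: the endpoint, credentials, database, and collection name are placeholders, and the option names reflect my reading of the datasource docs):

# Write the dataframe to an ArangoDB collection; all connection
# values below are placeholders.
(df.write
    .format("com.arangodb.spark")
    .mode("append")
    .option("endpoints", "localhost:8529")
    .option("database", "mydb")
    .option("table", "countries")
    .option("user", "root")
    .option("password", "****")
    .save())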

When I write it via the ArangoDB datasource, the document that lands in ArangoDB stores the geojson field as escaped text:

{"alpha2":"CF","alpha3":"CAF","name":"CENTRAL AFRICAN REP.","geojson":"{\"type\":\"MultiPolygon\",\"coordinates\":[[[[24.147363281250023,8.665625],[24.22089843750001,8.608251953124991],[24.179980468750017,8.461132812499997],[24.291406250000023,8.29140625],[24.736718750000023,8.191552734374994],[24.853320312500017,8.137548828124991],...

(truncated for ease of reading)

I get the same sort of result if I write a JSON or CSV file to import. The CSV looks like:

CF,CAF,CENTRAL AFRICAN REP.,"{\"type\":\"MultiPolygon\",\"coordinates\":[[[[24.147363281250023,8.665625],[24.22089843750001,8.608251953124991],[24.179980468750017,8.461132812499997],...
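I suspect the escaping simply comes from geojson being a plain StringType column, so every writer JSON-encodes the whole value as one string. Parsing it into a nested struct with from_json seems like the obvious next step (sketch below, assuming a MultiPolygon-only schema), though I don't know whether the datasource then serializes it as a nested object:

from pyspark.sql import functions as F
from pyspark.sql.types import (ArrayType, DoubleType, StringType,
                               StructField, StructType)

# GeoJSON MultiPolygon coordinates are a four-level nested array of doubles.
geojson_schema = StructType([
    StructField("type", StringType()),
    StructField("coordinates",
                ArrayType(ArrayType(ArrayType(ArrayType(DoubleType()))))),
])

# Replace the string column with a struct so JSON writers emit a
# nested object instead of an escaped string.
df_nested = df.withColumn("geojson", F.from_json("geojson", geojson_schema))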