Receiving ClassCastException: org.wololo.geojson.FeatureCollection when trying to read a GeoJSON file in PySpark


I am running the following Python code in Apache Spark with Apache Sedona:

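from sedona.core.formatMapper import GeoJsonReader
from sedona.utils.adapter import Adapter

# sc is the SparkContext and spark is the SparkSession, both created earlier in the script
geo_json_file_location = 'hdfs:///data/datafile.geojson'
segments = GeoJsonReader.readToGeometryRDD(sc, geo_json_file_location)
segmt_df = Adapter.toDf(segments, spark)
segmt_df.show(5)
segmt_df.createOrReplaceTempView('geodata')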

but I am getting the following error:

Traceback (most recent call last):
  File "/vagrant/py/pyspark-lidar-draw-jaywalking-data.py", line 152, in <module>
    segments= GeoJsonReader.readToGeometryRDD(sc, geo_json_file_location)
  File "/home/vagrant/anaconda3/envs/hadoop/lib/python3.8/site-packages/sedona/utils/meta.py", line 134, in __call__
    return method.__get__(self).__call__(*args, **kwargs)
  File "/home/vagrant/anaconda3/envs/hadoop/lib/python3.8/site-packages/sedona/core/formatMapper/geo_json_reader.py", line 35, in readToGeometryRDD
    srdd = jvm.GeoJsonReader.readToGeometryRDD(
  File "/usr/local/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py", line 1322, in __call__
  File "/usr/local/spark/python/lib/pyspark.zip/pyspark/errors/exceptions/captured.py", line 169, in deco
  File "/usr/local/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/protocol.py", line 326, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.sedona.core.formatMapper.GeoJsonReader.readToGeometryRDD.
: java.lang.ClassCastException: class org.wololo.geojson.FeatureCollection cannot be cast to class org.wololo.geojson.Feature (org.wololo.geojson.FeatureCollection and org.wololo.geojson.Feature are in unnamed module of loader 'app')
        at org.apache.sedona.common.utils.FormatUtils.readGeoJsonPropertyNames(FormatUtils.java:130)

I am using Apache Spark 3.4.1, Python 3.8.10, and Sedona 1.5.0.

  • The GeoJSON file has no newline characters.
  • I have verified that the GeoJSON is valid using geojson.io.
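
The exception says a FeatureCollection was cast to a Feature, which suggests the reader treated the whole single-line file as one Feature record. The sketch below is one way to check that in plain Python. It assumes, without confirmation from the post, that the file's top-level object is a FeatureCollection and that the reader wants one Feature per line. The helper name split_feature_collection and the sample data are hypothetical and for illustration only.

```python
import json
from typing import List


def split_feature_collection(geojson_text: str) -> List[str]:
    """Return one compact, single-line JSON string per Feature.

    Hypothetical helper: accepts a FeatureCollection or a single Feature
    and rejects any other top-level GeoJSON type.
    """
    doc = json.loads(geojson_text)
    kind = doc.get("type")
    if kind == "FeatureCollection":
        features = doc.get("features", [])
    elif kind == "Feature":
        features = [doc]
    else:
        raise ValueError(f"unsupported top-level GeoJSON type: {kind!r}")
    # Compact separators keep each Feature on exactly one line.
    return [json.dumps(f, separators=(",", ":")) for f in features]


# Made-up sample standing in for the real file contents.
sample = json.dumps({
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"id": 1},
         "geometry": {"type": "Point", "coordinates": [0.0, 0.0]}},
        {"type": "Feature", "properties": {"id": 2},
         "geometry": {"type": "Point", "coordinates": [1.0, 1.0]}},
    ],
})

lines = split_feature_collection(sample)
print(len(lines), "feature lines")
```

Writing the returned strings to a new file, one per line, would give a file in which every line is a standalone Feature. Whether that resolves the ClassCastException in Sedona 1.5.0 is not verified here.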