I'm trying to generate a ydata-profiling report in an AWS Glue environment, with the following versions:
- glue_version 3.0
- ydata_profiling 4.5.1
- pyspark 3.1.1+amzn.0
I have also tried glue_version 2.0 and other versions of ydata_profiling (e.g. 4.3.2), but I get the same issue.
After loading the data (just 3397 rows) successfully with
dataset = glueContext.create_data_frame_from_catalog(database=config['schema'], table_name=table)
I used the following lines to generate the ydata-profiling report:
prof = ydata_profiling.ProfileReport(dataset, config_file=config['profiler_config'])
report = prof.get_description()
and got this error:
DispatchError: Function <code object spark_get_series_descriptions at 0x7f8c28632a50, file "/home/spark/.local/lib/python3.7/site-packages/ydata_profiling/model/spark/summary_spark.py", line 67>
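The message above names only the dispatched function, not what failed inside it. A minimal sketch for listing any exceptions chained behind the DispatchError, assuming it wraps an underlying cause (not confirmed here); the helper name `exception_chain` is hypothetical and not part of ydata-profiling:

```python
def exception_chain(exc):
    """Return exc followed by every exception chained behind it, outermost first."""
    chain = []
    seen = set()
    while exc is not None and id(exc) not in seen:
        seen.add(id(exc))
        chain.append(exc)
        # Prefer the explicit cause (raise ... from ...), else the implicit context.
        exc = exc.__cause__ if exc.__cause__ is not None else exc.__context__
    return chain


# Possible usage in the Glue job, where prof is the ProfileReport built above:
# try:
#     report = prof.get_description()
# except Exception as err:
#     for e in exception_chain(err):
#         print(type(e).__name__, e)
```

If the chain has more than one entry, the last one is the original error and is usually the most informative line to include in a bug report.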
The config file shouldn't be the problem, since I also tried the config suggested on the ydata-profiling page:
prof = ydata_profiling.ProfileReport(
    dataset,
    infer_dtypes=False,
    interactions=None,
    missing_diagrams=None,
    correlations={
        "auto": {"calculate": False},
        "pearson": {"calculate": True},
        "spearman": {"calculate": True},
    },
)
report = prof.get_description()
but I get the same error. The error is also the same if I call
prof.to_file('prova.json')
or
prof.to_html('prova.html')
I have no idea how to fix the problem. Does anyone have a suggestion, or has anyone run into the same issue?