This issue is driving me crazy. I am using the following code to read a .jsonlines file into Python:
import polars as pl
routes = pl.read_ndjson(r'~<Path>\az-routes.jsonlines')  # raw string keeps the backslash literal ('\a' would otherwise be an escape)
Yesterday, it was nulling out a different column, or set of columns, each time (the description, location, or protection column). Now it has settled into nulling out the 'location' column every time.
Here is some sample data; as you can see, the location data is present in the file:
{"route_name": "Bottom Shelf Lick Her", "grade": {"YDS": "V3-", "Font": "6A"}, "safety": "PG", "type": {"tr": true, "boulder": true}, "fa": "Joe Jenson", "description": ["Starts sitting to the right of the small roof and passes to the left under the roof, up the left side of the roof and back around the top of the roof to connect with either a high ball finish or a good jump down point. Crux is passing under the roof on the left and reaching the top of the roof, requires a tricky finger jam and a tiny crimp with limited footholds, mostly smears after you come out from under the roof."], "location": ["Continue driving down FS 136 past the waterfall for approximately half a mile until you see a very obvious and very large granite boulder on the right side of the road. Park at on the left side of the road below the boulder. Approach hike is around 100 feet."], "protection": ["Can be set up as a top rope from a solid tree if you want to top out the climb, the very top of the boulder is around 25 feet."], "metadata": {"left_right_seq": "999999", "parent_lnglat": [-111.90802, 34.53523], "parent_sector": "Copper Canyon", "mp_route_id": "111893803", "mp_sector_id": "111892198", "mp_path": "Central Arizona|Copper Canyon"}}
Only rows that have data in the location column are changed to null; the rows whose location was originally missing (encoded as '') are left intact.
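One way to check this symptom is to count, for every top-level key, which JSON value types occur across the file. A key that holds a list in some rows and a string in others (a location list versus an empty '') is the kind of mixed column that can confuse type inference. This is a minimal standard-library sketch; the function names json_type_census and mixed_type_keys are hypothetical, and the two sample records are made up rather than taken from the real file.

```python
import json
from collections import Counter, defaultdict
from typing import Iterable


def json_type_census(lines: Iterable[str]) -> dict[str, Counter]:
    """Count which JSON value types (as Python type names) appear under each top-level key."""
    census: dict[str, Counter] = defaultdict(Counter)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        record = json.loads(line)
        for key, value in record.items():
            # list -> 'list', "" -> 'str', null -> 'NoneType', etc.
            census[key][type(value).__name__] += 1
    return dict(census)


def mixed_type_keys(census: dict[str, Counter]) -> list[str]:
    """Return the keys whose values have more than one JSON type across records."""
    return sorted(key for key, counts in census.items() if len(counts) > 1)


if __name__ == "__main__":
    # Two made-up records: one with a location list, one with an empty string.
    sample = [
        '{"route_name": "A", "location": ["Park below the boulder."]}',
        '{"route_name": "B", "location": ""}',
    ]
    census = json_type_census(sample)
    print(census["location"])       # Counter({'list': 1, 'str': 1})
    print(mixed_type_keys(census))  # ['location']
```

Against the real file, opening it with encoding="utf-8" and passing the file object straight to json_type_census would list every key that mixes types, not just location.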
In addition, I have a brand new non-fatal error when running this code today:
Traceback (most recent call last):
  File "C:\Program Files\JetBrains\PyCharm 2023.3.1\plugins\python\helpers\pydev_pydev_bundle\pydev_console_utils.py", line 424, in execTableCommand
    success, res = exec_table_command(command, command_type,
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\JetBrains\PyCharm 2023.3.1\plugins\python\helpers\pydev_pydevd_bundle\pydevd_tables.py", line 51, in exec_table_command
    res.append(table_provider.get_value_occurrences_count(table))
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\JetBrains\PyCharm 2023.3.1\plugins\python\helpers\pydev_pydevd_bundle\tables\pydevd_polars.py", line 92, in get_value_occurrences_count
    bin_counts.append(analyze_column(col, table[col]))
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\JetBrains\PyCharm 2023.3.1\plugins\python\helpers\pydev_pydevd_bundle\tables\pydevd_polars.py", line 110, in analyze_column
    column_visualisation_type, res = analyze_categorical_column(column, col_name)
                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\JetBrains\PyCharm 2023.3.1\plugins\python\helpers\pydev_pydevd_bundle\tables\pydevd_polars.py", line 133, in analyze_categorical_column
    value_counts = value_counts.sort("counts").reverse()
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\name\anaconda3\Lib\site-packages\polars\dataframe\frame.py", line 4635, in sort
    .collect(_eager=True)
    ^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\name\anaconda3\Lib\site-packages\polars\lazyframe\frame.py", line 1730, in collect
    return wrap_df(ldf.collect())
                   ^^^^^^^^^^^^^
polars.exceptions.ColumnNotFoundError: counts
I have no idea where to start with this issue. I spent hours yesterday fiddling with the rest of my code, wondering why it threw different errors every time I ran it, only to find that the errors came from different columns depending on what polars had actually read into the DataFrame. Please help!
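I think the issue is the use of empty strings to represent missing locations: polars doesn't like mixed dtypes when deserializing, and here location is a list of strings in some rows but a plain string ('') in others. You can tell it that you're expecting list[str] for that column by giving it an explicit schema when reading the file. Note the correctly inferred null location in row 2 (Up and Away).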
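If passing a schema isn't convenient, a different route (not the schema approach above) is to clean the file before polars sees it, turning each empty string into a JSON null so the column holds only lists of strings or nulls. This is a minimal standard-library sketch under that assumption; blank_strings_to_null and clean_ndjson_file are hypothetical helper names, and cleaning location, description, and protection is a guess based on the columns mentioned in the question.

```python
import json
from typing import Iterable, Iterator


def blank_strings_to_null(lines: Iterable[str], keys: set[str]) -> Iterator[str]:
    """Yield NDJSON lines with "" replaced by null, but only for the given top-level keys."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # drop blank lines
        record = json.loads(line)
        for key in keys:
            if record.get(key) == "":
                record[key] = None  # serialized as JSON null
        yield json.dumps(record, ensure_ascii=False)


def clean_ndjson_file(src: str, dst: str, keys: set[str]) -> None:
    """Hypothetical helper: write a cleaned copy of src to dst, one record per line."""
    with open(src, encoding="utf-8") as fin, open(dst, "w", encoding="utf-8") as fout:
        for out in blank_strings_to_null(fin, keys):
            fout.write(out + "\n")


if __name__ == "__main__":
    raw = [
        '{"route_name": "A", "location": ["Park below the boulder."]}',
        '{"route_name": "B", "location": ""}',
    ]
    for out in blank_strings_to_null(raw, {"location", "description", "protection"}):
        print(out)
    # {"route_name": "A", "location": ["Park below the boulder."]}
    # {"route_name": "B", "location": null}
```

The cleaned copy can then be read with pl.read_ndjson as before. Only the named keys are touched, so a legitimately empty string in some other field is left alone.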