Create a Polars DataFrame from a Flattened JSON File


I am trying to read a flattened JSON file into a Polars DataFrame in Rust.

Here is an example of the flattened JSON format. How can this structure be read into a DataFrame without declaring each column's dtype in a struct?

{
  "data": [
    {
      "requestId": "IBM",
      "date": "2024-03-19",
      "sales": 61860,
      "company": "International Business Machines",
      "price": 193.34,
      "score": 7
    },
    {
      "requestId": "AAPL",
      "date": "2024-03-19",
      "sales": 383285,
      "company": "Apple Inc.",
      "price": 176.08,
      "score": 9
    },
    {
      "requestId": "MSFT",
      "date": "2024-03-19",
      "sales": 211915,
      "company": "Microsoft Corporation",
      "price": 421.41,
      "score": 7
    } 
  ]
}

The data contains only integers, floats, and strings.

Here is the struct I tried. If there are 200+ columns that can change, would it be best to store the columns dynamically in a HashMap?

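use serde::{Deserialize, Serialize};
use std::collections::HashMap;

#[derive(Debug, Deserialize, Serialize)]
#[serde(rename_all = "camelCase")]
struct Row {
    // rename_all = "camelCase" maps this field to the JSON key "requestId"
    request_id: String,
    date: String,
    // Every remaining key (sales, company, price, score, ...) is collected here
    #[serde(flatten)]
    company_data: HashMap<String, serde_json::Value>,
}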

The second half of this question, for non-flattened JSON data, is asked separately: Transform JSON Key into a Polars DataFrame



Best answer, by Chayim Friedman:

This format is almost what Polars' JsonReader expects; the only problem is the top-level object wrapping the array. We can strip that wrapper with string manipulation:

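use polars::prelude::*;
use std::error::Error;

/// Strips the {"data": ...} wrapper so JsonReader sees a bare array of records.
/// Assumes "data" is the only top-level key.
pub fn flattened(json: &str) -> Result<DataFrame, Box<dyn Error>> {
    let json = json.trim();
    // Remove the outer braces of the top-level object.
    let json = json
        .strip_prefix("{")
        .ok_or("invalid JSON")?
        .strip_suffix("}")
        .ok_or("invalid JSON")?;
    // Remove the "data": key, leaving only the array of records.
    let json = json.trim_start();
    let json = json.strip_prefix(r#""data""#).ok_or("invalid JSON")?;
    let json = json.trim_start();
    let json = json.strip_prefix(":").ok_or("invalid JSON")?;

    // JsonReader infers an integer, float, or string dtype for each column.
    let json_reader = JsonReader::new(std::io::Cursor::new(json));
    let mut df = json_reader.finish()?;
    // Dates arrive as strings; cast the "date" column to the Date dtype.
    let date = df.column("date")?.cast(&DataType::Date)?;
    df.replace("date", date)?;

    Ok(df)
}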