How to create normalized frequency histogram with Weights & Biases custom chart in Vega-Lite?


Goal: I am having trouble creating a histogram of normalized frequencies in a Weights & Biases (W&B) custom chart, which is implemented in Vega-Lite. I would love some community help to resolve this.

I modified the default Vega-Lite code for W&B custom chart histograms to produce this plot:

[Image: plot of unnormalized histogram frequencies in Weights & Biases]

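I want to normalize the histograms per group so that the bin heights within each group add up to one. (Note that because the bin width is set to one, the normalized heights can be read both as a probability mass function over the bins and as a piecewise-constant probability density.)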
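To spell out the normalization I have in mind, here it is in symbols. The notation is mine, not from W&B or Vega-Lite: n_{g,b} is the number of values from group g that fall in bin b, and w is the bin width.

```latex
\[
  \hat{p}_{g,b} = \frac{n_{g,b}}{\sum_{b'} n_{g,b'}},
  \qquad
  \sum_{b} \hat{p}_{g,b} = 1 \quad \text{for every group } g .
\]
% The corresponding density estimate divides by the bin width:
\[
  \hat{f}_{g,b} = \frac{\hat{p}_{g,b}}{w},
  \qquad
  \text{so } \hat{f}_{g,b} = \hat{p}_{g,b} \text{ when } w = 1 .
\]
```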

I am surprised that this is so difficult to do -- normalized histograms are so common! -- and would immensely appreciate any help to get this to work.

Current Approach: I am following this example from the Vega-Lite documentation that creates a normalized frequency histogram. In its transform block, the example bins the data, aggregates by count into a Count field, uses joinaggregate to sum Count over all bins into TotalCount, and then calculates datum.Count / datum.TotalCount to get the normalized frequencies.
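For reference, here is a minimal standalone sketch of that pattern as I understand it, extended with a per-group groupby on the joinaggregate. The inline data and the field names value, group, and bin_value are placeholders I made up for illustration; they are not the W&B field names, and I have only reasoned through this spec, not run it inside W&B.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v4.json",
  "description": "Sketch only: per-group relative-frequency histogram. Field names value, group and bin_value are hypothetical.",
  "data": {
    "values": [
      {"value": 0.3, "group": "group1"},
      {"value": 2.8, "group": "group1"},
      {"value": 1.7, "group": "group2"},
      {"value": 0.8, "group": "group2"}
    ]
  },
  "transform": [
    {"bin": {"step": 1}, "field": "value", "as": "bin_value"},
    {
      "aggregate": [{"op": "count", "as": "Count"}],
      "groupby": ["group", "bin_value", "bin_value_end"]
    },
    {
      "joinaggregate": [{"op": "sum", "field": "Count", "as": "TotalCount"}],
      "groupby": ["group"]
    },
    {"calculate": "datum.Count / datum.TotalCount", "as": "RelativeFrequency"}
  ],
  "mark": {"type": "bar", "tooltip": true},
  "encoding": {
    "x": {"field": "bin_value", "bin": {"binned": true}, "type": "quantitative"},
    "x2": {"field": "bin_value_end"},
    "y": {"field": "RelativeFrequency", "type": "quantitative", "stack": null},
    "color": {"field": "group", "type": "nominal"},
    "opacity": {"value": 0.6}
  }
}
```

The key point is that Count only exists because the explicit bin and aggregate steps create it before the joinaggregate runs. With the four sample rows, each group has two values in two different bins, so every bar should come out at 0.5.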

When I try adding this functionality to my Vega-Lite code, no plot appears in the editor, which indicates some sort of error.

Code I Used: More specifically, I got an error when adding the following Vega-Lite code to the bottom of my transform block:

{
  "joinaggregate": [
    {"op": "sum", "field": "Count", "as": "TotalCount"}
  ],
  "groupby": ["newGroupKeys", "color", "grouped"]
},
{
  "calculate": "datum.Count / datum.TotalCount",
  "as": "RelativeFrequency"
}

Here is the working Vega-Lite code that produces the plot above. Once I add the normalization changes to it, no plot is rendered.

{
  "$schema": "https://vega.github.io/schema/vega-lite/v4.json",
  "description": "A simple histogram",
  "data": {
    "name": "wandb"
  },
  "transform": [
    {
      "calculate": "if('${field:groupKeys}' === ''  || datum['${field:groupKeys}'] === '', false, true)",
      "as": "grouped"
    },
    {
      "calculate": "if('${field:groupKeys}' === ''  || datum['${field:groupKeys}'] === '', datum.name, datum['${field:groupKeys}'])",
      "as": "newGroupKeys"
    },
    {
      "calculate": "if('${field:groupKeys}' === ''  || datum['${field:groupKeys}'] === '', datum.color, datum['${field:groupKeys}'])",
      "as": "color"
    },
    {
      "aggregate": [
        {
          "op": "average",
          "field": "${field:value}",
          "as": "${field:value}"
        }
      ],
      "groupby": ["newGroupKeys", "color", "grouped", "${field:value}"]
    }
  ],
  "selection": {
    "grid": {
      "type": "interval", "bind": "scales"
    }
  },
  "title": "${string:title}",
  "layer": [
    {
      "transform": [
        {"filter": "datum.grouped == false"}
      ],
      "mark": {"type": "bar", "tooltip": {"content": "data"}},
      "encoding": {
        "x": {
          "bin": {"binned" : false, "step" : 1},
          "type": "quantitative",
          "field": "${field:value}"
        },
        "y": {
          "aggregate": "count",
          "stack": null
        },
        "opacity": {"value": 0.6},
        "detail": [{"field": "newGroupKeys"}, {"field": "color"}],
        "color": {
          "type": "nominal",
          "field": "newGroupKeys",
          "scale": {"range": {"field": "color"}},
          "legend": {"title": null}
        }
      }
    },
    {
      "transform": [
        {"filter": "datum.grouped == true"}
      ],
      "mark": {"type": "bar", "binSpacing": 0, "tooltip": {"content": "data"}, "clip": true},
      "encoding": {
        "x": {
          "bin" : {"binned" : false, "step" : 1}, 
          "type": "quantitative",
          "scale": {"domain": [0, 30]},
          "field": "${field:value}"
        },
        "y": {
          "aggregate": "count",
          "stack": null
        },
        "opacity": {"value": 0.6},
        "detail": [{"field": "newGroupKeys"}, {"field": "color"}],
        "color": {
          "field": "newGroupKeys",
          "type": "nominal",
          "scale": {"range": "category"},
          "legend": {"title": null}
        }
      }
    }
  ],
  "resolve": {"scale": {"color": "independent"}}
}
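One thing I notice in this spec, offered as a guess rather than a confirmed diagnosis: nothing in the transform block ever creates a Count field. The counting happens later, in the y encoding ("aggregate": "count"), so a joinaggregate added to the transform block has no Count to sum, and the y channel would still plot raw counts rather than RelativeFrequency. Below is a sketch of how the grouped layer might look if the binning and counting were moved into explicit transforms. The names bin_value, Count, TotalCount, and RelativeFrequency are my own choices, and I have not verified this in the W&B editor.

```json
{
  "transform": [
    {"filter": "datum.grouped == true"},
    {"bin": {"step": 1}, "field": "${field:value}", "as": "bin_value"},
    {
      "aggregate": [{"op": "count", "as": "Count"}],
      "groupby": ["newGroupKeys", "bin_value", "bin_value_end"]
    },
    {
      "joinaggregate": [{"op": "sum", "field": "Count", "as": "TotalCount"}],
      "groupby": ["newGroupKeys"]
    },
    {"calculate": "datum.Count / datum.TotalCount", "as": "RelativeFrequency"}
  ],
  "mark": {"type": "bar", "binSpacing": 0, "tooltip": {"content": "data"}, "clip": true},
  "encoding": {
    "x": {
      "field": "bin_value",
      "bin": {"binned": true},
      "type": "quantitative",
      "scale": {"domain": [0, 30]}
    },
    "x2": {"field": "bin_value_end"},
    "y": {"field": "RelativeFrequency", "type": "quantitative", "stack": null},
    "opacity": {"value": 0.6},
    "color": {
      "field": "newGroupKeys",
      "type": "nominal",
      "scale": {"range": "category"},
      "legend": {"title": null}
    }
  }
}
```

In this sketch the x channel uses "binned": true together with x2 because the data is already binned by the transform, and the detail channel is dropped because the color field no longer exists after the aggregate. The ungrouped layer would presumably need the same treatment, with color kept in both groupby lists so that the color scale range can still reference it.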

My wandb data looks something like:

{"data": {"values": [
    {"uturns/uturns": 0.3, "groupKeys": "group1", "grouped": true},
    {"uturns/uturns": 2.8, "groupKeys": "group1", "grouped": true},
    {"uturns/uturns": 1.7, "groupKeys": "group2", "grouped": true},
    {"uturns/uturns": 0.8, "groupKeys": "group2", "grouped": true}
]}}

Any help to diagnose this issue would be immensely appreciated. Thanks.
