Let's say I have a set of tuples to be processed by Cascalog, formatted like [Date, Name, Value], e.g.
2014-01-01 Pizza 3
2014-01-01 Hamburger 4
2014-01-01 Cheeseburger 2
2014-01-02 Pizza 1
2014-01-02 Hamburger 2
Given that I have a list of columns like [Pizza, Hamburger, Cheeseburger], I want to transpose / pivot the data so it looks like this:
Date Pizza Hamburger Cheeseburger
2014-01-01 3 4 2
2014-01-02 1 2 0
What's the best way to do this in Cascalog?
Here's one way to do it:
Let's have a quick run through the code:
Most of the action happens in the
transposefunction, which contains two queries:The inner query aggregates all
?name ?valuepairs for a given date into a?summap.The outer query uses
select-valuesto fetch the values for our columns out of the?summap, and into the final result rows.Since we know the columns are
Pizza, Hamburger, Cheeseburgerwe can simply hardcode them into the query. If you want to know how to make the columns dynamic, read Nathan Marz's blog post on creating a news feed in Cascalog.Note that we have to represent the columns as nullable variables (using
!) since not every column will have a value for any given row. If we wanted to avoidnullresults, we could changeselect-valuesto use 0 as the default value.(One caveat is that this won't produce any headers in the final output, so this has to be done as a post-processing step.)