I am doing D3 mapping at the state level, and I ran into a problem during data processing. For example, the map data look like this (dat1.ndjson):
{state: a, code: aa}
{state: b, code: bb}
{state: c, code: cc}
But the information we have is usually incomplete. For example, there is usually no information for Antarctica, but we still need to draw its outline when mapping. The information data look like this (dat2.ndjson):
{state: a, code: aa, count: 1}
{state: b, code: bb, count: 2}
So when I try to do a left join on these two data sets, it returns (dat3.ndjson):
[{state: a, code: aa},{state: a, code: aa, count: 1}]
[{state: b, code: bb},{state: b, code: bb, count: 2}]
[{state: c, code: cc},null]
This is returned by
ndjson-join --left 'd.code' dat1.ndjson dat2.ndjson > dat3.ndjson
The goal is to attach this 'count' information to the map data, so I usually first give every item count = 0 in dat1.ndjson, like this (dat11.ndjson):
{state: a, code: aa, count: 0}
{state: b, code: bb, count: 0}
{state: c, code: cc, count: 0}
and then use the same left join as before to get something like this (dat33.ndjson):
[{state: a, code: aa, count: 0},{state: a, code: aa, count: 1}]
[{state: b, code: bb, count: 0},{state: b, code: bb, count: 2}]
[{state: c, code: cc, count: 0},null]
But here comes the problem. If I use the following command to add the two counts together, it returns an error because of the null in the third line.
ndjson-map '{state: d[0].state, code: d[0].code, count: d[0].count + d[1].count}' < dat33.ndjson > merge.ndjson
For now I have to do this data processing in R, which takes a lot of time because I need to convert between .ndjson and .csv. So I am looking for a better way to do this. I think there might be a way using 'ndjson-cli', 'jq', 'awk', 'sed', etc.
Does anyone have ideas? Thank you! :)
E.
Here is a solution that has several parts:
To illustrate how straightforward everything is once you have the fluff issues resolved, here is the "main" jq program:
In effect, this says: perform the join using .state as the join-key, and then ensure .count is set.
The output from the above one-liner will be the NDJSON:
Part 1: dat1.json and dat2.json
I am going to assume that you can produce valid JSON from your inputs. For the sample data, I used sed:
The following at any rate assumes you have two files, dat1.json and dat2.json, containing streams of valid JSON.
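For the sample data, something along these lines would do the conversion. It is a minimal sketch, assuming that keys and string values are bare identifiers without spaces; the helper name to_json is hypothetical, and the pattern is not a general converter (it would also quote true, false and null).

```shell
# Quote every bare word (keys and string values). Numbers are left alone
# because the pattern requires a leading letter or underscore.
# Assumption: values are simple identifiers, as in the sample data.
to_json() {
  sed 's/[A-Za-z_][A-Za-z0-9_]*/"&"/g'
}

converted=$(printf '%s\n' '{state: a, code: aa, count: 1}' '{state: c, code: cc}' | to_json)
printf '%s\n' "$converted"
```

With the real files this would be used as `to_json < dat1.ndjson > dat1.json`, and likewise for dat2.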
Part 2: join
Here is a small library of filters for producing joins: the first works on streams, and the others on arrays. These definitions assume your jq has INDEX/2. See Part 4 if that is not the case.
Part 3. Solution
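As a rough idea of what the join filters described in Part 2 could look like, here is a minimal sketch. The definitions are my own assumption of a left join built on INDEX/2, not a verbatim copy of any library: join takes two streams and a key expression, and joins takes the key expression and expects an array of two arrays as input.

```shell
# Hypothetical definitions, written for illustration only.
defs='
# Stream version: left join of the objects in s1 against those in s2,
# matching on the key computed by f. Needs INDEX/2 (jq 1.6 or later).
def join(s1; s2; f):
  INDEX(s2; f) as $idx
  | s1
  | . + ($idx[f | tostring] // {});

# Array version: the input is [left, right], two arrays of objects.
def joins(f): . as [$left, $right] | join($left[]; $right[]; f);
'

left='[{"state":"a","code":"aa"},{"state":"c","code":"cc"}]'
right='[{"state":"a","code":"aa","count":1}]'

# Join on .state, then default a missing .count to 0.
joined=$(printf '[%s,%s]' "$left" "$right" | jq -c "$defs"' joins(.state) | .count //= 0')
printf '%s\n' "$joined"
```

Objects from the left side that have no partner pass through unchanged, which is why the `.count //= 0` step is needed to fill in the missing count.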
First, let's keep things simple. If you place the above definitions for join and joins in a file, say d3.jq, followed by the one-line program given in the preamble, then the following invocation will do the trick, assuming your jq has INDEX:
This assumes you are using a shell that supports process substitution. If not, then you can first run the "." programs separately, e.g. if you have sponge:
Using include
If your jq supports include, and if you have the above definitions of join in a private standard library such as ~/.jq/jq/jq.jq, then your main jq program becomes the two-liner:
This means you could dispense with d3.jq and use the invocation:
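To make the two ways of invoking jq concrete, here is a sketch that runs both against the sample data. Several things in it are assumptions: the compact join definition, the library file name joins.jq, and the use of a temporary directory passed with -L in place of ~/.jq. It uses plain files and --slurpfile, so it does not need process substitution.

```shell
work=$(mktemp -d)
printf '%s\n' '{"state":"a","code":"aa"}' '{"state":"b","code":"bb"}' \
  '{"state":"c","code":"cc"}' > "$work/dat1.json"
printf '%s\n' '{"state":"a","code":"aa","count":1}' \
  '{"state":"b","code":"bb","count":2}' > "$work/dat2.json"

# A compact stand-in for the join definition, kept in a library file.
# "joins.jq" is a hypothetical name chosen for this sketch.
cat > "$work/joins.jq" <<'EOF'
def join(s1; s2; f):
  INDEX(s2; f) as $idx | s1 | . + ($idx[f | tostring] // {});
EOF

# Variant 1: d3.jq holds the definition followed by the main program.
{ cat "$work/joins.jq"; echo 'join($dat1[]; $dat2[]; .state) | .count //= 0'; } > "$work/d3.jq"
via_file=$(jq -n -c -f "$work/d3.jq" \
  --slurpfile dat1 "$work/dat1.json" --slurpfile dat2 "$work/dat2.json")

# Variant 2: no d3.jq; the program pulls the definition in with include.
via_include=$(jq -n -c -L "$work" \
  --slurpfile dat1 "$work/dat1.json" --slurpfile dat2 "$work/dat2.json" \
  'include "joins"; join($dat1[]; $dat2[]; .state) | .count //= 0')

printf '%s\n' "$via_include"
```

Both variants should print the same three merged records, with state c picking up count 0 because it has no partner in dat2.json.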
Part 4: INDEX
Here is a copy of INDEX as provided by recent versions of jq. You could add these definitions to d3.jq (before the "main" part of the program), or to your library file, and so on:
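For reference, the two definitions below match, to the best of my knowledge, what jq 1.6 ships in its builtins. The surrounding shell and the sample input are only there to show them in use and are assumptions for this sketch.

```shell
prog=$(mktemp)
cat > "$prog" <<'EOF'
# INDEX as found in jq 1.6's builtins, reproduced for older versions.
def INDEX(stream; idx_expr):
  reduce stream as $row ({}; .[$row | idx_expr | tostring] |= $row);
def INDEX(idx_expr): INDEX(.[]; idx_expr);

# Build a lookup table keyed by .state from an array of objects.
INDEX(.state)
EOF

indexed=$(printf '%s' '[{"state":"a","count":1},{"state":"b","count":2}]' | jq -c -f "$prog")
printf '%s\n' "$indexed"
```

A user-level definition with the same name and arity takes precedence over the builtin, so including these in d3.jq is harmless on versions of jq that already have INDEX.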