I have Apache Drill querying uncompressed JSON files with no problem, but am struggling with gz compressed JSON archives.
My understanding is that Drill uses the Hadoop file connector which I believed had the ability to handle gz files, but it seems that that Drill's JSON querying capabilities are always locked to .json files.
I've tried doing something like this:
"formats": {
"gz": {
"type": "json"
}
}
However, receive a file not found error. Also tried this:
"formats": {
"json": {
"type": "json",
"extensions": [
"gz"
]
}
}
Which results in an "invalid JSON mapping" error.
This was a bug that has been fixed on latest master branch (0.8): https://issues.apache.org/jira/browse/DRILL-1871
My testing confirms that things work OK, still seeing issues, but get some results back.