Query compressed gz files with Apache Drill


I have Apache Drill querying uncompressed JSON files with no problem, but am struggling with gz compressed JSON archives.

My understanding is that Drill uses the Hadoop file connector, which I believed could handle gz files, but it seems that Drill's JSON querying capabilities are always locked to .json files.

I've tried doing something like this:

"formats": {
  "gz": {
    "type": "json"
  }
}

However, I receive a "file not found" error. I also tried this:

"formats": {
  "json": {
    "type": "json",
    "extensions": [
      "gz"
    ]
  }
}

This results in an "invalid JSON mapping" error.

Best answer

This was a bug that has been fixed on the latest master branch (0.8): https://issues.apache.org/jira/browse/DRILL-1871

In my testing things mostly work: I am still seeing some issues, but I do get results back.
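For anyone wanting to reproduce this, here is a minimal sketch for producing a small test file. It assumes that a fixed Drill build picks the compression codec from the final extension (.gz) and the data format from the extension before it (.json), so the file is named with both. The file name events.json.gz, the sample records, and the helper names are hypothetical, and the script only checks that the file round-trips through gzip; it does not talk to Drill.

```python
import gzip
import json
import os
import tempfile


def write_gzipped_json(records, directory, stem="events"):
    """Write records as one JSON object per line to <stem>.json.gz.

    The double extension is an assumption about how Drill detects
    compressed JSON; adjust it if your setup differs.
    """
    path = os.path.join(directory, stem + ".json.gz")
    with gzip.open(path, "wt", encoding="utf-8") as handle:
        for record in records:
            handle.write(json.dumps(record) + "\n")
    return path


def read_gzipped_json(path):
    """Read the file back to confirm it is valid gzip-compressed JSON."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


if __name__ == "__main__":
    # Hypothetical sample data, only used for the round-trip check.
    sample = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
    with tempfile.TemporaryDirectory() as workdir:
        path = write_gzipped_json(sample, workdir)
        print(os.path.basename(path))
        print(read_gzipped_json(path) == sample)
```

With a file like this in a location the dfs storage plugin can see, a query such as SELECT * FROM dfs.`/tmp/events.json.gz` (hypothetical path) should be enough to check whether your build includes the fix.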