Some background
Versioning notebooks can become very inefficient if the output is expected to vary a lot. I solved this problem with my Jupyter notebooks using nbstripout, but so far I've found no alternative for Zeppelin notebooks.
Because nbstripout uses nbformat to parse ipynb files, it's not an easy patch to make it support Zeppelin. On the other hand, the goal is not that complex: simply empty out all the "msg": "..." fields.
Goal
Given a JSON file, empty out all 'paragraphs.result.msg' fields.
Sample (schema):
{"paragraphs": [{"result": {"msg": "Very long output..."}}]}
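To make the goal concrete, here is a minimal Python sketch of the transformation applied to the sample. The function name strip_outputs is made up for illustration, and the code assumes the top-level key is spelled paragraphs, as in the jq filter used later in this post.

```python
import json


def strip_outputs(notebook: dict) -> dict:
    """Blank the msg field of every paragraph result, in place."""
    for paragraph in notebook.get("paragraphs", []):
        result = paragraph.get("result")
        # Only touch paragraphs that actually carry a result with a msg field.
        if isinstance(result, dict) and "msg" in result:
            result["msg"] = ""
    return notebook


sample = json.loads('{"paragraphs": [{"result": {"msg": "Very long output..."}}]}')
print(json.dumps(strip_outputs(sample)))
# prints {"paragraphs": [{"result": {"msg": ""}}]}
```

The sketch leaves paragraphs without a result untouched, so it never adds keys that were not already in the file.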
Git Filter
The best solution (thanks to @steven-penny) is to run this:
git config filter.znbstripout.clean "jq '.paragraphs[].result.msg = \"\"'"
which will set up a filter called znbstripout that invokes the jq tool. Then, in your .gitattributes file you can just put:
*.json filter=znbstripout
Python Script (usable with Git Hooks)
The following can be used as a git hook: