I have been experimenting with PostgreSQL and PL/V8, which embeds the V8 JavaScript engine into PostgreSQL. Using this, I can query into JSON data inside the database, which is rather awesome.
The basic approach is as follows:
CREATE or REPLACE FUNCTION
json_string(data json, key text) RETURNS TEXT AS $$
var data = JSON.parse(data);
return data[key];
$$ LANGUAGE plv8 IMMUTABLE STRICT;
SELECT id, data FROM things WHERE json_string(data,'name') LIKE 'Z%';
Using, V8 I can parse JSON data into JS, then return a field and I can use this as a regular pg query expression.
BUT
On large datasets, performance can be an issue, as for every row I need to parse the data. The parser is fast, but it is definitely the slowest part of the process and it has to happen every time.
What I am trying to work out (to finally get to an actual question) is if there is a way to cache or pre-process the JSON ... even storing a binary representation of the JSON in the table that could be used by V8 automatically as a JS object might be a win. I've had a look at using an alternative format such as messagepack or protobuf, but I don't think they will necessarily be as fast as the native JSON parser in any case.
THOUGHT
PG has blobs and binary types, so the data could be stored in binary, then we just need a way to marshall this into V8.
perhaps instead of making the retrieval phase responsible for parsing the data, creating a new data type which could pre-disseminate json data on input might be a better approach?
http://www.postgresql.org/docs/9.2/static/sql-createtype.html