What is the best alternative for the following situation?

166 Views Asked by At

I'm using a JSONB field in my Postgresql database to store the following document. I own thousands of documents. I need to create reports with this data, but the search is very slow.

If I need to create a report stating the new users of a month, I need to go through the entire document comparing if the user is in one month and not in another.

Message document:

[{"recipient":1,"user":4,"created_at":"2016-11-10","content":"Duis aliquam convallis nunc.","is_sender_user":true},
{"recipient":1,"user":18,"created_at":"2016-12-10","content":"Proin eu mi.","is_sender_user":false},
{"recipient":1,"user":4,"created_at":"2016-11-20","content":"In hac habitasse platea dictumstm.","is_sender_user":true},
{"recipient":1,"user":20,"created_at":"2016-12-14","content":"Donec ut dolor.","is_sender_user":true},
{"recipient":1,"user":13,"created_at":"2016-12-06","content":"Nulla mollis molestie lorem. Quisque ut erat. Curabitur gravida nisi at nibh.","is_sender_user":true}]

It would be better to create a User table and create a JSONB messages field to store your messages. Or the way it is I can create my report using JSONB queries?

2

There are 2 best solutions below

3
On BEST ANSWER

As Samuil Petrov mentioned you can create index on jsonb field, i suggest creating index on month part of created_at and user

create INDEX td002_si3 ON testData002 (substring(doc->>'created',0,8),(doc->>'user'));

with this the query

SELECT 
      substring(doc ->> 'created', 0, 8) AS m,
      ARRAY_AGG(DISTINCT doc ->> 'user')          AS users
    FROM testData002
    GROUP BY substring(doc ->> 'created', 0, 8)

will give you the monthly users from an index scan

GroupAggregate  (cost=0.28..381.52 rows=3485 width=50)
  Group Key: ""substring""((doc ->> 'created'::text), 0, 8)
  ->  Index Scan using td002_si3 on testdata002  (cost=0.28..294.28 rows=3500 width=50)

test data generated with

create table testData002 as 
     select row_number() OVER () as id
           ,jsonb_build_object('created',dt::DATE
                              ,'user',(random()*1000)::INT) as doc 
       from generate_series(1,10),generate_series('2016-01-01'::TIMESTAMP,'2016-12-15'::TIMESTAMP,'1 day'::INTERVAL) as dt;
0
On

Your message documents describe a relationship between users: a sender transmits content to a recipient. A sender may send many messages, a recipient may receive many messages. This is best represented in a relational structure, with a users table and a messages table having foreign key constraints for the sender and recipient.

It's possible to just heave everything into a JSONB field like you're doing, but there are some major disadvantages: query performance suffers, although as Samuil Petrov mentioned this can be ameliorated with indexing; but more importantly, there's nothing preventing a message from having an invalid user or recipient id. Using a schemaless JSONB field can simplify development while you're still hashing out what you need to store, but once you know what you need, it should be enforced by your schema.