How to query millions of JSON objects in multiple files with the MariaDB CONNECT engine without crashing?


Assume I have a collection of .json files in a folder, each of which represents exactly one book, i.e.

  • Book 1 in folder/1.json
  • Book 2 in folder/2.json

The files are only between 6 KB and 3 MB in size each, but there are millions of them!

Now I do the following with MariaDb:

CREATE OR REPLACE TABLE books (
    isbn CHAR(13) NOT NULL,
    title VARCHAR(256) NULL
) ENGINE=CONNECT
  TABLE_TYPE=JSON
  OPTION_LIST="PRETTY=2,JMODE=1,MAPPING=1"
  MULTIPLE=1
  FILE_NAME="folder/*.json"
  READONLY=1;

and then a simple query such as

SELECT COUNT(*) FROM books;

This works fine if there are only a few files in the folder. However, as soon as there are even just a few thousand of them, it no longer works: the client crashes with Lost connection to MySQL server during query, and the server logs are not really helpful (to me).

I guess this is a memory problem, but I do not understand why it happens: it should not be necessary to load every file/book object into memory at once; the server could just go through the files one by one, right?
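For comparison, the one-file-at-a-time behavior I would expect is what a plain script does: parse one file, discard it, move to the next, so memory stays constant regardless of how many files there are. A minimal sketch (the folder layout is the one from above; this is outside MariaDB, just to illustrate the expectation):

```python
import json
from pathlib import Path

def count_books(folder: str) -> int:
    """Count book files by parsing each .json file individually,
    so only one document is held in memory at a time."""
    count = 0
    for path in Path(folder).glob("*.json"):
        # Parse and immediately discard; constant memory use.
        json.loads(path.read_text(encoding="utf-8"))
        count += 1
    return count
```

This runs fine over millions of files, which is why I would expect a streaming SELECT COUNT(*) to work as well.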

Things I tried without success

  • PRETTY=0 instead of PRETTY=2
  • HUGE=1
  • set global connect_work_size=999999999 (or so)
  • MAPPING=0 instead of MAPPING=1
  • max_allowed_packet=1024M in my.ini

Any help is much appreciated.

Edit: The files are ordinary JSON files, each containing a single object, such as {"timestamp":1564002510168,"isbn":"9781081958077","author":"Mark Twain", "title": "The Adventures of Tom Sawyer: Annotated"} (plus some further attributes).
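Since each file holds exactly one object, one workaround I am considering is concatenating them into a single newline-delimited JSON file (one object per line, which is what PRETTY=0 expects), so the table can point at one file instead of scanning millions with MULTIPLE=1. A minimal sketch (untested at this scale; the output name books.ndjson is just an example and must not match the folder/*.json glob):

```python
import json
from pathlib import Path

def merge_to_ndjson(folder: str, out_file: str) -> None:
    """Concatenate one-object-per-file JSON documents into a single
    newline-delimited JSON file (one compact object per line)."""
    with open(out_file, "w", encoding="utf-8") as out:
        for path in sorted(Path(folder).glob("*.json")):
            obj = json.loads(path.read_text(encoding="utf-8"))
            # Re-serialize compactly so each record is exactly one line.
            out.write(json.dumps(obj) + "\n")
```

The CONNECT table could then presumably be created with FILE_NAME pointing at that single file and PRETTY=0, dropping MULTIPLE=1 entirely; whether that avoids the crash is exactly what I am unsure about.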
