I have to build a text classification engine over a large training dataset (~1 million texts, each assigned one or more categories).
The problem is temporary storage for my statistics (the text features extracted from each text). For each text I keep two arrays: int[numOfFeatures] (the indexes of the features present) and double[numOfFeatures] (the weight of each feature). With numOfFeatures == 3000, that is 3000 × (4 + 8) bytes = 36,000 bytes, i.e. about 36 KB per text, or about 36 GB for the whole corpus.
I tried to do this with an SQLite database, packing the arrays as BLOBs:
CREATE TABLE docBlobs (docId INTEGER, featureIndexes BLOB, featureWeights BLOB);
CREATE INDEX docIdIndex_docBlobs ON docBlobs (docId ASC);
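For completeness, writing one document looks roughly like this (a sketch assuming the sqlite-jdbc driver; insertDoc and the big-endian ByteBuffer packing are illustrative choices, not fixed by anything above):

import java.nio.ByteBuffer;
import java.sql.*;

static void insertDoc(Connection conn, long docId, int[] idx, double[] w) throws SQLException {
    ByteBuffer ib = ByteBuffer.allocate(4 * idx.length); // 4 bytes per feature index
    for (int i : idx) ib.putInt(i);
    ByteBuffer wb = ByteBuffer.allocate(8 * w.length);   // 8 bytes per feature weight
    for (double d : w) wb.putDouble(d);
    try (PreparedStatement ps = conn.prepareStatement(
            "INSERT INTO docBlobs (docId, featureIndexes, featureWeights) VALUES (?, ?, ?)")) {
        ps.setLong(1, docId);
        ps.setBytes(2, ib.array());
        ps.setBytes(3, wb.array());
        ps.executeUpdate();
    }
}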
The main use case for these statistics is sequential, iterative loading of the BLOBs that belong to a given category. The mapping from each category to its document IDs is kept in memory, and I tried several ways of loading BLOBs by a set of known IDs.
First, I tried executing
SELECT * FROM docBlobs WHERE docId = ?
in a loop, binding one docId per iteration. It gave about 20 rows per second.
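For reference, a loop of that shape in JDBC terms (a sketch; the names are illustrative, and toInts/toDoubles stand for hypothetical helpers that reverse the BLOB packing shown above):

try (PreparedStatement ps = conn.prepareStatement(
        "SELECT featureIndexes, featureWeights FROM docBlobs WHERE docId = ?")) {
    for (long docId : idsOfCategory) {           // the in-memory category -> IDs mapping
        ps.setLong(1, docId);
        try (ResultSet rs = ps.executeQuery()) {
            if (rs.next()) {
                int[] idx   = toInts(rs.getBytes(1));    // hypothetical unpacking helper
                double[] w  = toDoubles(rs.getBytes(2)); // hypothetical unpacking helper
                // ... feed (idx, w) into the trainer ...
            }
        }
    }
}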
Then I tried creating a temp table in memory, putting the needed IDs into it and using it as an IN list:
PRAGMA temp_store = MEMORY;
CREATE TEMP TABLE tempTable (docId INTEGER);
INSERT INTO tempTable VALUES (?);
SELECT * FROM docBlobs WHERE docId IN (SELECT docId FROM tempTable);
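In JDBC terms, filling the temp table batched inside one transaction and then issuing the single query might look like this (a sketch; names are illustrative):

conn.setAutoCommit(false);
try (Statement st = conn.createStatement()) {
    st.execute("PRAGMA temp_store = MEMORY");
    st.execute("CREATE TEMP TABLE tempTable (docId INTEGER)");
}
try (PreparedStatement ins = conn.prepareStatement("INSERT INTO tempTable VALUES (?)")) {
    for (long docId : idsOfCategory) {
        ins.setLong(1, docId);
        ins.addBatch();
    }
    ins.executeBatch();   // one round trip for all IDs
}
conn.commit();
// One query then streams every matching row:
// SELECT * FROM docBlobs WHERE docId IN (SELECT docId FROM tempTable)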
It gave me about 20 rows per second.
Then I tried a JOIN with the temp table. It also gave about 20 rows per second.
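(A join of that shape would be, reconstructed hypothetically rather than quoted verbatim:)

String joinSql = "SELECT d.docId, d.featureIndexes, d.featureWeights "
               + "FROM docBlobs d JOIN tempTable t ON d.docId = t.docId";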
Does anyone know a way to get a reasonable reading speed for this task?
P.S. With a plain full scan (SELECT * FROM docBlobs), the reading speed was about 1000 rows per second, i.e. limited only by the sequential throughput of my HDD.
If the records to be loaded are not adjacent on disk, the disk head must seek, and those delays cannot be avoided. Your numbers are consistent with that: 20 rows per second is about 50 ms per row, i.e. a few seeks per 36 KB BLOB (SQLite stores a large BLOB as a chain of overflow pages, which need not be contiguous).
If you're loading documents by category, you could try storing them ordered by category, but that breaks down when a document belongs to several categories.
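If the extra disk space is acceptable, one workaround for the multi-category case is to duplicate each document's BLOBs once per category, so every category's rows sit together on disk. A sketch, assuming the same Java/sqlite-jdbc setup as in the question (rowid tables keep rows in rowid order, so inserting category by category leaves each category's rows roughly contiguous):

try (Statement st = conn.createStatement()) {
    st.execute("CREATE TABLE catBlobs (category INTEGER, docId INTEGER, "
             + "featureIndexes BLOB, featureWeights BLOB)");
    st.execute("CREATE INDEX catIndex ON catBlobs (category)");
}
// Insert rows grouped by category, then read one category near-sequentially:
PreparedStatement load = conn.prepareStatement(
    "SELECT docId, featureIndexes, featureWeights FROM catBlobs WHERE category = ?");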