Question 1: Are there specialized databases for storing dense and sparse matrices? I googled but didn't find any.
The matrix in question is huge (10^5 by 10^5) but it's sparse, which means that most of its values are zeros and I only need to store the non-zero values. So I thought of making a table like this:
2D Matrix
---------------
X Y val
---------------
1 2 4.2
5 1 91.0
9 3 139.1
And so on. Three columns: two for the coordinates, the third for the value of that cell in the sparse matrix.
Question 2: Is this the best way to store a sparse matrix? I also considered MongoDB, but it seems that making one document per cell of the matrix would be too much overhead. Table-oriented databases are slow, but I can use VoltDB :)
Side note: I thought of a Redis Hash but can't make it two-dimensional (I found a way to serialize 2D matrices into 1D, so that I can store them in a Redis Hash or even a List).
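For illustration, here is a minimal Python sketch of the two layouts described above: the (X, Y, val) triples and a 2D-to-1D key flattening. The names `flatten`, `unflatten` and `N_COLS` are hypothetical, and the row-major formula is only one possible serialization, not necessarily the one referred to in the question. It assumes every Y coordinate is smaller than `N_COLS`.

```python
# Assumed column count, taken from the 10^5 by 10^5 matrix in the question.
N_COLS = 10**5


def flatten(x, y, n_cols=N_COLS):
    """Map a 2D coordinate to one integer key (row-major order).

    Only valid while 0 <= y < n_cols.
    """
    return x * n_cols + y


def unflatten(key, n_cols=N_COLS):
    """Recover (x, y) from a key produced by flatten()."""
    return divmod(key, n_cols)


# Coordinate-list layout: one (x, y, val) triple per non-zero cell,
# matching the three-column table in the question.
triples = [(1, 2, 4.2), (5, 1, 91.0), (9, 3, 139.1)]

# Same data keyed by the flattened index, shaped like a flat hash
# (field -> value), which is what a one-dimensional store needs.
flat = {flatten(x, y): val for x, y, val in triples}
```

If coordinates can exceed 10^5, `N_COLS` has to be raised to stay above the largest Y value, otherwise two different cells would map to the same key.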
Question 3: How many bytes per row will VoltDB use? The coordinates will be integers ranging from 0 to 10^5, maybe more, and the cell values will be floats.
Regarding Question 3, based on your example, the X and Y columns could be the INTEGER datatype in VoltDB, which is 4 bytes. The value column could be a FLOAT datatype, which is 8 bytes.
Each record would therefore be 16 bytes (4 + 4 + 8), so the nominal size in memory would be 16 bytes * row count. In general, you add 30% to that for overhead, and then 1GB per server for heap size, to determine the overall memory needed. See the references below for more detail.
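As a rough worked example of that arithmetic, here is a small Python sketch. The function names and the row count of 10 million non-zero cells are assumptions made up for illustration (the question does not say how many non-zero values there are), and 1GB is taken as 10^9 bytes.

```python
ROW_BYTES = 4 + 4 + 8        # X (INTEGER) + Y (INTEGER) + val (FLOAT)
OVERHEAD_PERCENT = 30        # rule-of-thumb overhead quoted above
HEAP_BYTES_PER_SERVER = 10**9  # "1GB per server", assumed decimal here


def table_memory_bytes(row_count):
    """Nominal table size plus the 30% overhead, in bytes."""
    nominal = row_count * ROW_BYTES
    # Integer arithmetic avoids floating-point rounding in the estimate.
    return nominal + nominal * OVERHEAD_PERCENT // 100


def cluster_memory_bytes(row_count, servers=1):
    """Table estimate plus the per-server heap allowance, in bytes."""
    return table_memory_bytes(row_count) + servers * HEAP_BYTES_PER_SERVER


# Hypothetical workload: 10 million non-zero cells on a single server.
rows = 10_000_000
print(table_memory_bytes(rows))    # 208000000
print(cluster_memory_bytes(rows))  # 1208000000
```

Under those assumptions the table itself needs about 208 MB before any indexes are added.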
You will probably want to index this table, so assuming you wanted a compound index of (x,y), the size would be as follows:
Tree index: (sum-of-column-sizes + 8 + 32) * rowcount
Hash index: (((2 * rowcount) + 1) * 8) + ((sum-of-column-sizes + 32) * rowcount)
sum-of-column-sizes for (x,y) is 8 bytes.
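The two index formulas can be evaluated like this in Python. This is only a sketch: the function names are hypothetical, and the row count of 10 million is an assumed figure, not something given in the question.

```python
SUM_OF_COLUMN_SIZES = 4 + 4  # compound index on (x, y), two INTEGER columns


def tree_index_bytes(row_count, col_bytes=SUM_OF_COLUMN_SIZES):
    """Tree index: (sum-of-column-sizes + 8 + 32) * rowcount."""
    return (col_bytes + 8 + 32) * row_count


def hash_index_bytes(row_count, col_bytes=SUM_OF_COLUMN_SIZES):
    """Hash index: (((2 * rowcount) + 1) * 8)
    + ((sum-of-column-sizes + 32) * rowcount)."""
    return ((2 * row_count) + 1) * 8 + (col_bytes + 32) * row_count


# Hypothetical workload: 10 million non-zero cells.
rows = 10_000_000
print(tree_index_bytes(rows))  # 480000000
print(hash_index_bytes(rows))  # 560000008
```

With 8 bytes of indexed columns, this works out to 48 bytes per row for the tree index and roughly 56 bytes per row for the hash index, so either index is several times larger than the 16 bytes of row data it covers.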
References:
The available datatypes are listed in Appendix A of Using VoltDB: http://community.voltdb.com/docs/UsingVoltDB/ddlref_createtable#TabDatatypes
Guidelines and formulas for estimating the memory size are in the VoltDB Planning Guide: http://community.voltdb.com/docs/PlanningGuide/ChapMemoryRecs