Is there an erasure code that can be applied to multiple chunks (maybe 100 or 200, each a few hundred kB) by (somehow) adding redundancy chunks?
I have heard about Reed-Solomon, but it doesn't look like it can be used for huge data sets and multiple chunks; am I wrong?
Thanks!
Erasure codes encode $N$ original data chunks into $M$ parity chunks for redundancy, and these $N$ data chunks plus $M$ parity chunks form only one stripe of the whole storage. Theoretically, $N$ can be arbitrarily large for Reed-Solomon (RS) codes, as long as the Galois field $GF(2^w)$ the code is constructed over is large enough; for example, a classic RS code over $GF(2^8)$ allows up to $2^8-1=255$ symbols per codeword, so 200 data chunks plus parity already fit. Based on the above, your question is more likely the following: why don't practical storage systems encode a very large number of chunks into one stripe?
The reasons are the *update problem* and the *repair problem*:

- *Update problem*: if you encode a great number of data chunks into parity chunks, many data/parity chunks become correlated. As soon as you update one data chunk, all parity chunks must be updated as well, which causes heavy I/O on the parity side.
- *Repair problem*: when one data/parity chunk fails, many data/parity chunks must be accessed and transferred for the repair, causing enormous disk I/O or network traffic. Take a RAID5 stripe of $3$ data chunks (A, B, C) and parity chunk $P = A + B + C$ as an example: repairing the failure of any single chunk requires all three remaining chunks to participate (see the sketch below).
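Here is a minimal, self-contained Python sketch of that single-parity (RAID5-style) case; the chunk names and sizes are made up for illustration, with byte-wise XOR standing in for the parity addition:

```python
# Minimal sketch of the RAID5-style single-parity example above.
# Chunk names A, B, C and their contents are illustrative only.

def xor_bytes(x: bytes, y: bytes) -> bytes:
    """Byte-wise XOR of two equal-length chunks (addition in GF(2^8))."""
    return bytes(a ^ b for a, b in zip(x, y))

# Three data chunks of equal size (pad in practice if sizes differ).
A = bytes([1] * 8)
B = bytes([2] * 8)
C = bytes([3] * 8)

# Parity chunk P = A + B + C (XOR plays the role of addition here).
P = xor_bytes(xor_bytes(A, B), C)

# Update problem: changing A alone forces a parity update as well,
# so one logical write turns into two physical writes (A and P).
A_new = bytes([9] * 8)
P_new = xor_bytes(xor_bytes(A_new, B), C)

# Repair problem: rebuilding a lost chunk (say B) requires reading
# *all* surviving chunks in the stripe, not just one of them.
B_rebuilt = xor_bytes(xor_bytes(A_new, C), P_new)
assert B_rebuilt == B
```

XOR here is just the simplest Galois-field addition; a full RS code uses more general $GF(2^w)$ arithmetic and more parity chunks, but the update and repair costs behave the same way.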
The more chunks that are encoded together, the more serious the update and repair problems a storage system may face, which in turn has a major impact on its performance. Besides, the decoding speed (the process of recovering the original data) drops considerably as $N$ grows.