Erasure Encoding with little resource usage


Hey, I'm pretty new to a lot of the erasure encoding concepts. I've mostly only read about Reed-Solomon, but it does not seem to fit what I need.

I need to find a technique that can create parity shards on large data WITHOUT requiring heavy system resource usage.

For example:

I want to store a 32 GB video cut up into eight 4 GB shards, and I want to create 3 parity shards for it. I cannot use more than a few hundred MB of memory, and I want the parity shards created incrementally so that I can write them to another file system without storing the entire thing in memory or on local disk.

Is there an erasure encoding technique so I can:

  • Create parity shards for large files without using significant amounts of memory
  • Incrementally create and distribute the parity shards to another system by sending the bytes as they are created

There is 1 best solution below


So that I understand the goal here: if you consider the eight 4 GB shards as a matrix of 8 rows, where each row has 4 GB of data, then the parities would be 3 more rows, each also holding 4 GB of data? Assuming this is the case, the code will need to encode and transmit 11 row chunks at a time (8 data plus 3 parity). Using 10 MB chunks, that would need 110 MB of memory, plus the overhead for the tables used by the Reed-Solomon ECC (RSECC). It might be better to use much smaller chunks, depending on messaging overhead when transmitting the data.
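To make the chunk-at-a-time idea concrete, here is a minimal sketch in pure Python. It is not taken from any particular library: the names `encode_stream`, `cauchy_tables`, and `chunk_size` are made up for illustration, and the parity coefficients come from a Cauchy matrix over GF(2^8), which is one common way to build a Reed-Solomon-style erasure code (an assumption, since the question does not fix a construction). Memory use is roughly (8 + 3) times the chunk size, independent of the shard size.

```python
import io

# GF(2^8) log/antilog tables for the primitive polynomial 0x11d.
EXP = [0] * 512
LOG = [0] * 256
_x = 1
for _i in range(255):
    EXP[_i] = _x
    LOG[_x] = _i
    _x <<= 1
    if _x & 0x100:
        _x ^= 0x11D
for _i in range(255, 512):
    EXP[_i] = EXP[_i - 255]

def gf_mul(a, b):
    """Multiply two field elements."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]

def gf_inv(a):
    """Multiplicative inverse of a nonzero field element."""
    return EXP[255 - LOG[a]]

def mul_table(c):
    """256-byte table mapping v -> c*v, usable with bytes.translate."""
    return bytes(gf_mul(c, v) for v in range(256))

def xor_bytes(a, b):
    """XOR two equal-length byte strings."""
    n = len(a)
    return (int.from_bytes(a, "little") ^ int.from_bytes(b, "little")).to_bytes(n, "little")

def cauchy_tables(k, m):
    """Tables for the m x k Cauchy matrix 1 / (i XOR (m + j)); needs k + m <= 256."""
    return [[mul_table(gf_inv(i ^ (m + j))) for j in range(k)] for i in range(m)]

def encode_stream(data_streams, m, chunk_size=1 << 20):
    """Yield a list of m parity chunks for each stripe read from the k data streams."""
    tables = cauchy_tables(len(data_streams), m)
    while True:
        chunks = [s.read(chunk_size) for s in data_streams]
        n = max(len(c) for c in chunks)
        if n == 0:
            return  # every stream is exhausted
        # Zero-pad short final chunks so all rows in the stripe have equal length.
        chunks = [c.ljust(n, b"\0") for c in chunks]
        parity = []
        for row in tables:
            acc = bytes(n)
            for chunk, table in zip(chunks, row):
                acc = xor_bytes(acc, chunk.translate(table))
            parity.append(acc)
        yield parity
```

Each yielded list can be written or sent to the three parity destinations right away and then dropped, so no parity shard is ever held in full. Pure Python will be slow at this scale; the sketch is only meant to show the streaming structure.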

On the receiving end, you'd want to at least double buffer the received data, delaying the initial video output for at least one buffer time so that the data reception and correction occur in parallel to the video display.
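The buffering on the receiving side could be sketched roughly like this, using a bounded queue as a stand-in for the two buffers. The names `run_pipeline`, `receive_chunks`, `correct`, and `display` are placeholders for illustration, not part of any real player or network API.

```python
import queue
import threading

def run_pipeline(receive_chunks, correct, display, depth=2):
    """Receive and correct chunks in a worker thread while the caller displays them."""
    buf = queue.Queue(maxsize=depth)  # bounded: the receiver blocks when it gets too far ahead
    done = object()                   # sentinel marking the end of the stream

    def producer():
        try:
            for chunk in receive_chunks:
                buf.put(correct(chunk))
        finally:
            buf.put(done)  # always unblock the consumer, even if receiving fails

    worker = threading.Thread(target=producer, daemon=True)
    worker.start()
    while True:
        item = buf.get()
        if item is done:
            break
        display(item)
    worker.join()
```

With `depth=2`, reception and correction of later chunks overlap with display of the current one, and memory stays bounded to a few chunks.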

The question mentions erasures. Is this to be an erasure-only scheme, which would require re-transmission in the case of an error? With 3 parities, 1 row chunk of data with an error at an unknown location could be corrected (which uses 2 of the parities), leaving one parity row left over for error detection.
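As a rough guide, and assuming the 3 parity rows come from a maximum distance separable code such as Reed-Solomon, the trade-off between errors and erasures can be written as follows. Here t is the number of rows with errors at unknown positions, e is the number of rows known to be missing (erasures), and m is the number of parity rows; these symbols are introduced only for this illustration.

```latex
2t + e \le m, \qquad m = 3
\;\Longrightarrow\;
(t, e) \in \{(0,0),\ (0,1),\ (0,2),\ (0,3),\ (1,0),\ (1,1)\}
```

So an erasure-only scheme can rebuild up to 3 missing rows, while correcting 1 row with an unknown-location error leaves one parity to spare, either for detection or for 1 additional erasure.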