Dividing a file up with integer math

97 Views Asked by At

I'm reading a file from n servers, and I want each to download 1/nth of the file. I thought some quick integer math would work, but it doesn't seem to always work:

threads = n
thread_id = 0:n-1
filesize (in bytes) = x

starting position = thread_id*(filesize/threads)
bytes to read = (filesize/threads)

Sometimes for just the right numbers, like a 26 byte file divided up by 9 threads (I know that's ridiculous but just for example), it doesn't work out in my favor. There must be a better way. Any ideas?

3

There are 3 best solutions below

1
On

you have to do something like:

starting position = thread_id * floor(filesize / threads)
bytes to read = floor(filesize / threads) if thread_id != threads-1
bytes to read = filesize - (threads-1)*floor(filesize / threads) if thread_id = threads - 1
0
On

It seems to me the only thing missing is the last thread (thread n-1) must read to the end of the file to grab the 'modulus' bytes - the bytes that were left over by dividing by threads. Basically:

bytes_to_read = (thread_id == n - 1) ? filesize / threads + filesize % threads
                                     : filesize / threads

Alternately you could split up this extra work over the first filesize % threads threads, by adding 1 byte per thread to the bytes_to_read - of course you'll have to adjust the starting positions.

0
On

To read each byte exactly once, compute the start and end position consistently, then subtract to get the number of bytes:

start_position = thread_id * file_size / n
end_position = (thread_id + 1) * file_size / n
bytes_to_read = end_position - start_position

Note that the position expression is carefully chosen to give you end_position == file_size when thread_id == n-1. If you do something else, like thread_id * (file_size/n), you will need to treat this as a special case, like @wuputah says.