External Insertion Sort On a Binary File

628 Views Asked by At

I am trying to preform an external insertion sort on a binary file full of random doubles between 0 and 1. I added a bunch of println statements after "value" and "temp" are assigned and it looks like they are getting the same value each iteration. I don't think I am traversing the file correctly.

public class ExternalFileSort 
{

    public static void sort(String filename, int length) throws IOException
    {
        int i, j;
        double value, temp;


        RandomAccessFile file = new RandomAccessFile(filename, "rw");

        for (i = 1; i < length; i++)
        {
            file.seek(i);
            temp = file.readDouble();

            j = i-1;
            file.seek(j);
            value = file.readDouble();


            while (j >= 0 && value > temp)
            {
                file.seek(j+1);
                file.writeDouble(value);
                j--;
            }

            file.seek(j+1);
            file.writeDouble(temp);

        }


        file.close();
    }
}
2

There are 2 best solutions below

0
On

The first thing you should know is that the double type uses 8 bytes. If your file is a binary array of doubles, the first 8 bytes will correspond to the first double, the next 8 bytes to the second double and so on. If you try reading a double after performing a file.seek(1), for example, you will read a messed up value because it would be composed by 7 bytes of the first double and the first byte of the second.
It would be a lot easier and more efficient to first read the entire file into an array of doubles, perform the sort algorithm in the array and then writing the resulting array back to the disk. For the first part you would calculate the number of doubles using int size = file.length() / 8;. Then you would create an array of doubles with this size and read them with the corresponding number of file.readDouble() calls.
If you absolutely have to do the entire operation directly on the binary file, don't forget that the real positions of doubles inside of the file should be multiplied by 8 on seek operations so they are converted into byte positions. Ex: the fist double would be at position 0, the second at 8, the third at 16 and so on.

0
On

Assuming you have doubles written in your file according to that definition (that is a mandatory prerequisite) https://docs.oracle.com/javase/7/docs/api/java/io/DataOutput.html#writeDouble%28double%29

RandomAccessFile file = new RandomAccessFile(filename, "rw");
long currentPosition = 0L;
while (currentPosition < file.length()) {
    double current = file.readDouble();
    double min = current;
    long minPosition = currentPosition;
    // Find the smallest value in the rest of the file
    while (currentPosition < file.length()) {
        double candidate = file.readDouble();
        if (candidate < min) {
            min = candidate;
            minPosition = file.getFilePointer() - 8;
        }
    }
    // Swap
    file.seek(minPosition);
    file.writeDouble(current);
    file.seek(currentPosition);
    file.writeDouble(min);
    currentPosition = file.getFilePointer();
}

Not tested, but you get the idea.