Splitting a file into parts by byte offset

Ok, so this is a unique question.

We receive files daily from a company, downloading them from their servers to ours over SFTP. That company works with a third-party provider that creates the files and reduces their size, both to make downloads faster and to save space on their servers.

We download 9 files daily from the server, in 3 groups of 3 files.
Each group consists of 2 XML files and one "image" file.
One of the XML files describes the "image" file. The information we need from it:

  • offset: Gives us where a section of data starts
  • length: Used with offset, gives us the end of that section
  • count: Gives us the number of elements held in the file


The 'image' file itself is unusable until we split the file into pieces based on the offset and length of each image in the file. The images are basically concatenated together. We need to extract these images to be able to view them.

Example offset, length and count values are as follows:

offset: 0
length: 2670

offset: 2670
length: 2670

offset: 5340
length: 2670

offset: 8010
length: 2670

count: 4

This means that there are 4 (count) items. The first item begins at offset[0] and is length[0] bytes long. The second item begins at offset[1] and is length[1] bytes long, etc.

I need to split the file at PRECISELY these points, with no room for error. The third-party provider will not give us their code, so we have to figure this out ourselves. The image file is not readable until it is split, so it is essentially useless until then.


My question: Does anyone have a way of splitting files at a specific byte?

P.S. I do not have any code yet. I don't even know where to begin with this one. I am not new to coding, but I have never done file splitting by the byte.

I don't care which language this uses. I just need to make it work.


EDIT
The OS is Windows

There is 1 best solution below

You hooked me. Here's a rough Java method that can split a file based on offset and length. It only relies on try-with-resources and the java.nio.file API, so it should work on Java 7 or later.

A few of the classes used: SeekableByteChannel, ByteBuffer and FileChannel.

And an article I found helpful in producing this example.

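import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.SeekableByteChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/**
 * Method that splits the data provided in fileToSplit into outputDirectory based on the
 * collection of offsets and lengths provided in offsetAndLength.
 * 
 * Example of input offsetAndLength:
 *      Long[][] data = new Long[][]{
 *          {0L, 2670L},
 *          {2670L, 2670L},
 *          {5340L, 2670L},
 *          {8010L, 2670L}
 *      };
 * 
 * Output files will be placed in outputDirectory and named img0, img1... imgN
 * 
 * @param fileToSplit file containing the concatenated images
 * @param outputDirectory existing directory that receives the extracted images
 * @param offsetAndLength one {offset, length} pair (in bytes) per image
 * @throws IOException if reading or writing fails
 */
public static void split( Path fileToSplit, Path outputDirectory, Long[][] offsetAndLength ) throws IOException{

    try (SeekableByteChannel sbc = Files.newByteChannel(fileToSplit, StandardOpenOption.READ )){
        for(int x = 0; x < offsetAndLength.length; x++){

            // Index 0 holds the offset, index 1 holds the length.
            ByteBuffer buffer = ByteBuffer.allocate(offsetAndLength[x][1].intValue());
            sbc.position(offsetAndLength[x][0]);

            // A single read() may return fewer bytes than requested, so keep
            // reading until the buffer is full or the end of the file is hit.
            while(buffer.hasRemaining() && sbc.read(buffer) != -1){
            }

            buffer.flip();
            Path img = outputDirectory.resolve("img"+x);

            // CREATE + TRUNCATE_EXISTING replaces any file left over from an earlier run.
            try(FileChannel output = FileChannel.open(img, StandardOpenOption.CREATE,
                    StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)){
                while(buffer.hasRemaining()){
                    output.write(buffer);
                }
            }
        }
    }

}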
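
Since the split has to be exact, it may be worth checking the values from the XML against the size of the image file before cutting anything. This is only a sketch: the RangeCheck class and isConsistent method are names I made up, and it assumes the offsets and lengths are byte counts that have already been read into arrays.

```java
public class RangeCheck {

    /**
     * Returns true when count matches the number of entries and every
     * [offset, offset + length) range lies inside a file of fileSize bytes.
     */
    public static boolean isConsistent(long[] offsets, long[] lengths, int count, long fileSize) {
        if (offsets.length != count || lengths.length != count) {
            return false;
        }
        for (int i = 0; i < count; i++) {
            if (offsets[i] < 0 || lengths[i] < 0 || offsets[i] + lengths[i] > fileSize) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Values from the question; the last item ends at 8010 + 2670 = 10680.
        long[] offsets = {0, 2670, 5340, 8010};
        long[] lengths = {2670, 2670, 2670, 2670};
        System.out.println(isConsistent(offsets, lengths, 4, 10680)); // prints true
    }
}
```

In real use the fileSize argument could come from Files.size(fileToSplit). If the check fails, the XML and the image file probably do not belong together, or the download was cut short.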

I leave parsing the XML file to you.
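
If it helps as a starting point, here is a minimal sketch using the DOM parser that ships with the JDK. The element names offset and length are assumptions taken from the question, since the real XML layout isn't shown, and the OffsetXmlReader class is hypothetical. Adjust both to match your files.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

public class OffsetXmlReader {

    /**
     * Reads every <offset> and <length> element (assumed names) in document
     * order and pairs them up as {offset, length}.
     */
    public static Long[][] read(InputStream xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(xml);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse the XML", e);
        }
        NodeList offsets = doc.getElementsByTagName("offset");
        NodeList lengths = doc.getElementsByTagName("length");
        if (offsets.getLength() != lengths.getLength()) {
            throw new IllegalStateException("Number of offsets and lengths differ");
        }
        Long[][] result = new Long[offsets.getLength()][2];
        for (int i = 0; i < result.length; i++) {
            result[i][0] = Long.valueOf(offsets.item(i).getTextContent().trim());
            result[i][1] = Long.valueOf(lengths.item(i).getTextContent().trim());
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up XML layout, for illustration only.
        String xml = "<images>"
                + "<image><offset>0</offset><length>2670</length></image>"
                + "<image><offset>2670</offset><length>2670</length></image>"
                + "</images>";
        Long[][] data = read(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        System.out.println(data.length + " entries, second starts at " + data[1][0]);
    }
}
```

The returned array has the {offset, length} shape that split expects, so it can be passed in directly as offsetAndLength. For a file on disk, Files.newInputStream(path) would supply the InputStream.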