I came across a JPEG parsing function in Node.js that I'm attempting to adapt for use in a browser environment. The original code can be found here.

The original code uses Node.js's Buffer class. Since I would like to use it in a browser environment, I have to use DataView.getUint16(0, false /* big endian */) instead of buffer.readUInt16BE(0) /* BE = big endian */.

Interestingly, DataView is also available in Node.js, so the result could work in both environments.
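As a minimal sketch of the replacement described above, the snippet below reads big-endian 16-bit values through a DataView built on a plain Uint8Array. The sample bytes are made up for illustration. Since a Node.js Buffer is a Uint8Array subclass, the same construction should work for both inputs.

```javascript
// Four illustrative bytes: a JPEG start-of-image marker followed by an APP0 marker.
const bytes = new Uint8Array([0xff, 0xd8, 0xff, 0xe0]);

// Pass byteOffset and byteLength so the view covers exactly these bytes,
// even when `bytes` is only a window into a larger ArrayBuffer.
const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);

const firstWord = view.getUint16(0, false);  // false = big endian
const secondWord = view.getUint16(2, false);

console.log(firstWord.toString(16), secondWord.toString(16)); // ffd8 ffe0
```

In Node.js, buffer.readUInt16BE(0) on a Buffer holding the same bytes returns the same value as view.getUint16(0, false).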

Here is what I have found so far:

  • Introducing a variable j starting at 4 gives the correct offset for the first iteration, since the first 4 bytes of the buffer are sliced off:
  let j=4 // match the buffer slicing above
  • Adding 2 to j before the next read does not give the correct offset for the next iteration, even though the buffer appears to be sliced by exactly two more bytes:
    j+=2; // match the buffer slicing below ( i + 2 )
    buffer = buffer.slice(i + 2); // Buffer is sliced of two bytes, 0 offset is now 2 bytes further ?
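The two observations above can be reproduced on a small, made-up byte array. This is only a sketch: the 12 bytes are hypothetical, and Uint8Array.prototype.subarray stands in for Buffer.prototype.slice, which also returns a view on the same memory (Uint8Array.prototype.slice, by contrast, copies). It suggests that the slice moves the start of the buffer by i + 2 bytes, not by 2, so an absolute offset has to advance by the same amount.

```javascript
// Hypothetical input: 4 signature bytes, one segment whose length field is 4,
// then the next marker and its length field.
const file = new Uint8Array([
  0xff, 0xd8, 0xff, 0xe0, // signature (skipped)
  0x00, 0x04, 0xaa, 0xbb, // length = 4, followed by two payload bytes
  0xff, 0xdb, 0x00, 0x10, // next marker, next length = 16
]);

// The DataView is built on the underlying ArrayBuffer, so it still spans
// all 12 bytes no matter how often the typed array is re-sliced.
const whole = new DataView(file.buffer);
let sliced = file.subarray(4); // shares memory with `file`

let j = 4;                           // absolute offset of the first length field
const i = whole.getUint16(j, false); // 4
sliced = sliced.subarray(i + 2);     // the start moves forward by i + 2 = 6 bytes
j += i + 2;                          // so the absolute offset moves by 6 as well

console.log(sliced.byteOffset, j);      // 10 10
console.log(whole.getUint16(j, false)); // 16
```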

Here is the function with logging added

function calculate (buffer) {

  // Skip 4 chars, they are for signature
  buffer = buffer.slice(4);
  let j=4 // match the buffer slicing above
  let aDataView=new DataView(buffer.buffer);
  var i, next;
  while (buffer.length) {
    // read length of the next block
    i = buffer.readUInt16BE(0);
    console.log("i="+i,"read="+aDataView.getUint16(j,false));
    j+=2; // match the buffer slicing below ( i + 2 )
    // ensure correct format
    validateBuffer(buffer, i);

    // 0xFFC0 is baseline standard(SOF)
    // 0xFFC1 is baseline optimized(SOF)
    // 0xFFC2 is progressive(SOF2)
    next = buffer[i + 1];
    if (next === 0xC0 || next === 0xC1 || next === 0xC2) {
      return extractSize(buffer, i + 5);
    }

    // move to the next block
    buffer = buffer.slice(i + 2);
  }

  throw new TypeError('Invalid JPG, no size found');
}

Actual result on this image:

node .\start.js 
i=16 read=16 # Seems to be the correct offset
i=91 read=19014 # Wrong offset
i=132 read=18758

My debugging steps so far:

Installed buffer-image-size from npm:

npm install buffer-image-size --save

Wrote start.js as follows:

var sizeOf = require('buffer-image-size');
const fs = require('fs');

// Read the whole file into a Buffer and pass it to the library
const fileBuffer = fs.readFileSync("flowers.jpg");
var dimensions = sizeOf(fileBuffer);
console.log(dimensions.width, dimensions.height);

Edited "node_modules\buffer-image-size\lib\types\jpg.js", adding the lines mentioned above and the logging.

Do you have any hints about:

  • Why adding 2 to j does not help to get the correct offset?
  • How to get the same algorithm without slicing the buffer over and over?

I appreciate any insights or guidance on resolving this issue. Thank you!

Bergi On BEST ANSWER

Yeah, avoid both advancing offsets and re-slicing the buffer; it only gets confusing. I would write:

function calculate(typedArray) {
  const view = new DataView(typedArray.buffer, typedArray.byteOffset, typedArray.byteLength);
  let i = 0;
  // Skip 4 chars, they are for signature
  i += 4;

  while (i < view.byteLength) {
    // read length of the next block
    const blockLen = view.getUint16(i, false /* big endian */);

    // ensure correct format
    // index should be within buffer limits
    if (i + blockLen > view.byteLength) {
      throw new TypeError('Corrupt JPG, exceeded buffer limits');
    }
    // Every JPEG block must begin with a 0xFF
    if (view.getUint8(i + blockLen) !== 0xFF) {
      throw new TypeError('Invalid JPG, marker table corrupted');
    }

    // 0xFFC0 is baseline standard(SOF)
    // 0xFFC1 is baseline optimized(SOF)
    // 0xFFC2 is progressive(SOF2)
    const next = view.getUint8(i + blockLen + 1);
    if (next === 0xC0 || next === 0xC1 || next === 0xC2) {
      return extractSize(view, i + blockLen + 5);
    }

    // move to the next block
    i += blockLen + 2;
  }

  throw new TypeError('Invalid JPG, no size found');
}

Notice that this code, which is a straightforward translation of the source, is slightly confusing and buggy:

  • i does not point to the start of the segment, but rather two bytes into the segment (after the marker)
  • the code skips the first dynamic segment (which, admittedly, is required to be an APP0 segment anyway)
  • the code assumes all segments to have a variable length specified in their header, and ignores standalone markers as well as fill bytes
  • the code may cause RangeError exceptions from accessing bytes beyond the end of the buffer, as it only checks that the block just read is within limits
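One way to address the points above is sketched below. It is not the library's implementation: the function name readJpegSize and the sample bytes are made up for illustration, and the marker handling (fill bytes, standalone TEM and RSTn markers, stopping at SOS or EOI) reflects my reading of the JPEG marker layout. Here pos always points at the 0xFF that starts a marker, and every read is bounds-checked first.

```javascript
// Sketch only: walks JPEG segments by absolute offset and returns the frame size.
function readJpegSize(bytes) {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  if (view.byteLength < 2 || view.getUint16(0, false) !== 0xffd8) {
    throw new TypeError('Invalid JPG, missing SOI marker');
  }
  let pos = 2; // always points at the 0xFF that starts a marker
  while (pos + 1 < view.byteLength) {
    if (view.getUint8(pos) !== 0xff) {
      throw new TypeError('Invalid JPG, marker table corrupted');
    }
    const marker = view.getUint8(pos + 1);
    // A second 0xFF is a fill byte: step over one byte and retry.
    if (marker === 0xff) { pos += 1; continue; }
    // Standalone markers (TEM, RST0-RST7) carry no length field.
    if (marker === 0x01 || (marker >= 0xd0 && marker <= 0xd7)) { pos += 2; continue; }
    // EOI or start of scan: no frame header was found before the image data.
    if (marker === 0xd9 || marker === 0xda) break;
    if (pos + 4 > view.byteLength) break; // no room for a length field
    const len = view.getUint16(pos + 2, false); // counts its own two bytes
    if (len < 2 || pos + 2 + len > view.byteLength) {
      throw new TypeError('Corrupt JPG, exceeded buffer limits');
    }
    // SOF0, SOF1, SOF2: precision (1 byte), height (2 bytes), width (2 bytes).
    if (marker === 0xc0 || marker === 0xc1 || marker === 0xc2) {
      if (len < 7) throw new TypeError('Corrupt JPG, frame header too short');
      return {
        height: view.getUint16(pos + 5, false),
        width: view.getUint16(pos + 7, false),
      };
    }
    pos += 2 + len; // marker (2 bytes) plus the segment it announces
  }
  throw new TypeError('Invalid JPG, no size found');
}

// Hypothetical minimal input: SOI, a 4-byte APP0 segment, then an SOF0 header.
const sample = new Uint8Array([
  0xff, 0xd8,
  0xff, 0xe0, 0x00, 0x04, 0x00, 0x00,
  0xff, 0xc0, 0x00, 0x0b, 0x08, 0x00, 0x10, 0x00, 0x20, 0x01, 0x01, 0x11, 0x00,
]);
const size = readJpegSize(sample);
console.log(size.width, size.height); // 32 16
```

Because the function only relies on DataView and a typed array, the same input handling should apply to a Node.js Buffer and to a Uint8Array obtained in a browser.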