Is representing a large matrix as a 1D array more efficient than a 2D array?

2.7k Views Asked by At

Context

I am implementing a seam carving algorithm.

I am representing the pixels in a picture as a 1D array

private int[] picture;

Each int represents the RGB of the pixel.

To access the pixels I use helper methods such as:

private int pixelToIndex(int x, int y) {return (y * width()) + x;}

The alternative would be to store in a 2D array:

private int[][] picture;

The seam carving algorithm has two parts.

Firstly, it does some image processing where it finds the horizontal or vertical connected seam with lowest energy. Here the pixel accesses jump around a bit across rows.

Secondly it removes this connected seam.

For vertical seams I mark the pixel to be removed with -1 and create a new picture array skipping the removed pixels like so:

int i = 0, j = 0;
while (i < temp.length) {
    if (picture[j] != -1) {
        temp[i++] = picture[j];
    }
    j++;
}
picture = temp;

For horizontal seams, given a specific column I shift all the pixels after the deleted pixel of that column up by one row as so:

for (int i = 0; i < temp.length; i++) {
    int row = indexToY(i);
    int col = indexToX(i);
    int deletedCell = seam[col];

    if (row >= deletedCell) temp[i] = picture[i + width()];
    else temp[i] = picture[i];
}
picture = temp;

The question

Obviously the 1D array uses less physical memory because of the overhead for each subarray but given the way I am iterating the matrix would the 2D array be more effectively cached by the CPU and thus more efficient?

How would the arrays differ in the way they would be loaded into the CPU cache and RAM? Would part of the 1D array go into the L1-cache? How would the 1D and 2D array be loaded into memory? Would it be dependent on size of the array?

2

There are 2 best solutions below

2
GhostCat On

An array of ints is just represented just as that: an array of int values. An array of arrays ... adds certain overhead. So, short answer: when dealing with really large amounts of data; plain 1-dimensional arrays are your friend.

On the other hand: only start optimizing after understanding the bottlenecks. You know, it doesn't help much to optimize your in-memory-datastructure ... when your application spends most of its time waiting for IO for example. And if your attempts to write "high performance" code yield "complicated, hard to read, thus hard to maintain" code ... you might have focused on the wrong area.

Besides: concrete performance numbers are affected by many different variables. So you want to do profiling first; and see what happens with different hardware, different data sets, and so on.

And another side note: sometimes, for the real number crunching; it can also be a viable option to implement something in C++ can make calls via JNI. It really depends on the nature of your problem; how often things will be used; response times expected by users; and so on.

1
user987339 On

Java has arrays of arrays for multi-dimensional arrays. In your case int[][] is an array of int[] (and of course int[] is an array of int). So, matrix is represented as a set of rows and pointers for each row. In this case it means that NxM matrix is occupying NxM for data and an array of pointers.

Since you can represent any matrix as an array you'll get less memory consumption storing it that way.

On the other hand address manipulation in case representing a 2D matrix as an arary is not that complex.

If we assume that you have a matrix that is NxM accessing and an Array with size NxM representing this matrix, yo can access element of Matrix[x,y] as Array[x*n+y].

Array[i] is compact and it has a high probability of being in L1 cache, or even in register cache.

Matrix[x,y] requires one memory read and addition Array[x*n+y] requires one multiplication and one addition.

So, I'll put my two cents on Array, but anyway it has to be tested (don't forget to wait for warming time for JIT compiler)