Context
I am implementing a seam carving algorithm.
I am representing the pixels in a picture as a 1D array
private int[] picture;
Each int represents the RGB of the pixel.
To access the pixels I use helper methods such as:
private int pixelToIndex(int x, int y) {return (y * width()) + x;}
The alternative would be to store in a 2D array:
private int[][] picture;
The seam carving algorithm has two parts.
Firstly, it does some image processing where it finds the horizontal or vertical connected seam with lowest energy. Here the pixel accesses jump around a bit across rows.
Secondly it removes this connected seam.
For vertical seams I mark the pixel to be removed with -1 and create a new picture array skipping the removed pixels like so:
int i = 0, j = 0;
while (i < temp.length) {
if (picture[j] != -1) {
temp[i++] = picture[j];
}
j++;
}
picture = temp;
For horizontal seams, given a specific column I shift all the pixels after the deleted pixel of that column up by one row as so:
for (int i = 0; i < temp.length; i++) {
int row = indexToY(i);
int col = indexToX(i);
int deletedCell = seam[col];
if (row >= deletedCell) temp[i] = picture[i + width()];
else temp[i] = picture[i];
}
picture = temp;
The question
Obviously the 1D array uses less physical memory because of the overhead for each subarray but given the way I am iterating the matrix would the 2D array be more effectively cached by the CPU and thus more efficient?
How would the arrays differ in the way they would be loaded into the CPU cache and RAM? Would part of the 1D array go into the L1-cache? How would the 1D and 2D array be loaded into memory? Would it be dependent on size of the array?
An array of ints is just represented just as that: an array of int values. An array of arrays ... adds certain overhead. So, short answer: when dealing with really large amounts of data; plain 1-dimensional arrays are your friend.
On the other hand: only start optimizing after understanding the bottlenecks. You know, it doesn't help much to optimize your in-memory-datastructure ... when your application spends most of its time waiting for IO for example. And if your attempts to write "high performance" code yield "complicated, hard to read, thus hard to maintain" code ... you might have focused on the wrong area.
Besides: concrete performance numbers are affected by many different variables. So you want to do profiling first; and see what happens with different hardware, different data sets, and so on.
And another side note: sometimes, for the real number crunching; it can also be a viable option to implement something in C++ can make calls via JNI. It really depends on the nature of your problem; how often things will be used; response times expected by users; and so on.