In Data-Oriented Design, Richard Fabian gives an example of component-based objects for a game, where the different properties an object can have are split into separate arrays and functionality comes from processing the data in those arrays. This is his rendering example:
struct Orientation { vec pos, up, forward, right; };
SparseArray<Orientation> orientationArray;
SparseArray<Vec> velocityArray;
SparseArray<bool> isVisible;
SparseArray<AssetID> modelArray;
void RenderUpdate() {
    foreach( {index, assetID} in modelArray) {
        if( index in isVisible ) {
            gRenderer.AddModel( assetID, orientationArray[ index ] );
        }
    }
}
What confuses me is how this code is cache friendly. As I understand it, Data-Oriented Design improves performance by keeping the data an operation needs in contiguous memory, so that iterating over it causes fewer cache misses. In this example, however, rendering reads from three different "components" (modelArray, isVisible, and orientationArray), so every iteration touches three separate arrays. On top of that, there may be a cost for looking up an ID whenever components need to read each other's data.
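To make the access pattern I am worried about concrete, here is a rough C++17 sketch of that loop. It is not the book's implementation: `SparseArray` is approximated by a vector of `std::optional` slots, `AssetID` is assumed to be an `int`, `Vec` is assumed to hold three floats, and `DrawCall` and `CollectVisible` are hypothetical names standing in for `gRenderer.AddModel`.

```cpp
#include <cstddef>
#include <optional>
#include <vector>

struct Vec { float x, y, z; };                       // assumed layout
struct Orientation { Vec pos, up, forward, right; };
using AssetID = int;                                 // assumed type

// Hypothetical stand-in for SparseArray<T>: slot i is empty when
// object i does not have the component.
template <typename T>
using SparseArray = std::vector<std::optional<T>>;

struct DrawCall { AssetID asset; Orientation orientation; };

// Collects what would be passed to gRenderer.AddModel.
std::vector<DrawCall> CollectVisible(const SparseArray<AssetID>& modelArray,
                                     const SparseArray<bool>& isVisible,
                                     const SparseArray<Orientation>& orientationArray) {
    std::vector<DrawCall> out;
    for (std::size_t i = 0; i < modelArray.size(); ++i) {
        if (!modelArray[i]) continue;                          // read 1: modelArray
        // "index in isVisible" is a presence test, not a test of the stored bool.
        if (i >= isVisible.size() || !isVisible[i]) continue;  // read 2: isVisible
        if (i >= orientationArray.size() || !orientationArray[i]) continue;
        out.push_back({*modelArray[i], *orientationArray[i]}); // read 3: orientationArray
    }
    return out;
}
```

Each iteration reads slot i from three different arrays, which is the pattern my question is about.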
I am working on a very simple 2D game to practice Data-Oriented Design. If I modeled a basic behavior such as object movement using the pattern above, it would look something like this:
struct Position { int x, y; };
struct Velocity { int x, y; };
SparseArray<Position> positionsArray;
SparseArray<Velocity> velocitiesArray;
Then, in the physics update, I would add the corresponding Velocity from velocitiesArray to each Position in positionsArray.
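A minimal sketch of the update I have in mind, assuming `std::vector` as a stand-in for `SparseArray` and assuming that index i in both arrays refers to the same object (`PhysicsUpdate` is just a name I picked):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Position { int x, y; };
struct Velocity { int x, y; };

// Assumption: both arrays are dense and index-aligned, so element i of
// each array belongs to the same object.
void PhysicsUpdate(std::vector<Position>& positions,
                   const std::vector<Velocity>& velocities) {
    assert(positions.size() == velocities.size());
    for (std::size_t i = 0; i < positions.size(); ++i) {
        // Two separate arrays are walked in lockstep here.
        positions[i].x += velocities[i].x;
        positions[i].y += velocities[i].y;
    }
}
```

Both arrays are traversed linearly, but each iteration still reads from two different regions of memory.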
Wouldn't it be better for the cache to combine the data required for this operation into a single struct:
struct MoveComponent { Position pos; Velocity vel; };
SparseArray<MoveComponent> moveComponents;
so that all the data needed to update positions is stored contiguously?
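For comparison, here is a minimal sketch of the same update over the combined layout, again assuming `std::vector` in place of `SparseArray` (`PhysicsUpdateCombined` is a hypothetical name):

```cpp
#include <vector>

struct Position { int x, y; };
struct Velocity { int x, y; };
struct MoveComponent { Position pos; Velocity vel; };

// Each element carries both the position and the velocity it needs,
// so the loop walks a single array.
void PhysicsUpdateCombined(std::vector<MoveComponent>& moveComponents) {
    for (MoveComponent& m : moveComponents) {
        m.pos.x += m.vel.x;
        m.pos.y += m.vel.y;
    }
}
```

Here an object's position and velocity sit next to each other in memory, which is why I would expect this layout to suit this particular operation better.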