Linking graphs of structures by pointer vs. index in array

1.6k Views Asked by At

This is question about good practice

Consider situation which is typical e.g. in 3D engines, physics engines, Finite element method or classical molecular dynamics solvers: You have objects of various types ( e.g. vertexes, edges, faces, bounded solid volumes ) which are cross-linked to each other (e.g. vertex know which edge are connected to it and vice versa). For performance and convenience of usage of such engine is crucial to be able quickly browse the network of such connections.

The question is: Is it better to point to the linked object by index in array, or by pointer ? ... especially performance-wise

typedef index_t uint16_t;

class Vertex{
    Vec3 pos;
    #ifdef BY_POINTER
    Edge*   edges[nMaxEdgesPerVertex];
    Face*   faces[nMaxFacesPerVertex];
    #else
    index_t edges[nMaxEdgesPerVertex];
    index_t faces[nMaxFacesPerVertex];
    #endif
}

class Edge{
    Vec3   direction;
    double length;
    #ifdef BY_POINTER
    Vertex* verts[2];
    Faces*  faces[nMaxFacesPerEdge];  
    #else
    index_t verts[2];
    index_t faces[nMaxFacesPerEdge];
    #endif
}

class Face{
    Vec3   normal;
    double isoVal; // Plane equation: normal.dot(test_point)==isoVal
    #ifdef BY_POINTER
    Vertex* verts[nMaxVertsPerFace];
    Edge*   edges[nMaxEdgesPerFace];
    #else
    index_t verts[nMaxVertsPerFace];
    index_t edges[nMaxEdgesPerFace];
    #endif
}

#ifndef BY_POINTER
// we can use other datastructure here, such as std:vector or even some HashMap 
int nVerts,nEdges,nFaces;
Vertex verts[nMaxVerts];
Edge   edges[nMaxEdges];
Vertex faces[nMaxFaces];
#endif 

Advantages of index:

  • using index can be more memory efficient when we use uint8_t or uint16_t for index instead of 32-bit or 64-bit pointer
  • index can carry some additional information ( e.g. about orientation of edge ) encoded in some bits;
  • the ordering of objects in array can carry some information about structure ( e.g. vertexes of cube could be ordered as {0b000,0b001,0b010,0b011,0b100,0b101,0b110,0b111} ). This information is not visible in pointers

Advantages of pointers:

  • We don't need to care about the arrays (or other data-structures) to store the objects. The objects can be simply allocated dynamically on the heap by new Vertex().
  • May be faster (?) because it does not need to add the base address of the array (?). But this is probably negligible with respect to memory latency (?)
2

There are 2 best solutions below

1
Heisenbug On BEST ANSWER

using index can be more memory efficient when we use uint8_t or uint16_t for index instead of 32-bit or 64-bit pointer

True. Having a small representation reduce the total size of the structure, reducing cache miss when traversing it.

index can carry some additional information ( e.g. about orientation of edge ) encoded in some bits;

True.

We don't need to care about the arrays (or other data-structures) to store the objects. The objects can be simply allocated dynamically on the heap by new Vertex().

This is exactly what you don't want to do, speaking of performances. You want to be sure Vertex are all packed, to avoid unnecessary cache missing. In this case the array would save you from that wrong temptation. You also want to access them sequentially, at least as much as possible, again to minimize cache miss.

How much you data structure are packed, small and accessed sequentially, is what actually drive performances.

May be faster (?) because it does not need to add the base address of the array (?). But this is probably negligible with respect to memory latency (?)

Possibly negligible. Probably depends on specific hardware and|or compiler.

Another missing advantage about index: easier to manage when you reallocate. Consider a structure that can grow, like the following:

struct VertexList
{
  std::vector<Vertex> vertices;

  Vertex *start; // you can still access using vector if you prefer; start = &vertices[0];
}

If you are referencing a given vertex using pointers, and a reallocation occurs, you will end up with an invalid pointer.

4
John Zwinck On

For performance, what matters is the speed with which you can read the "next" element in whatever traversal order(s) are commonly done in the hot path.

For example, if you have a series of edges which represent some path, you would want them to be stored contiguously in memory (not using new for each one), in the order in which they are connected.

For this case (edges forming a path), it's clear that you do not need pointers, and you also do not need indexes. The connections are implied by the storage locations, so you just need a pointer to the first and perhaps the last edges (i.e. you can store the whole path in a std::vector<Edge>).

A second example illustrating domain knowledge that we can exploit: imagine we have a game supporting up to 8 players and want to store "who has visited each of the edges in a path." Again we do not need pointers nor indexes to refer to the 8 players. Instead, we can simply store a uint8_t inside each Edge and use the bits as flags for each player. Yes, this is low-level bit banging, but it gives us compact storage and efficient lookup once we have an Edge*. But if we need to do the lookup in the other direction, from players to Edges, the most efficient will be to store e.g. a vector of uint32_t inside each player and do indexing into the Edge array.

But what if edges can be added and removed in the middle of a path? Well then we might want a linked list. In this case we should use an intrusive linked list, and allocate the Edges in a pool. Having done this, we can store pointers to the Edges in each player object, and they will never change or need to be updated. We use an intrusive linked list with the understanding that an Edge is only ever part of a single path, so extrinsic storage of the linked-list pointers would be wasteful (std::list needs to store pointers to each object; an intrusive list does not).

So, each case must be considered individually, with as much knowledge of the domain as we can discover in advance. Neither pointers and indexing should be the first approach.