Hashing vertices of a Graph in C

43 Views Asked by At

Relative beginner in C here so go easy on me! Consider the following structures and types:

#include <stdint.h>

typedef uint32_t u32;
typedef u32 color;
typedef struct Vertice* vertice; // Puntero al Vertice
typedef struct Lado* lado;       // Puntero al Lado
typedef struct GrafoSt* Grafo;

struct GrafoSt {
    u32 n;              // Numero de vertices del grafo
    u32 m;              // Numero de lados del grafo
    u32 delta;          // Max de grados
    u32 sigma;          // Min de grados
    vertice* _vertices; // Arreglo de punteros a Vertice
    lado* _lados;       // Max m*2
};

struct Vertice {
    u32 nombre; // Nombre del vertice
    u32 grado;  // Numero de vecinos
    color color_;
    // Modificar
    u32* Vecinos; // Puntero al indice del primer vecino
};

struct Lado {
    u32 x;
    u32 y;
};

As you can see, _vertices is an array of pointers to Vertice. On initialization, this array is set to size Grafo -> n, the number of vertices in the graph.

I'm dealing with very large graph structures (potentially millions of vertices, and consequently even more edges). Thus, efficiency becomes a serious issue. I've been advised to use hashing to store each pointer to Vertice in the _vertices array. The general idea of the hashing function would be to store a Vertice of nombre = x in (Grapho -> _vertices)[my_hash_function(x)].

I'm new to the concept of hashing and have found a few issues. I found in StackOverflow the following hash function:

u32 Hashv(u32 x) {
    x = ((x >> 16) ^ x) * 0x45d9f3b;
    x = ((x >> 16) ^ x) * 0x45d9f3b;
    x = (x >> 16) ^ x;
    return (x);
}

which reportedly is good. However, the range of this function is u32; in other words, it is not restricted to [0, (Graph -> n) - 1] the possible index values of the _vertices array. My workaround was to use mod; this is, to store each each Vertice with nombre = x in (Graph -> _vertices)[Hashv(x) % Graph -> n]. This generated a whole of collisions, which I resolved using linear probing. However, in very large graphs, the great number of collisions makes the implementation suboptimal.

Is the idea of hashing verteces a good approach indeed? If so, how could it be improved in this implementation?


EDIT: The purpose of working with graphs is to design and test graph coloring algorithms. This makes it important to have an efficient access to the neighbors (Vecinos) of a gien vertex. For example, in a Greedy coloring algorithm, to color a vertex x I'll need to know the color of its neighbors.

The way I'm loading the graphs and initializing the struct is not really important I think so I'll skip that. I'm using Ubuntu Linux.

0

There are 0 best solutions below