Read from file fast

214 Views Asked by At

I have a txt file that contains 2 graphs and the number of vertices in the following format:

6
0 1 0 1 0 0
1 0 1 0 0 1
0 1 0 1 0 0
1 0 1 0 1 0
0 0 0 1 0 1
0 1 0 0 1 0
0 1 0 0 1 0
1 0 1 0 0 0
0 1 0 1 0 1
0 0 1 0 1 0
1 0 0 1 0 1
0 0 1 0 1 0

The matrices represent vertice adjacency. If two vertices are adjacent, their pair gets 1. Although the graphs are not separated visually, the second graph starts after the 6th row of the first. Each graph can have a lot of vertices, like 5000 and they are both of the same size (the graphs). I wrote an algorithm that checks if the two graphs are isomorphic and i noticed that reading the graphs takes 8 seconds and the actual algorithm takes 2.5 (for 5000 vertices). Since my goal is to optimize the overall speed of my program, I want to know if i can improve (in terms of speed) my current code of reading from file:

FILE* file = fopen ("input.txt", "r");
fscanf (file, "%d", &i);
int n = i;
while (!feof (file))
{  
    fscanf (file, "%d", &i); 
    if (j < (n*n)) {  // first graph
        if (i==1) {
            adj_1[j/n][v_rank_1[j/n]] = j - (j/n)*n; // add the vertice to the adjacents of the current vertice
            v_rank_1[j/n] += 1;
        }
    }
    else if (j>=(n*n)) {   // second graph
        if (i==1) {
            adj_2[(j-(n*n))/n][v_rank_2[(j-(n*n))/n]] = (j-(n*n)) - ((j-(n*n))/n)*n; // add the vertice to the adjacents of the current vertice
            v_rank_2[(j-(n*n))/n] += 1;
        }
    }
    j++;
}
fclose (file);

The adj_* table holds the indexes of the adjacent vertices of a vertice

The v_rank_* table holds the number of vertices adjacent to a vertice

It is important that I acquire this and only this information from the graph.

3

There are 3 best solutions below

2
On BEST ANSWER

The first optimization is to read the whole file in memory in one shot. Accessing memory in the loops will be faster than calling fread.

The second optimization is to do less arythmetic operations, even if it means more code.

Third optimization is treating the data from file as characters to avoid integer conversion.

The result could be:

// bulk read file into memory
fseek(file, 0, SEEK_END);
long fsize = ftell(file);
fseek(file, 0, SEEK_SET);
char *memFile = malloc(fsize + 1);
if (memFile == NULL) return; // not enough memory !! Handle it as you wish
fscanf(file, "%d", &n);
fread(memFile, fsize, 1, file);
fclose(file);
memfile[fsize] = 0;

// more code but less arythmetic operations
int lig, col;
char *mem = memFile, c;
for (int lig = 0; lig < n; lig++) { // first graph
    for (int col = 0; col < n; col++) {
        for (;;)
        {
            c = *mem;
            if (c == 0) break;
            mem++;
            if (c == '1') {
                adj_1[lig][v_rank_1[lig]++] = col; // add the vertice to the adjacents of the current vertice
                k++; // ??
                break;
            }
            if (c == '0') break;
        }

    }
}
for (int lig = 0; lig < n; lig++) { // second graph
    for (int col = 0; col < n; col++) {
            c = *mem;
            if (c == 0) break;
            mem++;
            if (c == '1') {
                adj_2[(lig][v_rank_2[lig]++] = col; // add the vertice to the adjacents of the current vertice
                l++;  // ??
                break;
            }
            if (c == '0') break;
        }
    }
}
free(memFile);

Remarks: you said nothing about variables k and l.

0
On

You could speed it up by accessing the file system less often. You are reading one integer at a time from the file thus accessing the file every time through the loop.

Instead, try reading the whole file or a large chunk of the file at once. (This is called block reading). You can buffer it into an array. Inside your loop, read from the memory buffer instead of the file. Refresh your memory buffer as needed inside the loop if you don't read in the entire file.

0
On

Use fgets() to read a line at a time into a line buffer. Parse the line buffer into integer values.

This function reduces the number of times you read from the file, because behind the scenes, fgets() reads a large chunk of data from the file and returns a line at a time. It only attempts to read another chunk when there are no more lines left in its internal buffer.