I'm working on improving a program to use statically allocated huge pages in Linux rather than standard 4 KB pages. I've already set up a system with static huge page support and reserved a large number of huge pages in the huge page pool (/proc/sys/vm/nr_hugepages). My plan now is to go through the program, find where large malloc()'s take place, and replace each malloc() with a shmget/shmat pair that has the SHM_HUGETLB flag set.
Keep in mind this is a highly concurrent program.
My goal in using shared memory isn't actually to share the memory between processes. I just want to preserve the behavior of the malloc() routine while backing the allocation with statically allocated huge pages.
Below is a sample code segment of what I'm trying to accomplish. Before the program reaches this point it has forked off a number of processes. Each process will be going through this segment of code concurrently. Originally this segment contained just the following code:
if( !( PANEL->WORK = (void *)malloc( (size_t)(lwork) *
                                     sizeof( double ) ) ) )
{
    HPL_pabort( __LINE__, "HPL_pdpanel_init",
                "Memory allocation failed" );
}
That works fine.
Now I've modified that to this:
Note!!: I use a random number as a key so each process will have a random key identifying its independent memory allocation.
int shmid1;
size_t dsize = (size_t)(lwork) * sizeof( double );
/* 2097152 (2 MB) is the huge page size; if the memory request is
   smaller than that, fall back to a standard malloc */
if( dsize < 2097152 ) {
    if( !( PANEL->WORK = (void *)malloc( dsize ) ) )
    {
        HPL_pabort( __LINE__, "HPL_pdpanel_init",
                    "Memory allocation failed" );
    }
}
else {
    /* Get a random number to use as a key */
    int randomData = open( "/dev/random", O_RDONLY );
    int random;
    if( read( randomData, &random, sizeof( random ) ) != sizeof( random ) ) {
        perror( "read" );
        exit( 1 );
    }
    close( randomData );
    /* Round the size up to a multiple of the huge page size;
       shmget with SHM_HUGETLB can fail with EINVAL otherwise */
    dsize = ( dsize + 2097151 ) & ~( (size_t)2097151 );
    /* Get the shared memory segment */
    shmid1 = shmget( random, dsize, SHM_HUGETLB
                     | IPC_CREAT | SHM_R
                     | SHM_W );
    if( shmid1 < 0 ) {
        perror( "shmget" );
        exit( 1 );
    }
    printf( "HugeTLB shmid: 0x%x\n", shmid1 );
    PANEL->WORK = shmat( shmid1, NULL, 0 );
    /* shmat signals failure with (void *)-1, not NULL; an unchecked
       failure here leaves WORK pointing at invalid memory */
    if( PANEL->WORK == (void *)-1 ) {
        perror( "shmat" );
        exit( 1 );
    }
}
With this new implementation, whenever I attempt to read from PANEL->WORK I get a segmentation fault. Before I really start digging into why it segfaults, I can't help but wonder whether I'm approaching this correctly. Is this the best way to do it, and does anyone see a mistake in this methodology?