How to compress or do size optimization for large static sparse array in C

206 Views Asked by At
/* The following codes are compiled into library(test.a) */
typedef struct {
    short x[5];
} weight_t;

typedef struct {
    weight_t wgt;
    char *ptr;
} myarray_t;

/* huge static array and 70% of entries are empty/zero */
static myarray_t array1[] ={
{ { 0x0102, 0x0102, 0x0102, 0x0102, 0x0102, }, 0 }, /* 0000 */
{ { 0x0103, 0x0103, 0x0103, 0x0103, 0x0103, }, 0 }, /* 0001 */
{ { 0x0104, 0x0104, 0x0104, 0x0104, 0x0104, }, 0 }, /* 0002 */
{ { 0x0105, 0x0105, 0x0105, 0x0105, 0x0105, }, 0 }, /* 0003 */

{ { 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, }, 0 }, /* 3134c */
{ { 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, }, 0 }, /* 3134d */
......
......
.....

} /* more than one millon entries */

Since array1 contains more than one million entries(even though 70% of them are empty/zero entries), the size of library file(test.a) is quite big.

I know using hash table to only include non-empty entries can reduce the size of array1 but is that any way to compress or do size optimization for library file(test.a) without changing array1?

Since array1 has been referred by other libraries, revamping array1 to use other implementation might not be a solution for my current situation.

This is a followup to compress or size optimization for large sparse array in C which is basically asking the same thing, for file-size saving without changing the in-memory data structure. This one is tagged , the other one isn't. (Editor's note: I don't think that makes them different enough to be non-duplicates; closing this unless we get clarification of what the difference in this question was supposed to be.)

1

There are 1 best solutions below

0
On

One obvious size optimization is to get rid of struct padding. You could have padding between the 2x5 byte array and the pointer, and/or padding at the end of the struct. Likely you are either using 2x5 + 4 = 14 bytes or 2x5 + 8 = 18 bytes of actual data. Neither sits well with 4 or 8 byte alignent.

Therefore drop the struct:

static short array1 [5*n] = { ... };
static char* ptr [n] = { ... };

This will remove all padding present in your current, inefficient struct. You'll save some 2-6 bytes per item, meaning some 2-6 Mb waste of space for a million items.

Example for gcc x86_64 Linux:

#include <stdio.h>

#define n 1000000

int main()
{
    typedef struct {
        short x[5];
    } weight_t;

    typedef struct {
        weight_t wgt;
        char *ptr;
    } myarray_t;

    static myarray_t array1[n];
    printf("%zu\n", sizeof array1);

    static short array2 [5*n];
    static char* ptr [n];
    printf("%zu\n", sizeof array2 + sizeof ptr);

    return 0;
}

Output:

24000000
18000000

The change saved 6M bytes on this specific system. If you are stuck with the current struct for some backwards compatibility reasons, consider using some wrapper macros to emulate struct-like access while using raw arrays.