Burrows-Wheeler transform


For a project, I need to encode and decode an arbitrary file using the Burrows-Wheeler transform (BWT). The only problem is that I run into trouble when encoding anything other than a .txt file. I really don't know why, so I hope you can help me. Here is the code:

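#include <stdio.h>
#include <stdlib.h>

/* Not shown in the original post; inferred from how the fields are used below. */
typedef struct {
    unsigned char *first;  /* first byte of this rotation */
    unsigned char *last;   /* byte that cyclically precedes `first` */
} caratteri;

/* qsort comparator: orders two entries by the bytes starting at `first`. */
int compare(const void *a, const void *b) {
    caratteri *ca = *(caratteri **) a;
    caratteri *cb = *(caratteri **) b;
    unsigned char *c1 = ca->first;
    unsigned char *c2 = cb->first;

    /* No length bound and no wrap-around: the scan stops only at the first differing byte. */
    while (*c1 == *c2) {
        c1++;
        c2++;
    }
    return (*c1 - *c2);
}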

caratteri **createStruct(unsigned char c[], caratteri car[], caratteri *ptr[], long size) {
    for (long i = 0; i < size; i++) {
        ptr[i] = &car[i];
        car[i].first = &c[i];
        car[i].last = &c[(size - 1 + i) % size];
    }
    return ptr;
}

/* Reads up to `size` bytes from `file`, sorts the entries, and writes each entry's `last` byte to "risultato". */
caratteri **bwt(long size, FILE *file) {
    FILE *risultato;
    unsigned char *c = malloc(sizeof(unsigned char) * size);
    fread(c, sizeof(unsigned char), size, file);
    caratteri *car = malloc(sizeof(caratteri) * size);
    caratteri **pCaratteri = malloc(sizeof(caratteri *) * size);

    pCaratteri = createStruct(c, car, pCaratteri, size);

    /* Each array element is a caratteri *, so that is the element size qsort needs. */
    qsort(pCaratteri, size, sizeof *pCaratteri, compare);

    risultato = fopen("risultato", "wb");
    for (long i = 0; i < size; i++)
        fputc(*pCaratteri[i]->last, risultato);
    fclose(risultato);

    return pCaratteri;
}
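
One thing that may be worth checking for binary input is how two rotations are compared when the data contains repeated or zero bytes. The following is a minimal sketch, not the code from the post: it sorts rotation start offsets with a comparator that wraps around the buffer and gives up after `n` bytes. The names `bwt_encode`, `compare_rotations`, `g_buf` and `g_len` are hypothetical, and the file-scope variables exist only because `qsort` passes no context argument to the comparator.

```c
#include <stdlib.h>

/* Hypothetical file-scope context for the comparator (qsort has no user argument). */
static const unsigned char *g_buf;
static size_t g_len;

/* Compare two rotations given by their start offsets, wrapping at the end
   of the buffer and never reading more than g_len bytes. */
static int compare_rotations(const void *a, const void *b) {
    size_t i = *(const size_t *)a;
    size_t j = *(const size_t *)b;
    for (size_t k = 0; k < g_len; k++) {
        unsigned char x = g_buf[(i + k) % g_len];
        unsigned char y = g_buf[(j + k) % g_len];
        if (x != y)
            return x < y ? -1 : 1;
    }
    return 0; /* identical rotations (periodic input) */
}

/* Writes the last column of the sorted rotations of in[0..n) to out[0..n).
   Returns the row of the unrotated input, or (size_t)-1 if n is 0 or
   allocation fails. */
size_t bwt_encode(const unsigned char *in, size_t n, unsigned char *out) {
    if (n == 0)
        return (size_t)-1;
    size_t *rot = malloc(n * sizeof *rot);
    if (rot == NULL)
        return (size_t)-1;
    for (size_t i = 0; i < n; i++)
        rot[i] = i;

    g_buf = in;
    g_len = n;
    qsort(rot, n, sizeof *rot, compare_rotations);

    size_t primary = 0;
    for (size_t i = 0; i < n; i++) {
        if (rot[i] == 0)
            primary = i;                   /* row holding the original input */
        out[i] = in[(rot[i] + n - 1) % n]; /* byte preceding the rotation start */
    }
    free(rot);
    return primary;
}
```

Called on the six bytes of "banana", this sketch produces "nnbaaa" and returns row 3. Because the length is passed explicitly, zero bytes in the input are treated like any other value. The comparison is quadratic in the worst case, which is the usual price of avoiding suffix arrays.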

The main function is:

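int main() {
    FILE *file;
    file = fopen("thumbnail.jpg", "rb");
    if (file == NULL) {
        printf("Error opening file!");
        exit(2);
    }
    /* Seek to the end (offset 0 from SEEK_END) to measure the file. */
    fseek(file, 0, SEEK_END);
    long size = ftell(file) + 1;  /* one more than the number of bytes in the file */
    rewind(file);
    caratteri **car = bwt(size, file);
    FILE *risultato;
    decryptbwt(risultato);  /* defined elsewhere; not part of the code shown */
    return 0;
}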
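
For comparison, here is a minimal sketch of loading a whole binary file together with its exact byte count, so that the transform can work from an explicit length instead of relying on any terminating byte. The helper name `read_all` is hypothetical, and the sketch assumes a seekable stream opened in binary mode whose size fits in a `long`.

```c
#include <stdio.h>
#include <stdlib.h>

/* Reads an entire seekable binary stream into a malloc'd buffer.
   On success returns the buffer and stores the byte count in *out_len;
   on failure returns NULL. The caller frees the buffer. */
unsigned char *read_all(FILE *f, size_t *out_len) {
    if (fseek(f, 0L, SEEK_END) != 0)
        return NULL;
    long end = ftell(f);            /* offset of end-of-file = size in bytes */
    if (end < 0)
        return NULL;
    rewind(f);

    size_t n = (size_t)end;
    unsigned char *buf = malloc(n > 0 ? n : 1); /* avoid a zero-size request */
    if (buf == NULL)
        return NULL;
    if (fread(buf, 1, n, f) != n) { /* a short read is treated as an error */
        free(buf);
        return NULL;
    }
    *out_len = n;
    return buf;
}
```

Checking the return value of `fread` against the expected count makes it visible when fewer bytes arrive than the buffer was sized for.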

Again, I don't know why, but encoding and decoding work fine with .txt files and fail with other file types. Do you know why? I'm new to this, so please forgive any obvious mistakes, and note that this is not the full code. This is just a simple implementation that deliberately avoids suffix arrays.
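
Since the decoder is not shown, the following is a minimal sketch of one common way to invert the transform for arbitrary bytes. It assumes the encoder also stores the row index of the original input (called `primary` here). That is an assumption, not something the post states about `decryptbwt`, and the function name `bwt_decode` is hypothetical.

```c
#include <stdlib.h>

/* Rebuilds the n original bytes from the last column `last` and the row
   index `primary` of the unrotated input. Returns 0 on success, -1 on
   allocation failure. */
int bwt_decode(const unsigned char *last, size_t n, size_t primary,
               unsigned char *out) {
    if (n == 0)
        return 0;
    size_t count[256] = {0};
    size_t start[256];
    size_t *lf = malloc(n * sizeof *lf);
    if (lf == NULL)
        return -1;

    /* Count each byte value, then compute where its run begins in the
       sorted first column. */
    for (size_t i = 0; i < n; i++)
        count[last[i]]++;
    size_t sum = 0;
    for (int c = 0; c < 256; c++) {
        start[c] = sum;
        sum += count[c];
    }

    /* LF mapping: the k-th occurrence of a byte in the last column
       corresponds to its k-th occurrence in the first column. */
    for (size_t i = 0; i < n; i++)
        lf[i] = start[last[i]]++;

    /* Walk backwards from the original row, filling the output from its end. */
    size_t row = primary;
    for (size_t k = n; k-- > 0;) {
        out[k] = last[row];
        row = lf[row];
    }
    free(lf);
    return 0;
}
```

With the last column "nnbaaa" and `primary` equal to 3, this sketch reconstructs "banana". All 256 byte values are handled uniformly, so no value is reserved as an end marker.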
