I'm trying to reverse engineer text compression algorithm and I've stuck in one place for about a month already. In general, here's the code of decoder in C. It works perfectly but I still don't understand how compression scheme works. The problem part is GetNextChararacter function.
I don't understand this iteration bitstream format. It looks like a binary huffman tree serialized in a strange way. So iteration bitstream is like recursion tree, and algorithm searches when current node is traversed through all leafs and outputs leaf count. After that, source bitstream skips forward some bits in iterationBitStream. And if iterBit at current position is set, we traverse the remainder of iteration stream and add result to the previous offset. And so on, until we reach leaf in iteration offset. Total count of reached leafs is the offset in a characters list for decoding character.
So that's the first time I see tree is used for storing just total amount of leafs in it. The compression looks overcomplicated with this scheme. In the same time, I feel, the code does something very basic, but my understanding of the algorithm is just wrong. Can someone tell me if there is another, more logical explanation of that algorithm or maybe it's simply described somewhere else?
Thanks everybody, who was involved. I've figured it out once I've constructed tree and tried to crawl in it by hands. In general, iteration bitstream is a binary tree, serialized in pre-order traversal. Looks like it's hard to recognize tree traversal in assembler code. Source stream is a plain path in that tree. Whole compression scheme is just a Huffman, with individual tree for each character. Next character of text is decoded based on prevous char's tree.