Huffman Encoding: How to find the path?

128 Views Asked by At

I have a tree with the most frequent letters at the top of the tree and I am trying to find the path to a node so I can turn a message into binary. In this program if I go left, I add 0 to path and, if I go right, I add 1 to the path until I find the node. But I have to go straight to the desired node which is not possible. The only thing I could think of is removing the last character or path if a node has no children, but it does not work if a node has grandchildren. Can someone help me on how to approach this? Thanks!

    // global variables
    String path;
    int mark;

    // encodes a message to binary
    String encoder(char data) {
        path = "";
        mark = 0;
        findPath(root, data);
        return path;
    }

    // finds the path to a node
    void findPath(TNode node, char data) {
        if(node.data == data) {
            mark = 1;
            return;
        }
        if(mark==0 && node.left != null) {
            path += 0;
            findPath(node.left, data);
        }
        if(mark==0 && node.right != null) {
            path += 1;
            findPath(node.right, data);
        }
        if(mark==0 && node.left == null || node.right == null) {
            path = path.substring(0, path.length() - 1);
        }
    }
2

There are 2 best solutions below

0
gnasher729 On

This is not how Huffman coding works. Not at all. You assign a code to every symbol, and symbols can have different lengths.

An algorithm to determine the code: Take two symbols with the lowest frequency, say an and b. Combine them into one symbol x with higher frequency obviously, and when we are finished the code for a is the code for x plus a zero, and the code for b is the code for x plus a one. Then you look again for the two symbols with lowest frequency and combine them and so on. When you are down to two symbols you give them codes 0 and 1 and find all the other codes for symbols.

Example: a, b, c, d with frequencies 3, 47, 2 and 109. Combine a and c to x with frequency 5. Combine x and b to y with frequency 52. Then a = code 0, y = code 1. x = code 10 and b = code 11. Then a = code 100 and c = code 101.

0
Mark Adler On

You would not encode messages using the tree. You should instead traverse the entire tree once recursively, generating all of the codes for all of the symbols, and make a table of the symbols and their associated codes. Then you use that table to encode your messages.