Algorithm Overview
Huffman coding is a statistical technique which attempts to reduce the number of bits required to represent a string of symbols. The algorithm accomplishes this by allowing the lengths of symbol codes to vary. Shorter codes are assigned to the most frequently used symbols, and longer codes to the symbols which appear less frequently in the string (that's where the statistical part comes in). Arithmetic coding is another statistical coding technique.

Building a Huffman Tree
The Huffman code for an alphabet (set of symbols) may be generated by constructing a binary tree with nodes containing the symbols to be encoded and their probabilities of occurrence. The tree may be constructed as follows:

Step 1. Create a parentless node for each symbol. Each node should include the symbol and its probability.
Step 2. Select the two parentless nodes with the lowest probabilities.
Step 3. Create a new node which is the parent of the two lowest probability nodes.
Step 4. Assign the new node a probability equal to the sum of its children's probabilities.
Step 5. Repeat from Step 2 until there is only one parentless node left.
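The steps above can be sketched in Python using a priority queue to find the two lowest-probability parentless nodes. The (probability, tiebreak, node) heap entries and the tuple-based node representation are illustrative assumptions, not part of any particular implementation.

```python
import heapq
import itertools

def build_huffman_tree(probabilities):
    """Build a Huffman tree from a {symbol: probability} mapping.

    A node is either a bare symbol (leaf) or a (left, right) tuple
    (internal node). The tiebreak counter keeps the heap from ever
    comparing two nodes directly.
    """
    counter = itertools.count()
    # Step 1: one parentless node per symbol, tagged with its probability.
    heap = [(p, next(counter), sym) for sym, p in probabilities.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Steps 2-4: pop the two lowest-probability parentless nodes and
        # join them under a new parent whose probability is their sum.
        p1, _, left = heapq.heappop(heap)
        p2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (p1 + p2, next(counter), (left, right)))
    # Step 5: the single remaining parentless node is the root.
    return heap[0][2]
```

With the six-symbol alphabet used in the example below, the tree degenerates into a chain because each combined probability is still smaller than the next symbol's.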
The code for each symbol may be obtained by tracing a path to the symbol from the root of the tree. A 1 is assigned for a branch in one direction and a 0 is assigned for a branch in the other direction. For example a symbol which is reached by branching right twice, then left once may be represented by the pattern '110'. The figure below depicts codes for nodes of a sample tree.
            *
           / \
         (0) (1)
             / \
          (10) (11)
               / \
           (110) (111)
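Tracing the paths can be done with a short recursive walk. The sketch below assumes the tuple-based tree representation from earlier (a leaf is a bare symbol, an internal node a (left, right) pair) and arbitrarily assigns 0 to left branches and 1 to right branches; as noted above, the choice of direction is arbitrary.

```python
def assign_codes(node, prefix=""):
    """Trace a path from the root to each leaf, collecting the bits."""
    if not isinstance(node, tuple):
        # Leaf: the path taken so far is this symbol's code. A lone
        # symbol with no path gets the single-bit code "0".
        return {node: prefix or "0"}
    left, right = node
    codes = {}
    codes.update(assign_codes(left, prefix + "0"))   # 0 for one direction
    codes.update(assign_codes(right, prefix + "1"))  # 1 for the other
    return codes
```

Applied to a tree shaped like the figure above, this yields the codes 0, 10, 110, and 111 for its four leaves.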
Once a Huffman tree is built, Canonical Huffman codes, which require less information to rebuild, may be generated by the following steps:
Step 1. Remember the lengths of the codes resulting from a Huffman tree generated per above.
Step 2. Sort the symbols to be encoded by the lengths of their codes (use symbol value to break ties).
Step 3. Initialize the current code to all zeros and assign code values to symbols from longest to shortest code as follows:
A. If the current code length is greater than the length of the code for the current symbol, right shift off the extra bits.
B. Assign the code to the current symbol.
C. Increment the code value.
D. Get the symbol with the next longest code.
E. Repeat from A until all symbols are assigned codes.
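The canonical assignment (Steps 2 and 3 above) can be sketched as follows. The function name, the {symbol: code length} input, and the string representation of codes are assumptions made for illustration.

```python
def canonical_codes(code_lengths):
    """Assign canonical Huffman codes from a {symbol: code length} mapping."""
    # Step 2: sort symbols by code length, longest first, breaking
    # ties on symbol value.
    ordered = sorted(code_lengths, key=lambda s: (-code_lengths[s], s))
    # Step 3: start from a code of all zeros at the longest length.
    code = 0
    current_len = code_lengths[ordered[0]]
    codes = {}
    for sym in ordered:
        length = code_lengths[sym]
        if current_len > length:
            # A: right shift off the extra bits.
            code >>= current_len - length
            current_len = length
        # B: assign the code to the current symbol.
        codes[sym] = format(code, "0{}b".format(current_len))
        # C: increment the code value.
        code += 1
    return codes
```

Only the code lengths (and the agreed tie-breaking rule) are needed to reproduce these codes, which is why canonical codes require less information to rebuild.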
Example

Given a 6 symbol alphabet with the following symbol probabilities: A = 1, B = 2, C = 4, D = 8, E = 16, F = 32
Step 1. Combine A and B into AB with a probability of 3.
Step 2. Combine AB and C into ABC with a probability of 7.
Step 3. Combine ABC and D into ABCD with a probability of 15.
Step 4. Combine ABCD and E into ABCDE with a probability of 31.
Step 5. Combine ABCDE and F into ABCDEF with a probability of 63.
The following tree results:
            ABCDEF
            /    \
        (0)F    ABCDE
                /    \
           (10)E    ABCD
                    /   \
              (110)D    ABC
                        /  \
                 (1110)C    AB
                           /  \
                   (11110)A    B(11111)
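The codes read off this tree can be used directly for encoding and decoding. The sketch below uses the codes from the example tree; the helper names and the bit-at-a-time decoder are illustrative. Because no code is a prefix of another, the decoder can emit a symbol the moment its buffer matches a code.

```python
# Codes taken from the example tree above.
codes = {'F': '0', 'E': '10', 'D': '110', 'C': '1110',
         'A': '11110', 'B': '11111'}

def encode(text):
    """Concatenate the code for each symbol in the string."""
    return ''.join(codes[sym] for sym in text)

def decode(bits):
    """Read bits until the buffer matches a code, then emit that symbol."""
    inverse = {v: k for k, v in codes.items()}
    out, buf = [], ''
    for bit in bits:
        buf += bit
        if buf in inverse:  # prefix-free: the first match is the symbol
            out.append(inverse[buf])
            buf = ''
    return ''.join(out)
```

Note how the most probable symbol, F, costs a single bit while the rarest symbols, A and B, cost five.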