AVR-Huffman

Aus LaborWiki
Wechseln zu: Navigation, Suche
           
AVR-Huffman

Release status: stable [box doku]

Avr-huffman-logo.png
Description decompression of huffman compressed data at runtime
Author(s)  Daniel (Bg)
Last Version  0.1.7 ()
Platform  AVR
License  GPLv3+
Download  SVN browse




Eine deutsche Version dieses Artikes ist verfügbar unte AVR-Huffman

About

AVR-Huffman is an impementation of huffman-compression, consisting of 3 components:

  • tool for compressing binary data (huffman-encode; host; written in portable C)
  • tool for uncompressind data (huffman-decode; host; written in portable C)
  • implementation of decompression for AVR microcontrollers (available in optimized assembler and C)

Facts

  • 486 bytes of code for the assembler variant
  • C-Interface (avr-gcc)

API

typedef ... huffman_dec_ctx_t;
void huffman_dec_init(huffman_dec_ctx_t* ctx, uint16_t(*rb_func)(uint16_t));
void huffman_dec_set_addr(huffman_dec_ctx_t* ctx,uint16_t addr);
uint16_t huffman_dec_byte(huffman_dec_ctx_t* ctx);

huffman_dec_ctx_t

This context type defines the storage structure used by an instance of the huffman decompressor. This means that a variable of this type may hold the information needed for decompressing (for example: the tree, buffers for bitwise reading, pointer to the reading-function).

huffman_dec_init

This function initializes a variable of the type huffman_dec_ctx_t which is located by a pointer to this function. The second argument is a pointer to the function which should be called to read compressed data.

huffman_dec_set_addr

This function can be used to set the address which is given as parameter to the reading function. The address is sored in the context and will be incremented by one automatically.

huffman_dec_byte

This function initializes the huffman-tree when first called with a new context. Every call to this function (including the first one) will return a byte of uncompressed data or EOF at the end of the compressed file. The function automatically calls the reading function if compressed data is needed.

reading-function

A pointer to the reading-function is used to initialize a context. The function takes a parameter which holds the address to be read. The address is always incremented by one, so it is save to ignore the parameter and use an internal address variable. The function also may return an EOF (0xffff), but this is not needed as the data stream includes an EOF-marker. The function may read data from any location including streams.

dynamic memory requirements

When called the first time avr-huffman-decode allocates dynamic memory for the internal huffman-tree. The memory is allocated via the calloc() function and later freed automatically when reaching the END-OF-FILE marker via the free() function. The required amount of dynamic memory only depends on the number of different symbols (bytes) in the uncompressed file. For each different symbol three bytes are needed. So in the worst-case (all 256 different byte values are present) 768 bytes of dynamic memory are required.

Concept

Fileformat

Note: The used format does not store the tree in an optimal way. This is justified by the reduced complexity of the decoder. The overhead should be in the range of a few bytes.

The file starts with a 15 bit magic-value of 0xC0 0xDE. The last bit may be 1 or 0 so the first two bytes may read as 0xC0 0xDE or 0xC0 0xDF. The next nine bits represent the number of leafs in the tree. The last leaf is always used for the EOF-marker.

Following the number of leafs there is repetitive structure starting always with a byte indicating the number of leaf at the current level followed by the leafs values.

The tree structure is followed by the binary representation of the compressed data. The highest bit in a byte is considered the first bit.

Example

Plaintext:

abcaaab\n

expressed as hexdump:

00000000  61 62 63 61 61 61 62 0a                           |abcaaab.|
00000008

compressed it would read like this:

00000000  c0 de 05 01 61 00 04 63  0a 62 ff 68 35 e0        |....a..c.b.h5.|
0000000e

The tree used in this example could be pictured this way: Huffman graph.png