Existence of Code
A source code $C$ for a random variable $X$ is a mapping from $\mathcal{X}$ to $\mathcal{D}^*$, the set of finite-length strings of symbols from a $D$-ary alphabet
- $\mathcal{X}$: the range of $X$
- $D$-ary alphabet: $\mathcal{D} = \{0, 1, \dots, D-1\}$
- $C(x)$: codeword for $x$
- $l(x)$: length of $C(x)$
- $L(C) = \sum_{x \in \mathcal{X}} p(x) l(x)$: the expected length of a source code $C$ for a random variable $X$ with pmf $p(x)$
- rate: average codeword length per source symbol
Nonsingular code
- every element of the range of $X$ maps into a different string in $\mathcal{D}^*$
- $C^*$: the extension of a code $C$ is the mapping from finite-length strings of $\mathcal{X}$ to finite-length strings of $\mathcal{D}$, $C^*(x_1 x_2 \cdots x_n) = C(x_1) C(x_2) \cdots C(x_n)$
- Uniquely decodable: the extension $C^*$ is nonsingular
Prefix Code (instantaneous code)
- no codeword is a prefix of any other codeword
- Tree representation
- Kraft Inequality (1949): For any instantaneous code over an alphabet of size $D$, the codeword lengths $l_1, l_2, \dots, l_m$ must satisfy the inequality $\sum_i D^{-l_i} \le 1$
- Conversely, given codeword lengths satisfying the above inequality, a prefix code with these lengths exists.
- Interval representation: codeword $y_1 y_2 \cdots y_l$ corresponds to the interval $[0.y_1 y_2 \cdots y_l,\, 0.y_1 y_2 \cdots y_l + D^{-l})$; prefix-free codewords correspond to disjoint intervals in $[0, 1]$
- Extended Kraft Inequality: For any countably infinite set of codewords forming a prefix code, $\sum_{i=1}^{\infty} D^{-l_i} \le 1$; conversely, such lengths can be realized by a prefix code
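The Kraft sum and the converse construction can be sketched in Python (the function names here are mine, for illustration); the greedy assignment below is exactly the interval picture, handing out codewords left to right in the binary code tree:

```python
def kraft_sum(lengths, D=2):
    """Sum of D^{-l_i}; an instantaneous code with these lengths exists iff <= 1."""
    return sum(D ** -l for l in lengths)

def prefix_code_from_lengths(lengths):
    """Build a binary prefix code from lengths satisfying the Kraft inequality.

    Sort by length and take the leftmost free node at each depth; each
    codeword claims an interval of width 2^{-l}, packed without overlap.
    """
    assert kraft_sum(lengths) <= 1, "lengths violate the Kraft inequality"
    codes, val, prev = [], 0, 0
    for l in sorted(lengths):
        val <<= (l - prev)              # descend to depth l in the code tree
        codes.append(format(val, f'0{l}b'))
        val += 1                        # next free node at this depth
        prev = l
    return codes
```

For lengths $(1, 2, 3, 3)$ the Kraft sum is exactly 1 and the construction yields the familiar code $\{0, 10, 110, 111\}$.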
Optimal Codes
- optimal: $L = \sum_i p_i l_i$ is minimal subject to the Kraft inequality
- optimization problem: Lagrange multipliers give $l_i^* = -\log_D p_i$, hence $L^* = H_D(X)$
- Bounds: $H_D(X) \le L^* < H_D(X) + 1$
- Shannon codes: $l_i = \lceil \log_D \frac{1}{p_i} \rceil$
Encode $n$ symbols together (block encoding reduces the overhead of 1 in $H + 1$ to $\frac{1}{n}$ per symbol)
- $\frac{H(X_1, \dots, X_n)}{n} \le L_n < \frac{H(X_1, \dots, X_n)}{n} + \frac{1}{n}$; if $\{X_i\}$ is a stationary stochastic process, $L_n \to H(\mathcal{X})$, the entropy rate
- shortcoming: the block alphabet has $|\mathcal{X}|^n$ symbols
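A quick numerical check of the Shannon-code bounds $H(X) \le L < H(X) + 1$ (helper names are mine):

```python
import math

def shannon_lengths(p):
    """Shannon code lengths l_i = ceil(log2(1/p_i)); they satisfy Kraft."""
    return [math.ceil(math.log2(1 / pi)) for pi in p]

def expected_length(p, lengths):
    return sum(pi * li for pi, li in zip(p, lengths))

def entropy(p):
    """Entropy in bits."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

p = [0.5, 0.25, 0.125, 0.125]           # a dyadic distribution
L = expected_length(p, shannon_lengths(p))
```

For this dyadic example $l_i = \log_2 \frac{1}{p_i}$ exactly, so $L = H(X) = 1.75$ bits and the lower bound is met with equality.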
Wrong code
- The expected length under $p(x)$ of the code assignment $l(x) = \lceil \log \frac{1}{q(x)} \rceil$ satisfies $H(p) + D(p \| q) \le E_p[l(X)] < H(p) + D(p \| q) + 1$
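The penalty for designing the code for the wrong distribution $q$ when the truth is $p$ can be checked numerically (a sketch; function names are mine):

```python
import math

def entropy(p):
    return -sum(pi * math.log2(pi) for pi in p)

def kl(p, q):
    """Relative entropy D(p||q) in bits."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q))

def wrong_code_length(p, q):
    """Expected length under p of the Shannon code designed for q."""
    return sum(pi * math.ceil(math.log2(1 / qi)) for pi, qi in zip(p, q))
```

With $p = (0.5, 0.25, 0.25)$ and $q = (0.25, 0.25, 0.5)$, the code pays $D(p\|q) = 0.25$ extra bits on average relative to coding with the true distribution.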
Kraft Inequality for Uniquely Decodable Codes (McMillan): The codeword lengths of any uniquely decodable $D$-ary code must satisfy the Kraft inequality $\sum_i D^{-l_i} \le 1$. Conversely, given a set of codeword lengths that satisfy this inequality, it is possible to construct a uniquely decodable code with these codeword lengths. (A prefix code is 'no worse than' any other uniquely decodable code.)
Codes
Huffman Codes
- $D$-ary Huffman codes (prefix codes) for a given distribution: each time, combine the $D$ symbols with the lowest probabilities into a single source symbol, until there is only one symbol
- add dummy symbols of probability 0 to the end of the set of symbols, so that the total number of symbols is $1 + k(D - 1)$ for some integer $k$
- optimal: if $C'$ is any other uniquely decodable code, then $L(C) \le L(C')$
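The merging procedure above can be sketched for the binary case with a heap (a minimal illustration, not the only way to implement it):

```python
import heapq
from itertools import count

def huffman(probs):
    """Binary Huffman code for probs = {symbol: probability}.

    Repeatedly merge the two least-probable subtrees, prefixing '0' to one
    side and '1' to the other. Returns {symbol: codeword}.
    """
    tie = count()  # tiebreaker so heapq never compares the dict payloads
    heap = [(p, next(tie), {s: ''}) for s, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, c0 = heapq.heappop(heap)
        p1, _, c1 = heapq.heappop(heap)
        merged = {s: '0' + w for s, w in c0.items()}
        merged.update({s: '1' + w for s, w in c1.items()})
        heapq.heappush(heap, (p0 + p1, next(tie), merged))
    return heap[0][2]
```

For the dyadic distribution $(0.5, 0.25, 0.125, 0.125)$ this produces lengths $(1, 2, 3, 3)$ and expected length $1.75 = H(X)$, as optimality requires.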
Shannon Codes
- $D$-adic distribution: $p_i = D^{-n_i}$ for some integers $n_i$
- Shannon codes: $l_i = \lceil \log_D \frac{1}{p_i} \rceil$
- attain optimality within 1 bit: $H_D(X) \le L < H_D(X) + 1$
- $D$-adic distribution: Shannon codes are optimal
- otherwise: individual Shannon codewords may be much worse than optimal, e.g. for $p = (0.9999, 0.0001)$ the rare symbol gets $\lceil \log_2 \frac{1}{0.0001} \rceil = 14$ bits, where Huffman uses 1
For any optimal coding scheme
- lengths are ordered inversely with probability (if $p_j > p_k$, then $l_j \le l_k$)
- the two longest codewords have the same length
- two of the longest codewords differ only in the last bit
Shannon-Fano-Elias Coding
- modified cumulative distribution function: $\bar{F}(x) = \sum_{a < x} p(a) + \frac{1}{2} p(x)$
- Truncate $\bar{F}(x)$ to $l(x) = \lceil \log \frac{1}{p(x)} \rceil + 1$ bits; the codeword is $\lfloor \bar{F}(x) \rfloor_{l(x)}$
- the result is a prefix code, with $L < H(X) + 2$
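The construction above is short enough to code directly (a sketch; `sfe_code` is my name for it, and the symbol ordering of the dict defines the CDF):

```python
import math

def sfe_code(probs):
    """Shannon-Fano-Elias code for probs = {symbol: probability}.

    Codeword for x = first l(x) bits of the modified CDF
    Fbar(x) = sum_{a<x} p(a) + p(x)/2, with l(x) = ceil(log2(1/p(x))) + 1.
    """
    code, cum = {}, 0.0
    for x, p in probs.items():
        fbar = cum + p / 2
        l = math.ceil(math.log2(1 / p)) + 1
        # truncating fbar to l bits = floor(fbar * 2^l), written in binary
        code[x] = format(int(fbar * (1 << l)), f'0{l}b')
        cum += p
    return code
```

For probabilities $(0.25, 0.5, 0.125, 0.125)$ this yields the codewords $001, 10, 1101, 1111$, which are prefix-free even though $\bar{F}$ was merely truncated.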
Random Variable Generation
- fair coin toss: a sequence of fair bits $Z_1, Z_2, \dots$
- $E[T]$: expected number of fair bits used to generate $X$
- generate the variable with a binary tree whose leaves are labeled by symbols of $X$
- $E[T] \ge H(X)$
- dyadic distribution: the optimal tree achieves $E[T] = H(X)$
- otherwise: use the binary expansions of the probabilities; $H(X) \le E[T] < H(X) + 2$
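For the dyadic case, the tree is just a prefix code read in reverse: place each symbol at a leaf of depth $\log_2 \frac{1}{p}$ and follow coin flips until a leaf is reached. A minimal sketch, assuming a dyadic input (names are mine):

```python
import math
import random

def sample_dyadic(probs, coin=lambda: random.randint(0, 1)):
    """Generate X ~ probs (dyadic) from fair coin flips.

    Builds leaf codewords with the greedy Kraft construction, then walks
    the tree: each coin flip picks a child until a leaf (symbol) is hit.
    """
    items = sorted(probs.items(), key=lambda kv: -kv[1])
    leaves, val, prev = {}, 0, 0
    for x, p in items:
        l = round(math.log2(1 / p))     # exact depth, since p is dyadic
        val <<= (l - prev)
        leaves[format(val, f'0{l}b')] = x
        val += 1
        prev = l
    word = ''
    while word not in leaves:           # flip coins until a leaf is reached
        word += str(coin())
    return word, leaves[word]
```

Since the leaf for a symbol of probability $2^{-l}$ sits at depth $l$, the expected number of flips is $\sum_x p(x) \log_2 \frac{1}{p(x)} = H(X)$.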
Universal Source Coding
- minimax redundancy: $R^* = \min_q \max_\theta D(p_\theta \| q)$
- Arithmetic Coding
- LZ77: sliding window
- LZ78: tree-structured (trie) dictionary of parsed phrases
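The LZ78 parsing can be sketched in a few lines (an illustration, using a plain dict in place of an explicit trie; `lz78_parse` is my name):

```python
def lz78_parse(s):
    """LZ78 parse of s into (phrase_index, next_char) pairs.

    Each output pair references the longest previously seen phrase that
    prefixes the remaining input, plus the one new character extending it.
    """
    trie = {'': 0}                      # phrase -> index; 0 = empty phrase
    output, phrase = [], ''
    for ch in s:
        if phrase + ch in trie:
            phrase += ch                # keep extending the current match
        else:
            output.append((trie[phrase], ch))
            trie[phrase + ch] = len(trie)   # new phrase gets the next index
            phrase = ''
    if phrase:                          # flush a trailing partial phrase
        output.append((trie[phrase[:-1]], phrase[-1]))
    return output
```

For example, `"ABBABBAB"` parses into the phrases `A, B, BA, BB, AB`, i.e. pairs `(0,'A'), (0,'B'), (2,'A'), (2,'B'), (1,'B')`; the number of distinct phrases governs the asymptotic compression rate.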