5.4 Bounds on the optimal code length (Shannon-Fano coding)

发布时间 2023-10-16 04:54:37作者: 李斯赛特

From section 5.3, we have \(l_i^*=-\log_D p_i\), but it may not be integer, and we should choose \(l_i\) close to \(l_i^*\).

So round it up using the ceiling function (Shannon-Fano coding):

\[l_i=\left \lceil -\log_D p_i \right \rceil \Rightarrow -\log_D p_i\leq l_i < -\log_D p_i+1 \\ \Rightarrow -\sum_i p_i \log_D p_i\leq \sum_i p_i l_i < -\sum_i p_i\log_D p_i+\sum_i p_i \\ \Rightarrow H_D(X) \leq L < H_D(X)+1 \]

Thm.5.4.1 Let \(l_1^*, l_2^*,...,,l_m^*\) be optimal codeword lengths for a source distribution \(\mathbf{p}\) and a D-ary alphabet, and let \(L^*\) be the associated expected length of an optimal code (\(L^*=\sum p_i l_i^*\)). Then

\[H_D(X) \leq L^* < H_D(X)+1 \]


Next, consider sending a seq. of \(n\) symbols of \(X\), the symbols are drawn i.i.d. \(\sim p(x)\).

  • Let \(l(x_1,...,x_m)\) be the length of the binary codeword associated with \((x_1,...,x_m)\)
  • Let \(L_n=\frac{1}{n}\sum p(x_1,...,x_n)l(x_1,...,x_m)\) be the expected codeword length per input symbol

Then, by the bound we derived previously, we have

\[H(X_1,...,X_n) \leq \mathbb{E}[l(x_1,...,x_m)] <H(X_1,...,X_n)+1 \\ nH(X)\leq \mathbb{E}[l(x_1,...,x_m)] < nH(X)+1 \\ H(X) \leq L_n < H(X) + \frac{1}{n} \]

second line: \(X_1,...,X_n\) are i.i.d.; third line: dividing by n on each side

If \(X_1,...,X_n\) are not necessarily i.i.d.:

\[H(X_1,...,X_n) \leq \mathbb{E}[l(x_1,...,x_m)] <H(X_1,...,X_n)+1 \\ \frac{H(X_1,...,X_n)}{n} \leq L_n <\frac{H(X_1,...,X_n)}{n}+\frac{1}{n} \]

If this random process is stationary, then \(\lim_{n\rightarrow \infty}\frac{H(X_1,...,X_n)}{n} =H(\mathcal{X})\) and thereby \(\lim_{n\rightarrow \infty}L_n^*=H(\mathcal{X})\).


If the code is designed for a wrong distribution:

In many case, the true distribution \(p(x)\) is unknown, and we have to use the wrong distribution \(q(x)\) who may be the best estimate that we can make of the true distribution.

Thm.5.4.3 (Wrong code) The expected length under \(p(x)\) of the code assignment \(l(x)=\lceil -\log q(x) \rceil\) satisfies

\[H(p)+D(p||q) \leq \mathbb{E}_p[l(x)]<H(p)+D(p||q) +1 \]

proof:

\[\mathbb{E}_p[l(x)]=\sum_x p(x)l(x)=\sum_x p(x)\lceil \log \frac{1}{q(x)} \rceil\\ < \sum_x p(x)( \log \frac{1}{q(x)} +1)\\ =\sum_x p(x)\log( \frac{p(x)}{q(x)}\times\frac{1}{p(x)})+\sum_x p(x)\\ =H(p)+D(p||q) +1 \]

lower bound can be derived in a same way. The penalty term \(D(p||q)\) is large, when \(q\) differs from \(q\) a lot.