From Section 5.2, we know that any prefix code satisfies the Kraft inequality.
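As a quick numerical illustration (the code below is a hypothetical example, not one from the text): for a binary prefix code, no codeword is a prefix of another, and summing \(D^{-l_i}\) over its codeword lengths indeed stays within the Kraft bound.

```python
# A small binary prefix code (hypothetical example): no codeword is a
# prefix of any other codeword.
code = ["0", "10", "110", "111"]
D = 2  # binary alphabet

# Kraft inequality: the sum of D^{-length} over all codewords is <= 1.
kraft_sum = sum(D ** -len(w) for w in code)
print(kraft_sum)  # 1.0 for this code, which uses the full Kraft budget
```

Here the sum is exactly 1, i.e., the code is complete; removing a codeword would leave slack in the inequality.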
Our goal is to design prefix codes with minimum expected length \(L(C)\). By the Kraft inequality, this is equivalent to finding integer lengths \(l_1, l_2, \ldots, l_m\) that satisfy the Kraft inequality and for which \(L(C)=\sum_{i=1}^m p_i l_i\) is no greater than the expected length of any other prefix code.
Let's formalize this problem mathematically:
Minimize
\[
L=\sum_{i=1}^m p_i l_i
\]
subject to
\[
\sum_{i=1}^m D^{-l_i} \leq 1 \quad \text{and} \quad l_1,\ldots,l_m \text{ are integers.}
\]
This is a constrained optimization problem. Ignoring the integer constraint for now and treating the Kraft inequality as an equality, we can use Lagrange multipliers: form the Lagrangian
\[
J = \sum_{i=1}^m p_i l_i + \lambda \left( \sum_{i=1}^m D^{-l_i} - 1 \right).
\]
Differentiating w.r.t. \(l_i\) and \(\lambda\) and setting the derivatives to \(0\), we have
\[
\frac{\partial J}{\partial l_i} = p_i - \lambda D^{-l_i} \log_e D = 0
\quad \Longrightarrow \quad
D^{-l_i} = \frac{p_i}{\lambda \log_e D},
\]
together with the constraint \(\sum_{i=1}^m D^{-l_i} = 1\). Summing \(D^{-l_i} = \frac{p_i}{\lambda \log_e D}\) over \(i\) and using \(\sum_i p_i = 1\) gives \(\lambda = \frac{1}{\log_e D}\). Substituting \(\lambda = \frac{1}{\log_e D}\) into \(D^{-l_i} = \frac{p_i}{\lambda \log_e D}\) gives \(D^{-l_i} = p_i\), i.e.,
\[
l_i^* = -\log_D p_i,
\]
where \(l_i^*\) is the optimal solution to this relaxed problem; note that we have not yet imposed the integer constraint. These non-integer codeword lengths yield
\[
L^* = \sum_{i=1}^m p_i l_i^* = -\sum_{i=1}^m p_i \log_D p_i = H_D(X).
\]
Intuitively, we should choose each integer \(l_i\) close to \(l_i^*\). We now verify optimality directly in the proof of the following theorem:
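One simple way to choose integer \(l_i\) close to \(l_i^*\) is to round up, \(l_i = \lceil -\log_D p_i \rceil\); the sketch below uses a hypothetical non-dyadic distribution and relies on the standard fact that rounding up preserves the Kraft inequality and costs less than one extra symbol on average.

```python
import math

# Hypothetical non-dyadic distribution over a binary alphabet (D = 2).
p = [0.4, 0.3, 0.2, 0.1]
D = 2

# Round the ideal lengths up to integers: l_i = ceil(-log_D p_i).
l = [math.ceil(-math.log(pi, D)) for pi in p]

# Rounding up only shrinks each D^{-l_i}, so Kraft still holds.
kraft_sum = sum(D ** -li for li in l)
assert kraft_sum <= 1

# The expected length exceeds the entropy, but by less than 1.
L = sum(pi * li for pi, li in zip(p, l))
H = -sum(pi * math.log(pi, D) for pi in p)
print(L, H)  # L >= H, and L < H + 1
```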
Thm. 5.3.1 The expected length \(L\) of any prefix D-ary code for a random variable \(X\) is greater than or equal to the entropy \(H_D(X)\); that is,
\[
L \geq H_D(X),
\]
with equality iff \(D^{-l_i}=p_i\) for all \(i\) --- such a distribution is called D-adic.
Proof:
Let \(r_i=\frac{D^{-l_i}}{\sum_j D^{-l_j}}\). Then \(\mathbf{r}=(r_1,\ldots,r_m)\) is clearly a pmf, since \(r_i\in [0,1]\) and \(\sum_i r_i=1\).
And let \(c=\sum_j D^{-l_j}\); by the Kraft inequality, \(c\leq 1\). Then
\[
L - H_D(X)
= \sum_i p_i l_i + \sum_i p_i \log_D p_i
= -\sum_i p_i \log_D D^{-l_i} + \sum_i p_i \log_D p_i
= \sum_i p_i \log_D \frac{p_i}{r_i} - \log_D c
= D(\mathbf{p}\,\|\,\mathbf{r}) + \log_D \frac{1}{c}.
\]
The first term is the relative entropy (KL divergence) between the distributions \(\mathbf{p}\) and \(\mathbf{r}\), which is always non-negative, and the second term is also non-negative because \(c \leq 1\) implies \(\frac{1}{c} \geq 1\). Hence \(L \geq H_D(X)\), with equality iff \(\mathbf{p}=\mathbf{r}\) and \(c=1\), i.e., \(D^{-l_i}=p_i\). \(\blacksquare\)
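The two-term decomposition of \(L - H_D(X)\) used in the proof can be checked numerically; the pmf and lengths below are hypothetical, chosen so that \(c < 1\) strictly:

```python
import math

# Hypothetical pmf and codeword lengths with strict Kraft slack, D = 2.
p = [0.4, 0.3, 0.2, 0.1]
l = [1, 3, 3, 4]
D = 2

c = sum(D ** -li for li in l)        # c = sum_j D^{-l_j} <= 1
r = [D ** -li / c for li in l]       # r_i = D^{-l_i} / c, a pmf

L = sum(pi * li for pi, li in zip(p, l))
H = -sum(pi * math.log(pi, D) for pi in p)
kl = sum(pi * math.log(pi / ri, D) for pi, ri in zip(p, r))

# L - H_D(X) should equal D(p || r) + log_D(1/c), both terms >= 0.
lhs = L - H
rhs = kl + math.log(1 / c, D)
print(lhs, rhs)
```

Both printed values agree (up to floating point), and each of the two terms on the right is individually non-negative, matching the proof.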