From Section 5.2, we know that any prefix code satisfies the Kraft inequality.
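As a quick numerical illustration (the code below is a hypothetical example, not one from the text): for a binary prefix code, no codeword is a prefix of another, and summing \(D^{-l_i}\) over its codeword lengths indeed stays within the Kraft bound.

```python
# A small binary prefix code (hypothetical example): no codeword is a
# prefix of any other codeword.
code = ["0", "10", "110", "111"]
D = 2  # binary alphabet

# Kraft inequality: the sum of D^{-length} over all codewords is <= 1.
kraft_sum = sum(D ** -len(w) for w in code)
print(kraft_sum)  # 1.0 for this code, which uses the full Kraft budget
```

Here the sum is exactly 1, i.e., the code is complete; removing a codeword would leave slack in the inequality.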
Our goal is to design prefix codes with minimum expected length \(L(C)\). By the Kraft inequality, this is equivalent to finding integer lengths \(l_1, l_2, \ldots, l_m\) that satisfy the Kraft inequality and for which \(L(C)=\sum_{i=1}^m p_i l_i\) is no greater than the expected length of any other prefix code.
Let's formalize this problem mathematically:
Minimize
\[
L=\sum_{i=1}^m p_i l_i
\]
subject to
\[
\sum_{i=1}^m D^{-l_i} \leq 1 \quad \text{and} \quad l_1,\ldots,l_m \text{ are integers.}
\]
This is a constrained optimization problem. Ignoring the integer constraint for now and treating the Kraft inequality as an equality, we can use Lagrange multipliers: form the Lagrangian
\[
J = \sum_{i=1}^m p_i l_i + \lambda \left( \sum_{i=1}^m D^{-l_i} - 1 \right).
\]
Differentiating w.r.t. \(l_i\) and \(\lambda\) and setting the derivatives to \(0\), we have
\[
\frac{\partial J}{\partial l_i} = p_i - \lambda D^{-l_i} \log_e D = 0
\quad \Longrightarrow \quad
D^{-l_i} = \frac{p_i}{\lambda \log_e D},
\]
together with the constraint \(\sum_{i=1}^m D^{-l_i} = 1\). Summing \(D^{-l_i} = \frac{p_i}{\lambda \log_e D}\) over \(i\) and using \(\sum_i p_i = 1\) gives \(\lambda = \frac{1}{\log_e D}\). Substituting \(\lambda = \frac{1}{\log_e D}\) into \(D^{-l_i} = \frac{p_i}{\lambda \log_e D}\) gives \(D^{-l_i} = p_i\), i.e.,
\[
l_i^* = -\log_D p_i,
\]
where \(l_i^*\) is the optimal solution to this relaxed problem; note that we have not yet imposed the integer constraint. These non-integer codeword lengths yield
\[
L^* = \sum_{i=1}^m p_i l_i^* = -\sum_{i=1}^m p_i \log_D p_i = H_D(X).
\]
Intuitively, we should choose each integer \(l_i\) close to \(l_i^*\). We now verify optimality directly in the proof of the following theorem:
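One simple way to choose integer \(l_i\) close to \(l_i^*\) is to round up, \(l_i = \lceil -\log_D p_i \rceil\); the sketch below uses a hypothetical non-dyadic distribution and relies on the standard fact that rounding up preserves the Kraft inequality and costs less than one extra symbol on average.

```python
import math

# Hypothetical non-dyadic distribution over a binary alphabet (D = 2).
p = [0.4, 0.3, 0.2, 0.1]
D = 2

# Round the ideal lengths up to integers: l_i = ceil(-log_D p_i).
l = [math.ceil(-math.log(pi, D)) for pi in p]

# Rounding up only shrinks each D^{-l_i}, so Kraft still holds.
kraft_sum = sum(D ** -li for li in l)
assert kraft_sum <= 1

# The expected length exceeds the entropy, but by less than 1.
L = sum(pi * li for pi, li in zip(p, l))
H = -sum(pi * math.log(pi, D) for pi in p)
print(L, H)  # L >= H, and L < H + 1
```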
Thm. 5.3.1 The expected length \(L\) of any prefix D-ary code for a random variable \(X\) is greater than or equal to the entropy \(H_D(X)\); that is,
\[
L \geq H_D(X),
\]
with equality iff \(D^{-l_i}=p_i\) for all \(i\) --- such a distribution is called D-adic.
Proof:
Let \(r_i=\frac{D^{-l_i}}{\sum_j D^{-l_j}}\). Then \(\mathbf{r}=(r_1,\ldots,r_m)\) is clearly a pmf, since \(r_i\in [0,1]\) and \(\sum_i r_i=1\).
And let \(c=\sum_j D^{-l_j}\); by the Kraft inequality, \(c\leq 1\). Then
\[
L - H_D(X)
= \sum_i p_i l_i + \sum_i p_i \log_D p_i
= -\sum_i p_i \log_D D^{-l_i} + \sum_i p_i \log_D p_i
= \sum_i p_i \log_D \frac{p_i}{r_i} - \log_D c
= D(\mathbf{p}\,\|\,\mathbf{r}) + \log_D \frac{1}{c}.
\]
The first term is the relative entropy (KL divergence) between the distributions \(\mathbf{p}\) and \(\mathbf{r}\), which is always non-negative, and the second term is also non-negative because \(c \leq 1\) implies \(\frac{1}{c} \geq 1\). Hence \(L \geq H_D(X)\), with equality iff \(\mathbf{p}=\mathbf{r}\) and \(c=1\), i.e., \(D^{-l_i}=p_i\). \(\blacksquare\)
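The two-term decomposition of \(L - H_D(X)\) used in the proof can be checked numerically; the pmf and lengths below are hypothetical, chosen so that \(c < 1\) strictly:

```python
import math

# Hypothetical pmf and codeword lengths with strict Kraft slack, D = 2.
p = [0.4, 0.3, 0.2, 0.1]
l = [1, 3, 3, 4]
D = 2

c = sum(D ** -li for li in l)        # c = sum_j D^{-l_j} <= 1
r = [D ** -li / c for li in l]       # r_i = D^{-l_i} / c, a pmf

L = sum(pi * li for pi, li in zip(p, l))
H = -sum(pi * math.log(pi, D) for pi in p)
kl = sum(pi * math.log(pi / ri, D) for pi, ri in zip(p, r))

# L - H_D(X) should equal D(p || r) + log_D(1/c), both terms >= 0.
lhs = L - H
rhs = kl + math.log(1 / c, D)
print(lhs, rhs)
```

Both printed values agree (up to floating point), and each of the two terms on the right is individually non-negative, matching the proof.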