[Notes] Convex Optimization

Published 2023-08-19 16:55:06 · Author: zrkc

Differentiation

Def. Gradient
\(f:{\cal X}\sube\mathbb{R} ^N\to \mathbb{R}\) is differentiable. Then the gradient of \(f\) at \({\bf x}\in\cal{X}\), denoted by \(\nabla f({\bf x})\), is defined by

\[\nabla f({\bf x}) = \begin{bmatrix} \frac{\partial f}{\partial {\bf x} _1}({\bf x}) \\ \vdots \\ \frac{\partial f}{\partial {\bf x} _N}({\bf x}) \end{bmatrix} \]

Def. Hessian
\(f:{\cal X}\sube\mathbb{R} ^N\to \mathbb{R}\) is twice differentiable. Then the Hessian of \(f\) at \({\bf x}\in\cal{X}\), denoted by \(\nabla ^2 f({\bf x})\), is defined by

\[\nabla ^2 f({\bf x}) = \begin{bmatrix} \frac{\partial ^2 f}{\partial {\bf x} _1 ^2}({\bf x}) & \cdots & \frac{\partial ^2 f}{\partial {\bf x} _1 \partial {\bf x} _N}({\bf x}) \\ \vdots & \ddots & \vdots \\ \frac{\partial ^2 f}{\partial {\bf x} _N \partial {\bf x} _1}({\bf x}) & \cdots & \frac{\partial ^2 f}{\partial {\bf x} _N ^2}({\bf x}) \end{bmatrix} \]

Th. Fermat's theorem
\(f:{\cal X}\sube\mathbb{R} ^N\to \mathbb{R}\) is differentiable. If \(f\) admits a local extremum at \({\bf x} ^*\), then \(\nabla f({\bf x} ^*)=0\)

Convexity

Def. Convex set
A set \({\cal X}\sube \mathbb{R} ^N\) is said to be convex, if for any \({\bf x, y}\in \cal X\), the segment \(\bf [x, y]\) lies in \(\cal X\), that is \(\{\alpha{\bf x} + (1-\alpha){\bf y}:0\le\alpha\le 1 \}\sube\cal X\)

Th. Operations that preserve convexity

  • \({\cal C} _i\) is convex for all \(i\in I\), then \(\bigcap _{i\in I} {\cal C} _i\) is also convex
  • \({\cal C} _1, {\cal C} _2\) are convex, then \({\cal C _1+C _2}=\{x _1+x _2:x _1\in {\cal C} _1, x _2\in {\cal C} _2 \}\) is also convex
  • \({\cal C} _1, {\cal C} _2\) are convex, then \({\cal C _1\times C _2}=\{(x _1, x _2):x _1\in {\cal C} _1, x _2\in {\cal C} _2 \}\) is also convex
  • Any projection of a convex set is also convex

Def. Convex hull
The Convex hull \(\text{conv}(\cal X)\) of set \({\cal X}\sube \mathbb{R} ^N\), is the minimal convex set containing \(\cal X\), that is

\[\text{conv}({\cal X})=\left\{ \sum _{i=1}^m \alpha _i {\bf x} _i: m\ge 1,\ {\bf x} _1,\dotsb, {\bf x} _m\in {\cal X},\ \alpha _i\ge 0,\ \sum _{i=1} ^m \alpha _i = 1 \right\} \]

Def. Epigraph
The epigraph of \(f:{\cal X}\to \mathbb{R}\), denoted by \(\text{Epi } f\), is defined by \(\{(x,y):x\in {\cal X}, y\ge f(x)\}\)

Def. Convex function
Let \(\cal X\) be a convex set. A function \(f:{\cal X}\to \mathbb{R}\) is said to be convex iff \(\text{Epi }f\) is convex, or equivalently, for all \({\bf x, y}\in {\cal X},\alpha \in [0,1]\)

\[f(\alpha{\bf x} + (1-\alpha){\bf y})\le \alpha f({\bf x}) + (1-\alpha) f({\bf y}) \]

Moreover, \(f\) is said to be strictly convex if the inequality is strict when \(\bf x\ne y\) and \(\alpha\in (0,1)\). \(f\) is said to be (strictly) concave if \(-f\) is (strictly) convex.
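The defining inequality is easy to sanity-check numerically. Below is a minimal sketch (the helper `is_convex_on_samples` and all its parameters are my own illustration, not from any library); sampling can only refute convexity, never prove it.

```python
import random

def is_convex_on_samples(f, lo, hi, trials=1000, tol=1e-9):
    """Test f(t*x + (1-t)*y) <= t*f(x) + (1-t)*f(y) on random triples."""
    random.seed(0)  # deterministic sampling
    for _ in range(trials):
        x, y = random.uniform(lo, hi), random.uniform(lo, hi)
        t = random.random()
        if f(t * x + (1 - t) * y) > t * f(x) + (1 - t) * f(y) + tol:
            return False  # found a violating triple
    return True

print(is_convex_on_samples(lambda x: x * x, -5, 5))   # True: x^2 is convex
print(is_convex_on_samples(lambda x: x ** 3, -5, 5))  # False: x^3 is not convex here
```

A `True` result only means no counterexample was found among the sampled triples.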

Th. Convex function characterized by first-order differential
\(f:{\cal X}\sube\mathbb{R} ^N\to \mathbb{R}\) is differentiable. Then \(f\) is convex iff \(\text{dom}(f)\) is convex, and

\[\forall {\bf x, y}\in \text{dom}(f),\quad f({\bf y})-f({\bf x})\ge \nabla f({\bf x})\cdot({\bf y-x}) \]

Th. Convex function characterized by second-order differential
\(f:{\cal X}\sube\mathbb{R} ^N\to \mathbb{R}\) is twice differentiable. Then \(f\) is convex iff \(\text{dom}(f)\) is convex, and its Hessian is positive semidefinite (半正定)

\[\forall {\bf x}\in \text{dom}(f),\quad \nabla ^2 f({\bf x})\succeq 0 \]

  • A symmetric matrix is positive semidefinite iff all of its eigenvalues are non-negative
  • If \(f\) is a scalar function (e.g. \(x\mapsto x ^2\)), then \(f\) is convex iff \(\forall x\in \text{dom}(f), f''(x)\ge 0\)
  • For example
    • Linear functions are both convex and concave
    • Any norm \(\Vert\cdot\Vert\) over a convex set \(\cal X\) is a convex function:
      \(\Vert\alpha{\bf x}+(1-\alpha){\bf y} \Vert\le \Vert\alpha{\bf x}\Vert+\Vert(1-\alpha){\bf y} \Vert= \alpha\Vert{\bf x}\Vert+(1-\alpha)\Vert{\bf y}\Vert\)
  • Using composition rules to prove convexity
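The second-order criterion suggests a simple numerical check: sample points and test whether the smallest eigenvalue of the Hessian is non-negative. A sketch (the helper `hessian_psd_at` and the quadratic example are my own illustration):

```python
import numpy as np

def hessian_psd_at(hess, points, tol=1e-9):
    """Check that the smallest Hessian eigenvalue is >= -tol at each point."""
    return all(np.linalg.eigvalsh(hess(x)).min() >= -tol for x in points)

# f(x) = x' A x has constant Hessian 2A
A = np.array([[2.0, 1.0], [1.0, 2.0]])   # eigenvalues 1 and 3: PSD
pts = [np.zeros(2), np.ones(2), np.array([3.0, -1.0])]
print(hessian_psd_at(lambda x: 2 * A, pts))  # True
```

For a non-quadratic \(f\), `hess` would evaluate the Hessian at each sampled point instead of returning a constant matrix.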

Th. Composition of convex/concave functions
Assume \(h:\mathbb{R}\to\mathbb{R}\) and \(g:\mathbb{R} ^N\to\mathbb{R}\) are twice differentiable. Define \(f({\bf x})=h(g({\bf x})), \forall {\bf x}\in \mathbb{R} ^N\), then

  • \(h\) is convex & non-decreasing, \(g\) is convex \(\implies\) \(f\) is convex
  • \(h\) is convex & non-increasing, \(g\) is concave \(\implies\) \(f\) is convex
  • \(h\) is concave & non-decreasing, \(g\) is concave \(\implies\) \(f\) is concave
  • \(h\) is concave & non-increasing, \(g\) is convex \(\implies\) \(f\) is concave

Proof sketch: the statement holds for \(N=1\), and this suffices, since \(f\) is convex (concave) iff it is convex (concave) along every line that intersects its domain.
Example: \(g\) could be any norm \(\Vert\cdot\Vert\)
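As a numerical illustration of the first composition rule, take \(h=\exp\) (convex and non-decreasing) and \(g(x)=x^2\) (convex); the randomized checker below is my own sketch, not a proof:

```python
import math, random

def convex_on_samples(f, lo, hi, trials=2000, tol=1e-9):
    """Randomized test of the convexity inequality (can only refute, not prove)."""
    random.seed(1)
    for _ in range(trials):
        x, y = random.uniform(lo, hi), random.uniform(lo, hi)
        t = random.random()
        if f(t * x + (1 - t) * y) > t * f(x) + (1 - t) * f(y) + tol:
            return False
    return True

# h = exp (convex, non-decreasing) composed with g(x) = x^2 (convex)
print(convex_on_samples(lambda x: math.exp(x * x), -2.0, 2.0))  # True
```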

Th. Pointwise maximum of convex functions
Let \(f _i\) be a convex function defined over a convex set \(\cal C\) for all \(i\in I\). Then \(f(x)=\sup _{i\in I}f _i(x)\), \(x\in \cal C\), is a convex function.
Proof: \(\text{Epi } f = \bigcap _{i\in I} \text{Epi } f _i\) is convex

  • \(f({\bf x})=\max _{i\in I}({\bf w} _i\cdot {\bf x}+b _i)\) over a convex set is a convex function
  • The maximum eigenvalue \(\lambda _{\max}({\bf M})\) over the set of symmetric matrices is a convex function, since \(\lambda _{\max}({\bf M})=\sup _{\Vert\bf x\Vert _2\le 1}{\bf x}'{\bf Mx}\) is the supremum of the linear functions \({\bf M}\mapsto{\bf x}'{\bf Mx}\)
    More generally, let \(\lambda _{k}({\bf M})\) denote the \(k\)-th largest eigenvalue; then \({\bf M}\mapsto \sum _{i=1} ^{k}\lambda _{i}({\bf M})\) is convex, while the sum of the \(k\) smallest eigenvalues, \({\bf M}\mapsto \sum _{i=n-k+1} ^{n}\lambda _{i}({\bf M})=-\sum _{i=1} ^{k}\lambda _i(-{\bf M})\), is concave
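The convexity of \(\lambda_{\max}\) can be probed numerically by sampling random symmetric matrices and checking the convexity inequality along segments (an illustrative sketch, not a proof):

```python
import numpy as np

def lam_max(M):
    """Largest eigenvalue of a symmetric matrix."""
    return np.linalg.eigvalsh(M).max()

rng = np.random.default_rng(0)
ok = True
for _ in range(200):
    A = rng.standard_normal((4, 4)); A = (A + A.T) / 2  # random symmetric
    B = rng.standard_normal((4, 4)); B = (B + B.T) / 2
    t = rng.random()
    # convexity inequality for lambda_max along the segment [B, A]
    ok = ok and lam_max(t * A + (1 - t) * B) <= t * lam_max(A) + (1 - t) * lam_max(B) + 1e-9
print(bool(ok))  # True
```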

Th. Partial infimum
Let \(f\) be a convex function defined over a convex set \({\cal C}\sube {\cal X}\times {\cal Y}\), and let \({\cal B}\sube {\cal Y}\) be a convex set. Then \({\cal A}=\{x\in {\cal X}:\exists y\in{\cal B}, (x, y)\in {\cal C} \}\) is a convex set if non-empty, and \(g(x)=\inf _{y\in\cal B} f(x, y)\), defined for all \(x\in \cal A\), is a convex function.
For example, the distance to a convex set \(\cal B\), \(d(x)=\inf _{y\in \cal B}\Vert x-y\Vert\), is a convex function
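For instance, taking \({\cal B}=[0,1]\subset\mathbb{R}\) (a hypothetical concrete choice), the distance function is \(d(x)=\max(0,\,-x,\,x-1)\), a pointwise maximum of affine functions, and its convexity can be sampled:

```python
import random

# d(x) = inf_{y in [0, 1]} |x - y| = max(0, -x, x - 1)
def d(x):
    return max(0.0, -x, x - 1.0)

random.seed(2)
ok = True
for _ in range(2000):
    x, y = random.uniform(-3, 3), random.uniform(-3, 3)
    t = random.random()
    ok = ok and d(t * x + (1 - t) * y) <= t * d(x) + (1 - t) * d(y) + 1e-12
print(ok)  # True
```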

Th. Jensen's inequality
Let \(X\) be a random variable taking values in a convex set \({\cal C}\sube \mathbb{R} ^N\), and let \(f\) be a convex function defined over \(\cal C\). Then \(\mathbb{E}[X]\in {\cal C}\), \(\mathbb{E}[f(X)]\) is finite, and

\[f(\mathbb{E}[X])\le \mathbb{E}[f(X)] \]

Sketch of proof: extend the inequality \(f(\sum _i \alpha _i x _i)\le\sum _i \alpha _i f(x _i)\), \(\alpha _i\ge 0\), \(\sum _i \alpha _i=1\), where the \(\alpha _i\) can be interpreted as probabilities, to arbitrary distributions.
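Jensen's inequality is easy to observe on simulated data; the sketch below uses a standard normal sample and \(f(x)=x^2\) (my own illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal(100_000)  # samples of a random variable X

f = np.square                     # convex f(x) = x^2
lhs = f(X.mean())                 # f(E[X]), close to f(0) = 0
rhs = f(X).mean()                 # E[f(X)], close to Var(X) = 1
print(bool(lhs <= rhs))  # True
```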

Constrained optimization

Def. Constrained optimization problem
Let \({\cal X}\sube \mathbb{R} ^N\) and \(f, g _i:{\cal X}\to\mathbb{R}, i\in [m]\). The constrained optimization problem (also called the primal problem) has the form

\[\min _{\bf x\in\cal X} f({\bf x}) \\ \text{subject to: } g _i({\bf x})\le 0, \forall i\in [m] \]

Write \(p ^*=\inf _{{\bf x}\in{\cal X}:\,{\bf g}({\bf x})\le {\bf 0}} f({\bf x})\) for its optimal value; note that no convexity is assumed so far. An equality constraint \(g=0\) can be expressed by the pair of constraints \(g\le 0, -g\le 0\).
To solve such problems, we first introduce the Lagrange function, which absorbs the constraints as non-positive terms.

Def. Lagrange function
For a constrained optimization problem, the associated Lagrange function is defined by

\[\forall {\bf x}\in {\cal X},\forall{\boldsymbol{\alpha}\ge 0},\quad {\cal L}({\bf x}, {\boldsymbol{\alpha}})=f({\bf x}) + \sum _{i=1}^{m} \alpha _i g _i({\bf x}) \]

where \({\boldsymbol{\alpha}}=(\alpha _1,\dotsb, \alpha _m)'\) are called the dual variables.
For an equality constraint \(g=0\), the corresponding coefficient \(\alpha=\alpha _+ - \alpha _-\) need not be non-negative (but the theorems below require \(g\) and \(-g\) to be convex simultaneously, so \(g\) must be affine, i.e. of the form \({\bf w\cdot x+b}\)).
Note that \(p ^* = \inf _{\bf x} \sup _{\boldsymbol{\alpha}\ge 0}{\cal L}({\bf x}, {\boldsymbol{\alpha}})\): whenever \(\bf x\) violates some constraint, the supremum over \(\boldsymbol{\alpha}\) is \(+\infty\), which is exactly how the constraints are encoded.
Here is the interesting part: we can construct a concave function, called the dual function.

Def. Dual function
For a constrained optimization problem, the associated dual function is defined by

\[\forall{\boldsymbol{\alpha}}\ge 0,\quad F({\boldsymbol{\alpha}})=\inf _{\bf x\in\cal X} {\cal L}({\bf x},{\boldsymbol{\alpha}})=\inf _{\bf x\in\cal X} \left(f({\bf x}) + \sum _{i=1}^{m} \alpha _i g _i({\bf x})\right) \]

It is concave: \(\cal L\) is an affine function of \(\boldsymbol{\alpha}\), and a pointwise infimum preserves concavity.
Also note that for any \({\boldsymbol{\alpha}}\ge 0\) and any feasible \(\bf x\), \(F({\boldsymbol{\alpha}})\le {\cal L}({\bf x},{\boldsymbol{\alpha}})\le f({\bf x})\), hence \(F({\boldsymbol{\alpha}})\le p ^*\).
This leads to the dual problem.

Def. Dual problem
For a constrained optimization problem, the associated dual problem is

\[\max _{\boldsymbol{\alpha}} F({\boldsymbol{\alpha}}) \\ \text{subject to: }{\boldsymbol{\alpha}}\ge 0 \]

The dual problem is always a convex optimization problem (maximizing a concave function); denote its optimal value by \(d ^*\). By the above, \(d ^*\le p ^*\), that is:

\[d ^* = \sup _{\boldsymbol{\alpha}} \inf _{\bf x} {\cal L}({\bf x}, {\boldsymbol{\alpha}})\le \inf _{\bf x} \sup _{\boldsymbol{\alpha}}{\cal L}({\bf x}, {\boldsymbol{\alpha}}) = p ^* \]

This is called weak duality; the case of equality is called strong duality.
Next we will show:

When a convex optimization problem satisfies a constraint qualification (Slater's condition), which is a sufficient condition, we have \(d ^*=p ^*\), and a point solves the problem iff it is a saddle point of the Lagrange function
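A tiny worked example of duality (the problem below is my own toy choice, not from the text): minimize \(x^2\) subject to \(1-x\le 0\), so \(p^*=1\) at \(x=1\). The dual function can be maximized on a grid:

```python
import numpy as np

# Toy problem: min x^2  s.t.  g(x) = 1 - x <= 0   =>   p* = 1 at x = 1
# L(x, a) = x^2 + a * (1 - x); minimizing over x (at x = a/2) gives
# the dual function F(a) = a - a^2 / 4, concave in a.
a = np.linspace(0.0, 10.0, 100_001)
F = a - a ** 2 / 4
d_star = F.max()                  # attained at a = 2
p_star = 1.0
print(abs(d_star - p_star) < 1e-6)  # True: strong duality holds here
```

The problem is convex and satisfies Slater's condition (e.g. \(\bar x = 2\) gives \(g(\bar x) < 0\)), so \(d^*=p^*\) is expected.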

Def. Constraint qualification (Slater's condition)
Assume the interior of the set \(\cal X\) is non-empty: \(\text{int}({\cal X})\ne \emptyset\)

  • Strong constraint qualification (Slater's condition):

\[\exists \bar{{\bf x}}\in\text{int}({\cal X}):\forall i\in [m], g _i(\bar{{\bf x}})< 0 \]

  • Weak constraint qualification (weak Slater's condition):

\[\exists \bar{{\bf x}}\in\text{int}({\cal X}):\forall i\in [m], (g _i(\bar{{\bf x}})< 0)\vee (g _i(\bar{{\bf x}})=0\wedge g _i \text{ affine}) \]

This condition feels a bit like saying that the feasible region is a convex set?
Based on Slater's condition, we now state that a saddle point of the Lagrange function is a necessary and sufficient condition for a solution of the constrained problem.

Th. Saddle point - sufficient condition
For a constrained optimization problem, if its Lagrange function admits a saddle point \(({\bf x} ^*, \boldsymbol{\alpha} ^*)\), i.e.

\[\forall {\bf x}\in {\cal X},\forall{\boldsymbol{\alpha}\ge 0},\quad {\cal L}({\bf x} ^*, {\boldsymbol{\alpha}})\le {\cal L}({\bf x} ^*, {\boldsymbol{\alpha}} ^*)\le {\cal L}({\bf x}, {\boldsymbol{\alpha}} ^*) \]

\({\bf x} ^*\) 是该问题的解,\(f({\bf x} ^*)=\inf f({\bf x})\)

Th. Saddle point - necessary condition
Assume \(f, g _i, i\in [m]\) are convex functions:

  • If Slater's condition holds, then any solution \(\bf x ^*\) of the constrained problem admits \(\boldsymbol{\alpha} ^*\ge 0\) such that \(({\bf x} ^*, \boldsymbol{\alpha} ^*)\) is a saddle point of the Lagrange function
  • If the weak Slater's condition holds and \(f, g _i\) are differentiable, then any solution \(\bf x ^*\) likewise admits \(\boldsymbol{\alpha} ^*\ge 0\) such that \(({\bf x} ^*, \boldsymbol{\alpha} ^*)\) is a saddle point of the Lagrange function

Proof of the equivalence. The book does not give the necessity proof, and the sufficiency proof is easy but not elegant, so instead of copying it I sketch my own reasoning (which may have flaws; the figure below is only illustrative):

Return to the earlier inequality:

\[d ^* = \sup _{\boldsymbol{\alpha}} \inf _{\bf x} {\cal L}({\bf x}, {\boldsymbol{\alpha}})\le\inf _{\bf x} \sup _{\boldsymbol{\alpha}}{\cal L}({\bf x}, {\boldsymbol{\alpha}}) = p ^* \]

Let \({\bf x} ^*\) attain \(p ^*=\sup _{\boldsymbol{\alpha}}{\cal L}({\bf x} ^*, {\boldsymbol{\alpha}})\le \sup _{\boldsymbol{\alpha}}{\cal L}({\bf x}, {\boldsymbol{\alpha}}),\ \forall {\bf x}\)
Let \({\boldsymbol{\alpha}} ^*\) attain \(d ^*=\inf _{\bf x} {\cal L}({\bf x}, {\boldsymbol{\alpha} ^*})\ge \inf _{\bf x} {\cal L}({\bf x}, {\boldsymbol{\alpha}}),\ \forall {\boldsymbol{\alpha}}\)

Sufficiency:
Suppose a saddle point exists. By contradiction (see the upper right of the figure), assume \(({\bf x} ^* , {\boldsymbol{\alpha}} ^*)\) is not a saddle point and write the saddle point as \(({\bf x}' , {\boldsymbol{\alpha}}')\). Then \({\cal L}({\bf x}', {\boldsymbol{\alpha}} ^*){\color{green}<} {\cal L}({\bf x}', {\boldsymbol{\alpha}}')=\inf _{\bf x}{\cal L}({\bf x}, {\boldsymbol{\alpha}}'){\color{green}\le} \inf _{\bf x} {\cal L}({\bf x}, {\boldsymbol{\alpha} ^*})\); comparing the two ends of this chain yields a contradiction, so \({\boldsymbol{\alpha} ^*}={\boldsymbol{\alpha}}'\); the same argument works for \(\bf x ^*, x'\).
Hence \(({\bf x} ^* , {\boldsymbol{\alpha}} ^*)\) is a saddle point, i.e. \(\forall {\bf x},\forall {\boldsymbol{\alpha}},\ {\cal L}({\bf x} ^*, {\boldsymbol{\alpha}})\le {\cal L}({\bf x} ^*, {\boldsymbol{\alpha}} ^*)\le {\cal L}({\bf x}, {\boldsymbol{\alpha}} ^*)\); it follows that \(\sup _{\boldsymbol{\alpha}}{\cal L}({\bf x} ^*, {\boldsymbol{\alpha}}) = {\cal L}({\bf x} ^*, {\boldsymbol{\alpha}} ^*) = \inf _{\bf x} {\cal L}({\bf x}, {\boldsymbol{\alpha} ^*})\), and therefore \(p ^* = d ^*\)

Necessity:
Suppose \(p ^* =d ^*\), i.e. \(\sup _{\boldsymbol{\alpha}}{\cal L}({\bf x} ^*, {\boldsymbol{\alpha}})=\inf _{\bf x} {\cal L}({\bf x}, {\boldsymbol{\alpha} ^*})\). Since \(\sup _{\boldsymbol{\alpha}}{\cal L}({\bf x} ^*, {\boldsymbol{\alpha}})\ge {\cal L}({\bf x} ^*, {\boldsymbol{\alpha} ^*})\ge \inf _{\bf x} {\cal L}({\bf x}, {\boldsymbol{\alpha} ^*})\), all three quantities are equal. Hence \(\forall {\bf x},\forall {\boldsymbol{\alpha}},\ {\cal L}({\bf x} ^*, {\boldsymbol{\alpha}})\le \sup _{\boldsymbol{\alpha}}{\cal L}({\bf x} ^*, {\boldsymbol{\alpha}})={\cal L}({\bf x} ^*, {\boldsymbol{\alpha} ^*})=\inf _{\bf x} {\cal L}({\bf x}, {\boldsymbol{\alpha} ^*})\le {\cal L}({\bf x}, {\boldsymbol{\alpha} ^*})\), so \(({\bf x} ^* , {\boldsymbol{\alpha}} ^*)\) is a saddle point

Neat!

To sum up, everything above is captured by one theorem: KKT
Th. Karush-Kuhn-Tucker theorem
For the constrained optimization problem, assume \(f, g _i:{\cal X}\to\mathbb{R},\forall i\in[m]\) are convex and differentiable, and that Slater's condition holds:

\[\min _{\bf x\in\cal X, {\bf g}({\bf x})\le {\bf 0}} f({\bf x}) \]

Its Lagrange function is \({\cal L}({\bf x}, {\boldsymbol{\alpha}})=f({\bf x}) + {\boldsymbol{\alpha}}\cdot {\bf g}({\bf x}), {\boldsymbol{\alpha}}\ge 0\)
Then \(\bar{\bf x}\) is a solution of the problem iff there exists \(\bar{\boldsymbol{\alpha}}\ge 0\) such that:

\[\begin{aligned} \nabla _{\bf x} {\cal L}(\bar{\bf x}, \bar{\boldsymbol{\alpha}}) &= \nabla _{\bf x} f(\bar{\bf x}) + \bar{\boldsymbol{\alpha}}\cdot \nabla _{\bf x} {\bf g}(\bar{\bf x}) = 0 \\ \nabla _{\boldsymbol{\alpha}} {\cal L}(\bar{\bf x}, \bar{\boldsymbol{\alpha}})&={\bf g}(\bar{\bf x})\le 0 \\ \bar{\boldsymbol{\alpha}}\cdot {\bf g}(\bar{\bf x}) &= 0 \end{aligned}\quad ;\text{KKT conditions} \]

The last two conditions are called the complementarity conditions: for every \(i\in[m]\), \(\bar{\alpha} _i\ge 0\), \(g _i(\bar{\bf x})\le 0\), and \(\bar{\alpha} _i g _i(\bar{\bf x})=0\)

Proof of the equivalence:

Necessity: if \(\bar{\bf x}\) is a solution, then there exists \(\bar{\boldsymbol{\alpha}}\) such that \((\bar{\bf x}, \bar{\boldsymbol{\alpha}})\) is a saddle point of the Lagrange function, which yields the KKT conditions. The first condition is the saddle-point (stationarity) property in \(\bf x\); for the second and third:

\[\begin{aligned} \forall {{\boldsymbol{\alpha}}},{\cal L}(\bar{\bf x}, {\boldsymbol{\alpha}})\le{\cal L}(\bar{\bf x}, \bar{\boldsymbol{\alpha}})&\implies{\boldsymbol{\alpha}}\cdot {\bf g}(\bar{\bf x})\le \bar{\boldsymbol{\alpha}}\cdot {\bf g}(\bar{\bf x})\\{\boldsymbol{\alpha}}\to +\infty&\implies {\bf g}(\bar{\bf x})\le 0 \\ {\boldsymbol{\alpha}}\to 0 &\implies \bar{\boldsymbol{\alpha}}\cdot{\bf g}(\bar{\bf x})=0 \end{aligned} \]

Sufficiency: if the KKT conditions hold, then for any \(\bf x\) satisfying \({\bf g}({\bf x})\le 0\):

\[\begin{aligned} f({\bf x}) - f(\bar{\bf x}) &\ge \nabla _{\bf x} f(\bar{\bf x})\cdot ({\bf x-\bar{x}}) & ;\text{convexity of $f$} \\ &= -\bar{\boldsymbol{\alpha}}\cdot \nabla _{\bf x} {\bf g}(\bar{\bf x})\cdot ({\bf x-\bar{x}}) &; \text{first cond}\\ &\ge -\bar{\boldsymbol{\alpha}}\cdot ({\bf g}({\bf x})-{\bf g}(\bar{\bf x})) &; \text{convexity of $g$}\\ &= -\bar{\boldsymbol{\alpha}}\cdot {\bf g}({\bf x}) \ge 0 &;\text{third cond} \end{aligned} \]
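The KKT conditions can be checked directly on a toy problem (my own illustrative choice): \(\min x^2\) s.t. \(g(x)=1-x\le 0\), whose solution is \(\bar x=1\) with multiplier \(\bar\alpha=2\):

```python
# Toy problem: min x^2  s.t.  g(x) = 1 - x <= 0; candidate x = 1, a = 2
x_bar, a_bar = 1.0, 2.0

stationarity    = 2 * x_bar + a_bar * (-1.0)  # f'(x) + a * g'(x), should be 0
primal_feasible = (1.0 - x_bar) <= 0.0        # g(x_bar) <= 0
dual_feasible   = a_bar >= 0.0                # a_bar >= 0
complementarity = a_bar * (1.0 - x_bar)       # a_bar * g(x_bar), should be 0

print(stationarity == 0.0 and primal_feasible and dual_feasible and complementarity == 0.0)  # True
```

Since the problem is convex, differentiable, and satisfies Slater's condition, these conditions certify that \(x=1\) is optimal.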

Fenchel duality

For convex optimization problems, we next analyze the case where \(f\) is not differentiable or may take infinite values.