[Notes] Convex Optimization

Published 2023-08-19 16:55:06 · Author: zrkc

Differentiation

Def. Gradient
\(f:{\cal X}\sube\mathbb{R} ^N\to \mathbb{R}\) is differentiable. Then the gradient of \(f\) at \({\bf x}\in\cal{X}\), denoted by \(\nabla f({\bf x})\), is defined by

\[\nabla f({\bf x}) = \begin{bmatrix} \frac{\partial f}{\partial {\bf x} _1}({\bf x}) \\ \vdots \\ \frac{\partial f}{\partial {\bf x} _N}({\bf x}) \end{bmatrix} \]

Def. Hessian
\(f:{\cal X}\sube\mathbb{R} ^N\to \mathbb{R}\) is twice differentiable. Then the Hessian of \(f\) at \({\bf x}\in\cal{X}\), denoted by \(\nabla ^2 f({\bf x})\), is defined by

\[\nabla ^2 f({\bf x}) = \begin{bmatrix} \frac{\partial ^2 f}{\partial {\bf x} _1 ^2}({\bf x}) & \cdots & \frac{\partial ^2 f}{\partial {\bf x} _1 \partial {\bf x} _N}({\bf x}) \\ \vdots & \ddots & \vdots \\ \frac{\partial ^2 f}{\partial {\bf x} _N \partial {\bf x} _1}({\bf x}) & \cdots & \frac{\partial ^2 f}{\partial {\bf x} _N ^2}({\bf x}) \end{bmatrix} \]

Th. Fermat's theorem
\(f:{\cal X}\sube\mathbb{R} ^N\to \mathbb{R}\) is differentiable. If \(f\) admits a local extremum at \({\bf x} ^*\), then \(\nabla f({\bf x} ^*)=0\)

Convexity

Def. Convex set
A set \({\cal X}\sube \mathbb{R} ^N\) is said to be convex, if for any \({\bf x, y}\in \cal X\), the segment \(\bf [x, y]\) lies in \(\cal X\), that is \(\{\alpha{\bf x} + (1-\alpha){\bf y}:0\le\alpha\le 1 \}\sube\cal X\)

Th. Operations that preserve convexity

  • \({\cal C} _i\) is convex for all \(i\in I\), then \(\bigcap _{i\in I} {\cal C} _i\) is also convex
  • \({\cal C} _1, {\cal C} _2\) are convex, then \({\cal C _1+C _2}=\{x _1+x _2:x _1\in {\cal C} _1, x _2\in {\cal C} _2 \}\) is also convex
  • \({\cal C} _1, {\cal C} _2\) are convex, then \({\cal C _1\times C _2}=\{(x _1, x _2):x _1\in {\cal C} _1, x _2\in {\cal C} _2 \}\) is also convex
  • Any projection of a convex set is also convex

Def. Convex hull
The Convex hull \(\text{conv}(\cal X)\) of set \({\cal X}\sube \mathbb{R} ^N\), is the minimal convex set containing \(\cal X\), that is

\[\text{conv}({\cal X})=\left\{ \sum _{i=1}^m \alpha _i {\bf x} _i: m\ge 1,\ {\bf x} _1,\dotsb, {\bf x} _m\in {\cal X},\ \alpha _i\ge 0,\ \sum _{i=1} ^m \alpha _i = 1 \right\} \]

Def. Epigraph
The epigraph of \(f:{\cal X}\to \mathbb{R}\), denoted by \(\text{Epi } f\), is defined by \(\{(x,y):x\in {\cal X}, y\ge f(x)\}\)

Def. Convex function
Let \(\cal X\) be a convex set. A function \(f:{\cal X}\to \mathbb{R}\) is said to be convex iff \(\text{Epi }f\) is convex, or equivalently, for all \({\bf x, y}\in {\cal X},\alpha \in [0,1]\)

\[f(\alpha{\bf x} + (1-\alpha){\bf y})\le \alpha f({\bf x}) + (1-\alpha) f({\bf y}) \]

Moreover, \(f\) is said to be strictly convex if the inequality is strict when \(\bf x\ne y\) and \(\alpha\in (0,1)\). \(f\) is said to be (strictly) concave if \(-f\) is (strictly) convex.
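The defining inequality is easy to sanity-check numerically. Below is a minimal sketch (the helper `is_convex_on_samples` and all its parameters are my own illustration, not from any library); sampling can only refute convexity, never prove it.

```python
import random

def is_convex_on_samples(f, lo, hi, trials=1000, tol=1e-9):
    """Test f(t*x + (1-t)*y) <= t*f(x) + (1-t)*f(y) on random triples."""
    random.seed(0)  # deterministic sampling
    for _ in range(trials):
        x, y = random.uniform(lo, hi), random.uniform(lo, hi)
        t = random.random()
        if f(t * x + (1 - t) * y) > t * f(x) + (1 - t) * f(y) + tol:
            return False  # found a violating triple
    return True

print(is_convex_on_samples(lambda x: x * x, -5, 5))   # True: x^2 is convex
print(is_convex_on_samples(lambda x: x ** 3, -5, 5))  # False: x^3 is not convex here
```

A `True` result only means no counterexample was found among the sampled triples.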

Th. Convex function characterized by first-order differential
\(f:{\cal X}\sube\mathbb{R} ^N\to \mathbb{R}\) is differentiable. Then \(f\) is convex iff \(\text{dom}(f)\) is convex, and

\[\forall {\bf x, y}\in \text{dom}(f),\quad f({\bf y})-f({\bf x})\ge \nabla f({\bf x})\cdot({\bf y-x}) \]

Th. Convex function characterized by second-order differential
\(f:{\cal X}\sube\mathbb{R} ^N\to \mathbb{R}\) is twice differentiable. Then \(f\) is convex iff \(\text{dom}(f)\) is convex, and its Hessian is positive semidefinite (半正定)

\[\forall {\bf x}\in \text{dom}(f),\quad \nabla ^2 f({\bf x})\succeq 0 \]

  • A symmetric matrix is positive semidefinite iff all of its eigenvalues are non-negative
  • If \(f\) is a scalar function (e.g. \(x\mapsto x ^2\)), then \(f\) is convex iff \(\forall x\in \text{dom}(f), f''(x)\ge 0\)
  • For example
    • Linear functions are both convex and concave
    • Any norm \(\Vert\cdot\Vert\) over a convex set \(\cal X\) is a convex function:
      \(\Vert\alpha{\bf x}+(1-\alpha){\bf y} \Vert\le \Vert\alpha{\bf x}\Vert+\Vert(1-\alpha){\bf y} \Vert= \alpha\Vert{\bf x}\Vert+(1-\alpha)\Vert{\bf y}\Vert\)
  • Using composition rules to prove convexity
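The second-order criterion suggests a simple numerical check: sample points and test whether the smallest eigenvalue of the Hessian is non-negative. A sketch (the helper `hessian_psd_at` and the quadratic example are my own illustration):

```python
import numpy as np

def hessian_psd_at(hess, points, tol=1e-9):
    """Check that the smallest Hessian eigenvalue is >= -tol at each point."""
    return all(np.linalg.eigvalsh(hess(x)).min() >= -tol for x in points)

# f(x) = x' A x has constant Hessian 2A
A = np.array([[2.0, 1.0], [1.0, 2.0]])   # eigenvalues 1 and 3: PSD
pts = [np.zeros(2), np.ones(2), np.array([3.0, -1.0])]
print(hessian_psd_at(lambda x: 2 * A, pts))  # True
```

For a non-quadratic \(f\), `hess` would evaluate the Hessian at each sampled point instead of returning a constant matrix.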

Th. Composition of convex/concave functions
Assume \(h:\mathbb{R}\to\mathbb{R}\) and \(g:\mathbb{R} ^N\to\mathbb{R}\) are twice differentiable. Define \(f({\bf x})=h(g({\bf x})), \forall {\bf x}\in \mathbb{R} ^N\), then

  • \(h\) is convex & non-decreasing, \(g\) is convex \(\implies\) \(f\) is convex
  • \(h\) is convex & non-increasing, \(g\) is concave \(\implies\) \(f\) is convex
  • \(h\) is concave & non-decreasing, \(g\) is concave \(\implies\) \(f\) is concave
  • \(h\) is concave & non-increasing, \(g\) is convex \(\implies\) \(f\) is concave

Proof sketch: the statement holds for \(N=1\), and this suffices, since \(f\) is convex (concave) iff it is convex (concave) along every line that intersects its domain.
Example: \(g\) could be any norm \(\Vert\cdot\Vert\)
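As a numerical illustration of the first composition rule, take \(h=\exp\) (convex and non-decreasing) and \(g(x)=x^2\) (convex); the randomized checker below is my own sketch, not a proof:

```python
import math, random

def convex_on_samples(f, lo, hi, trials=2000, tol=1e-9):
    """Randomized test of the convexity inequality (can only refute, not prove)."""
    random.seed(1)
    for _ in range(trials):
        x, y = random.uniform(lo, hi), random.uniform(lo, hi)
        t = random.random()
        if f(t * x + (1 - t) * y) > t * f(x) + (1 - t) * f(y) + tol:
            return False
    return True

# h = exp (convex, non-decreasing) composed with g(x) = x^2 (convex)
print(convex_on_samples(lambda x: math.exp(x * x), -2.0, 2.0))  # True
```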

Th. Pointwise maximum of convex functions
Let \(f _i\) be a convex function defined over a convex set \(\cal C\) for all \(i\in I\). Then \(f(x)=\sup _{i\in I}f _i(x)\), \(x\in \cal C\), is a convex function.
Proof: \(\text{Epi } f = \bigcap _{i\in I} \text{Epi } f _i\) is convex

  • \(f({\bf x})=\max _{i\in I}({\bf w} _i\cdot {\bf x}+b _i)\) over a convex set is a convex function
  • The maximum eigenvalue \(\lambda _{\max}({\bf M})\) over the set of symmetric matrices is a convex function, since \(\lambda _{\max}({\bf M})=\sup _{\Vert\bf x\Vert _2\le 1}{\bf x}'{\bf Mx}\) is the supremum of the linear functions \({\bf M}\mapsto{\bf x}'{\bf Mx}\)
    More generally, let \(\lambda _{k}({\bf M})\) denote the \(k\)-th largest eigenvalue; then \({\bf M}\mapsto \sum _{i=1} ^{k}\lambda _{i}({\bf M})\) is convex, while the sum of the \(k\) smallest eigenvalues, \({\bf M}\mapsto \sum _{i=n-k+1} ^{n}\lambda _{i}({\bf M})=-\sum _{i=1} ^{k}\lambda _i(-{\bf M})\), is concave
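The convexity of \(\lambda_{\max}\) can be probed numerically by sampling random symmetric matrices and checking the convexity inequality along segments (an illustrative sketch, not a proof):

```python
import numpy as np

def lam_max(M):
    """Largest eigenvalue of a symmetric matrix."""
    return np.linalg.eigvalsh(M).max()

rng = np.random.default_rng(0)
ok = True
for _ in range(200):
    A = rng.standard_normal((4, 4)); A = (A + A.T) / 2  # random symmetric
    B = rng.standard_normal((4, 4)); B = (B + B.T) / 2
    t = rng.random()
    # convexity inequality for lambda_max along the segment [B, A]
    ok = ok and lam_max(t * A + (1 - t) * B) <= t * lam_max(A) + (1 - t) * lam_max(B) + 1e-9
print(bool(ok))  # True
```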

Th. Partial infimum
Let \(f\) be a convex function defined over a convex set \({\cal C}\sube {\cal X}\times {\cal Y}\), and let \({\cal B}\sube {\cal Y}\) be a convex set. Then \({\cal A}=\{x\in {\cal X}:\exists y\in{\cal B}, (x, y)\in {\cal C} \}\) is a convex set if non-empty, and \(g(x)=\inf _{y\in\cal B} f(x, y)\), defined for all \(x\in \cal A\), is a convex function.
For example, the distance to a convex set \(\cal B\), \(d(x)=\inf _{y\in \cal B}\Vert x-y\Vert\), is a convex function
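For instance, taking \({\cal B}=[0,1]\subset\mathbb{R}\) (a hypothetical concrete choice), the distance function is \(d(x)=\max(0,\,-x,\,x-1)\), a pointwise maximum of affine functions, and its convexity can be sampled:

```python
import random

# d(x) = inf_{y in [0, 1]} |x - y| = max(0, -x, x - 1)
def d(x):
    return max(0.0, -x, x - 1.0)

random.seed(2)
ok = True
for _ in range(2000):
    x, y = random.uniform(-3, 3), random.uniform(-3, 3)
    t = random.random()
    ok = ok and d(t * x + (1 - t) * y) <= t * d(x) + (1 - t) * d(y) + 1e-12
print(ok)  # True
```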

Th. Jensen's inequality
Let \(X\) be a random variable taking values in a convex set \({\cal C}\sube \mathbb{R} ^N\), and let \(f\) be a convex function defined over \(\cal C\). Then \(\mathbb{E}[X]\in {\cal C}\), \(\mathbb{E}[f(X)]\) is finite, and

\[f(\mathbb{E}[X])\le \mathbb{E}[f(X)] \]

Sketch of proof: extend the inequality \(f(\sum _i \alpha _i x _i)\le\sum _i \alpha _i f(x _i)\), \(\alpha _i\ge 0\), \(\sum _i \alpha _i=1\), where the \(\alpha _i\) can be interpreted as probabilities, to arbitrary distributions.
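Jensen's inequality is easy to observe on simulated data; the sketch below uses a standard normal sample and \(f(x)=x^2\) (my own illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal(100_000)  # samples of a random variable X

f = np.square                     # convex f(x) = x^2
lhs = f(X.mean())                 # f(E[X]), close to f(0) = 0
rhs = f(X).mean()                 # E[f(X)], close to Var(X) = 1
print(bool(lhs <= rhs))  # True
```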

Constrained optimization

Def. Constrained optimization problem
Let \({\cal X}\sube \mathbb{R} ^N\) and \(f, g _i:{\cal X}\to\mathbb{R}, i\in [m]\). The constrained optimization problem (also called the primal problem) has the form

\[\min _{\bf x\in\cal X} f({\bf x}) \\ \text{subject to: } g _i({\bf x})\le 0, \forall i\in [m] \]

Write \(p ^*=\inf _{{\bf x}\in{\cal X}:\,{\bf g}({\bf x})\le {\bf 0}} f({\bf x})\) for its optimal value; note that no convexity is assumed so far. An equality constraint \(g=0\) can be expressed by the pair of constraints \(g\le 0, -g\le 0\).
To solve such problems, we first introduce the Lagrange function, which absorbs the constraints as non-positive terms.

Def. Lagrange function
For a constrained optimization problem, the associated Lagrange function is defined by

\[\forall {\bf x}\in {\cal X},\forall{\boldsymbol{\alpha}\ge 0},\quad {\cal L}({\bf x}, {\boldsymbol{\alpha}})=f({\bf x}) + \sum _{i=1}^{m} \alpha _i g _i({\bf x}) \]

where \({\boldsymbol{\alpha}}=(\alpha _1,\dotsb, \alpha _m)'\) are called the dual variables.
For an equality constraint \(g=0\), the corresponding coefficient \(\alpha=\alpha _+ - \alpha _-\) need not be non-negative (but the theorems below require \(g\) and \(-g\) to be convex simultaneously, so \(g\) must be affine, i.e. of the form \({\bf w\cdot x+b}\)).
Note that \(p ^* = \inf _{\bf x} \sup _{\boldsymbol{\alpha}\ge 0}{\cal L}({\bf x}, {\boldsymbol{\alpha}})\): whenever \(\bf x\) violates some constraint, the supremum over \(\boldsymbol{\alpha}\) is \(+\infty\), which is exactly how the constraints are encoded.
Here is the interesting part: we can construct a concave function, called the dual function.

Def. Dual function
For a constrained optimization problem, the associated dual function is defined by

\[\forall{\boldsymbol{\alpha}}\ge 0,\quad F({\boldsymbol{\alpha}})=\inf _{\bf x\in\cal X} {\cal L}({\bf x},{\boldsymbol{\alpha}})=\inf _{\bf x\in\cal X} \left(f({\bf x}) + \sum _{i=1}^{m} \alpha _i g _i({\bf x})\right) \]

It is concave: \(\cal L\) is an affine function of \(\boldsymbol{\alpha}\), and a pointwise infimum preserves concavity.
Also note that for any \({\boldsymbol{\alpha}}\ge 0\) and any feasible \(\bf x\), \(F({\boldsymbol{\alpha}})\le {\cal L}({\bf x},{\boldsymbol{\alpha}})\le f({\bf x})\), hence \(F({\boldsymbol{\alpha}})\le p ^*\).
This leads to the dual problem.

Def. Dual problem
For a constrained optimization problem, the associated dual problem is

\[\max _{\boldsymbol{\alpha}} F({\boldsymbol{\alpha}}) \\ \text{subject to: }{\boldsymbol{\alpha}}\ge 0 \]

The dual problem is always a convex optimization problem (maximizing a concave function); denote its optimal value by \(d ^*\). By the above, \(d ^*\le p ^*\), that is:

\[d ^* = \sup _{\boldsymbol{\alpha}} \inf _{\bf x} {\cal L}({\bf x}, {\boldsymbol{\alpha}})\le \inf _{\bf x} \sup _{\boldsymbol{\alpha}}{\cal L}({\bf x}, {\boldsymbol{\alpha}}) = p ^* \]

This is called weak duality; the case of equality is called strong duality.
Next we will show:

When a convex optimization problem satisfies a constraint qualification (Slater's condition), which is a sufficient condition, we have \(d ^*=p ^*\), and a point solves the problem iff it is a saddle point of the Lagrange function
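A tiny worked example of duality (the problem below is my own toy choice, not from the text): minimize \(x^2\) subject to \(1-x\le 0\), so \(p^*=1\) at \(x=1\). The dual function can be maximized on a grid:

```python
import numpy as np

# Toy problem: min x^2  s.t.  g(x) = 1 - x <= 0   =>   p* = 1 at x = 1
# L(x, a) = x^2 + a * (1 - x); minimizing over x (at x = a/2) gives
# the dual function F(a) = a - a^2 / 4, concave in a.
a = np.linspace(0.0, 10.0, 100_001)
F = a - a ** 2 / 4
d_star = F.max()                  # attained at a = 2
p_star = 1.0
print(abs(d_star - p_star) < 1e-6)  # True: strong duality holds here
```

The problem is convex and satisfies Slater's condition (e.g. \(\bar x = 2\) gives \(g(\bar x) < 0\)), so \(d^*=p^*\) is expected.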

Def. Constraint qualification (Slater's condition)
Assume the interior of the set \(\cal X\) is non-empty: \(\text{int}({\cal X})\ne \emptyset\)

  • Strong constraint qualification (Slater's condition):

\[\exists \bar{{\bf x}}\in\text{int}({\cal X}):\forall i\in [m], g _i(\bar{{\bf x}})< 0 \]

  • Weak constraint qualification (weak Slater's condition):

\[\exists \bar{{\bf x}}\in\text{int}({\cal X}):\forall i\in [m], (g _i(\bar{{\bf x}})< 0)\vee (g _i(\bar{{\bf x}})=0\wedge g _i \text{ affine}) \]

This condition feels a bit like saying that the feasible region is a convex set?
Based on Slater's condition, we now state that a saddle point of the Lagrange function is a necessary and sufficient condition for a solution of the constrained problem.

Th. Saddle point - sufficient condition
For a constrained optimization problem, if its Lagrange function admits a saddle point \(({\bf x} ^*, \boldsymbol{\alpha} ^*)\), i.e.

\[\forall {\bf x}\in {\cal X},\forall{\boldsymbol{\alpha}\ge 0},\quad {\cal L}({\bf x} ^*, {\boldsymbol{\alpha}})\le {\cal L}({\bf x} ^*, {\boldsymbol{\alpha}} ^*)\le {\cal L}({\bf x}, {\boldsymbol{\alpha}} ^*) \]

\({\bf x} ^*\) 是该问题的解,\(f({\bf x} ^*)=\inf f({\bf x})\)

Th. Saddle point - necessary condition
Assume \(f, g _i, i\in [m]\) are convex functions:

  • If Slater's condition holds, then any solution \(\bf x ^*\) of the constrained problem admits \(\boldsymbol{\alpha} ^*\ge 0\) such that \(({\bf x} ^*, \boldsymbol{\alpha} ^*)\) is a saddle point of the Lagrange function
  • If the weak Slater's condition holds and \(f, g _i\) are differentiable, then any solution \(\bf x ^*\) likewise admits \(\boldsymbol{\alpha} ^*\ge 0\) such that \(({\bf x} ^*, \boldsymbol{\alpha} ^*)\) is a saddle point of the Lagrange function

Proof of the equivalence. The book does not give the necessity proof, and the sufficiency proof is easy but not elegant, so instead of copying it I sketch my own reasoning (which may have flaws; the figure below is only illustrative):

Return to the earlier inequality:

\[d ^* = \sup _{\boldsymbol{\alpha}} \inf _{\bf x} {\cal L}({\bf x}, {\boldsymbol{\alpha}})\le\inf _{\bf x} \sup _{\boldsymbol{\alpha}}{\cal L}({\bf x}, {\boldsymbol{\alpha}}) = p ^* \]

Let \({\bf x} ^*\) attain \(p ^*=\sup _{\boldsymbol{\alpha}}{\cal L}({\bf x} ^*, {\boldsymbol{\alpha}})\le \sup _{\boldsymbol{\alpha}}{\cal L}({\bf x}, {\boldsymbol{\alpha}}),\ \forall {\bf x}\)
Let \({\boldsymbol{\alpha}} ^*\) attain \(d ^*=\inf _{\bf x} {\cal L}({\bf x}, {\boldsymbol{\alpha} ^*})\ge \inf _{\bf x} {\cal L}({\bf x}, {\boldsymbol{\alpha}}),\ \forall {\boldsymbol{\alpha}}\)

Sufficiency:
Suppose a saddle point exists. By contradiction (see the upper right of the figure), assume \(({\bf x} ^* , {\boldsymbol{\alpha}} ^*)\) is not a saddle point and write the saddle point as \(({\bf x}' , {\boldsymbol{\alpha}}')\). Then \({\cal L}({\bf x}', {\boldsymbol{\alpha}} ^*){\color{green}<} {\cal L}({\bf x}', {\boldsymbol{\alpha}}')=\inf _{\bf x}{\cal L}({\bf x}, {\boldsymbol{\alpha}}'){\color{green}\le} \inf _{\bf x} {\cal L}({\bf x}, {\boldsymbol{\alpha} ^*})\); comparing the two ends of this chain yields a contradiction, so \({\boldsymbol{\alpha} ^*}={\boldsymbol{\alpha}}'\); the same argument works for \(\bf x ^*, x'\).
Hence \(({\bf x} ^* , {\boldsymbol{\alpha}} ^*)\) is a saddle point, i.e. \(\forall {\bf x},\forall {\boldsymbol{\alpha}},\ {\cal L}({\bf x} ^*, {\boldsymbol{\alpha}})\le {\cal L}({\bf x} ^*, {\boldsymbol{\alpha}} ^*)\le {\cal L}({\bf x}, {\boldsymbol{\alpha}} ^*)\); it follows that \(\sup _{\boldsymbol{\alpha}}{\cal L}({\bf x} ^*, {\boldsymbol{\alpha}}) = {\cal L}({\bf x} ^*, {\boldsymbol{\alpha}} ^*) = \inf _{\bf x} {\cal L}({\bf x}, {\boldsymbol{\alpha} ^*})\), and therefore \(p ^* = d ^*\)

Necessity:
Suppose \(p ^* =d ^*\), i.e. \(\sup _{\boldsymbol{\alpha}}{\cal L}({\bf x} ^*, {\boldsymbol{\alpha}})=\inf _{\bf x} {\cal L}({\bf x}, {\boldsymbol{\alpha} ^*})\). Since \(\sup _{\boldsymbol{\alpha}}{\cal L}({\bf x} ^*, {\boldsymbol{\alpha}})\ge {\cal L}({\bf x} ^*, {\boldsymbol{\alpha} ^*})\ge \inf _{\bf x} {\cal L}({\bf x}, {\boldsymbol{\alpha} ^*})\), all three quantities are equal. Hence \(\forall {\bf x},\forall {\boldsymbol{\alpha}},\ {\cal L}({\bf x} ^*, {\boldsymbol{\alpha}})\le \sup _{\boldsymbol{\alpha}}{\cal L}({\bf x} ^*, {\boldsymbol{\alpha}})={\cal L}({\bf x} ^*, {\boldsymbol{\alpha} ^*})=\inf _{\bf x} {\cal L}({\bf x}, {\boldsymbol{\alpha} ^*})\le {\cal L}({\bf x}, {\boldsymbol{\alpha} ^*})\), so \(({\bf x} ^* , {\boldsymbol{\alpha}} ^*)\) is a saddle point

Neat!

To sum up, everything above is captured by one theorem: KKT
Th. Karush-Kuhn-Tucker theorem
For the constrained optimization problem, assume \(f, g _i:{\cal X}\to\mathbb{R},\forall i\in[m]\) are convex and differentiable, and that Slater's condition holds:

\[\min _{\bf x\in\cal X, {\bf g}({\bf x})\le {\bf 0}} f({\bf x}) \]

Its Lagrange function is \({\cal L}({\bf x}, {\boldsymbol{\alpha}})=f({\bf x}) + {\boldsymbol{\alpha}}\cdot {\bf g}({\bf x}), {\boldsymbol{\alpha}}\ge 0\)
Then \(\bar{\bf x}\) is a solution of the problem iff there exists \(\bar{\boldsymbol{\alpha}}\ge 0\) such that:

\[\begin{aligned} \nabla _{\bf x} {\cal L}(\bar{\bf x}, \bar{\boldsymbol{\alpha}}) &= \nabla _{\bf x} f(\bar{\bf x}) + \bar{\boldsymbol{\alpha}}\cdot \nabla _{\bf x} {\bf g}(\bar{\bf x}) = 0 \\ \nabla _{\boldsymbol{\alpha}} {\cal L}(\bar{\bf x}, \bar{\boldsymbol{\alpha}})&={\bf g}(\bar{\bf x})\le 0 \\ \bar{\boldsymbol{\alpha}}\cdot {\bf g}(\bar{\bf x}) &= 0 \end{aligned}\quad ;\text{KKT conditions} \]

The last two conditions are called the complementarity conditions: for every \(i\in[m]\), \(\bar{\alpha} _i\ge 0\), \(g _i(\bar{\bf x})\le 0\), and \(\bar{\alpha} _i g _i(\bar{\bf x})=0\)

Proof of the equivalence:

Necessity: if \(\bar{\bf x}\) is a solution, then there exists \(\bar{\boldsymbol{\alpha}}\) such that \((\bar{\bf x}, \bar{\boldsymbol{\alpha}})\) is a saddle point of the Lagrange function, which yields the KKT conditions. The first condition is the saddle-point (stationarity) property in \(\bf x\); for the second and third:

\[\begin{aligned} \forall {{\boldsymbol{\alpha}}},{\cal L}(\bar{\bf x}, {\boldsymbol{\alpha}})\le{\cal L}(\bar{\bf x}, \bar{\boldsymbol{\alpha}})&\implies{\boldsymbol{\alpha}}\cdot {\bf g}(\bar{\bf x})\le \bar{\boldsymbol{\alpha}}\cdot {\bf g}(\bar{\bf x})\\{\boldsymbol{\alpha}}\to +\infty&\implies {\bf g}(\bar{\bf x})\le 0 \\ {\boldsymbol{\alpha}}\to 0 &\implies \bar{\boldsymbol{\alpha}}\cdot{\bf g}(\bar{\bf x})=0 \end{aligned} \]

Sufficiency: if the KKT conditions hold, then for any \(\bf x\) satisfying \({\bf g}({\bf x})\le 0\):

\[\begin{aligned} f({\bf x}) - f(\bar{\bf x}) &\ge \nabla _{\bf x} f(\bar{\bf x})\cdot ({\bf x-\bar{x}}) & ;\text{convexity of $f$} \\ &= -\bar{\boldsymbol{\alpha}}\cdot \nabla _{\bf x} {\bf g}(\bar{\bf x})\cdot ({\bf x-\bar{x}}) &; \text{first cond}\\ &\ge -\bar{\boldsymbol{\alpha}}\cdot ({\bf g}({\bf x})-{\bf g}(\bar{\bf x})) &; \text{convexity of $g$}\\ &= -\bar{\boldsymbol{\alpha}}\cdot {\bf g}({\bf x}) \ge 0 &;\text{third cond} \end{aligned} \]
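The KKT conditions can be checked directly on a toy problem (my own illustrative choice): \(\min x^2\) s.t. \(g(x)=1-x\le 0\), whose solution is \(\bar x=1\) with multiplier \(\bar\alpha=2\):

```python
# Toy problem: min x^2  s.t.  g(x) = 1 - x <= 0; candidate x = 1, a = 2
x_bar, a_bar = 1.0, 2.0

stationarity    = 2 * x_bar + a_bar * (-1.0)  # f'(x) + a * g'(x), should be 0
primal_feasible = (1.0 - x_bar) <= 0.0        # g(x_bar) <= 0
dual_feasible   = a_bar >= 0.0                # a_bar >= 0
complementarity = a_bar * (1.0 - x_bar)       # a_bar * g(x_bar), should be 0

print(stationarity == 0.0 and primal_feasible and dual_feasible and complementarity == 0.0)  # True
```

Since the problem is convex, differentiable, and satisfies Slater's condition, these conditions certify that \(x=1\) is optimal.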

Fenchel duality

For convex optimization problems, we next analyze the case where \(f\) is not differentiable or may take infinite values.