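Types of perceptrons
- Discrete perceptron: the predicted output is either 0 or 1
- Continuous perceptron (logistic classifier): the predicted output can be any number between 0 and 1; ideally, points labeled 0 get an output close to 0 and points labeled 1 get an output close to 1
- Logistic regression algorithm: the algorithm used to train a logistic classifier
Sigmoid (logistic) function
- Sigmoid function: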
\[\begin{aligned}
& g(z) = \frac{1}{1 + e^{-z}},\ z \in (-\infty, +\infty),\ 0 < g(z) < 1 \\
& \text{when} \ z \in (-\infty, 0), \ 0 < g(z) < 0.5 \\
& \text{when} \ z \in [0, +\infty), \ 0.5 \leq g(z) < 1
\end{aligned}
\]
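- Decision boundary: the set of points where \(z = 0\), that is, where \(g(z) = 0.5\). Two example choices of \(z\):
\[\begin{aligned}
& \text{Linear decision boundary: } z = \vec{w} \cdot \vec{x} + b \\
& \text{Nonlinear decision boundary (for example): } z = x_1^2 + x_2^2 - 1
\end{aligned}
\]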
- Combining the sigmoid function with the linear decision boundary function (\(\hat{y}\) is the predicted value):
\[\begin{aligned}
& \hat{y} = g(z) = \frac{1}{1 + e^{-z}} \\
& \hat{y} = f_{\vec{w}, b}(\vec{x}) = \frac{1}{1 + e^{-(\vec{w} \cdot \vec{x} + b)}}
\end{aligned}
\]
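- Decision rule (the discrete label \(\hat{y}\), or the corresponding range of the continuous output \(f_{\vec{w}, b}(\vec{x})\)):
\[\begin{cases}
\hat{y} = 0 \ \text{or} \ 0 < f_{\vec{w}, b}(\vec{x}) < 0.5, \ \text{if} \ z = \vec{w} \cdot \vec{x} + b < 0 \\
\hat{y} = 1 \ \text{or} \ 0.5 \leq f_{\vec{w}, b}(\vec{x}) < 1, \ \text{if} \ z = \vec{w} \cdot \vec{x} + b \geq 0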
\end{cases}
\]
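The sigmoid function and the decision rule above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names (`sigmoid`, `predict_proba`, `predict_label`) and the parameter values are hypothetical, chosen here for the example, and the two-branch form of `sigmoid` is just one common way to avoid floating-point overflow for large negative \(z\).

```python
import math
from typing import Sequence


def sigmoid(z: float) -> float:
    """g(z) = 1 / (1 + e^(-z)), written so that math.exp never overflows."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)  # z < 0 here, so exp(z) is at most 1
    return ez / (1.0 + ez)


def predict_proba(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Continuous output f_{w,b}(x) = g(w . x + b)."""
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    return sigmoid(z)


def predict_label(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    """Discrete output: 1 when f_{w,b}(x) >= 0.5 (that is, z >= 0), else 0."""
    return 1 if predict_proba(w, b, x) >= 0.5 else 0


# Hypothetical parameters: the boundary is the line x1 + x2 - 3 = 0.
w, b = [1.0, 1.0], -3.0
print(predict_proba(w, b, [1.0, 1.0]))  # z = -1, so the output is below 0.5
print(predict_label(w, b, [1.0, 1.0]))  # point on the negative side of the line
print(predict_label(w, b, [2.0, 2.0]))  # point on the positive side of the line
```

Thresholding the continuous output at 0.5 gives the same label as checking the sign of \(z\), which is why the decision boundary is the set where \(z = 0\).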
Cost/loss function: the log loss function
- A single training example: \(\vec{x}^{(i)} = (x_1^{(i)}, x_2^{(i)}, \ldots, x_n^{(i)})\) and \(y^{(i)}\)
- Total number of training examples = \(m\)
- Log loss function:
\[\begin{aligned}
L(f_{\vec{w}, b}(\vec{x}^{(i)}), y^{(i)}) &=
\begin{cases}
-\ln [f_{\vec{w}, b}(\vec{x}^{(i)})], \ y^{(i)} = 1 \\
-\ln [1 - f_{\vec{w}, b}(\vec{x}^{(i)})], \ y^{(i)} = 0 \\
\end{cases}
\\ & = -y^{(i)} \ln [f_{\vec{w}, b}(\vec{x}^{(i)})] - [1 - y^{(i)}] \ln [1 - f_{\vec{w}, b}(\vec{x}^{(i)})]
\end{aligned}
\]
- Cost function:
\[\begin{aligned}
J(\vec{w}, b) &= \frac{1}{m} \sum_{i=1}^{m} L(f_{\vec{w}, b}(\vec{x}^{(i)}), y^{(i)}) \\
&= -\frac{1}{m} \sum_{i=1}^{m} \bigg(y^{(i)} \ln [f_{\vec{w}, b}(\vec{x}^{(i)})] + [1 - y^{(i)}] \ln [1 - f_{\vec{w}, b}(\vec{x}^{(i)})] \bigg)
\end{aligned}
\]
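As a rough illustration of the two formulas above, the following sketch computes the per-example log loss and the averaged cost \(J(\vec{w}, b)\) in plain Python. The names `log_loss` and `cost`, the clipping constant `eps`, and the toy data are assumptions made for this example; the clipping is not part of the formula and is only there to keep `math.log` away from 0.

```python
import math
from typing import Sequence


def sigmoid(z: float) -> float:
    """g(z) = 1 / (1 + e^(-z)), written so that math.exp never overflows."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def log_loss(f: float, y: int, eps: float = 1e-15) -> float:
    """Per-example loss L(f, y) = -y ln(f) - (1 - y) ln(1 - f)."""
    f = min(max(f, eps), 1.0 - eps)  # assumption: clip f so the logs stay finite
    return -y * math.log(f) - (1 - y) * math.log(1.0 - f)


def cost(w: Sequence[float], b: float,
         X: Sequence[Sequence[float]], Y: Sequence[int]) -> float:
    """Cost J(w, b): the mean log loss over all m training examples."""
    total = 0.0
    for x, y in zip(X, Y):
        z = sum(wj * xj for wj, xj in zip(w, x)) + b
        total += log_loss(sigmoid(z), y)
    return total / len(X)


# Hypothetical toy data: two features per example, labels 0 or 1.
X = [[0.5, 1.0], [1.0, 0.5], [2.0, 2.5], [3.0, 2.0]]
Y = [0, 0, 1, 1]
print(cost([0.0, 0.0], 0.0, X, Y))   # every output is 0.5, so the cost is ln 2
print(cost([1.0, 1.0], -3.0, X, Y))  # a boundary that separates the toy data
```

With all parameters at zero every prediction is 0.5, so each example contributes \(\ln 2\); a parameter choice that puts the examples on the correct side of the boundary gives a lower cost.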
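Gradient descent algorithm
- \(\alpha\): the learning rate, which controls the step size of each gradient descent update. In general, gradient descent converges to a local minimum of the cost function, not necessarily the global minimum. The log loss cost \(J(\vec{w}, b)\) of logistic regression is convex, however, so any local minimum it reaches is also the global minimum.
- Gradient descent for logistic regression: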
\[\begin{aligned}
& \text{repeat} \ \{ \\
& \quad tmp\_w_1 = w_1 - \alpha \frac{1}{m} \sum^{m}_{i=1} [f_{\vec{w},b}(\vec{x}^{(i)}) - y^{(i)}] x_1^{(i)} \\
& \quad tmp\_w_2 = w_2 - \alpha \frac{1}{m} \sum^{m}_{i=1} [f_{\vec{w},b}(\vec{x}^{(i)}) - y^{(i)}] x_2^{(i)} \\
& \quad \ldots \\
& \quad tmp\_w_n = w_n - \alpha \frac{1}{m} \sum^{m}_{i=1} [f_{\vec{w},b}(\vec{x}^{(i)}) - y^{(i)}] x_n^{(i)} \\
& \quad tmp\_b = b - \alpha \frac{1}{m} \sum^{m}_{i=1} [f_{\vec{w},b}(\vec{x}^{(i)}) - y^{(i)}] \\
& \quad \text{simultaneously update all parameters: } w_j = tmp\_w_j \ (j = 1, \ldots, n), \ b = tmp\_b \\
& \} \ \text{until convergence}
\end{aligned}
\]
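The update rule above might be implemented along the following lines. This is a minimal sketch, not a reference implementation: the function names, the zero initialization, the toy data, and the default `alpha` are assumptions for illustration, and a fixed iteration count stands in for the "until convergence" test.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """g(z) = 1 / (1 + e^(-z)), written so that math.exp never overflows."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def gradients(w: Sequence[float], b: float,
              X: Sequence[Sequence[float]], Y: Sequence[int]
              ) -> Tuple[List[float], float]:
    """Return (dJ/dw_1..dJ/dw_n, dJ/db) for the log loss cost J(w, b)."""
    m, n = len(X), len(w)
    dw, db = [0.0] * n, 0.0
    for x, y in zip(X, Y):
        err = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) - y
        for j in range(n):
            dw[j] += err * x[j]
        db += err
    return [d / m for d in dw], db / m


def gradient_descent(X, Y, alpha: float = 0.1, iterations: int = 1000):
    """Run a fixed number of updates (an assumption replacing a convergence test)."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(iterations):
        dw, db = gradients(w, b, X, Y)  # both computed from the old (w, b)
        # Simultaneous update: no parameter sees another's new value.
        w = [wj - alpha * dj for wj, dj in zip(w, dw)]
        b = b - alpha * db
    return w, b


# Hypothetical, linearly separable toy data.
X = [[-2.0, -1.0], [-1.0, -2.0], [-1.0, -1.0], [1.0, 1.0], [1.0, 2.0], [2.0, 1.0]]
Y = [0, 0, 0, 1, 1, 1]
w, b = gradient_descent(X, Y)
print(w, b)
```

Computing all gradients before touching any parameter is what makes the update simultaneous; updating `w` first and then recomputing the error for `b` would be a different algorithm.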
Regularized logistic regression
- Purpose of regularization: to reduce overfitting (collecting more training examples can also help).
- Loss/cost function (only \(w\) needs to be regularized, not \(b\)):
\[\begin{aligned}
J(\vec{w}, b) &= -\frac{1}{m} \sum_{i=1}^{m} \bigg(y^{(i)} \ln [f_{\vec{w}, b}(\vec{x}^{(i)})] + [1 - y^{(i)}] \ln [1 - f_{\vec{w}, b}(\vec{x}^{(i)})] \bigg) + \frac{\lambda}{2m} \sum^{n}_{j=1} w_j^2
\end{aligned}
\]
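Here the second term is the regularization term, which penalizes large weights and so pushes each \(w_j\) toward smaller values. The larger the chosen \(\lambda\), the smaller the resulting \(w_j\).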
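- Gradient descent algorithm (the regularization term contributes \(\frac{\lambda}{m} w_j\) to each gradient \(\frac{\partial J}{\partial w_j}\), so it is scaled by \(\alpha\) and subtracted together with the data term):
\[\begin{aligned}
& \text{repeat} \ \{ \\
& \quad tmp\_w_1 = w_1 - \alpha \bigg( \frac{1}{m} \sum^{m}_{i=1} [f_{\vec{w},b}(\vec{x}^{(i)}) - y^{(i)}] x_1^{(i)} + \frac{\lambda}{m} w_1 \bigg) \\
& \quad tmp\_w_2 = w_2 - \alpha \bigg( \frac{1}{m} \sum^{m}_{i=1} [f_{\vec{w},b}(\vec{x}^{(i)}) - y^{(i)}] x_2^{(i)} + \frac{\lambda}{m} w_2 \bigg) \\
& \quad \ldots \\
& \quad tmp\_w_n = w_n - \alpha \bigg( \frac{1}{m} \sum^{m}_{i=1} [f_{\vec{w},b}(\vec{x}^{(i)}) - y^{(i)}] x_n^{(i)} + \frac{\lambda}{m} w_n \bigg) \\
& \quad tmp\_b = b - \alpha \frac{1}{m} \sum^{m}_{i=1} [f_{\vec{w},b}(\vec{x}^{(i)}) - y^{(i)}] \\
& \quad \text{simultaneously update all parameters: } w_j = tmp\_w_j \ (j = 1, \ldots, n), \ b = tmp\_b \\
& \} \ \text{until convergence}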
\end{aligned}
\]
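To see the effect of the regularization term, the sketch below runs gradient descent on the regularized cost, using the gradient \(\frac{1}{m} \sum_i [f_{\vec{w},b}(\vec{x}^{(i)}) - y^{(i)}] x_j^{(i)} + \frac{\lambda}{m} w_j\) for each \(w_j\) and leaving \(b\) unpenalized. The function name `gradient_descent_l2`, the parameter name `lam` (since `lambda` is a reserved word in Python), the toy data, and the fixed iteration count are all assumptions made for this example.

```python
import math


def sigmoid(z: float) -> float:
    """g(z) = 1 / (1 + e^(-z)), written so that math.exp never overflows."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def gradient_descent_l2(X, Y, alpha=0.1, lam=1.0, iterations=1000):
    """Gradient descent on the log loss cost plus (lam / 2m) * sum(w_j^2)."""
    m, n = len(X), len(X[0])
    w, b = [0.0] * n, 0.0
    for _ in range(iterations):
        dw, db = [0.0] * n, 0.0
        for x, y in zip(X, Y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) - y
            for j in range(n):
                dw[j] += err * x[j]
            db += err
        # The penalty adds (lam / m) * w_j to each weight gradient; b gets none.
        dw = [dj / m + (lam / m) * wj for dj, wj in zip(dw, w)]
        db /= m
        # Simultaneous update from gradients computed at the old (w, b).
        w = [wj - alpha * dj for wj, dj in zip(w, dw)]
        b -= alpha * db
    return w, b


# Hypothetical, linearly separable toy data.
X = [[-2.0, -1.0], [-1.0, -2.0], [-1.0, -1.0], [1.0, 1.0], [1.0, 2.0], [2.0, 1.0]]
Y = [0, 0, 0, 1, 1, 1]
w_plain, b_plain = gradient_descent_l2(X, Y, lam=0.0)  # no regularization
w_reg, b_reg = gradient_descent_l2(X, Y, lam=1.0)      # with regularization
print(w_plain, w_reg)
```

On this toy data the weights found with `lam=1.0` should come out smaller than those found with `lam=0.0`, in line with the statement that a larger \(\lambda\) yields smaller \(w_j\).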