note_02
Keywords: Classification, Logistic Regression, Overfitting, Regularization
1 Motivation
Classification:
- "binary classification": \(y\) can only be one of two values
- the two values are called classes or categories
Try using linear regression to do classification:
It seems to work.
However, when another sample point is added:
The \(x\) value corresponding to the threshold \(0.5\) shifts to the right, which is worse because some points are then misclassified.
Logistic regression
- can be used to handle this situation
- despite its name, it is used for binary classification problems
2 Logistic Regression
2.1 Concept
sigmoid / logistic function: \(g(z)=\frac{1}{1+e^{-z}}\), \(0<g(z)<1\)
Logistic regression: \(f_{\vec{w},b}(\vec{x})=g(\vec{w}\cdot\vec{x}+b)=\frac{1}{1+e^{-(\vec{w}\cdot\vec{x}+b)}}\)
Understanding logistic regression: \(f_{\vec{w},b}(\vec{x})\) is the probability that the class or label \(y\) equals \(1\) given input \(\vec{x}\), i.e. \(f_{\vec{w},b}(\vec{x})=P(y=1\mid\vec{x};\vec{w},b)\).
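A minimal NumPy sketch of this model; the helper names `sigmoid` and `f_wb` and the parameter values are my own illustrations, not from the course:

```python
import numpy as np

def sigmoid(z):
    # Sigmoid / logistic function g(z) = 1 / (1 + e^(-z)); output lies in (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def f_wb(x, w, b):
    # Logistic regression model: f_{w,b}(x) = g(w . x + b).
    return sigmoid(np.dot(w, x) + b)

# The output is interpreted as P(y = 1 | x); illustrative values only.
w = np.array([1.0, -2.0])
b = 0.5
x = np.array([2.0, 1.0])
print(f_wb(x, w, b))  # ~0.62, since z = 1*2 - 2*1 + 0.5 = 0.5
```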
2.2 Decision Boundary
2.2.1 Threshold
A common threshold is \(0.5\): predict \(\hat{y}=1\) when \(f_{\vec{w},b}(\vec{x})\ge 0.5\), which happens exactly when \(z=\vec{w}\cdot\vec{x}+b\ge 0\); otherwise predict \(\hat{y}=0\) (see the sketch after 2.2.3).
2.2.2 Linear
With \(z=w_1x_1+w_2x_2+b\), the decision boundary \(z=0\) is a straight line in feature space.
2.2.3 Non-linear
With polynomial features, e.g. \(z=w_1x_1^2+w_2x_2^2+b\), the decision boundary \(z=0\) can be a circle or a more complex curve.
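A minimal sketch of the threshold rule from 2.2.1, assuming NumPy; `predict` and the linear boundary \(x_1+x_2-3=0\) are illustrative choices:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(X, w, b, threshold=0.5):
    # Predict y_hat = 1 when f_{w,b}(x) >= threshold.
    # g(z) >= 0.5 exactly when z >= 0, so the decision boundary is w . x + b = 0.
    return (sigmoid(X @ w + b) >= threshold).astype(int)

# Linear boundary x1 + x2 - 3 = 0:
w = np.array([1.0, 1.0])
b = -3.0
X = np.array([[1.0, 1.0],   # z = -1 -> predict 0
              [2.0, 2.0]])  # z = +1 -> predict 1
print(predict(X, w, b))     # [0 1]
```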
2.3 Cost Function
2.3.1 Squared Error Cost
How to choose \(\vec{w}\) and \(b\) for logistic regression?
The squared error cost function
\[J(\vec{w},b)=\frac{1}{m}\sum_{i=1}^{m}\frac{1}{2}\left(f_{\vec{w},b}(\vec{x}^{(i)})-y^{(i)}\right)^2\]
is not a good choice: with the sigmoid inside \(f_{\vec{w},b}\), it is non-convex, has many local minima, and is not as smooth as the "soup bowl" cost surface from linear regression.
2.3.2 Logistic Loss
We define the logistic loss function as:
\[L\left(f_{\vec{w},b}(\vec{x}^{(i)}),\,y^{(i)}\right)=\begin{cases}-\log\left(f_{\vec{w},b}(\vec{x}^{(i)})\right)&\text{if }y^{(i)}=1\\-\log\left(1-f_{\vec{w},b}(\vec{x}^{(i)})\right)&\text{if }y^{(i)}=0\end{cases}\]
To understand it: when \(y^{(i)}=1\), the loss approaches \(0\) as the prediction approaches \(1\) and grows without bound as the prediction approaches \(0\); the \(y^{(i)}=0\) case is symmetric.
With this loss, the cost curve is much better: it is convex, so gradient descent can reach the global minimum.
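A tiny numeric illustration of that intuition; `logistic_loss` is my own name for the case-defined loss above:

```python
import numpy as np

def logistic_loss(f, y):
    # Case-defined loss: -log(f) when y = 1, -log(1 - f) when y = 0.
    return -np.log(f) if y == 1 else -np.log(1.0 - f)

# When y = 1, a confident correct prediction costs almost nothing,
# while a confident wrong prediction is punished very heavily:
print(logistic_loss(0.99, 1))  # ~0.01
print(logistic_loss(0.01, 1))  # ~4.6, growing to infinity as f -> 0
```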
2.3.3 Simplified Cost Function
We can write the loss function in a single expression:
\[L\left(f_{\vec{w},b}(\vec{x}^{(i)}),\,y^{(i)}\right)=-y^{(i)}\log\left(f_{\vec{w},b}(\vec{x}^{(i)})\right)-\left(1-y^{(i)}\right)\log\left(1-f_{\vec{w},b}(\vec{x}^{(i)})\right)\]
since one of the two terms vanishes for each value of \(y^{(i)}\). The cost function then simplifies to:
\[J(\vec{w},b)=-\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log\left(f_{\vec{w},b}(\vec{x}^{(i)})\right)+\left(1-y^{(i)}\right)\log\left(1-f_{\vec{w},b}(\vec{x}^{(i)})\right)\right]\]
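A minimal NumPy sketch of this simplified cost; `compute_cost` is an illustrative name, not a library function:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def compute_cost(X, y, w, b):
    # Simplified logistic cost J(w, b), averaged over the m training examples.
    m = X.shape[0]
    f = sigmoid(X @ w + b)
    loss = -y * np.log(f) - (1 - y) * np.log(1 - f)
    return loss.sum() / m
```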
2.4 Gradient Descent
Find \(\vec{w}\) and \(b\) to minimize the cost function.
Given a new \(\vec{x}\), output \(f_{\vec{w},b}(\vec{x})=\frac{1}{1+e^{-(\vec{w}\cdot\vec{x}+b)}}\) as the estimate of \(P(y=1\mid\vec{x})\).
2.4.1 Implementation
It is quite remarkable that the partial derivatives of the cost function have the same form as in linear regression:
\[\frac{\partial J}{\partial w_j}=\frac{1}{m}\sum_{i=1}^{m}\left(f_{\vec{w},b}(\vec{x}^{(i)})-y^{(i)}\right)x_j^{(i)},\qquad\frac{\partial J}{\partial b}=\frac{1}{m}\sum_{i=1}^{m}\left(f_{\vec{w},b}(\vec{x}^{(i)})-y^{(i)}\right)\]
Only the definition of \(f_{\vec{w},b}\) differs.
Derivation
Here, I do the derivation for a single example. With \(f=g(z)\), \(z=\vec{w}\cdot\vec{x}+b\), and \(g'(z)=g(z)\left(1-g(z)\right)\), the chain rule gives:
\[\frac{\partial L}{\partial w_j}=\left(-\frac{y}{f}+\frac{1-y}{1-f}\right)f(1-f)\,x_j=\left(-y(1-f)+(1-y)f\right)x_j=(f-y)\,x_j\]
The other parts are similar.
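A minimal, self-contained sketch of these updates in NumPy; the toy 1-D data and all function names are illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def compute_gradients(X, y, w, b):
    # Same form as in linear regression; only f_{w,b} differs.
    m = X.shape[0]
    err = sigmoid(X @ w + b) - y       # f(x^(i)) - y^(i) for each example
    dj_dw = X.T @ err / m              # dJ/dw_j = (1/m) * sum(err_i * x_j^(i))
    dj_db = err.sum() / m              # dJ/db   = (1/m) * sum(err_i)
    return dj_dw, dj_db

def gradient_descent(X, y, alpha=0.1, iters=1000):
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(iters):
        dj_dw, dj_db = compute_gradients(X, y, w, b)
        w -= alpha * dj_dw
        b -= alpha * dj_db
    return w, b

# Toy 1-D data: small x labeled 0, large x labeled 1.
X = np.array([[0.5], [1.0], [1.5], [3.0], [3.5], [4.0]])
y = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])
w, b = gradient_descent(X, y)
print(sigmoid(X @ w + b).round(2))  # predicted probabilities rise with x
```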
3 Overfitting
3.1 Concept
underfitting
- doesn't fit the training set well
- high bias
just right
- fits the training set pretty well
- generalizes well to new examples
overfitting
- fits the training set extremely well
- high variance
3.2 Addressing Overfitting
Method 1: Collect more training examples
Method 2: Feature Selection - Select features to include / exclude
Method 3: Regularization
- a gentler approach
- keeps all features while preventing any one of them from having an overly large effect
Whether or not \(b\) is regularized makes little difference.
4 Regularization
4.1 Concept
Modify the cost function so that large parameters are penalized: intuitively, the modified cost can only be small when the \(w_j\) are small. So, set the cost function as:
\[J(\vec{w},b)=\frac{1}{2m}\sum_{i=1}^{m}\left(f_{\vec{w},b}(\vec{x}^{(i)})-y^{(i)}\right)^2+\frac{\lambda}{2m}\sum_{j=1}^{n}w_j^2\]
- \(\lambda\): regularization parameter, positive
- \(b\) can be included or excluded
Here you can see that if \(\lambda\) is too small (say \(0\)), the penalty disappears and the model will overfit; and if \(\lambda\) is too big (say \(10^{10}\)), all \(w_j\) are driven toward \(0\) and the model will underfit.
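A small sketch of this regularized cost, assuming NumPy; `regularized_cost` and `lam` are illustrative names:

```python
import numpy as np

def regularized_cost(X, y, w, b, lam):
    # Squared error term plus the L2 penalty (lambda / 2m) * sum(w_j^2).
    m = X.shape[0]
    err = X @ w + b - y
    return (err ** 2).sum() / (2 * m) + lam * (w ** 2).sum() / (2 * m)

# lam = 0 removes the penalty entirely (risk of overfitting);
# a huge lam makes the penalty dominate, forcing every w_j toward 0 (underfitting).
```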
4.2 Regularized Linear Regression
Gradient descent
repeat{
\(\quad w_j = w_j - \alpha\left[\frac{1}{m}\sum_{i=1}^{m}\left(f_{\vec{w},b}(\vec{x}^{(i)})-y^{(i)}\right)x_j^{(i)} + \frac{\lambda}{m}w_j\right]\)
\(\quad b = b - \alpha\,\frac{1}{m}\sum_{i=1}^{m}\left(f_{\vec{w},b}(\vec{x}^{(i)})-y^{(i)}\right)\)
}
Here, rewrite the \(w_j\) update:
\[w_j = w_j\left(1-\alpha\frac{\lambda}{m}\right) - \alpha\,\frac{1}{m}\sum_{i=1}^{m}\left(f_{\vec{w},b}(\vec{x}^{(i)})-y^{(i)}\right)x_j^{(i)}\]
The latter term is the usual update from unregularized gradient descent, and the former factor \(\left(1-\alpha\frac{\lambda}{m}\right)\) shrinks \(w_j\) slightly on every iteration.
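A one-step sketch of this update in NumPy, showing that adding \(\frac{\lambda}{m}w_j\) to the gradient is the same as the shrink-then-update form above; the function name is illustrative:

```python
import numpy as np

def gd_step_regularized(X, y, w, b, alpha, lam):
    # One gradient descent step for regularized linear regression.
    m = X.shape[0]
    err = X @ w + b - y                          # f_{w,b}(x^(i)) - y^(i)
    # Equivalent to w * (1 - alpha*lam/m) minus the usual update term:
    w = w - alpha * (X.T @ err / m + (lam / m) * w)
    b = b - alpha * err.sum() / m                # b is left unregularized here
    return w, b
```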
4.3 Regularized Logistic Regression
The same regularization term \(\frac{\lambda}{2m}\sum_{j=1}^{n}w_j^2\) is added to the logistic cost function, and the resulting gradient descent updates look identical to those for regularized linear regression; the only difference is that \(f_{\vec{w},b}(\vec{x})\) is now the sigmoid of \(\vec{w}\cdot\vec{x}+b\).
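A minimal NumPy sketch under that description, with illustrative names; only the definition of \(f\) changes relative to the linear case:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def regularized_logistic_cost(X, y, w, b, lam):
    # Logistic cost plus the same L2 penalty as in Section 4.2.
    m = X.shape[0]
    f = sigmoid(X @ w + b)
    loss = (-y * np.log(f) - (1 - y) * np.log(1 - f)).sum() / m
    return loss + lam * (w ** 2).sum() / (2 * m)

def gd_step(X, y, w, b, alpha, lam):
    # Update rule is identical to regularized linear regression,
    # except f_{w,b} is now sigmoid(w . x + b).
    m = X.shape[0]
    err = sigmoid(X @ w + b) - y
    w = w - alpha * (X.T @ err / m + (lam / m) * w)
    b = b - alpha * err.sum() / m
    return w, b
```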