Some concepts (with explanations) from learning ML

Published: 2023-12-04 23:39:55  Author: LacLic

random forest

an ensemble of decision trees that classifies by a majority vote among the trees (each tree's vote may carry a weight)
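The voting step can be sketched in a few lines. This is a minimal illustration, not a full random forest: the per-tree predictions and the weights below are made-up inputs, and `vote` is a helper name I chose.

```python
from collections import Counter

def vote(predictions, weights=None):
    """Combine per-tree class predictions by (optionally weighted) majority vote."""
    if weights is None:
        weights = [1.0] * len(predictions)  # unweighted: every tree counts equally
    tally = Counter()
    for label, w in zip(predictions, weights):
        tally[label] += w
    return tally.most_common(1)[0][0]  # class with the largest total vote

# Hypothetical predictions from three trees for one sample:
print(vote(["cat", "dog", "cat"]))                   # plain majority -> "cat"
print(vote(["cat", "dog", "dog"], [3.0, 1.0, 1.0]))  # weighted: 3.0 vs 2.0 -> "cat"
```

With weights, a single highly trusted tree can outvote several others, which is exactly the "vote may carry a weight" idea above.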

bagging and boosting

combining many weak models into one strong model

bagging

randomly sample the training data with replacement, drawing a bootstrap sample the same size as the original set

on average about 63.2% (≈ 1 − 1/e) of the original samples appear in each bootstrap sample

repeat this for each base classifier (e.g. SVM, decision tree, ...)

advantage: reduces variance
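The ≈ 63.2% figure is easy to check empirically: draw n indices with replacement from a set of size n and count how many distinct ones appear. A quick sketch:

```python
import random

random.seed(0)
n = 100_000
# Bootstrap sample: n draws with replacement from a dataset of size n
sample = [random.randrange(n) for _ in range(n)]
unique_fraction = len(set(sample)) / n
print(f"{unique_fraction:.3f}")  # close to 1 - 1/e ≈ 0.632
```

The exact expectation is 1 − (1 − 1/n)^n, which tends to 1 − 1/e as n grows; the samples never drawn form the "out-of-bag" set often used for validation.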

boosting (illustrated here with adaptive boosting, AdaBoost)

train the first classifier on the data as-is (uniform sample weights)

increase the weights of the misclassified samples when training the next classifier

repeat the steps above

assign each classifier a weight based on its accuracy, for the final weighted vote
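One round of the AdaBoost weight update can be sketched as follows. This is a simplified illustration under the standard two-class formulation, with function and variable names of my own choosing; a real implementation would also train a classifier each round.

```python
import math

def update_weights(weights, correct):
    """One AdaBoost round: up-weight misclassified samples, down-weight correct ones.

    weights -- current (normalized) sample weights
    correct -- per-sample booleans: did the current classifier get it right?
    Returns (new normalized weights, alpha), where alpha is this classifier's
    vote weight, derived from its weighted error rate.
    """
    err = sum(w for w, c in zip(weights, correct) if not c)  # weighted error
    alpha = 0.5 * math.log((1 - err) / err)  # higher accuracy -> larger alpha
    new = [w * math.exp(-alpha if c else alpha) for w, c in zip(weights, correct)]
    total = sum(new)
    return [w / total for w in new], alpha

# Toy example: 4 samples, uniform weights, one misclassified
w, alpha = update_weights([0.25] * 4, [True, True, True, False])
print([round(x, 3) for x in w], round(alpha, 3))
```

A known property visible here: after renormalization, the misclassified samples carry exactly half of the total weight, so the next classifier is forced to focus on them.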