【AL&MT】Decision Tree

Published: 2023-08-26 22:18:01  Author: TangBao~

1 Introduction

  Common decision tree algorithms: ID3, C4.5, CART

  ID3: based on information entropy and information gain

  C4.5: based on the information gain ratio; builds on ID3

  CART: based on the Gini index; usable for regression or classification

2 Implementation

  1.1 ID3 decision tree

  D: training set; a: attribute with V possible values; $D^{v}$: the subset of D in which a takes the value $a^{v}$

  $input:a=\{a^{1},a^{2},...,a^{V}\}$

  $output:Gain(D,a)$

  model:

  $p_{i}$: the proportion of samples in $D$ that belong to class $i$, where $|n|$ is the number of classes

  $Ent(D)=-\sum_{i=1}^{|n|}p_{i}log_{2}p_{i}$

  $Ent(D|a)=\sum^{V}_{v=1}\frac{|D^{v}|}{|D|}Ent(D^{v})$

  $Information\ Gain:Gain(D,a)=Ent(D)-Ent(D|a)$

  Choose the attribute with the largest Gain as the split node.
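  The formulas above can be sketched in a few lines of Python. This is only a minimal illustration using the standard library; the function names and the toy data (weather, weekend, sales) are assumptions made up for the example, not the dataset of this post.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Ent(D): base-2 entropy of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def conditional_entropy(values, labels):
    """Ent(D|a): entropy of each subset D^v, weighted by |D^v| / |D|."""
    total = len(labels)
    subsets = {}
    for v, y in zip(values, labels):
        subsets.setdefault(v, []).append(y)
    return sum(len(s) / total * entropy(s) for s in subsets.values())


def gain(values, labels):
    """Gain(D, a) = Ent(D) - Ent(D|a)."""
    return entropy(labels) - conditional_entropy(values, labels)


# Hypothetical toy data, invented for illustration only.
weather = ["good", "good", "bad", "bad"]
weekend = ["yes", "no", "yes", "no"]
sales = ["high", "high", "low", "low"]

gains = {"weather": gain(weather, sales), "weekend": gain(weekend, sales)}
best = max(gains, key=gains.get)  # attribute with the largest Gain
print(gains, best)
```

  In this toy data, weather separates the two sales classes perfectly while weekend does not separate them at all, so weather is picked as the split attribute.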

  1.2 C4.5 decision tree

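  Defect of ID3: information gain favors attributes that take many distinct values, so splitting on such an attribute gives a less accurate classifier.

  C4.5 corrects this by dividing the gain by the intrinsic value (IV) of the attribute.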

  $Gain:Gain(D,a)=Ent(D)-Ent(D|a)$

  $intrinsic\ value\ of\ a:$

  $IV(a)=-\sum^{V}_{v=1}\frac{|D^{v}|}{|D|}log_{2}\frac{|D^{v}|}{|D|}$

  $GainRatio(D,a)=\frac{Gain(D,a)}{IV(a)}$
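  A small sketch of the gain ratio, assuming the same list-based representation of an attribute column and a label column. The helper names and the toy data are hypothetical. Note that IV(a) has the same form as the entropy of the attribute's own value distribution, so the sketch reuses one entropy function for both.

```python
from collections import Counter
from math import log2


def entropy(items):
    """Base-2 entropy of the value distribution in a list."""
    total = len(items)
    return -sum((c / total) * log2(c / total) for c in Counter(items).values())


def gain(values, labels):
    """Gain(D, a) = Ent(D) - Ent(D|a)."""
    total = len(labels)
    subsets = {}
    for v, y in zip(values, labels):
        subsets.setdefault(v, []).append(y)
    return entropy(labels) - sum(len(s) / total * entropy(s) for s in subsets.values())


def gain_ratio(values, labels):
    """GainRatio(D, a) = Gain(D, a) / IV(a)."""
    iv = entropy(values)  # IV(a): entropy of the attribute's values
    if iv == 0:
        return 0.0  # attribute has a single value, so it cannot split D
    return gain(values, labels) / iv


# Hypothetical toy data: row_id is unique for every sample.
row_id = [1, 2, 3, 4]
weather = ["good", "good", "bad", "bad"]
sales = ["high", "high", "low", "low"]

print(gain(row_id, sales), gain(weather, sales))
print(gain_ratio(row_id, sales), gain_ratio(weather, sales))
```

  Both attributes reach the same information gain here, but the many-valued row_id has a larger IV and therefore a smaller gain ratio, which is the correction C4.5 is after.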

  1.3 CART decision tree

  CART (Classification and Regression Tree) uses the Gini index to divide the samples.

  The decision tree models in Python's sklearn use the CART method.

  -Classification tree: for a discrete target

  -Regression tree: for a continuous target

  $Gini(D)=\sum^{|n|}_{i=1}\sum_{i'\neq i}p_{i}p_{i'}=1-\sum^{|n|}_{i=1}p^{2}_{i}$

  $GiniIndex(D,a)=\sum^{V}_{v=1}\frac{|D^{v}|}{|D|}Gini(D^{v})$
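  As a rough sketch, the Gini value can be computed as one minus the sum of squared class proportions, and the Gini index of an attribute as the $|D^{v}|/|D|$-weighted sum of $Gini(D^{v})$ over its subsets. The function names and toy data below are assumptions for illustration. The attribute with the smallest Gini index is the preferred split.

```python
from collections import Counter


def gini(labels):
    """Gini(D) = 1 - sum of squared class proportions."""
    total = len(labels)
    return 1.0 - sum((c / total) ** 2 for c in Counter(labels).values())


def gini_index(values, labels):
    """GiniIndex(D, a): Gini(D^v) weighted by |D^v| / |D|."""
    total = len(labels)
    subsets = {}
    for v, y in zip(values, labels):
        subsets.setdefault(v, []).append(y)
    return sum(len(s) / total * gini(s) for s in subsets.values())


# Hypothetical toy data, invented for illustration only.
weather = ["good", "good", "bad", "bad"]
weekend = ["yes", "no", "yes", "no"]
sales = ["high", "high", "low", "low"]

scores = {"weather": gini_index(weather, sales), "weekend": gini_index(weekend, sales)}
best = min(scores, key=scores.get)  # smallest Gini index gives the purest split
print(gini(sales), scores, best)
```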

3 Example

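  Restaurant company T is a large chain that makes many kinds of products and has many branches in different locations. For senior management, knowing whether sales differ significantly between weekends and non-weekends, and whether factors such as weather and promotions affect store sales, is very important for choosing sound marketing strategies and raising profit. So that decision makers can understand the factors related to sales accurately, a model is needed to analyze how weather, whether it is a weekend, and whether there is a promotion affect sales. The values of each attribute are as follows: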

4 Code

...

5 Problems

...