Implementing a decision tree classification algorithm in Python: a code example
Background
1. Decision trees
A decision tree is a widely used classification algorithm and a form of supervised learning: given a batch of samples, each with a set of attributes and a classification result, the algorithm learns from these samples and produces a decision tree that can assign a suitable class to new data.
2. Sample data
Suppose there are 14 users; their personal attributes, and whether each bought a certain product, are as follows:
No. | Age | Income | Job type | Credit rating | Purchase decision |
---|---|---|---|---|---|
01 | <30 | high | unstable | fair | no |
02 | <30 | high | unstable | good | no |
03 | 30-40 | high | unstable | fair | yes |
04 | >40 | medium | unstable | fair | yes |
05 | >40 | low | stable | fair | yes |
06 | >40 | low | stable | good | no |
07 | 30-40 | low | stable | good | yes |
08 | <30 | medium | unstable | fair | no |
09 | <30 | low | stable | fair | yes |
10 | >40 | medium | stable | fair | yes |
11 | <30 | medium | stable | good | yes |
12 | 30-40 | medium | unstable | good | yes |
13 | 30-40 | high | stable | fair | yes |
14 | >40 | medium | unstable | good | no |
Decision tree classification algorithm
1. Building the data set
For easier processing, the sample data is converted to numeric lists using the following encoding:
Age: <30 → 0; 30-40 → 1; >40 → 2
Income: low → 0; medium → 1; high → 2
Job type: unstable → 0; stable → 1
Credit rating: fair → 0; good → 1
```python
# Create the data set
def createdataset():
    dataSet = [[0, 2, 0, 0, 'N'],
               [0, 2, 0, 1, 'N'],
               [1, 2, 0, 0, 'Y'],
               [2, 1, 0, 0, 'Y'],
               [2, 0, 1, 0, 'Y'],
               [2, 0, 1, 1, 'N'],
               [1, 0, 1, 1, 'Y'],
               [0, 1, 0, 0, 'N'],
               [0, 0, 1, 0, 'Y'],
               [2, 1, 1, 0, 'Y'],
               [0, 1, 1, 1, 'Y'],
               [1, 1, 0, 1, 'Y'],
               [1, 2, 1, 0, 'Y'],
               [2, 1, 0, 1, 'N']]
    labels = ['age', 'income', 'job', 'credit']
    return dataSet, labels
```
Calling the function returns the data:
```python
ds1, lab = createdataset()
print(ds1)
print(lab)
```
[[0, 2, 0, 0, 'N'], [0, 2, 0, 1, 'N'], [1, 2, 0, 0, 'Y'], [2, 1, 0, 0, 'Y'], [2, 0, 1, 0, 'Y'], [2, 0, 1, 1, 'N'], [1, 0, 1, 1, 'Y'], [0, 1, 0, 0, 'N'], [0, 0, 1, 0, 'Y'], [2, 1, 1, 0, 'Y'], [0, 1, 1, 1, 'Y'], [1, 1, 0, 1, 'Y'], [1, 2, 1, 0, 'Y'], [2, 1, 0, 1, 'N']]
['age', 'income', 'job', 'credit']
2. Entropy of the data set
Information entropy, also known as Shannon entropy, is the expected value of a random variable's information content; it measures how uncertain the information is. The larger the entropy, the harder the information is to pin down, and processing information is essentially a process of reducing entropy.
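Formally, if a fraction p(i) of the samples in data set D belongs to class i, the entropy is H(D) = -Σ p(i) · log2 p(i), summed over all classes. The function below computes exactly this over the class labels in the last column of each sample.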
```python
from math import log

def calcShannonEnt(dataSet):
    numEntries = len(dataSet)
    labelCounts = {}
    # Count occurrences of each class label (the last column)
    for featVec in dataSet:
        currentLabel = featVec[-1]
        if currentLabel not in labelCounts.keys():
            labelCounts[currentLabel] = 0
        labelCounts[currentLabel] += 1
    # Accumulate -p * log2(p) over all classes
    shannonEnt = 0.0
    for key in labelCounts:
        prob = float(labelCounts[key]) / numEntries
        shannonEnt -= prob * log(prob, 2)
    return shannonEnt
```
Entropy of the sample data:
```python
shan = calcShannonEnt(ds1)
print(shan)
```
0.9402859586706309
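As a quick sanity check: 9 of the 14 samples are 'Y' and 5 are 'N', so H = -(9/14)·log2(9/14) - (5/14)·log2(5/14) ≈ 0.940, matching the output above.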
3. Information gain
Information gain measures how much an attribute A reduces the entropy of a sample set X. The larger the gain, the better suited A is to classifying X.
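Concretely, splitting D on attribute A into subsets D_v (one per value v of A) gives Gain(D, A) = H(D) - Σ (|D_v| / |D|) · H(D_v), i.e. the original entropy minus the weighted average entropy of the subsets.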
```python
def chooseBestFeatureToSplit(dataSet):
    numFeatures = len(dataSet[0]) - 1
    baseEntropy = calcShannonEnt(dataSet)
    bestInfoGain = 0.0
    bestFeature = -1
    for i in range(numFeatures):
        # Distinct values taken by feature i
        featList = [example[i] for example in dataSet]
        uniqueVals = set(featList)
        # Weighted entropy of the subsets produced by splitting on feature i
        newEntropy = 0.0
        for value in uniqueVals:
            subDataSet = splitDataSet(dataSet, i, value)
            prob = len(subDataSet) / float(len(dataSet))
            newEntropy += prob * calcShannonEnt(subDataSet)
        infoGain = baseEntropy - newEntropy
        if infoGain > bestInfoGain:
            bestInfoGain = infoGain
            bestFeature = i
    return bestFeature
```
The code above implements the ID3 decision tree learning algorithm based on information gain. Its core logic: take each attribute in the attribute set in turn and partition the sample set into subsets by that attribute's values; compute the entropy of those subsets, and the difference between the sample set's entropy and their weighted entropy is the information gain of splitting on that attribute; the attribute whose gain is largest is the one used to split the sample set.
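Note that `chooseBestFeatureToSplit` (and `createTree` below) calls a `splitDataSet` helper that the original listing never defines. Here is a minimal sketch consistent with how it is called: it returns the rows whose feature at index `axis` equals `value`, with that feature column removed.

```python
def splitDataSet(dataSet, axis, value):
    # Keep the rows whose feature `axis` equals `value`,
    # dropping that feature column from each kept row
    retDataSet = []
    for featVec in dataSet:
        if featVec[axis] == value:
            retDataSet.append(featVec[:axis] + featVec[axis+1:])
    return retDataSet
```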
Computing the best attribute to split the samples on; the result is column 0, i.e. the age attribute:
```python
col = chooseBestFeatureToSplit(ds1)
col
```
0
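To see why age wins, work out its gain by hand: its three values split the samples into groups of 5 (2 'Y' / 3 'N'), 4 (all 'Y'), and 5 (3 'Y' / 2 'N'), so Gain(age) = 0.940 - (5/14)·0.971 - (4/14)·0 - (5/14)·0.971 ≈ 0.246, which turns out to be larger than the gain of income, job, or credit.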
4. Building the decision tree
```python
import operator

def majorityCnt(classList):
    classCount = {}
    for vote in classList:
        if vote not in classCount.keys():
            classCount[vote] = 0
        classCount[vote] += 1
    # Sort the {label: count} dict by count, descending, and return the majority label
    sortedClassCount = sorted(classCount.items(), key=operator.itemgetter(1), reverse=True)
    return sortedClassCount[0][0]

# Build the tree recursively
def createTree(dataSet, labels):
    classList = [example[-1] for example in dataSet]
    # All samples share one class: return it as a leaf
    if classList.count(classList[0]) == len(classList):
        return classList[0]
    # No features left to split on: return the majority class
    if len(dataSet[0]) == 1:
        return majorityCnt(classList)
    bestFeat = chooseBestFeatureToSplit(dataSet)
    bestFeatLabel = labels[bestFeat]
    myTree = {bestFeatLabel: {}}
    del(labels[bestFeat])  # note: this mutates the caller's labels list
    featValues = [example[bestFeat] for example in dataSet]
    uniqueVals = set(featValues)
    for value in uniqueVals:
        subLabels = labels[:]
        myTree[bestFeatLabel][value] = createTree(splitDataSet(dataSet, bestFeat, value), subLabels)
    return myTree
```
majorityCnt handles the following situation: ideally, every path of the final decision tree should end in samples that all share the same classification. Real samples, however, inevitably contain cases where all attributes agree but the classification differs; in that case majorityCnt assigns such samples the class label that occurs most often.
createTree is the core function: it applies the ID3 information-gain computation to the attributes in turn and recursively generates the decision tree.
5. Building the tree from the samples
Build the decision tree from the sample data:
```python
Tree = createTree(ds1, lab)
print("Decision tree for the sample data:")
print(Tree)
```
Decision tree for the sample data:
{'age': {0: {'job': {0: 'N', 1: 'Y'}},
         1: 'Y',
         2: {'credit': {0: 'Y', 1: 'N'}}}}
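Read directly: for users under 30 the decision hinges on job stability (unstable → 'N', stable → 'Y'); users aged 30-40 always buy; for users over 40 it hinges on credit rating (fair → 'Y', good → 'N').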
6. Classifying test samples
Given new users' information, predict whether each will buy the product:
Age | Income | Job type | Credit rating |
---|---|---|---|
<30 | low | stable | good |
<30 | high | unstable | good |
```python
def classify(inputtree, featlabels, testvec):
    # Root attribute of the (sub)tree and its branches
    firststr = list(inputtree.keys())[0]
    seconddict = inputtree[firststr]
    # Position of that attribute in the test vector
    featindex = featlabels.index(firststr)
    classlabel = None  # default if no branch matches
    for key in seconddict.keys():
        if testvec[featindex] == key:
            if type(seconddict[key]).__name__ == 'dict':
                # Internal node: recurse into the subtree
                classlabel = classify(seconddict[key], featlabels, testvec)
            else:
                # Leaf node: take its class label
                classlabel = seconddict[key]
    return classlabel
```
```python
# createTree deleted entries from lab, so rebuild the full label list
labels = ['age', 'income', 'job', 'credit']
tsvec = [0, 0, 1, 1]
print('result:', classify(Tree, labels, tsvec))
tsvec1 = [0, 2, 0, 1]
print('result1:', classify(Tree, labels, tsvec1))
```
result: Y
result1: N
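Tracing the tree confirms both results: the first user (age < 30, stable job) follows age=0 → job=1 → 'Y'; the second (age < 30, unstable job) follows age=0 → job=0 → 'N'.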
Appendix: plotting the decision tree
The following code draws the decision tree as a figure. It is not central to the decision tree algorithm itself, but may be worth studying if you are interested:
```python
import matplotlib.pyplot as plt

decisionNode = dict(boxstyle="sawtooth", fc="0.8")
leafNode = dict(boxstyle="round4", fc="0.8")
arrow_args = dict(arrowstyle="<-")

# Count the leaf nodes
def getNumLeafs(myTree):
    numLeafs = 0
    firstStr = list(myTree.keys())[0]
    secondDict = myTree[firstStr]
    for key in secondDict.keys():
        if type(secondDict[key]).__name__ == 'dict':  # a dict is an internal node, otherwise a leaf
            numLeafs += getNumLeafs(secondDict[key])
        else:
            numLeafs += 1
    return numLeafs

# Get the depth of the tree
def getTreeDepth(myTree):
    maxDepth = 0
    firstStr = list(myTree.keys())[0]
    secondDict = myTree[firstStr]
    for key in secondDict.keys():
        if type(secondDict[key]).__name__ == 'dict':  # a dict is an internal node, otherwise a leaf
            thisDepth = 1 + getTreeDepth(secondDict[key])
        else:
            thisDepth = 1
        if thisDepth > maxDepth:
            maxDepth = thisDepth
    return maxDepth

# Draw a node
def plotNode(nodeTxt, centerPt, parentPt, nodeType):
    createPlot.ax1.annotate(nodeTxt, xy=parentPt, xycoords='axes fraction',
                            xytext=centerPt, textcoords='axes fraction',
                            va="center", ha="center", bbox=nodeType, arrowprops=arrow_args)

# Label the connecting edge
def plotMidText(cntrPt, parentPt, txtString):
    xMid = (parentPt[0] - cntrPt[0]) / 2.0 + cntrPt[0]
    yMid = (parentPt[1] - cntrPt[1]) / 2.0 + cntrPt[1]
    createPlot.ax1.text(xMid, yMid, txtString, va="center", ha="center", rotation=30)

# Draw the tree structure recursively
def plotTree(myTree, parentPt, nodeTxt):
    numLeafs = getNumLeafs(myTree)  # determines the x width of this subtree
    depth = getTreeDepth(myTree)
    firstStr = list(myTree.keys())[0]  # the text label for this node
    cntrPt = (plotTree.xOff + (1.0 + float(numLeafs)) / 2.0 / plotTree.totalW, plotTree.yOff)
    plotMidText(cntrPt, parentPt, nodeTxt)
    plotNode(firstStr, cntrPt, parentPt, decisionNode)
    secondDict = myTree[firstStr]
    plotTree.yOff = plotTree.yOff - 1.0 / plotTree.totalD
    for key in secondDict.keys():
        if type(secondDict[key]).__name__ == 'dict':  # internal node: recurse
            plotTree(secondDict[key], cntrPt, str(key))
        else:  # leaf node: draw it
            plotTree.xOff = plotTree.xOff + 1.0 / plotTree.totalW
            plotNode(secondDict[key], (plotTree.xOff, plotTree.yOff), cntrPt, leafNode)
            plotMidText((plotTree.xOff, plotTree.yOff), cntrPt, str(key))
    plotTree.yOff = plotTree.yOff + 1.0 / plotTree.totalD

# Create the decision tree figure
def createPlot(inTree):
    fig = plt.figure(1, facecolor='white')
    fig.clf()
    axprops = dict(xticks=[], yticks=[])
    createPlot.ax1 = plt.subplot(111, frameon=False, **axprops)  # no ticks
    plotTree.totalW = float(getNumLeafs(inTree))
    plotTree.totalD = float(getTreeDepth(inTree))
    plotTree.xOff = -0.5 / plotTree.totalW
    plotTree.yOff = 1.0
    plotTree(inTree, (0.5, 1.0), '')
    plt.savefig('decision_tree.png', dpi=300, bbox_inches='tight')
    plt.show()
```
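To draw the tree built earlier, call createPlot on it; with the code above, the figure is displayed and also saved to decision_tree.png:

```python
createPlot(Tree)
```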
Summary
This concludes this walkthrough of implementing a decision tree classification algorithm in Python.