Data Mining Experiment Report
"Data Mining" Lab Report
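Experiment No.:    Experiment title: C4.5 algorithm
Major and class: 12 Mathematical Finance    Student ID:    Name:
Date: 2014.12.24    Location: Laboratory Building 5-510    Instructor: 潘巍巍
I. Objectives and requirements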
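1. Choose a standard data mining data set, classify it with the C4.5 algorithm, report the classification accuracy, draw the tree induced by C4.5, and write out the resulting rule set.
2. On a standard data mining data set, compare the classification performance of the pruned and unpruned trees.
3. Summarize the advantages and disadvantages of the C4.5 algorithm.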
II. Equipment (environment) and requirements
A computer with WEKA 3.6.1
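III. Procedure and steps
(3) Data classification using the C4.5 implementation
1. Import the data.
2. Select the C4.5 classifier and run the classification. The result is:
The classification accuracy is 50%.
The generated decision tree is:
Classification rules: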
J48 pruned tree
------------------
outlook = sunny
|   humidity = high: no (3.0)
|   humidity = normal: yes (2.0)
outlook = overcast: yes (4.0)
outlook = rainy
|   windy = TRUE: no (2.0)
|   windy = FALSE: yes (3.0)
After pruning, the result is:
The classification accuracy becomes 57.1%, so performance improves.
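The J48 tree printed above can be read as one if-then rule per leaf. The following is a minimal Python sketch of that reading, not WEKA output: the names `RULES`, `classify`, and `format_rule` are illustrative, and the class attribute is written simply as `class` because the report does not name it.

```python
# Each rule: (conditions, predicted class, instance count shown at the leaf).
RULES = [
    ({"outlook": "sunny", "humidity": "high"}, "no", 3),
    ({"outlook": "sunny", "humidity": "normal"}, "yes", 2),
    ({"outlook": "overcast"}, "yes", 4),
    ({"outlook": "rainy", "windy": "TRUE"}, "no", 2),
    ({"outlook": "rainy", "windy": "FALSE"}, "yes", 3),
]


def classify(instance):
    """Return the class of the first rule whose conditions all match."""
    for conditions, label, _count in RULES:
        if all(instance.get(attr) == value for attr, value in conditions.items()):
            return label
    return None  # no rule matched (an attribute value outside the tree)


def format_rule(conditions, label):
    """Render one rule as readable IF ... THEN text."""
    body = " AND ".join(f"{attr} = {value}" for attr, value in conditions.items())
    return f"IF {body} THEN class = {label}"


for conditions, label, count in RULES:
    print(format_rule(conditions, label), f"({count} instances)")
```

The five leaf counts add up to 14, the number of instances covered by the tree.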
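(1) Advantages and disadvantages of the C4.5 algorithm
Advantages: high classification accuracy; the generated classification rules are fairly simple and easy to understand.
Disadvantages: the data set must be scanned multiple times, which makes the algorithm relatively inefficient.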
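One reason for the repeated scans is that C4.5 chooses each split by gain ratio, which requires class counts for every value of every candidate attribute at every node. The sketch below computes the gain ratio for the root split on `outlook`. It is an illustration under stated assumptions, not WEKA's code: the per-branch class counts are read off the leaf counts of the J48 tree shown earlier (sunny: 2 yes / 3 no, overcast: 4 / 0, rainy: 3 / 2), and the function names are hypothetical.

```python
from math import log2


def entropy(counts):
    """Shannon entropy (in bits) of a list of counts; zero counts are skipped."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)


def gain_ratio(partitions):
    """partitions: one [yes, no] count pair per attribute value.

    Returns (information gain, split information, gain ratio).
    """
    total = sum(sum(p) for p in partitions)
    class_totals = [sum(col) for col in zip(*partitions)]
    # Expected entropy after the split, weighted by branch size.
    remainder = sum(sum(p) / total * entropy(p) for p in partitions)
    gain = entropy(class_totals) - remainder
    # Split information penalises attributes with many small branches.
    split_info = entropy([sum(p) for p in partitions])
    return gain, split_info, gain / split_info


# Assumed class counts per outlook value, derived from the tree's leaf counts.
OUTLOOK = [[2, 3], [4, 0], [3, 2]]
gain, split_info, ratio = gain_ratio(OUTLOOK)
print(f"gain={gain:.4f} split_info={split_info:.4f} gain_ratio={ratio:.4f}")
```

Under these counts the information gain is about 0.247 bits and the gain ratio about 0.156.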
V. Analysis and discussion
VI. Instructor's comments    Grade:
Signature:
Date:
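"Data Mining" Lab Report
Experiment No.:    Experiment title: KNN algorithm
Major and class: 12 Mathematical Finance    Student ID:    Name:
Date: 2014.12.24    Location: Laboratory Building 5-510    Instructor: 潘巍巍
I. Objectives and requirements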
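1. The basic idea and steps of the KNN algorithm.
2. Choose 5 standard data sets from UCI and use the KNN algorithm to compute the confusion matrix on each.
3. Choose 2 data sets and compare the KNN results for different values of k (k = 1, 3, 5, 7, 9).
II. Equipment (environment) and requirements
A computer with WEKA 3.6.1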
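The basic KNN procedure named in objective 1 (measure the distance to every training instance, keep the k nearest, take a majority vote) can be sketched in a few lines of Python. This is a minimal illustration, not the WEKA implementation used in the experiment: Euclidean distance, the function name `knn_predict`, and the toy data `TOY` are all assumptions made for the example.

```python
from collections import Counter
from math import dist


def knn_predict(train, query, k):
    """Predict the label of `query` by majority vote of its k nearest neighbours.

    train: list of (feature_vector, label) pairs; query: a feature vector.
    """
    # Step 1: sort training instances by Euclidean distance to the query.
    neighbours = sorted(train, key=lambda item: dist(item[0], query))[:k]
    # Step 2: count the labels among the k nearest and return the most common.
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: three points of class "a", two of class "b".
TOY = [
    ((1.0, 1.0), "a"),
    ((1.2, 0.8), "a"),
    ((0.9, 1.1), "a"),
    ((3.0, 3.0), "b"),
    ((3.2, 2.9), "b"),
]

for k in (1, 3, 5):
    print(k, knn_predict(TOY, (3.1, 3.0), k))
```

With k=5 every training point votes, so the query next to the two "b" points is assigned the majority class "a". This is the general mechanism by which a large k on a small data set pulls predictions toward the largest class.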
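IV. Procedure and steps
1. Data sets: contact-lenses.arff and Glass.arff
The confusion matrices for the two data sets are:
(2) The results for the two data sets with K = 1, 3, 5, 7, 9 are: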
Glass:
K=1;
=== Summary ===
Correctly Classified Instances 151 70.5607 %
Incorrectly Classified Instances 63 29.4393 %
Kappa statistic 0.6005
Mean absolute error 0.0897
Root mean squared error 0.2852
Relative absolute error 42.3747 %
Root relative squared error 87.8627 %
Total Number of Instances 214
=== Detailed Accuracy By Class ===
               TP Rate  FP Rate  Precision  Recall  F-Measure  ROC Area  Class
               0.786    0.167    0.696      0.786   0.738      0.806     build wind float
               0.671    0.13     0.739      0.671   0.703      0.765     build wind non-float
               0.294    0.051    0.333      0.294   0.313      0.59      vehic wind float
               0        0        0          0       0          ?         vehic wind non-float
               0.769    0.03     0.625      0.769   0.69       0.895     containers
               0.778    0.015    0.7        0.778   0.737      0.838     tableware
               0.793    0.011    0.92       0.793   0.852      0.884     headlamps
Weighted Avg.  0.706    0.109    0.709      0.706   0.704      0.792
=== Confusion Matrix ===
a b c d e f g <-- classified as
55 9 6 0 0 0 0 | a = build wind float
15 51 4 0 3 2 1 | b = build wind non-float
9 3 5 0 0 0 0 | c = vehic wind float
0 0 0 0 0 0 0 | d = vehic wind non-float
0 2 0 0 10 0 1 | e = containers
0 1 0 0 1 7 0 | f = tableware
0 3 0 0 2 1 23 | g = headlamps
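The summary figures can be re-derived from the confusion matrix itself. The sketch below does so for the K=1 Glass matrix above; the helper names are illustrative and not part of WEKA. Rows are actual classes, columns are predicted classes.

```python
# K=1 confusion matrix for Glass, copied from the WEKA output above.
MATRIX = [
    [55, 9, 6, 0, 0, 0, 0],    # a = build wind float
    [15, 51, 4, 0, 3, 2, 1],   # b = build wind non-float
    [9, 3, 5, 0, 0, 0, 0],     # c = vehic wind float
    [0, 0, 0, 0, 0, 0, 0],     # d = vehic wind non-float
    [0, 2, 0, 0, 10, 0, 1],    # e = containers
    [0, 1, 0, 0, 1, 7, 0],     # f = tableware
    [0, 3, 0, 0, 2, 1, 23],    # g = headlamps
]


def accuracy(m):
    """Fraction of instances on the diagonal (correctly classified)."""
    return sum(m[i][i] for i in range(len(m))) / sum(map(sum, m))


def recall(m, i):
    """TP rate of class i: correct predictions over the row total."""
    row_total = sum(m[i])
    return m[i][i] / row_total if row_total else 0.0


def precision(m, i):
    """Correct predictions of class i over the column total."""
    col_total = sum(row[i] for row in m)
    return m[i][i] / col_total if col_total else 0.0


print(f"accuracy = {accuracy(MATRIX):.4%}")
print(f"class a: recall = {recall(MATRIX, 0):.3f}, precision = {precision(MATRIX, 0):.3f}")
```

The diagonal sums to 151 of 214 instances, matching the 70.5607 % that WEKA reports, and class a gives recall 55/70 and precision 55/79, matching the 0.786 and 0.696 in the table.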
K=3;
=== Summary ===
Correctly Classified Instances 154 71.9626 %
Incorrectly Classified Instances 60 28.0374 %
Kappa statistic 0.6097
Mean absolute error 0.0983
Root mean squared error 0.2524
Relative absolute error 46.4438 %
Root relative squared error 77.7792 %
Total Number of Instances 214
=== Detailed Accuracy By Class ===
               TP Rate  FP Rate  Precision  Recall  F-Measure  ROC Area  Class
               0.843    0.215    0.656      0.843   0.738      0.865     build wind float
               0.711    0.138    0.74       0.711   0.725      0.835     build wind non-float
               0.176    0.015    0.5        0.176   0.261      0.672     vehic wind float
               0        0        0          0       0          ?         vehic wind non-float
               0.615    0.015    0.727      0.615   0.667      0.913     containers
               0.778    0.01     0.778      0.778   0.778      0.914     tableware
               0.793    0.011    0.92       0.793   0.852      0.885     headlamps
Weighted Avg.  0.72     0.123    0.718      0.72    0.708      0.847
=== Confusion Matrix ===
a b c d e f g <-- classified as
59 10 1 0 0 0 0 | a = build wind float
19 54 2 0 1 0 0 | b = build wind non-float
10 4 3 0 0 0 0 | c = vehic wind float
0 0 0 0 0 0 0 | d = vehic wind non-float
0 3 0 0 8 0 2 | e = containers
0 1 0 0 1 7 0 | f = tableware
2 1 0 0 1 2 23 | g = headlamps
K=5;
=== Summary ===
Correctly Classified Instances 145 67.757 %
Incorrectly Classified Instances 69 32.243 %
Kappa statistic 0.5469
Mean absolute error 0.1085
Root mean squared error 0.2563
Relative absolute error 51.243 %
Root relative squared error 78.9576 %
Total Number of Instances 214
=== Detailed Accuracy By Class ===
               TP Rate  FP Rate  Precision  Recall  F-Measure  ROC Area  Class
               0.843    0.229    0.641      0.843   0.728      0.867     build wind float
               0.684    0.174    0.684      0.684   0.684      0.848     build wind non-float
               0        0.01     0          0       0          0.642     vehic wind float
               0        0        0          0       0          ?         vehic wind non-float
               0.385    0.025    0.5        0.385   0.435      0.952     containers
               0.667    0.01     0.75       0.667   0.706      0.909     tableware
               0.793    0.016    0.885      0.793   0.836      0.89      headlamps
Weighted Avg.  0.678    0.142    0.635      0.678   0.651      0.853
=== Confusion Matrix ===
a b c d e f g <-- classified as
59 10 1 0 0 0 0 | a = build wind float
20 52 1 0 3 0 0 | b = build wind non-float
12 5 0 0 0 0 0 | c = vehic wind float
0 0 0 0 0 0 0 | d = vehic wind non-float
0 5 0 0 5 0 3 | e = containers
0 2 0 0 1 6 0 | f = tableware
1 2 0 0 1 2 23 | g = headlamps
K=7;
=== Summary ===
Correctly Classified Instances 137 64.0187 %
Incorrectly Classified Instances 77 35.9813 %
Kappa statistic 0.4948
Mean absolute error 0.1147
Root mean squared error 0.2557
Relative absolute error 54.1689 %
Root relative squared error 78.7876 %
Total Number of Instances 214
=== Detailed Accuracy By Class ===
               TP Rate  FP Rate  Precision  Recall  F-Measure  ROC Area  Class
               0.829    0.271    0.598      0.829   0.695      0.876     build wind float
               0.605    0.181    0.648      0.605   0.626      0.852     build wind non-float
               0.059    0.005    0.5        0.059   0.105      0.71      vehic wind float
               0        0        0          0       0          ?         vehic wind non-float
               0.308    0.03     0.4        0.308   0.348      0.939     containers
               0.556    0.015    0.625      0.556   0.588      0.976     tableware
               0.793    0.016    0.885      0.793   0.836      0.89      headlamps
Weighted Avg.  0.64     0.158    0.636      0.64    0.617      0.864
=== Confusion Matrix ===
a b c d e f g <-- classified as
58 11 1 0 0 0 0 | a = build wind float
26 46 0 0 4 0 0 | b = build wind non-float
11 5 1 0 0 0 0 | c = vehic wind float
0 0 0 0 0 0 0 | d = vehic wind non-float
0 5 0 0 4 1 3 | e = containers
1 2 0 0 1 5 0 | f = tableware
1 2 0 0 1 2 23 | g = headlamps
K=9;
=== Summary ===
Correctly Classified Instances 135 63.0841 %
Incorrectly Classified Instances 79 36.9159 %
Kappa statistic 0.4782
Mean absolute error 0.1196
Root mean squared error 0.2581
Relative absolute error 56.4924 %
Root relative squared error 79.5178 %
Total Number of Instances 214
=== Detailed Accuracy By Class ===
               TP Rate  FP Rate  Precision  Recall  F-Measure  ROC Area  Class
               0.829    0.278    0.592      0.829   0.69       0.881     build wind float
               0.645    0.174    0.671      0.645   0.658      0.853     build wind non-float
               0        0.005    0          0       0          0.694     vehic wind float
               0        0        0          0       0          ?         vehic wind non-float
               0.231    0.03     0.333      0.231   0.273      0.933     containers
               0.222    0.015    0.4        0.222   0.286      0.964     tableware
               0.793    0.027    0.821      0.793   0.807      0.888     headlamps
Weighted Avg.  0.631    0.159    0.58       0.631   0.597      0.864
=== Confusion Matrix ===
a b c d e f g <-- classified as
58 11 1 0 0 0 0 | a = build wind float
23 49 0 0 3 1 0 | b = build wind non-float
13 4 0 0 0 0 0 | c = vehic wind float
0 0 0 0 0 0 0 | d = vehic wind non-float
0 6 0 0 3 0 4 | e = containers
3 1 0 0 2 2 1 | f = tableware
1 2 0 0 1 2 23 | g = headlamps
contact-lenses:
K=1;
=== Summary ===
Correctly Classified Instances 19 79.1667 %
Incorrectly Classified Instances 5 20.8333 %
Kappa statistic 0.6262
Mean absolute error 0.2262
Root mean squared error 0.3165
Relative absolute error 59.8856 %
Root relative squared error 72.4707 %
Total Number of Instances 24
=== Detailed Accuracy By Class ===
               TP Rate  FP Rate  Precision  Recall  F-Measure  ROC Area  Class
               0.8      0.053    0.8        0.8     0.8        0.958     soft
               0.75     0.1      0.6        0.75    0.667      0.925     hard
               0.8      0.222    0.857      0.8     0.828      0.896     none
Weighted Avg.  0.792    0.167    0.802      0.792   0.795      0.914
=== Confusion Matrix ===
a b c <-- classified as
4 0 1 | a = soft
0 3 1 | b = hard
1 2 12 | c = none
K=3;
=== Summary ===
Correctly Classified Instances 19 79.1667 %
Incorrectly Classified Instances 5 20.8333 %
Kappa statistic 0.6262
Mean absolute error 0.2262
Root mean squared error 0.3165
Relative absolute error 59.8856 %
Root relative squared error 72.4707 %
Total Number of Instances 24
=== Detailed Accuracy By Class ===
               TP Rate  FP Rate  Precision  Recall  F-Measure  ROC Area  Class
               0.8      0.053    0.8        0.8     0.8        0.958     soft
               0.75     0.1      0.6        0.75    0.667      0.925     hard
               0.8      0.222    0.857      0.8     0.828      0.896     none
Weighted Avg.  0.792    0.167    0.802      0.792   0.795      0.914
=== Confusion Matrix ===
a b c <-- classified as
4 0 1 | a = soft
0 3 1 | b = hard
1 2 12 | c = none
K=5;
=== Summary ===
Correctly Classified Instances 16 66.6667 %
Incorrectly Classified Instances 8 33.3333 %
Kappa statistic 0.3356
Mean absolute error 0.2793
Root mean squared error 0.3624
Relative absolute error 73.9227 %
Root relative squared error 82.9705 %
Total Number of Instances 24
=== Detailed Accuracy By Class ===
               TP Rate  FP Rate  Precision  Recall  F-Measure  ROC Area  Class
               0.6      0.053    0.75       0.6     0.667      0.947     soft
               0.25     0.1      0.333      0.25    0.286      0.856     hard
               0.8      0.556    0.706      0.8     0.75       0.859     none
Weighted Avg.  0.667    0.375    0.653      0.667   0.655      0.877
=== Confusion Matrix ===
a b c <-- classified as
3 0 2 | a = soft
0 1 3 | b = hard
1 2 12 | c = none
K=7;
=== Summary ===
Correctly Classified Instances 14 58.3333 %
Incorrectly Classified Instances 10 41.6667 %
Kappa statistic -0.0619
Mean absolute error 0.3188
Root mean squared error 0.387
Relative absolute error 84.3959 %
Root relative squared error 88.61 %
Total Number of Instances 24
=== Detailed Accuracy By Class ===
               TP Rate  FP Rate  Precision  Recall  F-Measure  ROC Area  Class
               0        0.053    0          0       0          0.947     soft
               0        0        0          0       0          0.831     hard
               0.933    1        0.609      0.933   0.737      0.807     none
Weighted Avg.  0.583    0.636    0.38       0.583   0.461      0.841
=== Confusion Matrix ===
a b c <-- classified as
0 0 5 | a = soft
0 0 4 | b = hard
1 0 14 | c = none
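The negative Kappa statistic at K=7 means agreement below what chance alone would give: nearly every instance is assigned to the majority class none. Cohen's kappa can be recomputed from a confusion matrix as (observed agreement - expected agreement) / (1 - expected agreement). The sketch below is illustrative (the function name `cohen_kappa` is not a WEKA identifier) and uses the contact-lenses matrices reported for K=1 and K=7.

```python
def cohen_kappa(m):
    """Cohen's kappa for a square confusion matrix (rows actual, columns predicted)."""
    n = sum(map(sum, m))
    observed = sum(m[i][i] for i in range(len(m))) / n
    row_totals = [sum(row) for row in m]
    col_totals = [sum(col) for col in zip(*m)]
    # Agreement expected if predictions were independent of the true class.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2
    return (observed - expected) / (1 - expected)


# contact-lenses confusion matrices copied from the WEKA output.
K1 = [[4, 0, 1], [0, 3, 1], [1, 2, 12]]
K7 = [[0, 0, 5], [0, 0, 4], [1, 0, 14]]

print(f"K=1 kappa = {cohen_kappa(K1):.4f}")
print(f"K=7 kappa = {cohen_kappa(K7):.4f}")
```

Both values agree with the Kappa statistic lines in the WEKA summaries (0.6262 and -0.0619).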
K=9;
=== Summary ===
Correctly Classified Instances 14 58.3333 %
Incorrectly Classified Instances 10 41.6667 %
Kappa statistic -0.0619
Mean absolute error 0.3188
Root mean squared error 0.387
Relative absolute error 84.3959 %
Root relative squared error 88.61 %
Total Number of Instances 24
=== Detailed Accuracy By Class ===
               TP Rate  FP Rate  Precision  Recall  F-Measure  ROC Area  Class
               0        0.053    0          0       0          0.947     soft
               0        0        0          0       0          0.831     hard
               0.933    1        0.609      0.933   0.737      0.807     none
Weighted Avg.  0.583    0.636    0.38       0.583   0.461      0.841
=== Confusion Matrix ===
a b c <-- classified as
0 0 5 | a = soft
0 0 4 | b = hard
1 0 14 | c = none
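The results show that Glass reaches its highest classification accuracy at K=3 (71.9626 %), while contact-lenses reaches its highest at K=1 or K=3 (79.1667 %).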
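The comparison across K can be checked directly from the "Correctly Classified Instances" counts in the summaries above. A small Python sketch, with the counts copied from the WEKA output and the helper name `best_k` chosen only for illustration:

```python
# data set -> (total instances, {k: correctly classified instances})
CORRECT = {
    "Glass": (214, {1: 151, 3: 154, 5: 145, 7: 137, 9: 135}),
    "contact-lenses": (24, {1: 19, 3: 19, 5: 16, 7: 14, 9: 14}),
}


def best_k(total, correct_by_k):
    """Return every k that achieves the highest accuracy, plus that accuracy."""
    best = max(correct_by_k.values())
    ks = [k for k, correct in sorted(correct_by_k.items()) if correct == best]
    return ks, best / total


for name, (total, counts) in CORRECT.items():
    ks, acc = best_k(total, counts)
    print(f"{name}: best k = {ks}, accuracy = {acc:.4%}")
```

On both data sets accuracy falls once K exceeds 3.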
V. Analysis and discussion