
An Empirical Study of Classifier Behavior in Rattle Tool

  • Conference paper

Soft Computing in Data Science (SCDS 2018)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 937)

Abstract

Many factors influence classifier behavior in machine learning, so determining the best classifier is not an easy task. One way to tackle this problem is to evaluate the classifiers against several performance measures. In this paper, the behavior of machine learning classifiers is studied using the Rattle tool. Rattle is a graphical user interface (GUI) for the R package that carries out data mining modeling with the classifiers tree, boost, random forest, support vector machine, logit and neural net. The study was conducted on both simulated and real data, and the behavior of the classifiers was observed in terms of accuracy, ROC curve and modeling time. On the simulated data, the algorithms group into three tiers by accuracy: logit, neural net and support vector machine first; boost and random forest second; and decision tree third. On the real data, the boost algorithm achieves the highest training accuracy and the neural net algorithm the highest testing accuracy. Overall, the support vector machine and neural net classifiers are the two best classifiers on both the simulated and real data.
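The workflow described above can be sketched outside Rattle as well. The following is a minimal, hypothetical Python analogue (the paper itself works in Rattle/R): scikit-learn counterparts of the six classifiers are fitted on simulated data and scored by accuracy, area under the ROC curve and modeling time. The dataset and model settings here are illustrative assumptions, not the paper's.

```python
# Hypothetical analogue of the Rattle comparison using scikit-learn
# counterparts of the six classifiers; data and settings are illustrative.
import time
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

# Simulated binary-classification data, split into training and testing sets
X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

models = {
    "tree": DecisionTreeClassifier(random_state=0),
    "forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "boost": GradientBoostingClassifier(random_state=0),
    "svm": SVC(probability=True, random_state=0),
    "logit": LogisticRegression(max_iter=1000),
    "neural": MLPClassifier(hidden_layer_sizes=(10,), max_iter=500, random_state=0),
}

results = {}
for name, model in models.items():
    start = time.perf_counter()              # modeling time
    model.fit(X_tr, y_tr)
    elapsed = time.perf_counter() - start
    acc = accuracy_score(y_te, model.predict(X_te))             # testing accuracy
    auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])  # area under ROC
    results[name] = (acc, auc, elapsed)

for name, (acc, auc, elapsed) in results.items():
    print(f"{name:>6}: acc={acc:.3f}  auc={auc:.3f}  time={elapsed:.2f}s")
```

The three measures reported per model mirror the paper's criteria (accuracy, ROC/AUC, modeling time), so the same kind of tiered comparison can be read off the printed table.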



Acknowledgment

The authors are grateful to Institut Teknologi Sepuluh Nopember, which partly supported this work through Research Grant contract number 1192/PKS/ITS/2018 (1302/PKS/ITS/2018).

Author information

Correspondence to Wahyu Wibowo.

Appendices

Appendix A. Summary of Results

Training data accuracy (%)

| Replication | Tree | Forest | Boost | SVM | Logit | Neural |
|---|---|---|---|---|---|---|
| 1 | 86.9 | 100 | 92.8 | 91.5 | 91.4 | 91.5 |
| 2 | 87 | 100 | 92.7 | 91.6 | 91.5 | 91.5 |
| 3 | 86.7 | 100 | 92.7 | 91.3 | 91.3 | 91.3 |
| 4 | 87 | 100 | 92.6 | 91.3 | 91.4 | 91.4 |
| 5 | 86.3 | 100 | 92.7 | 91.5 | 91.4 | 91.5 |
| 6 | 86.8 | 100 | 92.7 | 91.4 | 91.3 | 91.3 |
| 7 | 85.8 | 100 | 92.8 | 91.5 | 91.5 | 91.5 |
| 8 | 86.6 | 100 | 92.7 | 91.4 | 91.3 | 91.4 |
| 9 | 86.7 | 100 | 92.6 | 91.4 | 91.3 | 91.3 |
| 10 | 86.8 | 100 | 92.8 | 91.6 | 91.5 | 91.5 |
| Mean | 86.660 | 100.000 | 92.710 | 91.450 | 91.390 | 91.420 |
| sd | 0.366 | 0.000 | 0.074 | 0.108 | 0.088 | 0.092 |

Testing data accuracy (%)

| Replication | Tree | Forest | Boost | SVM | Logit | Neural |
|---|---|---|---|---|---|---|
| 1 | 86.4 | 90.6 | 90.8 | 91 | 91.2 | 91.1 |
| 2 | 86.2 | 90.6 | 90.6 | 91 | 91 | 91 |
| 3 | 86.3 | 91 | 91.2 | 91.5 | 91.7 | 91.7 |
| 4 | 86.6 | 91.1 | 91 | 91.4 | 91.8 | 91.7 |
| 5 | 86.2 | 90.7 | 90.7 | 91.1 | 91.3 | 91.3 |
| 6 | 86.2 | 90.6 | 90.7 | 91.2 | 91.3 | 91.4 |
| 7 | 85.3 | 90.9 | 91 | 91.4 | 91.5 | 91.5 |
| 8 | 86.3 | 90.6 | 90.7 | 91.2 | 91.2 | 91.2 |
| 9 | 86.8 | 90.9 | 91.1 | 91.4 | 91.7 | 91.5 |
| 10 | 86.7 | 91.1 | 91 | 91.3 | 91.4 | 91.4 |
| Mean | 86.300 | 90.810 | 90.880 | 91.250 | 91.410 | 91.380 |
| sd | 0.414 | 0.213 | 0.204 | 0.178 | 0.260 | 0.235 |

Area under curve, training data

| Replication | Tree | Forest | Boost | SVM | Logit | Neural |
|---|---|---|---|---|---|---|
| 1 | 0.8788 | 1 | 0.9751 | 0.9524 | 0.9661 | 0.9664 |
| 2 | 0.8740 | 1 | 0.9751 | 0.953 | 0.9664 | 0.9666 |
| 3 | 0.8690 | 1 | 0.9748 | 0.952 | 0.9653 | 0.9655 |
| 4 | 0.8785 | 1 | 0.9747 | 0.9528 | 0.9658 | 0.966 |
| 5 | 0.8673 | 1 | 0.9755 | 0.9547 | 0.9664 | 0.9667 |
| 6 | 0.8820 | 1 | 0.9749 | 0.954 | 0.9657 | 0.9659 |
| 7 | 0.8636 | 1 | 0.9756 | 0.9538 | 0.9665 | 0.9667 |
| 8 | 0.8779 | 1 | 0.9752 | 0.9547 | 0.9665 | 0.9667 |
| 9 | 0.8742 | 1 | 0.9749 | 0.9521 | 0.9658 | 0.9661 |
| 10 | 0.8733 | 1 | 0.975 | 0.9531 | 0.9662 | 0.9664 |
| Mean | 0.8739 | 1 | 0.9751 | 0.9533 | 0.9661 | 0.9663 |
| sd | 0.0058 | 0 | 0.0003 | 0.0010 | 0.0004 | 0.0004 |

Area under curve, testing data

| Replication | Tree | Forest | Boost | SVM | Logit | Neural |
|---|---|---|---|---|---|---|
| 1 | 0.8697 | 0.9576 | 0.9615 | 0.9497 | 0.9647 | 0.9643 |
| 2 | 0.8615 | 0.9589 | 0.961 | 0.9483 | 0.9642 | 0.9639 |
| 3 | 0.8631 | 0.9623 | 0.9649 | 0.9548 | 0.9676 | 0.9673 |
| 4 | 0.8757 | 0.9625 | 0.9646 | 0.9544 | 0.968 | 0.9678 |
| 5 | 0.8597 | 0.9596 | 0.962 | 0.9529 | 0.9655 | 0.9654 |
| 6 | 0.8782 | 0.9597 | 0.9624 | 0.9512 | 0.9655 | 0.9654 |
| 7 | 0.857 | 0.9608 | 0.964 | 0.9529 | 0.9668 | 0.9665 |
| 8 | 0.873 | 0.9582 | 0.9603 | 0.9489 | 0.9642 | 0.964 |
| 9 | 0.8782 | 0.9611 | 0.9634 | 0.9512 | 0.9665 | 0.9662 |
| 10 | 0.8727 | 0.9621 | 0.9642 | 0.9534 | 0.9672 | 0.9669 |
| Mean | 0.869 | 0.960 | 0.963 | 0.952 | 0.966 | 0.966 |
| sd | 0.008 | 0.002 | 0.002 | 0.002 | 0.001 | 0.001 |

Processing time (sec)

| Replication | Tree | Forest | Boost | SVM | Logit | Neural |
|---|---|---|---|---|---|---|
| 1 | 4.51 | 76.8 | 5.39 | 504.6 | 3.2 | 19.37 |
| 2 | 4.71 | 81.6 | 3.22 | 494.4 | 2.49 | 21.41 |
| 3 | 5.14 | 89.4 | 3.96 | 510.6 | 2.18 | 19.95 |
| 4 | 4.96 | 92.4 | 3.25 | 513.6 | 2.9 | 21.09 |
| 5 | 4.8 | 90.6 | 4.69 | 483.6 | 2.12 | 22.9 |
| 6 | 4.55 | 79.8 | 3.11 | 481.8 | 1.92 | 21.28 |
| 7 | 4.74 | 90.6 | 4.82 | 496.2 | 1.98 | 21.69 |
| 8 | 4.74 | 83.4 | 4.58 | 496.8 | 1.91 | 21.74 |
| 9 | 4.99 | 84 | 2.79 | 504.6 | 1.93 | 19.06 |
| 10 | 4.8 | 80.4 | 3.01 | 480 | 2.17 | 21.26 |
| Mean | 4.794 | 84.900 | 3.882 | 496.620 | 2.280 | 20.975 |
| sd | 0.194 | 5.450 | 0.924 | 11.933 | 0.447 | 1.177 |
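The Mean and sd rows in the tables above can be reproduced from the ten replications using the sample standard deviation. A short Python check (not part of the paper), shown for the Logit column of the processing-time table:

```python
# Reproduce the summary rows of Appendix A from the ten replications.
# statistics.stdev uses the sample standard deviation (n - 1 denominator),
# which matches the sd values reported in the tables.
from statistics import mean, stdev

# Logit processing times (sec) for replications 1-10, from the table above
logit_times = [3.2, 2.49, 2.18, 2.9, 2.12, 1.92, 1.98, 1.91, 1.93, 2.17]

print(round(mean(logit_times), 3))   # 2.28, matching the table's Mean of 2.280
print(round(stdev(logit_times), 3))  # 0.447, matching the table's sd
```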

Appendix B. ROC Curves of the Classifiers on the Real Data

(Figures a and b: ROC curves of the classifiers fitted to the real data.)
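The ROC curves shown here summarize to the AUC values reported in Appendix A. As a minimal illustration (not the paper's code), AUC can be computed directly as the Mann-Whitney statistic: the probability that a randomly chosen positive case scores higher than a randomly chosen negative one.

```python
# Mann-Whitney estimate of the area under the ROC curve: compare every
# positive score against every negative score, counting ties as half a win.
def auc(pos_scores, neg_scores):
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Perfectly separated scores give AUC = 1.0; indistinguishable scores give 0.5.
print(auc([0.9, 0.8, 0.7], [0.4, 0.3, 0.2]))  # 1.0
```

A classifier whose scores rank every positive above every negative achieves AUC 1.0, as the forest does on the training data in Appendix A; values near 0.96 for the other classifiers indicate occasional rank inversions.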

Copyright information

© 2019 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Wibowo, W., Abdul-Rahman, S. (2019). An Empirical Study of Classifier Behavior in Rattle Tool. In: Yap, B., Mohamed, A., Berry, M. (eds) Soft Computing in Data Science. SCDS 2018. Communications in Computer and Information Science, vol 937. Springer, Singapore. https://doi.org/10.1007/978-981-13-3441-2_25

  • DOI: https://doi.org/10.1007/978-981-13-3441-2_25

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-13-3440-5

  • Online ISBN: 978-981-13-3441-2

  • eBook Packages: Computer Science, Computer Science (R0)
