
An Empirical Study of Classifier Behavior in Rattle Tool

  • Conference paper

Soft Computing in Data Science (SCDS 2018)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 937)

Abstract

Many factors influence classifier behavior in machine learning, so determining the best classifier is not an easy task. One way to tackle this problem is to evaluate the classifiers against several performance measures. In this paper, the behavior of machine learning classifiers is studied using the Rattle tool. Rattle is a graphical user interface (GUI) for the R package that carries out data mining modeling with the classifiers tree, boost, random forest, support vector machine, logit and neural net. The study was conducted on both simulated and real data, and the behavior of the classifiers was observed in terms of accuracy, ROC curve and modeling time. On the simulated data, the algorithms group into three tiers by accuracy: logit, neural net and support vector machine first; boost and random forest second; and decision tree third. On the real data, the boost algorithm achieves the highest training accuracy and the neural net algorithm the highest testing accuracy. Overall, the support vector machine and neural net classifiers are the two best classifiers on both the simulated and real data.
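The workflow described above can be sketched outside Rattle as well. The following is a minimal, hypothetical Python analogue (the paper itself works in Rattle/R): scikit-learn counterparts of the six classifiers are fitted on simulated data and scored by accuracy, area under the ROC curve and modeling time. The dataset and model settings here are illustrative assumptions, not the paper's.

```python
# Hypothetical analogue of the Rattle comparison using scikit-learn
# counterparts of the six classifiers; data and settings are illustrative.
import time
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

# Simulated binary-classification data, split into training and testing sets
X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

models = {
    "tree": DecisionTreeClassifier(random_state=0),
    "forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "boost": GradientBoostingClassifier(random_state=0),
    "svm": SVC(probability=True, random_state=0),
    "logit": LogisticRegression(max_iter=1000),
    "neural": MLPClassifier(hidden_layer_sizes=(10,), max_iter=500, random_state=0),
}

results = {}
for name, model in models.items():
    start = time.perf_counter()              # modeling time
    model.fit(X_tr, y_tr)
    elapsed = time.perf_counter() - start
    acc = accuracy_score(y_te, model.predict(X_te))             # testing accuracy
    auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])  # area under ROC
    results[name] = (acc, auc, elapsed)

for name, (acc, auc, elapsed) in results.items():
    print(f"{name:>6}: acc={acc:.3f}  auc={auc:.3f}  time={elapsed:.2f}s")
```

The three measures reported per model mirror the paper's criteria (accuracy, ROC/AUC, modeling time), so the same kind of tiered comparison can be read off the printed table.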



Acknowledgment

The authors are grateful to Institut Teknologi Sepuluh Nopember, which partly supported this work through Research Grant contract number 1192/PKS/ITS/2018 (1302/PKS/ITS/2018).

Author information

Correspondence to Wahyu Wibowo.

Appendices

Appendix A. Summary of Results

Training data accuracy (%)

| Replication | Tree | Forest | Boost | SVM | Logit | Neural |
|---|---|---|---|---|---|---|
| 1 | 86.9 | 100 | 92.8 | 91.5 | 91.4 | 91.5 |
| 2 | 87 | 100 | 92.7 | 91.6 | 91.5 | 91.5 |
| 3 | 86.7 | 100 | 92.7 | 91.3 | 91.3 | 91.3 |
| 4 | 87 | 100 | 92.6 | 91.3 | 91.4 | 91.4 |
| 5 | 86.3 | 100 | 92.7 | 91.5 | 91.4 | 91.5 |
| 6 | 86.8 | 100 | 92.7 | 91.4 | 91.3 | 91.3 |
| 7 | 85.8 | 100 | 92.8 | 91.5 | 91.5 | 91.5 |
| 8 | 86.6 | 100 | 92.7 | 91.4 | 91.3 | 91.4 |
| 9 | 86.7 | 100 | 92.6 | 91.4 | 91.3 | 91.3 |
| 10 | 86.8 | 100 | 92.8 | 91.6 | 91.5 | 91.5 |
| Mean | 86.660 | 100.000 | 92.710 | 91.450 | 91.390 | 91.420 |
| sd | 0.366 | 0.000 | 0.074 | 0.108 | 0.088 | 0.092 |

Testing data accuracy (%)

| Replication | Tree | Forest | Boost | SVM | Logit | Neural |
|---|---|---|---|---|---|---|
| 1 | 86.4 | 90.6 | 90.8 | 91 | 91.2 | 91.1 |
| 2 | 86.2 | 90.6 | 90.6 | 91 | 91 | 91 |
| 3 | 86.3 | 91 | 91.2 | 91.5 | 91.7 | 91.7 |
| 4 | 86.6 | 91.1 | 91 | 91.4 | 91.8 | 91.7 |
| 5 | 86.2 | 90.7 | 90.7 | 91.1 | 91.3 | 91.3 |
| 6 | 86.2 | 90.6 | 90.7 | 91.2 | 91.3 | 91.4 |
| 7 | 85.3 | 90.9 | 91 | 91.4 | 91.5 | 91.5 |
| 8 | 86.3 | 90.6 | 90.7 | 91.2 | 91.2 | 91.2 |
| 9 | 86.8 | 90.9 | 91.1 | 91.4 | 91.7 | 91.5 |
| 10 | 86.7 | 91.1 | 91 | 91.3 | 91.4 | 91.4 |
| Mean | 86.300 | 90.810 | 90.880 | 91.250 | 91.410 | 91.380 |
| sd | 0.414 | 0.213 | 0.204 | 0.178 | 0.260 | 0.235 |

Area under curve, training data

| Replication | Tree | Forest | Boost | SVM | Logit | Neural |
|---|---|---|---|---|---|---|
| 1 | 0.8788 | 1 | 0.9751 | 0.9524 | 0.9661 | 0.9664 |
| 2 | 0.8740 | 1 | 0.9751 | 0.953 | 0.9664 | 0.9666 |
| 3 | 0.8690 | 1 | 0.9748 | 0.952 | 0.9653 | 0.9655 |
| 4 | 0.8785 | 1 | 0.9747 | 0.9528 | 0.9658 | 0.966 |
| 5 | 0.8673 | 1 | 0.9755 | 0.9547 | 0.9664 | 0.9667 |
| 6 | 0.8820 | 1 | 0.9749 | 0.954 | 0.9657 | 0.9659 |
| 7 | 0.8636 | 1 | 0.9756 | 0.9538 | 0.9665 | 0.9667 |
| 8 | 0.8779 | 1 | 0.9752 | 0.9547 | 0.9665 | 0.9667 |
| 9 | 0.8742 | 1 | 0.9749 | 0.9521 | 0.9658 | 0.9661 |
| 10 | 0.8733 | 1 | 0.975 | 0.9531 | 0.9662 | 0.9664 |
| Mean | 0.8739 | 1 | 0.9751 | 0.9533 | 0.9661 | 0.9663 |
| sd | 0.0058 | 0 | 0.0003 | 0.0010 | 0.0004 | 0.0004 |

Area under curve, testing data

| Replication | Tree | Forest | Boost | SVM | Logit | Neural |
|---|---|---|---|---|---|---|
| 1 | 0.8697 | 0.9576 | 0.9615 | 0.9497 | 0.9647 | 0.9643 |
| 2 | 0.8615 | 0.9589 | 0.961 | 0.9483 | 0.9642 | 0.9639 |
| 3 | 0.8631 | 0.9623 | 0.9649 | 0.9548 | 0.9676 | 0.9673 |
| 4 | 0.8757 | 0.9625 | 0.9646 | 0.9544 | 0.968 | 0.9678 |
| 5 | 0.8597 | 0.9596 | 0.962 | 0.9529 | 0.9655 | 0.9654 |
| 6 | 0.8782 | 0.9597 | 0.9624 | 0.9512 | 0.9655 | 0.9654 |
| 7 | 0.857 | 0.9608 | 0.964 | 0.9529 | 0.9668 | 0.9665 |
| 8 | 0.873 | 0.9582 | 0.9603 | 0.9489 | 0.9642 | 0.964 |
| 9 | 0.8782 | 0.9611 | 0.9634 | 0.9512 | 0.9665 | 0.9662 |
| 10 | 0.8727 | 0.9621 | 0.9642 | 0.9534 | 0.9672 | 0.9669 |
| Mean | 0.869 | 0.960 | 0.963 | 0.952 | 0.966 | 0.966 |
| sd | 0.008 | 0.002 | 0.002 | 0.002 | 0.001 | 0.001 |

Processing time (sec)

| Replication | Tree | Forest | Boost | SVM | Logit | Neural |
|---|---|---|---|---|---|---|
| 1 | 4.51 | 76.8 | 5.39 | 504.6 | 3.2 | 19.37 |
| 2 | 4.71 | 81.6 | 3.22 | 494.4 | 2.49 | 21.41 |
| 3 | 5.14 | 89.4 | 3.96 | 510.6 | 2.18 | 19.95 |
| 4 | 4.96 | 92.4 | 3.25 | 513.6 | 2.9 | 21.09 |
| 5 | 4.8 | 90.6 | 4.69 | 483.6 | 2.12 | 22.9 |
| 6 | 4.55 | 79.8 | 3.11 | 481.8 | 1.92 | 21.28 |
| 7 | 4.74 | 90.6 | 4.82 | 496.2 | 1.98 | 21.69 |
| 8 | 4.74 | 83.4 | 4.58 | 496.8 | 1.91 | 21.74 |
| 9 | 4.99 | 84 | 2.79 | 504.6 | 1.93 | 19.06 |
| 10 | 4.8 | 80.4 | 3.01 | 480 | 2.17 | 21.26 |
| Mean | 4.794 | 84.900 | 3.882 | 496.620 | 2.280 | 20.975 |
| sd | 0.194 | 5.450 | 0.924 | 11.933 | 0.447 | 1.177 |
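The Mean and sd rows in the tables above can be reproduced from the ten replications using the sample standard deviation. A short Python check (not part of the paper), shown for the Logit column of the processing-time table:

```python
# Reproduce the summary rows of Appendix A from the ten replications.
# statistics.stdev uses the sample standard deviation (n - 1 denominator),
# which matches the sd values reported in the tables.
from statistics import mean, stdev

# Logit processing times (sec) for replications 1-10, from the table above
logit_times = [3.2, 2.49, 2.18, 2.9, 2.12, 1.92, 1.98, 1.91, 1.93, 2.17]

print(round(mean(logit_times), 3))   # 2.28, matching the table's Mean of 2.280
print(round(stdev(logit_times), 3))  # 0.447, matching the table's sd
```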

Appendix B. ROC Curves of the Classifiers on the Real Data

(Figures a and b: ROC curves of the classifiers fitted to the real data.)
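The ROC curves shown here summarize to the AUC values reported in Appendix A. As a minimal illustration (not the paper's code), AUC can be computed directly as the Mann-Whitney statistic: the probability that a randomly chosen positive case scores higher than a randomly chosen negative one.

```python
# Mann-Whitney estimate of the area under the ROC curve: compare every
# positive score against every negative score, counting ties as half a win.
def auc(pos_scores, neg_scores):
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Perfectly separated scores give AUC = 1.0; indistinguishable scores give 0.5.
print(auc([0.9, 0.8, 0.7], [0.4, 0.3, 0.2]))  # 1.0
```

A classifier whose scores rank every positive above every negative achieves AUC 1.0, as the forest does on the training data in Appendix A; values near 0.96 for the other classifiers indicate occasional rank inversions.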

Copyright information

© 2019 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Wibowo, W., Abdul-Rahman, S. (2019). An Empirical Study of Classifier Behavior in Rattle Tool. In: Yap, B., Mohamed, A., Berry, M. (eds) Soft Computing in Data Science. SCDS 2018. Communications in Computer and Information Science, vol 937. Springer, Singapore. https://doi.org/10.1007/978-981-13-3441-2_25

  • DOI: https://doi.org/10.1007/978-981-13-3441-2_25

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-13-3440-5

  • Online ISBN: 978-981-13-3441-2

  • eBook Packages: Computer Science, Computer Science (R0)
