
Fast Tree-Based Classification via Homogeneous Clustering

  • Conference paper
  • First Online:
Intelligent Data Engineering and Automated Learning – IDEAL 2019 (IDEAL 2019)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11871)


Abstract

Data reduction, achieved by collecting a small subset of representative prototypes from the original patterns, aims to alleviate the computational burden of training a classifier without sacrificing performance. We propose an extension of the Reduction by finding Homogeneous Clusters (RHC) algorithm, which uses k-means to produce a set of homogeneous cluster centers that serve as representative prototypes. Specifically, we propose two new classifiers that recursively produce homogeneous clusters and achieve higher performance than existing homogeneous-clustering methods, with a significant speed-up. The key idea is a tree data structure that holds the constructed clusters. Internal tree nodes hold clustering models, while leaves correspond to homogeneous clusters and store the corresponding class label. Classification is performed by simply traversing the tree from the root to a leaf. The two algorithms differ in the clustering method used to build the tree nodes: the first uses k-means, while the second applies EM clustering. The proposed algorithms are evaluated on a variety of datasets and compared with well-known methods. The results demonstrate very good classification performance combined with large computational savings.
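The abstract gives no pseudocode, so the following is only a minimal, hypothetical sketch of the k-means variant as the abstract describes it. Each internal node runs k-means with one center per class present at that node, and each resulting cluster becomes a child. Homogeneous clusters become leaves that store their class label, and prediction descends to the nearest center's child until it reaches a leaf. Several details are our assumptions and not the authors' implementation: the names `build_tree`, `predict` and `_kmeans`, the RHC-style seeding of centers at the class means, and the majority-vote fallback when a split makes no progress. The sketch uses plain standard-library Python rather than any particular clustering library.

```python
from collections import Counter


def _mean(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))


def _nearest(x, centers):
    """Index of the center closest to x (squared Euclidean; first index wins ties)."""
    return min(range(len(centers)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(x, centers[i])))


def _kmeans(points, centers, max_iter=100):
    """Lloyd's k-means seeded with `centers`; returns (centers, assignment)."""
    centers = list(centers)
    for _ in range(max_iter):
        assign = [_nearest(p, centers) for p in points]
        new_centers = []
        for k, c in enumerate(centers):
            members = [p for p, a in zip(points, assign) if a == k]
            # An empty cluster keeps its previous center.
            new_centers.append(_mean(members) if members else c)
        if new_centers == centers:
            break
        centers = new_centers
    return centers, [_nearest(p, centers) for p in points]


def build_tree(X, y):
    """Recursively cluster (X, y) until every leaf is homogeneous (one class)."""
    labels = sorted(set(y))
    if len(labels) == 1:
        return {"label": labels[0]}  # leaf: homogeneous cluster stores its class
    # Assumed RHC-style seeding: one initial center per class, at the class mean.
    seeds = [_mean([x for x, t in zip(X, y) if t == c]) for c in labels]
    centers, assign = _kmeans(X, seeds)
    kept_centers, children = [], []
    for k, c in enumerate(centers):
        idx = [i for i, a in enumerate(assign) if a == k]
        if not idx:
            continue  # drop empty clusters
        if len(idx) == len(X):
            # No split was made (e.g. identical points with different labels):
            # fall back to a majority-vote leaf. This stopping rule is assumed.
            return {"label": Counter(y).most_common(1)[0][0]}
        kept_centers.append(c)
        children.append(build_tree([X[i] for i in idx], [y[i] for i in idx]))
    # Internal node: the clustering model (its centers) and one child per cluster.
    return {"centers": kept_centers, "children": children}


def predict(tree, x):
    """Classify x by descending to the nearest center's child until a leaf."""
    while "label" not in tree:
        tree = tree["children"][_nearest(x, tree["centers"])]
    return tree["label"]


X = [(0.0, 0.0), (1.0, 0.0), (9.0, 0.0), (10.0, 0.0)]
y = ["a", "a", "a", "b"]
tree = build_tree(X, y)
print([predict(tree, x) for x in X])  # ['a', 'a', 'a', 'b']
```

On this toy data, the root splits the points into {0, 1} (a homogeneous leaf) and {9, 10}, which is split again one level down, so the tree has depth two. For the EM-based variant, one could replace `_kmeans` with a Gaussian mixture fitted by EM. For example, scikit-learn's `GaussianMixture` accepts `means_init`, which could be set to the class means. The node would then store the fitted model and route samples with its `predict` method. That substitution is our illustration, not the authors' code.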



Author information

Correspondence to Stefanos Ougiaroglou.


Copyright information

© 2019 Springer Nature Switzerland AG

About this paper


Cite this paper

Pardis, G., Diamantaras, K.I., Ougiaroglou, S., Evangelidis, G. (2019). Fast Tree-Based Classification via Homogeneous Clustering. In: Yin, H., Camacho, D., Tino, P., Tallón-Ballesteros, A., Menezes, R., Allmendinger, R. (eds) Intelligent Data Engineering and Automated Learning – IDEAL 2019. IDEAL 2019. Lecture Notes in Computer Science, vol 11871. Springer, Cham. https://doi.org/10.1007/978-3-030-33607-3_55


  • DOI: https://doi.org/10.1007/978-3-030-33607-3_55

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-33606-6

  • Online ISBN: 978-3-030-33607-3

  • eBook Packages: Computer Science, Computer Science (R0)
