Fast Tree-Based Classification via Homogeneous Clustering

  • George Pardis
  • Konstantinos I. Diamantaras
  • Stefanos Ougiaroglou
  • Georgios Evangelidis
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11871)

Abstract

Data reduction, achieved by selecting a small set of representative prototypes from the original training patterns, aims to alleviate the computational cost of training a classifier without sacrificing performance. We propose an extension of the Reduction by finding Homogeneous Clusters (RHC) algorithm, which applies k-means to discover homogeneous clusters whose centers serve as representative prototypes. We introduce two new classifiers that recursively produce homogeneous clusters and achieve higher performance than existing homogeneous-clustering methods, with a significant speed-up. The key idea is a tree data structure that holds the constructed clusters: internal nodes store clustering models, while leaves correspond to homogeneous clusters and store the corresponding class label. Classification is performed by simply traversing the tree. The two algorithms differ in the clustering method used to build the tree nodes: the first uses k-means, while the second applies EM clustering. The proposed algorithms are evaluated on a variety of datasets and compared with well-known methods. The results demonstrate very good classification performance combined with large computational savings.
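
The paper's own pseudocode is not reproduced on this page. The following minimal Python sketch only illustrates the idea for the k-means variant, assuming (as in RHC) one cluster per class present at each node; the names TreeNode, build_tree, and classify are hypothetical, not taken from the paper.

import numpy as np
from sklearn.cluster import KMeans

class TreeNode:
    # Internal nodes hold a fitted clustering model and one child per cluster;
    # leaves hold the class label of a homogeneous cluster.
    def __init__(self, label=None, model=None, children=None):
        self.label = label
        self.model = model
        self.children = children or []

def majority_label(y):
    values, counts = np.unique(y, return_counts=True)
    return values[np.argmax(counts)]

def build_tree(X, y):
    classes = np.unique(y)
    if len(classes) == 1:                      # homogeneous cluster -> leaf
        return TreeNode(label=classes[0])
    k = len(classes)                           # one cluster per class present
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    node = TreeNode(model=model)
    for c in range(k):
        mask = model.labels_ == c
        if not mask.any() or mask.all():       # degenerate split: stop recursing
            node.children.append(TreeNode(label=majority_label(y)))
        else:
            node.children.append(build_tree(X[mask], y[mask]))
    return node

def classify(node, x):
    # Traverse the tree, following the nearest cluster center at each node.
    while node.label is None:
        c = node.model.predict(x.reshape(1, -1))[0]
        node = node.children[c]
    return node.label

The EM variant described in the abstract could be sketched the same way by fitting, e.g., sklearn.mixture.GaussianMixture(n_components=k) in place of KMeans and obtaining cluster assignments with model.predict(X), since GaussianMixture has no labels_ attribute.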

Keywords

Classification · k-means · EM · Prototype generation

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • George Pardis (1)
  • Konstantinos I. Diamantaras (1)
  • Stefanos Ougiaroglou (1, 2)
  • Georgios Evangelidis (2)

  1. Department of Information and Electronic Engineering, International Hellenic University, Sindos, Thessaloniki, Greece
  2. Department of Applied Informatics, School of Information Sciences, University of Macedonia, Thessaloniki, Greece
