Rare-Events Classification: An Approach Based on Genetic Algorithm and Voronoi Tessellation

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11154)

Abstract

Classification is a major constituent of the data mining toolkit. Well-known classification methods are built either on the principle of logic or on statistical reasoning. For imbalanced and noisy data, however, classification may fail to deliver on a basic data mining goal, namely identifying statistical dependencies in the data. In this article, we propose a novel data mining strategy based on partitioning the feature space through Voronoi tessellation and a Genetic Algorithm, where the latter is applied to solve a combinatorial optimization problem. We apply the methodology to a range of classification problems of varying imbalance and noise, and compare its performance with that of well-known classification methods such as SVM, KNN, and ANN. The results indicate that the proposed methodology is well suited to data mining tasks with highly imbalanced classes and significant noise.
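
The abstract does not spell out the algorithm, so the following is only a rough sketch of how a Voronoi partition and a genetic algorithm could be combined for rare-event classification. It is not the authors' implementation. Every design choice here is an assumption made for illustration: cell centres are drawn from the training points, a cell is flagged as "rare" when its share of rare-class points exceeds the global base rate, the fitness is balanced accuracy on the training data, and the GA uses tournament selection, uniform crossover, and single-gene mutation. All function names (`nearest_cell`, `cell_labels`, `balanced_accuracy`, `evolve`, `make_toy_data`) are hypothetical.

```python
import random

def nearest_cell(x, centres):
    """Index of the Voronoi cell containing x, i.e. the nearest centre."""
    return min(range(len(centres)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(x, centres[i])))

def cell_labels(centres, X, y):
    """Label a cell 1 if its rare-class share exceeds the global base rate (assumed rule)."""
    base = sum(y) / len(y)
    pos, tot = [0] * len(centres), [0] * len(centres)
    for x, t in zip(X, y):
        c = nearest_cell(x, centres)
        tot[c] += 1
        pos[c] += t
    return [1 if tot[c] and pos[c] / tot[c] > base else 0
            for c in range(len(centres))]

def balanced_accuracy(centres, labels, X, y):
    """Mean of the true-positive rate and the true-negative rate."""
    tp = tn = p = n = 0
    for x, t in zip(X, y):
        pred = labels[nearest_cell(x, centres)]
        if t == 1:
            p += 1
            tp += pred
        else:
            n += 1
            tn += 1 - pred
    if p == 0 or n == 0:
        raise ValueError("both classes must be present")
    return 0.5 * (tp / p + tn / n)

def fitness(chrom, X, y):
    """A chromosome is a list of indices into X that serve as cell centres."""
    centres = [X[i] for i in chrom]
    return balanced_accuracy(centres, cell_labels(centres, X, y), X, y)

def evolve(X, y, k=4, pop_size=20, generations=30, p_mut=0.2, seed=0):
    """Search over sets of k centres; returns (best chromosome, its fitness)."""
    rng = random.Random(seed)
    pop = [rng.sample(range(len(X)), k) for _ in range(pop_size)]
    best, best_fit = None, -1.0
    for _ in range(generations):
        scored = [(fitness(c, X, y), c) for c in pop]
        top = max(scored, key=lambda s: s[0])
        if top[0] > best_fit:
            best_fit, best = top[0], top[1][:]
        nxt = [best[:]]  # elitism: carry the best chromosome forward
        while len(nxt) < pop_size:
            # Tournament selection of two parents (tournament size 3).
            a = max(rng.sample(scored, 3), key=lambda s: s[0])[1]
            b = max(rng.sample(scored, 3), key=lambda s: s[0])[1]
            # Uniform crossover, then mutation of one randomly chosen gene.
            child = [rng.choice(pair) for pair in zip(a, b)]
            if rng.random() < p_mut:
                child[rng.randrange(k)] = rng.randrange(len(X))
            nxt.append(child)
        pop = nxt
    return best, best_fit

def make_toy_data(seed=1):
    """Synthetic 2-D data: 100 majority points, 10 clustered rare points."""
    rng = random.Random(seed)
    X = [(rng.random(), rng.random()) for _ in range(100)]
    y = [0] * 100
    X += [(rng.gauss(0.8, 0.05), rng.gauss(0.8, 0.05)) for _ in range(10)]
    y += [1] * 10
    return X, y

if __name__ == "__main__":
    X, y = make_toy_data()
    best, fit = evolve(X, y)
    print("centre indices:", best, "balanced accuracy:", round(fit, 3))
```

Scoring against the base rate rather than a majority vote is one way to keep rare-class cells from being swamped by the majority class. With this rule the training balanced accuracy cannot fall below 0.5, because every flagged cell contributes more to the true-positive rate than to the false-positive rate. A held-out set would be needed to judge generalisation.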

Author information

Corresponding author

Correspondence to Abdul Rauf Khan.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Khan, A.R., Schiøler, H., Zaki, M., Kulahci, M. (2018). Rare-Events Classification: An Approach Based on Genetic Algorithm and Voronoi Tessellation. In: Ganji, M., Rashidi, L., Fung, B., Wang, C. (eds) Trends and Applications in Knowledge Discovery and Data Mining. PAKDD 2018. Lecture Notes in Computer Science, vol 11154. Springer, Cham. https://doi.org/10.1007/978-3-030-04503-6_26

  • DOI: https://doi.org/10.1007/978-3-030-04503-6_26

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-04502-9

  • Online ISBN: 978-3-030-04503-6

  • eBook Packages: Computer Science, Computer Science (R0)
