Apache Spark Implementation of the Distance-Based Kernel-Based Fuzzy C-Means Clustering Classifier

  • Joanna Jȩdrzejowicz
  • Piotr Jȩdrzejowicz
  • Izabela Wierzbowska (corresponding author)
Conference paper
Part of the Smart Innovation, Systems and Technologies book series (SIST, volume 56)

Abstract

The paper presents an implementation of a classification algorithm based on kernel-based fuzzy C-means clustering. The algorithm is implemented in the Apache Spark environment and is built on Resilient Distributed Datasets (RDDs) and their actions and transformations. This choice allows the data to be processed in parallel.
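
To make the idea concrete, the following is a minimal sketch of one kernel-based fuzzy C-means iteration, assuming the commonly used Gaussian-kernel formulation. It is written in plain Python, with `map` and `reduce` standing in for RDD transformations and actions; it is not the authors' implementation, and the classifier built on top of the clustering is not shown. The function names and the parameters `m` (fuzzifier) and `sigma` (kernel width) are assumptions introduced for illustration.

```python
import math
from functools import reduce


def gaussian_kernel(x, v, sigma=1.0):
    """Gaussian kernel K(x, v) = exp(-||x - v||^2 / sigma^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, v))
    return math.exp(-sq_dist / sigma ** 2)


def memberships(x, centroids, m=2.0, sigma=1.0, eps=1e-12):
    """Fuzzy membership of point x in each cluster; the values sum to 1."""
    # Kernel-induced distance is 1 - K(x, v); eps guards against division by zero
    # when x coincides with a centroid.
    weights = [
        (1.0 - gaussian_kernel(x, v, sigma) + eps) ** (-1.0 / (m - 1.0))
        for v in centroids
    ]
    total = sum(weights)
    return [w / total for w in weights]


def update_centroids(points, centroids, m=2.0, sigma=1.0):
    """One iteration: map every point to per-cluster partial sums, then reduce."""

    def contribution(x):
        # Per-point work; in Spark this would be the function passed to a map.
        u = memberships(x, centroids, m, sigma)
        parts = []
        for u_i, v in zip(u, centroids):
            w = (u_i ** m) * gaussian_kernel(x, v, sigma)
            parts.append(([w * a for a in x], w))
        return parts

    def combine(left, right):
        # Associative merge of partial sums; in Spark this would drive a reduce.
        return [
            ([a + b for a, b in zip(l_sum, r_sum)], l_w + r_w)
            for (l_sum, l_w), (r_sum, r_w) in zip(left, right)
        ]

    totals = reduce(combine, map(contribution, points))
    # Keep the old centroid if a cluster received no weight at all.
    return [
        [s / w for s in sums] if w > 0 else list(v)
        for (sums, w), v in zip(totals, centroids)
    ]
```

Because each point's contribution depends only on the current centroids, the per-point step can run independently on every partition, and only the small per-cluster sums need to be merged. That independence is what makes the computation a natural fit for parallel data manipulation.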

Keywords

Clustering, Classification, Fuzzy C-means

Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Joanna Jȩdrzejowicz (1)
  • Piotr Jȩdrzejowicz (2)
  • Izabela Wierzbowska (2, corresponding author)
  1. Institute of Informatics, Gdańsk University, Gdańsk, Poland
  2. Department of Information Systems, Gdynia Maritime University, Gdynia, Poland
