
An Efficient Strategy to Handle Complex Datasets Having Multimodal Distribution

  • Samira Ghodratnama
  • Reza Boostani
Part of the Emergence, Complexity and Computation book series (ECC, volume 14)

Abstract

One of the main shortcomings of conventional classifiers appears when they face datasets with multimodal distributions. To overcome this drawback, an efficient strategy is proposed in which a clustering phase is first executed over all samples to partition the feature space into separate subspaces (clusters). Since clustering ignores the class labels, each cluster may contain samples belonging to different classes. In the next phase, a separate classifier is trained on each of the created clusters. The main advantage of this distributed approach is that it simplifies a complex pattern recognition problem by training a specific classifier for each subspace; a classifier applied to a single local cluster is expected to perform better than one applied across several scattered clusters at once. In the validation and test phases, before deciding which classifier should be applied, the nearest cluster to the input sample is found and the corresponding trained classifier is used. Experimental results on several UCI datasets demonstrate a significant advantage of the proposed distributed classifier system over single-classifier approaches.
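
The following is a minimal sketch of the cluster-then-classify strategy described above, assuming scikit-learn is available. KMeans stands in for the clustering phase and a decision tree for the per-cluster classifier; both are placeholders chosen for illustration, not the components prescribed by the chapter.

```python
# Sketch of the distributed (cluster-then-classify) strategy.
# Assumptions: scikit-learn, KMeans as the clustering step, decision trees
# as the per-cluster classifiers. These are illustrative choices only.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier


class ClusterwiseClassifier:
    def __init__(self, n_clusters=3):
        self.clusterer = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
        self.models = {}

    def fit(self, X, y):
        # Phase 1: partition the feature space without using the labels.
        cluster_ids = self.clusterer.fit_predict(X)
        # Phase 2: train one classifier per cluster on that cluster's samples.
        for c in np.unique(cluster_ids):
            mask = cluster_ids == c
            clf = DecisionTreeClassifier(random_state=0)
            clf.fit(X[mask], y[mask])
            self.models[c] = clf
        return self

    def predict(self, X):
        # Test phase: route each sample to the classifier of its nearest cluster.
        cluster_ids = self.clusterer.predict(X)
        return np.array([
            self.models[c].predict(x.reshape(1, -1))[0]
            for x, c in zip(X, cluster_ids)
        ])
```

Usage follows the usual estimator pattern: call fit on the training data, then predict on held-out samples; each test sample is assigned to its nearest cluster and classified by that cluster's locally trained model.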

Keywords

Distributed classifiers · classifier ensembles · subspace classification · distributed learning · complex systems



Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. School of Electrical and Computer Engineering, Shiraz University, Shiraz, Iran
