Robustness Analysis of Eleven Linear Classifiers in Extremely High-Dimensional Feature Spaces

  • Ludwig Lausser
  • Hans A. Kestler
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5998)


In this study we address the linear classification of noisy high-dimensional data in a two-class scenario. We assume that the number of samples is much lower than the dimensionality of the data, a setting in which the classification problem is further intensified by the presence of noise. Eleven linear classifiers were compared, in terms of classification accuracy and robustness, on 2150 artificial datasets from four different experimental setups and on five real-world gene expression profile datasets. We specifically focus on linear classifiers, as the use of more complex concept classes would make over-adaptation even more likely. Classification accuracy is measured by mean error rate and mean rank of error rate. These criteria place two large-margin classifiers, SVM and ALMA, and an online classification algorithm called PA at the top, with PA differing statistically significantly from SVM on the artificial data. Surprisingly, these algorithms also statistically significantly outperformed all investigated classifiers that employ dimensionality reduction.
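The online Passive-Aggressive (PA) algorithm highlighted above (reference 17) updates a linear model one sample at a time, making the smallest weight change that satisfies a unit hinge margin on the current sample. The following is a minimal sketch of the basic hard-margin PA update for binary labels in {-1, +1}; the exact PA variant and hyperparameters used in the study are not specified here, and the function names are illustrative.

```python
import numpy as np

def pa_train(X, y, epochs=5):
    """Train a linear classifier with the basic Passive-Aggressive update.

    X: (n_samples, n_features) array; y: labels in {-1, +1}.
    On each sample, if the hinge loss is positive, the weights move just
    far enough along y_i * x_i to restore a margin of 1 (Crammer et al., 2006).
    """
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x_i, y_i in zip(X, y):
            loss = max(0.0, 1.0 - y_i * np.dot(w, x_i))  # hinge loss
            if loss > 0.0:
                # step size: smallest update achieving margin 1 on (x_i, y_i)
                tau = loss / np.dot(x_i, x_i)
                w += tau * y_i * x_i
    return w

def predict(w, X):
    """Sign of the linear score for each row of X."""
    return np.sign(X @ w)
```

Because each update is the minimal correction for the current sample, PA remains cheap even when the feature dimensionality is very large, which is one reason online algorithms are attractive in the small-sample, high-dimensional setting the paper studies.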


Keywords: Support Vector Machine · Linear Discriminant Analysis · Robustness Analysis · Error Curve · Fisher Linear Discriminant Analysis


  1. Breiman, L., Friedman, J., Stone, C., Olshen, R.: Classification and Regression Trees. Chapman & Hall/CRC (1984)
  2. Vapnik, V.N.: Statistical Learning Theory. Wiley, New York (1998)
  3. Rojas, R.: Neural Networks: A Systematic Introduction. Springer, Heidelberg (1996)
  4. Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning. Springer, New York (2001)
  5. Pearson, K.: On lines and planes of closest fit to systems of points in space. Philosophical Magazine 2(6), 559–572 (1901)
  6. Ans, B., Hérault, J., Jutten, C.: Adaptive neural architectures: Detection of primitives. In: Proceedings of COGNITIVA 1985, pp. 593–597 (1985)
  7. Lu, J., Plataniotis, K., Venetsanopoulos, A.: Face recognition using LDA-based algorithms. IEEE Transactions on Neural Networks 14(1), 195–200 (2003)
  8. Zolnay, A., Kocharov, D., Schlüter, R., Ney, H.: Using multiple acoustic feature sets for speech recognition. Speech Commun. 49(6), 514–525 (2007)
  9. Buchholz, M., Kestler, H.A., Bauer, A., et al.: Specialized DNA arrays for the differentiation of pancreatic tumors. Clinical Cancer Research 11(22), 8048–8054 (2005)
  10. Cover, T.M.: Geometrical and Statistical Properties of Systems of Linear Inequalities with Applications in Pattern Recognition. IEEE Transactions on Electronic Computers 14(3), 326–334 (1965)
  11. Tibshirani, R., Hastie, T., Narasimhan, B., Chu, G.: Diagnosis of multiple cancer types by shrunken centroids of gene expression. PNAS 99(10), 6567–6572 (2002)
  12. Bhattacharyya, C., Grate, L.R., Rizki, A., et al.: Simultaneous classification and relevant feature identification in high-dimensional spaces: application to molecular profiling data. Signal Process. 83(4), 729–743 (2003)
  13. Veenman, C.J., Tax, D.M.: LESS: A model-based classifier for sparse subspaces. IEEE Transactions on Pattern Analysis and Machine Intelligence 27(9), 1496–1500 (2005)
  14. Rosenblatt, F.: The Perceptron: A Probabilistic Model for Information Storage and Organization in the Brain. Psych. Rev. 65(6), 386–407 (1958)
  15. Gentile, C.: A new approximate maximal margin classification algorithm. Journal of Machine Learning Research 2 (2001)
  16. Li, Y., Long, P.M.: The Relaxed Online Maximum Margin Algorithm. Machine Learning 46(1-3), 361–387 (2002)
  17. Crammer, K., Dekel, O., Keshet, J., Shalev-Shwartz, S., Singer, Y.: Online Passive-Aggressive Algorithms. Journal of Machine Learning Research 7, 551–585 (2006)
  18. Bittner, M., Meltzer, P., Chen, Y., et al.: Molecular classification of cutaneous malignant melanoma by gene expression profiling. Nature 406(6795), 536–540 (2000)
  19. Golub, T., Slonim, D., Tamayo, P., et al.: Molecular Classification of Cancer: Class Discovery and Class Prediction by Gene Expression Monitoring. Science 286(5439), 531–537 (1999)
  20. Notterman, D., Alon, U., Sierk, A., Levine, A.: Transcriptional Gene Expression Profiles of Colorectal Adenoma, Adenocarcinoma, and Normal Tissue Examined by Oligonucleotide Arrays. Cancer Research 61(7), 3124–3130 (2001)
  21. West, M., Blanchette, C., Dressman, H., et al.: Predicting the clinical status of human breast cancer by using gene expression profiles. PNAS 98(20), 11462–11467 (2001)
  22. Raudys, S., Duin, R.: Expected classification error of the Fisher linear classifier with pseudo-inverse covariance matrix. Pattern Recognition Letters 19(5), 385–392 (1998)
  23. Dougherty, E.R.: Feature-selection overfitting with small-sample classifier design. IEEE Intelligent Systems 20(6), 64–66 (2005)

Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Ludwig Lausser (1)
  • Hans A. Kestler (1, 2)
  1. Department of Internal Medicine I, University Hospital Ulm, Germany
  2. Institute of Neural Information Processing, University of Ulm, Germany
