Label-Noise Robust Logistic Regression and Its Applications

  • Jakramate Bootkrajang
  • Ata Kabán
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7523)


The classical problem of learning a classifier relies on a set of labelled examples, without ever questioning the correctness of the provided label assignments. However, there is an increasing realisation that labelling errors are not uncommon in real situations. In this paper we consider a label-noise robust version of the logistic regression and multinomial logistic regression classifiers and develop the following contributions: (i) We derive efficient multiplicative updates to estimate the label flipping probabilities, and we give a proof of convergence for our algorithm. (ii) We develop a novel sparsity-promoting regularisation approach which allows us to tackle challenging high-dimensional noisy settings. (iii) Finally, we thoroughly evaluate the performance of our approach in synthetic experiments, and we demonstrate several real applications including gene expression analysis, class topology discovery, and learning from crowdsourced data.
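The core idea behind label-noise robust logistic regression is to model the probability of the *observed* (possibly flipped) label as a mixture over the true label: if s = σ(wᵀx) is the clean-label posterior and γ_jk = P(ỹ = k | y = j) are the flip probabilities, then P(ỹ = 1 | x) = γ₀₁(1 − s) + (1 − γ₁₀)s, and the noisy-label likelihood is maximised over w. The sketch below illustrates this model with fixed, known flip probabilities and plain gradient ascent; it is not the authors' algorithm (which estimates the γ's via multiplicative updates), and the function name and demo data are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    # numerically safe logistic function
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30, 30)))

def fit_robust_lr(X, y_noisy, g01=0.2, g10=0.2, lr=0.1, n_iter=500):
    """Fit logistic regression assuming the observed labels were flipped
    with fixed probabilities g01 = P(y~=1|y=0) and g10 = P(y~=0|y=1)."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iter):
        s = sigmoid(X @ w)                 # P(y = 1 | x), clean label
        p = g01 * (1 - s) + (1 - g10) * s  # P(y~ = 1 | x), noisy label
        # gradient of the noisy-label log-likelihood w.r.t. w
        dp_ds = (1 - g10) - g01
        grad_p = (y_noisy / np.clip(p, 1e-9, None)
                  - (1 - y_noisy) / np.clip(1 - p, 1e-9, None))
        grad = X.T @ (grad_p * dp_ds * s * (1 - s)) / n
        w += lr * grad                     # gradient ascent step
    return w

# demo on synthetic linearly separable data with 20% flipped labels
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
y_true = (X @ np.array([2.0, -2.0]) > 0).astype(float)
flip = rng.random(500) < 0.2
y_noisy = np.where(flip, 1 - y_true, y_true)

w = fit_robust_lr(X, y_noisy, g01=0.2, g10=0.2)
acc = np.mean((sigmoid(X @ w) > 0.5) == y_true)
```

Because the mixture "absorbs" a fraction of the disagreement between model and observed labels, the fitted hyperplane is pulled far less by mislabelled points than a standard logistic fit would be.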


Keywords: Logistic Regression · Receiver Operating Characteristic Curve · Local Binary Pattern · Multinomial Logistic Regression · True Label



Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Jakramate Bootkrajang (1)
  • Ata Kabán (1)
  1. School of Computer Science, The University of Birmingham, Birmingham, UK
