
Time integration and reject options for probabilistic output of pairwise LVQ

  • Johannes Brinkrolf
  • Barbara Hammer
WSOM 2017

Abstract

Learning vector quantization (LVQ) is a popular machine learning technology with applications, for example, in biomedical data analysis, predictive maintenance and quality control, and product individualization. Although probabilistic LVQ variants exist, deterministic counterparts are often preferred because of their greater efficiency. The latter, however, do not offer an immediate probabilistic interpretation of their output, so rejecting a classification based on confidence values is not possible. In this contribution, we investigate different schemes for extending and integrating pairwise LVQ classifiers into an overall probabilistic output, and we compare them with a recent heuristic surrogate measure for the security of the classification that is based directly on LVQ's multi-class classification scheme. Furthermore, we propose a canonical way to fuse these values over a given time window when a possibly disrupted measurement is taken over a longer time interval, in order to counter the uncertainty of a single point in time. Experimental results indicate that an explicit probabilistic treatment often yields superior results compared to a standard deterministic LVQ method, but metric learning can annul this difference. Fusion over a short time period is beneficial when a single classification is unclear.
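
To make the pipeline described above concrete, the following minimal sketch illustrates one way such a system could look: per-class prototype distances, sigmoid confidences on pairwise relative distances, a reject option via a probability threshold, and product-rule fusion over a time window. This is an illustrative sketch only; the simple averaging-based coupling, the product-rule fusion, and all names and parameters (e.g. the sigmoid slope gamma and the reject threshold) are assumptions for illustration, not the exact schemes evaluated in the paper.

```python
# Illustrative sketch (NumPy): LVQ-style classification with pairwise confidences,
# a reject option, and fusion over a time window. All design choices here are
# assumptions made for illustration, not the paper's exact method.
import numpy as np

def class_distances(x, prototypes, proto_labels):
    """Distance from sample x to the closest prototype of each class."""
    classes = np.unique(proto_labels)
    d = np.linalg.norm(prototypes - x, axis=1)
    return classes, np.array([d[proto_labels == c].min() for c in classes])

def pairwise_probabilities(dists, gamma=4.0):
    """Sigmoid of the relative distance for every class pair (i, j)."""
    n = len(dists)
    P = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            rel = (dists[j] - dists[i]) / (dists[i] + dists[j] + 1e-12)
            P[i, j] = 1.0 / (1.0 + np.exp(-gamma * rel))  # p(class i | i vs j)
    return P

def class_probabilities(P):
    """Average the pairwise estimates per class and normalise (simple coupling)."""
    n = P.shape[0]
    scores = P.sum(axis=1) / (n - 1)
    return scores / scores.sum()

def classify_with_reject(x, prototypes, proto_labels, threshold=0.6):
    """Return (label, probabilities); label is None if confidence is too low."""
    classes, dists = class_distances(x, prototypes, proto_labels)
    p = class_probabilities(pairwise_probabilities(dists))
    if p.max() < threshold:
        return None, p  # reject option: no class is sufficiently certain
    return classes[p.argmax()], p

def fuse_over_window(prob_list):
    """Fuse per-time-step class probabilities (product rule, then renormalise)."""
    fused = np.prod(np.vstack(prob_list), axis=0)
    return fused / fused.sum()
```

The product-rule fusion assumes the per-time-step observations are independent given the class; more elaborate coupling of the pairwise estimates (e.g. Hastie-Tibshirani pairwise coupling) could replace the simple averaging used here.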

Keywords

Prototype-based classifiers · Probabilistic output · Time integration

Notes

Acknowledgements

This research has been funded by the Federal Ministry of Education and Research of Germany within the project ITS.ML (BMBF Grant Number 01IS18041A).

Compliance with ethical standards

Conflict of interest

The authors declare that they have no conflict of interest.


Copyright information

© Springer-Verlag London Ltd., part of Springer Nature 2019

Authors and Affiliations

  1. Machine Learning Group, Bielefeld University, Bielefeld, Germany
  2. Research Institute for Cognition and Robotics (CoR-Lab), Bielefeld University, Bielefeld, Germany
