Representative Prototype Sets for Data Characterization and Classification

  • Ludwig Lausser
  • Christoph Müssel
  • Hans A. Kestler
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7477)

Abstract

Common classifier models are designed to achieve high accuracies, while often neglecting the question of interpretability. In particular, most classifiers do not allow conclusions to be drawn about the structure and quality of the underlying training data. By keeping the classifier model simple, an intuitive interpretation of the model and the corresponding training data is possible. The lack of accuracy of such simple models can be compensated for by combining the decisions of several classifiers. We propose an approach that is particularly suitable for high-dimensional data sets of low cardinality, such as data obtained from high-throughput biomolecular experiments. Here, simple base classifiers are obtained by choosing one data point of each class as a prototype for nearest neighbour classification. By enumerating all such classifiers for a specific data set, one can obtain a systematic description of the data structure in terms of class coherence. We also investigate the performance of the classifiers in cross-validation experiments by applying stand-alone prototype classifiers as well as ensembles of selected prototype classifiers.
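
To make the enumeration idea concrete, the following Python sketch builds, for a two-class data set, every nearest-prototype classifier defined by choosing one sample per class and records its empirical accuracy. The function names, the Euclidean distance, and the resubstitution-style evaluation are illustrative assumptions, not the paper's exact procedure.

    # Minimal sketch (assumptions: two classes, Euclidean distance) of the
    # enumeration of one-prototype-per-class nearest neighbour classifiers.
    import numpy as np

    def prototype_accuracy(X, y, proto_a, proto_b, label_a, label_b):
        """Assign each sample to the label of its nearer prototype and
        return the fraction of correct assignments (empirical accuracy)."""
        d_a = np.linalg.norm(X - proto_a, axis=1)
        d_b = np.linalg.norm(X - proto_b, axis=1)
        predictions = np.where(d_a <= d_b, label_a, label_b)
        return float(np.mean(predictions == y))

    def enumerate_prototype_classifiers(X, y):
        """Enumerate all classifiers defined by one prototype per class."""
        labels = np.unique(y)
        assert len(labels) == 2, "this sketch assumes a two-class problem"
        idx_a = np.where(y == labels[0])[0]
        idx_b = np.where(y == labels[1])[0]
        profile = []
        for i in idx_a:
            for j in idx_b:
                acc = prototype_accuracy(X, y, X[i], X[j], labels[0], labels[1])
                profile.append(((i, j), acc))
        return profile

    # Illustrative usage on random data standing in for a high-dimensional,
    # low-cardinality expression matrix (20 samples, 100 features).
    rng = np.random.default_rng(0)
    X = rng.normal(size=(20, 100))
    y = np.array([0] * 10 + [1] * 10)
    profile = enumerate_prototype_classifiers(X, y)
    best_pair, best_accuracy = max(profile, key=lambda item: item[1])

The accuracy profile over all prototype pairs corresponds to the systematic description of class coherence mentioned above; in practice, selected prototype classifiers or ensembles of them would be evaluated in cross-validation rather than on the training set alone.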

Keywords

Nearest Neighbour, Learning Vector Quantization, Sample Compression, Empirical Accuracy, Shrunken Centroid

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Ludwig Lausser (1)
  • Christoph Müssel (1)
  • Hans A. Kestler (1)

  1. Research Group Bioinformatics and Systems Biology, Institute of Neural Information Processing, Ulm University, Ulm, Germany
