Correcting the Training Data

Barandela, Ricardo; Gasca, Eduardo; Alejo, Roberto

doi:10.1007/978-1-4613-0231-5_1

Part of the book series: Combinatorial Optimization (COOP, volume 13)

Abstract

Traditionally, learning algorithms and pattern recognition methods have been sorted into two broad groups, supervised and unsupervised (predictive and informative in Data Mining terminology), according to whether training data are available. Supervised classifier design is based on the information supplied by a training sample (TS): a set of training patterns, instances, or prototypes that are assumed to represent all the relevant classes and to bear correct class labels.
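
As a minimal sketch of the definition above (not code from the chapter), a training sample can be modelled as a list of labelled patterns, and the first of the two assumptions, that every relevant class is represented, can be checked directly. The names `Pattern`, `class_counts`, and `missing_classes`, as well as the class labels and feature values, are hypothetical and chosen only for illustration. The second assumption, that the labels are correct, cannot be verified by a check of this kind.

```python
from collections import Counter
from typing import NamedTuple, Sequence


class Pattern(NamedTuple):
    """One training pattern (hypothetical structure, for illustration only)."""
    features: tuple[float, ...]  # feature vector of the pattern
    label: str                   # class label, assumed but not guaranteed correct


def class_counts(ts: Sequence[Pattern]) -> Counter:
    """Count how many training patterns carry each class label."""
    return Counter(p.label for p in ts)


def missing_classes(ts: Sequence[Pattern], relevant: set[str]) -> set[str]:
    """Return the relevant classes that have no pattern in the training sample."""
    return relevant - set(class_counts(ts))


# Toy training sample with made-up feature values and labels.
ts = [
    Pattern((0.1, 0.2), "water"),
    Pattern((0.9, 0.8), "forest"),
    Pattern((0.2, 0.1), "water"),
]

if __name__ == "__main__":
    print(class_counts(ts))
    print(missing_classes(ts, {"water", "forest", "urban"}))
```

A non-empty result from `missing_classes` signals that the sample violates the representativeness assumption for the listed classes.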

Author information

Authors and Affiliations

Instituto Tecnológico de Toluca, 52140, Metepec, México
Ricardo Barandela, Eduardo Gasca & Roberto Alejo
Instituto de Geografía Tropical, La Habana, Cuba
Ricardo Barandela

Editor information

Editors and Affiliations

University of Wisconsin — Green Bay, Green Bay, WI, USA
Dechang Chen
The George Washington University, Washington DC, USA
Xiuzhen Cheng

Cite this chapter

Barandela, R., Gasca, E., Alejo, R. (2003). Correcting the Training Data. In: Chen, D., Cheng, X. (eds) Pattern Recognition and String Matching. Combinatorial Optimization, vol 13. Springer, Boston, MA. https://doi.org/10.1007/978-1-4613-0231-5_1

DOI: https://doi.org/10.1007/978-1-4613-0231-5_1
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4613-7952-2
Online ISBN: 978-1-4613-0231-5
eBook Packages: Springer Book Archive
