Learning with Missing or Incomplete Data

Gabrys, Bogdan

doi:10.1007/978-3-642-04146-4_1


Part of the book series: Lecture Notes in Computer Science (LNIP, volume 5716)

Included in the following conference series:

International Conference on Image Analysis and Processing


Abstract

The problem of learning with missing or incomplete data has received considerable attention in the literature [6,10,13,21,23]. Data can be missing for many reasons. Missing input feature values may result from anything from sensor failures in engineering applications to the deliberate withholding of information in medical questionnaires, while missing labels stem from a lack of the solved (labelled) cases that supervised learning algorithms require. Although such problems are very interesting from both practical and theoretical points of view, very few pattern recognition techniques can deal with missing values in a straightforward and efficient manner. This is in sharp contrast to the very efficient way in which humans deal with unknown data and are able to perform various pattern recognition tasks given only a subset of input features or a few labelled reference cases.
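The idea of classifying from only the observed subset of input features can be illustrated with a minimal sketch. It assumes a nearest-centroid classifier (a generic technique chosen for illustration, not a method from this chapter) in which a missing feature value is marked as `None` and is simply left out of both centroid estimation and distance computation instead of being imputed. The helper names `class_centroids`, `partial_distance` and `predict` are hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence

# Hypothetical representation: a feature vector where None marks a missing value.
Vector = Sequence[Optional[float]]


def class_centroids(X: List[Vector], y: List[str]) -> Dict[str, List[Optional[float]]]:
    """Per-class mean of each feature, computed only from observed (non-None) values."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, List[int]] = {}
    for x, label in zip(X, y):
        s = sums.setdefault(label, [0.0] * len(x))
        c = counts.setdefault(label, [0] * len(x))
        for j, v in enumerate(x):
            if v is not None:
                s[j] += v
                c[j] += 1
    # A feature never observed for a class gets None rather than a guessed value.
    return {
        label: [total / n if n else None for total, n in zip(sums[label], counts[label])]
        for label in sums
    }


def partial_distance(x: Vector, centroid: List[Optional[float]]) -> float:
    """Euclidean distance over features present in both x and the centroid,
    rescaled by the fraction of features used (an assumed convention)."""
    used = [(a, b) for a, b in zip(x, centroid) if a is not None and b is not None]
    if not used:
        return math.inf  # nothing to compare on
    squared = sum((a - b) ** 2 for a, b in used)
    return math.sqrt(squared * len(x) / len(used))


def predict(x: Vector, cents: Dict[str, List[Optional[float]]]) -> str:
    """Assign x to the class whose centroid is nearest on the observed features."""
    return min(cents, key=lambda label: partial_distance(x, cents[label]))


if __name__ == "__main__":
    X = [[1.0, 1.0], [1.2, None], [5.0, 5.0], [None, 4.8]]
    y = ["a", "a", "b", "b"]
    cents = class_centroids(X, y)
    print(predict([None, 4.5], cents))  # only the second feature is observed
```

The rescaling by the fraction of observed features keeps distances computed over different feature subsets on a comparable scale; other weightings are equally plausible.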

In the context of pattern recognition or classification systems, the problem of missing labels and the problem of missing features are very often treated separately.

The availability or otherwise of labels determines the type of learning algorithm that can be used and has led to the well-known division into supervised, unsupervised and, more recently, hybrid (semi-supervised) learning algorithms.

Supervised learning algorithms commonly enable the design of robust, well-performing classifiers. Unfortunately, in many real-world applications labelling data is costly and therefore possible only to a limited extent. Unlabelled data, on the other hand, is often available in large quantities, but a classifier built using unsupervised learning is likely to perform worse than its supervised counterpart. Interest in mixed supervised and unsupervised learning is thus a natural consequence of this state of affairs, and various approaches have been discussed in the literature [2,5,10,12,14,15,18,19]. Our experimental results [10] have shown that, when supported by unlabelled samples, much less labelled data is generally required to build a classifier without compromising classification performance. If only a very limited amount of labelled data is available, however, results based on a random selection of labelled samples show high variability, and the performance of the final classifier depends more on how reliable the labelled samples are than on the use of additional unlabelled data. This raises an interesting question about the trade-off between the information content of the observed data (in this case, the available labels) and the gains achievable by employing sophisticated data processing algorithms, a question we also revisit when discussing approaches to missing feature values.
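How unlabelled samples can support a classifier built from few labels can be sketched with a simple self-training (hard-EM style) loop. This is a generic scheme assumed for illustration, not the algorithm used in the experiments of [10]: class centroids are initialised from the labelled samples only, each unlabelled sample is pseudo-labelled with its nearest centroid, and the centroids are refitted on the labelled plus pseudo-labelled data until they stop changing. The function names `mean`, `nearest` and `self_train_centroids` are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def mean(points: Sequence[Point]) -> Point:
    """Component-wise mean of a non-empty list of points."""
    return tuple(sum(col) / len(points) for col in zip(*points))


def nearest(x: Point, cents: Dict[str, Point]) -> str:
    """Label of the centroid closest to x (Euclidean distance)."""
    return min(cents, key=lambda label: math.dist(x, cents[label]))


def self_train_centroids(
    labelled: List[Tuple[Point, str]],
    unlabelled: List[Point],
    max_iter: int = 20,
) -> Dict[str, Point]:
    # Initialise one centroid per class from the labelled samples only;
    # every class therefore has at least one point, so mean() is never empty.
    classes = sorted({label for _, label in labelled})
    cents = {c: mean([x for x, lab in labelled if lab == c]) for c in classes}
    for _ in range(max_iter):
        # E-like step: pseudo-label each unlabelled point with its nearest centroid.
        pseudo = [(x, nearest(x, cents)) for x in unlabelled]
        # M-like step: refit the centroids on labelled + pseudo-labelled data.
        pool = labelled + pseudo
        new = {c: mean([x for x, lab in pool if lab == c]) for c in classes}
        if new == cents:  # assignments have stabilised
            break
        cents = new
    return cents


if __name__ == "__main__":
    labelled = [((0.0, 0.0), "a"), ((10.0, 10.0), "b")]
    unlabelled = [(1.0, 1.0), (2.0, 0.0), (9.0, 9.0), (8.0, 10.0)]
    cents = self_train_centroids(labelled, unlabelled)
    print(cents, nearest((4.0, 4.0), cents))
```

Because the starting centroids come solely from the labelled samples, an unreliable labelled set propagates into every pseudo-label; this is one plausible mechanism behind the sensitivity to the choice of labelled samples described above.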


References

[1] Berthold, M.R., Huber, K.-P.: Missing values and learning of fuzzy rules. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems 6(2), 171–178 (1998)
[2] Blum, A., Mitchell, T.: Combining labeled and unlabeled data with co-training. In: Proceedings of the Eleventh Annual Conference on Computational Learning Theory, pp. 92–100. ACM, New York (1998)
[3] Budka, M., Gabrys, B.: Electrostatic Field Classifier for Deficient Data. In: The Sixth International Conference on Computer Recognition Systems, Jelenia Góra, Poland, May 25–28 (2009)
[4] Budka, M., Gabrys, B.: Mixed supervised and unsupervised learning from incomplete data using a physical field model. Natural Computing (submitted, 2009)
[5] Dara, R., Kremer, S., Stacey, D.: Clustering unlabeled data with SOMs improves classification of labeled real-world data. In: Proceedings of the World Congress on Computational Intelligence, WCCI (2002)
[6] Dempster, A., Laird, N., Rubin, D.: Maximum Likelihood from Incomplete Data via the EM Algorithm. Journal of the Royal Statistical Society, Series B (Methodological) 39(1), 1–38 (1977)
[7] Gabrys, B.: Agglomerative Learning Algorithms for General Fuzzy Min-Max Neural Network. The Journal of VLSI Signal Processing 32(1), 67–82 (2002)
[8] Gabrys, B.: Neuro-fuzzy approach to processing inputs with missing values in pattern recognition problems. International Journal of Approximate Reasoning 30(3), 149–179 (2002)
[9] Gabrys, B., Bargiela, A.: General fuzzy min-max neural network for clustering and classification. IEEE Transactions on Neural Networks 11(3), 769–783 (2000)
[10] Gabrys, B., Petrakieva, L.: Combining labelled and unlabelled data in the design of pattern classification systems. International Journal of Approximate Reasoning 35(3), 251–273 (2004)
[11] Ghahramani, Z., Jordan, M.: Supervised learning from incomplete data via an EM approach. In: Cowan, J.D., Tesauro, G., Alspector, J. (eds.) Advances in Neural Information Processing Systems, vol. 6, pp. 120–127 (1994)
[12] Goldman, S., Zhou, Y.: Enhancing supervised learning with unlabeled data. In: Proceedings of ICML (1998)
[13] Graham, J., Cumsille, P., Elek-Fisk, E.: Methods for handling missing data. Handbook of Psychology 2, 87–114 (2003)
[14] Kothari, R., Jain, V.: Learning from labeled and unlabeled data. In: Proceedings of the 2002 International Joint Conference on Neural Networks (IJCNN 2002), vol. 3 (2002)
[15] Mitchell, T.: The role of unlabeled data in supervised learning. In: Proceedings of the Sixth International Colloquium on Cognitive Science (1999)
[16] Nauck, D., Kruse, R.: Learning in neuro-fuzzy systems with symbolic attributes and missing values. In: Proceedings of the International Conference on Neural Information Processing (ICONIP 1999), Perth, pp. 142–147 (1999)
[17] Nijman, M.J., Kappen, H.J.: Symmetry breaking and training from incomplete data with radial basis Boltzmann machines. International Journal of Neural Systems 8(3), 301–315 (1997)
[18] Nigam, K., Ghani, R.: Understanding the behavior of co-training. In: Proceedings of the KDD 2000 Workshop on Text Mining (2000)
[19] Pedrycz, W., Waletzky, J.: Fuzzy clustering with partial supervision. IEEE Transactions on Systems, Man, and Cybernetics, Part B 27(5), 787–795 (1997)
[20] Rubin, D.: Inference and missing data. Biometrika 63(3), 581–592 (1976)
[21] Rubin, D.: Multiple Imputation for Nonresponse in Surveys. Wiley-Interscience, Hoboken (1987)
[22] Ruta, D., Gabrys, B.: A Framework for Machine Learning based on Dynamic Physical Fields. Natural Computing Journal on Nature-inspired Learning and Adaptive Systems 8(2), 219–237 (2009)
[23] Schafer, J., Graham, J.: Missing data: Our view of the state of the art. Psychological Methods 7(2), 147–177 (2002)
[24] Tresp, V., Ahmad, S., Neuneier, R.: Training neural networks with deficient data. Advances in Neural Information Processing Systems 6, 128–135 (1994)


Author information

Authors and Affiliations

Smart Technology Research Centre, Computational Intelligence Research Group, Bournemouth University, UK
Bogdan Gabrys


Editor information

Editors and Affiliations

Dipartimento di Ingegneria dell’Informazione e Ingegneria Elettrica, Università di Salerno, Via Ponte Don Melillo, 1, 84084, Fisciano (SA), Italy
Pasquale Foggia
Dipartimento di Informatica e Sistemistica, Università di Napoli Federico II, Via Claudio, 21, I-80125, Napoli, Italy
Carlo Sansone
Dipartimento di Ingegneria dell’Informazione ed Ingegneria Elettrica, Università di Salerno, via P.te Don Melillo, I-84084, Fisciano (SA), Italy
Mario Vento


About this paper

Cite this paper

Gabrys, B. (2009). Learning with Missing or Incomplete Data. In: Foggia, P., Sansone, C., Vento, M. (eds) Image Analysis and Processing – ICIAP 2009. ICIAP 2009. Lecture Notes in Computer Science, vol 5716. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04146-4_1


DOI: https://doi.org/10.1007/978-3-642-04146-4_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-04145-7
Online ISBN: 978-3-642-04146-4
eBook Packages: Computer Science, Computer Science (R0)


Societies and partnerships

The International Association for Pattern Recognition
