
Natural Computing, Volume 10, Issue 2, pp 921–945

Electrostatic field framework for supervised and semi-supervised learning from incomplete data

  • Marcin Budka
  • Bogdan Gabrys

Abstract

In this paper, a classification framework for incomplete data based on an electrostatic field model is proposed. The framework takes an original approach to exploiting incomplete training data with missing features, making extensive use of an electrostatic charge analogy. It supports a hybrid supervised and unsupervised training scenario, enabling simultaneous learning from both labelled and unlabelled data using a single set of rules and adaptation mechanisms. Classification of incomplete patterns is facilitated by a local dimensionality reduction technique, which exploits all available information by using the data ‘as is’, rather than attempting to estimate the missing values. The performance of all proposed methods has been extensively tested across a wide range of missing data scenarios on a number of standard benchmark datasets, so that the results are comparable with those in current and future literature. Several modifications to the original Electrostatic Field Classifier, aimed at improving speed and robustness in higher-dimensional spaces, are also introduced and discussed.
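Two ideas in the abstract lend themselves to a short illustration: training patterns acting as charges whose aggregate field attracts a query point, and classification of incomplete patterns carried out in the subspace of observed features only. The sketch below is a minimal illustration of those two ideas under stated assumptions, not a reproduction of the paper's framework: it assumes unit charges, a Coulomb-style inverse-square attraction, and NaN markers for missing feature values; the paper's charge adaptation rules and semi-supervised mechanisms are not modelled here.

```python
import numpy as np

def field_magnitude(x, charges, eps=1e-9):
    """Sum of Coulomb-style 1/r^2 contributions from a set of unit charges,
    using only the features observed in x (NaN marks a missing value)."""
    observed = ~np.isnan(x)                      # local dimensionality reduction:
    diffs = charges[:, observed] - x[observed]   # compare in the observed subspace only
    r2 = np.sum(diffs ** 2, axis=1)              # squared distances to each charge
    return np.sum(1.0 / (r2 + eps))              # eps guards against division by zero

def classify(x, class_charges):
    """Assign x to the class whose charges exert the strongest field on it."""
    scores = {label: field_magnitude(x, charges)
              for label, charges in class_charges.items()}
    return max(scores, key=scores.get)

# Toy usage: two Gaussian classes; the test pattern has a missing second feature.
rng = np.random.default_rng(0)
class_charges = {0: rng.normal(0.0, 1.0, size=(50, 2)),
                 1: rng.normal(3.0, 1.0, size=(50, 2))}
x = np.array([2.8, np.nan])
print(classify(x, class_charges))                # expected: 1
```

Note that the incomplete pattern is classified ‘as is’: no imputation takes place, since the field is simply evaluated in the lower-dimensional space spanned by the observed features.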

Keywords

Pattern classification · Deficient data · Gravity field · Electrostatic field · Incomplete data · Hybrid learning · Machine learning · Missing data · Physical phenomena


Copyright information

© Springer Science+Business Media B.V. 2010

Authors and Affiliations

  1. Computational Intelligence Research Group, School of Design, Engineering & Computing, Bournemouth University, Fern Barrow, Poole, UK
