
Mining Dependence Structures from Statistical Learning Perspective

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2412)

Abstract

Mining various dependence structures from data is important to many data mining applications. In this paper, several major dependence-structure mining tasks are surveyed from a statistical learning perspective, together with a number of major results on unsupervised learning models ranging from a single-object world to a multi-object world. Moreover, efforts towards a key challenge in learning are discussed in three typical streams, based on generalization error bounds, the Ockham principle, and BYY harmony learning, respectively.
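As a concrete illustration of the penalized-likelihood (Ockham-principle) stream mentioned above, the following minimal sketch fits Gaussian mixture models of increasing complexity to synthetic data and selects the number of components by BIC. It is an illustrative assumption-laden example using scikit-learn, not the paper's BYY harmony learning method; the two-cluster data, the range of candidate components, and the random seed are all chosen purely for demonstration.

# Minimal sketch: Ockham-style model selection for an unsupervised mixture model.
# Illustrative only -- synthetic data and BIC stand in for the penalized-likelihood
# criteria surveyed in the paper; this is not the BYY harmony learning method.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Synthetic "single-object world" data: two Gaussian clusters in 2-D.
X = np.vstack([
    rng.normal(loc=(0.0, 0.0), scale=0.5, size=(200, 2)),
    rng.normal(loc=(3.0, 3.0), scale=0.5, size=(200, 2)),
])

# Fit mixtures with k = 1..6 components and score each with BIC, which trades
# data fit against model complexity (an Ockham-type penalty).
bic_scores = {}
for k in range(1, 7):
    gmm = GaussianMixture(n_components=k, random_state=0).fit(X)
    bic_scores[k] = gmm.bic(X)

best_k = min(bic_scores, key=bic_scores.get)
print("BIC per k:", {k: round(v, 1) for k, v in bic_scores.items()})
print("Selected number of components:", best_k)

On this toy data the BIC minimum is expected at k = 2, illustrating how a complexity penalty can recover the underlying dependence structure without supervision.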

The work described in this paper was fully supported by a grant from the Research Grants Council of the Hong Kong SAR (project No. CUHK4383/99E).





Copyright information

© 2002 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Xu, L. (2002). Mining Dependence Structures from Statistical Learning Perspective. In: Yin, H., Allinson, N., Freeman, R., Keane, J., Hubbard, S. (eds) Intelligent Data Engineering and Automated Learning — IDEAL 2002. IDEAL 2002. Lecture Notes in Computer Science, vol 2412. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45675-9_47


  • DOI: https://doi.org/10.1007/3-540-45675-9_47


  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-44025-3

  • Online ISBN: 978-3-540-45675-9
