Abstract
Mining various dependence structures from data is important to many data mining applications. In this paper, several major dependence-structure mining tasks are surveyed from a statistical learning perspective, together with a number of major results on unsupervised learning models that range from a single-object world to a multi-object world. Moreover, efforts towards a key challenge to learning are discussed in three typical streams, based respectively on generalization error bounds, the Ockham principle, and BYY harmony learning.
The work described in this paper was fully supported by a grant from the Research Grants Council of the Hong Kong SAR (Project No. CUHK4383/99E).
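The three streams named in the abstract all address how to choose a model's scale or complexity. As a rough illustration of the Ockham-principle stream only (complexity-penalizing criteria in the spirit of AIC/BIC/MDL), the sketch below fits Gaussian mixtures of increasing order and keeps the order that minimizes BIC. It is a minimal sketch on assumed synthetic data, using scikit-learn's GaussianMixture for the EM fitting; it is not the paper's BYY harmony method.

```python
# A minimal sketch of the Ockham-principle stream: penalize model
# complexity via BIC when choosing the number of mixture components.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Assumed synthetic data: three well-separated 2-D Gaussian clusters.
X = np.vstack([
    rng.normal(loc=(0.0, 0.0), scale=0.5, size=(200, 2)),
    rng.normal(loc=(4.0, 0.0), scale=0.5, size=(200, 2)),
    rng.normal(loc=(2.0, 3.0), scale=0.5, size=(200, 2)),
])

bic = {}
for k in range(1, 7):
    gm = GaussianMixture(n_components=k, n_init=3, random_state=0).fit(X)
    bic[k] = gm.bic(X)  # BIC = -2 * log-likelihood + (free params) * log n

best_k = min(bic, key=bic.get)
print("BIC per k:", {k: round(v, 1) for k, v in bic.items()})
print("selected number of components:", best_k)  # expect 3
```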
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
Cite this paper
Xu, L. (2002). Mining Dependence Structures from Statistical Learning Perspective. In: Yin, H., Allinson, N., Freeman, R., Keane, J., Hubbard, S. (eds) Intelligent Data Engineering and Automated Learning — IDEAL 2002. IDEAL 2002. Lecture Notes in Computer Science, vol 2412. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45675-9_47
DOI: https://doi.org/10.1007/3-540-45675-9_47
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-44025-3
Online ISBN: 978-3-540-45675-9