Modelling Multi-dimensional Contingency Tables: LASSO and Stepwise Algorithms

  • Nur Huda Nabihan Md Shahri
  • Susana Conde
Conference paper


This study identifies an efficient method for detecting the main interactions between categorical variables in multi-dimensional contingency tables. Two model selection approaches are considered: the LASSO, with cross-validation used to find the penalty coefficient, and stepwise algorithms. The Akaike Information Criterion (AIC) and p-values are used as criteria for selecting the parameters to be included in the model. The aims of the study are to review the literature on multi-dimensional contingency tables, log-linear models and high-dimensional tables; to use these models to analyse the obesity dataset from the Locksmith (GSK) GP Research Database from around the year 2000, which comprises p = 10 binary comorbidities in n = 5000 patients; and to compare the results obtained from the models. Stepwise algorithms are an appropriate method for finding the parsimonious interaction structure between the categorical variables. The LASSO defines a continuous shrinking operation that can produce coefficients that are exactly zero.
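
The continuous shrinking operation mentioned in the abstract is characteristic of the LASSO penalty. As a minimal sketch, not the authors' implementation, the soft-thresholding rule below shows how a LASSO-type penalty can set small coefficients to exactly zero. It assumes the textbook special case of an orthonormal design, in which the LASSO solution has this closed form; for log-linear models the penalised likelihood is solved iteratively instead. The estimates and the penalty value `lam` are hypothetical and chosen only for illustration.

```python
def soft_threshold(z: float, lam: float) -> float:
    """Shrink z toward zero by lam; return exactly 0.0 when |z| <= lam."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


# Hypothetical unpenalised interaction estimates (illustrative values only).
estimates = [2.5, -0.25, 0.5, -1.5]
lam = 0.5  # hypothetical penalty coefficient

# Large coefficients are shrunk by lam; small ones become exactly zero.
shrunk = [soft_threshold(z, lam) for z in estimates]
print(shrunk)  # [2.0, 0.0, 0.0, -1.0]
```

In this toy setting a larger `lam` zeroes out more coefficients, which is why the penalty coefficient is typically chosen by cross-validation.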


Multi-dimensional, Contingency table, LASSO, Stepwise algorithms



Copyright information

© Springer Nature Singapore Pte Ltd. 2019

Authors and Affiliations

  1. Universiti Teknologi MARA, Shah Alam, Malaysia
  2. University of Glasgow, Glasgow, UK
