Abstract
In many research areas today, data are collected on far more features than there are observations available for inference. This is especially true for applications in bioinformatics, but the theory presented here is of general interest in any data mining context where the number of “interesting” features is expected to be small. In particular, mBIC, mBIC1 and mBIC2 are discussed: three modifications of the Bayesian information criterion (BIC) which, in the case of orthogonal designs, control the family-wise error rate (mBIC) and the false discovery rate (mBIC1, mBIC2), respectively. A brief simulation study illustrates the performance of these criteria for orthogonal and non-orthogonal regression matrices.
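The abstract names the criteria but does not state them. The following is a minimal sketch under stated assumptions. The penalty forms are taken from the related mBIC literature as we recall it, not from this chapter. mBIC is assumed to add 2k·log(p/c) to BIC with the constant c = 4. mBIC2 is assumed to subtract a further 2·log(k!). mBIC1 is omitted because its form is not given here. The helper `select_orthogonal` and its parameters are hypothetical. It relies on one property of orthogonal regressors: each column reduces the residual sum of squares by its own fixed amount, so the best model of size k simply keeps the k largest reductions.

```python
import math


def bic(rss: float, n: int, k: int, p: int) -> float:
    """Classical BIC for a Gaussian linear model with k regressors (p is unused)."""
    return n * math.log(rss / n) + k * math.log(n)


def mbic(rss: float, n: int, k: int, p: int, c: float = 4.0) -> float:
    """Assumed mBIC form: BIC plus 2k*log(p/c) for searching among p candidates."""
    return bic(rss, n, k, p) + 2 * k * math.log(p / c)


def mbic2(rss: float, n: int, k: int, p: int, c: float = 4.0) -> float:
    """Assumed mBIC2 form: mBIC minus 2*log(k!), computed via lgamma(k + 1)."""
    return mbic(rss, n, k, p, c) - 2 * math.lgamma(k + 1)


def select_orthogonal(rss0, reductions, n, p, criterion):
    """Hypothetical helper for orthogonal designs.

    rss0 is the RSS of the empty model; reductions holds the RSS drop contributed
    by each candidate column. Because the drops add up for orthogonal columns,
    scanning k = 0, 1, ... over the sorted drops finds the best model of each size.
    Returns (k_best, criterion_value).
    """
    ordered = sorted(reductions, reverse=True)
    best_k, best_val = 0, criterion(rss0, n, 0, p)
    rss = rss0
    for k, drop in enumerate(ordered, start=1):
        rss -= drop
        if rss <= 0:  # a perfect fit makes log(rss) undefined; stop here
            break
        val = criterion(rss, n, k, p)
        if val < best_val:
            best_k, best_val = k, val
    return best_k, best_val


if __name__ == "__main__":
    # Illustrative numbers only: n = 100 observations, p = 1000 candidates.
    red = [30.0, 20.0, 6.0, 0.5, 0.1]
    for name, crit in [("BIC", bic), ("mBIC", mbic), ("mBIC2", mbic2)]:
        print(name, select_orthogonal(100.0, red, n=100, p=1000, criterion=crit)[0])
    # prints: BIC 3, mBIC 2, mBIC2 2 (one per line)
```

In this toy setting plain BIC admits the third column, which reduces the RSS by 6. The extra log(p) penalty of mBIC rejects it. The 2·log(k!) term makes mBIC2 less strict than mBIC as k grows, but not enough to admit that column here.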
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Frommlet, F. (2012). Modifications of BIC for data mining under sparsity. In: Klatte, D., Lüthi, HJ., Schmedders, K. (eds) Operations Research Proceedings 2011. Operations Research Proceedings. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-29210-1_39
DOI: https://doi.org/10.1007/978-3-642-29210-1_39
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-29209-5
Online ISBN: 978-3-642-29210-1
eBook Packages: Business and Economics; Business and Management (R0)