Abstract
The accumulation of electronically accessible data and knowledge poses theoretical and practical challenges for study design and statistical data analysis. Fusion of data and knowledge here consists of using the results of earlier high-throughput measurements of genetic variations, microRNA, and gene expression levels, together with biological knowledge bases. We investigate this fusion in the phases of study design, data analysis, and interpretation; specifically, we present methodologies and bioinformatic tools in the Bayesian framework to deepen, lengthen, and broaden it. First, we overview Bayesian decision support for the design of partial genetic association studies (GASs), incorporating domain literature, knowledge bases, and the results of analyses of earlier studies. Second, we present a Bayesian multilevel analysis (BMLA) for GASs, which performs an integrated analysis at the univariate and multivariate levels and at the level of interactions. Third, we present a Bayesian logic to support interpretation, which integrates the results of data analysis with factual domain knowledge. Finally, we discuss the advantages of the Bayesian framework in coping with small sample sizes, the fusion of data and knowledge, the challenges of multiple testing, meta-analysis, and positive results bias (i.e., the communication of scientific uncertainty). The genomics of asthma serves as the application domain.
Acknowledgments
We thank Yves Moreau for his insightful suggestion to apply the SNP study design system for prior generation in our Bayesian data analysis. This work was supported by grants from the OTKA National Scientific Research Fund (PD-76348), NKTH TECH_08-A1/2-2008-0120 (Genagrid), and the János Bolyai Research Scholarship of the Hungarian Academy of Sciences (P. Antal).
Copyright information
© 2009 Springer Science+Business Media, LLC
About this chapter
Cite this chapter
Antal, P., Millinghoffer, A., Hullám, G., Hajós, G., Szalai, C., Falus, A. (2009). A Bioinformatic Platform for a Bayesian, Multiphased, Multilevel Analysis in Immunogenomics. In: Flower, D., Davies, M., Ranganathan, S. (eds) Bioinformatics for Immunomics. Immunomics Reviews, vol 3. Springer, New York, NY. https://doi.org/10.1007/978-1-4419-0540-6_11
DOI: https://doi.org/10.1007/978-1-4419-0540-6_11
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4419-0539-0
Online ISBN: 978-1-4419-0540-6
eBook Packages: Biomedical and Life Sciences (R0)