Prediction of Single-Nucleotide Polymorphisms Causative of Rare Diseases

Ferraro, Maria Brigida; Guarracino, Mario Rosario

doi:10.1007/978-3-319-09042-9_15

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 8452)

Included in the following conference series:

International Meeting on Computational Intelligence Methods for Bioinformatics and Biostatistics

Abstract

The study of rare diseases uses next-generation sequencing (NGS) technology to detect causative mutations in the human genome. NGS is a recent approach to biomedical research that is useful for genetic diagnosis in extremely heterogeneous conditions. Nevertheless, only a few publications address the problem when pooled experiments are considered, and existing tools are often inaccurate. In this work we focus on rare diseases and describe how data are generated by NGS.

We present how data are organized in the pre-processing phase, and how they are filtered and features are constructed in the learning phase. We compare different computational procedures to identify and classify variants potentially related to rare diseases, and we biologically validate the results obtained.

Acknowledgment

The authors would like to thank V. Nigro, M. Savarese, G. Di Fruscio, T. Giugliano, M. Iacomino, A. Torella, A. Garofalo, C. Pisano, F. Del Vecchio Blanco and G. Piluso (Seconda Università di Napoli, Patologia Generale), M. Mutarelli, V. Singh Marwah and M. Dionisi (TIGEM), and the Italian LGMD network. This work has been partially funded by the Italian Flagship project Interomics and by PON02_00619 projects.

Author information

Authors and Affiliations

Department of Statistical Sciences, Sapienza University of Rome, Rome, Italy
Maria Brigida Ferraro
High Performance Computing and Networking Institute, National Research Council, Naples, Italy
Mario Rosario Guarracino

Corresponding author

Correspondence to Maria Brigida Ferraro.

Editor information

Editors and Affiliations

University Nice Sophia Antipolis, Sophia Antipolis, France
Enrico Formenti
University of Salerno, Fisciano, Italy
Roberto Tagliaferri
University of Groningen, AG Groningen, The Netherlands
Ernst Wit

Cite this paper

Ferraro, M.B., Guarracino, M.R. (2014). Prediction of Single-Nucleotide Polymorphisms Causative of Rare Diseases. In: Formenti, E., Tagliaferri, R., Wit, E. (eds) Computational Intelligence Methods for Bioinformatics and Biostatistics. CIBB 2013. Lecture Notes in Computer Science, vol 8452. Springer, Cham. https://doi.org/10.1007/978-3-319-09042-9_15

DOI: https://doi.org/10.1007/978-3-319-09042-9_15
Published: 16 July 2014
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-09041-2
Online ISBN: 978-3-319-09042-9
eBook Packages: Computer Science, Computer Science (R0)
