Abstract
We present a suite of machine learning and knowledge-based components for textual-profile-based gene prioritization. Most genetic diseases have many candidate genes that could cause them, and gene expression analysis typically yields a large number of co-expressed genes that may be responsible for a given disease. Extracting prior knowledge from text-based genomic information sources is therefore essential for reducing the list of candidate genes that must be analyzed further in the laboratory. The suite aims to improve the computational gene prioritization process and includes basic natural language processing capabilities, advanced text classification and clustering algorithms, robust information extraction components based on qualitative and quantitative keyword extraction methods, and the exploitation of lexical knowledge bases for semantic text processing.
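To make the idea of textual-profile-based prioritization concrete, the following is a minimal sketch and not the suite described in this paper. It assumes that each candidate gene comes with a short text profile, represents profiles and a disease description as TF-IDF vectors with a smoothed IDF, and ranks genes by cosine similarity to the disease description. The gene names (`GENE_A`, `GENE_B`, `GENE_C`), the profile texts, and the function names are hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (a dict term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        # Smoothed IDF so that terms present in every document keep a positive weight.
        vectors.append(
            {t: c * (math.log((1 + n) / (1 + df[t])) + 1.0) for t, c in tf.items()}
        )
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def prioritize(disease_text, gene_profiles):
    """Rank genes by similarity of their text profile to the disease description."""
    names = list(gene_profiles)
    vectors = tfidf_vectors([disease_text] + [gene_profiles[g] for g in names])
    query = vectors[0]
    scored = [(g, cosine(query, v)) for g, v in zip(names, vectors[1:])]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy, made-up profiles (hypothetical data, not taken from any real source).
profiles = {
    "GENE_A": "insulin secretion pancreatic beta cell glucose metabolism",
    "GENE_B": "cell cycle regulation dna repair",
    "GENE_C": "glucose transport in muscle tissue",
}
disease = "impaired insulin secretion and glucose metabolism"

ranking = prioritize(disease, profiles)
for gene, score in ranking:
    print(f"{gene}\t{score:.3f}")
```

In this toy setting the gene whose profile shares the most weighted terms with the disease description is ranked first, and a gene with no shared terms receives a score of zero. A realistic pipeline would add the steps named in the abstract, such as keyword extraction and lexical knowledge bases, before or instead of raw term matching.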
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Esposito, F., Biba, M., Ferilli, S. (2010). Intelligent Text Processing Techniques for Textual-Profile Gene Characterization. In: Masulli, F., Peterson, L.E., Tagliaferri, R. (eds) Computational Intelligence Methods for Bioinformatics and Biostatistics. CIBB 2009. Lecture Notes in Computer Science, vol 6160. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14571-1_3
DOI: https://doi.org/10.1007/978-3-642-14571-1_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-14570-4
Online ISBN: 978-3-642-14571-1
eBook Packages: Computer Science, Computer Science (R0)