Intelligent Text Processing Techniques for Textual-Profile Gene Characterization

Esposito, Floriana; Biba, Marenglen; Ferilli, Stefano

doi:10.1007/978-3-642-14571-1_3

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 6160)

Included in the following conference series:

International Meeting on Computational Intelligence Methods for Bioinformatics and Biostatistics

Abstract

Most genetic diseases have many candidate genes that could cause them, and gene expression analysis typically yields a large number of co-expressed genes that are potentially responsible for a given disease. Extracting prior knowledge from text-based genomic information sources is therefore essential for reducing the list of candidate genes before they are analyzed further in the laboratory. In this paper we present a suite of Machine Learning algorithms and knowledge-based components for improving computational, textual-profile based gene prioritization. The suite includes basic Natural Language Processing capabilities, advanced text classification and clustering algorithms, robust information extraction components based on qualitative and quantitative keyword extraction methods, and the exploitation of lexical knowledge bases for semantic text processing.
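
As a rough illustration of what textual-profile based prioritization can look like, the sketch below ranks candidate genes by the cosine similarity between a TF-IDF vector of each gene's textual profile and a disease description. This is a generic vector-space sketch, not the method of the chapter: the gene names, profile texts, query, and the function names `tokenize`, `tfidf_vectors`, `cosine` and `prioritize` are all hypothetical and chosen only for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of ASCII letters as tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return (vectors, idf): sparse TF-IDF vectors per document and the IDF table."""
    counts = {name: Counter(tokenize(text)) for name, text in docs.items()}
    n_docs = len(docs)
    # Document frequency: in how many profiles each term occurs.
    doc_freq = Counter(term for c in counts.values() for term in c)
    # Smoothed IDF, so terms present in every profile keep a small weight.
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    vectors = {
        name: {t: tf * idf[t] for t, tf in c.items()} for name, c in counts.items()
    }
    return vectors, idf


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def prioritize(gene_profiles, query):
    """Rank genes by similarity of their textual profile to the query text."""
    vectors, idf = tfidf_vectors(gene_profiles)
    q_counts = Counter(tokenize(query))
    # Query terms never seen in any profile are ignored.
    q_vec = {t: tf * idf[t] for t, tf in q_counts.items() if t in idf}
    scores = {gene: cosine(vec, q_vec) for gene, vec in vectors.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy data (invented for illustration; not real gene annotations).
profiles = {
    "GENE_A": "regulates insulin secretion and glucose metabolism in pancreatic cells",
    "GENE_B": "structural protein of the muscle fiber cytoskeleton",
    "GENE_C": "involved in glucose transport across the cell membrane",
}
ranking = prioritize(profiles, "insulin glucose metabolism")
for gene, score in ranking:
    print(f"{gene}\t{score:.3f}")
```

In a realistic setting the profiles would be assembled from literature abstracts or annotation text, and the plain tokenizer would be replaced by the kind of linguistic preprocessing and keyword extraction the abstract mentions.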

Author information

Authors and Affiliations

Department of Computer Science, University of Bari, Via E. Orabona, 4, 70125, Bari, Italy
Floriana Esposito & Stefano Ferilli
Department of Computer Science, University of New York Tirana, Rr. Komuna e Parisit, Tirana, Albania
Marenglen Biba

Editor information

Editors and Affiliations

DISI - Dipartimento di Informatica e Scienze dell’Informazione, Università di Genova, Via Dodecaneso 35, 16146, Genova, Italy
Francesco Masulli
Center for Biostatistics, The Methodist Hospital Research Institute (TMHRI), Weill Cornell Medical College, Cornell University, 6565 Fannin, Suite MGJ6-031, 77030, Houston, Texas, USA
Leif E. Peterson
Dipartimento di Matematica ed Informatica, Università di Salerno, Via Ponte don Melillo, 84084, Fisciano (Sa), Italy
Roberto Tagliaferri

About this paper

Cite this paper

Esposito, F., Biba, M., Ferilli, S. (2010). Intelligent Text Processing Techniques for Textual-Profile Gene Characterization. In: Masulli, F., Peterson, L.E., Tagliaferri, R. (eds) Computational Intelligence Methods for Bioinformatics and Biostatistics. CIBB 2009. Lecture Notes in Computer Science, vol 6160. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14571-1_3

DOI: https://doi.org/10.1007/978-3-642-14571-1_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-14570-4
Online ISBN: 978-3-642-14571-1
eBook Packages: Computer Science, Computer Science (R0)