A Supervised Machine Learning Approach of Extracting and Ranking Published Papers Describing Coexpression Relationships among Genes

Tiwari, Richa; Zhang, Chengcui; Solorio, Thamar; Chen, Wei-Bang

doi:10.1007/978-3-7091-0738-6_14

Abstract

In this chapter, we describe a framework that uses supervised machine learning to extract information about coexpression relationships among genes from the published literature, and then ranks those papers to give users a complete, specialized information retrieval system. We use Dynamic Conditional Random Fields (DCRFs) to train our classification model. Our approach relies on semantic analysis of the text to classify the predicates that describe coexpression, rather than on detecting the presence of keywords. Our framework outperformed the baseline by almost 52%, with DCRFs showing superior performance to the Bayes Net, SVM, and Naïve Bayes classification algorithms. In our second experiment, a comparison of our ranked results with those of PubMed and Google demonstrates that our proposed model performs better than both in distinguishing a positive paper from a negative paper. In conclusion, this chapter describes a specialized classification and ranking framework that can retrieve articles that discuss coexpression among genes.
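
The abstract does not say how classified text is turned into a ranking, so the following is only a hypothetical sketch of the classify-then-rank idea, not the chapter's method. It assumes each paper already carries per-sentence labels from some classifier (1 if the sentence was judged to describe coexpression, 0 otherwise) and scores a paper by the fraction of positive sentences. The names `Paper`, `coexpression_score`, and `rank_papers` are invented for illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Paper:
    paper_id: str
    # Assumed input: one label per sentence from any classifier,
    # 1 = sentence judged to describe coexpression, 0 = otherwise.
    sentence_labels: Sequence[int]


def coexpression_score(paper: Paper) -> float:
    """Fraction of sentences labeled positive (an assumed scoring rule)."""
    if not paper.sentence_labels:
        return 0.0  # a paper with no sentences gets the lowest score
    return sum(paper.sentence_labels) / len(paper.sentence_labels)


def rank_papers(papers: List[Paper]) -> List[Paper]:
    """Return papers sorted from highest to lowest score."""
    return sorted(papers, key=coexpression_score, reverse=True)


papers = [
    Paper("A", [0, 0, 1, 0]),
    Paper("B", [1, 1, 0, 1]),
    Paper("C", []),
]
ranked_ids = [p.paper_id for p in rank_papers(papers)]
print(ranked_ids)
```

Any other scoring rule (for example, one based on classifier confidence) could replace `coexpression_score` without changing the ranking step.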

Author information

Authors and Affiliations

Department of Computer and Information Sciences, The University of Alabama at Birmingham, 115A Campbell Hall, 1300 University Boulevard, Birmingham, Alabama, 35294, USA
Richa Tiwari, Chengcui Zhang, Thamar Solorio & Wei-Bang Chen

Corresponding author

Correspondence to Richa Tiwari.

Editor information

Editors and Affiliations

Department of Computer Engineering, Tobb University, Söğütözü Caddesi 43, Ankara, 06560, Turkey
Tansel Özyer
Department of Electrical Engineering, University of Western Ontario, Building 363, London, N6A 5B9, Ontario, Canada
Keivan Kianmehr
Department of Computer Engineering, Tobb University, Söğütözü Caddesi 43, Ankara, 06560, Turkey
Mehmet Tan

About this chapter

Cite this chapter

Tiwari, R., Zhang, C., Solorio, T., Chen, WB. (2012). A Supervised Machine Learning Approach of Extracting and Ranking Published Papers Describing Coexpression Relationships among Genes. In: Özyer, T., Kianmehr, K., Tan, M. (eds) Recent Trends in Information Reuse and Integration. Springer, Vienna. https://doi.org/10.1007/978-3-7091-0738-6_14

DOI: https://doi.org/10.1007/978-3-7091-0738-6_14
Published: 20 August 2011
Publisher Name: Springer, Vienna
Print ISBN: 978-3-7091-0737-9
Online ISBN: 978-3-7091-0738-6
eBook Packages: Computer Science (R0)
