Abstract
In this chapter, we describe a framework that uses a supervised machine learning approach to extract information about coexpression relationships among genes from published literature, and then ranks the retrieved papers, providing users with a complete, specialized information retrieval system. We use Dynamic Conditional Random Fields (DCRFs) to train our classification model. Rather than detecting the presence of keywords, our approach relies on semantic analysis of text to classify the predicates that describe coexpression. Our framework outperformed the baseline by almost 52%, and DCRFs performed better than the Bayes Net, SVM, and Naïve Bayes classification algorithms. In our second experiment, a comparison of our ranked results with those of PubMed and Google demonstrates that our model distinguishes positive papers from negative papers better than either. In conclusion, this chapter describes a specialized classification and ranking framework that can retrieve articles discussing coexpression among genes.
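DCRFs generalize linear-chain CRFs to several interacting label chains per position. The abstract does not detail the model, so the following is only a hypothetical sketch of the simplest special case: Viterbi decoding for a single linear-chain CRF that tags each token as part of a coexpression predicate (`COEXP`) or not (`O`). The tag set, the function name `viterbi`, and all weights are illustrative assumptions, not the authors' features or trained parameters. A full DCRF with cross-chain factors generally requires approximate inference instead of this exact dynamic program.

```python
from typing import Dict, List, Sequence, Tuple


def viterbi(
    tokens: Sequence[str],
    labels: Sequence[str],
    emission: Dict[Tuple[str, str], float],
    transition: Dict[Tuple[str, str], float],
) -> List[str]:
    """Return the highest-scoring label sequence under a linear-chain CRF.

    emission[(token, label)] and transition[(prev_label, label)] are
    log-potentials (hypothetical weights); missing entries default to 0.0.
    """
    if not tokens:
        return []
    # best[i][y]: best total score of any labeling of tokens[:i+1] ending in y
    best: List[Dict[str, float]] = [
        {y: emission.get((tokens[0], y), 0.0) for y in labels}
    ]
    # back[i-1][y]: best previous label for label y at position i
    back: List[Dict[str, str]] = []
    for tok in tokens[1:]:
        scores: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for y in labels:
            prev, score = max(
                ((p, best[-1][p] + transition.get((p, y), 0.0)) for p in labels),
                key=lambda item: item[1],
            )
            scores[y] = score + emission.get((tok, y), 0.0)
            pointers[y] = prev
        best.append(scores)
        back.append(pointers)
    # Trace back from the best final label to recover the full path
    last = max(labels, key=lambda y: best[-1][y])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


if __name__ == "__main__":
    # Hypothetical tag set and hand-picked weights, for illustration only
    labels = ["O", "COEXP"]
    tokens = ["BRCA1", "is", "coexpressed", "with", "BARD1"]
    emission = {("coexpressed", "COEXP"): 2.0, ("is", "O"): 1.0, ("with", "O"): 1.0}
    transition = {("COEXP", "COEXP"): -0.5, ("O", "O"): 0.2}
    print(list(zip(tokens, viterbi(tokens, labels, emission, transition))))
```

In the linear-chain case this exact search runs in time linear in sentence length and quadratic in the number of labels.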
© 2012 Springer Vienna
About this chapter
Cite this chapter
Tiwari, R., Zhang, C., Solorio, T., Chen, WB. (2012). A Supervised Machine Learning Approach of Extracting and Ranking Published Papers Describing Coexpression Relationships among Genes. In: Özyer, T., Kianmehr, K., Tan, M. (eds) Recent Trends in Information Reuse and Integration. Springer, Vienna. https://doi.org/10.1007/978-3-7091-0738-6_14
DOI: https://doi.org/10.1007/978-3-7091-0738-6_14
Publisher Name: Springer, Vienna
Print ISBN: 978-3-7091-0737-9
Online ISBN: 978-3-7091-0738-6
eBook Packages: Computer Science, Computer Science (R0)