Improving Latent Semantic Analysis of Biomedical Literature Integrating UMLS Metathesaurus and Biomedical Pathways Databases

  • Francesco Abate
  • Elisa Ficarra
  • Andrea Acquaviva
  • Enrico Macii
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 273)


The increasing pace of biotechnological advances has produced an unprecedented amount of both experimental data and biological information, most of it distributed on the web. However, the heterogeneous organization of the data and the differing knowledge representations pose new challenges for integrating and extracting the biological information that is fundamental for correctly interpreting experimental results.

In the present work we introduce a new methodology for quantitatively scoring the degree of biological correlation among biological terms occurring in biomedical abstracts. The proposed flow is based on the latent semantic analysis of biomedical literature coupled with the UMLS Metathesaurus and PubMed literature information. The results demonstrate that the structured and consolidated knowledge in the UMLS and pathway databases improves the accuracy of the latent semantic analysis of biomedical literature.
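
As a rough illustration of the latent semantic analysis step mentioned above, the following minimal sketch scores term correlation by projecting a term-by-abstract count matrix into a low-rank space with a truncated SVD and comparing term vectors by cosine similarity. It is not the authors' pipeline: the terms, the counts, the chosen rank, and the helper names (lsa_term_vectors, cosine) are all assumptions made for the example, and the integration of UMLS and pathway knowledge is not shown.

```python
import numpy as np

# Toy term-by-abstract count matrix (rows: terms, columns: abstracts).
# Terms and counts are invented purely for illustration.
terms = ["tp53", "apoptosis", "insulin", "glucose"]
counts = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 2, 1],
    [0, 0, 1, 2],
], dtype=float)


def lsa_term_vectors(matrix, rank):
    """Return term coordinates in a rank-k latent space (truncated SVD)."""
    u, s, _ = np.linalg.svd(matrix, full_matrices=False)
    # Scale the leading left singular vectors by their singular values.
    return u[:, :rank] * s[:rank]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0


vectors = lsa_term_vectors(counts, rank=2)

# Terms that share abstracts should score higher than terms that do not.
score_related = cosine(vectors[0], vectors[1])    # tp53 vs apoptosis
score_unrelated = cosine(vectors[0], vectors[2])  # tp53 vs insulin
print(f"{terms[0]} / {terms[1]}: {score_related:.3f}")
print(f"{terms[0]} / {terms[2]}: {score_unrelated:.3f}")
```

In this toy matrix the two pairs of terms occupy disjoint sets of abstracts, so the rank-2 space separates them cleanly; real abstract collections are far sparser and noisier, which is where external curated knowledge could plausibly help.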


Bioinformatic; Latent semantic analysis; Text mining; Biological pathway





Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Francesco Abate (1)
  • Elisa Ficarra (1)
  • Andrea Acquaviva (1)
  • Enrico Macii (1)
  1. Dep. of Control and Computer Engineering, Politecnico di Torino, Torino, Italy
