Biomedical Literature Mining

  • Chaolin Zhang
  • Michael Q. Zhang


A major hurdle in large-scale genomic studies is incorporating existing knowledge from the published literature. This task is performed by human experts, but manual curation is labor-intensive and makes it difficult to keep knowledge up to date. Biomedical literature mining offers a potential solution by automatically extracting and integrating useful information from the literature, which can lead to new discoveries.


Keywords: Literature Mining, PubMed Abstract, Gene Interaction Network, Functional Coherence, Implicit Relationship
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer Science+Business Media, LLC 2009

Authors and Affiliations

  • Chaolin Zhang (1)
  • Michael Q. Zhang (2)
  1. Cold Spring Harbor Laboratory, Cold Spring Harbor
  2. Department of Biomedical Engineering, State University of New York at Stony Brook
