Abstract
Biomedical literature offers a systematic catalogue of interpretations of data that can be used to infer new knowledge. Given the exponential growth of biomedical data, analyzing this literature (referred to in this chapter as the “bibliome”) requires methodologies that transform data into knowledge. Such techniques are fraught with challenges, but they offer some promise for transforming the big data deluge into novel hypotheses that can lead to new knowledge. This chapter provides an overview of the knowledge discovery process in the context of biomedical literature and explains how such a process (referred to as “bibliome mining”) can be seen as an integral part of a learning healthcare system.
Copyright information
© 2015 Springer-Verlag London
About this chapter
Cite this chapter
Sarkar, I.N. (2015). Mining the Bibliome. In: Payne, P., Embi, P. (eds) Translational Informatics. Health Informatics. Springer, London. https://doi.org/10.1007/978-1-4471-4646-9_5
DOI: https://doi.org/10.1007/978-1-4471-4646-9_5
Publisher Name: Springer, London
Print ISBN: 978-1-4471-4645-2
Online ISBN: 978-1-4471-4646-9
eBook Packages: Medicine, Medicine (R0)