Mining the Bibliome

Sarkar, Indra Neil

doi:10.1007/978-1-4471-4646-9_5

Indra Neil Sarkar PhD, MLIS

Part of the book series: Health Informatics (HI)


Abstract

Biomedical literature offers a systematic catalogue of interpretations about data that can be used to infer new knowledge. Given the exponential growth of biomedical data, analyzing this literature (referred to in this chapter as the “bibliome”) requires methodologies that transform data into knowledge. Such techniques are fraught with challenges, but they offer some promise for turning the deluge of big data into novel hypotheses that can lead to new knowledge. This chapter provides an overview of the knowledge discovery process in the context of biomedical literature and explains how such a process (referred to as “bibliome mining”) can be seen as an integral part of a learning healthcare system.



Additional Reading

Collen MF. Computer medical databases: the first six decades (1950–2010). London: Springer; 2012. xix, 288 p.
Danciu I, Cowan JD, Basford M, Wang X, Saip A, Osgood S, et al. Secondary use of clinical data: the Vanderbilt approach. J Biomed Inform. 2014. (in press) http://dx.doi.org/10.1016/j.jbi.2014.02.003
Friedman C. A broad-coverage natural language processing system. Proc AMIA Symp. 2000:270–4.
Grivell L. Mining the bibliome: searching for a needle in a haystack? New computing tools are needed to effectively scan the growing amount of scientific literature for useful information. EMBO Rep. 2002;3(3):200–3.
Howe D, Costanzo M, Fey P, Gojobori T, Hannick L, Hide W, et al. Big data: the future of biocuration. Nature. 2008;455(7209):47–50.
Prokosch HU, Ganslandt T. Perspectives for medical informatics. Reusing the electronic medical record for clinical research. Methods Inf Med. 2009;48(1):38–44.
Salton G, McGill MJ. Introduction to modern information retrieval. New York: McGraw-Hill; 1983. xv, 448 p.
Sarkar I. Methods in biomedical informatics: a pragmatic approach. Boston: Academic; 2013.


Author information

Authors and Affiliations

Center for Clinical and Translational Science, University of Vermont, Burlington, VT, USA
Indra Neil Sarkar PhD, MLIS
Department of Microbiology and Molecular Genetics, University of Vermont, Burlington, VT, USA
Indra Neil Sarkar PhD, MLIS


Corresponding author

Correspondence to Indra Neil Sarkar PhD, MLIS.

Editor information

Editors and Affiliations

Biomedical Informatics, The Ohio State University, Columbus, Ohio, USA
Philip R.O. Payne
Biomedical Informatics, The Ohio State University Medical Center, Columbus, Ohio, USA
Peter J. Embi


About this chapter

Cite this chapter

Sarkar, I.N. (2015). Mining the Bibliome. In: Payne, P., Embi, P. (eds) Translational Informatics. Health Informatics. Springer, London. https://doi.org/10.1007/978-1-4471-4646-9_5


DOI: https://doi.org/10.1007/978-1-4471-4646-9_5
Published: 31 July 2014
Publisher Name: Springer, London
Print ISBN: 978-1-4471-4645-2
Online ISBN: 978-1-4471-4646-9
eBook Packages: Medicine, Medicine (R0)
