Improving Literature Preselection by Searching for Images

Mathiak, Brigitte; Kupfer, Andreas; Münch, Richard; Täubner, Claudia; Eckstein, Silke

doi:10.1007/11683568_2

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 3886)

Included in the following conference series:

International Workshop on Knowledge Discovery in Life Science Literature

Abstract

In this paper we present a picture search engine for life science literature and show how it can be used to improve literature preselection. Preselection is needed to cope with the vast amount of literature available. For example, while searching for DNA binding sites, we wanted to add the results of specific experiments (DNase I footprint and EMSA) to our database. Preselection via abstract search was very unspecific (150 000 hits), but by looking for papers with images concerning these experiments we could improve precision immensely. The images are displayed like hits in a search engine, allowing quick and easy quality assessment without having to read the whole paper. The images are found by their annotation in the paper: the figure caption. To identify the caption, we analyse the layout of the paper, that is, the position of the image and of the surrounding text.
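
The abstract does not spell out the layout heuristics, so the following Python fragment is only a minimal sketch of one way such caption matching might work, not the authors' method. It assumes that image and text-block bounding boxes have already been extracted from the PDF, and it picks the nearest text block that starts like a caption and sits directly above or below the image. All names (`Box`, `TextBlock`, `find_caption`, `max_gap`) and the coordinate convention are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box; y grows downward (assumed convention)."""
    x0: float
    y0: float  # top edge
    x1: float
    y1: float  # bottom edge


@dataclass(frozen=True)
class TextBlock:
    box: Box
    text: str


# Assumption: captions start with "Fig." or "Figure" followed by a number.
CAPTION_START = re.compile(r"^\s*(fig\.|figure)\s*\d+", re.IGNORECASE)


def horizontal_overlap(a: Box, b: Box) -> float:
    """Width of the horizontal range shared by both boxes (0 if none)."""
    return max(0.0, min(a.x1, b.x1) - max(a.x0, b.x0))


def vertical_gap(image: Box, block: Box) -> float:
    """Vertical distance between image and block (0 if they overlap)."""
    if block.y0 >= image.y1:  # block lies below the image
        return block.y0 - image.y1
    if block.y1 <= image.y0:  # block lies above the image
        return image.y0 - block.y1
    return 0.0


def find_caption(image: Box, blocks: Sequence[TextBlock],
                 max_gap: float = 40.0) -> Optional[TextBlock]:
    """Return the closest caption-like block above or below the image."""
    candidates = [
        b for b in blocks
        if CAPTION_START.match(b.text)
        and horizontal_overlap(image, b.box) > 0
        and vertical_gap(image, b.box) <= max_gap
    ]
    return min(candidates, key=lambda b: vertical_gap(image, b.box),
               default=None)
```

Captions found this way could then be searched for terms such as "footprint" or "EMSA", so that only papers with matching figures are kept for manual inspection. A real system would also need to handle captions set beside the image and multi-column layouts, which this sketch ignores.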

Author information

Authors and Affiliations

Institut für Informationssysteme, TU Braunschweig, Germany
Brigitte Mathiak, Andreas Kupfer, Claudia Täubner & Silke Eckstein
Institut für Mikrobiologie, TU Braunschweig, Germany
Richard Münch

Editor information

Editors and Affiliations

Brain Tumor Research Program, Children’s Memorial Hospital, and Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
Eric G. Bremer
Computer Science Department, Knowledge Management in Bioinformatics, Humboldt-Universität zu Berlin, Unter den Linden 6, 10099, Berlin, Germany
Jörg Hakenberg
iXmatch Inc., 5555 West 78th Street Suite E, 55439-2702, Minneapolis, MN, USA
Eui-Hong (Sam) Han
School of Biomedical Sciences, University of Ulster, Cromore Road, BT52 1SA, Coleraine, Northern Ireland, UK
Daniel Berrar
School of Biomedical Sciences, Bioinformatics Research Group, University of Ulster, Cromore Road, BT52 1SA, Coleraine, Northern Ireland, UK
Werner Dubitzky

About this paper

Cite this paper

Mathiak, B., Kupfer, A., Münch, R., Täubner, C., Eckstein, S. (2006). Improving Literature Preselection by Searching for Images. In: Bremer, E.G., Hakenberg, J., Han, E.-H., Berrar, D., Dubitzky, W. (eds) Knowledge Discovery in Life Science Literature. KDLL 2006. Lecture Notes in Computer Science, vol 3886. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11683568_2

DOI: https://doi.org/10.1007/11683568_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-32809-4
Online ISBN: 978-3-540-32810-0
eBook Packages: Computer Science, Computer Science (R0)
