Abstract
In this paper we present a picture search engine for life science literature and show how it can be used to improve literature preselection. Preselection is needed to cope with the vast amount of available literature. For example, while searching for DNA binding sites, we wanted to add the results of specific experiments (DNase I footprint and EMSA) to our database. Preselection via abstract search was very unspecific (150 000 hits), but by looking for papers with images of these experiments, we could improve precision immensely. The images are displayed like hits in a search engine, allowing quick and easy quality assessment without reading the whole paper. The images are found through their annotation in the paper, the figure caption. To identify the caption, we analyse the layout of the paper: the position of the image and of the surrounding text.
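The abstract does not spell out how the layout analysis works, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that images and text blocks have already been extracted from a page as bounding boxes, that the y axis grows downward, and that a caption starts with "Fig." or "Figure" followed by a number. The names `Box`, `TextBlock`, `find_caption` and the `max_gap` threshold are hypothetical.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle on a page; y grows downward (assumption)."""
    x0: float
    y0: float
    x1: float
    y1: float


@dataclass(frozen=True)
class TextBlock:
    """A block of text together with its position on the page."""
    box: Box
    text: str


# Assumed caption convention: "Fig. 2", "Fig 2", "Figure 2", any case.
CAPTION_START = re.compile(r"^\s*(fig\.?|figure)\s*\d+", re.IGNORECASE)


def horizontal_overlap(a: Box, b: Box) -> float:
    """Width shared by the two boxes, 0 if they do not overlap in x."""
    return max(0.0, min(a.x1, b.x1) - max(a.x0, b.x0))


def vertical_gap(image: Box, block: Box) -> float:
    """Vertical distance between image and block, 0 if they overlap in y."""
    if block.y0 >= image.y1:   # block lies below the image
        return block.y0 - image.y1
    if block.y1 <= image.y0:   # block lies above the image
        return image.y0 - block.y1
    return 0.0


def find_caption(image: Box, blocks: list[TextBlock],
                 max_gap: float = 50.0) -> Optional[TextBlock]:
    """Return the closest caption-like block near the image, or None.

    A block qualifies if its text looks like a caption, it shares some
    horizontal extent with the image, and it lies within max_gap of it.
    """
    candidates = [
        b for b in blocks
        if CAPTION_START.match(b.text)
        and horizontal_overlap(image, b.box) > 0
        and vertical_gap(image, b.box) <= max_gap
    ]
    return min(candidates,
               key=lambda b: vertical_gap(image, b.box),
               default=None)
```

Once a caption has been attached to each image, searching for an experiment type reduces to a keyword search over the caption texts, with the matching images shown as hits.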
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Mathiak, B., Kupfer, A., Münch, R., Täubner, C., Eckstein, S. (2006). Improving Literature Preselection by Searching for Images. In: Bremer, E.G., Hakenberg, J., Han, E.-H., Berrar, D., Dubitzky, W. (eds) Knowledge Discovery in Life Science Literature. KDLL 2006. Lecture Notes in Computer Science, vol 3886. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11683568_2
DOI: https://doi.org/10.1007/11683568_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-32809-4
Online ISBN: 978-3-540-32810-0
eBook Packages: Computer Science, Computer Science (R0)