Book Search Experiments: Investigating IR Methods for the Indexing and Retrieval of Books

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4956)

Abstract

Through mass-digitization projects and with the use of OCR technologies, digitized books are becoming available on the Web and in digital libraries. The unprecedented scale of these efforts, the unique characteristics of the digitized material, and the unexplored possibilities of user interaction make full-text book search an exciting area of information retrieval (IR) research. Emerging research questions include: How appropriate and effective are traditional IR models when applied to books? Which book-specific features (e.g., the back-of-book index) should receive special attention during indexing and retrieval? How can we tackle scalability? To answer such questions, we developed an experimental platform that facilitates rapid prototyping of a book search system and supports large-scale tests. Using this system, we performed experiments on a collection of 10,000 books, evaluating the efficiency of a novel multi-field inverted index and the effectiveness of the BM25F retrieval model adapted to books, using book-specific fields.
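
For readers unfamiliar with BM25F, the following is a minimal sketch of field-weighted scoring in its commonly published form, with per-field length normalization. It is not the authors' implementation. The field names (`title`, `body`), the weights, the `b` values, `k1`, and the smoothed IDF variant are all illustrative assumptions, as is the helper name `bm25f_score`.

```python
import math

def bm25f_score(query_terms, doc_fields, field_weights, field_b,
                avg_field_len, doc_freq, num_docs, k1=1.2):
    """Score one document against a query with a BM25F-style formula.

    doc_fields maps a field name to its list of tokens. All parameter
    values are hypothetical and would normally be tuned on training data.
    """
    score = 0.0
    for term in query_terms:
        # Combine length-normalized term frequencies across fields
        # into a single weighted pseudo-frequency.
        pseudo_tf = 0.0
        for field, tokens in doc_fields.items():
            tf = tokens.count(term)
            if tf == 0:
                continue
            length_ratio = len(tokens) / avg_field_len[field]
            norm = 1.0 + field_b[field] * (length_ratio - 1.0)
            pseudo_tf += field_weights[field] * tf / norm
        if pseudo_tf == 0.0:
            continue
        # Smoothed IDF (an assumption; it keeps the weight non-negative).
        n = doc_freq.get(term, 0)
        idf = math.log((num_docs - n + 0.5) / (n + 0.5) + 1.0)
        # Saturate the pseudo-frequency once, after combining fields.
        score += idf * pseudo_tf / (k1 + pseudo_tf)
    return score

# Toy data: two "books", each with a title field and a body field.
field_weights = {"title": 5.0, "body": 1.0}
field_b = {"title": 0.5, "body": 0.75}
avg_field_len = {"title": 4.0, "body": 8.0}
doc_freq = {"history": 2}
num_docs = 10

book_a = {"title": ["history", "of", "printing"],
          "body": ["the", "press", "changed", "europe"]}
book_b = {"title": ["european", "trade"],
          "body": ["a", "history", "of", "merchant", "guilds"]}

score_a = bm25f_score(["history"], book_a, field_weights, field_b,
                      avg_field_len, doc_freq, num_docs)
score_b = bm25f_score(["history"], book_b, field_weights, field_b,
                      avg_field_len, doc_freq, num_docs)
```

With these assumed weights, a match in the title contributes more than a match in the body, so `score_a` exceeds `score_b`. Book-specific fields such as a back-of-book index would be added as further entries in `doc_fields` with their own weights.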

Editor information

Craig Macdonald, Iadh Ounis, Vassilis Plachouras, Ian Ruthven, Ryen W. White

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Wu, H., Kazai, G., Taylor, M. (2008). Book Search Experiments: Investigating IR Methods for the Indexing and Retrieval of Books. In: Macdonald, C., Ounis, I., Plachouras, V., Ruthven, I., White, R.W. (eds) Advances in Information Retrieval. ECIR 2008. Lecture Notes in Computer Science, vol 4956. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78646-7_23

  • DOI: https://doi.org/10.1007/978-3-540-78646-7_23

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-78645-0

  • Online ISBN: 978-3-540-78646-7

  • eBook Packages: Computer Science, Computer Science (R0)
