A Comparison of Top-k Temporal Keyword Querying over Versioned Text Collections

  • Conference paper
Database and Expert Systems Applications (DEXA 2012)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7447)

Abstract

As the web evolves over time, the number of versioned text collections increases rapidly. Most web search engines answer a query by ranking all documents known at the (current) time the query is posed. Some applications, however (for example, customer behavior analysis and crime investigation), need to query these sources efficiently as of some past time, that is, to retrieve the results as if the user were posing the query at a past time instant, thus accessing only the data known as of that time. Ranking and searching over versioned documents must consider not only keyword constraints but also the time dimension, most commonly a time point or time range of interest. In this paper, we deal with top-k query evaluation under both keyword and temporal constraints over versioned textual documents. In addition to considering previous solutions, we propose two novel data organization and indexing solutions: the first partitions data along ranking positions, while the second maintains the full ranking order through the use of a multiversion ordered list. We present an experimental comparison for both time-point and time-interval constraints. For time-interval constraints, different query definitions, such as aggregation functions and consistent top-k queries, are evaluated. Experimental evaluations on large real-world datasets demonstrate the advantages of the newly proposed data organization and indexing approaches.
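To make the two query types concrete, the following is a minimal sketch of their semantics only, evaluated by a naive linear scan. It is not one of the data organizations or indexes compared in the paper. The `Version` record, its half-open validity interval, the precomputed per-keyword `score`, and the function names `top_k_at` and `top_k_over` are all assumptions introduced for illustration.

```python
from dataclasses import dataclass
import heapq


@dataclass(frozen=True)
class Version:
    """One version of a document in the posting list of a single keyword (hypothetical layout)."""
    doc_id: str
    start: int    # first time instant at which this version is valid
    end: int      # first time instant at which it is no longer valid (exclusive)
    score: float  # assumed precomputed relevance score for the keyword


def top_k_at(postings, t, k):
    """Time-point query: the k highest-scoring versions valid at time t."""
    alive = [v for v in postings if v.start <= t < v.end]
    return heapq.nlargest(k, alive, key=lambda v: v.score)


def top_k_over(postings, t1, t2, k, aggregate=max):
    """Time-interval query over [t1, t2): combine each document's version
    scores with an aggregation function, then return the k best documents."""
    per_doc = {}
    for v in postings:
        if v.start < t2 and t1 < v.end:  # version overlaps the query interval
            per_doc.setdefault(v.doc_id, []).append(v.score)
    ranked = {doc: aggregate(scores) for doc, scores in per_doc.items()}
    return heapq.nlargest(k, ranked.items(), key=lambda item: item[1])


postings = [
    Version("a", 0, 5, 0.9),
    Version("a", 5, 10, 0.2),
    Version("b", 0, 10, 0.5),
    Version("c", 3, 8, 0.7),
]

print([v.doc_id for v in top_k_at(postings, 4, 2)])  # as of time 4
print([v.doc_id for v in top_k_at(postings, 6, 2)])  # as of time 6
print(top_k_over(postings, 0, 10, 2, aggregate=max))
print(top_k_over(postings, 0, 10, 2, aggregate=min))
```

In this toy data, document `a` ranks first as of time 4 but drops out of the top 2 as of time 6, and the choice of aggregation function (`max` versus `min`) changes which documents win over the whole interval. A scan like this touches every version for every query, which is the cost that a temporal index is meant to avoid.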

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Huo, W., Tsotras, V.J. (2012). A Comparison of Top-k Temporal Keyword Querying over Versioned Text Collections. In: Liddle, S.W., Schewe, K.-D., Tjoa, A.M., Zhou, X. (eds) Database and Expert Systems Applications. DEXA 2012. Lecture Notes in Computer Science, vol 7447. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32597-7_31

  • DOI: https://doi.org/10.1007/978-3-642-32597-7_31

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-32596-0

  • Online ISBN: 978-3-642-32597-7

  • eBook Packages: Computer Science, Computer Science (R0)
