A Fingerprinting Technique for Evaluating Semantics Based Indexing

  • Conference paper
Advances in Information Retrieval (ECIR 2006)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3936)

Abstract

The quality of search engines usually depends on the content of the returned documents rather than on the text used to express this content. So ideally, search techniques should be directed more toward the semantic dependencies underlying documents than toward the texts themselves. The most visible examples in this direction are Latent Semantic Analysis (LSA) and the Hyperspace Analog to Language (HAL). If these techniques are really based on semantic dependencies, as they contend, then they should be applicable across languages.

To investigate this contention we used electronic versions of two kinds of material with their translations: a novel, and a popular treatise about cosmology. We used the analogy of fingerprinting as employed in forensics to establish whether individuals are related. Genetic fingerprinting uses enzymes to split the DNA and then compares the resulting band patterns. Likewise, in our research we used queries to split a document into fragments. If a search technique really isolates fragments semantically related to the query, then a document and its translation should have similar band patterns.
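The band-pattern comparison described above can be illustrated with a minimal sketch. It is not the authors' implementation; it rests on several assumptions of our own: a document and its translation are already split into aligned fragments (e.g. paragraphs, so that fragment i of one corresponds to fragment i of the other), each query has a translated counterpart, a "band" is the set of fragments whose score exceeds a threshold, and two band patterns are compared with the Jaccard overlap. The names `band_pattern`, `jaccard`, `fingerprint_similarity` and `overlap_score` are hypothetical, and `overlap_score` is a toy word-overlap scorer standing in for a semantic technique such as LSA or HAL.

```python
from typing import Callable, List, Sequence, Set

# Hypothetical scorer type: (query, fragment) -> relevance score.
Scorer = Callable[[str, str], float]


def band_pattern(query: str, fragments: Sequence[str], score: Scorer,
                 threshold: float = 0.0) -> Set[int]:
    """Indices of the fragments whose score for `query` exceeds `threshold`."""
    return {i for i, frag in enumerate(fragments) if score(query, frag) > threshold}


def jaccard(a: Set[int], b: Set[int]) -> float:
    """Overlap of two band patterns; two empty patterns count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def fingerprint_similarity(queries: List[str], translated_queries: List[str],
                           doc: Sequence[str], translation: Sequence[str],
                           score: Scorer, threshold: float = 0.0) -> float:
    """Mean Jaccard overlap between the band patterns of a document and its translation.

    Assumes doc[i] and translation[i] are aligned fragments, and that
    queries[j] and translated_queries[j] express the same information need.
    """
    if len(doc) != len(translation):
        raise ValueError("document and translation must have aligned fragments")
    if not queries or len(queries) != len(translated_queries):
        raise ValueError("need exactly one translated query per query")
    sims = [
        jaccard(band_pattern(q, doc, score, threshold),
                band_pattern(tq, translation, score, threshold))
        for q, tq in zip(queries, translated_queries)
    ]
    return sum(sims) / len(sims)


def overlap_score(query: str, fragment: str) -> float:
    """Toy lexical scorer (number of shared words); a stand-in for LSA or HAL."""
    return float(len(set(query.lower().split()) & set(fragment.lower().split())))


# Invented toy data: three aligned fragments and a Dutch rendering of them.
doc = ["the star collapsed into a black hole",
       "the detective found the letter",
       "light bends near a massive star"]
translation = ["de ster stortte in tot een zwart gat",
               "de detective vond de brief",
               "licht buigt af bij een zware ster"]

if __name__ == "__main__":
    # Translated queries: the band patterns line up.
    print(fingerprint_similarity(["star", "letter"], ["ster", "brief"],
                                 doc, translation, overlap_score))
    # Untranslated queries on the translation: a purely lexical scorer finds nothing.
    print(fingerprint_similarity(["star", "letter"], ["star", "letter"],
                                 doc, translation, overlap_score))
```

With the toy data, the first call prints `1.0` and the second `0.0`: word overlap only produces matching bands when the query is translated by hand. A scorer that captured language-independent semantic dependencies would, in this framing, yield similar band patterns without relying on literal word matches.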

In this paper we (1) present the fingerprinting technique, (2) introduce the material used, and (3) report results of an evaluation for two semantic indexing techniques.

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Hoenkamp, E., van Dijk, S. (2006). A Fingerprinting Technique for Evaluating Semantics Based Indexing. In: Lalmas, M., MacFarlane, A., Rüger, S., Tombros, A., Tsikrika, T., Yavlinsky, A. (eds) Advances in Information Retrieval. ECIR 2006. Lecture Notes in Computer Science, vol 3936. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11735106_35

  • DOI: https://doi.org/10.1007/11735106_35

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-33347-0

  • Online ISBN: 978-3-540-33348-7

  • eBook Packages: Computer Science, Computer Science (R0)
