On the Usability of Random Indexing in Patent Retrieval

  • Conference paper
Graph-Based Representation and Reasoning (ICCS 2014)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8577)

Abstract

Statistical semantics methods are fairly controversial in the IR community, mostly because of their instability and the difficulty of debugging them. At the same time, they are extremely tempting, perhaps in the same way that Artificial Intelligence was in the 1960s. Then, it took a few decades for the hype to pass and for us to learn the real utility and limits of the great technologies developed earlier. This paper takes an exhaustive view of the performance and utility of one particular statistical semantics method, Random Indexing, in the context of difficult texts. After over a year of CPU time in experiments, we provide a global view of the behaviour of the method on a particularly challenging test collection based on patent data. Finally, we observe interesting patterns emerging in the semantic space created by the method, which we hypothesize to be the cause of the behaviour observed in the experiments.

Author information

Corresponding author

Correspondence to Mihai Lupu.

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Lupu, M. (2014). On the Usability of Random Indexing in Patent Retrieval. In: Hernandez, N., Jäschke, R., Croitoru, M. (eds) Graph-Based Representation and Reasoning. ICCS 2014. Lecture Notes in Computer Science, vol 8577. Springer, Cham. https://doi.org/10.1007/978-3-319-08389-6_17

  • DOI: https://doi.org/10.1007/978-3-319-08389-6_17

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-08388-9

  • Online ISBN: 978-3-319-08389-6

  • eBook Packages: Computer Science, Computer Science (R0)
