On the Usability of Random Indexing in Patent Retrieval

Lupu, Mihai

doi:10.1007/978-3-319-08389-6_17

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8577)

Included in the following conference series:

International Conference on Conceptual Structures

Abstract

Statistical semantics methods are fairly controversial in the IR community, mostly because they are unstable and difficult to debug. At the same time, they are extremely tempting, perhaps in the same way that Artificial Intelligence was in the 1960s. Back then, it took a few decades for the hype to pass and for us to learn the real utility and limits of the great technologies developed earlier. This paper takes an exhaustive view of the performance and utility of one particular statistical semantics method, Random Indexing, in the context of difficult texts. After more than a year of CPU time spent on experiments, we provide a global view of the behaviour of the method on a particularly challenging test collection based on patent data. Finally, we observe interesting patterns emerging in the semantic space created by the method, which we hypothesize to be the cause of the behaviour observed in the experiments.

Author information

Authors and Affiliations

Institute of Software Technology and Interactive Systems, Vienna University of Technology, Wien, Austria
Mihai Lupu

Corresponding author

Correspondence to Mihai Lupu.

Editor information

Editors and Affiliations

Université Toulouse le Mirail, Toulouse, France
Nathalie Hernandez
L3S Research Center, Hannover, Germany
Robert Jäschke
LIRMM, Montpellier, France
Madalina Croitoru

About this paper

Cite this paper

Lupu, M. (2014). On the Usability of Random Indexing in Patent Retrieval. In: Hernandez, N., Jäschke, R., Croitoru, M. (eds) Graph-Based Representation and Reasoning. ICCS 2014. Lecture Notes in Computer Science, vol 8577. Springer, Cham. https://doi.org/10.1007/978-3-319-08389-6_17

DOI: https://doi.org/10.1007/978-3-319-08389-6_17
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-08388-9
Online ISBN: 978-3-319-08389-6
eBook Packages: Computer Science, Computer Science (R0)
