Selective Compound Splitting of Swedish Queries for Boolean Combinations of Truncated Terms

Cöster, Rickard; Sahlgren, Magnus; Karlgren, Jussi

doi:10.1007/978-3-540-30222-3_32

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3237)

Included in the following conference series:

Workshop of the Cross-Language Evaluation Forum for European Languages

Abstract

In languages that make heavy use of compound words, such as Swedish, it is often necessary to split compounds when indexing documents or queries. One problem is that it is difficult to find constituents that express a concept similar to that expressed by the compound. The approach taken here is to expand a query with the leading constituents of its compound words. Every query term is truncated to increase recall, in the hope of matching other compounds that have the leading constituent as a prefix. Because this increases recall in a rather uncontrolled way, we use a Boolean quorum-level search method that ranks documents according to both a tf-idf factor and the number of matching Boolean combinations.

The Boolean combinations performed relatively well, considering that the queries were very short (a maximum of five search terms). This paper also includes the results of two other methods we are currently working on in our lab: one for re-ranking search results on the basis of stylistic analysis of documents, and one for dimensionality reduction using Random Indexing.
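
The query expansion and quorum-style ranking summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The lexicon `LEXICON`, the function names, the toy documents, the idf formula `log(1 + N/df)`, and the rule "sort by number of matched truncated terms, then by tf-idf sum" are all assumptions made for illustration, since the abstract does not specify how the two ranking criteria are combined or how compounds are split.

```python
import math
from collections import Counter

# Hypothetical lexicon of known leading constituents (an assumption;
# a real system would rely on a proper compound splitter).
LEXICON = {"fotboll"}


def leading_constituent(term, lexicon=LEXICON):
    """Return the longest lexicon entry that is a proper prefix of term, else None."""
    candidates = [w for w in lexicon if term.startswith(w) and len(w) < len(term)]
    return max(candidates, key=len) if candidates else None


def expand_query(terms, lexicon=LEXICON):
    """Return the query terms plus the leading constituent of each compound.

    Every returned string is later treated as a truncated term (a prefix).
    """
    prefixes = []
    for term in terms:
        for p in (term, leading_constituent(term, lexicon)):
            if p and p not in prefixes:
                prefixes.append(p)
    return prefixes


def quorum_rank(docs, prefixes):
    """Rank documents by (matched prefixes, tf-idf sum), both descending.

    Returns a list of (matched_count, score, doc_id) tuples. Documents that
    match no prefix are left out.
    """
    n_docs = len(docs)
    counts = {d: Counter(text.lower().split()) for d, text in docs.items()}
    # Term frequency of a truncated term: tokens that start with the prefix.
    tf = {
        d: {p: sum(c for w, c in cnt.items() if w.startswith(p)) for p in prefixes}
        for d, cnt in counts.items()
    }
    df = {p: sum(1 for d in tf if tf[d][p] > 0) for p in prefixes}
    ranked = []
    for d in docs:
        matched = [p for p in prefixes if tf[d][p] > 0]
        if not matched:
            continue
        # Assumed idf variant; log(1 + N/df) stays positive for every matched term.
        score = sum(tf[d][p] * math.log(1 + n_docs / df[p]) for p in matched)
        ranked.append((len(matched), score, d))
    ranked.sort(key=lambda r: (-r[0], -r[1], r[2]))
    return ranked


if __name__ == "__main__":
    docs = {
        "d1": "fotbollsmatch i stockholm",
        "d2": "fotbollslag vann i uppsala",
        "d3": "skolmat i stockholm",
    }
    prefixes = expand_query(["fotbollsmatch", "stockholm"])
    for matched, score, doc_id in quorum_rank(docs, prefixes):
        print(doc_id, matched, round(score, 3))
```

In this toy collection, `d2` contains neither query term but is still retrieved through the truncated leading constituent `fotboll`. The document that satisfies the most truncated terms is ranked first, which is one way of limiting the uncontrolled recall that truncation introduces.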

Author information

Authors and Affiliations

Swedish Institute of Computer Science, SICS, Box 1263, SE-164 29, Kista, Sweden
Rickard Cöster, Magnus Sahlgren & Jussi Karlgren

Editor information

Editors and Affiliations

ISTI-CNR, Area di Ricerca, Pisa, Italy
Carol Peters
No affiliation listed
Julio Gonzalo & Martin Braschler
German Institute for International and Security Affairs, Stiftung Wissenschaft und Politik (SWP), Ludwigkirchplatz 3-4, 10719, Berlin, Germany
Michael Kluck

About this paper

Cite this paper

Cöster, R., Sahlgren, M., Karlgren, J. (2004). Selective Compound Splitting of Swedish Queries for Boolean Combinations of Truncated Terms. In: Peters, C., Gonzalo, J., Braschler, M., Kluck, M. (eds) Comparative Evaluation of Multilingual Information Access Systems. CLEF 2003. Lecture Notes in Computer Science, vol 3237. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30222-3_32

DOI: https://doi.org/10.1007/978-3-540-30222-3_32
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-24017-4
Online ISBN: 978-3-540-30222-3
eBook Packages: Springer Book Archive
