Towards an effective automatic query expansion process using an association rule mining approach

Latiri, Chiraz; Haddad, Hatem; Hamrouni, Tarek

doi:10.1007/s10844-011-0189-9

Published: 20 December 2011

Volume 39, pages 209–247, (2012)

Journal of Intelligent Information Systems

Abstract

The steady growth in the size of textual document collections is a key driver of progress for modern information retrieval techniques, whose effectiveness and efficiency are constantly challenged. Given a user query, the number of retrieved documents can be overwhelmingly large, hampering their efficient exploitation by the user. In addition, retaining only relevant documents in a query answer is of paramount importance for effectively meeting user needs. In this situation, query expansion offers an interesting solution for obtaining a complete answer while preserving the quality of the retained documents. It mainly relies on an accurate choice of the terms added to an initial query. Interestingly enough, query expansion takes advantage of large text volumes by extracting statistical information about index term co-occurrences and using it to make user queries better fit the real information needs. In this respect, a promising track consists in applying data mining methods to the extraction of dependencies between terms. In this paper, we present a novel approach, based on association rules, for mining knowledge that supports query expansion. The key feature of our approach is a better trade-off between the size of the mining result and the conveyed knowledge. To this end, our association rule mining method implements results from Galois connection theory and compact representations of rule sets in order to reduce the huge number of potentially useful associations. An experimental study has examined the application of our approach to some real collections, on which automatic query expansion has been performed. The results of the study show a significant improvement in the performance of the information retrieval system, in terms of both recall and precision, as confirmed by significance testing using the Wilcoxon test.
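
To make the idea of rule-based query expansion concrete, here is a minimal Python sketch. It is not the method of the paper, which relies on Galois connection theory and compact rule representations; it is a naive miner of single-term rules from pairwise co-occurrence counts. The function names, the toy collection, and the threshold values (minsupp as an absolute document count, minconf as a ratio) are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def mine_pair_rules(docs, minsupp=2, minconf=0.6):
    """Return {(premise, conclusion): confidence} for single-term rules.

    Illustrative only: every co-occurring term pair is counted, with no
    closure-based pruning or compact rule representation.
    """
    term_count = Counter()
    pair_count = Counter()
    for doc in docs:
        terms = set(doc)
        term_count.update(terms)
        pair_count.update(combinations(sorted(terms), 2))

    rules = {}
    for (a, b), n_ab in pair_count.items():
        if n_ab < minsupp:  # the pair occurs in too few documents
            continue
        for premise, conclusion in ((a, b), (b, a)):
            conf = n_ab / term_count[premise]
            if conf >= minconf:
                rules[(premise, conclusion)] = conf
    return rules


def expand_query(query, rules):
    """Append conclusions of rules whose premise is a query term."""
    expanded = list(query)
    candidates = sorted(
        ((conf, conclusion)
         for (premise, conclusion), conf in rules.items()
         if premise in query and conclusion not in query),
        reverse=True,  # most confident rules first
    )
    for _conf, conclusion in candidates:
        if conclusion not in expanded:
            expanded.append(conclusion)
    return expanded


# Hypothetical toy collection: each document is a list of index terms.
docs = [
    ["query", "expansion", "retrieval"],
    ["query", "expansion", "rules"],
    ["association", "rules", "mining"],
    ["query", "retrieval"],
]
rules = mine_pair_rules(docs)
print(expand_query(["expansion"], rules))
```

On this toy collection the term "expansion" always co-occurs with "query", so the rule from "expansion" to "query" has confidence 1 and "query" is added to the initial query. A realistic setting would need the size reduction the abstract emphasises, since counting every pair scales poorly with vocabulary size.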

Notes

By analogy to the itemset terminology used in data mining for a set of items.
The AMARYLLIS project was initiated by INIST-CNRS and co-funded by AUPELF-UREF. Its goal is to evaluate French text retrieval systems.
The Cross-Language Evaluation Forum (CLEF) promotes multilingual information access. It offers benchmark collection data for evaluating IR systems. The associated website is: http://www.clef-campaign.org/.
In this paper, we denote by |X| the cardinality of the set X.
In the remainder, \(T_1 \stackrel{c}{\Rightarrow} T_2\) indicates that the rule \(T_1 \Rightarrow T_2\) has a confidence value equal to \(c\).
The rule \(T \Rightarrow \emptyset\) is usually considered not informative.
Available at: http://www.adrem.ua.ac.be/~goethals/software/.
Distributed by the Synapse Development Corporation.
maxsupp means that the support of a termset must not exceed this user-defined threshold.
Test datasets are available at: http://fimi.cs.helsinki.fi/data.
Freely available at: http://sourceforge.net/projects/lemur/, while the information about this toolkit is available at: http://www.lemurproject.org/.
idf is the acronym of inverse document frequency.
By baseline, we refer to the baselines using tf×idf, BM25tf and OKAPI BM25 weighting schemes.
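
The quantities mentioned in these notes (termset support bounded by thresholds, rule confidence, and idf) can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the definitions used in the paper: support is taken as an absolute document count, minsupp is assumed as the lower-bound counterpart of maxsupp, idf uses the textbook form ln(N/df), and all function names and the toy collection are hypothetical.

```python
import math


def support(termset, docs):
    """Number of documents containing every term of the termset."""
    termset = set(termset)
    return sum(1 for doc in docs if termset <= set(doc))


def confidence(premise, conclusion, docs):
    """Confidence c of the rule premise => conclusion."""
    supp_premise = support(premise, docs)
    if supp_premise == 0:
        return 0.0
    return support(set(premise) | set(conclusion), docs) / supp_premise


def within_bounds(termset, docs, minsupp, maxsupp):
    """Keep a termset only if minsupp <= support <= maxsupp."""
    return minsupp <= support(termset, docs) <= maxsupp


def idf(term, docs):
    """Textbook inverse document frequency, ln(N / df); 0 if the term is absent."""
    df = support([term], docs)
    return math.log(len(docs) / df) if df else 0.0


# Hypothetical toy collection: each document is a list of index terms.
docs = [
    ["query", "expansion", "retrieval"],
    ["query", "expansion", "rules"],
    ["association", "rules", "mining"],
    ["query", "retrieval"],
]
print(support(["query", "expansion"], docs))
print(confidence(["expansion"], ["query"], docs))
print(within_bounds(["query"], docs, minsupp=2, maxsupp=2))
```

With an upper bound of 2, the term "query" (present in three of the four documents) is discarded as too frequent, which illustrates how a maxsupp threshold can filter out terms that are too common to discriminate between documents.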

Acknowledgements

We would like to thank the anonymous reviewers for their helpful comments and suggestions. We are also grateful to the Evaluations and Language resources Distribution Agency (ELDA), which kindly provided us with the Le Monde 94 and ATS 94 document collections of the CLEF 2003 campaign.

Author information

Authors and Affiliations

URPAH Team, Computer Sciences Department, Faculty of Sciences of Tunis, El Manar University, Tunis, Tunisia
Chiraz Latiri, Hatem Haddad & Tarek Hamrouni

Corresponding author

Correspondence to Tarek Hamrouni.

About this article

Cite this article

Latiri, C., Haddad, H. & Hamrouni, T. Towards an effective automatic query expansion process using an association rule mining approach. J Intell Inf Syst 39, 209–247 (2012). https://doi.org/10.1007/s10844-011-0189-9

Received: 04 May 2011
Revised: 20 November 2011
Accepted: 24 November 2011
Published: 20 December 2011
Issue Date: August 2012
DOI: https://doi.org/10.1007/s10844-011-0189-9
