Towards an effective automatic query expansion process using an association rule mining approach

Journal of Intelligent Information Systems

Abstract

The steady growth in the size of textual document collections is a key driver of progress in modern information retrieval techniques, whose effectiveness and efficiency are constantly challenged. Given a user query, the number of retrieved documents can be overwhelmingly large, which hampers their efficient exploitation by the user. In addition, retaining only relevant documents in a query answer is of paramount importance for effectively meeting the user's needs. In this situation, query expansion offers an interesting solution for obtaining a complete answer while preserving the quality of the retained documents. It relies mainly on an accurate choice of the terms added to the initial query. Interestingly, query expansion takes advantage of large text volumes by extracting statistical information about index term co-occurrences and using it to make user queries better fit the real information needs. In this respect, a promising direction is the application of data mining methods to the extraction of dependencies between terms. In this paper, we present a novel approach, based on association rules, for mining knowledge that supports query expansion. The key feature of our approach is a better trade-off between the size of the mining result and the knowledge it conveys. To this end, our association rule mining method builds on results from Galois connection theory and on compact representations of rule sets in order to reduce the huge number of potentially useful associations. An experimental study examined the application of our approach to several real collections, on which automatic query expansion was performed. The results show a significant improvement in the performance of the information retrieval system, in terms of both recall and precision, as confirmed by significance testing with the Wilcoxon test.
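
To make the idea of rule-driven query expansion concrete, here is a minimal sketch in Python. It is not the method of the paper: the `Rule` type, the `expand_query` function, the `min_conf` and `max_added` parameters, and the toy rules are all hypothetical, and the selection policy (add the conclusion terms of every rule whose premise is contained in the query, ranked by confidence) is only one plausible choice.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    """A hypothetical association rule between termsets."""
    premise: frozenset     # terms that must all appear in the query
    conclusion: frozenset  # terms suggested for expansion
    confidence: float      # confidence of premise => conclusion


def expand_query(query_terms, rules, min_conf=0.6, max_added=5):
    """Append to the query the conclusion terms of applicable rules.

    A rule is applicable when its confidence reaches min_conf and its
    premise is a subset of the query. Candidate terms are ranked by the
    best confidence of any rule proposing them (ties broken by name).
    """
    query = set(query_terms)
    candidates = {}
    for rule in rules:
        if rule.confidence >= min_conf and rule.premise <= query:
            for term in rule.conclusion - query:
                best = candidates.get(term, 0.0)
                candidates[term] = max(best, rule.confidence)
    ranked = sorted(candidates, key=lambda t: (-candidates[t], t))
    return list(query_terms) + ranked[:max_added]


# Toy rules, invented for illustration only.
rules = [
    Rule(frozenset({"retrieval"}), frozenset({"information"}), 0.9),
    Rule(frozenset({"query"}), frozenset({"expansion", "term"}), 0.7),
    Rule(frozenset({"mining"}), frozenset({"data"}), 0.8),
    Rule(frozenset({"query"}), frozenset({"log"}), 0.4),
]

expanded = expand_query(["query", "retrieval"], rules)
print(expanded)
```

In this sketch the third rule is skipped because its premise is not in the query, and the fourth because its confidence falls below the assumed threshold. A compact rule basis, as advocated in the abstract, would shrink the `rules` list without changing how it is consumed.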

Notes

  1. By analogy to the itemset terminology used in data mining for a set of items.

  2. The AMARYLLIS project was initiated by INIST-CNRS and co-funded by AUPELF-UREF. Its goal is to evaluate French text retrieval systems.

  3. The Cross-Language Evaluation Forum (CLEF) promotes multilingual information access. It offers benchmark collection data for evaluating IR systems. The associated website is: http://www.clef-campaign.org/.

  4. In this paper, we denote by |X| the cardinality of the set X.

  5. In the remainder, \(T_1 \stackrel{c}{\Rightarrow} T_2\) indicates that the rule \(T_1 \Rightarrow T_2\) has a confidence value equal to \(c\).

  6. The rule \(T \Rightarrow \emptyset\) is usually considered uninformative.

  7. Available at: http://www.adrem.ua.ac.be/~goethals/software/.

  8. Distributed by the Synapse Development Corporation.

  9. maxsupp means that the support of a termset must not exceed this user-defined threshold.

  10. Test datasets are available at: http://fimi.cs.helsinki.fi/data.

  11. Freely available at: http://sourceforge.net/projects/lemur/, while the information about this toolkit is available at: http://www.lemurproject.org/.

  12. idf stands for inverse document frequency.

  13. By baseline, we refer to the baselines using tf×idf, BM25tf and OKAPI BM25 weighting schemes.
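
The quantities mentioned in these notes (termset support, rule confidence, the maxsupp bound and idf) can be illustrated with a small Python sketch. Everything here is an assumption made for illustration: the toy `docs` collection, the function names, the use of absolute document counts for support, a lower bound `minsupp` alongside maxsupp, and the plain `log(N / df)` form of idf, which is only one of several common variants and not necessarily the one used in the paper.

```python
import math

# Toy collection: each document is reduced to its set of index terms.
docs = [
    {"query", "expansion", "retrieval"},
    {"query", "expansion"},
    {"query", "retrieval"},
    {"mining", "rule"},
]


def support(termset, docs):
    """Number of documents containing every term of the termset."""
    termset = set(termset)
    return sum(1 for d in docs if termset <= d)


def confidence(premise, conclusion, docs):
    """Confidence of premise => conclusion: supp(P u C) / supp(P)."""
    supp_premise = support(premise, docs)
    if supp_premise == 0:
        return 0.0
    return support(set(premise) | set(conclusion), docs) / supp_premise


def within_bounds(termset, docs, minsupp, maxsupp):
    """Keep a termset only if minsupp <= support <= maxsupp."""
    return minsupp <= support(termset, docs) <= maxsupp


def idf(term, docs):
    """Inverse document frequency, here taken as log(N / df)."""
    df = support({term}, docs)
    return math.log(len(docs) / df) if df else 0.0


print(support({"query", "expansion"}, docs))
print(confidence({"query"}, {"expansion"}, docs))
print(within_bounds({"query"}, docs, minsupp=2, maxsupp=2))
print(idf("mining", docs))
```

With this toy data, the termset {query} is rejected by the upper bound because it occurs in three documents, which shows how a maxsupp threshold can discard terms too common to be discriminating.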

Acknowledgements

We would like to thank the anonymous reviewers for their helpful comments and suggestions. We are also grateful to the Evaluations and Language resources Distribution Agency (ELDA), which kindly provided us with the Le Monde 94 and ATS 94 document collections of the CLEF 2003 campaign.

Author information

Corresponding author

Correspondence to Tarek Hamrouni.

About this article

Cite this article

Latiri, C., Haddad, H. & Hamrouni, T. Towards an effective automatic query expansion process using an association rule mining approach. J Intell Inf Syst 39, 209–247 (2012). https://doi.org/10.1007/s10844-011-0189-9

  • DOI: https://doi.org/10.1007/s10844-011-0189-9
