Efficient Term Set Prediction Using the Bell-Wigner Inequality

  • Conference paper
String Processing and Information Retrieval (SPIRE 2015)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9309)

Included in the following conference series:

  • International Symposium on String Processing and Information Retrieval

Abstract

Measuring the dependence between terms is computationally expensive for IR systems, which must handle large and sparse datasets. Current approaches to mining frequent term sets enumerate the term sets found in a set of documents and exploit monotonicity, the property that a term set is frequent only if all its subsets are frequent, as implemented by Apriori; however, the computational time can be very large. An alternative approach stores the dataset in a frequent-pattern tree (FPT) and visits and prunes the tree recursively, as implemented by FPGrowth; however, the storage space can still be very large. We introduce the Bell-Wigner inequality (BWI) as a conceptual enhancement of monotonicity to predict with certainty when an itemset is frequent and when it is infrequent. We describe an empirical validation showing that the BWI can significantly reduce both the computational time of Apriori and the storage space of pattern-tree-based algorithms such as FPGrowth. The validation was performed on some runs produced by IR systems from the TIPSTER test collection.
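
For readers unfamiliar with the monotonicity property mentioned in the abstract, the following is a minimal sketch of textbook Apriori-style pruning, not the BWI-based method of the paper. The function name `apriori`, the parameter `min_support` (an absolute document count), and the toy documents in the usage note are illustrative assumptions, not taken from the paper.

```python
from itertools import combinations


def apriori(documents, min_support):
    """Return {term set (frozenset): support count} for every frequent term set.

    Illustrative sketch only: `min_support` is assumed to be an absolute
    number of documents, and no BWI-based prediction is applied.
    """
    docs = [frozenset(d) for d in documents]

    # Level 1: count the documents containing each single term.
    counts = {}
    for doc in docs:
        for term in doc:
            key = frozenset([term])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        prev = set(frequent)
        # Join step: unions of two frequent (k-1)-sets that have exactly k terms.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step (monotonicity): a candidate can be frequent only if
        # every one of its (k-1)-subsets is frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        # Count step: scan the documents once per surviving candidate.
        frequent = {}
        for c in candidates:
            n = sum(1 for doc in docs if c <= doc)
            if n >= min_support:
                frequent[c] = n
        result.update(frequent)
        k += 1
    return result
```

As a usage example, calling `apriori([{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}], 3)` keeps the three single terms and the three pairs but rejects `{"a", "b", "c"}`, which occurs in only two documents. The count step is the part whose cost grows with the number of candidates; per the abstract, this is the computational time the BWI is meant to reduce.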

Author information

Corresponding author

Correspondence to Massimo Melucci.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Melucci, M. (2015). Efficient Term Set Prediction Using the Bell-Wigner Inequality. In: Iliopoulos, C., Puglisi, S., Yilmaz, E. (eds) String Processing and Information Retrieval. SPIRE 2015. Lecture Notes in Computer Science, vol 9309. Springer, Cham. https://doi.org/10.1007/978-3-319-23826-5_5

  • DOI: https://doi.org/10.1007/978-3-319-23826-5_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-23825-8

  • Online ISBN: 978-3-319-23826-5

  • eBook Packages: Computer Science, Computer Science (R0)
