Mining of Association Rules in Text Databases Using Inverted Hashing and Pruning

  • Conference paper
Data Warehousing and Knowledge Discovery (DaWaK 2000)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1874)

Abstract

In this paper, we propose a new algorithm, Inverted Hashing and Pruning (IHP), for mining association rules between words in text databases. Text databases differ markedly from retail transaction databases, and existing mining algorithms cannot handle them efficiently because of the large number of itemsets (i.e., sets of words) that need to be counted. We evaluate two well-known mining algorithms, the Apriori algorithm [1] and the Direct Hashing and Pruning (DHP) algorithm [5], in the context of mining text databases and compare them with the proposed IHP algorithm. The results show that IHP performs better on large text databases.
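
To make the counting problem concrete, the following is a minimal sketch of a level-wise, Apriori-style search for frequent word sets, treating each document as a transaction and each distinct word as an item. It is not the IHP algorithm, whose details are not given in this abstract, and it omits the hashing used by DHP. The function name `frequent_word_sets` and the parameters `min_support` and `max_size` are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def frequent_word_sets(docs, min_support, max_size=2):
    """Level-wise (Apriori-style) search for frequent word sets.

    docs        -- iterable of strings, one per document (hypothetical input)
    min_support -- minimum number of documents that must contain a word set
    max_size    -- largest word-set size to search for
    Returns a dict mapping frozenset(words) -> document count.
    """
    # Each document becomes a transaction: the set of its distinct words.
    transactions = [frozenset(d.lower().split()) for d in docs]

    # Pass 1: count single words and keep those meeting min_support.
    word_counts = Counter(w for t in transactions for w in t)
    frequent = {frozenset([w]): c
                for w, c in word_counts.items() if c >= min_support}
    result = dict(frequent)

    k = 2
    while frequent and k <= max_size:
        items = sorted({w for s in frequent for w in s})
        # Candidate k-sets: keep only those whose (k-1)-subsets are all
        # frequent (the Apriori pruning property).
        candidates = [
            frozenset(c) for c in combinations(items, k)
            if all(frozenset(s) in frequent for s in combinations(c, k - 1))
        ]
        # Count how many documents contain each candidate.
        counts = Counter()
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        frequent = {c: n for c, n in counts.items() if n >= min_support}
        result.update(frequent)
        k += 1
    return result
```

With a vocabulary of thousands of frequent words, the candidate list at size 2 alone can grow quadratically, which is the kind of cost the abstract attributes to existing algorithms on text databases.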

This research was supported in part by the Ohio Board of Regents.

References

  1. R. Agrawal and R. Srikant, “Fast Algorithms for Mining Association Rules,” Proc. of the 20th VLDB Conf., 1994, pp. 487–499.

  2. S. Brin, R. Motwani, J. Ullman, and S. Tsur, “Dynamic Itemset Counting and Implication Rules for Market Basket Data,” Proc. of the ACM SIGMOD Int’l Conf. on Management of Data, 1997, pp. 255–264.

  3. M. S. Chen, J. Han, and P. S. Yu, “Data Mining: An Overview from a Database Perspective,” IEEE Trans. on Knowledge and Data Engineering, Vol. 8, No. 6, Dec. 1996, pp. 866–883.

  4. M. Gordon and S. Dumais, “Using Latent Semantic Indexing for Literature Based Discovery,” Journal of the Amer. Soc. of Info Science, Vol. 49, No. 8, June 1998, pp. 674–685.

  5. J. S. Park, M. S. Chen, and P. S. Yu, “Using a Hash-Based Method with Transaction Trimming for Mining Association Rules,” IEEE Trans. on Knowledge and Data Engineering, Vol. 9, No. 5, Sep/Oct 1997, pp. 813–825.

  6. G. Salton, Automatic Text Processing: the transformation, analysis, and retrieval of information by computer, Addison-Wesley Publishing, 1988.

  7. A. Savasere, E. Omiecinski, and S. Navathe, “An Efficient Algorithm for Mining Association Rules in Large Databases,” Proc. of the 21st VLDB Conf., 1995, pp. 432–444.

  8. H. Toivonen, “Sampling Large Databases for Association Rules,” Proc. of the 22nd VLDB Conf., 1996, pp. 134–145.

  9. E. M. Voorhees and D. K. Harmon (editors), The Fifth Text Retrieval Conference, National Institute of Standards and Technology, 1997.

Copyright information

© 2000 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Holt, J.D., Chung, S.M. (2000). Mining of Association Rules in Text Databases Using Inverted Hashing and Pruning. In: Kambayashi, Y., Mohania, M., Tjoa, A.M. (eds) Data Warehousing and Knowledge Discovery. DaWaK 2000. Lecture Notes in Computer Science, vol 1874. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44466-1_29

  • DOI: https://doi.org/10.1007/3-540-44466-1_29

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-67980-6

  • Online ISBN: 978-3-540-44466-4

  • eBook Packages: Springer Book Archive