Efficient Clustering of Databases Induced by Local Patterns

  • Animesh Adhikari (Email author)
  • Pralhad Ramachandrarao
  • Witold Pedrycz
Chapter
Part of the Advanced Information and Knowledge Processing book series (AI&KP)

Abstract

To answer queries posed over multiple large databases, it may be necessary to mine the relevant databases en bloc. In this chapter, we present an efficient solution for clustering multiple large databases. We present two measures of similarity between a pair of databases and study their main properties. We then design an algorithm for clustering multiple databases based on a similarity measure introduced here. We also present a coding, referred to as IS coding, that represents itemsets space-efficiently. A coding of this nature enables more frequent itemsets to participate in determining the similarity between two databases, so the clustering process becomes more accurate. We also show that IS coding attains maximum efficiency in most cases of the mining process. The clustering algorithm improves on existing clustering algorithms in terms of time complexity. The efficiency of the clustering process has been improved using several strategies: reducing the execution time of the clustering algorithm, using a more suitable similarity measure, and storing frequent itemsets space-efficiently.
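
The abstract does not define the two similarity measures, so the following is only a minimal sketch of the general idea of comparing databases through their frequent itemsets. It assumes a Jaccard-style ratio (shared frequent itemsets over all frequent itemsets), which is an illustrative choice and not necessarily either of the chapter's measures. The function names `frequent_itemsets` and `jaccard_similarity`, the brute-force enumeration, and the toy data are all hypothetical.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return the itemsets whose support is at least min_support.

    Brute-force enumeration, suitable only for tiny illustrative inputs.
    `transactions` is a list of frozensets of items (an assumed layout).
    """
    n = len(transactions)
    items = sorted(set().union(*transactions))
    result = set()
    for size in range(1, len(items) + 1):
        found_any = False
        for candidate in combinations(items, size):
            itemset = frozenset(candidate)
            # Support = fraction of transactions containing the itemset.
            support = sum(itemset <= t for t in transactions) / n
            if support >= min_support:
                result.add(itemset)
                found_any = True
        if not found_any:
            # No frequent itemset of this size, so none of a larger size.
            break
    return result


def jaccard_similarity(fis_a, fis_b):
    """Assumed measure: |A intersect B| / |A union B| over frequent itemsets."""
    union = fis_a | fis_b
    if not union:
        return 0.0
    return len(fis_a & fis_b) / len(union)


# Two toy transactional databases (hypothetical data).
db1 = [frozenset("ab"), frozenset("abc"), frozenset("ac")]
db2 = [frozenset("ab"), frozenset("ab"), frozenset("bc")]

fis1 = frequent_itemsets(db1, min_support=0.5)
fis2 = frequent_itemsets(db2, min_support=0.5)
print(jaccard_similarity(fis1, fis2))
```

A clustering step could then group databases whose pairwise similarity exceeds a chosen level; the sketch stops at the similarity itself because the abstract gives no details of the clustering algorithm.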

Keywords

Association Rule, Cluster Process, Frequent Itemsets, Good Partition, Transactional Database
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag London 2010

Authors and Affiliations

  • Animesh Adhikari (1, Email author)
  • Pralhad Ramachandrarao (2)
  • Witold Pedrycz (3)
  1. Department of Computer Science, Smt. Parvatibal Chowgule College, Margoa, India
  2. Department of Computer Science & Technology, Goa University, Goa, India
  3. Department of Electrical & Computer Engineering, University of Alberta, Edmonton, Canada