Clustering Local Frequency Items in Multiple Data Sources

Data Analysis and Pattern Recognition in Multiple Databases

Part of the book series: Intelligent Systems Reference Library ((ISRL,volume 61))

Abstract

Frequent items can be regarded as a generic type of pattern in a database. In the context of multiple data sources, most global patterns are based on local frequency items. A multi-branch company that transacts from different branches often needs to extract global patterns from data distributed over those branches, and such patterns support effective global decisions. It is therefore important to cluster local frequency items in multiple databases. This chapter presents an overview of the existing measures of association, surveys the existing multi-database mining techniques in order to select a suitable one, and reviews the related clustering techniques. We introduce the notion of high frequency itemsets and design an algorithm for synthesizing the supports of such itemsets. The existing clustering technique might cluster a set of items at a low level because it estimates the association among items in an itemset with low accuracy; we therefore propose a new algorithm for clustering local frequency items. Because the measure of association A2 is suitable for this purpose, the association among items in a high frequency itemset is synthesized on its basis. The soundness of the clustering technique is shown. Numerous experiments are conducted using five datasets, and the results concerning different aspects of the proposed problem are presented in the experimental section. The effectiveness of the proposed clustering technique is more visible in dense databases.
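
The abstract does not spell out the synthesis algorithm, so the following is only a minimal sketch of what synthesizing the support of an itemset across several branch databases could look like. It assumes a size-weighted average of the local supports and treats an itemset that a branch does not report as having local support 0; both choices are assumptions, not the chapter's stated method. The function name `synthesize_support` and the sample data are hypothetical.

```python
from typing import Dict, FrozenSet, Sequence

Itemset = FrozenSet[str]


def synthesize_support(
    itemset: Itemset,
    local_supports: Sequence[Dict[Itemset, float]],
    db_sizes: Sequence[int],
) -> float:
    """Size-weighted average of an itemset's local supports (illustrative)."""
    if len(local_supports) != len(db_sizes):
        raise ValueError("one size is required per local database")
    total = sum(db_sizes)
    if total <= 0:
        raise ValueError("total number of transactions must be positive")
    # Assumption: an itemset a branch does not report contributes support 0.
    weighted = sum(
        supports.get(itemset, 0.0) * size
        for supports, size in zip(local_supports, db_sizes)
    )
    return weighted / total


# Hypothetical data: two branches with 100 and 300 transactions.
ab = frozenset({"a", "b"})
branch_supports = [{ab: 0.5}, {ab: 0.25}]
branch_sizes = [100, 300]
global_support = synthesize_support(ab, branch_supports, branch_sizes)
# (0.5 * 100 + 0.25 * 300) / 400 = 0.3125
```

Weighting by database size keeps a small branch from dominating the global estimate. Treating unreported itemsets as support 0 underestimates the true global support, which is one reason a more careful estimate may be preferable in practice.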

Author information

Correspondence to Animesh Adhikari.

Copyright information

© 2014 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Adhikari, A., Adhikari, J., Pedrycz, W. (2014). Clustering Local Frequency Items in Multiple Data Sources. In: Data Analysis and Pattern Recognition in Multiple Databases. Intelligent Systems Reference Library, vol 61. Springer, Cham. https://doi.org/10.1007/978-3-319-03410-2_5

  • DOI: https://doi.org/10.1007/978-3-319-03410-2_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-03409-6

  • Online ISBN: 978-3-319-03410-2

  • eBook Packages: Engineering, Engineering (R0)
