Clustering Local Frequency Items in Multiple Data Sources

Adhikari, Animesh; Adhikari, Jhimli; Pedrycz, Witold

doi:10.1007/978-3-319-03410-2_5

Part of the book series: Intelligent Systems Reference Library (ISRL, volume 61)

Abstract

Frequent items can be regarded as a generic type of pattern in a database. In the context of multiple data sources, most global patterns are based on local frequency items. A multi-branch company that transacts from different branches often needs to extract global patterns from data distributed over those branches, and such patterns support effective global decisions. It is therefore important to cluster local frequency items in multiple databases. This chapter presents an overview of the existing measures of association, surveys the existing multi-database mining techniques in order to select a suitable one, and reviews the related clustering techniques. We present the notion of high frequency itemsets and design an algorithm for synthesizing the supports of such itemsets. The existing clustering technique might cluster a set of items at a low level, since it estimates association among the items in an itemset with low accuracy; we therefore propose a new algorithm for clustering local frequency items. Because the measure of association A₂ is suitable for this purpose, association among items in a high frequency itemset is synthesized on its basis. The soundness of the clustering technique is shown. Numerous experiments are conducted on five datasets, and the results concerning different aspects of the proposed problem are presented in the experimental section. The effectiveness of the proposed clustering technique is more visible in dense databases.
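To make the idea of synthesizing a support across branches concrete, here is a minimal sketch. It is not the chapter's algorithm, which the abstract does not spell out. It assumes one common convention: the global support of an itemset is the database-size-weighted average of its local supports, and an itemset that a branch does not report (for example, because it fell below that branch's local threshold) contributes a support of 0 there. The function name `synthesize_support` and the data layout are hypothetical.

```python
from typing import Dict, FrozenSet, Sequence

Itemset = FrozenSet[str]


def synthesize_support(
    itemset: Itemset,
    local_supports: Sequence[Dict[Itemset, float]],
    db_sizes: Sequence[int],
) -> float:
    """Estimate the global support of `itemset` over several databases.

    Assumption (not taken from the chapter): the estimate is the
    size-weighted average of the local supports, and a database that
    does not report the itemset is treated as having support 0 for it.
    """
    if len(local_supports) != len(db_sizes):
        raise ValueError("one size is required per database")
    total = sum(db_sizes)
    if total <= 0:
        raise ValueError("total number of transactions must be positive")
    # Each local support times its database size approximates the number
    # of transactions in that database that contain the itemset.
    weighted = sum(
        supports.get(itemset, 0.0) * size
        for supports, size in zip(local_supports, db_sizes)
    )
    return weighted / total


if __name__ == "__main__":
    ab = frozenset({"a", "b"})
    branch_1 = {ab: 0.4}   # reported in a branch with 100 transactions
    branch_2 = {ab: 0.2}   # reported in a branch with 300 transactions
    print(synthesize_support(ab, [branch_1, branch_2], [100, 300]))
```

Treating unreported itemsets as having zero support underestimates the true global support. A more careful synthesis would estimate the missing local supports instead, which is presumably where the accuracy concerns raised in the abstract come in.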

Author information

Authors and Affiliations

Parvatibai Chowgule College, P. O. Fatorda, Margao, Goa, 403 602, India
Animesh Adhikari
Narayan Zantye College, P. O. Bicholim Industrial Estate, Bicholim, 403 529, India
Jhimli Adhikari
Department of Electrical and Computer Engineering, University of Alberta, Edmonton, AB, T6G 2V4, Canada
Witold Pedrycz

Corresponding author

Correspondence to Animesh Adhikari.

About this chapter

Cite this chapter

Adhikari, A., Adhikari, J., Pedrycz, W. (2014). Clustering Local Frequency Items in Multiple Data Sources. In: Data Analysis and Pattern Recognition in Multiple Databases. Intelligent Systems Reference Library, vol 61. Springer, Cham. https://doi.org/10.1007/978-3-319-03410-2_5

DOI: https://doi.org/10.1007/978-3-319-03410-2_5
Published: 07 December 2013
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-03409-6
Online ISBN: 978-3-319-03410-2
eBook Packages: Engineering, Engineering (R0)
