Abstract
Frequent items can be considered a generic type of pattern in a database. In the context of multiple data sources, most global patterns are based on local frequency items. A multi-branch company that transacts from different branches often needs to extract global patterns from data distributed over those branches, and global decisions can be made effectively using such patterns. It is therefore important to cluster local frequency items in multiple databases. This chapter presents an overview of the existing measures of association. To select a suitable technique for mining multiple databases, we survey the existing multi-database mining techniques, and we also review the related clustering techniques. We present the notion of high frequency itemsets and design an algorithm for synthesizing the supports of such itemsets. The existing clustering technique might cluster a set of items at a low level because it estimates the association among items in an itemset with low accuracy, so a new algorithm for clustering local frequency items is proposed. Because the measure of association A2 is suitable for this purpose, the association among items in a high frequency itemset is synthesized on its basis. The soundness of the clustering technique is shown. Numerous experiments are conducted using five datasets, and the results concerning different aspects of the proposed problem are presented in the experimental section. The effectiveness of the proposed clustering technique is more visible in dense databases.
Copyright information
© 2014 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Adhikari, A., Adhikari, J., Pedrycz, W. (2014). Clustering Local Frequency Items in Multiple Data Sources. In: Data Analysis and Pattern Recognition in Multiple Databases. Intelligent Systems Reference Library, vol 61. Springer, Cham. https://doi.org/10.1007/978-3-319-03410-2_5
DOI: https://doi.org/10.1007/978-3-319-03410-2_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-03409-6
Online ISBN: 978-3-319-03410-2
eBook Packages: Engineering, Engineering (R0)