Introduction

  • Animesh Adhikari
  • Pralhad Ramachandrarao
  • Witold Pedrycz
Chapter
Part of the Advanced Information and Knowledge Processing book series (AI&KP)

Abstract

Many large organizations operate from multiple branches, and some of these branches collect data continuously. Such multi-branch organizations therefore possess multiple databases. Global decisions made by such an organization are likely to be more appropriate if they are based on the data distributed over its branches, and the number of such applications is increasing over time. In this chapter, we discuss some of the major challenges that multi-database mining must deal with, along with the issues of distributed data mining that arise in this setting. In addition, we present three fundamental approaches to mining multiple large databases and elaborate on the recent developments that have taken place in this area. We provide a roadmap for developing an effective multi-database mining application and conclude the chapter by identifying some future research directions.

Keywords

Data Mining · Association Rule · Local Pattern · Frequent Itemsets · Multiple Database
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag London 2010

Authors and Affiliations

  • Animesh Adhikari (1), email author
  • Pralhad Ramachandrarao (2)
  • Witold Pedrycz (3)
  1. Department of Computer Science, Smt. Parvatibai Chowgule College, Margao, India
  2. Department of Computer Science & Technology, Goa University, Goa, India
  3. Department of Electrical & Computer Engineering, University of Alberta, Edmonton, Canada