Toward Distributed Knowledge Discovery on Grid Systems

  • Nhien An Le Khac
  • Lamine M. Aouad
  • M-Tahar Kechadi
Part of the Advanced Information and Knowledge Processing book series (AI&KP)

Abstract

While massive amounts of data are being collected and stored, not only in scientific fields but also in industry and commerce, the efficient mining and management of the useful information in this data is becoming both a challenge and a pressing economic need. This has led to the development of distributed data mining techniques that deal with huge, multi-dimensional datasets distributed among several sites.

Moreover, Grid platforms are well suited to storing large, geographically distributed, high-dimensional, multi-owner, and heterogeneous datasets, and they provide effective computational support for distributed data mining applications. Although Grid platforms allow resources distributed across large, heterogeneous environments to be shared, many challenges remain in running these distributed data mining techniques on the Grid, because efficient distributed data mining systems are still lacking.

In this chapter, we present a new distributed data mining (DDM) system, built on Grid/P2P middleware tools, that executes new distributed data mining techniques on very large, distributed, heterogeneous datasets.

Keywords

Association Rule, Frequent Itemsets, Grid Environment, Grid Service, Global Cluster

Copyright information

© Springer-Verlag London 2010

Authors and Affiliations

  • Nhien An Le Khac (1), corresponding author
  • Lamine M. Aouad (1)
  • M-Tahar Kechadi (1)
  1. School of Computer Science and Informatics, University College Dublin, Dublin, Ireland

Personalised recommendations