Toward Distributed Knowledge Discovery on Grid Systems

Le Khac, Nhien An; Aouad, Lamine M.; Kechadi, M-Tahar

doi:10.1007/978-1-84996-077-9_9

Part of the book series: Advanced Information and Knowledge Processing (AI&KP)

Abstract

While massive amounts of data are being collected and stored not only in science but also in industry and commerce, efficiently mining and managing the useful information in this data is becoming both a challenge and a massive economic need. This has led to the development of distributed data mining (DDM) techniques that deal with huge multi-dimensional datasets distributed among several sites.

Moreover, Grid platforms are well suited to storing large, geographically distributed, high-dimensional, multi-owner, heterogeneous datasets, and they provide effective computational support for distributed data mining applications. Although Grid platforms allow resources to be shared across large, heterogeneous environments, running these distributed data mining techniques on the Grid remains challenging because efficient distributed data mining systems are lacking.

In this chapter, we present a new DDM system based on Grid/P2P middleware tools to execute new distributed data mining techniques on very large, distributed, heterogeneous datasets.

Author information

Authors and Affiliations

School of Computer Science and Informatics, University College Dublin, Belfield, Dublin, 4, Ireland
Nhien An Le Khac, Lamine M. Aouad & M-Tahar Kechadi

Corresponding author

Correspondence to Nhien An Le Khac.

Editor information

Editors and Affiliations

INSA de Lyon, avenue Jean Capelle 7, Villeurbanne CX, 69621, France
Youakim Badr
Fac. Sciences Mirande, UMR CNRS 5158, Université de Bourgogne, Dijon CX, France
Richard Chbeir
Center for Quantifiable Quality of …, Norwegian University of Science & Technology, O.S. Bragstads plass 2E, Trondheim, 7491, Norway
Ajith Abraham
College of Business & Administration, Dept. Quantitative Methods & …, Kuwait University, Safat, Kuwait
Aboul-Ella Hassanien

About this chapter

Cite this chapter

Le Khac, N.A., Aouad, L.M., Kechadi, MT. (2010). Toward Distributed Knowledge Discovery on Grid Systems. In: Badr, Y., Chbeir, R., Abraham, A., Hassanien, AE. (eds) Emergent Web Intelligence: Advanced Semantic Technologies. Advanced Information and Knowledge Processing. Springer, London. https://doi.org/10.1007/978-1-84996-077-9_9

DOI: https://doi.org/10.1007/978-1-84996-077-9_9
Publisher Name: Springer, London
Print ISBN: 978-1-84996-076-2
Online ISBN: 978-1-84996-077-9
eBook Packages: Computer Science, Computer Science (R0)
