Learning Rules from Distributed Data

Hall, Lawrence O.; Chawla, Nitesh; Bowyer, Kevin W.; Kegelmeyer, W. Philip

doi:10.1007/3-540-46502-2_11

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1759)

Abstract

In this paper, we raise a concern about the accuracy, as a function of parallelism, of a certain class of distributed learning algorithms, and illustrate one proposed improvement. We focus on learning a single model, a set of rules, from a set of disjoint data sets that are distributed across a set of computers. The data sets may be disjoint for any of several reasons. In our approach, the first step is to construct a rule set (model) for each of the original disjoint data sets. The rule sets are then merged until a single final rule set that models the aggregate data is obtained. We show that this approach is comparable to directly creating a rule set from the aggregate data and promises faster learning. Accuracy can drop off as the degree of parallelism increases; however, we have developed an approach that extends the degree of parallelism that can be reached before this problem takes over.
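The learn-locally-then-merge pipeline described in the abstract can be sketched as follows. This is a toy illustration under loudly stated assumptions: the abstract does not say how rules are learned or merged, so the single-condition learner `learn_rules`, the rule representation, the conflict-resolution heuristic in `merge_rule_sets`, and the pairwise merge order in `merge_all` are all hypothetical stand-ins chosen for brevity, not the authors' method.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

# Hypothetical rule representation (an assumption, not the paper's format):
# antecedent (attribute index, value) -> (predicted class, correct, covered)
Antecedent = Tuple[int, str]
RuleSet = Dict[Antecedent, Tuple[str, int, int]]


def learn_rules(rows: Sequence[Sequence[str]], labels: Sequence[str]) -> RuleSet:
    """Toy stand-in for a real rule learner: one single-condition rule per
    (attribute, value) pair, predicting the majority class it covers."""
    counts: Dict[Antecedent, Counter] = {}
    for row, label in zip(rows, labels):
        for attr, value in enumerate(row):
            counts.setdefault((attr, value), Counter())[label] += 1
    rules: RuleSet = {}
    for key, dist in counts.items():
        cls, correct = dist.most_common(1)[0]
        rules[key] = (cls, correct, sum(dist.values()))
    return rules


def merge_rule_sets(r1: RuleSet, r2: RuleSet) -> RuleSet:
    """Hypothetical merge: agreeing rules pool their counts; for conflicting
    rules, keep whichever has higher training accuracy (ties: more coverage)."""
    merged = dict(r1)
    for key, (cls, correct, covered) in r2.items():
        if key not in merged:
            merged[key] = (cls, correct, covered)
            continue
        cls1, correct1, covered1 = merged[key]
        if cls1 == cls:
            merged[key] = (cls, correct1 + correct, covered1 + covered)
        elif (correct / covered, covered) > (correct1 / covered1, covered1):
            merged[key] = (cls, correct, covered)
    return merged


def merge_all(rule_sets: List[RuleSet]) -> RuleSet:
    """Merge rule sets pairwise, level by level, until one rule set remains."""
    while len(rule_sets) > 1:
        nxt = [merge_rule_sets(a, b) for a, b in zip(rule_sets[0::2], rule_sets[1::2])]
        if len(rule_sets) % 2:  # odd one out is carried to the next level
            nxt.append(rule_sets[-1])
        rule_sets = nxt
    return rule_sets[0]


def classify(rules: RuleSet, row: Sequence[str], default: str) -> str:
    """Predict with the most accurate matching rule, else the default class."""
    matches = [rules[(a, v)] for a, v in enumerate(row) if (a, v) in rules]
    if not matches:
        return default
    return max(matches, key=lambda r: (r[1] / r[2], r[2]))[0]


# Usage: split a tiny data set into 4 disjoint partitions, learn locally, merge.
rows = [("a", "p"), ("a", "q"), ("b", "p"), ("b", "q"),
        ("a", "q"), ("b", "p"), ("a", "p"), ("b", "q")]
labels = ["x", "x", "y", "y", "x", "y", "x", "y"]
partitions = [(rows[i::4], labels[i::4]) for i in range(4)]
local = [learn_rules(r, l) for r, l in partitions]
final = merge_all(local)
print([classify(final, row, default="x") for row in [("a", "q"), ("b", "p")]])
```

In this toy run, attribute 1 is uninformative in the pooled data (each of its values covers two examples of each class), yet the merged rules for attribute 1 report perfect training accuracy because the conflict-resolution step discards the disagreeing partitions' counts. This is one small way in which merging locally learned rules can lose information that learning on the aggregate data would retain.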

Author information

Authors and Affiliations

Department of Computer Science and Engineering, ENB 118, University of South Florida, 4202 E. Fowler Ave., Tampa, FL 33620
Lawrence O. Hall, Nitesh Chawla & Kevin W. Bowyer
Advanced Concepts Department, Sandia National Laboratories, P.O. Box 969, MS 9214, Livermore, CA, 94551-0969
W. Philip Kegelmeyer

Editor information

Editors and Affiliations

Computer Science Department, Rensselaer Polytechnic Institute, Troy, NY, 12180, USA
Mohammed J. Zaki
K55/B1, IBM Almaden Research Center, 650 Harry Road, San Jose, CA, 95120, USA
Ching-Tien Ho

About this paper

Cite this paper

Hall, L.O., Chawla, N., Bowyer, K.W., Kegelmeyer, W.P. (2000). Learning Rules from Distributed Data. In: Zaki, M.J., Ho, C.-T. (eds) Large-Scale Parallel Data Mining. Lecture Notes in Computer Science, vol 1759. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-46502-2_11

DOI: https://doi.org/10.1007/3-540-46502-2_11
Published: 17 May 2002
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-67194-7
Online ISBN: 978-3-540-46502-7
eBook Packages: Springer Book Archive