Communication-Efficient Classification in P2P Networks

  • Hock Hee Ang
  • Vivekanand Gopalkrishnan
  • Wee Keong Ng
  • Steven Hoi
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5781)

Abstract

Distributed classification aims to learn with accuracy comparable to that of centralized approaches, but at far lower communication and computation costs. By nature, P2P networks provide an excellent environment for distributed classification because of the high availability of shared resources such as bandwidth, storage space, and computational power. However, learning in P2P networks faces many challenges, namely scalability, peer dynamism, asynchronism, and fault tolerance. In this paper, we address these challenges by presenting CEMPaR, a communication-efficient framework based on cascading SVMs that exploits the characteristics of DHT-based lookup protocols. CEMPaR is designed to be robust to factors such as the number of peers in the network, imbalanced data sizes, and class distribution, while incurring extremely low communication cost and maintaining accuracy comparable to best-in-class approaches. The feasibility and effectiveness of our approach are demonstrated through extensive experimental studies on real and synthetic datasets.
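The abstract does not say how CEMPaR uses the DHT, so the following is only a generic sketch of the kind of key-to-peer lookup that DHT protocols provide (a Chord-style successor rule on a hash ring). The names `ring_id`, `successor`, `assign_keys`, and the 16-bit ring size are illustrative assumptions, not part of the paper.

```python
import hashlib
from bisect import bisect_left

RING_BITS = 16  # assumed small identifier space, for illustration only


def ring_id(key: str) -> int:
    """Hash a string onto the identifier ring [0, 2**RING_BITS)."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** RING_BITS)


def successor(peer_ids: list[int], key_id: int) -> int:
    """Return the first peer id at or clockwise after key_id, wrapping around."""
    ids = sorted(peer_ids)
    idx = bisect_left(ids, key_id)
    return ids[idx % len(ids)]


def assign_keys(peers: list[str], keys: list[str]) -> dict[str, str]:
    """Map each key to the peer responsible for it under the successor rule."""
    by_id = {ring_id(p): p for p in peers}  # hypothetical peer names -> ring ids
    return {k: by_id[successor(list(by_id), ring_id(k))] for k in keys}
```

Because every peer applies the same hash and the same successor rule, any peer can determine which peer is responsible for a key without a central coordinator. A real DHT resolves the successor through routed messages rather than a global sorted list, which this sketch deliberately omits.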

Keywords

Communication Cost, Physical Address, Prediction Phase, Class Count, Prediction Cost
These keywords were added by machine and not by the authors.

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Hock Hee Ang (1)
  • Vivekanand Gopalkrishnan (1)
  • Wee Keong Ng (1)
  • Steven Hoi (1)
  1. Nanyang Technological University, Singapore
