
Regression from Distributed Data Sources Using Discrete Neighborhood Representations and Modified Stalked Generalization Models

  • Conference paper
Intelligent Distributed Computing VIII

Part of the book series: Studies in Computational Intelligence (SCI, volume 570)

Abstract

In this work we present a distributed regression approach for problems in which the distributed data sources may have different contexts, where a difference in context is defined as a change in the underlying probability law across the distributed sources. The approach uses a discrete representation of the probability density functions (pdfs). We create neighborhoods of similar datasets by comparing their pdfs, and use this information both to build an ensemble-based approach and to improve the second-level model of the proposal, which is based on stacked generalization. We compare the proposal with other state-of-the-art models on 5 real datasets and obtain favorable results on the majority of them.
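The abstract describes the pipeline only at a high level. The following NumPy sketch illustrates the general idea under explicit assumptions and is not the authors' implementation: each source's discrete pdf is assumed to be a histogram of the target variable over shared bin edges; pdf similarity is measured with the Hellinger distance (the abstract does not name the divergence used); a neighborhood is assumed to be every source within a fixed distance threshold; level-0 models are ordinary least-squares fits; and the level-1 combiner is a linear least-squares stacker in the spirit of Wolpert's stacked generalization. All function names, the threshold, the bin count and the toy data are hypothetical.

```python
import numpy as np


def discrete_pdf(values, edges):
    """Histogram estimate of a 1-D pdf on shared bin edges, normalized to sum to 1."""
    counts, _ = np.histogram(values, bins=edges)
    counts = counts.astype(float) + 1e-9  # tiny smoothing: no exactly-zero bins
    return counts / counts.sum()


def hellinger(p, q):
    """Hellinger distance between two discrete distributions (0 = identical, 1 = disjoint)."""
    return float(np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)))


def neighborhoods(pdfs, threshold):
    """Assumed rule: for each source, indices of sources within `threshold` (itself included)."""
    n = len(pdfs)
    return [[j for j in range(n) if hellinger(pdfs[i], pdfs[j]) <= threshold]
            for i in range(n)]


def fit_linear(X, y):
    """Ordinary least squares with an intercept; stands in for any level-0 regressor."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_linear(coef, X):
    return np.column_stack([np.ones(len(X)), X]) @ coef


def train_level0(sources, holdout=0.3, seed=0):
    """Fit one level-0 model per source on a training split; keep the rest for level 1."""
    rng = np.random.default_rng(seed)
    models, holdouts = [], []
    for X, y in sources:
        idx = rng.permutation(len(X))
        cut = int(len(X) * (1 - holdout))
        models.append(fit_linear(X[idx[:cut]], y[idx[:cut]]))
        holdouts.append((X[idx[cut:]], y[idx[cut:]]))
    return models, holdouts


def train_level1(models, holdouts, neigh, i):
    """Level-1 combiner for source i: least-squares weights over its neighbors' predictions."""
    Xh, yh = holdouts[i]
    P = np.column_stack([predict_linear(models[j], Xh) for j in neigh[i]])
    w, *_ = np.linalg.lstsq(P, yh, rcond=None)
    return w


def predict_stacked(models, neigh, i, w, X):
    """Combine the neighborhood's level-0 predictions with the level-1 weights."""
    P = np.column_stack([predict_linear(models[j], X) for j in neigh[i]])
    return P @ w


# Toy usage: sources 0 and 1 share one context, source 2 follows a different law.
rng = np.random.default_rng(42)


def make_source(n, slope, intercept, lo, hi):
    x = rng.uniform(lo, hi, n)
    y = slope * x + intercept + rng.normal(0.0, 0.1, n)
    return x.reshape(-1, 1), y


sources = [make_source(200, 2.0, 0.0, 0.0, 1.0),
           make_source(200, 2.0, 0.0, 0.0, 1.0),
           make_source(200, -3.0, 15.0, 2.0, 3.0)]

all_y = np.concatenate([y for _, y in sources])
edges = np.linspace(all_y.min(), all_y.max(), 21)  # 20 shared bins
pdfs = [discrete_pdf(y, edges) for _, y in sources]
neigh = neighborhoods(pdfs, threshold=0.3)

models, holdouts = train_level0(sources)
w0 = train_level1(models, holdouts, neigh, 0)
x_new = np.linspace(0.0, 1.0, 50).reshape(-1, 1)
y_hat = predict_stacked(models, neigh, 0, w0, x_new)
print(neigh)
```

On this toy data, sources 0 and 1 draw from the same law and should fall into each other's neighborhood, while source 2 should form a neighborhood of its own, so the stacked model for source 0 only combines level-0 models trained under a matching context. Which variable is discretized, which divergence is used, and how the threshold is set are all design choices the paper may resolve differently.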

Author information

Corresponding author: Héctor Allende-Cid.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Allende-Cid, H., Moraga, C., Allende, H., Monge, R. (2015). Regression from Distributed Data Sources Using Discrete Neighborhood Representations and Modified Stalked Generalization Models. In: Camacho, D., Braubach, L., Venticinque, S., Badica, C. (eds) Intelligent Distributed Computing VIII. Studies in Computational Intelligence, vol 570. Springer, Cham. https://doi.org/10.1007/978-3-319-10422-5_27

  • DOI: https://doi.org/10.1007/978-3-319-10422-5_27

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-10421-8

  • Online ISBN: 978-3-319-10422-5

  • eBook Packages: Engineering (R0)