Regression from Distributed Data Sources Using Discrete Neighborhood Representations and Modified Stacked Generalization Models

Allende-Cid, Héctor; Moraga, Claudio; Allende, Héctor; Monge, Raúl

doi:10.1007/978-3-319-10422-5_27

Part of the book series: Studies in Computational Intelligence (SCI, volume 570)

Abstract

In this work we present a distributed regression approach for problems in which the distributed data sources may have different contexts. A difference in context is defined as a change in the underlying probability law across the distributed sources. The approach uses a discrete representation of the probability density functions (pdfs). We build neighborhoods of similar datasets by comparing their pdfs, and use this information both to construct an ensemble-based model and to improve the second-level model of the proposal, which is based on stacked generalization. We compare the proposal with other state-of-the-art models on five real data sets and obtain favorable results on the majority of them.
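To make the neighborhood idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes that the discrete pdf is an equal-width histogram of each source's target values and that similarity is measured with the Hellinger distance; the abstract specifies neither choice. The names `discrete_pdf`, `hellinger`, `neighborhoods`, the bin settings, and the threshold are all hypothetical.

```python
import math

def discrete_pdf(values, lo, hi, bins):
    """Normalized histogram of `values` over `bins` equal-width bins on [lo, hi]."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        # Clamp so that v == hi (or out-of-range values) fall in an edge bin.
        idx = min(max(int((v - lo) / width), 0), bins - 1)
        counts[idx] += 1
    total = len(values)
    return [c / total for c in counts]

def hellinger(p, q):
    """Hellinger distance between two discrete distributions, in [0, 1]."""
    s = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(s / 2.0)

def neighborhoods(pdfs, threshold):
    """For each source i, the indices j whose pdf lies within `threshold` of pdf i."""
    n = len(pdfs)
    return [[j for j in range(n) if hellinger(pdfs[i], pdfs[j]) <= threshold]
            for i in range(n)]

# Toy target values from three hypothetical sources; the third has a shifted context.
sources = [
    [0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
    [0.15, 0.25, 0.35, 0.45, 0.55, 0.65],
    [2.1, 2.2, 2.3, 2.4, 2.5, 2.6],
]
# Assumed settings: shared range [0, 3] and 6 bins so the histograms are comparable.
pdfs = [discrete_pdf(s, lo=0.0, hi=3.0, bins=6) for s in sources]
groups = neighborhoods(pdfs, threshold=0.5)
print(groups)
```

In this toy run the first two sources end up in each other's neighborhood, while the shifted source forms a neighborhood of its own. In a stacking setup, local models trained within one neighborhood could then supply the first-level predictions for a second-level model, but how the paper combines them is not stated in the abstract.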

Author information

Authors and Affiliations

Departamento de Informática, Universidad Técnica Federico Santa María, Avenida España 1680, Valparaíso, Chile
Héctor Allende-Cid, Héctor Allende & Raúl Monge
European Centre for Soft Computing, 33600, Mieres, Asturias, Spain
Claudio Moraga
TU Dortmund University, 44220, Dortmund, Germany
Claudio Moraga

Corresponding author

Correspondence to Héctor Allende-Cid.

Editor information

Editors and Affiliations

Computer Science Department, Technical School of Engineering, Universidad Autónoma de Madrid, Madrid, Spain
David Camacho
University of Hamburg, Hamburg, Germany
Lars Braubach
Department of Industrial and Information Engineering, Second University of Naples, Aversa, Italy
Salvatore Venticinque
Software Engineering Department, Faculty of Automatics, Computers and Ele, University of Craiova, Craiova, Romania
Costin Badica

Cite this paper

Allende-Cid, H., Moraga, C., Allende, H., Monge, R. (2015). Regression from Distributed Data Sources Using Discrete Neighborhood Representations and Modified Stacked Generalization Models. In: Camacho, D., Braubach, L., Venticinque, S., Badica, C. (eds) Intelligent Distributed Computing VIII. Studies in Computational Intelligence, vol 570. Springer, Cham. https://doi.org/10.1007/978-3-319-10422-5_27

DOI: https://doi.org/10.1007/978-3-319-10422-5_27
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-10421-8
Online ISBN: 978-3-319-10422-5
eBook Packages: Engineering, Engineering (R0)
