Abstract
In this work we present a distributed regression approach for problems in which the distributed data sources may have different contexts. Two sources have different contexts when the underlying probability law differs between them. The approach uses a discrete representation of the probability density functions (pdfs) of the sources. By comparing these pdfs, we form neighborhoods of similar datasets and use this information both to build an ensemble-based model and to improve a second-level model based on stacked generalization. We compare the proposal with other state-of-the-art models on 5 real datasets and obtain favorable results on the majority of them.
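The neighborhood step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which pdf distance, bin count, or similarity threshold is used, so the Hellinger distance, `n_bins`, `threshold`, and the function names below are all assumptions chosen for illustration, and the example handles only one-dimensional data.

```python
import numpy as np


def discrete_pdf(values, bin_edges):
    """Normalized histogram: a discrete estimate of the pdf over shared bins."""
    counts, _ = np.histogram(values, bins=bin_edges)
    total = counts.sum()
    if total == 0:
        # No samples fell inside the bins: fall back to a uniform estimate.
        return np.full(len(counts), 1.0 / len(counts))
    return counts / total


def hellinger(p, q):
    """Hellinger distance between two discrete distributions, in [0, 1].

    Assumption: the abstract does not name the distance; this is one common choice.
    """
    return float(np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)))


def neighborhoods(sources, n_bins=10, threshold=0.3):
    """For each source, list the indices of sources whose discrete pdf is close.

    n_bins and threshold are hypothetical parameters, not values from the paper.
    """
    # Only each source's min and max are shared to agree on common bin edges,
    # so no raw data has to leave a source.
    lo = min(float(s.min()) for s in sources)
    hi = max(float(s.max()) for s in sources)
    edges = np.linspace(lo, hi, n_bins + 1)
    pdfs = [discrete_pdf(s, edges) for s in sources]
    return [
        [j for j, q in enumerate(pdfs) if hellinger(p, q) <= threshold]
        for p in pdfs
    ]


# Synthetic example: sources 0 and 1 share a context, source 2 does not.
rng = np.random.default_rng(0)
sources = [
    rng.normal(0.0, 1.0, 500),
    rng.normal(0.1, 1.0, 500),
    rng.normal(5.0, 1.0, 500),
]
result = neighborhoods(sources)
print(result)
```

Each source exchanges only its histogram and value range, which keeps communication small. The resulting neighborhoods could then decide which local models are combined in the ensemble or passed to the second-level stacked model; that part is not sketched here.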
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Allende-Cid, H., Moraga, C., Allende, H., Monge, R. (2015). Regression from Distributed Data Sources Using Discrete Neighborhood Representations and Modified Stacked Generalization Models. In: Camacho, D., Braubach, L., Venticinque, S., Badica, C. (eds) Intelligent Distributed Computing VIII. Studies in Computational Intelligence, vol 570. Springer, Cham. https://doi.org/10.1007/978-3-319-10422-5_27
DOI: https://doi.org/10.1007/978-3-319-10422-5_27
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-10421-8
Online ISBN: 978-3-319-10422-5
eBook Packages: Engineering, Engineering (R0)