Abstract
Data scientists, data analysts, IoT applications and Edge-centric services are in need of predictive modelling and analytics. Drawing on statistical learning theory and machine learning techniques built around the principles of Rival Penalised Competitive Learning (RPCL), this chapter proposes a novel approach to Big Data Thinning, i.e., analysing only the potentially relevant data sub-spaces rather than the entire data space. The approach learns from previously issued analytics queries and exploits the query access patterns over large distributed data-sets to reveal the most interesting and important sub-spaces for further exploratory analysis. User queries are analysed and mapped onto relatively small-scale local predictive regression models, which can yield higher predictive accuracy. Thinning frees the data space of irrelevant and non-popular data sub-spaces, so fewer training data instances are needed. Experimental results and statistical analysis support the research idea proposed in this work.
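To make the idea of learning popular sub-spaces from queries concrete, the following is a minimal sketch of a standard RPCL update loop applied to query centres. It is not the chapter's implementation: the function name `rpcl`, the learning rates, the number of prototypes and the synthetic two-dimensional queries are all assumptions chosen for illustration.

```python
import random

def rpcl(samples, k, alpha_win=0.05, alpha_rival=0.002, epochs=20, seed=0):
    """Illustrative RPCL over a list of equal-length tuples (requires k >= 2)."""
    rng = random.Random(seed)
    # Initialise prototypes from randomly chosen samples.
    protos = [list(s) for s in rng.sample(samples, k)]
    wins = [1] * k  # win counts used as a frequency ("conscience") weight
    order = list(samples)
    for _ in range(epochs):
        rng.shuffle(order)
        for x in order:
            total = sum(wins)
            # Frequency-weighted squared distance to every prototype.
            scores = sorted(
                (wins[j] / total * sum((a - b) ** 2 for a, b in zip(x, protos[j])), j)
                for j in range(k)
            )
            winner, rival = scores[0][1], scores[1][1]
            # The winner moves toward the sample; the rival is pushed away.
            protos[winner] = [w + alpha_win * (a - w) for a, w in zip(x, protos[winner])]
            protos[rival] = [w - alpha_rival * (a - w) for a, w in zip(x, protos[rival])]
            wins[winner] += 1
    return protos, wins

# Hypothetical 2-D query centres concentrated around two popular sub-spaces.
gen = random.Random(1)
queries = [(gen.gauss(0, 0.3), gen.gauss(0, 0.3)) for _ in range(100)]
queries += [(gen.gauss(5, 0.3), gen.gauss(5, 0.3)) for _ in range(100)]

protos, wins = rpcl(queries, k=4)
for p, n in zip(protos, wins):
    print([round(v, 2) for v in p], n)
```

Under these assumptions, prototypes that attract many queries mark candidate sub-spaces on which a local regression model could be trained, while prototypes that rarely win point to regions that might be thinned out. How the chapter actually selects sub-spaces and fits its local models is not specified in the abstract.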
Acknowledgements
This research is funded by the EU-H2020 GNFUV Project (Grant No. 645220) and the EU-H2020 MSCA INNOVATE Project (Grant No. 745829).
Copyright information
© 2020 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Shehab, N., Anagnostopoulos, C. (2020). Big Data Thinning: Knowledge Discovery from Relevant Data. In: Mastorakis, G., Mavromoustakis, C., Batalla, J., Pallis, E. (eds) Convergence of Artificial Intelligence and the Internet of Things. Internet of Things. Springer, Cham. https://doi.org/10.1007/978-3-030-44907-0_11
DOI: https://doi.org/10.1007/978-3-030-44907-0_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-44906-3
Online ISBN: 978-3-030-44907-0
eBook Packages: Engineering, Engineering (R0)