Big Data Thinning: Knowledge Discovery from Relevant Data

Shehab, Naji; Anagnostopoulos, Christos

doi:10.1007/978-3-030-44907-0_11


Chapter
First Online: 07 May 2020

Part of the book series: Internet of Things (ITTCC)

Abstract

Using statistical learning theory and machine learning techniques built on the principles of Rival Penalised Competitive Learning (RPCL), this chapter proposes a novel approach to Big Data Thinning, i.e., analysing only the potentially relevant data sub-spaces rather than the entire data space. Data scientists, data analysts, IoT applications and Edge-centric services are in need of predictive modelling and analytics. The approach serves this need by learning from previously issued analytics queries and exploiting the query access patterns over large distributed data-sets, which reveal the most interesting and important sub-spaces for further exploratory analysis. By analysing user queries and mapping them onto relatively small-scale local predictive regression models, we can achieve higher predictive accuracy. This is done by thinning the data space, i.e., freeing it of irrelevant and unpopular data sub-spaces, and thus using fewer training data instances. Experimental results and statistical analysis support the research idea proposed in this work.
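As a rough illustration of the RPCL principle named in the abstract, the following Python sketch shows one common textbook formulation of the update rule: for each input, the winning prototype moves towards it while the runner-up (the rival) is pushed slightly away. This is a sketch under stated assumptions, not the chapter's implementation. It assumes each analytics query has already been encoded as a numeric vector, and the function names (`rpcl_step`, `rpcl_fit`), the learning rates and the random initialisation are all hypothetical choices made for illustration.

```python
import random


def rpcl_step(centres, wins, x, lr_win=0.05, lr_rival=0.002):
    """Apply one RPCL update for input x; return (winner, rival) indices.

    centres and wins are modified in place. Learning rates are
    illustrative defaults, not values taken from the chapter.
    """
    total = sum(wins)
    # Squared distance scaled by each prototype's relative win frequency,
    # so frequent winners are handicapped.
    scores = [
        (wins[j] / total) * sum((a - c) ** 2 for a, c in zip(x, centres[j]))
        for j in range(len(centres))
    ]
    ranked = sorted(range(len(centres)), key=scores.__getitem__)
    winner, rival = ranked[0], ranked[1]
    # Winner is attracted to x; rival is repelled with a much smaller rate.
    centres[winner] = [c + lr_win * (a - c) for a, c in zip(x, centres[winner])]
    centres[rival] = [c - lr_rival * (a - c) for a, c in zip(x, centres[rival])]
    wins[winner] += 1
    return winner, rival


def rpcl_fit(queries, k, epochs=20, seed=0):
    """Run RPCL over query vectors; return (centres, win_counts)."""
    if k < 2:
        raise ValueError("RPCL needs at least two prototypes (winner and rival)")
    rng = random.Random(seed)
    # Assumption: prototypes are initialised from randomly chosen queries.
    centres = [list(q) for q in rng.sample(queries, k)]
    wins = [1] * k
    for _ in range(epochs):
        order = list(queries)
        rng.shuffle(order)
        for x in order:
            rpcl_step(centres, wins, x)
    return centres, wins


if __name__ == "__main__":
    # Toy query vectors concentrated around two regions of the data space.
    toy_queries = [[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [4.9, 5.0]]
    centres, wins = rpcl_fit(toy_queries, k=3, epochs=10)
    for centre, count in zip(centres, wins):
        print(centre, count)
```

One might then read prototypes with high win counts as markers of frequently queried sub-spaces. How the chapter itself selects sub-spaces and fits the local regression models is not stated in the abstract, so that step is left out here.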

Acknowledgements

This research is funded by the EU-H2020 GNFUV Project (Grant No. 645220) and the EU-H2020 MSCA INNOVATE Project (Grant No. 745829).

Author information

Authors and Affiliations

School of Computing Science, University of Glasgow, Glasgow, G12 8QQ, UK
Naji Shehab & Christos Anagnostopoulos

Corresponding author

Correspondence to Naji Shehab.

Editor information

Editors and Affiliations

Department of Management Science and Technology, Hellenic Mediterranean University, Agios Nikolaos, Crete, Greece
George Mastorakis
Mobile Systems Laboratory (MoSys Lab), Department of Computer Science, Research Centres (UNRF) – University of Nicosia, Nicosia, Cyprus
Constandinos X. Mavromoustakis
Department of Telecommunications, Warsaw University of Technology, Warsaw, Poland
Jordi Mongay Batalla
Department of Electrical and Computer Engineering, Hellenic Mediterranean University, Heraklion, Crete, Greece
Evangelos Pallis

About this chapter

Cite this chapter

Shehab, N., Anagnostopoulos, C. (2020). Big Data Thinning: Knowledge Discovery from Relevant Data. In: Mastorakis, G., Mavromoustakis, C., Batalla, J., Pallis, E. (eds) Convergence of Artificial Intelligence and the Internet of Things. Internet of Things. Springer, Cham. https://doi.org/10.1007/978-3-030-44907-0_11

DOI: https://doi.org/10.1007/978-3-030-44907-0_11
Published: 07 May 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-44906-3
Online ISBN: 978-3-030-44907-0
eBook Packages: Engineering, Engineering (R0)
