Big Data Thinning: Knowledge Discovery from Relevant Data

Part of the book series: Internet of Things (ITTCC)

Abstract

Drawing on statistical learning theory and machine learning techniques built around Rival Penalised Competitive Learning (RPCL), this chapter proposes a novel approach to Big Data Thinning, i.e., analysing only the potentially relevant data sub-spaces rather than the entire data space. Data scientists, data analysts, IoT applications and Edge-centric services all need predictive modelling and analytics. The approach serves this need by learning from previously issued analytics queries and exploiting their access patterns over large distributed data-sets, which reveal the most interesting and important sub-spaces for further exploratory analysis. By analysing user queries and mapping them onto relatively small-scale local predictive regression models, we can obtain higher predictive accuracy. This is done by thinning the data space, freeing it of irrelevant and unpopular data sub-spaces, and thus using fewer training data instances. Experimental results and statistical analysis support the proposed idea.
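To make the RPCL idea concrete, the following is a minimal sketch of the classic RPCL update rule (Xu, Krzyzak and Oja, 1993) applied to synthetic 2-D "query centres". It is not the chapter's implementation: the function name `rpcl`, the learning rates, the number of units and the synthetic data are all assumptions chosen for illustration. For each input, the winning prototype moves towards it, while the runner-up (the rival) is pushed slightly away; a win-frequency weight discourages any single unit from winning every time.

```python
import numpy as np

def rpcl(points, n_units, lr_win=0.05, lr_rival=0.002, epochs=20, seed=0):
    """Illustrative RPCL: returns prototypes and per-unit win counts."""
    rng = np.random.default_rng(seed)
    points = np.asarray(points, dtype=float)
    # Initialise prototypes on randomly chosen inputs (an assumption).
    start = rng.choice(len(points), size=n_units, replace=False)
    protos = points[start].copy()
    wins = np.ones(n_units)  # win counters used for the frequency weight
    for _ in range(epochs):
        for x in points[rng.permutation(len(points))]:
            dist = np.sum((protos - x) ** 2, axis=1)
            # Frequency-weighted distance: frequent winners are handicapped.
            score = (wins / wins.sum()) * dist
            order = np.argsort(score)
            winner, rival = order[0], order[1]
            protos[winner] += lr_win * (x - protos[winner])   # attract winner
            protos[rival] -= lr_rival * (x - protos[rival])   # repel rival
            wins[winner] += 1
    return protos, wins

# Hypothetical workload: past queries concentrate on two popular sub-spaces.
data_rng = np.random.default_rng(42)
query_centres = np.vstack([
    data_rng.normal([0.0, 0.0], 0.1, size=(100, 2)),
    data_rng.normal([5.0, 5.0], 0.1, size=(100, 2)),
])
protos, wins = rpcl(query_centres, n_units=3)
print(protos)
print(wins)
```

Under these assumptions, prototypes with high win counts mark the sub-spaces that queries visit often, while rarely winning units indicate regions that are candidates for thinning. A local regression model could then be fitted only to the data near each frequently winning prototype.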

Acknowledgements

This research is funded by the EU-H2020 GNFUV Project (Grant No. 645220) and the EU-H2020 MSCA INNOVATE Project (Grant No. 745829).

Corresponding author

Correspondence to Naji Shehab.

Copyright information

© 2020 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Shehab, N., Anagnostopoulos, C. (2020). Big Data Thinning: Knowledge Discovery from Relevant Data. In: Mastorakis, G., Mavromoustakis, C., Batalla, J., Pallis, E. (eds) Convergence of Artificial Intelligence and the Internet of Things. Internet of Things. Springer, Cham. https://doi.org/10.1007/978-3-030-44907-0_11

  • DOI: https://doi.org/10.1007/978-3-030-44907-0_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-44906-3

  • Online ISBN: 978-3-030-44907-0

  • eBook Packages: Engineering, Engineering (R0)
