DataSpeak: Data Extraction, Aggregation, and Classification Using Big Data Novel Algorithm

  • Conference paper

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 810)

Abstract

The large and growing number of computing devices generates a huge amount of data. Because this data comes in many varieties, processing and analyzing it is a major challenge in big data analytics. Data consistency and scalability are also major problems for large datasets. Our research proposes an algorithm for data extraction, aggregation, and classification based on a novel approach called “DataSpeak”. We used k-Nearest Neighbors with Spark as a reference and developed a novel approach with a modified algorithm. We analyzed our approach on large datasets from domains including travel and tourism, placement papers, movies and historical data, and smartphones. To assess the capability and accuracy of our algorithm, we used cross validation, precision, recall, and a comparative statistical analysis against the existing algorithm. Our approach provides fast data access and efficient data extraction in less time than the existing algorithm in the same domain. For data aggregation and classification, our approach achieves 98% data aggregation and classification, based on the data structure.
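The abstract names k-Nearest Neighbors as the reference method and precision and recall as evaluation measures. As a rough illustration only, the sketch below shows a plain single-machine k-NN classifier and a precision/recall computation. It is not the DataSpeak algorithm and does not use Spark; the function names, the toy data, and the choice of Euclidean distance with k = 3 are all assumptions made for this example.

```python
from collections import Counter
import math


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of (feature_vector, label) pairs. Euclidean distance
    is an assumption of this sketch, not something stated in the abstract.
    """
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


def precision_recall(y_true, y_pred, positive):
    """Return (precision, recall) for the class `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical toy data: two well-separated clusters labelled "a" and "b".
TRAIN = [
    ((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((0.9, 1.1), "a"),
    ((5.0, 5.0), "b"), ((5.2, 4.9), "b"), ((4.8, 5.1), "b"),
]
TEST = [((1.1, 1.0), "a"), ((5.1, 5.0), "b"), ((0.8, 0.9), "a")]

y_true = [label for _, label in TEST]
y_pred = [knn_predict(TRAIN, features) for features, _ in TEST]
precision, recall = precision_recall(y_true, y_pred, positive="a")
print(y_pred, precision, recall)
```

Cross validation, which the abstract also mentions, would repeat this evaluation over several train/test splits and average the scores.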

Author information

Corresponding author

Correspondence to Venkatesh Gauri Shankar.

Copyright information

© 2019 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Shankar, V.G., Devi, B., Srivastava, S. (2019). DataSpeak: Data Extraction, Aggregation, and Classification Using Big Data Novel Algorithm. In: Iyer, B., Nalbalwar, S., Pathak, N. (eds) Computing, Communication and Signal Processing. Advances in Intelligent Systems and Computing, vol 810. Springer, Singapore. https://doi.org/10.1007/978-981-13-1513-8_16

  • DOI: https://doi.org/10.1007/978-981-13-1513-8_16

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-13-1512-1

  • Online ISBN: 978-981-13-1513-8

  • eBook Packages: Engineering, Engineering (R0)
