Abstract
Large numbers of computing devices now generate a huge amount of data. Because this data is so varied, processing and analyzing it is a major challenge in big data analytics. Data consistency and scalability are also major problems for large datasets. Our research proposes “DataSpeak”, a novel approach to data extraction, aggregation, and classification. We used k-Nearest Neighbors with Spark as the reference and derived a modified algorithm from it. We evaluated our approach on large datasets from domains including travel and tourism, placement papers, movies and historical data, and smartphones. To assess the capability and accuracy of our algorithm, we used cross validation, precision, recall, and a comparative statistical analysis against the existing algorithm. Our approach provides fast data access and efficient data extraction in less time than the existing algorithm in the same domain. For data aggregation and classification, our approach attains 98% based on the data structure.
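As a point of reference for the baseline and metrics named in the abstract, the following is a minimal sketch of plain k-Nearest Neighbors classification with precision and recall. It is not the DataSpeak algorithm and does not use Spark; the function names, the toy feature vectors, and the two domain labels are all hypothetical and chosen only for illustration.

```python
from collections import Counter
import math


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of (feature_vector, label) pairs; distance is Euclidean.
    """
    neighbors = sorted(train, key=lambda pair: math.dist(pair[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


def precision_recall(y_true, y_pred, positive):
    """Precision and recall with `positive` treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical toy data: two numeric features per record, two made-up labels.
train = [
    ((1.0, 1.0), "tourism"), ((1.2, 0.8), "tourism"), ((0.9, 1.1), "tourism"),
    ((5.0, 5.0), "movies"), ((5.2, 4.9), "movies"), ((4.8, 5.1), "movies"),
]
test_points = [(1.1, 0.9), (5.1, 5.0), (0.8, 1.2), (4.9, 4.8)]
test_labels = ["tourism", "movies", "tourism", "movies"]

predictions = [knn_predict(train, x, k=3) for x in test_points]
precision, recall = precision_recall(test_labels, predictions, positive="tourism")
print(predictions, precision, recall)
```

The sketch sorts every training point for each query, which is fine for a toy example but scales poorly; distributing that search over large datasets is the kind of cost a Spark-based approach is meant to address.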
Copyright information
© 2019 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Shankar, V.G., Devi, B., Srivastava, S. (2019). DataSpeak: Data Extraction, Aggregation, and Classification Using Big Data Novel Algorithm. In: Iyer, B., Nalbalwar, S., Pathak, N. (eds) Computing, Communication and Signal Processing. Advances in Intelligent Systems and Computing, vol 810. Springer, Singapore. https://doi.org/10.1007/978-981-13-1513-8_16
DOI: https://doi.org/10.1007/978-981-13-1513-8_16
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-1512-1
Online ISBN: 978-981-13-1513-8
eBook Packages: Engineering, Engineering (R0)