A secure technique for unstructured big data using clustering method

  • Md Tabrez Nafis
  • Ranjit Biswas
Original Research


Analyzing multi-dimensional data quickly is a basic and important requirement of any clustering mechanism. At the same time, the clustering mechanism should provide a security structure that protects against data loss by preventing various security attacks and transmission errors. A data compression system is also needed to control the data size, the security overheads, and the data loss caused by data and time overheads during the clustering of unstructured and uncertain big data. Among existing studies, none suggests an integrated technique that addresses these issues together. Hence, this research proposes a new compact clustering technique that prevents data loss by combining SDES encryption to protect against security attacks, a new error control scheme to handle any number of transmission errors, and Huffman compression to control the data size and the extra security overheads. In addition, the faster execution of the proposed integrated technique reduces the time overhead. Experimental results show that it offers higher data integrity, producing a lower percentage of information loss and a higher SNR and compression ratio. Furthermore, its higher throughput and low cyclomatic complexity indicate its time efficiency.
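The abstract names SDES as the encryption component without specifying it further. Assuming SDES refers to Simplified DES (the educational two-round Feistel cipher with an 8-bit block and a 10-bit key), a minimal sketch of its key schedule, encryption and decryption could look like this. Bits are handled as '0'/'1' strings for readability; the function names (`subkeys`, `encrypt`, `decrypt`) are hypothetical and are not taken from the paper.

```python
# Assumption: "SDES" = Simplified DES (8-bit block, 10-bit key, 2 rounds).
# Permutation tables use 1-indexed bit positions.
P10 = (3, 5, 2, 7, 4, 10, 1, 9, 8, 6)
P8 = (6, 3, 7, 4, 8, 5, 10, 9)
IP = (2, 6, 3, 1, 4, 8, 5, 7)
IP_INV = (4, 1, 3, 5, 7, 2, 8, 6)
EP = (4, 1, 2, 3, 2, 3, 4, 1)   # expands 4 bits to 8
P4 = (2, 4, 3, 1)
S0 = ((1, 0, 3, 2), (3, 2, 1, 0), (0, 2, 1, 3), (3, 1, 3, 2))
S1 = ((0, 1, 2, 3), (2, 0, 1, 3), (3, 0, 1, 0), (2, 1, 0, 3))


def permute(bits: str, table) -> str:
    return "".join(bits[i - 1] for i in table)


def left_shift(bits: str, n: int) -> str:
    return bits[n:] + bits[:n]  # circular left shift


def xor(a: str, b: str) -> str:
    return "".join("1" if x != y else "0" for x, y in zip(a, b))


def subkeys(key: str) -> tuple:
    """Derive the 8-bit round keys (K1, K2) from a 10-bit key string."""
    p = permute(key, P10)
    left, right = left_shift(p[:5], 1), left_shift(p[5:], 1)
    k1 = permute(left + right, P8)
    left, right = left_shift(left, 2), left_shift(right, 2)
    k2 = permute(left + right, P8)
    return k1, k2


def sbox(bits: str, box) -> str:
    row = int(bits[0] + bits[3], 2)  # outer bits select the row
    col = int(bits[1] + bits[2], 2)  # inner bits select the column
    return format(box[row][col], "02b")


def f_k(bits: str, subkey: str) -> str:
    """Round function: XOR F(right, subkey) into the left half, keep the right."""
    left, right = bits[:4], bits[4:]
    t = xor(permute(right, EP), subkey)
    f = permute(sbox(t[:4], S0) + sbox(t[4:], S1), P4)
    return xor(left, f) + right


def _crypt(block: str, first: str, second: str) -> str:
    bits = f_k(permute(block, IP), first)
    bits = bits[4:] + bits[:4]  # SW: swap the two halves
    return permute(f_k(bits, second), IP_INV)


def encrypt(block: str, key: str) -> str:
    k1, k2 = subkeys(key)
    return _crypt(block, k1, k2)


def decrypt(block: str, key: str) -> str:
    k1, k2 = subkeys(key)
    return _crypt(block, k2, k1)  # same structure, subkeys in reverse order
```

For example, with key `1010000010` the plaintext `10010111` encrypts to `00111000`, and `decrypt` recovers it because each round is its own inverse once the subkey order is reversed. With only a 10-bit key, Simplified DES can be exhausted in 1024 trials, so the sketch shows the cipher's structure rather than a production-strength choice.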
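Huffman coding, which the abstract uses to control data size, assigns shorter prefix-free bit codes to more frequent symbols. The sketch below is a generic byte-level Huffman coder, not the authors' implementation: it builds the code table with a min-heap, encodes and decodes, and computes a compression ratio defined here (as an assumption) as original bits divided by encoded bits. All helper names are hypothetical.

```python
import heapq
from collections import Counter


def huffman_codes(data: bytes) -> dict:
    """Map each byte value in `data` to a prefix-free bit string."""
    freq = Counter(data)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct symbol still needs a 1-bit code.
        return {next(iter(freq)): "0"}
    # Heap entries: (frequency, unique tie-breaker, {symbol: partial code}).
    heap = [(f, i, {sym: ""}) for i, (sym, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)   # two least frequent subtrees
        f2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (f1 + f2, counter, merged))
        counter += 1
    return heap[0][2]


def encode(data: bytes, codes: dict) -> str:
    return "".join(codes[b] for b in data)


def decode(bits: str, codes: dict) -> bytes:
    reverse = {c: s for s, c in codes.items()}
    out, buf = bytearray(), ""
    for bit in bits:
        buf += bit
        if buf in reverse:  # prefix-free: the first match is the symbol
            out.append(reverse[buf])
            buf = ""
    return bytes(out)


def compression_ratio(data: bytes, bits: str) -> float:
    """Assumed definition: original size (8 bits/byte) over encoded size."""
    if not bits:
        raise ValueError("encoded data is empty")
    return (8 * len(data)) / len(bits)
```

For `b"aaaabbbccd"` the 80-bit input encodes to 19 bits, a ratio of about 4.2. The code table is not counted in that ratio; a real system would have to transmit it or agree on it in advance.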
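The abstract reports SNR and a percentage of information loss without defining them at this point. Assuming the conventional definition SNR = 10 log10(sum of x^2 / sum of (x - x')^2) in decibels, and treating information loss as the percentage of symbols that differ after a round trip (a hypothetical proxy, not necessarily the paper's definition), both measures can be sketched as follows.

```python
import math
from typing import Sequence


def snr_db(original: Sequence[float], reconstructed: Sequence[float]) -> float:
    """Signal-to-noise ratio in dB: 10*log10(signal energy / error energy)."""
    if len(original) != len(reconstructed):
        raise ValueError("sequences must have equal length")
    signal = sum(x * x for x in original)
    noise = sum((x - y) ** 2 for x, y in zip(original, reconstructed))
    if noise == 0:
        return math.inf   # perfect reconstruction
    if signal == 0:
        return -math.inf  # only error energy, no signal
    return 10 * math.log10(signal / noise)


def loss_percentage(original: bytes, received: bytes) -> float:
    """Hypothetical information-loss proxy: % of positions that differ."""
    if len(original) != len(received):
        raise ValueError("sequences must have equal length")
    if not original:
        return 0.0
    mismatches = sum(a != b for a, b in zip(original, received))
    return 100.0 * mismatches / len(original)
```

For instance, `snr_db([3, 4], [3, 3])` is about 13.98 dB, and `loss_percentage(b"abcd", b"abxd")` is 25.0.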


Keywords: Big data · Data loss · Security attacks · Transmission errors · Unstructured



Copyright information

© Bharati Vidyapeeth's Institute of Computer Applications and Management 2019

Authors and Affiliations

  1. Department of Computer Science and Engineering, School of Engineering Sciences and Technology, Jamia Hamdard, New Delhi, India
