Cluster Computing, Volume 22, Supplement 5, pp 10467–10479

Detection of duplicated data with minimum overhead and secure data transmission for sensor big data

  • S. Beulah
  • F. Ramesh Dhanaseelan


Big data refers to data sets that are difficult to handle with traditional data processing applications because of their speed, size and variety. Big data are generated by activities, sensing devices, mobile devices, the Internet, RFID readers and so on. One of the key sources of big data is sensor data. A significant amount of sensor data is either redundant or nearly identical, which creates the need for de-duplication. Sensor data also need to be stored for further processing or analysis, which requires end-to-end security. This paper proposes a lightweight method for detecting similar data using pattern analysis and matching. A distributed encoding process is proposed to provide end-to-end security for the generated data with reduced communication overhead. The data received at the processing server are decoded, analyzed and matched against patterns to remove similar and duplicated data. The results show that the proposed system secures data during transmission using lightweight processes. Duplicated and similar data are detected efficiently by an inline process before the data enter storage. Experimental results are provided in support of the proposed approach.
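The abstract does not spell out the pattern generation or matching steps, so the following is only a minimal illustrative sketch of inline de-duplication by pattern matching, not the authors' algorithm. It assumes that a "pattern" is a tuple of readings quantized to a fixed tolerance, that matching is done per sensor, and that a record is a `(sensor_id, values)` pair; the names `make_pattern`, `fingerprint`, `deduplicate` and the `tolerance` parameter are all hypothetical.

```python
import hashlib
import math
from typing import Iterable, List, Sequence, Set, Tuple

# Hypothetical record shape: (sensor_id, list of measured values).
Reading = Tuple[str, Sequence[float]]


def make_pattern(values: Sequence[float], tolerance: float) -> Tuple[int, ...]:
    """Quantize each value into a bucket of width `tolerance`.

    Values in the same bucket yield the same pattern. Close values that
    straddle a bucket boundary yield different patterns, which is a known
    limitation of this simple bucketing assumption.
    """
    return tuple(math.floor(v / tolerance) for v in values)


def fingerprint(sensor_id: str, pattern: Tuple[int, ...]) -> str:
    """Hash the sensor id and pattern into a fixed-size lookup key."""
    raw = f"{sensor_id}:{','.join(map(str, pattern))}".encode("utf-8")
    return hashlib.sha256(raw).hexdigest()


def deduplicate(readings: Iterable[Reading], tolerance: float = 0.5) -> List[Reading]:
    """Keep the first reading for each (sensor, pattern) pair, in arrival order.

    This runs inline: a reading is checked against the set of seen
    fingerprints before it is appended to the output destined for storage.
    """
    seen: Set[str] = set()
    kept: List[Reading] = []
    for sensor_id, values in readings:
        key = fingerprint(sensor_id, make_pattern(values, tolerance))
        if key in seen:
            continue  # duplicate or similar reading: drop before storage
        seen.add(key)
        kept.append((sensor_id, values))
    return kept


if __name__ == "__main__":
    stream = [
        ("s1", [20.1, 55.2]),
        ("s1", [20.3, 55.4]),  # within tolerance of the first reading
        ("s1", [21.7, 55.2]),  # first value falls in a different bucket
        ("s2", [20.1, 55.2]),  # same values, different sensor
        ("s1", [20.1, 55.2]),  # exact duplicate of the first reading
    ]
    for record in deduplicate(stream):
        print(record)
```

Storing only fixed-size fingerprints keeps the memory used by the lookup set independent of the reading length, which is one common way to keep an inline check lightweight. A real deployment would also need a policy for expiring old fingerprints.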


Secure transmission; De-duplication; Cluster formation; Pattern generation; Pattern matching



Copyright information

© Springer Science+Business Media, LLC 2017

Authors and Affiliations

  1. Department of Computer Applications, Noorul Islam Centre for Higher Education, Kanyakumari, India
  2. Department of Computer Applications, St. Xaviers Catholic College of Engineering, Nagercoil, Kanyakumari, India
