
An Efficient Implementation of the Algorithm by Lukáš et al. on Hadoop

  • Conference paper
Green, Pervasive, and Cloud Computing (GPC 2017)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10232)


Abstract

Apache Hadoop makes it possible to code full-fledged distributed applications with very little programming effort. However, the resulting implementations may suffer from performance bottlenecks that nullify the potential of a distributed system. As this paper shows, an engineering methodology based on smart optimizations driven by careful profiling can lead to much better experimental performance.

In particular, we take as a case study the algorithm by Lukáš et al. for the Source Camera Identification problem (i.e., recognizing the camera used to acquire a given digital image). A first implementation was obtained with little effort using the default facilities available in Hadoop. Deep profiling allowed us to pinpoint some serious performance issues that affect the initial steps of the algorithm and stem from poor use of the cluster resources. We then developed optimizations and measured their effects through accurate experimentation. The improved implementation makes better use of both the underlying cluster resources and the Hadoop framework, resulting in much better performance than the original naive implementation.
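The core of the algorithm by Lukáš et al. [16] can be summarized as follows. The noise residual of an image I is W = I - F(I), where F is a denoising filter. A camera's reference pattern is the pixel-wise average of the residuals of images taken with that camera. A query image is attributed to the camera whose reference pattern has the highest normalized correlation with the query's residual. The Python sketch below illustrates this pipeline in MapReduce terms: map computes one residual per image, the shuffle groups residuals by camera, and reduce averages them. It is a minimal, single-process sketch under stated assumptions and does not reproduce the authors' Hadoop implementation or its optimizations. A 3x3 mean filter stands in for the wavelet-based denoising filter used by Lukáš et al. Images are small grayscale lists of lists. The map/shuffle/reduce phases are emulated in memory. All function names and the synthetic data generator are hypothetical. The split into map and reduce is one plausible decomposition, not necessarily the one used in the paper.

```python
import math
import random
from collections import defaultdict


def denoise(img):
    """3x3 mean filter; at the borders only in-bounds neighbours are averaged.

    Assumption: a simple stand-in for the wavelet-based denoising filter of Lukáš et al.
    """
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total, count = 0.0, 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        total += img[yy][xx]
                        count += 1
            out[y][x] = total / count
    return out


def noise_residual(img):
    """Noise residual W = I - F(I), where F is the denoising filter."""
    smooth = denoise(img)
    return [[p - s for p, s in zip(row, srow)] for row, srow in zip(img, smooth)]


def map_phase(records):
    """Map: for each (camera_id, image) record, emit (camera_id, residual)."""
    for camera_id, img in records:
        yield camera_id, noise_residual(img)


def reduce_phase(pairs):
    """Shuffle (group by camera) + reduce (pixel-wise mean = reference pattern)."""
    groups = defaultdict(list)
    for camera_id, residual in pairs:
        groups[camera_id].append(residual)
    patterns = {}
    for camera_id, residuals in groups.items():
        n = len(residuals)
        h, w = len(residuals[0]), len(residuals[0][0])
        patterns[camera_id] = [
            [sum(r[y][x] for r in residuals) / n for x in range(w)] for y in range(h)
        ]
    return patterns


def correlation(a, b):
    """Normalized (Pearson) correlation between two equally sized 2D arrays."""
    xs = [v for row in a for v in row]
    ys = [v for row in b for v in row]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))
    return num / den if den else 0.0


def identify(img, patterns):
    """Return the camera whose reference pattern best matches img, plus all scores."""
    residual = noise_residual(img)
    scores = {cid: correlation(residual, p) for cid, p in patterns.items()}
    return max(scores, key=scores.get), scores


def synthetic_image(rng, pattern, noise=1.0):
    """Hypothetical test data: flat scene + fixed sensor pattern + random noise."""
    level = rng.uniform(50, 200)
    return [[level + k + rng.gauss(0, noise) for k in row] for row in pattern]


if __name__ == "__main__":
    rng = random.Random(0)
    size = 16
    sensor = {cid: [[rng.gauss(0, 3) for _ in range(size)] for _ in range(size)]
              for cid in ("A", "B")}
    training = [(cid, synthetic_image(rng, sensor[cid])) for cid in sensor for _ in range(8)]
    patterns = reduce_phase(map_phase(training))
    best, scores = identify(synthetic_image(rng, sensor["A"]), patterns)
    print(best, scores)
```

In this toy setting, each synthetic image has a random constant brightness. Because a constant offset is removed exactly by the residual computation, only the per-camera pattern and the random noise survive. Real photographs contain scene content that a mean filter removes far less cleanly than a wavelet denoiser does.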


Notes

  1. A copy of the source code of our implementation is available upon request.

References

  1. Bayram, S., Sencar, H.T., Memon, N., Avcibas, I.: Source camera identification based on CFA interpolation. In: IEEE International Conference on Image Processing (ICIP), vol. 3, pp. 69–72. IEEE (2005)

  2. Cattaneo, G., Ferraro Petrillo, U., Giancarlo, R., Roscigno, G.: An effective extension of the applicability of alignment-free biological sequence comparison algorithms with Hadoop. J. Supercomput., 1–17 (2016). http://dx.doi.org/10.1007/s11227-016-1835-3

  3. Cattaneo, G., Ferraro Petrillo, U., Roscigno, G., Fusco, C.: A PNU-based technique to detect forged regions in digital images. In: Battiato, S., Blanc-Talon, J., Gallo, G., Philips, W., Popescu, D., Scheunders, P. (eds.) ACIVS 2015. LNCS, vol. 9386, pp. 486–498. Springer, Cham (2015). doi:10.1007/978-3-319-25903-1_42

  4. Cattaneo, G., Roscigno, G.: A possible pitfall in the experimental analysis of tampering detection algorithms. In: 17th International Conference on Network-Based Information Systems (NBiS), pp. 279–286, September 2014

  5. Cattaneo, G., Roscigno, G., Bruno, A.: Using PNU-based techniques to detect alien frames in videos. In: Blanc-Talon, J., Distante, C., Philips, W., Popescu, D., Scheunders, P. (eds.) ACIVS 2016. LNCS, vol. 10016, pp. 735–746. Springer, Cham (2016). doi:10.1007/978-3-319-48680-2_64

  6. Cattaneo, G., Roscigno, G., Ferraro Petrillo, U.: Experimental evaluation of an algorithm for the detection of tampered JPEG images. In: Linawati, M.M.S., Neuhold, E.J., Tjoa, A.M., You, I. (eds.) CT-EurAsia 2014. LNCS, vol. 8407, pp. 643–652. Springer, Heidelberg (2014). doi:10.1007/978-3-642-55032-4_66

  7. Cattaneo, G., Roscigno, G., Ferraro Petrillo, U.: A scalable approach to source camera identification over Hadoop. In: IEEE 28th International Conference on Advanced Information Networking and Applications (AINA), pp. 366–373. IEEE (2014)

  8. Choi, J., Choi, C., Ko, B., Choi, D., Kim, P.: Detecting web based DDoS attack using MapReduce operations in cloud computing environment. J. Internet Serv. Inf. Secur. (JISIS) 3(3/4), 28–37 (2013)

  9. Dean, J., Ghemawat, S.: MapReduce: simplified data processing on large clusters. Commun. ACM 51(1), 107–113 (2008)

  10. Ferraro Petrillo, U., Roscigno, G., Cattaneo, G., Giancarlo, R.: FASTdoop: a versatile and efficient library for the input of FASTA and FASTQ files for MapReduce Hadoop bioinformatics applications. Bioinformatics (2017). https://dx.doi.org/10.1093/bioinformatics/btx010

  11. Fridrich, J., Lukáš, J., Goljan, M.: Detecting digital image forgeries using sensor pattern noise. In: SPIE, Electronic Imaging, Security, Steganography, and Watermarking of Multimedia Contents VIII, vol. 6072, pp. 1–11 (2006)

  12. Gloe, T.: Feature-based forensic camera model identification. In: Shi, Y.Q., Katzenbeisser, S. (eds.) Transactions on Data Hiding and Multimedia Security VIII. LNCS, vol. 7228, pp. 42–62. Springer, Heidelberg (2012). doi:10.1007/978-3-642-31971-6_3

  13. Goljan, M., Fridrich, J., Filler, T.: Large scale test of sensor fingerprint camera identification. In: IS&T/SPIE, Electronic Imaging, Security and Forensics of Multimedia Contents XI, vol. 7254, pp. 1–12. International Society for Optics and Photonics (2009)

  14. Goljan, M., Fridrich, J., Filler, T.: Managing a large database of camera fingerprints. In: SPIE Conference on Media Forensics and Security, vol. 7541, pp. 1–12. International Society for Optics and Photonics (2010)

  15. Golpayegani, N., Halem, M.: Cloud computing for satellite data processing on high end compute clusters. In: IEEE International Conference on Cloud Computing, pp. 88–92. IEEE (2009)

  16. Lukáš, J., Fridrich, J., Goljan, M.: Digital camera identification from sensor pattern noise. IEEE Trans. Inf. Forensics Secur. 1, 205–214 (2006)

  17. McKenna, A., Hanna, M., Banks, E., Sivachenko, A., Cibulskis, K., Kernytsky, A., Garimella, K., Altshuler, D., Gabriel, S., Daly, M., et al.: The genome analysis toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20(9), 1297–1303 (2010)

  18. Precision Optical Imaging: ISO Noise Chart 15739 (2011). http://www.precisionopticalimaging.com/products/products.asp?type=15739

  19. Shvachko, K., Kuang, H., Radia, S., Chansler, R.: The Hadoop distributed file system. In: IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST), pp. 1–10. IEEE (2010)

  20. The Apache Software Foundation: Apache Hadoop (2016). http://hadoop.apache.org/

  21. White, T.: The small files problem. Cloudera (2009). http://www.cloudera.com/blog/2009/02/the-small-files-problem/

Author information

Corresponding author

Correspondence to Gianluca Roscigno.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Cattaneo, G., Ferraro Petrillo, U., Nappi, M., Narducci, F., Roscigno, G. (2017). An Efficient Implementation of the Algorithm by Lukáš et al. on Hadoop. In: Au, M., Castiglione, A., Choo, KK., Palmieri, F., Li, KC. (eds) Green, Pervasive, and Cloud Computing. GPC 2017. Lecture Notes in Computer Science, vol 10232. Springer, Cham. https://doi.org/10.1007/978-3-319-57186-7_35

  • DOI: https://doi.org/10.1007/978-3-319-57186-7_35

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-57185-0

  • Online ISBN: 978-3-319-57186-7

  • eBook Packages: Computer Science, Computer Science (R0)
