
Distributed media indexing based on MPI and MapReduce

Multimedia Tools and Applications

Abstract

Web-scale digital assets comprise millions or even billions of documents. At this scale, sequential algorithms cannot cope with the data, and parallel and distributed computing has become the solution of choice. MapReduce is a programming model proposed by Google for scalable data processing and is mainly suited to data-intensive algorithms. The message passing interface (MPI), in contrast, is suited to high-performance computing algorithms. This paper proposes an adapted structure of the MapReduce programming model, implemented with MPI, for multimedia indexing. We validate the model experimentally on several multimedia applications. The experiments indicate that the proposed model achieves good speedup compared with the original sequential versions, with Hadoop, and with earlier MapReduce implementations based on MPI.
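The abstract describes MapReduce only at a high level. As an illustration, the following is a minimal single-process Python sketch of the map, shuffle, and reduce phases applied to building an inverted index, the kind of structure a media indexer produces. It is not the authors' MPI implementation: the toy text corpus, the round-robin partitioning, the CRC32-based key routing, and all function names (`map_phase`, `shuffle`, `reduce_phase`, `build_index`) are hypothetical choices made for illustration. In an MPI-based version, each rank would typically run the map phase on its own partition, and the shuffle would become an all-to-all exchange between ranks (for example with `MPI_Alltoallv`).

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical toy corpus type: document id -> text. In a multimedia setting the
# "terms" could instead be quantized visual words extracted from images.
Corpus = Dict[int, str]
Pair = Tuple[str, int]


def map_phase(partition: Corpus) -> List[Pair]:
    """Map: emit one (term, doc_id) pair per distinct term in each document."""
    pairs: List[Pair] = []
    for doc_id, text in partition.items():
        for term in set(text.lower().split()):
            pairs.append((term, doc_id))
    return pairs


def shuffle(mapped: List[List[Pair]], num_reducers: int) -> List[List[Pair]]:
    """Shuffle: route every pair to the reducer that owns its term.

    A stable hash (CRC32) decides ownership, so all pairs for one term reach
    the same reducer. Here this is an in-memory regrouping; with MPI it would
    be an all-to-all message exchange (an assumption, not the paper's design).
    """
    buckets: List[List[Pair]] = [[] for _ in range(num_reducers)]
    for pairs in mapped:
        for term, doc_id in pairs:
            owner = zlib.crc32(term.encode("utf-8")) % num_reducers
            buckets[owner].append((term, doc_id))
    return buckets


def reduce_phase(bucket: List[Pair]) -> Dict[str, List[int]]:
    """Reduce: merge all pairs for a term into a sorted posting list."""
    postings: Dict[str, set] = defaultdict(set)
    for term, doc_id in bucket:
        postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def build_index(corpus: Corpus, num_workers: int) -> Dict[str, List[int]]:
    """Simulate num_workers mappers/reducers and return the merged inverted index."""
    doc_ids = sorted(corpus)
    # Round-robin split of the corpus into one partition per simulated worker.
    partitions = [
        {d: corpus[d] for d in doc_ids[w::num_workers]} for w in range(num_workers)
    ]
    mapped = [map_phase(p) for p in partitions]
    buckets = shuffle(mapped, num_workers)
    index: Dict[str, List[int]] = {}
    for bucket in buckets:
        # Each reducer owns a disjoint set of terms, so a plain merge is safe.
        index.update(reduce_phase(bucket))
    return index


if __name__ == "__main__":
    docs = {0: "red car", 1: "blue car", 2: "red sky"}
    idx = build_index(docs, num_workers=2)
    print(idx["car"], idx["red"])  # [0, 1] [0, 2]
```

CRC32 is used for routing instead of Python's built-in `hash()` because string hashing is salted per process, and in a real distributed run separate processes must agree on which reducer owns each term. The resulting index is the same for any number of workers; only the distribution of work changes.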

Figs. 1–17

Acknowledgements

This work is jointly supported by the Swiss National Science Foundation (SNSF) via the Swiss National Center of Competence in Research (NCCR) on Interactive Multimodal Information Management (IM2) and the European COST Action on Multilingual and Multifaceted Interactive Information Access (MUMIA) via the Swiss State Secretariat for Education and Research (SER).

Author information


Correspondence to Hisham Mohamed.

About this article

Cite this article

Mohamed, H., Marchand-Maillet, S. Distributed media indexing based on MPI and MapReduce. Multimed Tools Appl 69, 513–537 (2014). https://doi.org/10.1007/s11042-012-1283-x


  • DOI: https://doi.org/10.1007/s11042-012-1283-x
