Distributed media indexing based on MPI and MapReduce

Mohamed, Hisham; Marchand-Maillet, Stéphane

doi:10.1007/s11042-012-1283-x

Published: 15 November 2012

Multimedia Tools and Applications, Volume 69, pages 513–537 (2014)

Hisham Mohamed¹ &
Stéphane Marchand-Maillet¹

Abstract

Web-scale digital assets comprise millions or billions of documents. At this scale, sequential algorithms cannot cope with the data, and parallel and distributed computing becomes the solution of choice. MapReduce is a programming model proposed by Google for scalable data processing; it is mainly suited to data-intensive algorithms. In contrast, the Message Passing Interface (MPI) is suited to high-performance algorithms. This paper proposes an adapted structure of the MapReduce programming model using MPI for multimedia indexing. Experiments on various multimedia applications are used to validate the model. They indicate that the proposed model achieves good speedup compared to the original sequential versions, Hadoop, and earlier versions of MapReduce using MPI.
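As a rough illustration of the MapReduce pattern the abstract refers to, the sketch below builds a small inverted index in three phases: map, shuffle, and reduce. It is not the authors' implementation and it does not use MPI. It runs in a single process, and the "workers" only stand in for MPI ranks. All function names (`map_phase`, `partition`, `shuffle`, `reduce_phase`, `build_index`) and the choice of a CRC32 hash for partitioning are assumptions made for this example.

```python
from collections import defaultdict
import zlib


def map_phase(doc_id, text):
    """Emit one (term, doc_id) pair per distinct term in a document."""
    return [(term, doc_id) for term in set(text.lower().split())]


def partition(term, num_workers):
    """Assign a term to a worker deterministically.

    Assumption: a worker stands in for an MPI rank; CRC32 is just one
    possible stable hash.
    """
    return zlib.crc32(term.encode("utf-8")) % num_workers


def shuffle(pairs, num_workers):
    """Route each pair to its worker and group the values by key there."""
    buckets = [defaultdict(list) for _ in range(num_workers)]
    for term, doc_id in pairs:
        buckets[partition(term, num_workers)][term].append(doc_id)
    return buckets


def reduce_phase(bucket):
    """Turn one worker's grouped pairs into sorted posting lists."""
    return {term: sorted(doc_ids) for term, doc_ids in bucket.items()}


def build_index(docs, num_workers=2):
    """Run map, shuffle, and reduce in sequence and merge the results."""
    pairs = [
        pair
        for doc_id, text in docs.items()
        for pair in map_phase(doc_id, text)
    ]
    index = {}
    for bucket in shuffle(pairs, num_workers):
        index.update(reduce_phase(bucket))
    return index


docs = {1: "map reduce index", 2: "reduce MPI"}
index = build_index(docs)
```

In a distributed setting the shuffle step is where data would be exchanged between processes, so its cost depends heavily on the communication layer. The sketch only shows the data flow, not that cost.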

Acknowledgements

This work is jointly supported by the Swiss National Science Foundation (SNSF) via the Swiss National Center of Competence in Research (NCCR) on Interactive Multimodal Information Management (IM2) and the European COST Action on Multilingual and Multifaceted Interactive Information Access (MUMIA) via the Swiss State Secretariat for Education and Research (SER).

Author information

Authors and Affiliations

Viper Group, Computer Vision and Multimedia Laboratory, University of Geneva, 7 Route de Drize, Geneva, Switzerland
Hisham Mohamed & Stéphane Marchand-Maillet

Corresponding author

Correspondence to Hisham Mohamed.

About this article

Cite this article

Mohamed, H., Marchand-Maillet, S. Distributed media indexing based on MPI and MapReduce. Multimed Tools Appl 69, 513–537 (2014). https://doi.org/10.1007/s11042-012-1283-x

Issue Date: March 2014
