Abstract
Web-scale digital assets comprise millions or billions of documents. At this scale, sequential algorithms cannot cope with the data, and parallel and distributed computing becomes the solution of choice. MapReduce is a programming model proposed by Google for scalable data processing; it is mainly applicable to data-intensive algorithms. In contrast, the Message Passing Interface (MPI) is suited to high-performance algorithms. This paper proposes an adapted structure of the MapReduce programming model using MPI for multimedia indexing. Experiments on various multimedia applications are conducted to validate our model. They indicate that the proposed model achieves good speedup compared to the original sequential versions, Hadoop, and earlier versions of MapReduce using MPI.
Acknowledgements
This work is jointly supported by the Swiss National Science Foundation (SNSF) via the Swiss National Center of Competence in Research (NCCR) on Interactive Multimodal Information Management (IM2) and the European COST Action on Multilingual and Multifaceted Interactive Information Access (MUMIA) via the Swiss State Secretariat for Education and Research (SER).
Cite this article
Mohamed, H., Marchand-Maillet, S. Distributed media indexing based on MPI and MapReduce. Multimed Tools Appl 69, 513–537 (2014). https://doi.org/10.1007/s11042-012-1283-x
DOI: https://doi.org/10.1007/s11042-012-1283-x