Computer vision technology plays a key role in diverse multimedia applications, including surveillance, environmental monitoring, and smart spaces. This has led to a massive research effort devoted to the development of computer vision algorithms for managing, processing, analyzing, and interpreting the multimedia data collected. The aim of this special issue is to consolidate recent research achievements that address the broad challenges in computer vision technologies, with a specific focus on multimedia applications.

This special issue received more than 30 high-quality submissions from over 10 countries. All submitted papers were peer-reviewed by at least three independent reviewers. In the end, only 10 papers were included in this special issue.

The first two papers addressed mobile multimedia. Xu et al. studied the problem of human position recommendation in mobile photography and proposed a learning-based method to summarize photographic knowledge from massive collections of social images. Lopez et al. studied how to build mosaic images of printed documents and natural scenes from low-resolution video frames.

The third and fourth papers tackled motion analysis. Hsu et al. presented a multimedia presentation system using a 3D gesture interface in museums. Li and Sun proposed a generative approach, in the framework of evolutionary computation, for tracking human pose in a high-dimensional pose state space.

The fifth, sixth, and seventh papers studied image enhancement. Lagodzinski and Smolka proposed an image colorization scheme that takes advantage of a modified morphological distance transform to propagate color. Xu et al. provided an extensive survey of image contrast measures. Hua et al. proposed an image denoising method based on two-dimensional finite impulse response filtering.

The last three papers discussed applied computer vision techniques. Li et al. proposed a comic page segmentation approach that aims to automatically decompose scanned comic images into storyboards. Hu et al. proposed a defect detection approach for steel images. Yang et al. proposed a fast localization-verification scheme for video text detection and recognition, in which an edge-based multi-scale text detector first identifies potential text candidates with a high recall rate, followed by an image entropy-based filter for verification and a machine learning approach for recognition.

We close by thanking the authors for their submissions, the reviewers for their constructive comments, and Prof. Borko Furht, Editor-in-Chief of MTAP, and all the publishing staff for guiding us through the whole process.