Abstract
Concept detection from multimedia data has become an emerging research topic because it applies to many problems in both academia and industry. However, it faces unavoidable challenges, including the high volume and variety of multimedia data and their skewed distribution. To cope with these challenges, this paper proposes a novel framework that integrates two correlation-based methods, Feature-Correlation Maximum Spanning Tree (FC-MST) and Negative-based Sampling (NS), with a well-known deep learning algorithm, the Convolutional Neural Network (CNN). First, FC-MST selects the most relevant low-level features extracted from multiple modalities and determines the dimension of the CNN's input layer. Second, NS is adopted to improve batch sampling in the CNN. Using the NUS-WIDE image data set as a web-based application, the experimental results demonstrate the effectiveness of the proposed framework for semantic concept detection in comparison with other well-known classifiers.
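The abstract says FC-MST both selects features and fixes the CNN input dimension. The sketch that follows illustrates only the general idea suggested by the method's name; it is not the authors' FC-MST algorithm. It assumes (1) absolute Pearson correlation as the edge weight between every pair of nodes (each feature plus the concept label), (2) Kruskal's algorithm to build the maximum spanning tree, and (3) a hypothetical selection rule that keeps only the features the tree attaches directly to the label node. The function names `pearson`, `maximum_spanning_tree`, and `select_features` and the toy data are illustrative assumptions.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def maximum_spanning_tree(n_nodes, weights):
    """Kruskal's algorithm with edges taken in descending weight order.

    weights maps (i, j) node pairs to edge weights; returns a list of (i, j, w).
    """
    parent = list(range(n_nodes))

    def find(u):
        # Union-find root lookup with path halving.
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    tree = []
    for (i, j), w in sorted(weights.items(), key=lambda kv: kv[1], reverse=True):
        ri, rj = find(i), find(j)
        if ri != rj:  # skip edges that would close a cycle
            parent[ri] = rj
            tree.append((i, j, w))
    return tree


def select_features(columns, labels):
    """HYPOTHETICAL FC-MST-style selection (not the published criterion).

    Nodes are the feature columns plus the label; edge weight is |Pearson r|.
    Returns the indices of features that the maximum spanning tree attaches
    directly to the label node.
    """
    nodes = list(columns) + [labels]
    label_node = len(nodes) - 1
    weights = {(i, j): abs(pearson(nodes[i], nodes[j]))
               for i, j in combinations(range(len(nodes)), 2)}
    tree = maximum_spanning_tree(len(nodes), weights)
    return sorted(i if j == label_node else j
                  for i, j, _ in tree if label_node in (i, j))


# Toy data (made up for illustration): f1 is nearly redundant with f0.
f0 = [3, 3, 1, 1, -1, -1, -3, -3]
f1 = [3.5, 2.5, 1.5, 0.5, -0.5, -1.5, -2.5, -3.5]
f2 = [3, 3, -1, -1, -3, -3, 1, 1]
y = [1, 1, 1, 1, 0, 0, 0, 0]

selected = select_features([f0, f1, f2], y)
print(selected)       # [0, 2]
print(len(selected))  # 2, the input width under this sketch's assumptions
```

In the toy data, `f1` correlates more strongly with `f0` (about 0.98) than with the label (about 0.87), so the tree hangs it off `f0` and the hypothetical rule drops it as redundant, while the weaker but complementary `f2` stays attached to the label. How the actual framework maps the selected features onto the CNN input layer is not specified in the abstract.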
Acknowledgment
This research was supported in part by the U.S. Department of Homeland Security under grant Award Number 2010-ST-062-000039, the U.S. Department of Homeland Security’s VACCINE Center under Award Number 2009-ST-061-CI0001, NSF HRD-0833093, CNS-1126619, and CNS-1461926.
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Ha, HY., Yang, Y., Pouyanfar, S., Tian, H., Chen, SC. (2015). Correlation-Based Deep Learning for Multimedia Semantic Concept Detection. In: Wang, J., et al. Web Information Systems Engineering – WISE 2015. WISE 2015. Lecture Notes in Computer Science, vol 9419. Springer, Cham. https://doi.org/10.1007/978-3-319-26187-4_43
DOI: https://doi.org/10.1007/978-3-319-26187-4_43
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-26186-7
Online ISBN: 978-3-319-26187-4
eBook Packages: Computer Science, Computer Science (R0)