Abstract
This paper introduces and describes a manually generated synchronization ground truth, accurate to the level of the audio sample, for the Jiku Mobile Video Dataset, a collection of hundreds of videos recorded by mobile users at events featuring drama, dancing, and singing performances. The ground truth aims to encourage researchers to evaluate their audio, video, or multimodal synchronization methods on a publicly available dataset, to facilitate easy benchmarking, and to ease the development of mobile video processing methods such as audio and video quality enhancement, analytics, and summary generation that depend on an accurately synchronized dataset.
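The abstract does not specify how the ground truth is distributed, but sample-accurate synchronization generally means each recording carries an offset expressed in audio samples relative to a common timeline. As an illustrative sketch (the function names, the sign convention, and the 44.1 kHz rate are assumptions, not the paper's actual format), such offsets could be applied like this:

```python
def samples_to_seconds(offset_samples, sample_rate=44100):
    """Convert a sample-accurate sync offset to seconds,
    assuming a 44.1 kHz audio sample rate."""
    return offset_samples / sample_rate

def align(signal_a, signal_b, offset_samples):
    """Trim the leading samples of whichever signal starts earlier,
    so both sequences begin at the same moment of the event.
    Convention assumed here: offset_samples > 0 means signal_b
    started recording offset_samples later than signal_a."""
    if offset_samples >= 0:
        return signal_a[offset_samples:], signal_b
    return signal_a, signal_b[-offset_samples:]
```

For example, with an offset of 5 samples, `align(list(range(10)), list(range(5, 10)), 5)` trims the first five samples of the earlier recording so both sequences start together. A sample-level offset of 44 100 corresponds to exactly one second at the assumed rate.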
© 2015 Springer International Publishing Switzerland
Cite this paper
Guggenberger, M., Lux, M., Böszörmenyi, L. (2015). A Synchronization Ground Truth for the Jiku Mobile Video Dataset. In: He, X., Luo, S., Tao, D., Xu, C., Yang, J., Hasan, M.A. (eds) MultiMedia Modeling. MMM 2015. Lecture Notes in Computer Science, vol 8936. Springer, Cham. https://doi.org/10.1007/978-3-319-14442-9_8
Print ISBN: 978-3-319-14441-2
Online ISBN: 978-3-319-14442-9