Quick Browsing of Shared Experience Videos Based on Conversational Field Detection

  • Conference paper
Mobile Computing, Applications, and Services (MobiCASE 2018)

Abstract

We propose a system that aids the browsing of shared experience data comprising multiple first-person-view videos. With this system, users can avoid the tedious task of searching through lengthy videos: the system displays situational cues on the video seek bar and visualizes node graphs showing which members participate in each scene and their approximate locations. Users can therefore search for and browse events guided by cues indicating participant names and locations. To capture the dynamics of groups in crowded areas, we detect conversational fields based on auditory similarity. We conducted an experiment to evaluate how well our system reduces the time needed to find specified scenes in lifelog videos. The results suggest that our system aids the browsing of videos containing one's own experiences, but we could not confirm that it aids the browsing of unfamiliar data.
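The core technique named in the abstract, detecting conversational fields from auditory similarity, can be illustrated with a short sketch. The Python below is a hypothetical illustration, not the authors' implementation: it assumes each participant's device records ambient audio to a file, compares time-aligned MFCC frames between recordings with cosine similarity (using the librosa library), and merges recorders whose similarity exceeds an arbitrary threshold into one conversational field via union-find. All function names, the sampling rate, and the 0.6 threshold are illustrative assumptions.

```python
# Hypothetical sketch of conversational-field detection from auditory
# similarity. Not the paper's exact algorithm; names, sampling rate,
# and threshold are illustrative assumptions.
import numpy as np
import librosa


def mfcc_features(path, sr=16000, n_mfcc=13):
    """Load a recording and return its frame-wise MFCC matrix (n_mfcc x frames)."""
    y, _ = librosa.load(path, sr=sr, mono=True)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)


def audio_similarity(path_a, path_b):
    """Mean cosine similarity between time-aligned MFCC frames of two recordings."""
    a, b = mfcc_features(path_a), mfcc_features(path_b)
    n = min(a.shape[1], b.shape[1])  # truncate to the common duration
    a, b = a[:, :n], b[:, :n]
    dot = np.sum(a * b, axis=0)
    norm = np.linalg.norm(a, axis=0) * np.linalg.norm(b, axis=0) + 1e-9
    return float(np.mean(dot / norm))


def conversational_fields(paths, threshold=0.6):
    """Group recorders whose pairwise similarity exceeds the threshold,
    using union-find so similarity links form transitive groups."""
    parent = list(range(len(paths)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(paths)):
        for j in range(i + 1, len(paths)):
            if audio_similarity(paths[i], paths[j]) > threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i, p in enumerate(paths):
        groups.setdefault(find(i), []).append(p)
    return list(groups.values())
```

In an actual deployment one would presumably compare similarity over a sliding window rather than over whole recordings, so that conversational groups can form and dissolve over time as people move between conversations.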

Author information

Correspondence to Kai Toyama.

Copyright information

© 2018 ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering

About this paper

Cite this paper

Toyama, K., Sumi, Y. (2018). Quick Browsing of Shared Experience Videos Based on Conversational Field Detection. In: Murao, K., Ohmura, R., Inoue, S., Gotoh, Y. (eds) Mobile Computing, Applications, and Services. MobiCASE 2018. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 240. Springer, Cham. https://doi.org/10.1007/978-3-319-90740-6_3

  • DOI: https://doi.org/10.1007/978-3-319-90740-6_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-90739-0

  • Online ISBN: 978-3-319-90740-6

  • eBook Packages: Computer Science, Computer Science (R0)
