Abstract
We propose a system that supports browsing of shared experience data comprising multiple first-person-view videos, sparing users the tedious task of searching through lengthy recordings. The system aids browsing by displaying situational cues on the video seek-bar and by visualizing node graphs that show which members participate in each scene and their approximate locations, so users can search and browse events guided by participant names and positions. To capture group dynamics in crowded areas, we detect conversational fields based on auditory similarity. We conducted an experiment to evaluate whether the system reduces the time needed to find specified scenes in lifelog videos. The results suggest that the system aids browsing of videos containing one's own experiences, but we could not confirm a comparable benefit when browsing unfamiliar data.
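The abstract does not spell out how conversational fields are detected from auditory similarity, so the following is only an illustrative sketch under assumed details: each member's microphone yields an ambient-sound log-energy envelope, pairwise correlation of envelopes approximates "situated sound similarity," and connected components over a similarity threshold form candidate conversational fields. The function names, frame size, and threshold here are hypothetical choices, not the authors' method.

```python
import numpy as np

def energy_envelope(signal, frame=256):
    """Per-frame log energy: a crude signature of the ambient sound field."""
    n = len(signal) // frame
    frames = signal[:n * frame].reshape(n, frame)
    return np.log1p((frames ** 2).sum(axis=1))

def conversational_fields(signals, threshold=0.9, frame=256):
    """Group members whose ambient-sound envelopes correlate strongly.

    signals   -- list of 1-D audio arrays, one per member
    threshold -- correlation above which two members are linked
    Returns a list of groups (sorted member indices).
    """
    feats = [energy_envelope(s, frame) for s in signals]
    k = len(feats)
    adj = np.zeros((k, k), dtype=bool)
    for i in range(k):
        for j in range(i + 1, k):
            a = feats[i] - feats[i].mean()
            b = feats[j] - feats[j].mean()
            denom = np.linalg.norm(a) * np.linalg.norm(b)
            sim = (a @ b) / denom if denom else 0.0
            adj[i, j] = adj[j, i] = sim >= threshold
    # Connected components of the similarity graph -> conversational fields.
    groups, seen = [], set()
    for s in range(k):
        if s in seen:
            continue
        stack, comp = [s], set()
        while stack:
            v = stack.pop()
            if v in comp:
                continue
            comp.add(v)
            stack.extend(int(u) for u in np.flatnonzero(adj[v]))
        seen |= comp
        groups.append(sorted(comp))
    return groups
```

In this sketch, members recording the same acoustic scene (e.g. standing in one conversation) end up in one group, while a member in a different sound field forms a separate group; a real system would use richer features (e.g. spectral ones) and smoothing over time.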
© 2018 ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering
Toyama, K., Sumi, Y. (2018). Quick Browsing of Shared Experience Videos Based on Conversational Field Detection. In: Murao, K., Ohmura, R., Inoue, S., Gotoh, Y. (eds.) Mobile Computing, Applications, and Services. MobiCASE 2018. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol. 240. Springer, Cham. https://doi.org/10.1007/978-3-319-90740-6_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-90739-0
Online ISBN: 978-3-319-90740-6