Quick Browsing of Shared Experience Videos Based on Conversational Field Detection

  • Conference paper
Mobile Computing, Applications, and Services (MobiCASE 2018)

Abstract

We propose a system that aids the browsing of shared experience data comprising multiple first-person-view videos. With this system, users can avoid the tedious task of searching through lengthy videos: the system displays situational cues on the video seek bar and visualizes node graphs showing which members participate in each scene and their approximate locations. Users can therefore search for and browse events guided by cues indicating participant names and locations. To capture the dynamics of groups in crowded areas, we detect conversational fields based on auditory similarity. We conducted an experiment to evaluate how well our system reduces the time needed to find specified scenes in lifelog videos. The results suggest that our system aids the browsing of videos containing one's own experiences, but we could not confirm that it aids the browsing of unfamiliar data.
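The core technique named in the abstract, detecting conversational fields from auditory similarity, can be illustrated with a short sketch. The Python below is a hypothetical illustration, not the authors' implementation: it assumes each participant's device records ambient audio to a file, compares time-aligned MFCC frames between recordings with cosine similarity (using the librosa library), and merges recorders whose similarity exceeds an arbitrary threshold into one conversational field via union-find. All function names, the sampling rate, and the 0.6 threshold are illustrative assumptions.

```python
# Hypothetical sketch of conversational-field detection from auditory
# similarity. Not the paper's exact algorithm; names, sampling rate,
# and threshold are illustrative assumptions.
import numpy as np
import librosa


def mfcc_features(path, sr=16000, n_mfcc=13):
    """Load a recording and return its frame-wise MFCC matrix (n_mfcc x frames)."""
    y, _ = librosa.load(path, sr=sr, mono=True)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)


def audio_similarity(path_a, path_b):
    """Mean cosine similarity between time-aligned MFCC frames of two recordings."""
    a, b = mfcc_features(path_a), mfcc_features(path_b)
    n = min(a.shape[1], b.shape[1])  # truncate to the common duration
    a, b = a[:, :n], b[:, :n]
    dot = np.sum(a * b, axis=0)
    norm = np.linalg.norm(a, axis=0) * np.linalg.norm(b, axis=0) + 1e-9
    return float(np.mean(dot / norm))


def conversational_fields(paths, threshold=0.6):
    """Group recorders whose pairwise similarity exceeds the threshold,
    using union-find so similarity links form transitive groups."""
    parent = list(range(len(paths)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(paths)):
        for j in range(i + 1, len(paths)):
            if audio_similarity(paths[i], paths[j]) > threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i, p in enumerate(paths):
        groups.setdefault(find(i), []).append(p)
    return list(groups.values())
```

In an actual deployment one would presumably compare similarity over a sliding window rather than over whole recordings, so that conversational groups can form and dissolve over time as people move between conversations.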

Author information

Correspondence to Kai Toyama.

Copyright information

© 2018 ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering

About this paper

Cite this paper

Toyama, K., Sumi, Y. (2018). Quick Browsing of Shared Experience Videos Based on Conversational Field Detection. In: Murao, K., Ohmura, R., Inoue, S., Gotoh, Y. (eds) Mobile Computing, Applications, and Services. MobiCASE 2018. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 240. Springer, Cham. https://doi.org/10.1007/978-3-319-90740-6_3

  • DOI: https://doi.org/10.1007/978-3-319-90740-6_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-90739-0

  • Online ISBN: 978-3-319-90740-6

  • eBook Packages: Computer Science, Computer Science (R0)
