Abstract
In high-end hospitality settings such as airline lounges, luxury hotels, and fine-dining restaurants, employees' service skills are a key element of brand identity. However, it is very difficult to train an intermediate employee into an expert who can provide high-value services that exceed customers' expectations. To hire and develop employees who embody the brand's values, a company must communicate those values to its employees clearly. In video analysis, particularly the analysis of human behavior, an important task is understanding and representing human activities, such as conversation and physical actions, together with their temporal relations. This paper addresses the problem of massively annotating video content, such as multimedia training materials, so that it can be processed by human-interaction training support systems (e.g., VR training systems) as resources for content generation. We propose a proof-of-concept (POC) service skill assessment platform: a knowledge graph (KG) of high-end service provision videos massively annotated with human-interaction semantics.
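To make the idea of a knowledge graph of annotated service videos concrete, the sketch below models video segments as time-stamped interaction annotations and runs a simple temporal query over them. This is a minimal illustration only: the class and property names (`Annotation`, `GreetCustomer`, `ServeDrink`, the attendant/customer identifiers) are hypothetical and are not the schema of the paper's POC system, which in practice would use an RDF store queried with SPARQL.

```python
# Toy model of video-interaction annotations in a KG-like triple style,
# plus a temporal-overlap query ("which interactions co-occur in a window?").
from dataclasses import dataclass

@dataclass(frozen=True)
class Annotation:
    segment: str   # video segment identifier
    actor: str     # who performs the activity
    action: str    # activity label, e.g. "ServeDrink"
    start: float   # seconds from video start
    end: float     # seconds from video start

# A miniature "knowledge graph" of one annotated service video.
kg = [
    Annotation("video01#seg1", "attendant_1", "GreetCustomer", 0.0, 4.2),
    Annotation("video01#seg2", "attendant_1", "ServeDrink", 10.5, 18.0),
    Annotation("video01#seg3", "customer_3", "AskQuestion", 12.0, 14.5),
]

def overlapping(annotations, t0, t1):
    """Return annotations whose interval overlaps [t0, t1]."""
    return [a for a in annotations if a.start < t1 and a.end > t0]

# Which interactions co-occur between seconds 11 and 15?
hits = overlapping(kg, 11.0, 15.0)
print([a.action for a in hits])  # -> ['ServeDrink', 'AskQuestion']
```

In a full system the same overlap condition would be expressed as a SPARQL filter over start/end time literals, letting the training-content generator retrieve all interactions that co-occur with a given expert action.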
Acknowledgments
Part of this work was supported by the Council for Science, Technology and Innovation, "Cross-ministerial Strategic Innovation Promotion Program (SIP), Big-data and AI-enabled Cyberspace Technologies" (funding agency: NEDO).
Copyright information
© 2020 Springer Nature Switzerland AG
Cite this paper
Fukuda, K., Vizcarra, J., Nishimura, S. (2020). Massive Semantic Video Annotation in High-End Customer Service. In: Nah, FH., Siau, K. (eds) HCI in Business, Government and Organizations. HCII 2020. Lecture Notes in Computer Science(), vol 12204. Springer, Cham. https://doi.org/10.1007/978-3-030-50341-3_4
Print ISBN: 978-3-030-50340-6
Online ISBN: 978-3-030-50341-3