A Scalable Video Conferencing System Using Cached Facial Expressions

  • Fang-Yu Shih
  • Ching-Ling Fan
  • Pin-Chun Wang
  • Cheng-Hsin Hsu (email author)
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10133)


We propose a scalable video conferencing system that streams high-definition video when bandwidth is sufficient and ultra-low-bitrate (\({<}0.25\) kbps) cached facial expressions when bandwidth is scarce. Our solution consists of optimized approaches to: (i) choose representative facial expressions from training video frames and (ii) match an incoming webcam frame against the pre-transmitted facial expressions. To the best of our knowledge, such an approach has not been studied in the literature. We evaluate the implemented video conferencing system using webcam videos captured from 9 subjects. Compared to the state-of-the-art scalable codec, our solution: (i) reduces the bitrate by a factor of about 130 when bandwidth is scarce, (ii) achieves the same coding efficiency when bandwidth is sufficient, (iii) allows trading off initialization overhead against coding efficiency, (iv) performs better at higher resolutions, and (v) runs reasonably fast even before extensive code optimization.
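The two steps named in the abstract, choosing representative expressions and matching an incoming frame against the cache, can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state which selection or matching algorithm is used, so plain k-means clustering and nearest-neighbour search over landmark vectors stand in here as assumptions. The function names (`choose_representatives`, `match_frame`, `index_bitrate_kbps`), the flat landmark-vector representation, and the cache size and frame rate in the usage note are all hypothetical.

```python
import math


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length landmark vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def choose_representatives(frames, k, iterations=20):
    """Return indices of k training frames chosen as representatives.

    Assumption: plain k-means (Lloyd's iterations) with a deterministic,
    evenly spaced initialisation; each final centroid is snapped to the
    nearest real training frame so that an actual image can be cached.
    """
    step = len(frames) / k
    centroids = [list(frames[int(i * step)]) for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for frame in frames:
            nearest = min(range(k),
                          key=lambda c: squared_distance(frame, centroids[c]))
            clusters[nearest].append(frame)
        for c, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return [min(range(len(frames)),
                key=lambda i: squared_distance(frames[i], centroids[c]))
            for c in range(k)]


def match_frame(landmarks, cache):
    """Return the index of the cached expression closest to the incoming frame."""
    return min(range(len(cache)),
               key=lambda i: squared_distance(landmarks, cache[i]))


def index_bitrate_kbps(cache_size, fps):
    """Bitrate needed to send one fixed-length cache index per frame, in kbps."""
    bits_per_index = max(1, math.ceil(math.log2(cache_size)))
    return bits_per_index * fps / 1000


# Toy training set: two loose groups of 2-D "landmark" vectors.
training = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
            (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
representatives = choose_representatives(training, k=2)
cache = [training[i] for i in representatives]
```

Under these assumptions the receiver only needs a cache index per frame. With a hypothetical cache of 256 expressions at 30 frames per second, `index_bitrate_kbps(256, 30)` gives 0.24 kbps, which is in the same range as the sub-0.25 kbps figure quoted in the abstract, although the paper's actual cache size and signalling format are not given here.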


Keywords: Compression, Cache, Codec, Facial landmarks, Facial models



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Fang-Yu Shih (1)
  • Ching-Ling Fan (1)
  • Pin-Chun Wang (1)
  • Cheng-Hsin Hsu (1), email author
  1. Department of Computer Science, National Tsing Hua University, Hsin Chu, Taiwan
