Index Point Detection and Semantic Indexing of Videos—A Comparative Review

  • Mehul Mahrishi
  • Sudha Morwal
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 1154)


Once used primarily for fun and entertainment, videos now drive social, commercial, and business activities. It is estimated that by 2025 about 75% of all Internet traffic will be video. In education, videos are a source of learning. Study Webs of Active Learning for Young Aspiring Minds (SWAYAM), the National Programme on Technology Enhanced Learning (NPTEL), Massive Open Online Courses (MOOCs), Coursera, and many similar platforms provide not only courseware but also content that goes beyond the conventional syllabi. Even at the junior level, Byju’s and similar educational portals are witnessing explosive growth in video content. Although semantic features can now be extracted from images and video sequences, and although video lectures are ubiquitous, they still lack smooth navigation between topics. In this paper, we survey recently proposed automated video indexing approaches and their prerequisites, and we analyze them using some existing measures.


E-learning, Lecture videos, Video segmentation, Video indexing, Text similarity, Video analysis


Copyright information

© Springer Nature Singapore Pte Ltd. 2020

Authors and Affiliations

  1. Swami Keshvanand Institute of Technology, Jaipur, India
