International Journal of Speech Technology, Volume 17, Issue 3, pp 259–269

Segmentation, indexing and retrieval of TV broadcast news bulletins using Gaussian mixture models and vector quantization codebooks

  • K. Sreenivasa Rao
  • Ketan Pachpande


In this paper, we propose a two-stage segmentation approach for splitting TV broadcast news bulletins into a sequence of news stories, and we use codebooks derived from vector quantization to retrieve the segmented stories. In the first stage, speaker (news reader) specific characteristics present in the initial headlines of the bulletin are used for gross-level segmentation. In the second stage, errors in the gross-level segmentation are corrected by exploiting speaker-specific information captured from the individual news stories rather than from the headlines. This correction is needed because, during the headlines, the speaker-specific information is mixed with background music, so the first-stage segmentation may not be accurate. In this work, speaker-specific information is represented by mel-frequency cepstral coefficients and captured by Gaussian mixture models (GMMs). The proposed two-stage segmentation method is evaluated on manually segmented TV broadcast news bulletins. The evaluation results show that about 93 % of the news stories are correctly segmented, 7 % are missed and 6 % are spurious. For navigating the bulletins, a quick-navigation indexing method based on speaker change points is developed. The performance of the proposed two-stage segmentation and quick-navigation methods is evaluated using GMM and neural network models. For retrieving target news stories from the news corpus, sequences of codebook indices derived from vector quantization are explored. The proposed retrieval approach is evaluated using queries of different sizes. The evaluation results indicate that retrieval accuracy is proportional to the size of the query.
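As a rough illustration of retrieval by codebook index sequences, the following minimal Python sketch quantizes feature vectors to their nearest codeword and ranks stories by how well the query's index sequence lines up with each story's sequence. It is a sketch under stated assumptions, not the authors' method: the function names (`quantize`, `best_match_score`, `retrieve`), the sliding-window agreement score, the fixed toy codebook and the 2-D vectors standing in for MFCC frames are all hypothetical, since the abstract does not specify how the index sequences are compared.

```python
# Illustrative sketch only: names, toy data and the matching rule are assumptions.
from math import dist


def quantize(frames, codebook):
    """Map each feature vector to the index of its nearest codeword."""
    return [min(range(len(codebook)), key=lambda k: dist(f, codebook[k]))
            for f in frames]


def best_match_score(query_idx, story_idx):
    """Slide the query over the story; return the best fraction of agreeing indices."""
    n, m = len(query_idx), len(story_idx)
    if n == 0 or n > m:
        return 0.0
    best = 0
    for start in range(m - n + 1):
        window = story_idx[start:start + n]
        hits = sum(q == s for q, s in zip(query_idx, window))
        best = max(best, hits)
    return best / n


def retrieve(query_frames, stories, codebook):
    """Rank story names by how well the query's index sequence matches each story."""
    q = quantize(query_frames, codebook)
    scores = {name: best_match_score(q, quantize(frames, codebook))
              for name, frames in stories.items()}
    return sorted(scores, key=scores.get, reverse=True), scores


# Toy 2-D "features" standing in for MFCC vectors (hypothetical values).
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
stories = {
    "story_a": [(0.1, 0.0), (0.9, 0.1), (0.1, 0.9), (1.0, 0.9), (0.0, 0.1)],
    "story_b": [(1.0, 1.0), (0.9, 0.9), (0.1, 0.1), (0.0, 0.0), (0.9, 0.0)],
}
query = [(1.1, 0.0), (0.0, 1.1), (0.9, 1.0)]  # noisy excerpt of story_a

ranking, scores = retrieve(query, stories, codebook)
print(ranking[0], scores)
```

In this toy setup a longer query gives the score more index positions to agree or disagree on, which is one plausible reason retrieval accuracy would grow with query size.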


TV broadcast news; Two-stage segmentation; Gaussian mixture model; Speaker specific information; Autoassociative neural network model; Vector quantization codebooks; Audio indexing and retrieval



Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  1. School of Information Technology, Indian Institute of Technology Kharagpur, Kharagpur, India
