Audio Content-based Classification

  • Abelhakim Saadane
Part of the Multimedia Systems and Applications Series book series (MMSA, volume 22)


Audio now plays a role in many fields, such as video indexing [Min 98], [Liu 97], audio production [Pai 97], and broadcast [Gau 00], [Hu 98]. In video indexing, the indexed video segments are tied to the results of music and speech detection. These segments, when presented on the Video Sound Browser, let users access the video at random. In audio production, many tools exist on the market, such as Lytha Studios-DMD, Making waves and SpyderNet, but few of them use content-based audio retrieval in the production process. In the broadcast domain, indexing and speech transcription techniques have been proposed for content exploitation and for power conservation on mobile computers and conventional architectures.


Keywords: Discrete Cosine Transform; Vocal Tract; Audio Data; Audio Feature; Semantic Classis
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.




  1. [Ata 71]
    B.S. Atal and S. L. Hanauer, Speech analysis and synthesis by linear prediction of the speech wave, Journal of the Acoustical Society of America 50, 637–655, 1971.
  2. [Bah 83]
    L.R. Bahl, F. Jelinek, and R.L. Mercer, A maximum likelihood approach to continuous speech recognition. IEEE Trans. Patt. Anal. Machine Intell. PAMI-5, 179–190, 1983.
  3. [Del 99]
    Delacour P., «Indexation de données audio: reconnaissance de la séquence de locuteur engagés dans une conversation», doctoral thesis, ENST, Eurécom department, 1999.
  4. [Fei 94]
    Feiten B., S. Günzel, “Automatic Indexing of a Sound Database Using Self-Organizing Neural Nets”, Computer Music Journal, Vol. 18, N° 3, pages 53–65, 1994.
  5. [Fos 82]
    S. Foster, W. Schloss, A. J. Rockmore, “Towards an intelligent Editor of Digital Audio: Data Processing Methods”, Computer Music Journal, Vol. 6, N° 1, pages 42–51, 1982.
  6. [Fur 81]
    S. Furui, Cepstral analysis technique for automatic speaker verification. IEEE Trans. ASSP, ASSP-29, 254–272, 1981.
  7. [Gau 00]
    Jean-Luc Gauvain, Lori Lamel and Gilles Adda, “Transcribing broadcast news for audio and video indexing”, Communications of the ACM, Volume 43, Issue 2, 2000.
  8. [Haw 93]
    Michael Hawley. Structure out of Sound. PhD thesis, MIT Media Laboratory, 1993.
  9. [Hu 98]
    Qinglong Hu, Dik Lun Lee, Wang-Chien Lee, “A Comparison of Indexing Methods for Data Broadcast on the Air”, Proceedings of the 13th International Conference on Information Networking (ICOIN '98), Tokyo, Japan, January 21–23, 1998.
  10. [Liu 97]
    Liu Z. et al., “Audio Feature Extraction & Analysis for Scene Classification”, IEEE data processing society 1997 workshop on Multimedia data processing, June 23–25, 1997, Princeton, New Jersey, USA.
  11. [Min 98]
    Kenichi Minami, Akihito Akutsu, Hiroshi Hamada, and Yoshinobu Tonomura, “Video Handling based on Music and Speech Detection”, IEEE MultiMedia, Vol. 5, No. 3, July-September 1998.
  12. [Moo 51]
    Moores C. N., “Datacoding applied to mechanical organization of knowledge”, Am. Doc. 2 (1951), 20–32.
  13. [Mpe 99]
    “MPEG-7 Requirements Document”, Doc. ISO/MPEG N2727, MPEG Seoul Meeting, March 1999.
  14. [Pai 97]
    Wan-Chieh Pai and Peter C. Doerschuk, “Data Processing Using Statistical Nonlinear Speech Production Models”, 1997 IEEE Workshop on Nonlinear Data and Image Processing, September 8–10, Grand Hotel on Mackinac Island, Michigan USA, 1997.
  15. [Rag 89]
    Raghavan, V., Jung, G., and Bollman, P., “A Critical Investigation of Recall and Precision as Measures”, ACM Transactions on Information Systems 7(3), pages 205–229, 1989.
  16. [Ram 94]
    T. V. Raman, “Audio system for Technical Readings”. Doctoral dissertation, Cornell University.
  17. [Rij 79]
    C. J. Keith van Rijsbergen, “Information retrieval”, Second edition, London: Butterworths, 1979.
  18. [Sal 83]
    Salton and McGill, 1983. “Introduction to Modern Information Retrieval”, McGraw-Hill, New York, NY.
  19. [Sau 96]
    John Saunders. Real time discrimination of broadcast speech/music. In Proc. 1996 IEEE ICASSP, pages 993–996, March 1996.
  20. [Sch 97]
    E. Scheirer and M. Slaney, “Construction and evaluation of a robust Multimedia speech/music discriminator”, Proceedings of the 1997 International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Munich, Germany, April 21–24, 1997.
  21. [Sou 83]
    P. D. Souza, “A statistical approach to the design of an adaptive self normalizing silence detector”, Proceedings of the IEEE, pp. 678–684, vol. ASSP-31, N° 3, June 1983.
  22. [Sub 98]
    S.R. Subramanya, A. Youssef, “Wavelet-based Indexing of Audio Data in audio/Multimedia Database”, 4th Int. Workshop on Multimedia DBMS, Dayton, Ohio, August 1998.
  23. [Sti 95]
    Stiel N., “Multimedia: the new frontier”, Encyclopedia Universalis, 1995, pp. 144–149.
  24. [Wol 96]
    E. Wold, T. Blum, D. Keislar, J. Wheaton, “Content-based classification search and retrieval of audio”, Proceedings of the IEEE, pp. 27–36, 1996.
  25. [Pfe 96]
    Silvia Pfeiffer, Stephan Fischer, Wolfgang Effelsberg: Automatic Audio Content Analysis. ACM Multimedia 1996: 21–30.

Copyright information

© Springer Science+Business Media New York 2003

Authors and Affiliations

  • Abelhakim Saadane, IRCCyN, Nantes University, France
