
Nearest-Neighbor Automatic Sound Annotation with a WordNet Taxonomy

  • Pedro Cano
  • Markus Koppenberger
  • Sylvain Le Groux
  • Julien Ricard
  • Nicolas Wack
  • Perfecto Herrera

Abstract

Sound engineers need access to vast collections of sound effects for their film and video productions. Sound-effects providers rely on text-retrieval techniques to give access to their collections. Currently, audio content is annotated manually, which is an arduous task. Automatic annotation methods, normally fine-tuned to reduced domains such as musical instruments or limited sound-effects taxonomies, are not yet mature enough to label any possible sound in great detail. A general sound recognition tool would require, first, a taxonomy that represents the real world and, second, thousands of classifiers, each specialized in distinguishing fine details. We report experimental results on a general sound annotator. To tackle the taxonomy definition problem we use WordNet, a semantic network that organizes real-world knowledge. To overcome the need for a huge number of classifiers to distinguish many different sound classes, we use a nearest-neighbor classifier with a database of isolated sounds unambiguously linked to WordNet concepts. A concept prediction accuracy of 30% is achieved on a database of over 50,000 sounds and over 1,600 concepts.
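To make the core idea concrete, here is a minimal Python sketch of the nearest-neighbor annotation scheme the abstract describes: a query sound inherits the WordNet concepts of its closest match in a database of labeled isolated sounds. The feature vectors, database entries, synset names, and the use of Euclidean distance are invented placeholders for illustration, not the paper's actual features, data, or distance measure.

```python
"""Minimal 1-NN sound annotation sketch (illustrative assumptions only)."""
import numpy as np

# Hypothetical database: one feature vector per isolated sound, each sound
# unambiguously linked to a list of WordNet concepts (synset names).
SOUND_DB = [
    (np.array([0.12, 0.80, 0.33]), ["dog.n.01", "bark.n.04"]),
    (np.array([0.90, 0.10, 0.45]), ["rain.n.01", "water.n.01"]),
    (np.array([0.40, 0.55, 0.70]), ["door.n.01", "slam.n.02"]),
]

def annotate(query_features: np.ndarray) -> list[str]:
    """Return the WordNet concepts of the nearest database sound,
    using Euclidean distance over the feature vectors (1-NN)."""
    distances = [np.linalg.norm(query_features - feats) for feats, _ in SOUND_DB]
    nearest = int(np.argmin(distances))
    return SOUND_DB[nearest][1]

if __name__ == "__main__":
    # The query is annotated with its nearest neighbor's concepts.
    print(annotate(np.array([0.15, 0.75, 0.30])))  # ['dog.n.01', 'bark.n.04']
```

Because the labels are WordNet synsets rather than free-text tags, a retrieval system built on such annotations can also exploit the taxonomy, e.g. matching a query for a hypernym concept against sounds labeled with its hyponyms.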

Keywords

audio identification, WordNet, nearest-neighbor, everyday sound, knowledge management



Copyright information

© Springer Science + Business Media, Inc. 2005

Authors and Affiliations

  • Pedro Cano (1)
  • Markus Koppenberger (1)
  • Sylvain Le Groux (1)
  • Julien Ricard (1)
  • Nicolas Wack (1)
  • Perfecto Herrera (1)

  1. Music Technology Group, Institut Universitari de l'Audiovisual, Universitat Pompeu Fabra, Barcelona, Spain
