Audio characterization; Audio feature extraction
An audio signal is a signal that contains information in the audible frequency range. Audio representation refers to the extraction of audio signal properties, or features, that are representative of the audio signal composition (in both the temporal and spectral domains) and of the signal's behavior over time. Feature extraction is typically combined with feature selection, which determines the set of features best suited to the intended operation on the audio signal.
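As an illustration of what extracting temporal and spectral features can look like in practice, the following is a minimal sketch using NumPy. The particular features (short-time energy, zero-crossing rate, spectral centroid), the helper names `frame_signal` and `extract_features`, and the frame and hop sizes are assumptions chosen for the example; the entry itself does not prescribe them.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames, one frame per row."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def extract_features(x, sr, frame_len=1024, hop=512):
    """Return one row per frame: [energy, zero-crossing rate, spectral centroid]."""
    frames = frame_signal(x, frame_len, hop)

    # Temporal-domain features (illustrative choice, not from the entry).
    energy = np.mean(frames ** 2, axis=1)
    signs = np.signbit(frames)
    zcr = np.mean(signs[:, 1:] != signs[:, :-1], axis=1)

    # Spectral-domain feature: magnitude-weighted mean frequency in Hz.
    window = np.hanning(frame_len)
    mag = np.abs(np.fft.rfft(frames * window, axis=1))
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sr)
    centroid = (mag @ freqs) / np.maximum(mag.sum(axis=1), 1e-12)

    return np.column_stack([energy, zcr, centroid])

# Usage on a synthetic one-second 440 Hz tone (hypothetical test signal).
sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440.0 * t)
features = extract_features(tone, sr)
print(features.shape)  # (30, 3)
```

Each row describes one frame, so the matrix as a whole captures how the features evolve over time. A feature selection step would then keep only the columns that prove useful for the task at hand.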
Audio feature extraction typically yields a strongly reduced representation of the audio signal. Such a representation can improve the efficiency of audio processing and benefit many applications built on it. For example, a compact representation of an audio signal in the form of a fingerprint can enable extremely fast search for a match between this signal and a large-scale audio database for the purpose of audio signal...