Abstract
We present an algorithmic framework for phoneme classification in which the set of phonemes is organized in a predefined hierarchical structure. This structure is encoded via a rooted tree, which induces a metric over the set of phonemes. Our approach combines techniques from large margin kernel methods and Bayesian analysis. Extending the notion of large margin to hierarchical classification, we associate a prototype with each individual phoneme and with each phonetic group, where each group corresponds to a node in the tree. We then formulate the learning task as an optimization problem with margin constraints over the phoneme set. In the spirit of Bayesian methods, we impose similarity requirements between the prototypes corresponding to adjacent phonemes in the phonetic hierarchy. We describe a new online algorithm for solving the hierarchical classification problem and provide a worst-case loss analysis for the algorithm. We demonstrate the merits of our approach by applying the algorithm to synthetic data as well as speech data.
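To make the idea of a tree-induced metric concrete, the following is a minimal sketch that assumes the distance between two nodes is the number of edges on the path connecting them in the rooted tree. The abstract does not spell out the metric, so this path-length definition is an assumption, and the toy hierarchy, node names, and function names are hypothetical illustrations rather than the phonetic tree or code used in the paper.

```python
from typing import Dict, List, Optional

# Hypothetical toy hierarchy (child -> parent); the root has parent None.
# Node names are illustrative only and are not taken from the paper.
PARENT: Dict[str, Optional[str]] = {
    "root": None,
    "vowels": "root",
    "consonants": "root",
    "iy": "vowels",
    "aa": "vowels",
    "nasals": "consonants",
    "m": "nasals",
    "n": "nasals",
    "s": "consonants",
}


def path_to_root(node: str) -> List[str]:
    """Return the nodes from `node` up to the root, inclusive."""
    path: List[str] = []
    current: Optional[str] = node
    while current is not None:
        path.append(current)
        current = PARENT[current]
    return path


def tree_distance(u: str, v: str) -> int:
    """Number of edges on the unique path between u and v (assumed metric)."""
    # Map every ancestor of u (including u) to its distance from u.
    steps_from_u = {name: i for i, name in enumerate(path_to_root(u))}
    # Walk up from v until the first common ancestor is reached.
    for steps_from_v, name in enumerate(path_to_root(v)):
        if name in steps_from_u:
            return steps_from_u[name] + steps_from_v
    raise ValueError("nodes do not share a root")
```

Under this assumed metric, confusing two phonemes in the same group (here "m" and "n") costs less than confusing phonemes from different branches (here "m" and "iy"), which is the kind of graded error a hierarchical classifier can exploit.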
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Dekel, O., Keshet, J., Singer, Y. (2005). An Online Algorithm for Hierarchical Phoneme Classification. In: Bengio, S., Bourlard, H. (eds) Machine Learning for Multimodal Interaction. MLMI 2004. Lecture Notes in Computer Science, vol 3361. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30568-2_13
DOI: https://doi.org/10.1007/978-3-540-30568-2_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-24509-4
Online ISBN: 978-3-540-30568-2
eBook Packages: Computer Science, Computer Science (R0)