Abstract
Previous work analyzed the information in speech using analysis of variance (ANOVA). ANOVA assumes that each source of information (phone, speaker, and channel) is univariate Gaussian. These sources, however, are not unimodal Gaussian; in speech recognition, for example, phones are generally modeled with multi-state, multi-mixture models. This work therefore extends ANOVA by modeling phones with 3-state and 5-state single-mixture distributions. The multi-state model is obtained by extracting the variability due to position within the phone from the ANOVA error term. Further, linear discriminant analysis (LDA) is used to design discriminant features that better represent both the phone-induced variability and the position-within-phone variability. On a continuous digit recognition task, these features perform significantly better than conventional discriminant features derived from a 1-state phone model.
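The extension described above, pulling position-within-phone variability out of the ANOVA error term, can be sketched as a nested (hierarchical) ANOVA on a single feature dimension. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes every frame is already labeled with a phone and a state index (for example, from a forced alignment), treats one scalar feature at a time, and ignores the speaker and channel factors. The function name `nested_anova` and the toy data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical frame format: (phone_label, state_index, feature_value).
# Assumes labels come from an alignment; speaker/channel factors are ignored.
Frame = Tuple[str, int, float]


def nested_anova(frames: List[Frame]) -> Dict[str, float]:
    """Split the total sum of squares of one scalar feature into
    between-phone, position-within-phone (state), and residual terms."""
    values = [x for _, _, x in frames]
    grand_mean = sum(values) / len(values)

    # Group frames by phone and by (phone, state); states are nested in phones.
    by_phone: Dict[str, List[float]] = defaultdict(list)
    by_state: Dict[Tuple[str, int], List[float]] = defaultdict(list)
    for phone, state, x in frames:
        by_phone[phone].append(x)
        by_state[(phone, state)].append(x)

    phone_mean = {p: sum(v) / len(v) for p, v in by_phone.items()}
    state_mean = {k: sum(v) / len(v) for k, v in by_state.items()}

    # Between-phone variability.
    ss_phone = sum(len(v) * (phone_mean[p] - grand_mean) ** 2
                   for p, v in by_phone.items())
    # Variability due to position within the phone (state means around phone means).
    ss_state = sum(len(v) * (state_mean[k] - phone_mean[k[0]]) ** 2
                   for k, v in by_state.items())
    # Residual (error) variability left after both factors.
    ss_error = sum((x - state_mean[(p, s)]) ** 2 for p, s, x in frames)
    ss_total = sum((x - grand_mean) ** 2 for x in values)
    return {"phone": ss_phone, "state": ss_state,
            "error": ss_error, "total": ss_total}


# Toy data: two phones, three states each, two frames per state.
frames: List[Frame] = [
    ("a", 0, 1.0), ("a", 0, 1.2), ("a", 1, 2.0), ("a", 1, 2.2),
    ("a", 2, 3.0), ("a", 2, 3.2),
    ("b", 0, 5.0), ("b", 0, 5.2), ("b", 1, 4.0), ("b", 1, 4.2),
    ("b", 2, 3.0), ("b", 2, 3.2),
]
result = nested_anova(frames)
print({k: round(v, 3) for k, v in result.items()})
# → {'phone': 12.0, 'state': 8.0, 'error': 0.12, 'total': 20.12}
```

In a 1-state analysis, the `state` and `error` terms would be pooled into a single error term (8.12 in the toy data); separating them shows how much of that apparent error is systematic variability due to position within the phone. The (phone, state) groups are also a natural choice of classes for LDA when the goal is features that capture both kinds of variability, although the abstract does not specify the exact class definition used.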
References
S. van Vuuren and H. Hermansky: Data-driven design of RASTA-like filters. Proc. of EUROSPEECH, Greece (1997) 409–412.
S. S. Kajarekar, N. Malayath and H. Hermansky: Analysis of Sources of Variability in Speech. Proc. of EUROSPEECH, Budapest (1999) 343–346.
S. S. Kajarekar, N. Malayath and H. Hermansky: Analysis of Speaker and Channel Variability in Speech. Proc. of ASRU, Colorado (1999).
R. Cole, M. Noel and T. Lander: Telephone speech corpus development at CSLU. Proc. of ICSLP (1994).
H. Hermansky and N. Malayath: Spectral basis functions from discriminant analysis. Proc. of ICSLP, Sydney (1998).
K. Fukunaga: Introduction to Statistical Pattern Recognition, 2nd ed. Academic Press, San Diego (1990).
T. M. Cover and J. A. Thomas: Elements of Information Theory. John Wiley & Sons (1991).
R. V. Hogg and E. A. Tanis: Probability and Statistical Inference, 5th ed. Prentice Hall (1997) 283–288.
Copyright information
© 2000 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kajarekar, S.S., Hermansky, H. (2000). Analysis of Information in Speech and Its Application in Speech Recognition. In: Sojka, P., Kopeček, I., Pala, K. (eds) Text, Speech and Dialogue. TSD 2000. Lecture Notes in Computer Science, vol 1902. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45323-7_48
DOI: https://doi.org/10.1007/3-540-45323-7_48
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-41042-3
Online ISBN: 978-3-540-45323-9
eBook Packages: Springer Book Archive