Text-Independent Phone Segmentation Method Using Gaussian Function

  • Dac-Thang Hoang
  • Hsiao-Chuan Wang
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 244)


In this paper, an effective method is proposed for the automatic phone segmentation of speech signals that uses no prior information about the transcript of the utterance. Spectral change is used as the criterion for hypothesizing phone boundaries. Because a Gaussian function can measure the similarity of two vectors, a dissimilarity function is derived from it to measure the variation of the speech spectrum between the mean feature vectors before and after a candidate location. Peaks in the dissimilarity curve indicate the locations of phone boundaries. Experiments on the TIMIT corpus show that the proposed method is more accurate than previous methods.
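The idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the exact dissimilarity formula, so the form 1 - exp(-||a - b||^2 / (2σ^2)) is an assumption, and the window width, `sigma`, the peak threshold, and all function names are hypothetical choices made for illustration.

```python
import math

def mean_vector(frames):
    """Component-wise mean of a non-empty list of equal-length feature vectors."""
    n = len(frames)
    return [sum(column) / n for column in zip(*frames)]

def gaussian_dissimilarity(a, b, sigma=1.0):
    """Assumed form: 1 - exp(-||a - b||^2 / (2 * sigma^2)).

    Returns 0 for identical vectors and approaches 1 as they diverge.
    """
    squared_distance = sum((x - y) ** 2 for x, y in zip(a, b))
    return 1.0 - math.exp(-squared_distance / (2.0 * sigma ** 2))

def dissimilarity_curve(features, width=3, sigma=1.0):
    """Compare the mean of `width` frames before and after each location t.

    Locations too close to either end keep the value 0.
    """
    curve = [0.0] * len(features)
    for t in range(width, len(features) - width + 1):
        before = mean_vector(features[t - width:t])
        after = mean_vector(features[t:t + width])
        curve[t] = gaussian_dissimilarity(before, after, sigma)
    return curve

def pick_peaks(curve, threshold=0.5):
    """Indices of local maxima above a (hypothetical) threshold."""
    return [
        t for t in range(1, len(curve) - 1)
        if curve[t] > threshold
        and curve[t] > curve[t - 1]
        and curve[t] >= curve[t + 1]
    ]

# Toy input: ten frames of one "phone" followed by ten frames of another.
features = [[0.0, 0.0]] * 10 + [[3.0, 3.0]] * 10
boundaries = pick_peaks(dissimilarity_curve(features))
print(boundaries)  # [10]
```

On this toy input the curve rises as the window straddles the change and peaks exactly where the frames switch, which is the behaviour the abstract attributes to phone boundaries. Real use would replace the toy vectors with spectral features computed from speech frames.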





Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Dac-Thang Hoang (1, 2)
  • Hsiao-Chuan Wang (1)
  1. Department of Electrical Engineering, National Tsing Hua University, Hsinchu, Taiwan
  2. Department of Network System, Institute of Information Technology, Hanoi, Vietnam
