Using symbol clustering to improve probabilistic automaton inference

  • Pierre Dupont
  • Lin Chase
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1433)


In this paper we show that clustering alphabet symbols before inferring a probabilistic deterministic finite automaton (PDFA) reduces perplexity on new data. This result is especially important in real tasks, such as spoken language interfaces, in which data sparseness is a significant issue. We describe the application of the ALERGIA algorithm, combined with an independent clustering technique, to the Air Travel Information System (ATIS) task. A 25% reduction in perplexity was obtained. This result outperforms a trigram model under the same simple smoothing scheme.
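As a rough illustration of the quantities the abstract refers to, the sketch below shows how perplexity can be computed from the probabilities a model assigns to held-out symbols, and how words might be replaced by cluster labels before inference. It is not the authors' implementation: the function names, the toy word-to-class table, and the numbers in the usage note are hypothetical, and the clustering technique and smoothing scheme used in the paper are not reproduced here.

```python
import math


def perplexity(probs):
    """Perplexity from the probabilities a model assigns to each test symbol.

    Computed as exp(-(1/N) * sum(log p_i)); lower is better.
    """
    if not probs:
        raise ValueError("need at least one probability")
    log_sum = sum(math.log(p) for p in probs)
    return math.exp(-log_sum / len(probs))


def map_to_classes(sentence, word_to_class):
    """Replace each word by its cluster label (hypothetical mapping).

    Words with no entry are kept unchanged, i.e. treated as singleton classes.
    """
    return [word_to_class.get(word, word) for word in sentence]


def relative_reduction(baseline, improved):
    """Relative perplexity reduction of `improved` with respect to `baseline`."""
    return (baseline - improved) / baseline
```

For example, with a toy table `{"boston": "CITY", "denver": "CITY"}`, the sentence `["show", "flights", "to", "boston"]` maps to `["show", "flights", "to", "CITY"]`, so an automaton inferred on the mapped data shares statistics across all city names. A drop from a perplexity of 40 to 30 (illustrative values only) corresponds to a relative reduction of 0.25, the same scale as the 25% figure quoted above.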


Deterministic Finite Automaton, Alphabet Symbol, Probabilistic Automaton, Grammatical Inference, Prefix Tree
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 1998

Authors and Affiliations

  • Pierre Dupont (1)
  • Lin Chase (2)
  1. EURISE, Université Jean Monnet, Saint-Etienne Cedex, France
  2. LIMSI/CNRS, Orsay Cedex, France
