A Spectral Learning Algorithm for Finite State Transducers

  • Borja Balle
  • Ariadna Quattoni
  • Xavier Carreras
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6911)


Finite-State Transducers (FSTs) are a popular tool for modeling paired input-output sequences, and have numerous applications in real-world problems. Most training algorithms for learning FSTs rely on gradient-based or EM optimizations which can be computationally expensive and suffer from local optima issues. Recently, Hsu et al. [13] proposed a spectral method for learning Hidden Markov Models (HMMs) which is based on an Observable Operator Model (OOM) view of HMMs. Following this line of work we present a spectral algorithm to learn FSTs with strong PAC-style guarantees. To the best of our knowledge, ours is the first result of this type for FST learning. At its core, the algorithm is simple, and scalable to large data sets. We present experiments that validate the effectiveness of the algorithm on synthetic and real data.


Hide State Input Symbol Input Distribution Input Alphabet Output Symbol 
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.


Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.


  1. 1.
    Abe, N., Takeuchi, J., Warmuth, M.: Polynomial Learnability of Stochastic Rules with Respect to the KL-Divergence and Quadratic Distance. IEICE Transactions on Information and Systems 84(3), 299–316 (2001)Google Scholar
  2. 2.
    Bailly, R., Denis, F., Ralaivola, L.: Grammatical inference as a principal component analysis problem. In: Proc. ICML (2009)Google Scholar
  3. 3.
    Bernard, M., Janodet, J.-C., Sebban, M.: A discriminative model of stochastic edit distance in the form of a conditional transducer. In: Sakakibara, Y., Kobayashi, S., Sato, K., Nishino, T., Tomita, E. (eds.) ICGI 2006. LNCS (LNAI), vol. 4201, pp. 240–252. Springer, Heidelberg (2006)CrossRefGoogle Scholar
  4. 4.
    Carlyle, J.W., Paz, A.: Realization by stochastic finite automaton. Journal of Computer and System Sciences 5, 26–40 (1971)MathSciNetCrossRefzbMATHGoogle Scholar
  5. 5.
    Casacuberta, F.: Inference of finite-state transducers by using regular grammars and morphisms. In: Oliveira, A.L. (ed.) ICGI 2000. LNCS (LNAI), vol. 1891, pp. 1–14. Springer, Heidelberg (2000)CrossRefGoogle Scholar
  6. 6.
    Chang, J.T.: Full reconstruction of markov models on evolutionary trees: Identifiability and consistency. Mathematical Biosciences 137, 51–73 (1996)MathSciNetCrossRefzbMATHGoogle Scholar
  7. 7.
    Chen, S., Goodman, J.: An empirical study of smoothing techniques for language modeling. In: Proc. of ACL, pp. 310–318 (1996)Google Scholar
  8. 8.
    Clark, A.: Partially supervised learning of morphology with stochastic transducers. In: Proc. of NLPRS, pp. 341–348 (2001)Google Scholar
  9. 9.
    Clark, A., Costa Florêncio, C., Watkins, C.: Languages as hyperplanes: grammatical inference with string kernels. Machine Learning, 1–23 (2010)Google Scholar
  10. 10.
    Eisner, J.: Parameter estimation for probabilistic finite-state transducers. In: Proc. of ACL, pp. 1–8 (2002)Google Scholar
  11. 11.
    Fliess, M.: Matrices de Hankel. Journal de Mathematiques Pures et Appliquees 53, 197–222 (1974)MathSciNetzbMATHGoogle Scholar
  12. 12.
    Haardt, M., Nossek, J.A.: Simultaneous schur decomposition of several nonsymmetric matrices to achieve automatic pairing in multidimensional harmonic retrieval problems. IEEE Transactions on Signal Processing 46(1) (1998)Google Scholar
  13. 13.
    Hsu, D., Kakade, S.M., Zhang, T.: A spectral algorithm for learning hidden markov models. In: Proc. of COLT (2009)Google Scholar
  14. 14.
    Jaeger, H.: Observable operator models for discrete stochastic time series. Neural Computation 12, 1371–1398 (2000)CrossRefGoogle Scholar
  15. 15.
    Jelinek, F.: Statistical Methods for Speech Recognition (Language, Speech, and Communication). MIT Press, Cambridge (1998)Google Scholar
  16. 16.
    Knight, K., Graehl, J.: Machine transliteration. Computational Linguistics 24(4), 599–612 (1998)Google Scholar
  17. 17.
    Li, H., Kumaran, A., Pervouchine, V., Zhang, M.: Report of news 2009 machine transliteration shared task. In: Proc. Named Entities Workshop (2009)Google Scholar
  18. 18.
    Mossel, E., Roch, S.: Learning nonsingular phylogenies and hidden markov models. In: Proc. of STOC (2005)Google Scholar
  19. 19.
    Neuts, M.F.: Matrix-geometric solutions in stochastic models: an algorithmic approach. Johns Hopkins University Press, Baltimore (1981)zbMATHGoogle Scholar
  20. 20.
    Ng, A., Jordan, M.: On discriminative vs. generative classifiers: A comparison of logistic regression and naive bayes. In: NIPS (2002)Google Scholar
  21. 21.
    Ron, D., Singer, Y., Tishby, N.: On the learnability and usage of acyclic probabilistic finite automata. In: Proc. of COLT, pp. 31–40 (1995)Google Scholar
  22. 22.
    Schützenberger, M.: On the definition of a family of automata. Information and Control 4, 245–270 (1961)MathSciNetCrossRefzbMATHGoogle Scholar
  23. 23.
    Siddiqi, S.M., Boots, B., Gordon, G.J.: Reduced-Rank Hidden Markov Models. In: Proc. AISTATS, pp. 741–748 (2010)Google Scholar

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Borja Balle
    • 1
  • Ariadna Quattoni
    • 1
  • Xavier Carreras
    • 1
  1. 1.Universitat Politècnica de CatalunyaSpain

Personalised recommendations