Machine Learning, Volume 96, Issue 1–2, pp 99–127

Adaptively learning probabilistic deterministic automata from data streams



Markovian models with hidden state are widely used formalisms for modeling sequential phenomena. The learnability of these models has been well studied when the sample is given in batch mode, and algorithms with PAC-like learning guarantees exist for specific classes of models such as Probabilistic Deterministic Finite Automata (PDFA). Here we focus on PDFA and give an algorithm for inferring models in this class in the restrictive data stream scenario: unlike existing methods, our algorithm works incrementally and in one pass, uses memory sublinear in the stream length, and processes input items in amortized constant time. We also present extensions of the algorithm that (1) minimize the need for guessing parameters of the target distribution and (2) adapt to changes in the input distribution, relearning new models when needed. We provide rigorous PAC-like bounds for all of the above. Our algorithm makes key use of stream sketching techniques to reduce memory and processing time, and is modular in that it can use different tests for state equivalence and for change detection in the stream.
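To make the phrase "stream sketching techniques" concrete, the following is a minimal sketch of one such technique, the Space-Saving summary of Metwally et al. (2005), which appears in the reference list. The abstract does not say which sketch the algorithm uses or how it is configured, so the class name `SpaceSaving`, the `capacity` parameter, and the choice of this particular sketch are assumptions made purely for illustration. The idea is to keep a fixed number of counters, so memory does not grow with the stream length, while still tracking the most frequent items (for example, string prefixes).

```python
from typing import Dict, Hashable, List, Tuple


class SpaceSaving:
    """Illustrative Space-Saving summary with at most `capacity` counters.

    Hypothetical helper, not the paper's implementation.
    """

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.counts: Dict[Hashable, int] = {}  # estimated counts
        self.errors: Dict[Hashable, int] = {}  # max overestimate per item
        self.n = 0                             # items seen so far

    def update(self, item: Hashable) -> None:
        """Process one stream item in a single pass."""
        self.n += 1
        if item in self.counts:
            self.counts[item] += 1
        elif len(self.counts) < self.capacity:
            self.counts[item] = 1
            self.errors[item] = 0
        else:
            # Evict the item with the smallest counter; the newcomer
            # inherits that count as its possible overestimate.
            # (A linear scan keeps the sketch short; the original
            # data structure avoids it with linked buckets.)
            victim = min(self.counts, key=self.counts.get)
            floor = self.counts.pop(victim)
            del self.errors[victim]
            self.counts[item] = floor + 1
            self.errors[item] = floor

    def estimate(self, item: Hashable) -> int:
        """Upper bound on the item's count if monitored, else 0."""
        return self.counts.get(item, 0)

    def top(self, k: int) -> List[Tuple[Hashable, int]]:
        """The k monitored items with the largest estimated counts."""
        ranked = sorted(self.counts.items(), key=lambda kv: -kv[1])
        return ranked[:k]


if __name__ == "__main__":
    sketch = SpaceSaving(capacity=3)
    for symbol in ["a"] * 5 + ["b"] * 3 + ["c", "d", "e"]:
        sketch.update(symbol)
    print(sketch.top(2))
```

With three counters and eleven items, the two frequent symbols keep exact counts, while the rare ones compete for the last counter and carry a recorded error bound. The counters always sum to the number of items processed, which is the invariant behind the sketch's accuracy guarantees.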


PAC learning, Data streams, Probabilistic automata, PDFA, Stream sketches



This work was partially supported by MICINN projects TIN2011-27479-C04-03 (BASMATI) and TIN-2007-66523 (FORMALISM), by SGR2009-1428 (LARCA), and by the EU PASCAL2 Network of Excellence (FP7-ICT-216886). B. Balle is supported by an FPU fellowship (AP2008-02064) from the Spanish Ministry of Education.

A preliminary version of this work was presented at the 11th Intl. Conf. on Grammatical Inference (Balle et al. 2012a). Here we provide the missing proofs and discussions, and extend those results to streams that evolve over time. Balle et al. (2012a) also outlined an efficient state-similarity test based on bootstrapping. Because that test can be used independently of the specific PDFA learning method discussed here, and its full presentation and analysis are long, it will be published elsewhere.


  1. Aggarwal, C. (Ed.) (2007). Data streams: models and algorithms. Berlin: Springer.
  2. Balle, B., Castro, J., & Gavaldà, R. (2012a). Bootstrapping and learning pdfa in data streams. In International colloquium on grammatical inference (ICGI).
  3. Balle, B., Castro, J., & Gavaldà, R. (2012b). Learning probabilistic automata: a study in state distinguishability. Theoretical Computer Science.
  4. Bifet, A. (2010). Frontiers of artificial intelligence series and applications. Adaptive stream mining: pattern learning and mining from evolving data streams. Amsterdam: IOS Press.
  5. Bousquet, O., Boucheron, S., & Lugosi, G. (2004). Introduction to statistical learning theory. Advanced Lectures on Machine Learning.
  6. Carrasco, R. C., & Oncina, J. (1999). Learning deterministic regular grammars from stochastic samples in polynomial time. Informatique Théorique Et Applications, 33(1), 1–20.
  7. Castro, J., & Gavaldà, R. (2008). Towards feasible PAC-learning of probabilistic deterministic finite automata. In International colloquium on grammatical inference (ICGI).
  8. Clark, A., & Thollard, F. (2004). PAC-learnability of probabilistic deterministic finite state automata. Journal of Machine Learning Research.
  9. Dupont, P., Denis, F., & Esposito, Y. (2005). Links between probabilistic automata and hidden Markov models: probability distributions, learning models and induction algorithms. Pattern Recognition, 38, 1349–1371.
  10. Gama, J. (2010). Knowledge discovery from data streams. London: Taylor and Francis.
  11. Guttman, O., Vishwanathan, S. V. N., & Williamson, R. C. (2005). Learnability of probabilistic automata via oracles. In Conference on algorithmic learning theory (ALT).
  12. de la Higuera, C. (2010). Grammatical inference: learning automata and grammars. Cambridge: Cambridge University Press.
  13. Hsu, D., Kakade, S. M., & Zhang, T. (2009). A spectral algorithm for learning hidden Markov models. In Conference on learning theory (COLT).
  14. Kearns, M. J., Mansour, Y., Ron, D., Rubinfeld, R., Schapire, R. E., & Sellie, L. (1994). On the learnability of discrete distributions. In Symposium on theory of computation (STOC).
  15. Lin, X., & Zhang, Y. (2008). Aggregate computation over data streams. In Asian-pacific web conference (APWeb).
  16. Menascé, D. A., Almeida, V. A. F., Fonseca, R., & Mendes, M. A. (1999). A methodology for workload characterization of e-commerce sites. In Proceedings of the 1st ACM conference on electronic commerce EC’99 (pp. 119–128). New York: ACM. doi: 10.1145/336992.337024.
  17. Metwally, A., Agrawal, D., & Abbadi, A. (2005). Efficient computation of frequent and top-k elements in data streams. In International conference on database theory (ICDT).
  18. Muthukrishnan, S. (2005). Data streams: algorithms and applications. Foundations and Trends in Theoretical Computer Science.
  19. Palmer, N., & Goldberg, P. W. (2007). PAC-learnability of probabilistic deterministic finite state automata in terms of variation distance. Theoretical Computer Science.
  20. Palmer, N. J. (2008). Pattern classification via unsupervised learners. PhD thesis, University of Warwick.
  21. Ron, D., Singer, Y., & Tishby, N. (1998). On the learnability and usage of acyclic probabilistic finite automata. Journal of Computing Systems Science.
  22. Schmidt, J., & Kramer, S. (2012). Online induction of probabilistic real time automata. In IEEE International Conference on Data Mining (pp. 625–634). doi: 10.1109/ICDM.2012.121.
  23. Schmidt, J., Ansorge, S., & Kramer, S. (2012). Scalable induction of probabilistic real-time automata using maximum frequent pattern based clustering. In Proceedings of the twelfth SIAM international conference on data mining (pp. 272–283).
  24. Terwijn, S. (2002). On the learnability of hidden Markov models. In Intl. conf. on grammatical inference (ICGI).
  25. Vershynin, R. (2012). Introduction to the non-asymptotic analysis of random matrices. In Y. Eldar & G. Kutyniok (Eds.), Compressed sensing, theory and applications, CUP, Chap. 5.
  26. Vidal, E., Thollard, F., de la Higuera, C., Casacuberta, F., & Carrasco, R. C. (2005a). Probabilistic finite-state machines - part I. IEEE Transactions on Pattern Analysis and Machine Intelligence.
  27. Vidal, E., Thollard, F., de la Higuera, C., Casacuberta, F., & Carrasco, R. C. (2005b). Probabilistic finite-state machines - part II. IEEE Transactions on Pattern Analysis and Machine Intelligence.

Copyright information

© The Author(s) 2013

Authors and Affiliations

  1. LARCA research group, Universitat Politècnica de Catalunya, Barcelona, Spain
