Enumerated Automata Implementation of String Dictionaries

  • Robert Bakarić
  • Damir Korenčić
  • Strahil Ristov
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11601)


Over the last decade, considerable effort has been invested in research on implementing string dictionaries. A string dictionary is a data structure that bijectively maps a set of strings to a set of integers and that is used in various index-based applications. A recent paper [18] can be regarded as a reference work on string dictionary implementations. Although very comprehensive, [18] does not cover the implementation of a string dictionary with an enumerated deterministic finite automaton, a data structure naturally suited for this purpose. We compare the results for a state-of-the-art compressed enumerated automaton with those presented in [18], both on the same collection of data sets and on a collection of natural language word lists. We show that our string dictionary implementation is competitive for different types of data, especially for large sets of strings and for strings that are more similar to one another. In particular, our method presents itself as a prominent solution for storing DNA motifs and words of inflected natural languages. We provide the code used for the experiments.
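To make the idea of an enumerated automaton concrete, the following is a minimal sketch, not the compressed implementation evaluated in the paper. It assumes a plain, unminimized trie as the automaton and stores in every state the number of strings accepted from it, which is enough to map a string to its lexicographic rank and back. The names `Node`, `EnumeratedTrie`, `locate` and `extract` are chosen here for illustration only.

```python
from typing import Dict, Iterable, List, Optional


class Node:
    """One state of the automaton (here simply a trie node)."""

    def __init__(self) -> None:
        self.children: Dict[str, "Node"] = {}
        self.final = False  # True if a stored string ends in this state
        self.count = 0      # number of stored strings accepted from this state


class EnumeratedTrie:
    """Illustrative string dictionary: string <-> 1-based lexicographic ID."""

    def __init__(self, words: Iterable[str]) -> None:
        self.root = Node()
        for word in set(words):
            node = self.root
            for ch in word:
                node = node.children.setdefault(ch, Node())
            node.final = True
        self._annotate(self.root)

    def _annotate(self, node: Node) -> int:
        # Enumeration step: count the strings reachable from each state.
        node.count = int(node.final) + sum(
            self._annotate(child) for child in node.children.values()
        )
        return node.count

    def locate(self, word: str) -> Optional[int]:
        """Return the ID of `word`, or None if it is not stored."""
        node, rank = self.root, 0
        for ch in word:
            if ch not in node.children:
                return None
            if node.final:
                rank += 1  # a proper prefix of `word` sorts before it
            for label in sorted(node.children):
                if label == ch:
                    break
                rank += node.children[label].count  # skip smaller branches
            node = node.children[ch]
        return rank + 1 if node.final else None

    def extract(self, ident: int) -> Optional[str]:
        """Return the string with the given ID, or None if out of range."""
        if not 1 <= ident <= self.root.count:
            return None
        node, remaining, out = self.root, ident, []  # type: Node, int, List[str]
        while True:
            if node.final:
                if remaining == 1:
                    return "".join(out)
                remaining -= 1
            for label in sorted(node.children):
                child = node.children[label]
                if remaining <= child.count:
                    out.append(label)
                    node = child
                    break
                remaining -= child.count


if __name__ == "__main__":
    d = EnumeratedTrie(["tact", "tag", "tags", "act", "acts"])
    for w in ["act", "acts", "tact", "tag", "tags"]:
        i = d.locate(w)
        print(w, i, d.extract(i))
```

Both directions walk a single path through the automaton, so the mapping is bijective by construction. A practical implementation would additionally merge equivalent states and compress the transition table, which this sketch deliberately leaves out.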


String dictionary, Enumerated DFA, Recursive automaton, LZ trie, DNA indexing



We are grateful to Miguel Martínez-Prieto for kindly providing data sets used in [18].


  1. Arroyuelo, D., Cánovas, R., Navarro, G., Sadakane, K.: Succinct trees in practice. In: Blelloch, G.E., Halperin, D. (eds.) ALENEX 2010, pp. 84–97. SIAM, Philadelphia (2010)
  2. Arz, J., Fischer, J.: LZ-compressed string dictionaries. In: DCC 2014, pp. 322–331. IEEE (2014)
  3. Benoit, D., Demaine, E.D., Munro, J.I., Raman, R., Raman, V., Rao, S.S.: Representing trees of higher degree. Algorithmica 43(4), 275–292 (2005)
  4. Brisaboa, N.R., Cánovas, R., Claude, F., Martínez-Prieto, M.A., Navarro, G.: Compressed string dictionaries. In: Pardalos, P.M., Rebennack, S. (eds.) SEA 2011. LNCS, vol. 6630, pp. 136–147. Springer, Heidelberg (2011)
  5. Daciuk, J., van Noord, G.: Finite automata for compact representation of language models in NLP. In: Watson, B.W., Wood, D. (eds.) CIAA 2001. LNCS, vol. 2494, pp. 65–73. Springer, Heidelberg (2002)
  6. Daciuk, J., van Noord, G.: Finite automata for compact representation of tuple dictionaries. Theor. Comput. Sci. 313(1), 45–56 (2004)
  7. Daciuk, J.: Experiments with automata compression. In: Yu, S., Păun, A. (eds.) CIAA 2000. LNCS, vol. 2088, pp. 105–112. Springer, Heidelberg (2001)
  8. Daciuk, J., Piskorski, J.: Gazetteer compression technique based on substructure recognition. In: Kłopotek, M.A., Wierzchoń, S.T., Trojanowski, K. (eds.) IIPWM 2006. AINSC, vol. 35, pp. 87–95. Springer, Heidelberg (2006)
  9. Daciuk, J., Piskorski, J., Ristov, S.: Natural language dictionaries implemented as finite automata. In: Martín-Vide, C. (ed.) Mathematics, Computing, Language, and Life: Frontiers in Mathematical Linguistics and Language Theory, vol. 2, pp. 133–204. World Scientific & Imperial College Press, London (2010)
  10. Daciuk, J., Weiss, D.: Smaller representation of finite state automata. In: Bouchou-Markhoff, B., Caron, P., Champarnaud, J.-M., Maurel, D. (eds.) CIAA 2011. LNCS, vol. 6807, pp. 118–129. Springer, Heidelberg (2011)
  11. Ferragina, P., Grossi, R., Gupta, A., Shah, R., Vitter, J.S.: On searching compressed string collections cache-obliviously. In: PODS 2008, pp. 181–190. ACM, New York (2008)
  12. Ferragina, P., Luccio, F., Manzini, G., Muthukrishnan, S.: Structuring labeled trees for optimal succinctness, and beyond. In: FOCS 2005, pp. 184–196. IEEE Computer Society (2005)
  13. Ferragina, P., Venturini, R.: The compressed permuterm index. ACM Trans. Algorithms 7(1), 10:1–10:21 (2010)
  14. Georgiev, K.: Compression of minimal acyclic deterministic FSAs preserving the linear accepting complexity. In: Mihov, S., Schulz, K.U. (eds.) Proceedings Workshop on Finite-State Techniques and Approximate Search 2007, pp. 7–13 (2007)
  15. Grossi, R., Ottaviano, G.: Fast compressed tries through path decompositions. ACM J. Exp. Algorithmics 19(1), 3.4:1.1–3.4:1.20 (2014)
  16. Larsson, N.J., Moffat, A.: Off-line dictionary-based compression. Proc. IEEE 88(11), 1722–1732 (2000)
  17. Lucchesi, C.L., Kowaltowski, T.: Applications of finite automata representing large vocabularies. Softw. Pract. Exp. 23(1), 15–30 (1993)
  18. Martínez-Prieto, M.A., Brisaboa, N., Cánovas, R., Claude, F., Navarro, G.: Practical compressed string dictionaries. Inf. Syst. 56(C), 73–108 (2016)
  19. Navarro, G.: Indexing text using the Ziv-Lempel trie. J. Discret. Algorithms 2(1), 87–114 (2004)
  20. Raman, R., Raman, V., Rao, S.S.: Succinct indexable dictionaries with applications to encoding \(k\)-ary trees and multisets. In: Eppstein, D. (ed.) Proceedings of SODA 2002, pp. 233–242. ACM/SIAM, Philadelphia (2002)
  21. Revuz, D.: Dictionnaires et lexiques: méthodes et algorithmes. Ph.D. thesis, Institut Blaise Pascal, Paris, France (1991)
  22. Ristov, S.: LZ trie and dictionary compression. Softw. Pract. Exp. 35(5), 445–465 (2005)
  23. Ristov, S., Korenčić, D.: Fast construction of space-optimized recursive automaton. Softw. Pract. Exp. 45(6), 783–799 (2014)
  24. Ristov, S., Laporte, E.: Ziv Lempel compression of huge natural language data tries using suffix arrays. In: Crochemore, M., Paterson, M. (eds.) CPM 1999. LNCS, vol. 1645, pp. 196–211. Springer, Heidelberg (1999)
  25. Skibiński, P., Grabowski, S., Deorowicz, S.: Revisiting dictionary-based compression. Softw. Pract. Exp. 35(15), 1455–1476 (2005)
  26. Tounsi, L., Bouchou, B., Maurel, D.: A compression method for natural language automata. In: FSMNLP 2008, pp. 146–157. IOS Press, Amsterdam (2009)

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Robert Bakarić (1)
  • Damir Korenčić (1)
  • Strahil Ristov (1)
  1. Department of Electronics, Ruđer Bošković Institute, Zagreb, Croatia
