Skip to main content

Factor Automata of Automata and Applications

  • Conference paper
Implementation and Application of Automata (CIAA 2007)

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 4783))

Included in the following conference series:

Abstract

An efficient data structure for representing the full index of a set of strings is the factor automaton, the minimal deterministic automaton representing the set of all factors or substrings of these strings. This paper presents a novel analysis of the size of the factor automaton of an automaton, that is the minimal deterministic automaton accepting the set of factors of a finite set of strings, itself represented by a finite automaton. It shows that the factor automaton of a set of strings U has at most 2|Q| − 2 states, where Q is the number of nodes of a prefix-tree representing the strings in U, a bound that significantly improves over 2 ||U||− 1, the bound given by Blumer et al. (1987), where ||U|| is the sum of the lengths of all strings in U. It also gives novel and general bounds for the size of the factor automaton of an automaton as a function of the size of the original automaton and the maximal length of a suffix shared by the strings it accepts. Our analysis suggests that the use of factor automata of automata can be practical for large-scale applications, a fact that is further supported by the results of our experiments applying factor automata to a music identification task with more than 15,000 songs.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Gusfield, D.: Algorithms on Strings, Trees, and Sequences. Cambridge University Press, Cambridge, UK (1997)

    MATH  Google Scholar 

  2. Crochemore, M., Rytter, W.: Jewels of Stringology. World Scientific, Singapore (2002)

    Google Scholar 

  3. Allauzen, C., Mohri, M., Saraclar, M.: General Indexation of Weighted Automata – Application to Spoken Utterance Retrieval. In: Proceedings of the Workshop on Interdisciplinary Approaches to Speech Indexing and Retrieval (HLT/NAACL 2004), Boston, Massachusetts, pp. 33–40 (2004)

    Google Scholar 

  4. Weinstein, E., Moreno, P.: Music Identification with Weighted Finite-State Transducers. In: Proceedings of ICASSP 2007, Honolulu, Hawaii (2007)

    Google Scholar 

  5. Blumer, A., Blumer, J., Haussler, D., Ehrenfeucht, A., Chen, M., Seiferas, J.: The smallest automaton recognizing the subwords of a text. Theoretical Computer Science 40, 31–55 (1985)

    Article  MATH  MathSciNet  Google Scholar 

  6. Crochemore, M.: Transducers and repetitions. Theoretical Computer Science 45, 63–86 (1986)

    Article  MATH  MathSciNet  Google Scholar 

  7. Blumer, A., Blumer, J., Ehrenfeucht, A., Haussler, D., McConnell, R.: Complete inverted files for efficient text retrieval and analysis. Journal of the ACM 34, 578–589 (1987)

    Article  MathSciNet  Google Scholar 

  8. Mohri, M.: Finite-state transducers in language and speech processing. Computational Linguistics 23, 269–311 (1997)

    MathSciNet  Google Scholar 

  9. Mohri, M.: Statistical Natural Language Processing. In: Lothaire, M. (ed.) Applied Combinatorics on Words, Cambridge University Press, Cambridge (2005)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Jan Holub Jan Žďárek

Rights and permissions

Reprints and permissions

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Mohri, M., Moreno, P., Weinstein, E. (2007). Factor Automata of Automata and Applications. In: Holub, J., Žďárek, J. (eds) Implementation and Application of Automata. CIAA 2007. Lecture Notes in Computer Science, vol 4783. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-76336-9_17

Download citation

  • DOI: https://doi.org/10.1007/978-3-540-76336-9_17

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-76335-2

  • Online ISBN: 978-3-540-76336-9

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics