A General Weighted Grammar Library

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 3317)

Abstract

We present a general weighted grammar software library, the GRM Library, that can be used in a variety of applications in text, speech, and biosequence processing. The underlying algorithms were designed to support a wide variety of semirings and the representation and use of very large grammars and automata of several hundred million rules or transitions. We describe several algorithms and utilities of this library and point out in each case their application to several text and speech processing tasks.
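
To illustrate what designing algorithms to "support a wide variety of semirings" can mean in practice, here is a minimal sketch in Python. It is not the GRM Library's API: the `Semiring` class, the `TROPICAL` and `PROBABILITY` instances, and the helper functions are hypothetical names introduced only for this example. The point is that the same routine can combine weights along a path (semiring times) and across alternative paths (semiring plus) without knowing which semiring it is working in.

```python
import math
from dataclasses import dataclass
from typing import Callable, Iterable, Sequence


@dataclass(frozen=True)
class Semiring:
    """Hypothetical container for semiring operations (illustration only)."""
    plus: Callable[[float, float], float]   # combines alternative paths
    times: Callable[[float, float], float]  # combines weights along one path
    zero: float                             # identity for plus
    one: float                              # identity for times


# Tropical semiring: (min, +), commonly used for shortest-path style costs.
TROPICAL = Semiring(plus=min, times=lambda a, b: a + b, zero=math.inf, one=0.0)

# Probability semiring: (+, *), used when weights are probabilities.
PROBABILITY = Semiring(plus=lambda a, b: a + b, times=lambda a, b: a * b,
                       zero=0.0, one=1.0)


def path_weight(sr: Semiring, weights: Iterable[float]) -> float:
    """Multiply (in the semiring sense) the weights along a single path."""
    total = sr.one
    for w in weights:
        total = sr.times(total, w)
    return total


def total_weight(sr: Semiring, paths: Iterable[Sequence[float]]) -> float:
    """Sum (in the semiring sense) the weights of several alternative paths."""
    total = sr.zero
    for p in paths:
        total = sr.plus(total, path_weight(sr, p))
    return total


if __name__ == "__main__":
    # Two alternative paths, each given as a list of transition weights.
    print(total_weight(TROPICAL, [[1.0, 2.0], [0.5, 0.5]]))
    print(total_weight(PROBABILITY, [[0.5, 0.5], [0.25, 1.0]]))
```

Swapping `TROPICAL` for `PROBABILITY` changes the meaning of the result (cheapest path cost versus total probability mass) while `total_weight` itself stays unchanged, which is the kind of generality the abstract refers to.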

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Allauzen, C., Mohri, M., Roark, B. (2005). A General Weighted Grammar Library. In: Domaratzki, M., Okhotin, A., Salomaa, K., Yu, S. (eds) Implementation and Application of Automata. CIAA 2004. Lecture Notes in Computer Science, vol 3317. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30500-2_3

  • DOI: https://doi.org/10.1007/978-3-540-30500-2_3

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-24318-2

  • Online ISBN: 978-3-540-30500-2

  • eBook Packages: Computer Science, Computer Science (R0)