Compressed Storage of Sparse Finite-State Transducers
This paper presents an eclectic approach to compressing weighted finite-state automata and transducers with minimal impact on performance. The approach is eclectic in that it combines several complementary methods: row-indexed storage of sparse matrices, dictionary compression, bit manipulation, and lossless omission of data. The compression rate is over 83% relative to the current Bell Labs finite-state library.
Keywords: Destination State; Compression Rate; Sparse Matrices; Input Symbol; Output Symbol
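The abstract names row-indexed storage of sparse matrices, dictionary compression, and bit manipulation but gives no details. The Python sketch below shows one generic way these techniques can be combined for a transducer's transition table. It is not the paper's data layout. The class name `RowIndexedFST`, the arc tuple format, and the decision to dictionary-compress (output, weight) pairs are all assumptions made for illustration. Loosely in the spirit of lossless omission, the source state of each arc is never stored, because the arc's position within its row determines it.

```python
from bisect import bisect_left
from typing import Dict, List, Optional, Tuple

# Hypothetical arc format: (source, input, output, weight, destination).
Arc = Tuple[int, int, int, float, int]


class RowIndexedFST:
    """Illustrative sketch (not the paper's layout) of compressed arc storage.

    - row_start[s] .. row_start[s + 1] delimits the arcs leaving state s,
      so the source state is implied by position and never stored per arc.
    - Arcs within a row are sorted by input symbol for binary search.
    - Each distinct (output, weight) pair is stored once in a dictionary.
    - Destination state and dictionary index share one packed integer.
    """

    def __init__(self, num_states: int, arcs: List[Arc]) -> None:
        arcs = sorted(arcs, key=lambda a: (a[0], a[1]))
        # Dictionary compression: assign each distinct (output, weight) an index.
        self.labels: List[Tuple[int, float]] = []
        label_index: Dict[Tuple[int, float], int] = {}
        for _, _, out, w, _ in arcs:
            if (out, w) not in label_index:
                label_index[(out, w)] = len(self.labels)
                self.labels.append((out, w))
        # Fewest bits that can hold any label index (at least one bit).
        self.label_bits = max(1, (len(self.labels) - 1).bit_length())
        self.label_mask = (1 << self.label_bits) - 1
        # Row index: count arcs per source state, then take prefix sums.
        self.row_start = [0] * (num_states + 1)
        for src, *_ in arcs:
            self.row_start[src + 1] += 1
        for s in range(num_states):
            self.row_start[s + 1] += self.row_start[s]
        self.inputs = [a[1] for a in arcs]
        # Bit packing: destination in the high bits, label index in the low bits.
        self.packed = [(dst << self.label_bits) | label_index[(out, w)]
                       for _, _, out, w, dst in arcs]

    def lookup(self, state: int, symbol: int) -> Optional[Tuple[int, float, int]]:
        """Return (output, weight, destination) of the first arc on symbol, else None."""
        lo, hi = self.row_start[state], self.row_start[state + 1]
        i = bisect_left(self.inputs, symbol, lo, hi)  # binary search within the row
        if i == hi or self.inputs[i] != symbol:
            return None
        word = self.packed[i]
        out, w = self.labels[word & self.label_mask]
        return out, w, word >> self.label_bits


# Small usage example with made-up arcs over three states.
arcs = [
    (0, 1, 10, 0.5, 1),
    (0, 2, 11, 0.25, 2),
    (1, 1, 10, 0.5, 2),
    (2, 3, 12, 0.0, 0),
]
fst = RowIndexedFST(3, arcs)
print(fst.lookup(0, 2))  # (11, 0.25, 2)
print(fst.lookup(1, 3))  # None
```

Because `lookup` returns only the first matching arc, a nondeterministic transducer would instead need to scan every arc in the row that carries the same input symbol. The per-arc cost here is one input symbol plus one packed integer. Repeated (output, weight) pairs, such as `(10, 0.5)` in the example, are stored only once.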