Speeding up Parsing of Biological Context-Free Grammars
Grammars have been shown to be a very useful way to model families of biological sequences. As both the quantity of biological sequences and the complexity of biological grammars increase, generic and efficient parsing methods are needed. We consider two parsers for context-free grammars, a depth-first top-down parser and a chart parser, and we analyse and compare them, both theoretically and empirically, with respect to biological data. The theoretical comparison is based on a common feature of biological grammars, the gap: an element of a grammar designed to match any subsequence of the parsed string. The empirical comparison is based on grammars and sequences used by the bioinformatics community. Our conclusions are that (1) the chart parsing algorithm is significantly faster than the depth-first top-down algorithm, (2) special treatment of gaps in the algorithms is useful, and (3) when using parsers not optimised for gaps, the way the grammar encodes gaps has to be chosen carefully to prevent large increases in running time.
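To make the notion of a gap concrete, the following Python sketch recognises a toy grammar containing gaps in two ways: by plain depth-first top-down search, and by the same search with a table of already computed sub-results. The table is only a rough stand-in for the tabulation idea behind chart parsing; it is not the chart parser studied in the paper. The `GAP` symbol, the grammar, the function `recognize` and the test sequence are all hypothetical and chosen for illustration, and the sketch does not handle left-recursive rules.

```python
GAP = "*"  # hypothetical symbol: matches any (possibly empty) substring

# Hypothetical toy grammar: a motif "ac ... gt" anywhere in the sequence.
GRAMMAR = {
    "S": [(GAP, "M", GAP)],
    "M": [("ac", GAP, "gt")],
}


def recognize(grammar, text, memoize):
    """Return (accepted, calls), where calls counts evaluated (symbol, start) pairs."""
    calls = 0
    table = {}  # (symbol, start) -> list of end positions, used only if memoize

    def ends(symbol, start):
        """List the positions where `symbol` can end when it starts at `start`."""
        nonlocal calls
        if memoize and (symbol, start) in table:
            return table[(symbol, start)]
        calls += 1
        if symbol == GAP:
            # A gap may consume any number of characters, including none.
            found = list(range(start, len(text) + 1))
        elif symbol not in grammar:
            # Terminal: a literal string that must occur at `start`.
            found = [start + len(symbol)] if text.startswith(symbol, start) else []
        else:
            found = []
            for alternative in grammar[symbol]:
                positions = [start]
                for part in alternative:
                    # Without the table, duplicate positions are explored
                    # again, as a backtracking parser would do.
                    positions = [e for p in positions for e in ends(part, p)]
                found.extend(positions)
        if memoize:
            found = sorted(set(found))
            table[(symbol, start)] = found
        return found

    accepted = len(text) in ends("S", 0)
    return accepted, calls


if __name__ == "__main__":
    sequence = "acacgt"
    print("plain top-down:", recognize(GRAMMAR, sequence, memoize=False))
    print("with table:    ", recognize(GRAMMAR, sequence, memoize=True))
```

Both variants accept the same strings. On this input the tabulated variant evaluates fewer (symbol, start) pairs, because the gaps make several derivations reach the same positions.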
Keywords: Left Part, Recursive Call, Biological Sequence, Parsing Algorithm, Prosite Pattern