This chapter first analyzes the general relation between linguistic analysis and computational method, using automatic word form recognition as a familiar example. This example exhibits a number of properties that are methodologically characteristic of all components of grammar. We then present methods for investigating the frequency distribution of words in natural language.
Keywords: Word Form, Corpus Analysis, Frequency List, British National Corpus, Grammar System
- 6. See O. Jespersen 1921, p. 341–346.
- 13. See K. Hess, J. Brustkern and W. Lenders 1983.
- 14. Cf. H. Bergenholtz 1989, D. Biber 1994, N. Oostdijk and P. de Haan (eds.) 1994.
- 24. The consequences of the tagset choice on the results of the corpus analysis are mentioned in S. Greenbaum and N. Yibin 1994, p. 34.
- 25. The use of HMMs for the grammatical tagging of corpora is described in, e.g., G. Leech, R. Garside and E. Atwell 1983, I. Marshall 1983, S. DeRose 1988, R. Sharman 1990, P. Brown, V. Della Pietra, et al. 1991. See also K. Church and L.R. Mercer 1993.
- 26. Meanwhile, the tagged BNC-lists have been removed from the web.
- 27. Unfortunately, neither G. Leech 1995 nor L. Burnard 1995 specifies what exactly constitutes an error in tagging the BNC. A new project to improve the tagger was started in June 1995, however. It is called ‘The British National Corpus Tag Enhancement Project’, and its results were originally scheduled to be made available in September 1996.