Corpus analysis

Hausser, Roland

doi:10.1007/978-3-662-04337-0_16

Abstract

This chapter first analyzes the general relation between linguistic analysis and computational method, using automatic word form recognition as a familiar example. This example exhibits a number of properties which are methodologically characteristic of all components of grammar. We then show methods for investigating the frequency distribution of words in natural language.
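One simple way to look at the frequency distribution of word forms is a rank/frequency table. The sketch below is only an illustration and is not taken from the chapter: the function name `rank_frequency`, the regular-expression tokenizer (lowercased alphabetic strings), and the sample sentence are all assumptions made for the example.

```python
from collections import Counter
import re


def rank_frequency(text):
    """Return (rank, word_form, count) triples, most frequent first."""
    # Assumed tokenizer: lowercase the text and keep runs of a-z only.
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(tokens)
    # Sort by descending count, then alphabetically so ties are deterministic.
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(rank, word, count)
            for rank, (word, count) in enumerate(ordered, start=1)]


# Hypothetical sample text, used only to exercise the function.
sample = "the cat saw the dog and the dog saw the cat"
for rank, word, count in rank_frequency(sample):
    print(rank, word, count)
```

On a real corpus the tokenizer would need to be replaced by proper word form recognition; the table itself only shows how often each surface form occurs and at which rank.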


References

See O. Jespersen 1921, pp. 341–346.
See K. Hess, J. Brustkern and W. Lenders 1983.
Cf. H. Bergenholtz 1989, D. Biber 1994, N. Oostdijk and P. de Haan (eds.) 1994.
The consequences of the tagset choice on the results of the corpus analysis are mentioned in S. Greenbaum and N. Yibin 1994, p. 34.
The use of HMMs for the grammatical tagging of corpora is described in, e.g., G. Leech, R. Garside and E. Atwell 1983, I. Marshall 1983, S. DeRose 1988, R. Sharman 1990, P. Brown, V. Della Pietra, et al. 1991. See also K. Church and L.R. Mercer 1993.
Meanwhile, the tagged BNC-lists have been removed from the web.
Unfortunately, neither G. Leech 1995 nor L. Burnard 1995 specifies what exactly constitutes an error in tagging the BNC. However, a new project to improve the tagger was started in June 1995. It is called ‘The British National Corpus Tag Enhancement Project’, and its results were originally scheduled to be made available in September 1996.

Author information

Authors and Affiliations

Friedrich Alexander University Erlangen Nürnberg, Bismarckstr. 12, 91054, Erlangen, Germany
Roland Hausser (Professor of Computational Linguistics)


About this chapter

Cite this chapter

Hausser, R. (2001). Corpus analysis. In: Foundations of Computational Linguistics. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-04337-0_16


DOI: https://doi.org/10.1007/978-3-662-04337-0_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-07626-8
Online ISBN: 978-3-662-04337-0
eBook Packages: Springer Book Archive
