Abstract
This chapter discusses the various types of corpora and gives a sense of how words behave within them. Individual words in a corpus are explored quantitatively using frequency and information content measures. Co-occurrences of words, called collocations, are explored quantitatively using point-wise mutual information and other measures. Concordancers, tools for viewing words in their immediate context within a corpus, are introduced for qualitative exploration of corpora. Experiment: comparing word frequencies between domain-specific corpora.
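The measures named in the abstract can be illustrated with a minimal sketch. It is not the chapter's own implementation: the function names (`word_stats`, `pmi`, `concordance`) are hypothetical, the corpus is assumed to be already tokenized, information content is taken as -log2 P(w), and a collocation is assumed here to be a pair of adjacent words.

```python
import math
from collections import Counter


def word_stats(tokens):
    """Return per-word counts and information content, IC(w) = -log2 P(w)."""
    counts = Counter(tokens)
    total = len(tokens)
    ic = {w: -math.log2(c / total) for w, c in counts.items()}
    return counts, ic


def pmi(tokens, w1, w2):
    """Point-wise mutual information of the adjacent pair (w1, w2).

    PMI = log2( P(w1, w2) / (P(w1) * P(w2)) ), with probabilities
    estimated by relative frequency (an assumption of this sketch).
    """
    counts = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_words = len(tokens)
    n_bigrams = len(tokens) - 1
    if bigrams[(w1, w2)] == 0:
        return float("-inf")  # the pair never occurs together
    p_xy = bigrams[(w1, w2)] / n_bigrams
    p_x = counts[w1] / n_words
    p_y = counts[w2] / n_words
    return math.log2(p_xy / (p_x * p_y))


def concordance(tokens, keyword, width=3):
    """Return one keyword-in-context line per occurrence of keyword."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = tokens[max(0, i - width):i]
            right = tokens[i + 1:i + 1 + width]
            lines.append(" ".join(left + ["[" + tok + "]"] + right))
    return lines


# Toy corpus, invented for illustration only.
tokens = "the cat sat on the mat the cat ran".split()
counts, ic = word_stats(tokens)
print(counts.most_common(2))
print(pmi(tokens, "the", "cat"))
print(concordance(tokens, "sat", width=2))
```

Running the same `word_stats` function on two domain-specific corpora and comparing the resulting counts would be one simple way to approach the frequency comparison mentioned as the chapter's experiment.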
Copyright information
© 2016 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Barrière, C. (2016). Exploring Corpora. In: Natural Language Understanding in a Semantic Web Context. Springer, Cham. https://doi.org/10.1007/978-3-319-41337-2_5
DOI: https://doi.org/10.1007/978-3-319-41337-2_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-41335-8
Online ISBN: 978-3-319-41337-2
eBook Packages: Computer Science, Computer Science (R0)