Part-of-Speech Tags and ICE Text Classification

Text Genres and Registers: The Computation of Linguistic Features

Abstract

Part-of-speech (POS) tags have been employed in automatic genre classification because they do not ‘reflect the topic of the document, but rather the type of text used in the document’ and because their distribution has been observed to vary across genres. The current study introduces a new set of linguistically fine-grained POS tags, generated by AUTASYS, for automatic genre classification. The experiment was designed to investigate the impact of the proposed feature set in comparison with two alternatives: word unigrams as a bag of words (BOW) and an impoverished POS tag set. Machine-learning tools were used to evaluate classification performance in terms of F-score. The British component of the International Corpus of English (ICE-GB) served as the source of different text genres. Ten genre classification tasks were defined on the basis of the existing ICE-GB categories, grouped at different levels of granularity. The results show that linguistically rich POS tags, used as discriminative features, outperform BOW in fine-grained genre classification. They further indicate that the superior performance is due to the rich linguistic information, since the impoverished tag set yielded worse classification results.

This study was originally presented at the 24th Pacific Asia Conference on Language, Information and Computation, Sendai, Japan, 4–7 November 2010.
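
The three feature sets compared in the abstract, and the F-score used to evaluate them, can be illustrated with a minimal sketch. It is not the chapter's implementation. The function names, the toy document, and the tag strings (written in the style `N(com,sing)`) are assumptions made for illustration, and the impoverished tag set is approximated here by keeping only the part of each tag before the first parenthesis.

```python
from collections import Counter


def bow_features(tagged_doc):
    """Relative frequency of each lower-cased word (bag of words)."""
    counts = Counter(word.lower() for word, _ in tagged_doc)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def pos_features(tagged_doc, fine_grained=True):
    """Relative frequency of each POS tag in the document.

    With fine_grained=False, each tag is cut at the first '(' so that
    only the word class remains (an assumed stand-in for an
    impoverished tag set).
    """
    tags = [tag if fine_grained else tag.split("(", 1)[0]
            for _, tag in tagged_doc]
    counts = Counter(tags)
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}


def f_score(gold, predicted, label):
    """F-score (harmonic mean of precision and recall) for one genre label."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == label and p == label)
    fp = sum(1 for g, p in pairs if g != label and p == label)
    fn = sum(1 for g, p in pairs if g == label and p != label)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy document: a list of (word, tag) pairs.
doc = [("The", "ART(def)"), ("cat", "N(com,sing)"),
       ("sat", "V(intr,past)"), ("the", "ART(def)")]

print(bow_features(doc))
print(pos_features(doc))
print(pos_features(doc, fine_grained=False))
print(f_score(["a", "a", "b", "b"], ["a", "b", "b", "b"], "a"))
```

In a full experiment, each document's feature dictionary would be passed to a classifier, and the per-genre F-scores compared across the three feature sets.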

Author information

Corresponding author

Correspondence to Alex Chengyu Fang.

Copyright information

© 2015 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Fang, A., Cao, J. (2015). Part-of-Speech Tags and ICE Text Classification. In: Text Genres and Registers: The Computation of Linguistic Features. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-45100-7_5
