Conclusions
Successful systems that classify texts and assign subject or classification codes rely on the words and phrases of the texts. In many text categorization settings the number of patterns is too large to acquire manually, so the classifier is trained on example texts. We investigated three aspects of text classifiers for categorizing magazine articles with broad subject descriptors: feature selection, learning algorithms, and improving the quality of the learned classifier by selecting and grouping the examples. Because the subject descriptors concern the broad topics of the texts, an initial feature selection that identifies the topic terms is important. An effective approach is to select important content words and proper names by term frequency, normalized by the maximum number of times any content term occurs in the text. Adding knowledge of the discourse structure to the term selection process is useful for certain text classes. Given the limited number of positive examples and the large number of text features in articles drawn from a variety of magazines, columns, and subject domains, the results of training a text classifier with the χ² algorithm are very satisfactory.
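The two quantities named above can be illustrated with a minimal sketch. It assumes whitespace-tokenized text, a tiny hypothetical stop list (`STOP_WORDS`), and the standard χ² statistic for a 2×2 term/class contingency table; the function names are illustrative, and the chapter's own χ² training algorithm may differ in its details.

```python
from collections import Counter

# Hypothetical stop list, for illustration only; a real system would use
# a fuller list or a part-of-speech filter to isolate content words.
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "to", "is"}


def normalized_term_frequencies(tokens):
    """Weight each content term by its count divided by the count of the
    most frequent content term in the same text (weights lie in (0, 1])."""
    counts = Counter(
        t.lower() for t in tokens if t.lower() not in STOP_WORDS
    )
    if not counts:
        return {}
    max_count = max(counts.values())
    return {term: n / max_count for term, n in counts.items()}


def chi_square(n11, n10, n01, n00):
    """Chi-square association between a term and a class.

    n11: texts in the class that contain the term
    n10: texts outside the class that contain the term
    n01: texts in the class that lack the term
    n00: texts outside the class that lack the term
    """
    n = n11 + n10 + n01 + n00
    denom = (n11 + n01) * (n10 + n00) * (n11 + n10) * (n01 + n00)
    if denom == 0:
        # A zero marginal means the statistic is undefined; treat as no signal.
        return 0.0
    return n * (n11 * n00 - n10 * n01) ** 2 / denom


if __name__ == "__main__":
    text = "the election and the election of the president".split()
    print(normalized_term_frequencies(text))
    print(chi_square(10, 0, 0, 10))
```

A term that occurs in every text of a class and in no other text receives the highest possible score for that table size, while a term spread evenly across classes scores zero, which is why such a score can rank candidate features when positive examples are scarce.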
Copyright information
© 2002 Kluwer Academic Publishers
About this chapter
Cite this chapter
(2002). The Assignment of Subject Descriptors to Magazine Articles. In: Automatic Indexing and Abstracting of Document Texts. The Information Retrieval Series, vol 6. Springer, Boston, MA. https://doi.org/10.1007/0-306-47017-9_10
DOI: https://doi.org/10.1007/0-306-47017-9_10
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-7923-7793-1
Online ISBN: 978-0-306-47017-2
eBook Packages: Springer Book Archive