Abstract
In this chapter, the background material for studying text classification problems is presented, along with the notation used throughout the book. After the problem is described, a summary of typical applications is given and document representation issues are introduced, followed by commonly used pre-processing steps, including dimensionality reduction. Next, state-of-the-art classifiers for text classification are briefly reviewed together with their current achievements, followed by some widely accepted performance evaluation metrics and benchmarks.
To determine the influence and relative importance of pre-processing methods on text classification performance, an empirical study was carried out comparing dimensionality reduction techniques using standard learning machines and benchmarks. The results and analysis of this study are reported, and finally the conclusions on the relative success of the various pre-processing, learning and evaluation approaches are presented.
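The abstract names document representation, pre-processing with dimensionality reduction, and evaluation metrics without spelling them out. As a minimal sketch, assuming a bag-of-words term-frequency representation, a small hypothetical stop-word list, document-frequency thresholding as the dimensionality reduction step, and precision, recall and F1 as the evaluation metric, such a pipeline might look like the Python below. It is an illustration under those assumptions, not the chapter's own method; the function names and the `min_df` parameter are hypothetical.

```python
import re
from collections import Counter

# Hypothetical stop-word list, kept tiny for illustration only.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "for"}


def tokenize(text):
    """Pre-processing: lowercase, keep alphabetic tokens, drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def build_vocabulary(docs, min_df=2):
    """Dimensionality reduction by document-frequency thresholding:
    keep only terms that appear in at least `min_df` documents."""
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))  # count each term once per document
    return sorted(term for term, count in df.items() if count >= min_df)


def vectorize(doc, vocabulary):
    """Bag-of-words representation: term-frequency vector over the vocabulary."""
    counts = Counter(tokenize(doc))
    return [counts[term] for term in vocabulary]


def precision_recall_f1(y_true, y_pred, positive=1):
    """Binary precision, recall and F1 (harmonic mean of precision and recall)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


if __name__ == "__main__":
    docs = [
        "The stock market rose in trading",
        "Market trading of stock fell",
        "The team won the football match",
    ]
    vocab = build_vocabulary(docs, min_df=2)
    print(vocab)                        # ['market', 'stock', 'trading']
    print(vectorize(docs[0], vocab))    # [1, 1, 1]
    print(vectorize(docs[2], vocab))    # [0, 0, 0]
    print(precision_recall_f1([1, 1, 0, 0], [1, 0, 1, 0]))  # (0.5, 0.5, 0.5)
```

With `min_df=2`, terms that occur in only one document (such as "rose" or "football") are discarded, shrinking the feature space at the cost of possibly losing discriminative but rare terms; this trade-off is the kind of effect an empirical comparison of dimensionality reduction techniques would measure.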
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Silva, C., Ribeiro, B. (2010). Background on Text Classification. In: Inductive Inference for Large Scale Text Classification. Studies in Computational Intelligence, vol 255. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04533-2_1
DOI: https://doi.org/10.1007/978-3-642-04533-2_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-04532-5
Online ISBN: 978-3-642-04533-2
eBook Packages: Engineering, Engineering (R0)