Background on Text Classification
In this chapter background material for studying text classification problems is presented along with the notation used throughout the book. After describing the problem, a summary of typical applications is given and document representation issues are introduced followed by commonly used pre-processing steps, including dimensionality reduction. Next, state-of-the-art classifiers for text classification are briefly reviewed with current achievements, followed by some widely accepted performance evaluation metrics and benchmarks.
To determine the influence and relative importance of pre-processing methods in text classification performance an empirical study was carried out to compare dimensionality reduction techniques, using standard learning machines and benchmarks. Results and analysis of this study are reported and finally the conclusions on the relative success of the several pre-processing, learning and evaluation approaches are presented.
KeywordsTesting Document Latent Semantic Indexing Relevance Vector Machine Dimensionality Reduction Technique Document Representation
Unable to display preview. Download preview PDF.