
Boosting RVM Classifiers for Large Data Sets

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4432)

Abstract

Relevance Vector Machines (RVM) extend Support Vector Machines (SVM) with a probabilistic interpretation: they build sparse models with fewer basis functions (i.e., relevance vectors or prototypes) and realize Bayesian learning by placing priors over the parameters (i.e., introducing hyperparameters). However, RVM algorithms do not scale to large data sets. To overcome this problem, we propose an RVM boosting algorithm and demonstrate its potential in a text mining application. The idea is to build weak classifiers and then improve overall accuracy by using a boosting technique for document classification. The proposed algorithm is able to incorporate all the available training data; when combined with sampling techniques for choosing the working set, the boosted learning machine attains high accuracy. Experiments on the REUTERS benchmark show accuracy competitive with state-of-the-art SVMs, while the sparser solution found allows real-time implementation.
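The scheme the abstract describes — train a weak classifier on a small sampled working set each round, then combine the rounds with a boosting technique — can be sketched as an AdaBoost-style loop. This is a minimal illustration, not the authors' implementation: a decision stump stands in for the RVM weak learner so the sketch stays self-contained, and all function names (`boost_rvm_style`, `train_stump`, etc.) are illustrative.

```python
# Sketch: boosting weak classifiers trained on weight-sampled working sets.
# A decision stump replaces the RVM weak learner for self-containment.
import numpy as np

def train_stump(X, y, w):
    """Weighted decision stump: pick the best (feature, threshold, sign)."""
    n, d = X.shape
    best = (0, 0.0, 1, np.inf)  # feature, threshold, sign, weighted error
    for j in range(d):
        for t in np.unique(X[:, j]):
            for s in (1, -1):
                pred = np.where(X[:, j] <= t, s, -s)
                err = np.sum(w[pred != y])
                if err < best[3]:
                    best = (j, t, s, err)
    return best

def stump_predict(stump, X):
    j, t, s, _ = stump
    return np.where(X[:, j] <= t, s, -s)

def boost_rvm_style(X, y, rounds=10, working_set=50, rng=None):
    """AdaBoost.M1 loop; each round trains only on a sampled working set."""
    rng = np.random.default_rng(rng)
    n = len(y)
    w = np.full(n, 1.0 / n)          # distribution over all training data
    ensemble = []
    for _ in range(rounds):
        # Sample a small working set according to the current weights,
        # mirroring the paper's use of sampling to keep training tractable
        # while still (eventually) exploiting all available data.
        idx = rng.choice(n, size=min(working_set, n), p=w)
        stump = train_stump(X[idx], y[idx], np.full(len(idx), 1.0 / len(idx)))
        pred = stump_predict(stump, X)
        err = max(np.sum(w[pred != y]), 1e-12)   # error on the full set
        if err >= 0.5:
            break                     # no better than chance: stop boosting
        alpha = 0.5 * np.log((1 - err) / err)
        w *= np.exp(-alpha * y * pred)           # up-weight mistakes
        w /= w.sum()
        ensemble.append((alpha, stump))
    return ensemble

def predict(ensemble, X):
    score = sum(a * stump_predict(s, X) for a, s in ensemble)
    return np.sign(score)
```

Each round only ever fits a `working_set`-sized subset, so per-round cost is bounded regardless of corpus size; the boosting weights steer later working sets toward the documents the current ensemble still misclassifies.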




Editor information

Bartlomiej Beliczynski, Andrzej Dzielinski, Marcin Iwanowski, Bernardete Ribeiro


Copyright information

© 2007 Springer Berlin Heidelberg

About this paper

Cite this paper

Silva, C., Ribeiro, B., Sung, A.H. (2007). Boosting RVM Classifiers for Large Data Sets. In: Beliczynski, B., Dzielinski, A., Iwanowski, M., Ribeiro, B. (eds) Adaptive and Natural Computing Algorithms. ICANNGA 2007. Lecture Notes in Computer Science, vol 4432. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71629-7_26


  • DOI: https://doi.org/10.1007/978-3-540-71629-7_26

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-71590-0

  • Online ISBN: 978-3-540-71629-7

  • eBook Packages: Computer Science (R0)
