Abstract
This paper describes a semi-supervised algorithm for single-class learning from very few examples. The problem is formulated as a hierarchical latent variable model which is clipped to ignore classes not of interest. The model is trained with a multistage EM (msEM) algorithm, which maximizes the likelihood of the joint distribution of the data and latent variables under the constraint that the distribution of each layer is held fixed in successive stages. We demonstrate that, with very few positive examples, the algorithm outperforms training all layers in a single stage; we also show that single-stage training is equivalent to training a single-layer model with corresponding parameters. The performance of the algorithm was verified on several real-world information extraction tasks.
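The paper's exact model and msEM update equations are not reproduced on this page. As a rough, hypothetical sketch of the staged idea only — EM run layer by layer, with the previously fitted layer's distribution frozen in the next stage — consider a toy two-component 1D Gaussian mixture, where the component densities stand in for the lower layer and the mixing weights for the upper layer (all names and numbers here are illustrative assumptions, not the authors' method):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: two well-separated clusters.
data = np.concatenate([rng.normal(-2.0, 0.5, 100), rng.normal(3.0, 0.5, 100)])

def em_gmm(x, means, weights, n_iter=50, var=0.25, update_means=True):
    """EM for a 1D Gaussian mixture with fixed variance.

    When update_means is False, the component densities (the "lower
    layer") stay fixed and only the mixing weights (the "upper layer")
    are re-estimated — the kind of constraint a staged EM imposes.
    """
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each point.
        dens = np.exp(-(x[:, None] - means[None, :]) ** 2 / (2 * var))
        resp = weights * dens
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: update mixing weights, and optionally the means.
        weights = resp.mean(axis=0)
        if update_means:
            means = (resp * x[:, None]).sum(axis=0) / resp.sum(axis=0)
    return means, weights

# Stage 1: fit the lower layer (component means), starting from a rough init.
means, w = em_gmm(data, np.array([-1.0, 1.0]), np.array([0.5, 0.5]))

# Stage 2: freeze the lower layer and re-estimate only the upper-layer
# mixing distribution, as a later msEM stage would.
_, w2 = em_gmm(data, means, np.array([0.5, 0.5]), update_means=False)
```

In the paper's actual setting the layers form a hierarchical latent variable model over text rather than a flat Gaussian mixture, but the mechanics are the same: each stage runs EM over one layer's parameters while the distributions of the layers fitted earlier are held constant.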
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
Cite this paper
Zhu, H., Vaithyanathan, S., Joshi, M.V. (2003). Topic Learning from Few Examples. In: Lavrač, N., Gamberger, D., Todorovski, L., Blockeel, H. (eds) Knowledge Discovery in Databases: PKDD 2003. PKDD 2003. Lecture Notes in Computer Science(), vol 2838. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-39804-2_43
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-20085-7
Online ISBN: 978-3-540-39804-2