Topic Extraction from Text Documents Using Multiple-Cause Networks

Chang, Jeong-Ho; Won Lee, Jae; Kim, Yuseop; Zhang, Byoung-Tak

doi:10.1007/3-540-45683-X_47

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2417)

Included in the following conference series:

Pacific Rim International Conference on Artificial Intelligence

Abstract

This paper presents an approach to topic extraction from text documents using probabilistic graphical models. Multiple-cause networks with latent variables are used, and Helmholtz machines are employed to ease learning and inference. Learning in this model is purely data-driven and does not require prespecified categories for the given documents. Topic-word extraction experiments on the TDT-2 collection are presented. In particular, document clustering results on a subset of the TREC-8 ad-hoc task data show a substantial reduction in inference time without significant deterioration of performance.
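
As a rough illustration of the kind of model the abstract describes, the following is a minimal sketch of a one-latent-layer Helmholtz machine trained with the wake-sleep algorithm on binary word-occurrence vectors. It is not the authors' implementation: the sigmoid parameterization, the learning rate, the toy documents, and every name in it (`TinyHelmholtzMachine`, `wake_sleep_step`, `top_words`) are assumptions made for illustration. The sketch is meant to show why inference can be cheap in this family of models: the recognition network assigns topic activations to a document in a single bottom-up pass.

```python
import math
import random


def sigmoid(x):
    # Clamp the input so math.exp cannot overflow for large |x|.
    x = max(-30.0, min(30.0, x))
    return 1.0 / (1.0 + math.exp(-x))


def sample(p, rng):
    # Draw a binary value that is 1 with probability p.
    return 1 if rng.random() < p else 0


class TinyHelmholtzMachine:
    """Hypothetical sketch: binary 'topic' units over binary word indicators."""

    def __init__(self, n_words, n_topics, seed=0):
        self.rng = random.Random(seed)
        self.n_words, self.n_topics = n_words, n_topics
        small = lambda: self.rng.uniform(-0.1, 0.1)
        # Generative (top-down) parameters: topic biases, topic->word weights, word biases.
        self.b = [0.0] * n_topics
        self.W = [[small() for _ in range(n_words)] for _ in range(n_topics)]
        self.c = [0.0] * n_words
        # Recognition (bottom-up) parameters: word->topic weights, topic biases.
        self.R = [[small() for _ in range(n_topics)] for _ in range(n_words)]
        self.r = [0.0] * n_topics

    def recognize(self, d):
        # q(z_k = 1 | d): approximate posterior from one feed-forward pass.
        return [sigmoid(self.r[k] + sum(self.R[j][k] * d[j] for j in range(self.n_words)))
                for k in range(self.n_topics)]

    def generate(self, z):
        # p(d_j = 1 | z): word probabilities given the active topics.
        return [sigmoid(self.c[j] + sum(self.W[k][j] * z[k] for k in range(self.n_topics)))
                for j in range(self.n_words)]

    def wake_sleep_step(self, d, lr=0.05):
        # Wake phase: infer topics for a real document, then adjust the
        # generative parameters so they reconstruct it better (delta rule).
        z = [sample(q, self.rng) for q in self.recognize(d)]
        p = self.generate(z)
        for k in range(self.n_topics):
            self.b[k] += lr * (z[k] - sigmoid(self.b[k]))
            for j in range(self.n_words):
                self.W[k][j] += lr * z[k] * (d[j] - p[j])
        for j in range(self.n_words):
            self.c[j] += lr * (d[j] - p[j])
        # Sleep phase: sample a "fantasy" document from the generative model,
        # then adjust the recognition parameters to recover its topics.
        z_f = [sample(sigmoid(bk), self.rng) for bk in self.b]
        d_f = [sample(pj, self.rng) for pj in self.generate(z_f)]
        q = self.recognize(d_f)
        for k in range(self.n_topics):
            self.r[k] += lr * (z_f[k] - q[k])
            for j in range(self.n_words):
                self.R[j][k] += lr * d_f[j] * (z_f[k] - q[k])

    def top_words(self, k, n=3):
        # Indices of the n words with the largest generative weight for topic k.
        return sorted(range(self.n_words), key=lambda j: self.W[k][j], reverse=True)[:n]


# Toy corpus (made up): six vocabulary slots, two loosely separated word groups.
docs = [[1, 1, 1, 0, 0, 0], [1, 1, 0, 0, 0, 0],
        [0, 0, 0, 1, 1, 1], [0, 0, 0, 0, 1, 1]]
model = TinyHelmholtzMachine(n_words=6, n_topics=2, seed=0)
for _ in range(200):
    for doc in docs:
        model.wake_sleep_step(doc)
activations = model.recognize(docs[0])  # topic activations for the first document
```

In a sketch like this, `recognize` gives per-topic activations that could serve as a document representation for clustering, and `top_words` reads candidate topic words off the generative weights. Whether the paper uses these exact readouts is not stated in the abstract.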

Author information

Authors and Affiliations

School of Computer Science and Engineering, Seoul National University, Seoul, Korea, 151-742
Jeong-Ho Chang & Byoung-Tak Zhang
School of Computer Science and Engineering, Sungshin Women’s University, Seoul, Korea, 136-742
Jae Won Lee
Ewha Institute of Science and Technology, Ewha Woman’s University, Seoul, Korea, 120-750
Yuseop Kim

Editor information

Editors and Affiliations

School of Information Science and Technology Department of Information and Communication Engineering, University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8656, Japan
Mitsuru Ishizuka
School of Information Technology Knowledge Representation and Reasoning Unit (KRRU) Faculty of Engineering and Information Technology, Griffith University, PMB 50 Gold Coast Mail Centre, Queensland, 9726, Australia
Abdul Sattar

About this paper

Cite this paper

Chang, JH., Won Lee, J., Kim, Y., Zhang, BT. (2002). Topic Extraction from Text Documents Using Multiple-Cause Networks. In: Ishizuka, M., Sattar, A. (eds) PRICAI 2002: Trends in Artificial Intelligence. PRICAI 2002. Lecture Notes in Computer Science, vol 2417. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45683-X_47

DOI: https://doi.org/10.1007/3-540-45683-X_47
Published: 21 August 2002
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-44038-3
Online ISBN: 978-3-540-45683-4
eBook Packages: Springer Book Archive
