Topic Extraction from Text Documents Using Multiple-Cause Networks

  • Conference paper
PRICAI 2002: Trends in Artificial Intelligence (PRICAI 2002)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2417)

Abstract

This paper presents an approach to topic extraction from text documents using probabilistic graphical models. Multiple-cause networks with latent variables serve as the model, and Helmholtz machines are used to simplify learning and inference. Learning is purely data-driven and requires no prespecified document categories. Topic-word extraction experiments on the TDT-2 collection are presented. In addition, document clustering results on a subset of the TREC-8 ad hoc task data show a substantial reduction in inference time without significant deterioration in performance.
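
The abstract does not give the network's exact parametrization, so the following is only a minimal sketch of the general idea, assuming a single layer of binary latent "topic" units, binary word-presence vectors, sigmoid units, and the standard wake-sleep updates commonly used to train Helmholtz machines. The class name TinyHelmholtz, the toy vocabulary, and all hyperparameters are hypothetical and are not taken from the paper. The sketch also suggests why inference can be fast: once trained, the recognition network maps a document to topic activations in a single feed-forward pass.

```python
import math
import random


def sigmoid(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def sample(probs, rng):
    # Draw one independent Bernoulli sample per probability.
    return [1 if rng.random() < p else 0 for p in probs]


class TinyHelmholtz:
    """Hypothetical one-layer binary Helmholtz machine (illustration only)."""

    def __init__(self, n_words, n_topics, seed=0):
        self.rng = random.Random(seed)
        self.n_words, self.n_topics = n_words, n_topics
        small = lambda: self.rng.gauss(0.0, 0.1)
        # Generative parameters: topic priors c, word biases b, topic-to-word weights W.
        self.c = [0.0] * n_topics
        self.b = [0.0] * n_words
        self.W = [[small() for _ in range(n_topics)] for _ in range(n_words)]
        # Recognition parameters: topic biases a, word-to-topic weights R.
        self.a = [0.0] * n_topics
        self.R = [[small() for _ in range(n_words)] for _ in range(n_topics)]

    def recognize(self, x):
        # q(z_k = 1 | x): a single feed-forward pass, no iterative inference.
        return [sigmoid(self.a[k] + sum(r * xj for r, xj in zip(self.R[k], x)))
                for k in range(self.n_topics)]

    def generate(self, z):
        # p(x_j = 1 | z): several active topics can jointly "cause" a word.
        return [sigmoid(self.b[j] + sum(w * zk for w, zk in zip(self.W[j], z)))
                for j in range(self.n_words)]

    def wake_sleep_step(self, x, lr=0.05):
        # Wake phase: infer topics for a real document, then fit the generative model.
        z = sample(self.recognize(x), self.rng)
        p = self.generate(z)
        for k in range(self.n_topics):
            self.c[k] += lr * (z[k] - sigmoid(self.c[k]))
        for j in range(self.n_words):
            err = x[j] - p[j]
            self.b[j] += lr * err
            for k in range(self.n_topics):
                self.W[j][k] += lr * err * z[k]
        # Sleep phase: dream a document from the generative model, then fit recognition.
        z_dream = sample([sigmoid(ck) for ck in self.c], self.rng)
        x_dream = sample(self.generate(z_dream), self.rng)
        q = self.recognize(x_dream)
        for k in range(self.n_topics):
            err = z_dream[k] - q[k]
            self.a[k] += lr * err
            for j in range(self.n_words):
                self.R[k][j] += lr * err * x_dream[j]


# Toy usage: four-word vocabulary, two made-up document types, no labels supplied.
vocab = ["market", "stock", "game", "team"]
docs = [[1, 1, 0, 0], [0, 0, 1, 1]]
model = TinyHelmholtz(n_words=len(vocab), n_topics=2)
for _ in range(200):
    for doc in docs:
        model.wake_sleep_step(doc)
topic_probs = model.recognize([1, 1, 0, 0])
```

In a sketch like this, the words with the largest weights in one column of W would be read off as that latent unit's topic words, and the vector returned by recognize could serve as a document representation for clustering. Whether the paper does exactly this is an assumption.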

Copyright information

© 2002 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Chang, JH., Won Lee, J., Kim, Y., Zhang, BT. (2002). Topic Extraction from Text Documents Using Multiple-Cause Networks. In: Ishizuka, M., Sattar, A. (eds) PRICAI 2002: Trends in Artificial Intelligence. PRICAI 2002. Lecture Notes in Computer Science, vol 2417. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45683-X_47

  • DOI: https://doi.org/10.1007/3-540-45683-X_47

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-44038-3

  • Online ISBN: 978-3-540-45683-4

  • eBook Packages: Springer Book Archive
