Abstract
Automatic thread extraction for news events can help readers grasp the different aspects of a news event. In this paper, we present an extraction method based on a topical N-gram model with a background distribution (TNB). Unlike most topic models, such as Latent Dirichlet Allocation (LDA), which rely on the bag-of-words assumption, our model treats words in their textual order. Each news report is represented as a combination of a background distribution over the corpus and a mixture distribution over hidden news threads. Our model can therefore treat “presidential election”, which recurs across different years, as a background phrase while extracting “Obama wins” as a thread of the event “2008 US presidential election”. We apply our method to two different corpora. Evaluation based on human judgment shows that the model generates meaningful and interpretable threads from a news corpus.
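To make the generative story concrete, here is a minimal Python sketch of a TNB-style process. It is not the authors' implementation: the Bernoulli background switch lam, the fixed phrase-continuation probability, the dense bigram table, and all hyperparameter values are assumptions layered onto the description above, purely for illustration.

import numpy as np

rng = np.random.default_rng(0)

V, K = 100, 5            # vocabulary size, number of threads (illustrative)
alpha, beta = 0.1, 0.01  # symmetric Dirichlet hyperparameters (assumed)
lam = 0.5                # probability of drawing a background word (assumed)

# Corpus-wide background distribution and per-thread unigram distributions.
background = rng.dirichlet(beta * np.ones(V))
threads = rng.dirichlet(beta * np.ones(V), size=K)
# Topical-n-gram component: per-thread distributions over the next word,
# conditioned on the previous word (a dense table, for clarity only).
bigrams = rng.dirichlet(beta * np.ones(V), size=(K, V))

def sample_document(length):
    """Generate one synthetic document under the sketched process."""
    theta = rng.dirichlet(alpha * np.ones(K))  # document's thread mixture
    words, prev = [], None
    for _ in range(length):
        if rng.random() < lam:                 # switch: background word
            w = rng.choice(V, p=background)
        else:                                  # switch: thread word
            z = rng.choice(K, p=theta)
            if prev is not None and rng.random() < 0.2:
                w = rng.choice(V, p=bigrams[z, prev])  # continue a phrase
            else:
                w = rng.choice(V, p=threads[z])        # start a new word
        words.append(int(w))
        prev = w
    return words

print(sample_document(20))

Inference for such a model runs this story in reverse (e.g. via MCMC methods such as Gibbs sampling; cf. Griffiths and Steyvers (2004) in the references); the sketch covers only the forward, generative direction.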
References
Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent Dirichlet allocation. Journal of Machine Learning Research 3, 993–1022 (2003)
Nallapati, R., Feng, A., Peng, F., Allan, J.: Event threading within news topics. In: Proceedings of the Thirteenth ACM International Conference on Information and Knowledge Management, pp. 446–453. ACM (2004)
Chemudugunta, C., Smyth, P., Steyvers, M.: Modeling general and specific aspects of documents with a probabilistic topic model. In: Advances in Neural Information Processing Systems, pp. 241–242 (2006)
Li, P., Jiang, J., Wang, Y.: Generating templates of entity summaries with an entity-aspect model and pattern mining. In: Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pp. 640–649. Association for Computational Linguistics (2010)
Wallach, H.M.: Topic modeling: beyond bag-of-words. In: Proceedings of the 23rd International Conference on Machine Learning, pp. 977–984. ACM (2006)
MacKay, D.J.C., Peto, L.C.B.: A hierarchical Dirichlet language model. Natural Language Engineering 1(3), 289–308 (1995)
Griffiths, T.L., Steyvers, M., Tenenbaum, J.B.: Topics in semantic representation. Psychological Review 114(2), 211 (2007)
Wang, X., McCallum, A., Wei, X.: Topical n-grams: phrase and topic discovery, with an application to information retrieval. In: Proceedings of the Seventh IEEE International Conference on Data Mining (ICDM 2007), pp. 697–702. IEEE (2007)
Jordan, M.I., Ghahramani, Z., Jaakkola, T.S., Saul, L.K.: An introduction to variational methods for graphical models. Machine Learning 37(2), 183–233 (1999)
Andrieu, C., de Freitas, N., Doucet, A., Jordan, M.I.: An introduction to MCMC for machine learning. Machine Learning 50(1), 5–43 (2003)
Minka, T., Lafferty, J.: Expectation-propagation for the generative aspect model. In: Proceedings of the 18th Conference on Uncertainty in Artificial Intelligence, pp. 352–359. Citeseer (2002)
Griffiths, T.L., Steyvers, M.: Finding scientific topics. Proceedings of the National Academy of Sciences of the United States of America 101(suppl. 1), 5228 (2004)
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Yan, Z., Li, F. (2011). News Thread Extraction Based on Topical N-Gram Model with a Background Distribution. In: Lu, B.-L., Zhang, L., Kwok, J. (eds.) Neural Information Processing. ICONIP 2011. Lecture Notes in Computer Science, vol. 7063. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24958-7_49
DOI: https://doi.org/10.1007/978-3-642-24958-7_49
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-24957-0
Online ISBN: 978-3-642-24958-7