Online New Event Detection Based on IPLSA

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5678)

Abstract

New event detection (NED) involves monitoring one or more news streams to detect stories that report on new events. With the overwhelming volume of news available today, NED has become a challenging task. In this paper, we propose a new NED model based on incremental PLSA (IPLSA), which can handle new documents arriving in a stream and update its parameters with lower time complexity. Moreover, to avoid the limitations of the TF-IDF method, we propose a new approach to term reweighting. By dynamically exploiting the importance of documents in discriminating terms, together with the documents' topic information, this approach is more accurate. Experimental results on the Linguistic Data Consortium (LDC) TDT4 dataset show that the proposed model significantly improves both the recall and precision of the NED task compared with the baseline system and other existing systems.
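To make the task definition concrete, the following is a minimal sketch of a generic threshold-based NED baseline: each incoming story is compared with all earlier stories using incrementally computed TF-IDF weights and cosine similarity, and is flagged as new when its best match falls below a threshold. This is not the IPLSA model or the term reweighting proposed in the paper. The function names, the naive whitespace tokenizer, the smoothed IDF formula, and the threshold value of 0.2 are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def detect_new_events(stream, threshold=0.2):
    """Return one boolean per story: True if the story is flagged as new.

    Hypothetical baseline, not the paper's method: a story is new when its
    highest cosine similarity to any earlier story is below `threshold`.
    """
    doc_freq = Counter()  # number of stories seen so far that contain each term
    seen = []             # raw term counts of earlier stories
    flags = []
    for text in stream:
        counts = Counter(tokenize(text))
        doc_freq.update(counts.keys())
        n_docs = len(seen) + 1

        def weigh(term_counts):
            # Incremental TF-IDF: the IDF part uses only stories seen so far.
            # The smoothing constants are an assumption that keeps IDF positive.
            return {
                t: tf * math.log((n_docs + 1) / (doc_freq[t] + 0.5))
                for t, tf in term_counts.items()
            }

        current = weigh(counts)
        best = max((cosine(current, weigh(old)) for old in seen), default=0.0)
        flags.append(best < threshold)
        seen.append(counts)
    return flags


stories = [
    "earthquake hits city",
    "earthquake hits city again",
    "election results announced",
]
print(detect_new_events(stories))
```

Re-weighting every earlier story at each step keeps the sketch short but makes it quadratic in the number of stories, which illustrates why a model with cheap incremental updates is attractive for streams.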

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Zhang, X., Li, Z. (2009). Online New Event Detection Based on IPLSA. In: Huang, R., Yang, Q., Pei, J., Gama, J., Meng, X., Li, X. (eds) Advanced Data Mining and Applications. ADMA 2009. Lecture Notes in Computer Science, vol 5678. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03348-3_38

  • DOI: https://doi.org/10.1007/978-3-642-03348-3_38

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-03347-6

  • Online ISBN: 978-3-642-03348-3

  • eBook Packages: Computer Science, Computer Science (R0)
