
Discovering Emerging Topics in Unlabelled Text Collections

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4152)

Abstract

As document collections accumulate over time, some of their discussion subjects become outdated while new ones emerge, so existing classification schemes must be updated. In this paper, we address the challenge of finding emerging and persistent “themes”, i.e. subjects that persist long enough to be incorporated into a taxonomy or ontology describing the document collection. We focus on identifying cluster labels that “survive” changes in the composition of the underlying document population, including changes in the feature space of dominant words, since the terminology of the document archive also changes over time. We report on a set of promising experiments that identify the themes that emerged in section H2.8 of the ACM Digital Library, and we juxtapose them with the classes that the ACM taxonomy foresees for this section.
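The abstract does not spell out the algorithm, so the following is only a rough illustration of the idea of a cluster label that “survives” from one time period to the next while the dominant vocabulary drifts. It is a minimal sketch under stated assumptions, not the authors' method: the clusters of each period are taken as given, a cluster's label is assumed to be its `top_n` most frequent terms, and labels in consecutive periods are matched by Jaccard overlap against a hypothetical `min_overlap` threshold. The function names (`cluster_label`, `jaccard`, `persistent_themes`) and parameters are illustrative only.

```python
from collections import Counter
from typing import List, Set, Tuple

# Assumption: a cluster is a list of tokenised documents; a period is a list of clusters.
Cluster = List[List[str]]


def cluster_label(cluster: Cluster, top_n: int = 3) -> Set[str]:
    """Label a cluster by its top_n most frequent terms (a hypothetical labelling rule)."""
    counts = Counter(term for doc in cluster for term in doc)
    # Rank by descending frequency, breaking ties alphabetically for determinism.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return {term for term, _ in ranked[:top_n]}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Overlap of two label term sets; only shared terms count, so new words lower it."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def persistent_themes(
    periods: List[List[Cluster]],
    min_overlap: float = 0.5,
    min_periods: int = 2,
) -> List[Tuple[int, int, List[str]]]:
    """Return (first_period, last_period, final_label) for themes lasting >= min_periods."""
    labels = [[cluster_label(c) for c in clusters] for clusters in periods]
    chain_start: List[List[int]] = []  # period in which each label's theme first appeared
    continued: List[Set[int]] = [set() for _ in periods]  # labels carried into the next period

    for t, current in enumerate(labels):
        starts = []
        for label in current:
            start = t  # by default, a label starts a new theme
            if t > 0 and labels[t - 1]:
                prev = labels[t - 1]
                # Best-matching label of the previous period.
                best = max(range(len(prev)), key=lambda j: jaccard(label, prev[j]))
                if jaccard(label, prev[best]) >= min_overlap:
                    start = chain_start[t - 1][best]  # inherit the older theme
                    continued[t - 1].add(best)
            starts.append(start)
        chain_start.append(starts)

    themes = []
    for t, current in enumerate(labels):
        for i, label in enumerate(current):
            # A theme ends at a label that no cluster of the next period continues.
            if i not in continued[t] and t - chain_start[t][i] + 1 >= min_periods:
                themes.append((chain_start[t][i], t, sorted(label)))
    return themes


if __name__ == "__main__":
    # Toy data (invented for illustration): three periods, two clusters each.
    periods = [
        [[["mining", "stream", "data"], ["stream", "data"]], [["xml", "query"], ["xml", "index"]]],
        [[["stream", "data", "mining", "drift"], ["stream", "drift"]], [["web", "log"], ["web", "usage"]]],
        [[["stream", "drift", "data"]], [["web", "usage", "log"]]],
    ]
    print(persistent_themes(periods))
    # [(0, 2, ['data', 'drift', 'stream']), (1, 2, ['log', 'usage', 'web'])]
```

In this toy run the “data stream” theme survives all three periods even though its label shifts from `mining` to `drift`, while the short-lived `xml` cluster is discarded. Matching on label overlap rather than on a fixed term vector is one simple way to tolerate a changing feature space; the actual paper may use a different labelling and matching scheme.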

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Schult, R., Spiliopoulou, M. (2006). Discovering Emerging Topics in Unlabelled Text Collections. In: Manolopoulos, Y., Pokorný, J., Sellis, T.K. (eds) Advances in Databases and Information Systems. ADBIS 2006. Lecture Notes in Computer Science, vol 4152. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11827252_27

  • DOI: https://doi.org/10.1007/11827252_27

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-37899-0

  • Online ISBN: 978-3-540-37900-3

  • eBook Packages: Computer Science, Computer Science (R0)
