A Survey of Emerging Trend Detection in Textual Data Mining

Abstract

In this chapter we describe several systems that detect emerging trends in textual data. Some of the systems are semiautomatic, requiring user input to begin processing; others are fully automatic, producing output from the input corpus without guidance. For each Emerging Trend Detection (ETD) system we describe its components, including linguistic and statistical features, learning algorithms, training and test set generation, visualization, and evaluation. We also give a brief overview of several commercial products capable of detecting trends in textual data, followed by an industrial viewpoint on the importance of trend detection tools and an overview of how such tools are used.

© 2004 Springer Science+Business Media New York

About this chapter

Cite this chapter

Kontostathis, A., Galitsky, L.M., Pottenger, W.M., Roy, S., Phelps, D.J. (2004). A Survey of Emerging Trend Detection in Textual Data Mining. In: Berry, M.W. (eds) Survey of Text Mining. Springer, New York, NY. https://doi.org/10.1007/978-1-4757-4305-0_9

  • DOI: https://doi.org/10.1007/978-1-4757-4305-0_9

  • Publisher Name: Springer, New York, NY

  • Print ISBN: 978-1-4419-3057-6

  • Online ISBN: 978-1-4757-4305-0

  • eBook Packages: Springer Book Archive
