Information Retrieval, Volume 9, Issue 1, pp 95–109

Automatic search from streaming data



Streaming data poses a variety of new and interesting challenges for information retrieval and text analysis. Unlike static document collections, which are typically analyzed and indexed off-line to support ad-hoc queries, streaming data often must be analyzed on the fly and acted on as the data passes through the analysis system. Speech is one example of streaming data that is a challenge to exploit, yet has significant potential to provide value in a knowledge management system. We are specifically interested in techniques that analyze streaming data and automatically find collateral information, or information that clarifies, expands, and generally enhances the value of the streaming data. We present a system that analyzes a data stream and automatically finds documents related to the current topic of discussion in the data stream. Experimental results show that the system generates result lists with an average precision at 10 hits of better than 60%. We also present a hit-list re-ranking technique based on named entity analysis and automatic text categorization that can improve the search results by 6%–12%.
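The evaluation figure quoted above, precision at 10 hits, can be illustrated with a minimal sketch. It assumes the figure is the fraction of relevant documents among the top 10 results, averaged over queries; the abstract does not spell out the exact procedure, and the function names and document identifiers below are hypothetical, not taken from the authors' system.

```python
from typing import Iterable, Sequence, Set, Tuple


def precision_at_k(ranked_ids: Sequence[str], relevant_ids: Set[str], k: int = 10) -> float:
    """Fraction of the top-k hits that are judged relevant.

    Divides by k even when fewer than k hits are returned, so a short
    result list is penalized (a common convention, assumed here).
    """
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    return hits / k


def mean_precision_at_k(runs: Iterable[Tuple[Sequence[str], Set[str]]], k: int = 10) -> float:
    """Average precision-at-k over (ranked list, relevant set) pairs, one per query."""
    scores = [precision_at_k(ranked, relevant, k) for ranked, relevant in runs]
    if not scores:
        raise ValueError("at least one query is required")
    return sum(scores) / len(scores)


# Hypothetical hit lists for two queries, each with ten results.
hit_list = [f"d{i}" for i in range(1, 11)]
query_a = (hit_list, {"d1", "d2", "d3", "d5", "d7", "d9"})               # 6 relevant in top 10
query_b = (hit_list, {"d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8"})   # 8 relevant in top 10

print(mean_precision_at_k([query_a, query_b]))
```

Under this reading, a re-ranking step helps exactly when it moves relevant documents from below rank 10 into the top 10; reordering within the top 10 leaves the score unchanged.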


Keywords: Speech retrieval, Text mining, Information retrieval





Copyright information

© Springer Science + Business Media, Inc. 2006

Authors and Affiliations

  1. T.J. Watson Research Center, IBM, Hawthorne
