Automatic search from streaming data

Coden, Anni R.; Brown, Eric W.

doi:10.1007/s10791-005-5723-3

Published: January 2006

Volume 9, pages 95–109, (2006)

Journal: Information Retrieval

Anni R. Coden¹ &
Eric W. Brown¹

Abstract

Streaming data poses a variety of new and interesting challenges for information retrieval and text analysis. Unlike static document collections, which are typically analyzed and indexed off-line to support ad-hoc queries, streaming data often must be analyzed on the fly and acted on as the data passes through the analysis system. Speech is one example of streaming data that is a challenge to exploit, yet has significant potential to provide value in a knowledge management system. We are specifically interested in techniques that analyze streaming data and automatically find collateral information, or information that clarifies, expands, and generally enhances the value of the streaming data. We present a system that analyzes a data stream and automatically finds documents related to the current topic of discussion in the data stream. Experimental results show that the system generates result lists with an average precision at 10 hits of better than 60%. We also present a hit-list re-ranking technique based on named entity analysis and automatic text categorization that can improve the search results by 6%–12%.
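
The abstract reports results as average precision at 10 hits. As a rough illustration of that metric only, the sketch below computes precision at a cutoff k for one result list and averages it over several queries. The function names, document ids, and relevance judgments are hypothetical; this is not the authors' evaluation code, and it assumes the common convention of dividing by k even when fewer than k results are returned.

```python
from typing import Sequence, Set, Tuple


def precision_at_k(ranked_ids: Sequence[str], relevant_ids: Set[str], k: int = 10) -> float:
    """Fraction of the top-k ranked results that are judged relevant.

    Assumption: the denominator is always k, so a short result list
    is penalized for the missing positions.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    return hits / k


def mean_precision_at_k(runs: Sequence[Tuple[Sequence[str], Set[str]]], k: int = 10) -> float:
    """Average precision-at-k over (ranked list, relevant set) pairs, one per query."""
    if not runs:
        return 0.0
    return sum(precision_at_k(ranked, relevant, k) for ranked, relevant in runs) / len(runs)


if __name__ == "__main__":
    # Hypothetical judgments for two automatically generated queries.
    runs = [
        (["d1", "d2", "d3", "d4"], {"d1", "d3"}),
        (["d5", "d6", "d7", "d8"], {"d5", "d6", "d7", "d8"}),
    ]
    print(mean_precision_at_k(runs, k=4))
```

Under this reading, a reported value above 60% would mean that, on average, more than six of the first ten documents returned for a query were judged relevant.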

Author information

Authors and Affiliations

T.J. Watson Research Center, IBM, 19 Skyline Drive, Hawthorne, NY, 10532
Anni R. Coden & Eric W. Brown

Corresponding author

Correspondence to Anni R. Coden.

About this article

Cite this article

Coden, A.R., Brown, E.W. Automatic search from streaming data. Inf Retrieval 9, 95–109 (2006). https://doi.org/10.1007/s10791-005-5723-3

Received: 09 January 2003
Revised: 23 December 2004
Accepted: 12 January 2005
Issue Date: January 2006
DOI: https://doi.org/10.1007/s10791-005-5723-3