Abstract
We present a multilevel model of discussions in USENET newsgroups that includes the use of statistical and linguistic methods to obtain lexical, semantic and discourse characteristics of the text. We expose constraints that make information extraction and summarization more amenable to analysis at different levels. Our model makes use of posting structure, times of posting, time spans, and length and depth of a thread in order to extract higher-level information on subject matter, interest level, topicality, and discussion trends.
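As an illustration only, the structural signals named above (length and depth of a thread, and its time span) can be computed from reply links and timestamps. The sketch below is not the authors' implementation. The `Post` record, its field names, and the `thread_features` helper are hypothetical, and depth is assumed to mean the longest reply chain counted in posts.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List, Optional


@dataclass
class Post:
    # Hypothetical record; field names are illustrative, not from the paper.
    msg_id: str
    parent_id: Optional[str]  # None for the post that starts the thread
    posted_at: datetime


def thread_features(posts: List[Post]) -> Dict[str, float]:
    """Return length, depth, and time span (in hours) for one thread."""
    if not posts:
        raise ValueError("a thread needs at least one post")
    by_id = {p.msg_id: p for p in posts}

    def depth(post: Post) -> int:
        # Walk parent links up to the root; stop on a missing parent or a cycle.
        steps, seen = 1, {post.msg_id}
        while post.parent_id in by_id and post.parent_id not in seen:
            post = by_id[post.parent_id]
            seen.add(post.msg_id)
            steps += 1
        return steps

    times = sorted(p.posted_at for p in posts)
    span = times[-1] - times[0]
    return {
        "length": len(posts),
        "depth": max(depth(p) for p in posts),
        "span_hours": span.total_seconds() / 3600,
    }


# Made-up thread: a root post, two direct replies, and one nested reply.
t0 = datetime(2002, 6, 1, 12, 0)
demo = [
    Post("a", None, t0),
    Post("b", "a", t0 + timedelta(hours=1)),
    Post("c", "b", t0 + timedelta(hours=3)),
    Post("d", "a", t0 + timedelta(hours=2)),
]
print(thread_features(demo))
```

Features of this kind could then feed higher-level estimates such as interest level or discussion trends; how the model actually combines them is not specified in the abstract.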
© 2002 Springer-Verlag Berlin Heidelberg
Cite this paper
Sampath, G., Martinovic, M. (2002). A Multilevel Text Processing Model of Newsgroup Dynamics. In: Andersson, B., Bergholtz, M., Johannesson, P. (eds) Natural Language Processing and Information Systems. NLDB 2002. Lecture Notes in Computer Science, vol 2553. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36271-1_19
DOI: https://doi.org/10.1007/3-540-36271-1_19
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-00307-6
Online ISBN: 978-3-540-36271-5
eBook Packages: Springer Book Archive