NowOnWeb: News Search and Summarization

  • Javier Parapar
  • José M. Casanova
  • Álvaro Barreiro
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4739)


Providing agile access to the huge amount of information published by the thousands of news sites available on-line calls for the application of Information Retrieval techniques. This paper presents NowOnWeb, a news retrieval system that obtains articles from different on-line sources and provides news searching and browsing. The main problems solved during the development of NowOnWeb were article recognition and extraction, redundancy detection, and text summarization. For each of these we provided effective solutions that, put together, gave rise to a system that reasonably satisfies the daily information needs of the user.


Keywords: User Query, Vector Space Model, Summary Generation, Text Summarization, Tree Edit Distance
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Javier Parapar (1)
  • José M. Casanova (1)
  • Álvaro Barreiro (1)

  1. IRLab, Department of Computer Science, University of A Coruña, Campus de Elviña s/n, 15071 A Coruña, Spain
