Semi-automatic Creation and Maintenance of Web Resources with webTopic

  • Nuno F. Escudeiro
  • Alípio M. Jorge
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4289)


In this paper we propose a methodology for automatically retrieving document collections on specific topics from the web, organizing them, and keeping them up to date over time according to the user's persistent information needs. The collected documents are organized according to user specifications and are classified partly by the user and partly automatically. A presentation layer enables the exploration of large sets of documents and, at the same time, monitors and records user interaction with these collections. The quality of the system is continuously monitored: the system periodically measures and stores the values of its quality parameters. This quality log makes it possible to maintain the quality of the resources by triggering procedures that correct or prevent quality degradation.
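
The quality-log idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the class name `QualityLog`, the parameter name `freshness`, the threshold values, and the rule "trigger when a value falls below its threshold" are all assumptions made for illustration, since the abstract does not specify which quality parameters are measured or how procedures are triggered.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, Dict, List, Tuple


@dataclass
class QualityLog:
    """Hypothetical quality log: stores measurements and triggers corrections."""

    # Minimum acceptable value per quality parameter (assumed rule).
    thresholds: Dict[str, float]
    # Procedure to run when a parameter drops below its threshold.
    corrective_actions: Dict[str, Callable[[], None]]
    # Stored history of (timestamp, parameter name, value).
    records: List[Tuple[datetime, str, float]] = field(default_factory=list)

    def record(self, name: str, value: float) -> bool:
        """Store one measurement; return True if a corrective action ran."""
        self.records.append((datetime.now(timezone.utc), name, value))
        # Parameters without a threshold never trigger an action.
        if value < self.thresholds.get(name, float("-inf")):
            action = self.corrective_actions.get(name)
            if action is not None:
                action()
                return True
        return False


if __name__ == "__main__":
    triggered: List[str] = []
    log = QualityLog(
        thresholds={"freshness": 0.8},  # illustrative value only
        corrective_actions={"freshness": lambda: triggered.append("recrawl")},
    )
    log.record("freshness", 0.9)  # above threshold: stored, no action
    log.record("freshness", 0.6)  # below threshold: stored, action runs
    print(len(log.records), triggered)
```

In this sketch every measurement is kept, so the log can also be inspected later for trends. A preventive procedure, as opposed to a corrective one, would need a rule based on that history rather than on a single value.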


Document Collection, Training Document, Anchor Text, Prototype Vector, Supervise Setting
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Nuno F. Escudeiro (1)
  • Alípio M. Jorge (1)
  1. LIACC, Faculdade de Economia, Universidade do Porto
