
Classification of a Large Web Page Collection Applying a GRNN Architecture

  • Ioannis Anagnostopoulos
  • Christos Anagnostopoulos
  • Vergados Dimitrios
  • Vassili Loumos
  • Eleftherios Kayafas
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2869)

Abstract

This paper proposes an information system that classifies web pages according to a taxonomy used mainly by seven search engines/directories. The proposed classifier is a four-layer Generalised Regression Neural Network (GRNN) that segments information according to web page features. Many types of web pages were used to evaluate the robustness of the method, since no restrictions were imposed except that the content had to be in English. The system can be used as an assistant and consultative tool to support the work of human editors.
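The abstract does not give the paper's feature set, smoothing parameter, or training details, so the following is only a minimal sketch of the standard GRNN formulation adapted to classification, not the authors' implementation. It assumes each web page has already been reduced to a numeric feature vector and each taxonomy category to an integer label treated as a one-hot target. The four layers map onto the code as follows: input (the vector `x`), pattern (one Gaussian kernel per training page), summation (per-class weighted sums plus a shared denominator), and output (normalised class scores), i.e. score_c(x) = Σ_i y_ic · exp(-D_i² / 2σ²) / Σ_i exp(-D_i² / 2σ²). The names `grnn_scores`, `grnn_classify`, and the default `sigma=0.5` are hypothetical.

```python
import math


def grnn_scores(x, train_x, train_y, n_classes, sigma=0.5):
    """Return normalised class scores for feature vector x (hypothetical helper).

    train_x: list of training feature vectors (assumed precomputed page features).
    train_y: integer class labels, treated here as one-hot regression targets.
    sigma:   kernel width (smoothing parameter); 0.5 is an arbitrary assumption.
    """
    numerators = [0.0] * n_classes  # summation layer: per-class weighted sums
    denominator = 0.0               # summation layer: sum of all kernel outputs
    for xi, yi in zip(train_x, train_y):
        # Pattern layer: Gaussian kernel on the squared Euclidean distance.
        d2 = sum((a - b) ** 2 for a, b in zip(x, xi))
        w = math.exp(-d2 / (2.0 * sigma ** 2))
        numerators[yi] += w
        denominator += w
    if denominator == 0.0:
        # Every kernel underflowed to zero; fall back to uniform scores.
        return [1.0 / n_classes] * n_classes
    # Output layer: normalise so the class scores sum to 1.
    return [n / denominator for n in numerators]


def grnn_classify(x, train_x, train_y, n_classes, sigma=0.5):
    """Pick the class with the highest GRNN output score."""
    scores = grnn_scores(x, train_x, train_y, n_classes, sigma)
    return max(range(n_classes), key=lambda c: scores[c])


if __name__ == "__main__":
    # Made-up 3-dimensional feature vectors (e.g. weights of three index terms);
    # labels 0, 1, 2 stand for three hypothetical taxonomy categories.
    train_x = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0],
               [0.0, 1.0, 0.0], [0.1, 0.9, 0.0],
               [0.0, 0.0, 1.0], [0.0, 0.1, 0.9]]
    train_y = [0, 0, 1, 1, 2, 2]
    print(grnn_classify([0.8, 0.2, 0.0], train_x, train_y, 3))  # → 0
```

A GRNN needs no iterative training: every labelled page becomes a pattern unit, and only the kernel width has to be tuned. The trade-off is that classification cost grows linearly with the size of the training collection.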

Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Authors and Affiliations

  • Ioannis Anagnostopoulos (1)
  • Christos Anagnostopoulos (1)
  • Vergados Dimitrios (1)
  • Vassili Loumos (1)
  • Eleftherios Kayafas (1)

  1. School of Electrical and Computer Engineering, Athens, Greece