Classification of a Large Web Page Collection Applying a GRNN Architecture
This paper proposes an information system that classifies web pages according to a taxonomy that is mainly used by seven search engines and directories. The proposed classifier is a four-layer Generalised Regression Neural Network (GRNN) that segments information according to web page features. Many types of web pages were used to evaluate the robustness of the method, since no restrictions were imposed other than the language of the content, which is English. The system can serve as an assistant and consultative tool to support the work of human editors.
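As a rough illustration of how a four-layer GRNN can assign a category to a feature vector, the sketch below walks through the input, pattern, summation, and output layers in plain Python. It is a minimal sketch under stated assumptions, not the system described in the paper: the two-dimensional feature vectors, the category names `sports` and `finance`, the smoothing parameter `sigma`, and the function names are all hypothetical.

```python
import math


def grnn_scores(x, train_X, train_Y, sigma=0.5):
    """Return one score per category for the feature vector x.

    train_X: list of training feature vectors (hypothetical page features).
    train_Y: list of one-hot category targets, aligned with train_X.
    sigma:   smoothing parameter of the Gaussian kernel (assumed value).
    """
    # Input layer: x is passed unchanged to every pattern unit.
    # Pattern layer: one Gaussian kernel unit per training example.
    weights = []
    for xi in train_X:
        dist_sq = sum((a - b) ** 2 for a, b in zip(x, xi))
        weights.append(math.exp(-dist_sq / (2.0 * sigma ** 2)))

    # Summation layer: a kernel-weighted sum per category (numerators)
    # and the plain sum of kernel activations (denominator).
    n_classes = len(train_Y[0])
    numerators = [
        sum(w * y[k] for w, y in zip(weights, train_Y))
        for k in range(n_classes)
    ]
    denominator = sum(weights)

    # Output layer: normalise; fall back to uniform scores if every
    # kernel activation underflowed to zero.
    if denominator == 0.0:
        return [1.0 / n_classes] * n_classes
    return [n / denominator for n in numerators]


def classify(x, train_X, train_Y, labels, sigma=0.5):
    """Pick the category with the highest GRNN score."""
    scores = grnn_scores(x, train_X, train_Y, sigma)
    best = max(range(len(scores)), key=scores.__getitem__)
    return labels[best]


# Toy data: hypothetical 2-D features and two made-up categories.
LABELS = ["sports", "finance"]
TRAIN_X = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.7]]
TRAIN_Y = [[1, 0], [1, 0], [0, 1], [0, 1]]

if __name__ == "__main__":
    print(classify([0.85, 0.15], TRAIN_X, TRAIN_Y, LABELS))
```

A GRNN of this kind needs no iterative training: the pattern layer simply stores the training examples, and `sigma` controls how far the influence of each stored example reaches in feature space.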