Abstract
This paper proposes a new method of supervised document classification. It uses discrete transforms to extract features from the objects being classified and adopts adaptive potential active hypercontours (APAH) for the classification itself. APAH generalizes the classic contour methods of image segmentation and has two main advantages: it can use almost any knowledge during the search for an optimal classification function, and it can operate in a feature space where only a metric is defined. Both are exploited here: the first by using expert knowledge about the significance of documents in the training set, and the second by inducing new metrics in the feature spaces. The method has been evaluated on a subset of the Open Directory Project (ODP) database and compared with k-NN, a well-known classification technique.
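The abstract gives no implementation details, so the following is only a minimal sketch of the ingredients it names, not the authors' APAH algorithm. It assumes a DCT-II over binned term positions as the discrete transform, an inverse-quadratic potential function whose strength is an expert-assigned document weight, and a plain k-NN baseline. All function names, parameters (`bins`, `keep`, `mu`) and the potential formula are illustrative assumptions. Both classifiers take the metric as an argument, which reflects the point that only a metric on the feature space is required.

```python
import math
from collections import Counter

def dct2(signal):
    """Unnormalized DCT-II of a real sequence (assumed transform)."""
    n = len(signal)
    return [sum(x * math.cos(math.pi * (i + 0.5) * k / n)
                for i, x in enumerate(signal))
            for k in range(n)]

def term_signal(tokens, term, bins=8):
    """Count occurrences of `term` in `bins` equal-width position bins."""
    counts = [0.0] * bins
    for pos, tok in enumerate(tokens):
        if tok == term:
            counts[pos * bins // len(tokens)] += 1.0
    return counts

def features(text, vocabulary, bins=8, keep=4):
    """Concatenate the first `keep` DCT magnitudes of every term signal."""
    tokens = text.lower().split()
    feats = []
    for term in vocabulary:
        spectrum = dct2(term_signal(tokens, term, bins))
        feats.extend(abs(c) for c in spectrum[:keep])
    return feats

def euclidean(a, b):
    """Default metric; any other metric function can be passed instead."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def potential_classify(x, training, metric=euclidean, mu=1.0):
    """Pick the label with the largest summed potential at x.

    training: list of (feature_vector, label, expert_weight).
    The inverse-quadratic potential is an assumption made for this sketch.
    """
    scores = Counter()
    for feats, label, weight in training:
        scores[label] += weight / (1.0 + mu * metric(x, feats) ** 2)
    return max(scores, key=scores.get)

def knn_classify(x, training, k=3, metric=euclidean):
    """Baseline k-NN majority vote; expert weights are ignored."""
    nearest = sorted(training, key=lambda t: metric(x, t[0]))[:k]
    votes = Counter(label for _, label, _ in nearest)
    return votes.most_common(1)[0][0]
```

In this sketch, lowering a training document's expert weight shrinks its influence on the decision, whereas the k-NN baseline treats every neighbour equally. The adaptive part of APAH, which searches for the classification function itself, is not modelled here.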
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Szczepaniak, P.S., Tomczyk, A., Pryczek, M. (2007). Supervised Web Document Classification Using Discrete Transforms, Active Hypercontours and Expert Knowledge. In: Zhong, N., Liu, J., Yao, Y., Wu, J., Lu, S., Li, K. (eds) Web Intelligence Meets Brain Informatics. WImBI 2006. Lecture Notes in Computer Science, vol 4845. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-77028-2_18
DOI: https://doi.org/10.1007/978-3-540-77028-2_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-77027-5
Online ISBN: 978-3-540-77028-2
eBook Packages: Computer Science (R0)