Supervised Web Document Classification Using Discrete Transforms, Active Hypercontours and Expert Knowledge

  • Conference paper
Web Intelligence Meets Brain Informatics (WImBI 2006)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4845)

Abstract

In this paper, a new method for supervised classification of documents is proposed. It uses discrete transforms to extract features from the classified objects and adopts adaptive potential active hypercontours (APAH) for document classification. The idea of APAH generalizes classic contour methods of image segmentation. It has two main advantages: it can use almost any knowledge during the search for an optimal classification function, and it can operate in a feature space where only a metric is defined. Here, both are exploited: the first by using expert knowledge about the significance of documents from the training set, and the second by inducing new metrics in feature spaces. The method has been evaluated on a subset of the Open Directory Project (ODP) database and compared with k-NN, a well-known classification technique.
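As a rough illustration of the ingredients named in the abstract, the sketch below extracts features with a discrete transform and classifies with k-NN, the baseline the paper compares against. It is not the APAH method and is not taken from the paper. The way the transform is applied (a DCT-II over a per-term position histogram), the function names, and the parameters `bins` and `keep` are all assumptions made for this example. The classifier touches the feature vectors only through the `metric` argument, which mirrors the abstract's point that a method can operate in a space where only a metric is defined.

```python
import math
from collections import Counter


def term_signal(tokens, term, bins=8):
    """Count occurrences of `term` in each of `bins` equal-width position bins.

    Assumption for this sketch: a document is a token list and a term's
    "signal" is its occurrence histogram over document positions.
    """
    signal = [0.0] * bins
    n = len(tokens)
    for i, tok in enumerate(tokens):
        if tok == term:
            signal[i * bins // n] += 1.0
    return signal


def dct2(x):
    """Unnormalised DCT-II: X[k] = sum_n x[n] * cos(pi / N * (n + 0.5) * k)."""
    size = len(x)
    return [
        sum(x[n] * math.cos(math.pi / size * (n + 0.5) * k) for n in range(size))
        for k in range(size)
    ]


def features(text, vocabulary, bins=8, keep=3):
    """Concatenate the first `keep` DCT coefficients of every vocabulary term."""
    tokens = text.lower().split()
    feats = []
    for term in vocabulary:
        feats.extend(dct2(term_signal(tokens, term, bins))[:keep])
    return feats


def euclidean(a, b):
    """Default metric; any other distance function can be passed instead."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def knn_predict(query, training, k=3, metric=euclidean):
    """Majority vote among the k training items closest to `query`.

    `training` is a list of (feature_vector, label) pairs. Objects are
    compared only through `metric`.
    """
    nearest = sorted(training, key=lambda item: metric(query, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

To try it, build `training` as `[(features(text, vocabulary), label), ...]` from labelled documents and call `knn_predict(features(new_text, vocabulary), training)`. Replacing `euclidean` with another distance function changes the induced geometry without touching the rest of the code.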

Editor information

Ning Zhong, Jiming Liu, Yiyu Yao, Jinglong Wu, Shengfu Lu, Kuncheng Li

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Szczepaniak, P.S., Tomczyk, A., Pryczek, M. (2007). Supervised Web Document Classification Using Discrete Transforms, Active Hypercontours and Expert Knowledge. In: Zhong, N., Liu, J., Yao, Y., Wu, J., Lu, S., Li, K. (eds) Web Intelligence Meets Brain Informatics. WImBI 2006. Lecture Notes in Computer Science, vol 4845. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-77028-2_18

  • DOI: https://doi.org/10.1007/978-3-540-77028-2_18

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-77027-5

  • Online ISBN: 978-3-540-77028-2

  • eBook Packages: Computer Science, Computer Science (R0)
