Matching Semi-structured Documents Using Similarity of Regions through Fuzzy Rule-Based System

Ensan, Alireza; Biletskiy, Yevgen

doi:10.1007/978-3-642-39736-3_16

Alireza Ensan & Yevgen Biletskiy

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7987)

Abstract

This work briefly describes a novel approach for categorizing semi-structured documents using a fuzzy rule-based system. We propose a fuzzy logic representation for semi-structured documents and then, by proposing a new metric, categorize documents into different classes. The idea behind our approach is to divide web pages into different semantic sections and, using a fuzzy logic system, to extract features and weight the harvested terms in order to represent semi-structured documents. A set of metrics is also used to measure the similarity between documents based on the weight of each region in the text. A clustering algorithm that groups documents into several categories is also explained. This idea is inspired by, and can be viewed as a subfield of, the area of Matchmaking, which tries to match document creators and users in order to find the best similarities between them and connect them for further collaboration.
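
The region-based similarity described in the abstract can be illustrated with a minimal sketch. It is not the authors' metric: the paper's fuzzy rules and weighting scheme are not given on this page, so the sketch substitutes fixed, hypothetical region weights (`REGION_WEIGHTS`) and plain cosine similarity over term counts. The region names (`title`, `heading`, `body`) and the function names are likewise assumptions made only for illustration.

```python
import math
from collections import Counter

# Hypothetical region weights. In the paper these would come from a
# fuzzy rule-based system; fixed constants are used here as a stand-in.
REGION_WEIGHTS = {"title": 0.5, "heading": 0.3, "body": 0.2}


def region_vector(text):
    """Return bag-of-words term counts for the text of one region."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def document_similarity(doc_a, doc_b, weights=REGION_WEIGHTS):
    """Weighted average of per-region cosine similarities.

    Each document is a dict mapping a region name to that region's text.
    A region missing from a document is treated as empty text.
    """
    total = sum(weights.values())
    score = 0.0
    for region, weight in weights.items():
        vec_a = region_vector(doc_a.get(region, ""))
        vec_b = region_vector(doc_b.get(region, ""))
        score += weight * cosine(vec_a, vec_b)
    return score / total
```

Under these assumptions, two documents that share a title and headings but differ in body text still score high, because the higher-weighted regions dominate. A pairwise similarity of this kind could then feed any standard clustering algorithm; which one the paper uses is not stated here.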

Author information

Authors and Affiliations

University of New Brunswick, Fredericton, New Brunswick, Canada
Alireza Ensan & Yevgen Biletskiy

Editor information

Editors and Affiliations

Institute of Computer Vision and Applied Computer Sciences, IBaI, Leipzig, Germany
Petra Perner

About this paper

Cite this paper

Ensan, A., Biletskiy, Y. (2013). Matching Semi-structured Documents Using Similarity of Regions through Fuzzy Rule-Based System. In: Perner, P. (eds) Advances in Data Mining. Applications and Theoretical Aspects. ICDM 2013. Lecture Notes in Computer Science, vol 7987. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-39736-3_16

DOI: https://doi.org/10.1007/978-3-642-39736-3_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-39735-6
Online ISBN: 978-3-642-39736-3
eBook Packages: Computer Science, Computer Science (R0)
