
Matching Semi-structured Documents Using Similarity of Regions through Fuzzy Rule-Based System

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7987)

Abstract

The present work briefly describes a novel approach for categorizing semi-structured documents using a fuzzy rule-based system. We propose a fuzzy logic representation for semi-structured documents and a new metric that is used to categorize documents into classes. The idea behind our approach is to divide web pages into semantic sections and to use a fuzzy logic system to extract features and weight the harvested terms that represent each semi-structured document. A set of metrics is also used to measure the similarity between documents based on the weight of each region in the text. A clustering algorithm that groups documents into several categories is also explained. The idea is framed as a subfield of Matchmaking, which tries to match document creators and users in order to find the best similarities between them and connect them for further collaboration.
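As a rough illustration of region-based similarity, the sketch below splits each document into named regions, compares matching regions with cosine similarity over term frequencies, and combines the per-region scores using region weights. It is only a sketch under stated assumptions: the region names, the weight values, and the function names are hypothetical, and fixed weights with plain cosine similarity stand in for the fuzzy rule-based weighting and the proposed metric, neither of which is specified in the abstract.

```python
import math
from collections import Counter

# Hypothetical region weights; the abstract does not give the actual values.
REGION_WEIGHTS = {"title": 0.5, "headings": 0.3, "body": 0.2}


def term_vector(text):
    """Lower-cased bag-of-words term frequencies for one region."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity of two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    norm = norm_a * norm_b
    return dot / norm if norm else 0.0


def region_similarity(doc_a, doc_b, weights=REGION_WEIGHTS):
    """Weighted average of per-region cosine similarities.

    Each document is a dict mapping a region name to that region's text.
    A region missing from a document is treated as empty text.
    """
    total = sum(weights.values())
    score = sum(
        w * cosine(term_vector(doc_a.get(r, "")), term_vector(doc_b.get(r, "")))
        for r, w in weights.items()
    )
    return score / total
```

With these assumed weights, two documents that share only their title terms score about 0.5, while documents with no terms in common in any region score 0. The resulting pairwise scores could then feed any standard clustering algorithm.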




Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ensan, A., Biletskiy, Y. (2013). Matching Semi-structured Documents Using Similarity of Regions through Fuzzy Rule-Based System. In: Perner, P. (eds) Advances in Data Mining. Applications and Theoretical Aspects. ICDM 2013. Lecture Notes in Computer Science, vol 7987. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-39736-3_16


  • DOI: https://doi.org/10.1007/978-3-642-39736-3_16

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-39735-6

  • Online ISBN: 978-3-642-39736-3

  • eBook Packages: Computer Science, Computer Science (R0)
