Clustering XML Documents for Web Based Learning

Periakaruppan, Ramanathan; Nadarajan, Rethinaswamy

doi:10.1007/978-3-662-46315-4_24

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8390)

Included in the following conference series:

International Conference on Web-Based Learning

Abstract

The Web is increasingly used as a source of information for learning. Information on the Web should therefore be organized so that stakeholders can use it efficiently. Most of the information on the Web is available in the form of XML documents. Grouping (clustering) XML documents improves the effectiveness of information retrieval, and computing XML document similarity is a crucial task in clustering XML documents. In this paper we propose a novel method that computes the semantic structural similarity of XML documents by merging similar paths. In this method, the XML documents to be compared are represented by extracting all paths from the root to the leaves, and the paths are compared using a newly developed path matching algorithm. Similarity scores are assigned for exact, partial, and contained-in matches. In the case of a partial match, merge operations are used, namely the insertion of a new child (or descendants), a new parent (or ancestors), or both, and the creation of reference edges. The more merge operations are needed, the more dissimilar the paths are. Based on a similarity threshold, the paths of XML documents are merged and placed in the same cluster, which avoids pairwise similarity computations. The matching process also ensures the semantic structural similarity of the paths; that is, two XML paths may have a different order of hierarchy and still be semantically similar. The proposed method shows improved clustering accuracy.
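The path representation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the function names (`root_to_leaf_paths`, `is_subsequence`, `match_kind`) and the matching rules are assumptions made for illustration. Here "contained" is taken to mean that the shorter path is an ordered subsequence of the longer one, and "partial" that the two paths merely share at least one tag. The similarity scores and merge operations of the paper are not reproduced.

```python
import xml.etree.ElementTree as ET


def root_to_leaf_paths(xml_text):
    """Return the set of root-to-leaf tag paths of an XML document."""
    root = ET.fromstring(xml_text)
    paths = set()

    def walk(node, prefix):
        current = prefix + (node.tag,)
        children = list(node)
        if not children:  # an element without children ends a path
            paths.add(current)
        for child in children:
            walk(child, current)

    walk(root, ())
    return paths


def is_subsequence(short, long):
    """True if every tag of `short` occurs in `long` in the same order."""
    remaining = iter(long)
    return all(tag in remaining for tag in short)


def match_kind(p, q):
    """Classify two paths (assumed rules, not the paper's definitions)."""
    if p == q:
        return "exact"
    short, long = (p, q) if len(p) <= len(q) else (q, p)
    if is_subsequence(short, long):
        return "contained"
    if set(p) & set(q):
        return "partial"
    return "none"


doc_a = "<course><title/><instructor><name/></instructor></course>"
doc_b = "<course><instructor><dept><name/></dept></instructor><title/></course>"

paths_a = root_to_leaf_paths(doc_a)
paths_b = root_to_leaf_paths(doc_b)

# Compare every path of the first document with every path of the second.
for p in sorted(paths_a):
    for q in sorted(paths_b):
        print("/".join(p), "vs", "/".join(q), "->", match_kind(p, q))
```

In this example `course/instructor/name` is classified as "contained" in `course/instructor/dept/name`, while `course/title` matches exactly in both documents. A full implementation along the lines of the abstract would additionally count the merge operations needed for a partial match and compare the resulting score against a similarity threshold.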

Author information

Authors and Affiliations

PSG College of Technology, Coimbatore, India
Ramanathan Periakaruppan & Rethinaswamy Nadarajan

Corresponding author

Correspondence to Ramanathan Periakaruppan.

Editor information

Editors and Affiliations

Dickson Computer Systems, Kowloon, Hong Kong SAR
Dickson K. W. Chiu
The University of Hong Kong, Pokfulam, Hong Kong SAR
Minhong Wang
University of Craiova, Craiova, Romania
Elvira Popescu
City University of Hong Kong, Hong Kong, Hong Kong SAR
Qing Li
City University of Hong Kong, Kowloon, Hong Kong SAR
Rynson Lau
Department of Computer Science and Information Engineering, National Central University, Jhongli City, Taiwan
Timothy K. Shih
Department of Electrical Engineering, National Cheng Kung University, Taiwan, Taiwan
Chu-Sing Yang
Digital Systems Centre for Research & Technology Hellas (CERTH), University of Piraeus Dept of Digital Systems, Piraeus, Greece
Demetrios G. Sampson

About this paper

Cite this paper

Periakaruppan, R., Nadarajan, R. (2015). Clustering XML Documents for Web Based Learning. In: Chiu, D., et al. Advances in Web-Based Learning – ICWL 2013 Workshops. ICWL 2013. Lecture Notes in Computer Science, vol 8390. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-46315-4_24

DOI: https://doi.org/10.1007/978-3-662-46315-4_24
Published: 22 January 2015
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-46314-7
Online ISBN: 978-3-662-46315-4
eBook Packages: Computer Science, Computer Science (R0)
