Clustering XML Documents for Web Based Learning

  • Conference paper
Advances in Web-Based Learning – ICWL 2013 Workshops (ICWL 2013)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8390)

Abstract

The Web is increasingly used as a source of information for learning, so information on the Web should be organized so that stakeholders can use it efficiently. Most of this information is available as XML documents, and grouping (clustering) XML documents makes information retrieval more effective. Computing the similarity between XML documents is the crucial step in clustering them. In this paper we propose a novel method that computes the semantic structural similarity of XML documents by merging similar paths. Each XML document to be compared is represented by the set of all paths from its root to its leaves, and paths are compared using a newly developed path matching algorithm. Similarity scores are assigned for exact, partial, and contained-in matches. For a partial match, merge operations are applied: insertion of a new child (or descendants), insertion of a new parent (or ancestors), or both, and creation of reference edges. The more merge operations a match requires, the more dissimilar the paths. When the similarity meets a threshold, the paths of the XML documents are merged and the documents are placed in the same cluster, which avoids pairwise similarity computations. The matching process also captures the semantic structural similarity of paths: two XML paths may order their hierarchy differently and still be semantically similar. The proposed method shows improved clustering accuracy.
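
The pipeline outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm. The function names, the score constants, and the threshold are hypothetical, and the number of merge operations is only approximated by counting the tags that two paths do not share. Each incoming document is compared against the merged path set of each cluster, not against every other document.

```python
import xml.etree.ElementTree as ET

# Hypothetical scores; the abstract does not state the actual values.
EXACT_SCORE = 1.0
CONTAINED_SCORE = 0.8
PARTIAL_SCORE = 0.5


def root_to_leaf_paths(xml_text):
    """Return the set of root-to-leaf tag paths of an XML document."""
    paths = set()

    def walk(node, prefix):
        current = prefix + (node.tag,)
        children = list(node)
        if not children:
            paths.add(current)
        for child in children:
            walk(child, current)

    walk(ET.fromstring(xml_text), ())
    return paths


def is_subsequence(short, long):
    """True if the tags of `short` occur in `long` in the same order."""
    remaining = iter(long)
    return all(tag in remaining for tag in short)


def path_similarity(p, q):
    """Score an exact, contained-in, or partial match between two paths."""
    if p == q:
        return EXACT_SCORE
    short, long = sorted((p, q), key=len)
    if is_subsequence(short, long):
        return CONTAINED_SCORE
    if not set(p) & set(q):
        return 0.0
    # Assumption: each tag missing from one of the paths costs one merge
    # operation, and more merge operations mean lower similarity.
    merge_ops = len(set(p) ^ set(q))
    return PARTIAL_SCORE / (1 + merge_ops)


def similarity_to_cluster(doc_paths, cluster_paths):
    """Average, over the document's paths, of the best match in the cluster."""
    if not doc_paths:
        return 0.0
    best = [max((path_similarity(p, q) for q in cluster_paths), default=0.0)
            for p in doc_paths]
    return sum(best) / len(best)


def cluster_documents(docs, threshold=0.6):
    """Assign each (name, xml_text) pair to the first-best matching cluster."""
    clusters = []  # each cluster: {"paths": merged path set, "members": names}
    for name, xml_text in docs:
        paths = root_to_leaf_paths(xml_text)
        best = max(clusters,
                   key=lambda c: similarity_to_cluster(paths, c["paths"]),
                   default=None)
        if best and similarity_to_cluster(paths, best["paths"]) >= threshold:
            best["paths"] |= paths  # merge the paths into the cluster
            best["members"].append(name)
        else:
            clusters.append({"paths": set(paths), "members": [name]})
    return clusters


docs = [
    ("a", "<course><title/><instructor><name/></instructor></course>"),
    ("b", "<course><title/><instructor><name/><email/></instructor></course>"),
    ("c", "<invoice><item><price/></item></invoice>"),
]
clusters = cluster_documents(docs)
```

With these assumed scores, the two course documents share most of their paths and fall into one cluster, while the invoice document shares no tags with them and starts a cluster of its own.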

Author information

Corresponding author

Correspondence to Ramanathan Periakaruppan.

Copyright information

© 2015 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Periakaruppan, R., Nadarajan, R. (2015). Clustering XML Documents for Web Based Learning. In: Chiu, D., et al. Advances in Web-Based Learning – ICWL 2013 Workshops. ICWL 2013. Lecture Notes in Computer Science, vol 8390. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-46315-4_24

  • DOI: https://doi.org/10.1007/978-3-662-46315-4_24

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-662-46314-7

  • Online ISBN: 978-3-662-46315-4

  • eBook Packages: Computer Science; Computer Science (R0)
