A Novel Three-Way Merge Algorithm for HTML/XML Documents Using a Hidden Markov Model

Conference paper
Part of the Lecture Notes in Networks and Systems book series (LNNS, volume 283)


Ever since the advent of the modern World Wide Web (WWW), collaborative editing of Web pages at the programming level (via HTML/XML) has been identified as a major programming challenge. Yet, surprisingly, relatively little effort has gone into developing sound algorithms and methodologies that meet this challenge by automated means. This paper presents a novel algorithmic approach to merging HTML/XML code documents. It is based on the “Three-way Merge” approach using Hidden Markov Models, on “line-of-code-per-line-of-code” comparison between the documents involved, and on the “Nested Parenthesis” principle. The algorithm can be easily extended to any level higher than the “Three-way Merge”, although its computational complexity increases accordingly.
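
To make the terms in the abstract concrete, the following is a minimal Python sketch of two of the ingredients it names: a line-by-line three-way comparison and a nested-parenthesis check on tags. It is not the authors' algorithm, and the Hidden Markov Model component is not reproduced here. The function names (`merge_line`, `merge_lines`, `tags_balanced`) are hypothetical, and the sketch assumes that the three versions are already aligned line for line and that the markup uses XML-style tags (every non-self-closing tag has an explicit closing tag).

```python
import re

# Matches opening, closing, and self-closing tags (XML-style assumption).
TAG = re.compile(r"<(/?)([A-Za-z][\w:-]*)[^>]*?(/?)>")

def merge_line(base, ours, theirs):
    """Classic three-way rule for one aligned line; returns (text, conflict)."""
    if ours == theirs:
        return ours, False      # both sides agree
    if ours == base:
        return theirs, False    # only 'theirs' changed the line
    if theirs == base:
        return ours, False      # only 'ours' changed the line
    return ours, True           # both changed differently: keep 'ours', flag it

def merge_lines(base, ours, theirs):
    """Merge three equally long lists of lines, position by position."""
    if not (len(base) == len(ours) == len(theirs)):
        raise ValueError("this sketch assumes pre-aligned versions")
    merged, conflicts = [], []
    for i, (b, o, t) in enumerate(zip(base, ours, theirs)):
        text, conflict = merge_line(b, o, t)
        merged.append(text)
        if conflict:
            conflicts.append(i)
    return merged, conflicts

def tags_balanced(lines):
    """Nested-parenthesis check: each closing tag must match the latest open tag."""
    stack = []
    for line in lines:
        for closing, name, self_closing in TAG.findall(line):
            if self_closing:
                continue                      # <br/> opens and closes at once
            if closing:
                if not stack or stack.pop() != name:
                    return False              # mismatched or stray closing tag
            else:
                stack.append(name)
    return not stack                          # leftover open tags mean imbalance

base = ["<div>", "<p>old</p>", "</div>"]
ours = ["<div>", "<p>new</p>", "</div>"]
theirs = ["<section>", "<p>old</p>", "</section>"]
merged, conflicts = merge_lines(base, ours, theirs)
```

In this example each side changes different lines, so the merge takes `<section>`/`</section>` from one version and `<p>new</p>` from the other without conflict, and `tags_balanced(merged)` confirms that the result is still properly nested. If only the opening tag had been renamed, the line-wise merge would still succeed but the balance check would fail, which illustrates why a purely line-based comparison benefits from a structural check.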


Algorithms, Hidden Markov models, Version control, Document management, HTML/XML diffing



The authors would like to thank editors and anonymous reviewers for their valuable and constructive suggestions on this paper.



Copyright information

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022

Authors and Affiliations

  1. School of Engineering and Sciences, Computer Science and Programming Department, Foundation College, Athens, Greece
  2. University of Northampton, Faculty of Arts, Science and Technology, Northampton, UK
