A Novel Three-Way Merge Algorithm for HTML/XML Documents Using a Hidden Markov Model

Bakaoukas, Nikolaos G.; Bakaoukas, Anastasios G.

doi:10.1007/978-3-030-80119-9_3

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 283)

Abstract

Ever since the advent of the modern World Wide Web (WWW), collaborative editing of Web pages at the code level (via HTML/XML) has been recognised as a major programming challenge. Yet surprisingly little effort has gone into developing sound algorithms and methodologies for meeting this challenge by automated means. This paper presents a novel algorithmic approach to merging HTML/XML code documents that combines the “Three-way Merge” approach, implemented using Hidden Markov Models, with a “line-of-code-per-line-of-code” comparison between the documents involved and the “Nested Parenthesis” principle. The algorithm can be extended to any merge level higher than three-way, at the cost of a corresponding increase in computational complexity.
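
To make two of the ideas named in the abstract concrete, the following is a minimal sketch of a line-by-line three-way merge followed by a nesting check on the merged tags. It is not the authors' algorithm: the abstract gives no details of the Hidden Markov Model, so that part is omitted entirely. The function names `merge3` and `tags_balanced` are hypothetical, and the sketch assumes, for simplicity, that the three versions are already aligned line for line.

```python
import re
from typing import List, Tuple

# Matches an opening, closing, or self-closing tag and captures:
# (1) "/" if it is a closing tag, (2) the tag name, (3) "/" if self-closing.
TAG = re.compile(r"<(/?)([A-Za-z][\w:-]*)[^>]*?(/?)>")


def tags_balanced(lines: List[str]) -> bool:
    """Return True if tags nest like parentheses (stack discipline).

    Limitation of this sketch: HTML void elements written without a
    trailing slash (e.g. <br>) are treated as unclosed openers.
    """
    stack: List[str] = []
    for line in lines:
        for closing, name, self_closing in TAG.findall(line):
            if self_closing:
                continue  # <br/> opens and closes at once
            if closing:
                if not stack or stack.pop() != name:
                    return False
            else:
                stack.append(name)
    return not stack


def merge3(base: List[str], ours: List[str],
           theirs: List[str]) -> Tuple[List[str], List[int]]:
    """Line-by-line three-way merge of pre-aligned versions.

    Assumption (not from the paper): all three versions have the same
    number of lines, so line i of each version corresponds.
    Returns the merged lines and the indices of conflicting lines.
    """
    if not (len(base) == len(ours) == len(theirs)):
        raise ValueError("this sketch requires line-aligned inputs")
    merged: List[str] = []
    conflicts: List[int] = []
    for i, (o, a, b) in enumerate(zip(base, ours, theirs)):
        if a == b:            # both sides agree (changed or not)
            merged.append(a)
        elif a == o:          # only "theirs" changed this line
            merged.append(b)
        elif b == o:          # only "ours" changed this line
            merged.append(a)
        else:                 # both changed it differently
            conflicts.append(i)
            merged.append(a)  # keep "ours" as a placeholder
    return merged, conflicts


if __name__ == "__main__":
    base = ["<div>", "<p>old</p>", "</div>"]
    ours = ["<div>", "<p>new</p>", "</div>"]
    theirs = ["<div id='x'>", "<p>old</p>", "</div>"]
    merged, conflicts = merge3(base, ours, theirs)
    print(merged, conflicts, tags_balanced(merged))
```

Running the nesting check after the merge is one plausible way to catch a result that is acceptable line by line yet structurally broken; how the paper actually combines these steps is not stated in the abstract.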

Acknowledgment

The authors would like to thank the editors and anonymous reviewers for their valuable and constructive suggestions on this paper.

Author information

Authors and Affiliations

School of Engineering and Sciences, Computer Science and Programming Department, Foundation College, Mitropoleos Street and Mnisikleous Campus, 105 56, Athens, Greece
Nikolaos G. Bakaoukas
University of Northampton, Faculty of Arts, Science and Technology, Waterside Campus, University Drive, Northampton, NN1 5PH, UK
Anastasios G. Bakaoukas

Corresponding author

Correspondence to Anastasios G. Bakaoukas.

Editor information

Editors and Affiliations

Faculty of Science and Engineering, Saga University, Saga, Japan
Kohei Arai

Cite this paper

Bakaoukas, N.G., Bakaoukas, A.G. (2022). A Novel Three-Way Merge Algorithm for HTML/XML Documents Using a Hidden Markov Model. In: Arai, K. (eds) Intelligent Computing. Lecture Notes in Networks and Systems, vol 283. Springer, Cham. https://doi.org/10.1007/978-3-030-80119-9_3

DOI: https://doi.org/10.1007/978-3-030-80119-9_3
Published: 13 July 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-80118-2
Online ISBN: 978-3-030-80119-9
eBook Packages: Intelligent Technologies and Robotics (R0)
