Towards efficient XML parsing through minimization of JVM parameter space

The Journal of Supercomputing

Abstract

The growing use of Extensible Markup Language (XML) data in protocols and standards calls for efficient XML parsers. In Java, the XML DOM parser, despite operating in memory, does not reach peak execution performance on modern systems, especially when parsing large XML files. This inefficiency may be mitigated by selecting appropriate runtime parameters for the Java Virtual Machine (JVM). Doing so, however, entails exploring the parameter space exhaustively, which is not practically feasible in rapid application development. This paper aims to enhance XML parsing performance by selecting an optimal set of JVM runtime parameters. The proposed approach works independently of parser design. It reduces the JVM parameter space through machine learning-based models that are trained on profile data. The impact of each parameter is determined using linear regression and artificial neural network-based models. A location-based weight vector is then computed and, together with a threshold value for filtering parameters, yields a set of optimal parameters for performance enhancement. XML parsing code run with the optimal parameters achieves average speedups of 13.18% and 21.42% over the standard code on Intel Xeon and Intel Core i7-based systems, respectively.
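The filtering idea in the abstract can be illustrated with a minimal Java sketch. It is not the paper's method: the abstract does not define the location-based weight vector or name the parameters, so the sketch substitutes a simpler stand-in. It estimates each parameter's impact as the absolute slope of a one-variable least-squares fit of execution time against that parameter, normalizes the impacts to sum to 1, and keeps parameters whose share meets a threshold. The class name `ParameterFilter`, the sample profile data, the threshold of 0.1, and the choice of flags used as labels are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class ParameterFilter {

    /** Slope of a simple least-squares fit of y on x (one parameter at a time). */
    public static double slope(double[] x, double[] y) {
        int n = x.length;
        double meanX = 0, meanY = 0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;
        double sxy = 0, sxx = 0;
        for (int i = 0; i < n; i++) {
            sxy += (x[i] - meanX) * (y[i] - meanY);
            sxx += (x[i] - meanX) * (x[i] - meanX);
        }
        // A parameter that never varies carries no information; treat its slope as 0.
        return sxx == 0 ? 0 : sxy / sxx;
    }

    /**
     * Impact share per parameter: |slope| normalized so that all shares sum to 1.
     * samples[i][j] is the value of parameter j in profiling run i, assumed to be
     * pre-scaled to [0, 1] so that slopes are comparable across parameters.
     */
    public static double[] impacts(double[][] samples, double[] times) {
        int p = samples[0].length;
        double[] impact = new double[p];
        double total = 0;
        for (int j = 0; j < p; j++) {
            double[] column = new double[samples.length];
            for (int i = 0; i < samples.length; i++) {
                column[i] = samples[i][j];
            }
            impact[j] = Math.abs(slope(column, times));
            total += impact[j];
        }
        if (total > 0) {
            for (int j = 0; j < p; j++) {
                impact[j] /= total;
            }
        }
        return impact;
    }

    /** Keeps the parameters whose impact share is at least the threshold. */
    public static List<String> select(String[] names, double[] impact, double threshold) {
        List<String> kept = new ArrayList<>();
        for (int j = 0; j < names.length; j++) {
            if (impact[j] >= threshold) {
                kept.add(names[j]);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Illustrative labels only; the source does not list the tuned parameters.
        String[] names = {"-Xmx", "-XX:NewRatio", "-XX:ParallelGCThreads"};
        // Made-up profile data: four runs, three parameters scaled to [0, 1].
        double[][] samples = {{0, 0, 0}, {1, 0, 1}, {0, 1, 1}, {1, 1, 0}};
        double[] times = {10.0, 6.1, 9.1, 5.0}; // hypothetical parse times in seconds
        double[] impact = impacts(samples, times);
        System.out.println(select(names, impact, 0.1)); // threshold is an assumption
    }
}
```

The point of such a filter is that only the retained parameters need to be searched afterwards, which shrinks the space that would otherwise have to be explored exhaustively.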

Author information

Corresponding author

Correspondence to Minhaj Ahmad Khan.

About this article

Cite this article

Khan, M.A. Towards efficient XML parsing through minimization of JVM parameter space. J Supercomput 75, 3693–3711 (2019). https://doi.org/10.1007/s11227-018-2721-y

  • DOI: https://doi.org/10.1007/s11227-018-2721-y
