MapReduce distributed parallel computing framework for diagnosis and treatment of knee joint Kashin-Beck disease


This study aimed to improve the accuracy and computational efficiency of the MapReduce distributed parallel computing framework for mining the diagnosis and treatment data of Kashin-Beck disease (KBD) of the knee joint. To address the shortcomings of the traditional K-means clustering algorithm (KCA), a simplified distance calculation was proposed in which the Manhattan distance replaced the Euclidean distance. Further improvement strategies were proposed, and KCA on MapReduce (MR-KCA) and the improved MR-KCA (IMR-KCA) were implemented and compared. On the same data, the sum of squared errors of both MR-KCA and IMR-KCA decreased as the number of center points increased. The clustering quality of IMR-KCA was higher than that of MR-KCA, and the difference was most evident at a data capacity of 8 GB. The total execution time of both MR-KCA and IMR-KCA increased with the number of center points. Compared with MR-KCA, IMR-KCA had a significantly shorter total execution time, especially at a data capacity of 8 GB; with 5000 center points, IMR-KCA could reduce the total execution time by 50%. The experiments showed that IMR-KCA better presented the diagnosis and treatment data of patients with knee joint KBD. The scalability rates of MR-KCA and IMR-KCA decreased as the number of nodes increased but remained above 0.80 for both algorithms, indicating good scalability, and IMR-KCA was significantly more scalable than MR-KCA. The IMR-KCA proposed in this study achieved high accuracy and computing efficiency and could be used to visualize KBD diagnosis and treatment data.
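The distance substitution described in the abstract can be illustrated with a minimal single-process sketch. It is not the authors' IMR-KCA implementation: the function names (`manhattan`, `map_phase`, `reduce_phase`, `sse`, `kmeans`) are hypothetical, the map and reduce steps are plain Python functions rather than Hadoop jobs, and two details are assumptions that the abstract does not specify: centers are updated as the arithmetic mean of their assigned points, and the sum of squared errors is computed with squared Euclidean distance.

```python
from collections import defaultdict


def manhattan(p, q):
    """L1 distance: sum of absolute coordinate differences (no squares or roots)."""
    return sum(abs(a - b) for a, b in zip(p, q))


def map_phase(points, centers):
    """Map step: emit (index of nearest center, point) for every point."""
    for p in points:
        nearest = min(range(len(centers)), key=lambda i: manhattan(p, centers[i]))
        yield nearest, p


def reduce_phase(pairs, centers):
    """Reduce step: replace each center with the mean of its assigned points.

    Assumption: mean update, as in standard k-means. A center that receives
    no points keeps its previous position.
    """
    groups = defaultdict(list)
    for idx, p in pairs:
        groups[idx].append(p)
    new_centers = list(centers)
    for idx, members in groups.items():
        dim = len(members[0])
        new_centers[idx] = tuple(
            sum(m[d] for m in members) / len(members) for d in range(dim)
        )
    return new_centers


def sse(points, centers):
    """Sum of squared errors (assumed: squared Euclidean distance to the assigned center)."""
    total = 0.0
    for idx, p in map_phase(points, centers):
        total += sum((a - b) ** 2 for a, b in zip(p, centers[idx]))
    return total


def kmeans(points, centers, iterations=10):
    """Run a fixed number of map/reduce rounds and return the final centers."""
    for _ in range(iterations):
        centers = reduce_phase(map_phase(points, centers), centers)
    return centers


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    data = [(0, 0), (0, 2), (10, 10), (10, 12)]
    final = kmeans(data, [(0, 0), (10, 10)])
    print(final, sse(data, final))
```

The Manhattan distance avoids the multiplications and square root of the Euclidean distance, which is presumably where the simplification in the distance calculation comes from. In a real MapReduce job the map step would run on each data split and the reduce step would aggregate per center index across nodes.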






This work was supported by Fund of Gansu Health Care Research Plan (GSWSKY-2019-12).

Author information



Corresponding authors

Correspondence to Lintao Li or Shensong Li.


Cite this article

Dang, C., Yi, G., Zhu, Z. et al. MapReduce distributed parallel computing framework for diagnosis and treatment of knee joint Kashin-Beck disease. J Supercomput (2021).



Keywords

  • MapReduce distributed parallel computing framework
  • Knee joint Kashin-Beck disease
  • k-means clustering algorithm
  • Data mining
  • Center point