Quality & Quantity

Volume 47, Issue 3, pp 1761–1779

Discovering medical quality of total hip arthroplasty by rough set classifier with imbalanced class

  • Min-Hsiung Wei
  • Ching-Hsue Cheng
  • Chung-Shih Huang
  • Po-Chang Chiang


The incidence of total hip arthroplasty (THA) is expected to rise as the population ages and surgical techniques improve; a feasible alternative in health care can effectively increase medical quality. A hip joint is replaced to relieve severe arthritis pain that limits a patient's activities. The procedure is usually performed in people aged 60 and older; younger people who have a hip replaced may put extra stress on the artificial hip. After collecting data from the National Health Insurance database, this paper applies rigorous expert screening to reduce the data dimension. The proposed model then adopts an imbalanced sampling method to address the class imbalance problem and uses rough set theory to identify the core attributes (7 selected features). The rules extracted from these core attributes can serve as a comprehensive set of rules for medical quality. For verification, a THA dataset is used as a case study, and the performance of the proposed model is compared with that of other data-mining methods under various criteria. The results show that the proposed model is efficient and outperforms the listed methods under the listed criteria, and that hybrid sampling increases the true-positive rate for the minority class. The generated decision rules and core attributes also carry managerial implications, and the results can give stakeholders useful THA information to support decision making.
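To make the notion of core attributes concrete, the following is a minimal sketch of how rough set theory can flag them in a small decision table. It is not the paper's implementation: the attribute names (`age`, `stay`, `level`), the decision column `quality`, and the five toy records are hypothetical, and the helper functions `partition`, `dependency`, and `core` are introduced here only for illustration. An attribute is treated as core when removing it lowers the dependency degree, that is, the share of records whose decision is determined unambiguously by the remaining attributes.

```python
from collections import defaultdict

def partition(rows, attrs):
    """Group row indices into indiscernibility classes over the given attributes."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())

def dependency(rows, attrs, decision):
    """Share of rows in the positive region: classes whose rows all share one decision."""
    if not rows:
        return 0.0
    positive = 0
    for block in partition(rows, attrs):
        if len({rows[i][decision] for i in block}) == 1:
            positive += len(block)
    return positive / len(rows)

def core(rows, attrs, decision):
    """Attributes whose removal lowers the dependency degree."""
    full = dependency(rows, attrs, decision)
    return [a for a in attrs
            if dependency(rows, [b for b in attrs if b != a], decision) < full]

# Hypothetical toy decision table (not data from the paper).
TOY = [
    {"age": "old",   "stay": "long",  "level": "A", "quality": "low"},
    {"age": "old",   "stay": "short", "level": "A", "quality": "high"},
    {"age": "young", "stay": "long",  "level": "B", "quality": "low"},
    {"age": "young", "stay": "short", "level": "B", "quality": "high"},
    {"age": "old",   "stay": "short", "level": "B", "quality": "high"},
]
ATTRS = ["age", "stay", "level"]

if __name__ == "__main__":
    print(dependency(TOY, ATTRS, "quality"))
    print(core(TOY, ATTRS, "quality"))
```

In this toy table only `stay` is core: dropping it leaves records that agree on `age` and `level` but disagree on `quality`, whereas dropping `age` or `level` leaves every record unambiguous.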


Total hip arthroplasty; Medical quality; Rough set theory; Data mining





Copyright information

© Springer Science+Business Media B.V. 2011

Authors and Affiliations

  • Min-Hsiung Wei (1)
  • Ching-Hsue Cheng (2)
  • Chung-Shih Huang (3)
  • Po-Chang Chiang (2)

  1. Department of Orthopedics, Jiannren Hospital, Kaohsiung City, Taiwan
  2. Department of Information Management, National Yunlin University of Science and Technology, Touliu, Taiwan
  3. Department of International Business Administration, Chienkuo Technology University, Changhua, Taiwan
