Bayesian Network Retrieval Discrimination Criteria Model Based on Unbalanced Information

  • Man Xu
  • Dan Gan
  • Jiang Shen (corresponding author)
  • Bang An
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10983)


Class imbalance in sample data is usually ignored during case matching, even though it leads to misclassification. To solve this problem, we propose a discrimination criteria model based on the Bayesian network, together with a corresponding algorithm. Cost-sensitive learning of the Bayesian network in this model relies on the minimization theorem of the loss function. We use ROC curves to evaluate the performance of the retrieval model and verify the validity of the model on clinical diagnostic data for heart disease. Our results indicate that this method can effectively eliminate the cost sensitivity of imbalanced datasets and improve the accuracy of the retrieval results.
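The two ingredients named in the abstract, loss minimization and ROC-based evaluation, can be illustrated with a minimal sketch. This is not the authors' algorithm: the function names, the cost values, and the example probabilities are all hypothetical, and the posterior probabilities are assumed to come from some already-trained Bayesian network.

```python
def min_expected_loss_decision(posteriors, loss):
    """Pick the class with the smallest expected loss.

    posteriors[j] is P(true class = j | evidence), assumed to be supplied
    by a trained Bayesian network (hypothetical input here).
    loss[i][j] is the cost of predicting class i when the true class is j.
    """
    expected = [
        sum(cost * p for cost, p in zip(row, posteriors))
        for row in loss
    ]
    return min(range(len(expected)), key=expected.__getitem__)


def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical two-class example: class 0 = healthy, class 1 = disease.
posteriors = [0.8, 0.2]
symmetric_loss = [[0, 1], [1, 0]]    # both error types cost the same
asymmetric_loss = [[0, 10], [1, 0]]  # a missed disease case costs 10x more

decision_symmetric = min_expected_loss_decision(posteriors, symmetric_loss)
decision_asymmetric = min_expected_loss_decision(posteriors, asymmetric_loss)
example_auc = auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
```

With equal costs the majority class (index 0) is chosen. Once a missed minority case is assigned a higher cost, the expected loss of predicting class 0 becomes 2.0 against 0.8 for class 1, so the decision flips to the minority class even though its posterior is only 0.2.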


Keywords: Case matching, Bayesian network, Unbalanced data, Cost-sensitivity



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. Business School, Nankai University, Tianjin, China
  2. College of Management and Economics, Tianjin University, Tianjin, China
