Abstract
Unbalanced sample data are usually ignored in case matching, yet they cause misclassification during matching. To address this problem, we propose a discrimination criteria model based on the Bayesian network, together with a corresponding algorithm. Cost-sensitive learning of the Bayesian network in this model relies on the loss-function minimization theorem. We use a ROC curve to evaluate the performance of the retrieval model, and we verify the model's validity on diagnostic data for clinical heart disease. Our results indicate that this method can effectively eliminate the cost sensitivity of imbalanced datasets and improve the accuracy of the retrieval results.
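The two ideas named in the abstract, choosing a class by minimizing expected loss and evaluating with a ROC curve, can be illustrated with a minimal sketch. This is not the authors' algorithm: the function names `min_risk_decision` and `auc`, the cost matrix, and all numbers are hypothetical, and the posteriors are simply assumed to come from some probabilistic classifier such as a Bayesian network.

```python
from typing import Sequence


def min_risk_decision(posteriors: Sequence[float],
                      cost: Sequence[Sequence[float]]) -> int:
    """Return the class index with the smallest expected loss.

    cost[i][j] is the (assumed) loss of predicting class i when the
    true class is j; posteriors[j] is P(class j | evidence).
    """
    risks = [sum(c * p for c, p in zip(row, posteriors)) for row in cost]
    return min(range(len(risks)), key=risks.__getitem__)


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via pairwise ranking; ties count half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical example: class 0 = healthy, class 1 = disease.
posteriors = [0.8, 0.2]
zero_one_cost = [[0, 1], [1, 0]]    # every error costs the same
skewed_cost = [[0, 10], [1, 0]]     # missing a disease case costs 10x more

plain_choice = min_risk_decision(posteriors, zero_one_cost)
cost_aware_choice = min_risk_decision(posteriors, skewed_cost)

# Hypothetical scores for the positive class and their true labels.
example_auc = auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
```

With equal error costs the rule picks the more probable class, while the skewed cost matrix moves the decision to the minority class even though its posterior is only 0.2. That shift is the effect cost-sensitive learning aims for on imbalanced data.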
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Xu, M., Gan, D., Shen, J., An, B. (2018). Bayesian Network Retrieval Discrimination Criteria Model Based on Unbalanced Information. In: Chen, H., Fang, Q., Zeng, D., Wu, J. (eds) Smart Health. ICSH 2018. Lecture Notes in Computer Science, vol 10983. Springer, Cham. https://doi.org/10.1007/978-3-030-03649-2_21
DOI: https://doi.org/10.1007/978-3-030-03649-2_21
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-03648-5
Online ISBN: 978-3-030-03649-2
eBook Packages: Computer Science, Computer Science (R0)