Abstract
Multi-dimensional classifiers are Bayesian networks of restricted topological structure for classifying data instances into multiple classes. We show that, upon varying their parameter probabilities, the graphical properties of these classifiers induce higher-order sensitivity functions of restricted functional form. To allow ready interpretation of these functions, we introduce the concept of a balanced sensitivity function, in which parameter probabilities are related by the odds ratios of their original and new values. We demonstrate that these balanced functions provide a suitable heuristic for tuning multi-dimensional Bayesian network classifiers, with guaranteed bounds on the changes of all output probabilities.
Notes
1. In earlier research, we introduced the related concept of a sliced sensitivity function [3], which specifies an output probability of a Bayesian network in n linearly related parameters.
References
1. Bielza, C., Li, G., Larrañaga, P.: Multi-dimensional classification with Bayesian networks. Int. J. Approximate Reasoning 52, 705–727 (2011)
2. De Bock, J., de Campos, C.P., Antonucci, A.: Global sensitivity analysis for MAP inference in graphical models. In: Ghahramani, Z., Welling, M., Cortes, C., Lawrence, N.D., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems, vol. 27, pp. 2690–2698 (2014)
3. Bolt, J.H., Renooij, S.: Local sensitivity of Bayesian networks to multiple parameter shifts. In: van der Gaag, L.C., Feelders, A.J. (eds.) PGM 2014. LNCS, vol. 8754, pp. 65–80. Springer, Switzerland (2014)
4. Borchani, H., Bielza, C., Toro, C., Larrañaga, P.: Predicting human immunodeficiency virus inhibitors using multi-dimensional Bayesian network classifiers. Artif. Intell. Med. 57, 219–229 (2013)
5. Chan, H., Darwiche, A.: A distance measure for bounding probabilistic belief change. Int. J. Approximate Reasoning 38, 149–174 (2005)
6. van der Gaag, L.C., de Waal, P.R.: Multi-dimensional Bayesian network classifiers. In: Vomlel, J., Studený, M. (eds.) Proceedings of the Third European Workshop on Probabilistic Graphical Models, pp. 107–114 (2006)
7. Pearl, J.: Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann Publishers, Palo Alto (1988)
8. Renooij, S.: Co-variation for sensitivity analysis in Bayesian networks: properties, consequences and alternatives. Int. J. Approximate Reasoning 55, 1022–1042 (2014)
9. de Waal, P.R., van der Gaag, L.C.: Inference and learning in multi-dimensional Bayesian network classifiers. In: Mellouli, K. (ed.) ECSQARU 2007. LNCS (LNAI), vol. 4724, pp. 501–511. Springer, Heidelberg (2007)
Acknowledgements
This work was supported by the Netherlands Organisation for Scientific Research.
Appendix
Proof of Proposition 1. Let \( MDC (\mathbf{C,F})\) be a multi-dimensional classifier as before. Writing the output probability \(\Pr (\mathbf{c} \!\mid \!\mathbf{f})\) for a given \(\mathbf{c}\) and \(\mathbf{f}\) as \(\Pr (\mathbf{c} \!\mid \!\mathbf{f}) = \left( \Pr (\mathbf{f}\! \mid \!\mathbf{c})\cdot \Pr (\mathbf{c})\right) / \left( {\sum _{\mathbf{C}}\Pr (\mathbf{f}\! \mid \!\mathbf{C})\cdot \Pr (\mathbf{C})}\right) \), and including terms involving the original probability values \(\Pr ^o(\mathbf{c}\mid \mathbf{f})\) and \(\Pr ^o(\mathbf{c})\), results in
Rearranging its summands into a single fraction gives for the denominator
where \(\mathbf{C}\backslash \mathbf{c}^*\) is used to denote the set of all joint assignments to \(\mathbf{C}\) except \(\mathbf{c}^*\). Substitution and simplification now gives
in which we used that \(\Pr (\mathbf{f} \! \mid \! \mathbf{c}) = \prod _{i}\Pr (f_i \! \mid \! \mathbf{c}, \mathbf{f}_{F_i})\) with \(f_i,\mathbf{f}_{F_i}\sim \mathbf{f}\), and that \(\Pr (\mathbf{c})=\prod _{j}\Pr (c_j)\) with \(c_j\sim \mathbf{c}\). We then find that
   \(\Box \)
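For reference, the expansion underlying this proof can be sketched compactly by combining the Bayes expansion of \(\Pr (\mathbf{c} \mid \mathbf{f})\) with the factorisations stated above:
\[ \Pr (\mathbf{c} \mid \mathbf{f}) \;=\; \frac{\prod _{i}\Pr (f_i \mid \mathbf{c}, \mathbf{f}_{F_i})\cdot \prod _{j}\Pr (c_j)}{\sum _{\mathbf{C}}\prod _{i}\Pr (f_i \mid \mathbf{C}, \mathbf{f}_{F_i})\cdot \prod _{j}\Pr (C_j)}, \]
in which each term is a product of exactly one parameter per probability table of the classifier.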
Proof of Proposition 2. For the one-way sensitivity function describing the output probability \(\Pr(\mathbf{c} \mid \mathbf{f})\) of an MDC in a parameter \(x \sim \mathbf{c}\), we have that \(\Pr(\mathbf{c} \mid \mathbf{f})(x) = (x\cdot r) / (x\cdot s + t)\), where \(r,s,t \ge 0\) since these constants arise from multiplication and addition of probabilities. The function's first derivative equals \(\Pr(\mathbf{c} \mid \mathbf{f})'(x) = (r\cdot t) / (s\cdot x+t)^2\), which is always positive. Irrespective of the values of the other parameters in the classifier, therefore, an increase in the value of \(x \sim \mathbf{c}\) will result in an increase of \(\Pr(\mathbf{c} \mid \mathbf{f})\). Similarly, the output probability increases with a decrease in the value of a parameter \(x \not \sim \mathbf{c}\).    \(\Box \)
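For completeness, the stated derivative can be verified in one line with the quotient rule:
\[ \Pr(\mathbf{c} \mid \mathbf{f})'(x) \;=\; \frac{r\cdot (s\cdot x+t) - (x\cdot r)\cdot s}{(s\cdot x+t)^2} \;=\; \frac{r\cdot t}{(s\cdot x+t)^2} \;\ge \; 0. \]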
Proof of Proposition 3. Let \( MDC (\mathbf{C,F})\), \(\mathbf{G}\) and \(\mathbf{x}\) be as stated in the proposition, and let \(\mathbf{H}\) be such that \(\mathbf{H}=\mathbf{F}\backslash \mathbf{G}\). We first show that the proposition holds for any value combination \(\mathbf{c}\in \mathbf{C}\) given a fixed instance \(\mathbf{f}\). Using Proposition 1 we find that
from which we find
and hence
If \(\mathbf{x}\) includes all parameters of the classifier, then from each probability table exactly two parameters will not cancel out from the fraction \((\prod _{x_i\sim \mathbf{c},x_j\not \sim \mathbf{c}}x_i\cdot x^o_j) \, / \, (\prod _{x_i\sim \mathbf{c}^*,x_j\not \sim \mathbf{c}^*}x_i\cdot x^o_j)\). For each such parameter x, the fraction includes either \(\frac{x}{x^o}\) or \(\frac{x^o}{x}\). Now, for \(\alpha \ge 1\), we have that \(\frac{x}{x^o},\frac{x^o}{x}\in [1/\alpha ,\alpha ]\). With a balanced sensitivity function, therefore, the minimum of the fraction equals \(1/\alpha ^k\) and its maximum equals \(\alpha ^k\), where k is two times the number of probability tables. If \(\mathbf{x}\) includes just a subset of the classifier's parameters, we find that \(k=s+2\cdot t\), where s is the number of probability tables from which just a single parameter is in \(\mathbf{x}\) and t is the number of tables with two or more parameters in \(\mathbf{x}\).
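In sketch form, the argument for a full instance rests on the elementary observation that a product of k factors, each lying in \([1/\alpha ,\alpha ]\), itself lies in \([1/\alpha ^k,\alpha ^k]\): writing \(\rho _\ell \) for the \(\ell \)-th non-cancelling factor \(\frac{x}{x^o}\) or \(\frac{x^o}{x}\),
\[ \prod _{\ell =1}^{k}\rho _\ell \;\in \; [1/\alpha ^k,\alpha ^k] \quad \text{whenever} \quad \rho _\ell \in [1/\alpha ,\alpha ] \text{ for all } \ell . \]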
For an instance \(\mathbf{f}^{\prime }\not \sim \mathbf{f}\), we find \(\Pr (\mathbf{c} \mid \mathbf{f}^{\prime })\) by replacing (some of) the parameters in the fraction above by their proportional co-variants, which gives \(\frac{1-x}{1-x^o}\) or its reciprocal. Since, for \(\alpha \ge 1\), these fractions are in \([1/\alpha ,\alpha ]\) as well, the proof above generalises to all instances in \(\mathbf{F}\). For a partial instance \(\mathbf{g}\) we have that \(\Pr(\mathbf{C} \mid \mathbf{g})=\sum _\mathbf{H}\Pr (\mathbf{C}\mid \mathbf{g},\mathbf{H})\cdot \Pr (\mathbf{H}\mid \mathbf{g})\). Since \(O(\mathbf{C}\mid \mathbf{g},\mathbf{H}) \, / \, O^o(\mathbf{C}\mid \mathbf{g},\mathbf{H}) \in [1/\alpha ^k,\alpha ^k]\) and \(\sum _\mathbf{H}\Pr (\mathbf{H}\mid \mathbf{g})=1\), we further find that \(O(\mathbf{C}\mid \mathbf{g}) \, / \, O^o(\mathbf{C}\mid \mathbf{g}) \in [1/\alpha ^k,\alpha ^k]\) for all \(\mathbf{g}\in \mathbf{G}\).    \(\Box \)