
Balanced Tuning of Multi-dimensional Bayesian Network Classifiers

Conference paper

Symbolic and Quantitative Approaches to Reasoning with Uncertainty (ECSQARU 2015)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9161)

Abstract

Multi-dimensional classifiers are Bayesian networks of restricted topological structure, for classifying data instances into multiple classes. We show that upon varying their parameter probabilities, the graphical properties of these classifiers induce higher-order sensitivity functions of restricted functional form. To allow ready interpretation of these functions, we introduce the concept of balanced sensitivity function in which parameter probabilities are related by the odds ratios of their original and new values. We demonstrate that these balanced functions provide a suitable heuristic for tuning multi-dimensional Bayesian network classifiers, with guaranteed bounds on the changes of all output probabilities.


Notes

  1. In earlier research, we introduced the related concept of a sliced sensitivity function [3], which specifies an output probability of a Bayesian network in n linearly related parameters.

References

  1. Bielza, C., Li, G., Larrañaga, P.: Multi-dimensional classification with Bayesian networks. Int. J. Approximate Reasoning 52, 705–727 (2011)


  2. De Bock, J., de Campos, C.P., Antonucci, A.: Global sensitivity analysis for MAP inference in graphical models. In: Ghahramani, Z., Welling, M., Cortes, C., Lawrence, N.D., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems, vol. 27, pp. 2690–2698 (2014)


  3. Bolt, J.H., Renooij, S.: Local sensitivity of Bayesian networks to multiple parameter shifts. In: van der Gaag, L.C., Feelders, A.J. (eds.) PGM 2014. LNCS, vol. 8754, pp. 65–80. Springer, Switzerland (2014)


  4. Borchani, H., Bielza, C., Toro, C., Larrañaga, P.: Predicting human immunodeficiency virus inhibitors using multi-dimensional Bayesian network classifiers. Artif. Intell. Med. 57, 219–229 (2013)


  5. Chan, H., Darwiche, A.: A distance measure for bounding probabilistic belief change. Int. J. Approximate Reasoning 38, 149–174 (2005)


  6. van der Gaag, L.C., de Waal, P.R.: Multi-dimensional Bayesian network classifiers. In: Vomlel, J., Studený, M. (eds.) Proceedings of the Third European Workshop on Probabilistic Graphical Models, pp. 107–114 (2006)


  7. Pearl, J.: Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann Publishers, Palo Alto (1988)


  8. Renooij, S.: Co-variation for sensitivity analysis in Bayesian networks: properties, consequences and alternatives. Int. J. Approximate Reasoning 55, 1022–1042 (2014)


  9. de Waal, P.R., van der Gaag, L.C.: Inference and learning in multi-dimensional Bayesian network classifiers. In: Mellouli, K. (ed.) ECSQARU 2007. LNCS (LNAI), vol. 4724, pp. 501–511. Springer, Heidelberg (2007)



Acknowledgements

This work was supported by the Netherlands Organisation for Scientific Research.

Author information

Correspondence to Janneke H. Bolt.


Appendix

Proof of Proposition 1. Let \( MDC (\mathbf{C,F})\) be a multi-dimensional classifier as before. Writing the output probability \(\Pr (\mathbf{c} \!\mid \!\mathbf{f})\) for a given \(\mathbf{c}\) and \(\mathbf{f}\) as \(\Pr (\mathbf{c} \!\mid \!\mathbf{f}) = \left( \Pr (\mathbf{f}\! \mid \!\mathbf{c})\cdot \Pr (\mathbf{c})\right) / \left( {\sum _{\mathbf{C}}\Pr (\mathbf{f}\! \mid \!\mathbf{C})\cdot \Pr (\mathbf{C})}\right) \), and including terms involving the original probability values \(\Pr ^o(\mathbf{c}\mid \mathbf{f})\) and \(\Pr ^o(\mathbf{c})\), results in

$$\begin{aligned} \Pr (\mathbf{c}\! \mid \! \mathbf{f}) \ = \ \frac{\Big (\frac{\Pr (\mathbf{f}\mid \mathbf{c})\cdot \Pr (\mathbf{c})\cdot \Pr ^o(\mathbf{f}\mid \mathbf{c})\cdot \Pr ^o(\mathbf{c})}{\Pr ^o(\mathbf{f})\cdot \Pr ^o(\mathbf{f}\mid \mathbf{c})\cdot \Pr ^o(\mathbf{c})}\Big )}{\Big (\sum _{\mathbf{C}}\frac{\Pr (\mathbf{f}\mid \mathbf{C})\cdot \Pr (\mathbf{C})\cdot \Pr ^o(\mathbf{f}\mid \mathbf{C})\cdot \Pr ^o(\mathbf{C})}{\Pr ^o(\mathbf{f})\cdot \Pr ^o(\mathbf{f}\mid \mathbf{C})\cdot \Pr ^o(\mathbf{C})}\Big )} = \frac{\Big (\frac{\Pr ^o(\mathbf{c}\mid \mathbf{f})\cdot \Pr (\mathbf{f}\mid \mathbf{c})\cdot \Pr (\mathbf{c})}{\Pr ^o(\mathbf{f}\mid \mathbf{c})\cdot \Pr ^o(\mathbf{c})}\Big )}{\Big (\sum _\mathbf{C}\frac{\Pr ^o(\mathbf{C}\mid \mathbf{f})\cdot \Pr (\mathbf{f}\mid \mathbf{C})\cdot \Pr (\mathbf{C})}{\Pr ^o(\mathbf{f}\mid \mathbf{C})\cdot \Pr ^o(\mathbf{C})}\Big )}\\ \end{aligned}$$

Rearranging the summands of the denominator into a single fraction gives

$$\begin{aligned} \frac{\sum \limits _{\mathbf{c}^*\in \mathbf{C}} \left( \Pr ^o(\mathbf{c}^*\mid \mathbf{f}) \cdot \Pr (\mathbf{f}\mid \mathbf{c}^*)\cdot \Pr (\mathbf{c}^*)\cdot \prod \limits _{\mathbf{C}\backslash \mathbf{c}^*} \Pr ^o(\mathbf{f}\mid \mathbf{C})\cdot \Pr ^o(\mathbf{C}) \right) }{\prod \limits _{\mathbf{C}}\Pr ^o(\mathbf{f}\mid \mathbf{C})\cdot \Pr ^o(\mathbf{C})} \end{aligned}$$

where \(\mathbf{C}\backslash \mathbf{c}^*\) is used to denote the set of all joint assignments to \(\mathbf{C}\) except \(\mathbf{c}^*\). Substitution and simplification now gives

$$\begin{aligned}&\Pr ( \mathbf{c} \!\mid \mathbf{f}) \ = \ \frac{\Pr ^o(\mathbf{c}\mid \mathbf{f})\cdot \Pr (\mathbf{f}\mid \mathbf{c})\cdot \Pr (\mathbf{c})\cdot \prod _{\mathbf{C}\backslash \mathbf{c}}\Pr ^o(\mathbf{f}\mid \mathbf{C})\cdot \Pr ^o(\mathbf{C})}{\sum _{\mathbf{c}^*\in \mathbf{C}}\Pr ^o(\mathbf{c}^*\mid \mathbf{f})\cdot \Pr (\mathbf{f}\mid \mathbf{c}^*)\cdot \Pr (\mathbf{c}^*)\cdot \prod _{\mathbf{C}\backslash \mathbf{c}^*}\Pr ^o(\mathbf{f}\mid \mathbf{C})\cdot \Pr ^o(\mathbf{C})}\\&\qquad =\frac{\Pr ^o(\mathbf{c}\mid \mathbf{f})\cdot \prod _{i}\Pr (f_i\mid \mathbf{c}, \mathbf{f}_{F_i})\cdot \Pr (\mathbf{c})\cdot \prod _{\mathbf{C}\backslash \mathbf{c}}\prod _{i}\Pr ^o(f_i\mid \mathbf{C}, \mathbf{f}_{F_i})\cdot \Pr ^o(\mathbf{C})}{\sum _{\mathbf{c}^*\in \mathbf{C}} \left( \Pr ^o(\mathbf{c}^*\mid \mathbf{f})\cdot \prod _{i}\Pr (f_i\mid \mathbf{c}^*, \mathbf{f}_{F_i})\cdot \Pr (\mathbf{c}^*)\cdot \prod _{\mathbf{C}\backslash \mathbf{c}^*}\prod _{i}\Pr ^o(f_i\mid \mathbf{C}, \mathbf{f}_{F_i})\cdot \Pr ^o(\mathbf{C})\right) }\\ \end{aligned}$$

in which we used that \(\Pr (\mathbf{f} \! \mid \! \mathbf{c}) = \prod _{i}\Pr (f_i \! \mid \! \mathbf{c}, \mathbf{f}_{F_i})\) with \(f_i,\mathbf{f}_{F_i}\sim \mathbf{f}\), and that \(\Pr (\mathbf{c})=\prod _{j}\Pr (c_j)\) with \(c_j\sim \mathbf{c}\). We then find that

$$\begin{aligned} \Pr (\mathbf{c}\mid \mathbf{f})(\mathbf x)= & {} \frac{\Pr ^o(\mathbf{c}\mid \mathbf{f})\cdot \prod _{x_i\sim \mathbf{c},x_j\not \sim \mathbf{c}}x_i\cdot x^o_j}{\sum _{\mathbf{c}^*\in \mathbf{C}} \left( \Pr ^o(\mathbf{c}^*\mid \mathbf{f})\cdot \prod _{x_i\sim \mathbf{c}^*,x_j\not \sim \mathbf{c}^*}x_i\cdot x^o_j \right) } \end{aligned}$$

   \(\Box \)
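
To see the closed form of Proposition 1 at work, the following minimal Python sketch checks it numerically on a small hypothetical classifier with two binary class variables and one feature per class variable. The table names (prior1, like1, and so on) and all probability values are illustrative assumptions, not taken from the paper.

```python
# A minimal numerical sanity check of the closed form derived above, on a
# hypothetical MDC with two binary class variables C1, C2 (independent
# priors) and features F1 depending on C1 only, F2 on C2 only.
import itertools

# For a fixed instance f, each joint class assignment c* = (a, b) selects
# exactly one entry per probability table.
orig = {
    ('prior1', 0): 0.3, ('prior1', 1): 0.7,   # Pr^o(C1)
    ('prior2', 0): 0.6, ('prior2', 1): 0.4,   # Pr^o(C2)
    ('like1', 0): 0.2, ('like1', 1): 0.9,     # Pr^o(f1 | C1)
    ('like2', 0): 0.5, ('like2', 1): 0.1,     # Pr^o(f2 | C2)
}
# the varied parameters x and their new values
new = {('prior1', 1): 0.8, ('like1', 1): 0.95}

def selected(cs):
    a, b = cs
    return [('prior1', a), ('prior2', b), ('like1', a), ('like2', b)]

def score(cs, values):
    """Pr(f | c*) * Pr(c*) from the selected entries, with overrides."""
    p = 1.0
    for key in selected(cs):
        p *= values.get(key, orig[key])
    return p

assigns = list(itertools.product((0, 1), repeat=2))
c = (1, 1)

# direct recomputation: plug the new parameter values into the classifier
direct = score(c, new) / sum(score(cs, new) for cs in assigns)

# Proposition 1's form: each c*-term is Pr^o(c* | f) times new values x_i
# for varied parameters consistent with c*, originals x_j^o for the rest of x
Zo = sum(score(cs, {}) for cs in assigns)
def term(cs):
    post_o = score(cs, {}) / Zo                    # Pr^o(c* | f)
    sel = set(selected(cs))
    prod = 1.0
    for key, x_new in new.items():
        prod *= x_new if key in sel else orig[key]
    return post_o * prod

closed = term(c) / sum(term(cs) for cs in assigns)
assert abs(direct - closed) < 1e-9, (direct, closed)
print(direct, closed)   # identical up to rounding
```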

Proof of Proposition 2. For the one-way sensitivity function describing the output probability \(\Pr (\mathbf{c} \mid \mathbf{f})\) of an MDC in a parameter \(x \sim \mathbf{c}\), we have that \(\Pr (\mathbf{c} \mid \mathbf{f})(x) = (x\cdot r) / (x\cdot s + t)\), where \(r,s,t \ge 0\) since these constants arise from multiplication and addition of probabilities. The function's first derivative equals \(\Pr (\mathbf{c} \mid \mathbf{f})'(x)= (r\cdot t) / (s\cdot x+t)^2\), which is always positive. Irrespective of the values of the other parameters in the classifier, therefore, an increase in the value of \(x \sim \mathbf{c}\) results in an increase of \(\Pr (\mathbf{c} \mid \mathbf{f})\). Similarly, the output probability increases with a decrease in the value of \(x \not \sim \mathbf{c}\).    \(\Box \)
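
The derivative used in this proof is easily confirmed symbolically; the snippet below is a quick Python sanity check with sympy, where r, s and t stand for the nonnegative constants of the sensitivity function.

```python
# Symbolic verification of the first derivative used in the proof above.
import sympy as sp

x, r, s, t = sp.symbols('x r s t', nonnegative=True)
f = (x * r) / (x * s + t)
# d/dx [x*r / (x*s + t)] should equal r*t / (s*x + t)**2
assert sp.simplify(sp.diff(f, x) - (r * t) / (s * x + t)**2) == 0
```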

Proof of Proposition 3. Let \( MDC (\mathbf{C,F})\), \(\mathbf{G}\) and \(\mathbf{x}\) be as stated in the proposition, and let \(\mathbf{H}\) be such that \(\mathbf{H}=\mathbf{F}\backslash \mathbf{G}\). We first show that the proposition holds for any value combination \(\mathbf{c}\in \mathbf{C}\) given a fixed instance \(\mathbf{f}\). Using Proposition 1 we find that

$$\begin{aligned} O(\mathbf{c}\mid \mathbf{f})(\mathbf{x}) = \frac{\Pr (\mathbf{c}\mid \mathbf{f})(\mathbf{x}) }{1-\Pr (\mathbf{c}\mid \mathbf{f})(\mathbf{x}) } =\frac{\Pr ^o(\mathbf{c}\mid \mathbf{f})\cdot \prod _{x_i\sim \mathbf{c},x_j\not \sim \mathbf{c}}x_i\cdot x^o_j}{\sum _{\mathbf{c}^*\in \mathbf{C}\backslash \mathbf{c}} \left( \Pr ^o(\mathbf{c}^*\mid \mathbf{f})\cdot \prod _{x_i\sim \mathbf{c}^*,x_j\not \sim \mathbf{c}^*}x_i\cdot x^o_j \right) } \end{aligned}$$

from which we find

$$\begin{aligned} \frac{O(\mathbf{c}\mid \mathbf{f})(\mathbf{x}) }{O^o(\mathbf{c}\mid \mathbf{f})} = \frac{\sum _{\mathbf{c}^*\in \mathbf{C}\backslash \mathbf{c}} \left( \Pr ^o(\mathbf{c}^*\mid \mathbf{f})\cdot \prod _{x_i\sim \mathbf{c},x_j\not \sim \mathbf{c}} x_i\cdot x^o_j \right) }{\sum _{\mathbf{c}^*\in \mathbf{C}\backslash \mathbf{c}} \left( \Pr ^o(\mathbf{c}^*\mid \mathbf{f})\cdot \prod _{x_i\sim \mathbf{c}^*,x_j\not \sim \mathbf{c}^*}x_i\cdot x^o_j \right) } \end{aligned}$$

and hence

$$\begin{aligned} \mathrm{min}_{\mathbf{c}^*\in \mathbf{C}\backslash \mathbf{c}}\frac{\prod \limits _{x_i\sim \mathbf{c},x_j\not \sim \mathbf{c}}x_i\cdot x^o_j}{\prod \limits _{x_i\sim \mathbf{c}^*,x_j\not \sim \mathbf{c}^*}x_i\cdot x^o_j} \ \ \le \ \ \frac{O(\mathbf{c}\mid \mathbf{f})(\mathbf{x}) }{O^o(\mathbf{c}\mid \mathbf{f})}\ \ \le \ \ \mathrm{max}_{\mathbf{c}^*\in \mathbf{C}\backslash \mathbf{c}}\frac{\prod \limits _{x_i\sim \mathbf{c},x_j\not \sim \mathbf{c}}x_i\cdot x^o_j}{\prod \limits _{x_i\sim \mathbf{c}^*,x_j\not \sim \mathbf{c}^*}x_i\cdot x^o_j} \end{aligned}$$

If \(\mathbf{x}\) includes all parameters of the classifier, from each probability table exactly two parameters will not cancel out from the fraction \((\prod _{x_i\sim \mathbf{c},x_j\not \sim \mathbf{c}}x_i\cdot x^o_j) \, / \, (\prod _{x_i\sim \mathbf{c}^*,x_j\not \sim \mathbf{c}^*}x_i\cdot x^o_j)\). For each such parameter \(x\), the fraction includes either \(\frac{x}{x^o}\) or \(\frac{x^o}{x}\). Now, for \(\alpha \ge 1\), we have that \(\frac{x}{x^o},\frac{x^o}{x}\in [1/\alpha ,\alpha ]\). With a balanced sensitivity function, therefore, the minimum of the fraction equals \(1/\alpha ^k\) and the maximum is \(\alpha ^k\), where \(k\) is two times the number of probability tables. If \(\mathbf{x}\) includes just a subset of the classifier's parameters, we find that \(k=s+2\cdot t\), where \(s\) is the number of probability tables from which just a single parameter is in \(\mathbf{x}\) and \(t\) is the number of tables with two or more parameters in \(\mathbf{x}\).

For an instance \(\mathbf{f}^{\prime } \not \sim \mathbf{f}\), we find \(\Pr (\mathbf{c} \!\mid \! \mathbf{f}^{\prime })\) by replacing (some of) the parameters in the fraction above by their proportional co-variants, which gives \(\frac{1-x}{1-x^o}\) or its reciprocal. Since for \(\alpha \ge 1\) these fractions lie in \([1/\alpha ,\alpha ]\) as well, the proof above generalises to all instances in \(\mathbf{F}\). For a partial instance \(\mathbf{g}\) we have that \(\Pr (\mathbf{C} \mid \mathbf{g})=\sum _\mathbf{H}\Pr (\mathbf{C}\mid \mathbf{g},\mathbf{H})\cdot \Pr (\mathbf{H}\mid \mathbf{g})\). Since \(O(\mathbf{C}\mid \mathbf{g},\mathbf{H}) \, / \, O^o(\mathbf{C}\mid \mathbf{g},\mathbf{H}) \in [1/\alpha ^k,\alpha ^k]\) and \(\sum _\mathbf{H}\Pr (\mathbf{H}\mid \mathbf{g})=1\), we further find that \(O(\mathbf{C}\mid \mathbf{g}) \, / \, O^o(\mathbf{C}\mid \mathbf{g}) \in [1/\alpha ^k,\alpha ^k]\) for all \(\mathbf{g}\in \mathbf{G}\).    \(\Box \)
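
To illustrate the bound, the following Python sketch applies a balanced change to a toy classifier of the same kind as in the sketch after Proposition 1: every selected table entry has its odds shifted by a factor in \([1/\alpha ,\alpha ]\), and the resulting posterior odds of each joint class assignment are checked against \([1/\alpha ^k,\alpha ^k]\) with \(k\) equal to two times the number of tables. The network structure and all numbers are illustrative assumptions.

```python
# Illustration of the odds-ratio bound of Proposition 3 on a hypothetical
# four-table MDC (two binary class variables, one feature each). We treat
# the entries selected by a fixed instance f as free parameters, as in the
# functional form above; after the shift the tables need not sum to one.
import itertools, random

random.seed(0)
alpha = 1.5
k = 2 * 4   # at most two non-cancelling parameters per table, four tables

orig = {
    ('prior1', 0): 0.3, ('prior1', 1): 0.7,
    ('prior2', 0): 0.6, ('prior2', 1): 0.4,
    ('like1', 0): 0.2, ('like1', 1): 0.9,
    ('like2', 0): 0.5, ('like2', 1): 0.1,
}

def odds_shift(p, beta):
    """The probability whose odds equal beta times the odds of p."""
    return beta * p / (1.0 - p + beta * p)

# a balanced change: each parameter's odds move by a factor in
# [1/alpha, alpha], so x/x^o and x^o/x both stay within [1/alpha, alpha]
new = {key: odds_shift(p, random.uniform(1.0 / alpha, alpha))
       for key, p in orig.items()}

def posterior(values, c):
    def score(cs):
        a, b = cs
        return (values[('prior1', a)] * values[('prior2', b)] *
                values[('like1', a)] * values[('like2', b)])
    return score(c) / sum(score(cs)
                          for cs in itertools.product((0, 1), repeat=2))

for c in itertools.product((0, 1), repeat=2):
    po, pn = posterior(orig, c), posterior(new, c)
    ratio = (pn / (1 - pn)) / (po / (1 - po))   # O(c|f)(x) / O^o(c|f)
    assert alpha**(-k) <= ratio <= alpha**k, (c, ratio)
    print(c, round(ratio, 4))
```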


Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Bolt, J.H., van der Gaag, L.C. (2015). Balanced Tuning of Multi-dimensional Bayesian Network Classifiers. In: Destercke, S., Denoeux, T. (eds) Symbolic and Quantitative Approaches to Reasoning with Uncertainty. ECSQARU 2015. Lecture Notes in Computer Science, vol 9161. Springer, Cham. https://doi.org/10.1007/978-3-319-20807-7_19


  • DOI: https://doi.org/10.1007/978-3-319-20807-7_19


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-20806-0

  • Online ISBN: 978-3-319-20807-7

