
Balanced Tuning of Multi-dimensional Bayesian Network Classifiers

Conference paper

Symbolic and Quantitative Approaches to Reasoning with Uncertainty (ECSQARU 2015)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9161)

Abstract

Multi-dimensional classifiers are Bayesian networks of restricted topological structure, for classifying data instances into multiple classes. We show that upon varying their parameter probabilities, the graphical properties of these classifiers induce higher-order sensitivity functions of restricted functional form. To allow ready interpretation of these functions, we introduce the concept of balanced sensitivity function in which parameter probabilities are related by the odds ratios of their original and new values. We demonstrate that these balanced functions provide a suitable heuristic for tuning multi-dimensional Bayesian network classifiers, with guaranteed bounds on the changes of all output probabilities.


Notes

  1. In earlier research, we introduced the related concept of a sliced sensitivity function [3], which specifies an output probability of a Bayesian network in n linearly related parameters.

References

  1. Bielza, C., Li, G., Larrañaga, P.: Multi-dimensional classification with Bayesian networks. Int. J. Approximate Reasoning 52, 705–727 (2011)


  2. De Bock, J., de Campos, C.P., Antonucci, A.: Global sensitivity analysis for MAP inference in graphical models. In: Ghahramani, Z., Welling, M., Cortes, C., Lawrence, N.D., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems, vol. 27, pp. 2690–2698 (2014)


  3. Bolt, J.H., Renooij, S.: Local sensitivity of Bayesian networks to multiple parameter shifts. In: van der Gaag, L.C., Feelders, A.J. (eds.) PGM 2014. LNCS, vol. 8754, pp. 65–80. Springer, Switzerland (2014)


  4. Borchani, H., Bielza, C., Toro, C., Larrañaga, P.: Predicting human immunodeficiency virus inhibitors using multi-dimensional Bayesian network classifiers. Artif. Intell. Med. 57, 219–229 (2013)


  5. Chan, H., Darwiche, A.: A distance measure for bounding probabilistic belief change. Int. J. Approximate Reasoning 38, 149–174 (2005)


  6. van der Gaag, L.C., de Waal, P.R.: Multi-dimensional Bayesian network classifiers. In: Vomlel, J., Studený, M. (eds.) Proceedings of the Third European Workshop on Probabilistic Graphical Models, pp. 107–114 (2006)


  7. Pearl, J.: Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann Publishers, Palo Alto (1988)


  8. Renooij, S.: Co-variation for sensitivity analysis in Bayesian networks: properties, consequences and alternatives. Int. J. Approximate Reasoning 55, 1022–1042 (2014)


  9. de Waal, P.R., van der Gaag, L.C.: Inference and learning in multi-dimensional Bayesian network classifiers. In: Mellouli, K. (ed.) ECSQARU 2007. LNCS (LNAI), vol. 4724, pp. 501–511. Springer, Heidelberg (2007)



Acknowledgements

This work was supported by the Netherlands Organisation for Scientific Research.

Author information

Correspondence to Janneke H. Bolt.


Appendix

Proof of Proposition 1. Let \( MDC (\mathbf{C,F})\) be a multi-dimensional classifier as before. Writing the output probability \(\Pr (\mathbf{c} \!\mid \!\mathbf{f})\) for a given \(\mathbf{c}\) and \(\mathbf{f}\) as \(\Pr (\mathbf{c} \!\mid \!\mathbf{f}) = \left( \Pr (\mathbf{f}\! \mid \!\mathbf{c})\cdot \Pr (\mathbf{c})\right) / \left( {\sum _{\mathbf{C}}\Pr (\mathbf{f}\! \mid \!\mathbf{C})\cdot \Pr (\mathbf{C})}\right) \), and including terms involving the original probability values \(\Pr ^o(\mathbf{c}\mid \mathbf{f})\) and \(\Pr ^o(\mathbf{c})\), results in

$$\begin{aligned} \Pr (\mathbf{c}\! \mid \! \mathbf{f}) \ = \ \frac{\Big (\frac{\Pr (\mathbf{f}\mid \mathbf{c})\cdot \Pr (\mathbf{c})\cdot \Pr ^o(\mathbf{f}\mid \mathbf{c})\cdot \Pr ^o(\mathbf{c})}{\Pr ^o(\mathbf{f})\cdot \Pr ^o(\mathbf{f}\mid \mathbf{c})\cdot \Pr ^o(\mathbf{c})}\Big )}{\Big (\sum _{\mathbf{C}}\frac{\Pr (\mathbf{f}\mid \mathbf{C})\cdot \Pr (\mathbf{C})\cdot \Pr ^o(\mathbf{f}\mid \mathbf{C})\cdot \Pr ^o(\mathbf{C})}{\Pr ^o(\mathbf{f})\cdot \Pr ^o(\mathbf{f}\mid \mathbf{C})\cdot \Pr ^o(\mathbf{C})}\Big )} = \frac{\Big (\frac{\Pr ^o(\mathbf{c}\mid \mathbf{f})\cdot \Pr (\mathbf{f}\mid \mathbf{c})\cdot \Pr (\mathbf{c})}{\Pr ^o(\mathbf{f}\mid \mathbf{c})\cdot \Pr ^o(\mathbf{c})}\Big )}{\Big (\sum _\mathbf{C}\frac{\Pr ^o(\mathbf{C}\mid \mathbf{f})\cdot \Pr (\mathbf{f}\mid \mathbf{C})\cdot \Pr (\mathbf{C})}{\Pr ^o(\mathbf{f}\mid \mathbf{C})\cdot \Pr ^o(\mathbf{C})}\Big )}\\ \end{aligned}$$

Rearranging the summands of the denominator into a single fraction gives

$$\begin{aligned} \frac{\sum \limits _{\mathbf{c}^*\in \mathbf{C}} \left( \Pr ^o(\mathbf{c}^*\mid \mathbf{f}) \cdot \Pr (\mathbf{f}\mid \mathbf{c}^*)\cdot \Pr (\mathbf{c}^*)\cdot \prod \limits _{\mathbf{C}\backslash \mathbf{c}^*} \Pr ^o(\mathbf{f}\mid \mathbf{C})\cdot \Pr ^o(\mathbf{C}) \right) }{\prod \limits _{\mathbf{C}}\Pr ^o(\mathbf{f}\mid \mathbf{C})\cdot \Pr ^o(\mathbf{C})} \end{aligned}$$

where \(\mathbf{C}\backslash \mathbf{c}^*\) is used to denote the set of all joint assignments to \(\mathbf{C}\) except \(\mathbf{c}^*\). Substitution and simplification now gives

$$\begin{aligned}&\Pr ( \mathbf{c} \!\mid \mathbf{f}) \ = \ \frac{\Pr ^o(\mathbf{c}\mid \mathbf{f})\cdot \Pr (\mathbf{f}\mid \mathbf{c})\cdot \Pr (\mathbf{c})\cdot \prod _{\mathbf{C}\backslash \mathbf{c}}\Pr ^o(\mathbf{f}\mid \mathbf{C})\cdot \Pr ^o(\mathbf{C})}{\sum _{\mathbf{c}^*\in \mathbf{C}}\Pr ^o(\mathbf{c}^*\mid \mathbf{f})\cdot \Pr (\mathbf{f}\mid \mathbf{c}^*)\cdot \Pr (\mathbf{c}^*)\cdot \prod _{\mathbf{C}\backslash \mathbf{c}^*}\Pr ^o(\mathbf{f}\mid \mathbf{C})\cdot \Pr ^o(\mathbf{C})}\\&\qquad =\frac{\Pr ^o(\mathbf{c}\mid \mathbf{f})\cdot \prod _{i}\Pr (f_i\mid \mathbf{c}, \mathbf{f}_{F_i})\cdot \Pr (\mathbf{c})\cdot \prod _{\mathbf{C}\backslash \mathbf{c}}\prod _{i}\Pr ^o(f_i\mid \mathbf{C}, \mathbf{f}_{F_i})\cdot \Pr ^o(\mathbf{C})}{\sum _{\mathbf{c}^*\in \mathbf{C}} \left( \Pr ^o(\mathbf{c}^*\mid \mathbf{f})\cdot \prod _{i}\Pr (f_i\mid \mathbf{c}^*, \mathbf{f}_{F_i})\cdot \Pr (\mathbf{c}^*)\cdot \prod _{\mathbf{C}\backslash \mathbf{c}^*}\prod _{i}\Pr ^o(f_i\mid \mathbf{C}, \mathbf{f}_{F_i})\cdot \Pr ^o(\mathbf{C})\right) }\\ \end{aligned}$$

in which we used that \(\Pr (\mathbf{f} \! \mid \! \mathbf{c}) = \prod _{i}\Pr (f_i \! \mid \! \mathbf{c}, \mathbf{f}_{F_i})\) with \(f_i,\mathbf{f}_{F_i}\sim \mathbf{f}\), and that \(\Pr (\mathbf{c})=\prod _{j}\Pr (c_j)\) with \(c_j\sim \mathbf{c}\). We then find that

$$\begin{aligned} \Pr (\mathbf{c}\mid \mathbf{f})(\mathbf x)= & {} \frac{\Pr ^o(\mathbf{c}\mid \mathbf{f})\cdot \prod _{x_i\sim \mathbf{c},x_j\not \sim \mathbf{c}}x_i\cdot x^o_j}{\sum _{\mathbf{c}^*\in \mathbf{C}} \left( \Pr ^o(\mathbf{c}^*\mid \mathbf{f})\cdot \prod _{x_i\sim \mathbf{c}^*,x_j\not \sim \mathbf{c}^*}x_i\cdot x^o_j \right) } \end{aligned}$$

   \(\Box \)
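
To see the closed form of Proposition 1 at work, the following minimal Python sketch checks it numerically on a small hypothetical classifier with two binary class variables and one feature per class variable. The table names (prior1, like1, and so on) and all probability values are illustrative assumptions, not taken from the paper.

```python
# A minimal numerical sanity check of the closed form derived above, on a
# hypothetical MDC with two binary class variables C1, C2 (independent
# priors) and features F1 depending on C1 only, F2 on C2 only.
import itertools

# For a fixed instance f, each joint class assignment c* = (a, b) selects
# exactly one entry per probability table.
orig = {
    ('prior1', 0): 0.3, ('prior1', 1): 0.7,   # Pr^o(C1)
    ('prior2', 0): 0.6, ('prior2', 1): 0.4,   # Pr^o(C2)
    ('like1', 0): 0.2, ('like1', 1): 0.9,     # Pr^o(f1 | C1)
    ('like2', 0): 0.5, ('like2', 1): 0.1,     # Pr^o(f2 | C2)
}
# the varied parameters x and their new values
new = {('prior1', 1): 0.8, ('like1', 1): 0.95}

def selected(cs):
    a, b = cs
    return [('prior1', a), ('prior2', b), ('like1', a), ('like2', b)]

def score(cs, values):
    """Pr(f | c*) * Pr(c*) from the selected entries, with overrides."""
    p = 1.0
    for key in selected(cs):
        p *= values.get(key, orig[key])
    return p

assigns = list(itertools.product((0, 1), repeat=2))
c = (1, 1)

# direct recomputation: plug the new parameter values into the classifier
direct = score(c, new) / sum(score(cs, new) for cs in assigns)

# Proposition 1's form: each c*-term is Pr^o(c* | f) times new values x_i
# for varied parameters consistent with c*, originals x_j^o for the rest of x
Zo = sum(score(cs, {}) for cs in assigns)
def term(cs):
    post_o = score(cs, {}) / Zo                    # Pr^o(c* | f)
    sel = set(selected(cs))
    prod = 1.0
    for key, x_new in new.items():
        prod *= x_new if key in sel else orig[key]
    return post_o * prod

closed = term(c) / sum(term(cs) for cs in assigns)
assert abs(direct - closed) < 1e-9, (direct, closed)
print(direct, closed)   # identical up to rounding
```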

Proof of Proposition 2. For the one-way sensitivity function describing the output probability \(\Pr (\mathbf{c} \mid \mathbf{f})\) of an MDC in a parameter \(x \sim \mathbf{c}\), we have that \(\Pr (\mathbf{c} \mid \mathbf{f})(x) = (x\cdot r) / (x\cdot s + t)\), where \(r,s,t \ge 0\) since these constants arise from multiplication and addition of probabilities. The function's first derivative equals \(\Pr (\mathbf{c} \mid \mathbf{f})'(x)= (r\cdot t) / (s\cdot x+t)^2\), which is always positive. Irrespective of the values of the other parameters in the classifier, therefore, an increase in the value of \(x \sim \mathbf{c}\) results in an increase of \(\Pr (\mathbf{c} \mid \mathbf{f})\). Similarly, the output probability increases with a decrease in the value of \(x \not \sim \mathbf{c}\).    \(\Box \)
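
The derivative used in this proof is easily confirmed symbolically; the snippet below is a quick Python sanity check with sympy, where r, s and t stand for the nonnegative constants of the sensitivity function.

```python
# Symbolic verification of the first derivative used in the proof above.
import sympy as sp

x, r, s, t = sp.symbols('x r s t', nonnegative=True)
f = (x * r) / (x * s + t)
# d/dx [x*r / (x*s + t)] should equal r*t / (s*x + t)**2
assert sp.simplify(sp.diff(f, x) - (r * t) / (s * x + t)**2) == 0
```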

Proof of Proposition 3. Let \( MDC (\mathbf{C,F})\), \(\mathbf{G}\) and \(\mathbf{x}\) be as stated in the proposition, and let \(\mathbf{H}\) be such that \(\mathbf{H}=\mathbf{F}\backslash \mathbf{G}\). We first show that the proposition holds for any value combination \(\mathbf{c}\in \mathbf{C}\) given a fixed instance \(\mathbf{f}\). Using Proposition 1 we find that

$$\begin{aligned} O(\mathbf{c}\mid \mathbf{f})(\mathbf{x}) = \frac{\Pr (\mathbf{c}\mid \mathbf{f})(\mathbf{x}) }{1-\Pr (\mathbf{c}\mid \mathbf{f})(\mathbf{x}) } =\frac{\Pr ^o(\mathbf{c}\mid \mathbf{f})\cdot \prod _{x_i\sim \mathbf{c},x_j\not \sim \mathbf{c}}x_i\cdot x^o_j}{\sum _{\mathbf{c}^*\in \mathbf{C}\backslash \mathbf{c}} \left( \Pr ^o(\mathbf{c}^*\mid \mathbf{f})\cdot \prod _{x_i\sim \mathbf{c}^*,x_j\not \sim \mathbf{c}^*}x_i\cdot x^o_j \right) } \end{aligned}$$

from which we find

$$\begin{aligned} \frac{O(\mathbf{c}\mid \mathbf{f})(\mathbf{x}) }{O^o(\mathbf{c}\mid \mathbf{f})} = \frac{\sum _{\mathbf{c}^*\in \mathbf{C}\backslash \mathbf{c}} \left( \Pr ^o(\mathbf{c}^*\mid \mathbf{f})\cdot \prod _{x_i\sim \mathbf{c},x_j\not \sim \mathbf{c}} x_i\cdot x^o_j \right) }{\sum _{\mathbf{c}^*\in \mathbf{C}\backslash \mathbf{c}} \left( \Pr ^o(\mathbf{c}^*\mid \mathbf{f})\cdot \prod _{x_i\sim \mathbf{c}^*,x_j\not \sim \mathbf{c}^*}x_i\cdot x^o_j \right) } \end{aligned}$$

and hence

$$\begin{aligned} \mathrm{min}_{\mathbf{c}^*\in \mathbf{C}\backslash \mathbf{c}}\frac{\prod \limits _{x_i\sim \mathbf{c},x_j\not \sim \mathbf{c}}x_i\cdot x^o_j}{\prod \limits _{x_i\sim \mathbf{c}^*,x_j\not \sim \mathbf{c}^*}x_i\cdot x^o_j} \ \ \le \ \ \frac{O(\mathbf{c}\mid \mathbf{f})(\mathbf{x}) }{O^o(\mathbf{c}\mid \mathbf{f})}\ \ \le \ \ \mathrm{max}_{\mathbf{c}^*\in \mathbf{C}\backslash \mathbf{c}}\frac{\prod \limits _{x_i\sim \mathbf{c},x_j\not \sim \mathbf{c}}x_i\cdot x^o_j}{\prod \limits _{x_i\sim \mathbf{c}^*,x_j\not \sim \mathbf{c}^*}x_i\cdot x^o_j} \end{aligned}$$

If \(\mathbf{x}\) includes all parameters of the classifier, from each probability table exactly two parameters will not cancel out from the fraction \((\prod _{x_i\sim \mathbf{c},x_j\not \sim \mathbf{c}}x_i\cdot x^o_j) \, / \, (\prod _{x_i\sim \mathbf{c}^*,x_j\not \sim \mathbf{c}^*}x_i\cdot x^o_j)\). For each such parameter \(x\), the fraction includes either \(\frac{x}{x^o}\) or \(\frac{x^o}{x}\). Now, for \(\alpha \ge 1\), we have that \(\frac{x}{x^o},\frac{x^o}{x}\in [1/\alpha ,\alpha ]\). With a balanced sensitivity function, therefore, the minimum of the fraction equals \(1/\alpha ^k\) and the maximum is \(\alpha ^k\), where \(k\) is two times the number of probability tables. If \(\mathbf{x}\) includes just a subset of the classifier's parameters, we find that \(k=s+2\cdot t\), where \(s\) is the number of probability tables from which just a single parameter is in \(\mathbf{x}\) and \(t\) is the number of tables with two or more parameters in \(\mathbf{x}\).

For an instance \(\mathbf{f}^{\prime } \not \sim \mathbf{f}\), we find \(\Pr (\mathbf{c} \!\mid \! \mathbf{f}^{\prime })\) by replacing (some of) the parameters in the fraction above by their proportional co-variants, which gives \(\frac{1-x}{1-x^o}\) or its reciprocal. Since for \(\alpha \ge 1\) these fractions lie in \([1/\alpha ,\alpha ]\) as well, the proof above generalises to all instances in \(\mathbf{F}\). For a partial instance \(\mathbf{g}\) we have that \(\Pr (\mathbf{C} \mid \mathbf{g})=\sum _\mathbf{H}\Pr (\mathbf{C}\mid \mathbf{g},\mathbf{H})\cdot \Pr (\mathbf{H}\mid \mathbf{g})\). Since \(O(\mathbf{C}\mid \mathbf{g},\mathbf{H}) \, / \, O^o(\mathbf{C}\mid \mathbf{g},\mathbf{H}) \in [1/\alpha ^k,\alpha ^k]\) and \(\sum _\mathbf{H}\Pr (\mathbf{H}\mid \mathbf{g})=1\), we further find that \(O(\mathbf{C}\mid \mathbf{g}) \, / \, O^o(\mathbf{C}\mid \mathbf{g}) \in [1/\alpha ^k,\alpha ^k]\) for all \(\mathbf{g}\in \mathbf{G}\).    \(\Box \)
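
To illustrate the bound, the following Python sketch applies a balanced change to a toy classifier of the same kind as in the sketch after Proposition 1: every selected table entry has its odds shifted by a factor in \([1/\alpha ,\alpha ]\), and the resulting posterior odds of each joint class assignment are checked against \([1/\alpha ^k,\alpha ^k]\) with \(k\) equal to two times the number of tables. The network structure and all numbers are illustrative assumptions.

```python
# Illustration of the odds-ratio bound of Proposition 3 on a hypothetical
# four-table MDC (two binary class variables, one feature each). We treat
# the entries selected by a fixed instance f as free parameters, as in the
# functional form above; after the shift the tables need not sum to one.
import itertools, random

random.seed(0)
alpha = 1.5
k = 2 * 4   # at most two non-cancelling parameters per table, four tables

orig = {
    ('prior1', 0): 0.3, ('prior1', 1): 0.7,
    ('prior2', 0): 0.6, ('prior2', 1): 0.4,
    ('like1', 0): 0.2, ('like1', 1): 0.9,
    ('like2', 0): 0.5, ('like2', 1): 0.1,
}

def odds_shift(p, beta):
    """The probability whose odds equal beta times the odds of p."""
    return beta * p / (1.0 - p + beta * p)

# a balanced change: each parameter's odds move by a factor in
# [1/alpha, alpha], so x/x^o and x^o/x both stay within [1/alpha, alpha]
new = {key: odds_shift(p, random.uniform(1.0 / alpha, alpha))
       for key, p in orig.items()}

def posterior(values, c):
    def score(cs):
        a, b = cs
        return (values[('prior1', a)] * values[('prior2', b)] *
                values[('like1', a)] * values[('like2', b)])
    return score(c) / sum(score(cs)
                          for cs in itertools.product((0, 1), repeat=2))

for c in itertools.product((0, 1), repeat=2):
    po, pn = posterior(orig, c), posterior(new, c)
    ratio = (pn / (1 - pn)) / (po / (1 - po))   # O(c|f)(x) / O^o(c|f)
    assert alpha**(-k) <= ratio <= alpha**k, (c, ratio)
    print(c, round(ratio, 4))
```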


Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Bolt, J.H., van der Gaag, L.C. (2015). Balanced Tuning of Multi-dimensional Bayesian Network Classifiers. In: Destercke, S., Denoeux, T. (eds) Symbolic and Quantitative Approaches to Reasoning with Uncertainty. ECSQARU 2015. Lecture Notes in Computer Science, vol 9161. Springer, Cham. https://doi.org/10.1007/978-3-319-20807-7_19


  • DOI: https://doi.org/10.1007/978-3-319-20807-7_19


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-20806-0

  • Online ISBN: 978-3-319-20807-7

