Bayesian Theory of Decision

Chapter in Machine Learning for Audio, Image and Video Analysis

Abstract

What the reader should know to understand this chapter:

  • Basic notions of statistics and probability theory (see Appendix A).
  • Calculus notions are an advantage.


Notes

  1. \(\mathbf {I}_{\alpha (\mathbf {x}) = 1}\) is 1 if \(\alpha (\mathbf {x}) = 1\); 0 otherwise.

  2. Since \(\mathbf {I}_{\alpha ^{\star }(\mathbf {x}) = 1}\) is 1, the term must be nonnegative.

  3. Since \(\mathbf {I}_{\alpha ^{\star }(\mathbf {x}) = 1}\) is 0, the term must be nonpositive.

References

  1. T. Bayes. An essay towards solving a problem in the doctrine of chances. Philosophical Transactions of the Royal Society, 1763.

  2. J. O. Berger. Statistical Decision Theory and Bayesian Analysis. Springer-Verlag, 1985.

  3. J. M. Bernardo and A. F. M. Smith. Bayesian Theory. John Wiley, 1986.

  4. P. Comon. Independent component analysis: A new concept? Signal Processing, 36(1):287–314, 1994.

  5. M. H. De Groot. Optimal Statistical Decisions. McGraw-Hill, 1970.

  6. L. Devroye, L. Györfi, and G. Lugosi. A Probabilistic Theory of Pattern Recognition. Springer-Verlag, 1996.

  7. R. O. Duda, P. E. Hart, and D. G. Stork. Pattern Classification. John Wiley, 2001.

  8. T. S. Ferguson. Mathematical Statistics: A Decision-Theoretic Approach. Academic Press, 1967.

  9. R. A. Fisher. The use of multiple measurements in taxonomic problems. Annals of Eugenics, 7(2):179–188, 1936.

  10. K. Fukunaga. Introduction to Statistical Pattern Recognition. Academic Press, 1990.

  11. D. Green and J. A. Swets. Signal Detection Theory and Psychophysics. Wiley, 1974.

  12. A. Hyvärinen. Survey on independent component analysis. Neural Computing Surveys, 2(1):94–128, 1999.

  13. I. T. Jolliffe. Principal Component Analysis. Springer-Verlag, 1986.

  14. G. A. Korn and T. M. Korn. Mathematical Handbook for Scientists and Engineers. Dover, 1961.

  15. P. M. Lee. Bayesian Statistics: An Introduction. Edward Arnold, 1989.

  16. D. V. Lindley. Making Decisions. John Wiley, 1991.


Author information

Correspondence to Francesco Camastra.

Problems

5.1

Given a normal distribution \(\mathcal {N}(\sigma ,\mu )\), show that the percentage of samples taking values in \([\mu -3\sigma , \mu +3\sigma ]\) exceeds 99%.
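
For reference, the mass of a Gaussian within three standard deviations of its mean can be expressed with the error function; a quick check (not part of the proof itself) gives

$$ P(\mu -3\sigma \le x \le \mu +3\sigma ) = \mathrm {erf}\left( \frac{3}{\sqrt{2}} \right) \approx 0.9973 , $$

which is the value that the requested percentage must exceed.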

5.2

Consider the function \(f(x)= \frac{a}{1+x^2}\), where \(a \in \mathbb {R}\). Find the value of a such that f(x) is a probability density. Then compute the expected value of x.
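
One possible starting point (a sketch of the normalization condition, not the full solution):

$$ \int _{-\infty }^{\infty } \frac{a}{1+x^2}\, dx = a \big [ \arctan x \big ]_{-\infty }^{\infty } = a\pi , $$

which must equal 1 for f to be a density; the expected value is then studied through the integral \(\int x f(x)\, dx\), paying attention to its convergence.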

5.3

Consider the Geometric distribution [14] defined by:

$$ p(x)= \theta (1-\theta )^x \quad (x=0,1,2,\dots , 0\le \theta \le 1). $$

Prove that its mean is \(\mathcal {E}[x]= \frac{1-\theta }{\theta }\).
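
One possible route, sketched here for orientation, uses the derivative of the geometric series \(\sum _{x \ge 1} x q^{x-1} = \frac{1}{(1-q)^2}\) with \(q = 1-\theta \):

$$ \mathcal {E}[x] = \sum _{x=0}^{\infty } x\, \theta (1-\theta )^x = \theta (1-\theta ) \sum _{x=1}^{\infty } x (1-\theta )^{x-1} = \theta (1-\theta ) \frac{1}{\theta ^2} = \frac{1-\theta }{\theta }. $$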

5.4

Given a probability density f(x), the fourth-order moment [14] is defined by

$$ \frac{1}{\sigma ^4} \int _{-\infty }^{\infty } f(x) (x-\mu )^4 \, dx $$

where \(\mu \) and \(\sigma ^2\) are, respectively, the mean and the variance.

Prove that the fourth-order moment of a normal distribution \(\mathcal {N}(\mu ,\sigma )\) is 3.
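
A sketch of the key fact (the intermediate integration-by-parts steps are omitted): for a Gaussian the fourth central moment is

$$ \int _{-\infty }^{\infty } \frac{1}{\sqrt{2\pi }\, \sigma } e^{-\frac{(x-\mu )^2}{2\sigma ^2}} (x-\mu )^4 \, dx = 3\sigma ^4 , $$

so dividing by \(\sigma ^4\) yields the claimed value 3.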

5.5

Let \(x=(x_1,\dots ,x_{\ell })\) and \(y=(y_1,\dots ,y_{\ell })\) be two random variables. Prove that if they are statistically independent, then their covariance is zero.
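
The core step, sketched: under statistical independence the joint density factorizes, so expectations of products factor as well,

$$ \mathrm {cov}(x_i, y_j) = \mathcal {E}[x_i y_j] - \mathcal {E}[x_i]\, \mathcal {E}[y_j] = \mathcal {E}[x_i]\, \mathcal {E}[y_j] - \mathcal {E}[x_i]\, \mathcal {E}[y_j] = 0 . $$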

5.6

Suppose we have two classes \(\mathcal {C}_1\) and \(\mathcal {C}_2\) with a priori probabilities \(p(\mathcal {C}_1)= \frac{1}{3}\) and \(p(\mathcal {C}_2)= \frac{2}{3}\). Suppose that their likelihoods are \(p(x|\mathcal {C}_1)= \mathcal {N}(1,1)\) and \(p(x|\mathcal {C}_2)= \mathcal {N}(1,0)\). Find numerically the value of x such that the posterior probabilities \(p(\mathcal {C}_1|x)\), \(p(\mathcal {C}_2|x)\) are equal.
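
A minimal numerical sketch in Python, assuming the convention \(\mathcal {N}(\sigma ,\mu )\) used in the other problems (so \(p(x|\mathcal {C}_1)\) has mean 1 and \(p(x|\mathcal {C}_2)\) has mean 0, both with unit variance); SciPy is used for the Gaussian density and the root finder:

    from scipy.stats import norm
    from scipy.optimize import brentq

    # Priors; the convention N(sigma, mu) is an assumption inferred from the text,
    # so C1 has mean 1 and C2 has mean 0, both with standard deviation 1.
    p1, p2 = 1.0 / 3.0, 2.0 / 3.0

    def posterior_difference(x):
        # Posteriors are equal where the prior-weighted likelihoods are equal.
        return p1 * norm.pdf(x, loc=1.0, scale=1.0) - p2 * norm.pdf(x, loc=0.0, scale=1.0)

    x_star = brentq(posterior_difference, -10.0, 10.0)  # numerical root
    print(x_star)  # about 1.1931

The closed form obtained by equating the two prior-weighted exponentials, \(x = \frac{1}{2} + \ln 2\), can be used to check the numerical result.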

5.7

Suppose we have two classes \(\mathcal {C}_1\) and \(\mathcal {C}_2\) with a priori probabilities \(p(\mathcal {C}_1)= \frac{2}{5}\) and \(p(\mathcal {C}_2)= \frac{3}{5}\). Suppose that their likelihoods are \(p(x|\mathcal {C}_1)= \mathcal {N}(1,0)\) and \(p(x|\mathcal {C}_2)= \mathcal {N}(1,1)\). Compute the joint probability that both points \(x_1= -0.1\) and \(x_2= 0.2\) belong to \(\mathcal {C}_1\).
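
Assuming the two observations are drawn independently (an assumption, since the problem does not state it explicitly), the requested quantity factorizes into per-point posteriors obtained from Bayes' theorem:

$$ p(\mathcal {C}_1 | x_1)\, p(\mathcal {C}_1 | x_2), \qquad p(\mathcal {C}_1 | x_i) = \frac{p(\mathcal {C}_1)\, p(x_i|\mathcal {C}_1)}{p(\mathcal {C}_1)\, p(x_i|\mathcal {C}_1) + p(\mathcal {C}_2)\, p(x_i|\mathcal {C}_2)} . $$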

5.8

Suppose we have two classes \(\mathcal {C}_1\) and \(\mathcal {C}_2\) with a priori probabilities \(p(\mathcal {C}_1)= \frac{1}{4}\) and \(p(\mathcal {C}_2)= \frac{3}{4}\). Suppose that their likelihoods are \(p(x|\mathcal {C}_1)= \mathcal {N}(2,0)\) and \(p(x|\mathcal {C}_2)= \mathcal {N}(0.5,1)\). Compute the likelihood ratio and write the discriminant function.
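
For orientation, the quantities involved (under the usual minimum-error-rate rule) are

$$ \Lambda (x) = \frac{p(x|\mathcal {C}_1)}{p(x|\mathcal {C}_2)}, \qquad \text {decide } \mathcal {C}_1 \text { if } \Lambda (x) > \frac{p(\mathcal {C}_2)}{p(\mathcal {C}_1)} = 3 . $$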

5.9

Suppose we have three classes \(\mathcal {C}_1\), \(\mathcal {C}_2\) and \(\mathcal {C}_3\) with a priori probabilities \(p(\mathcal {C}_1)= \frac{1}{6}\), \(p(\mathcal {C}_2)= \frac{1}{3}\) and \(p(\mathcal {C}_3)= \frac{1}{2}\). Suppose that their likelihoods are, respectively, \(p(x|\mathcal {C}_1)= \mathcal {N}(0.25,0)\), \(p(x|\mathcal {C}_2)= \frac{a}{1+x^2}\) and \(p(x|\mathcal {C}_3)= \frac{1}{b+(x-1)^2}\). Find the values of a and b such that the likelihoods are density functions, and write the three discriminant functions.
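
A sketch of the two normalization conditions (the same Cauchy-type integral as in Problem 5.2 appears, once shifted):

$$ \int _{-\infty }^{\infty } \frac{a}{1+x^2}\, dx = a\pi = 1, \qquad \int _{-\infty }^{\infty } \frac{dx}{b+(x-1)^2} = \frac{\pi }{\sqrt{b}} = 1 . $$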

5.10

Implement the whitening transform. Test your implementation by transforming the Iris data [9], which can be downloaded from ftp.ics.uci.edu/pub/machine-learning-databases/iris. Verify that the covariance matrix of the transformed data is the identity matrix.
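
A minimal sketch of one possible implementation in Python/NumPy. The Iris file name and column layout in the commented-out loading line are assumptions; synthetic data is used so the snippet runs on its own.

    import numpy as np

    def whiten(X):
        # Whitening transform: y = Lambda^{-1/2} Phi^T (x - mean), where Phi and
        # Lambda are the eigenvectors and eigenvalues of the covariance of X.
        Xc = X - X.mean(axis=0)                       # center the data
        cov = np.cov(Xc, rowvar=False)                # sample covariance matrix
        eigval, eigvec = np.linalg.eigh(cov)          # symmetric eigendecomposition
        return Xc @ eigvec @ np.diag(1.0 / np.sqrt(eigval))

    # Example usage with synthetic data; for the exercise, replace X with the four
    # numeric columns of the Iris file (file name and format are assumptions here):
    # X = np.loadtxt("iris.data", delimiter=",", usecols=(0, 1, 2, 3))
    rng = np.random.default_rng(0)
    X = rng.normal(size=(150, 4)) @ rng.normal(size=(4, 4))
    Y = whiten(X)
    print(np.round(np.cov(Y, rowvar=False), 6))       # should be close to the identity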

5.11

Suppose that the features are statistically independent and that they have the same variance \(\sigma ^2\). In this case the discriminant function is a linear classifier. Given two adjacent decision regions \(\mathcal {D}_1\) and \(\mathcal {D}_2\), show that their separating hyperplane is orthogonal to the line connecting the means \(\mu _1\) and \(\mu _2\).
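
A sketch of the structure the proof can exploit (the standard Gaussian discriminant form with \(\Sigma = \sigma ^2 I\)):

$$ g_i(\mathbf {x}) = -\frac{\Vert \mathbf {x} - \mu _i \Vert ^2}{2\sigma ^2} + \ln p(\mathcal {C}_i), \qquad g_1(\mathbf {x}) = g_2(\mathbf {x}) \; \Longleftrightarrow \; (\mu _1 - \mu _2)^T \mathbf {x} = c $$

for a constant c, so the boundary is a hyperplane whose normal is \(\mu _1 - \mu _2\).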

5.12

Suppose that the covariance matrix is the same for all the classes. In this case the discriminant function is a linear classifier. Given two adjacent decision regions \(\mathcal {D}_1\) and \(\mathcal {D}_2\), show that their separating hyperplane is in general not orthogonal to the line connecting the means \(\mu _1\) and \(\mu _2\).
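
Similarly, a sketch of the shared-covariance case (again using the standard Gaussian discriminant form):

$$ g_i(\mathbf {x}) = -\tfrac{1}{2} (\mathbf {x}-\mu _i)^T \Sigma ^{-1} (\mathbf {x}-\mu _i) + \ln p(\mathcal {C}_i), \qquad g_1(\mathbf {x}) = g_2(\mathbf {x}) \; \Longleftrightarrow \; \big ( \Sigma ^{-1}(\mu _1 - \mu _2) \big )^T \mathbf {x} = c , $$

so the normal of the separating hyperplane is \(\Sigma ^{-1}(\mu _1 - \mu _2)\), which in general is not parallel to \(\mu _1 - \mu _2\).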


Copyright information

© 2015 Springer-Verlag London

About this chapter

Cite this chapter

Camastra, F., Vinciarelli, A. (2015). Bayesian Theory of Decision. In: Machine Learning for Audio, Image and Video Analysis. Advanced Information and Knowledge Processing. Springer, London. https://doi.org/10.1007/978-1-4471-6735-8_5


  • Publisher Name: Springer, London

  • Print ISBN: 978-1-4471-6734-1

  • Online ISBN: 978-1-4471-6735-8

