
Part of the book series: Perspectives in Neural Computing

Abstract

A scheme for the systematic adaptation of the random-parameter distribution widths is introduced. Weights exiting the same input node are combined into a weight group, and the distribution widths of the weight groups are adjusted during training by a method similar to Manhattan updating. A practical algorithm is derived, and an empirical demonstration shows that irrelevant inputs are detected and effectively switched off. The whole scheme was inspired by and is akin to Neal’s and MacKay’s automatic relevance determination. It will therefore be referred to by the same name.
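
The following sketch illustrates the general idea in Python. It is not the update rule derived in the chapter (equation (15.15) is not reproduced on this page); it only assumes a zero-mean Gaussian prior of width σ_g = exp(ρ_g) over each weight group and a Manhattan-style update, i.e. a fixed step against the sign of the gradient with respect to ρ_g. All function and variable names (manhattan_ard_step, etc.) are illustrative.

import numpy as np

# Hedged sketch: one weight group per input node; the group's prior width
# sigma_g = exp(rho_g) scales the weights leaving that input.  rho_g is
# adapted by a Manhattan-style update (fixed step size, sign of the gradient
# only).  The gradient below comes from a simple zero-mean Gaussian prior
# over the group's weights; it is NOT the exact rule derived in the chapter.

def manhattan_ard_step(weights, rho, step=0.01):
    """weights: list of 1-D arrays, weights[g] = weights leaving input g.
    rho: array of log-widths, one per weight group."""
    rho_new = rho.copy()
    for g, w_g in enumerate(weights):
        sigma2 = np.exp(2.0 * rho[g])
        # d/d rho_g of the negative log Gaussian prior over group g
        # (up to constants): n_g - sum(w^2) / sigma_g^2
        grad = len(w_g) - np.sum(w_g ** 2) / sigma2
        rho_new[g] -= step * np.sign(grad)   # Manhattan update: sign only
    return rho_new

# Toy usage: a relevant input (large weights) keeps a width of order one,
# an irrelevant input (tiny weights) has its width driven towards zero.
rng = np.random.default_rng(0)
weights = [rng.normal(0.0, 1.0, 5), rng.normal(0.0, 0.01, 5)]
rho = np.zeros(2)
for _ in range(500):
    rho = manhattan_ard_step(weights, rho)
print(np.exp(rho))  # first width stays O(1), second shrinks towards ~0.01

In the full scheme the weights themselves are trained at the same time, so a shrinking group width pulls the weights leaving an irrelevant input towards zero and that input is effectively switched off.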


References

  • The term hyperparameter will be borrowed, since the σ_g are parameters of a prior probability distribution and are closely related to the hyperparameters α_g in MacKay's work [39].


  • By defining σ as the exponential of ρ, its positivity is always ensured. The other reason for introducing ρ is that σ is a scale parameter. Since a non-informative prior for a scale parameter is uniform on a logarithmic scale (as discussed in Section 11.2), ρ is the natural parameter for any adaptation scheme; a short change-of-variables sketch is given after this list.


  • The nature of the inconsistency of scheme ARD2 (vide infra) becomes clearer when the update rule for the ρ_g is analysed. As will be shown shortly in (15.15), the gradient of E with respect to ρ_g depends on all the weights exiting the input units, that is, both the weights feeding into the S-layer and those feeding into the g-layer. However, as illustrated above and discussed more generally in [7], pp. 340–342, these weights scale differently when the training data are subjected to a linear transformation. Consequently, the sign of the gradient in (15.15), and hence the network's 'assumption' about the significance of the different inputs, can change as the result of such a linear transformation. This is a striking inconsistency, since linear transformations of the data should lead to equivalent networks which differ only by the corresponding linear transformation of the weights.


  • The method of simple weight decay, with α_k = 0.01 for all weight groups, was applied for regularization; see Section 12.1.2 for details.

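
Regarding the second note above (σ = exp(ρ)), the following is a hedged sketch of the standard change-of-variables argument, assuming a Jeffreys (1/σ) non-informative prior for the scale parameter σ; it is not a passage from the book:

% Jeffreys prior for a scale parameter: p(sigma) ∝ 1/sigma (uniform on a
% logarithmic scale).  Under the reparameterisation rho = log(sigma):
\[
  p(\rho) \;=\; p(\sigma)\,\Bigl|\tfrac{d\sigma}{d\rho}\Bigr|
  \;\propto\; \frac{1}{\sigma}\cdot\sigma \;=\; 1,
  \qquad \sigma = e^{\rho},
\]
% so the non-informative prior is flat in rho, and sigma = exp(rho) > 0
% holds for any real rho, which is why rho is the natural parameter for
% the adaptation scheme.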


Copyright information

© 1999 Springer-Verlag London Limited

About this chapter

Cite this chapter

Husmeier, D. (1999). Automatic Relevance Determination (ARD). In: Neural Networks for Conditional Probability Estimation. Perspectives in Neural Computing. Springer, London. https://doi.org/10.1007/978-1-4471-0847-4_15

  • DOI: https://doi.org/10.1007/978-1-4471-0847-4_15

  • Publisher Name: Springer, London

  • Print ISBN: 978-1-85233-095-8

  • Online ISBN: 978-1-4471-0847-4

  • eBook Packages: Springer Book Archive
