
Finding Minima Algorithms

Part of the book series: Springer Series in the Data Sciences (SSDS)

Abstract

The learning process in supervised learning consists of tuning the network parameters (weights and biases) until a certain cost function is minimized. Since the number of parameters is quite large (it can easily run into the thousands), a robust minimization algorithm is needed. This chapter presents a number of minimization algorithms of different flavors and emphasizes their advantages and disadvantages.
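
To fix ideas, here is a minimal gradient descent sketch in Python. The toy quadratic cost, learning rate, and iteration count are illustrative assumptions, not taken from the chapter; each step moves the parameters along \(-\nabla f\), the direction of steepest descent (cf. note 3 below).

    import numpy as np

    # Toy quadratic cost; its unique minimizer is w_star = (1, -2).
    w_star = np.array([1.0, -2.0])

    def cost(w):
        return 0.5 * np.sum((w - w_star) ** 2)

    def grad(w):
        return w - w_star              # gradient of the cost above

    w = np.zeros(2)                    # initial parameters (weights and biases)
    eta = 0.1                          # learning rate (illustrative value)
    for _ in range(100):
        w = w - eta * grad(w)          # step along -grad f (steepest descent)

    print(w, cost(w))                  # w ends up close to w_star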


Notes

  1. This is sometimes called the Euler system of equations.

  2. This means that \(F_{|\mathbb {B}(0, \rho )}\) is bijective, with both F and its inverse differentiable.

  3. This is more transparent in the case of \({\mathbb R}^3\), where \(\langle \nabla f(x^0), v\rangle = \Vert \nabla f(x^0)\Vert \, \Vert v\Vert \cos \theta \). The minimum is attained for \(\theta =\pi \), i.e., when the two vectors point in opposite directions.

  4. This proximity condition can be waived if f is a convex function.

  5. For instance, shaking a basket filled with potatoes of different sizes will bring the large ones to the bottom of the basket and the small ones to the top; this corresponds to the state of the system with the smallest gravitational energy. (A simulated annealing sketch illustrating this idea appears after these notes.)

  6. For discrete time steps this can be written equivalently as

     $$ v^n = \gamma v^{n-1} + (1- \gamma ) (g_{n-1})^2. $$

     (A numerical sketch of this moving average appears after these notes.)
  7. From the physics point of view, in this case the substance does not reach a crystalline state, but rather an amorphous one.
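
As a companion to notes 5 and 7, here is a minimal simulated annealing sketch in Python. It is the standard textbook scheme, not code from this chapter; the test function, cooling rate, step scale, and iteration count are illustrative assumptions. Random perturbations play the role of the shaking, and the temperature controls how often energy-increasing moves are accepted; cooling too fast can freeze the search in a poor local minimum (the amorphous state of note 7).

    import math
    import random

    def anneal(f, x0, T0=1.0, cooling=0.999, steps=5000, scale=0.1):
        # Standard simulated annealing for a scalar function f.
        # T0, cooling, steps, and scale are illustrative hyperparameters.
        x, T = x0, T0
        best = x
        for _ in range(steps):
            cand = x + random.gauss(0.0, scale)    # random "shake"
            dE = f(cand) - f(x)                    # energy difference
            # Always accept downhill moves; accept uphill moves
            # with probability exp(-dE / T) (Metropolis rule).
            if dE < 0 or random.random() < math.exp(-dE / T):
                x = cand
            if f(x) < f(best):
                best = x
            T *= cooling                           # cool slowly (cf. note 7)
        return best

    # A function with two local minima; annealing can escape the shallow one.
    f = lambda x: x**4 - 3 * x**2 + x
    print(anneal(f, x0=2.0))                       # approaches the global minimum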
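
Note 6's update is an exponentially weighted moving average of the squared gradients, the accumulator used by methods such as RMSProp. Below is a minimal numerical sketch in Python; the decay rate gamma and the gradient values are placeholders, not taken from the chapter.

    import numpy as np

    gamma = 0.9                          # decay rate (illustrative value)
    v = np.zeros(2)                      # v^0 = 0
    gradients = [np.array([1.0, -2.0]),  # placeholder gradients g_0, g_1, g_2
                 np.array([0.5, -1.0]),
                 np.array([0.2, -0.4])]

    for g in gradients:
        # v^n = gamma * v^{n-1} + (1 - gamma) * g_{n-1}**2  (componentwise)
        v = gamma * v + (1 - gamma) * g ** 2

    print(v)   # exponentially weighted average of the squared gradients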

Author information


Corresponding author

Correspondence to Ovidiu Calin.


Copyright information

© 2020 Springer Nature Switzerland AG

About this chapter


Cite this chapter

Calin, O. (2020). Finding Minima Algorithms. In: Deep Learning Architectures. Springer Series in the Data Sciences. Springer, Cham. https://doi.org/10.1007/978-3-030-36721-3_4
