
Demystifying the inner workings of BFGS optimization

Painting by J.H. Carl (1784). Brown Digital Repository, Brown University Library.


Anyone who has dabbled in machine learning is familiar with gradient descent, and possibly its close counterpart, stochastic gradient descent. If you have more than dabbled, you are surely also aware of fancier extensions such as gradient descent with momentum and Adam optimization.

Perhaps less well known is a class of optimization algorithms called quasi-Newton methods. Though these methods are less prominently featured in popular accounts of machine learning, they hold an important place in the arsenal of machine learning practitioners.

The goal of this article is to provide an introduction to the mathematical formulation of BFGS…
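Since the article's derivation is cut off above, here is a minimal, hedged sketch of the core idea: BFGS maintains an approximation to the inverse Hessian and refines it after each step using only gradient differences. The function names, the backtracking line search, and the quadratic test problem below are illustrative assumptions, not taken from the article.

```python
# Illustrative BFGS sketch (not the article's code): maintains an
# inverse-Hessian approximation H and applies the standard BFGS update.
import numpy as np

def bfgs(f, grad, x0, tol=1e-8, max_iter=100):
    x = np.asarray(x0, dtype=float)
    n = x.size
    H = np.eye(n)                  # initial inverse-Hessian approximation
    g = grad(x)
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        p = -H @ g                 # quasi-Newton search direction
        # Simple backtracking line search (Armijo sufficient-decrease check)
        alpha = 1.0
        while f(x + alpha * p) > f(x) + 1e-4 * alpha * (g @ p):
            alpha *= 0.5
        s = alpha * p              # step taken
        x_new = x + s
        g_new = grad(x_new)
        y = g_new - g              # change in the gradient
        sy = s @ y
        if sy > 1e-12:             # curvature condition; skip update if violated
            rho = 1.0 / sy
            I = np.eye(n)
            # BFGS inverse-Hessian update:
            # H <- (I - rho s y^T) H (I - rho y s^T) + rho s s^T
            H = (I - rho * np.outer(s, y)) @ H @ (I - rho * np.outer(y, s)) \
                + rho * np.outer(s, s)
        x, g = x_new, g_new
    return x

# Toy example: minimize f(x) = x0^2 + 10*x1^2, whose minimum is at (0, 0).
f = lambda x: x[0] ** 2 + 10 * x[1] ** 2
grad = lambda x: np.array([2 * x[0], 20 * x[1]])
x_star = bfgs(f, grad, [3.0, -4.0])   # converges near [0, 0]
```

The key design point the article presumably builds toward: no Hessian is ever computed or inverted; curvature information is accumulated entirely from the pairs (s, y).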

Adrian Lam

Student of data science, former astrophysics researcher | Brown University
