Demystifying the inner workings of BFGS optimization

Introduction

Anyone who has dabbled in machine learning is familiar with gradient descent, and possibly its close counterpart, stochastic gradient descent. If you have done more than dabble, then you are no doubt also aware of fancier extensions such as gradient descent with momentum and Adam optimization. Perhaps less well-known are a…
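To ground the optimizers named above, here is a minimal sketch (not from the article) contrasting plain gradient descent with its momentum variant on the one-dimensional quadratic f(x) = x², whose gradient is 2x. The function names, learning rate, and momentum coefficient are illustrative choices, not anything prescribed by the text.

```python
def gradient_descent(grad, x0, lr=0.1, steps=200):
    """Plain gradient descent: repeatedly step against the gradient."""
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x

def gradient_descent_momentum(grad, x0, lr=0.1, beta=0.9, steps=200):
    """Momentum variant: accumulate an exponentially decaying velocity."""
    x, v = x0, 0.0
    for _ in range(steps):
        v = beta * v + grad(x)  # velocity blends past gradients with the current one
        x -= lr * v
    return x

grad = lambda x: 2 * x  # gradient of f(x) = x^2, minimized at x = 0

print(gradient_descent(grad, 5.0))
print(gradient_descent_momentum(grad, 5.0))
```

Both runs drive x toward the minimizer at 0; BFGS, the subject of this article, belongs to a different family (quasi-Newton methods) that additionally builds up curvature information as it iterates.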