Some notes on the fundamentals of unconstrained optimization

imported
notes
Published March 8, 2015

Given the first-order approximation \[f(x_k+\alpha p)\approx f(x_k)+\alpha p^T \nabla f(x_k),\] the steepest descent direction is \[-\nabla f(x_k)\].

Proof: Let \[p\] be a unit vector. The unit direction \[p\] of most rapid decrease is the solution to the problem \[\min\limits_{p}\:\alpha p^T \nabla f(x_k)\quad s.t.\quad\|p\|=1.\] Since \[\alpha p^T \nabla f(x_k)=\alpha\|p\|\|\nabla f(x_k)\|\cos\theta=\alpha\|\nabla f(x_k)\|\cos\theta,\] where \[\theta\] is the angle between \[p\] and \[\nabla f(x_k)\] and \[\alpha>0\], the minimum is attained when \[\cos\theta=-1\]. Thus the direction of steepest descent is the exact opposite of the direction of \[\nabla f(x_k)\]: \[p=-\frac{\nabla f(x_k)}{\|\nabla f(x_k)\|}.\]

Newton direction

Consider the second-order approximation \[f(x_k+p)\approx f_k+p^T \nabla f_k+\frac{1}{2}p^T \nabla^2 f_k\, p \stackrel{\mbox{def}}{=}m_k(p).\] Assuming for the moment that \[\nabla^2 f_k\] is positive definite, we obtain the Newton direction by finding the vector \[p\] that minimizes \[m_k(p)\], i.e. by setting \[\frac{d(m_k(p))}{dp}=0\]. Let's write \[\nabla f=\nabla f_k\] and \[\nabla^2 f=\nabla^2 f_k\] to simplify the derivation. \[\frac{d(m_k(p))}{dp}=\frac{d}{dp}(p^T\nabla f)+\frac{1}{2}\frac{d}{dp}(p^T\nabla^2 f\, p)\] Note that \[\frac{\partial}{\partial p_i}(p^T \nabla f)=\frac{\partial}{\partial p_i}\Big(\sum\limits_{j} p_j \nabla f_j\Big)=\nabla f_i,\] so collecting over \[i\] gives \[\frac{d}{dp}(p^T\nabla f)={\nabla f}^T\], and \[\frac{\partial}{\partial p_i}(p^T\nabla^2 f\, p)=\frac{\partial}{\partial p_i}\Big(\sum\limits_{j}\sum\limits_{k}p_k\nabla^2 f_{kj}\, p_j\Big)\] \[=\sum\limits_{j}\nabla^2 f_{ij}\, p_j+\sum\limits_{k}\nabla^2 f_{ki}\, p_k,\] which, collected over \[i\], gives \[\frac{d}{dp}(p^T\nabla^2 f\, p)=p^T{\nabla^2 f}^T+p^T\nabla^2 f=2p^T\nabla^2 f\] (\[\nabla^2 f\] is symmetric). Thus \[\frac{d(m_k(p))}{dp}={\nabla f}^T+p^T\nabla^2 f=\vec{0}^T\] \[\Rightarrow{\nabla f}+{\nabla^2 f}^T p=\vec{0}\] \[\Rightarrow{\nabla f}+{\nabla^2 f}\, p=\vec{0}\] (\[\nabla^2 f\] is symmetric) \[\Rightarrow p=-(\nabla^2 f)^{-1}\nabla f\]
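As a quick numerical check of both directions (not part of the original notes), here is a sketch using NumPy on a hypothetical quadratic \[f(x)=\frac{1}{2}x^T A x - b^T x\], for which the gradient is \[Ax-b\] and the Hessian is the constant matrix \[A\]. The matrix and points below are made up for illustration.

```python
import numpy as np

# Hypothetical quadratic f(x) = 1/2 x^T A x - b^T x with A symmetric
# positive definite, so grad f(x) = A x - b and the Hessian is A.
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, 1.0])

def grad(x):
    return A @ x - b

def hessian(x):
    return A  # constant for a quadratic

x_k = np.array([2.0, -1.0])
g = grad(x_k)

# Steepest descent direction: the normalized negative gradient.
p_sd = -g / np.linalg.norm(g)

# Newton direction: p = -(Hessian)^{-1} grad f. Solve the linear system
# rather than forming the inverse explicitly.
p_newton = -np.linalg.solve(hessian(x_k), g)

# For a quadratic, the second-order model m_k is exact, so a full Newton
# step lands on the true minimizer x* = A^{-1} b in one iteration.
x_star = np.linalg.solve(A, b)
print(np.allclose(x_k + p_newton, x_star))  # True
```

Both directions are descent directions here (their inner product with the gradient is negative); the Newton step additionally accounts for curvature, which is why it reaches the minimizer of a quadratic in a single step.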