Transcription

Jorge Nocedal
Stephen J. Wright

Numerical Optimization
Second Edition

Jorge Nocedal
EECS Department
Northwestern University
Evanston, IL

Stephen J. Wright
Computer Sciences Department
University of Wisconsin
1210 West Dayton Street
Madison, WI 53706

Editors:

Thomas V. Mikosch
University of Copenhagen
Laboratory of Actuarial Mathematics
DK-1017 Copenhagen

Stephen M. Robinson
Department of Industrial and Systems Engineering
University of Wisconsin
1513 University Avenue
Madison, WI 53706

Sidney I. Resnick
Cornell University
School of Operations Research and Industrial Engineering
Ithaca, NY

Mathematics Subject Classification: 90B30, 90C11, 90-01, 90-02
Library of Congress Control Number: 923897
ISBN-10: 0-387-30303-0
ISBN-13: 978-0387-30303-1

Printed on acid-free paper.

© Springer Science+Business Media, LLC.

All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Springer Science+Business Media, LLC, 233 Spring Street, New York, NY 10013, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden.

The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights.

Printed in the United States of America.

To Sue, Isabel and Martin
and
To Mum and Dad

Contents

Preface
Preface to the Second Edition

1   Introduction
    Mathematical Formulation
    Example: A Transportation Problem
    Continuous versus Discrete Optimization
    Constrained and Unconstrained Optimization
    Global and Local Optimization
    Stochastic and Deterministic Optimization
    Convexity
    Optimization Algorithms
    Notes and References

2   Fundamentals of Unconstrained Optimization
    2.1  What Is a Solution?
         Recognizing a Local Minimum
         Nonsmooth Problems
    2.2  Overview of Algorithms
         Two Strategies: Line Search and Trust Region
         Search Directions for Line Search Methods
         Models for Trust-Region Methods
         Scaling
    Exercises

3   Line Search Methods
    3.1  Step Length
         The Wolfe Conditions
         The Goldstein Conditions
         Sufficient Decrease and Backtracking
    3.2  Convergence of Line Search Methods
    3.3  Rate of Convergence
         Convergence Rate of Steepest Descent
         Newton’s Method
         Quasi-Newton Methods
    3.4  Newton’s Method with Hessian Modification
         Eigenvalue Modification
         Adding a Multiple of the Identity
         Modified Cholesky Factorization
         Modified Symmetric Indefinite Factorization
    3.5  Step-Length Selection Algorithms
         Interpolation
         Initial Step Length
         A Line Search Algorithm for the Wolfe Conditions
    Notes and References
    Exercises

4   Trust-Region Methods
    Outline of the Trust-Region Approach
    4.1  Algorithms Based on the Cauchy Point
         The Cauchy Point
         Improving on the Cauchy Point
         The Dogleg Method
         Two-Dimensional Subspace Minimization
    4.2  Global Convergence
         Reduction Obtained by the Cauchy Point
         Convergence to Stationary Points
    4.3  Iterative Solution of the Subproblem
         The Hard Case
         Proof of Theorem 4.1
         Convergence of Algorithms Based on Nearly Exact Solutions
    4.4  Local Convergence of Trust-Region Newton Methods
    4.5  Other Enhancements
         Scaling
         Trust Regions in Other Norms
    Notes and References
    Exercises

5   Conjugate Gradient Methods
    5.1  The Linear Conjugate Gradient Method
         Conjugate Direction Methods
         Basic Properties of the Conjugate Gradient Method
         A Practical Form of the Conjugate Gradient Method
         Rate of Convergence
         Preconditioning
         Practical Preconditioners
    5.2  Nonlinear Conjugate Gradient Methods
         The Fletcher–Reeves Method
         The Polak–Ribière Method and Variants
         Quadratic Termination and Restarts
         Behavior of the Fletcher–Reeves Method
         Global Convergence
         Numerical Performance
    Notes and References
    Exercises

6   Quasi-Newton Methods
    6.1  The BFGS Method
         Properties of the BFGS Method
         Implementation
    6.2  The SR1 Method
         Properties of SR1 Updating
    6.3  The Broyden Class
    6.4  Convergence Analysis
         Global Convergence of the BFGS Method
         Superlinear Convergence of the BFGS Method
         Convergence Analysis of the SR1 Method
    Notes and References
    Exercises

7   Large-Scale Unconstrained Optimization
    7.1  Inexact Newton Methods
         Local Convergence of Inexact Newton Methods
         Line Search Newton–CG Method
         Trust-Region Newton–CG Method
         Preconditioning the Trust-Region Newton–CG Method
         Trust-Region Newton–Lanczos Method
    7.2  Limited-Memory Quasi-Newton Methods
         Limited-Memory BFGS
         Relationship with Conjugate Gradient Methods
         General Limited-Memory Updating
         Compact Representation of BFGS Updating
         Unrolling the Update
    7.3  Sparse Quasi-Newton Updates
    7.4  Algorithms for Partially Separable Functions
    7.5  Perspectives and Software
    Notes and References
    Exercises

8   Calculating Derivatives
    8.1  Finite-Difference Derivative Approximations
         Approximating the Gradient
         Approximating a Sparse Jacobian
         Approximating the Hessian
         Approximating a Sparse Hessian
    8.2  Automatic Differentiation
         An Example
         The Forward Mode
         The Reverse Mode
         Vector Functions and Partial Separability
         Calculating Jacobians of Vector Functions
         Calculating Hessians: Forward Mode
         Calculating Hessians: Reverse Mode
         Current Limitations
    Notes and References
    Exercises

9   Derivative-Free Optimization
    9.1  Finite Differences and Noise
    9.2  Model-Based Methods
         Interpolation and Polynomial Bases
         Updating the Interpolation Set
         A Method Based on Minimum-Change Updating
    9.3  Coordinate and Pattern-Search Methods
         Coordinate Search Method
         Pattern-Search Methods
    9.4  A Conjugate-Direction Method
    9.5  Nelder–Mead Method
    9.6  Implicit Filtering
    Notes and References
    Exercises

10  Least-Squares Problems
    10.1  Background
    10.2  Linear Least-Squares Problems
    10.3  Algorithms for Nonlinear Least-Squares Problems
          The Gauss–Newton Method
          Convergence of the Gauss–Newton Method
          The Levenberg–Marquardt Method
          Implementation of the Levenberg–Marquardt Method
          Convergence of the Levenberg–Marquardt Method
          Methods for Large-Residual Problems
    10.4  Orthogonal Distance Regression
    Notes and References
    Exercises

11  Nonlinear Equations
    11.1  Local Algorithms
          Newton’s Method for Nonlinear Equations
          Inexact Newton Methods
          Broyden’s Method
          Tensor Methods
    11.2  Practical Methods
          Merit Functions
          Line Search Methods
          Trust-Region Methods
    11.3  Continuation/Homotopy Methods
          Motivation
          Practical Continuation Methods
    Notes and References
    Exercises

12  Theory of Constrained Optimization
    Local and Global Solutions
    Smoothness
    12.1  Examples
          A Single Equality Constraint
          A Single Inequality Constraint
          Two Inequality Constraints
    12.2  Tangent Cone and Constraint Qualifications
    12.3  First-Order Optimality Conditions
    12.4  First-Order Optimality Conditions: Proof
          Relating the Tangent Cone and the First-Order Feasible Direction Set
          A Fundamental Necessary Condition
          Farkas’ Lemma
          Proof of Theorem 12.1
    12.5  Second-Order Conditions
          Second-Order Conditions and Projected Hessians
    12.6  Other Constraint Qualifications
    12.7  A Geometric Viewpoint
    12.8  Lagrange Multipliers and Sensitivity
    12.9  Duality