purdue university · cs 52000
computational methods in optimization
HOMEWORK
David F. Gleich
March 12, 2014
Homework 4 by Ning Zhang:
Acknowledge the help of Tao Wu and Zhiwei Zhang.
Problem 1: Steepest descent
(Nocedal and Wright, Exercise 3.6) Let’s conclude with a quick problem to show that steepest descent
can converge very rapidly! Consider the steepest descent method with exact line search for the function
f(x) = (1/2)xTQx� xTb. Suppose that we know x0 � x⇤ is parallel to an eigenvector of Q. Show that the
method will converge in a single iteration.
Answer: Since x0 � x⇤ is parallel to an eigenvector of Q,
g(x0) = Qx0 � b = Qx0 �Qx⇤ +Qx⇤ � b
= Q(x0 � x⇤) + g(x⇤)
= �(x0 � x⇤)
Then
g(x0)
Tg(x0) = �
2(x0 � x⇤)T (x0 � x⇤)
g(x0)
TQg(x0) = �
3(x0 � x⇤)T (x0 � x⇤)
And so the equation 3.26 turns to be
x1 = x0 � g(x0)
Tg(x0)
g(x0)TQg(x0)
g(x0)
= x0 � 1
�
�(x0 � x⇤)
= x⇤
End proof.
Problem 2: Some optimizaiton
Non-negative least squares is an important variant. Formally, it is:
minimize kAx� bk2
subject to x � 0.
We’ll see this problem again when we study constrained optimization. Here, we’ll investigate a log-barrier
function to approximate it in an unconstrained manner:
minimize kAx� bk2 � µeT log(x),
where log is an elementwise function. Use the definition where: log(x) = �1 if x 0.
1. Determine the gradient and Hessian.
Answer: g(x) = 2AT (Ax� b)� µ · column vectorn( 1xi ). H(x) = 2ATA+ µ · diag matrixn⇥n( 1x2i ).
2. Modify the gradient_descent_1.m and newtons_method_1.m functions to use a backgracking line
search routine from Poblano.
Answer: Just replace the x updating sentence with
1
[ x , ˜ , ˜ , ˜ , ˜ , n fev ] = pob l ano l i n e s e a r ch ( fg , x , f , g ,0 ,�g , params ) ;
for the gradient descent method and with
[ x , ˜ , ˜ , ˜ , ˜ , n fev ] = pob l ano l i n e s e a r ch ( fg , x , f , g ,0 ,�H\g , params ) ;
for the Newton’s method.
The parameters are set as default with
params = ncg ( ’ d e f a u l t s ’ ) ;
3. Suppose that x(0) is strictly positive.
Does x(k) stay strictly positive with these two algorithms? Discuss or prove.
Answer: Yes, xk stay strictly positive. Since H(x) is positive definite for x > 0 so if there exist
x⇤ > 0 st. g(x) = 0, we must have g(x) < 0 for 0 < x < x⇤ and g(x) > 0 for x > x⇤. So if 0 < xk < x⇤,
xk+1 > xk > 0 holds. If 0 < x⇤ < xk, xk+1 > 0 will still hold otherwise the su�cient condition will
be violated (log(x) = �1 if x 0, so f(xk+1) > f(xk)). Because x0 > 0, xk will stay positive by
induction.
4. Show plots of the convergence in terms of function values and of infinity norms of the gradients of the
methods for the matrices:
A = [
0.0372 0.2869
0.6861 0.7071
0.6233 0.6245
0.6344 0.6170];
b = [
0.8587
0.1781
0.0747
0.8405];
mu = 0.1;
Answer: The solution given by the two methods is
⇥
0.2034 0.5663
⇤
.
0 5 10 15 201
1.5
2
2.5
3
3.5
iteration
fun
va
l
Gradient descent
gradient descent
0 5 10 15 200
1
2
3
4
iteration
gra
d i
nf
no
rm
Gradient descent
gradient descent
0 1 2 3 4 5 61
1.5
2
2.5
3
3.5
iteration
fun
va
l
Newtons method
newton method
0 1 2 3 4 5 60
1
2
3
4
iteration
gra
d i
nf
no
rm
Newtons method
newton method
The plots are as above and the script is as follows
2
x0 = [ 1 , 1 ] ’ ;
[ x1 , f1 , g1 , h i s t1 , h i s t x1 ] = g r ad i en t d e s c en t 1 (@fgh , x0 ) ;
[ x2 , f2 , g2 , h , h i s t2 , h i s t x2 ] = newtons method 1 (@fgh , x0 ) ;
f igure ;
subplot ( 2 , 2 , 1 ) , plot ( h i s t 1 ( 1 , : ) ) ; xlabel ( ’ i t e r a t i o n ’ ) ; ylabel ( ’ fun va l ’ ) ;
legend ( ’ g rad i en t descent ’ ) ;
t i t l e ( ’ Gradient descent ’ )
subplot ( 2 , 2 , 2 ) , plot ( h i s t 1 ( 2 , : ) ) ; xlabel ( ’ i t e r a t i o n ’ ) ; ylabel ( ’ grad i n f norm ’
) ; legend ( ’ g rad i en t descent ’ ) ;
t i t l e ( ’ Gradient descent ’ )
subplot ( 2 , 2 , 3 ) , plot ( h i s t 2 ( 1 , : ) ) ; xlabel ( ’ i t e r a t i o n ’ ) ; ylabel ( ’ fun va l ’ ) ;
legend ( ’ newton method ’ ) ;
t i t l e ( ’ Newtons method ’ )
subplot ( 2 , 2 , 4 ) , plot ( h i s t 2 ( 2 , : ) ) ; xlabel ( ’ i t e r a t i o n ’ ) ; ylabel ( ’ grad i n f norm ’
) ; legend ( ’ newton method ’ ) ;
t i t l e ( ’ Newtons method ’ )
function [ f , g , H] = fgh (x )
%func t i on f o r hw4 , prb2 .4
A = [ 0 . 0 3 7 2 , 0 . 2 8 6 9 ;
0 . 6 861 , 0 . 7 071 ;
0 . 6 233 , 0 . 6 245 ;
0 . 6 3 4 4 , 0 . 6 1 7 0 ] ;
b = [ 0 . 8 5 8 7 , 0 . 1 7 8 1 , 0 . 0 7 4 7 , 0 . 8 4 0 5 ] ’ ;
mu = 0 . 1 ;
i f sum(x<=0) ˜= 0
f = Inf ;
[ n , ˜ ] = s ize ( x ) ;
g = NaN(n , 1) ;
H = NaN(n , n ) ;
else
l = A⇤x�b ;
f = ( l ’ ) ⇤ l � mu⇤sum( log ( x ) ) ;
g = 2⇤(A’ ) ⇤ l � mu⇤ ( 1 . / x ) ;
H = 2⇤(A’ ) ⇤A + mu⇤diag ( 1 . / ( x . ˆ 2 ) ) ;
end
return
end
5. Suppose we say that a method converges if the infinity norm of the gradient is less than 10�4. Plot the
number of function evaluations of your new steepest descent method and the Newton method for the
values of µ = {10�1, 10�2, 10�3, 10�4, 10�5, 10�6}.
Show your solutions for each of these cases. Compare the solutions to Matlab’s fminunc routine.
Answer: The solutions is as follows, 6 columns correspond to di↵erent µs. Basically both of the two
methods converge to the solutions of fminunc routine.
xgrad =
Columns 1 through 3
0.203401385491268 0.031356879716420 0.003302544299047
0.566276021849502 0.668855553977323 0.690319947024598
Columns 4 through 6
0.000331866443660 0.000088542751414 0.000018547582083
0.692642815622834 0.688367917395477 0.663899018138591
xnewtons =
Columns 1 through 3
0.203364839208255 0.031345907965919 0.003300555473141
3
0.566308538239243 0.668885696192224 0.690355769488330
Columns 4 through 6
0.000331785428427 0.000033196339712 0.000003319771431
0.692674723090668 0.692908531414492 0.692931798383219
xfmin =
Columns 1 through 3
0.203363682216457 0.031340098780883 0.003300040277398
0.566323772686660 0.668891184389050 0.690356255223426
Columns 4 through 6
0.000331024692903 0.000033195856239 0.000027181578248
0.692675440756985 0.692908411592569 0.638452502290494
The plot is as follows
1 1.5 2 2.5 3 3.5 4 4.5 5 5.5 60
0.2
0.4
0.6
0.8
1
1.2
1.4
1.6
1.8
2 x 10
4
mu
Fu
n E
va
l N
um
be
r
Gradient descent
Newtons method
And the code is as follows
x0 = [ 1 , 1 ] ’ ;
mus = [1 e�1 1e�2 1e�3 1e�4 1e�5 1e�6] ;
global mu;
nfev = zeros (2 , length (mus) ) ;
xgrad = zeros (2 , length (mus) ) ;
xnewtons = zeros (2 , length (mus) ) ;
xfmin = zeros (2 , length (mus) ) ;
for i = 1 : length (mus)
mu = mus( i ) ;
param . t o l = 1e�4;
[ x1 , f1 , g1 , h i s t1 , h i s tx1 , f eva ln1 ] = g r ad i en t d e s c en t 1 (@fgh , x0 , param) ;
[ x2 , f2 , g2 , ˜ , h i s t2 , h i s tx2 , f eva ln2 ] = newtons method 1 (@fgh , x0 , param) ;
opt i ons = opt imset ( ’GradObj ’ , ’ on ’ , ’ Hess ian ’ , ’ on ’ ) ;
[ x3 ] = fminunc (@fgh , x0 , opt ions ) ;
n fev (1 , i ) = f eva ln1 ;
nfev (2 , i ) = f eva ln2 ;
xgrad ( : , i ) = x1 ;
xnewtons ( : , i ) = x2 ;
4
xfmin ( : , i ) = x3 ;
end
figure ;
plot ( 1 : length (mus) , nfev ( 1 , : ) , ’ r�.ˆ ’ ) ;
hold on ;
plot ( 1 : length (mus) , nfev ( 2 , : ) , ’ b : o ’ ) ;
xlabel ( ’mu ’ ) ;
ylabel ( ’Fun Eval Number ’ ) ;
legend ( ’ Gradient descent ’ , ’ Newtons method ’ ) ;
The mu in the objective functions is set as a global variable. And a cumulative counter of the number
of function evaluations is added into gradient decent 1.m and newtons mehtod 1.m.
Problem 3: Necessary and su�cient conditions
Let f(x) = 12x
TQx� xT c.
1. Write down the necessary conditions for the problem:
minimize f(x)
subject to x � 0 .
Answer: Add the lagrange multiplier,
1
2
xTQx� xT c� �Tx
Then the KKT condition is
Qx� c� � = 0
s � 0,x � 0,�Tx = 0
And the KKT condition is the first order necessary condition.
2. Write down the su�cient conditions for the same problem.
Answer: The KKT conditions plus Q � 0 are su�cient.
3. Consider the two-dimensional case with
Q =
1 2
2 1
�
c =
0
�1.5
�
.
Determine the solution to this problem by any means you can, and justify your work.
Answer: I use the fmincon routine and the solution is (0, 0) with function value 0. The code is as
follows
[ x , fv ] = fmincon (@f , [ 1 , 1 ] ’ , [ ] , [ ] , [ ] , [ ] , [ 0 , 0 ] ’ )
function [ f v ] = f ( x )
%ob j e c t i v e func o f hw4 p3
Q = [ 1 , 2 ; 2 , 1 ] ;
c = [ 0 ; �1 .5 ] ;
fv = 0 . 5⇤ ( x ’ ) ⇤Q⇤x � (x ’ ) ⇤c ;
end
The output is as follows
5
x = 0 0
fv = 0
f lag = 1
output =
i t e r a t i o n s : 2
funcCount : 6
l s s t e p l e n g t h : 1
s t e p s i z e : 0
a lgor i thm : ’medium�s c a l e : SQP, Quasi�Newton , l i n e�search ’
f i r s t o r d e r o p t : 0
c o n s t r v i o l a t i o n : 0
message : [ 1 x783 char ]
lambda . lower : 0 .000000007450581 1.500000007450581
grad = 0.000000007450581 1.500000007450581
According to the output, the first order necessary condition or say the KKT condition is satisfied with
� = (0, 1.5).
4. Produce a Matlab or hand illustration of the solution showing the function contours, gradient, the
constraint normal. What are the active constraints at the solution? What is the value of � in AT� = g?
Answer: The contour is as follows As shown in last subproblem, the gradient at (0, 0) is (0, 1.5).
x1
x2
0 0.5 1 1.5 2 2.5 3 3.5 4 4.5 50
0.5
1
1.5
2
2.5
3
3.5
4
4.5
5
10
20
30
40
50
60
70
80
The active constraints are both x1 > 0, x2 > 0 and the constraint normal is 0. The � for the lower
constraints is (0, 1.5).
6
Ning Zhang
Ning Zhang
Gradient
本文档为【数值优化homework-4】,请使用软件OFFICE或WPS软件打开。作品中的文字与图均可以修改和编辑,
图片更改请在作品中右键图片并更换,文字修改请直接点击文字进行修改,也可以新增和删除文档中的内容。
该文档来自用户分享,如有侵权行为请发邮件ishare@vip.sina.com联系网站客服,我们会及时删除。
[版权声明] 本站所有资料为用户分享产生,若发现您的权利被侵害,请联系客服邮件isharekefu@iask.cn,我们尽快处理。
本作品所展示的图片、画像、字体、音乐的版权可能需版权方额外授权,请谨慎使用。
网站提供的党政主题相关内容(国旗、国徽、党徽..)目的在于配合国家政策宣传,仅限个人学习分享使用,禁止用于任何广告和商用目的。