Change of variables and necessary conditions for optimality

In this post, we consider how removal and addition of degrees of freedom through change of variables can help in searching for a minimum of a function.

Removing degrees of freedom

Consider a function $f(x, y) = xy$. The necessary condition for optimality says that its differential should be equal to zero,

$$df = y\,dx + x\,dy = 0 \quad\Rightarrow\quad (x, y) = (0, 0).$$

However, we can introduce a new variable $z = xy$ and consider a function $g(z) = z$, which obviously has no critical point. What happened here is that we lost one degree of freedom. Initially, we could change $x$ and $y$ independently, but after introducing $z$, we can only change their product $xy$.

Sometimes, though, we don’t need to use all degrees of freedom to find a critical point of a function. For example, if $f(x, y) = (xy)^2$ and $g(z) = z^2$, then

$$dg = 2z\,dz = 0 \quad\Rightarrow\quad z = 0$$

describes the same set of solutions as the system of equations $(xy^2, x^2 y) = (0, 0)$ obtained from the partial derivatives of $f$. So, we incur no loss of information by removing some degrees of freedom in this case.
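As a quick sanity check of this claim, here is a minimal Python sketch that compares the two criteria on a handful of points. The sample points and the helper names `grad_f` and `dg_dz` are illustrative assumptions, not part of the original argument, and a finite sample is of course no proof.

```python
# Sketch: compare critical points of f(x, y) = (x*y)**2 with those of g(z) = z**2.

def grad_f(x, y):
    """Partial derivatives of f(x, y) = (x*y)**2, derived by hand."""
    return (2 * x * y**2, 2 * x**2 * y)

def dg_dz(z):
    """Derivative of g(z) = z**2."""
    return 2 * z

# Hypothetical sample points: some on the coordinate axes, some off them.
points = [(0.0, 0.0), (0.0, 3.0), (-2.0, 0.0), (1.0, 1.0), (0.5, -4.0)]

results = []
for x, y in points:
    f_is_critical = grad_f(x, y) == (0.0, 0.0)   # both partials vanish
    g_is_critical = dg_dz(x * y) == 0.0          # reduced condition on z = x*y
    results.append((f_is_critical, g_is_critical))
    print((x, y), f_is_critical, g_is_critical)
```

On every sample point the two flags agree: the gradient of f vanishes exactly when the product xy is zero, which is what the reduced equation in z says.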

One should be careful, nevertheless, to ensure that the solution $z$ of the equation $g_z = 0$ lies in the range of the function $z$. For example, if $f(x) = (1/x)^2$, then we could write $g(z) = z^2$ with $z = 1/x$. Given $dg = 2z\,dz$, we might hastily declare $z = 0$ a critical point of $f$, even though it lies outside the image of $z$.
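The pitfall can be illustrated numerically. The sketch below assumes the example is $f(x) = (1/x)^2$ with $z = 1/x$, uses the hand-computed derivative $f'(x) = -2/x^3$, and checks a few arbitrary sample values of $x$ (the samples are hypothetical and only illustrate the point).

```python
# Sketch: z = 0 solves dg/dz = 0 for g(z) = z**2, but z = 1/x never equals 0.

def f_prime(x):
    """Derivative of f(x) = (1/x)**2, derived by hand: -2 / x**3."""
    return -2.0 / x**3

# Hypothetical sample of nonzero x values on both sides of the origin.
samples = [-100.0, -1.0, -0.01, 0.01, 1.0, 100.0]

# f has no critical point among the samples ...
has_critical_point = any(f_prime(x) == 0.0 for x in samples)

# ... and the candidate z = 0 is never attained by z = 1/x.
z_values = [1.0 / x for x in samples]
zero_in_image = any(z == 0.0 for z in z_values)

print(has_critical_point, zero_in_image)
```

Both flags come out false: the derivative only approaches zero as $|x|$ grows, and no finite $x$ maps to $z = 0$.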

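To summarize, removing degrees of freedom can hide critical points of $f$, but a critical point of $g$ that lies in the range of $z$ is also a critical point of $f$.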

Adding degrees of freedom

Adding degrees of freedom can only hurt. Consider the function

$$f(z) = z^2 - (2z - 1),$$

which has a minimum at $z = 1$. Assuming we are studying its restriction to $z \ge 0.5$, we can introduce the function $g$ of two variables

$$g(x(z), y(z)) = x^2 - y^2$$

with $x^2 = z^2$ and $y^2 = 2z - 1$. Equating its differential to zero,

$$dg = g_x\,dx + g_y\,dy = 2x\,dx - 2y\,dy = 0 \quad\Rightarrow\quad (x, y) = (0, 0),$$

leads to a contradiction, since it implies $z = 0$ and $z = 0.5$ at the same time. We ran into this trouble because two variables give more flexibility than we can actually afford with one. Expanding $dx$ and $dy$ in the differential, we see that

$$dg = (g_x x_z + g_y y_z)\,dz$$

and even though the equation $g_y = -2y = 0$ suggests setting $y = 0$, multiplying $g_y = -2y$ by $y_z = 1/y$ results in a finite quantity, namely $g_y y_z = -2$.
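The chain-rule expression can be checked against the derivative of $f$ directly. The following Python sketch assumes the positive branches $x = z$ and $y = \sqrt{2z - 1}$ on $z > 0.5$ (a choice made here for illustration) and compares $g_x x_z + g_y y_z$ with $f'(z) = 2z - 2$ at a few arbitrary values of $z$.

```python
import math

# Sketch: total derivative of g(x(z), y(z)) = x**2 - y**2 versus f'(z) = 2z - 2.

def f_prime(z):
    """Derivative of f(z) = z**2 - (2z - 1)."""
    return 2.0 * z - 2.0

def total_derivative(z):
    """g_x * x_z + g_y * y_z on the assumed branches x = z, y = sqrt(2z - 1)."""
    x = z
    y = math.sqrt(2.0 * z - 1.0)
    g_x, g_y = 2.0 * x, -2.0 * y      # partial derivatives of g
    x_z, y_z = 1.0, 1.0 / y           # derivatives of x and y with respect to z
    return g_x * x_z + g_y * y_z

# Hypothetical sample values, including one close to the boundary z = 0.5.
z_samples = [0.51, 0.75, 1.0, 2.0]
pairs = [(total_derivative(z), f_prime(z)) for z in z_samples]

for z, (lhs, rhs) in zip(z_samples, pairs):
    print(z, lhs, rhs)
```

The two columns agree up to rounding, and the total derivative vanishes only at $z = 1$. Near $z = 0.5$ the factor $g_y$ shrinks while $y_z$ grows, and their product stays at $-2$, so the point $y = 0$ is not a critical point of $f$.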

Introducing extra variables just obscures the problem. Most importantly, critical points of g do not tell us anything about critical points of f. Therefore, adding degrees of freedom in its pure form is not helpful for optimization.