Suppose we are given a set of data points {(xi, fi)}, i = 1, . . . , n. These could be measurements from an experiment or obtained simply by evaluating a function at some points. You have seen that we can interpolate these points, i.e., either find a polynomial of degree ≤ (n − 1) which passes through all n points, or use a continuous piecewise interpolant of the data, which is usually a better approach. However, it might be the case that we know these data points should lie on, for example, a line or a parabola, but due to experimental error they do not. So what we would like to do is find a line (or some other higher degree polynomial) which best represents the data. Of course, we need to make precise what we mean by a "best fit" of the data. As a concrete example, suppose we have n points

(x1, f1), (x2, f2), . . . , (xn, fn)

and we expect them to lie on a straight line but, due to experimental error, they don't. We would like to draw a line and have the line be the best representation of the points. If n = 2 then the line will pass through both points and so the error is zero at each point. However, if we have more than two data points, then we can't find a line that passes through all of them (unless they happen to be collinear), so we have to find a line which is a good approximation in some sense. Of course we need to define what we mean by a good representation. An obvious approach is to create an error vector of length n whose ith component measures the difference fi − y(xi), where y = a1 x + a0 is the line we fit to the data. Then we can take a norm of this error vector, and our goal is to find the line which minimizes that norm. Of course this problem is not clearly defined until we specify which norm to use. The linear least squares problem finds the line which minimizes this difference in the ℓ2 (Euclidean) norm.

Example

We want to fit a line p1(x) = a0 + a1 x to the data points

(1, 2.2), (0.8, 2.4), (0, 4.25)

in a linear least squares sense. For now, we will just write the overdetermined system and determine if it has a solution. We will find the line after we investigate how to solve the linear least squares problem. Our equations are

a0 + a1 · 1 = 2.2
a0 + a1 · 0.8 = 2.4
a0 + a1 · 0 = 4.25

Writing this as a matrix problem Ax = b we have

\[
\begin{pmatrix} 1 & 1 \\ 1 & 0.8 \\ 1 & 0 \end{pmatrix}
\begin{pmatrix} a_0 \\ a_1 \end{pmatrix}
=
\begin{pmatrix} 2.2 \\ 2.4 \\ 4.25 \end{pmatrix}
\]

Now we know that this over-determined problem has a solution if the right-hand side is in R(A) (i.e., it is a linear combination of the columns of the coefficient matrix A). Here the rank of A is clearly 2 and thus R(A) is not all of ℝ³. Moreover, (2.2, 2.4, 4.25)T is not in R(A), i.e., not in span{(1, 1, 1)T, (1, 0.8, 0)T}, and so the system doesn't have a solution. This just means that we can't find a line that passes through all three points.
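One quick way to verify that the right-hand side is not in R(A) is to compare the rank of A with the rank of the augmented matrix (A | b): b is a combination of the columns of A exactly when appending it does not increase the rank. A small numerical sketch using NumPy (an illustration, not part of the notes' development):

```python
import numpy as np

# Coefficient matrix and right-hand side of the first example.
A = np.array([[1.0, 1.0],
              [1.0, 0.8],
              [1.0, 0.0]])
b = np.array([2.2, 2.4, 4.25])

# b lies in R(A) exactly when appending b as a column keeps the rank unchanged.
rank_A = np.linalg.matrix_rank(A)
rank_Ab = np.linalg.matrix_rank(np.column_stack([A, b]))
print(rank_A, rank_Ab)   # ranks differ, so the system has no exact solution
```

Here the rank jumps from 2 to 3, confirming that b is not in the span of the two columns.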

Example

If our data had been (1, 2.1), (0.8, 2.5), (0, 4.1)

then would we have had a solution to the over-determined system? Our matrix problem Ax = b is

\[
\begin{pmatrix} 1 & 1 \\ 1 & 0.8 \\ 1 & 0 \end{pmatrix}
\begin{pmatrix} a_0 \\ a_1 \end{pmatrix}
=
\begin{pmatrix} 2.1 \\ 2.5 \\ 4.1 \end{pmatrix}
\]

and we notice that in this case, the right-hand side is in R(A) because

\[
\begin{pmatrix} 2.1 \\ 2.5 \\ 4.1 \end{pmatrix}
= 4.1 \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}
- 2 \begin{pmatrix} 1 \\ 0.8 \\ 0 \end{pmatrix}
\]

and thus the system is solvable and we have the line 4.1 − 2x which passes through all three points. But, in general, we can't solve the over-determined system, so our approach is to find a vector x such that the residual r = b − Ax is as small as possible. The residual is a vector, so we take its norm. The linear least squares method uses the ℓ2-norm.

Consider the over-determined system Ax = b where A is m × n with m > n. The linear least squares problem is to find a vector x which minimizes the ℓ2 norm of the residual, that is,

\[
\| b - Ax \|_2 = \min_{z \in \mathbb{R}^n} \| b - Az \|_2 ,
\]

i.e., ‖b − Ax‖2 ≤ ‖b − Az‖2 for all z ∈ ℝⁿ.
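This minimization is exactly what NumPy's least squares routine computes, so we can check the second example numerically: since b is in R(A), the minimizer should recover the line 4.1 − 2x with zero residual. A sketch (the library call is an illustration, not part of the notes):

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 0.8],
              [1.0, 0.0]])

# Second example: b is in R(A), so the least squares minimizer reproduces
# the interpolating line 4.1 - 2x and the residual norm is zero.
b = np.array([2.1, 2.5, 4.1])
coeffs, *_ = np.linalg.lstsq(A, b, rcond=None)
residual = np.linalg.norm(b - A @ coeffs)
print(coeffs, residual)   # coefficients (a0, a1) = (4.1, -2.0), residual ~ 0
```

For the first example's right-hand side the same call would return a nonzero residual, since no line passes through those three points.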
We note that minimizing the ℓ2 norm of the residual is equivalent to minimizing its square. This is often easier to work with because we avoid dealing with square roots. So we...
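Minimizing the squared norm ‖b − Az‖2² leads, by setting its gradient to zero, to the normal equations AᵀA x = Aᵀb; this is a standard result, sketched here numerically as a preview rather than as the notes' own derivation. Both routes give the same minimizer:

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 0.8],
              [1.0, 0.0]])
b = np.array([2.2, 2.4, 4.25])

# Solve the normal equations A^T A x = A^T b directly ...
x_normal = np.linalg.solve(A.T @ A, A.T @ b)
# ... and compare with NumPy's least squares routine.
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
print(x_normal, x_lstsq)   # both give the same minimizer
```

Forming AᵀA explicitly can be numerically delicate when A is ill-conditioned, which is why library routines typically use an orthogonal factorization instead.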