
Section C.2 The Error Behaviour of Newton's Method

Newton's method usually works spectacularly well, provided your initial guess is reasonably close to a solution of \(f(x)=0\text{.}\) A good way to select this initial guess is to sketch the graph of \(y=f(x)\text{.}\) We now explain why “Newton's method usually works spectacularly well, provided your initial guess is reasonably close to a solution of \(f(x)=0\)”.

Let \(r\) be any solution of \(f(x)=0\text{.}\) Then \(f(r)=0\text{.}\) Suppose that we have already computed \(x_n\text{.}\) The error in \(x_n\) is \(\big|x_n-r\big|\text{.}\) We now derive a formula that relates the error after the next step, \(\big|x_{n+1}-r\big|\text{,}\) to \(\big|x_n-r\big|\text{.}\) We have seen in (3.4.32) that

\begin{gather*} f(x)=f(x_n)+f'(x_n)(x-x_n)+\frac{1}{2} f''(c)(x-x_n)^2 \end{gather*}

for some \(c\) between \(x_n\) and \(x\text{.}\) In particular, choosing \(x=r\text{,}\)

\begin{equation*} 0=f(r)=f(x_n)+f'(x_n)(r-x_n)+\frac{1}{2} f''(c)(r-x_n)^2 \tag{E1} \end{equation*}

Recall that \(x_{n+1}\) is the solution of \(0=f(x_n)+f'(x_n)(x-x_n)\text{.}\) So

\begin{equation*} 0=f(x_n)+f'(x_n)(x_{n+1}-x_n) \tag{E2} \end{equation*}

We need to get an expression for \(x_{n+1}-r\text{.}\) Subtracting (E2) from (E1) gives

\begin{align*} 0=f'(x_n)(r-x_{n+1})+\frac{1}{2} f''(c)(r-x_n)^2 \ \amp \implies\ x_{n+1}-r=\frac{f''(c)}{2f'(x_n)}(x_n-r)^2\\ \amp \implies\ \big|x_{n+1}-r\big| =\frac{|f''(c)|}{2|f'(x_n)|}|x_n-r|^2 \end{align*}

If the guess \(x_n\) is close to \(r\text{,}\) then \(c\text{,}\) which must be between \(x_n\) and \(r\text{,}\) is also close to \(r\) and we will have \(f''(c)\approx f''(r)\) and \(f'(x_n)\approx f'(r)\) and

\begin{equation*} \big|x_{n+1}-r\big| \approx\frac{|f''(r)|}{2|f'(r)|}|x_n-r|^2 \tag{E3} \end{equation*}
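As a quick numerical sanity check (a sketch, not part of the text), we can test (E3) in Python for \(f(x)=x^2-2\text{,}\) whose positive root is \(r=\sqrt{2}\text{.}\) There \(f'(r)=2r\) and \(f''(r)=2\text{,}\) so (E3) predicts \(|x_{n+1}-r|\approx |x_n-r|^2/(2r)\text{.}\)

```python
import math

# A numerical check of (E3) for f(x) = x^2 - 2, with root r = sqrt(2).
# Since f'(r) = 2r and f''(r) = 2, (E3) predicts
#     |x_{n+1} - r|  ~  |x_n - r|^2 / (2r).
f  = lambda x: x * x - 2
df = lambda x: 2 * x
r  = math.sqrt(2)

x = 1.5
ratios = []
for _ in range(3):
    x_new = x - f(x) / df(x)                    # one Newton step
    ratios.append(abs(x_new - r) / abs(x - r) ** 2)
    x = x_new

print(ratios)    # each ratio should be close to 1/(2r) = 0.3535...
```

The ratios approach \(1/(2\sqrt{2})\approx 0.3536\) as the iterates close in on the root, just as (E3) predicts.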

Even when \(x_n\) is not close to \(r\text{,}\) if we know that there are two numbers \(L,M\gt 0\) such that \(f\) obeys:

  1. \(\big|f'(x_n)\big|\ge L\) (H1)
  2. \(\big|f''(c)\big|\le M\) (H2)

(we'll see examples of this below) then we will have

\begin{equation*} \big|x_{n+1}-r\big| \le\frac{M}{2L}|x_n-r|^2 \tag{E4} \end{equation*}

Let's denote by \(\varepsilon_1\) the error, \(|x_1-r|\text{,}\) of our initial guess. In fact, let's denote by \(\varepsilon_n\) the error, \(|x_n-r|\text{,}\) in \(x_n\text{.}\) Then (E4) says

\begin{equation*} \varepsilon_{n+1}\le \frac{M}{2L}\varepsilon_n^2 \end{equation*}

In particular

\begin{alignat*}{3} \varepsilon_2\amp \le \frac{M}{2L}\varepsilon_1^2\\ \varepsilon_3\amp \le \frac{M}{2L}\varepsilon_2^2 \amp \amp \le \frac{M}{2L}\left( \frac{M}{2L}\varepsilon_1^2\right)^2 \amp \amp = \left( \frac{M}{2L}\right)^3\varepsilon_1^4\\ \varepsilon_4\amp \le \frac{M}{2L}\varepsilon_3^2 \amp \amp \le \frac{M}{2L}\left[\left( \frac{M}{2L}\right)^3\varepsilon_1^4\right]^2 \amp \amp = \left( \frac{M}{2L}\right)^7\varepsilon_1^8\\ \varepsilon_5\amp \le \frac{M}{2L}\varepsilon_4^2 \amp \amp \le \frac{M}{2L}\left[\left( \frac{M}{2L}\right)^7\varepsilon_1^8\right]^2 \amp \amp = \left( \frac{M}{2L}\right)^{15}\varepsilon_1^{16} \end{alignat*}
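The exponents of \(\frac{M}{2L}\) appearing above (\(0\text{,}\) \(1\text{,}\) \(3\text{,}\) \(7\text{,}\) \(15\text{,}\) \(\cdots\)) obey the recurrence \(a_{n+1}=2a_n+1\text{:}\) squaring the bound on \(\varepsilon_n\) doubles the exponent, and the fresh factor of \(\frac{M}{2L}\) adds one. A short illustrative check (not from the text) that the closed form is \(a_n=2^{n-1}-1\text{:}\)

```python
# The exponent a_n of M/(2L) in the bound for eps_n satisfies a_1 = 0 and
# a_{n+1} = 2*a_n + 1: squaring doubles the exponent, the extra M/(2L) adds 1.
# Verify this against the closed form a_n = 2^(n-1) - 1.
a = 0
for n in range(1, 11):
    assert a == 2 ** (n - 1) - 1, (n, a)
    a = 2 * a + 1
print("exponent pattern confirmed for n = 1, ..., 10")
```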

By now we can see a pattern forming that is easily verified by induction. (Mathematical induction is a technique for proving a sequence \(S_1\text{,}\) \(S_2\text{,}\) \(S_3\text{,}\) \(\cdots\) of statements: one first proves that \(S_1\) is true, and then proves that, for any natural number \(n\text{,}\) if \(S_n\) is true then \(S_{n+1}\) is true.)

\begin{equation*} \varepsilon_n\le \left( \frac{M}{2L}\right)^{2^{n-1}-1}\varepsilon_1^{2^{n-1}} =\frac{2L}{M}\left(\frac{M}{2L}\varepsilon_1\right)^{2^{n-1}} \tag{E5} \end{equation*}

As long as \(\frac{M}{2L}\varepsilon_1\lt 1\) (which gives us a quantitative idea as to how good our first guess has to be in order for Newton's method to work), this goes to zero extremely quickly as \(n\) increases. For example, suppose that \(\frac{M}{2L}\varepsilon_1\le \frac{1}{2}\text{.}\) Then

\begin{equation*} \varepsilon_n\le \frac{2L}{M}\left(\frac{1}{2}\right)^{2^{n-1}} \le \frac{2L}{M}\cdot\begin{cases} 0.25 \amp \text{if }n=2 \\ 0.0625\amp \text{if }n=3 \\ 0.0039=3.9\times 10^{-3}\amp \text{if }n=4 \\ 0.000015=1.5\times 10^{-5}\amp \text{if }n=5 \\ 0.00000000023=2.3\times 10^{-10}\amp \text{if }n=6 \\ 0.000000000000000000054=5.4\times 10^{-20} \amp \text{if }n=7 \end{cases} \end{equation*}
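The tabulated values are straightforward to reproduce (an illustrative check, not part of the text):

```python
# Reproduce the table above: the factor (1/2)^(2^(n-1)) for n = 2, ..., 7.
bounds = {n: 0.5 ** (2 ** (n - 1)) for n in range(2, 8)}
for n, b in bounds.items():
    print(f"n = {n}:  {b:.2e}")
```

Squaring the previous factor at each step is what roughly doubles the number of leading zeroes.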

Each time you increase \(n\) by one, the number of zeroes after the decimal place roughly doubles. You can see why from (E5). Since

\begin{equation*} \left(\frac{M}{2L}\varepsilon_1\right)^{2^{(n+1)-1}} =\left(\frac{M}{2L}\varepsilon_1\right)^{2^{n-1}\times 2} =\left[\left(\frac{M}{2L}\varepsilon_1\right)^{2^{n-1}}\right]^2 \end{equation*}

we have, very roughly speaking, \(\varepsilon_{n+1}\approx\varepsilon_n^2\text{.}\) This quadratic behaviour is the reason that Newton's method is so useful.

Let's consider, as we did in Example C.1.2, \(f(x)=x^2-2\text{,}\) starting with \(x_1=\frac{3}{2}\text{.}\) Then

\begin{equation*} f'(x)=2x\qquad f''(x)=2 \end{equation*}

Recalling, from (H1) and (H2), that \(L\) is a lower bound on \(|f'|\) and \(M\) is an upper bound on \(|f''|\text{,}\) we may certainly take \(M=2\) and if, for example, \(x_n\ge 1\) for all \(n\) (as happened in Example C.1.2), we may take \(L=2\) too. While we do not know what \(r\) is, we do know that \(1\le r\le 2\) (since \(f(1)=1^2-2\lt 0\) and \(f(2)=2^2-2\gt 0\)). As we took \(x_1=\frac{3}{2}\text{,}\) we have \(\varepsilon_1=|x_1-r|\le \frac{1}{2}\text{,}\) so that \(\frac{M}{2L}\varepsilon_1\le\frac{1}{4}\) and

\begin{equation*} \varepsilon_{n+1}\le \frac{2L}{M}\left(\frac{M}{2L}\varepsilon_1\right)^{2^{n-1}} \le 2\left(\frac{1}{4}\right)^{2^{n-1}} \tag{E6} \end{equation*}

This tends to zero very quickly as \(n\) increases. Furthermore this is an upper bound on the error and not the actual error. In fact (E6) is a very crude upper bound. For example, setting \(n=3\) gives the bound

\begin{equation*} \varepsilon_4\le 2\left(\frac{1}{4}\right)^{2^2} \approx 7.8\times 10^{-3} \end{equation*}

and we saw in Example C.1.2 that the actual error in \(x_4\) was smaller than \(5\times 10^{-10}\text{.}\)
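Both the crude bound and the much smaller actual error are easy to check with exact rational arithmetic (a sketch, not code from the text):

```python
from fractions import Fraction
import math

# Newton's method for f(x) = x^2 - 2 starting at x1 = 3/2, carried out in
# exact rational arithmetic; each step is x -> x - (x^2 - 2)/(2x).
x = Fraction(3, 2)
for _ in range(3):                       # compute x2, x3, x4
    x = x - (x * x - 2) / (2 * x)

err4 = abs(float(x) - math.sqrt(2))      # actual error in x4
print(x, err4)                           # far below the bound 2*(1/4)^4
```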

Let's consider, as we did in Example C.1.3, \(f(x)=\sin x\text{,}\) starting with \(x_1=3\text{.}\) Then

\begin{gather*} f'(x)=\cos x\qquad f''(x)=-\sin x \end{gather*}

As \(|-\sin x|\le 1\text{,}\) we may certainly take \(M=1\text{.}\) In Example C.1.3, all \(x_n\)'s were between \(3\) and \(3.2\text{.}\) Since (to three decimal places)

\begin{gather*} \sin(3)=0.141\gt 0\qquad \sin(3.2)=-0.058\lt 0 \end{gather*}

the IVT (intermediate value theorem) tells us that \(3\lt r\lt 3.2\) and \(\varepsilon_1=|x_1-r|\lt 0.2\text{.}\)

So \(r\) and all \(x_n\)'s and hence all \(c\)'s lie in the interval \((3,3.2)\text{.}\) Since \(\pi\) lies in \((3,3.2)\text{,}\) and \(\cos\) decreases to \(\cos\pi=-1\) and then increases again on this interval, every such \(c\) obeys

\begin{gather*} -1\le \cos c \le \cos(3) =-0.9900 \end{gather*}

so that we necessarily have \(\big|f'(c)\big|=\big|\cos c\big|\ge 0.99\ge 0.9\) and we may take \(L=0.9\text{.}\) So

\begin{gather*} \varepsilon_{n+1}\le \frac{2L}{M}\left(\frac{M}{2L}\varepsilon_1\right)^{2^{n-1}} \le \frac{2\times0.9}{1}\left(\frac{1}{2\times0.9}0.2\right)^{2^{n-1}} \le 2\left(\frac{1}{9}\right)^{2^{n-1}} \end{gather*}

This tends to zero very quickly as \(n\) increases.
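For illustration (a sketch, not code from the text), running Newton's method on \(f(x)=\sin x\) from \(x_1=3\) shows this rapid convergence to \(r=\pi\) directly:

```python
import math

# Newton's method for f(x) = sin x, f'(x) = cos x, starting at x1 = 3.
# The root being approached is r = pi.
x = 3.0
for _ in range(4):
    x = x - math.sin(x) / math.cos(x)    # x_{n+1} = x_n - f(x_n)/f'(x_n)
print(x, abs(x - math.pi))               # error is tiny after only four steps
```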

We have now seen two procedures for finding roots of a function \(f(x)\) — the bisection method (which does not use the derivative of \(f(x)\text{,}\) but which is not very efficient) and Newton's method (which does use the derivative of \(f(x)\text{,}\) and which is very efficient). In fact, there is a whole constellation of other methods (what does it say about mathematicians that they have developed so many ways of finding zero?) and the interested reader should search engine their way to, for example, Wikipedia's article on root finding algorithms. Here, we will just mention two other methods, one being a variant of the bisection method and the other being a variant of Newton's method.