
Subsection 1.1.7 Optional — careful definition of the integral

In this optional section we give a more mathematically rigorous definition of the definite integral \(\ds \int_a^b f(x)\dee{x}\text{.}\) Some textbooks use a sneakier, but equivalent, definition. The integral will be defined as the limit of a family of approximations to the area between the graph of \(y=f(x)\) and the \(x\)-axis, with \(x\) running from \(a\) to \(b\text{.}\) We will then show conditions under which this limit is guaranteed to exist. We should state up front that these conditions are more restrictive than is strictly necessary — this is done so as to keep the proof accessible.

The family of approximations needed is slightly more general than that used to define Riemann sums in the previous sections, though it is quite similar. The main difference is that we do not require that all the subintervals have the same size.

  • We start by selecting a positive integer \(n\text{.}\) As was the case previously, this will be the number of subintervals used in the approximation and eventually we will take the limit as \(n \to \infty\text{.}\)
  • Now subdivide the interval from \(a\) to \(b\) into \(n\) subintervals by selecting \(n+1\) values of \(x\) that obey
    \begin{gather*} a=x_0 \lt x_1 \lt x_2 \lt \cdots \lt x_{n-1} \lt x_n=b. \end{gather*}
    The subinterval number \(i\) runs from \(x_{i-1}\) to \(x_i\text{.}\) This formulation does not require the subintervals to have the same size. However we will eventually require that the widths of the subintervals shrink towards zero as \(n\to\infty\text{.}\)
  • Then for each subinterval we select a value of \(x\) in that interval. That is, for \(i=1,2,\dots,n\text{,}\) choose \(x_i^*\) satisfying \(x_{i-1} \leq x_i^* \leq x_i\text{.}\) We will use these values of \(x\) to help approximate \(f(x)\) on each subinterval.
  • The area between the graph of \(y=f(x)\) and the \(x\)-axis, with \(x\) running from \(x_{i-1}\) to \(x_i\text{,}\) i.e. the contribution, \(\int_{x_{i-1}}^{x_i} f(x)\dee{x}\text{,}\) from interval number \(i\) to the integral, is approximated by the area of a rectangle. The rectangle has width \(x_i-x_{i-1}\) and height \(f(x_i^*)\text{.}\)

  • Thus the approximation to the integral, using all \(n\) subintervals, is
    \begin{gather*} \int_a^b f(x)\dee{x} \approx f(x_1^*)[x_1-x_0]+f(x_2^*)[x_2-x_1]+\cdots+ f(x_n^*)[x_n-x_{n-1}] \end{gather*}
  • Of course every different choice of \(n\) and \(x_1,x_2,\cdots,x_{n-1}\) and \(x_1^*, x_2^*,\cdots,x_n^*\) gives a different approximation. So to simplify the discussion that follows, let us denote a particular choice of all these numbers by \(\bbbp\text{:}\)
    \begin{gather*} \bbbp=\left(n,x_1,x_2,\cdots,x_{n-1},x_1^*, x_2^*, \cdots, x_n^*\right). \end{gather*}
    Similarly let us denote the resulting approximation by \(\cI(\bbbp)\text{:}\)
    \begin{gather*} \cI(\bbbp)=f(x_1^*)[x_1-x_0]+f(x_2^*)[x_2-x_1]+\cdots+ f(x_n^*)[x_n-x_{n-1}] \end{gather*}
  • We claim that, for any reasonable function \(f(x)\text{,}\) if you take any reasonable sequence of these approximations, you always get exactly the same limiting value. (We'll be more precise about what “reasonable” means shortly.) We define \(\int_a^b f(x) \dee{x}\) to be this limiting value.
  • Let's be more precise. We can take the limit of these approximations in two equivalent ways. Above we did this by taking the number of subintervals \(n\) to infinity. When we did this, the width of all the subintervals went to zero. With the formulation we are now using, simply taking the number of subintervals to be very large does not imply that they will all shrink in size. We could have one very large subinterval and a large number of tiny ones. Thus we take the limit we need by taking the width of the subintervals to zero. So for any choice \(\bbbp\text{,}\) we define
    \begin{gather*} M(\bbbp)=\max\big\{ x_1-x_0\ ,\ x_2-x_1\ ,\ \cdots\ ,\ x_n-x_{n-1}\big\} \end{gather*}
    that is, \(M(\bbbp)\) is the maximum width of the subintervals used in the approximation determined by \(\bbbp\text{.}\) By forcing the maximum width to go to zero, we force the widths of all the subintervals to go to zero.
  • We then define the definite integral as the limit
    \begin{gather*} \int_a^b f(x)\dee{x}=\lim_{M(\bbbp)\rightarrow 0}\cI(\bbbp). \end{gather*}
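To make this machinery concrete, here is a small numerical sketch (our illustration, not part of the formal definition). It builds a partition \(\bbbp\) of \([0,1]\) with unequally spaced points, picks each \(x_i^*\) at random in its subinterval, and computes both the approximation \(\cI(\bbbp)\) and the maximum width \(M(\bbbp)\) for \(f(x)=x^2\text{,}\) whose exact integral on \([0,1]\) is \(1/3\text{.}\) As \(M(\bbbp)\) shrinks, \(\cI(\bbbp)\) closes in on \(1/3\text{.}\)

```python
import random

def approx_integral(f, a, b, n, seed=0):
    """Build a partition a = x_0 < x_1 < ... < x_n = b with unequal
    subintervals, choose each x_i^* at random in its subinterval, and
    return the approximation I(P) together with the max width M(P)."""
    rng = random.Random(seed)
    # n+1 partition points: the endpoints plus n-1 random interior points
    xs = sorted([a, b] + [rng.uniform(a, b) for _ in range(n - 1)])
    I, M = 0.0, 0.0
    for i in range(1, n + 1):
        width = xs[i] - xs[i - 1]
        xstar = rng.uniform(xs[i - 1], xs[i])  # any point in subinterval i
        I += f(xstar) * width                  # rectangle: height * width
        M = max(M, width)                      # track the widest subinterval
    return I, M

f = lambda x: x**2  # exact integral over [0,1] is 1/3
for n in [10, 100, 1000, 10000]:
    I, M = approx_integral(f, 0.0, 1.0, n)
    print(f"n={n:6d}  M(P)={M:.5f}  I(P)={I:.5f}  error={abs(I - 1/3):.5f}")
```

Note that nothing forces a random partition to be well behaved: a large \(n\) alone does not guarantee a small \(M(\bbbp)\text{,}\) which is exactly why the limit is taken as \(M(\bbbp)\to 0\) rather than as \(n\to\infty\text{.}\)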

Of course, one is now left with the question of determining when the above limit exists. A proof of the very general conditions which guarantee existence of this limit is beyond the scope of this course, so we instead give a weaker result (with stronger conditions) which is far easier to prove.

For the rest of this section, assume

  • that \(f(x)\) is continuous for \(a\le x\le b\text{,}\)
  • that \(f(x)\) is differentiable for \(a \lt x \lt b\text{,}\) and
  • that \(f'(x)\) is bounded, i.e. \(|f'(x)|\leq F\) for some constant \(F\text{.}\)

We will now show that, under these hypotheses, as \(M(\bbbp)\) approaches zero, \(\cI(\bbbp)\) always approaches the area, \(A\text{,}\) between the graph of \(y=f(x)\) and the \(x\)-axis, with \(x\) running from \(a\) to \(b\text{.}\)

These assumptions are chosen to make the argument particularly transparent. With a little more work one can weaken the hypotheses considerably. We are cheating a little by implicitly assuming that the area \(A\) exists. In fact, one can adjust the argument below to remove this implicit assumption.

  • Consider \(A_j\text{,}\) the part of the area coming from \(x_{j-1}\le x\le x_j\text{.}\) We have approximated this area by \(f(x_j^*)[x_j-x_{j-1}]\) (see figure left).

  • Let \(f({\overline x}_j)\) and \(f({\underline x}_j)\) be the largest and smallest values of \(f(x)\) for \(x_{j-1}\le x\le x_j\text{.}\) (Here we are using the extreme value theorem, whose proof is beyond the scope of this course. The theorem says that any continuous function on a closed interval must attain a minimum and maximum at least once. In this situation it implies that for any continuous function \(f(x)\text{,}\) there are \(x_{j-1}\le {\overline x}_j, {\underline x}_j\le x_j\) such that \(f({\underline x}_j)\le f(x) \le f({\overline x}_j)\) for all \(x_{j-1}\le x\le x_j\text{.}\)) Then the true area is bounded by
    \begin{gather*} f({\underline x}_j)[x_j-x_{j-1}] \leq A_j \leq f({\overline x}_j)[x_j-x_{j-1}]. \end{gather*}
    (see figure right).
  • Now since \(f({\underline x}_j) \leq f(x_j^*) \leq f({\overline x}_j)\text{,}\) we also know that
    \begin{gather*} f({\underline x}_j)[x_j-x_{j-1}] \leq f(x_j^*)[x_j-x_{j-1}] \leq f({\overline x}_j)[x_j-x_{j-1}]. \end{gather*}
  • So both the true area, \(A_j\text{,}\) and our approximation of that area \(f(x_j^*)[x_j - x_{j-1}]\) have to lie between \(f({\overline x}_j)[x_j-x_{j-1}]\) and \(f({\underline x}_j)[x_j-x_{j-1}]\text{.}\) Combining these bounds we have that the difference between the true area and our approximation of that area is bounded by
    \begin{gather*} \big|A_j-f(x_j^*)[x_j-x_{j-1}]\big| \le[f({\overline x}_j)-f({\underline x}_j)]\cdot[x_j-x_{j-1}]. \end{gather*}
    (To see this think about the smallest the true area can be and the largest our approximation can be and vice versa.)
  • Now since our function \(f(x)\) is differentiable, we can apply one of the main theorems we learned in CLP-1, the Mean Value Theorem. (Recall that the mean value theorem states that for a function continuous on \([a,b]\) and differentiable on \((a,b)\text{,}\) there exists a number \(c\) between \(a\) and \(b\) so that \(f'(c) = \frac{f(b)-f(a)}{b-a}\text{.}\)) The MVT implies that there exists a \(c\) between \({\underline x}_j\) and \({\overline x}_j\) such that
    \begin{gather*} f({\overline x}_j)-f({\underline x}_j) =f'(c)\cdot [{\overline x}_j-{\underline x}_j] \end{gather*}
  • By the assumption that \(|f'(x)|\le F\) for all \(x\) and the fact that \({\underline x}_j\) and \({\overline x}_j\) must both be between \(x_{j-1}\) and \(x_j\)
    \begin{gather*} \big|f({\overline x}_j)-f({\underline x}_j)\big| \le F\cdot \big|{\overline x}_j-{\underline x}_j\big| \le F\cdot [x_j-x_{j-1}] \end{gather*}
    Hence the error in this part of our approximation obeys
    \begin{gather*} \big|A_j-f(x_j^*)[x_j-x_{j-1}]\big| \le F\cdot [x_j-x_{j-1}]^2. \end{gather*}
  • That was just the error in approximating \(A_j\text{.}\) Now we bound the total error by combining the errors from approximating on all the subintervals. This gives
    \begin{align*} \left| A-\cI(\bbbp)\right| &= \left| \sum_{j=1}^n A_j - \sum_{j=1}^n f(x_j^*)[x_j-x_{j-1}] \right|\\ &= \left| \sum_{j=1}^n \left(A_j - f(x_j^*)[x_j-x_{j-1}] \right) \right|\\ &\leq \sum_{j=1}^n\left|A_j - f(x_j^*)[x_j-x_{j-1}]\right| &\text{triangle inequality}\\ &\leq \sum_{j=1}^n F\cdot [x_j-x_{j-1}]^2 & \text{from above}\\ \end{align*}

    Now do something a little sneaky. Replace one of these factors of \([x_j-x_{j-1}]\) (which is just the width of the \(j^\mathrm{th}\) subinterval) by the maximum width of the subintervals:

    \begin{align*} &\leq \sum_{j=1}^n F\cdot M(\bbbp)\cdot [x_j-x_{j-1}] &\text{since $x_j-x_{j-1}\le M(\bbbp)$}\\ &= F\cdot M(\bbbp)\cdot \sum_{j=1}^n [x_j-x_{j-1}] & \text{$F$ and $M(\bbbp)$ are constants}\\ & = F\cdot M(\bbbp)\cdot (b-a). & \text{sum is total width} \end{align*}
  • Since \(a\text{,}\) \(b\) and \(F\) are fixed, this tends to zero as the maximum rectangle width \(M(\bbbp)\) tends to zero.
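The bound just derived, \(\big|A-\cI(\bbbp)\big|\le F\cdot M(\bbbp)\cdot(b-a)\text{,}\) holds for every choice of partition and of the points \(x_j^*\text{.}\) The following sketch (again our illustration, not part of the proof) checks this numerically for \(f(x)=\sin x\) on \([0,2]\text{,}\) where \(|f'(x)|=|\cos x|\le 1\) so we may take \(F=1\) and the exact area is \(A=1-\cos 2\text{.}\)

```python
import math
import random

rng = random.Random(1)

def partition_approx(f, a, b, n):
    """One random choice P: unequal subintervals and arbitrary x_j^*.
    Returns the approximation I(P) and the maximum width M(P)."""
    xs = sorted([a, b] + [rng.uniform(a, b) for _ in range(n - 1)])
    I = sum(f(rng.uniform(xs[j - 1], xs[j])) * (xs[j] - xs[j - 1])
            for j in range(1, n + 1))
    M = max(xs[j] - xs[j - 1] for j in range(1, n + 1))
    return I, M

a, b = 0.0, 2.0
A = 1.0 - math.cos(b)   # exact area under sin(x) for x from 0 to 2
F = 1.0                 # |f'(x)| = |cos(x)| <= 1 on this interval
for trial in range(5):
    I, M = partition_approx(math.sin, a, b, n=200)
    assert abs(A - I) <= F * M * (b - a)  # the bound from the argument above
    print(f"M(P)={M:.4f}  |A - I(P)|={abs(A - I):.4f}  bound={F*M*(b-a):.4f}")
```

However sloppy the partition and however the \(x_j^*\) are chosen, the error never exceeds \(F\cdot M(\bbbp)\cdot(b-a)\text{,}\) which is why forcing \(M(\bbbp)\to 0\) forces \(\cI(\bbbp)\to A\text{.}\)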

Thus, we have proven