Skip to main content

PLP: An introduction to mathematical proof

Section 12.5 More comparisons of cardinalties

At this point, we have given extra meaning to the symbols “\(\leq\)” and ’\(\lt \)’ so that they can be used to handle cardinalities of infinite sets. In that context do they still behave the same way as they do in the context of (say) comparing real numbers? So, for example, does the following hold:
if \(|A|\leq|B|\) and \(|B|\leq |A|\) then \(|A|=|B|\)
This is equivalent to
If there is an injection from \(A\) to \(B\) and an injection from \(B\) to \(A\text{,}\) then there is a bijection from \(A\) to \(B\text{.}\)
This result is the Cantor-Schröder-Bernstein 187  Theorem

Subsection 12.5.1 The Cantor-Schröder-Bernstein theorem

The proof is by no means trivial; we have some work to do. First up — let us assume that we have those injections \(f:A \to B\) and \(g:B \to A\text{.}\)
We can think of \(f\) injecting a copy of \(A\) into \(B\) — namely \(f(A)\text{.}\) It is not hard to show that by restricting the codomain we can build a function
\begin{align*} \hat{f}:A \to f(A) && \text{defined by } \hat{f}(a) = f(a) \end{align*}
is a bijection. So that looks like a good place to start making our bijection from \(A\) to \(B\text{.}\) We can similarly restrict the codomain of \(g\) to construct
\begin{align*} \hat{g}:B \to g(B) && \text{defined by } \hat{g}(b) = g(b) \end{align*}
Since this is also a bijection, its inverse is a bijection — giving us another bijection
\begin{gather*} \hat{g}^{-1}: g(B) \to B \end{gather*}
This should give us some hope because by restricting codomains we have
  • a bijection, \(\hat{f}\text{,}\) from all of \(A\) to some (but not all) of \(B\text{,}\) and
  • a bijection, \(\hat{g}^{-1}\) from some (but not all) of \(A\) to all of \(B\text{.}\)
So perhaps we can build a bijection from all of \(A\) to all of \(B\) by carefully choosing between these two depending on our input.
To investigate things a bit further, we should think about what happens when we compose \(f\) and \(g\text{.}\) Let \(x \in A\) and consider the image of \(x\) under \(f\) — we get some element \(f(x) \in B\text{.}\) We cannot do much with this, but we can apply \(g\) to it, giving us \(g(f(x))\text{.}\) This, in turn, is some element of \(A\text{,}\) so we can apply \(f\) to it, etc etc. In this way, we can think of the trajectory of \(x\) under these two injections.
\begin{gather*} x \mapsto f(x) \mapsto g(f(x)) \mapsto f(g(f(x))) \mapsto \cdots \end{gather*}
What we have here is the forward trajectory of \(x\) under \(f\) and \(g\text{.}\)
But we could also go backwards. For any \(x \in A\text{,}\) we can compute the preimage of \(g^{-1}(\set{x})\text{.}\) Since \(g\) is injective, we know that that this set is either empty or contains precisely 1 element (otherwise \(g\) would fail to be injective). We can make a similar argument about any \(y \in B\text{,}\) its preimage \(f^{-1}(\set{y})\) is either empty or contains exactly 1 element. Hence we can extend this trajectory backwards via unique preimages — unless we get stuck at an element whose preimage is empty.
Each element has a unique forward step along its path by applying \(f\) or \(g\) as appropriate. Similarly, when we step back along the trajectory, we move to a unique element (due to injectivity) by application of \(f^{-1}\) or \(g^{-1}\text{,}\) or the preimage is empty and cannot move any further back. So two trajectories cannot merge, nor can a trajectory branch:
Because of this, the trajectory of any point \(x\) can be of 4 types:
  • the trajectory forms an infinite line
  • the trajectory is a loop
  • the trajectory goes forward forever, but going backwards we get stuck at an element of \(A\text{,}\) or
  • the trajectory goes forward forever, but going backwards we get stuck at an element of \(B\text{.}\)
We cannot have more complicated shapes. Let us call the first three types “good” and the last type — that gets stuck at \(B\) — “bad”.
Now, define a function \(h: A \to B\) by
\begin{align*} h(x) &= \begin{cases} f(x) & \text{ if the trajectory of \(x\) is good, and}\\ g^{-1}(x) & \text{ if the trajectory of \(x\) is bad.} \end{cases} \end{align*}
To see that this is actually function — take any element of \(a \in A\text{.}\) We can always form \(f(a)\text{,}\) since \(f\) is a function. If \(a\) lies on a bad trajectory, then we know we can go backwards until we get stuck at some element of \(B\) — thus the preimage of \(g^{-1}(a)\) must contain exactly 1 element — which we take to be \(h(a)\text{.}\)
Oof. Now we “just” have to prove this is a bijection.
Let \(b \in B\text{.}\) Examine the trajectory on which \(b\) lies — it is either “good” or “bad”.
  • If \(b\) lies on a good trajectory, then (by moving back one step along the trajectory) there is some \(x \in A\) so that \(f(x)=y\text{.}\) This \(x\) must also lie on a good trajectory and so \(h(x) = f(x) = y\text{.}\)
  • If \(b\) lies on a bad trajectory, then (by moving forward one step along the trajectory) we set \(x=g(b)\text{.}\) We know that \(x\) also lies on a bad trajectory, so \(h(x)= g^{-1}(x) = g^{-1}(g(y))= y\text{.}\)
In either case, by moving forward or backward one step along the trajectory of \(y\) we find an \(x \in A\) so that \(h(x)=y\text{.}\)
Let \(a,c \in A\text{,}\) and assume that \(h(a)=h(c)\text{.}\) Notice that by construction, \(h(x)\) lies on the same trajectory as \(x\text{.}\) Thus \(a,c,h(a)\) and \(h(c)\) must all lie on the same trajectory. That trajectory is either good or bad.
  • If the trajectory is good, then \(h(a)=f(a)\) and \(h(c)=f(c)\) and so \(f(a)=f(c)\text{.}\) Since \(f\) is an injection, we know that \(a=c\text{.}\)
  • If the trajectory is bad then \(h(a)=g^{-1}(a)\) and \(h(c)=g^{-1}(c)\text{.}\) So \(g^{-1}(a)=g^{-1}(c)\) — since \(g\) is an injection, that preimage contains exactly one element, so we must have \(a=c\text{.}\)
In either case, we have that \(a=c\text{.}\)

Subsection 12.5.2 Applications

The Cantor-Schröder-Bernstein theorem makes proving the following results much easier.
By Cantor-Schröder-Bernstein it suffices to construct injections from \((0,1)\) to \([0,1)\) and vice-versa to show that they are equinumerous.
  • Let \(f:(0,1) \to [0,1)\) be defined by \(f(x)=x\text{.}\) This is an injection since if \(x_1 \neq x_2\) then \(f(x_1)=x_1 \neq x_2=f(x_2)\text{.}\)
  • Let \(g:[0,1) \to (0,1)\) be defined by \(g(x) = 0.1(1+x)\) (there are many similar choices). Again, this is an injection since if \(x_1 \neq x_2\) then \(g(x_1) \neq g(x_2)\text{.}\)
We can then show that the second and third sets are equinumerous via the explicit bijection
\begin{align*} f:\amp[0,1)\to(0,1] \amp f(x)=1-x \end{align*}
This is injective since if \(f(x_1)=f(x_2)\) then \(1-x_1=1-x_2\) so \(x_1=x_2\text{.}\) It is surjective since for any \(y \in (0,1]\) pick \(x=1-y \in [0,1)\text{.}\) Then \(f(x)= 1-(1-y)=y\) as required.
Finally we can prove that the first and last sets are equinumerous by very similar injections to those we used to prove that the first and second sets are equinumerous.
  • Let \(f:(0,1) \to [0,1]\) be defined by \(f(x)=x\text{.}\) It is (immediately) injective by the same argument used above.
  • Let \(g:[0,1] \to (0,1)\) be defined by \(g(x)=0.1(1+x)\text{.}\) It is injective by similar arguments.
So by CSB we have that \(|(0,1)|=|[0,1]|\text{.}\)
This can also be proved without CSB, but it is definitely more work — we’ll demonstrate that the first two sets are equinumerous. The approach is to split the set \((0,1)\) into the set \(\set{\frac{n}{n+1} \so n \in \mathbb{N} }\) and everything else. Then things of the form \(\frac{n}{n+1}\) are mapped to \(\frac{n-1}{n}\text{,}\) so that
\begin{align*} f(1/2) &= 0 & f(2/3) &= 1/2 & f(3/4) &= 2/3 & f(4/5) &= 3/4 & \end{align*}
This way we can map something to 0 while not forgetting or missing any other element.
Let \(f:(0,1) \to [0,1)\) be defined by
\begin{equation*} f(x) = \begin{cases} \frac{n-1}{n} & \text{if } x = \frac{n}{n+1}, n \in \mathbb{N} \\ x & \text{ otherwise} \end{cases} \end{equation*}
We then need to show that this is both injective and surjective.
  • Injective — Let \(a,b \in (0,1)\) and assume \(f(a)=f(b)\text{.}\) If \(f(a)=f(b) = \frac{n-1}{n}\) for some \(n \in \mathbb{N}\text{,}\) then we must have \(a=\frac{n}{n+1}=b\text{.}\) On the other hand if \(f(a)=f(b)\) is not of this form then \(f(x)=x\text{,}\) so \(a=b\text{.}\)
  • Surjective — Let \(y \in (0,1)\text{.}\) If \(y = \frac{n-1}{n}\) for some \(n \in \mathbb{N}\text{,}\) then set \(x=\frac{n}{n+1}\text{.}\) Otherwise set \(x=y\text{.}\) In either case \(f(x)=y\) as required.

Subsection 12.5.3 Proof that \(|\pow{\mathbb{N}}| = |\mathbb{R}|\)

From Cantor’s theorem we saw that \(|\mathbb{N}|\lt \pow{\mathbb{N}}|\text{,}\) and by Cantor’s diagonal argument we also saw that \(|\mathbb{N}|\lt |\mathbb{R}|\text{.}\) It is not unreasonable to ask to compare these two uncountable sets. The following really beautiful result shows that they are actually equinumerous.
By the Cantor-Schröder-Bernstein theorem it suffices to construct injections from \(\mathcal{P}(\mathbb{N})\) to \(\mathbb{R}\) and vice-versa.
  • We define an injection \(g:\mathcal{P}(\mathbb{N}) \to \mathbb{R}\text{.}\) Let \(X \in \mathcal{P}(\mathbb{N})\text{,}\) we can then construct \(y = g(X)\) by constructing its decimal expansion. In particular, write \(y = 0.y_1y_2y_3y_4\cdots\text{,}\) and then
    \begin{equation*} y_n = \begin{cases} 1 & \text{ if } n\in X \\ 0 & \text{ if } n \not\in X \end{cases} \end{equation*}
    Now if we take \(X_1,X_2 \in \mathcal{P}\) with \(X_1 \neq X_2\text{,}\) then
    • there is \(n_1 \in X_1\) so that \(n_1 \not\in X_2\text{,}\) or
    • there is \(n_2 \in X_2\) so that \(n_2 \not\in X_1\text{,}\) or both.
    In either case, this means that corresponding expansions differ at either \(y_{n_1}\) or \(y_{n_2}\text{,}\) and so \(g(X_1) \neq g(X_2)\text{.}\)
  • It is easier to construct an injection from \([0,1)\) to \(\pow{\mathbb{N}}\) rather than from \(\mathbb{R}\) to \(\pow{\mathbb{N}}\text{.}\) However, since we have proved that \(|[0,1)|=|(0,1)|=|\mathbb{R}|\text{,}\) we know there is a bijection between \(\mathbb{R}\) and \([0,1)\text{.}\) So if we compose that bijection with the injection we are about to construct, we get the required injection from \(\mathbb{R}\) to \(\pow{\mathbb{N}}\text{.}\)
    Given any \(x \in [0,1)\) we write its decimal expansion as \(x=0.x_1x_2x_3\dots\text{.}\) As noted above we can make this expansion unique by avoiding any expansion that ends in an infinite sequence of 9’s. Using this expansion we can define \(h:[0,1) \to \pow{\mathbb{N}}\) by
    \begin{equation*} h(x) = h(0.x_1x_2x_3\dots) = \set{x_1, 10x_2, 100x_3, \dots, 10^{n-1} x_n, \dots} \subseteq \mathbb{N} \end{equation*}
    So, for example,
    \begin{align*} h\left(\frac{1}{4}\right)\amp = h(0.25)= \set{2,50} \\ h\left(\frac{1}{3}\right)\amp = h(0.333\dots) = \set{3,30,300,3000,\dots} \\ h\left(\frac{2}{11}\right)\amp = h(0.181818\dots) = \set{1,80,100,8000,\dots} \end{align*}
    Now if \(x \neq z\) then their decimal expansions must differ at least one digit. Consequently their images will be different subsets of \(\mathbb{N}\text{.}\) Hence the function \(h\) is an injection from \([0,1)\) to \(\pow{\mathbb{N}}\text{,}\) and so there is an injection from \(\mathbb{R}\) to \(\pow{\mathbb{N}}\) as required.
This was first published without proof by Cantor in 1887. In the same year it was proved by Dedekind, but not published. Schröder gave a proof sketch in 1896, but it was found to be incorrect. Then in 1897, almost simultaneously, Schröder and Bernstein gave correct proofs. At this time, Bernstein was 19 years old! Dedekin proved it again later that year. Why Dedekin doesn’t get more credit for this theorem remains mysterious.