Skip to main content

PLP: An introduction to mathematical proof

Section 10.4 Injective and surjective functions

We typically think of a function as taking objects from one set, \(A\text{,}\) doing “stuff” and turning them into elements from another set \(B\text{.}\) With this in mind, it is quite natural to ask whether or not we can reverse this process; take our result and turn it back into our original object. That is, “when is a given function invertible”.
If you think back to our days-of-the-week example:
You can see that if we have arrived at the letter “M” then it is easy to determine that we started at “Monday” — so easy to undo the function. However, if we have arrived at “S” then things are not so simple — we could have started with either “Sunday” or “Saturday”. Obviously this is related to the preimage that we saw before. The good case of “M” worked because the preimage 135  had 1 element in it, “Monday”. The bad case of “S” didn’t work because the preimage 136  had 2 elements in it. More generally the preimage could have any number of elements in it — including zero.
Now, for us to be able to sensibly undo our function, we need the preimage of every element in our co-domain to have exactly one element in it. To be more precise, for any element \(y\) in our codomain, there must exist some \(x\) in the domain so that the preimage of \(\set{y}\) is the set \(\set{x}\text{.}\) If you think about this a little, this means that the domain and co-domain must have the same size 137 . We’ll discuss this more soon. But it should be clear from this discussion that not every function can be undone, and that those that are “undoable” have to satisfy special properties. This brings us to a couple of important definitions.
To get to this we need to define some simple properties that functions can have.

Subsection 10.4.1 Injections and surjections

Definition 10.4.1.

Let \(a_1, a_2 \in A\) and let \(f: A \to B\) be a function. We say that \(f\) is injective or one-to-one when
\begin{equation*} \text{if } a_1 \neq a_2 \text{ then } f(a_1) \neq f(a_2). \end{equation*}
It is helpful to also write the contrapositive of this condition. We say that \(f\) is injective or one-to-one when
\begin{equation*} \text{if } f(a_1) = f(a_2) \text{ then } a_1 = a_2. \end{equation*}
Things to note
  • The term injective is, in this author’s opinion, better to use than one-to-one. When we speak (and write) we are sometimes quite sloppy with our use of prepositions like “to” or “on” or “in” or “onto”, so we might accidentally say one-onto-one, for example. The term injective is a nice latin-flavoured word that makes the speaker / writer sound more authoritative  138  .
  • A very common mistake made with this definition is to get the implications around the wrong way — to give the converse of what is required:
    • The right way — \(f(a_1) = f(a_2) \implies a_1 = a_2\text{.}\) Injective!
    • The wrong way — \(a_1 = a_2 \implies f(a_1) = f(a_2)\) — this is just “same input implies same output” which just says the function is well defined.
    Be careful of this.
  • So when a function is injective, different elements map to different elements.
  • When a function is not injective there must be at least one pair \(a_1,a_2 \in A\) so that \(a_1 \neq a_2 \in A\) but \(f(a_1)=f(a_2)\text{.}\)
As a preview of what is to come when we reach the chapter on cardinality, think about what we can say about the sizes of finite sets \(A,B\) if there is an injective function between them \(f: A \to B\text{.}\) Each element \(a \in A\) has to map to a different element \(b \in B\text{.}\) Consequently the set \(B\) must have at least as many elements as \(A\text{.}\) That is \(|A| \leq |B|\text{.}\) We note that when \(A,B\) are infinite, these sorts of questions become much less obvious.
So — how do we prove this. We have two equivalent conditions
  • \(\displaystyle a_1 \neq a_2 \implies f(a_1) \neq f(a_2)\)
  • \(f(a_1) = f(a_2) \implies a_1 = a_2\text{.}\)
Arguably, the second is easier since we have all had much more practice manipulating equalities than manipulating inequalities. So lets run through the argument. Let \(a_1,a_2 \in \mathbb{R}\text{,}\) and assume that
\begin{align*} f(a_1) &= f(a_2) & \text{ then we must have }\\ 7a_1-3 &= 7a_2-3 & \text{ and so }\\ 7a_1 &= 7a_2 & \text{ and hence }\\ a_1 &= a_2. \end{align*}
That wasn’t too bad. Time to write it up as a proof.
Let \(a_1, a_2 \in \mathbb{R}\) and assume \(f(a_1) = f(a_2)\text{.}\) Then \(7a_1-3 = 7a_2-3\) and so \(7a_1=7a_2\) and thus \(a_1 = a_2\text{.}\) Hence \(f\) is injective.
Now what about an example that is not injective — we can recycle our \(x^2\) example from above.
Injective means
\begin{equation*} \forall a_1,a_2 \in A, f(a_1)=f(a_2) \implies a_1 = a_2 \end{equation*}
and hence non-injective is just the negation of this, namely
\begin{equation*} \exists a_1,a_2 \in A \text{ s.t. } f(a_1)=f(a_2) \land a_1 \neq a_2. \end{equation*}
So we need a counter-example; there exists some pair of distinct \(a_1,a_2 \in \mathbb{R}\) that map to the same value. Of course, we could have started with the equivalent contrapositive definition of injective:
\begin{equation*} \forall a_1,a_2 \in A, a_1 \neq a_2 \implies f(a_1) \neq f(a_2) \end{equation*}
The negation of that is
\begin{equation*} \exists a_1,a_2 \in A \text{ s.t. } a_2 \neq a_2 \implies f(a_1) = f(a_2) \end{equation*}
which is the same as we found with the first definition of injective. Of course, this must be the case because the two definitions are equivalent.
Since \(-1,1\) are in the domain of \(f\) and \(f(-1) = f(1) = 1\text{,}\) the function is not injective.

Remark 10.4.4. Preimages and injections.

Consider an injection \(f: A \to B\) and, for any given \(b \in B\text{,}\) the preimage of \(\set{b}\text{:}\)
\begin{equation*} f^{-1}(\set{b}) = \set{ a \in A \so f(a)=b} \end{equation*}
This set is either empty or contains exactly 1 element. To see this consider \(a,c \in f^{-1}(\set{b})\text{.}\) By definition of the preimage, we know that \(f(a)=b\) and \(f(c)=b\text{.}\) Since \(f\) is injective, this tells us that \(a=c\text{.}\) So the preimage of a point under an injection contains at most 1 element.
Another important class of functions are surjections.

Definition 10.4.5.

Let \(f: A \to B\) be a function. We say that \(f\) is surjective, or onto, when for every \(b \in B\) there is some \(a \in A\) such that \(f(a) = b\text{.}\)
Things to note
  • This simply means that every element in \(B\) is mapped to by some element of \(A\text{.}\)
  • If the function is not surjective then there is some \(b \in B\) such that for all \(a \in A\text{,}\) \(f(a) \neq b\text{.}\)
  • Again, this author prefers the nice latin “surjective” over the term “onto” because it is less likely to confused with other prepositions.
Again, as a preview of cardinality, think about what we can say about the sizes of finite sets \(A,B\) if there is an surjection between them \(g: A \to B\text{.}\) Each element \(b \in B\) must be mapped to by at least one element \(a \in A\text{.}\) Consequently the set \(A\) must have at least as many elements as \(B\text{.}\) That is \(|A| \geq |B|\text{.}\)
So we need to show that no matter which \(y \in \mathbb{R}\) we can always find some \(x \in \mathbb{R}\) such that \(f(x) = 7x-3 = y\text{.}\) So we can just make \(x\) the subject giving \(x=(y+3)/7\text{.}\)
Now that we have chosen \(x\text{,}\) we need to make sure it is actually an element of the domain of the function. In this case it is easy since \((y+3)/7 \in \mathbb{R}\text{.}\) However, if we consider a similarly function
\begin{equation*} g: \mathbb{Z} \to \mathbb{Z} \qquad g(x)=7x-3 \end{equation*}
we would also get \(x = \frac{y+3}{7}\text{,}\) but then \(x\) is not always in the domain. Time to write up.
Let \(y \in \mathbb{R}\text{.}\) Choose \(x = (y+3)/7\text{,}\) then \(f(x) = 7 \cdot \frac{y+3}{7} -3 =y\text{.}\) Hence \(f\) is surjective.
Notice that in the proof we do not have to explain to the reader how we found the choice of \(x\text{.}\) It is not necessary to work through solving \(y=f(x)\) for \(x\text{.}\) All we need to do in the proof is tell the reader “Given \(y\) we choose this value of \(x\)” and then show that \(f(x)=y\text{.}\) This can often be a little frustrating for the reader who can be left thinking “How on earth did they get that?”; a good author might put in a little explanation in the text surrounding the proof, but it is not required for the proof to be valid.
Since surjective means
\begin{equation*} \forall b \in B, \exists a \in A \text{ s.t. } f(a)=b \end{equation*}
its negation is
\begin{equation*} \exists b \in B \text{ s.t. } \forall a \in A, f(a) \neq b. \end{equation*}
So in order to show that \(g\) is not surjective, we have to find (at least one) \(b \in \mathbb{R}\) (in the co-domain) that is not the image of any \(a \in \mathbb{R}\) (in the domain). Sometimes this can be tricky to prove, but for the example above we can use the fact that the square of a real is non-negative.
Let \(b = -1 \in \mathbb{R}\text{.}\) Since the square of any real number is non-negative, we know that \(f(x) = x^2 \geq 0\) for any \(x \in \mathbb{R}\text{.}\) Hence there is no \(x \in \mathbb{R}\) so that \(f(x) = -1\text{.}\) Thus the function is not surjective.

Remark 10.4.8. Preimages and surjections.

Consider a surjection \(g: A \to B\) and, for any given \(b \in B\text{,}\) the preimage of \(\set{b}\text{:}\)
\begin{equation*} g^{-1}(\set{b}) = \set{ a \in A \so g(a)=b} \end{equation*}
Since \(g\) is a surjection, for any given \(b \in B\text{,}\) there must be at least \(a \in A\) so that \(g(a)=b\text{.}\) Hence the preimage must contain at least one element.

Subsection 10.4.2 Bijective functions

Recall that we reasoned (but didn’t really prove) that for finite sets \(A, B\)
  • if there is an injection \(f: A\to B\) then \(|A| \leq |B|\text{,}\) and
  • if there is a surjection \(g: A \to B\) then \(|A| \geq |B|\)

Warning 10.4.9.

Be careful with your converses. Consider two finite sets \(A,B\text{.}\) If \(|A|\leq|B|\) it does not mean that all functions \(f:A \to B\) will be injections. Similarly if \(|A|\geq |B|\) not all functions \(g: A\to B\) need to surjections.
The function on the left is not injective (despite its domain being smaller than its co-domain). And the function on the right it not surjective (despite its domain being larger than its co-domain).
So given sets \(A,B\) if we can find such an injection and a surjection between them, then \(|A|=|B|\text{.}\) One way to do this is to find one function \(h: A \to B\) that is both injective and surjective; these functions are called bijections. Finding a bijection between two sets is a good way to demonstrate that they have the same size — we’ll do more on this in the chapter on cardinality.

Definition 10.4.10.

Let \(f: A \to B\) be a function. If \(f\) is injective and surjective then we say that \(f\) is bijective, or a one-to-one correspondence.
The terms injective, surjective and bijective were coined by Nicholas Bourbaki. Bourbaki was not a person, but the pseudonym of a group of (mostly French) mathematicians who wrote a series of texts in the mid 20th century. The group still exists and published a book in 2016. The central aim of the group was to create a series of complete and self-contained texts on the core of mathematics. The texts strive to be extremely rigorous and very general in their treatment of the material and not without controversy. You can search engine your way to more information on this topic.

Example 10.4.11.

Consider the following functions. Determine whether they are injective, surjective or bijective. They all have the same formula, but have different domains and co-domains (and so are different functions).
  • \(f: \mathbb{R} \to \mathbb{R}\) given by \(f(x) =x^2\text{.}\) Neither surjective, nor injective.
  • \(g: \mathbb{R} \to [0, \infty)\) given by \(g(x) =x^2\text{.}\) Surjective, but not injective.
  • \(h: [0,\infty) \to \mathbb{R}\) given by \(h(x) =x^2\text{.}\) Not surjective, but is injective.
  • \(\rho: [0,\infty) \to [0,\infty) \) given by \(\rho(x) =x^2\text{.}\) Is bijective.
Solution.
All of these functions have the same formula, just different domains and co-domains.
  • Think about how these functions can fail to be injective. We can verify that \(f(-x)=(-x)^2 = x^2 = f(x)\text{.}\) Hence these functions will fail to be injective if \(x\) and \(-x\) are both within the domain. So neither \(f, g\) are injective since
    \begin{equation*} f(-1)=f(1)=1 \qquad \text{ and } \qquad g(-1)=g(1)=1 \end{equation*}
    We now prove that \(h\) is injective. So let \(a_1,a_2 \geq 0\text{,}\) so that \(h(a_1)=h(a_2)\text{.}\) Hence
    \begin{equation*} a_1^2 = a_2^2 \end{equation*}
    Taking square-roots gives
    \begin{equation*} a_1 = \pm a_2 \end{equation*}
    However, since neither \(a_1,a_2\) are negative, we must have \(a_1=a_2\) and so \(h\) is an injection. The same argument works for \(\rho\text{.}\)
  • Now think about how these functions might fail to be surjective. We know that the square of a real number is non-negative. That is \(0 \leq x^2\text{.}\) So if there is a negative number in the co-domain there is no real number that can map to it. Consequently, neither \(f\) nor \(h\) are surjections, since there is no \(x \in \mathbb{R}\) so that
    \begin{equation*} f(x) = -1 \qquad \text{ or } \qquad h(x)=-1. \end{equation*}
    We now prove that \(g\) is surjective using the square-root function. Given any \(y\) in the codomain of \(g\text{,}\) pick \(x = \sqrt{y}\text{.}\) Since \(y \geq 0\text{,}\) \(x\) is defined and non-negative and so in the domain of \(g\text{.}\) Further, we know that \(g(x) = x^2 = (\sqrt{y})^2 = y\text{.}\) Thus \(g\) is surjective. The argument for \(\rho\) is identical.
So to summarise
  • \(f\) is neither injective, nor surjective,
  • \(g\) is surjective but not injective,
  • \(h\) is injective but not surjective, and
  • \(\rho\) is both injective and surjective, and so bijective.
The simplest (useful) bijective function is the identity function.

Definition 10.4.12.

Given a non-empty set \(A\) we define the identity function \(i_A: A \to A\) by \(i_A(a) = a\) for all \(a\in A\text{.}\)
The authors are usually loath to use the word “clear”, but we hope that it is clear that the identify function is surjective and injective and so bijective. We could prove it if we really had to. We will need the identity function to help us define the inverse of a function.
We need a couple more examples.
We need to show that \(g\) is both injective and surjective.
Since the function is both injective and surjective it is bijective.
A more interesting example. Let \(a,b,c,d \in \mathbb{R}\) and define
\begin{equation*} h: \mathbb{R}-\set{\frac{d}{c}} \to \mathbb{R}-\set{\frac{a}{c}} \qquad h(x) = \frac{ax-b}{cx-d} \end{equation*}
If the constants satisfy \(ad\neq bc\text{,}\) then this is a Möbius transformation 139  Notice that the denominator is zero when \(cx=d\text{,}\) and hence we have removed the point \(x=\frac{d}{c}\) from the domain; the function is not defined there. Some similar reasoning can show that there is no \(x \in \mathbb{R}\) so that \(h(x)= \frac{a}{c}\text{,}\) so we remove that point from the co-domain. Finally, notice that if \(ad=bc\) then
\begin{align*} h(x)\amp = \frac{ax-b}{cx-d} \\ \amp = \frac{cax-cb}{c^2 x - cd} \amp \text{since } bc=ad\\ \amp = \frac{cax - ad}{c^2 x-cd} = \frac{a(cx-d)}{c(cx-d)}\\ \amp = \frac{a}{c} \end{align*}
and so is constant.
Möbius transforms are a good source of non-trivial bijective function examples for authors to give to students. So let us just do this in full generality.
Scratch work.
  • Injective. Let \(x,y \in \mathbb{R}-\set{\frac{d}{c}}\) and assume \(h(x) = h(y)\text{.}\) Then
    \begin{align*} \frac{ax-b}{cx-d} &= \frac{ay-b}{cy-d}\\ (cy-d)(ax-b) &= (ay-b)(cx-d) & \text{ since denominator } \neq 0\\ caxy-cyb-adx+db &= acxy - ady -bcx +bd\\ (ad-bc)y \amp = (ad-bc)x \amp \text{ so we are done} \end{align*}
  • Surjective. Let \(y = h(x)\text{,}\) now find \(x\)
    \begin{align*} y &= \frac{ax-b}{cx-d}\\ cxy-dy &= ax-b\\ cxy - ax &= dy-b\\ x(cy-a) \amp = dy-b\\ x \amp = \frac{dy-b}{cy-a} \end{align*}
    We now need to show that \(x \neq \frac{d}{c}\) (and so is in the domain), and we can do so by considering
    \begin{align*} x - \frac{d}{c}\amp = \frac{dy-b}{cy-a} - \frac{d}{c} \\ \amp = \frac{dcy-bc-dcy+da}{c(cy-a)}\\ \amp = \frac{ad-bc}{c(cy-a)} \end{align*}
    Now since \(ad \neq bc\) and \(y \neq \frac{a}{c}\text{,}\) this is never zero. Hence this choice of \(x\) really does lie in the domain of the function.
We need to show that \(h\) is both injective and surjective.
  • Let \(y \in \mathbb{R}- \frac{a}{c}\text{.}\) Pick \(x = \frac{dy-b}{cy-a} = \frac{d}{c} + \frac{ad-bc}{c(cy-a)}\text{.}\) Since \(\frac{ad-bc}{c(cy-a)} \neq 0\) for any \(y \in \mathbb{R}-\set{\frac{a}{c}}\text{,}\) this choice of \(x\) is in the domain of the function. Now
    \begin{align*} h(x) &= \frac{a \frac{dy-b}{cy-a} -b}{ c\frac{dy-b}{cy-a}-d }\\ &= \frac{a(dy-b) - b(cy-a)}{c(dy-b)-d(cy-a)}\\ &= \frac{y(ad-bc)}{ad-bc} = y \end{align*}
    Hence for any \(y\) in the co-domain, there is an \(x\) in the range such that \(h(x) = y\text{.}\) So the function is surjective.
  • Now let \(x, y\) be two elements of the range such that \(h(x) = h(y)\text{.}\) Hence
    \begin{align*} \frac{ax-b}{cx-d} &= \frac{ay-b}{cy-d}\\ (cy-d)(ax-b) &= (ay-b)(cx-d) & \text{ since denominator} \neq 0\\ cax-cby-dax+db &= acxy-day-bcx+db\\ (ad-bc)y &= (ad-bc)x \end{align*}
    Hence \(x=y\text{.}\) Thus if \(h(x)=h(y)\) we must have \(x=y\) and so \(h\) is injective.
Since the function is both injective and surjective, it is bijective as required.
To be more precise, the preimage of the set \(\set{\text{M}}\) is the set \(\set{\text{Monday}}\text{.}\)
Again, being more precise, the preimage of the set \(\set{\text{S}}\) is the set \(\set{\text{Saturday}, \text{Sunday}}\text{.}\)
This is perhaps no so hard to see when everything is finite, but harder to see when things are infinite. Indeed, we’ll see examples of just how weird this can be when we get to cardinality later in the text.
Of course that doesn’t mean the speaker knows what they are talking about, just that they sound like it. This is a variation of the “argument from authority” fallacy. Mind you there is the equally, or perhaps even more pernicious, fallacy that because a statement comes from an authority it should be mistrusted. Perhaps an example of an “appeal to common folk” fallacy. “Experts — bah! What would they know?!”
Well, we should really take \(x\) over complex numbers, but the interested reader should search-engine their way to more on this topic. They are named for August Möbius who was 19th century German mathematician and astronomer. He is perhaps best known for discovering the Möbius strip which is an two-dimensional surface with only 1 side. The Möbius strip was actually discovered slightly earlier by Johann Listing; perhaps “Listing strip” doesn’t have quite the same ring to it (sorry for the poor pun).