CTGD Some Examples of Mathematical Induction

Section 2.1 Some Examples of Mathematical Induction

If you are unfamiliar with the Principle of Mathematical Induction, you should read Appendix B (a portion of which is repeated here).

Subsection 2.1.1 Mathematical induction

The principle of mathematical induction states that

In order to prove a statement about an integer \(n\text{,}\) if we can

Prove the statement when \(n=b\text{,}\) for some fixed integer \(b\)

Show that the truth of the statement for \(n=k-1\) implies the truth of the statement for \(n=k\) whenever \(k>b\text{,}\)

then we can conclude the statement is true for all integers \(n\ge b\text{.}\)

As an example, let us give yet another proof that a set with \(n\) elements has \(2^n\) subsets. This proof uses essentially the same bijections we used in proving the Pascal Equation. The statement we wish to prove is the statement that “A set of size \(n\) has \(2^n\) subsets.”

Our statement is true when \(n=0\text{,}\) because a set of size 0 is the empty set and the empty set has \(1=2^0\) subsets. (This step of our proof is called a base step.) Now suppose that \(k>0\) and every set with \(k-1\) elements has \(2^{k-1}\) subsets. Suppose \(S=\{a_1,a_2,\ldots a_k\}\) is a set with \(k\) elements. We partition the subsets of \(S\) into two blocks. Block \(B_1\) consists of the subsets that do not contain \(a_n\) and block \(B_2\) consists of the subsets that do contain \(a_n\text{.}\) Each set in \(B_1\) is a subset of \(\{a_1,a_2,\ldots a_{k-1}\}\text{,}\) and each subset of \(\{a_1,a_2, \ldots a_{k-1}\}\) is in \(B_1\text{.}\) Thus \(B_1\) is the set of all subsets of \(\{a_1,a_2,\ldots a_{k-1}\}\text{.}\) Therefore by our assumption in the first sentence of this paragraph, the size of \(B_1\) is \(2^{k-1}\text{.}\) Consider the function from \(B_2\) to \(B_1\) which takes a subset of \(S\) including \(a_k\) and removes \(a_k\) from it. This function is defined on \(B_2\text{,}\) because every set in \(B_2\) contains \(a_k\text{.}\) This function is onto, because if \(T\) is a set in \(B_1\text{,}\) then \(T\cup \{a_k\}\) is a set in \(B_2\) which the function sends to \(T\text{.}\) This function is one-to-one because if \(V\) and \(W\) are two different sets in \(B_2\text{,}\) then removing \(a_k\) from them gives two different sets in \(B_1\text{.}\) Thus we have a bijection between \(B_1\) and \(B_2\text{,}\) so \(B_1\) and \(B_2\) have the same size. Therefore by the sum principle the size of \(B_1\cup B_2\) is \(2^{k-1} +2^{k-1}=2^k\text{.}\) Therefore \(S\) has \(2^k\) subsets. This shows that if a set of size \(k-1\) has \(2^{k-1}\) subsets, then a set of size \(k\) has \(2^k\) subsets. Therefore by the principle of mathematical induction, a set of size \(n\) has \(2^n\) subsets for every nonnegative integer \(n\text{.}\)

The first sentence of the last paragraph is called the inductive hypothesis. In an inductive proof we always make an inductive hypothesis as part of proving that the truth of our statement when \(n=k-1\) implies the truth of our statement when \(n=k\text{.}\) The last paragraph itself is called the inductive step of our proof. In an inductive step we derive the statement for \(n=k\) from the statement for \(n=k-1\text{,}\) thus proving that the truth of our statement when \(n=k-1\) implies the truth of our statement when \(n=k\text{.}\) The last sentence in the last paragraph is called the inductive conclusion. All inductive proofs should have a base step, an inductive hypothesis, an inductive step, and an inductive conclusion.

There are a couple details worth noticing. First, in this problem, our base step was the case \(n=0\text{,}\) or in other words, we had \(b=0\text{.}\) However, in other proofs, \(b\) could be any integer, positive, negative, or 0. Second, our proof that the truth of our statement for \(n=k-1\) implies the truth of our statement for \(n=k\) required that \(k\) be at least 1, so that there would be an element \(a_k\) we could take away in order to describe our bijection. However, condition (2) of the principle of mathematical induction only requires that we be able to prove the implication for \(k>0\text{,}\) so we were allowed to assume \(k>0\text{.}\)

Subsubsection 2.1.1.1 Strong Mathematical Induction

One way of looking at the principle of mathematical induction is that it tells us that if we know the “first” case of a theorem and we can derive each other case of the theorem from a smaller case, then the theorem is true in all cases. However the particular way in which we stated the theorem is rather restrictive in that it requires us to derive each case from the immediately preceding case. This restriction is not necessary, and removing it leads us to a more general statement of the principal of mathematical induction which people often call the strong principle of mathematical induction. It states:

In order to prove a statement about an integer \(n\) if we can

prove our statement when \(n=b\) and

prove that the statements we get with \(n=b\text{,}\) \(n=b+1\text{,}\) … \(n=k-1\) imply the statement with \(n=k\text{,}\)

then our statement is true for all integers \(n\ge b\text{.}\)

You will find some explicit examples of the use of the strong principle of mathematical induction in Appendix B and will find some uses for it in this chapter.

Subsection 2.1.2 Binomial Coefficients and the Binomial Theorem

Problem 72

When we studied the Pascal Equation and subsets in Chapter 1, it may have appeared that there is no connection between the Pascal relation \(\binom{n}{k} = \binom{n-1}{k-1} +\binom{n-1}{k}\) and the formula \(\binom{n}{k}=\frac{n!}{k!(n-k)!}\text{.}\) Of course you probably realize you can prove the Pascal relation by substituting the values the formula gives you into the right-hand side of the equation and simplifying to give you the left hand side. In fact, from the Pascal Relation and the facts that \(\binom{n}{0}=1\) and \(\binom{n}{n}=1\text{,}\) you can actually prove the formula for \(\binom{n}{k}\) by induction on \(n\text{.}\) Do so.

Hint

We wish to prove that \(\binom{n}{i} =\frac{n!}{i!(n-i)!}\text{.}\) Mathematical induction allows us to assume that \(\binom{n-1}{j}=\frac{(n-1)!}{j! (n-1-j)!}\) for every \(j\)between 0 and \(n-1\text{.}\) How does this put us into a position to use the Pascal relation? What special cases will be left over?

Problem 73

Use the fact that \((x+y)^n = (x+y)(x+y)^{n-1}\) to give an inductive proof of the binomial theorem.

Hint

What sort of relationship do you know between values of the form \(\binom{n}{i}\) and values of the form \(\binom{n-1}{j}\text{?}\)

Problem 74

Suppose that \(f\) is a function defined on the nonnegative integers such that \(f(0)=3\) and \(f(n)=2f(n-1)\text{.}\) Find a formula for \(f(n)\) and prove your formula is correct.

Problem 75

Prove the conjecture in Part 13.b for an arbitrary positive integer \(m\) without appealing to the general product principle.

Hint

We did something rather similar in our example of the inductive proof that a set with \(n\) elements has \(2^n\) subsets. The work you did in a previous problem may be similar to part of what you need to do here.

Subsection 2.1.3 Inductive definition

You may have seen \(n!\) described by the two equations \(0!=1\) and \(n!=n(n-1)!\) for \(n>0\text{.}\) By the principle of mathematical induction we know that this pair of equations defines \(n!\) for all nonnegative numbers \(n\text{.}\) For this reason we call such a definition an inductive definition. An inductive definition is sometimes called a recursive definition. Often we can get very easy proofs of useful facts by using inductive definitions.

Problem 76

An inductive definition of \(a^n\) for nonnegative \(n\) is given by \(a^0=1\) and \(a^n=aa^{n-1}\text{.}\) (Notice the similarity to the inductive definition of \(n!\text{.}\)) We remarked above that inductive definitions often give us easy proofs of useful facts. Here we apply this inductive definition to prove two useful facts about exponents that you have been using almost since you learned the meaning of exponents.

(a)

Use this definition to prove the rule of exponents \(a^{m+n}=a^ma^n\) for nonnegative \(m\) and \(n\text{.}\)

Hint

This may look difficult because one can't decide in advance on whether to try to induct on \(m\text{,}\) on \(n\text{,}\) or on their sum. In some sense, it doesn't matter which you choose to induct on, though inducting on the sum would look more complicated. For most people inducting on \(n\) fits their way of working with exponents best.

(b)

Use this definition to prove the rule of exponents \(a^{mn} = (a^m)^n\text{.}\)

Hint

Here it matters whether you choose to induct on \(m\) or \(n\text{.}\) However, it matters only in the sense that you need to use more tools in one case. In one case, you are likely to need the rule \((cd)^n=c^n d^n\)(, which we haven't proved. (However, you might be able to prove that by induction!) In either case, you may find part (a) handy.

Problem 77

Suppose that \(f\) is a function on the nonnegative integers such that \(f(0)=0\) and \(f(n) = n+f(n-1)\text{.}\) Prove that \(f(n) = n(n+1)/2\text{.}\) Notice that this gives a third proof that \(1+2+\cdots+n=n(n+1)/2\text{,}\) because this sum satisfies the two conditions for \(f\text{.}\) (The sum has no terms and is thus 0 when \(n=0\text{.}\))

Problem 78

Give an inductive definition of the summation notation \(\sum_{i=1}^n a_i\text{.}\) Use it and the distributive law \(b(a+c) = ba+bc\) to prove the distributive law

\begin{equation*} b\sum_{i=1}^n a_i = \sum_{i=1}^n ba_i. \end{equation*}

Subsection 2.1.4 Proving the general product principle (Optional)

We stated the sum principle as

If we have a partition of a set \(S\text{,}\) then the size of \(S\) is the sum of the sizes of the blocks of the partition.

In fact, the simplest form of the sum principle says that the size of the sum of two disjoint (finite) sets is the sum of their sizes.

Problem 79

Prove the sum principle we stated for partitions of a set from the simplest form of the sum principle.

Hint

We didn't explicitly say to use induction here, but, especially in this context, induction is a natural tool to try here. But we don't have a variable \(n\) to induct on. That means you have to choose one. So what do you think is most useful. The number of blocks in the partition? The size of the first block of the partition? The size of the set we are partitioning? Or something else?

We stated the simplest form of the product principle as

If we have a partition of a set \(S\) into \(m\) blocks, each of size \(n\text{,}\) then \(S\) has size \(mn\text{.}\)

In Problem 14 we gave a more general form of the product principle which can be stated as

Let \(S\) be a set of functions \(f\) from \([n]\) to some set \(X\text{.}\) Suppose that

there are \(k_1\) choices for \(f(1)\text{,}\) and

suppose that for each choice of \(f(1)\text{,}\) \(f(2)\text{,}\) … \(f(i-1)\text{,}\) there are \(k_i\) choices for \(f(i)\text{.}\)

Then the number of functions in the set \(S\) is \(k_1k_2\cdots k_n\text{.}\)

Problem 80

Prove the general form of the product principle from the simplest form of the product principle.

Hint

Think about how you might have gone from the number of double decker cones to the number of triple decker cones in Problem 6.

Subsection 2.1.5 Double Induction and Ramsey Numbers

In Section 1.3.4 we gave two different descriptions of the Ramsey number \(R(m,n)\text{.}\) However if you look carefully, you will see that we never showed that Ramsey numbers actually exist; we merely described what they were and showed that \(R(3,3)\) and \(R(3,4)\) exist by computing them directly. As long as we can show that there is some number \(R\) such that when there are \(R\) people together, there are either \(m\) mutual acquaintances or \(n\) mutual strangers, this shows that the Ramsey Number \(R(m,n)\) exists, because it is the smallest such \(R\text{.}\) Mathematical induction allows us to show that one such \(R\) is \(\binom{m+n-2}{m-1}\text{.}\) The question is, what should we induct on, \(m\) or \(n\text{?}\) In other words, do we use the fact that with \(\binom{m+n-3}{m-2}\) people in a room there are at least \(m-1\) mutual acquaintances or \(n\) mutual strangers, or do we use the fact that with at least \(\binom{m+n-3}{n-2}\) people in a room there are at least \(m\) mutual acquaintances or at least \(n-1\) mutual strangers? It turns out that we use both. Thus we want to be able to simultaneously induct on \(m\) and \(n\text{.}\) One way to do that is to use yet another variation on the principle of mathematical induction, the Principle of Double Mathematical Induction. This principle (which can be derived from one of our earlier ones) states that

In order to prove a statement about integers \(m\) and \(n\text{,}\) if we can

Prove the statement when \(m=a\) and \(n=b\text{,}\) for fixed integers \(a\) and \(b\)

Prove the statement when \(m=a\) and \(n>b\) and when \(m>a\) and \(n=b\) (for the same fixed integers \(a\) and \(b\)),

Show that the truth of the statement for \(m=j\) and \(n=k-1\) (with \(j\ge a\) and \(k>b\)) and the truth of the statement for \(m=j-1\) and \(n=k\) (with \(j>a\) and \(k\ge b\)) imply the truth of the statement for \(m=j\) and \(n=k\text{,}\)

then we can conclude the statement is true for all pairs of integers \(m\ge a\) and \(n\ge b\text{.}\)

There is a strong version of double induction, and it is actually easier to state. The principle of strong double mathematical induction says the following.

In order to prove a statement about integers \(m\) and \(n\text{,}\) if we can

Prove the statement when \(m=a\) and \(n=b\text{,}\) for fixed integers \(a\) and \(b\text{.}\)

Show that the truth of the statemetn for values of \(m\) and \(n\) with \(a+b\leq m+n \lt k\) imples the truth of the statment for \(m+n=k\text{,}\)

then we can conclude that the statement is true for all pairs of integers \(m\geq a\) and \(n\geq b\text{.}\)

Problem 81

Prove that \(R(m,n)\) exists by proving that if there are \(\binom{m+n-2}{m-1}\) people in a room, then there are either at least \(m\) mutual acquaintances or at least \(n\) mutual strangers.

Hint 1

Perhaps the first thing one needs to ask is why proving that if there are \(\binom{m+n-2}{m-1}\)people in a room, then there are either at least \(m\) mutual acquaintances or at least \(n\) mutual strangers proves that \(R(m, n)\) exists. Can you see why this tells us that there is some number \(R\) of people such that if \(R\) people are in a room, then there are \(m\) mutual acquaintances or \(n\) mutual strangers? And why does that mean the Ramsey Number exists?

Hint 2

Naturally it should come as no surprise that you will use double induction, and you can use either form. As you think about how to use induction, the Pascal relation will come to mind. This suggests that you want to make assumptions involving \(\binom{m+n-3}{m-1}\) people in a room, or \(\binom{m+n-3}{m-2}\) people in a room. Now you have to figure out what these assumptions are and how they help you prove the result! Recall that we have made progress before by choosing one person and asking whether this person is acquainted with at least some number of people or unacquainted with at least some other number of people.

Problem 82

Prove that \(R(m,n)\le R(m-1,n) + R(m,n-1)\text{.}\)

Hint 1

One expects to need double induction again here. But only because of the location of the problem and because the sum looks like double induction. And those reasons aren't enough to mean you have to use double induction. If you had this result in hand already, then you could us it with double induction to give a second proof that Ramsey Numbers exist.

Hint 2

What you do need to show is that if there are \(R(m - 1, n) + R(m, n - 1)\) people in a room, then there are either \(m\) mutual acquaintances or \(n\) mutual strangers. As with earlier problems, it helps to start with a person and think about the number of people with whom this person is acquainted or nonacquainted. The generalized pigeonhole principle tells you something about these numbers.

Problem 83

(a)

What does the equation in Problem 82 tell us about \(R(4,4)\text{?}\)

(b)

Consider 17 people arranged in a circle such that each person is acquainted with the first, second, fourth, and eighth person to the right and the first, second, fourth, and eighth person to the left. can you find a set of four mutual acquaintances? Can you find a set of four mutual strangers?

Hint

If you could find four mutual acquaintances, you could assume person 1 is among them. And by the generalized pigeonhole principle and symmetry, so are two of the people to the first, second, fourth and eighth to the right. Now there are lots of possibilities for that fourth person. You now have the hard work of using symmetry and the definition of who is acquainted with whom to eliminate all possible combinations of four people. Then you have to think about nonacquaintances.

(c)

What is \(R(4,4)\text{?}\)

Problem 84

(Optional) Prove the inequality of Problem 81 by induction on \(m+n\text{.}\)

Problem 85

Use Stirling's approximation (Problem 46) to convert the upper bound for \(R(n,n)\) that you get from Problem 81 to a multiple of a power of an integer.

Subsection 2.1.6 A bit of asymptotic combinatorics

Problem 83 gives us an upper bound on \(R(n,n)\text{.}\) A very clever technique due to Paul Erdös, called the “probabilistic method,” will give a lower bound. Since both bounds are exponential in \(n\text{,}\) they show that \(R(n,n)\) grows exponentially as \(n\) gets large. An analysis of what happens to a function of \(n\) as \(n\) gets large is usually called an asymptotic analysis. The probabilistic method, at least in its simpler forms, can be expressed in terms of averages, so one does not need to know the language of probability in order to understand it. We will apply it to Ramsey numbers in the next problem. Combined with the result of Problem 83, this problem will give us that \(\sqrt{2}^{ n}\lt R(n,n)\lt 2^{2n-2}\text{,}\) so that we know that the Ramsey number \(R(n,n)\) grows exponentially with \(n\text{.}\)

Problem 86

Suppose we have two numbers \(n\) and \(m\text{.}\) We consider all possible ways to color the edges of the complete graph \(K_m\) with two colors, say red and blue. For each coloring, we look at each \(n\)-element subset \(N\) of the vertex set \(M\) of \(K_m\text{.}\) Then \(N\) together with the edges of of \(K_m\) connecting vertices in \(N\) forms a complete graph on \(n\) vertices. This graph, which we denote by \(K_N\text{,}\) has its edges colored by the original coloring of the edges of \(K_m\text{.}\)

(a)

Why is it that if there is no subset \(N\subseteq M\) so that all the edges of \(K_N\) are colored the same color, then \(R(n,n)>m\text{?}\)

Hint

What is the definition of \(R(n, n)\text{?}\)

(b)

To apply the probabilistic method, we are going to compute the average, over all colorings of \(K_m\text{,}\) of the number of sets \(N\subseteq M\) with \(|N|=n\) such that \(K_N\) does have all its edges the same color. Explain why it is that if the average is less than 1, then for some coloring there is no set \(N\) such that \(K_N\) has all its edges colored the same color. Why does this mean that \(R(n,n)>m\text{?}\)

Hint

If you average a bunch of numbers and each one is bigger than one, what can you say about the average?

(c)

We call a \(K_N\) monochromatic for a coloring \(c\) of \(K_m\) if the color \(c(e)\) assigned to edge \(e\) is the same for every edge \(e\) of \(K_N\text{.}\) Let us define \({ mono}(c,N)\) to be 1 if \(N\) is monochromatic for \(c\) and to be 0 otherwise. Find a formula for the average number of monochromatic \(K_N\)s over all colorings of \(K_m\) that involves a double sum first over all edge colorings \(c\) of \(K_m\) and then over all \(n\)-element subsets \(N\subseteq M\) of \({ mono}(c,N)\text{.}\)

Hint

Note that there are \(2^{\binom{n}{2}}\) graphs on a set of \(n\) vertices.

(d)

Show that your formula for the average reduces to \(2\binom{m}{n}\cdot2^{-\binom{n}{2}}\)

Hint

A notation for the sum over all colorings \(c\) of \(K_m\) is

\begin{equation*} \sum_{c:c\text{ is a coloring of } K_m}, \end{equation*}

and a notation for the sum over all subsets \(N\) of \(M\) that have size \(n\) is

\begin{equation*} \sum_{N:N\subseteq M,\text{ }|N|=n} \end{equation*}

(e)

Explain why \(R(n,n)>m\) if \(\binom{m}{n}\le 2^{\binom{n}{2} -1}\text{.}\)

Hint

If you interchange the order of summation so that you sum over subsets first and colorings second, you can take advantage of the fact that for a fixed subset \(N\) , you can count count the number of colorings in which it is monochromatic.

(f)

Explain why \(R(n,n)>\root n \of {n!2^{\binom{n}{2}-1}}\text{.}\)

Hint

You have an inequality involving \(m\) and \(n\) that tells you that \(R(n, n) > m\text{.}\) Suppose you could work with that inequality in order to show that if the inequality holds, then \(m\) is bigger than something. What could you conclude about \(R(n, n)\text{?}\)

(g)

By using Stirling's formula, show that if \(n\) is large enough, then \(R(n,n) > \sqrt{2^n} = \sqrt{2}^{n}\text{.}\) (Here large enough means large enough for Stiirling's formula to be reasonable accurate.)