When students are first taught (as undergraduate math majors or as graduate students) about the Central Limit Theorem (CLT), they are often in awe of how all-encompassing this remarkable result is.
They have up to this point been introduced to the concepts of discrete and continuous random variables, distribution functions, independence and conditionality, expectations, convergence in probability and the weak Law of Large Numbers, among other topics.
More often than not they become acquainted with the binomial distribution and apply it to finding probabilities of outcomes associated with coin-tossing experiments. For a large number of trials (a computation that today’s powerful math software makes trivial), the instructor will introduce Stirling’s Theorem, which for our purposes states that
$$n! \sim \sqrt{2\pi n}\, n^n e^{-n} \quad \text{as } n \to \infty,$$
and use it to prove the de Moivre-Laplace approximation to the binomial distribution: If $S_n$ is a binomial random variable with parameters $n$ and $p$, then for positive integer $k$
$$P(S_n = k) \approx \frac{1}{\sqrt{2\pi np(1-p)}} \exp\!\left(-\frac{(k - np)^2}{2np(1-p)}\right).$$
What this says is that for large $n$, the number of successes in $n$ binomial trials with constant probability $p$ of success can be approximated using a normal distribution.
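A quick numerical check makes the approximation concrete. The sketch below assumes the local form of the approximation, that is, a normal density with mean $np$ and variance $np(1-p)$ evaluated at the integer $k$. The function names and the values of `n` and `p` are illustrative choices, not taken from the text.

```python
import math

def binomial_pmf(n: int, p: float, k: int) -> float:
    """Exact probability of k successes in n independent trials (0 < p < 1).

    Computed through logarithms so that large n does not overflow.
    """
    log_coeff = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return math.exp(log_coeff + k * math.log(p) + (n - k) * math.log(1 - p))

def normal_approx(n: int, p: float, k: int) -> float:
    """Normal density with mean n*p and variance n*p*(1-p), evaluated at k."""
    var = n * p * (1 - p)
    return math.exp(-((k - n * p) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

if __name__ == "__main__":
    n, p = 1000, 0.5  # illustrative values: 1000 tosses of a fair coin
    for k in (480, 500, 520):
        print(k, binomial_pmf(n, p, k), normal_approx(n, p, k))
```

For a thousand tosses of a fair coin the two columns should agree closely near the mean; the agreement degrades in the far tails, where the approximation is not meant to be used.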
I have heard students say, “That is really cool,” which of course pleases me greatly. By now the students are ready to be introduced to the CLT. We present the classical version:
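Suppose that $X_1, X_2, \ldots$ is a sequence of independent and identically distributed random variables with finite mean and finite, nonzero variance. Let $S_n = X_1 + X_2 + \cdots + X_n$, and let $\mu_n$ and $\sigma_n^2$ be the mean and variance, respectively, of $S_n$. Then with $Z_n = (S_n - \mu_n)/\sigma_n$,
$$\lim_{n \to \infty} P(Z_n \le x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-u^2/2}\, du$$
for any real $x$.
That is, $Z_n$ converges in distribution to a normal random variable with mean 0 and variance 1, often designated simply as $N(0,1)$.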
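The theorem is not tied to the binomial case. As a rough illustration, the simulation sketch below standardizes sums of exponentially distributed variables with mean 1 and variance 1 and compares the empirical distribution of the standardized sum with the standard normal distribution function. The choice of summands, the sample sizes, and the function names are all assumptions made for this example.

```python
import math
import random

def standardized_sum(n: int, rng: random.Random) -> float:
    """Z_n for n i.i.d. Exponential(1) variables, each with mean 1 and variance 1."""
    s = sum(rng.expovariate(1.0) for _ in range(n))
    return (s - n) / math.sqrt(n)  # the sum has mean n and variance n

def empirical_cdf_at(x: float, n: int, trials: int, seed: int = 0) -> float:
    """Fraction of simulated standardized sums that are <= x."""
    rng = random.Random(seed)
    hits = sum(standardized_sum(n, rng) <= x for _ in range(trials))
    return hits / trials

def std_normal_cdf(x: float) -> float:
    """Distribution function of N(0, 1), written with the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

if __name__ == "__main__":
    for x in (-1.0, 0.0, 1.0):
        print(x, empirical_cdf_at(x, n=100, trials=4000), std_normal_cdf(x))
```

The individual summands are strongly skewed, yet the standardized sum should already track the normal distribution function reasonably well for moderate `n`, up to sampling noise.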
The instructor states that the CLT greatly generalizes the de Moivre-Laplace result: it too approximates the binomial distribution by the normal law, but it does the same for any number of other distributions as long as its conditions are satisfied. In fact, social scientists and other researchers routinely analyze their data using the normal law as the vehicle for estimation and for testing hypotheses. (Unfortunately, it is somewhat cavalier to approach such problems by invoking the normal law as if it were a universal truth. But of course, that is another story.)
Unfortunately, in most courses, at least at the elementary or intermediate level, the story ends here, when it should not. The very brief formulation described above is typical of what is covered in an introductory probability course. In fact, the normal law is deeply connected to “sums of independent quantities.” And very early work on stochastically independent functions, which generalizes the idea of sums of independent quantities, frees our thinking from the constraints of games of chance, for example.
Nowhere is this more elegantly discussed for the non-expert than in a wonderful autobiography by Mark Kac entitled Enigmas of Chance, part of the Sloan Foundation series of books by or about prominent scientists. In Enigmas of Chance, Kac gives two examples which illustrate this point vividly and accessibly.
I will only briefly sketch Kac’s first example, which will amply make the point. The more curious reader should turn to Enigmas of Chance as a starting point.
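To start, recall that every real number $t$ with $0 < t \le 1$ has a unique non-terminating decimal expansion. For example,
$$\tfrac{1}{2} = 0.4999\ldots$$
Or in general, for any such $t$ there exists a unique sequence of digits $\varepsilon_1, \varepsilon_2, \ldots$ (where for any $k$, $\varepsilon_k$ can only assume the values $0, 1, \ldots, 9$). Thus
$$t = \frac{\varepsilon_1}{10} + \frac{\varepsilon_2}{10^2} + \frac{\varepsilon_3}{10^3} + \cdots$$
Of course there is nothing sacred about base 10; we can, for example, use base 2. In this case the $\varepsilon_k$’s from above can only assume the values 0 or 1, in which case
$$t = \frac{\varepsilon_1}{2} + \frac{\varepsilon_2}{2^2} + \frac{\varepsilon_3}{2^3} + \cdots$$
so that, the expansion being unique, each digit is a function of $t$: $\varepsilon_1 = \varepsilon_1(t)$, $\varepsilon_2 = \varepsilon_2(t)$, etc.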
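Consider now those numbers $t$ whose first binary digit is zero, that is, for which $\varepsilon_1(t) = 0$.
Such $t$ can be arbitrarily close to $0$ and, recalling the sum of a geometric series, the largest is
$$\frac{0}{2} + \frac{1}{2^2} + \frac{1}{2^3} + \cdots = \frac{1}{2},$$
so $0$ and $\tfrac{1}{2}$ bound the interval $(0, \tfrac{1}{2}]$, which has length $\tfrac{1}{2}$.
Denote this as $\mu\{\varepsilon_1 = 0\} = \tfrac{1}{2}$, where $\mu$ stands for total length.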
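Now we can readily see that a zero in the second binary digit, $\varepsilon_2(t) = 0$, can occur two ways: either $\varepsilon_1(t) = 0$ and $\varepsilon_2(t) = 0$, or $\varepsilon_1(t) = 1$ and $\varepsilon_2(t) = 0$.
Reasoning as above for each respective possibility, we obtain two intervals, $(0, \tfrac{1}{4}]$ and $(\tfrac{1}{2}, \tfrac{3}{4}]$. By summing these lengths we arrive at the total length $\mu\{\varepsilon_2 = 0\} = \tfrac{1}{4} + \tfrac{1}{4} = \tfrac{1}{2}$.
Similar reasoning yields $\mu\{\varepsilon_2 = 1\} = \tfrac{1}{2}$ and $\mu\{\varepsilon_1 = 0, \varepsilon_2 = 0\} = \tfrac{1}{4} = \mu\{\varepsilon_1 = 0\}\, \mu\{\varepsilon_2 = 0\}$.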
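The interval bookkeeping above can be checked mechanically with exact fractions. The sketch below assumes the convention that the set of $t$ in $(0, 1]$ whose first $n$ binary digits are prescribed is a half-open interval of length $2^{-n}$. The helper names `digit_interval` and `measure` are hypothetical, introduced only for this illustration.

```python
from fractions import Fraction
from itertools import product

def digit_interval(digits):
    """Endpoints a, b of the interval (a, b] of t whose first binary digits are `digits`."""
    n = len(digits)
    a = sum(Fraction(d, 2 ** (k + 1)) for k, d in enumerate(digits))
    return a, a + Fraction(1, 2 ** n)

def measure(condition, n):
    """Total length of the t in (0, 1] whose first n binary digits satisfy `condition`."""
    total = Fraction(0)
    for digits in product((0, 1), repeat=n):
        if condition(digits):
            a, b = digit_interval(digits)
            total += b - a  # the intervals for distinct digit patterns are disjoint
    return total

if __name__ == "__main__":
    second_is_zero = measure(lambda d: d[1] == 0, 2)
    first_is_zero = measure(lambda d: d[0] == 0, 2)
    both_zero = measure(lambda d: d == (0, 0), 2)
    print(second_is_zero, first_is_zero, both_zero)
    print(both_zero == first_is_zero * second_is_zero)  # the product rule for lengths
```

Passing a different `condition` and a larger `n` checks the same product rule for any finite set of digits, which is the sense of independence used in the rest of the argument.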
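And so the reasoning continues, which shows that with the binary variable $X_k$ replaced by the digit $\varepsilon_k(t)$ and the probability $P$ replaced by the length $\mu$, it follows that with $S_n(t) = \varepsilon_1(t) + \cdots + \varepsilon_n(t)$, which has mean $n/2$ and variance $n/4$, that for any real $x$
$$\lim_{n \to \infty} \mu\left\{ t : \frac{2S_n(t) - n}{\sqrt{n}} \le x \right\} = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-u^2/2}\, du.$$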
Notice what has been achieved: by the simple arithmetic demonstration that the binary digits $\varepsilon_1(t), \varepsilon_2(t), \ldots$ are indeed independent in the sense that $\mu\{\varepsilon_1 = e_1, \ldots, \varepsilon_n = e_n\} = \mu\{\varepsilon_1 = e_1\} \cdots \mu\{\varepsilon_n = e_n\} = 2^{-n}$ for every choice of $e_1, \ldots, e_n$ in $\{0, 1\}$, we can simply apply the CLT to demonstrate convergence to $N(0,1)$ without invoking games of chance like coin tossing or underlying probability distributions assumed for random variables. The normal law can apply as well under conditions that have nothing to do with what the student has thus far encountered in a standard course in probability but, as Kac illustrates, “could be part of everyday mathematics.”
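Because the statement is purely about lengths of sets of real numbers, it can be evaluated exactly for any finite number of digits. The sketch below assumes the standardization $(2S_n(t) - n)/\sqrt{n}$, where $S_n(t)$ is the sum of the first $n$ binary digits of $t$ (mean $n/2$, variance $n/4$), and uses the fact that every pattern of $n$ digits occupies an interval of length $2^{-n}$. The function names are illustrative.

```python
import math
from fractions import Fraction

def length_of_set(x: float, n: int) -> Fraction:
    """Exact length of {t in (0, 1] : (2*S_n(t) - n)/sqrt(n) <= x}.

    S_n(t) is the sum of the first n binary digits of t. There are C(n, j)
    digit patterns with sum j, and each pattern is an interval of length 2**-n.
    """
    bound = x * math.sqrt(n)
    patterns = sum(math.comb(n, j) for j in range(n + 1) if 2 * j - n <= bound)
    return Fraction(patterns, 2 ** n)

def std_normal_cdf(x: float) -> float:
    """Distribution function of N(0, 1), written with the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

if __name__ == "__main__":
    for n in (16, 400, 1600):
        print(n, float(length_of_set(1.0, n)), std_normal_cdf(1.0))
```

No random numbers appear anywhere in this computation: the quantity being compared with the normal distribution function is a sum of interval lengths, which is the point of the arithmetic formulation.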
As Kac points out, this kind of thinking was introduced by Hugo Steinhaus in a 1923 paper dealing with the arithmetization of probability theory and resulted in bringing the “normal law closer to the mainstream of mathematics.” It is a useful and important piece of the history of mathematics, and also a worthy illustration of how ubiquitous the normal law is.