Why are so many Harvard students first-borns?

The noted political philosopher Michael Sandel has taught the wildly popular course “Justice” at Harvard University for many years.  The course can be viewed online, and in Episode 8 there is an amazing moment when Professor Sandel asks his students to raise their hands if they are the first-born children in their families.  About 75 to 80 percent raise their hands.  (It’s rewarding to watch the whole lecture, but if you just want to locate this particular scene you can fast forward about 23 minutes.)  Professor Sandel used this data to infer that the academic effort a child exerts depends on birth order, with first-born children exhibiting high effort levels and thus being over-represented at Harvard.  But in the June 2012 issue of Significance (a bimonthly magazine published by the American Statistical Association and the Royal Statistical Society), Antony Millner and Raphael Calel pointed out that such reasoning is an instance of the classic cognitive illusion called base-rate neglect.  (You can read the preprint here.)

To get an idea of the base-rate effect, let’s perform a thought experiment.  Imagine that Professor Sandel is lecturing at a prestigious university in China, to a cohort of students born during the era of the one-child policy.  It would look conspicuous if someone did not raise a hand, because in principle every student would be the first and only child of the family!

Back at Harvard, it appears that Professor Sandel did not carefully distinguish the probability that a student is a first-born given that the student is at Harvard from the probability that a student is at Harvard given that the student is a first-born.  Mathematicians use the notation \Pr(\text{1st-born} | \text{Harvard}) and \Pr(\text{Harvard} | \text{1st-born}) for the two conditional probabilities, and they are not the same.  (We know that \Pr(\text{pregnant} | \text{woman}) and \Pr(\text{woman} | \text{pregnant}) are certainly different!)

Bayes’ rule establishes a (sometimes counter-intuitive) connection between the two conditional probabilities.  As explained in the technical note below, Professor Sandel’s observation that 75% to 80% of Harvard students are first-borns can be explained without recourse to any birth-order effect at all if the fertility rate of mothers with children at Harvard is 1.25 to 1.33.  It is well known that wealthy and well-educated parents tend to have fewer children, and this seems a more plausible explanation for the high number of first-born children at Harvard.

Bayes’ rule, in odds form, is simply O=Q \cdot R, where O denotes the posterior odds, Q denotes the prior odds, and R denotes the likelihood ratio (see the technical note below for a worked example).  As the psychologists Daniel Kahneman and the late Amos Tversky demonstrated, people (including mathematically sophisticated ones) tend to ignore the prior odds, or base rate, rather than integrate them with the likelihood ratio when estimating a probability.  For example, we know that people with a PhD are more likely to subscribe to The New York Times than people who ended their education after high school.  But when you see a person reading The New York Times on the subway, is he more likely to have a PhD than not?  Most people treat PhDs as representative of Times readers, and ignore the fact that there are far more non-PhDs than PhDs riding the New York subways.  Again, without carefully distinguishing \Pr(\text{NYT reader} | \text{PhD}) from \Pr(\text{PhD} | \text{NYT reader}), one is prone to make a mistake.
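To make the odds form concrete, here is a minimal Python sketch of O = Q \cdot R applied to the subway example.  The inputs are purely hypothetical (assume 3% of riders hold a PhD, and assume PhDs are 4 times as likely as other riders to be reading the Times), and the function name posterior_probability is just an illustration.

```python
def posterior_probability(prior: float, likelihood_ratio: float) -> float:
    """Bayes' rule in odds form: posterior odds O = prior odds Q * likelihood ratio R."""
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must be strictly between 0 and 1")
    prior_odds = prior / (1.0 - prior)               # Q
    posterior_odds = prior_odds * likelihood_ratio   # O = Q * R
    return posterior_odds / (1.0 + posterior_odds)   # convert odds back to a probability


# Hypothetical inputs, chosen only for illustration (not measured data).
phd_share_of_riders = 0.03   # assumed base rate Pr(PhD) among subway riders
times_reading_ratio = 4.0    # assumed Pr(NYT reader | PhD) / Pr(NYT reader | no PhD)

p_phd_given_times = posterior_probability(phd_share_of_riders, times_reading_ratio)
print(f"Pr(PhD | NYT reader) = {p_phd_given_times:.3f}")
```

Under these made-up inputs the posterior comes out at about 11%: seeing the newspaper raises the probability above the 3% base rate, but the reader is still far more likely not to have a PhD.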

So far, the mix-ups of \Pr(A | B) and \Pr(B | A) are rather harmless, but such muddled thinking can lead to serious consequences.  Many readers of this blog are probably familiar with the medical test problem.  Most people, even some physicians, cannot give an informed estimate of the probability that a person has a disease given a positive screening test.  As a result of confusing \Pr(\text{having cancer} | \text{+ mammogram}) with \Pr(\text{+ mammogram} | \text{having cancer}), many women have suffered needless anxiety over false-positive mammograms.  Because this topic has been discussed extensively, I will not say more, but refer readers to “Mammogram Math” by Professor John Allen Paulos.

I also want to mention stereotypes, which can be considered probabilistic predictions that distinguish the stereotyped group from others.  (This was proposed by Clark McCauley and Christopher L. Stitt in 1978.)  I think most of our students have been exposed to the idea that stereotypes of race, gender, ethnicity, and religion can be harmful when they are used to discriminate or oppress.  I would like to see educators take a more quantitative approach by helping their students understand the difference between, say, \Pr(\text{terrorist} | \text{Muslim}) and \Pr( \text{Muslim} | \text{terrorist}).  While many people have the impression that many terrorist incidents involved Muslims, they often neglect the base rate: the vast majority of both Muslims and non-Muslims never commit violence.
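For readers who have not seen the medical test problem worked through, the following Python sketch shows how far apart \Pr(\text{disease} | \text{+ test}) and \Pr(\text{+ test} | \text{disease}) can be.  The prevalence, sensitivity, and false-positive rate are assumed round numbers for illustration only, not figures for any real screening test, and the function name is hypothetical.

```python
def prob_disease_given_positive(prevalence: float,
                                sensitivity: float,
                                false_positive_rate: float) -> float:
    """Pr(disease | positive test), from Bayes' rule with the law of total probability."""
    true_positives = prevalence * sensitivity                    # Pr(disease and +)
    false_positives = (1.0 - prevalence) * false_positive_rate   # Pr(no disease and +)
    return true_positives / (true_positives + false_positives)


# Assumed, illustrative inputs (not data for any actual test).
prevalence = 0.01            # Pr(disease) in the screened population
sensitivity = 0.90           # Pr(+ test | disease)
false_positive_rate = 0.09   # Pr(+ test | no disease)

p_disease_given_pos = prob_disease_given_positive(
    prevalence, sensitivity, false_positive_rate
)
print(f"Pr(+ test | disease) = {sensitivity:.2f}")
print(f"Pr(disease | + test) = {p_disease_given_pos:.3f}")
```

With these assumed inputs, a test that catches 90% of true cases still yields a probability of disease of only about 9% after a positive result, because healthy people vastly outnumber sick ones.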

Through these examples of \Pr(A | B) and \Pr(B | A), I hope I have made the point that quantitative reasoning (QR), defined as the contextualized use of numbers and data in a manner that involves critical thinking skills, should be an essential component of our general education.  Furthermore, I hope you see that it takes not only mathematicians but also educators across various disciplines to develop students’ QR competency.  Currently, a group of CUNY faculty from several colleges, under the auspices of an NSF-funded project called NICHE, is working to infuse QR skills into their curricula; you can find more information here.

Technical Note

Bayes’ rule is often written as \Pr(A|B) = \Pr(A) \cdot \frac{\Pr(B|A)}{\Pr(B)}.  Dividing it by the corresponding expression for the complement A^{c} gives a more symmetric form, \frac{\Pr(A|B)}{\Pr(A^{c}|B)} = \frac{\Pr(A)}{\Pr(A^{c})} \cdot \frac{\Pr(B|A)}{\Pr(B|A^{c})}, or simply O = Q \cdot R as shown in the main text.  Applying this formula to Harvard students, we have \frac{\Pr(\text{1st-born}|\text{Harvard})}{\Pr(\text{not 1st-born}|\text{Harvard})} = \frac{\Pr(\text{1st-born})}{\Pr(\text{not 1st-born})} \cdot \frac{\Pr(\text{Harvard}|\text{1st-born})}{\Pr(\text{Harvard}|\text{not 1st-born})}.  The left-hand side of the equation is what Professor Sandel observed; it is the product of the base rate (the first factor on the right) and the birth-order effect (the second factor).  Because every family with children has exactly one first-born, \Pr(\text{1st-born}) is the number of such families divided by the number of children, that is, the reciprocal of the fertility rate, provided the fertility rate is taken as the average number of children per family that has at least one child.  If there is no birth-order effect, the second factor equals 1, so \Pr(\text{1st-born}|\text{Harvard}) = \Pr(\text{1st-born}).  If we use 0.80 or 0.75 for \Pr(\text{1st-born}), which corresponds to a fertility rate of 1/0.80=1.25 or 1/0.75 \approx 1.33, Professor Sandel’s 75% to 80% value for \Pr(\text{1st-born}|\text{Harvard}) is entirely explained.
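The claim that \Pr(\text{1st-born}) is the reciprocal of the fertility rate can be checked by direct counting.  The Python sketch below assumes that “fertility rate” means the average number of children per family with at least one child, and it uses a made-up population of 100 families (75 with one child, 25 with two); the function names are hypothetical.

```python
from fractions import Fraction


def first_born_share(family_sizes):
    """Fraction of all children who are first-borns (one first-born per family with children)."""
    families = [n for n in family_sizes if n > 0]  # childless households contribute no children
    return Fraction(len(families), sum(families))


def mean_family_size(family_sizes):
    """Average number of children per family that has at least one child."""
    families = [n for n in family_sizes if n > 0]
    return Fraction(sum(families), len(families))


# Made-up population, for illustration only: 75 one-child and 25 two-child families.
sizes = [1] * 75 + [2] * 25

share = first_born_share(sizes)      # 100 first-borns among 125 children
mean_size = mean_family_size(sizes)  # 125 children in 100 families

print(f"Pr(1st-born)     = {share} = {float(share):.2f}")
print(f"mean family size = {mean_size} = {float(mean_size):.2f}")
print(f"reciprocal check : {share == 1 / mean_size}")
```

In this toy population the mean family size is 1.25 and exactly 80% of children are first-borns, so with no birth-order effect (R = 1) an 80% show of hands is what the base rate alone predicts.  The identity depends on counting only families that have children; an average taken over all women, including those with none, can fall below 1, and its reciprocal is then no longer a probability.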

3 Responses to Why are so many Harvard students first-borns?

  1. ND says:

    “Pr(1st born) = 1/fertility rate” So that’s why 127% of children in Singapore are first-borns…

  2. So glad you blogged about this! I’m not the most numerate individual, but I know that math is important. I look forward to reading more analyses of interesting events like this.

  3. Pingback: Labor Day Round Up! : Footenotes
