The Hardy–Weinberg principle (HWP) (also Hardy–Weinberg equilibrium (HWE), or Hardy–Weinberg law) states that, under certain conditions, after one generation of random mating, the genotype frequencies at a single gene locus will become fixed at a particular equilibrium value. It also specifies that those equilibrium frequencies can be represented as a simple function of the allele frequencies at that locus.
In the simplest case of a single locus with two alleles A and a with allele frequencies of p and q, respectively (so that p + q = 1), the HWP predicts the genotype frequencies to be p^2 for the AA homozygote, 2pq for the Aa heterozygote and q^2 for the aa homozygote. The Hardy–Weinberg principle expresses the notion of a population in "genetic equilibrium" and is a basic principle of population genetics.
The original assumptions for Hardy–Weinberg equilibrium (HWE) were that the population under consideration is idealised, that is:
When the Hardy–Weinberg assumptions are not met, genotype frequencies can deviate from expectation, but depending on which assumption is violated, such deviations may or may not be statistically detectable. Deviations can be caused by the Wahlund effect, inbreeding, assortative mating, selection, or genetic drift. Assortative mating changes the genotype frequencies only of those genes that influence mate choice. Genetic drift is strongest in small populations. Deviations caused by selection, however, often require a large selection coefficient in order to be detected, which is why the test for deviations from Hardy–Weinberg proportions is considered a weak test for selection.
A more statistical description of the HWP is that the alleles for the next generation for any given individual are chosen independently. Consider two alleles, A and a, with frequencies p and q, respectively, in the population. The different ways to form new genotypes can then be derived using a Punnett square, in which the area of each cell is proportional to the fraction of each genotype in the next generation:
|Males \ Females||A (p)||a (q)|
|A (p)||AA (p^2)||Aa (pq)|
|a (q)||aA (qp)||aa (q^2)|
So, if the alleles are drawn independently, the three possible genotype frequencies in the offspring become:

f(AA) = p^2
f(Aa) = 2pq
f(aa) = q^2
These proportions are normally reached in one generation, except when a population is created by bringing together males and females with different allele frequencies, in which case equilibrium is reached in two generations.
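The one-generation and two-generation cases can be checked numerically. The following is a minimal sketch for an autosomal locus, assuming random union of gametes; the function names and the founder frequencies 0.8 and 0.4 are illustrative choices, not values from the text.

```python
def hardy_weinberg(p):
    """Return (f_AA, f_Aa, f_aa) expected for allele frequency p of A."""
    q = 1.0 - p
    return (p * p, 2.0 * p * q, q * q)


def next_generation(p_male, p_female):
    """Offspring genotype frequencies when male and female gametes carry A
    with frequencies p_male and p_female (random union of gametes)."""
    f_AA = p_male * p_female
    f_Aa = p_male * (1.0 - p_female) + (1.0 - p_male) * p_female
    f_aa = (1.0 - p_male) * (1.0 - p_female)
    return (f_AA, f_Aa, f_aa)


def allele_freq(genotypes):
    """Frequency of A given (f_AA, f_Aa, f_aa)."""
    f_AA, f_Aa, _ = genotypes
    return f_AA + 0.5 * f_Aa


# Hypothetical founders: the sexes differ in allele frequency.
gen1 = next_generation(0.8, 0.4)
p1 = allele_freq(gen1)           # shared by both sexes from generation 1 on
gen2 = next_generation(p1, p1)

print("generation 1:", gen1, "HW expectation:", hardy_weinberg(p1))
print("generation 2:", gen2, "HW expectation:", hardy_weinberg(p1))
```

In this sketch the first generation has an excess of heterozygotes relative to the expectation for its own allele frequency, and the second generation matches the expectation.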
Where the A gene is sex-linked, the heterogametic sex (e.g. males in humans) has only one copy of the gene and is effectively haploid for that gene. The genotype frequencies at equilibrium are therefore p and q for the heterogametic sex but p^2, 2pq and q^2 for the homogametic sex.
For example, in humans red-green colour blindness is caused by an X-linked recessive allele. It affects about 1 in 12 males (0.083), whereas it affects about 1 in 250 women (0.004).
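As a rough check of the sex-linked expectation, the male frequency can be taken as the allele frequency q and squared to give the expected frequency in females. This sketch assumes a single X-linked locus, which is a simplification, and the variable names are illustrative.

```python
# Frequency of affected males quoted in the text; for an X-linked recessive
# allele this equals the allele frequency q.
q_male = 1 / 12

# The homogametic sex needs two copies to show the recessive phenotype.
expected_female = q_male ** 2

print(round(q_male, 3), round(expected_female, 4))
```

Under this simplification the expected female frequency is q^2 = 1/144, the same order of magnitude as the quoted figure of about 1 in 250.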
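If a population is brought together with males and females with different allele frequencies, then the allele frequency in males in each generation equals that of the females in the previous generation, because each male receives his X chromosome from his mother. Females receive one X chromosome from each parent, so their allele frequency is the average of the two parental frequencies. The difference between the sexes therefore halves each generation, and the population is close to equilibrium within about six generations.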
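The approach to equilibrium for an X-linked locus can be sketched with a simple recurrence: sons inherit their X chromosome from their mothers, and daughters receive one X from each parent. The function name and starting frequencies below are assumptions chosen for illustration.

```python
def x_linked_trajectory(p_male, p_female, generations):
    """Track the frequency of an X-linked allele in each sex over time."""
    history = [(p_male, p_female)]
    for _ in range(generations):
        # Males take the previous female frequency; females take the average
        # of the previous male and female frequencies.
        p_male, p_female = p_female, 0.5 * (p_male + p_female)
        history.append((p_male, p_female))
    return history


# Extreme hypothetical start: allele fixed in males, absent in females.
traj = x_linked_trajectory(1.0, 0.0, 6)
for generation, (pm, pf) in enumerate(traj):
    print(generation, pm, pf, abs(pm - pf))
```

Because females carry two of every three X chromosomes, the frequencies settle at (p_male + 2 p_female) / 3 of the starting values, which is 1/3 in this example. The gap between the sexes halves each generation and is 1/64 after six generations.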
The Hardy–Weinberg principle may be generalized to more than two alleles. Consider an extra allele frequency, r. The two-allele case is the binomial expansion of (p + q)^2, and thus the three-allele case is:
(p + q + r)^2 = p^2 + q^2 + r^2 + 2pq + 2pr + 2qr
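More generally, consider the alleles A_1, ..., A_n with allele frequencies p_1 to p_n. The genotype frequencies are given by the expansion of

(p_1 + ... + p_n)^2

giving for all homozygotes:

f(A_i A_i) = p_i^2

and for all heterozygotes (i ≠ j):

f(A_i A_j) = 2 p_i p_j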
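For any number of alleles, each homozygote has frequency p_i^2 and each heterozygote 2 p_i p_j. A minimal sketch follows, with a hypothetical helper name and illustrative allele frequencies.

```python
from itertools import combinations_with_replacement


def genotype_frequencies(allele_freqs):
    """Map each unordered genotype (i, j), i <= j, to its expected frequency."""
    freqs = {}
    indices = range(len(allele_freqs))
    for i, j in combinations_with_replacement(indices, 2):
        p_i, p_j = allele_freqs[i], allele_freqs[j]
        # Homozygotes arise one way, heterozygotes two ways (ij and ji).
        freqs[(i, j)] = p_i * p_i if i == j else 2.0 * p_i * p_j
    return freqs


# Three alleles with illustrative frequencies p, q, r.
freqs = genotype_frequencies([0.5, 0.3, 0.2])
for genotype, f in freqs.items():
    print(genotype, round(f, 4))
print("total:", round(sum(freqs.values()), 10))
```

The frequencies sum to 1 because they are the terms of the squared sum of the allele frequencies.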
The Hardy–Weinberg principle may also be generalized to polyploid systems, that is, to populations which have more than two copies of each chromosome. Consider again only two alleles. The diploid case is the binomial expansion of:

(p + q)^2

and therefore the polyploid case is the binomial expansion of:

(p + q)^c

where c is the ploidy, for example with tetraploid (c = 4):

f(AAAA) = p^4
f(AAAa) = 4 p^3 q
f(AAaa) = 6 p^2 q^2
f(Aaaa) = 4 p q^3
f(aaaa) = q^4

The completely generalized formula, for n alleles and ploidy c, is the multinomial expansion of:

(p_1 + ... + p_n)^c
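For two alleles, the polyploid genotype frequencies are the terms of the binomial expansion of (p + q)^c. A short sketch, in which the function name is an illustrative choice:

```python
from math import comb


def polyploid_frequencies(p, ploidy):
    """Expected frequency of genotypes carrying k copies of allele A,
    for k = ploidy down to 0, at a two-allele locus."""
    q = 1.0 - p
    return {k: comb(ploidy, k) * p**k * q ** (ploidy - k)
            for k in range(ploidy, -1, -1)}


tetraploid = polyploid_frequencies(0.5, 4)   # AAAA, AAAa, AAaa, Aaaa, aaaa
diploid = polyploid_frequencies(0.5, 2)      # AA, Aa, aa
print(tetraploid)
print(diploid)
```

With ploidy 2 this reduces to the familiar p^2, 2pq and q^2.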
The Hardy–Weinberg principle may be applied in two ways: either a population is assumed to be in Hardy–Weinberg proportions, in which case the genotype frequencies can be calculated, or, if the frequencies of all three genotypes are known, they can be tested for statistically significant deviations.
Suppose that the phenotypes of AA and Aa are indistinguishable, i.e. that there is complete dominance. Assuming that the Hardy–Weinberg principle applies to the population, q can still be calculated from f(aa):

q = sqrt(f(aa))

and p can be calculated from q as p = 1 - q. Estimates of f(AA) and f(Aa) then follow from p^2 and 2pq respectively. Note, however, that such a population cannot be tested for equilibrium using the significance tests below, because equilibrium is assumed a priori.
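The calculation under complete dominance can be sketched as follows. The function name and the recessive phenotype frequency of 1 in 10,000 are hypothetical, and the estimate is only meaningful if Hardy–Weinberg proportions are assumed to hold.

```python
from math import sqrt


def estimate_from_recessive(f_aa):
    """Estimate allele and genotype frequencies from the frequency of the
    recessive phenotype, assuming Hardy-Weinberg proportions."""
    q = sqrt(f_aa)
    p = 1.0 - q
    return {"p": p, "q": q, "f_AA": p * p, "f_Aa": 2.0 * p * q}


# Hypothetical recessive phenotype seen in 1 of 10,000 individuals.
est = estimate_from_recessive(0.0001)
print(est)
```

In this example q is 0.01, so heterozygous carriers (about 2%) are far more common than affected homozygotes.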
Testing deviation from the HWP is generally performed using Pearson's chi-squared test, using the observed genotype frequencies obtained from the data and the expected genotype frequencies obtained using the HWP. For systems with large numbers of alleles, this may result in data with many empty genotype classes and low genotype counts, because there are often not enough individuals in the sample to adequately represent all genotype classes. In that case the asymptotic assumption of the chi-squared distribution no longer holds, and it may be necessary to use a form of Fisher's exact test, which requires a computer to solve.
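One common form of exact test for a two-allele locus conditions on the observed allele counts and sums the probabilities of all heterozygote counts that are no more likely than the one observed. The sketch below is one way to write such a test and is not taken from the source; the function names are illustrative, and log-factorials are used to avoid overflow with large samples.

```python
from math import exp, lgamma, log


def _log_factorial(n):
    return lgamma(n + 1)


def het_probability(n_het, n_A, n):
    """P(n_het heterozygotes | n individuals, n_A copies of A) under HWE."""
    n_a = 2 * n - n_A
    n_AA = (n_A - n_het) // 2
    n_aa = (n_a - n_het) // 2
    log_p = (_log_factorial(n) - _log_factorial(n_AA)
             - _log_factorial(n_het) - _log_factorial(n_aa)
             + n_het * log(2.0)
             + _log_factorial(n_A) + _log_factorial(n_a)
             - _log_factorial(2 * n))
    return exp(log_p)


def hwe_exact_test(n_AA, n_Aa, n_aa):
    """Two-sided exact p-value for deviation from HW proportions."""
    n = n_AA + n_Aa + n_aa
    n_A = 2 * n_AA + n_Aa
    n_a = 2 * n - n_A
    # Heterozygote counts share the parity of n_A and cannot exceed
    # the count of the rarer allele.
    candidates = range(n_A % 2, min(n_A, n_a) + 1, 2)
    probs = {h: het_probability(h, n_A, n) for h in candidates}
    observed = probs[n_Aa]
    # Small tolerance so ties with the observed probability are included.
    p_value = sum(pr for pr in probs.values() if pr <= observed * (1 + 1e-9))
    return min(p_value, 1.0)


# Tiny illustrative sample: two individuals, one AA and one aa.
print(hwe_exact_test(1, 0, 1))
```

For the tiny sample shown, only two outcomes are possible (no heterozygotes, or two), with probabilities 1/3 and 2/3, so the p-value is 1/3.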
These data are from E.B. Ford (1971) on the Scarlet tiger moth, for which the phenotypes of a sample of the population were recorded. Genotype-phenotype distinction is assumed to be negligibly small. The null hypothesis is that the population is in Hardy–Weinberg proportions, and the alternative hypothesis is that the population is not in Hardy–Weinberg proportions.
|Genotype||White-spotted (AA)||Intermediate (Aa)||Little spotting (aa)||Total|
|Number||1469||138||5||1612|
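From these counts, the allele frequencies can be calculated:

p = (2 × obs(AA) + obs(Aa)) / (2 × (obs(AA) + obs(Aa) + obs(aa)))
  = (2 × 1469 + 138) / (2 × 1612)
  = 3076 / 3224
  = 0.954

q = 1 - p
  = 1 - 0.954
  = 0.046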
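So the Hardy–Weinberg expectation, using the unrounded p = 3076/3224 and a sample size of n = 1612, is:

Exp(AA) = p^2 n ≈ 1467.4
Exp(Aa) = 2pqn ≈ 141.2
Exp(aa) = q^2 n ≈ 3.4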
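Pearson's chi-squared statistic, evaluated with the unrounded expected values, is:

χ^2 = Σ (O - E)^2 / E
    = (1469 - 1467.4)^2 / 1467.4 + (138 - 141.2)^2 / 141.2 + (5 - 3.4)^2 / 3.4
    = 0.002 + 0.073 + 0.756
    = 0.83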
There is 1 degree of freedom. Degrees of freedom for χ^2 tests are normally n - 1, where n is the number of genotype classes; however, an extra degree of freedom is lost because the expected values were calculated from the observed allele frequency, giving 3 - 1 - 1 = 1. The 5% significance level for 1 degree of freedom is 3.84, and since the χ^2 value of about 0.83 is less than this, the null hypothesis that the population is in Hardy–Weinberg equilibrium is not rejected.
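The whole test can be reproduced in a few lines. This sketch uses the genotype counts 1469, 138 and 5 (total 1612) reported for Ford's sample; treat them as an assumption if your copy of the table differs. The function name is illustrative, and the critical value 3.84 is the one quoted in the text.

```python
def hwe_chi_square(n_AA, n_Aa, n_aa):
    """Return (p, expected counts, chi-squared statistic) for a test of
    Hardy-Weinberg proportions at a two-allele locus."""
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)      # allele frequency of A
    q = 1.0 - p
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_AA, n_Aa, n_aa)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return p, expected, chi2


CRITICAL_5PCT_1DF = 3.84   # 5% critical value for 1 degree of freedom

p, expected, chi2 = hwe_chi_square(1469, 138, 5)
print(round(p, 3), [round(e, 1) for e in expected], round(chi2, 2))
print("reject HWE" if chi2 > CRITICAL_5PCT_1DF else "do not reject HWE")
```

Expected counts are computed from the unrounded allele frequency, which is why they can differ slightly from values obtained by plugging in 0.954 and 0.046.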
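The inbreeding coefficient F used in F-statistics is one minus the observed frequency of heterozygotes over that expected from Hardy–Weinberg equilibrium:

F = (E(f(Aa)) - O(f(Aa))) / E(f(Aa)) = 1 - O(f(Aa)) / E(f(Aa))

where the expected value from Hardy–Weinberg equilibrium is given by

E(f(Aa)) = 2pq

For example, for Ford's data above (the ratio of heterozygote counts equals the ratio of their frequencies):

F = 1 - 138 / 141.2 = 0.023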
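The same figure can be computed directly from genotype counts, taking F in its usual sense as one minus the ratio of observed to expected heterozygote frequency. The counts 1469, 138 and 5 are those reported for Ford's sample and should be treated as an assumption if your copy differs; the function name is illustrative.

```python
def inbreeding_coefficient(n_AA, n_Aa, n_aa):
    """F = 1 - observed heterozygosity / expected heterozygosity (2pq)."""
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)
    expected_het = 2 * p * (1 - p)
    observed_het = n_Aa / n
    return 1.0 - observed_het / expected_het


F = inbreeding_coefficient(1469, 138, 5)
print(round(F, 3))
```

A positive F indicates fewer heterozygotes than expected, and a negative F indicates an excess.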
Mendelian genetics was rediscovered in 1900. However, it remained somewhat controversial for several years, as it was not then known how it could account for continuous characters. Udny Yule (1902) argued against Mendelism because he thought that dominant alleles would increase in the population. The American William E. Castle (1903) showed that without selection, the genotype frequencies would remain stable. Karl Pearson (1903) found one equilibrium position with values of p = q = 0.5. Reginald Punnett, unable to counter Yule's point, introduced the problem to G. H. Hardy, a British mathematician, with whom he played cricket. Hardy was a pure mathematician and held applied mathematics in some contempt; his view of biologists' use of mathematics comes across in his 1908 paper, where he describes this as "very simple".
The principle was thus known as Hardy's law in the English-speaking world until Stern (1943) pointed out that it had first been formulated independently in 1908 by the German physician Wilhelm Weinberg (see Crow 1999).