Skip to contents

This is the key for PS 7. Before reading this document, you should have completed the problems. Use this key to check and correct your work BEFORE submitting the corrected version via the google form.

You should compare your responses with this key and make any changes in another font color. Be sure to explain why you got things wrong (showing that now you understand) as well as providing corrected responses.

Question Key

  1. In humans, the MN blood group locus consists of 2 alleles: the M allele and the N allele. These alleles encode antigens expressed on the surface of red blood cells; the two alleles are codominant so that a heterozygote (MN) has both the M-antigen and the N-antigen. In a study in Poland, 3100 people were genotyped with the following results:

    1126 people were MM

    1446 people were MN

    528 people were NN

    Calculate each of the following:
    1. the observed genotype frequencies

    2. the allele frequencies

    3. the expected number of individuals of each genotype (assuming random mating)

    4. carry out a Chi-square test for goodness of fit to random-mating proportions (with 1 degree of freedom). Is this population in Hardy-Weinberg Proportions for this locus? Explain.

    To get the observed genotype frequencies, we just divide the number of individuals of that genotype by the total number of individuals. Here’s the calcultion for MM:

    obs freq MM = 1126 / 3100 = 0.36

    To get the expected frequency of a genotype, we have to figure out the allele frequencies and use those to calculate the HW expected fraction. To get allele frequencies, we can use our observed genotype frequencies as follows:

    For the M allele, 100% of the alleles in MM individuals have the M allele and half of the alleles in MN individuals are M alleles so we get:

    p = freq M = 0.36*1 + 0.47*0.5 = 0.6

    Then we can calculate the frequency of the N allele by doing 1 - p OR we can do it the same way we did with M:

    q = freq N = 0.17*1 + 0.47*0.5 = 0.4

    Now, we use our allele frequencies to find out how many MM, MN, and NN individuals we’d expect, given how many of each type of allele we have. We expect to get MM as often as you expect to draw two M alleles by chance, or p x p = p2 = 0.6 x 0.6 = 0.36.

    In this example, our expected frequency and our observed frequency turn out to be quite similar! For MN you use 2pq and for MM you use q2.

    Once we have expected frequencies, we multiply them by the total number of individuals sampled to see how many individuals we’d expect to have found with each genotype. For MM, 0.36 x 3100 = 1116 individuals expected.

    Our last calculation to do here is to say how different are our observed and expected numbers of individuals. This is where we do the calculation \(\frac{(O-E)^2}{E}\). We subtract the expected number from the observed, square the value, and divide by the expected number. Here’s the calculation for MM:

    (1126 - 1116)2/1116 = 102/1116 = 0.089

    We perform the calculation for each genotype and then sum them up, as in the table below, to get our \(\chi^2\) value of 3.339.

    Genotype ObsNo ObsFreq ExpFreq ExpNo (O-E)^2/E
    MM 1126 0.36 0.36 1116 0.089
    MN 1446 0.47 0.48 1488 1.185
    NN 528 0.17 0.16 496 2.065
    Total 3100 1.00 1.00 3100 3.339

    Finally, we interpret our value. This population is in Hardy-Weinberg Proportions for this locus since the estimated \(\chi^2\) value is less than the critical value of 3.84 for 1 df. Thus, the difference between observed and expected values is due only to chance with P > 0.05, meaning that there is more than a 5% chance that the difference between observed and expected is due to chance. Note that it’s important to provide not just the estimated test statistic but also the P-value as well as your interpretation of what this result means.


  2. Consider the data presented by Jorde and Wooding (2004; only 6 pages long). Do human populations differ genetically? How do we know?
  3. Jorde and Wooding provide a variety of ways to understand or measure genetic variation in the human population. For example, they provide the estimate that a randomly chosen pair of humans will differ from each other (on average) for about 1 in every 1000-1500 base pairs of DNA (this amounts to 2-3 million base pairs in the whole genome). This is low compared to many species (see the content video on genetic variation).

    The numbers just provided are about genetic variation in the whole human population but this question is asking whether DIFFERENT human populations are also genetically different from each other (measurably). So, if we consider the base pairs (or nucleotides) in the genome where humans vary one from another, what do we see? We find that there is some geographic signal in the data – people from different locations in the world do tend to differ for some places in the genome. In fact, about 10-15% of the variation in the whole species is correlated with geographic location (continental groups of Africa, Asia, Europe). Meanwhile, 85-90% of the variation within humanity is common to most human populations, it is shared by populations in different locations. As a result, we see that there is very little genetic difference between different peoples and much much more of our variation is found in all populations.

    These data result from sequencing different parts of the human genome and comparing them for human genome samples from all over the world. Jorde and Wooding describe 3 kinds of genetic loci (not genes but locations in the genome, may be non-coding DNA): STR polymorphisms (Short Tandem Repeats), restriction site polymorphisms, and Alu insertion polymorphisms. All of these loci represent sequences that do not code for proteins and thus will evolve neutrally, not likely to have a functional effect on the organism’s phenotype.

  4. Refer to both Jorde and Wooding (2004) and/or Sankar and Cho (2002; only 2 pages long) to answer the following questions:
    1. How might we define “race”? What does this term represent? Is it useful? Harmful? Does it have any biological meaning? How might we decide whether there’s biological relevance or what role biology might play here? (I’m not looking for some right answer. I would like you to reflect on this in light of the readings and your experiences—and also as potential biologists!)
    2. Responses will vary. Jorde and Wooding seem to largely argue that “race” is pretty much entirely a social construct that does more harm than good. They show evidence that the way that “race” is defined has changed enormously over time and given the contexts in which it is used is usually causing harm, defining differences that are then used predjudically. Given the actual data on human genetic variation, J&W seem to argue that race is not biologically based or relevant.

    3. What might be benefits or costs of using a term like “race” (or racial descriptors) when discussing human genetic variation? If not by using a term like race, do you think there are better ways we could describe groups of humans that differ genetically from each other?
    4. Responses will vary. As noted above, J&W give evidence (and you may already be aware) of the ways in which race has been used to cause harm historically as well as today. Since race has been used to divide and justify power differentials, it has great potential for harm. Those who argue for a benefit to using race as a descriptor may argue that there are health benefits, say to being aware that a given race has a higher or lower risk of some health conditions compared to other races. However, higher or lower risk doesn’t mean a person must get a condition or absolutely can’t have it. Nor does it mean that a medication absolutely will or will not work on a person of a given race. As noted above, there is TONS more variation between people overall compared to the number of differences between folks from different parts of the world. This reflects the fact that the characeristics that define race (often primarily skin color) are related to very few genes in the genome. Thus, it’s not logical to assume that you know the rest of someone’s genome (and therefore health risks) based just on their skin color. Given everything we’ve talked about all semester, I hope you can appreciate that the environment has an equally large role in affecting phenotype, especially for things like health. In terms of other ways to describe groups of humans, J&W suggest that the best way to do something like determine health risks is to examine the individual’s genotype information as well as their lifestyle factors that are known to impact a given health condition.


When you are finished, check your responses on the key for PS7.

Remember to sign the Honor Code on your assignment.