Institute of Molecular
Evolutionary Genetics
















Fall 2009






Speaker: Dr. Doug Cavener – Penn State University – Department of Biology

Title: Evolution of regulating protein synthesis and why it matters to defining gene coding sequences


Abstract: Gene expression is highly regulated at the level of protein synthesis, specifically at the translation initiation step. Regulation of translation initiation is mediated by variation in mRNA structure and sequence and by translation initiation factors. The 5’ untranslated region (5’ UTR) of eukaryotic mRNAs is often long and structurally complex, and often contains multiple upstream open reading frames (uORFs) and internal ribosome entry sites (IRES), which affect the frequency and site of translation initiation by the ribosome. In addition, translational regulatory proteins, including the family of eIF2 alpha kinases, play a dominant role in regulating both the level of translation initiation and start site selection. Previously, I performed the first comparative analysis of start and stop codon sequence context among major eukaryotic groups and analyzed 5’ UTR sequences. More recently, my research group has developed genetic model systems in mice to determine the functions of the eIF2 alpha kinase genes. We have discovered that the eIF2 alpha kinase genes play diverse roles in metabolism, stress responses, development, and neurological functions. An important consequence of our work and that of others is that considerable variation in coding sequences within genes is caused by the use of alternative translation start and stop sites that are regulated by physiological and developmental factors.


Zhang, W., Feng, D., Li, Y., Iida, K., McGrath, B., Cavener, D. R. (2006) PERK EIF2AK3 control of pancreatic β cell differentiation and proliferation is required for postnatal glucose homeostasis. Cell Metabol. 4:491-497.


Cavener, D. R., Ray, S. C. (1991) Eukaryotic start and stop translation sites. Nuc. Acids Res. 19:3185-3192.


Hao, S., Sharp, J. W., Ross-Inta, C. M., McDaniel, B. J., Anthony, T. G., Wek, R. C., Cavener, D. R., McGrath, B. C., Rudell, J. B., Koehnle, T. J., Gietzen, D. W. (2005) Uncharged tRNA and sensing of amino acid deficiency in mammalian piriform cortex. Science 307:1776-1778.


Speaker: Dr. David Rand - Brown University - Department of Ecology & Evolutionary Biology

Title: Running hot and cold about balancing selection: thermal selection in flies and barnacles

Abstract: Balancing selection can explain the maintenance of genetic variation in populations. The popularity of this model has waxed and waned over the years. In this seminar I will present empirical data from two systems that make a case for balancing selection, or certainly environmentally variable selection regimes, related to temperature. In Drosophila melanogaster, we performed two distinct thermal selection experiments on two different stock populations and mapped thermal QTL. Notably, a marker in the shaggy locus at band 3A was significantly differentiated in both experiments, implicating a connection between circadian rhythms and thermotolerance. The same allele that increased in frequency in the high-temperature populations is significantly clinal in North America and is more common in Florida than in Maine. In the acorn barnacle, Semibalanus balanoides, the Mpi locus has a common polymorphism that shows genotype-specific zonation in the intertidal, related to thermal stress. Experimental transplants, as well as DNA sequence data from the Mpi locus, implicate a history of balancing selection and its modulation by gene flow. Together these systems implicate thermal selection as a likely source of genetic heterogeneity.

Schmidt, P. S. and Rand, D. M. (2001) Adaptive maintenance of genetic polymorphism in an intertidal barnacle: habitat- and life-stage-specific survivorship of MPI genotypes. Evolution 55(7):1336-1344.

Schmidt, P. S. and Rand, D. M. (1999) Intertidal microhabitat and selection at MPI: Interlocus contrasts in the northern acorn barnacle, Semibalanus balanoides. Evolution 53(1):135-146.

Schmidt, P. S., Bertness, M. D., and Rand, D. M. (2000) Environmental heterogeneity and balancing selection in the acorn barnacle Semibalanus balanoides. Proc. R. Soc. Lon. B. 267:379-384.

Rand, D. M., Spaeth, P. S., Sackton, T. B., and Schmidt, P. S. (2002) Ecological genetics of Mpi and Gpi polymorphisms in the acorn barnacle and the spatial scale of neutral and non-neutral variation. Integr. Comp. Biol. 42:825–836.


Speaker: Zhaorong Ma – Penn State University – Department of Integrative BioSci

(Axtell Lab)

Title: Comparative study of Arabidopsis thaliana and Arabidopsis lyrata small RNAs

Abstract: Small RNAs are short non-coding RNAs that regulate gene expression post-transcriptionally. In this seminar I will present my research on two classes of plant small RNAs, microRNAs (miRNAs) and siRNAs, comparing two closely related Brassicaceae species, Arabidopsis thaliana and Arabidopsis lyrata. Some plant miRNAs are known to be newly evolved and lineage-specific. Is this group of newly evolved miRNAs under the same level of evolutionary constraint as ancient, more conserved miRNAs? We classified Arabidopsis miRNAs into two groups, "Brassicaceae-specific" and "more conserved", based on whether they have been identified only in Brassicaceae species or in other species as well. We found that, compared to more conserved miRNAs, Brassicaceae-specific miRNAs show greater divergence between Arabidopsis thaliana and Arabidopsis lyrata in MIRNA sequences and target complementarity sites, and lower processing accuracy for miRNA/miRNA* production. Arabidopsis siRNAs are known to arise from "hotspots", regions with high siRNA production, but it is not known whether these siRNA hotspots are retained between species. We compared siRNA hotspots in both genomes and found no evidence of retention of 24 nt siRNA hotspots.


Meyers BC, Axtell MJ, Bartel B, Bartel DP, Baulcombe D, Bowman JL, Cao X, Carrington JC, Chen X, Green PJ, Griffiths-Jones S, Jacobsen SE, Mallory AC, Martienssen RA, Poethig RS, Qi Y, Vaucheret H, Voinnet O, Watanabe Y, Weigel D, Zhu JK (2008) Criteria for Annotation of Plant MicroRNAs. Plant Cell 20: 3186-3190.

Axtell MJ (2008) Evolution of microRNAs and their targets: Are all microRNAs biologically relevant? Biochim. Biophys. Acta. 1779, 725-734.

Kasschau KD, Fahlgren N, Chapman EJ, Sullivan CM, Cumbie JS, Givan SA, Carrington JC (2007) Genome-wide profiling and analysis of Arabidopsis siRNAs. PLoS Biol. 5: e57.


Speaker: Cadhla Firth – Penn State University – Department of Biology

(Holmes Lab)

Title: A phylodynamic approach to the evolution of emerging infectious diseases

Abstract: Eighty-seven novel human pathogens have been described since 1980, with viruses comprising 75% of these. As a result, there has been increasing interest in the processes that support and shape the emergence of new pathogens. Studying the evolutionary trajectories of both host and pathogen during emergence can reveal the population- and species-level changes in dynamics that characterize these events. In particular, phylogenetic methods use the information in genetic data to add insight into emergence by exploring features such as the direction and speed of geographic spread of a pathogen, changes in population size over time, and the relationships between a group of hosts and their pathogens, and by inferring the timing of critical events such as cross-species transmission and global dissemination. Here, the evolution of two emerging viral pathogens, one of humans (Hantaviruses) and one of livestock (Porcine Circovirus 2), will be explored to help determine the origin of these pathogens, as well as the timeframe and geographic context in which they have emerged.


Holmes EC (2008) Evolutionary history and phylogeography of human viruses. Annu Rev Microbiol. 62: 307-328.


Finsterbusch T, Mankertz A (2009) Porcine circoviruses—small but powerful. Virus Res. 143: 177-183.


Ramsden C, Holmes EC, Charleston MA (2009) Hantavirus evolution in relation to its rodent and insectivore hosts: no evidence for codivergence. Mol Biol Evol. 26: 143-153.


Speaker: MARKER LECTURE - Dr. John Doebley, University of Wisconsin, 4:00 PM, 100 Berg Auditorium

Host: Blair Hedges

Title: Darwin and Domestication


Abstract: In his book “On the Origin of Species”, Charles Darwin used plant and animal domestication as a model to inform his theory on evolution under natural selection.  Artificial selection during plant domestication is thought to have been largely unconscious, the inevitable product of a sowing-reaping cycle. Selection pressures placed by humans on crops are analogous to those placed by seed-dispersers such as birds on wild species.  Nevertheless, Darwin’s use of domestication as a model for natural evolution has been controversial.  Over the past 20 years, genetic and molecular research has begun to uncover the genetic basis of the changes involved in the evolution of plant form under both natural and artificial selection.  In the case of domestication, approximately 15 genes involved in the changes in morphology have been isolated. For most of these genes, the nature of the alteration in the gene is understood.  I will review what has been learned about change in form under domestication and whether any patterns are beginning to emerge.


Speaker: MARKER LECTURE - Dr. John Doebley, University of Wisconsin, 4:00 PM, 100 Berg Auditorium

Host: Blair Hedges

Title: Unraveling a developmental pathway involved in maize domestication


Abstract: Maize is a domesticated form of a wild Mexican grass called teosinte. The domestication of maize from teosinte occurred about 8,000 years ago. As a result of human (artificial) selection during the domestication process, dramatic changes in morphology arose such that maize no longer closely resembles its teosinte ancestor in ear and plant architecture. Quantitative trait locus (QTL) mapping has shown that many genes contributed to the differences between maize and teosinte, but among these are several of very large effect. We have cloned and analyzed two of these large-effect genes. teosinte branched1 (tb1) is largely responsible for the difference between the long branches of teosinte and the short branches of maize. tb1 encodes a transcriptional regulator that functions as a repressor of branch elongation. Gene expression analysis indicates that the product of the teosinte allele of tb1 accumulates at about half the level of the maize allele. Fine-mapping experiments show that the differences in phenotype and gene expression are controlled by an enhancer that is 65 kb upstream of the ORF. teosinte glume architecture1 (tga1) is largely responsible for the formation of a casing that surrounds teosinte seeds but is lacking in maize. tga1 also encodes a transcriptional regulator; however, in this case a single amino acid change represents the functional difference between maize and teosinte. This single amino acid change appears to convert the maize allele into a transcriptional repressor of target genes. Analysis of the interactions between tb1, tga1 and other domestication genes indicates that they form a cascade of transcriptional regulators that were a target of human selection during the domestication process.


Speaker: Dr. Charles Addo-Quaye  – Penn State University – Department of Biology

(Axtell Lab)

Title: Transcriptome-wide detection of cleaved RNA targets of small silencing RNAs in plants by using the degradome sequencing method

Abstract: Small silencing RNAs are non-coding RNA sequences, 20-30 nucleotides (nt) long, that play a critical role in gene and genome regulation in eukaryotes. The two main modes of post-transcriptional gene regulation are the suppression of translation and the cleavage of targeted RNA transcripts. microRNAs (miRNAs) are a major category of small silencing RNAs and are usually 21-24 nt long. In plants, the predominant role of well-characterized miRNAs is the cleavage of messenger RNAs from gene families encoding transcription factors and other regulatory genes involved in growth and development. Finding the targets of a miRNA is essential to discovering its biological significance. In this talk, I will discuss the degradome sequencing method we designed and implemented for the global detection of cleaved targets of small silencing RNAs, and the CleaveLand computational pipeline used in the analysis of degradome sequences. The moss Physcomitrella patens (P. patens) is an important model organism for investigating the evolution of land plants. I will also discuss the results of our analyses of degradome sequences derived from the Physcomitrella transcriptome.



Voinnet, Olivier. (2009). Origin, Biogenesis, and Activity of Plant MicroRNAs. Cell 136: 669–687.


Addo-Quaye, C., Eshoo, T.W., Bartel, D.P., and Axtell, M.J. (2008). Endogenous siRNA and miRNA targets identified by sequencing of the Arabidopsis degradome. Curr. Biol. 18: 758–762.


Addo-Quaye, C., Miller, W., and Axtell, M.J. (2009). CleaveLand: A pipeline for using degradome data to find cleaved small RNA targets. Bioinformatics 25: 130-131.



Speaker: Dr. Todd LaJeunesse – Penn State University – Department of Biology

Title: Evolutionary turnover of sequence variants in the ribosomal arrays of eukaryotes inferred from molecular ecological studies of coral endosymbionts.

Abstract: Ribosomal DNA sequences have provided the basis of phylogenetic reconstructions for most of the planet’s biota. Ironically, rRNA genes evolve differently from single-copy nuclear or plastid genes. The enigmatic processes of concerted evolution, molecular drive, and gene conversion appear to “homogenize” the sequences of hundreds to thousands of copies arrayed in tandem repeats found on one or multiple chromosomes in a typical eukaryotic genome. In reality, the intragenomic arrays of most eukaryotes are not completely homogenized, but instead comprise numerous functional and non-functional sequence variants. Bacterial cloning and sequencing of rDNA recovers this variation, which is often incorrectly interpreted as inter-individual differences. However, the genomes of most individuals in a population appear to possess one variant that is numerically dominant. Tracking dominant intragenomic variants among species of endosymbiotic dinoflagellates associated with reef corals, using denaturing gradient gel electrophoresis (DGGE) fingerprinting of the Internal Transcribed Spacers (ITS), reveals patterns suggestive of the mode and tempo of rDNA sequence turnover. It would seem that rare variants, which differ from the dominant sequence by one base change, periodically proliferate to displace the dominant variant, resulting in the deliberate stepwise divergence of rDNA among isolated populations. These data may help refine molecular clock estimates of rDNA sequence evolution.



Thornhill DJ, LaJeunesse TC, Santos SR (2007) Measuring rDNA diversity in eukaryotic microbial systems: How intragenomic variation, pseudogenes, and PCR artifacts confound biodiversity estimates. Mol Ecol 16:5326-5340.

LaJeunesse TC, Pinzón JH (2007) Screening intragenomic rDNA for dominant variants can provide a consistent retrieval of evolutionarily persistent ITS (rDNA) sequences. Molecular Phylogenetics and Evolution 45:417-422.



Speaker: Dr. Christina Grozinger – Penn State University – Department of Entomology

Title: Genomics and evolution of chemical communication in social insects

Abstract: Chemical communication plays a critical role in regulating social behavior in honey bees. Furthermore, this communication system is exquisitely tuned to the environmental context and physiological state of both the signaling and receiving animal, and thus represents a subtle and intricate system for coordinating the activities of thousands of individuals in a colony. We seek to understand the molecular and physiological basis of the modulation of chemical communication in honey bees, both in terms of production of the chemical signal and responsiveness of the receiving individual. We are also extending these studies to other related species, to determine whether the genes associated with pheromone response are conserved across species, and to elucidate the evolution of pheromonal regulation of social behavior.



Kocher, S.D., Richard, F.J., Tarpy, D.R., and C.M. Grozinger. “Queen reproductive state modulates queen pheromone production and queen-worker interactions in honey bees” Behavioral Ecology. Advance Access published on July 2, 2009; doi:10.1093/beheco/arp090


Richard, F.J., Tarpy, D.R., and C.M. Grozinger. “Effects of insemination quantity on honey bee queen physiology”. PLoS ONE 2(10):e980 (2007).


Grozinger, C. M., Sharabash, N. M., Whitfield, C. W. and Robinson, G. E. (2003) “Pheromone mediated gene expression in the honey bee brain.” Proc Natl Acad Sci U S A 100 (Suppl 2),14519-25.

Robinson, G.E., Grozinger, C.M., and Whitfield, C.W. (2005) “Social life in molecular terms.” Nat Rev Genet 6, 257-270.



Speaker: Dr. Trudy Mackay – North Carolina State University – Dept of Genetics

Huck Institute Lecture Series – 4:00 PM 100 Berg Auditorium

Title: Systems Genetics of Quantitative Traits in Drosophila

Abstract: Population variation for quantitative traits is caused by segregating alleles at multiple interacting loci, with effects that are sensitive to the environment. Knowledge of the detailed genetic architecture of quantitative traits is important from the perspectives of evolutionary biology, human health and plant and animal breeding. Mapping quantitative trait loci to the level of individual genes and causal molecular variants is challenging because large numbers of individuals need to be assessed for the trait phenotype and a dense panel of polymorphic molecular markers in order to detect loci with modest effects; further, allelic effects can be sex-, environment- and genetic background-specific. Our understanding of the genetic architecture of quantitative traits will benefit from interrogating a single resource population for variation in DNA sequence, transcript abundance, proteins and metabolites; for multiple organismal phenotypes; and in multiple environments. This ‘systems genetics’ approach will yield a detailed map of genetic variants associated with each organismal phenotype in each environment; provide a functional context for interpreting the phenotypes; elucidate the genetic underpinnings that govern the interdependence of multiple phenotypes; and address the long-standing question of the genetic basis of genotype by environment interaction. The Drosophila Genetic Reference Panel (DGRP) is one such common resource population, which consists of 192 inbred lines derived from the Raleigh, USA population. The National Institutes of Health National Human Genome Research Institute has approved the sequencing of these lines by the Baylor College of Medicine Sequencing Center, using next generation sequencing technologies. The DGRP is a living library of common polymorphisms affecting complex traits, and a community resource for whole genome association mapping of quantitative trait loci. I will report the current status of the sequencing effort, as well as initial systems genetic analyses of several Drosophila life history traits.




Mackay, T. F. C., Stone, E. A., Ayroles, J. F. (2009). The genetics of quantitative traits: challenges and prospects. Nat Rev Genet 10: 565 – 577.


Harbison, S. T., Carbone, M. A., Ayroles, J. F., Stone, E. A., Lyman, R. F., Mackay, T. F. C. (2009). Co-regulated transcriptional networks contribute to natural genetic variation in Drosophila sleep. Nat Genet 41: 371 – 375.


Ayroles, J. F., Carbone, M. A.,  Stone, E. A., Jordan, K. W., Lyman, R. F.,  Magwire, M. M., Rollmann, S. M., Duncan, L. H., Lawrence, F.,  Anholt, R. R. H., Mackay, T. F. C. (2009). Systems genetics of complex traits in Drosophila melanogaster. Nat Genet 41: 299 – 307.


Speaker: Dr. Masafumi Nozawa – Penn State University – Department of Biology

(Nei Lab)
Title: Origin and evolution of microRNA genes in Drosophila species


Abstract: MicroRNA (miR) genes are known to regulate many genes at the posttranscriptional level. However, their origin and the evolutionary processes that follow their birth are still unclear. I have therefore identified miR genes in 12 Drosophila species using a bioinformatics approach and examined their evolutionary mechanisms. The results showed that the extant and ancestral Drosophila species have >100 miR genes and that frequent gains and losses of miR genes have occurred during evolution. A majority of gene gains generated new gene families, suggesting that many miR genes have originated from non-miR sequences. However, miR genes showed no sequence similarity to transposable elements or protein-coding genes. Instead, nearly half of miR genes were located within introns of protein-coding genes. These observations suggest that miR genes have largely originated from random hairpin structures or introns. I also found that new miR genes show a substitution rate similar to that of synonymous sites of protein-coding genes, implying that most of the “potential” miR genes may not have acquired any function yet and could become nonfunctional. By contrast, old miR genes showed a substitution rate much lower than that of protein-coding genes. Substitution patterns showed a strong tendency for paired and unpaired sites in stem regions to retain the same status even after substitutions. Therefore, once miR genes acquire functions, they appear to evolve very slowly, keeping their original structures over a long evolutionary period. This study revealed the contrast between the short-term and long-term evolution of Drosophila miR genes.



Bartel, D.P. 2004. MicroRNAs: genomics, biogenesis, mechanism, and function. Cell 116: 281-297.

Sempere, L.F., Cole, C.N., McPeek, M.A., and Peterson, K.J. 2006. The phylogenetic distribution of metazoan microRNAs: insights into evolutionary complexity and constraint. J Exp Zoolog B Mol Dev Evol 306: 575-588.

Lu, J., Shen, Y., Wu, Q., Kumar, S., He, B., Shi, S., Carthew, R.W., Wang, S.M., and Wu, C.I. 2008. The birth and death of microRNA genes in Drosophila. Nat Genet 40: 351-355.


Speaker: Yuannian Jiao – Penn State University – Department of Plant Biology

(dePamphilis Lab)

Title: The history of genome duplications in flowering plants: evidence from global gene family phylogenies


Abstract: There is strong evidence that the ancestors of major eudicot lineages have undergone one or more rounds of whole-genome duplication (WGD) following the divergence of monocots and eudicots. Although the occurrence of WGD event(s) is well accepted, the actual number, phylogenetic timing, and age of the event(s) remain equivocal. To address these issues, we built a phylogenomic pipeline to reconstruct the evolutionary relationships of 4433 gene families from the complete gene sets of Arabidopsis, Populus, Vitis, and Oryza. Of these, 1787 families were characterized by a surviving duplication shared by rosid I (Populus) and rosid II (Arabidopsis). We populated these alignments with unigenes of Asteridae and re-estimated the phylogenies to track potential WGD event(s) in eudicots, rosids, and asterids. Very little evidence was found to support large-scale duplications shared only by rosid I and rosid II, rejecting prior hypotheses of a rosid-wide WGD. The overwhelming majority of resolved duplications shared by rosid I/II were placed before the separation of rosids and asterids, providing evidence for a WGD (γ) early in eudicot evolution. Concentrations of gene duplications also suggested potential WGD events in the lineages leading to Solanaceae and to Asteraceae, but not across all Asteridae. Finally, our results support two rounds of WGD (β and α) in the Arabidopsis lineage after the divergence of rosid I/II. Global gene family phylogenies are a valuable complement to genome-scale structural analysis, incorporating extensive evidence even without conservation of gene order or a sequenced genome, and facilitate a better understanding of WGD events in eudicots.



Bowers JE, Chapman BA, Rong J, Paterson AH: Unravelling angiosperm genome evolution by phylogenetic analysis of chromosomal duplication events. Nature 2003, 422(6930):433-438.

Tang H, Bowers JE, Wang X, Ming R, Alam M, Paterson AH: Synteny and collinearity in plant genomes. Science 2008,





Speaker: Dr. Stephen Schaeffer – Penn State University – Department of Biology

Title: The Evolutionary Significance of Inversions in Natural Populations of Drosophila

Abstract: The Drosophila 12 genomes project has provided an opportunity to understand how evolution has shaped the organization of genes on chromosomes, but it is unclear what evolutionary forces allow new chromosomal rearrangements to invade natural populations.  Chromosomal rearrangements may play an important role in how populations adapt to a local environment. The gene arrangement polymorphism on the third chromosome of Drosophila pseudoobscura is a model system to help determine the role that inversions play in the evolution of this species. The gene arrangements are the likely target of strong selection because they form classical clines across diverse geographic habitats, they cycle in frequency over seasons, and they form stable equilibria in population cages. A numerical approach was developed to estimate the fitness sets for 15 gene arrangement karyotypes in six niches based on a model of selection–migration balance. Gene arrangement frequencies in the six different niches were able to reach a stable metapopulation equilibrium that matched the observed gene arrangement frequencies when recursions used the estimated fitnesses with a variety of initial inversion frequencies. These analyses show that a complex pattern of selection is operating in the six niches to maintain the D. pseudoobscura gene arrangement polymorphism. Models of local adaptation predict that the new inversion mutations were able to invade populations because they held combinations of two to 13 local adaptation loci together.


BHUTKAR, A. et al., 2008 Chromosomal rearrangement inferred from comparisons of twelve Drosophila genomes. Genetics 179: 1657-1680.

SCHAEFFER, S. W., 2008 Selection in heterogeneous environments maintains the gene arrangement polymorphism of Drosophila pseudoobscura. Evolution 62: 3082-3099.


Speaker: Yogeshwar Kelkar – Penn State University – Department of Integrative BioSci

(Makova Lab)

Title: What Should We Call a Microsatellite?

Abstract: Microsatellites are repeats of short (1 to 6 bp) DNA motifs and are ubiquitous in eukaryotic genomes. Microsatellites experience rapid insertion-deletion mutations of the motif because they are hotspots for polymerase slippage. Microsatellite mutation rate estimates from pedigree studies and experimental assays range from ~10^-6 to ~10^-2 mutations per locus per generation, orders of magnitude higher than for non-repetitive DNA (Ellegren 2000). Due to their high polymorphism levels, microsatellites are valuable genetic markers. While many (especially intergenic) microsatellites are thought to evolve neutrally, some, particularly those located within or in the vicinity of genes, are known to affect gene expression, splicing, or protein sequence (Li et al.), and have been implicated in many diseases (Pearson, Nichol Edamura, and Cleary).

Previously, we have shown that microsatellite size (repeat number) is a primary determinant of mono-, di-, tri-, and tetranucleotide microsatellite mutation rates (Kelkar et al.). One of the more debated issues pertaining to the very definition of microsatellites is whether there is a critical (‘threshold’) size required for a repeat to qualify as a microsatellite. Previous approaches to this question involved phylogenetic observations of microsatellite growth, or inferences based on size-frequency distributions of microsatellites in genomes. In contrast, because the defining characteristic of microsatellites is the dynamic nature of their mutations, we used an operational definition of the microsatellite threshold as the size at which the rate of polymerase slippage at a repeat significantly and sharply exceeds that of the background slippage process taking place at the smallest repeats in the genome. Here, we present a combined computational and experimental approach to determine the threshold value for [A/T]n mononucleotides, and for [TG/AC]n and [TC/AG]n dinucleotides. In our computational analysis, we assessed microsatellite polymorphism levels from the extensive re-sequencing of ten ENCODE regions in human populations (The International Hapmap Consortium 2005). Our premise was that the presence of polymorphisms at repeats of a given repeat number reflects their dynamic mutation activity. In our experimental analysis, we modified our published HSV-tk in vitro mutagenesis system (Eckert, Yan, and Hile) to quantify DNA polymerase error frequencies within tandemly repeated sequences differing by increments of one unit. With this combined approach, we find evidence for the existence of threshold sizes for all microsatellites investigated. Importantly, our results indicate that the microsatellite threshold is characterized by a minimal number of nucleotides, rather than a minimal number of repeats, irrespective of the size of the motif involved. With our approach, we aim to set an unambiguous standard for what loci should be called microsatellites in future studies.


References: The International Hapmap Consortium. 2005. A haplotype map of the human genome. Nature 437:1299-1320.

Eckert, K. A., G. Yan, and S. E. Hile. 2002. Mutation rate and specificity analysis of tetranucleotide microsatellite DNA alleles in somatic human cells. Mol Carcinog 34:140-150.

Ellegren, H. 2000. Microsatellite mutations in the germline: implications for evolutionary inference. Trends Genet 16:551-558.

Kelkar, Y. D., S. Tyekucheva, F. Chiaromonte, and K. D. Makova. 2008. The genome-wide determinants of human and chimpanzee microsatellite evolution. Genome Res 18:30-38.

Li, Y. C., A. B. Korol, T. Fahima, A. Beiles, and E. Nevo. 2002. Microsatellites: genomic distribution, putative functions and mutational mechanisms: a review. Mol Ecol 11:2453-2465.

Pearson, C. E., K. Nichol Edamura, and J. D. Cleary. 2005. Repeat instability: mechanisms of dynamic mutations. Nat Rev Genet 6:729-742.