Albert Erives | |
---|---|
Born | Adalberto Jorge Erives, March 4, 1972, San Fernando, California, U.S. |
Alma mater | California Institute of Technology, University of California, Berkeley |
Known for | Gene regulation, molecular evolution, genomics |
Awards | NSF CAREER award |
Scientific career | |
Fields | Biology |
Institutions | University of Iowa, Dartmouth College |
Doctoral advisor | Michael Levine |
Albert Erives (born March 4, 1972) is a developmental geneticist who studies the transcriptional enhancers underlying animal development and diseases of development such as cancers. Erives also proposed the pacRNA model for the dual origin of the genetic code and universal homochirality. [1] He is known for work at the intersection of genetics, evolution, developmental biology, and gene regulation. [2] [3] [4] [5] He has worked at the California Institute of Technology, the University of California, Berkeley, and Dartmouth College, and is an associate professor at the University of Iowa.
Erives has shown how genes of the nucleocytoplasmic large DNA viruses shed light on intermediate steps in the evolution of the linear, chromatinized eukaryotic chromosome and its mechanisms of gene regulation. [6] [7]
Erives' major work is on “regulatory grammars” for the transcriptional enhancers underlying animal development and cancers. Exploiting animal genome assemblies, Erives discovered that complex gene regulatory codes underlie non-homologous subsets of mechanistically equivalent enhancers. [3] [4] [8] These codes are composed of a combinatorial “lexicon” of transcription factor (TF) binding sites, functional inflections of those binding sites (so-called “specialized sites” constrained for binding affinity and for competition by multiple TFs), and complex site ordering (the orientation and positional spacing of those sites). Understanding how these complex regulatory codes relate to a nucleosomal “regulatory reading frame” is a key goal. [9] His lab's work also elucidated how a mutational mechanism (microsatellite repeat slippage) plays a significant evolutionary role in functionally adjusting complex binding-site arrangements that recruit poly-glutamine-rich factors. [2] [5] [10] Correspondingly, the Erives lab has pioneered the identification of novel enhancers that recruit poly-glutamine-rich complexes and integrate developmental signals, [9] while also identifying polyQ allelic series for key developmental factors that target those enhancers. [11]
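The idea of a regulatory grammar can be made concrete with a small scanner that reports each site's factor, strand (orientation), and start position, and then the spacing between the sites of two factors. The sketch below is illustrative only: the factor names `TF_A` and `TF_B` and their consensus strings are hypothetical placeholders, not sites from Erives' studies, and exact consensus matching stands in for the position weight matrices and affinity models used in practice.

```python
import re

# IUPAC nucleotide codes as regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "N": "[ACGT]",
}
# Complement table that also handles ambiguity codes (R<->Y, K<->M).
COMPLEMENT = str.maketrans("ACGTRYSWKMN", "TGCAYRSWMKN")

# Hypothetical two-word "lexicon": factor name -> consensus site (5'->3').
LEXICON = {"TF_A": "GGGAAA", "TF_B": "CACATG"}


def revcomp(site: str) -> str:
    """Reverse complement of a (possibly degenerate) site."""
    return site.upper().translate(COMPLEMENT)[::-1]


def to_regex(site: str) -> str:
    """Translate an IUPAC consensus string into a regular expression."""
    return "".join(IUPAC[base] for base in site.upper())


def scan(seq: str, lexicon: dict[str, str]) -> list[tuple[int, str, str]]:
    """Return sorted (start, factor, strand) tuples for hits on either strand."""
    seq = seq.upper()
    hits = set()
    for factor, site in lexicon.items():
        for strand, motif in (("+", site), ("-", revcomp(site))):
            # Zero-width lookahead so overlapping hits are all reported.
            for m in re.finditer(f"(?=({to_regex(motif)}))", seq):
                hits.add((m.start(), factor, strand))
    return sorted(hits)


def spacings(hits: list[tuple[int, str, str]], a: str, b: str) -> list[int]:
    """Signed start-to-start distances (bp) from each site of a to each site of b."""
    return [sb - sa for sa, fa, _ in hits if fa == a
            for sb, fb, _ in hits if fb == b]


hits = scan("TTGGGAAATTTTTCACATGTT", LEXICON)
print(hits)                            # [(2, 'TF_A', '+'), (13, 'TF_B', '+')]
print(spacings(hits, "TF_A", "TF_B"))  # [11]
```

Palindromic consensus sites match on both strands and are therefore reported once per strand.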
A significant implication of this work is that gene regulatory networks evolve largely by indels in both cis and trans (in enhancer DNAs and in polyQ-encoding genes, respectively). Because indels are largely produced by unstable microsatellite repeats, which evolve quickly and are difficult to genotype accurately, a large compartment of functional genetic variation is missed by genome-wide association studies, which focus on single nucleotide polymorphisms and, at most, a subset of indels not associated with repeats.
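The repeat tracts described above can be located with a simple scan for perfect tandem repeats. This is a minimal sketch, assuming a tract counts when a primitive unit of 1 to 6 bp occurs at least four times in a row; those thresholds are arbitrary choices made here, and dedicated tools such as Tandem Repeats Finder also handle imperfect repeats, which this sketch does not.

```python
import re


def is_primitive(unit: str) -> bool:
    """True if unit is not itself a repeat of a shorter unit (e.g. 'AA', 'CAGCAG')."""
    return (unit + unit).find(unit, 1) == len(unit)


def find_microsatellites(seq: str, min_unit: int = 1, max_unit: int = 6,
                         min_copies: int = 4) -> list[tuple[int, str, int]]:
    """Return sorted (start, unit, copies) for perfect tandem repeats in seq."""
    seq = seq.upper()
    found = []
    for k in range(min_unit, max_unit + 1):
        # A k-mer (group 2) followed by at least min_copies - 1 further copies.
        pattern = re.compile(rf"(([ACGT]{{{k}}})\2{{{min_copies - 1},}})")
        for m in pattern.finditer(seq):
            unit = m.group(2)
            # Skip units such as 'AA' that a shorter period already explains.
            if is_primitive(unit):
                found.append((m.start(), unit, len(m.group(1)) // k))
    return sorted(found)
```

For example, on `TTCAGCAGCAGCAGCAGTTAAAAAAAGC` it returns `[(2, 'CAG', 5), (19, 'A', 7)]`: a five-copy CAG tract (CAG encodes glutamine, as in polyQ-encoding genes) and a seven-base poly-A run. Because matching starts at the leftmost position, the reported unit reflects the first phase of the repeat encountered.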
Erives and colleagues determined how different morphogen gradient responses are encoded in DNA sequence. [2] [5] They did so by using diverse Drosophila species with different-sized eggs to study how a set of structured enhancers [4] [8] co-evolved, or co-adapted, with changes in the morphogen concentration gradients. Morphogen gradient systems are a fundamental subject of developmental biology. Models of how morphogen gradient responses are encoded had previously been proposed but had not been tested across a set of unrelated enhancers built from a shared regulatory grammar and located throughout a genome.
Three major unexpected findings resulted from this work. The first is that gradient responses generally do not evolve, as had been expected, by changes in transcription factor (TF) binding-site quality or quantity (site density), but rather by changes in the precise spacing between binding sites for morphogenic TFs and their partner TFs. [2] The second is that homotypic site clustering at such enhancers largely results from a complex evolutionary history of selection for different threshold responses in the evolving insect egg. [5] A third, related finding is that frequent selection for different responses also enriches for microsatellite repeat tracts, which are inherently unstable and are the main source of novel indel alleles. [5] [10]
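The first finding can be illustrated with a toy enhancer: an indel in the spacer between two sites changes their spacing while leaving the number and sequence of the sites themselves unchanged. The site sequences, the spacer, and the 2-bp insertion below are hypothetical, chosen only to show the arithmetic.

```python
from typing import Optional


def site_gap(seq: str, site_a: str, site_b: str) -> Optional[int]:
    """Spacer length in bp between the first site_a and the next site_b downstream."""
    a = seq.find(site_a)
    if a == -1:
        return None
    end_a = a + len(site_a)
    b = seq.find(site_b, end_a)
    return None if b == -1 else b - end_a


def insert_at(seq: str, pos: int, insertion: str) -> str:
    """Model an insertion allele (e.g. from repeat slippage) at position pos."""
    return seq[:pos] + insertion + seq[pos:]


# Hypothetical enhancer fragment: site A, a T-rich spacer, then site B.
SITE_A, SITE_B = "GGGAAA", "CACATG"
ancestral = "ATGGGAAATTTTTCACATGTA"
derived = insert_at(ancestral, 10, "TT")  # 2-bp insertion inside the T run

for name, seq in (("ancestral", ancestral), ("derived", derived)):
    # Columns: allele, count of site A, count of site B, spacer length (bp).
    print(name, seq.count(SITE_A), seq.count(SITE_B), site_gap(seq, SITE_A, SITE_B))
```

Both alleles carry one copy of each site; only the spacer length differs (5 bp versus 7 bp).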
Erives' work also showed that morphogenic responses involve an inherent spatiotemporal conflict and that this conflict is resolved in nature through complementary morphogen gradients. [12] [13]
Using insights gleaned from archaeal genomes, Erives developed a stereochemical model of "proto-anti-codon RNAs" (pacRNAs). [1] The pacRNA model posits a predetermined, shared origin for the universal genetic code (i.e., the codon table), the biogenic amino acids, and their exclusive homochirality in life. The model implies that the early RNA world was an aminoacylated RNA world and that the proteinogenic amino acids arose because of compatible interactions with nucleotide-based polymers. The pacRNA model explicitly lists possible interactions between various anticodon dinucleotide and trinucleotide sequences and different amino acids. When the nucleotides are based on D-ribose, L-amino acids are preferred.
In the pacRNA world, codons originate as cis-elements that recruit self-aminoacylated pacRNAs/proto-tRNAs. A curious consequence of this model is therefore that the (anti)codon table was established in evolutionary history before the origin of ribosome-based protein translation. The pacRNA model may also explain why extant tRNAs are heavily modified in all three domains of life.
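The codon/anticodon relationship the paragraph relies on is ordinary antiparallel base pairing: written 5′ to 3′, an anticodon is the reverse complement of its codon. The sketch below computes that pairing for all 64 codons; it models only Watson-Crick complementarity (no wobble pairing or tRNA modifications) and does not encode the pacRNA model's proposed amino acid interactions.

```python
from itertools import product

# Watson-Crick pairing in RNA; wobble pairs are deliberately ignored.
PAIR = str.maketrans("ACGU", "UGCA")


def anticodon(codon: str) -> str:
    """Return the anticodon (5'->3') that pairs with a codon given 5'->3'."""
    codon = codon.upper().replace("T", "U")  # accept DNA-style input
    if len(codon) != 3 or set(codon) - set("ACGU"):
        raise ValueError(f"not a codon: {codon!r}")
    # Antiparallel pairing: complement each base, then reverse the order.
    return codon.translate(PAIR)[::-1]


# Full (anti)codon table: 4**3 = 64 entries.
ANTICODONS = {"".join(c): anticodon("".join(c)) for c in product("ACGU", repeat=3)}

print(anticodon("AUG"))  # CAU, pairing with the methionine start codon
```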
Erives first presented the pacRNA model at NASA's 2012 Astrobiology Science Conference [14] and later at the 2013 Iowa City Darwin Day festival, [15] which focused on the origins of life on Earth.
Like his enhancer studies, which focus on how protein complexes interact with enhancer DNAs, Erives' pacRNA work focuses on how biogenic amino acids would have beneficially interacted with the nucleotide-based molecules of early life. Both areas of study show how complex patterns in linear molecules emerge from interactions in three dimensions.
With his doctoral advisor Michael Levine, Erives authored several papers on ascidian developmental genetics that provided key insights into the evolution of the proto-vertebrate body plan. [16] [17] [18] This work used the Ciona system to generate large numbers of embryos, which were then electroporated with enhancer DNAs.
In collaboration with Nori Satoh's lab at Kyoto University in Japan, where he spent a winter doing research, Erives and colleagues also identified what was then the largest collection of notochord-specific genes by using genetically altered Ciona that over-express the Brachyury transcription factor. [19] The notochord is a defining evolutionary innovation of the chordate body plan, and this work was designed to advance understanding of the morphogenetic signals emanating from this important developmental and structural tissue.
In 2001, Erives co-founded the Caltech-associated company CodeGrok (code "grok") [20] with Paul Mineiro, currently a Principal Research Software Developer at Microsoft. The company was started in Pasadena, California, but later moved to Berkeley, California, after its second round of financing. In its first three years, CodeGrok developed and used machine learning methods to identify, classify, and clone transcriptional enhancers from the human genome and to construct pathway-specific cell-based reporters for drug screening and other applications. The company took its name from Robert Heinlein's novel Stranger in a Strange Land and its concept of grok, meaning to understand something deeply and intuitively, in reference to the goal of "grokking" the regulatory code of the human genome. The company no longer exists, and it is often cited as a humorous example of what not to do in naming a start-up, as many people were unable to pronounce the name. [21]