Gene cluster

A gene family is a set of homologous genes within one organism. A gene cluster is a group of two or more genes found within an organism's DNA that encode similar polypeptides, or proteins, which collectively share a generalized function and are often located within a few thousand base pairs of each other. The size of gene clusters can vary significantly, from a few genes to several hundred genes. [1] Portions of the DNA sequence of each gene within a gene cluster are identical; however, the protein encoded by each gene is distinct from the proteins encoded by the other genes in the cluster. Genes in a gene cluster may be found near one another on the same chromosome or on different but homologous chromosomes. An example of a gene cluster is the Hox gene cluster, which is made up of eight genes in Drosophila and is part of the Homeobox gene family.

Hox genes have been observed among various phyla. Eight genes make up the Hox gene cluster in Drosophila. The number of Hox genes may vary among organisms, but the Hox genes collectively make up the Homeobox family.

Formation

Historically, four models have been proposed for the formation and persistence of gene clusters.

Gene duplication and divergence

This model has been generally accepted since the mid-1970s. It postulates that gene clusters were formed as a result of gene duplication and divergence. [2] These gene clusters include the Hox gene cluster, the human β-globin gene cluster, and four clustered human growth hormone (hGH)/chorionic somatomammotropin genes. [3]

Conserved gene clusters, such as Hox and the human β-globin gene cluster, may be formed as a result of the process of gene duplication and divergence. A gene is duplicated during cell division, so that its descendants have two end-to-end copies of the gene where it had one copy, initially coding for the same protein or otherwise having the same function. In the course of subsequent evolution, the copies diverge, so that the products they code for have different but related functions, with the genes still being adjacent on the chromosome. [4] Ohno theorized that the origin of new genes during evolution was dependent on gene duplication. If only a single copy of a gene existed in the genome of a species, the proteins encoded by this gene would be essential to the species' survival. Because there was only a single copy of the gene, it could not undergo mutations that would potentially result in new genes; however, gene duplication allows mutations to accumulate in the duplicated copy of an essential gene, which would ultimately give rise to new genes over the course of evolution. [5]

Mutations in the duplicated copy were tolerated because the original copy contained genetic information for the essential gene's function. Species that have gene clusters are thought to have a selective evolutionary advantage because natural selection must keep the genes together. [1] [6] Over a short span of time, the new genetic information in the duplicated copy of the essential gene would not confer a practical advantage; however, over a long evolutionary time period, the duplicated copy may undergo additional and drastic mutations, so that the proteins of the duplicated gene serve a different role than those of the original essential gene. [5] Over this long period, the two similar genes would diverge so that the proteins of each gene were unique in their functions. Hox gene clusters of various sizes are found among several phyla.
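The duplication and divergence process described above can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the 15-base "gene", the flanking bases, the mutated positions, and the helper names `tandem_duplicate`, `mutate`, and `identity` are hypothetical and do not come from the cited sources.

```python
import random

def tandem_duplicate(chromosome, start, end):
    """Return the chromosome with the segment [start, end) copied directly after itself."""
    segment = chromosome[start:end]
    return chromosome[:end] + segment + chromosome[end:]

def mutate(sequence, positions, rng):
    """Substitute a different base at each listed position (point mutations)."""
    seq = list(sequence)
    for pos in positions:
        seq[pos] = rng.choice([b for b in "ACGT" if b != seq[pos]])
    return "".join(seq)

def identity(a, b):
    """Fraction of positions that match between two equal-length sequences."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

rng = random.Random(0)                      # fixed seed so the run is repeatable
gene = "ATGGCCGATTACGGA"                    # hypothetical essential gene
chromosome = "TTT" + gene + "CCC"           # hypothetical flanking sequence

# Step 1: duplication leaves two end-to-end copies of the gene.
duplicated = tandem_duplicate(chromosome, 3, 3 + len(gene))
original_copy = duplicated[3:18]
second_copy = duplicated[18:33]

# Step 2: divergence. Only the second copy accumulates mutations,
# while the original copy keeps the essential function.
diverged_copy = mutate(second_copy, [2, 7, 11], rng)

print(identity(original_copy, second_copy))    # identical right after duplication
print(identity(original_copy, diverged_copy))  # lower after three substitutions
```

The two copies remain adjacent and partly identical in sequence, which matches the description of clustered genes sharing portions of their DNA sequence while encoding distinct products.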

Hox cluster

When gene duplication occurs to produce a gene cluster, one or multiple genes may be duplicated at once. In the case of the Hox genes, a shared ancestral ProtoHox cluster was duplicated, giving rise to the Hox cluster as well as the ParaHox cluster, an evolutionary sister complex of the Hox cluster. [7] The exact number of genes contained in the duplicated ProtoHox cluster is unknown; however, models exist suggesting that the duplicated ProtoHox cluster originally contained four, three, or two genes. [8]

In the case where a gene cluster is duplicated, some genes may be lost. Loss of genes is dependent on the number of genes originating in the gene cluster. In the four gene model, the ProtoHox cluster contained four genes, which resulted in two twin clusters: the Hox cluster and the ParaHox cluster. [7] As its name indicates, in the two gene model the Hox cluster and the ParaHox cluster arose from a ProtoHox cluster that contained only two genes. The three gene model was originally proposed in conjunction with the four gene model; [8] however, rather than the Hox cluster and the ParaHox cluster resulting from duplication of a cluster containing three genes, they arose through single gene tandem duplication, in which identical genes are found adjacent to one another on the same chromosome. [7] This was independent of duplication of the ancestral ProtoHox cluster.

Intrachromosomal duplication is the duplication of genes within the same chromosome over the course of evolution (a-1). Mutations may occur in the duplicated copy, such as observed with the substitution of guanine with adenine (a-2). Alignment of DNA sequences exhibits homology between the two chromosomes (a-3). All segments were duplicated from the same ancestral DNA sequence as observed by the comparisons in b(i-iii).

Cis vs. trans duplication

Gene duplication may occur via cis-duplication or trans-duplication. Cis-duplication, or intrachromosomal duplication, entails the duplication of genes within the same chromosome, whereas trans-duplication, or interchromosomal duplication, consists of duplicating genes on neighboring but separate chromosomes. [7] The Hox cluster and the ParaHox cluster were formed by intrachromosomal duplication, although they were initially thought to have resulted from interchromosomal duplication. [8]

Fisher Model

The Fisher Model was proposed in 1930 by Ronald Fisher. Under the Fisher Model, gene clusters are a result of two alleles working well with one another. In other words, gene clusters may exhibit co-adaptation. [3] The Fisher Model was considered unlikely and later dismissed as an explanation for gene cluster formation. [2] [3]

Coregulation Model

Under the coregulation model, genes are organized into clusters, each consisting of a single promoter and a cluster of coding sequences, which are therefore co-regulated, showing coordinated gene expression. [3] Coordinated gene expression was once considered to be the most common mechanism driving the formation of gene clusters. [1] However, coregulation, and thus coordinated gene expression, cannot drive the formation of gene clusters. [3]

Molarity Model

The Molarity Model considers the constraints of cell size. Transcribing and translating genes together is beneficial to the cell; [9] thus, the formation of clustered genes generates a high local concentration of cytoplasmic protein products. Spatial segregation of protein products has been observed in bacteria; however, the Molarity Model does not consider co-transcription or distribution of genes found within an operon. [2]

Gene clusters vs. tandem arrays

Tandem duplication is the process in which one gene is duplicated and the resulting copy is found adjacent to the original gene. Tandemly arrayed genes are formed as a result of tandem duplications.

Repeated genes can occur in two major patterns: gene clusters and tandem arrays, formerly called tandemly arrayed genes. Although similar, gene clusters and tandemly arrayed genes can be distinguished from one another.

Gene clusters

The genes of a gene cluster are found close to one another when observed on the same chromosome. They are dispersed randomly; however, the genes are normally within, at most, a few thousand bases of each other. The distance between each gene in the gene cluster can vary. The DNA found between each repeated gene in the gene cluster is non-conserved. [10] Portions of the DNA sequence of a gene are found to be identical in the genes contained in a gene cluster. [5] Gene conversion is the only method by which gene clusters may become homogenized. Although the size of a gene cluster may vary, it rarely comprises more than 50 genes, making clusters stable in number. Gene clusters change over a long evolutionary time period, which does not result in genetic complexity. [10]
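The distance criterion above can be sketched as a simple grouping step. This is only an illustration under stated assumptions: the function name `find_clusters`, the 5,000-base gap threshold, and the gene coordinates are hypothetical, and the input is assumed to already contain only genes judged to be related. It is not the method of the cited cluster-identification studies.

```python
def find_clusters(genes, max_gap=5000, min_size=2):
    """Group genes on one chromosome whose neighbours lie within max_gap bases.

    genes: list of (name, start, end) tuples for related genes (assumed input).
    Returns a list of clusters, each a list of gene tuples ordered by position.
    """
    ordered = sorted(genes, key=lambda g: g[1])
    clusters, current = [], []
    for gene in ordered:
        # Start a new group when the gap to the previous gene is too large.
        if current and gene[1] - current[-1][2] > max_gap:
            if len(current) >= min_size:
                clusters.append(current)
            current = []
        current.append(gene)
    if len(current) >= min_size:
        clusters.append(current)
    return clusters

# Hypothetical coordinates: three genes a few thousand bases apart, one far away.
genes = [
    ("a", 1000, 2000),
    ("b", 4000, 5500),
    ("c", 9000, 10000),
    ("d", 50000, 51000),
]
clusters = find_clusters(genes)
cluster_names = [[name for name, _, _ in cluster] for cluster in clusters]
print(cluster_names)
```

The gaps between genes in a group may differ, which mirrors the statement that the distance between each gene in a cluster can vary.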

Tandem arrays

Tandem arrays are a group of genes with the same or similar function that are repeated consecutively without space between each gene. The genes are organized in the same orientation. [10] Unlike gene clusters, tandemly arrayed genes are found to consist of consecutive, identical repeats, separated only by a nontranscribed spacer region. [11]

While the genes contained in a gene cluster encode similar proteins, tandemly arrayed genes encode identical proteins or functional RNAs. Unequal recombination changes the number of repeats by placing duplicated genes next to the original gene. Unlike gene clusters, tandemly arrayed genes change rapidly in response to the needs of the environment, causing an increase in genetic complexity. [11]
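How unequal recombination changes the number of repeats can be shown with a minimal sketch. It assumes two misaligned tandem arrays of five identical units each that exchange segments at different repeat boundaries; the function name `unequal_crossover`, the unit label, and the breakpoints are hypothetical.

```python
def unequal_crossover(array_a, array_b, break_a, break_b):
    """Exchange the segments of two tandem arrays after different repeat boundaries.

    array_a, array_b: lists of repeat units.
    break_a, break_b: number of units kept from the start of each array.
    """
    recombinant_1 = array_a[:break_a] + array_b[break_b:]
    recombinant_2 = array_b[:break_b] + array_a[break_a:]
    return recombinant_1, recombinant_2

parent_a = ["rRNA_gene"] * 5   # hypothetical array of five identical repeats
parent_b = ["rRNA_gene"] * 5

# Misalignment: the break falls after unit 4 in one array and after unit 2 in the other.
longer, shorter = unequal_crossover(parent_a, parent_b, 4, 2)
print(len(longer), len(shorter))
```

One product gains repeats and the other loses the same number, so the total is conserved while the copy number on each chromosome changes.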

Gene conversion allows tandemly arrayed genes to become homogenized, or identical. [11] Gene conversion may be allelic or ectopic. Allelic gene conversion occurs when one allele of a gene is converted to the other allele as a result of mismatched base pairing during meiotic homologous recombination. [12] Ectopic gene conversion occurs when one homologous DNA sequence is replaced by another. Ectopic gene conversion is the driving force for concerted evolution of gene families. [13]
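Gene conversion can be pictured as a one-way copy of a stretch of sequence from a donor onto a homologous acceptor. The sketch below is a simplification under assumptions: the sequences, the tract boundaries, and the function name `gene_conversion` are hypothetical, and the molecular mechanism (mismatch repair during recombination) is not modelled.

```python
def gene_conversion(donor, acceptor, start, end):
    """Overwrite acceptor[start:end] with the donor's sequence; the donor is unchanged."""
    return acceptor[:start] + donor[start:end] + acceptor[end:]

donor = "ATGGCCGATTAC"      # hypothetical repeat unit
acceptor = "ATGACCGTTTAC"   # homologous copy differing at two positions

# A short conversion tract removes one of the two differences.
partial = gene_conversion(donor, acceptor, 2, 5)

# A tract spanning the whole unit makes the two copies identical (homogenized).
full = gene_conversion(donor, acceptor, 0, len(donor))
print(partial, full)
```

Repeated across the units of an array, transfers of this kind would leave the copies identical to one another, which is the homogenization described above.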

Tandemly arrayed genes are essential to maintain large gene families, such as the genes for ribosomal RNA (rRNA). In the eukaryotic genome, the genes encoding ribosomal RNA are tandemly arrayed. Tandemly repeated rRNA genes are essential to maintain a sufficient supply of the RNA transcript, because a single rRNA gene may not be able to provide a sufficient amount of RNA; tandem repeats of the gene allow a sufficient amount to be produced. For example, human embryonic cells contain 5–10 million ribosomes and double in number within 24 hours. In order to provide a substantial number of ribosomes, multiple RNA polymerases must consecutively transcribe multiple rRNA genes. [11]
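The scale of the demand in this example can be made concrete with a short calculation. It assumes that doubling means the cell must assemble as many new ribosomes as it already contains within the 24-hour period, at a constant rate; the function name `new_ribosomes_per_second` is hypothetical.

```python
SECONDS_PER_HOUR = 60 * 60

def new_ribosomes_per_second(ribosome_count, doubling_hours=24):
    """Average assembly rate needed to double the ribosome count in the given time."""
    return ribosome_count / (doubling_hours * SECONDS_PER_HOUR)

low = new_ribosomes_per_second(5_000_000)     # lower bound from the text
high = new_ribosomes_per_second(10_000_000)   # upper bound from the text
print(round(low), round(high))
```

Under these assumptions the cell needs on the order of 60 to 120 new ribosomes every second, which gives a sense of why many rRNA gene copies transcribed in parallel are required.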

References

  1. Yi G, Sze SH, Thon MR (May 2007). "Identifying clusters of functionally related genes in genomes". Bioinformatics. 23 (9): 1053–60. doi:10.1093/bioinformatics/btl673. PMID 17237058.
  2. Lawrence J (December 1999). "Selfish operons: the evolutionary impact of gene clustering in prokaryotes and eukaryotes" (PDF). Current Opinion in Genetics & Development. 9 (6): 642–8. doi:10.1016/s0959-437x(99)00025-8. PMID 10607610. Archived from the original (PDF) on 2010-05-28.
  3. Lawrence JG, Roth JR (August 1996). "Selfish operons: horizontal transfer may drive the evolution of gene clusters". Genetics. 143 (4): 1843–60. doi:10.1093/genetics/143.4.1843. PMC 1207444. PMID 8844169.
  4. Ohno S (1970). Evolution by gene duplication. Springer-Verlag. ISBN 978-0-04-575015-3.
  5. Klug W, Cummings M, Spencer C, Palladino M (2009). "Chromosome Mutations: Variation in chromosome number and arrangement". In Wilbur B (ed.). Concepts of Genetics (9 ed.). San Francisco, CA: Pearson Benjamin Cummings. pp. 213–214. ISBN 978-0-321-54098-0.
  6. Overbeek R, Fonstein M, D'Souza M, Pusch GD, Maltsev N (March 1999). "The use of gene clusters to infer functional coupling". Proceedings of the National Academy of Sciences of the United States of America. 96 (6): 2896–901. Bibcode:1999PNAS...96.2896O. doi:10.1073/pnas.96.6.2896. PMC 15866. PMID 10077608.
  7. Garcia-Fernàndez J (February 2005). "Hox, ParaHox, ProtoHox: facts and guesses". Heredity. 94 (2): 145–52. doi:10.1038/sj.hdy.6800621. PMID 15578045.
  8. Garcia-Fernàndez J (December 2005). "The genesis and evolution of homeobox gene clusters". Nature Reviews. Genetics. 6 (12): 881–92. doi:10.1038/nrg1723. PMID 16341069. S2CID 42823485.
  9. Gómez MJ, Cases I, Valencia A (2004). "Gene order in Prokaryotes: conservation and implications". In Vicente M, Tamames J, Valencia A, Mingorance J (eds.). Molecules in Time and Space: Bacterial Shape, Division, and Phylogeny. New York: Kluwer Academic/Plenum Publishers. pp. 221–224. doi:10.1007/0-306-48579-6_11. ISBN 978-0-306-48578-7.
  10. Graham GJ (July 1995). "Tandem genes and clustered genes". Journal of Theoretical Biology. 175 (1): 71–87. Bibcode:1995JThBi.175...71G. doi:10.1006/jtbi.1995.0122. PMID 7564393.
  11. Lodish H, Berk A, Kaiser C, Krieger M, Bretscher A, Ploegh H, Amon A, Scott M (2013). "Genes, Genomics, and Chromosomes". Molecular Cell Biology (7th ed.). New York: W.H. Freeman and Company. pp. 227–230. ISBN 978-1-4292-3413-9.
  12. Galtier N, Piganeau G, Mouchiroud D, Duret L (October 2001). "GC-content evolution in mammalian genomes: the biased gene conversion hypothesis". Genetics. 159 (2): 907–11. doi:10.1093/genetics/159.2.907. PMC 1461818. PMID 11693127.
  13. Duret L, Galtier N (2009). "Biased gene conversion and the evolution of mammalian genomic landscapes". Annual Review of Genomics and Human Genetics. 10: 285–311. doi:10.1146/annurev-genom-082908-150001. PMID 19630562. S2CID 9126286.