Genomic organization

Genome sizes and corresponding composition of six major model organisms, shown as pie charts. The increase in genome size correlates with the vast expansion of noncoding DNA (i.e., intronic, intergenic, and interspersed repeat sequences) and repeat DNA (e.g., satellite DNA, long interspersed nuclear elements (LINEs), short interspersed nuclear elements (SINEs) such as Alu sequences, and DNA transposons; shown in red) in more complex multicellular organisms. This expansion is accompanied by an increase in the number of epigenetic mechanisms (particularly repressive ones) that regulate the genome. Expansion of the genome also correlates with an increase in the size and complexity of transcription units, with the exception of plants. P = promoter DNA element.

The hereditary material of an organism, DNA (deoxyribonucleic acid), is composed of a sequence of four nucleotides whose specific order encodes information. Genomic organization refers to the linear order of DNA elements and their division into chromosomes. "Genome organization" can also refer to the three-dimensional (3D) structure of chromosomes and the positioning of DNA sequences within the nucleus.
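The idea of genomic organization as a linear order of elements divided into chromosomes can be made concrete with a small data model. This is a minimal illustration, not a standard file format or library: the class names `Element` and `Chromosome`, the element kinds, and all coordinates are hypothetical, and positions are assumed to be 0-based half-open intervals.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    """A named DNA element occupying the half-open interval [start, end)."""
    name: str
    kind: str  # illustrative labels such as "gene", "promoter", "repeat"
    start: int
    end: int


@dataclass
class Chromosome:
    """One chromosome: its length plus the elements annotated on it."""
    name: str
    length: int
    elements: list = field(default_factory=list)

    def add(self, element: Element) -> None:
        # Reject elements that fall outside the chromosome or have no length.
        if not 0 <= element.start < element.end <= self.length:
            raise ValueError(f"{element.name} does not fit on {self.name}")
        self.elements.append(element)

    def linear_order(self) -> list:
        """Element names in the order they occur along the chromosome."""
        return [e.name for e in sorted(self.elements, key=lambda e: (e.start, e.end))]


# A genome is then a collection of chromosomes, keyed by name (toy values).
genome = {"chr1": Chromosome("chr1", 10_000), "chr2": Chromosome("chr2", 8_000)}
genome["chr1"].add(Element("geneB", "gene", 5_000, 7_500))
genome["chr1"].add(Element("promB", "promoter", 4_500, 5_000))
genome["chr1"].add(Element("rep1", "repeat", 1_000, 1_300))
genome["chr2"].add(Element("geneC", "gene", 200, 2_200))
print(genome["chr1"].linear_order())  # ['rep1', 'promB', 'geneB']
```

Sorting by start coordinate recovers the linear order; the 3D sense of genome organization would need additional data that this model deliberately omits.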


Description

Organisms organize their genomes in a vast array of ways. A comparison of the genomic organization of six major model organisms shows that genome size increases with organismal complexity. There is a more than 300-fold difference between the genome sizes of yeast and mammals, but only a modest 4- to 5-fold increase in overall gene number (see the figure). Gene number alone therefore tracks complexity poorly; instead, the ratio of coding to noncoding and repetitive sequences is indicative of the complexity of the genome: the largely "open" genomes of unicellular fungi have relatively little noncoding DNA compared with the highly heterochromatic genomes of multicellular organisms.[citation needed]
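The two fold changes quoted above imply a third figure that the text leaves implicit: how much more thinly genes are spread along the genome. A short sketch of that arithmetic, using only the ratios given in this paragraph (the function name is hypothetical):

```python
def relative_gene_density(size_fold: float, gene_fold: float) -> float:
    """Factor by which genes per base pair change when a genome is
    size_fold times larger but holds only gene_fold times as many genes."""
    return gene_fold / size_fold


# Ratios quoted above for yeast versus mammals.
for gene_fold in (4, 5):
    density = relative_gene_density(300, gene_fold)
    print(f"{gene_fold}-fold more genes: gene density falls to about 1/{1 / density:.0f}")
```

With exactly 300-fold this gives 1/75 and 1/60. Because the size difference is stated as more than 300-fold, gene density in mammals falls by roughly 60- to 75-fold or more relative to yeast.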

In particular, mammals have accumulated considerable repetitive elements and noncoding regions, which account for the majority of their DNA sequence (52% noncoding and 44% repetitive DNA).[1][2] Only 1.2% of the mammalian genome encodes protein. This massive expansion of repetitive and noncoding sequences in multicellular organisms is most likely due to the incorporation of invasive elements such as DNA transposons, retrotransposons, and other repetitive elements.[3] The expansion of repetitive elements (such as Alu sequences) has even infiltrated the transcription units of the mammalian genome. As a result, mammalian transcription units are frequently much larger (30–200 kb) and commonly contain multiple promoters, with DNA repeats located within their introns.[citation needed]
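Composition figures such as "52% noncoding" or "44% repetitive" are fractions of total genome length covered by a class of annotation. A minimal sketch of that calculation, assuming annotations arrive as hypothetical `(category, start, end)` tuples with 0-based half-open coordinates; the toy intervals are invented and are not real genome data. Overlapping intervals within a category are merged so no base is counted twice, while different categories may overlap (here the repeats sit inside an intron), so per-category percentages are not necessarily additive.

```python
from collections import defaultdict


def merge_intervals(intervals):
    """Merge overlapping or touching half-open [start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps (or abuts) the previous interval: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def composition(annotations, genome_length):
    """Fraction of the genome covered by each annotation category.

    annotations: iterable of (category, start, end) with 0-based,
    half-open coordinates. Overlaps within one category are merged;
    different categories may still overlap each other.
    """
    by_category = defaultdict(list)
    for category, start, end in annotations:
        by_category[category].append((start, end))
    return {
        category: sum(end - start for start, end in merge_intervals(spans)) / genome_length
        for category, spans in by_category.items()
    }


# Invented toy annotation of a 1,000 bp sequence (not real data).
toy = [
    ("coding", 100, 112),
    ("repeat", 200, 500),
    ("repeat", 450, 640),  # overlaps the previous repeat
    ("intron", 112, 700),  # the repeats lie inside this intron
]
print(composition(toy, 1000))
```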

The vast expansion of noncoding and repetitive DNA in the genomes of higher eukaryotes is accompanied by more extensive epigenetic silencing mechanisms. The study of genomic organization is thought to be central to the future of genomic medicine, where it may enable personalized prognoses in the clinic.[4]

See also

Related Research Articles

Genome: Genetic material of organism

In the fields of molecular biology and genetics, a genome is the genetic material of an organism. It consists of DNA (or RNA in RNA viruses). The genome includes both the genes and the noncoding DNA, as well as mitochondrial DNA and chloroplast DNA. The study of the genome is called genomics.

Transposable element: semiparasitic DNA sequence

A transposable element is a DNA sequence that can change its position within a genome, sometimes creating or reversing mutations and altering the cell's genetic identity and genome size. Transposition often results in duplication of the same genetic material. Barbara McClintock's discovery of them earned her a Nobel Prize in 1983.
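A toy string model can illustrate why transposition often duplicates genetic material. In broad terms, many DNA transposons move by a "cut-and-paste" mechanism, whereas retrotransposons "copy and paste" themselves through an RNA intermediate. This sketch is a deliberate simplification under that assumption: the functions are hypothetical, the genome is a short Python string, and real features such as target-site duplications and the enzymes involved are ignored.

```python
def cut_and_paste(genome: str, start: int, end: int, target: int) -> str:
    """Excise genome[start:end] and reinsert it at index `target` of the
    remaining sequence. Genome size is unchanged."""
    element = genome[start:end]
    remainder = genome[:start] + genome[end:]
    return remainder[:target] + element + remainder[target:]


def copy_and_paste(genome: str, start: int, end: int, target: int) -> str:
    """Insert a copy of genome[start:end] at index `target`, leaving the
    original in place. The element is duplicated and the genome grows."""
    element = genome[start:end]
    return genome[:target] + element + genome[target:]


genome = "AAAACCCCGGGG"  # toy sequence; "CCCC" plays the transposable element
print(cut_and_paste(genome, 4, 8, 8))    # AAAAGGGGCCCC
print(copy_and_paste(genome, 4, 8, 12))  # AAAACCCCGGGGCCCC
```

Repeated copy-and-paste events are one way a genome can accumulate many copies of the same element and grow in size.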

Human genome: Complete set of nucleic acid sequences for humans

The human genome is a complete set of nucleic acid sequences for humans, encoded as DNA within the 23 chromosome pairs in cell nuclei and in a small DNA molecule found within individual mitochondria. These are usually treated separately as the nuclear genome and the mitochondrial genome. Human genomes include both protein-coding DNA genes and noncoding DNA. Haploid human genomes, which are contained in germ cells, consist of three billion DNA base pairs, while diploid genomes have twice the DNA content. While there are significant differences among the genomes of human individuals, these are considerably smaller than the differences between humans and their closest living relatives, the bonobos and chimpanzees.

Non-coding DNA sequences are components of an organism's DNA that do not encode protein sequences. Some non-coding DNA is transcribed into functional non-coding RNA molecules. Other functions of non-coding DNA include the transcriptional and translational regulation of protein-coding sequences, scaffold attachment regions, origins of DNA replication, centromeres and telomeres.

Molecular evolution: process of change in the sequence composition of cellular molecules across generations

Molecular evolution is the process of change in the sequence composition of cellular molecules such as DNA, RNA, and proteins across generations. The field of molecular evolution uses principles of evolutionary biology and population genetics to explain patterns in these changes. Major topics in molecular evolution concern the rates and impacts of single nucleotide changes, neutral evolution vs. natural selection, origins of new genes, the genetic nature of complex traits, the genetic basis of speciation, evolution of development, and ways that evolutionary forces influence genomic and phenotypic changes.
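As one example of how the rate of single nucleotide change is quantified, a common starting point is to count the differing sites between two aligned sequences (the p-distance) and then correct for repeated changes at the same site. This sketch uses the Jukes–Cantor (JC69) model for that correction; it is a standard textbook choice swapped in for illustration, not a method described in this article, and the sequences are made up.

```python
import math


def p_distance(seq1: str, seq2: str) -> float:
    """Proportion of aligned sites at which two equal-length sequences differ."""
    if len(seq1) != len(seq2) or not seq1:
        raise ValueError("sequences must be non-empty and aligned to equal length")
    differences = sum(a != b for a, b in zip(seq1, seq2))
    return differences / len(seq1)


def jukes_cantor_distance(p: float) -> float:
    """JC69 estimate of substitutions per site from the p-distance.

    Corrects for multiple changes at the same site, assuming all four
    nucleotides are equally frequent and all substitutions equally likely.
    """
    if not 0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75) for the JC69 correction")
    return -0.75 * math.log(1 - 4 * p / 3)


ancestral = "ACGTACGTAC"
descendant = "ACGTTCGTAC"  # one substitution, at index 4
p = p_distance(ancestral, descendant)
print(p, round(jukes_cantor_distance(p), 4))
```

The corrected distance is always at least the raw p-distance, and the gap widens as sequences diverge, because more sites have changed more than once.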

Gene duplication is a major mechanism through which new genetic material is generated during molecular evolution. It can be defined as any duplication of a region of DNA that contains a gene. Gene duplications can arise as products of several types of errors in DNA replication and repair machinery as well as through fortuitous capture by selfish genetic elements. Common sources of gene duplications include ectopic recombination, retrotransposition events, aneuploidy, polyploidy, and replication slippage.

Gene family: set of several similar genes

A gene family is a set of several similar genes, formed by duplication of a single original gene, and generally with similar biochemical functions. One such family comprises the genes for the human hemoglobin subunits; the ten genes are in two clusters on different chromosomes, called the α-globin and β-globin loci. These two gene clusters are thought to have arisen as a result of a precursor gene being duplicated approximately 500 million years ago.

Repeated sequences are patterns of nucleic acids that occur in multiple copies throughout the genome. Repetitive DNA was first detected because of its rapid re-association kinetics. In many organisms, a significant fraction of the genomic DNA is highly repetitive, with over two-thirds of the sequence consisting of repetitive elements in humans.
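Repeats are now usually found computationally rather than through reassociation kinetics. A naive way to expose repetition in a sequence is to count every substring of length k (k-mer) and report those that occur more than once. This is a minimal sketch with a hypothetical function name; it ignores reverse-complement copies and diverged repeat copies, which dedicated repeat-annotation tools handle using libraries of known repeat families.

```python
from collections import Counter


def repeated_kmers(sequence, k, min_count=2):
    """Return every k-mer occurring at least min_count times in sequence.

    Only exact, same-strand copies are counted.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    sequence = sequence.upper()
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    return {kmer: n for kmer, n in counts.items() if n >= min_count}


print(repeated_kmers("ACGTTTACGTGGACGT", 4))  # {'ACGT': 3}
```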

Functional genomics: field of molecular biology

Functional genomics is a field of molecular biology that attempts to describe gene functions and interactions. Functional genomics makes use of the vast data generated by genomic and transcriptomic projects. It focuses on dynamic aspects such as gene transcription, translation, regulation of gene expression and protein–protein interactions, as opposed to the static aspects of the genomic information such as DNA sequence or structures. A key characteristic of functional genomics studies is their genome-wide approach to these questions, generally involving high-throughput methods rather than a more traditional "gene-by-gene" approach.

Sequence homology: Shared ancestry between DNA, RNA or protein sequences

Sequence homology is the biological homology between DNA, RNA, or protein sequences, defined in terms of shared ancestry in the evolutionary history of life. Two segments of DNA can share ancestry as a result of three kinds of event: a speciation event (orthologs), a duplication event (paralogs), or a horizontal gene transfer event (xenologs).

An intergenic region (IGR) is a stretch of DNA sequence located between genes. Intergenic regions are a subset of noncoding DNA. Occasionally some intergenic DNA acts to control nearby genes, but most of it has no currently known function. Intergenic DNA is one of several kinds of sequence that have sometimes been called "junk DNA", a term now used less often in scientific studies. RNA recently found to be transcribed from intergenic regions has been referred to as "dark matter" or "dark matter transcripts".
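Given gene coordinates, intergenic regions are simply the gaps between them. This sketch assumes hypothetical 0-based half-open `(start, end)` gene intervals on a single chromosome, treats overlapping genes as one block, and ignores strand; stretches before the first gene and after the last one are not reported, since they do not lie between genes.

```python
def intergenic_regions(genes):
    """Return (start, end) gaps lying between genes on one chromosome.

    genes: iterable of 0-based, half-open (start, end) gene intervals.
    Overlapping genes are treated as one block; strand is ignored.
    """
    regions = []
    block_end = None  # end of the merged gene block seen so far
    for start, end in sorted(genes):
        if block_end is not None and start > block_end:
            regions.append((block_end, start))
        block_end = end if block_end is None else max(block_end, end)
    return regions


genes = [(100, 200), (150, 300), (500, 650), (900, 1000)]
print(intergenic_regions(genes))  # [(300, 500), (650, 900)]
```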

ENCODE: research consortium investigating functional elements in human and model organism DNA

The Encyclopedia of DNA Elements (ENCODE) is a public research project which aims to identify functional elements in the human genome.

Recombination hotspots are regions in a genome that exhibit elevated rates of recombination relative to a neutral expectation. The recombination rate within hotspots can be hundreds of times that of the surrounding region. Recombination hotspots result from increased formation of DNA breaks in these regions and occur in both mitotic and meiotic cells. The term can also refer to recombination events resulting from the uneven distribution of programmed meiotic double-strand breaks.
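Hotspots are defined relative to a background expectation. One simple way to operationalize that, assumed here purely for illustration, is to estimate recombination rates in fixed windows, take the median window rate as the background, and flag windows exceeding it by some fold threshold. The function name, the median-as-background choice, and the default threshold are all hypothetical, and the example rates are made up.

```python
from statistics import median


def find_hotspots(window_rates, fold=10.0):
    """Indices of windows whose rate is at least `fold` times the median.

    The median window rate stands in for the background expectation.
    """
    background = median(window_rates)
    if background <= 0:
        raise ValueError("background rate must be positive")
    return [i for i, rate in enumerate(window_rates) if rate >= fold * background]


rates = [1.0, 1.2, 0.8, 1.1, 150.0, 0.9, 1.0, 3.0]  # made-up rates per window
print(find_hotspots(rates))  # [4]
```

Using the median rather than the mean keeps a few very strong hotspots from inflating the background estimate.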

Gene: Sequence of DNA or RNA that codes for an RNA or protein product

In biology, a gene is a sequence of nucleotides in DNA or RNA that encodes the synthesis of a gene product, either RNA or protein.

OR2F1: protein-coding gene in the species Homo sapiens

Olfactory receptor 2F1 is a protein that in humans is encoded by the OR2F1 gene.

HOXC11: protein-coding gene in the species Homo sapiens

Homeobox protein Hox-C11 is a protein that in humans is encoded by the HOXC11 gene.

ZNF33B: protein-coding gene in the species Homo sapiens

Zinc finger protein 33B is a protein that in humans is encoded by the ZNF33B gene.

Long non-coding RNAs (lncRNAs) are a type of RNA, defined as transcripts longer than 200 nucleotides that are not translated into protein. This somewhat arbitrary limit distinguishes long ncRNAs from small non-coding RNAs such as microRNAs (miRNAs), small interfering RNAs (siRNAs), Piwi-interacting RNAs (piRNAs), small nucleolar RNAs (snoRNAs), and other short RNAs. Long intervening/intergenic noncoding RNAs (lincRNAs) are lncRNAs that do not overlap protein-coding genes.
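The 200-nucleotide cut-off and the non-overlap criterion for lincRNAs translate directly into a classification rule. A minimal sketch, assuming the transcript is already known to be non-coding and that coordinates are 0-based half-open `(start, end)` spans on one chromosome; the function names are hypothetical and strand is ignored.

```python
def overlaps(a, b):
    """True if half-open intervals a and b share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


def classify_noncoding_transcript(length_nt, locus, coding_genes):
    """Label an untranslated transcript using the 200-nt cut-off.

    length_nt: length of the mature transcript in nucleotides.
    locus: (start, end) genomic span of the transcript.
    coding_genes: (start, end) spans of protein-coding genes.
    """
    if length_nt <= 200:
        return "small ncRNA"
    if any(overlaps(locus, gene) for gene in coding_genes):
        return "lncRNA"  # long, but overlaps a protein-coding gene
    return "lincRNA"


genes = [(1_000, 5_000), (9_000, 12_000)]
print(classify_noncoding_transcript(22, (3_000, 3_100), genes))     # small ncRNA
print(classify_noncoding_transcript(1_500, (4_000, 6_000), genes))  # lncRNA
print(classify_noncoding_transcript(1_500, (6_000, 8_000), genes))  # lincRNA
```

Since lincRNAs are a subset of lncRNAs, the "lncRNA" label here simply marks a long transcript that does overlap a protein-coding gene.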

A conserved non-coding sequence (CNS) is a DNA sequence of noncoding DNA that is evolutionarily conserved. These sequences are of interest for their potential to regulate gene expression.

The G-value paradox arises from the lack of correlation between the number of protein-coding genes among eukaryotes and their relative biological complexity. The microscopic nematode Caenorhabditis elegans, for example, is composed of only about a thousand cells but has about the same number of genes as a human. Researchers suggest that the resolution of the paradox may lie in mechanisms such as alternative splicing and complex gene regulation that make the genes of humans and other complex eukaryotes relatively more productive.
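One way alternative splicing can make a gene more productive is combinatorial: if a gene has n cassette exons that can each be independently included or skipped, it can in principle yield 2^n distinct transcripts. This sketch enumerates them under that idealized assumption; real splicing is far more constrained, and the exon names and function are hypothetical.

```python
from itertools import product


def cassette_isoforms(first_exon, cassette_exons, last_exon):
    """All exon combinations if each cassette exon is independently kept or skipped.

    The first and last exons are assumed constitutive (always included).
    """
    isoforms = []
    for choices in product((False, True), repeat=len(cassette_exons)):
        middle = [exon for exon, keep in zip(cassette_exons, choices) if keep]
        isoforms.append([first_exon, *middle, last_exon])
    return isoforms


isoforms = cassette_isoforms("E1", ["E2", "E3", "E4"], "E5")
print(len(isoforms))  # 8, i.e. 2**3
print(isoforms[0], isoforms[-1])
```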

References

  1. Venter JC, et al. "The Sequence of the Human Genome." Science (2001) 291: 1304–51.
  2. Harris RA, et al. "Human-Specific Changes of Genome Structure Detected by Genomic Triangulation." Science (2007) 316(5822): 235–7.
  3. Kazazian HH Jr. "Mobile Elements: Drivers of Genome Evolution." Science (2004) 303: 1626–32.
  4. West M, et al. "Embracing the complexity of genomic data for personalized medicine." Genome Res. (2006) 16: 559–66.