Eukaryotic chromosome fine structure

Last updated June 01, 2023

Eukaryotic chromosome fine structure refers to the organization of DNA sequences along eukaryotic chromosomes. Some sequences belong to more than one class, so the classes listed here are not mutually exclusive.

Chromosomal characteristics

Some sequences are required for a properly functioning chromosome:

Centromere: Used during cell division as the attachment point for the spindle fibers.
Telomere: Used to maintain chromosomal integrity by capping off the ends of the linear chromosomes. This region is a microsatellite, but its function is more specific than a simple tandem repeat.

Throughout the eukaryotes, the overall structure of chromosome ends is conserved and is characterized by the telomeric tract, a series of short G-rich repeats. This is followed by an extensive subtelomeric region consisting of repeats of various types and lengths, the telomere-associated sequences (TAS).^[1] These regions generally have low gene density, low transcription and low recombination, and replicate late; they help protect the chromosome end from degradation and end-to-end fusion and help complete replication. The subtelomeric repeats can rescue chromosome ends when telomerase fails, buffer subtelomerically located genes against transcriptional silencing, and protect the genome from deleterious rearrangements due to ectopic recombination. They may also act as filler that increases chromosome size to a minimum threshold needed for chromosome stability, provide a location for the adaptive amplification of genes, and take part in a secondary mechanism of telomere maintenance via recombination when telomerase activity is absent.
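
The idea of a telomeric tract as a run of short G-rich repeats can be illustrated with a minimal sketch. It assumes the vertebrate repeat unit TTAGGG (the text above does not name a specific unit, and other eukaryotes use different ones); the function name and the toy sequence are hypothetical.

```python
import re

def longest_tandem_run(seq: str, unit: str = "TTAGGG") -> int:
    """Return the largest number of back-to-back copies of `unit` in `seq`.

    `unit` defaults to TTAGGG, the vertebrate telomeric repeat; this default
    is an assumption of the sketch, not something stated in the article.
    """
    pattern = re.compile("(?:" + re.escape(unit.upper()) + ")+")
    runs = [len(m.group(0)) // len(unit) for m in pattern.finditer(seq.upper())]
    return max(runs, default=0)

# Toy chromosome end: unrelated flank, five full repeats, one partial repeat.
toy_end = "ACGTACGT" + "TTAGGG" * 5 + "TTAG"
print(longest_tandem_run(toy_end))  # prints 5
```

Real telomeric tracts are far longer and contain degenerate repeat copies, so an exact-match scan like this is only a starting point.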

Structural sequences

Other sequences function in replication or, during interphase, in the physical organization of the chromosome.

Ori, or Origin: Origins of replication.
MAR: Matrix attachment regions, where the DNA attaches to the nuclear matrix.

Protein-coding genes

Regions of the genome with protein-coding genes include several elements:

Enhancer regions (normally up to a few thousand base pairs upstream of the transcription start site).
Promoter regions (normally less than a couple of hundred base pairs upstream of the transcription start site) include elements such as the TATA and CAAT boxes, GC elements, and an initiator.
Exons are the parts of the transcript that will eventually be transported to the cytoplasm for translation. For genes with alternative splicing, an exon is a portion of the transcript that could be translated, given the correct splicing conditions. The exons can be divided into three parts:
- The coding region is the portion of the mRNA that will eventually be translated.
- The upstream untranslated region (5' UTR) can serve several functions, including mRNA transport and initiation of translation (it includes portions of the Kozak sequence). It is not translated into protein (except as a result of various mutations).
- The 3' region downstream from the stop codon is separated into two parts:
  - The 3' UTR is never translated, but contributes to mRNA stability. It is also the attachment site for the poly-A tail. The poly-A tail is used in the initiation of translation and also seems to affect the long-term stability (aging) of the mRNA.
  - An unnamed region downstream of the poly-A addition site, but before the actual site of transcription termination, is cleaved off during processing of the transcript's 3' end, and so does not become part of the 3' UTR. Its function, if any, is unknown.
Introns are intervening sequences between the exons that are never translated. Some sequences inside introns function as miRNA, and there are even some cases of small genes residing completely within the intron of a large gene. For some genes (such as the antibody genes), internal control regions are found inside introns. These situations, however, are treated as exceptions.
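
The relationship between exons, introns, the coding region and the two untranslated regions can be sketched in a few lines of code. This is a simplified illustration, not a gene-finding method: it assumes exon coordinates are already known, writes the transcript in the DNA alphabet (T instead of U), and takes the first ATG and the first in-frame stop codon as the limits of the coding region. All function names and the toy sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def splice(pre_mrna, exons):
    """Join exon intervals (0-based, end-exclusive), dropping the introns."""
    return "".join(pre_mrna[start:end] for start, end in exons)

def split_mrna(mrna):
    """Split a spliced transcript into (5' UTR, coding region, 3' UTR).

    Simplifying assumption: the coding region runs from the first ATG
    through the first in-frame stop codon, inclusive.
    """
    start = mrna.find("ATG")
    if start == -1:
        return mrna, "", ""
    for i in range(start, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOP_CODONS:
            return mrna[:start], mrna[start:i + 3], mrna[i + 3:]
    return mrna[:start], mrna[start:], ""

# Toy transcript: exon 1 (bases 0-8), a 12-base intron, exon 2 (bases 21-30).
pre_mrna = "GGCATGGCC" + "GTAAGTTTTCAG" + "AAATAACCGG"
mrna = splice(pre_mrna, [(0, 9), (21, 31)])
print(split_mrna(mrna))  # prints ('GGC', 'ATGGCCAAATAA', 'CCGG')
```

In this toy case the coding region spans both exons, while the 5' UTR lies entirely in the first exon and the 3' UTR entirely in the second, which matches the description of exons as containing both translated and untranslated parts.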

Genes that are used as RNA

Many regions of the DNA are transcribed into RNA that is itself the functional product:

rRNA: Ribosomal RNAs are structural and catalytic components of the ribosome.
tRNA: Transfer RNAs are used in translation, bringing amino acids to the ribosome.
snRNA: Small nuclear RNAs are used in spliceosomes to help process pre-mRNA.
gRNA: Guide RNAs are used in RNA editing.
miRNA: MicroRNAs are small RNAs (approximately 22 nucleotides long) that are used in gene silencing.
snoRNA: Small nucleolar RNAs help process the rRNAs used to construct the ribosome.

Other RNAs are transcribed and not translated, but their functions have not yet been discovered.

Repeated sequences

Repeated sequences are of two basic types: unique sequences that are repeated in one area; and repeated sequences that are interspersed throughout the genome.

Satellites

Satellites are unique sequences that are repeated in tandem in one area. Depending on the length of the repeat unit, they are classified as either:

Minisatellite: Tandem repeats of a short unit, typically tens of nucleotides long.
Microsatellite: Tandem repeats of a very short unit, typically only a few nucleotides long. Some trinucleotide repeats are found in coding regions (see Trinucleotide repeat disorder), but most microsatellites are found in noncoding regions. Whether they have any specific function is unknown. They are used as molecular markers and in DNA fingerprinting.
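
A tandem repeat of a very short unit can be located with a backreference in a regular expression. The sketch below is illustrative only: the unit-length limit of 6 nucleotides and the minimum of 4 copies are assumed thresholds, not values given in this article, and the function name is hypothetical.

```python
import re

def find_microsatellites(seq, max_unit=6, min_copies=4):
    """Return (start, unit, copies) for each perfect tandem repeat found.

    The lazy quantifier makes the regex try the shortest unit first, so
    'AAAAAA' is reported as six copies of 'A', not three copies of 'AA'.
    Both thresholds are assumptions of this sketch.
    """
    pattern = re.compile(r"([ACGT]{1,%d}?)\1{%d,}" % (max_unit, min_copies - 1))
    hits = []
    for m in pattern.finditer(seq.upper()):
        unit = m.group(1)
        hits.append((m.start(), unit, len(m.group(0)) // len(unit)))
    return hits

# Toy sequence with five copies of the trinucleotide CAG between two flanks.
print(find_microsatellites("TTGA" + "CAG" * 5 + "TGCA"))  # prints [(4, 'CAG', 5)]
```

The scan only finds perfect repeats; tools used for real marker discovery also tolerate interruptions and mismatches within the repeat array.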

Interspersed sequences

Interspersed sequences are nonadjacent repeats whose copies are dispersed across the genome. They can be classified based on the length of the repeat as:

SINE: Short interspersed nuclear elements. The repeats are normally a few hundred base pairs in length. These sequences constitute about 13% of the human genome,^[2] with the Alu sequence alone accounting for about 10%.
LINE: Long interspersed nuclear elements. The repeats are normally several thousand base pairs in length. These sequences constitute about 21% of the human genome.^[2]

Both of these types are classified as retrotransposons.
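
To give the percentages above a sense of scale, they can be converted into base pairs and rough copy numbers. The calculation below is a back-of-the-envelope sketch: the haploid genome size of about 3.1 billion base pairs and the Alu length of about 300 base pairs are assumptions brought in for illustration and do not come from the cited source.

```python
GENOME_BP = 3_100_000_000  # assumed haploid human genome size, in base pairs

# Genome fractions quoted in the list above.
FRACTIONS = {"SINE": 0.13, "Alu": 0.10, "LINE": 0.21}

def bases_covered(fraction, genome_bp=GENOME_BP):
    """Base pairs occupied by a repeat class making up `fraction` of the genome."""
    return round(fraction * genome_bp)

def rough_copy_number(fraction, element_bp, genome_bp=GENOME_BP):
    """Approximate copy count, assuming every copy is a full `element_bp` long."""
    return round(fraction * genome_bp / element_bp)

for name, fraction in FRACTIONS.items():
    print(name, bases_covered(fraction))

# Alu copy estimate with an assumed full-length element of 300 bp.
print(rough_copy_number(FRACTIONS["Alu"], element_bp=300))
```

Under these assumptions the Alu estimate comes out at roughly one million copies. Because many genomic copies are truncated, a calculation that assumes full-length elements tends to undercount them.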

Retrotransposons

Retrotransposons are sequences in the DNA that are the result of retrotransposition of RNA. LINEs and SINEs are examples where the sequences are repeats, but there are non-repeated sequences that can also be retrotransposons.

Other sequences

Typical eukaryotic chromosomes contain much more DNA than is classified in the categories above. This DNA may serve as spacing or have other, as-yet-unknown functions. Alternatively, it may simply consist of random sequences of no consequence.

References

Notes

[1] Pryde FE, Gorham HC, Louis EJ (1997). Chromosome ends: all the same under their caps. Curr Opin Genet Dev 7(6):822-828.
[2] Pierce BA (2005). Genetics: A Conceptual Approach. Freeman. Page 311.
