Comparative genomics

Whole genome alignment is a typical method in comparative genomics. This alignment of eight Yersinia bacteria genomes reveals 78 locally collinear blocks conserved among all eight taxa. Each chromosome has been laid out horizontally and homologous blocks in each genome are shown as identically colored regions linked across genomes. Regions that are inverted relative to Y. pestis KIM are shifted below a genome's center axis.

Comparative genomics is a branch of biological research that examines genome sequences across a wide range of species, from bacteria to mice, chimpanzees, and humans. [2] [3] This large-scale holistic approach compares two or more genomes to discover the similarities and differences between them and to study the biology of the individual genomes. [4] Comparison of whole genome sequences provides a highly detailed view of how organisms are related to each other at the gene level. By comparing whole genome sequences, researchers gain insights into genetic relationships between organisms and study evolutionary changes. [2] The major principle of comparative genomics is that common features of two organisms will often be encoded within the DNA that is evolutionarily conserved between them. Comparative genomics therefore provides a powerful tool for studying evolutionary changes among organisms, helping to identify genes that are conserved or common among species, as well as genes that give each organism its unique characteristics. Moreover, these studies can be performed at different levels of the genomes to obtain multiple perspectives about the organisms. [4]


Comparative genomic analysis begins with a simple comparison of the general features of genomes, such as genome size, number of genes, and chromosome number. Table 1 presents data on several fully sequenced model organisms and highlights some striking findings. For instance, while the tiny flowering plant Arabidopsis thaliana has a smaller genome than the fruit fly Drosophila melanogaster (157 million base pairs v. 165 million base pairs, respectively), it possesses nearly twice as many genes (25,000 v. 13,000). In fact, A. thaliana has approximately the same number of genes as humans (25,000). Thus, a very early lesson learned in the genomic era is that genome size does not correlate with evolutionary status, nor is the number of genes proportionate to genome size. [5]

Table 1: Comparative genome sizes of humans and other model organisms [2]
Organism | Estimated size (base pairs) | Chromosome number | Estimated gene number
Human (Homo sapiens) | 3.1 billion | 46 | 25,000
Mouse (Mus musculus) | 2.9 billion | 40 | 25,000
Bovine (Bos taurus) | 2.86 billion [6] | 60 [7] | 22,000 [8]
Fruit fly (Drosophila melanogaster) | 165 million | 8 | 13,000
Plant (Arabidopsis thaliana) | 157 million | 10 | 25,000
Roundworm (Caenorhabditis elegans) | 97 million | 12 | 19,000
Yeast (Saccharomyces cerevisiae) | 12 million | 32 | 6,000
Bacteria (Escherichia coli) | 4.6 million | 1 | 3,200

In comparative genomics, synteny is the preserved order of genes on chromosomes of related species indicating their descent from a common ancestor. Synteny provides a framework in which the conservation of homologous genes and gene order is identified between genomes of different species. [9] Synteny blocks are more formally defined as regions of chromosomes between genomes that share a common order of homologous genes derived from a common ancestor. [10] [11] Alternative names such as conserved synteny or collinearity have been used interchangeably. [12] Comparisons of genome synteny between and within species have provided an opportunity to study evolutionary processes that lead to the diversity of chromosome number and structure in many lineages across the tree of life; [13] [14] early discoveries using such approaches include chromosomal conserved regions in nematodes and yeast, [15] [16] evolutionary history and phenotypic traits of extremely conserved Hox gene clusters across animals and MADS-box gene family in plants, [17] [18] and karyotype evolution in mammals and plants. [19]
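As a rough illustration of the synteny-block idea, the Python sketch below groups ortholog pairs into collinear runs. It assumes orthologs have already been assigned and that each gene is given as an integer position along its chromosome. The function name `synteny_blocks`, the strict no-gap adjacency rule, and the `min_len` cutoff are illustrative choices rather than the method of any particular synteny program; real tools tolerate small gaps and score blocks statistically.

```python
from typing import List, Tuple

Pair = Tuple[int, int]  # (gene index in genome A, gene index in genome B)


def synteny_blocks(pairs: List[Pair], min_len: int = 2) -> List[List[Pair]]:
    """Group ortholog pairs into runs that are collinear in both genomes.

    A run grows while the next gene along genome A (index + 1) maps to a gene
    adjacent to the previous one in genome B, in a consistent direction:
    +1 for conserved orientation, -1 for an inverted block. Genes without
    orthologs are absent from `pairs`, so they break runs (no gaps allowed).
    """
    blocks: List[List[Pair]] = []
    current: List[Pair] = []
    direction = 0  # 0 = not yet fixed, +1 forward, -1 inverted
    for pair in sorted(pairs):  # walk along genome A
        if current:
            step = pair[1] - current[-1][1]
            adjacent_in_a = pair[0] - current[-1][0] == 1
            if adjacent_in_a and step in (1, -1) and direction in (0, step):
                direction = step
                current.append(pair)
                continue
            if len(current) >= min_len:
                blocks.append(current)
        current, direction = [pair], 0
    if len(current) >= min_len:
        blocks.append(current)
    return blocks


# Hypothetical orthologs: genes 0-2 collinear, genes 3-5 inverted, gene 6 isolated.
print(synteny_blocks([(0, 0), (1, 1), (2, 2), (3, 7), (4, 6), (5, 5), (6, 10)]))
# [[(0, 0), (1, 1), (2, 2)], [(3, 7), (4, 6), (5, 5)]]
```

The second block corresponds to an inversion: its genes are adjacent in both genomes but run in opposite directions.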

Furthermore, comparing two genomes not only reveals conserved domains or synteny but also aids in detecting copy number variations, single nucleotide polymorphisms (SNPs), indels, and other genomic structural variations.
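A minimal sketch of how SNPs and indels can be read off a pairwise alignment is shown below. It assumes the two sequences are already aligned as equal-length strings with `-` marking gaps. The function `call_variants` and its tuple output format are hypothetical, and detecting copy number variations or larger structural variants requires additional methods not shown here.

```python
from typing import List, Tuple


def call_variants(ref_aln: str, qry_aln: str) -> List[Tuple]:
    """List SNPs and indels between two gapped, equal-length aligned sequences.

    Positions are 0-based coordinates in the ungapped reference. Runs of
    consecutive gap columns are merged into one insertion or deletion event.
    """
    if len(ref_aln) != len(qry_aln):
        raise ValueError("aligned sequences must have equal length")
    variants: List[Tuple] = []
    ref_pos = 0  # current position in the ungapped reference
    i = 0
    while i < len(ref_aln):
        r, q = ref_aln[i], qry_aln[i]
        if r == "-" and q == "-":
            i += 1  # gap in both rows: no information
        elif r == "-":  # bases present only in the query: insertion
            j = i
            while j < len(ref_aln) and ref_aln[j] == "-":
                j += 1
            variants.append(("INS", ref_pos, qry_aln[i:j].replace("-", "")))
            i = j
        elif q == "-":  # bases missing from the query: deletion
            j = i
            while j < len(qry_aln) and qry_aln[j] == "-":
                j += 1
            deleted = ref_aln[i:j].replace("-", "")
            variants.append(("DEL", ref_pos, deleted))
            ref_pos += len(deleted)
            i = j
        else:
            if r != q:
                variants.append(("SNP", ref_pos, r, q))
            ref_pos += 1
            i += 1
    return variants
```

For example, `call_variants("ACGTAC-GT", "ACCT--TGT")` reports a G-to-C SNP at reference position 2, a two-base deletion of `AC` at position 4, and a one-base insertion of `T` before position 6.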

Comparative genomics began almost as soon as the whole genomes of two organisms became available (the genomes of the bacteria Haemophilus influenzae and Mycoplasma genitalium, in 1995), and it is now a standard component of the analysis of every new genome sequence. [2] [20] With the explosion in the number of genome projects driven by advances in DNA sequencing technologies, particularly the next-generation sequencing methods of the late 2000s, the field has become more sophisticated, making it possible to analyze many genomes in a single study. [21] Comparative genomics has revealed high levels of similarity between closely related organisms, such as humans and chimpanzees, and, more surprisingly, similarity between seemingly distantly related organisms, such as humans and the yeast Saccharomyces cerevisiae. [22] It has also shown the extreme diversity of gene composition in different evolutionary lineages. [20]

History

See also: History of genomics

Comparative genomics has its roots in the comparison of virus genomes in the early 1980s. [20] For example, small RNA viruses infecting animals (picornaviruses) and those infecting plants (cowpea mosaic virus) were compared and found to share significant sequence similarity and, in part, the order of their genes. [23] In 1986, the first larger-scale comparative genomic study was published, comparing the genomes of varicella-zoster virus and Epstein-Barr virus, each of which contains more than 100 genes. [24]

The first complete genome sequence of a cellular organism, that of Haemophilus influenzae Rd, was published in 1995. [25] The second genome sequencing paper, on the small parasitic bacterium Mycoplasma genitalium, was published in the same year. [26] From this paper onward, reports on new genomes inevitably became comparative-genomic studies. [20]

Microbial genomes. The first high-resolution whole genome comparison system for microbial genomes of 10-15kbp was developed in 1998 by Art Delcher, Simon Kasif and Steven Salzberg and applied, with their collaborators at The Institute for Genomic Research (TIGR), to the comparison of entire, highly related microbial organisms. The system, called MUMmer, was described in a publication in Nucleic Acids Research in 1999. It helps researchers identify large rearrangements, single-base mutations, reversals, tandem repeat expansions and other polymorphisms. In bacteria, MUMmer enables the identification of polymorphisms that are responsible for virulence, pathogenicity, and antibiotic resistance. The system was also applied to the Minimal Organism Project at TIGR and subsequently to many other comparative genomics projects.

Eukaryote genomes. Saccharomyces cerevisiae, the baker's yeast, was the first eukaryote to have its complete genome sequence published, in 1996. [27] Following the publication of the roundworm Caenorhabditis elegans genome in 1998 [15] and the fruit fly Drosophila melanogaster genome in 2000, [28] Gerald M. Rubin and his team published a paper titled "Comparative Genomics of the Eukaryotes", in which they compared the genomes of the eukaryotes D. melanogaster, C. elegans, and S. cerevisiae, as well as the prokaryote H. influenzae. [29] At the same time, Bonnie Berger, Eric Lander, and their team published a paper on whole-genome comparison of human and mouse. [30]

With the publication of the large genomes of vertebrates in the 2000s, including human, the Japanese pufferfish Takifugu rubripes , and mouse, precomputed results of large genome comparisons have been released for downloading or for visualization in a genome browser. Instead of undertaking their own analyses, most biologists can access these large cross-species comparisons and avoid the impracticality caused by the size of the genomes. [31]

Next-generation sequencing methods, which were first introduced in 2007, have produced an enormous amount of genomic data and have allowed researchers to generate multiple (prokaryotic) draft genome sequences at once. These methods can also quickly uncover single-nucleotide polymorphisms, insertions and deletions by mapping unassembled reads against a well annotated reference genome, and thus provide a list of possible gene differences that may be the basis for any functional variation among strains. [21]

Evolutionary principles

Evolution is a defining characteristic of biology. Evolutionary theory is the theoretical foundation of comparative genomics, and at the same time the results of comparative genomics have enriched and developed the theory of evolution to an unprecedented degree. When two or more genome sequences are compared, one can deduce the evolutionary relationships of the sequences in a phylogenetic tree. Based on a variety of biological genome data and the study of vertical and horizontal evolution processes, one can understand vital parts of gene structure and its regulatory function.

Similarity of related genomes is the basis of comparative genomics. If two organisms have a recent common ancestor, the differences between their genomes evolved from the ancestor's genome. The closer the relationship between two organisms, the more similar their genomes. If they are closely related, their genomes will display a collinear arrangement (synteny), namely, some or all of their genetic sequences are conserved. Thus, genome sequences can be used to identify gene function by analyzing their homology (sequence similarity) to genes of known function.

This image from the UCSC Genome Browser shows the human FOXP2 gene and its evolutionary conservation in a multiple alignment (at bottom of figure). Note that conservation tends to cluster around coding regions (exons).

Orthologous sequences are related sequences in different species that descend from a single gene in their common ancestor: when the ancestral species splits into two species, the copies of the gene in the two new species are orthologous. Paralogous sequences arise by gene duplication: if a particular gene in a genome is copied, the copy is paralogous to the original gene. Two orthologous sequences form an orthologous pair (orthologs), and two paralogous sequences form a paralogous pair (paralogs). Orthologs usually have the same or similar function, which is not necessarily the case for paralogs; paralogous sequences tend to evolve different functions.
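One common heuristic for proposing orthologous pairs, not described in the text above, is the reciprocal best hit: gene a in species A and gene b in species B are paired when each is the other's highest-scoring match. The sketch below assumes similarity scores (for example, alignment scores) have already been computed and are supplied as nested dictionaries; all gene names and scores are invented. Reciprocal best hits can miss orthologs after gene duplication, which is exactly where the ortholog/paralog distinction matters.

```python
from typing import Dict, List, Tuple

Scores = Dict[str, Dict[str, float]]  # query gene -> {target gene: similarity score}


def best_hits(scores: Scores) -> Dict[str, str]:
    """Highest-scoring target for each query gene (ties go to the first listed)."""
    return {q: max(hits, key=hits.get) for q, hits in scores.items() if hits}


def reciprocal_best_hits(scores_ab: Scores, scores_ba: Scores) -> List[Tuple[str, str]]:
    """Pairs (a, b) where b is a's best hit in B and a is b's best hit in A."""
    best_ab = best_hits(scores_ab)
    best_ba = best_hits(scores_ba)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Invented scores: a3 hits b1 well, but b1 matches a1 even better.
scores_ab = {"a1": {"b1": 95, "b2": 40}, "a2": {"b1": 50, "b2": 80}, "a3": {"b1": 70}}
scores_ba = {"b1": {"a1": 95, "a2": 50, "a3": 70}, "b2": {"a1": 40, "a2": 80}}
print(reciprocal_best_hits(scores_ab, scores_ba))  # [('a1', 'b1'), ('a2', 'b2')]
```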

Comparative genomics exploits both similarities and differences in the proteins, RNA, and regulatory regions of different organisms to infer how selection has acted upon these elements. Those elements that are responsible for similarities between different species should be conserved through time (stabilizing selection), while those elements responsible for differences among species should be divergent (positive selection). Finally, those elements that are unimportant to the evolutionary success of the organism will be unconserved (selection is neutral).

One of the important goals of the field is the identification of the mechanisms of eukaryotic genome evolution. It is however often complicated by the multiplicity of events that have taken place throughout the history of individual lineages, leaving only distorted and superimposed traces in the genome of each living organism. For this reason comparative genomics studies of small model organisms (for example the model Caenorhabditis elegans and closely related Caenorhabditis briggsae) are of great importance to advance our understanding of general mechanisms of evolution. [32] [33]

Role of CNVs in evolution

Comparative genomics plays a crucial role in identifying copy number variations (CNVs) and understanding their significance in evolution. CNVs, which involve deletions or duplications of large segments of DNA, are recognized as a major source of genetic diversity, influencing gene structure, dosage, and regulation. While single nucleotide polymorphisms (SNPs) are more common, CNVs affect larger genomic regions and can have profound effects on phenotype and diversity. [34] Recent studies suggest that CNVs constitute around 4.8–9.5% of the human genome and have a substantial functional and evolutionary impact. In mammals, CNVs contribute significantly to population diversity, influencing gene expression and various phenotypic traits. [35] Comparative analyses of the human and chimpanzee genomes have revealed that CNVs may play a greater role in evolutionary change than single nucleotide changes. Research indicates that CNVs affect more nucleotides than individual base-pair changes, with about 2.7% of the genome affected by CNVs compared to 1.2% by SNPs. Moreover, while many CNVs are shared between humans and chimpanzees, a significant portion is unique to each species. CNVs have also been associated with genetic diseases in humans, highlighting their importance in human health. Despite this, many questions about CNVs remain unanswered, including their origin and their contributions to evolutionary adaptation and disease. Ongoing research aims to address these questions using techniques such as comparative genomic hybridization, which allows a detailed examination of CNVs and their significance. Investigators have also examined the raw sequence data of the human and chimpanzee genomes. [36]
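In comparative genomic hybridization and in read-depth approaches, a common first step is to compare the signal of a test genome with a reference genome window by window on a log2 scale; in a diploid genome, a one-copy gain gives log2(3/2) ≈ 0.58 and a one-copy loss gives log2(1/2) = -1. The sketch below assumes the signals are positive, already normalized, and binned into matching windows. The ±0.3 thresholds and the function name `call_cnv_segments` are illustrative assumptions; practical pipelines add proper segmentation and noise modeling.

```python
import math
from typing import List, Sequence, Tuple


def call_cnv_segments(test: Sequence[float], reference: Sequence[float],
                      gain: float = 0.3, loss: float = -0.3) -> List[Tuple[int, int, str]]:
    """Label each window by its log2(test/reference) ratio and merge runs.

    Returns (first_window, last_window, state) for every run of consecutive
    'gain' or 'loss' windows. Signals must be positive and already normalized;
    the default thresholds are illustrative, not calibrated values.
    """
    states = []
    for t, r in zip(test, reference):
        ratio = math.log2(t / r)
        if ratio >= gain:
            states.append("gain")
        elif ratio <= loss:
            states.append("loss")
        else:
            states.append("neutral")
    segments = []
    start = 0
    for i in range(1, len(states) + 1):
        # Close the current run at the end of the data or when the state changes.
        if i == len(states) or states[i] != states[start]:
            if states[start] != "neutral":
                segments.append((start, i - 1, states[start]))
            start = i
    return segments


# Hypothetical normalized depths in six windows for a test and a reference genome.
print(call_cnv_segments([100, 98, 205, 210, 50, 101], [100] * 6))
# [(2, 3, 'gain'), (4, 4, 'loss')]
```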

Significance of comparative genomics

Comparative genomics holds profound significance across various fields, including medical research, basic biology, and biodiversity conservation. For instance, in medical research, predicting which genomic variants lead to changes in organism-level phenotypes, such as increased disease risk in humans, remains challenging because of the immense size of the genome, which comprises about three billion nucleotides. [37] [38] [39]

To tackle this challenge, comparative genomics offers a solution by pinpointing nucleotide positions that have remained unchanged over millions of years of evolution. These conserved regions indicate potential sites where genetic alterations could have detrimental effects on an organism's fitness, thus guiding the search for disease-causing variants. Moreover, comparative genomics holds promise in unraveling the mechanisms of gene evolution, environmental adaptations, gender-specific differences, and population variations across vertebrate lineages. [40]
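The idea of pinpointing unchanged nucleotide positions can be sketched by scanning a multiple sequence alignment for columns in which every sequence carries the same base. This assumes an existing alignment supplied as equal-length strings. The function `conserved_columns` and the `min_identity` parameter are hypothetical, and simple percent identity ignores the phylogenetic relationships among species that dedicated conservation scores take into account.

```python
from collections import Counter
from typing import List, Sequence


def conserved_columns(alignment: Sequence[str], min_identity: float = 1.0) -> List[int]:
    """0-based alignment columns whose most common non-gap residue reaches min_identity.

    Identity is the fraction of all sequences (gapped ones included) that carry
    the most common residue in the column.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    conserved = []
    for col in range(length):
        counts = Counter(seq[col] for seq in alignment)
        counts.pop("-", None)  # gaps never count as a conserved residue
        if counts and max(counts.values()) / len(alignment) >= min_identity:
            conserved.append(col)
    return conserved


# Invented four-species alignment.
aln = ["ATGCGTA", "ATGCATA", "ATGTGTA", "ACGCG-A"]
print(conserved_columns(aln))  # [0, 2, 6]
```

Lowering `min_identity` (for example to 0.75) relaxes the criterion so that columns with a single divergent sequence also count as conserved.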

Furthermore, comparative studies enable the identification of genomic signatures of selection: regions of the genome that have undergone preferential increase and fixation in populations because of their functional significance in specific processes. [41] For instance, in animal genetics, indigenous cattle exhibit superior disease resistance and environmental adaptability but lower productivity compared to exotic breeds. Through comparative genomic analyses, significant genomic signatures responsible for these unique traits can be identified. Using insights from these signatures, breeders can make informed decisions to enhance breeding strategies and promote breed development. [42]

Methods

Computational approaches are necessary for genome comparisons, given the large amount of data encoded in genomes. Many tools are now publicly available, ranging from whole genome comparisons to gene expression analysis. [43] These include approaches from systems and control, information theory, string analysis and data mining. [44] Computational approaches will remain critical for research and teaching, especially when information science and genome biology are taught in conjunction. [45]

Phylogenetic tree of descendant species and reconstructed ancestors. The branch color represents breakpoint rates in RACFs (breakpoints per million years). Black branches represent nondetermined breakpoint rates. Tip colors depict assembly contiguity: black, scaffold-level genome assembly; green, chromosome-level genome assembly; yellow, chromosome-scale scaffold-level genome assembly. Numbers next to species names indicate diploid chromosome number (if known).

Comparative genomics starts with basic comparisons of genome size and gene density. For instance, genome size is important for coding capacity and possibly for regulatory reasons. High gene density facilitates genome annotation and the analysis of environmental selection. By contrast, low gene density hampers the mapping of genetic diseases, as in the human genome.
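Gene density can be computed directly from the estimates in Table 1 as genes per million base pairs (Mb). The sketch below simply transcribes those table values, with sizes converted to megabases; the dictionary name `TABLE_1` and the helper `gene_density` are illustrative. On these numbers, E. coli and yeast pack hundreds of genes per megabase, whereas the human, mouse, and bovine genomes carry roughly eight.

```python
# Values transcribed from Table 1: (estimated genome size in Mb, estimated gene count).
TABLE_1 = {
    "Human": (3100, 25000),
    "Mouse": (2900, 25000),
    "Bovine": (2860, 22000),
    "Fruit fly": (165, 13000),
    "Arabidopsis": (157, 25000),
    "Roundworm": (97, 19000),
    "Yeast": (12, 6000),
    "E. coli": (4.6, 3200),
}


def gene_density(table):
    """Genes per megabase for each organism."""
    return {name: genes / size_mb for name, (size_mb, genes) in table.items()}


if __name__ == "__main__":
    # Print organisms from least to most gene-dense.
    for name, density in sorted(gene_density(TABLE_1).items(), key=lambda kv: kv[1]):
        print(f"{name:12s} {density:8.1f} genes/Mb")
```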

Sequence alignment

Alignments are used to capture information about similar sequences, such as ancestry, common evolutionary descent, or common structure and function. Alignments can be done for both nucleotide and protein sequences. [47] [48] They comprise local or global pairwise alignments and multiple sequence alignments. Global alignments can be found with a dynamic programming algorithm known as the Needleman–Wunsch algorithm, whereas the Smith–Waterman algorithm is used to find local alignments. With the exponential growth of sequence databases and the emergence of longer sequences, there is heightened interest in faster, approximate, or heuristic alignment procedures; among these, the FASTA and BLAST algorithms are prominent for local pairwise alignment. Recent years have seen the development of programs tailored to aligning lengthy sequences, such as MUMmer (1999), BLASTZ (2003), and AVID (2003). While BLASTZ adopts a local approach, MUMmer and AVID are geared towards global alignment. One effective strategy for combining the benefits of local and global approaches is to integrate them. First, a rapid variant of BLAST known as BLAT is used to identify homologous "anchor" regions. These anchors are then examined to identify sets that share a conserved order and orientation, and such sets of anchors are aligned using a global strategy.
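The Needleman–Wunsch algorithm fills a dynamic-programming table in which each cell holds the best score for aligning two prefixes, then traces back from the bottom-right cell to recover one optimal global alignment. The sketch below assumes a simple linear gap penalty and the illustrative scores match = +1, mismatch = -1, gap = -1. Production aligners use substitution matrices and affine gap penalties, and Smith–Waterman differs mainly by clamping cell scores at zero and tracing back from the highest-scoring cell.

```python
from typing import Tuple


def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -1) -> Tuple[int, str, str]:
    """Global alignment of a and b with a linear gap penalty.

    Returns (optimal score, aligned a, aligned b); '-' marks a gap.
    """
    def sub(x: str, y: str) -> int:
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap  # a[:i] aligned entirely against gaps
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,   # gap in b
                              score[i][j - 1] + gap)   # gap in a
    # Trace back from the bottom-right cell, preferring diagonal moves.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))
```

For instance, `needleman_wunsch("ACGT", "AGT")` returns `(2, "ACGT", "A-GT")`: three matches and one gap.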

Additionally, ongoing efforts focus on speeding up existing algorithms so that they can handle the vast amount of genome sequence data. MAVID is another noteworthy alignment program, designed specifically for aligning multiple genomes.
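The anchor-based strategy described in the preceding paragraph, keeping only anchors that share a conserved order and orientation before global alignment, can be sketched as a longest-increasing-subsequence problem. The code below assumes each anchor is a hypothetical `(position_in_A, position_in_B, strand)` tuple and keeps only forward-strand anchors; it returns the longest chain whose positions increase in both genomes. This is a simplified stand-in, not the procedure of any specific aligner: it counts anchors rather than weighting them by length or score, and it does not chain inverted (`-`) anchors.

```python
from bisect import bisect_left
from typing import List, Tuple

Anchor = Tuple[int, int, str]  # (position in genome A, position in genome B, strand)


def collinear_chain(anchors: List[Anchor]) -> List[Anchor]:
    """Longest chain of '+' anchors whose positions increase in both genomes.

    Anchors are sorted by A position (ties broken by descending B position so two
    anchors at the same A position cannot both be chained); a longest strictly
    increasing subsequence of B positions is then found in O(n log n).
    """
    fwd = sorted((a for a in anchors if a[2] == "+"), key=lambda a: (a[0], -a[1]))
    tails: List[int] = []     # tails[k]: smallest B position ending a chain of length k + 1
    tail_idx: List[int] = []  # index in fwd of the anchor stored in tails[k]
    prev = [-1] * len(fwd)    # predecessor of each anchor in its best chain
    for i, (_, pos_b, _) in enumerate(fwd):
        k = bisect_left(tails, pos_b)
        if k == len(tails):
            tails.append(pos_b)
            tail_idx.append(i)
        else:
            tails[k] = pos_b
            tail_idx[k] = i
        prev[i] = tail_idx[k - 1] if k > 0 else -1
    chain: List[Anchor] = []
    i = tail_idx[-1] if tail_idx else -1
    while i != -1:
        chain.append(fwd[i])
        i = prev[i]
    return chain[::-1]


# Invented anchors: (20, 300) breaks the order and (50, 150) is on the reverse strand.
anchors = [(10, 100, "+"), (20, 300, "+"), (30, 200, "+"), (40, 400, "+"),
           (50, 150, "-"), (60, 500, "+")]
print(collinear_chain(anchors))
# [(10, 100, '+'), (30, 200, '+'), (40, 400, '+'), (60, 500, '+')]
```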

Pairwise comparison: Pairwise comparison of genomic sequence data is widely used in comparative gene prediction. Many studies in comparative functional genomics rely on pairwise comparisons, in which traits of each gene are compared with traits of other genes across species. This method yields many more comparisons than unique observations, making each comparison dependent on others. [49] [50]
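The dependence among pairwise comparisons follows from simple counting: n species yield n(n-1)/2 pairs, and every species takes part in n-1 of them, so a single observation is reused across many comparisons. The helper `pairwise_design` and the five-species list below are purely illustrative.

```python
from itertools import combinations


def pairwise_design(species):
    """Return all unordered species pairs and how many pairs each species appears in."""
    pairs = list(combinations(species, 2))
    reuse = {s: sum(s in pair for pair in pairs) for s in species}
    return pairs, reuse


# Hypothetical set of five species, used only to illustrate the counting.
species = ["human", "mouse", "fly", "worm", "yeast"]
pairs, reuse = pairwise_design(species)
print(len(pairs))      # 10 comparisons from 5 observations
print(reuse["human"])  # each species appears in 4 of them
```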

Multiple comparisons: The comparison of multiple genomes is a natural extension of pairwise inter-specific comparisons. Such comparisons typically aim to identify conserved regions across two phylogenetic scales: 1. Deep comparisons, often referred to as phylogenetic footprinting, [51] reveal conservation across higher taxonomic units such as vertebrates. [52] 2. Shallow comparisons, recently termed phylogenetic shadowing, [53] probe conservation across a group of closely related species.

Chromosome-by-chromosome variation of indicine and taurine cattle. The genomic structural differences on chromosome X between indicine (Bos indicus - Nelore cattle) and taurine cattle (Bos taurus - Hereford cattle) were identified using the SyRI tool.

Whole-genome alignment

Whole-genome alignment (WGA) involves predicting evolutionary relationships at the nucleotide level between two or more genomes. It integrates elements of collinear sequence alignment and gene orthology prediction, and it is more challenging than either because of the vast size and intricate nature of whole genomes. Despite this complexity, numerous methods have emerged to tackle the problem, because WGAs play a crucial role in various genome-wide analyses, such as phylogenetic inference, genome annotation, and function prediction. [54] SyRI (Synteny and Rearrangement Identifier) is one such method: it uses whole-genome alignments to identify both structural and sequence differences between two whole-genome assemblies. Taking WGAs as input, SyRI first scans for differences in genome structure and then identifies local sequence variations within both rearranged and non-rearranged (syntenic) regions. [55]

Example of a phylogenetic tree created from an alignment of 250 unique spike protein sequences from the Betacoronavirus family.

Phylogenetic reconstruction

Another computational method for comparative genomics is phylogenetic reconstruction, which describes evolutionary relationships in terms of common ancestors. The relationships are usually represented in a tree called a phylogenetic tree. Similarly, coalescent theory is a retrospective model that traces the alleles of a gene in a population back to a single ancestral copy shared by members of the population, known as the most recent common ancestor. Analysis based on coalescent theory tries to predict the amount of time between the introduction of a mutation and a particular allele or gene distribution in a population; this time period equals how long ago the most recent common ancestor existed. The inheritance relationships are visualized in a form similar to a phylogenetic tree, and the coalescent (or gene genealogy) can be visualized using dendrograms. [56]
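As a concrete example of distance-based phylogenetic reconstruction, the sketch below implements UPGMA, one of the simplest clustering methods: it repeatedly merges the two closest clusters, places their common ancestor at half their distance, and averages distances weighted by cluster size. UPGMA is not named in the text above; it is chosen here only for brevity, and it assumes a constant rate of evolution (a molecular clock), which real data often violate. The taxon labels and distances are invented.

```python
from itertools import combinations
from typing import Dict, List


def upgma(names: List[str], dist: Dict[str, Dict[str, float]]) -> str:
    """Build a rooted ultrametric tree by UPGMA and return it in Newick format."""
    # Each cluster label maps to (Newick subtree, number of leaves, height of its root).
    clusters = {n: (n, 1, 0.0) for n in names}
    d = {frozenset((x, y)): dist[x][y] for x, y in combinations(names, 2)}
    while len(clusters) > 1:
        closest = min(d, key=d.get)  # pair of clusters at smallest distance
        a, b = sorted(closest)
        height = d[closest] / 2      # new ancestor sits halfway between them
        tree_a, size_a, h_a = clusters.pop(a)
        tree_b, size_b, h_b = clusters.pop(b)
        merged = f"({tree_a}:{height - h_a:g},{tree_b}:{height - h_b:g})"
        # Distance from the merged cluster to each remaining one: size-weighted average.
        for c in clusters:
            d[frozenset((merged, c))] = (size_a * d[frozenset((a, c))]
                                         + size_b * d[frozenset((b, c))]) / (size_a + size_b)
        d = {k: v for k, v in d.items() if a not in k and b not in k}
        clusters[merged] = (merged, size_a + size_b, height)
    return next(iter(clusters.values()))[0] + ";"


# Invented distances between four taxa.
names = ["A", "B", "C", "D"]
raw = {("A", "B"): 2, ("A", "C"): 6, ("A", "D"): 10,
       ("B", "C"): 6, ("B", "D"): 10, ("C", "D"): 10}
dist = {n: {} for n in names}
for (x, y), v in raw.items():
    dist[x][y] = dist[y][x] = v
print(upgma(names, dist))  # (((A:1,B:1):2,C:3):2,D:5);
```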

Example of synteny block and break. Genes located on chromosomes of two species are denoted in letters. Each gene is associated with a number representing the species they belong to (species 1 or 2). Orthologous genes are connected by dashed lines and genes without an orthologous relationship are treated as gaps in synteny programs.

Genome maps

An additional method in comparative genomics is genetic mapping. In genetic mapping, visualizing synteny is one way to see the preserved order of genes on chromosomes. It is usually used for chromosomes of related species, both of which result from a common ancestor. [58] This and other methods can shed light on evolutionary history. A recent study used comparative genomics to reconstruct 16 ancestral karyotypes across the mammalian phylogeny. The computational reconstruction showed how chromosomes rearranged themselves during mammal evolution. It gave insight into conservation of select regions often associated with the control of developmental processes. In addition, it helped to provide an understanding of chromosome evolution and genetic diseases associated with DNA rearrangements.[ citation needed ]

Image from the study Evolution of the ancestral mammalian karyotype and syntenic regions. It visualizes the evolutionary history of reconstructed mammalian chromosomes based on the human lineage.

Tools

Computational tools for analyzing sequences and complete genomes are developing quickly because of the availability of large amounts of genomic data, and comparative analysis tools are improving at the same time. Among the challenges of these analyses, visualizing the comparative results is especially important. [59]

Visualizing sequence conservation is a difficult task in comparative sequence analysis, because examining alignments of long genomic regions manually is highly inefficient. Internet-based genome browsers provide many useful tools for investigating genomic sequences because they integrate all sequence-based biological information on genomic regions. When extracting large amounts of relevant biological data, these browsers are easy to use and save time. [59]

An advantage of using online tools is that these websites are developed and updated constantly, and the new settings and content they make available can improve efficiency. [59]

Selected applications

Agriculture

Agriculture is a field that reaps the benefits of comparative genomics. Identifying the loci of advantageous genes is a key step in breeding crops that are optimized for greater yield, cost-efficiency, quality, and disease resistance. For example, one genome wide association study conducted on 517 rice landraces revealed 80 loci associated with several categories of agronomic performance, such as grain weight, amylose content, and drought tolerance. Many of the loci were previously uncharacterized. [74] Not only is this methodology powerful, it is also quick. Previous methods of identifying loci associated with agronomic performance required several generations of carefully monitored breeding of parent strains, a time-consuming effort that is unnecessary for comparative genomic studies. [75]

Medicine

Vaccine development

The medical field also benefits from the study of comparative genomics. In an approach known as reverse vaccinology, researchers can discover candidate antigens for vaccine development by analyzing the genome of a pathogen or a family of pathogens. [76] Applying a comparative genomics approach by analyzing the genomes of several related pathogens can lead to the development of vaccines that are multi-protective. A team of researchers employed such an approach to create a universal vaccine for Group B Streptococcus, a group of bacteria responsible for severe neonatal infection. [77] Comparative genomics can also be used to generate specificity for vaccines against pathogens that are closely related to commensal microorganisms. For example, researchers used comparative genomic analysis of commensal and pathogenic strains of E. coli to identify pathogen-specific genes as a basis for finding antigens that result in immune response against pathogenic strains but not commensal ones. [78] In May 2019, using the Global Genome Set, a team in the UK and Australia sequenced thousands of globally-collected isolates of Group A Streptococcus, providing potential targets for developing a vaccine against the pathogen, also known as S. pyogenes. [79]
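The comparative step in this kind of analysis can be sketched with set operations: keep genes present in every pathogenic strain and absent from every commensal strain. The sketch assumes gene content has already been determined (for example, by ortholog clustering) and that genes are identified by shared family labels; the function `pathogen_specific`, the strain names, and the gene IDs are invented. Real antigen screening adds further filters, such as predicted surface exposure, that are not shown here.

```python
from typing import Dict, Iterable, List


def pathogen_specific(pathogenic: Dict[str, Iterable[str]],
                      commensal: Dict[str, Iterable[str]]) -> List[str]:
    """Genes shared by every pathogenic strain and found in no commensal strain.

    Requires at least one pathogenic strain; the commensal set may be empty.
    """
    core = set.intersection(*(set(genes) for genes in pathogenic.values()))
    commensal_genes = set().union(*(set(genes) for genes in commensal.values()))
    return sorted(core - commensal_genes)


# Invented gene content for two pathogenic and two commensal strains.
pathogenic = {"P1": ["g1", "g2", "vir1", "vir2"], "P2": ["g1", "g2", "vir1", "adhA"]}
commensal = {"C1": ["g1", "g2"], "C2": ["g1", "adhA"]}
print(pathogen_specific(pathogenic, commensal))  # ['vir1']
```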

Personalized medicine

Personalized medicine, enabled by comparative genomics, represents a revolutionary approach in healthcare that tailors medical treatment and disease prevention to the individual patient's genetic makeup. [80] By analyzing genetic variation across populations and comparing it with an individual's genome, clinicians can identify specific genetic markers associated with disease susceptibility, drug metabolism, and treatment response. Genetic variants associated with drug metabolism pathways, drug targets, and adverse reactions can then guide the choice of medication, dosage, and treatment regimen for each patient. This approach minimizes the risk of adverse drug reactions, enhances treatment efficacy, and improves patient outcomes.

Cancer

Cancer genomics is a cutting-edge field within oncology that uses comparative genomics to transform cancer diagnosis, treatment, and prevention strategies. Comparative genomics plays a crucial role in cancer research by identifying driver mutations and by providing comprehensive analyses of mutations, copy number alterations, structural variants, gene expression, and DNA methylation profiles in large-scale studies across different cancer types. By comparing the genomes of cancer cells with those of healthy cells, researchers can uncover key genetic alterations driving tumorigenesis, tumor progression, and metastasis. This deep understanding of the genomic landscape of cancer has profound implications for precision oncology. Moreover, comparative genomics is instrumental in elucidating mechanisms of drug resistance, a major challenge in cancer treatment.

TCR loci from humans (H, top) and mice (M, bottom) are compared, with V segments in orange, other TCR elements in red, and non-TCR genes in purple. M6A, a putative methyltransferase; ZNF, a zinc-finger protein; OR, olfactory receptor genes; DAD1, defender against cell death. The sites of species-specific, processed pseudogenes are shown by gray triangles. See also GenBank accession numbers AE000658-62. Modified after Glusman et al. 2001.

Mouse models in immunology

T cells (also known as T lymphocytes or thymocytes) are immune cells that develop from stem cells in the bone marrow. They help defend the body from infection and may aid in the fight against cancer. Because of their morphological, physiological, and genetic resemblance to humans, mice and rats have long been the preferred animal models for biomedical research. Comparative medicine research is built on the ability to use information from one species to understand the same processes in another. By using comparative genomics to compare human and mouse T cells and their effects on the immune system, we can gain new insights into molecular pathways. To understand TCRs and their genes, Glusman and colleagues sequenced the human and mouse T cell receptor loci. TCR genes are well characterized and serve as a significant resource for supporting functional genomics and for understanding how genes and intergenic regions of the genome contribute to biological processes. [81]

T cell receptors are central to how the cellular immune system recognizes pathogens. One reason for sequencing the human and mouse TCR loci was to align the orthologous gene family sequences and discover conserved regions using comparative genomics. These conserved regions were expected to reflect two kinds of biological information: (1) exons and (2) regulatory sequences. In fact, most of the V, D, J, and C exons could be identified by this method. The variable regions are encoded by multiple distinct DNA elements that are rearranged and joined during T cell differentiation: variable (V), diversity (D), and joining (J) elements for the β and δ polypeptides, and V and J elements for the α and γ polypeptides.[Figure 1] In addition, several short noncoding conserved blocks were identified. Both human and mouse motifs are largely clustered in the 200 bp [Figure 2]. The known 3′ enhancers in the TCR/ were identified, and a conserved region of 100 bp in the mouse J intron was subsequently shown to have a regulatory function.
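Conserved regions between two orthologous loci are often located by scanning a pairwise alignment with a sliding window and flagging stretches whose percent identity exceeds a threshold, as in VISTA-style conservation plots. The Python sketch below assumes the two sequences are already aligned (equal length, with "-" for gaps). The function names, the toy sequences, and the default of 70% identity over 100-bp windows (a commonly used convention) are illustrative assumptions, not the procedure used by Glusman and colleagues.

```python
def windowed_identity(seq_a: str, seq_b: str, window: int = 100,
                      step: int = 1) -> list[tuple[int, float]]:
    """Percent identity in sliding windows over two pre-aligned sequences.

    Both inputs must have equal length, with '-' marking alignment gaps.
    A column counts as a match only if both residues are identical and not gaps.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    scores = []
    for start in range(0, len(seq_a) - window + 1, step):
        pairs = zip(seq_a[start:start + window], seq_b[start:start + window])
        matches = sum(1 for x, y in pairs if x == y and x != "-")
        scores.append((start, 100.0 * matches / window))
    return scores


def conserved_blocks(scores: list[tuple[int, float]], window: int,
                     min_identity: float = 70.0) -> list[tuple[int, int]]:
    """Merge overlapping or adjacent windows at or above min_identity into
    half-open (start, end) alignment intervals."""
    blocks: list[tuple[int, int]] = []
    for start, identity in scores:
        if identity < min_identity:
            continue
        end = start + window
        if blocks and start <= blocks[-1][1]:
            # Window touches or overlaps the previous block: extend that block.
            blocks[-1] = (blocks[-1][0], max(blocks[-1][1], end))
        else:
            blocks.append((start, end))
    return blocks


# Toy example (made-up sequences): the first 10 columns match, the last 10 do not.
seq_1 = "ACGTACGTAA" + "CCCCCCCCCC"
seq_2 = "ACGTACGTAA" + "GGGGGGGGGG"
scores = windowed_identity(seq_1, seq_2, window=5, step=5)
print(conserved_blocks(scores, window=5))  # [(0, 10)]
```

Real genomic scans use much longer windows and a step of one base; the tiny window here only keeps the example readable.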

[Figure 2] Gene structure of the human (top) and mouse (bottom) V, D, J, and C gene segments. The arrows represent the transcriptional direction of each TCR gene. Squares and circles represent segments in the direct and reverse orientation, respectively. Modified after Glusman et al. 2001.

Comparing genomic sequences within each locus (the physical location of a specific gene on a chromosome) and across species allows research into other mechanisms and regulatory signals. Some of these comparisons suggest new hypotheses about the evolution of TCRs, to be tested (and refined) by comparison with the TCR gene complement of other vertebrate species. A comparative genomic investigation of humans and mice will also allow the discovery and annotation of many other genes, as well as the identification of regulatory sequences in other species. [81]

Research

Comparative genomics also opens up new avenues in other areas of research. As DNA sequencing technology has become more accessible, the number of sequenced genomes has grown. With the increasing reservoir of available genomic data, the potency of comparative genomic inference has grown as well.

A notable case of this increased potency is found in recent primate research. Comparative genomic methods have allowed researchers to gather information about genetic variation, differential gene expression, and evolutionary dynamics in primates that were indiscernible using previous data and methods. [82]

Great Ape Genome Project

The Great Ape Genome Project used comparative genomic methods to investigate genetic variation in the six great ape species, finding healthy levels of variation in their gene pools despite shrinking population sizes. [83] Another study showed that patterns of DNA methylation, a known mechanism of gene regulation, differ between the prefrontal cortex of humans and that of chimpanzees, and implicated this difference in the evolutionary divergence of the two species. [84]


Related Research Articles

Bioinformatics: Computational analysis of large, complex sets of biological data

Bioinformatics is an interdisciplinary field of science that develops methods and software tools for understanding biological data, especially when the data sets are large and complex. Bioinformatics uses biology, chemistry, physics, computer science, computer programming, information engineering, mathematics and statistics to analyze and interpret biological data. The subsequent process of analyzing and interpreting data is referred to as computational biology.

Human genome: Complete set of nucleic acid sequences for humans

The human genome is a complete set of nucleic acid sequences for humans, encoded as DNA within the 23 chromosome pairs in cell nuclei and in a small DNA molecule found within individual mitochondria. These are usually treated separately as the nuclear genome and the mitochondrial genome. Human genomes include both protein-coding DNA sequences and various types of DNA that do not encode proteins. The latter is a diverse category that includes DNA coding for non-translated RNA, such as that for ribosomal RNA, transfer RNA, ribozymes, small nuclear RNAs, and several types of regulatory RNAs. It also includes promoters and their associated gene-regulatory elements, DNA playing structural and replicatory roles, such as scaffolding regions, telomeres, centromeres, and origins of replication, plus large numbers of transposable elements, inserted viral DNA, non-functional pseudogenes and simple, highly repetitive sequences. Introns make up a large percentage of non-coding DNA. Some of this non-coding DNA is non-functional junk DNA, such as pseudogenes, but there is no firm consensus on the total amount of junk DNA.

Computational biology: Branch of biology

Computational biology refers to the use of data analysis, mathematical modeling and computational simulations to understand biological systems and relationships. An intersection of computer science, biology, and big data, the field also has foundations in applied mathematics, chemistry, and genetics. It differs from biological computing, a subfield of computer science and engineering which uses bioengineering to build computers.

In bioinformatics, sequence analysis is the process of subjecting a DNA, RNA or peptide sequence to any of a wide range of analytical methods to understand its features, function, structure, or evolution. It can be performed on the entire genome, transcriptome or proteome of an organism, and can also involve only selected segments or regions, like tandem repeats and transposable elements. Methodologies used include sequence alignment, searches against biological databases, and others.
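Sequence alignment, one of the methodologies named above, can be illustrated with the classic Needleman-Wunsch dynamic program for global alignment. This Python sketch returns only the optimal score and assumes a simple linear scoring scheme (match +1, mismatch -1, gap -1) chosen for illustration; practical aligners typically use substitution matrices and affine gap penalties, and also recover the alignment itself by tracing back through the table.

```python
def global_alignment_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -1) -> int:
    """Optimal Needleman-Wunsch global alignment score with a linear gap penalty."""
    rows, cols = len(a) + 1, len(b) + 1
    # dp[i][j] holds the best score for aligning a[:i] with b[:j].
    dp = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        dp[i][0] = i * gap  # a[:i] aligned entirely against gaps
    for j in range(1, cols):
        dp[0][j] = j * gap  # b[:j] aligned entirely against gaps
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = dp[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            dp[i][j] = max(diagonal,
                           dp[i - 1][j] + gap,   # a[i-1] aligned to a gap in b
                           dp[i][j - 1] + gap)   # b[j-1] aligned to a gap in a
    return dp[rows - 1][cols - 1]


print(global_alignment_score("ACGT", "AGT"))  # 2 (A C G T over A - G T)
```

The table has one cell per pair of prefixes, so time and memory grow with the product of the two sequence lengths, which is why whole-genome aligners rely on seeding and other heuristics instead.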

Nucleic acid sequence: Succession of nucleotides in a nucleic acid

A nucleic acid sequence is a succession of bases within the nucleotides forming alleles within a DNA (GACT) or RNA (GACU) molecule. This succession is denoted by a series of letters drawn from a set of five different letters that indicate the order of the nucleotides. By convention, sequences are usually presented from the 5' end to the 3' end. For DNA, with its double helix, there are two possible directions for the notated sequence; of these two, the sense strand is used. Because nucleic acids are normally linear (unbranched) polymers, specifying the sequence is equivalent to defining the covalent structure of the entire molecule. For this reason, the nucleic acid sequence is also termed the primary structure.
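Because the two strands of a DNA double helix are antiparallel and complementary, the opposite strand can be recovered from a sequence written 5' to 3' by complementing each base and reversing the order. A minimal Python sketch, assuming DNA input over A, C, G and T (RNA's U is not handled, and the helper name is illustrative):

```python
# Maps each DNA base to its Watson-Crick partner; any other character
# (for example N) passes through unchanged.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the opposite DNA strand, also written 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]


print(reverse_complement("ATGGCC"))  # GGCCAT
```

Applying the function twice returns the original sequence, which is a quick sanity check that both strands describe the same molecule.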

Gene duplication is a major mechanism through which new genetic material is generated during molecular evolution. It can be defined as any duplication of a region of DNA that contains a gene. Gene duplications can arise as products of several types of errors in DNA replication and repair machinery as well as through fortuitous capture by selfish genetic elements. Common sources of gene duplications include ectopic recombination, retrotransposition events, aneuploidy, polyploidy, and replication slippage.

In computational biology, gene prediction or gene finding refers to the process of identifying the regions of genomic DNA that encode genes. This includes protein-coding genes as well as RNA genes, but may also include prediction of other functional elements such as regulatory regions. Gene finding is one of the first and most important steps in understanding the genome of a species once it has been sequenced.
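The simplest form of gene finding is a scan for open reading frames (ORFs): stretches that begin with a start codon and continue, in frame, to a stop codon. The Python sketch below is only a naive illustration, assuming the standard genetic code (ATG start; TAA, TAG and TGA stops), the forward strand only, and a hypothetical minimum length of 30 codons; practical gene finders combine such signals with statistical models and external evidence, and also scan the reverse strand.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 30) -> list[tuple[int, int]]:
    """Return half-open (start, end) coordinates of forward-strand ORFs.

    An ORF here starts at ATG and ends after the first in-frame stop codon;
    min_codons counts codons from the ATG up to, but not including, the stop.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] != "ATG":
                i += 3
                continue
            j = i + 3
            while j + 3 <= len(seq) and seq[j:j + 3] not in STOP_CODONS:
                j += 3
            if j + 3 > len(seq):
                break  # no in-frame stop before the sequence ends
            if (j - i) // 3 >= min_codons:
                orfs.append((i, j + 3))
            i = j + 3  # resume scanning after the stop codon
    return sorted(orfs)


print(find_orfs("CCATGAAAAAAAAATAAGG", min_codons=4))  # [(2, 17)]
```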

Sequence homology: Shared ancestry between DNA, RNA or protein sequences

Sequence homology is the biological homology between DNA, RNA, or protein sequences, defined in terms of shared ancestry in the evolutionary history of life. Two segments of DNA can have shared ancestry because of three phenomena: either a speciation event (orthologs), or a duplication event (paralogs), or else a horizontal gene transfer event (xenologs).
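Orthologs between two genomes are often proposed with the reciprocal-best-hit (RBH) heuristic: gene a in genome A and gene b in genome B are paired when each is the other's highest-scoring match. This Python sketch assumes all-against-all similarity scores (for example, BLAST bit scores) have already been computed in both directions; the gene names and scores are hypothetical. RBH is only a heuristic: after a duplication or differential gene loss it can pair paralogs, which is one reason tree-based methods are also used.

```python
def best_hits(scores: dict[tuple[str, str], float]) -> dict[str, str]:
    """Map each query gene to its highest-scoring subject gene.

    scores[(query, subject)] is a similarity score (higher is better).
    On ties, the pair encountered first is kept.
    """
    best: dict[str, tuple[str, float]] = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b: dict[tuple[str, str], float],
                         b_vs_a: dict[tuple[str, str], float]) -> list[tuple[str, str]]:
    """Pairs (a, b) where b is a's best hit in genome B and a is b's best hit in genome A."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Hypothetical gene names and scores, for illustration only.
a_vs_b = {("a1", "b1"): 950, ("a1", "b2"): 400, ("a2", "b1"): 320, ("a2", "b2"): 300}
b_vs_a = {("b1", "a1"): 940, ("b1", "a2"): 310, ("b2", "a1"): 390, ("b2", "a2"): 295}
print(reciprocal_best_hits(a_vs_b, b_vs_a))  # [('a1', 'b1')]
```

In this toy input, a2's best hit is b1, but b1's best hit is a1, so a2 is left without an ortholog call.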

Synteny: Co-localization of genetic loci on a chromosome, or the conservation of gene order

In genetics, the term synteny refers to two related concepts:

Computational genomics refers to the use of computational and statistical analysis to decipher biology from genome sequences and related data, including both DNA and RNA sequence as well as other "post-genomic" data. Because it combines computational and statistical approaches to understanding gene function with statistical association analysis, this field is also often referred to as computational and statistical genetics/genomics. As such, computational genomics may be regarded as a subset of bioinformatics and computational biology, but with a focus on using whole genomes to understand the principles of how the DNA of a species controls its biology at the molecular level and beyond. With the current abundance of massive biological datasets, computational studies have become one of the most important means to biological discovery.

Conserved sequence: Similar DNA, RNA or protein sequences within genomes or among species

In evolutionary biology, conserved sequences are identical or similar sequences in nucleic acids or proteins across species, or within a genome, or between donor and receptor taxa. Conservation indicates that a sequence has been maintained by natural selection.

Paleopolyploidy: State of having undergone whole genome duplication in deep evolutionary time

Paleopolyploidy is the result of genome duplications which occurred at least several million years ago (MYA). Such an event could either double the genome of a single species (autopolyploidy) or combine those of two species (allopolyploidy). Because of functional redundancy, genes are rapidly silenced or lost from the duplicated genomes. Most paleopolyploids, through evolutionary time, have lost their polyploid status through a process called diploidization, and are currently considered diploids, e.g., baker's yeast, Arabidopsis thaliana, and perhaps humans.

The completion of the human genome sequence in the early 2000s was a turning point in genomics research. Scientists have since conducted a series of studies into the activities of individual genes and of the genome as a whole. The human genome contains around 3 billion nucleotide base pairs, and the huge quantity of data created necessitates accessible tools to explore and interpret this information in order to investigate the genetic basis of disease, evolution, and biological processes. The field of genomics has continued to grow, with new sequencing technologies and computational tools making it easier to study the genome.

k-mer: Substrings of length k contained in a biological sequence

In bioinformatics, k-mers are substrings of length $k$ contained within a biological sequence. They are used primarily in computational genomics and sequence analysis, where they are composed of nucleotides; applications include assembling DNA sequences, improving heterologous gene expression, identifying species in metagenomic samples, and creating attenuated vaccines. Usually, the term k-mer refers to all of a sequence's subsequences of length $k$, such that the sequence AGAT would have four monomers (A, G, A, and T), three 2-mers (AG, GA, and AT), two 3-mers (AGA and GAT) and one 4-mer (AGAT). More generally, a sequence of length $L$ will have $L-k+1$ k-mers and $n^{k}$ total possible k-mers, where $n$ is the number of possible monomers (for example, four in the case of DNA).
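The counting rules for k-mers (a sequence of length L has L - k + 1 overlapping k-mers, and an alphabet of n letters allows n^k possible k-mers) can be checked with a short Python sketch. It assumes a four-letter DNA alphabet by default, and the function names are illustrative.

```python
from collections import Counter


def kmers(seq: str, k: int) -> list[str]:
    """All overlapping substrings of length k, in order (L - k + 1 of them)."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def kmer_counts(seq: str, k: int) -> Counter:
    """Number of occurrences of each distinct k-mer in seq."""
    return Counter(kmers(seq, k))


def possible_kmers(k: int, alphabet_size: int = 4) -> int:
    """Size of the k-mer space, n**k (n = 4 for DNA)."""
    return alphabet_size ** k


for k in range(1, 5):
    print(k, kmers("AGAT", k), possible_kmers(k))
# 1 ['A', 'G', 'A', 'T'] 4
# 2 ['AG', 'GA', 'AT'] 16
# 3 ['AGA', 'GAT'] 64
# 4 ['AGAT'] 256
```

Counting k-mers with a hash map like this is the starting point for many assembly and metagenomic classification methods, although large-scale tools use more memory-efficient data structures.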

Population genomics is the large-scale comparison of DNA sequences of populations. Population genomics is a neologism that is associated with population genetics. Population genomics studies genome-wide effects to improve our understanding of microevolution so that we may learn the phylogenetic history and demography of a population.

Pathogenomics is a field which uses high-throughput screening technology and bioinformatics to study encoded microbe resistance, as well as virulence factors (VFs), which enable a microorganism to infect a host and possibly cause disease. This includes studying genomes of pathogens which cannot be cultured outside of a host. In the past, researchers and medical professionals found it difficult to study and understand pathogenic traits of infectious organisms. With newer technology, pathogen genomes can be identified and sequenced in a much shorter time and at a lower cost, thus improving the ability to diagnose, treat, and even predict and prevent pathogenic infections and disease. It has also allowed researchers to better understand genome evolution events (gene loss, gain, duplication, and rearrangement) and how those events impact pathogen resistance and ability to cause disease. This influx of information has created a need for bioinformatics tools and databases to analyze the vast amounts of data and make them accessible to researchers, and it has raised ethical questions about the wisdom of reconstructing previously extinct and deadly pathogens in order to better understand virulence.

DNA annotation: The process of describing the structure and function of a genome

In molecular biology and genetics, DNA annotation or genome annotation is the process of describing the structure and function of the components of a genome, by analyzing and interpreting them in order to extract their biological significance and understand the biological processes in which they participate. Among other things, it identifies the locations of genes and all the coding regions in a genome and determines what those genes do.

hCONDELs refer to regions of deletions within the human genome containing sequences that are highly conserved among closely related species. Almost all of these deletions fall within regions that perform non-coding functions. These represent a new class of regulatory sequences and may have played an important role in the development of specific traits and behavior that distinguish closely related organisms from each other.

De novo gene birth: Evolution of novel genes from non-genic DNA sequence

De novo gene birth is the process by which new genes evolve from non-coding DNA. De novo genes represent a subset of novel genes, and may be protein-coding or instead act as RNA genes. The processes that govern de novo gene birth are not well understood, although several models exist that describe possible mechanisms by which de novo gene birth may occur.

Genome sequencing of endangered species is the application of Next Generation Sequencing (NGS) technologies in the field of conservation biology, with the aim of generating life history, demographic and phylogenetic data of relevance to the management of endangered wildlife.

References

  1. Darling AE, Miklós I, Ragan MA (July 2008). "Dynamics of genome rearrangement in bacterial populations". PLOS Genetics. 4 (7): e1000128. doi:10.1371/journal.pgen.1000128. PMC 2483231. PMID 18650965.
  2. Touchman J (2010). "Comparative Genomics". Nature Education Knowledge. 3 (10): 13.
  3. Xia X (2013). Comparative Genomics. SpringerBriefs in Genetics. Heidelberg: Springer. doi:10.1007/978-3-642-37146-2. ISBN 978-3-642-37145-5. S2CID 5491782.
  4. Wei L, Liu Y, Dubchak I, Shon J, Park J (April 2002). "Comparative genomics approaches to study organism similarities and differences". Journal of Biomedical Informatics. 35 (2): 142–150. doi:10.1016/s1532-0464(02)00506-3. PMID 12474427.
  5. Bennett MD, Leitch IJ, Price HJ, Johnston JS (April 2003). "Comparisons with Caenorhabditis (approximately 100 Mb) and Drosophila (approximately 175 Mb) using flow cytometry show genome size in Arabidopsis to be approximately 157 Mb and thus approximately 25% larger than the Arabidopsis genome initiative estimate of approximately 125 Mb". Annals of Botany. 91 (5): 547–557. doi:10.1093/aob/mcg057. PMC 4242247. PMID 12646499.
  6. Zimin AV, Delcher AL, Florea L, Kelley DR, Schatz MC, Puiu D, et al. (2009). "A whole-genome assembly of the domestic cow, Bos taurus". Genome Biology. 10 (4): R42. doi:10.1186/gb-2009-10-4-r42. ISSN 1465-6906. PMC 2688933. PMID 19393038.
  7. Holečková B, Schwarzbacherová V, Galdíková M, Koleničová S, Halušková J, Staničová J, et al. (2021-08-27). "Chromosomal Aberrations in Cattle". Genes. 12 (9): 1330. doi:10.3390/genes12091330. ISSN 2073-4425. PMC 8468509. PMID 34573313.
  8. Elsik CG, Tellam RL, Worley KC (2009-04-24). "The Genome Sequence of Taurine Cattle: A window to ruminant biology and evolution". Science. 324 (5926): 522–528. Bibcode:2009Sci...324..522A. doi:10.1126/science.1169588. ISSN 0036-8075. PMC 2943200. PMID 19390049.
  9. Liu D, Hunt M, Tsai IJ (January 2018). "Inferring synteny between genome assemblies: a systematic evaluation". BMC Bioinformatics. 19 (1): 26. doi:10.1186/s12859-018-2026-4. PMC 5791376. PMID 29382321.
  10. Vergara IA, Chen N (September 2010). "Large synteny blocks revealed between Caenorhabditis elegans and Caenorhabditis briggsae genomes using OrthoCluster". BMC Genomics. 11: 516. doi:10.1186/1471-2164-11-516. PMC 2997010. PMID 20868500.
  11. Tang H, Lyons E, Pedersen B, Schnable JC, Paterson AH, Freeling M (April 2011). "Screening synteny blocks in pairwise genome comparisons through integer programming". BMC Bioinformatics. 12: 102. doi:10.1186/1471-2105-12-102. PMC 3088904. PMID 21501495.
  12. Ehrlich J, Sankoff D, Nadeau JH (September 1997). "Synteny conservation and chromosome rearrangements during mammalian evolution". Genetics. 147 (1): 289–296. doi:10.1093/genetics/147.1.289. PMC 1208112. PMID 9286688.
  13. Zhang G, Li B, Li C, Gilbert MT, Jarvis ED, Wang J (2014-12-11). "Comparative genomic data of the Avian Phylogenomics Project". GigaScience. 3 (1): 26. doi:10.1186/2047-217X-3-26. PMC 4322804. PMID 25671091.
  14. Howe KL, Bolt BJ, Cain S, Chan J, Chen WJ, Davis P, et al. (January 2016). "WormBase 2016: expanding to enable helminth genomic research". Nucleic Acids Research. 44 (D1): D774–D780. doi:10.1093/nar/gkv1217. PMC 4702863. PMID 26578572.
  15. The C. elegans Sequencing Consortium (December 1998). "Genome sequence of the nematode C. elegans: a platform for investigating biology". Science. 282 (5396): 2012–2018. doi:10.1126/science.282.5396.2012. PMID 9851916.
  16. Wong S, Wolfe KH (July 2005). "Birth of a metabolic gene cluster in yeast by adaptive gene relocation". Nature Genetics. 37 (7): 777–782. doi:10.1038/ng1584. PMID 15951822.
  17. Luebeck EG (October 2010). "Cancer: Genomic evolution of metastasis". Nature. 467 (7319): 1053–1055. Bibcode:2010Natur.467.1053L. doi:10.1038/4671053a. PMID 20981088.
  18. Ruelens P, de Maagd RA, Proost S, Theißen G, Geuten K, Kaufmann K (2013). "FLOWERING LOCUS C in monocots and the tandem origin of angiosperm-specific MADS-box genes". Nature Communications. 4: 2280. Bibcode:2013NatCo...4.2280R. doi:10.1038/ncomms3280. PMID 23955420.
  19. Kemkemer C, Kohn M, Cooper DN, Froenicke L, Högel J, Hameister H, et al. (April 2009). "Gene synteny comparisons between different vertebrates provide new insights into breakage and fusion events during mammalian karyotype evolution". BMC Evolutionary Biology. 9 (1): 84. Bibcode:2009BMCEE...9...84K. doi:10.1186/1471-2148-9-84. PMC 2681463. PMID 19393055.
  20. Koonin EV, Galperin MY (2003). Sequence - Evolution - Function: Computational approaches in comparative genomics. Dordrecht: Springer Science+Business Media.
  21. Hu B, Xie G, Lo CC, Starkenburg SR, Chain PS (November 2011). "Pathogen comparative genomics in the next-generation sequencing era: genome alignments, pangenomics and metagenomics". Briefings in Functional Genomics. 10 (6): 322–333. doi:10.1093/bfgp/elr042. PMID 22199376.
  22. Russel PJ, Hertz PE, McMillan B (2011). Biology: The Dynamic Science (2nd ed.). Belmont, CA: Brooks/Cole. pp. 409–410.
  23. Argos P, Kamer G, Nicklin MJ, Wimmer E (September 1984). "Similarity in gene organization and homology between proteins of animal picornaviruses and a plant comovirus suggest common ancestry of these virus families". Nucleic Acids Research. 12 (18): 7251–7267. doi:10.1093/nar/12.18.7251. PMC 320155. PMID 6384934.
  24. McGeoch DJ, Davison AJ (May 1986). "DNA sequence of the herpes simplex virus type 1 gene encoding glycoprotein gH, and identification of homologues in the genomes of varicella-zoster virus and Epstein-Barr virus". Nucleic Acids Research. 14 (10): 4281–4292. doi:10.1093/nar/14.10.4281. PMC 339861. PMID 3012465.
  25. Fleischmann RD, Adams MD, White O, Clayton RA, Kirkness EF, Kerlavage AR, et al. (July 1995). "Whole-genome random sequencing and assembly of Haemophilus influenzae Rd". Science. 269 (5223): 496–512. Bibcode:1995Sci...269..496F. doi:10.1126/science.7542800. PMID 7542800.
  26. Fraser CM, Gocayne JD, White O, Adams MD, Clayton RA, Fleischmann RD, et al. (October 1995). "The minimal gene complement of Mycoplasma genitalium". Science. 270 (5235): 397–403. Bibcode:1995Sci...270..397F. doi:10.1126/science.270.5235.397. PMID 7569993. S2CID 29825758.
  27. Goffeau A, Barrell BG, Bussey H, Davis RW, Dujon B, Feldmann H, et al. (October 1996). "Life with 6000 genes". Science. 274 (5287): 546, 563–546, 567. Bibcode:1996Sci...274..546G. doi:10.1126/science.274.5287.546. PMID 8849441. S2CID 16763139.
  28. Adams MD, Celniker SE, Holt RA, Evans CA, Gocayne JD, Amanatides PG, et al. (March 2000). "The genome sequence of Drosophila melanogaster". Science. 287 (5461): 2185–2195. Bibcode:2000Sci...287.2185.. CiteSeerX 10.1.1.549.8639. doi:10.1126/science.287.5461.2185. PMID 10731132.
  29. Rubin GM, Yandell MD, Wortman JR, Gabor Miklos GL, Nelson CR, Hariharan IK, et al. (March 2000). "Comparative genomics of the eukaryotes". Science. 287 (5461): 2204–2215. Bibcode:2000Sci...287.2204.. doi:10.1126/science.287.5461.2204. PMC 2754258. PMID 10731134.
  30. Batzoglou S, Pachter L, Mesirov JP, Berger B, Lander ES (July 2000). "Human and mouse gene structure: comparative analysis and application to exon prediction". Genome Research. 10 (7): 950–958. doi:10.1101/gr.10.7.950. PMC 310911. PMID 10899144.
  31. Ureta-Vidal A, Ettwiller L, Birney E (April 2003). "Comparative genomics: genome-wide analysis in metazoan eukaryotes". Nature Reviews. Genetics. 4 (4): 251–262. doi:10.1038/nrg1043. PMID 12671656. S2CID 2037634.
  32. Stein LD, Bao Z, Blasiar D, Blumenthal T, Brent MR, Chen N, et al. (November 2003). "The genome sequence of Caenorhabditis briggsae: a platform for comparative genomics". PLOS Biology. 1 (2): E45. doi:10.1371/journal.pbio.0000045. PMC 261899. PMID 14624247.
  33. "Newly Sequenced Worm a Boon for Worm Biologists". PLOS Biology. 1 (2): e4. 2003. doi:10.1371/journal.pbio.0000044. PMC 261884.
  34. Liu GE, Hou Y, Zhu B, Cardone MF, Jiang L, Cellamare A, et al. (May 2010). "Analysis of copy number variations among diverse cattle breeds". Genome Research. 20 (5): 693–703. doi:10.1101/gr.105403.110. PMC 2860171. PMID 20212021.
  35. Liu Y, Mu Y, Wang W, Ahmed Z, Wei X, Lei C, et al. (2023). "Analysis of genomic copy number variations through whole-genome scan in Chinese Qaidam cattle". Frontiers in Veterinary Science. 10: 1148070. doi:10.3389/fvets.2023.1148070. PMC 10103646. PMID 37065216.
  36. "Copy Number Variation | Learn Science at Scitable". www.nature.com. Retrieved 2024-05-03.
  37. Bornstein K, Gryan G, Chang ES, Marchler-Bauer A, Schneider VA (September 2023). "The NIH Comparative Genomics Resource: addressing the promises and challenges of comparative genomics on human health". BMC Genomics. 24 (1): 575. doi:10.1186/s12864-023-09643-4. PMC 10523801. PMID 37759191.
  38. Zoonomia C, Serres A, Armstrong J, Johnson J, Marinescu VD, Murén E, et al. (November 2020). "A comparative genomics multitool for scientific discovery and conservation". Nature. 587 (7833): 240–245. Bibcode:2020Natur.587..240Z. doi:10.1038/s41586-020-2876-6. PMC 7759459. PMID 33177664.
  39. Lappalainen T, Scott AJ, Brandt M, Hall IM (March 2019). "Genomic Analysis in the Age of Human Genome Sequencing". Cell. 177 (1): 70–84. doi:10.1016/j.cell.2019.02.032. PMC 6532068. PMID 30901550.
  40. Kircher M, Witten DM, Jain P, O'Roak BJ, Cooper GM, Shendure J (March 2014). "A general framework for estimating the relative pathogenicity of human genetic variants". Nature Genetics. 46 (3): 310–315. doi:10.1038/ng.2892. PMC 3992975. PMID 24487276.
  41. de la Fuente R, Díaz-Villanueva W, Arnau V, Moya A (February 2023). "Genomic Signature in Evolutionary Biology: A Review". Biology. 12 (2): 322. doi:10.3390/biology12020322. PMC 9953303. PMID 36829597.
  42. Verma S, Thakur A, Katoch S, Shekhar C, Wani AH, Kumar S, et al. (October 2017). "Differences in innate and adaptive immune response traits of Pahari (Indian non-descript indigenous breed) and Jersey crossbred cattle". Veterinary Immunology and Immunopathology. 192: 20–27. doi:10.1016/j.vetimm.2017.09.003. PMID 29042011.
  43. Cristianini N, Hahn M (2006). Introduction to Computational Genomics. Cambridge University Press. ISBN 978-0-521-67191-0.
  44. Pratas D, Silva RM, Pinho AJ, Ferreira PJ (May 2015). "An alignment-free method to find and visualise rearrangements between pairs of DNA sequences". Scientific Reports. 5: 10203. Bibcode:2015NatSR...510203P. doi:10.1038/srep10203. PMC 4434998. PMID 25984837.
  45. Via A, De Las Rivas J, Attwood TK, Landsman D, Brazas MD, Leunissen JA, et al. (October 2011). "Ten simple rules for developing a short bioinformatics training course". PLOS Computational Biology. 7 (10): e1002245. Bibcode:2011PLSCB...7E2245V. doi:10.1371/journal.pcbi.1002245. PMC 3203054. PMID 22046119.
  46. Damas J, Corbo M, Kim J, Turner-Maier J, Farré M, Larkin DM, et al. (October 2022). "Evolution of the ancestral mammalian karyotype and syntenic regions". Proceedings of the National Academy of Sciences of the United States of America. 119 (40): e2209139119. Bibcode:2022PNAS..11909139D. doi:10.1073/pnas.2209139119. PMC 9550189. PMID 36161960.
  47. Altschul SF, Pop M (2017). "Sequence Alignment". In Rosen KH, Shier DR, Goddard W (eds.). Handbook of Discrete and Combinatorial Mathematics (2nd ed.). Boca Raton (FL): CRC Press/Taylor & Francis. ISBN 978-1-58488-780-5. PMID 29206392. Retrieved 2022-12-18.
  48. Prjibelski AD, Korobeynikov AI, Lapidus AL (2019-01-01). "Sequence Analysis". In Ranganathan S, Gribskov M, Nakai K, Schönbach C (eds.). Encyclopedia of Bioinformatics and Computational Biology. Oxford: Academic Press. pp. 292–322. doi:10.1016/b978-0-12-809633-8.20106-4. ISBN 978-0-12-811432-2. S2CID 226247797.
  49. Haubold B, Wiehe T (September 2004). "Comparative genomics: methods and applications". Die Naturwissenschaften. 91 (9): 405–421. Bibcode:2004NW.....91..405H. doi:10.1007/s00114-004-0542-8. PMID 15278216.
  50. Dunn CW, Zapata F, Munro C, Siebert S, Hejnol A (January 2018). "Pairwise comparisons across species are problematic when analyzing functional genomic data". Proceedings of the National Academy of Sciences of the United States of America. 115 (3): E409–E417. Bibcode:2018PNAS..115E.409D. doi:10.1073/pnas.1707515115. PMC 5776959. PMID 29301966.
  51. Hardison RC, Oeltjen J, Miller W (October 1997). "Long human-mouse sequence alignments reveal novel regulatory elements: a reason to sequence the mouse genome". Genome Research. 7 (10): 959–966. doi:10.1101/gr.7.10.959. PMID 9331366.
  52. Elgar G, Sandford R, Aparicio S, Macrae A, Venkatesh B, Brenner S (April 1996). "Small is beautiful: comparative genomics with the pufferfish (Fugu rubripes)". Trends in Genetics. 12 (4): 145–150. doi:10.1016/0168-9525(96)10018-4. PMID 8901419.
  53. Boffelli D, McAuliffe J, Ovcharenko D, Lewis KD, Ovcharenko I, Pachter L, et al. (February 2003). "Phylogenetic shadowing of primate sequences to find functional regions of the human genome". Science. 299 (5611): 1391–1394. doi:10.1126/science.1081331. PMID 12610304.
  54. Dewey CN (2012). "Whole-Genome Alignment". In Anisimova M (ed.). Evolutionary Genomics. Methods in Molecular Biology. Vol. 855. Totowa, NJ: Humana Press. pp. 237–257. doi:10.1007/978-1-61779-582-4_8. ISBN 978-1-61779-581-7. PMID 22407711.
  55. Goel M, Sun H, Jiao W, Schneeberger K (2019). "SyRI: Finding genomic rearrangements and local sequence differences from whole-genome assemblies". Genome Biology. 20 (1): 277. doi:10.1186/s13059-019-1911-0. PMC 6913012. PMID 31842948.
  56. Haubold B, Wiehe T (September 2004). "Comparative genomics: methods and applications". Die Naturwissenschaften. 91 (9): 405–421. Bibcode:2004NW.....91..405H. doi:10.1007/s00114-004-0542-8. PMID 15278216. S2CID 2041895.
  57. Liu D, Hunt M, Tsai IJ (January 2018). "Inferring synteny between genome assemblies: a systematic evaluation". BMC Bioinformatics. 19 (1): 26. doi:10.1186/s12859-018-2026-4. PMC 5791376. PMID 29382321.
  58. Duran C, Edwards D, Batley J (2009). "Genetic Maps and the Use of Synteny". Plant Genomics. Methods in Molecular Biology. Vol. 513. pp. 41–55. doi:10.1007/978-1-59745-427-8_3. ISBN 978-1-58829-997-0. PMID 19347649.
  59. Bergman NH (2007). Bergman NH (ed.). Comparative Genomics: Volumes 1 and 2. Totowa, New Jersey: Humana Press. ISBN 978-193411-537-4. PMID 21250292.
  60. "UCSC Browser".
  61. "Ensembl Genome Browser". Archived from the original on 2013-10-21.
  62. "Map Viewer".
  63. "VISTA tools".
  64. Soh J, Gordon PM, Sensen CW (March 2012). "The Bluejay genome browser". Current Protocols in Bioinformatics. 37. John Wiley & Sons, Inc. Chapter 10, Unit 10.9. doi:10.1002/0471250953.bi1009s37. ISBN 9780471250951. PMID 22389011. S2CID 34553139.
  65. Goel M, Sun H, Jiao WB, Schneeberger K (December 2019). "SyRI: finding genomic rearrangements and local sequence differences from whole-genome assemblies". Genome Biology. 20 (1): 277. doi:10.1186/s13059-019-1911-0. PMC 6913012. PMID 31842948.
  66. Haug-Baltzell A, Stephens SA, Davey S, Scheidegger CE, Lyons E (July 2017). "SynMap2 and SynMap3D: web-based whole-genome synteny browsers". Bioinformatics. 33 (14): 2197–2198. doi:10.1093/bioinformatics/btx144. PMID 28334338.
  67. Lin HN, Hsu WL (February 2020). "GSAlign: an efficient sequence alignment tool for intra-species genomes". BMC Genomics. 21 (1): 182. doi:10.1186/s12864-020-6569-1. PMC 7041101. PMID 32093618.
  68. Thorvaldsdóttir H, Robinson JT, Mesirov JP (March 2013). "Integrative Genomics Viewer (IGV): high-performance genomics data visualization and exploration". Briefings in Bioinformatics. 14 (2): 178–192. doi:10.1093/bib/bbs017. PMC 3603213. PMID 22517427.
  69. Chen X, Schulz-Trieglaff O, Shaw R, Barnes B, Schlesinger F, Källberg M, et al. (April 2016). "Manta: rapid detection of structural variants and indels for germline and cancer sequencing applications". Bioinformatics. 32 (8): 1220–1222. doi:10.1093/bioinformatics/btv710. PMID 26647377.
  70. Abyzov A, Urban AE, Snyder M, Gerstein M (June 2011). "CNVnator: an approach to discover, genotype, and characterize typical and atypical CNVs from family and population genome sequencing". Genome Research. 21 (6): 974–984. doi:10.1101/gr.114876.110. PMC 3106330. PMID 21324876.
  71. Elnitski L, Riemer C, Schwartz S, Hardison R, Miller W (February 2003). "PipMaker: a World Wide Web server for genomic sequence alignments". Current Protocols in Bioinformatics. Chapter 10. Chapter 10, Unit 10.2. doi:10.1002/0471250953.bi1002s00. PMID 18428692.
  72. Pal K, Bystry V, Reigl T, Demko M, Krejci A, Touloumenidou T, et al. (December 2017). "GLASS: assisted and standardized assessment of gene variations from Sanger sequence trace data". Bioinformatics. 33 (23): 3802–3804. doi:10.1093/bioinformatics/btx423. PMID 29036643.
  73. Marçais G, Delcher AL, Phillippy AM, Coston R, Salzberg SL, Zimin A (January 2018). "MUMmer4: A fast and versatile genome alignment system". PLOS Computational Biology. 14 (1): e1005944. Bibcode:2018PLSCB..14E5944M. doi:10.1371/journal.pcbi.1005944. PMC 5802927. PMID 29373581.
  74. Huang X, Wei X, Sang T, Zhao Q, Feng Q, Zhao Y, et al. (November 2010). "Genome-wide association studies of 14 agronomic traits in rice landraces". Nature Genetics. 42 (11): 961–967. doi:10.1038/ng.695. PMID 20972439. S2CID 439442.
  75. Morrell PL, Buckler ES, Ross-Ibarra J (December 2011). "Crop genomics: advances and applications". Nature Reviews. Genetics. 13 (2): 85–96. doi:10.1038/nrg3097. PMID 22207165. S2CID 13358998.
  76. Seib KL, Zhao X, Rappuoli R (October 2012). "Developing vaccines in the era of genomics: a decade of reverse vaccinology". Clinical Microbiology and Infection. 18 (Suppl 5): 109–116. doi:10.1111/j.1469-0691.2012.03939.x. hdl:10072/50260. PMID 22882709.
  77. Maione D, Margarit I, Rinaudo CD, Masignani V, Mora M, Scarselli M, et al. (July 2005). "Identification of a universal Group B streptococcus vaccine by multiple genome screen". Science. 309 (5731): 148–150. Bibcode:2005Sci...309..148M. doi:10.1126/science.1109869. PMC 1351092. PMID 15994562.
  78. Rasko DA, Rosovitz MJ, Myers GS, Mongodin EF, Fricke WF, Gajer P, et al. (October 2008). "The pangenome structure of Escherichia coli: comparative genomic analysis of E. coli commensal and pathogenic isolates". Journal of Bacteriology. 190 (20): 6881–6893. doi:10.1128/JB.00619-08. PMC 2566221. PMID 18676672.
  79. "Group A Streptococcus Vaccine Target Candidates Identified from Global Genome Set". 28 May 2019.
  80. Sadee W (August 2011). "Genomics and personalized medicine". International Journal of Pharmaceutics. 415 (1–2): 2–4. doi:10.1016/j.ijpharm.2011.04.048. PMID 21539903.
  81. Glusman G, Rowen L, Lee I, Boysen C, Roach JC, Smit AF, et al. (September 2001). "Comparative genomics of the human and mouse T cell receptor loci". Immunity. 15 (3): 337–349. doi:10.1016/s1074-7613(01)00200-x. PMID 11567625.
  82. Rogers J, Gibbs RA (May 2014). "Comparative primate genomics: emerging patterns of genome content and dynamics". Nature Reviews. Genetics. 15 (5): 347–359. doi:10.1038/nrg3707. PMC 4113315. PMID 24709753.
  83. Prado-Martinez J, Sudmant PH, Kidd JM, Li H, Kelley JL, Lorente-Galdos B, et al. (July 2013). "Great ape genetic diversity and population history". Nature. 499 (7459): 471–475. Bibcode:2013Natur.499..471P. doi:10.1038/nature12228. PMC 3822165. PMID 23823723.
  84. Zeng J, Konopka G, Hunt BG, Preuss TM, Geschwind D, Yi SV (September 2012). "Divergent whole-genome methylation maps of human and chimpanzee brains reveal epigenetic basis of human regulatory evolution". American Journal of Human Genetics. 91 (3): 455–465. doi:10.1016/j.ajhg.2012.07.024. PMC 3511995. PMID 22922032.

Further reading