Genome browser

The completion of the human genome sequence in the early 2000s was a turning point in genomics research. [1] Since then, scientists have carried out a series of studies on the activities of individual genes and of the genome as a whole. The human genome contains around 3 billion nucleotide base pairs, and the huge quantity of data this creates requires accessible tools for exploring and interpreting the information, in order to investigate the genetic basis of disease, evolution, and biological processes. [2] The field of genomics has continued to grow, with new sequencing technologies and computational tools making it easier to study the genome.

The genome browser is an important tool for studying the genome. In bioinformatics, a genome browser is a graphical interface for displaying information from a biological database for genomic data. [2] Genome browsers enable users to visualize and browse entire genomes with annotated data, including gene predictions, gene structure, proteins, expression, regulation, variation, and comparative analysis. The annotated data usually come from multiple diverse sources. Genome browsers differ from ordinary biological databases in that they display data in a graphical format, with genome coordinates along one axis and annotations or space-filling graphics showing analyses of the genes, such as gene frequency and expression profiles. [1] The software allows users to navigate the genome, view numerous features, and analyze the relationships between various genomic elements.

History

The first genome browser, the Ensembl Genome Browser, was developed as part of the Human Genome Project by a group of researchers from the European Bioinformatics Institute (EBI). It was created to provide a complete resource for the human genome sequence, with a focus on gene annotation, and offers a user-friendly interface for exploring the genomes of humans and other organisms. Several more genome browsers have since been created, including the UCSC Genome Browser, developed in 2000 by Jim Kent and David Haussler, and NCBI's Genome Data Viewer. [2] [3]

Some of these genome browsers support multiple genomes, whereas others are specific to particular species. Browsers may provide summaries of data from genomic databases and comparative assessments of genetic sequences across multiple species. They also allow the data to be visualized in various ways to facilitate the assessment and interpretation of these complex data. [4] [5]

Characteristics of a browser

Genome Assembly and Annotation: Give access to the reference genome assembly, which serves as a framework for overlaying and analyzing other genomic data. They also include gene annotations that provide information about gene locations, transcripts, and functional elements. No single browser is considered the "best" for genome annotation and assembly; the choice depends on the specific needs of the user and the type of analysis being performed. One popular option is the Integrative Genomics Viewer (IGV), which is used to visualize and annotate genomic data, including genomic variation, gene expression, and chromatin structure. It supports a wide range of file formats and provides advanced tools for data analysis.

Data Overlay and Integration: Allow users to overlay and integrate diverse genomic data types, such as DNA sequencing data, gene expression data, and epigenetic data, onto the reference genome. This enables researchers to study relationships between different genomic features and datasets.

Visualization Tools: Offer visualization tools that enable users to visualize genomic data in various formats, such as heatmaps, line plots, bar plots, and genomic tracks. These tools facilitate exploration and interpretation of complex genomic data in a graphical format. The UCSC Genome Browser is a popular and comprehensive genome browser that offers a wide range of visualization tools for genomic data, such as genetic variation, gene expression, and epigenetic modifications. Additionally, it provides access to numerous publicly available datasets for comparative genomics research. [6]

Zooming and Navigation: Provide zooming and navigation tools that allow users to explore genomic data at different scales, from the whole genome down to individual nucleotides. This makes it easy to move to and focus on specific genomic regions of interest. The UCSC Genome Browser is also well regarded for navigation. The NCBI Genome Data Viewer, shown in Figure 1 below, offers logical navigation and a clear user interface.
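Navigating by typed coordinates and zooming around a region can be sketched as follows. This is a minimal Python sketch. It assumes a hypothetical position syntax `chrom:start-end` that uses 1-based, inclusive coordinates and allows commas, similar in spirit to the position strings many browsers accept. `Region`, `parse_position`, and `zoom` are illustrative names, not any browser's API.

```python
import re
from dataclasses import dataclass

# Hypothetical position syntax: "chrom:start-end", 1-based inclusive, commas allowed
_POSITION = re.compile(r"^(\w+):([\d,]+)-([\d,]+)$")


@dataclass(frozen=True)
class Region:
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive

    @property
    def length(self) -> int:
        return self.end - self.start + 1


def parse_position(text: str) -> Region:
    """Parse a string such as 'chr1:11,000-12,000' into a Region."""
    match = _POSITION.match(text.strip())
    if match is None:
        raise ValueError(f"not a position string: {text!r}")
    chrom, start, end = match.groups()
    start_i, end_i = int(start.replace(",", "")), int(end.replace(",", ""))
    if start_i < 1 or end_i < start_i:
        raise ValueError(f"invalid coordinates: {text!r}")
    return Region(chrom, start_i, end_i)


def zoom(region: Region, factor: float, chrom_length: int) -> Region:
    """Rescale the window around its centre: factor > 1 zooms out, < 1 zooms in.

    The window never shrinks below one nucleotide and is clamped to the chromosome.
    """
    new_length = max(1, min(chrom_length, round(region.length * factor)))
    centre = (region.start + region.end) // 2
    start = max(1, centre - new_length // 2)
    end = start + new_length - 1
    if end > chrom_length:  # shift the window back inside the chromosome
        end = chrom_length
        start = end - new_length + 1
    return Region(region.chrom, start, end)
```

Zooming out by a factor of 3 around `chr1:11,000-12,000` widens the 1,001 bp window to 3,003 bp centred on the same position. Repeated zooming in stops at a single nucleotide.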

Search and Retrieval: Include search and retrieval features that allow users to search for specific genes, genomic regions, or functional elements. This simplifies the process of locating and retrieving relevant genomic data for analysis. The NCBI browser [7] is a valuable tool for this purpose. It provides access to a large and diverse set of biological databases, including GenBank, and offers advanced search options. It is also integrated with other NCBI tools, which streamlines searching and retrieval.
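Programmatic search follows the same idea. The sketch below assumes the NCBI E-utilities ESearch service and uses only Python's standard library. It builds a query URL for the NCBI Gene database and, if network access is available, retrieves the matching record IDs. The query term is only an example, and `build_esearch_url` and `fetch_ids` are illustrative helper names.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# NCBI E-utilities ESearch endpoint
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(term: str, db: str = "gene", retmax: int = 20) -> str:
    """Build an ESearch query URL that requests JSON output."""
    params = {"db": db, "term": term, "retmode": "json", "retmax": retmax}
    return f"{ESEARCH_URL}?{urlencode(params)}"


def fetch_ids(url: str) -> list[str]:
    """Run the query (requires network access) and return the matching record IDs."""
    with urlopen(url, timeout=30) as response:
        payload = json.load(response)
    return payload["esearchresult"]["idlist"]
```

For example, `fetch_ids(build_esearch_url("BRCA1[gene] AND human[orgn]"))` would return Gene record IDs as strings. NCBI asks clients to limit their request rate, so real scripts should pause between calls.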

Comparative Genomics: Some genome browsers include features for comparing and analyzing genomic data from different species or strains. This enables researchers to study evolutionary relationships, identify conserved regions, and compare gene orthologs. Ensembl [8] offers advanced comparative genomics tools, including the ability to compare gene structures, genome alignments, and synteny between different organisms.
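Identifying conserved regions can be illustrated with a deliberately simple measure: percent identity in a sliding window over a pairwise alignment. The sketch assumes the two sequences are already aligned, with gaps written as `-`. It is a toy stand-in for illustration, not the method Ensembl uses, and the function names are hypothetical.

```python
def window_identity(seq_a: str, seq_b: str, window: int) -> list[float]:
    """Fraction of identical, non-gap columns in each window of a pairwise alignment."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    if not 0 < window <= len(seq_a):
        raise ValueError("window must be between 1 and the alignment length")
    # 1 where both sequences carry the same base, 0 for mismatches and gaps
    matches = [int(a == b and a != "-") for a, b in zip(seq_a.upper(), seq_b.upper())]
    current = sum(matches[:window])
    scores = [current / window]
    for i in range(window, len(matches)):
        # slide one column: add the incoming column, drop the outgoing one
        current += matches[i] - matches[i - window]
        scores.append(current / window)
    return scores


def conserved_windows(seq_a: str, seq_b: str, window: int, threshold: float) -> list[int]:
    """0-based alignment columns at which a window with identity >= threshold starts."""
    scores = window_identity(seq_a, seq_b, window)
    return [start for start, score in enumerate(scores) if score >= threshold]
```

The running sum keeps the scan linear in the alignment length, whatever the window size.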

Customization and Annotation: Can allow users to customize the display of genomic data by adding their own annotations, tracks, or visualizations. This enables researchers to tailor the browser interface to their specific research needs and hypotheses.
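As an illustration of adding one's own annotations, the Python sketch below writes features in the BED format. Genome browsers such as the UCSC Genome Browser and IGV can load this format as a custom track. The sketch assumes features are recorded in 1-based inclusive coordinates and converts them to BED's 0-based, half-open convention. The `Feature` class and `to_bed_track` helper are hypothetical, and quotes inside names and descriptions are not escaped.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    chrom: str
    start: int  # 1-based, inclusive (as typed into a browser position box)
    end: int    # 1-based, inclusive
    name: str
    strand: str = "."


def to_bed_track(features: list[Feature], track_name: str, description: str) -> str:
    """Render features as a track line followed by 6-column BED records."""
    lines = [f'track name="{track_name}" description="{description}"']
    for f in sorted(features, key=lambda feat: (feat.chrom, feat.start)):
        if f.start < 1 or f.end < f.start:
            raise ValueError(f"invalid coordinates for {f.name}")
        if f.strand not in {"+", "-", "."}:
            raise ValueError(f"invalid strand for {f.name}")
        # BED uses 0-based starts and exclusive ends, so only the start shifts
        fields = [f.chrom, str(f.start - 1), str(f.end), f.name, "0", f.strand]
        lines.append("\t".join(fields))
    return "\n".join(lines) + "\n"
```

The fifth column is the BED score. It is set to 0 here because the sketch has no meaningful score to report.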

Data Sharing and Collaboration: Contain features for data sharing and collaboration, such as the ability to share browser sessions, save customizations, or collaborate with other researchers in real time. This promotes collaboration and data sharing among researchers. One example is GMOD (Generic Model Organism Database), a collection of open-source tools for building and sharing genome databases. It provides a framework for integrating genomic data with other biological data types, such as proteomics and metabolomics, and allows data and analyses to be shared with collaborators.

Analysis Tools: Some browsers provide analysis tools, such as tools for identifying differentially expressed genes, predicting functional elements, or performing other computational analyses on the genomic data directly within the browser environment.
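The simplest version of flagging differentially expressed genes is a fold-change filter, sketched below in Python. It assumes expression values that are already normalized and comparable between the two samples, and it uses a pseudocount to avoid division by zero. The helper names are hypothetical. Established tools such as DESeq2 and edgeR instead fit statistical models to replicate count data, so this filter is only an illustration.

```python
import math


def log2_fold_changes(control: dict[str, float], treated: dict[str, float],
                      pseudocount: float = 1.0) -> dict[str, float]:
    """log2((treated + pseudocount) / (control + pseudocount)) for genes in both samples."""
    shared = sorted(control.keys() & treated.keys())
    return {
        gene: math.log2((treated[gene] + pseudocount) / (control[gene] + pseudocount))
        for gene in shared
    }


def flag_changed_genes(control: dict[str, float], treated: dict[str, float],
                       min_abs_lfc: float = 1.0,
                       pseudocount: float = 1.0) -> dict[str, float]:
    """Keep genes whose absolute log2 fold change meets the threshold (2-fold by default)."""
    lfc = log2_fold_changes(control, treated, pseudocount)
    return {gene: value for gene, value in lfc.items() if abs(value) >= min_abs_lfc}
```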

Figure 1. The two images show the features and inputs of the NCBI Genome Data Viewer, one of many genome browsers. The right image displays a region of human chromosome 1 (Chr1). The box highlighted in red at the bottom shows customizable options such as BLAST, track by accession, assembly details, history, and tracks/user data. These features differ across genome browsing platforms.

Features and functionality

The genome browser displays the genome as a series of tracks, or layers, that can be toggled on or off according to the needs of the user. Each track represents a distinct type of genomic feature, such as genes, transcripts, regulatory regions, or sequence variations. The user can zoom in and out of a genomic region to view different levels of detail or additional information. Users can also navigate to specific regions with a search function or by clicking on a feature.
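The track model described above can be sketched as a small data structure: named tracks holding features, each track switchable on or off, and a query returning what overlaps the current window. This Python sketch is hypothetical and assumes a single chromosome with 1-based inclusive coordinates. It scans features linearly, whereas production browsers typically read indexed file formats so they only load the data for the visible region.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Feature:
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive
    label: str


@dataclass
class Track:
    name: str
    features: list[Feature]
    visible: bool = True


@dataclass
class BrowserView:
    """A single-chromosome view holding named tracks that can be switched on or off."""
    tracks: dict[str, Track] = field(default_factory=dict)

    def add_track(self, track: Track) -> None:
        self.tracks[track.name] = track

    def toggle(self, name: str) -> None:
        self.tracks[name].visible = not self.tracks[name].visible

    def features_in(self, start: int, end: int) -> dict[str, list[str]]:
        """Labels of features overlapping [start, end] on each visible track."""
        return {
            name: [f.label for f in track.features if f.start <= end and f.end >= start]
            for name, track in self.tracks.items()
            if track.visible
        }
```

Hidden tracks are skipped entirely rather than returned empty, which mirrors a track being switched off in the display.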

Aside from gene annotations, genome browsers can display a variety of different data types, such as:

DNA Sequence: This can be shown as a single linear track or as several tracks, with different colors signifying distinct features (for example, exons, introns, and repeats).

Variation Data: This includes information on single-nucleotide polymorphisms (SNPs), insertions/deletions (indels), and structural variants.

Transcriptomics: This contains information on gene expression levels, alternative splicing, and non-coding RNAs.

Proteomics: This includes information on protein expression levels, post-translational modifications, and protein-protein interactions.

Applications

Genome browsers are used in a variety of research fields, including bioinformatics, genetics, and clinical genomics. They allow researchers to investigate the genetic basis of disease, evolution, and other biological processes. Here are some instances of how genome browsers are being used in various fields:

Evolutionary Biology: Genome browsers are used to study and compare the genomes of various organisms. Researchers look for similarities and differences in gene structure, regulatory elements, function, and repetitive sequences. This can provide evolutionary insight into the relationships between different species and help identify the genetic alterations that underpin adaptation and speciation.

Clinical Genomics: Genome browsers are used to study the genetic basis of disease. By examining the genome of a patient, researchers can identify genetic mutations that may be responsible for the disease. Visualizing these mutations in their genomic context lets researchers investigate their possible impact on gene expression and protein function.

Related Research Articles

Bioinformatics: Computational analysis of large, complex sets of biological data

Bioinformatics is an interdisciplinary field of science that develops methods and software tools for understanding biological data, especially when the data sets are large and complex. Bioinformatics uses biology, chemistry, physics, computer science, computer programming, information engineering, mathematics and statistics to analyze and interpret biological data. The subsequent process of analyzing and interpreting data is referred to as computational biology.

National Center for Biotechnology Information: Database branch of the US National Library of Medicine

The National Center for Biotechnology Information (NCBI) is part of the United States National Library of Medicine (NLM), a branch of the National Institutes of Health (NIH). It is approved and funded by the government of the United States. The NCBI is located in Bethesda, Maryland, and was founded in 1988 through legislation sponsored by US Congressman Claude Pepper.

A sequence profiling tool in bioinformatics is a type of software that presents information related to a genetic sequence, gene name, or keyword input. Such tools generally take a query such as a DNA, RNA, or protein sequence or ‘keyword’ and search one or more databases for information related to that sequence. They provide summaries and aggregate results in a standardized format, saving users from compiling the same information through visits to many smaller sites or direct literature searches. Many sequence profiling tools are software portals or gateways that simplify the process of finding information about a query in the large and growing number of bioinformatics databases. These tools are accessed either through the web or as locally downloadable executables.

Ensembl genome database project: Scientific project at the European Bioinformatics Institute

Ensembl genome database project is a scientific project at the European Bioinformatics Institute, which provides a centralized resource for geneticists, molecular biologists and other researchers studying the genomes of our own species and other vertebrates and model organisms. Ensembl is one of several well known genome browsers for the retrieval of genomic information.

The European Bioinformatics Institute (EMBL-EBI) is an intergovernmental organization (IGO) which, as part of the European Molecular Biology Laboratory (EMBL) family, focuses on research and services in bioinformatics. It is located on the Wellcome Genome Campus in Hinxton near Cambridge, and employs over 600 full-time equivalent (FTE) staff. Institute leaders such as Rolf Apweiler, Alex Bateman, Ewan Birney, and Guy Cochrane, an adviser on the National Genomics Data Center Scientific Advisory Board, serve as part of the international research network of the BIG Data Center at the Beijing Institute of Genomics.

The Rat Genome Database (RGD) is a database of rat genomics, genetics, physiology and functional data, as well as data for comparative genomics between rat, human and mouse. RGD is responsible for attaching biological information to the rat genome via structured vocabulary, or ontology, annotations assigned to genes and quantitative trait loci (QTL), and for consolidating rat strain data and making it available to the research community. They are also developing a suite of tools for mining and analyzing genomic, physiologic and functional data for the rat, and comparative data for rat, mouse, human, and five other species.

The Bioinformatic Harvester was a bioinformatic meta search engine for genes and protein-associated information. It was created by the European Molecular Biology Laboratory and subsequently hosted and further developed by KIT Karlsruhe Institute of Technology. Harvester worked with human, mouse, rat, zebrafish, Drosophila and Arabidopsis thaliana information. It cross-linked more than 50 popular bioinformatic resources, allowed cross searches, and served tens of thousands of pages every day to scientists and physicians. The service has been down since 2014.

Generic Model Organism Database

The Generic Model Organism Database (GMOD) project provides biological research communities with a toolkit of open-source software components for visualizing, annotating, managing, and storing biological data. The GMOD project is funded by the United States National Institutes of Health, National Science Foundation and the USDA Agricultural Research Service.

MicrobesOnline

MicrobesOnline is a publicly and freely accessible website that hosts multiple comparative genomic tools for comparing microbial species at the genomic, transcriptomic and functional levels. MicrobesOnline was developed by the Virtual Institute for Microbial Stress and Survival, which is based at the Lawrence Berkeley National Laboratory in Berkeley, California. The site was launched in 2005, with regular updates until 2011.

DAVID is a free online bioinformatics resource developed by the Laboratory of Human Retrovirology and Immunoinformatics. All tools in the DAVID Bioinformatics Resources aim to provide functional interpretation of large lists of genes derived from genomic studies, e.g. microarray and proteomics studies. DAVID can be found at https://david.ncifcrf.gov/

UGENE

UGENE is computer software for bioinformatics. It works on personal computer operating systems such as Windows, macOS, or Linux. It is released as free and open-source software, under a GNU General Public License (GPL) version 2.

The UCSC Genome Browser is an online and downloadable genome browser hosted by the University of California, Santa Cruz (UCSC). It is an interactive website offering access to genome sequence data from a variety of vertebrate and invertebrate species and major model organisms, integrated with a large collection of aligned annotations. The Browser is a graphical viewer optimized to support fast interactive performance and is an open-source, web-based tool suite built on top of a MySQL database for rapid visualization, examination, and querying of the data at many levels. The Genome Browser Database, browsing tools, downloadable data files, and documentation can all be found on the UCSC Genome Bioinformatics website.

GeneCards is a database of human genes that provides genomic, proteomic, transcriptomic, genetic and functional information on all known and predicted human genes. It is being developed and maintained by the Crown Human Genome Center at the Weizmann Institute of Science, in collaboration with LifeMap Sciences.

DNA annotation: The process of describing the structure and function of a genome

In molecular biology and genetics, DNA annotation or genome annotation is the process of describing the structure and function of the components of a genome, by analyzing and interpreting them in order to extract their biological significance and understand the biological processes in which they participate. Among other things, it identifies the locations of genes and all the coding regions in a genome and determines what those genes do.

GenomeSpace is an environment for genomics software tools and applications. It helps users manage their analysis workflows involving multiple diverse tools, including web applications and desktop tools and facilitates the transfer of data between tools via automatic format conversion. Analyses can use data from local or cloud-based stores.

Gene set enrichment analysis: Bioinformatics method

Gene set enrichment analysis (GSEA) (also called functional enrichment analysis or pathway enrichment analysis) is a method to identify classes of genes or proteins that are over-represented in a large set of genes or proteins, and may have an association with different phenotypes (e.g. different organism growth patterns or diseases). The method uses statistical approaches to identify significantly enriched or depleted groups of genes. Transcriptomics technologies and proteomics results often identify thousands of genes, which are used for the analysis.

In bioinformatics, a Gene Disease Database is a systematized collection of data, typically structured to model aspects of reality, in a way to comprehend the underlying mechanisms of complex diseases, by understanding multiple composite interactions between phenotype-genotype relationships and gene-disease mechanisms. Gene Disease Databases integrate human gene-disease associations from various expert curated databases and text mining derived associations including Mendelian, complex and environmental diseases.

Single nucleotide polymorphism annotation is the process of predicting the effect or function of an individual SNP using SNP annotation tools. In SNP annotation the biological information is extracted, collected and displayed in a clear form amenable to query. SNP functional annotation is typically performed based on the available information on nucleic acid and protein sequences.

Echinobase is a Model Organism Database (MOD). It supports the international research community by providing a centralized, integrated, web-based resource for accessing the diverse and rich functional genomics data on echinoderm evolution, development and gene regulatory networks.

References

  1. Wang Ziling; Zhang Lishu (2 July 2018). Essential Computing Skills for Biologists. World Scientific Publishing Company. pp. 20–29. ISBN 978-1-84816-926-5.
  2. Jun Wang; Lei Kong; Ge Gao; Jingchu Luo (March 2013). "A brief introduction to web-based genome browsers". Briefings in Bioinformatics. 14 (2): 131–143. doi:10.1093/bib/bbs029. PMID 22764121.
  3. Michael Speicher; Stylianos E. Antonarakis; Arno G. Motulsky, eds. (2010). "Databases and Genome Browsers". Vogel and Motulsky's Human Genetics: Problems and Approaches (4th ed.). Springer. pp. 905–920. ISBN 978-3-540-37653-8.
  4. Jonathan Pevsner (26 October 2015). Bioinformatics and Functional Genomics (3rd ed.). Wiley. pp. 50–52. ISBN 978-1-118-58178-0.
  5. Joel T. Dudley; Konrad J. Karczewski (3 January 2013). Exploring Personal Genomics. pp. 64–72. ISBN 978-0-19-964448-3.
  6. "UCSC Genome Browser Home". genome.ucsc.edu. Retrieved 2023-05-03.
  7. "National Center for Biotechnology Information". www.ncbi.nlm.nih.gov. Retrieved 2023-05-03.
  8. "Ensembl genome browser 109". useast.ensembl.org. Retrieved 2023-05-03.