Genome in a Bottle

Genome in a Bottle (GIAB) is a consortium hosted by NIST and dedicated to the characterization of benchmark human genomes. [1] [2] The NCBI serves as the repository for detailed information on samples, genotypes, raw sequencing reads and mapped reads, which it distributes via a dedicated FTP site. [3]

Related Research Articles

In genetics, shotgun sequencing is a method used for sequencing random DNA strands. It is named by analogy with the rapidly expanding, quasi-random shot grouping of a shotgun.

The File Transfer Protocol (FTP) is a standard communication protocol used for the transfer of computer files from a server to a client on a computer network. FTP is built on a client–server model architecture using separate control and data connections between the client and the server. FTP users may authenticate themselves with a clear-text sign-in protocol, normally in the form of a username and password, but can connect anonymously if the server is configured to allow it. For secure transmission that protects the username and password, and encrypts the content, FTP is often secured with SSL/TLS (FTPS) or replaced with SSH File Transfer Protocol (SFTP).
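The anonymous sign-in described above can be sketched with Python's standard `ftplib` module. This is a minimal illustration, not a recommended client: the helper name `list_remote_dir`, the host `ftp.example.org` and the path `/pub` are placeholders, and the `ftp_factory` parameter is an assumption added only so the function can be exercised without a network connection.

```python
from ftplib import FTP


def list_remote_dir(host, path="/", ftp_factory=FTP, timeout=30):
    """Return the file names in `path` on an FTP server, signing in anonymously.

    `ftp_factory` is a hypothetical hook for substituting a stand-in class;
    by default the real ftplib.FTP client is used.
    """
    # FTP(host, timeout=...) opens the control connection; leaving the
    # `with` block sends QUIT and closes it.
    with ftp_factory(host, timeout=timeout) as ftp:
        ftp.login()        # no arguments requests an anonymous sign-in
        ftp.cwd(path)      # change the remote working directory
        return ftp.nlst()  # the listing travels over a separate data connection


# Example call (hypothetical host that would need to allow anonymous access):
#     list_remote_dir("ftp.example.org", "/pub")
```

Plain FTP sends credentials and content unencrypted, so for anything sensitive an FTPS or SFTP client would be the safer choice.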

DNA sequencer: A scientific instrument used to automate the DNA sequencing process

A DNA sequencer is a scientific instrument used to automate the DNA sequencing process. Given a sample of DNA, a DNA sequencer is used to determine the order of the four bases: G (guanine), C (cytosine), A (adenine) and T (thymine). This is then reported as a text string, called a read. Some DNA sequencers can be also considered optical instruments as they analyze light signals originating from fluorochromes attached to nucleotides.
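Because a read is just a text string over the four base letters, it can be handled with ordinary string tools. The sketch below counts the bases in a made-up read; the function name `base_composition` and the example string are illustrative assumptions, and the strict four-letter check is a simplification (real instruments may also report `N` for a base that could not be determined).

```python
from collections import Counter

VALID_BASES = set("ACGT")


def base_composition(read):
    """Count each base in a read, rejecting characters other than A, C, G, T."""
    read = read.upper()
    unexpected = set(read) - VALID_BASES
    if unexpected:
        raise ValueError(f"unexpected characters in read: {sorted(unexpected)}")
    counts = Counter(read)
    # Report all four bases, including any that do not occur in the read.
    return {base: counts[base] for base in "ACGT"}


example_read = "GATTACAGGC"  # made-up read, not real sequencing data
print(base_composition(example_read))  # {'A': 3, 'C': 2, 'G': 3, 'T': 2}
```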

Genome project

Genome projects are scientific endeavours that ultimately aim to determine the complete genome sequence of an organism and to annotate protein-coding genes and other important genome-encoded features. The genome sequence of an organism includes the collective DNA sequences of each chromosome in the organism. For a bacterium containing a single chromosome, a genome project will aim to map the sequence of that chromosome. For the human species, whose genome includes 22 pairs of autosomes and 2 sex chromosomes, a complete genome sequence will involve 46 separate chromosome sequences.

Entrez: Cross-database search engine for health sciences

The Entrez Global Query Cross-Database Search System is a federated search engine, or web portal that allows users to search many discrete health sciences databases at the National Center for Biotechnology Information (NCBI) website. The NCBI is a part of the National Library of Medicine (NLM), which is itself a department of the National Institutes of Health (NIH), which in turn is a part of the United States Department of Health and Human Services. The name "Entrez" was chosen to reflect the spirit of welcoming the public to search the content available from the NLM.

FileZilla: Free software, cross-platform file transfer protocol application

FileZilla is a free and open-source, cross-platform FTP application, consisting of FileZilla Client and FileZilla Server. Clients are available for Windows, Linux, and macOS. Both server and client support FTP and FTPS, while the client can in addition connect to SFTP servers.

National Human Genome Research Institute: Institute of the National Institutes of Health, located in Bethesda, Maryland, US

The National Human Genome Research Institute (NHGRI) is an institute of the National Institutes of Health, located in Bethesda, Maryland.

UniProt: Database of protein sequences and functional information

UniProt is a freely accessible database of protein sequence and functional information, many entries being derived from genome sequencing projects. It contains a large amount of information about the biological function of proteins derived from the research literature. It is maintained by the UniProt consortium, which consists of several European bioinformatics organisations and a foundation from Washington, DC, United States.

Ensembl genome database project: Scientific project at the European Bioinformatics Institute

The Ensembl genome database project is a scientific project at the European Bioinformatics Institute that provides a centralized resource for geneticists, molecular biologists and other researchers studying the genomes of our own species and other vertebrates and model organisms. Ensembl is one of several well-known genome browsers for the retrieval of genomic information.

Steven Salzberg: American biologist and computer scientist

Steven Lloyd Salzberg is an American computational biologist and computer scientist who is a Bloomberg Distinguished Professor of Biomedical Engineering, Computer Science, and Biostatistics at Johns Hopkins University, where he is also Director of the Center for Computational Biology.

Illumina, Inc. is an American biotechnology company, headquartered in San Diego, California. Incorporated on April 1, 1998, Illumina develops, manufactures, and markets integrated systems for the analysis of genetic variation and biological function. The company provides a line of products and services that serves the sequencing, genotyping and gene expression, and proteomics markets.

CACNB4

Voltage-dependent L-type calcium channel subunit beta-4 is a protein that in humans is encoded by the CACNB4 gene.

CACNB3

Voltage-dependent L-type calcium channel subunit beta-3 is a protein that in humans is encoded by the CACNB3 gene.

Whole genome sequencing: Determining nearly the entirety of the DNA sequence of an organism's genome at a single time

Whole genome sequencing (WGS), also known as full genome sequencing, complete genome sequencing, or entire genome sequencing, is the process of determining the entirety, or nearly the entirety, of the DNA sequence of an organism's genome at a single time. This entails sequencing all of an organism's chromosomal DNA as well as DNA contained in the mitochondria and, for plants, in the chloroplast.

Massive parallel sequencing or massively parallel sequencing is any of several high-throughput approaches to DNA sequencing using the concept of massively parallel processing; it is also called next-generation sequencing (NGS) or second-generation sequencing. Some of these technologies emerged between 1994 and 1998 and have been commercially available since 2005. These technologies use miniaturized and parallelized platforms for sequencing of 1 million to 43 billion short reads per instrument run.

SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA. Both simple and advanced tools are provided, supporting complex tasks like variant calling and alignment viewing as well as sorting, indexing, data extraction and format conversion. SAM files can be very large, so compression is used to save space. SAM files are human-readable text files, and BAM files are simply their binary equivalent, whilst CRAM files are a restructured column-oriented binary container format. BAM files are typically compressed and more efficient for software to work with than SAM. SAMtools makes it possible to work directly with a compressed BAM file, without having to uncompress the whole file. Additionally, since the format for a SAM/BAM file is somewhat complex - containing reads, references, alignments, quality information, and user-specified annotations - SAMtools reduces the effort needed to use SAM/BAM files by hiding low-level details.
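To make the "human-readable text" point concrete, the sketch below splits one SAM alignment line into the eleven mandatory tab-separated fields defined by the SAM format, followed by any optional tags. It is an illustration only and does not use or reproduce the SAMtools API; the names `SamRecord`, `parse_sam_line` and `is_unmapped` and the example line are assumptions made up for this example. Header lines, which begin with `@`, are not handled.

```python
from typing import NamedTuple


class SamRecord(NamedTuple):
    """The 11 mandatory SAM alignment fields plus any optional tags."""
    qname: str   # read (query) name
    flag: int    # bitwise flags
    rname: str   # reference sequence name
    pos: int     # 1-based leftmost mapping position
    mapq: int    # mapping quality
    cigar: str   # alignment description, e.g. "10M"
    rnext: str   # reference name of the mate / next read
    pnext: int   # position of the mate / next read
    tlen: int    # observed template length
    seq: str     # read bases
    qual: str    # per-base qualities, ASCII-encoded
    tags: tuple  # optional TAG:TYPE:VALUE fields, left unparsed


def parse_sam_line(line):
    """Parse a single alignment line (not a header line starting with '@')."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("a SAM alignment line needs at least 11 tab-separated fields")
    return SamRecord(
        qname=fields[0], flag=int(fields[1]), rname=fields[2],
        pos=int(fields[3]), mapq=int(fields[4]), cigar=fields[5],
        rnext=fields[6], pnext=int(fields[7]), tlen=int(fields[8]),
        seq=fields[9], qual=fields[10], tags=tuple(fields[11:]),
    )


def is_unmapped(record):
    """Flag bit 0x4 marks a read that is unmapped."""
    return bool(record.flag & 0x4)


# Made-up alignment line for illustration; not taken from real data.
example = "read1\t0\tchr1\t100\t60\t10M\t*\t0\t0\tGATTACAGGC\tIIIIIIIIII\tNM:i:0"
record = parse_sam_line(example)
print(record.rname, record.pos, record.cigar, is_unmapped(record))
```

Hand-parsing like this only works for uncompressed SAM text; BAM and CRAM are binary and need a dedicated reader, which is the low-level detail that tools such as SAMtools hide.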

Mytilus unguiculatus: Species of bivalve

Mytilus unguiculatus, common name the Korean mussel or the hard-shelled mussel, is a species of mussel, a marine bivalve mollusc in the family Mytilidae. This species is heavily exploited as a food item via mariculture in Korea and in China. It is also a typical macrofouling organism.

In genetics, the gene density of an organism's genome is the ratio of the number of genes to the number of base pairs, usually expressed per million base pairs, or megabase (Mb). The human genome has a gene density of 11-15 genes/Mb, while the genome of the C. elegans roundworm is estimated at 200 genes/Mb.
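The definition amounts to a single division, sketched below. The function name `gene_density_per_mb` and the input figures are hypothetical and chosen only to make the arithmetic easy to follow; they do not describe any real organism.

```python
def gene_density_per_mb(gene_count, genome_length_bp):
    """Return genes per megabase (1 Mb = 1,000,000 base pairs)."""
    if genome_length_bp <= 0:
        raise ValueError("genome length must be a positive number of base pairs")
    return gene_count / (genome_length_bp / 1_000_000)


# Hypothetical genome: 3,000 genes spread over 150 Mb.
print(gene_density_per_mb(3_000, 150_000_000))  # 20.0
```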

Leonid Leon Peshkin is a scientist working at the Systems Biology Department at Harvard Medical School. Peshkin's research interests include embryology, evolution and aging.

The Korean Genome Project (KGP) is the largest genome project in Korea; by April 2021 it included over 10,000 human genomes sequenced in Korea. KGP originated in 2006 from a national initiative by KOBIC, KRIBB and NCSRD, KRISS, in Daejeon, Korea, to sequence the reference Korean genome and whole-population genomes. From 2009, KGP was supported by the Genome Research Foundation and TheragenEtex to build the Variome of Koreans as well as the Korean Reference Genome (KOREF). Building on KOREF, a consensus variome reference providing information on millions of variants from 40 additional ethnically homogeneous genomes from the Korean Personal Genome Project was completed in 2017. An improved version of KOREF, constructed from long-read sequencing data produced by Oxford Nanopore PromethION and PacBio technologies, was later released, showcasing newer assembly technologies and techniques. In 2022, a new chromosome-level haploid assembly of KOREF was published, assembled using Oxford Nanopore Technologies PromethION, Pacific Biosciences HiFi-CCS, and Hi-C technology.

References

  1. "GIAB". Genome in a Bottle. Retrieved 2021-06-24.
  2. "Genome in a bottle—a human DNA standard". Nature Biotechnology. 33 (7): 675. 2015-07-01. doi:10.1038/nbt0715-675a. ISSN 1546-1696. S2CID 27551129.
  3. "GIAB FTP". Genome in a Bottle FTP. Retrieved 2021-06-24.