Gene transfer format

The Gene transfer format (GTF) is a file format used to hold information about gene structure. It is a tab-delimited text format based on the general feature format (GFF), but contains some additional conventions specific to gene information. A significant feature of GTF is that it can be validated: given a sequence and a GTF file, one can check that the format is correct. This significantly reduces problems with the interchange of data between groups.

GTF is identical to GFF, version 2. [1]
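
As a rough illustration of the tab-delimited layout and of what a format check can look like, the following Python sketch parses one GTF line into the nine GFF version 2 columns and applies a few basic checks. The function names (`parse_gtf_line`, `parse_attributes`) and the particular checks are assumptions made for this example, not part of any official validator. The sketch ignores comment lines, trailing comments, and attribute values that contain semicolons.

```python
import re

# The nine tab-separated columns shared by GFF version 2 and GTF.
GTF_COLUMNS = ("seqname", "source", "feature", "start", "end",
               "score", "strand", "frame", "attributes")

# One attribute: a key followed by a quoted or an unquoted value.
_ATTR_RE = re.compile(r'(\S+)\s+(?:"([^"]*)"|(\S+))')


def parse_attributes(text):
    """Parse 'key "value"; key "value";' pairs into a dict (simplified)."""
    attrs = {}
    for chunk in text.split(";"):
        chunk = chunk.strip()
        if not chunk:
            continue  # skip the empty piece after the final semicolon
        match = _ATTR_RE.fullmatch(chunk)
        if match is None:
            raise ValueError(f"malformed attribute: {chunk!r}")
        key, quoted, bare = match.groups()
        attrs[key] = quoted if quoted is not None else bare
    return attrs


def parse_gtf_line(line):
    """Split one GTF line into named fields and run basic sanity checks."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(GTF_COLUMNS):
        raise ValueError(
            f"expected 9 tab-separated fields, got {len(fields)}")
    record = dict(zip(GTF_COLUMNS, fields))
    # int() raises ValueError itself if a coordinate is not an integer.
    record["start"] = int(record["start"])
    record["end"] = int(record["end"])
    if not 1 <= record["start"] <= record["end"]:
        raise ValueError("coordinates must satisfy 1 <= start <= end")
    if record["strand"] not in {"+", "-", "."}:
        raise ValueError(f"invalid strand: {record['strand']!r}")
    if record["frame"] not in {"0", "1", "2", "."}:
        raise ValueError(f"invalid frame: {record['frame']!r}")
    record["attributes"] = parse_attributes(record["attributes"])
    return record


# Hypothetical example line; the names and coordinates are made up.
example = 'chr1\texample\texon\t100\t200\t.\t+\t.\tgene_id "g1"; transcript_id "t1";'
record = parse_gtf_line(example)
```

A fuller check against a sequence, as described above, would also need the sequence itself, for instance to confirm that each feature lies within the sequence length.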

Related Research Articles

Waveform Audio File Format is an audio file format standard, developed by IBM and Microsoft, for storing an audio bitstream on PCs. It is the main format used on Microsoft Windows systems for uncompressed audio. The usual bitstream encoding is the linear pulse-code modulation (LPCM) format.

BioJava is an open-source software project dedicated to providing Java tools to process biological data. BioJava is a set of library functions written in the programming language Java for manipulating sequences, protein structures, file parsers, Common Object Request Broker Architecture (CORBA) interoperability, Distributed Annotation System (DAS), access to AceDB, dynamic programming, and simple statistical routines. BioJava supports a wide range of data, from DNA and protein sequences to the level of 3D protein structures. The BioJava libraries are useful for automating many routine bioinformatics tasks, such as parsing a Protein Data Bank (PDB) file or interacting with Jmol. This application programming interface (API) provides various file parsers, data models and algorithms to facilitate working with the standard data formats and enables rapid application development and analysis.

General transcription factor: Class of protein transcription factors

General transcription factors (GTFs), also known as basal transcriptional factors, are a class of protein transcription factors that bind to specific sites (promoter) on DNA to activate transcription of genetic information from DNA to messenger RNA. GTFs, RNA polymerase, and the mediator constitute the basic transcriptional apparatus that first bind to the promoter, then start transcription. GTFs are also intimately involved in the process of gene regulation, and most are required for life.

Microsoft Assistance Markup Language is an XML-based markup language developed by the Microsoft User Assistance Platform team to provide user assistance for the Microsoft Windows Vista operating system. It makes up the Assistance Platform on Windows Vista.

In bioinformatics, the general feature format is a file format used for describing genes and other features of DNA, RNA and protein sequences.

Transcription factor TFIIA is a nuclear protein involved in the RNA polymerase II-dependent transcription of DNA. TFIIA is one of several general (basal) transcription factors (GTFs) that are required for all transcription events that use RNA polymerase II. Other GTFs include TFIID, a complex composed of the TATA binding protein TBP and TBP-associated factors (TAFs), as well as the factors TFIIB, TFIIE, TFIIF, and TFIIH. Together, these factors are responsible for promoter recognition and the formation of a transcription preinitiation complex (PIC) capable of initiating RNA synthesis from a DNA template.

GTF2I: Protein-coding gene in the species Homo sapiens

General transcription factor II-I is a protein that in humans is encoded by the GTF2I gene.

GTF3C2: Protein-coding gene in the species Homo sapiens

General transcription factor 3C polypeptide 2 is a protein that in humans is encoded by the GTF3C2 gene.

GTF3A: Protein-coding gene in the species Homo sapiens

Transcription factor IIIA is a protein that in humans is encoded by the GTF3A gene. It was first isolated and characterized by Wolffe and Brown in 1988.

GTF2H5: Protein-coding gene in the species Homo sapiens

General transcription factor IIH subunit 5 is a protein that in humans is encoded by the GTF2H5 gene.

UGENE

UGENE is computer software for bioinformatics. It works on personal computer operating systems such as Windows, macOS, or Linux. It is released as free and open-source software, under a GNU General Public License (GPL) version 2.

GENCODE is a scientific project in genome research and part of the ENCODE scale-up project.

Integrated Genome Browser

Integrated Genome Browser (IGB) is an open-source genome browser, a visualization tool used to observe biologically-interesting patterns in genomic data sets, including sequence data, gene models, alignments, and data from DNA microarrays.

In the BitTorrent file distribution system, a torrent file or meta-info file is a computer file that contains metadata about files and folders to be distributed, and usually also a list of the network locations of trackers, which are computers that help participants in the system find each other and form efficient distribution groups called swarms. A torrent file does not contain the content to be distributed; it only contains information about those files, such as their names, folder structure, sizes, and cryptographic hash values for verifying file integrity. Torrent files are normally named with the extension ".torrent".

GTFS: Data standard for public transport information

GTFS, which stands for General Transit Feed Specification or (originally) Google Transit Feed Specification, defines a common format for public transportation schedules and associated geographic information. GTFS contains only static or scheduled information about public transport services, and is sometimes known as GTFS Static to distinguish it from the GTFS Realtime extension, which defines how information on the realtime status of services can be shared.

Variant Call Format: Text file format for genomic data

The Variant Call Format (VCF) specifies the format of a text file used in bioinformatics for storing gene sequence variations. The format was developed with the advent of large-scale genotyping and DNA sequencing projects, such as the 1000 Genomes Project. Existing formats for genetic data, such as the general feature format (GFF), stored all of the genetic data, much of which is redundant because it is shared across genomes. With the Variant Call Format, only the variations need to be stored, along with a reference genome.

Ensembl Genomes is a scientific project to provide genome-scale data from non-vertebrate species.

GFF may refer to:

ANNOVAR: Bioinformatics software

ANNOVAR is a bioinformatics software tool for the interpretation and prioritization of single nucleotide variants (SNVs), insertions, deletions, and copy number variants (CNVs) of a given genome. It can annotate the human genome builds hg18, hg19, and hg38, as well as the genomes of model organisms such as mouse, zebrafish, fruit fly, roundworm, yeast and many others. The annotations can be used to determine the functional consequences of the mutations on the genes and organisms, infer cytogenetic bands, report functional importance scores, and find variants in conserved regions. ANNOVAR, SnpEff and Variant Effect Predictor (VEP) are three of the most commonly used variant annotation tools.

The BED format is a text file format used to store genomic regions as coordinates and associated annotations. The data are presented in the form of columns separated by spaces or tabs. This format was developed during the Human Genome Project and then adopted by other sequencing projects. As a result of this increasingly wide use, this format had already become a de facto standard in bioinformatics before a formal specification was written.

References