Initial release | 30 August 2007 |
---|---|
Stable release | 2.3.1 / 11 January 2014 |
Repository | |
Written in | PHP, JavaScript, C++, MySQL |
Type | Computer-aided design, Bioinformatics |
License | Apache v2.0 |
Website | genocad |
GenoCAD is one of the earliest computer-assisted design tools for synthetic biology. [1] The software is a bioinformatics tool developed and maintained by GenoFAB, Inc. GenoCAD facilitates the design of protein expression vectors, artificial gene networks and other genetic constructs for genetic engineering and is based on the theory of formal languages. [2]
GenoCAD originated as an offshoot of an attempt to formalize functional constraints of genetic constructs using the theory of formal languages. In 2007, the website genocad.org (now retired) was set up as a proof of concept by researchers at the Virginia Bioinformatics Institute, Virginia Tech. Using the website, users could design genes by repeatedly replacing high-level genetic constructs with lower-level genetic constructs, and eventually with actual DNA sequences. [2]
On August 31, 2009, the National Science Foundation awarded a three-year grant of $1,421,725 to Dr. Jean Peccoud, an associate professor at the Virginia Bioinformatics Institute at Virginia Tech, for the development of GenoCAD. [3] GenoCAD was and continues to be developed by GenoFAB, Inc., a company founded by Peccoud (currently CSO and acting CEO), who was also one of the authors of the originating study. [2]
Source code for GenoCAD was originally released on SourceForge in December 2009. [4]
GenoCAD version 2.0 was released in November 2011 and included the ability to simulate the behavior of the designed genetic code. This feature was a result of a collaboration with the team behind COPASI. [5]
In April 2015, Peccoud and colleagues published a library of biological parts, called GenoLIB, [6] that can be incorporated into the GenoCAD platform. [7]
The four aims of the project are to develop a: [8]
The main features of GenoCAD can be organized into three main categories. [9]
GenoCAD is rooted in the theory of formal languages; in particular, the design rules describing how to combine different kinds of parts form context-free grammars. [2]
A context-free grammar can be defined by its terminals, variables, start variable and substitution rules. [11] In GenoCAD, the terminals of the grammar are sequences of DNA that perform a particular biological purpose (e.g. a promoter). The variables are less homogeneous: a variable can represent a longer sequence that has multiple functions, or a section of DNA that can hold any one of several different sequences performing the same function (e.g. a variable that represents the set of promoters). GenoCAD includes built-in substitution rules to ensure that the DNA sequence is biologically viable. Users can also define their own sets of rules for other purposes.
Designing a sequence of DNA in GenoCAD is much like creating a derivation in a context-free grammar. The user starts with the start variable and repeatedly selects a variable and a substitution for it until only terminals are left. [2]
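The derivation process described above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the grammar (`RULES`, `START`), the part names such as `prom_A`, and the placeholder sequences in `SEQUENCES` are hypothetical and are not taken from GenoCAD's actual rule sets or parts libraries. For simplicity the sketch always rewrites the leftmost variable, whereas a GenoCAD user chooses which variable to expand.

```python
from typing import Dict, List, Sequence

# Hypothetical toy grammar (NOT GenoCAD's real rules).
# Each variable maps to a list of alternative substitutions.
RULES: Dict[str, List[List[str]]] = {
    "CASSETTE": [["PROMOTER", "RBS", "GENE", "TERMINATOR"]],
    "PROMOTER": [["prom_A"], ["prom_B"]],
    "RBS": [["rbs_A"]],
    "GENE": [["gene_A"], ["gene_B"]],
    "TERMINATOR": [["term_A"]],
}
START = "CASSETTE"  # start variable

# Terminals mapped to made-up placeholder sequences (not real parts).
SEQUENCES: Dict[str, str] = {
    "prom_A": "AAAA", "prom_B": "AATT", "rbs_A": "GGAG",
    "gene_A": "ATGTAA", "gene_B": "ATGTGA", "term_A": "TTTT",
}


def derive(choices: Sequence[int]) -> List[List[str]]:
    """Leftmost derivation from START.

    At each step the leftmost variable is replaced by the alternative
    whose index is the next entry in `choices`. Returns every
    intermediate form, ending with a list made only of terminals.
    """
    form = [START]
    steps = [list(form)]
    picks = iter(choices)
    while True:
        # Position of the leftmost symbol that is still a variable.
        idx = next((i for i, s in enumerate(form) if s in RULES), None)
        if idx is None:
            return steps  # only terminals are left
        pick = next(picks, None)
        if pick is None:
            raise ValueError("not enough choices to finish the derivation")
        form = form[:idx] + RULES[form[idx]][pick] + form[idx + 1:]
        steps.append(list(form))


def to_sequence(terminals: Sequence[str]) -> str:
    """Concatenate the placeholder sequences of a fully derived design."""
    return "".join(SEQUENCES[t] for t in terminals)


if __name__ == "__main__":
    for step in derive([0, 1, 0, 0, 0]):
        print(" ".join(step))
```

Calling `derive([0, 1, 0, 0, 0])` expands `CASSETTE` into its four categories and then picks one part for each, ending with `prom_B rbs_A gene_A term_A`. Because every substitution comes from `RULES`, any design produced this way respects the grammar by construction.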
The most common alternatives to GenoCAD are Proto, GEC and EuGene. [12]
Tool | Advantages | Disadvantages |
---|---|---|
GEC | | |
EuGene | | |
Proto | | |