GenoCAD

Initial release: 30 August 2007
Stable release: 2.3.1 / 11 January 2014
Written in: PHP, JavaScript, C++, MySQL
Type: Computer-aided design, bioinformatics
License: Apache v2.0
Website: genocad.com

GenoCAD is one of the earliest computer-assisted design tools for synthetic biology. [1] The software is a bioinformatics tool developed and maintained by GenoFAB, Inc. GenoCAD facilitates the design of protein expression vectors, artificial gene networks, and other genetic constructs for genetic engineering, and is based on the theory of formal languages. [2]

History

GenoCAD originated as an offshoot of an attempt to formalize functional constraints of genetic constructs using the theory of formal languages. In 2007, the website genocad.org (now retired) was set up as a proof of concept by researchers at the Virginia Bioinformatics Institute, Virginia Tech. Using the website, users could design genes by repeatedly replacing high-level genetic constructs with lower-level genetic constructs, and eventually with actual DNA sequences. [2]

On August 31, 2009, the National Science Foundation awarded a three-year grant of $1,421,725 to Dr. Jean Peccoud, an associate professor at the Virginia Bioinformatics Institute at Virginia Tech, for the development of GenoCAD. [3] GenoCAD was and continues to be developed by GenoFAB, Inc., a company founded by Peccoud (currently CSO and acting CEO), who was also one of the authors of the originating study. [2]

Source code for GenoCAD was originally released on SourceForge in December 2009. [4]

GenoCAD version 2.0 was released in November 2011 and included the ability to simulate the behavior of the designed genetic code. This feature was a result of a collaboration with the team behind COPASI. [5]

In April 2015, Peccoud and colleagues published a library of biological parts called GenoLIB [6] that can be incorporated into the GenoCAD platform. [7]

Goals

The four aims of the project are to develop a: [8]

  1. computer language to represent the structure of synthetic DNA molecules used in E. coli, yeast, mouse, and Arabidopsis thaliana cells
  2. compiler capable of translating DNA sequences into mathematical models in order to predict the encoded phenotype
  3. collaborative workflow environment that allows users to share parts, designs, and fabrication resources
  4. means to forward the results to the user community through an external advisory board, an annual user conference, and outreach to industry

Features

The features of GenoCAD can be organized into three main categories. [9]

Figure: Workflow of GenoCAD

Theoretical foundation

GenoCAD is rooted in the theory of formal languages; in particular, the design rules describing how to combine different kinds of parts form context-free grammars. [2]

A context-free grammar can be defined by its terminals, variables, start variable, and substitution rules. [11] In GenoCAD, the terminals of the grammar are sequences of DNA that perform a particular biological purpose (e.g. a promoter). The variables are less homogeneous: a variable can represent a longer sequence that has multiple functions, or a section of DNA that can be filled by any one of several different sequences that perform the same function (e.g. a variable that represents the set of promoters). GenoCAD includes built-in substitution rules to ensure that the DNA sequence is biologically viable. Users can also define their own sets of rules for other purposes.
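The four components of such a grammar can be written down directly as data. The following Python sketch is only an illustration and does not reproduce GenoCAD's data model or its built-in rules: the `Grammar` class, the part names, the placeholder sequences, and the rules are all hypothetical. It shows terminals carrying DNA sequences, a `Promoter` variable standing for a set of interchangeable promoters, and a basic check that every rule refers only to declared symbols.

```python
from dataclasses import dataclass


@dataclass
class Grammar:
    terminals: dict  # terminal name -> DNA sequence (placeholders, not real parts)
    variables: set   # symbols that still have to be rewritten
    start: str       # start variable
    rules: dict      # variable -> list of alternative right-hand sides

    def is_well_formed(self) -> bool:
        """Return True if the start symbol and every rule use only declared symbols."""
        if self.start not in self.variables:
            return False
        if not self.variables.isdisjoint(self.terminals):
            return False  # a symbol cannot be both a terminal and a variable
        known = self.variables | set(self.terminals)
        for variable, alternatives in self.rules.items():
            if variable not in self.variables:
                return False
            for right_hand_side in alternatives:
                if any(symbol not in known for symbol in right_hand_side):
                    return False
        return True


# Hypothetical toy grammar: names, sequences, and rules are illustrative only.
TOY_GRAMMAR = Grammar(
    terminals={
        "prom_A": "TTGACAATTAAT",
        "prom_B": "TTTACAGCTAGC",
        "rbs_1": "AGGAGG",
        "gene_x": "ATGGCTTAA",
        "term_1": "GCCGCCAGTTCCGCTGGCGGC",
    },
    variables={"Cassette", "Promoter", "RBS", "Gene", "Terminator"},
    start="Cassette",
    rules={
        # A variable covering a longer, multi-function stretch of DNA.
        "Cassette": [["Promoter", "RBS", "Gene", "Terminator"]],
        # A variable standing for a set of parts with the same function.
        "Promoter": [["prom_A"], ["prom_B"]],
        "RBS": [["rbs_1"]],
        "Gene": [["gene_x"]],
        "Terminator": [["term_1"]],
    },
)

print(TOY_GRAMMAR.is_well_formed())
```

Keeping the sequences attached to the terminals only means that the rules themselves never mention nucleotides; they constrain the order and category of parts, which is the level at which the design rules are stated.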

Designing a sequence of DNA in GenoCAD is much like creating a derivation in a context-free grammar. The user starts with the start variable and repeatedly selects a variable and a substitution for it until only terminals are left. [2]
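The derivation process can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not GenoCAD's implementation: the `RULES` table, the part names, and the `derive` function are hypothetical. Each user choice is modelled as a pair (variable, index of the chosen substitution), and the design is complete once no variables remain.

```python
# Hypothetical substitution rules: variable -> list of alternative replacements.
# Any symbol that is not a key of RULES is treated as a terminal (a concrete part).
RULES = {
    "Cassette": [["Promoter", "RBS", "Gene", "Terminator"]],
    "Promoter": [["prom_A"], ["prom_B"]],
    "RBS": [["rbs_1"]],
    "Gene": [["gene_x"]],
    "Terminator": [["term_1"]],
}


def derive(start, choices):
    """Apply the chosen substitutions in order and return every intermediate design.

    `choices` is a list of (variable, alternative_index) pairs. Each pair rewrites
    the leftmost occurrence of that variable in the current design.
    """
    design = [start]
    history = [list(design)]
    for variable, alternative in choices:
        position = design.index(variable)  # ValueError if the variable is absent
        design[position:position + 1] = RULES[variable][alternative]
        history.append(list(design))
    if any(symbol in RULES for symbol in design):
        raise ValueError("derivation incomplete: variables remain in the design")
    return history


steps = derive(
    "Cassette",
    [("Cassette", 0), ("Gene", 0), ("Promoter", 1), ("RBS", 0), ("Terminator", 0)],
)
for step in steps:
    print(" ".join(step))
```

The final line printed is the finished design, a sequence of terminals only. Because the user picks which variable to rewrite at each step, the same design can be reached through different orders of substitution.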

Alternatives

The most common alternatives to GenoCAD are Proto, GEC, and EuGene. [12]

GEC
  • Advantage: Designer only needs to know basic part types and determine constraints [12]
EuGene
  • Advantage: Interfacing with other simulation and assembly tools [12]
Proto
  • Advantage: Choice of molecules and sequences can be made by other programs [12]
  • Advantage: Integration capability with some other languages [12]
  • Disadvantage: Relatively hard to learn [12]
  • Disadvantage: Results are less efficient [1]

References

  1. Beal, Jacob; Phillips, Andrew; Densmore, Douglas; Cai, Yizhi (2011). "High-Level Programming Languages for Biomolecular Systems". In Koeppl, Heinz; Densmore, Douglas; Setti, Gianluca; di Bernardo, Mario (eds.). Design and Analysis of Biomolecular Circuits. New York Dordrecht Heidelberg London: Springer. p. 241. doi:10.1007/978-1-4419-6766-4. ISBN 978-1-4419-6765-7.
  2. Cai Y; Hartnett B; Gustafsson C; Peccoud J (2007). "A syntactic model to design and verify synthetic genetic constructs derived from standard biological parts". Bioinformatics. 23 (20): 2760–7. doi:10.1093/bioinformatics/btm446. PMID 17804435.
  3. Jodi Lewis (September 14, 2009). "National Science Foundation awards $1.4 million for GenoCAD development". Archived from the original on June 11, 2015. Retrieved October 7, 2013.
  4. "GenoCAD Code". Sourceforge. Retrieved 8 October 2013.
  5. Wilson, Mandy. "GenoCAD Release Notes". Peccoud Lab. Archived from the original on 13 October 2013. Retrieved 8 October 2013.
  6. Adames, Neil; Wilson, Mandy; Fang, Gang; Lux, Matthew; Glick, Benjamin; Peccoud, Jean (April 29, 2015). "GenoLIB: a database of biological parts derived from a library of common plasmid features". Nucleic Acids Research. 43 (10): 4823–32. doi:10.1093/nar/gkv272. PMC 4446419. PMID 25925571.
  7. Adames N, Wilson M, Fang G, Lux M, Glick B, Peccoud J (2015). "GenoLIB: a database of biological parts derived from a library of common plasmid features". Nucleic Acids Research. 43 (10): 4823–32. doi:10.1093/nar/gkv272. PMC 4446419. PMID 25925571.
  8. Jean Peccoud (June 21, 2013). "GenoCAD: Computer Assisted Design of Synthetic DNA". Archived from the original on July 7, 2013. Retrieved October 7, 2013.
  9. Wilson ML; Hertzberg R; Adam L; Peccoud J (2011). "A Step-by-Step Introduction to Rule-Based Design of Synthetic Genetic Constructs Using GenoCAD". Synthetic Biology, Part B - Computer Aided Design and DNA Assembly. Methods in Enzymology. Vol. 498. pp. 173–88. doi:10.1016/B978-0-12-385120-8.00008-5. ISBN 9780123851208. PMID 21601678.
  10. Cai, Y.; Lux, M. W.; Adam, L.; Peccoud, J. (2009). Sauro, Herbert M (ed.). "Modeling Structure-Function Relationships in Synthetic DNA Sequences using Attribute Grammars". PLOS Computational Biology. 5 (10): e1000529. Bibcode:2009PLSCB...5E0529C. doi:10.1371/journal.pcbi.1000529. PMC 2748682. PMID 19816554.
  11. Sipser, Michael (2013). Introduction to the Theory of Computation, Third edition. Boston, MA, USA: Cengage Learning. p. 104. ISBN 978-1-133-18779-0.
  12. Habibi, N., Mohd Hashim, S. Z., Rodriguez, C. A., & Samian, M. R. (2013). A Review of CADs, Languages and Data Models for Synthetic Biology. Jurnal Teknologi, 63(1).
  13. Pedersen, M. (2010). Modular languages for systems and synthetic biology.