Protein production is the biotechnological process of generating a specific protein. It is typically achieved by manipulating gene expression in an organism so that it expresses large amounts of a recombinant gene. This includes the transcription of the recombinant DNA into messenger RNA (mRNA) and the translation of mRNA into polypeptide chains, which are ultimately folded into functional proteins and may be targeted to specific subcellular or extracellular locations. [1]
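The transcription and translation steps described above can be illustrated with a minimal Python sketch. It assumes the input is the coding (sense) strand of a gene, uses the standard genetic code, and ignores everything a real cell adds (promoters, ribosome binding, folding, targeting). The function names `transcribe` and `translate` and the example sequence are illustrative assumptions, not part of any particular library.

```python
from itertools import product

# Standard genetic code, laid out in the conventional T, C, A, G ordering
# of first, second and third codon base ("*" marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_strand: str) -> str:
    """Return the mRNA for a DNA coding strand (T is replaced by U)."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translate from the first AUG until a stop codon or the end of the mRNA."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon, so nothing is translated
    peptide = []
    for i in range(start, len(mrna) - 2, 3):
        codon = mrna[i:i + 3].replace("U", "T")  # table is keyed by DNA letters
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "*":
            break
        peptide.append(amino_acid)
    return "".join(peptide)


# Hypothetical 12-nucleotide insert: start codon, two codons, stop codon.
example_dna = "ATGGCCAAGTAA"
example_mrna = transcribe(example_dna)
example_peptide = translate(example_mrna)
```

Here `example_mrna` is `"AUGGCCAAGUAA"` and `example_peptide` is the one-letter sequence `"MAK"` (methionine, alanine, lysine).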
Protein production systems (also known as expression systems) are used in the life sciences, biotechnology, and medicine. Molecular biology research uses numerous proteins and enzymes, many of which come from expression systems, particularly DNA polymerase for PCR, reverse transcriptase for RNA analysis, and restriction endonucleases for cloning. Expression systems are also used to make proteins that are screened in drug discovery as biological targets or as potential drugs themselves. There are also significant applications in industrial fermentation, notably the production of biopharmaceuticals such as human insulin to treat diabetes, and the manufacture of enzymes.
Commonly used protein production systems include those derived from bacteria, [2] [3] yeast, [4] [5] baculovirus/insect, [6] mammalian cells, [7] [8] and more recently filamentous fungi such as Myceliophthora thermophila. [9] When biopharmaceuticals are produced with one of these systems, process-related impurities termed host cell proteins also end up in the final product in trace amounts. [10]
The oldest and most widely used expression systems are cell-based and may be defined as the "combination of an expression vector, its cloned DNA, and the host for the vector that provide a context to allow foreign gene function in a host cell, that is, produce proteins at a high level". [11] [12] Overexpression is an abnormally and excessively high level of gene expression that produces a pronounced gene-related phenotype. [13] [14]
There are many ways to introduce foreign DNA into a cell for expression, and many different host cells may be used; each expression system has distinct advantages and liabilities. Expression systems are normally referred to by the host and the DNA source or the delivery mechanism for the genetic material. For example, common hosts are bacteria (such as E. coli and B. subtilis), yeast (such as S. cerevisiae [5]) or eukaryotic cell lines. Common DNA sources and delivery mechanisms are viruses (such as baculovirus, retrovirus, adenovirus), plasmids, artificial chromosomes and bacteriophage (such as lambda). The best expression system depends on the gene involved. For example, Saccharomyces cerevisiae is often preferred for proteins that require significant post-translational modification, and insect or mammalian cell lines are used when human-like splicing of mRNA is required. Nonetheless, bacterial expression has the advantage of easily producing large amounts of protein, which is required for structure determination by X-ray crystallography or nuclear magnetic resonance experiments.
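The rules of thumb in the paragraph above can be written as a small decision helper. This is only a sketch of the stated heuristics: the function name `suggest_host`, its two flags, and the order in which the checks are applied are assumptions made for illustration, and a real choice of host depends on many further factors.

```python
def suggest_host(needs_human_like_splicing: bool,
                 needs_significant_ptm: bool) -> str:
    """Suggest a host class using the heuristics stated in the text.

    The priority order (splicing first, then post-translational
    modification, then bacteria as the default) is an assumption
    of this sketch, not a rule taken from the source.
    """
    if needs_human_like_splicing:
        return "insect or mammalian cell line"
    if needs_significant_ptm:
        return "Saccharomyces cerevisiae"
    # Bacteria easily produce large amounts of protein, e.g. for
    # X-ray crystallography or NMR structure determination.
    return "bacteria (e.g. E. coli)"


# A protein needing neither splicing nor significant modification.
structural_target_host = suggest_host(
    needs_human_like_splicing=False,
    needs_significant_ptm=False,
)
```

For the example call, `structural_target_host` is the bacterial option, matching the observation that bacterial expression is favoured when large amounts of protein are needed for structural studies.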
Because bacteria are prokaryotes, they are not equipped with the full enzymatic machinery to accomplish the required post-translational modifications or molecular folding. Hence, multi-domain eukaryotic proteins expressed in bacteria are often non-functional. Many proteins also aggregate into insoluble inclusion bodies, which are difficult to recover without harsh denaturants and subsequent, cumbersome protein refolding.
To address these concerns, expression systems using a range of eukaryotic cells were developed for applications that require proteins to be folded and modified as in, or more closely to, eukaryotic organisms: cells of plants (e.g. tobacco), insects, or mammals (e.g. bovines) are transfected with genes and cultured in suspension, or even as tissues or whole organisms, to produce fully folded proteins. Mammalian in vivo expression systems, however, have low yields and other limitations, such as being time-consuming and toxicity to host cells. To combine the high yield, productivity, and scalability of bacteria and yeast with the advanced epigenetic features of plant, insect, and mammalian systems, other protein production systems have been developed using unicellular eukaryotes (e.g. non-pathogenic Leishmania cells).
E. coli is one of the most widely used expression hosts, and DNA is normally introduced in a plasmid expression vector. The techniques for overexpression in E. coli are well developed and work by increasing the number of copies of the gene or by increasing the binding strength of the promoter region, thereby assisting transcription. [3]
For example, a DNA sequence for a protein of interest could be cloned or subcloned into a high copy-number plasmid containing the lac (often LacUV5) promoter, which is then transformed into the bacterium E. coli. Addition of IPTG (a lactose analog) activates the lac promoter and causes the bacteria to express the protein of interest. [2]
E. coli BL21 and BL21(DE3) are two strains commonly used for protein production. As members of the B lineage, they lack the Lon and OmpT proteases, which protects the produced proteins from degradation. The DE3 prophage found in BL21(DE3) provides T7 RNA polymerase (driven by the LacUV5 promoter), allowing vectors with the T7 promoter to be used instead. [15]
Non-pathogenic species of the gram-positive Corynebacterium are used for the commercial production of various amino acids. The C. glutamicum species is widely used for producing glutamate and lysine, [16] components of human food, animal feed and pharmaceutical products.
Expression of functionally active human epidermal growth factor has been achieved in C. glutamicum, [17] demonstrating a potential for industrial-scale production of human proteins. Expressed proteins can be targeted for secretion through either the general secretory pathway (Sec) or the twin-arginine translocation pathway (Tat). [18]
Unlike gram-negative bacteria, the gram-positive Corynebacterium lacks lipopolysaccharides that function as antigenic endotoxins in humans.
The non-pathogenic, gram-negative bacterium Pseudomonas fluorescens is used for high-level production of recombinant proteins, commonly for the development of biotherapeutics and vaccines. P. fluorescens is a metabolically versatile organism, allowing for high-throughput screening and rapid development of complex proteins. It is best known for its ability to rapidly and successfully produce high titers of active, soluble protein. [19]
Expression systems using either S. cerevisiae or Pichia pastoris allow stable and lasting production of proteins that are processed similarly to those in mammalian cells, at high yield, in chemically defined media. [4] [5]
Filamentous fungi, especially Aspergillus and Trichoderma, have long been used to produce diverse industrial enzymes from their own genomes ("native", "homologous") and from recombinant DNA ("heterologous"). [9]
More recently, Myceliophthora thermophila C1 has been developed into an expression platform for screening and production of native and heterologous proteins. The C1 expression system shows a low-viscosity morphology in submerged culture, enabling the use of complex growth and production media. C1 also does not "hyperglycosylate" heterologous proteins, as Aspergillus and Trichoderma tend to do. [9]
Baculovirus-infected insect cells [20] (Sf9, Sf21, High Five strains) or mammalian cells [21] (HeLa, HEK 293) allow production of glycosylated or membrane proteins that cannot be produced using fungal or bacterial systems. [20] [6] These systems are useful for producing proteins in high quantity. With baculovirus infection, genes are not expressed continuously because infected host cells eventually lyse and die during each infection cycle. [22]
Non-lytic insect cell expression is an alternative to the lytic baculovirus expression system. In non-lytic expression, vectors are transiently or stably transfected into the chromosomal DNA of insect cells for subsequent gene expression. [23] [24] This is followed by selection and screening of recombinant clones. [25] The non-lytic system has been used to give higher protein yield and quicker expression of recombinant genes compared to baculovirus-infected cell expression. [24] Cell lines used for this system include Sf9 and Sf21 from Spodoptera frugiperda, Hi-5 from Trichoplusia ni, and Schneider 2 and Schneider 3 cells from Drosophila melanogaster. [23] [25] With this system, cells do not lyse and several cultivation modes can be used. [23] Additionally, protein production runs are reproducible, [23] [24] and the system gives a homogeneous product. [24] A drawback of this system is the requirement of an additional screening step for selecting viable clones. [25]
Expression systems based on Leishmania tarentolae, which cannot infect mammals, allow stable and lasting production of proteins at high yield in chemically defined media. Produced proteins exhibit fully eukaryotic post-translational modifications, including glycosylation and disulfide bond formation.
The most common mammalian expression systems are Chinese hamster ovary (CHO) and human embryonic kidney (HEK) cells. [26] [27] [28]
Cell-free production of proteins is performed in vitro using purified RNA polymerase, ribosomes, tRNA and ribonucleotides. These reagents may be produced by extraction from cells or from a cell-based expression system. Due to the low expression levels and high cost of cell-free systems, cell-based systems are more widely used. [29]