A genomic library is a collection of overlapping DNA fragments that together make up the total genomic DNA of a single organism. The DNA is stored in a population of identical vectors, each containing a different insert of DNA. To construct a genomic library, the organism's DNA is extracted from cells and then digested with a restriction enzyme to cut the DNA into fragments of a specific size. The fragments are then inserted into the vector using DNA ligase. [1] Next, the vector DNA can be taken up by a host organism, commonly a population of Escherichia coli or yeast, with each cell containing only one vector molecule. Using a host cell to carry the vector allows for easy amplification and retrieval of specific clones from the library for analysis. [2]
There are several kinds of vectors available with various insert capacities. Generally, libraries made from organisms with larger genomes require vectors that accept larger inserts, so that fewer vector molecules are needed to make the library. Researchers can also choose a vector by considering the insert size that yields the desired number of clones for full genome coverage. [3]
Genomic libraries are commonly used for sequencing applications. They have played an important role in the whole genome sequencing of several organisms, including the human genome and several model organisms. [4] [5]
The first DNA-based genome to be fully sequenced was completed in 1977 by two-time Nobel Prize winner Frederick Sanger. Sanger and his team of scientists created a library of the bacteriophage phi X 174 for use in DNA sequencing. [6] This success contributed to the ever-increasing demand for genome sequencing in gene therapy research. Teams are now able to catalog polymorphisms in genomes and investigate candidate genes contributing to maladies such as Parkinson's disease, Alzheimer's disease, multiple sclerosis, rheumatoid arthritis, and Type 1 diabetes. [7] These capabilities stem from the advance of genome-wide association studies, which the ability to create and sequence genomic libraries made possible. Previously, linkage and candidate-gene studies were some of the only approaches. [8]
Construction of a genomic library involves creating many recombinant DNA molecules. An organism's genomic DNA is extracted and then digested with a restriction enzyme. For organisms with very small genomes (~10 kb), the digested fragments can be separated by gel electrophoresis. The separated fragments can then be excised and cloned into the vector separately. However, when a large genome is digested with a restriction enzyme, there are far too many fragments to excise individually. The entire set of fragments must be cloned together with the vector, and separation of clones can occur after. In either case, the fragments are ligated into a vector that has been digested with the same restriction enzyme. The vector containing the inserted fragments of genomic DNA can then be introduced into a host organism. [1]
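The digestion step described above can be illustrated with a minimal Python sketch. It assumes a single-stranded text representation of the DNA and uses the EcoRI recognition site GAATTC (cut after the first G) purely as an example; the function name `digest` and its parameters are hypothetical and not part of any library.

```python
def digest(sequence, site="GAATTC", cut_offset=1):
    """Cut `sequence` at every occurrence of `site`.

    `cut_offset` is the position inside the site where the top strand
    is cut (1 for EcoRI, which cuts G^AATTC). Only the top strand is
    modeled, so sticky-end overhangs are not represented.
    """
    fragments = []
    start = 0
    pos = sequence.find(site)
    while pos != -1:
        cut = pos + cut_offset
        if cut > start:  # skip empty fragments
            fragments.append(sequence[start:cut])
        start = cut
        pos = sequence.find(site, pos + 1)
    if start < len(sequence):
        fragments.append(sequence[start:])
    return fragments


if __name__ == "__main__":
    # Toy sequence containing two EcoRI sites.
    for fragment in digest("AAGAATTCTTGAATTCGG"):
        print(fragment)
```

Each returned fragment would then correspond to one candidate insert for ligation into a vector cut with the same enzyme.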
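The steps for creating a genomic library from a large genome are as follows:
1. Extract the organism's genomic DNA.
2. Digest the DNA with a restriction enzyme to produce fragments.
3. Ligate the fragments into vectors that have been digested with the same restriction enzyme, creating a pool of recombinant molecules.
4. Introduce the recombinant vectors into a host organism.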
After a genomic library is constructed with a viral vector, such as lambda phage, the titer of the library can be determined. Calculating the titer allows researchers to approximate how many infectious viral particles were successfully created in the library. To do this, dilutions of the library are used to infect cultures of E. coli of known concentration. The cultures are then plated on agar plates and incubated overnight. The number of viral plaques is counted and used to calculate the total number of infectious viral particles in the library. Most viral vectors also carry a marker that allows clones containing an insert to be distinguished from those that do not have an insert. This allows researchers to also determine the percentage of infectious viral particles actually carrying a fragment of the library. [11]
A similar method can be used to titer genomic libraries made with non-viral vectors, such as plasmids and BACs. A test ligation of the library can be used to transform E. coli. The transformation is then spread on agar plates and incubated overnight. The titer of the transformation is determined by counting the number of colonies present on the plates. These vectors generally have a selectable marker allowing the differentiation of clones containing an insert from those that do not. By doing this test, researchers can also determine the efficiency of the ligation and make adjustments as needed to ensure they get the desired number of clones for the library. [12]
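The titer arithmetic for both plaque and colony counts can be sketched in Python as follows. This is a minimal illustration that assumes the usual convention of dividing the count on a plate by the product of the dilution (expressed as a fraction) and the volume plated; the function names and the example numbers are hypothetical.

```python
def titer_per_ml(count, dilution, volume_plated_ml):
    """Plaque- or colony-forming units per mL of undiluted library.

    count            -- plaques or colonies counted on the plate
    dilution         -- dilution as a fraction, e.g. 1e-4 for 1:10,000
    volume_plated_ml -- volume of the diluted sample that was plated
    """
    if dilution <= 0 or volume_plated_ml <= 0:
        raise ValueError("dilution and volume must be positive")
    return count / (dilution * volume_plated_ml)


def percent_with_insert(recombinant_count, total_count):
    """Percentage of plaques or colonies whose marker indicates an insert."""
    if total_count <= 0:
        raise ValueError("total_count must be positive")
    return 100 * recombinant_count / total_count


if __name__ == "__main__":
    # Illustrative numbers only: 150 plaques from 0.1 mL of a 1e-4 dilution,
    # of which 120 score as carrying an insert.
    print(titer_per_ml(150, 1e-4, 0.1))
    print(percent_with_insert(120, 150))
```

Multiplying the titer by the total library volume would give an estimate of the total number of clones in the library.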
To isolate clones that contain regions of interest from a library, the library must first be screened. One method of screening is hybridization. Each transformed host cell of a library will contain only one vector with one insert of DNA. The whole library can be plated onto a filter over media. The filter and colonies are prepared for hybridization and then incubated with a labeled probe. [13] Because the probe hybridizes to it, the target DNA (the insert of interest) can be identified by a detection method such as autoradiography.
Another method of screening is with polymerase chain reaction (PCR). Some libraries are stored as pools of clones and screening by PCR is an efficient way to identify pools containing specific clones. [2]
Genome size varies among different organisms and the cloning vector must be selected accordingly. For a large genome, a vector with a large capacity should be chosen so that a relatively small number of clones are sufficient for coverage of the entire genome. However, it is often more difficult to characterize an insert contained in a higher capacity vector. [3]
Below is a table of several kinds of vectors commonly used for genomic libraries and the insert size that each generally holds.
Vector type | Insert size (thousands of bases) |
---|---|
Plasmids | up to 10 |
Phage lambda (λ) | up to 25 |
Cosmids | up to 45 |
Bacteriophage P1 | 70 to 100 |
P1 artificial chromosomes (PACs) | 130 to 150 |
Bacterial artificial chromosomes (BACs) | 120 to 300 |
Yeast artificial chromosomes (YACs) | 250 to 2000 |
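As a rough illustration of how the table can guide vector choice, the following Python sketch lists the vector types whose insert range covers a desired insert size. The ranges are copied from the table; treating "up to" entries as starting at 0 kb is an assumption made here for simplicity, and the names `VECTOR_INSERT_RANGE_KB` and `candidate_vectors` are hypothetical.

```python
# Insert size ranges in kb, taken from the table above.
# "up to X" entries are assumed to start at 0 kb.
VECTOR_INSERT_RANGE_KB = {
    "Plasmid": (0, 10),
    "Phage lambda": (0, 25),
    "Cosmid": (0, 45),
    "Bacteriophage P1": (70, 100),
    "PAC": (130, 150),
    "BAC": (120, 300),
    "YAC": (250, 2000),
}


def candidate_vectors(insert_kb):
    """Return the vector types whose listed range includes `insert_kb`."""
    return [
        name
        for name, (low, high) in VECTOR_INSERT_RANGE_KB.items()
        if low <= insert_kb <= high
    ]


if __name__ == "__main__":
    for size in (20, 140, 500):
        print(size, "kb:", candidate_vectors(size))
```

In practice the choice also depends on factors the table does not capture, such as clone stability and how easily the insert can be characterized.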
A plasmid is a double stranded circular DNA molecule commonly used for molecular cloning. Plasmids are generally 2 to 4 kilobase-pairs (kb) in length and are capable of carrying inserts up to 15kb. Plasmids contain an origin of replication allowing them to replicate inside a bacterium independently of the host chromosome. Plasmids commonly carry a gene for antibiotic resistance that allows for the selection of bacterial cells containing the plasmid. Many plasmids also carry a reporter gene that allows researchers to distinguish clones containing an insert from those that do not. [3]
Phage λ is a double-stranded DNA virus that infects E. coli. The λ chromosome is 48.5kb long and can carry inserts up to 25kb. These inserts replace non-essential viral sequences in the λ chromosome, while the genes required for formation of viral particles and infection remain intact. The insert DNA is replicated with the viral DNA; thus, together they are packaged into viral particles. These particles are very efficient at infection and multiplication leading to a higher production of the recombinant λ chromosomes. [3] However, due to the smaller insert size, libraries made with λ phage may require many clones for full genome coverage. [14]
Cosmid vectors are plasmids that contain a small region of bacteriophage λ DNA called the cos sequence. This sequence allows the cosmid to be packaged into bacteriophage λ particles. These particles, containing a linearized cosmid, are introduced into the host cell by transduction. Once inside the host, the cosmids circularize with the aid of the host's DNA ligase and then function as plasmids. Cosmids are capable of carrying inserts up to 40kb in size. [2]
Bacteriophage P1 vectors can hold inserts 70 to 100kb in size. They begin as linear DNA molecules packaged into bacteriophage P1 particles. These particles are injected into an E. coli strain expressing Cre recombinase. The linear P1 vector becomes circularized by recombination between two loxP sites in the vector. P1 vectors generally contain a gene for antibiotic resistance and a positive selection marker to distinguish clones containing an insert from those that do not. P1 vectors also contain a P1 plasmid replicon, which ensures only one copy of the vector is present in a cell. However, there is a second P1 replicon, called the P1 lytic replicon, that is controlled by an inducible promoter. This promoter allows the amplification of more than one copy of the vector per cell prior to DNA extraction. [2]
P1 artificial chromosomes (PACs) have features of both P1 vectors and Bacterial Artificial Chromosomes (BACs). Similar to P1 vectors, they contain a plasmid and a lytic replicon as described above. Unlike P1 vectors, they do not need to be packaged into bacteriophage particles for transduction. Instead they are introduced into E. coli as circular DNA molecules through electroporation just as BACs are. [2] Also similar to BACs, these are relatively harder to prepare due to a single origin of replication. [14]
Bacterial artificial chromosomes (BACs) are circular DNA molecules, usually about 7kb in length, that are capable of holding inserts up to 300kb in size. BAC vectors contain a replicon derived from E. coli F factor, which ensures they are maintained at one copy per cell. [4] To build a recombinant BAC, the vector is cut with a restriction enzyme and the foreign DNA is then joined to it by a ligase. Once an insert is ligated into a BAC, the BAC is introduced into recombination deficient strains of E. coli by electroporation. Most BAC vectors contain a gene for antibiotic resistance and also a positive selection marker. [2] Overall, BACs are very stable vectors, but like PACs they may be hard to prepare because they have a single origin of replication. [14]
Yeast artificial chromosomes (YACs) are linear DNA molecules containing the necessary features of an authentic yeast chromosome, including telomeres, a centromere, and an origin of replication. Large inserts of DNA can be ligated into the middle of the YAC so that there is an “arm” of the YAC on either side of the insert. The recombinant YAC is introduced into yeast by transformation; selectable markers present in the YAC allow for the identification of successful transformants. YACs can hold inserts up to 2000kb, but most YAC libraries contain inserts 250-400kb in size. Theoretically there is no upper limit on the size of insert a YAC can hold. It is the quality in the preparation of DNA used for inserts that determines the size limit. [2] The most challenging aspect of using YACs is that they are prone to rearrangement. [14]
When selecting a vector, one must ensure that the resulting library is representative of the entire genome. Any fragment of the genome produced by a restriction enzyme should have the same chance of being in the library as any other fragment. Furthermore, recombinant molecules should contain inserts large enough that the library size can be handled conveniently. [14] This depends largely on the number of clones the library must contain. The number of clones needed to sample all of the genes is determined by the size of the organism's genome as well as the average insert size. This is represented by the formula (also known as the Carbon and Clarke formula): [15]
$$N = \frac{\ln(1 - P)}{\ln(1 - f)}$$

where,
$N$ is the necessary number of recombinants [16]
$P$ is the desired probability that any fragment in the genome will occur at least once in the library created
$f$ is the fractional proportion of the genome in a single recombinant
$f$ can be further shown to be:

$$f = \frac{i}{g}$$

where,
$i$ is the insert size
$g$ is the genome size
Thus, increasing the insert size (by choice of vector) would allow for fewer clones needed to represent a genome. The proportion of the insert size versus the genome size represents the proportion of the respective genome in a single clone. [14] Here is the equation with all parts considered:

$$N = \frac{\ln(1 - P)}{\ln\left(1 - \frac{i}{g}\right)}$$

The above formula can be used to determine the number of clones needed for a 99% probability that any given sequence in a genome is represented when using a vector with an insert size of twenty thousand basepairs (such as the phage lambda vector). The genome size of the organism is three billion basepairs in this example.

$$N = \frac{\ln(1 - 0.99)}{\ln\left(1 - \frac{2.0 \times 10^{4}}{3.0 \times 10^{9}}\right)} \approx \frac{-4.605}{-6.667 \times 10^{-6}} \approx 6.91 \times 10^{5} \text{ clones}$$

Thus, approximately 691,000 clones are required to ensure a 99% probability that a given DNA sequence from this three billion basepair genome will be present in a library using a vector with an insert size of twenty thousand basepairs.
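The estimate can also be computed directly. The Python sketch below evaluates N = ln(1 − P) / ln(1 − i/g), where P is the desired probability, i the insert size and g the genome size, and rounds up to a whole number of clones. The function name `clones_needed` is hypothetical, and the use of `math.log1p` for the denominator is simply a numerical convenience for very small values of i/g.

```python
import math


def clones_needed(probability, insert_size_bp, genome_size_bp):
    """Number of clones N such that a given sequence is present in the
    library with the requested probability: N = ln(1 - P) / ln(1 - i/g)."""
    if not 0 < probability < 1:
        raise ValueError("probability must be strictly between 0 and 1")
    fraction = insert_size_bp / genome_size_bp  # f = i / g
    if not 0 < fraction < 1:
        raise ValueError("insert must be smaller than the genome")
    # log1p(-f) computes ln(1 - f) accurately when f is tiny.
    n = math.log(1 - probability) / math.log1p(-fraction)
    return math.ceil(n)


if __name__ == "__main__":
    # 99% probability, 20 kb inserts, 3 Gb genome (the example in the text).
    print(clones_needed(0.99, 20_000, 3_000_000_000))
    # Same genome with 150 kb inserts, for comparison.
    print(clones_needed(0.99, 150_000, 3_000_000_000))
```

For the 20 kb example the result is on the order of 690,000 clones, and it drops substantially when a higher-capacity vector is assumed.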
After a library is created, the genome of an organism can be sequenced to elucidate how genes affect an organism or to compare similar organisms at the genome level. The aforementioned genome-wide association studies can identify candidate genes underlying many functional traits. Genes can be isolated through genomic libraries and used on human cell lines or animal models to further research. [17] Furthermore, high-fidelity clones with accurate genome representation and no stability issues would serve well as intermediates for shotgun sequencing or for the study of complete genes in functional analysis. [10]
One major use of genomic libraries is hierarchical shotgun sequencing, which is also called top-down, map-based or clone-by-clone sequencing. This strategy was developed in the 1980s for sequencing whole genomes before high throughput techniques for sequencing were available. Individual clones from genomic libraries can be sheared into smaller fragments, usually 500bp to 1000bp, which are more manageable for sequencing. [4] Once a clone from a genomic library is sequenced, the sequence can be used to screen the library for other clones containing inserts which overlap with the sequenced clone. Any new overlapping clones can then be sequenced forming a contig. This technique, called chromosome walking, can be exploited to sequence entire chromosomes. [2]
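The idea of extending a contig through overlapping clones can be shown with a toy Python sketch. It is a deliberate simplification: real chromosome walking finds overlapping clones by screening the library (for example by hybridization or PCR), whereas this sketch assumes the clone sequences are already known and simply looks for exact suffix-to-prefix matches in one direction. The function names and the minimum overlap length are hypothetical.

```python
def overlap(left, right, min_len=3):
    """Length of the longest suffix of `left` that equals a prefix of
    `right`, or 0 if no overlap of at least `min_len` bases exists."""
    for k in range(min(len(left), len(right)), min_len - 1, -1):
        if left.endswith(right[:k]):
            return k
    return 0


def extend_contig(contig, clones, min_len=3):
    """Repeatedly append any clone that overlaps the right end of the
    contig, until no remaining clone overlaps."""
    remaining = list(clones)
    extended = True
    while extended:
        extended = False
        for clone in remaining:
            k = overlap(contig, clone, min_len)
            if k:
                contig += clone[k:]  # add only the non-overlapping part
                remaining.remove(clone)
                extended = True
                break
    return contig


if __name__ == "__main__":
    start_clone = "ATGCGTAC"
    library = ["CTTAGGCA", "GTACCTTA"]
    print(extend_contig(start_clone, library))
```

Each pass plays the role of one "step" of the walk: the current end of the contig is used to pick the next overlapping clone, which then becomes the new end.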
Whole genome shotgun sequencing is another method of genome sequencing that does not require a library of high-capacity vectors. Rather, it uses computer algorithms to assemble short sequence reads to cover the entire genome. Genomic libraries are often used in combination with whole genome shotgun sequencing for this reason. A high resolution map can be created by sequencing both ends of inserts from several clones in a genomic library. This map provides sequences of known distances apart, which can be used to help with the assembly of sequence reads acquired through shotgun sequencing. [4] The human genome sequence, which was declared complete in 2003, was assembled using both a BAC library and shotgun sequencing. [18] [19]
Genome-wide association studies are a general approach to finding specific gene targets and polymorphisms in human populations. In fact, the International HapMap project was created through a partnership of scientists and agencies from several countries to catalog and utilize this data. [20] The goal of this project is to compare genetic sequences of different individuals to elucidate similarities and differences within chromosomal regions. [20] Scientists from all of the participating nations are cataloging these attributes with data from populations of African, Asian, and European ancestry. Such genome-wide assessments may lead to further diagnostic and drug therapies while also helping future teams design therapeutics with genetic features in mind. These concepts are already being exploited in genetic engineering. [20] For example, a research team has constructed a PAC shuttle vector and used it to create a library representing two-fold coverage of the human genome. [17] This could serve as a valuable resource for identifying genes, or sets of genes, that cause disease. Moreover, these studies can serve as a powerful way to investigate transcriptional regulation, as has been seen in the study of baculoviruses. [21] Overall, advances in genome library construction and DNA sequencing have allowed for efficient discovery of different molecular targets. [5] Assimilation of these features through such efficient methods can hasten the employment of novel drug candidates.
A plasmid is a small, extrachromosomal DNA molecule within a cell that is physically separated from chromosomal DNA and can replicate independently. They are most commonly found as small circular, double-stranded DNA molecules in bacteria; however, plasmids are sometimes present in archaea and eukaryotic organisms. In nature, plasmids often carry genes that benefit the survival of the organism and confer selective advantage such as antibiotic resistance. While chromosomes are large and contain all the essential genetic information for living under normal conditions, plasmids are usually very small and contain only additional genes that may be useful in certain situations or conditions. Artificial plasmids are widely used as vectors in molecular cloning, serving to drive the replication of recombinant DNA sequences within host organisms. In the laboratory, plasmids may be introduced into a cell via transformation. Synthetic plasmids are available for procurement over the internet.
In genetics, shotgun sequencing is a method used for sequencing random DNA strands. It is named by analogy with the rapidly expanding, quasi-random shot grouping of a shotgun.
A bacterial artificial chromosome (BAC) is a DNA construct, based on a functional fertility plasmid, used for transforming and cloning in bacteria, usually E. coli. F-plasmids play a crucial role because they contain partition genes that promote the even distribution of plasmids after bacterial cell division. The bacterial artificial chromosome's usual insert size is 150–350 kbp. A similar cloning vector called a PAC has also been produced from the DNA of P1 bacteriophage.
A cloning vector is a small piece of DNA that can be stably maintained in an organism, and into which a foreign DNA fragment can be inserted for cloning purposes. The cloning vector may be DNA taken from a virus, the cell of a higher organism, or it may be the plasmid of a bacterium. The vector contains features that allow for the convenient insertion of a DNA fragment into the vector or its removal from the vector, for example through the presence of restriction sites. The vector and the foreign DNA may be treated with a restriction enzyme that cuts the DNA, and DNA fragments thus generated contain either blunt ends or overhangs known as sticky ends, and vector DNA and foreign DNA with compatible ends can then be joined by molecular ligation. After a DNA fragment has been cloned into a cloning vector, it may be further subcloned into another vector designed for more specific use.
Yeast artificial chromosomes (YACs) are genetically engineered chromosomes derived from the DNA of the yeast, Saccharomyces cerevisiae, which is then ligated into a bacterial plasmid. By inserting large fragments of DNA, from 100–1000 kb, the inserted sequences can be cloned and physically mapped using a process called chromosome walking. This is the process that was initially used for the Human Genome Project; however, due to stability issues, YACs were abandoned in favor of bacterial artificial chromosomes.
In molecular biology, a library is a collection of genetic material fragments that are stored and propagated in a population of microbes through the process of molecular cloning. There are different types of DNA libraries, including cDNA libraries, genomic libraries and randomized mutant libraries. DNA library technology is a mainstay of current molecular biology, genetic engineering, and protein engineering, and the applications of these libraries depend on the source of the original DNA fragments. There are differences in the cloning vectors and techniques used in library preparation, but in general each DNA fragment is uniquely inserted into a cloning vector and the pool of recombinant DNA molecules is then transferred into a population of bacteria or yeast such that each organism contains on average one construct. As the population of organisms is grown in culture, the DNA molecules contained within them are copied and propagated.
Transduction is the process by which foreign DNA is introduced into a cell by a virus or viral vector. An example is the viral transfer of DNA from one bacterium to another and hence an example of horizontal gene transfer. Transduction does not require physical contact between the cell donating the DNA and the cell receiving the DNA, and it is DNase resistant. Transduction is a common tool used by molecular biologists to stably introduce a foreign gene into a host cell's genome.
A cosmid is a type of hybrid plasmid that contains a Lambda phage cos sequence. Often used as cloning vectors in genetic engineering, cosmids can be used to build genomic libraries. They were first described by Collins and Hohn in 1978. Cosmids can contain 37 to 52 kb of DNA, limits based on the normal bacteriophage packaging size. They can replicate as plasmids if they have a suitable origin of replication (ori): for example SV40 ori in mammalian cells, ColE1 ori for double-stranded DNA replication, or f1 ori for single-stranded DNA replication in prokaryotes. They frequently also contain a gene for selection such as antibiotic resistance, so that the transformed cells can be identified by plating on a medium containing the antibiotic. Those cells which did not take up the cosmid would be unable to grow.
A DNA construct is an artificially-designed segment of DNA borne on a vector that can be used to incorporate genetic material into a target tissue or cell. A DNA construct contains a DNA insert, called a transgene, delivered via a transformation vector which allows the insert sequence to be replicated and/or expressed in the target cell. This gene can be cloned from a naturally occurring gene, or synthetically constructed. The vector can be delivered using physical, chemical or viral methods. Typically, the vectors used in DNA constructs contain an origin of replication, a multiple cloning site, and a selectable marker. Certain vectors can carry additional regulatory elements based on the expression system involved.
Primer walking is a technique used to clone a gene from its known closest markers. As a result, it is employed in cloning and sequencing efforts in plants, fungi, and mammals with minor alterations. This technique, also known as "directed sequencing," employs a series of Sanger sequencing reactions to either confirm the reference sequence of a known plasmid or PCR product based on the reference sequence or to discover the unknown sequence of a full plasmid or PCR product by designing primers to sequence overlapping sections.
A restriction digest is a procedure used in molecular biology to prepare DNA for analysis or other processing. It is sometimes termed DNA fragmentation, though this term is used for other procedures as well. In a restriction digest, DNA molecules are cleaved at specific restriction sites of 4-12 nucleotides in length by use of restriction enzymes which recognize these sequences.
A restriction map is a map of known restriction sites within a sequence of DNA. Restriction mapping requires the use of restriction enzymes. In molecular biology, restriction maps are used as a reference to engineer plasmids or other relatively short pieces of DNA, and sometimes for longer genomic DNA. There are other ways of mapping features on DNA for longer length DNA molecules, such as mapping by transduction.
Fosmids are similar to cosmids but are based on the bacterial F-plasmid. The cloning vector is limited, as a host can only contain one fosmid molecule. Fosmids can hold DNA inserts of up to 40 kb in size; often the source of the insert is random genomic DNA. A fosmid library is prepared by extracting the genomic DNA from the target organism and cloning it into the fosmid vector. The ligation mix is then packaged into phage particles and the DNA is transfected into the bacterial host. Bacterial clones propagate the fosmid library. The low copy number offers higher stability than vectors with relatively higher copy numbers, including cosmids. Fosmids may be useful for constructing stable libraries from complex genomes. Fosmids have high structural stability and have been found to maintain human DNA effectively even after 100 generations of bacterial growth. Fosmid clones were used to help assess the accuracy of the Public Human Genome Sequence.
P1 is a temperate bacteriophage that infects Escherichia coli and some other bacteria. When undergoing a lysogenic cycle the phage genome exists as a plasmid in the bacterium unlike other phages that integrate into the host DNA. P1 has an icosahedral head containing the DNA attached to a contractile tail with six tail fibers. The P1 phage has gained research interest because it can be used to transfer DNA from one bacterial cell to another in a process known as transduction. As it replicates during its lytic cycle it captures fragments of the host chromosome. If the resulting viral particles are used to infect a different host the captured DNA fragments can be integrated into the new host's genome. This method of in vivo genetic engineering was widely used for many years and is still used today, though to a lesser extent. P1 can also be used to create the P1-derived artificial chromosome cloning vector which can carry relatively large fragments of DNA. P1 encodes a site-specific recombinase, Cre, that is widely used to carry out cell-specific or time-specific DNA recombination by flanking the target DNA with loxP sites.
In the fields of bioinformatics and computational biology, genome survey sequences (GSS) are nucleotide sequences similar to expressed sequence tags (ESTs); the only difference is that most of them are genomic in origin, rather than derived from mRNA.
Functional cloning is a molecular cloning technique that relies on prior knowledge of the encoded protein’s sequence or function for gene identification. In this assay, a genomic or cDNA library is screened to identify the genetic sequence of a protein of interest. Expression cDNA libraries may be screened with antibodies specific for the protein of interest or may rely on selection via the protein function. Historically, the amino acid sequence of a protein was used to prepare degenerate oligonucleotides which were then probed against the library to identify the gene encoding the protein of interest. Once candidate clones carrying the gene of interest are identified, they are sequenced and their identity is confirmed. This method of cloning allows researchers to screen entire genomes without prior knowledge of the location of the gene or the genetic sequence.
A P1-derived artificial chromosome, or PAC, is a DNA construct derived from the DNA of P1 bacteriophages and bacterial artificial chromosomes. It can carry large amounts of other sequences for a variety of bioengineering purposes in bacteria. It is one type of efficient cloning vector used to clone DNA fragments in Escherichia coli cells.
In molecular cloning, a vector is any particle used as a vehicle to artificially carry a foreign nucleic sequence – usually DNA – into another cell, where it can be replicated and/or expressed. A vector containing foreign DNA is termed recombinant DNA. The four major types of vectors are plasmids, viral vectors, cosmids, and artificial chromosomes. Of these, the most commonly used vectors are plasmids. Common to all engineered vectors are an origin of replication, a multicloning site, and a selectable marker.
Molecular cloning is a set of experimental methods in molecular biology that are used to assemble recombinant DNA molecules and to direct their replication within host organisms. The use of the word cloning refers to the fact that the method involves the replication of one molecule to produce a population of cells with identical DNA molecules. Molecular cloning generally uses DNA sequences from two different organisms: the species that is the source of the DNA to be cloned, and the species that will serve as the living host for replication of the recombinant DNA. Molecular cloning methods are central to many contemporary areas of modern biology and medicine.
A plant genome assembly represents the complete genomic sequence of a plant species, which is assembled into chromosomes and other organelles by using DNA fragments that are obtained from different types of sequencing technology.
Klug, Cummings, Spencer, Palladino (2010). Essentials of Genetics. Pearson. pp. 355–264. ISBN 978-0-321-61869-6.