DNA profiling is the determination of a DNA profile for legal and investigative purposes. DNA analysis methods have changed many times over the years as technology has advanced, allowing more information to be determined from less starting material. Modern DNA analysis is based on the statistical calculation of the rarity of the produced profile within a population.
While most well known as a tool in forensic investigations, DNA profiling can also be used for non-forensic purposes such as paternity testing and human genealogy research.
The methods for producing a DNA profile were developed by Alec Jeffreys and his team in 1985. Jeffreys discovered that an unknown sample of DNA, such as from blood, hair, saliva, or semen, could be analyzed and a unique DNA pattern, or profile, could be developed. [1] A year after his discovery, Jeffreys was asked to use his newly developed DNA analysis to help convict a man whom police believed was responsible for two rape-murders. Instead, using DNA from the crime scene, Jeffreys proved that the man was innocent. [2]
When DNA analysis was first developed, a process called restriction fragment length polymorphism (RFLP) analysis was used. However, RFLP was inefficient because it consumed large amounts of DNA, which could not always be obtained from a crime scene. Modern technology has moved beyond RFLP, and short tandem repeat (STR) analysis is its present-day equivalent. STR analysis requires less sample, and it relies on the polymerase chain reaction (PCR), a process that can quickly produce up to a billion copies of a single segment of DNA. [3]
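The "billion copies" figure follows from the doubling arithmetic of PCR. The sketch below is a minimal illustration, assuming ideal amplification in which every cycle exactly doubles the number of target copies; real reactions are less efficient and eventually plateau. The function name and the `efficiency` parameter are illustrative and do not come from the source.

```python
def pcr_copies(start_copies: int, cycles: int, efficiency: float = 1.0) -> float:
    """Estimate the copy number after a given number of PCR cycles.

    efficiency=1.0 is the idealized case in which every template is
    duplicated in every cycle (an assumption, not a measured value).
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return start_copies * (1.0 + efficiency) ** cycles


if __name__ == "__main__":
    # One template molecule after 30 ideal cycles: 2**30 copies.
    print(f"{pcr_copies(1, 30):,.0f}")  # 1,073,741,824
```

Under this idealized model, about 30 cycles are enough to pass one billion copies from a single starting molecule.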
The first true method of DNA profiling was restriction fragment length polymorphism analysis. The first use of RFLP analysis in forensic casework was in 1985 in the United Kingdom. [4] This type of analysis used variable number tandem repeats (VNTRs) to distinguish between individuals. VNTRs are common throughout the genome and consist of the same DNA sequence repeated again and again. [5] Different individuals can have a different number of repeats at a specific location in the genome. [4] For example, person A could have 4 repeats while person B could have 5. The differences were visualized through a process called gel electrophoresis. Smaller fragments would travel farther through the gel than larger fragments, separating them out. [6] These differences were used to distinguish between individuals, and when multiple VNTR sites were run together, RFLP analysis had a high degree of individualizing power. [7]
The process of RFLP analysis was extremely time-consuming and, because of the length of the repeats used (between 9 and 100 base pairs), [5] [8] amplification methods such as the polymerase chain reaction could not be used. This limited RFLP to samples that already had a large quantity of DNA available, and it did not perform well with degraded samples. [9] RFLP analysis was the primary type of analysis performed in most forensic laboratories before being retired and replaced by newer methods. It was fully abandoned by the FBI in 2000 and replaced with STR analysis. [10]
Developed in 1991, [10] DQ alpha testing was the first forensic DNA technique that utilized the polymerase chain reaction. [11] This technique required far fewer cells than RFLP analysis, making it more useful for crime scenes that did not have the large amounts of DNA material that were previously required. [12] The DQ alpha 1 locus (or location) was also polymorphic, with multiple alleles that could be used to limit the pool of individuals who could have produced a given result, thereby increasing the probability of exclusion. [13]
The DQ alpha locus was combined with other loci in a commercially available kit called Polymarker in 1993. [14] Polymarker was a precursor to modern multiplexing kits and allowed multiple loci to be examined with one product. While more sensitive than RFLP analysis, Polymarker did not have the same discriminatory power as the older RFLP testing. [14] By 1995, scientists attempted to return to a VNTR-based analysis combined with PCR technology, called amplified fragment length polymorphism (AmpFLP) analysis. [10]
AmpFLP was the first attempt to couple VNTR analysis with PCR for forensic casework. This method used shorter VNTRs than RFLP analysis, between 8 and 16 base pairs. The shorter fragments were designed to work better with PCR amplification. [8] It was hoped that this technique would combine the discriminating power of RFLP analysis with the ability to process samples that had less template DNA or were otherwise degraded. However, only a few loci were validated for forensic use with AmpFLP analysis because forensic labs quickly moved on to other techniques, which limited its discriminating ability for forensic samples. [15]
The technique was ultimately never widely used although it is still in use in smaller countries due to its lower cost and simpler setup compared to newer methods. [16] [17] By the late 1990s, laboratories began switching over to newer methods including STR analysis. These used even shorter fragments of DNA and could more reliably be amplified using PCR while still maintaining, and improving, the discriminatory power of the older methods. [10]
Short tandem repeat (STR) analysis is the primary type of forensic DNA analysis performed in modern DNA laboratories. STR analysis builds upon the RFLP and AmpFLP methods used in the past by shrinking the size of the repeat units to between 2 and 6 base pairs and by combining multiple loci into one PCR reaction. These multiplexing assay kits can produce allele values for dozens of loci throughout the genome simultaneously, reducing the time it takes to obtain a full, individualizing profile. STR analysis has become the gold standard for DNA profiling and is used extensively in forensic applications.
STR analysis can also be restricted to just the Y chromosome. Y-STR analysis can be used in cases that involve paternity or in familial searching, as the Y chromosome is identical down the paternal line (except where a mutation has occurred). Certain multiplexing kits combine both autosomal and Y-STR loci into one kit, further reducing the time it takes to obtain a large amount of data.
Currently, STR analysis requires multiple cells to create a full DNA profile. However, science is getting closer to creating a full DNA profile using STR analysis on single cells. [18]
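How an allele value relates to a repeat count can be sketched in a few lines. The example below counts the longest uninterrupted run of a repeat motif in a sequence string. The flanking sequences and the 4-base motif are made up for illustration, and actual STR typing infers the repeat number from the length of the PCR product rather than from reading a string.

```python
import re


def longest_tandem_run(sequence: str, motif: str) -> int:
    """Return the largest number of back-to-back copies of `motif` in `sequence`."""
    if not motif:
        raise ValueError("motif must be non-empty")
    pattern = re.compile(f"(?:{re.escape(motif)})+")
    runs = [len(m.group(0)) // len(motif) for m in pattern.finditer(sequence)]
    return max(runs, default=0)


# Hypothetical alleles: identical flanking sequence, different repeat counts.
FLANK_5, FLANK_3 = "ACGTTGCA", "TTGACCGT"
allele_a = FLANK_5 + "GATA" * 4 + FLANK_3
allele_b = FLANK_5 + "GATA" * 5 + FLANK_3

if __name__ == "__main__":
    print(longest_tandem_run(allele_a, "GATA"))  # 4
    print(longest_tandem_run(allele_b, "GATA"))  # 5
    # The alleles differ in length by exactly one repeat unit.
    print(len(allele_b) - len(allele_a))         # 4
```

The two alleles would be reported as "4" and "5" at this hypothetical locus, and the 4-base length difference is what a fragment-sizing instrument would detect.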
Mitochondrial DNA sequencing is a specialized technique that uses the separate mitochondrial DNA present in most cells. This DNA is passed down the maternal line and is not unique between individuals. However, because of the number of mitochondria present in cells, mtDNA analysis can be used for highly degraded samples or samples where STR analysis would not produce enough data to be useful. mtDNA is also present in locations where autosomal DNA would be absent, such as in the shafts of hair.
Because of the increased chance of contamination when dealing with mtDNA, few laboratories process mitochondrial samples. Those that do have specialized protocols in place that further separate different samples from each other to avoid cross-contamination.
Rapid DNA is a "swab in-profile out" technology that completely automates the DNA extraction, amplification, and analysis process. Rapid DNA instruments can go from a swab to a DNA profile in as little as 90 minutes and eliminate the need for trained scientists to perform the process. These instruments are being considered for use in the offender booking process, allowing police officers to obtain the DNA profile of the person under arrest.
The Rapid DNA Act of 2017 was passed in the United States, directing the FBI to create protocols for the implementation of this technology throughout the country. Currently, DNA profiles obtained from these instruments are not eligible for upload to national DNA databases, as the instruments do not analyze enough loci to meet the standard threshold. However, multiple police agencies already use Rapid DNA instruments to collect samples from people arrested in their area. These local DNA databases are not subject to federal or state regulations.
Also known as next-generation sequencing, massively parallel sequencing (MPS) builds upon STR analysis by introducing direct sequencing of the loci. Instead of the number of repeats present at each location, MPS would give the scientist the actual base pair sequence. Theoretically MPS has the ability to distinguish between identical twins as random point mutations would be seen within repeat segments that would not be picked up by traditional STR analysis.
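The difference between length-based STR typing and sequencing can be illustrated with two made-up alleles that have the same length, and therefore the same repeat-count call, but differ at one base. The sequences and function names below are hypothetical and serve only to show the idea.

```python
from __future__ import annotations


def length_based_call(allele: str, motif_len: int = 4) -> int:
    """Mimic fragment-length typing: only the number of repeat units is seen."""
    return len(allele) // motif_len


def sequence_differences(a: str, b: str) -> list[tuple[int, str, str]]:
    """List (position, base_in_a, base_in_b) for two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


# Hypothetical repeat regions: allele_y carries a single point mutation.
allele_x = "GATA" * 6
allele_y = "GATA" * 3 + "GACA" + "GATA" * 2

if __name__ == "__main__":
    # Both alleles are called "6" by length alone.
    print(length_based_call(allele_x), length_based_call(allele_y))
    # Sequencing exposes the single-base difference.
    print(sequence_differences(allele_x, allele_y))
```

Length-based typing reports both alleles identically, while comparing the sequences exposes the single-base difference.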
When a DNA profile is used as evidence, a match statistic is provided that explains how rare the profile is within a population. Specifically, this statistic is the probability that a person picked randomly out of a population would have that specific DNA profile. It is not the probability that the profile "matches" someone. There are multiple methods for determining this statistic, and each is used by various laboratories based on their experience and preference. However, likelihood ratio calculations are becoming the preferred method over the other two most commonly used methods, random man not excluded and combined probability of inclusion. Match statistics are especially important in mixture interpretation, where there is more than one contributor to a DNA profile. When these statistics are given in a courtroom setting or in a laboratory report, they are usually given for the three most common races of that specific area. This is because the allele frequencies at different loci vary with an individual's ancestry. https://strbase.nist.gov/training/6_Mixture-Statistics.pdf
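As a rough illustration of how such a statistic is assembled for a single-source profile, the sketch below applies the product rule: genotype frequencies under Hardy-Weinberg assumptions (p² for a homozygote, 2pq for a heterozygote) are multiplied across loci. The Hardy-Weinberg and locus-independence assumptions, the function names, and the locus `LOCUS_X` with its frequencies are illustrative and not from the source. Only the vWA allele frequencies are the ones quoted in this article. Casework formulas typically include corrections, for example for population substructure, that are omitted here.

```python
from math import prod


def genotype_frequency(allele_1: str, allele_2: str, freqs: dict) -> float:
    """Hardy-Weinberg genotype frequency: p*p if homozygous, else 2*p*q."""
    p, q = freqs[allele_1], freqs[allele_2]
    return p * p if allele_1 == allele_2 else 2 * p * q


def random_match_probability(profile: dict, allele_freqs: dict) -> float:
    """Multiply genotype frequencies across loci (assumes independent loci)."""
    return prod(
        genotype_frequency(a, b, allele_freqs[locus])
        for locus, (a, b) in profile.items()
    )


# vWA frequencies are those quoted in this article; LOCUS_X is hypothetical.
ALLELE_FREQS = {
    "vWA": {"14": 0.10204, "17": 0.26276},
    "LOCUS_X": {"8": 0.2, "9": 0.3},
}
PROFILE = {"vWA": ("14", "17"), "LOCUS_X": ("9", "9")}

if __name__ == "__main__":
    rmp = random_match_probability(PROFILE, ALLELE_FREQS)
    print(f"random match probability = {rmp:.6f} (about 1 in {1 / rmp:,.0f})")
```

Each added locus multiplies in another factor below one, which is why profiles with many loci reach very small match probabilities.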
The probability produced by the random man not excluded (RMNE) method is the probability that a person randomly selected out of the population could not be excluded from the analyzed data. This type of match statistic is straightforward to obtain and easy to explain in a courtroom setting to individuals who have no scientific background, but it loses a lot of discriminating power because it does not take into account the suspect's genotype. This approach is commonly used when the sample is degraded or contains so many contributors that a singular profile cannot be determined. However, due to its limited discriminating power, RMNE is not generally performed unless no other method can be used. RMNE is not recommended for use in data that indicates a mixture is present.
Combined probability of inclusion (CPI) or exclusion (CPE) calculates the probability that a random, unrelated person would be a contributor to a DNA profile or DNA mixture. In this method, the statistic for each individual locus is determined using population allele frequencies, and the values for all available loci are then multiplied together to give the total CPI or CPE. Because the values are multiplied together, extremely small numbers can be achieved using CPI. CPI or CPE is considered an acceptable statistical calculation when a mixture is indicated. https://www.promega.com/-/media/files/resources/conference-proceedings/ishi-15/parentage-and-mixture-statistics-workshop/generalpopulationstats.pdf?la=en
Probability of a Caucasian having a 14 allele at vWA = 0.10204
Probability of a Caucasian having a 17 allele at vWA = 0.26276
Probability of a Caucasian having either a 14 or a 17 allele (P) = 0.10204 + 0.26276 = 0.3648
Probability of any other allele being present (Q) = 1 - P = 1 - 0.3648 = 0.6352
Probability of exclusion (PE) for vWA = $Q^2 + 2Q(1 - Q)$ = $0.6352^2 + 2(0.6352)(1 - 0.6352)$ = 0.86692096 ≈ 86.69%
Probability of inclusion for vWA = 1 - PE = 1 - 0.86692096 = 0.13307904 ≈ 13.31%
With four alleles observed at the locus, the same calculation gives:
Probability of a Caucasian having a 14 allele at vWA = 0.10204
Probability of a Caucasian having a 15 allele at vWA = 0.11224
Probability of a Caucasian having a 16 allele at vWA = 0.20153
Probability of a Caucasian having a 19 allele at vWA = 0.08418
Probability of a Caucasian having any of the 14, 15, 16, or 19 alleles (P) = 0.10204 + 0.11224 + 0.20153 + 0.08418 = 0.49999
Probability of any other allele being present (Q) = 1 - P = 1 - 0.49999 = 0.50001
Probability of exclusion (PE) for vWA = $Q^2 + 2Q(1 - Q)$ = $0.50001^2 + 2(0.50001)(1 - 0.50001)$ = 0.7500099999 ≈ 75%
Probability of inclusion for vWA = 1 - PE = 1 - 0.7500099999 = 0.2499900001 ≈ 25%
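The two worked examples can be reproduced with a short script. This is a sketch of the arithmetic only, and the function names are illustrative. The final combined value treats the two examples as if they came from two different loci purely to show the multiplication step, and it takes CPE as 1 - CPI, which is an assumption of this sketch.

```python
from math import prod


def probability_of_inclusion(observed_freqs) -> float:
    """PI at one locus: P squared, where P sums the observed allele frequencies."""
    p = sum(observed_freqs)
    return p * p


def probability_of_exclusion(observed_freqs) -> float:
    """PE at one locus: Q^2 + 2Q(1 - Q), with Q = 1 - P."""
    q = 1.0 - sum(observed_freqs)
    return q * q + 2 * q * (1 - q)


def combined_probability_of_inclusion(loci) -> float:
    """CPI: product of the per-locus PI values."""
    return prod(probability_of_inclusion(freqs) for freqs in loci)


# Allele frequencies quoted in the worked examples (vWA, Caucasian).
EXAMPLE_1 = [0.10204, 0.26276]                    # alleles 14, 17
EXAMPLE_2 = [0.10204, 0.11224, 0.20153, 0.08418]  # alleles 14, 15, 16, 19

if __name__ == "__main__":
    for name, freqs in (("example 1", EXAMPLE_1), ("example 2", EXAMPLE_2)):
        pe = probability_of_exclusion(freqs)
        pi = probability_of_inclusion(freqs)
        print(f"{name}: PE = {pe:.8f}, PI = {pi:.8f}")
    # Demonstration only: pretend the two examples are separate loci.
    cpi = combined_probability_of_inclusion([EXAMPLE_1, EXAMPLE_2])
    print(f"CPI = {cpi:.8f}, CPE = {1 - cpi:.8f}")
```

Because Q² + 2Q(1 - Q) simplifies to 1 - P², the per-locus probability of inclusion is just P², which is what `probability_of_inclusion` computes directly.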
A likelihood ratio (LR) compares two probabilities to determine which is more likely. At trial, the LR is the probability of the evidence under the prosecution's hypothesis divided by the probability of the evidence under the defense's hypothesis, given each side's starting assumptions. In this scenario the prosecution's probability is often equal to 1, since under the prosecution's assumption that the suspect is the source of the sample, the observed profile is exactly what would be expected. Likelihood ratios are becoming more common in laboratories because they are useful for presenting statistics for data that indicates multiple contributors, and because they are used in probabilistic genotyping software that predicts the most likely allele combinations given a set of data.
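For the simplest case, a single-source profile that matches the suspect, the ratio reduces to one over the genotype frequency. The sketch below makes that assumption explicit, using Hardy-Weinberg genotype frequencies and the vWA allele frequencies for alleles 14 and 17 quoted in this article. The function name is illustrative, and mixtures and probabilistic genotyping rely on far more elaborate models than this.

```python
def single_source_lr(p: float, q: float, homozygous: bool = False) -> float:
    """Likelihood ratio for a matching single-source profile at one locus.

    Numerator: probability of the evidence if the suspect is the source,
    taken to be 1 (an assumption of this simple model).
    Denominator: probability that a random person has the genotype,
    p*p for a homozygote (q is ignored) or 2*p*q for a heterozygote.
    """
    genotype_freq = p * p if homozygous else 2 * p * q
    if genotype_freq <= 0:
        raise ValueError("genotype frequency must be positive")
    return 1.0 / genotype_freq


if __name__ == "__main__":
    # Heterozygous 14,17 genotype at vWA with the frequencies quoted in the text.
    lr = single_source_lr(0.10204, 0.26276)
    print(f"LR = {lr:.2f}")
```

An LR above 1 favors the prosecution's hypothesis and an LR below 1 favors the defense's. Per-locus values are typically multiplied across loci, under the same independence assumption used for other match statistics.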
The drawbacks of likelihood ratios are that it is difficult to understand how analysts arrived at a specific value and that the mathematics becomes very complicated as more data are introduced into the equations. To address these problems in a courtroom setting, some laboratories have set up a "verbal scale" that replaces the numerical value of the likelihood ratio.
The polymerase chain reaction (PCR) is a method widely used to make millions to billions of copies of a specific DNA sample rapidly, allowing scientists to amplify a very small sample of DNA sufficiently to enable detailed study. PCR was invented in 1983 by American biochemist Kary Mullis at Cetus Corporation. Mullis and biochemist Michael Smith, who had developed other essential ways of manipulating DNA, were jointly awarded the Nobel Prize in Chemistry in 1993.
In molecular biology, restriction fragment length polymorphism (RFLP) is a technique that exploits variations in homologous DNA sequences, known as polymorphisms, in order to distinguish individuals, populations, or species or to pinpoint the locations of genes within a sequence. The term may refer to a polymorphism itself, as detected through the differing locations of restriction enzyme sites, or to a related laboratory technique by which such differences can be illustrated. In RFLP analysis, a DNA sample is digested into fragments by one or more restriction enzymes, and the resulting restriction fragments are then separated by gel electrophoresis according to their size.
A microsatellite is a tract of repetitive DNA in which certain DNA motifs are repeated, typically 5–50 times. Microsatellites occur at thousands of locations within an organism's genome. They have a higher mutation rate than other areas of DNA leading to high genetic diversity. Microsatellites are often referred to as short tandem repeats (STRs) by forensic geneticists and in genetic genealogy, or as simple sequence repeats (SSRs) by plant geneticists.
DNA profiling is the process of determining an individual's deoxyribonucleic acid (DNA) characteristics. DNA analysis intended to identify a species, rather than an individual, is called DNA barcoding.
A variable number tandem repeat is a location in a genome where a short nucleotide sequence is organized as a tandem repeat. These can be found on many chromosomes, and often show variations in length among individuals. Each variant acts as an inherited allele, allowing them to be used for personal or parental identification. Their analysis is useful in genetics and biology research, forensics, and DNA fingerprinting.
The first isolation of deoxyribonucleic acid (DNA) was done in 1869 by Friedrich Miescher. DNA extraction is the process of isolating DNA from the cells of an organism isolated from a sample, typically a biological sample such as blood, saliva, or tissue. It involves breaking open the cells, removing proteins and other contaminants, and purifying the DNA so that it is free of other cellular components. The purified DNA can then be used for downstream applications such as PCR, sequencing, or cloning. Currently, it is a routine procedure in molecular biology or forensic analyses.
Forensic identification is the application of forensic science, or "forensics", and technology to identify specific objects from the trace evidence they leave, often at a crime scene or the scene of an accident. Forensic means "for the courts".
A Y-STR is a short tandem repeat (STR) on the Y chromosome. Y-STRs are often used in forensics, paternity, and genealogical DNA testing. Because the Y chromosome is found only in males and is passed down only from father to son, it is practically identical within any paternal line. Y-STRs therefore provide significantly less distinction between samples than autosomal STRs, which have much greater discriminating power because each person inherits a random combination of chromosomes from both parents.
Second Generation Multiplex Plus (SGM Plus) is a DNA profiling system developed by Applied Biosystems. It is an updated version of Second Generation Multiplex. SGM Plus has been used by the UK National DNA Database since 1998.
Forensic biology is the application of biological principles and techniques in the investigation of criminal and civil cases.
Short tandem repeat (STR) analysis is a common molecular biology method used to compare allele repeats at specific loci in DNA between two or more samples. A short tandem repeat is a microsatellite with repeat units that are 2 to 7 base pairs in length, with the number of repeats varying among individuals, making STRs effective for human identification purposes. This method differs from restriction fragment length polymorphism analysis (RFLP) since STR analysis does not cut the DNA with restriction enzymes. Instead, polymerase chain reaction (PCR) is employed to discover the lengths of the short tandem repeats based on the length of the PCR product.
An allele-specific oligonucleotide (ASO) is a short piece of synthetic DNA complementary to the sequence of a variable target DNA. It acts as a probe for the presence of the target in a Southern blot assay or, more commonly, in the simpler dot blot assay. It is a common tool used in genetic testing, forensics, and molecular biology research.
The history of the polymerase chain reaction (PCR) has variously been described as a classic "Eureka!" moment, or as an example of cooperative teamwork between disparate researchers.
The versatility of polymerase chain reaction (PCR) has led to modifications of the basic protocol being used in a large number of variant techniques designed for various purposes. This article summarizes many of the most common variations currently or formerly used in molecular biology laboratories; familiarity with the fundamental premise by which PCR works and corresponding terms and concepts is necessary for understanding these variant techniques.
In paternity testing, Paternity Index (PI) is a calculated value generated for a single genetic marker or locus and is associated with the statistical strength or weight of that locus in favor of or against parentage given the phenotypes of the tested participants and the inheritance scenario. Phenotype typically refers to physical characteristics such as body plan, color, behavior, etc. in organisms. However, the term used in the area of DNA paternity testing refers to what is observed directly in the laboratory. Laboratories involved in parentage testing and other fields of human identity employ genetic testing panels that contain a battery of loci each of which is selected due to extensive allelic variations within and between populations. These genetic variations are not assumed to bestow physical and/or behavioral attributes to the person carrying the allelic arrangement(s) and therefore are not subject to selective pressure and follow Hardy Weinberg inheritance patterns.
The Combined DNA Index System (CODIS) is the United States national DNA database created and maintained by the Federal Bureau of Investigation. CODIS consists of three levels of information: Local DNA Index Systems (LDIS), where DNA profiles originate; State DNA Index Systems (SDIS), which allow laboratories within a state to share information; and the National DNA Index System (NDIS), which allows states to compare DNA information with one another.
The terms "relative fluorescence units" (RFU) and "RFU peak" refer to measurements in electrophoresis methods, such as for DNA analysis. A "relative fluorescence unit" is a unit of measurement used in analysis which employs fluorescence detection. Fluorescence is detected using a charge-coupled device (CCD) array when the labeled fragments, which are separated within a capillary by electrophoresis, are energized by laser light and travel across the detection window. A computer program measures the results, determining the quantity or size of the fragments at each data point from the level of fluorescence intensity. Samples which contain higher quantities of amplified DNA will have higher corresponding RFU values.
Multiple loci VNTR analysis (MLVA) is a method employed for the genetic analysis of particular microorganisms, such as pathogenic bacteria, that takes advantage of the polymorphism of tandemly repeated DNA sequences. A "VNTR" is a "variable-number tandem repeat". This method is well known in forensic science since it is the basis of DNA fingerprinting in humans. When applied to bacteria, it contributes to forensic microbiology through which the source of a particular strain might eventually be traced back, making it a useful technique for outbreak surveillance.
Community fingerprinting is a set of molecular biology techniques that can be used to quickly profile the diversity of a microbial community. Rather than directly identifying or counting individual cells in an environmental sample, these techniques show how many variants of a gene are present. In general, it is assumed that each different gene variant represents a different type of microbe. Community fingerprinting is used by microbiologists studying a variety of microbial systems to measure biodiversity or track changes in community structure over time. The method analyzes environmental samples by assaying genomic DNA. This approach offers an alternative to microbial culturing, which is important because most microbes cannot be cultured in the laboratory. Community fingerprinting does not result in identification of individual microbe species; instead, it presents an overall picture of a microbial community. These methods are now largely being replaced by high throughput sequencing, such as targeted microbiome analysis and metagenomics.
Bulked segregant analysis (BSA) is a technique used to identify genetic markers associated with a mutant phenotype. This allows geneticists to discover genes conferring certain traits of interest, such as disease resistance or susceptibility.