Forensic DNA analysis

DNA double helix

DNA profiling is the determination of a DNA profile for legal and investigative purposes. DNA analysis methods have changed repeatedly over the years as advances in technology have allowed more information to be determined from less starting material. Modern DNA analysis is based on the statistical calculation of the rarity of the produced profile within a population.

While best known as a tool in forensic investigations, DNA profiling can also be used for non-forensic purposes such as paternity testing and human genealogy research.

History

The methods for producing a DNA profile were developed by Alec Jeffreys and his team in 1985. Jeffreys discovered that an unknown sample of biological material such as blood, hair, saliva, or semen could be analyzed and a unique DNA pattern, or profile, could be developed from it. [1] A year after his discovery, Jeffreys was asked to apply his newly developed DNA analysis to the case of a man whom police believed was responsible for two rape-murders. Using DNA from the crime scene, Jeffreys showed that the man was innocent. [2]

When DNA analysis was first developed, a process called restriction fragment length polymorphism (RFLP) was used to analyze DNA. However, RFLP was inefficient because it consumed large amounts of DNA, which could not always be obtained from a crime scene. Modern technology has moved beyond RFLP, and short tandem repeat (STR) analysis is its modern-day successor. STR analysis not only requires less sample but also relies on the polymerase chain reaction (PCR), a process that can quickly produce up to a billion copies of a single segment of DNA. [3]

Methods

Retired methods

RFLP analysis

Six individuals with different number of repeats at one location in their genome shown on a gel.

The first true method of DNA profiling was restriction fragment length polymorphism analysis. The first use of RFLP analysis in forensic casework was in 1985 in the United Kingdom. [4] This type of analysis used variable number tandem repeats (VNTRs) to distinguish between individuals. VNTRs are common throughout the genome and consist of the same DNA sequence repeated again and again. [5] Different individuals can have a different number of repeats at a specific location in the genome. [4] For example, person A could have four repeats while person B could have five. The differences were visualized through a process called gel electrophoresis. Smaller fragments travel farther through the gel than larger fragments, which separates them by size. [6] These differences were used to distinguish between individuals, and when multiple VNTR sites were run together, RFLP analysis had a high degree of individualizing power. [7]

The process of RFLP analysis was extremely time consuming, and because of the length of the repeats used, between 9 and 100 base pairs, [5] [8] amplification methods such as the polymerase chain reaction could not be used. This limited RFLP to samples that already contained a large quantity of DNA, and the method did not perform well with degraded samples. [9] RFLP analysis was the primary type of analysis performed in most forensic laboratories before being retired and replaced by newer methods. It was fully abandoned by the FBI in 2000 and replaced with STR analysis. [10]

DQ alpha testing

A DQ alpha testing strip showing a positive result. The filled in dots represent the allele values for that sample.

Developed in 1991, [10] DQ alpha testing was the first forensic DNA technique that utilized the polymerase chain reaction. [11] This technique required far fewer cells than RFLP analysis, making it more useful for crime scenes that did not yield the large amounts of DNA material that were previously required. [12] The DQ alpha 1 locus (or location) was also polymorphic and had multiple different alleles, which could be used to limit the pool of individuals who could have produced a given result and so increase the probability of exclusion. [13]

The DQ alpha locus was combined with other loci in a commercially available kit called Polymarker in 1993. [14] Polymarker was a precursor to modern multiplexing kits and allowed multiple different loci to be examined with one product. While more sensitive than RFLP analysis, Polymarker did not offer the same discriminatory power as the older RFLP testing. [14] By 1995, scientists attempted to return to a VNTR-based analysis, combined with PCR technology, called amplified fragment length polymorphisms (AmpFLP). [10]

AmpFLP

An agarose gel showing the D1S80 locus run in multiple lanes using AmpFLP.

AmpFLP was the first attempt to couple VNTR analysis with PCR for forensic casework. This method used shorter VNTRs than RFLP analysis, between 8 and 16 base pairs. The shorter repeat sizes of AmpFLP were designed to work better with the amplification process of PCR. [8] It was hoped that this technique would combine the discriminating power of RFLP analysis with the ability to process samples that had less template DNA or were otherwise degraded. However, only a few loci were validated for forensic use with AmpFLP analysis because forensic laboratories quickly moved on to other techniques, which limited its discriminating ability for forensic samples. [15]

The technique was ultimately never widely used although it is still in use in smaller countries due to its lower cost and simpler setup compared to newer methods. [16] [17] By the late 1990s, laboratories began switching over to newer methods including STR analysis. These used even shorter fragments of DNA and could more reliably be amplified using PCR while still maintaining, and improving, the discriminatory power of the older methods. [10]

Current methods

STR analysis

A partial electropherogram produced through STR analysis.

Short tandem repeat (STR) analysis is the primary type of forensic DNA analysis performed in modern DNA laboratories. STR analysis builds upon the RFLP and AmpFLP methods used in the past by shrinking the size of the repeat units to between 2 and 6 base pairs and by combining multiple different loci into one PCR reaction. These multiplexing assay kits can produce allele values for dozens of different loci throughout the genome simultaneously, reducing the time it takes to obtain a full, individualizing profile. STR analysis has become the gold standard for DNA profiling and is used extensively in forensic applications.
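As a rough illustration of what an allele value represents, the following Python sketch counts the longest run of back-to-back copies of a repeat unit in a DNA sequence. The function name, the "TCTA" motif, and the flanking sequence are hypothetical and chosen only for illustration. Laboratory STR typing infers the repeat count from the length of the PCR product rather than by scanning the sequence as this sketch does.

```python
def longest_tandem_run(sequence: str, motif: str) -> int:
    """Return the largest number of consecutive copies of `motif` in `sequence`."""
    if not motif:
        raise ValueError("motif must not be empty")
    best = 0
    for start in range(len(sequence)):
        run = 0
        pos = start
        # Count how many copies of the motif follow each other from this start.
        while sequence.startswith(motif, pos):
            run += 1
            pos += len(motif)
        best = max(best, run)
    return best


# Hypothetical fragment: flanking bases around seven copies of a 4-base unit.
fragment = "GGTC" + "TCTA" * 7 + "ACCT"
allele_value = longest_tandem_run(fragment, "TCTA")
print(allele_value)
```

Under these assumptions the allele would be reported as a 7, and a person's profile would consist of such values at each locus examined.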

STR analysis can also be restricted to just the Y chromosome. Y-STR analysis can be used in cases that involve paternity or in familial searching as the Y chromosome is identical down the paternal line (except in cases where a mutation occurred). Certain multiplexing kits combine both autosomal and Y-STR loci into one kit further reducing the amount of time it takes to obtain a large amount of data.

Currently, STR analysis requires multiple cells to create a full DNA profile. However, researchers are getting closer to producing a full DNA profile using STR analysis on single cells. [18]

mtDNA sequencing

Mitochondrial DNA sequencing is a specialized technique that uses the separate mitochondrial DNA present in most cells. This DNA is passed down the maternal line and is not unique between individuals. However, because of the number of mitochondria present in cells, mtDNA analysis can be used for highly degraded samples or samples where STR analysis would not produce enough data to be useful. mtDNA is also present in locations where autosomal DNA would be absent, such as in the shafts of hair.

Because of the increased chance of contamination when dealing with mtDNA, few laboratories process mitochondrial samples. Those that do have specialized protocols in place that further separate different samples from each other to avoid cross-contamination.

Rapid DNA

Rapid DNA is a "swab in-profile out" technology that completely automates the entire DNA extraction, amplification, and analysis process. Rapid DNA instruments are able to go from a swab to a DNA profile in as little as 90 minutes and eliminate the need for trained scientists to perform the process. These instruments are being considered for use in the offender booking process, allowing police officers to obtain the DNA profile of the person under arrest.

The Rapid DNA Act of 2017 was passed in the United States, directing the FBI to create protocols for the implementation of this technology throughout the country. Currently, DNA obtained from these instruments is not eligible for upload to national DNA databases, as the instruments do not analyze enough loci to meet the standard threshold. However, multiple police agencies already use Rapid DNA instruments to collect samples from people arrested in their area. These local DNA databases are not subject to federal or state regulations.

Massively parallel sequencing

Also known as next-generation sequencing, massively parallel sequencing (MPS) builds upon STR analysis by introducing direct sequencing of the loci. Instead of only the number of repeats present at each location, MPS gives the scientist the actual base pair sequence. Theoretically, MPS has the ability to distinguish between identical twins, as random point mutations within repeat segments would be detected that traditional STR analysis would not pick up.
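The difference between a length-based call and a sequence-based call can be sketched in Python. The two alleles below are hypothetical: both are ten units long, but one carries a single point mutation inside the repeat region. The function names are likewise illustrative and do not correspond to any particular analysis software.

```python
def length_based_call(allele: str, unit_length: int = 4) -> int:
    """Allele value as a length-based method would report it: fragment size in repeat units."""
    return len(allele) // unit_length


def differing_positions(a: str, b: str) -> list[int]:
    """Zero-based positions at which two equal-length sequences differ."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


# Hypothetical alleles: same length, one base changed in the fifth repeat unit.
allele_a = "TCTA" * 10
allele_b = "TCTA" * 4 + "TCTG" + "TCTA" * 5

same_by_length = length_based_call(allele_a) == length_based_call(allele_b)
same_by_sequence = allele_a == allele_b
print(same_by_length, same_by_sequence)
```

In this sketch the two alleles receive the same length-based value while their sequences differ, which is the kind of variation that sequencing exposes and fragment sizing does not.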

Profile rarity

When a DNA profile is used as evidence, a match statistic is provided that explains how rare the profile is within a population. Specifically, this statistic is the probability that a person picked randomly out of a population would have that specific DNA profile. It is not the probability that the profile "matches" someone. There are multiple methods of determining this statistic, and each is used by various laboratories based on their experience and preference. However, likelihood ratio calculations are becoming the preferred method over the other two most commonly used methods, random man not excluded and combined probability of inclusion. Match statistics are especially important in mixture interpretation, where there is more than one contributor to a DNA profile. When these statistics are given in a courtroom setting or in a laboratory report, they are usually given for the three most common races of that specific area. This is because the allele frequencies at different loci change based on the individual's ancestry. https://strbase.nist.gov/training/6_Mixture-Statistics.pdf

Random man not excluded

The probability produced with this method is the probability that a person randomly selected out of the population could not be excluded from the analyzed data. Because the method of obtaining the statistic is straightforward, this type of match statistic is easy to explain in a courtroom setting to individuals who have no scientific background, but it also loses a lot of discriminating power, as it does not take into account the suspect's genotype. This approach is commonly used when the sample is degraded or contains so many contributors that a single profile cannot be determined. However, due to its limited discriminating power, RMNE is not generally performed unless no other method can be used. RMNE is not recommended for use in data that indicates a mixture is present.

Combined probability of inclusion/exclusion

A single source profile showing two alleles at the vWA locus.
A three-person mixture showing four alleles at the vWA locus.

Combined probability of inclusion or exclusion calculates the probability that a random, unrelated person would be a contributor to a DNA profile or DNA mixture. In this method, the statistic for each individual locus is determined using population allele frequencies. These calculations are repeated for all available loci with all available data, and the per-locus values are then multiplied together to get the total combined probability of inclusion (CPI) or exclusion (CPE). Since the values are multiplied together, extremely small numbers can be achieved using CPI. CPI or CPE is considered an acceptable statistical calculation when a mixture is indicated. https://www.promega.com/-/media/files/resources/conference-proceedings/ishi-15/parentage-and-mixture-statistics-workshop/generalpopulationstats.pdf?la=en

Example calculation for single source profile

Probability of a Caucasian having a 14 allele at vWA = .10204

Probability of a Caucasian having a 17 allele at vWA = .26276

Probability of a Caucasian having either a 14 or a 17 allele (P) = .10204 + .26276 = .3648

Probability of any other alleles being present (Q) = 1 - P or 1 - .3648 = .6352

Probability of exclusion for vWA = Q² + 2Q(1 - Q) or (.6352)² + 2(.6352)(1 - .6352) = .86692096 ≈ 86.69%

Probability of inclusion for vWA = 1 - CPE or 1 - .86692096 = .13307904 ≈ 13.31%

Example calculation for mixture profile

Probability of a Caucasian having a 14 allele at vWA = .10204

Probability of a Caucasian having a 15 allele at vWA = .11224

Probability of a Caucasian having a 16 allele at vWA = .20153

Probability of a Caucasian having a 19 allele at vWA = .08418

Probability of a Caucasian having any of 14, 15, 16, or 19 alleles (P) = .10204 + .11224 + .20153 + .08418 = .49999

Probability of any other alleles being present (Q) = 1 - P or 1 - .49999 = .50001

Probability of exclusion for vWA = Q² + 2Q(1 - Q) or (.50001)² + 2(.50001)(1 - .50001) = .7500099999 ≈ 75%

Probability of inclusion for vWA = 1 - CPE or 1 - .7500099999 = .2499900001 ≈ 25%
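The two worked examples can be reproduced with a short Python sketch. The function names are illustrative only. The allele frequencies are the vWA values used in the examples, and the combined figure across loci is taken as the product of the per-locus inclusion probabilities, with the combined exclusion as its complement. Frequencies for any other locus would have to come from a population database and are not supplied here.

```python
import math


def locus_exclusion(allele_freqs):
    """Per-locus probability of exclusion: Q^2 + 2Q(1 - Q), where Q = 1 - P."""
    q = 1 - sum(allele_freqs)
    return q ** 2 + 2 * q * (1 - q)


def locus_inclusion(allele_freqs):
    """Per-locus probability of inclusion: 1 minus the probability of exclusion."""
    return 1 - locus_exclusion(allele_freqs)


def combined_inclusion(loci):
    """Multiply the per-locus inclusion probabilities together (CPI)."""
    return math.prod(locus_inclusion(freqs) for freqs in loci)


def combined_exclusion(loci):
    """CPE is the complement of CPI."""
    return 1 - combined_inclusion(loci)


# vWA allele frequencies from the worked examples.
single_source = [0.10204, 0.26276]                 # alleles 14 and 17
mixture = [0.10204, 0.11224, 0.20153, 0.08418]     # alleles 14, 15, 16 and 19

print(f"{locus_exclusion(single_source):.4f}")  # 0.8669
print(f"{locus_inclusion(single_source):.4f}")  # 0.1331
print(f"{locus_exclusion(mixture):.4f}")        # 0.7500
print(f"{locus_inclusion(mixture):.4f}")        # 0.2500
```

Since Q² + 2Q(1 - Q) simplifies to 1 - P², the per-locus probability of inclusion is simply P², the square of the summed allele frequencies.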

Likelihood ratio

A likelihood ratio (LR) compares two probabilities to determine which is more likely. In a trial, the LR is the probability of the evidence under the prosecution's argument divided by its probability under the defense's argument, given each side's starting assumptions. In this scenario the prosecution's probability is often equal to 1, since the prosecution's starting assumption is that the suspect is the source of the DNA, in which case the observed profile is exactly what would be expected. Likelihood ratios are becoming more common in laboratories because they are useful for presenting statistics for data that indicates multiple contributors and because they are used in probabilistic genotyping software, which predicts the most likely allele combinations given a set of data.
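For the simplest case, a likelihood ratio can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a casework method: it assumes a single-source profile at one locus, a prosecution probability of 1, a defense argument that the DNA came from an unrelated random person, and genotype frequencies that follow Hardy-Weinberg proportions. The function names are hypothetical, and the allele frequencies are the vWA values for alleles 14 and 17 used in the combined probability of inclusion example.

```python
def heterozygote_frequency(p: float, q: float) -> float:
    """Expected frequency of a two-allele genotype, assuming Hardy-Weinberg proportions."""
    return 2 * p * q


def likelihood_ratio(prob_under_prosecution: float, prob_under_defense: float) -> float:
    """Ratio of the probability of the evidence under each side's argument."""
    if prob_under_defense <= 0:
        raise ValueError("the defense probability must be greater than zero")
    return prob_under_prosecution / prob_under_defense


# Assumed scenario: the evidence and the suspect both show alleles 14 and 17 at vWA.
prob_prosecution = 1.0                                    # suspect is the source
prob_defense = heterozygote_frequency(0.10204, 0.26276)   # a random person is the source
lr = likelihood_ratio(prob_prosecution, prob_defense)
print(f"{lr:.2f}")  # 18.65
```

Under these assumptions the observed genotype is roughly 19 times more probable if the suspect is the source than if an unrelated random person is. Values for several loci would be multiplied together, and mixtures require considerably more involved calculations than this sketch shows.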

The drawbacks of likelihood ratios are that it is very difficult to understand how analysts arrived at a specific value and that the mathematics involved becomes very complicated as more data is introduced into the equations. To combat these problems in a courtroom setting, some laboratories have set up a "verbal scale" that replaces the actual numerical value of the likelihood ratio.

References

  1. Wickenheiser, Ray A. (2019). "Forensic Genealogy, Bioethics and the Golden State Killer Case". Forensic Science International: Synergy. 1: 114–125. doi:10.1016/j.fsisyn.2019.07.003. PMC 7219171. PMID 32411963.
  2. Panneerchelvam, S. (2003). "Forensic DNA Profiling and Database". The Malaysian Journal of Medical Sciences. 10 (2): 20–26. PMC 3561883. PMID 23386793.
  3. Marks, Kathy. "New DNA Technology for Cold Cases". ProQuest 1074789441. Retrieved 25 April 2023.
  4. "DNA Typing by RFLP Analysis". National Forensic Science Technology Center. 2005. Archived from the original on January 3, 2015. Retrieved November 10, 2017.
  5. Griffiths, Anthony J.F.; Lewontin, Richard C.; Gelbart, William M.; Miller, Jeffrey H. (22 February 2002). Modern Genetic Analysis: Integrating Genes and Genomes (Second ed.). W. H. Freeman and Company. p. 274. ISBN 9780716743828. Retrieved November 10, 2017.
  6. Fisher, Barry A. J. (2005). Techniques of Crime Scene Investigation (Seventh ed.). CRC Press. p. 240. ISBN 9781439870952. Retrieved November 11, 2017.
  7. Rudin, Norah; Inman, Keith (2002). An Introduction to Forensic DNA Analysis (Second ed.). CRC Press. p. 41. ISBN 9780849302336. Retrieved November 11, 2017.
  8. James, Stuart H.; Nordby, Jon J., eds. (2005). Forensic Science: An Introduction to Scientific and Investigative Techniques (Second ed.). Taylor & Francis. p. 286. ISBN 9780849327476. Retrieved November 10, 2017.
  9. "Disadvantages". National Forensic Science Technology Center. 2005. Archived from the original on January 2, 2015. Retrieved November 10, 2017.
  10. Tilstone, William J.; Savage, Kathleen A.; Clark, Leigh A. (2006). Forensic Science: An Encyclopedia of History, Methods, and Techniques. ABC CLIO. p. 49. ISBN 9781576071946. Retrieved October 30, 2017.
  11. Riley, Donald E. (April 6, 2005). "DNA Testing: An Introduction For Non-Scientists: An Illustrated Explanation". Retrieved October 29, 2017.
  12. McClintock, J. Thomas (2008). Forensic DNA Analysis: A Laboratory Manual. CRC Press. p. 64. ISBN 9781420063301. Retrieved October 29, 2017.
  13. Blake, E; Mihalovich, J; Higuchi, R; Walsh, PS; Erlich, H (May 1992). "Polymerase chain reaction (PCR) amplification and human leukocyte antigen (HLA)-DQ alpha oligonucleotide typing on biological evidence samples: casework experience". Journal of Forensic Sciences. 37 (3): 700–726. doi:10.1520/JFS11984J. PMID 1629670.
  14. "DQ-Alpha". National Forensic Science Technology Center. 2005. Archived from the original on November 10, 2014. Retrieved October 30, 2017.
  15. Bär, W.; Fiori, A.; Rossi, U., eds. (October 1993). Advances in Forensic Haemogenetics. 15th Congress of the International Society for Forensic Haemogenetics. p. 255. ISBN 9783642787829. Retrieved November 10, 2017.
  16. "AmpFLPs". National Forensic Science Technology Center. 2005. Archived from the original on November 21, 2014. Retrieved November 5, 2017.
  17. "DNA Fingerprinting Methods". Fingerprinting.com. Retrieved November 5, 2017.
  18. Ostojic, Lana; O'Connor, Craig; Wurmbach, Elisa (1 March 2021). "Micromanipulation of single cells and fingerprints for forensic identification". Forensic Science International: Genetics. 51: 102430. doi:10.1016/j.fsigen.2020.102430. PMID 33260060. S2CID 227255180.