STR analysis

Short tandem repeat (STR) analysis on a simplified model using polymerase chain reaction (PCR): First, a DNA sample undergoes PCR with primers targeting certain STRs (which vary in length between individuals and their alleles). The resulting fragments are separated by size (for example, by electrophoresis).
A partial human STR profile obtained using the Applied Biosystems Identifiler kit

Short tandem repeat (STR) analysis is a common molecular biology method used to compare allele repeats at specific loci in DNA between two or more samples. A short tandem repeat is a microsatellite with repeat units that are 2 to 7 base pairs in length. The number of repeats varies among individuals, which makes STRs effective for human identification. [2] This method differs from restriction fragment length polymorphism (RFLP) analysis in that STR analysis does not cut the DNA with restriction enzymes. Instead, polymerase chain reaction (PCR) is used to determine the lengths of the short tandem repeats from the length of the PCR product.
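The relationship between PCR product length and repeat number can be sketched in a few lines of Python. This is a minimal illustration, not a laboratory protocol. The function name `repeat_units` and the 120 bp flanking length are hypothetical. The sketch assumes that the combined length of the primers and the non-repeat flanking sequence is known for the primer pair.

```python
def repeat_units(amplicon_bp: int, flanking_bp: int, motif_bp: int) -> tuple[int, int]:
    """Estimate the repeat number of an STR allele from its PCR product length.

    amplicon_bp: measured length of the PCR product, in base pairs
    flanking_bp: combined length of primers and non-repeat flanking sequence
                 (assumed known for the primer pair)
    motif_bp:    length of one repeat unit (2 to 7 bp for an STR)

    Returns (complete repeats, leftover bases). A non-zero remainder
    corresponds to an incomplete repeat, which forensic allele names
    write after a dot, for example "9.3".
    """
    if motif_bp <= 0:
        raise ValueError("motif length must be positive")
    repeat_region = amplicon_bp - flanking_bp
    if repeat_region < 0:
        raise ValueError("amplicon is shorter than its flanking sequence")
    return divmod(repeat_region, motif_bp)


# Hypothetical locus: 4 bp motif, 120 bp of primers plus flanking sequence.
full, partial = repeat_units(160, 120, 4)
print(f"160 bp product: {full} complete repeats, {partial} extra bases")
full, partial = repeat_units(159, 120, 4)
print(f"159 bp product: {full} complete repeats, {partial} extra bases")
```

In practice, fragment sizes are called against an allelic ladder rather than computed from a single subtraction, so the arithmetic above only conveys the principle.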


Forensic uses

STR analysis is a tool in forensic analysis that evaluates specific STR regions found on nuclear DNA. The variable (polymorphic) nature of the STR regions analyzed in forensic testing increases the discrimination between one DNA profile and another. [3] Scientific tools such as the FBI-approved STRmix incorporate this research technique. [4] [5] Forensic science takes advantage of the population's variability in STR lengths, enabling scientists to distinguish one DNA sample from another. The system of DNA profiling used today is based on PCR and uses simple sequences [6] or short tandem repeats (STRs). This method uses highly polymorphic regions that have short repeated sequences of DNA (the most common repeat unit is 4 bases long, but other lengths are in use, including 3 and 5 bases). Because unrelated people almost certainly have different numbers of repeat units, STRs can be used to discriminate between unrelated individuals. These STR loci (locations on a chromosome) are targeted with sequence-specific primers and amplified using PCR. The resulting DNA fragments are then separated and detected using electrophoresis. There are two common methods of separation and detection: capillary electrophoresis (CE) and gel electrophoresis.

Each STR is polymorphic, but the number of alleles is very small. Typically, each STR allele is shared by around 5–20% of individuals. The power of STR analysis comes from looking at multiple STR loci simultaneously. [6] The pattern of alleles can identify an individual quite accurately, so STR analysis provides an excellent identification tool. The more STR regions that are tested in an individual, the more discriminating the test becomes. [6] However, with 10 STR loci, the genotyping error margin can be as high as 30%, or nearly one third of the time. [7] Even the 15 Identifiler microsatellite STR loci are not informative markers for inference of ancestry; a much larger set of genetic markers is needed to detect fine-scale population structure (see "Genetic variation and population structure of Sudanese populations as indicated by 15 Identifiler sequence-tagged repeat (STR) loci"). One study claimed that 30 DIP-STRs were suitable for prenatal paternity testing and for roughly outlining biogeographic ancestry in forensics, but more markers and multiplex panels need to be developed to promote use of this approach. [8]

In comparisons of SNP and STR analysis, high-quality SNPs have proven better for delineating population structure, as well as genetic relationships at the individual and population level. [9] The best 15 SNPs (30 alleles) performed similarly to the best 4 STR loci (83 alleles); adding more STR loci made no difference, whereas increasing to 100 SNPs substantially improved assignment and gave the highest result. Researchers found that some of the STR loci outperformed the SNP loci on a single-locus basis, but combinations of SNPs outperformed the STRs based upon total number of alleles. The SNPs from a larger panel gave significantly more accurate individual genetic self-assignment than any combination of the STR loci. [9]

Different STR-based DNA-profiling systems are in use from country to country. In North America, systems that amplify the 20 CODIS core loci are almost universal, whereas the United Kingdom uses the DNA-17 system of 17 loci (which is compatible with the National DNA Database). Whichever system is used, many of the STR regions tested are the same. These DNA-profiling systems are based on multiplex reactions, in which many STR regions are tested at the same time.

The true power of STR analysis is in its statistical power of discrimination. Because the 20 loci currently used for discrimination in CODIS assort independently (having a certain number of repeats at one locus does not change the likelihood of having any number of repeats at any other locus), the product rule for probabilities can be applied. This means that if someone has the DNA type ABC, where the three loci are independent, the probability of having that DNA type is the probability of having type A times the probability of having type B times the probability of having type C. This has made it possible to generate match probabilities of 1 in a quintillion (1×10^18) or more. However, DNA database searches have turned up false DNA profile matches much more frequently than expected. [10] Moreover, since there are about 12 million monozygotic twins on Earth, the theoretical probability is not accurate.
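The product rule can be illustrated with a short Python sketch. The per-locus frequencies below are made-up values chosen only to show the arithmetic, and the function name `random_match_probability` is hypothetical. The calculation is valid only under the independence assumption stated above.

```python
from math import prod


def random_match_probability(locus_frequencies):
    """Multiply per-locus genotype frequencies under the product rule.

    Each entry is the fraction of the population expected to share the
    observed genotype at one locus. The loci are assumed independent.
    """
    freqs = list(locus_frequencies)
    if not freqs:
        raise ValueError("at least one locus frequency is required")
    for f in freqs:
        if not 0 < f <= 1:
            raise ValueError(f"frequency out of range: {f}")
    return prod(freqs)


# DNA type "ABC": hypothetical frequencies for types A, B and C.
p_abc = random_match_probability([0.10, 0.05, 0.20])
print(f"three loci: about 1 in {1 / p_abc:,.0f}")

# Twenty loci, each with a hypothetical genotype frequency of 10%.
p_twenty = random_match_probability([0.10] * 20)
print(f"twenty loci: about 1 in {1 / p_twenty:.1e}")
```

The probability shrinks by roughly one order of magnitude with every locus added at this frequency. Close relatives and identical twins break the independence assumption, which is one reason the theoretical figure can overstate certainty.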

In practice, the risk of a match caused by contamination is much greater than the risk of matching a distant relative; contamination can come from nearby objects or from left-over cells transferred from a prior test. The risk is greatest for matching the most common person in the samples: everything collected from, or in contact with, a victim is a major source of contamination for any other samples brought into a lab. For that reason, multiple control samples, prepared during the same period as the actual test samples, are typically tested to ensure that they stayed clean. Unexpected matches (or variations) in several control samples indicate a high probability of contamination of the actual test samples. In a relationship test, the full DNA profiles should differ (except for twins), to prove that a person was not matched as being related to their own DNA in another sample. [citation needed]

In biomedical research, STR profiles are used to authenticate cell lines. [11] Self-generated STR profiles can be compared with databases such as CLASTR (https://www.cellosaurus.org/cellosaurus-str-search/) or STRBase (https://strbase.nist.gov/). In addition, self-generated primary murine cell lines cultured before the first passaging can be matched with later passages, thus ensuring the identity of the cell line.
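A comparison between a self-generated profile and a reference profile can be sketched as an allele-sharing score. The formula below (twice the number of shared alleles divided by the total number of alleles in both profiles) is a similarity measure often referred to as the Tanabe score. Whether a given database applies exactly this formula or these thresholds is an assumption here. The function name and the allele values are hypothetical.

```python
def allele_sharing_score(query, reference):
    """Similarity between two STR profiles, from 0.0 (no shared alleles) to 1.0.

    Each profile maps a locus name to the set of alleles observed there.
    Only loci typed in both profiles are compared.
    Score = 2 * shared alleles / (alleles in query + alleles in reference).
    """
    common_loci = query.keys() & reference.keys()
    shared = sum(len(query[locus] & reference[locus]) for locus in common_loci)
    total = sum(len(query[locus]) + len(reference[locus]) for locus in common_loci)
    if total == 0:
        raise ValueError("the profiles have no alleles at any common locus")
    return 2 * shared / total


# Hypothetical profiles: an early passage and a later passage of a cell line.
early = {"TH01": {"6", "9.3"}, "D5S818": {"11", "12"}, "TPOX": {"8"}}
late = {"TH01": {"6", "9.3"}, "D5S818": {"11", "13"}, "TPOX": {"8"}}

score = allele_sharing_score(early, late)
print(f"allele sharing: {score:.0%}")
```

Alleles are stored as strings so that incomplete-repeat names such as "9.3" compare exactly. Any cut-off for calling two profiles the same line is left to the laboratory or database in question.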


Related Research Articles

In molecular biology, restriction fragment length polymorphism (RFLP) is a technique that exploits variations in homologous DNA sequences, known as polymorphisms, to distinguish individuals, populations, or species or to pinpoint the locations of genes within a sequence. The term may refer to a polymorphism itself, as detected through the differing locations of restriction enzyme sites, or to a related laboratory technique by which such differences can be illustrated. In RFLP analysis, a DNA sample is digested into fragments by one or more restriction enzymes, and the resulting restriction fragments are then separated by gel electrophoresis according to their size.

A microsatellite is a tract of repetitive DNA in which certain DNA motifs are repeated, typically 5–50 times. Microsatellites occur at thousands of locations within an organism's genome. They have a higher mutation rate than other areas of DNA leading to high genetic diversity. Microsatellites are often referred to as short tandem repeats (STRs) by forensic geneticists and in genetic genealogy, or as simple sequence repeats (SSRs) by plant geneticists.

DNA profiling: Technique used to identify individuals via DNA characteristics

DNA profiling is the process of determining an individual's deoxyribonucleic acid (DNA) characteristics. DNA analysis intended to identify a species, rather than an individual, is called DNA barcoding.

Single-nucleotide polymorphism: Single nucleotide in genomic DNA at which different sequence alternatives exist

In genetics and bioinformatics, a single-nucleotide polymorphism is a germline substitution of a single nucleotide at a specific position in the genome. Although certain definitions require the substitution to be present in a sufficiently large fraction of the population, many publications do not apply such a frequency threshold.

In genetics, a minisatellite is a tract of repetitive DNA in which certain DNA motifs are typically repeated two to several hundred times. Minisatellites occur at more than 1,000 locations in the human genome and they are notable for their high mutation rate and high diversity in the population. Minisatellites are prominent in the centromeres and telomeres of chromosomes, the latter protecting the chromosomes from damage. The name "satellite" refers to the early observation that centrifugation of genomic DNA in a test tube separates a prominent layer of bulk DNA from accompanying "satellite" layers of repetitive DNA. Minisatellites are small sequences of DNA that do not encode proteins but appear throughout the genome hundreds of times, with many repeated copies lying next to each other.

Haplotype: Group of genes from one parent

A haplotype is a group of alleles in an organism that are inherited together from a single parent.

A variable number tandem repeat is a location in a genome where a short nucleotide sequence is organized as a tandem repeat. These can be found on many chromosomes, and often show variations in length among individuals. Each variant acts as an inherited allele, allowing them to be used for personal or parental identification. Their analysis is useful in genetics and biology research, forensics, and DNA fingerprinting.

A genetic marker is a gene or DNA sequence with a known location on a chromosome that can be used to identify individuals or species. It can be described as a variation that can be observed. A genetic marker may be a short DNA sequence, such as a sequence surrounding a single base-pair change, or a long one, like minisatellites.

Forensic identification is the application of forensic science, or "forensics", and technology to identify specific objects from the trace evidence they leave, often at a crime scene or the scene of an accident. Forensic means "for the courts".

A Y-STR is a short tandem repeat (STR) on the Y chromosome. Y-STRs are often used in forensics, paternity, and genealogical DNA testing. Y-STRs are taken specifically from the male Y chromosome. They provide a weaker analysis than autosomal STRs because the Y chromosome is found only in males and is passed down only by the father, making the Y chromosome in any paternal line practically identical. This leaves significantly less distinction between Y-STR samples. Autosomal STRs provide much stronger analytical power because of the random matching that occurs between pairs of chromosomes during zygote formation.

Second Generation Multiplex Plus (SGM Plus) is a DNA profiling system developed by Applied Biosystems. It is an updated version of Second Generation Multiplex. SGM Plus has been used by the UK National DNA Database since 1998.

Preimplantation genetic haplotyping (PGH) is a clinical method of preimplantation genetic diagnosis (PGD) used to determine the presence of single gene disorders in offspring. PGH provides a more feasible method of gene location than whole-genome association experiments, which are expensive and time-consuming.

SNP genotyping is the measurement of genetic variations of single nucleotide polymorphisms (SNPs) between members of a species. It is a form of genotyping, which is the measurement of more general genetic variation. SNPs are one of the most common types of genetic variation. An SNP is a single base pair mutation at a specific locus, usually consisting of two alleles. SNPs are found to be involved in the etiology of many human diseases and are becoming of particular interest in pharmacogenetics. Because SNPs are conserved during evolution, they have been proposed as markers for use in quantitative trait loci (QTL) analysis and in association studies in place of microsatellites. The use of SNPs is being extended in the HapMap project, which aims to provide the minimal set of SNPs needed to genotype the human genome. SNPs can also provide a genetic fingerprint for use in identity testing. The increase of interest in SNPs has been reflected by the furious development of a diverse range of SNP genotyping methods.

Marker-assisted selection or marker-aided selection (MAS) is an indirect selection process in which a trait of interest is selected based on a marker linked to that trait, rather than on the trait itself. This process has been extensively researched and proposed for plant and animal breeding.

Paternity Index

In paternity testing, Paternity Index (PI) is a calculated value generated for a single genetic marker or locus and is associated with the statistical strength or weight of that locus in favor of or against parentage given the phenotypes of the tested participants and the inheritance scenario. Phenotype typically refers to physical characteristics such as body plan, color, behavior, etc. in organisms. However, the term used in the area of DNA paternity testing refers to what is observed directly in the laboratory. Laboratories involved in parentage testing and other fields of human identity employ genetic testing panels that contain a battery of loci each of which is selected due to extensive allelic variations within and between populations. These genetic variations are not assumed to bestow physical and/or behavioral attributes to the person carrying the allelic arrangement(s) and therefore are not subject to selective pressure and follow Hardy Weinberg inheritance patterns.

Combined DNA Index System: United States national DNA database

The Combined DNA Index System (CODIS) is the United States national DNA database created and maintained by the Federal Bureau of Investigation. CODIS consists of three levels of information: Local DNA Index Systems (LDIS), where DNA profiles originate; State DNA Index Systems (SDIS), which allow laboratories within states to share information; and the National DNA Index System (NDIS), which allows states to compare DNA information with one another.

DNA Specimen Provenance Assignment (DSPA), also known as DNA Specimen Provenance Assay, is a molecular diagnostic test used to definitively assign biopsy specimen identity and establish specimen purity during the diagnostic testing cycle for cancer and other histopathological conditions. The term first appeared in the 2011 scientific paper "The Changing Spectrum of DNA-Based Specimen Provenance Testing in Surgical Pathology", published in the American Journal of Clinical Pathology, which built upon concepts described in an earlier paper published in the Journal of Urology.

Y Chromosome Haplotype Reference Database

The Y Chromosome Haplotype Reference Database (YHRD) is an open-access, annotated collection of population samples typed for Y chromosomal sequence variants. Two important objectives are pursued: (1) the generation of reliable frequency estimates for Y-STR haplotypes and Y-SNP haplotypes to be used in the quantitative assessment of matches in forensic and kinship cases and (2) the characterization of male lineages to draw conclusions about the origins and history of human populations. The database is endorsed by the International Society for Forensic Genetics (ISFG). By May 2023, about 350,000 Y chromosomes typed for 9–29 STR loci had been directly submitted by forensic institutions and universities worldwide. In geographic terms, about 53% of the YHRD samples stem from Asia, 21% from Europe, 12% from North America, 10% from Latin America, 3% from Africa, 0.8% from Oceania/Australia and 0.2% from the Arctic. The 1,406 individual sampling projects are described in more than 800 peer-reviewed publications.

Bulked segregant analysis (BSA) is a technique used to identify genetic markers associated with a mutant phenotype. This allows geneticists to discover genes conferring certain traits of interest, such as disease resistance or susceptibility.

Forensic DNA analysis: Genetic analyses in crime analysis

DNA profiling is the determination of a DNA profile for legal and investigative purposes. DNA analysis methods have changed countless times over the years as technology changes and allows for more information to be determined with less starting material. Modern DNA analysis is based on the statistical calculation of the rarity of the produced profile within a population.

References

  1. Image by Mikael Häggström, MD, using the following source image: "Figure 1", available via license: Creative Commons Attribution 4.0 International, from the following article:
    Roberta Sitnik, Margareth Afonso Torres, Nydia Strachman Bacal, João Renato Rebello Pinho (2006). "Using PCR for molecular monitoring of post-transplantation chimerism". Einstein (Sao Paulo). 4 (2).
  2. Butler, John M. (4 August 2011). Advanced Topics in Forensic DNA Typing: Methodology. San Diego: Elsevier Academic Press. pp. 99–100. ISBN 9780123745132.
  3. National Commission on the Future of DNA Evidence (July 2002). "Using DNA to Solve Cold Cases" (PDF). U.S. Department of Justice. Retrieved 2006-08-08.
  4. "Internal Validation of STRmix™ V2.3" (PDF). dfs.dc.gov.
  5. Moretti, Tamyra R.; Just, Rebecca S.; Kehl, Susannah C.; Willis, Leah E.; Buckleton, John S.; Bright, Jo-Anne; Taylor, Duncan A.; Onorato, Anthony J. (2017). "Internal validation of STRmix™ for the interpretation of single source and mixed DNA profiles". Forensic Science International: Genetics. 29: 126–144. doi:10.1016/j.fsigen.2017.04.004. PMID 28504203.
  6. Tautz D. (1989). "Hypervariability of simple sequences as a general source for polymorphic DNA markers". Nucleic Acids Research. 17 (16): 6463–6471. doi:10.1093/nar/17.16.6463. PMC 318341. PMID 2780284.
  7. Witherspoon, D. J.; Wooding, S.; Rogers, A. R.; Marchani, E. E.; Watkins, W. S.; Batzer, M. A.; Jorde, L. B. (2007-05-01). "Genetic Similarities Within and Between Human Populations". Genetics. 176 (1): 351–359. doi:10.1534/genetics.106.067355. ISSN 0016-6731. PMC 1893020. PMID 17339205.
  8. Damour, Géraldine; Mauffrey, Florian; Hall, Diana (2023-05-01). "Identification and characterization of novel DIP-STRs from whole-genome sequencing data". Forensic Science International: Genetics. 64: 102849. doi:10.1016/j.fsigen.2023.102849. ISSN 1872-4973. PMID 36827792.
  9. Glover, Kevin A.; Hansen, Michael M.; Lien, Sigbjørn; Als, Thomas D.; Høyheim, Bjørn; Skaala, Oystein (2010-01-06). "A comparison of SNP and STR loci for delineating population structure and performing individual genetic assignment". BMC Genetics. 11: 2. doi:10.1186/1471-2156-11-2. ISSN 1471-2156. PMC 2818610. PMID 20051144.
  10. Felch, Jason; et al. (July 20, 2008). "FBI resists scrutiny of 'matches'". Los Angeles Times. p. P8.
  11. Hong Y. (2020). "Authentication of Primary Murine Cell Lines by a Microfluidics-Based Lab-On-Chip System". Biomedicines. 8 (12): 590. doi:10.3390/biomedicines8120590. PMC 7763653. PMID 33317212.