Nvidia Parabricks

Developer(s)	Nvidia
Stable release	4.3.1-1 / July 1, 2024
Platform	Nvidia GPUs
Available in	English
Type	Medical software
Website	www.nvidia.com/en-us/clara/genomics/

Last updated December 15, 2024

Nvidia Parabricks is a suite of free software for genome analysis developed by Nvidia, designed to deliver high throughput by using graphics processing unit (GPU) acceleration.^[1]

Parabricks offers workflows for DNA and RNA analyses and the detection of germline and somatic mutations, using open-source tools.^[1] It is designed to improve the computing time of genomic data analysis while maintaining the flexibility required for various bioinformatics experiments.^[1] Along with the speed of GPU-based processing, Parabricks ensures high accuracy, compliance with standard genomic formats and the ability to scale in order to handle very large datasets.^[1]

Users can download and run Parabricks pipelines locally or directly deploy them on cloud providers, such as Amazon Web Services, Google Cloud, Oracle Cloud Infrastructure, and Microsoft Azure.^[1]

Accelerated genome analysis fundamentals

[Figure: Standard pipeline to extract variants from an individual's genome]

[Figure: Sequencing machines able to identify the sequence of bases constituting the DNA]

The massive reduction in sequencing costs^[2] has led to a significant increase in the size and availability of genomic data,^[3] with the potential to revolutionize many fields, from medicine to drug design.^[4]

Starting from a biological sample (e.g., saliva or blood), it is possible to extract the individual's DNA and sequence it with sequencing machinery to translate the biological information into a textual sequence of bases.^[5] Then, once the entire genome is obtained through the genome assembly process, the DNA can be analyzed to extract information that is key in several domains, including personalized medicine and medical diagnostics.^[6]

Typically, genomic data analysis is performed with tools that run on central processing units (CPUs).^[7] Recently, several researchers in this field have highlighted the limited computing power these tools can deliver and have focused their efforts on finding ways to boost application performance.^[7] The issue has been addressed in two ways: developing more efficient algorithms, or accelerating the compute-intensive parts using hardware accelerators. Examples of accelerators used in the domain are GPUs, FPGAs, and ASICs.^[8]

In this context, GPUs have revolutionized genomics by exploiting their parallel processing power to accelerate computationally intensive tasks.^[9]^[10] GPUs deliver promising results in these scenarios thanks to their architecture, composed of thousands of small cores capable of performing computations in parallel.^[11] This parallelism allows GPUs to process multiple tasks simultaneously, significantly speeding up computations that can be broken down into independent units.^[11] For instance, aligning millions of sequencing reads against a reference genome or performing statistical analyses on large genomic datasets can be completed much faster on GPUs than when using CPUs.^[10] This facilitates the rapid analysis of genomic data from diverse sources, ranging from individual genomes to large-scale population studies,^[12] accelerating the understanding of genetic diseases, genetic diversity, and more complex biological systems.^[10]
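The notion of independent units can be illustrated with a toy example. The sketch below is plain Python, not GPU code, and does not reflect how Parabricks works internally. It only shows that placing each read against a reference depends on that read alone, which is the property that allows such work to be spread across many cores. The function names and the ungapped mismatch-count scoring are assumptions made for illustration.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor


def mismatches(read: str, reference: str, offset: int) -> int:
    """Count mismatching bases between a read and the reference at an offset."""
    window = reference[offset:offset + len(read)]
    return sum(1 for a, b in zip(read, window) if a != b)


def best_offset(read: str, reference: str) -> tuple[int, int]:
    """Return (offset, mismatches) for the best ungapped placement of one read."""
    candidates = range(len(reference) - len(read) + 1)
    scored = ((offset, mismatches(read, reference, offset)) for offset in candidates)
    # min() keeps the first candidate among ties, i.e. the leftmost placement.
    return min(scored, key=lambda pair: pair[1])


def place_reads(reads: list[str], reference: str) -> list[tuple[int, int]]:
    """Place every read independently; no read needs another read's result."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda read: best_offset(read, reference), reads))


reference = "ACGTTGCAACGT"
reads = ["TTGC", "AACG", "ACGA"]
placements = place_reads(reads, reference)
```

Because `best_offset` never reads shared mutable state, the reads can be processed in any order or all at once, which is what makes this kind of workload a good fit for massively parallel hardware.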

Featured pipelines

Parabricks offers end users pipelines, i.e., collections of tools organized sequentially to analyze raw data according to the user's requirements.^[1] Users can also run the individual tools provided by Parabricks standalone, still exploiting GPU acceleration to overcome possible computational bottlenecks. Only some of the tools in the suite are GPU-based.^[13]

[Figure: Overview of the main steps of NVIDIA Parabricks pipelines]

All the pipelines share a standard structure. Most of them are built to analyze FASTQ data produced by various sequencing technologies (e.g., short- or long-read). Input genomic sequences are first aligned and then undergo a quality control process. These two steps produce a BAM or CRAM file as an intermediate result. Based on this data, the subsequent variant calling step employs high-accuracy tools that are already widely used. As output, the pipelines provide the identified mutations in a VCF (or gVCF) file.^[13]
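The shared structure described above can be sketched as a chain of stages in which each stage consumes a format produced by the previous one. The `Stage` class, the stage names, and the helper functions below are hypothetical and only mirror the formats named in the text (FASTQ, BAM/CRAM, VCF/gVCF); they are not part of Parabricks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """One pipeline step, described only by the file formats it reads and writes."""
    name: str
    consumes: frozenset
    produces: frozenset


ALIGNED = frozenset({"BAM", "CRAM"})

# Hypothetical model of the three steps named in the text.
PIPELINE = [
    Stage("alignment", frozenset({"FASTQ"}), ALIGNED),
    Stage("quality_control", ALIGNED, ALIGNED),
    Stage("variant_calling", ALIGNED, frozenset({"VCF", "gVCF"})),
]


def is_well_chained(stages):
    """True if each stage can read at least one format written by its predecessor."""
    return all(prev.produces & nxt.consumes for prev, nxt in zip(stages, stages[1:]))


def final_formats(stages, input_format):
    """Return the formats the last stage can emit for a given input format."""
    if input_format not in stages[0].consumes:
        raise ValueError(f"{stages[0].name} cannot read {input_format}")
    if not is_well_chained(stages):
        raise ValueError("stages do not connect")
    return stages[-1].produces
```

For example, `final_formats(PIPELINE, "FASTQ")` yields the VCF/gVCF formats, while passing a VCF as input raises an error because the alignment stage cannot read it.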

Germline pipeline

The germline pipeline offered by Parabricks follows the best practices^[14] proposed by the Broad Institute in their Genome Analysis ToolKit (GATK).^[15] The germline pipeline operates on the FASTQ files provided as input by the user to call the variants that, belonging to the germ line, can be inherited.^[13]

This pipeline aligns the reads with BWA-MEM^[16]^[17] and calls variants with GATK HaplotypeCaller,^[18] one of the most relevant tools in the domain for germline variant calling.^[13]

DeepVariant germline pipeline

Besides the pipeline that resorts to HaplotypeCaller to call variants, Parabricks also offers an alternative pipeline that still calls germline variants but is based on DeepVariant.^[19]^[20] DeepVariant is a variant caller, developed and maintained by Google, capable of identifying mutations using a deep learning-based approach. The core of DeepVariant^[19] is a convolutional neural network (CNN) that identifies variants by transforming this task into an image classification operation. In Parabricks, the inference process is accelerated in hardware. For this pipeline, only T4, V100, and A100 GPUs are supported.^[13]

Analyses performed with this pipeline use BWA-MEM^[16] for the alignment, followed by Google's CNN for variant calling.^[13]
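The image-classification framing mentioned above can be made concrete with a toy encoding. The sketch below turns reads overlapping a window of the reference into a small matrix (one row per read, one column per position), the kind of grid-shaped input a CNN can classify. It is a deliberately simplified illustration: the base codes, the single-channel layout, and the function name are assumptions, and the actual DeepVariant input encoding is richer than this.

```python
# Hypothetical numeric codes for bases; 0 marks "no coverage" at a position.
BASE_CODES = {"A": 1, "C": 2, "G": 3, "T": 4}


def pileup_matrix(reads, window_start, window_len):
    """Encode aligned reads as rows of integers over a reference window.

    reads: list of (start_position, sequence) pairs, already aligned.
    Returns one row per read; columns are positions window_start onwards.
    """
    matrix = []
    for start, sequence in reads:
        row = [0] * window_len
        for i, base in enumerate(sequence):
            col = start + i - window_start
            if 0 <= col < window_len:  # ignore bases outside the window
                row[col] = BASE_CODES.get(base, 0)
        matrix.append(row)
    return matrix


aligned_reads = [(100, "ACGT"), (102, "GTTA")]
image = pileup_matrix(aligned_reads, window_start=101, window_len=4)
```

A classifier would then take such a matrix, centered on a candidate site, and predict a genotype class for it; that step is omitted here.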

Human_par pipeline

Still compliant with GATK best practices,^[14] the human_par pipeline allows users to identify mutations in the entire human genome, including the sex chromosomes X and Y, while respecting their ploidy. For male samples, the pipeline first runs HaplotypeCaller^[18] with ploidy 2 on all the regions that do not belong to the X and Y chromosomes and on the pseudoautosomal regions. It then runs HaplotypeCaller with ploidy 1 on the X and Y regions outside the pseudoautosomal regions, since a male sample carries a single copy of each. For female samples, the pipeline runs HaplotypeCaller on the entire genome with ploidy 2.^[13]

The sex of the sample can be determined in two main ways:

Manually, by setting it with the --sample-sex option;
Automatically, by specifying the X vs. Y ratio ranges with the --range-male and --range-female options and letting the tool infer the sex of the sample from the X and Y read counts.

The pipeline requires the user to specify at least one of these three options.^[13]
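The selection logic described above can be sketched as follows. This is a minimal illustration, assuming that the ratio is the X read count divided by the Y read count and that each range is an inclusive (low, high) pair. The function name, the handling of a zero Y count, and the example ranges are assumptions, not values or behavior taken from the Parabricks documentation.

```python
from __future__ import annotations


def infer_sex(x_reads: int, y_reads: int,
              sample_sex: str | None = None,
              range_male: tuple[float, float] | None = None,
              range_female: tuple[float, float] | None = None) -> str | None:
    """Return 'male', 'female', or None when the sex cannot be determined."""
    if sample_sex is None and range_male is None and range_female is None:
        # Mirrors the rule that at least one of the three options is required.
        raise ValueError("specify sample_sex, range_male, or range_female")
    if sample_sex is not None:
        return sample_sex  # a manual setting takes precedence in this sketch
    if y_reads == 0:
        return None  # ratio undefined; assumption: leave the sex undetermined
    ratio = x_reads / y_reads
    if range_male is not None and range_male[0] <= ratio <= range_male[1]:
        return "male"
    if range_female is not None and range_female[0] <= ratio <= range_female[1]:
        return "female"
    return None


# Illustrative ranges only.
MALE_RANGE = (1, 10)
FEMALE_RANGE = (150, 250)
```

With these example ranges, a sample with 1,000 X reads and 200 Y reads (ratio 5) is inferred as male, one with 20,000 X reads and 100 Y reads (ratio 200) as female, and a ratio of 50 falls in neither range and is left undetermined.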

As in the germline pipeline, since human_par targets germline variants, it uses BWA-MEM^[16] for the alignment, followed by HaplotypeCaller^[18] for variant calling.^[13]

Somatic pipeline

Parabricks' somatic pipeline is designed to call somatic variants, i.e., those mutations affecting non-reproductive (somatic) cells. This pipeline can analyze both tumor and non-tumor genomes, offering either tumor-only or tumor/normal analyses for comprehensive examinations.^[13]

As in the germline pipeline, the alignment task is carried out using BWA-MEM^[16] followed by GATK Mutect^[21] to identify the possible mutations. Mutect is used instead of HaplotypeCaller due to its focus on somatic mutations, as opposed to germline mutations targeted by HaplotypeCaller.^[21]

RNA pipeline

This pipeline is optimized for short variant discovery (i.e., single-nucleotide polymorphisms (SNPs) and indels) in RNA-seq data. It follows the Broad Institute's best practices for these types of analyses.^[13]

It relies on STAR,^[22] a read aligner specialized for RNA sequences, for aligning the reads, and on HaplotypeCaller^[18] for calling variants.^[13]

Parabricks tools

Parabricks provides a collection of tools to perform genomics analyses, classified into six main categories according to their task.^[13] Combined, these tools constitute Parabricks' pipelines; they can also be used individually.

For FASTQ and BAM files processing, the proposed tools are:^[13]

applybqsr
bam2fq
bamsort
bqsr
fq2bam
fq2bamfast
fq2bam_meth
markdup
minimap2 (beta)

For calling variants, the proposed tools are:^[13]

deepsomatic
deepvariant
deepvariant_germline
germline (GATK Germline Pipeline)
haplotypecaller
mutectcaller
pacbio_germline (beta)
postpon
prepon
somatic (Somatic Variant Caller)

For RNA processing, the proposed tools are:^[13]

rna_fq2bam
starfusion

For results quality control, the proposed tools are:^[13]

bammetrics
collectmultiplemetrics

For processing variants, the proposed tools are:^[13]

dbsnp

For processing gVCF files, the proposed tools are:^[13]

genotypegvcf
indexgvcf

Not all the listed tools are accelerated on GPU.^[13]
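As an illustration of using two of the listed tools individually, the sketch below assembles command lines for `fq2bam` followed by `haplotypecaller`. It only builds and prints the commands; nothing is executed. The `pbrun` launcher and the `--ref`, `--in-fq`, `--in-bam`, `--out-bam`, and `--out-variants` options are written here as assumptions that should be checked against the documentation of the installed Parabricks version, and all file names are hypothetical.

```python
import shlex


def fq2bam_command(ref, fastq_r1, fastq_r2, out_bam):
    """Build an alignment command (assumed options; verify against the docs)."""
    return ["pbrun", "fq2bam",
            "--ref", ref,
            "--in-fq", fastq_r1, fastq_r2,
            "--out-bam", out_bam]


def haplotypecaller_command(ref, in_bam, out_vcf):
    """Build a variant-calling command (assumed options; verify against the docs)."""
    return ["pbrun", "haplotypecaller",
            "--ref", ref,
            "--in-bam", in_bam,
            "--out-variants", out_vcf]


# Hypothetical file names; the BAM written by the first step feeds the second.
steps = [
    fq2bam_command("ref.fasta", "sample_R1.fastq.gz", "sample_R2.fastq.gz", "sample.bam"),
    haplotypecaller_command("ref.fasta", "sample.bam", "sample.vcf"),
]

for step in steps:
    print(shlex.join(step))
```

Keeping each command as a list of arguments, rather than a single string, avoids quoting problems if the commands are later passed to a process launcher.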

Hardware support

Users can download and run Parabricks pipelines on their local servers, allowing for private, on-site data processing and analysis. They also can deploy Parabricks pipelines on cloud platforms, with improved scalability for larger datasets. Supported cloud providers include AWS, GCP, OCI, and Azure.^[1]

As of release v4.3.1-1, Parabricks includes support for the NVIDIA Grace Hopper Superchip.^[23] The NVIDIA GH200 Grace Hopper Superchip is a heterogeneous platform designed for high-performance computing and artificial intelligence, combining an NVIDIA Grace CPU and a Hopper GPU in a single package.^[24] This platform enhances application performance using both GPUs and CPUs, offering a programming model aimed at improving performance, portability, and productivity.^[23]

Applications

Due to the computational power required by genomics workloads, Parabricks has been used in several research studies across different application domains, especially in cancer research.^[25]^[26]^[27]

Scientists from Washington University used the Parabricks DeepVariant pipeline to identify variants (e.g., SNPs and small indels) in long-read HiFi whole-genome sequencing (WGS) data generated with PacBio's Revio SMRT Cell technology.^[28]

In addition to the pipelines, individual components of Parabricks have been used as standalone tools in academic settings. For example, the accelerated DeepVariant has been employed in a novel process to reduce the processing time further for WGS Nanopore data.^[29]

In 2022, Nvidia announced a collaboration with the Broad Institute to provide researchers with the benefits of accelerated computing. The partnership covers Clara, Nvidia's suite of hardware-accelerated biomedical software, which includes Parabricks and MONAI.^[30] Similarly, the Regeneron Genetics Center uses Parabricks to expedite the secondary analysis of the exomes sequenced in its high-throughput sequencing center, leveraging the DeepVariant germline pipeline in its workflows.^[31]

References

[1] "Clara for Genomics". NVIDIA. Retrieved 2024-07-08.
[2] "DNA Sequencing Costs: Data". www.genome.gov. Retrieved 2024-07-10.
[3] Langmead B, Nellore A (April 2018). "Cloud computing for genomic data analysis and collaboration". Nature Reviews. Genetics. 19 (4): 208–219. doi:10.1038/nrg.2017.113. PMC 6452449. PMID 29379135.
[4] Ombrello MJ, Sikora KA, Kastner DL (April 2014). "Genetics, genomics, and their relevance to pathology and therapy". Best Practice & Research. Clinical Rheumatology. Advances in Paediatric Rheumatology and Translation of Research to Targeted Therapies. 28 (2): 175–189. doi:10.1016/j.berh.2014.05.001. PMC 4149217. PMID 24974057.
[5] Alser M, Lindegger J, Firtina C, Almadhoun N, Mao H, Singh G, et al. (2022). "From molecules to genomic variations: Accelerating genome analysis via intelligent algorithms and architectures". Computational and Structural Biotechnology Journal. 20: 4579–4599. doi:10.1016/j.csbj.2022.08.019. PMC 9436709. PMID 36090814.
[6] Jain KK (2009). "Basics of Personalized Medicine". In Jain KK (ed.). Textbook of Personalized Medicine. New York, NY: Springer. pp. 1–27. doi:10.1007/978-1-4419-0769-1_1. ISBN 978-1-4419-0769-1.
[7] Alser M, Bingol Z, Cali DS, Kim J, Ghose S, Alkan C, et al. (September 2020). "Accelerating Genome Analysis: A Primer on an Ongoing Journey". IEEE Micro. 40 (5): 65–75. arXiv:2008.00961. doi:10.1109/MM.2020.3013728. ISSN 0272-1732.
[8] Alser M, Rotman J, Deshpande D, Taraszka K, Shi H, Baykal PI, et al. (August 2021). "Technology dictates algorithms: recent developments in read alignment". Genome Biology. 22 (1): 249. doi:10.1186/s13059-021-02443-7. PMC 8390189. PMID 34446078.
[9] Taylor-Weiner A, Aguet F, Haradhvala NJ, Gosai S, Anand S, Kim J, et al. (November 2019). "Scaling computational genomics to millions of individuals with GPUs". Genome Biology. 20 (1): 228. doi:10.1186/s13059-019-1836-7. PMC 6823959. PMID 31675989.
[10] Nobile MS, Cazzaniga P, Tangherloni A, Besozzi D (September 2017). "Graphics processing units in bioinformatics, computational biology and systems biology". Briefings in Bioinformatics. 18 (5): 870–885. doi:10.1093/bib/bbw058. PMC 5862309. PMID 27402792.
[11] Cheng J, Grossman M, McKercher T (2014-09-09). Professional CUDA C Programming. John Wiley & Sons. ISBN 978-1-118-73932-7.
[12] Zhou C, Lang X, Wang Y, Zhu C (2015-08-06). "gPGA: GPU Accelerated Population Genetics Analyses". PLOS ONE. 10 (8): e0135028. Bibcode:2015PLoSO..1035028Z. doi:10.1371/journal.pone.0135028. PMC 4527771. PMID 26248314.
[13] "Welcome to NVIDIA Parabricks v4.3.1". NVIDIA Docs. Retrieved 2024-07-10.
[14] "Best Practices for Variant Calling with the GATK". @broadinstitute. 2015-03-19. Retrieved 2024-07-09.
[15] "Genome Analysis Toolkit (GATK)". @broadinstitute. 2010-06-08. Retrieved 2024-07-09.
[16] Li H (2013-05-26). "Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM". arXiv:1303.3997.
[17] "Burrows-Wheeler Aligner". bio-bwa.sourceforge.net. Retrieved 2024-07-09.
[18] Poplin R, Ruano-Rubio V, DePristo MA, Fennell TJ, Carneiro MO, Van der Auwera GA, et al. (2018-07-24). "Scaling accurate genetic variant discovery to tens of thousands of samples". doi:10.1101/201178. Retrieved 2024-07-09.
[19] Poplin R, Chang PC, Alexander D, Schwartz S, Colthurst T, Ku A, et al. (November 2018). "A universal SNP and small-indel variant caller using deep neural networks". Nature Biotechnology. 36 (10): 983–987. doi:10.1038/nbt.4235. PMID 30247488.
[20] google/deepvariant. Google. 2024-07-04. Retrieved 2024-07-09.
[21] Cibulskis K, Lawrence MS, Carter SL, Sivachenko A, Jaffe D, Sougnez C, et al. (March 2013). "Sensitive detection of somatic point mutations in impure and heterogeneous cancer samples". Nature Biotechnology. 31 (3): 213–219. doi:10.1038/nbt.2514. PMC 3833702. PMID 23396013.
[22] Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, et al. (January 2013). "STAR: ultrafast universal RNA-seq aligner". Bioinformatics. 29 (1): 15–21. doi:10.1093/bioinformatics/bts635. PMC 3530905. PMID 23104886.
[23] "Grace Hopper Superchip". NVIDIA Docs. Retrieved 2024-07-10.
[24] Simakov NA, Jones MD, Furlani TR, Siegmann E, Harrison RJ (2024-01-11). "First Impressions of the NVIDIA Grace CPU Superchip and NVIDIA Grace Hopper Superchip for Scientific Workloads". Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region Workshops. HPCAsia '24 Workshops. New York, NY, USA: Association for Computing Machinery. pp. 36–44. doi:10.1145/3636480.3637097. ISBN 979-8-4007-1652-2.
[25] Crowgey EL, Vats P, Franke K, Burnett G, Sethia A, Harkins T, et al. (July 2021). "Abstract 165: Enhanced processing of genomic sequencing data for pediatric cancers: GPUs and machine learning techniques for variant detection". Cancer Research. 81 (13_Supplement): 165. doi:10.1158/1538-7445.AM2021-165. ISSN 0008-5472.
[26] Ng JK, Vats P, Fritz-Waters E, Sarkar S, Sams EI, Padhi EM, et al. (December 2022). "de novo variant calling identifies cancer mutation signatures in the 1000 Genomes Project". Human Mutation. 43 (12): 1979–1993. doi:10.1002/humu.24455. PMC 9771978. PMID 36054329.
[27] Lee TH, Jang BS, Chang JH, Kim E, Park JH, Chie EK (July 2023). "Genomic landscape of locally advanced rectal adenocarcinoma: Comparison between before and after neoadjuvant chemoradiation and effects of genetic biomarkers on clinical outcomes and tumor response". Cancer Medicine. 12 (14): 15664–15675. doi:10.1002/cam4.6169. PMC 10417181. PMID 37260182.
[28] Manuel JG, Heins HB, Crocker S, Neidich JA, Sadzewicz L, Tallon L, et al. (June 2023). "High Coverage Highly Accurate Long-Read Sequencing of a Mouse Neuronal Cell Line Using the PacBio Revio Sequencer". bioRxiv. doi:10.1101/2023.06.06.543940. PMC 10274723. PMID 37333171.
[29] Goenka SD, Gorzynski JE, Shafin K, Fisk DG, Pesout T, Jensen TD, et al. (July 2022). "Accelerated identification of disease-causing variants with ultra-rapid nanopore genome sequencing". Nature Biotechnology. 40 (7): 1035–1041. doi:10.1038/s41587-022-01221-5. PMC 9287171. PMID 35347328.
[30] "The Broad Institute and NVIDIA Bring NVIDIA Clara to Terra Cloud Platform Serving 25,000 Researchers Advancing Biomedical Discovery". NVIDIA Newsroom. Retrieved 2024-07-09.
[31] "UK Biobank Advances Genomics Research with NVIDIA Clara Parabricks". NVIDIA. Retrieved 2024-07-09.

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.
