AMRFinderPlus

NCBI Antimicrobial Resistance Gene Finder (AMRFinderPlus)
Developer(s): National Center for Biotechnology Information
Written in: C++ [1]
Operating system: UNIX, Linux, Mac, MS-Windows
Type: Bioinformatics tool
License: Public domain
Website: www.ncbi.nlm.nih.gov/pathogens/antimicrobial-resistance/AMRFinder/

AMRFinderPlus is a bioinformatics tool from the National Center for Biotechnology Information (NCBI) that identifies antimicrobial resistance determinants, stress response genes, and virulence genes in bacterial genomes. [2] Development began in 2018 (as AMRFinder) and is ongoing. The National Institutes of Health funds the development of the software and the databases it uses.

Usage

AMRFinderPlus is used by NCBI's Pathogen Detection Project, which clusters and identifies related pathogen genomic sequences from food, environmental sources, and patients. AMRFinderPlus is run on each bacterial isolate in the Pathogen Detection Project, and the results are made publicly available. Since the tool was described in a scientific publication in 2021, it has also been cited by users outside NCBI.

Database design and curation

AMRFinderPlus can detect acquired antibiotic resistance genes, stress response genes, virulence genes, and genetic mutations that are known to confer antibiotic resistance. [2] When AMRFinderPlus was first developed and distributed, multiple databases of antibiotic resistance determinants already existed. The team collaborated with database developers, expert panels, and others to consolidate these sources into a high-quality resource that addressed limitations the community had identified in them at the time. The NCBI team continues to work with expert groups on the database and its annotation. Continuous evaluation of review papers and new reports of resistance proteins augments these sources. [2]

Some AMR gene identification tools rely on BLAST-based methods, while others use hidden Markov model (HMM) approaches. BLAST-based methods can identify specific alleles and closely related genes, but they often apply arbitrary cutoffs that can misidentify AMR genes or assign resistance to non-AMR genes. In AMRFinderPlus, custom BLAST cutoffs are set for each gene to optimize the sensitivity and specificity of detection. BLAST-based approaches apply the same mismatch penalty at every position in a sequence, whereas HMMs weight mismatches according to how prevalent they are in nature. This gives HMMs higher accuracy in detecting true homologs than BLAST-based approaches, [3] but the models require curation and validation to ensure accuracy. [2]
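The difference between a single global cutoff and per-gene cutoffs can be illustrated with a minimal Python sketch. This is not AMRFinderPlus's implementation: the gene names, the threshold values, and the `Hit` and `passes` names are hypothetical and chosen only to show how the same hits can be called differently under the two schemes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """A simplified alignment hit (hypothetical structure, for illustration)."""
    gene: str
    identity: float  # percent identity to the reference sequence
    coverage: float  # percent of the reference sequence covered


# One arbitrary cutoff applied to every gene: (min identity, min coverage).
GLOBAL_CUTOFF = (90.0, 90.0)

# Hypothetical per-gene cutoffs; real values would come from manual curation.
GENE_CUTOFFS = {
    "geneA": (98.0, 95.0),  # close non-resistance relatives exist, so be strict
    "geneB": (80.0, 90.0),  # divergent but functionally conserved family
}


def passes(hit, cutoffs, default=GLOBAL_CUTOFF):
    """Return True if the hit meets the cutoff for its gene (or the default)."""
    min_identity, min_coverage = cutoffs.get(hit.gene, default)
    return hit.identity >= min_identity and hit.coverage >= min_coverage


hits = [Hit("geneA", 92.0, 100.0), Hit("geneB", 85.0, 95.0)]

# An empty cutoff table means every gene falls back to the global cutoff.
global_calls = [h.gene for h in hits if passes(h, {})]
per_gene_calls = [h.gene for h in hits if passes(h, GENE_CUTOFFS)]

print(global_calls)    # ['geneA']
print(per_gene_calls)  # ['geneB']
```

In this toy example the global cutoff accepts a hit that the stricter curated cutoff rejects, and rejects a hit that the more permissive curated cutoff accepts.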

The tool’s Bacterial Antimicrobial Resistance Reference Gene Database consists of up-to-date gene nomenclature, a set of hidden Markov models (HMMs), and a curated protein family hierarchy. The database contains 627 AMR HMMs, 6,428 genes, and 682 mutations, organized in a hierarchical framework of gene families, symbols, and names that was developed in collaboration with multiple groups. The genes in the database comprise 5,588 AMR genes, 210 stress response genes, and 630 virulence genes. The AMR genes cover resistance to 31 classes of antibiotics and 58 specific drugs. [2]
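As a quick consistency check on the figures above, the three gene categories sum to the stated total of 6,428 genes. The short Python sketch below performs that tally and computes each category's share; the variable names are illustrative only.

```python
# Gene counts by category, as reported for the database. [2]
gene_counts = {
    "AMR": 5588,
    "stress response": 210,
    "virulence": 630,
}

total_genes = sum(gene_counts.values())

# Percentage of the database contributed by each category, to one decimal.
shares = {
    category: round(100 * count / total_genes, 1)
    for category, count in gene_counts.items()
}

print(total_genes)  # 6428
for category, share in shares.items():
    print(f"{category}: {share}%")
```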

Sequence records include, where possible, an additional 100 bp on either side of the coding region to assist in primer design. Cutoffs were set individually for each HMM through a manual process that involved confirming the supporting literature and benchmarking against other AMR proteins from related families and against a background of millions of additional proteins in NCBI’s non-redundant protein sequence database. [2]
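The "where possible" qualification on the 100 bp flanks reflects the fact that a coding region may lie closer than 100 bp to the end of its source sequence. A minimal Python sketch of that clamping follows; the function name `flanked_region` and the 0-based, half-open coordinate convention are assumptions made for illustration and are not taken from the database's own tooling.

```python
def flanked_region(contig_length, cds_start, cds_end, flank=100):
    """Extend a coding region by `flank` bp on each side, clamped to the contig.

    Coordinates are 0-based and half-open: the region is [cds_start, cds_end).
    """
    if not 0 <= cds_start < cds_end <= contig_length:
        raise ValueError("coding region must lie within the contig")
    start = max(0, cds_start - flank)          # cannot extend past the 5' end
    end = min(contig_length, cds_end + flank)  # cannot extend past the 3' end
    return start, end


# Full flanks are available on both sides.
print(flanked_region(1000, 300, 600))  # (200, 700)

# The coding region sits near both contig ends, so the flanks are truncated.
print(flanked_region(1000, 40, 990))   # (0, 1000)
```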

Usage rights

Under the United States Copyright Act, NCBI AMRFinderPlus is classed as a "United States Government Work." It is considered to be work performed as part of the developers' "official obligations for the US government" and is therefore not protected by copyright. The software is freely available to the public, with no restrictions on its current or future use. [1]


References

  1. "NCBI Antimicrobial Resistance Gene Finder (AMRFinderPlus)". NCBI - National Center for Biotechnology Information/NLM/NIH. 23 November 2021. Retrieved 14 December 2021.
  2. Feldgarden, Michael; Brover, Vyacheslav; Gonzalez-Escalona, Narjol; Frye, Jonathan G.; Haendiges, Julie; Haft, Daniel H.; Hoffmann, Maria; Pettengill, James B.; Prasad, Arjun B.; Tillman, Glenn E.; Tyson, Gregory H.; Klimke, William (16 June 2021). "AMRFinderPlus and the Reference Gene Catalog facilitate examination of the genomic links among antimicrobial resistance, stress response, and virulence". Scientific Reports. 11 (1): 12728. Bibcode:2021NatSR..1112728F. doi:10.1038/s41598-021-91456-0. PMC 8208984. PMID 34135355.
  3. Eddy, Sean R. (20 October 2011). "Accelerated Profile HMM Searches". PLOS Computational Biology. 7 (10): e1002195. Bibcode:2011PLSCB...7E2195E. doi:10.1371/journal.pcbi.1002195. PMC 3197634. PMID 22039361.