Protein subfamily


A protein subfamily is a level of protein classification based on close evolutionary relationship. It lies below the broader levels of protein superfamily and protein family. [1]

Proteins typically share greater sequence and function similarities with other subfamily members than they do with members of their wider family. [1] [2] For example, in the Structural Classification of Proteins database classification system, members of a subfamily share the same interaction interfaces and interaction partners. [3] These are stricter criteria than for a family, where members have similar structures, but may be more distantly related and so have different interfaces. Subfamilies are assigned by a variety of methods, including sequence similarity, [4] motifs linked to function, [5] or phylogenetic clade. [6] [7] There is no exact and consistent distinction between a subfamily and a family. The same group of proteins may sometimes be described as a family or a subfamily, depending on the context.
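The sequence-similarity approach to assigning subfamilies can be illustrated with a minimal sketch. It assumes the sequences are already aligned to equal length, and it uses single-linkage clustering on pairwise percent identity with a hypothetical 60% cut-off. Both the threshold and the clustering rule are illustrative assumptions, not the procedure of any of the cited methods, and the protein names are hypothetical.

```python
from itertools import combinations


def percent_identity(a: str, b: str) -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    Alignment columns where both sequences have a gap ('-') are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for x, y in zip(a, b):
        if x == "-" and y == "-":
            continue
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def assign_subfamilies(aligned: dict[str, str], threshold: float = 60.0) -> list[set[str]]:
    """Group proteins by single linkage: two proteins share a group if a chain of
    pairwise identities at or above `threshold` (an assumed cut-off) connects them."""
    parent = {name: name for name in aligned}  # union-find forest

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for p, q in combinations(aligned, 2):
        if percent_identity(aligned[p], aligned[q]) >= threshold:
            parent[find(p)] = find(q)  # merge the two groups

    groups: dict[str, set[str]] = {}
    for name in aligned:
        groups.setdefault(find(name), set()).add(name)
    # Order groups by their alphabetically smallest member for stable output.
    return sorted(groups.values(), key=min)


if __name__ == "__main__":
    # Hypothetical, pre-aligned toy sequences (not real proteins).
    toy = {
        "protA": "MKTAYIAKQR",
        "protB": "MKTAYIAKQK",
        "protC": "MKSAYLAKQR",
        "protD": "GEVLHWSPDE",
        "protE": "GEVLHWSPDQ",
    }
    print(assign_subfamilies(toy, threshold=60.0))
```

Raising the threshold splits the same set of proteins into more, tighter groups, which mirrors the point that the boundary between a family and a subfamily depends on the criteria chosen.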

Related Research Articles

Protein family: Group of evolutionarily related proteins

A protein family is a group of evolutionarily related proteins. In many cases, a protein family has a corresponding gene family, in which each gene encodes a corresponding protein with a 1:1 relationship. The term "protein family" should not be confused with family as it is used in taxonomy.

Sequence homology: Shared ancestry between DNA, RNA or protein sequences

Sequence homology is the biological homology between DNA, RNA, or protein sequences, defined in terms of shared ancestry in the evolutionary history of life. Two segments of DNA can share ancestry through one of three events: a speciation event (orthologs), a duplication event (paralogs), or a horizontal gene transfer event (xenologs).
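The three kinds of homology differ only in the event at which two sequences diverged. A minimal sketch, assuming a gene tree whose internal nodes are already labeled with the event that produced them (for example by a separate reconciliation step, which is not shown), classifies a pair of genes by the event at their most recent common ancestor. The tree, node names and gene names are hypothetical.

```python
from enum import Enum


class Event(Enum):
    SPECIATION = "speciation"
    DUPLICATION = "duplication"
    HORIZONTAL_TRANSFER = "horizontal gene transfer"


# Relationship implied by the event at the pair's most recent common ancestor.
RELATION = {
    Event.SPECIATION: "orthologs",
    Event.DUPLICATION: "paralogs",
    Event.HORIZONTAL_TRANSFER: "xenologs",
}


def ancestors(node: str, parent: dict[str, str]) -> list[str]:
    """Path from `node` up to the root, including `node` itself."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def homology_relation(a: str, b: str, parent: dict[str, str],
                      events: dict[str, Event]) -> str:
    """Classify two distinct leaf genes by the event labeling their lowest common ancestor."""
    seen = set(ancestors(a, parent))
    for node in ancestors(b, parent):  # walk upward from b; first shared node is the LCA
        if node in seen:
            return RELATION[events[node]]
    raise ValueError("genes are not in the same tree")


if __name__ == "__main__":
    # Hypothetical tree: a duplication (dup1) followed by speciation in each copy.
    parent = {"human_A": "spec1", "mouse_A": "spec1",
              "human_B": "spec2", "mouse_B": "spec2",
              "spec1": "dup1", "spec2": "dup1"}
    events = {"dup1": Event.DUPLICATION,
              "spec1": Event.SPECIATION, "spec2": Event.SPECIATION}
    print(homology_relation("human_A", "mouse_A", parent, events))  # orthologs
    print(homology_relation("human_A", "human_B", parent, events))  # paralogs
```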

Pfam: Database of protein families

Pfam is a database of protein families that includes their annotations and multiple sequence alignments generated using hidden Markov models. Pfam version 36.0 was released in September 2023 and contains 20,795 families. Pfam is currently provided through the InterPro database.

InterPro is a database of protein families, protein domains and functional sites in which identifiable features found in known proteins can be applied to new protein sequences in order to functionally characterise them.

Tetraloop

Tetraloops are four-base hairpin loop motifs in RNA secondary structure that cap many double helices. There are many variants of the tetraloop; the published ones include ANYA, CUYG, GNRA, UNAC and UNCG.
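The tetraloop names use IUPAC nucleotide codes: N stands for any base, R for a purine (A or G) and Y for a pyrimidine (C or U). A minimal sketch, assuming only those codes are needed, turns each name into a regular expression and scans a sequence for matching four-base windows. It matches sequence only; whether a window actually forms a hairpin loop depends on the surrounding stem, which this sketch does not check.

```python
import re

# IUPAC codes appearing in the tetraloop names (RNA alphabet).
IUPAC = {"A": "A", "C": "C", "G": "G", "U": "U",
         "N": "[ACGU]", "R": "[AG]", "Y": "[CU]"}

TETRALOOPS = ["ANYA", "CUYG", "GNRA", "UNAC", "UNCG"]


def to_regex(motif: str) -> str:
    """Translate an IUPAC motif such as 'GNRA' into a regex such as 'G[ACGU][AG]A'."""
    return "".join(IUPAC[base] for base in motif)


def find_tetraloop_motifs(seq: str) -> list[tuple[int, str, str]]:
    """Return (0-based start, motif family, matched bases) for every matching window.

    A lookahead group is used so that overlapping windows are all reported.
    DNA input is accepted by converting T to U.
    """
    seq = seq.upper().replace("T", "U")
    hits = []
    for name in TETRALOOPS:
        pattern = re.compile(f"(?=({to_regex(name)}))")
        for m in pattern.finditer(seq):
            hits.append((m.start(), name, m.group(1)))
    return sorted(hits)


if __name__ == "__main__":
    # Toy hairpin: stem GGAC...GUCC closing a GAAA loop.
    print(find_tetraloop_motifs("GGACGAAAGUCC"))  # [(4, 'GNRA', 'GAAA')]
```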

28S ribosomal RNA: RNA component of the large subunit of the eukaryotic ribosome

28S ribosomal RNA is the structural ribosomal RNA (rRNA) of the large subunit (LSU) of eukaryotic cytoplasmic ribosomes, and thus one of the basic components of all eukaryotic cells. Its sedimentation coefficient is 25S in plants and 28S in mammals, hence the alias 25S–28S rRNA.

SUPERFAMILY is a database and search platform for the structural and functional annotation of all proteins and genomes. It classifies amino acid sequences into known structural domains, especially into SCOP superfamilies. Domains are functional, structural, and evolutionary units that form proteins. Domains of common ancestry are grouped into superfamilies, and both the domains and the domain superfamilies are defined and described in SCOP. Superfamilies are groups of proteins for which structural evidence supports a common evolutionary ancestor, even when no sequence homology is detectable.

Protein fold class: Categories of protein tertiary structure

In molecular biology, protein fold classes are broad categories of protein tertiary structure topology. They describe groups of proteins that share similar amino acid and secondary structure proportions. Each class contains multiple, independent protein superfamilies.

European Nucleotide Archive: Online nucleotide database from the EBI

The European Nucleotide Archive (ENA) is a repository providing free and unrestricted access to annotated DNA and RNA sequences. It also stores complementary information such as experimental procedures, details of sequence assembly and other metadata related to sequencing projects. The archive is composed of three main databases: the Sequence Read Archive, the Trace Archive and the EMBL Nucleotide Sequence Database. The ENA is produced and maintained by the European Bioinformatics Institute and is a member of the International Nucleotide Sequence Database Collaboration (INSDC) along with the DNA Data Bank of Japan and GenBank.

The Computer Atlas of Surface Topography of Proteins (CASTp) aims to provide a comprehensive and detailed quantitative characterization of the topographic features of proteins, and is now at version 3.0. Since its release in 2006, the CASTp server has received about 45,000 visits and fulfilled about 33,000 calculation requests per year. CASTp has been used in a wide range of research, including investigations of signaling receptors, discovery of cancer therapeutics, understanding of the mechanisms of drug action, studies of immune disorders, analysis of protein–nanoparticle interactions, inference of protein function and development of high-throughput computational tools. The server is maintained by Jie Liang's lab at the University of Illinois at Chicago.

In bioinformatics, the PANTHER classification system is a large curated biological database of gene/protein families and their functionally related subfamilies that can be used to classify and identify the function of gene products. PANTHER is part of the Gene Ontology Reference Genome Project designed to classify proteins and their genes for high-throughput analysis.

A protein superfamily is the largest grouping (clade) of proteins for which common ancestry can be inferred. Usually this common ancestry is inferred from structural alignment and mechanistic similarity, even if no sequence similarity is evident. Sequence homology can then be deduced even if it is not apparent. Superfamilies typically contain several protein families, each of which shows sequence similarity among its members. The term protein clan is commonly used for protease and glycosyl hydrolase superfamilies, based on the MEROPS and CAZy classification systems.

Monica Riley was an American scientist who contributed to the discovery of messenger RNA in her Ph.D. work with Arthur Pardee, and was later a pioneer in the exploration and computer representation of the Escherichia coli genome.

The ViennaRNA Package is a set of standalone programs and libraries used for prediction and analysis of RNA secondary structures. The source code for the package is distributed freely and compiled binaries are available for Linux, macOS and Windows platforms. The original paper has been cited over 2000 times.

Donna R. Maglott is a staff scientist at the National Center for Biotechnology Information known for her research on large-scale genomics projects, including the mouse genome and development of databases required for genomics research.

Toby James Gibson is a group leader and biochemist at the European Molecular Biology Laboratory (EMBL) in Heidelberg known for his work on Clustal. According to Nature, Gibson's co-authored papers describing Clustal are among the top ten most highly cited scientific papers of all time.

In molecular biology, MvirDB was a publicly available database that stored information on toxins, virulence factors and antibiotic resistance genes. Sources that this database used for DNA and protein information included Tox-Prot, SCORPION, the PRINTS Virulence Factors, VFDB, TVFac, Islander, ARGO and VIDA. The database provided a BLAST tool that allowed users to query their sequences against all DNA and protein sequences in MvirDB. Information on virulence factors could be obtained with the provided browser tool, which returned results as a readable table sorted by ascending E-value, with each entry hyperlinked to its related page. MvirDB was implemented in an Oracle 10g relational database. MvirDB appears to have been inactive for some time and is therefore not current. The last available snapshot was made on August 2, 2017.

MIF4GD: Protein-coding gene in the species Homo sapiens

MIF4GD, or MIF4G domain-containing protein, is a protein that in humans is encoded by the MIF4GD gene. It is also known as SLIP1, SLBP-interacting protein 1, AD023, and MIFD. MIF4GD is expressed ubiquitously in humans and has been found to be involved in activating proteins for histone mRNA translation, in alternative splicing and translation of mRNAs, and in the regulation of cell proliferation.

IntFOLD is a fully automated, integrated pipeline for predicting 3D structure and function from amino acid sequences. The pipeline is packaged and deployed as a web server. The core of the server method is quality assessment using built-in accuracy self-estimates (ASE), which, using ModFOLD, improve the performance of 3D model prediction.

References

  1. "What are protein families?". EMBL-EBI Train online. 2011-11-18. Retrieved 2018-03-08.
  2. Das, Sayoni; Orengo, Christine A. (2016). "Protein function annotation using protein domain family resources". Methods. 93: 24–34. doi:10.1016/j.ymeth.2015.09.029. PMID 26434392.
  3. Rausell, Antonio; Juan, David; Pazos, Florencio; Valencia, Alfonso (2010-02-02). "Protein interactions and ligand binding: From protein subfamilies to functional specificity". Proceedings of the National Academy of Sciences. 107 (5): 1995–2000. Bibcode:2010PNAS..107.1995R. doi:10.1073/pnas.0908044107. PMC 2808218. PMID 20133844.
  4. Brown, Duncan P.; Krishnamurthy, Nandini; Sjölander, Kimmen (2007-08-17). "Automated Protein Subfamily Identification and Classification". PLOS Computational Biology. 3 (8): e160. Bibcode:2007PLSCB...3..160B. doi:10.1371/journal.pcbi.0030160. ISSN 1553-7358. PMC 1950344. PMID 17708678.
  5. Eisen, Jonathan A.; Sweder, Kevin S.; Hanawalt, Philip C. (1995-07-25). "Evolution of the SNF2 family of proteins: subfamilies with distinct sequences and functions". Nucleic Acids Research. 23 (14): 2715–2723. doi:10.1093/nar/23.14.2715. ISSN 0305-1048. PMC 307096. PMID 7651832.
  6. Wicker, Nicolas; Perrin, Guy René; Thierry, Jean Claude; Poch, Olivier (2001-08-01). "Secator: A Program for Inferring Protein Subfamilies from Phylogenetic Trees". Molecular Biology and Evolution. 18 (8): 1435–1441. doi:10.1093/oxfordjournals.molbev.a003929. ISSN 0737-4038. PMID 11470834.
  7. Mi, Huaiyu; Poudel, Sagar; Muruganujan, Anushya; Casagrande, John T.; Thomas, Paul D. (2016-01-04). "PANTHER version 10: expanded protein families and functions, and analysis tools". Nucleic Acids Research. 44 (D1): D336–D342. doi:10.1093/nar/gkv1194. ISSN 0305-1048. PMC 4702852. PMID 26578592.