List of liquid–liquid phase separation databases

Liquid–liquid phase separation (LLPS) is described in detail on the Biomolecular condensate page.

LLPS databases cover different aspects of LLPS phenomena, ranging from the cellular location of membraneless organelles (MLOs) to the role of a particular protein or region in forming the condensate state. These databases contain manually curated data supported by experimental evidence in the literature and can include related features such as the presence of protein disorder, low complexity, post-translational modifications, experimental details, and phase diagrams, among others.

Table 1

| Database | Created | Records | Description | Type of data |
|---|---|---|---|---|
| PhaSepDB2.0 [1] | 2019 | 2957 | Collects proteins that were found in membraneless organelles (MLOs) and organizes its entries according to MLO location. All entries are classified into three groups depending on the quality of annotation: (i) Reviewed, verified by PhaSepDB curators; (ii) UniProtKB reviewed, pulled from UniProtKB; and (iii) high-throughput, identified by high-throughput experiments | MLO localization/association |
| MloDisDB [2] | 2021 | 771 | Manually curated database, developed by the same research group as PhaSepDB, but focusing on the association between MLOs and diseases | MLO localization/association and diseases |
| PhaSePro [3] | 2019 | 121 | Manually curated database of LLPS drivers, based solely on experimentally verified cases of LLPS | Drivers/scaffolds |
| LLPSDB [4] | 2019 | 1182 | Collection of in vitro LLPS experiments detailing the outcome and parameters of thousands of experiments | Experiments |
| DrLLPS [5] | 2019 | 9285 | Database of LLPS proteins from nine model organisms | Clients, regulators, drivers/scaffolds |
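
For readers who want to pick a resource by the kind of data it holds, the entries of Table 1 can be encoded as a small lookup structure. The following is only an illustrative sketch: the class `LLPSDatabase`, the list `TABLE_1` and the helper `databases_covering` are hypothetical names introduced here, and the values are copied from Table 1 rather than retrieved from the databases themselves.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class LLPSDatabase:
    """One row of Table 1 (hypothetical container, not an official schema)."""
    name: str
    created: int
    records: int
    data_types: Tuple[str, ...]


# Values transcribed from Table 1; record counts are as listed there.
TABLE_1 = [
    LLPSDatabase("PhaSepDB2.0", 2019, 2957, ("MLO localization/association",)),
    LLPSDatabase("MloDisDB", 2021, 771, ("MLO localization/association", "diseases")),
    LLPSDatabase("PhaSePro", 2019, 121, ("drivers/scaffolds",)),
    LLPSDatabase("LLPSDB", 2019, 1182, ("experiments",)),
    LLPSDatabase("DrLLPS", 2019, 9285, ("clients", "regulators", "drivers/scaffolds")),
]


def databases_covering(data_type: str) -> List[str]:
    """Return the names of databases whose 'Type of data' includes data_type."""
    wanted = data_type.lower()
    return [db.name for db in TABLE_1 if wanted in (t.lower() for t in db.data_types)]


if __name__ == "__main__":
    print(databases_covering("drivers/scaffolds"))
    print(databases_covering("MLO localization/association"))
```

Matching is done on the exact category label (case-insensitive), so a query must use one of the labels that appear in the "Type of data" column.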

References

  1. You, Kaiqiang; Huang, Qi; Yu, Chunyu; Shen, Boyan; Sevilla, Cristoffer; Shi, Minglei; Hermjakob, Henning; Chen, Yang; Li, Tingting (2019-10-04). "PhaSepDB: a database of liquid–liquid phase separation related proteins". Nucleic Acids Research. 48 (D1): D354–D359. doi:10.1093/nar/gkz847. ISSN 0305-1048. PMC 6943039. PMID 31584089.
  2. Hou, Chao; Xie, Haotai; Fu, Yang; Ma, Yao; Li, Tingting (2020-10-30). "MloDisDB: a manually curated database of the relations between membraneless organelles and diseases". Briefings in Bioinformatics. 22 (4). doi:10.1093/bib/bbaa271. ISSN 1467-5463. PMID 33126250.
  3. Mészáros, Bálint; Erdős, Gábor; Szabó, Beáta; Schád, Éva; Tantos, Ágnes; Abukhairan, Rawan; Horváth, Tamás; Murvai, Nikoletta; Kovács, Orsolya P; Kovács, Márton; Tosatto, Silvio C E (2019-10-15). "PhaSePro: the database of proteins driving liquid–liquid phase separation". Nucleic Acids Research. 48 (D1): D360–D367. doi:10.1093/nar/gkz848. ISSN 0305-1048. PMC 7145634. PMID 31612960.
  4. Li, Qian; Peng, Xiaojun; Li, Yuanqing; Tang, Wenqin; Zhu, Jia’an; Huang, Jing; Qi, Yifei; Zhang, Zhuqing (2019-09-06). "LLPSDB: a database of proteins undergoing liquid–liquid phase separation in vitro". Nucleic Acids Research. 48 (D1): D320–D327. doi:10.1093/nar/gkz778. ISSN 0305-1048. PMC 6943074. PMID 31906602.
  5. Ning, Wanshan; Guo, Yaping; Lin, Shaofeng; Mei, Bin; Wu, Yu; Jiang, Peiran; Tan, Xiaodan; Zhang, Weizhi; Chen, Guowei; Peng, Di; Chu, Liang (2020-01-08). "DrLLPS: a data resource of liquid–liquid phase separation in eukaryotes". Nucleic Acids Research. 48 (D1): D288–D295. doi:10.1093/nar/gkz1027. ISSN 0305-1048. PMC 7145660. PMID 31691822.