PSORTdb

Last updated
PSORTdb
PSORTdb Logo.gif
Content
Descriptionprotein subcellular localization database
Organisms Prokaryotes
Contact
Research center Simon Fraser University
LaboratoryDepartment of Molecular Biology and Biochemistry
AuthorsSébastien Rey
Primary citationRey & al. (2005) [1]
Release date2010 (v. 2.0)
Access
Website http://db.psort.org/
Miscellaneous
License GNU General Public License
Version2.0

PSORTdb is a database of protein subcellular localization (SCL) for bacteria and archaea. It is a member of the PSORT family of bioinformatics tools. [1] The database consists of two datasets, ePSORTdb and cPSORTdb, which contain information determined through experimental validation and computational prediction, respectively. The ePSORTdb dataset is the largest curated collection of experimentally verified SCL data. [2]

Contents

PSORTdb was initially developed in 2005 in order to contain protein subcellular localization predictions for bacteria. [3] The computational predictions in the cPSORTdb dataset were generated by the PSORT tool PSORTb 2.0, the most accurate SCL predictor of the time. [3] [4]

The second and current version of PSORTdb was released in 2010. Entries in the database are automatically generated as newly sequenced prokaryotic genomes become available through NCBI. As of the second release, ePSORTdb contained over 12,000 entries for bacterial proteins and 800 entries for archaeal proteins; cPSORTdb contained SCL predictions for over 3,700,000 proteins in total. The ePSORTdb data is derived from a manual literature search (using PubMed) as well as Swiss-Prot annotations. The cPSORTdb predictions are generated using the updated PSORTb 3.0 tool. [3]

PSORTdb can be accessed through a web interface or via BLAST. The entire database is also available for download under the GNU General Public License. Each term in the database is associated with an identifier from the Gene Ontology in order to allow integration with other bioinformatics resources. [1]

As of 2014, PSORTdb is maintained through the laboratory of Fiona Brinkman at Simon Fraser University. [5] [6]

See also

Related Research Articles

<span class="mw-page-title-main">Bioinformatics</span> Computational analysis of large, complex sets of biological data

Bioinformatics is an interdisciplinary field of science that develops methods and software tools for understanding biological data, especially when the data sets are large and complex. Bioinformatics uses biology, chemistry, physics, computer science, computer programming, information engineering, mathematics and statistics to analyze and interpret biological data. The subsequent process of analyzing and interpreting data is referred to as computational biology.

<span class="mw-page-title-main">UniProt</span> Database of protein sequences and functional information

UniProt is a freely accessible database of protein sequence and functional information, many entries being derived from genome sequencing projects. It contains a large amount of information about the biological function of proteins derived from the research literature. It is maintained by the UniProt consortium, which consists of several European bioinformatics organisations and a foundation from Washington, DC, USA.

Protein subcellular localization prediction involves the prediction of where a protein resides in a cell, its subcellular localization.

<span class="mw-page-title-main">PROSITE</span> Database of protein domains, families and functional sites

PROSITE is a protein database. It consists of entries describing the protein families, domains and functional sites as well as amino acid patterns and profiles in them. These are manually curated by a team of the Swiss Institute of Bioinformatics and tightly integrated into Swiss-Prot protein annotation. PROSITE was created in 1988 by Amos Bairoch, who directed the group for more than 20 years. Since July 2018, the director of PROSITE and Swiss-Prot is Alan Bridge.

<span class="mw-page-title-main">PSORT</span>

PSORT is a bioinformatics tool used for the prediction of protein localisation sites in cells. It receives the information of an amino acid sequence and its taxon of origin as inputs. Then it analyses the input sequence by applying the stored rules for various sequence features of known protein sorting signals. Finally, it reports the possibility for the input protein to be localised at each candidate site with additional information.

<span class="mw-page-title-main">Fiona Brinkman</span> Canadian medical researcher

Fiona Brinkman is a Professor in Bioinformatics and Genomics in the Department of Molecular Biology and Biochemistry at Simon Fraser University in British Columbia, Canada, and is a leader in the area of microbial bioinformatics. She is interested in developing "more sustainable, holistic approaches for infectious disease control and conservation of microbiomes".

The cells of eukaryotic organisms are elaborately subdivided into functionally-distinct membrane-bound compartments. Some major constituents of eukaryotic cells are: extracellular space, plasma membrane, cytoplasm, nucleus, mitochondria, Golgi apparatus, endoplasmic reticulum (ER), peroxisome, vacuoles, cytoskeleton, nucleoplasm, nucleolus, nuclear matrix and ribosomes.

<span class="mw-page-title-main">MicrobesOnline</span>

MicrobesOnline is a publicly and freely accessible website that hosts multiple comparative genomic tools for comparing microbial species at the genomic, transcriptomic and functional levels. MicrobesOnline was developed by the Virtual Institute for Microbial Stress and Survival, which is based at the Lawrence Berkeley National Laboratory in Berkeley, California. The site was launched in 2005, with regular updates until 2011.

Protein function prediction methods are techniques that bioinformatics researchers use to assign biological or biochemical roles to proteins. These proteins are usually ones that are poorly studied or predicted based on genomic sequence data. These predictions are often driven by data-intensive computational procedures. Information may come from nucleic acid sequence homology, gene expression profiles, protein domain structures, text mining of publications, phylogenetic profiles, phenotypic profiles, and protein-protein interaction. Protein function is a broad term: the roles of proteins range from catalysis of biochemical reactions to transport to signal transduction, and a single protein may play a role in multiple processes or cellular pathways.

LocDB is an expert-curated database that collects experimental annotations for the subcellular localization of proteins in Homo sapiens (human) and Arabidopsis thaliana (Weed). The database also contains predictions of subcellular localization from a variety of state-of-the-art prediction methods for all proteins with experimental information.

BASys is a freely available web server that can be used to perform automated, comprehensive annotation of bacterial genomes. With the advent of next generation DNA sequencing it is now possible to sequence the complete genome of a bacterium within a single day. This has led to an explosion in the number of fully sequenced microbes. In fact, as of 2013, there were more than 2700 fully sequenced bacterial genomes deposited with GenBank. However, a continuing challenge with microbial genomics is finding the resources or tools for annotating the large number of newly sequenced genomes. BASys was developed in 2005 in anticipation of these needs. In fact, BASys was the world’s first publicly accessible microbial genome annotation web server. Because of its widespread popularity, the BASys server was updated in 2011 through the addition of multiple server nodes to handle the large number of queries it was receiving.

Proteome Analyst (PA) is a freely available web server and online toolkit for predicting protein subcellular localization, or where a protein resides in a cell. In the field of proteomics, accurately predicting a protein's subcellular localization, or where a specific protein is located inside a cell, is an important step in the large scale study of proteins. This computational prediction problem is known as Protein subcellular localization prediction. Over the last decade, more than a dozen web servers and computer programs have been developed to attempt to solve this problem. Proteome Analyst is an example of one of the better performing subcellular prediction tools. Proteome Analyst makes predictions for both prokaryotic eukaryotic proteins using a text mining approach. Proteome Analyst was originally developed by the Proteome Analyst Research Group at the University of Alberta, and was initially released in March 2004. It was recently updated in January 2014.

Machine learning in bioinformatics is the application of machine learning algorithms to bioinformatics, including genomics, proteomics, microarrays, systems biology, evolution, and text mining.

Non-coding RNAs have been discovered using both experimental and bioinformatic approaches. Bioinformatic approaches can be divided into three main categories. The first involves homology search, although these techniques are by definition unable to find new classes of ncRNAs. The second category includes algorithms designed to discover specific types of ncRNAs that have similar properties. Finally, some discovery methods are based on very general properties of RNA, and are thus able to discover entirely new kinds of ncRNAs.

FAM237A is a protein coding gene which encodes a protein of the same name. Within Homo sapiens, FAM237A is believed to be primarily expressed within the brain, with moderate heart and lesser testes expression,. FAM237A is hypothesized to act as a specific activator of receptor GPR83.

<span class="mw-page-title-main">Genome mining</span>

Genome mining describes the exploitation of genomic information for the discovery of biosynthetic pathways of natural products and their possible interactions. It depends on computational technology and bioinformatics tools. The mining process relies on a huge amount of data accessible in genomic databases. By applying data mining algorithms, the data can be used to generate new knowledge in several areas of medicinal chemistry, such as discovering novel natural products.

<span class="mw-page-title-main">C13orf42</span> C13orf42 gene page

C13orf42 is a protein which, in humans, is encoded by the gene chromosome 13 open reading frame 42 (C13orf42). RNA sequencing data shows low expression of the C13orf42 gene in a variety of tissues. The C13orf42 protein is predicted to be localized in the mitochondria, nucleus, and cytosol. Tertiary structure predictions for C13orf42 indicate multiple alpha helices.

References

  1. 1 2 3 Rey, Sébastien; Acab Michael; Gardy Jennifer L; Laird Matthew R; deFays Katalin; Lambert Christophe; Brinkman Fiona S L (Jan 2005). "PSORTdb: a protein subcellular localization database for bacteria". Nucleic Acids Res. 33 (Database issue): D164–8. doi:10.1093/nar/gki027. PMC   539981 . PMID   15608169.
  2. "PSORTdb :: Home" . Retrieved 1 September 2014.
  3. 1 2 3 Yu, N. Y.; Laird, M. R.; Spencer, C.; Brinkman, F. S. L. (10 November 2010). "PSORTdb--an expanded, auto-updated, user-friendly protein subcellular localization database for Bacteria and Archaea". Nucleic Acids Research. 39 (Database): D241–D244. doi:10.1093/nar/gkq1093. PMC   3013690 . PMID   21071402.
  4. Gardy, JL; Laird, MR; Chen, F; Rey, S; Walsh, CJ; Ester, M; Brinkman, FS (Mar 1, 2005). "PSORTb v.2.0: expanded prediction of bacterial protein subcellular localization and insights gained from comparative proteome analysis". Bioinformatics. 21 (5): 617–23. doi: 10.1093/bioinformatics/bti057 . PMID   15501914.
  5. "PSORTdb :: About" . Retrieved 1 September 2014.
  6. "Computational Tools Developed - Fiona Brinkman Laboratory" . Retrieved 1 September 2014.