David Lipman | |
---|---|
Born | David J. Lipman |
Alma mater | Brown University; University at Buffalo, The State University of New York |
Known for | Influence on development of BLAST (biotechnology) [1] |
Awards | ISCB Senior Scientist Award; Member of the U.S. National Academy of Sciences; ISCB Fellow [2] |
Scientific career | |
Fields | Bioinformatics; Computational biology; Sequence comparison methods; Comparative genomics; Molecular evolution |
Institutions | National Center for Biotechnology Information; Brown University; University at Buffalo, The State University of New York |
Notable students | Stephen Altschul [3]; Mark Boguski [ citation needed ] |
Website | www |
David J. Lipman is an American biologist who from 1989 [1] to 2017 was the director of the National Center for Biotechnology Information (NCBI) at the National Institutes of Health. [4] [5] NCBI is the home of GenBank, [6] the U.S. node of the International Nucleotide Sequence Database Collaboration, and of PubMed, one of the most heavily used sites in the world for the search and retrieval of biomedical information. Lipman is one of the original authors of the BLAST sequence alignment program and a respected figure in bioinformatics. [7] [8] [9] In 2017, he left NCBI and became Chief Science Officer at Impossible Foods. [10]
Lipman received his undergraduate degree from Brown University and his M.D. in 1980 from the University at Buffalo, The State University of New York. [11]
Lipman was the founding director of the National Center for Biotechnology Information, part of the National Library of Medicine at the U.S. National Institutes of Health. Under his leadership, NCBI grew from fewer than a dozen people to more than 500 scientific staff, and it now hosts hundreds of scientific and medical databases, among them GenBank, PubMed, PubMed Central, dbGaP, dbSNP, the Sequence Read Archive (SRA), RefSeq, and PubChem. The internal research program at NCBI included groups led by Stephen Altschul (another BLAST co-author), David Landsman, Eugene Koonin [12] (a prolific author on comparative genomics), and L. Aravind.
Lipman is well known for his work on a series of sequence similarity algorithms: the Wilbur-Lipman algorithm [13] in 1983, FASTA search [14] [15] in 1985, BLAST [16] in 1990, and Gapped BLAST and PSI-BLAST [17] in 1997. BLAST eventually became the most widely used and highly cited sequence alignment program in the field (over 160,000 citations as of 2021), and the NCBI BLAST server is today one of NCBI's most heavily used resources.
Lipman also worked for many years with Dennis A. Benson and others at NCBI on the maintenance and improvement of GenBank, one of the world's largest databases of genome and protein sequence data. GenBank, the European Nucleotide Archive, and the DNA Data Bank of Japan together form the International Nucleotide Sequence Database Collaboration (INSDC), a fully open, unrestricted database of genome sequences that has been the world's repository of such data since 1990. [18] [19] [20]
He was one of the originators of the Influenza Genome Sequencing Project, a project to sequence and make available the genomes of thousands of influenza virus isolates. [ citation needed ]
He was one of the original signatories of the Bethesda Statement on Open Access Publishing. [ citation needed ]
He is also the editor-in-chief of Biology Direct, an open-access, peer-reviewed online scientific journal. [21]
In May 2017, Lipman left his role at the NCBI to join the plant-based meat company Impossible Foods as chief scientific officer. [22]
Lipman received the Association of Biomolecular Resource Facilities Award for outstanding contributions to Biomolecular Technologies in 1996.
In 2000, he was elected to the National Academy of Medicine. [23]
In 2004, he received the ISCB Senior Scientist Award from the International Society for Computational Biology, which elected him an ISCB Fellow in 2009. [2] [24]
In 2005, he was elected to the U.S. National Academy of Sciences. [ citation needed ]
In 2013, he was named a White House "Open Science" Champion of Change. [25] [26]
In 2023, he was awarded the Warren Alpert Foundation Prize. [27]
The National Center for Biotechnology Information (NCBI) is part of the United States National Library of Medicine (NLM), a branch of the National Institutes of Health (NIH). It is approved and funded by the government of the United States. The NCBI is located in Bethesda, Maryland, and was founded in 1988 through legislation sponsored by US Congressman Claude Pepper.
In bioinformatics, BLAST is an algorithm and program for comparing primary biological sequence information, such as the amino-acid sequences of proteins or the nucleotides of DNA and/or RNA sequences. A BLAST search enables a researcher to compare a protein or nucleotide sequence (the query) with a library or database of sequences, and to identify database sequences that resemble the query sequence above a certain threshold. For example, following the discovery of a previously unknown gene in the mouse, a scientist will typically perform a BLAST search of the human genome to see if humans carry a similar gene; BLAST will identify sequences in the human genome that resemble the mouse gene based on similarity of sequence.
In the field of bioinformatics, a sequence database is a type of biological database composed of a large collection of digitally stored nucleic acid sequences, protein sequences, or other polymer sequences. The UniProt database is an example of a protein sequence database. As of 2013 it contained over 40 million sequences and was growing at an exponential rate. Historically, sequences were published in paper form, but as the number of sequences grew, this storage method became unsustainable.
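The idea of reporting database sequences that resemble a query above a threshold can be illustrated with a toy sketch. This is not BLAST's actual algorithm, which uses a seeded heuristic local alignment with statistical scoring; the function names `kmers` and `search`, the k-mer size, the threshold, and the example sequences are all hypothetical choices made only for illustration.

```python
def kmers(seq, k=3):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def search(query, database, k=3, threshold=0.5):
    """Return (name, score) pairs for database sequences whose score,
    the fraction of distinct query k-mers they contain, meets the threshold.
    Hits are sorted from most to least similar."""
    query_kmers = kmers(query, k)
    hits = []
    for name, seq in database.items():
        shared = len(query_kmers & kmers(seq, k))
        score = shared / len(query_kmers) if query_kmers else 0.0
        if score >= threshold:
            hits.append((name, score))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical miniature "database" of nucleotide sequences.
database = {
    "seq_a": "ATGGCGTACGTTAGC",  # identical to the query
    "seq_b": "TTTTTTTTTTTTTTT",  # unrelated
    "seq_c": "ATGGCGTACGATAGC",  # one substitution relative to the query
}

hits = search("ATGGCGTACGTTAGC", database)
for name, score in hits:
    print(f"{name}\t{score:.2f}")
```

A shared k-mer fraction is far cruder than an alignment score, but it shows the same query-against-library pattern: every database entry is scored against the query, and only entries above the cutoff are reported.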
The GenBank sequence database is an open access, annotated collection of all publicly available nucleotide sequences and their protein translations. It is produced and maintained by the National Center for Biotechnology Information as part of the International Nucleotide Sequence Database Collaboration (INSDC).
Stephen Frank Altschul is an American mathematician who has designed algorithms that are used in the field of bioinformatics. Altschul is the co-author of the BLAST algorithm used for sequence analysis of proteins and nucleotides.
The International Nucleotide Sequence Database Collaboration (INSDC) is a joint effort to collect and disseminate databases containing DNA and RNA sequences. It involves the following computerized databases: NIG's DNA Data Bank of Japan (Japan), NCBI's GenBank (USA) and the EMBL-EBI's European Nucleotide Archive (UK). New and updated data on nucleotide sequences contributed by research teams to each of the three databases are synchronized on a daily basis through continuous interaction between the staff at each of the collaborating organizations.
The European Bioinformatics Institute (EMBL-EBI) is an intergovernmental organization (IGO) which, as part of the European Molecular Biology Laboratory (EMBL) family, focuses on research and services in bioinformatics. It is located on the Wellcome Genome Campus in Hinxton near Cambridge, and employs over 600 full-time equivalent (FTE) staff. Institute leaders such as Rolf Apweiler, Alex Bateman, Ewan Birney, and Guy Cochrane, an adviser on the National Genomics Data Center Scientific Advisory Board, serve as part of the international research network of the BIG Data Center at the Beijing Institute of Genomics.
The start codon is the first codon of a messenger RNA (mRNA) transcript translated by a ribosome. The start codon always codes for methionine in eukaryotes and archaea and for N-formylmethionine (fMet) in bacteria, mitochondria and plastids.
Amos Bairoch is a Swiss bioinformatician and Professor of Bioinformatics at the Department of Human Protein Sciences of the University of Geneva where he leads the CALIPHO group at the Swiss Institute of Bioinformatics (SIB) combining bioinformatics, curation, and experimental efforts to functionally characterize human proteins.
Webb Colby Miller is an American bioinformatician who is professor in the Department of Biology and the Department of Computer Science and Engineering at The Pennsylvania State University.
MicrobesOnline is a publicly and freely accessible website that hosts multiple comparative genomic tools for comparing microbial species at the genomic, transcriptomic and functional levels. MicrobesOnline was developed by the Virtual Institute for Microbial Stress and Survival, which is based at the Lawrence Berkeley National Laboratory in Berkeley, California. The site was launched in 2005, with regular updates until 2011.
The Reference Sequence (RefSeq) database is an open access, annotated and curated collection of publicly available nucleotide sequences and their protein products. RefSeq was introduced in 2000. This database is built by the National Center for Biotechnology Information (NCBI) and, unlike GenBank, provides only a single record for each natural biological molecule for major organisms ranging from viruses to bacteria to eukaryotes.
The Sequence Read Archive is a bioinformatics database that provides a public repository for DNA sequencing data, especially the "short reads" generated by high-throughput sequencing, which are typically less than 1,000 base pairs in length. The archive is part of the International Nucleotide Sequence Database Collaboration (INSDC), and is run as a collaboration between the NCBI, the European Bioinformatics Institute (EBI), and the DNA Data Bank of Japan (DDBJ).
The European Nucleotide Archive (ENA) is a repository providing free and unrestricted access to annotated DNA and RNA sequences. It also stores complementary information such as experimental procedures, details of sequence assembly and other metadata related to sequencing projects. The archive is composed of three main databases: the Sequence Read Archive, the Trace Archive and the EMBL Nucleotide Sequence Database. The ENA is produced and maintained by the European Bioinformatics Institute and is a member of the International Nucleotide Sequence Database Collaboration (INSDC) along with the DNA Data Bank of Japan and GenBank.
Donna R. Maglott is a staff scientist at the National Center for Biotechnology Information known for her research on large-scale genomics projects, including the mouse genome and development of databases required for genomics research.
In molecular phylogenetics, relationships among individuals are determined using character traits, such as DNA, RNA or protein, which may be obtained using a variety of sequencing technologies. High-throughput next-generation sequencing has become a popular technique in transcriptomics, which represents a snapshot of gene expression. In eukaryotes, making phylogenetic inferences using RNA is complicated by alternative splicing, which produces multiple transcripts from a single gene. As such, a variety of approaches may be used to improve phylogenetic inference using transcriptomic data obtained from RNA-Seq and processed using computational phylogenetics.
Transmembrane Protein 217 is a protein encoded by the gene TMEM217. TMEM217 has been found to have expression correlated with the lymphatic system and endothelial tissues and has been predicted to have a function linked to the cytoskeleton.
William Raymond Pearson is professor of biochemistry and molecular genetics in the School of Medicine at the University of Virginia. Pearson is best known for the development of the FASTA format.
VFDB, also known as the Virulence Factor Database, is a database that provides scientists with quick access to virulence factors in bacterial pathogens. It can be navigated and browsed by genus or by keyword. A BLAST tool is provided for searching against known virulence factors. VFDB contains a collection of 16 important bacterial pathogens. Perl scripts were used to extract the positions and sequences of virulence factors (VFs) from GenBank. Clusters of Orthologous Groups (COG) was used to update incomplete annotations. Additional information was obtained from NCBI. VFDB was built on Linux operating systems running on DELL PowerEdge 1600SC servers.
Genome mining describes the exploitation of genomic information for the discovery of biosynthetic pathways of natural products and their possible interactions. It depends on computational technology and bioinformatics tools. The mining process relies on a huge amount of data accessible in genomic databases. By applying data mining algorithms, the data can be used to generate new knowledge in several areas of medicinal chemistry, such as discovering novel natural products.