David J. Lipman

David Lipman in June 2013
Born: David J. Lipman
Alma mater: Brown University; University at Buffalo, The State University of New York
Known for: Influence on development of BLAST (biotechnology) [1]
Awards: ISCB Senior Scientist Award; Member of the U.S. National Academy of Sciences; ISCB Fellow [2]
Scientific career
Fields: Bioinformatics; Computational biology; Sequence comparison methods; Comparative genomics; Molecular evolution
Institutions: National Center for Biotechnology Information; Brown University; University at Buffalo, The State University of New York
Notable students: Stephen Altschul [3]; Mark Boguski [citation needed]
Website: www.ncbi.nlm.nih.gov/research/staff/lipman

David J. Lipman is an American biologist who was the director of the National Center for Biotechnology Information (NCBI) at the National Institutes of Health from 1989 [1] to 2017. [4] [5] NCBI is the home of GenBank, [6] the U.S. node of the International Nucleotide Sequence Database Collaboration, and PubMed, one of the most heavily used sites in the world for the search and retrieval of biomedical information. Lipman is one of the original authors of the BLAST sequence alignment program and a respected figure in bioinformatics. [7] [8] [9] In 2017, he left NCBI to become Chief Science Officer at Impossible Foods. [10]

Education

Lipman received his undergraduate degree from Brown University and his M.D. in 1980 from the University at Buffalo, The State University of New York. [11]

Career

Lipman was the founding director of the National Center for Biotechnology Information, part of the National Library of Medicine at the U.S. National Institutes of Health. Under his leadership, NCBI grew from fewer than a dozen people to more than 500 scientific staff, and it now hosts hundreds of scientific and medical databases including GenBank, PubMed, PubMed Central, dbGaP, dbSNP, the Sequence Read Archive (SRA), RefSeq, PubChem, and many more. The internal research program at NCBI included groups led by Stephen Altschul (another BLAST co-author), David Landsman, Eugene Koonin [12] (a prolific author on comparative genomics), and L. Aravind.

Lipman is well known for his seminal work on a series of sequence similarity algorithms: the Wilbur-Lipman algorithm [13] in 1983, FASTA search [14] [15] in 1985, BLAST [16] in 1990, and Gapped BLAST and PSI-BLAST [17] in 1997. BLAST eventually became the most widely used and highly cited sequence alignment program in the field (over 160,000 citations as of 2021), and the BLAST server is today one of NCBI's most heavily used resources.

Lipman also worked for many years with Dennis A. Benson and others at NCBI on the maintenance and improvement of GenBank, one of the world's largest databases of genome and protein sequence data. GenBank, the European Nucleotide Archive, and the DNA Data Bank of Japan together form the International Nucleotide Sequence Database Collaboration (INSDC), a fully open, unrestricted database of genome sequences that has been the world's repository of such data since 1990. [18] [19] [20]

He was one of the originators of the Influenza Genome Sequencing Project, an effort to sequence and make available the genomes of thousands of influenza virus isolates. [citation needed]

He was one of the original signatories of the Bethesda Statement on Open Access Publishing. [citation needed]

He is also the editor-in-chief of Biology Direct, an open-access, peer-reviewed online scientific journal. [21]

In May 2017, Lipman left his role at the NCBI to join the plant-based meat company Impossible Foods as chief scientific officer. [22]

Awards and honors

Lipman received the Association of Biomolecular Resource Facilities Award for outstanding contributions to Biomolecular Technologies in 1996.

In 2000, he was elected to the National Academy of Medicine. [23]

In 2004, he received the ISCB Senior Scientist Award from the International Society for Computational Biology, which elected him an ISCB Fellow in 2009. [2] [24]

In 2005, he was elected to the US National Academy of Sciences. [citation needed]

In 2013, he was named a White House "Open Science" Champion of Change. [25] [26]

In 2023, he was awarded the Warren Alpert Foundation Prize. [27]

References

  1. "Research Institute Posts Gene Data on Internet". The New York Times. June 26, 1997.
  2. Anon (2017). "ISCB Fellows". iscb.org. International Society for Computational Biology. Archived from the original on 2017-03-20.
  3. "Sense from Sequences: Stephen F. Altschul on Bettering BLAST". 2000. Archived from the original on 2007-10-07.
  4. "David J. Lipman, MD, Director, National Center for Biotechnology Information". Archived from the original on 2013-09-26.
  5. "Open Access Now | Conversation with David Lipman". Biomedcentral.com. Archived from the original on 29 June 2011. Retrieved 2 July 2011.
  6. Benson, D. A.; Karsch-Mizrachi, I.; Lipman, D. J.; Ostell, J.; Wheeler, D. L. (2007). "GenBank". Nucleic Acids Research. 36 (Database issue): D25–D30. doi:10.1093/nar/gkm929. PMC 2238942. PMID 18073190.
  7. "david lipman – Google Scholar". Scholar.google.com. Retrieved 2017-03-26.
  8. David J. Lipman publications indexed by Microsoft Academic [dead link]
  9. David J. Lipman at DBLP Bibliography Server
  10. "National Library of Medicine Announces Departure of NCBI Director Dr. David Lipman". www.nlm.nih.gov. Retrieved 2017-05-05.
  11. "David J. Lipman, M.D. Biography". nih.gov. Archived from the original on 2017-02-11. Retrieved 2017-02-09.
  12. Tatusov, R. L.; Koonin, E. V.; Lipman, D. J. (1997). "A Genomic Perspective on Protein Families". Science. 278 (5338): 631–637. Bibcode:1997Sci...278..631T. doi:10.1126/science.278.5338.631. PMID 9381173.
  13. Wilbur, W. J.; Lipman, D. J. (1983). "Rapid similarity searches of nucleic acid and protein data banks". Proceedings of the National Academy of Sciences of the United States of America. 80 (3): 726–730. Bibcode:1983PNAS...80..726W. doi:10.1073/pnas.80.3.726. PMC 393452. PMID 6572363.
  14. Lipman, D.; Pearson, W. (1985). "Rapid and sensitive protein similarity searches". Science. 227 (4693): 1435–1441. Bibcode:1985Sci...227.1435L. doi:10.1126/science.2983426. PMID 2983426.
  15. Pearson, W. R.; Lipman, D. J. (1988). "Improved tools for biological sequence comparison". Proceedings of the National Academy of Sciences of the United States of America. 85 (8): 2444–2448. Bibcode:1988PNAS...85.2444P. doi:10.1073/pnas.85.8.2444. PMC 280013. PMID 3162770.
  16. Altschul, Stephen; Gish, Warren; Miller, Webb; Myers, Eugene; Lipman, David (1990). "Basic local alignment search tool". Journal of Molecular Biology. 215 (3): 403–410. doi:10.1016/S0022-2836(05)80360-2. PMID 2231712. S2CID 14441902.
  17. Altschul, S.; Madden, T. L.; Schäffer, A. A.; Zhang, J.; Zhang, Z.; Miller, W.; Lipman, D. J. (1997). "Gapped BLAST and PSI-BLAST: A new generation of protein database search programs". Nucleic Acids Research. 25 (17): 3389–3402. doi:10.1093/nar/25.17.3389. PMC 146917. PMID 9254694.
  18. Benson, D. A.; Cavanaugh, M.; Clark, K.; Karsch-Mizrachi, I.; Lipman, D. J.; Ostell, J.; Sayers, E. W. (2012). "GenBank". Nucleic Acids Research. 41 (Database issue): D36–D42. doi:10.1093/nar/gks1195. PMC 3531190. PMID 23193287.
  19. Benson, D. A.; Karsch-Mizrachi, I.; Clark, K.; Lipman, D. J.; Ostell, J.; Sayers, E. W. (2011). "GenBank". Nucleic Acids Research. 40 (Database issue): D48–D53. doi:10.1093/nar/gkr1202. PMC 3245039. PMID 22144687.
  20. Benson, D. A.; Karsch-Mizrachi, I.; Lipman, D. J.; Ostell, J.; Sayers, E. W. (2010). "GenBank". Nucleic Acids Research. 39 (Database issue): D32–D37. doi:10.1093/nar/gkq1079. PMC 3013681. PMID 21071399.
  21. "Biology Direct | Editorial board". Archived from the original on 2011-09-30. Retrieved 2011-10-28.
  22. "National Library of Medicine Announces Departure of NCBI Director Dr. David Lipman". www.nlm.nih.gov. Retrieved 2017-05-04.
  23. "Institute of Medicine Elects New Members".
  24. "ISCB Names 2004 Senior Scientist Accomplishment Award Winner, Dr. David Lipman ISCB Newsletter 7-3". Iscb.org. Retrieved 2 July 2011.
  25. "Open Science | the White House". whitehouse.gov. Archived from the original on 2017-01-21. Retrieved 2016-04-02 via National Archives.
  26. "Dr. David Lipman Receives White House "Open Science" Champions of Change Award on Behalf of NCBI". Ncbi.nlm.nih.gov. Retrieved 2 April 2016.
  27. Warren Alpert Foundation Prize 2023