Steven Salzberg | |
---|---|
Born | Steven Lloyd Salzberg 1960 (age 63–64) |
Alma mater | Yale University, Harvard University |
Known for | GLIMMER [1], MUMmer [2], AMOS assembler [3], Bowtie [4], TopHat [5] |
Spouse | Claudia Pasche [6] |
Awards | Ben Franklin Award (2013) |
Scientific career | |
Institutions | University of Maryland, College Park; The Institute for Genomic Research; Johns Hopkins University |
Thesis | Learning with nested generalized exemplars (1989) |
Doctoral advisor | William Aaron Woods [7] |
Doctoral students | |
Other notable students | Olga Troyanskaya [8] |
Website | salzberg-lab |
Steven Lloyd Salzberg (born 1960) is an American computational biologist and computer scientist who is a Bloomberg Distinguished Professor of Biomedical Engineering, Computer Science, and Biostatistics at Johns Hopkins University, where he is also Director of the Center for Computational Biology.
Salzberg was born in 1960 as one of four children to Herman Salzberg, a Distinguished Professor Emeritus of Psychology, and Adele Salzberg, a retired school teacher. [9] Salzberg did his undergraduate studies at Yale University where he received his Bachelor of Arts degree in English in 1980. In 1981 he returned to Yale, and he received his Master of Science and Master of Philosophy degrees in Computer Science in 1982 and 1984, respectively. After several years in a startup company, he enrolled at Harvard University, where he earned a Ph.D. in Computer Science in 1989. [10]
After obtaining his undergraduate degree, he worked for a local power company in South Carolina, where he gained programming experience on an IBM mainframe, [11] programming in COBOL and IBM assembly language. He then joined a Boston-based AI startup upon completion of his master's degree in Computer Science. [11]
After earning his Ph.D., Salzberg joined Johns Hopkins University as an assistant professor in the Department of Computer Science, and was promoted to associate professor in 1997. From 1998 to 2005, he was the head of the Bioinformatics department at The Institute for Genomic Research, one of the world's largest genome sequencing centers. Salzberg then joined the Department of Computer Science at the University of Maryland, College Park, where he was the Horvitz Professor of Computer Science as well as the Director of the Center for Bioinformatics and Computational Biology. In 2011, Salzberg returned to Johns Hopkins University as a professor in the McKusick-Nathans Institute of Genetic Medicine and in the Department of Medicine. [6] [12] [13]
In 2013, Salzberg won the Benjamin Franklin Award [14] in bioinformatics.
In 2014, he was named a Bloomberg Distinguished Professor at Johns Hopkins University for his accomplishments as an interdisciplinary researcher and excellence in teaching the next generation of scholars. [15] The Bloomberg Distinguished Professorships were established in 2013 by a gift from Michael Bloomberg. [16] Salzberg holds joint appointments in the Johns Hopkins Whiting School of Engineering, Johns Hopkins School of Medicine, and the Johns Hopkins Bloomberg School of Public Health.
Salzberg has been a prominent scientist in the field of bioinformatics and computational biology since the 1990s. He has made many contributions to gene finding algorithms, notably the GLIMMER [17] program for bacterial gene finding as well as several related programs for finding genes in animals, plants, and other organisms. He has also been a leader in genome assembly research and has led the assembly of dozens of genomes, both large and small. He was a participant in the Human Genome Project [18] as well as many other genome projects, including the malaria genome (Plasmodium falciparum) and the genome of the model plant Arabidopsis thaliana. In 2001–2002, he and his colleagues sequenced the anthrax that was used in the 2001 anthrax attacks. They published their results in the journal Science in 2002. [19] These findings helped the FBI track the source of the attacks to a single vial at Ft. Detrick in Frederick, Maryland.
Salzberg, together with David Lipman and Lone Simonsen, started the Influenza Genome Sequencing Project in 2003, a project to sequence and make available the genomes of thousands of influenza virus isolates. [20] [21]
Soon after the advent of next-generation sequencing (NGS) in the mid-2000s, Salzberg's research lab and his collaborators developed a suite of highly efficient, accurate programs for alignment of NGS sequences to large genomes and for assembly of sequences from RNA-Seq experiments. These include the "Tuxedo" suite, comprising the Bowtie, TopHat, and Cufflinks programs, which have been cited tens of thousands of times in the years since their publication.
Salzberg has also been a vocal advocate against pseudoscience and has authored editorials and appeared in print media on this topic. Since 2010, he has written a column at Forbes magazine [22] on science, medicine, and pseudoscience, where he has published hundreds of articles that have received tens of millions of views. His work at Forbes won the 2012 Robert P. Balles Prize in Critical Thinking. [23]
Salzberg was a charter member of the Cambridge Working Group in 2014, which was created to express alarm in the scientific community over the creation of highly transmissible and contagious viruses (also called gain-of-function research) and the likelihood of an accidental lab release. [24]
Salzberg has authored or co-authored over 300 scientific publications. [25] He has more than 300,000 citations in Google Scholar and an h-index of 159. [26] In 2014 and every year since (through at least 2022), Salzberg was selected for inclusion in HighlyCited.com, a ranking compiled by the Institute for Scientific Information of scientists who are among the top 1% most cited for their subject field during the previous ten years. He was also chosen for this list when it was first created in 2001. This list of highly cited researchers continues under Clarivate, and Salzberg was also included in the list in 2018, 2019, 2020, 2021, 2022, and 2023. [27]
Bioinformatics is an interdisciplinary field of science that develops methods and software tools for understanding biological data, especially when the data sets are large and complex. Bioinformatics uses biology, chemistry, physics, computer science, computer programming, information engineering, mathematics and statistics to analyze and interpret biological data. The subsequent process of analyzing and interpreting data is often referred to as computational biology, though the distinction between the two terms is often disputed.
Comparative genomics is a branch of biological research that examines genome sequences across a spectrum of species, spanning from humans and mice to a diverse array of organisms from bacteria to chimpanzees. This large-scale holistic approach compares two or more genomes to discover the similarities and differences between the genomes and to study the biology of the individual genomes. Comparison of whole genome sequences provides a highly detailed view of how organisms are related to each other at the gene level. By comparing whole genome sequences, researchers gain insights into genetic relationships between organisms and study evolutionary changes. The major principle of comparative genomics is that common features of two organisms will often be encoded within the DNA that is evolutionarily conserved between them. Therefore, comparative genomics provides a powerful tool for studying evolutionary changes among organisms, helping to identify genes that are conserved or common among species, as well as genes that give each organism its unique characteristics. Moreover, these studies can be performed at different levels of the genomes to obtain multiple perspectives about the organisms.
In bioinformatics, GLIMMER (Gene Locator and Interpolated Markov ModelER) is used to find genes in prokaryotic DNA. "It is effective at finding genes in bacteria, archaea, viruses, typically finding 98-99% of all relatively long protein coding genes". GLIMMER was the first system that used the interpolated Markov model to identify coding regions. The GLIMMER software is open source and is maintained by Steven Salzberg, Art Delcher, and their colleagues at the Center for Computational Biology at Johns Hopkins University. The original GLIMMER algorithms and software were designed by Art Delcher, Simon Kasif and Steven Salzberg and applied to bacterial genome annotation in collaboration with Owen White.
Computational genomics refers to the use of computational and statistical analysis to decipher biology from genome sequences and related data, including both DNA and RNA sequence as well as other "post-genomic" data. Because it is combined with computational and statistical approaches to understanding the function of genes and with statistical association analysis, the field is also often referred to as computational and statistical genetics/genomics. As such, computational genomics may be regarded as a subset of bioinformatics and computational biology, but with a focus on using whole genomes to understand the principles of how the DNA of a species controls its biology at the molecular level and beyond. With the current abundance of massive biological datasets, computational studies have become one of the most important means to biological discovery.
In evolutionary biology, conserved sequences are identical or similar sequences in nucleic acids or proteins across species, or within a genome, or between donor and receptor taxa. Conservation indicates that a sequence has been maintained by natural selection.
Webb Colby Miller is an American bioinformatician who is professor in the Department of Biology and the Department of Computer Science and Engineering at The Pennsylvania State University.
In bioinformatics, k-mers are substrings of length $k$ contained within a biological sequence. They are primarily used in computational genomics and sequence analysis, where k-mers are composed of nucleotides, to assemble DNA sequences, improve heterologous gene expression, identify species in metagenomic samples, and create attenuated vaccines. Usually, the term k-mer refers to all of a sequence's subsequences of length $k$, such that the sequence AGAT would have four monomers (A, G, A, and T), three 2-mers (AG, GA, AT), two 3-mers (AGA and GAT) and one 4-mer (AGAT). More generally, a sequence of length $L$ will have $L - k + 1$ k-mers, and there are $n^k$ total possible k-mers, where $n$ is the number of possible monomers (for example, four in the case of DNA).
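The counting rules in the k-mer summary can be checked with a minimal Python sketch. The function names `kmers`, `kmer_count`, and `possible_kmers` are hypothetical helpers chosen for illustration; they are not taken from any particular bioinformatics package.

```python
def kmers(sequence: str, k: int) -> list[str]:
    """Return every length-k substring of `sequence`, ordered by start position."""
    if k < 1:
        raise ValueError("k must be at least 1")
    # There are len(sequence) - k + 1 start positions; none if k is too large.
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def kmer_count(length: int, k: int) -> int:
    """Number of k-mers in a sequence of the given length: L - k + 1 (never negative)."""
    return max(length - k + 1, 0)


def possible_kmers(alphabet_size: int, k: int) -> int:
    """Number of distinct k-mers over an alphabet of n monomers: n ** k."""
    return alphabet_size ** k


if __name__ == "__main__":
    for k in range(1, 5):
        print(k, kmers("AGAT", k))
    print(possible_kmers(4, 3))  # distinct DNA 3-mers
```

For AGAT this lists four 1-mers, three 2-mers, two 3-mers and one 4-mer, matching the example in the text.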
MUMmer is a bioinformatics software system for sequence alignment. It is based on the suffix tree data structure. It has been used for comparing different genome assemblies to one another, which allows scientists to determine how a genome has changed. The acronym "MUMmer" comes from "Maximal Unique Matches", or MUMs.
RNA-Seq is a technique that uses next-generation sequencing to reveal the presence and quantity of RNA molecules in a biological sample, providing a snapshot of gene expression in the sample, also known as the transcriptome.
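To make the idea of a maximal unique match concrete, the following Python sketch finds MUMs by brute force. It assumes the usual definition: a shared substring that occurs exactly once in each sequence and cannot be extended in either direction. It deliberately does not use a suffix tree, so it is far slower than MUMmer and is not MUMmer's algorithm. The names `count_occurrences` and `maximal_unique_matches` are hypothetical.

```python
def count_occurrences(text: str, pattern: str) -> int:
    """Count occurrences of `pattern` in `text`, allowing overlaps."""
    count = 0
    start = text.find(pattern)
    while start != -1:
        count += 1
        start = text.find(pattern, start + 1)
    return count


def maximal_unique_matches(a: str, b: str, min_length: int = 1) -> list[tuple[int, int, int]]:
    """Return (start_in_a, start_in_b, length) for each MUM of at least `min_length`.

    Naive illustration only: roughly quadratic in the sequence lengths,
    whereas a suffix-tree approach avoids comparing every pair of positions.
    """
    mums = []
    for i in range(len(a)):
        for j in range(len(b)):
            if a[i] != b[j]:
                continue
            # Skip pairs that could be extended to the left (not left-maximal).
            if i > 0 and j > 0 and a[i - 1] == b[j - 1]:
                continue
            # Extend to the right as far as the two sequences agree.
            length = 0
            while (i + length < len(a) and j + length < len(b)
                   and a[i + length] == b[j + length]):
                length += 1
            if length < min_length:
                continue
            match = a[i:i + length]
            # Keep the match only if it is unique in both sequences.
            if count_occurrences(a, match) == 1 and count_occurrences(b, match) == 1:
                mums.append((i, j, length))
    return mums


if __name__ == "__main__":
    print(maximal_unique_matches("GATTACA", "CATTAGA", min_length=3))
```

With a minimum length of 3, the only MUM between GATTACA and CATTAGA is ATTA, starting at position 1 in both strings.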
Richard Michael Durbin is a British computational biologist and Al-Kindi Professor of Genetics at the University of Cambridge. He also serves as an associate faculty member at the Wellcome Sanger Institute where he was previously a senior group leader.
In molecular biology and genetics, DNA annotation or genome annotation is the process of describing the structure and function of the components of a genome, by analyzing and interpreting them in order to extract their biological significance and understand the biological processes in which they participate. Among other things, it identifies the locations of genes and all the coding regions in a genome and determines what those genes do.
Bowtie is a software package commonly used for sequence alignment and sequence analysis in bioinformatics. The source code for the package is distributed freely and compiled binaries are available for Linux, macOS and Windows platforms. As of 2017, the Genome Biology paper describing the original Bowtie method had been cited more than 11,000 times. Bowtie is open-source software and is currently maintained by Johns Hopkins University.
Lior Samuel Pachter is a computational biologist. He works at the California Institute of Technology, where he is the Bren Professor of Computational Biology. He has widely varied research interests including genomics, combinatorics, computational geometry, machine learning, scientific computing, and statistics.
TopHat is an open-source bioinformatics tool for the high-throughput alignment of shotgun cDNA sequencing reads generated by transcriptomics technologies. It first aligns reads with Bowtie and then maps them to a reference genome to discover RNA splice sites de novo. TopHat aligns RNA-Seq reads to mammalian-sized genomes.
Ben Langmead is a computational biologist and associate professor in the Computational Biology & Medicine Group at Johns Hopkins University.
Transcriptomics technologies are the techniques used to study an organism's transcriptome, the sum of all of its RNA transcripts. The information content of an organism is recorded in the DNA of its genome and expressed through transcription. Here, mRNA serves as a transient intermediary molecule in the information network, whilst non-coding RNAs perform additional diverse functions. A transcriptome captures a snapshot in time of the total transcripts present in a cell. Transcriptomics technologies provide a broad account of which cellular processes are active and which are dormant. A major challenge in molecular biology is to understand how a single genome gives rise to a variety of cells. Another is how gene expression is regulated.
Owen R. White is a bioinformatician and director of the Institute For Genome Sciences at the University of Maryland School of Medicine, United States. He is known for his work on the bioinformatics tools GLIMMER and MUMmer.
Bruce Colston Trapnell Jr. is an assistant professor in the Department of Genome Sciences at the University of Washington. He was awarded the Overton Prize by the International Society for Computational Biology (ISCB) for “outstanding accomplishment in the early to mid stage of his career” in 2018.