Charles Lawrence (mathematician)

Charles Lawrence
Nationality: American
Alma mater: Rensselaer Polytechnic Institute; Cornell University
Known for: Bayesian statistics; computational molecular biology; statistical inference in discrete high-D spaces
Scientific career
Fields: Bioinformatics; applied mathematics
Institutions: Brown University

Charles "Chip" Lawrence is an American bioinformatician and mathematician who pioneered novel statistical approaches to biological sequence analysis.

After completing his PhD, Lawrence became an assistant professor in Systems Engineering and Operations Research and Statistics at Rensselaer Polytechnic Institute. During the same period (1971–1975), he worked as a consultant to the Ministry of Maternal and Child Health in the Dominican Republic. From 1975 to 1981, he worked at the New York State Department of Health as the Director of Operations Research and Statistics in the Division of Epidemiology.

He is now a professor in Applied Mathematics and the Center for Computational Molecular Biology at Brown University. [1] From 2004 to 2006, he was the director of the Center for Computational Biology. He also directs the Statistical Molecular Biology Group (SMBG) at Brown University.

Lawrence's key scientific work to date focuses on algorithmic approaches to biological sequence analysis. He was one of the first to recognize that the inherently statistical nature of genomic processes, and the immense volume of data produced by genomic sequencing projects, could be fully analyzed only with statistical algorithms.

Early life and education

Lawrence received his bachelor's degree in physics from Rensselaer Polytechnic Institute in 1967.

After graduating from Rensselaer Polytechnic Institute, he continued his education at Cornell University and moved to a different research field: applied operations research and statistics in environmental engineering. He completed his PhD in 1971, with a dissertation on population dynamics.

Lawrence did not move into bioinformatics until after he had completed his PhD.

Research interests and contributions

Lawrence began research in computational biology in the 1980s. Focusing on algorithmic approaches, he was a pioneer in developing novel statistical approaches to biological sequence analysis.

Gibbs sampling in motif finding

Lawrence made notable contributions to the development of sequence alignment algorithms that approach the motif-finding problem by combining Bayesian statistics with a Gibbs sampling strategy. His seminal 1993 paper in Science described and clearly illustrated the first application of the statistical technique of Gibbs sampling to the problem of multiple sequence alignment. [2]
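As a rough illustration of how Gibbs sampling can be applied to motif finding, the following Python sketch implements a simplified "site sampler". It is not the published algorithm or the Gibbs Motif Sampler software. It assumes DNA sequences over the alphabet ACGT, a fixed motif width, and exactly one motif occurrence per sequence. The function name `gibbs_motif_sampler` and all of its parameters are hypothetical and chosen only for this example.

```python
import random

ALPHABET = "ACGT"  # assumption: DNA sequences containing only these letters


def gibbs_motif_sampler(seqs, width, iterations=200, pseudocount=1.0, seed=0):
    """Return one sampled motif start position per sequence (illustrative sketch)."""
    rng = random.Random(seed)

    # Background base frequencies, estimated from all sequences with pseudocounts.
    total = sum(len(s) for s in seqs)
    background = {
        b: (sum(s.count(b) for s in seqs) + pseudocount) / (total + 4 * pseudocount)
        for b in ALPHABET
    }

    # Start from a random motif position in every sequence.
    starts = [rng.randrange(len(s) - width + 1) for s in seqs]

    for _ in range(iterations):
        for i, seq in enumerate(seqs):
            # Build position-specific base counts from every sequence except i.
            counts = [{b: pseudocount for b in ALPHABET} for _ in range(width)]
            for j, other in enumerate(seqs):
                if j == i:
                    continue
                for k in range(width):
                    counts[k][other[starts[j] + k]] += 1
            n = len(seqs) - 1 + 4 * pseudocount

            # Weight each candidate start in the held-out sequence by the
            # ratio of motif-model likelihood to background likelihood.
            weights = []
            for p in range(len(seq) - width + 1):
                w = 1.0
                for k in range(width):
                    b = seq[p + k]
                    w *= (counts[k][b] / n) / background[b]
                weights.append(w)

            # Sample a new start for sequence i in proportion to the weights.
            starts[i] = rng.choices(range(len(weights)), weights=weights)[0]

    return starts
```

Because each update samples a position rather than always taking the best-scoring one, the procedure can move away from a poor local alignment, which is the main appeal of the sampling strategy over a purely greedy search. A call such as `gibbs_motif_sampler(["ACGTACGT", "TTACGTAA", "GGGACGTC"], width=4)` returns a list with one start index per input sequence.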

Lawrence also collaborated with others to further develop Bayesian statistical approaches to RNA secondary structure prediction, which greatly facilitate predictions over the full ensemble of probable structures that an RNA molecule may adopt.

Lawrence researches the application of Bayesian algorithms, specifically statistical approaches to understanding biological problems, with particular interest in transcription regulation and the identification of regulatory motifs in sequences, antisense oligonucleotide and siRNA design, comparative genomics, the composition of nucleotide sequences, and detailed analyses of several protein families. [1]

Software and platforms

In recent years, several programs based on the statistical algorithms developed by Lawrence and his collaborators have been made publicly available and are widely used, such as the Gibbs Motif Sampler, [3] the Bayes aligner, Sfold, [4] BALSA, Gibbs Gaussian Clustering, and Bayesian Motif Clustering. His work in Bayesian statistics won the Mitchell Prize for an outstanding applied Bayesian statistics paper in 2000.

Chip Lawrence Lab

Lawrence directs the Chip Lawrence Lab at Brown University. The lab's work focuses on applications of high-dimensional (high-D) inference to biological problems such as regulatory motif finding, RNA secondary structure prediction, and genome-wide studies of epigenetics. His research interests have also expanded into geoscience, including change-point estimators for paleoclimate records and probabilistic alignment of geological stratigraphic sequences. Applications of stochastic grammar models are also studied in the lab. [5]

Teaching and traineeship

Lawrence has also devoted time to education.

He developed a tutorial on Bayesian statistics and Gibbs sampling, [1] as well as introductory courses in Bayesian statistics at Brown University.

Lawrence mentored several young investigators before joining Brown University. From 1981 to 2003, he served as Chief at the Wadsworth Center for Laboratories and Research, New York State Department of Health, where he trained many young bioinformaticians, such as Stephen Bryant.

Dr. Bryant is now a senior investigator at the National Center for Biotechnology Information (NCBI), National Library of Medicine, National Institutes of Health, working in the computational biology branch. His focus is structural bioinformatics. Dr. Bryant also leads NCBI information resource teams in protein structure, protein family classification, and cheminformatics. These teams maintain NCBI's macromolecular structure database and Cn3D visualization tool, the Conserved Domain Database and CDTree analysis tool, and most recently the PubChem cheminformatics database and associated analysis tools. [6]

Career summary

References

  1. "Applied Math at Brown University". Brown University. Retrieved 26 October 2011.
  2. Lawrence CE, Altschul SF, Boguski MS, Liu JS, Neuwald AF, Wootton JC (1993). "Detecting Subtle Sequence Signals: A Gibbs Sampling Strategy for Multiple Alignment". Science. 262 (5131): 208–214. Bibcode:1993Sci...262..208L. doi:10.1126/science.8211139. PMID 8211139.
  3. Gibbs Motif Sampler. Archived from the original on 4 November 2011. Retrieved 27 October 2011.
  4. Sfold. Archived from the original on 16 September 2011. Retrieved 27 October 2011.
  5. Chip Lawrence Lab. Brown University. Retrieved 27 October 2011.
  6. NCBI. Retrieved 27 October 2011.