Philip Palmer Green

Education: Berkeley
Known for: Developing important algorithms and procedures used in gene mapping and DNA sequencing
Awards: Gairdner Award
Scientific career
Fields: Theoretical and computational biology
Thesis: C*-algebra (1976)
Doctoral advisor: Marc Rieffel

Philip Palmer Green is a theoretical and computational biologist noted for developing important algorithms and procedures used in gene mapping and DNA sequencing. He earned his doctorate in mathematics from Berkeley in 1976 with a dissertation on C*-algebras, supervised by Marc Rieffel, but later moved from pure mathematics into applied work in biology and bioinformatics. His contributions include the development of Phred, [1] a widely used DNA trace analyzer, [2] [3] as well as work on genetic mapping techniques [4] and genetic analysis. [5] [6] Green was elected to the National Academy of Sciences in 2001 and won the Gairdner Award in 2002. [7]



Related Research Articles

Bioinformatics: computational analysis of large, complex sets of biological data

Bioinformatics is an interdisciplinary field that develops methods and software tools for understanding biological data, in particular when the data sets are large and complex. As an interdisciplinary field of science, bioinformatics combines biology, computer science, information engineering, mathematics and statistics to analyze and interpret the biological data. Bioinformatics has been used for in silico analyses of biological queries using mathematical and statistical techniques.

Genetics: science of genes, heredity, and variation in living organisms

Genetics is a branch of biology concerned with the study of genes, genetic variation, and heredity in organisms.

Genome: the entirety of an organism's hereditary information, encoded in its genomic DNA, passed from one generation to the next, and transcribed to produce various RNAs

In the fields of molecular biology and genetics, a genome is the genetic material of an organism. It consists of DNA. The genome includes both the genes and the noncoding DNA, as well as mitochondrial DNA and chloroplast DNA. The study of the genome is called genomics.

Polymerase chain reaction: laboratory technique to multiply a DNA sample for study

Polymerase chain reaction (PCR) is a method widely used in molecular biology to rapidly make millions to billions of copies of a specific DNA sample, allowing scientists to amplify a very small sample of DNA to an amount large enough to study in detail. PCR was invented in 1983 by Kary Mullis. It is fundamental to much of genetic testing, including the analysis of ancient DNA samples and the identification of infectious agents. In PCR, very small amounts of a DNA sequence are amplified exponentially through a series of cycles of temperature changes. PCR is now a common and often indispensable technique in medical and clinical laboratory research, with a broad variety of applications including biomedical research and criminal forensics.
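The exponential growth described above can be sketched with the textbook model N = N0 × (1 + E)^n, where N0 is the starting copy number, n the number of cycles and E an assumed per-cycle efficiency (1.0 meaning perfect doubling). This is a simplified illustration under that assumption: real reactions plateau as primers and nucleotides are used up, which the model ignores.

```python
def pcr_copies(initial_copies: float, cycles: int, efficiency: float = 1.0) -> float:
    """Expected copy number after `cycles` rounds of PCR.

    Assumes a constant per-cycle efficiency in [0, 1]; 1.0 means every
    template molecule is duplicated in every cycle. The late-cycle plateau
    of real reactions is not modeled.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles


# Ideal doubling of a single template molecule over 30 cycles.
print(pcr_copies(1, 30))        # 1073741824.0, i.e. about a billion copies
# The same run assuming a less-than-perfect 90% efficiency per cycle.
print(pcr_copies(1, 30, 0.9))
```

Thirty cycles of ideal doubling already reach the "billions of copies" range mentioned in the text, which is why a handful of template molecules is enough to start from.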

Human genome: complete set of nucleic acid sequences for humans

The human genome is a complete set of nucleic acid sequences for humans, encoded as DNA within the 23 chromosome pairs in cell nuclei and in a small DNA molecule found within individual mitochondria. These are usually treated separately as the nuclear genome and the mitochondrial genome. Human genomes include both protein-coding DNA genes and noncoding DNA. Haploid human genomes, which are contained in germ cells, consist of about three billion DNA base pairs, while diploid genomes have twice the DNA content. Although there are significant differences among the genomes of individual humans, these are considerably smaller than the differences between humans and their closest living relatives, the bonobos and chimpanzees.

Genomics: discipline in genetics

Genomics is an interdisciplinary field of biology focusing on the structure, function, evolution, mapping, and editing of genomes. A genome is an organism's complete set of DNA, including all of its genes. In contrast to genetics, which refers to the study of individual genes and their roles in inheritance, genomics aims at the collective characterization and quantification of all of an organism's genes, their interrelations and their influence on the organism. Genes may direct the production of proteins with the assistance of enzymes and messenger molecules. In turn, proteins make up body structures such as organs and tissues, control chemical reactions and carry signals between cells. Genomics also involves the sequencing and analysis of genomes through the use of high-throughput DNA sequencing and bioinformatics to assemble and analyze the function and structure of entire genomes. Advances in genomics have triggered a revolution in discovery-based research and systems biology, facilitating the understanding of even the most complex biological systems, such as the brain.

DNA sequencer

A DNA sequencer is a scientific instrument used to automate the DNA sequencing process. Given a sample of DNA, a DNA sequencer is used to determine the order of the four bases: G (guanine), C (cytosine), A (adenine) and T (thymine). This order is then reported as a text string, called a read. Some DNA sequencers can also be considered optical instruments, as they analyze light signals originating from fluorochromes attached to nucleotides.
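Because a read is just a text string over the four base letters, ordinary string handling is enough for simple checks. The sketch below counts each base and rejects any other character; it assumes reads contain only A, C, G and T, whereas real sequencer output may also use other symbols (for example for ambiguous calls), which this hypothetical `base_counts` helper would reject.

```python
from collections import Counter

VALID_BASES = "ACGT"


def base_counts(read: str) -> dict[str, int]:
    """Count each of the four bases in a read, rejecting any other character."""
    read = read.upper()
    invalid = set(read) - set(VALID_BASES)
    if invalid:
        raise ValueError(f"unexpected characters in read: {sorted(invalid)}")
    counts = Counter(read)
    # Report all four bases, including any that do not occur.
    return {base: counts.get(base, 0) for base in VALID_BASES}


print(base_counts("GATTACA"))  # {'A': 3, 'C': 1, 'G': 1, 'T': 2}
```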

Functional genomics: field of molecular biology

Functional genomics is a field of molecular biology that attempts to describe gene functions and interactions. Functional genomics makes use of the vast data generated by genomic and transcriptomic projects. It focuses on dynamic aspects such as gene transcription, translation, regulation of gene expression and protein–protein interactions, as opposed to static aspects of the genomic information such as DNA sequence or structures. A key characteristic of functional genomics studies is their genome-wide approach to these questions, generally involving high-throughput methods rather than a more traditional “gene-by-gene” approach.

DNA sequencing: process of determining the nucleic acid sequence, the order of nucleotides in DNA

DNA sequencing is the process of determining the nucleic acid sequence – the order of nucleotides in DNA. It includes any method or technology that is used to determine the order of the four bases: adenine, guanine, cytosine, and thymine. The advent of rapid DNA sequencing methods has greatly accelerated biological and medical research and discovery.

Sanger sequencing: method of DNA sequencing developed in 1977

Sanger sequencing is a method of DNA sequencing based on the selective incorporation of chain-terminating dideoxynucleotides by DNA polymerase during in vitro DNA replication. Developed by Frederick Sanger and colleagues in 1977, it was the most widely used sequencing method for approximately 40 years. It was first commercialized by Applied Biosystems in 1986. More recently, higher-volume Sanger sequencing has been replaced by "Next-Gen" sequencing methods, especially for large-scale, automated genome analyses. However, the Sanger method remains in wide use for smaller-scale projects and for validating Next-Gen results. It retains an advantage over short-read sequencing technologies in that it can produce reads longer than 500 nucleotides.
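The chain-termination idea can be illustrated with a toy model: extension of the newly synthesized strand (the complement of the template) stops wherever a dideoxynucleotide is incorporated, so the reaction yields fragments ending at every position, each labeled by its terminal base. Sorting fragments by length, as electrophoresis does, and reading the terminal bases recovers the sequence. This sketch assumes exactly one fragment per position and ignores the primer, dye chemistry and any read errors; the function names are hypothetical.

```python
import random


def sanger_fragments(synthesized: str) -> list[tuple[int, str]]:
    """Toy chain termination: one fragment ends at every position.

    Each fragment is (length, terminal base), where the terminal base is
    the dideoxynucleotide that stopped extension.
    """
    return [(i + 1, base) for i, base in enumerate(synthesized)]


def read_ladder(fragments: list[tuple[int, str]]) -> str:
    """Order fragments from shortest to longest and read each terminal base."""
    return "".join(base for _, base in sorted(fragments))


fragments = sanger_fragments("ATGCGTAC")
random.shuffle(fragments)        # fragments come out of the reaction unordered
print(read_ladder(fragments))    # ATGCGTAC
```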

Ancient DNA: method of archaeological study

Ancient DNA (aDNA) is DNA isolated from ancient specimens. Owing to degradation processes, ancient DNA is more degraded than contemporary genetic material. Even under the best preservation conditions, there is an upper limit of 0.4–1.5 million years for a sample to contain enough DNA for sequencing technologies. Genetic material has been recovered from paleontological, archaeological and historical skeletal material, mummified tissues, archival collections of non-frozen medical specimens, preserved plant remains, ice and permafrost cores, marine and lake sediments, and excavation dirt.

Phred quality score

A Phred quality score is a measure of the quality of the identification of the nucleobases generated by automated DNA sequencing. It was originally developed for Phred base calling to help automate DNA sequencing in the Human Genome Project. Phred quality scores are assigned to each nucleotide base call in automated sequencer traces. The FASTQ format encodes Phred scores as ASCII characters alongside the read sequences. Phred quality scores have become widely accepted for characterizing the quality of DNA sequences and can be used to compare the efficacy of different sequencing methods. Perhaps their most important use is the automatic determination of accurate, quality-based consensus sequences.
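A Phred score Q is defined from the estimated probability P that a base call is wrong by Q = −10 log10 P, so Q = 20 means a 1 in 100 chance of error and Q = 40 a 1 in 10,000 chance. The sketch below converts in both directions and decodes a FASTQ quality string. It assumes the common "Phred+33" encoding (ASCII code minus 33); some older FASTQ variants used a different offset, which is why it is a parameter here.

```python
import math


def phred_to_error(q: float) -> float:
    """Error probability implied by a Phred score, from Q = -10 * log10(P)."""
    return 10 ** (-q / 10)


def error_to_phred(p: float) -> float:
    """Phred score for a base-call error probability p (0 < p <= 1)."""
    return -10 * math.log10(p)


def decode_fastq_quality(qual: str, offset: int = 33) -> list[int]:
    """Decode a FASTQ quality string; offset 33 assumes Phred+33 encoding."""
    return [ord(ch) - offset for ch in qual]


scores = decode_fastq_quality("II5+")
print(scores)                                   # [40, 40, 20, 10]
# Summing per-base error probabilities gives the expected number of errors.
print(sum(phred_to_error(q) for q in scores))   # about 0.11
```

The expected-error sum is one simple way to turn per-base scores into a per-read quality figure; the low-quality final base dominates it.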

Bisulfite sequencing

Bisulfite sequencing (also known as bisulphite sequencing) is the use of bisulfite treatment of DNA before routine sequencing to determine the pattern of methylation. DNA methylation was the first epigenetic mark to be discovered and remains the most studied. In animals it predominantly involves the addition of a methyl group to the carbon-5 position of cytosine residues in the dinucleotide CpG, and it is implicated in the repression of transcriptional activity.
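Bisulfite treatment converts unmethylated cytosine to uracil, which is read as thymine after amplification, while methylated cytosine is protected and still reads as C. Comparing a converted read with the reference at every C therefore reveals the methylation pattern. This sketch assumes a single strand, complete conversion and a read already aligned to the reference without indels; `bisulfite_convert` and `call_methylation` are hypothetical names.

```python
def bisulfite_convert(reference: str, methylated: set[int]) -> str:
    """Simulate the sequence read after bisulfite treatment and PCR.

    Unmethylated cytosines read as T; cytosines whose 0-based positions are
    in `methylated` are protected and still read as C.
    """
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(reference)
    )


def call_methylation(reference: str, read: str) -> dict[int, bool]:
    """At each reference C, report True if the read kept C (methylated)
    and False if it shows T (unmethylated)."""
    calls = {}
    for i, (ref_base, read_base) in enumerate(zip(reference, read)):
        if ref_base == "C":
            calls[i] = read_base == "C"
    return calls


ref = "ACGTCGAC"
read = bisulfite_convert(ref, methylated={1})
print(read)                          # ACGTTGAT
print(call_methylation(ref, read))   # {1: True, 4: False, 7: False}
```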

Population genomics is the large-scale comparison of DNA sequences of populations. Population genomics is a neologism that is associated with population genetics. Population genomics studies genome-wide effects to improve our understanding of microevolution so that we may learn the phylogenetic history and demography of a population.

DNA sequencing theory is the broad body of work that attempts to lay analytical foundations for determining the order of specific nucleotides in a sequence of DNA, otherwise known as DNA sequencing. The practical aspects revolve around designing and optimizing sequencing projects, predicting project performance, troubleshooting experimental results, characterizing factors such as sequence bias and the effects of software processing algorithms, and comparing various sequencing methods to one another. In this sense, it could be considered a branch of systems engineering or operations research. The permanent archive of work is primarily mathematical, although numerical calculations are often conducted for particular problems too. DNA sequencing theory addresses physical processes related to sequencing DNA and should not be confused with theories of analyzing resultant DNA sequences, e.g. sequence alignment. Publications sometimes do not make a careful distinction, but the latter are primarily concerned with algorithmic issues. Sequencing theory is based on elements of mathematics, biology, and systems engineering, so it is highly interdisciplinary. The subject may be studied within the context of computational biology.
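A classic example of this kind of analysis is the Lander–Waterman model of shotgun sequencing: with N reads of length L drawn uniformly at random from a genome of length G, the mean coverage is c = NL/G, and the fraction of bases covered by no read is approximately e^(−c). The sketch below evaluates these two quantities; it assumes uniform, independent read placement and ignores overlap-detection thresholds and sequence bias, and the project sizes are hypothetical.

```python
import math


def expected_coverage(num_reads: int, read_length: int, genome_length: int) -> float:
    """Mean redundancy c = N * L / G."""
    return num_reads * read_length / genome_length


def uncovered_fraction(coverage: float) -> float:
    """Poisson approximation to the fraction of bases covered by no read."""
    return math.exp(-coverage)


genome = 100_000
c = expected_coverage(num_reads=1_000, read_length=500, genome_length=genome)
print(c)                                  # 5.0
print(genome * uncovered_fraction(c))     # roughly 674 bases expected to be missed
```

Calculations like this one are what let project planners predict how much sequencing is needed before any data are generated.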

The exome is composed of all of the exons within the genome: the sequences which, when transcribed, remain within the mature RNA after introns are removed by RNA splicing. This includes the untranslated regions of mRNA and the coding sequence. Exome sequencing has proven to be an efficient method for determining the genetic basis of more than two dozen Mendelian, or single-gene, disorders.

Phred base-calling is a computer program for identifying a base (nucleobase) sequence from fluorescence "trace" data generated by an automated DNA sequencer that uses electrophoresis and a four-color fluorescent dye method. When originally developed, Phred produced significantly fewer errors in the data sets examined than other methods, averaging 40–50% fewer errors. Phred quality scores have become widely accepted for characterizing the quality of DNA sequences and can be used to compare the efficacy of different sequencing methods.
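To make the idea of calling bases from a four-dye trace concrete, here is a deliberately naive toy, not Phred's algorithm (Phred models expected peak locations and assigns quality scores, which this does not attempt). It assumes a hypothetical trace format, a dict mapping each base to an equally long list of intensities, finds local maxima of the combined signal, and calls whichever dye channel is brightest at each peak.

```python
def call_bases(trace: dict[str, list[float]], min_height: float = 0.0) -> str:
    """Toy base caller over a four-channel fluorescence trace.

    `trace` maps "A", "C", "G" and "T" to equally long lists of intensities
    sampled along the electrophoresis run.
    """
    bases = "ACGT"
    n = len(trace["A"])
    # Combined signal: a peak marks a fragment passing the detector.
    total = [sum(trace[b][i] for b in bases) for i in range(n)]
    calls = []
    for i in range(1, n - 1):
        is_peak = total[i] > total[i - 1] and total[i] >= total[i + 1]
        if is_peak and total[i] > min_height:
            # Call the dye channel that dominates at this peak.
            calls.append(max(bases, key=lambda b: trace[b][i]))
    return "".join(calls)


trace = {
    "A": [0, 5, 0, 0, 0, 0, 0],
    "C": [0, 0, 0, 6, 0, 0, 0],
    "G": [0, 0, 0, 0, 0, 4, 0],
    "T": [0, 0, 0, 0, 0, 0, 0],
}
print(call_bases(trace))  # ACG
```

Real traces have overlapping, unevenly spaced peaks and background noise, which is exactly where a naive peak picker fails and where Phred's reported error reduction came from.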

Exome sequencing: sequencing of all the exons of a genome

Exome sequencing, also known as whole exome sequencing (WES), is a genomic technique for sequencing all of the protein-coding regions of genes in a genome. It consists of two steps: the first step is to select only the subset of DNA that encodes proteins. These regions are known as exons – humans have about 180,000 exons, constituting about 1% of the human genome, or approximately 30 million base pairs. The second step is to sequence the exonic DNA using any high-throughput DNA sequencing technology.
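Because exons from different transcripts of the same gene often overlap, the size of the exonic target is the length of the union of the exon intervals, not the sum of their lengths. The sketch below merges half-open [start, end) intervals and reports the covered fraction of a genome; the coordinates are hypothetical and the helper names are illustrative only.

```python
def merge_intervals(intervals: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Merge overlapping or touching half-open [start, end) intervals."""
    merged: list[tuple[int, int]] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def target_fraction(exons: list[tuple[int, int]], genome_length: int) -> float:
    """Fraction of the genome covered by the union of the exon intervals."""
    covered = sum(end - start for start, end in merge_intervals(exons))
    return covered / genome_length


exons = [(0, 100), (50, 150), (300, 400)]   # hypothetical coordinates
print(merge_intervals(exons))               # [(0, 150), (300, 400)]
print(target_fraction(exons, 1_000))        # 0.25
```

Applied to roughly 30 million exonic base pairs in a haploid genome of about three billion, the same ratio comes out near 0.01, matching the 1% figure in the paragraph.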

Ladeana Hillier is a biomedical engineer and computational biologist. She was one of the earliest scientists involved in the Human Genome Project and is noted for her work in various branches of DNA sequencing, as well as for having co-developed Phred, a widely used DNA trace analyzer.

Michael Christopher Wendl is a mathematician and biomedical engineer who has worked on DNA sequencing theory, covering and matching problems in probability, and theoretical fluid mechanics, and who co-wrote Phred. He was a scientist on the Human Genome Project and has done bioinformatics and biostatistics work in cancer. Wendl is of ethnic German heritage and is the son of the aerospace engineer Michael J. Wendl.

References

  1. Ewing, B., Hillier, L., Wendl, M.C., and Green, P. (1998) Base-calling of automated sequencer traces using phred. I. Accuracy assessment. Genome Res. 8(3), 175–185. PMID 9521921.
  2. Koboldt, D.C. and Miller, R.D. (2011) Identification of Polymorphic Markers for Genetic Mapping, chapter 2 in "Genomics: Essential Methods", John Wiley and Sons.
  3. Highsmith, W.E. (2006) Electrophoretic Methods for Mutation Detection and DNA Sequencing, chapter 9 in "Molecular Diagnostics for the Clinical Laboratorian", Humana Press.
  4. Lander, E.S. and Green, P. (1987) Construction of multilocus genetic-linkage maps in humans. PNAS 84(8), 2363–2367.
  5. Ewing, B. and Green, P. (2000) Analysis of expressed sequence tags indicates 35,000 human genes. Nature Genetics 25(2), 232–234.
  6. Green, P. et al. (1993) Ancient conserved regions in new gene-sequences and the protein databases. Science 259(5102), 1711–1716.
  7. National Academy of Sciences (2004) Biography of Phil Green. PNAS 101(39), 13991–13993.