Jean Yang

Last updated

Jean Yee Hwa Yang is an Australian statistician known for her work on variance reduction for microarrays, [1] and for inferring proteins from mass spectrometry data. [2] Yang is a Professor in the School of Mathematics and Statistics at the University of Sydney. [3]

Contents

Education and career

Yang studied at Killara High School between 1987 and 1990 and at Barker College between 1991 and 1992. She earned a bachelor's degree with first class honours and a University Medal from the University of Sydney in 1996, in mathematics and statistics.

After working for half a year at the Commonwealth Scientific and Industrial Research Organisation, [4] Yang then went to the University of California, Berkeley, for graduate study, completing her Ph.D. in statistics in 2002. Her dissertation, "Statistical methods in the design and analysis of gene expression data from cDNA microarray experiment", was supervised by Terry Speed. [4] [5]

Yang did postdoctoral research in biostatistics and bioinformatics with Mark R. Segal at the University of California, San Francisco, where she became an assistant professor in 2003. In 2005 she returned to the University of Sydney with a faculty position. [4]

Recognition

In 2015, Yang was the winner of the Moran Medal of the Australian Academy of Science for her "significant contributions to the development of statistical methodology for analyzing molecular data arising in contemporary biomedical research". [1]

Selected publications

Related Research Articles

Biostatistics is a branch of statistics that applies statistical methods to a wide range of topics in biology. It encompasses the design of biological experiments, the collection and analysis of data from those experiments and the interpretation of the results.

<span class="mw-page-title-main">Bioinformatics</span> Computational analysis of large, complex sets of biological data

Bioinformatics is an interdisciplinary field of science that develops methods and software tools for understanding biological data, especially when the data sets are large and complex. Bioinformatics uses biology, chemistry, physics, computer science, computer programming, information engineering, mathematics and statistics to analyze and interpret biological data. The subsequent process of analyzing and interpreting data is referred to as computational biology.

<span class="mw-page-title-main">Computational biology</span> Branch of biology

Computational biology refers to the use of data analysis, mathematical modeling and computational simulations to understand biological systems and relationships. An intersection of computer science, biology, and big data, the field also has foundations in applied mathematics, chemistry, and genetics. It differs from biological computing, a subfield of computer science and engineering which uses bioengineering to build computers.

<span class="mw-page-title-main">DNA microarray</span> Collection of microscopic DNA spots attached to a solid surface

A DNA microarray is a collection of microscopic DNA spots attached to a solid surface. Scientists use DNA microarrays to measure the expression levels of large numbers of genes simultaneously or to genotype multiple regions of a genome. Each DNA spot contains picomoles of a specific DNA sequence, known as probes. These can be a short section of a gene or other DNA element that are used to hybridize a cDNA or cRNA sample under high-stringency conditions. Probe-target hybridization is usually detected and quantified by detection of fluorophore-, silver-, or chemiluminescence-labeled targets to determine relative abundance of nucleic acid sequences in the target. The original nucleic acid arrays were macro arrays approximately 9 cm × 12 cm and the first computerized image based analysis was published in 1981. It was invented by Patrick O. Brown. An example of its application is in SNPs arrays for polymorphisms in cardiovascular diseases, cancer, pathogens and GWAS analysis. It is also used for the identification of structural variations and the measurement of gene expression.

Bioconductor is a free, open source and open development software project for the analysis and comprehension of genomic data generated by wet lab experiments in molecular biology.

Computational genomics refers to the use of computational and statistical analysis to decipher biology from genome sequences and related data, including both DNA and RNA sequence as well as other "post-genomic" data. These, in combination with computational and statistical approaches to understanding the function of the genes and statistical association analysis, this field is also often referred to as Computational and Statistical Genetics/genomics. As such, computational genomics may be regarded as a subset of bioinformatics and computational biology, but with a focus on using whole genomes to understand the principles of how the DNA of a species controls its biology at the molecular level and beyond. With the current abundance of massive biological datasets, computational studies have become one of the most important means to biological discovery.

<span class="mw-page-title-main">Gene expression profiling</span>

In the field of molecular biology, gene expression profiling is the measurement of the activity of thousands of genes at once, to create a global picture of cellular function. These profiles can, for example, distinguish between cells that are actively dividing, or show how the cells react to a particular treatment. Many experiments of this sort measure an entire genome simultaneously, that is, every gene present in a particular cell.

<span class="mw-page-title-main">Microarray analysis techniques</span>

Microarray analysis techniques are used in interpreting the data generated from experiments on DNA, RNA, and protein microarrays, which allow researchers to investigate the expression state of a large number of genes – in many cases, an organism's entire genome – in a single experiment. Such experiments can generate very large amounts of data, allowing researchers to assess the overall state of a cell or organism. Data in such large quantities is difficult – if not impossible – to analyze without the help of computer programs.

<span class="mw-page-title-main">ChIP-on-chip</span> Molecular biology method

ChIP-on-chip is a technology that combines chromatin immunoprecipitation ('ChIP') with DNA microarray ("chip"). Like regular ChIP, ChIP-on-chip is used to investigate interactions between proteins and DNA in vivo. Specifically, it allows the identification of the cistrome, the sum of binding sites, for DNA-binding proteins on a genome-wide basis. Whole-genome analysis can be performed to determine the locations of binding sites for almost any protein of interest. As the name of the technique suggests, such proteins are generally those operating in the context of chromatin. The most prominent representatives of this class are transcription factors, replication-related proteins, like origin recognition complex protein (ORC), histones, their variants, and histone modifications.

Within computational biology, an MA plot is an application of a Bland–Altman plot for visual representation of genomic data. The plot visualizes the differences between measurements taken in two samples, by transforming the data onto M and A scales, then plotting these values. Though originally applied in the context of two channel DNA microarray gene expression data, MA plots are also used to visualise high-throughput sequencing analysis.

<span class="mw-page-title-main">Terry Speed</span> Australian statistician at the Walter and Eliza Hall Institute of Medical Research

Terence Paul "Terry" Speed, FAA FRS is an Australian statistician. A senior principal research scientist at the Walter and Eliza Hall Institute of Medical Research, he is known for his contributions to the analysis of variance and bioinformatics, and in particular to the analysis of microarray data.

Robert Clifford Gentleman is a Canadian statistician and bioinformatician who is currently the founding executive director of the Center for Computational Biomedicine at Harvard Medical School. He was previously the vice president of computational biology at 23andMe. Gentleman is recognized, along with Ross Ihaka, as one of the originators of the R programming language and the Bioconductor project.

<span class="mw-page-title-main">Alicia Oshlack</span> Australian bioinformatician

Alicia Yinema Kate Nungarai Oshlack is an Australian bioinformatician and is Co-Head of Computational Biology at the Peter MacCallum Cancer Centre in Melbourne, Victoria, Australia. She is best known for her work developing methods for the analysis of transcriptome data as a measure of gene expression. She has characterized the role of gene expression in human evolution by comparisons of humans, chimpanzees, orangutans, and rhesus macaques, and works collaboratively in data analysis to improve the use of clinical sequencing of RNA samples by RNAseq for human disease diagnosis.

<span class="mw-page-title-main">Pathway analysis</span>

Pathway is the term from molecular biology for a curated schematic representation of a well characterized segment of the molecular physiological machinery, such as a metabolic pathway describing an enzymatic process within a cell or tissue or a signaling pathway model representing a regulatory process that might, in its turn, enable a metabolic or another regulatory process downstream. A typical pathway model starts with an extracellular signaling molecule that activates a specific receptor, thus triggering a chain of molecular interactions. A pathway is most often represented as a relatively small graph with gene, protein, and/or small molecule nodes connected by edges of known functional relations. While a simpler pathway might appear as a chain, complex pathway topologies with loops and alternative routes are much more common. Computational analyses employ special formats of pathway representation. In the simplest form, however, a pathway might be represented as a list of member molecules with order and relations unspecified. Such a representation, generally called Functional Gene Set (FGS), can also refer to other functionally characterised groups such as protein families, Gene Ontology (GO) and Disease Ontology (DO) terms etc. In bioinformatics, methods of pathway analysis might be used to identify key genes/ proteins within a previously known pathway in relation to a particular experiment / pathological condition or building a pathway de novo from proteins that have been identified as key affected elements. By examining changes in e.g. gene expression in a pathway, its biological activity can be explored. However most frequently, pathway analysis refers to a method of initial characterization and interpretation of an experimental condition that was studied with omics tools or genome-wide association study. Such studies might identify long lists of altered genes. A visual inspection is then challenging and the information is hard to summarize, since the altered genes map to a broad range of pathways, processes, and molecular functions. In such situations, the most productive way of exploring the list is to identify enrichment of specific FGSs in it. The general approach of enrichment analyses is to identify FGSs, members of which were most frequently or most strongly altered in the given condition, in comparison to a gene set sampled by chance. In other words, enrichment can map canonical prior knowledge structured in the form of FGSs to the condition represented by altered genes.

TopHat is an open-source bioinformatics tool for the throughput alignment of shotgun cDNA sequencing reads generated by transcriptomics technologies using Bowtie first and then mapping to a reference genome to discover RNA splice sites de novo. TopHat aligns RNA-Seq reads to mammalian-sized genomes.

Machine learning in bioinformatics is the application of machine learning algorithms to bioinformatics, including genomics, proteomics, microarrays, systems biology, evolution, and text mining.

Transcriptomics technologies are the techniques used to study an organism's transcriptome, the sum of all of its RNA transcripts. The information content of an organism is recorded in the DNA of its genome and expressed through transcription. Here, mRNA serves as a transient intermediary molecule in the information network, whilst non-coding RNAs perform additional diverse functions. A transcriptome captures a snapshot in time of the total transcripts present in a cell. Transcriptomics technologies provide a broad account of which cellular processes are active and which are dormant. A major challenge in molecular biology is to understand how a single genome gives rise to a variety of cells. Another is how gene expression is regulated.

<span class="mw-page-title-main">Rafael Irizarry (scientist)</span> American professor of biostatistics

Rafael Irizarry is a professor of biostatistics at the Harvard T.H. Chan School of Public Health and professor of biostatistics and computational biology at the Dana–Farber Cancer Institute. Irizarry is known as one of the founders of the Bioconductor project.

Sandrine Dudoit is a professor of statistics and public health at the University of California, Berkeley. Her research applies statistics to microarray and genetic data; she is known as one of the founders of the open-source Bioconductor project for the development of bioinformatics software.

Sündüz Keleş is a Turkish statistician specializing in statistical methods in genomics. She is a professor of statistics and of biostatistics and medical informatics at the University of Wisconsin–Madison. Her research has included the development of the FreeHi-C system for generating synthetic Hi-C data.

References

  1. 1 2 "2015 Moran Medal for research in statistics", 2015 awardees, Australian Academy of Science, retrieved 2018-02-27
  2. Mathematicians solving today's problems of systems biology, Garvan Institute of Medical Research, 21 March 2012
  3. Professor Jean Yang, School of Mathematics and Statistics, University of Sydney, retrieved 2018-02-27
  4. 1 2 3 Curriculum vitae (PDF), 2013, retrieved 2018-02-27
  5. Jean Yang at the Mathematics Genealogy Project