Mark B. Gerstein

Mark Gerstein
Born: Mark Bender Gerstein, February 23
Citizenship: US
Fields: Bioinformatics [3]
Thesis: Protein recognition: surfaces and conformational change (1993)
Other academic advisors: Michael Levitt (postdoc)
Doctoral students: Werner Krebs [6] [7]

Mark Bender Gerstein is an American scientist working in bioinformatics and data science. As of 2009, he is co-director of the Yale Computational Biology and Bioinformatics program.

Mark Gerstein is Albert L. Williams Professor of Biomedical Informatics, Professor of Molecular Biophysics & Biochemistry, Professor of Statistics & Data Science, and Professor of Computer Science at Yale University. [8] In 2018, Gerstein was named co-director of the Yale Center for Biomedical Data Science. [9]

Education

After graduating from Harvard College summa cum laude with a Bachelor of Arts in physics in 1989, Gerstein completed a PhD on conformational change in proteins, co-supervised by Ruth Lynden-Bell [5] at the University of Cambridge and Cyrus Chothia at the Laboratory of Molecular Biology, graduating in 1993. [10] He then carried out postdoctoral research in bioinformatics at Stanford University from 1993 to 1996, supervised by Nobel laureate Michael Levitt.

Research

Gerstein does research in the field of bioinformatics. [3] [11] [12] This involves applying a range of computational approaches to problems in molecular biology, including data mining and machine learning, molecular simulation, and database design. His research group has a number of foci, including annotating the human genome, [13] personal genomics, cancer genomics, building tools in support of genome technologies (such as next-generation sequencing), analyzing molecular networks, and simulating macromolecular motions. Notable databases and tools that the group has developed include the Database of Macromolecular Motions, [6] [7] which categorizes macromolecular conformational change; tYNA, [14] which helps analyze molecular networks; PubNet, [15] which analyzes publication networks; PeakSeq, [16] which identifies regions in the genome bound by particular transcription factors; and CNVnator, [17] which categorizes block variants in the genome. Gerstein has also written extensively on how general issues in data science affect genomics, in particular in relation to privacy [18] and to structuring scientific communication. [19]

Gerstein's work has been published in peer-reviewed scientific journals [20] [21] [22] and in non-scientific publications in more popular forums. [23] His work has been highly cited, with an h-index greater than 100. [3] He serves on a number of editorial and advisory boards, including those of PLoS Computational Biology, Genome Research, Genome Biology, and Molecular Systems Biology. He has been quoted in the New York Times, [24] [25] [26] including on the front page, [13] and in other major newspapers. [27]

Awards and honors

In addition to a W. M. Keck Foundation Distinguished Young Scholars award, [28] Gerstein has received awards from the US Navy, IBM, Pharmaceutical Research and Manufacturers of America, and the Donaghue Foundation. [29] He is a Fellow of the AAAS. [1] Other awards include a Herchel Smith Scholarship supporting his doctoral work at Emmanuel College, Cambridge, and a Damon Runyon Cancer Research Foundation Postdoctoral Fellowship. He is a contributor to a number of scientific consortia, including ENCODE, [30] modENCODE, [31] [32] [33] the 1000 Genomes Project, Brainspan, [34] and DOE Kbase. [citation needed] He was made a Fellow of the International Society for Computational Biology in 2015. [2]

References

  1. "Yale Scientists Awarded AAAS Fellowship".
  2. "Meet the ISCB Fellows Class of 2015". International Society for Computational Biology. Archived from the original on 2015-02-20.
  3. Mark B. Gerstein publications indexed by Google Scholar
  4. Gerstein, M.; Chothia, C. (1991). "Analysis of protein loop closure. Two types of hinges produce one motion in lactate dehydrogenase". Journal of Molecular Biology. 220 (1): 133–149. doi:10.1016/0022-2836(91)90387-L. PMID 2067013.
  5. Mark B. Gerstein at the Mathematics Genealogy Project
  6. Krebs, Werner G. (2002). The database of macromolecular motions: a standardized system for analyzing and visualizing macromolecular motions in a database framework (PhD thesis). Yale University. OCLC 54626123.
  7. Gerstein, M; Krebs, W (1998). "A database of macromolecular motions". Nucleic Acids Research. 26 (18): 4280–90. doi:10.1093/nar/26.18.4280. PMC 147832. PMID 9722650.
  8. Mark B. Gerstein's publications indexed by the Scopus bibliographic database. (subscription required)
  9. Xiong, Amy (February 9, 2018). "Yale establishes biomedical data science center". yaledailynews.com. Retrieved 2020-09-27.
  10. Gerstein, Mark (1993). Protein recognition: surfaces and conformational change (PhD thesis). University of Cambridge.
  11. Durbin, R. M.; Abecasis, G. R.; Altshuler, R. M.; Auton, G. A. R.; Brooks, D. R.; Durbin, A.; Gibbs, A. G.; Hurles, F. S.; McVean, F. M.; Donnelly, P.; Egholm, M.; Flicek, P.; Gabriel, S. B.; Gibbs, R. A.; Knoppers, B. M.; Lander, E. S.; Lehrach, H.; Mardis, E. R.; McVean, G. A.; Nickerson, D. A.; Peltonen, L.; Schafer, A. J.; Sherry, S. T.; Wang, J.; Wilson, R. K.; Gibbs, R. A.; Deiros, D.; Metzker, M.; Muzny, D.; et al. (2010). "A map of human genome variation from population-scale sequencing". Nature. 467 (7319): 1061–1073. Bibcode:2010Natur.467.1061T. doi:10.1038/nature09534. PMC 3042601. PMID 20981092.
  12. Wang, Z.; Gerstein, M.; Snyder, M. (2009). "RNA-Seq: A revolutionary tool for transcriptomics". Nature Reviews Genetics. 10 (1): 57–63. doi:10.1038/nrg2484. PMC 2949280. PMID 19015660.
  13. Kolata, Gina (2012-09-05). "Bits of Mystery DNA, Far From Junk, Play Crucial Role". The New York Times.
  14. Yip, K. Y.; Yu, H; Kim, P. M.; Schultz, M; Gerstein, M (2006). "The tYNA platform for comparative interactomics: A web tool for managing, comparing and mining multiple networks". Bioinformatics. 22 (23): 2968–70. doi:10.1093/bioinformatics/btl488. PMID 17021160.
  15. Douglas, S. M.; Montelione, G. T.; Gerstein, M. (2005). "PubNet: A flexible system for visualizing literature derived networks". Genome Biology. 6 (9): R80. doi:10.1186/gb-2005-6-9-r80. PMC 1242215. PMID 16168087.
  16. Rozowsky, J; Euskirchen, G; Auerbach, R. K.; Zhang, Z. D.; Gibson, T; Bjornson, R; Carriero, N; Snyder, M; Gerstein, M. B. (2009). "PeakSeq enables systematic scoring of ChIP-seq experiments relative to controls". Nature Biotechnology. 27 (1): 66–75. doi:10.1038/nbt.1518. PMC 2924752. PMID 19122651.
  17. Abyzov, A; Urban, A. E.; Snyder, M; Gerstein, M (2011). "CNVnator: An approach to discover, genotype, and characterize typical and atypical CNVs from family and population genome sequencing". Genome Research. 21 (6): 974–84. doi:10.1101/gr.114876.110. PMC 3106330. PMID 21324876.
  18. Greenbaum, D; Sboner, A; Mu, X. J.; Gerstein, M (2011). "Genomics and privacy: Implications of the new reality of closed data for the field". PLOS Computational Biology. 7 (12): e1002278. Bibcode:2011PLSCB...7E2278G. doi:10.1371/journal.pcbi.1002278. PMC 3228779. PMID 22144881.
  19. Gerstein, M; Seringhaus, M; Fields, S (2007). "Structured digital abstract makes text mining easy". Nature. 447 (7141): 142. Bibcode:2007Natur.447..142G. doi:10.1038/447142a. PMID 17495904.
  20. Mark Gerstein at DBLP Bibliography Server
  21. Mark B. Gerstein publications indexed by Microsoft Academic
  22. Giaever, G.; Chu, A. M.; Ni, L.; Connelly, C.; Riles, L.; Véronneau, S.; Dow, S.; Lucau-Danila, A.; Anderson, K.; André, B.; Arkin, A. P.; Astromoff, A.; El-Bakkoury, M.; Bangham, R.; Benito, R.; Brachat, S.; Campanaro, S.; Curtiss, M.; Davis, K.; Deutschbauer, A.; Entian, K. D.; Flaherty, P.; Foury, F.; Garfinkel, D. J.; Gerstein, M.; Gotte, D.; Güldener, U.; Hegemann, J. H.; Hempel, S.; Herman, Z. (2002). "Functional profiling of the Saccharomyces cerevisiae genome". Nature. 418 (6896): 387–391. Bibcode:2002Natur.418..387G. doi:10.1038/nature00935. PMID 12140549. S2CID 4400400.
  23. "List of Non-technical Writing by Mark Gerstein". gersteinlab.org. Archived from the original on 2013-10-17.
  24. Kolata, Gina (2013-06-16). "Poking Holes in Genetic Privacy". The New York Times. ISSN 0362-4331. Retrieved 2016-01-18.
  25. Zimmer, Carl (2014-09-01). "Tiny, Vast Windows Into Human DNA". The New York Times. ISSN 0362-4331. Retrieved 2016-01-18.
  26. "Thoughts on Genes". The New York Times. 2008-11-10. ISSN 0362-4331. Retrieved 2016-01-18.
  27. "Scientists Unveil New Blueprint Of How The Human Genome Works". courant.com. Retrieved 2016-01-18.
  28. Mervis, Jeffrey (1999-07-16). "Keck Helps Five Careers With $1 Million Grants". Science. 285 (5426): 312–3. doi:10.1126/science.285.5426.312b. PMID 10438290. S2CID 33084600.
  29. "Donaghue Foundation selects five investigators for long-term support". medicine.yale.edu. Retrieved 2020-09-27.
  30. ENCODE Project Consortium; Birney E; Stamatoyannopoulos JA; Dutta A; Guigó R; Gingeras TR; Margulies EH; Weng Z; Snyder M; Dermitzakis ET; et al. (2007). "Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project". Nature. 447 (7146): 799–816. Bibcode:2007Natur.447..799B. doi:10.1038/nature05874. PMC 2212820. PMID 17571346.
  31. Landt, S. G.; Marinov, G. K.; Kundaje, A.; Kheradpour, P.; Pauli, F.; Batzoglou, S.; Bernstein, B. E.; Bickel, P.; Brown, J. B.; Cayting, P.; Chen, Y.; Desalvo, G.; Epstein, C.; Fisher-Aylor, K. I.; Euskirchen, G.; Gerstein, M.; Gertz, J.; Hartemink, A. J.; Hoffman, M. M.; Iyer, V. R.; Jung, Y. L.; Karmakar, S.; Kellis, M.; Kharchenko, P. V.; Li, Q.; Liu, T.; Liu, X. S.; Ma, L.; Milosavljevic, A.; Myers, R. M. (2012). "ChIP-seq guidelines and practices of the ENCODE and modENCODE consortia". Genome Research. 22 (9): 1813–1831. doi:10.1101/gr.136184.111. PMC 3431496. PMID 22955991.
  32. Cheng, C.; Yan, K. K.; Yip, K. Y.; Rozowsky, J.; Alexander, R.; Shou, C.; Gerstein, M. (2011). "A statistical framework for modeling gene expression using chromatin features and application to modENCODE datasets". Genome Biology. 12 (2): R15. doi:10.1186/gb-2011-12-2-r15. PMC 3188797. PMID 21324173.
  33. Gerstein MB, Lu ZJ, Van Nostrand EL, Cheng C, Arshinoff BI, Liu T, Yip KY, Robilotto R, Rechtsteiner A, et al. (2010). "Integrative Analysis of the Caenorhabditis elegans Genome by the modENCODE Project". Science. 330 (6012): 1775–1787. Bibcode:2010Sci...330.1775G. doi:10.1126/science.1196914. PMC 3142569. PMID 21177976.
  34. Li, Mingfeng; Santpere, Gabriel; Kawasawa, Yuka Imamura; Evgrafov, Oleg V.; Gulden, Forrest O.; Pochareddy, Sirisha; Sunkin, Susan M.; Li, Zhen; Shin, Yurae; Zhu, Ying; Sousa, André M. M. (2018-12-14). "Integrative functional genomic analysis of human brain development and neuropsychiatric risks". Science. 362 (6420): eaat7615. Bibcode:2018Sci...362.7615L. doi:10.1126/science.aat7615. ISSN 0036-8075. PMC 6413317. PMID 30545854.