Donna R. Maglott

Alma mater: University of Michigan
Scientific career
Institutions: National Center for Biotechnology Information
Thesis: The structure and function of the 50S ribosome of Escherichia coli (1970)

Donna R. Maglott is a staff scientist at the National Center for Biotechnology Information. She is known for her work on large-scale genomics projects, including the mouse genome, and for developing databases used in genomics research.

Education and career

Maglott earned her Ph.D. in 1970 from the University of Michigan, where she worked on the 50S ribosome of the bacterium Escherichia coli.[1] She held an academic position at Howard University[when?] and then moved to the American Type Culture Collection (ATCC) in 1986, where she began building databases needed for genomic research.[2][3] She joined the National Center for Biotechnology Information (NCBI) in 1998.[4]

Research

While at Howard University, Maglott studied protein synthesis during the early development of sea urchins.[5][6] At ATCC, she worked on repositories holding clone and genomic information[7][8] and began using genomic tools to investigate human chromosomes.[9][10] In 2000, Maglott worked with Kim D. Pruitt to introduce RefSeq, a web-based resource for gene-based information hosted by NCBI,[11][12] which has been updated over the years.[13][14] She has also been involved in developing other NCBI databases, including Entrez Gene,[15][16] ClinVar,[17][18] STS markers, Consensus Coding Sequence (CCDS), Map Viewer, RefSeqGene, the NIH Genetic Testing Registry (GTR), and MedGen.[4] Large-scale genomics projects Maglott has worked on include the Rat Genome Database[19] and the mouse genome[20][21] and transcriptome.[22] In 2006, Maglott was part of the team that analyzed the genome of the sea urchin Strongylocentrotus purpuratus, the first genome obtained for a motile marine invertebrate.[23][24]

References

  1. Maglott, Donna Rae Schneider (1970). The structure and function of the 50S ribosome of Escherichia coli (Thesis). Ann Arbor, Mich. OCLC 633231185.
  2. "Center for Bioinformatics and Computational Biology". www.cbcb.umd.edu. 2006. Archived from the original on 2007-08-17. Retrieved 2021-10-02.
  3. "Donna Maglott, PhD - ClinGen | Clinical Genome Resource". 2017-07-10. Archived from the original on 2017-07-10. Retrieved 2021-10-02.
  4. "Human Variome Project". www.humanvariomeproject.org. Retrieved 2016-12-18.
  5. Maglott, D. R. (1985). "Dissociation of cells from sea urchin embryos alters the synthesis of actins and other proteins". Cell Differentiation. 17 (1): 29–43. doi:10.1016/0045-6039(85)90535-4. PMID 3875415.
  6. Maglott, D. R. (1985). "Two-dimensional electrophoretic analysis of major phosphoproteins of the sea urchin, Arbacia punctulata". Comparative Biochemistry and Physiology Part B: Comparative Biochemistry. 80 (3): 513–516. doi:10.1016/0305-0491(85)90282-2. PMID 4006443.
  7. Maglott, Donna R.; Nierman, William C. (1990-11-01). "Clone and genomic repositories at the American Type Culture Collection". Genomics. 8 (3): 601–605. doi:10.1016/0888-7543(90)90054-X. ISSN 0888-7543. PMID 1981058.
  8. Maglott, Donna R.; Nierman, William C. (1991). "Mammalian probes and libraries at the ATCC". Mammalian Genome. 1 (1): 59–64. doi:10.1007/BF00350848. ISSN 0938-8990. PMID 1794047. S2CID 12336955.
  9. Durkin, A. Scott; Maglott, Donna R.; Nierman, William C. (1992-11-01). "Chromosomal assignment of 38 human brain expressed sequence tags (ESTs) by analyzing fluorescently labeled PCR products from hybrid cell panels". Genomics. 14 (3): 808–810. doi:10.1016/S0888-7543(05)80194-6. ISSN 0888-7543. PMID 1427913.
  10. Schmidt, Valentina A.; Nierman, William C.; Maglott, Donna R.; Cupit, Lisa D.; Moskowitz, Keith A.; Wainer, Jean Ann; Bahou, Wadie F. (1998). "The Human Proteinase-activated Receptor-3 (PAR-3) Gene". Journal of Biological Chemistry. 273 (24): 15061–15068. doi:10.1074/jbc.273.24.15061. ISSN 0021-9258. PMID 9614115.
  11. Maglott, D. R. (2000-01-01). "NCBI's LocusLink and RefSeq". Nucleic Acids Research. 28 (1): 126–128. doi:10.1093/nar/28.1.126. PMC 102393. PMID 10592200.
  12. Pruitt, Kim D.; Katz, Kenneth S.; Sicotte, Hugues; Maglott, Donna R. (2000). "Introducing RefSeq and LocusLink: curated human genome resources at the NCBI". Trends in Genetics. 16 (1): 44–47. doi:10.1016/S0168-9525(99)01882-X. PMID 10637631.
  13. Pruitt, K. D. (2001-01-01). "RefSeq and LocusLink: NCBI gene-centered resources". Nucleic Acids Research. 29 (1): 137–140. doi:10.1093/nar/29.1.137. PMC 29787. PMID 11125071.
  14. Pruitt, K. D.; Tatusova, T.; Maglott, D. R. (2007-01-03). "NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins". Nucleic Acids Research. 35 (Database): D61–D65. doi:10.1093/nar/gkl842. ISSN 0305-1048. PMC 1716718. PMID 17130148.
  15. Maglott, D. (2004-12-17). "Entrez Gene: gene-centered information at NCBI". Nucleic Acids Research. 33 (Database issue): D54–D58. doi:10.1093/nar/gki031. ISSN 1362-4962. PMC 539985. PMID 15608257.
  16. Maglott, D.; Ostell, J.; Pruitt, K. D.; Tatusova, T. (2011-01-01). "Entrez Gene: gene-centered information at NCBI". Nucleic Acids Research. 39 (Database): D52–D57. doi:10.1093/nar/gkq1237. ISSN 0305-1048. PMC 3013746. PMID 21115458.
  17. Landrum, Melissa J.; Lee, Jennifer M.; Benson, Mark; Brown, Garth R.; Chao, Chen; Chitipiralla, Shanmuga; Gu, Baoshan; Hart, Jennifer; Hoffman, Douglas; Jang, Wonhee; Karapetyan, Karen (2018-01-04). "ClinVar: improving access to variant interpretations and supporting evidence". Nucleic Acids Research. 46 (D1): D1062–D1067. doi:10.1093/nar/gkx1153. ISSN 0305-1048. PMC 5753237. PMID 29165669.
  18. Landrum, Melissa J.; Lee, Jennifer M.; Riley, George R.; Jang, Wonhee; Rubinstein, Wendy S.; Church, Deanna M.; Maglott, Donna R. (2014). "ClinVar: public archive of relationships among sequence variation and human phenotype". Nucleic Acids Research. 42 (D1): D980–D985. doi:10.1093/nar/gkt1113. ISSN 0305-1048. PMC 3965032. PMID 24234437.
  19. Twigger, S. (2002-01-01). "Rat Genome Database (RGD): mapping disease onto the genome". Nucleic Acids Research. 30 (1): 125–128. doi:10.1093/nar/30.1.125. PMC 99132. PMID 11752273.
  20. Mouse Genome Sequencing Consortium (2002). "Initial sequencing and comparative analysis of the mouse genome". Nature. 420 (6915): 520–562. Bibcode:2002Natur.420..520W. doi:10.1038/nature01262. ISSN 0028-0836. PMID 12466850.
  21. Hudson, Thomas J.; Church, Deanna M.; Greenaway, Simon; Nguyen, Huy; Cook, April; Steen, Robert G.; Van Etten, William J.; Castle, Andrew B.; Strivens, Mark A.; Trickett, Pamela; Heuston, Christine (2001). "A radiation hybrid map of mouse genes". Nature Genetics. 29 (2): 201–205. doi:10.1038/ng1001-201. ISSN 1061-4036. PMID 11586302. S2CID 27643522.
  22. The FANTOM Consortium and the RIKEN Genome Exploration Research Group Phase I & II Team (2002). "Analysis of the mouse transcriptome based on functional annotation of 60,770 full-length cDNAs". Nature. 420 (6915): 563–573. Bibcode:2002Natur.420..563O. doi:10.1038/nature01266. ISSN 0028-0836. PMID 12466851. S2CID 4347839.
  23. Sea Urchin Genome Sequencing Consortium; Sodergren, E.; Weinstock, G. M.; Davidson, E. H.; Cameron, R. A.; Gibbs, R. A.; Angerer, R. C.; Angerer, L. M.; Arnone, M. I.; Burgess, D. R.; Burke, R. D. (2006-11-10). "The Genome of the Sea Urchin Strongylocentrotus purpuratus". Science. 314 (5801): 941–952. Bibcode:2006Sci...314..941S. doi:10.1126/science.1133609. ISSN 0036-8075. PMC 3159423. PMID 17095691.
  24. Livingston, B. T.; Killian, C. E.; Wilt, F.; Cameron, A.; Landrum, M. J.; Ermolaeva, O.; Sapojnikov, V.; Maglott, D. R.; Buchanan, A. M.; Ettensohn, C. A. (2006). "A genome-wide analysis of biomineralization-related proteins in the sea urchin Strongylocentrotus purpuratus". Developmental Biology. 300 (1): 335–348. doi:10.1016/j.ydbio.2006.07.047. PMID 16987510.