UC Santa Cruz Genomics Institute

Formation 2014
Location
Research Type
Basic (non-clinical)
Fields of Research
Genomics, Bioinformatics, Computational Biology, Big Data Analytics, Data Sharing, Paleogenomics, Nanopore Sequencing, Cancer and Disease Research
Director
David Haussler, Founding Scientific Director
Affiliations University of California, Santa Cruz, Jack Baskin School of Engineering
Website genomics.ucsc.edu

The UC Santa Cruz Genomics Institute is a public research institution based in the Jack Baskin School of Engineering at the University of California, Santa Cruz. The Genomics Institute's scientists and engineers work on a variety of projects related to genome sequencing, computational biology, large data analytics, and data sharing. The institute also maintains a number of software tools used by researchers worldwide, including the UCSC Genome Browser, Dockstore, and the Xena Browser.

History

The UC Santa Cruz Genomics Institute was formally organized in 2014 by Dr. David Haussler, whose involvement in the Human Genome Project helped build UC Santa Cruz's strength in sequencing, analyzing, and displaying huge volumes of data.

UC Santa Cruz joined the Human Genome Project's consortium in 1999, when the public project was speeding up its efforts because of concern that a corporation might complete the human genome sequence first and patent key parts of it. [1]

Haussler, then a member of the Computer Science Department at UC Santa Cruz, recognized the need for a computational solution to align the consortium's 400,000 fragments of DNA into a coherent sequence. He enlisted the help of biology graduate student Jim Kent, who wrote a program that assembled the DNA sequencing data from the international consortium just days before a competing corporate team created its first assembly. UC Santa Cruz posted the first human genome sequence on the internet on July 7, 2000, to make it freely available to the public. [2] Kent then immediately began to build the UCSC Genome Browser to allow researchers to view the assembled DNA sequence in terms of genes and chromosomes. [3] This tool, and other browsers that UC Santa Cruz has created over the last two decades, continue to be maintained by the Genomics Institute.

In 2004, UC Santa Cruz created a Department of Biomolecular Engineering, which housed many of the university's genomics efforts until the formal Genomics Institute was organized in 2014.

Research

The stated mission of the UC Santa Cruz Genomics Institute is to openly share genomic data and speed its incorporation into health and conservation efforts. Its research is focused on three broad areas: 1) developing and improving tools for DNA and RNA sequencing and for securely sharing large quantities of genomic data, 2) using genomic data to improve human health, and 3) collecting and sequencing genomic data that can be used for environmental conservation.

Affiliates of the Genomics Institute have played significant roles in large-scale consortium sequencing projects such as GENCODE, the Vertebrate Genome Project, [4] the Pan-Cancer Initiative, [5] the Human Cell Atlas, and the Human Genome Reference Program, which aims to create a more complete and diverse reference genome. [6] One of the institute's Associate Directors, Karen Miga, co-led the Telomere-to-Telomere (T2T) consortium that completed a full, gapless sequence of the human genome. [7]

Several of the institute's major initiatives focus on disease and pathogen research. These include the Treehouse Childhood Cancer Initiative, the UCSC Center for Live Cell Genomics, the BRCA Exchange for breast cancer, and a pathogen genomics team that has provided tools used by the CDC and other health organizations to track the mutations and spread of SARS-CoV-2, the virus that causes COVID-19.

The Genomics Institute is also a partner of the CALeDNA program to document and sequence biodiversity in California and hosts a number of related projects in conservation genomics. [8]

Government and private sponsors of the Genomics Institute include the National Institutes of Health, the National Science Foundation, the California Department of Public Health, St. Baldrick's Foundation, Simons Foundation, Keck Foundation, Howard Hughes Medical Institute, Schmidt Futures, and the Chan Zuckerberg Initiative, among others. It frequently collaborates with industry partners, which have included Google Health and Amazon Web Services.

Training Programs and Outreach

The Genomics Institute at UC Santa Cruz organizes a number of programs to train young researchers in coding and bioinformatics, including short, intensive courses for community college students, a summer coding program for incoming students, a Research Mentoring and Internship Program to train undergraduates in genomic science, and a Treehouse Undergraduate Bioinformatics Intensive program that introduces students to using bioinformatics in cancer research. It also collaborates with high schools to bring remote experimentation to classrooms that are not normally equipped for it. [9]

The Graduate Program in Genome Sciences combines rigorous training in the disciplines that are fundamental to genome science, including computer science, molecular biology and genetics, and statistics, with hands-on technical training for students. [10]

Organizational Structure

The UC Santa Cruz Genomics Institute is an interdivisional, technology-driven institution that was formally designated an Organized Research Unit [11] of the University of California, Santa Cruz in 2019. [12] This designation gives it greater institutional support for multidisciplinary and collaborative research within the university. The institute is made up of approximately 150 researchers and staff.

Facilities

In 2019, the Genomics Institute moved its headquarters to the UC Santa Cruz Westside Research Park, located at 2300 Delaware Ave., Santa Cruz, CA 95060, which is also the home of its diagnostic lab, the Colligan Clinical and Diagnostic Laboratory. Institute affiliates also maintain labs on the UC Santa Cruz main campus.

Related Research Articles

Bioinformatics: Computational analysis of large, complex sets of biological data

Bioinformatics is an interdisciplinary field of science that develops methods and software tools for understanding biological data, especially when the data sets are large and complex. Bioinformatics uses biology, chemistry, physics, computer science, computer programming, information engineering, mathematics and statistics to analyze and interpret biological data. The subsequent process of analyzing and interpreting data is referred to as computational biology.

Genomics: Discipline in genetics

Genomics is an interdisciplinary field of biology focusing on the structure, function, evolution, mapping, and editing of genomes. A genome is an organism's complete set of DNA, including all of its genes as well as its hierarchical, three-dimensional structural configuration. In contrast to genetics, which refers to the study of individual genes and their roles in inheritance, genomics aims at the collective characterization and quantification of all of an organism's genes, their interrelations and influence on the organism. Genes may direct the production of proteins with the assistance of enzymes and messenger molecules. In turn, proteins make up body structures such as organs and tissues as well as control chemical reactions and carry signals between cells. Genomics also involves the sequencing and analysis of genomes through uses of high throughput DNA sequencing and bioinformatics to assemble and analyze the function and structure of entire genomes. Advances in genomics have triggered a revolution in discovery-based research and systems biology to facilitate understanding of even the most complex biological systems such as the brain.

Comparative genomics

Comparative genomics is a field of biological research in which the genomic features of different organisms are compared. The genomic features may include the DNA sequence, genes, gene order, regulatory sequences, and other genomic structural landmarks. In this branch of genomics, whole or large parts of genomes resulting from genome projects are compared to study basic biological similarities and differences as well as evolutionary relationships between organisms. The major principle of comparative genomics is that common features of two organisms will often be encoded within the DNA that is evolutionarily conserved between them. Therefore, comparative genomic approaches start with making some form of alignment of genome sequences and looking for orthologous sequences in the aligned genomes and checking to what extent those sequences are conserved. Based on these, genome and molecular evolution are inferred and this may in turn be put in the context of, for example, phenotypic evolution or population genetics.

Wellcome Sanger Institute: British genomics research institute

The Wellcome Sanger Institute, previously known as The Sanger Centre and Wellcome Trust Sanger Institute, is a non-profit British genomics and genetics research institute, primarily funded by the Wellcome Trust.

ENCODE: Research consortium investigating functional elements in human and model organism DNA

The Encyclopedia of DNA Elements (ENCODE) is a public research project which aims "to build a comprehensive parts list of functional elements in the human genome."

Jim Kent: American research scientist and computer programmer

William James Kent is an American research scientist and computer programmer. He has been a contributor to genome database projects and the 2003 winner of the Benjamin Franklin Award.

The Baskin School of Engineering, known simply as Baskin Engineering, is the school of engineering at the University of California, Santa Cruz. It consists of six departments: Applied Mathematics, Biomolecular Engineering, Computational Media, Computer Science and Engineering, Electrical and Computer Engineering, and Statistics.

The completion of human genome sequencing in the early 2000s was a turning point in genomics research. Since then, scientists have conducted a series of studies into the activities of genes and of the genome as a whole. The human genome contains around 3 billion nucleotide base pairs, and the huge quantity of data created necessitates accessible tools to explore and interpret this information in order to investigate the genetic basis of disease, evolution, and biological processes. The field of genomics has continued to grow, with new sequencing technologies and computational tools making it easier to study the genome.

Human Genome Project: Human genome sequencing programme

The Human Genome Project (HGP) was an international scientific research project with the goal of determining the base pairs that make up human DNA, and of identifying, mapping and sequencing all of the genes of the human genome from both a physical and a functional standpoint. It started in 1990 and was completed in 2003. It remains the world's largest collaborative biological project. Planning for the project started after it was adopted in 1984 by the US government, and it officially launched in 1990. It was declared complete on April 14, 2003, and included about 92% of the genome. A "complete genome" level was achieved in May 2021, with only 0.3% of bases remaining covered by potential issues. The final gapless assembly was finished in January 2022.

The Cancer Genome Project is part of the cancer, aging, and somatic mutation research based at the Wellcome Trust Sanger Institute in the United Kingdom. It aims to identify sequence variants/mutations critical in the development of human cancers. Like The Cancer Genome Atlas project within the United States, the Cancer Genome Project represents an effort in the War on Cancer to improve cancer diagnosis, treatment, and prevention through a better understanding of the molecular basis of the disease. The Cancer Genome Project was launched by Michael Stratton in 2000, and Peter Campbell is now the group leader of the project. The project works to combine knowledge of the human genome sequence with high throughput mutation detection techniques.

David Haussler: American bioinformatician

David Haussler is an American bioinformatician known for leading the team that assembled the first human genome sequence in the race to complete the Human Genome Project, and subsequently for comparative genome analysis that deepens understanding of the molecular function and evolution of the genome.

The UCSC Genome Browser is an online and downloadable genome browser hosted by the University of California, Santa Cruz (UCSC). It is an interactive website offering access to genome sequence data from a variety of vertebrate and invertebrate species and major model organisms, integrated with a large collection of aligned annotations. The Browser is a graphical viewer optimized to support fast interactive performance and is an open-source, web-based tool suite built on top of a MySQL database for rapid visualization, examination, and querying of the data at many levels. The Genome Browser Database, browsing tools, downloadable data files, and documentation can all be found on the UCSC Genome Bioinformatics website.

Reference genome: Digital nucleic acid sequence database

A reference genome is a digital nucleic acid sequence database, assembled by scientists as a representative example of the set of genes in one idealized individual organism of a species. As they are assembled from the sequencing of DNA from a number of individual donors, reference genomes do not accurately represent the set of genes of any single individual organism. Instead, a reference provides a haploid mosaic of different DNA sequences from each donor. For example, one of the most recent human reference genomes, assembly GRCh38/hg38, is derived from more than 60 genomic clone libraries. There are reference genomes for multiple species of viruses, bacteria, fungi, plants, and animals. Reference genomes are typically used as a guide on which new genomes are built, enabling them to be assembled much more quickly and cheaply than the initial Human Genome Project. Reference genomes can be accessed online at several locations, using dedicated browsers such as Ensembl or the UCSC Genome Browser.

GenomeSpace is an environment for genomics software tools and applications. It helps users manage analysis workflows that involve multiple diverse tools, including web applications and desktop tools, and facilitates the transfer of data between tools via automatic format conversion. Analyses can use data from local or cloud-based stores.

Kate R. Rosenbloom is a member of the Encyclopedia of DNA Elements (ENCODE) Consortium. She is a Tech Project Manager and Software Developer at the Center for Biomolecular Science and Engineering, Jack Baskin School of Engineering, University of California, Santa Cruz (UCSC), USA. She has been a member of the scientific advisory board to the human proteome project and has contributed to data integration and visualisation within the GTEx consortium, an international project aiming to understand how genetic variation shapes variation between human tissues.

Angela N. Brooks: American biologist and geneticist

Angela Brooks is an Assistant Professor of Biomolecular Engineering at University of California, Santa Cruz. She is a member of the Genomics Institute.

The BED format is a text file format used to store genomic regions as coordinates and associated annotations. The data are presented in the form of columns separated by spaces or tabs. This format was developed during the Human Genome Project and then adopted by other sequencing projects. As a result of this increasingly wide use, this format had already become a de facto standard in bioinformatics before a formal specification was written.
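As an illustration of the column layout described above, the following is a minimal Python sketch of reading one BED line. It assumes the common convention that the first three columns are the chromosome name, a 0-based start, and an exclusive end, with an optional fourth name column; the names `BedRecord`, `parse_bed_line`, and `region_length` are hypothetical and chosen only for this example, and the sketch is not a full implementation of the format.

```python
from typing import NamedTuple, Optional


class BedRecord(NamedTuple):
    """One genomic region from a BED line (hypothetical helper type)."""
    chrom: str
    start: int  # assumed 0-based, inclusive
    end: int    # assumed exclusive
    name: Optional[str] = None


def parse_bed_line(line: str) -> Optional[BedRecord]:
    """Parse a single BED line; return None for blank, comment, or header lines."""
    stripped = line.strip()
    if not stripped or stripped.startswith(("#", "track", "browser")):
        return None
    # Columns may be separated by spaces or tabs, so split on any whitespace.
    fields = stripped.split()
    if len(fields) < 3:
        raise ValueError(f"expected at least 3 columns, got {len(fields)}")
    name = fields[3] if len(fields) > 3 else None
    return BedRecord(fields[0], int(fields[1]), int(fields[2]), name)


def region_length(record: BedRecord) -> int:
    """Length in bases of a half-open interval [start, end)."""
    return record.end - record.start


if __name__ == "__main__":
    record = parse_bed_line("chr7\t127471196\t127472363\tPos1")
    print(record, region_length(record))
```

Splitting on any whitespace keeps the sketch short, but it would break annotation values that themselves contain spaces, so a stricter reader might split on tabs only.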

Katherine Snowden Pollard is the Director of the Gladstone Institute of Data Science and Biotechnology and a professor at the University of California, San Francisco (UCSF). She is a Chan Zuckerberg Biohub Investigator. She was awarded Fellowship of the International Society for Computational Biology in 2020 and the American Institute for Medical and Biological Engineering in 2021 for outstanding contributions to computational biology and bioinformatics.

Karen Elizabeth Hayden Miga is an American geneticist who co-leads the Telomere-to-Telomere (T2T) consortium, which released a fully complete assembly of the human genome in March 2022. She is an assistant professor of biomolecular engineering at the University of California, Santa Cruz and Associate Director of Human Pangenomics at the UC Santa Cruz Genomics Institute. She was named as "One to Watch" in the 2020 Nature's 10 and one of Time 100’s most influential people of 2022.

References

  1. Fisher, Lawrence M. (1999-08-29). "The Race to Cash In On the Genetic Code". The New York Times. ISSN 0362-4331. Retrieved 2022-09-08.
  2. Kent, W. James; Haussler, David (2001-09-01). "Assembly of the Working Draft of the Human Genome with GigAssembler". Genome Research. 11 (9): 1541–1548. doi:10.1101/gr.183201. ISSN 1088-9051. PMC 311095. PMID 11544197.
  3. Wade, Nicholas (2001-02-13). "READING THE BOOK OF LIFE; Grad Student Becomes Gene Effort's Unlikely Hero". The New York Times. ISSN 0362-4331. Retrieved 2022-09-08.
  4. Vertebrate Genome Project. "Who We Are".
  5. Woodward, Aylin. "New 'Pan-Cancer' analysis reveals the common roots of different cancers". UC Santa Cruz News. Retrieved 2022-09-08.
  6. "Human Genome Reference Program".
  7. "Most complete human genome yet reveals previously indecipherable DNA". www.science.org. Retrieved 2022-09-08.
  8. UCeDNA. "Meet the Team".
  9. "Invention inspires UCSC-Alisal bio collaboration, taking basic research way beyond basics | engineering.ucsc.edu". engineering.ucsc.edu. Retrieved 2022-09-08.
  10. "FINAL 2022 GI Annual Report accessible.pdf". Google Docs. Retrieved 2023-09-13.
  11. "Organized Research Units (ORUs)". officeofresearch.ucsc.edu. Retrieved 2020-04-12.
  12. Stephens, Tim. "UCSC's Genomics Institute settles into new Delaware Avenue headquarters". UC Santa Cruz News. Retrieved 2020-04-09.