Alan Christoffels | |
---|---|
Born | Kensington, Cape Town, South Africa |
Occupation(s) | Bioinformatics scientist, academic, and an author |
Academic background | |
Education | BSc. (1992), BSc. (1993), MSc. (1995), Ph.D. (2001) |
Alma mater | |
Thesis | Generation of a human gene index and its application to disease candidacy (2001) |
Academic work | |
Institutions | South African National Bioinformatics Institute, University of the Western Cape |
Alan Christoffels is a bioinformatics scientist, academic, and author. He is Professor of Bioinformatics and the director of the South African National Bioinformatics Institute at the University of the Western Cape. [1] He has served as a senior advisor to the Africa Centres for Disease Control and Prevention on Pathogen Genomics & Partnerships, and holds the DSI/NRF Research Chair in Bioinformatics and Public Health Genomics. [2]
Christoffels' primary contributions and research are in the areas of host-pathogen interaction, genome evolution, pathogen genomics, and biobank LIMS. [3]
Christoffels is a founding member of the Global Emerging Pathogens Consortium, [4] an elected member of the Academy of Science of South Africa, [5] and was President of the African Society for Bioinformatics and Computational Biology from 2020 until 2022. He was elected a fellow of the Royal Society of South Africa in 2022. [6]
Christoffels graduated with a BSc. in Microbiology and Biochemistry in 1992 and a BSc. (Hons.) in Pharmacology from the University of Cape Town in 1993. Between 1995 and 1997, he was enrolled at Stellenbosch University for an MSc. in Genetics. [7] His research centered on identifying novel markers for the fine mapping of a locally prevalent cardiac disease (PFHB1). [8] He then pursued a Ph.D. in bioinformatics at the South African National Bioinformatics Institute, University of the Western Cape. His thesis was titled "Generation of a human gene index and its application to disease candidacy". [9]
Christoffels started his academic career in 1994 as a Genetics Technician in Medical Biochemistry at Stellenbosch University. Between 2001 and 2004, he held a postdoctoral fellowship at the Institute of Molecular & Cell Biology in Singapore. Following this appointment, he served as an Adjunct Assistant Professor at Nanyang Technological University in Singapore. He was later an associate professor, from 2007 to 2012, at the South African National Bioinformatics Institute at the University of the Western Cape. [10] He was promoted to Professor in 2013 and currently holds this appointment. [11]
Since 2009, Christoffels has served as the Director of the South African National Bioinformatics Institute at the University of the Western Cape. [12] For the following five years, he served on the board of directors of the International Society for Computational Biology (ISCB). He has held the appointment of SA Medical Research Council Unit Director at the Bioinformatics Capacity Development Unit since 2012. Later on, he held a brief appointment as vice president of the South African Society for Bioinformatics. [13] He was appointed President of the African Society for Bioinformatics and Computational Biology in 2020 for a two-year term. [14]
Following a stakeholder meeting convened by the Bill & Melinda Gates Foundation, Christoffels and his international partners launched a global initiative called the Public Health Alliance for Genomic Epidemiology [15] in October 2019 at the Grand Challenges meeting in Ethiopia. He serves as the principal investigator of this initiative. The research prototypes generated in his lab are integrated into the working groups within the alliance. [16]
Christoffels has more than 400 publications to his name. [3] His research spans host-pathogen interaction, pathogen genomics, and genome evolution, with a particular focus on genome annotation and sequence data analysis. [17]
Christoffels and a team of scientists at his lab at the South African National Bioinformatics Institute presented the first genome sequence of SARS-CoV-2, the virus that causes COVID-19, found in South Africa. [18] His recent work addresses scaling up disease surveillance systems on the African continent. [19] [20]
Christoffels has developed technologies for applying bioinformatics in public health. These include COMBAT-TB, an analysis platform for tuberculosis sequencing data that can be deployed in resource-limited settings. [21]
With a group of researchers, Christoffels developed a Laboratory Information Management System (LIMS) for biobanking at the South African National Bioinformatics Institute. [22] The purpose of the system is to collect, store, process, and manage biosamples appropriately. This human biobanking system is built on the Plone web content management framework. [23]
Christoffels has focused his genomics research on disease vectors. In 2016, he described non-coding RNAs (miRNAs) in Anopheles funestus and postulated roles for these small genes in understanding parasite control by the blood-sucking mosquito. [24] He co-led a tsetse fly genome project between 2008 and 2014, in which he and his students studied how the fly regulates its own immune system and protects itself against iron toxicity. He has subsequently used his analytical methods to describe non-coding RNA regulation in the black soldier fly. [25]
Christoffels' earlier research focused on sequencing, annotating, and analyzing genomes. He analyzed the genome assembly and annotation of Fugu rubripes. With over 95% of the genome sequenced, 80% of the assembly was in multigene-sized scaffolds. The study reported that repetitive DNA occupies less than one-sixth of the sequence, whereas gene loci occupy about one-third of the genome. It also highlighted the extent of protein evolution over the past 450 million years: although about three-quarters of human proteins have a match in the pufferfish, roughly a quarter had either highly diverged from or did not have pufferfish homologs. Despite significant scrambling of gene order, the conserved linkages between Fugu rubripes and humans show the preservation of chromosomal segments from the common vertebrate ancestor. [26] A systematic comparison was carried out between the draft genome sequences of Fugu and humans to identify paralogous chromosomal regions in the Fugu. Duplicate genes in the Fugu were identified through phylogenetic analyses of Fugu, human, and invertebrate sequences. The analysis found evidence for 425 fish-specific duplicate genes in the Fugu and indicated that at least 6.6% of the genome is represented by fish-specific paralogons. The study also strongly suggests a whole-genome duplication during the evolution of ray-finned fish, which may have occurred before the origin of teleosts. [27]
In 2013, Christoffels reported the analysis of the African coelacanth genome. [28] The genome sequence revealed genetic changes that bear on the adaptation from an aquatic environment to land. [29] The study was conducted to gain insights into tetrapod evolution. The phylogenomic analysis indicated that the lungfish, rather than the coelacanth, is the closest living relative of tetrapods. The protein-coding genes of the coelacanth were also found to evolve more slowly than those of tetrapods. [30]
In 2013, Christoffels completed the annotation of the coelacanth genome in his lab. A later analysis of the taste and odorant receptors in the coelacanth demonstrated that its repertoire of GPCR chemosensory receptors (CRs) supports its intermediary position between fish and tetrapods. [31]
The tsetse project spanned nearly 10 years. [32] [33] Christoffels was part of the executive team that managed the genome project [34] and led the scientific analyses by supervising the Ph.D. students who analyzed different regions of the genome in detail. These analyses covered innate protection against the pathogen (Trypanosome), [35] comparison of trypanosomatid SNAREs, [36] chemical signaling used to find a host, [37] and promoter architecture in tsetse. [38]
Following his sequencing projects on the coelacanth and tsetse genomes, Christoffels co-led the genome assembly and annotation of the Asian seabass in 2014. [39] The group combined, for the first time, long-read (PacBio) and short-read (Illumina) sequencing data to assemble a non-model eukaryotic genome. He also led the genome assembly and annotation teams in South Africa and Singapore. [40] He also defined the parameters for non-model organisms. [41]
He has authored three books, titled How to be a Health Activist: Teacher’s Guide, How to Be a Health Activist: A Life Orientation Workbook, and How to Be a Health Activist: A life skills workbook for grades 7-9 learners.
Throughout his research career, Christoffels has added a community engagement dimension to his work. He did so initially by integrating tuberculosis awareness into school curricular activities. Later, he developed audiobooks in multiple languages to communicate the value of biobanks. [42]
Bioinformatics is an interdisciplinary field of science that develops methods and software tools for understanding biological data, especially when the data sets are large and complex. Bioinformatics uses biology, chemistry, physics, computer science, computer programming, information engineering, mathematics and statistics to analyze and interpret biological data. The process of analyzing and interpreting data is sometimes referred to as computational biology; however, this distinction between the two terms is often disputed. To some, the term computational biology refers to building and using models of biological systems.
The human genome is a complete set of nucleic acid sequences for humans, encoded as the DNA within each of the 24 distinct chromosomes in the cell nucleus. A small DNA molecule is found within individual mitochondria. These are usually treated separately as the nuclear genome and the mitochondrial genome. Human genomes include both protein-coding DNA sequences and various types of DNA that does not encode proteins. The latter is a diverse category that includes DNA coding for non-translated RNA, such as that for ribosomal RNA, transfer RNA, ribozymes, small nuclear RNAs, and several types of regulatory RNAs. It also includes promoters and their associated gene-regulatory elements, DNA playing structural and replicatory roles, such as scaffolding regions, telomeres, centromeres, and origins of replication, plus large numbers of transposable elements, inserted viral DNA, non-functional pseudogenes and simple, highly repetitive sequences. Introns make up a large percentage of non-coding DNA. Some of this non-coding DNA is non-functional junk DNA, such as pseudogenes, but there is no firm consensus on the total amount of junk DNA.
Genomics is an interdisciplinary field of molecular biology focusing on the structure, function, evolution, mapping, and editing of genomes. A genome is an organism's complete set of DNA, including all of its genes as well as its hierarchical, three-dimensional structural configuration. In contrast to genetics, which refers to the study of individual genes and their roles in inheritance, genomics aims at the collective characterization and quantification of all of an organism's genes, their interrelations and influence on the organism. Genes may direct the production of proteins with the assistance of enzymes and messenger molecules. In turn, proteins make up body structures such as organs and tissues as well as control chemical reactions and carry signals between cells. Genomics also involves the sequencing and analysis of genomes through uses of high throughput DNA sequencing and bioinformatics to assemble and analyze the function and structure of entire genomes. Advances in genomics have triggered a revolution in discovery-based research and systems biology to facilitate understanding of even the most complex biological systems such as the brain.
In bioinformatics, sequence analysis is the process of subjecting a DNA, RNA or peptide sequence to any of a wide range of analytical methods to understand its features, function, structure, or evolution. It can be performed on the entire genome, transcriptome or proteome of an organism, and can also involve only selected segments or regions, like tandem repeats and transposable elements. Methodologies used include sequence alignment, searches against biological databases, and others.
In genetics, an expressed sequence tag (EST) is a short sub-sequence of a cDNA sequence. ESTs may be used to identify gene transcripts, and were instrumental in gene discovery and in gene-sequence determination. The identification of ESTs has proceeded rapidly, with approximately 74.2 million ESTs now available in public databases. EST approaches have largely been superseded by whole genome and transcriptome sequencing and metagenome sequencing.
Comparative genomics is a branch of biological research that examines genome sequences across a spectrum of species, spanning from humans and mice to a diverse array of organisms from bacteria to chimpanzees. This large-scale holistic approach compares two or more genomes to discover the similarities and differences between the genomes and to study the biology of the individual genomes. Comparison of whole genome sequences provides a highly detailed view of how organisms are related to each other at the gene level. By comparing whole genome sequences, researchers gain insights into genetic relationships between organisms and study evolutionary changes. The major principle of comparative genomics is that common features of two organisms will often be encoded within the DNA that is evolutionarily conserved between them. Therefore, comparative genomics provides a powerful tool for studying evolutionary changes among organisms, helping to identify genes that are conserved or common among species, as well as genes that give each organism its unique characteristics. Moreover, these studies can be performed at different levels of the genomes to obtain multiple perspectives about the organisms.
Functional genomics is a field of molecular biology that attempts to describe gene functions and interactions. Functional genomics makes use of the vast data generated by genomic and transcriptomic projects. Functional genomics focuses on the dynamic aspects such as gene transcription, translation, regulation of gene expression and protein–protein interactions, as opposed to the static aspects of the genomic information such as DNA sequence or structures. A key characteristic of functional genomics studies is their genome-wide approach to these questions, generally involving high-throughput methods rather than a more traditional "candidate-gene" approach.
Metagenomics is the study of genetic material recovered directly from environmental or clinical samples by a method called sequencing. The broad field may also be referred to as environmental genomics, ecogenomics, community genomics or microbiomics.
In bioinformatics, k-mers are substrings of length $k$ contained within a biological sequence. Primarily used within the context of computational genomics and sequence analysis, in which k-mers are composed of nucleotides, k-mers are capitalized upon to assemble DNA sequences, improve heterologous gene expression, identify species in metagenomic samples, and create attenuated vaccines. Usually, the term k-mer refers to all of a sequence's subsequences of length $k$, such that the sequence AGAT would have four monomers, three 2-mers, two 3-mers and one 4-mer (AGAT). More generally, a sequence of length $L$ will have $L - k + 1$ k-mers and $n^k$ total possible k-mers, where $n$ is the number of possible monomers.
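The counting rule for k-mers can be illustrated with a minimal Python sketch. The function names `kmers` and `possible_kmers` are hypothetical and chosen only for this example; it assumes a plain string sequence and a four-letter DNA alphabet, and it is not taken from any particular bioinformatics library.

```python
from collections import Counter


def kmers(sequence: str, k: int) -> list[str]:
    """Return every substring of length k, in order, repeats included.

    A sequence of length L yields L - k + 1 k-mers (none if k > L).
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def possible_kmers(alphabet_size: int, k: int) -> int:
    """Number of distinct k-mers that can exist over an alphabet of n monomers."""
    return alphabet_size ** k


if __name__ == "__main__":
    seq = "AGAT"  # the example sequence used in the text
    for k in range(1, len(seq) + 1):
        found = kmers(seq, k)
        # The number of k-mers found matches L - k + 1 for each k.
        print(k, found, len(found) == len(seq) - k + 1)

    # Tally repeated k-mers; the monomer "A" occurs twice in AGAT.
    print(Counter(kmers(seq, 1)))

    # Assuming the DNA alphabet {A, C, G, T}, n = 4.
    print(possible_kmers(4, 3))
```

Listing the k-mers with a sliding window keeps repeats, which is what assembly and species-identification methods usually count; wrapping the result in `Counter` gives per-k-mer frequencies when those are needed instead.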
Population genomics is the large-scale comparison of DNA sequences of populations. Population genomics is a neologism that is associated with population genetics. Population genomics studies genome-wide effects to improve our understanding of microevolution so that we may learn the phylogenetic history and demography of a population.
RNA-Seq is a technique that uses next-generation sequencing to reveal the presence and quantity of RNA molecules in a biological sample, providing a snapshot of gene expression in the sample, also known as the transcriptome.
Pathogenomics is a field which uses high-throughput screening technology and bioinformatics to study encoded microbe resistance, as well as virulence factors (VFs), which enable a microorganism to infect a host and possibly cause disease. This includes studying genomes of pathogens which cannot be cultured outside of a host. In the past, researchers and medical professionals found it difficult to study and understand pathogenic traits of infectious organisms. With newer technology, pathogen genomes can be identified and sequenced in a much shorter time and at a lower cost, thus improving the ability to diagnose, treat, and even predict and prevent pathogenic infections and disease. It has also allowed researchers to better understand genome evolution events - gene loss, gain, duplication, rearrangement - and how those events impact pathogen resistance and ability to cause disease. This influx of information has created a need for bioinformatics tools and databases to analyze and make the vast amounts of data accessible to researchers, and it has raised ethical questions about the wisdom of reconstructing previously extinct and deadly pathogens in order to better understand virulence.
In molecular biology and genetics, DNA annotation or genome annotation is the process of describing the structure and function of the components of a genome, by analyzing and interpreting them in order to extract their biological significance and understand the biological processes in which they participate. Among other things, it identifies the locations of genes and all the coding regions in a genome and determines what those genes do.
Viral metagenomics uses metagenomic technologies to detect viral genomic material from diverse environmental and clinical samples. Viruses are the most abundant biological entity and are extremely diverse; however, only a small fraction of viruses have been sequenced and only an even smaller fraction have been isolated and cultured. Sequencing viruses can be challenging because viruses lack a universally conserved marker gene so gene-based approaches are limited. Metagenomics can be used to study and analyze unculturable viruses and has been an important tool in understanding viral diversity and abundance and in the discovery of novel viruses. For example, metagenomics methods have been used to describe viruses associated with cancerous tumors and in terrestrial ecosystems.
Single nucleotide polymorphism annotation is the process of predicting the effect or function of an individual SNP using SNP annotation tools. In SNP annotation the biological information is extracted, collected and displayed in a clear form amenable to query. SNP functional annotation is typically performed based on the available information on nucleic acid and protein sequences.
Metatranscriptomics is the set of techniques used to study gene expression of microbes within natural environments, i.e., the metatranscriptome.
Bacterial phylodynamics is the study of immunology, epidemiology, and phylogenetics of bacterial pathogens to better understand the evolutionary role of these pathogens. Phylodynamic analysis includes analyzing genetic diversity, natural selection, and population dynamics of infectious disease pathogen phylogenies during pandemics and studying intra-host evolution of viruses. Phylodynamics combines the study of phylogenetic analysis, ecological, and evolutionary processes to better understand the mechanisms that drive spatiotemporal incidence and phylogenetic patterns of bacterial pathogens. Bacterial phylodynamics uses genome-wide single-nucleotide polymorphisms (SNP) in order to better understand the evolutionary mechanism of bacterial pathogens. Many phylodynamic studies have been performed on viruses, specifically RNA viruses which have high mutation rates. The field of bacterial phylodynamics has increased substantially due to the advancement of next-generation sequencing and the amount of data available.
Transcriptomics technologies are the techniques used to study an organism's transcriptome, the sum of all of its RNA transcripts. The information content of an organism is recorded in the DNA of its genome and expressed through transcription. Here, mRNA serves as a transient intermediary molecule in the information network, whilst non-coding RNAs perform additional diverse functions. A transcriptome captures a snapshot in time of the total transcripts present in a cell. Transcriptomics technologies provide a broad account of which cellular processes are active and which are dormant. A major challenge in molecular biology is to understand how a single genome gives rise to a variety of cells. Another is how gene expression is regulated.
Clinical metagenomic next-generation sequencing (mNGS) is the comprehensive analysis of microbial and host genetic material in clinical samples from patients by next-generation sequencing. It uses the techniques of metagenomics to identify and characterize the genome of bacteria, fungi, parasites, and viruses without the need for a prior knowledge of a specific pathogen directly from clinical specimens. The capacity to detect all the potential pathogens in a sample makes metagenomic next generation sequencing a potent tool in the diagnosis of infectious disease especially when other more directed assays, such as PCR, fail. Its limitations include clinical utility, laboratory validity, sense and sensitivity, cost and regulatory considerations.