Jianlin Cheng

Jianlin (Jack) Cheng is the William and Nancy Thompson Missouri Distinguished Professor in the Electrical Engineering and Computer Science (EECS) Department at the University of Missouri, Columbia. He earned his PhD from the University of California-Irvine in 2006, his MS degree from Utah State University in 2001, and his BS degree from Huazhong University of Science and Technology in 1994. [1]

His research interests include bioinformatics, machine learning and artificial intelligence. His current research is focused on protein structure and function prediction, [2] 3D genome structure modeling, [3] biological network construction, [4] and deep learning with applications to big data in biomedical domains.

Dr. Cheng has more than 180 publications in the fields of bioinformatics, computational biology, artificial intelligence, and machine learning, which have been cited thousands of times according to Google Scholar Citations. He and his students developed one of the first deep learning methods for protein structure prediction and, in the 10th community-wide Critical Assessment of Techniques for Protein Structure Prediction (CASP10) in 2012, demonstrated for the first time that deep learning was the best method for protein structure prediction. His protein structure prediction methods (MULTICOM), supported by the National Institutes of Health (NIH) and the National Science Foundation (NSF), were consistently ranked among the top methods in the CASP rounds held from 2008 to 2022. Dr. Cheng received a 2012 NSF CAREER award for his work on 3D genome structure modeling. He is a fellow of the American Institute for Medical and Biological Engineering (AIMBE) and of the Asia-Pacific Artificial Intelligence Association (AAIA).

Selected publications

  1. Chen, C., Chen, X., Morehead, A., Wu, T., Cheng, J. (2023) 3D-equivariant graph neural networks for protein model quality assessment. Bioinformatics, accepted.
  2. Guo, Z., Liu, J., Skolnick, J., Cheng, J. (2022) Prediction of inter-chain distance maps of protein complexes with 2D attention-based deep neural networks. Nature Communications. 13:6963.
  3. Liu, J., Wu, T., Guo, Z., Hou, J., & Cheng, J. (2022). Improving protein tertiary structure prediction by deep learning and distance prediction in CASP14. Proteins: Structure, Function, and Bioinformatics, 90(1), 58-72.
  4. Chen, C., Wu, T., Guo, Z., & Cheng, J. (2021). Combination of deep neural network with attention mechanism enhances the explainability of protein contact prediction. Proteins: Structure, Function, and Bioinformatics, 89(6), 697-707.
  5. Wu, T., Guo, Z., Hou, J., & Cheng, J. (2021). DeepDist: real-value inter-residue distance prediction with deep residual convolutional network. BMC bioinformatics, 22, 1-17.
  6. Hou, J., Wu, T., Cao, R., & Cheng, J. (2019). Protein tertiary structure modeling driven by deep learning and contact distance prediction in CASP13. Proteins: Structure, Function, and Bioinformatics, 87(12), 1165-1178.
  7. T. Trieu, J. Cheng. Large-scale reconstruction of 3D structures of human chromosomes from chromosomal contact data. Nucleic Acids Research. 42(7):e52, 2014.
  8. M. Zhu, J. Dahmen, G. Stacey, J. Cheng. Predicting gene regulatory networks of soybean nodulation from RNA-Seq transcriptome data. BMC Bioinformatics. 14:278, 2013.
  9. J. Eickholt, J. Cheng. A Study and Extension of DNcon: a Method for Protein Residue-Residue Contact Prediction Using Deep Networks. BMC Bioinformatics. 14(Suppl 14):S12, 2013.
  10. J. Eickholt, J. Cheng. Predicting Protein Residue-Residue Contacts Using Deep Networks and Boosting. Bioinformatics. 28(23):3066-3072, 2012.

Related Research Articles

In bioinformatics, sequence analysis is the process of subjecting a DNA, RNA or peptide sequence to any of a wide range of analytical methods to understand its features, function, structure, or evolution. Methodologies used include sequence alignment, searches against biological databases, and others.

Protein structure prediction: Type of biological prediction

Protein structure prediction is the inference of the three-dimensional structure of a protein from its amino acid sequence—that is, the prediction of its secondary and tertiary structure from primary structure. Structure prediction is different from the inverse problem of protein design. Protein structure prediction is one of the most important goals pursued by computational biology; and it is important in medicine and biotechnology.

CASP: Protein structure prediction challenge

Critical Assessment of Structure Prediction (CASP), sometimes called Critical Assessment of Protein Structure Prediction, is a community-wide, worldwide experiment for protein structure prediction that has taken place every two years since 1994. CASP provides research groups with an opportunity to objectively test their structure prediction methods and delivers an independent assessment of the state of the art in protein structure modeling to the research community and software users. Even though the primary goal of CASP is to help advance the methods of identifying a protein's three-dimensional structure from its amino acid sequence, many view the experiment more as a “world championship” in this field of science. More than 100 research groups from all over the world participate in CASP on a regular basis, and it is not uncommon for entire groups to suspend their other research for months while they focus on getting their servers ready for the experiment and on performing the detailed predictions.

Structural alignment: Aligning molecular sequences using sequence and structural information

Structural alignment attempts to establish homology between two or more polymer structures based on their shape and three-dimensional conformation. This process is usually applied to protein tertiary structures but can also be used for large RNA molecules. In contrast to simple structural superposition, where at least some equivalent residues of the two structures are known, structural alignment requires no a priori knowledge of equivalent positions. Structural alignment is a valuable tool for the comparison of proteins with low sequence similarity, where evolutionary relationships between proteins cannot be easily detected by standard sequence alignment techniques. Structural alignment can therefore be used to imply evolutionary relationships between proteins that share very little common sequence. However, caution should be used in using the results as evidence for shared evolutionary ancestry because of the possible confounding effects of convergent evolution by which multiple unrelated amino acid sequences converge on a common tertiary structure.

Structural bioinformatics: Bioinformatics subfield

Structural bioinformatics is the branch of bioinformatics that is related to the analysis and prediction of the three-dimensional structure of biological macromolecules such as proteins, RNA, and DNA. It deals with generalizations about macromolecular 3D structures such as comparisons of overall folds and local motifs, principles of molecular folding, evolution, binding interactions, and structure/function relationships, working both from experimentally solved structures and from computational models. The term structural has the same meaning as in structural biology, and structural bioinformatics can be seen as a part of computational structural biology. The main objective of structural bioinformatics is the creation of new methods of analysing and manipulating biological macromolecular data in order to solve problems in biology and generate new knowledge.

In computational biology, gene prediction or gene finding refers to the process of identifying the regions of genomic DNA that encode genes. This includes protein-coding genes as well as RNA genes, but may also include prediction of other functional elements such as regulatory regions. Gene finding is one of the first and most important steps in understanding the genome of a species once it has been sequenced.

Protein subcellular localization prediction involves the prediction of where a protein resides in a cell, its subcellular localization.

Vasant G. Honavar is an Indian-born American computer scientist, researcher, and professor working in artificial intelligence, machine learning, big data, data science, causal inference, knowledge representation, bioinformatics, and health informatics.

Homology modeling: Method of protein structure prediction using other known proteins

Homology modeling, also known as comparative modeling of protein, refers to constructing an atomic-resolution model of the "target" protein from its amino acid sequence and an experimental three-dimensional structure of a related homologous protein. Homology modeling relies on the identification of one or more known protein structures likely to resemble the structure of the query sequence, and on the production of an alignment that maps residues in the query sequence to residues in the template sequence. It has been seen that protein structures are more conserved than protein sequences amongst homologues, but sequences falling below a 20% sequence identity can have very different structure.
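The sequence identity threshold mentioned above can be made concrete with a small sketch. It assumes the query and template have already been aligned and are given as two equal-length strings with `-` marking gaps; the function name `percent_identity` and the example sequences are hypothetical. Identity is computed here over columns where neither sequence has a gap, which is only one of several conventions in use, so the numbers may differ from those reported by a particular alignment tool.

```python
def percent_identity(aligned_query, aligned_template, gap="-"):
    """Percent of gap-free alignment columns with identical residues.

    Assumption: both strings come from the same pairwise alignment,
    so they have equal length and use `gap` for insertions/deletions.
    """
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")

    # Keep only columns where both sequences have a residue.
    columns = [
        (q, t)
        for q, t in zip(aligned_query, aligned_template)
        if q != gap and t != gap
    ]
    if not columns:
        return 0.0

    matches = sum(q == t for q, t in columns)
    return 100.0 * matches / len(columns)


# Hypothetical aligned fragments, for illustration only.
query = "MKT-AYIAKQR"
template = "MRTLAY-AKNR"
print(f"{percent_identity(query, template):.1f}% identity")
```

A template search would typically compute a value like this for each candidate and treat templates below roughly 20% identity with caution, since their structures may differ substantially from the target.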

The global distance test (GDT), also written as GDT_TS to represent "total score", is a measure of similarity between two protein structures with known amino acid correspondences but different tertiary structures. It is most commonly used to compare the results of protein structure prediction to the experimentally determined structure as measured by X-ray crystallography, protein NMR, or, increasingly, cryoelectron microscopy. The metric was developed by Adam Zemla at Lawrence Livermore National Laboratory and originally implemented in the Local-Global Alignment (LGA) program. It is intended as a more accurate measurement than the common root-mean-square deviation (RMSD) metric - which is sensitive to outlier regions created, for example, by poor modeling of individual loop regions in a structure that is otherwise reasonably accurate. The conventional GDT_TS score is computed over the alpha carbon atoms and is reported as a percentage, ranging from 0 to 100. In general, the higher the GDT_TS score, the more closely a model approximates a given reference structure.
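As a rough illustration of how the score behaves, the sketch below computes a simplified GDT_TS as the average, over distance cutoffs of 1, 2, 4, and 8 Å, of the percentage of alpha carbons in the model that lie within the cutoff of their counterparts in the reference. It assumes the two structures are already superimposed and given as equal-length lists of (x, y, z) coordinates; the LGA program instead searches for superpositions that maximize the count at each cutoff, so this single-superposition version is an approximation, not the reference implementation. The function names and toy coordinates are hypothetical.

```python
import math


def gdt_ts(model, reference, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Simplified GDT_TS (0-100) for pre-superimposed C-alpha coordinates."""
    if len(model) != len(reference) or not model:
        raise ValueError("need two non-empty coordinate lists of equal length")

    # Per-residue distance between corresponding alpha carbons.
    distances = [math.dist(m, r) for m, r in zip(model, reference)]

    # Fraction of residues within each cutoff, then average the fractions.
    fractions = [
        sum(d <= cutoff for d in distances) / len(distances)
        for cutoff in cutoffs
    ]
    return 100.0 * sum(fractions) / len(fractions)


def rmsd(model, reference):
    """Root-mean-square deviation over the same atom pairs, for comparison."""
    squared = [math.dist(m, r) ** 2 for m, r in zip(model, reference)]
    return math.sqrt(sum(squared) / len(squared))


# Toy example: ten residues, nine modeled well and one badly misplaced.
reference = [(3.8 * i, 0.0, 0.0) for i in range(10)]
model = [(3.8 * i, 0.5, 0.0) for i in range(9)] + [(3.8 * 9, 20.0, 0.0)]

print(f"GDT_TS: {gdt_ts(model, reference):.1f}")
print(f"RMSD:   {rmsd(model, reference):.2f}")
```

In this toy case the single misplaced residue pushes the RMSD above 6 Å, while the GDT_TS stays at about 90 because nine of the ten residues fall within every cutoff, which is the outlier behavior described above.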

Structural and physical properties of DNA provide important constraints on the binding sites formed on surfaces of DNA-binding proteins. Characteristics of such binding sites may be used for predicting DNA-binding sites from the structural and even sequence properties of unbound proteins. This approach has been successfully implemented for predicting protein–protein interfaces and has been adopted for predicting DNA-binding sites in DNA-binding proteins. The first attempts to use sequence and evolutionary features to predict DNA-binding sites in proteins were made by Ahmad et al. (2004) and Ahmad and Sarai (2005). Some methods use structural information to predict DNA-binding sites and therefore require a three-dimensional structure of the protein, while others use only sequence information and do not require a protein structure in order to make a prediction.

Anders Krogh is a bioinformatician at the University of Copenhagen, where he leads the university's bioinformatics center. He is known for his pioneering work on the use of hidden Markov models in bioinformatics, and is co-author of a widely used textbook in bioinformatics. In addition, he also co-authored one of the early textbooks on neural networks. His current research interests include promoter analysis, non-coding RNA, gene prediction and protein structure prediction.

Phyre and Phyre2 are free web-based services for protein structure prediction. Phyre is among the most popular methods for protein structure prediction, having been cited over 1500 times. Like other remote homology recognition techniques, it can regularly generate reliable protein models when other widely used methods such as PSI-BLAST cannot. Phyre2 has been designed to provide a user-friendly interface for users inexpert in protein structure prediction methods. Its development is funded by the Biotechnology and Biological Sciences Research Council.

RaptorX is a software and web server for protein structure and function prediction that is free for non-commercial use. RaptorX is among the most popular methods for protein structure prediction. Like other remote homology recognition/protein threading techniques, RaptorX is able to regularly generate reliable protein models when the widely used PSI-BLAST cannot. However, RaptorX is also significantly different from those profile-based methods in that RaptorX excels at modeling of protein sequences without a large number of sequence homologs by exploiting structure information. RaptorX Server has been designed to ensure a user-friendly interface for users inexpert in protein structure prediction methods.

David T. Jones (scientist)

David Tudor Jones FRS is a Professor of Bioinformatics and head of the Bioinformatics Group at University College London (UCL). He is also the director of the Bloomsbury Center for Bioinformatics, a joint research centre between UCL and Birkbeck, University of London, which also provides bioinformatics training and support services to biomedical researchers. As of 2013, he was a member of the editorial boards of PLoS ONE, BioData Mining, Advanced Bioinformatics, Chemical Biology & Drug Design, and Proteins: Structure, Function, and Bioinformatics.

Burkhard Rost: German computational biology researcher

Burkhard Rost is a scientist leading the Department for Computational Biology & Bioinformatics at the Faculty of Informatics of the Technical University of Munich (TUM). Rost chairs the Study Section Bioinformatics Munich, which involves the TUM and the Ludwig Maximilian University of Munich (LMU). From 2007 to 2014, Rost was President of the International Society for Computational Biology (ISCB).

I-TASSER

I-TASSER is a bioinformatics method for predicting three-dimensional structure models of protein molecules from amino acid sequences. It detects structure templates from the Protein Data Bank by a technique called fold recognition. The full-length structure models are constructed by reassembling structural fragments from threading templates using replica exchange Monte Carlo simulations. I-TASSER is one of the most successful protein structure prediction methods in the community-wide CASP experiments.

Machine learning in bioinformatics is the application of machine learning algorithms to bioinformatics, including genomics, proteomics, microarrays, systems biology, evolution, and text mining.

AlphaFold is an artificial intelligence (AI) program developed by DeepMind, a subsidiary of Alphabet, which performs predictions of protein structure. The program is designed as a deep learning system.

References

  1. "Cheng, Jianlin: Mizzou Engineering".
  2. "The MULTICOM Toolbox for Protein Structure Prediction".
  3. "NSF CAREER Project: Analysis, Construction, Visualization, and Modeling of 3D Genome Structures".
  4. "MU Center for Botanical Interaction Studies".