List of gene prediction software

This is a list of software tools and web portals used for gene prediction.

| Name | Description | Species | References |
| --- | --- | --- | --- |
| FINDER | Automated software package to annotate eukaryotic genes from RNA-Seq data and associated protein sequences | Eukaryotes | [1] |
| FragGeneScan | Predicts genes in complete genomes and in short, error-prone sequencing reads | Prokaryotes, Metagenomes | [2] |
| ATGpr | Identifies translation initiation sites in cDNA sequences | Human | [3] |
| Prodigal | Prokaryotic Dynamic Programming Genefinding Algorithm; based on log-likelihood functions rather than hidden or interpolated Markov models | Prokaryotes, Metagenomes (metaProdigal) | [4] |
| AUGUSTUS | Eukaryotic gene predictor | Eukaryotes | [5] |
| BGF | Ab initio gene prediction program based on a hidden Markov model (HMM) and dynamic programming |  | [6] |
| DIOGENES | Fast detection of coding regions in short genomic sequences |  |  |
| Dragon Promoter Finder | Recognizes vertebrate RNA polymerase II promoters | Vertebrates | [7] |
| EasyGene | Gene finder based on a hidden Markov model (HMM) that is estimated automatically for each new genome | Prokaryotes | [8] [9] |
| EuGene | Integrative gene finder | Prokaryotes, Eukaryotes | [10] [11] |
| FGENESH | HMM-based gene structure prediction; predicts multiple genes on both DNA strands | Eukaryotes | [12] |
| FrameD | Finds genes and frameshifts in G+C-rich prokaryotic sequences | Prokaryotes, Eukaryotes | [13] |
| GeMoMa | Homology-based gene prediction using conservation of amino acid sequence and intron position, as well as RNA-Seq data |  | [14] [15] |
| GENIUS II | Links ORFs in complete genomes to known protein 3D structures | Prokaryotes, Eukaryotes | [16] |
| geneid | Predicts genes, exons, splice sites, and other signals along DNA sequences | Eukaryotes | [17] |
| GeneParser | Parses DNA sequences into introns and exons | Eukaryotes | [18] |
| GeneMark | Family of self-training gene prediction programs | Prokaryotes, Eukaryotes, Metagenomes | [19] [20] [21] [22] |
| GeneTack | Predicts genes with frameshifts in prokaryotic genomes | Prokaryotes | [23] |
| GenomeScan | Predicts the locations and exon-intron structures of genes in genomic sequences from a variety of organisms; GENSCAN is its predecessor | Vertebrates, Arabidopsis, Maize | [24] |
| GENSCAN | Predicts the locations and exon-intron structures of genes in genomic sequences from a variety of organisms | Vertebrates, Arabidopsis, Maize | [25] [26] [27] |
| GLIMMER | Finds genes in microbial DNA | Prokaryotes | [28] [29] [30] |
| GLIMMERHMM | Eukaryotic gene-finding system | Eukaryotes | [31] |
| GrailEXP | Predicts exons, genes, promoters, polyadenylation sites, CpG islands, EST similarities, and repeat elements in DNA sequences | Human, Mus musculus, Arabidopsis thaliana, Drosophila melanogaster | [32] [33] |
| mGene | Support vector machine (SVM)-based gene finder | Eukaryotes | [34] |
| mGene.ngs | SVM-based gene finder that uses heterogeneous information such as RNA-Seq data and tiling arrays | Eukaryotes | [35] |
| MORGAN | Decision-tree system to find genes in vertebrate DNA | Eukaryotes | [36] |
| BioNIX | Web tool that combines results from different programs: GRAIL, FEX, HEXON, MZEF, GENEMARK, GENEFINDER, FGENE, BLAST, POLYAH, REPEATMASKER, TRNASCAN | Prokaryotes, Eukaryotes | [37] |
| NNPP | Neural network promoter prediction | Prokaryotes, Eukaryotes | [38] |
| NNSPLICE | Neural network splice site prediction | Drosophila, Human | [39] |
| ORFfinder | Graphical analysis tool to find all open reading frames | Prokaryotes, Eukaryotes | [40] |
| Regulatory Sequence Analysis Tools | Series of modular computer programs to detect regulatory signals in non-coding sequences | Fungi, Prokaryotes, Metazoa, Protists, Plants | [41] [42] |
| PHANOTATE | Annotates phage genomes | Phages | [43] |
| SplicePredictor | Identifies potential splice sites in (plant) pre-mRNA by sequence inspection using Bayesian statistical models | Eukaryotes | [44] |
| VEIL | Hidden Markov model to find genes in vertebrate DNA (web server) | Eukaryotes | [45] |
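Several tools in the table work with open reading frames (ORFs): ORFfinder reports all of them, and EasyGene, according to its reference, ranks ORFs by statistical significance. As a rough illustration of the first step, the Python sketch below performs a basic six-frame ORF scan. It is not the implementation of ORFfinder or any other listed program; it assumes the standard genetic code with ATG as the only start codon and TAA, TAG and TGA as stop codons, and the function names and the `min_codons` threshold are hypothetical choices for this example.

```python
# Minimal six-frame ORF scanner (illustrative sketch, not any listed tool's code).
# Assumptions: standard genetic code, ATG as the only start codon,
# TAA/TAG/TGA as stop codons.

STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODON = "ATG"
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 30) -> list[tuple[str, int, int, int]]:
    """Return (strand, frame, start, end) for each ATG...stop ORF.

    Coordinates are 0-based and half-open on that strand's own sequence
    (for "-" they index into the reverse complement); `end` includes the
    stop codon. Only the first ATG after each stop is used, so nested
    ORFs sharing a stop codon are not reported separately.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None  # position of the first ATG in the current open stretch
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == START_CODON:
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    n_codons = (i + 3 - start) // 3  # includes the stop codon
                    if n_codons >= min_codons:
                        orfs.append((strand, frame, start, i + 3))
                    start = None  # look for the next ATG after this stop
    return orfs
```

For example, `find_orfs("CCATGAAATTTTAGCC", min_codons=1)` returns `[("+", 2, 2, 14)]`: an ATG at position 2 in frame 2 runs to the TAG stop ending at position 14. Practical gene finders go well beyond this, scoring candidate ORFs and start sites with trained statistical models.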
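Many entries in the table (BGF, EasyGene, FGENESH and VEIL among them) are built on hidden Markov models, and the GeneTack reference describes frameshift identification with the Viterbi algorithm. To illustrate Viterbi decoding only, the toy Python sketch below labels each nucleotide as coding ("C") or non-coding ("N") with a two-state HMM. The states, all probabilities, and the premise that the coding state is simply GC-rich are invented for this example; real gene finders use much richer state structures (codon positions, splice sites, start and stop signals) with parameters trained on real data.

```python
import math

# Toy two-state HMM: "N" (non-coding) and "C" (coding).
# All probabilities are hypothetical values chosen for illustration only.
STATES = ("N", "C")
START = {"N": 0.9, "C": 0.1}
TRANS = {
    "N": {"N": 0.95, "C": 0.05},
    "C": {"N": 0.05, "C": 0.95},
}
# In this toy model the "coding" state is simply GC-rich.
EMIT = {
    "N": {"A": 0.3, "T": 0.3, "G": 0.2, "C": 0.2},
    "C": {"A": 0.15, "T": 0.15, "G": 0.35, "C": 0.35},
}


def viterbi(seq: str) -> str:
    """Return the most probable state label for each base of an uppercase ACGT sequence."""
    if not seq:
        return ""
    log = math.log
    # score[s]: best log-probability of any state path ending in state s
    score = {s: log(START[s]) + log(EMIT[s][seq[0]]) for s in STATES}
    back = []  # back[i][s]: best predecessor of state s at position i + 1
    for base in seq[1:]:
        prev_best, new_score = {}, {}
        for s in STATES:
            p = max(STATES, key=lambda r: score[r] + log(TRANS[r][s]))
            prev_best[s] = p
            new_score[s] = score[p] + log(TRANS[p][s]) + log(EMIT[s][base])
        back.append(prev_best)
        score = new_score
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for prev_best in reversed(back):
        state = prev_best[state]
        path.append(state)
    return "".join(reversed(path))
```

Working in log space avoids numerical underflow on long sequences. With these made-up parameters, a 40-base GC run flanked by A-rich stretches is labelled "C" exactly over the GC run, because entering and leaving the coding state costs less than the emission advantage accumulated inside it.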


References

  1. Banerjee S, Bhandary P, Woodhouse M, Sen TZ, Wise RP, Andorf CM (April 2021). "FINDER: an automated software package to annotate eukaryotic genes from RNA-Seq data and associated protein sequences". BMC Bioinformatics. 22: 205. doi:10.1186/s12859-021-04120-9. PMC 8056616. PMID 33879057.
  2. Rho M, Tang H, Ye Y (November 2010). "FragGeneScan: predicting genes in short and error-prone reads". Nucleic Acids Research. 38 (20): e191. doi:10.1093/nar/gkq747. PMC 2978382. PMID 20805240.
  3. Nishikawa, Tetsuo; Ota, Toshio; Isogai, Takao (2000-11-01). "Prediction whether a human cDNA sequence contains initiation codon by combining statistical information and similarity with protein sequences". Bioinformatics. 16 (11): 960–967. doi:10.1093/bioinformatics/16.11.960. ISSN 1367-4803. PMID 11159307.
  4. Hyatt D, Chen GL, Locascio PF, Land ML, Larimer FW, Hauser LJ (March 2010). "Prodigal: prokaryotic gene recognition and translation initiation site identification". BMC Bioinformatics. 11: 119. doi:10.1186/1471-2105-11-119. PMC 2848648. PMID 20211023.
  5. Keller O, Kollmar M, Stanke M, Waack S (March 2011). "A novel hybrid gene prediction method employing protein multiple sequence alignments". Bioinformatics. 27 (6): 757–63. doi:10.1093/bioinformatics/btr010. hdl:11858/00-001M-0000-0011-F244-D. PMID 21216780.
  6. Li, Heng; Liu, Jin-Song; Xu, Zhao; Jin, Jiao; Fang, Lin; Gao, Lei; Li, Yu-Dong; Xing, Zi-Xing; Gao, Shao-Gen; Liu, Tao; Li, Hai-Hong (2005-07-01). "Test Data Sets and Evaluation of Gene Prediction Programs on the Rice Genome". Journal of Computer Science and Technology. 20 (4): 446–453. doi:10.1007/s11390-005-0446-x. ISSN 1860-4749. S2CID 13497894.
  7. Bajic, Vladimir B.; Seah, Seng Hong; Chong, Allen; Zhang, Guanglan; Koh, Judice L. Y.; Brusic, Vladimir (2002-01-01). "Dragon Promoter Finder: recognition of vertebrate RNA polymerase II promoters". Bioinformatics. 18 (1): 198–199. doi:10.1093/bioinformatics/18.1.198. ISSN 1367-4803. PMID 11836231.
  8. Nielsen, P.; Krogh, A. (2005-12-15). "Large-scale prokaryotic gene prediction and comparison to genome annotation". Bioinformatics. 21 (24): 4322–4329. doi:10.1093/bioinformatics/bti701. ISSN 1367-4803. PMID 16249266.
  9. Larsen, Thomas Schou; Krogh, Anders (2003-06-03). "EasyGene – a prokaryotic gene finder that ranks ORFs by statistical significance". BMC Bioinformatics. 4 (1): 21. doi:10.1186/1471-2105-4-21. ISSN 1471-2105. PMC 521197. PMID 12783628.
  10. Foissac S, Gouzy J, Rombauts S, Mathé C, Amselem J, Sterck L, de Peer YV, Rouzé P, Schiex T (May 2008). "Genome annotation in plants and fungi: EuGene as a model platform". Current Bioinformatics. 3 (2): 87–97. doi:10.2174/157489308784340702.
  11. Sallet, Erika; Gouzy, Jérôme; Schiex, Thomas (2019), Kollmar, Martin (ed.), "EuGene: An Automated Integrative Gene Finder for Eukaryotes and Prokaryotes", Gene Prediction: Methods and Protocols, Methods in Molecular Biology, vol. 1962, New York, NY: Springer, pp. 97–120, doi:10.1007/978-1-4939-9173-0_6, ISBN 978-1-4939-9173-0, PMID 31020556, S2CID 131776381, retrieved 2021-11-24
  12. Salamov AA, Solovyev VV (April 2000). "Ab initio gene finding in Drosophila genomic DNA". Genome Research. 10 (4): 516–22. doi:10.1101/gr.10.4.516. PMC 310882. PMID 10779491.
  13. Schiex T, Gouzy J, Moisan A, de Oliveira Y (July 2003). "FrameD: A flexible program for quality check and gene prediction in prokaryotic genomes and noisy matured eukaryotic sequences". Nucleic Acids Research. 31 (13): 3738–41. doi:10.1093/nar/gkg610. PMC 169016. PMID 12824407.
  14. Keilwagen J, Wenk M, Erickson JL, Schattat MH, Grau J, Hartung F (May 2016). "Using intron position conservation for homology-based gene prediction". Nucleic Acids Research. 44 (9): e89. doi:10.1093/nar/gkw092. PMC 4872089. PMID 26893356.
  15. Keilwagen J, Hartung F, Paulini M, Twardziok SO, Grau J (May 2018). "Combining RNA-seq data and homology-based gene prediction for plants, animals and fungi". BMC Bioinformatics. 19 (1): 189. doi:10.1186/s12859-018-2203-5. PMC 5975413. PMID 29843602.
  16. Yabuki, Yukimitsu; Mukai, Yuri; Swindells, Mark B.; Suwa, Makiko (2004-03-01). "GENIUS II: a high-throughput database system for linking ORFs in complete genomes to known protein three-dimensional structures". Bioinformatics. 20 (4): 596–598. doi:10.1093/bioinformatics/btg478. ISSN 1367-4803. PMID 14751990.
  17. Blanco, Enrique; Parra, Genís; Guigó, Roderic (June 2007), "Using geneid to Identify Genes", Current Protocols in Bioinformatics, Chapter 4, John Wiley & Sons, Inc.: 4.3.1–4.3.28, doi:10.1002/0471250953.bi0403s18, ISBN 978-0471250951, PMID 18428791
  18. Snyder, Eric E.; Stormo, Gary D. (1995-04-21). "Identification of Protein Coding Regions In Genomic DNA". Journal of Molecular Biology. 248 (1): 1–18. doi:10.1006/jmbi.1995.0198. ISSN 0022-2836. PMID 7731036.
  19. Lukashin AV, Borodovsky M (February 1998). "GeneMark.hmm: new solutions for gene finding". Nucleic Acids Research. 26 (4): 1107–15. doi:10.1093/nar/26.4.1107. PMC 147337. PMID 9461475.
  20. Besemer J, Lomsadze A, Borodovsky M (June 2001). "GeneMarkS: a self-training method for prediction of gene starts in microbial genomes. Implications for finding sequence motifs in regulatory regions". Nucleic Acids Research. 29 (12): 2607–18. doi:10.1093/nar/29.12.2607. PMC 55746. PMID 11410670.
  21. Lomsadze A, Burns PD, Borodovsky M (September 2014). "Integration of mapped RNA-Seq reads into automatic training of eukaryotic gene finding algorithm". Nucleic Acids Research. 42 (15): e119. doi:10.1093/nar/gku557. PMC 4150757. PMID 24990371.
  22. Zhu W, Lomsadze A, Borodovsky M (July 2010). "Ab initio gene identification in metagenomic sequences". Nucleic Acids Research. 38 (12): e132. doi:10.1093/nar/gkq275. PMC 2896542. PMID 20403810.
  23. Antonov I, Borodovsky M (June 2010). "Genetack: frameshift identification in protein-coding sequences by the Viterbi algorithm". Journal of Bioinformatics and Computational Biology. 8 (3): 535–51. doi:10.1142/S0219720010004847. PMID 20556861.
  24. Yeh, Ru-Fang; Lim, Lee P.; Burge, Christopher B. (2001-05-01). "Computational Inference of Homologous Gene Structures in the Human Genome". Genome Research. 11 (5): 803–816. doi:10.1101/gr.175701. ISSN 1088-9051. PMC 311055. PMID 11337476.
  25. Burge, Chris; Karlin, Samuel (1997-04-25). "Prediction of complete gene structures in human genomic DNA". Journal of Molecular Biology. 268 (1): 78–94. doi:10.1006/jmbi.1997.0951. ISSN 0022-2836. PMID 9149143.
  26. Burge, Christopher B. (1998-01-01), Salzberg, Steven L.; Searls, David B.; Kasif, Simon (eds.), "Chapter 8 - Modeling dependencies in pre-mRNA splicing signals", New Comprehensive Biochemistry, Computational Methods in Molecular Biology, vol. 32, Elsevier, pp. 129–164, doi:10.1016/S0167-7306(08)60465-2, ISBN 978-0-444-82875-0, retrieved 2021-11-24
  27. Burge, Christopher B; Karlin, Samuel (1998-06-01). "Finding the genes in genomic DNA". Current Opinion in Structural Biology. 8 (3): 346–354. doi:10.1016/S0959-440X(98)80069-9. ISSN 0959-440X. PMID 9666331.
  28. Delcher, Arthur L.; Bratke, Kirsten A.; Powers, Edwin C.; Salzberg, Steven L. (2007-01-19). "Identifying bacterial genes and endosymbiont DNA with Glimmer". Bioinformatics. 23 (6): 673–679. doi:10.1093/bioinformatics/btm009. ISSN 1460-2059. PMC 2387122. PMID 17237039.
  29. Delcher, A. (1999-12-01). "Improved microbial gene identification with GLIMMER". Nucleic Acids Research. 27 (23): 4636–4641. doi:10.1093/nar/27.23.4636. ISSN 1362-4962. PMC 148753. PMID 10556321.
  30. Salzberg, S. L.; Delcher, A. L.; Kasif, S.; White, O. (1998-01-01). "Microbial gene identification using interpolated Markov models". Nucleic Acids Research. 26 (2): 544–548. doi:10.1093/nar/26.2.544. ISSN 0305-1048. PMC 147303. PMID 9421513.
  31. Majoros WH, Pertea M, Salzberg SL (November 2004). "TigrScan and GlimmerHMM: two open source ab initio eukaryotic gene-finders". Bioinformatics. 20 (16): 2878–9. doi:10.1093/bioinformatics/bth315. PMID 15145805.
  32. Uberbacher, Edward C.; Hyatt, Doug; Shah, Manesh (2004). "GrailEXP and Genome Analysis Pipeline for Genome Annotation". Current Protocols in Bioinformatics. 8 (1): 4.9.1–4.9.15. doi:10.1002/0471250953.bi0409s04. ISSN 1934-340X. PMID 18428726.
  33. Uberbacher, Edward C.; Hyatt, Doug; Shah, Manesh (2003). "GrailEXP and Genome Analysis Pipeline for Genome Annotation". Current Protocols in Human Genetics. 39 (1): 6.5.1–6.5.15. doi:10.1002/0471142905.hg0605s39. ISSN 1934-8258. PMID 18428363. S2CID 21431978.
  34. Schweikert G, Zien A, Zeller G, Behr J, Dieterich C, Ong CS, et al. (November 2009). "mGene: accurate SVM-based gene finding with an application to nematode genomes". Genome Research. 19 (11): 2133–43. doi:10.1101/gr.090597.108. PMC 2775605. PMID 19564452.
  35. Gan X, Stegle O, Behr J, Steffen JG, Drewe P, Hildebrand KL, et al. (August 2011). "Multiple reference genomes and transcriptomes for Arabidopsis thaliana". Nature. 477 (7365): 419–23. Bibcode:2011Natur.477..419G. doi:10.1038/nature10414. PMC 4856438. PMID 21874022.
  36. "MORGAN". sites.stat.washington.edu. Retrieved 2021-11-24.
  37. Bedő, Justin; Di Stefano, Leon; Papenfuss, Anthony T (November 2020). "Unifying package managers, workflow engines, and containers: Computational reproducibility with BioNix". GigaScience. 9 (11). doi:10.1093/gigascience/giaa121. ISSN 2047-217X. PMC 7672450. PMID 33205815.
  38. Reese, Martin G (2001-12-01). "Application of a time-delay neural network to promoter annotation in the Drosophila melanogaster genome". Computers & Chemistry. 26 (1): 51–56. doi:10.1016/S0097-8485(01)00099-7. ISSN 0097-8485. PMID 11765852.
  39. Reese, Martin G.; Eeckman, Frank H.; Kulp, David; Haussler, David (1997-01-01). "Improved Splice Site Detection in Genie". Journal of Computational Biology. 4 (3): 311–323. doi:10.1089/cmb.1997.4.311. PMID 9278062.
  40. "Home - ORFfinder - NCBI". www.ncbi.nlm.nih.gov. Retrieved 2021-11-24.
  41. Santana-Garcia, Walter; Rocha-Acevedo, Maria; Ramirez-Navarro, Lucia; Mbouamboua, Yvon; Thieffry, Denis; Thomas-Chollier, Morgane; Contreras-Moreira, Bruno; van Helden, Jacques; Medina-Rivera, Alejandra (2019-01-01). "RSAT variation-tools: An accessible and flexible framework to predict the impact of regulatory variants on transcription factor binding". Computational and Structural Biotechnology Journal. 17: 1415–1428. doi:10.1016/j.csbj.2019.09.009. ISSN 2001-0370. PMC 6906655. PMID 31871587.
  42. Nguyen, Nga Thi Thuy; Contreras-Moreira, Bruno; Castro-Mondragon, Jaime A; Santana-Garcia, Walter; Ossio, Raul; Robles-Espinoza, Carla Daniela; Bahin, Mathieu; Collombet, Samuel; Vincens, Pierre; Thieffry, Denis; van Helden, Jacques (2018-05-02). "RSAT 2018: regulatory sequence analysis tools 20th anniversary". Nucleic Acids Research. 46 (W1): W209–W214. doi:10.1093/nar/gky317. ISSN 0305-1048. PMC 6030903. PMID 29722874.
  43. McNair, Katelyn; Zhou, Carol; Dinsdale, Elizabeth A.; Souza, Brian; Edwards, Robert A. (2019-11-01). "PHANOTATE: a novel approach to gene identification in phage genomes". Bioinformatics. 35 (22): 4537–4542. doi:10.1093/bioinformatics/btz265. ISSN 1367-4803. PMC 6853651. PMID 31329826.
  44. Brendel, V.; Xing, L.; Zhu, W. (2004-02-05). "Gene structure prediction from consensus spliced alignment of multiple ESTs matching the same genomic locus". Bioinformatics. 20 (7): 1157–1169. doi:10.1093/bioinformatics/bth058. ISSN 1367-4803. PMID 14764557.
  45. Henderson, John; Salzberg, Steven; Fasman, Kenneth H. (1997-01-01). "Finding Genes in DNA with a Hidden Markov Model". Journal of Computational Biology. 4 (2): 127–141. doi:10.1089/cmb.1997.4.127. hdl:1903/8004. PMID 9228612.