List of gene prediction software

Last updated

This is a list of software tools and web portals used for gene prediction.

NameDescriptionSpeciesReferences
FINDER Automated software package to annotate eukaryotic genes from RNA-Seq data and associated protein sequencesEukaryotes [1]
FragGeneScan Predicting genes in complete genomes and sequencing ReadsProkaryotes, Metagenomes [2]
ATGpr Identifies translational initiation sites in cDNA sequencesHuman [3]
Prodigal Its name stands for Prokaryotic Dynamic Programming Genefinding Algorithm. It is based on log-likelihood functions and does not use Hidden or Interpolated Markov Models.Prokaryotes, Metagenomes (metaProdigal) [4]
AUGUSTUS Eukaryote gene predictorEukaryotes [5]
BGF Hidden Markov model (HMM) and dynamic programming based ab initio gene prediction program [6]
DIOGENESFast detection of coding regions in short genome sequences
Dragon Promoter FinderProgram to recognize vertebrate RNA polymerase II promotersVertebrates [7]
EasyGene The gene finder is based on a hidden Markov model (HMM) that is automatically estimated for a new genome.Prokaryotes [8] [9]
EuGene Integrative gene findingProkaryotes, Eukaryotes [10] [11]
FGENESH HMM-based gene structure prediction: multiple genes, both chainsEukaryotes [12]
FrameD Find genes and frameshift in G+C rich prokaryote sequencesProkaryotes, Eukaryotes [13]
GeMoMa Homology-based gene prediction based on amino acid and intron position conservation as well as RNA-Seq data [14] [15]
GENIUS IILinks ORFs in complete genomes to protein 3D structuresProkaryotes, Eukaryotes [16]
geneid Program to predict genes, exons, splice sites, and other signals along DNA sequencesEukaryotes [17]
GeneParser Parse DNA sequences into introns and exonsEukaryotes [18]
GeneMark Family of self-training gene prediction programsProkaryotes, Eukaryotes,

Metagenomes

[19] [20] [21] [22]
GeneTack Predicts genes with frameshifts in prokaryote genomesProkaryotes [23]
GenomeScan Predicts the locations and exon-intron structures of genes in genome sequences from a variety of organisms, GENSCAN server is the GenomeScan's predecessorVertebrate, Arabidopsis, Maize [24]
GENSCAN Predicts the locations and exon-intron structures of genes in genome sequences from a variety of organismsVertebrate, Arabidopsis, Maize [25] [26] [27]
GLIMMER Finds genes in microbial DNAProkaryotes [28] [29] [30]
GLIMMERHMM Eukaryotic gene-finding systemEukaryotes [31]
GrailEXPPredicts exons, genes, promoters, polyas, CpG islands, EST similarities, and repeat elements in DNA sequenceHuman, Mus musculus, Arabidopsis thaliana, Drosophila melanogaster [32] [33]
mGene Support-vector machine (SVM) based system to find genesEukaryotes [34]
mGene.ngsSVM based system to find genes using heterogeneous information: RNA-seq, tiling arraysEukaryotes [35]
MORGAN Decision tree system to find genes in vertebrate DNAEukaryotes [36]
BioNIX Web tool to combine results from different programs: GRAIL, FEX, HEXON, MZEF, GENEMARK, GENEFINDER, FGENE, BLAST, POLYAH, REPEATMASKER, TRNASCANProkaryotes, Eukaryotes [37]
NNPP Neural network promoter predictionProkaryotes, Eukaryotes [38]
NNSPLICE Neural network splice site predictionDrosophila, Human [39]
ORFfinder Graphical analysis tool to find all open reading frames Prokaryotes, Eukaryotes [40]
Regulatory Sequence Analysis Tools Series of modular computer programs to detect regulatory signals in non-coding sequencesFungi, Prokaryotes, Metazoa, Protist, Plants [41] [42]
PHANOTATE A tool to annotate phage genomes.Phages [43]
SplicePredictorMethod to identify potential splice sites in (plant) pre-mRNA by sequence inspection using Bayesian statistical modelsEukaryotes [44]
VEIL Hidden Markov model to find genes in vertebrate DNA ServerEukaryotes [45]

See also

Related Research Articles

BioJava is an open-source software project dedicated to provide Java tools to process biological data. BioJava is a set of library functions written in the programming language Java for manipulating sequences, protein structures, file parsers, Common Object Request Broker Architecture (CORBA) interoperability, Distributed Annotation System (DAS), access to AceDB, dynamic programming, and simple statistical routines. BioJava supports a huge range of data, starting from DNA and protein sequences to the level of 3D protein structures. The BioJava libraries are useful for automating many daily and mundane bioinformatics tasks such as to parsing a Protein Data Bank (PDB) file, interacting with Jmol and many more. This application programming interface (API) provides various file parsers, data models and algorithms to facilitate working with the standard data formats and enables rapid application development and analysis.

In computational biology, gene prediction or gene finding refers to the process of identifying the regions of genomic DNA that encode genes. This includes protein-coding genes as well as RNA genes, but may also include prediction of other functional elements such as regulatory regions. Gene finding is one of the first and most important steps in understanding the genome of a species once it has been sequenced.

<span class="mw-page-title-main">Sequence homology</span> Shared ancestry between DNA, RNA or protein sequences

Sequence homology is the biological homology between DNA, RNA, or protein sequences, defined in terms of shared ancestry in the evolutionary history of life. Two segments of DNA can have shared ancestry because of three phenomena: either a speciation event (orthologs), or a duplication event (paralogs), or else a horizontal gene transfer event (xenologs).

Nucleic acid structure prediction is a computational method to determine secondary and tertiary nucleic acid structure from its sequence. Secondary structure can be predicted from one or several nucleic acid sequences. Tertiary structure can be predicted from the sequence, or by comparative modeling.

<span class="mw-page-title-main">Steven Salzberg</span> American biologist and computer scientist

Steven Lloyd Salzberg is an American computational biologist and computer scientist who is a Bloomberg Distinguished Professor of Biomedical Engineering, Computer Science, and Biostatistics at Johns Hopkins University, where he is also Director of the Center for Computational Biology.

Rfam is a database containing information about non-coding RNA (ncRNA) families and other structured RNA elements. It is an annotated, open access database originally developed at the Wellcome Trust Sanger Institute in collaboration with Janelia Farm, and currently hosted at the European Bioinformatics Institute. Rfam is designed to be similar to the Pfam database for annotating protein families.

<i>k</i>-mer

In bioinformatics, k-mers are substrings of length contained within a biological sequence. Primarily used within the context of computational genomics and sequence analysis, in which k-mers are composed of nucleotides, k-mers are capitalized upon to assemble DNA sequences, improve heterologous gene expression, identify species in metagenomic samples, and create attenuated vaccines. Usually, the term k-mer refers to all of a sequence's subsequences of length , such that the sequence AGAT would have four monomers, three 2-mers, two 3-mers and one 4-mer (AGAT). More generally, a sequence of length will have k-mers and total possible k-mers, where is number of possible monomers.

A ribosome binding site, or ribosomal binding site (RBS), is a sequence of nucleotides upstream of the start codon of an mRNA transcript that is responsible for the recruitment of a ribosome during the initiation of translation. Mostly, RBS refers to bacterial sequences, although internal ribosome entry sites (IRES) have been described in mRNAs of eukaryotic cells or viruses that infect eukaryotes. Ribosome recruitment in eukaryotes is generally mediated by the 5' cap present on eukaryotic mRNAs.

MUMmer is a bioinformatics software system for sequence alignment. It is based on the suffix tree data structure and is one of the fastest and most efficient systems available for this task, enabling it to be applied to very long sequences. It has been widely used for comparing different genomes to one another. In recent years, it has become a popular algorithm for comparing genome assemblies to one another, which allows scientists to determine how a genome has changed after adding more DNA sequence or after running a different genome assembly program. The acronym "MUMmer" comes from "Maximal Unique Matches", or MUMs. The original algorithms in the MUMMER software package were designed by Art Delcher, Simon Kasif and Steven Salzberg. Mummer was the first whole genome comparison system developed in Bioinformatics. It was originally applied to comparison of two related strains of bacteria.

GeneMark is a generic name for a family of ab initio gene prediction programs developed at the Georgia Institute of Technology in Atlanta. Developed in 1993, original GeneMark was used in 1995 as a primary gene prediction tool for annotation of the first completely sequenced bacterial genome of Haemophilus influenzae, and in 1996 for the first archaeal genome of Methanococcus jannaschii. The algorithm introduced inhomogeneous three-periodic Markov chain models of protein-coding DNA sequence that became standard in gene prediction as well as Bayesian approach to gene prediction in two DNA strands simultaneously. Species specific parameters of the models were estimated from training sets of sequences of known type. The major step of the algorithm computes for a given DNA fragment posterior probabilities of either being "protein-coding" in each of six possible reading frames or being "non-coding". Original GeneMark is an HMM-like algorithm; it can be viewed as approximation to known in the HMM theory posterior decoding algorithm for appropriately defined HMM.

Anders Krogh is a bioinformatician at the University of Copenhagen, where he leads the university's bioinformatics center. He is known for his pioneering work on the use of hidden Markov models in bioinformatics, and is co-author of a widely used textbook in bioinformatics. In addition, he also co-authored one of the early textbooks on neural networks. His current research interests include promoter analysis, non-coding RNA, gene prediction and protein structure prediction.

SUPERFAMILY is a database and search platform of structural and functional annotation for all proteins and genomes. It classifies amino acid sequences into known structural domains, especially into SCOP superfamilies. Domains are functional, structural, and evolutionary units that form proteins. Domains of common Ancestry are grouped into superfamilies. The domains and domain superfamilies are defined and described in SCOP. Superfamilies are groups of proteins which have structural evidence to support a common evolutionary ancestor but may not have detectable sequence homology.

<span class="mw-page-title-main">Sean Eddy</span> American professor at Harvard University

Sean Roberts Eddy is Professor of Molecular & Cellular Biology and of Applied Mathematics at Harvard University. Previously he was based at the Janelia Research Campus from 2006 to 2015 in Virginia. His research interests are in bioinformatics, computational biology and biological sequence analysis. As of 2016 projects include the use of Hidden Markov models in HMMER, Infernal Pfam and Rfam.

Molecular recognition features (MoRFs) are small intrinsically disordered regions in proteins that undergo a disorder-to-order transition upon binding to their partners. MoRFs are implicated in protein-protein interactions, which serve as the initial step in molecular recognition. MoRFs are disordered prior to binding to their partners, whereas they form a common 3D structure after interacting with their partners. As MoRF regions tend to resemble disordered proteins with some characteristics of ordered proteins, they can be classified as existing in an extended semi-disordered state.

SEA-PHAGES stands for Science Education Alliance-Phage Hunters Advancing Genomics and Evolutionary Science; it was formerly called the National Genomics Research Initiative. This was the first initiative launched by the Howard Hughes Medical Institute (HHMI) Science Education Alliance (SEA) by their director Tuajuanda C. Jordan in 2008 to improve the retention of Science, technology, engineering, and mathematics (STEM) students. SEA-PHAGES is a two-semester undergraduate research program administered by the University of Pittsburgh's Graham Hatfull's group and the Howard Hughes Medical Institute's Science Education Division. Students from over 100 universities nationwide engage in authentic individual research that includes a wet-bench laboratory and a bioinformatics component.

<span class="mw-page-title-main">Wojciech Karlowski</span> Polish biologist specializing in molecular biology and bioinformatics

Wojciech Maciej Karlowski is a Polish biologist specializing in molecular biology and bioinformatics, and a full professor in biological sciences. He is Head of the Department of Computational Biology at the Faculty of Biology at the Adam Mickiewicz University in Poznan. His major scientific interests include identification of non-coding RNAs, genomics, high-throughput analyses, and functional annotation of biological sequences.

References

  1. Banerjee S, Bhandary P, Woodhouse M, Sen TZ, Wise RP, Andorf CM (Apr 2021). "FINDER: an automated software package to annotate eukaryotic genes from RNA-Seq data and associated protein sequences". BMC Bioinformatics. 44 (9): e89. doi: 10.1186/s12859-021-04120-9 . PMC   8056616 . PMID   33879057.
  2. Rho M, Tang H, Ye Y (November 2010). "FragGeneScan: predicting genes in short and error-prone reads". Nucleic Acids Research. 38 (20): e191. doi:10.1093/nar/gkq747. PMC   2978382 . PMID   20805240.
  3. Nishikawa, Tetsuo; Ota, Toshio; Isogai, Takao (2000-11-01). "Prediction whether a human cDNA sequence contains initiation codon by combining statistical information and similarity with protein sequences". Bioinformatics. 16 (11): 960–967. doi: 10.1093/bioinformatics/16.11.960 . ISSN   1367-4803. PMID   11159307.
  4. Hyatt D, Chen GL, Locascio PF, Land ML, Larimer FW, Hauser LJ (March 2010). "Prodigal: prokaryotic gene recognition and translation initiation site identification". BMC Bioinformatics. 11: 119. doi:10.1186/1471-2105-11-119. PMC   2848648 . PMID   20211023.
  5. Keller O, Kollmar M, Stanke M, Waack S (March 2011). "A novel hybrid gene prediction method employing protein multiple sequence alignments". Bioinformatics. 27 (6): 757–63. doi: 10.1093/bioinformatics/btr010 . PMID   21216780.
  6. Li, Heng; Liu, Jin-Song; Xu, Zhao; Jin, Jiao; Fang, Lin; Gao, Lei; Li, Yu-Dong; Xing, Zi-Xing; Gao, Shao-Gen; Liu, Tao; Li, Hai-Hong (2005-07-01). "Test Data Sets and Evaluation of Gene Prediction Programs on the Rice Genome". Journal of Computer Science and Technology. 20 (4): 446–453. doi:10.1007/s11390-005-0446-x. ISSN   1860-4749. S2CID   13497894.
  7. Bajic, Vladimir B.; Seah, Seng Hong; Chong, Allen; Zhang, Guanglan; Koh, Judice L. Y.; Brusic, Vladimir (2002-01-01). "Dragon Promoter Finder: recognition of vertebrate RNA polymerase II promoters". Bioinformatics. 18 (1): 198–199. doi: 10.1093/bioinformatics/18.1.198 . ISSN   1367-4803. PMID   11836231.
  8. Nielsen, P.; Krogh, A. (2005-12-15). "Large-scale prokaryotic gene prediction and comparison to genome annotation". Bioinformatics. 21 (24): 4322–4329. doi: 10.1093/bioinformatics/bti701 . ISSN   1367-4803. PMID   16249266.
  9. Larsen, Thomas Schou; Krogh, Anders (2003-06-03). "EasyGene – a prokaryotic gene finder that ranks ORFs by statistical significance". BMC Bioinformatics. 4 (1): 21. doi:10.1186/1471-2105-4-21. ISSN   1471-2105. PMC   521197 . PMID   12783628.
  10. Foissac S, Gouzy J, Rombauts S, Mathé C, Amselem J, Sterck L, de Peer YV, Rouzé P, Schiex T (May 2008). "Genome annotation in plants and fungi: EuGene as a model platform". Current Bioinformatics. 3 (2): 87–97. doi:10.2174/157489308784340702.
  11. Sallet, Erika; Gouzy, Jérôme; Schiex, Thomas (2019), Kollmar, Martin (ed.), "EuGene: An Automated Integrative Gene Finder for Eukaryotes and Prokaryotes", Gene Prediction: Methods and Protocols, Methods in Molecular Biology, New York, NY: Springer, vol. 1962, pp. 97–120, doi:10.1007/978-1-4939-9173-0_6, ISBN   978-1-4939-9173-0, PMID   31020556, S2CID   131776381 , retrieved 2021-11-24
  12. Salamov AA, Solovyev VV (April 2000). "Ab initio gene finding in Drosophila genomic DNA". Genome Research. 10 (4): 516–22. doi:10.1101/gr.10.4.516. PMC   310882 . PMID   10779491.
  13. Schiex T, Gouzy J, Moisan A, de Oliveira Y (July 2003). "FrameD: A flexible program for quality check and gene prediction in prokaryotic genomes and noisy matured eukaryotic sequences". Nucleic Acids Research. 31 (13): 3738–41. doi:10.1093/nar/gkg610. PMC   169016 . PMID   12824407.
  14. Keilwagen J, Wenk M, Erickson JL, Schattat MH, Grau J, Hartung F (May 2016). "Using intron position conservation for homology-based gene prediction". Nucleic Acids Research. 44 (9): e89. doi:10.1186/s12859-018-2203-5. PMC   4872089 . PMID   26893356.
  15. Keilwagen J, Hartung F, Paulini M, Twardziok SO, Grau J (May 2018). "Combining RNA-seq data and homology-based gene prediction for plants, animals and fungi". BMC Bioinformatics. 19 (1): 189. doi:10.1093/nar/gkw092. PMC   5975413 . PMID   29843602.
  16. Yabuki, Yukimitsu; Mukai, Yuri; Swindells, Mark B.; Suwa, Makiko (2004-03-01). "GENIUS II: a high-throughput database system for linking ORFs in complete genomes to known protein three-dimensional structures". Bioinformatics. 20 (4): 596–598. doi: 10.1093/bioinformatics/btg478 . ISSN   1367-4803. PMID   14751990.
  17. Blanco, Enrique; Parra, Genís; Guigó, Roderic (June 2007), "Using geneid to Identify Genes", Current Protocols in Bioinformatics, John Wiley & Sons, Inc., Chapter 4: 4.3.1–4.3.28, doi:10.1002/0471250953.bi0403s18, ISBN   978-0471250951, PMID   18428791
  18. Snyder, Eric E.; Stormo, Gary D. (1995-04-21). "Identification of Protein Coding Regions In Genomic DNA". Journal of Molecular Biology. 248 (1): 1–18. doi:10.1006/jmbi.1995.0198. ISSN   0022-2836. PMID   7731036.
  19. Lukashin AV, Borodovsky M (February 1998). "GeneMark.hmm: new solutions for gene finding". Nucleic Acids Research. 26 (4): 1107–15. doi:10.1093/nar/26.4.1107. PMC   147337 . PMID   9461475.
  20. Besemer J, Lomsadze A, Borodovsky M (June 2001). "GeneMarkS: a self-training method for prediction of gene starts in microbial genomes. Implications for finding sequence motifs in regulatory regions". Nucleic Acids Research. 29 (12): 2607–18. doi:10.1093/nar/29.12.2607. PMC   55746 . PMID   11410670.
  21. Lomsadze A, Burns PD, Borodovsky M (September 2014). "Integration of mapped RNA-Seq reads into automatic training of eukaryotic gene finding algorithm". Nucleic Acids Research. 42 (15): e119. doi:10.1093/nar/gku557. PMC   4150757 . PMID   24990371.
  22. Zhu W, Lomsadze A, Borodovsky M (July 2010). "Ab initio gene identification in metagenomic sequences". Nucleic Acids Research. 38 (12): e132. doi:10.1093/nar/gkq275. PMC   2896542 . PMID   20403810.
  23. Antonov I, Borodovsky M (June 2010). "Genetack: frameshift identification in protein-coding sequences by the Viterbi algorithm". Journal of Bioinformatics and Computational Biology. 8 (3): 535–51. doi: 10.1142/S0219720010004847 . PMID   20556861.
  24. Yeh, Ru-Fang; Lim, Lee P.; Burge, Christopher B. (2001-05-01). "Computational Inference of Homologous Gene Structures in the Human Genome". Genome Research. 11 (5): 803–816. doi:10.1101/gr.175701. ISSN   1088-9051. PMC   311055 . PMID   11337476.
  25. Burge, Chris; Karlin, Samuel (1997-04-25). "Prediction of complete gene structures in human genomic DNA11Edited by F. E. Cohen". Journal of Molecular Biology. 268 (1): 78–94. doi:10.1006/jmbi.1997.0951. ISSN   0022-2836. PMID   9149143.
  26. Burge, Christopher B. (1998-01-01), Salzberg, Steven L.; Searls, David B.; Kasif, Simon (eds.), "Chapter 8 - Modeling dependencies in pre-mRNA splicing signals", New Comprehensive Biochemistry, Computational Methods in Molecular Biology, Elsevier, vol. 32, pp. 129–164, retrieved 2021-11-24
  27. Burge, Christopher B; Karlin, Samuel (1998-06-01). "Finding the genes in genomic DNA". Current Opinion in Structural Biology. 8 (3): 346–354. doi:10.1016/S0959-440X(98)80069-9. ISSN   0959-440X. PMID   9666331.
  28. Delcher, Arthur L.; Bratke, Kirsten A.; Powers, Edwin C.; Salzberg, Steven L. (2007-01-19). "Identifying bacterial genes and endosymbiont DNA with Glimmer". Bioinformatics. 23 (6): 673–679. doi:10.1093/bioinformatics/btm009. ISSN   1460-2059. PMC   2387122 . PMID   17237039.
  29. Delcher, A. (1999-12-01). "Improved microbial gene identification with GLIMMER". Nucleic Acids Research. 27 (23): 4636–4641. doi:10.1093/nar/27.23.4636. ISSN   1362-4962. PMC   148753 . PMID   10556321.
  30. Salzberg, S. L.; Delcher, A. L.; Kasif, S.; White, O. (1998-01-01). "Microbial gene identification using interpolated Markov models". Nucleic Acids Research. 26 (2): 544–548. doi:10.1093/nar/26.2.544. ISSN   0305-1048. PMC   147303 . PMID   9421513.
  31. Majoros WH, Pertea M, Salzberg SL (November 2004). "TigrScan and GlimmerHMM: two open source ab initio eukaryotic gene-finders". Bioinformatics. 20 (16): 2878–9. doi: 10.1093/bioinformatics/bth315 . PMID   15145805.
  32. Uberbacher, Edward C.; Hyatt, Doug; Shah, Manesh (2004). "GrailEXP and Genome Analysis Pipeline for Genome Annotation". Current Protocols in Bioinformatics. 8 (1): 4.9.1–4.9.15. doi:10.1002/0471250953.bi0409s04. ISSN   1934-340X. PMID   18428726.
  33. Uberbacher, Edward C.; Hyatt, Doug; Shah, Manesh (2003). "GrailEXP and Genome Analysis Pipeline for Genome Annotation". Current Protocols in Human Genetics. 39 (1): 6.5.1–6.5.15. doi:10.1002/0471142905.hg0605s39. ISSN   1934-8258. PMID   18428363. S2CID   21431978.
  34. Schweikert G, Zien A, Zeller G, Behr J, Dieterich C, Ong CS, et al. (November 2009). "mGene: accurate SVM-based gene finding with an application to nematode genomes". Genome Research. 19 (11): 2133–43. doi:10.1101/gr.090597.108. PMC   2775605 . PMID   19564452.
  35. Gan X, Stegle O, Behr J, Steffen JG, Drewe P, Hildebrand KL, et al. (August 2011). "Multiple reference genomes and transcriptomes for Arabidopsis thaliana". Nature. 477 (7365): 419–23. Bibcode:2011Natur.477..419G. doi:10.1038/nature10414. PMC   4856438 . PMID   21874022.
  36. "MORGAN". sites.stat.washington.edu. Retrieved 2021-11-24.
  37. Bedő, Justin; Di Stefano, Leon; Papenfuss, Anthony T (November 2020). "Unifying package managers, workflow engines, and containers: Computational reproducibility with BioNix". GigaScience. 9 (11). doi:10.1093/gigascience/giaa121. ISSN   2047-217X. PMC   7672450 . PMID   33205815.
  38. Reese, Martin G (2001-12-01). "Application of a time-delay neural network to promoter annotation in the Drosophila melanogaster genome". Computers & Chemistry. 26 (1): 51–56. doi:10.1016/S0097-8485(01)00099-7. ISSN   0097-8485. PMID   11765852.
  39. Reese, Martin G.; Eeckman, Frank H.; Kulp, David; Haussler, David (1997-01-01). "Improved Splice Site Detection in Genie". Journal of Computational Biology. 4 (3): 311–323. doi:10.1089/cmb.1997.4.311. PMID   9278062.
  40. "Home - ORFfinder - NCBI". www.ncbi.nlm.nih.gov. Retrieved 2021-11-24.
  41. Santana-Garcia, Walter; Rocha-Acevedo, Maria; Ramirez-Navarro, Lucia; Mbouamboua, Yvon; Thieffry, Denis; Thomas-Chollier, Morgane; Contreras-Moreira, Bruno; van Helden, Jacques; Medina-Rivera, Alejandra (2019-01-01). "RSAT variation-tools: An accessible and flexible framework to predict the impact of regulatory variants on transcription factor binding". Computational and Structural Biotechnology Journal. 17: 1415–1428. doi:10.1016/j.csbj.2019.09.009. ISSN   2001-0370. PMC   6906655 . PMID   31871587.
  42. Nguyen, Nga Thi Thuy; Contreras-Moreira, Bruno; Castro-Mondragon, Jaime A; Santana-Garcia, Walter; Ossio, Raul; Robles-Espinoza, Carla Daniela; Bahin, Mathieu; Collombet, Samuel; Vincens, Pierre; Thieffry, Denis; van Helden, Jacques (2018-05-02). "RSAT 2018: regulatory sequence analysis tools 20th anniversary". Nucleic Acids Research. 46 (W1): W209–W214. doi:10.1093/nar/gky317. ISSN   0305-1048. PMC   6030903 . PMID   29722874.
  43. McNair, Katelyn; Zhou, Carol; Dinsdale, Elizabeth A.; Souza, Brian; Edwards, Robert A. (2019-11-01). "PHANOTATE: a novel approach to gene identification in phage genomes". Bioinformatics. 35 (22): 4537–4542. doi: 10.1093/bioinformatics/btz265 . ISSN   1367-4803. PMC   6853651 . PMID   31329826.
  44. Brendel, V.; Xing, L.; Zhu, W. (2004-02-05). "Gene structure prediction from consensus spliced alignment of multiple ESTs matching the same genomic locus". Bioinformatics. 20 (7): 1157–1169. doi: 10.1093/bioinformatics/bth058 . ISSN   1367-4803. PMID   14764557.
  45. Henderson, John; Salzberg, Steven; Fasman, Kenneth H. (1997-01-01). "Finding Genes in DNA with a Hidden Markov Model". Journal of Computational Biology. 4 (2): 127–141. doi:10.1089/cmb.1997.4.127. hdl: 1903/8004 . PMID   9228612.