In genetics, the gene density of an organism's genome is the number of genes per unit length of DNA, usually expressed as genes per million base pairs, or megabase (Mb). The human genome has a gene density of 11-15 genes/Mb, while the genome of the roundworm C. elegans is estimated to have about 200 genes/Mb. [1]
Seemingly simple organisms, such as bacteria and amoebas, have a much higher gene density than humans. Bacterial DNA has a gene density on the order of 500-1000 genes/Mb. This is due to several factors, including the fact that bacterial DNA has no introns. Bacterial genes also contain fewer codons. [2]
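The definition above amounts to a single division, which can be sketched in Python as follows. The function name `gene_density` is an illustrative choice, not something taken from a cited source; the input figures are the gene counts and genome sizes listed in the tables of this article.

```python
def gene_density(gene_count: int, genome_size_mb: float) -> float:
    """Return the gene density in genes per megabase (genes/Mb)."""
    if genome_size_mb <= 0:
        raise ValueError("genome size must be a positive number of megabases")
    return gene_count / genome_size_mb


# Figures taken from the tables in this article.
babesia = gene_density(3685, 6.4)     # Babesia microti
peach = gene_density(26412, 214.2)    # Prunus persica

print(round(babesia, 1))  # 575.8
print(round(peach, 1))    # 123.3
```

Because published genome sizes are usually rounded, a density recomputed this way may differ in the last digit from one calculated on the unrounded assembly length.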
species | number of genes | genome size (Mb) | gene density (genes/Mb) |
---|---|---|---|
Babesia microti | 3,685 [3] | 6.4 [4] | 575.8 |
Besnoitia besnoiti | 8,546 [5] | 58.8 [6] | 145.3 |
Cryptosporidium parvum | 3,887 [7] | 9.1 [8] | 427.1 |
Giardia intestinalis | 5,388 [9] | 11.5 [10] | 468.5 |
Leishmania donovani | 8,123 [11] | 32.4 [12] | 250.7 |
Plasmodium falciparum | 5,683 [13] | 23.5 [14] | 241.8 |
Phytophthora infestans | 19,344 [15] | 203.2 [16] | 95.2 |
Tetrahymena thermophila | 26,996 [17] | 130.4 [18] | 207 |
Theileria parva | 4,141 [19] | 8.3 [20] | 498.9 |
Toxoplasma gondii | 8,925 [21] | 64.5 [22] | 138.4 |
Trypanosoma cruzi | 25,183 [23] | 32.5 [24] | 774.9 |
species | common name | number of genes | genome size (Mb) | gene density (genes/Mb) |
---|---|---|---|---|
Ananas comosus | pineapple | 25,758 | 382 [69] | 67.4 |
Arabidopsis thaliana | thale cress | 38,311 [70] | 119 [71] | 321.9 |
Asparagus officinalis | asparagus | 32,237 [72] | 1,187 [73] | 27.2 |
Brassica rapa | field mustard | 51,592 [74] | 333 [75] | 154.9 |
Cannabis sativa | cannabis | 31,170 [76] | 876 [77] | 35.6 |
Cajanus cajan | pigeon pea | 33,495 [78] | 593 [79] | 56.5 |
Capsicum annuum | pepper | 41,729 [80] | 2,909 [81] | 14.3 |
Coffea arabica | coffee | 56,902 [82] | 1,094 [83] | 52 |
Glycine max | soybean | 59,905 [84] | 995.7 [85] | 60.2 |
Juglans regia | English walnut | 37,996 [86] | 634.7 [87] | 59.9 |
Nicotiana tabacum | tobacco | 74,273 [88] | 3,734.2 [89] | 19.9 |
Olea europaea | olive | 47,911 [90] | 1,316.7 [91] | 36.4 |
Oryza sativa | rice | 35,223 [92] | 385 [93] | 91.5 |
Panicum hallii | grass | 31,528 [94] | 511.6 [95] | 61.6 |
Physcomitrium patens | spreading earthmoss | 23,747 [96] | 472 [97] | 50.3 |
Prunus dulcis | almond | 26,936 [98] | 236.9 [99] | 113.7 |
Prunus persica | peach | 26,412 [100] | 214.2 [101] | 123.3 |
Punica granatum | pomegranate | 29,281 [102] | 308.4 [103] | 94.9 |
Rosa chinensis | rose | 40,349 [104] | 513.9 [105] | 78.5 |
Solanum lycopersicum | tomato | 31,263 [106] | 813 [107] | 38.5 |
Solanum tuberosum | potato | 33,606 [108] | 760 [109] | 44.2 |
Theobroma cacao | cocoa | 24,957 [110] | 335 [111] | 74.5 |
Zea mays | corn | 49,796 [112] | 2,197 [113] | 22.7 |
species | common name | number of genes | genome size (Mb) | gene density (genes/Mb) |
---|---|---|---|---|
Aedes aegypti | yellow fever mosquito | 19,623 [114] | 1,274 [115] | 15.4 |
Anopheles gambiae | African malaria mosquito | 13,247 [116] | 250 [117] | 53 |
Apis mellifera | honey bee | 12,374 [118] | 226 [119] | 54.8 |
Caenorhabditis elegans | C. elegans | 47,632 [120] | 102 [121] | 467 |
Daphnia magna | water flea | 21,549 [122] | 126 [123] | 171 |
Drosophila melanogaster | fruit fly | 17,868 [124] | 137 [125] | 130.4 |
Hydra vulgaris | hydra | 22,980 [126] | 1,055 [127] | 21.8 |
Limulus polyphemus | horseshoe crab | 27,318 [128] | 1,828 [129] | 14.9 |
Octopus sinensis | octopus | 29,784 [130] | 2,719 [131] | 11 |
Pediculus humanus | human louse | 10,993 [132] | 110 [133] | 99.9 |
Strongylocentrotus purpuratus | purple sea urchin | 33,504 [134] | 921 [135] | 36.4 |
species | common name | number of genes | genome size (Mb) | gene density (genes/Mb) |
---|---|---|---|---|
Alligator mississippiensis | American alligator | 25,279 [136] | 2,161 [137] | 11.7 |
Anolis carolinensis | anole lizard | 22,293 [138] | 1,799 [139] | 12.4 |
Astyanax mexicanus | Mexican tetra | 31,695 [140] | 1,263 [141] | 25.1 |
Columba livia | pigeon | 26,679 [142] | 1,063 [143] | 25.1 |
Danio rerio | zebrafish | 47,350 [144] | 1,408 [145] | 33.6 |
Gallus gallus | chicken | 24,402 [146] | 1,043 [147] | 23.4 |
Nothobranchius furzeri | turquoise killifish | 25,387 [148] | 1,052 [149] | 24.1 |
Oryzias latipes | Japanese rice fish | 26,846 [150] | 746 [151] | 36 |
Petromyzon marinus | sea lamprey | 22,167 [152] | 987 [153] | 22.5 |
Taeniopygia guttata | zebra finch | 21,543 [154] | 1,069 [155] | 20.2 |
Takifugu rubripes | Japanese puffer | 27,413 [156] | 384 [157] | 71.4 |
Xenopus tropicalis | tropical clawed frog | 28,863 [158] | 1,468 [159] | 19.7 |
Xenopus laevis | African clawed frog | 37,283 [160] | 2,718 [161] | 13.7 |
species | common name | number of genes | genome size (Mb) | gene density (genes/Mb) |
---|---|---|---|---|
Acinonyx jubatus | cheetah | 34,482 [162] | 2,378 [163] | 14.5 |
Ailuropoda melanoleuca | giant panda | 32,950 [164] | 2,371 [165] | 13.9 |
Aotus nancymaae | night monkey | 31,331 [166] | 2,861 [167] | 11 |
Artibeus jamaicensis | Jamaican fruit-eating bat | 31,480 [168] | 2,316 [169] | 13.6 |
Arvicanthis niloticus | African grass rat | 31,912 [170] | 2,496 [171] | 12.8 |
Balaenoptera acutorostrata | minke whale | 26,805 [172] | 2,431 [173] | 11 |
Balaenoptera musculus | blue whale | 27,194 [174] | 2,105 [175] | 12.9 |
Bison bison | American bison | 27,488 [176] | 2,828 [177] | 9.7 |
Bos mutus | wild yak | 26,159 [178] | 2,703 [179] | 9.7 |
Bos taurus | cattle | 35,061 [180] | 2,715 [181] | 12.9 |
Canis lupus familiaris | dog | 36,717 [182] | 2,370 [183] | 15.5 |
Chinchilla lanigera | long-tailed chinchilla | 30,378 [184] | 2,390 [185] | 12.7 |
Felis catus | cat | 35,521 [186] | 2,521 [187] | 14 |
Gorilla gorilla | gorilla | 35,419 [188] | 3,063 [189] | 11.6 |
Homo sapiens | human | 44,507 [190] | 2,861 [191] | 15.5 |
Macaca mulatta | Rhesus monkey | 40,164 [192] | 2,970 [193] | 13.5 |
Mus musculus | house mouse | 42,823 [194] | 2,689 [195] | 15.9 |
Pan troglodytes | chimpanzee | 40,769 [196] | 3,050 [197] | 13.4 |
Rattus norvegicus | brown rat | 38,371 [198] | 2,743 [199] | 14 |
Sus scrofa | pig | 30,345 [200] | 2,459 [201] | 12.3 |
Gene count is calculated as the number of gene entries in the latest version of a given species' GTF annotation file. Well-studied organisms will probably have higher gene counts because their genomes are better annotated, so these numbers should be treated as estimates for comparison.
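As a rough illustration of this counting method, the sketch below tallies the rows of a GTF file whose feature column (the third tab-separated field) is `gene`. It assumes the annotation contains explicit `gene` rows, which is not true of every GTF file. The function name `count_genes` and the sample records are hypothetical and are not drawn from any real annotation.

```python
from typing import Iterable


def count_genes(gtf_lines: Iterable[str]) -> int:
    """Count rows whose feature type (column 3) is 'gene'."""
    count = 0
    for line in gtf_lines:
        # Skip blank lines and header/comment lines, which start with '#'.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # A GTF record has nine tab-separated columns.
        if len(fields) >= 9 and fields[2] == "gene":
            count += 1
    return count


# Made-up records for illustration only.
SAMPLE_GTF = [
    '#!genome-build hypothetical\n',
    'chr1\texample\tgene\t100\t900\t.\t+\t.\tgene_id "G1";\n',
    'chr1\texample\ttranscript\t100\t900\t.\t+\t.\tgene_id "G1"; transcript_id "T1";\n',
    'chr1\texample\texon\t100\t400\t.\t+\t.\tgene_id "G1"; transcript_id "T1";\n',
    'chr1\texample\tgene\t1500\t2400\t.\t-\t.\tgene_id "G2";\n',
]

print(count_genes(SAMPLE_GTF))  # 2
```

The same function accepts an open file handle, since iterating over a text file yields its lines. Dividing the resulting count by the genome size in megabases gives the gene density reported in the tables.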