GeneCards

Content
Data types captured: Human genes and model orthologues
Organisms: Homo sapiens
Contact
Research center: Crown Human Genome Center, WIS
Primary citation: PMID 9097728
Access
Data format: HTML
Website: www.genecards.org

GeneCards is a database of human genes that provides genomic, proteomic, transcriptomic, genetic and functional information on all known and predicted human genes. [1] [2] [3] [4] It is being developed and maintained by the Crown Human Genome Center at the Weizmann Institute of Science, in collaboration with LifeMap Sciences.

The database aims to provide a comprehensive view of the currently available biomedical information about the searched gene, including its aliases and identifiers, the encoded proteins, associated diseases and variations, its function, relevant publications and more. [1] [5] [6] The GeneCards database provides access to free Web resources about more than 350,000 known and predicted human genes, integrated from more than 150 data resources, such as HGNC, Ensembl, and NCBI. The core gene list is based on NCBI, Ensembl and approved gene symbols published by the HUGO Gene Nomenclature Committee (HGNC). [7] [8] The information is gathered and selected from these databases by the GeneCards integration engine.

Over time, the GeneCards database has developed a suite of tools (VarElect, GeneALaCart, etc.) with more specialised capabilities that leverage the database. Since 1998, GeneCards has been widely used by the bioinformatics, genomics and medical communities. [7] [8]

History

Since the 1980s, sequence information has become increasingly abundant, and many laboratories began to store it in central repositories, the primary databases. [9] However, the primary sequence databases (lower-level databases) each focus on different aspects of the information. To gather these scattered data, the Crown Human Genome Center at the Weizmann Institute of Science developed a database called ‘GeneCards’ in 1997. The database initially dealt mainly with human genome information, human genes, the functions of the encoded proteins, and related diseases, and it has expanded since that time. [1]

Growth

Initially, the GeneCards database had two main features: delivery of integrated biomedical information for a gene in ‘card’ format, and a text-based search engine. Since 1998, the database has integrated more data resources and data types, such as protein expression and gene network information. The speed and sophistication of the search engine have also improved, and the database has expanded from a gene-centric dogma to include gene-set analyses. Version 3 of the database gathers information from more than 90 database resources based on a consolidated gene list. It also added a suite of GeneCards tools that serve more specific purposes: "GeneNote and GeneAnnot for transcriptome analyses, GeneLoc for genomic locations and markers, GeneALaCart for batch queries and GeneDecks for finding functional partners and for gene set distillations." The database is updated on a 3-year cycle of planning, implementation, development, semi-automated quality assurance, and deployment. Technologies used include Eclipse, Apache, Perl, XML, PHP, Propel, Java, R and MySQL. [7] [8]

Ongoing GeneCards Expansions

Source: [7]

Availability

GeneCards can be freely accessed by non-profit institutions for educational and research purposes at https://www.genecards.org/ and academic mirror sites. Commercial usage requires a license. [citation needed]

GeneCards Suite

GeneDecks

GeneDecks is a novel analysis tool to identify similar or partner genes, which provides a similarity metric by highlighting shared descriptors between genes, based on GeneCards' unique wealth of combinatorial annotations of human genes.

  1. Annotation combinatorics: Using GeneDecks, one can obtain a set of genes similar to a particular gene under a selected combination of annotations. The summary table ranks the identified genes by their level of similarity to the probe gene.
  2. Annotation unification: Different data sources often offer annotations with heterogeneous naming systems. Annotation unification in GeneDecks is based on algorithms that detect similarity in the GeneCards gene-content space.
  3. Partner hunting: In GeneDecks's Partner Hunter, users give a query gene, and the system seeks similar genes based on combinatorial similarity of weighted attributes.
  4. Set distillation: In Set Distiller, users give a set of genes, and the system ranks attributes by their degree of sharing within the given gene set. Like Partner Hunter, it enables sophisticated investigation of a variety of gene sets, of diverse origins, for discovering and elucidating relevant biological patterns, thus enhancing systematic genomics and systems biology scrutiny. [8] [10] [11]
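
The two GeneDecks modes can be illustrated with a minimal Python sketch. This is not the GeneDecks implementation: the gene names, attribute types, weights and the scoring formula (a weighted average of Jaccard overlaps) are all assumptions, chosen only to show what "combinatorial similarity of weighted attributes" and "ranking attributes by their degree of sharing" could look like in practice.

```python
from collections import Counter

# Toy annotations (hypothetical, for illustration only):
# gene -> attribute type -> set of descriptors.
ANNOTATIONS = {
    "GENE_A": {"pathway": {"p1", "p2"}, "disorder": {"d1"}},
    "GENE_B": {"pathway": {"p1", "p2"}, "disorder": {"d2"}},
    "GENE_C": {"pathway": {"p3"}, "disorder": {"d1"}},
    "GENE_D": {"pathway": {"p4"}, "disorder": {"d3"}},
}

# Hypothetical per-attribute weights chosen by the user.
WEIGHTS = {"pathway": 2.0, "disorder": 1.0}


def similarity(query, other, weights=WEIGHTS):
    """Weighted average of per-attribute Jaccard overlaps (an assumed metric)."""
    total = 0.0
    for attr, weight in weights.items():
        a = ANNOTATIONS[query].get(attr, set())
        b = ANNOTATIONS[other].get(attr, set())
        union = a | b
        if union:
            total += weight * len(a & b) / len(union)
    return total / sum(weights.values())


def partner_hunter(query, top_n=3):
    """Rank all other genes by their similarity to the query gene."""
    scores = [(g, similarity(query, g)) for g in ANNOTATIONS if g != query]
    return sorted(scores, key=lambda item: (-item[1], item[0]))[:top_n]


def set_distiller(genes):
    """Rank descriptors by the fraction of genes in the set that share them."""
    counts = Counter()
    for gene in genes:
        for attr, descriptors in ANNOTATIONS[gene].items():
            counts.update((attr, d) for d in descriptors)
    ranked = [(key, n / len(genes)) for key, n in counts.items()]
    return sorted(ranked, key=lambda item: (-item[1], item[0]))


print(partner_hunter("GENE_A"))
print(set_distiller(["GENE_A", "GENE_B", "GENE_C"]))
```

In this toy data, GENE_B ranks first as a partner of GENE_A because the two share both pathway descriptors and the pathway attribute carries the larger weight. Changing the weights changes the ranking, which is the point of letting users weight attributes.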

GeneALaCart

GeneALaCart is a gene-set-oriented batch-querying engine based on the GeneCards database. It allows retrieval of information about multiple genes in a single batch query. [7] [12]

GeneLoc

The GeneLoc suite member presents an integrated human chromosome map based on data integrated by the GeneLoc algorithm. Such a map is very important when designing a custom-made capture chip. GeneLoc includes further links to GeneCards, NCBI's Human Genome Sequencing, UniGene, and mapping resources. [7] [13]

Usage

First, the user enters a search term into the search box on the homepage. The search methods are Keywords, Symbol only, Symbol/Alias/Identifier and Symbol/Alias. [5] The default option is searching by keywords. A keyword search returns MicroCards and MiniCards, whereas a Symbol only search takes the user directly to the GeneCard. [14]

A search can be refined by clicking on advanced search, where the user can choose the section, category, GIFtS, symbol source and gene sets directly. Sections include Aliases & Descriptions, Disorders, Drugs & Compounds, Expression in Human Tissues, Function, Genomic Location, Genomic Variants, Orthologs, Paralogs, Pathways & Interactions, Protein Domains/Families, Proteins, Publications, Summaries and Transcripts. The default option is searching all sections. [5] Categories include Protein-coding, Pseudogenes, RNA genes, Genetic Loci, Gene clusters and Uncategorized. The default option is searching all categories. [5]

GIFtS (GeneCards Inferred Functionality Scores) gives objective numbers that indicate the level of knowledge about the functionality of human genes. The filter options are High, Medium, Low, and a custom range. [4] [15] Symbol sources include HGNC (HUGO Gene Nomenclature Committee), EntrezGene (gene-centered information at NCBI), Ensembl, GeneCards RNA genes, CroW21 and so on. [5]

The user can also choose between searching All GeneCards and searching Within Gene Subset; the latter restricts the search to a more specific set of genes.

Second, the search result page shows all relevant MiniCards. Symbol, Description, Category, GIFtS, GC id and Score are displayed on the page. [5] The user may click on the plus button next to a result to open its MiniCard, or click directly on the symbol to see the full GeneCard.
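
The filtering behaviour described above can be mimicked with a short Python sketch. It does not call GeneCards or reproduce its search engine: the `MiniCard` class, the `advanced_search` function and the placeholder records are hypothetical, and only the column names (Symbol, Description, Category, GIFtS, GC id, Score) come from the search result page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MiniCard:
    """One search result row, using the columns shown on the result page."""
    symbol: str
    description: str
    category: str
    gifts: int      # GeneCards Inferred Functionality Score
    gc_id: str
    score: float    # relevance score for the query


def advanced_search(cards, categories=None, gifts_range=None):
    """Filter by category and a custom GIFtS range, then sort by score."""
    selected = []
    for card in cards:
        # None means "all categories", matching the default option.
        if categories is not None and card.category not in categories:
            continue
        if gifts_range is not None:
            low, high = gifts_range
            if not low <= card.gifts <= high:
                continue
        selected.append(card)
    return sorted(selected, key=lambda c: c.score, reverse=True)


# Placeholder records (not real GeneCards data).
CARDS = [
    MiniCard("GENE1", "example protein-coding gene", "Protein-coding", 70, "GC_ID_1", 9.5),
    MiniCard("GENE2", "example pseudogene", "Pseudogenes", 20, "GC_ID_2", 7.1),
    MiniCard("GENE3", "example protein-coding gene", "Protein-coding", 35, "GC_ID_3", 8.2),
]

hits = advanced_search(CARDS, categories={"Protein-coding"}, gifts_range=(50, 100))
print([card.symbol for card in hits])
```

Leaving both filters unset returns every record sorted by score, which mirrors the default of searching all sections and categories.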

Figure: Expression profile of FAM214A found in normal human tissue, as shown in GeneCards.

GeneCards Content

Source: [5]

A particular GeneCard (example: the GeneCard for TGFB1) consists of the following contents.

  1. Header: The header is made up of the gene's symbol, category (e.g. protein-coding), GIFtS (e.g. 74) and GCID (e.g. GC19M041837). Each category is shown in a different color: protein-coding, pseudogene, RNA gene, gene cluster, genetic locus, and uncategorized. The background indicates the symbol source: HGNC Approved Genes, EntrezGene Database, Ensembl Gene Database, or GeneCards Generated Genes.
  2. Aliases: This section shows synonyms and aliases of the gene according to diverse sources such as HGNC. The right column displays how the aliases are associated with the resources and gives previous GC identifiers.
  3. Summaries: The left column shows the sources, as in the Aliases section. The right column gives brief summaries of the gene's function, localization and effect on phenotype from various sources.
  4. Genomic Views: In addition to sources, this section gives the reference DNA sequence, regulatory elements, epigenetics, chromosome band and genomic location from different sources. The red line on the image indicates the GeneLoc integrated location. If the GeneLoc integrated location differs from the location in Entrez Gene, a green line is shown; a blue line appears when the GeneLoc integrated location differs from the location in Ensembl. Additional details can be accessed through the links in the section.
  5. Proteins: This section presents annotated protein information for the gene, including recommended name, size, subunit, subcellular location and secondary accessions. It also covers post-translational modifications, protein expression data, RefSeq proteins, Ensembl proteins, Reactome protein details, human recombinant protein products, Gene Ontology, antibody products and assay products.
  6. Protein Domains/Families: This section shows annotated information of protein domains and families.
  7. Function: This section describes gene function, including human phenotypes, bound targets, shRNA for human and/or mouse/rat, miRNA gene targets, RNAi products, microRNA for human and/or mouse/rat orthologs, gene editing, clones, cell lines, animal models and in situ hybridization assays.
  8. Pathways & Interactions: This section shows unified GeneCards pathways and interactions from different sources. Unified GeneCards pathways are collected into super-pathways, which display the connections between different pathways. Interactions show the interactant and interaction details.
  9. Drugs & Compounds: This section connects GeneCards with drugs and compounds. Compounds show the chemical compound, action and CAS number. DrugBank gives the compound, synonyms, CAS number (Chemical Abstracts Service registry number), type (transporter/target/carrier/enzyme), actions and PubMed IDs. HMDB and Novoseek show the relationships of chemical compounds, including compound, synonyms, CAS number and PubMed IDs (articles related to the compound). BitterDB displays compound, CAS number and SMILES (Simplified Molecular Input Line Entry Specification). PharmGKB gives the drug/compound and its annotation.
  10. Transcripts: This section consists of reference sequence mRNAs, UniGene cluster and representative sequence, miRNA products, inhibitory RNA products, clone products, primer products and additional mRNA sequences. The user can also view the exon structure from GeneLoc.
  11. Expression: The left column shows the sources of the data. Expression images and data, similar genes, PCR arrays, primers for human and in situ hybridization assays are included in this section.
  12. Orthologs: This section gives orthologs for a particular gene from a number of species. The table displays the corresponding organism, taxonomic classification, gene, description, human similarity, orthology type and details. It is connected to ENSEMBL Gene Tree, TreeFam Gene Tree, and Aminode. [16]
  13. Paralogs: This section displays paralogs and pseudogenes for a particular gene.
  14. Genomic Variants: This section shows results from NCBI SNPs/Variants, the HapMap linkage disequilibrium report, structural variations, the Human Gene Mutation Database (HGMD), QIAGEN SeqTarget long-range PCR primers in human, mouse and rat, and SABiosciences cancer mutation PCR arrays. The table in this section has three groups of columns: SNP ID, Valid, Clinical significance, Chr pos and Sequence for genomic data; AAChg, Type and More for transcript-related data; and Allele freq, Pop, Total sample and More for allele frequencies. In the Valid column, each character represents a validation method: ‘C’ is by-cluster, ‘A’ is by-2hit-2allele, ‘F’ is by-frequency, ‘H’ is by-hapmap and ‘O’ is by-other-pop. Clinical significance can be one of the following: non-pathogenic, pathogenic, drug-response, histocompatibility, probable-non-pathogenic, probable-pathogenic, untested, unknown and other. Type is one of the following PupaSUITE designations: nonsynon, syn, cds, spl, utr, int, exc, loc, stg, ds500, spa, spd, us2k, us5k.
  15. Disorders/Diseases: Shows disorders/diseases associated with the gene.
  16. Publications: Displays publications associated with the gene.
  17. External Searches: Links to searches for more information in PubMed, OMIM and NCBI.
  18. Genome Databases: Links to other databases and specialized databases.
  19. Intellectual Property: This section gives patent information and licensable technologies.
  20. Products
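
The single-letter codes in the Valid column of the Genomic Variants table (item 14) lend themselves to a small lookup table. The sketch below is illustrative only: the code-to-method mapping is taken from the description above, while the function name and the assumption that a cell may hold several codes, concatenated or comma-separated, are hypothetical.

```python
# Validation codes as described in the Genomic Variants section.
VALIDATION_CODES = {
    "C": "by-cluster",
    "A": "by-2hit-2allele",
    "F": "by-frequency",
    "H": "by-hapmap",
    "O": "by-other-pop",
}


def decode_validation(valid):
    """Expand a string of validation codes into method names.

    Assumes codes may be concatenated or comma-separated (a guess about the
    cell format). Unknown characters raise ValueError instead of being skipped.
    """
    methods = []
    for code in valid.replace(",", "").replace(" ", "").upper():
        if code not in VALIDATION_CODES:
            raise ValueError(f"unknown validation code: {code!r}")
        methods.append(VALIDATION_CODES[code])
    return methods


print(decode_validation("C,F,H"))
```

Raising an error on an unrecognised character is a deliberate choice here, so that a change in the code set is noticed rather than silently ignored.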

Applications

GeneCards is widely used in the biological and biomedical fields. For example, S.H. Shah and colleagues extracted data on early-onset coronary artery disease from GeneCards to identify genes that contribute to the disease. Chromosomal regions including 3q13 and 1q25 were confirmed to play a role, and the paper further discussed the relationship between disease genes and serum lipoproteins with the help of GeneCards. [17]

Another example is a research study on synthetic lethality in cancer. Synthetic lethality occurs when a mutation in a single gene has no effect on the function of a cell, but a mutation in an additional gene leads to cell death. The study aimed to find novel methods of treating cancer through blocking the lethality of drugs. GeneCards was used to compare data for a given target gene with all other candidate genes. In this process, an annotation sharing score was calculated using GeneDecks Partner Hunter (now called Genes Like Me) to indicate paralogy. Inactivation targets were extracted after microarray experiments on resistant and non-resistant neuroblastoma cell lines. [7]

References

  1. Rebhan M, Chalifa-Caspi V, Prilusky J, Lancet D (April 1997). "GeneCards: integrating information about genes, proteins and diseases". Trends in Genetics. 13 (4): 163. doi:10.1016/S0168-9525(97)01103-7. PMID 9097728.
  2. Rebhan M, Chalifa-Caspi V, Prilusky J, Lancet D (1998). "GeneCards: a novel functional genomics compendium with automated data mining and query reformulation support". Bioinformatics. 14 (8): 656–64. doi:10.1093/bioinformatics/14.8.656. PMID 9789091.
  3. Safran M, Solomon I, Shmueli O, Lapidot M, Shen-Orr S, Adato A, et al. (November 2002). "GeneCards 2002: towards a complete, object-oriented, human gene compendium". Bioinformatics. 18 (11): 1542–3. doi:10.1093/bioinformatics/18.11.1542. PMID 12424129.
  4. Harel A, Inger A, Stelzer G, Strichman-Almashanu L, Dalah I, Safran M, Lancet D (October 2009). "GIFtS: annotation landscape analysis with GeneCards". BMC Bioinformatics. 10 (1): 348. doi:10.1186/1471-2105-10-348. PMC 2774327. PMID 19852797.
  5. "GeneCards". GeneCards. Archived from the original on 2013-10-14. Retrieved 18 Oct 2013.
  6. "GeneCards". Retrieved 19 Oct 2013.
  7. Stelzer G, Dalah I, Stein TI, Satanower Y, Rosen N, Nativ N, et al. (October 2011). "In-silico human genomics with GeneCards". Human Genomics. 5 (6): 709–17. doi:10.1186/1479-7364-5-6-709. PMC 3525253. PMID 22155609.
  8. Safran M, Dalah I, Alexander J, Rosen N, Iny Stein T, Shmoish M, et al. (August 2010). "GeneCards Version 3: the human gene integrator". Database. 2010: baq020. doi:10.1093/database/baq020. PMC 2938269. PMID 20689021.
  9. Attwood TK, Parry-Smith DJ (1999). Introduction to Bioinformatics. Harlow Longman.
  10. "GeneDecks". GeneCards. Archived from the original on 2013-10-14. Retrieved 20 Oct 2013.
  11. Stelzer G, Inger A, Olender T, Iny-Stein T, Dalah I, Harel A, et al. (December 2009). "GeneDecks: paralog hunting and gene-set distillation with GeneCards annotation" (PDF). Omics. 13 (6): 477–87. doi:10.1089/omi.2009.0069. PMID 20001862. S2CID 33116814. Archived from the original (PDF) on 2020-02-10.
  12. "GeneALaCart". GeneCards. Archived from the original on 2013-09-30. Retrieved 20 Oct 2013.
  13. "GeneLoc". GeneCards. Retrieved 20 Oct 2013.
  14. Sridhar GR, Divakar CH, Hanuman T, Rao AA (2006). "Bioinformatics approach to extract information from genes". Journal of Diabetes in Developing Countries. 26 (4): 149–51. doi:10.4103/0973-3930.33179.
  15. Chalifa-Caspi V, Shmueli O, Benjamin-Rodrig H, Rosen N, Shmoish M, Yanai I, et al. (December 2003). "GeneAnnot: interfacing GeneCards with high-throughput gene expression compendia". Briefings in Bioinformatics. 4 (4): 349–60. doi:10.1093/bib/4.4.349. PMID 14725348.
  16. Chang KT, Guo J, di Ronza A, Sardiello M (January 2018). "Aminode: Identification of Evolutionary Constraints in the Human Proteome". Scientific Reports. 8 (1): 1357. Bibcode:2018NatSR...8.1357C. doi:10.1038/s41598-018-19744-w. PMC 5778061. PMID 29358731.
  17. Shah SH, Kraus WE, Crossman DC, Granger CB, Haines JL, Jones CJ, et al. (November 2006). "Serum lipids in the GENECARD study of coronary artery disease identify quantitative trait loci and phenotypic subsets on chromosomes 3q and 5q". Annals of Human Genetics. 70 (Pt 6): 738–48. doi:10.1111/j.1469-1809.2006.00288.x. PMID 17044848. S2CID 24239757.