Human disease network


A human disease network is a network of human disorders and diseases, linked according to their genetic origins or other features. More specifically, it maps associations between human diseases, mostly on the basis of disease genes. For example, in a human disease network, two diseases are linked if they share at least one associated gene. A typical human disease network is derived from a bipartite network whose two kinds of nodes are diseases and genes. Additionally, some human disease networks use other features, such as symptoms and proteins, to associate diseases.
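The sketch below shows how a disease-disease network can be derived from a bipartite disease-gene map: every pair of diseases attached to a common gene receives a link. The function name `project_disease_network` and the toy data are hypothetical. The disease and gene labels are placeholders, not real associations.

```python
from collections import defaultdict
from itertools import combinations


def project_disease_network(disease_genes):
    """Project a bipartite disease-gene map onto a disease-disease network.

    disease_genes maps each disease to the set of genes associated with it.
    Returns a dict mapping each linked disease pair (a sorted tuple) to the
    set of genes the two diseases share.
    """
    # Invert the bipartite graph: for every gene, collect its diseases.
    gene_to_diseases = defaultdict(set)
    for disease, genes in disease_genes.items():
        for gene in genes:
            gene_to_diseases[gene].add(disease)

    # Two diseases are linked if at least one gene is attached to both.
    links = defaultdict(set)
    for gene, diseases in gene_to_diseases.items():
        for pair in combinations(sorted(diseases), 2):
            links[pair].add(gene)
    return dict(links)


# Hypothetical toy data: disease and gene labels are placeholders.
toy = {
    "disease_A": {"G1", "G2"},
    "disease_B": {"G2", "G3"},
    "disease_C": {"G4"},
    "disease_D": {"G1", "G3"},
}
links = project_disease_network(toy)
for (d1, d2), shared in sorted(links.items()):
    print(d1, d2, sorted(shared))
```

In this toy example, disease_C shares no gene with any other disease and therefore stays an isolated node. The size of each shared-gene set could serve as an edge weight.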


Human disease network, generated as a Plotly plot. Nodes are diseases; two diseases are connected if they share a genetic component.

History

In 2007, Goh et al. constructed a disease-gene bipartite graph using information from the OMIM database and termed it the human disease network. [2] In 2009, Barrenas et al. derived a complex disease-gene network from genome-wide association studies (GWAS). [3] In the same year, Hidalgo et al. published a new way of building human phenotypic disease networks, in which diseases were connected according to a distance calculated from comorbidity data. [4] In 2011, Vidal et al. summarized studies on genotype-phenotype associations in a cellular context. [5] In 2014, Zhou et al. built a symptom-based human disease network by mining a biomedical literature database. [6]

Properties

A large-scale human disease network exhibits the scale-free property. Its degree distribution follows a power law: a few diseases are connected to a large number of other diseases, whereas most diseases have few links. Such networks also tend to cluster by disease class. [2] [7]
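Both properties can be checked on an edge list with a few lines of code. The sketch below has four parts:

* It computes each node's degree.
* It computes the degree distribution P(k).
* It gives a rough power-law exponent using a common discrete maximum-likelihood approximation. This approximation is assumed here and is crude when `k_min` is small.
* It measures the fraction of links that join diseases of the same class.

All function names and the toy network are hypothetical. A five-node graph cannot demonstrate a power law; it only exercises the code.

```python
import math
from collections import Counter


def degrees(edges):
    """Degree of every node in an undirected edge list."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def degree_distribution(deg):
    """P(k): fraction of nodes that have degree k."""
    counts = Counter(deg.values())
    n = len(deg)
    return {k: c / n for k, c in sorted(counts.items())}


def powerlaw_alpha(degree_values, k_min=1):
    """Approximate maximum-likelihood exponent for a discrete power law.

    Uses alpha ~= 1 + n / sum(ln(k / (k_min - 0.5))) over all degrees
    k >= k_min (an approximation; requires at least one such degree).
    """
    tail = [k for k in degree_values if k >= k_min]
    return 1 + len(tail) / sum(math.log(k / (k_min - 0.5)) for k in tail)


def same_class_fraction(edges, disease_class):
    """Fraction of links joining two diseases of the same class."""
    same = sum(1 for u, v in edges if disease_class[u] == disease_class[v])
    return same / len(edges)


# Hypothetical toy network: one hub disease plus a few low-degree diseases.
edges = [("hub", "A"), ("hub", "B"), ("hub", "C"), ("hub", "D"), ("A", "B")]
classes = {"hub": "X", "A": "X", "B": "X", "C": "Y", "D": "Y"}
deg = degrees(edges)
print(degree_distribution(deg))
print(round(powerlaw_alpha(deg.values()), 3))
print(same_class_fraction(edges, classes))
```

For a real network, a dedicated fitting procedure that also selects `k_min` and tests goodness of fit would be more appropriate than this single-formula estimate.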

In a symptom-based disease network, diseases also cluster according to their categories. Moreover, diseases that share symptoms are more likely to share associated genes and protein interactions. [6]
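One way to link diseases by symptoms is sketched below, under stated assumptions:

* Each disease has a weighted symptom profile.
* Two diseases are linked when the cosine similarity of their profiles reaches a chosen threshold.

The weighting, the threshold and the names `cosine_similarity` and `symptom_network` are assumptions for illustration. This is not a reproduction of the procedure used by Zhou et al.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two sparse symptom-weight vectors (dicts)."""
    dot = sum(w * b[s] for s, w in a.items() if s in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def symptom_network(disease_symptoms, threshold):
    """Link diseases whose symptom profiles have similarity >= threshold."""
    names = sorted(disease_symptoms)
    edges = {}
    for i, d1 in enumerate(names):
        for d2 in names[i + 1:]:
            sim = cosine_similarity(disease_symptoms[d1], disease_symptoms[d2])
            if sim >= threshold:
                edges[(d1, d2)] = sim
    return edges


# Hypothetical symptom profiles with placeholder weights.
profiles = {
    "disease_A": {"fever": 1.0, "cough": 1.0},
    "disease_B": {"fever": 1.0, "cough": 1.0},
    "disease_C": {"rash": 1.0},
}
edges = symptom_network(profiles, threshold=0.5)
print(edges)
```

Cosine similarity ignores the overall magnitude of a profile. As a result, a disease described by many symptoms can still be linked to one described by few, provided their symptoms overlap proportionally.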


Related Research Articles

Bioinformatics: computational analysis of large, complex sets of biological data

Bioinformatics is an interdisciplinary field that develops methods and software tools for understanding biological data, in particular when the data sets are large and complex. As an interdisciplinary field of science, bioinformatics combines biology, chemistry, physics, computer science, information engineering, mathematics and statistics to analyze and interpret the biological data. Bioinformatics has been used for in silico analyses of biological queries using mathematical and statistical techniques.

Single-nucleotide polymorphism: single nucleotide position in genomic DNA at which different sequence alternatives exist

In genetics, a single-nucleotide polymorphism is a germline substitution of a single nucleotide at a specific position in the genome. Although certain definitions require the substitution to be present in a sufficiently large fraction of the population, many publications do not apply such a frequency threshold.

An intergenic region (IGR) is a stretch of DNA located between genes. Intergenic regions are a subset of noncoding DNA. Occasionally some intergenic DNA acts to control nearby genes, but most of it has no currently known function. Intergenic DNA is one of the sequence types sometimes referred to as junk DNA, although it is only one of several phenomena given that label, and the term is now less used in scientific studies. RNA transcribed from intergenic regions has been called "dark matter" or "dark matter transcripts".

The candidate gene approach to conducting genetic association studies focuses on associations between genetic variation within pre-specified genes of interest and phenotypes or disease states. This is in contrast to genome-wide association studies (GWAS), a hypothesis-free approach that scans the entire genome for associations between common genetic variants and traits of interest. Candidate genes are most often selected for study based on a priori knowledge of the gene's biological functional impact on the trait or disease in question. The rationale for focusing on allelic variation in specific, biologically relevant regions of the genome is that certain alleles within a gene may directly affect that gene's function and lead to variation in the phenotype or disease state being investigated. This approach often uses the case-control study design to try to answer the question, "Is one allele of a candidate gene more frequently seen in subjects with the disease than in subjects without the disease?" Associations between candidate genes and complex traits have generally not been replicated by subsequent GWASs or highly powered replication attempts. This failure has been ascribed to several factors:

* insufficient statistical power;
* the low prior probability that scientists can correctly guess a specific allele within a specific gene that is related to a trait;
* poor methodological practices;
* data dredging.
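The case-control question above is commonly answered with an allelic test. In this test, allele counts in cases and controls form a 2x2 table, which is tested with a one-degree-of-freedom Pearson chi-square statistic, alongside an odds ratio. The sketch below assumes that design. The name `allelic_test` and the counts are hypothetical. The odds ratio is undefined if any cell is zero.

```python
import math


def allelic_test(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Allelic association test on a 2x2 table of allele counts.

    Returns (odds_ratio, chi2, p_value) for a 1-degree-of-freedom Pearson
    chi-square test without continuity correction.
    """
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    total = case_alt + case_ref + ctrl_alt + ctrl_ref
    row_sums = [sum(row) for row in table]
    col_sums = [case_alt + ctrl_alt, case_ref + ctrl_ref]

    # Sum (observed - expected)^2 / expected over the four cells.
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected

    # For 1 degree of freedom, P(X >= chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    odds_ratio = (case_alt * ctrl_ref) / (case_ref * ctrl_alt)
    return odds_ratio, chi2, p_value


# Hypothetical allele counts: 60 of 100 case alleles vs 40 of 100 control
# alleles carry the candidate allele.
odds_ratio, chi2, p_value = allelic_test(60, 40, 40, 60)
print(odds_ratio, chi2, round(p_value, 4))
```

A GWAS applies a test of this kind to every variant in the genome, which is why it requires a far stricter significance threshold than a single candidate-gene test.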

Hox genes, a subset of homeobox genes, are a group of related genes that specify regions of the body plan of an embryo along the head-tail axis of animals. Hox proteins specify the characteristics of 'position', ensuring that the correct structures form in the correct places of the body. For example, Hox genes in insects specify which appendages form on a segment, and Hox genes in vertebrates specify the types and shape of vertebrae that will form. In segmented animals, Hox proteins thus confer segmental or positional identity, but do not form the actual segments themselves.

Gene: sequence of DNA or RNA that codes for an RNA or protein product

In biology, a gene is a basic unit of heredity and a sequence of nucleotides in DNA that encodes the synthesis of a gene product, either RNA or protein.

HADHB: protein-coding gene in the species Homo sapiens

Trifunctional enzyme subunit beta, mitochondrial (TP-beta) also known as 3-ketoacyl-CoA thiolase, acetyl-CoA acyltransferase, or beta-ketothiolase is an enzyme that in humans is encoded by the HADHB gene.

Small nucleolar RNA SNORD116: non-coding RNA molecule involved in Prader–Willi syndrome

In molecular biology, SNORD116 is a non-coding RNA (ncRNA) molecule which functions in the modification of other small nuclear RNAs (snRNAs). This type of modifying RNA is usually located in the nucleolus of the eukaryotic cell which is a major site of snRNA biogenesis. It is known as a small nucleolar RNA (snoRNA) and also often referred to as a guide RNA.

Genome-wide association study: study of genetic variants in different individuals

In genomics, a genome-wide association study, also known as whole genome association study, is an observational study of a genome-wide set of genetic variants in different individuals to see if any variant is associated with a trait. GWA studies typically focus on associations between single-nucleotide polymorphisms (SNPs) and traits like major human diseases, but can equally be applied to any other genetic variants and any other organisms.

HOXC6

Homeobox protein Hox-C6 is a protein that in humans is encoded by the HOXC6 gene. Hox-C6 expression is highest in the fallopian tube and ovary. Hox-C6 has been found to be highly expressed in many types of cancer, including prostate, breast, and esophageal squamous cell cancer.

1000 Genomes Project: international research effort on genetic variation

The 1000 Genomes Project, launched in January 2008, was an international research effort to establish by far the most detailed catalogue of human genetic variation. Scientists planned to sequence the genomes of at least one thousand anonymous participants from a number of different ethnic groups within the following three years, using newly developed technologies which were faster and less expensive. In 2010, the project finished its pilot phase, which was described in detail in a publication in the journal Nature. In 2012, the sequencing of 1092 genomes was announced in a Nature publication. In 2015, two papers in Nature reported results and the completion of the project and opportunities for future research.

TOMM40: protein-coding gene in the species Homo sapiens

Translocase of outer mitochondrial membrane 40 homolog (yeast), also known as TOMM40, is a protein which in humans is encoded by the TOMM40 gene.

Expression quantitative trait loci (eQTLs) are genomic loci that explain variation in expression levels of mRNAs.
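At its simplest, testing a single locus as a candidate eQTL amounts to regressing a gene's expression level on genotype dosage. Dosage is the number of copies of the alternative allele, 0, 1 or 2. The sketch below fits that single-variant linear model by least squares. It assumes no covariates and omits any significance test, both of which real eQTL mapping includes. The name `eqtl_slope` and the data are hypothetical.

```python
def eqtl_slope(genotypes, expression):
    """Least-squares fit of expression ~ intercept + slope * genotype dosage.

    genotypes holds the number of alternative alleles (0, 1 or 2) per sample;
    expression holds the matching mRNA expression levels.
    """
    n = len(genotypes)
    mean_g = sum(genotypes) / n
    mean_e = sum(expression) / n
    cov = sum((g - mean_g) * (e - mean_e) for g, e in zip(genotypes, expression))
    var = sum((g - mean_g) ** 2 for g in genotypes)
    if var == 0:
        raise ValueError("all samples share one genotype; slope is undefined")
    slope = cov / var
    return slope, mean_e - slope * mean_g


# Hypothetical samples: expression rises by about one unit per alt allele.
genotypes = [0, 0, 1, 1, 2, 2]
expression = [1.0, 1.2, 2.0, 2.2, 3.0, 3.2]
slope, intercept = eqtl_slope(genotypes, expression)
print(round(slope, 3), round(intercept, 3))
```

A non-zero slope means the locus explains part of the variation in expression. Whether that slope is significant would have to be judged against the number of loci tested.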

SHOX2

Short-stature homeobox 2, also known as homeobox protein Og12X or paired-related homeobox protein SHOT, is a protein that in humans is encoded by the SHOX2 gene.

RNF213

Ring finger protein 213 is a protein that in humans is encoded by the RNF213 gene. RNF213 is a 591 kDa cytosolic E3 ubiquitin ligase with RING finger and AAA+ ATPase domains.

Essential genes are genes that an organism needs in order to grow and reproduce in a given environment. However, whether a gene is essential depends heavily on the circumstances in which an organism lives. For instance, a gene required to digest starch is only essential if starch is the only source of energy. Recently, systematic attempts have been made to identify the genes that are absolutely required to maintain life, provided that all nutrients are available. Such experiments suggest that bacteria require on the order of 250–300 genes.

Essential genes of single-celled organisms encode proteins for three basic functions: genetic information processing, cell envelopes and energy production. These functions maintain a central metabolism, replicate DNA, translate genes into proteins, maintain a basic cellular structure, and mediate transport into and out of the cell. Compared with single-celled organisms, multicellular organisms have more essential genes related to communication and development. Most of the essential genes in viruses are related to the processing and maintenance of genetic information. Unlike most single-celled organisms, viruses lack many essential genes for metabolism, which forces them to hijack the host's metabolism.

The vast majority of genes are not essential. Instead, they confer selective advantages and increased fitness, and many can be deleted without consequences, at least under most circumstances.

In bioinformatics, a Gene Disease Database is a systematized collection of data, typically structured to model aspects of reality. Its purpose is to help researchers understand the underlying mechanisms of complex diseases by capturing the many interacting phenotype-genotype relationships and gene-disease mechanisms. Gene Disease Databases integrate human gene-disease associations from various expert-curated databases and from text mining. They cover Mendelian, complex and environmental diseases.

Network medicine is the application of network science towards identifying, preventing, and treating diseases. This field focuses on using network topology and network dynamics towards identifying diseases and developing medical drugs. Biological networks, such as protein-protein interactions and metabolic pathways, are utilized by network medicine. Disease networks, which map relationships between diseases and biological factors, also play an important role in the field. Epidemiology is extensively studied using network science as well; social networks and transportation networks are used to model the spreading of disease across populations. Network medicine is a medically focused area of systems biology. An introduction to the field can be found here: https://web.uniroma1.it/stitch/node/5613.

The human interactome is the set of protein–protein interactions that occur in human cells. The sequencing of reference genomes, in particular by the Human Genome Project, has revolutionized human genetics, molecular biology, and clinical medicine. Genome-wide association study results have linked genes to most Mendelian disorders, and over 140,000 germline mutations have been associated with at least one genetic disease. However, it became apparent that these studies inherently emphasize clinical outcome rather than a comprehensive understanding of human disease. Indeed, to date the most significant contributions of GWAS have been restricted to the “low-hanging fruit” of direct single-mutation disorders, which has prompted a systems biology approach to genomic analysis. The connection between genotype and phenotype remains elusive, especially for multigenic complex traits and cancer. To assign functional context to genotypic changes, much recent research has been devoted to mapping the networks formed by interactions of cellular and genetic components in humans. It also examines how these networks are altered by genetic and somatic disease.

The first phenotypic disease network was constructed by Hidalgo et al. (2009) to help understand the origins of many diseases and the links between them. They defined diseases as specific sets of phenotypes that affect one or several physiological systems. They then compiled pairwise comorbidity correlations for more than 10,000 diseases, reconstructed from over 30 million medical records. They presented these data as a network with diseases as the nodes and comorbidity correlations as the links. Intuitively, the phenotypic disease network (PDN) can be seen as a map of the phenotypic space whose structure can contribute to the understanding of disease progression.
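The sketch below shows one way to turn patient records into comorbidity links. It computes the phi coefficient for every pair of diagnoses and keeps the pairs whose correlation reaches a threshold. The phi coefficient is one standard measure of comorbidity correlation and is assumed here. The exact measures, thresholds and data handling of Hidalgo et al. are not reproduced. Function names, diagnosis codes and records are hypothetical.

```python
import math


def phi_comorbidity(patients, d1, d2):
    """Phi correlation between two diagnoses across patient records.

    patients is a list of sets; each set holds one patient's diagnoses.
    """
    n = len(patients)
    p1 = sum(1 for dx in patients if d1 in dx)
    p2 = sum(1 for dx in patients if d2 in dx)
    c12 = sum(1 for dx in patients if d1 in dx and d2 in dx)
    denom = math.sqrt(p1 * p2 * (n - p1) * (n - p2))
    # A diagnosis seen in no patient or in every patient has no defined phi.
    if denom == 0:
        return 0.0
    return (c12 * n - p1 * p2) / denom


def phenotypic_disease_network(patients, threshold):
    """Link diagnosis pairs whose phi correlation is at least `threshold`."""
    diseases = sorted(set().union(*patients))
    links = {}
    for i, d1 in enumerate(diseases):
        for d2 in diseases[i + 1:]:
            phi = phi_comorbidity(patients, d1, d2)
            if phi >= threshold:
                links[(d1, d2)] = phi
    return links


# Hypothetical records: each set lists one patient's diagnosis codes.
records = [{"dx_A", "dx_B"}, {"dx_A", "dx_B"}, {"dx_C"}, {"dx_C"}, {"dx_A", "dx_C"}]
pdn = phenotypic_disease_network(records, threshold=0.3)
print(pdn)
```

With millions of records, prevalence counts would be computed once per diagnosis rather than per pair. A significance test on each correlation would normally be used alongside the threshold.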

References

  1. "Jupyter Notebook Viewer".
  2. Goh, Kwang-Il, et al. "The human disease network." Proceedings of the National Academy of Sciences 104.21 (2007): 8685-8690.
  3. Barrenas, Fredrik, et al. "Network properties of complex human disease genes identified through genome-wide association studies." PLoS ONE 4.11 (2009): e8090.
  4. Hidalgo, César A., et al. "A dynamic network approach for the study of human phenotypes." PLoS Computational Biology 5.4 (2009): e1000353.
  5. Vidal, Marc, Michael E. Cusick, and Albert-Laszlo Barabasi. "Interactome networks and human disease." Cell 144.6 (2011): 986-998.
  6. Zhou, XueZhong, et al. "Human symptoms–disease network." Nature Communications 5 (2014).
  7. Kannan, Venkateshan, et al. "Conditional Disease Development extracted from Longitudinal Health Care Cohort Data using Layered Network Construction." Scientific Reports 6 (2016): 26170.