The social genome is the collection of data about members of a society that is captured in ever-larger and ever-more complex databases (e.g., government administrative data, operational data, and social media data). Some have used the term digital footprint to refer to individual traces.
The term has been used in two distinct ways. First, the term social genome appeared in a letter to the editor submitted to Science in response to a seminal article by King about using big data for social science. [1] The letter [2] was published, but the term was edited out of the published version. The original submission states, “A well-integrated federated data system of administrative databases updated on an ongoing basis could hold a collective representation of our society, our social genome.” Kum and others have used the term since 2011, and it was defined in a peer-reviewed article in 2013. [3] That article states, “Today there is a constant flow of data into, out of, and between ever-larger and ever-more complex databases about people. Together, these digital traces collectively capture our social genome, the footprints of our society.” In 2014, a vision paper [4] on population informatics elaborated further on the term.
Second, separately and at about the same time, a group of researchers led by the Brookings Institution started the Social Genome Project, [5] which built a data-rich model to map the pathway to the middle class by tracing the life course from birth until middle age. The first paper [6] was published in 2012.
Record linkage (RL) is the task of finding records that refer to the same entity across different data sources. It is necessary when joining data sets based on entities that may or may not share a common identifier, which may be due to differences in record shape, storage location, or curator style or preference. A data set that has undergone RL-oriented reconciliation may be referred to as being cross-linked.
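As an illustration of linking records that lack a common identifier, the following is a minimal sketch rather than a standard algorithm. It assumes two hypothetical data sets (`clinic` and `benefits`) whose records carry a `name` and a `birth_year`, compares only records with the same birth year, and scores name similarity with Python's `difflib`. The field names, the sample records, and the 0.85 threshold are assumptions chosen for the example.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase, replace punctuation with spaces, and sort the name tokens,
    so that 'Doe, Jane' and 'Jane Doe' normalize to the same string."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in name.lower())
    return " ".join(sorted(cleaned.split()))


def similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between two normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def link_records(left, right, threshold=0.85):
    """Return (left_id, right_id, score) for pairs judged to be the same entity."""
    links = []
    for l in left:
        for r in right:
            # Only compare records that agree on birth year (a simple blocking step).
            if l["birth_year"] != r["birth_year"]:
                continue
            score = similarity(l["name"], r["name"])
            if score >= threshold:
                links.append((l["id"], r["id"], score))
    return links


# Hypothetical records from two sources with no shared identifier.
clinic = [
    {"id": "c1", "name": "Jane Doe", "birth_year": 1980},
    {"id": "c2", "name": "Robert Smith", "birth_year": 1975},
]
benefits = [
    {"id": "b1", "name": "Doe, Jane", "birth_year": 1980},
    {"id": "b2", "name": "Rob Smith", "birth_year": 1975},
    {"id": "b3", "name": "Jane Doe", "birth_year": 1991},
]

links = link_records(clinic, benefits)
for left_id, right_id, score in links:
    print(left_id, right_id, round(score, 2))
```

In this sketch the record `b3` is not linked to `c1` despite the identical name, because the birth years differ. Practical systems typically weigh several fields probabilistically instead of relying on a single fixed threshold.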
Public health informatics has been defined as the systematic application of information and computer science and technology to public health practice, research, and learning. It is one of the subdomains of health informatics.
Renaissance Computing Institute (RENCI) was launched in 2004 as a collaboration involving the State of North Carolina, University of North Carolina at Chapel Hill (UNC-CH), Duke University, and North Carolina State University. RENCI is organizationally structured as a research institute within UNC-CH, and its main campus is located in Chapel Hill, NC, a few miles from the UNC-CH campus. RENCI has engagement centers at UNC-CH, Duke University (Durham), and North Carolina State University (Raleigh).
Gilean Alistair Tristram McVean is a professor of statistical genetics at the University of Oxford, fellow of Linacre College, Oxford and co-founder and director of Genomics plc. He also co-chaired the 1000 Genomes Project analysis group.
Cafeteria roenbergensis virus (CroV) is a giant virus that infects the marine bicosoecid flagellate Cafeteria roenbergensis, a member of the microzooplankton community.
GFAJ-1 is a strain of rod-shaped bacteria in the family Halomonadaceae. It is an extremophile that was isolated from the hypersaline and alkaline Mono Lake in eastern California by geobiologist Felisa Wolfe-Simon, a NASA research fellow in residence at the US Geological Survey. In a 2010 Science journal publication, the authors claimed that the microbe, when starved of phosphorus, is capable of substituting arsenic for a small percentage of its phosphorus to sustain its growth. Immediately after publication, other microbiologists and biochemists expressed doubt about this claim, which was robustly criticized in the scientific community. Subsequent independent studies published in 2012 found no detectable arsenate in the DNA of GFAJ-1, refuted the claim, and demonstrated that GFAJ-1 is simply an arsenate-resistant, phosphate-dependent organism.
Literature-based discovery (LBD), also called literature-related discovery (LRD), is a form of knowledge extraction and automated hypothesis generation that uses papers and other academic publications to find new relationships between existing knowledge. It aims to discover new knowledge by connecting information that has been explicitly stated in the literature to deduce connections that have not been explicitly stated.
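One simple way to picture this is as a search for indirect links: if term A appears with term B in some papers, and B appears with C in others, but A and C never appear together, then A and C form a candidate connection. The sketch below assumes each paper has already been reduced to a set of terms. The function names and the placeholder terms are hypothetical, and the approach is only one possible reading of the idea, not a description of any particular LBD system.

```python
from collections import defaultdict


def build_cooccurrence(papers):
    """Map each term to the set of terms it co-occurs with in at least one paper."""
    neighbors = defaultdict(set)
    for terms in papers:
        for term in terms:
            neighbors[term].update(t for t in terms if t != term)
    return neighbors


def candidate_links(papers, start):
    """Return {candidate: sorted bridging terms} for terms that are never
    mentioned together with `start` but share an intermediate term with it."""
    neighbors = build_cooccurrence(papers)
    direct = neighbors.get(start, set())
    candidates = defaultdict(set)
    for bridge in direct:
        for other in neighbors[bridge]:
            # Keep only terms with no explicit, direct link to the start term.
            if other != start and other not in direct:
                candidates[other].add(bridge)
    return {term: sorted(bridges) for term, bridges in candidates.items()}


# Placeholder terms standing in for concepts extracted from five papers.
papers = [{"A", "B"}, {"B", "C"}, {"A", "D"}, {"D", "C"}, {"C", "E"}]
hypotheses = candidate_links(papers, "A")
print(hypotheses)
```

Here `C` is proposed as a candidate connection for `A` through the bridging terms `B` and `D`, while `E` is not proposed because it is more than one intermediate step away.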
Culturomics is a form of computational lexicology that studies human behavior and cultural trends through the quantitative analysis of digitized texts. Researchers data mine large digital archives to investigate cultural phenomena reflected in language and word usage. The term is an American neologism first described in a 2010 Science article, "Quantitative Analysis of Culture Using Millions of Digitized Books", co-authored by Harvard researchers Jean-Baptiste Michel and Erez Lieberman Aiden.
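A basic quantity in this kind of analysis is how often a word appears in texts from a given year relative to all words from that year. The sketch below computes that relative frequency over a toy corpus. The function name, the tokenization rule, and the sample sentences are assumptions made for illustration and are not taken from the cited article.

```python
import re
from collections import Counter


def yearly_relative_frequency(corpus, word):
    """corpus: iterable of (year, text) pairs.
    Returns {year: occurrences of `word` / total tokens in that year}."""
    totals = Counter()
    hits = Counter()
    target = word.lower()
    for year, text in corpus:
        # Crude tokenizer: runs of ASCII letters and apostrophes, lowercased.
        tokens = re.findall(r"[a-z']+", text.lower())
        totals[year] += len(tokens)
        hits[year] += sum(1 for token in tokens if token == target)
    return {year: hits[year] / totals[year] for year in sorted(totals) if totals[year]}


# Toy corpus; real studies draw on millions of digitized books.
corpus = [
    (1900, "the telegraph office sent the telegraph"),
    (1900, "a letter arrived"),
    (2000, "the email replaced the telegraph and the letter"),
]

trend = yearly_relative_frequency(corpus, "telegraph")
for year, share in trend.items():
    print(year, round(share, 3))
```

Normalizing by the total number of tokens per year keeps years with more digitized text from dominating the trend.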
The Estonian Genome Project is a population-based biological database and biobank which was established in 2000 to improve public health in Estonia. It contains health records and biological specimens from a large percentage of the Estonian population.
Jason H. Moore is a translational bioinformatics scientist, biomedical informatician, and human geneticist, the Edward Rose Professor of Informatics and Director of the Institute for Biomedical Informatics at the Perelman School of Medicine at the University of Pennsylvania, where he is also Senior Associate Dean for Informatics and Director of the Division of Informatics in the Department of Biostatistics, Epidemiology, and Informatics.
Translational bioinformatics (TBI) is a field that emerged in the 2010s to study health informatics, focused on the convergence of molecular bioinformatics, biostatistics, statistical genetics and clinical informatics. Its focus is on applying informatics methodology to the increasing amount of biomedical and genomic data to formulate knowledge and medical tools that can be used by scientists, clinicians, and patients. It also involves applying biomedical research to improve human health through the use of computer-based information systems. TBI employs data mining and the analysis of biomedical data to generate clinical knowledge for application. Such clinical knowledge includes finding similarities in patient populations and interpreting biological information to suggest treatments and predict health outcomes.
Christopher G. Chute is a Bloomberg Distinguished Professor at Johns Hopkins University, physician-scientist and biomedical informatician known for biomedical terminologies and health information technology (IT) standards. He chairs the World Health Organization Revision Steering Group for the revision of the International Classification of Diseases (ICD-11).
Nenad Ban is a biochemist born in Zagreb, Croatia who currently works at the ETH Zurich, Swiss Federal Institute of Technology, as a professor of Structural Molecular Biology. He is a pioneer in studying gene expression mechanisms and the participating protein synthesis machinery.
Picozoa, Picobiliphyta, Picobiliphytes, or Biliphytes are protists of a phylum of marine unicellular heterotrophic eukaryotes with a size of less than about 3 micrometers. They were formerly treated as eukaryotic algae and as the smallest members of the photosynthetic picoplankton, before it was discovered that they do not perform photosynthesis. The first species identified in the phylum is Picomonas judraskeda. They probably belong in the Archaeplastida as sister to the Rhodophyta.
The field of population informatics is the systematic study of populations via secondary analysis of massive data collections about people. Scientists in the field refer to this massive data collection as the social genome, denoting the collective digital footprint of our society. Population informatics applies data science to social genome data to answer fundamental questions about human society and population health, much like bioinformatics applies data science to human genome data to answer questions about individual health. It is an emerging research area at the intersection of the social, behavioral, economic, and health (SBEH) sciences, computer science, and statistics, in which quantitative methods and computational tools are used to answer fundamental questions about our society.
tranSMART is an open-source data warehouse designed to store large amounts of clinical data from clinical trials, as well as data from basic research, so that they can be interrogated together for translational research. It is also designed to be used by many people across organizations. It was developed by Johnson & Johnson in partnership with Recombinant Data Corporation. The platform was released in January 2012 and has been governed by the tranSMART Foundation since the foundation's initiation in 2013. In May 2017, the tranSMART Foundation merged with the i2b2 Foundation to create an organization whose key mission is to advance the field of precision medicine.
Vineet Bafna is an Indian bioinformatician, professor of computer science, and director of the bioinformatics program at the University of California, San Diego. He was elected a Fellow of the International Society for Computational Biology (ISCB) in 2019 for outstanding contributions to the fields of computational biology and bioinformatics. He has also been a member of the Research in Computational Molecular Biology (RECOMB) conference steering committee.
Melissa Anne Haendel is an American bioinformaticist who is the Chief Research Informatics Officer of the Anschutz Medical Campus of the University of Colorado as well as a Professor of Biochemistry and Molecular Genetics and the Marsico Chair in Data Science. She serves as Director of the Center for Data to Health (CD2H). Her research makes use of data to improve the discovery and diagnosis of diseases. During the COVID-19 pandemic, Haendel joined with the National Institutes of Health to launch the National COVID Cohort Collaborative (N3C), which looks to identify the risk factors that can predict severity of disease outcome and help to identify treatments.
Nicholas Pierino Tatonetti is an American bioscientist who is Vice Chair of Operations in the Department of Computational Biomedicine and Associate Director of Computational Oncology in the Cancer Center at Cedars-Sinai Medical Center in Los Angeles, California.
Daniel Richard Masys is an American biotechnologist and academic. He is an Affiliate Professor of Biomedical and Health Informatics at the University of Washington.