National Database for Autism Research

Founded 2006
Founder National Institutes of Health
Headquarters Rockville, Maryland
Key people
Greg Farber, Ph.D. (Director)
Dan Hall (Manager)
Brian Koser (Operations Manager)
Gretchen Navidi (Principal Analyst, Grants Management)
Svetlana Novikova, Ph.D. (Principal Analyst, Genomics)
Anne Sperling, Ph.D. (Health Science Policy Analyst)
Website ndar.nih.gov

The National Database for Autism Research (NDAR) is a secure research data repository promoting scientific data sharing and collaboration among autism spectrum disorder (ASD) investigators. The project was launched in 2006 as a joint effort between five institutes and centers at the National Institutes of Health (NIH): the National Institute of Mental Health (NIMH), the National Institute of Child Health and Human Development (NICHD), the National Institute of Neurological Disorders and Stroke (NINDS), the National Institute of Environmental Health Sciences (NIEHS), and the Center for Information Technology (CIT). The goal of NDAR is to provide a common platform for data collection, retrieval, and archiving to accelerate research on autism spectrum disorders. The largest repository of its kind, NDAR makes available data at all levels of biological and behavioral organization for all data types. As of November 2013, data from over 90,000 research participants are available to qualified investigators through the NDAR portal. Summary information about the available data is accessible through the NDAR public website.


Background

In response to heightened societal concern over ASD, the United States Congress passed the Combating Autism Act (CAA) of 2006 (P.L. 109–416). [1] Through this Act, Congress intended to rapidly increase and improve coordination of scientific discovery in ASD research. The CAA mandated the creation of the Interagency Autism Coordinating Committee (IACC), a federal government advisory panel charged with developing and annually updating a Strategic Plan for ASD research. This plan provides a blueprint for autism research and advises Congress, the Department of Health and Human Services, and other federal agencies on the needs and opportunities for autism research. The IACC Strategic Plan was designed to detail research opportunities centered on the six most pressing questions facing those affected by autism and to link them to specific research efforts. In 2009, the plan was finalized and submitted to the Secretary of the Department of Health and Human Services; a seventh question related to infrastructure and surveillance needs was added to the plan in 2010.

NDAR was developed by the NIH with the goal of increasing sample sizes and enabling researchers to share data for additional analyses. NDAR was already under development when the seventh question of the IACC Strategic Plan was added. Question 7, Objective H [2] of the IACC Strategic Plan calls for mechanisms that specifically support the contribution of data to NDAR from 90 percent of newly initiated projects, regardless of funding source, and for the linking of NDAR with other existing data resources by 2012.

Oversight and governance

Thomas Insel, the Director of NIMH, oversees NDAR and its implementation and participates on a Governing Committee responsible for the ongoing management and stewardship of NDAR. This committee includes several other NIH Institute and Center directors or their designees. [3]

The NDAR Implementation Team (NIT) is one of the groups providing direction on NDAR, specifically data submission and access, in order to promote consistent participant protections. The team is composed of program staff representing Institutes and Centers with autism research in their portfolios. [4]

The Autism Informatics Consortium (AIC) was launched in 2011 with the goal of accelerating scientific discovery by making informatics tools and resources more useful to autism researchers. Current members include Autism Speaks, Kennedy Krieger Institute, Simons Foundation, Prometheus Research, and the NIH. [5]

NDAR Organization

The two key components that form the basis of NDAR are the Global Unique Identifier (GUID) for research subjects and the researcher-defined Data Dictionary to describe experiments. This platform requires that common data definitions and standards, as well as comprehensive and coherent informatics approaches, be developed for and with the involvement of the research community.

Global Unique Identifier (GUID)

The NDAR GUID is a subject identifier used to protect the confidentiality of a research subject. [6] When submitting data, an investigator who has appropriate access to a subject's personally identifiable information (PII) uses the GUID Tool to create a unique identifier for each subject in their study. Although a GUID is based on PII, a subject's PII never leaves the local research site. The GUID Tool requires basic information typically found on a birth certificate, such as first name at birth, last name at birth, date of birth, gender at birth, and city/municipality of birth, none of which change throughout an individual's life.

A one-way hash code sequence is then generated from this input, and the GUID Tool transmits only these hash codes to NDAR. The submitted sequence is compared to sequences previously submitted. If that sequence has already been registered and has an associated GUID, the same GUID is returned to the researcher. If the sequence has not been previously registered, a new GUID is created and returned to the researcher.

The GUID for a subject is the same regardless of where or when it is generated. If the same subject enrolls in another investigator's project or provides a biological specimen for a repository, the second investigator enters the same birth certificate information into the software and the same GUID is generated. Data are always submitted to NDAR in association with a GUID, and the data in NDAR are indexed by GUID. In this way, data from a de-identified individual subject can be aggregated, tracked, and linked across projects, time, databases, and biobanks, allowing for a more complete picture of the subject.
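The flow described above can be illustrated with a minimal Python sketch. It is not the actual GUID Tool: the source does not specify the hashing scheme, so the use of a single SHA-256 digest, the field normalization, the `GuidRegistry` class, and the `GUID-` prefix are all assumptions made for illustration.

```python
import hashlib
import secrets


def hash_pii(first_name, last_name, date_of_birth, sex_at_birth, city_of_birth):
    """Compute a one-way hash locally; only this digest would leave the site.

    Assumption: one SHA-256 digest over normalized fields stands in for the
    real tool's hash code sequence, whose details are not given in the text.
    """
    fields = [first_name, last_name, date_of_birth, sex_at_birth, city_of_birth]
    normalized = "|".join(field.strip().upper() for field in fields)
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class GuidRegistry:
    """Hypothetical stand-in for the central service; it never sees PII."""

    def __init__(self):
        self._guid_by_hash = {}

    def lookup_or_create(self, hash_code):
        # A previously registered hash returns the existing GUID;
        # an unseen hash is assigned a new random identifier.
        if hash_code not in self._guid_by_hash:
            self._guid_by_hash[hash_code] = "GUID-" + secrets.token_hex(6).upper()
        return self._guid_by_hash[hash_code]
```

Two sites that enter the same birth certificate details compute the same digest, so `lookup_or_create` hands both of them the same identifier without either site disclosing the underlying PII.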

The GUID is the result of a collaboration between NDAR, the Simons Foundation, and a team of researchers from Columbia University. [7] It has become the standard patient identifier for autism research and serves as a model for similar standards in other research areas.

Data dictionary and validation tool

NDAR has established a data dictionary with over 300 clinical, imaging, and genomic research definitions, which were created in close collaboration with the ASD research community. To submit data to NDAR, researchers are required to format their data in accordance with an existing data definition or to define a new data definition, which then becomes available for use by other researchers. As of May 2012, NDAR contains over 35,000 discrete data elements.

Researchers confirm that their data conform to the existing definitions by using the Validation Tool. The Validation Tool ensures that naming conventions are followed, GUIDs are properly registered, and the reported data are consistent with the value ranges defined in the dictionary. NDAR requires minimal adjustments to the way raw data are entered, and multiple web tutorials and demos are available for researchers wishing to submit data. All data contributed and shared must pass validation before they are submitted.
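The three checks named above (element naming, GUID registration, and value ranges) can be sketched in Python as follows. This is an illustrative outline rather than the Validation Tool itself; the dictionary layout, the element names, and the `validate_record` function are hypothetical.

```python
# Hypothetical slice of a data dictionary: element name -> constraints.
DATA_DICTIONARY = {
    "subjectkey": {"type": str},
    "interview_age": {"type": int, "min": 0, "max": 1260},  # age in months
    "sex": {"type": str, "allowed": {"M", "F"}},
}


def validate_record(record, registered_guids, dictionary=DATA_DICTIONARY):
    """Return a list of human-readable problems; an empty list means valid."""
    # Naming check: every submitted element must be defined in the dictionary.
    errors = [f"unknown element: {name}" for name in record if name not in dictionary]
    for name, rule in dictionary.items():
        if name not in record:
            errors.append(f"missing element: {name}")
            continue
        value = record[name]
        if not isinstance(value, rule["type"]):
            errors.append(f"{name}: expected {rule['type'].__name__}")
            continue
        # Range and allowed-value checks against the dictionary definition.
        if "min" in rule and value < rule["min"]:
            errors.append(f"{name}: {value} is below {rule['min']}")
        if "max" in rule and value > rule["max"]:
            errors.append(f"{name}: {value} is above {rule['max']}")
        if "allowed" in rule and value not in rule["allowed"]:
            errors.append(f"{name}: {value!r} is not an allowed value")
    # Registration check: the subject identifier must be a known GUID.
    guid = record.get("subjectkey")
    if isinstance(guid, str) and guid not in registered_guids:
        errors.append(f"GUID not registered: {guid}")
    return errors
```

Returning the full list of problems, rather than stopping at the first one, lets a submitter correct a file in a single pass before resubmitting.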

Genomics tool

After thorough analyses of functional genomics acquisition and storage criteria, as well as a review of the needs of the research community, NDAR staff developed a tool to simply and clearly define the relationship between samples and data files. A predefined set of parameters was built to guarantee the consistency of raw experimental data while simplifying the data definition for submission and aggregation across federated repositories. The predefined set of parameters includes attributes specific to each experiment (such as molecule and sub-molecule), experiment technology, vendor and platform, extraction protocol and kit, processing protocol and kits, analysis software, and equipment. [8]

Imaging tool

NDAR currently supports the receipt of unprocessed brain images in DICOM format, as well as processed images in a variety of formats, including DICOM, MINC 1.0 and 2.0, Analyze, NIfTI-1, AFNI, and SPM. Images can be visualized using NDAR's built-in image registration and visualization tool (MIPAV). [9] Collaborations are planned with prominent ASD researchers in order to define data structures and develop standardization tools for functional neuroimaging, EEG, TMS, MEG, and eye-tracking.

Data submission

Investigators working on autism-related projects, regardless of their funding source, are strongly encouraged to submit any type of autism-related data generated in their laboratories. [10] After extensive consultations with the research community, NDAR has established a two-tiered submission strategy for investigators receiving NIH funding. Descriptive (raw) data are expected to be submitted biannually, in January and July, and include non-proprietary behavioral and diagnostic data. Examples include standard clinical assessments, family medical history, demographics, unprocessed images, and genomic data. Making this information available early in the research process allows other investigators to understand the general characteristics of the participants enrolled. Experimental (analyzed) data are expected to be submitted within 12 months after accomplishment of each primary aim or objective (or set of interdependent aims or objectives) of the supported research, or at the time of publication of the results of the primary aim(s), whichever occurs first. Examples include outcome measures, analyzed genomics data, results from image analysis, and volumetric data.
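The two timing rules above can be expressed as a short Python sketch. The function names are hypothetical, and the handling of a 29 February completion date is an assumption, since the source only states "within 12 months".

```python
from datetime import date


def next_descriptive_submission(today):
    """Return (year, month) of the next January or July submission period."""
    if today.month <= 1:
        return (today.year, 1)
    if today.month <= 7:
        return (today.year, 7)
    return (today.year + 1, 1)


def experimental_data_due(aim_completed, published=None):
    """Due date: 12 months after the aim is accomplished, or the
    publication date, whichever occurs first."""
    try:
        deadline = aim_completed.replace(year=aim_completed.year + 1)
    except ValueError:
        # Assumption: a 29 February completion maps to 28 February next year.
        deadline = aim_completed.replace(year=aim_completed.year + 1, day=28)
    return deadline if published is None else min(deadline, published)
```

For example, an aim accomplished on 1 May 2012 with results published on 15 November 2012 would make the experimental data due at publication, because that date falls before the 12-month limit.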

Data sharing

NDAR's Ongoing Study capability allows investigators to work collaboratively on research studies in progress, sharing data, tools, and standards through the NDAR portal before they are shared with the rest of the ASD community. Qualified researchers can also request access to data stored in NDAR and/or data stored at federated repositories after the data are made public. To gain access to those data, an investigator must obtain NDAR data access privileges. By default, all data contained in NDAR have passed data validation, ensuring that all research participant data have an NDAR GUID, conform to the NDAR data standard, and meet standard value constraints. Beginning with the January 2011 submissions, NDAR developed and implemented automated quality procedures that are run against all incoming data to check for a variety of potential data discrepancies, such as duplicate data, uniformity of gender, age consistency across measures, and scoring errors on a number of measures. The new QA procedures are intended not only to raise the quality of data residing in NDAR but also to improve data accuracy within each individual laboratory and project.
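Three of the discrepancy checks listed above (duplicate data, uniformity of gender, and age consistency across measures) can be sketched in Python as follows. The row layout, field names, and issue labels are hypothetical, and the actual NDAR procedures are not described in enough detail in the text to reproduce them.

```python
from collections import defaultdict


def qa_report(rows):
    """Scan submitted rows and return a list of (issue, guid) pairs.

    Each row is assumed to be a dict with the keys guid, measure,
    visit_date (ISO 8601 string), sex, and age_months.
    """
    issues = []
    seen = set()
    sexes = defaultdict(set)
    visits = defaultdict(list)
    for row in rows:
        key = (row["guid"], row["measure"], row["visit_date"])
        if key in seen:
            issues.append(("duplicate", row["guid"]))
        seen.add(key)
        sexes[row["guid"]].add(row["sex"])
        visits[row["guid"]].append((row["visit_date"], row["age_months"]))
    # Uniformity of gender: one subject should report a single value.
    for guid, values in sexes.items():
        if len(values) > 1:
            issues.append(("sex_mismatch", guid))
    # Age consistency: age should not decrease as the visit date advances.
    for guid, pairs in visits.items():
        ordered = sorted(pairs)
        for (_, earlier_age), (_, later_age) in zip(ordered, ordered[1:]):
            if later_age < earlier_age:
                issues.append(("age_inconsistent", guid))
                break
    return issues
```

Because the rows are indexed by GUID, the same checks apply across laboratories: a subject enrolled in two projects is compared against both sets of records.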

Federation

NDAR is federated with other private databases, including the Autism Genetic Resource Exchange (AGRE), the Autism Tissue Program (ATP), and the Interactive Autism Network (IAN). This federation allows the data to be kept in their respective locations while enabling users to search across the databases simultaneously. These repositories all use the NDAR GUID as well as common data definitions. NDAR is currently finalizing a federation agreement with the Simons Foundation.
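The idea of searching several repositories at once while the data stay in place can be sketched in Python as below. The `Repository` class and `federated_find` function are hypothetical and say nothing about how NDAR's federation is actually implemented; the sketch only shows why a shared GUID makes a cross-repository lookup straightforward.

```python
class Repository:
    """Hypothetical repository that keeps its own records, keyed by GUID."""

    def __init__(self, name, records):
        self.name = name
        self._records = records  # {guid: [record, ...]}

    def find(self, guid):
        # Return a copy so callers cannot modify the repository's data.
        return list(self._records.get(guid, []))


def federated_find(guid, repositories):
    """Query every repository for one GUID and group the hits by source."""
    results = {}
    for repo in repositories:
        hits = repo.find(guid)
        if hits:
            results[repo.name] = hits
    return results
```

Each repository answers from its own store, so nothing is copied into a central database; the shared identifier is what allows the results to be lined up by subject.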

Federal databases

NDAR is linked to the following federal data repositories, providing a wealth of information in one central location: the Pediatric MRI Data Repository, dbGaP, dbVar, and the Sequence Read Archive.

NDAR Study

The NDAR Study allows researchers to record basic information about the cohort, measures, analysis, and results of a study, linking to data contained in NDAR as well as to the resulting publication. This tool allows others to replicate results and understand the data analysis methods. NDAR data are associated with PubMed papers, so readers can access the NDAR data directly from PubMed.



References