Open data portal


An open data portal is an online platform that supports users in finding and accessing collections of open data. A typical open data portal presents the data of the organization that hosts it.


Government organizations sometimes host open data portals to meet freedom of information requirements in their jurisdictions. Another common use case is sharing data from a field of research for the benefit of other researchers.

Characteristics

The simplest open data portal is a list of datasets with instructions for how anyone can access and use that data. [1]

Characteristics of good open data portals include the use of open standards, programmatic access to data without human intervention, and analytics on which data people use. [2]
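Many portals provide such programmatic access through a catalog API; CKAN is one widely used open-source portal platform that does so. The following is a minimal sketch assuming a hypothetical CKAN-backed portal: the base URL is a placeholder, and portals built on other software expose different endpoints.

```python
# Minimal sketch of programmatic access to a CKAN-backed open data portal.
# The base URL is a placeholder; any CKAN instance exposes the same
# "action" endpoints, but portals built on other software will differ.
import requests

PORTAL = "https://data.example.gov"  # hypothetical portal URL

# List the identifiers of every published dataset.
resp = requests.get(f"{PORTAL}/api/3/action/package_list", timeout=30)
resp.raise_for_status()
dataset_ids = resp.json()["result"]

# Fetch the metadata record for one dataset, including its resources
# (download URLs, file formats, licence information).
detail = requests.get(
    f"{PORTAL}/api/3/action/package_show",
    params={"id": dataset_ids[0]},
    timeout=30,
).json()["result"]

print(detail["title"])
for resource in detail["resources"]:
    print(resource["format"], resource["url"])
```

CKAN's action API wraps its payload in a JSON object whose "result" field carries the data, which is why the sketch reads that key after each request.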

Open data portals contain information of interest to citizens, business owners, nonprofit administrators, researchers, and journalists. [3]

Uses

Government

A 2012 paper reported that government organizations which set up open data portals often find it challenging to predict what sorts of users will want the data and how they will use it. [4]

The European Union maintains a central open data portal that links users to regional and subject-specific data portals covering various areas of government. [5]

In the United States all the states and many cities offer open data portals. [6] [7]

A report on one open government data portal emphasized the need to develop a culture of appreciation for open data. [8]

A review of open data portals in Australia found variation in what the portals offered and how they operated. [9]

Science

The International Cancer Genome Consortium maintains an open data portal for cancer genomics. [10]

Chem2Bio2RDF is a linked open data portal for systems chemical biology. [11]
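Chem2Bio2RDF publishes its data as RDF, and linked open data portals of this kind are usually queried with SPARQL. The sketch below illustrates such a query using the SPARQLWrapper library; the endpoint URL is a placeholder rather than Chem2Bio2RDF's actual address, and the query simply retrieves a handful of triples.

```python
# Illustrative SPARQL query against a linked open data portal using
# SPARQLWrapper. The endpoint URL is a placeholder, not the address of any
# specific portal; real portals document their own endpoints and vocabularies.
from SPARQLWrapper import SPARQLWrapper, JSON

endpoint = SPARQLWrapper("https://example.org/sparql")  # hypothetical endpoint
endpoint.setReturnFormat(JSON)
endpoint.setQuery("""
    SELECT ?s ?p ?o
    WHERE { ?s ?p ?o }
    LIMIT 10
""")

results = endpoint.query().convert()
for row in results["results"]["bindings"]:
    print(row["s"]["value"], row["p"]["value"], row["o"]["value"])
```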

Related Research Articles

Bioinformatics: Computational analysis of large, complex sets of biological data

Bioinformatics is an interdisciplinary field of science that develops methods and software tools for understanding biological data, especially when the data sets are large and complex. Bioinformatics uses biology, chemistry, physics, computer science, computer programming, information engineering, mathematics and statistics to analyze and interpret biological data. The subsequent process of analyzing and interpreting data is referred to as computational biology.

Computational biology: Branch of biology

Computational biology refers to the use of data analysis, mathematical modeling and computational simulations to understand biological systems and relationships. An intersection of computer science, biology, and big data, the field also has foundations in applied mathematics, chemistry, and genetics. It differs from biological computing, a subfield of computer science and engineering which uses bioengineering to build computers.

Health informatics: Computational approaches to health care

Health informatics is the study and implementation of computer structures and algorithms to improve communication, understanding, and management of medical information. It can be viewed as a branch of engineering and applied science.

A sequence profiling tool in bioinformatics is a type of software that presents information related to a genetic sequence, gene name, or keyword input. Such tools generally take a query such as a DNA, RNA, or protein sequence or keyword and search one or more databases for information related to that sequence. Summaries and aggregate results are provided in a standardized format describing information that would otherwise have required visits to many smaller sites or direct literature searches to compile. Many sequence profiling tools are software portals or gateways that simplify the process of finding information about a query in the large and growing number of bioinformatics databases. Access to these tools is either web-based or through locally downloadable executables.
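As an illustration of the kind of database lookup such tools automate (not the implementation of any particular profiling tool), the sketch below uses Biopython's Entrez interface to run a keyword search against an NCBI database and print short summaries of the hits; the e-mail address and search term are placeholders.

```python
# Sketch of a keyword lookup of the kind a sequence profiling tool automates,
# using Biopython's Entrez interface to NCBI. The e-mail address and search
# term are placeholders; NCBI asks clients to identify themselves by e-mail.
from Bio import Entrez

Entrez.email = "you@example.org"  # placeholder contact address

# Search the nucleotide database for records matching a keyword query.
handle = Entrez.esearch(db="nucleotide",
                        term="BRCA1[Gene] AND human[Organism]",
                        retmax=5)
record = Entrez.read(handle)
handle.close()

# Fetch a short summary for each hit rather than the full sequence record.
for uid in record["IdList"]:
    summary_handle = Entrez.esummary(db="nucleotide", id=uid)
    docsum = Entrez.read(summary_handle)[0]
    summary_handle.close()
    print(uid, docsum["Title"])
```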

UK Biobank: Long-term biobank study of 500,000 people

UK Biobank is a large long-term biobank study in the United Kingdom (UK) which is investigating the respective contributions of genetic predisposition and environmental exposure to the development of disease. It began in 2006. UK Biobank has been cited as an important resource for cancer research.

The Rat Genome Database (RGD) is a database of rat genomics, genetics, physiology and functional data, as well as data for comparative genomics between rat, human and mouse. RGD is responsible for attaching biological information to the rat genome via structured vocabulary, or ontology, annotations assigned to genes and quantitative trait loci (QTL), and for consolidating rat strain data and making it available to the research community. RGD is also developing a suite of tools for mining and analyzing genomic, physiologic and functional data for the rat, and comparative data for rat, mouse, human, and five other species.

Bioconductor is a free, open source and open development software project for the analysis and comprehension of genomic data generated by wet lab experiments in molecular biology.

The cancer Biomedical Informatics Grid (caBIG) was a US government program to develop an open-source, open access information network called caGrid for secure data exchange on cancer research. The initiative was developed by the National Cancer Institute and was maintained by the Center for Biomedical Informatics and Information Technology (CBIIT) and program managed by Booz Allen Hamilton. In 2011 a report on caBIG raised significant questions about effectiveness and oversight, and its budget and scope were significantly trimmed. In May 2012, the National Cancer Informatics Program (NCIP) was created as caBIG's successor program.

The completion of the human genome sequencing in the early 2000s was a turning point in genomics research. Scientists have since conducted a series of studies into the activities of genes and the genome as a whole. The human genome contains around 3 billion nucleotide base pairs, and the huge quantity of data created necessitates the development of accessible tools to explore and interpret this information in order to investigate the genetic basis of disease, evolution, and biological processes. The field of genomics has continued to grow, with new sequencing technologies and computational tools making it easier to study the genome.

Human Genome Project: Human genome sequencing programme

The Human Genome Project (HGP) was an international scientific research project with the goal of determining the base pairs that make up human DNA, and of identifying, mapping and sequencing all of the genes of the human genome from both a physical and a functional standpoint. It started in 1990 and was completed in 2003. It remains the world's largest collaborative biological project. Planning for the project started after it was adopted in 1984 by the US government, and it officially launched in 1990. It was declared complete on April 14, 2003, and included about 92% of the genome. The "complete genome" level was achieved in May 2021, with only 0.3% of the remaining bases covered by potential issues. The final gapless assembly was finished in January 2022.

Open data: Openly accessible data

Open data is data that is openly accessible, exploitable, editable and shared by anyone for any purpose. Open data is licensed under an open license.

Public health genomics is the use of genomics information to benefit public health. This is visualized as more effective preventive care and disease treatments with better specificity, tailored to the genetic makeup of each patient. According to the Centers for Disease Control and Prevention (U.S.), public health genomics is an emerging field of study that assesses the impact of genes and their interaction with behavior, diet and the environment on the population's health.

The Cancer Genome Atlas (TCGA) is a project to catalogue the genomic alterations responsible for cancer using genome sequencing and bioinformatics. The overarching goal was to apply high-throughput genome analysis techniques to improve the ability to diagnose, treat, and prevent cancer through a better understanding of the genetic basis of the disease.

Galaxy (computational biology)

Galaxy is a scientific workflow, data integration, and data and analysis persistence and publishing platform that aims to make computational biology accessible to research scientists who do not have computer programming or systems administration experience. Although it was initially developed for genomics research, it is largely domain agnostic and is now used as a general bioinformatics workflow management system.

Personal genomics or consumer genetics is the branch of genomics concerned with the sequencing, analysis and interpretation of the genome of an individual. The genotyping stage employs different techniques, including single-nucleotide polymorphism (SNP) analysis chips, or partial or full genome sequencing. Once the genotypes are known, the individual's variations can be compared with the published literature to determine likelihood of trait expression, ancestry inference and disease risk.

Pan-cancer analysis aims to examine the similarities and differences among the genomic and cellular alterations found across diverse tumor types. International efforts have performed pan-cancer analysis on exomes and the whole genomes of cancers, the latter including their non-coding regions. In 2018, The Cancer Genome Atlas (TCGA) Research Network used exome, transcriptome, and DNA methylome data to develop an integrated picture of commonalities, differences, and emergent themes across tumor types.

The Cancer Imaging Archive (TCIA) is an open-access database of medical images for cancer research. The site is funded by the National Cancer Institute's (NCI) Cancer Imaging Program, and the contract is operated by the University of Arkansas for Medical Sciences. Data within the archive is organized into collections which typically share a common cancer type and/or anatomical site. The majority of the data consists of CT, MRI, and nuclear medicine images stored in DICOM format, but many other types of supporting data are also provided or linked to, in order to enhance research utility. All data are de-identified in order to comply with the Health Insurance Portability and Accountability Act and National Institutes of Health data sharing policies.

Open source: Practice of freely allowing access and modification of source code

Open source is source code that is made freely available for possible modification and redistribution. Products include permission to use the source code, design documents, or content of the product. The open-source model is a decentralized software development model that encourages open collaboration. A main principle of open-source software development is peer production, with products such as source code, blueprints, and documentation freely available to the public. The open-source movement in software began as a response to the limitations of proprietary code. The model is also applied beyond software, for example in open-source appropriate technology and open-source drug discovery.

Software for COVID-19 pandemic mitigation takes many forms. It includes mobile apps for contact tracing and notifications about infection risks, vaccine passports, software for enabling – or improving the effectiveness of – lockdowns and social distancing, Web software for the creation of related information services, and research and development software. A common issue is that few apps interoperate, reducing their effectiveness.

References

  1. Dodds, Leigh (13 October 2015). "What is a data portal?". Lost Boy.
  2. koordinates. "The ten features you need from your open data portal". koordinates.com. Retrieved 30 December 2019.
  3. Warner, Tiana (10 May 2016). "Guide to Open Data: Using it, Sharing it, and Creating a Portal". Safe Software.
  4. Janssen, Marijn; Charalabidis, Yannis; Zuiderwijk, Anneke (September 2012). "Benefits, Adoption Barriers and Myths of Open Data and Open Government". Information Systems Management. 29 (4): 258–268. doi:10.1080/10580530.2012.716740.
  5. "Open data portals". Digital Single Market - European Commission. 2 August 2013.
  6. Brown, Meta S. (30 April 2018). "States Offer Information Resources: 50+ Open Data Portals". Forbes.
  7. Brown, Meta S. (28 April 2018). "City Governments Making Public Data Easier To Get: 90 Municipal Open Data Portals". Forbes.
  8. Verma, Neeta; Gupta, M. P. (2013). "Open government data". Proceedings of the 7th International Conference on Theory and Practice of Electronic Governance - ICEGOV '13. pp. 338–341. doi:10.1145/2591888.2591949. ISBN 9781450324564.
  9. Chatfield, Akemi Takeoka; Reddick, Christopher G. (2017). "A longitudinal cross-sector analysis of open data portal service capability: The case of Australian local governments". Government Information Quarterly. 34 (2): 231–243. doi:10.1016/j.giq.2017.02.004. ISSN 0740-624X.
  10. Zhang, J.; Baran, J.; Cros, A.; Guberman, J. M.; Haider, S.; Hsu, J.; Liang, Y.; Rivkin, E.; Wang, J.; Whitty, B.; Wong-Erasmus, M.; Yao, L.; Kasprzyk, A. (19 September 2011). "International Cancer Genome Consortium Data Portal – a one-stop shop for cancer genomics data". Database. 2011: bar026. doi:10.1093/database/bar026. PMC 3263593. PMID 21930502.
  11. Chen, Bin; Ding, Ying; Wang, Huijun; Wild, David J.; Dong, Xiao; Sun, Yuyin; Zhu, Qian; Sankaranarayanan, Madhuvanthi (2010). "Chem2Bio2RDF: A Linked Open Data Portal for Systems Chemical Biology". 2010 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology. pp. 232–239. doi:10.1109/WI-IAT.2010.183. ISBN 978-1-4244-8482-9.