Open Data Indices

Last updated March 25, 2020

Open data indices are indicators that assess and evaluate the general openness of an open government data portal. They not only show how open a data portal is, but also encourage citizens and government officials alike to participate in their local open data communities, particularly by advocating for local open data and local open data policies.

There are two mainstream methodologies: the Global Open Data Index and the Open Data Barometer. The Global Open Data Index evaluates an open data portal on 11 different aspects based on the Open Definition of open data, while the Open Data Barometer adds two more questions to those used by the Global Open Data Index.

Scoring standard

Open Knowledge International runs the "Global Open Data Index", which it describes as "an annual effort to measure the state of open government data around the world".^[1] The index evaluates the openness of an open dataset according to the following questions:

1. Does the data exist? (5 marks)

The Open Knowledge Foundation specifies that the data in an open data portal should come directly from the official government department, or from a third party that the government has authorized to fully represent it. In the latter case, the third party should state that permission explicitly.

2. Is data in digital form? (5 marks)

This question does not examine whether the data can be accessed online or by the public, only whether the data exists in any digital format.

3. Publicly available? (5 marks)

Data can be considered publicly available when every individual (not just government officers) can access it without any permission or password and, if the data is in paper form, there is no restriction on the number of photocopies that can be made. For this question, it does not matter whether the data is in paper or digital form.

4. Is the data available for free? (15 marks)

The data is available for free if accessing it does not involve any form of charge.

5. Is the data available online? (5 marks)

The data is available online if it can be accessed through the Internet from an official source.

6. Is the data machine-readable? (15 marks)

This question addresses whether the data is in a form that can be easily processed by a computer. File types such as XLS, CSV, JSON and XML are considered machine-readable, while PDF and HTML are not.

7. Available in bulk? (10 marks)

If the whole dataset can be easily downloaded, it can be considered available in bulk.

8. Openly licensed? (30 marks)

This question addresses whether the data can be freely used, reused and redistributed by everyone without any restrictions. A list of licenses that meet these requirements is available at http://opendefinition.org/licenses/.

9. Is the data provided on a timely and up to date basis? (10 marks)

This question examines whether the data is updated on a regular basis. Answering it requires personal judgement, supported by a stated rationale.
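As a rough illustration of how the marks above combine, the following Python sketch scores a single dataset by adding up the marks of every question answered "yes". The question keys (`exists`, `open_license` and so on) and the function name `dataset_score` are hypothetical labels chosen for this example. Treating every question as a strict yes/no is a simplifying assumption; the real assessment involves reviewer judgement (see question 9) and is not reproduced here.

```python
# Marks per question, as listed in the Global Open Data Index questions above.
# The dictionary keys are hypothetical labels chosen for this sketch.
QUESTION_WEIGHTS = {
    "exists": 5,
    "digital": 5,
    "publicly_available": 5,
    "free": 15,
    "online": 5,
    "machine_readable": 15,
    "bulk": 10,
    "open_license": 30,
    "timely": 10,
}

# The nine weights add up to 100 marks per dataset.
MAX_DATASET_SCORE = sum(QUESTION_WEIGHTS.values())


def dataset_score(answers: dict[str, bool]) -> int:
    """Return the marks earned by one dataset.

    Simplifying assumption: every question is a strict yes/no, and a
    question missing from `answers` counts as "no".
    """
    unknown = set(answers) - set(QUESTION_WEIGHTS)
    if unknown:
        raise ValueError(f"unknown question keys: {sorted(unknown)}")
    return sum(
        weight
        for question, weight in QUESTION_WEIGHTS.items()
        if answers.get(question, False)
    )


# Meets every criterion except an open license and timely updates.
example = {q: True for q in QUESTION_WEIGHTS if q not in ("open_license", "timely")}
print(dataset_score(example))  # 60
```

The open license question alone carries 30 of the 100 marks, so a dataset that is technically accessible but not openly licensed loses nearly a third of its possible score.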

Each of these questions evaluates a different aspect of a dataset, and each question is weighted according to its importance, so a single dataset can score at most 100 marks. There are 13 types of datasets in total, so the maximum possible score that a country can get is 13 × 100 = 1300. Writing s_d for the score of dataset d, the final score is calculated as:

$$\text{index percentage} = \frac{\sum_{d=1}^{13} s_d}{1300} \times 100\%$$

The Global Open Data Index ranks each country according to its percentage of openness.
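The index calculation can be sketched in Python as follows. The helper names `index_percentage` and `rank_countries` and the sample countries are hypothetical; the sketch assumes only what the text states: 13 datasets per country, at most 100 marks per dataset, and ranking by percentage.

```python
NUM_DATASETS = 13
MAX_DATASET_SCORE = 100
MAX_COUNTRY_SCORE = NUM_DATASETS * MAX_DATASET_SCORE  # 1300


def index_percentage(dataset_scores: list[int]) -> float:
    """Sum of the 13 dataset scores divided by 1300, as a percentage."""
    if len(dataset_scores) != NUM_DATASETS:
        raise ValueError(
            f"expected {NUM_DATASETS} dataset scores, got {len(dataset_scores)}"
        )
    if any(not 0 <= s <= MAX_DATASET_SCORE for s in dataset_scores):
        raise ValueError("each dataset score must be between 0 and 100")
    return 100 * sum(dataset_scores) / MAX_COUNTRY_SCORE


def rank_countries(scores_by_country: dict[str, list[int]]) -> list[tuple[str, float]]:
    """Return (country, percentage) pairs, most open first."""
    percentages = {c: index_percentage(s) for c, s in scores_by_country.items()}
    return sorted(percentages.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical scores for illustration only, not real index data.
    example = {"Country A": [50] * 13, "Country B": [100] * 12 + [0]}
    for country, pct in rank_countries(example):
        print(f"{country}: {pct:.1f}%")
    # Country B: 92.3%
    # Country A: 50.0%
```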

In addition, the Open Data Barometer adds two more questions to its evaluation of an open data portal:

10. Is the publication of the dataset sustainable?

11. Are (linked) data URIs provided for key elements of the data?

Related Research Articles

Open content describes any work that others can copy or modify freely by attributing to the original creator, but without needing to ask for permission. This has been applied to a range of formats, including textbooks, academic journals, films and music. The term was an expansion of the related concept of open-source software. Such content is said to be under an open licence.

The Semantic Web is an extension of the World Wide Web through standards set by the World Wide Web Consortium (W3C). The goal of the Semantic Web is to make Internet data machine-readable. To enable the encoding of semantics with the data, technologies such as Resource Description Framework (RDF) and Web Ontology Language (OWL) are used. These technologies are used to formally represent metadata. For example, an ontology can describe concepts, relationships between entities, and categories of things. These embedded semantics offer significant advantages such as reasoning over data and operating with heterogeneous data sources.

Data mining is the process of discovering patterns in large data sets involving methods at the intersection of machine learning, statistics, and database systems. Data mining is an interdisciplinary subfield of computer science and statistics with an overall goal to extract information from a data set and transform the information into a comprehensible structure for further use. Data mining is the analysis step of the "knowledge discovery in databases" process or KDD. Aside from the raw analysis step, it also involves database and data management aspects, data pre-processing, model and inference considerations, interestingness metrics, complexity considerations, post-processing of discovered structures, visualization, and online updating.

JSTOR is a digital library founded in 1995. Originally containing digitized back issues of academic journals, it now encompasses books and other primary sources as well as current issues of journals. It provides full-text searches of almost 2,000 journals.

A digital object identifier (DOI) is a persistent identifier or handle used to identify objects uniquely, standardized by the International Organization for Standardization (ISO). An implementation of the Handle System, DOIs are in wide use mainly to identify academic, professional, and government information, such as journal articles, research reports and data sets, and official publications, though they have also been used to identify other types of information resources, such as commercial videos.

Species diversity is the number of different species that are represented in a given community. The effective number of species refers to the number of equally abundant species needed to obtain the same mean proportional species abundance as that observed in the dataset of interest. Meanings of species diversity may include species richness, taxonomic or phylogenetic diversity, and/or species evenness. Species richness is a simple count of species. Taxonomic or phylogenetic diversity is the genetic relationship between different groups of species. Species evenness quantifies how equal the abundances of the species are.
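The effective number of species is commonly formalized through Hill numbers, (Σ p_i^q)^(1/(1−q)), where p_i is the proportional abundance of species i and the order q controls how strongly common species are weighted. The paragraph above does not fix an order, so this Python sketch leaves q as a parameter and assumes q = 1 (the exponential of Shannon entropy) as the default. For evenness it uses Pielou's J, which is one common choice among several evenness indices. The function names are illustrative only.

```python
import math


def proportions(abundances: list[float]) -> list[float]:
    """Relative abundance p_i of each species that is actually present."""
    total = sum(abundances)
    if total <= 0:
        raise ValueError("abundances must contain at least one positive count")
    return [a / total for a in abundances if a > 0]


def species_richness(abundances: list[float]) -> int:
    """Simple count of species with a positive abundance."""
    return sum(1 for a in abundances if a > 0)


def shannon_entropy(abundances: list[float]) -> float:
    """H = -sum(p_i * ln p_i)."""
    return -sum(p * math.log(p) for p in proportions(abundances))


def effective_number_of_species(abundances: list[float], q: float = 1.0) -> float:
    """Hill number of order q: (sum p_i^q)^(1 / (1 - q)).

    q = 0 gives richness, q = 1 (taken as the limit) gives exp(H),
    and q = 2 gives the inverse Simpson index.
    """
    if q == 1:
        return math.exp(shannon_entropy(abundances))
    return sum(p ** q for p in proportions(abundances)) ** (1 / (1 - q))


def pielou_evenness(abundances: list[float]) -> float:
    """J = H / ln(S): 1.0 when all present species are equally abundant."""
    s = species_richness(abundances)
    if s < 2:
        raise ValueError("evenness needs at least two species")
    return shannon_entropy(abundances) / math.log(s)
```

For a community in which all species are equally abundant, every order q returns the species richness, which matches the definition of the effective number as the count of equally abundant species giving the same mean abundance. Uneven communities return a smaller effective number.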

Consumer confidence is an economic indicator that measures the degree of optimism that consumers feel about the overall state of the economy and their personal financial situation. If consumers are confident about the economy in the immediate and near future and about their personal finances, they will spend more and save less.

Freedom in the World is a yearly survey and report by the U.S.-based non-governmental organization Freedom House that measures the degree of civil liberties and political rights in every nation and significant related and disputed territories around the world.


Open data is the idea that some data should be freely available to everyone to use and republish as they wish, without restrictions from copyright, patents or other mechanisms of control. The goals of the open-source data movement are similar to those of other "open(-source)" movements such as open-source software, hardware, open content, open education, open educational resources, open government, open knowledge, open access, open science, and the open web. Paradoxically, the growth of the open data movement is paralleled by a rise in intellectual property rights. The philosophy behind open data has been long established, but the term "open data" itself is recent, gaining popularity with the rise of the Internet and World Wide Web and, especially, with the launch of open-data government initiatives such as Data.gov, Data.gov.uk and Data.gov.in.

DBpedia is a project aiming to extract structured content from the information created in the Wikipedia project. This structured information is made available on the World Wide Web. DBpedia allows users to semantically query relationships and properties of Wikipedia resources, including links to other related datasets. Tim Berners-Lee described DBpedia as one of the most famous parts of the decentralized Linked Data effort.

The Neuroscience Information Framework is a repository of global neuroscience web resources, including experimental, clinical, and translational neuroscience databases, knowledge bases, atlases, and genetic/genomic resources and provides many authoritative links throughout the neuroscience portal of Wikipedia.

Open data in Canada describes the capacity for the Canadian Federal Government and other levels of government in Canada to provide online access to data collected and created by governments in a standards-compliant Web 2.0 way. Open data requires that machine-readable data be made openly available, simple to access, and convenient to reuse. As of 2016, Canada was ranked 2nd in the world for publishing open data by the World Wide Web Foundation's Open Data Barometer. As of July 2018, however, Canada was ranked 7th, alongside Norway.

The European Climate Assessment and Dataset (ECA&D) is a database of daily meteorological station observations across Europe and is gradually being extended to countries in the Middle East and North Africa. ECA&D has attained the status of Regional Climate Centre for high-resolution observation data in World Meteorological Organization Region VI.

The Definition of Free Cultural Works is a definition of free content from 2006. The project evaluates and recommends compatible free content licenses.

The Open Government Initiative is an effort by the administration of President of the United States Barack Obama to "[create] an unprecedented level of openness in Government." The directive starting this initiative was issued on January 20, 2009, Obama's first day in office.

Crown Copyright has been a long-standing copyright protection applied to official works, and at times artistic works, produced under royal or official supervision. In 2006, The Guardian newspaper's Technology section began a "Free Our Data" campaign, calling for data gathered by authorities at public expense to be made freely available for reuse by individuals. In 2010 with the creation of the Open Government Licence and the Data.gov.uk site it appeared that the campaign had been mostly successful, and since 2013 the UK has been consistently named one of the leaders in the open data space.

Open science data is a type of open data focused on publishing observations and results of scientific activities available for anyone to analyze and reuse. A major purpose of the drive for open data is to allow the verification of scientific claims, by allowing others to look at the reproducibility of results, and to allow data from many sources to be integrated to give new knowledge. While the idea of open science data has been actively promoted since the 1950s, the rise of the Internet has significantly lowered the cost and time required to publish or obtain data.

The Global Slavery Index is a global study of modern slavery published by the Minderoo Foundation’s Walk Free initiative. Four editions have been published: in 2013, 2014, 2016 and 2018.

Big data ethics, also known simply as data ethics, refers to systemizing, defending, and recommending concepts of right and wrong conduct in relation to data, in particular personal data. Since the dawn of the Internet, the quantity and quality of data have increased dramatically and continue to grow exponentially. Big data describes data so voluminous and complex that traditional data processing application software is inadequate to deal with it. Recent innovations in medical research and healthcare, such as high-throughput genome sequencing, high-resolution imaging, electronic medical patient records and a plethora of internet-connected health devices, have triggered a data deluge that will reach the exabyte range in the near future. Data ethics is becoming more relevant as the quantity of data grows, because the scale of its impact grows with it.

FAIR data are data which meet principles of findability, accessibility, interoperability, and reusability. A March 2016 publication by a consortium of scientists and organizations specified the "FAIR Guiding Principles for scientific data management and stewardship" in Scientific Data, using FAIR as an acronym and making the concept easier to discuss.

References

[1] Open Knowledge. "Open Data Index - Open Knowledge". Open Data Index. Retrieved 2016-10-20.
