Open data is data that is openly accessible, exploitable, editable and shareable by anyone for any purpose. Open data is licensed under an open license. [1] [2] [3]
The goals of the open data movement are similar to those of other "open(-source)" movements such as open-source software, open-source hardware, open content, open specifications, open education, open educational resources, open government, open knowledge, open access, open science, and the open web. The growth of the open data movement is paralleled by a rise in intellectual property rights. [4] The philosophy behind open data has been long established (for example in the Mertonian tradition of science), but the term "open data" itself is recent, gaining popularity with the rise of the Internet and World Wide Web and, especially, with the launch of open-data government initiatives Data.gov, Data.gov.uk and Data.gov.in.
Open data that is also linked data is referred to as linked open data.
One of the most important forms of open data is open government data (OGD), which is open data created by government institutions. Open government data is important because it is part of citizens' everyday lives, down to the most routine and mundane tasks that are seemingly far removed from government.
The abbreviation FAIR/O data is sometimes used to indicate that the dataset or database in question complies with the principles of FAIR data and carries an explicit data‑capable open license.
The concept of open data is not new, but a formalized definition is relatively new. Open data as a phenomenon denotes that governmental data should be available to anyone with a possibility of redistribution in any form without any copyright restriction. [5] Another definition is the Open Definition, which can be summarized as "a piece of data is open if anyone is free to use, reuse, and redistribute it – subject only, at most, to the requirement to attribute and/or share-alike." [6] Other definitions, including the Open Data Institute's "open data is data that anyone can access, use or share," offer an accessible short version but refer back to the formal definition. [7] Open data may include non-textual material such as maps, genomes, connectomes, chemical compounds, mathematical and scientific formulae, medical data and practice, bioscience and biodiversity.
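The Open Definition summary above can be read as a simple test: a license counts as open if the only conditions it imposes are attribution and/or share-alike. The following Python sketch illustrates that reading. The `License` class, the condition labels, and the simplified condition sets attached to each example license are assumptions made for illustration; this is not an official conformance checker or license registry.

```python
from dataclasses import dataclass

# Conditions the Open Definition summary tolerates (labels are this sketch's own).
ALLOWED_CONDITIONS = frozenset({"attribution", "share-alike"})


@dataclass(frozen=True)
class License:
    """Hypothetical record describing the conditions a license imposes."""
    name: str
    conditions: frozenset


def is_open(license_: License) -> bool:
    """Return True if the license imposes, at most, attribution and/or share-alike."""
    return license_.conditions <= ALLOWED_CONDITIONS


# Illustrative entries; the condition sets are simplified assumptions.
cc0 = License("CC0", frozenset())
cc_by_sa = License("CC BY-SA", frozenset({"attribution", "share-alike"}))
cc_by_nc = License("CC BY-NC", frozenset({"attribution", "non-commercial"}))

for lic in (cc0, cc_by_sa, cc_by_nc):
    print(lic.name, is_open(lic))
```

Under these assumptions, a license with a non-commercial condition fails the test because that condition goes beyond attribution and share-alike. A dataset with no license at all cannot be represented here, which mirrors the uncertainty that unlicensed data creates in practice.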
A major barrier to the open data movement is the commercial value of data. Access to, or re-use of, data is often controlled by public or private organizations. Control may be through access restrictions, licenses, copyright, patents and charges for access or re-use. Advocates of open data argue that these restrictions detract from the common good and that data should be available without restrictions or fees.
Creators of data often do not consider the need to state the conditions of ownership, licensing and re-use, instead presuming that not asserting copyright enters the data into the public domain. For example, many scientists do not consider the data published with their work to be theirs to control and consider the act of publication in a journal to be an implicit release of data into the commons. The lack of a license makes it difficult to determine the status of a data set and may restrict the use of data offered in an "Open" spirit. Because of this uncertainty it is possible for public or private organizations to aggregate said data, claim that it is protected by copyright, and then resell it.
Open data can come from any source. This section lists some of the fields that publish (or at least discuss publishing) a large amount of open data.
The concept of open access to scientific data was established with the formation of the World Data Center system, in preparation for the International Geophysical Year of 1957–1958. [8] The International Council of Scientific Unions (now the International Council for Science) oversees several World Data Centres with the mission to minimize the risk of data loss and to maximize data accessibility. [9]
While the open-science-data movement long predates the Internet, the advent of fast, readily available networking has significantly changed the context of open science data, as publishing or obtaining data has become much less expensive and time-consuming. [10]
The Human Genome Project was a major initiative that exemplified the power of open data. It was built upon the so-called Bermuda Principles, stipulating that: "All human genomic sequence information … should be freely available and in the public domain in order to encourage research and development and to maximize its benefit to society". [11] More recent initiatives such as the Structural Genomics Consortium have illustrated that the open data approach can be used productively within the context of industrial R&D. [12]
In 2004, the Science Ministers of all nations of the Organisation for Economic Co-operation and Development (OECD), which includes most developed countries of the world, signed a declaration which states that all publicly funded archive data should be made publicly available. [13] Following a request and an intense discussion with data-producing institutions in member states, the OECD published in 2007 the OECD Principles and Guidelines for Access to Research Data from Public Funding as a soft-law recommendation. [14]
Examples of open data in science:
There are a range of different arguments for government open data. [19] [20] Some advocates say that making government information available to the public as machine readable open data can facilitate government transparency, accountability and public participation. "Open data can be a powerful force for public accountability—it can make existing information easier to analyze, process, and combine than ever before, allowing a new level of public scrutiny." [21] Governments that enable public viewing of data can help citizens engage within the governmental sectors and "add value to that data." [22] Open data experts have nuanced the impact that opening government data may have on government transparency and accountability. In a widely cited paper, scholars David Robinson and Harlan Yu contend that governments may project a veneer of transparency by publishing machine-readable data that does not actually make government more transparent or accountable. [23] Drawing from earlier studies on transparency and anticorruption, [24] World Bank political scientist Tiago C. Peixoto extended Yu and Robinson's argument by highlighting a minimal chain of events necessary for open data to lead to accountability:
Some make the case that opening up official information can support technological innovation and economic growth by enabling third parties to develop new kinds of digital applications and services. [26]
Several national governments have created websites to distribute a portion of the data they collect. It is a concept for a collaborative project in the municipal Government to create and organize culture for Open Data or Open government data.
Additionally, other levels of government have established open data websites. There are many government entities pursuing Open Data in Canada. Data.gov lists the sites of a total of 40 US states and 46 US cities and counties with websites to provide open data, e.g., the state of Maryland, the state of California, US [27] and New York City. [28]
At the international level, the United Nations has an open data website that publishes statistical data from member states and UN agencies, [29] and the World Bank publishes a range of statistical data relating to developing countries. [30] The European Commission has created two portals for the European Union: the EU Open Data Portal, which gives access to open data from the EU institutions, agencies and other bodies, [31] and the European Data Portal, which provides datasets from local, regional and national public bodies across Europe. [32] The two portals were consolidated into data.europa.eu on April 21, 2021.
Italy is the first country to release standard processes and guidelines under a Creative Commons license for widespread use in public administration. The open model is called the Open Data Management Cycle and was adopted in several regions such as Veneto and Umbria. [33] [34] [35] Major cities such as Reggio Calabria and Genova have also adopted this model. [36]
In October 2015, the Open Government Partnership launched the International Open Data Charter, a set of principles and best practices for the release of governmental open data formally adopted by seventeen governments of countries, states and cities during the OGP Global Summit in Mexico. [37]
In July 2024, the OECD adopted Creative Commons CC-BY-4.0 licensing for its published data and reports. [38]
Many non-profit organizations offer open access to their data, as long as it does not undermine the privacy rights of their users, members or third parties. Unlike for-profit corporations, they do not seek to monetize their data. OpenNWT launched a website offering open data of elections. [39] CIAT offers open data to anybody who is willing to conduct big data analytics in order to enhance the benefit of international agricultural research. [40] DBLP, which is owned by the non-profit organization Dagstuhl, offers its database of scientific publications from computer science as open data. [41]
Hospitality exchange services, including Bewelcome, Warm Showers, and CouchSurfing (before it became for-profit) have offered scientists access to their anonymized data for analysis, public research, and publication. [42] [43] [44] [45] [46]
At a small level, a business or research organization's policies and strategies towards open data will vary, sometimes greatly. One common strategy employed is the use of a data commons. A data commons is an interoperable software and hardware platform that aggregates (or collocates) data, data infrastructure, and data-producing and data-managing applications in order to better allow a community of users to manage, analyze, and share their data with others over both short- and long-term timelines. [47] [48] [49] Ideally, this interoperable cyberinfrastructure should be robust enough "to facilitate transitions between stages in the life cycle of a collection" of data and information resources [47] while still being driven by common data models and workspace tools enabling and supporting robust data analysis. [49] The policies and strategies underlying a data commons will ideally involve numerous stakeholders, including the data commons service provider, data contributors, and data users. [48]
Grossman et al. [48] suggest six major considerations for a data commons strategy that better enables open data in businesses and research organizations. Such a strategy should address the need for:
Beyond individual businesses and research centers, and at a more macro level, countries like Germany [50] have launched their own official nationwide open data strategies, detailing how data management systems and data commons should be developed, used, and maintained for the greater public good.
Opening government data is only a waypoint on the road to improving education, improving government, and building tools to solve other real-world problems. While many arguments have been made categorically, the following discussion of arguments for and against open data highlights that these arguments often depend highly on the type of data and its potential uses.
Arguments made on behalf of open data include the following:
It is generally held that factual data cannot be copyrighted. [59] Publishers frequently add copyright statements (often forbidding re-use) to scientific data accompanying publications. It may be unclear whether the factual data embedded in full text are part of the copyright.
While the human abstraction of facts from paper publications is normally accepted as legal, there is often an implied restriction on machine extraction by robots.
Unlike open access, where groups of publishers have stated their concerns, open data is normally challenged by individual institutions. Their arguments have been discussed less in public discourse and there are fewer quotes to rely on at this time.
Arguments against making all data available as open data include the following:
The paper entitled "Optimization of Soft Mobility Localization with Sustainable Policies and Open Data" [63] argues that open data is a valuable tool for improving the sustainability and equity of soft mobility in cities. The author argues that open data can be used to identify the needs of different areas of a city, develop algorithms that are fair and equitable, and justify the installation of soft mobility resources.
The goals of the Open Data movement are similar to those of other "Open" movements.
Formally, both the definition of Open Data and that of commons revolve around the concept of shared resources with a low barrier to access. Substantially, digital commons include Open Data in that they include resources maintained online, such as data. [68] Overall, looking at the operational principles of Open Data, one can see the overlap between Open Data and (digital) commons in practice. Principles of Open Data sometimes differ depending on the type of data under scrutiny. [69] Nonetheless, they overlap somewhat, and their key rationale is the lack of barriers to the re-use of data(sets). [69] Regardless of their origin, principles across types of Open Data hint at the key elements of the definition of commons. These are, for instance, accessibility, re-use, findability, and non-proprietary status. [69] Additionally, although to a lesser extent, the threats and opportunities associated with both Open Data and commons are similar. In sum, they revolve around the (risks and) benefits associated with the (uncontrolled) use of common resources by a large variety of actors.
Both commons and Open Data can be defined by the features of the resources that fit under these concepts, but they can also be defined by the characteristics of the systems their advocates push for. Governance is a focus for both Open Data and commons scholars. [69] [68] The key elements that outline commons and Open Data peculiarities are the differences from (and maybe opposition to) the dominant market logics as shaped by capitalism. [68] Perhaps it is this feature that emerges in the recent surge of the concept of commons as related to a more social look at digital technologies in the specific forms of digital and, especially, data commons.
Application of open data for societal good has been demonstrated in academic research works. [70] The paper "Optimization of Soft Mobility Localization with Sustainable Policies and Open Data" uses open data in two ways. First, it uses open data to identify the needs of different areas of a city. For example, it might use data on population density, traffic congestion, and air quality to determine where soft mobility resources, such as bike racks and charging stations for electric vehicles, are most needed. Second, it uses open data to develop algorithms that are fair and equitable. For example, it might use data on the demographics of a city to ensure that soft mobility resources are distributed in a way that is accessible to everyone, regardless of age, disability, or gender. The paper also discusses the challenges of using open data for soft mobility optimization. One challenge is that open data is often incomplete or inaccurate. Another challenge is that it can be difficult to integrate open data from different sources. Despite these challenges, the paper argues that open data is a valuable tool for improving the sustainability and equity of soft mobility in cities.
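The first use described above, identifying which areas most need soft mobility resources, can be illustrated with a simple weighted-indicator ranking. The sketch below is not the method of the cited paper: the district names, the indicator values, the weights, and the function names `normalize` and `rank_by_need` are all hypothetical, and min-max normalization followed by a weighted sum is just one common way to combine indicators measured in different units. The equity step is not covered.

```python
def normalize(values):
    """Rescale a list of numbers to the range [0, 1] (min-max normalization)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All districts share the same value, so the indicator carries no signal.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def rank_by_need(districts, weights):
    """Rank districts by a weighted sum of normalized indicators, highest first."""
    names = list(districts)
    scores = {name: 0.0 for name in names}
    for indicator, weight in weights.items():
        column = normalize([districts[name][indicator] for name in names])
        for name, value in zip(names, column):
            scores[name] += weight * value
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Made-up figures for three hypothetical districts.
districts = {
    "A": {"population_density": 9000, "congestion": 0.8, "pm25": 30},
    "B": {"population_density": 4000, "congestion": 0.5, "pm25": 20},
    "C": {"population_density": 1000, "congestion": 0.2, "pm25": 10},
}
# Hypothetical weights expressing how much each indicator counts.
weights = {"population_density": 0.5, "congestion": 0.3, "pm25": 0.2}

ranking = rank_by_need(districts, weights)
for name, score in ranking:
    print(name, round(score, 3))
```

Because the result depends entirely on the chosen weights and on the completeness of the input data, a ranking like this inherits the data-quality and integration problems that the paper identifies.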
An example of the relationship between Open Data and commons, and of how their governance can potentially disrupt the market logic otherwise dominating big data, is a project conducted by Human Ecosystem Relazioni in Bologna (Italy). See: https://www.he-r.it/wp-content/uploads/2017/01/HUB-report-impaginato_v1_small.pdf.
This project aimed at extrapolating and identifying online social relations surrounding “collaboration” in Bologna. Data was collected from social networks and online platforms for citizen collaboration. The data was then analyzed for content, meaning, location, timeframe, and other variables. Overall, online social relations for collaboration were analyzed based on network theory. The resulting datasets have been made available online as Open Data (aggregated and anonymized); nonetheless, individuals can reclaim all their data. This has been done with the idea of making data into a commons. The project exemplifies the relationship between Open Data and commons, and how they can disrupt the market logic driving big data use, in two ways. First, it shows how such projects, by following the rationale of Open Data, can trigger the creation of effective data commons; the project itself offered different types of support to social network platform users to have content removed. Second, opening data regarding online social network interactions has the potential to significantly reduce the monopolistic power of social network platforms over those data.
Several funding bodies that mandate Open Access also mandate Open Data. A good expression of requirements (truncated in places) is given by the Canadian Institutes of Health Research (CIHR): [71]
Other bodies promoting the deposition of data and full text include the Wellcome Trust. An academic paper published in 2013 advocated that Horizon 2020 (the science funding mechanism of the EU) should mandate that funded projects hand in their databases as "deliverables" at the end of the project so that they can be checked for third-party usability and then shared. [72]
A geographic information system (GIS) consists of integrated computer hardware and software that store, manage, analyze, edit, output, and visualize geographic data. Much of this often happens within a spatial database; however, this is not essential to meet the definition of a GIS. In a broader sense, one may consider such a system also to include human users and support staff, procedures and workflows, the body of knowledge of relevant concepts and methods, and institutional organizations.
Research funding is a term generally covering any funding for scientific research, in the areas of natural science, technology, and social science. Different methods can be used to disburse funding, but the term often connotes funding obtained through a competitive process, in which potential research projects are evaluated and only the most promising receive funding. It is often measured via Gross domestic expenditure on R&D (GERD).
Open educational resources (OER) are teaching, learning, and research materials intentionally created and licensed to be free for the end user to own, share, and in most cases, modify. The term "OER" describes publicly accessible materials and resources for any user to use, re-mix, improve, and redistribute under some licenses. These are designed to reduce accessibility barriers by implementing best practices in teaching and to be adapted for local unique contexts.
Grey literature is materials and research produced by organizations outside of the traditional commercial or academic publishing and distribution channels. Common grey literature publication types include reports, working papers, government documents, white papers and evaluations. Organizations that produce grey literature include government departments and agencies, civil society or non-governmental organizations, academic centres and departments, and private companies and consultants.
Open science is the movement to make scientific research and its dissemination accessible to all levels of society, amateur or professional. Open science is transparent and accessible knowledge that is shared and developed through collaborative networks. It encompasses practices such as publishing open research, campaigning for open access, encouraging scientists to practice open-notebook science, broader dissemination and engagement in science and generally making it easier to publish, access and communicate scientific knowledge.
Open government is the governing doctrine which maintains that citizens have the right to access the documents and proceedings of the government to allow for effective public oversight. In its broadest construction, it opposes reason of state and other considerations which have tended to legitimize extensive state secrecy. The origins of open-government arguments can be dated to the time of the European Age of Enlightenment, when philosophers debated the proper construction of a then nascent democratic society. It is also increasingly being associated with the concept of democratic reform. The United Nations Sustainable Development Goal 16 for example advocates for public access to information as a criterion for ensuring accountable and inclusive institutions.
Research data archiving is the long-term storage of scholarly research data, including the natural sciences, social sciences, and life sciences. The various academic journals have differing policies regarding how much of their data and methods researchers are required to store in a public archive, and what is actually archived varies widely between different disciplines. Similarly, the major grant-giving institutions have varying attitudes towards public archiving of data. In general, the tradition of science has been for publications to contain sufficient information to allow fellow researchers to replicate and therefore test the research. In recent years this approach has become increasingly strained as research in some areas depends on large datasets which cannot easily be replicated independently.
Free content, libre content, libre information, or free information is any kind of creative work, such as a work of art, a book, a software program, or any other creative content unrestricted by copyright and other legal limitations on use. These are works or expressions which can be freely studied, applied, copied and modified by anyone for any purpose including, in some cases, commercial purposes. Free content encompasses all works in the public domain and also those copyrighted works whose licenses honor and uphold the definition of free cultural work.
The National Library of Economics (ZBW) is the world's largest research infrastructure for economic literature, online as well as offline. The ZBW is a member of the Leibniz Association and has been a foundation under public law since 2007. The ZBW has received the international LIBER Award several times for its innovative work in librarianship. The ZBW provides access to millions of documents and research on economics, partnering with over 40 research institutions to create a connective Open Access portal and social web of research. Through its EconStor and EconBiz, researchers and students have accessed millions of datasets and thousands of articles. The ZBW also edits two journals: Wirtschaftsdienst and Intereconomics.
Open scientific data or open research data is a type of open data focused on publishing observations and results of scientific activities available for anyone to analyze and reuse. A major purpose of the drive for open data is to allow the verification of scientific claims, by allowing others to look at the reproducibility of results, and to allow data from many sources to be integrated to give new knowledge.
A memory institution is an organization maintaining a repository of public knowledge, a generic term used about institutions such as libraries, archives, heritage institutions, aquaria and arboreta, and zoological and botanical gardens, as well as providers of digital libraries and data aggregation services which serve as memories for given societies or mankind. Memory institutions serve the purpose of documenting, contextualizing, preserving and indexing elements of human culture and collective memory. These institutions allow and enable society to better understand themselves, their past, and how the past impacts their future. These repositories are ultimately preservers of communities, languages, cultures, customs, tribes, and individuality. Memory institutions are repositories of knowledge, while also being actors of the transitions of knowledge and memory to the community. These institutions ultimately remain some form of collective memory. Increasingly such institutions are considered as a part of a unified documentation and information science perspective.
The digital commons are a form of commons involving the distribution and communal ownership of informational resources and technology. Resources are typically designed to be used by the community by which they are created.
The Panton Principles are a set of principles which were written to promote open science. They were first drafted in July 2009 at the Panton Arms pub in Cambridge.
Data publishing is the act of releasing research data in published form for use by others. It is a practice consisting in preparing certain data or data set(s) for public use thus to make them available to everyone to use as they wish. This practice is an integral part of the open science movement. There is a large and multidisciplinary consensus on the benefits resulting from this practice.
In natural language processing, linguistics, and neighboring fields, Linguistic Linked Open Data (LLOD) describes a method and an interdisciplinary community concerned with creating, sharing, and (re-)using language resources in accordance with Linked Data principles. The Linguistic Linked Open Data Cloud was conceived and is being maintained by the Open Linguistics Working Group (OWLG) of the Open Knowledge Foundation, but has been a point of focal activity for several W3C community groups, research projects, and infrastructure efforts since then.
Open energy system database projects employ open data methods to collect, clean, and republish energy-related datasets for open use. The resulting information is then available, given a suitable open license, for statistical analysis and for building numerical energy system models, including open energy system models. Permissive licenses like Creative Commons CC0 and CC BY are preferred, but some projects will house data made public under market transparency regulations and carrying unqualified copyright.
FAIR data is data which meets the FAIR principles of findability, accessibility, interoperability, and reusability (FAIR). The acronym and principles were defined in a March 2016 paper in the journal Scientific Data by a consortium of scientists and organizations.
Open source is source code that is made freely available for possible modification and redistribution. Products include permission to use the source code, design documents, or content of the product. The open source model is a decentralized software development model that encourages open collaboration. A main principle of open source software development is peer production, with products such as source code, blueprints, and documentation freely available to the public. The open source movement in software began as a response to the limitations of proprietary code. The model is used for projects such as in open source appropriate technology, and open source drug discovery.
Data collaboratives are a form of collaboration in which participants from different sectors—including private companies, research institutions, and government agencies—can exchange data and data expertise to help solve public problems.
The economics of open science describe the economic aspects of making a wide range of scientific outputs available to all levels of society.