Open data

Open data map
Linked open data cloud in August 2014
Clear labeling of the licensing terms is a key component of open data, and icons like the one pictured here are being used for that purpose.

Open data is the idea that some data should be freely available to everyone to use and republish as they wish, without restrictions from copyright, patents or other mechanisms of control. [1] The goals of the open data movement are similar to those of other "open(-source)" movements such as open-source software, hardware, open content, open education, open educational resources, open government, open knowledge, open access, open science, and the open web. Paradoxically, the growth of the open data movement is paralleled by a rise in intellectual property rights. [2] The philosophy behind open data has been long established (for example in the Mertonian tradition of science), but the term "open data" itself is recent, gaining popularity with the rise of the Internet and World Wide Web and, especially, with the launch of open-data government initiatives.


Open data can also be linked data; when it is, it is called linked open data. One of the most important forms of open data is open government data (OGD), which is open data created by ruling government institutions. Its importance stems from its being part of citizens' everyday lives, down to the most routine tasks that are seemingly far removed from government.


The concept of open data is not new, but a formalized definition is relatively recent. Conceptually, open data as a phenomenon denotes that governmental data should be available to anyone, with the possibility of redistribution in any form and without any copyright restriction. [3] Another definition is the Open Definition, which can be summarized in the statement that "A piece of data is open if anyone is free to use, reuse, and redistribute it – subject only, at most, to the requirement to attribute and/or share-alike." [4] Other definitions, including the Open Data Institute's "Open data is data that anyone can access, use or share", offer an accessible short version but refer to the formal definition.
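
The summary statement of the Open Definition quoted above can be read as a simple test: all three freedoms must be granted, and the only obligations tolerated are attribution and share-alike. The following is a minimal sketch of that reading, not the formal definition itself; the `LicenseTerms` structure, its field names and the obligation labels are hypothetical and introduced here only for illustration.

```python
from dataclasses import dataclass, field

# Obligations the summary statement tolerates (labels are illustrative assumptions).
ALLOWED_OBLIGATIONS = frozenset({"attribution", "share-alike"})


@dataclass(frozen=True)
class LicenseTerms:
    """Hypothetical, simplified description of a dataset's licensing terms."""
    may_use: bool
    may_reuse: bool
    may_redistribute: bool
    obligations: frozenset = field(default_factory=frozenset)


def is_open(terms: LicenseTerms) -> bool:
    """Return True if the terms satisfy the summarized Open Definition."""
    freedoms_granted = terms.may_use and terms.may_reuse and terms.may_redistribute
    # Any obligation outside the allowed set counts as a restriction.
    only_allowed_obligations = terms.obligations <= ALLOWED_OBLIGATIONS
    return freedoms_granted and only_allowed_obligations


attribution_only = LicenseTerms(True, True, True, frozenset({"attribution"}))
non_commercial = LicenseTerms(True, True, True, frozenset({"attribution", "non-commercial"}))
no_redistribution = LicenseTerms(True, True, False)

print(is_open(attribution_only))   # attribution alone is tolerated
print(is_open(non_commercial))     # an extra restriction fails the test
print(is_open(no_redistribution))  # a missing freedom fails the test
```

Under these assumptions, a dataset that requires only attribution passes, while one that adds a non-commercial condition or withholds redistribution does not.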

Open data may include non-textual material such as maps, genomes, connectomes, chemical compounds, mathematical and scientific formulae, medical data and practice, bioscience, and biodiversity. Problems often arise because these are commercially valuable or can be aggregated into works of value. Access to, or re-use of, the data is controlled by organisations, both public and private. Control may be through access restrictions, licenses, copyright, patents and charges for access or re-use. Advocates of open data argue that these restrictions are against the common good and that these data should be made available without restriction or fee. In addition, it is important that the data are re-usable without requiring further permission, though the types of re-use (such as the creation of derivative works) may be controlled by a license.

A typical depiction of the need for open data:

Numerous scientists have pointed out the irony that right at the historical moment when we have the technologies to permit worldwide availability and distributed process of scientific data, broadening collaboration and accelerating the pace and depth of discovery ... we are busy locking up that data and preventing the use of correspondingly advanced technologies on knowledge.

John Wilbanks, VP Science, Creative Commons [5]

Creators of data often do not consider the need to state the conditions of ownership, licensing and re-use; instead presuming that not asserting copyright puts the data into the public domain. For example, many scientists do not regard the published data arising from their work to be theirs to control and consider the act of publication in a journal to be an implicit release of data into the commons. However, the lack of a license makes it difficult to determine the status of a data set and may restrict the use of data offered in an "Open" spirit. Because of this uncertainty it is also possible for public or private organizations to aggregate said data, protect it with copyright and then resell it.

The issue of indigenous knowledge (IK) poses a great challenge in terms of capture, storage and distribution. Many societies in third-world countries lack the technical processes for managing IK.

In his presentation at the XML 2005 conference, Connolly [6] displayed these two quotations regarding open data:

Major sources

The State of Open Data, a 2019 book from African Minds

Open data can come from any source. This section lists some of the fields that publish (or at least discuss publishing) a large amount of open data.

In science

The concept of open access to scientific data was institutionally established with the formation of the World Data Center system, in preparation for the International Geophysical Year of 1957–1958. [8] The International Council of Scientific Unions (now the International Council for Science) oversees several World Data Centres with the mandate to minimize the risk of data loss and to maximize data accessibility. [9]

While the open-science-data movement long predates the Internet, the availability of fast, ubiquitous networking has significantly changed the context of open science data, since publishing or obtaining data has become much less expensive and time-consuming. [10]

The Human Genome Project was a major initiative that exemplified the power of open data. It was built upon the so-called Bermuda Principles, stipulating that: "All human genomic sequence information ... should be freely available and in the public domain in order to encourage research and development and to maximize its benefit to society." [11] More recent initiatives such as the Structural Genomics Consortium have illustrated that the open data approach can also be used productively within the context of industrial R&D. [12]

In 2004, the Science Ministers of all nations of the Organisation for Economic Co-operation and Development (OECD), which includes most developed countries of the world, signed a declaration which essentially states that all publicly funded archive data should be made publicly available. [13] Following a request and an intense discussion with data-producing institutions in member states, the OECD published in 2007 the OECD Principles and Guidelines for Access to Research Data from Public Funding as a soft-law recommendation. [14]

Examples of open data in science:

In government

There is a range of different arguments for government open data. [17] [18] For example, some advocates contend that making government information available to the public as machine-readable open data can facilitate government transparency, accountability and public participation. "Open data can be a powerful force for public accountability—it can make existing information easier to analyze, process, and combine than ever before, allowing a new level of public scrutiny." [19] Governments that enable public viewing of data can help citizens engage within the governmental sectors and "add value to that data." [20]

Some make the case that opening up official information can support technological innovation and economic growth by enabling third parties to develop new kinds of digital applications and services.

Several national governments have created websites to distribute a portion of the data they collect. Open data has also been framed as a concept for collaborative projects in municipal government to create and organize a culture of open data or open government data.

Additionally, other levels of government have established open data websites. There are many government entities pursuing open data in Canada. In the US, the sites of a total of 40 states and 46 cities and counties with websites providing open data have been listed, e.g. the state of Maryland, the state of California [21] and New York City. [22]

At the international level, the United Nations has an open data website that publishes statistical data from member states and UN agencies, [23] and the World Bank publishes a range of statistical data relating to developing countries. [24] The European Commission has created two portals for the European Union: the EU Open Data Portal, which gives access to open data from the EU institutions, agencies and other bodies, [25] and the PublicData portal, which provides datasets from local, regional and national public bodies across Europe. [26]

In October 2015, the Open Government Partnership launched the International Open Data Charter, a set of principles and best practices for the release of governmental open data formally adopted by seventeen governments of countries, states and cities during the OGP Global Summit in Mexico. [27]

In non-profit organizations

Many non-profit organizations offer more or less open access to their data, as long as it does not undermine the privacy rights of their users, members or third parties. In comparison to for-profit corporations, they do not seek to monetize their data. OpenNWT launched a website offering open data on elections. [28] CIAT offers open data to anybody willing to conduct big data analytics in order to enhance the benefit of international agricultural research. [29] DBLP, which is owned by the non-profit organization Dagstuhl, offers its database of scientific publications from computer science as open data. [30] Non-profit hospitality exchange services offer trustworthy teams of scientists access to their anonymized data for publication of insights to the benefit of humanity. Before becoming a for-profit corporation in 2011, Couchsurfing offered four research teams access to its social networking data. [31] [32] [33] [34] In 2015, the non-profit hospitality exchange services Bewelcome and Warm Showers provided their data for public research. [35] [36]

Arguments for and against

The debate on open data is still evolving. The best open government applications seek to empower citizens, to help small businesses, or to create value in some other positive, constructive way. Opening government data is only a way-point on the road to improving education, improving government, and building tools to solve other real-world problems. While many arguments have been made categorically, the following discussion of arguments for and against open data highlights that these arguments often depend highly on the type of data and its potential uses.

Arguments made on behalf of open data include the following:

It is generally held that factual data cannot be copyrighted. [44] However, publishers frequently add copyright statements (often forbidding re-use) to scientific data accompanying publications. It may be unclear whether the factual data embedded in full text are part of the copyright.

While the human abstraction of facts from paper publications is normally accepted as legal, there is often an implied restriction on machine extraction by robots.

Unlike open access, where groups of publishers have stated their concerns, open data is normally challenged by individual institutions. Their arguments have been discussed less in public discourse and there are fewer quotes to rely on at this time.

Arguments against making all data available as open data include the following:

Relation to other open activities

The goals of the Open Data movement are similar to those of other "Open" movements.

Funders' mandates

Several funding bodies which mandate Open Access also mandate Open Data. A good expression of requirements (truncated in places) is given by the Canadian Institutes of Health Research (CIHR): [50]

Other bodies active in promoting the deposition of data as well as full text include the Wellcome Trust. An academic paper published in 2013 advocated that Horizon 2020 (the science funding mechanism of the EU) should mandate that funded projects hand in their databases as "deliverables" at the end of the project, so that they can be checked for third-party usability and then shared. [51]

Non-open data

Several mechanisms restrict access to or reuse of data (and several reasons for doing this are given above). They include:


  1. Auer, S. R.; Bizer, C.; Kobilarov, G.; Lehmann, J.; Cyganiak, R.; Ives, Z. (2007). "DBpedia: A Nucleus for a Web of Open Data". The Semantic Web. Lecture Notes in Computer Science. 4825. p. 722. doi:10.1007/978-3-540-76298-0_52. ISBN 978-3-540-76297-3.
  2. Kitchin, Rob (2014). The Data Revolution. London: Sage. p. 49. ISBN 978-1-4462-8748-4.
  3. Kassen, Maxat (1 October 2013). "A promising phenomenon of open data: A case study of the Chicago open data project". Government Information Quarterly. 30 (4): 508–513. doi:10.1016/j.giq.2013.05.012. ISSN 0740-624X.
  4. See Open Definition home page and the full Open Definition
  5. Science Commons
  6. Connolly, Dan (16 November 2005). "Semantic Web Data Integration with hCalendar and GRDDL". W3C Talks and Presentations. XML Conference & Exposition 2005, Atlanta, Georgia, USA: W3C. p. 2. Retrieved 2 May 2015.
  7. Veen, Jeffrey (2 November 2005). "Polar Heart Rate Monitors: Gimme my data!". A website by Jeffrey Veen.
  8. Committee on Scientific Accomplishments of Earth Observations from Space, National Research Council (2008). Earth Observations from Space: The First 50 Years of Scientific Achievements. The National Academies Press. p. 6. doi:10.17226/11991. ISBN 978-0-309-11095-2. Retrieved 24 November 2010.
  9. World Data System (27 September 2017). "Data Sharing Principles". ICSU-WDS (International Council for Science - World Data Service). Retrieved 27 September 2017.
  10. Vuong, Quan-Hoang (12 December 2017). "Open data, open review and open dialogue in making social sciences plausible". arXiv:1712.04801. Bibcode:2017arXiv171204801V. Retrieved 30 June 2018.
  11. Human Genome Project, 1996. Summary of Principles Agreed Upon at the First International Strategy Meeting on Human Genome Sequencing (Bermuda, 25–28 February 1996)
  12. Perkmann, Markus; Schildt, Henri (2015). "Open Data Partnerships between Firms and Universities: The Role of Boundary Organizations". Research Policy. 44 (5): 1133–1143. doi:10.1016/j.respol.2014.12.006.
  13. OECD Declaration on Open Access to publicly funded data Archived 20 April 2010 at the Wayback Machine
  14. OECD Principles and Guidelines for Access to Research Data from Public Funding
  15. Dataverse Network Project
  17. Gray, Jonathan (2014). "Towards a Genealogy of Open Data". Social Science Research Network (SSRN). doi:10.2139/ssrn.2605828.
  18. Brito, Jerry. "Hack, Mash, & Peer: Crowdsourcing Government Transparency". Colum. Sci. & Tech. L. Rev. 119 (2008).
  19. Yu, Harlan; Robinson, David G. (28 February 2012). "The New Ambiguity of 'Open Government'". Rochester, NY: Social Science Research Network. SSRN 2012489.
  20. Robinson, David G.; Yu, Harlan; Zeller, William P.; Felten, Edward W. (1 January 2009). "Government Data and the Invisible Hand". Rochester, NY: Social Science Research Network. SSRN 1138083.
  21. "". Retrieved 7 May 2019.
  22. Data, City of New York, NYC Open. "NYC Open Data". NYC OpenData. Retrieved 7 May 2019.
  23. "UNdata". Retrieved 7 May 2019.
  24. "World Bank Open Data | Data". Retrieved 7 May 2019.
  25. "". Retrieved 7 May 2019.
  26. "Home | Open Data Portal". Retrieved 7 May 2019.
  27. "The Open Data Charter: A Roadmap for Using a Global Resource". The Huffington Post. 27 October 2015. Retrieved 29 October 2015.
  28. Green, Arthur C. "OpenNWT announces launch of new election information website". My Yellowknife Now.
  29. Oyuela, Andrea; Walmsley, Thea; Walla, Katherine (30 December 2019). "120 Organizations Creating a New Decade for Food". Food Tank. Retrieved 21 January 2020.
  30. "dblp: How can I download the whole dblp dataset?". Dagstuhl. Retrieved 21 January 2020.
  31. Victor, Patricia; Cornelis, Chris; De Cock, Martine; Herrera-Viedma, Enrique (2010). "Bilattice-based aggregation operators for gradual trust and distrust". World Scientific Proceedings Series on Computer Engineering and Information Science. World Scientific: 505–510. doi:10.1142/9789814324700_0075. ISBN 978-981-4324-69-4.
  32. Dandekar, Pranav. "Analysis & Generative Model for Trust Networks" (PDF). Retrieved 21 January 2020.
  33. Overgoor, Jan; Wulczyn, Ellery; Potts, Christopher (20 May 2012). "Trust Propagation with Mixed-Effects Models". Sixth International AAAI Conference on Weblogs and Social Media.
  34. Lauterbach, Debra; Truong, Hung; Shah, Tanuj; Adamic, Lada (August 2009). "Surfing a Web of Trust: Reputation and Reciprocity on". 2009 International Conference on Computational Science and Engineering. 4: 346–353. doi:10.1109/CSE.2009.345. ISBN 978-1-4244-5334-4.
  35. Rustam Tagiew; Dmitry I. Ignatov; Radhakrishnan Delhibabu (2015). Hospitality Exchange Services as a Source of Spatial and Social Data?. (IEEE) International Conference on Data Mining Workshop (ICDMW). Atlantic City. pp. 1125–1130. doi:10.1109/ICDMW.2015.239.
  36. Rustam Tagiew; Dmitry I. Ignatov; Radhakrishnan Delhibabu (2015). Economics of Internet-Based Hospitality Exchange. (IEEE/WIC/ACM) International Conference on Web Intelligence and Intelligent Agent Technology (WI-IAT). Singapore. pp. 493–498. arXiv:1501.06941. doi:10.1109/WI-IAT.2015.89.
  37. "On the road to open data, by Ian Manocha". Archived from the original on 29 March 2012. Retrieved 12 August 2011.
  38. "Big Data for Development: From Information- to Knowledge Societies", Martin Hilbert (2013), SSRN Scholarly Paper No. ID 2205145. Rochester, NY: Social Science Research Network.
  39. How to Make the Dream Come True argues in one research area (Astronomy) that access to open data increases the rate of scientific discovery.
  40. Khodiyar, Varsha (19 May 2014). "Stopping the rot: ensuring continued access to scientific data, irrespective of age". F1000 Research. F1000. Retrieved 11 March 2015.
  41. Magee AF, May MR, Moore BR (24 October 2014). "The dawn of open access to phylogenetic data". PLOS One. 9 (10): e110268. arXiv:1405.6623. Bibcode:2014PLoSO...9k0268M. doi:10.1371/journal.pone.0110268. PMC 4208793. PMID 25343725.
  42. Rivera, Roberto; Marazzi, Mario; Torres, Pedro (19 June 2019). "Incorporating Open Data Into Introductory Courses in Statistics". Taylor and Francis. Retrieved 7 May 2020.
  43. Rivera, Roberto. "Principles of Managerial Statistics and Data Science". Wiley. Retrieved 15 February 2020.
  44. Towards a Science Commons Archived 14 July 2014 at the Wayback Machine includes an overview of the basis of openness in science data.
  45. Low, A., 2001. The Third Revolution: Plant Genetic Resources in Developing Countries and China: Global Village or Global Pillage. Int'l. Trade & Bus. L. Ann. 323
  46. Sharif, Naubahar; Ritter, Waltraut; Davidson, Robert L; Edmunds, Scott C (31 December 2018). "An Open Science 'State of the Art' for Hong Kong: Making Open Research Data Available to Support Hong Kong Innovation Policy". Journal of Contemporary Eastern Asia. 17 (2): 200–221. doi:10.17477/JCEA.2018.17.2.200.
  47. "Protocol for Implementing Open Access Data". Archived from the original on 30 January 2017. Retrieved 17 April 2009.
  48. creation of term
  49. Kauppinen, T.; Espindola, G. M. D. (2011). "Linked Open Science-Communicating, Sharing and Evaluating Data, Methods and Results for Executable Papers". Procedia Computer Science. 4: 726–731. doi:10.1016/j.procs.2011.04.076.
  50. Mailing List Archive
  51. Galsworthy, M.J. & McKee, M. (2013). Europe's "Horizon 2020" science funding programme: How is it shaping up? Journal of Health Services Research and Policy. doi:10.1177/1355819613476017
  52. "Review of history and positions by the University of California". Archived from the original on 9 November 2006. Retrieved 31 October 2006.