Developer(s) | Freebase, then Google, now open source community |
---|---|
Initial release | November 10, 2010 |
Stable release | |
Repository | |
Written in | Java [2] |
Platform | Microsoft Windows, Linux, macOS |
Available in | English, Italian, Chinese, Japanese, French, German |
Type | |
License | BSD License |
Website | openrefine |
OpenRefine is an open-source desktop application for data cleanup and transformation to other formats, an activity commonly known as data wrangling. [3] It is similar to spreadsheet applications, and can handle spreadsheet file formats such as CSV, but it behaves more like a database.
It operates on rows of data which have cells under columns, similar to the manner in which relational database tables operate. OpenRefine projects consist of one table, whose rows can be filtered using facets that define criteria (for example, showing rows where a given column is not empty).
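As a rough illustration of the facet idea (not OpenRefine's own code or API), the "column is not empty" criterion can be sketched in Python. The table contents and the `facet_not_empty` helper are hypothetical, introduced only for this example.

```python
# Hypothetical in-memory table: one table, rows with cells under named columns.
rows = [
    {"name": "Ada", "city": "London"},
    {"name": "Alan", "city": ""},
    {"name": "Grace", "city": None},
]


def facet_not_empty(table, column):
    """Return only the rows whose cell in `column` is neither missing nor blank."""
    def has_value(row):
        value = row.get(column)
        return value is not None and str(value).strip() != ""

    return [row for row in table if has_value(row)]


# Only rows matching the facet stay "visible" for later operations.
visible = facet_not_empty(rows, "city")
print([row["name"] for row in visible])
```

The underlying table is left unchanged; the facet only narrows which rows later operations would act on.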
Unlike spreadsheets, most operations in OpenRefine are done on all visible rows, for example, the transformation of all cells in all rows under one column, [4] or the creation of a new column based on existing data. Actions performed on a dataset are stored in the project and can be 'replayed' on other datasets. Formulas are not stored in cells; they are used to transform the data, and each transformation is applied only once. [5] Formula expressions can be written in General Refine Expression Language (GREL), [6] in Jython (i.e., Python), and in Clojure. [7]
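The difference from spreadsheet formulas can be sketched in Python as follows. This is a conceptual model only, under the assumption that a history is simply an ordered list of steps; the helper names (`transform_column`, `add_column`, `replay`) are hypothetical and do not reflect how OpenRefine actually stores or applies its operations.

```python
import copy


def transform_column(column, fn):
    """Build a step that rewrites every cell under `column` with fn(cell)."""
    def step(table):
        for row in table:
            row[column] = fn(row[column])
    return step


def add_column(base, new, fn):
    """Build a step that creates column `new` from the values in column `base`."""
    def step(table):
        for row in table:
            row[new] = fn(row[base])
    return step


def replay(history, table):
    """Apply each recorded step, in order, to a copy of `table`."""
    result = copy.deepcopy(table)
    for step in history:
        step(result)
    return result


# The recorded history: trim the names, then derive an upper-case column.
history = [
    transform_column("name", str.strip),
    add_column("name", "name_upper", str.upper),
]

first = replay(history, [{"name": "  ada "}, {"name": "alan"}])
second = replay(history, [{"name": " grace"}])  # same steps, different dataset
print(first)
print(second)
```

After a replay the cells hold plain values rather than formulas, so changing one cell later does not recompute any other cell.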
The program operates as a local web app: it starts a web server and opens the default browser to 127.0.0.1:3333.
Import is supported from the following formats: [14]
If input data is in a non-standard text format, it can be imported as whole lines, without splitting into columns, and the columns can then be extracted later with OpenRefine's tools. Archived and compressed files are supported (.zip, .tar.gz, .tgz, .tar.bz2, .gz, or .bz2), and OpenRefine can download input files from a URL. To use web pages as input, it is possible to import a list of URLs and then invoke a URL fetch function.
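The line-based import followed by later column extraction can be pictured with a short Python sketch. The sample lines, the column names, and the regular expression are all assumptions made up for this illustration; OpenRefine performs the equivalent steps through its own importer and column tools.

```python
import re

# Hypothetical non-standard text input (invented sample data).
raw_lines = [
    "2024-01-05 INFO  job started",
    "2024-01-05 WARN  disk nearly full",
    "not a structured line",
]

# Step 1: import whole lines, one cell per row, without splitting into columns.
rows = [{"line": line} for line in raw_lines]

# Step 2: extract columns afterwards with a pattern of our choosing.
pattern = re.compile(r"^(\d{4}-\d{2}-\d{2})\s+(\w+)\s+(.*)$")


def extract_columns(row):
    """Return a copy of `row` with date, level and message columns added."""
    match = pattern.match(row["line"])
    if match is None:
        # Lines that do not fit the pattern keep empty cells in the new columns.
        return {**row, "date": None, "level": None, "message": None}
    date, level, message = match.groups()
    return {**row, "date": date, "level": level, "message": message}


table = [extract_columns(row) for row in rows]
for row in table:
    print(row["date"], row["level"], row["message"])
```

Keeping the original `line` column means rows that failed to parse can still be inspected and handled with a different pattern.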
Export is supported in the following formats: [16]
Whole OpenRefine projects in native format can be exported as a .tar.gz archive.
OpenRefine started life as Freebase Gridworks, developed by Metaweb, and has been available as open source since January 2010. [17] On 16 July 2010, Google acquired Metaweb, [18] the creators of Freebase, and on 10 November 2010 renamed Freebase Gridworks to Google Refine, releasing version 2.0. [19] On 2 October 2012, original author David Huynh announced that Google would soon stop its active support of Google Refine. [20] [21] [22] Since then, the codebase has been in transition to an open source project named OpenRefine. [23]