Data USA

Original author(s) Deloitte, MIT Media Lab – Collective Learning, Datawheel
Developer(s) Datawheel
Initial release April 4, 2016
Written in Python, JavaScript, React
Platform Web
Available in English
License AGPL
Website datausa.io

Data USA is a free platform that allows users to collect, analyze, and visualize shared U.S. government data. Launched on April 4, 2016, Data USA is the product of an ongoing partnership between Deloitte, the Massachusetts Institute of Technology (MIT) Collective Learning group, and Datawheel. [1] [2] [3] [4] [5]

The platform won a 2017 Webby Award for Government & Civil Innovation, [6] along with a 2016 Kantar Information is Beautiful Award. [7]

Version 3.0 of the platform, released on May 1, 2019, introduced a "Viz Builder" tool that lets users build custom data visualizations from any of the data sources included on the site. [8] This enables cross-dimensional queries of the data, which were previously unavailable given the vertical nature of the profile pages.

Data USA belongs to a larger family of data visualization and distribution platforms, created under the vision of César Hidalgo, that take traditionally siloed open data sources and collate them into a single data portal with narrative profiles and data exploration tools. These sites include The Observatory of Economic Complexity (OEC), DataChile, Data Africa, and Data KOREA.

Architecture

Back-end

Data USA consolidates data from 21 open data sources, which are cleaned and standardized into a PostgreSQL database and made accessible via a public API. [9] The ETL steps are currently written in Python, and the API is built using mondrian-rest. [10]
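
As a rough illustration of the "clean and standardize" step described above, the following Python sketch maps rows from two differently shaped sources onto one shared schema and loads them into a SQL table. It is not Data USA's actual ETL code: the source layouts, field names, table name, and values are all hypothetical, and the standard-library sqlite3 module stands in for PostgreSQL so that the example runs without a database server.

```python
import sqlite3

# Hypothetical raw rows from two sources that describe the same measure
# with different field names and formatting (values are made up).
SOURCE_A = [{"GEO": "geo-01", "pop": "1,000", "yr": "2014"}]
SOURCE_B = [{"geo_id": "GEO-02", "population": 2500, "year": 2014}]


def standardize(row, geo_key, value_key, year_key):
    """Map one raw row onto the shared (geo, year, value) schema."""
    # Strip thousands separators and whitespace before converting to int.
    raw_value = str(row[value_key]).replace(",", "").strip()
    return (str(row[geo_key]).upper(), int(row[year_key]), int(raw_value))


def load(rows):
    """Load standardized rows into an in-memory table (PostgreSQL stand-in)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE population (geo TEXT, year INTEGER, value INTEGER)"
    )
    conn.executemany("INSERT INTO population VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


# Extract and transform: each source gets its own field mapping.
rows = [standardize(r, "GEO", "pop", "yr") for r in SOURCE_A]
rows += [standardize(r, "geo_id", "population", "year") for r in SOURCE_B]

# Load, then query across both sources through the single shared table.
conn = load(rows)
total = conn.execute(
    "SELECT SUM(value) FROM population WHERE year = 2014"
).fetchone()[0]
```

Once every source has been mapped onto the same schema, a single query can aggregate across all of them, which is the property an API layer on top of the database relies on.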

Front-end

The front-end of Data USA is written in HTML, CSS, and JavaScript, using a React framework called Canon. The codebase, much like the underlying data itself, is open source and available on GitHub under the GNU Affero General Public License v3.0. [11] The visualizations found on Data USA are created using D3plus, a library built on top of D3.js that speeds up visualization development by providing default styles, helper functions, and classes. [12]

References

  1. Steve Lohr. "Media Lab's 'Data USA' aims to make government data easy to use – The Tech".
  2. Bruce Brown (5 April 2016). "MIT DATA USA Turns U.S. Data into Visual Interface – Digital Trends". Digital Trends.
  3. Tanvi Misra. "MIT and Deloitte's DataUSA Web Tool Makes City Data Easy to Access and Understand – CityLab". CityLab.
  4. "DataUSA Visualizes Improved Insight into Government Data". Data Informed. Archived from the original on 2016-04-07. Retrieved 2016-04-27.
  5. Steve Lohr. "Website Seeks to Make Government Data Easier to Sift Through". New York Times.
  6. "Data USA – The Webby Awards". Retrieved 2019-09-19.
  7. "Data USA". www.informationisbeautifulawards.com. Retrieved 2019-09-19.
  8. "Deloitte, MIT, and Datawheel Launch New 'Viz Builder' in Data USA 3.0 – Press Release | Deloitte US". Deloitte United States. Archived from the original on 2019-05-05. Retrieved 2019-09-19.
  9. "Data USA". datausa.io. Archived from the original on 2016-04-07. Retrieved 2019-09-19.
  10. Aristarán, Manuel (2019-09-06). "jazzido/mondrian-rest: A REST interface for Mondrian ROLAP server". GitHub. Retrieved 2019-09-19.
  11. "DataUSA/datausa-site: The most comprehensive visualization of U.S. public data". GitHub. DataUSA. 2019-08-07. Retrieved 2019-09-19.
  12. "D3plus". d3plus.org. Retrieved 2019-09-19.