Original author(s) | Maxime Beauchemin / Airbnb |
---|---|
Stable release | 4.1.1 / 20 November 2024 |
Repository | Superset repository on GitHub |
Written in | Python, TypeScript |
Operating system | Cross-platform |
Type | data visualization, business intelligence |
License | Apache License 2.0 |
Website | superset |
Apache Superset is an open-source software application for data exploration and data visualization able to handle data at petabyte scale (big data). The application started as a hackathon project by Maxime Beauchemin (creator of Apache Airflow) while he was working at Airbnb, and it entered the Apache Incubator program in 2017. [1] In addition to Airbnb, the project has seen significant contributions from other technology companies, including Lyft and Dropbox. [2] Superset graduated from the incubator program and became a top-level project at the Apache Software Foundation in 2021. [3]
Maxime Beauchemin's company, Preset, offers Superset as a managed service (SaaS). [4]
Apache Nutch is a highly extensible and scalable open source web crawler software project.
Apache Flex, formerly Adobe Flex, is a software development kit (SDK) for the development and deployment of cross-platform rich web applications based on the Adobe Flash platform. Flex was initially developed by Macromedia and then acquired by Adobe Systems, which donated it to the Apache Software Foundation in 2011; it was promoted to a top-level project in December 2012.
Business intelligence software is a type of application software designed to retrieve, analyze, transform and report data for business intelligence (BI). The applications generally read data that has been previously stored, often, though not necessarily, in a data warehouse or data mart.
Apache Jena is an open source Semantic Web framework for Java. It provides an API to extract data from and write to RDF graphs. The graphs are represented as an abstract "model". A model can be sourced with data from files, databases, URLs or a combination of these. A model can also be queried through SPARQL 1.1.
OpenJPA is an open source implementation of the Java Persistence API specification. It is an object-relational mapping (ORM) solution for the Java language, which simplifies storing objects in databases. It is open-source software distributed under the Apache License 2.0.
HBase is an open-source non-relational distributed database modeled after Google's Bigtable and written in Java. It is developed as part of Apache Software Foundation's Apache Hadoop project and runs on top of HDFS or Alluxio, providing Bigtable-like capabilities for Hadoop. That is, it provides a fault-tolerant way of storing large quantities of sparse data.
Apache Sling is an open source Web framework for the Java platform designed to create content-centric applications on top of a JSR-170-compliant content repository such as Apache Jackrabbit. Apache Sling allows developers to deploy their application components as OSGi bundles or as scripts and templates in the content repository. Supported scripting languages include JSP, server-side JavaScript, Ruby, and Velocity. The goal of Apache Sling is to expose content in the content repository as HTTP resources, fostering a RESTful style of application architecture.
Apache Shiro is an open source software security framework that performs authentication, authorization, cryptography and session management. Shiro has been designed to be an intuitive and easy-to-use framework while still providing robust security features.
Apache ZooKeeper is an open-source server for highly reliable distributed coordination of cloud applications. It is a project of the Apache Software Foundation.
Deltacloud is an application programming interface (API) developed by Red Hat and the Apache Software Foundation that abstracts differences between cloud computing implementations. It was created in 2009.
Apache Drill is an open-source software framework that supports data-intensive distributed applications for interactive analysis of large-scale datasets. Built chiefly by contributions from developers from MapR, Drill is inspired by Google's Dremel system. Tom Shiran is the founder of the Apache Drill Project. Drill was designated an Apache Software Foundation top-level project in December 2016.
Apache Storm is a distributed stream processing computation framework written predominantly in the Clojure programming language. Originally created by Nathan Marz and team at BackType, the project was open sourced after being acquired by Twitter. It uses custom-created "spouts" and "bolts" to define information sources and manipulations to allow batch, distributed processing of streaming data. The initial release was on 17 September 2011.
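The spout-and-bolt idea can be illustrated with a minimal, single-process sketch. This is not Storm's API and involves no distribution: it only shows the concept of a source (spout) feeding a chain of processing steps (bolts). The names `sentence_spout`, `split_bolt`, `count_bolt`, and `run_topology` are hypothetical and chosen for illustration.

```python
from collections import Counter
from typing import Iterable, Iterator


def sentence_spout(sentences: Iterable[str]) -> Iterator[str]:
    """Spout (hypothetical): the source of the stream, emitting one sentence at a time."""
    for sentence in sentences:
        yield sentence


def split_bolt(stream: Iterable[str]) -> Iterator[str]:
    """Bolt (hypothetical): consumes sentences and emits individual lowercase words."""
    for sentence in stream:
        for word in sentence.lower().split():
            yield word


def count_bolt(stream: Iterable[str]) -> Counter:
    """Bolt (hypothetical): consumes words and accumulates a count per word."""
    counts: Counter = Counter()
    for word in stream:
        counts[word] += 1
    return counts


def run_topology(sentences: Iterable[str]) -> Counter:
    """Wire spout -> split bolt -> count bolt as a tiny linear pipeline."""
    return count_bolt(split_bolt(sentence_spout(sentences)))


if __name__ == "__main__":
    print(run_topology(["the quick fox", "the lazy dog"]))
```

In a real Storm deployment the spouts and bolts would run in parallel across a cluster and the stream would be unbounded; the generator chain here stands in only for the data flow between components.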
Apache Marmotta is a linked data platform that comprises several components. In its most basic configuration it is a Linked Data server. Marmotta is one of the early reference implementations of the new Linked Data Platform recommendation that is being developed by W3C.
Apache Spark is an open-source unified analytics engine for large-scale data processing. Spark provides an interface for programming clusters with implicit data parallelism and fault tolerance. Originally developed at the University of California, Berkeley's AMPLab, the Spark codebase was later donated to the Apache Software Foundation, which has maintained it since.
Apache Mesos is an open-source project to manage computer clusters. It was developed at the University of California, Berkeley.
Apache Flink is an open-source, unified stream-processing and batch-processing framework developed by the Apache Software Foundation. The core of Apache Flink is a distributed streaming data-flow engine written in Java and Scala. Flink executes arbitrary dataflow programs in a data-parallel and pipelined manner. Flink's pipelined runtime system enables the execution of bulk/batch and stream processing programs. Furthermore, Flink's runtime supports the execution of iterative algorithms natively.
Apache CarbonData is a free and open-source column-oriented data storage format of the Apache Hadoop ecosystem. It is similar to the other columnar-storage file formats available in Hadoop, namely RCFile and ORC. It is compatible with most of the data processing frameworks in the Hadoop environment. It provides efficient data compression and encoding schemes with enhanced performance to handle complex data in bulk.
Apache Airflow is an open-source workflow management platform for data engineering pipelines. It started at Airbnb in October 2014 as a solution to manage the company's increasingly complex workflows. Creating Airflow allowed Airbnb to programmatically author and schedule their workflows and monitor them via the built-in Airflow user interface. From the beginning, the project was made open source, becoming an Apache Incubator project in March 2016 and a top-level Apache Software Foundation project in January 2019.
Apache Iceberg is a high-performance open-source format for large analytic tables. Iceberg enables the use of SQL tables for big data while making it possible for engines like Spark, Trino, Flink, Presto, Hive, Impala, StarRocks, Doris, and Pig to work safely with the same tables at the same time. Iceberg is released under the Apache License. Iceberg addresses the performance and usability challenges of Apache Hive tables in large and demanding data lake environments. Vendors currently supporting Apache Iceberg tables include Buster, CelerData, Cloudera, Crunchy Data, Dremio, IOMETE, Snowflake, Starburst, Tabular, AWS, and Google Cloud.