Original author(s) | Maxime Beauchemin / Airbnb |
---|---|
Stable release | 3.0.0 / 18 September 2023 |
Repository | Superset Repository on Github |
Written in | Python, TypeScript |
Operating system | Cross-platform |
Type | data visualization, business intelligence |
License | Apache License 2.0 |
Website | superset |
Apache Superset is an open-source software application for data exploration and data visualization able to handle data at petabyte scale (big data). The application started as a hack-a-thon project by Maxime Beauchemin (creator of Apache Airflow) while working at Airbnb and entered the Apache Incubator program in 2017. [1] In addition to Airbnb, the project has seen significant contributions from other leading technology companies, including Lyft and Dropbox. [2] Superset graduated from the incubator program and became a top-level project at the Apache Software Foundation in 2021. [3]
Maxime Beauchemin's company, Preset, offers Superset as a managed service (SaaS). [4]
Apache Nutch is a highly extensible and scalable open source web crawler software project.
Apache Flex, formerly Adobe Flex, is a software development kit (SDK) for the development and deployment of cross-platform rich web applications based on the Adobe Flash platform. Initially developed by Macromedia and then acquired by Adobe Systems, Adobe donated Flex to the Apache Software Foundation in 2011 and it was promoted to a top-level project in December 2012.
Business intelligence software is a type of application software designed to retrieve, analyze, transform and report data for business intelligence. The applications generally read data that has been previously stored, often - though not necessarily - in a data warehouse or data mart.
Apache Jena is an open source Semantic Web framework for Java. It provides an API to extract data from and write to RDF graphs. The graphs are represented as an abstract "model". A model can be sourced with data from files, databases, URLs or a combination of these. A model can also be queried through SPARQL 1.1.
OpenJPA is an open source implementation of the Java Persistence API specification. It is an object-relational mapping (ORM) solution for the Java language, which simplifies storing objects in databases. It is open-source software distributed under the Apache License 2.0.
HBase is an open-source non-relational distributed database modeled after Google's Bigtable and written in Java. It is developed as part of Apache Software Foundation's Apache Hadoop project and runs on top of HDFS or Alluxio, providing Bigtable-like capabilities for Hadoop. That is, it provides a fault-tolerant way of storing large quantities of sparse data.
Cassandra is a free and open-source, distributed, wide-column store, NoSQL database management system designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure. Cassandra offers support for clusters spanning multiple datacenters, with asynchronous masterless replication allowing low latency operations for all clients. Cassandra was designed to implement a combination of Amazon's Dynamo distributed storage and replication techniques combined with Google's Bigtable data and storage engine model.
Apache Shiro is an open source software security framework that performs authentication, authorization, cryptography and session management. Shiro has been designed to be an intuitive and easy-to-use framework while still providing robust security features.
Deltacloud is an application programming interface (API) developed by Red Hat and the Apache Software Foundation that abstracts differences between cloud computing implementations. It was announced on September 3, 2009.
Apache Drill is an open-source software framework that supports data-intensive distributed applications for interactive analysis of large-scale datasets. Built chiefly by contributions from developers from MapR, Drill is inspired by Google's Dremel system. Drill is an Apache top-level project. Tom Shiran is the founder of the Apache Drill Project. It was designated an Apache Software Foundation top-level project in December 2016.
NuttX is a free and open-source Real-Time Operating System (RTOS) with an emphasis on technical standards compliance and on having a small footprint. Scalable from 8-bit to 64-bit microcontroller environments, the main governing standards in NuttX are from the Portable Operating System Interface (POSIX) and the American National Standards Institute (ANSI). Further standard application programming interfaces (APIs) from Unix and other common RTOSes are adopted for functions unavailable under these standards, or inappropriate for deeply embedded environments, such as the fork system call.
Apache Storm is a distributed stream processing computation framework written predominantly in the Clojure programming language. Originally created by Nathan Marz and team at BackType, the project was open sourced after being acquired by Twitter. It uses custom created "spouts" and "bolts" to define information sources and manipulations to allow batch, distributed processing of streaming data. The initial release was on 17 September 2011.
Apache Marmotta is a linked data platform that comprises several components. In its most basic configuration it is a Linked Data server. Marmotta is one of the reference projects early implementing the new Linked Data Platform recommendation that is being developed by W3C.
Apache Spark is an open-source unified analytics engine for large-scale data processing. Spark provides an interface for programming clusters with implicit data parallelism and fault tolerance. Originally developed at the University of California, Berkeley's AMPLab, the Spark codebase was later donated to the Apache Software Foundation, which has maintained it since.
Apache Mesos is an open-source project to manage computer clusters. It was developed at the University of California, Berkeley.
Apache Flink is an open-source, unified stream-processing and batch-processing framework developed by the Apache Software Foundation. The core of Apache Flink is a distributed streaming data-flow engine written in Java and Scala. Flink executes arbitrary dataflow programs in a data-parallel and pipelined manner. Flink's pipelined runtime system enables the execution of bulk/batch and stream processing programs. Furthermore, Flink's runtime supports the execution of iterative algorithms natively.
Apache CarbonData is a free and open-source column-oriented data storage format of the Apache Hadoop ecosystem. It is similar to the other columnar-storage file formats available in Hadoop namely RCFile and ORC. It is compatible with most of the data processing frameworks in the Hadoop environment. It provides efficient data compression and encoding schemes with enhanced performance to handle complex data in bulk.
Apache Airflow is an open-source workflow management platform for data engineering pipelines. It started at Airbnb in October 2014 as a solution to manage the company's increasingly complex workflows. Creating Airflow allowed Airbnb to programmatically author and schedule their workflows and monitor them via the built-in Airflow user interface. From the beginning, the project was made open source, becoming an Apache Incubator project in March 2016 and a top-level Apache Software Foundation project in January 2019.
Apache Iceberg is an open source high-performance format for huge analytic tables. Iceberg enables the use of SQL tables for big data, while making it possible for engines like Spark, Trino, Flink, Presto, Hive, Impala, StarRocks, Doris, and Pig to safely work with the same tables, at the same time. Iceberg is released under the Apache License. Iceberg addresses the performance and usability challenges of using Apache Hive tables in large and demanding data lake environments. Vendors currently supporting Apache Iceberg tables in their products include CelerData, Cloudera, Dremio, IOMETE, Snowflake, Starburst, Tabular, and AWS.