Apache Apex

Last updated
Apache Apex
Developer(s) Apache Software Foundation
Final release
3.7.0 / April 27, 2018;5 years ago (2018-04-27) [1] [2]
Repository Apex Repository
Written in Java and Scala
Operating system Cross-platform
Type Stream processing, Batch processing
License Apache License 2.0
Website apex.apache.org OOjs UI icon edit-ltr-progressive.svg

Apache Apex is a YARN-native platform that unifies stream and batch processing. It processes big data-in-motion in a way that is scalable, performant, fault-tolerant, stateful, secure, distributed, and easily operable. [3]

Contents

Apache Apex was named a top-level project by The Apache Software Foundation on April 25, 2016. [4] [5] As of September 2019, it is no longer actively developed. [2]

Overview

Apache Apex is developed under the Apache License 2.0. [6] The project was driven by the San Jose, California-based start-up company DataTorrent.

There are two parts of Apache Apex: Apex Core and Apex Malhar. Apex Core is the platform or framework for building distributed applications on Hadoop. The core Apex platform is supplemented by Malhar, a library of connector and logic functions, enabling rapid application development. These input and output operators provide templates to sources and sinks such as Alluxio, S3, HDFS, NFS, FTP, Kafka, ActiveMQ, RabbitMQ, JMS, Cassandra, MongoDB, Redis, HBase, CouchDB, generic JDBC, and other database connectors.

History

DataTorrent has developed the platform since 2012 and then decided to open source the core that became Apache Apex. [7] It entered incubation in August 2015 and became Apache Software Foundation top level project within 8 months. DataTorrent itself shut down in May 2018. [8]

As of September 2019, Apache Apex is no longer being developed. [2]

ModuleVersionRelease date
Apex Core3.7.027 April 2018
Apex Malhar3.8.011 November 2017
Apex Core3.6.05 May 2017
Apex Malhar3.7.031 March 2017
Apex Core3.5.012 December 2016
Apex Malhar3.6.09 December 2016
Apex Malhar3.5.03 September 2016
Apex Core and Malhar3.4.05 May 2016

Apex Big Data World

Apex Big Data World [9] is a conference about Apache Apex. The first conference of Apex Big Data World took place in 2017. They were held in Pune, India and Mountain View, California, USA.

Related Research Articles

Apache Derby is a relational database management system (RDBMS) developed by the Apache Software Foundation that can be embedded in Java programs and used for online transaction processing. It has a 3.5 MB disk-space footprint.

Apache Hadoop is a collection of open-source software utilities that facilitates using a network of many computers to solve problems involving massive amounts of data and computation. It provides a software framework for distributed storage and processing of big data using the MapReduce programming model. Hadoop was originally designed for computer clusters built from commodity hardware, which is still the common use. It has since also found use on clusters of higher-end hardware. All the modules in Hadoop are designed with a fundamental assumption that hardware failures are common occurrences and should be automatically handled by the framework.

Apache ServiceMix is an open-source software project to implement a distributed enterprise service bus (ESB).

HBase is an open-source non-relational distributed database modeled after Google's Bigtable and written in Java. It is developed as part of Apache Software Foundation's Apache Hadoop project and runs on top of HDFS or Alluxio, providing Bigtable-like capabilities for Hadoop. That is, it provides a fault-tolerant way of storing large quantities of sparse data.

<span class="mw-page-title-main">Apache ZooKeeper</span> System for distributed coordination

Apache ZooKeeper is an open-source server for highly reliable distributed coordination of cloud applications. It is a project of the Apache Software Foundation.

<span class="mw-page-title-main">Apache Drill</span> Open-source software framework

Apache Drill is an open-source software framework that supports data-intensive distributed applications for interactive analysis of large-scale datasets. Built chiefly by contributions from developers from MapR, Drill is inspired by Google's Dremel system. Drill is an Apache top-level project. Tom Shiran is the founder of the Apache Drill Project. It was designated an Apache Software Foundation top-level project in December 2016.

DataStax, Inc. is a real-time data company based in Santa Clara, California. Its product Astra DB is a cloud database-as-a-service based on Apache Cassandra. DataStax also offers DataStax Enterprise (DSE), an on-premises database built on Apache Cassandra, and Astra Streaming, a messaging and event streaming cloud service based on Apache Pulsar. As of June 2022, the company has roughly 800 customers distributed in over 50 countries.

Apache Attic is a project of Apache Software Foundation to provide processes to make it clear when an Apache project has reached its end-of-life. The Attic project was created in November 2008. Also the retired projects can be retained.

<span class="mw-page-title-main">Apache Storm</span> Open-source distributed stream processing

Apache Storm is a distributed stream processing computation framework written predominantly in the Clojure programming language. Originally created by Nathan Marz and team at BackType, the project was open sourced after being acquired by Twitter. It uses custom created "spouts" and "bolts" to define information sources and manipulations to allow batch, distributed processing of streaming data. The initial release was on 17 September 2011.

Apache Kafka is a distributed event store and stream-processing platform. It is an open-source system developed by the Apache Software Foundation written in Java and Scala. The project aims to provide a unified, high-throughput, low-latency platform for handling real-time data feeds. Kafka can connect to external systems via Kafka Connect, and provides the Kafka Streams libraries for stream processing applications. Kafka uses a binary TCP-based protocol that is optimized for efficiency and relies on a "message set" abstraction that naturally groups messages together to reduce the overhead of the network roundtrip. This "leads to larger network packets, larger sequential disk operations, contiguous memory blocks [...] which allows Kafka to turn a bursty stream of random message writes into linear writes."

<span class="mw-page-title-main">Apache Spark</span> Open-source data analytics cluster computing framework

Apache Spark is an open-source unified analytics engine for large-scale data processing. Spark provides an interface for programming clusters with implicit data parallelism and fault tolerance. Originally developed at the University of California, Berkeley's AMPLab, the Spark codebase was later donated to the Apache Software Foundation, which has maintained it since.

Infinispan is a distributed cache and key-value NoSQL data store software developed by Red Hat. Java applications can embed it as library, use it as a service in WildFly or any non-java applications can use it, as remote service through TCP/IP.

<span class="mw-page-title-main">Apache Flink</span> Framework and distributed processing engine

Apache Flink is an open-source, unified stream-processing and batch-processing framework developed by the Apache Software Foundation. The core of Apache Flink is a distributed streaming data-flow engine written in Java and Scala. Flink executes arbitrary dataflow programs in a data-parallel and pipelined manner. Flink's pipelined runtime system enables the execution of bulk/batch and stream processing programs. Furthermore, Flink's runtime supports the execution of iterative algorithms natively.

Presto is a distributed query engine for big data using the SQL query language. Its architecture allows users to query data sources such as Hadoop, Cassandra, Kafka, AWS S3, Alluxio, MySQL, MongoDB and Teradata, and allows use of multiple data sources within a query. Presto is community-driven open-source software released under the Apache License.

<span class="mw-page-title-main">Apache Kylin</span> Open-source distributed analytics engine

Apache Kylin is an open source distributed analytics engine designed to provide a SQL interface and multi-dimensional analysis (OLAP) on Hadoop and Alluxio supporting extremely large datasets.

<span class="mw-page-title-main">Apache SINGA</span> Open-source machine learning library

Apache SINGA is an Apache top-level project for developing an open source machine learning library. It provides a flexible architecture for scalable distributed training, is extensible to run over a wide range of hardware, and has a focus on health-care applications.

Apache Arrow is a language-agnostic software framework for developing data analytics applications that process columnar data. It contains a standardized column-oriented memory format that is able to represent flat and hierarchical data for efficient analytic operations on modern CPU and GPU hardware. This reduces or eliminates factors that limit the feasibility of working with large sets of data, such as the cost, volatility, or physical constraints of dynamic random-access memory.

<span class="mw-page-title-main">Apache RocketMQ</span> Open-source stream processing platform

RocketMQ is a distributed messaging and streaming platform with low latency, high performance and reliability, trillion-level capacity and flexible scalability. It is the third generation distributed messaging middleware open sourced by Alibaba in 2012. On November 21, 2016, Alibaba donated RocketMQ to the Apache Software Foundation. Next year, on February 20, the Apache Software Foundation announced Apache RocketMQ as a Top-Level Project.

References

  1. Apache Apex Downloads , retrieved 4 July 2019
  2. 1 2 3 "Apache Apex - Apache Attic" . Retrieved 2 December 2019.
  3. "Apache Apex Web Page".
  4. "Spark rival Apache Apex hits top-level status". siliconangle.com.
  5. "The Apache Software Foundation Announces Apache Apex as a Top-Level Project". The Apache Software Foundation. 25 April 2016.
  6. "Apache Apex Github Repos". GitHub .
  7. "Apache Apex Is Promoted To Top-Level Project". informationweek.com. 27 April 2016.
  8. "DataTorrent, Stream Processing Startup, Folds". 8 May 2018.
  9. "Apex Big Data World". Archived from the original on 2017-04-04. Retrieved 2017-04-03.