Apache Ambari

Last updated
Ambari
Developer(s) Apache Software Foundation
Initial releaseSeptember 22, 2011;12 years ago (2011-09-22) [1]
Stable release
2.7.7 / December 11, 2022;16 months ago (2022-12-11) [2]
Repository Ambari Repository
Written in Java, JavaScript, Python
Operating system Cross-platform
Type Distributed computing
License Apache License 2.0
Website ambari.apache.org

The Apache Ambari project intends to simplify the management of Apache Hadoop clusters using a web UI. It also integrates with other existing applications using Ambari REST APIs. [3]

Contents

What Does Apache Ambari Do

Ambari provides a dashboard for monitoring the health and status of the Hadoop cluster making it easier to manage. Ambari leverages the Ambari Metrics System for metrics collection. Ambari leverages the Ambari Alert Framework for system alerting and will notify you when your attention is needed.[ citation needed ]

Related Research Articles

<span class="mw-page-title-main">Apache Nutch</span> Open source web crawler

Apache Nutch is a highly extensible and scalable open source web crawler software project.

Cascading is a software abstraction layer for Apache Hadoop and Apache Flink. Cascading is used to create and execute complex data processing workflows on a Hadoop cluster using any JVM-based language, hiding the underlying complexity of MapReduce jobs. It is open source and available under the Apache License. Commercial support is available from Driven, Inc.

Apache Hadoop is a collection of open-source software utilities that facilitates using a network of many computers to solve problems involving massive amounts of data and computation. It provides a software framework for distributed storage and processing of big data using the MapReduce programming model. Hadoop was originally designed for computer clusters built from commodity hardware, which is still the common use. It has since also found use on clusters of higher-end hardware. All the modules in Hadoop are designed with a fundamental assumption that hardware failures are common occurrences and should be automatically handled by the framework.

<span class="mw-page-title-main">Doug Cutting</span> American information theorist

Douglass Read Cutting is a software designer, advocate, and creator of open-source search technology. He founded two technology projects, Lucene, and Nutch, with Mike Cafarella. Both projects are now managed through the Apache Software Foundation. Cutting and Cafarella are also the co-founders of Apache Hadoop.

<span class="mw-page-title-main">Apache Solr</span> Open-source enterprise-search platform

Solr is an open-source enterprise-search platform, written in Java. Its major features include full-text search, hit highlighting, faceted search, real-time indexing, dynamic clustering, database integration, NoSQL features and rich document handling. Providing distributed search and index replication, Solr is designed for scalability and fault tolerance. Solr is widely used for enterprise search and analytics use cases and has an active development community and regular releases.

HBase is an open-source non-relational distributed database modeled after Google's Bigtable and written in Java. It is developed as part of Apache Software Foundation's Apache Hadoop project and runs on top of HDFS or Alluxio, providing Bigtable-like capabilities for Hadoop. That is, it provides a fault-tolerant way of storing large quantities of sparse data.

<span class="mw-page-title-main">Apache Cassandra</span> Free and open-source database management system

Cassandra is a free and open-source, distributed, wide-column store, NoSQL database management system designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure. Cassandra offers support for clusters spanning multiple data centers, with asynchronous masterless replication allowing low latency operations for all clients. Cassandra was designed to implement a combination of Amazon's Dynamo distributed storage and replication techniques combined with Google's Bigtable data and storage engine model.

Apache Mahout is a project of the Apache Software Foundation to produce free implementations of distributed or otherwise scalable machine learning algorithms focused primarily on linear algebra. In the past, many of the implementations use the Apache Hadoop platform, however today it is primarily focused on Apache Spark. Mahout also provides Java/Scala libraries for common math operations and primitive Java collections. Mahout is a work in progress; a number of algorithms have been implemented.

<span class="mw-page-title-main">Apache ZooKeeper</span> System for distributed coordination

Apache ZooKeeper is an open-source server for highly reliable distributed coordination of cloud applications. It is a project of the Apache Software Foundation.

<span class="mw-page-title-main">Apache Hama</span>

Apache Hama is a distributed computing framework based on bulk synchronous parallel computing techniques for massive scientific computations e.g., matrix, graph and network algorithms. Originally a sub-project of Hadoop, it became an Apache Software Foundation top level project in 2012. It was created by Edward J. Yoon, who named it, and Hama also means hippopotamus in Yoon's native Korean language (하마), following the trend of naming Apache projects after animals and zoology. Hama was inspired by Google's Pregel large-scale graph computing framework described in 2010. When executing graph algorithms, Hama showed a fifty-fold performance increase relative to Hadoop.

<span class="mw-page-title-main">Apache Hive</span> Database engine

Apache Hive is a data warehouse software project, built on top of Apache Hadoop for providing data query and analysis. Hive gives an SQL-like interface to query data stored in various databases and file systems that integrate with Hadoop. Traditional SQL queries must be implemented in the MapReduce Java API to execute SQL applications and queries over distributed data. Hive provides the necessary SQL abstraction to integrate SQL-like queries into the underlying Java without the need to implement queries in the low-level Java API. Since most data warehousing applications work with SQL-based querying languages, Hive aids the portability of SQL-based applications to Hadoop. While initially developed by Facebook, Apache Hive is used and developed by other companies such as Netflix and the Financial Industry Regulatory Authority (FINRA). Amazon maintains a software fork of Apache Hive included in Amazon Elastic MapReduce on Amazon Web Services.

HPCC, also known as DAS, is an open source, data-intensive computing system platform developed by LexisNexis Risk Solutions. The HPCC platform incorporates a software architecture implemented on commodity computing clusters to provide high-performance, data-parallel processing for applications utilizing big data. The HPCC platform includes system configurations to support both parallel batch data processing (Thor) and high-performance online query applications using indexed data files (Roxie). The HPCC platform also includes a data-centric declarative programming language for parallel data processing called ECL.

Within database management systems, the RCFile is a data placement structure that determines how to store relational tables on computer clusters. It is designed for systems using the MapReduce framework. The RCFile structure includes a data storage format, data compression approach, and optimization techniques for data reading. It is able to meet all the four requirements of data placement: (1) fast data loading, (2) fast query processing, (3) highly efficient storage space utilization, and (4) a strong adaptivity to dynamic data access patterns.

<span class="mw-page-title-main">Apache Drill</span> Open-source software framework

Apache Drill is an open-source software framework that supports data-intensive distributed applications for interactive analysis of large-scale datasets. Built chiefly by contributions from developers from MapR, Drill is inspired by Google's Dremel system. Drill is an Apache top-level project. Tom Shiran is the founder of the Apache Drill Project. It was designated an Apache Software Foundation top-level project in December 2016.

Apache Impala is an open source massively parallel processing (MPP) SQL query engine for data stored in a computer cluster running Apache Hadoop. Impala has been described as the open-source equivalent of Google F1, which inspired its development in 2012.

Druid is a column-oriented, open-source, distributed data store written in Java. Druid is designed to quickly ingest massive quantities of event data, and provide low-latency queries on top of the data. The name Druid comes from the shapeshifting Druid class in many role-playing games, to reflect that the architecture of the system can shift to solve different types of data problems.

<span class="mw-page-title-main">Apache Spark</span> Open-source data analytics cluster computing framework

Apache Spark is an open-source unified analytics engine for large-scale data processing. Spark provides an interface for programming clusters with implicit data parallelism and fault tolerance. Originally developed at the University of California, Berkeley's AMPLab, the Spark codebase was later donated to the Apache Software Foundation, which has maintained it since.

The MapR File System is a clustered file system that supports both very large-scale and high-performance uses. MapR FS supports a variety of interfaces including conventional read/write file access via NFS and a FUSE interface, as well as via the HDFS interface used by many systems such as Apache Hadoop and Apache Spark. In addition to file-oriented access, MapR FS supports access to tables and message streams using the Apache HBase and Apache Kafka APIs, as well as via a document database interface.

<span class="mw-page-title-main">Apache Pinot</span> Open-source distributed data store

Apache Pinot is a column-oriented, open-source, distributed data store written in Java. Pinot is designed to execute OLAP queries with low latency. It is suited in contexts where fast analytics, such as aggregations, are needed on immutable data, possibly, with real-time data ingestion. The name Pinot comes from the Pinot grape vines that are pressed into liquid that is used to produce a variety of different wines. The founders of the database chose the name as a metaphor for analyzing vast quantities of data from a variety of different file formats or streaming data sources.

References

  1. "[AMBARI-1] Initial code import - ASF JIRA".
  2. "Release Apache Ambari-2.7.6" . Retrieved 24 April 2022.
  3. "Ambari -". ambari.apache.org. Retrieved 2022-04-24.