Developer(s) | Treasure Data |
---|---|
Initial release | 10 October 2011 |
Stable release | 1.16.6 / 20 August 2024 [1] |
Repository | |
Written in | C, Ruby |
Operating system | Linux (Amazon Linux, CentOS, RHEL), macOS (10.9 and above), Ruby, Windows (7 and above) |
Type | Logging tool |
License | Apache 2.0 |
Website | www |
Fluentd is a cross-platform, open-source data collection software project originally developed at Treasure Data. It is written primarily in the C programming language with a thin Ruby wrapper that gives users flexibility. [2]
Fluentd was positioned for "big data": semi-structured or unstructured data sets. It analyzes event logs, application logs, and clickstreams. [3] According to Suonsyrjä and Mikkonen, the "core idea of Fluentd is to be the unifying layer between different types of log inputs and outputs." [4] Fluentd is available on Linux, macOS, and Windows. [5]
Fluentd was created by Sadayuki Furuhashi as a project of the Mountain View-based firm Treasure Data. Written primarily in Ruby, its source code was released as open-source software in October 2011. [6] [7] The company announced $5 million of funding in 2013. [8] Treasure Data was then sold to Arm Ltd. in 2018. [9]
Fluentd was one of the data collection tools recommended by Amazon Web Services in 2013, when it was said to be similar to Apache Flume or Scribe. [10] Google Cloud Platform's BigQuery recommends Fluentd as the default real-time data-ingestion tool, and uses Google's customized version of Fluentd, called google-fluentd, as a default logging agent. [11] [12]
Fluent Bit is a log processor and log forwarder developed as a CNCF sub-project under the umbrella of the Fluentd project. [13] Fluentd is written in C and Ruby and consumes at least sixty megabytes of memory. Fluent Bit is written only in C, with no dependencies, and consumes approximately one megabyte of memory, making it easier to run under embedded Linux and in containers. [14]