Fluentd

Developer(s) Treasure Data
Initial release 10 October 2011
Stable release 1.15.3 / November 2, 2022 [1]
Repository
Written in C, Ruby
Operating system Linux (Amazon Linux, CentOS, RHEL), macOS (10.9 and above), Ruby, Windows (7 and above)
Type Logging tool
License Apache 2.0
Website www.fluentd.org

Fluentd is a cross-platform open-source data collection software project originally developed at Treasure Data. It is written primarily in the Ruby programming language.

Overview

Fluentd was positioned for "big data": semi-structured or unstructured data sets. It analyzes event logs, application logs, and clickstreams. [2] According to Suonsyrjä and Mikkonen, the "core idea of Fluentd is to be the unifying layer between different types of log inputs and outputs." [3] Fluentd is available on Linux, macOS, and Windows. [4]
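
The "unifying layer" idea can be illustrated with a minimal Python sketch. It is not Fluentd's implementation or API: the `Router` class, its `match` and `emit` methods, and the tag names are hypothetical, and the shell-style pattern matching and first-match-wins rule are assumptions made for the example. It only shows how events from different inputs, once given a tag, could be dispatched to different outputs through a single layer.

```python
from fnmatch import fnmatchcase
from typing import Callable, Dict, List, Tuple

Event = Dict[str, object]
Output = Callable[[str, Event], None]


class Router:
    """Hypothetical tag-based router (illustration only, not Fluentd's API)."""

    def __init__(self) -> None:
        # Ordered list of (tag pattern, output callable) pairs.
        self._rules: List[Tuple[str, Output]] = []

    def match(self, pattern: str, output: Output) -> None:
        """Register an output for tags matching a shell-style pattern."""
        self._rules.append((pattern, output))

    def emit(self, tag: str, record: Event) -> bool:
        """Send a record to the first output whose pattern matches the tag.

        Returns True if some output accepted the record, False otherwise.
        """
        for pattern, output in self._rules:
            if fnmatchcase(tag, pattern):
                output(tag, record)
                return True
        return False


if __name__ == "__main__":
    app_events: List[Tuple[str, Event]] = []
    other_events: List[Tuple[str, Event]] = []

    router = Router()
    router.match("app.*", lambda tag, rec: app_events.append((tag, rec)))
    router.match("*", lambda tag, rec: other_events.append((tag, rec)))

    # Two different "inputs" feed the same layer using different tags.
    router.emit("app.access", {"path": "/index.html", "status": 200})
    router.emit("sys.kernel", {"message": "link up"})

    print(len(app_events), len(other_events))
```

In this sketch, neither the inputs nor the outputs know about each other; they share only the tag and the record, which is the sense in which a single layer sits between them.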

History

Fluentd was created by Sadayuki Furuhashi as a project of the Mountain View-based firm Treasure Data. Written primarily in Ruby, its source code was released as open-source software in October 2011. [5] [6] The company announced $5 million of funding in 2013. [7] Treasure Data was then sold to Arm Ltd. in 2018. [8]

Users

Fluentd was one of the data collection tools recommended by Amazon Web Services in 2013, when it was said to be similar to Apache Flume or Scribe. [9] Google Cloud Platform's BigQuery recommends Fluentd as the default real-time data-ingestion tool, and uses Google's customized version of Fluentd, called google-fluentd, as a default logging agent. [10] [11]

Fluent Bit

Fluent Bit is a log processor and log forwarder that is developed as a CNCF sub-project under the umbrella of the Fluentd project. [12] Fluentd is written in C and Ruby and is built as a Ruby gem, so it consumes a certain amount of memory. Fluent Bit, by contrast, is written only in C and has no dependencies, so its memory usage is much lower than Fluentd's, which makes it easy to run on embedded Linux and in container environments. [13]

References

  1. "Releases - fluent/fluentd" via GitHub.
  2. Pasupuleti, Pradeep and Purra, Beulah Salome (2015). Data Lake Development with Big Data. pp. 44–45; 48. Packt. ISBN 1785881663.
  3. Suonsyrjä, Sampo and Mikkonen, Tommi. "Designing an Unobtrusive Analytics Framework for Monitoring Java Applications", pp. 170–173 in Software Measurement. Springer. ISBN 3319242857.
  4. Fluentd.org. "Download Fluentd". Retrieved 10 March 2016.
  5. Mayer, Chris (30 October 2013). "Treasure Data: Breaking down the Hadoop barrier". JAXenter.
  6. Fluentd.org. "What is Fluentd?". Retrieved 10 March 2016.
  7. Derrick Harris (July 23, 2013). "Treasure Data raises $5M, fuses Hadoop and data warehouse in Amazon's cloud". GigaOm.
  8. "Arm unit Treasure Data to seek buyer or IPO before Nvidia sale". Nikkei Asia. November 19, 2020. Retrieved August 2, 2021.
  9. Parviz Deyhim (August 2013). "Best Practices for Amazon EMR" (PDF). Amazon Web Services. p. 12. Archived from the original (PDF) on 2016-03-26. Retrieved March 24, 2017.
  10. Google Cloud Platform (2016). "Real-time logs analysis using Fluentd and BigQuery". Retrieved 10 March 2016.
  11. Google Cloud Platform (2016). "The Logging Agent". Retrieved 10 March 2016.
  12. "Fluent Bit". fluentbit.io. Retrieved 2021-12-05.
  13. "Fluentd & Fluent Bit - Fluent Bit: Official Manual". Retrieved 2021-12-05.

Further reading