Apache NiFi

Last updated
Apache NiFi
Developer(s) Apache Software Foundation
Initial release2006;18 years ago (2006)
Stable release
1.25.0 / 30 January 2024;40 days ago (2024-01-30) [1]
Repository github.com/apache/nifi
Written in Java
Operating system Cross-platform
Type Distributed dataflow
License Apache License 2.0
Website nifi.apache.org

Apache NiFi is a software project from the Apache Software Foundation designed to automate the flow of data between software systems. Leveraging the concept of extract, transform, load (ETL), it is based on the "NiagaraFiles" software previously developed by the US National Security Agency (NSA), which is also the source of a part of its present name – NiFi. It was open-sourced as a part of NSA's technology transfer program in 2014. [2] [3] [4] [5] [6]

Contents

The software design is based on the flow-based programming model and offers features which prominently include the ability to operate within clusters, security using TLS encryption, extensibility (users can write their own software to extend its abilities) and improved usability features like a portal which can be used to view and modify behaviour visually. [7]

Components

NiFi - Software Components Apache NiFi Components.png
NiFi - Software Components

NiFi is a Java program that runs within a Java virtual machine running on a server. [8] The prominent components of Nifi are:

See also

Related Research Articles

The Jakarta Project created and maintained open source software for the Java platform. It operated as an umbrella project under the auspices of the Apache Software Foundation, and all Jakarta products are released under the Apache License. As of December 21, 2011 the Jakarta project was retired because no subprojects were remaining.

<span class="mw-page-title-main">Apache Subversion</span> Free and open-source software versioning and revision control system

Apache Subversion is a software versioning and revision control system distributed as open source under the Apache License. Software developers use Subversion to maintain current and historical versions of files such as source code, web pages, and documentation. Its goal is to be a mostly compatible successor to the widely used Concurrent Versions System (CVS).

<span class="mw-page-title-main">IntelliJ IDEA</span> Integrated development environment

IntelliJ IDEA is an integrated development environment (IDE) written in Java for developing computer software written in Java, Kotlin, Groovy, and other JVM-based languages. It is developed by JetBrains and is available as an Apache 2 Licensed community edition, and in a proprietary commercial edition. Both can be used for commercial development.

In computer programming, dataflow programming is a programming paradigm that models a program as a directed graph of the data flowing between operations, thus implementing dataflow principles and architecture. Dataflow programming languages share some features of functional languages, and were generally developed in order to bring some functional concepts to a language more suitable for numeric processing. Some authors use the term datastream instead of dataflow to avoid confusion with dataflow computing or dataflow architecture, based on an indeterministic machine paradigm. Dataflow programming was pioneered by Jack Dennis and his graduate students at MIT in the 1960s.

Free Java implementations are software projects that implement Oracle's Java technologies and are distributed under free software licences, making them free software. Sun released most of its Java source code as free software in May 2007, so it can now almost be considered a free Java implementation. Java implementations include compilers, runtimes, class libraries, etc. Advocates of free and open source software refer to free or open source Java virtual machine software as free runtimes or free Java runtimes.

Maven is a build automation tool used primarily for Java projects. Maven can also be used to build and manage projects written in C#, Ruby, Scala, and other languages. The Maven project is hosted by The Apache Software Foundation, where it was formerly part of the Jakarta Project.

Content Repository API for Java (JCR) is a specification for a Java platform application programming interface (API) to access content repositories in a uniform manner. The content repositories are used in content management systems to keep the content data and also the metadata used in content management systems (CMS) such as versioning metadata. The specification was developed under the Java Community Process as JSR-170, and as JSR-283. The main Java package is javax.jcr.

<span class="mw-page-title-main">Apache Felix</span> Open-source OSGi framework

Apache Felix is an open source implementation of the OSGi Core Release 6 framework specification. The initial codebase was donated from the Oscar project at ObjectWeb. The developers worked on Felix for a full year and have made various improvements while retaining the original footprint and performance. On June 21, 2007, the project graduated from incubation as a top level project and is considered the smallest size software at Apache Software Foundation.

Apache ObJectRelationalBridge (OJB) is an Object/Relational mapping tool that allows transparent persistence for Java Objects against relational databases. It was released on April 6, 2005.

<span class="mw-page-title-main">Spring Roo</span> Open-source software tool

Spring Roo is an open-source software tool that uses convention-over-configuration principles to provide rapid application development of Java-based enterprise software. The resulting applications use common Java technologies such as Spring Framework, Java Persistence API, Thymeleaf, Apache Maven and AspectJ. Spring Roo is a member of the Spring portfolio of projects.

<span class="mw-page-title-main">Apache ZooKeeper</span> System for distributed coordination

Apache ZooKeeper is an open-source server for highly reliable distributed coordination of cloud applications. It is a project of the Apache Software Foundation.

LabKey Server is a software suite available for scientists to integrate, analyze, and share biomedical research data. The platform provides a secure data repository that allows web-based querying, reporting, and collaborating across a range of data sources. Specific scientific applications and workflows can be added on top of the basic platform and leverage a data processing pipeline.

Apache Kafka is a distributed event store and stream-processing platform. It is an open-source system developed by the Apache Software Foundation written in Java and Scala. The project aims to provide a unified, high-throughput, low-latency platform for handling real-time data feeds. Kafka can connect to external systems via Kafka Connect, and provides the Kafka Streams libraries for stream processing applications. Kafka uses a binary TCP-based protocol that is optimized for efficiency and relies on a "message set" abstraction that naturally groups messages together to reduce the overhead of the network roundtrip. This "leads to larger network packets, larger sequential disk operations, contiguous memory blocks [...] which allows Kafka to turn a bursty stream of random message writes into linear writes."

<span class="mw-page-title-main">Apache Spark</span> Open-source data analytics cluster computing framework

Apache Spark is an open-source unified analytics engine for large-scale data processing. Spark provides an interface for programming clusters with implicit data parallelism and fault tolerance. Originally developed at the University of California, Berkeley's AMPLab, the Spark codebase was later donated to the Apache Software Foundation, which has maintained it since.

Eclipse Deeplearning4j is a programming library written in Java for the Java virtual machine (JVM). It is a framework with wide support for deep learning algorithms. Deeplearning4j includes implementations of the restricted Boltzmann machine, deep belief net, deep autoencoder, stacked denoising autoencoder and recursive neural tensor network, word2vec, doc2vec, and GloVe. These algorithms all include distributed parallel versions that integrate with Apache Hadoop and Spark.

<span class="mw-page-title-main">Apache Flink</span> Framework and distributed processing engine

Apache Flink is an open-source, unified stream-processing and batch-processing framework developed by the Apache Software Foundation. The core of Apache Flink is a distributed streaming data-flow engine written in Java and Scala. Flink executes arbitrary dataflow programs in a data-parallel and pipelined manner. Flink's pipelined runtime system enables the execution of bulk/batch and stream processing programs. Furthermore, Flink's runtime supports the execution of iterative algorithms natively.

Perforce Software, Inc. is an American developer of software used for developing and running applications, including version control software, web-based repository management, developer collaboration, application lifecycle management, web application servers, debugging tools and agile planning software.

Microsoft, a technology company historically known for its opposition to the open source software paradigm, turned to embrace the approach in the 2010s. From the 1970s through 2000s under CEOs Bill Gates and Steve Ballmer, Microsoft viewed the community creation and sharing of communal code, later to be known as free and open source software, as a threat to its business, and both executives spoke negatively against it. In the 2010s, as the industry turned towards cloud, embedded, and mobile computing—technologies powered by open source advances—CEO Satya Nadella led Microsoft towards open source adoption although Microsoft's traditional Windows business continued to grow throughout this period generating revenues of 26.8 billion in the third quarter of 2018, while Microsoft's Azure cloud revenues nearly doubled.

References

  1. "Apache NiFi Downloads". nifi.apache.org. Retrieved 2024-02-16.
  2. "NSA Releases First in Series of Software Products to Open Source Community". www.nsa.gov. Archived from the original on 2017-12-07. Retrieved 2017-12-07.
  3. Bridgwater, Adrian (2015-07-21). "NSA 'NiFi' Big Data Automation Project Out In The Open". Forbes . Retrieved 2016-09-21.
  4. Vaughan-Nichols, Steven J. "NSA partners with Apache to release open-source data traffic program | ZDNet". ZDNet. Retrieved 2016-09-21.
  5. "NSA Source Code Leak: Information slurp tools to appear online". The Register . Retrieved 2016-09-21.
  6. Wolpe, Toby. "Hortonworks CTO on Apache NiFi: What is it and why does it matter to IoT? | ZDNet". ZDNet. Retrieved 2016-09-21.
  7. "Apache NiFi Documentation". nifi.apache.org. Retrieved 2017-12-07.
  8. "Apache NiFi Developer Guide". nifi.apache.org. Retrieved 2018-01-31.