Apache Thrift

Original author(s) Facebook, Inc.
Developer(s) Apache Software Foundation
Stable release
0.19.0 / 2 September 2023 [1]
Repository Thrift Repository
Written in ActionScript, C, C#, C++, D, Dart, Delphi, Erlang, Go, Haskell, Haxe, Java, JavaScript, Node.js, OCaml, Perl, PHP, Python, Rust, Scala, Smalltalk
Type Remote procedure call framework
License Apache 2.0
Website thrift.apache.org

Thrift is an interface definition language and binary communication protocol [2] used for defining and creating services for numerous programming languages. [3] It was developed at Facebook and has been an open-source project of the Apache Software Foundation since 2010.

It uses a remote procedure call (RPC) framework and combines a software stack with a code generation engine to build cross-platform services. Thrift can connect applications written in a variety of languages and frameworks, including ActionScript, C, C++, [4] C#, [5] Cocoa, Delphi, Erlang, Go, Haskell, Java, JavaScript, Objective-C, OCaml, Perl, PHP, Python, Ruby, Elixir, [6] Rust, Scala, Smalltalk, and Swift. [7] The implementation was described in an April 2007 technical paper released by Facebook, now hosted on Apache. [8] [9]

Architecture

Figure: The Apache Thrift API client/server architecture

Thrift includes a complete stack for creating clients and servers. [10] The top part of the stack is code generated from the Thrift definition file. Each service defined in that file yields generated client and processor code, and each data structure that is sent (other than the built-in types) also yields generated code. The protocol and transport layers are part of the runtime library, so it is possible to define a service and later change the protocol and transport without regenerating the code. Besides the client part, Thrift includes server infrastructure that ties protocols and transports together, such as blocking, non-blocking, and multi-threaded servers. The underlying I/O part of the stack is implemented differently for different languages.
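
The separation between protocol and transport can be illustrated with a minimal Python sketch. It does not use the Thrift library: `MemoryTransport`, `BinaryProtocol`, `JsonProtocol`, `write_phone` and `serialize` are hypothetical names invented for this illustration, and the byte layout is a simplification rather than Thrift's actual wire format. The point is only that the struct-writing function, which stands in for generated code, stays unchanged while the protocol underneath it is exchanged.

```python
import json
import struct


class MemoryTransport:
    """Toy transport (hypothetical): collects written bytes in memory."""

    def __init__(self):
        self.buffer = bytearray()

    def write(self, data: bytes) -> None:
        self.buffer.extend(data)


class BinaryProtocol:
    """Toy binary protocol: big-endian i32, length-prefixed UTF-8 strings."""

    def __init__(self, transport):
        self.transport = transport

    def write_i32(self, name: str, value: int) -> None:
        # The field name is ignored; only the value goes on the wire.
        self.transport.write(struct.pack(">i", value))

    def write_string(self, name: str, value: str) -> None:
        raw = value.encode("utf-8")
        self.transport.write(struct.pack(">i", len(raw)) + raw)

    def flush(self) -> None:
        pass  # bytes were already handed to the transport


class JsonProtocol:
    """Toy text protocol: gathers fields, then writes one JSON object."""

    def __init__(self, transport):
        self.transport = transport
        self.fields = {}

    def write_i32(self, name: str, value: int) -> None:
        self.fields[name] = value

    def write_string(self, name: str, value: str) -> None:
        self.fields[name] = value

    def flush(self) -> None:
        encoded = json.dumps(self.fields, sort_keys=True).encode("utf-8")
        self.transport.write(encoded)
        self.fields = {}


def write_phone(protocol, phone_id: int, number: str) -> None:
    """Stands in for generated code: it only talks to the protocol interface."""
    protocol.write_i32("id", phone_id)
    protocol.write_string("number", number)
    protocol.flush()


def serialize(protocol_cls, phone_id: int, number: str) -> bytes:
    """Wire a protocol to a fresh transport and return the bytes produced."""
    transport = MemoryTransport()
    write_phone(protocol_cls(transport), phone_id, number)
    return bytes(transport.buffer)
```

Calling `serialize(BinaryProtocol, 1, "555")` and `serialize(JsonProtocol, 1, "555")` runs the same `write_phone` function against two different encodings, which is the kind of substitution the runtime library is described as allowing.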

Thrift supports a number of protocols: [10]

The supported transports are:

Thrift also provides a number of servers, which are:

Benefits

Some stated benefits of Thrift include: [12]

Creating a Thrift service

Thrift is written in C++, but can create code for a number of languages. To create a Thrift service, one has to write Thrift files that describe it, generate the code in the destination language, write some code to start the server and call it from the client. Here is a code example of such a description file:

enum PhoneType {
  HOME,
  WORK,
  MOBILE,
  OTHER
}

struct Phone {
  1: i32 id,
  2: string number,
  3: PhoneType type
}

service PhoneService {
  Phone findById(1: i32 id),
  list<Phone> findAll()
}

Thrift generates code from this description. For instance, in Java, the PhoneType will be a simple enum inside the Phone class.
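
To make the description file more concrete, the following hand-written Python sketch mirrors its three declarations. It is not the output of the Thrift compiler and does not use the Thrift runtime: the `InMemoryPhoneService` class, the sample phone numbers, and the choice of enum values starting at 0 are assumptions made for illustration. In a real project the compiler would produce the types and the client and processor code, and the developer would supply the handler logic.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Dict, List


class PhoneType(IntEnum):
    # Assumption: values start at 0 and follow declaration order.
    HOME = 0
    WORK = 1
    MOBILE = 2
    OTHER = 3


@dataclass
class Phone:
    # Field order follows the field IDs 1, 2, 3 in the description file.
    id: int
    number: str
    type: PhoneType


class InMemoryPhoneService:
    """Hypothetical handler implementing the two methods of PhoneService."""

    def __init__(self, phones: List[Phone]):
        self._by_id: Dict[int, Phone] = {p.id: p for p in phones}

    def findById(self, id: int) -> Phone:
        # Raises KeyError for an unknown id; a real service would
        # more likely declare and throw an exception type in the IDL.
        return self._by_id[id]

    def findAll(self) -> List[Phone]:
        return list(self._by_id.values())


# Sample data, invented for this sketch.
service = InMemoryPhoneService([
    Phone(1, "555-0100", PhoneType.HOME),
    Phone(2, "555-0101", PhoneType.MOBILE),
])
```

Here `service.findById(2)` is an ordinary local call. With Thrift, a client would make the same call through generated client code, which forwards it over the chosen protocol and transport to a handler like this one on the server.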

References

  1. "Apache Thrift - Downloads". Retrieved 6 January 2024.
  2. "Installing and using Apache Cassandra With Java Part 4 (Thrift Client)". Sodeso – Software Development Solutions. Retrieved 2011-03-30. Thrift is a separate Apache project which is a binary communication protocol
  3. Andrew Prunicki. "Apache Thrift: Introduction". Object Computing. Archived from the original on 2011-07-23. Retrieved 2011-04-11. Through a simple and straightforward Interface Definition Language (IDL), Thrift allows [users] to define and create services which are both consumable by and serviceable by numerous languages. Using code generation, Thrift creates a set of files which can then be used for creating clients and/or servers. In addition to interoperability, Thrift can be very efficient through a unique serialization mechanism that is efficient in both time and space.
  4. Thrift Requirements, see this issue for Windows support
  5. Fred Potter, Using Thrift with Cappuccino. Archived 2011-08-12 at the Wayback Machine, parallel48's posterously luscious blog, 10 June 2010.
  6. pinterest/elixir-thrift, Pinterest, 2020-02-05, retrieved 2020-02-06
  7. Andrew Prunicki. "Apache Thrift: Code Generation". Object Computing. Archived from the original on 2011-07-23. Retrieved 2011-04-12. Thrift supports many programming languages too varying degrees. The complete list is below. Be careful before assuming that just because your language has some support that it supports all of Thrift's features. Python for example, only supports TBinaryProtocol. Cocoa, C++, C#, Erlang, Haskell, Java, OCaml, Perl, PHP, Python, Ruby, and Smalltalk
  8. Mark Slee, Aditya Agarwal, Marc Kwiatkowski, Thrift: Scalable Cross-Language Services Implementation
  9. "LibraryFeatures - Thrift Wiki". Retrieved 2016-04-21.
  10. Andrew Prunicki. "Apache Thrift: Introduction". Object Computing. Archived from the original on 2011-07-23. Retrieved 2011-04-11. The top portion of the stack is generated code from your Thrift definition file. Thrift services result in generated client and processor code. These are represented by the brown boxes in the diagram. The data structures that are sent (other than built-in types) also result in generated code. These result in the red boxes. The protocol and transport are part of the Thrift runtime library. Therefore with Thrift, you can define a service, and are free to change the protocol and transport without re-generating your code. Thrift also includes a server infrastructure to tie the protocols and transports together. There are blocking, non-blocking, single and multi-threaded servers available. The "Underlying I/O" portion of the stack differs based on the language in question. For Java and Python network I/O, the built-in libraries are leveraged by the Thrift library, while the C++ implementation uses its own custom implementation.
  11. Skelton, Steven (3 August 2013). "Developer Friendly Thrift Request Logging". Retrieved 3 July 2014.
  12. Programmer's Guide to Apache Thrift, Randy Abernathy, Manning Publications, 2019, ISBN 978-1-6172-9616-1