Protocol Buffers

Developer(s): Google
Initial release: Early 2001 (internal) [1]; July 7, 2008 (public)
Stable release: 28.2 / 18 September 2024 [2]
Written in: C++, C#, Java, Python, JavaScript, Ruby, Go, PHP, Dart
Operating system: Any
Platform: Cross-platform
Type: Serialization format and library, IDL compiler
License: BSD
Website: protobuf.dev

Filename extension: .proto
Internet media type: application/protobuf, application/vnd.google.protobuf
Developed by: Google
Latest release: 3
Type of format: Interface description language
Open format?: Yes
Free format?: Yes

Protocol Buffers (Protobuf) is a free and open-source cross-platform data format used to serialize structured data. It is useful in developing programs that communicate with each other over a network or for storing data. The method involves an interface description language that describes the structure of some data and a program that generates source code from that description for generating or parsing a stream of bytes that represents the structured data.

Overview

Google developed Protocol Buffers for internal use and provided a code generator for multiple languages under an open-source license.

The design goals for Protocol Buffers emphasized simplicity and performance. In particular, it was designed to be smaller and faster than XML. [3]

Protocol Buffers is widely used at Google for storing and interchanging all kinds of structured information. The method serves as a basis for a custom remote procedure call (RPC) system that is used for nearly all inter-machine communication at Google. [4]

Protocol Buffers is similar to Apache Thrift, Ion, and Microsoft Bond. A concrete RPC protocol stack for services defined with Protocol Buffers is available as gRPC. [5]

Data structure schemas (called messages) and services are described in a proto definition file (.proto) and compiled with protoc. This compilation generates code that can be invoked by a sender or recipient of these data structures. For example, example.pb.cc and example.pb.h are generated from example.proto. They define C++ classes for each message and service in example.proto.

Canonically, messages are serialized into a binary wire format which is compact, forward- and backward-compatible, but not self-describing (that is, there is no way to tell the names, meaning, or full datatypes of fields without an external specification). There is no defined way to include or refer to such an external specification (schema) within a Protocol Buffers file. The officially supported implementation includes an ASCII serialization format, [6] but this format—though self-describing—loses the forward- and backward-compatibility behavior, and is thus not a good choice for applications other than human editing and debugging. [7]
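The compatibility behavior described above can be illustrated with a minimal, hand-written sketch. It assumes a hypothetical older reader that knows only two integer fields, numbered 1 and 2, and receives data from a newer writer that added more fields. The names `PointV1`, `decode_point_v1`, `read_varint`, and `skip_bytes` are invented for this illustration and are not part of the Protocol Buffers library. The sketch only skips unknown fields, and it does not handle the deprecated group wire types.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Decode a base-128 varint starting at `pos` and advance `pos` past it.
std::uint64_t read_varint(const Bytes& buf, std::size_t& pos) {
    std::uint64_t result = 0;
    for (int shift = 0; shift < 64; shift += 7) {
        if (pos >= buf.size()) throw std::runtime_error("truncated varint");
        const std::uint8_t byte = buf[pos++];
        result |= static_cast<std::uint64_t>(byte & 0x7F) << shift;
        if ((byte & 0x80) == 0) return result;  // high bit clear: last byte
    }
    throw std::runtime_error("varint too long");
}

// Advance `pos` by `count` bytes, refusing to run past the end of the buffer.
void skip_bytes(const Bytes& buf, std::size_t& pos, std::uint64_t count) {
    assert(pos <= buf.size());
    if (count > buf.size() - pos) throw std::runtime_error("truncated field");
    pos += static_cast<std::size_t>(count);
}

// Hypothetical "old" view of a message: it knows only fields 1 and 2.
struct PointV1 {
    std::int32_t x = 0;
    std::int32_t y = 0;
};

PointV1 decode_point_v1(const Bytes& buf) {
    PointV1 p;
    std::size_t pos = 0;
    while (pos < buf.size()) {
        // Each field starts with a tag: (field_number << 3) | wire_type.
        // The data carries no field names, only these numbers.
        const std::uint64_t tag = read_varint(buf, pos);
        const std::uint64_t field_number = tag >> 3;
        const std::uint32_t wire_type = static_cast<std::uint32_t>(tag & 0x7);

        if (field_number == 1 && wire_type == 0) {
            p.x = static_cast<std::int32_t>(read_varint(buf, pos));
        } else if (field_number == 2 && wire_type == 0) {
            p.y = static_cast<std::int32_t>(read_varint(buf, pos));
        } else {
            // Unknown field: the wire type alone says how many bytes to skip.
            switch (wire_type) {
                case 0: read_varint(buf, pos); break;                      // varint
                case 1: skip_bytes(buf, pos, 8); break;                    // 64-bit
                case 2: skip_bytes(buf, pos, read_varint(buf, pos)); break;  // length-delimited
                case 5: skip_bytes(buf, pos, 4); break;                    // 32-bit
                default: throw std::runtime_error("unsupported wire type");
            }
        }
    }
    return p;
}
```

Given the bytes `08 0A 10 14 1A 01 61` (fields 1 and 2 plus a string field 3 that the old reader has never heard of), `decode_point_v1` still recovers x = 10 and y = 20. Without the schema, the reader cannot tell what field 3 means, only how long it is.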

Though the primary purpose of Protocol Buffers is to facilitate network communication, its simplicity and speed make Protocol Buffers an alternative to data-centric C++ classes and structs, especially where interoperability with other languages or systems might be needed in the future.

Limitations

Protocol Buffers has no single specification. [8] The format is best suited to small chunks of data, no larger than a few megabytes, that can be loaded into memory or sent all at once; it is therefore not a streamable format. [9] The library does not provide compression out of the box. The format is also not well supported in non-object-oriented languages (e.g. Fortran). [10]
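Because a serialized message does not mark its own end, applications that need to write several messages to one file or connection usually add their own framing. The following is a minimal sketch of one such scheme, assuming a 4-byte little-endian length prefix chosen purely for illustration. The functions `append_frame` and `split_frames` are hypothetical and not part of the Protocol Buffers library; each payload stands for the bytes of one already serialized message (for example, the output of a generated class's `SerializeToString`).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Append one frame: a 4-byte little-endian length, then the payload bytes.
void append_frame(std::string& stream, const std::string& payload) {
    assert(payload.size() <= 0xFFFFFFFFu);  // length must fit in 32 bits
    const std::uint32_t size = static_cast<std::uint32_t>(payload.size());
    for (int shift = 0; shift < 32; shift += 8) {
        stream.push_back(static_cast<char>((size >> shift) & 0xFF));
    }
    stream += payload;
}

// Split a stream produced by append_frame back into its payloads.
std::vector<std::string> split_frames(const std::string& stream) {
    std::vector<std::string> payloads;
    std::size_t pos = 0;
    while (pos < stream.size()) {
        if (stream.size() - pos < 4) {
            throw std::runtime_error("truncated length prefix");
        }
        std::uint32_t size = 0;
        for (int i = 0; i < 4; ++i) {
            const unsigned char byte = static_cast<unsigned char>(stream[pos + i]);
            size |= static_cast<std::uint32_t>(byte) << (8 * i);
        }
        pos += 4;
        if (size > stream.size() - pos) {
            throw std::runtime_error("truncated payload");
        }
        payloads.push_back(stream.substr(pos, size));
        pos += size;
    }
    return payloads;
}
```

With this framing a reader can process one message at a time, so only the largest single message has to fit in memory rather than the whole stream.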

Example

A schema for a particular use of protocol buffers associates data types with field names, using integers to identify each field. (The protocol buffer data contains only the numbers, not the field names, providing some bandwidth/storage savings compared with systems that include the field names in the data.)

// polyline.proto
syntax = "proto2";

message Point {
  required int32 x = 1;
  required int32 y = 2;
  optional string label = 3;
}

message Line {
  required Point start = 1;
  required Point end = 2;
  optional string label = 3;
}

message Polyline {
  repeated Point point = 1;
  optional string label = 2;
}

The "Point" message defines two mandatory data items, x and y. The data item label is optional. Each data item has a tag. The tag is defined after the equal sign. For example, x has the tag 1.

The "Line" and "Polyline" messages, which both use Point, demonstrate how composition works in Protocol Buffers. Polyline has a repeated field, and thus behaves like an ordered sequence of points (of unspecified length).
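To make the role of the field numbers and of composition concrete, here is a minimal hand-rolled sketch of how Point and Polyline values could be laid out in the binary wire format. It is not the generated API: `PlainPoint`, `put_varint`, `put_tag`, `put_int32`, `put_bytes`, `encode_point`, and `encode_polyline` are hypothetical helpers written for this illustration, and real programs would use the classes produced by protoc instead.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Append an unsigned integer as a base-128 varint: 7 payload bits per byte,
// least-significant group first, high bit set on every byte except the last.
void put_varint(std::string& out, std::uint64_t value) {
    while (value >= 0x80) {
        out.push_back(static_cast<char>((value & 0x7F) | 0x80));
        value >>= 7;
    }
    out.push_back(static_cast<char>(value));
}

// A field starts with a tag: (field_number << 3) | wire_type.
// Only the number from the schema is written, never the field name.
void put_tag(std::string& out, std::uint32_t field_number, std::uint32_t wire_type) {
    assert(field_number >= 1 && wire_type <= 5);
    put_varint(out, (static_cast<std::uint64_t>(field_number) << 3) | wire_type);
}

// int32 fields use wire type 0 (varint); negative values are sign-extended
// to 64 bits and therefore occupy ten bytes.
void put_int32(std::string& out, std::uint32_t field_number, std::int32_t value) {
    put_tag(out, field_number, 0);
    put_varint(out, static_cast<std::uint64_t>(static_cast<std::int64_t>(value)));
}

// Strings and embedded messages use wire type 2: a length, then the bytes.
void put_bytes(std::string& out, std::uint32_t field_number, const std::string& bytes) {
    put_tag(out, field_number, 2);
    put_varint(out, bytes.size());
    out += bytes;
}

// Plain stand-in for the Point message (label omitted for brevity).
struct PlainPoint {
    std::int32_t x;
    std::int32_t y;
};

std::string encode_point(const PlainPoint& p) {
    std::string out;
    put_int32(out, 1, p.x);  // required int32 x = 1;
    put_int32(out, 2, p.y);  // required int32 y = 2;
    return out;
}

// A repeated embedded message is written as one length-delimited field per
// element, all carrying the same field number, in order.
std::string encode_polyline(const std::vector<PlainPoint>& points,
                            const std::string* label) {
    std::string out;
    for (const PlainPoint& p : points) {
        put_bytes(out, 1, encode_point(p));  // repeated Point point = 1;
    }
    if (label != nullptr) {
        put_bytes(out, 2, *label);           // optional string label = 2;
    }
    return out;
}
```

Under these assumptions, a Point with x = 10 and y = 20 becomes the four bytes `08 0A 10 14`, and a Polyline with points (10, 10) and (20, 20) becomes `0A 04 08 0A 10 0A 0A 04 08 14 10 14`: each point is the same four-byte encoding wrapped in a tag and a length.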

This schema can subsequently be compiled for use by one or more programming languages. Google provides a compiler called protoc which can produce output for C++, Java or Python. Other schema compilers are available from other sources to create language-dependent output for over 20 other languages. [11]

For example, after a C++ version of the protocol buffer schema above is produced, a C++ source code file, polyline.cpp, can use the message objects as follows:

// polyline.cpp
#include <string>

#include "polyline.pb.h"  // generated by calling "protoc --cpp_out=. polyline.proto"

Line* createNewLine(const std::string& name) {
  // create a line from (10, 20) to (30, 40)
  Line* line = new Line;
  line->mutable_start()->set_x(10);
  line->mutable_start()->set_y(20);
  line->mutable_end()->set_x(30);
  line->mutable_end()->set_y(40);
  line->set_label(name);
  return line;
}

Polyline* createNewPolyline() {
  // create a polyline with points at (10,10) and (20,20)
  Polyline* polyline = new Polyline;
  Point* point1 = polyline->add_point();
  point1->set_x(10);
  point1->set_y(10);
  Point* point2 = polyline->add_point();
  point2->set_x(20);
  point2->set_y(20);
  return polyline;
}

Language support

Protobuf 2.0 provides a code generator for C++, Java, C#, [12] and Python. [13]

Protobuf 3.0 provides a code generator for C++, Java (including JavaNano, a dialect intended for low-resource environments), Python, Go, Ruby, Objective-C, and C#. [14] It has also supported JavaScript since 3.0.0-beta-2. [15]

Third-party implementations are also available for Ballerina, [16] C, [17] [18] C++, [19] Dart, Elixir, [20] [21] Erlang, [22] Haskell, [23] JavaScript, [24] Julia, [25] Nim, [26] Perl, PHP, Prolog, [27] [28] R, [29] Rust, [30] [31] [32] Scala, [33] and Swift. [34]

References

  1. "Frequently Asked Questions | Protocol Buffers". Google Developers. Retrieved 2 October 2016.
  2. "Releases - google/protobuf" via GitHub.
  3. Eishay Smith. "jvm-serializers Benchmarks". GitHub. Retrieved 2010-07-12.
  4. Kenton Varda. "A response to Steve Vinoski". Retrieved 2008-07-14.
  5. "grpc". grpc.io. Retrieved 2 October 2016.
  6. "text_format.h - Protocol Buffers - Google Code". Retrieved 2012-03-02.
  7. "Proto Best Practices | Protocol Buffers Documentation". Retrieved 2023-05-26.
  8. "Overview". protobuf.dev. Retrieved 2023-05-28.
  9. "Overview". protobuf.dev. Retrieved 2023-05-28.
  10. "Overview". protobuf.dev. Retrieved 2023-05-28.
  11. "ThirdPartyAddOns - protobuf - Links to third-party add-ons. - Protocol Buffers - Google's data interchange format - Google Project Hosting". Code.google.com. Retrieved 2013-09-18.
  12. "Protocol Buffers in C#". Code Blockage. Retrieved 2017-05-12.
  13. "Protocol Buffers Language Guide". Google Developers. Retrieved 2016-04-21.
  14. "Language Guide (proto3) | Protocol Buffers". Google Developers. Retrieved 2020-08-09.
  15. "Release Protocol Buffers v3.0.0-beta-2 · protocolbuffers/protobuf". GitHub. Retrieved 2020-08-09.
  16. "Ballerina - GRPC". Archived from the original on 2021-11-15. Retrieved 2021-03-24.
  17. "Nanopb - protocol buffers with small code size". Retrieved 2017-12-12.
  18. "Protocol Buffers implementation in C". GitHub. Retrieved 2017-12-12.
  19. "Embedded Proto - Protobuf for microcontrollers". Retrieved 2021-08-15.
  20. "Protox". GitHub. 25 October 2021.
  21. "Protobuf-elixir". GitHub. 26 October 2021.
  22. "Tomas-abrahamsson/GPB". GitHub. 19 October 2021.
  23. "Proto-lens". GitHub. 16 October 2021.
  24. "Protocol Buffers for JavaScript". github.com. Retrieved 2016-05-14.
  25. "ThirdPartyAddOns - protobuf - Links to third-party add-ons. - Protocol Buffers - Google's data interchange format - Google Project Hosting". Retrieved 2012-11-07.
  26. "Protobuf implementation in pure Nim that leverages the power of the macro system to not depend on any external tools". GitHub. 21 October 2021.
  27. "SWI-Prolog: Google's Protocol Buffers Library".
  28. "SWI-Prolog / contrib-protobufs". GitHub. Retrieved 2022-04-21.
  29. "RProtoBuf". GitHub.
  30. "Rust-protobuf". GitHub. 26 October 2021.
  31. "PROST!". GitHub. 21 August 2021.
  32. "Quick-protobuf". GitHub. 12 October 2021.
  33. "ScalaPB". GitHub. Retrieved 27 September 2022.
  34. "Swift Protobuf". GitHub. 26 October 2021.