Gearman

Last updated
Gearman
Gearman80 title.gif
Original author(s) Brad Fitzpatrick
Developer(s) Brian Aker, Eric Day
Initial releaseJanuary 8, 2009 (2009-01-08)
Stable release
1.1.19.1 / February 18, 2020;2 years ago (2020-02-18)
Repository github.com/gearman/gearmand/
Written in C++, C, M4, Shell script
Operating system Linux, Windows (no server), OpenVMS
Available inEnglish
License BSD License
Website gearman.org

Gearman is an open-source application framework designed to distribute appropriate computer tasks to multiple computers, so large tasks can be done more quickly. In some cases, load balancing rather than raw speed may be the main goal; a Web server, for instance, could use Gearman to send tasks for which it is not optimized to another computer (which may be running on a different architecture, using another operating system, or loaded with a computer language better suited to a particular operation).

Contents

It was originally written in Perl by Brad Fitzpatrick. Brian Aker and Eric Day rewrote the framework in C.

How Gearman Works

The Gearman Application Stack. Gearman Stack.png
The Gearman Application Stack.

Gearman assigns each involved computer a role as client, job server, or worker. A worker machine can be assigned multiple instances of the worker role, which allows more powerful computers to complete more portions of a given task. Tasks originate on a client, are transmitted from the client to the job server, and performed on one or more workers. The completed task's output is then returned, again by way of the job server, to the client where the task originated. Gearman is conceptually related to MapReduce; Gearman handles MapReduce by allowing worker nodes to map out work to other workers, with the original worker acting as the reducer.

Gearman performs coalescence on the work sent by a client. If two or more clients ask for work to be completed on the same body of work, either by seeing that the same blocks are being sent or by using the unique value sent by the client, it will coalesce the work so that only one worker is used. It does this specifically to avoid thundering herd problems which are common to cache hit failures.

To mitigate the damage that would be done if a job server (or its network connection) were to fail, clients can be configured with more than one assigned job server; if the first assigned job server fails, another can be transparently substituted.

Gearman implements a protocol that consists of binary packets containing requests and responses; this protocol defines the structure of messages passing between the three parts of a Gearman implementation. By default, the Gearman protocol uses TCP port 4730. It previously operated on port 7003, but this conflicted with the AFS port range and the new port (4730) was assigned by IANA.

The name "Gearman" was chosen as an anagram for "Manager", "since it dispatches jobs to be done, but does not do anything useful itself." [1]

Features

Implementations

Clients

Currently there are client libraries for C, Perl, Node.js, Python, PHP, Ruby, Java, .NET, JMS, MySQL, PostgreSQL, and Drizzle. [2]

Similar software

Citations

  1. http://gearman.org/#introduction
  2. Gearman page Client & Worker APIs

Related Research Articles

Integrated development environment Software engineering toolkit

An integrated development environment (IDE) is a software application that provides comprehensive facilities to computer programmers for software development. An IDE normally consists of at least a source code editor, build automation tools and a debugger. Some IDEs, such as NetBeans and Eclipse, contain the necessary compiler, interpreter, or both; others, such as SharpDevelop and Lazarus, do not.

PostgreSQL Free and open-source relational database management system

PostgreSQL, also known as Postgres, is a free and open-source relational database management system (RDBMS) emphasizing extensibility and SQL compliance. It was originally named POSTGRES, referring to its origins as a successor to the Ingres database developed at the University of California, Berkeley. In 1996, the project was renamed to PostgreSQL to reflect its support for SQL. After a review in 2007, the development team decided to keep the name PostgreSQL and the alias Postgres.

Server-side scripting is a technique used in web development which involves employing scripts on a web server which produces a response customized for each user's (client's) request to the website. The alternative is for the web server itself to deliver a static web page. Scripts can be written in any of a number of server-side scripting languages that are available. Server-side scripting is distinguished from client-side scripting where embedded scripts, such as JavaScript, are run client-side in a web browser, but both techniques are often used together.

Load balancing (computing) Set of techniques to improve the distribution of workloads across multiple computing resources

In computing, load balancing refers to the process of distributing a set of tasks over a set of resources, with the aim of making their overall processing more efficient. Load balancing can optimize the response time and avoid unevenly overloading some compute nodes while other compute nodes are left idle.

Master/slave is a model of asymmetric communication or control where one device or process controls one or more other devices or processes and serves as their communication hub. In some systems, a master is selected from a group of eligible devices, with the other devices acting in the role of slaves.

In the client–server model, server-side refers to programs and operations that run on the server. This is in contrast to client-side programs and operations which run on the client.

In computing, a solution stack or software stack is a set of software subsystems or components needed to create a complete platform such that no additional software is needed to support applications. Applications are said to "run on" or "run on top of" the resulting platform.

Multi-master replication is a method of database replication which allows data to be stored by a group of computers, and updated by any member of the group. All members are responsive to client data queries. The multi-master replication system is responsible for propagating the data modifications made by each member to the rest of the group and resolving any conflicts that might arise between concurrent changes made by different members.

LAMP (software bundle) Acronym for a common web hosting solution

LAMP is an acronym denoting one of the most common software stacks for many of the web's most popular applications. However, LAMP now refers to a generic software stack model and its components are largely interchangeable.

Catalyst (software)

Catalyst is an open source web application framework written in Perl, that closely follows the model–view–controller (MVC) architecture, and supports a number of experimental web patterns. It is written using Moose, a modern object system for Perl. Its design is heavily inspired by such frameworks as Ruby on Rails, Maypole, and Spring.

Apache Hadoop is a collection of open-source software utilities that facilitates using a network of many computers to solve problems involving massive amounts of data and computation. It provides a software framework for distributed storage and processing of big data using the MapReduce programming model. Hadoop was originally designed for computer clusters built from commodity hardware, which is still the common use. It has since also found use on clusters of higher-end hardware. All the modules in Hadoop are designed with a fundamental assumption that hardware failures are common occurrences and should be automatically handled by the framework.

The Test Anything Protocol (TAP) is a protocol to allow communication between unit tests and a test harness. It allows individual tests to communicate test results to the testing harness in a language-agnostic way. Originally developed for unit testing of the Perl interpreter in 1987, producers and parsers are now available for many development platforms.

The null coalescing operator is a binary operator that is part of the syntax for a basic conditional expression in several programming languages, including C#, PowerShell as of version 7.0.0, Perl as of version 5.10, Swift, and PHP 7.0.0. While its behavior differs between implementations, the null coalescing operator generally returns the result of its left-most operand if it exists and is not null, and otherwise returns the right-most operand. This behavior allows a default value to be defined for cases where a more specific value is not available.

Brian Aker American open-source hacker

Brian Aker, born August 4, 1972 in Lexington, Kentucky, US is an open-source hacker who has worked on various Apache modules, the Slash system, and numerous storage engines for the MySQL database. Aker was Director of Architecture at MySQL AB until it was acquired by Sun Microsystems. He led Sun's web scaling research group, where he worked on the Drizzle database project. He later became a Distinguished Engineer for Sun Microsystems. After leaving Sun when Oracle acquired it, he became the CTO of Data Differential and provided support to open source projects such as libmemcached, Gearman and the Drizzle database project. Aker is currently a Fellow and VP at Hewlett Packard Enterprise.

Xgrid Distributed computing protocol created by Apple

Xgrid is a proprietary program and distributed computing protocol developed by the Advanced Computation Group subdivision of Apple Inc that allows networked computers to contribute to a single task.

Opa (programming language)

Opa is an open-source programming language for developing scalable web applications.

In database management systems (DBMS), a prepared statement or parameterized statement is a feature used to pre-compile SQL code, separating it from data. Benefits of prepared statements are:

SymmetricDS is open source software for database and file synchronization with Multi-master replication, filtered synchronization, and transformation capabilities. It is designed to scale for a large number of nodes, work across low-bandwidth connections, and withstand periods of network outage. Data synchronization occurs asynchronously from a scheduled job, with data changes being sent over a push or pull operation. It uses standard web protocols (HTTP) and database technologies (JDBC) in order to support a wide range of platforms and maximize its interoperability. It includes support for Oracle, MySQL, MariaDB, PostgreSQL, Greenplum, SQL Server, SQL Server Azure, HSQLDB, H2, Derby, DB2, Firebird, Informix, Interbase, SQLite, Sybase ASE, Sybase ASA, MongoDB, Amazon_Redshift, and VoltDB databases.

The Internet Assigned Numbers Authority (IANA) officially assigned port 4605 to the SixChat End2End Direct secure messaging protocol created by Sixscape Communications, Pte. Ltd. The assignment was issued by IANA on 11 September 2014, and is listed in the official IANA resource registry at https://www.iana.org/assignments/service-names-port-numbers

Kea is an open-source DHCP server developed by the Internet Systems Consortium, authors of ISC DHCP, also known as DHCPd. Kea and ISC DHCP are both implementations of the Dynamic Host Configuration Protocol, a set of standards established by the Internet Engineering Task Force (IETF). Kea software is distributed in source code form on GitHub, from various ISC sites, and through a number of operating system packages. Kea is licensed under the Mozilla Public License 2.0.