Gearman

Original author(s): Brad Fitzpatrick
Developer(s): Brian Aker, Eric Day
Initial release: January 8, 2009
Stable release: 1.1.19.1 / February 18, 2020
Repository: github.com/gearman/gearmand/
Written in: C++, C, M4, Shell script
Operating system: Linux, Windows (no server), OpenVMS
Available in: English
License: BSD License
Website: gearman.org

Gearman is an open-source application framework designed to distribute appropriate computer tasks to multiple computers, so large tasks can be done more quickly. In some cases, load balancing rather than raw speed may be the main goal; a Web server, for instance, could use Gearman to send tasks for which it is not optimized to another computer (which may be running on a different architecture, using another operating system, or loaded with a computer language better suited to a particular operation).


Gearman was originally written in Perl by Brad Fitzpatrick. Brian Aker and Eric Day later rewrote the framework in C.

How Gearman Works

Figure: The Gearman application stack.

Gearman assigns each participating computer a role as client, job server, or worker. A single machine can run multiple instances of the worker role, which allows more powerful computers to complete a larger share of a given task. Tasks originate on a client, are transmitted to the job server, and are performed on one or more workers. The completed task's output is then returned, again by way of the job server, to the client where the task originated. Gearman is conceptually related to MapReduce: a worker can map work out to other workers, with the original worker acting as the reducer.
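The flow between the three roles can be illustrated with a small in-memory model. This is only a sketch of the idea, not Gearman's implementation or API: the class names `JobServer` and `Worker`, the `reverse` function, and the integer handles are all hypothetical, and no networking is involved.

```python
from collections import deque
from typing import Callable, Deque, Dict, Tuple

Job = Tuple[int, str, str]  # (handle, function name, payload); hypothetical layout


class JobServer:
    """Toy job server: queues client jobs and stores worker results."""

    def __init__(self) -> None:
        self.queue: Deque[Job] = deque()
        self.results: Dict[int, str] = {}
        self._next_handle = 0

    def submit(self, function: str, payload: str) -> int:
        """Client side: enqueue a job and return a handle for its result."""
        handle = self._next_handle
        self._next_handle += 1
        self.queue.append((handle, function, payload))
        return handle


class Worker:
    """Toy worker: runs the first queued job whose function it has registered."""

    def __init__(self, functions: Dict[str, Callable[[str], str]]) -> None:
        self.functions = functions

    def work_once(self, server: JobServer) -> bool:
        for index, (handle, function, payload) in enumerate(server.queue):
            if function in self.functions:
                del server.queue[index]
                # The result goes back to the job server, not directly to the client.
                server.results[handle] = self.functions[function](payload)
                return True
        return False  # nothing queued that this worker can do


def run_demo() -> str:
    server = JobServer()
    handle = server.submit("reverse", "gearman")              # client -> job server
    Worker({"reverse": lambda s: s[::-1]}).work_once(server)  # job server -> worker
    return server.results[handle]                             # job server -> client
```

Calling `run_demo()` walks one job through all three roles. Running several `Worker` instances against the same `JobServer` corresponds to assigning a machine multiple worker roles.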

Gearman coalesces the work sent by clients. If two or more clients request the same work, which the job server detects either by seeing that the same blocks are being sent or by comparing the unique value supplied by each client, it coalesces the requests so that only one worker performs the job. This is done specifically to avoid the thundering herd problem, which commonly follows a cache miss.
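Coalescing by a client-supplied unique value can be sketched as follows. This is a simplified illustration rather than the job server's actual logic: the `CoalescingQueue` class, its method names, and the use of callbacks to stand in for waiting clients are assumptions made for the example.

```python
from typing import Callable, Dict, List, Tuple

Key = Tuple[str, str]  # (function name, unique value)


class CoalescingQueue:
    """Tracks which clients are waiting on each (function, unique) pair."""

    def __init__(self) -> None:
        self.pending: Dict[Key, List[Callable[[str], None]]] = {}

    def submit(self, function: str, unique: str,
               on_result: Callable[[str], None]) -> bool:
        """Register a request. Return True only if it creates new work."""
        key = (function, unique)
        is_new = key not in self.pending
        # Later requests with the same key attach to the existing job.
        self.pending.setdefault(key, []).append(on_result)
        return is_new

    def complete(self, function: str, unique: str, result: str) -> None:
        """Deliver a single worker's result to every waiting client."""
        for on_result in self.pending.pop((function, unique), []):
            on_result(result)
```

In this model, a burst of identical requests after a cache miss produces one `True` from `submit` (one job for one worker) and any number of `False` returns, and all callers receive the same result when `complete` is called.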

To mitigate the damage that would be done if a job server (or its network connection) were to fail, clients can be configured with more than one assigned job server; if the first assigned job server fails, another can be transparently substituted.
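Client-side failover of this kind can be approximated by trying each configured job server in order. The sketch below uses only Python's standard `socket` module; the function name `connect_first_available` and the example host names are hypothetical, and real client libraries may choose and retry servers differently.

```python
import socket
from typing import Iterable, Optional, Tuple

DEFAULT_PORT = 4730  # default Gearman TCP port


def connect_first_available(servers: Iterable[Tuple[str, int]],
                            timeout: float = 2.0) -> socket.socket:
    """Return a connection to the first job server that accepts one."""
    last_error: Optional[OSError] = None
    for host, port in servers:
        try:
            return socket.create_connection((host, port), timeout=timeout)
        except OSError as exc:
            # This server is down or unreachable; fall through to the next one.
            last_error = exc
    raise ConnectionError(f"no job server reachable (last error: {last_error})")
```

A client configured with two servers might call `connect_first_available([("jobs1.example.com", DEFAULT_PORT), ("jobs2.example.com", DEFAULT_PORT)])`; if the first host refuses the connection, the second is used without the caller noticing.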

Gearman implements a protocol that consists of binary packets containing requests and responses; this protocol defines the structure of messages passing between the three parts of a Gearman implementation. By default, the Gearman protocol uses TCP port 4730. It previously operated on port 7003, but this conflicted with the AFS port range and the new port (4730) was assigned by IANA.
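As an illustration of what such a binary packet looks like, the sketch below builds and parses one. The layout used here (a 4-byte magic code of `\0REQ` or `\0RES`, a 4-byte big-endian packet type, a 4-byte big-endian body size, then NUL-separated arguments) and the `SUBMIT_JOB` type number 7 are recalled from the published protocol description, not taken from this article, and should be checked against it. The helper names are hypothetical.

```python
import struct
from typing import List, Tuple

REQ_MAGIC = b"\0REQ"   # packets sent to the job server (assumed per protocol docs)
RES_MAGIC = b"\0RES"   # packets sent by the job server
SUBMIT_JOB = 7         # assumed packet type number; verify against the protocol docs

HEADER = struct.Struct(">4sII")  # magic, type, body size (big-endian), 12 bytes


def encode_packet(magic: bytes, packet_type: int, *args: bytes) -> bytes:
    """Join arguments with NUL bytes and prepend the 12-byte header."""
    body = b"\0".join(args)
    return HEADER.pack(magic, packet_type, len(body)) + body


def decode_packet(data: bytes, arg_count: int) -> Tuple[bytes, int, List[bytes]]:
    """Split a packet into magic, type, and exactly arg_count arguments."""
    magic, packet_type, size = HEADER.unpack_from(data)
    body = data[HEADER.size:HEADER.size + size]
    # Only the first arg_count - 1 NULs are separators, so the final
    # argument (the job payload) may itself contain NUL bytes.
    args = body.split(b"\0", arg_count - 1) if arg_count else []
    return magic, packet_type, args


# A job submission carrying a function name, a unique value, and a payload.
packet = encode_packet(REQ_MAGIC, SUBMIT_JOB, b"reverse", b"", b"hello")
```

The receiver needs to know how many arguments a given packet type carries, which is why `decode_packet` takes `arg_count` instead of splitting on every NUL byte.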

The name "Gearman" was chosen as an anagram for "Manager", "since it dispatches jobs to be done, but does not do anything useful itself." [1]

Implementations

Clients

Client libraries are available for C, Perl, Node.js, Python, PHP, Ruby, Java, .NET, JMS, MySQL, PostgreSQL, and Drizzle. [2]

Citations

  1. "Gearman [Gearman Job Server]".
  2. Gearman page Client & Worker APIs
