Scribe (log server)

Last updated
Scribe
Developer(s) Facebook, Inc.
Initial releaseOctober 24, 2008 (2008-10-24)
Repository
Written in C++, PHP, Python
License Apache License 2.0
Website Scribe homepage (Github)

Scribe was a server for aggregating log data streamed in real-time from many servers. It was designed to be scalable, extensible without client-side modification, and robust to failure of the network or any specific machine.

Contents

Scribe was developed at Facebook and released in 2008 as open source. [1] [2]

Scribe servers are arranged in a directed graph, with each server knowing only about the next server in the graph. This network topology allows for adding extra layers of fan-in as a system grows, and batching messages before sending them between datacenters, without having any code that explicitly needs to understand datacenter topology, only a simple configuration. [3]

Scribe was designed to consider reliability but to not require heavyweight protocols and expansive disk usage. Scribe spools data to disk on any node to handle intermittent connectivity node failure, but doesn't sync a log file for every message. This creates a possibility of a small amount of data loss in the event of a crash or catastrophic hardware failure. However, this degree of reliability is often suitable for most Facebook use cases. [3]

See also

Notes and references

  1. "Log in or sign up to view". www.facebook.com. Retrieved 2023-02-28.
  2. McCarthy, Caroline. "Facebook to developers: Here, have some code!". CNET. Retrieved 2023-02-28.
  3. 1 2 https://www.facebook.com/note.php?note_id=32008268919&id=9445547199 [ user-generated source ]


Related Research Articles

<span class="mw-page-title-main">Hyphanet</span> Peer-to-peer Internet platform for censorship-resistant communication

Hyphanet is a peer-to-peer platform for censorship-resistant, anonymous communication. It uses a decentralized distributed data store to keep and deliver information, and has a suite of free software for publishing and communicating on the Web without fear of censorship. Both Freenet and some of its associated tools were originally designed by Ian Clarke, who defined Freenet's goal as providing freedom of speech on the Internet with strong anonymity protection.

<span class="mw-page-title-main">Network topology</span> Arrangement of a communication network

Network topology is the arrangement of the elements of a communication network. Network topology can be used to define or describe the arrangement of various types of telecommunication networks, including command and control radio networks, industrial fieldbusses and computer networks.

<span class="mw-page-title-main">Distributed hash table</span> Decentralized distributed system with lookup service

A distributed hash table (DHT) is a distributed system that provides a lookup service similar to a hash table. Key–value pairs are stored in a DHT, and any participating node can efficiently retrieve the value associated with a given key. The main advantage of a DHT is that nodes can be added or removed with minimum work around re-distributing keys. Keys are unique identifiers which map to particular values, which in turn can be anything from addresses, to documents, to arbitrary data. Responsibility for maintaining the mapping from keys to values is distributed among the nodes, in such a way that a change in the set of participants causes a minimal amount of disruption. This allows a DHT to scale to extremely large numbers of nodes and to handle continual node arrivals, departures, and failures.

MySQL Cluster is a technology providing shared-nothing clustering and auto-sharding for the MySQL database management system. It is designed to provide high availability and high throughput with low latency, while allowing for near linear scalability. MySQL Cluster is implemented through the NDB or NDBCLUSTER storage engine for MySQL.

In software architecture, publish–subscribe is a messaging pattern where publishers categorize messages into classes that are received by subscribers. This is contrasted to the typical messaging pattern model where publishers send messages directly to subscribers.

<span class="mw-page-title-main">RapidIO</span> High-speed interconnect technology

The RapidIO architecture is a high-performance packet-switched electrical connection technology. It supports messaging, read/write and cache coherency semantics. Based on industry-standard electrical specifications such as those for Ethernet, RapidIO can be used as a chip-to-chip, board-to-board, and chassis-to-chassis interconnect.

Replication in computing involves sharing information so as to ensure consistency between redundant resources, such as software or hardware components, to improve reliability, fault-tolerance, or accessibility.

<span class="mw-page-title-main">Urs Hölzle</span> Swiss computer scientist

Urs Hölzle is a Swiss software engineer and technology executive. As Google's eighth employee and its first VP of Engineering, he has shaped much of Google's development processes and infrastructure, as well as its engineering culture. His most notable contributions include leading the development of fundamental cloud infrastructure such as energy-efficient data centers, distributed compute and storage systems, and software-defined networking. Until July 2023, he was the Senior Vice President of Technical Infrastructure and Google Fellow at Google. In July 2023, he transitioned to being a Google Fellow only.

Apache Hadoop is a collection of open-source software utilities that facilitates using a network of many computers to solve problems involving massive amounts of data and computation. It provides a software framework for distributed storage and processing of big data using the MapReduce programming model. Hadoop was originally designed for computer clusters built from commodity hardware, which is still the common use. It has since also found use on clusters of higher-end hardware. All the modules in Hadoop are designed with a fundamental assumption that hardware failures are common occurrences and should be automatically handled by the framework.

In distributed computing, leader election is the process of designating a single process as the organizer of some task distributed among several computers (nodes). Before the task has begun, all network nodes are either unaware which node will serve as the "leader" of the task, or unable to communicate with the current coordinator. After a leader election algorithm has been run, however, each node throughout the network recognizes a particular, unique node as the task leader.

Social network aggregation is the process of collecting content from multiple social network services into a unified presentation. Examples of social network aggregators include Hootsuite or FriendFeed, which may pull together information into a single location or help a user consolidate multiple social networking profiles into a single profile.

The Facebook Platform is the set of services, tools, and products provided by the social networking service Facebook for third-party developers to create their own applications and services that access data in Facebook.

oVirt Free, open-source virtualization management platform

oVirt is a free, open-source virtualization management platform. It was founded by Red Hat as a community project on which Red Hat Virtualization is based. It allows centralized management of virtual machines, compute, storage and networking resources, from an easy-to-use web-based front-end with platform independent access. KVM on x86-64, PowerPC64 and s390x architecture are the only hypervisors supported, but there is an ongoing effort to support ARM architecture in a future releases.

<span class="mw-page-title-main">Apache Cassandra</span> Free and open-source database management system

Apache Cassandra is a free and open-source, distributed, wide-column store, NoSQL database management system designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure. Cassandra offers support for clusters spanning multiple data centers, with asynchronous masterless replication allowing low latency operations for all clients. Cassandra was designed to implement a combination of Amazon's Dynamo distributed storage and replication techniques combined with Google's Bigtable data and storage engine model.

<span class="mw-page-title-main">Computer cluster</span> Set of computers configured in a distributed computing system

A computer cluster is a set of computers that work together so that they can be viewed as a single system. Unlike grid computers, computer clusters have each node set to perform the same task, controlled and scheduled by software. The newest manifestation of cluster computing is cloud computing.

A reliable multicast is any computer networking protocol that provides a reliable sequence of packets to multiple recipients simultaneously, making it suitable for applications such as multi-receiver file transfer.

Redis is a source-available, in-memory storage, used as a distributed, in-memory key–value database, cache and message broker, with optional durability. Because it holds all data in memory and because of its design, Redis offers low-latency reads and writes, making it particularly suitable for use cases that require a cache. Redis is the most popular NoSQL database, and one of the most popular databases overall. Redis is used in companies like Twitter, Airbnb, Tinder, Yahoo, Adobe, Hulu, Amazon and OpenAI.

Amplidata is a privately-held cloud storage technology provider based in Lochristi, Belgium. In November 2010, Amplidata opened its U.S. headquarters in Redwood City, California. The research and development department has locations in Belgium and Egypt, while the sales and support departments are represented in a number of countries in Europe and North America.

<span class="mw-page-title-main">Oracle NoSQL Database</span> Distributed database

Oracle NoSQL Database is a NoSQL-type distributed key-value database from Oracle Corporation. It provides transactional semantics for data manipulation, horizontal scalability, and simple administration and monitoring.

<span class="mw-page-title-main">Valkey</span> Freely available in-memory key–value database

Valkey is an open-source in-memory storage, used as a distributed, in-memory key–value database, cache and message broker, with optional durability. Because it holds all data in memory and because of its design, Valkey offers low-latency reads and writes, making it particularly suitable for use cases that require a cache. Valkey is the successor to Redis, the most popular NoSQL database, and one of the most popular databases overall. Valkey or its predecessor Redis are used in companies like Twitter, Airbnb, Tinder, Yahoo, Adobe, Hulu, Amazon and OpenAI.