Scribe (log server)

Last updated
Scribe
Developer(s) Facebook, Inc.
Initial releaseOctober 24, 2008 (2008-10-24)
Repository
Written in C++, Python
License Apache License 2.0
Website Scribe homepage (Github)

Scribe was a server for aggregating log data streamed in real-time from many servers. It was designed to be scalable, extensible without client-side modification, and robust to failure of the network or any specific machine.

Contents

Scribe was developed at Facebook and released in 2008 as open source. [1] [2]

Scribe servers are arranged in a directed graph, with each server knowing only about the next server in the graph. This network topology allows for adding extra layers of fan-in as a system grows, and batching messages before sending them between datacenters, without having any code that explicitly needs to understand datacenter topology, only a simple configuration. [3]

Scribe was designed to consider reliability but to not require heavyweight protocols and expansive disk usage. Scribe spools data to disk on any node to handle intermittent connectivity node failure, but doesn't sync a log file for every message. This creates a possibility of a small amount of data loss in the event of a crash or catastrophic hardware failure. However, this degree of reliability is often suitable for most Facebook use cases. [3]

See also

Notes and references

  1. Robert Johnson. "Scribe Makes Its Open Source Debut" . Retrieved 2009-01-27.
  2. Caroline McCarthy. "Facebook to developers: Here, have some code!" . Retrieved 2009-01-27.
  3. 1 2 https://www.facebook.com/note.php?note_id=32008268919&id=9445547199 [ user-generated source ]


Related Research Articles

Freenet Peer-to-peer Internet platform for censorship-resistant communication

Freenet is a peer-to-peer platform for censorship-resistant, anonymous communication. It uses a decentralized distributed data store to keep and deliver information, and has a suite of free software for publishing and communicating on the Web without fear of censorship. Both Freenet and some of its associated tools were originally designed by Ian Clarke, who defined Freenet's goal as providing freedom of speech on the Internet with strong anonymity protection.

Network topology Arrangement of the elements of a communication network

Network topology is the arrangement of the elements of a communication network. Network topology can be used to define or describe the arrangement of various types of telecommunication networks, including command and control radio networks, industrial fieldbusses and computer networks.

Distributed hash table Decentralized distributed system with lookup service

A distributed hash table (DHT) is a distributed system that provides a lookup service similar to a hash table: key-value pairs are stored in a DHT, and any participating node can efficiently retrieve the value associated with a given key. The main advantage of a DHT is that nodes can be added or removed with minimum work around re-distributing keys. Keys are unique identifiers which map to particular values, which in turn can be anything from addresses, to documents, to arbitrary data. Responsibility for maintaining the mapping from keys to values is distributed among the nodes, in such a way that a change in the set of participants causes a minimal amount of disruption. This allows a DHT to scale to extremely large numbers of nodes and to handle continual node arrivals, departures, and failures.

Windows Server 2003 Server operating system by Microsoft released in 2003

Windows Server 2003 is the first version of Windows Server operating system produced by Microsoft. It is part of the Windows NT family of operating systems and was released on April 24, 2003. Windows Server 2003 is the successor to the Server editions of Windows 2000 and the predecessor to Windows Server 2008. An updated version, Windows Server 2003 R2, was released to manufacturing on December 6, 2005.

MySQL Cluster is a technology providing shared-nothing clustering and auto-sharding for the MySQL database management system. It is designed to provide high availability and high throughput with low latency, while allowing for near linear scalability. MySQL Cluster is implemented through the NDB or NDBCLUSTER storage engine for MySQL.

In software architecture, publish–subscribe is a messaging pattern where senders of messages, called publishers, do not program the messages to be sent directly to specific receivers, called subscribers, but instead categorize published messages into classes without knowledge of which subscribers, if any, there may be. Similarly, subscribers express interest in one or more classes and only receive messages that are of interest, without knowledge of which publishers, if any, there are.

Replication in computing involves sharing information so as to ensure consistency between redundant resources, such as software or hardware components, to improve reliability, fault-tolerance, or accessibility.

Event Viewer Component of Microsofts Windows NT operating system

Event Viewer is a component of Microsoft's Windows NT operating system that lets administrators and users view the event logs on a local or remote machine. Applications and operating-system components can use this centralized log service to report events that have taken place, such as a failure to start a component or to complete an action. In Windows Vista, Microsoft overhauled the event system.

Apache Hadoop is a collection of open-source software utilities that facilitates using a network of many computers to solve problems involving massive amounts of data and computation. It provides a software framework for distributed storage and processing of big data using the MapReduce programming model. Hadoop was originally designed for computer clusters built from commodity hardware, which is still the common use. It has since also found use on clusters of higher-end hardware. All the modules in Hadoop are designed with a fundamental assumption that hardware failures are common occurrences and should be automatically handled by the framework.

Varnish is an HTTP accelerator designed for content-heavy dynamic web sites as well as APIs. In contrast to other web accelerators, such as Squid, which began life as a client-side cache, or Apache and nginx, which are primarily origin servers, Varnish was designed as an HTTP accelerator. Varnish is focused exclusively on HTTP, unlike other proxy servers that often support FTP, SMTP and other network protocols.

A clustered file system is a file system which is shared by being simultaneously mounted on multiple servers. There are several approaches to clustering, most of which do not employ a clustered file system. Clustered file systems can provide features like location-independent addressing and redundancy which improve reliability or reduce the complexity of the other parts of the cluster. Parallel file systems are a type of clustered file system that spread data across multiple storage nodes, usually for redundancy or performance.

In distributed computing, leader election is the process of designating a single process as the organizer of some task distributed among several computers (nodes). Before the task has begun, all network nodes are either unaware which node will serve as the "leader" of the task, or unable to communicate with the current coordinator. After a leader election algorithm has been run, however, each node throughout the network recognizes a particular, unique node as the task leader.

The Facebook Platform is the set of services, tools, and products provided by the social networking service Facebook for third-party developers to create their own applications and services that access data in Facebook.

oVirt Free, open-source virtualization management platform

oVirt is a free, open-source virtualization management platform. It was founded by Red Hat as a community project on which Red Hat Virtualization is based. It allows centralized management of virtual machines, compute, storage and networking resources, from an easy-to-use web-based front-end with platform independent access. KVM on x86-64, PowerPC64 and s390x architecture are the only hypervisors supported, but there is an ongoing effort to support ARM architecture in a future releases.

Apache Cassandra Free and open-source database management system

Cassandra is a free and open-source, distributed, wide-column store, NoSQL database management system designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure. Cassandra offers support for clusters spanning multiple datacenters, with asynchronous masterless replication allowing low latency operations for all clients. Cassandra was designed to implement a combination of Amazon's Dynamo distributed storage and replication techniques combined with Google's Bigtable data and storage engine model.

A reliable multicast is any computer networking protocol that provides a reliable sequence of packets to multiple recipients simultaneously, making it suitable for applications such as multi-receiver file transfer.

Redis Open-source in-memory key–value database

Redis is an in-memory data structure store, used as a distributed, in-memory key–value database, cache and message broker, with optional durability. Redis supports different kinds of abstract data structures, such as strings, lists, maps, sets, sorted sets, HyperLogLogs, bitmaps, streams, and spatial indices. The project was developed and maintained by Salvatore Sanfilippo. From 2015 until 2020, he led a project core team sponsored by Redis Labs. Salvatore Sanfilippo left Redis as the maintainer in 2020. It is open-source software released under a BSD 3-clause license. In 2021, not long after the original author and main maintainer left, Redis Labs dropped the Labs from its name and now redis, the open source DB as well as Redis Labs, the commercial company, are referred to as "redis".

Versant Object Database (VOD) is an object database software product developed by Versant Corporation.

AMPLIDATA is a privately-held cloud storage technology provider based in Lochristi, Belgium. In November 2010, Amplidata opened its U.S. headquarters in Redwood City, California. The research and development department has locations in Belgium and Egypt, while the sales and support departments are represented in a number of countries in Europe and North America.

Oracle NoSQL Database

Oracle NoSQL Database (ONDB) is a NoSQL-type distributed key-value database from Oracle Corporation. It provides transactional semantics for data manipulation, horizontal scalability, and simple administration and monitoring.