C10k problem

Last updated

The C10k problem is the problem of optimizing network sockets to handle a large number of clients at the same time. [1] The name C10k is a numeronym for concurrently handling ten thousand connections. [2] Handling many concurrent connections is a different problem from handling many requests per second: the latter requires high throughput (processing them quickly), while the former does not have to be fast, but requires efficient scheduling of connections.

Contents

The problem of socket server optimisation has been studied because a number of factors must be considered to allow a web server to support many clients. This can involve a combination of operating system constraints and web server software limitations. According to the scope of services to be made available and the capabilities of the operating system as well as hardware considerations such as multi-processing capabilities, a multi-threading model or a single threading model can be preferred. Concurrently with this aspect, which involves considerations regarding memory management (usually operating system related), strategies implied relate to the very diverse aspects of I/O management. [2]

History

The term C10k was coined in 1999 by software engineer Dan Kegel, [3] [4] citing the Simtel FTP host, cdrom.com, serving 10,000 clients at once over 1 gigabit per second Ethernet in that year. [1] The term has since been used for the general issue of large number of clients, with similar numeronyms for larger number of connections, most recently "C10M" in the 2010s to refer to 10 million concurrent connections. [5]

By the early 2010s millions of connections on a single commodity 1U rackmount server became possible: over 2 million connections (WhatsApp, 24 cores, using Erlang on FreeBSD) [6] [7] and 10–12 million connections (MigratoryData, 12 cores, using Java on Linux). [5] [8]

Common applications of very high numbers of connections include general public servers that have to serve thousands or even millions of users at a time, such as file servers, FTP servers, proxy servers, web servers, and load balancers. [9] [5]

See also

Related Research Articles

AppleTalk is a discontinued proprietary suite of networking protocols developed by Apple Computer for their Macintosh computers. AppleTalk includes a number of features that allow local area networks to be connected with no prior setup or the need for a centralized router or server of any sort. Connected AppleTalk-equipped systems automatically assign addresses, update the distributed namespace, and configure any required inter-networking routing.

<span class="mw-page-title-main">Apache HTTP Server</span> Open-source web server software

The Apache HTTP Server is a free and open-source cross-platform web server software, released under the terms of Apache License 2.0. It is developed and maintained by a community of developers under the auspices of the Apache Software Foundation.

<span class="mw-page-title-main">Web server</span> Computer software that distributes web pages

A web server is computer software and underlying hardware that accepts requests via HTTP or its secure variant HTTPS. A user agent, commonly a web browser or web crawler, initiates communication by making a request for a web page or other resource using HTTP, and the server responds with the content of that resource or an error message. A web server can also accept and store resources sent from the user agent if configured to do so.

The File Transfer Protocol (FTP) is a standard communication protocol used for the transfer of computer files from a server to a client on a computer network. FTP is built on a client–server model architecture using separate control and data connections between the client and the server. FTP users may authenticate themselves with a plain-text sign-in protocol, normally in the form of a username and password, but can connect anonymously if the server is configured to allow it. For secure transmission that protects the username and password, and encrypts the content, FTP is often secured with SSL/TLS (FTPS) or replaced with SSH File Transfer Protocol (SFTP).

<span class="mw-page-title-main">Load balancing (computing)</span> Set of techniques to improve the distribution of workloads across multiple computing resources

In computing, load balancing is the process of distributing a set of tasks over a set of resources, with the aim of making their overall processing more efficient. Load balancing can optimize response time and avoid unevenly overloading some compute nodes while other compute nodes are left idle.

<span class="mw-page-title-main">Network-attached storage</span> Computer data storage server

Network-attached storage (NAS) is a file-level computer data storage server connected to a computer network providing data access to a heterogeneous group of clients. The term "NAS" can refer to both the technology and systems involved, or a specialized device built for such functionality.

<span class="mw-page-title-main">Terminal server</span> Device that interfaces serial hosts to a network

A terminal server connects devices with a serial port to a local area network (LAN). Products marketed as terminal servers can be very simple devices that do not offer any security functionality, such as data encryption and user authentication. The primary application scenario is to enable serial devices to access network server applications, or vice versa, where security of the data on the LAN is not generally an issue. There are also many terminal servers on the market that have highly advanced security functionality to ensure that only qualified personnel can access various servers and that any data that is transmitted across the LAN, or over the Internet, is encrypted. Usually, companies that need a terminal server with these advanced functions want to remotely control, monitor, diagnose and troubleshoot equipment over a telecommunications network.

In computer science, asynchronous I/O is a form of input/output processing that permits other processing to continue before the I/O operation has finished. A name used for asynchronous I/O in the Windows API is overlapped I/O.

Simtel was an important long-running archive of freeware and shareware for various operating systems.

VirtualGL (VGL) is an open-source software package that redirects the 3D rendering commands from Unix and Linux OpenGL applications to 3D accelerator hardware in a dedicated server and sends the rendered output to a (thin) client located elsewhere on the network. On the server side, VirtualGL consists of a library that handles the redirection and a wrapper program that instructs applications to use this library. Clients can connect to the server either using a remote X11 connection or using an X11 proxy such as a Virtual Network Computing (VNC) server. In case of an X11 connection some client-side VirtualGL software is also needed to receive the rendered graphics output separately from the X11 stream. In case of a VNC connection no specific client-side software is needed other than the VNC client itself.

<span class="mw-page-title-main">Yaws (web server)</span> HTTP web server software

Yaws is a web server written in Erlang by Claes (klacke) Wikström. Yaws can be embedded into other Erlang-based applications or run as a regular standalone web server.

<span class="mw-page-title-main">Message broker</span> Computer program module

A message broker is an intermediary computer program module that translates a message from the formal messaging protocol of the sender to the formal messaging protocol of the receiver. Message brokers are elements in telecommunication or computer networks where software applications communicate by exchanging formally-defined messages. Message brokers are a building block of message-oriented middleware (MOM) but are typically not a replacement for traditional middleware like MOM and remote procedure call (RPC).

Nginx is a web server that can also be used as a reverse proxy, load balancer, mail proxy and HTTP cache. The software was created by Russian developer Igor Sysoev and publicly released in 2004. Nginx is free and open-source software, released under the terms of the 2-clause BSD license. A large fraction of web servers use Nginx, often as a load balancer.

The reactor software design pattern is an event handling strategy that can respond to many potential service requests concurrently. The pattern's key component is an event loop, running in a single thread or process, which demultiplexes incoming requests and dispatches them to the correct request handler.

<span class="mw-page-title-main">Apache CouchDB</span> Document-oriented NoSQL database

Apache CouchDB is an open-source document-oriented NoSQL database, implemented in Erlang.

<span class="mw-page-title-main">Search box</span> User interface element

A search box, search field or search bar is a graphical control element used in computer programs, such as file managers or web browsers, and on web sites. A search box is usually a single-line text box or search icon with the dedicated function of accepting user input to be searched for in a database. Search boxes on web pages are usually used to allow users to enter a query to be submitted to a Web search engine server-side script, where an index database is queried for entries that contain one or more of the user's keyword research.

OTP is a collection of useful middleware, libraries, and tools written in the Erlang programming language. It is an integral part of the open-source distribution of Erlang. The name OTP was originally an acronym for Open Telecom Platform, which was a branding attempt before Ericsson released Erlang/OTP as open source. However neither Erlang nor OTP is specific to telecom applications.

<span class="mw-page-title-main">Unisys OS 2200 communications</span> Aspect of Unisys OS 2200 operating system

The OS 2200 communications management system includes CPComm and MCB along with many programs that provide communications related functions such as file transfer, e-mail, and distributed transaction processing protocols.

FastCGI is a binary protocol for interfacing interactive programs with a web server. It is a variation on the earlier Common Gateway Interface (CGI). FastCGI's main aim is to reduce the overhead related to interfacing between web server and CGI programs, allowing a server to handle more web page requests per unit of time.

<span class="mw-page-title-main">SoftEther VPN</span> Open-source VPN client and server software

SoftEther VPN is free open-source, cross-platform, multi-protocol VPN client and VPN server software, developed as part of Daiyuu Nobori's master's thesis research at the University of Tsukuba. VPN protocols such as SSL VPN, L2TP/IPsec, OpenVPN, and Microsoft Secure Socket Tunneling Protocol are provided in a single VPN server. It was released using the GPLv2 license on January 4, 2014. The license was switched to Apache License 2.0 on January 21, 2019.

References

  1. 1 2 "The C10K problem". Archived from the original on 2013-07-22.
  2. 1 2 Liu, D.; Deters, R. (2009). "The Reverse C10K Problem for Server-Side Mashups". Service-Oriented Computing – ICSOC 2008 Workshops. Lecture Notes in Computer Science. Vol. 5472. p. 166. doi:10.1007/978-3-642-01247-1_16. ISBN   978-3-642-01246-4.
  3. Andrew Alexeev (2012). "§14. nginx; §14.1. Why Is High Concurrency Important?". In Amy Brown; Greg Wilson (eds.). The Architecture of Open Source Applications, Volume II: Structure, Scale and a Few More Fearless Hacks. Lulu.com. ISBN   9781105571817. Around ten years ago, Daniel Kegel, a prominent software engineer, … Kegel's C10K manifest … solving the C10K problem of 10,000 simultaneous connections, nginx
  4. Kegel, Dan (8 May 1999). "The C10K problem". Kegel com. Archived from the original on 8 May 1999. Retrieved 18 June 2019. And computers are big, too. You can buy a 500MHz machine with 1 gigabyte of RAM and six 100Mbit/sec Ethernet card for $3000 or so. Let's see - at 10000 clients, that's 50KHz, 100Kbytes, and 60Kbits/sec per client. It shouldn't take any more horsepower than that to take four kilobytes from the disk and send them to the network once a second for each of ten thousand clients. (That works out to $0.30 per client, by the way. Those $100/client licensing fees some operating systems charge are starting to look a little heavy!) So hardware is no longer the bottleneck.
  5. 1 2 3 Mihai Rotaru (2015-05-20). "How MigratoryData solved the C10M problem: 10 Million Concurrent Connections on a Single Commodity Server". migratorydata.com. Retrieved 2021-10-15.
  6. "1 million is so 2011". WhatsApp blog. 6 January 2012. Archived from the original on 1 May 2014. Retrieved 25 July 2019. This time we also wanted to share some more technical details with you about hardware, OS and software: hw.machine: amd64 hw.model: Intel(R) Xeon(R) CPU X5675 @ 3.07GHz hw.ncpu: 24 hw.physmem: 103062118400 hw.usermem: 100556451840
  7. Reed, Rick (30 March 2012). "Scaling to Millions of Simultaneous Connections" (PDF). Erlang Factory. p. 7. Archived from the original (PDF) on 9 July 2012. Retrieved 25 July 2019.
  8. Mihai Rotaru (2013-10-10). "Scaling to 12 Million Concurrent Connections: How MigratoryData Did It". migratorydata.com. Retrieved 2021-10-15.
  9. Ponnuswamy Sadayappan; Manish Parashar; Ramamurthy Badrinath; Viktor K. Prasanna (2008). High Performance Computing - HiPC 2008. Springer. ISBN   978-3-540-89893-1 . Retrieved 2021-10-15.