Berkeley sockets


Berkeley sockets is an application programming interface (API) for Internet sockets and Unix domain sockets, used for inter-process communication (IPC). It is commonly implemented as a library of linkable modules. It originated with the 4.2BSD Unix operating system, which was released in 1983.


A socket is an abstract representation (handle) for the local endpoint of a network communication path. The Berkeley sockets API represents it as a file descriptor (file handle), following the Unix philosophy of providing a common interface for input and output to streams of data.

Berkeley sockets evolved with little modification from a de facto standard into a component of the POSIX specification. The term POSIX sockets is essentially synonymous with Berkeley sockets; they are also known as BSD sockets, acknowledging the first implementation in the Berkeley Software Distribution.

History and implementations

Berkeley sockets originated with the 4.2BSD Unix operating system, released in 1983, as a programming interface. Not until 1989, however, could the University of California, Berkeley release versions of the operating system and networking library free from the licensing constraints of AT&T Corporation's proprietary Unix.

All modern operating systems implement a version of the Berkeley socket interface. It became the standard interface for applications running on the Internet. Even the Winsock implementation for Microsoft Windows, created by unaffiliated developers, closely follows the standard.

The BSD sockets API is written in the C programming language. Most other programming languages provide similar interfaces, typically written as a wrapper library based on the C API. [1]

BSD and POSIX sockets

As the Berkeley socket API evolved and ultimately yielded the POSIX socket API, [2] certain functions were deprecated or removed and replaced by others. The POSIX API is also designed to be reentrant and supports IPv6.

Action | BSD | POSIX
Conversion from text address to packed address | inet_aton | inet_pton
Conversion from packed address to text address | inet_ntoa | inet_ntop
Forward lookup for host name/service | gethostbyname, getservbyname | getaddrinfo
Reverse lookup for host name/service | gethostbyaddr, getservbyport | getnameinfo
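As a rough illustration of the POSIX conversion pair, the sketch below uses a hypothetical helper, normalize_address() (not part of any API), that parses a textual address into packed form with inet_pton() and formats it back with inet_ntop(). Unlike inet_aton() and inet_ntoa(), both functions take an address-family argument, so the same code handles IPv4 and IPv6.

```c
#include <arpa/inet.h>   /* inet_pton(), inet_ntop() */
#include <netinet/in.h>  /* struct in6_addr, INET6_ADDRSTRLEN */
#include <string.h>
#include <sys/socket.h>  /* AF_INET, AF_INET6, socklen_t */

/* Convert text -> packed -> text for address family af (AF_INET or
 * AF_INET6). The result is written to out, which should hold at least
 * INET6_ADDRSTRLEN bytes. Returns 0 on success, -1 if text is not a valid
 * address for af or out is too small. */
int normalize_address(int af, const char *text, char *out, socklen_t outlen)
{
    /* Large enough for a struct in_addr (4 bytes) or in6_addr (16 bytes). */
    unsigned char packed[sizeof(struct in6_addr)];

    /* inet_pton() returns 1 on success, 0 for malformed text,
     * and -1 (with errno set) for an unsupported family. */
    if (inet_pton(af, text, packed) != 1)
        return -1;

    /* inet_ntop() returns NULL on failure, e.g. if out is too small. */
    if (inet_ntop(af, packed, out, outlen) == NULL)
        return -1;
    return 0;
}
```

For instance, passing AF_INET6 and "2001:db8:0:0:0:0:0:1" typically yields the compressed form "2001:db8::1", while malformed text such as "300.1.1.1" makes inet_pton() return 0.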

Alternatives

The STREAMS-based Transport Layer Interface (TLI) API offers an alternative to the socket API. Many systems that provide the TLI API also provide the Berkeley socket API.

Non-Unix systems often expose the Berkeley socket API with a translation layer to a native networking API. Plan 9 [3] and Genode [4] use file-system APIs with control files rather than file descriptors.

Header files

The Berkeley socket interface is defined in several header files. The names and content of these files differ slightly between implementations. In general, they include:

File | Description
sys/socket.h | Core socket functions and data structures.
netinet/in.h | AF_INET and AF_INET6 address families and their corresponding protocol families, PF_INET and PF_INET6. These include standard IP addresses and TCP and UDP port numbers.
sys/un.h | PF_UNIX and PF_LOCAL address family. Used for local communication between programs running on the same computer.
arpa/inet.h | Functions for manipulating numeric IP addresses.
netdb.h | Functions for translating protocol names and host names into numeric addresses. Searches local data as well as name services.

Socket API functions

Figure: Flow diagram of a client-server transaction using sockets with the Transmission Control Protocol (TCP).

The Berkeley socket API typically provides the following functions:

socket

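The function socket() creates an endpoint for communication and returns a file descriptor for the socket. It takes three arguments: domain, the protocol family of the socket (for example AF_INET for IPv4, AF_INET6 for IPv6, or AF_UNIX for local sockets); type, the communication semantics (for example SOCK_STREAM for a reliable byte stream, SOCK_DGRAM for datagrams, SOCK_SEQPACKET for reliable sequenced packets, or SOCK_RAW for raw access to the network layer); and protocol, the transport protocol to use (for example IPPROTO_TCP or IPPROTO_UDP, defined in netinet/in.h), where the value 0 selects the default protocol for the given domain and type.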

The function returns -1 if an error occurred. Otherwise, it returns an integer representing the newly assigned descriptor.
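The following minimal sketch wraps socket() in a hypothetical helper, open_socket(), that passes 0 as the protocol argument and reports failures through errno. A second hypothetical helper, socket_type(), reads the type back from the descriptor with the standard SO_TYPE socket option. A POSIX system is assumed.

```c
#include <errno.h>       /* errno */
#include <stdio.h>       /* fprintf() */
#include <string.h>      /* strerror() */
#include <sys/socket.h>  /* socket(), getsockopt(), AF_*, SOCK_* */

/* Create a socket; protocol 0 lets the implementation pick the default
 * protocol for the family and type (TCP for AF_INET/SOCK_STREAM, UDP for
 * AF_INET/SOCK_DGRAM). Returns the descriptor, or -1 after printing why. */
int open_socket(int domain, int type)
{
    int fd = socket(domain, type, 0);
    if (fd == -1)
        fprintf(stderr, "socket(%d, %d, 0): %s\n", domain, type, strerror(errno));
    return fd;
}

/* Return the type the descriptor was created with (e.g. SOCK_STREAM),
 * queried through the SO_TYPE option, or -1 on error. */
int socket_type(int fd)
{
    int type;
    socklen_t len = sizeof type;

    if (getsockopt(fd, SOL_SOCKET, SO_TYPE, &type, &len) == -1)
        return -1;
    return type;
}
```

Descriptors obtained this way are ordinary file descriptors and should eventually be released with close().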

bind

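bind() associates a socket with an address. When a socket is created with socket(), it is only given a protocol family, but not assigned an address. This association must be performed before the socket can accept connections from other hosts. The function has three arguments: sockfd, a descriptor representing the socket; my_addr, a pointer to a sockaddr structure representing the address to bind to; and addrlen, a field of type socklen_t specifying the size of the sockaddr structure.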

bind() returns 0 on success and -1 if an error occurs.
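A common pattern, sketched below under the assumption of an IPv4 loopback address, is to bind to port 0 so that the kernel chooses a free port, and then call getsockname() to find out which port was assigned. The helper name bind_loopback() is hypothetical.

```c
#include <arpa/inet.h>   /* htons(), htonl(), ntohs() */
#include <netinet/in.h>  /* struct sockaddr_in, INADDR_LOOPBACK */
#include <string.h>      /* memset() */
#include <sys/socket.h>  /* bind(), getsockname() */

/* Bind fd to 127.0.0.1:port; port 0 asks the kernel for any free port.
 * Returns the port actually bound (host byte order), or -1 on error. */
int bind_loopback(int fd, unsigned short port)
{
    struct sockaddr_in sa;
    socklen_t len = sizeof sa;

    memset(&sa, 0, sizeof sa);
    sa.sin_family = AF_INET;
    sa.sin_port = htons(port);                  /* network byte order */
    sa.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

    if (bind(fd, (struct sockaddr *)&sa, sizeof sa) == -1)
        return -1;

    /* getsockname() reports the local address the socket is now bound to. */
    if (getsockname(fd, (struct sockaddr *)&sa, &len) == -1)
        return -1;
    return ntohs(sa.sin_port);
}
```

Calling bind() a second time on an already-bound socket fails, which the helper reports as -1.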

listen

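After a socket has been associated with an address, listen() prepares it for incoming connections. However, this is only necessary for the stream-oriented (connection-oriented) data modes, i.e., for socket types (SOCK_STREAM, SOCK_SEQPACKET). listen() requires two arguments: sockfd, a valid socket descriptor; and backlog, an integer representing the number of pending connections that can be queued up at any one time. The operating system usually places a cap on this value.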

Once a connection is accepted, it is dequeued. On success, 0 is returned. If an error occurs, -1 is returned.
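The sketch below, built on two hypothetical helpers and assuming the IPv4 loopback interface, combines socket(), bind() and listen() in make_listener(), and then checks the listening state with the standard SO_ACCEPTCONN socket option in is_listening().

```c
#include <arpa/inet.h>   /* htonl(), htons(), ntohs() */
#include <netinet/in.h>  /* struct sockaddr_in, INADDR_LOOPBACK */
#include <string.h>      /* memset() */
#include <sys/socket.h>
#include <unistd.h>      /* close() */

/* Create a TCP socket on 127.0.0.1 with a kernel-chosen port and make it a
 * passive (listening) socket. The port is stored in *port_out (host byte
 * order). Returns the listening descriptor, or -1 on error. */
int make_listener(int backlog, unsigned short *port_out)
{
    struct sockaddr_in sa;
    socklen_t len = sizeof sa;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd == -1)
        return -1;
    memset(&sa, 0, sizeof sa);
    sa.sin_family = AF_INET;
    sa.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    sa.sin_port = htons(0);                    /* 0: any free port */

    if (bind(fd, (struct sockaddr *)&sa, sizeof sa) == -1 ||
        listen(fd, backlog) == -1 ||           /* backlog: queued connections */
        getsockname(fd, (struct sockaddr *)&sa, &len) == -1) {
        close(fd);
        return -1;
    }
    *port_out = ntohs(sa.sin_port);
    return fd;
}

/* Returns 1 if fd is a listening socket, 0 if not, -1 on error. */
int is_listening(int fd)
{
    int val;
    socklen_t len = sizeof val;

    if (getsockopt(fd, SOL_SOCKET, SO_ACCEPTCONN, &val, &len) == -1)
        return -1;
    return val != 0;
}
```

Calling listen() on a SOCK_DGRAM socket fails (Linux reports EOPNOTSUPP), consistent with listening being meaningful only for connection-oriented socket types.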

accept

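When an application is listening for stream-oriented connections from other hosts, it is notified of such events (cf. the select() function) and must initialize the connection using accept(). This function creates a new socket for each connection and removes the connection from the listening queue. It takes three arguments: sockfd, the descriptor of the listening socket; cliaddr, a pointer to a sockaddr structure that receives the client's address information; and addrlen, a pointer to a socklen_t that on input specifies the size of the structure passed in cliaddr and on return contains the actual size of the stored address. If the client's address is not needed, NULL may be passed for both cliaddr and addrlen.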

accept() returns the new socket descriptor for the accepted connection, or the value -1 if an error occurs. All further communication with the remote host now occurs via this new socket.

Datagram sockets do not require processing by accept() since the receiver may immediately respond to the request using the listening socket.
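To show accept() in isolation, the following sketch runs both ends in one process: a client connects to a loopback listener, the kernel completes the handshake and queues the connection, and accept() then returns a new descriptor for it. The helpers accept_one() and accept_demo() are hypothetical; a real server would accept connections from other processes or hosts.

```c
#include <arpa/inet.h>   /* htonl() */
#include <netinet/in.h>  /* struct sockaddr_in, INADDR_LOOPBACK */
#include <string.h>      /* memset() */
#include <sys/socket.h>
#include <unistd.h>      /* close() */

/* Accept one pending connection. addrlen is a value-result argument, so it
 * is initialized to the size of *peer before the call. Returns the new
 * connected descriptor, or -1; listen_fd stays open for further accepts. */
int accept_one(int listen_fd, struct sockaddr_in *peer)
{
    socklen_t addrlen = sizeof *peer;
    return accept(listen_fd, (struct sockaddr *)peer, &addrlen);
}

/* Listener and client in one process, for illustration only.
 * Returns 0 if a connection was accepted from 127.0.0.1, else -1. */
int accept_demo(void)
{
    struct sockaddr_in sa, peer;
    socklen_t len = sizeof sa;
    int lfd, cfd, afd, ok;

    lfd = socket(AF_INET, SOCK_STREAM, 0);
    if (lfd == -1)
        return -1;
    memset(&sa, 0, sizeof sa);
    sa.sin_family = AF_INET;
    sa.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    sa.sin_port = 0;                          /* any free port */
    if (bind(lfd, (struct sockaddr *)&sa, sizeof sa) == -1 ||
        listen(lfd, 1) == -1 ||
        getsockname(lfd, (struct sockaddr *)&sa, &len) == -1) {
        close(lfd);
        return -1;
    }

    /* The kernel completes the handshake and queues the connection. */
    cfd = socket(AF_INET, SOCK_STREAM, 0);
    if (cfd == -1 || connect(cfd, (struct sockaddr *)&sa, sizeof sa) == -1) {
        if (cfd != -1)
            close(cfd);
        close(lfd);
        return -1;
    }

    afd = accept_one(lfd, &peer);             /* dequeues the connection */
    ok = afd != -1 && peer.sin_addr.s_addr == htonl(INADDR_LOOPBACK);
    if (afd != -1)
        close(afd);
    close(cfd);
    close(lfd);
    return ok ? 0 : -1;
}
```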

connect

connect() establishes a direct communication link to a specific remote host identified by its address via a socket, identified by its file descriptor.

When using a connection-oriented protocol, this establishes a connection. Certain types of protocols are connectionless, most notably the User Datagram Protocol. When used with connectionless protocols, connect defines the remote address for sending and receiving data, allowing the use of functions such as send and recv. In these cases, the connect function prevents reception of datagrams from other sources.
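The connected-UDP case can be sketched as follows, assuming the IPv4 loopback interface: after connect(), the sending socket can use send() instead of sendto(), because the kernel already knows the destination. No handshake takes place; for UDP, connect() only records the default peer. The function name connected_udp_roundtrip() is hypothetical.

```c
#include <arpa/inet.h>   /* htonl() */
#include <netinet/in.h>  /* struct sockaddr_in, INADDR_LOOPBACK */
#include <string.h>      /* memset(), strlen() */
#include <sys/socket.h>
#include <sys/types.h>   /* ssize_t */
#include <unistd.h>      /* close() */

/* Send msg from one UDP socket to another over 127.0.0.1 using connect()
 * plus send()/recv(). The received bytes are copied into out (outlen must
 * be at least 1) and NUL-terminated. Returns bytes received, or -1. */
ssize_t connected_udp_roundtrip(const char *msg, char *out, size_t outlen)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;
    ssize_t n = -1;
    int rx = socket(AF_INET, SOCK_DGRAM, 0);   /* receiver */
    int tx = socket(AF_INET, SOCK_DGRAM, 0);   /* sender */

    if (rx == -1 || tx == -1)
        goto done;

    /* Bind the receiver to a kernel-chosen loopback port and look it up. */
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;
    if (bind(rx, (struct sockaddr *)&addr, sizeof addr) == -1 ||
        getsockname(rx, (struct sockaddr *)&addr, &len) == -1)
        goto done;

    /* connect() sends nothing for UDP; it fixes the default peer,
     * so send() needs no destination argument. */
    if (connect(tx, (struct sockaddr *)&addr, sizeof addr) == -1 ||
        send(tx, msg, strlen(msg), 0) == -1)
        goto done;

    n = recv(rx, out, outlen - 1, 0);          /* blocks until it arrives */
    if (n >= 0)
        out[n] = '\0';
done:
    if (rx != -1)
        close(rx);
    if (tx != -1)
        close(tx);
    return n;
}
```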

connect() returns 0 on success and -1 on error. Historically, in BSD-derived systems, the state of a socket descriptor is undefined if the call to connect() fails (as specified in the Single Unix Specification); portable applications should therefore close the socket descriptor immediately and obtain a new descriptor with socket() if the call to connect() fails. [5]
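The advice to start over with a fresh descriptor after a failed connect() can be sketched as a simple retry loop. The helper connect_retry(), its fixed one-second pause, and the attempt count are illustrative assumptions rather than a recommended policy.

```c
#include <arpa/inet.h>   /* inet_pton(), htons() */
#include <netinet/in.h>  /* struct sockaddr_in */
#include <string.h>      /* memset() */
#include <sys/socket.h>
#include <unistd.h>      /* close(), sleep() */

/* Connect to ip (dotted-decimal IPv4 text) and port, making up to
 * `attempts` tries. Returns a connected descriptor, or -1. */
int connect_retry(const char *ip, unsigned short port, int attempts)
{
    struct sockaddr_in sa;

    memset(&sa, 0, sizeof sa);
    sa.sin_family = AF_INET;
    sa.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &sa.sin_addr) != 1)
        return -1;                       /* not a valid IPv4 address */

    for (int i = 0; i < attempts; i++) {
        /* A brand-new descriptor for every attempt. */
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        if (fd == -1)
            return -1;
        if (connect(fd, (struct sockaddr *)&sa, sizeof sa) == 0)
            return fd;
        close(fd);                       /* state after a failed connect() is unspecified */
        if (i + 1 < attempts)
            sleep(1);                    /* arbitrary pause between attempts */
    }
    return -1;
}
```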

gethostbyname and gethostbyaddr

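The functions gethostbyname() and gethostbyaddr() are used to resolve host names and addresses in the domain name system or the local host's other resolver mechanisms (e.g., /etc/hosts lookup). They return a pointer to an object of type struct hostent, which describes an Internet Protocol host. gethostbyname() takes one argument, name, the name of the host. gethostbyaddr() takes three: addr, a pointer to the address (for example a struct in_addr); len, the length of that address in bytes; and type, its address family (for example AF_INET).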

The functions return a NULL pointer in case of error, in which case the external integer h_errno may be checked to see whether this is a temporary failure or an invalid or unknown host. Otherwise a valid struct hostent * is returned.

These functions are not strictly a component of the BSD socket API, but are often used in conjunction with the API functions for looking up a host. These functions are now considered legacy interfaces for querying the domain name system. New functions that are completely protocol-agnostic (supporting IPv6) have been defined. These new functions are getaddrinfo() and getnameinfo(), and are based on a new addrinfo data structure. [6]

This pair of functions appeared at the same time as the BSD socket API proper in 4.2BSD (1983), [7] the same year DNS was first created. Early versions did not query DNS and only performed /etc/hosts lookup. The 4.3BSD (1984) version added DNS in a crude way. The current implementation using the Name Service Switch derives from Solaris and later NetBSD 1.4 (1999). [8] Initially defined for NIS+, NSS makes DNS only one of the many options for lookup by these functions, and its use can be disabled even today. [9]
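A minimal sketch of the protocol-agnostic replacements: getaddrinfo() resolves a host and service into a linked list of addrinfo structures, and getnameinfo() turns a socket address back into text. The helper first_address() is hypothetical; its flags argument is passed straight into hints.ai_flags, so a caller can supply AI_NUMERICHOST to skip name lookups entirely.

```c
#include <netdb.h>       /* getaddrinfo(), getnameinfo(), freeaddrinfo() */
#include <string.h>      /* memset() */
#include <sys/socket.h>
#include <sys/types.h>

/* Resolve host and service, then render the first result as numeric text.
 * Returns 0 on success or the nonzero EAI_* code from the failing call
 * (gai_strerror() converts it to a message). */
int first_address(const char *host, const char *service, int flags,
                  char *hostbuf, socklen_t hostlen,
                  char *servbuf, socklen_t servlen)
{
    struct addrinfo hints, *res;
    int rc;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;      /* accept IPv4 or IPv6 results */
    hints.ai_socktype = SOCK_STREAM;  /* stream (TCP) results only */
    hints.ai_flags = flags;

    rc = getaddrinfo(host, service, &hints, &res);
    if (rc != 0)
        return rc;

    /* res is a linked list (ai_next). A client would normally try each
     * entry with socket() and connect() until one works; here only the
     * first entry is converted back to text. */
    rc = getnameinfo(res->ai_addr, res->ai_addrlen, hostbuf, hostlen,
                     servbuf, servlen, NI_NUMERICHOST | NI_NUMERICSERV);
    freeaddrinfo(res);
    return rc;
}
```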

Protocol and address families

The Berkeley socket API is a general interface for networking and interprocess communication, and supports the use of various network protocols and address architectures.

The following lists a sampling of protocol families (preceded by the standard symbolic identifier) defined in a modern Linux or BSD implementation:

Identifier | Function or use
PF_APPLETALK | AppleTalk
PF_ATMPVC | Asynchronous Transfer Mode Permanent Virtual Circuits
PF_ATMSVC | Asynchronous Transfer Mode Switched Virtual Circuits
PF_AX25 | Amateur Radio AX.25
PF_CAN | Controller Area Network
PF_BLUETOOTH | Bluetooth sockets
PF_BRIDGE | Multiprotocol bridge
PF_DECnet | Reserved for DECnet project
PF_ECONET | Acorn Econet
PF_INET | Internet Protocol version 4
PF_INET6 | Internet Protocol version 6
PF_IPX | Novell's Internetwork Packet Exchange
PF_IRDA | IrDA sockets
PF_KEY | PF_KEY key management API
PF_LOCAL, PF_UNIX, PF_FILE | Local to host (pipes and file-domain)
PF_NETROM | Amateur radio NET/ROM (related to AX.25) [10]
PF_NETBEUI | Reserved for 802.2LLC project
PF_SECURITY | Security callback pseudo AF
PF_NETLINK, PF_ROUTE | Routing API
PF_PACKET | Packet capture sockets
PF_PPPOX | PPP over X sockets
PF_SNA | Linux Systems Network Architecture (SNA) Project
PF_WANPIPE | Sangoma Wanpipe API sockets

A socket for communications is created with the socket() function, by specifying the desired protocol family (PF_-identifier) as an argument.

The original design concept of the socket interface distinguished between protocol types (families) and the specific address types that each may use. It was envisioned that a protocol family may have several address types. Address types were defined by additional symbolic constants, using the prefix AF instead of PF. The AF-identifiers are intended for all data structures that specifically deal with the address type and not the protocol family. However, this concept of separation of protocol and address type has not found implementation support and the AF-constants were defined by the corresponding protocol identifier, leaving the distinction between AF and PF constants as a technical argument of no practical consequence. Indeed, much confusion exists in the proper usage of both forms. [11]

The POSIX.1-2008 specification does not define any PF- constants, only AF- constants. [12]
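In practice the two spellings are interchangeable on systems that define both, while code aiming at POSIX conformance should use only the AF_ names (for example socket(AF_INET, ...)). The sketch below, a hypothetical pf_equals_af() function, checks whether the PF_ names are defined at all and, if so, whether they equal their AF_ counterparts; on typical Linux and BSD systems it returns 1.

```c
#include <sys/socket.h>  /* AF_* (POSIX); PF_* where the system provides them */

/* Returns 1 if PF_INET, PF_INET6 and PF_UNIX equal the matching AF_
 * constants, 0 if any differs, and -1 if the PF_ names are not defined
 * (POSIX only requires the AF_ ones). */
int pf_equals_af(void)
{
#if defined(PF_INET) && defined(PF_INET6) && defined(PF_UNIX)
    return PF_INET == AF_INET && PF_INET6 == AF_INET6 && PF_UNIX == AF_UNIX;
#else
    return -1;
#endif
}
```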

Raw sockets

Raw sockets provide a simple interface that bypasses the processing by the host's TCP/IP stack. They permit implementation of networking protocols in user space and aid in debugging of the protocol stack. [13] Raw sockets are used by some services, such as ICMP, that operate at the Internet Layer of the TCP/IP model.

Blocking and non-blocking mode

Berkeley sockets can operate in one of two modes: blocking or non-blocking.

A blocking socket does not return control until it has sent (or received) some or all data specified for the operation. It is normal for a blocking socket not to send all data. The application must check the return value to determine how many bytes have been sent or received and it must resend any data not already processed. [14] When using blocking sockets, special consideration should be given to accept() as it may still block after indicating readability if a client disconnects during the connection phase.
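Because send() may accept only part of the buffer, applications typically wrap it in a loop. The sketch below, a hypothetical send_all() helper, keeps sending after partial writes and retries when a signal interrupts the call (EINTR); it assumes a blocking stream socket.

```c
#include <errno.h>       /* errno, EINTR */
#include <sys/socket.h>  /* send() */
#include <sys/types.h>   /* ssize_t */

/* Send all len bytes of buf on a blocking stream socket.
 * Returns 0 on success, -1 on error (errno set by send()). */
int send_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = send(fd, p, len, 0);
        if (n == -1) {
            if (errno == EINTR)
                continue;       /* interrupted before sending: try again */
            return -1;
        }
        p += n;                 /* skip the bytes the kernel accepted */
        len -= (size_t)n;
    }
    return 0;
}
```

The receiving side needs the mirror-image loop, since recv() on a stream socket may likewise return fewer bytes than requested.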

A non-blocking socket returns whatever is in the receive buffer and immediately continues. If not written correctly, programs using non-blocking sockets are particularly susceptible to race conditions due to variances in network link speed. [citation needed]

A socket is typically set to blocking or non-blocking mode using the fcntl() or ioctl() functions.
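A minimal sketch of switching modes with fcntl(), assuming a POSIX system: the current file status flags are read with F_GETFL and written back with O_NONBLOCK set or cleared, so other flags are preserved. The second function shows the observable difference: on a non-blocking socket with no queued data, recv() fails at once with EAGAIN or EWOULDBLOCK instead of waiting. Both function names are hypothetical.

```c
#include <errno.h>       /* errno, EAGAIN, EWOULDBLOCK */
#include <fcntl.h>       /* fcntl(), F_GETFL, F_SETFL, O_NONBLOCK */
#include <sys/socket.h>  /* recv() */

/* Enable (enable != 0) or disable non-blocking mode on fd.
 * Returns -1 on error, another value on success. */
int set_nonblocking(int fd, int enable)
{
    int flags = fcntl(fd, F_GETFL);   /* current file status flags */
    if (flags == -1)
        return -1;
    if (enable)
        flags |= O_NONBLOCK;
    else
        flags &= ~O_NONBLOCK;
    return fcntl(fd, F_SETFL, flags);
}

/* Returns 1 if recv() reported that it would block, 0 otherwise.
 * Note: if data is waiting, this call consumes some of it. */
int recv_would_block(int fd)
{
    char c;
    if (recv(fd, &c, 1, 0) == -1 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return 1;
    return 0;
}
```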

Terminating sockets

The operating system does not release the resources allocated to a socket until the socket is closed. This is especially important if the connect call fails and will be retried.

When an application closes a socket, only the interface to the socket is destroyed. It is the kernel's responsibility to destroy the socket internally. Sometimes, a socket may enter a TIME_WAIT state, on the server side, for up to 4 minutes. [15]

On SVR4 systems, use of close() may discard data. The use of shutdown() or SO_LINGER may be required on these systems to guarantee delivery of all data. [16]
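One way to avoid discarding data on close, sketched below as a hypothetical graceful_close() helper, is to call shutdown() with SHUT_WR, drain the receive side until the peer closes its end (recv() returns 0), and only then call close(). The sketch blocks until the peer closes, so a real program would usually add a timeout.

```c
#include <sys/socket.h>  /* shutdown(), recv() */
#include <sys/types.h>   /* ssize_t */
#include <unistd.h>      /* close() */

/* Orderly close of a connected stream socket:
 * 1. shutdown(SHUT_WR) signals end-of-file to the peer (for TCP, a FIN
 *    is queued after any data not yet transmitted);
 * 2. data the peer still sends is read and discarded until recv()
 *    returns 0, meaning the peer has closed its side;
 * 3. close() releases the descriptor.
 * Returns 0 on success, -1 on error; fd is closed in every case. */
int graceful_close(int fd)
{
    char buf[512];
    ssize_t n;
    int rc = 0;

    if (shutdown(fd, SHUT_WR) == -1) {
        rc = -1;
    } else {
        while ((n = recv(fd, buf, sizeof buf, 0)) > 0)
            ;                   /* discard late data from the peer */
        if (n == -1)
            rc = -1;
    }
    if (close(fd) == -1)
        rc = -1;
    return rc;
}
```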

Client-server example using TCP

The Transmission Control Protocol (TCP) is a connection-oriented protocol that provides a variety of error correction and performance features for transmission of byte streams. A process creates a TCP socket by calling the socket() function with the parameters for the protocol family (PF_INET or PF_INET6), the socket mode for stream sockets (SOCK_STREAM), and the IP protocol identifier for TCP (IPPROTO_TCP).

Server

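Establishing a TCP server involves the following basic steps: creating a TCP socket with socket(); binding the socket to the listening port with bind() after setting the port number; preparing the socket to listen for connections with listen(); accepting incoming connections with accept(), which blocks until a connection arrives and returns a new descriptor for it, while the original descriptor remains a listening descriptor that can be passed to accept() again; communicating with the remote host using send() and recv(), or the general-purpose write() and read(); and closing each socket with close() once it is no longer needed.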

The following program creates a TCP server listening on port number 1100:

    #include <sys/types.h>
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        struct sockaddr_in sa;
        int SocketFD = socket(PF_INET, SOCK_STREAM, IPPROTO_TCP);
        if (SocketFD == -1) {
            perror("cannot create socket");
            exit(EXIT_FAILURE);
        }

        memset(&sa, 0, sizeof sa);

        sa.sin_family = AF_INET;
        sa.sin_port = htons(1100);
        sa.sin_addr.s_addr = htonl(INADDR_ANY);

        if (bind(SocketFD, (struct sockaddr *)&sa, sizeof sa) == -1) {
            perror("bind failed");
            close(SocketFD);
            exit(EXIT_FAILURE);
        }

        if (listen(SocketFD, 10) == -1) {
            perror("listen failed");
            close(SocketFD);
            exit(EXIT_FAILURE);
        }

        for (;;) {
            int ConnectFD = accept(SocketFD, NULL, NULL);

            if (ConnectFD == -1) {
                perror("accept failed");
                close(SocketFD);
                exit(EXIT_FAILURE);
            }

            /* perform read write operations ...
               read(ConnectFD, buff, size)
            */

            if (shutdown(ConnectFD, SHUT_RDWR) == -1) {
                perror("shutdown failed");
                close(ConnectFD);
                close(SocketFD);
                exit(EXIT_FAILURE);
            }
            close(ConnectFD);
        }

        close(SocketFD);
        return EXIT_SUCCESS;
    }

Client

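Programming a TCP client application involves the following steps: creating a TCP socket with socket(); connecting to the server with connect(), passing a sockaddr_in structure whose sin_family is set to AF_INET, whose sin_port is set to the port the server is listening on (in network byte order), and whose sin_addr is set to the IP address of the server (also in network byte order); communicating with the server using send() and recv(), or write() and read(); and terminating the connection and cleaning up with close().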

    #include <sys/types.h>
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        struct sockaddr_in sa;
        int res;
        int SocketFD;

        SocketFD = socket(PF_INET, SOCK_STREAM, IPPROTO_TCP);
        if (SocketFD == -1) {
            perror("cannot create socket");
            exit(EXIT_FAILURE);
        }

        memset(&sa, 0, sizeof sa);

        sa.sin_family = AF_INET;
        sa.sin_port = htons(1100);
        res = inet_pton(AF_INET, "192.168.1.3", &sa.sin_addr);
        if (res != 1) {
            /* inet_pton() returns 1 on success, 0 for invalid text, -1 on error */
            fprintf(stderr, "invalid address\n");
            close(SocketFD);
            exit(EXIT_FAILURE);
        }

        if (connect(SocketFD, (struct sockaddr *)&sa, sizeof sa) == -1) {
            perror("connect failed");
            close(SocketFD);
            exit(EXIT_FAILURE);
        }

        /* perform read write operations ... */

        close(SocketFD);
        return EXIT_SUCCESS;
    }

Client-server example using UDP

The User Datagram Protocol (UDP) is a connectionless protocol with no guarantee of delivery. UDP packets may arrive out of order, multiple times, or not at all. Because of this minimal design, UDP has considerably less overhead than TCP. Being connectionless means that there is no concept of a stream or permanent connection between two hosts. The units of data exchanged are called datagrams, and sockets that use them are called datagram sockets.

UDP address space, the space of UDP port numbers (in ISO terminology, the TSAPs), is completely disjoint from that of TCP ports.
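The independence of the two port spaces is easy to check: the sketch below binds a TCP socket to a kernel-chosen loopback port and then binds a UDP socket to the same number, and both calls succeed. The helper name is hypothetical, and the UDP bind would fail only if some unrelated UDP socket happened to hold that port already.

```c
#include <arpa/inet.h>   /* htonl(), ntohs() */
#include <netinet/in.h>  /* struct sockaddr_in, INADDR_LOOPBACK */
#include <string.h>      /* memset() */
#include <sys/socket.h>

/* Bind a TCP socket and a UDP socket to the same port on 127.0.0.1.
 * Returns the shared port (host byte order) or -1. The caller owns
 * *tcp_fd and *udp_fd and should close any that are not -1. */
int bind_tcp_and_udp_same_port(int *tcp_fd, int *udp_fd)
{
    struct sockaddr_in sa;
    socklen_t len = sizeof sa;

    *tcp_fd = socket(AF_INET, SOCK_STREAM, 0);
    *udp_fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (*tcp_fd == -1 || *udp_fd == -1)
        return -1;

    memset(&sa, 0, sizeof sa);
    sa.sin_family = AF_INET;
    sa.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    sa.sin_port = 0;                 /* let the kernel pick a TCP port */
    if (bind(*tcp_fd, (struct sockaddr *)&sa, sizeof sa) == -1 ||
        getsockname(*tcp_fd, (struct sockaddr *)&sa, &len) == -1)
        return -1;

    /* sa now holds the TCP port number; reuse it for UDP. */
    if (bind(*udp_fd, (struct sockaddr *)&sa, sizeof sa) == -1)
        return -1;
    return ntohs(sa.sin_port);
}
```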

Server

An application may set up a UDP server on port number 7654 as follows. The program contains an infinite loop that receives UDP datagrams with the function recvfrom().

    #include <stdio.h>
    #include <errno.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <sys/types.h>
    #include <netinet/in.h>
    #include <unistd.h> /* for close() for socket */
    #include <stdlib.h>

    int main(void)
    {
        int sock;
        struct sockaddr_in sa;
        struct sockaddr_in from; /* filled in by recvfrom() with the sender's address */
        char buffer[1024];
        ssize_t recsize;
        socklen_t fromlen;

        memset(&sa, 0, sizeof sa);
        sa.sin_family = AF_INET;
        sa.sin_addr.s_addr = htonl(INADDR_ANY);
        sa.sin_port = htons(7654);

        sock = socket(PF_INET, SOCK_DGRAM, IPPROTO_UDP);
        if (sock == -1) {
            perror("cannot create socket");
            exit(EXIT_FAILURE);
        }

        if (bind(sock, (struct sockaddr *)&sa, sizeof sa) == -1) {
            perror("error bind failed");
            close(sock);
            exit(EXIT_FAILURE);
        }

        for (;;) {
            /* fromlen is a value-result argument: reset it before every call */
            fromlen = sizeof from;
            recsize = recvfrom(sock, (void *)buffer, sizeof buffer, 0,
                               (struct sockaddr *)&from, &fromlen);
            if (recsize < 0) {
                fprintf(stderr, "%s\n", strerror(errno));
                close(sock);
                exit(EXIT_FAILURE);
            }
            printf("recsize: %d\n", (int)recsize);
            sleep(1);
            printf("datagram: %.*s\n", (int)recsize, buffer);
        }
    }

Client

The following is a client program for sending a UDP packet containing the string "hello world!" to address 127.0.0.1 at port number 7654.

    #include <stdlib.h>
    #include <stdio.h>
    #include <errno.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <sys/types.h>
    #include <netinet/in.h>
    #include <unistd.h>
    #include <arpa/inet.h>

    int main(void)
    {
        int sock;
        struct sockaddr_in sa;
        ssize_t bytes_sent;
        char buffer[200];

        strcpy(buffer, "hello world!");

        /* create an Internet, datagram, socket using UDP */
        sock = socket(PF_INET, SOCK_DGRAM, IPPROTO_UDP);
        if (sock == -1) {
            /* if socket failed to initialize, exit */
            perror("Error Creating Socket");
            exit(EXIT_FAILURE);
        }

        /* Zero out socket address */
        memset(&sa, 0, sizeof sa);

        /* The address is IPv4 */
        sa.sin_family = AF_INET;

        /* An IPv4 address is a 32-bit value; inet_addr() converts the
           dotted-decimal string to that value in network byte order */
        sa.sin_addr.s_addr = inet_addr("127.0.0.1");

        /* Port numbers are unsigned shorts; htons(x) puts x in network
           byte order. Set the port to 7654 */
        sa.sin_port = htons(7654);

        bytes_sent = sendto(sock, buffer, strlen(buffer), 0,
                            (struct sockaddr *)&sa, sizeof sa);
        if (bytes_sent < 0) {
            fprintf(stderr, "Error sending packet: %s\n", strerror(errno));
            close(sock);
            exit(EXIT_FAILURE);
        }

        close(sock); /* close the socket */
        return 0;
    }

In this code, buffer holds the data to be sent, and strlen(buffer) specifies the size of the data in bytes (the terminating null byte is not sent).


References

  1. E.g., in the Ruby programming language: ruby-doc::Socket.
  2. "POSIX.1-2008 specification". Opengroup.org. Retrieved 2012-07-26.
  3. "The Organization of Networks in Plan 9".
  4. "Linux TCP/IP stack as VFS plugin".
  5. Stevens & Rago 2013, p. 607.
  6. POSIX.1-2004
  7. gethostbyname(3). FreeBSD Library Functions Manual.
  8. Conill, Ariadne (March 27, 2022). "the tragedy of gethostbyname". ariadne.space.
  9. nsswitch.conf(5). FreeBSD File Formats Manual.
  10. "netrom(4)". manpages.debian.org (ax25-tools). https://manpages.debian.org/experimental/ax25-tools/netrom.4.en.html
  11. UNIX Network Programming Volume 1, Third Edition: The Sockets Networking API, W. Richard Stevens, Bill Fenner, Andrew M. Rudoff, Addison Wesley, 2003.
  12. "The Open Group Base Specifications Issue 7". Pubs.opengroup.org. Retrieved 2012-07-26.
  13. "TCP/IP raw sockets - Win32 apps". 19 January 2022.
  14. "Beej's Guide to Network Programming". Beej.us. 2007-05-05. Retrieved 2012-07-26.
  15. "terminating sockets". Softlab.ntua.gr. Retrieved 2012-07-26.
  16. "ntua.gr - Programming UNIX Sockets in C - Frequently Asked Questions: Questions regarding both Clients and Servers (TCP/SOCK_STREAM)". Softlab.ntua.gr. Retrieved 2012-07-26.

The de jure standard definition of the sockets interface is contained in the POSIX standard.

Information about this standard and ongoing work on it is available from the Austin Group website.

The IPv6 extensions to the base socket API are documented in RFC 3493 and RFC 3542.