KSMBD

KSMBD (formerly CIFSD)
Original author(s): Namjae Jeon
Developer(s): Namjae Jeon, Sergey Senozhatsky, Hyunchul Lee
Repository: github.com/cifsd-team
Written in: C
Operating system: Linux
Type: Network file system
License: GPLv2

KSMBD is an open-source in-kernel CIFS/SMB server for the Linux kernel, created by Namjae Jeon. Its initial goal is improved file I/O performance. The broader goal is to provide new features that are much easier to develop and maintain inside the kernel, where the relevant layers are fully exposed. This direction parallels areas in which Samba is moving functionality into a few kernel modules so that features such as remote direct memory access (RDMA) deliver an actual performance gain.


Features

Implemented

Planned

Architecture

Performance-related operations belong in kernel space, while operations that are not performance-critical belong in user space. Accordingly, DCE/RPC management, which has historically led to a number of buffer overflow issues and dangerous security bugs, is implemented in user space as ksmbd.mountd, together with winreg and user account management. Performance-related file operations (open, read, write, close, etc.) are handled in kernel space (ksmbd). This split also allows easier integration with the VFS interface for all file operations.
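The split described above can be pictured as a simple routing decision. The following is a minimal sketch in Python, not the actual ksmbd code: the operation labels and the `route` function are hypothetical names chosen for illustration, and only the two destinations (ksmbd and ksmbd.mountd) come from the text.

```python
from enum import Enum


class Handler(Enum):
    """The two destinations named in the text."""
    KERNEL = "ksmbd"
    USERSPACE = "ksmbd.mountd"


# Hypothetical operation labels, used for illustration only.
FILE_OPS = {"open", "read", "write", "close"}
MANAGEMENT_OPS = {"dcerpc", "winreg", "user_account"}


def route(operation: str) -> Handler:
    """Return which side would handle the given operation."""
    if operation in FILE_OPS:
        return Handler.KERNEL       # performance-critical file I/O
    if operation in MANAGEMENT_OPS:
        return Handler.USERSPACE    # management and RPC handling
    raise ValueError(f"unknown operation: {operation!r}")
```

For example, `route("read")` yields `Handler.KERNEL`, while `route("winreg")` yields `Handler.USERSPACE`.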

[Figure: Cifsd architecture]

ksmbd (kernel daemon)

When the server daemon starts, it creates a forker thread (ksmbd/0) at initialization time and opens the dedicated port 445 to listen for SMB requests. Whenever a new client connects, the forker thread accepts the connection and spawns a new thread that serves as a dedicated communication channel between that client and the server. This allows SMB requests (commands) from clients to be processed in parallel while new clients can still connect. Each such thread is named ksmbd/1~n, one per connected client. Depending on the SMB request type, a thread can pass commands through to user space (ksmbd.mountd). Currently, DCE/RPC commands are the ones handled in user space.
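The accept-and-spawn pattern described above can be sketched in user space. This is an illustration in Python under stated assumptions, not kernel code: it binds to an ephemeral loopback port instead of port 445, echoes data in place of SMB processing, and the function names (`serve_connection`, `forker`, `run_demo`) are hypothetical. Only the thread naming scheme (ksmbd/0 for the acceptor, ksmbd/1~n per client) follows the text.

```python
import socket
import threading


def serve_connection(conn: socket.socket, log: list) -> None:
    """Per-client thread: owns exactly one connection until it closes."""
    with conn:
        while True:
            data = conn.recv(1024)
            if not data:            # client closed the connection
                break
            log.append((threading.current_thread().name, data))
            conn.sendall(data)      # echo stands in for SMB processing


def forker(listener: socket.socket, max_clients: int, log: list) -> None:
    """Acceptor thread: accepts clients and spawns one thread per client."""
    workers = []
    for index in range(1, max_clients + 1):
        conn, _addr = listener.accept()
        worker = threading.Thread(
            target=serve_connection, args=(conn, log),
            name=f"ksmbd/{index}", daemon=True,
        )
        worker.start()
        workers.append(worker)
    for worker in workers:
        worker.join()


def run_demo(num_clients: int = 2):
    """Start the acceptor, connect clients one by one, return what happened."""
    log: list = []
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))     # ephemeral port, not the real 445
    listener.listen()
    port = listener.getsockname()[1]
    acceptor = threading.Thread(
        target=forker, args=(listener, num_clients, log),
        name="ksmbd/0", daemon=True,
    )
    acceptor.start()
    replies = []
    for i in range(num_clients):
        with socket.create_connection(("127.0.0.1", port), timeout=5) as client:
            client.sendall(f"request-{i}".encode())
            replies.append(client.recv(1024))
    acceptor.join()
    listener.close()
    return replies, sorted(name for name, _ in log)
```

Calling `run_demo(2)` connects two clients in sequence; each is served by its own named thread while the acceptor remains free to take the next connection.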

To make further use of the Linux kernel, commands are processed as default work items executed in the handlers of the default kworker threads. This lets the handlers be multiplexed: the kernel starts extra worker threads when the load increases and destroys them when the load decreases. After a connection is established with a client, a dedicated ksmbd task takes complete ownership of receiving and parsing SMB commands. Commands received from multiple clients are processed in parallel. For each command received, a separate kernel work item is prepared and queued to be handled by the default kworker threads inside the kernel. Load sharing is thus left to the kernel's default mechanisms, and client performance improves because client commands are handled in parallel.
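The work-item model above can be approximated in user space with a thread pool. The sketch below is an analogy in Python rather than the kernel workqueue API: `ThreadPoolExecutor` stands in for the kworker threads (it has a fixed upper bound instead of growing and shrinking with load), and `handle_command`, `connection_task` and `run` are hypothetical names.

```python
from concurrent.futures import Future, ThreadPoolExecutor


def handle_command(client_id: int, command: str) -> str:
    """Work-item handler: processes one already-parsed command."""
    return f"client{client_id}:{command}:done"


def connection_task(pool: ThreadPoolExecutor, client_id: int,
                    commands: list) -> list:
    """Per-connection task: queues one work item per received command."""
    return [pool.submit(handle_command, client_id, cmd) for cmd in commands]


def run(clients: dict) -> list:
    """Queue every client's commands on a shared pool and collect results."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures: list[Future] = []
        for client_id, commands in clients.items():
            futures.extend(connection_task(pool, client_id, commands))
        # Results are gathered in submission order; the handlers themselves
        # may run concurrently on different pool threads.
        return [future.result() for future in futures]
```

The connection task only queues work; which pool thread runs each item, and when, is left to the pool, which mirrors how the text leaves load sharing to the kernel.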

ksmbd.mountd (user space daemon)

ksmbd.mountd is a user-space process that transfers to the kernel the user accounts and passwords registered with ksmbd.adduser (part of the user-space utilities). It also passes the share parameters parsed from smb.conf to the SMB export layer in the kernel. For execution, it runs as a daemon that stays connected to the kernel interface through a netlink socket and waits for requests (DCE/RPC and winreg). It handles the RPC calls (at least a few dozen) that are most important for a file server, such as NetShareEnum and NetServerGetInfo, as well as the various DFS-related calls that a server must implement. The complete DCE/RPC response is prepared in user space and passed back to the kernel thread associated with the client.
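The two roles described above, loading share parameters from a configuration file and answering RPC requests, can be sketched as follows. This is a simplified Python illustration and not the ksmbd.mountd implementation: the sample configuration, the dictionary layout, and the functions `load_shares` and `handle_rpc` are assumptions, the netlink transport is not modeled, and real DCE/RPC responses are binary structures rather than Python objects. Only the call names NetShareEnum and NetServerGetInfo come from the text.

```python
import configparser

# Hypothetical smb.conf-style input. Lines are flush left because
# configparser treats indented lines as continuations.
SAMPLE_CONF = """\
[global]
server string = demo server

[public]
path = /srv/public
read only = yes

[scratch]
path = /srv/scratch
read only = no
"""


def load_shares(conf_text: str) -> dict:
    """Parse share sections into a table that could be handed to the kernel."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    shares = {}
    for name in parser.sections():
        if name == "global":        # server-wide settings, not a share
            continue
        section = parser[name]
        shares[name] = {
            "path": section["path"],
            "read_only": section.getboolean("read only", fallback=False),
        }
    return shares


def handle_rpc(request: str, shares: dict, server_name: str):
    """Build a response for one RPC request name (illustrative subset)."""
    handlers = {
        "NetShareEnum": lambda: sorted(shares),
        "NetServerGetInfo": lambda: {"name": server_name},
    }
    if request not in handlers:
        raise ValueError(f"unsupported RPC call: {request!r}")
    return handlers[request]()
```

With the sample configuration, `load_shares(SAMPLE_CONF)` produces entries for the `public` and `scratch` shares, and `handle_rpc("NetShareEnum", ...)` lists their names.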

See also

