SystemTap

Initial release 2005
Stable release 5.1 / April 26, 2024
Written in C, C++
Operating system Linux
Type Tracing programming language
License GNU General Public License
Website sourceware.org/systemtap/

In computing, SystemTap (stap) is a scripting language and tool for dynamically instrumenting running production Linux-based operating systems. System administrators can use SystemTap to extract, filter and summarize data in order to diagnose complex performance or functional problems.

SystemTap consists of free and open-source software and includes contributions from Red Hat, IBM, Intel, Hitachi, Oracle, the University of Wisconsin-Madison and other community members. [1]

History

SystemTap debuted in 2005 in Red Hat Enterprise Linux 4 Update 2 as a technology preview. [2]

After four years in development, SystemTap 1.0 was released in 2009. [3]

As of 2011, SystemTap is fully supported in all Linux distributions, including RHEL / CentOS 5 (since update 2), [4] SLES 10, [5] Fedora, Debian and Ubuntu.

Tracepoints in the CPython VM and JVM were added in SystemTap 1.2 in 2010. [6]

SystemTap 4.2, released in November 2019, added a Prometheus exporter.

Usage

SystemTap scripts are written in the SystemTap language [7] (saved as .stp files) and run with the stap command-line tool. [8]

SystemTap carries out several analysis passes on a script before allowing it to run. Scripts may be executed with one of three backends, selected by the --runtime= option. The default is a loadable kernel module, which has the fullest capability to inspect and manipulate any part of the system and therefore requires the most privilege. Another backend is based on the dynamic program analysis library Dyninst; it instruments only the user's own user-space programs and requires the least privilege. The newest backend [9] is based on eBPF byte-code, is limited to the capabilities of the Linux kernel's eBPF interpreter, and requires an intermediate level of privilege. In each case, the module is unloaded when the script has finished running.
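As a rough sketch of backend selection, the trivial script below could be run under each backend by changing only the --runtime= value. The file name hello.stp is hypothetical, and the invocations shown in the comments assume that the runtime names kernel, dyninst and bpf are available in the installed SystemTap build; not every script feature is supported by every backend.

```systemtap
# hello.stp (hypothetical file name): a minimal script for trying each backend.
#
# Possible invocations, assuming the matching backend is installed:
#   stap --runtime=kernel  hello.stp    # default: loadable kernel module
#   stap --runtime=dyninst hello.stp    # user-space only, via Dyninst
#   stap --runtime=bpf     hello.stp    # eBPF byte-code backend

probe begin
{
    printf("hello from SystemTap\n")
    exit()    # end the session immediately so the instrumentation is removed
}
```

Because the script uses only a begin probe and printf, it avoids features that are specific to one backend, which makes it a reasonable first check that a given runtime works at all.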

Scripts generally focus on events (such as starting or finishing a script), compiled-in probe points such as Linux "tracepoints", or the execution of functions or statements in the kernel or user-space.
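The kinds of probe point described above might be combined as in the following sketch. The choice of the kernel function vfs_read and the tracepoint sched_process_exit is an assumption made for illustration; whether they can be probed depends on the running kernel and, for function probes, on the availability of kernel debugging information.

```systemtap
# Count vfs_read() calls per program and process exits for five seconds.
global reads    # associative array: program name -> number of calls
global exits    # scalar counter of process exits

# Script-level event: fires once when the script starts.
probe begin
{
    printf("Collecting for 5 seconds...\n")
}

# Function probe: fires on entry to a kernel function (assumed to exist).
probe kernel.function("vfs_read")
{
    reads[execname()]++
}

# Compiled-in probe point: a Linux tracepoint (assumed to exist).
probe kernel.trace("sched_process_exit")
{
    exits++
}

# Timer event: stop the script after five seconds.
probe timer.s(5)
{
    exit()
}

# Script-level event: fires once when the script finishes.
probe end
{
    printf("process exits seen: %d\n", exits)
    # Print the ten busiest programs, sorted by descending count.
    foreach (name in reads- limit 10)
        printf("%-20s %d\n", name, reads[name])
}
```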

Scripts may also contain embedded C code, which is permitted only in "guru mode", enabled with the -g command-line option. However, use of guru mode is discouraged, and each SystemTap release includes more probe points designed to remove the need for guru-mode scripts. Guru mode is required in order to permit scripts to modify state in the instrumented software, for example to apply some types of emergency security fixes.
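A minimal sketch of embedded C in a guru-mode script follows. The function double_it and the file name in the comment are hypothetical, and the example deliberately does nothing beyond arithmetic; it only illustrates the %{ ... %} syntax and the need for -g.

```systemtap
# Run with guru mode enabled, for example:
#   stap -g guru_sketch.stp      (file name is hypothetical)

# Embedded C function (hypothetical): returns twice its argument.
# STAP_ARG_n reads the script-level argument n; STAP_RETURN sets the result.
function double_it:long (n:long)
%{
    STAP_RETURN(STAP_ARG_n * 2);
%}

probe begin
{
    printf("double_it(21) = %d\n", double_it(21))
    exit()
}
```

Without the -g option, the translator is expected to reject the script because of the embedded C block, which is the safeguard the paragraph above describes.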

As of SystemTap version 1.7, the software implements the new stapsys group and privilege level. [10]

Simple examples

The following script shows all applications setting TCP socket options on the system, what options are being set, and whether the option is set successfully or not.

    # Show sockets setting options

    # Return enabled or disabled based on value of optval
    function getstatus(optval)
    {
        if (optval == 1)
            return "enabling"
        else
            return "disabling"
    }

    probe begin
    {
        print("\nChecking for apps setting socket options\n")
    }

    # Set a socket option
    probe tcp.setsockopt
    {
        status = getstatus(user_int($optval))
        printf("  App '%s' (PID %d) is %s socket option %s... ", execname(), pid(), status, optstr)
    }

    # Check setting the socket option worked
    probe tcp.setsockopt.return
    {
        if (ret == 0)
            printf("success")
        else
            printf("failed")
        printf("\n")
    }

    probe end
    {
        print("\nClosing down\n")
    }

Many other examples are shipped with SystemTap. [11] Real-world examples of SystemTap use are collected on the War Stories page. [12]

Importing scripts from other tracing technologies

SystemTap can attach to DTrace markers when they are compiled into an application using macros from the sys/sdt.h header file.
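Attaching to such a marker might look like the sketch below. The program path, the provider name myapp, the marker name request_start and its argument are all hypothetical; the C fragment in the comments shows the kind of sys/sdt.h macro call the script assumes is present in the application.

```systemtap
# Assumes a hypothetical program /usr/local/bin/myapp whose C source contains:
#
#   #include <sys/sdt.h>
#   ...
#   DTRACE_PROBE1(myapp, request_start, request_id);
#
# The provider "myapp", the marker "request_start" and the argument
# request_id are illustrative names, not part of any real application.

probe process("/usr/local/bin/myapp").mark("request_start")
{
    # $arg1 is the first argument passed to the marker (request_id above).
    printf("request %d started in PID %d\n", $arg1, pid())
}
```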

References

  1. "A SystemTap update". LWN.net.
  2. "Product Documentation for Red Hat Enterprise Linux". Red Hat.
  3. "Josh Stone - SystemTap release 1.0".
  4. "Product Documentation". Red Hat.
  5. "Optional update for SystemTap". Novell. 10 October 2006.
  6. "Features/SystemtapStaticProbes - FedoraProject". Fedoraproject.
  7. "SystemTap Language Reference".
  8. Compare Romans, Robb (2009). "SystemTap Language Reference: A guide to the constructs and syntax used in SystemTap scripts". Red Hat: 4. CiteSeerX 10.1.1.172.5186. SystemTap [...] requires root privileges to actually run the kernel objects it builds using the sudo command, applied to the staprun program. [...] staprun is a part of the SystemTap package, dedicated to module loading and unloading and kernel-to-user data transfer.
  9. Merey, Aaron (2017-10-18). "systemtap 3.2 release". Retrieved 2017-10-18. The systemtap team announces release 3.2 [...] early experimental eBPF (extended Berkeley Packet Filter) backend [...]
  10. Eigler, Frank Ch. (2012-02-01). "systemtap 1.7 release". Retrieved 2013-03-26. The systemtap team announces release 1.7 [...] The new group and privilege level "stapsys" has been added [...]
  11. "SystemTap Examples".
  12. "WarStories - Systemtap Wiki".