Weak symbol

A weak symbol denotes a specially annotated symbol during linking of Executable and Linkable Format (ELF) object files. By default, without any annotation, a symbol in an object file is strong. During linking, a strong symbol can override a weak symbol of the same name. In contrast, two strong definitions of the same name in object files linked together cause a multiple-definition error, whereas when a duplicate comes from a library, the linker generally resolves the symbol in favor of the first definition found. This behavior allows an executable to override standard library functions, such as malloc(3). When linking a binary executable, a weakly declared symbol does not need a definition. In comparison, by default, a declared strong symbol without a definition triggers an undefined-symbol link error.

Weak symbols are not mentioned by the C or C++ language standards; as such, using them makes code less portable. Even if two platforms support the same or similar syntax for marking symbols as weak, the semantics may differ in subtle points, e.g. whether weak symbols keep their weak semantics during dynamic linking at runtime. [1]

Syntax

The GNU Compiler Collection and the Solaris Studio C compiler share the same syntax for annotating symbols as weak, namely a special #pragma, #pragma weak, and, alternatively, a function and variable attribute, __attribute__((weak)). [2] [3] [4] [5] [6] [7]

Pragma

// function declaration
#pragma weak power2
int power2(int x);

Attribute

// function declaration
int __attribute__((weak)) power2(int x);
// or
int power2(int x) __attribute__((weak));

// variable declaration
extern int __attribute__((weak)) global_var;

Tools support

The nm command identifies weak symbols in object files, libraries, and executables. On Linux, a weak function symbol is marked with "W" if a weak default definition is available and with "w" if it is not. Weakly defined variable symbols are marked with "V" and "v". On Solaris, nm prints "WEAK" instead of "GLOB" for a weak symbol.

Examples

The following examples work on Linux and Solaris with GCC and Solaris Studio.

Static example

main.c:

#include <stdio.h>
#include <stdlib.h>
#include "power_slow.h"

int main(int argc, char **argv)
{
  fprintf(stderr, "power3() = %d\n", power3(atoi(argv[1])));
  return 0;
}

power_slow.h:

#ifndef POWER2_SLOW_H
#define POWER2_SLOW_H

// alternative syntax
// #pragma weak power2
int
  __attribute__((weak))
    power2(int x)
      // alternatively after symbol
      // __attribute__((weak))
  ;

int power3(int x);

#endif

power_slow.c:

#include <stdio.h>
#include "power_slow.h"

int power2(int x)
{
  fprintf(stderr, "slow power2()\n");
  return x*x;
}

int power3(int x)
{
  return power2(x)*x;
}

power.c:

#include <stdio.h>

int power2(int x)
{
  fprintf(stderr, "fast power2()\n");
  return x*x;
}

Build commands:

cc -g -c -o main.o main.c
cc -g -c -o power_slow.o power_slow.c
cc -g -c -o power.o power.c

cc  main.o power_slow.o         -o slow
cc  main.o power_slow.o power.o -o fast

Output:

$ ./slow 3
slow power2()
power3() = 27
$ ./fast 3
fast power2()
power3() = 27

When removing the weak attribute and re-executing the build commands, the last one fails with the following error message (on Linux):

multiple definition of `power2'

The second-last one still succeeds, and ./slow has the same output.

If none of the object files linked together provides a definition for a weak function symbol (neither a default implementation, nor another weak definition, nor a strong definition), linking still succeeds without an undefined-symbol error for that symbol. Calling the function at runtime, however, may crash the program, because the unresolved weak reference resolves to address zero.

Shared example

Taking main.c from the preceding example and adding:

#ifndef NO_USER_HOOK
void user_hook(void)
{
  fprintf(stderr, "main: user_hook()\n");
}
#endif

Replacing power_slow.c with:

#include <stdio.h>
#include "power_slow.h"

void __attribute__((weak)) user_hook(void);

#ifdef ENABLE_DEF
void user_hook(void)
{
  fprintf(stderr, "power_slow: user_hook()\n");
}
#endif

int power2(int x)
{
  if (user_hook) // only needed ifndef ENABLE_DEF
    user_hook();
  return x*x;
}

int power3(int x)
{
  return power2(x)*x;
}

Build commands:

cc -g -c -o main.o main.c
cc -g -fpic -c -o power_slow.po power_slow.c
cc -shared -fpic -o libpowerslow.so power_slow.po
cc main.o -L`pwd` -Wl,-R`pwd` -lpowerslow -o main

cc -g -DENABLE_DEF -fpic -c -o power_slow.po power_slow.c
cc -shared -fpic -o libpowerslow.so power_slow.po
cc main.o -L`pwd` -Wl,-R`pwd` -lpowerslow -o main2

cc -g -DNO_USER_HOOK -c -o main.o main.c
cc -g -fpic -c -o power_slow.po power_slow.c
cc -shared -fpic -o libpowerslow.so power_slow.po
cc main.o -L`pwd` -Wl,-R`pwd` -lpowerslow -o main3

cc -g -DNO_USER_HOOK -c -o main.o main.c
cc -g -DENABLE_DEF -fpic -c -o power_slow.po power_slow.c
cc -shared -fpic -o libpowerslow.so power_slow.po
cc main.o -L`pwd` -Wl,-R`pwd` -lpowerslow -o main4

Output:

$ ./main 3
main: user_hook()
power3() = 27
$ ./main2 3
main: user_hook()
power3() = 27
$ ./main3 3
power3() = 27
$ ./main4 3
power_slow: user_hook()
power3() = 27

Removing the weak attribute and re-executing the build commands yields no build errors and the same output (on Linux) for main and main2. The build commands for main3 lead to the following warning and error messages (on Linux):

warning: the address of ‘user_hook’ will always evaluate as ‘true’
libpowerslow.so: undefined reference to `user_hook'

The warning is issued by the compiler because, without the weak attribute, it can statically determine that the expression user_hook in if (user_hook) always evaluates to true: the address of an ordinary (non-weak) function is never null. The error message is issued by the linker. The build for main4 shows the same warning but no link error.

Use cases

Weak symbols can be used as a mechanism to provide default implementations of functions that can be replaced by more specialized (e.g. optimized) ones at link-time. The default implementation is then declared as weak, and, on certain targets, object files with strongly declared symbols are added to the linker command line.

If a library defines a symbol as weak, a program that links that library is free to provide a strong one for, say, customization purposes.

Another use case for weak symbols is the maintenance of binary backward compatibility.

Limitations

On UNIX System V descendant systems, the dynamic linker resolves weak symbol definitions like strong ones at program runtime. For example, suppose a binary is dynamically linked against the libraries libfoo.so and libbar.so, where libfoo defines symbol f as weak and libbar defines f as strong. Depending on the library order on the link command line (e.g. -lfoo -lbar), the dynamic linker uses the weak f from libfoo.so even though a strong version is available at runtime. The GNU C Library's dynamic linker (ld.so) provides the environment variable LD_DYNAMIC_WEAK to restore weak semantics for dynamic linking. [1] [8]

When using constructs like

#pragma weak func
void func();

void bar()
{
  if (func)
    func();
}

the compiler may, depending on the compiler and the optimization level, interpret the conditional as always true (because, from the standpoint of the language standard, a declared function is assumed to be defined and therefore has a non-null address). [7] An alternative to the above construct is to use a system API to check whether func is defined (e.g. dlsym with RTLD_DEFAULT). The above check may also fail for other reasons, e.g. when func contains an ELF jump table entry. [9]

Using weak symbols in static libraries has different semantics than in shared ones: with a static library, the symbol lookup stops at the first symbol found, even if it is only weak and an object file with a strong symbol is also included in the library archive. On Linux, the linker option --whole-archive changes that behavior. [10]

The weak function attribute is supposed to be used on function declarations. Using it on a function definition may yield unexpected results, depending on the compiler and optimization level. [11]

In Solaris, weak symbols are also used within the kernel. The generic part of the kernel (called genunix) specifies weak functions that are overridden in the platform-specific part of the kernel (called unix), such as virtual memory routines. The kernel runtime linker sets the addresses of these functions when the kernel is assembled in memory during boot. This does not work for loadable kernel modules, though: a weak symbol in the kernel is not replaced by a module's symbol when the module is loaded.

C preprocessor (CPP) conditional constructs can also be used to switch between different versions of a symbol. The difference is that weak symbols are resolved by the linker, whereas CPP conditionals are evaluated by the preprocessor, which runs on each translation unit before the C compiler proper.

The build process (e.g. make) can also be made conditional, so that only the appropriate version of a symbol is built, or different (specialized) libraries are selected and linked depending on the target.

References

  1. Drepper, Ulrich (2000-06-07). "weak handling".
  2. "GCC Manual, 6.58.9 Weak Pragmas".
  3. "GCC Manual, 6.30 Declaring Attributes of Functions". GNU. Retrieved 2013-05-29.
  4. "GCC Manual, 6.36 Specifying Attributes of Variables".
  5. "Oracle Solaris Studio 12.3: C User's Guide, 2.11.27 weak".
  6. "Oracle Solaris Studio 12.3: C User's Guide, 2.9 Supported Attributes".
  7. "Oracle Solaris 11 Express 11/10 Linker and Libraries Guide, 2.11 Weak Symbols".
  8. Drepper, Ulrich (October 2011). "How To Write Shared Libraries (Version 4.1.2), 1.5.2 Symbol Relocations, page 6" (PDF).
  9. "Weak Linking and Linux Shared Libraries".
  10. "GNU LD man page".
  11. Kiszka, Jan (2006-05-23). "Re: weak-attribute over-optimisation with 4.1".