Shebang (Unix)

Last updated

#!
shebang

In computing, a shebang is the character sequence #!, consisting of the characters number sign (also known as sharp or hash) and exclamation mark (also known as bang), at the beginning of a script. It is also called sharp-exclamation, sha-bang, [1] [2] hashbang, [3] [4] pound-bang, [5] [6] or hash-pling. [7]

Contents

When a text file with a shebang is used as if it were an executable in a Unix-like operating system, the program loader mechanism parses the rest of the file's initial line as an interpreter directive. The loader executes the specified interpreter program, passing to it as an argument the path that was initially used when attempting to run the script, so that the program may use the file as input data. [8] For example, if a script is named with the path path/to/script, and it starts with the line #!/bin/sh, then the program loader is instructed to run the program /bin/sh, passing path/to/script as the first argument.

The shebang line is usually ignored by the interpreter, because the "#" character is a comment marker in many scripting languages; some language interpreters that do not use the hash mark to begin comments still may ignore the shebang line in recognition of its purpose. [9]

Syntax

The form of a shebang interpreter directive is as follows: [8]

#!interpreter [optional-arg]

in which interpreter is a path to an executable program. The space between #! and interpreter is optional. There could be any number of spaces or tabs either before or after interpreter. The optional-arg will include any extra spaces up to the end-of-line.

In Linux, the file specified by interpreter can be executed if it has the execute rights and is one of the following:

On Linux and Minix, an interpreter can also be a script. A chain of shebangs and wrappers yields a directly executable file that gets the encountered scripts as parameters in reverse order. For example, if file /bin/A is an executable file in ELF format, file /bin/B contains the shebang #!/bin/A optparam, and file /bin/C contains the shebang #!/bin/B, then executing file /bin/C resolves to /bin/B /bin/C, which finally resolves to /bin/A optparam /bin/B /bin/C.

In Solaris- and Darwin-derived operating systems (such as macOS), the file specified by interpreter must be an executable binary and cannot itself be a script. [10]

Examples

Some typical shebang lines:

Shebang lines may include specific options that are passed to the interpreter. However, implementations vary in the parsing behavior of options; for portability, only one option should be specified without any embedded whitespace. [11] Further portability guidelines are found below.

Purpose

Interpreter directives allow scripts and data files to be used as commands, hiding the details of their implementation from users and other programs, by removing the need to prefix scripts with their interpreter on the command line.

For example, consider a script having the initial line #!/bin/sh -x. It may be invoked simply by giving its file path, such as some/path/to/foo, [12] and some parameters, such as bar and baz:

some/path/to/foo bar baz

In this case /bin/sh is invoked in its place, with parameters -x, some/path/to/foo, bar, and baz, as if the original command had been

/bin/sh -x some/path/to/foo bar baz

Most interpreters make any additional arguments available to the script. If /bin/sh is a POSIX-compatible shell, then bar and baz are presented to the script as the positional parameter array "$@", and individually as parameters "$1" and "$2" respectively.

Because the initial # is the character used to introduce comments in the POSIX shell language (and in the languages understood by many other interpreters), the whole shebang line is ignored by the interpreter. However, it is up to the interpreter to ignore the shebang line, and not all do so; thus, a script consisting of the following two lines simply outputs both lines when run:

#!/bin/cat Hello world!

Strengths

When compared to the use of global association lists between file extensions and the interpreting applications, the interpreter directive method allows users to use interpreters not known at a global system level, and without administrator rights. It also allows specific selection of interpreter, without overloading the filename extension namespace (where one file extension refers to more than one file type), and allows the implementation language of a script to be changed without changing its invocation syntax by other programs. Invokers of the script need not know what the implementation language is as the script itself is responsible for specifying the interpreter to use.

Portability

Program location

Shebangs must specify absolute paths (or paths relative to current working directory) to system executables; this can cause problems on systems that have a non-standard file system layout. Even when systems have fairly standard paths, it is quite possible for variants of the same operating system to have different locations for the desired interpreter. Python, for example, might be in /usr/bin/python3, /usr/local/bin/python3, or even something like /home/username/bin/python3 if installed by an ordinary user.

A similar problem exists for the POSIX shell, since POSIX only required its name to be sh, but did not mandate a path. A common value is /bin/sh, but some systems such as Solaris have the POSIX-compatible shell at /usr/xpg4/bin/sh. [13] In many Linux systems, /bin/sh is a hard or symbolic link to /bin/bash, the Bourne Again shell (BASH). Using bash-specific syntax while maintaining a shebang pointing to sh is also not portable. [14]

Because of this it is sometimes required to edit the shebang line after copying a script from one computer to another because the path that was coded into the script may not apply on a new machine, depending on the consistency in past convention of placement of the interpreter. For this reason and because POSIX does not standardize path names, POSIX does not standardize the feature. [15] The GNU Autoconf tool can test for system support with the macro AC_SYS_INTERPRETER. [16]

Often, the program /usr/bin/env can be used to circumvent this limitation by introducing a level of indirection. #! is followed by /usr/bin/env, followed by the desired command without full path, as in this example:

#!/usr/bin/env sh

This mostly works because the path /usr/bin/env is commonly used for the env utility, and it invokes the first sh found in the user's $PATH, typically /bin/sh.

This particular example (using sh) is of limited utility: neither /bin/sh nor /usr/bin/env is universal, with similar numbers of devices lacking each. More broadly using #!/usr/bin/env for any script still has some portability issues with OpenServer 5.0.6 and Unicos 9.0.2 which have only /bin/env and no /usr/bin/env.

Using #!/usr/bin/env results in run-time indirection, which has the potential to degrade system security; for this reason some commentators recommend against its use [17] in packaged software, reserving it only for "educational examples".

Character interpretation

Another portability problem is the interpretation of the command arguments. Some systems, including Linux, do not split up the arguments; [18] for example, when running the script with the first line,

#!/usr/bin/env python3 -c

all text after the first space is treated as a single argument, that is, python3 -c will be passed as one argument to /usr/bin/env, rather than two arguments. Cygwin also behaves this way.

Complex interpreter invocations are possible through the use of an additional wrapper. FreeBSD 6.0 (2005) introduced a -S option to its env as it changed the shebang-reading behavior to non-splitting. This option tells env to split the string itself. [19] The GNU env utility since coreutils 8.30 (2018) also includes this feature. [20] Although using this option mitigates the portability issue on the kernel end with splitting, it adds the requirement that env supports this particular extension.

Another problem is scripts containing a carriage return character immediately after the shebang line, perhaps as a result of being edited on a system that uses DOS line breaks, such as Microsoft Windows. Some systems interpret the carriage return character as part of the interpreter command, resulting in an error message. [21]

Magic number

The shebang is actually a human-readable instance of a magic number in the executable file, the magic byte string being 0x23 0x21, the two-character encoding in ASCII of #!. This magic number is detected by the "exec" family of functions, which determine whether a file is a script or an executable binary. The presence of the shebang will result in the execution of the specified executable, usually an interpreter for the script's language. It has been claimed [22] that some old versions of Unix expect the normal shebang to be followed by a space and a slash (#! /), but this appears to be untrue; [11] rather, blanks after the shebang have traditionally been allowed, and sometimes documented with a space, as described in the 1980 historical email below.

The shebang characters are represented by the same two bytes in extended ASCII encodings, including UTF-8, which is commonly used for scripts and other text files on current Unix-like systems. However, UTF-8 files may begin with the optional byte order mark (BOM); if the "exec" function specifically detects the bytes 0x23 and 0x21, then the presence of the BOM (0xEF 0xBB 0xBF) before the shebang will prevent the script interpreter from being executed. Some authorities recommend against using the byte order mark in POSIX (Unix-like) scripts, [23] for this reason and for wider interoperability and philosophical concerns. Additionally, a byte order mark is not necessary in UTF-8, as that encoding does not have endianness issues; it serves only to identify the encoding as UTF-8. [23]

Etymology

An executable file starting with an interpreter directive is simply called a script, often prefaced with the name or general classification of the intended interpreter. The name shebang for the distinctive two characters may have come from an inexact contraction of SHArp bang or haSH bang, referring to the two typical Unix names for them. Another theory on the sh in shebang is that it is from the default shell sh, usually invoked with shebang. [24] This usage was current by December 1989, [25] and probably earlier.

History

The shebang was introduced by Dennis Ritchie between Edition 7 and 8 at Bell Laboratories. It was also added to the BSD releases from Berkeley's Computer Science Research (present at 2.8BSD [26] and activated by default by 4.2BSD). As AT&T Bell Laboratories Edition 8 Unix, and later editions, were not released to the public, the first widely known appearance of this feature was on BSD.

The lack of an interpreter directive, but support for shell scripts, is apparent in the documentation from Version 7 Unix in 1979, [27] which describes instead a facility of the Bourne shell where files with execute permission would be handled specially by the shell, which would (sometimes depending on initial characters in the script, such as ":" or "#") spawn a subshell which would interpret and run the commands contained in the file. In this model, scripts would only behave as other commands if called from within a Bourne shell. An attempt to directly execute such a file via the operating system's own exec() system call would fail, preventing scripts from behaving uniformly as normal system commands.

Version 8 improved shell scripts

In later versions of Unix-like systems, this inconsistency was removed. Dennis Ritchie introduced kernel support for interpreter directives in January 1980, for Version 8 Unix, with the following description: [26]

From uucp Thu Jan 10 01:37:58 1980 >From dmr Thu Jan 10 04:25:49 1980 remote from research  The system has been changed so that if a file being executed begins with the magic characters #! , the rest of the line is understood to be the name of an interpreter for the executed file. Previously (and in fact still) the shell did much of this job; it automatically executed itself on a text file with executable mode when the text file's name was typed as a command. Putting the facility into the system gives the following benefits.  1) It makes shell scripts more like real executable files, because they can be the subject of 'exec.'  2) If you do a 'ps' while such a command is running, its real name appears instead of 'sh'. Likewise, accounting is done on the basis of the real name.  3) Shell scripts can be set-user-ID. [a]   4) It is simpler to have alternate shells available; e.g. if you like the Berkeley csh there is no question about which shell is to interpret a file.  5) It will allow other interpreters to fit in more smoothly.  To take advantage of this wonderful opportunity, put    #! /bin/sh   at the left margin of the first line of your shell scripts. Blanks after ! are OK.  Use a complete pathname (no search is done). At the moment the whole line is restricted to 16 characters but this limit will be raised.

Unnamed shell script feature

The feature's creator didn't give it a name, however: [29]

From:"Ritchie,DennisM(Dennis)**CTR**"<dmr@[redacted]> To:<[redacted]@talisman.org> Date:Thu, 19 Nov 2009 18:37:37 -0600Subject:RE:Whatdo-you-callyour#!<something>line?   I can't recall that we ever gave it a proper name. It was pretty late that it went in--I think that I got the idea from someone at one of the UCB conferences on Berkeley Unix; I may have been one of the first to actually install it, but it was an idea that I got from elsewhere.  As for the name: probably something descriptive like "hash-bang" though this has a specifically British flavor, but in any event I don't recall particularly using a pet name for the construction. 

Kernel support for interpreter directives spread to other versions of Unix, and one modern implementation can be seen in the Linux kernel source in fs/binfmt_script.c. [30]

This mechanism allows scripts to be used in virtually any context normal compiled programs can be, including as full system programs, and even as interpreters of other scripts. As a caveat, though, some early versions of kernel support limited the length of the interpreter directive to roughly 32 characters (just 16 in its first implementation), would fail to split the interpreter name from any parameters in the directive, or had other quirks. Additionally, some modern systems allow the entire mechanism to be constrained or disabled for security purposes (for example, set-user-id support has been disabled for scripts on many systems).

Note that, even in systems with full kernel support for the #! magic number, some scripts lacking interpreter directives (although usually still requiring execute permission) are still runnable by virtue of the legacy script handling of the Bourne shell, still present in many of its modern descendants. Scripts are then interpreted by the user's default shell.

See also

Notes

  1. The setuid feature is disabled in most modern operating systems following the realization that a race condition can be exploited to change the script while it's being processed. [28]

Related Research Articles

<span class="mw-page-title-main">Bash (Unix shell)</span> GNU replacement for the Bourne shell

Bash, short for Bourne-Again SHell, is a shell program and command language supported by the Free Software Foundation and first developed for the GNU Project by Brian Fox. Designed as a 100% free software alternative for the Bourne shell, it was initially released in 1989. Its moniker is a play on words, referencing both its predecessor, the Bourne shell, and the concept of rebirth.

<span class="mw-page-title-main">Cygwin</span> Unix-like environment for Windows

Cygwin is a free and open-source Unix-like environment and command-line interface (CLI) for Microsoft Windows. The project also provides a software repository containing many open-source packages. Cygwin allows source code for Unix-like operating systems to be compiled and run on Windows. Cygwin provides native integration of Windows-based applications.

<span class="mw-page-title-main">Shell script</span> Script written for the shell, or command line interpreter, of an operating system

A shell script is a computer program designed to be run by a Unix shell, a command-line interpreter. The various dialects of shell scripts are considered to be command languages. Typical operations performed by shell scripts include file manipulation, program execution, and printing text. A script which sets up the environment, runs the program, and does any necessary cleanup or logging, is called a wrapper.

<span class="mw-page-title-main">Unix shell</span> Command-line interpreter for Unix operating system

A Unix shell is a command-line interpreter or shell that provides a command line user interface for Unix-like operating systems. The shell is both an interactive command language and a scripting language, and is used by the operating system to control the execution of the system using shell scripts.

<span class="mw-page-title-main">Bourne shell</span> Command-line interpreter for operating systems

The Bourne shell (sh) is a shell command-line interpreter for computer operating systems.

<span class="mw-page-title-main">C shell</span> Unix shell

The C shell is a Unix shell created by Bill Joy while he was a graduate student at University of California, Berkeley in the late 1970s. It has been widely distributed, beginning with the 2BSD release of the Berkeley Software Distribution (BSD) which Joy first distributed in 1978. Other early contributors to the ideas or the code were Michael Ubell, Eric Allman, Mike O'Brien and Jim Kulp.

cd (command) Computer command in various operating systems

The cd command, also known as chdir, is a command-line shell command used to change the current working directory in various operating systems. It can be used in shell scripts and batch files.

Almquist shell is a lightweight Unix shell originally written by Kenneth Almquist in the late 1980s. Initially a clone of the System V.4 variant of the Bourne shell, it replaced the original Bourne shell in the BSD versions of Unix released in the early 1990s.

xargs is a command on Unix and most Unix-like operating systems used to build and execute commands from standard input. It converts input from standard input into arguments to a command.

time (Unix) Command in Unix and Unix-like operating systems

In computing, time is a command in Unix and Unix-like operating systems. It is used to determine the duration of execution of a particular command.

In Unix-like operating systems, find is a command-line utility that locates files based on some user-specified criteria and either prints the pathname of each matched object or, if another action is requested, performs that action on each matched object.

<span class="mw-page-title-main">Comparison of command shells</span>

A command shell is a command-line interface to interact with and manipulate a computer's operating system.

env Unix command utility

env is a shell command for Unix and Unix-like operating systems. It is used to either print a list of environment variables or run another utility in an altered environment without having to modify the currently existing environment. Using env, variables may be added or removed, and existing variables may be changed by assigning new values to them.

alias (command) Command in various command line interpreters

In computing, alias is a command in various command-line interpreters (shells), which enables a replacement of a word by another string. It is mainly used for abbreviating a system command, or for adding default arguments to a regularly used command. alias is available in Unix shells, AmigaDOS, 4DOS/4NT, FreeDOS, KolibriOS, Windows PowerShell, ReactOS, and the EFI shell. Aliasing functionality in the MS-DOS and Microsoft Windows operating systems is provided by the DOSKey command-line utility.

PATH is an environment variable on Unix-like operating systems, DOS, OS/2, and Microsoft Windows, specifying a set of directories where executable programs are located. In general, each executing process or user session has its own PATH setting.

The script command is a Unix utility that records a terminal session. It dates back to the 1979 3.0 Berkeley Software Distribution (BSD).

An interpreter directive is a computer language construct, that on some systems is better described as an aspect of the system's executable file format, that is used to control which interpreter parses and interprets the instructions in a computer program.

In a Unix shell, the full stop called the dot command (.) is a command that evaluates commands in a computer file in the current execution context. In the C shell, a similar functionality is provided as the source command, and this name is seen in "extended" POSIX shells as well.

The restricted shell is a Unix shell that restricts some of the capabilities available to an interactive user session, or to a shell script, running within it. It is intended to provide an additional layer of security, but is insufficient to allow execution of entirely untrusted software. A restricted mode operation is found in the original Bourne shell and its later counterpart Bash, and in the KornShell. In some cases a restricted shell is used in conjunction with a chroot jail, in a further attempt to limit access to the system as a whole.

<span class="mw-page-title-main">Command-line interface</span> Computer interface that uses text

A command-line interface (CLI) is a means of interacting with a computer program by inputting lines of text called command-lines. Command-line interfaces emerged in the mid-1960s, on computer terminals, as an interactive and more user-friendly alternative to the non-interactive interface available with punched cards.

References

  1. "Advanced Bash Scripting Guide: Chapter 2. Starting Off With a Sha-Bang". Archived from the original on 10 December 2019. Retrieved 10 December 2019.
  2. Cooper, Mendel (5 November 2010). Advanced Bash Scripting Guide 5.3 Volume 1. lulu.com. p. 5. ISBN   978-1-4357-5218-4.
  3. MacDonald, Matthew (2011). HTML5: The Missing Manual. Sebastopol, California: O'Reilly Media. p. 373. ISBN   978-1-4493-0239-9.
  4. Lutz, Mark (September 2009). Learning Python (4th ed.). O'Reilly Media. p. 48. ISBN   978-0-596-15806-4.
  5. Guelich, Gundavaram and Birznieks, Scott, Shishir and Gunther (29 July 2000). CGI Programming with PERL (2nd ed.). O'Reilly Media. p.  358. ISBN   978-1-56592-419-2.{{cite book}}: CS1 maint: multiple names: authors list (link)
  6. Lie Hetland, Magnus (4 October 2005). Beginning Python: From Novice to Professional. Apress. p. 21. ISBN   978-1-59059-519-0.
  7. Schitka, John (24 December 2002). Linux+ Guide to Linux Certification. Course Technology. p. 353. ISBN   978-0-619-13004-6.
  8. 1 2 "execve(2) - Linux man page" . Retrieved 21 October 2010.
  9. "SRFI 22".
  10. "Python - Python3 shebang line not working as expected".
  11. 1 2 Maschek, Sven (30 December 2010). "The #! magic, details about the shebang/hash-bang mechanism: Blank after #! required?". www.in-ulm.de. Retrieved 18 January 2024.
  12. if permitted by the file's exec permission bits
  13. "The Open Group Base Specifications Issue 7". 2008. Retrieved 5 April 2010.
  14. "pixelbeat.org: Common shell script mistakes". It's much better to test scripts directly in a POSIX compliant shell if possible. The `bash --posix` option doesn't suffice as it still accepts some 'bashisms'
  15. "Chapter 2. Shell Command Language", The Open Group Base Specifications (IEEE Std 1003.1-2017) (Issue 7 ed.), IEEE, 2018 [2008], If the first line of a file of shell commands starts with the characters "#!", the results are unspecified
  16. Autoconf, Free Software Foundation, Macro: AC_SYS_INTERPRETER: Check whether the system supports starting scripts with a line of the form '#!/bin/sh' to select the interpreter to use for the script.
  17. "What's with #!/usr/bin/env bash?" . Retrieved 6 March 2024.
  18. "/usr/bin/env behaviour". Mail-index.netbsd.org. 9 November 2008. Retrieved 18 November 2010.
  19. env(1)    FreeBSD General Commands Manual
  20. "env invocation". GNU Coreutils. Retrieved 11 February 2020.
  21. "Carriage Return causes bash to fail". 8 November 2013.
  22. "GNU Autoconf Manual v2.57, Chapter 10: Portable Shell Programming". Archived from the original on 18 January 2008. Retrieved 14 May 2020.
  23. 1 2 "FAQ UTF-8, UTF-16, UTF-32 & BOM: Can a UTF-8 data stream contain the BOM character (in UTF-8 form)? If yes, then can I still assume the remaining UTF-8 bytes are in big-endian order?". Unicode. Retrieved 10 November 2023.
  24. "Jargon File entry for shebang". Catb.org. Retrieved 16 June 2010.
  25. Wall, Larry. "Perl didn't grok setuid scripts that had a space on the first line between the shebang and the interpreter name". USENET.
  26. 1 2 "CSRG Archive CD-ROMs".
  27. UNIX TIME-SHARING SYSTEM: UNIX PROGRAMMER'S MANUAL (PDF), vol. 2A (Seventh ed.), January 1979
  28. Gilles. "linux - Why is SUID disabled for shell scripts but not for binaries?". Information Security Stack Exchange.
  29. Richie, Dennis. "Dennis Ritchie and Hash-Bang". Talisman.org. Retrieved 3 December 2020.
  30. Rubini, Alessandro (31 December 1997). "Playing with Binary Formats". Linux Journal. Retrieved 1 January 2015.