Getopt

Getopt is a C library function used to parse command-line options in the Unix/POSIX style. It is part of the POSIX specification and is available on virtually all Unix-like systems. It is also the name of a Unix program for parsing command-line arguments in shell scripts.

History

A long-standing issue with command line programs was how to specify options; early programs used many ways of doing so, including single character options (-a), multiple options specified together (-abc is equivalent to -a -b -c), multicharacter options (-inum), options with arguments (-a arg, -inum 3, -a=arg), and different prefix characters (-a, +b, /c).

The getopt function was written to be a standard mechanism that all programs could use to parse command-line options, so that there would be a common interface on which everyone could depend. To that end, the original authors selected, from among these variations, support for single-character options, multiple options specified together, and options with arguments (-a arg or -aarg), all controlled by an option string.

getopt dates back to at least 1980 [1] and was first published by AT&T at the 1985 UNIFORUM conference in Dallas, Texas, with the intent for it to be available in the public domain. [2] Versions of it were subsequently picked up by other flavors of Unix (4.3BSD, Linux, etc.). It is specified in the POSIX.2 standard as part of the unistd.h header file. Derivatives of getopt have been created for many programming languages to parse command-line options.

A POSIX-standard companion function to getopt [3] is getsubopt. [4] It parses a string of comma-separated sub-options. It appeared in 4.4BSD (1995). [5]
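As a rough illustration of what getsubopt does, the following sketch parses a sub-option string such as "ro,size=64,name=tmp". The struct mount_opts type, the parse_subopts function, and the token names are hypothetical and exist only for this example; only getsubopt itself comes from the standard.

```c
#define _XOPEN_SOURCE 700   /* ask the C library to declare getsubopt() */
#include <stdlib.h>         /* getsubopt, strtol */
#include <string.h>

/* Hypothetical settings filled in from a sub-option string. */
struct mount_opts {
    int read_only;
    long size;
    const char *name;
};

/* Returns 0 on success, -1 on an unknown or malformed sub-option.
 * getsubopt() overwrites separators in spec, so spec must be writable. */
int parse_subopts(char *spec, struct mount_opts *out)
{
    enum { OPT_RO, OPT_SIZE, OPT_NAME };
    char *const tokens[] = { "ro", "size", "name", NULL };
    char *value;

    while (*spec != '\0') {
        switch (getsubopt(&spec, tokens, &value)) {
        case OPT_RO:
            out->read_only = 1;
            break;
        case OPT_SIZE:
            if (value == NULL)
                return -1;          /* "size" needs "=N" */
            out->size = strtol(value, NULL, 10);
            break;
        case OPT_NAME:
            if (value == NULL)
                return -1;          /* "name" needs "=STRING" */
            out->name = value;      /* points into spec */
            break;
        default:
            return -1;              /* token not listed in tokens[] */
        }
    }
    return 0;
}
```

A typical caller would pass optarg from a getopt loop (for example the argument of a hypothetical -o option) as spec, since that string is writable and outlives the call.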

Extensions

getopt is a system-dependent function, and its behavior depends on the implementation in the C library. Portable replacement implementations, such as the one in gnulib, are available, however. [6]

The conventional (POSIX and BSD) handling is that the options end when the first non-option argument is encountered, and that getopt returns -1 to signal that. In the glibc extension, however, options are allowed anywhere for ease of use; getopt implicitly permutes the argument vector so that the non-options still end up at the end. Since POSIX already has the convention of returning -1 on -- and skipping it, one can always portably use it as an end-of-options signifier. [6]

A GNU extension, getopt_long, allows parsing of more readable, multicharacter options, which are introduced by two dashes instead of one. The choice of two dashes allows multicharacter options (--inum) to be differentiated from single character options specified together (-abc). The GNU extension also allows an alternative format for options with arguments: --name=arg. [6] This interface proved popular, and has been taken up (sans the permutation) by many BSD distributions including FreeBSD as well as Solaris. [7] An alternative way to support long options is seen in Solaris and Korn Shell (extending optstring), but it was not as popular. [8]

Another common advanced extension of getopt is resetting the state of argument parsing; this is useful as a replacement for the options-anywhere GNU extension, or as a way to "layer" a set of command-line interfaces with different options at different levels. This is achieved on BSD systems using an optreset variable, and on GNU systems by setting optind to 0. [6]

Usage

For users

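The command-line syntax for getopt-based programs is the POSIX-recommended Utility Argument Syntax. In short: options are single alphanumeric characters preceded by a hyphen; an option may take an argument, which can be given in the same token (-ofoo) or in the next one (-o foo); options that take no argument can be grouped behind a single hyphen (-abc); and -- marks the end of options. [9]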

Extensions to the syntax include the GNU convention and Sun's CLIP specification. [10] [11]

For programmers

The getopt manual from GNU specifies the following usage for getopt: [12]

    #include <unistd.h>

    int getopt(int argc, char *const argv[], const char *optstring);

Here argc and argv are defined exactly as they are in the prototype of the C main function; i.e., argc indicates the length of the argv array of strings. The optstring specifies which options to look for (alphanumeric characters, with W reserved) and which of them accept arguments (marked by colons). For example, "vf::o:" describes three options: v takes no argument, f takes an optional argument, and o takes a mandatory argument. The double-colon form for optional arguments is a GNU extension, as is the W extension, which GNU implements for long option synonyms. [12]

getopt itself returns an integer that is either an option character or -1 for end-of-options. [12] The idiom is to use a while-loop to go through options, and to use a switch-case statement to pick and act on options. See the example section of this article.

To communicate extra information back to the program, a few global extern variables are referenced by the program to fetch information from getopt:

    extern char *optarg;
    extern int optind, opterr, optopt;

optarg
A pointer to the argument of the current option, if present.
optind
The index in argv of the next element that getopt will examine. Can be used to control where to start parsing (again).
opterr
A boolean switch controlling whether getopt should print error messages.
optopt
If an unrecognized option occurs, the value of that unrecognized character.

The GNU extension getopt_long has a similar interface, although it is declared in a different header file and takes two extra parameters: longopts, an array of struct option entries that describes the long options and the value to return for each (typically the equivalent "short" option character), and longindex. If longindex is not NULL, getopt_long stores through it the index of the matched long option in longopts, which identifies the option when no short name is defined. [12]

    #include <getopt.h>

    int getopt_long(int argc, char *const argv[], const char *optstring,
                    const struct option *longopts, int *longindex);
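The flag member of struct option, which the Examples section does not use, can be sketched as follows. The names struct long_result, parse_long, and verbose_flag are hypothetical. When flag is non-NULL, getopt_long stores val in the pointed-to variable and returns 0 instead of returning val.

```c
#include <stddef.h>
#include <getopt.h>

/* Hypothetical result record for this sketch. */
struct long_result {
    int verbose;          /* copied from verbose_flag after parsing */
    const char *output;   /* argument of --output or -o */
    int last_index;       /* longopts index of the last flag-style option, or -1 */
};

static int verbose_flag;  /* written directly by getopt_long() */

/* Returns 0 on success, -1 if getopt_long() reported an error. */
int parse_long(int argc, char *argv[], struct long_result *r)
{
    static const struct option longopts[] = {
        /* name      has_arg            flag           val */
        { "verbose", no_argument,       &verbose_flag, 1   },
        { "output",  required_argument, NULL,          'o' },
        { NULL,      0,                 NULL,          0   }
    };
    int c, idx = -1;

    verbose_flag = 0;
    r->output = NULL;
    r->last_index = -1;
    optind = 1;   /* assumes no other scan is in progress */
    while ((c = getopt_long(argc, argv, "o:", longopts, &idx)) != -1) {
        switch (c) {
        case 0:                    /* flag-style long option: value already stored */
            r->last_index = idx;
            break;
        case 'o':                  /* reached by both -o and --output */
            r->output = optarg;
            break;
        default:
            return -1;
        }
    }
    r->verbose = verbose_flag;
    return 0;
}
```

Giving --output the val 'o' lets the long and short spellings share one case label, while --verbose needs no case of its own beyond the generic case 0.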

Examples

Using POSIX standard getopt

    #include <stdio.h>     /* for printf */
    #include <stdlib.h>    /* for exit */
    #include <unistd.h>    /* for getopt */

    int main(int argc, char **argv) {
        int c;
        int digit_optind = 0;
        int aopt = 0, bopt = 0;
        char *copt = 0, *dopt = 0;

        while ((c = getopt(argc, argv, "abc:d:012")) != -1) {
            int this_option_optind = optind ? optind : 1;
            switch (c) {
            case '0':
            case '1':
            case '2':
                if (digit_optind != 0 && digit_optind != this_option_optind) {
                    printf("digits occur in two different argv-elements.\n");
                }
                digit_optind = this_option_optind;
                printf("option %c\n", c);
                break;
            case 'a':
                printf("option a\n");
                aopt = 1;
                break;
            case 'b':
                printf("option b\n");
                bopt = 1;
                break;
            case 'c':
                printf("option c with value '%s'\n", optarg);
                copt = optarg;
                break;
            case 'd':
                printf("option d with value '%s'\n", optarg);
                dopt = optarg;
                break;
            case '?':
                break;
            default:
                printf("?? getopt returned character code 0%o ??\n", c);
            }
        }

        if (optind < argc) {
            printf("non-option ARGV-elements: ");
            while (optind < argc) {
                printf("%s ", argv[optind++]);
            }
            printf("\n");
        }

        exit(0);
    }

Using GNU extension getopt_long

    #include <stdio.h>     /* for printf */
    #include <stdlib.h>    /* for exit */
    #include <getopt.h>    /* for getopt_long; POSIX standard getopt is in unistd.h */

    int main(int argc, char **argv) {
        int c;
        int digit_optind = 0;
        int aopt = 0, bopt = 0;
        char *copt = 0, *dopt = 0;
        static struct option long_options[] = {
            /*   NAME       ARGUMENT           FLAG  SHORTNAME */
            {"add",     required_argument, NULL, 0},
            {"append",  no_argument,       NULL, 0},
            {"delete",  required_argument, NULL, 0},
            {"verbose", no_argument,       NULL, 0},
            {"create",  required_argument, NULL, 'c'},
            {"file",    required_argument, NULL, 0},
            {NULL,      0,                 NULL, 0}
        };
        int option_index = 0;

        while ((c = getopt_long(argc, argv, "abc:d:012",
                                long_options, &option_index)) != -1) {
            int this_option_optind = optind ? optind : 1;
            switch (c) {
            case 0:
                printf("option %s", long_options[option_index].name);
                if (optarg) {
                    printf(" with arg %s", optarg);
                }
                printf("\n");
                break;
            case '0':
            case '1':
            case '2':
                if (digit_optind != 0 && digit_optind != this_option_optind) {
                    printf("digits occur in two different argv-elements.\n");
                }
                digit_optind = this_option_optind;
                printf("option %c\n", c);
                break;
            case 'a':
                printf("option a\n");
                aopt = 1;
                break;
            case 'b':
                printf("option b\n");
                bopt = 1;
                break;
            case 'c':
                printf("option c with value '%s'\n", optarg);
                copt = optarg;
                break;
            case 'd':
                printf("option d with value '%s'\n", optarg);
                dopt = optarg;
                break;
            case '?':
                break;
            default:
                printf("?? getopt returned character code 0%o ??\n", c);
            }
        }

        if (optind < argc) {
            printf("non-option ARGV-elements: ");
            while (optind < argc) {
                printf("%s ", argv[optind++]);
            }
            printf("\n");
        }

        exit(0);
    }

In Shell

Shell script programmers commonly want a consistent way of providing options. To achieve this goal, they turn to getopt and seek to port it to their own language.

The first attempt at porting was the program getopt, implemented by Unix System Laboratories (USL). This version was unable to deal with quoting and shell metacharacters, as it makes no attempt to quote its output. FreeBSD has inherited it. [13]

In 1986, USL decided that being unsafe around metacharacters and whitespace was no longer acceptable, and they created the builtin getopts command for Unix SVR3 Bourne Shell instead. The advantage of building the command into the shell is that it now has access to the shell's variables, so values could be written safely without quoting. It uses the shell's own variables, OPTIND and OPTARG, to track the current position and the current option's argument, and returns the option name in a shell variable.

In 1995, getopts was included in the Single UNIX Specification version 1 / X/Open Portability Guidelines Issue 4. [14] Now a part of the POSIX Shell standard, getopts has spread far and wide to many other shells trying to be POSIX-compliant.

getopt was basically forgotten until util-linux came out with an enhanced version that fixed all of old getopt's problems by escaping. It also supports GNU's long option names. [15] On the other hand, long options have been implemented rarely in the getopts command in other shells, ksh93 being an exception.

In other languages

getopt is a concise description of the common POSIX command argument structure, and it is replicated widely by programmers seeking to provide a similar interface, both to themselves and to the user on the command-line.


References

  1. "usr/src/lib/libc/pdp11/gen/getopt.c". From System III, released June 1980, linked here from Warren Toomey's The Unix Tree project. Archived from the original on 2023-05-12. Retrieved 2024-04-22.
  2. Quarterman, John (1985-11-03). "public domain AT&T getopt source". linux.co.cr (originally in mod.std.unix newsgroup). Archived from the original on 2023-05-12. Retrieved 2024-04-22.
  3. "getopt" (The Open Group Base Specifications Issue 7, 2018 edition; IEEE Std 1003.1-2017 (Revision of IEEE Std 1003.1-2008) ed.). The Open Group. 2018. Archived from the original on 2024-03-24. Retrieved 2024-04-22.
  4. "getsubopt" (The Open Group Base Specifications Issue 7, 2018 edition; IEEE Std 1003.1-2017 (Revision of IEEE Std 1003.1-2008) ed.). The Open Group. 2018. Archived from the original on 2023-12-02. Retrieved 2024-04-22.
  5. getsubopt(3)    FreeBSD Library Functions Manual
  6. "getopt". GNU Gnulib. Retrieved 23 January 2020.
  7. getopt_long(3)    FreeBSD Library Functions Manual
  8. "getopt(3)". Oracle Solaris 11.2 Information Library.
  9. "Utility Conventions". POSIX.1-2018.
  10. "Argument Syntax". The GNU C Library. Retrieved 24 January 2020.
  11. David-John, Burrowes; Kowalski III, Joseph E. (2003-01-22). "CLIP Specification, Version 1.0, PSARC 1999/645" (PDF). Archived from the original (PDF) on 2020-06-27.
  12. getopt(3)    Linux Library Functions Manual
  13. getopt(1)    FreeBSD General Commands Manual
  14. "getopts". The Open Group (POSIX 2018).
  15. getopt(1)    Linux User Manual – User Commands
  16. "visual studio - getopt.h: Compiling Linux C-Code in Windows". Stack Overflow.
  17. "Package flag".
  18. "Package getopt".
  19. "Package getopt".
  20. "System.Console.GetOpt".
  21. "Class gnu.getopt.Getopt". Retrieved 2013-06-24.
  22. "Commons CLI". Apache Commons. Apache Software Foundation. February 27, 2013. Retrieved June 24, 2013.
  23. "Getopt::Long - perldoc.perl.org".
  24. "Getopt::Std - perldoc.perl.org".
  25. "PHP: getopt - Manual".
  26. "16.5. getopt — C-style parser for command line options — Python 3.6.0 documentation".
  27. "Parser for command line options". Retrieved 2013-04-30. Deprecated since version 2.7
  28. "Parser for command-line options, arguments and sub-commands". Retrieved 2013-04-30.
  29. "GNU Getopt .NET". GitHub.