Open (system call)

For most file systems, a program initializes access to a file in a file system using the open system call. This allocates resources associated with the file (the file descriptor) and returns a handle that the process uses to refer to that file. In some cases the open is performed by the first access.

The same file may be opened simultaneously by several processes, and even by the same process, resulting in several file descriptors for the same file, depending on the file organization and filesystem. Operations on a descriptor, such as moving the file pointer or closing it, are independent: they do not affect other descriptors for the same file. Operations on the file itself, such as a write, can be seen through the other descriptors: a later read can read the newly written data.
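The independence of file positions can be sketched in C on a POSIX system. The helper name first_byte_via_second_descriptor and the file contents are assumptions made for illustration only: the file is opened twice, three bytes are read through the first descriptor, and the second descriptor still starts at the beginning of the file.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: writes "abcdef" to path, opens the file twice,
   advances the first descriptor by three bytes, then reads one byte through
   the second descriptor. Returns that byte, or -1 on any error. */
int first_byte_via_second_descriptor(const char *path)
{
    assert(path != NULL);

    int wfd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (wfd == -1)
        return -1;
    ssize_t written = write(wfd, "abcdef", 6);
    if (close(wfd) == -1 || written != 6)
        return -1;

    int fd1 = open(path, O_RDONLY);
    if (fd1 == -1)
        return -1;
    int fd2 = open(path, O_RDONLY);
    if (fd2 == -1) {
        close(fd1);
        return -1;
    }

    char skipped[3];
    char c = 0;
    int result = -1;
    /* Reading through fd1 moves only fd1's position; fd2 is still at offset 0. */
    if (read(fd1, skipped, sizeof skipped) == 3 && read(fd2, &c, 1) == 1)
        result = (unsigned char)c;

    close(fd1);
    close(fd2);
    return result;
}
```

Called on a writable path, the helper is expected to return the first byte of the file rather than the fourth, because each open call yields its own position pointer.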

During the open, the filesystem may allocate memory for buffers, or it may wait until the first operation.

The absolute file path is resolved. This may include connecting to a remote host and notifying an operator that a removable medium is required. It may include the initialization of a communication device. At this point an error may be returned if the host or medium is not available. The first access to at least the directory within the filesystem is performed. An error will usually be returned if the higher level components of the path (directories) cannot be located or accessed. An error will be returned if the file is expected to exist and it does not or if the file should not already exist and it does.

If the file is expected to exist and it does, the file access, as restricted by permission flags within the file metadata or access control list, is validated against the requested type of operations. This usually requires an additional filesystem access, although in some filesystems meta-flags may be part of the directory structure.

If the file is being created, the filesystem may allocate the default initial amount of storage or a specified amount depending on the file system capabilities. If this fails an error will be returned. Updating the directory with the new entry may be performed or it may be delayed until the close is performed.

Various other errors which may occur during the open include directory update failures, un-permitted multiple connections, media failures, communication link failures and device failures.

The return value must always be examined and an error-specific action taken.
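A minimal sketch of such a check in C, assuming a POSIX system. The helper name open_or_report and the choice of which errors to single out are illustrative assumptions, not a prescribed pattern.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: opens path read-only. On failure it reports the cause,
   stores the errno value in *saved_errno and returns -1. On success it stores
   0 in *saved_errno and returns the file descriptor. */
int open_or_report(const char *path, int *saved_errno)
{
    assert(path != NULL && saved_errno != NULL);

    int fd = open(path, O_RDONLY);
    if (fd == -1) {
        *saved_errno = errno;   /* save before any call that may change errno */
        switch (*saved_errno) {
        case ENOENT:
            fprintf(stderr, "%s: no such file or directory\n", path);
            break;
        case EACCES:
            fprintf(stderr, "%s: permission denied\n", path);
            break;
        default:
            fprintf(stderr, "%s: %s\n", path, strerror(*saved_errno));
            break;
        }
        return -1;
    }

    *saved_errno = 0;
    return fd;
}
```

A caller can then branch on the saved error code, for example creating the file on ENOENT or asking for different credentials on EACCES.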

In many cases programming language-specific run-time library opens may perform additional actions including initializing a run-time library structure related to the file.

As soon as a file is no longer needed, the program should close it. This will cause run-time library and filesystem buffers to be updated to the physical media and permit other processes to access the data if exclusive use had been required. Some run-time libraries may close a file if the program calls the run-time exit. Some filesystems may perform the necessary operations if the program terminates. Neither of these is likely to take place in the event of a kernel or power failure. This can cause damaged filesystem structures requiring the running of privileged and lengthy filesystem utilities during which the entire filesystem may be inaccessible.

open call arguments

  1. The pathname to the file,
  2. The kind of access requested on the file (read, write, append etc.),
  3. The initial file permissions, given in the third argument, called mode. This argument is relevant only when a new file is being created.

After using the file, the process should close the file using the close call, which takes the file descriptor of the file to be closed. Some filesystems include a disposition to permit releasing the file.
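The three arguments and the matching close can be sketched in C as follows, assuming a POSIX system. The helper name write_message and the owner-only permission bits are assumptions chosen for illustration.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: creates (or truncates) path, writes msg and closes the
   file. Returns 0 on success and -1 on any failure. */
int write_message(const char *path, const char *msg)
{
    assert(path != NULL && msg != NULL);

    int fd = open(path,                          /* 1. pathname            */
                  O_WRONLY | O_CREAT | O_TRUNC,  /* 2. kind of access      */
                  S_IRUSR | S_IWUSR);            /* 3. mode, used only if
                                                       the file is created */
    if (fd == -1)
        return -1;

    size_t len = strlen(msg);
    /* A short write is treated as failure here; a robust program would loop. */
    int ok = (write(fd, msg, len) == (ssize_t)len);

    /* close can also report an error, so its return value is checked too. */
    if (close(fd) == -1)
        ok = 0;

    return ok ? 0 : -1;
}
```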

Some computer languages include run-time libraries which include additional functionality for particular filesystems. The open (or some auxiliary routine) may include specifications for key size, record size and connection speed. Some open routines include specification of the program code to be executed in the event of an error.

Perl language form

open FILEHANDLE,MODE[,EXPR]

for example:

open(my $fh, ">", "output.txt");

Perl also uses the tie function of the Tie::File module to associate an array with a file.[1] Tying a hash to the AnyDBM_File module associates the hash with a file.[2]

C library POSIX definition

The open call is standardized by the POSIX specification for the C language:

int open(const char *path, int oflag, ... /*, mode_t mode */);
int openat(int fd, const char *path, int oflag, ...);
int creat(const char *path, mode_t mode);
FILE *fopen(const char *restrict filename, const char *restrict mode);

The value returned is a file descriptor, which is a reference to a process-specific structure. Among other things, this structure contains a position pointer that indicates which place in the file will be acted upon by the next operation.

On failure, open returns −1 and sets errno to indicate the error.

The file system also updates a global table of all open files which is used for determining if a file is currently in use by any process.

path

The name of the file to open. It includes the file path defining where, in which file system, the file is found (or should be created).

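openat resolves a relative path against the directory referred to by the file descriptor fd rather than against the current working directory. If the path is absolute, fd is ignored.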
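A minimal sketch of openat in C, assuming a system that implements POSIX.1-2008. The helper name open_in_dir is hypothetical. It first opens the directory, then opens a file by a name interpreted relative to that directory.

```c
#define _POSIX_C_SOURCE 200809L   /* request the POSIX.1-2008 declarations */

#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: opens the file called name inside the directory dir,
   read-only. Returns a file descriptor, or -1 on failure. */
int open_in_dir(const char *dir, const char *name)
{
    assert(dir != NULL && name != NULL);

    /* O_DIRECTORY makes the open fail if dir is not a directory. */
    int dirfd = open(dir, O_RDONLY | O_DIRECTORY);
    if (dirfd == -1)
        return -1;

    /* name is resolved relative to dirfd, not the current working directory. */
    int fd = openat(dirfd, name, O_RDONLY);

    /* The directory descriptor is no longer needed once the file is open. */
    close(dirfd);
    return fd;
}
```

Resolving names against an already open directory avoids depending on the current working directory, which may change between calls.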

oflag

This argument is formed by OR'ing together optional parameters and (from <fcntl.h>) one of:

O_RDONLY, O_RDWR and O_WRONLY

Option parameters include:

O_APPEND data written will be appended to the end of the file. The file operations will always adjust the position pointer to the end of the file.
O_CREAT Create the file if it does not exist. Without this flag, opening a file that does not exist fails, setting errno to ENOENT.
O_EXCL Used with O_CREAT: if the file already exists, the open fails, setting errno to EEXIST.
O_TRUNC If the file already exists then discard its previous contents, reducing it to an empty file. Not applicable for a device or named pipe.

Additional flags and errors are defined in the POSIX specification of the open call.
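As an illustration of combining flags, O_CREAT together with O_EXCL gives a "create only if absent" operation. The following C sketch assumes a POSIX system. The helper name create_if_absent and its return convention are assumptions made for this example.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: creates an empty file at path only if no file exists
   there. Returns 1 if the file was created, 0 if it already existed and -1
   on any other error. */
int create_if_absent(const char *path)
{
    assert(path != NULL);

    /* With O_CREAT | O_EXCL the existence check and the creation are a
       single atomic step, so two processes cannot both "win". */
    int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0600);
    if (fd == -1)
        return (errno == EEXIST) ? 0 : -1;

    if (close(fd) == -1)
        return -1;
    return 1;
}
```

Calling the helper twice on the same fresh path is expected to report creation the first time and an existing file the second time. This pattern is commonly used for lock files.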

creat() is implemented as:

int creat(const char *path, mode_t mode)
{
    return open(path, O_WRONLY | O_CREAT | O_TRUNC, mode);
}

fopen uses string flags such as r, w, a and + and returns a file pointer used with fgets, fputs and fclose.
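The stream interface can be sketched as follows using only the standard C library. The helper name roundtrip_line is hypothetical. It writes one line with fputs, closes the stream, then reads the line back with fgets.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes line to path through a stream opened with "w",
   then reopens the file with "r" and reads the first line into out.
   Returns 0 on success and -1 on failure. */
int roundtrip_line(const char *path, const char *line, char *out, size_t out_size)
{
    assert(path != NULL && line != NULL && out != NULL && out_size > 0);

    FILE *fp = fopen(path, "w");     /* "w": create or truncate, write-only */
    if (fp == NULL)
        return -1;
    int ok = (fputs(line, fp) != EOF);
    if (fclose(fp) == EOF)           /* fclose flushes the stream's buffer */
        ok = 0;
    if (!ok)
        return -1;

    fp = fopen(path, "r");           /* "r": the file must already exist */
    if (fp == NULL)
        return -1;
    ok = (fgets(out, (int)out_size, fp) != NULL);
    fclose(fp);
    return ok ? 0 : -1;
}
```

Unlike open, fopen reports failure by returning a null pointer rather than −1, so the check differs even though errno is typically set in both cases on POSIX systems.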

mode

Optional and relevant only when creating a new file, mode defines the file permissions: read, write or execute access for the owner, the group, or all other users. The mode is masked by the calling process's umask: bits set in the umask are cleared in the mode.
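The effect of the umask can be sketched in C on a POSIX system. The helper name created_mode is an assumption for illustration. It temporarily sets the umask, creates a file with the requested mode and reports the permission bits the file actually received.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: creates path with the requested mode while the umask
   is set to mask, then returns the resulting permission bits (low nine bits
   of st_mode), or -1 on failure. Any existing file at path is removed first,
   because mode only applies to a newly created file. */
long created_mode(const char *path, mode_t requested, mode_t mask)
{
    assert(path != NULL);

    unlink(path);                 /* ignore failure: the file may not exist */

    mode_t old_mask = umask(mask);
    int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, requested);
    umask(old_mask);              /* restore the caller's umask */
    if (fd == -1)
        return -1;

    struct stat st;
    long result = -1;
    if (fstat(fd, &st) == 0)
        result = (long)(st.st_mode & 0777);

    close(fd);
    return result;
}
```

With a requested mode of 0666 and a umask of 022, the write bits for group and others are cleared, so the file is normally created with mode 0644. Filesystems that apply default access control lists may behave differently.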

Notes

  1. "Tie::File". perldoc.perl.org. Retrieved 2011-08-07.
  2. "AnyDBM_File". perldoc.perl.org. Retrieved 2011-08-07.
