End-of-file

Last updated

In computing, end-of-file (EOF) [1] is a condition in a computer operating system where no more data can be read from a data source. The data source is usually called a file or stream.

Contents

Details

In the C standard library, the character reading functions such as getchar return a value equal to the symbolic value (macro) EOF to indicate that an end-of-file condition has occurred. The actual value of EOF is implementation-dependent and must be negative (but is commonly −1, such as in glibc [2] ). Block-reading functions return the number of bytes read, and if this is fewer than asked for, then the end of file was reached or an error occurred (checking of errno or dedicated function, such as ferror is required to determine which).

EOF character

Input from a terminal never really "ends" (unless the device is disconnected), but it is useful to enter more than one "file" into a terminal, so a key sequence is reserved to indicate end of input. In UNIX the translation of the keystroke to EOF is performed by the terminal driver, so a program does not need to distinguish terminals from other input files. By default, the driver converts a Control-D character at the start of a line into an end-of-file indicator. To insert an actual Control-D (ASCII 04) character into the input stream, the user precedes it with a "quote" command character (usually Control-V). AmigaDOS is similar but uses Control-\ instead of Control-D.

In DOS and Windows (and in CP/M and many DEC operating systems such as the PDP-6 monitor, [3] RT-11, VMS or TOPS-10 [4] ), reading from the terminal will never produce an EOF. Instead, programs recognize that the source is a terminal (or other "character device") and interpret a given reserved character or sequence as an end-of-file indicator; most commonly this is an ASCII Control-Z, code 26. Some MS-DOS programs, including parts of the Microsoft MS-DOS shell (COMMAND.COM) and operating-system utility programs (such as EDLIN), treat a Control-Z in a text file as marking the end of meaningful data, and/or append a Control-Z to the end when writing a text file. This was done for two reasons:

In the ANSI X3.27-1969 magnetic tape standard, the end of file was indicated by a tape mark, which consisted of a gap of approximately 3.5 inches of tape followed by a single byte containing the character 13 (hex) for nine-track tapes and 17 (octal) for seven-track tapes. [5] The end-of-tape, commonly abbreviated as EOT, was indicated by two tape marks. This was the standard used, for example, on IBM 360. The reflective strip that was used to announce impending physical end of tape was also called an EOT marker.

See also

Related Research Articles

<span class="mw-page-title-main">ASCII</span> American character encoding standard

ASCII, abbreviated from American Standard Code for Information Interchange, is a character encoding standard for electronic communication. ASCII codes represent text in computers, telecommunications equipment, and other devices. Because of technical limitations of computer systems at the time it was invented, ASCII has just 128 code points, of which only 95 are printable characters, which severely limited its scope. All modern computer systems instead use Unicode, which has millions of code points, but the first 128 of these are the same as the ASCII set.

In computing and telecommunication, a control character or non-printing character (NPC) is a code point in a character set, that does not represent a written symbol. They are used as in-band signaling to cause effects other than the addition of a symbol to the text. All other characters are mainly printing, printable, or graphic characters, except perhaps for the "space" character.

A disk operating system (DOS) is a computer operating system that resides on and can use a disk storage device, such as a floppy disk, hard disk drive, or optical disc. A disk operating system provides a file system for organizing, reading, and writing files on the storage disk. Strictly, this definition does not include any other functionality, so it does not apply to more complex OSes, such as Microsoft Windows, and is more appropriately used only for older generations of operating systems.

<span class="mw-page-title-main">Booting</span> Process of starting a computer

In computing, booting is the process of starting a computer as initiated via hardware such as a button or by a software command. After it is switched on, a computer's central processing unit (CPU) has no software in its main memory, so some process must load software into memory before it can be executed. This may be done by hardware or firmware in the CPU, or by a separate processor in the computer system.

In telecommunication, an End-of-Transmission character (EOT) is a transmission control character. Its intended use is to indicate the conclusion of a transmission that may have included one or more texts and any associated message headings.

<span class="mw-page-title-main">CP/M</span> Discontinued family of computer operating systems

CP/M, originally standing for Control Program/Monitor and later Control Program for Microcomputers, is a mass-market operating system created in 1974 for Intel 8080/85-based microcomputers by Gary Kildall of Digital Research, Inc. Initially confined to single-tasking on 8-bit processors and no more than 64 kilobytes of memory, later versions of CP/M added multi-user variations and were migrated to 16-bit processors.

<span class="mw-page-title-main">History of operating systems</span> Aspect of computing history

Computer operating systems (OSes) provide a set of functions needed and used by most application programs on a computer, and the links needed to control and synchronize computer hardware. On the first computers, with no operating system, every program needed the full hardware specification to run correctly and perform standard tasks, and its own drivers for peripheral devices like printers and punched paper card readers. The growing complexity of hardware and application programs eventually made operating systems a necessity for everyday use.

<span class="mw-page-title-main">ANSI escape code</span> Method used for display options on video text terminals

ANSI escape sequences are a standard for in-band signaling to control cursor location, color, font styling, and other options on video text terminals and terminal emulators. Certain sequences of bytes, most starting with an ASCII escape character and a bracket character, are embedded into text. The terminal interprets these sequences as commands, rather than text to display verbatim.

RT-11 is a discontinued small, low-end, single-user real-time operating system for the full line of Digital Equipment Corporation PDP-11 16-bit computers. RT-11 was first implemented in 1970. It was widely used for real-time computing systems, process control, and data acquisition across all PDP-11s. It was also used for low-cost general-use computing.

In computing, tar is a computer software utility for collecting many files into one archive file, often referred to as a tarball, for distribution or backup purposes. The name is derived from "tape archive", as it was originally developed to write data to sequential I/O devices with no file system of their own. The archive data sets created by tar contain various file system parameters, such as name, timestamps, ownership, file-access permissions, and directory organization. POSIX abandoned tar in favor of pax, yet tar sees continued widespread use.

A text file is a kind of computer file that is structured as a sequence of lines of electronic text. A text file exists stored as data within a computer file system. In operating systems such as CP/M and MS-DOS, where the operating system does not keep track of the file size in bytes, the end of a text file is denoted by placing one or more special characters, known as an end-of-file (EOF) marker, as padding after the last line in a text file. On modern operating systems such as Microsoft Windows and Unix-like systems, text files do not contain any special EOF character, because file systems on those operating systems keep track of the file size in bytes. Most text files need to have end-of-line delimiters, which are done in a few different ways depending on operating system. Some operating systems with record-orientated file systems may not use new line delimiters and will primarily store text files with lines separated as fixed or variable length records.

<span class="mw-page-title-main">Newline</span> Special characters in computing signifying the end of a line of text

Newline is a control character or sequence of control characters in character encoding specifications such as ASCII, EBCDIC, Unicode, etc. This character, or a sequence of characters, is used to signify the end of a line of text and the start of a new one.

OS/8 is the primary operating system used on the Digital Equipment Corporation's PDP-8 minicomputer.

Peripheral Interchange Program (PIP) was a utility to transfer files on and between devices on Digital Equipment Corporation's computers. It was first implemented on the PDP-6 architecture by Harrison "Dit" Morse early in the 1960s. It was subsequently implemented for DEC's operating systems for PDP-10, PDP-11, and PDP-8 architectures. In the 1970s and 1980s Digital Research implemented PIP on CP/M and MP/M.

<span class="mw-page-title-main">RSTS/E</span> Computer operating system

RSTS is a multi-user time-sharing operating system developed by Digital Equipment Corporation for the PDP-11 series of 16-bit minicomputers. The first version of RSTS was implemented in 1970 by DEC software engineers that developed the TSS-8 time-sharing operating system for the PDP-8. The last version of RSTS was released in September 1992. RSTS-11 and RSTS/E are usually referred to just as "RSTS" and this article will generally use the shorter form. RSTS-11 supports the BASIC programming language, an extended version called BASIC-PLUS, developed under contract by Evans Griffiths & Hart of Boston. Starting with RSTS/E version 5B, DEC added support for additional programming languages by emulating the execution environment of the RT-11 and RSX-11 operating systems.

dd is a command-line utility for Unix, Plan 9, Inferno, and Unix-like operating systems and beyond, the primary purpose of which is to convert and copy files. On Unix, device drivers for hardware and special device files appear in the file system just like normal files; dd can also read and/or write from/to these files, provided that function is implemented in their respective driver. As a result, dd can be used for tasks such as backing up the boot sector of a hard drive, and obtaining a fixed amount of random data. The dd program can also perform conversions on the data as it is copied, including byte order swapping and conversion to and from the ASCII and EBCDIC text encodings.

A fat binary is a computer executable program or library which has been expanded with code native to multiple instruction sets which can consequently be run on multiple processor types. This results in a file larger than a normal one-architecture binary file, thus the name.

<span class="mw-page-title-main">Rainbow 100</span> DEC microcomputer

The Rainbow 100 is a microcomputer introduced by Digital Equipment Corporation (DEC) in 1982. This desktop unit had a monitor similar to the VT220 and a dual-CPU box with both 4 MHz Zilog Z80 and 4.81 MHz Intel 8088 CPUs. The Rainbow 100 was a triple-use machine: VT100 mode, 8-bit CP/M mode, and CP/M-86 or MS-DOS mode using the 8088. It ultimately failed to in the marketplace which became dominated by the simpler IBM PC and its clones which established the industry standard as compatibility with CP/M became less important than IBM PC compatibility. Writer David Ahl called it a disastrous foray into the personal computer market. The Rainbow was launched along with the similarly packaged DEC Professional and DECmate II which were also not successful. The failure of DEC to gain a significant foothold in the high-volume PC market would be the beginning of the end of the computer hardware industry in New England, as nearly all computer companies located there were focused on minicomputers for large organizations, from DEC to Data General, Wang, Prime, Computervision, Honeywell, and Symbolics Inc.

In computer data, a substitute character (␚) is a control character that is used to pad transmitted data in order to send it in blocks of fixed size, or to stand in place of a character that is recognized to be invalid, erroneous or unrepresentable on a given device. It is also used as an escape sequence in some programming languages.

In Unix-like operating systems, a device file or special file is an interface to a device driver that appears in a file system as if it were an ordinary file. There are also special files in DOS, OS/2, and Windows. These special files allow an application program to interact with a device by using its device driver via standard input/output system calls. Using standard system calls simplifies many programming tasks, and leads to consistent user-space I/O mechanisms regardless of device features and functions.

References

  1. Pollock, Wayne. "Shell Here Document Overview". hccfl.edu. Archived from the original on 2014-05-29. Retrieved 2014-05-28.
  2. "The GNU C Library". www.gnu.org.
  3. "Table of IO Device Characteristics - Console or Teletypewriters". PDP-6 Multiprogramming System Manual (PDF). Maynard, Massachusetts, USA: Digital Equipment Corporation (DEC). 1965. p. 43. DEC-6-0-EX-SYS-UM-IP-PRE00. Archived (PDF) from the original on 2014-07-14. Retrieved 2014-07-10. (1+84+10 pages)
  4. "5.1.1.1. Device Dependent Functions - Data Modes - Full-Duplex Software A(ASCII) and AL(ASCII Line)". PDP-10 Reference Handbook: Communicating with the Monitor - Time-Sharing Monitors (PDF). Vol. 3. Digital Equipment Corporation (DEC). 1969. pp. 5-3 – 5-6 [5-5 (431)]. Archived (PDF) from the original on 2011-11-15. Retrieved 2014-07-10. (207 pages)
  5. "Tape Transfer (Pre-1977): Exchange Media: MARC 21 Specifications for Record Structure, Character Sets, and Exchange Media (Library of Congress)". www.loc.gov.