Hex editor

Last updated

A hex editor (or binary file editor or byte editor) is a computer program that allows for manipulation of the fundamental binary data that constitutes a computer file. The name 'hex' comes from 'hexadecimal', a standard numerical format for representing binary data. A typical computer file occupies multiple areas on the storage medium, whose contents are combined to form the file. Hex editors that are designed to parse and edit sector data from the physical segments of floppy or hard disks are sometimes called sector editors or disk editors .

Contents

Details

Screenshot of a common hex editor (hexedit by Pascal Rigaux (Pixel)) Hexedit-screenshot.png
Screenshot of a common hex editor (hexedit by Pascal Rigaux (Pixel))

With a hex editor, a user can see or edit the raw and exact contents of a file, as opposed to the interpretation of the same content that other, higher level application software may associate with the file format. For example, this could be raw image data, in contrast to the way image editing software would interpret and show the same file.

Hex editors may be used to correct data corrupted by system or application program problems where it may not be worthwhile to write a special program to make the corrections. They are useful to bypass application edit checks which may prevent correction of erroneous data. They have been used to "patch" executable programs to change or add a few instructions as an alternative to recompilation. Program fixes for IBM mainframe systems are sometimes distributed as patches rather than distributing a complete copy of the affected program.

In most hex editor applications, the data of the computer file is represented as hexadecimal values grouped in 4 groups of 4 bytes (or two groups of 8 bytes), followed by one group of 16 printable ASCII characters which correspond to each pair of hex values (each byte). Non-printable ASCII characters (e.g., Bell) and characters that would take more than one character space (e.g., tab) are typically represented by a dot (".") in the following ASCII field.

Size limits

Unlike conventional text editors, Hex editors are able to efficiently handle files with indefinite sizes, as only a portion of the file is loaded while browsing it and modified when saving it, rather than the entire file at once.

Early history

Since the invention of computers and their different uses, a variety of file formats has been created. In some special circumstances it was convenient to be able to access the data as a series of raw digits. A program called SUPERZAP (AMASPZAP) was available for IBM OS/360 systems which could edit raw disk records and also understood the format of executable files. [1] Pairs of hexadecimal digits (each pair can represent a byte) are the current standard, because the vast majority of machines and file formats in use today handle data in units or groups of 8-bit bytes. Hexadecimal and also octal are common because these digits allow one to see which bits in a byte are set. Today, decimal instead of hexadecimal representation is becoming a popular second option due to the more familiar number base and additional helper tools, such as template systems and data inspectors, that reduce the benefits of the hexadecimal numerical format.[ citation needed ]

Template systems

An example of a simple template-based hex editor Physics-Editor-For-Doukutsu-Monogatari.png
An example of a simple template-based hex editor

Some hex editors offer a template system that can present the sequence of bytes of a binary file in a structured way, covering part or all of the desired file format. Usually the GUI for a template is a separate tool window next to the main hex editor. Some cheat engine systems consist only of such a template GUI.

Typically, a template is represented as a list of labeled text boxes, such that individual values of a file can be easily edited in the appropriate format (e.g., as string, color, or decimal number). Without template support, it is necessary to find the right offset in a file where the value that is to be changed is stored. Also, raw hex editing may require conversion from hexadecimal to decimal, catering for byte order, or other data type conversion peculiarities.

Templates can be stored as files, thereby exchanged by users, and are often shared publicly over the manufacturer's website. Most if not all hex editors define their own template file format; there is no trend to support a standard or even compatibility between the various formats out in the wild.

Scripting systems

Advanced hex editors have scripting systems that let the user create macro like functionality as a sequence of user interface commands for automating common tasks. This can be used for providing scripts that automatically patch files (e.g., game cheating, modding, or product fixes provided by community) or to write more complex/intelligent templates.

Scripting languages vary widely, often being product specific languages resembling MS-DOS batch files, to systems that support fully-fledged scripting languages such as Lua or Python.

Plugin systems

A few select editors[ which? ] have a plugin system that allows to extend the GUI and add new functionality, usually loading dynamic link libraries written in a C-compatible language.

See also

Related Research Articles

Hexadecimal is a positional numeral system that represents numbers using a radix (base) of sixteen. Unlike the decimal system representing numbers using ten symbols, hexadecimal uses sixteen distinct symbols, most often the symbols "0"–"9" to represent values 0 to 9 and "A"–"F" to represent values from ten to fifteen.

Octal is a numeral system with eight as the base.

A computer number format is the internal representation of numeric values in digital device hardware and software, such as in programmable computers and calculators. Numerical values are stored as groupings of bits, such as bytes and words. The encoding between numerical values and bit patterns is chosen for convenience of the operation of the computer; the encoding used by the computer's instruction set generally requires conversion for external use, such as for printing and display. Different types of processors may have different internal representations of numerical values and different conventions are used for integer and real numbers. Most calculations are carried out with number formats that fit into a processor register, but some software systems allow representation of arbitrarily large numbers using multiple words of memory.

In computer science, an escape sequence is a combination of characters that has a meaning other than the literal characters contained therein; it is marked by one or more preceding characters.

A text file is a kind of computer file that is structured as a sequence of lines of electronic text. A text file exists stored as data within a computer file system.

In computer programming, Base64 is a group of binary-to-text encoding schemes that transforms binary data into a sequence of printable characters, limited to a set of 64 unique characters. More specifically, the source binary data is taken 6 bits at a time, then this group of 6 bits is mapped to one of 64 unique characters.

uuencoding is a form of binary-to-text encoding that originated in the Unix programs uuencode and uudecode written by Mary Ann Horton at the University of California, Berkeley in 1980, for encoding binary data for transmission in email systems.

Quoted-Printable, or QP encoding, is a binary-to-text encoding system using printable ASCII characters to transmit 8-bit data over a 7-bit data path or, generally, over a medium which is not 8-bit clean. Historically, because of the wide range of systems and protocols that could be used to transfer messages, e-mail was often assumed to be non-8-bit-clean – however, modern SMTP servers are in most cases 8-bit clean and support 8BITMIME extension. It can also be used with data that contains non-permitted octets or line lengths exceeding SMTP limits. It is defined as a MIME content transfer encoding for use in e-mail.

<span class="mw-page-title-main">Disk editor</span> Computer software

A disk editor is a computer program that allows its user to read, edit, and write raw data on disk drives ; as such, they are sometimes called sector editors, since the read/write routines built into the electronics of most disk drives require to read/write data in chunks of sectors. Many disk editors can also be used to edit the contents of a running computer's memory or a disk image.

<span class="mw-page-title-main">Binary file</span> Non-human-readable computer file encoded in binary form

A binary file is a computer file that is not a text file. The term "binary file" is often used as a term meaning "non-text file". Many binary file formats contain parts that can be interpreted as text; for example, some computer document files containing formatted text, such as older Microsoft Word document files, contain the text of the document but also contain formatting information in binary form.

MBASIC is the Microsoft BASIC implementation of BASIC for the CP/M operating system. MBASIC is a descendant of the original Altair BASIC interpreters that were among Microsoft's first products. MBASIC was one of the two versions of BASIC bundled with the Osborne 1 computer. The name "MBASIC" is derived from the disk file name MBASIC.COM of the BASIC interpreter.

The Burroughs B2500 through Burroughs B4900 was a series of mainframe computers developed and manufactured by Burroughs Corporation in Pasadena, California, United States, from 1966 to 1991. They were aimed at the business world with an instruction set optimized for the COBOL programming language. They were also known as Burroughs Medium Systems, by contrast with the Burroughs Large Systems and Burroughs Small Systems.

A binary-to-text encoding is encoding of data in plain text. More precisely, it is an encoding of binary data in a sequence of printable characters. These encodings are necessary for transmission of data when the communication channel does not allow binary data or is not 8-bit clean. PGP documentation uses the term "ASCII armor" for binary-to-text encoding when referring to Base64.

In computer data, a substitute character (␚) is a control character that is used to pad transmitted data in order to send it in blocks of fixed size, or to stand in place of a character that is recognized to be invalid, erroneous or unrepresentable on a given device. It is also used as an escape sequence in some programming languages.

Intel hexadecimal object file format, Intel hex format or Intellec Hex is a file format that conveys binary information in ASCII text form, making it possible to store on non-binary media such as paper tape, punch cards, etc., to display on text terminals or be printed on line-oriented printers. The format is commonly used for programming microcontrollers, EPROMs, and other types of programmable logic devices and hardware emulators. In a typical application, a compiler or assembler converts a program's source code to machine code and outputs it into a object or executable file in hexadecimal format. In some applications, the Intel hex format is also used as a container format holding packets of stream data. Common file extensions used for the resulting files are .HEX or .H86. The HEX file is then read by a programmer to write the machine code into a PROM or is transferred to the target system for loading and execution. There are various tools to convert files between hexadecimal and binary format, and vice versa.

<span class="mw-page-title-main">SREC (file format)</span> File format developed by Motorola

Motorola S-record is a file format, created by Motorola in the mid-1970s, that conveys binary information as hex values in ASCII text form. This file format may also be known as SRECORD, SREC, S19, S28, S37. It is commonly used for programming flash memory in microcontrollers, EPROMs, EEPROMs, and other types of programmable logic devices. In a typical application, a compiler or assembler converts a program's source code to machine code and outputs it into a HEX file. The HEX file is then imported by a programmer to write the machine code into non-volatile memory, or is transferred to the target system for loading and execution.

<span class="mw-page-title-main">GNU Unifont</span> Duospaced bitmap font

GNU Unifont is a free Unicode bitmap font created by Roman Czyborra. The main Unifont covers all of the Basic Multilingual Plane (BMP). The "upper" companion covers significant parts of the Supplementary Multilingual Plane (SMP). The "Unifont JP" companion contains Japanese kanji present in the JIS X 0213 character set.

<span class="mw-page-title-main">Unicode input</span> Input characters using their Unicode code points

Unicode input is method to add a specific Unicode character to a computer file; it is a common way to input characters not directly supported by a physical keyboard. Characters can be entered either by selecting them from a display, by typing a certain sequence of keys on a physical keyboard, or by drawing the symbol by hand on touch-sensitive screen. In contrast to ASCII's 96 element character set, Unicode encodes hundreds of thousands of graphemes (characters) from almost all of the world's written languages and many other signs and symbols.

In the C programming language, an escape sequence is specially delimited text in a character or string literal that represents one or more other characters to the compiler. It allows a programmer to specify characters that are otherwise difficult or impossible to specify in a literal.

010 Editor is a commercial hex editor and text editor for Microsoft Windows, Linux and macOS. Typically 010 Editor is used to edit text files, binary files, hard drives, processes, tagged data, source code, shell scripts, log files, etc. A large variety of binary data formats can be edited through the use of Binary Templates.

References

  1. "SuperZap" . Retrieved Jun 7, 2015.