Hex editor

Last updated

A hex editor (or binary file editor or byte editor) is a computer program that allows for manipulation of the fundamental binary data that constitutes a computer file. The name 'hex' comes from 'hexadecimal', a standard numerical format for representing binary data. A typical computer file occupies multiple areas on the storage medium, whose contents are combined to form the file. Hex editors that are designed to parse and edit sector data from the physical segments of floppy or hard disks are sometimes called sector editors or disk editors .

Contents

Details

Screenshot of a common hex editor (hexedit by Pascal Rigaux (Pixel)) Hexedit-screenshot.png
Screenshot of a common hex editor (hexedit by Pascal Rigaux (Pixel))

With a hex editor, a user can see or edit the raw and exact contents of a file, as opposed to the interpretation of the same content that other, higher level application software may associate with the file format. For example, this could be raw image data, in contrast to the way image editing software would interpret and show the same file.

Hex editors may be used to correct data corrupted by system or application program problems where it may not be worthwhile to write a special program to make the corrections. They are useful to bypass application edit checks which may prevent correction of erroneous data. They have been used to "patch" executable programs to change or add a few instructions as an alternative to recompilation. Program fixes for IBM mainframe systems are sometimes distributed as patches rather than distributing a complete copy of the affected program.

In most hex editor applications, the data of the computer file is represented as hexadecimal values grouped in 4 groups of 4 bytes (or two groups of 8 bytes), followed by one group of 16 printable ASCII characters which correspond to each pair of hex values (each byte). Non-printable ASCII characters (e.g., Bell) and characters that would take more than one character space (e.g., tab) are typically represented by a dot (".") in the following ASCII field.

Size limits

Unlike conventional text editors, Hex editors are able to efficiently handle files with indefinite sizes, as only a portion of the file is loaded while browsing it and modified when saving it, rather than the entire file at once.

Early history

Since the invention of computers and their different uses, a variety of file formats has been created. In some special circumstances it was convenient to be able to access the data as a series of raw digits. A program called SUPERZAP (AMASPZAP) was available for IBM OS/360 systems which could edit raw disk records and also understood the format of executable files. [1] Pairs of hexadecimal digits (each pair can represent a byte) are the current standard, because the vast majority of machines and file formats in use today handle data in units or groups of 8-bit bytes. Hexadecimal and also octal are common because these digits allow one to see which bits in a byte are set. Today, decimal instead of hexadecimal representation is becoming a popular second option due to the more familiar number base and additional helper tools, such as template systems and data inspectors, that reduce the benefits of the hexadecimal numerical format.[ citation needed ]

Template systems

An example of a simple template-based hex editor. Physics-Editor-For-Doukutsu-Monogatari.png
An example of a simple template-based hex editor.

Some hex editors offer a template system that can present the sequence of bytes of a binary file in a structured way, covering part or all of the desired file format. Usually the GUI for a template is a separate tool window next to the main hex editor. Some cheat engine systems consist only of such a template GUI.

Typically, a template is represented as a list of labeled text boxes, such that individual values of a file can be easily edited in the appropriate format (e.g., as string, color, or decimal number). Without template support, it is necessary to find the right offset in a file where the value that is to be changed is stored. Also, raw hex editing may require conversion from hexadecimal to decimal, catering for byte order, or other data type conversion peculiarities.

Templates can be stored as files, thereby exchanged by users, and are often shared publicly over the manufacturer's website. Most if not all hex editors define their own template file format; there is no trend to support a standard or even compatibility between the various formats out in the wild.

Scripting systems

Advanced hex editors have scripting systems that let the user create macro like functionality as a sequence of user interface commands for automating common tasks. This can be used for providing scripts that automatically patch files (e.g., game cheating, modding, or product fixes provided by community) or to write more complex/intelligent templates.

Scripting languages vary widely, often being product specific languages resembling MS-DOS batch files, to systems that support fully-fledged scripting languages such as Lua or Python.

Plugin systems

A few select editors have a plugin system that allows to extend the GUI and add new functionality, usually loading dynamic link libraries written in a C-compatible language.

See also

Related Research Articles

In mathematics and computing, the hexadecimal numeral system is a positional numeral system that represents numbers using a radix (base) of 16. Unlike the decimal system representing numbers using 10 symbols, hexadecimal uses 16 distinct symbols, most often the symbols "0"–"9" to represent values 0 to 9, and "A"–"F" to represent values from 10 to 15.

<span class="mw-page-title-main">Nibble</span> Unit of information, four bits

In computing, a nibble (occasionally nybble, nyble, or nybl to match the spelling of byte) is a four-bit aggregation, or half an octet. It is also known as half-byte or tetrade. In a networking or telecommunication context, the nibble is often called a semi-octet, quadbit, or quartet. A nibble has sixteen (24) possible values. A nibble can be represented by a single hexadecimal digit (0F) and called a hex digit.

The octal numeral system, or oct for short, is the base-8 number system, and uses the digits 0 to 7. This is to say that 10octal represents eight and 100octal represents sixty-four. However, English, like most languages, uses a base-10 number system, hence a true octal system might use different vocabulary.

A computer number format is the internal representation of numeric values in digital device hardware and software, such as in programmable computers and calculators. Numerical values are stored as groupings of bits, such as bytes and words. The encoding between numerical values and bit patterns is chosen for convenience of the operation of the computer; the encoding used by the computer's instruction set generally requires conversion for external use, such as for printing and display. Different types of processors may have different internal representations of numerical values and different conventions are used for integer and real numbers. Most calculations are carried out with number formats that fit into a processor register, but some software systems allow representation of arbitrarily large numbers using multiple words of memory.

In computer science, an escape sequence is a combination of characters that has a meaning other than the literal characters contained therein; it is marked by one or more preceding characters.

A text file is a kind of computer file that is structured as a sequence of lines of electronic text. A text file exists stored as data within a computer file system. In operating systems such as CP/M and MS-DOS, where the operating system does not keep track of the file size in bytes, the end of a text file is denoted by placing one or more special characters, known as an end-of-file (EOF) marker, as padding after the last line in a text file. On modern operating systems such as Microsoft Windows and Unix-like systems, text files do not contain any special EOF character, because file systems on those operating systems keep track of the file size in bytes. Most text files need to have end-of-line delimiters, which are done in a few different ways depending on operating system. Some operating systems with record-orientated file systems may not use new line delimiters and will primarily store text files with lines separated as fixed or variable length records.

In computer programming, Base64 is a group of binary-to-text encoding schemes that represent binary data in sequences of 24 bits that can be represented by four 6-bit Base64 digits.

uuencoding is a form of binary-to-text encoding that originated in the Unix programs uuencode and uudecode written by Mary Ann Horton at UC Berkeley in 1980, for encoding binary data for transmission in email systems.

<span class="mw-page-title-main">Disk editor</span> Computer software

A disk editor is a computer program that allows its user to read, edit, and write raw data on disk drives ; as such, they are sometimes called sector editors, since the read/write routines built into the electronics of most disk drives require to read/write data in chunks of sectors. Many disk editors can also be used to edit the contents of a running computer's memory or a disk image.

<span class="mw-page-title-main">Binary file</span> Non-human-readable computer file encoded in binary form

A binary file is a computer file that is not a text file. The term "binary file" is often used as a term meaning "non-text file". Many binary file formats contain parts that can be interpreted as text; for example, some computer document files containing formatted text, such as older Microsoft Word document files, contain the text of the document but also contain formatting information in binary form.

MBASIC is the Microsoft BASIC implementation of BASIC for the CP/M operating system. MBASIC is a descendant of the original Altair BASIC interpreters that were among Microsoft's first products. MBASIC was one of the two versions of BASIC bundled with the Osborne 1 computer. The name "MBASIC" is derived from the disk file name MBASIC.COM of the BASIC interpreter.

Ascii85, also called Base85, is a form of binary-to-text encoding developed by Paul E. Rutter for the btoa utility. By using five ASCII characters to represent four bytes of binary data, it is more efficient than uuencode or Base64, which use four characters to represent three bytes of data.

The Burroughs B2500 through Burroughs B4900 was a series of mainframe computers developed and manufactured by Burroughs Corporation in Pasadena, California, United States, from 1966 to 1991. They were aimed at the business world with an instruction set optimized for the COBOL programming language. They were also known as Burroughs Medium Systems, by contrast with the Burroughs Large Systems and Burroughs Small Systems.

A binary-to-text encoding is encoding of data in plain text. More precisely, it is an encoding of binary data in a sequence of printable characters. These encodings are necessary for transmission of data when the channel does not allow binary data or is not 8-bit clean. PGP documentation uses the term "ASCII armor" for binary-to-text encoding when referring to Base64.

In computer data, a substitute character (␚) is a control character that is used to pad transmitted data in order to send it in blocks of fixed size, or to stand in place of a character that is recognized to be invalid, erroneous or unrepresentable on a given device. It is also used as an escape sequence in some programming languages.

Intel hexadecimal object file format, Intel hex format or Intellec Hex is a file format that conveys binary information in ASCII text form. It is commonly used for programming microcontrollers, EPROMs, and other types of programmable logic devices and hardware emulators. In a typical application, a compiler or assembler converts a program's source code to machine code and outputs it into a HEX file. Some also use it as a container format holding packets of stream data. Common file extensions used for the resulting files are .HEX or .H86. The HEX file is then read by a programmer to write the machine code into a PROM or is transferred to the target system for loading and execution.

<span class="mw-page-title-main">SREC (file format)</span>

Motorola S-record is a file format, created by Motorola in the mid-1970s, that conveys binary information as hex values in ASCII text form. This file format may also be known as SRECORD, SREC, S19, S28, S37. It is commonly used for programming flash memory in microcontrollers, EPROMs, EEPROMs, and other types of programmable logic devices. In a typical application, a compiler or assembler converts a program's source code to machine code and outputs it into a HEX file. The HEX file is then imported by a programmer to "burn" the machine code into non-volatile memory, or is transferred to the target system for loading and execution.

<span class="mw-page-title-main">GNU Unifont</span> Duospaced bitmap font

GNU Unifont is a free Unicode bitmap font using an intermediate bitmapped font format created by Roman Czyborra. The main Unifont covers all of the Basic Multilingual Plane (BMP). The "upper" companion covers significant parts of the Supplementary Multilingual Plane (SMP). The "Unifont JP" companion contains Japanese kanji present in the JIS X 0213 character set.

<span class="mw-page-title-main">Unicode input</span> Input characters using their Unicode code points

Unicode input is the insertion of a specific Unicode character on a computer by a user; it is a common way to input characters not directly supported by a physical keyboard. Unicode characters can be produced either by selecting them from a display or by typing a certain sequence of keys on a physical keyboard. In addition, a character produced by one of these methods in one web page or document can be copied into another. In contrast to ASCII's 96 element character set, Unicode encodes hundreds of thousands of graphemes (characters) from almost all of the world's written languages and many other signs and symbols besides.

010 Editor is a commercial hex editor and text editor for Microsoft Windows, Linux and macOS. Typically 010 Editor is used to edit text files, binary files, hard drives, processes, tagged data, source code, shell scripts, log files, etc. A large variety of binary data formats can be edited through the use of Binary Templates.

References

  1. "SuperZap" . Retrieved Jun 7, 2015.