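In computer programming, a magic number is any of the following:
- a unique value with unexplained meaning or multiple occurrences which could (preferably) be replaced with a named constant;
- a constant numerical or text value used to identify a file format or protocol;
- a distinctive unique value that is unlikely to be mistaken for other meanings (e.g., globally unique identifiers or debug fill values).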
The term magic number or magic constant refers to the anti-pattern of using numbers directly in source code. This has been referred to as breaking one of the oldest rules of programming, dating back to the COBOL, FORTRAN and PL/1 manuals of the 1960s. [1] The use of unnamed magic numbers in code obscures the developers' intent in choosing that number, [2] increases opportunities for subtle errors (e.g. is every digit correct in 3.14159265358979323846, and can it safely be rounded to 3.14159? [3]) and makes it more difficult for the program to be adapted and extended in the future. [4] Replacing all significant magic numbers with named constants (also called explanatory variables) makes programs easier to read, understand and maintain. [5]
Names chosen to be meaningful in the context of the program can result in code that is more easily understood by a maintainer who is not the original author (or even by the original author after a period of time). [6] An example of an uninformatively named constant is `int SIXTEEN = 16`, while `int NUMBER_OF_BITS = 16` is more descriptive.
The problems associated with magic 'numbers' described above are not limited to numerical types and the term is also applied to other data types where declaring a named constant would be more flexible and communicative. [1] Thus, declaring `const string testUserName = "John"` is better than several occurrences of the 'magic value' `"John"` in a test suite.
For example, if it is required to randomly shuffle the values in an array representing a standard pack of playing cards, this pseudocode does the job using the Fisher–Yates shuffle algorithm:

    for i from 1 to 52
        j := i + randomInt(53 - i) - 1
        a.swapEntries(i, j)

where `a` is an array object, the function `randomInt(x)` chooses a random integer between 1 and x, inclusive, and `swapEntries(i, j)` swaps the ith and jth entries in the array. In the preceding example, `52` and `53` are magic numbers, and it is not obvious that they are related to each other. It is considered better programming style to write the following:

    int deckSize := 52
    for i from 1 to deckSize
        j := i + randomInt(deckSize + 1 - i) - 1
        a.swapEntries(i, j)

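This is preferable for several reasons:
- It is easier to alter the value, because it is not duplicated. Changing the deck size in the first example means finding and editing both 52 and 53, whereas changing the value of the `deckSize` variable in the second example would be a simple, one-line change.
- It helps detect typos. A mistyped numeric literal goes unnoticed by the compiler, whereas typing "`dekSize`" instead of "`deckSize`" would result in the compiler's warning that `dekSize` is undeclared.
- It facilitates parameterization. To generalize the example into a procedure that shuffles a deck of any number of cards, it is sufficient to turn `deckSize` into a parameter of that procedure, whereas the first example would require several changes.

    function shuffle (int deckSize)
        for i from 1 to deckSize
            j := i + randomInt(deckSize + 1 - i) - 1
            a.swapEntries(i, j)
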
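
The parameterized pseudocode above might look as follows in Python. This is a minimal sketch rather than a reference implementation: the names `DECK_SIZE` and `shuffle_deck` are illustrative, indices are zero-based instead of one-based, and the standard library's `random.randrange` stands in for `randomInt`.

```python
import random

DECK_SIZE = 52  # cards in a standard pack; the only place the number appears


def shuffle_deck(cards, rng=random):
    """Shuffle `cards` in place using the Fisher-Yates algorithm."""
    deck_size = len(cards)
    for i in range(deck_size - 1):
        # Choose j uniformly from i .. deck_size - 1 (upper bound is exclusive).
        j = rng.randrange(i, deck_size)
        cards[i], cards[j] = cards[j], cards[i]
    return cards


deck = shuffle_deck(list(range(DECK_SIZE)))
```

Because the size is derived from the argument, the same function shuffles a deck of any length, and the literal 52 appears exactly once.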
Disadvantages are:
- It may be slower to evaluate the expression `deckSize + 1` at run-time than the value 53, although most modern compilers and interpreters will notice that `deckSize` has been declared as a constant and pre-calculate the value 53 in the compiled code. Even when that is not an option, loop optimization will move the addition so that it is performed before the loop. There is therefore usually no (or negligible) speed penalty compared to using magic numbers in code. In any case, the tiny calculation cost must be weighed against the cost of debugging and the time spent trying to understand non-explanatory code.
In some contexts, the use of unnamed numerical constants is generally accepted (and arguably "not magic"). While such acceptance is subjective, and often depends on individual coding habits, the following are common examples:
- the use of 0 and 1 as initial or incremental values in a for loop, such as `for (int i = 0; i < max; i += 1)`
- the use of 2 to check whether a number is even or odd, as in `isEven = (x % 2 == 0)`, where `%` is the modulo operator
- the use of simple arithmetic constants, e.g., in expressions such as `circumference = 2 * Math.PI * radius`, [1] or for calculating the discriminant of a quadratic equation as `d = b^2 − 4*a*c`
- exponents in expressions such as `(f(x) ** 2 + f(y) ** 2) ** 0.5` for the square root of f(x)^2 + f(y)^2

The constants 1 and 0 are sometimes used to represent the Boolean values true and false in programming languages without a Boolean type, such as older versions of C. Most modern programming languages provide a `boolean` or `bool` primitive type and so the use of 0 and 1 is ill-advised. The numeric convention can also be confusing, since 0 sometimes means programmatic success (when -1 means failure) and in other cases means failure (when 1 means success).
In C and C++, 0 represents the null pointer. As with Boolean values, the C standard library includes a macro definition `NULL` whose use is encouraged. Other languages provide a specific `null` or `nil` value and when this is the case no alternative should be used. The typed pointer constant `nullptr` has been introduced with C++11.
Format indicators were first used in early Version 7 Unix source code.
Unix was ported to one of the first DEC PDP-11/20s, which did not have memory protection. So early versions of Unix used the relocatable memory reference model. [7] Pre-Sixth Edition Unix versions read an executable file into memory and jumped to the first low memory address of the program, relative address zero. With the development of paged versions of Unix, a header was created to describe the executable image components. Also, a branch instruction was inserted as the first word of the header to skip the header and start the program. In this way a program could be run in the older relocatable memory reference (regular) mode or in paged mode. As more executable formats were developed, new constants were added by incrementing the branch offset. [8]
In the Sixth Edition source code of the Unix program loader, the exec() function read the executable (binary) image from the file system. The first 8 bytes of the file were a header containing the sizes of the program (text) and initialized (global) data areas. Also, the first 16-bit word of the header was compared to two constants to determine if the executable image contained relocatable memory references (normal), the newly implemented paged read-only executable image, or the separated instruction and data paged image. [9] There was no mention of the dual role of the header constant, but the high-order byte of the constant was, in fact, the operation code for the PDP-11 branch instruction (octal 000407 or hex 0107). Adding seven to the program counter showed that if this constant was executed, it would branch the Unix exec() service over the executable image's eight-byte header and start the program.
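The arithmetic behind the constant's dual role can be checked in a few lines of Python. The sketch below assumes the usual PDP-11 encoding of the unconditional branch instruction (opcode in the high-order byte, word offset in the low-order byte); the variable names are illustrative and do not come from the Unix sources.

```python
MAGIC = 0o407  # header constant, written in octal as in the text

opcode_byte = MAGIC >> 8    # high-order byte: the branch opcode
word_offset = MAGIC & 0xFF  # low-order byte: the branch offset

print(f"octal {MAGIC:06o} = hex {MAGIC:04X}")
print("opcode byte:", opcode_byte)
print("branch offset:", word_offset)
```

Read as data, the word identifies the executable format; read as an instruction, the same bits encode a branch with an offset of seven.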
Since the Sixth and Seventh Editions of Unix employed paging code, the dual role of the header constant was hidden. That is, the exec() service read the executable file header (meta) data into a kernel space buffer, but read the executable image into user space, thereby not using the constant's branching feature. Magic number creation was implemented in the Unix linker and loader and magic number branching was probably still used in the suite of stand-alone diagnostic programs that came with the Sixth and Seventh Editions. Thus, the header constant did provide an illusion and met the criteria for magic.
In Version Seven Unix, the header constant was not tested directly, but assigned to a variable labeled ux_mag [10] and subsequently referred to as the magic number. Probably because of its uniqueness, the term magic number came to mean executable format type, then expanded to mean file system type, and expanded again to mean any type of file.
Magic numbers are common in programs across many operating systems. Magic numbers implement strongly typed data and are a form of in-band signaling to the controlling program that reads the data type(s) at program run-time. Many files have such constants that identify the contained data. Detecting such constants in files is a simple and effective way of distinguishing between many file formats and can yield further run-time information.
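- Compiled Java class files (bytecode) start with hex `CAFEBABE`. When compressed with Pack200 the bytes are changed to `CAFED00D`.
- GIF image files have the ASCII code for "GIF89a" (`47 49 46 38 39 61`) or "GIF87a" (`47 49 46 38 37 61`).
- JPEG image files begin with `FF D8` and end with `FF D9`. JPEG/JFIF files contain the null terminated string "JFIF" (`4A 46 49 46 00`). JPEG/Exif files contain the null terminated string "Exif" (`45 78 69 66 00`), followed by more metadata about the file.
- PNG image files begin with an 8-byte signature (`89 50 4E 47 0D 0A 1A 0A`). That signature contains various newline characters to permit detecting unwarranted automated newline conversions, such as transferring the file using FTP with the ASCII transfer mode instead of the binary mode. [11]
- Standard MIDI audio files have the ASCII code for "MThd" (`4D 54 68 64`) followed by more metadata.
- Unix or Linux scripts may start with a shebang ("#!", `23 21`) followed by the path to an interpreter, if the interpreter is likely to be different from the one from which the script was invoked.
- ELF executables start with the byte `7F` followed by "ELF" (`7F 45 4C 46`).
- PostScript files and programs start with "%!" (`25 21`).
- PDF files start with "%PDF" (`25 50 44 46`).
- DOS MZ executable files and the EXE stub of Microsoft Windows PE (Portable Executable) files start with the characters "MZ" (`4D 5A`), the initials of the designer of the file format, Mark Zbikowski. The definition allows the uncommon "ZM" (`5A 4D`) as well for dosZMXP, a non-PE EXE. [12]
- The Berkeley Fast File System superblock format is identified as either `19 54 01 19` or `01 19 54` depending on version; both represent the birthday of the author, Marshall Kirk McKusick.
- The Master Boot Record of bootable storage devices on IBM PC compatibles has `55 AA` as its last two bytes.
- PEF executable files contain the ASCII code for "Joy!" (`4A 6F 79 21`) as a prefix.
- TIFF files begin with either "II" or "MM" followed by 42 as a two-byte integer in little or big endian byte ordering. "II" is for Intel, which uses little endian byte ordering, so the magic number is `49 49 2A 00`. "MM" is for Motorola, which uses big endian byte ordering, so the magic number is `4D 4D 00 2A`.
- Unicode text files encoded in UTF-16 often start with the byte order mark to detect endianness (`FE FF` for big endian and `FF FE` for little endian). On Microsoft Windows, UTF-8 text files often start with the UTF-8 encoding of the same character, `EF BB BF`.
- LLVM bitcode files start with "BC" (`42 43`).
- Microsoft Compound File Binary Format files (mostly known as one of the older formats of Microsoft Office documents) start with `D0 CF 11 E0`, which is visually suggestive of the word "DOCFILE0".
- Headers in ZIP files begin with "PK" (`50 4B 03 04`), where "PK" are the initials of Phil Katz, author of DOS compression utility PKZIP.
- Headers in 7z files begin with "7z" (full magic number: `37 7A BC AF 27 1C`).

The Unix utility program `file` can read and interpret magic numbers from files, and the file which is used to parse the information is called magic. The Windows utility TrID has a similar purpose.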
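
Detection by leading bytes can be sketched in a few lines of Python. The example below is a toy illustration using a handful of the signatures listed above; the names `SIGNATURES`, `sniff` and `sniff_file` are hypothetical, and this is not how `file` or TrID are implemented (real tools consult a much larger database and also test bytes at other offsets).

```python
# (signature bytes, format name); all signatures are anchored at offset 0.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"GIF89a", "GIF"),
    (b"GIF87a", "GIF"),
    (b"\xff\xd8", "JPEG"),
    (b"%PDF", "PDF"),
    (b"\x7fELF", "ELF"),
    (b"PK\x03\x04", "ZIP"),
    (b"7z\xbc\xaf\x27\x1c", "7z"),
    (b"\xca\xfe\xba\xbe", "Java class"),
]


def sniff(data: bytes) -> str:
    """Return the format whose signature prefixes `data`, else 'unknown'."""
    for magic, name in SIGNATURES:
        if data.startswith(magic):
            return name
    return "unknown"


def sniff_file(path: str) -> str:
    """Classify a file on disk by reading only its first eight bytes."""
    with open(path, "rb") as handle:
        return sniff(handle.read(8))
```

For instance, `sniff(bytes.fromhex("89504e470d0a1a0a"))` yields `"PNG"`, while arbitrary text falls through to `"unknown"`.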
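- The OSCAR protocol, used in AIM/ICQ, prefixes requests with `2A`.
- In the RFB protocol used by VNC, a client starts its conversation with a server by sending "RFB" (`52 46 42`, for "Remote Frame Buffer") followed by the client's protocol version number.
- In the SMB protocol used by Microsoft Windows, each SMB request or server reply begins with `FF 53 4D 42`, or `"\xFFSMB"`.
- In the MSRPC protocol used by Microsoft Windows, each TCP-based request begins with `05` (representing Microsoft DCE/RPC Version 5), followed immediately by a `00` or `01` for the minor version. In UDP-based MSRPC requests the first byte is always `04`.
- COM and DCOM marshalled interfaces, called OBJREFs, always start with the byte sequence "MEOW" (`4D 45 4F 57`). Debugging extensions (used for DCOM channel hooking) are prefaced with the byte sequence "MARB" (`4D 41 52 42`).
- Unencrypted BitTorrent requests begin with a single byte containing the value `19` representing the header length, followed immediately by the phrase "BitTorrent protocol" at byte position 1.
- eDonkey2000/eMule traffic begins with a single byte identifying the client: `E3` represents an eDonkey client, `C5` represents eMule, and `D4` represents compressed eMule.
- The first four bytes of a block in the Bitcoin blockchain contain a magic number that serves as the network identifier. The constant `0xD9B4BEF9` indicates the main network, while the constant `0xDAB5BFFA` indicates the testnet.
- SSL transactions always begin with a "client hello" message. Typically an SSL version 2 client hello message is prefixed with `80` and an SSLv3 server response to a client hello begins with `16` (though this may vary).
- DHCP packets use a "magic cookie" value of `0x63 0x82 0x53 0x63` at the start of the options section of the packet. This value is included in all DHCP packet types.
- HTTP/2 connections are opened with the preface `0x505249202a20485454502f322e300d0a0d0a534d0d0a0d0a`, or `"PRI * HTTP/2.0\r\n\r\nSM\r\n\r\n"`. The preface is designed to avoid the processing of frames by servers and intermediaries which support earlier versions of HTTP but not 2.0.

Magic numbers are common in API functions and interfaces across many operating systems, including DOS, Windows and NetWare:
- IBM PC-compatible BIOSes use magic values `0000` and `1234` to decide if the system should count up memory or not on reboot, thereby performing a cold or a warm boot. These values are also used by EMM386 memory managers intercepting boot requests. [13] BIOSes also use magic values `55 AA` to determine if a disk is bootable. [14]

This is a list of limits of data storage types: [16]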
Decimal | Hex | Description |
---|---|---|
18,446,744,073,709,551,615 | FFFF FFFF FFFF FFFF | The maximum unsigned 64 bit value (2^64 − 1) |
9,223,372,036,854,775,807 | 7FFF FFFF FFFF FFFF | The maximum signed 64 bit value (2^63 − 1) |
9,007,199,254,740,992 | 0020 0000 0000 0000 | The largest consecutive integer in IEEE 754 double precision (2^53) |
4,294,967,295 | FFFF FFFF | The maximum unsigned 32 bit value (2^32 − 1) |
2,147,483,647 | 7FFF FFFF | The maximum signed 32 bit value (2^31 − 1) |
16,777,216 | 0100 0000 | The largest consecutive integer in IEEE 754 single precision (2^24) |
65,535 | FFFF | The maximum unsigned 16 bit value (2^16 − 1) |
32,767 | 7FFF | The maximum signed 16 bit value (2^15 − 1) |
255 | FF | The maximum unsigned 8 bit value (2^8 − 1) |
127 | 7F | The maximum signed 8 bit value (2^7 − 1) |
−128 | 80 | Minimum signed 8 bit value |
−32,768 | 8000 | Minimum signed 16 bit value |
−2,147,483,648 | 8000 0000 | Minimum signed 32 bit value |
−9,223,372,036,854,775,808 | 8000 0000 0000 0000 | Minimum signed 64 bit value |
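
Several rows of the table can be reproduced with Python's arbitrary-precision integers and the standard `struct` module. The sketch below is only a spot check; the constant names and the helper `as_float32` are illustrative.

```python
import struct

UINT64_MAX = 2**64 - 1
INT64_MAX = 2**63 - 1
DOUBLE_EXACT_LIMIT = 2**53  # largest consecutive integer in double precision
FLOAT_EXACT_LIMIT = 2**24   # largest consecutive integer in single precision


def as_float32(value):
    """Round-trip a number through IEEE 754 single precision."""
    return struct.unpack("<f", struct.pack("<f", float(value)))[0]


print(f"{UINT64_MAX:,} = {UINT64_MAX:X}")
print(f"{INT64_MAX:,} = {INT64_MAX:X}")

# One past each limit is not representable and rounds back onto the limit.
print(float(DOUBLE_EXACT_LIMIT + 1) == float(DOUBLE_EXACT_LIMIT))
print(as_float32(FLOAT_EXACT_LIMIT + 1) == as_float32(FLOAT_EXACT_LIMIT))

# Two's complement bit patterns of the signed minima (big-endian hex).
print(struct.pack(">b", -128).hex().upper())
print(struct.pack(">h", -32768).hex().upper())
```

The two comparisons show why 2^53 and 2^24 count as limits: beyond them, neighbouring integers can no longer be told apart in the respective floating-point format.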
It is possible to create or alter globally unique identifiers (GUIDs) so that they are memorable, but this is highly discouraged as it compromises their strength as near-unique identifiers. [17] [18] The specifications for generating GUIDs and UUIDs are quite complex, which is what leads to them being virtually unique, if properly implemented. [19]
Microsoft Windows product ID numbers for Microsoft Office products sometimes end with `0000-0000-0000000FF1CE` ("OFFICE"), such as {`90160000-008C-0000-0000-0000000FF1CE`}, the product ID for the "Office 16 Click-to-Run Extensibility Component".
Java uses several GUIDs starting with `CAFEEFAC`. [20]
In the GUID Partition Table of the GPT partitioning scheme, BIOS Boot partitions use the special GUID {`21686148-6449-6E6F-744E-656564454649`} [21] which does not follow the GUID definition; instead, it is formed by using the ASCII codes for the string "`Hah!IdontNeedEFI`" partially in little endian order. [22]
Magic debug values are specific values written to memory during allocation or deallocation, so that it will later be possible to tell whether or not they have become corrupted, and to make it obvious when values taken from uninitialized memory are being used. Memory is usually viewed in hexadecimal, so memorable repeating or hexspeak values are common. Numerically odd values may be preferred so that processors without byte addressing will fault when attempting to use them as pointers (which must fall at even addresses). Values should be chosen that are away from likely addresses (the program code, static data, heap data, or the stack). Similarly, they may be chosen so that they are not valid codes in the instruction set for the given architecture.
Since it is very unlikely, although possible, that a 32-bit integer would take this specific value, the appearance of such a number in a debugger or memory dump most likely indicates an error such as a buffer overflow or an uninitialized variable.
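A toy model shows how such fill values make errors visible. The sketch below simulates a debug allocator over a Python `bytearray`; the fill bytes `0xCD` and `0xFD` are borrowed from the Microsoft debug-heap entries in the table that follows, while the function names and the four-byte guard size are assumptions made purely for illustration.

```python
UNINITIALIZED = 0xCD  # fill byte for fresh allocations (cf. CDCDCDCD)
GUARD = 0xFD          # fill byte for guard zones (cf. FDFDFDFD)
GUARD_SIZE = 4        # hypothetical guard width, in bytes


def debug_alloc(size):
    """Return a block laid out as [guard][payload][guard], pre-filled."""
    return bytearray(
        [GUARD] * GUARD_SIZE + [UNINITIALIZED] * size + [GUARD] * GUARD_SIZE
    )


def guards_intact(block):
    """True if neither guard zone has been overwritten."""
    guard = bytes([GUARD]) * GUARD_SIZE
    return block[:GUARD_SIZE] == guard and block[-GUARD_SIZE:] == guard


def looks_uninitialized(block, offset, length=4):
    """True if `length` payload bytes at `offset` still hold the fill value."""
    start = GUARD_SIZE + offset
    return block[start:start + length] == bytes([UNINITIALIZED]) * length


block = debug_alloc(8)
fresh = looks_uninitialized(block, 0)  # payload has not been written yet
# Simulate a buffer overflow: 12 bytes written into an 8-byte payload.
block[GUARD_SIZE:GUARD_SIZE + 12] = b"A" * 12
overflow_detected = not guards_intact(block)
```

Reading a word that still equals the fill pattern suggests use of uninitialized memory, and a damaged guard zone suggests a write past the end of the allocation.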
Famous and common examples include:
Code | Description |
---|---|
00008123 | Used in MS Visual C++. Deleted pointers are set to this value so that they throw an exception if they are used afterwards; it is a more recognizable alias for the zero address. It is activated with the Security Development Lifecycle (/sdl) option. [23] |
..FACADE | "Facade", Used by a number of RTOSes |
1BADB002 | "1 bad boot", Multiboot header magic number [24] |
8BADF00D | "Ate bad food", Indicates that an Apple iOS application has been terminated because a watchdog timeout occurred. [25] |
A5A5A5A5 | Used in embedded development because the alternating bit pattern (1010 0101) creates an easily recognized pattern on oscilloscopes and logic analyzers. |
A5 | Used in FreeBSD's PHK malloc(3) for debugging when /etc/malloc.conf is symlinked to "-J" to initialize all newly allocated memory as this value is not a NULL pointer or ASCII NUL character. |
ABABABAB | Used by Microsoft's debug HeapAlloc() to mark "no man's land" guard bytes after allocated heap memory. [26] |
ABADBABE | "A bad babe", Used by Apple as the "Boot Zero Block" magic number |
ABBABABE | "ABBA babe", used by Driver Parallel Lines memory heap. |
ABADCAFE | "A bad cafe", Used to initialize all unallocated memory (Mungwall, AmigaOS) |
B16B00B5 | "Big Boobs", Formerly required by Microsoft's Hyper-V hypervisor to be used by Linux guests as the upper half of their "guest id" [27] |
BAADF00D | "Bad food", Used by Microsoft's debug HeapAlloc() to mark uninitialized allocated heap memory [26] |
BAAAAAAD | "Baaaaaad", Indicates that the Apple iOS log is a stackshot of the entire system, not a crash report [25] |
BAD22222 | "Bad too repeatedly", Indicates that an Apple iOS VoIP application has been terminated because it resumed too frequently [25] |
BADBADBADBAD | "Bad bad bad bad", Burroughs large systems "uninitialized" memory (48-bit words) |
BADC0FFEE0DDF00D | "Bad coffee odd food", Used on IBM RS/6000 64-bit systems to indicate uninitialized CPU registers |
BADDCAFE | "Bad cafe", On Sun Microsystems' Solaris, marks uninitialized kernel memory (KMEM_UNINITIALIZED_PATTERN) |
BBADBEEF | "Bad beef", Used in WebKit, for particularly unrecoverable errors [28] |
BEBEBEBE | Used by AddressSanitizer to fill allocated but not initialized memory [29] |
BEEFCACE | "Beef cake", Used by Microsoft .NET as a magic number in resource files |
C00010FF | "Cool off", Indicates Apple iOS app was killed by the operating system in response to a thermal event [25] |
CAFEBABE | "Cafe babe", Used by Java for class files |
CAFED00D | "Cafe dude", Used by Java for their pack200 compression |
CAFEFEED | "Cafe feed", Used by Sun Microsystems' Solaris debugging kernel to mark kmemfree() memory |
CCCCCCCC | Used by Microsoft's C++ debugging runtime library and many DOS environments to mark uninitialized stack memory. CC is the opcode of the INT 3 debug breakpoint interrupt on x86 processors. [30] |
CDCDCDCD | Used by Microsoft's C/C++ debug malloc() function to mark uninitialized heap memory, usually returned from HeapAlloc() [26] |
0D15EA5E | "Zero Disease", Used as a flag to indicate regular boot on the GameCube and Wii consoles |
DDDDDDDD | Used by MicroQuill's SmartHeap and Microsoft's C/C++ debug free() function to mark freed heap memory [26] |
DEAD10CC | "Dead lock", Indicates that an Apple iOS application has been terminated because it held on to a system resource while running in the background [25] |
DEADBABE | "Dead babe", Used at the start of Silicon Graphics' IRIX arena files |
DEADBEEF | "Dead beef", Famously used on IBM systems such as the RS/6000, also used in the classic Mac OS operating systems, OPENSTEP Enterprise, and the Commodore Amiga. On Sun Microsystems' Solaris, marks freed kernel memory (KMEM_FREE_PATTERN) |
DEADCAFE | "Dead cafe", Used by Microsoft .NET as an error number in DLLs |
DEADC0DE | "Dead code", Used as a marker in OpenWRT firmware to signify the beginning of the to-be created jffs2 file system at the end of the static firmware |
DEADFA11 | "Dead fail", Indicates that an Apple iOS application has been force quit by the user [25] |
DEADF00D | "Dead food", Used by Mungwall on the Commodore Amiga to mark allocated but uninitialized memory [31] |
DEFEC8ED | "Defecated", Used for OpenSolaris core dumps |
DEADDEAD | "Dead Dead" indicates that the user deliberately initiated a crash dump from either the kernel debugger or the keyboard under Microsoft Windows. [32] |
D00D2BAD | "Dude, Too Bad", Used by Safari crashes on macOS Big Sur. [33] |
D00DF33D | "Dude feed", Used by the devicetree to mark the start of headers. [34] |
EBEBEBEB | From MicroQuill's SmartHeap |
FADEDEAD | "Fade dead", Comes at the end to identify every AppleScript script |
FDFDFDFD | Used by Microsoft's C/C++ debug malloc() function to mark "no man's land" guard bytes before and after allocated heap memory, [26] and some debug Secure C-Runtime functions implemented by Microsoft (e.g. strncat_s) [35] |
FEE1DEAD | "Feel dead", Used by Linux reboot() syscall |
FEEDFACE | "Feed face", Seen in PowerPC Mach-O binaries on Apple Inc.'s Mac OS X platform. On Sun Microsystems' Solaris, marks the red zone (KMEM_REDZONE_PATTERN). Also used by VLC player and some IP cameras in the RTP/RTCP protocol: VLC player sends the four bytes in the order of the endianness of the system, and some IP cameras expect the player to send this magic number and do not start the stream if it is not received. |
FEEEFEEE | "Fee fee", Used by Microsoft's debug HeapFree() to mark freed heap memory. Some nearby internal bookkeeping values may have the high word set to FEEE as well. [26] |
Most of these are 32 bits long, the word size of most 32-bit architecture computers.
The prevalence of these values in Microsoft technology is no coincidence; they are discussed in detail in Steve Maguire's book Writing Solid Code from Microsoft Press, which gives a variety of criteria for choosing such values.
Since they were often used to mark areas of memory that were essentially empty, some of these terms came to be used in phrases meaning "gone, aborted, flushed from memory"; e.g. "Your program is DEADBEEF".