Leet (programming language)

Leet (or L33t) is an esoteric programming language based loosely on Brainfuck and named for the resemblance of its source code to the symbolic language "L33t 5p34k". L33t was designed by Stephen McGreal[1] and Alex Mole to be as confusing as possible. It is Turing-complete and supports self-modifying code. Software written in the language can make network connections and may therefore be used to write malware.[citation needed]

Language specification

The basic data unit of L33t is the unsigned byte (big-endian), which can represent ASCII values and numbers in the range 0-255.

The source code is written in "l33t 5p34k", with words separated by spaces or carriage returns. The language uses 10 opcodes, and each word in the source code is translated into an opcode by adding together all the digits in the word, e.g. l33t = 3 + 3 = 6. Non-digit characters contribute nothing to the sum, so it is not necessary to use anything but digits in the code.
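The digit-summing rule can be sketched in a few lines of Python. This is only an illustration of the rule as stated above, not code from any existing interpreter; the function names `word_to_opcode` and `tokenize` are hypothetical.

```python
def word_to_opcode(word: str) -> int:
    """Sum the decimal digits in a word; other characters contribute nothing."""
    return sum(int(ch) for ch in word if ch in "0123456789")


def tokenize(source: str) -> list[int]:
    """Split the source on whitespace and translate each word to its value."""
    # str.split() with no argument splits on spaces, carriage returns and newlines.
    return [word_to_opcode(word) for word in source.split()]
```

For example, `word_to_opcode("l33t")` gives 6, matching the example in the text, and a word containing no digits at all gives 0.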

The language uses a 64K block of memory and two pointers: a memory pointer and an instruction pointer. The l33t interpreter tokenizes all the words in the source to create a sequence of numerical opcodes, and places them in order into the memory block, starting at byte 0. The instruction pointer starts at byte 0 and keeps advancing until it encounters an END. The memory pointer starts at the first byte after the instructions. Memory "wraps": moving either the memory pointer or the instruction pointer past 64K causes it to wrap around to byte 0, and vice versa.
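The memory layout described above might be set up as in the following sketch. The names `MEMORY_SIZE`, `load_program` and `wrap` are hypothetical, and storing word values modulo 256 is an assumption, since the text does not say what happens to a word whose digits sum to more than 255.

```python
MEMORY_SIZE = 64 * 1024  # the 64K block of memory


def load_program(opcodes: list[int]) -> tuple[bytearray, int, int]:
    """Return (memory, instruction_pointer, memory_pointer) for a tokenized program."""
    if len(opcodes) >= MEMORY_SIZE:
        raise ValueError("program does not fit in memory")
    memory = bytearray(MEMORY_SIZE)
    for address, value in enumerate(opcodes):
        memory[address] = value % 256  # assumption: values are stored modulo 256
    instruction_pointer = 0          # execution starts at byte 0
    memory_pointer = len(opcodes)    # first byte after the instructions
    return memory, instruction_pointer, memory_pointer


def wrap(pointer: int) -> int:
    """Wrap a pointer that has moved past either end of memory."""
    return pointer % MEMORY_SIZE
```

Because Python's `%` always returns a non-negative result for a positive modulus, `wrap` handles movement in both directions: one step past the last byte lands on byte 0, and one step before byte 0 lands on the last byte.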

The memory pointer can also be moved into the area of memory occupied by the instructions, so code can be modified at runtime. Similarly, the instruction pointer will continue incrementing or jumping until it encounters an END, so code can be generated at runtime and subsequently executed.

Opcodes

| Value | Opcode | Description |
|---|---|---|
| 0 | NOP | No operation, except to increment the instruction pointer. |
| 1 | WRT | Writes the ASCII value of the byte under the memory pointer to the current connection (see CON). Increments the instruction pointer. |
| 2 | RD | Reads a character from the current connection (see CON) and writes it to the byte currently under the memory pointer. Increments the instruction pointer. |
| 3 | IF | If the byte under the memory pointer is equal to zero, moves the instruction pointer forward to the command following the matching EIF. Otherwise, IF simply increments the instruction pointer. |
| 4 | EIF | If the byte under the memory pointer is not equal to zero, moves the instruction pointer backwards to the command following the matching IF. Otherwise, EIF simply increments the instruction pointer. |
| 5 | FWD | Moves the memory pointer forward by (next word+1) bytes. Adds 2 to the instruction pointer. |
| 6 | BAK | Moves the memory pointer backward by (next word+1) bytes. Adds 2 to the instruction pointer. |
| 7 | INC | Increments the value of the byte under the memory pointer by (next word+1). Adds 2 to the instruction pointer. |
| 8 | DEC | Decrements the value of the byte under the memory pointer by (next word+1). Adds 2 to the instruction pointer. |
| 9 | CON | Reads the 6 bytes starting at the memory pointer (the first 4 bytes specify an IP address in the format 127.0.0.1, and the last 2 bytes combine to make a 16-bit port number; see the note below the table) and opens a connection if possible. If a connection can't be opened, l33t returns the error message "h0s7 5uXz0r5! c4N'7 c0Nn3<7 l0l0l0l0l l4m3R !!!" and resets the current connection to the last successful one (stdin/stdout if there were no previous successful connections). If all 6 bytes read 0, l33t reverts to the local machine's stdin and stdout (this is the default setting upon starting a l33t program). Increments the instruction pointer. Whether or not the connection succeeds, the memory pointer is left where it was; only FWD and BAK move the memory pointer. |
| 10 | END | Closes all open connections and ends the program. The value 10 won't end the program if it is used as data for the opcodes FWD, BAK, INC or DEC. |

  • The port number can be calculated by something along the lines of: portNumber = (byte5 << 8) + byte6
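The opcode semantics can be illustrated with a small interpreter sketch. It is not the code of any interpreter listed in this article, and it rests on several assumptions the text does not settle: byte values wrap modulo 256 on INC and DEC, RD stores 0 at end of input, IF and EIF are well nested (matched with a stack, so self-modifying code that rewrites them is not handled), the forward scan for a matching EIF skips operand cells, unknown values raise an error, and CON to anything other than the all-zero address is left unimplemented. All names (`run`, `matching_eif`, `con_target`) are hypothetical.

```python
MEMORY_SIZE = 64 * 1024
NOP, WRT, RD, IF, EIF, FWD, BAK, INC, DEC, CON, END = range(11)
TAKES_OPERAND = (FWD, BAK, INC, DEC)  # opcodes followed by one data word


def con_target(memory, mp):
    """Decode the 6 bytes at mp into (dotted IPv4 string, 16-bit port)."""
    cells = [memory[(mp + i) % MEMORY_SIZE] for i in range(6)]
    host = ".".join(str(b) for b in cells[:4])
    port = (cells[4] << 8) + cells[5]
    return host, port


def matching_eif(memory, ip):
    """Scan forward from the IF at ip to its matching EIF, skipping operands."""
    depth = 0
    for _ in range(MEMORY_SIZE):
        op = memory[ip]
        if op == IF:
            depth += 1
        elif op == EIF:
            depth -= 1
            if depth == 0:
                return ip
        ip = (ip + (2 if op in TAKES_OPERAND else 1)) % MEMORY_SIZE
    raise ValueError("IF without a matching EIF")


def run(opcodes, input_text=""):
    """Execute a tokenized program and return everything it wrote as a string."""
    if len(opcodes) >= MEMORY_SIZE:
        raise ValueError("program does not fit in memory")
    memory = bytearray(MEMORY_SIZE)
    memory[:len(opcodes)] = bytes(value % 256 for value in opcodes)
    ip, mp = 0, len(opcodes)
    pending_input = iter(input_text)
    output = []
    open_ifs = []  # addresses of IFs whose bodies are currently executing
    while True:
        op = memory[ip]
        amount = memory[(ip + 1) % MEMORY_SIZE] + 1  # "(next word+1)"
        if op == END:
            return "".join(output)
        if op == WRT:
            output.append(chr(memory[mp]))
        elif op == RD:
            ch = next(pending_input, None)
            memory[mp] = ord(ch) % 256 if ch is not None else 0
        elif op == IF:
            if memory[mp] == 0:
                ip = matching_eif(memory, ip)  # resume after the EIF
            else:
                open_ifs.append(ip)
        elif op == EIF:
            if memory[mp] != 0:
                ip = open_ifs[-1]  # resume after the matching IF
            else:
                open_ifs.pop()
        elif op == FWD:
            mp = (mp + amount) % MEMORY_SIZE
        elif op == BAK:
            mp = (mp - amount) % MEMORY_SIZE
        elif op == INC:
            memory[mp] = (memory[mp] + amount) % 256
        elif op == DEC:
            memory[mp] = (memory[mp] - amount) % 256
        elif op == CON:
            host, port = con_target(memory, mp)
            if (host, port) != ("0.0.0.0", 0):  # all zeros means stdin/stdout
                raise NotImplementedError(f"CON to {host}:{port} not sketched")
        elif op != NOP:
            raise ValueError(f"unknown opcode {op} at address {ip}")
        ip = (ip + (2 if op in TAKES_OPERAND else 1)) % MEMORY_SIZE
```

As a usage example, `run([7, 64, 1, 10])` executes INC with data word 64 (adding 65 to the current byte), then WRT, then END, and returns `"A"`.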

Bugs

F00l! teh c0d3 1s b1g3R th4n teh m3m0ry!!1!

You tried to load a program that is too big to fit in the memory. Note that at compile time, one byte is reserved for the memory buffer, so the program's size must be less than the memory size minus one byte.

Byt3 s1z3 must be at l34st 11, n00b!

The byte_size argument of new() was less than 11. The byte size of an interpreter must be at least 11 (to accommodate the opcode values 0 to 10).

L0L!!1!1!! n0 l33t pr0gr4m l04d3d, sUxX0r!

run() called before any program was loaded.

Interpreters

Python

Written by Alex Mole. Does not support the CON opcode, but is otherwise considered the "definitive" interpreter.[citation needed]

Ruby

Written by Eric Redmond. This one contains an implementation of CON.

JavaScript

Written by Phil McCarthy. It is based on the Python interpreter but is more interactive. McCarthy also wrote an interpreter for The Tory Programming Language, a deliberately silly language that bears a striking resemblance to l33t.

C

Interpreters written in C have been produced by Kuisma Salonen (for use on Linux) and by Alecs King.

Perl 6

Written by Gaal Yahas. This interpreter is notable for being the first to come with a debugger.

References