Toi (programming language)

Last updated

Toi is an imperative, type-sensitive language that provides the basic functionality of a programming language. The language was designed and developed from the ground-up by Paul Longtine. [1] Written in C, Toi was created with the intent to be an educational experience and serves as a learning tool (or toy, hence the name) for those looking to familiarize themselves with the inner-workings of a programming language. [2]

Contents

Specification [3] [4]

Types

0 VOID    - Null, no data 1 ADDR    - Address type (bytecode) 2 TYPE    - A `type` type 3 PLIST   - Parameter list 4 FUNC    - Function 5 OBJBLDR - Object builder 6 OBJECT  - Object/Class 7 G_PTR   - Generic pointer 8 G_INT   - Generic integer 9 G_FLOAT - Generic double 10 G_CHAR  - Generic character 11 G_STR   - Generic string 12 S_ARRAY - Static array 13 D_ARRAY - Dynamic array 14 H_TABLE - Hashtable 15 G_FIFO  - Stack

Runtime

Runtime context definition

The runtime context keeps track of an individual threads metadata, such as:

  • The operating stack
    • The operating stack where current running instructions push/pop to.
    • refer to STACK DEFINITION
  • Namespace instance
    • Data structure that holds the references to variable containers, also proving the interface for Namespace Levels.
    • refer to NAMESPACE DEFINITION
  • Argument stack
    • Arguments to function calls are pushed on to this stack, flushed on call.
    • refer to STACK DEFINITION, FUNCTION DEFINITION
  • Program counter
    • An interface around bytecode to keep track of traversing line-numbered instructions.
    • refer to PROGRAM COUNTER DEFINITION

This context gives definition to an 'environment' where code is executed.

Namespace definition

A key part to any operational computer language is the notion of a 'Namespace'. This notion of a 'Namespace' refers to the ability to declare a name, along with needed metadata, and call upon the same name to retrieve the values associated with that name.

In this definition, the namespace will provide the following key mechanisms:

  • Declaring a name
  • Assigning a name to a value
  • Retrieving a name's value
  • Handle a name's scope
  • Implicitly move in/out of scopes

The scope argument is a single byte, where the format is as follows:

Namespace|Scope 0000000  |0

Scopes are handled by referencing to either the Global Scope or the Local Scope. The Local Scope is denoted by '0' in the scope argument when referring to names, and this scope is initialized when evaluating any new block of code. When a different block of code is called, a new scope is added as a new Namespace level. Namespace levels act as context switches within function contexts. For example, the local namespace must be 'returned to' if that local namespace context needs to be preserved on return. Pushing 'Namespace levels' ensures that for every n function calls, you can traverse n instances of previous namespaces. For example, take this namespace level graphic, where each Level is a namespace instance:

Level 0: Global namespace, LSB == '1'. Level 1: Namespace level, where Local Level is at 1, LSB == '0'.

When a function is called, another namespace level is created and the local level increases, like so:

Level 0: Global namespace, LSB == '1'. Level 1: Namespace level. Level 2: Namespace level, where Local Level is at 2, LSB == '0'.

Global scope names (LSB == 1 in the scope argument) are persistent through the runtime as they handle all function definitions, objects, and names declared in the global scope. The "Local Level" is at where references that have a scope argument of '0' refer to when accessing names.

The Namespace argument refers to which Namespace the variable exists in. When the namespace argument equals 0, the current namespace is referenced. The global namespace is 1 by default, and any other namespaces must be declared by using the

Variable definition

Variables in this definition provide the following mechanisms:

  • Provide a distinguishable area of typed data
  • Provide a generic container around typed data, to allow for labeling
  • Declare a set of fundamental datatypes, and methods to:
    • Allocate the proper space of memory for the given data type,
    • Deallocate the space of memory a variables data may take up, and
    • Set in place a notion of ownership

For a given variable V, V defines the following attributes

   V -> Ownership    V -> Type    V -> Pointer to typed space in memory

Each variable then can be handled as a generic container.

In the previous section, the notion of Namespace levels was introduced. Much like how names are scoped, generic variable containers must communicate their scope in terms of location within a given set of scopes. This is what is called 'Ownership'. In a given runtime, variable containers can exist in the following structures: A stack instance, Bytecode arguments, and Namespaces

The concept of ownership differentiates variables existing on one or more of the structures. This is set in place to prevent accidental deallocation of variable containers that are not copied, but instead passed as references to these structures.

Function definition

Functions in this virtual machine are a pointer to a set of instructions in a program with metadata about parameters defined.

Object definition

In this paradigm, objects are units that encapsulate a separate namespace and collection of methods.

Bytecode spec

Bytecode is arranged in the following order:

<opcode>, <arg 0>, <arg 1>, <arg 2>

Where the <opcode> is a single byte denoting which subroutine to call with the following arguments when executed. Different opcodes have different argument lengths, some having 0 arguments, and others having 3 arguments.

Interpreting Bytecode Instructions

A bytecode instruction is a single-byte opcode, followed by at maximum 3 arguments, which can be in the following forms:

  • Static (single byte)
  • Name (single word)
  • Address (depending on runtime state, usually a word)
  • Dynamic (size terminated by NULL, followed by (size)*bytes of data)
    • i.e. FF FF 00 <0xFFFF bytes of data>,
    • 01 00 <0x1 bytes of data>,
    • 06 00 <0x6 bytes of data>, etc.

Below is the specification of all the instructions with a short description for each instruction, and instruction category:

Opcode

Keywords:

TOS           - 'Top Of Stack' The top element TBI           - 'To be Implemented' S<[variable]> - Static Argument. N<[variable]> - Name. A<[variable]> - Address Argument. D<[variable]> - Dynamic bytecode argument.

Hex | Memnonic | arguments - description

Stack manipulation

   These subroutines operate on the current-working stack(1).

10 POP S<n>  - pops the stack n times. 11 ROT       - rotates top of stack 12 DUP       - duplicates the top of the stack 13 ROT_THREE - rotates top three elements of stack

Variable management

20 DEC S<scope> S<type> N - declare variable of type 21 LOV S<scope> N - loads reference variable on to stack 22 STV S<scope> N - stores TOS to reference variable 23 CTV S<scope> N D<data> - loads constant into variable 24 CTS D<data>                 - loads constant into stack

Type management

Types are in the air at this moment. I'll detail what types there are when the time comes


30 TYPEOF       - pushes type of TOS on to the stack                        TBI 31 CAST S<type> - Tries to cast TOS to <type>                               TBI

Binary Ops

OPS take the two top elements of the stack, perform an operation and push the result on the stack.


40 ADD  - adds 41 SUB  - subtracts 42 MULT - multiplies 43 DIV  - divides  44 POW  - power, TOS^TOS1                                                   TBI 45 BRT  - base root, TOS root TOS1                                          TBI 46 SIN  - sine                                                              TBI 47 COS  - cosine                                                            TBI 48 TAN  - tangent                                                           TBI  49 ISIN - inverse sine                                                      TBI 4A ICOS - inverse consine                                                   TBI 4B ITAN - inverse tangent                                                   TBI 4C MOD  - modulus                                                           TBI 4D OR   - or's                                                              TBI 4E XOR  - xor's                                                             TBI 4F NAND - and's                                                             TBI

Conditional Expressions

Things for comparison, < > = ! and so on and so forth. Behaves like Arithmetic instructions, besides NOT instruction. Pushes boolean to TOS


50 GTHAN    - Greater than 51 LTHAN    - Less than 52 GTHAN_EQ - Greater than or equal to 53 LTHAN_EQ - Less than or equal to 54 EQ       - Equal to 55 NEQ      - Not equal to 56 NOT      - Inverts TOS if TOS is boolean 57 OR       - Boolean OR 58 AND      - Boolean AND

Loops

60 STARTL - Start of loop 61 CLOOP  - Conditional loop. If TOS is true, continue looping, else break 6E BREAK  - Breaks out of loop 6F ENDL   - End of loop

Code flow

These instructions dictate code flow.


70 GOTO A<addr> - Goes to address 71 JUMPF A<n>   - Goes forward <n> lines 72 IFDO         - If TOS is TRUE, do until done, if not, jump to done 73 ELSE         - Chained with an IFDO statement, if IFDO fails, execute ELSE                   block until DONE is reached. 74 JTR          - jump-to-return.                                           TBI 75 JTE          - jump-to-error. Error object on TOS                        TBI 7D ERR          - Start error block, uses TOS to evaluate error             TBI 7E DONE         - End of block 7F CALL N - Calls function, pushes return value on to STACK.

Generic object interface. Expects object on TOS

80 GETN N<name>   - Returns variable associated with name in object 81 SETN N<name>   - Sets the variable associated with name in object                     Object on TOS, Variable on TOS1 82 CALLM N<name>  - Calls method in object 83 INDEXO         - Index an object, uses argument stack 84 MODO S<OP>     - Modify an object based on op. [+, -, *, /, %, ^ .. etc.]

F - Functions/classes

FF DEFUN NS<type> D<args> - Un-funs everything. no, no- it defines a                                   function. D is its name, S<type> is                                   the return value, D<args> is the args.
FE DECLASS ND<args>       - Defines a class. FD DENS S                 - Declares namespace F2 ENDCLASS               - End of class block F1 NEW S<scope> N         - Instantiates class F0 RETURN                 - Returns from function

Special Bytes

00 NULL          - No-op 01 LC N<name>    - Calls OS function library, i.e. I/O, opening files, etc.  TBI 02 PRINT         - Prints whatever is on the TOS. 03 DEBUG         - Toggle debug mode 0E ARGB          - Builds argument stack 0F PC S          - Primitive call, calls a subroutine A. A list of     TBI                    primitive subroutines providing methods to tweak                    objects this bytecode set cannot touch. Uses argstack.

Compiler/Translator/Assembler

Lexical analysis

Going from code to bytecode is what this section is all about. First off an abstract notation for the code will be broken down into a binary tree as so:

<node>                                      /\                                     /  \                                    /    \                                  <arg><next>

node> can be an argument of its parent node, or the next instruction. Instruction nodes are nodes that will produce an instruction, or multiple based on the bytecode interpretation of its instruction. For example, this line of code:

                                  int x = 3

would translate into:

                                     def                                       /\                                      /  \                                     /    \                                    /      \                                   /        \                                 int        set                                 /\          /\                                /  \        /  \                              null 'x'    'x'  null                                          /\                                         /  \                                       null  3

Functions are expressed as individual binary trees. The root of any file is treated as an individual binary tree, as this is also a function.

The various instruction nodes are as follows:

  • def <type><name>
    • Define a named space in memory with the type specified
      • See the 'TYPES' section under 'OVERVIEW'
  • set <name> <value>
    • Set a named space in memory with value specified
Going from Binary Trees to Bytecode

The various instruction nodes within the tree will call specific functions that will take arguments specified and lookahead and lookbehind to formulate the correct bytecode equivalent.

Developer's Website

The developer of the language, Paul Longtine, operates a publicly available website and blog called banna.tech, named after his online alias 'banna'.

Related Research Articles

C (programming language) General-purpose programming language

C is a general-purpose computer programming language. It was created in the 1970s by Dennis Ritchie, and remains very widely used and influential. By design, C's features cleanly reflect the capabilities of the targeted CPUs. It has found lasting use in operating systems, device drivers, protocol stacks, though decreasingly for application software, and is common in computer architectures that range from the largest supercomputers to the smallest microcontrollers and embedded systems.

Common Lisp (CL) is a dialect of the Lisp programming language, published in ANSI standard document ANSI INCITS 226-1994 (S20018). The Common Lisp HyperSpec, a hyperlinked HTML version, has been derived from the ANSI Common Lisp standard.

Java virtual machine Virtual machine

A Java virtual machine (JVM) is a virtual machine that enables a computer to run Java programs as well as programs written in other languages that are also compiled to Java bytecode. The JVM is detailed by a specification that formally describes what is required in a JVM implementation. Having a specification ensures interoperability of Java programs across different implementations so that program authors using the Java Development Kit (JDK) need not worry about idiosyncrasies of the underlying hardware platform.

Common Intermediate Language (CIL), formerly called Microsoft Intermediate Language (MSIL) or Intermediate Language (IL), is the intermediate language binary instruction set defined within the Common Language Infrastructure (CLI) specification. CIL instructions are executed by a CLI-compatible runtime environment such as the Common Language Runtime. Languages which target the CLI compile to CIL. CIL is object-oriented, stack-based bytecode. Runtimes typically just-in-time compile CIL instructions into native code.

Lua (programming language) Lightweight programming language

Lua is a lightweight, high-level, multi-paradigm programming language designed primarily for embedded use in applications. Lua is cross-platform, since the interpreter of compiled bytecode is written in ANSI C, and Lua has a relatively simple C API to embed it into applications.

Interpreter (computing) Program that executes source code without a separate compilation step

In computer science, an interpreter is a computer program that directly executes instructions written in a programming or scripting language, without requiring them previously to have been compiled into a machine language program. An interpreter generally uses one of the following strategies for program execution:

  1. Parse the source code and perform its behavior directly;
  2. Translate source code into some efficient intermediate representation or object code and immediately execute that;
  3. Explicitly execute stored precompiled bytecode made by a compiler and matched with the interpreter Virtual Machine.

In programming languages, a closure, also lexical closure or function closure, is a technique for implementing lexically scoped name binding in a language with first-class functions. Operationally, a closure is a record storing a function together with an environment. The environment is a mapping associating each free variable of the function with the value or reference to which the name was bound when the closure was created. Unlike a plain function, a closure allows the function to access those captured variables through the closure's copies of their values or references, even when the function is invoked outside their scope.

Bytecode, also termed p-code, is a form of instruction set designed for efficient execution by a software interpreter. Unlike human-readable source code, bytecodes are compact numeric codes, constants, and references that encode the result of compiler parsing and performing semantic analysis of things like type, scope, and nesting depths of program objects.

x86 assembly language is the name for the family of assembly languages which provide some level of backward compatibility with CPUs back to the Intel 8008 microprocessor, which was launched in April 1972. It is used to produce object code for the x86 class of processors.

Pointer (computer programming) Object which stores memory addresses in a computer program

In computer science, a pointer is an object in many programming languages that stores a memory address. This can be that of another value located in computer memory, or in some cases, that of memory-mapped computer hardware. A pointer references a location in memory, and obtaining the value stored at that location is known as dereferencing the pointer. As an analogy, a page number in a book's index could be considered a pointer to the corresponding page; dereferencing such a pointer would be done by flipping to the page with the given page number and reading the text found on that page. The actual format and content of a pointer variable is dependent on the underlying computer architecture.

ActionScript Object-oriented programming language

ActionScript is an object-oriented programming language originally developed by Macromedia Inc.. It is influenced by HyperTalk, the scripting language for HyperCard. It is now an implementation of ECMAScript, though it originally arose as a sibling, both being influenced by HyperTalk. ActionScript code is usually converted to byte-code format by the compiler.

This article compares two programming languages: C# with Java. While the focus of this article is mainly the languages and their features, such a comparison will necessarily also consider some features of platforms and libraries. For a more detailed comparison of the platforms, see Comparison of the Java and .NET platforms.

Partial template specialization is a particular form of class template specialization. Usually used in reference to the C++ programming language, it allows the programmer to specialize only some arguments of a class template, as opposed to explicit full specialization, where all the template arguments are provided.

typedef is a reserved keyword in the programming languages C and C++. It is used to create an additional name (alias) for another data type, but does not create a new type, except in the obscure case of a qualified typedef of an array type where the typedef qualifiers are transferred to the array element type. As such, it is often used to simplify the syntax of declaring complex data structures consisting of struct and union types, although it is also commonly used to provide specific descriptive type names for integer data types of varying sizes.

Recursion (computer science) Use of functions that call themselves

In computer science, recursion is a method of solving a computational problem where the solution depends on solutions to smaller instances of the same problem. Recursion solves such recursive problems by using functions that call themselves from within their own code. The approach can be applied to many types of problems, and recursion is one of the central ideas of computer science.

The power of recursion evidently lies in the possibility of defining an infinite set of objects by a finite statement. In the same manner, an infinite number of computations can be described by a finite recursive program, even if this program contains no explicit repetitions.

This article describes the syntax of the C# programming language. The features described are compatible with .NET Framework and Mono.

Cosmos (operating system) Toolkit for building operating systems

C# Open Source Managed Operating System (Cosmos) is a toolkit for building operating systems, written mostly in the programming language C# and small amounts of a high level assembly language named X#. Cosmos is a backronym, in that the acronym was chosen before the meaning. It is open-source software released under a BSD license.

This comparison of programming languages compares how object-oriented programming languages such as C++, Java, Smalltalk, Object Pascal, Perl, Python, and others manipulate data structures.

Java bytecode is the bytecode-structured instruction set of the Java virtual machine (JVM).

Kotlin is a cross-platform, statically typed, general-purpose programming language with type inference. Kotlin is designed to interoperate fully with Java, and the JVM version of Kotlin's standard library depends on the Java Class Library, but type inference allows its syntax to be more concise. Kotlin mainly targets the JVM, but also compiles to JavaScript or native code via LLVM. Language development costs are borne by JetBrains, while the Kotlin Foundation protects the Kotlin trademark.

References

  1. "bannana/language". GitHub . 17 February 2021.
  2. "banna - useless things". Archived from the original on 2016-10-24. Retrieved 2016-10-23.
  3. "bannana/language". GitHub . 17 February 2021.
  4. "SPECIFICATION - language - some fools attempt at an interpreted language".