Snowball (programming language)

Last updated

Snowball is a small string processing programming language designed for creating stemming algorithms for use in information retrieval. [1]

The name Snowball was chosen as a tribute to the SNOBOL programming language, with which it shares the concept of string patterns delivering signals that are used to control the flow of the program. The creator of Snowball, Dr. Martin Porter, "toyed with the idea of calling it 'strippergram' ", because it "effectively provides a 'suffix STRIPPER GRAMmar' ". [1]

The Snowball compiler translates a Snowball script (a .sbl file) into program in thread-safe ANSI C, Java, Ada, C#, Go, Javascript, Object Pascal, Python or Rust. For ANSI C, each Snowball script produces a program file and corresponding header file (with .c and .h extensions). [2] The Snowball compiler checks the consistency of its script, and this check was used to discover a typo in a seminal academic paper by Lovins which had remained undetected for 30 years. [3]

The basic datatypes handled by Snowball are strings of characters, signed integers, and boolean truth values, or more simply strings, integers and booleans. Snowball's characters are either 8-bit wide, or 16-bit, depending on the mode of use. In particular, both ASCII and 16-bit Unicode are supported. Like the SNOBOL programming language, the flow of control in Snowball is arranged by the implicit use of signals (each statement returns a true or false value), rather than the explicit use of constructs such as if, then, and break found in C and many other programming languages. [4]

Related Research Articles

C is a general-purpose programming language. It was created in the 1970s by Dennis Ritchie and remains very widely used and influential. By design, C's features cleanly reflect the capabilities of the targeted CPUs. It has found lasting use in operating systems code, device drivers, and protocol stacks, but its use in application software has been decreasing. C is commonly used on computer architectures that range from the largest supercomputers to the smallest microcontrollers and embedded systems.

Icon is a very high-level programming language based on the concept of "goal-directed execution" in which code returns a "success" along with valid values, or a "failure", indicating that there is no valid data to return. The success and failure of a given block of code is used to direct further processing, whereas conventional languages would typically use boolean logic written by the programmer to achieve the same ends. Because the logic for basic control structures is often implicit in Icon, common tasks can be completed with less explicit code.

SNOBOL is a series of programming languages developed between 1962 and 1967 at AT&T Bell Laboratories by David J. Farber, Ralph Griswold and Ivan P. Polonsky, culminating in SNOBOL4. It was one of a number of text-string-oriented languages developed during the 1950s and 1960s; others included COMIT and TRAC.

OCaml is a general-purpose, high-level, multi-paradigm programming language which extends the Caml dialect of ML with object-oriented features. OCaml was created in 1996 by Xavier Leroy, Jérôme Vouillon, Damien Doligez, Didier Rémy, Ascánder Suárez, and others.

Abstract Syntax Notation One (ASN.1) is a standard interface description language (IDL) for defining data structures that can be serialized and deserialized in a cross-platform way. It is broadly used in telecommunications and computer networking, and especially in cryptography.

<span class="mw-page-title-main">Data type</span> Attribute of data

In computer science and computer programming, a data type is a collection or grouping of data values, usually specified by a set of possible values, a set of allowed operations on these values, and/or a representation of these values as machine types. A data type specification in a program constrains the possible values that an expression, such as a variable or a function call, might take. On literal data, it tells the compiler or interpreter how the programmer intends to use the data. Most programming languages support basic data types of integer numbers, floating-point numbers, characters and Booleans.

A string literal or anonymous string is a literal for a string value in the source code of a computer program. Modern programming languages commonly use a quoted sequence of characters, formally "bracketed delimiters", as in, where is a string literal with value. Methods such as escape sequences can be used to avoid the problem of delimiter collision and allow the delimiters to be embedded in a string. There are many alternate notations for specifying string literals especially in complicated cases. The exact notation depends on the programming language in question. Nevertheless, there are general guidelines that most modern programming languages follow.

In computer science, pattern matching is the act of checking a given sequence of tokens for the presence of the constituents of some pattern. In contrast to pattern recognition, the match usually has to be exact: "either it will or will not be a match." The patterns generally have the form of either sequences or tree structures. Uses of pattern matching include outputting the locations of a pattern within a token sequence, to output some component of the matched pattern, and to substitute the matching pattern with some other token sequence.

In computer science, primitive data types are a set of basic data types from which all other data types are constructed. Specifically it often refers to the limited set of data representations in use by a particular processor, which all compiled programs must use. Most processors support a similar set of primitive data types, although the specific representations vary. More generally, "primitive data types" may refer to the standard data types built into a programming language. Data types which are not primitive are referred to as derived or composite.

In computer programming, undefined behavior (UB) is the result of executing a program whose behavior is prescribed to be unpredictable, in the language specification of the programming language in which the source code is written. This is different from unspecified behavior, for which the language specification does not prescribe a result, and implementation-defined behavior that defers to the documentation of another component of the platform.

<span class="mw-page-title-main">ActionScript</span> Object-oriented programming language created for the Flash multimedia platform

ActionScript is an object-oriented programming language originally developed by Macromedia Inc.. It is influenced by HyperTalk, the scripting language for HyperCard. It is now an implementation of ECMAScript, though it originally arose as a sibling, both being influenced by HyperTalk. ActionScript code is usually converted to byte-code format by a compiler.

In computer science, type conversion, type casting, type coercion, and type juggling are different ways of changing an expression from one data type to another. An example would be the conversion of an integer value into a floating point value or its textual representation as a string, and vice versa. Type conversions can take advantage of certain features of type hierarchies or data representations. Two important aspects of a type conversion are whether it happens implicitly (automatically) or explicitly, and whether the underlying data representation is converted from one representation into another, or a given representation is merely reinterpreted as the representation of another data type. In general, both primitive and compound data types can be converted.

In computer science, the Boolean is a data type that has one of two possible values which is intended to represent the two truth values of logic and Boolean algebra. It is named after George Boole, who first defined an algebraic system of logic in the mid 19th century. The Boolean data type is primarily associated with conditional statements, which allow different actions by changing control flow depending on whether a programmer-specified Boolean condition evaluates to true or false. It is a special case of a more general logical data type—logic does not always need to be Boolean.

<span class="mw-page-title-main">CMS-2</span>

CMS-2 is an embedded systems programming language used by the United States Navy. It was an early attempt to develop a standardized high-level computer programming language intended to improve code portability and reusability. CMS-2 was developed primarily for the US Navy’s tactical data systems (NTDS).

In computer programming, a sigil is a symbol affixed to a variable name, showing the variable's datatype or scope, usually a prefix, as in $foo, where $ is the sigil.

In computer science, a literal is a textual representation (notation) of a value as it is written in source code. Almost all programming languages have notations for atomic values such as integers, floating-point numbers, and strings, and usually for Booleans and characters; some also have notations for elements of enumerated types and compound values such as arrays, records, and objects. An anonymous function is a literal for the function type.

The computer programming languages C and Pascal have similar times of origin, influences, and purposes. Both were used to design their own compilers early in their lifetimes. The original Pascal definition appeared in 1969 and a first compiler in 1970. The first version of C appeared in 1972.

In computer programming, an enumerated type is a data type consisting of a set of named values called elements, members, enumeral, or enumerators of the type. The enumerator names are usually identifiers that behave as constants in the language. An enumerated type can be seen as a degenerate tagged union of unit type. A variable that has been declared as having an enumerated type can be assigned any of the enumerators as a value. In other words, an enumerated type has values that are different from each other, and that can be compared and assigned, but are not specified by the programmer as having any particular concrete representation in the computer's memory; compilers and interpreters can represent them arbitrarily.

<span class="mw-page-title-main">Visual Basic (classic)</span> Microsofts programming language based on BASIC and COM

Visual Basic (VB) before .NET, sometimes referred to as Classic Visual Basic, is a third-generation programming language, based on BASIC, and an integrated development environment (IDE), from Microsoft for Windows known for supporting rapid application development (RAD) of graphical user interface (GUI) applications, event-driven programming and both consumption and development of components via the Component Object Model (COM) technology.

Pic Micro Pascala.k.a. PMP is a free Pascal cross compiler for PIC microcontrollers. It is intended to work with the Microchip Technology MPLAB suite installed; it has its own IDE (Scintilla-based) and it is a highly optimized compiler.

References

  1. 1 2 "Snowball", Martin Porter, web page. Retrieved 2 September 2014.
  2. "Snowball: Quick introduction", Martin Porter, web page. Retrieved 2 September 2014.
  3. Martin Porter (December 2001). "Lovins revisited". snowball.tartarus.org. Retrieved 6 August 2024.
  4. "Snowball Manual", Martin Porter, web page. Retrieved 2 September 2014.