Cyclone (programming language)

Last updated
Cyclone
Designed by AT&T Labs
Developer Cornell University
First appeared2002;22 years ago (2002)
Stable release
1.0 / May 8, 2006;17 years ago (2006-05-08)
Website cyclone.thelanguage.org
Influenced by
C
Influenced
Rust, Project Verona

The Cyclone programming language was intended to be a safe dialect of the C language. It avoids buffer overflows and other vulnerabilities that are possible in C programs by design, without losing the power and convenience of C as a tool for system programming. It is no longer supported by its original developers, with the reference tooling not supporting 64-bit platforms. The Rust language is mentioned by the original developers for having integrated many of the same ideas Cyclone had. [1]

Contents

Cyclone development was started as a joint project of AT&T Labs Research and Greg Morrisett's group at Cornell University in 2001. Version 1.0 was released on May 8, 2006. [2]

Language features

Cyclone attempts to avoid some of the common pitfalls of C, while still maintaining its look and performance. To this end, Cyclone places the following limits on programs:

To maintain the tool set that C programmers are used to, Cyclone provides the following extensions:

For a better high-level introduction to Cyclone, the reasoning behind Cyclone and the source of these lists, see this paper.

Cyclone looks, in general, much like C, but it should be viewed as a C-like language.

Pointer types

Cyclone implements three kinds of pointer:

The purpose of introducing these new pointer types is to avoid common problems when using pointers. Take for instance a function, called foo that takes a pointer to an int:

intfoo(int*);

Although the person who wrote the function foo could have inserted NULL checks, let us assume that for performance reasons they did not. Calling foo(NULL); will result in undefined behavior (typically, although not necessarily, a SIGSEGV signal being sent to the application). To avoid such problems, Cyclone introduces the @ pointer type, which can never be NULL. Thus, the "safe" version of foo would be:

intfoo(int@);

This tells the Cyclone compiler that the argument to foo should never be NULL, avoiding the aforementioned undefined behavior. The simple change of * to @ saves the programmer from having to write NULL checks and the operating system from having to trap NULL pointer dereferences. This extra limit, however, can be a rather large stumbling block for most C programmers, who are used to being able to manipulate their pointers directly with arithmetic. Although this is desirable, it can lead to buffer overflows and other "off-by-one"-style mistakes. To avoid this, the ? pointer type is delimited by a known bound, the size of the array. Although this adds overhead due to the extra information stored about the pointer, it improves safety and security. Take for instance a simple (and naïve) strlen function, written in C:

intstrlen(constchar*s){inti=0;if(s==NULL)return0;while(s[i]!='\0'){i++;}returni;}

This function assumes that the string being passed in is terminated by NULL ('\0'). However, what would happen if charbuf[6]={'h','e','l','l','o','!'}; were passed to this string? This is perfectly legal in C, yet would cause strlen to iterate through memory not necessarily associated with the string s. There are functions, such as strnlen which can be used to avoid such problems, but these functions are not standard with every implementation of ANSI C. The Cyclone version of strlen is not so different from the C version:

intstrlen(constchar?s){inti,n=s.size;if(s==NULL)return0;for(i=0;i<n;i++,s++)if(*s=='\0')returni;returnn;}

Here, strlen bounds itself by the length of the array passed to it, thus not going over the actual length. Each of the kinds of pointer type can be safely cast to each of the others, and arrays and strings are automatically cast to ? by the compiler. (Casting from ? to * invokes a bounds check, and casting from ? to @ invokes both a NULL check and a bounds check. Casting from * to ? results in no checks whatsoever; the resulting ? pointer has a size of 1.)

Dangling pointers and region analysis

Consider the following code, in C:

char*itoa(inti){charbuf[20];sprintf(buf,"%d",i);returnbuf;}

The function itoa allocates an array of chars buf on the stack and returns a pointer to the start of buf. However, the memory used on the stack for buf is deallocated when the function returns, so the returned value cannot be used safely outside of the function. While GNU Compiler Collection and other compilers will warn about such code, the following will typically compile without warnings:

char*itoa(inti){charbuf[20],*z;sprintf(buf,"%d",i);z=buf;returnz;}

GNU Compiler Collection can produce warnings for such code as a side-effect of option -O2 or -O3, but there are no guarantees that all such errors will be detected. Cyclone does regional analysis of each segment of code, preventing dangling pointers, such as the one returned from this version of itoa. All of the local variables in a given scope are considered to be part of the same region, separate from the heap or any other local region. Thus, when analyzing itoa, the Cyclone compiler would see that z is a pointer into the local stack, and would report an error.

See also

Related Research Articles

C is a general-purpose computer programming language. It was created in the 1970s by Dennis Ritchie, and remains very widely used and influential. By design, C's features cleanly reflect the capabilities of the targeted CPUs. It has found lasting use in operating systems, device drivers, and protocol stacks, but its use in application software has been decreasing. C is commonly used on computer architectures that range from the largest supercomputers to the smallest microcontrollers and embedded systems.

In computing, a segmentation fault or access violation is a fault, or failure condition, raised by hardware with memory protection, notifying an operating system (OS) the software has attempted to access a restricted area of memory. On standard x86 computers, this is a form of general protection fault. The operating system kernel will, in response, usually perform some corrective action, generally passing the fault on to the offending process by sending the process a signal. Processes can in some cases install a custom signal handler, allowing them to recover on their own, but otherwise the OS default signal handler is used, generally causing abnormal termination of the process, and sometimes a core dump.

<span class="mw-page-title-main">C syntax</span> Set of rules defining correctly structured programs

The syntax of the C programming language is the set of rules governing writing of software in C. It is designed to allow for programs that are extremely terse, have a close relationship with the resulting object code, and yet provide relatively high-level data abstraction. C was the first widely successful high-level language for portable operating-system development.

<span class="mw-page-title-main">Pointer (computer programming)</span> Object which stores memory addresses in a computer program

In computer science, a pointer is an object in many programming languages that stores a memory address. This can be that of another value located in computer memory, or in some cases, that of memory-mapped computer hardware. A pointer references a location in memory, and obtaining the value stored at that location is known as dereferencing the pointer. As an analogy, a page number in a book's index could be considered a pointer to the corresponding page; dereferencing such a pointer would be done by flipping to the page with the given page number and reading the text found on that page. The actual format and content of a pointer variable is dependent on the underlying computer architecture.

A function pointer, also called a subroutine pointer or procedure pointer, is a pointer referencing executable code, rather than data. Dereferencing the function pointer yields the referenced function, which can be invoked and passed arguments just as in a normal function call. Such an invocation is also known as an "indirect" call, because the function is being invoked indirectly through a variable instead of directly through a fixed identifier or address.

In computer programming, undefined behavior (UB) is the result of executing a program whose behavior is prescribed to be unpredictable, in the language specification to which the computer code adheres. This is different from unspecified behavior, for which the language specification does not prescribe a result, and implementation-defined behavior that defers to the documentation of another component of the platform.

In the C++ programming language, a reference is a simple reference datatype that is less powerful but safer than the pointer type inherited from C. The name C++ reference may cause confusion, as in computer science a reference is a general concept datatype, with pointers and C++ references being specific reference datatype implementations. The definition of a reference in C++ is such that it does not need to exist. It can be implemented as a new name for an existing object.

<span class="mw-page-title-main">Dangling pointer</span> Pointer that does not point to a valid object

Dangling pointers and wild pointers in computer programming are pointers that do not point to a valid object of the appropriate type. These are special cases of memory safety violations. More generally, dangling references and wild references are references that do not resolve to a valid destination.

In computing, an uninitialized variable is a variable that is declared but is not set to a definite known value before it is used. It will have some value, but not a predictable one. As such, it is a programming error and a common source of bugs in software.

In some programming languages, const is a type qualifier, which indicates that the data is read-only. While this can be used to declare constants, const in the C family of languages differs from similar constructs in other languages in that it is part of the type, and thus has complicated behavior when combined with pointers, references, composite data types, and type-checking. In other languages, the data is not in a single memory location, but copied at compile time for each use. Languages which use it include C, C++, D, JavaScript, Julia, and Rust.

The computer programming languages C and Pascal have similar times of origin, influences, and purposes. Both were used to design their own compilers early in their lifetimes. The original Pascal definition appeared in 1969 and a first compiler in 1970. The first version of C appeared in 1972.

typedef is a reserved keyword in the programming languages C, C++, and Objective-C. It is used to create an additional name (alias) for another data type, but does not create a new type, except in the obscure case of a qualified typedef of an array type where the typedef qualifiers are transferred to the array element type. As such, it is often used to simplify the syntax of declaring complex data structures consisting of struct and union types, although it is also commonly used to provide specific descriptive type names for integer data types of varying sizes.

sizeof is a unary operator in the programming languages C and C++. It generates the storage size of an expression or a data type, measured in the number of char-sized units. Consequently, the construct sizeof (char) is guaranteed to be 1. The actual number of bits of type char is specified by the preprocessor macro CHAR_BIT, defined in the standard include file limits.h. On most modern computing platforms this is eight bits. The result of sizeof has an unsigned integer type that is usually denoted by size_t.

The C and C++ programming languages are closely related but have many significant differences. C++ began as a fork of an early, pre-standardized C, and was designed to be mostly source-and-link compatible with C compilers of the time. Due to this, development tools for the two languages are often integrated into a single product, with the programmer able to specify C or C++ as their source language.

The C++ programming language has support for string handling, mostly implemented in its standard library. The language standard specifies several string types, some inherited from C, some designed to make use of the language's features, such as classes and RAII. The most-used of these is std::string.

setjmp.h is a header defined in the C standard library to provide "non-local jumps": control flow that deviates from the usual subroutine call and return sequence. The complementary functions setjmp and longjmp provide this functionality.

Protel stands for "Procedure Oriented Type Enforcing Language". It is a programming language created by Nortel Networks and used on telecommunications switching systems such as the DMS-100. Protel-2 is the object-oriented version of Protel.

Secure coding is the practice of developing computer software in such a way that guards against the accidental introduction of security vulnerabilities. Defects, bugs and logic flaws are consistently the primary cause of commonly exploited software vulnerabilities. Through the analysis of thousands of reported vulnerabilities, security professionals have discovered that most vulnerabilities stem from a relatively small number of common software programming errors. By identifying the insecure coding practices that lead to these errors and educating developers on secure alternatives, organizations can take proactive steps to help significantly reduce or eliminate vulnerabilities in software before deployment.

<span class="mw-page-title-main">ATS (programming language)</span> Programming language

In computing, ATS is a programming language designed by Hongwei Xi to unify programming with formal specification. ATS has support for combining theorem proving with practical programming through the use of advanced type systems. A past version of The Computer Language Benchmarks Game has demonstrated that the performance of ATS is comparable to that of the C and C++ programming languages. By using theorem proving and strict type checking, the compiler can detect and prove that its implemented functions are not susceptible to bugs such as division by zero, memory leaks, buffer overflow, and other forms of memory corruption by verifying pointer arithmetic and reference counting before the program compiles. Additionally, by using the integrated theorem-proving system of ATS (ATS/LF), the programmer may make use of static constructs that are intertwined with the operative code to prove that a function conforms to its specification.

A code sanitizer is a programming tool that detects bugs in the form of undefined or suspicious behavior by a compiler inserting instrumentation code at runtime. The class of tools was first introduced by Google's AddressSanitizer of 2012, which uses directly mapped shadow memory to detect memory corruption such as buffer overflows or accesses to a dangling pointer (use-after-free).

References

  1. "Cyclone". cyclone.thelanguage.org. Retrieved 11 December 2023.
  2. "Cyclone". Cornell University .

Presentations: