Coccinelle (software)

Coccinelle
Stable release
1.1.0 [1] / February 25, 2021
Repository
Written in OCaml and Python
Type Static program analysis
License GPLv2
Website coccinelle.gitlabpages.inria.fr/website/

Coccinelle (French for "ladybug") is an open-source utility for matching and transforming the source code of programs written in the C programming language.

Utility

Coccinelle was initially used to aid the evolution of the Linux kernel, providing support for changes to library application programming interfaces (APIs) such as renaming a function, adding a function argument whose value is somehow context-dependent, and reorganizing a data structure.

It can also be used to find defective programming patterns in code (i.e., pieces of code that are erroneous with high probability, such as a possible NULL pointer dereference) without transforming them. In this role, Coccinelle is close to a static analysis tool. Examples of such use are provided by the applications of the Herodotos tool, which keeps track of warnings generated by Coccinelle. [2] [3]
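As an illustration of such a defective pattern, the following C sketch shows a pointer that is dereferenced before it is compared with NULL, next to the intended ordering. The structure and function names are hypothetical and chosen only for this example; the sketch shows the shape of code a matching rule could report, not a Coccinelle rule itself.

```c
#include <stddef.h>

/* Hypothetical structure, used only for illustration. */
struct device {
    int id;
};

/* Suspicious: dev is dereferenced before it is compared with NULL,
 * so the check cannot protect the access above it. */
int device_id_suspicious(struct device *dev)
{
    int id = dev->id;   /* dereference happens first */

    if (dev == NULL)    /* check comes too late */
        return -1;
    return id;
}

/* Intended ordering: test the pointer, then use it. */
int device_id_checked(struct device *dev)
{
    if (dev == NULL)
        return -1;
    return dev->id;
}
```

Both functions behave the same for a valid pointer; only the second is safe to call with NULL.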

Support for Coccinelle is provided by IRILL. Funding for the development has been provided by the Agence Nationale de la Recherche (France), the Danish Research Council for Technology and Production Sciences, and INRIA.

The source code of Coccinelle is licensed under the terms of version 2 of the GNU General Public License (GPL).

Semantic Patch Language

The source code to be matched or replaced is specified using a "semantic patch" syntax based on the patch syntax. [4] The Semantic Patch Language (SmPL) pattern resembles a unified diff with C-like declarations. [5] [6]

Example

@@
expression lock, flags;
expression urb;
@@

 spin_lock_irqsave(lock, flags);
 <...
- usb_submit_urb(urb)
+ usb_submit_urb(urb, GFP_ATOMIC)
 ...>
 spin_unlock_irqrestore(lock, flags);

@@
expression urb;
@@

- usb_submit_urb(urb)
+ usb_submit_urb(urb, GFP_KERNEL)
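Read as two rules, the semantic patch above first adds GFP_ATOMIC to every usb_submit_urb call found between spin_lock_irqsave and spin_unlock_irqrestore, then adds GFP_KERNEL to the calls that remain. The C sketch below shows what the transformed code could look like. The types, macros, and the stub of usb_submit_urb are hypothetical stand-ins so that the example compiles outside the kernel tree, and the two caller functions are invented for illustration.

```c
#include <stddef.h>

/* Hypothetical stand-ins for kernel definitions; none of these are
 * the real kernel types, values, or implementations. */
typedef int gfp_t;
#define GFP_ATOMIC 1
#define GFP_KERNEL 2

struct urb { gfp_t last_flags; };
typedef struct { int held; } spinlock_t;

#define spin_lock_irqsave(lock, flags) \
    do { (flags) = 0; (lock)->held = 1; } while (0)
#define spin_unlock_irqrestore(lock, flags) \
    do { (void)(flags); (lock)->held = 0; } while (0)

/* Stub that records which allocation flag the caller passed. */
static int usb_submit_urb(struct urb *urb, gfp_t mem_flags)
{
    urb->last_flags = mem_flags;
    return 0;
}

/* Shape targeted by the first rule: the call sits between the lock
 * and unlock calls, so it receives GFP_ATOMIC. */
int send_locked(spinlock_t *lock, struct urb *urb)
{
    unsigned long flags;
    int ret;

    spin_lock_irqsave(lock, flags);
    ret = usb_submit_urb(urb, GFP_ATOMIC);
    spin_unlock_irqrestore(lock, flags);
    return ret;
}

/* Shape targeted by the second rule: any remaining one-argument call
 * receives GFP_KERNEL. */
int send_unlocked(struct urb *urb)
{
    return usb_submit_urb(urb, GFP_KERNEL);
}
```

Before the transformation, both callers would have contained the one-argument form usb_submit_urb(urb); the order of the rules matters, because the second rule would otherwise rewrite the calls inside the locked region as well.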

References

  1. "Coccinelle: A Program Matching and Transformation Tool for Systems Code". coccinelle.gitlabpages.inria.fr. Retrieved 2021-03-09.
  2. Palix, Nicolas; Lawall, Julia; Muller, Gilles (2010). "Tracking code patterns over multiple software versions with Herodotos" (PDF). Proceedings of the 9th International Conference on Aspect-Oriented Software Development. ACM. pp. 169–180. doi:10.1145/1739230.1739250. ISBN 9781605589589. S2CID 1082611.
  3. Nicolas Palix. "Nicolas Palix: Herodotos".
  4. Padioleau, Yoann; Lawall, Julia; Muller, Gilles (2007). "Semantic Patches, Documenting and Automating Collateral Evolutions in Linux Device Drivers" (PDF). coccinelle.gitlabpages.inria.fr. Retrieved 2020-08-29.
  5. Valerie Henson (2009-01-20). "Semantic patching with Coccinelle". Linux Weekly News. Retrieved 2011-04-25.
  6. Wolfram Sang (2010-03-30). "Evolutionary development of a semantic patch using Coccinelle". Linux Weekly News. Retrieved 2011-04-25.