Programming idiom

Last updated

In computer programming, a programming idiom or code idiom is a group of code fragments sharing an equivalent semantic role, [1] which recurs frequently across software projects often expressing a special feature of a recurring construct in one or more programming languages or libraries. This definition is rooted in the definition of "idiom" as used in the field of linguistics. Developers recognize programming idioms by associating meaning (semantic role) to one or more syntactical expressions within code snippets (code fragments). The idiom can be seen as an action on a programming concept underlying a pattern in code, which is represented in implementation by contiguous or scattered code fragments. These fragments are available in several programming languages, frameworks or even libraries. Generally speaking, a programming idiom's semantic role is a natural language expression of a simple task, algorithm, or data structure that is not a built-in feature in the programming language being used, or, conversely, the use of an unusual or notable feature that is built into a programming language.

Contents

Knowing the idioms associated with a programming language and how to use them is an important part of gaining fluency in that language. It also helps to transfer knowledge in the form of analogies from one language or framework to another. Such idiomatic knowledge is widely used in crowdsourced repositories to help developers overcome programming barriers. [2] Mapping code idioms to idiosyncrasies can be a helpful way to navigate the tradeoffs between generalization and specificity. By identifying common patterns and idioms, developers can create mental models and schemata that help them quickly understand and navigate new code. Furthermore, by mapping these idioms to idiosyncrasies and specific use cases, developers can ensure that they are applying the correct approach and not overgeneralizing it. One way to do this is by creating a reference or documentation that maps common idioms to specific use cases, highlighting where they may need to be adapted or modified to fit a particular project or development team. This can help ensure that developers are working with a shared understanding of best practices and can make informed decisions about when to use established idioms and when to adapt them to fit their specific needs.

A common misconception is to use the adverbial or adjectival form of the term as using a programming language in a typical way, which really refers to a idiosyncrasy. An idiom implies the semantics of some code in a programming language has similarities to other languages or frameworks. For example, an idiosyncratic way to manage dynamic memory in C would be to use the C standard library functions malloc and free, whereas idiomatic refers to manual memory management as recurring semantic role that can be achieved with code fragments malloc in C, or pointer = new type [number_of_elements] in C++. In both cases, the semantics of the code are intelligible to developers familiar with C or C++, once the idiomatic or idiosyncratic rationale is exposed to them. However, while idiomatic rationale is often general to the programming domain, idiosyncratic rationale is frequently tied to specific API terminology.

Examples of simple idioms

Printing Hello World

One of the most common starting points to learn to program or notice the syntax differences between a known language and a new one. [3]

It has several implementations, among them the code fragments for C++:

std::cout<<"Hello World\n";

For Java:

System.out.println("Hello World");

Inserting an element in an array

This idiom helps developers understand how to manipulate collections in a given language, particularly inserting an element x at a position i in a list s and moving the elements to its right. [4]

Code fragments:

For Python:

s.insert(i,x)

For JavaScript:

s.splice(i,0,x);

For Perl:

splice(@s,$i,0,$x)

See also

Related Research Articles

Semantics is the study of reference, meaning, or truth. The term can be used to refer to subfields of several distinct disciplines, including philosophy, linguistics and computer science.

An idiom is a phrase or expression that typically presents a figurative, non-literal meaning attached to the phrase; but some phrases become figurative idioms while retaining the literal meaning of the phrase. Categorized as formulaic language, an idiom's figurative meaning is different from the literal meaning. Idioms occur frequently in all languages; in English alone there are an estimated twenty-five million idiomatic expressions.

<span class="mw-page-title-main">Visual Basic (.NET)</span> Object-oriented computer programming language

Visual Basic (VB), originally called Visual Basic .NET (VB.NET), is a multi-paradigm, object-oriented programming language, implemented on .NET, Mono, and the .NET Framework. Microsoft launched VB.NET in 2002 as the successor to its original Visual Basic language, the last version of which was Visual Basic 6.0. Although the ".NET" portion of the name was dropped in 2005, this article uses "Visual Basic [.NET]" to refer to all Visual Basic languages released since 2002, in order to distinguish between them and the classic Visual Basic. Along with C# and F#, it is one of the three main languages targeting the .NET ecosystem. As of March 11, 2020, Microsoft announced that evolution of the VB.NET language has concluded.

C dynamic memory allocation refers to performing manual memory management for dynamic memory allocation in the C programming language via a group of functions in the C standard library, namely malloc, realloc, calloc, aligned_alloc and free.

In programming language theory, semantics is the rigorous mathematical study of the meaning of programming languages. Semantics assigns computational meaning to valid strings in a programming language syntax.

In computer science, type safety and type soundness are the extent to which a programming language discourages or prevents type errors. Type safety is sometimes alternatively considered to be a property of facilities of a computer language; that is, some facilities are type-safe and their usage will not result in type errors, while other facilities in the same language may be type-unsafe and a program using them may encounter type errors. The behaviors classified as type errors by a given programming language are usually those that result from attempts to perform operations on values that are not of the appropriate data type, e.g., adding a string to an integer when there's no definition on how to handle this case. This classification is partly based on opinion.

Construction grammar is a family of theories within the field of cognitive linguistics which posit that constructions, or learned pairings of linguistic patterns with meanings, are the fundamental building blocks of human language. Constructions include words, morphemes, fixed expressions and idioms, and abstract grammatical rules such as the passive voice or the ditransitive. Any linguistic pattern is considered to be a construction as long as some aspect of its form or its meaning cannot be predicted from its component parts, or from other constructions that are recognized to exist. In construction grammar, every utterance is understood to be a combination of multiple different constructions, which together specify its precise meaning and form.

In computer programming, an entry point is the place in a program where the execution of a program begins, and where the program has access to command line arguments.

The semantic gap characterizes the difference between two descriptions of an object by different linguistic representations, for instance languages or symbols. According to Andreas M. Hein, the semantic gap can be defined as "the difference in meaning between constructs formed within different representation systems". In computer science, the concept is relevant whenever ordinary human activities, observations, and tasks are transferred into a computational representation.

<span class="mw-page-title-main">C Sharp (programming language)</span> Programming language

C# is a general-purpose high-level programming language supporting multiple paradigms. C# encompasses static typing, strong typing, lexically scoped, imperative, declarative, functional, generic, object-oriented (class-based), and component-oriented programming disciplines.

<span class="mw-page-title-main">Syntax (programming languages)</span> Set of rules defining correctly structured programs

In computer science, the syntax of a computer language is the rules that define the combinations of symbols that are considered to be correctly structured statements or expressions in that language. This applies both to programming languages, where the document represents source code, and to markup languages, where the document represents data.

In linguistics, semantic analysis is the process of relating syntactic structures, from the levels of phrases, clauses, sentences and paragraphs to the level of the writing as a whole, to their language-independent meanings. It also involves removing features specific to particular linguistic and cultural contexts, to the extent that such a project is possible. The elements of idiom and figurative speech, being cultural, are often also converted into relatively invariant meanings in semantic analysis. Semantics, although related to pragmatics, is distinct in that the former deals with word or sentence choice in any given context, while pragmatics considers the unique or particular meaning derived from context or tone. To reiterate in different terms, semantics is about universally coded meaning, and pragmatics, the meaning encoded in words that is then interpreted by an audience.

The C and C++ programming languages are closely related but have many significant differences. C++ began as a fork of an early, pre-standardized C, and was designed to be mostly source-and-link compatible with C compilers of the time. Due to this, development tools for the two languages are often integrated into a single product, with the programmer able to specify C or C++ as their source language.

Generic Eclipse Modeling System (GEMS) is a configurable toolkit for creating domain-specific modeling and program synthesis environments for Eclipse. The project aims to bridge the gap between the communities experienced with visual metamodeling tools like those built around the Eclipse modeling technologies, such as the Eclipse Modeling Framework (EMF) and Graphical Modeling Framework (GMF). GEMS helps developers rapidly create a graphical modeling tool from a visual language description or metamodel without any coding in third-generation languages. Graphical modeling tools created with GEMS automatically support complex capabilities, such as remote updating and querying, template creation, styling with Cascading Style Sheets (CSS), and model linking.

Domain-driven design (DDD) is a major software design approach, focusing on modeling software to match a domain according to input from that domain's experts.

A multiword expression (MWE), also called phraseme, is a lexeme-like unit made up of a sequence of two or more lexemes that has properties that are not predictable from the properties of the individual lexemes or their normal mode of combination. MWEs differ from lexemes in that the latter are required by many sources to have meaning that cannot be derived from the meaning of separate components. While MWEs must have some properties that cannot be derived from the same property of the components, the property in question does not need to be meaning.

A decompiler is a computer program that translates an executable file to high-level source code. It does therefore the opposite of a typical compiler, which translates a high-level language to a low-level language. While disassemblers translate an executable into assembly language, decompilers go a step further and translate the code into a higher level language such as C or Java, requiring more sophisticated techniques. Decompilers are usually unable to perfectly reconstruct the original source code, thus will frequently produce obfuscated code. Nonetheless, they remain an important tool in the reverse engineering of computer software.

<span class="mw-page-title-main">Scripting language</span> Programming language for run-time events

A scripting language or script language is a programming language that is used to manipulate, customize, and automate the facilities of an existing system. Scripting languages are usually interpreted at runtime rather than compiled.

Idiom, also called idiomaticness or idiomaticity, is the syntactical, grammatical, or structural form peculiar to a language. Idiom is the realized structure of a language, as opposed to possible but unrealized structures that could have developed to serve the same semantic functions but did not.

References

  1. Allamanis, Miltiadis; Sutton, Charles (2014). "Mining idioms from source code". Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering. pp. 472–483. arXiv: 1404.0417 . doi:10.1145/2635868.2635901. ISBN   9781450330565.
  2. Samudio, David I.; Latoza, Thomas D. (2022). Barriers in Front-End Web Development. D. I. Samudio and T. D. LaToza. 2022 IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC), Roma, Italy, 2022, pp. 1-11, doi: 10.1109/VL/HCC53370.2022.9833127 (PDF). ieeexplore.ieee.org. pp. 1–11. doi:10.1109/VL/HCC53370.2022.9833127. ISBN   978-1-6654-4214-5. S2CID   251657931.
  3. "Print Hello World". www.programming-idioms.org.
  4. "Insert element in list". www.programming-idioms.org.