Comparison of programming languages (strings)

This comparison of programming languages (strings) compares the string data structures and text-string processing features of over 52 computer programming languages.

Concatenation

Different languages use different symbols for the concatenation operator. Many languages use the "+" symbol, though several deviate from this.

Common variants

Operator       Languages
+              ALGOL 68, BASIC, C++, C#, Cobra, Dart, Eiffel, F#, Go, Java, JavaScript, Object Pascal, Objective-C, Pascal, Python, Ruby, Rust, Scala, Swift, Turing, Windows PowerShell, Ya
++             Erlang, Haskell
$+             mIRC scripting language
&              Ada, AppleScript, COBOL (for literals only), Curl, Excel, FreeBASIC, HyperTalk, Nim, Seed7, VHDL, Visual Basic, Visual Basic .NET
concatenate    Common Lisp
.              AutoHotkey, Maple (up to version 5), Perl, PHP
~              D, Raku, Symfony (Expression Language component)
||             Icon, Maple (from version 6), PL/I, Rexx, Standard SQL
<>             Mathematica, Wolfram Language
..             Lua
:              Pick Basic
,              APL, J, Smalltalk
^              F#, OCaml, rc, Standard ML
//             Fortran
*              Julia
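
As a minimal sketch, the following Python snippet shows the "+" operator from the table above joining two strings end to end. The variable names are illustrative only, and `str.join` is included as a common alternative when many pieces are combined.

```python
# Concatenation with the "+" operator, as listed for Python above.
first = "snow"
second = "ball"
joined = first + second

# str.join is a common alternative when joining many pieces.
parts = ["Hello", ", ", "world", "!"]
greeting = "".join(parts)

print(joined)    # snowball
print(greeting)  # Hello, world!
```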

Unique variants

String literals

This section compares styles for declaring a string literal.

Quoted interpolated

An expression is "interpolated" into a string when the compiler/interpreter evaluates it and inserts the result in its place.

Syntax                            Language(s)
$"hello, {name}"                  C#, Visual Basic .NET
"Hello, $name!"                   Bourne shell, Dart, Perl, PHP, Windows PowerShell
qq(Hello, $name!)                 Perl (alternate)
"Hello, {$name}!"                 PHP (alternate)
"Hello, #{name}!"                 CoffeeScript, Ruby
%Q(Hello, #{name}!)               Ruby (alternate)
(format nil "Hello, ~A" name)     Common Lisp
`Hello, ${name}!`                 JavaScript (ECMAScript 6)
"Hello, \(name)!"                 Swift
f'Hello, {name}!'                 Python
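
The Python form in the last row can be tried directly. This is a small sketch, assuming Python 3.6 or later (where f-strings are available); the value "Ada" is just a placeholder.

```python
name = "Ada"

# f-string: the expression in braces is evaluated and inserted.
greeting = f'Hello, {name}!'

# Arbitrary expressions are allowed, not only variable names.
shout = f'Hello, {name.upper()}!'

print(greeting)  # Hello, Ada!
print(shout)     # Hello, ADA!
```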

Escaped quotes

"Escaped" quotes means that a flag symbol (the escape character, often a backslash) marks the quote character that follows it as part of the string rather than as the end of the string.

Syntax                           Language(s)
"I said \"Hello, world!\""       C, C++, C#, D, Dart, F#, Java, JavaScript, Mathematica, OCaml, Perl, PHP, Python, Rust, Swift, Wolfram Language, Ya
'I said \'Hello, world!\''       CoffeeScript, Dart (alternate), JavaScript (alternate), Python (alternate)
"I said `"Hello, world!`""       Windows PowerShell
"I said ^"Hello, world!^""       REBOL
{I said "Hello, world!"}         REBOL (alternate)
"I said, %"Hello, World!%""      Eiffel
!"I said \"Hello, world!\""      FreeBASIC
r#"I said "Hello, world!""#      Rust (alternate)
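
The two backslash forms listed for Python can be checked with a short sketch; the variable names are illustrative.

```python
# Backslash-escaped double quotes inside a double-quoted literal.
double = "I said \"Hello, world!\""

# Backslash-escaped single quotes inside a single-quoted literal.
single = 'I said \'Hello, world!\''

# The backslashes are not part of the resulting string values.
print(double)  # I said "Hello, world!"
print(single)  # I said 'Hello, world!'
```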

Dual quoting

"Dual quoting" means that a quote character inside a string is written twice; the pair is read as one literal quote character rather than as the end of the string.

Syntax                          Language(s)
"I said ""Hello, world!"""      Ada, ALGOL 68, COBOL, Excel, Fortran, FreeBASIC, Visual Basic (.NET)
'I said ''Hello, world!'''      APL, COBOL, Fortran, Object Pascal, Pascal, rc, Smalltalk, SQL
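
Python itself does not use dual quoting, but the SQL row can be illustrated from Python through the standard `sqlite3` module. This is only a sketch: it assumes SQLite as a stand-in for "SQL" in the table, and the in-memory database exists just to evaluate the literal.

```python
import sqlite3

# An in-memory database is enough to evaluate a literal.
conn = sqlite3.connect(":memory:")

# Inside the SQL literal, each '' stands for one ' character.
sql = "SELECT 'I said ''Hello, world!'''"
value = conn.execute(sql).fetchone()[0]
conn.close()

print(value)  # I said 'Hello, world!'
```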

Quoted raw

"Raw" means the compiler treats every character within the literal exactly as written, without processing any escapes or interpolations.

Syntax                        Language(s)
'Hello, world!'               APL, Bourne shell, Fortran, Object Pascal, Pascal, Perl, PHP, Pick Basic, Ruby, Smalltalk, Windows PowerShell
q(Hello, world!)              Perl (alternate)
%q(Hello, world!)             Ruby (alternate)
R"(Hello, world!)"            C++11
@"Hello, world!"              C#, F#
r"Hello, world!"              Cobra, D, Dart, Python, Rust
r'Hello, world!'              Dart (alternate)
"Hello, world!"               COBOL, FreeBASIC, Pick Basic
`Hello, world!`               D, Go
raw"Hello, world!"            Scala
String.raw`Hello, World!`     JavaScript (ECMAScript 6) [1]
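
A brief sketch of the difference in Python, where the `r` prefix marks a raw literal. The same characters are written in both literals; only the processing of the backslash differs.

```python
# In an ordinary literal, \n is a single newline character.
cooked = "Hello,\nworld!"

# In a raw literal, the backslash and the n are kept as two characters.
raw = r"Hello,\nworld!"

print(len(cooked))  # 13
print(len(raw))     # 14
print(raw)          # Hello,\nworld!
```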

Multiline string

Many languages have a syntax specifically intended for strings that span multiple lines. In some of these languages, this syntax is a here document or "heredoc": a token that stands for the string is placed within a line of code, the code continues after that token, and the string's content does not begin until the next line. In other languages, the string's content starts immediately after the opening delimiter, and the code continues after the string literal's terminator.

Each entry below gives the language(s), whether the syntax is a here document, and the syntax itself.

Language(s): Bourne shell, Perl, Ruby
Here document: Yes

    <<EOF
    I have a lot of things to say
    and so little time to say them
    EOF

Language(s): PHP
Here document: Yes

    <<<EOF
    I have a lot of things to say
    and so little time to say them
    EOF

Language(s): Windows PowerShell
Here document: No

    @"
    I have a lot of things to say
    and so little time to say them
    "@

Language(s): Eiffel
Here document: No

    "[
    I have a lot of things to say
    and so little time to say them
    ]"

Language(s): CoffeeScript, Dart, Groovy, Kotlin, Python, Swift
Here document: No

    """
    I have a lot of things to say
    and so little time to say them
    """

Language(s): Common Lisp (all strings are multiline), Rust (all strings are multiline), Visual Basic .NET (all strings are multiline)
Here document: No

    "
    I have a lot of things to say
    and so little time to say them
    "

Language(s): Rust
Here document: No

    r"
    I have a lot of things to say
    and so little time to say them
    "

Language(s): Lua
Here document: No

    [[
    I have a lot of things to say
    and so little time to say them
    ]]

Language(s): JavaScript (ECMAScript 6)
Here document: No

    `
    I have a lot of things to say
    and so little time to say them
    `
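
The triple-quoted form listed for Python can be sketched as follows. In this sketch the content starts directly after the opening delimiter, so the resulting string has no leading or trailing blank line.

```python
# A triple-quoted literal may span several lines; the line breaks
# between the delimiters become part of the string.
text = """I have a lot of things to say
and so little time to say them"""

lines = text.splitlines()
print(len(lines))  # 2
print(lines[1])    # and so little time to say them
```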

Unique quoting variants

Syntax                         Variant name                              Language(s)
13HHello, world!               Hollerith notation                        Fortran 66
(indented with whitespace)     Indented with whitespace and newlines     YAML
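
Hollerith notation writes the character count, the letter H, and then exactly that many characters. The Python helper below is a hypothetical illustration of how such a constant is formed; `hollerith` is not part of any library.

```python
def hollerith(text: str) -> str:
    """Return text as a Hollerith constant: length, the letter H, then the characters."""
    return f"{len(text)}H{text}"

print(hollerith("Hello, world!"))  # 13HHello, world!
```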

Notes

1. String.raw`` still processes string interpolation.

References

1. https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/raw