This comparison of programming languages (strings) compares the features of string data structures and text-string processing in more than 52 computer programming languages.
Different languages use different symbols for the concatenation operator. Many languages use the "+" symbol, though several deviate from this.
Operator | Languages |
---|---|
+ | ALGOL 68, BASIC, C++, C#, Cobra, Dart, Eiffel, F#, Go, Java, JavaScript, Object Pascal, Objective-C, Pascal, Python, Ruby, Rust, Scala, Swift, Turing, Windows PowerShell, Ya |
++ | Erlang, Haskell |
$+ | mIRC scripting language |
& | Ada, AppleScript, COBOL (for literals only), Curl, Excel, FreeBASIC, HyperTalk, Nim, Seed7, VHDL, Visual Basic, Visual Basic .NET |
concatenate | Common Lisp |
. | AutoHotkey, Maple (up to version 5), Perl, PHP |
~ | D, Raku, Symfony (Expression Language component) |
|| | Icon, Maple (from version 6), PL/I, Rexx, Standard SQL |
<> | Mathematica, Wolfram Language |
.. | Lua |
: | Pick Basic |
, | APL, J, Smalltalk |
^ | F#, OCaml, rc, Standard ML |
// | Fortran |
* | Julia |
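As a small illustration of the "+" row, the following Python sketch joins two strings with the operator and, for comparison, with str.join, which is commonly used when many pieces are combined. The variable names are chosen only for this example.

```python
# Concatenation with the "+" operator, as listed for Python in the table.
first = "snow"
second = "ball"
joined = first + second

# str.join is an alternative when combining many pieces at once.
pieces = ["I", "said", "hello"]
sentence = " ".join(pieces)

print(joined)
print(sentence)
```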
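Notes on the concatenation table:
In C, string variables have no concatenation operator; the "strcat" function must be used.
COBOL uses the "STRING" statement to concatenate string variables.
MATLAB uses "[x y]" to concatenate x and y.
Visual Basic also accepts the "+" sign, but at the risk of ambiguity when a string representing a number and a number are combined.
Excel provides both the operator "&" and the function "=CONCATENATE(X,Y)".
Rust also provides the "concat!" macro and the "format!" macro, of which the latter is the most prevalent throughout the documentation and examples.

This section compares styles for declaring a string literal.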
An expression is "interpolated" into a string when the compiler/interpreter evaluates it and inserts the result in its place.
Syntax | Language(s) |
---|---|
$"hello, {name}" | C#, Visual Basic .NET |
"Hello, $name!" | Bourne shell, Dart, Perl, PHP, Windows PowerShell |
qq(Hello, $name!) | Perl (alternate) |
"Hello, {$name}!" | PHP (alternate) |
"Hello, #{name}!" | CoffeeScript, Ruby |
%Q(Hello, #{name}!) | Ruby (alternate) |
(format nil "Hello, ~A" name) | Common Lisp |
`Hello, ${name}!` | JavaScript (ECMAScript 6) |
"Hello, \(name)!" | Swift |
f'Hello, {name}!' | Python |
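The Python row can be sketched as follows. The example assumes Python 3.6 or later, where f-strings are available; the variable names are illustrative only.

```python
name = "world"

# The expression inside the braces is evaluated and its result inserted.
greeting = f"Hello, {name}!"

# The placeholder may be an arbitrary expression, not just a variable name.
shout = f"Hello, {name.upper()}!"

print(greeting)
print(shout)
```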
"Escaped" quotes means that a 'flag' symbol is used to warn that the character after the flag is used in the string rather than ending the string.
Syntax | Language(s) |
---|---|
"I said \"Hello, world!\"" | C, C++, C#, D, Dart, F#, Java, JavaScript, Mathematica, Ocaml, Perl, PHP, Python, Rust, Swift, Wolfram Language, Ya |
'I said \'Hello, world!\'' | CoffeeScript, Dart (alternate), JavaScript (alternate), Python (alternate) |
"I said `"Hello, world!`"" | Windows Powershell |
"I said ^"Hello, world!^"" | REBOL |
{I said "Hello, world!"} | REBOL (alternate) |
"I said, %"Hello, World!%"" | Eiffel |
!"I said \"Hello, world!\"" | FreeBASIC |
r#"I said "Hello, world!""# | Rust (alternate) |
R"(I said "Hello, world!")" | C++ (alternate) |
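A minimal Python sketch of the backslash form from the first row is shown below. It also shows that choosing the other delimiter avoids the escape altogether; the variable names are illustrative only.

```python
# The backslash tells the parser that the next quote is content, not a terminator.
escaped = "I said \"Hello, world!\""

# With single-quote delimiters the inner double quotes need no escape.
alternate = 'I said "Hello, world!"'

# Both literals denote the same value; the backslashes are not stored.
print(escaped == alternate)
print(escaped)
```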
"Dual quoting" means that whenever a quote is used in a string, it is used twice, and one of them is discarded and the single quote is then used within the string.
Syntax | Language(s) |
---|---|
"I said ""Hello, world!""" | Ada, ALGOL 68, COBOL, Excel, Fortran, FreeBASIC, Visual Basic (.NET) |
'I said ''Hello, world!''' | APL, COBOL, Fortran, Object Pascal, Pascal, rc, Smalltalk, SQL |
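Python itself does not use dual quoting, but the decoding rule can be sketched in it. The helper below, decode_doubled, is a hypothetical function written for this illustration; it assumes a well-formed literal and does not check for stray unpaired quotes inside the body.

```python
def decode_doubled(literal: str, quote: str = '"') -> str:
    """Decode a literal whose embedded quotes are written twice (hypothetical helper)."""
    if len(literal) < 2 or literal[0] != quote or literal[-1] != quote:
        raise ValueError("literal must start and end with the quote character")
    # Strip the outer delimiters.
    body = literal[1:-1]
    # Each doubled quote in the body stands for one quote in the value.
    return body.replace(quote * 2, quote)


print(decode_doubled('"I said ""Hello, world!"""'))
print(decode_doubled("'I said ''Hello, world!'''", quote="'"))
```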
"Raw" means the compiler treats every character within the literal exactly as written, without processing any escapes or interpolations.
Many languages have a syntax specifically intended for strings with multiple lines. In some of these languages, this syntax is a here document or "heredoc": A token representing the string is put in the middle of a line of code, but the code continues after the starting token and the string's content doesn't appear until the next line. In other languages, the string's content starts immediately after the starting token and the code continues after the string literal's terminator.
Syntax | Here document | Language(s) |
---|---|---|
<<EOF I have a lot of things to say and so little time to say them EOF | Yes | Bourne shell, Perl, Ruby |
<<<EOF I have a lot of things to say and so little time to say them EOF | Yes | PHP |
@" I have a lot of things to say and so little time to say them "@ | No | Windows PowerShell |
"[ I have a lot of things to say and so little time to say them ]" | No | Eiffel |
""" I have a lot of things to say and so little time to say them """ | No | CoffeeScript, Dart, Groovy, Kotlin, Python, Swift |
" I have a lot of things to say and so little time to say them " | No | Common Lisp (all strings are multiline), Rust (all strings are multiline), Visual Basic .NET (all strings are multiline) |
R"( I have a lot of things to say and so little time to say them )" | No | C++ |
r" I have a lot of things to say and so little time to say them " | No | Rust |
[[ I have a lot of things to say and so little time to say them ]] | No | Lua |
` I have a lot of things to say and so little time to say them ` | No | JavaScript (ECMAScript 6) |
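The triple-quote row can be sketched in Python as follows. The content starts immediately after the opening token, so this is not a here document; the variable names are illustrative only.

```python
# The line break inside the literal becomes part of the string value.
text = """I have a lot of things to say
and so little time to say them"""

lines = text.splitlines()
print(len(lines))
print(lines[0])
```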
Syntax | Variant name | Language(s) |
---|---|---|
13HHello, world! | Hollerith notation | Fortran 66 |
(indented with whitespace) | Indented with whitespace and newlines | YAML |
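Hollerith notation prefixes the text with its character count and the letter H, so no closing delimiter is needed. A minimal Python sketch of reading such a constant is given below; parse_hollerith is a hypothetical helper written for this illustration and handles only a single, complete constant.

```python
import re


def parse_hollerith(token: str) -> str:
    """Return the text of a Hollerith constant such as '13HHello, world!' (hypothetical helper)."""
    match = re.match(r"(\d+)H", token)
    if match is None:
        raise ValueError("expected <count>H<text>")
    count = int(match.group(1))
    # Everything after the H is the text; its length must equal the count.
    text = token[match.end():]
    if len(text) != count:
        raise ValueError(f"expected {count} characters, found {len(text)}")
    return text


print(parse_hollerith("13HHello, world!"))
```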
In JavaScript, String.raw`` still processes string interpolation.