Comparison of regular expression engines

Last updated December 19, 2024

This is a comparison of regular expression engines.

Libraries

List of regular expression libraries
Name	Official website	Programming language	Software license	Used by
Boost.Regex^{[Note 1]}	Boost C++ Libraries	C++	Boost	Notepad++ >= 6.0.0, EmEditor
Boost.Xpressive	Boost C++ Libraries	C++	Boost
DEELX	RegExLab	C++	Proprietary
FREJ^{[Note 2]}	Fuzzy Regular Expressions for Java	Java	LGPL
GLib/GRegex^{[Note 3]}	GLib reference manual	C	LGPL
GNU regex	Gnulib reference manual	C	LGPL	GNU libc, GNU programs
GRETA	Microsoft Research	C++	Proprietary
Gregex	Grovf Inc.	RTL, HLS	Proprietary	FPGA accelerated >100 Gbit/s regex engine for cybersecurity, financial, e-commerce industries.
Hyperscan	Intel	C, x86-specific assembly (SSSE3+^[1])	3-clause BSD	Rspamd
ICU	International Components for Unicode	C, C++^{[Note 4]}	ICU	Foundation (Apple and Swift open-source versions)
Jakarta Regexp	The Apache Jakarta Project	Java	Apache
java.util.regex	Java's User manual	Java	GNU GPLv2 with Classpath exception	jEdit
JRegex	JRegex	Java	BSD
MATLAB	Regular Expressions	MATLAB Language	Proprietary
Oniguruma	Kosako	C	BSD	Atom, Take Command Console, Tera Term, TextMate, Sublime Text, SubEthaEdit, EmEditor, jq, Ruby
Pattwo	Stevesoft	Java (compatible with Java 1.0)	LGPL
PCRE	pcre.org	C, C++^{[Note 5]}	BSD	Apache HTTP Server, Nginx, BBEdit, Edbrowse, Julia, HHVM, Notepad++ < 6.0.0, PHP, Delphi, R, Exim, SWI-Prolog, Elixir, Erlang
Qt/QRegExp	Digia	C++	Qt GNU GPL v. 3.0, Qt GNU LGPL v. 2.1, Qt Commercial	Kate, Kile
regex - Henry Spencer's regular expression libraries	ArgList	C	BSD
RE2	RE2	C++	BSD	Go, Google Sheets, Gmail, G Suite
Henry Spencer's Advanced Regular Expressions	Tcl	C	BSD
RGX	RGX	C++ based component library	P6R
RXP	Titan IC	RTL	Proprietary	hardware-accelerated search acceleration using RegEx available for ASIC, FPGA and cloud. Enables massively parallel content processing at ultra-high speeds.
SubReg	Matt Bucknall	C	MIT
TPerlRegEx	TPerlRegEx VCL Component	Object Pascal	MPLv1.1
TRE ^{[Note 2]}	Ville Laurikari	C	BSD	musl
TRegExpr	TRegExpr, documentation, (RegExp Studio)	Object Pascal	Dual-license: freeware, or LGPL with static linking exception	Total Commander
Wolfram Language (Mathematica)	Wolfram Language Documentation Center	Wolfram Language	Proprietary	Mathematica, the Wolfram Development Platform
XRegExp	XRegExp	JavaScript	MIT

↑ Formerly called Regex++.
↑ One of the fuzzy regular expression engines.
↑ Included since version 2.13.0.
↑ ICU4J, the Java version, does not support regular expressions.
↑ C++ bindings were developed by Google and became officially part of PCRE in 2006.

Languages

List of languages and frameworks including regular expression support
Language	Official website	Software license	Remarks
ActionScript 3	ActionScript Technology Center	Free
APL (APLX, Dyalog, GNU)	APL Wiki	Licensed by the respective implementation	`⎕SS` (PCRE), `⎕R`/`⎕S` (PCRE), `⎕SS` (PCRE2), respectively
C++11 (C++)	C++ standards website	Licensed by the respective implementation	Since ISO/IEC 14882:2011; uses an ECMAScript-like grammar by default (Grammar Description)
D	D	Boost Software License ^{[Note 1]}
Elixir	elixir-lang.org	Apache 2.0	Standard library includes a PCRE-based Regex module, built on Erlang's re module; see the Erlang entry for the PCRE version used.
Erlang	erlang.org	Apache 2.0	Standard library includes a PCRE-based re module. Its matching algorithms are based on the PCRE library, but not all of PCRE is interfaced, and some parts of the module go beyond what PCRE offers. At the time of writing, PCRE version 8.40 (released 2017-01-11) was used.
Free Pascal (Object Pascal)	freepascal.org	LGPL with static linking exception	Free Pascal 2.6+ ships with TRegExpr from Sorokin and two other regular expression libraries; see wiki.lazarus.freepascal.org/Regexpr.
Go	go.dev	BSD-style
Haskell	Haskell.org	BSD3	Omitted in the language report, and in GHC's Hierarchical Libraries
Java	Java	GNU General Public License	REs are written as strings in source code: all backslashes must be doubled, harming readability.
JavaScript (ECMAScript)	ECMA-262	BSD3	Limited, but REs are first-class citizens of the language, with a dedicated literal syntax `/.../flags`.
Julia	JuliaLang.org	MIT License	REs are part of the core library and use the built-in PCRE; an optional wrapper for the ICU C library is available.
Lua	Lua.org	MIT License	Uses a simplified, limited pattern dialect; can be bound to a more powerful library such as PCRE, or to an alternative parser such as LPeg.
Mathematica	Wolfram	Proprietary
.NET	MSDN	MIT License ^{[Note 2]}^{[Note 3]}
Nim	nim-lang.org	MIT License	Standard library includes PCRE-based re and nre modules, as well as various alternatives (ex. strutils, pegs (Parsing Expression Grammar matching), strscans, parseutils, etc.).
OCaml	Caml	LGPL	As of 2010, the standard module is generally regarded as deprecated;^[2] often-recommended libraries are pcre (with full support for PCRE) and re (which is less complete but claims better performance and provides frontends to popular syntaxes: PCRE, Perl, POSIX, Emacs, shell globbing).
Perl	Perl.com	Artistic License, or GNU General Public License	Full, central part of the language
PHP	PHP.net	PHP License	Historically had two implementations, PCRE-based (preg) and POSIX (ereg), with PCRE the more efficient in speed and features; the POSIX ereg functions were removed in PHP 7.0.
POSIX C (C)	POSIX.1 web publication	Licensed by the respective implementation	Supports POSIX BRE and ERE syntax
Python	python.org	Python Software Foundation License	Python has two major implementations: the built-in re module and the third-party regex module.
Ruby	ruby-doc.org	GNU Library General Public License	Ruby 1.8, Ruby 1.9, and Ruby 2.0 and later use different engines; Ruby 1.9 integrates Oniguruma, and Ruby 2.0 and later integrate Onigmo, a fork of Oniguruma.
Rust	docs.rs	MIT License	The primary regex crate does not allow look-around expressions. There is an Oniguruma binding called onig that does.
SAP ABAP	SAP.com	Proprietary
Tcl	tcl.tk	Tcl/Tk License (BSD-style)	Tcl library doubles as a regular expression library.
Wolfram Language	Wolfram Research	Proprietary: usable for free on a limited scale on the Wolfram Development platform
XML Schema	W3C	Licensed by the respective implementation
XPath 3/XQuery	W3C	Licensed by the respective implementation

↑ "STD.regex - D Programming Language - Digital Mars".
↑ "Dotnet/Corefx". GitHub. 16 February 2022.
↑ "Dotnet/Corefx". GitHub. 16 February 2022.
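The Java remark about doubled backslashes applies to any language in which patterns are written as ordinary string literals. As a minimal sketch in Python (used here instead of Java only to keep all examples in one language), the escaped literal and the raw-string literal below spell the same pattern; Java has no raw string literals, so only the first form is available there.

```python
import re

# In an ordinary literal, "\\" encodes a single backslash, so the regex
# escapes \d and \. must be written as "\\d" and "\\.".
escaped = "\\d+\\.\\d+"
# A raw string passes backslashes through unchanged.
raw = r"\d+\.\d+"

same_pattern = escaped == raw                      # True: identical pattern text
found = re.search(raw, "version 3.14").group()     # '3.14'
```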

Language features

NOTE: An application using a library for regular expression support does not necessarily support the full set of features of the library; e.g., GNU grep uses PCRE only when invoked with -P, and its default BRE and ERE syntaxes support no lookahead, though PCRE does.

Part 1

Language feature comparison (part 1)
	"+" quantifier	Negated character classes	Non-greedy quantifiers ^{[Note 1]}	Shy groups ^{[Note 2]}	Recursion	Look-ahead	Look-behind	Backreferences ^{[Note 3]}	>9 indexable captures
Boost.Regex	Yes	Yes	Yes	Yes	Yes^{[Note 4]}	Yes	Yes	Yes	Yes
Boost.Xpressive	Yes	Yes	Yes	Yes	Yes^{[Note 5]}	Yes	Yes	Yes	Yes
CL-PPCRE	Yes	Yes	Yes	Yes	No	Yes	Yes	Yes	Yes
EmEditor	Yes	Yes	Yes	Yes	No	Yes	Yes	Yes	No
FREJ	No^{[Note 6]}	No	Some^{[Note 6]}	Yes	No	No	No	Yes	Yes
GLib/GRegex	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes
GNU grep	Yes	Yes	Yes	Yes	No	Yes	Yes	Yes	—
Haskell	Yes	Yes	Yes	Yes	No	Yes	Yes	Yes	Yes
RXP	Yes	Yes	Yes	Yes	No	No	No	Yes	Yes
ICU Regex	Yes	Yes	Yes	Yes	No	Yes	Yes	Yes	Yes
Java	Yes	Yes	Yes	Yes	No	Yes	Yes	Yes	Yes
JavaScript (ECMAScript)	Yes	Yes	Yes	Yes	No	Yes	Yes^{[Note 7]}	Yes	Yes
JGsoft	Yes	Yes	Yes	Yes	Yes^[3]	Yes	Yes	Yes	Yes
Lua	Yes	Yes	Some^{[Note 8]}	No	No	No	No	Yes	No
.NET	Yes	Yes	Yes	Yes	No	Yes	Yes	Yes	Yes
OCaml	Yes	Yes	No	No	No	No	No	Yes	No
PCRE	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes
Perl	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes
PHP	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes
Python	Yes	Yes	Yes	Yes	Yes^{[Note 9]}	Yes	Yes	Yes	Yes
Qt/QRegExp	Yes	Yes	Yes	Yes	No	Yes	No	Yes	Yes
RE2	Yes	Yes	Yes	Yes	No	No	No	No	Yes
Ruby, Onigmo	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes
TRE	Yes	Yes	Yes	Yes	No	No	No	Yes	No
Vim	Yes	Yes	Yes	Yes	No	Yes	Yes	Yes	No
RGX	Yes	Yes	Yes	Yes	No	Yes	Yes	Yes	Yes
Tcl	Yes	Yes	Yes	Yes	No	Yes	Yes	Yes	Yes
TRegExpr	Yes	?	Yes	?	?	?	?	?	?
XML Schema	Yes	Yes	No	—	No	No	No	No	—
XPath 3/XQuery	Yes	Yes	Yes	Yes	No	No	No	Yes	Yes
XRegExp	Yes	Yes	Yes	Yes	No	Yes	Yes^{[Note 7]}	Yes	Yes

↑ Non-greedy quantifiers match as few characters as possible, rather than as many as possible (the default). Note that many older, pre-POSIX engines were non-greedy and did not have greedy quantifiers at all.
↑ Shy groups, also called non-capturing groups, cannot be referred to with backreferences; they are used to speed up matching when the group's content does not need to be accessed later.
↑ Backreferences enable referring to previously matched groups in later parts of the regex and/or replacement string (where applicable). For instance, ([ab]+)\1 matches the whole string "abab" but not "abaab", because the backreference must repeat exactly the text captured by the group.
↑ "Perl Regular Expression Syntax - 1.47.0".
↑ "User's Guide - 1.47.0".
↑ FREJ has no repetition quantifiers, but has an "optional" element that behaves like a simple "?" quantifier.
↑ As of ES2018.
↑ Lua's only non-greedy quantifier is -, which is a non-greedy version of *. It does not have non-greedy versions of + or ?; in the former case, the non-greedy effect can be achieved by repeating the token followed by -, but in the latter case, there is no equivalent.
↑ Supported by the optional regex library only.
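Most part 1 features can be tried with Python's built-in re module, which the table lists as supporting all of them except recursion (available only through the optional regex library). A minimal sketch, assuming Python 3; the sample strings are illustrative only:

```python
import re

# Greedy vs. non-greedy: ".*" takes as much as possible, ".*?" as little as possible.
html = "<b>bold</b>"
greedy = re.match(r"<.*>", html).group()    # '<b>bold</b>'
lazy = re.match(r"<.*?>", html).group()     # '<b>'

# Shy (non-capturing) group: "(?:ab)+" groups for repetition but creates no
# capture, so only "(c)" is numbered.
shy = re.fullmatch(r"(?:ab)+(c)", "ababc").groups()    # ('c',)

# Backreference: "\1" must repeat exactly the text captured by group 1.
abab = re.fullmatch(r"([ab]+)\1", "abab")     # matches, group 1 is 'ab'
abaab = re.fullmatch(r"([ab]+)\1", "abaab")   # None: cannot be split into XX

# Look-behind and look-ahead check context without consuming it:
# digits preceded by "$" and followed by ".NN".
price = re.search(r"(?<=\$)\d+(?=\.\d\d)", "Total: $42.50").group()   # '42'
```

Using fullmatch for the backreference case mirrors the note's "matches the whole string" wording; a plain search would still find the substring "aa" inside "abaab".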

Part 2

Language feature comparison (part 2)
	Directives ^{[Note 1]}	Conditionals	Atomic groups ^{[Note 2]}	Named capture ^{[Note 3]}	Comments	Embedded code	Unicode property support ^[4]	Balancing groups ^{[Note 4]}	Variable-length look-behinds ^{[Note 5]}
Boost.Regex	Yes	Yes	Yes	Yes	Yes	No	Some^{[Note 6]}	No	No
Boost.Xpressive	Yes	No	Yes	Yes	Yes	No	No	No	No
CL-PPCRE	Yes	Yes	Yes	Yes	Yes	Yes	Some^{[Note 6]}	No	No
EmEditor	Yes	Yes	?	?	Yes	No	?	No	No
FREJ	No	No	Yes	Yes	Yes	No	?	No	No
GLib/GRegex	Yes	Yes	Yes	Yes	Yes	No	Some^{[Note 6]}	No	No
GNU grep	Yes	Yes	?	Yes	Yes	No	No	No	No
Haskell	?	?	?	?	?	No	No	No	No
RXP	Yes	Yes	No	Yes	Yes	No	No	No	No
ICU Regex	Yes	No	Yes	Yes^{[Note 7]}	Yes	No	Yes	No	No
Java	Yes	No	Yes	Yes^{[Note 8]}	Yes	No	Some^{[Note 6]}	No	No
JavaScript (ECMAScript)	No	No	No	Yes	No	No	Some^{[Note 6]}^{[Note 9]}^[5]	No	Yes
JGsoft	Yes	Yes	Yes	Yes	Yes	No	Some^{[Note 6]}	No	Yes
Lua	No	No	No	No	No	No	No	No	No
.NET	Yes	Yes	Yes	Yes	Yes	No	Some^{[Note 6]}	Yes	Yes
OCaml	No	No	No	No	No	No	No	No	No
PCRE	Yes	Yes	Yes	Yes	Yes	Yes	Yes	No	No
Perl	Yes	Yes	Yes	Yes	Yes	Yes	Yes	No	No^{[Note 10]}
PHP	Yes	Yes	Yes	Yes	Yes	No	No	No	No
Python	Yes	Yes	Yes^{[Note 11]}	Yes	Yes	No	Yes^{[Note 12]}	No	Yes^{[Note 13]}
Qt/QRegExp	No	No	No	No	No	No	No	No	No
RE2	Yes	No	?	Yes	No	No	Some^{[Note 6]}	No	No
Ruby, Onigmo	Yes	Yes	Yes	Yes	Yes	No	Some^{[Note 6]}	No	No
Tcl	Yes	No	Yes	No	Yes	No	Yes	No	No
TRE	Yes	No	No	No	Yes	No	?	No	No
Vim	Yes	No	Yes	No	No	No	No	No	Yes
RGX	Yes	Yes	Yes	Yes	Yes	No	Yes	No	No
XML Schema	No	No	No	No	No	No	Yes	No	No
XPath 3/XQuery	No	No	No	No	No	No	Yes	No	No
XRegExp	Leading only	No	No	Yes	Yes	No	Yes	No	Yes

↑ Also known as flag modifiers, mode modifiers, or option letters. Example pattern: "(?i:test)".
↑ Also called independent sub-expressions.
↑ Similar to back references, but with names instead of indices.
↑ Special feature that allows matching balanced constructs, such as nested parentheses, without recursion.
↑ Refers to allowing quantifiers inside look-behinds, so that the length of the look-behind match is not fixed in advance.
↑ Unicode property support may be incomplete, as products are continuously updated. Every implementation becomes incomplete when a new Unicode revision is released, until it is updated to comply.
↑ Available as of ICU55.
↑ Available as of JDK7.
↑ The support and range of properties is dependent on implementation.
↑ Experimental support added in v5.29.9.
↑ Supported by Python 3.11 and later, and by the optional regex library.
↑ May only be available in the regex library when used with Python versions after 3.3.
↑ Supported by the optional regex library only.
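A minimal sketch of several part 2 features using Python's built-in re module, assuming Python 3.6 or later for the scoped "(?i:...)" directive. It shows a directive limited to one group, named capture combined with comments in verbose mode, the rejection of a variable-length look-behind, and atomic groups, which re accepts only from Python 3.11 onward (consistent with the notes above); the atomic-group lines are skipped on older versions.

```python
import re
import sys

# Directive scoped to a group: only "test" ignores case; "ing" stays case-sensitive.
scoped_ok = re.fullmatch(r"(?i:test)ing", "TeSting") is not None     # True
scoped_fail = re.fullmatch(r"(?i:test)ing", "TESTING") is not None   # False

# Named capture plus comments in verbose mode ("#" starts a comment,
# unescaped whitespace in the pattern is ignored).
assignment = re.compile(r"""
    (?P<key>\w+)      # identifier
    \s*=\s*           # equals sign, optional spaces
    (?P<value>\d+)    # integer value
""", re.VERBOSE)
fields = assignment.fullmatch("width = 80").groupdict()   # {'key': 'width', 'value': '80'}

# Python's built-in re rejects variable-length look-behind at compile time.
try:
    re.compile(r"(?<=a+)b")
    variable_lookbehind = True
except re.error:
    variable_lookbehind = False

# Atomic group: once "(?>a+)" has consumed every "a" it will not give one
# back, so the trailing "a" cannot match. The plain group backtracks instead.
if sys.version_info >= (3, 11):
    atomic = re.fullmatch(r"(?>a+)a", "aaa")    # None
    plain = re.fullmatch(r"(?:a+)a", "aaa")     # matches via backtracking
```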

API features

API feature comparison
	Native UTF-16 support^{[Note 1]}	Native UTF-8 support^{[Note 1]}	Multi-line matching	Partial match^{[Note 2]}
Boost.Regex	No	No	Yes	Yes
GLib/GRegex	Yes	Yes	Yes	Yes
RXP	Yes	Yes	No	Yes
ICU Regex	Yes	No	Yes	?
Java	Yes^{[Note 3]}	Yes^{[Note 3]}	Yes	Yes
.NET	No^{[Note 4]}	Yes	Yes	?
PCRE	Yes^{[Note 5]}	Yes	Yes	Yes
Qt/QRegExp	Yes	No	No	Yes^{[Note 6]}
Qt/QRegularExpression	Yes	Yes	Yes	Yes
Tcl	Yes	Yes^{[Note 7]}	Yes	?
TRE	Yes	Yes	Yes	?
RGX	No	No	Yes	?
wxWidgets::wxRegEx ^{[Note 8]}	Yes	Yes	Yes	?
XRegExp	Yes	Yes	Yes	No

↑ Means the format can be used internally without explicit conversion.
↑ Partial matching reports whether the subject string could be extended into a full match of the whole regular expression. For example, the pattern ".*END$" partially matches any string, but fully matches only strings ending with END.
↑ Supports the Unicode 15.0 standard as of 2023.
↑ The implementation uses the original UCS-2 support, so it recognizes only 64K characters in total (versus the 1,112,064 code points encodable in UTF-16). In 2010, a Microsoft representative answered a bug report on this as "will not fix".
↑ Since version 8.30.
↑ Partial matching is performed implicitly, requiring a separate call to matchedLength() if an exact match fails.
↑ Tcl includes facilities to convert to and from UTF-8.
↑ wxRegEx uses the system-supplied POSIX library if one is available; otherwise, and in Unicode mode, it uses Henry Spencer's library.
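Two of the API columns can be illustrated with Python's built-in re module. A minimal sketch, assuming Python 3: the MULTILINE flag changes what "^" anchors to, and a helper named as_utf16_units (hypothetical, written only for this example) splits text into UTF-16 code units to mimic how an engine working on 16-bit units, such as the UCS-2 behaviour described in the notes, sees a character outside the Basic Multilingual Plane.

```python
import re

text = "alpha 1\nbeta 2\ngamma 3"

# Without MULTILINE, "^" anchors only at the start of the whole string.
single = re.findall(r"^\w+", text)                  # ['alpha']
# With MULTILINE, "^" also matches right after each newline.
multi = re.findall(r"^\w+", text, re.MULTILINE)     # ['alpha', 'beta', 'gamma']


def as_utf16_units(s: str) -> str:
    """Hypothetical helper: one str character per UTF-16 code unit (a UCS-2-style view)."""
    data = s.encode("utf-16-le")
    units = (int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2))
    return "".join(chr(u) for u in units)


emoji = "\U0001F600"                 # U+1F600, outside the Basic Multilingual Plane
ucs2_view = as_utf16_units(emoji)    # two surrogates: U+D83D, U+DE00

one_char = re.fullmatch(r".", emoji) is not None          # True: one code point
half_char = re.fullmatch(r".", ucs2_view) is not None     # False: "." sees one surrogate
two_chars = re.fullmatch(r"..", ucs2_view) is not None    # True
```

Python strings hold code points, so the surrogate split has to be simulated; in an engine that treats each 16-bit unit as a character, "." would behave like the ucs2_view case.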


References

↑ "Getting Started – Hyperscan 5.4.0 documentation".
↑ "Regex - Regular Expressions in OCaml".
↑ "Recursive Regex—Tutorial".
↑ "UTS #18: Unicode Regular Expressions".
↑ "ECMA-262, 9th edition, June 2018 ECMAScript® 2018 Language Specification". www.ecma-international.org. Retrieved 4 August 2020.

External links

Regular Expression Flavor Comparison – Detailed comparison of the most popular regular expression flavors
Regexp Syntax Summary
Online Regular Expression Testing – with support for Java, JavaScript, .Net, PHP, Python and Ruby
Implementing Regular Expressions – series of articles by Russ Cox, author of RE2
Regular Expression Engines

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.
