This is a comparison of regular-expression engines.
Name | Official website | Programming language | Software license | Used by |
---|---|---|---|---|
Boost.Regex [Note 1] | Boost C++ Libraries | C++ | Boost | Notepad++ >= 6.0.0, EmEditor |
Boost.Xpressive | Boost C++ Libraries | C++ | Boost | |
DEELX | RegExLab | C++ | Free personal and commercial use | |
FREJ [Note 2] | Fuzzy Regular Expressions for Java | Java | LGPL | |
GLib/GRegex [Note 3] | GLib reference manual | C | LGPL | |
GRETA | Microsoft Research | C++ | Microsoft | |
Gregex | Grovf Inc. | RTL, HLS | Proprietary | FPGA-accelerated (>100 Gbit/s) regex engine for the cybersecurity, financial and e-commerce industries. |
RXP | Titan IC | RTL | Proprietary | Hardware-accelerated RegEx search, available for ASIC, FPGA and cloud, aimed at massively parallel content processing at high speed. |
Hyperscan | Intel | C, x86-specific assembly (SSSE3+ [1] ) | 3-clause BSD | Rspamd |
ICU | International Components for Unicode | C, C++ [Note 4] | ICU | Foundation (Apple and Swift open-source versions) |
Jakarta/Regexp | The Apache Jakarta Project | Java | Apache | |
java.util.regex | Java's User manual | Java | GNU GPLv2 with Classpath exception | jEdit |
JRegex | JRegex | Java | BSD | |
MATLAB | Regular Expressions | MATLAB Language | MATLAB, The Language of Technical Computing | |
Oniguruma | Kosako | C | BSD | Atom, Take Command Console, Tera Term, TextMate, Sublime Text, SubEthaEdit, EmEditor and jq |
Pattwo | Stevesoft | Java (compatible with Java 1.0) | LGPL | |
PCRE | pcre.org | C, C++ [Note 5] | BSD | Apache HTTP Server, Nginx, BBEdit, Julia, HHVM, Notepad++ < 6.0.0, PHP, Delphi, R, Exim |
Qt/QRegExp | Digia | C++ | Qt GNU GPL v. 3.0, | Kate, Kile |
regex - Henry Spencer's regular expression libraries | ArgList | C | BSD | |
RE2 | RE2 | C++ | BSD | Go, Google Sheets, Gmail, G Suite |
Henry Spencer's Advanced Regular Expressions | Tcl | C | BSD | |
RGX | RGX | C++ based component library | P6R | |
SubReg | Matt Bucknall | C | MIT | |
TPerlRegEx | TPerlRegEx VCL Component | Object Pascal | MPLv1.1 | |
TRE [Note 2] | Ville Laurikari | C | BSD | musl |
TRegExpr | RegExp Studio | Object Pascal | Dual-license: freeware, or LGPL with static linking exception | Total Commander |
XRegExp | XRegExp | JavaScript | MIT | |
Wolfram Language (Mathematica) | Wolfram Language Documentation Center | Wolfram Language | Mathematica, the Wolfram Development Platform |
Language | Official website | Software license | Remarks |
---|---|---|---|
ActionScript 3 | ActionScript Technology Center | Free | |
APL (APLX, Dyalog, GNU) | APL Wiki | Licensed by the respective implementation | ⎕SS (PCRE), ⎕R /⎕S (PCRE), ⎕SS (PCRE2), respectively |
C++11 (C++) | C++ standards website | Licensed by the respective implementation | Since ISO/IEC 14882:2011(E); similar to ECMAScript by default (Grammar Description) |
D | D | Boost Software License [Note 1] | |
Go | Golang.org | BSD-style | |
Haskell | Haskell.org | BSD3 | Omitted in the language report, and in GHC's Hierarchical Libraries |
Java | Java | GNU General Public License | REs are written as strings in source code: all backslashes must be doubled, harming readability. |
JavaScript (ECMAScript) | ECMA-262 | BSD3 | Limited but REs are first-class citizens of the language with a specific /.../mod syntax. |
Julia | JuliaLang.org | MIT License | REs are part of the language core library, which uses PCRE; an optional wrapper for ICU (C code) is also available. |
Lua | Lua.org | MIT License | Uses simplified, limited dialect; can be bound to more powerful library, like PCRE or an alternative parser like LPeg. |
Mathematica | Wolfram | Proprietary | |
.NET | MSDN | MIT License [Note 2] [Note 3] | |
Nim | nim-lang.org | MIT License | Standard library includes PCRE-based re and nre modules, as well as various alternatives (ex. strutils, pegs (Parsing Expression Grammar matching), strscans, parseutils, etc.). |
Free Pascal (Object Pascal) | www.freepascal.org | LGPL with static linking exception | Free Pascal 2.6+ ships with TRegExpr from Sorokin and two other regular expression libraries; See wiki.lazarus.freepascal.org/Regexpr. |
OCaml | Caml | LGPL | As of 2010, the standard module is generally regarded as deprecated; [2] often recommended libraries are pcre (with full support for PCRE) and re (which is not as complete but claims better performance and provides frontends to popular syntaxes: PCRE, Perl, POSIX, Emacs, shell globbing). |
Perl | Perl.com | Artistic License, or GNU General Public License | Full, central part of the language |
PHP | PHP.net | PHP License | Has two implementations; PCRE is the more efficient in both speed and functions. |
POSIX C (C) | POSIX.1 web publication | Licensed by the respective implementation | Supports POSIX BRE and ERE syntax |
Python | python.org | Python Software Foundation License | Python has two major implementations: the built-in re module and the regex library. |
Ruby | ruby-doc.org | GNU Library General Public License | Ruby 1.8, Ruby 1.9, and Ruby 2.0 and later versions use different engines; Ruby 1.9 integrates Oniguruma, Ruby 2.0 and later integrate Onigmo, a fork from Oniguruma. |
Rust | docs.rs | MIT License | The primary regex crate does not allow look-around expressions. There is an Oniguruma binding called onig that does. |
SAP ABAP | SAP.com | Proprietary | |
Tcl | tcl.tk | Tcl/Tk License (BSD-style) | Tcl library doubles as a regular expression library. |
Wolfram Language | Wolfram Research | Proprietary: usable for free on a limited scale on the Wolfram Development platform | |
XML Schema | W3C | Licensed by the respective implementation | |
XPath 3/XQuery | W3C | Licensed by the respective implementation |
NOTE: An application using a library for regular expression support does not necessarily offer the full set of features of the library, e.g. GNU grep which uses PCRE does not offer lookahead support, though PCRE does.
 | "+" quantifier | Negated character classes | Non-greedy quantifiers [Note 1] | Shy groups [Note 2] | Recursion | Look-ahead | Look-behind | Backreferences [Note 3] | >9 indexable captures |
---|---|---|---|---|---|---|---|---|---|
Boost.Regex | Yes | Yes | Yes | Yes | Yes [Note 4] | Yes | Yes | Yes | Yes |
Boost.Xpressive | Yes | Yes | Yes | Yes | Yes [Note 5] | Yes | Yes | Yes | Yes |
CL-PPCRE | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes | Yes |
EmEditor | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes | No |
FREJ | No [Note 6] | No | Some [Note 6] | Yes | No | No | No | Yes | Yes |
GLib/GRegex | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
GNU grep | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes | ? |
Haskell | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes | Yes |
RXP | Yes | Yes | Yes | Yes | No | No | No | Yes | Yes |
ICU Regex | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes | Yes |
Java | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes | Yes |
JavaScript (ECMAScript) | Yes | Yes | Yes | Yes | No | Yes | Yes [Note 7] | Yes | Yes |
JGsoft | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes | Yes |
Lua | Yes | Yes | Some [Note 8] | No | No | No | No | Yes | No |
.NET | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes | Yes |
OCaml | Yes | Yes | No | No | No | No | No | Yes | No |
PCRE | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
Perl | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
PHP | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
Python | Yes | Yes | Yes | Yes | Yes [Note 9] | Yes | Yes | Yes | Yes |
Qt/QRegExp | Yes | Yes | Yes | Yes | No | Yes | No | Yes | Yes |
RE2 | Yes | Yes | Yes | Yes | No | No | No | No | Yes |
Ruby / Onigmo | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
TRE | Yes | Yes | Yes | Yes | No | No | No | Yes | No |
Vim | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes | No |
RGX | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes | Yes |
Tcl | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes | Yes |
TRegExpr | Yes | ? | Yes | ? | ? | ? | ? | ? | ? |
XML Schema | Yes | Yes | No | N/A | No | No | No | No | N/A |
XPath 3/XQuery | Yes | Yes | Yes | Yes | No | No | No | Yes | Yes |
XRegExp | Yes | Yes | Yes | Yes | No | Yes | No | Yes | Yes |
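As a rough illustration of several columns in the table above, the following sketch uses Python's built-in re module. The sample strings and variable names are invented for the example, and other engines may spell these constructs differently or lack them, as the table indicates.

```python
import re

# Non-greedy quantifier: +? stops at the first closing bracket.
greedy = re.search(r"<.+>", "<a><b>").group()       # "<a><b>"
lazy = re.search(r"<.+?>", "<a><b>").group()        # "<a>"

# Shy (non-capturing) group: (?:...) groups without creating a capture.
shy = re.search(r"(?:ab)+(c)", "ababc").groups()    # ("c",)

# Look-ahead and look-behind: assert context without consuming it.
ahead = re.findall(r"\w+(?=:)", "key: value")       # ["key"]
behind = re.findall(r"(?<=\$)\d+", "cost $42")      # ["42"]

# Backreference: \1 must repeat exactly what group 1 captured.
doubled = re.search(r"\b(\w+) \1\b", "it is is fine").group(1)  # "is"

# More than nine indexable captures: group 10 is addressable by number.
many = re.match(r"(\w)" * 10, "abcdefghij")
tenth = many.group(10)                              # "j"
```

An engine marked "No" in a column would typically reject the corresponding pattern at compile time or interpret it differently, so patterns like these are not portable without checking the table first.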
Lua has only -, which is a non-greedy version of *. It does not have non-greedy versions of + or ?; in the former case, the non-greedy effect can be achieved by repeating the token followed by -, but in the latter case, there is no equivalent.
 | Directives [Note 1] | Conditionals | Atomic groups [Note 2] | Named capture [Note 3] | Comments | Embedded code | Unicode property support [3] | Balancing groups [Note 4] | Variable-length look-behinds [Note 5] |
---|---|---|---|---|---|---|---|---|---|
Boost.Regex | Yes | Yes | Yes | Yes | Yes | No | Some [Note 6] | No | No |
Boost.Xpressive | Yes | No | Yes | Yes | Yes | No | No | No | No |
CL-PPCRE | Yes | Yes | Yes | Yes | Yes | Yes | Some [Note 6] | No | No |
EmEditor | Yes | Yes | ? | ? | Yes | No | ? | No | No |
FREJ | No | No | Yes | Yes | Yes | No | ? | No | No |
GLib/GRegex | Yes | Yes | Yes | Yes | Yes | No | Some [Note 6] | No | No |
GNU grep | Yes | Yes | ? | Yes | Yes | No | No | No | No |
Haskell | ? | ? | ? | ? | ? | No | No | No | No |
RXP | Yes | Yes | No | Yes | Yes | No | No | No | No |
ICU Regex | Yes | No | Yes | Yes [Note 7] | Yes | No | Yes | No | No |
Java | Yes | No | Yes | Yes [Note 8] | Yes | No | Some [Note 6] | No | No |
JavaScript (ECMAScript) | No | No | No | No | No | No | Some [Note 6] [Note 9] [4] | No | No |
JGsoft | Yes | Yes | Yes | Yes | Yes | No | Some [Note 6] | No | Yes |
Lua | No | No | No | No | No | No | No | No | No |
.NET | Yes | Yes | Yes | Yes | Yes | No | Some [Note 6] | Yes | Yes |
OCaml | No | No | No | No | No | No | No | No | No |
PCRE | Yes | Yes | Yes | Yes | Yes | Yes | Yes | No | No |
Perl | Yes | Yes | Yes | Yes | Yes | Yes | Yes | No | No [Note 10] |
PHP | Yes | Yes | Yes | Yes | Yes | No | No | No | No |
Python | Yes | Yes | Yes [Note 11] | Yes | Yes | No | Yes [Note 12] | No | Yes [Note 11] |
Qt/QRegExp | No | No | No | No | No | No | No | No | No |
RE2 | Yes | No | ? | Yes | No | No | Some [Note 6] | No | No |
Ruby / Onigmo | Yes | Yes | Yes | Yes | Yes | Yes | Some [Note 6] | No | No |
Tcl | Yes | No | Yes | No | Yes | No | Yes | No | No |
TRE | Yes | No | No | No | Yes | No | ? | No | No |
Vim | Yes | No | Yes | No | No | No | No | No | Yes |
RGX | Yes | Yes | Yes | Yes | Yes | No | Yes | No | No |
XML Schema | No | No | No | No | No | No | Yes | No | No |
XPath 3/XQuery | No | No | No | No | No | No | Yes | No | No |
XRegExp | Leading only | No | No | Yes | Yes | No | Yes | No | No |
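To make a few of these column headings concrete, here is a minimal sketch using Python's built-in re module. The inputs and variable names are made up for illustration; it covers only directives, named capture, conditionals and comments, and the syntax shown is Python's, which other engines in the table may not share.

```python
import re

# Directive (inline flag): (?i) turns on case-insensitive matching.
directive = re.search(r"(?i)regex", "PCRE and REGEX").group()   # "REGEX"

# Named capture: (?P<name>...) makes the group addressable by name.
m = re.search(r"(?P<year>\d{4})-(?P<month>\d{2})", "released 2011-08")
named = (m.group("year"), m.group("month"))                     # ("2011", "08")

# Conditional: (?(1)yes|no) requires ">" only if group 1 matched "<".
cond = re.compile(r"(<)?\w+(?(1)>|$)")
balanced = cond.fullmatch("<tag>") is not None                  # True
unbalanced = cond.fullmatch("<tag") is not None                 # False

# Comments: (?#...) is an inline comment inside the pattern.
inline = re.search(r"\d+(?#integer part)\.\d+", "pi is 3.14").group()

# Comments: re.VERBOSE ignores whitespace and allows # comments.
verbose = re.compile(r"""
    \d+      # integer part
    \.\d+    # fractional part
""", re.VERBOSE)
number = verbose.search("pi is 3.14").group()                   # "3.14"
```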
 | Native UTF-16 support [Note 1] | Native UTF-8 support [Note 1] | Multi-line matching | Partial match [Note 2] |
---|---|---|---|---|
Boost.Regex | No | No | Yes | Yes |
GLib/GRegex | Yes | Yes | Yes | Yes |
RXP | Yes | Yes | No | Yes |
ICU Regex | Yes | No | Yes | ? |
Java | No | Partial [Note 3] | Yes | Yes |
.NET | No [Note 4] | Yes | Yes | ? |
PCRE | Yes [Note 5] | Yes | Yes | Yes |
Qt/QRegExp | Yes | No | No | ? |
Tcl | Yes | Yes [Note 6] | Yes | ? |
TRE | Yes | Yes | Yes | ? |
RGX | No | No | Yes | ? |
wxWidgets::wxRegEx [Note 7] | Yes | Yes | Yes | ? |
XRegExp | Yes | ? | Yes | ? |
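The multi-line matching column can be illustrated with a short sketch in Python's built-in re module. The second half shows, as an assumption about what encoding support means in practice, the difference between matching on decoded text and on raw UTF-8 bytes. It is not a statement about how any engine in the table defines "native" support, and the sample data is invented.

```python
import re

text = "alpha\nbeta\ngamma"

# Without the multi-line flag, ^ and $ anchor only at the ends of the string.
single = re.findall(r"^\w+$", text)                 # []

# With re.MULTILINE, ^ and $ also anchor at every line boundary.
multi = re.findall(r"^\w+$", text, re.MULTILINE)    # ["alpha", "beta", "gamma"]

# Text versus bytes: "." matches one code point in a str pattern,
# but one byte in a bytes pattern, so U+00E9 counts once or twice.
e_acute = "\u00e9"
code_points = len(re.findall(r".", e_acute))                   # 1
octets = len(re.findall(rb".", e_acute.encode("utf-8")))       # 2
```

An engine that works on encoded bytes without understanding the encoding behaves like the bytes case, which is why the table separates encoding support from the other features.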