Comparison of regular expression engines

This is a comparison of regular expression engines.

Libraries

List of regular expression libraries
Name	Official website	Programming language	Software license	Used by
Boost.Regex^{[Note 1]}	Boost C++ Libraries	C++	Boost	Notepad++ >= 6.0.0, EmEditor
Boost.Xpressive	Boost C++ Libraries	C++	Boost
CL-PPCRE	Edi Weitz	Common Lisp	BSD
cppre	Jeff Stuart	C++	GPL
DEELX	RegExLab	C++	Free personal and commercial use
FREJ^{[Note 2]}	Fuzzy Regular Expressions for Java	Java	LGPL
GLib/GRegex^{[Note 3]}	GLib reference manual	C	LGPL
GRETA	Microsoft Research	C++	?
Helios RXPF	Titan IC	RTL	Proprietary	hardware accelerated regex engine for cybersecurity OEMs
Hyperscan	Intel	C, x86-specific assembly (SSSE3+^[1])	3-clause BSD	Rspamd
ICU	International Components for Unicode	C, C++^{[Note 4]}	ICU	Foundation (Apple and Swift open-source versions)
Jakarta/Regexp	The Apache Jakarta Project	Java	Apache
java.util.regex	Java's User manual	Java	GNU GPLv2 with Classpath exception	jEdit
JRegex	JRegex	Java	BSD
MATLAB	Regular Expressions	MATLAB Language	MATLAB, The Language of Technical Computing
Oniguruma	Kosako	C	BSD	Atom, Take Command Console, Tera Term, TextMate, Sublime Text, SubEthaEdit, EmEditor and jq
Pattwo	Stevesoft	Java (compatible with Java 1.0)	LGPL
PCRE	pcre.org	C, C++^{[Note 5]}	BSD	Apache HTTP Server, Nginx, Julia, HHVM, Notepad++ < 6.0.0, PHP
Qt/QRegExp	Digia	C++	Qt GNU GPL v. 3.0, Qt GNU LGPL v. 2.1, Qt Commercial	Kate, Kile
regex - Henry Spencer's regular expression libraries	ArgList	C	BSD
RE2	RE2	C++	BSD	Go
Henry Spencer's Advanced Regular Expressions	Tcl	C	BSD
RGX	RGX	C++ based component library	P6R
SubReg	Matt Bucknall	C	MIT
TPerlRegEx	TPerlRegEx VCL Component	Object Pascal	MPLv1.1
TRE^{[Note 2]}	Ville Laurikari	C	BSD
TRegExpr	RegExp Studio	Object Pascal	Dual-license: freeware, or LGPL with static linking exception	Total Commander
XRegExp	XRegExp	JavaScript	MIT
Wolfram Language (Mathematica)	Wolfram Language Documentation Center	Wolfram Language		Mathematica, the Wolfram Development Platform

^ Formerly called Regex++
^ ^a ^b One of the fuzzy regular expression engines
^ Included since version 2.13.0
^ ICU4J, the Java version, does not support regular expressions.
^ C++ bindings were developed by Google and became officially part of PCRE in 2006.

Languages

List of languages and frameworks including regular expression support
Language	Official website	Software license	Remarks
ActionScript 3	ActionScript Technology Center	Free
C++11 (C++)	C++ standards website	Licensed by the respective implementation	Since ISO/IEC 14882:2011(E)
D	D	Boost Software License^{[Note 1]}
Go	Golang.org	BSD-style
Haskell	Haskell.org	BSD3	Omitted in the language report, and in GHC's Hierarchical Libraries
Java	Java	GNU General Public License	REs are written as strings in source code: all backslashes must be doubled, harming readability.
JavaScript (ECMAScript)	ECMA-262	BSD3	Limited but REs are first-class citizens of the language with a specific `/.../mod` syntax.
Julia	JuliaLang.org	MIT License	REs are part of the language core library, which uses PCRE; an optional wrapper for ICU (C code) is also available.
Lua	Lua.org	MIT License	Uses simplified, limited dialect; can be bound to more powerful library, like PCRE or an alternative parser like LPeg.
Mathematica	Wolfram	Proprietary
.NET	MSDN	MIT License^{[Note 2]}^{[Note 3]}
Nim	nim-lang.org	MIT License	Standard library includes PCRE-based re and nre modules, as well as various alternatives (e.g. strutils, pegs (Parsing Expression Grammar matching), strscans, parseutils, etc.).
Free Pascal (Object Pascal)	www.freepascal.org	LGPL with static linking exception	Free Pascal 2.6+ ships with TRegExpr from Sorokin and two other regular expression libraries; see wiki.lazarus.freepascal.org/Regexpr.
OCaml	Caml	LGPL	As of 2010, the standard module is generally regarded as deprecated;^[2] often recommended libraries are pcre (with full support for PCRE) and re (which is not as complete but claims better performance and provides frontends to popular syntaxes: PCRE, Perl, POSIX, Emacs, shell globbing).
Perl	Perl.com	Artistic License, or GNU General Public License	Full, central part of the language
PHP	PHP.net	PHP License	Has two implementations, with PCRE being the more efficient in both speed and functionality
POSIX C (C)	libc/regex from BSD	BSD	According to regex(3), available from at least 4.4BSD (if not earlier)
Python	python.org	Python Software Foundation License	Python has two major implementations: the built-in re module and the third-party regex library.
Ruby	ruby-doc.org	GNU Library General Public License	Ruby 1.8 and 1.9 use different engines; 1.9 integrates Oniguruma.
Rust	docs.rs	MIT License	The primary regex crate does not allow look-around expressions. There is an Oniguruma binding called onig that does.
SAP ABAP	SAP.com	Proprietary
Tcl	tcl.tk	Tcl/Tk License (BSD-style)	Tcl library doubles as a regular expression library.
Wolfram Language	Wolfram Research	Proprietary; usable for free on a limited scale on the Wolfram Development platform.
XML Schema	W3C	Implementation-dependent
XPath 3/XQuery	W3C	Implementation-dependent
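
The Java remark above, that backslashes must be doubled when a pattern is written as an ordinary string literal, applies to other languages as well. The following is a minimal sketch in Python, used here only as a stand-in; the pattern and the sample text are made up for illustration. It shows the same pattern written with doubled backslashes and as a raw string literal.

```python
import re

# The same pattern written twice: as an ordinary string literal, where every
# backslash must be doubled, and as a raw string literal, where it is not.
escaped = "\\d+\\.\\d+"
raw = r"\d+\.\d+"

# Both literals produce the identical pattern text.
same_pattern = escaped == raw

# Hypothetical sample input, chosen only for illustration.
found = re.findall(raw, "pi is 3.14, e is 2.718")

print(same_pattern, found)
```

Languages with a dedicated regex literal syntax, such as the `/.../mod` form noted for JavaScript, avoid the doubling in a similar way.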

Language features

NOTE: An application that uses a library for regular expression support does not necessarily offer the library's full feature set. For example, GNU grep, which uses PCRE, does not offer lookahead support, though PCRE does.

Part 1

Language feature comparison (part 1)
	"+" quantifier	Negated character classes	Non-greedy quantifiers^{[Note 1]}	Shy groups^{[Note 2]}	Recursion	Look-ahead	Look-behind	Backreferences^{[Note 3]}	>9 indexable captures
Boost.Regex	Yes	Yes	Yes	Yes	Yes^{[Note 4]}	Yes	Yes	Yes	Yes
Boost.Xpressive	Yes	Yes	Yes	Yes	Yes^{[Note 5]}	Yes	Yes	Yes	Yes
CL-PPCRE	Yes	Yes	Yes	Yes	No	Yes	Yes	Yes	Yes
EmEditor	Yes	Yes	Yes	Yes	No	Yes	Yes	Yes	No
FREJ	No^{[Note 6]}	No	Some^{[Note 6]}	Yes	No	No	No	Yes	Yes
GLib/GRegex	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes
GNU grep	Yes	Yes	Yes	Yes	No	Yes	Yes	Yes	?
Haskell	Yes	Yes	Yes	Yes	No	Yes	Yes	Yes	Yes
Helios RXPF	Yes	Yes	Yes	Yes	No	No	No	Yes	Yes
ICU Regex	Yes	Yes	Yes	Yes	No	Yes	Yes	Yes	Yes
Java	Yes	Yes	Yes	Yes	No	Yes	Yes	Yes	Yes
JavaScript (ECMAScript)	Yes	Yes	Yes	Yes	No	Yes	No	Yes	Yes
JGsoft	Yes	Yes	Yes	Yes	No	Yes	Yes	Yes	Yes
Lua	Yes	Yes	Some^{[Note 7]}	No	No	No	No	Yes	No
.NET	Yes	Yes	Yes	Yes	No	Yes	Yes	Yes	Yes
OCaml	Yes	Yes	No	No	No	No	No	Yes	No
OmniOutliner 3.6.2	Yes	Yes	Yes	No	No	No	No	?	?
PCRE	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes
Perl	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes
PHP	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes
Python	Yes	Yes	Yes	Yes	Yes^{[Note 8]}	Yes	Yes	Yes	Yes
Qt/QRegExp	Yes	Yes	Yes	Yes	No	Yes	No	Yes	Yes
R^{[Note 9]}	Yes	Yes	Yes	Yes	No	Yes	Yes	Yes	Yes
RE2	Yes	Yes	Yes	Yes	No	No	No	No	Yes
Ruby	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes
TRE	Yes	Yes	Yes	Yes	No	No	No	Yes	No
Vim	Yes	Yes	Yes	Yes	No	Yes	Yes	Yes	No
RGX	Yes	Yes	Yes	Yes	No	Yes	Yes	Yes	Yes
Tcl	Yes	Yes	Yes	Yes	No	Yes	Yes	Yes	Yes
TRegExpr	Yes	?	Yes	?	?	?	?	?	?
XML Schema	Yes	Yes	No	—	No	No	No	No	—
XPath 3/XQuery	Yes	Yes	Yes	Yes	No	No	No	Yes	Yes
XRegExp	Yes	Yes	Yes	Yes	No	Yes	No	Yes	Yes

^ Non-greedy quantifiers match as few characters as possible, instead of the default of as many as possible. Note that many older, pre-POSIX engines were non-greedy and didn't have greedy quantifiers at all.
^ Shy groups, also called non-capturing groups, cannot be referred to with backreferences; non-capturing groups are used to speed up matching where the group's content does not need to be accessed later.
^ Backreferences enable referring to previously matched groups in later parts of the regex and/or replacement string (where applicable). For instance, ([ab]+)\1 matches "abab" but not "abaab".
^ http://www.boost.org/doc/libs/1_47_0/libs/regex/doc/html/boost_regex/syntax/perl_syntax.html#boost_regex.syntax.perl_syntax.recursive_expressions
^ http://www.boost.org/doc/libs/1_47_0/doc/html/xpressive/user_s_guide.html#boost_xpressive.user_s_guide.grammars_and_nested_matches.embedding_a_regex_by_reference
^ ^a ^b FREJ has no repetitive quantifiers, but has an "optional" element which behaves similarly to a simple "?" quantifier.
^ Lua's only non-greedy quantifier is -, which is a non-greedy version of *. It does not have non-greedy versions of + or ?; in the former case, the non-greedy effect can be achieved by repeating the token followed by -, but in the latter case, there is no equivalent.
^ Supported by the optional regex library only.
^ Regular Expressions as used in R
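
The first three notes above can be illustrated with a short sketch. It assumes Python's built-in re module as the engine; the sample strings other than "abab" and "abaab" are invented for the example, and the backreference pattern is applied to the whole string.

```python
import re

# Greedy vs. non-greedy: ".+" takes as much as it can, ".+?" as little.
text = "<a><b>"
greedy = re.match(r"<.+>", text).group(0)
lazy = re.match(r"<.+?>", text).group(0)

# Shy (non-capturing) group: (?:...) groups without creating a capture,
# so the only numbered group is the one around "c".
shy = re.match(r"(?:ab)+(c)", "ababc")
captures = shy.groups()

# Backreference example from the note, applied to the whole string.
double = re.compile(r"([ab]+)\1")
matches_abab = double.fullmatch("abab") is not None
matches_abaab = double.fullmatch("abaab") is not None

print(greedy, lazy, captures, matches_abab, matches_abaab)
```

With an unanchored search, the same backreference pattern would still find "aa" inside "abaab", which is why the sketch uses a whole-string match.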

Part 2

Language feature comparison (part 2)
	Directives^{[Note 1]}	Conditionals	Atomic groups^{[Note 2]}	Named capture^{[Note 3]}	Comments	Embedded code	Unicode property support ^[3]	Balancing groups^{[Note 4]}	Variable-length look-behinds^{[Note 5]}
Boost.Regex	Yes	Yes	Yes	Yes	Yes	No	Some^{[Note 6]}	No	No
Boost.Xpressive	Yes	No	Yes	Yes	Yes	No	No	No	No
CL-PPCRE	Yes	Yes	Yes	Yes	Yes	Yes	Some^{[Note 6]}	No	No
EmEditor	Yes	Yes	?	?	Yes	No	?	No	No
FREJ	No	No	Yes	Yes	Yes	No	?	No	No
GLib/GRegex	Yes	Yes	Yes	Yes	Yes	No	Some^{[Note 6]}	No	No
GNU grep	Yes	Yes	?	Yes	Yes	No	No	No	No
Haskell	?	?	?	?	?	No	No	No	No
Helios RXPF	Yes	Yes	No	Yes	Yes	No	No	No	No
ICU Regex	Yes	No	Yes	Yes^{[Note 7]}	Yes	No	Yes	No	No
Java	Yes	No	Yes	Yes^{[Note 8]}	Yes	No	Some^{[Note 6]}	No	No
JavaScript (ECMAScript)	No	No	No	No	No	No	No	No	No
JGsoft	Yes	Yes	Yes	Yes	Yes	No	Some^{[Note 6]}	No	Yes
Lua	No	No	No	No	No	No	No	No	No
.NET	Yes	Yes	Yes	Yes	Yes	No	Some^{[Note 6]}	Yes	Yes
OCaml	No	No	No	No	No	No	No	No	No
OmniOutliner 3.6.2	?	?	?	?	No	No	?	No	No
PCRE	Yes	Yes	Yes	Yes	Yes	Yes	Yes	No	No
Perl	Yes	Yes	Yes	Yes	Yes	Yes	Yes	No	No^{[Note 9]}
PHP	Yes	Yes	Yes	Yes	Yes	No	No	No	No
Python	Yes	Yes	Yes^{[Note 10]}	Yes	Yes	No	Yes^{[Note 11]}	No	Yes^{[Note 10]}
Qt/QRegExp	No	No	No	No	No	No	No	No	No
RE2	Yes	No	?	Yes	No	No	Some^{[Note 6]}	No	No
Ruby	Yes	Yes	Yes	Yes	Yes	Yes	Some^{[Note 6]}	No	No
Tcl	Yes	No	Yes	No	Yes	No	Yes	No	No
TRE	Yes	No	No	No	Yes	No	?	No	No
Vim	Yes	No	Yes	No	No	No	No	No	Yes
RGX	Yes	Yes	Yes	Yes	Yes	No	Yes	No	No
XML Schema	No	No	No	No	No	No	Yes	No	No
XPath 3/XQuery	No	No	No	No	No	No	Yes	No	No
XRegExp	Leading only	No	No	Yes	Yes	No	Yes	No	No

^ Also known as flag modifiers, mode modifiers, or option letters. Example pattern: "(?i:test)".
^ Also called Independent sub-expressions
^ Similar to back references but with names instead of indices
^ Special feature that allows matching balanced constructs without recursion
^ Refers to the possibility of including quantifiers in look-behinds, thus making their length unpredictable
^ ^a ^b ^c ^d ^e ^f ^g ^h Unicode property support may be incomplete (products are continuously updated!). All will be incomplete when a new Unicode revision is released until they are updated to comply.
^ Available as of ICU55
^ Available as of JDK7
^ Experimental support added in v5.29.9
^ ^a ^b Supported by the optional regex library only.
^ May only be available in the regex library when used with Python versions after 3.3
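
Several of the features described in these notes can be tried in a short sketch. It assumes Python's built-in re module (version 3.6 or later for the scoped directive); apart from "(?i:test)", the patterns and inputs are invented for illustration. The last step shows the built-in module rejecting a quantifier inside a look-behind, consistent with the table entry that attributes variable-length look-behinds to the optional regex library.

```python
import re

# Directive (inline flag) scoped to one group, as in the note's "(?i:test)".
directive = re.fullmatch(r"(?i:test)", "TeSt") is not None

# Named capture, plus a named backreference (?P=quote) to the same group.
named = re.compile(r"(?P<quote>['\"])(?P<body>.*?)(?P=quote)")
body = named.search("say 'hello' now").group("body")

# Comments: an inline (?#...) comment, and free-spacing comments
# enabled by re.VERBOSE.
inline_comment = re.fullmatch(r"ab(?#ignored)c", "abc") is not None
verbose = re.compile(r"""
    \d{4}   # year
    -
    \d{2}   # month
""", re.VERBOSE)
date_ok = verbose.fullmatch("2024-05") is not None

# A fixed-length look-behind is accepted ...
price = re.search(r"(?<=\$)\d+", "cost: $42").group(0)

# ... but a quantifier inside the look-behind is rejected by the built-in re.
try:
    re.compile(r"(?<=a+)b")
    variable_lookbehind = True
except re.error:
    variable_lookbehind = False

print(directive, body, inline_comment, date_ok, price, variable_lookbehind)
```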

API features

API feature comparison
	Native UTF-16 support^{[Note 1]}	Native UTF-8 support^{[Note 1]}	Multi-line matching	Partial match^{[Note 2]}
Boost.Regex	No	No	Yes	Yes
GLib/GRegex	Yes	Yes	Yes	Yes
Helios RXPF	Yes	Yes	No	Yes
ICU Regex	Yes	No	Yes	?
Java	No	Partial^{[Note 3]}	Yes	Yes
.NET	No^{[Note 4]}	Yes	Yes	?
PCRE	Yes^{[Note 5]}	Yes	Yes	Yes
Qt/QRegExp	Yes	No	No	?
Tcl	Yes	Yes^{[Note 6]}	Yes	?
TRE	No	?	Yes	?
RGX	No	No	Yes	?
wxWidgets::wxRegEx^{[Note 7]}	Yes	Yes	Yes	?
XRegExp	Yes	?	Yes	?

^ ^a ^b Means the format can be used internally without explicit conversion.
^ Partial match of the whole regular expression. For example the pattern ".*END$" will match any string partially, but only strings ending with END fully.[1]
^ Supports Unicode 4.0 standard from 2003; latest plans for JDK7 include Unicode 6.0 (2011) support.[2]
^ Implementation uses original UCS-2 support/features, so it only recognizes 64K chars total (vs UTF-16's 1,112,064 characters). A Microsoft developer-representative answered a bug report on this as "will not fix" in 2010.[3]
^ Since version 8.30
^ Tcl includes facilities to convert to and from UTF-8.
^ wxRegEx uses any system-supplied POSIX library; if none is available, and in Unicode mode, it uses Henry Spencer's library.
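
The character counts in the UCS-2 note can be checked with a little arithmetic. The sketch below is only an illustration in Python: it derives the 1,112,064 figure from the Unicode code point range minus the surrogate range, and shows that a character outside the 64K range (an emoji picked arbitrarily) occupies two 16-bit code units in UTF-16.

```python
import re

# Code points run from U+0000 to U+10FFFF.
code_points = 0x10FFFF + 1

# The surrogate range U+D800..U+DFFF is reserved for UTF-16 pairs.
surrogates = 0xDFFF - 0xD800 + 1

# Characters encodable in UTF-16: the figure quoted in the note.
utf16_characters = code_points - surrogates

# UCS-2 uses a single 16-bit unit per character, hence 64K values.
ucs2_values = 2 ** 16

# A character outside the 64K range needs two 16-bit code units in UTF-16.
emoji = "\U0001F600"
code_units = len(emoji.encode("utf-16-le")) // 2

# Python strings are sequences of code points, so "." matches the whole
# character here; an engine working on single 16-bit units would not.
one_dot = re.fullmatch(r".", emoji) is not None

print(utf16_characters, ucs2_values, code_units, one_dot)
```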

References

External links

Regular Expression Flavor Comparison — Detailed comparison of the most popular regular expression flavors
Regexp Syntax Summary
Online Regular Expression Testing — with support for Java, JavaScript, .Net, PHP, Python and Ruby
Implementing Regular Expressions — series of articles by Russ Cox, author of RE2
Regular Expression Engines

[boost_mars-12] http://www.digitalmars.com/d/2.0/phobos/std_regex.html

[dotnet_regex_license-13] https://github.com/dotnet/corefx/blob/7116584186f8f3a886616aaf8cb5d4a982c60e27/src/System.Text.RegularExpressions/src/System/Text/RegularExpressions/Regex.cs#L2

[dotnet_license-14] https://github.com/dotnet/corefx#license

[4] https://intel.github.io/hyperscan/dev-reference/getting_started.html#requirements

[40] https://www.unicode.org/reports/tr18/
