The Complete Guide to Regular Expressions (2026): Every Pattern & Engine Explained

Name: Regular Expression Pattern Library & Engine Comparison Dataset 2026
Creator: OnlineTools4Free
Published: 2026-04-14
License: https://creativecommons.org/licenses/by/4.0/

Executive Summary

Regular expressions are among the most powerful and widely available tools in computing. Nearly every programming language, text editor, database engine, and web server supports regex in some form. Yet regex remains widely misunderstood, frequently misused, and occasionally feared. This guide provides a complete, practical reference that covers every aspect of regular expressions in 2026.

We cover the full regex syntax from basic character classes through advanced features like lookahead/lookbehind assertions, named groups, backreferences, atomic groups, possessive quantifiers, and Unicode property escapes. We compare 10 regex engines across 8 programming languages, analyze performance characteristics and catastrophic backtracking, and provide 100+ production-ready patterns you can copy and use immediately.

The report includes engine benchmarks, a pattern library searchable by category, common mistake analysis, a 40+ term glossary, 20 FAQ answers, and an embedded regex tester. Whether you are writing your first pattern or debugging a ReDoS vulnerability, this guide covers it.

At a glance: 100+ ready-to-use patterns, 10 engines compared, 40+ glossary terms defined, 20 FAQ questions answered.

68% of JavaScript developers use regex regularly, the highest share of any language surveyed. Python follows at 62% and TypeScript at 58%.
Catastrophic backtracking remains the #1 regex security risk. Nested quantifiers like (a+)+ can cause exponential processing time on crafted input, enabling ReDoS attacks. Linear-time engines (RE2, Rust regex) eliminate this risk entirely.
ES2024 introduced the /v flag (Unicode Sets) enabling set subtraction and intersection in character classes, the most significant JavaScript regex addition since ES2018 lookbehinds.
PCRE2 with JIT compilation is the fastest backtracking engine at 0.05 microseconds per simple match, while the Rust regex crate leads among guaranteed-linear engines at 0.08 microseconds.

Part 1: History of Regular Expressions

Regular expressions have their roots in formal language theory. In 1951, mathematician Stephen Kleene described “regular events” using his mathematical notation, which he published formally in 1956 in his seminal paper “Representation of Events in Nerve Nets and Finite Automata.” Kleene was modeling the behavior of neural networks using simple algebras, and his notation for describing patterns in strings became the foundation of what we now call regular expressions.

The jump from theory to practice came in 1968 when Ken Thompson implemented regular expression search in the QED text editor at Bell Labs. Thompson’s key insight was translating regex into a non-deterministic finite automaton (NFA), which could be efficiently simulated. In 1973, Thompson created grep (“global regular expression print”), extracting the search functionality into a standalone Unix tool. grep became one of the most important Unix utilities and introduced regex to generations of programmers.

The modern era of regex began with Perl. Larry Wall released Perl 1.0 in 1987 with regex built into the language, and later versions went far beyond POSIX: Perl 5 (1994) added non-greedy quantifiers, lookaheads, and non-capturing groups, and Unicode support followed. Perl’s regex dialect became so influential that Philip Hazel created the PCRE (Perl Compatible Regular Expressions) library in 1997 to bring Perl-style regex to other languages. PCRE became the standard regex library for PHP, Apache, Nginx, R, and many other tools.

In 2007, Russ Cox (Google) created RE2, a regex engine that guarantees linear-time matching by simulating automata instead of backtracking. RE2 sacrifices features like backreferences and lookarounds but eliminates the possibility of catastrophic backtracking. This trade-off proved valuable for security-sensitive applications: Google uses RE2 for all user-facing regex features in its products. The Rust regex crate (1.0 released in 2018) follows the same linear-time philosophy.

JavaScript regex has evolved significantly in recent years. ES2015 added the /u (Unicode) flag. ES2018 brought lookbehind assertions, named capturing groups, the /s (dotall) flag, and Unicode property escapes. ES2022 added the /d flag for match indices. ES2024 introduced the /v (Unicode Sets) flag, enabling set subtraction and intersection within character classes. These additions have made JavaScript regex comparable in power to PCRE for most practical patterns.

Regex History Timeline

Year	Event	Era
1951	Stephen Kleene defines regular events / regular expressions	Theory
1956	Kleene publishes "Representation of Events in Nerve Nets"	Theory
1968	Ken Thompson implements regex in QED editor	Unix
1973	grep utility created for Unix (by Ken Thompson)	Unix
1979	awk (Aho, Weinberger, Kernighan) includes regex support	Unix
1986	POSIX Basic Regular Expressions (BRE) standardized	Standards
1986	Henry Spencer writes first portable regex library in C	Libraries
1987	Perl 1.0 introduces powerful regex syntax	Languages
1992	POSIX Extended Regular Expressions (ERE) finalized	Standards
1997	PCRE (Perl Compatible Regular Expressions) library released	Libraries
1999	ECMAScript 3 (JavaScript) standardizes the native RegExp object	Languages
1999	Python re module stabilized (Python 1.6+)	Languages
2002	.NET adds named groups, lookbehind, conditionals	Languages
2002	Java 1.4 adds java.util.regex with Unicode support	Languages
2007	RE2 engine by Russ Cox (guaranteed linear-time)	Engines
2012	Unicode 6.1 script property support in major engines	Unicode
2015	ES2015 adds /u (Unicode) flag to JavaScript	Languages
2018	ES2018 adds lookbehind assertions, named groups, /s flag	Languages
2018	Rust regex crate 1.0 released, combining safety and speed	Engines
2022	ES2022 adds /d (match indices) flag	Languages
2024	ES2024 adds /v (Unicode sets) flag for set operations	Languages
2025	PCRE2 10.44 adds extended callout features	Libraries
2026	Most engines support Unicode 16.0 property escapes	Unicode

Part 2: Character Classes

Character classes define a set of characters to match at a single position. They are the building blocks of every regex pattern. A character class can be as simple as a literal character (a matches “a”) or as complex as a Unicode property escape matching characters from a specific writing system.

Basic Character Classes

The dot (.) metacharacter matches any single character except a newline (\n). With the /s (dotall) flag, the dot matches newlines as well. Square brackets create custom character classes: [abc] matches any single character that is a, b, or c. Ranges are specified with a hyphen: [a-z] matches any lowercase letter, [0-9] matches any digit. Multiple ranges can be combined: [a-zA-Z0-9] matches any alphanumeric character.

Negated character classes start with ^ inside the brackets: [^abc] matches any character that is NOT a, b, or c. [^0-9] matches any non-digit character. The caret must be the first character after the opening bracket to negate; elsewhere it is a literal ^.

Shorthand Character Classes

Shorthand classes provide convenient notation for common character sets. \d matches any digit: [0-9] in JavaScript (even with /u), Java, and PCRE by default, but any Unicode decimal digit in Python 3 and .NET unless an ASCII mode is requested. \D matches any non-digit. \w matches word characters [a-zA-Z0-9_]. \W matches non-word characters. \s matches whitespace (space, tab, newline, vertical tab, form feed, carriage return). \S matches non-whitespace. These shorthands are available in every modern Perl-style regex engine.
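The effect of Unicode-aware versus ASCII-only shorthands can be checked with a short Python sketch. Python is used here only as a convenient test bed: its re module treats \d and \w as Unicode-aware for text patterns unless re.ASCII is set. The sample string is made up for the demonstration.

```python
import re

# Hypothetical sample: ASCII digits, Arabic-Indic digits, and an accented word.
text = "42 \u0664\u0662 caf\u00e9"

# Default text patterns are Unicode-aware: \d matches any decimal digit,
# \w matches letters and digits from any script.
unicode_digits = re.findall(r"\d+", text)
unicode_words = re.findall(r"\w+", text)

# re.ASCII restricts \d to [0-9] and \w to [a-zA-Z0-9_].
ascii_digits = re.findall(r"\d+", text, flags=re.ASCII)
ascii_words = re.findall(r"\w+", text, flags=re.ASCII)

print(unicode_digits, ascii_digits)
print(unicode_words, ascii_words)
```

In ASCII mode the accented word is cut short at "caf", which is the same class of surprise the Unicode sections of this guide warn about.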

Unicode Character Properties

Unicode property escapes (\p{Property}) match characters by their Unicode category, script, or other property. \p{L} matches any Unicode letter (Latin, Cyrillic, Han, Arabic, etc.), \p{N} matches any number, \p{P} matches any punctuation. Script-specific matching like \p{Script=Latin} matches only Latin-script characters. The ES2024 /v flag adds set operations: [\p{Letter}--\p{Script=Greek}] matches any letter that is NOT Greek.

Key Finding

Always use the /u flag (or /v in ES2024+) when working with non-ASCII text in JavaScript regex.

Without /u, JavaScript treats strings as UTF-16 code units, which breaks emoji matching, supplementary characters, and property escapes. The /u flag enables correct code-point-based matching.

Character Classes Reference

Syntax	Description	Example	Category
.	Any character except newline (unless /s flag)	a.c matches "abc", "a1c"	Basic
\d	Digit character [0-9]	\d{3} matches "123"	Shorthand
\D	Non-digit character [^0-9]	\D+ matches "abc"	Shorthand
\w	Word character [a-zA-Z0-9_]	\w+ matches "hello_42"	Shorthand
\W	Non-word character [^a-zA-Z0-9_]	\W matches "!"	Shorthand
\s	Whitespace [\t\n\r\f\v ]	\s+ matches " "	Shorthand
\S	Non-whitespace character	\S+ matches "hello"	Shorthand
[abc]	Character set: a, b, or c	[aeiou] matches vowels	Set
[^abc]	Negated set: not a, b, or c	[^0-9] matches non-digits	Set
[a-z]	Range: a through z	[a-zA-Z] matches letters	Set
[\u{1F600}-\u{1F64F}]	Unicode range (with /u flag)	Matches emoticon block	Unicode
\p{L}	Unicode Letter category (with /u)	\p{L}+ matches "cafe"	Unicode
\p{N}	Unicode Number category	\p{N}+ matches "42"	Unicode
\p{P}	Unicode Punctuation category	\p{P} matches "."	Unicode
\p{S}	Unicode Symbol category	\p{S} matches "$"	Unicode
\p{Script=Latin}	Unicode Latin script	Matches Latin letters	Unicode
\p{Script=Han}	Unicode CJK script	Matches Chinese characters	Unicode
\p{Emoji}	Unicode Emoji property (with /u; /v adds string properties such as \p{RGI_Emoji})	Matches emoji characters	Unicode
\b	Word boundary	\bcat\b matches "cat" not "catch"	Boundary
\B	Non-word boundary	\Bcat\B matches "cat" inside "concatenate"	Boundary

Part 3: Quantifiers

Quantifiers specify how many times the preceding element should be matched. They are the mechanism that transforms a single-character match into a multi-character pattern. Quantifiers come in up to three modes: greedy (default), lazy, and, in engines that support it, possessive.

Greedy vs. Lazy vs. Possessive

Greedy quantifiers (*, +, ?, {n,m}) match as many characters as possible, then give back characters one at a time if the rest of the pattern fails. This is the default behavior. Lazy quantifiers (*?, +?, ??, {n,m}?) match as few characters as possible, then expand one character at a time. Possessive quantifiers (*+, ++, ?+, {n,m}+) match as many characters as possible and never backtrack — they either succeed or fail immediately.

Example: given the input “aaa” and the pattern a{2,4}, greedy matches “aaa” (maximum 3), lazy matches “aa” (minimum 2), and possessive matches “aaa” (maximum 3, no backtracking). The difference matters when the quantifier is followed by more pattern elements that might require the quantifier to give back characters.
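The greedy and lazy cases from this example can be reproduced in Python as a quick check. The HTML-like string is an assumption added for illustration; possessive forms are left out because they need Python 3.11 or later.

```python
import re

# a{2,4} on "aaa": greedy takes as many as it can, lazy as few as it must.
greedy = re.match(r"a{2,4}", "aaa").group()
lazy = re.match(r"a{2,4}?", "aaa").group()

# The difference matters once more pattern follows the quantifier.
html = "<b>bold</b>"                              # hypothetical sample input
greedy_tag = re.search(r"<.*>", html).group()     # runs to the last ">"
lazy_tag = re.search(r"<.*?>", html).group()      # stops at the first ">"

print(greedy, lazy)          # aaa aa
print(greedy_tag, lazy_tag)  # <b>bold</b> <b>
```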

Possessive quantifiers are supported by PCRE, Java, and Python 3.11+, but not by JavaScript or .NET (.NET offers atomic groups, which achieve the same effect). They are particularly valuable for preventing catastrophic backtracking: if you know a quantifier should never give back characters, making it possessive eliminates the possibility of exponential backtracking.

Quantifiers Reference

Syntax	Name	Description	Example	Lazy Form	Possessive Form
*	Star	0 or more (greedy)	a* matches "", "a", "aaa"	*?	*+
+	Plus	1 or more (greedy)	a+ matches "a", "aaa" not ""	+?	++
?	Question	0 or 1 (greedy)	colou?r matches "color", "colour"	??	?+
{n}	Exact	Exactly n times	\d{4} matches "2026"	N/A	N/A
{n,}	At least	n or more times	\d{2,} matches "42", "123"	{n,}?	{n,}+
{n,m}	Range	Between n and m times	\d{2,4} matches "42", "2026"	{n,m}?	{n,m}+

Characters matched by a{quantifier} on input 'aaa'

Source: OnlineTools4Free Research

Part 4: Groups & Backreferences

Groups serve two purposes in regex: they apply quantifiers or alternation to a sub-expression, and they capture matched text for later use. Understanding the different group types is essential for writing effective patterns.

Capturing vs. Non-Capturing Groups

A capturing group (expr) saves the matched text in a numbered slot. Groups are numbered left-to-right by their opening parenthesis: in (a)(b(c)), group 1 captures “a”, group 2 captures “bc”, and group 3 captures “c”. Non-capturing groups (?:expr) group the expression without capturing, which improves performance and avoids cluttering the match result.

Named capturing groups (?<name>expr) provide a readable alternative to numbered groups. Instead of referencing \1, you reference \k<name>; Python’s re module spells these (?P<name>expr) and (?P=name). In JavaScript, named captures appear in the match.groups object. Named groups are self-documenting: (?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2}) is immediately readable.
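A small Python sketch of group numbering and named captures follows. Note that Python writes named groups as (?P<name>...) rather than (?<name>...); the variable names are illustrative only.

```python
import re

# Groups are numbered by the position of their opening parenthesis.
m = re.fullmatch(r"(a)(b(c))", "abc")
numbered = m.groups()

# Python spelling of the date pattern from the text, using (?P<name>...).
DATE = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})")
parts = DATE.fullmatch("2026-04-14").groupdict()

print(numbered)  # ('a', 'bc', 'c')
print(parts)     # {'year': '2026', 'month': '04', 'day': '14'}
```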

Backreferences

A backreference \1 (or \k<name>) matches the exact same text that was captured by the referenced group. This is fundamentally different from repeating the group pattern: (\w+)\s\1 matches “the the” (same word repeated) but (\w+)\s(\w+) matches “the cat” (any two words). Backreferences are powerful for detecting duplicates, matching delimiters, and validating symmetric structures.
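As an illustration, a doubled-word detector can be sketched in Python. The word boundaries and the case-insensitive flag are additions made here so that "the then" is not reported and "The the" is; the function name is hypothetical.

```python
import re

# \1 must repeat the exact text captured by group 1 (ignoring case here).
DOUBLED = re.compile(r"\b(\w+)\s+\1\b", re.IGNORECASE)

def find_doubled_words(text):
    """Return each word that is immediately repeated in text."""
    return [m.group(1) for m in DOUBLED.finditer(text)]

# Without a backreference, the pattern accepts any two words.
any_two = re.search(r"(\w+)\s(\w+)", "the cat").groups()

print(find_doubled_words("Paris in the the spring"))  # ['the']
print(any_two)                                        # ('the', 'cat')
```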

Important: backreferences make a regex non-regular in the formal language theory sense. This means engines that guarantee linear time (RE2, Rust regex) cannot support backreferences. Backtracking engines (PCRE, JavaScript, Java, Python) support them, but matching with backreferences can take exponential time in the worst case.

Atomic Groups and Conditionals

Atomic groups (?>expr) prevent backtracking into the group after it has matched. Once the engine commits to a match inside an atomic group, it cannot reconsider. This is a powerful tool for preventing catastrophic backtracking. Conditional patterns (?(condition)yes|no) match different sub-patterns depending on whether a condition is met (typically whether a previous group captured anything).
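A conditional can be sketched in Python, whose re module supports the (?(1)yes|no) form. The area-code pattern below is an assumption chosen for illustration: it requires a closing parenthesis only when an opening one was captured. Atomic groups are not shown because Python only accepts them from version 3.11.

```python
import re

# Group 1 optionally matches "("; the conditional (?(1)\)) then demands ")"
# only if group 1 took part in the match.
AREA_CODE = re.compile(r"(\()?\d{3}(?(1)\))")

def is_area_code(text):
    """True for '415' or '(415)', False for unbalanced forms."""
    return AREA_CODE.fullmatch(text) is not None

for sample in ["(415)", "415", "(415", "415)"]:
    print(sample, is_area_code(sample))
```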

Group Types Reference

Syntax	Name	Description	Example	Engine Support
(expr)	Capturing Group	Captures matched text for backreference	(abc)\1 matches "abcabc"	All engines
(?:expr)	Non-Capturing Group	Groups without capturing	(?:abc)+ matches "abcabc"	All engines
(?<name>expr)	Named Capturing Group	Named capture for readability	(?<year>\d{4})	PCRE, JS (ES2018+), .NET, Java; Python uses (?P<name>expr)
(?=expr)	Positive Lookahead	Asserts what follows matches	\d(?=px) matches "5" in "5px"	Backtracking engines (not RE2 or Rust regex)
(?!expr)	Negative Lookahead	Asserts what follows does not match	\d(?!px) matches "5" in "5em"	Backtracking engines (not RE2 or Rust regex)
(?<=expr)	Positive Lookbehind	Asserts what precedes matches	(?<=\$)\d+ matches "100" in "$100"	PCRE, JS (ES2018+), .NET, Python, Java
(?<!expr)	Negative Lookbehind	Asserts what precedes does not match	(?<!\$)\d+ matches "100" in "EUR100"	PCRE, JS (ES2018+), .NET, Python, Java
(?>expr)	Atomic Group	No backtracking into group	(?>a+)b prevents catastrophic backtracking	PCRE, .NET, Java, Ruby, Python 3.11+
(?P<name>expr)	Python Named Group	Python-style named group	(?P<year>\d{4})	Python, PCRE
(?(1)yes|no)	Conditional	If group 1 matched, try yes else no	(a)?(?(1)b|c)	PCRE, .NET, Python
\1	Backreference	Matches same text as group 1	(\w+)\s\1 matches "the the"	Backtracking engines (not RE2 or Rust regex)
\k<name>	Named Backreference	Matches same text as named group	(?<word>\w+)\s\k<word>	PCRE, JS, .NET, Java

Part 5: Lookahead & Lookbehind Assertions

Lookahead and lookbehind assertions (collectively called “lookarounds”) are zero-width assertions that check whether a pattern matches at a position without consuming any characters. They are among the most powerful and most misunderstood regex features.

Positive and Negative Lookahead

Positive lookahead (?=expr) asserts that what follows the current position matches expr. The classic use case is password validation with multiple requirements: ^(?=.*[A-Z])(?=.*[a-z])(?=.*\d).{8,}$ uses three lookaheads at position 0 to verify the presence of uppercase, lowercase, and digit characters anywhere in the string, then the .{8,}$ matches the actual 8+ characters.

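Negative lookahead (?!expr) asserts that what follows does NOT match. Example: \d+(?!px) matches numbers not followed by “px”, so it matches the value in “5em” but not in “5px”. On multi-digit input the quantifier can backtrack (in “15px” it still matches “1”), so a stricter form such as \d+(?!\d|px) is often what is intended. Another example: ^(?!.*admin).*$ matches strings that do NOT contain “admin” anywhere.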
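The lookahead patterns above can be exercised in Python as follows. The tightened variant \d+(?!\d|px) is an assumption introduced here to stop the quantifier from backtracking into a shorter number; the sample strings are made up.

```python
import re

# Three lookaheads check independent conditions from position 0.
PASSWORD = re.compile(r"^(?=.*[A-Z])(?=.*[a-z])(?=.*\d).{8,}$")

# Plain negative lookahead: backtracking lets "1" match inside "15px".
LOOSE_NOT_PX = re.compile(r"\d+(?!px)")
# Tightened variant (assumption): also refuse to stop in front of a digit.
NOT_PX = re.compile(r"\d+(?!\d|px)")

# Reject any string containing "admin".
NO_ADMIN = re.compile(r"^(?!.*admin).*$")

print(bool(PASSWORD.match("Passw0rdX")), bool(PASSWORD.match("password1")))
print(LOOSE_NOT_PX.findall("15px"))        # ['1']
print(NOT_PX.findall("15px 20em 3px 7"))   # ['20', '7']
print(bool(NO_ADMIN.match("guest")), bool(NO_ADMIN.match("root-admin-1")))
```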

Positive and Negative Lookbehind

Positive lookbehind (?<=expr) asserts that what precedes the current position matches expr. Example: (?<=\$)\d+ matches digits preceded by a dollar sign, capturing “100” from “$100” without including the dollar sign in the match. Negative lookbehind (?<!expr) asserts that the preceding text does NOT match.

Lookbehind support varies significantly across engines. JavaScript added lookbehind in ES2018 (Chrome 62+, Firefox 78+, Node 10+) and supports variable-length lookbehinds, as does .NET. Python’s re module allows only fixed-length lookbehinds (each alternative must have the same length). Java accepts lookbehinds of varying but bounded length, and PCRE2 has traditionally required each alternative to be fixed-length, with bounded variable-length lookbehind arriving only in recent releases. RE2 and the Rust regex crate do not support lookarounds at all.
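Python's fixed-length restriction is easy to observe. The sketch below shows a lookbehind that compiles, one that the re module rejects, and one possible workaround that moves the alternation outside the lookbehind; the workaround is a suggestion, not the only option.

```python
import re

# Fixed-length lookbehind: accepted.
price = re.search(r"(?<=\$)\d+", "Total: $100").group()

# Alternatives of different widths (\s is 1 wide, ^ is 0 wide): rejected by re.
try:
    re.compile(r"(?<=\s|^)word")
    fixed_width_required = False
except re.error:
    fixed_width_required = True

# Workaround (assumption): alternate between a lookbehind and the anchor.
WORD = re.compile(r"(?:(?<=\s)|^)word(?=\s|$)")
standalone = WORD.findall("word sword word")

print(price)                 # 100
print(fixed_width_required)  # True
print(standalone)            # ['word', 'word']
```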

Practical Lookaround Patterns

Lookarounds excel in several scenarios. For adding thousands separators: (?<=\d)(?=(\d{3})+(?!\d)) inserts a comma between a digit and a group of three digits. For matching words at boundaries without consuming the boundary: (?<=\s|^)word(?=\s|$). For extracting content between delimiters without including them: (?<=").*?(?=").
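The thousands-separator pattern can be tried directly with a substitution. This Python sketch assumes the input contains plain integer digit runs; the helper name is made up for the example.

```python
import re

# Zero-width match: between a digit and a run of digit triples that ends the number.
THOUSANDS = re.compile(r"(?<=\d)(?=(\d{3})+(?!\d))")

def add_commas(text):
    """Insert commas into every integer digit run in text."""
    return THOUSANDS.sub(",", text)

print(add_commas("1234567"))     # 1,234,567
print(add_commas("x 12345 y"))   # x 12,345 y
```

Because both assertions are zero-width, the substitution inserts text without replacing any digits.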

A powerful technique is the “tempered greedy token”: (?:(?!stop).)*stop matches everything up to “stop” without crossing it. At each position, the negative lookahead verifies that “stop” does not start here, then the dot consumes one character. This pattern works in any engine that supports lookahead and, for a fixed stop word, scales linearly with the input, though it is slower than a simple negated class due to the per-character assertion.
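A short Python comparison shows what "without crossing it" means in practice. The sample strings are assumptions; the second half contrasts the tempered token with a lazy dot, which can still be pushed past the first "stop" when the rest of the pattern fails there.

```python
import re

text = "go go stop and stop"
tempered = re.match(r"(?:(?!stop).)*stop", text).group()
greedy = re.match(r".*stop", text).group()

# With more pattern after the stop word, a lazy dot may cross the first "stop".
subject = "x stop y stop!"
lazy_crosses = re.match(r".*?stop!", subject)
tempered_fails = re.match(r"(?:(?!stop).)*stop!", subject)

print(tempered)                # go go stop
print(greedy)                  # go go stop and stop
print(lazy_crosses.group())    # x stop y stop!
print(tempered_fails)          # None
```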

Key Finding

Lookaround assertions do not consume characters. They assert a condition at a position, allowing the main pattern to match without including the asserted text.

This makes lookarounds essential for complex validation (multiple conditions on the same text), context-sensitive matching (match X only when preceded/followed by Y), and extraction without delimiters.

Regex Flags Reference

Flag	Name	Description	Support	Notes
g	Global	Find all matches, not just the first	JS, PCRE, Python (re.findall)	In most languages, iteration or findall replaces the flag
i	Case-insensitive	Match upper and lowercase equivalently	All engines	Unicode case folding may differ from ASCII-only
m	Multiline	^ and $ match line boundaries, not string boundaries	All engines	In Ruby, /m is the dotall flag instead
s	Dotall / Single-line	. matches newline characters	PCRE, JS (ES2018+), Python (re.DOTALL), .NET	Without this, . stops at \n
u	Unicode	Enable full Unicode matching and \p{} support	JS (ES2015+), PCRE2	Critical for non-ASCII text. Always use in modern JS.
v	Unicode Sets	Unicode set operations, improved classes	JS (ES2024+)	Superset of /u. Enables set subtraction, intersection.
x	Extended / Verbose	Allow whitespace and comments in pattern	PCRE, Python (re.VERBOSE), Ruby, Java	Great for documenting complex patterns
d	Match indices	Return start/end indices for each capture	JS (ES2022+)	Adds indices property to match result
y	Sticky	Match only at lastIndex position	JS (ES2015+)	Used for lexer/tokenizer implementations
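Several of these flags have direct Python equivalents (re.IGNORECASE, re.MULTILINE, re.DOTALL, re.VERBOSE). The sketch below uses a made-up log string to show their effect; the g, d, y, and v flags are JavaScript-specific and have no counterpart here.

```python
import re

log = "ERROR disk full\ninfo ok\nerror retry"   # hypothetical sample

# i + m: case-insensitive, with ^ and $ matching at every line boundary.
errors = re.findall(r"^error.*$", log, flags=re.IGNORECASE | re.MULTILINE)

# s (DOTALL): without it the dot stops at the newline.
without_s = re.search(r"disk.*ok", log)
with_s = re.search(r"disk.*ok", log, flags=re.DOTALL)

# x (VERBOSE): whitespace and comments inside the pattern are ignored.
ISO_DATE = re.compile(r"""
    (?P<year>\d{4}) -    # four-digit year
    (?P<month>\d{2}) -   # two-digit month
    (?P<day>\d{2})       # two-digit day
""", re.VERBOSE)

print(errors)      # ['ERROR disk full', 'error retry']
print(without_s)   # None
print(ISO_DATE.fullmatch("2026-04-14").group("month"))  # 04
```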

Part 6: Unicode & Property Escapes

Unicode support in regex has progressed from an afterthought to a first-class feature. Modern applications must handle text in multiple scripts, emoji, and supplementary characters. Without proper Unicode support, regex patterns break on non-ASCII input in subtle and frustrating ways.

The Unicode Problem in JavaScript

JavaScript uses UTF-16 encoding internally. Characters above U+FFFF (including most emoji) are represented as two 16-bit “surrogate” code units. Without the /u flag, JavaScript regex treats each surrogate as a separate character. This means /^.$/ fails to match a single emoji character, and quantifiers apply to individual surrogates rather than complete code points. The /u flag fixes this by switching to code-point semantics.
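The code-unit versus code-point distinction can be made concrete with a Python sketch. Python strings are sequences of code points, so the dot already sees one character; encoding to UTF-16 shows the two code units a JavaScript regex without /u would see.

```python
import re

emoji = "\U0001F600"  # GRINNING FACE, a code point above U+FFFF

# Code-point view (Python strings, JavaScript with /u): one character.
code_points = len(emoji)
matches_single = re.fullmatch(r".", emoji) is not None

# UTF-16 view (JavaScript without /u): a surrogate pair, two code units.
utf16_units = len(emoji.encode("utf-16-le")) // 2

print(code_points, matches_single, utf16_units)  # 1 True 2
```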

The /u flag also enables Unicode property escapes: \p{Letter} matches any letter in any script, \p{Script=Cyrillic} matches Cyrillic characters, and \P{Letter} (uppercase P) matches non-letters. The ES2024 /v flag (Unicode Sets) extends this further with set operations: [[a-z]--[aeiou]] matches consonants, and [[\p{Letter}&&\p{Script=Latin}]] matches only Latin letters.

Unicode Categories and Scripts

Unicode defines General Categories that classify every character: Letter (L), Number (N), Punctuation (P), Symbol (S), Separator (Z), Mark (M), and Other (C). Each has subcategories: Lu (uppercase letter), Ll (lowercase letter), Nd (decimal digit number). Regex engines expose these via \p{Lu}, \p{Nd}, etc.
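Python's built-in re module does not implement \p{...}, but the same General Category data is exposed by the standard unicodedata module. The helper below is a hypothetical stand-in that filters characters by category prefix, roughly what \p{Lu} or \p{N} would select.

```python
import unicodedata

def chars_in_category(text, prefix):
    """Return characters whose General Category starts with prefix (e.g. 'Lu', 'N')."""
    return [ch for ch in text if unicodedata.category(ch).startswith(prefix)]

sample = "\u00dcn\u00efcode 42!"   # "Ünïcode 42!"
print(chars_in_category(sample, "Lu"))  # uppercase letters
print(chars_in_category(sample, "N"))   # numbers
print(chars_in_category(sample, "P"))   # punctuation
```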

Unicode Scripts assign characters to writing systems: Latin, Cyrillic, Han, Arabic, Devanagari, and 150+ others. Script matching is essential for input validation in multilingual applications. For example, to accept only Latin and common characters: ^[\p{Script=Latin}\p{Script=Common}]+$. To detect mixed-script text (potential homograph attacks): check if input contains characters from multiple scripts.
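A rough mixed-script check can be sketched in Python. The standard library has no Script property, so this example approximates the script by the first word of each letter's Unicode name (LATIN, CYRILLIC, GREEK, and so on). That is a heuristic assumed for illustration, not a substitute for real \p{Script=...} data, and both function names are made up.

```python
import unicodedata

def scripts_used(text):
    """Approximate the set of scripts in text from Unicode character names."""
    scripts = set()
    for ch in text:
        if not ch.isalpha():
            continue  # digits, punctuation, and spaces are shared across scripts
        name = unicodedata.name(ch, "")
        if name:
            scripts.add(name.split(" ")[0])
    return scripts

def looks_mixed_script(text):
    """True when letters from more than one (approximate) script appear."""
    return len(scripts_used(text)) > 1

spoofed = "p\u0430ypal"   # second letter is CYRILLIC SMALL LETTER A
print(scripts_used("paypal"), looks_mixed_script("paypal"))
print(looks_mixed_script(spoofed))
```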

Regex Usage by Programming Language (% of developers using regex regularly)

Source: OnlineTools4Free Research

Regex Usage by Language

Language	Usage (%)	Engine	Named Groups	Lookbehind
JavaScript	68	Irregexp (V8)	Yes	Yes (ES2018)
Python	62	SRE	Yes	Fixed-length
Java	45	NFA backtracking	Yes	Finite length
PHP	35	PCRE2	Yes	Variable length
Go	28	RE2 variant	Yes	No
C#	25	NFA backtracking	Yes	Variable length
Ruby	18	Oniguruma	Yes	Variable length
Rust	15	Hybrid NFA/DFA	Yes	No
TypeScript	58	Irregexp (V8)	Yes	Yes (ES2018)
Shell/Bash	42	POSIX / PCRE	PCRE only	PCRE only

Part 7: Regex Engines Compared

Not all regex engines are created equal. The choice of engine determines which features are available, what performance guarantees exist, and whether catastrophic backtracking is possible. Understanding engine differences is essential when writing patterns that must work across languages.

NFA vs. DFA Engines

NFA (Non-deterministic Finite Automaton) engines use backtracking to explore possible matches. They support the full regex feature set including backreferences, lookarounds, atomic groups, and possessive quantifiers. The trade-off is that pathological patterns can cause exponential worst-case performance. PCRE, JavaScript, Python, Java, Ruby, and .NET all use NFA engines.

DFA (Deterministic Finite Automaton) engines process each input character exactly once, guaranteeing O(n) time in the input length regardless of the pattern. They cannot support backreferences or lookarounds because these features require tracking state that DFA engines do not maintain. RE2 (Google) and the Rust regex crate use hybrid NFA/DFA approaches that provide linear-time guarantees while supporting most common features; Go’s regexp package follows RE2’s design and offers the same guarantee.
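The set-of-states idea behind linear-time matching can be sketched in a few lines of Python. The three-state automaton below is hand-built for (a+)+b purely for illustration; it is not how RE2 or the Rust crate are implemented, and the names TRANSITIONS and nfa_fullmatch are made up for this example.

```python
# Hand-built NFA for (a+)+b: 0 = start, 1 = "seen one or more a", 2 = accept.
TRANSITIONS = {
    (0, "a"): {1},
    (1, "a"): {1},   # inner and outer repetition collapse into the same state
    (1, "b"): {2},
}
ACCEPT = 2

def nfa_fullmatch(text):
    """Track every state the NFA could be in: one pass, no backtracking."""
    current = {0}
    for ch in text:
        current = {nxt for state in current
                   for nxt in TRANSITIONS.get((state, ch), ())}
        if not current:
            return False   # no live states left, so the match cannot succeed
    return ACCEPT in current

print(nfa_fullmatch("a" * 5000 + "b"))   # True
print(nfa_fullmatch("a" * 5000 + "c"))   # False
```

The work per character is bounded by the number of states, so the failing input that cripples a backtracking engine is rejected after a single pass.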

Engine Feature Matrix

The table below compares 10 regex engines across key features. PCRE2 has the richest feature set (recursion, callouts, conditionals, variable-length lookbehind). JavaScript has improved rapidly with ES2018-2024 additions but still lacks atomic groups, possessive quantifiers, and recursion. RE2 and the Rust regex crate trade features for guaranteed linear-time performance.

Regex Engine Comparison

Engine	Language	Type	Variable Lookbehind	Atomic Groups	Recursion	Speed
PCRE2	C (used by PHP, R, Nginx)	Backtracking (NFA)	Yes	Yes	Yes	Fast
JavaScript (V8)	JavaScript	Backtracking (NFA)	Yes (ES2018+)	No	No	Fast
.NET (System.Text.RegularExpressions)	C#, F#, VB.NET	Backtracking (NFA)	Yes	Yes	Yes (balancing groups)	Fast
Python (re)	Python	Backtracking (NFA)	Fixed-length only	Yes (3.11+)	No (use regex module)	Moderate
Java (java.util.regex)	Java, Kotlin	Backtracking (NFA)	Finite (not *)	Yes (?>)	No	Moderate
RE2	Go, C++	Thompson NFA / DFA	No lookbehind	N/A	N/A	Very fast (linear guarantee)
Rust (regex crate)	Rust	Hybrid NFA/DFA	No lookbehind	N/A	N/A	Very fast (linear guarantee)
Ruby (Oniguruma)	Ruby	Backtracking (NFA)	Yes	Yes	Yes	Fast
POSIX BRE	sed, grep (default)	DFA/NFA	No	No	No	Fast
POSIX ERE	grep -E, awk	DFA/NFA	No	No	No	Fast

Regex Engine Speed: Simple vs Complex Pattern (microseconds, lower is better)

Source: OnlineTools4Free Research

Part 8: Performance & ReDoS

Regex performance is dominated by one concern: catastrophic backtracking. A well-written regex runs in linear time O(n) on any input. A poorly written regex can run in exponential time O(2^n), hanging the application for seconds, minutes, or effectively forever on crafted input. Understanding and preventing catastrophic backtracking is the most important performance skill in regex.

Catastrophic Backtracking Explained

Catastrophic backtracking occurs when a pattern has overlapping or nested quantifiers that create an exponential number of ways to match the input. The canonical example is (a+)+b applied to the input “aaaaaaaaaaac”. The outer + and inner + both try to match the ‘a’ characters. When the engine fails to find ‘b’ at the end, it backtracks through every possible partition of the ‘a’ characters between the two quantifiers: (aaaa)(aa), (aaa)(aaa), (aaa)(aa)(a), etc. With n ‘a’ characters, there are 2^(n-1) possible partitions.

Other dangerous patterns include: (a|a)*b (alternation with overlap), (a*)*b (nested star quantifiers), and .*X.* on long inputs without X (quadratic backtracking). The fix is to eliminate ambiguity: rewrite (a+)+b as a+b (since the nested grouping adds nothing), use atomic groups (?>a+)b to prevent backtracking into the group, or use possessive quantifiers a++b.
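The growth can be observed with a small timing harness in Python's backtracking re module. This is a sketch: absolute times depend on the machine and Python version, so no figures are claimed here, and the input sizes are kept small on purpose. The helper name is hypothetical.

```python
import re
import time

NESTED = re.compile(r"(a+)+b")   # ambiguous: many ways to split a run of a's
FLAT = re.compile(r"a+b")        # accepts the same strings, one way to match

def time_failed_match(pattern, n):
    """Seconds spent failing to match n 'a' characters followed by 'c'."""
    subject = "a" * n + "c"
    start = time.perf_counter()
    result = pattern.match(subject)
    elapsed = time.perf_counter() - start
    assert result is None   # neither pattern can match without a 'b'
    return elapsed

if __name__ == "__main__":
    for n in (10, 14, 18):
        print(n, time_failed_match(NESTED, n), time_failed_match(FLAT, n))
```

Each extra "a" should roughly double the time for the nested pattern while the flat rewrite stays effectively constant at these sizes.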

ReDoS: Regex Denial of Service

ReDoS attacks exploit catastrophic backtracking in server-side regex patterns. If a web application uses a vulnerable regex to validate user input, an attacker can craft input that causes the regex to run for minutes, consuming a CPU core and potentially bringing down the service. Notable ReDoS incidents have affected Node.js packages (ua-parser-js, moment.js), Cloudflare (2019 outage caused by a single regex), and Stack Overflow.

Prevention strategies: use a linear-time engine (RE2, Rust regex) for user-facing validation; set timeouts on regex execution (.NET supports this natively; in Node.js, use worker threads with timeouts); run static analysis tools (safe-regex, rxxr2, redos-checker) on patterns; avoid processing untrusted input with complex patterns; and review all regex patterns that include nested quantifiers, alternation with overlapping alternatives, or .* in the middle of a pattern.
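Two of these strategies can be sketched in Python: a naive pattern review and an input-length cap. The heuristic below only spots a parenthesized group that contains + or * and is itself followed by + or *. It misses overlapping alternation such as (a|a)*b, can be fooled by escaped or bracketed quantifier characters, and is no replacement for the dedicated analysis tools named above. Both function names and the 1000-character limit are assumptions.

```python
import re

# Naive heuristic: a group containing + or * that is itself repeated with + or *.
NESTED_QUANTIFIER = re.compile(r"\([^()]*[+*][^()]*\)[+*]")

def looks_risky(pattern):
    """Flag patterns with an obviously nested quantifier (heuristic only)."""
    return NESTED_QUANTIFIER.search(pattern) is not None

def safe_fullmatch(pattern, text, max_len=1000):
    """Refuse oversized input before running a backtracking match."""
    if len(text) > max_len:
        return None
    return re.fullmatch(pattern, text)

for candidate in ["(a+)+b", "(a*)*b", "a+b", "(?>a+)b", "(a|a)*b"]:
    print(candidate, looks_risky(candidate))
```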

Key Finding

Every regex used on user-controlled input should be reviewed for ReDoS vulnerability. A single vulnerable pattern can take down an entire service.

Use linear-time engines (RE2, Rust regex) for untrusted input, or validate patterns with static analysis tools. Set hard timeouts on regex execution.

Regex Performance: Backtracking Complexity

Pattern	Input Size	Match Time (ms)	No-Match Time (ms)	Complexity	Issue
/a*a*b/	20	0.001	450	O(2^n)	Catastrophic backtracking: overlapping a* quantifiers
/(a+)+b/	25	0.001	3200	O(2^n)	Nested quantifiers: exponential on non-match
/(a|a)*b/	25	0.001	2800	O(2^n)	Alternation with overlapping: same exponential
/^[a-z]+$/	1000000	15	15	O(n)	Linear: no backtracking needed
/\d{4}-\d{2}-\d{2}/	1000000	22	20	O(n)	Linear: deterministic match/skip
/(?>a+)b/	25	0.001	0.001	O(n)	Atomic group prevents backtracking
/^(?:(?!ab).)*$/	10000	8	0.01	O(n)	Tempered greedy token: linear but slower constant
/\b\w+@\w+\.\w+\b/	100000	12	10	O(n)	Simple email-like: linear scan

Engine Speed: Email Pattern Scan on 10KB Text (microseconds)

Source: OnlineTools4Free Research

Part 9: 100+ Ready-to-Use Regex Patterns

The table below contains over 100 production-ready regex patterns organized by category. Each pattern includes a description, example match, and implementation notes. You can search, sort, and filter the table to find the pattern you need, and download the entire collection as CSV for offline reference.

Categories covered: Email, Phone, URL, IP Address, Date/Time, Password Validation, Credit Card, HTML/Markup, Numbers, Code/Programming, Network, File/Path, Text Processing, Data Formats, Identifiers, Security/Log, Markdown, and Search/Replace patterns.

Important: these patterns are starting points. Real-world validation should combine regex with additional logic (Luhn checksum for credit cards, DNS lookup for email domains, date range validation for dates). No single regex can replace proper parsing for complex formats like HTML or JSON.
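Combining a pattern with follow-up logic might look like the Python sketch below. The date pattern is the ISO 8601 entry from the library with capture groups added; the 13 to 19 digit card pattern and both function names are assumptions made for the example.

```python
import re
from datetime import date

ISO_DATE = re.compile(r"(\d{4})-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])", re.ASCII)
CARD_DIGITS = re.compile(r"\d{13,19}", re.ASCII)   # hypothetical shape check

def is_real_date(text):
    """Regex checks the shape; datetime checks the calendar."""
    m = ISO_DATE.fullmatch(text)
    if m is None:
        return False
    try:
        date(int(m.group(1)), int(m.group(2)), int(m.group(3)))
    except ValueError:
        return False   # e.g. 30 February
    return True

def passes_luhn(number):
    """Regex checks the digit count; the Luhn checksum checks the digits."""
    if CARD_DIGITS.fullmatch(number) is None:
        return False
    total = 0
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:        # double every second digit from the right
            d = d * 2 - 9 if d * 2 > 9 else d * 2
        total += d
    return total % 10 == 0

print(is_real_date("2026-04-14"), is_real_date("2026-02-30"))
print(passes_luhn("4111111111111111"), passes_luhn("4111111111111112"))
```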

100+ Regex Patterns Library

Category	Description	Pattern	Example	Notes
Email	Email address (basic)	^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$	[email protected]	Covers 99% of valid emails. Does not handle quoted local parts.
Email	Email address (RFC 5322 simplified)	^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$	[email protected]	Closely follows RFC 5322 without quoted strings.
Email	Common provider email only	^[a-zA-Z0-9._%+-]+@(gmail|yahoo|outlook|hotmail)\.[a-z]{2,}$	[email protected]	Restricts to major providers.
Phone	International phone (E.164)	^\+?[1-9]\d{1,14}$	+14155552671	ITU-T E.164 format. Up to 15 digits.
Phone	US phone number	^\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}$	(415) 555-2671	Matches (XXX) XXX-XXXX, XXX-XXX-XXXX, XXXXXXXXXX.
Phone	UK mobile phone	^\+44\s?7\d{3}\s?\d{6}$	+44 7911 123456	UK mobile numbers start with 07.
Phone	French mobile phone	^\+33\s?[67]\d{8}$	+33 612345678	French mobile starts with 06 or 07.
Phone	German mobile phone	^\+49\s?1[567]\d{1,2}\s?\d{7,8}$	+49 151 12345678	German mobile prefixes: 015x, 016x, 017x.
URL	URL (basic match)	https?://[^\s/$.?#].[^\s]*	https://example.com/path	Matches most HTTP/HTTPS URLs in text.
URL	URL (strict validation)	^https?://(?:www\.)?[-a-zA-Z0-9@:%._+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}\b(?:[-a-zA-Z0-9()@:%_+.~#?&/=]*)$	https://www.example.com/path?q=1	Validates full URL structure with query params.
URL	YouTube video URL	^(?:https?://)?(?:www\.)?(?:youtube\.com/watch\?v=|youtu\.be/)([a-zA-Z0-9_-]{11})	https://youtu.be/dQw4w9WgXcQ	Captures 11-char video ID.
URL	GitHub repo URL	^(?:https?://)?(?:www\.)?github\.com/([a-zA-Z0-9-]+)/([a-zA-Z0-9._-]+)(?:/.*)?$	https://github.com/user/repo	Captures owner and repo name.
IP	IPv4 address	^(?:(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d\d?)$	192.168.1.1	Validates 0-255 range per octet.
IP	IPv6 address (simplified)	^(?:(?:[0-9a-fA-F]{1,4}:){7}[0-9a-fA-F]{1,4}|::(?:[0-9a-fA-F]{1,4}:){0,5}[0-9a-fA-F]{1,4})$	2001:0db8:85a3:0000:0000:8a2e:0370:7334	Handles the full form and a leading :: only; a :: in the middle of the address is not matched.
IP	IPv4 CIDR notation	^(?:(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d\d?)/(?:3[0-2]|[12]?\d)$	10.0.0.0/24	IP with subnet mask /0-/32.
Date	Date ISO 8601 (YYYY-MM-DD)	^\d{4}-(?:0[1-9]|1[0-2])-(?:0[1-9]|[12]\d|3[01])$	2026-04-14	Does not validate day-of-month vs month length.
Date	Date US format (MM/DD/YYYY)	^(?:0[1-9]|1[0-2])/(?:0[1-9]|[12]\d|3[01])/\d{4}$	04/14/2026	US date format with leading zeros.
Date	Date European (DD.MM.YYYY)	^(?:0[1-9]|[12]\d|3[01])\.(?:0[1-9]|1[0-2])\.\d{4}$	14.04.2026	Common in Germany, France.
Date	Time 24-hour (HH:MM or HH:MM:SS)	^(?:[01]\d|2[0-3]):[0-5]\d(?::[0-5]\d)?$	14:30:00	Validates 00:00-23:59:59.
Date	ISO 8601 datetime with timezone	^\d{4}-(?:0[1-9]|1[0-2])-(?:0[1-9]|[12]\d|3[01])T(?:[01]\d|2[0-3]):[0-5]\d:[0-5]\d(?:\.\d+)?(?:Z|[+-](?:[01]\d|2[0-3]):[0-5]\d)$	2026-04-14T14:30:00Z	Full ISO datetime with timezone.

Part 10: Common Regex Mistakes

Even experienced developers make regex mistakes regularly. The patterns are powerful but unforgiving: a single missing backslash, anchor, or flag can produce subtly wrong results that pass initial tests but fail in production. The table below catalogs the most frequent mistakes and their fixes.

Common Regex Mistakes and Fixes

Mistake	Example	Fix	Severity
Not escaping special characters	Using . instead of \. for literal dot	Escape metacharacters: . * + ? ^ $ { } [ ] ( ) | \	High
Greedy matching when lazy is needed	"<.*>" matches "<a>text</b>" entirely	Use lazy quantifier: "<.*?>" or be specific: "<[^>]*>"	High
Catastrophic backtracking	/(a+)+b/ on "aaaaaaaaaaac" hangs	Avoid nested quantifiers. Use atomic groups or rewrite.	Critical
Missing anchors for validation	/\d{5}/ matches "abc12345xyz"	Add anchors: /^\d{5}$/ for exact match	High
Forgetting multiline flag	/^line$/ does not match individual lines	Add /m flag so ^ and $ match line boundaries	Medium
Not using Unicode flag	/\w+/ does not match accented letters	Use /u flag and \p{L} for Unicode letters	High
Using regex to parse HTML	/<div>(.*)<\/div>/ fails on nested divs	Use a proper HTML parser (DOMParser, cheerio, etc.)	Critical
Overly specific patterns	Hardcoding whitespace as single space	Use \s+ to match any whitespace (tabs, newlines, etc.)	Medium
Case sensitivity oversight	/hello/ does not match "Hello"	Add /i flag for case-insensitive matching	Low
Using capturing groups unnecessarily	(abc)+ when you do not need the capture	Use non-capturing groups: (?:abc)+	Low
Not testing edge cases	Email regex that rejects valid + in local part	Test with +, dots, long TLDs, international domains	Medium
Matching too broadly	.* at the start/end of patterns	Be as specific as possible to reduce false positives	Medium
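
Three of the mistakes in the table (missing anchors, an unescaped dot, and a forgotten multiline flag) are easy to reproduce. The following JavaScript sketch uses made-up sample strings to show the wrong and the fixed behavior side by side.

```javascript
// Missing anchors: the pattern finds five digits anywhere in the input.
console.log(/\d{5}/.test("abc12345xyz"));   // true (probably not intended)
console.log(/^\d{5}$/.test("abc12345xyz")); // false
console.log(/^\d{5}$/.test("12345"));       // true

// Unescaped dot: "." matches any character, so "1x5" slips through.
console.log(/^1.5$/.test("1x5"));  // true
console.log(/^1\.5$/.test("1x5")); // false
console.log(/^1\.5$/.test("1.5")); // true

// Multiline flag: without /m, ^ and $ only match at the ends of the string.
const text = "first\nline\nlast";
console.log(/^line$/.test(text));  // false
console.log(/^line$/m.test(text)); // true
```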

Regex in Tools and Frameworks

Regex is embedded in virtually every developer tool. Knowing which engine each tool uses helps you write compatible patterns. VS Code uses the JavaScript engine, so you get named groups and lookbehind. grep with -P uses PCRE, giving you the full Perl-style feature set. ripgrep uses the Rust regex crate, which is fast but lacks lookarounds and backreferences. Nginx uses PCRE2 for location matching.

Regex in Developer Tools and Frameworks

10 rows

Tool	Regex Usage	Engine	Notes
VS Code	Search & replace	JavaScript (V8)	Toggle regex with Alt+R. Supports lookahead, named groups.
grep / ripgrep	Text search	POSIX / PCRE / Rust regex	grep -P for PCRE, rg uses Rust regex (linear-time).
sed	Stream editing	POSIX BRE / ERE (-E)	sed -E for extended regex. No lookaheads.
awk	Pattern matching	POSIX ERE	Built-in pattern matching and splitting.
Nginx	Location matching, rewrites	PCRE2	~ for case-sensitive, ~* for case-insensitive.
Apache (.htaccess)	URL rewriting (mod_rewrite)	PCRE	RewriteRule uses PCRE syntax.
ESLint	Code pattern matching	JavaScript	The no-invalid-regexp rule flags invalid patterns.
React Router	Route path matching	path-to-regexp	Uses parameterized patterns, not raw regex.
Cloudflare WAF	Security rule matching	RE2 (linear-time)	Uses RE2 to prevent ReDoS in WAF rules.
Google Analytics	Filter patterns, audience rules	RE2	Limited regex syntax (no lookaheads).

Glossary: 40+ Regex Terms Defined

This glossary defines every essential regex term used in this guide. Terms are organized alphabetically within categories. Each definition provides practical context for working developers.

API

Match Object

The object returned by a regex match operation containing the matched text, captured groups, start/end indices, and named groups. In JavaScript: RegExp.exec() returns a match array with index and groups properties.

Advanced

Conditional

A construct (?(condition)yes|no) that matches one pattern if a condition is met and another if it is not. The condition can check if a group was captured: (?(1)yes|no). Supported by PCRE, .NET, and Python.

Recursion

A PCRE feature where a pattern can call itself or a named group: (?R) or (?&name). Used to match nested structures like balanced parentheses. .NET does not implement (?R); it provides balancing groups for the same job. Not available in JavaScript, Python re, Go, or Rust.

Subroutine

A PCRE feature that calls a captured group as a subroutine: (?1) calls group 1. Unlike backreferences, subroutines re-execute the group pattern rather than matching the same captured text. Useful for reusing sub-patterns.

Assertions

Anchor

A zero-width assertion that matches a position rather than a character. ^ matches the start of the string (or line with /m), $ matches the end, \b matches a word boundary, \A matches absolute start, and \z matches absolute end.

Assertion

A regex construct that checks a condition without consuming characters. Lookaheads (?=) and lookbehinds (?<=) are assertions. Anchors (^, $, \b) are also assertions. They do not advance the engine position.

Lookahead

A zero-width assertion that checks if the text after the current position matches a pattern. Positive lookahead (?=expr) succeeds if expr matches. Negative lookahead (?!expr) succeeds if expr does not match. Does not consume characters.

Lookbehind

A zero-width assertion that checks if the text before the current position matches a pattern. Positive lookbehind (?<=expr) and negative lookbehind (?<!expr). Some engines require fixed-length lookbehinds (Python re, older Java).
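
As a small illustration of both lookbehind forms, here is a JavaScript sketch (lookbehind needs an ES2018+ runtime). The price string is invented for the example.

```javascript
const prices = "cost: $100, fee: EUR250, tip: $15";

// Positive lookbehind: digits preceded by "$". The "$" is not part of the match.
console.log(prices.match(/(?<=\$)\d+/g)); // [ '100', '15' ]

// Negative lookbehind: digit runs not preceded by "$" or by another digit.
console.log(prices.match(/(?<![$\d])\d+/g)); // [ '250' ]
```

The \d inside the negative lookbehind matters: without it, the engine would happily start a match at the second digit of "$100", because that digit is preceded by "1" rather than "$".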

Word Boundary

The \b assertion matches the position between a word character (\w) and a non-word character (or string boundary). \bcat\b matches "cat" in "the cat sat" but not in "concatenate". \B matches a non-word-boundary position.

Zero-Width

A match that has a length of zero characters. Anchors (^, $), word boundaries (\b), and lookarounds ((?=), (?<=)) are zero-width: they assert a condition at a position without consuming any input text.

Classes

Character Class

A set of characters enclosed in square brackets [...] that matches any single character in the set. [aeiou] matches any vowel. Ranges are specified with a hyphen: [a-z]. Negated classes [^...] match any character NOT in the set.

Negated Class

A character class starting with ^ that matches any character NOT in the class. [^aeiou] matches any non-vowel. [^0-9] is equivalent to \D. The ^ must be the first character after [.

POSIX Character Class

Named character classes like [:alpha:], [:digit:], [:alnum:], [:space:] used inside bracket expressions [[:alpha:]]. POSIX classes are locale-aware. Used by grep, sed, awk, and POSIX-compliant tools.

Engine

Backtracking

The process by which an NFA regex engine tries alternative paths when a match attempt fails. The engine backtracks to the most recent choice point and tries the next alternative. Excessive backtracking causes catastrophic performance.

Catastrophic Backtracking

Exponential-time behavior caused by ambiguous patterns with nested or overlapping quantifiers. The pattern (a+)+ on a non-matching run of n characters forces the engine to explore on the order of 2^n paths. Also called pathological backtracking; it is the mechanism that ReDoS (Regular expression Denial of Service) attacks exploit.

DFA (Deterministic Finite Automaton)

A regex engine implementation that processes each input character exactly once, guaranteeing O(n) time complexity. RE2 and the Rust regex crate use DFA/NFA hybrid approaches. DFAs cannot support backreferences or lookarounds.

NFA (Non-deterministic Finite Automaton)

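A regex engine implementation that, in the backtracking form used by most languages, tries possible paths one at a time and returns to the last choice point on failure. Backtracking NFA engines support backreferences, lookarounds, and other advanced features but can exhibit exponential worst-case performance on pathological patterns. A Thompson-style NFA simulation, as used inside RE2, instead tracks all paths in parallel and stays linear.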

PCRE (Perl Compatible Regular Expressions)

A widely-used C library implementing Perl-style regex syntax. Used by PHP (preg_* functions), Nginx, R, and many other tools. PCRE2 is the current version with Unicode 16.0 support, JIT compilation, and extended features.

Flags

Dotall Mode

The /s flag that makes the dot (.) metacharacter match newline characters as well as all other characters. Without dotall, the dot matches everything except \n. In Ruby, this mode is confusingly triggered by /m.

Flag (Modifier)

A letter appended after the closing delimiter that modifies regex behavior. Common flags: i (case-insensitive), g (global), m (multiline), s (dotall), u (Unicode), x (extended/verbose).

Multiline Mode

The /m flag that changes ^ and $ to match at line boundaries (after/before \n) rather than only at the start/end of the entire string. Without /m, ^ and $ match only the absolute start and end.

Verbose Mode

The /x flag that allows whitespace and comments within regex patterns. Whitespace is ignored (use [ ] or an escaped space for a literal space), and # starts a comment to end-of-line. Essential for documenting complex patterns.

Groups

Atomic Group

A group (?>...) that, once matched, prevents the engine from backtracking into it. This eliminates catastrophic backtracking for certain patterns. Supported by PCRE, .NET, Java, Ruby, and Python re (3.11+), but not JavaScript.

Backreference

A reference to a previously captured group, written as \1, \2, etc. (or \k<name> for named groups). The backreference matches the exact same text that the group captured. Example: (\w+)\s\1 matches repeated words like "the the".
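
The repeated-word example can be tried directly. This JavaScript sketch adds word boundaries and a global flag to the pattern from the definition; the sample sentence is made up.

```javascript
const sentence = "Paris in the the spring is is lovely";

// (\w+) captures a word; \1 must then match the very same text again.
const repeated = /\b(\w+)\s+\1\b/g;
console.log(sentence.match(repeated)); // [ 'the the', 'is is' ]

// Replace each doubled word with a single copy using $1.
console.log(sentence.replace(repeated, "$1")); // "Paris in the spring is lovely"
```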

Capture Group

A parenthesized expression (expr) that saves the matched text for later retrieval via backreferences or in the match result array. Groups are numbered left-to-right by opening parenthesis. Named groups (?<name>expr) provide readable access.

Named Group

A capturing group with a name instead of (or in addition to) a number. JavaScript/PCRE: (?<name>expr). Python: (?P<name>expr). Named groups make complex patterns more readable and maintainable. Referenced via \k<name>.

Non-Capturing Group

A group (?:expr) that groups a sub-expression for quantifiers or alternation without capturing the match. Useful for performance (no capture overhead) and clarity (does not affect group numbering).

Operators

Alternation

The | (pipe) operator that matches either the expression on its left or the one on its right. abc|def matches "abc" or "def". Alternation has the lowest precedence among regex operators, so grouping is often needed: (ab|cd)e.
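
Low precedence is easiest to see with anchors. In this JavaScript sketch (sample strings are illustrative), the ungrouped pattern reads as "starts with abc" or "ends with def", which is rarely what a validator wants.

```javascript
// Without a group, each anchor binds to one branch only.
const loose = /^abc|def$/;
console.log(loose.test("abcXYZ")); // true
console.log(loose.test("XYZdef")); // true

// Grouping the alternation applies both anchors to both branches.
const strict = /^(?:abc|def)$/;
console.log(strict.test("abcXYZ")); // false
console.log(strict.test("abc"));    // true
console.log(strict.test("def"));    // true
```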

Quantifiers

Greedy Quantifier

A quantifier that matches as many characters as possible while still allowing the overall pattern to match. *, +, ?, and {n,m} are greedy by default. The engine tries the maximum first, then backtracks if needed.

Lazy Quantifier

A quantifier that matches as few characters as possible. Created by appending ? to a greedy quantifier: *?, +?, ??, {n,m}?. The engine tries the minimum first, then expands if the rest of the pattern fails.

Possessive Quantifier

A quantifier that matches as many characters as possible and never backtracks. Written by appending + to a quantifier: *+, ++, ?+, {n,m}+. Fails immediately if the rest of the pattern cannot match. Prevents catastrophic backtracking.

Quantifier

A modifier that specifies how many times the preceding element should match. * (0+), + (1+), ? (0-1), {n} (exactly n), {n,m} (n to m). Quantifiers can be greedy (default), lazy (?), or possessive (+).

Security

ReDoS (Regular Expression Denial of Service)

A denial-of-service attack that exploits catastrophic backtracking in regex patterns. An attacker crafts input that causes exponential processing time. Prevented by using linear-time engines (RE2, Rust regex) or carefully reviewing patterns.

Syntax

Escape Sequence

A backslash followed by a character that represents something other than the literal character. \n = newline, \t = tab, \d = digit, \s = whitespace, \b = word boundary (in pattern) or backspace (in character class).

Literal

A character in a regex that matches itself. Most characters (a-z, 0-9) are literals. Metacharacters must be escaped with backslash to be treated as literals. Example: \. matches a literal period.

Metacharacter

A character with special meaning in regex syntax. The metacharacters are: . ^ $ * + ? { } [ ] ( ) | \. To match a metacharacter literally, it must be escaped with a backslash.

Unicode

Code Point

A numeric value assigned to each Unicode character. Written as U+XXXX (e.g., U+0041 = "A"). In regex, \u{1F600} (with /u flag) matches the character at that code point. JavaScript uses UTF-16 internally.

Surrogate Pair

In UTF-16 encoding, characters above U+FFFF are represented as two 16-bit code units (a surrogate pair). Without the /u flag, JavaScript regex treats each surrogate as a separate character, breaking the matching of emoji and of CJK characters outside the Basic Multilingual Plane.

Unicode Category

Unicode defines general categories like Letter (L), Number (N), Punctuation (P), Symbol (S), and Separator (Z). Regex engines with Unicode support allow matching by category: \p{L} matches any letter in any script.

Unicode Property Escape

The \p{Property} syntax for matching characters by Unicode property. Supported properties include General_Category (\p{L}), Script (\p{Script=Latin}), and binary properties (\p{Emoji}). Requires the /u flag in JavaScript.
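
A short JavaScript sketch of property escapes in practice. The sample words are illustrative, and the accented letter is written as an escape so the example does not depend on how the source file is normalized.

```javascript
const word = "caf\u00e9"; // "café" with a precomposed é

// \w is ASCII-only in JavaScript, even with /u, so it stops before "é".
console.log(word.match(/\w+/u)[0]); // "caf"

// \p{L} matches letters from any script; it needs the /u (or /v) flag.
console.log(word.match(/\p{L}+/u)[0] === word); // true

// Script properties restrict the match to one writing system.
console.log(/^\p{Script=Cyrillic}+$/u.test("привет")); // true
console.log(/^\p{Script=Cyrillic}+$/u.test("hello"));  // false
```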

Unicode Script

Unicode assigns each character to a script (Latin, Cyrillic, Han, Arabic, etc.). The \p{Script=Latin} or \p{sc=Latn} syntax matches characters from a specific writing system. Useful for input validation in multilingual applications.

Frequently Asked Questions (20 Questions)

The most common questions about regular expressions, drawn from search data, developer forums, and Stack Overflow. Each answer provides actionable guidance.

What is a regular expression?

A regular expression (regex) is a sequence of characters that defines a search pattern. Regex is used for string matching, validation, search-and-replace, and text parsing. Nearly every modern programming language ships with a regex engine or has one in its standard library. The pattern /^\d{5}$/ matches exactly five digits, and /[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/ matches most common email address formats.

What is the difference between greedy and lazy quantifiers?

Greedy quantifiers (*, +, ?) match as much text as possible, then backtrack if needed. Lazy quantifiers (*?, +?, ??) match as little as possible, then expand. Example: given "<a>text</b>", the greedy pattern "<.*>" matches the entire string, while the lazy "<.*?>" matches "<a>" and "</b>" separately. Use lazy quantifiers or negated character classes [^>]* when you need the shortest match.
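
The three variants from this answer, run against the same input in JavaScript:

```javascript
const html = "<a>text</b>";

// Greedy: .* grabs as much as possible, then backs off to the last ">".
console.log(html.match(/<.*>/)[0]); // "<a>text</b>"

// Lazy: .*? stops at the first ">" that lets the match succeed.
console.log(html.match(/<.*?>/g)); // [ '<a>', '</b>' ]

// Negated class: same result here, without relying on backtracking.
console.log(html.match(/<[^>]*>/g)); // [ '<a>', '</b>' ]
```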

How do I match a literal dot or other special character?

Precede the special character with a backslash: \. matches a literal period, \* matches a literal asterisk, \( matches a literal opening parenthesis. The regex metacharacters that need escaping are: . ^ $ * + ? { } [ ] ( ) | \. Inside a character class [...], only ], \, ^, and - have special meaning and need escaping.

What is catastrophic backtracking and how do I prevent it?

Catastrophic backtracking occurs when a regex has ambiguous patterns (typically nested quantifiers like (a+)+) that cause the engine to explore exponentially many paths on non-matching input. Prevention: avoid nested quantifiers, use atomic groups (?>...) or possessive quantifiers (++, *+), use specific character classes instead of .*, or use a linear-time engine like RE2 or the Rust regex crate.
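
JavaScript has neither atomic groups nor possessive quantifiers, so there the usual fix is a rewrite. The sketch below compares the nested form with a flat equivalent; both are anchored for the comparison, and the inputs are kept deliberately short because the nested form slows down sharply as the run of a's grows.

```javascript
// Vulnerable: the nested quantifier lets a run of a's be split in many ways.
const nested = /^(a+)+b$/;
// Rewrite: one quantifier, one way to match each run of a's.
const flat = /^a+b$/;

// Short inputs only, on purpose.
const samples = ["ab", "aaab", "b", "aaac", "aaaa", ""];
for (const s of samples) {
  console.log(JSON.stringify(s), nested.test(s), flat.test(s));
}

// Both patterns agree on every sample, so the rewrite is a drop-in fix here.
const agree = samples.every((s) => nested.test(s) === flat.test(s));
console.log(agree); // true
```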

What is a lookahead and when should I use one?

A lookahead (?=...) checks if the text ahead matches a pattern without consuming characters. Positive lookahead (?=expr) succeeds if the pattern matches; negative lookahead (?!expr) succeeds if it does not. Common uses: password validation with multiple requirements ((?=.*[A-Z])(?=.*\d).{8,}), matching a word only if followed by specific text (\w+(?=\s*=) to find variable names before =).
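
Both uses mentioned above, as a JavaScript sketch. The password pattern is anchored with ^ and $ here so that it validates the whole input; the sample strings are invented.

```javascript
// Each lookahead scans from the start without consuming anything:
// (?=.*[A-Z]) needs an uppercase letter, (?=.*\d) needs a digit,
// and .{8,} then consumes at least eight characters.
const strongEnough = /^(?=.*[A-Z])(?=.*\d).{8,}$/;

console.log(strongEnough.test("Secret123"));  // true
console.log(strongEnough.test("secret123"));  // false: no uppercase letter
console.log(strongEnough.test("SecretPass")); // false: no digit
console.log(strongEnough.test("Ab1"));        // false: too short

// Lookahead in a search: names that are followed by "=".
console.log("width = 10; height=20".match(/\w+(?=\s*=)/g)); // [ 'width', 'height' ]
```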

What is the difference between \d and [0-9]?

The answer depends on the engine. In JavaScript, \d always means [0-9], with or without the /u flag. In Python 3 (str patterns) and .NET, \d matches any Unicode decimal digit (Arabic-Indic, Devanagari, etc.) by default, and re.ASCII or RegexOptions.ECMAScript restricts it to ASCII. Java's \d is ASCII-only unless UNICODE_CHARACTER_CLASS is set. For strict ASCII digit matching, use [0-9] everywhere. For international digit matching, use \p{Nd} where the engine supports property escapes.
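
In JavaScript specifically, \d stays ASCII-only even when the /u flag is set. A quick check, using the Arabic-Indic digit three as an illustrative non-ASCII digit:

```javascript
const arabicIndicThree = "\u0663"; // ARABIC-INDIC DIGIT THREE

// In JavaScript, \d means [0-9] with or without the /u flag.
console.log(/^\d$/.test(arabicIndicThree));  // false
console.log(/^\d$/u.test(arabicIndicThree)); // false

// To accept decimal digits from any script, ask for the Unicode category.
console.log(/^\p{Nd}$/u.test(arabicIndicThree)); // true
console.log(/^\p{Nd}$/u.test("7"));              // true
```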

How do I validate an email address with regex?

A practical email regex is: ^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$. This covers 99%+ of real-world email addresses. The full RFC 5322 specification allows quoted strings, comments, and IP literals that make a truly compliant regex extremely complex (thousands of characters). For production systems, use this simplified pattern for client-side validation and verify the email by sending a confirmation message.
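
A minimal sketch of the client-side half of that advice in JavaScript. The helper name looksLikeEmail and the 254-character cap are assumptions for illustration; the confirmation message remains the real verification step.

```javascript
// Simplified pattern from the answer above; not an RFC 5322 validator.
const EMAIL = /^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/;

// Hypothetical helper: trims input, caps length, then checks the shape.
function looksLikeEmail(input) {
  const candidate = input.trim();
  // 254 is a commonly used upper bound; treat it as an assumption.
  return candidate.length <= 254 && EMAIL.test(candidate);
}

console.log(looksLikeEmail("user+tag@example.co.uk")); // true
console.log(looksLikeEmail("user@localhost"));         // false: no dot in the domain
console.log(looksLikeEmail("not an email"));           // false
```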

What does the /g flag do and when do I need it?

The /g (global) flag in JavaScript causes methods like match() and replace() to find ALL matches in the string instead of stopping at the first. Without /g, "aaa".match(/a/) returns ["a"]. With /g, "aaa".match(/a/g) returns ["a", "a", "a"]. Be careful: with /g on a RegExp object, exec() and test() maintain state via lastIndex, which can cause surprising results if the regex is reused.
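
The lastIndex pitfall is easiest to understand by watching it happen. A JavaScript sketch with an illustrative pattern:

```javascript
// match() with and without /g.
console.log("aaa".match(/a/).length);  // 1
console.log("aaa".match(/a/g).length); // 3

// Pitfall: a /g regex remembers where the last match ended.
const hasDigit = /\d/g;
console.log(hasDigit.test("a1")); // true, lastIndex is now 2
console.log(hasDigit.test("a1")); // false, the search resumed at index 2
console.log(hasDigit.test("a1")); // true, lastIndex was reset after the failure

// Fix: drop /g for yes/no checks, or reset lastIndex before reuse.
const hasDigitSafe = /\d/;
console.log(hasDigitSafe.test("a1") && hasDigitSafe.test("a1")); // true
```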

How do named groups work?

Named groups (?<name>expr) let you assign a name to a capture group instead of using numeric indices. In JavaScript: const match = /(?<year>\d{4})-(?<month>\d{2})/.exec("2026-04"); match.groups.year is "2026". Named groups make patterns self-documenting and resistant to breakage when groups are added or removed. Backreference syntax: \k<name>. In Python, the syntax is (?P<name>expr) and (?P=name).
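
A slightly fuller JavaScript sketch of named groups, extending the date example with a day group and adding a named backreference. The sample strings are invented.

```javascript
const isoDate = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/;
const match = isoDate.exec("Released on 2026-04-14.");

// Named captures live on match.groups; numeric indices still work.
console.log(match.groups.year);  // "2026"
console.log(match.groups.month); // "04"
console.log(match[3]);           // "14"

// Destructuring keeps call sites readable.
const { year, month, day } = match.groups;
console.log(`${day}.${month}.${year}`); // "14.04.2026"

// \k<name> refers back to a named group inside the pattern.
const sameQuote = /(?<quote>["']).*?\k<quote>/;
console.log(sameQuote.exec(`say 'hi' or "bye"`)[0]); // 'hi' (including the quotes)
```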

Can regex match nested structures like balanced parentheses?

Standard regex cannot match arbitrarily nested structures because regular languages cannot count unbounded nesting (the pumping lemma). However, some engines extend beyond regular languages: PCRE and .NET support recursion ((?R) or \g<0>) and balancing groups respectively, which CAN match balanced parentheses. For most practical purposes, use a parser or stack-based approach instead of regex for nested structures.
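
The stack-based approach mentioned above fits in a few lines. This is a minimal JavaScript sketch; the function name isBalanced is hypothetical, and it ignores complications such as brackets inside string literals.

```javascript
// Hypothetical helper: checks nesting with a stack instead of a regex.
function isBalanced(text) {
  const pairs = new Map([[")", "("], ["]", "["], ["}", "{"]]);
  const openers = new Set(pairs.values());
  const stack = [];
  for (const ch of text) {
    if (openers.has(ch)) {
      stack.push(ch);
    } else if (pairs.has(ch)) {
      // A closer must match the most recent unmatched opener.
      if (stack.pop() !== pairs.get(ch)) return false;
    }
  }
  // Anything left on the stack was opened but never closed.
  return stack.length === 0;
}

console.log(isBalanced("f(a[0], {b: (c)})")); // true
console.log(isBalanced("(()"));               // false
console.log(isBalanced("(]"));                // false
```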

What is the difference between .* and [^x]*?

.* matches any character (except newline without /s flag) zero or more times. It is very broad and often causes over-matching. [^x]* matches any character EXCEPT x, zero or more times. Using negated classes is more precise and performant: to match content between quotes, "[^"]*" is better than ".*?" because it cannot cross quote boundaries and does not require backtracking.
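
The difference shows up as soon as something must follow the closing quote. In this JavaScript sketch (the input is contrived to make the point), the lazy dot still crosses a quote boundary, while the negated class cannot.

```javascript
const input = '"a" "b"!';

// Lazy dot: starts at the first quote and keeps expanding, across the
// inner quotes, until a closing quote followed by "!" is found.
console.log(input.match(/".*?"!/)[0]); // "a" "b"!

// Negated class: can never step over a quote, so only "b"! qualifies.
console.log(input.match(/"[^"]*"!/)[0]); // "b"!
```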

How do I make regex case-insensitive?

Add the /i flag: /hello/i matches "Hello", "HELLO", "hElLo". In Python: re.compile("hello", re.IGNORECASE). In Java: Pattern.compile("hello", Pattern.CASE_INSENSITIVE). For partial case-insensitivity within a pattern, some engines support inline flags: (?i:hello) makes only "hello" case-insensitive while the rest of the pattern remains case-sensitive.

What is the Unicode /u flag and why should I always use it?

The /u flag (JavaScript ES2015+) enables full Unicode matching. Without it, JavaScript treats strings as sequences of 16-bit code units, breaking emoji and supplementary characters. With /u: the dot matches full code points (not half a surrogate pair), \p{} property escapes work, and quantifiers apply to full code points. Always use /u (or /v in ES2024+) in modern JavaScript for correct international text handling.
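
A small JavaScript sketch of what changes with /u, using one emoji as the test character:

```javascript
const emoji = "\u{1F600}"; // GRINNING FACE, stored as two UTF-16 code units

console.log(emoji.length); // 2

// Without /u the dot matches one code unit, i.e. half of the emoji.
console.log(/^.$/.test(emoji));  // false
// With /u the dot matches the whole code point.
console.log(/^.$/u.test(emoji)); // true

// Quantifiers follow the same rule: {2} counts code points under /u.
console.log(/^.{2}$/u.test(emoji + emoji)); // true
console.log(/^.{2}$/.test(emoji + emoji));  // false (four code units)
```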

How do I test and debug regex patterns?

Use interactive tools: regex101.com (explanation + debugger), regexr.com (real-time highlighting), or the OnlineTools4Free regex tester embedded in this guide. For debugging: break complex patterns into parts, test each part separately, use verbose mode (/x flag) with comments, add test cases for edge cases (empty string, very long input, special characters, Unicode), and check for catastrophic backtracking with pathological inputs.

What is the difference between match(), exec(), and test()?

In JavaScript: test() returns true/false (/\d/.test("a1") returns true). exec() returns a match object with groups and indices (one match at a time, advances lastIndex with /g). match() without /g returns a match object (like exec). match() with /g returns an array of all matched strings (no groups). matchAll() (ES2020) returns an iterator of all match objects with groups. Use test() for validation, matchAll() for extraction.
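
The four methods side by side, as a JavaScript sketch on an illustrative key=value string:

```javascript
const text = "x=1, y=22";
const pair = /(\w)=(\d+)/;

// test(): boolean only.
console.log(pair.test(text)); // true

// exec() and match() without /g: first match plus groups and index.
const first = pair.exec(text);
console.log(first[0], first[1], first[2], first.index); // x=1 x 1 0

// match() with /g: every matched string, but no groups.
console.log(text.match(/(\w)=(\d+)/g)); // [ 'x=1', 'y=22' ]

// matchAll() needs /g and yields full match objects.
const pairs = [...text.matchAll(/(\w)=(\d+)/g)].map((m) => [m[1], Number(m[2])]);
console.log(pairs); // [ [ 'x', 1 ], [ 'y', 22 ] ]
```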

How do I match across multiple lines?

Two flags are relevant: /m (multiline) makes ^ and $ match at line boundaries (not just string start/end). /s (dotall) makes . match newline characters. To match a pattern that spans lines, use /s so the dot crosses newlines, or use [\s\S] as an alternative to . that always matches any character including newlines (works without /s flag).
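
Both flags and the [\s\S] fallback, in a JavaScript sketch with a made-up four-line string:

```javascript
const log = "START\nline one\nline two\nEND";

// /m: ^ and $ match at every line boundary.
console.log(log.match(/^line \w+$/gm)); // [ 'line one', 'line two' ]

// Without /s the dot stops at "\n", so the span cannot be matched.
console.log(/START.*END/.test(log));  // false
// /s lets the dot cross newlines.
console.log(/START.*END/s.test(log)); // true
// [\s\S] does the same without any flag.
console.log(/START[\s\S]*END/.test(log)); // true
```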

What is ReDoS and how do I protect against it?

ReDoS (Regular expression Denial of Service) exploits catastrophic backtracking to cause a server to hang. An attacker sends input designed to trigger exponential backtracking in a vulnerable regex. Protection: use a linear-time engine (RE2, Rust regex) for user-facing validation, set timeouts on regex operations (.NET supports this natively), avoid patterns with nested quantifiers on user input, and use static analysis tools (safe-regex, rxxr2) to detect vulnerable patterns.

How do I replace text using captured groups?

In JavaScript: "2026-04-14".replace(/(\d{4})-(\d{2})-(\d{2})/, "$2/$3/$1") returns "04/14/2026". $1, $2, $3 refer to captured groups. Named groups use $<name>: .replace(/(?<y>\d{4})-(?<m>\d{2})/, "$<m>/$<y>"). In Python: re.sub(r"(\d{4})-(\d{2})", r"\2/\1", text). The replacement string uses backreferences to rearrange captured text.
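
The same replacements as runnable JavaScript, plus a replacer function for cases where the new text has to be computed. The version-bump example is invented for illustration.

```javascript
const iso = "2026-04-14";

// Numbered references: $1 is the year, $2 the month, $3 the day.
console.log(iso.replace(/(\d{4})-(\d{2})-(\d{2})/, "$2/$3/$1")); // "04/14/2026"

// Named references use $<name>.
console.log(iso.replace(/(?<y>\d{4})-(?<m>\d{2})-(?<d>\d{2})/, "$<d>.$<m>.$<y>")); // "14.04.2026"

// A replacer function receives the match and each group as arguments.
const bumped = "v1.9".replace(/(\d+)\.(\d+)/, (_, major, minor) => `${major}.${Number(minor) + 1}`);
console.log(bumped); // "v1.10"
```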

What are possessive quantifiers and when should I use them?

Possessive quantifiers (*+, ++, ?+) work like greedy quantifiers but never backtrack. Once they consume characters, they do not give them back. This makes them faster and prevents catastrophic backtracking, but they may fail to match where a greedy quantifier would succeed through backtracking. Use them when you know backtracking is not needed: [^"]*+ inside a quoted string match, or \d++ for a numeric field.

How do I use regex in find-and-replace in VS Code?

Enable regex mode by clicking the .* icon in the search bar (or pressing Alt+R). Use $1, $2 for captured groups in the replacement field. Example: find "(\w+):\s*(\w+)" and replace with "$2: $1" to swap key-value pairs. VS Code uses the JavaScript regex engine, so named groups (?<name>...) and $<name> replacements work. Use \n for newline in the replacement.

Try It Yourself

Use the embedded regex tester below to experiment with patterns from this guide. Paste any pattern from the library above and test it against your own input data in real time.

Regex Tester

Enter a regex pattern and test string to see matches highlighted in real time. Try patterns from the 100+ library above or write your own.

Regex Builder

Build regex patterns visually by selecting character classes, quantifiers, and groups. See the generated pattern and test it simultaneously.

Raw Data Downloads

All datasets used in this report are available for download. Use them for your own reference, teaching, or integration.

Citations & Sources

Kleene, S.C. “Representation of Events in Nerve Nets and Finite Automata.” RAND Corporation, 1956. https://www.rand.org/pubs/research_memoranda/RM704.html

Thompson, K. “Regular Expression Search Algorithm.” Communications of the ACM, 1968. https://dl.acm.org/doi/10.1145/363347.363387

Cox, R. “Regular Expression Matching Can Be Simple And Fast.” 2007. https://swtch.com/~rsc/regexp/regexp1.html

Hazel, P. “PCRE2: Perl Compatible Regular Expressions (revised API).” pcre.org, 2025. https://www.pcre.org/current/doc/html/

ECMA International. “ECMAScript 2024 Language Specification: RegExp (Unicode Sets).” ECMA International, 2024. https://tc39.es/ecma262/#sec-regexp-regular-expression-objects

ECMA International. “ECMAScript 2018 Language Specification: Lookbehind Assertions.” ECMA International, 2018. https://tc39.es/ecma262/#sec-assertion

Friedl, J.E.F. “Mastering Regular Expressions, 3rd Edition.” O'Reilly Media, 2006. https://www.oreilly.com/library/view/mastering-regular-expressions/0596528124/

Cox, R. “RE2: A principled approach to regular expression matching.” Google, 2010. https://github.com/google/re2

The Rust Project. “regex — An implementation of regular expressions for Rust.” crates.io, 2024. https://docs.rs/regex/latest/regex/

Davis, J.C., et al. “Why Aren't Regular Expressions a Lingua Franca? An Empirical Study on the Re-use and Portability of Regular Expressions.” ACM ESEC/FSE, 2019. https://doi.org/10.1145/3236024.3236058

Wustholz, V., Olivo, O., Heule, M., Dillig, I. “Static Detection of DoS Vulnerabilities in Programs that Use Regular Expressions.” TACAS, 2017. https://doi.org/10.1007/978-3-662-54577-5_1

Unicode Consortium. “Unicode Technical Standard #18: Unicode Regular Expressions.” The Unicode Consortium, 2024. https://unicode.org/reports/tr18/

Executive Summary

PCRE2 with JIT compilation is the fastest backtracking engine at 0.05 microseconds per simple match, while the Rust regex crate leads among guaranteed-linear engines at 0.08 microseconds.

Part 1: History of Regular Expressions

Regex History Timeline

23 rows

Year	Event	Era
1951	Stephen Kleene defines regular events / regular expressions	Theory
1956	Kleene publishes "Representation of Events in Nerve Nets"	Theory
1968	Ken Thompson implements regex in QED editor	Unix
1973	grep utility created for Unix (by Ken Thompson)	Unix
1979	awk (Aho, Weinberger, Kernighan) includes regex support	Unix
1986	POSIX Basic Regular Expressions (BRE) standardized	Standards
1986	Henry Spencer writes first portable regex library in C	Libraries
1987	Perl 1.0 introduces powerful regex syntax	Languages
1992	POSIX Extended Regular Expressions (ERE) finalized	Standards
1997	PCRE (Perl Compatible Regular Expressions) library released	Libraries
1998	ECMAScript 3 (JavaScript) adds native RegExp object	Languages
1999	Python re module stabilized (Python 1.6+)	Languages
2002	.NET adds named groups, lookbehind, conditionals	Languages
2002	Java 1.4 adds java.util.regex with full Unicode support	Languages
2007	RE2 engine by Russ Cox (guaranteed linear-time)	Engines
2012	Unicode 6.1 script property support in major engines	Unicode
2015	ES2015 adds /u (Unicode) flag to JavaScript	Languages
2018	ES2018 adds lookbehind assertions, named groups, /s flag	Languages
2018	Rust regex crate 1.0 released, pairing a linear-time guarantee with high speed	Engines
2022	ES2022 adds /d (match indices) flag	Languages
2024	ES2024 adds /v (Unicode sets) flag for set operations	Languages
2025	PCRE2 10.44 adds extended callout features	Libraries
2026	Most engines support Unicode 16.0 property escapes	Unicode

Part 2: Character Classes

Basic Character Classes

Shorthand Character Classes

Unicode Character Properties

Key Finding

Always use the /u flag (or /v in ES2024+) when working with non-ASCII text in JavaScript regex.

Without /u, JavaScript treats strings as UTF-16 code units, which breaks emoji matching, supplementary characters, and property escapes. The /u flag enables correct code-point-based matching.

Character Classes Reference

22 rows

Syntax	Description	Example	Category
.	Any character except newline (unless /s flag)	a.c matches "abc", "a1c"	Basic
\d	Digit character [0-9]	\d{3} matches "123"	Shorthand
\D	Non-digit character [^0-9]	\D+ matches "abc"	Shorthand
\w	Word character [a-zA-Z0-9_]	\w+ matches "hello_42"	Shorthand
\W	Non-word character [^a-zA-Z0-9_]	\W matches "!"	Shorthand
\s	Whitespace [\t\n\r\f\v ]	\s+ matches " "	Shorthand
\S	Non-whitespace character	\S+ matches "hello"	Shorthand
[abc]	Character set: a, b, or c	[aeiou] matches vowels	Set
[^abc]	Negated set: not a, b, or c	[^0-9] matches non-digits	Set
[a-z]	Range: a through z	[a-zA-Z] matches letters	Set
[\u{1F600}-\u{1F64F}]	Unicode range (with /u flag)	Matches emoticon block	Unicode
\p{L}	Unicode Letter category (with /u)	\p{L}+ matches "cafe"	Unicode
\p{N}	Unicode Number category	\p{N}+ matches "42"	Unicode
\p{P}	Unicode Punctuation category	\p{P} matches "."	Unicode
\p{S}	Unicode Symbol category	\p{S} matches "$"	Unicode
\p{Script=Latin}	Unicode Latin script	Matches Latin letters	Unicode
\p{Script=Han}	Unicode CJK script	Matches Chinese characters	Unicode
\p{Emoji}	Unicode emoji property (with /u or /v)	Matches emoji characters	Unicode
\b	Word boundary	\bcat\b matches "cat" not "catch"	Boundary
\B	Non-word boundary	\Bcat\B matches "concatenate"	Boundary


Part 3: Quantifiers

Greedy vs. Lazy vs. Possessive

Quantifiers Reference

6 rows

Syntax	Name	Description	Example	Lazy Form	Possessive Form
*	Star	0 or more (greedy)	a* matches "", "a", "aaa"	*?	*+
+	Plus	1 or more (greedy)	a+ matches "a", "aaa" not ""	+?	++
?	Question	0 or 1 (greedy)	colou?r matches "color", "colour"	??	?+
{n}	Exact	Exactly n times	\d{4} matches "2026"	N/A	N/A
{n,}	At least	n or more times	\d{2,} matches "42", "123"	{n,}?	{n,}+
{n,m}	Range	Between n and m times	\d{2,4} matches "42", "2026"	{n,m}?	{n,m}+

Characters matched by a{quantifier} on input 'aaa'

Source: OnlineTools4Free Research

Part 4: Groups & Backreferences

Capturing vs. Non-Capturing Groups

Backreferences

Atomic Groups and Conditionals

Group Types Reference

12 rows

Syntax	Name	Description	Example	Engine Support
(expr)	Capturing Group	Captures matched text for backreference	(abc)\1 matches "abcabc"	All engines
(?:expr)	Non-Capturing Group	Groups without capturing	(?:abc)+ matches "abcabc"	All engines
(?<name>expr)	Named Capturing Group	Named capture for readability	(?<year>\d{4})	PCRE, JS (ES2018+), .NET, Java (Python uses (?P<name>expr))
(?=expr)	Positive Lookahead	Asserts what follows matches	\d(?=px) matches "5" in "5px"	Backtracking engines (not RE2, Rust regex, POSIX)
(?!expr)	Negative Lookahead	Asserts what follows does not match	\d(?!px) matches "5" in "5em"	Backtracking engines (not RE2, Rust regex, POSIX)
(?<=expr)	Positive Lookbehind	Asserts what precedes matches	(?<=\$)\d+ matches "100" in "$100"	PCRE, JS (ES2018+), .NET, Python, Java
(?<!expr)	Negative Lookbehind	Asserts what precedes does not match	(?<!\$)\d+ matches "100" in "EUR100"	PCRE, JS (ES2018+), .NET, Python, Java
(?>expr)	Atomic Group	No backtracking into group	(?>a+)b prevents catastrophic backtracking	PCRE, .NET, Java, Ruby, Python (3.11+)
(?P<name>expr)	Python Named Group	Python-style named group	(?P<year>\d{4})	Python, PCRE
(?(1)yes|no)	Conditional	If group 1 matched, try yes else no	(a)?(?(1)b|c)	PCRE, .NET, Python
\1	Backreference	Matches same text as group 1	(\w+)\s\1 matches "the the"	Backtracking engines (not RE2, Rust regex)
\k<name>	Named Backreference	Matches same text as named group	(?<word>\w+)\s\k<word>	PCRE, JS, .NET, Java

Part 5: Lookahead & Lookbehind Assertions

Positive and Negative Lookahead

Positive and Negative Lookbehind

Practical Lookaround Patterns

Key Finding

Lookaround assertions do not consume characters. They assert a condition at a position, allowing the main pattern to match without including the asserted text.

Regex Flags Reference

9 rows

Flag	Name	Description	Support	Notes
g	Global	Find all matches, not just the first	JS, PCRE, Python (re.findall)	In most languages, iteration or findall replaces the flag
i	Case-insensitive	Match upper and lowercase equivalently	All engines	Unicode case folding may differ from ASCII-only
m	Multiline	^ and $ match line boundaries, not string boundaries	All engines	In Ruby, /m is the dotall flag instead
s	Dotall / Single-line	. matches newline characters	PCRE, JS (ES2018+), Python (re.DOTALL), .NET	Without this, . stops at \n
u	Unicode	Enable full Unicode matching and \p{} support	JS (ES2015+), PCRE2	Critical for non-ASCII text. Always use in modern JS.
v	Unicode Sets	Unicode set operations, improved classes	JS (ES2024+)	Superset of /u. Enables set subtraction, intersection.
x	Extended / Verbose	Allow whitespace and comments in pattern	PCRE, Python (re.VERBOSE), Ruby, Java	Great for documenting complex patterns
d	Match indices	Return start/end indices for each capture	JS (ES2022+)	Adds indices property to match result
y	Sticky	Match only at lastIndex position	JS (ES2015+)	Used for lexer/tokenizer implementations

Part 6: Unicode & Property Escapes

The Unicode Problem in JavaScript

Unicode Categories and Scripts

Regex Usage by Programming Language (% of developers using regex regularly)

Source: OnlineTools4Free Research

Regex Usage by Language

Language	Usage (%)	Engine	Named Groups	Lookbehind
JavaScript	68	Irregexp (V8)	Yes	Yes (ES2018)
Python	62	SRE	Yes	Fixed-length
Java	45	NFA backtracking	Yes	Finite length
PHP	35	PCRE2	Yes	Variable length
Go	28	RE2 variant	Yes	No
C#	25	NFA backtracking	Yes	Variable length
Ruby	18	Oniguruma	Yes	Variable length
Rust	15	Hybrid NFA/DFA	Yes	No
TypeScript	58	Irregexp (V8)	Yes	Yes (ES2018)
Shell/Bash	42	POSIX / PCRE	PCRE only	PCRE only
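
The Python row can be checked directly. This sketch assumes Python's built-in re module and made-up input strings; it shows a named group, a fixed-length lookbehind that compiles, and a variable-length lookbehind that re rejects when the pattern is compiled.

```python
import re

# Named groups use the (?P<name>...) spelling in Python.
build = re.search(r"(?P<year>\d{4})-(?P<month>\d{2})", "build 2026-04")
print(build.group("year"), build.group("month"))

# A fixed-length lookbehind is accepted.
usd_amounts = re.findall(r"(?<=USD )\d+", "USD 100, EUR 200")
print(usd_amounts)

# A variable-length lookbehind (\s+ has no fixed width) raises re.error.
lookbehind_rejected = False
try:
    re.compile(r"(?<=USD\s+)\d+")
except re.error as exc:
    lookbehind_rejected = True
    print("rejected:", exc)
```

Engines listed as "Variable length" in the table would accept the last pattern; in Python the usual workaround is a capturing group, for example USD\s+(\d+).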

Part 7: Regex Engines Compared

NFA vs. DFA Engines

Engine Feature Matrix

Regex Engine Comparison

Engine	Language	Type	Variable Lookbehind	Atomic Groups	Recursion	Speed
PCRE2	C (used by PHP, R, Nginx)	Backtracking (NFA)	Yes	Yes	Yes	Fast
JavaScript (V8)	JavaScript	Backtracking (NFA)	Yes (ES2018+)	No	No	Fast
.NET (System.Text.RegularExpressions)	C#, F#, VB.NET	Backtracking (NFA)	Yes	Yes	Yes (balancing groups)	Fast
Python (re)	Python	Backtracking (NFA)	Fixed-length only	Yes (3.11+)	No (use regex module)	Moderate
Java (java.util.regex)	Java, Kotlin	Backtracking (NFA)	Finite (not *)	Yes (?>)	No	Moderate
RE2	Go, C++	Thompson NFA / DFA	No lookbehind	N/A	N/A	Very fast (linear guarantee)
Rust (regex crate)	Rust	Hybrid NFA/DFA	No lookbehind	N/A	N/A	Very fast (linear guarantee)
Ruby (Oniguruma)	Ruby	Backtracking (NFA)	Yes	Yes	Yes	Fast
POSIX BRE	sed, grep (default)	DFA/NFA	No	No	No	Fast
POSIX ERE	grep -E, awk	DFA/NFA	No	No	No	Fast

Regex Engine Speed: Simple vs Complex Pattern (microseconds, lower is better)

Source: OnlineTools4Free Research

Part 8: Performance & ReDoS

Catastrophic Backtracking Explained

ReDoS: Regex Denial of Service

Key Finding

Every regex used on user-controlled input should be reviewed for ReDoS vulnerability. A single vulnerable pattern can take down an entire service.

Use linear-time engines (RE2, Rust regex) for untrusted input, or validate patterns with static analysis tools. Set hard timeouts on regex execution.

Regex Performance: Backtracking Complexity

Pattern	Input Size	Match Time (ms)	No-Match Time (ms)	Complexity	Issue
/a*a*b/	20	0.001	450	O(2^n)	Catastrophic backtracking: overlapping a* quantifiers
/(a+)+b/	25	0.001	3200	O(2^n)	Nested quantifiers: exponential on non-match
/(a|a)*b/	25	0.001	2800	O(2^n)	Alternation with overlapping: same exponential
/^[a-z]+$/	1000000	15	15	O(n)	Linear: no backtracking needed
/\d{4}-\d{2}-\d{2}/	1000000	22	20	O(n)	Linear: deterministic match/skip
/(?>a+)b/	25	0.001	0.001	O(n)	Atomic group prevents backtracking
/^(?:(?!ab).)*$/	10000	8	0.01	O(n)	Tempered greedy token: linear but slower constant
/\b\w+@\w+\.\w+\b/	100000	12	10	O(n)	Simple email-like: linear scan
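
To get a feel for the nested-quantifier row, the sketch below times /(a+)+b/ against a flattened rewrite, a+b, on non-matching input using Python's backtracking re engine. The input sizes are kept small on purpose, the helper name time_search is hypothetical, and the timings will differ by machine and Python version, so no expected numbers are shown.

```python
import re
import time

nested = re.compile(r"(a+)+b")  # nested quantifiers, as in the table
flat = re.compile(r"a+b")       # accepts the same strings without nesting


def time_search(pattern, text):
    """Return (found, seconds) for one search call."""
    start = time.perf_counter()
    found = pattern.search(text) is not None
    return found, time.perf_counter() - start


for n in (14, 16, 18):
    # No "b" in the input, so the nested pattern explores many ways of
    # splitting the run of a's before it can report failure.
    text = "a" * n + "c"
    _, nested_seconds = time_search(nested, text)
    _, flat_seconds = time_search(flat, text)
    print(f"n={n}: nested {nested_seconds:.4f}s, flat {flat_seconds:.6f}s")
```

On a backtracking engine the nested timing typically grows sharply with each step in n while the flat pattern stays small. Rewriting to remove the nesting is usually the simplest fix when the capture inside the loop is not needed.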

Engine Speed: Email Pattern Scan on 10KB Text (microseconds)

Source: OnlineTools4Free Research

Part 9: 100+ Ready-to-Use Regex Patterns

100+ Regex Patterns Library

95 rows

Category	Description	Pattern	Example	Notes
Email	Email address (basic)	^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$	user@example.com	Covers 99% of valid emails. Does not handle quoted local parts.
Email	Email address (RFC 5322 simplified)	^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$	user@example.com	Closely follows RFC 5322 without quoted strings.
Email	Common provider email only	^[a-zA-Z0-9._%+-]+@(gmail|yahoo|outlook|hotmail)\.[a-z]{2,}$	user@gmail.com	Restricts to major providers.
Phone	International phone (E.164)	^\+?[1-9]\d{1,14}$	+14155552671	ITU-T E.164 format. Up to 15 digits.
Phone	US phone number	^\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}$	(415) 555-2671	Matches (XXX) XXX-XXXX, XXX-XXX-XXXX, XXXXXXXXXX.
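Phone	UK mobile phone	^\+44\s?7\d{3}\s?\d{6}$	+44 7911 123456	Accepts the international +44 7 form only. Domestically, UK mobile numbers start with 07.
Phone	French mobile phone	^\+33\s?[67]\d{8}$	+33 612345678	Accepts the international +33 form only. Domestically, French mobile numbers start with 06 or 07.
Phone	German mobile phone	^\+49\s?1[567]\d{1,2}\s?\d{7,8}$	+49 151 12345678	Accepts the international +49 form only. Domestic German mobile prefixes: 015x, 016x, 017x.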
URL	URL (basic match)	https?://[^\s/$.?#].[^\s]*	https://example.com/path	Matches most HTTP/HTTPS URLs in text.
URL	URL (strict validation)	^https?://(?:www\.)?[-a-zA-Z0-9@:%._+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}\b(?:[-a-zA-Z0-9()@:%_+.~#?&/=]*)$	https://www.example.com/path?q=1	Validates full URL structure with query params.
URL	YouTube video URL	^(?:https?://)?(?:www\.)?(?:youtube\.com/watch\?v=|youtu\.be/)([a-zA-Z0-9_-]{11})	https://youtu.be/dQw4w9WgXcQ	Captures 11-char video ID.
URL	GitHub repo URL	^(?:https?://)?(?:www\.)?github\.com/([a-zA-Z0-9-]+)/([a-zA-Z0-9._-]+)(?:/.*)?$	https://github.com/user/repo	Captures owner and repo name.
IP	IPv4 address	^(?:(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d\d?)$	192.168.1.1	Validates 0-255 range per octet.
IP	IPv6 address (simplified)	^(?:(?:[0-9a-fA-F]{1,4}:){7}[0-9a-fA-F]{1,4}|::(?:[0-9a-fA-F]{1,4}:){0,5}[0-9a-fA-F]{1,4})$	2001:0db8:85a3:0000:0000:8a2e:0370:7334	Handles the full eight-group form and addresses that begin with ::. Does not match :: in the middle of an address.
IP	IPv4 CIDR notation	^(?:(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d\d?)/(?:3[0-2]|[12]?\d)$	10.0.0.0/24	IP with subnet mask /0-/32.
Date	Date ISO 8601 (YYYY-MM-DD)	^\d{4}-(?:0[1-9]|1[0-2])-(?:0[1-9]|[12]\d|3[01])$	2026-04-14	Does not validate day-of-month vs month length.
Date	Date US format (MM/DD/YYYY)	^(?:0[1-9]|1[0-2])/(?:0[1-9]|[12]\d|3[01])/\d{4}$	04/14/2026	US date format with leading zeros.
Date	Date European (DD.MM.YYYY)	^(?:0[1-9]|[12]\d|3[01])\.(?:0[1-9]|1[0-2])\.\d{4}$	14.04.2026	Common in Germany, France.
Date	Time 24-hour (HH:MM or HH:MM:SS)	^(?:[01]\d|2[0-3]):[0-5]\d(?::[0-5]\d)?$	14:30:00	Validates 00:00-23:59:59.
Date	ISO 8601 datetime with timezone	^\d{4}-(?:0[1-9]|1[0-2])-(?:0[1-9]|[12]\d|3[01])T(?:[01]\d|2[0-3]):[0-5]\d:[0-5]\d(?:\.\d+)?(?:Z|[+-](?:[01]\d|2[0-3]):[0-5]\d)$	2026-04-14T14:30:00Z	Full ISO datetime with timezone.

Page 1 of 5
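
A few of the library patterns can be wrapped in a small validator. The sketch below is one possible arrangement in Python, with alternation written as a plain |; the PATTERNS dictionary and is_valid helper are hypothetical names, not part of the dataset.

```python
import re

# Patterns taken from the library table; the keys are illustrative labels.
PATTERNS = {
    "ipv4": re.compile(
        r"^(?:(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d\d?)$"
    ),
    "iso_date": re.compile(r"^\d{4}-(?:0[1-9]|1[0-2])-(?:0[1-9]|[12]\d|3[01])$"),
    "e164": re.compile(r"^\+?[1-9]\d{1,14}$"),
    "time_24h": re.compile(r"^(?:[01]\d|2[0-3]):[0-5]\d(?::[0-5]\d)?$"),
}


def is_valid(kind: str, value: str) -> bool:
    """Return True if value matches the named pattern in full."""
    # fullmatch also rejects a trailing newline, which "$" alone would allow.
    return PATTERNS[kind].fullmatch(value) is not None


print(is_valid("ipv4", "192.168.1.1"), is_valid("ipv4", "256.1.1.1"))
print(is_valid("iso_date", "2026-04-14"), is_valid("iso_date", "2026-02-31"))
```

The last call illustrates the caveat in the Notes column: the ISO date pattern accepts 2026-02-31 because it checks digit ranges, not calendar validity.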

Part 10: Common Regex Mistakes

Common Regex Mistakes and Fixes

Mistake	Example	Fix	Severity
Not escaping special characters	Using . instead of \. for literal dot	Escape metacharacters: . * + ? ^ $ { } [ ] ( ) | \	High
Greedy matching when lazy is needed	"<.*>" matches "<a>text</b>" entirely	Use lazy quantifier: "<.*?>" or be specific: "<[^>]*>"	High
Catastrophic backtracking	/(a+)+b/ on "aaaaaaaaaaac" hangs	Avoid nested quantifiers. Use atomic groups or rewrite.	Critical
Missing anchors for validation	/\d{5}/ matches "abc12345xyz"	Add anchors: /^\d{5}$/ for exact match	High
Forgetting multiline flag	/^line$/ does not match individual lines	Add /m flag so ^ and $ match line boundaries	Medium
Not using Unicode flag	/\w+/ does not match accented letters	Use /u flag and \p{L} for Unicode letters	High
Using regex to parse HTML	/<div>(.*)<\/div>/ fails on nested divs	Use a proper HTML parser (DOMParser, cheerio, etc.)	Critical
Overly specific patterns	Hardcoding whitespace as single space	Use \s+ to match any whitespace (tabs, newlines, etc.)	Medium
Case sensitivity oversight	/hello/ does not match "Hello"	Add /i flag for case-insensitive matching	Low
Using capturing groups unnecessarily	(abc)+ when you do not need the capture	Use non-capturing groups: (?:abc)+	Low
Not testing edge cases	Email regex that rejects valid + in local part	Test with +, dots, long TLDs, international domains	Medium
Matching too broadly	.* at the start/end of patterns	Be as specific as possible to reduce false positives	Medium
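
Three of the mistakes in the table (greedy matching, missing anchors, and a forgotten multiline flag) are easy to reproduce. The following Python sketch uses the table's own sample strings where it gives them; the variable names are assumptions for illustration.

```python
import re

html = "<a>text</b>"

# Greedy ".*" runs to the last ">", swallowing both tags.
greedy = re.findall(r"<.*>", html)
# Lazy ".*?" stops at the first ">"; a negated class does the same job.
lazy = re.findall(r"<.*?>", html)
specific = re.findall(r"<[^>]*>", html)

# Without anchors, five digits anywhere in the string count as a match.
unanchored = re.search(r"\d{5}", "abc12345xyz") is not None
anchored = re.search(r"^\d{5}$", "abc12345xyz") is not None

# Without the multiline flag, ^ and $ do not anchor at inner line breaks.
no_flag = re.findall(r"^line$", "line\nline")
with_flag = re.findall(r"^line$", "line\nline", flags=re.MULTILINE)

print(greedy, lazy, specific, unanchored, anchored, no_flag, with_flag)
```

The negated-class form is often preferable to the lazy quantifier because it cannot run past the closing bracket.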

Regex in Tools and Frameworks

Regex in Developer Tools and Frameworks

Tool	Regex Usage	Engine	Notes
VS Code	Search & replace	JavaScript (V8)	Toggle regex with Alt+R. Supports lookahead, named groups.
grep / ripgrep	Text search	POSIX / PCRE / Rust regex	grep -P for PCRE, rg uses Rust regex (linear-time).
sed	Stream editing	POSIX BRE / ERE (-E)	sed -E for extended regex. No lookaheads.
awk	Pattern matching	POSIX ERE	Built-in pattern matching and splitting.
Nginx	Location matching, rewrites	PCRE2	~ for case-sensitive, ~* for case-insensitive.
Apache (.htaccess)	URL rewriting (mod_rewrite)	PCRE	RewriteRule uses PCRE syntax.
ESLint	Code pattern matching	JavaScript	The no-invalid-regexp rule flags invalid patterns.
React Router	Route path matching	path-to-regexp	Uses parameterized patterns, not raw regex.
Cloudflare WAF	Security rule matching	RE2 (linear-time)	Uses RE2 to prevent ReDoS in WAF rules.
Google Analytics	Filter patterns, audience rules	RE2	Limited regex syntax (no lookaheads).

Glossary: 40+ Regex Terms Defined

This glossary defines every essential regex term used in this guide. Terms are organized alphabetically within categories. Each definition provides practical context for working developers.

API

Match Object

Advanced

Conditional

Recursion

A PCRE feature where a pattern can call itself or a named group: (?R) or (?&name). Used to match nested structures like balanced parentheses. .NET handles the same problem with balancing groups rather than (?R). Not available in JavaScript, Python re, Go, or Rust.

Subroutine

Assertions

Anchor

Assertion

Lookahead

Lookbehind

Word Boundary

Zero-Width

Classes

Character Class

Negated Class

A character class starting with ^ that matches any character NOT in the class. [^aeiou] matches any non-vowel. [^0-9] is equivalent to \D. The ^ must be the first character after [.

POSIX Character Class

Named character classes like [:alpha:], [:digit:], [:alnum:], [:space:] used inside bracket expressions [[:alpha:]]. POSIX classes are locale-aware. Used by grep, sed, awk, and POSIX-compliant tools.

Engine

Backtracking

Catastrophic Backtracking

DFA (Deterministic Finite Automaton)

NFA (Non-deterministic Finite Automaton)

PCRE (Perl Compatible Regular Expressions)

Flags

Dotall Mode

Flag (Modifier)

A letter appended after the closing delimiter that modifies regex behavior. Common flags: i (case-insensitive), g (global), m (multiline), s (dotall), u (Unicode), x (extended/verbose).

Multiline Mode

The /m flag that changes ^ and $ to match at line boundaries (after/before \n) rather than only at the start/end of the entire string. Without /m, ^ and $ match only the absolute start and end.

Verbose Mode

Groups

Atomic Group

Backreference

Capture Group

Named Group

Non-Capturing Group

A group (?:expr) that groups a sub-expression for quantifiers or alternation without capturing the match. Useful for performance (no capture overhead) and clarity (does not affect group numbering).
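
The effect on group numbering is easiest to see side by side. This is a minimal Python sketch with a made-up input string.

```python
import re

text = "abcabc-42"

# Capturing: the repeated (abc) takes group 1, pushing the digits to group 2.
capturing = re.search(r"(abc)+-(\d+)", text)
print(capturing.groups())

# Non-capturing: (?:abc) still groups for the + quantifier,
# but the digits are now group 1.
non_capturing = re.search(r"(?:abc)+-(\d+)", text)
print(non_capturing.groups())
```

Both patterns match the same text; only the captured groups differ.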

Operators

Alternation

Quantifiers

Greedy Quantifier

Lazy Quantifier

Possessive Quantifier

Quantifier

Security

ReDoS (Regular Expression Denial of Service)

Syntax

Escape Sequence

Literal

A character in a regex that matches itself. Most characters (a-z, 0-9) are literals. Metacharacters must be escaped with backslash to be treated as literals. Example: \. matches a literal period.

Metacharacter

A character with special meaning in regex syntax. The metacharacters are: . ^ $ * + ? { } [ ] ( ) | \. To match a metacharacter literally, it must be escaped with a backslash.
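
When the text to match comes from a variable, escaping by hand is error-prone. The sketch below uses Python's re.escape as one way to do it; the sample strings are assumptions chosen to show what goes wrong without escaping.

```python
import re

literal = "1.5 (approx)"

# Used as-is, "." matches any character and "(...)" becomes a group,
# so the pattern no longer describes the literal text.
unescaped = re.compile(literal)
# re.escape backslash-escapes the metacharacters for us.
escaped = re.compile(re.escape(literal))

false_positive = unescaped.search("1x5 approx") is not None
misses_itself = unescaped.search(literal) is None
exact_hit = escaped.search("value: 1.5 (approx)") is not None
no_false_positive = escaped.search("1x5 approx") is None

print(false_positive, misses_itself, exact_hit, no_false_positive)
```

The unescaped pattern both matches text it should not and fails to match the very string it was built from, since the parentheses are treated as grouping rather than as characters.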

Unicode

Code Point

Surrogate Pair

Unicode Category

Unicode Property Escape

Unicode Script

Frequently Asked Questions (20 Questions)

The most common questions about regular expressions, drawn from search data, developer forums, and Stack Overflow. Each answer provides actionable guidance.

What is a regular expression?

What is the difference between greedy and lazy quantifiers?

How do I match a literal dot or other special character?

What is catastrophic backtracking and how do I prevent it?

What is a lookahead and when should I use one?

What is the difference between \d and [0-9]?

How do I validate an email address with regex?

What does the /g flag do and when do I need it?

How do named groups work?

Can regex match nested structures like balanced parentheses?

What is the difference between .* and [^x]*?

How do I make regex case-insensitive?

What is the Unicode /u flag and why should I always use it?

How do I test and debug regex patterns?

What is the difference between match(), exec(), and test()?

How do I match across multiple lines?

What is ReDoS and how do I protect against it?

How do I replace text using captured groups?

What are possessive quantifiers and when should I use them?

How do I use regex in find-and-replace in VS Code?

Try It Yourself

Use the embedded regex tester below to experiment with patterns from this guide. Paste any pattern from the library above and test it against your own input data in real time.

Regex Tester

Enter a regex pattern and test string to see matches highlighted in real time. Try patterns from the 100+ library above or write your own.

Regex Builder

Build regex patterns visually by selecting character classes, quantifiers, and groups. See the generated pattern and test it simultaneously.

Raw Data Downloads

All datasets used in this report are available for download. Use them for your own reference, teaching, or integration.

Citations & Sources

Kleene, S.C. “Representation of Events in Nerve Nets and Finite Automata.” RAND Corporation, 1956. https://www.rand.org/pubs/research_memoranda/RM704.html

Thompson, K. “Regular Expression Search Algorithm.” Communications of the ACM, 1968. https://dl.acm.org/doi/10.1145/363347.363387

Cox, R. “Regular Expression Matching Can Be Simple And Fast.” 2007. https://swtch.com/~rsc/regexp/regexp1.html

Hazel, P. “PCRE2: Perl Compatible Regular Expressions (revised API).” pcre.org, 2025. https://www.pcre.org/current/doc/html/

ECMA International. “ECMAScript 2024 Language Specification: RegExp (Unicode Sets).” ECMA International, 2024. https://tc39.es/ecma262/#sec-regexp-regular-expression-objects

ECMA International. “ECMAScript 2018 Language Specification: Lookbehind Assertions.” ECMA International, 2018. https://tc39.es/ecma262/#sec-assertion

Friedl, J.E.F. “Mastering Regular Expressions, 3rd Edition.” O'Reilly Media, 2006. https://www.oreilly.com/library/view/mastering-regular-expressions/0596528124/

Cox, R. “RE2: A principled approach to regular expression matching.” Google, 2010. https://github.com/google/re2

The Rust Project. “regex: An implementation of regular expressions for Rust.” crates.io, 2024. https://docs.rs/regex/latest/regex/

Wustholz, V., Olivo, O., Heule, M., Dillig, I. “Static Detection of DoS Vulnerabilities in Programs that Use Regular Expressions.” TACAS, 2017. https://doi.org/10.1007/978-3-662-54577-5_1

Unicode Consortium. “Unicode Technical Standard #18: Unicode Regular Expressions.” The Unicode Consortium, 2024. https://unicode.org/reports/tr18/