Executive Summary
Regular expressions are one of the most powerful and widely available tools in computing. Nearly every programming language, text editor, database engine, and web server supports regex in some form. Yet regex remains widely misunderstood, frequently misused, and occasionally feared. This guide is a complete, practical reference to regular expressions as of 2026.
We cover the full regex syntax from basic character classes through advanced features like lookahead/lookbehind assertions, named groups, backreferences, atomic groups, possessive quantifiers, and Unicode property escapes. We compare 10 regex engines across 8 programming languages, analyze performance characteristics and catastrophic backtracking, and provide 100+ production-ready patterns you can copy and use immediately.
The report includes engine benchmarks, a pattern library searchable by category, common mistake analysis, a 40+ term glossary, 20 FAQ answers, and an embedded regex tester. Whether you are writing your first pattern or debugging a ReDoS vulnerability, this guide covers it.
At a glance: 100+ ready-to-use patterns, 10 engines compared, 40+ glossary terms defined, 20 FAQ questions answered.
- 68% of JavaScript developers use regex regularly, making it the most common language for regex usage. Python follows at 62% and TypeScript at 58%.
- Catastrophic backtracking remains the #1 regex security risk. Nested quantifiers like (a+)+ can cause exponential processing time on crafted input, enabling ReDoS attacks. Linear-time engines (RE2, Rust regex) eliminate this risk entirely.
- ES2024 introduced the /v flag (Unicode Sets) enabling set subtraction and intersection in character classes, the most significant JavaScript regex addition since ES2018 lookbehinds.
- PCRE2 with JIT compilation is the fastest backtracking engine at 0.05 microseconds per simple match, while the Rust regex crate leads among guaranteed-linear engines at 0.08 microseconds.
Part 1: History of Regular Expressions
Regular expressions have their roots in formal language theory. In 1951, mathematician Stephen Kleene described “regular events” using his mathematical notation, which he published formally in 1956 in his seminal paper “Representation of Events in Nerve Nets and Finite Automata.” Kleene was modeling the behavior of neural networks using simple algebras, and his notation for describing patterns in strings became the foundation of what we now call regular expressions.
The jump from theory to practice came in 1968 when Ken Thompson implemented regular expression search in the QED text editor at Bell Labs. Thompson’s key insight was translating regex into a non-deterministic finite automaton (NFA), which could be efficiently simulated. In 1973, Thompson created grep (“global regular expression print”), extracting the search functionality into a standalone Unix tool. grep became one of the most important Unix utilities and introduced regex to generations of programmers.
The modern era of regex began with Perl. Larry Wall’s Perl 1.0 (1987) shipped with built-in regex support, and later releases, most notably Perl 5 (1994), went far beyond POSIX by adding non-greedy quantifiers, lookaheads, non-capturing groups, and eventually Unicode support. Perl’s regex dialect became so influential that Philip Hazel created the PCRE (Perl Compatible Regular Expressions) library in 1997 to bring Perl-style regex to other languages. PCRE became the standard regex library for PHP, Apache, Nginx, R, and many other tools.
In 2007, Russ Cox (Google) created RE2, a regex engine that guarantees linear-time matching by using automata-based matching (DFA and NFA simulation) instead of backtracking. RE2 sacrifices features like backreferences and lookarounds but eliminates the possibility of catastrophic backtracking. This trade-off proved valuable for security-sensitive applications: Google uses RE2 for all user-facing regex features in its products. The Rust regex crate (version 1.0 released in 2018) follows the same linear-time philosophy.
JavaScript regex has evolved significantly in recent years. ES2015 added the /u (Unicode) flag. ES2018 brought lookbehind assertions, named capturing groups, the /s (dotall) flag, and Unicode property escapes. ES2022 added the /d flag for match indices. ES2024 introduced the /v (Unicode Sets) flag, enabling set subtraction and intersection within character classes. These additions have made JavaScript regex comparable in power to PCRE for most practical patterns.
Regex History Timeline
| Year | Event | Era |
|---|---|---|
| 1951 | Stephen Kleene defines regular events / regular expressions | Theory |
| 1956 | Kleene publishes "Representation of Events in Nerve Nets" | Theory |
| 1968 | Ken Thompson implements regex in QED editor | Unix |
| 1973 | grep utility created for Unix (by Ken Thompson) | Unix |
| 1979 | awk (Aho, Weinberger, Kernighan) includes regex support | Unix |
| 1986 | POSIX Basic Regular Expressions (BRE) standardized | Standards |
| 1986 | Henry Spencer writes first portable regex library in C | Libraries |
| 1987 | Perl 1.0 introduces powerful regex syntax | Languages |
| 1992 | POSIX Extended Regular Expressions (ERE) finalized | Standards |
| 1997 | PCRE (Perl Compatible Regular Expressions) library released | Libraries |
| 1999 | ECMAScript 3 (JavaScript) adds native RegExp object | Languages |
| 1999 | Python re module stabilized (Python 1.6+) | Languages |
| 2002 | .NET adds named groups, lookbehind, conditionals | Languages |
| 2002 | Java 1.4 java.util.regex with full Unicode support | Languages |
| 2007 | RE2 engine by Russ Cox (guaranteed linear-time) | Engines |
| 2012 | Unicode 6.1 script property support in major engines | Unicode |
| 2015 | ES2015 adds /u (Unicode) flag to JavaScript | Languages |
| 2018 | ES2018 adds lookbehind assertions, named groups, /s flag | Languages |
| 2018 | Rust regex crate 1.0 achieves both safety and speed | Engines |
| 2022 | ES2022 adds /d (match indices) flag | Languages |
| 2024 | ES2024 adds /v (Unicode sets) flag for set operations | Languages |
| 2025 | PCRE2 10.44 adds extended callout features | Libraries |
| 2026 | Most engines support Unicode 16.0 property escapes | Unicode |
Part 2: Character Classes
Character classes define a set of characters to match at a single position. They are the building blocks of every regex pattern. A character class can be as simple as a literal character (a matches “a”) or as complex as a Unicode property escape matching characters from a specific writing system.
Basic Character Classes
The dot (.) metacharacter matches any single character except a newline (\n). With the /s (dotall) flag, the dot matches newlines as well. Square brackets create custom character classes: [abc] matches any single character that is a, b, or c. Ranges are specified with a hyphen: [a-z] matches any lowercase letter, [0-9] matches any digit. Multiple ranges can be combined: [a-zA-Z0-9] matches any alphanumeric character.
Negated character classes start with ^ inside the brackets: [^abc] matches any character that is NOT a, b, or c. [^0-9] matches any non-digit character. The caret must be the first character after the opening bracket to negate; elsewhere it is a literal ^.
Shorthand Character Classes
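Shorthand classes provide convenient notation for common character sets. \d matches any digit, \D any non-digit, \w any word character, \W any non-word character, \s any whitespace character (space, tab, newline, form feed, carriage return), and \S any non-whitespace character. Whether these classes are ASCII-only depends on the engine: in JavaScript \d stays [0-9] and \w stays essentially [a-zA-Z0-9_] even with /u, while Python 3 (for str patterns) and .NET match Unicode digits and letters by default unless an ASCII mode is requested. These shorthands are available in virtually every modern regex engine.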
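As a small illustration of how shorthand classes depend on the engine's Unicode mode, the sketch below uses Python's `re` module, where `\d` and `\w` are Unicode-aware for `str` patterns unless `re.ASCII` is passed. The sample strings are made up for the example.

```python
import re

# "\u0664\u0662" is the number 42 written with ARABIC-INDIC digits.
text = "Room \u0664\u0662, seat 7"

# Default for str patterns in Python 3: \d matches any Unicode decimal digit.
unicode_digits = re.findall(r"\d+", text)

# re.ASCII narrows \d, \w, and \s to their ASCII-only definitions.
ascii_digits = re.findall(r"\d+", text, flags=re.ASCII)

# The same switch changes what counts as a "word character".
words_unicode = re.findall(r"\w+", "na\u00efve caf\u00e9")
words_ascii = re.findall(r"\w+", "na\u00efve caf\u00e9", flags=re.ASCII)

print(unicode_digits, ascii_digits)
print(words_unicode, words_ascii)
```

When a pattern is meant to validate ASCII-only identifiers or numbers, spelling out `[0-9]` or `[A-Za-z0-9_]` removes the dependence on flags altogether.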
Unicode Character Properties
Unicode property escapes (\p{Property}) match characters by their Unicode category, script, or other property. \p{L} matches any Unicode letter (Latin, Cyrillic, Han, Arabic, etc.), \p{N} matches any number, \p{P} matches any punctuation. Script-specific matching like \p{Script=Latin} matches only Latin-script characters. The ES2024 /v flag adds set operations: [\p{Letter}--\p{Script=Greek}] matches any letter that is NOT Greek.
Key Finding
Always use the /u flag (or /v in ES2024+) when working with non-ASCII text in JavaScript regex.
Without /u, JavaScript treats strings as UTF-16 code units, which breaks emoji matching, supplementary characters, and property escapes. The /u flag enables correct code-point-based matching.
Character Classes Reference
| Syntax | Description | Example | Category |
|---|---|---|---|
| . | Any character except newline (unless /s flag) | a.c matches "abc", "a1c" | Basic |
| \d | Digit character [0-9] | \d{3} matches "123" | Shorthand |
| \D | Non-digit character [^0-9] | \D+ matches "abc" | Shorthand |
| \w | Word character [a-zA-Z0-9_] | \w+ matches "hello_42" | Shorthand |
| \W | Non-word character [^a-zA-Z0-9_] | \W matches "!" | Shorthand |
| \s | Whitespace [\t\n\r\f\v ] | \s+ matches " " | Shorthand |
| \S | Non-whitespace character | \S+ matches "hello" | Shorthand |
| [abc] | Character set: a, b, or c | [aeiou] matches vowels | Set |
| [^abc] | Negated set: not a, b, or c | [^0-9] matches non-digits | Set |
| [a-z] | Range: a through z | [a-zA-Z] matches letters | Set |
| [\u{1F600}-\u{1F64F}] | Unicode range (with /u flag) | Matches emoticon block | Unicode |
| \p{L} | Unicode Letter category (with /u) | \p{L}+ matches "cafe" | Unicode |
| \p{N} | Unicode Number category | \p{N}+ matches "42" | Unicode |
| \p{P} | Unicode Punctuation category | \p{P} matches "." | Unicode |
| \p{S} | Unicode Symbol category | \p{S} matches "$" | Unicode |
| \p{Script=Latin} | Unicode Latin script | Matches Latin letters | Unicode |
| \p{Script=Han} | Unicode CJK script | Matches Chinese characters | Unicode |
| \p{Emoji} | Unicode Emoji property (with /u or /v) | Matches emoji characters | Unicode |
| \b | Word boundary | \bcat\b matches "cat" not "catch" | Boundary |
| \B | Non-word boundary | \Bcat\B matches "concatenate" | Boundary |
Part 3: Quantifiers
Quantifiers specify how many times the preceding element should be matched. They are the mechanism that transforms a single-character match into a multi-character pattern. Quantifiers come in up to three modes: greedy (the default), lazy, and, in engines that support it, possessive.
Greedy vs. Lazy vs. Possessive
Greedy quantifiers (*, +, ?, {n,m}) match as many characters as possible, then give back characters one at a time if the rest of the pattern fails. This is the default behavior. Lazy quantifiers (*?, +?, ??, {n,m}?) match as few characters as possible, then expand one character at a time. Possessive quantifiers (*+, ++, ?+, {n,m}+) match as many characters as possible and never backtrack — they either succeed or fail immediately.
Example: given the input “aaa” and the pattern a{2,4}, greedy matches “aaa” (maximum 3), lazy matches “aa” (minimum 2), and possessive matches “aaa” (maximum 3, no backtracking). The difference matters when the quantifier is followed by more pattern elements that might require the quantifier to give back characters.
Possessive quantifiers are supported by PCRE, Java, Ruby, and Python 3.11+, but not by JavaScript or .NET (which offers atomic groups instead). They are particularly valuable for preventing catastrophic backtracking: if you know a quantifier should never give back characters, making it possessive eliminates the possibility of exponential backtracking.
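The three modes can be compared side by side. The sketch below uses Python's `re` module; the possessive part is guarded by a version check because possessive quantifiers were only added to `re` in Python 3.11. The sample inputs are illustrative.

```python
import re
import sys

text = "aaa"

# Greedy takes as many characters as the bounds and the input allow.
greedy = re.match(r"a{2,4}", text).group()
# Lazy stops as soon as the minimum is satisfied.
lazy = re.match(r"a{2,4}?", text).group()

# The difference matters once more pattern follows the quantifier.
html = "<b>bold</b><i>it</i>"
greedy_tag = re.search(r"<.+>", html).group()
lazy_tag = re.search(r"<.+?>", html).group()

# Possessive: a++ keeps every 'a' it matched, so the trailing 'a' can
# never match. The greedy a+ gives one back and succeeds.
possessive_fails = None  # stays None on Python < 3.11
if sys.version_info >= (3, 11):
    possessive_fails = re.match(r"a++a", text) is None
greedy_backtracks = re.match(r"a+a", text) is not None

print(greedy, lazy, greedy_tag, lazy_tag, possessive_fails, greedy_backtracks)
```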
Quantifiers Reference
| Syntax | Name | Description | Example | Lazy Form | Possessive Form |
|---|---|---|---|---|---|
| * | Star | 0 or more (greedy) | a* matches "", "a", "aaa" | *? | *+ |
| + | Plus | 1 or more (greedy) | a+ matches "a", "aaa" not "" | +? | ++ |
| ? | Question | 0 or 1 (greedy) | colou?r matches "color", "colour" | ?? | ?+ |
| {n} | Exact | Exactly n times | \d{4} matches "2026" | N/A | N/A |
| {n,} | At least | n or more times | \d{2,} matches "42", "123" | {n,}? | {n,}+ |
| {n,m} | Range | Between n and m times | \d{2,4} matches "42", "2026" | {n,m}? | {n,m}+ |
Characters matched by a{quantifier} on input 'aaa'
Source: OnlineTools4Free Research
Part 4: Groups & Backreferences
Groups serve two purposes in regex: they apply quantifiers or alternation to a sub-expression, and they capture matched text for later use. Understanding the different group types is essential for writing effective patterns.
Capturing vs. Non-Capturing Groups
A capturing group (expr) saves the matched text in a numbered slot. Groups are numbered left-to-right by their opening parenthesis: in (a)(b(c)), group 1 captures “a”, group 2 captures “bc”, and group 3 captures “c”. Non-capturing groups (?:expr) group the expression without capturing, which improves performance and avoids cluttering the match result.
Named capturing groups (?<name>expr) provide a readable alternative to numbered groups. Instead of referencing \1, you reference \k<name>. In JavaScript, named captures appear in the match.groups object. Named groups are self-documenting: (?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2}) is immediately readable.
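A short sketch of group numbering and named captures, written for Python's `re` module. Note that Python spells named groups `(?P<name>expr)` rather than `(?<name>expr)`; the numbering rule is the same.

```python
import re

# Groups are numbered by the position of their opening parenthesis.
nested = re.match(r"(a)(b(c))", "abc")
numbered = nested.groups()

# Named groups make the result self-documenting.
date_re = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})")
date = date_re.match("2026-04-14")
parts = date.groupdict()

# A non-capturing group still groups, but leaves no numbered slot behind.
repeated = re.match(r"(?:ab)+(c)", "ababc")

print(numbered, parts, repeated.groups())
```

Named groups remain reachable by number as well, so `date.group(1)` and `date.group("year")` return the same text.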
Backreferences
A backreference \1 (or \k<name>) matches the exact same text that was captured by the referenced group. This is fundamentally different from repeating the group pattern: (\w+)\s\1 matches “the the” (same word repeated) but (\w+)\s(\w+) matches “the cat” (any two words). Backreferences are powerful for detecting duplicates, matching delimiters, and validating symmetric structures.
Important: backreferences make a regex non-regular in the formal-language sense, so engines that guarantee linear time (RE2, Rust regex) do not support them. Backtracking engines (PCRE, JavaScript, Java, Python) do support them, at the cost of potential exponential running time.
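The duplicate-word and matching-delimiter uses of backreferences can be sketched as follows with Python's `re` module. The sample sentences are invented for the example.

```python
import re

# \1 must repeat the exact text captured by group 1.
doubled = re.compile(r"\b(\w+)\s+\1\b", re.IGNORECASE)
sentence = "Paris in the the spring is is lovely"
duplicates = [m.group(1) for m in doubled.finditer(sentence)]

# Repeating the group pattern instead of referencing it matches any two words.
any_two_words = re.search(r"(\w+)\s(\w+)", "the cat") is not None
same_word_twice = re.search(r"(\w+)\s\1", "the cat") is not None

# Matching delimiters: the closing quote must equal the opening quote.
quoted = re.compile(r"""(["'])(.*?)\1""")
contents = [m.group(2) for m in quoted.finditer("say \"it's\" or 'x'")]

print(duplicates, any_two_words, same_word_twice, contents)
```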
Atomic Groups and Conditionals
Atomic groups (?>expr) prevent backtracking into the group after it has matched. Once the engine commits to a match inside an atomic group, it cannot reconsider. This is a powerful tool for preventing catastrophic backtracking. Conditional patterns (?(condition)yes|no) match different sub-patterns depending on whether a condition is met (typically whether a previous group captured anything).
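A minimal sketch of both features in Python's `re` module. Conditionals are available in all Python 3 versions; atomic groups need Python 3.11 or later, so that part is guarded. The bracket-matching pattern is an illustrative choice, not a pattern from this guide's library.

```python
import re
import sys

# Group 1 optionally matches "<"; the conditional (?(1)>) then demands
# a closing ">" only when group 1 took part in the match.
bracketed = re.compile(r"^(<)?\w+(?(1)>)$")
results = {s: bracketed.match(s) is not None
           for s in ["<tag>", "tag", "<tag", "tag>"]}

# Atomic group: once (?>a+) has matched, the engine may not re-enter it
# to give characters back, so the trailing 'a' has nothing left to match.
atomic_blocks = None  # stays None on Python < 3.11
if sys.version_info >= (3, 11):
    atomic_blocks = re.match(r"(?>a+)a", "aaa") is None

print(results, atomic_blocks)
```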
Group Types Reference
| Syntax | Name | Description | Example | Engine Support |
|---|---|---|---|---|
| (expr) | Capturing Group | Captures matched text for backreference | (abc)\1 matches "abcabc" | All engines |
| (?:expr) | Non-Capturing Group | Groups without capturing | (?:abc)+ matches "abcabc" | All engines |
| (?<name>expr) | Named Capturing Group | Named capture for readability | (?<year>\d{4}) | PCRE, JS, .NET, Java (Python uses (?P<name>expr)) |
| (?=expr) | Positive Lookahead | Asserts what follows matches | \d(?=px) matches "5" in "5px" | Backtracking engines (not RE2, Rust regex) |
| (?!expr) | Negative Lookahead | Asserts what follows does not match | \d(?!px) matches "5" in "5em" | Backtracking engines (not RE2, Rust regex) |
| (?<=expr) | Positive Lookbehind | Asserts what precedes matches | (?<=\$)\d+ matches "100" in "$100" | PCRE, JS (ES2018+), .NET, Python, Java |
| (?<!expr) | Negative Lookbehind | Asserts what precedes does not match | (?<!\$)\d+ matches "100" in "EUR100" | PCRE, JS (ES2018+), .NET, Python, Java |
| (?>expr) | Atomic Group | No backtracking into group | (?>a+)b prevents catastrophic backtracking | PCRE, .NET, Java, Ruby, Python 3.11+ |
| (?P<name>expr) | Python Named Group | Python-style named group | (?P<year>\d{4}) | Python, PCRE |
| (?(1)yes|no) | Conditional | If group 1 matched, try yes else no | (a)?(?(1)b|c) | PCRE, .NET, Python |
| \1 | Backreference | Matches same text as group 1 | (\w+)\s\1 matches "the the" | Backtracking engines (not RE2, Rust regex) |
| \k<name> | Named Backreference | Matches same text as named group | (?<word>\w+)\s\k<word> | PCRE, JS, .NET, Java |
Part 5: Lookahead & Lookbehind Assertions
Lookahead and lookbehind assertions (collectively called “lookarounds”) are zero-width assertions that check whether a pattern matches at a position without consuming any characters. They are among the most powerful and most misunderstood regex features.
Positive and Negative Lookahead
Positive lookahead (?=expr) asserts that what follows the current position matches expr. The classic use case is password validation with multiple requirements: ^(?=.*[A-Z])(?=.*[a-z])(?=.*\d).{8,}$ uses three lookaheads at position 0 to verify the presence of uppercase, lowercase, and digit characters anywhere in the string, then the .{8,}$ matches the actual 8+ characters.
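Negative lookahead (?!expr) asserts that what follows does NOT match. Example: \d+(?!px) matches numbers not followed by “px”, useful for matching values in “5em” but not in “5px”. Note that backtracking still lets this pattern match the “1” in “15px”; adding the digit to the lookahead, as in \d+(?!\d|px), closes that gap. Another example: ^(?!.*admin).*$ matches strings that do NOT contain “admin” anywhere.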
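The lookahead patterns above can be exercised directly. This sketch uses Python's `re` module; the test strings are invented for illustration.

```python
import re

# Three lookaheads check independent conditions from position 0,
# then .{8,}$ consumes the actual characters.
strong = re.compile(r"^(?=.*[A-Z])(?=.*[a-z])(?=.*\d).{8,}$")
checks = {pw: strong.match(pw) is not None
          for pw in ["Passw0rdX", "password1", "Sh0rt"]}

# Negative lookahead as a "does not contain" test.
no_admin = re.compile(r"^(?!.*admin).*$")
allowed = [s for s in ["guest", "user_admin_1"] if no_admin.match(s)]

# Pitfall: \d+ can backtrack to a shorter number that passes the lookahead.
naive = re.findall(r"\d+(?!px)", "15px 5em")
# Forbidding a following digit as well prevents the partial match.
fixed = re.findall(r"\d+(?!\d|px)", "15px 5em")

print(checks, allowed, naive, fixed)
```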
Positive and Negative Lookbehind
Positive lookbehind (?<=expr) asserts that what precedes the current position matches expr. Example: (?<=\$)\d+ matches digits preceded by a dollar sign, capturing “100” from “$100” without including the dollar sign in the match. Negative lookbehind (?<!expr) asserts that the preceding text does NOT match.
Lookbehind support varies significantly across engines. JavaScript added lookbehind in ES2018 (Chrome 62+, Firefox 78+, Node 10+) and supports variable-length lookbehinds, as does .NET. Python’s re module allows only fixed-length lookbehinds (each alternative must have the same length). Java accepts variable-length lookbehinds only when their maximum length is bounded, and PCRE2 has traditionally required each alternative to be fixed-length, with bounded variable-length support arriving in recent releases. RE2 and the Rust regex crate do not support lookarounds at all.
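Python's fixed-width restriction is easy to observe. The sketch below shows a working lookbehind, the error raised for alternatives of different widths, and one common workaround (an alternation of separate lookbehinds); the currency strings are illustrative.

```python
import re

# Fixed-width lookbehind: fine in every engine that has lookbehind.
price = re.search(r"(?<=\$)\d+", "Total: $100").group()
not_dollar = re.search(r"(?<!\$)\d+", "EUR100").group()

# Alternatives of different widths are rejected by Python's re module.
try:
    re.compile(r"(?<=\$|USD )\d+")
    fixed_width_only = False
except re.error:
    fixed_width_only = True

# Workaround: one fixed-width lookbehind per alternative.
either = re.compile(r"(?:(?<=\$)|(?<=USD ))\d+")
amounts = either.findall("$5 USD 7 EUR9")

print(price, not_dollar, fixed_width_only, amounts)
```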
Practical Lookaround Patterns
Lookarounds excel in several scenarios. For adding thousands separators: (?<=\d)(?=(\d{3})+(?!\d)) inserts a comma between a digit and a group of three digits. For matching words at boundaries without consuming the boundary: (?<=\s|^)word(?=\s|$). For extracting content between delimiters without including them: (?<=").*?(?=").
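The thousands-separator pattern matches a position rather than any characters, so the substitution only inserts text. A minimal sketch in Python, where `add_thousands` is a hypothetical helper name:

```python
import re

# Zero-width match: preceded by a digit, followed by one or more complete
# groups of three digits and then a non-digit (or the end of the string).
THOUSANDS = re.compile(r"(?<=\d)(?=(\d{3})+(?!\d))")

def add_thousands(text: str) -> str:
    """Insert commas into every run of digits found in text."""
    return THOUSANDS.sub(",", text)

print(add_thousands("1234567"))
print(add_thousands("Total 1234567 units"))
print(add_thousands("1234.5678"))
```

The last call shows a limitation of the bare pattern: digits after a decimal point are grouped too, so fractional parts need to be split off or excluded before applying it.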
A powerful technique is the “tempered greedy token”: (?:(?!stop).)*stop matches everything up to “stop” without crossing it. At each position, the negative lookahead verifies that “stop” does not start here, then the dot consumes one character. This pattern runs in linear time and works in any engine that supports lookahead (so not RE2 or the Rust regex crate), though it is slower than a simple negated class due to the per-character assertion.
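A brief sketch of the tempered greedy token in Python, contrasted with a plain greedy dot. The sample text is made up for the example.

```python
import re

text = "go go stop and stop again"

# Each step first checks that "stop" does not begin here, then eats one char.
tempered = re.match(r"(?:(?!stop).)*stop", text).group()

# A plain greedy .* runs to the last "stop" instead.
greedy = re.match(r".*stop", text).group()

# The same idea as a whole-string test: accept only strings without "ab".
no_ab = re.compile(r"^(?:(?!ab).)*$")
without_ab = [s for s in ["cbaxb", "cabbage"] if no_ab.match(s)]

print(tempered, "|", greedy, "|", without_ab)
```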
Key Finding
Lookaround assertions do not consume characters. They assert a condition at a position, allowing the main pattern to match without including the asserted text.
This makes lookarounds essential for complex validation (multiple conditions on the same text), context-sensitive matching (match X only when preceded/followed by Y), and extraction without delimiters.
Regex Flags Reference
| Flag | Name | Description | Support | Notes |
|---|---|---|---|---|
| g | Global | Find all matches, not just the first | JS, PCRE, Python (re.findall) | In most languages, iteration or findall replaces the flag |
| i | Case-insensitive | Match upper and lowercase equivalently | All engines | Unicode case folding may differ from ASCII-only |
| m | Multiline | ^ and $ match line boundaries, not string boundaries | All engines | In Ruby, /m is the dotall flag instead |
| s | Dotall / Single-line | . matches newline characters | PCRE, JS (ES2018+), Python (re.DOTALL), .NET | Without this, . stops at \n |
| u | Unicode | Enable full Unicode matching and \p{} support | JS (ES2015+), PCRE2 | Critical for non-ASCII text. Always use in modern JS. |
| v | Unicode Sets | Unicode set operations, improved classes | JS (ES2024+) | Superset of /u. Enables set subtraction, intersection. |
| x | Extended / Verbose | Allow whitespace and comments in pattern | PCRE, Python (re.VERBOSE), Ruby, Java | Great for documenting complex patterns |
| d | Match indices | Return start/end indices for each capture | JS (ES2022+) | Adds indices property to match result |
| y | Sticky | Match only at lastIndex position | JS (ES2015+) | Used for lexer/tokenizer implementations |
Part 6: Unicode & Property Escapes
Unicode support in regex has progressed from an afterthought to a first-class feature. Modern applications must handle text in multiple scripts, emoji, and supplementary characters. Without proper Unicode support, regex patterns break on non-ASCII input in subtle and frustrating ways.
The Unicode Problem in JavaScript
JavaScript uses UTF-16 encoding internally. Characters above U+FFFF (including most emoji) are represented as two 16-bit “surrogate” code units. Without the /u flag, JavaScript regex treats each surrogate as a separate character. This means /^.$/ fails to match a single emoji character, and quantifiers apply to individual surrogates rather than complete code points. The /u flag fixes this by switching to code-point semantics.
The /u flag also enables Unicode property escapes: \p{Letter} matches any letter in any script, \p{Script=Cyrillic} matches Cyrillic characters, and \P{Letter} (uppercase P) matches non-letters. The ES2024 /v flag (Unicode Sets) extends this further with set operations: [[a-z]--[aeiou]] matches consonants, and [[\p{Letter}&&\p{Script=Latin}]] matches only Latin letters.
Unicode Categories and Scripts
Unicode defines General Categories that classify every character: Letter (L), Number (N), Punctuation (P), Symbol (S), Separator (Z), Mark (M), and Other (C). Each has subcategories: Lu (uppercase letter), Ll (lowercase letter), Nd (decimal digit number). Regex engines expose these via \p{Lu}, \p{Nd}, etc.
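Python's built-in `re` module does not support `\p{...}`, but the standard `unicodedata` module exposes the same General Category codes, which makes the classification easy to inspect. The sketch below is only an illustration of the categories; `is_letter` is a hypothetical helper roughly corresponding to `\p{L}`.

```python
import unicodedata

# Two-letter codes: first letter is the major class, second the subcategory.
categories = {ch: unicodedata.category(ch) for ch in "Aa5,$ "}

def is_letter(ch: str) -> bool:
    """Rough equivalent of \\p{L}: any category starting with 'L'."""
    return unicodedata.category(ch).startswith("L")

# Latin, Cyrillic, and Han letters all qualify; a digit does not.
letters = [is_letter(ch) for ch in ["e", "\u044f", "\u4e2d", "7"]]

print(categories)
print(letters)
```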
Unicode Scripts assign characters to writing systems: Latin, Cyrillic, Han, Arabic, Devanagari, and 150+ others. Script matching is essential for input validation in multilingual applications. For example, to accept only Latin and common characters: ^[\p{Script=Latin}\p{Script=Common}]+$. To detect mixed-script text (potential homograph attacks): check if input contains characters from multiple scripts.
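The mixed-script check can be approximated in plain Python. The standard library has no Script property, so this sketch makes a loud simplifying assumption: it treats the first word of a letter's Unicode character name (for example LATIN or CYRILLIC) as its script. That is a crude heuristic, not the real Script property, and `scripts_used` and `is_mixed_script` are hypothetical helper names.

```python
import unicodedata

def scripts_used(text: str) -> set[str]:
    """Approximate the scripts in text from Unicode character names.

    Assumption: the first word of a letter's name identifies its script.
    """
    scripts = set()
    for ch in text:
        if unicodedata.category(ch).startswith("L"):
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts

def is_mixed_script(text: str) -> bool:
    return len(scripts_used(text)) > 1

genuine = "paypal"
spoofed = "p\u0430ypal"  # second letter is CYRILLIC SMALL LETTER A

print(scripts_used(genuine), is_mixed_script(genuine))
print(scripts_used(spoofed), is_mixed_script(spoofed))
```

In an engine with property escapes, the same check would normally be written with \p{Script=...} classes rather than character names.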
Regex Usage by Programming Language (% of developers using regex regularly)
Source: OnlineTools4Free Research
Regex Usage by Language
| Language | Usage (%) | Engine | Named Groups | Lookbehind |
|---|---|---|---|---|
| JavaScript | 68 | Irregexp (V8) | Yes | Yes (ES2018) |
| Python | 62 | SRE | Yes | Fixed-length |
| TypeScript | 58 | Irregexp (V8) | Yes | Yes (ES2018) |
| Java | 45 | NFA backtracking | Yes | Finite length |
| Shell/Bash | 42 | POSIX / PCRE | PCRE only | PCRE only |
| PHP | 35 | PCRE2 | Yes | Variable length |
| Go | 28 | RE2 variant | Yes | No |
| C# | 25 | NFA backtracking | Yes | Variable length |
| Ruby | 18 | Oniguruma | Yes | Variable length |
| Rust | 15 | Hybrid NFA/DFA | Yes | No |
Part 7: Regex Engines Compared
Not all regex engines are created equal. The choice of engine determines which features are available, what performance guarantees exist, and whether catastrophic backtracking is possible. Understanding engine differences is essential when writing patterns that must work across languages.
NFA vs. DFA Engines
Backtracking engines, commonly labeled NFA (Non-deterministic Finite Automaton) engines, explore possible matches by following one path and backing up when it fails. They support the full regex feature set including backreferences, lookarounds, atomic groups, and possessive quantifiers. The trade-off is that pathological patterns can cause exponential worst-case performance. PCRE, JavaScript, Python, Java, Ruby, and .NET all use backtracking engines.
Automata-based engines, whether a DFA (Deterministic Finite Automaton) or a Thompson-style NFA simulation, make a single pass over the input and track all possible states at once, guaranteeing O(n) time in the input length for a given pattern. They cannot support backreferences or lookarounds because these features require tracking state that such engines do not maintain. RE2 (Google), Go's regexp package (which follows the RE2 design), and the Rust regex crate use hybrid NFA/DFA approaches that provide linear-time guarantees while supporting most common features.
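To make the linear-time idea concrete, here is a deliberately tiny set-of-states matcher in Python. It is a teaching sketch, not how RE2 or the Rust crate are implemented: it assumes patterns built only from literal characters and literal-followed-by-`*`, and `parse`, `closure`, and `full_match` are hypothetical names.

```python
import re

def parse(pattern: str) -> list[tuple[str, bool]]:
    """Split a pattern of literals and literal* into (char, starred) items."""
    items, i = [], 0
    while i < len(pattern):
        starred = i + 1 < len(pattern) and pattern[i + 1] == "*"
        items.append((pattern[i], starred))
        i += 2 if starred else 1
    return items

def closure(items, states):
    """Add states reachable by skipping starred items (zero repetitions)."""
    result, stack = set(states), list(states)
    while stack:
        s = stack.pop()
        if s < len(items) and items[s][1] and s + 1 not in result:
            result.add(s + 1)
            stack.append(s + 1)
    return result

def full_match(pattern: str, text: str) -> bool:
    """Advance a set of states once per input character; never backtrack."""
    items = parse(pattern)
    states = closure(items, {0})
    for ch in text:
        nxt = set()
        for s in states:
            if s < len(items) and items[s][0] == ch:
                # A starred item may repeat; a plain item moves on.
                nxt.add(s if items[s][1] else s + 1)
        states = closure(items, nxt)
        if not states:
            return False
    return len(items) in states

samples = ["", "b", "ab", "aab", "aac", "ba"]
agrees = all(full_match("a*a*b", t) == bool(re.fullmatch(r"a*a*b", t))
             for t in samples)
print(agrees, full_match("a*a*b", "a" * 5000 + "c"))
```

Because the state set can never grow beyond the number of pattern items plus one, the work per input character is bounded by the pattern size, which is where the linear-time guarantee comes from.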
Engine Feature Matrix
The table below compares 10 regex engines across key features. PCRE2 has the richest feature set (recursion, callouts, conditionals, variable-length lookbehind). JavaScript has improved rapidly with ES2018-2024 additions but still lacks atomic groups, possessive quantifiers, and recursion. RE2 and the Rust regex crate trade features for guaranteed linear-time performance.
Regex Engine Comparison
| Engine | Language | Type | Variable Lookbehind | Atomic Groups | Recursion | Speed |
|---|---|---|---|---|---|---|
| PCRE2 | C (used by PHP, R, Nginx) | Backtracking (NFA) | Yes | Yes | Yes | Fast |
| JavaScript (V8) | JavaScript | Backtracking (NFA) | Yes (ES2018+) | No | No | Fast |
| .NET (System.Text.RegularExpressions) | C#, F#, VB.NET | Backtracking (NFA) | Yes | Yes | Yes (balancing groups) | Fast |
| Python (re) | Python | Backtracking (NFA) | Fixed-length only | Yes (3.11+) | No (use regex module) | Moderate |
| Java (java.util.regex) | Java, Kotlin | Backtracking (NFA) | Finite (not *) | Yes (?>) | No | Moderate |
| RE2 | Go, C++ | Thompson NFA / DFA | No lookbehind | N/A | N/A | Very fast (linear guarantee) |
| Rust (regex crate) | Rust | Hybrid NFA/DFA | No lookbehind | N/A | N/A | Very fast (linear guarantee) |
| Ruby (Oniguruma) | Ruby | Backtracking (NFA) | Yes | Yes | Yes | Fast |
| POSIX BRE | sed, grep (default) | DFA/NFA | No | No | No | Fast |
| POSIX ERE | grep -E, awk | DFA/NFA | No | No | No | Fast |
Regex Engine Speed: Simple vs Complex Pattern (microseconds, lower is better)
Source: OnlineTools4Free Research
Part 8: Performance & ReDoS
Regex performance is dominated by one concern: catastrophic backtracking. A well-written regex runs in linear time O(n) on any input. A poorly written regex can run in exponential time O(2^n), hanging the application for seconds, minutes, or effectively forever on crafted input. Understanding and preventing catastrophic backtracking is the most important performance skill in regex.
Catastrophic Backtracking Explained
Catastrophic backtracking occurs when a pattern has overlapping or nested quantifiers that create an exponential number of ways to match the input. The canonical example is (a+)+b applied to the input “aaaaaaaaaaac”. The outer + and inner + both try to match the ‘a’ characters. When the engine fails to find ‘b’ at the end, it backtracks through every possible partition of the ‘a’ characters between the two quantifiers. For a run of six ‘a’ characters, for instance, these include (aaaa)(aa), (aaa)(aaa), (aaa)(aa)(a), and so on. With n ‘a’ characters, there are 2^(n-1) possible partitions.
Other dangerous patterns include: (a|a)*b (alternation with overlap), (a*)*b (nested star quantifiers), and .*X.* on long inputs without X (quadratic backtracking). The fix is to eliminate ambiguity: rewrite (a+)+b as a+b (since the nested grouping adds nothing), use atomic groups (?>a+)b to prevent backtracking into the group, or use possessive quantifiers a++b.
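Two claims in this section can be checked without actually hanging an interpreter: that the number of partitions is 2^(n-1), and that rewriting (a+)+b as a+b does not change what is matched. The sketch below does both in Python; `count_partitions` is a hypothetical helper, and the equivalence check is only exhaustive over short strings of the letters a, b, and c.

```python
import re
from itertools import product

def count_partitions(n: int) -> int:
    """Count ways to split n identical characters into ordered non-empty runs."""
    if n == 0:
        return 1
    # Choose the length of the first run, then partition the remainder.
    return sum(count_partitions(n - first) for first in range(1, n + 1))

growth = [count_partitions(n) for n in range(1, 11)]

# The nested group adds no matching power, only extra ways to backtrack.
nested = re.compile(r"(a+)+b")
flat = re.compile(r"a+b")
samples = ["".join(p) for n in range(1, 7) for p in product("abc", repeat=n)]
same_matches = all(bool(nested.search(s)) == bool(flat.search(s))
                   for s in samples)

print(growth)
print(same_matches)
```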
ReDoS: Regex Denial of Service
ReDoS attacks exploit catastrophic backtracking in server-side regex patterns. If a web application uses a vulnerable regex to validate user input, an attacker can craft input that causes the regex to run for minutes, consuming a CPU core and potentially bringing down the service. Notable ReDoS incidents have affected Node.js packages (ua-parser-js, moment.js), Cloudflare (2019 outage caused by a single regex), and Stack Overflow.
Prevention strategies: use a linear-time engine (RE2, Rust regex) for user-facing validation; set timeouts on regex execution (.NET supports this natively; in Node.js, use worker threads with timeouts); run static analysis tools (safe-regex, rxxr2, redos-checker) on patterns; avoid processing untrusted input with complex patterns; and review all regex patterns that include nested quantifiers, alternation with overlapping alternatives, or .* in the middle of a pattern.
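As a rough illustration of pattern review, the sketch below flags one obviously risky shape before a pattern is used. Everything here is an assumption made for the example: `looks_risky` is a crude textual heuristic rather than a real static analyzer (it misses cases such as overlapping alternation), and the `MAX_INPUT_LENGTH` cap is an extra, hypothetical safeguard that is not part of the list above.

```python
import re

# Heuristic (assumption of this sketch): a parenthesized group that
# contains + or * and is itself followed by + or *.
NESTED_QUANTIFIER = re.compile(r"\([^()]*[+*][^()]*\)[+*]")

MAX_INPUT_LENGTH = 256  # hypothetical limit for untrusted input

def looks_risky(pattern: str) -> bool:
    """Flag patterns with a quantified group that contains a quantifier."""
    return NESTED_QUANTIFIER.search(pattern) is not None

def guarded_search(pattern: str, text: str):
    """Refuse risky patterns and oversized input before running the regex."""
    if looks_risky(pattern):
        raise ValueError("pattern has nested quantifiers; review it first")
    if len(text) > MAX_INPUT_LENGTH:
        raise ValueError("input exceeds the configured length limit")
    return re.search(pattern, text)

flags = {p: looks_risky(p)
         for p in [r"(a+)+b", r"(a*)*b", r"a+b", r"(?:ab)+", r"^[a-z]+$"]}
print(flags)
```

A check like this is a first filter for code review; it does not replace a linear-time engine or an execution timeout for untrusted input.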
Key Finding
Every regex used on user-controlled input should be reviewed for ReDoS vulnerability. A single vulnerable pattern can take down an entire service.
Use linear-time engines (RE2, Rust regex) for untrusted input, or validate patterns with static analysis tools. Set hard timeouts on regex execution.
Regex Performance: Backtracking Complexity
| Pattern | Input Size | Match Time (ms) | No-Match Time (ms) | Complexity | Issue |
|---|---|---|---|---|---|
| /a*a*b/ | 20 | 0.001 | 450 | Polynomial | Overlapping a* quantifiers: repeated backtracking, but not exponential |
| /(a+)+b/ | 25 | 0.001 | 3200 | O(2^n) | Nested quantifiers: exponential on non-match |
| /(a|a)*b/ | 25 | 0.001 | 2800 | O(2^n) | Alternation with overlapping: same exponential |
| /^[a-z]+$/ | 1000000 | 15 | 15 | O(n) | Linear: no backtracking needed |
| /\d{4}-\d{2}-\d{2}/ | 1000000 | 22 | 20 | O(n) | Linear: deterministic match/skip |
| /(?>a+)b/ | 25 | 0.001 | 0.001 | O(n) | Atomic group prevents backtracking |
| /^(?:(?!ab).)*$/ | 10000 | 8 | 0.01 | O(n) | Tempered greedy token: linear but slower constant |
| /\b\w+@\w+\.\w+\b/ | 100000 | 12 | 10 | O(n) | Simple email-like: linear scan |
Engine Speed: Email Pattern Scan on 10KB Text (microseconds)
Source: OnlineTools4Free Research
Part 9: 100+ Ready-to-Use Regex Patterns
The table below contains over 100 production-ready regex patterns organized by category. Each pattern includes a description, example match, and implementation notes. You can search, sort, and filter the table to find the pattern you need, and download the entire collection as CSV for offline reference.
Categories covered: Email, Phone, URL, IP Address, Date/Time, Password Validation, Credit Card, HTML/Markup, Numbers, Code/Programming, Network, File/Path, Text Processing, Data Formats, Identifiers, Security/Log, Markdown, and Search/Replace patterns.
Important: these patterns are starting points. Real-world validation should combine regex with additional logic (Luhn checksum for credit cards, DNS lookup for email domains, date range validation for dates). No single regex can replace proper parsing for complex formats like HTML or JSON.
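Combining a regex with follow-up logic might look like the sketch below. The regex only checks the shape of the input; the Luhn checksum and the calendar check do the real validation. The `CARD_SHAPE` pattern (13 to 19 digits) and the helper names are assumptions for this example, not entries from the pattern library.

```python
import re
from datetime import date

CARD_SHAPE = re.compile(r"^\d{13,19}$")  # illustrative shape check only
ISO_DATE = re.compile(r"^\d{4}-(?:0[1-9]|1[0-2])-(?:0[1-9]|[12]\d|3[01])$")

def luhn_valid(number: str) -> bool:
    """Luhn checksum: double every second digit from the right."""
    total = 0
    for i, ch in enumerate(reversed(number)):
        digit = int(ch)
        if i % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0

def plausible_card(raw: str) -> bool:
    digits = re.sub(r"[ -]", "", raw)  # drop spaces and dashes
    return bool(CARD_SHAPE.match(digits)) and luhn_valid(digits)

def real_date(text: str) -> bool:
    """The regex checks the format; datetime checks the calendar."""
    if not ISO_DATE.match(text):
        return False
    try:
        date.fromisoformat(text)
    except ValueError:
        return False
    return True

print(plausible_card("4111 1111 1111 1111"), plausible_card("4111111111111112"))
print(real_date("2026-04-14"), real_date("2026-02-30"))
```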
100+ Regex Patterns Library
| Category | Description | Pattern | Example | Notes |
|---|---|---|---|---|
| Email | Email address (basic) | ^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$ | [email protected] | Covers most addresses seen in practice. Does not handle quoted local parts. |
| Email | Email address (RFC 5322 simplified) | ^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$ | [email protected] | Closely follows RFC 5322 without quoted strings. |
| Email | Common provider email only | ^[a-zA-Z0-9._%+-]+@(gmail|yahoo|outlook|hotmail)\.[a-z]{2,}$ | [email protected] | Restricts to major providers. |
| Phone | International phone (E.164) | ^\+?[1-9]\d{1,14}$ | +14155552671 | ITU-T E.164 format. Up to 15 digits. |
| Phone | US phone number | ^\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}$ | (415) 555-2671 | Matches (XXX) XXX-XXXX, XXX-XXX-XXXX, XXXXXXXXXX. |
| Phone | UK mobile phone | ^\+44\s?7\d{3}\s?\d{6}$ | +44 7911 123456 | UK mobile numbers start with 07. |
| Phone | French mobile phone | ^\+33\s?[67]\d{8}$ | +33 612345678 | French mobile starts with 06 or 07. |
| Phone | German mobile phone | ^\+49\s?1[567]\d{1,2}\s?\d{7,8}$ | +49 151 12345678 | German mobile prefixes: 015x, 016x, 017x. |
| URL | URL (basic match) | https?://[^\s/$.?#].[^\s]* | https://example.com/path | Matches most HTTP/HTTPS URLs in text. |
| URL | URL (strict validation) | ^https?://(?:www\.)?[-a-zA-Z0-9@:%._+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}\b(?:[-a-zA-Z0-9()@:%_+.~#?&/=]*)$ | https://www.example.com/path?q=1 | Validates full URL structure with query params. |
| URL | YouTube video URL | ^(?:https?://)?(?:www\.)?(?:youtube\.com/watch\?v=|youtu\.be/)([a-zA-Z0-9_-]{11}) | https://youtu.be/dQw4w9WgXcQ | Captures 11-char video ID. |
| URL | GitHub repo URL | ^(?:https?://)?(?:www\.)?github\.com/([a-zA-Z0-9-]+)/([a-zA-Z0-9._-]+)(?:/.*)?$ | https://github.com/user/repo | Captures owner and repo name. |
| IP | IPv4 address | ^(?:(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d\d?)$ | 192.168.1.1 | Validates 0-255 range per octet. |
| IP | IPv6 address (simplified) | ^(?:(?:[0-9a-fA-F]{1,4}:){7}[0-9a-fA-F]{1,4}|::(?:[0-9a-fA-F]{1,4}:){0,5}[0-9a-fA-F]{1,4})$ | 2001:0db8:85a3:0000:0000:8a2e:0370:7334 | Matches the full eight-group form and addresses that begin with ::. Does not match :: in the middle of an address. |
| IP | IPv4 CIDR notation | ^(?:(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d\d?)/(?:3[0-2]|[12]?\d)$ | 10.0.0.0/24 | IP with subnet mask /0-/32. |
| Date | Date ISO 8601 (YYYY-MM-DD) | ^\d{4}-(?:0[1-9]|1[0-2])-(?:0[1-9]|[12]\d|3[01])$ | 2026-04-14 | Does not validate day-of-month vs month length. |
| Date | Date US format (MM/DD/YYYY) | ^(?:0[1-9]|1[0-2])/(?:0[1-9]|[12]\d|3[01])/\d{4}$ | 04/14/2026 | US date format with leading zeros. |
| Date | Date European (DD.MM.YYYY) | ^(?:0[1-9]|[12]\d|3[01])\.(?:0[1-9]|1[0-2])\.\d{4}$ | 14.04.2026 | Common in Germany, France. |
| Date | Time 24-hour (HH:MM or HH:MM:SS) | ^(?:[01]\d|2[0-3]):[0-5]\d(?::[0-5]\d)?$ | 14:30:00 | Validates 00:00-23:59:59. |
| Date | ISO 8601 datetime with timezone | ^\d{4}-(?:0[1-9]|1[0-2])-(?:0[1-9]|[12]\d|3[01])T(?:[01]\d|2[0-3]):[0-5]\d:[0-5]\d(?:\.\d+)?(?:Z|[+-](?:[01]\d|2[0-3]):[0-5]\d)$ | 2026-04-14T14:30:00Z | Full ISO datetime with timezone. |
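The note on the ISO 8601 date row matters in practice: the pattern checks the shape of the string, not the calendar. A minimal JavaScript sketch of pairing that regex with a calendar check follows. The helper name `isRealIsoDate` is hypothetical and not part of the pattern library.

```javascript
// Pattern copied from the "Date ISO 8601 (YYYY-MM-DD)" row above.
const ISO_DATE = /^\d{4}-(?:0[1-9]|1[0-2])-(?:0[1-9]|[12]\d|3[01])$/;

// Hypothetical helper: shape check via regex, calendar check via Date.
function isRealIsoDate(text) {
  if (!ISO_DATE.test(text)) return false;
  const [year, month, day] = text.split("-").map(Number);
  // Impossible days roll over into the next month (Feb 30 becomes Mar 2),
  // so comparing the parts after a round trip exposes them.
  const date = new Date(0);
  date.setUTCFullYear(year, month - 1, day);
  return (
    date.getUTCFullYear() === year &&
    date.getUTCMonth() === month - 1 &&
    date.getUTCDate() === day
  );
}

console.log(ISO_DATE.test("2026-02-30")); // true: the shape is valid
console.log(isRealIsoDate("2026-02-30")); // false: February has no 30th
console.log(isRealIsoDate("2026-04-14")); // true
```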
Part 10: Common Regex Mistakes
Even experienced developers make regex mistakes regularly. The patterns are powerful but unforgiving: a single missing backslash, anchor, or flag can produce subtly wrong results that pass initial tests but fail in production. The table below catalogs the most frequent mistakes and their fixes.
Common Regex Mistakes and Fixes
| Mistake | Example | Fix | Severity |
|---|---|---|---|
| Not escaping special characters | Using . instead of \. for literal dot | Escape metacharacters: . * + ? ^ $ { } [ ] ( ) | \ | High |
| Greedy matching when lazy is needed | "<.*>" matches "<a>text</b>" entirely | Use lazy quantifier: "<.*?>" or be specific: "<[^>]*>" | High |
| Catastrophic backtracking | /(a+)+b/ on "aaaaaaaaaaac" backtracks exponentially; longer runs of a's effectively hang | Avoid nested quantifiers. Rewrite as /a+b/ or use atomic groups where the engine supports them. | Critical |
| Missing anchors for validation | /\d{5}/ matches "abc12345xyz" | Add anchors: /^\d{5}$/ for exact match | High |
| Forgetting multiline flag | /^line$/ does not match individual lines | Add /m flag so ^ and $ match line boundaries | Medium |
| Not using Unicode flag | /\w+/ does not match accented letters | Use /u flag and \p{L} for Unicode letters | High |
| Using regex to parse HTML | /<div>(.*)<\/div>/ fails on nested divs | Use a proper HTML parser (DOMParser, cheerio, etc.) | Critical |
| Overly specific patterns | Hardcoding whitespace as single space | Use \s+ to match any whitespace (tabs, newlines, etc.) | Medium |
| Case sensitivity oversight | /hello/ does not match "Hello" | Add /i flag for case-insensitive matching | Low |
| Using capturing groups unnecessarily | (abc)+ when you do not need the capture | Use non-capturing groups: (?:abc)+ | Low |
| Not testing edge cases | Email regex that rejects valid + in local part | Test with +, dots, long TLDs, international domains | Medium |
| Matching too broadly | .* at the start/end of patterns | Be as specific as possible to reduce false positives | Medium |
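Two of the rows above (missing anchors and the Unicode flag) are easy to reproduce. The following is a small JavaScript sketch with illustrative variable names; it assumes a runtime that supports Unicode property escapes (ES2018 or later).

```javascript
// Missing anchors: the unanchored pattern finds five digits anywhere.
const unanchoredZip = /\d{5}/;
const anchoredZip = /^\d{5}$/;
console.log(unanchoredZip.test("abc12345xyz")); // true
console.log(anchoredZip.test("abc12345xyz")); // false
console.log(anchoredZip.test("12345")); // true

// Unicode letters: \w is ASCII-only in JavaScript; \p{L} needs the u flag.
const word = "caf\u00e9"; // "café" with a precomposed é
console.log(word.match(/\w+/)[0]); // "caf"
console.log(word.match(/\p{L}+/u)[0]); // "café"
```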
Regex in Tools and Frameworks
Regex is embedded in virtually every developer tool. Understanding which engine each tool uses helps you write compatible patterns. VS Code's in-editor Find uses the JavaScript engine, so you get named groups and lookbehind. grep with -P uses PCRE, giving you the full feature set. ripgrep uses the Rust regex crate by default, which is fast but lacks lookaround and backreferences. Nginx uses PCRE2 for location matching.
Regex in Developer Tools and Frameworks
| Tool | Regex Usage | Engine | Notes |
|---|---|---|---|
| VS Code | Search & replace | JavaScript (V8) | Toggle regex with Alt+R. Supports lookahead, named groups. |
| grep / ripgrep | Text search | POSIX / PCRE / Rust regex | grep -P for PCRE, rg uses Rust regex (linear-time). |
| sed | Stream editing | POSIX BRE / ERE (-E) | sed -E for extended regex. No lookaheads. |
| awk | Pattern matching | POSIX ERE | Built-in pattern matching and splitting. |
| Nginx | Location matching, rewrites | PCRE2 | ~ for case-sensitive, ~* for case-insensitive. |
| Apache (.htaccess) | URL rewriting (mod_rewrite) | PCRE | RewriteRule uses PCRE syntax. |
| ESLint | Code pattern matching | JavaScript | The no-invalid-regexp rule flags invalid patterns. |
| React Router | Route path matching | path-to-regexp | Uses parameterized patterns, not raw regex. |
| Cloudflare WAF | Security rule matching | RE2 (linear-time) | Uses RE2 to prevent ReDoS in WAF rules. |
| Google Analytics | Filter patterns, audience rules | RE2 | Limited regex syntax (no lookaheads). |
Glossary: 40+ Regex Terms Defined
This glossary defines every essential regex term used in this guide. Terms are organized alphabetically within categories. Each definition provides practical context for working developers.
Categories: API, Advanced, Assertions, Classes, Engine, Flags, Groups, Operators, Quantifiers, Security, Syntax, Unicode.
Frequently Asked Questions (20 Questions)
The most common questions about regular expressions, drawn from search data, developer forums, and Stack Overflow. Each answer provides actionable guidance.
What is a regular expression?
A regular expression (regex) is a sequence of characters that defines a search pattern. Regex is used for string matching, validation, search-and-replace, and text parsing. Every modern programming language includes a regex engine. The pattern /^\d{5}$/ matches exactly five digits, and /[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/ matches email addresses.
What is the difference between greedy and lazy quantifiers?
Greedy quantifiers (*, +, ?) match as much text as possible, then backtrack if needed. Lazy quantifiers (*?, +?, ??) match as little as possible, then expand. Example: given "<a>text</b>", the greedy pattern "<.*>" matches the entire string, while the lazy "<.*?>" matches "<a>" and "</b>" separately. Use lazy quantifiers or negated character classes [^>]* when you need the shortest match.
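The difference is easiest to see by running the three variants on the same input. This is a short JavaScript sketch; the variable name is illustrative.

```javascript
const html = "<a>text</b>";

// Greedy: .* runs to the end of the string, then backs up to the last ">".
console.log(html.match(/<.*>/)[0]); // "<a>text</b>"

// Lazy: .*? stops at the first ">" it can reach.
console.log(html.match(/<.*?>/g)); // ["<a>", "</b>"]

// Negated class: same result here, without relying on laziness.
console.log(html.match(/<[^>]*>/g)); // ["<a>", "</b>"]
```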
How do I match a literal dot or other special character?
Precede the special character with a backslash: \. matches a literal period, \* matches a literal asterisk, \( matches a literal opening parenthesis. The regex metacharacters that need escaping are: . ^ $ * + ? { } [ ] ( ) | \. Inside a character class [...], only ], \, ^ (when it is the first character), and - (between two characters) have special meaning and need escaping.
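When the text to match comes from a variable, escaping by hand is error-prone. A common approach is a small helper that escapes every metacharacter before building the pattern. The sketch below is one such helper in JavaScript; the name `escapeRegExp` is a convention, not a built-in assumed here.

```javascript
// Hypothetical helper: prefix each regex metacharacter with a backslash.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

const raw = "1+1=2 (approx.)";
const literal = new RegExp("^" + escapeRegExp(raw) + "$");

console.log(escapeRegExp("a.b")); // a\.b
console.log(literal.test(raw)); // true
// Without escaping, "+" is a quantifier, so the literal text does not match.
console.log(new RegExp("^1+1=2$").test("1+1=2")); // false
```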
What is catastrophic backtracking and how do I prevent it?
Catastrophic backtracking occurs when a regex has ambiguous patterns (typically nested quantifiers like (a+)+) that cause the engine to explore exponentially many paths on non-matching input. Prevention: avoid nested quantifiers, use atomic groups (?>...) or possessive quantifiers (++, *+), use specific character classes instead of .*, or use a linear-time engine like RE2 or the Rust regex crate.
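The simplest prevention is often a rewrite that removes the nesting without changing what the pattern accepts. The JavaScript sketch below compares the nested form with a flat equivalent on deliberately short inputs; it does not measure timing, and the sample list is illustrative.

```javascript
const nested = /^(a+)+b$/; // many ways to split a run of a's between iterations
const flat = /^a+b$/; // accepts the same strings, with only one way to match

const samples = ["ab", "aaab", "aaac", "b", ""];
for (const sample of samples) {
  // Both patterns agree on every input; only the work done on failure differs.
  console.log(JSON.stringify(sample), nested.test(sample), flat.test(sample));
}
```

On a failing input such as a long run of a's followed by c, the nested form retries every possible split of the run, while the flat form fails after a single pass.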
What is a lookahead and when should I use one?
A lookahead (?=...) checks if the text ahead matches a pattern without consuming characters. Positive lookahead (?=expr) succeeds if the pattern matches; negative lookahead (?!expr) succeeds if it does not. Common uses: password validation with multiple requirements ((?=.*[A-Z])(?=.*\d).{8,}), matching a word only if followed by specific text (\w+(?=\s*=) to find variable names before =).
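Both uses mentioned above can be sketched in a few lines of JavaScript. Note one assumption: the password pattern here adds ^ and $ anchors so that it validates the whole string, which the inline version above leaves out.

```javascript
// Each lookahead checks one requirement from the start of the string;
// .{8,} then consumes the text and enforces the minimum length.
const strongPassword = /^(?=.*[A-Z])(?=.*\d).{8,}$/;
console.log(strongPassword.test("Passw0rdX")); // true
console.log(strongPassword.test("password1")); // false: no uppercase letter
console.log(strongPassword.test("Pass1")); // false: shorter than 8

// Match a name only when "=" follows; the "=" is not part of the match.
const beforeEquals = /\w+(?=\s*=)/;
console.log("timeout = 30".match(beforeEquals)[0]); // "timeout"
```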
What is the difference between \d and [0-9]?
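The answer depends on the engine. In JavaScript, \d always matches only ASCII digits and is equivalent to [0-9], with or without the /u flag. Python 3 (for str patterns) and .NET match digits from other scripts (Arabic-Indic, Devanagari, etc.) by default unless you pass re.ASCII or RegexOptions.ECMAScript. Java keeps \d ASCII-only unless Pattern.UNICODE_CHARACTER_CLASS is set. For strict ASCII digit matching, use [0-9]. For international digit matching, use \p{Nd} (with the /u flag in JavaScript) or \d in an engine's Unicode mode.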
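Engine behavior differs on this point, so it is worth checking in the engine you actually use. The JavaScript sketch below tests an Arabic-Indic digit; in JavaScript specifically, \d stays ASCII-only regardless of the u flag, and \p{Nd} is the way to match decimal digits from any script.

```javascript
const arabicIndicThree = "\u0663"; // ARABIC-INDIC DIGIT THREE

console.log(/^\d$/.test(arabicIndicThree)); // false
console.log(/^\d$/u.test(arabicIndicThree)); // false: the u flag does not widen \d
console.log(/^\p{Nd}$/u.test(arabicIndicThree)); // true: Unicode decimal digit
console.log(/^[0-9]$/.test("3")); // true
```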
How do I validate an email address with regex?
A practical email regex is: ^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$. This covers 99%+ of real-world email addresses. The full RFC 5322 specification allows quoted strings, comments, and IP literals that make a truly compliant regex extremely complex (thousands of characters). For production systems, use this simplified pattern for client-side validation and verify the email by sending a confirmation message.
What does the /g flag do and when do I need it?
The /g (global) flag in JavaScript causes methods like match() and replace() to find ALL matches in the string instead of stopping at the first. Without /g, "aaa".match(/a/) returns ["a"]. With /g, "aaa".match(/a/g) returns ["a", "a", "a"]. Be careful: with /g on a RegExp object, exec() and test() maintain state via lastIndex, which can cause issues if the regex is reused.
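The lastIndex pitfall is worth seeing once, because it produces results that alternate between calls. This is a minimal JavaScript sketch with illustrative names.

```javascript
console.log("aaa".match(/a/).length); // 1: only the first match
console.log("aaa".match(/a/g)); // ["a", "a", "a"]

// A reused /g regex remembers where the last match ended.
const digit = /\d/g;
console.log(digit.test("a1")); // true
console.log(digit.lastIndex); // 2: the next search starts after the match
console.log(digit.test("a1")); // false: nothing left from index 2 onward
console.log(digit.lastIndex); // 0: a failed search resets the position
console.log(digit.test("a1")); // true again
```

Dropping /g for pure validation, or resetting lastIndex to 0 before each use, avoids the alternating results.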
How do named groups work?
Named groups (?<name>expr) let you assign a name to a capture group instead of using numeric indices. In JavaScript: const match = /(?<year>\d{4})-(?<month>\d{2})/.exec("2026-04"); match.groups.year is "2026". Named groups make patterns self-documenting and resistant to breakage when groups are added or removed. Backreference syntax: \k<name>. In Python, the syntax is (?P<name>expr) and (?P=name).
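The following JavaScript sketch shows both halves of the feature: reading captures by name, and referring back to a named capture inside the same pattern. The doubled-word example is illustrative.

```javascript
const dateMatch = /(?<year>\d{4})-(?<month>\d{2})/.exec("2026-04");
console.log(dateMatch.groups.year); // "2026"
console.log(dateMatch.groups.month); // "04"

// \k<word> must match the same text that the group named "word" captured.
const doubledWord = /\b(?<word>\w+)\s+\k<word>\b/;
console.log(doubledWord.exec("it is is fine").groups.word); // "is"
console.log(doubledWord.test("it is fine")); // false
```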
Can regex match nested structures like balanced parentheses?
Standard regex cannot match arbitrarily nested structures because regular languages cannot count unbounded nesting (the pumping lemma). However, some engines extend beyond regular languages: PCRE and .NET support recursion ((?R) or \g<0>) and balancing groups respectively, which CAN match balanced parentheses. For most practical purposes, use a parser or stack-based approach instead of regex for nested structures.
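For the common case of checking balanced parentheses, the stack-based approach reduces to a depth counter. The function below is a minimal JavaScript sketch under that assumption (a single bracket type); the name `isBalanced` is hypothetical.

```javascript
// Track nesting depth; with one bracket type a counter replaces the stack.
function isBalanced(text) {
  let depth = 0;
  for (const ch of text) {
    if (ch === "(") {
      depth += 1;
    } else if (ch === ")") {
      depth -= 1;
      if (depth < 0) return false; // closing parenthesis with no opener
    }
  }
  return depth === 0; // every opener must have been closed
}

console.log(isBalanced("(a(b)c)")); // true
console.log(isBalanced("(a(b)c")); // false
console.log(isBalanced(")(")); // false
```

Supporting several bracket types would need a real stack of expected closers rather than a counter.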
What is the difference between .* and [^x]*?
.* matches any character (except newline without /s flag) zero or more times. It is very broad and often causes over-matching. [^x]* matches any character EXCEPT x, zero or more times. Using negated classes is more precise and performant: to match content between quotes, "[^"]*" is better than ".*?" because it cannot cross quote boundaries and does not require backtracking.
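The "cannot cross quote boundaries" point shows up as soon as something must follow the closing quote. In the JavaScript sketch below (illustrative input), the lazy version stretches across two quoted strings to reach the "!", while the negated class stays inside one.

```javascript
const input = '"a" "b"!';

// Lazy dot: expands past the first closing quote until "! can match.
console.log(input.match(/".*?"!/)[0]); // "a" "b"!

// Negated class: cannot consume a quote, so only the last string qualifies.
console.log(input.match(/"[^"]*"!/)[0]); // "b"!
```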
How do I make regex case-insensitive?
Add the /i flag: /hello/i matches "Hello", "HELLO", "hElLo". In Python: re.compile("hello", re.IGNORECASE). In Java: Pattern.compile("hello", Pattern.CASE_INSENSITIVE). For partial case-insensitivity within a pattern, some engines support inline flags: (?i:hello) makes only "hello" case-insensitive while the rest of the pattern remains case-sensitive.
What is the Unicode /u flag and why should I always use it?
The /u flag (JavaScript ES2015+) enables full Unicode matching. Without it, JavaScript treats strings as sequences of 16-bit code units, breaking emoji and supplementary characters. With /u: the dot matches full code points (not half a surrogate pair), \p{} property escapes work, and quantifiers apply to full code points. Always use /u (or /v in ES2024+) in modern JavaScript for correct international text handling.
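The code-unit versus code-point distinction can be checked directly. This short JavaScript sketch uses an emoji written as an escape so the source stays ASCII.

```javascript
const emoji = "\u{1F600}"; // one code point outside the Basic Multilingual Plane

console.log(emoji.length); // 2: stored as a surrogate pair of UTF-16 code units
console.log([...emoji].length); // 1: string iteration walks code points
console.log(/^.$/.test(emoji)); // false: the dot matches one code unit
console.log(/^.$/u.test(emoji)); // true: the dot matches the whole code point
```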
How do I test and debug regex patterns?
Use interactive tools: regex101.com (explanation + debugger), regexr.com (real-time highlighting), or the OnlineTools4Free regex tester embedded in this guide. For debugging: break complex patterns into parts, test each part separately, use verbose mode (/x flag) with comments where the engine supports it (JavaScript does not), add test cases for edge cases (empty string, very long input, special characters, Unicode), and check for catastrophic backtracking with pathological inputs.
What is the difference between match(), exec(), and test()?
In JavaScript: test() returns true/false (/\d/.test("a1") returns true). exec() returns a match object with groups and indices (one match at a time, advances lastIndex with /g). match() without /g returns a match object (like exec). match() with /g returns an array of all matched strings (no groups). matchAll() (ES2020) returns an iterator of all match objects with groups. Use test() for validation, matchAll() for extraction.
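Running the methods side by side on one input makes the return shapes concrete. This is a JavaScript sketch with illustrative names; matchAll() assumes an ES2020 runtime.

```javascript
const text = "a1 b22";

console.log(/\d/.test(text)); // true

// exec() with /g: one match per call, resuming from lastIndex.
const pair = /(\w)(\d+)/g;
const first = pair.exec(text);
const second = pair.exec(text);
console.log(first[0], first[1], first[2]); // a1 a 1
console.log(second[0], second[1], second[2]); // b22 b 22

// match() with /g: matched strings only, no groups.
console.log(text.match(/(\w)(\d+)/g)); // ["a1", "b22"]

// matchAll(): every match object, groups included.
const numbers = [...text.matchAll(/(\w)(\d+)/g)].map((m) => m[2]);
console.log(numbers); // ["1", "22"]
```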
How do I match across multiple lines?
Two flags are relevant: /m (multiline) makes ^ and $ match at line boundaries (not just string start/end). /s (dotall) makes . match newline characters. To match a pattern that spans lines, use /s so the dot crosses newlines, or use [\s\S] as an alternative to . that always matches any character including newlines (works without /s flag).
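The two flags solve different problems, which the following JavaScript sketch separates. The /s flag assumes an ES2018 or later runtime.

```javascript
const lines = "first\nsecond";

// /m changes what ^ and $ mean.
console.log(/^second$/.test(lines)); // false: ^ only matches at string start
console.log(/^second$/m.test(lines)); // true: ^ also matches after a newline

// /s changes what the dot means.
console.log(/first.second/.test(lines)); // false: dot skips newlines
console.log(/first.second/s.test(lines)); // true
console.log(/first[\s\S]second/.test(lines)); // true without any flag
```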
What is ReDoS and how do I protect against it?
ReDoS (Regular expression Denial of Service) exploits catastrophic backtracking to cause a server to hang. An attacker sends input designed to trigger exponential backtracking in a vulnerable regex. Protection: use a linear-time engine (RE2, Rust regex) for user-facing validation, set timeouts on regex operations (.NET supports this natively), avoid patterns with nested quantifiers on user input, and use static analysis tools (safe-regex, rxxr2) to detect vulnerable patterns.
How do I replace text using captured groups?
In JavaScript: "2026-04-14".replace(/(\d{4})-(\d{2})-(\d{2})/, "$2/$3/$1") returns "04/14/2026". $1, $2, $3 refer to captured groups. Named groups use $<name>: .replace(/(?<y>\d{4})-(?<m>\d{2})/, "$<m>/$<y>"). In Python: re.sub(r"(\d{4})-(\d{2})", r"\2/\1", text). The replacement string uses backreferences to rearrange captured text.
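Beyond replacement strings, replace() also accepts a function, which helps when the captured text needs processing rather than just reordering. The JavaScript sketch below shows all three forms; the output format in the last call is an arbitrary choice for illustration.

```javascript
const isoDate = /(\d{4})-(\d{2})-(\d{2})/;

// Numbered references in a replacement string.
console.log("2026-04-14".replace(isoDate, "$2/$3/$1")); // "04/14/2026"

// Named references.
const yearMonth = /(?<y>\d{4})-(?<m>\d{2})/;
console.log("2026-04".replace(yearMonth, "$<m>/$<y>")); // "04/2026"

// Replacer function: receives the full match, then each captured group.
const european = "2026-04-14".replace(
  isoDate,
  (_match, y, m, d) => `${Number(d)}.${Number(m)}.${y}`
);
console.log(european); // "14.4.2026"
```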
What are possessive quantifiers and when should I use them?
Possessive quantifiers (*+, ++, ?+) work like greedy quantifiers but never backtrack. Once they consume characters, they do not give them back. This makes them faster and prevents catastrophic backtracking, but they may fail to match where a greedy quantifier would succeed through backtracking. Use them when you know backtracking is not needed: [^"]*+ inside a quoted string match, or \d++ for a numeric field.
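JavaScript has no possessive quantifier or atomic group syntax. A commonly used emulation relies on the fact that a lookahead is never re-entered once it succeeds: capture inside the lookahead, then consume the capture with a backreference. The sketch below is that workaround, not a native feature, and the patterns are illustrative.

```javascript
// Greedy: \d+ gives back one digit so the final \d can match.
console.log(/^\d+\d$/.test("123")); // true

// Emulated possessive \d++ : the lookahead captures "123", \1 consumes it,
// and the engine cannot shrink the capture, so the final \d has nothing left.
console.log(/^(?=(\d+))\1\d$/.test("123")); // false

// Where no backtracking is needed, the emulated form matches as usual.
console.log(/^"(?=([^"]*))\1"$/.test('"abc"')); // true
```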
How do I use regex in find-and-replace in VS Code?
Enable regex mode by clicking the .* icon in the search bar (or pressing Alt+R). Use $1, $2 for captured groups in the replacement field. Example: find "(\w+):\s*(\w+)" and replace with "$2: $1" to swap key-value pairs. VS Code uses the JavaScript regex engine, so named groups (?<name>...) and $<name> replacements work. Use \n for newline in the replacement.
Try It Yourself
Use the embedded regex tester below to experiment with patterns from this guide. Paste any pattern from the library above and test it against your own input data in real time.
Regex Tester
Enter a regex pattern and test string to see matches highlighted in real time. Try patterns from the 100+ library above or write your own.
Regex Builder
Build regex patterns visually by selecting character classes, quantifiers, and groups. See the generated pattern and test it simultaneously.
Raw Data Downloads
All datasets used in this report are available for download. Use them for your own reference, teaching, or integration.
