Regular expression Denial of Service ('ReDoS')

ID

java.redos

Severity

critical

Resource

Resource Management

Language

Java

Tags

CAPEC:492, CWE:1333, NIST.SP.800-53, PCI-DSS:6.5.6

Description

Potential denial-of-service attack through malicious regular expression ('ReDoS').

Most regular expression engines use non-deterministic Finite Automaton (NFA) with backtracking, to try all possible execution paths of the regular expression when evaluating an input. In some cases this can lead to a situation called catastrophic backtracking. In the worst case, the complexity of the regular expression is exponential in the size of the input, meaning that a small crafted input (such as 20 characters) can trigger catastrophic backtracking and cause a denial of service to the application.

While these regular expression engines can quickly confirm a positive match, confirming a negative match can take much more longer.

Rationale

This detector analyzes the runtime complexity of a regular expression, checking if catastrophic backtracking is possible.

Obviously, an attacker with total or partial control over a regular expression may carry out a denial-of-service (DoS) attack, by adding vulnerable patterns and providing the crafted input to be matched by the regular expression. This results in a Denial of Service attack.

But even if the attacker does not have control over the regular expression, vulnerable regular expressions can also be exploited using malicious input to launch a catastrophic backtracking attack.

In Java, regular expressions are based on the Pattern and Matcher classes, and the underlying engine uses backtracking and many other features (possessive quantifiers, atomic groups, named capturing groups).

Consider the following example of a regex vulnerable to catastrophic backtracking:

// nested quantifiers is a recipe to disaster
String regex = "(a+)+d";
String input = "aaaaaaaaaaaa!";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(input);
if (matcher.matches()) {
    System.out.println("Matched!");
}
java

In this example, the regex (a+)+d is vulnerable to ReDoS because it employs nested repetition. With crafted inputs like "aaaaaaaaaaaa!", the pattern can cause the regex engine to backtrack more than necessary, leading to performance degradation and potential service disruption.

The problematic pattern occurs when a regex uses nested quantifiers (e.g., (a+)\+), alternation (e.g., (a|aa)+), or excessive capturing groups in unreliable contexts. Attackers can exploit these features, leading to significant delays in processing or even causing the application to hang or crash under heavy load.

Remediation

The following ideas can be used when facing a ReDoS vulnerability, to reduce the risk of a DoS attack exploiting catastrophic backtracking:

  • Do not make the regular expression pattern dependent on untrusted input. If an attacker can choose the regexp at will, he can probably launch a DoS attack. Use quoting for including user-controlled input if needed.

  • Even with literal regexp patterns, an attacker may control the string to match, and care should be taken with nested quantifiers such as \* and +. Follow the hints in "catastrophic backtracking" about how to rewrite a vulnerable regexp pattern to make it safe.

    • Avoid, if possible, nested quantifiers (repeated or alternated tokens inside a group that is itself repeated/alternated), like (T1+T2+)+T3. It is better to split into groups and then process each group in a second pass, than to use overly complex regular expressions with nested quantifiers.

    • When available, possessive quantifiers (*+, ++ and {n, m}+) significantly improve performance when the regex match fails, and no backtracking is needed.

    • The (?>subexp) element (atomic grouping) disables backtracking.

    • If possible, define a maximum number of expected repetitions using the bounded quantifiers {n, m}, like {1,3} instead of +.

    • Use negated character classes excluding separators, instead of .* to match tokens with separators. For example, the pattern .*_.* can be improved by changing it to [^_]*_.*

  • When creating a regex follow the advice: "if it’s going to fail, make it fail as quickly as possible". You can use features like atomic groups and possessive quantifiers for that, but if you get the structure of the regex right, you rarely need them. Backtracking is not inevitable.

Configuration

The detector has the following configurable parameters:

  • sources, that indicates the source kinds to check.

  • neutralizations, that indicates the neutralization kinds to check.

Unless you need to change the default behavior, you typically do not need to configure this detector.

References