Problem Overview: Regex Matching with . and *
LeetCode 10 Regular Expression Matching asks you to implement regex matching with two special characters: "." which matches any single character, and "*" which matches zero or more of the preceding element. Given an input string s and a pattern p, return true if p matches s entirely — not a partial match.
For example, s = "aa", p = "a*" returns true because "a*" means zero or more "a" characters. But s = "aa", p = "." returns false because "." matches only a single character, not two. The constraint that "*" always refers to the immediately preceding element — and can match zero occurrences — is what makes this problem hard.
This is harder than glob matching (LeetCode 44) because the "*" in glob applies to any character by itself, while here it is always a two-character construct like "a*" or ".*". The problem has an O(m*n) DP solution where m and n are the lengths of s and p respectively.
- Input: string s (1–20 lowercase letters), pattern p (1–30 lowercase letters, ".", "*")
- Output: true if p matches s in its entirety
- "." matches exactly one arbitrary character
- "*" matches zero or more of the preceding character or "."
- "*" always follows another character — it is never the first character of p
Understanding the . and * Operators
The "." operator is simple: it matches exactly one character, any character. So "a.c" matches "abc", "axc", "a1c", and so on. In the DP table, when the current pattern character is ".", it behaves identically to a character match — it consumes one character from s and one from p.
The "*" operator is more subtle. It means "zero or more of the preceding element." The preceding element can be a literal character like "a" or a dot ".". So "a*" can match "", "a", "aa", "aaa", and so on. Meanwhile ".*" matches any string at all, including the empty string.
Crucially, "*" never appears alone — it is always part of a two-character unit "X*" where X is the character before it. When processing the pattern from left to right, whenever you see a "*" at position j, you know that positions j-1 and j form a unit. This pairing structure is the key to writing the recurrence correctly.
Mental Model
Always think of "*" as part of a two-character group "X*". The group can match zero occurrences (skip both X and *) or one-or-more occurrences (consume one X from s and keep the group active). This two-case split drives the entire DP recurrence.
Recursive Approach and Why Memoization Is Needed
Before building the DP table, it helps to think recursively. Define match(i, j) as true if s[i:] matches p[j:]. The base case is match(len(s), len(p)) = true — both strings are fully consumed. If j == len(p) but i < len(s), return false (pattern exhausted but string is not).
For the recursive step, first check if the current characters match: first_match = (i < len(s)) and (p[j] == s[i] or p[j] == ".")). Then handle the "*" case: if j+1 < len(p) and p[j+1] == "*", you can either skip the "X*" unit (match(i, j+2)) or, if first_match, consume one character from s (match(i+1, j)). If no "*" follows, simply advance both pointers when first_match is true.
The naive recursive approach has exponential worst-case complexity because the same (i, j) pairs get recomputed many times — especially in patterns with multiple "*" operators. Memoization reduces this to O(m*n) time and space. The iterative bottom-up DP builds the same logic from smaller subproblems to larger ones, avoiding recursion overhead.
- 1Define match(i, j): does s[i:] match p[j:]?
- 2Base case: both exhausted → true; pattern exhausted but string not → false
- 3Check first_match: i < len(s) and (s[i] == p[j] or p[j] == ".")
- 4If p[j+1] == "*": return match(i, j+2) or (first_match and match(i+1, j))
- 5Else: return first_match and match(i+1, j+1)
- 6Memoize on (i, j) to avoid recomputation
2D DP Table Setup and Initialization
For the bottom-up approach, define dp[i][j] as true if s[0..i-1] (the first i characters of s) matches p[0..j-1] (the first j characters of p). The table has dimensions (m+1) x (n+1) where m = len(s) and n = len(p), to account for the empty-string base cases.
The initialization is: dp[0][0] = true (empty string matches empty pattern). dp[i][0] = false for i > 0 (non-empty string cannot match empty pattern). For dp[0][j], the empty string can match a pattern like "a*b*c*" — any sequence of "X*" groups. Specifically, dp[0][j] = true when p[j-1] == "*" and dp[0][j-2] is true. This initialization correctly handles patterns that can match the empty string.
The rest of the table is filled in order of increasing i and j. Each cell dp[i][j] depends only on cells with smaller indices, so the row-by-row, column-by-column traversal order is correct. The final answer is dp[m][n].
Initialization Detail
The first row dp[0][j] (empty string vs pattern) is non-trivial. A pattern like "a*b*" can match the empty string because each "X*" group can match zero times. Make sure to initialize dp[0][j] = (p[j-1] == "*" and dp[0][j-2]) for j >= 2.
State Transitions: Walking Through the Recurrence
For each cell dp[i][j] with i > 0 and j > 0, there are two main cases. Case 1: p[j-1] is a literal character or ".". If p[j-1] == s[i-1] or p[j-1] == ".", then dp[i][j] = dp[i-1][j-1]. Otherwise dp[i][j] = false. This is the straightforward single-character match.
Case 2: p[j-1] == "*". The "*" is part of a "X*" group where X = p[j-2]. There are two sub-cases: (a) Zero occurrences — the "X*" group matches nothing, so dp[i][j] = dp[i][j-2]. (b) One or more occurrences — this requires that the current string character matches X (s[i-1] == p[j-2] or p[j-2] == "."), and dp[i-1][j] is true (matching one fewer character with the same pattern, keeping the "*" group active). Combining: dp[i][j] = dp[i][j-2] or ((s[i-1] == p[j-2] or p[j-2] == ".") and dp[i-1][j]).
The key to understanding sub-case (b) is that dp[i-1][j] asks: did s[0..i-2] match p[0..j-1]? If yes, and if s[i-1] also matches p[j-2] (the X in "X*"), then the "*" group can be extended by one more character to cover s[0..i-1]. This is why the pattern index j stays the same — we leave the "X*" group open to potentially match more characters.
Code Implementation — Python and Java
In Python: initialize dp as a (m+1) x (n+1) list of False. Set dp[0][0] = True. For j from 2 to n+1, set dp[0][j] = (p[j-1] == "*" and dp[0][j-2]). Then for i from 1 to m+1 and j from 1 to n+1: if p[j-1] == "*", set dp[i][j] = dp[i][j-2] or ((p[j-2] in (s[i-1], ".")) and dp[i-1][j]). Else, set dp[i][j] = dp[i-1][j-1] and (s[i-1] == p[j-1] or p[j-1] == "."). Return dp[m][n].
In Java: allocate boolean[][] dp = new boolean[m+1][n+1]. Set dp[0][0] = true. Initialize the first row similarly. Fill the rest with the same two-case recurrence using nested for loops. The character comparison uses s.charAt(i-1) and p.charAt(j-1). Return dp[m][n].
Space optimization is possible: since dp[i][j] depends on dp[i-1][j], dp[i][j-1], and dp[i][j-2], you cannot trivially reduce to a single 1D array. A rolling 2-row approach works but complicates index management. For interview purposes, the O(m*n) space 2D table is preferred for clarity.
Common Pitfall
When p[j-1] == "*", check p[j-2] (the character before "*") against s[i-1], not p[j-1]. Using the wrong index here is a very common bug. Always double-check which index refers to the pattern character that "*" is quantifying.