Problem Walkthrough

Matching Subsequences LeetCode 792 Guide

Preprocess string s into a character-to-indices map, then for each word use binary search to find the next valid index for each character. Alternatively use the bucket iterator approach for O(s + total word length) time.

8 min read|

Matching Subsequences

LeetCode 792

Problem Overview

LeetCode 792 — Number of Matching Subsequences — gives you a string s and an array of strings words. Return the number of words that are subsequences of s. A subsequence is formed by deleting zero or more characters from s without changing the order of remaining characters.

The naive approach checks each word against s using the standard two-pointer subsequence check in O(|s|) per word, giving O(|s| * |words|) total. When s is long (up to 50,000) and there are many words (up to 5,000), this can be slow. We need to preprocess s for faster lookups.

Two efficient approaches exist: binary search on character indices (preprocess once, then binary search per character per word) and the bucket iterator approach (process s once while advancing all words simultaneously). Both avoid redundant scanning of s.

  • Input: string s, array of strings words
  • Count how many words are subsequences of s
  • s length up to 50,000, words up to 5,000
  • Each word length up to 50
  • Naive O(|s| * |words|) can be optimized

Approach 1: Binary Search on Character Indices

Preprocess s by building a map from each character to the sorted list of its indices in s. For example, if s = "abcde", then map[a] = [0], map[b] = [1], etc. If s = "aabbc", then map[a] = [0,1], map[b] = [2,3], map[c] = [4].

For each word, simulate the subsequence check. Maintain a pointer pos = -1 representing the last matched index in s. For each character c in the word, binary search in map[c] for the smallest index greater than pos. If found, update pos to that index. If not found, the word is not a subsequence.

The binary search uses bisect_right (or upper_bound in C++) on the sorted index list. Each character lookup is O(log k) where k is the number of occurrences of that character in s. Total time is O(|s| + sum of word lengths * log(|s|)).

💡

bisect_right vs bisect_left

Use bisect_right(indices, pos) or bisect_left(indices, pos+1) — both find the first index strictly greater than pos. The result index tells you where to look in the sorted list. If it equals the list length, no valid index exists.

Approach 2: Bucket Iterators

Create 26 buckets, one per lowercase letter. For each word, place an iterator (word_index, char_index) into the bucket corresponding to the word's first character. This groups all words by the next character they need to match.

Scan through s one character at a time. For character s[i], process all iterators in that character's bucket. Each iterator advances to its next character. If the word is fully matched, increment the count. Otherwise, move the iterator to the bucket for the next needed character.

After scanning all of s, the count gives the answer. Each iterator advances exactly once per character it matches, and each character in s processes only the iterators in its bucket. Total work is O(|s| + sum of word lengths).

Step-by-Step Walkthrough

Consider s = "abcde", words = ["a", "bb", "acd", "ace"]. Binary search approach: map = {a:[0], b:[1], c:[2], d:[3], e:[4]}. Word "a": search a for index > -1, found 0. Matched! Word "bb": search b for >-1, found 1. Search b for >1, none found. Not a subsequence.

Word "acd": search a for >-1 → 0. Search c for >0 → 2. Search d for >2 → 3. Matched! Word "ace": search a for >-1 → 0. Search c for >0 → 2. Search e for >2 → 4. Matched! Answer: 3.

Bucket approach: initial buckets: a→["a"@0, "acd"@0, "ace"@0], b→["bb"@0]. Scan s[0]=a: process 3 iterators. "a" fully matched (count=1). "acd" advances to c bucket. "ace" advances to c bucket. Scan s[1]=b: process "bb"@0, advances to b bucket at @1. Scan s[2]=c: process "acd"@1→d bucket, "ace"@1→e bucket. And so on. Final count: 3.

ℹ️

Which Approach to Choose?

Binary search is easier to implement and explain in interviews. Bucket iterators are asymptotically faster (no log factor) but more complex to code. Lead with binary search, mention buckets as an optimization if asked.

Implementation in Python and Java

Python binary search: from bisect import bisect_left. Build char_indices with defaultdict(list). For each word, iterate characters and bisect for next index. If bisect result equals list length, break (not subsequence). Else update pos.

Python bucket approach: use deque of (word, index) pairs in 26 buckets. Iterate through s, processing and redistributing iterators. Count fully matched words. This approach is elegant but requires careful iterator management.

Java binary search: use a List<List<Integer>> of size 26 for character indices. Use Collections.binarySearch or a manual upper-bound search. Java's binarySearch returns -(insertion_point)-1 for missing values, which needs careful handling to extract the next valid index.

Complexity Analysis

Binary search approach: O(|s| + W * log(|s|)) time where W is the total length of all words. Preprocessing s takes O(|s|). Each character in each word requires one binary search in a list of at most |s| elements. Space is O(|s|) for the index map.

Bucket iterator approach: O(|s| + W) time. Scanning s is O(|s|). Each character in each word is processed exactly once when its iterator advances. Space is O(W) for the iterator queues.

Both are significant improvements over the naive O(|s| * |words|) approach. The bucket approach is theoretically optimal but the binary search approach has a smaller constant factor in practice due to simpler data structures.

YeetCode Flashcard Tip

The character-index-map + binary search pattern applies to many subsequence problems. Practice it alongside Is Subsequence and Longest Common Subsequence on YeetCode to master subsequence techniques.

Ready to master algorithm patterns?

YeetCode flashcards help you build pattern recognition through active recall and spaced repetition.

Start practicing now