Problem Walkthrough

Top K Frequent Words LeetCode 692

Count word frequencies with a hash map, then extract the top k most frequent words using a min-heap with a custom comparator that breaks ties lexicographically.

8 min read

Problem Overview

LeetCode 692 — Top K Frequent Words — gives you an array of strings words and an integer k. Return the k most frequent strings sorted by frequency from highest to lowest. If two words have the same frequency, the lexicographically smaller word comes first.

The problem combines frequency counting with sorting under a custom ordering. The frequency part is straightforward with a hash map. The challenge lies in efficiently extracting the top k elements while respecting the tie-breaking rule.

This is a classic heap problem that tests your ability to design custom comparators. The min-heap approach processes all words in O(n log k) time, which beats a full sort when k is much smaller than the number of unique words. An alternative bucket sort approach achieves O(n) for the grouping step.

  • Input: array of strings words[], integer k
  • Count the frequency of each unique word
  • Return the k most frequent words
  • Sort by frequency descending, then lexicographic ascending for ties
  • Guaranteed: k is always valid (1 <= k <= number of unique words)
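To make the ordering rules concrete, here is a small worked example, sketched with a plain sort rather than a heap (the variable names are illustrative):

```python
from collections import Counter

words = ["i", "love", "leetcode", "i", "love", "coding"]
counts = Counter(words)  # {'i': 2, 'love': 2, 'leetcode': 1, 'coding': 1}

# Sort key: frequency descending (negated), then word ascending for ties.
top2 = sorted(counts, key=lambda w: (-counts[w], w))[:2]
print(top2)  # ['i', 'love'] — "i" beats "love" on the lexicographic tie-break
```

Both "i" and "love" appear twice, so the tie-breaking rule puts "i" first.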

Step 1: Frequency Counting with Hash Map

The first step is always the same: iterate through the words array and build a frequency map. In Python, use collections.Counter for a one-liner. In Java, use a HashMap<String, Integer> and getOrDefault to increment counts.

This step runs in O(n) time where n is the total number of words. The space used is O(m) where m is the number of unique words. After this step, you have a complete picture of how often each word appears.

The frequency map is the foundation for both the heap and bucket sort approaches. Regardless of which extraction method you choose, you always start here. The map transforms the raw input into a structured representation that supports efficient top-k queries.

💡

Counter Shortcut

In Python, Counter(words) builds the frequency map in one line. In Java, words.stream().collect(Collectors.groupingBy(w -> w, Collectors.counting())) does the same but the HashMap loop is usually cleaner in interviews.
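The counting step might look like this in Python; a minimal sketch showing both the Counter one-liner and the equivalent explicit loop (the sample words list is illustrative):

```python
from collections import Counter

words = ["i", "love", "leetcode", "i", "love", "coding"]

# One-liner with Counter...
counts = Counter(words)

# ...equivalent to the explicit hash-map loop:
freq = {}
for w in words:
    freq[w] = freq.get(w, 0) + 1

print(counts["love"])  # 2
```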

Step 2a: Min-Heap of Size K

The heap approach maintains a min-heap of size k. The comparator is crucial: when comparing two entries, the one with lower frequency should be "greater" (closer to the top of the min-heap so it gets evicted first). For equal frequencies, the lexicographically larger word should be closer to the top.

This inverted comparator ensures that the least desirable element sits at the heap root. As you iterate through the frequency map, push each entry onto the heap. When the heap size exceeds k, pop the root — this removes the weakest candidate, keeping only the top k.

After processing all entries, the heap contains exactly k elements. Pop them all into a result list and reverse it. Since the min-heap pops the weakest first, reversing produces the correct descending-frequency, ascending-lexicographic order.
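Since Python's heapq has no comparator parameter, one way to sketch the inverted ordering is a wrapper class whose __lt__ puts the least desirable entry at the root (Entry and top_k_min_heap are illustrative names):

```python
import heapq
from collections import Counter

class Entry:
    """Ordering puts the *least* desirable word at the min-heap root:
    lower frequency first; for ties, the lexicographically larger word."""
    def __init__(self, word, freq):
        self.word, self.freq = word, freq
    def __lt__(self, other):
        if self.freq != other.freq:
            return self.freq < other.freq  # lower frequency gets evicted first
        return self.word > other.word      # larger word gets evicted first

def top_k_min_heap(words, k):
    counts = Counter(words)
    heap = []
    for word, freq in counts.items():
        heapq.heappush(heap, Entry(word, freq))
        if len(heap) > k:
            heapq.heappop(heap)  # drop the weakest candidate
    # Pop weakest-first, then reverse into the required order.
    out = []
    while heap:
        out.append(heapq.heappop(heap).word)
    return out[::-1]

print(top_k_min_heap(["i", "love", "leetcode", "i", "love", "coding"], 2))
# ['i', 'love']
```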

Step 2b: Bucket Sort Alternative

The bucket sort approach groups words by frequency. Create an array of lists where index i holds all words with frequency i. The maximum possible frequency is n (the total word count), so the array has size n+1.

After grouping, iterate from the highest frequency bucket down to 1. Within each bucket, sort the words lexicographically. Collect words into the result until you have k. This approach avoids the heap entirely and runs in O(n + m log m) time where m is unique words.

Bucket sort is particularly elegant when the frequency range is bounded. It trades space (O(n) for the bucket array) for simplicity. In practice, the constant factors are small, and the code is often easier to write correctly under interview pressure than the heap with its tricky comparator.
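A sketch of the bucket approach in Python (the function name is illustrative):

```python
from collections import Counter

def top_k_bucket(words, k):
    counts = Counter(words)
    n = len(words)
    # buckets[f] holds every word that appears exactly f times.
    buckets = [[] for _ in range(n + 1)]
    for word, freq in counts.items():
        buckets[freq].append(word)
    result = []
    for freq in range(n, 0, -1):            # highest frequency first
        for word in sorted(buckets[freq]):  # lexicographic order within a bucket
            result.append(word)
            if len(result) == k:
                return result
    return result

print(top_k_bucket(["i", "love", "leetcode", "i", "love", "coding"], 2))
# ['i', 'love']
```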

ℹ️

When to Choose Which

Use the min-heap when k is much smaller than the number of unique words — it gives O(n log k). Use bucket sort when you want simpler code or when k is close to the total unique count, making the heap advantage negligible.

Complete Implementation

In Python, the heap solution uses heapq with either a custom wrapper class or tuple comparison. The clean approach: push (-freq, word) tuples into the heap; negating the frequency simulates a max-heap on count. Then pop k times. The negated frequency handles the primary sort, and Python's native string comparison handles the lexicographic tie-breaking correctly.

In Java, use a PriorityQueue with a comparator: (a, b) -> a.freq == b.freq ? b.word.compareTo(a.word) : Integer.compare(a.freq, b.freq). This puts lower-frequency and lexicographically larger words at the top for eviction. After filling the heap, pop all k elements and reverse.

The Python negative-frequency trick is a common interview pattern worth memorizing. It avoids writing a custom comparator class and leverages Python's default tuple comparison where the first element (negated frequency) dominates and the second (word string) breaks ties naturally.
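The negative-frequency version might look like this, a minimal sketch (the function name is illustrative):

```python
import heapq
from collections import Counter

def top_k_frequent(words, k):
    counts = Counter(words)
    # Negating the frequency makes the min-heap behave like a max-heap on count,
    # while the word itself breaks ties lexicographically (ascending).
    heap = [(-freq, word) for word, freq in counts.items()]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[1] for _ in range(k)]

print(top_k_frequent(["i", "love", "leetcode", "i", "love", "coding"], 2))
# ['i', 'love']
```

Note that this variant heapifies all m unique entries at once, giving O(m + k log m) extraction rather than the O(m log k) size-k heap; both pass on LeetCode.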

Complexity Analysis

The min-heap approach runs in O(n log k) time: O(n) for counting, O(m log k) for heap operations where m is unique words. Space is O(m) for the frequency map plus O(k) for the heap. When k is constant, this is effectively O(n).

The bucket sort approach runs in O(n + m log m) time in the worst case due to sorting within buckets. If bucket sizes are small (common in practice), the sorting cost is negligible. Space is O(n) for the bucket array plus O(m) for the frequency map.

Both approaches are accepted on LeetCode. The heap is the expected interview answer and demonstrates understanding of priority queue mechanics. The bucket sort is a strong follow-up that shows breadth of algorithmic thinking.

YeetCode Flashcard Tip

The min-heap with custom comparator pattern appears in Top K Frequent Elements, K Closest Points, and many more. Practice all of them on YeetCode to build automatic recall of the heap setup.

Common Mistakes and Edge Cases

The most common mistake is getting the comparator direction wrong. Remember: in a min-heap for top-k, you want to evict the least desirable element. For this problem, that means lower frequency first, and for equal frequency, the lexicographically larger word first (since you want to keep the smaller one).

Another pitfall is forgetting to reverse the result after popping from the min-heap. The heap pops elements in ascending order of desirability, but the problem wants descending frequency order. Always reverse at the end, or use a max-heap from the start.

Edge cases to consider: all words have the same frequency (pure lexicographic sort), k equals the number of unique words (return everything sorted), and single-character words where lexicographic comparison is trivial. Test these cases to validate your comparator logic.
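These edge cases are cheap to check against a simple sort-based reference solution (a sketch; top_k is an illustrative name):

```python
from collections import Counter

def top_k(words, k):
    counts = Counter(words)
    return sorted(counts, key=lambda w: (-counts[w], w))[:k]

# All words share the same frequency: pure lexicographic sort.
assert top_k(["b", "a", "c"], 2) == ["a", "b"]

# k equals the number of unique words: return everything, sorted.
assert top_k(["x", "y", "x"], 2) == ["x", "y"]
```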

Ready to master algorithm patterns?

YeetCode flashcards help you build pattern recognition through active recall and spaced repetition.

Start practicing now