Problem Walkthrough

Top K Frequent Elements LeetCode 347

Solve LeetCode 347 Top K Frequent Elements using three approaches: a min-heap for O(n log k) time, bucket sort for O(n) guaranteed time, and quickselect on frequency values for O(n) average time — understand when each approach is preferable and how to implement all three cleanly in Python and Java.

9 min read|

Top K Frequent Elements

LeetCode 347

Problem Overview — Return the K Most Frequent Elements

LeetCode 347 gives you an integer array nums and an integer k, and asks you to return the k most frequent elements. The answer is guaranteed to be unique — no ties at the k-th boundary — and can be returned in any order. The elements in the output array are the actual values from nums, not their frequencies. For example, given nums = [1, 1, 1, 2, 2, 3] and k = 2, the answer is [1, 2] because 1 appears 3 times and 2 appears 2 times.

The naive approach is to count all frequencies and sort by frequency in descending order, then return the first k elements. Sorting takes O(n log n) time which the problem expects you to beat. LeetCode 347 explicitly states the answer must be computed in better than O(n log n) time, pushing you toward selection algorithms or bucket sort that exploit the bounded range of frequency values.

All three optimal approaches start with the same first step: count frequencies using a hashmap in O(n) time and O(n) space. The key difference lies in how each approach selects the top k from those frequency counts. Understanding the trade-offs between heap, bucket sort, and quickselect is the main learning objective of this problem.

  • Given integer array nums and integer k, return k most frequent elements
  • Answer is guaranteed unique — no ties at the k-th position
  • Output can be in any order — only the elements matter, not their rank
  • Array length 1 to 100,000; values -10,000 to 10,000
  • k is always valid: 1 <= k <= number of unique elements
  • Must be better than O(n log n) time — ruling out a simple sort

Frequency Counting First — All Approaches Start With a Hashmap

Every efficient solution to LeetCode 347 begins identically: build a frequency map by iterating through nums once and incrementing the count for each value. This step is O(n) time and O(n) space — unavoidable since you must examine every element to know its frequency. In Python, collections.Counter(nums) handles this in one line. In Java, you iterate through the array and call map.put(n, map.getOrDefault(n, 0) + 1).

After the frequency map is built, the remaining problem is purely a selection problem: given a collection of (value, frequency) pairs, find the k pairs with the highest frequency values. This is structurally identical to finding the k largest values in a list — the only difference is that the list contains frequency counts rather than the original array values. This reframing connects LeetCode 347 directly to LeetCode 215 Kth Largest Element.

The three approaches differ only in how they solve this selection sub-problem. Min-heap maintains a heap of size k, scanning all frequency entries once for O(n log k). Bucket sort exploits the fact that frequencies are bounded between 1 and n, creating a direct-address array that enables O(n) selection. Quickselect partitions frequency entries around a pivot to select the k largest in O(n) average time without sorting.

💡

The Real Problem Is Selection After Counting

All three approaches count frequencies in O(n) using a hashmap — that part is fixed. The real algorithmic question is how to select the top k from those frequency entries. Heap gives O(n log k), bucket sort gives O(n) guaranteed because frequencies are bounded by n, and quickselect gives O(n) average. For interviews, bucket sort is the most impressive answer because it achieves true linear time.

Min-Heap Approach — O(n log k) Time With Bounded Memory

The min-heap approach processes all (frequency, value) pairs from the frequency map and maintains a min-heap of exactly size k. For each new entry, push it onto the heap. If the heap grows beyond size k, pop the minimum. After processing all entries, the heap contains exactly the k elements with the highest frequencies. Since the heap never exceeds size k, each push and pop costs O(log k), and iterating over all unique elements (at most n) gives O(n log k) total.

The critical detail is using a min-heap — not a max-heap. A min-heap keeps the smallest frequency at the top, which lets you efficiently discard low-frequency elements as you process new ones. With a max-heap you would need to push everything first and then pop k times, giving O(n log n) — no better than sorting. The min-heap of size k is a canonical pattern for "top k" problems: it limits memory to O(k) for the heap while achieving O(n log k) selection.

In Python, the heapq module provides a min-heap by default. Push tuples of (frequency, value) so the heap orders by frequency first. In Java, use a PriorityQueue with a lambda comparator (a, b) -> map.get(a) - map.get(b) that orders by frequency ascending. After processing all map entries, drain the heap into your result array. The final result is in frequency-ascending order inside the heap but the problem allows any order.

  1. 1Build frequency map: count occurrences of each value in nums — O(n)
  2. 2Initialize empty min-heap
  3. 3For each (value, freq) pair in the frequency map:
  4. 4 Push (freq, value) onto the heap
  5. 5 If heap size > k, pop the minimum (lowest frequency element)
  6. 6After processing all pairs, heap contains exactly k elements
  7. 7Extract all values from the heap — these are the top k frequent elements
  8. 8Time: O(n log k) — Space: O(n) for the frequency map, O(k) for the heap

Bucket Sort Approach — O(n) Guaranteed With Frequency Indexing

The bucket sort approach achieves O(n) guaranteed time by exploiting a key observation: the maximum frequency of any element in an array of length n is at most n. This means all frequency values lie in the range [1, n], so you can create a direct-address array (the "bucket array") of size n+1 where bucket[i] is a list of all values that appear exactly i times. Building this bucket array requires one pass over the frequency map — O(n) time and O(n) space.

After building the bucket array, iterate from index n down to index 1, collecting elements from each bucket until you have accumulated k elements. Because you start from the highest possible frequency and work downward, the first k elements you collect are guaranteed to be the k most frequent. This collection step is also O(n) in the worst case because the bucket array has n+1 slots and each element appears in exactly one bucket.

The total time complexity is O(n) for the initial frequency counting, O(n) for building the bucket array, and O(n) for the final collection — O(n) end-to-end. This is optimal since you must read all n input elements. The space complexity is O(n) for the frequency map plus O(n) for the bucket array. Bucket sort is often the "best answer" expected in LeetCode 347 interviews because it achieves true linear time with a clean and intuitive implementation.

ℹ️

Bucket Sort Is O(n) Guaranteed Because Max Frequency Is Bounded by n

The reason bucket sort achieves O(n) — and not just O(n) average — is that the frequency values are integers bounded between 1 and n. This bounded integer range is the prerequisite for direct-address (counting/bucket) sort to achieve linear time. You create exactly n+1 buckets (indices 0 through n), each element lands in exactly one bucket, and you collect from highest index downward. No comparison-based sorting is needed.

Quickselect on Frequencies — O(n) Average Time

Quickselect adapts the partition step from quicksort to select the k-th element without fully sorting the array. For LeetCode 347, you run quickselect on the list of unique elements ordered by their frequency values, selecting the k elements with the highest frequencies. At each step, choose a pivot element and partition the list so that all elements with frequency greater than the pivot are on the right and all with lower frequency are on the left.

After partitioning, if the pivot ends up at position len-k (where len is the total number of unique elements), then all elements to the right of the pivot plus the pivot itself are the k most frequent — return them. If the pivot position is less than len-k, recurse on the right partition. If greater, recurse on the left. Each partition step is O(m) where m is the current subarray size, and on average the subarrays halve in size each step, giving O(n) total with O(log n) expected recursion depth.

The worst-case for quickselect is O(n^2) if the pivot is always the minimum or maximum, but randomizing the pivot choice makes this astronomically unlikely. In practice, quickselect runs in O(n) average time and is the algorithm behind Python's statistics.median and numpy's partition. For LeetCode 347, quickselect on frequency values is structurally identical to LeetCode 215 Kth Largest Element — the same code pattern applies, just operating on frequency counts rather than raw array values.

  1. 1Build frequency map and collect unique elements into an array — O(n)
  2. 2Define target position: pivot should land at index (len(unique) - k)
  3. 3Randomly choose a pivot element from the unique elements array
  4. 4Partition: elements with freq > pivot.freq go right, lower go left
  5. 5If pivot index == target: return all elements from target to end
  6. 6If pivot index < target: recurse on right partition
  7. 7If pivot index > target: recurse on left partition
  8. 8Average O(n) time; worst case O(n^2) avoided with randomized pivot

Code Walkthrough — Python and Java With Bucket Sort

Python bucket sort solution: from collections import Counter. def topKFrequent(nums, k): count = Counter(nums); buckets = [[] for _ in range(len(nums) + 1)]; for val, freq in count.items(): buckets[freq].append(val); result = []; for i in range(len(buckets) - 1, 0, -1): result.extend(buckets[i]); if len(result) >= k: return result[:k]. Note: Python's Counter.most_common(k) is a built-in shortcut that returns the k most frequent elements — it uses a heap internally and runs in O(n log k), not O(n). For the true O(n) bucket sort, implement it manually as shown above.

Java bucket sort solution: public int[] topKFrequent(int[] nums, int k) { Map<Integer, Integer> count = new HashMap<>(); for (int n : nums) count.put(n, count.getOrDefault(n, 0) + 1); List<Integer>[] buckets = new List[nums.length + 1]; for (int i = 0; i <= nums.length; i++) buckets[i] = new ArrayList<>(); for (Map.Entry<Integer, Integer> e : count.entrySet()) buckets[e.getValue()].add(e.getKey()); int[] result = new int[k]; int idx = 0; for (int i = buckets.length - 1; i > 0 && idx < k; i--) for (int val : buckets[i]) { result[idx++] = val; if (idx == k) break; } return result; }

Both solutions follow the same three-phase structure: build Counter in O(n), populate bucket array in O(n), collect from highest bucket downward in O(n). The Java solution uses a raw array of Lists to avoid generic array creation warnings, or you can use ArrayList<List<Integer>>. The bucket array index represents frequency, and each bucket contains all values at that frequency. Iterating from index n downward guarantees you collect highest-frequency elements first.

⚠️

Bucket Sort Uses O(n) Extra Space — Same Asymptotic Cost as the Hashmap

Bucket sort requires O(n) space for the bucket array (size n+1) plus O(n) for the frequency hashmap — O(n) total. This is the same asymptotic space as any other approach since all approaches need the hashmap. The bucket array does not add an extra asymptotic cost. However, if memory is extremely constrained in practice (e.g., n = 100,000), the bucket array allocates n+1 list objects even if most are empty. In those cases a heap approach with O(k) additional space is more memory-efficient.

Ready to master algorithm patterns?

YeetCode flashcards help you build pattern recognition through active recall and spaced repetition.

Start practicing now