Problem Walkthrough

Find Median Data Stream LeetCode 295

The two-heap approach that gives you O(log n) insertion and O(1) median finding — a must-know pattern for streaming algorithm interviews.

9 min read|

Find Median Stream

LeetCode 295

Problem Overview — Find Median from Data Stream

LeetCode 295 asks you to design a data structure called MedianFinder that supports two operations: addNum(int num) which adds a number from the data stream to the structure, and findMedian() which returns the median of all numbers added so far. Numbers arrive one at a time — you never see the full dataset upfront.

The median is defined as the middle value in a sorted list. If the total count is odd, it is the single middle element. If even, it is the average of the two middle elements. For example, after adding [1, 2, 3] the median is 2; after adding [1, 2] the median is 1.5.

Key constraints to keep in mind: up to 50,000 calls to addNum and findMedian combined, values in range [-100,000, 100,000], and at least one element before findMedian is called. The follow-up challenge asks: can you optimize if all integers are in [0, 100], or if 99% fall in [0, 100]?

  • addNum(int num) — add a number from an incoming stream
  • findMedian() — return the median of all numbers seen so far
  • Up to 50,000 combined operations — naive O(n) per call is too slow
  • Follow-up: optimize for integers in [0,100] range

Naive Approaches and Why They Fail

The most intuitive approach is to maintain a sorted array. On each addNum call, binary search to find the insertion point and insert. findMedian then reads the middle element directly. Binary search itself is O(log n), but insertion into a sorted array requires shifting elements — O(n) per addNum call.

A second naive approach defers sorting: just append to an unsorted array on addNum, then fully sort on findMedian. This makes addNum O(1) but findMedian O(n log n). If findMedian is called after every addNum, the total cost is still O(n^2 log n) across all operations.

Both approaches are too slow when n reaches 50,000 operations. What you need is a data structure that maintains order incrementally without paying the full sort cost each time.

💡

Key Insight

Two heaps give you O(log n) per addNum and O(1) findMedian — a dramatic improvement over re-sorting. The trick is maintaining a max-heap for the lower half and a min-heap for the upper half.

Two-Heap Intuition — Lower and Upper Halves

The core idea is to split your numbers into two halves at all times. A max-heap holds the smaller half of the numbers, and a min-heap holds the larger half. The top of the max-heap is the largest number in the lower half; the top of the min-heap is the smallest number in the upper half.

To find the median: if the total count is odd, one heap has one extra element and its top is the median. If even, the median is the average of both tops. You never need to sort anything — the answer sits right at the heap tops.

The constraint is that the heaps must stay balanced: their sizes differ by at most one. Specifically, we maintain the invariant that the max-heap size is always greater than or equal to the min-heap size, and at most one larger. This lets us pick the right heap to read from for the median.

  1. 1Max-heap (lower half) holds the smaller numbers — top is the largest small number
  2. 2Min-heap (upper half) holds the larger numbers — top is the smallest large number
  3. 3Median is top of max-heap when count is odd, or average of both tops when even
  4. 4Heaps stay balanced: sizes differ by at most 1, max-heap size >= min-heap size

Balancing the Heaps — The Add-Then-Rebalance Pattern

The balancing algorithm has two steps. First, always push the new number onto the max-heap, then immediately pop the max-heap top and push it to the min-heap. This ensures every number passes through the max-heap and any number larger than the current lower half migrates to the upper half.

Second, if the min-heap grows larger than the max-heap after the first step, pop the min-heap top and push it back to the max-heap. This restores the size invariant. After these at-most-two operations, the heaps are balanced again.

Why always push to max-heap first? It routes the new number through a comparison with the current median boundary. If the new number is smaller than the current median, it stays in the max-heap. If larger, it ends up in the min-heap. The rebalance step then corrects the size difference if needed.

ℹ️

Implementation Note

The add-then-rebalance pattern guarantees the heap invariant in just 2 operations: push new element to max-heap, move max-heap top to min-heap, then optionally move min-heap top back. You never need more than 2 pushes and 2 pops per addNum call.

Follow-Up Optimizations for Constrained Inputs

If all integers are guaranteed to be in the range [0, 100], you can use a bucket counting array of size 101. Increment the bucket on addNum in O(1). For findMedian, scan through the buckets to find the middle count — at most 101 iterations, effectively O(1).

If 99% of integers fall in [0, 100] but outliers exist, use a hybrid: bucket counting for in-range values plus two small heaps for out-of-range outliers. Track the total count carefully so findMedian knows whether the median falls in the bucket range or in the outlier heaps.

These optimizations illustrate a broader principle: when input distribution is constrained, you can often beat the general-case complexity. The interviewer asking this follow-up is testing whether you can reason about tradeoffs between generality and performance.

  1. 1All integers in [0,100]: bucket array of size 101, O(1) add, O(100) find median
  2. 2Track running total count and prefix sums over buckets to locate median
  3. 399% in [0,100]: bucket counting for range + two heaps for outliers
  4. 4Combine outlier heaps with bucket median to find true median across all numbers

Code Walkthrough — Python and Java

In Python, heapq only provides a min-heap. To simulate a max-heap for the lower half, negate all values before pushing and negate again when popping. The max-heap stores negated values so the smallest negated value (most negative) corresponds to the largest actual value. The min-heap stores values as-is.

In Java, PriorityQueue defaults to a min-heap. Create the max-heap with Collections.reverseOrder() or a lambda comparator. The addNum logic is identical: offer to maxHeap, then poll from maxHeap and offer to minHeap, then rebalance if needed. findMedian reads maxHeap.peek() when sizes differ, or the average of both peeks.

Time complexity: addNum is O(log n) due to heap push and pop operations. findMedian is O(1) — just reading the top elements. Space complexity is O(n) for n numbers stored across both heaps. This is the optimal complexity class for the general streaming median problem.

⚠️

Python Pitfall

In Python, always negate values in BOTH directions when using heapq as a max-heap: negate before pushing (-val) and negate after popping (-heapq.heappop(heap)). Forgetting to negate in one direction is the most common bug in Python heap implementations.

Ready to master algorithm patterns?

YeetCode flashcards help you build pattern recognition through active recall and spaced repetition.

Start practicing now