Problem Walkthrough

Find Median from Data Stream LeetCode 295

Maintain two heaps: a max-heap for the lower half and a min-heap for the upper half. Balance their sizes so the median is always accessible from the heap tops in O(1).

8 min read|

Find Median Data Stream

LeetCode 295

Problem Overview

LeetCode 295 — Find Median from Data Stream — asks you to design a MedianFinder class with two methods: addNum(num) to add a number from the data stream, and findMedian() to return the median of all numbers added so far.

The median of a sorted list is the middle element (odd length) or the average of the two middle elements (even length). A naive approach sorts the list on each findMedian call — O(n log n) per query. We need O(1) or O(log n) per operation.

The two-heap technique is the classic solution: split the data into a lower half (max-heap) and an upper half (min-heap). The median is at the interface between the two heaps, accessible from their tops in O(1). Adding a number rebalances in O(log n).

  • addNum(num) — add a number from the stream
  • findMedian() — return median of all numbers so far
  • Median: middle element (odd) or average of two middles (even)
  • Two heaps: max-heap (lower half) and min-heap (upper half)
  • O(log n) per add, O(1) per findMedian

The Two-Heap Design

Maintain two heaps: lo (max-heap storing the smaller half of numbers) and hi (min-heap storing the larger half). The invariant: every element in lo is <= every element in hi. Additionally, sizes differ by at most 1: len(lo) == len(hi) or len(lo) == len(hi) + 1.

The max-heap lo gives instant access to the largest element in the lower half. The min-heap hi gives instant access to the smallest element in the upper half. Together, these two tops are the one or two middle elements.

findMedian(): if sizes are equal, return (lo.top + hi.top) / 2.0. If lo has one more element, return lo.top. This is O(1) — just reading the heap tops without any restructuring.

💡

Why Max-Heap for Lower Half?

The lower half needs its MAXIMUM accessible (the larger middle element). A max-heap provides this in O(1). The upper half needs its MINIMUM (the smaller middle element). A min-heap provides this. The two extremes meet at the median.

addNum Balancing Logic

When adding num: first, push it to lo (max-heap). Then, pop the max from lo and push it to hi (min-heap). This ensures lo's max doesn't exceed hi's min. If hi becomes larger than lo, pop hi's min and push it back to lo.

This two-step process guarantees the invariant: (1) all elements in lo <= all elements in hi (because we route through lo first, moving the max to hi). (2) lo's size >= hi's size (because we rebalance if hi grows too large).

Each addNum performs at most 3 heap operations (push to lo, pop from lo + push to hi, possibly pop from hi + push to lo). Each operation is O(log n), so addNum is O(log n) total.

Step-by-Step Walkthrough

Stream: [1, 2, 3]. addNum(1): push to lo → lo=[1], hi=[]. Pop lo max (1), push to hi → lo=[], hi=[1]. hi larger → pop hi min (1), push to lo → lo=[1], hi=[]. findMedian: lo.top = 1.

addNum(2): push to lo → lo=[2,1]. Pop lo max (2), push to hi → lo=[1], hi=[2]. Sizes equal. findMedian: (1+2)/2 = 1.5. addNum(3): push to lo → lo=[3,1]. Pop lo max (3), push to hi → lo=[1], hi=[2,3]. hi larger → pop hi min (2), push to lo → lo=[2,1], hi=[3].

findMedian: lo has more elements, return lo.top = 2. Correct: sorted [1,2,3], median is 2. After 3 adds, lo=[2,1] (max-heap, top=2) and hi=[3] (min-heap, top=3). The median sits at lo's top.

ℹ️

Always Route Through Lo First

The add logic always pushes to lo first, then moves lo's max to hi. This ensures the ordering invariant. Then rebalancing ensures the size invariant. This fixed sequence avoids complex conditional logic.

Implementation in Python and Java

Python: heapq is a min-heap. For max-heap, negate values. lo = [] (max-heap via negation), hi = []. addNum: heappush(lo, -num). heappush(hi, -heappop(lo)). If len(hi) > len(lo): heappush(lo, -heappop(hi)). findMedian: if equal sizes, (-lo[0] + hi[0]) / 2. Else -lo[0].

Java: PriorityQueue<Integer> lo = new PriorityQueue<>(Collections.reverseOrder()) (max-heap). PriorityQueue<Integer> hi = new PriorityQueue<>() (min-heap). Same add logic. findMedian with lo.peek() and hi.peek().

The negation trick in Python is a standard pattern for max-heaps. Push -num, and when reading, negate again. Java's Collections.reverseOrder() comparator is cleaner but the underlying logic is identical.

Complexity Analysis

addNum: O(log n) per call. At most 3 heap push/pop operations, each O(log n). Over n calls, total time is O(n log n).

findMedian: O(1) per call. Just peek at the tops of both heaps. No heap restructuring needed.

Space: O(n) total for storing all n numbers across the two heaps. Each number lives in exactly one heap.

YeetCode Flashcard Tip

Find Median from Data Stream is the hardest heap problem on LeetCode. Practice it alongside Top K Frequent Words and Merge K Sorted Lists on YeetCode to master heap-based design patterns.

Ready to master algorithm patterns?

YeetCode flashcards help you build pattern recognition through active recall and spaced repetition.

Start practicing now