Problem Overview — Weighted Random Index Selection
LeetCode 528 Random Pick with Weight asks you to implement a class with two methods. The constructor receives an integer array w where w[i] is the weight of index i. The pickIndex() method returns an index i with probability proportional to w[i] divided by the total sum of all weights. Heavier weights should be chosen more often.
For example, if w = [1, 3], then index 0 should be returned 25% of the time and index 1 should be returned 75% of the time. The weights define a discrete probability distribution, and your implementation must sample from it correctly over many calls without bias.
This problem appears frequently in interviews at companies like Google, Amazon, and Facebook because it combines two foundational techniques — prefix sums and binary search — into a clean real-world application. Understanding it deeply prepares you for randomized algorithm questions, sampling problems, and the inverse CDF method used in statistics.
- 1 <= w.length <= 10000
- 1 <= w[i] <= 10^5
- pickIndex() will be called at most 10^4 times
- Return index i with probability w[i] / sum(w)
- All weights are positive integers — no zero weights
Naive Approach and Why It Wastes Space
The simplest approach is to expand the weight array into a flat array where each index appears exactly w[i] times. For w = [1, 3], you would create [0, 1, 1, 1] — index 0 appears once (25%) and index 1 appears three times (75%). Picking a random element from this flat array gives the correct probability distribution automatically.
This naive approach is easy to understand but wastes enormous amounts of memory. If w = [100000, 100000, 100000], the flat array has 300,000 elements just to represent 3 indices. With 10,000 weights each at maximum value 10^5, the flat array could reach 10^9 elements — far too large to store in memory. The O(sum(w)) space complexity makes this approach impractical.
The prefix sum approach solves the same problem in O(n) space. Instead of materializing every repeated element, you store where each weight range starts and ends. The key insight is that a random number maps to exactly one range, and you can find that range in O(log n) using binary search rather than storing every element explicitly.
Prefix Sums Compress Weight Ranges into O(n) Space
Prefix sums compress the weight ranges into O(n) space while preserving the probability distribution. Instead of storing index 1 three times, store a range boundary — prefix[1] = 4 — so any random number from 2 to 4 maps to index 1. This uses one number to represent an arbitrarily large weight, keeping space linear regardless of weight magnitudes.
Prefix Sum Construction — Building Weight Boundaries
Build the prefix sum array by computing prefix[i] = w[0] + w[1] + ... + w[i] for all i. Each index i owns the half-open range (prefix[i-1], prefix[i]] — that is, from one past the previous prefix sum up to and including the current prefix sum. The total sum is prefix[n-1], the last element of the prefix array.
For w = [1, 3, 2], the prefix array is [1, 4, 6]. Index 0 owns range (0, 1] — just the value 1. Index 1 owns range (1, 4] — values 2, 3, 4. Index 2 owns range (4, 6] — values 5, 6. A random number in [1, 6] falls in exactly one of these ranges with probability proportional to the range width, which equals the original weight.
The critical property is that these ranges partition the interval [1, totalSum] with no gaps and no overlaps. Every integer from 1 to totalSum belongs to exactly one range. Generating a uniform random integer in [1, totalSum] and finding which range it lands in gives each index i a selection probability of exactly w[i] / totalSum.
- 1Compute prefix[0] = w[0]
- 2For i from 1 to n-1: prefix[i] = prefix[i-1] + w[i]
- 3totalSum = prefix[n-1]
- 4Index i owns range (prefix[i-1], prefix[i]] where prefix[-1] = 0
- 5A random integer r in [1, totalSum] belongs to exactly one range
Binary Search for Pick — O(log n) Random Selection
To implement pickIndex(), generate a random integer r uniformly in [1, totalSum]. Then binary search the prefix sum array for the smallest index i where prefix[i] >= r. This index i is the one whose range contains r, so it is returned as the result. The binary search takes O(log n) time per call, regardless of the weight magnitudes.
The correctness follows directly from the prefix sum construction. For index i to be selected, r must fall in the range (prefix[i-1], prefix[i]]. The number of integers in this range is prefix[i] - prefix[i-1] = w[i]. Since r is uniform over totalSum values, index i is selected with probability w[i] / totalSum, which is exactly what the problem requires.
Compared to the naive flat array approach, this is both space-efficient and fast. Construction is O(n) time and O(n) space — linear in the number of weights. Each pickIndex() call is O(log n) — far better than O(sum(w)) for the naive approach. The trade-off is minimal: binary search adds negligible overhead while saving enormous space.
Inverse CDF Method from Probability Theory
This binary search on prefix sums is equivalent to sampling from a discrete probability distribution using the inverse CDF (cumulative distribution function) method. The prefix sum array IS the CDF of the distribution. Generating a uniform random number and finding where it falls in the CDF maps to the correct outcome — a classical technique used throughout statistics, simulation, and randomized algorithms.
Why bisect_left vs bisect_right — Python Details
In Python, the bisect module provides bisect_left and bisect_right for binary search insertion points. For the prefix sum array, use bisect_left with the target r. bisect_left(prefix, r) returns the leftmost index where r could be inserted to keep the array sorted — which is the first index where prefix[i] >= r. This is exactly the index whose range contains r.
Using bisect_right would give incorrect results in the edge case where r equals a prefix sum value exactly. bisect_right(prefix, r) returns the index after all occurrences of r, which would point to the next range instead of the current one. For example, with prefix = [1, 4, 6] and r = 1, bisect_right returns 1 (pointing to index 1), but bisect_left returns 0 (correctly pointing to index 0 whose range includes 1).
The random number must be generated in [1, totalSum] inclusive using random.randint(1, totalSum) in Python. This ensures every integer from 1 to totalSum can be generated, and each maps uniquely to one prefix range. The bisect_left call then finds the correct bucket in O(log n) time without any additional logic.
- 1import bisect and random at the top
- 2In __init__: build self.prefix as cumulative sum of w
- 3In pickIndex: r = random.randint(1, self.prefix[-1])
- 4Return bisect.bisect_left(self.prefix, r)
- 5bisect_left finds first prefix[i] >= r — correct bucket
- 6bisect_right would skip to next bucket on exact boundary hits
Code Walkthrough — Python and Java Implementations
Python solution: __init__ builds the prefix sum array by iterating through w and appending cumulative sums. This is O(n) time and O(n) space. pickIndex uses random.randint(1, self.prefix[-1]) to generate a uniform integer in [1, totalSum], then bisect.bisect_left(self.prefix, r) to find the target index in O(log n). The entire class is 5-6 lines of code.
Java solution: in the constructor, build int[] prefix with the same cumulative sum logic. pickIndex generates a random integer using new Random().nextInt(prefix[n-1]) + 1 to get a value in [1, totalSum]. Then use Arrays.binarySearch(prefix, r) — if the value is found, it returns the exact index; if not found, it returns -(insertionPoint) - 1, so the actual insertion point is -(result) - 1. Alternatively, implement binary search manually to avoid the negative index handling.
Both implementations share the same O(n) constructor and O(log n) pickIndex. The only language-specific difference is random number generation and binary search API. For interviews, the Python bisect solution is cleanest. For Java, a manual binary search using lo/hi/mid is often clearer than dealing with Arrays.binarySearch negative return values.
Generate Random in [1, totalSum] Inclusive
Generate the random number in [1, totalSum] inclusive, NOT [0, totalSum-1]. The prefix sum ranges start at 1, not 0 — the first range is (0, prefix[0]] which contains values 1 through prefix[0]. If you generate 0, no range contains it and bisect_left returns 0 incidentally, but this represents an off-by-one bug that produces incorrect probability distribution. Always use random.randint(1, totalSum) in Python or nextInt(totalSum) + 1 in Java.