Problem Overview — Count Smaller Elements to the Right
LeetCode 315, "Count of Smaller Numbers After Self," presents an integer array nums and asks you to return a counts array where counts[i] equals the number of elements to the right of nums[i] that are strictly smaller than nums[i]. Each position in the output independently counts how many values in its suffix are smaller — this is not a global sort but a per-element query over a shrinking suffix window.
For example, given nums = [5, 2, 6, 1], the answer is [2, 1, 1, 0]. At index 0 the value 5 has two smaller elements to its right: 2 and 1. At index 1 the value 2 has one smaller element to its right: 1. At index 2 the value 6 has one smaller element: 1. At index 3 the value 1 is the last element so its count is 0. A single-element array always returns [0].
The input constraints are what make this problem interesting: nums can have up to 10^5 elements with values ranging from -10^4 to 10^4. The naive O(n^2) solution that scans all right neighbors for each element exceeds time limits. An efficient solution must exploit structure in the problem — specifically, the insight that inversion counting during merge sort and Binary Indexed Tree prefix queries both solve this in O(n log n).
- Given: integer array nums of length 1 to 10^5
- Return: integer array counts where counts[i] = number of elements to the RIGHT of nums[i] that are STRICTLY LESS than nums[i]
- Values range from -10^4 to 10^4, including negatives
- Last element always has count 0 — no elements to its right
- Example: [5,2,6,1] → [2,1,1,0]
- Example: [-1,-1] → [0,0] (equal elements are NOT counted)
Brute Force and Why O(n^2) Fails
The brute force approach iterates over each index i, then scans all indices j > i to count elements where nums[j] < nums[i]. This nested loop is straightforward to implement and easy to verify for correctness, but its O(n^2) time complexity makes it untenable for the full input range. At n = 10^5, the nested loop performs roughly 5 * 10^9 comparisons — well beyond the ~10^8 operations per second budget that competitive programming judges allow.
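The quadratic baseline described above can be sketched as follows (a minimal sketch; the function name count_smaller_brute is illustrative, not from the original):

```python
def count_smaller_brute(nums):
    """O(n^2) baseline: for each index i, scan the whole suffix
    and count values strictly smaller than nums[i]."""
    n = len(nums)
    counts = [0] * n
    for i in range(n):
        for j in range(i + 1, n):
            if nums[j] < nums[i]:
                counts[i] += 1
    return counts

print(count_smaller_brute([5, 2, 6, 1]))  # [2, 1, 1, 0]
```

This version is useful as a correctness oracle for testing the faster implementations on small random inputs, even though it times out at n = 10^5.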
Each element independently needs information about its entire suffix, and suffixes overlap heavily: the suffix for index 0 contains all n-1 remaining elements, the suffix for index 1 contains n-2 elements, and so on. The total work is proportional to n + (n-1) + ... + 1 = n*(n+1)/2. No caching of intermediate results is straightforward because each element uses a different suffix window. A fundamentally different algorithmic approach is required.
Space complexity of the brute force is O(1) extra (beyond the output array), which is technically optimal, but the quadratic time makes it unusable. Both the merge sort and BIT approaches trade a modest O(n) or O(n log n) space overhead for the O(n log n) time improvement that makes the solution viable for the full constraint range.
This Is an Inversion Counting Problem
Counting elements smaller to the right is a per-element form of inversion counting: for each position i, count the pairs (i, j) with i < j and nums[i] > nums[j]. This is a classic inversion counting problem solvable with modified merge sort. The key insight is that merge sort naturally groups elements such that cross-half inversions can be counted in bulk during the merge step rather than one comparison at a time.
Merge Sort Approach — Track Original Indices
The merge sort approach modifies the standard merge sort algorithm to count inversions as a byproduct of the sort. The central idea is: when an element from the right half is placed before elements from the left half during a merge, each such placement represents an inversion — the right element is smaller than the remaining left elements, meaning those left elements each have one more "smaller element to the right" count.
The critical implementation detail is index tracking. The output array counts must map back to the original positions, but merge sort rearranges elements. To handle this, instead of sorting the raw values array, sort an array of (value, original_index) pairs. When a merge places a right-half element before left-half elements, the counts at the original indices of those left-half elements are incremented by the number of right-half elements that have already been placed (those that came before in the current merge step).
The recursion divides the array in half, recursively sorts and counts for each half, then merges. The merge step is where all the counting happens. Because the two halves are already internally sorted after the recursive calls, the merge can use a two-pointer technique to identify inversions in O(n) per level, giving O(n log n) total. The counts array is initialized to all zeros and accumulated across all merge steps in the recursion tree.
1. Create indexed array: pairs = [(nums[i], i) for i in range(n)]
2. Initialize counts array of length n with zeros
3. Call merge_sort(pairs, counts)
4. In merge_sort(arr, counts): if len(arr) <= 1, return arr
5. Split arr into left and right halves
6. Recursively sort: left = merge_sort(left, counts); right = merge_sort(right, counts)
7. Merge with pointers i (left) and j (right), tracking right_count = how many right-half elements have been placed so far
8. If right[j].val < left[i].val: place right[j] and increment right_count
9. Otherwise (left[i].val <= right[j].val, so ties go left and equal values are not counted): place left[i] and do counts[left[i].orig_idx] += right_count
10. Return the merged sorted array
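The steps above can be sketched in Python (a minimal sketch; the function name count_smaller_merge is illustrative, not from the original):

```python
def count_smaller_merge(nums):
    """Merge sort over (value, original_index) pairs, accumulating
    smaller-to-the-right counts during each merge."""
    counts = [0] * len(nums)

    def merge_sort(pairs):
        if len(pairs) <= 1:
            return pairs
        mid = len(pairs) // 2
        left = merge_sort(pairs[:mid])
        right = merge_sort(pairs[mid:])
        merged = []
        i = j = right_count = 0
        while i < len(left) and j < len(right):
            if left[i][0] <= right[j][0]:
                # Ties go left: equal values are not strictly smaller.
                counts[left[i][1]] += right_count
                merged.append(left[i])
                i += 1
            else:
                right_count += 1
                merged.append(right[j])
                j += 1
        while i < len(left):
            # Every leftover left element saw all placed right elements.
            counts[left[i][1]] += right_count
            merged.append(left[i])
            i += 1
        merged.extend(right[j:])
        return merged

    merge_sort([(v, i) for i, v in enumerate(nums)])
    return counts

print(count_smaller_merge([5, 2, 6, 1]))  # [2, 1, 1, 0]
```

Note that counts is declared in the enclosing scope and mutated by the nested function, which avoids the reset-on-each-call pitfall discussed later.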
How Merge Counts Inversions — The Core Insight
During the merge of a sorted left half and a sorted right half, the two-pointer process compares left[i] and right[j]. If right[j] < left[i], then because the left half is sorted, right[j] is also smaller than left[i+1], left[i+2], all the way through left[end]. Rather than counting each of those inversions individually, the algorithm can count them all at once: right[j] contributes one to the count for every remaining left element, which is (left.length - i) elements at the moment right[j] is placed.
Equivalently, the algorithm can track right_count: the number of right-half elements that have been placed into the merged output so far. Each time a left-half element left[i] is placed into the output, the right_count elements already placed from the right half are exactly those smaller than left[i] (because they were placed first and the right half is sorted). Therefore counts[original_index_of_left[i]] += right_count at the moment left[i] is placed. On ties, the left element must be placed first so that equal values are never counted — the problem asks for strictly smaller elements.
This counting is exact because: (1) both halves are sorted, so the order of placement is deterministic; (2) right elements placed before a left element are exactly those that are smaller than the left element within this merge; (3) inversions between elements in the same half are already counted in the recursive calls. Together these three properties guarantee that every inversion pair (i, j) with i < j and nums[i] > nums[j] is counted exactly once across all merge levels.
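To make the right_count bookkeeping concrete, here is a single merge step in isolation, traced on the final merge for [5, 2, 6, 1] (a minimal sketch; merge_and_count is a hypothetical helper name, and the starting counts reflect what the two recursive calls would already have accumulated):

```python
def merge_and_count(left, right, counts):
    """One merge of two sorted lists of (value, original_index) pairs.
    Each left element placed picks up right_count contributions."""
    merged, i, j, right_count = [], 0, 0, 0
    while i < len(left) and j < len(right):
        if left[i][0] <= right[j][0]:      # tie goes left: equals not counted
            counts[left[i][1]] += right_count
            merged.append(left[i]); i += 1
        else:
            right_count += 1
            merged.append(right[j]); j += 1
    for k in range(i, len(left)):          # leftovers saw every placed right element
        counts[left[k][1]] += right_count
        merged.append(left[k])
    merged.extend(right[j:])
    return merged

# After the recursive calls on [5, 2] and [6, 1], counts is [0, 1, 1, 0]
# (wait: [1, 0, 1, 0] for values 5 and 6) -- concretely: counts[0] = 1 from
# merging [5] with [2], counts[2] = 1 from merging [6] with [1].
counts = [1, 0, 1, 0]
merge_and_count([(2, 1), (5, 0)], [(1, 3), (6, 2)], counts)
print(counts)  # [2, 1, 1, 0]
```

In the final merge, (1, 3) is placed first (right_count becomes 1), then (2, 1) and (5, 0) each add that 1 to their original positions, yielding the expected answer.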
Merge Step Counts Cross-Half Inversions Exactly Once
The merge step in modified merge sort counts inversions between the left half and right half precisely because both halves are already sorted. An element from the left half that is larger than k elements from the right half will see all k of those right elements placed before it in the merged output — right_count captures exactly this k. Inversions within each half are handled in the recursive calls, so each inversion pair is counted at exactly one level of the recursion tree, guaranteeing correctness.
BIT Alternative — Coordinate Compression and Prefix Queries
A Binary Indexed Tree (BIT), also called a Fenwick Tree, provides an alternative O(n log n) approach. The key operation needed is: given the elements already processed, how many are strictly smaller than the current element? A BIT supports prefix sum queries and point updates in O(log n) each, making it ideal. The algorithm processes elements right-to-left: for each element, query the BIT for count of elements smaller than it (already inserted from its right), then insert it.
Because BIT indices must be positive integers in a bounded range, coordinate compression is required first. Sort the unique values of nums, assign each unique value a rank from 1 to m (where m is the number of distinct values). Replace each element in nums with its rank. Now queries for "how many inserted elements have rank < current_rank" are prefix sum queries: query(current_rank - 1). After querying, update the BIT at current_rank to record that this value has been seen.
The BIT approach is iterative (no recursion) and avoids the complexity of tracking index pairs through a sort. However, the coordinate compression step requires extra code and the constant factors are comparable to merge sort. Both approaches have O(n log n) time and O(n) space. In practice, the BIT approach can be slightly faster due to better cache behavior and simpler control flow, while merge sort is more natural for candidates familiar with divide-and-conquer.
1. Coordinate compress: sorted_unique = sorted(set(nums)); rank = {v: i+1 for i, v in enumerate(sorted_unique)}
2. Initialize BIT of size m+1 (m = number of distinct values), all zeros
3. Initialize counts array of length n with zeros
4. Iterate i from n-1 down to 0:
5.   r = rank[nums[i]]
6.   counts[i] = bit_query(r - 1)  # how many inserted values have rank < r
7.   bit_update(r, 1)  # record that nums[i] has been seen
8. Return counts
9. BIT query(i): sum of BIT[1..i], traversing downward with i -= i & (-i)
10. BIT update(i, delta): add delta at index i and propagate upward with i += i & (-i)
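The steps above can be sketched in Python (a minimal sketch; the names BIT and count_smaller_bit are illustrative, not from the original):

```python
class BIT:
    """1-indexed Fenwick tree supporting point update and prefix-sum query."""
    def __init__(self, size):
        self.tree = [0] * (size + 1)

    def update(self, i, delta):
        while i < len(self.tree):
            self.tree[i] += delta
            i += i & (-i)          # climb to the next responsible node

    def query(self, i):
        total = 0
        while i > 0:
            total += self.tree[i]
            i -= i & (-i)          # drop the lowest set bit
        return total

def count_smaller_bit(nums):
    # Coordinate compression: map each distinct value to a 1-based rank.
    sorted_unique = sorted(set(nums))
    rank = {v: r + 1 for r, v in enumerate(sorted_unique)}
    bit = BIT(len(sorted_unique))
    counts = [0] * len(nums)
    for i in range(len(nums) - 1, -1, -1):  # process right to left
        r = rank[nums[i]]
        counts[i] = bit.query(r - 1)        # inserted values with smaller rank
        bit.update(r, 1)                    # record that nums[i] has been seen
    return counts

print(count_smaller_bit([5, 2, 6, 1]))  # [2, 1, 1, 0]
```

Because query(r - 1) stops strictly below the current rank, equal elements to the right are automatically excluded, matching the strictly-smaller requirement.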
Code Walkthrough — Python and Java
In Python, the merge sort solution creates indexed pairs with enumerate, initializes a counts list of zeros, and defines a nested merge_sort function. The function splits the list in half, recursively processes each half, then merges using a right_count variable. For each left element placed, counts[orig_idx] += right_count. For each right element placed, right_count += 1. The time complexity is O(n log n) for the sort with O(n) per merge level across O(log n) levels. Space is O(n) for the temporary merged arrays at each level.
In Java, the approach uses an int[][] array where each entry is {value, originalIndex}. A helper method mergeSort takes the array, counts array, left bound, and right bound. It splits at mid, recurses on both halves, then calls a merge helper. The merge helper uses a rightCount int variable and two ArrayDeques or a temporary array. When a left element is chosen, counts[left[i][1]] += rightCount. When a right element is chosen, rightCount++. The final merged array overwrites the subarray in-place using System.arraycopy.
A common pitfall is forgetting to pass and mutate the original counts array through the recursion — if counts is declared locally inside merge_sort it will be reset on each call. Both implementations must also handle the edge case where the input has length 1, returning immediately without counting. For the BIT version in Python, a simple list-based BIT with 1-indexed operations suffices; in Java, an int[] of size n+2 with the standard Fenwick Tree update and query loops works correctly.
Must Track Original Indices Through the Sort
The most common implementation bug in the merge sort approach is forgetting to track original indices. Merge sort rearranges elements, so after sorting you cannot map counts back to the original positions without storing the original index alongside each value. Always sort pairs (value, original_index) rather than raw values. When a left-half element is placed during a merge, use its stored original_index to update counts[original_index] += right_count.