Problem Overview — Find the Longest Consecutive Chain in O(n)
LeetCode 128 Longest Consecutive Sequence gives you an unsorted integer array and asks you to return the length of the longest sequence of consecutive integers. Consecutive here means elements that differ by exactly 1 — [1,2,3,4] is consecutive, [1,3,5] is not. For example, given [100,4,200,1,3,2], the answer is 4 because the subsequence [1,2,3,4] forms the longest consecutive chain. The elements do not need to be adjacent in the original array — they just need to all be present somewhere in it.
The critical constraint is the O(n) time requirement. A naive approach that sorts the array would find the answer in O(n log n) time and O(1) extra space, but the problem explicitly requires O(n). This forces you away from comparison-based sorting and toward a hashing approach that enables O(1) membership testing. The HashSet solution achieves true O(n) time by cleverly avoiding redundant work via the sequence-start optimization.
The problem guarantees no specific range for values, so the numbers may be very spread out — [1, 1000000000] is a valid input. This rules out direct-address arrays or bucket-based approaches. The only O(n) solution that handles arbitrary integer ranges is the HashSet approach, making it the canonical answer for this problem in interviews.
- Given unsorted integer array nums, return length of longest consecutive elements sequence
- Consecutive means elements differing by exactly 1 — [1,2,3,4] not [1,3,5]
- [100,4,200,1,3,2] → 4 because [1,2,3,4] is present in the array
- Elements need not be adjacent in input — just all present somewhere
- Array length 0 to 100,000; values from −10^9 to 10^9
- Must run in O(n) time — ruling out sorting-based O(n log n) solutions
Sorting Approach — O(n log n) Fallback That Still Works
The sorting approach is the most intuitive solution: sort the array, then do a single linear scan counting consecutive runs. After sorting, any consecutive sequence becomes a contiguous block. Walk through the sorted array and maintain a current streak length. If nums[i] equals nums[i-1]+1, increment the streak. If nums[i] equals nums[i-1], skip (duplicate). Otherwise, reset the streak to 1. Track the maximum streak seen. This is O(n log n) due to sorting, plus O(1) extra space (or O(n) if sorting is not in-place).
Handling duplicates is important: if the same number appears multiple times it should not break a consecutive sequence, since you only need each value to appear at least once. Skip any nums[i] where nums[i] == nums[i-1] before checking for consecutive increments. For example, [1,2,2,3] should still report a streak of 3 for [1,2,3], not reset when it encounters the second 2.
The sorting approach is completely correct — it just does not meet the O(n) requirement. In an interview setting, presenting the sorting solution first and then explaining why it is O(n log n) before pivoting to the HashSet approach demonstrates strong systematic thinking. Many interviewers appreciate seeing both solutions and understanding the trade-off.
Sorting Is a Great Interview Fallback
Sorting is a great fallback if you blank on the O(n) approach in an interview. Present it first — it is correct, handles all edge cases, and requires only O(1) extra space. Then explain why the O(n) HashSet solution is preferable: it avoids the O(n log n) sort by using O(n) space to gain O(1) lookup. Interviewers reward candidates who can articulate trade-offs, not just memorize optimal solutions.
HashSet + Sequence Start — The Key to O(n) Without Redundant Work
The HashSet approach inserts all numbers into a set in O(n) time, enabling O(1) membership checks. Then for each number in the set, check whether num-1 exists in the set. If num-1 does NOT exist, then num is the start of a consecutive sequence — no smaller number connects to it. Only from these sequence starts do you count forward: increment a counter while num+count exists in the set. Track the longest count seen across all sequence starts.
The sequence-start check (num-1 not in set) is the entire trick. Without it, every number would trigger a forward count, and numbers in the middle of a sequence would recount the same chain redundantly. With it, only one number per chain — the smallest — triggers the inner counting loop. Every other number is visited in the outer loop but skipped immediately because num-1 exists in the set.
The steps are: (1) insert all elements into a HashSet in O(n); (2) for each num in the set, if num-1 is not in the set, count forward using a while loop: streak=1, while (num+streak) in set, streak++; (3) update global maximum with streak. Return the global maximum. Edge case: empty array returns 0.
- 1Insert all elements of nums into a HashSet — O(n) time, O(n) space
- 2Initialize longestStreak = 0
- 3For each num in the HashSet:
- 4 Check if (num - 1) is in the HashSet
- 5 If num - 1 is NOT present: this is a sequence start
- 6 Initialize currentStreak = 1, currentNum = num
- 7 While (currentNum + 1) is in the HashSet: increment currentNum and currentStreak
- 8 Update longestStreak = max(longestStreak, currentStreak)
- 9 If num - 1 IS present: skip — this number is not a sequence start
- 10Return longestStreak
Why This Is O(n) — Each Number Visited at Most Twice
The nested loop structure of the HashSet solution looks like it could be O(n^2) — for each number, you have a while loop. But the total number of iterations across all inner while loops is bounded by n, not n^2. Here is why: the inner while loop only runs starting from sequence starts, and it counts each number in the sequence exactly once. Every number in the array belongs to exactly one consecutive sequence. Once it is counted as part of a chain, it is never the start of another chain.
Concretely: in the outer loop, every number is visited once — O(n) total outer iterations. For each number, either (a) num-1 is in the set, so you skip in O(1), or (b) num is a sequence start, so you count forward. The inner while loop across all sequence starts collectively visits every number at most once more. The total across all inner loop iterations is at most n. So total work = n (outer) + n (all inner combined) = O(n).
This is an amortized argument: individual sequence starts may trigger long inner loops, but the sum of all inner loop lengths equals the total number of elements in the array. Amortized analysis is the right framework here — similar to how a dynamic array achieves amortized O(1) push even though occasional resizes are O(n).
The num-1 Check Prevents Redundant Counting
The key is the num-1 check: it ensures only sequence starts trigger counting, preventing redundant work. Without it, [1,2,3,4] would trigger inner loops from 1, 2, 3, and 4 — counting the chain 4+3+2+1=10 times total instead of once. With it, only 1 triggers a count (since 0 is absent), and 2, 3, 4 are all skipped immediately because their predecessors exist. The total inner loop work stays O(n) across the entire array.
Union-Find Alternative — Treat Numbers as Graph Nodes
Union-Find (Disjoint Set Union) offers an alternative O(n * alpha(n)) solution, where alpha(n) is the inverse Ackermann function — effectively constant for all practical inputs, making this essentially O(n). The idea: treat each distinct number as a node. For each number, if num+1 exists in the input, union num and num+1 into the same component. The answer is the size of the largest component across all unions.
Implementation requires a DSU structure with union-by-rank and path compression for the near-O(1) per operation guarantee. Build a map from each number to its DSU index. For each number, if num+1 exists in the map, call union(index(num), index(num+1)). After all unions, find the maximum component size. Component sizes can be tracked in the DSU's size array during union operations.
Union-Find is a valid alternative and demonstrates broader algorithmic knowledge, but it is more complex to implement correctly in an interview than the HashSet approach. The HashSet solution is shorter, equally efficient, and more directly solves the problem. Union-Find shines when you need to dynamically add numbers and query the longest sequence after each addition — which is a follow-up variant of this problem worth knowing.
- 1Map each distinct number to an index 0..m-1 where m = number of distinct values
- 2Initialize DSU: parent[i] = i, size[i] = 1 for all i in 0..m-1
- 3For each distinct number num:
- 4 If (num + 1) also exists in the map: union(index(num), index(num+1))
- 5After all unions, scan all roots and find max(size[find(i)]) over all i
- 6Return that maximum — it equals the longest consecutive sequence length
- 7Time: O(n * alpha(n)) ≈ O(n) — Space: O(n) for the DSU arrays and number map
Code Walkthrough — Python and Java Core Logic
Python solution: def longestConsecutive(nums): numSet = set(nums); longest = 0; for num in numSet: if num - 1 not in numSet: currentNum = num; streak = 1; while currentNum + 1 in numSet: currentNum += 1; streak += 1; longest = max(longest, streak); return longest. Key detail: iterate over numSet (not nums) to avoid rechecking duplicates. set(nums) deduplicates in O(n). The while condition checks currentNum+1 in numSet — O(1) per check. The body is 10 lines of core logic.
Java solution: public int longestConsecutive(int[] nums) { Set<Integer> numSet = new HashSet<>(); for (int n : nums) numSet.add(n); int longest = 0; for (int num : numSet) { if (!numSet.contains(num - 1)) { int currentNum = num; int streak = 1; while (numSet.contains(currentNum + 1)) { currentNum++; streak++; } longest = Math.max(longest, streak); } } return longest; } Java uses HashSet<Integer> with contains() for O(1) lookup. Both solutions are O(n) time O(n) space.
Both implementations return 0 for empty arrays automatically — the outer loop never executes, so longest remains 0. For arrays with all duplicates (e.g., [7,7,7,7]), the set has one element, num-1 (6) is absent, the while loop body never executes (8 is absent), streak=1, return 1. Handle the array of length 1 — any single element is a consecutive sequence of length 1, which both implementations handle correctly without special cases.
Handle Empty Arrays and All-Duplicate Arrays
Don't forget to handle empty arrays and arrays with all duplicates. An empty array should return 0 — both implementations handle this since the for loop never runs. An array like [7,7,7,7] should return 1 — the set deduplicates to {7}, then 7 is a sequence start (6 absent), the while loop finds 8 absent immediately, streak=1. No special-case code needed, but verify these edge cases mentally before submitting in an interview.