Contains Duplicate: Your First Hash Set Problem
Contains Duplicate (#217) is often the very first LeetCode problem people solve — and for good reason. It is the opening problem in both the NeetCode 150 roadmap and the Blind 75 list, and it perfectly demonstrates why hash sets exist. If you can solve this problem, you already understand the single most important data structure trick in coding interviews.
The contains duplicate leetcode problem asks a deceptively simple question: given an integer array, return true if any value appears at least twice. That is it. No complex constraints, no tricky edge cases hiding behind the problem statement. But the simplicity is exactly the point — this problem is a clean canvas for learning how to think about time and space tradeoffs.
In this walkthrough, you will see three different approaches to solving LeetCode 217, understand why each one works, and learn why the hash set solution is optimal. By the end, you will have a pattern you can apply to dozens of future problems.
Understanding the Contains Duplicate Problem
The problem statement for LeetCode 217 is refreshingly straightforward. Given an integer array nums, return true if any value appears at least twice in the array, and return false if every element is distinct. The input array can contain both positive and negative integers, and its length can range from one to over one hundred thousand elements.
Before jumping into code, consider what "duplicate detection" really means. You need to determine whether any element in the array has been seen before as you traverse it. The challenge is not logic — it is efficiency. A naive approach works, but can you do it fast enough for large inputs?
- Input: an integer array nums of length 1 to 100,000
- Output: true if any value appears at least twice, false otherwise
- Constraints: elements can be any integer from negative one billion to positive one billion
- Expected optimal time complexity: O(n)
Gateway Problem
Contains Duplicate (#217) is the first problem in the NeetCode 150 roadmap and the Blind 75 — it is the gateway problem that introduces hash sets and O(1) lookup.
Approach 1: Brute Force — Check Every Pair
The most obvious approach is to compare every pair of elements. For each element at index i, you check every element at index j where j is greater than i. If you find any pair where nums[i] equals nums[j], you return true. If you exhaust all pairs without a match, you return false.
This approach is correct, but it is painfully slow. With two nested loops, the time complexity is O(n squared). For an array of 100,000 elements, that means up to 10 billion comparisons. LeetCode will time out on large test cases, and any interviewer will expect you to do better.
The brute force approach uses O(1) extra space since you are not storing anything beyond loop variables. But the quadratic time complexity makes it impractical. Think of it as the baseline — the solution you mention briefly to show you understand the problem before moving on.
Approach 2: Sort First, Then Check Adjacent Elements
A better idea is to sort the array first. Once sorted, any duplicate elements will be adjacent to each other. You can then make a single pass through the sorted array, comparing each element to the one before it. If any two adjacent elements are equal, you have found a duplicate.
Sorting takes O(n log n) time, and the subsequent pass takes O(n), so the overall time complexity is O(n log n). This is a significant improvement over brute force, and it only requires O(1) extra space if you sort in place. Many candidates stop here, but there is a faster approach.
The main drawback of the sort approach is that it modifies the original array. In an interview, the interviewer might explicitly say "do not modify the input," which immediately rules this option out. Even if they do not say it, mentioning this tradeoff shows awareness of real-world constraints.
Watch Out
The sort approach modifies the original array — if the interviewer says "don't modify the input," the hash set approach is the only O(n) option. Always check constraints.
Approach 3: Hash Set — The Optimal Contains Duplicate Solution
The optimal leetcode 217 solution uses a hash set. As you iterate through the array, you check whether each element is already in the set. If it is, you return true immediately — you have found a duplicate. If it is not, you add it to the set and continue. If you finish the entire array without finding a duplicate, you return false.
This approach runs in O(n) time because each lookup and insertion into a hash set takes O(1) on average. You visit each element exactly once. The space complexity is O(n) in the worst case, because you might need to store every element before discovering there are no duplicates.
In Python, the implementation is elegant. You create an empty set, loop through each number, check membership with the "in" operator, and add to the set if it is new. The hash set duplicate detection pattern is the same one used in Two Sum, Valid Anagram, and nearly every deduplication problem on LeetCode.
The hash set approach is what interviewers want to see. It demonstrates that you understand the fundamental tradeoff: you spend O(n) extra memory to buy O(1) lookup time, reducing total time from O(n squared) to O(n). This tradeoff is the backbone of countless interview problems.
- Time complexity: O(n) — one pass through the array
- Space complexity: O(n) — hash set stores up to n elements
- Key insight: hash set gives O(1) average lookup, turning a quadratic problem into a linear one
- Works without modifying the original array
Edge Cases for Contains Duplicate
Even simple problems have edge cases worth mentioning in an interview. An empty array should return false — there are no elements, so there cannot be duplicates. A single-element array also returns false for the same reason. These are the base cases your solution should handle naturally.
An array where every element is identical is the opposite extreme — your hash set will catch the duplicate on the very second element, returning true almost immediately. Very large arrays with no duplicates represent the worst case for space, since the set will grow to hold every element.
Another consideration is integer overflow in languages like C++ or Java. Since elements can range from negative one billion to positive one billion, make sure your data types can handle the range. In Python, this is a non-issue since integers have arbitrary precision.
Pro Tip
The entire solution in Python is: return len(nums) != len(set(nums)). One line. But in an interview, walk through the hash set approach step by step to show your understanding.
What Contains Duplicate Teaches You
Contains Duplicate is not just a warm-up problem — it is the foundation for an entire family of interview questions. The hash set for O(1) lookup pattern you learn here is the exact same pattern used in Two Sum (#1), Valid Anagram (#242), Group Anagrams (#49), and Longest Consecutive Sequence (#128). Master this pattern once, and you unlock all of them.
The broader lesson is about recognizing when a problem needs fast membership testing. Any time a problem asks "have you seen this before" or "does this value exist in a collection," your first instinct should be a hash set. This single insight eliminates brute force from an enormous number of LeetCode problems.
If you are just starting your LeetCode journey, solve Contains Duplicate first, then move to Two Sum and Valid Anagram. These three problems form a natural progression that builds your hash map and hash set intuition. Practice them with YeetCode flashcards to lock the pattern into long-term memory through spaced repetition.