Problem Overview
LeetCode 8 — String to Integer atoi — asks you to implement a function that converts a string to a 32-bit signed integer, mimicking the behavior of the C standard library function atoi. The algorithm must handle leading whitespace, an optional sign character, a sequence of digit characters, and overflow — returning INT_MIN or INT_MAX when the parsed value falls outside the range [-2^31, 2^31-1].
The function stops reading at the first character that is not a valid part of the integer: any non-digit after the sign and digit phase terminates parsing. If no valid digit sequence is found after stripping whitespace and reading the sign, the function returns 0. The result is then clamped to the 32-bit signed integer range regardless of how large the intermediate value grew.
String to Integer atoi is a medium-difficulty problem that tests systematic parsing, state management, and careful overflow detection. It appears frequently in interviews because it has multiple edge cases and requires disciplined, step-by-step reasoning rather than a clever algorithmic insight. A clean sequential implementation is more valued in interviews than a complex one.
- Input: a string s that may start with whitespace, then an optional sign, then digits
- Goal: return the integer value, or 0 if no valid integer found, or clamped if out of range
- Stop at the first non-digit character after the sign
- Clamp to [-2^31, 2^31-1] if the value overflows 32-bit signed integer
- Return 0 if the string has no valid integer (no digits found)
- Edge cases: empty string, only whitespace, only sign, leading zeros, huge numbers
The Parsing Steps
The algorithm proceeds in exactly four sequential steps. First, skip all leading whitespace characters — advance the index i while s[i] is a space character. Second, read the optional sign character: if s[i] is + or -, record the sign and advance i. If s[i] is neither + nor -, assume positive and do not advance. Third, read consecutive digit characters and accumulate the result: while s[i] is a digit 0-9, update result = result * 10 + (s[i] - '0'), then advance i. Fourth, apply the sign and clamp to the 32-bit range.
Each step consumes exactly the characters it needs and leaves the index positioned for the next step. There is no backtracking. The steps are independent and composable — you could implement each as a small helper function. The sequential structure is what makes this implementation easy to reason about and debug.
The four steps map naturally to a state machine: the whitespace step is the WHITESPACE state, the sign step is the SIGNED state, the digit accumulation step is the IN_NUMBER state, and any terminating condition is the DONE state. Thinking in states helps you handle edge cases systematically — each state has a defined set of valid transitions and a defined action on each transition.
- 1Step 1 — Skip leading whitespace: while i < len(s) and s[i] == ' ': i += 1
- 2Step 2 — Read optional sign: if s[i] == '+': i += 1; elif s[i] == '-': sign = -1; i += 1
- 3Step 3 — Read consecutive digits: while i < len(s) and s[i].isdigit(): result = result * 10 + int(s[i]); i += 1
- 4Step 4 — Apply sign and clamp: result = sign * result; return max(INT_MIN, min(INT_MAX, result))
Treat This as a State Machine with 4 States
Treat this as a state machine with four states: WHITESPACE, SIGN, DIGIT, and DONE. Each character causes a transition — a space in WHITESPACE stays in WHITESPACE; a +/- in WHITESPACE moves to SIGN; a digit in SIGN or WHITESPACE moves to DIGIT; any non-digit in DIGIT moves to DONE. Thinking in states prevents you from forgetting edge cases like a sign character appearing after digits, or multiple sign characters.
Step-by-Step Implementation
The implementation maintains a single index i starting at 0 and a sign variable defaulting to +1. The whitespace-skipping loop runs while i is within bounds and s[i] is a space. After the loop, check if i is still within bounds before reading the sign character — the string might be entirely whitespace. If s[i] is '-', set sign = -1 and advance i; if s[i] is '+', just advance i; otherwise leave sign = +1 and do not advance.
The digit accumulation loop runs while i is within bounds and s[i] is a digit. Before computing result = result * 10 + digit, perform an overflow check. This check must happen before the multiplication, not after, because the multiplication itself may overflow the integer type in statically typed languages. In Python, integers are arbitrary precision, so there is no actual overflow during accumulation — the clamping happens at the end.
After the digit loop, apply the sign: result = sign * result. Then clamp: return max(-2147483648, min(2147483647, result)). The implementation is O(n) time because the index i advances monotonically through the string and each character is visited at most once. Space is O(1) — only the index, sign, and result variables are used.
- 1Initialize: i = 0, sign = 1, result = 0
- 2Skip spaces: while i < n and s[i] == ' ': i += 1
- 3Read sign: if i < n and s[i] in '+-': sign = (-1 if s[i] == '-' else 1); i += 1
- 4Read digits: while i < n and s[i].isdigit():
- 5 digit = int(s[i])
- 6 Check overflow BEFORE multiplying (see Overflow Detection section)
- 7 result = result * 10 + digit; i += 1
- 8Return: max(INT_MIN, min(INT_MAX, sign * result))
Overflow Detection
Overflow detection is the trickiest part of the implementation, especially in statically typed languages like C++ and Java where integer arithmetic wraps around. Before computing result * 10 + digit, check whether the multiplication would exceed INT_MAX (2147483647). The condition is: if result > INT_MAX // 10, return immediately (INT_MAX if sign is positive, INT_MIN if sign is negative). If result == INT_MAX // 10 and digit > 7, also return immediately — because 2147483647 ends in 7, so any digit greater than 7 would push the value past INT_MAX.
The symmetric case for INT_MIN is handled by the same check: since INT_MIN = -2147483648 and INT_MAX = 2147483647, the magnitude of INT_MIN is one greater than INT_MAX. However, if you check against INT_MAX // 10 = 214748364 and digit > 7 before applying the sign, you correctly handle both the positive overflow (clamp to INT_MAX) and negative overflow (clamp to INT_MIN) cases — because the last digit of |INT_MIN| is 8, which is also greater than 7.
In Python, arbitrary-precision integers mean the multiplication result * 10 + digit never overflows during the accumulation loop. The clamping with max(INT_MIN, min(INT_MAX, sign * result)) at the end is sufficient. However, adding the pre-multiplication overflow check is still a good habit in Python if the interviewer asks you to mimic the behavior of a C implementation or to optimize for early termination on very large inputs.
Check Overflow BEFORE the Multiplication
Check overflow BEFORE computing result * 10 + digit, not after. In C++ and Java, a signed integer overflow is undefined behavior or produces an incorrect wrapped value — checking after the operation gives you garbage. The pre-multiplication check (result > INT_MAX // 10 OR result == INT_MAX // 10 AND digit > 7) is mathematically equivalent to checking result * 10 + digit > INT_MAX, but it avoids the overflow entirely. Python handles big ints natively, but you still need to clamp at the end.
State Machine Alternative
The formal deterministic finite automaton (DFA) approach models the parsing as a state machine with four states: START, SIGNED, IN_NUMBER, and END. The initial state is START. Character classes determine transitions: space characters in START stay in START; a sign character in START moves to SIGNED; a digit in START or SIGNED moves to IN_NUMBER; any other character moves to END. Once in END, no further characters are processed.
The DFA approach makes the state transitions explicit and is more structured than the sequential parsing approach. It is particularly useful when the grammar is more complex — for example, if the problem added support for hexadecimal literals or decimal points, the DFA can be extended by adding new states and transitions without restructuring the entire algorithm. The DFA is also easier to verify formally: each state has a clear meaning and each transition has a clear trigger.
For the atoi problem specifically, the sequential parsing approach and the DFA approach produce identical results. The sequential version is shorter and cleaner for an interview setting. The DFA version is more defensive and scales better to more complex parsing rules. Both are O(n) time and O(1) space. Choose the one you can implement most cleanly under interview pressure.
- 1States: START, SIGNED, IN_NUMBER, END
- 2START + space → START (skip whitespace)
- 3START + (+/-) → SIGNED (record sign)
- 4START or SIGNED + digit → IN_NUMBER (begin accumulation)
- 5IN_NUMBER + digit → IN_NUMBER (continue accumulation)
- 6Any state + any other character → END (stop parsing)
- 7END: return sign * result clamped to [INT_MIN, INT_MAX]
Code Walkthrough — Python and Java
Python sequential version: def myAtoi(s): i, sign, result = 0, 1, 0; INT_MAX, INT_MIN = 2147483647, -2147483648; while i < len(s) and s[i] == ' ': i += 1; if i < len(s) and s[i] in '+-': sign = -1 if s[i] == '-' else 1; i += 1; while i < len(s) and s[i].isdigit(): digit = int(s[i]); if result > INT_MAX // 10 or (result == INT_MAX // 10 and digit > 7): return INT_MAX if sign == 1 else INT_MIN; result = result * 10 + digit; i += 1; return sign * result. Time O(n), space O(1).
Java sequential version: public int myAtoi(String s) { int i = 0, sign = 1, result = 0; while (i < s.length() && s.charAt(i) == ' ') i++; if (i < s.length() && (s.charAt(i) == '+' || s.charAt(i) == '-')) { sign = s.charAt(i) == '-' ? -1 : 1; i++; } while (i < s.length() && Character.isDigit(s.charAt(i))) { int digit = s.charAt(i) - '0'; if (result > Integer.MAX_VALUE / 10 || (result == Integer.MAX_VALUE / 10 && digit > 7)) return sign == 1 ? Integer.MAX_VALUE : Integer.MIN_VALUE; result = result * 10 + digit; i++; } return sign * result; }
Both implementations are O(n) time and O(1) space. The sequential version is preferred over the DFA version in interviews because it is shorter, easier to type quickly, and directly maps to the four parsing steps that an interviewer expects you to enumerate. The DFA version is best saved for follow-up discussion when the interviewer asks how you would extend the solution.
INT_MIN Has Absolute Value One Greater Than INT_MAX
INT_MIN = -2147483648 has an absolute value one greater than INT_MAX = 2147483647 in two's complement representation. Clamping negative overflow to -2^31 (not -(2^31-1)) is a common mistake — the correct lower bound is -2147483648, not -2147483647. The overflow check digit > 7 catches this correctly for both directions because |INT_MIN| = 2147483648 ends in 8, and 8 > 7 triggers the early return for the negative case.