Problem Overview: Serialize and Recover a String List
LeetCode 271 Encode and Decode Strings asks you to design two methods — encode and decode — such that encode(strs) converts a list of strings into a single string, and decode(s) reconstructs the original list perfectly. The key constraint is that the strings can contain any character including spaces, commas, newlines, and even the delimiter you choose.
This is a design problem rather than an algorithmic puzzle. The challenge is choosing an encoding scheme that is truly unambiguous — one that allows the decoder to always know exactly where one string ends and the next begins, regardless of the content of the strings. Simple delimiter-based schemes fail when strings contain that delimiter.
The problem is marked medium and frequently appears in system design-adjacent interviews. It tests your ability to think about encoding schemes, edge cases with special characters, and the contract between a serializer and deserializer.
- Design encode(strs: List[str]) -> str and decode(s: str) -> List[str]
- The two methods must be inverses: decode(encode(strs)) == strs for all valid inputs
- Strings can contain any character: letters, digits, spaces, punctuation, newlines
- Strings can be empty — empty string must round-trip correctly
- The input list can be empty — must return an empty list when decoded
- No restriction on the length of individual strings or the list size
Why Simple Delimiters Fail
The naive approach is to join the strings with a delimiter like comma or newline: encode returns strs.join(",") and decode returns s.split(","). This breaks immediately when a string contains a comma — the decoder cannot tell whether a comma is part of a string or a separator between strings.
Switching to a rare delimiter like a pipe or tilde does not fundamentally solve the problem — it just makes the failure case less likely. Any fixed single-character delimiter will fail for at least one valid input. Escaping the delimiter adds complexity and new edge cases: what if the escape character itself appears in the string? Proper escaping requires recursive handling.
The cleanest solution avoids delimiter ambiguity entirely by encoding the length of each string before the string itself. Because the length is read as an integer, the decoder always knows exactly how many characters to consume — no delimiter within the string content can confuse it.
Length-Prefixed Encoding Is Foolproof
Length-prefixed encoding is foolproof: the length tells you exactly how many chars to read, no ambiguity possible. Format each string as len(string) + "#" + string. The number before "#" is the byte count to consume after the "#". Even if the string content is "3#abc", the decoder reads the leading length and skips exactly that many characters — the embedded "#" is never misinterpreted as a separator.
Length-Prefix Encoding: The Algorithm
For each string in the input list, the encoder produces: the decimal representation of the string's length, followed by the character "#", followed by the string's characters. The "#" character acts as a separator between the length header and the string body. Multiple encoded strings are concatenated directly — no delimiter needed between them because each length header tells the decoder exactly where the next string ends.
For example, encoding ["hello", "world"] produces "5#hello5#world". Encoding ["a#b", "cd"] produces "3#a#bcd2#cd" — the "#" inside "a#b" is not a problem because the decoder reads the leading 3 and consumes exactly three characters after the next "#", yielding "a#b" correctly. The outer "#" separating the length from the body is found by position, not by scanning for any "#" character.
The encoder is a simple loop: for each string s, append str(len(s)) + "#" + s to a result buffer. In Python this is "".join(str(len(s)) + "#" + s for s in strs). The time complexity is O(n) where n is the total number of characters across all strings. Space complexity is O(n) for the output string.
- 1Initialize an empty result string or buffer
- 2For each string s in the input list:
- 3Compute the length: length = len(s)
- 4Append the prefix: result += str(length) + "#"
- 5Append the string body: result += s
- 6After all strings, return the result string
Decoding Process: Read Length, Then Slice
The decoder maintains a pointer i starting at position 0. At each step, it scans forward from i to find the next "#" character. The substring from i to the "#" position is the decimal length of the next string. Parse it as an integer to get length. The string body starts at the character immediately after "#" and runs for exactly length characters.
After reading the string body, advance i to the position just after the last character of the body: i = j + 1 + length, where j is the index of "#". Append the extracted string to the result list and repeat until i reaches the end of the encoded string. Return the result list.
The critical insight is that we always locate "#" by scanning forward from the current position i, not by searching the entire remaining string. Because we immediately parse the length and skip exactly that many characters, the content of the string body never influences parsing of subsequent strings.
The "#" Between Length and Content Is Safe
The "#" delimiter between length and content is safe because we read a fixed number of chars after it, even if the content contains "#". When decoding, we call s.find("#", i) to locate the first "#" at or after position i. We parse the length from s[i:j], then take s[j+1 : j+1+length] as the string body. If the body is "3#hi", we consume exactly 4 chars after the "#" and the embedded "#" is just data.
Edge Cases: Empty Strings, Empty List, Special Characters
An empty string "" has length 0, so it encodes as "0#". The decoder reads "0", finds "#", and then slices 0 characters — correctly recovering the empty string. An input list of ["", "", ""] encodes as "0#0#0#" and decodes back to three empty strings. The length-prefix scheme handles empty strings without any special case code.
An empty input list encodes as "" — the empty string. The decoder starts with i = 0, immediately sees that i >= len(s), and returns an empty list without entering the loop. This is the correct behavior and requires no special handling. The encode-decode round trip works for all list lengths from zero upward.
Strings containing "#", digits, spaces, newlines, Unicode code points, and null bytes all encode correctly. The length is always measured in characters (or bytes, depending on the language), and the decoder slices exactly that many units. There is no character in the string body that can be mistaken for a length separator because the length header is always at a known offset from the previous string's end.
- 1Empty string "" encodes as "0#" — length 0 followed by "#" followed by zero chars
- 2Empty list [] encodes as "" — the empty string; decodes back to []
- 3String with "#" inside, e.g. "a#b", encodes as "3#a#b" — length 3 protects the inner "#"
- 4String with digits, e.g. "42", encodes as "2#42" — no conflict with length prefix
- 5String with newline, e.g. "a\nb", encodes as "3#a\nb" — newline is just another character
- 6Unicode string, e.g. "caf\u00e9", encodes correctly if length is measured in code points
Code Walkthrough: Python and Java
In Python, encode is one line: return "".join(str(len(s)) + "#" + s for s in strs). Decode uses a while loop: initialize i = 0 and res = []. While i < len(s): find j = s.find("#", i), parse length = int(s[i:j]), append s[j+1:j+1+length] to res, then set i = j + 1 + length. Return res. The entire decode function is about 8 lines. Both encode and decode run in O(n) time where n is the total number of characters across all strings.
In Java, encode builds a StringBuilder: for each string str, append str.length() + "#" + str. Decode uses a similar pointer loop: int i = 0; List<String> res = new ArrayList<>(). While i < s.length(): find int j = s.indexOf('#', i); parse int len = Integer.parseInt(s.substring(i, j)); add s.substring(j + 1, j + 1 + len) to res; set i = j + 1 + len. Return res. Both Java and Python solutions achieve O(n) time complexity and O(1) extra space beyond the output.
The overall space complexity is O(n) for the encoded string in encode, and O(n) for the decoded list in decode — both are unavoidable since the output must contain all the input characters. There is no intermediate data structure beyond the output itself and the pointer variable i.
Use str.find("#", i) Not split("#") for Decoding
Use str.find("#", i) not split("#") for decoding since the content itself may contain "#". Calling s.split("#") on an encoded string like "3#a#b2#cd" produces ["3", "a", "b2", "cd"] — four pieces, not two — and loses the structure entirely. Always use find with a start offset equal to the current pointer position i so you locate only the length-body separator, not any "#" inside a string body.