Problem Overview: Serialize and Recover a String List

LeetCode 271 Encode and Decode Strings asks you to design two methods — encode and decode — such that encode(strs) converts a list of strings into a single string, and decode(s) reconstructs the original list perfectly. The key constraint is that the strings can contain any character including spaces, commas, newlines, and even the delimiter you choose.

This is a design problem rather than an algorithmic puzzle. The challenge is choosing an encoding scheme that is truly unambiguous — one that allows the decoder to always know exactly where one string ends and the next begins, regardless of the content of the strings. Simple delimiter-based schemes fail when strings contain that delimiter.

The problem is marked medium and frequently appears in system design-adjacent interviews. It tests your ability to think about encoding schemes, edge cases with special characters, and the contract between a serializer and deserializer.

Design encode(strs: List[str]) -> str and decode(s: str) -> List[str]
The two methods must be inverses: decode(encode(strs)) == strs for all valid inputs
Strings can contain any character: letters, digits, spaces, punctuation, newlines
Strings can be empty — empty string must round-trip correctly
The input list can be empty — must return an empty list when decoded
No restriction on the length of individual strings or the list size

Why Simple Delimiters Fail

The naive approach is to join the strings with a delimiter like comma or newline: encode returns strs.join(",") and decode returns s.split(","). This breaks immediately when a string contains a comma — the decoder cannot tell whether a comma is part of a string or a separator between strings.

Switching to a rare delimiter like a pipe or tilde does not fundamentally solve the problem — it just makes the failure case less likely. Any fixed single-character delimiter will fail for at least one valid input. Escaping the delimiter adds complexity and new edge cases: what if the escape character itself appears in the string? Proper escaping requires recursive handling.

The cleanest solution avoids delimiter ambiguity entirely by encoding the length of each string before the string itself. Because the length is read as an integer, the decoder always knows exactly how many characters to consume — no delimiter within the string content can confuse it.

💡

Length-Prefixed Encoding Is Foolproof

Length-prefixed encoding is foolproof: the length tells you exactly how many chars to read, no ambiguity possible. Format each string as len(string) + "#" + string. The number before "#" is the byte count to consume after the "#". Even if the string content is "3#abc", the decoder reads the leading length and skips exactly that many characters — the embedded "#" is never misinterpreted as a separator.

Length-Prefix Encoding: The Algorithm

For each string in the input list, the encoder produces: the decimal representation of the string's length, followed by the character "#", followed by the string's characters. The "#" character acts as a separator between the length header and the string body. Multiple encoded strings are concatenated directly — no delimiter needed between them because each length header tells the decoder exactly where the next string ends.

For example, encoding ["hello", "world"] produces "5#hello5#world". Encoding ["a#b", "cd"] produces "3#a#bcd2#cd" — the "#" inside "a#b" is not a problem because the decoder reads the leading 3 and consumes exactly three characters after the next "#", yielding "a#b" correctly. The outer "#" separating the length from the body is found by position, not by scanning for any "#" character.

The encoder is a simple loop: for each string s, append str(len(s)) + "#" + s to a result buffer. In Python this is "".join(str(len(s)) + "#" + s for s in strs). The time complexity is O(n) where n is the total number of characters across all strings. Space complexity is O(n) for the output string.

1Initialize an empty result string or buffer
2For each string s in the input list:
3Compute the length: length = len(s)
4Append the prefix: result += str(length) + "#"
5Append the string body: result += s
6After all strings, return the result string

Decoding Process: Read Length, Then Slice

The decoder maintains a pointer i starting at position 0. At each step, it scans forward from i to find the next "#" character. The substring from i to the "#" position is the decimal length of the next string. Parse it as an integer to get length. The string body starts at the character immediately after "#" and runs for exactly length characters.

After reading the string body, advance i to the position just after the last character of the body: i = j + 1 + length, where j is the index of "#". Append the extracted string to the result list and repeat until i reaches the end of the encoded string. Return the result list.

The critical insight is that we always locate "#" by scanning forward from the current position i, not by searching the entire remaining string. Because we immediately parse the length and skip exactly that many characters, the content of the string body never influences parsing of subsequent strings.

ℹ️

The "#" Between Length and Content Is Safe

The "#" delimiter between length and content is safe because we read a fixed number of chars after it, even if the content contains "#". When decoding, we call s.find("#", i) to locate the first "#" at or after position i. We parse the length from s[i:j], then take s[j+1 : j+1+length] as the string body. If the body is "3#hi", we consume exactly 4 chars after the "#" and the embedded "#" is just data.

Edge Cases: Empty Strings, Empty List, Special Characters

An empty string "" has length 0, so it encodes as "0#". The decoder reads "0", finds "#", and then slices 0 characters — correctly recovering the empty string. An input list of ["", "", ""] encodes as "0#0#0#" and decodes back to three empty strings. The length-prefix scheme handles empty strings without any special case code.

An empty input list encodes as "" — the empty string. The decoder starts with i = 0, immediately sees that i >= len(s), and returns an empty list without entering the loop. This is the correct behavior and requires no special handling. The encode-decode round trip works for all list lengths from zero upward.

Strings containing "#", digits, spaces, newlines, Unicode code points, and null bytes all encode correctly. The length is always measured in characters (or bytes, depending on the language), and the decoder slices exactly that many units. There is no character in the string body that can be mistaken for a length separator because the length header is always at a known offset from the previous string's end.

1Empty string "" encodes as "0#" — length 0 followed by "#" followed by zero chars
2Empty list [] encodes as "" — the empty string; decodes back to []
3String with "#" inside, e.g. "a#b", encodes as "3#a#b" — length 3 protects the inner "#"
4String with digits, e.g. "42", encodes as "2#42" — no conflict with length prefix
5String with newline, e.g. "a\nb", encodes as "3#a\nb" — newline is just another character
6Unicode string, e.g. "caf\u00e9", encodes correctly if length is measured in code points

Code Walkthrough: Python and Java

In Python, encode is one line: return "".join(str(len(s)) + "#" + s for s in strs). Decode uses a while loop: initialize i = 0 and res = []. While i < len(s): find j = s.find("#", i), parse length = int(s[i:j]), append s[j+1:j+1+length] to res, then set i = j + 1 + length. Return res. The entire decode function is about 8 lines. Both encode and decode run in O(n) time where n is the total number of characters across all strings.

In Java, encode builds a StringBuilder: for each string str, append str.length() + "#" + str. Decode uses a similar pointer loop: int i = 0; List<String> res = new ArrayList<>(). While i < s.length(): find int j = s.indexOf('#', i); parse int len = Integer.parseInt(s.substring(i, j)); add s.substring(j + 1, j + 1 + len) to res; set i = j + 1 + len. Return res. Both Java and Python solutions achieve O(n) time complexity and O(1) extra space beyond the output.

The overall space complexity is O(n) for the encoded string in encode, and O(n) for the decoded list in decode — both are unavoidable since the output must contain all the input characters. There is no intermediate data structure beyond the output itself and the pointer variable i.

⚠️

Use str.find("#", i) Not split("#") for Decoding

Use str.find("#", i) not split("#") for decoding since the content itself may contain "#". Calling s.split("#") on an encoded string like "3#a#b2#cd" produces ["3", "a", "b2", "cd"] — four pieces, not two — and loses the structure entirely. Always use find with a start offset equal to the current pointer position i so you locate only the length-body separator, not any "#" inside a string body.

Encode Decode Strings LeetCode 271 Guide

Problem Overview: Serialize and Recover a String List

Why Simple Delimiters Fail

Length-Prefix Encoding: The Algorithm

Decoding Process: Read Length, Then Slice

Edge Cases: Empty Strings, Empty List, Special Characters

Code Walkthrough: Python and Java

Ready to master algorithm patterns?

Encode Decode Strings LeetCode 271 Guide

Problem Overview: Serialize and Recover a String List

Why Simple Delimiters Fail

Length-Prefix Encoding: The Algorithm

Decoding Process: Read Length, Then Slice

Edge Cases: Empty Strings, Empty List, Special Characters

Code Walkthrough: Python and Java

Related String Design and Encoding Problems

Ready to master algorithm patterns?