Problem Walkthrough

Alien Dictionary LeetCode 269 Complete

An extended deep dive into LeetCode 269 Alien Dictionary covering every edge case — prefix invalidity, single words, and identical inputs — with a side-by-side BFS Kahn's algorithm vs DFS three-state coloring comparison and a rigorous cycle detection proof using both approaches.

10 min read|

Alien Dictionary

LeetCode 269 Complete

Problem Overview — Deriving Character Order from a Sorted Word List

LeetCode 269 Alien Dictionary presents you with a list of words sorted in lexicographic order according to an alien language. Your task is to derive the ordering of characters in that alien alphabet and return any valid ordering as a string. If no valid ordering exists — because the constraints form a cycle — return an empty string. If the input is valid but multiple orderings exist, any one of them is accepted.

The core idea is to treat character ordering as a graph problem. By comparing adjacent pairs of words, you can extract directed edges: if word[i] and word[i+1] first differ at position k, then word[i][k] comes before word[i+1][k] in the alien alphabet. These edges form a directed graph, and a valid character ordering is a topological sort of that graph. If the graph has a cycle, no valid ordering exists.

This problem combines string comparison, graph construction, and topological sorting into a single pipeline. Candidates who have not seen it before often get stuck building the graph correctly — particularly missing edge cases during word comparison — and then separately stumble on cycle detection. This guide walks through every layer methodically.

  • 1 <= words.length <= 100
  • 1 <= words[i].length <= 100
  • All characters are lowercase English letters
  • Return a valid character ordering string, or "" if none exists
  • Any valid ordering is accepted when multiple orderings are possible
  • All unique characters must appear in the output string

Edge Cases That Break Solutions — Prefix Invalidity and Single Words

The most dangerous edge case is prefix invalidity: if a longer word appears before a shorter word that is a prefix of it, the input is invalid and you must return an empty string. For example, if "abc" appears before "ab" in the sorted list, no character ordering can explain this — "ab" should always come before "abc" in lexicographic order regardless of the alphabet. Many solutions incorrectly skip this check and produce a wrong answer without a cycle.

Single-word inputs are degenerate but valid. When the list contains only one word, you can extract no ordering constraints from adjacent word comparisons. Every character in that word is part of the alien alphabet and can appear in any order in your output. The correct response is to return all unique characters from the single word in any order.

All identical words present a similar situation: no edges are created because no adjacent pair ever produces a differing character. Every unique character is valid with no ordering constraint. Empty strings in the input are edge cases depending on the problem variant — treat them as words with no characters that contribute no edges and no unique characters to the output.

💡

Most Commonly Missed Edge Case — Check Prefix Invalidity Before Building the Graph

Always check if a shorter word appears after a longer word that it is a prefix of. If words[i] starts with words[i+1] but words[i].length > words[i+1].length, the ordering is impossible — return empty string immediately. This check must happen during the word comparison loop before any edge is added to the graph. Skipping it produces solutions that pass most test cases but fail on this specific invalid input, which LeetCode 269 includes as a hidden test.

Building the Graph — Extracting Directed Edges from Adjacent Word Pairs

To build the character ordering graph, iterate through every adjacent pair of words. For each pair, find the first position where the characters differ. The character from the first word at that position must come before the character from the second word in the alien alphabet — add a directed edge from char_a to char_b. Stop comparing after the first differing character; subsequent characters give no ordering information.

A critical implementation detail: initialize the graph and in-degree map with ALL unique characters from ALL words, not just characters that appear in edges. Characters that are present in words but never appear as the first differing character in any adjacent pair have no ordering constraints. They still need to appear in the output — initializing them in the graph ensures they are included even with zero in-degree edges.

The resulting directed graph has at most 26 nodes (one per lowercase letter) and at most one edge per adjacent word pair. Multiple word pairs may contribute edges between the same pair of characters — store edges as a set to avoid duplicate edges, which would corrupt in-degree counts in BFS or cause redundant DFS visits.

  1. 1Collect all unique characters from all words; initialize graph adjacency set and in-degree map for each
  2. 2For i from 0 to len(words) - 2, compare words[i] and words[i+1]
  3. 3Find min length of the two words; iterate character by character
  4. 4At first differing position k: add edge words[i][k] → words[i+1][k] if not already present; increment in-degree of words[i+1][k]; break
  5. 5If no differing character found but words[i] is longer than words[i+1]: return "" immediately (prefix invalidity)
  6. 6After all pairs: graph and in-degree map are ready for topological sort

BFS Kahn's Algorithm — Queue-Based Topological Sort with Cycle Detection

Kahn's algorithm performs topological sort using BFS. Start by enqueuing all characters with in-degree zero — these have no prerequisites and can appear first in the ordering. Process the queue: for each dequeued character, append it to the result string, then decrement the in-degree of every character it points to. When a neighbor's in-degree drops to zero, enqueue it.

Cycle detection is built into the process: if the queue empties before all characters have been appended to the result, the remaining characters are part of a cycle. A cycle means some subset of characters each requires another character in the subset to come first — an impossibility. In this case return an empty string. If the result length equals the total number of unique characters, the topological sort succeeded.

BFS naturally produces a valid character ordering because it processes characters in dependency order — no character is added to the result until all characters that must precede it have already been processed. The ordering is not unique in general; when multiple characters have in-degree zero simultaneously, the queue processes them in an arbitrary order, all of which are valid.

ℹ️

BFS Detects Cycles Simultaneously with Ordering — No Separate Cycle Check Needed

BFS Kahn's algorithm naturally produces a valid ordering and detects cycles simultaneously. If the queue empties before all characters are processed, the remaining characters form a cycle — they all have in-degree greater than zero, meaning each one depends on another in the same stuck group. The cycle check is simply: if len(result) != total_unique_chars, return "". No explicit cycle detection logic is needed beyond checking the final result length.

DFS with Three-State Coloring — WHITE, GRAY, BLACK Cycle Detection

The DFS approach uses three-state node coloring to detect cycles and build the ordering simultaneously. WHITE means unvisited, GRAY means currently being explored (in the DFS call stack), and BLACK means fully explored. Initialize all nodes to WHITE. For each unvisited node, launch a DFS.

During DFS, mark the current node GRAY before visiting neighbors. If a neighbor is already GRAY, a back edge has been found — a cycle exists, return empty string. If a neighbor is WHITE, recurse into it. If a neighbor is BLACK, it is already fully processed; skip it. After all neighbors are fully explored, mark the current node BLACK and append it to the result.

Because nodes are appended after all their dependencies are explored (post-order), the result is built in reverse topological order. Reverse the result at the end to obtain a valid ordering. The three-state coloring correctly handles all directed graph topologies including disconnected graphs and graphs with nodes that have no edges — every node is visited exactly once across all DFS calls.

  1. 1Initialize color map: all unique chars → WHITE; result list = []
  2. 2Define dfs(char): mark char GRAY
  3. 3For each neighbor of char: if GRAY → cycle detected, return False; if WHITE → recurse, propagate False on cycle
  4. 4Mark char BLACK; append char to result; return True
  5. 5For each unique char: if WHITE, call dfs(char); return "" if cycle detected
  6. 6Return "".join(reversed(result))

Code Walkthrough — Python and Java BFS and DFS Implementations

Python BFS implementation uses collections.defaultdict for the adjacency list and collections.deque for the queue. Graph is initialized with all unique characters. Adjacent word pairs populate edges using a set to avoid duplicates. In-degree counts drive the BFS. The solution is approximately 20 lines. Time complexity O(C) where C is the total number of characters across all words — each character is processed once in word comparison, and the graph has at most 26 nodes and at most 26*25 edges.

Python DFS implementation uses a color dictionary initialized to 0 (WHITE) for all chars, 1 (GRAY) during exploration, 2 (BLACK) when done. A nested dfs() function returns False on cycle detection. The result list is reversed after all DFS calls. Both BFS and DFS have identical asymptotic complexity O(C) for graph building plus O(V+E) for traversal where V <= 26 and E <= 26*25, so the bottleneck is always graph construction.

Java implementations mirror the Python logic using HashMap and ArrayDeque for BFS, and HashMap with enum State {WHITE, GRAY, BLACK} for DFS. Java enum states make the coloring explicit and type-safe, reducing the chance of off-by-one bugs in state comparison. Both languages benefit from using a Set to store edges during graph construction to prevent duplicate in-degree increments.

⚠️

Initialize the Graph with ALL Unique Characters — Not Just Characters in Edges

Characters that appear in words but have no ordering constraints — because they never appear as the first differing character in any adjacent word pair — must still appear in the output string. If you only add characters to the graph when creating edges, isolated characters are silently dropped from the result. Always initialize the graph node set and in-degree map by scanning every character of every word before comparing adjacent pairs. This ensures characters with zero in-degree and zero out-degree are still enqueued in BFS or visited in DFS.

Ready to master algorithm patterns?

YeetCode flashcards help you build pattern recognition through active recall and spaced repetition.

Start practicing now