Union Find Patterns — Complete Guide

Master the path compression and union by rank template that makes Union Find nearly O(1) amortized. Learn exactly when Union Find beats BFS/DFS — dynamic connectivity, cycle detection, and Kruskal's MST — and how to extend the structure for weighted relationships and evaluate division problems.

11 min read

What Is Union Find

Union Find — also called Disjoint Set Union (DSU) — is a data structure that tracks a collection of elements partitioned into disjoint (non-overlapping) connected components. It answers one fundamental question efficiently: are two elements in the same component? And it supports one mutation: merge two components together.

The two core operations are find(x), which returns the representative root of the component containing x, and union(x, y), which merges the components containing x and y into a single component. After applying all union operations for a graph's edges, elements with the same find() root are connected; elements with different roots are in separate components.

With the two key optimizations — path compression and union by rank — Union Find achieves near O(1) amortized time per operation, making it one of the most efficient data structures for connectivity queries on dynamic graphs.

In LeetCode interviews, Union Find is the go-to data structure whenever a problem involves grouping elements into connected components dynamically — for example, determining the number of connected components after each edge is added in a streaming graph. Recognizing this signal is the first step to mastering Union Find LeetCode problems: if the graph is built incrementally and connectivity must be queried at each step, Union Find will almost always be faster and simpler than repeated BFS or DFS traversals.

  • find(x) — returns the root (representative) of the component containing x
  • union(x, y) — merges the components containing x and y into one
  • connected(x, y) — returns true if find(x) == find(y)
  • O(n) initialization — parent[i] = i, rank[i] = 0 for all i
  • Near O(1) amortized per operation with both optimizations applied
  • Tracks number of components: start at n, decrement on each successful union

Path Compression

Without any optimization, find(x) walks up the parent chain from x to the root — a chain that can degenerate to O(n) depth if unions always attach nodes in a line. Path compression fixes this by making every node on the path from x to the root point directly to the root during the find traversal. Future find calls on those same nodes then reach the root in a single step.

The implementation is a one-line recursive change: find(x) = if parent[x] != x: parent[x] = find(parent[x]); return parent[x]. On the recursive return, each node's parent pointer is updated to the root. This flattens the tree incrementally — after enough find calls, the tree becomes nearly flat and subsequent finds are O(1).
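That one-line change can be sketched as follows, assuming a plain `parent` list where every node starts as its own parent:

```python
def find(parent, x):
    # Walk up until the root (a node that is its own parent),
    # re-pointing every node on the path directly at the root.
    if parent[x] != x:
        parent[x] = find(parent, parent[x])
    return parent[x]

# Example: the chain 0 <- 1 <- 2 <- 3 is flattened by a single find.
parent = [0, 0, 1, 2]
root = find(parent, 3)
print(root)    # 0
print(parent)  # [0, 0, 0, 0] — every node on the path now points at the root
```

After this one call, any future `find` on nodes 1–3 reaches the root in a single step.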

Path compression alone gives O(log n) amortized time per operation, while union by rank alone is what caps the tree height at O(log n) — compression only flattens paths as they are traversed. Combined with union by rank, the amortized cost drops to O(alpha(n)) — the inverse Ackermann function — which is effectively constant for any input size that exists in practice.

💡

Path Compression + Union by Rank = O(alpha(n)) ≈ O(1)

Path compression alone gives O(log n) amortized per operation. Combined with union by rank it achieves O(alpha(n)) — the inverse Ackermann function — which is less than 5 for any n that could ever be stored in physical memory. For all practical purposes, both optimizations together make find and union effectively O(1).

Union by Rank

Union by rank prevents the degenerate case where repeated unions create a linear chain of n nodes — a scenario that makes find O(n). The idea is simple: always attach the root of the shorter tree under the root of the taller tree. The "rank" of a node approximates its subtree height. When two trees of equal rank merge, the resulting tree is one rank taller. When a smaller-rank tree merges into a larger-rank tree, the larger tree's rank does not change.

The union operation with rank: first find the roots a = find(x) and b = find(y). If a == b they are already connected — return false. Otherwise compare ranks: if rank[a] < rank[b], attach a under b (parent[a] = b). If rank[a] > rank[b], attach b under a (parent[b] = a). If ranks are equal, attach b under a and increment rank[a] by one. Return true to signal a successful merge.

Union by rank ensures that the tree height stays at O(log n) even without path compression. The combination of both optimizations is why Union Find is so powerful: path compression flattens trees aggressively during find, and union by rank prevents trees from growing tall in the first place. Together they create a self-balancing structure that requires almost no extra bookkeeping.

  1. Step 1 — Find roots: a = find(x), b = find(y)
  2. Step 2 — If a == b, already connected — return false (no merge needed)
  3. Step 3 — If rank[a] < rank[b]: parent[a] = b (attach shorter tree under taller)
  4. Step 4 — If rank[a] > rank[b]: parent[b] = a (attach shorter tree under taller)
  5. Step 5 — If rank[a] == rank[b]: parent[b] = a, then rank[a] += 1
  6. Step 6 — Decrement component count by 1, return true
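The steps above can be sketched as a standalone `union` function (paired with the path-compressing `find`, and using a swap so the higher-rank root is always the attachment point):

```python
def find(parent, x):
    if parent[x] != x:
        parent[x] = find(parent, parent[x])  # path compression
    return parent[x]

def union(parent, rank, x, y):
    # Returns True on a successful merge, False if already connected.
    a, b = find(parent, x), find(parent, y)
    if a == b:
        return False                # Step 2: same root, no merge
    if rank[a] < rank[b]:
        a, b = b, a                 # ensure a is the higher-rank root
    parent[b] = a                   # Steps 3-4: shorter tree under taller
    if rank[a] == rank[b]:
        rank[a] += 1                # Step 5: equal ranks grow by one
    return True

n = 4
parent, rank, components = list(range(n)), [0] * n, n
for x, y in [(0, 1), (2, 3), (1, 2)]:
    if union(parent, rank, x, y):
        components -= 1             # Step 6: decrement per merge
print(components)  # 1
```

Note the swap trick: it collapses the two "attach shorter under taller" branches into one line without changing the logic.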

When to Use Union Find vs BFS/DFS

The core question is whether the graph is static or dynamic. BFS and DFS require the full graph to be built upfront: you iterate over all nodes and edges in a fixed structure. Union Find shines when edges arrive one at a time — you process each edge as a union operation and query connectivity at any point without rebuilding the graph. This makes Union Find the natural choice for problems that add edges incrementally.

For cycle detection, Union Find is often the cleanest approach: before performing union(x, y), check if find(x) == find(y). If they are already in the same component, adding this edge creates a cycle. This is the basis of Kruskal's Minimum Spanning Tree algorithm, which processes edges in sorted order and unions them unless they would create a cycle. BFS/DFS-based cycle detection works but requires traversal state tracking.
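A minimal sketch of that cycle check in the Redundant Connection style — the first edge whose endpoints already share a root is the one that closes a cycle (1-indexed nodes as in LeetCode 684; union by rank omitted for brevity):

```python
def first_cycle_edge(n, edges):
    # Returns the first edge that would create a cycle, or None.
    parent = list(range(n + 1))      # 1-indexed nodes

    def find(x):
        if parent[x] != x:
            parent[x] = find(parent[x])
        return parent[x]

    for x, y in edges:
        a, b = find(x), find(y)
        if a == b:
            return (x, y)            # endpoints already connected: cycle
        parent[a] = b                # union the two components
    return None

print(first_cycle_edge(4, [(1, 2), (1, 3), (2, 3), (1, 4)]))  # (2, 3)
```

Kruskal's algorithm is the same loop run over edges sorted by weight, keeping each edge that does not trigger the cycle branch.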

Use BFS when you need shortest-path distances, level-order processing, or multi-source spreading (like rotting oranges or walls and gates). Use DFS when you need traversal order, recursion depth, or connectivity in a fully-known graph. Use Union Find when the problem is purely about connectivity or component membership, especially when edges are processed one at a time or the problem asks to detect the moment two components become connected.

ℹ️

The Key Signal for Union Find: Dynamic Edge Processing

Reach for Union Find when the problem processes edges one at a time and asks about connectivity or components at each step. If the problem says "as each edge/connection is added, find the number of components" or "find the first edge that creates a cycle," that is a Union Find problem. BFS/DFS require the full graph upfront; Union Find handles edges incrementally in near O(1) per edge.

Weighted Union Find

Weighted (or value-labeled) Union Find extends the basic structure to track a weight or ratio between each node and its parent. Instead of parent[x] storing only the root, the structure stores both the root and a cumulative weight representing the relationship from x to the root. On path compression, weights are composed along the path so the direct root pointer carries the total weight from x to root.

The classic application is "evaluate division" (LeetCode 399): given equations like a/b = 2.0 and b/c = 3.0, answer queries like a/c = ? Union x and y with weight val (meaning x/y = val). The find operation returns (root, weight_to_root) where weight_to_root is the product of all weights on the path from x to root. To answer x/y, compute find(x) and find(y); if they share a root, the answer is weight_x / weight_y.

Weighted Union Find also appears in problems involving relative differences or offsets between elements, such as checking a set of constraints of the form x - y = d for consistency. The key insight is that path compression still works — you just need to compose the weights along the path during compression. This keeps the near-O(1) amortized performance while enabling powerful relational queries.

  1. Step 1 — Store weight[x] = ratio from x to parent[x] (initialize weight[x] = 1.0)
  2. Step 2 — find(x): if parent[x] != x, recurse; on return, compose weights: weight[x] *= weight[parent[x]]; parent[x] = root
  3. Step 3 — find returns (root, weight_to_root) where weight_to_root = product of weights from x to root
  4. Step 4 — union(x, y, val): find roots rx, ry with weights wx, wy; set parent[rx] = ry; weight[rx] = wy / wx * val
  5. Step 5 — Query x/y: if find(x).root != find(y).root return -1 (no relation); else return find(x).weight / find(y).weight
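The steps above can be sketched for evaluate-division-style queries. This uses dict-based maps so variables can be arbitrary strings; the names `add`, `union`, and `query` are illustrative, not a fixed API:

```python
parent, weight = {}, {}

def add(x):
    if x not in parent:
        parent[x] = x
        weight[x] = 1.0          # Step 1: ratio of x to its parent

def find(x):
    # Steps 2-3: compress the path, composing weights so that
    # weight[x] ends up as the ratio x / root.
    if parent[x] != x:
        root = find(parent[x])
        weight[x] *= weight[parent[x]]
        parent[x] = root
    return parent[x]

def union(x, y, val):
    # Step 4: record the equation x / y = val.
    add(x); add(y)
    rx, ry = find(x), find(y)
    if rx != ry:
        parent[rx] = ry
        weight[rx] = weight[y] / weight[x] * val

def query(x, y):
    # Step 5: answer x / y, or -1.0 if the variables are unrelated.
    if x not in parent or y not in parent or find(x) != find(y):
        return -1.0
    return weight[x] / weight[y]

union("a", "b", 2.0)    # a / b = 2.0
union("b", "c", 3.0)    # b / c = 3.0
print(query("a", "c"))  # 6.0
```

Because `find` fully composes the weights before `union` reads them, `weight[x]` and `weight[y]` are already ratios to their respective roots, which is what Step 4's formula requires.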

Template Code — Python and Java

The complete Union Find template in Python is roughly 20 lines. Initialize parent as a list where parent[i] = i, rank as a list of zeros, and a component count starting at n. The find method uses recursion with path compression. The union method compares ranks and decrements the component counter on a successful merge. The connected method is a one-liner calling find on both elements.
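One way to write that template — both optimizations plus the component counter, in roughly 20 lines:

```python
class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))   # each node starts as its own root
        self.rank = [0] * n
        self.count = n                 # number of components

    def find(self, x):
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])  # path compression
        return self.parent[x]

    def union(self, x, y):
        a, b = self.find(x), self.find(y)
        if a == b:
            return False
        if self.rank[a] < self.rank[b]:
            a, b = b, a                # keep a as the higher-rank root
        self.parent[b] = a             # union by rank
        if self.rank[a] == self.rank[b]:
            self.rank[a] += 1
        self.count -= 1
        return True

    def connected(self, x, y):
        return self.find(x) == self.find(y)

uf = UnionFind(5)
uf.union(0, 1); uf.union(3, 4)
print(uf.count, uf.connected(0, 1), uf.connected(1, 3))  # 3 True False
```

One caveat: the recursive `find` can hit Python's default recursion limit on very deep chains; an iterative two-pass version avoids that if a problem has n in the hundreds of thousands.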

In Java, the same structure maps to a class with int[] parent and int[] rank fields, initialized in the constructor. The find and union methods are direct translations of the Python logic. Java's lack of tuple return means find returns only the root integer; weight tracking for weighted Union Find requires a separate double[] weight array and a custom two-value return via a helper object or an int[] with encoded values.

Both implementations give O(n) initialization, O(alpha(n)) amortized per find and union, and O(n) space. The template is compact enough to write from memory in an interview in under two minutes once practiced. The most important detail is the path compression line inside find — without it, the structure degrades to O(log n) or worse.

⚠️

Always Initialize parent[i] = i and rank[i] = 0

The most common Union Find bug is forgetting to initialize parent[i] = i for every node i. If parent[i] is uninitialized (or left at 0), find(i) will incorrectly report that i belongs to the component of node 0 rather than its own component. Always initialize every element as its own parent with rank 0 before processing any edges.

When Should You Use Union Find in LeetCode?

Union Find is the right choice when a problem involves dynamic connectivity — specifically when edges or connections are added one at a time and you need to know the component structure after each addition. If the problem says "as each edge is added, how many components remain?" or "find the earliest moment two nodes become connected," that is a Union Find problem by design.

The three core LeetCode scenarios where Union Find outperforms BFS/DFS are: cycle detection in undirected graphs (Redundant Connection, #684), counting connected components as a graph is built incrementally (Number of Islands II, #305), and minimum spanning tree construction via Kruskal's algorithm (Min Cost to Connect All Points, #1584). In each case, the key advantage is that Union Find processes each edge in near O(1) amortized time without re-traversing the entire graph.

As a rule of thumb: if the full graph is known upfront and you need path information or distances, use BFS or DFS. If components must be tracked across a sequence of union operations, reach for Union Find. This distinction is tested directly in senior-level LeetCode rounds and system design discussions about distributed connectivity algorithms.

💡

Quick Decision Rule

Edges added one at a time + connectivity queries = Union Find. Full static graph + shortest path = BFS/DFS. When in doubt, check whether the problem processes edges incrementally — that is the clearest Union Find signal.

Problem Roadmap

Start with "number of provinces" (LeetCode 547) — it is the canonical Union Find introduction. You are given an adjacency matrix; union all connected pairs and count the remaining components. The answer is the component counter after all unions. This problem requires no optimizations to pass but teaches the basic structure.
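That counting pattern can be sketched on a small adjacency matrix — a plain parent array with path compression is enough at this scale (the function name `count_provinces` is illustrative):

```python
def count_provinces(is_connected):
    n = len(is_connected)
    parent = list(range(n))

    def find(x):
        if parent[x] != x:
            parent[x] = find(parent[x])
        return parent[x]

    count = n
    for i in range(n):
        for j in range(i + 1, n):    # upper triangle: matrix is symmetric
            if is_connected[i][j]:
                a, b = find(i), find(j)
                if a != b:
                    parent[a] = b
                    count -= 1       # one fewer component per merge
    return count

# Cities 0 and 1 are connected; city 2 stands alone -> 2 provinces.
print(count_provinces([[1, 1, 0], [1, 1, 0], [0, 0, 1]]))  # 2
```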

For medium problems, "redundant connection" (LeetCode 684) is the cycle detection template: process edges in order; the first edge where both endpoints share a root is the redundant edge. "Accounts merge" (LeetCode 721) requires mapping string emails to integer IDs before applying Union Find. "Number of islands II" (LeetCode 305) is the dynamic connectivity problem: islands are added one at a time and you report component count after each addition.

For hard problems, "swim in rising water" (LeetCode 778) uses Union Find on a grid where cells become passable as the water level rises — binary search on the answer plus Union Find for reachability. "Evaluate division" (LeetCode 399) is the weighted Union Find template. Both are significantly harder than typical medium problems and are good stretch goals for senior-level interview preparation.

  • Easy: Number of Provinces (547) — canonical Union Find; count components from adjacency matrix
  • Medium: Redundant Connection (684) — first edge where find(a) == find(b) before union
  • Medium: Accounts Merge (721) — map emails to IDs, union by shared account, reconstruct groups
  • Medium: Number of Islands II (305) — dynamic connectivity; union each new island cell with neighbors
  • Hard: Swim in Rising Water (778) — binary search + Union Find for grid reachability
  • Hard: Evaluate Division (399) — weighted Union Find tracking ratios between variables

Ready to master algorithm patterns?

YeetCode flashcards help you build pattern recognition through active recall and spaced repetition.

Start practicing now