Distributed Mode¶
For databases with large tables or high write throughput, pg2iceberg can scale the materializer horizontally — one process runs the WAL writer, and N processes run materializer workers that share the work across tables.
See Run Modes for the flags to enable this.
Architecture¶
┌────────────────────────┐
│ pg2iceberg │ ┌──────────────────────────┐
│ stream-only │────►│ pg2iceberg │
│ │ │ materializer-only │
│ WAL writer only │ │ --worker-id worker-1 │
│ No materializer cycle │ │ (tables: orders, │
└────────────────────────┘ │ products) │
│ ├──────────────────────────┤
S3 + _pg2iceberg │ pg2iceberg │
coordination ───────────────│ materializer-only │
│ --worker-id worker-2 │
│ (tables: users, │
│ payments) │
└──────────────────────────┘
All workers share the same leaderless log (S3 + _pg2iceberg schema). There is no central coordinator process — workers coordinate entirely through PostgreSQL.
Table assignment¶
At the start of each materializer cycle, every worker independently computes the same deterministic assignment without any locking:
- Register a heartbeat in the
consumertable (TTL: 30 seconds) - Query
consumerfor all active workers, sorted alphabetically by worker ID - Sort table names alphabetically
- Assign tables via round-robin:
table[i] → workers[i % len(workers)]
Because both lists are sorted the same way on every worker, all workers reach identical conclusions about who owns which table — no coordination message needed.
Example with 2 workers and 4 tables:
| Table | Worker |
|---|---|
orders |
worker-1 |
payments |
worker-2 |
products |
worker-1 |
users |
worker-2 |
Rebalancing¶
When a worker joins or leaves, the consumer table changes. Every worker detects this on the next cycle and recomputes the assignment. Tables rebalance automatically — no manual intervention needed.
A worker leaving (crash or graceful shutdown) is detected within one heartbeat TTL (30 seconds). Its tables are picked up by the remaining workers on the next cycle.
Cursor isolation¶
mat_cursor is keyed by (group_name, table_name) — workers within the same group share cursors per table because the deterministic round-robin assignment guarantees only one worker materializes any given table per cycle. There's no worker_id in the cursor key; ownership is recomputed at each cycle's start, not persisted.
Commit isolation¶
Two workers shouldn't ever process the same table on the same cycle (the assignment is deterministic), but transient races during rebalance are possible if one worker hasn't yet noticed a peer dropped out. In that window:
- The catalog's
assert-ref-snapshot-idrequirement on Iceberg's transaction commit makes the loser's commit return a conflict — the catalog rejects the second commit. - The losing worker's
set_cursordoesn't run because the commit failed; it retries cleanly on the next cycle.
In practice this only happens transiently during rebalancing.