Overview¶
pg2iceberg has two operating modes:
Logical replication mode (recommended) decouples WAL capture from Iceberg writes. The flusher captures row changes from PostgreSQL and stages them as Parquet files in S3. The materializer runs on a separate interval, reads the staged files, and merges them into Iceberg using merge-on-read semantics. This split means the replication slot LSN can be advanced quickly — the hot write path never touches the Iceberg catalog.
Query mode polls PostgreSQL with watermark-based SELECT queries and writes directly to Iceberg. Simpler, but cannot detect hard deletes and has no transaction semantics.
┌─────────────────────────────────────────────────────┐
│ pg2iceberg │
│ │
PostgreSQL ──WAL──►│ Flusher ──staged Parquet──► Materializer ──► │──► Iceberg
│ (S3 + PG coord) │
└─────────────────────────────────────────────────────┘
The two components communicate through a staged log — append-only, offset-indexed Parquet files in S3 with a lightweight index in PostgreSQL (_pg2iceberg.log_index). Same offset-claim primitive (Coordinator::claim_offsets) is the durability gate for both single-process and distributed deployments.
In single-process mode (pg2iceberg run), the WAL writer and materializer share the same process and coord — staged files still upload to S3 but the materializer reads them via the coord cursor as in distributed mode. There is no in-memory hand-off shortcut; profiling didn't justify the added complexity.
End-to-end flow¶
sequenceDiagram
participant App as Application
participant PG as PostgreSQL
participant F as Flusher
participant S3s as S3 (staging)
participant Coord as _pg2iceberg
participant Mat as Materializer
participant Cat as Iceberg Catalog
participant S3i as S3 (Iceberg)
App->>PG: INSERT INTO orders ...
PG-->>F: BEGIN
PG-->>F: INSERT (row data)
PG-->>F: INSERT (row data)
PG-->>F: COMMIT
Note over F: Serialize buffered events to Parquet
F->>S3s: Upload staged.parquet
F->>Coord: ClaimOffsets (log_seq++, insert log_index)
F->>Coord: set_flushed_lsn (durable record for tamper detection)
F->>PG: Standby status (WALFlushPosition = staged LSN)
Note over PG: WAL before staged LSN can now be recycled
Note over Mat: Materializer interval fires
Mat->>Coord: GetCursor(group, table) → offset 42
Mat->>Coord: Read log_index where offset > 42
Coord-->>Mat: LogEntry (s3=staged.parquet, offsets 43–1042)
Mat->>S3s: Download staged.parquet
Note over Mat: Fold: dedup by PK
Mat->>S3i: Upload data files (Parquet)
Mat->>S3i: Upload manifests
Mat->>Cat: CommitTransaction
Cat-->>Mat: OK (snapshot committed)
Mat->>Coord: SetCursor(offset 1042)
Distributed mode
The same flow describes both single-process and distributed deployments. In distributed mode (pg2iceberg stream-only + N pg2iceberg materializer-only), each materializer worker takes a deterministic round-robin slice of the table list — see distributed.md for the assignment rules and rebalancing semantics.