
pgBackRest async archiver silently drops timeline history file during dataSource bootstrap, causing replicas to permanently fail pg_rewind #4472


Description

@hors

Overview

When a PostgresCluster is bootstrapped from a dataSource.pgbackrest restore, postgres promotes from timeline 1 to timeline 2 and immediately tries to archive 00000002.history via archive_command. At that moment the pgBackRest stanza (archive.info) does not yet exist — the operator creates it later in its reconcile loop. pgBackRest's async archiver silently drops the push with error 103 and removes the spool entry. postgres considers the file archived and never retries. The history file is permanently absent from the archive.
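
Under async archiving, archive_command returns success as soon as the spool entry is queued, which is why postgres never retries. A quick way to confirm this mode on the primary (pod name and exact output are illustrative):

kubectl exec -n <namespace> <primary-pod> -c database -- \
  psql -Atc 'show archive_command'
# prints something like: pgbackrest --stanza=db archive-push "%p"
# With archive-async=y, this command only queues a spool entry and exits 0;
# the background worker performs the actual push later.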

Without 00000002.history in the archive, pg_rewind cannot reconstruct the full timeline chain during any future PITR restore and fails with:

could not find common ancestor of the source and target cluster's timelines

Replica pods remain stuck at 2/4 containers Ready indefinitely.

Environment

  • Platform: GKE
  • Platform Version: 1.35.3-gke.1522000
  • PGO Image Tag: ubi9-5.8.7-0
  • Postgres Version: 18
  • Storage: pd-ssd (GKE standard SSD persistent disk)

Steps to Reproduce

REPRO

  1. Create a source PostgresCluster with a pgBackRest S3/object-store repo and wait for stanzaCreated: true
  2. Write some data and trigger a full backup; wait for the backup job to complete
  3. Create a second PostgresCluster with dataSource.pgbackrest pointing to the source cluster's repo (this triggers a bootstrap restore)
  4. Wait for the restored cluster to reach Ready and stanzaCreated: true
  5. Check whether 00000002.history is present in the pgBackRest archive:
PRIMARY=$(kubectl get pod -n <namespace> \
  -l postgres-operator.crunchydata.com/cluster=<restored-cluster>,postgres-operator.crunchydata.com/role=master \
  -o jsonpath='{.items[0].metadata.name}')

kubectl exec -n <namespace> "${PRIMARY}" -c database -- \
  pgbackrest --stanza=db archive-get "00000002.history" "/tmp/00000002.history.check"
echo "exit code: $?"

EXPECTED

archive-get exits 0 and downloads 00000002.history — the timeline history file is present in the archive.

ACTUAL

archive-get exits non-zero — 00000002.history is missing from the archive. This is a race condition; it does not reproduce on every run. When it does not reproduce, stanza-create happened to run before the async archiver background worker attempted the push.

The root cause sequence:

postgres restore completes
  └─ postgres promotes TL1 → TL2
       └─ archive_command: pgbackrest archive-push 00000002.history
            └─ async mode: spool entry queued, exit 0 returned to postgres ✓
                 └─ background archiver runs: archive-push → ERROR 103 (archive.info missing)
                      └─ spool entry dropped — postgres never retries ✗

(later) reconcileStanzaCreate → stanza-create → archive.info created
  └─ 00000002.history is permanently missing from the archive

Once the history file is missing, any subsequent PITR restore of this cluster will leave all replicas permanently stuck:

could not find common ancestor of the source and target cluster's timelines

Replica pods never reach Ready (2/4 containers).
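
The stuck replicas are visible directly in pod status (labels match those used in the repro above):

kubectl get pods -n <namespace> \
  -l postgres-operator.crunchydata.com/cluster=<restored-cluster>
# replica pods stay at READY 2/4 and never progress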

Logs

pgBackRest async archiver log on the primary pod (captured from /tmp/pgbackrest/archive-push-async.log):

-------------------PROCESS START-------------------
P00   INFO: archive-push async start
P00   INFO: push 1 WAL file(s) to archive: 00000002.history
P01 DETAIL: pushed WAL file '00000002.history' to the archive

In the failing case, the archiver instead logs error 103:

P01  ERROR: [103]: unable to open archive file '00000002.history' for write:
             raised from remote-0 protocol on '...': archive.info does not exist

pgBackRest archive status confirming the file is missing:

kubectl exec -n <namespace> "${PRIMARY}" -c database -- \
  pgbackrest --stanza=db info
# The archive section shows WAL starts at 000000020000000000000001,
# but 00000002.history is absent.
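
The absence can also be confirmed by listing the archive path in the repo with pgBackRest's repo-ls command; the archive path below assumes stanza db, PostgreSQL 18, and db-id 1, so adjust it to match the info output:

kubectl exec -n <namespace> "${PRIMARY}" -c database -- \
  pgbackrest --stanza=db repo-ls archive/db/18-1 --filter='\.history$'
# failing case: no output; healthy case: 00000002.history is listed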

Additional Information

The bug is non-deterministic — it is a race between the async archiver background worker and reconcileStanzaCreate. When the cluster is under load or the object store is slow, the window widens and the bug reproduces more reliably.

Workaround: Set archive-async = n in the pgBackRest global configuration. This forces synchronous archiving so postgres retries on failure instead of silently dropping the spool entry. This has a performance cost for normal WAL archiving.
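
Applied through the operator, the workaround is roughly the following patch (spec.backups.pgbackrest.global is the v5 pass-through for pgBackRest options; verify the path against your CRD version):

kubectl patch postgrescluster <restored-cluster> -n <namespace> --type merge \
  -p '{"spec":{"backups":{"pgbackrest":{"global":{"archive-async":"n"}}}}}'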

Proposed fix: After a successful stanza-create, immediately re-push any *.history files found in $PGDATA/pg_wal using --no-archive-async. This is idempotent — if the file is already in the archive the push exits 0; if it was dropped by the race it is recovered. The call site is reconcileStanzaCreate in internal/controller/postgrescluster/pgbackrest.go, immediately after StanzaCreateOrUpgrade returns success and before stanzaCreated: true is written to status.
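
As a sketch, the re-push the fix would perform is equivalent to running the following on the primary after stanza-create succeeds (flags and paths are assumptions, not taken from the operator source):

kubectl exec -n <namespace> "${PRIMARY}" -c database -- bash -c '
  shopt -s nullglob
  for f in "${PGDATA}"/pg_wal/*.history; do
    # idempotent: exits 0 if the file is already in the archive
    pgbackrest --stanza=db --no-archive-async archive-push "$f"
  done'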
