CBG-5220: Freezable cluster compat version #8236

Draft
bbrks wants to merge 5 commits into CBG-5266 from CBG-5220

bbrks commented May 6, 2026

CBG-5220

Adds the ability to freeze the current cluster compatibility version (CCV), pinning the cluster to that version so the CCV does not roll forwards.
This allows supported downgrades/rollbacks even after every node in the cluster has been upgraded, because the upgraded nodes remain pinned behind the frozen version.

REST API changes:

  • GET /_cluster_compat_version returns the cluster-wide version, per-node versions, and the frozen value if set.
  • POST /_cluster_compat_version/freeze pins the version to the current value to preserve rollback capability across upgrades.
  • POST /_cluster_compat_version/unfreeze clears the frozen version.
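A minimal client-side sketch of how an operator script might call these endpoints. The three paths and the `frozen_cluster_compat_version` key come from this PR; everything else is an assumption for illustration: the helper names, the `ccvState` type, the 200 status on freeze, the `"X.Y"` placeholder value, and the in-memory stub server that stands in for a real node's admin API.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"sync"
)

// ccvState is a hypothetical, partial view of the GET response. Only the
// frozen_cluster_compat_version key is taken from this PR; the remaining
// response fields are not shown here, so they are left out.
type ccvState struct {
	Frozen json.RawMessage `json:"frozen_cluster_compat_version,omitempty"`
}

// isFrozen reports whether a frozen value is present and not JSON null.
func (s ccvState) isFrozen() bool {
	return len(s.Frozen) > 0 && string(s.Frozen) != "null"
}

// postAdmin sends a body-less POST to an admin path and returns the status code.
func postAdmin(client *http.Client, baseURL, path string) (int, error) {
	resp, err := client.Post(baseURL+path, "application/json", nil)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	_, _ = io.Copy(io.Discard, resp.Body)
	return resp.StatusCode, nil
}

// getState fetches and decodes the current compat version state.
func getState(client *http.Client, baseURL string) (ccvState, error) {
	var st ccvState
	resp, err := client.Get(baseURL + "/_cluster_compat_version")
	if err != nil {
		return st, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return st, fmt.Errorf("unexpected status %d", resp.StatusCode)
	}
	err = json.NewDecoder(resp.Body).Decode(&st)
	return st, err
}

// newStubAdmin is an in-memory stand-in so the sketch runs without a real
// node. Its responses are invented and do not reflect the real implementation.
func newStubAdmin() *httptest.Server {
	var (
		mu     sync.Mutex
		frozen bool
	)
	setFrozen := func(v bool) http.HandlerFunc {
		return func(w http.ResponseWriter, r *http.Request) {
			if r.Method != http.MethodPost {
				http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
				return
			}
			mu.Lock()
			frozen = v
			mu.Unlock()
			w.WriteHeader(http.StatusOK)
		}
	}
	mux := http.NewServeMux()
	mux.HandleFunc("/_cluster_compat_version/freeze", setFrozen(true))
	mux.HandleFunc("/_cluster_compat_version/unfreeze", setFrozen(false))
	mux.HandleFunc("/_cluster_compat_version", func(w http.ResponseWriter, r *http.Request) {
		mu.Lock()
		defer mu.Unlock()
		if frozen {
			_, _ = io.WriteString(w, `{"frozen_cluster_compat_version":"X.Y"}`)
			return
		}
		_, _ = io.WriteString(w, `{}`)
	})
	return httptest.NewServer(mux)
}

func main() {
	srv := newStubAdmin()
	defer srv.Close()

	status, err := postAdmin(srv.Client(), srv.URL, "/_cluster_compat_version/freeze")
	if err != nil {
		panic(err)
	}
	st, err := getState(srv.Client(), srv.URL)
	if err != nil {
		panic(err)
	}
	fmt.Println("freeze status:", status, "frozen:", st.isFrozen())
}
```

Against a real deployment the base URL would point at a node's admin interface, and the caller would need credentials with the required admin permission.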

Dependencies (if applicable)

Integration Tests

bbrks changed the title from "Cbg 5220" to "CBG-5220: Freezable cluster compat version" on May 6, 2026
bbrks and others added 3 commits May 6, 2026 15:56
Documents three new admin endpoints under /_cluster_compat_version:
GET returns the cluster-wide version, per-node versions, and the
frozen value if set; POST /freeze pins the version to the current
value to preserve rollback capability across upgrades; POST /unfreeze
clears the freeze. Adds ClusterCompatVersionState response schema and
RegistryFreeze record on GatewayRegistry.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Adds an admin-controlled freeze for the cluster compatibility version,
allowing an operator to pin the reported version to its current value
across rolling upgrades and preserve the option to roll back a node.

Storage: GatewayRegistry gains a Frozen *RegistryFreeze field stored
per bucket; the cluster-wide freeze is the aggregate (any bucket frozen
means the cluster is held back). New CAS-safe SetRegistryFreeze and
ClearRegistryFreeze methods on bootstrapContext mirror the existing
node-registration helpers.

Manager: clusterCompatManager tracks the auto-computed live-node
minimum and the aggregate freeze separately. ClusterCompatVersion()
reports the frozen value when set, otherwise the auto minimum.
Refresh and RegisterBucket pick up the freeze record from each tracked
registry. Freeze fans out to all tracked buckets and is success-on-any
(safe direction); Unfreeze is success-on-all and returns the residual
freeze on partial failure.
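The manager semantics described above can be sketched as follows. This is an illustrative model under stated assumptions, not the PR's code: the `version` type, `fakeBucket`, and all function names are hypothetical, and taking the lowest frozen value as the aggregate when buckets disagree is a guess that the commit message does not confirm.

```go
package main

import (
	"errors"
	"fmt"
)

// version is a hypothetical major.minor pair; the real type is not shown in this PR.
type version struct{ major, minor int }

func (v version) less(o version) bool {
	return v.major < o.major || (v.major == o.major && v.minor < o.minor)
}

// fakeBucket stands in for one tracked bucket registry.
type fakeBucket struct {
	name       string
	freeze     *version
	failWrites bool // simulates an unreachable bucket
}

func (b *fakeBucket) setFreeze(v version) error {
	if b.failWrites {
		return fmt.Errorf("bucket %s: write failed", b.name)
	}
	b.freeze = &v
	return nil
}

func (b *fakeBucket) clearFreeze() error {
	if b.failWrites {
		return fmt.Errorf("bucket %s: write failed", b.name)
	}
	b.freeze = nil
	return nil
}

// aggregateFreeze returns nil when no bucket is frozen; otherwise the lowest
// frozen value (assumption: "lowest wins" when buckets disagree).
func aggregateFreeze(buckets []*fakeBucket) *version {
	var agg *version
	for _, b := range buckets {
		if b.freeze == nil {
			continue
		}
		if agg == nil || b.freeze.less(*agg) {
			v := *b.freeze
			agg = &v
		}
	}
	return agg
}

// effectiveVersion reports the frozen value when set, otherwise the auto minimum.
func effectiveVersion(autoMin version, frozen *version) version {
	if frozen != nil {
		return *frozen
	}
	return autoMin
}

// freezeAll is success-on-any: one frozen bucket is enough to hold the cluster back.
func freezeAll(buckets []*fakeBucket, v version) error {
	var errs []error
	for _, b := range buckets {
		if err := b.setFreeze(v); err != nil {
			errs = append(errs, err)
		}
	}
	if len(errs) == len(buckets) { // also covers the zero-bucket case
		return fmt.Errorf("freeze failed on all %d buckets: %v", len(buckets), errors.Join(errs...))
	}
	return nil
}

// unfreezeAll is success-on-all: on any failure it returns the residual freeze.
func unfreezeAll(buckets []*fakeBucket) (*version, error) {
	var errs []error
	for _, b := range buckets {
		if err := b.clearFreeze(); err != nil {
			errs = append(errs, err)
		}
	}
	if len(errs) > 0 {
		return aggregateFreeze(buckets), errors.Join(errs...)
	}
	return nil, nil
}

func main() {
	buckets := []*fakeBucket{{name: "a"}, {name: "b"}}
	autoMin := version{major: 2, minor: 0}
	if err := freezeAll(buckets, version{major: 1, minor: 0}); err != nil {
		panic(err)
	}
	fmt.Println("effective:", effectiveVersion(autoMin, aggregateFreeze(buckets)))
	buckets[1].failWrites = true
	residual, err := unfreezeAll(buckets)
	fmt.Println("residual set:", residual != nil, "err:", err)
}
```

The asymmetry follows the "safe direction" reasoning in the commit message: a partially applied freeze still holds the cluster back, whereas a partially applied unfreeze leaves the cluster frozen, so the caller has to be told.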

REST: three new admin endpoints under /_cluster_compat_version (GET,
POST /freeze, POST /unfreeze), DevOps-permission gated. Unfreeze
returns the current state in a 503 body when partially applied.
Three new audit events cover the read and state-changing operations.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

- Align GatewayRegistry's Frozen JSON tag with the OpenAPI spec
  (frozen_cluster_compat_version).
- Audit unfreeze attempts unconditionally so partial failures still
  produce an audit trail.
- Verify each REST endpoint emits its audit event in the round-trip
  test by wiring it through the EE audit-logging test harness.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
bbrks self-assigned this on May 6, 2026