Add separate PVC for tiered storage cache (#239)
Pre-Flight Review
| Concern | Status |
|---|---|
| Crash safety / re-entrancy | Correct — state written before replacement PVC created |
| Terminating pod = absent | Correct |
| Grow path (no special handling) | Correct |
| CC + log.dirs exclusion | Correct |
| CRD backward compatibility | omitempty on all new fields |
Findings
HIGH — Fix before merging
C1 — deleteRemovedCachePVCs deletes a mounted PVC
pkg/resources/kafka/kafka.go
When a TieredStorageCache: true storage config is removed while the broker pod is still running and no CacheVolumeStates entry exists for that mount path, handleBrokerCacheResizeCleanup returns early (len(cacheVolumeStates) == 0), then deleteRemovedCachePVCs runs unconditionally and issues a Delete against the actively-mounted PVC. The PVC-protection finalizer prevents immediate data loss, but the delete is issued unintentionally.
cleanupOrphanedDuplicateCachePVCs (correctly) gates on !brokerPodExists. deleteRemovedCachePVCs must do the same:
```go
if !brokerPodExists {
	if err := r.deleteRemovedCachePVCs(ctx, log, brokerId, pvcList, desiredMountPaths); err != nil {
		return err
	}
}
```

MEDIUM — Strongly recommended
C4 — Missing test: crash re-entry with no replacement PVC
pkg/resources/kafka/kafka_test.go
No test covers: initialCacheVolumeState = CacheResizePendingDeletion, only the old PVC exists (reconciler crashed between writing state and creating replacement), pod running. Expected: no Delete, no state clear, one Create (replacement re-staged). The invariant is described in comments but untested.
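The missing scenario can be sketched as a table-free state-machine check. This is an illustrative model, not the operator's real reconciler: `fakeClient` and `reconcileCacheVolume` are hypothetical stand-ins that encode only the invariant the review describes (pending-deletion state persisted, replacement absent, pod running ⇒ one Create, no Delete, state retained).

```go
package main

import "fmt"

// Simplified stand-ins for the reconciler's state machine; names mirror the
// review's description and are illustrative, not the operator's real types.
type cacheResizeState string

const cacheResizePendingDeletion cacheResizeState = "CacheResizePendingDeletion"

type fakeClient struct {
	creates, deletes int
	stateCleared     bool
}

// reconcileCacheVolume models the expected crash re-entry behavior:
// pending-deletion state already persisted, only the old PVC exists,
// broker pod still running.
func reconcileCacheVolume(c *fakeClient, state cacheResizeState, replacementExists, podRunning bool) {
	if state != cacheResizePendingDeletion {
		return
	}
	if !replacementExists {
		c.creates++ // re-stage the replacement PVC
	}
	if podRunning {
		return // old PVC stays mounted; state must not be cleared
	}
	c.deletes++
	c.stateCleared = true
}

func main() {
	c := &fakeClient{}
	reconcileCacheVolume(c, cacheResizePendingDeletion, false, true)
	fmt.Println(c.creates, c.deletes, c.stateCleared) // expect: 1 0 false
}
```

A real test would drive the operator's fake client through the same three observations (Create count, Delete count, status map contents) after a simulated crash between the status write and the replacement Create.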
A1 — Empty-string sentinel for map deletion is an implicit contract
pkg/k8sutil/status.go:182
CacheResizeState("") silently deletes map entries in generateBrokerState. The established pattern is DeleteVolumeStatus. An uninitialized CacheResizeState variable passed by a future caller would delete entries with no compile error. Consider a named constant (CacheResizeStateNone) or a dedicated DeleteCacheVolumeStatus helper.
A2 — "tieredStorageCache" annotation key duplicated as bare string 9+ times
pkg/resources/kafka/kafka.go, pvc.go, kafka_test.go
A typo at any read site silently treats a cache PVC as a regular data volume. Define a named constant (e.g., TieredStorageCacheAnnotation = "tieredStorageCache") following the BrokerIdLabelKey precedent.
A5 — Tiered-cache PVC guard predicate duplicated at 3 sites
pkg/resources/kafka/kafka.go:1298, 1326, 1339
`pvc.DeletionTimestamp != nil || pvc.Annotations["tieredStorageCache"] != annotationTrue` appears three times. Extract to a helper (isTieredCachePVC).
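The extraction could look like the sketch below. `pvc` is a stripped-down stand-in for corev1.PersistentVolumeClaim carrying only the fields the guard inspects; the constant also addresses A2:

```go
package main

import (
	"fmt"
	"time"
)

// Constants the review recommends introducing (see A2); values mirror the
// bare strings currently used at the three call sites.
const (
	tieredStorageCacheAnnotation = "tieredStorageCache"
	annotationTrue               = "true"
)

// pvc is a minimal stand-in for corev1.PersistentVolumeClaim.
type pvc struct {
	DeletionTimestamp *time.Time
	Annotations       map[string]string
}

// isTieredCachePVC is the positive predicate; the three guard sites then
// read: if p.DeletionTimestamp != nil || !isTieredCachePVC(p) { continue }
func isTieredCachePVC(p pvc) bool {
	return p.Annotations[tieredStorageCacheAnnotation] == annotationTrue
}

func main() {
	cache := pvc{Annotations: map[string]string{tieredStorageCacheAnnotation: "true"}}
	data := pvc{Annotations: map[string]string{}}
	fmt.Println(isTieredCachePVC(cache), isTieredCachePVC(data)) // expect: true false
}
```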
SE1 — PVC deletion gated only on mutable annotations, no OwnerReference check
pkg/resources/kafka/kafka.go — deleteRemovedCachePVCs, cleanupOrphanedDuplicateCachePVCs
The List selector (LabelsForKafka + BrokerIdLabelKey) and annotation filter (tieredStorageCache, mountPath) have no OwnerReference verification. A principal with patch access to PVCs in the namespace could craft a matching PVC to trigger deletion. Add an ownership guard before deleting in both helpers.
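One shape the ownership guard could take, using simplified stand-ins for metav1.OwnerReference and the PVC's ObjectMeta (field names follow the Kubernetes API shape; the helper name is hypothetical):

```go
package main

import "fmt"

// ownerRef mirrors the relevant fields of metav1.OwnerReference.
type ownerRef struct {
	Kind string
	UID  string
}

type pvcMeta struct {
	OwnerReferences []ownerRef
}

// isOwnedByKafkaCluster gates deletion on an OwnerReference, which only the
// API server sets reliably at create time, rather than on mutable labels and
// annotations that any principal with patch access can forge.
func isOwnedByKafkaCluster(meta pvcMeta, clusterUID string) bool {
	for _, ref := range meta.OwnerReferences {
		if ref.Kind == "KafkaCluster" && ref.UID == clusterUID {
			return true
		}
	}
	return false
}

func main() {
	legit := pvcMeta{OwnerReferences: []ownerRef{{Kind: "KafkaCluster", UID: "abc"}}}
	crafted := pvcMeta{} // matching labels/annotations but no owner reference
	fmt.Println(isOwnedByKafkaCluster(legit, "abc"), isOwnedByKafkaCluster(crafted, "abc"))
}
```

Both deletion helpers would check this predicate (against the reconciled KafkaCluster's UID) before issuing any Delete.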
SE2 — Annotation backfill enables two-step privilege escalation
pkg/resources/kafka/kafka.go:1608
A KafkaCluster CR writer (without direct PVC delete access) can: (1) set TieredStorageCache: true on an existing data volume's storage config — operator backfills the annotation; (2) remove the storage config entry — deleteRemovedCachePVCs deletes the now-annotated data PVC, bypassing the CC disk-removal workflow. Consider restricting backfill to ClaimPending PVCs or adding a validation webhook.
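The first suggested mitigation (restrict backfill to unbound claims) can be sketched as a single predicate. This is an assumed decision function, not the operator's real code; it encodes the idea that a PVC which has ever been Bound as a data volume must not be retroactively annotated into the cache-deletion path:

```go
package main

import "fmt"

type claimPhase string

const (
	claimPending claimPhase = "Pending"
	claimBound   claimPhase = "Bound"
)

// shouldBackfillCacheAnnotation allows the backfill only for claims that have
// not yet bound, so a CR edit cannot re-label an existing data PVC and then
// delete it via deleteRemovedCachePVCs, bypassing the CC disk-removal workflow.
func shouldBackfillCacheAnnotation(phase claimPhase, alreadyAnnotated bool) bool {
	return !alreadyAnnotated && phase == claimPending
}

func main() {
	fmt.Println(shouldBackfillCacheAnnotation(claimPending, false)) // new cache PVC: true
	fmt.Println(shouldBackfillCacheAnnotation(claimBound, false))   // existing data PVC: false
}
```

The webhook alternative enforces the same rule at admission time instead, rejecting a KafkaCluster update that flips TieredStorageCache on a storage config whose PVC is already Bound.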
D1 — Rollback mid-resize leaves two PVCs indefinitely with no recovery path
docs/tiered-storage-pvc-resize.md
If the operator is rolled back while pending-deletion state is set, the old operator ignores the unknown field — two PVCs coexist at the same mount path with no automatic recovery and no visible signal in kubectl describe kafkacluster. Add a "Rollback safety" section with detection steps and a manual recovery procedure.
D2 — No Kubernetes Events for resize lifecycle
pkg/resources/kafka/kafka.go
The entire resize lifecycle is visible only in operator logs. Emit record.Event(...) at minimum when pending-deletion is first written and when cleanup completes. This is the standard on-call triage interface; without it, a broker restart from a resize is indistinguishable from a crash without operator log access.
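A sketch of the two minimum emission points, with a stub in place of record.EventRecorder (the real operator would call its recorder's Event method with the KafkaCluster object and an event type; reasons and messages here are illustrative):

```go
package main

import "fmt"

// recorder is a stand-in for record.EventRecorder.
type recorder struct{ events []string }

func (r *recorder) Event(reason, msg string) {
	r.events = append(r.events, reason+": "+msg)
}

// Emitted when pending-deletion state is first written: explains an upcoming
// broker restart before it happens.
func markPendingDeletion(r *recorder, broker, mountPath string) {
	r.Event("CacheResizePendingDeletion",
		fmt.Sprintf("broker %s: replacement cache PVC staged for %s; old PVC will be deleted when the pod stops", broker, mountPath))
}

// Emitted when cleanup completes: confirms the resize finished.
func markCleanupComplete(r *recorder, broker, mountPath string) {
	r.Event("CacheResizeComplete",
		fmt.Sprintf("broker %s: old cache PVC for %s deleted", broker, mountPath))
}

func main() {
	r := &recorder{}
	markPendingDeletion(r, "0", "/kafka-cache")
	markCleanupComplete(r, "0", "/kafka-cache")
	for _, e := range r.events {
		fmt.Println(e)
	}
}
```

With these two events, `kubectl describe kafkacluster` distinguishes a resize-driven broker restart from a crash without requiring operator log access.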
LOW — Optional improvements
| ID | Finding | Location |
|---|---|---|
| C3 | Add unit test for C1's scenario (config removed, pod running, no prior state) | kafka_test.go |
| A3 | annotationTrue used for both annotation values and Kafka config comparisons — split or inline in configmap.go | kafka.go:72 |
| A4 | skipBroker bool return from handleBrokerCacheResizeCleanup couples helper to caller's loop structure | kafka.go:360 |
| S3 | Two new //nolint:funlen lack rationale comments | kafka.go:1402, 1552 |
| D4 | UpdateBrokerStatus(ConfigOutOfSync) called every cycle during rolling restart; guard may miss due to stale in-memory state | kafka.go:~1653 |
| D6 | Pre-cleanup in e2e uses 10-min tsResizePhaseTimeout for a step that should complete in seconds | test_tiered_storage_cache_resize.go:228 |
Rejected Findings
| ID | Reason |
|---|---|
| C2 | getCreatedPvcForBroker deduplication target is irrelevant — reconcileDesiredPvcsForBroker independently identifies the correct PVC by size |
| A6 | Per-PVC r.List in inner loop is pre-existing; PR doesn't worsen it |
| SE3 | Log verbosity preference; mount paths/PVC names are standard operational metadata |
| D3 | SETUP_ENVTEST_VERSION pin has explicit rationale comment; intentionally excluded from renovate |
| D5 | Namespace set explicitly on the line before requireDeleteKafkaCluster; code comment documents the pattern |
Add tiered storage cache as a separate PVC
Summary
The tiered storage cache is now provisioned on its own PVC, separate from the Kafka log volumes. On a shrink, the replacement PVC is created immediately (provisioning overlaps with gate evaluation), and the old PVC is deleted once the broker pod stops. A grow continues to use the standard Kubernetes in-place expansion path. Full details in docs/tiered-storage-pvc-resize.md.
Test plan