Conversation
There was a problem hiding this comment.
Pull request overview
Fixes a panic in the on-demand import-for-GET path when an import is cancelled (via ErrImportCancelled) and importDoc ends up returning (nil, nil), which previously could lead to a nil dereference when GetDocSyncData returns doc.SyncData. This addresses a crash scenario reachable from revocation feed processing (buildRevokedFeed) that can otherwise take down the process via FatalPanicHandler.
Changes:
- Add a defensive
doc == nilcheck afterOnDemandImportForGetinGetDocSyncData, treating the situation as not found. - Add a regression test that reproduces the CAS-mismatch retry + resurrection race that can trigger
ErrImportCancelledmid-flight.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| db/crud.go | Prevents nil dereference by handling the (nil, nil) import-cancelled outcome as not-found during GetDocSyncData on-demand import. |
| db/import_test.go | Adds a targeted regression test to reproduce the import-cancel race and verify it returns an error instead of panicking. |
| if importErr != nil { | ||
| return emptySyncData, importErr | ||
| } | ||
| // importDoc swallows ErrImportCancelled (e.g. SG purge race), returning | ||
| // (nil, nil). Treat that the same as not found. | ||
| if doc == nil { | ||
| base.DebugfCtx(ctx, base.KeyImport, "Unable to import doc %q during on demand import for get - will be treated as not found.", base.UD(docid)) | ||
| return emptySyncData, base.ErrNotFound | ||
| } |
There was a problem hiding this comment.
Went for this approach as I feel this is the less risky approach considering this is candidate for backport into maintenance releases. Other callers correctly guard against this
There was a problem hiding this comment.
Can you make sure we have a followup ticket to fix this? I see that the other case of OnDemandImportForGet is covered but this is a footgun.
I actually don't even think we need to log for this case.
There was a problem hiding this comment.
torcolvin
left a comment
There was a problem hiding this comment.
Code looks good, just some questions about the tests.
| return | ||
| } | ||
| resurrectOnce.Do(func() { | ||
| err := docDatastore.SetRaw(key, 0, nil, []byte(`{"foo":"resurrected"}`)) |
There was a problem hiding this comment.
SetRaw behaves differently for rosmar and this would get marked as a binary document, I am surprised this fails in the same way, can you verify if this is the case and this doesn't hit https://jira.issues.couchbase.com/browse/CBG-4496?
| // Ensure GetDocSyncData will handle a nil doc returned from an on-demand import event when ErrImportCancelled is returned. | ||
| _, err = collection.GetDocSyncData(ctx, docID) | ||
| require.Error(t, err, "expected an error when import is cancelled mid-flight, not a panic") | ||
| require.True(t, base.IsDocNotFoundError(err), "expected a document-not-found error after import is cancelled mid-flight, got: %v", err) |
There was a problem hiding this comment.
nit, take or leave:
| require.True(t, base.IsDocNotFoundError(err), "expected a document-not-found error after import is cancelled mid-flight, got: %v", err) | |
| base.RequireDocNotFoundError(t, err) |
| if key != docID || !importTriggered { | ||
| return | ||
| } | ||
| resurrectOnce.Do(func() { |
There was a problem hiding this comment.
Why does this get hit multiple times and need a sync.Once? Is it better to just have a boolean to say, only resurrect once? I'm OK either way, I am just curious why this is necessary and trying to put myself in the shoes of future me debugging these tests.
Is it worth making sure that this code does get hit with a an assertion that this code does actually get hit, in the case that the write method is changed from WriteUpdateWithXattrs to Update or something like that? I recently debugged some code where it turned out the callback wasn't getting hit and it was relying on different behavior?
| // the body. Execution falls through to isDelete && doc.GetRevTreeID() == "", which fires and returns ErrImportCancelled. | ||
| // 5. importDoc's switch has no return statement in the ErrImportCancelled case, so it | ||
| // falls through to return docOut, nil with docOut==nil. | ||
| func TestGetDocSyncDataPanicOnImportCancelled(t *testing.T) { |
There was a problem hiding this comment.
I greatly prefer if you don't include the word Panic in the test name so it is easier to find a real panic when searching test output.
Do we need to do a similar test for feed import, or do you think this is covered?
CBG-5355
When
ErrImportCancelledis encountered inimportDocwe swallow the error and return silently nil doc but also nil error. This results in us returning nil, nil forOnDemandImportForGetinsideGetDocSyncDatawhich can lead to panic when returning doc.SyncData.GetDocSyncDatais called frombuildRevokedFeedwhich when hitting this panic can bring down the whole process viaFatalPanicHandler.Pre-review checklist
fmt.Print,log.Print, ...)base.UD(docID),base.MD(dbName))docs/apiDependencies (if applicable)
Integration Tests