Skip to content

CBG-5355: fix panic for on demand get when ErrImportCancelled is returned from import#8225

Open
gregns1 wants to merge 3 commits intomainfrom
CBG-5355
Open

CBG-5355: fix panic for on demand get when ErrImportCancelled is returned from import#8225
gregns1 wants to merge 3 commits intomainfrom
CBG-5355

Conversation

@gregns1
Copy link
Copy Markdown
Contributor

@gregns1 gregns1 commented May 5, 2026

CBG-5355

When ErrImportCancelled is encountered in importDoc we swallow the error and return silently nil doc but also nil error. This results in us returning nil, nil for OnDemandImportForGet inside GetDocSyncData which can lead to panic when returning doc.SyncData.

GetDocSyncData is called from buildRevokedFeed which when hitting this panic can bring down the whole process via FatalPanicHandler.

Pre-review checklist

  • Removed debug logging (fmt.Print, log.Print, ...)
  • Logging sensitive data? Make sure it's tagged (e.g. base.UD(docID), base.MD(dbName))
  • Updated relevant information in the API specifications (such as endpoint descriptions, schemas, ...) in docs/api

Dependencies (if applicable)

  • Link upstream PRs
  • Update Go module dependencies when merged

Integration Tests

Copilot AI review requested due to automatic review settings May 5, 2026 15:18
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes a panic in the on-demand import-for-GET path when an import is cancelled (via ErrImportCancelled) and importDoc ends up returning (nil, nil), which previously could lead to a nil dereference when GetDocSyncData returns doc.SyncData. This addresses a crash scenario reachable from revocation feed processing (buildRevokedFeed) that can otherwise take down the process via FatalPanicHandler.

Changes:

  • Add a defensive doc == nil check after OnDemandImportForGet in GetDocSyncData, treating the situation as not found.
  • Add a regression test that reproduces the CAS-mismatch retry + resurrection race that can trigger ErrImportCancelled mid-flight.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

File Description
db/crud.go Prevents nil dereference by handling the (nil, nil) import-cancelled outcome as not-found during GetDocSyncData on-demand import.
db/import_test.go Adds a targeted regression test to reproduce the import-cancel race and verify it returns an error instead of panicking.

Comment thread db/crud.go
Comment on lines 193 to +201
if importErr != nil {
return emptySyncData, importErr
}
// importDoc swallows ErrImportCancelled (e.g. SG purge race), returning
// (nil, nil). Treat that the same as not found.
if doc == nil {
base.DebugfCtx(ctx, base.KeyImport, "Unable to import doc %q during on demand import for get - will be treated as not found.", base.UD(docid))
return emptySyncData, base.ErrNotFound
}
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Went for this approach as I feel this is the less risky approach considering this is candidate for backport into maintenance releases. Other callers correctly guard against this

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you make sure we have a followup ticket to fix this? I see that the other case of OnDemandImportForGet is covered but this is a footgun.

I actually don't even think we need to log for this case.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment thread db/import_test.go
Comment thread db/import_test.go Outdated
Comment thread db/import_test.go Outdated
Copy link
Copy Markdown
Collaborator

@torcolvin torcolvin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code looks good, just some questions about the tests.

Comment thread db/import_test.go
return
}
resurrectOnce.Do(func() {
err := docDatastore.SetRaw(key, 0, nil, []byte(`{"foo":"resurrected"}`))
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SetRaw behaves differently for rosmar and this would get marked as a binary document, I am surprised this fails in the same way, can you verify if this is the case and this doesn't hit https://jira.issues.couchbase.com/browse/CBG-4496?

Comment thread db/import_test.go
// Ensure GetDocSyncData will handle a nil doc returned from an on-demand import event when ErrImportCancelled is returned.
_, err = collection.GetDocSyncData(ctx, docID)
require.Error(t, err, "expected an error when import is cancelled mid-flight, not a panic")
require.True(t, base.IsDocNotFoundError(err), "expected a document-not-found error after import is cancelled mid-flight, got: %v", err)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit, take or leave:

Suggested change
require.True(t, base.IsDocNotFoundError(err), "expected a document-not-found error after import is cancelled mid-flight, got: %v", err)
base.RequireDocNotFoundError(t, err)

Comment thread db/import_test.go
if key != docID || !importTriggered {
return
}
resurrectOnce.Do(func() {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this get hit multiple times and need a sync.Once? Is it better to just have a boolean to say, only resurrect once? I'm OK either way, I am just curious why this is necessary and trying to put myself in the shoes of future me debugging these tests.

Is it worth making sure that this code does get hit with a an assertion that this code does actually get hit, in the case that the write method is changed from WriteUpdateWithXattrs to Update or something like that? I recently debugged some code where it turned out the callback wasn't getting hit and it was relying on different behavior?

Comment thread db/import_test.go
// the body. Execution falls through to isDelete && doc.GetRevTreeID() == "", which fires and returns ErrImportCancelled.
// 5. importDoc's switch has no return statement in the ErrImportCancelled case, so it
// falls through to return docOut, nil with docOut==nil.
func TestGetDocSyncDataPanicOnImportCancelled(t *testing.T) {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I greatly prefer if you don't include the word Panic in the test name so it is easier to find a real panic when searching test output.

Do we need to do a similar test for feed import, or do you think this is covered?

@torcolvin torcolvin assigned gregns1 and unassigned torcolvin May 6, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants