CS-10103: clamp error_doc on persistence to dodge jsonb 256 MiB array limit #4564

Merged
habdelra merged 2 commits into main from
cs-10103-error-doc-exceeded-column-limit
Apr 29, 2026

Conversation

@habdelra
Contributor

Summary

Fixes CS-10103: the production Sentry error in which boxel_index.error_doc upserts fail with "total size of jsonb array elements exceeds the maximum of 268435455 bytes". Postgres jsonb stores child offsets in 28 bits, so the elements of a single jsonb array can total at most 268435455 bytes (just under 256 MiB). That is a format-level constraint baked into jsonb, not a column setting we can raise.
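The number in the error message follows directly from the 28-bit offset width. A quick arithmetic check (the constant names here are illustrative, not identifiers from Postgres or this PR):

```typescript
// Width of the offset field described above.
const JSONB_OFFSET_BITS = 28;

// Largest value that fits in 28 bits; this is the figure in the Postgres error.
const JSONB_MAX_ARRAY_BYTES = 2 ** JSONB_OFFSET_BITS - 1;

const MIB = 1024 * 1024;

console.log(JSONB_MAX_ARRAY_BYTES); // 268435455
console.log(JSONB_MAX_ARRAY_BYTES === 256 * MIB - 1); // true
```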

The triggering pattern in production: dependency-error fan-out repeatedly copies each dep row's full error_doc (nested additionalErrors and all) into its parent during indexing. Across many indexing cycles in a realm with chained errors, the additionalErrors tree compounds without bound and eventually hits the jsonb limit on upsert.
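To make the compounding concrete, here is a toy model of the pattern, not the real propagator code. It assumes two rows whose errors reference each other, and a hypothetical copyDepsIntoParent helper that copies each dep's full doc into the parent on every cycle:

```typescript
// Simplified stand-in for an error_doc; the real SerializedError has more fields.
interface ToyErrorDoc {
  message: string;
  additionalErrors: ToyErrorDoc[] | null;
}

// Toy propagation step (hypothetical): the parent receives a deep copy of
// each dep's doc, nested additionalErrors included.
function copyDepsIntoParent(parent: ToyErrorDoc, deps: ToyErrorDoc[]): ToyErrorDoc {
  return {
    message: parent.message,
    additionalErrors: deps.map(
      (dep) => JSON.parse(JSON.stringify(dep)) as ToyErrorDoc,
    ),
  };
}

const byteLength = (doc: ToyErrorDoc): number =>
  new TextEncoder().encode(JSON.stringify(doc)).length;

// Two rows that depend on each other, re-indexed over several cycles.
let a: ToyErrorDoc = { message: 'a failed', additionalErrors: null };
let b: ToyErrorDoc = { message: 'b failed', additionalErrors: null };
const sizes: number[] = [];
for (let cycle = 0; cycle < 10; cycle++) {
  a = copyDepsIntoParent(a, [b]);
  b = copyDepsIntoParent(b, [a]);
  sizes.push(byteLength(a));
}

console.log(sizes); // every entry is larger than the previous one
```

In this toy the growth is linear per cycle because each row has a single dep; with several deps per row the copied tree would grow much faster.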

What this changes

  • New clampSerializedError in runtime-common/error.ts. It is the chokepoint that runs on every error_doc going into the database (both boxel_index.error_doc via IndexWriter.normalizeErrorDoc and modules.error_doc via CachingDefinitionLookup.writeToDatabaseCache).
  • In normal operation it is a pure pass-through. When an error_doc already serializes under ERROR_DOC_MAX_BYTES (8 MiB — 32× under the jsonb limit), the input is returned unchanged. Nothing is dropped.
  • When over budget it sheds structure progressively, stopping as soon as the doc fits:
    1. truncate each additionalErrors[i].stack to ~64 KiB
    2. truncate the top-level stack to ~64 KiB
    3. truncate each additionalErrors[i].message to ~16 KiB
    4. collapse nested additionalErrors of inherited entries (drop only the second level of nesting)
    5. cap the array length to 200 with a sentinel entry recording how many were omitted
    6. drop additionalErrors entirely with a sentinel — last resort
    7. aggressively shrink the top-level stack/message if even the bare envelope is still too big
  • Top-level id, status, title, deps, source, diagnostics, isCardError are never touched — they're scalars/small.
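The shedding order above can be sketched as an ordered list of steps applied until the doc fits. This is a minimal illustration, not the code in runtime-common/error.ts: SketchError, clampSketch, the sentinel shape, the truncation marker, and the 4 KiB limits in the last step are all assumptions, while the 8 MiB, 64 KiB, 16 KiB, and 200 figures come from the description.

```typescript
interface SketchError {
  message: string;
  stack?: string;
  additionalErrors?: SketchError[] | null;
  [key: string]: unknown; // id, status, title, deps, ... pass through untouched
}

const KIB = 1024;
const MAX_BYTES = 8 * KIB * KIB; // stands in for ERROR_DOC_MAX_BYTES
const MAX_ENTRIES = 200; // stands in for ERROR_DOC_MAX_ADDITIONAL_ERRORS

const sizeOf = (doc: unknown): number =>
  new TextEncoder().encode(JSON.stringify(doc)).length;

// Truncates by string length, so the byte limits are only approximate.
const cut = (s: string, max: number): string =>
  s.length > max ? s.slice(0, max) + ' [truncated]' : s;

// Hypothetical sentinel shape recording how many entries were left out.
const sentinel = (omitted: number): SketchError => ({
  message: `${omitted} additional error(s) omitted to fit the size budget`,
});

// Lifts a per-entry transform into a transform over the whole doc.
const eachEntry =
  (fn: (e: SketchError) => SketchError) =>
  (d: SketchError): SketchError => ({
    ...d,
    additionalErrors: d.additionalErrors ? d.additionalErrors.map(fn) : d.additionalErrors,
  });

function clampSketch(input: SketchError, maxBytes = MAX_BYTES): SketchError {
  if (sizeOf(input) <= maxBytes) return input; // in budget: same object back

  let doc = JSON.parse(JSON.stringify(input)) as SketchError; // work on a copy
  const originalCount = doc.additionalErrors?.length ?? 0;

  const steps: Array<(d: SketchError) => SketchError> = [
    // 1. per-entry stacks
    eachEntry((e) => (e.stack ? { ...e, stack: cut(e.stack, 64 * KIB) } : e)),
    // 2. top-level stack
    (d) => (d.stack ? { ...d, stack: cut(d.stack, 64 * KIB) } : d),
    // 3. per-entry messages
    eachEntry((e) => ({ ...e, message: cut(e.message, 16 * KIB) })),
    // 4. drop the second level of nesting
    eachEntry((e) => (e.additionalErrors ? { ...e, additionalErrors: null } : e)),
    // 5. cap the entry count, leaving room for one sentinel
    (d) =>
      d.additionalErrors && d.additionalErrors.length > MAX_ENTRIES
        ? {
            ...d,
            additionalErrors: [
              ...d.additionalErrors.slice(0, MAX_ENTRIES - 1),
              sentinel(originalCount - (MAX_ENTRIES - 1)),
            ],
          }
        : d,
    // 6. drop additionalErrors entirely, keeping only a sentinel
    (d) => (originalCount > 0 ? { ...d, additionalErrors: [sentinel(originalCount)] } : d),
    // 7. last resort: shrink the envelope itself (4 KiB is an arbitrary choice)
    (d) => ({ ...d, stack: d.stack && cut(d.stack, 4 * KIB), message: cut(d.message, 4 * KIB) }),
  ];

  for (const step of steps) {
    doc = step(doc);
    if (sizeOf(doc) <= maxBytes) break; // stop as soon as the doc fits
  }
  return doc;
}

// Usage: an in-budget doc comes back as the same object.
const small: SketchError = { message: 'boom', additionalErrors: null };
console.log(clampSketch(small) === small); // true

// Usage: with a small test budget, step 1 alone is enough.
const big: SketchError = {
  message: 'parent failed',
  additionalErrors: Array.from({ length: 3 }, (_, i) => ({
    message: `dep ${i}`,
    stack: 'x'.repeat(100 * KIB),
  })),
};
const clamped = clampSketch(big, 250 * KIB);
console.log(clamped.additionalErrors?.length); // 3
```

Expressing the steps as an ordered array keeps the "stop as soon as it fits" rule in one loop, which also makes it straightforward to test that later steps leave the doc alone.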

The dep-error propagation paths in index-runner/index-backed-dependency-errors.ts and definition-lookup.ts are intentionally unchanged — preserving full additionalErrors trees for debug visibility was a hard requirement. The clamp is a safety net at the persistence layer, not a routine truncation in the propagators.

Test plan

  • clamp-serialized-error-test.ts: 10 unit tests covering each shedding step, each asserting that only the targeted step ran (later steps left the doc alone), plus pass-through for in-budget docs and no mutation of the input.
  • index-writer-test.ts: 2 integration tests, one verifying that an in-budget error_doc is persisted verbatim through the IndexWriter and one verifying that an oversized doc is shed progressively until it fits.
  • pnpm lint clean for runtime-common, realm-server, host.
  • Indexing-test in realm-server (in progress — full integration run with the dep-error fan-out path).

🤖 Generated with Claude Code

`clampSerializedError` is a no-op for any error_doc that already
serializes under 8 MiB. When a doc would exceed the budget — almost
always because dep-error fan-out has accumulated a deep
`additionalErrors` tree across many indexing cycles — it sheds
structure progressively (per-entry stacks → top-level stack →
per-entry messages → nested additionalErrors → entry count cap →
drop additionalErrors → envelope shrink), stopping as soon as the
doc fits. Wired in at the IndexWriter and modules-cache write paths
so every error row passes through it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Contributor

Copilot AI left a comment

Pull request overview

Adds a persistence-layer safety net to prevent oversized error_doc JSONB writes from failing due to Postgres’s jsonb array element-size limit, by clamping serialized errors right before they’re written to the database.

Changes:

  • Introduces clampSerializedError() (with exported size/entry-count constants) to progressively shed SerializedError.additionalErrors/stack/message data until under an 8 MiB budget.
  • Applies the clamp at the two main persistence chokepoints: index writer error-doc writes and module cache error-doc writes.
  • Adds unit + integration test coverage to verify pass-through behavior when under budget and progressive shedding when over budget.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| packages/runtime-common/index.ts | Re-exports the new clamp function and related constants for consumers. |
| packages/runtime-common/index-writer.ts | Clamps normalized error docs immediately before persistence to boxel_index.error_doc. |
| packages/runtime-common/error.ts | Implements clampSerializedError() and defines size/entry limits + sentinel behavior. |
| packages/runtime-common/definition-lookup.ts | Clamps modules.error_doc.error before JSON serialization to the DB cache. |
| packages/realm-server/tests/index.ts | Registers the new clamp unit test file in the realm-server test suite. |
| packages/realm-server/tests/clamp-serialized-error-test.ts | Adds focused unit tests for each progressive shedding step and no-mutation behavior. |
| packages/host/tests/unit/index-writer-test.ts | Adds integration tests ensuring IndexWriter persists in-budget errors verbatim and sheds oversized ones. |

Comment thread packages/runtime-common/error.ts Outdated
Comment thread packages/runtime-common/error.ts Outdated
Addresses two PR review comments on the clamp:

1. Step 5 now keeps `MAX-1` real entries plus one sentinel, so the
   final array length equals `ERROR_DOC_MAX_ADDITIONAL_ERRORS` — the
   constant name and the resulting length agree.

2. Capture `originalAdditionalCount` once at the top so both step 5's
   and step 6's sentinels report the *original* count, not whatever
   the array length happens to be after a prior step.

Adds a step-6 assertion that the lone sentinel reports the original
count (otherwise step 6 would silently report `MAX` instead of the
real number that got dropped).
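A small sketch of the two fixes described in this commit, under assumed names: capEntries, dropAll, and the sentinel message format are hypothetical, and only `ERROR_DOC_MAX_ADDITIONAL_ERRORS` (200) and `originalAdditionalCount` come from the text.

```typescript
interface Entry {
  message: string;
}

const MAX = 200; // stands in for ERROR_DOC_MAX_ADDITIONAL_ERRORS

// Hypothetical sentinel: reports omitted entries relative to the original count.
const sentinel = (omitted: number, original: number): Entry => ({
  message: `${omitted} of ${original} additional errors omitted`,
});

// Step 5: keep MAX - 1 real entries plus one sentinel, so the final
// length equals MAX rather than MAX + 1.
function capEntries(entries: Entry[], originalCount: number): Entry[] {
  if (entries.length <= MAX) return entries;
  const kept = entries.slice(0, MAX - 1);
  return [...kept, sentinel(originalCount - kept.length, originalCount)];
}

// Step 6: drop everything; the sentinel uses the count captured up front,
// not the length left behind by step 5.
function dropAll(originalCount: number): Entry[] {
  return [sentinel(originalCount, originalCount)];
}

const entries: Entry[] = Array.from({ length: 500 }, (_, i) => ({ message: `dep ${i}` }));
const originalAdditionalCount = entries.length; // captured once, before any step

const capped = capEntries(entries, originalAdditionalCount);
console.log(capped.length); // 200
console.log(capped[capped.length - 1]?.message); // "301 of 500 additional errors omitted"

const dropped = dropAll(originalAdditionalCount);
console.log(dropped[0]?.message); // "500 of 500 additional errors omitted"
```

Had step 6 read the array length after step 5 instead of the captured count, its sentinel would report 200 here rather than 500.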

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions
Contributor

Preview deployments

@habdelra habdelra requested a review from a team April 29, 2026 01:03
@github-actions
Contributor

Realm Server Test Results

    1 files  ± 0      1 suites  ±0   17m 17s ⏱️ - 1m 5s
1 112 tests +10  1 112 ✅ +11  0 💤 ±0  0 ❌  - 1 
1 184 runs  +10  1 184 ✅ +11  0 💤 ±0  0 ❌  - 1 

Results for commit fb37c25. ± Comparison against base commit a3c9faa.

@github-actions
Contributor

Host Test Results

    1 files  ±0      1 suites  ±0   3h 3m 22s ⏱️ + 4m 54s
2 475 tests +2  2 460 ✅ +2  15 💤 ±0  0 ❌ ±0 
2 494 runs  +2  2 479 ✅ +2  15 💤 ±0  0 ❌ ±0 

Results for commit fb37c25. ± Comparison against base commit a3c9faa.

@habdelra habdelra merged commit df48c95 into main Apr 29, 2026
70 checks passed
3 participants