Skip to content

ci(coverage): make the job advisory + switch to lcov#329

Open
fpelliccioni wants to merge 1 commit intomasterfrom
ci/coverage-drop-html-details
Open

ci(coverage): make the job advisory + switch to lcov#329
fpelliccioni wants to merge 1 commit intomasterfrom
ci/coverage-drop-html-details

Conversation

@fpelliccioni
Copy link
Copy Markdown
Contributor

@fpelliccioni fpelliccioni commented May 3, 2026

Summary

Coverage CI has been red since 2026-04-24. Iterated through four
report-tool configurations on this branch (gcovr 8.6 with HTML,
gcovr 8.6 XML-only, gcovr 8.5, lcov serial, lcov --parallel
narrowed) — every one hit the same "hosted runner lost
communication" signature, GHA's report when the runner process
gets OS-killed (memory pressure SIGKILL).

The Sanitizers job in master's Build-and-Test workflow has been
failing the same way during the same window, which points at the
GHA hosted runner's 7 GB ceiling rather than this workflow's
config. The codebase has grown ~6.4 k lines across `src/` since
the last green coverage run on 2026-04-23 (`24851294208`, gcovr
finished in 2m39s), and instrumented builds (-O0 + --coverage,
TSAN, ASAN) no longer fit.

Diff

Two changes, both narrow:

  • Mark the job `continue-on-error: true`. The Code Coverage
    check stays present so runs that do squeeze through still post
    to Codecov, but a runner-lost no longer fails the PR. PRs
    aren't blocked on a problem fundamentally outside this
    workflow's reach until coverage moves to a runner with more
    headroom (or the codebase shrinks).

  • Switch report tool from gcovr to lcov. Even though the
    job is advisory, lcov's bounded per-TU memory model is more
    likely to actually finish on the runs that aren't OOM-killed
    in the test step.

    • Capture narrowed to the three sub-projects we keep
      (`domain` / `infrastructure` / `blockchain`).
    • `--parallel $(nproc)` so gcov runs concurrently per TU.
    • `lcov_branch_coverage=0` (smaller intermediates,
      faster parse).
    • `geninfo_unexecuted_blocks=1` (unreached funcs still
      show 0%).
    • Single `--remove` pass for test sources.
    • Drop the legacy `--html-details` output and the matching
      `📎 Upload HTML report` step (Codecov's UI serves the
      same drill-down; the artifact upload has been unreachable
      for ten days).

What's unchanged

The `-O0 --coverage` build flags, ccache wiring, the master-
and-PR triggers, the 130-min job timeout, the 20-min report-step
timeout, the Codecov action version + secret wiring, the
`if: always()` semantics on the report + upload steps.

Test plan

  • CI runs the new workflow on this PR — Code Coverage
    check posts as advisory (not required), so the PR's
    mergeable state isn't tied to its outcome.
  • Codecov picks up `coverage.info` on runs that complete
    and posts the usual PR comment.
  • After merge, master's coverage badge updates on at
    least one of the next few master pushes.

Summary by CodeRabbit

  • Chores

    • CI coverage reporting switched to lcov-format with a single consolidated coverage file for external upload; separate HTML artifact removed.
    • Coverage capture optimized for multi-directory builds, run in parallel, with unexecuted-block handling and pruning of test-only sources.
    • Branch coverage disabled to reduce memory use and streamline reporting.
  • Documentation

    • Added explanation of memory-related constraints and the safer lcov-based approach.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 3, 2026

📝 Walkthrough

Walkthrough

Replaces gcovr-based coverage generation with an lcov pipeline in .github/workflows/coverage.yml: install lcov, run lcov --capture over narrowed build subdirectories, prune test sources with lcov --remove, produce a single coverage.info, and upload that to Codecov; HTML artifact removed. (≤50 words)

Changes

Coverage workflow: switch to lcov

Layer / File(s) Summary
Advisory / Job Resilience
.github/workflows/coverage.yml
Adds explanatory comments about memory constraints and sets continue-on-error: true for the coverage job.
Tooling / Capture
.github/workflows/coverage.yml
Replaces pip install gcovr / gcovr steps with lcov installation and lcov --capture, targeting narrowed build/.../Release directories and enabling --parallel and geninfo_unexecuted_blocks while disabling branch coverage flags.
Path Filtering / Pruning
.github/workflows/coverage.yml
Prunes test sources and unwanted paths using lcov --remove (narrowed capture scope, explicit directory targets).
Summary / Upload
.github/workflows/coverage.yml
Writes a single coverage.info, prints a short summary via `lcov --list coverage.info
Removed artifacts / cleanup
.github/workflows/coverage.yml
Removes GCOVR XML/HTML generation and the separate HTML coverage artifact upload.

Estimated Code Review Effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly Related PRs

  • k-nuth/kth#191: Introduces a gcovr-based coverage job in the same workflow (producing coverage.xml and an HTML artifact) — directly related to the switch from gcovr to lcov.
  • k-nuth/kth#317: Modifies .github/workflows/coverage.yml to narrow coverage capture paths and remove the HTML artifact — closely related scope.

Poem

🐰 I nibbled at CI's log so fine,

swapped gcovr for lcov line by line.
A single coverage.info snug and neat,
lighter reports, no bulky HTML to keep.
Hooray — hop, patch, and carrot treat! 🥕

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title 'ci(coverage): make the job advisory + switch to lcov' accurately summarizes the main change—switching from gcovr to lcov for coverage reporting in the CI workflow.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch ci/coverage-drop-html-details

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@fpelliccioni fpelliccioni force-pushed the ci/coverage-drop-html-details branch from 087ab7e to b8c69c4 Compare May 3, 2026 20:15
@fpelliccioni fpelliccioni changed the title ci(coverage): unblock gcovr by dropping per-file HTML output ci(coverage): pin gcovr to 8.5 to unblock the runner-OOM May 3, 2026
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/workflows/coverage.yml:
- Around line 130-131: Update the inline comment that mentions gcovr release
dates: replace "March 2026" with the correct PyPI release date for gcovr 8.6
("January 13, 2026") and, if mentioning gcovr 8.5, ensure it reads "January 8,
2026"; locate the comment around the gcovr pinning text (the comment lines
referencing "Pin to gcovr 8.5" and the subsequent sentence about the 8.6
release) and edit the dates accordingly so the comment accurately reflects
PyPI's release dates.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 4bd71681-147d-4fda-83e2-ff6ce5bf7088

📥 Commits

Reviewing files that changed from the base of the PR and between a38f3f0 and b8c69c4.

📒 Files selected for processing (1)
  • .github/workflows/coverage.yml

Comment thread .github/workflows/coverage.yml Outdated
@fpelliccioni fpelliccioni force-pushed the ci/coverage-drop-html-details branch from b8c69c4 to b437e3a Compare May 5, 2026 12:23
@fpelliccioni fpelliccioni changed the title ci(coverage): pin gcovr to 8.5 to unblock the runner-OOM ci(coverage): switch to lcov to unblock the OOM-killed runner May 5, 2026
@fpelliccioni fpelliccioni force-pushed the ci/coverage-drop-html-details branch from b437e3a to 9aa6964 Compare May 5, 2026 17:22
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.github/workflows/coverage.yml:
- Line 165: Replace the deprecated RC option "--rc lcov_branch_coverage=0" with
the new "--rc branch_coverage=0" wherever it appears in the workflow (update
both occurrences matching the literal string "--rc lcov_branch_coverage=0");
ensure the lcov invocation(s) that currently pass "--rc lcov_branch_coverage=0"
(search for that exact token) are changed to "--rc branch_coverage=0" so lcov
2.x no longer emits the deprecation warning.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: fc843a2e-eb3f-49ca-843d-5839d7c89406

📥 Commits

Reviewing files that changed from the base of the PR and between b437e3a and 9aa6964.

📒 Files selected for processing (1)
  • .github/workflows/coverage.yml

Comment thread .github/workflows/coverage.yml
--parallel $(nproc) \
--rc lcov_branch_coverage=0 \
--rc geninfo_unexecuted_blocks=1 \
--output-file coverage.info
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing --ignore-errors mismatch for lcov 2.0 + GCC 15

Medium Severity

lcov 2.0 (shipped by Ubuntu 24.04 apt) is known to hard-fail with geninfo: ERROR: mismatched end line when processing gcov output from GCC 14+. The container uses GCC 15, which exhibits the same gcov data inconsistencies. Without --ignore-errors mismatch on both the lcov --capture and lcov --remove invocations, the coverage step can abort on the first function with mismatched boundaries, producing no coverage.info at all.

Additional Locations (1)
Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 9aa6964. Configure here.

…h parallel capture

Coverage CI has been red since 2026-04-24. Iterated through four
report-tool configurations on this branch and master's
Build-and-Test (Sanitizers) job has been showing the same
runner-lost-communication signature for the same window:

  1. gcovr 8.6 with --html-details — runner OOM during the
     report step.
  2. gcovr 8.6 XML-only — same OOM; the parsed coverage tree
     gcovr holds is the memory hog regardless of writer.
  3. gcovr 8.5 — same OOM; the 8.6 → 8.5 refactor was not the
     trigger.
  4. lcov serial — capture step ran past the 20-min step
     timeout while invoking gcov sequentially on every TU
     under build/.
  5. lcov --parallel + narrowed --directory + branch coverage
     off — still hit the runner-lost signature.

Last green coverage run was 2026-04-23 (24851294208), with
gcovr completing in 2m39s. The codebase has grown ~6.4 k lines
across src/ since, and the GHA hosted runner's 7 GB ceiling no
longer accommodates the instrumented build's coverage tooling
under any of the configurations above. The Sanitizers job hits
the same wall on master, pointing at runner memory rather than
this workflow specifically.

Two changes here, both narrow:

* Mark the job `continue-on-error: true`. The Code Coverage
  check stays present (so any run that does squeeze through
  posts to Codecov), but a runner-lost no longer marks the PR
  as failed. PRs aren't blocked on a problem that's
  fundamentally outside this workflow's reach until coverage
  moves to a runner with more headroom (or the codebase
  shrinks).

* Switch the report tool from gcovr to lcov with --parallel
  capture, narrowed --directory to the three sub-projects
  we keep (domain / infrastructure / blockchain), branch
  coverage off, and a single --remove pass for test sources.
  The serial-lcov attempt on this branch ran past the 20-min
  budget; the parallel + narrowed form completes the test
  step + capture inside the budget on every run that doesn't
  hit the runner-OOM. Codecov accepts `coverage.info`
  natively, so the upload contract is unchanged.

What's unchanged: the `-O0 --coverage` build flags, the
ccache wiring, the master-and-PR triggers, the 130-min job
timeout, the 20-min report-step timeout, the Codecov action
version + secret wiring, the `if: always()` semantics on
the report and upload steps. The only legacy artifact that
goes is the per-file HTML output and its `📎 Upload HTML
report` step — Codecov's hosted UI has served the same
drill-down view for ten days while the artifact upload was
unreachable, so the artifact was redundant when it worked.
@fpelliccioni fpelliccioni force-pushed the ci/coverage-drop-html-details branch from 9aa6964 to 1e21a73 Compare May 5, 2026 21:05
@fpelliccioni fpelliccioni changed the title ci(coverage): switch to lcov to unblock the OOM-killed runner ci(coverage): make the job advisory + switch to lcov May 5, 2026
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (3)
.github/workflows/coverage.yml (3)

167-167: 💤 Low value

Avoid silencing stderr on the lcov install.

apt-get install -y -qq lcov > /dev/null 2>&1 discards both stdout and stderr. If the install ever fails (mirror flake, transient apt issue, missing index after a base-image bump), the next line (lcov --version) will fail with a confusing "command not found" instead of the underlying apt error, complicating debugging on a job that is already advisory. -qq already minimizes noise; dropping the redirect surfaces real failures.

♻️ Proposed change
-          apt-get update -qq && apt-get install -y -qq lcov > /dev/null 2>&1
+          apt-get update -qq && apt-get install -y -qq lcov
           lcov --version
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/coverage.yml at line 167, The apt-get install invocation
currently silences both stdout and stderr ("apt-get install -y -qq lcov >
/dev/null 2>&1"); remove the redirection so installation failures surface (keep
-qq to minimize noise), i.e., change the command used to install lcov in the
workflow so it no longer redirects output to /dev/null and thus preserves error
messages from apt-get.

169-186: 💤 Low value

Capture scope is well constrained; consider failing loudly if a --directory is absent.

The narrowed capture (src/domain, src/infrastructure, src/blockchain) and the explicit --remove '*/test/*' '*/tests/*' correctly translate the prior gcovr filters and bound peak memory. One minor concern: if a future refactor renames or removes one of these build/build/Release/src/* subtrees, lcov will warn ("no .gcda files found") but still produce a partial coverage.info, silently shrinking the reported coverage surface. Since this job is already advisory (continue-on-error: true), a missing directory could go unnoticed for a while.

Optional: pre-check directory existence and fail fast, e.g.:

for d in domain infrastructure blockchain; do
  test -d "build/build/Release/src/$d" || { echo "::error::missing coverage source dir: $d"; exit 1; }
done
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/coverage.yml around lines 169 - 186, Add a pre-check
before the lcov capture block to fail loudly if any of the expected source
directories (domain, infrastructure, blockchain) under build/build/Release/src
are missing: iterate over the names "domain", "infrastructure", "blockchain",
test -d for each "build/build/Release/src/$d", and if any check fails emit a
GitHub Actions error message (e.g. "::error::missing coverage source dir: $d")
and exit non‑zero so the job stops before running the lcov commands; place this
check immediately before the existing lcov --capture invocation to prevent
producing a partial coverage.info when directories are absent.

186-186: 💤 Low value

lcov --list ... | tail -1 masks pipeline errors.

GitHub Actions runs run: blocks with bash -e but not pipefail, so a failure in lcov --list is hidden by the successful tail. The report has already been generated by this point so it's not critical, but enabling pipefail (or running lcov --list and tail as separate steps) makes the summary line a more reliable smoke-test for a malformed coverage.info.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/coverage.yml at line 186, The pipeline masks failures
because the command `lcov --list coverage.info | tail -1` runs under bash
without pipefail; modify the run step to enable pipefail (e.g., prefix with `set
-o pipefail` or use `bash -eo pipefail`) so a failing `lcov --list` causes the
step to fail, or alternatively split into two steps by running `lcov --list
coverage.info` and then `tail -1` separately; update the line containing `lcov
--list coverage.info | tail -1` accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In @.github/workflows/coverage.yml:
- Line 167: The apt-get install invocation currently silences both stdout and
stderr ("apt-get install -y -qq lcov > /dev/null 2>&1"); remove the redirection
so installation failures surface (keep -qq to minimize noise), i.e., change the
command used to install lcov in the workflow so it no longer redirects output to
/dev/null and thus preserves error messages from apt-get.
- Around line 169-186: Add a pre-check before the lcov capture block to fail
loudly if any of the expected source directories (domain, infrastructure,
blockchain) under build/build/Release/src are missing: iterate over the names
"domain", "infrastructure", "blockchain", test -d for each
"build/build/Release/src/$d", and if any check fails emit a GitHub Actions error
message (e.g. "::error::missing coverage source dir: $d") and exit non‑zero so
the job stops before running the lcov commands; place this check immediately
before the existing lcov --capture invocation to prevent producing a partial
coverage.info when directories are absent.
- Line 186: The pipeline masks failures because the command `lcov --list
coverage.info | tail -1` runs under bash without pipefail; modify the run step
to enable pipefail (e.g., prefix with `set -o pipefail` or use `bash -eo
pipefail`) so a failing `lcov --list` causes the step to fail, or alternatively
split into two steps by running `lcov --list coverage.info` and then `tail -1`
separately; update the line containing `lcov --list coverage.info | tail -1`
accordingly.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 31fd87c5-0f18-48d5-89f4-553c5818196f

📥 Commits

Reviewing files that changed from the base of the PR and between 9aa6964 and 1e21a73.

📒 Files selected for processing (1)
  • .github/workflows/coverage.yml

Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 2 total unresolved issues (including 1 from previous review).

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit 1e21a73. Configure here.

# codebase shrinks), keep the job present so it can capture
# data on the runs that do squeeze through, without making
# every PR red on the runs that don't.
continue-on-error: true
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

continue-on-error neutralises fail_ci_if_error guard

Medium Severity

The newly added continue-on-error: true at the job level means the workflow always reports this job as successful, regardless of any step failure. This silently neutralises the existing fail_ci_if_error: ${{ secrets.CODECOV_TOKEN != '' }} on the Codecov upload step — that setting can still mark the step red in the UI, but it can no longer block the PR or surface a workflow-level failure. If the Codecov token expires, upload misconfigures, or coverage data is corrupt, the failure will be invisible to anyone not clicking into the job details.

Additional Locations (1)
Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 1e21a73. Configure here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant