
[caliper] Start integrating the post-processing in the orchestration layer #55

Open
kpouget wants to merge 15 commits into openshift-psap:main from kpouget:caliper-full

Conversation

kpouget (Contributor) commented May 5, 2026

Summary by CodeRabbit

  • New Features

    • End-to-end post-processing: parse, visualize, KPI generate/import/export, regression analysis, and AI-eval export via an enhanced CLI with workspace-aware options.
    • Per-test-base caching, plugin protocol, sample/demo plugins, Dash KPI viewer, reports index generation, and CI orchestration hooks.
  • Documentation

    • Expanded README, CLI docs, and JSON Schema additions clarifying formats and flags.
  • Chores

    • Added .caliper_cache/ to .gitignore.

openshift-ci (Bot) commented May 5, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign tosokin for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

coderabbitai (Bot) commented May 5, 2026

Caution

Review failed

Failed to post review comments

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds a Caliper post‑processing engine and orchestration: workspace-aware CLI, plugin protocol, test‑base discovery with per‑test caching, visualization/KPI/AI‑eval pipelines, JSON‑schema validation, Pydantic orchestration config, sample plugins/fixtures, Dash KPI viewer, and FORGE integration points.
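
To make the moving parts concrete, here is a minimal sketch of what the plugin protocol and core dataclasses described above could look like. The hook and type names come from this walkthrough and the sequence diagram below; the exact signatures and fields are assumptions, not the PR's actual code.

```python
# Hypothetical sketch of the PostProcessingPlugin protocol named in the
# walkthrough; signatures are inferred from the sequence diagram below.
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any, Protocol


@dataclass
class TestBaseNode:
    directory: Path                        # directory holding __test_labels__.yaml
    labels: dict[str, Any]                 # loaded test labels
    artifact_paths: list[Path] = field(default_factory=list)


@dataclass
class ParseResult:
    records: list[dict[str, Any]]          # unified result records
    warnings: list[str] = field(default_factory=list)


class PostProcessingPlugin(Protocol):
    def parse(self, base_dir: Path, nodes: list[TestBaseNode]) -> ParseResult: ...
    def visualize(self, model: Any, output_dir: Path, report_ids: list[str]) -> list[str]: ...
    def compute_kpis(self, model: Any) -> list[dict[str, Any]]: ...
    def build_ai_eval_payload(self, model: Any) -> dict[str, Any]: ...
```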

Changes

Caliper post-processing engine & orchestration

| Layer / File(s) | Summary |
| --- | --- |
| Data Models / Plugin Protocol<br>`projects/caliper/engine/model.py` | Adds dataclasses (TestBaseNode, UnifiedResultRecord, ParseResult, UnifiedRunModel, RegressionFinding) and PostProcessingPlugin protocol with parse/visualize/kpi/ai‑eval hooks. |
| Discovery / Traversal<br>`projects/caliper/engine/traverse.py` | Discovers test bases by locating `__test_labels__.yaml`, loads labels, and lists artifact files into TestBaseNodes. |
| Fingerprinting & Cache<br>`projects/caliper/engine/cache.py`, `.gitignore` | Stable fingerprints for base/test‑base, cache path resolution under `.caliper_cache/`, read/write/validation for test‑base caches; `.caliper_cache/` added to `.gitignore`. |
| Parsing Orchestration<br>`projects/caliper/engine/parse.py` | run_parse discovers test bases, loads per‑test‑base caches when valid, calls plugin.parse, writes per‑base caches, aggregates into UnifiedRunModel. |
| Validation / Serialization<br>`projects/caliper/engine/validation.py`, `projects/caliper/schemas/*` | Schema loaders/validators, model_to_jsonable/model_from_jsonable for UnifiedRunModel; added KPI and AI‑eval JSON schemas. |
| Label Filtering<br>`projects/caliper/engine/label_filters.py` | Parse KEY=VALUE filters, apply include/exclude semantics, and filter records by distinguishing_labels (see the sketch after this table). |
| Visualization Orchestration<br>`projects/caliper/engine/visualize.py`, `projects/caliper/dash_app/kpi_view.py` | Resolve visualize config/groups, compute report IDs, filter records, invoke plugin.visualize; simple Dash KPI table view added. |
| KPI Pipeline (compute/export/import)<br>`projects/caliper/engine/kpi/*` | Catalog, run_kpi_generate, OpenSearch client builder, import/export helpers, regression rules, and KPI analysis (`analyze.py`). |
| AI‑eval Export<br>`projects/caliper/engine/ai_eval.py` | run_ai_eval_export builds payload via plugin, validates against AI‑eval schema, writes pretty JSON. |
| CLI & Wiring<br>`projects/caliper/cli/main.py` | Workspace-aware CLI root options (`--artifacts-dir`/`--postprocess-config`/`--plugin-module`), subcommands (parse, visualize, kpi group, ai-eval-export, artifacts export), plugin resolution and exit-code handling. |
| Orchestration Config & Runner<br>`projects/caliper/orchestration/postprocess_config.py`, `projects/caliper/orchestration/postprocess.py`, `projects/caliper/orchestration/postprocess_outcome.py` | Pydantic models for postprocess sections with validators, orchestration runner invoking enabled steps, and deterministic final status computation. |
| Plugins & Fixtures<br>`projects/caliper/tests/stub_plugin.py`, `projects/caliper/tests/fixtures/*`, `projects/skeleton/postprocess/default/*` | Adds stub and skeleton sample plugins (parse/visualize/kpi/ai‑eval), fixture trees, demo metrics, visualize‑groups, and caliper manifests. |
| FORGE Integration<br>`projects/core/library/postprocess.py`, `projects/skeleton/orchestration/*`, `projects/core/ci_entrypoint/fournos.py` | Wraps test execution to run Caliper postprocess, artifact dir resolution, status YAML writing, skeleton test seeding and CLI registration, and Fournos job YAML relocation into CI metadata. |
| CI / Packaging / Docs<br>`pyproject.toml`, `.github/workflows/test_toolbox_dsl.yml`, `projects/caliper/README.md` | Adds testing optional deps and Ruff rule, workflow installs package with `[testing]` extras, README expanded for post‑processing CLI and commands. |
| Reports Index & Notifications<br>`projects/core/library/reports_index.py`, `projects/core/notifications/send.py` | Generates HTML index for Caliper reports and integrates Caliper postprocess status into notification messages. |
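
As referenced in the Label Filtering row, here is a self-contained sketch of the KEY=VALUE include/exclude semantics, folding in the str-coercion suggested in the `label_filters.py` review comment further down. The function names and bodies are illustrative, not the PR's code.

```python
# Illustrative sketch of include/exclude label filtering (not the PR's code).
from typing import Any


def parse_filter_kv(pairs: list[str]) -> dict[str, str]:
    """Parse KEY=VALUE strings, e.g. ["mode=fast", "gpu=h200"]."""
    out: dict[str, str] = {}
    for item in pairs:
        key, _, value = item.partition("=")
        if not key or not value:
            raise ValueError(f"Expected KEY=VALUE, got {item!r}")
        out[key] = value
    return out


def matches(labels: dict[str, Any], include: dict[str, str], exclude: dict[str, str]) -> bool:
    # Coerce label values to str so bool/int labels still match CLI-supplied
    # strings, as the label_filters.py review comment below suggests.
    for k, v in exclude.items():
        if str(labels.get(k, "")) == v:
            return False
    if not include:
        return True
    return all(str(labels.get(k, "")) == v for k, v in include.items())
```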

Sequence Diagram

sequenceDiagram
    participant CLI as CLI / Orchestration
    participant Engine as Caliper Engine
    participant Plugin as PostProcessing Plugin
    participant Cache as Cache Layer
    participant Schema as JSON Schema
    participant Output as Output Files

    rect rgba(70, 130, 180, 0.5)
    note over CLI,Plugin: Parse Phase
    CLI->>Engine: run_parse(base_dir, plugin)
    Engine->>Engine: discover_test_bases(base_dir)
    Engine->>Cache: fingerprint_test_base & read_test_base_cache
    alt Cache Valid
        Cache-->>Engine: cached records
    else Cache Invalid/Missing
        Engine->>Plugin: parse(base_dir, test_nodes)
        Plugin-->>Engine: ParseResult(records, warnings)
        Engine->>Cache: write_test_base_cache
        Cache-->>Engine: cache written
    end
    Engine-->>CLI: UnifiedRunModel
    end

    rect rgba(100, 149, 237, 0.5)
    note over CLI,Plugin: Visualize Phase
    CLI->>Engine: run_visualize(model, report_ids, filters)
    Engine->>Engine: filter_records by labels
    Engine->>Plugin: visualize(model, output_dir, report_ids)
    Plugin-->>Output: HTML reports
    Engine-->>CLI: list[str] file paths
    end

    rect rgba(144, 238, 144, 0.5)
    note over CLI,Plugin: KPI & AI-eval
    CLI->>Engine: run_kpi_generate(model, plugin)
    Engine->>Plugin: compute_kpis(model)
    Plugin-->>Engine: KPI records
    Engine->>Schema: load_schema(kpi_record.schema.json)
    Engine->>Engine: validate_instance per record
    Engine->>Output: write JSONL
    CLI->>Engine: run_ai_eval_export(model, plugin)
    Engine->>Plugin: build_ai_eval_payload(model)
    Engine->>Schema: load_schema(ai_eval_payload.schema.json)
    Engine->>Engine: validate_instance
    Engine->>Output: write JSON
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

I hopped through trees of tests and files,
I cached and parsed and stacked the piles,
KPIs counted, dashboards bright—
I skittered, sorted, piped to sight,
Now postprocess dreams in moonlit aisles. 🐇✨


This reverts commit 2d4d91b.

Aka, put back the post-processing engine generated by Cursor
kpouget (Contributor, Author) commented May 5, 2026

/test fournos skeleton quick_test
/var fournos.namespace: psap-automation-wip
/cluster psap-mgmt

psap-forge-bot (Bot) commented May 5, 2026

🔴 Test of 'skeleton test' failed after 00 hours 00 minutes 00 seconds 🔴

• Link to the test results.

• No reports index generated...

Test configuration:

ci_job.cluster: psap-mgmt
ci_job.exclusive: true
ci_job.fjob: forge-skeleton-20260505-152347
ci_job.hardware:
  gpuCount: 4
  gpuType: h200
ci_job.name: skeleton quick_test
ci_job.owner: kpouget
project.args:
- quick_test
project.name: skeleton

Failure indicator:

## /workspace/artifacts/000__test/FAILURE 
--- 📍KeyError STACKTRACE ---
--- 📍'ARTIFACT_BASE_DIR'

   Traceback (most recent call last):
     File "/app/forge/projects/core/library/ci.py", line 100, in wrapper
       exit_code = command_func(*args, **kwargs)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     File "/app/forge/projects/skeleton/orchestration/ci.py", line 49, in test
       return test_skeleton.test()
              ^^^^^^^^^^^^^^^^^^^^
     File "/app/forge/projects/skeleton/orchestration/test_skeleton.py", line 53, in test
       artifact_base_dir = pathlib.Path(env.ARTIFACT_BASE_DIR).resolve()
                                        ^^^^^^^^^^^^^^^^^^^^^
     File "/app/forge/projects/core/library/env.py", line 36, in __getattr__
       return globals()[name]
              ~~~~~~~~~^^^^^^
   KeyError: 'ARTIFACT_BASE_DIR'

[...]

Execution logs

psap-forge-bot (Bot) commented May 5, 2026

🔴 Test of 'fournos_launcher submit' failed after 00 hours 01 minutes 19 seconds 🔴

• Link to the test results.

• No reports index generated...

Test configuration:

/test fournos skeleton quick_test
/var fournos.namespace: psap-automation-wip
/cluster psap-mgmt

• Failure indicator: Empty.
Execution logs

kpouget (Contributor, Author) commented May 5, 2026

/test fournos skeleton quick_test
/var fournos.namespace: psap-automation-wip
/cluster psap-mgmt

psap-forge-bot (Bot) commented May 5, 2026

🟢 Test of 'skeleton test' succeeded after 00 hours 00 minutes 10 seconds 🟢

• Link to the test results.

• No reports index generated...

Test configuration:

ci_job.cluster: psap-mgmt
ci_job.exclusive: true
ci_job.fjob: forge-skeleton-20260505-152713
ci_job.hardware:
  gpuCount: 4
  gpuType: h200
ci_job.name: skeleton quick_test
ci_job.owner: kpouget
project.args:
- quick_test
project.name: skeleton

Execution logs

psap-forge-bot (Bot) commented May 5, 2026

🟢 Test of 'fournos_launcher submit' succeeded after 00 hours 01 minutes 29 seconds 🟢

• Link to the test results.

• No reports index generated...

Test configuration:

/test fournos skeleton quick_test
/var fournos.namespace: psap-automation-wip
/cluster psap-mgmt

Execution logs

kpouget (Contributor, Author) commented May 5, 2026

/test fournos skeleton quick_test
/var fournos.namespace: psap-automation-wip
/cluster psap-mgmt
/var caliper.postprocess.parse.enabled: true
/var caliper.postprocess.visualize.enabled: true

psap-forge-bot (Bot) commented May 5, 2026

🟢 Test of 'skeleton test' succeeded after 00 hours 00 minutes 10 seconds 🟢

• Link to the test results.

• No reports index generated...

Test configuration:

caliper.postprocess.parse.enabled: 'true'
caliper.postprocess.visualize.enabled: 'true'
ci_job.cluster: psap-mgmt
ci_job.exclusive: true
ci_job.fjob: forge-skeleton-20260505-153234
ci_job.hardware:
  gpuCount: 4
  gpuType: h200
ci_job.name: skeleton quick_test
ci_job.owner: kpouget
project.args:
- quick_test
project.name: skeleton

Execution logs

psap-forge-bot (Bot) commented May 5, 2026

🟢 Test of 'fournos_launcher submit' succeeded after 00 hours 01 minutes 31 seconds 🟢

• Link to the test results.

• No reports index generated...

Test configuration:

/test fournos skeleton quick_test
/var fournos.namespace: psap-automation-wip
/var caliper.postprocess.parse.enabled: true
/var caliper.postprocess.visualize.enabled: true
/cluster psap-mgmt

Execution logs

kpouget (Contributor, Author) commented May 5, 2026

/test fournos skeleton quick_test
/var fournos.namespace: psap-automation-wip
/cluster psap-mgmt

psap-forge-bot (Bot) commented May 5, 2026

🟢 Test of 'skeleton test' succeeded after 00 hours 00 minutes 10 seconds 🟢

• Link to the test results.

• No reports index generated...

Test configuration:

ci_job.cluster: psap-mgmt
ci_job.exclusive: true
ci_job.fjob: forge-skeleton-20260505-165855
ci_job.hardware:
  gpuCount: 4
  gpuType: h200
ci_job.name: skeleton quick_test
ci_job.owner: kpouget
project.args:
- quick_test
project.name: skeleton

Execution logs

psap-forge-bot (Bot) commented May 5, 2026

🟢 Test of 'fournos_launcher submit' succeeded after 00 hours 01 minutes 29 seconds 🟢

• Link to the test results.

• No reports index generated...

Test configuration:

/test fournos skeleton quick_test
/var fournos.namespace: psap-automation-wip
/cluster psap-mgmt

Execution logs

coderabbitai (Bot) left a comment

Actionable comments posted: 7

Note

Due to the large number of review comments, Critical and Major severity comments were prioritized as inline comments.

🟡 Minor comments (14)
projects/caliper/schemas/kpi_record.schema.json-22-28 (1)

22-28: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

source sub-object fields are not marked required

test_base_path and plugin_module are declared as properties of source but have no required constraint, so "source": {} passes validation. Every call site (see projects/caliper/tests/stub_plugin.py and projects/caliper/engine/kpi/generate.py) always populates both fields — the schema should enforce this to catch broken plugin implementations at runtime rather than silently accepting incomplete records.

🛡️ Proposed fix
     "source": {
       "type": "object",
+      "required": ["test_base_path", "plugin_module"],
       "properties": {
         "test_base_path": { "type": "string" },
         "plugin_module": { "type": "string" }
       }
     }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@projects/caliper/schemas/kpi_record.schema.json` around lines 22 - 28, The
"source" object currently defines properties test_base_path and plugin_module
but doesn't enforce them; update the schema for the "source" object to include a
required array listing "test_base_path" and "plugin_module" so an empty
"source": {} fails validation; modify the source object definition (the same
block that declares properties test_base_path and plugin_module) to add
"required": ["test_base_path","plugin_module"] ensuring both fields are
mandatory.
projects/caliper/engine/kpi/opensearch_client.py-25-25 (1)

25-25: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

int(port_s) will raise an unhelpful ValueError on invalid port strings

If OPENSEARCH_HOSTS contains a malformed entry like "localhost:abc", int(port_s) raises a bare ValueError with no context about which host entry is broken.

🛡️ Proposed fix
-            host_list.append({"host": host, "port": int(port_s)})
+            try:
+                host_list.append({"host": host, "port": int(port_s)})
+            except ValueError:
+                raise ValueError(
+                    f"Invalid port in OPENSEARCH_HOSTS entry '{h}': expected integer, got '{port_s}'"
+                ) from None
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@projects/caliper/engine/kpi/opensearch_client.py` at line 25, The code
currently converts port_s to int directly in the OPENSEARCH_HOSTS parsing
(host_list.append({"host": host, "port": int(port_s)})), which will raise a bare
ValueError without context for malformed entries like "localhost:abc"; wrap the
int(port_s) conversion in a try/except that catches ValueError, and either raise
a new ValueError with a descriptive message including the original host
entry/port_s and OPENSEARCH_HOSTS string or log the bad entry and skip it;
update the logic around host_list and port_s so failures include the offending
host/port (and the variable OPENSEARCH_HOSTS) in the error message to make
debugging clear.
projects/caliper/engine/label_filters.py-27-32 (1)

27-32: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Label value type mismatch: non-string label values silently fail filter matches

LabelMap values are typed Any, so labels can carry bool or int values (e.g., higher_is_better: True in the stub plugin). Filter values from parse_filter_kv are always str, so labels.get("higher_is_better") == "True" evaluates to False even when the label is True. Users would get silently empty results rather than an error.

Consider either coercing label values to str during comparison or documenting that only string-valued labels are supported for filtering.

♻️ Proposed minimal fix (coerce to str for comparison)
-    for k, v in exclude.items():
-        if labels.get(k) == v:
+    for k, v in exclude.items():
+        if str(labels.get(k, "")) == v:
             return False
     if not include:
         return True
-    return all(labels.get(k) == v for k, v in include.items())
+    return all(str(labels.get(k, "")) == v for k, v in include.items())
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@projects/caliper/engine/label_filters.py` around lines 27 - 32, The filter
comparison silently fails when label values in LabelMap are non-string (e.g.,
bool/int) because parse_filter_kv yields strings; update the comparison logic in
the include/exclude block (the loop over exclude and the all(...) for include)
to coerce the label side to str before comparing (e.g., use str(labels.get(k))
== v and ensure labels.get(k) is not None) so that values like True/1 match
their string counterparts from parse_filter_kv; reference parse_filter_kv,
LabelMap, and the exclude/include comparison code to locate the changes.
projects/caliper/engine/cache.py-57-60 (1)

57-60: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

read_cache lets corrupted JSON crash the caller instead of falling back to a re-parse.

A partially-written cache (e.g. after a disk-full abort in write_cache) raises json.JSONDecodeError, which propagates through the parse pipeline with no actionable message. Returning None on a parse error triggers the existing re-parse path gracefully.

🐛 Proposed fix
 def read_cache(path: Path) -> dict[str, Any] | None:
     if not path.is_file():
         return None
-    return json.loads(path.read_text(encoding="utf-8"))
+    try:
+        return json.loads(path.read_text(encoding="utf-8"))
+    except (json.JSONDecodeError, OSError):
+        return None
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@projects/caliper/engine/cache.py` around lines 57 - 60, The read_cache
function currently calls json.loads(path.read_text(...)) which will raise
json.JSONDecodeError on corrupted/partially-written files; change read_cache to
catch json.JSONDecodeError (and optionally ValueError) around the json.loads
call and return None when a parse error occurs so the caller falls back to the
re-parse path; keep the existing path.is_file() check and ensure only JSON parse
errors are swallowed (re-raise unexpected exceptions if needed) and reference
the read_cache function and the json.loads(path.read_text(encoding="utf-8"))
expression when applying the change.
projects/caliper/tests/stub_plugin.py-28-36 (1)

28-36: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Missing break after reading metrics.json — last match silently wins.

_list_files_under uses rglob("*"), so artifact_paths can include metrics.json files from nested subdirectories. Without a break, the final file visited silently overwrites raw, discarding earlier parse errors and metrics. The skeleton plugin (projects/skeleton/postprocess/default/plugin.py) correctly exits the loop after the first match.

🐛 Proposed fix
             for p in node.artifact_paths:
                 if p.name == "metrics.json":
                     import json  # noqa: PLC0415
 
                     try:
                         raw = json.loads(p.read_text(encoding="utf-8"))
                     except json.JSONDecodeError as e:
                         warnings.append(f"Malformed JSON {p}: {e}")
                         raw = {"_error": "partial_parse"}
+                    break
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@projects/caliper/tests/stub_plugin.py` around lines 28 - 36, The loop over
node.artifact_paths in the test stub (the for p in node.artifact_paths block
that looks for p.name == "metrics.json") can allow later matches to overwrite
raw; after successfully handling (or catching a JSONDecodeError for) the first
metrics.json you must exit the loop—add a break right after setting raw (both in
the try and except paths or after the shared assignment) so the first discovered
metrics.json is used and later files cannot silently win; locate this logic in
stub_plugin.py inside the loop that references node.artifact_paths and modify it
to break once metrics.json has been processed.
projects/caliper/engine/kpi/generate.py-35-36 (1)

35-36: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Ensure output parent directory exists before writing JSONL.

Writing directly can fail when output includes a new subdirectory path. Creating parents first makes this path robust for orchestration-provided outputs.

Proposed fix
     text = "\n".join(json.dumps(r, ensure_ascii=False) for r in rows) + ("\n" if rows else "")
     if output:
+        output.parent.mkdir(parents=True, exist_ok=True)
         output.write_text(text, encoding="utf-8")
     return rows
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@projects/caliper/engine/kpi/generate.py` around lines 35 - 36, The write to
output.path can fail when the path contains non-existent parent directories;
before calling output.write_text(text, encoding="utf-8") ensure the parent
directory exists by creating it (use output.parent.mkdir(parents=True,
exist_ok=True)) so the write is robust for new subdirectories — update the block
around the output variable in generate.py to create parents first, then write
the JSONL.
projects/caliper/orchestration/postprocess_config.py-149-153 (1)

149-153: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Harden visualize selector validation against whitespace-only values.

reports: " " or report_group: " " currently passes validation but is effectively empty and may fail downstream. Align this check with the baseline validator’s strip() behavior.

Proposed fix
    @model_validator(mode="after")
     def _visualize_needs_selector(self) -> Self:
         if not self.visualize.enabled:
             return self
-        if not (self.visualize.reports or self.visualize.report_group):
+        has_reports = bool(self.visualize.reports and self.visualize.reports.strip())
+        has_report_group = bool(
+            self.visualize.report_group and self.visualize.report_group.strip()
+        )
+        if not (has_reports or has_report_group):
             raise ValueError(
                 "caliper.postprocess.visualize.enabled requires "
                 "`reports` (comma-separated) or `report_group`."
             )
         return self
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@projects/caliper/orchestration/postprocess_config.py` around lines 149 - 153,
The current validation for visualize selectors treats whitespace-only strings as
valid; update the check that uses self.visualize.reports and
self.visualize.report_group to strip() each value before testing truthiness
(e.g., consider reports.strip() and report_group.strip()) so that values
consisting only of whitespace are rejected and the ValueError is raised; ensure
you handle None safely (only call strip() when not None) and preserve the
existing error message and logic surrounding the raise.
projects/caliper/engine/kpi/import_export.py-45-47 (1)

45-47: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

load_kpis_jsonl raises an opaque FileNotFoundError if path doesn't exist.

Unlike build_layout in kpi_view.py (which guards with kpi_jsonl_path.is_file()), this function immediately calls path.read_text(). Since it's used by run_analyze for both current and baseline files (user-supplied paths), a missing baseline produces an unguarded traceback rather than a clear diagnostic message. Add an existence check or let callers handle it explicitly, but document the expectation.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@projects/caliper/engine/kpi/import_export.py` around lines 45 - 47, The
load_kpis_jsonl function currently calls path.read_text() unguarded and can
raise an opaque FileNotFoundError; update load_kpis_jsonl to first check
path.is_file() (or path.exists()) and raise a clear, contextual
FileNotFoundError (or ValueError) that includes the path and whether it's
expected to be a baseline or current file, and add/update the function docstring
to state whether callers may pass non-existent paths; reference load_kpis_jsonl
and callers like run_analyze and the pattern used in build_layout
(kpi_view.build_layout) to mirror the guarded behavior.
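
A guarded variant along the lines suggested above, as a sketch; `load_kpis_jsonl` and the baseline/current roles come from the comment, while the body here is illustrative:

```python
import json
from pathlib import Path
from typing import Any


def load_kpis_jsonl(path: Path, role: str = "current") -> list[dict[str, Any]]:
    """Load KPI records from JSONL; `role` ("current" or "baseline") only
    serves to make a missing-file error actionable. Callers must pass an
    existing file."""
    if not path.is_file():
        raise FileNotFoundError(f"KPI {role} JSONL not found: {path}")
    lines = path.read_text(encoding="utf-8").splitlines()
    return [json.loads(line) for line in lines if line.strip()]
```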
projects/caliper/engine/kpi/analyze.py-31-31 (1)

31-31: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Duplicate kpi_id in baseline silently drops all but the last entry.

The dict comprehension {b["kpi_id"]: b for b in baseline} uses last-write-wins semantics. If a baseline JSONL file contains multiple rows with the same kpi_id (e.g., from different runs), earlier rows are silently discarded and only the last is compared against. For regression analysis this can produce misleading results. At minimum, log a warning when a collision is detected.

🛡️ Proposed guard
-    base_by_id = {b["kpi_id"]: b for b in baseline}
+    base_by_id: dict[str, Any] = {}
+    for b in baseline:
+        kid = b["kpi_id"]
+        if kid in base_by_id:
+            import warnings  # noqa: PLC0415
+            warnings.warn(f"Duplicate kpi_id '{kid}' in baseline; keeping last entry.")
+        base_by_id[kid] = b
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@projects/caliper/engine/kpi/analyze.py` at line 31, The dict comprehension
building base_by_id from baseline silently drops earlier entries with the same
kpi_id; replace it with an explicit loop over baseline that checks for existing
keys and logs a warning when a duplicate kpi_id is encountered (include the
duplicated kpi_id and any identifying info available from the baseline entries),
then decide whether to keep the first, last, or accumulate entries — at minimum
keep the first occurrence by only setting base_by_id[b["kpi_id"]] if the key is
not already present. Update the code that uses base_by_id accordingly (symbols:
base_by_id, baseline) and use the module's logger to emit the warning.
projects/caliper/engine/kpi/import_export.py-22-22 (1)

22-22: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

doc_id produces "None-None-None" when required fields are absent.

If any of kpi_id, run_id, or timestamp is missing from a record, rec.get(...) returns None, yielding the literal string "None-None-None". All such malformed records will collide into a single OpenSearch document, silently overwriting each other. Add a validation/assertion before constructing doc_id, or raise an early error if required fields are missing.

🛡️ Proposed guard
-        doc_id = f"{rec.get('kpi_id')}-{rec.get('run_id')}-{rec.get('timestamp')}"
+        kpi_id = rec["kpi_id"]
+        run_id = rec["run_id"]
+        timestamp = rec["timestamp"]
+        doc_id = f"{kpi_id}-{run_id}-{timestamp}"
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@projects/caliper/engine/kpi/import_export.py` at line 22, The doc_id creation
uses rec.get('kpi_id'), rec.get('run_id'), and rec.get('timestamp') which can
return None and produce "None-None-None"; before constructing doc_id, validate
that rec contains non-empty values for 'kpi_id', 'run_id', and 'timestamp'
(e.g., check rec.get(...) is not None/empty) and either raise a ValueError or
skip/log the record so malformed records cannot overwrite each other; update the
code around the doc_id assignment (the block that sets doc_id from rec) to
perform the checks and handle the error path consistently (raise or continue) so
downstream OpenSearch indexing only receives well-formed doc_ids.
projects/caliper/engine/parse.py-57-68 (1)

57-68: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Partial/error parse results are cached unconditionally.

When warnings includes entries from a json.JSONDecodeError (i.e., a record carries {"_parse_error": True}), the bad parse is still written to the cache. The next invocation with use_cache=True will load and return the erroneous model without re-parsing. Consider skipping write_cache when warnings are non-empty, or at minimum storing a has_warnings flag in the cache doc so cache_is_valid can reject it on the next call.

Additionally, line 68 (model.parse_cache_ref = str(path)) is a no-op: parse_cache_ref is already set to the same value in the constructor at line 55.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@projects/caliper/engine/parse.py` around lines 57 - 68, The current code
unconditionally calls write_cache and then sets model.parse_cache_ref even when
warnings (including JSONDecodeError parse failures) are present; change the
logic in the block that uses warnings/force_report_partial so that if any
warning record has {"_parse_error": True} you do NOT call write_cache (to avoid
caching a broken parse), otherwise if warnings are non-empty include a
has_warnings flag in the cached doc (e.g. pass has_warnings=True through the
write_cache payload) so cache_is_valid can reject it later; also remove the
redundant model.parse_cache_ref = str(path) assignment since parse_cache_ref is
already set in the constructor. Use the existing symbols to find the code:
warnings, force_report_partial, model_to_jsonable, unified_dict, write_cache,
model.parse_cache_ref, and cache_is_valid.
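
A sketch of that guard, using the symbol names from the comment (the real layout of parse.py may differ, so `write_cache` is passed in to keep this self-contained):

```python
from pathlib import Path
from typing import Any, Callable


def maybe_write_cache(
    path: Path,
    unified_dict: dict[str, Any],
    warnings: list[str],
    write_cache: Callable[[Path, dict[str, Any]], None],
) -> None:
    # Never persist a parse that produced _parse_error records; otherwise tag
    # the doc so cache_is_valid can reject warned caches on the next run.
    if any(rec.get("_parse_error") for rec in unified_dict.get("records", [])):
        return
    unified_dict["has_warnings"] = bool(warnings)
    write_cache(path, unified_dict)
```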
projects/caliper/cli/main.py-313-322 (1)

313-322: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

kpi_import success message "Wrote snapshot" is semantically inverted — the command reads (imports), not writes.

The command imports data from the snapshot file, yet the feedback says "Wrote snapshot {snapshot}", implying a file was created at that path.

🐛 Proposed fix
-    click.echo(f"Wrote snapshot {snapshot}")
+    click.echo(f"Import complete (from {snapshot})")
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@projects/caliper/cli/main.py` around lines 313 - 322, The success message in
kpi_import is misleading — change the final click.echo in the kpi_import
function to indicate the snapshot was imported rather than written; update the
message (e.g., "Imported snapshot {snapshot}" or "Successfully imported snapshot
{snapshot}") so it accurately reflects that import_kpis_snapshot was run and the
data was read from the provided snapshot path.
projects/core/library/postprocess.py-99-105 (1)

99-105: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

except Exception is too broad in orchestration_artifact_dir — masks real errors.

Catching all exceptions here silently swallows AttributeError, TypeError, etc. that may indicate programming mistakes unrelated to ARTIFACT_DIR being unset. Use a tighter guard.

🛡️ Proposed fix
 def orchestration_artifact_dir() -> Path | None:
     """FORGE CI artifact directory ($ARTIFACT_DIR), when initialized."""
     try:
         ad = env.ARTIFACT_DIR
-    except Exception:
+    except (AttributeError, RuntimeError, OSError):
         return None
     return Path(ad) if ad is not None else None
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@projects/core/library/postprocess.py` around lines 99 - 105, The current
broad except in orchestration_artifact_dir masks real errors; replace the
try/except with a safe lookup using getattr(env, "ARTIFACT_DIR", None) (or if
you prefer, catch only AttributeError) so only the missing attribute case is
handled; then return Path(ad) if ad is not None else None. Ensure you update the
body of orchestration_artifact_dir (referencing env.ARTIFACT_DIR and the
variable ad) accordingly.
projects/caliper/engine/visualize.py-22-26 (1)

22-26: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

yaml.safe_load result is not validated as a mapping before downstream .get(...) access.

yaml.safe_load can return None (empty file), a list, or a scalar. If a caller passes or the auto-discovered file contains a non-mapping YAML root, resolve_report_ids will call config.get("groups", {}) and raise AttributeError, producing a confusing traceback.

🛡️ Proposed fix
 def resolve_visualize_config(
     base_dir: Path,
     explicit_path: Path | None,
 ) -> dict[str, Any] | None:
     if explicit_path is not None:
         p = explicit_path.resolve()
         if not p.is_file():
             raise FileNotFoundError(f"Visualize config not found: {p}")
-        return yaml.safe_load(p.read_text(encoding="utf-8"))
+        data = yaml.safe_load(p.read_text(encoding="utf-8"))
+        if data is not None and not isinstance(data, dict):
+            raise ValueError(f"Visualize config {p} must be a YAML mapping, got {type(data).__name__}")
+        return data
     for name in ("visualize-groups.yaml", "visualize-groups.yml"):
         cand = base_dir / name
         if cand.is_file():
-            return yaml.safe_load(cand.read_text(encoding="utf-8"))
+            data = yaml.safe_load(cand.read_text(encoding="utf-8"))
+            if data is not None and not isinstance(data, dict):
+                raise ValueError(f"Visualize config {cand} must be a YAML mapping, got {type(data).__name__}")
+            return data
     return None
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@projects/caliper/engine/visualize.py` around lines 22 - 26, The YAML loader
can return non-mappings causing resolve_report_ids to call config.get(...) and
crash; after each yaml.safe_load call (the returned value used as config) assign
it to a variable and ensure it's a mapping (e.g., isinstance(cfg, dict)); if
it's not, replace it with an empty dict (or raise a clear TypeError) before
returning so callers like resolve_report_ids can safely call
config.get("groups", {}).
🧹 Nitpick comments (6)
projects/caliper/engine/plugin_config.py (2)

10-10: 💤 Low value

MANIFEST_FILENAMES lookup order is an implicit priority contract

The first match wins (_find_manifest_under_base), so caliper.yaml takes precedence over forge-postprocess.yaml and postprocess.yaml. This priority is undocumented. A brief comment noting the resolution order would prevent confusion when a directory contains multiple manifests.

📝 Suggested doc comment
-MANIFEST_FILENAMES = ("caliper.yaml", "forge-postprocess.yaml", "postprocess.yaml")
+# Checked in order; first match wins.
+MANIFEST_FILENAMES = ("caliper.yaml", "forge-postprocess.yaml", "postprocess.yaml")
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@projects/caliper/engine/plugin_config.py` at line 10, Add a short doc comment
above the MANIFEST_FILENAMES tuple explaining that the tuple order defines
priority for manifest resolution (first match wins) and that
_find_manifest_under_base will return the first filename found under a
directory; reference both MANIFEST_FILENAMES and _find_manifest_under_base so
readers know that caliper.yaml will take precedence over later entries when
multiple manifests exist.

26-32: 💤 Low value

_find_manifest_under_base silently returns None for non-existent or non-directory base_dir

If base_dir doesn't exist or is a file, candidate.is_file() will always be False and the function returns None. The caller resolve_plugin_module_string then raises a ValueError about "no plugin module" rather than a more informative error about the invalid base directory. This is a minor DX issue — adding an early check here would surface the real problem.

♻️ Proposed fix
 def _find_manifest_under_base(base_dir: Path) -> Path | None:
     base_dir = base_dir.resolve()
+    if not base_dir.is_dir():
+        return None  # caller will surface a clearer error via resolve_plugin_module_string
     for name in MANIFEST_FILENAMES:

The comment keeps the behaviour identical (returns None) but the existence check is a no-op in valid usage and could be extended to raise a NotADirectoryError if stricter feedback is wanted later.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@projects/caliper/engine/plugin_config.py` around lines 26 - 32, The helper
_find_manifest_under_base currently resolves base_dir but silently returns None
when base_dir doesn't exist or is not a directory; update
_find_manifest_under_base to perform an early sanity check (using
base_dir.exists() and base_dir.is_dir()) and immediately return None if the path
is missing or not a directory so callers get a clearer failure path; keep the
rest of the function intact (iterate MANIFEST_FILENAMES, check
candidate.is_file()) and consider adding a comment noting a future option to
raise NotADirectoryError for stricter feedback.
projects/caliper/schemas/ai_eval_payload.schema.json (1)

9-12: 💤 Low value

Property name "optional" is semantically confusing in a schema context

Using "optional" as a field name next to JSON Schema keywords like "required" and "properties" creates a naming collision in the reader's mental model. Consider renaming to "nullable_metrics" or "missing_metrics" to clearly convey intent without the ambiguity.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@projects/caliper/schemas/ai_eval_payload.schema.json` around lines 9 - 12,
Rename the confusing top-level schema property "optional" to a clearer name
(e.g., "nullable_metrics" or "missing_metrics") in the AI eval payload schema
and update its "description" accordingly; search for and update all references
to "optional" in code, tests, and consumers (schema validators, TypeScript
types, serialization/deserialization code) to use the new property name so the
schema keywords ("required", "properties") remain semantically distinct from the
field name.
projects/caliper/engine/load_plugin.py (1)

38-40: ⚡ Quick win

Add a runtime isinstance guard before returning.

cast is a static-type-only hint; it performs no check at runtime. If a module returns an incompatible object (e.g. a raw function or wrong class), the failure manifests later as a cryptic AttributeError at the first plugin method call instead of a clear error here.

🛡️ Proposed fix
 if plugin is None:
     raise RuntimeError(f"Plugin module {module_path!r} returned no plugin.")
-return cast(PostProcessingPlugin, plugin)
+if not isinstance(plugin, PostProcessingPlugin):
+    raise RuntimeError(
+        f"Plugin module {module_path!r} returned {type(plugin)!r}, "
+        "expected a PostProcessingPlugin subclass instance."
+    )
+return plugin
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@projects/caliper/engine/load_plugin.py` around lines 38 - 40, Before
returning the plugin from load_plugin.py, add a runtime isinstance guard to
verify the object implements the PostProcessingPlugin type instead of relying on
cast alone: check that plugin is not None and isinstance(plugin,
PostProcessingPlugin) (or use an appropriate abstract base/class check if
PostProcessingPlugin is an ABC/protocol), and if the check fails raise a
RuntimeError with a clear message including module_path and the plugin's actual
type; ensure this replaces or augments the existing plugin is None check so
callers get an immediate, descriptive error rather than a late AttributeError.
projects/caliper/engine/ai_eval.py (1)

11-28: ⚡ Quick win

plugin: object defeats static type checking — use PostProcessingPlugin.

Annotating plugin as object means the attribute access on line 27 (plugin.build_ai_eval_payload) is unresolvable to any type checker, turning a potential AttributeError into a runtime surprise. The same pattern recurs in projects/caliper/engine/parse.py (plugin.parse). The PostProcessingPlugin base class (from projects/caliper/engine/model.py) is already imported by callers, so the type is available. The return type (dict[str, object]) should also align with the interface contract (dict[str, Any]).

♻️ Proposed fix
+from projects.caliper.engine.model import PostProcessingPlugin, UnifiedRunModel
+from typing import Any

 def run_ai_eval_export(
     *,
     base_dir: Path,
     plugin_module: str,
-    plugin: object,
+    plugin: PostProcessingPlugin,
     output: Path,
     use_cache: bool,
     cache_path: Path | None,
-) -> dict[str, object]:
+) -> dict[str, Any]:
     ...
-    build = plugin.build_ai_eval_payload
-    payload = build(model)
+    payload = plugin.build_ai_eval_payload(model)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@projects/caliper/engine/ai_eval.py` around lines 11 - 28, Change the loose
typing that hides attribute access errors: annotate the plugin parameter of
run_ai_eval_export as PostProcessingPlugin (so calls to
plugin.build_ai_eval_payload are type-checked) and update the function return
type from dict[str, object] to dict[str, Any]; mirror the same change in
projects/caliper/engine/parse.py for the plugin parameter used with
plugin.parse. Ensure the PostProcessingPlugin symbol is imported or referenced
where these functions are defined so static checkers can verify the
build_ai_eval_payload/parse methods and their expected return types.
projects/caliper/orchestration/postprocess.py (1)

234-297: ⚡ Quick win

_run_kpi should return the generated KPI output path to avoid re-deriving it at the call site.

_run_kpi internally computes out_kpi (line 248) and uses it as the run_kpi_generate output target, but the caller then re-derives the same path at lines 358–360 via a separate _resolve_under_workspace call with identical arguments. These two computations are currently equivalent, but if _run_kpi is ever changed to compute the output path differently, _run_analyze will silently receive a stale path. Returning the path from _run_kpi eliminates the duplication.

♻️ Proposed refactor
-    def _run_kpi() -> tuple[dict[str, Any], dict[str, Any], list[dict[str, Any]] | None]:
+    def _run_kpi() -> tuple[dict[str, Any], dict[str, Any], list[dict[str, Any]] | None, Path | None]:
         nonlocal kpi_generate_failed, kpi_export_failed
         empty: dict[str, Any] = {}
         if not ppc.kpi.enabled:
-            return {"status": "skipped", "reason": "kpi disabled"}, empty, None
+            return {"status": "skipped", "reason": "kpi disabled"}, empty, None, None
         rows: list[dict[str, Any]] | None = None
         gen_report: dict[str, Any] = {}
         exp_report: dict[str, Any] = {}
+        out_kpi_path: Path | None = None

         if ppc.kpi.generate.enabled:
             try:
                 mod_str, plugin = _load_plugin(...)
                 out_kpi = _resolve_under_workspace(ppc.kpi.generate.output, "kpis.jsonl", workspace)
+                out_kpi_path = out_kpi
                 rows = run_kpi_generate(..., output=out_kpi, ...)
                 ...
             except Exception as e:
                 ...
                 rows = None
         ...
-        return gen_report, exp_report, rows
+        return gen_report, exp_report, rows, out_kpi_path

     ...

     if ppc.kpi.enabled:
-        gen_r, exp_r, rows = _run_kpi()
+        gen_r, exp_r, rows, current_kpi_path = _run_kpi()
         steps["kpi_generate"] = gen_r
         steps["kpi_export"] = exp_r
-        if rows is not None and ppc.kpi.generate.enabled:
-            current_kpi_path = _resolve_under_workspace(
-                ppc.kpi.generate.output, "kpis.jsonl", workspace
-            )

Also applies to: 352-360

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@projects/caliper/orchestration/postprocess.py` around lines 234 - 297, The
_run_kpi function currently computes out_kpi locally and returns (gen_report,
exp_report, rows); change it to also return the actual output path (e.g.,
out_kpi as a pathlib.Path or str) so callers don't re-derive it: update the
return type annotation of _run_kpi to tuple[dict[str, Any], dict[str, Any],
list[dict[str, Any]] | None, str | None], set out_kpi when kpi.generate runs
(and return None when skipped or on failure), and update callers (e.g.,
_run_analyze) to accept the extra value and use the returned path instead of
calling _resolve_under_workspace again; ensure all branches (generate
skipped/failure and export-only) produce a sensible None or path value.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@projects/caliper/engine/cache.py`:
- Around line 14-32: fingerprint_base_dir currently includes the cache file
itself when walking base_dir which causes the cache to always change; update
fingerprint_base_dir to skip the cache directory ('.caliper_cache') and/or any
files under it (the cache files created by default_cache_path) when iterating
os.walk so that entries do not include files in '.caliper_cache'; ensure you
compare relative paths (rel) against that directory name and continue the loop
when matched so the cache file is excluded from the fingerprint.
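
For illustration, a fingerprint that prunes the cache directory as the prompt describes; `fingerprint_base_dir` and `.caliper_cache` are the names from the review, while the sha256-over-(path, size, mtime) scheme is an assumption:

```python
import hashlib
import os
from pathlib import Path

CACHE_DIR_NAME = ".caliper_cache"


def fingerprint_base_dir(base_dir: Path) -> str:
    h = hashlib.sha256()
    for root, dirs, files in os.walk(base_dir):
        # Prune the cache directory so cache writes never change the fingerprint.
        dirs[:] = [d for d in dirs if d != CACHE_DIR_NAME]
        for name in sorted(files):
            p = Path(root) / name
            rel = p.relative_to(base_dir).as_posix()
            st = p.stat()
            h.update(f"{rel}:{st.st_size}:{st.st_mtime_ns}".encode())
    return h.hexdigest()
```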

In `@projects/caliper/engine/kpi/import_export.py`:
- Around line 13-23: The export_kpis_to_index function currently calls
client.index(..., refresh=True) per record which causes a synchronous refresh
per document; change it to use opensearchpy.helpers.bulk by building an iterable
of actions (use index=idx, _id constructed the same way from
rec.get('kpi_id')/run_id/timestamp, and "_source" or "doc" from rec) and call
helpers.bulk(client, actions) once, then perform a single
client.indices.refresh(index=idx) if an immediate refresh is required (or simply
remove per-doc refresh and rely on the cluster). Update references in
export_kpis_to_index and keep build_client() usage; remove refresh=True from
per-document indexing and replace with a single bulk operation and optional
final refresh.
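
Sketched as code, the bulk rewrite could look like this; it assumes an opensearch-py client from `build_client()` and records shaped like the KPI JSONL rows:

```python
from typing import Any

from opensearchpy import helpers


def export_kpis_to_index(client: Any, idx: str, records: list[dict[str, Any]]) -> None:
    actions = (
        {
            "_index": idx,
            "_id": f"{rec['kpi_id']}-{rec['run_id']}-{rec['timestamp']}",
            "_source": rec,
        }
        for rec in records
    )
    helpers.bulk(client, actions)      # one bulk round-trip instead of one index() per doc
    client.indices.refresh(index=idx)  # single refresh, only if immediate searchability matters
```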

In `@projects/caliper/engine/kpi/opensearch_client.py`:
- Line 32: The OpenSearch client's use_ssl parameter currently falls back to
False when OPENSEARCH_USE_SSL is unset; update the opensearch client creation
(the use_ssl argument in projects/caliper/engine/kpi/opensearch_client.py) to
adopt a safer posture: either default use_ssl to True when OPENSEARCH_USE_SSL is
not provided, or keep the existing behavior but emit a warning via the module
logger (or a Warning/WarningLog) when OPENSEARCH_USE_SSL is missing or
explicitly false, so operators are alerted that plaintext connections will be
used; modify the code around the use_ssl=os.environ.get("OPENSEARCH_USE_SSL",
"").lower() in ("1", "true", "yes") expression and add the corresponding
log/warning or change the default.
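
A minimal sketch of the keep-behavior-but-warn option (illustrative; the real module may wire this differently):

```python
import logging
import os

logger = logging.getLogger(__name__)

raw = os.environ.get("OPENSEARCH_USE_SSL")
if raw is None:
    # Alert operators that the connection will fall back to plaintext.
    logger.warning("OPENSEARCH_USE_SSL is unset; defaulting to plaintext connections")
use_ssl = (raw or "").lower() in ("1", "true", "yes")
```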

In
`@projects/caliper/tests/fixtures/minimal_tree/.caliper_cache/projects_caliper_tests_stub_plugin_v1.json`:
- Around line 1-41: This fixture file is invalid as a cache hit because it
contains machine-specific absolute paths (fields: "base_directory", "directory",
"artifact_paths", "test_base_path") and a stale "input_fingerprint" computed
before the cache file existed; remove this committed fixture from .caliper_cache
now and do not commit cache files, and instead regenerate portable fixtures or
update tests to produce relative/templated paths; additionally note the runtime
code path fingerprint_base_dir should be fixed separately so it ignores
.caliper_cache when computing fingerprints.

In `@projects/core/library/postprocess.py`:
- Around line 148-158: The Click option for "--artifact-directory" in the
caliper-postprocess command currently allows non-existent paths and files;
update the click.option for "artifact_directory" (in the caliper-postprocess
command definition) to validate that the path exists and is a directory by
changing the Path option parameters to exists=True, file_okay=False,
dir_okay=True so mistyped/missing directories fail at CLI parse time.
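
For reference, the tightened option as a standalone click sketch; the command and callback names are hypothetical, only the `click.Path` parameters matter:

```python
import pathlib

import click


@click.command("caliper-postprocess")
@click.option(
    "--artifact-directory",
    type=click.Path(exists=True, file_okay=False, dir_okay=True, path_type=pathlib.Path),
)
def caliper_postprocess(artifact_directory: pathlib.Path) -> None:
    # exists=True + file_okay=False makes mistyped or missing directories
    # fail at CLI parse time instead of deep inside the postprocess run.
    click.echo(f"Post-processing {artifact_directory}")
```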

In `@projects/skeleton/orchestration/test_skeleton.py`:
- Around line 96-119: The test() function currently re-raises the postprocess
exception from run_caliper_postprocess_after_test and thereby hides any original
test failure captured in exc_msg; change the finally block so that when exc_msg
is not None (i.e., do_test() raised) you catch and log the postprocess Exception
(use logger.exception or logger.error) but do not re-raise it, and only
propagate the postprocess exception when there was no prior test exception (or
explicitly chain it with "raise NewPostprocessError(...) from original_exc" if
you want both preserved); update references in the finally block around
run_caliper_postprocess_after_test, exc_msg, ret and logger.exception
accordingly so the original test failure remains visible to callers.
- Line 25: seed_skeleton_caliper_artifacts() writes artifacts to
env.BASE_ARTIFACT_DIR or env.ARTIFACT_DIR but test() calls
run_caliper_postprocess_after_test(...) with pathlib.Path(env.ARTIFACT_DIR),
causing postprocessing to scan the wrong directory when ARTIFACT_DIR has been
changed; fix by passing the immutable base path (env.BASE_ARTIFACT_DIR or
env.ARTIFACT_DIR) into run_caliper_postprocess_after_test from test(), or
alternatively update run_caliper_postprocess_after_test to compute
artifact_base_dir using pathlib.Path(env.BASE_ARTIFACT_DIR or env.ARTIFACT_DIR).
Ensure you reference the same expression used in
seed_skeleton_caliper_artifacts() so postprocessing always targets the base
artifact directory.
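
Returning to the first test_skeleton.py item above, the suggested control flow could be sketched like this; `do_test`, `exc_msg`, `ret`, `logger`, and `run_caliper_postprocess_after_test` follow the review's names, and the stubs at the top exist only to make the sketch self-contained:

```python
import logging
import pathlib

logger = logging.getLogger(__name__)
artifact_base_dir = pathlib.Path("/tmp/artifacts")  # placeholder


def do_test() -> int:  # placeholder for the real test body
    return 0


def run_caliper_postprocess_after_test(base_dir: pathlib.Path) -> None:
    ...  # placeholder for the real postprocess entry point


def test() -> int:
    exc_msg = None
    ret = 1
    try:
        ret = do_test()
    except Exception as e:
        exc_msg = str(e)
        raise
    finally:
        try:
            run_caliper_postprocess_after_test(artifact_base_dir)
        except Exception:
            if exc_msg is not None:
                # The original test failure is already propagating; log the
                # postprocess error instead of masking it.
                logger.exception("Caliper postprocess failed after a failing test")
            else:
                raise
    return ret
```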

---

Minor comments:
In `@projects/caliper/cli/main.py`:
- Around line 313-322: The success message in kpi_import is misleading — change
the final click.echo in the kpi_import function to indicate the snapshot was
imported rather than written; update the message (e.g., "Imported snapshot
{snapshot}" or "Successfully imported snapshot {snapshot}") so it accurately
reflects that import_kpis_snapshot was run and the data was read from the
provided snapshot path.

In `@projects/caliper/engine/cache.py`:
- Around line 57-60: The read_cache function currently calls
json.loads(path.read_text(...)) which will raise json.JSONDecodeError on
corrupted/partially-written files; change read_cache to catch
json.JSONDecodeError (and optionally ValueError) around the json.loads call and
return None when a parse error occurs so the caller falls back to the re-parse
path; keep the existing path.is_file() check and ensure only JSON parse errors
are swallowed (re-raise unexpected exceptions if needed) and reference the
read_cache function and the json.loads(path.read_text(encoding="utf-8"))
expression when applying the change.

In `@projects/caliper/engine/kpi/analyze.py`:
- Line 31: The dict comprehension building base_by_id from baseline silently
drops earlier entries with the same kpi_id; replace it with an explicit loop
over baseline that checks for existing keys and logs a warning when a duplicate
kpi_id is encountered (include the duplicated kpi_id and any identifying info
available from the baseline entries), then decide whether to keep the first,
last, or accumulate entries — at minimum keep the first occurrence by only
setting base_by_id[b["kpi_id"]] if the key is not already present. Update the
code that uses base_by_id accordingly (symbols: base_by_id, baseline) and use
the module's logger to emit the warning.

In `@projects/caliper/engine/kpi/generate.py`:
- Around line 35-36: The write to output.path can fail when the path contains
non-existent parent directories; before calling output.write_text(text,
encoding="utf-8") ensure the parent directory exists by creating it (use
output.parent.mkdir(parents=True, exist_ok=True)) so the write is robust for new
subdirectories — update the block around the output variable in generate.py to
create parents first, then write the JSONL.

In `@projects/caliper/engine/kpi/import_export.py`:
- Around line 45-47: The load_kpis_jsonl function currently calls
path.read_text() unguarded and can raise an opaque FileNotFoundError; update
load_kpis_jsonl to first check path.is_file() (or path.exists()) and raise a
clear, contextual FileNotFoundError (or ValueError) that includes the path and
whether it's expected to be a baseline or current file, and add/update the
function docstring to state whether callers may pass non-existent paths;
reference load_kpis_jsonl and callers like run_analyze and the pattern used in
build_layout (kpi_view.build_layout) to mirror the guarded behavior.
- Line 22: The doc_id creation uses rec.get('kpi_id'), rec.get('run_id'), and
rec.get('timestamp') which can return None and produce "None-None-None"; before
constructing doc_id, validate that rec contains non-empty values for 'kpi_id',
'run_id', and 'timestamp' (e.g., check rec.get(...) is not None/empty) and
either raise a ValueError or skip/log the record so malformed records cannot
overwrite each other; update the code around the doc_id assignment (the block
that sets doc_id from rec) to perform the checks and handle the error path
consistently (raise or continue) so downstream OpenSearch indexing only receives
well-formed doc_ids.

In `@projects/caliper/engine/kpi/opensearch_client.py`:
- Line 25: The code currently converts port_s to int directly in the
OPENSEARCH_HOSTS parsing (host_list.append({"host": host, "port":
int(port_s)})), which will raise a bare ValueError without context for malformed
entries like "localhost:abc"; wrap the int(port_s) conversion in a try/except
that catches ValueError, and either raise a new ValueError with a descriptive
message including the original host entry/port_s and OPENSEARCH_HOSTS string or
log the bad entry and skip it; update the logic around host_list and port_s so
failures include the offending host/port (and the variable OPENSEARCH_HOSTS) in
the error message to make debugging clear.
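
A sketch of the guarded parsing, assuming OPENSEARCH_HOSTS is a comma-separated host:port list (the exact splitting logic lives in opensearch_client.py):

def parse_hosts(opensearch_hosts: str) -> list[dict]:
    host_list: list[dict] = []
    for entry in opensearch_hosts.split(","):
        host, _, port_s = entry.strip().partition(":")
        try:
            port = int(port_s)
        except ValueError as exc:
            raise ValueError(
                f"Invalid port {port_s!r} in entry {entry!r} "
                f"(OPENSEARCH_HOSTS={opensearch_hosts!r})"
            ) from exc
        host_list.append({"host": host, "port": port})
    return host_list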

In `@projects/caliper/engine/label_filters.py`:
- Around line 27-32: The filter comparison silently fails when label values in
LabelMap are non-string (e.g., bool/int) because parse_filter_kv yields strings;
update the comparison logic in the include/exclude block (the loop over exclude
and the all(...) for include) to coerce the label side to str before comparing
(e.g., use str(labels.get(k)) == v and ensure labels.get(k) is not None) so that
values like True/1 match their string counterparts from parse_filter_kv;
reference parse_filter_kv, LabelMap, and the exclude/include comparison code to
locate the changes.
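
A sketch of the coerced comparison (matches is a hypothetical helper; include/exclude are the (key, value) string pairs from parse_filter_kv):

def matches(labels: dict, include: list[tuple[str, str]], exclude: list[tuple[str, str]]) -> bool:
    # Coerce label values (bool/int/...) to str so True matches "True" and 1 matches "1".
    for k, v in exclude:
        if labels.get(k) is not None and str(labels.get(k)) == v:
            return False
    return all(labels.get(k) is not None and str(labels.get(k)) == v for k, v in include)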

In `@projects/caliper/engine/parse.py`:
- Around line 57-68: The current code unconditionally calls write_cache and then
sets model.parse_cache_ref even when warnings (including JSONDecodeError parse
failures) are present; change the logic in the block that uses
warnings/force_report_partial so that if any warning record has {"_parse_error":
True} you do NOT call write_cache (to avoid caching a broken parse), otherwise
if warnings are non-empty include a has_warnings flag in the cached doc (e.g.
pass has_warnings=True through the write_cache payload) so cache_is_valid can
reject it later; also remove the redundant model.parse_cache_ref = str(path)
assignment since parse_cache_ref is already set in the constructor. Use the
existing symbols to find the code: warnings, force_report_partial,
model_to_jsonable, unified_dict, write_cache, model.parse_cache_ref, and
cache_is_valid.
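
The gating logic, shown as an in-place fragment over the symbols listed above (has_warnings is the proposed new write_cache field, not the current API):

parse_failed = any(w.get("_parse_error") for w in warnings)
if not parse_failed:
    # Never cache a broken parse; flag warning-tainted docs so cache_is_valid can reject them.
    write_cache(path, unified_dict, has_warnings=bool(warnings))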

In `@projects/caliper/engine/visualize.py`:
- Around line 22-26: The YAML loader can return non-mappings causing
resolve_report_ids to call config.get(...) and crash; after each yaml.safe_load
call (the returned value used as config) assign it to a variable and ensure it's
a mapping (e.g., isinstance(cfg, dict)); if it's not, replace it with an empty
dict (or raise a clear TypeError) before returning so callers like
resolve_report_ids can safely call config.get("groups", {}).
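
A sketch of the guard (load_mapping is a hypothetical wrapper around the yaml.safe_load calls):

import yaml

def load_mapping(text: str) -> dict:
    cfg = yaml.safe_load(text)
    if not isinstance(cfg, dict):
        return {}  # or: raise TypeError("visualize config must be a mapping")
    return cfg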

In `@projects/caliper/orchestration/postprocess_config.py`:
- Around line 149-153: The current validation for visualize selectors treats
whitespace-only strings as valid; update the check that uses
self.visualize.reports and self.visualize.report_group to strip() each value
before testing truthiness (e.g., consider reports.strip() and
report_group.strip()) so that values consisting only of whitespace are rejected
and the ValueError is raised; ensure you handle None safely (only call strip()
when not None) and preserve the existing error message and logic surrounding the
raise.
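
As an in-place fragment (the exact error message should stay as it is in postprocess_config.py):

reports = (self.visualize.reports or "").strip()
report_group = (self.visualize.report_group or "").strip()
if not (reports or report_group):
    raise ValueError("visualize requires a non-blank 'reports' or 'report_group'")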

In `@projects/caliper/schemas/kpi_record.schema.json`:
- Around line 22-28: The "source" object currently defines properties
test_base_path and plugin_module but doesn't enforce them; update the schema for
the "source" object to include a required array listing "test_base_path" and
"plugin_module" so an empty "source": {} fails validation; modify the source
object definition (the same block that declares properties test_base_path and
plugin_module) to add "required": ["test_base_path","plugin_module"] ensuring
both fields are mandatory.
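
The resulting "source" block would look roughly like this (property types assumed):

"source": {
  "type": "object",
  "properties": {
    "test_base_path": {"type": "string"},
    "plugin_module": {"type": "string"}
  },
  "required": ["test_base_path", "plugin_module"]
}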

In `@projects/caliper/tests/stub_plugin.py`:
- Around line 28-36: The loop over node.artifact_paths in the test stub (the for
p in node.artifact_paths block that looks for p.name == "metrics.json") can
allow later matches to overwrite raw; after successfully handling (or catching a
JSONDecodeError for) the first metrics.json you must exit the loop—add a break
right after setting raw (both in the try and except paths or after the shared
assignment) so the first discovered metrics.json is used and later files cannot
silently win; locate this logic in stub_plugin.py inside the loop that
references node.artifact_paths and modify it to break once metrics.json has been
processed.
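
A fragment of the fixed loop (a single break after the shared assignment covers both the try and except paths):

import json

raw = None
for p in node.artifact_paths:
    if p.name != "metrics.json":
        continue
    try:
        raw = json.loads(p.read_text(encoding="utf-8"))
    except json.JSONDecodeError:
        raw = None
    break  # first metrics.json wins; later files cannot silently overwrite it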

In `@projects/core/library/postprocess.py`:
- Around line 99-105: The current broad except in orchestration_artifact_dir
masks real errors; replace the try/except with a safe lookup using getattr(env,
"ARTIFACT_DIR", None) (or if you prefer, catch only AttributeError) so only the
missing attribute case is handled; then return Path(ad) if ad is not None else
None. Ensure you update the body of orchestration_artifact_dir (referencing
env.ARTIFACT_DIR and the variable ad) accordingly.
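
A sketch of the safe lookup (env is the module referenced above):

from pathlib import Path

def orchestration_artifact_dir() -> Path | None:
    ad = getattr(env, "ARTIFACT_DIR", None)  # only the missing-attribute case yields None
    return Path(ad) if ad is not None else None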

---

Nitpick comments:
In `@projects/caliper/engine/ai_eval.py`:
- Around line 11-28: Change the loose typing that hides attribute access errors:
annotate the plugin parameter of run_ai_eval_export as PostProcessingPlugin (so
calls to plugin.build_ai_eval_payload are type-checked) and update the function
return type from dict[str, object] to dict[str, Any]; mirror the same change in
projects/caliper/engine/parse.py for the plugin parameter used with
plugin.parse. Ensure the PostProcessingPlugin symbol is imported or referenced
where these functions are defined so static checkers can verify the
build_ai_eval_payload/parse methods and their expected return types.
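
A sketch of the tightened signature (the parameter list beyond plugin, and the exact build_ai_eval_payload call shape, are assumptions):

from typing import Any

from projects.caliper.engine.model import PostProcessingPlugin, UnifiedRunModel

def run_ai_eval_export(plugin: PostProcessingPlugin, model: UnifiedRunModel) -> dict[str, Any]:
    # Typed plugin lets static checkers verify build_ai_eval_payload and its return type.
    return dict(plugin.build_ai_eval_payload(model))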

In `@projects/caliper/engine/load_plugin.py`:
- Around line 38-40: Before returning the plugin from load_plugin.py, add a
runtime isinstance guard to verify the object implements the
PostProcessingPlugin type instead of relying on cast alone: check that plugin is
not None and isinstance(plugin, PostProcessingPlugin) (or use an appropriate
abstract base/class check if PostProcessingPlugin is an ABC/protocol), and if
the check fails raise a RuntimeError with a clear message including module_path
and the plugin's actual type; ensure this replaces or augments the existing
plugin is None check so callers get an immediate, descriptive error rather than
a late AttributeError.
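
The guard, as an in-place fragment (isinstance against PostProcessingPlugin assumes it is an ABC or a runtime_checkable Protocol):

if plugin is None or not isinstance(plugin, PostProcessingPlugin):
    raise RuntimeError(
        f"{module_path} did not provide a PostProcessingPlugin "
        f"(got {type(plugin).__name__})"
    )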

In `@projects/caliper/engine/plugin_config.py`:
- Line 10: Add a short doc comment above the MANIFEST_FILENAMES tuple explaining
that the tuple order defines priority for manifest resolution (first match wins)
and that _find_manifest_under_base will return the first filename found under a
directory; reference both MANIFEST_FILENAMES and _find_manifest_under_base so
readers know that caliper.yaml will take precedence over later entries when
multiple manifests exist.
- Around line 26-32: The helper _find_manifest_under_base currently resolves
base_dir but silently returns None when base_dir doesn't exist or is not a
directory; update _find_manifest_under_base to perform an early sanity check
(using base_dir.exists() and base_dir.is_dir()) and immediately return None if
the path is missing or not a directory so callers get a clearer failure path;
keep the rest of the function intact (iterate MANIFEST_FILENAMES, check
candidate.is_file()) and consider adding a comment noting a future option to
raise NotADirectoryError for stricter feedback.
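
A sketch covering both points above (the real MANIFEST_FILENAMES tuple has more entries than shown):

from pathlib import Path

# Tuple order defines resolution priority: the first filename found wins,
# so caliper.yaml takes precedence over later entries.
MANIFEST_FILENAMES = ("caliper.yaml",)

def _find_manifest_under_base(base_dir: Path) -> Path | None:
    base_dir = base_dir.resolve()
    if not base_dir.is_dir():  # early sanity check; could raise NotADirectoryError instead
        return None
    for name in MANIFEST_FILENAMES:
        candidate = base_dir / name
        if candidate.is_file():
            return candidate
    return None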

In `@projects/caliper/orchestration/postprocess.py`:
- Around line 234-297: The _run_kpi function currently computes out_kpi locally
and returns (gen_report, exp_report, rows); change it to also return the actual
output path (e.g., out_kpi as a pathlib.Path or str) so callers don't re-derive
it: update the return type annotation of _run_kpi to tuple[dict[str, Any],
dict[str, Any], list[dict[str, Any]] | None, str | None], set out_kpi when
kpi.generate runs (and return None when skipped or on failure), and update
callers (e.g., _run_analyze) to accept the extra value and use the returned path
instead of calling _resolve_under_workspace again; ensure all branches (generate
skipped/failure and export-only) produce a sensible None or path value.

In `@projects/caliper/schemas/ai_eval_payload.schema.json`:
- Around line 9-12: Rename the confusing top-level schema property "optional" to
a clearer name (e.g., "nullable_metrics" or "missing_metrics") in the AI eval
payload schema and update its "description" accordingly; search for and update
all references to "optional" in code, tests, and consumers (schema validators,
TypeScript types, serialization/deserialization code) to use the new property
name so the schema keywords ("required", "properties") remain semantically
distinct from the field name.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 672d0927-19e3-4813-b036-ced74f654fcd

📥 Commits

Reviewing files that changed from the base of the PR and between 8392886 and 891b719.

📒 Files selected for processing (46)
  • .gitignore
  • projects/caliper/README.md
  • projects/caliper/cli/main.py
  • projects/caliper/dash_app/kpi_view.py
  • projects/caliper/engine/__init__.py
  • projects/caliper/engine/ai_eval.py
  • projects/caliper/engine/cache.py
  • projects/caliper/engine/kpi/__init__.py
  • projects/caliper/engine/kpi/analyze.py
  • projects/caliper/engine/kpi/catalog.py
  • projects/caliper/engine/kpi/generate.py
  • projects/caliper/engine/kpi/import_export.py
  • projects/caliper/engine/kpi/opensearch_client.py
  • projects/caliper/engine/kpi/rules.py
  • projects/caliper/engine/label_filters.py
  • projects/caliper/engine/load_plugin.py
  • projects/caliper/engine/model.py
  • projects/caliper/engine/parse.py
  • projects/caliper/engine/plugin_config.py
  • projects/caliper/engine/traverse.py
  • projects/caliper/engine/validation.py
  • projects/caliper/engine/visualize.py
  • projects/caliper/orchestration/postprocess.py
  • projects/caliper/orchestration/postprocess_config.py
  • projects/caliper/orchestration/postprocess_outcome.py
  • projects/caliper/schemas/ai_eval_payload.schema.json
  • projects/caliper/schemas/kpi_record.schema.json
  • projects/caliper/tests/__init__.py
  • projects/caliper/tests/fixtures/minimal_tree/.caliper_cache/projects_caliper_tests_stub_plugin_v1.json
  • projects/caliper/tests/fixtures/minimal_tree/caliper.yaml
  • projects/caliper/tests/fixtures/minimal_tree/team_a/__test_labels__.yaml
  • projects/caliper/tests/fixtures/minimal_tree/team_a/metrics.json
  • projects/caliper/tests/stub_plugin.py
  • projects/core/library/postprocess.py
  • projects/skeleton/orchestration/ci.py
  • projects/skeleton/orchestration/config.yaml
  • projects/skeleton/orchestration/test_skeleton.py
  • projects/skeleton/postprocess/__init__.py
  • projects/skeleton/postprocess/default/__init__.py
  • projects/skeleton/postprocess/default/caliper.yaml
  • projects/skeleton/postprocess/default/demo_run/load/__test_labels__.yaml
  • projects/skeleton/postprocess/default/demo_run/load/metrics.json
  • projects/skeleton/postprocess/default/demo_run/smoke/__test_labels__.yaml
  • projects/skeleton/postprocess/default/demo_run/smoke/metrics.json
  • projects/skeleton/postprocess/default/plugin.py
  • projects/skeleton/postprocess/default/visualize-groups.yaml

Comment thread projects/caliper/engine/cache.py
Comment thread projects/caliper/engine/kpi/import_export.py
Comment thread projects/caliper/engine/kpi/opensearch_client.py
Comment thread projects/core/library/postprocess.py Outdated
Comment thread projects/skeleton/orchestration/test_skeleton.py Outdated
Comment thread projects/skeleton/orchestration/test_skeleton.py Outdated
@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@projects/core/library/postprocess.py`:
- Around line 85-89: The function run_caliper_postprocess_after_test is
annotated to return None but actually returns a dict when post-processing is
enabled (it returns the variable status from
run_caliper_orchestration_postprocess); update the signature of
run_caliper_postprocess_after_test to return dict[str, Any] | None so it matches
both the early None return and the dict returned via status, and adjust any
callers or type hints that assume a pure None return if necessary.
- Line 230: The file fails ruff format due to an overlong function signature for
caliper_postprocess_command; run the formatter or manually wrap the signature to
comply with the 88-char line limit (e.g., break parameters onto multiple lines
or split the type annotations) and then run `ruff format
projects/core/library/postprocess.py` to apply formatting; ensure the function
definition for caliper_postprocess_command and any adjacent lines are
reformatted so ruff format --check passes.
- Around line 218-226: The help text for the click option "--output-directory"
(parameter name output_directory in postprocess.py) is incorrect and copy-pasted
from the artifact-directory option; update the help string to describe that this
option specifies where post-processing results are written (the directory to
store output results), and remove the incorrect sentence about overriding
caliper.postprocess.artifacts_dir and ARTIFACT_BASE_DIR so the message clearly
states it controls the output/results target rather than the input artifact tree
root.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: b9a2e96a-c18a-4195-b93f-b20cfa276f62

📥 Commits

Reviewing files that changed from the base of the PR and between 891b719 and 8425b73.

📒 Files selected for processing (20)
  • .gitignore
  • projects/caliper/README.md
  • projects/caliper/cli/main.py
  • projects/caliper/engine/plugin_config.py
  • projects/caliper/orchestration/postprocess.py
  • projects/caliper/orchestration/postprocess_config.py
  • projects/caliper/orchestration/postprocess_outcome.py
  • projects/core/library/postprocess.py
  • projects/skeleton/orchestration/cli.py
  • projects/skeleton/orchestration/config.yaml
  • projects/skeleton/orchestration/test_skeleton.py
  • projects/skeleton/postprocess/__init__.py
  • projects/skeleton/postprocess/default/__init__.py
  • projects/skeleton/postprocess/default/caliper.yaml
  • projects/skeleton/postprocess/default/demo_run/load/__test_labels__.yaml
  • projects/skeleton/postprocess/default/demo_run/load/metrics.json
  • projects/skeleton/postprocess/default/demo_run/smoke/__test_labels__.yaml
  • projects/skeleton/postprocess/default/demo_run/smoke/metrics.json
  • projects/skeleton/postprocess/default/plugin.py
  • projects/skeleton/postprocess/default/visualize-groups.yaml
✅ Files skipped from review due to trivial changes (10)
  • projects/skeleton/postprocess/__init__.py
  • projects/skeleton/postprocess/default/demo_run/smoke/__test_labels__.yaml
  • .gitignore
  • projects/skeleton/postprocess/default/demo_run/load/metrics.json
  • projects/skeleton/postprocess/default/demo_run/smoke/metrics.json
  • projects/skeleton/postprocess/default/__init__.py
  • projects/skeleton/postprocess/default/demo_run/load/__test_labels__.yaml
  • projects/skeleton/postprocess/default/visualize-groups.yaml
  • projects/skeleton/postprocess/default/caliper.yaml
  • projects/caliper/README.md
🚧 Files skipped from review as they are similar to previous changes (6)
  • projects/skeleton/orchestration/config.yaml
  • projects/caliper/engine/plugin_config.py
  • projects/caliper/orchestration/postprocess.py
  • projects/skeleton/postprocess/default/plugin.py
  • projects/caliper/orchestration/postprocess_config.py
  • projects/caliper/cli/main.py

Comment thread projects/core/library/postprocess.py Outdated
Comment thread projects/core/library/postprocess.py
Comment thread projects/core/library/postprocess.py Outdated
@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (1)
projects/caliper/engine/cache.py (1)

35-53: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Critical: fingerprint_test_base self-invalidates the per-test-base cache.

cache_path_for_test_base writes the cache to test_base_dir/.caliper_cache/<safe>_v1.json (line 64) — i.e. inside test_base_dir. fingerprint_test_base walks test_base_dir with os.walk and no exclusions, so once write_test_base_cache has been called once:

  1. Run 1: walk yields no .caliper_cache/... entries → fingerprint F1 → write cache (mtime=T1, size=S1).
  2. Run 2: walk now includes .caliper_cache/<safe>_v1.json at (T1, S1) → fingerprint F2 ≠ F1 → test_base_cache_is_valid returns False → re-parse and overwrite cache.
  3. Run 3: cache file is now at (T2, S2) → F3 ≠ F2 → miss again.

The per-test-base cache (the active code path used by run_parse) can therefore never report a hit. This is the same root cause raised previously on fingerprint_base_dir (lines 14–32); the fix needs to be applied to both functions since fingerprint_test_base is the one wired into parse.py.

🐛 Proposed fix (apply to both functions)
 def fingerprint_test_base(test_base_dir: Path, plugin_module: str) -> str:
     """Stable hash over file paths + mtimes under a single test base directory."""
     test_base_dir = test_base_dir.resolve()
     entries: list[tuple[str, int, int]] = []
-    for root, _dirs, files in os.walk(test_base_dir):
+    for root, dirnames, files in os.walk(test_base_dir, topdown=True):
+        dirnames[:] = [d for d in dirnames if d != ".caliper_cache"]
         for name in sorted(files):
             p = Path(root) / name
             try:
                 st = p.stat()
             except OSError:
                 continue
             rel = str(p.relative_to(test_base_dir))
             entries.append((rel, int(st.st_mtime_ns), st.st_size))
     entries.sort()
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@projects/caliper/engine/cache.py` around lines 35 - 53, fingerprint_test_base
(and similarly fingerprint_base_dir) is including the per-test-base cache file
written under .caliper_cache into its walk, causing the fingerprint to change
after the cache is created; modify the file-collection logic in
fingerprint_test_base to ignore anything under the ".caliper_cache" directory
(e.g. skip files whose relative path starts with ".caliper_cache/" or whose
Path.parts contains ".caliper_cache") so the generated entries list never
includes the cache file written by cache_path_for_test_base, ensuring stable
fingerprints and cache hits.
🧹 Nitpick comments (1)
projects/caliper/engine/parse.py (1)

50-82: 💤 Low value

Cache writes happen even when use_cache=False.

When use_cache is False the read branch is skipped, but the parse-miss branch unconditionally calls write_test_base_cache (lines 76–82). For callers that interpret --no-cache (CLI) / parse.no_cache=true (orchestration) as "do not touch the cache at all," this still mutates .caliper_cache/<plugin>_v1.json on every run.

If the intent is "force re-parse and refresh cache" this is fine — please confirm. Otherwise gate the write on use_cache:

♻️ Optional gate
-            # Write cache for this test base
-            cache_file = write_test_base_cache(
-                test_base_dir,
-                plugin_module=plugin_module,
-                test_base_records=records,
-                fingerprint=fp,
-            )
-            cache_refs.append(str(cache_file))
+            # Write cache for this test base
+            if use_cache:
+                cache_file = write_test_base_cache(
+                    test_base_dir,
+                    plugin_module=plugin_module,
+                    test_base_records=records,
+                    fingerprint=fp,
+                )
+            cache_refs.append(str(cache_file))
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@projects/caliper/engine/parse.py` around lines 50 - 82, The code currently
always calls write_test_base_cache and appends cache_file to cache_refs after a
parse miss even when use_cache is False; change the logic in the parse branch so
that write_test_base_cache (and the subsequent cache_refs.append) are executed
only when use_cache is True (leave the parse_fn, result.records/warnings
handling and all_records/all_warnings updates unchanged); reference the symbols
cached_records, use_cache, parse_fn, write_test_base_cache, cache_file and
cache_refs to locate and gate the cache write/append.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@projects/caliper/engine/kpi/generate.py`:
- Around line 13-27: The function parameter cache_path is dead: remove it from
run_kpi_generate's signature and any internal use, and likewise drop the unused
cache_path parameter from run_ai_eval_export and run_visualize so their public
APIs match run_parse (which no longer accepts cache_path); then update all
callers that currently pass cache_path=None (notably the call sites in
cli/main.py and orchestration/postprocess.py) to stop supplying that argument
and adjust any related type hints/imports or docstrings to remove references to
cache_path so the signatures and plumbing are consistent.

---

Duplicate comments:
In `@projects/caliper/engine/cache.py`:
- Around line 35-53: fingerprint_test_base (and similarly fingerprint_base_dir)
is including the per-test-base cache file written under .caliper_cache into its
walk, causing the fingerprint to change after the cache is created; modify the
file-collection logic in fingerprint_test_base to ignore anything under the
".caliper_cache" directory (e.g. skip files whose relative path starts with
".caliper_cache/" or whose Path.parts contains ".caliper_cache") so the
generated entries list never includes the cache file written by
cache_path_for_test_base, ensuring stable fingerprints and cache hits.

---

Nitpick comments:
In `@projects/caliper/engine/parse.py`:
- Around line 50-82: The code currently always calls write_test_base_cache and
appends cache_file to cache_refs after a parse miss even when use_cache is
False; change the logic in the parse branch so that write_test_base_cache (and
the subsequent cache_refs.append) are executed only when use_cache is True
(leave the parse_fn, result.records/warnings handling and
all_records/all_warnings updates unchanged); reference the symbols
cached_records, use_cache, parse_fn, write_test_base_cache, cache_file and
cache_refs to locate and gate the cache write/append.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 32f1f176-6a91-444f-ac25-f46941958596

📥 Commits

Reviewing files that changed from the base of the PR and between 8425b73 and 1d58c40.

📒 Files selected for processing (9)
  • projects/caliper/cli/main.py
  • projects/caliper/engine/ai_eval.py
  • projects/caliper/engine/cache.py
  • projects/caliper/engine/kpi/generate.py
  • projects/caliper/engine/parse.py
  • projects/caliper/engine/visualize.py
  • projects/caliper/orchestration/postprocess.py
  • projects/caliper/orchestration/postprocess_config.py
  • projects/core/library/postprocess.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • projects/core/library/postprocess.py

Comment thread projects/caliper/engine/kpi/generate.py
@kpouget
Contributor Author

kpouget commented May 6, 2026

/test fournos skeleton quick_test
/var fournos.namespace: psap-automation-wip
/cluster psap-mgmt

@psap-forge-bot

psap-forge-bot Bot commented May 6, 2026

🟢 Test of 'skeleton test' succeeded after 00 hours 00 minutes 10 seconds 🟢

• Link to the test results.

• No reports index generated...

Test configuration:

ci_job.cluster: psap-mgmt
ci_job.exclusive: true
ci_job.fjob: forge-skeleton-20260506-115647
ci_job.hardware:
  gpuCount: 4
  gpuType: h200
ci_job.name: skeleton quick_test
ci_job.owner: kpouget
project.args:
- quick_test
project.name: skeleton

Execution logs

@psap-forge-bot

psap-forge-bot Bot commented May 6, 2026

🟢 Test of 'fournos_launcher submit' succeeded after 00 hours 01 minutes 28 seconds 🟢

• Link to the test results.

• No reports index generated...

Test configuration:

/test fournos skeleton quick_test
/var fournos.namespace: psap-automation-wip
/cluster psap-mgmt

Execution logs

@kpouget
Contributor Author

kpouget commented May 6, 2026

/test fournos skeleton quick_test
/var fournos.namespace: psap-automation-wip
/cluster psap-mgmt

@psap-forge-bot

psap-forge-bot Bot commented May 6, 2026

🔴 Test of 'skeleton test' failed after 00 hours 00 minutes 00 seconds 🔴

• Link to the test results.

• No reports index generated...

Test configuration:

ci_job.cluster: psap-mgmt
ci_job.exclusive: true
ci_job.fjob: forge-skeleton-20260506-121815
ci_job.hardware:
  gpuCount: 4
  gpuType: h200
ci_job.name: skeleton quick_test
ci_job.owner: kpouget
project.args:
- quick_test
project.name: skeleton

Failure indicator:

## /workspace/artifacts/000__test/FAILURE 
--- 📍NameError STACKTRACE ---
--- 📍name 'test_duration' is not defined

   Traceback (most recent call last):
     File "/app/forge/projects/core/library/ci.py", line 100, in wrapper
       exit_code = command_func(*args, **kwargs)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     File "/app/forge/projects/skeleton/orchestration/ci.py", line 48, in test
       return test_skeleton.test()
              ^^^^^^^^^^^^^^^^^^^^
     File "/app/forge/projects/skeleton/orchestration/test_skeleton.py", line 96, in test
       return run_and_postprocess(do_test)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     File "/app/forge/projects/core/library/postprocess.py", line 61, in run_and_postprocess
       ret = test_func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^
     File "/app/forge/projects/skeleton/orchestration/test_skeleton.py", line 154, in do_test
       skeleton_take_time()
     File "/app/forge/projects/skeleton/orchestration/test_skeleton.py", line 104, in skeleton_take_time
       logger.info(f"Starting {test_duration}s test loop...")
                               ^^^^^^^^^^^^^
   NameError: name 'test_duration' is not defined. Did you mean: 'test_iteration'?

[...]

Execution logs

@psap-forge-bot

psap-forge-bot Bot commented May 6, 2026

🔴 Test of 'fournos_launcher submit' failed after 00 hours 01 minutes 18 seconds 🔴

• Link to the test results.

• No reports index generated...

Test configuration:

/test fournos skeleton quick_test
/var fournos.namespace: psap-automation-wip
/cluster psap-mgmt

• Failure indicator: Empty.
Execution logs

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 11

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
projects/core/library/export.py (1)

50-61: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

forge.status is initialized but never written — removes noise from the k8s patch.

Lines 57–58 guard-initialize fjob_data["status"]["engineStatus"]["forge"]["status"] = {}, but this key is never subsequently populated; only exportArtifacts (line 61) is written. The empty status: {} dict is therefore included in every oc patch call, adding spurious structure to the live Kubernetes resource that could confuse operators inspecting the FournosJob status.

Additionally, the comment on line 50 (# Initialize status.engine.status if it doesn't exist) is now stale — it was accurate before the forge nesting was introduced but no longer describes the full initialization chain.

🔧 Proposed fix
-        # Initialize status.engine.status if it doesn't exist
+        # Initialize status.engineStatus.forge if it doesn't exist
         if "status" not in fjob_data:
             fjob_data["status"] = {}
         if "engineStatus" not in fjob_data["status"]:
             fjob_data["status"]["engineStatus"] = {}
         if "forge" not in fjob_data["status"]["engineStatus"]:
             fjob_data["status"]["engineStatus"]["forge"] = {}
-        if "status" not in fjob_data["status"]["engineStatus"]["forge"]:
-            fjob_data["status"]["engineStatus"]["forge"]["status"] = {}

         # Update with export-artifacts status
         fjob_data["status"]["engineStatus"]["forge"]["exportArtifacts"] = status
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@projects/core/library/export.py` around lines 50 - 61, Remove the unnecessary
empty initialization of fjob_data["status"]["engineStatus"]["forge"]["status"]
since that key is never used and adds spurious empty structure to oc patches;
update the surrounding comment to accurately describe the full nested
initialization being performed (e.g., ensure it documents status -> engineStatus
-> forge) and leave only the actual write to
fjob_data["status"]["engineStatus"]["forge"]["exportArtifacts"] = status,
keeping all other existing guards that create missing dicts (fjob_data, status,
engineStatus, forge).
projects/caliper/cli/main.py (1)

532-549: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

run_cli may surface raw Python tracebacks for non-MissingParameter Click errors.

standalone_mode=False disables Click's default error handling, raising UsageError subclasses to the caller instead of handling them internally. Currently, only MissingParameter is caught; other recoverable user errors — BadParameter, NoSuchOption, BadOptionUsage — escape as uncaught exceptions and produce a Python traceback instead of a clean error message + non-zero exit. Catch the broader click.UsageError (parent of all of these) to provide uniform handling.

🛡️ Proposed fix
     try:
         rv = main.main(standalone_mode=False, prog_name="caliper")
         if isinstance(rv, int) and rv != 0:
             sys.exit(rv)
-    except click.MissingParameter as exc:
+    except click.UsageError as exc:
         msg = exc.format_message()
         sub = getattr(exc, "ctx", None)
         click.echo(f"Error: {msg}\n", err=True)
         if sub is not None:
             click.echo(sub.get_help(), err=True)
         ec = getattr(exc, "exit_code", 2)
         sys.exit(2 if ec is None else int(ec))
     except SystemExit:
         raise
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@projects/caliper/cli/main.py` around lines 532 - 549, run_cli currently only
catches click.MissingParameter, so other Click usage errors (BadParameter,
NoSuchOption, BadOptionUsage) raised because standalone_mode=False will produce
raw tracebacks; update the exception handling to catch click.UsageError (the
common parent) in addition to MissingParameter inside run_cli around the
main.main(standalone_mode=False, prog_name="caliper") call, and handle it the
same way as MissingParameter: format and echo the error message, show subcommand
help if ctx/sub exists, and exit with the appropriate non-zero exit code using
the existing exit_code logic.
♻️ Duplicate comments (4)
projects/caliper/engine/cache.py (1)

14-18: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Exclude .caliper_cache from both fingerprint walks.

cache_path_for_test_base() writes cache files under <test_base_dir>/.caliper_cache/..., and both fingerprint functions recurse into that directory. After the first write, the cache file changes the next fingerprint, so the per-test-base cache can never validate. The same self-invalidation exists for default_cache_path() under base_dir/.caliper_cache.

🐛 Minimal fix
 def fingerprint_base_dir(base_dir: Path, plugin_module: str) -> str:
     """Stable hash over file paths + mtimes under base_dir."""
     base_dir = base_dir.resolve()
     entries: list[tuple[str, int, int]] = []
-    for root, _dirs, files in os.walk(base_dir):
+    for root, dirnames, files in os.walk(base_dir, topdown=True):
+        dirnames[:] = [d for d in dirnames if d != ".caliper_cache"]
         for name in sorted(files):
             p = Path(root) / name
             try:
                 st = p.stat()
             except OSError:
@@
 def fingerprint_test_base(test_base_dir: Path, plugin_module: str) -> str:
     """Stable hash over file paths + mtimes under a single test base directory."""
     test_base_dir = test_base_dir.resolve()
     entries: list[tuple[str, int, int]] = []
-    for root, _dirs, files in os.walk(test_base_dir):
+    for root, dirnames, files in os.walk(test_base_dir, topdown=True):
+        dirnames[:] = [d for d in dirnames if d != ".caliper_cache"]
         for name in sorted(files):
             p = Path(root) / name
             try:
                 st = p.stat()
             except OSError:

Also applies to: 35-39, 56-64

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@projects/caliper/engine/cache.py` around lines 14 - 18, The fingerprint
functions are including the per-test and default cache dirs, causing
self-invalidating fingerprints; update fingerprint_base_dir and
fingerprint_test_base to skip any directory named ".caliper_cache" during
the os.walk by ensuring the walk runs top-down and removing ".caliper_cache"
from the mutable _dirs list (e.g., filter _dirs in-place) so files under
.caliper_cache (written by cache_path_for_test_base and default_cache_path) are
excluded from the fingerprint computation.
projects/core/library/postprocess.py (2)

219-228: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

--output-dir help text still describes the input artifact tree, not the output target.

The help string still reads "Overrides caliper.postprocess.artifacts_dir and ARTIFACT_BASE_DIR when set", which is the semantics of --artifact-dir, not --output-dir. The output directory controls where post-processing results are written; it doesn't override artifacts_dir/ARTIFACT_BASE_DIR.

🛡️ Proposed fix
     help=(
-        "Output directory, where the post processing results will be stored. "
-        "Overrides caliper.postprocess.artifacts_dir and ARTIFACT_BASE_DIR when set."
+        "Output directory where post-processing results (visualizations, KPI JSONL, "
+        "caliper_postprocess_status.yaml) will be written. Overrides caliper.postprocess."
+        "visualize.output_dir when set."
     ),
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@projects/core/library/postprocess.py` around lines 219 - 228, The help text
for the Click option "--output-dir" (output_dir) in postprocess.py is incorrect:
it currently states it "Overrides caliper.postprocess.artifacts_dir and
ARTIFACT_BASE_DIR" which belongs to the artifact/input option. Update the help
string for the "--output-dir" click.option to clearly state that it specifies
where post-processing results are written (e.g., "Output directory where
post-processing results will be stored; overrides the default postprocess output
location."), and remove any mention of overriding
caliper.postprocess.artifacts_dir or ARTIFACT_BASE_DIR so the semantics match
output_dir rather than the artifact/input flag.

89-136: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Return type annotation still contradicts the actual return value.

run_postprocess_after_test is annotated -> None (Line 93) but executes return status at Line 136 where status is the dict[str, Any] returned by run_orchestration_postprocess. Type checkers will flag this, and run_and_postprocess (Line 80) calls it for side effects only — annotating the actual return contract avoids surprising any caller that does observe the value.

🛡️ Proposed fix
 def run_postprocess_after_test(
     artifact_root: Path | os.PathLike[str] | str | None,
     *,
     test_outcome: TestPhaseOutcome | None = None,
-) -> None:
+) -> dict[str, Any] | None:
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@projects/core/library/postprocess.py` around lines 89 - 136, The function
run_postprocess_after_test is annotated as returning None but actually returns
the status dict from run_orchestration_postprocess (or returns early when
disabled), so update its return type to reflect that (e.g., Optional[dict[str,
Any]] or dict[str, Any] | None) and import/adjust typing as needed; ensure
callers like run_and_postprocess still treat the return as optional. Also update
the function docstring return semantics if present to match the new type.
projects/skeleton/orchestration/test_skeleton.py (1)

24-24: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

This may reintroduce the artifact-root mismatch.

Inside env.NextArtifactDir(...), env.ARTIFACT_DIR points at the nested seed directory, not the artifact root your docstring describes. If postprocess discovery is rooted at BASE_ARTIFACT_DIR or only expects root-level scenario directories, these fixtures will be skipped again.

#!/bin/bash
set -euo pipefail

echo "== seed helper =="
sed -n '16,45p' projects/skeleton/orchestration/test_skeleton.py

echo
echo "== call site =="
sed -n '141,146p' projects/skeleton/orchestration/test_skeleton.py

echo
echo "== artifact-dir and postprocess discovery logic =="
rg -n -C3 'NextArtifactDir|BASE_ARTIFACT_DIR|ARTIFACT_DIR|def run_and_postprocess|glob\(|rglob\(|iterdir\(' --type=py

Also applies to: 143-145

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@projects/skeleton/orchestration/test_skeleton.py` at line 24, The assignment
demo_dir = env.ARTIFACT_DIR uses the nested seed ARTIFACT_DIR and can
reintroduce the artifact-root mismatch; change it to point at the artifact root
used by postprocessing (use env.BASE_ARTIFACT_DIR or the appropriate
env.NextArtifactDir(...) call that returns the root) so fixtures are discovered
correctly by the postprocess path (see symbols demo_dir, env.ARTIFACT_DIR,
env.BASE_ARTIFACT_DIR, env.NextArtifactDir and the postprocess discovery
entrypoint like run_and_postprocess).
🧹 Nitpick comments (6)
projects/core/library/config.py (1)

13-14: ⚡ Quick win

Library module importing from an entrypoint module breaks layering

projects/core/library/config.py is a generic library used across the project, but it now directly imports from projects/core/ci_entrypoint/prepare_ci. This creates an upward dependency (libraryci_entrypoint), which can introduce circular imports and makes the library harder to use or test outside a CI-entrypoint context.

CI_METADATA_DIRNAME is a plain string constant that fits better in a shared location such as env.py or a dedicated constants.py.

♻️ Proposed refactor

Define the constant in a shared, lower-level module (e.g., env.py):

+# in projects/core/library/env.py (or a shared constants.py)
+CI_METADATA_DIRNAME = "000__ci_metadata"

Then in config.py, import from the sibling module instead:

-from projects.core.ci_entrypoint.prepare_ci import CI_METADATA_DIRNAME
+from . import env  # already imported; or from .constants import CI_METADATA_DIRNAME

And update prepare_ci to import the constant from the shared location as well.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@projects/core/library/config.py` around lines 13 - 14,
projects/core/library/config.py currently imports CI_METADATA_DIRNAME from the
higher-level prepare_ci module creating a layering violation; move the
CI_METADATA_DIRNAME constant into a lower-level shared module (e.g.,
projects.core.library.env or projects.core.constants), then update
projects/core/library/config.py to import CI_METADATA_DIRNAME from that new
shared module and update projects/core/ci_entrypoint/prepare_ci.py to import the
same constant from the shared module instead of from prepare_ci, ensuring no
upward dependency from library to ci_entrypoint.
projects/caliper/engine/visualize.py (1)

50-91: 💤 Low value

cache_path parameter is accepted but never used.

run_visualize declares cache_path: Path | None but never passes it to run_parse (which itself has no such parameter). The CLI and orchestration layers always pass cache_path=None. Either drop the parameter from the signature or thread it through once run_parse supports a custom cache location — keeping unused parameters on a public-ish API leads to confusion at call sites (see projects/caliper/orchestration/postprocess.py Line 221 which passes cache_path=cache_path).

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@projects/caliper/engine/visualize.py` around lines 50 - 91, The run_visualize
function has an unused parameter cache_path; remove cache_path from
run_visualize's signature and all places that call it (and from any higher-level
CLI/orchestration functions that forward cache_path to run_visualize) so the
public API no longer advertises an unused option; if you must keep the option
instead, alternatively modify run_parse to accept a cache_path parameter and
pass cache_path through from run_visualize into run_parse (updating the
run_parse signature and all its callers), but do not leave cache_path unused in
run_visualize.
projects/caliper/cli/main.py (1)

121-138: 💤 Low value

Inconsistent exit codes between plugin resolution failures and KPI/import-export command failures.

_plugin_tuple exits with code=1 for config errors and code=2 for plugin load errors. The KPI subcommands (kpi_import, kpi_export, kpi_analyze) use sys.exit(3) for their own failures (Lines 320, 337, 356). However, those same commands route through _plugin_tuple-style helpers indirectly via kpi_generate only — kpi_import/export/analyze skip plugin resolution entirely. Just confirm the documented exit-code matrix (1 = workspace/config, 2 = plugin load / parse-visualize-ai-eval failure, 3 = KPI runtime failure) is intentional and reflected in user docs; otherwise consider unifying.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@projects/caliper/cli/main.py` around lines 121 - 138, The review highlights
inconsistent exit codes between _plugin_tuple (which uses _exit_with_help(...,
code=1) for config errors and code=2 for plugin load errors) and the KPI
subcommands kpi_import/kpi_export/kpi_analyze (which call sys.exit(3) on
failure); update the CLI to follow the documented exit-code matrix by either (A)
making KPI subcommands use _exit_with_help(..., code=3) for runtime/KPI failures
so all exits use the same helper, or (B) changing _plugin_tuple to raise
specific exceptions that the callers (kpi_generate and KPI commands) catch and
convert to the documented exit codes (1 for workspace/config, 2 for plugin
load/parse failures, 3 for KPI runtime), ensuring unique symbols _plugin_tuple,
kpi_generate, kpi_import, kpi_export, kpi_analyze and _exit_with_help are used
to locate and implement the change.
projects/caliper/orchestration/postprocess_config.py (1)

108-150: 💤 Low value

Top-level extra="ignore" silently swallows typos in caliper.postprocess.* keys.

All sub-sections use extra="forbid", but the root config uses extra="ignore". A misspelled top-level key (e.g., caliper.postprocess.vizualize:) will be silently dropped, leaving the user wondering why their visualize section wasn't picked up — while a typo inside visualize: correctly errors out. Consider extra="forbid" here too for consistency, or document the asymmetric strictness explicitly.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@projects/caliper/orchestration/postprocess_config.py` around lines 108 - 150,
The root CaliperOrchestrationPostprocessConfig currently sets model_config =
ConfigDict(extra="ignore") which silently drops misspelled top-level keys;
change this to model_config = ConfigDict(extra="forbid") so Pydantic will raise
on unknown top-level fields (making behavior consistent with the subsections
that use extra="forbid"), and update any tests/docs that rely on permissive
behavior to expect validation errors for unknown caliper.postprocess.* keys;
locate the model by the class name CaliperOrchestrationPostprocessConfig and its
model_config attribute to make the change.
projects/caliper/orchestration/postprocess.py (1)

167-232: 💤 Low value

Plugin is resolved and loaded twice when both parse and visualize are enabled.

_run_parse and _run_visualize each call _load_plugin(...) independently, so when both steps are enabled, the plugin module is re-resolved (and the manifest is re-read) twice per orchestration call. importlib.import_module caches modules so the second load_plugin is cheap, but the manifest YAML is re-read by resolve_plugin_module_string each time. Hoist the resolution above the inner functions and pass the (mod_str, plugin) tuple in.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@projects/caliper/orchestration/postprocess.py` around lines 167 - 232, Both
_run_parse and _run_visualize call _load_plugin(...) causing duplicate manifest
reads; hoist plugin resolution once before those inner functions and reuse the
result. Call _load_plugin(postprocess_config, tree_root=tree_root,
manifest_path=manifest_path) once (store the returned tuple as e.g. mod_str,
plugin) before defining or invoking _run_parse and _run_visualize, and update
those functions to use the captured mod_str and plugin instead of calling
_load_plugin again (or accept the tuple as a parameter if you prefer explicit
passing).
projects/core/library/postprocess.py (1)

32-86: 💤 Low value

run_and_postprocess catches BaseException, which will swallow KeyboardInterrupt/SystemExit flow.

The try/except BaseException pattern at Lines 63–66 captures KeyboardInterrupt and SystemExit and stores them as original_exc, then re-raises after running post-processing. This is mostly intentional (to ensure post-processing runs even on Ctrl-C), but in the finally block, if run_postprocess_after_test itself raises, raise postprocess_exc from original_exc will surface a plain Exception chained from a KeyboardInterrupt — losing the user-interrupt signal semantics for the parent process. Confirm this is acceptable for FORGE's CI behavior; if not, narrow to Exception and add a separate KeyboardInterrupt/SystemExit handler that re-raises after a best-effort post-process.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@projects/core/library/postprocess.py` around lines 32 - 86, The current
try/except in run_and_postprocess uses except BaseException which captures
KeyboardInterrupt/SystemExit and later can cause interrupt semantics to be lost
when postprocessing fails; change the broad except to except Exception as e to
handle normal failures, and add a dedicated except (KeyboardInterrupt,
SystemExit) as e branch that sets original_exc and exc_msg, then performs a
best-effort call to run_postprocess_after_test (catch/log any postprocess errors
but do NOT replace or chain the original interrupt) and finally re-raises the
original interrupt so the process retains correct signal/exit behavior;
reference run_and_postprocess, original_exc, exc_msg, and
run_postprocess_after_test when implementing this.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@projects/caliper/engine/ai_eval.py`:
- Around line 11-33: The function run_ai_eval_export accepts cache_path but
never uses it and may write output to a non-existent directory; remove the
unused cache_path parameter (or, if future-proofing is desired, add a TODO to
wire it into run_parse when run_parse supports it) and ensure the parent
directory for output is created before writing—call Path.mkdir(parents=True,
exist_ok=True) on output.parent prior to output.write_text; reference
run_ai_eval_export, run_parse, cache_path, and output.write_text when making
these changes.

In `@projects/caliper/engine/cache.py`:
- Around line 99-123: When reading the cache in read_test_base_cache, guard
against malformed or unreadable files by wrapping the
json.loads(cache_path.read_text(...)) in a try/except (catch
json.JSONDecodeError and general IO errors) and treat failures as a cache miss
(return None); likewise wrap the conversion of records (the dict_to_rec ->
UnifiedResultRecord creation) in a try/except and return None if any record is
missing expected keys or conversion raises an exception. Use the existing
symbols cache_path_for_test_base, read_test_base_cache, dict_to_rec, and
UnifiedResultRecord to locate the code, and apply the same defensive pattern to
the other read helper mentioned (lines 162-165) so truncated/hand-edited cache
files are ignored and rebuilt rather than aborting.

In `@projects/caliper/engine/parse.py`:
- Around line 52-82: The code currently honors use_cache only for reads but
still writes caches; change the logic around write_test_base_cache and the
subsequent cache_refs.append so they only run when use_cache is True (e.g.,
guard the write_test_base_cache(...) call and cache_refs.append(str(cache_file))
with if use_cache:), leaving parse_fn(base_dir, [node]), extending
all_records/all_warnings, and other logic unchanged; reference the
variables/functions use_cache, parse_fn, write_test_base_cache, cache_file, and
cache_refs to locate the exact spot to guard.

In `@projects/caliper/engine/visualize.py`:
- Around line 14-27: resolve_visualize_config currently returns whatever
yaml.safe_load produces (via resolve_visualize_config), which may be a list,
scalar, or None, but callers expect a mapping; validate the parsed data is a
dict before returning: after each yaml.safe_load call in
resolve_visualize_config check isinstance(data, dict) and if not raise a clear
ValueError/FileFormatError (or return None per desired behavior) with a
descriptive message that the visualize config must be a mapping; mirror the
pattern used in plugin_config.load_manifest_file and update error text to
reference "visualize-groups" so resolve_report_ids won't receive non-dict
values.

In `@projects/caliper/orchestration/postprocess.py`:
- Around line 54-64: The function _resolve_visualize_output_dir currently
rejects relative paths but the documented semantics say relative paths should be
resolved against the orchestration artifact dir ($ARTIFACT_DIR). Change
_resolve_visualize_output_dir so that: if raw is None/empty keep the existing
ValueError; otherwise create p = Path(raw).expanduser(); if p.is_absolute()
return p.resolve(); if relative, read ARTIFACT_DIR =
os.environ.get("ARTIFACT_DIR") and raise a ValueError if ARTIFACT_DIR is
unset/empty, otherwise join Path(ARTIFACT_DIR).expanduser() / p and return that
resolved Path; update error messages accordingly, and make sure os is imported
at the top of the module if it isn't already.
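
A sketch of the resolution order described above:

import os
from pathlib import Path

def _resolve_visualize_output_dir(raw: str | None) -> Path:
    if not raw:
        raise ValueError("visualize output dir must be set")
    p = Path(raw).expanduser()
    if p.is_absolute():
        return p.resolve()
    artifact_dir = os.environ.get("ARTIFACT_DIR")
    if not artifact_dir:
        raise ValueError(f"relative output dir {raw!r} requires $ARTIFACT_DIR to be set")
    return (Path(artifact_dir).expanduser() / p).resolve()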

In `@projects/caliper/README.md`:
- Around line 12-15: The visualize example uses the wrong flag: replace the use
of `--reports default` with the group-level flag `--report-group default` so the
visualize command selects the `default` group (and thus produces `summary_table`
/ `throughput_chart`) rather than looking for a literal report named `default`;
update the example line that currently shows `--reports default` to use
`--report-group default` and keep the rest of the visualize invocation (`caliper
--plugin-module my_package.caliper_plugin --artifacts-dir /path visualize
--output-dir ./out`) unchanged.

In `@projects/core/ci_entrypoint/fournos.py`:
- Around line 148-149: The shutil.move call currently sits outside the
try-except and can raise unlogged exceptions or create inconsistent state; move
the line shutil.move(str(fournos_fjob), str(fournos_fjob_dest)) into the same
try-block that calls parse_and_save_pr_arguments_fournos (and is caught by
logger.exception), or better yet perform the move only after
parse_and_save_pr_arguments_fournos and related processing complete successfully
so any failure is logged via logger.exception and the original fournos_fjob
isn't relocated on partial failures.
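
An in-place fragment of the reordered flow, using the symbols named above:

try:
    parse_and_save_pr_arguments_fournos(fournos_fjob)
    # Relocate only after processing succeeds, so partial failures leave the file in place.
    shutil.move(str(fournos_fjob), str(fournos_fjob_dest))
except Exception:
    logger.exception("fournos fjob processing failed")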

In `@projects/core/library/config.py`:
- Around line 216-217: The open(...) call that appends to "presets_applied.txt"
in apply_preset can raise FileNotFoundError if env.ARTIFACT_DIR /
CI_METADATA_DIRNAME doesn't exist; before opening the file, ensure the directory
exists (e.g., check .exists() or call .mkdir(parents=True, exist_ok=True) on
env.ARTIFACT_DIR / CI_METADATA_DIRNAME) so that writing to "presets_applied.txt"
succeeds; update the code in apply_preset around the print(...) to create the
directory when missing, mirroring the guard used in
apply_config_overrides/save_config_overrides.

In `@projects/core/library/postprocess.py`:
- Around line 219-228: The --output-dir click option currently uses
click.Path(..., exists=True, file_okay=False, dir_okay=True) which forces
pre-creation; remove the exists=True constraint (keep file_okay=False) so the
CLI accepts non-existent output directories, since orchestration flows
(run_orchestration_postprocess → run_postprocess_from_orchestration_config →
run_visualize) will call output_dir.mkdir(parents=True, exist_ok=True) to create
the directory at runtime; update the click.option definition for "output_dir"
accordingly.
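
A sketch of the relaxed option (the surrounding command is illustrative):

from pathlib import Path

import click

@click.command()
@click.option(
    "--output-dir",
    "output_dir",
    type=click.Path(file_okay=False, dir_okay=True, path_type=Path),  # no exists=True
    help="Output directory where post-processing results will be stored.",
)
def cmd(output_dir: Path | None) -> None:
    if output_dir is not None:
        output_dir.mkdir(parents=True, exist_ok=True)  # created at runtime instead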

In `@projects/skeleton/orchestration/test_skeleton.py`:
- Around line 17-23: The helper that seeds per-scenario files currently only
writes per-scenario "__test_labels__.yaml" and "metrics.json" but must also
create top-level "visualize-groups.yaml" and "caliper.yaml" to match its
docstring and satisfy "report_group: default" resolution; update the helper (the
function that creates minimal Caliper inputs) to write a simple
"visualize-groups.yaml" containing a default report group mapping for the seeded
scenarios and a minimal "caliper.yaml" manifest (e.g., including the
plugin_module or equivalent minimal config) whenever visualization or default
report-group logic is expected so tests see a complete seeded tree.
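
A sketch of the extra seeded files; the exact keys (the groups mapping and plugin_module) are assumptions based on the resolve_report_ids and manifest discussion above:

(demo_dir / "visualize-groups.yaml").write_text(
    "groups:\n"
    "  default:\n"
    "    - summary_table\n"
    "    - throughput_chart\n",
    encoding="utf-8",
)
(demo_dir / "caliper.yaml").write_text(
    "plugin_module: projects.skeleton.postprocess.default.plugin\n",
    encoding="utf-8",
)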

In `@projects/skeleton/postprocess/default/plugin.py`:
- Around line 125-132: The HTML is currently written with external Plotly JS via
fig.write_html(out, include_plotlyjs="cdn"), which requires network access;
change the call in the throughput chart block (the fig.write_html invocation
that writes to variable out) to produce a self-contained artifact by inlining
Plotly (use include_plotlyjs=True or the equivalent "include" option) so the
generated throughput_chart.html works offline.
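
The one-line change, shown on a self-contained figure:

import plotly.graph_objects as go

fig = go.Figure(data=[go.Bar(x=["smoke", "load"], y=[1, 2])])
# include_plotlyjs=True inlines the plotly.js bundle so the HTML renders offline.
fig.write_html("throughput_chart.html", include_plotlyjs=True)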

---

Outside diff comments:
In `@projects/caliper/cli/main.py`:
- Around line 532-549: run_cli currently only catches click.MissingParameter, so
other Click usage errors (BadParameter, NoSuchOption, BadOptionUsage) raised
because standalone_mode=False will produce raw tracebacks; update the exception
handling to catch click.UsageError (the common parent) in addition to
MissingParameter inside run_cli around the main.main(standalone_mode=False,
prog_name="caliper") call, and handle it the same way as MissingParameter:
format and echo the error message, show subcommand help if ctx/sub exists, and
exit with the appropriate non-zero exit code using the existing exit_code logic.

In `@projects/core/library/export.py`:
- Around line 50-61: Remove the unnecessary empty initialization of
fjob_data["status"]["engineStatus"]["forge"]["status"] since that key is never
used and adds spurious empty structure to oc patches; update the surrounding
comment to accurately describe the full nested initialization being performed
(e.g., ensure it documents status -> engineStatus -> forge) and leave only the
actual write to fjob_data["status"]["engineStatus"]["forge"]["exportArtifacts"]
= status, keeping all other existing guards that create missing dicts
(fjob_data, status, engineStatus, forge).
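
A sketch of the trimmed initialization, reusing the names from the comment:

```python
# Ensure status -> engineStatus -> forge exists, then write only the key in use.
forge = (
    fjob_data.setdefault("status", {})
    .setdefault("engineStatus", {})
    .setdefault("forge", {})
)
forge["exportArtifacts"] = status  # no empty "status" sub-key is created
```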

---

Duplicate comments:
In `@projects/caliper/engine/cache.py`:
- Around line 14-18: The fingerprint functions are including the per-test and
default cache dirs, causing self-invalidating fingerprints; update
fingerprint_base_dir and fingerprint_test_base_dir to skip any directory named
".caliper_cache" during the os.walk by ensuring the walk runs top-down and
removing ".caliper_cache" from the mutable _dirs list (e.g., filter _dirs
in-place) so files under .caliper_cache (written by cache_path_for_test_base and
default_cache_path) are excluded from the fingerprint computation.
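
A sketch of the pruned walk, as a hypothetical helper both fingerprint
functions could share:

```python
import os

def _iter_fingerprint_files(base_dir):
    """Yield the files to hash, skipping the cache the fingerprint protects."""
    for root, dirs, files in os.walk(base_dir, topdown=True):
        dirs[:] = [d for d in dirs if d != ".caliper_cache"]  # prune in place
        for name in sorted(files):
            yield os.path.join(root, name)
```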

In `@projects/core/library/postprocess.py`:
- Around line 219-228: The help text for the Click option "--output-dir"
(output_dir) in postprocess.py is incorrect: it currently states it "Overrides
caliper.postprocess.artifacts_dir and ARTIFACT_BASE_DIR" which belongs to the
artifact/input option. Update the help string for the "--output-dir"
click.option to clearly state that it specifies where post-processing results
are written (e.g., "Output directory where post-processing results will be
stored; overrides the default postprocess output location."), and remove any
mention of overriding caliper.postprocess.artifacts_dir or ARTIFACT_BASE_DIR so
the semantics match output_dir rather than the artifact/input flag.
- Around line 89-136: The function run_postprocess_after_test is annotated as
returning None but actually returns the status dict from
run_orchestration_postprocess (or returns early when disabled), so update its
return type to reflect that (e.g., Optional[dict[str, Any]] or dict[str, Any] |
None) and import/adjust typing as needed; ensure callers like
run_and_postprocess still treat the return as optional. Also update the function
docstring return semantics if present to match the new type.
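
A sketch of the corrected annotation; the guard and parameter are illustrative:

```python
from typing import Any, Optional

def run_postprocess_after_test(cfg) -> Optional[dict[str, Any]]:
    """Return run_orchestration_postprocess's status dict, or None when disabled."""
    if not cfg.postprocess_enabled:  # early return when postprocessing is off
        return None
    return run_orchestration_postprocess(cfg)
```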

In `@projects/skeleton/orchestration/test_skeleton.py`:
- Line 24: The assignment demo_dir = env.ARTIFACT_DIR uses the nested seed
ARTIFACT_DIR and can reintroduce the artifact-root mismatch; change it to point
at the artifact root used by postprocessing (use env.BASE_ARTIFACT_DIR or the
appropriate env.NextArtifactDir(...) call that returns the root) so fixtures are
discovered correctly by the postprocess path (see symbols demo_dir,
env.ARTIFACT_DIR, env.BASE_ARTIFACT_DIR, env.NextArtifactDir and the postprocess
discovery entrypoint like run_and_postprocess).

---

Nitpick comments:
In `@projects/caliper/cli/main.py`:
- Around line 121-138: The review highlights inconsistent exit codes between
_plugin_tuple (which uses _exit_with_help(..., code=1) for config errors and
code=2 for plugin load errors) and the KPI subcommands
kpi_import/kpi_export/kpi_analyze (which call sys.exit(3) on failure); update
the CLI to follow the documented exit-code matrix by either (A) making KPI
subcommands use _exit_with_help(..., code=3) for runtime/KPI failures so all
exits use the same helper, or (B) changing _plugin_tuple to raise specific
exceptions that the callers (kpi_generate and KPI commands) catch and convert to
the documented exit codes (1 for workspace/config, 2 for plugin load/parse
failures, 3 for KPI runtime); the symbols _plugin_tuple, kpi_generate,
kpi_import, kpi_export, kpi_analyze, and _exit_with_help identify where to make
the change.
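
A sketch of option (A) for one KPI subcommand; `_run_kpi_import` is a
placeholder for the command body:

```python
@main.command("kpi-import")
def kpi_import():
    try:
        _run_kpi_import()  # placeholder for the real import logic
    except Exception as e:
        # Route failures through the shared helper so the documented matrix
        # holds everywhere: 1 = workspace/config, 2 = plugin load, 3 = KPI runtime.
        _exit_with_help(f"KPI import failed: {e}", code=3)
```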

In `@projects/caliper/engine/visualize.py`:
- Around line 50-91: The run_visualize function has an unused parameter
cache_path; remove cache_path from run_visualize's signature and all places that
call it (and from any higher-level CLI/orchestration functions that forward
cache_path to run_visualize) so the public API no longer advertises an unused
option; if you must keep the option instead, alternatively modify run_parse to
accept a cache_path parameter and pass cache_path through from run_visualize
into run_parse (updating the run_parse signature and all its callers), but do
not leave cache_path unused in run_visualize.
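
A sketch of the keep-the-option alternative; the signatures are fully
illustrative:

```python
def run_visualize(workspace, output_dir, cache_path=None):
    # Forward cache_path instead of leaving it dead in the signature.
    model = run_parse(workspace, cache_path=cache_path)  # run_parse gains the kwarg
    ...
```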

In `@projects/caliper/orchestration/postprocess_config.py`:
- Around line 108-150: The root CaliperOrchestrationPostprocessConfig currently
sets model_config = ConfigDict(extra="ignore") which silently drops misspelled
top-level keys; change this to model_config = ConfigDict(extra="forbid") so
Pydantic will raise on unknown top-level fields (making behavior consistent with
the subsections that use extra="forbid"), and update any tests/docs that rely on
permissive behavior to expect validation errors for unknown
caliper.postprocess.* keys; locate the model by the class name
CaliperOrchestrationPostprocessConfig and its model_config attribute to make the
change.
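
The one-line change on the root model:

```python
from pydantic import BaseModel, ConfigDict

class CaliperOrchestrationPostprocessConfig(BaseModel):
    model_config = ConfigDict(extra="forbid")  # unknown top-level keys now raise
    # ... existing fields unchanged ...
```

With this, `model_validate({"artifcats_dir": "x"})` (a hypothetical typo) raises
a `ValidationError` instead of silently dropping the key.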

In `@projects/caliper/orchestration/postprocess.py`:
- Around line 167-232: Both _run_parse and _run_visualize call _load_plugin(...)
causing duplicate manifest reads; hoist plugin resolution once before those
inner functions and reuse the result. Call _load_plugin(postprocess_config,
tree_root=tree_root, manifest_path=manifest_path) once (store the returned tuple
as e.g. mod_str, plugin) before defining or invoking _run_parse and
_run_visualize, and update those functions to use the captured mod_str and
plugin instead of calling _load_plugin again (or accept the tuple as a parameter
if you prefer explicit passing).
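
A sketch of the hoisted resolution; the inner signatures are illustrative:

```python
# Resolve the plugin once, before the phase runners are defined.
mod_str, plugin = _load_plugin(
    postprocess_config, tree_root=tree_root, manifest_path=manifest_path
)

def _run_parse():
    return run_parse(tree_root, plugin)  # reuse; no second manifest read

def _run_visualize(model):
    return run_visualize(model, plugin=plugin)
```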

In `@projects/core/library/config.py`:
- Around line 13-14: projects/core/library/config.py currently imports
CI_METADATA_DIRNAME from the higher-level prepare_ci module creating a layering
violation; move the CI_METADATA_DIRNAME constant into a lower-level shared
module (e.g., projects.core.library.env or projects.core.constants), then update
projects/core/library/config.py to import CI_METADATA_DIRNAME from that new
shared module and update projects/core/ci_entrypoint/prepare_ci.py to import the
same constant from the shared module instead of from prepare_ci, ensuring no
upward dependency from library to ci_entrypoint.
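
A sketch of the layering fix; the module name and constant value are
assumptions:

```python
# projects/core/constants.py -- new shared, low-level module
CI_METADATA_DIRNAME = "ci-metadata"  # keep whatever value prepare_ci defines today

# projects/core/library/config.py and projects/core/ci_entrypoint/prepare_ci.py
# both switch to:
from projects.core.constants import CI_METADATA_DIRNAME
```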

In `@projects/core/library/postprocess.py`:
- Around line 32-86: The current try/except in run_and_postprocess uses except
BaseException which captures KeyboardInterrupt/SystemExit and later can cause
interrupt semantics to be lost when postprocessing fails; change the broad
except to except Exception as e to handle normal failures, and add a dedicated
except (KeyboardInterrupt, SystemExit) as e branch that sets original_exc and
exc_msg, then performs a best-effort call to run_postprocess_after_test
(catch/log any postprocess errors but do NOT replace or chain the original
interrupt) and finally re-raises the original interrupt so the process retains
correct signal/exit behavior; reference run_and_postprocess, original_exc,
exc_msg, and run_postprocess_after_test when implementing this.
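
A sketch of the split handling; signatures are illustrative, and the real
function would still record original_exc/exc_msg as described:

```python
import logging

def run_and_postprocess(run_test):
    try:
        run_test()
    except (KeyboardInterrupt, SystemExit):
        # Best-effort postprocess, then re-raise so exit/signal semantics survive.
        try:
            run_postprocess_after_test()
        except Exception:
            logging.exception("postprocess failed while handling an interrupt")
        raise
    except Exception:
        # Normal failures: postprocess, then let the original exception propagate.
        run_postprocess_after_test()
        raise
```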
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 71ecd2d3-7768-48e3-b5ec-83df30ee091e

📥 Commits

Reviewing files that changed from the base of the PR and between 1d58c40 and 8537f13.

📒 Files selected for processing (29)
  • .gitignore
  • projects/caliper/README.md
  • projects/caliper/cli/main.py
  • projects/caliper/engine/ai_eval.py
  • projects/caliper/engine/cache.py
  • projects/caliper/engine/kpi/generate.py
  • projects/caliper/engine/parse.py
  • projects/caliper/engine/plugin_config.py
  • projects/caliper/engine/visualize.py
  • projects/caliper/orchestration/postprocess.py
  • projects/caliper/orchestration/postprocess_config.py
  • projects/caliper/orchestration/postprocess_outcome.py
  • projects/core/ci_entrypoint/fournos.py
  • projects/core/library/config.py
  • projects/core/library/export.py
  • projects/core/library/postprocess.py
  • projects/skeleton/orchestration/cli.py
  • projects/skeleton/orchestration/config.yaml
  • projects/skeleton/orchestration/test_skeleton.py
  • projects/skeleton/postprocess/__init__.py
  • projects/skeleton/postprocess/default/__init__.py
  • projects/skeleton/postprocess/default/caliper.yaml
  • projects/skeleton/postprocess/default/demo_run/load/__test_labels__.yaml
  • projects/skeleton/postprocess/default/demo_run/load/metrics.json
  • projects/skeleton/postprocess/default/demo_run/smoke/__test_labels__.yaml
  • projects/skeleton/postprocess/default/demo_run/smoke/metrics.json
  • projects/skeleton/postprocess/default/plugin.py
  • projects/skeleton/postprocess/default/visualize-groups.yaml
  • pyproject.toml
🚧 Files skipped from review as they are similar to previous changes (1)
  • projects/caliper/engine/kpi/generate.py

Comment threads (10):
  • projects/caliper/engine/ai_eval.py
  • projects/caliper/engine/cache.py
  • projects/caliper/engine/parse.py
  • projects/caliper/engine/visualize.py
  • projects/caliper/orchestration/postprocess.py
  • projects/core/ci_entrypoint/fournos.py
  • projects/core/library/config.py (outdated)
  • projects/core/library/postprocess.py
  • projects/skeleton/orchestration/test_skeleton.py
  • projects/skeleton/postprocess/default/plugin.py
@kpouget
Contributor Author

kpouget commented May 6, 2026

/test fournos skeleton quick_test
/var fournos.namespace: psap-automation-wip
/cluster psap-mgmt

@psap-forge-bot

psap-forge-bot Bot commented May 6, 2026

🔴 Test of 'fournos_launcher submit' failed after 00 hours 00 minutes 21 seconds 🔴

• Link to the test results.

• No reports index generated...

Test configuration:

/test fournos skeleton quick_test
/var fournos.namespace: psap-automation-wip
/cluster psap-mgmt

• Failure indicator: Empty.
Execution logs

@kpouget
Contributor Author

kpouget commented May 6, 2026

/test fournos skeleton quick_test
/var fournos.namespace: psap-automation-wip
/cluster psap-mgmt

@psap-forge-bot

psap-forge-bot Bot commented May 6, 2026

🟢 Test of 'skeleton test' succeeded after 00 hours 00 minutes 10 seconds 🟢

• Link to the test results.

• No reports index generated...

Test configuration:

ci_job.cluster: psap-mgmt
ci_job.exclusive: true
ci_job.fjob: forge-skeleton-20260506-135202
ci_job.hardware:
  gpuCount: 4
  gpuType: h200
ci_job.name: skeleton quick_test
ci_job.owner: kpouget
project.args:
- quick_test
project.name: skeleton

Execution logs

@psap-forge-bot

psap-forge-bot Bot commented May 6, 2026

🟢 Test of 'fournos_launcher submit' succeeded after 00 hours 01 minutes 29 seconds 🟢

• Link to the test results.

• No reports index generated...

Test configuration:

/test fournos skeleton quick_test
/var fournos.namespace: psap-automation-wip
/cluster psap-mgmt

Execution logs

@kpouget
Contributor Author

kpouget commented May 6, 2026

/test fournos skeleton quick_test
/var fournos.namespace: psap-automation-wip
/cluster psap-mgmt

@psap-forge-bot

psap-forge-bot Bot commented May 6, 2026

🟢 Test of 'skeleton test' succeeded after 00 hours 00 minutes 10 seconds 🟢

• Link to the test results.

• No reports index generated...

Test configuration:

ci_job.cluster: psap-mgmt
ci_job.exclusive: true
ci_job.fjob: forge-skeleton-20260506-144049
ci_job.hardware:
  gpuCount: 4
  gpuType: h200
ci_job.name: skeleton quick_test
ci_job.owner: kpouget
project.args:
- quick_test
project.name: skeleton

Execution logs

@psap-forge-bot

psap-forge-bot Bot commented May 6, 2026

🟢 Test of 'fournos_launcher submit' succeeded after 00 hours 01 minutes 35 seconds 🟢

• Link to the test results.

• No reports index generated...

Test configuration:

/test fournos skeleton quick_test
/var fournos.namespace: psap-automation-wip
/cluster psap-mgmt

Execution logs

@kpouget
Contributor Author

kpouget commented May 6, 2026

/test fournos skeleton quick_test
/var fournos.namespace: psap-automation-wip
/cluster psap-mgmt

@psap-forge-bot

psap-forge-bot Bot commented May 6, 2026

🟢 Test of 'skeleton test' succeeded after 00 hours 00 minutes 10 seconds 🟢

• Link to the test results.

• Generated 2 Caliper report(s):

  • summary_table.html
  • throughput_chart.html

Test configuration:

ci_job.cluster: psap-mgmt
ci_job.exclusive: true
ci_job.fjob: forge-skeleton-20260506-152634
ci_job.hardware:
  gpuCount: 4
  gpuType: h200
ci_job.name: skeleton quick_test
ci_job.owner: kpouget
project.args:
- quick_test
project.name: skeleton

Execution logs

@psap-forge-bot

psap-forge-bot Bot commented May 6, 2026

🟢 Test of 'fournos_launcher submit' succeeded after 00 hours 01 minutes 34 seconds 🟢

• Link to the test results.

• No reports generated...

Test configuration:

/test fournos skeleton quick_test
/var fournos.namespace: psap-automation-wip
/cluster psap-mgmt

Execution logs
