feat: Add evaluations support to ManagedAgent.run() by jsonbailey · Pull Request #153 · launchdarkly/python-server-sdk-ai

jsonbailey · 2026-04-28T23:27:49Z

Summary

Wires judge evaluations into ManagedAgent.run() via asyncio.Task, mirroring ManagedModel.run() (PR 7 / PR 8)
run() returns immediately; await result.evaluations guarantees both evaluation and tracker.track_judge_result() complete
Uses ai_config.evaluator.evaluate(input, content) — resolves to empty list with Evaluator.noop()
Failed judge results (success=False) do NOT call track_judge_result()
Adds 6 new tests covering the full evaluations contract
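
The contract in the bullets above can be illustrated with a minimal, self-contained sketch. It is not the SDK's implementation: `JudgeResult`, `ManagedResult`, `FakeTracker`, `FakeEvaluator`, and the standalone `run()` function are hypothetical stand-ins, and only `evaluate(input, content)`, `track_judge_result()`, `success`, and `result.evaluations` are names taken from this PR description.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class JudgeResult:  # hypothetical stand-in for the SDK's judge result type
    success: bool
    score: float = 0.0


@dataclass
class ManagedResult:  # hypothetical; only the fields this sketch needs
    content: str
    evaluations: asyncio.Task


class FakeTracker:  # hypothetical tracker that records what was tracked
    def __init__(self):
        self.tracked = []

    def track_judge_result(self, result):
        self.tracked.append(result)


class FakeEvaluator:  # hypothetical evaluator returning canned results
    def __init__(self, results):
        self._results = results

    async def evaluate(self, input_text, content):
        await asyncio.sleep(0)  # yield once, as a real judge call would
        return list(self._results)


async def run(input_text, evaluator, tracker):
    content = f"echo: {input_text}"  # placeholder for the real agent call

    async def evaluate_and_track():
        results = await evaluator.evaluate(input_text, content)
        for r in results:
            if r.success:  # failed judge results are not tracked
                tracker.track_judge_result(r)
        return results

    # Schedule evaluation without awaiting it, so run() returns right away.
    task = asyncio.create_task(evaluate_and_track())
    return ManagedResult(content=content, evaluations=task)


async def demo():
    tracker = FakeTracker()
    evaluator = FakeEvaluator([JudgeResult(True, 0.9), JudgeResult(False)])
    result = await run("hi", evaluator, tracker)
    tracked_before = len(tracker.tracked)  # nothing tracked yet
    evaluations = await result.evaluations  # evaluation and tracking finish here
    return tracked_before, len(evaluations), len(tracker.tracked)
```

Under these assumptions, `asyncio.run(demo())` shows no tracking before the await, and afterwards two evaluation results of which only the successful one was tracked. An evaluator that returns an empty list would behave like the `Evaluator.noop()` case described above.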

Depends on

feat: Wire LDAIMetrics tool_calls and duration_ms into tracker #152 (PR 10, enrich-metrics), which is based on:
feat: Add ManagedGraphResult, GraphMetricSummary, and AgentGraphRunnerResult types #151
feat: Update LangChain runners to implement Runner protocol returning RunnerResult #150
feat: Update OpenAI runners to implement Runner protocol returning RunnerResult #149
feat!: Add ManagedResult, RunnerResult, and Runner protocol; rename invoke() to run() #148
fix: Replace done_callback with coroutine chain for judge tracking #147

Test plan

All existing tests pass (uv run pytest packages/sdk/server-ai/tests/)
New TestManagedAgentEvaluations tests: run returns before evaluations resolve, collect results, tracking fires on await, noop evaluator returns empty list, failed results not tracked

🤖 Generated with Claude Code

Wire judge evaluations into ManagedAgent.run() via an asyncio.Task, mirroring ManagedModel.run(). Awaiting result.evaluations guarantees both evaluation and tracker.track_judge_result() complete. run() returns immediately; the evaluations task resolves asynchronously.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Mirror the managed_model.py fix in managed_agent.py: wrap tracker.track_judge_result() in try/except so a tracking failure does not destroy successfully computed evaluation results, and log a warning when a judge evaluation fails (r.success is False) so failures are visible rather than silently skipped.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
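
The error handling described in the commit message above might look roughly like the following sketch. The function name `evaluate_and_track`, the `FailingTracker` class, the logger name, and the log messages are all assumptions made for illustration; the actual code in managed_agent.py may be structured differently.

```python
import asyncio
import logging
from types import SimpleNamespace

log = logging.getLogger("sketch")  # hypothetical logger name


class FailingTracker:  # hypothetical tracker whose backend always fails
    def track_judge_result(self, result):
        raise RuntimeError("tracking backend unavailable")


async def evaluate_and_track(results, tracker):
    """Track successful judge results without losing them on tracker errors."""
    for r in results:
        if not r.success:
            # Make the failure visible instead of silently skipping it.
            log.warning("Judge evaluation failed; result not tracked")
            continue
        try:
            tracker.track_judge_result(r)
        except Exception as exc:
            # A tracking failure must not discard the computed evaluations.
            log.warning("Failed to track judge result: %s", exc)
    return results


results = [SimpleNamespace(success=True), SimpleNamespace(success=False)]
returned = asyncio.run(evaluate_and_track(results, FailingTracker()))
```

Even though every tracking call raises here, `returned` still holds both evaluation results, which is the property the fix is meant to preserve.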

jsonbailey force-pushed the jb/aic-2174/agent-evaluations branch from 4f29d99 to 0ea4a04 Compare April 28, 2026 23:56

jsonbailey changed the base branch from jb/aic-2388/enrich-metrics to jb/aic-2174/langchain-graph-runner April 28, 2026 23:57

jsonbailey force-pushed the jb/aic-2174/langchain-graph-runner branch from 0539ba1 to 404670d Compare April 29, 2026 13:15

jsonbailey force-pushed the jb/aic-2174/agent-evaluations branch from 0ea4a04 to 04e80a8 Compare April 29, 2026 13:15

jsonbailey force-pushed the jb/aic-2174/langchain-graph-runner branch from 404670d to f132154 Compare April 29, 2026 13:19

jsonbailey force-pushed the jb/aic-2174/agent-evaluations branch from 04e80a8 to 29ced10 Compare April 29, 2026 13:19

jsonbailey force-pushed the jb/aic-2174/langchain-graph-runner branch from f132154 to eb1004c Compare April 29, 2026 13:22

jsonbailey force-pushed the jb/aic-2174/agent-evaluations branch from 29ced10 to c343602 Compare April 29, 2026 13:23

jsonbailey force-pushed the jb/aic-2174/langchain-graph-runner branch from eb1004c to 8a049e2 Compare April 29, 2026 13:53

jsonbailey force-pushed the jb/aic-2174/agent-evaluations branch from c343602 to 1a24a4f Compare April 29, 2026 13:55

jsonbailey force-pushed the jb/aic-2174/langchain-graph-runner branch from 8a049e2 to cea3780 Compare April 29, 2026 13:57

jsonbailey force-pushed the jb/aic-2174/agent-evaluations branch from 1a24a4f to 78a7ded Compare April 29, 2026 13:57

jsonbailey force-pushed the jb/aic-2174/langchain-graph-runner branch from cea3780 to f27f9b8 Compare April 29, 2026 14:39

jsonbailey force-pushed the jb/aic-2174/agent-evaluations branch from 38951a6 to 52756c7 Compare April 29, 2026 14:39

jsonbailey force-pushed the jb/aic-2174/langchain-graph-runner branch from f27f9b8 to d892533 Compare April 29, 2026 16:34

jsonbailey and others added 3 commits April 29, 2026 11:34

fix: log warning when judge result tracking fails in ManagedAgent

ff2de9a

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

jsonbailey force-pushed the jb/aic-2174/agent-evaluations branch from 52756c7 to ff2de9a Compare April 29, 2026 16:34

jsonbailey wants to merge 3 commits into jb/aic-2174/langchain-graph-runner from jb/aic-2174/agent-evaluations
