Conversation
1198d47 to
d029c1d
Compare
This change implements snapshot, record then reply testing for all integration tests. Re-enables recently skipped (falky) tests: - test_agent_understands_other_agents (snapshot was edited manually) - test_supervisor_resumes_subagent_thread_across_invocations - test_supervisor_resumes_subagent_thread_across_invocations_structured Introduces a deterministic thread_id mock generator, such that snapshots are deterministic, for: - test_supervisor_resumes_subagent_thread_across_invocations - test_supervisor_resumes_subagent_thread_across_invocations_structured Modified test_tool_execution_service_access using tool middleware, to make the test deterministic.
d029c1d to
7f769bb
Compare
| def _before_record_request(request: Request) -> Request | None: | ||
| url = parse.urlparse(request.uri) # pyright: ignore[reportUnknownArgumentType, reportUnknownVariableType] | ||
| if url.hostname == internal_ai_hostname: | ||
| request.headers = {} |
There was a problem hiding this comment.
Shouldn't we keep some specific headers for like Authorization and keep comparing them?
| OPENAI_API_KEY = "ollama" | ||
|
|
||
|
|
||
| class TestAgent(AITestCase): |
There was a problem hiding this comment.
Not sure if correct assumption but i think most of the time will be decorating all tests in the file. Maybe we could add one more decoration on class level that would add function decorator to each test automatically.
High-level idea:
instead of
class TestIntegration:
@ai_snapshot_test()
def func1():
...
@ai_snapshot_test()
def func2():
...
Do
@snapshot_test()
class TestIntegration:
def func1():
...
def func2():
...
I dont say it has to be specifically in this PR however i wanted to point it out.
There was a problem hiding this comment.
That is a nice idea, but i feel that the complexity of it might be huge, i.e. what if we want to have some test that is not snapshoted, then we need some kind of exclude decorator.
I also see us running these tests with multiple snapshots, say from openai, gemini, anthropic (as subtests, created by this decorator) and there might be a need to say skip specific providers from specific tests or such. For example in test_structured_output.pl for models that do not support provider strategy.
There was a problem hiding this comment.
I think if we wanted to achieve something like this we could just add a fixture instead.
The fixtures can be autoused, so that all files in the scope of the fixture behave the same.
But I agree with Mateusz I don't see an issue with having those tests explicitly say they are snapshot. Also turning of the snapshotting of a specific test is a matter of removing the decorator, no further work needed.
| return model | ||
|
|
||
|
|
||
| def ai_snapshot_test() -> Callable[ |
There was a problem hiding this comment.
[nitpick] the name suggests it can be used only for AI. What do you think about record_snapshot name?
There was a problem hiding this comment.
the name suggests it can be used only for AI.
We do filtering here, so i think it only works for AI 😄 .
splunk-dtaborski
left a comment
There was a problem hiding this comment.
I really like the solution.
| [ | ||
| HumanMessage( | ||
| content="hi, my name is Chris. Generate a nickname for me", | ||
| content="i, my name is Chris. Generate a nickname for me", |
There was a problem hiding this comment.
looks like a typo by mistake
| os.path.join(os.path.dirname(__file__), "testdata", "tool_context.py"), | ||
| ) | ||
| @patch("splunklib.ai.agent._testing_app_id", "app_id") | ||
| @ai_snapshot_test() |
There was a problem hiding this comment.
is calling the decorator needed?
@ai_snapshot_test looks a little easier to the eye
There was a problem hiding this comment.
but this requires changing the code in the decorator
| ).hostname | ||
| assert internal_ai_hostname is not None | ||
|
|
||
| REDACTED_APP_KEY = "[[[--APPKEY-REDACTED-]]]" |
There was a problem hiding this comment.
nit: this could be moved to the top of file
| assert settings.internal_ai.client_id.lower() not in out.lower() | ||
| assert settings.internal_ai.client_secret.lower() not in out.lower() | ||
|
|
||
| return out |
There was a problem hiding this comment.
curious if we should look for the other secrets for our testing LLM before serializing - to make sure we do not leak anything by mistake when creating new snapshots.
or add some ci stage for that.
WDYT?
| before_record_request=_before_record_request, | ||
| before_record_response=_before_record_response, | ||
| # record_on_exception=False, | ||
| # drop_unused_requests=True, |
There was a problem hiding this comment.
to be removed if not needed
| return decorator | ||
|
|
||
|
|
||
| def deterministic_thread_ids() -> Callable[ |
There was a problem hiding this comment.
that's cool, we don't need to change the test body to accommodate that.
This change implements snapshot, record then reply testing for all integration tests.
Re-enables recently skipped (falky) tests:
test_agent_understands_other_agents(snapshot was edited manually)test_supervisor_resumes_subagent_thread_across_invocationstest_supervisor_resumes_subagent_thread_across_invocations_structuredIntroduces a deterministic thread_id mock generator, such that snapshots are deterministic, for:
test_supervisor_resumes_subagent_thread_across_invocationstest_supervisor_resumes_subagent_thread_across_invocations_structuredModified
test_tool_execution_service_accessusing tool middleware, to make the test deterministic.E2E tests still call the LLMs directly.