Snapshot record then reply testing by mateusz834 · Pull Request #740 · splunk/splunk-sdk-python

mateusz834 · 2026-04-23T07:56:26Z

This change implements snapshot, record then reply testing for all integration tests.

Re-enables recently skipped (falky) tests:

test_agent_understands_other_agents (snapshot was edited manually)
test_supervisor_resumes_subagent_thread_across_invocations
test_supervisor_resumes_subagent_thread_across_invocations_structured

Introduces a deterministic thread_id mock generator, such that snapshots are deterministic, for:

test_supervisor_resumes_subagent_thread_across_invocations
test_supervisor_resumes_subagent_thread_across_invocations_structured

Modified test_tool_execution_service_access using tool middleware, to make the test deterministic.

E2E tests still call the LLMs directly.

This change implements snapshot, record then reply testing for all integration tests. Re-enables recently skipped (falky) tests: - test_agent_understands_other_agents (snapshot was edited manually) - test_supervisor_resumes_subagent_thread_across_invocations - test_supervisor_resumes_subagent_thread_across_invocations_structured Introduces a deterministic thread_id mock generator, such that snapshots are deterministic, for: - test_supervisor_resumes_subagent_thread_across_invocations - test_supervisor_resumes_subagent_thread_across_invocations_structured Modified test_tool_execution_service_access using tool middleware, to make the test deterministic.

splunk-dtaborski · 2026-04-24T08:02:57Z

+            def _before_record_request(request: Request) -> Request | None:
+                url = parse.urlparse(request.uri)  # pyright: ignore[reportUnknownArgumentType, reportUnknownVariableType]
+                if url.hostname == internal_ai_hostname:
+                    request.headers = {}


Shouldn't we keep some specific headers for like Authorization and keep comparing them?

splunk-dtaborski · 2026-04-24T08:23:59Z

 OPENAI_API_KEY = "ollama"


 class TestAgent(AITestCase):


Not sure if correct assumption but i think most of the time will be decorating all tests in the file. Maybe we could add one more decoration on class level that would add function decorator to each test automatically.

High-level idea:
instead of

class TestIntegration: @ai_snapshot_test() def func1(): ... @ai_snapshot_test() def func2(): ...

Do

@snapshot_test() class TestIntegration: def func1(): ... def func2(): ...

I dont say it has to be specifically in this PR however i wanted to point it out.

That is a nice idea, but i feel that the complexity of it might be huge, i.e. what if we want to have some test that is not snapshoted, then we need some kind of exclude decorator.
I also see us running these tests with multiple snapshots, say from openai, gemini, anthropic (as subtests, created by this decorator) and there might be a need to say skip specific providers from specific tests or such. For example in test_structured_output.pl for models that do not support provider strategy.

CC @szykol @Ickerday for opinions.

I think if we wanted to achieve something like this we could just add a fixture instead.
The fixtures can be autoused, so that all files in the scope of the fixture behave the same.

But I agree with Mateusz I don't see an issue with having those tests explicitly say they are snapshot. Also turning of the snapshotting of a specific test is a matter of removing the decorator, no further work needed.

splunk-dtaborski · 2026-04-24T08:26:16Z

        return model
+
+
+def ai_snapshot_test() -> Callable[


[nitpick] the name suggests it can be used only for AI. What do you think about record_snapshot name?

the name suggests it can be used only for AI.

We do filtering here, so i think it only works for AI 😄 .

splunk-dtaborski

I really like the solution.

szykol · 2026-04-24T15:01:36Z

                [
                    HumanMessage(
-                        content="hi, my name is Chris. Generate a nickname for me",
+                        content="i, my name is Chris. Generate a nickname for me",


looks like a typo by mistake

szykol · 2026-04-24T15:02:24Z

        os.path.join(os.path.dirname(__file__), "testdata", "tool_context.py"),
    )
    @patch("splunklib.ai.agent._testing_app_id", "app_id")
+    @ai_snapshot_test()


is calling the decorator needed?

@ai_snapshot_test looks a little easier to the eye

but this requires changing the code in the decorator

szykol · 2026-04-24T15:04:02Z

+            ).hostname
+            assert internal_ai_hostname is not None
+
+            REDACTED_APP_KEY = "[[[--APPKEY-REDACTED-]]]"


nit: this could be moved to the top of file

szykol · 2026-04-24T15:05:27Z

+                    assert settings.internal_ai.client_id.lower() not in out.lower()
+                    assert settings.internal_ai.client_secret.lower() not in out.lower()
+
+                    return out


curious if we should look for the other secrets for our testing LLM before serializing - to make sure we do not leak anything by mistake when creating new snapshots.

or add some ci stage for that.

WDYT?

szykol · 2026-04-24T15:06:02Z

+                before_record_request=_before_record_request,
+                before_record_response=_before_record_response,
+                # record_on_exception=False,
+                # drop_unused_requests=True,


to be removed if not needed

szykol · 2026-04-24T15:07:16Z

+    return decorator
+
+
+def deterministic_thread_ids() -> Callable[


that's cool, we don't need to change the test body to accommodate that.

mateusz834 force-pushed the dev/MP/snapshots branch 16 times, most recently from 1198d47 to d029c1d Compare April 24, 2026 06:45

mateusz834 marked this pull request as ready for review April 24, 2026 06:45

mateusz834 force-pushed the dev/MP/snapshots branch from d029c1d to 7f769bb Compare April 24, 2026 06:46

mateusz834 requested review from Ickerday and szykol and removed request for szykol April 24, 2026 07:16

splunk-dtaborski reviewed Apr 24, 2026

View reviewed changes

szykol reviewed Apr 24, 2026

View reviewed changes

Conversation

mateusz834 commented Apr 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

splunk-dtaborski Apr 24, 2026 • edited by mateusz834 Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

mateusz834 Apr 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

mateusz834 Apr 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

splunk-dtaborski left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

mateusz834 commented Apr 23, 2026 •

edited

Loading

splunk-dtaborski Apr 24, 2026 •

edited by mateusz834

Loading

mateusz834 Apr 24, 2026 •

edited

Loading

mateusz834 Apr 24, 2026 •

edited

Loading