Skip to content

feat: Add perf counter flag for benchmark-created threads#2179

Open
shreejaykurhade wants to merge 4 commits intogoogle:mainfrom
shreejaykurhade:perf-counters-all-threads
Open

feat: Add perf counter flag for benchmark-created threads#2179
shreejaykurhade wants to merge 4 commits intogoogle:mainfrom
shreejaykurhade:perf-counters-all-threads

Conversation

@shreejaykurhade
Copy link
Copy Markdown
Contributor

Summary

Adds --benchmark_perf_counters_all_threads={true|false} to control whether perf counters include benchmark-created threads.

The default is true, preserving the existing behavior. Passing --benchmark_perf_counters_all_threads=false sets perf_event_attr::inherit to false, allowing perf counters to measure only the main benchmark thread.

This addresses #1897.

Details

  • Adds the new command-line flag and help text.
  • Keeps existing perf counter behavior by default.
  • Adds explicit internal APIs for current-thread perf counters instead of passing opaque boolean arguments.
  • Suppresses the existing multi-thread perf-counter warning when counters are limited to the main benchmark thread.
  • Documents the new flag in docs/perf_counters.md.

Testing

  • cmake --build build --config Debug
  • ctest --test-dir build --output-on-failure

All tests passed locally: 85/85.

@google-cla
Copy link
Copy Markdown

google-cla Bot commented Apr 26, 2026

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@shreejaykurhade
Copy link
Copy Markdown
Contributor Author

shreejaykurhade commented Apr 26, 2026

please review @dmah42
Additional validation:

I verified that:

  • PerfCounters::Create(...) preserves the existing behavior and sets inherit=true.
  • PerfCounters::CreateForCurrentThread(...) sets inherit=false.
  • --benchmark_perf_counters_all_threads=false is consumed by the command-line parser and sets the flag value to false.

@shreejaykurhade shreejaykurhade force-pushed the perf-counters-all-threads branch from 523814b to c0b6b1e Compare April 26, 2026 14:40
@dmah42
Copy link
Copy Markdown
Member

dmah42 commented Apr 26, 2026

why would someone want this? documentation should aim to include the why as well as the what.

@shreejaykurhade
Copy link
Copy Markdown
Contributor Author

Someone would want this when the benchmarked code creates additional threads, but the measurement they care about is only the benchmark thread itself.

With inherit=true, perf counters include activity from threads created by the benchmarked code. That is useful for measuring total work, so this PR keeps it as the default. But for some benchmarks, the worker threads are an implementation detail or a separate subsystem, and including their counters changes the meaning of the result. For example, a benchmark may want to measure caller-side dispatch overhead, queue submission, polling, or coordination costs on the main benchmark thread without folding in the work done by spawned threads.

So the “why” is: this flag lets users choose whether perf counters represent total thread-inherited activity or only the current benchmark thread, depending on what question their benchmark is trying to answer.

I can add that motivation to the docs so the flag is described in terms of when to use each mode, not just what it toggles.

@shreejaykurhade
Copy link
Copy Markdown
Contributor Author

@dmah42 Updated docs/perf_counters.md to include the motivation.

The docs now explain that the default behavior measures total work across benchmark-created threads, while --benchmark_perf_counters_all_threads=false is useful when the benchmark is intended to measure only main-thread/caller-side costs such as dispatch, queue submission, polling, or coordination without including worker-thread activity.

@dmah42
Copy link
Copy Markdown
Member

dmah42 commented Apr 27, 2026

but the point of the library is to measure the performance of the function-under-test. if we do this, then the perf counters won't match the other timing measurements. i'm not seeing a concrete use case for this change, and it adds complexity to the code and to the user.

@shreejaykurhade
Copy link
Copy Markdown
Contributor Author

@dmah42 That makes sense. The mismatch with the timing measurements is the main tradeoff here: with --benchmark_perf_counters_all_threads=false, the perf counters describe only the caller thread while the benchmark timing still describes the full function invocation.

The concrete case I had in mind is benchmarking APIs where the function-under-test intentionally submits work to an already-running worker pool and returns after doing caller-side orchestration. In that setup, the benchmark author may want counters for the submission/polling path only, because worker-thread counters are dominated by unrelated executor activity or shared worker work.

That said, I see your point that this can be confusing as a general benchmark option, especially if the counters no longer line up with the timing columns. If that use case is not compelling enough for the library, I’m okay dropping this change rather than adding user-facing complexity.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants