author | Leandro Lacerda <leandrolcampos@yahoo.com.br> | 2025-08-15 13:00:17 -0300 |
---|---|---|
committer | GitHub <noreply@github.com> | 2025-08-15 11:00:17 -0500 |
commit | 08ff017fb0c9c7c3c91858023ea45149449fbbfc (patch) | |
tree | e1e2252630986180458ebbc780942ae28bb668d8 /llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp | |
parent | f34326dac8e6903e0621dd87505928756f860d6d (diff) | |
[libc] Improve GPU benchmarking (#153512)
This patch improves GPU benchmarking in the following ways:
* Replace `rand`/`srand` with a deterministic per-thread RNG seeded by
`call_index`, which makes libc vs. vendor comparisons reproducible and
apples-to-apples.
* Fix input generation: sample the unbiased exponent uniformly in
`[min_exp, max_exp]`, clamp bounds, and skip `Inf`, `NaN`, `-0.0`, and
`+0.0`.
* Fix standard deviation: use an explicit estimator from sums and
sums-of-squares (`sqrt(E[x^2] − E[x]^2)`) across samples.
* Fix throughput overhead: subtract a loop-only baseline inside
NVPTX/AMDGPU timing backends so `benchmark()` gets cycles-per-call
already corrected (no `overhead()` call).
* Adapt existing math benchmarks to the new RNG/timing plumbing (pass
`call_index` through, drop `rand`/`srand`, clean up includes).
* Correct inter-thread aggregation: use iteration-weighted pooling to
compute the global mean/variance, ensuring statistically sound
`Cycles (Mean)` and `Stddev`.
* Remove `Time / Iteration` column from the results table: it reported
per-thread convergence time (not per-call latency) and was
redundant/misleading next to `Cycles (Mean)`.
* Remove unused `BenchmarkLogger` files: dead code that added
maintenance and cognitive overhead without providing functionality.
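
As an illustration of the first two items, the sketch below derives one `float` input deterministically from `call_index`. It is not the code from this patch: `mix64` and `sample_float` are hypothetical names, the SplitMix64-style mixer is a stand-in for whatever RNG the patch uses, and the sketch assumes only normal floats are wanted (unbiased exponent clamped to `[-126, 127]`), which excludes `Inf`, `NaN`, and both zeros by construction.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// SplitMix64-style mixer (assumption: stand-in for the patch's RNG).
inline std::uint64_t mix64(std::uint64_t x) {
  x += 0x9E3779B97F4A7C15ULL;
  x = (x ^ (x >> 30)) * 0xBF58476D1CE4E5B9ULL;
  x = (x ^ (x >> 27)) * 0x94D049BB133111EBULL;
  return x ^ (x >> 31);
}

inline int clamp_int(int v, int lo, int hi) {
  return v < lo ? lo : (v > hi ? hi : v);
}

// Hypothetical helper: returns a finite, nonzero float whose unbiased
// exponent is drawn (approximately) uniformly from [min_exp, max_exp].
// The same call_index always yields the same value.
inline float sample_float(std::uint64_t call_index, int min_exp, int max_exp) {
  // Clamp both bounds to the normal-float exponent range.
  min_exp = clamp_int(min_exp, -126, 127);
  max_exp = clamp_int(max_exp, -126, 127);
  if (min_exp > max_exp)
    max_exp = min_exp;

  const std::uint64_t r = mix64(call_index);
  const std::uint32_t span = static_cast<std::uint32_t>(max_exp - min_exp + 1);
  // Modulo reduction has a slight bias; acceptable for a sketch.
  const int exp = min_exp + static_cast<int>((r >> 32) % span);

  const std::uint32_t biased = static_cast<std::uint32_t>(exp + 127); // 1..254
  const std::uint32_t mantissa = static_cast<std::uint32_t>(r) & 0x7FFFFFu;
  const std::uint32_t sign = static_cast<std::uint32_t>(r >> 23) & 1u;
  const std::uint32_t bits = (sign << 31) | (biased << 23) | mantissa;

  float out;
  std::memcpy(&out, &bits, sizeof(out)); // bit-cast without aliasing issues
  return out;
}
```

Because the value depends only on `call_index` and the bounds, two implementations benchmarked with the same indices see identical inputs.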
---
## TODO (before merge)
* [ ] Investigate compiler warnings and address their root causes.
* [x] Review how per-thread results are aggregated into the overall
result.
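
For reference while reviewing the aggregation, here is a minimal sketch of iteration-weighted pooling built on the same `sqrt(E[x^2] − E[x]^2)` estimator mentioned above. The `ThreadStats`, `Pooled`, and `pool` names are hypothetical, and the sketch assumes each thread reports its iteration count, its mean, and its population variance; the patch's actual data layout may differ.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical per-thread summary.
struct ThreadStats {
  std::uint64_t iterations; // number of samples the thread collected
  double mean;              // mean cycles per call on this thread
  double variance;          // population variance on this thread
};

struct Pooled {
  double mean;
  double stddev;
};

// Pools per-thread results as if all samples had been gathered in one place:
// each thread contributes in proportion to its iteration count.
inline Pooled pool(const std::vector<ThreadStats> &threads) {
  double n = 0.0, sum = 0.0, sum_sq = 0.0;
  for (const ThreadStats &t : threads) {
    const double w = static_cast<double>(t.iterations);
    n += w;
    sum += w * t.mean;                              // recovers sum of x
    sum_sq += w * (t.variance + t.mean * t.mean);   // recovers sum of x^2
  }
  if (n == 0.0)
    return {0.0, 0.0};

  const double mean = sum / n;
  double var = sum_sq / n - mean * mean; // E[x^2] - E[x]^2
  if (var < 0.0)
    var = 0.0; // guard against rounding slightly below zero
  return {mean, std::sqrt(var)};
}
```

For example, a thread with 1 iteration at mean 10 pooled with a thread with 3 iterations at mean 20 (both with zero variance) gives a global mean of 17.5, not the unweighted 15.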
## Follow-ups (future PRs)
* Add support for running throughput benchmarks with uniform (linear)
input distributions, alongside the current log2-uniform scheme.
* Review/adjust the configuration and coverage of existing math
benchmarks.
* Add more math benchmarks (e.g., `exp`/`expf`, others).