riscv-gnu-toolchain/llvm.git


author	Leandro Lacerda <leandrolcampos@yahoo.com.br>	2025-08-15 13:00:17 -0300
committer	GitHub <noreply@github.com>	2025-08-15 11:00:17 -0500
commit	08ff017fb0c9c7c3c91858023ea45149449fbbfc (patch)
tree	e1e2252630986180458ebbc780942ae28bb668d8 /llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
parent	f34326dac8e6903e0621dd87505928756f860d6d (diff)
download	llvm-08ff017fb0c9c7c3c91858023ea45149449fbbfc.zip llvm-08ff017fb0c9c7c3c91858023ea45149449fbbfc.tar.gz llvm-08ff017fb0c9c7c3c91858023ea45149449fbbfc.tar.bz2

[libc] Improve GPU benchmarking (#153512)

This patch improves the GPU benchmarking in this way:

* Replace `rand`/`srand` with a deterministic per-thread RNG seeded by `call_index`: reproducible, apples-to-apples libc vs vendor comparisons.
* Fix input generation: sample the unbiased exponent uniformly in `[min_exp, max_exp]`, clamp bounds, and skip `Inf`, `NaN`, `-0.0`, and `+0.0`.
* Fix standard deviation: use an explicit estimator from sums and sums-of-squares (`sqrt(E[x^2] − E[x]^2)`) across samples.
* Fix throughput overhead: subtract a loop-only baseline inside NVPTX/AMDGPU timing backends so `benchmark()` gets cycles-per-call already corrected (no `overhead()` call).
* Adapt existing math benchmarks to the new RNG/timing plumbing (plumb `call_index`, drop `rand/srand`, clean includes).
* Correct inter-thread aggregation: use iteration-weighted pooling to compute the global mean/variance, ensuring statistically sound `Cycles (Mean)` and `Stddev`.
* Remove `Time / Iteration` column from the results table: it reported per-thread convergence time (not per-call latency) and was redundant/misleading next to `Cycles (Mean)`.
* Remove unused `BenchmarkLogger` files: dead code that added maintenance and cognitive overhead without providing functionality.

---

## TODO (before merge)

* [ ] Investigate compiler warnings and address their root causes.
* [x] Review how per-thread results are aggregated into the overall result.

## Follow-ups (future PRs)

* Add support to run throughput benchmarks with uniform (linear) input distributions, alongside the current log2-uniform scheme.
* Review/adjust the configuration and coverage of existing math benchmarks.
* Add more math benchmarks (e.g., `exp`/`expf`, others).
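The standard-deviation and inter-thread aggregation items in the commit message can be illustrated with a small host-side sketch. This is not the code from the patch: the `ThreadStats` struct and the `stats_from_sums` and `pool` helpers are hypothetical names chosen for illustration, and the sketch assumes population (not sample) variance, as suggested by the `sqrt(E[x^2] − E[x]^2)` formula.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical per-thread summary; the patch's real types may differ.
struct ThreadStats {
  std::uint64_t iterations; // number of samples this thread collected
  double mean;              // mean cycles per call on this thread
  double variance;          // population variance on this thread
};

// Mean and variance from running sums: Var[x] = E[x^2] - E[x]^2.
inline ThreadStats stats_from_sums(double sum, double sum_sq,
                                   std::uint64_t n) {
  assert(n > 0 && "need at least one sample");
  const double count = static_cast<double>(n);
  const double mean = sum / count;
  double variance = sum_sq / count - mean * mean;
  if (variance < 0.0) // rounding can push a tiny variance below zero
    variance = 0.0;
  return {n, mean, variance};
}

// Iteration-weighted pooling: rebuild each thread's sum and sum of squares
// from its (n, mean, variance), add them up, and re-derive global statistics.
inline ThreadStats pool(const std::vector<ThreadStats> &threads) {
  std::uint64_t total = 0;
  double sum = 0.0;
  double sum_sq = 0.0;
  for (const ThreadStats &t : threads) {
    const double n = static_cast<double>(t.iterations);
    total += t.iterations;
    sum += n * t.mean;
    sum_sq += n * (t.variance + t.mean * t.mean);
  }
  return stats_from_sums(sum, sum_sq, total);
}

// Standard deviation is the square root of the pooled variance.
inline double stddev(const ThreadStats &s) { return std::sqrt(s.variance); }
```

Weighting by iteration count means a thread that converged after few iterations contributes proportionally less than one that ran many. The pooled result then matches what a single pass over all samples would give, which a plain average of per-thread means and variances does not.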

Diffstat (limited to 'llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp')

0 files changed, 0 insertions, 0 deletions

