This PR adds benchmarking for `atan2()`, `__nv_atan2()`, and
`__ocml_atan2_f64()` using the same setup as `sin()`. It also adds
support for throughput benchmarking of functions that take two inputs.
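A throughput benchmark for a two-input function can be sketched as one loop over paired inputs, timing the whole loop and dividing by the number of calls. This is a host-side sketch that assumes `std::chrono` as the clock; the helper names `throughput_ns_per_call` and `atan2_throughput` are hypothetical, and the GPU harness uses its own timers and input setup.

```cpp
#include <cassert>
#include <chrono>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: call f(x[i], y[i]) for every input pair and return the
// average nanoseconds per call. The calls are independent of each other, so
// the hardware can overlap them; that is what makes this a throughput (not a
// latency) measurement.
template <typename F>
double throughput_ns_per_call(F f, const std::vector<double> &x,
                              const std::vector<double> &y, double &sink) {
  assert(x.size() == y.size() && !x.empty());
  auto start = std::chrono::steady_clock::now();
  double acc = 0.0;
  for (std::size_t i = 0; i < x.size(); ++i)
    acc += f(x[i], y[i]);
  auto stop = std::chrono::steady_clock::now();
  sink = acc; // keep the results observable so the loop is not optimized away
  std::chrono::duration<double, std::nano> elapsed = stop - start;
  return elapsed.count() / static_cast<double>(x.size());
}

// Example: throughput of std::atan2 over paired inputs.
double atan2_throughput(const std::vector<double> &x,
                        const std::vector<double> &y, double &sink) {
  return throughput_ns_per_call(
      [](double a, double b) { return std::atan2(a, b); }, x, y, sink);
}
```

The function under test is passed as a callable so the same loop could wrap `atan2()` or a vendor variant; `std::atan2` itself is overloaded, which is why the example wraps it in a lambda.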
|
|
Previously, names such as `AmdgpuSinTwoPow_128` were too long for their
table cells. This PR shortens them to `AmdSin...`.
The separator line was also missing some `-` characters. This PR instead
builds the separator string from the length of the header.
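Deriving the separator from the header can be sketched as below: the header is built from fixed-width columns, and the separator is simply that many `-` characters, so the two can never disagree. The helpers `pad`, `make_header`, and `make_separator` are hypothetical names for illustration, not the harness's functions.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Right-pads `text` with spaces to `width` characters. Names longer than the
// column are left as-is, which is why short names like `AmdSin...` matter.
std::string pad(const std::string &text, std::size_t width) {
  if (text.size() >= width)
    return text;
  return text + std::string(width - text.size(), ' ');
}

// Joins column titles as "Title | Title |" using the given column widths.
std::string make_header(const std::vector<std::string> &titles,
                        const std::vector<std::size_t> &widths) {
  assert(titles.size() == widths.size());
  std::string header;
  for (std::size_t i = 0; i < titles.size(); ++i) {
    if (i != 0)
      header += ' ';
    header += pad(titles[i], widths[i]);
    header += " |";
  }
  return header;
}

// The separator length comes from the header itself, so no '-' can go missing.
std::string make_separator(const std::string &header) {
  return std::string(header.size(), '-');
}
```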
|
|
Previously, the time field was the total time taken to run all iterations
of the benchmark. This PR changes the displayed value to the average
time taken by each iteration.
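The change amounts to dividing the measured total by the iteration count. A minimal sketch, assuming nanosecond totals held in a `uint64_t` and a `uint32_t` iteration count (both types, and the name `average_time_ns`, are assumptions):

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: average per-iteration time from a total measured over
// `iterations` runs. Integer division truncates toward zero.
uint64_t average_time_ns(uint64_t total_time_ns, uint32_t iterations) {
  assert(iterations > 0 && "a benchmark must run at least once");
  return total_time_ns / iterations;
}
```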
|
|
This PR adds `sin()` benchmarks over a range of input values and over a
pregenerated random distribution.
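Pregenerating the inputs keeps random-number generation out of the timed loop. The sketch below assumes a uniform distribution over `[min, max)` and a fixed seed for reproducibility; the PR text does not say which distribution or generator is used, and `make_random_inputs` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical helper: generate `count` inputs before any timing starts.
// A fixed seed means every run benchmarks exactly the same inputs.
std::vector<double> make_random_inputs(std::size_t count, double min,
                                       double max, uint32_t seed = 42) {
  assert(min < max);
  std::mt19937 gen(seed);
  std::uniform_real_distribution<double> dist(min, max);
  std::vector<double> inputs(count);
  for (double &x : inputs)
    x = dist(gen);
  return inputs;
}
```

The timed loop then only reads from the vector, so the measurement reflects `sin()` rather than the generator.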
|
|
This PR adds minimums (50 iterations, 500 us, and an epsilon of 0.0001) to
ensure that every benchmark runs at least a set number of times before it
outputs a final measurement.
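One way these minimums could combine into a stopping rule is sketched below. The three values come from the PR text; how the epsilon is applied (here, as a bound on the relative change of the running mean between checks) is an assumption, and `Minimums` and `should_stop` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Minimums from the PR text; the struct itself is hypothetical.
struct Minimums {
  uint32_t iterations = 50;
  uint64_t time_us = 500;
  double epsilon = 0.0001;
};

// Assumed rule: never stop before both the iteration and time minimums are
// met; after that, stop once the running mean changes by less than epsilon
// (relative to the current mean).
bool should_stop(uint32_t iterations, uint64_t elapsed_us, double prev_mean,
                 double curr_mean, const Minimums &limits = Minimums{}) {
  if (iterations < limits.iterations || elapsed_us < limits.time_us)
    return false;
  assert(curr_mean != 0.0);
  double rel_change = std::fabs(curr_mean - prev_mean) / std::fabs(curr_mean);
  return rel_change < limits.epsilon;
}
```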
|
|
Summary:
This value is a `uint32_t` but was printed as a `uint64_t`. On AMDGPU,
which uses a packed format, this led to invalid offsets that extended
past the end of the buffer.
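The general rule behind the fix is that a `printf` format specifier must match the width of its argument. The host-side sketch below shows the two correct options using the standard `<cinttypes>` macros; the helper names are hypothetical, and the GPU `printf` implementations are not modeled here.

```cpp
#include <cassert>
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <string>

// Matching specifier: a 32-bit argument uses a 32-bit format (PRIu32).
// Passing it with PRIu64 instead is undefined behavior: printf would read
// 8 bytes where only 4 were supplied.
std::string format_offset(uint32_t offset) {
  char buf[32];
  std::snprintf(buf, sizeof(buf), "offset: %" PRIu32, offset);
  return buf;
}

// If a 64-bit specifier is wanted, widen the value explicitly first.
std::string format_offset_widened(uint32_t offset) {
  char buf[32];
  std::snprintf(buf, sizeof(buf), "offset: %" PRIu64,
                static_cast<uint64_t>(offset));
  return buf;
}
```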
|
|
This PR adds a `BENCHMARK_N_THREADS()` helper to register benchmarks
with a specific number of threads. It replaces the flags originally
used, allowing any number of threads.
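The idea of passing the thread count as an argument at registration time can be sketched with a static registrar object. Everything here (`BenchmarkEntry`, `registry`, `Registrar`, and the `BENCHMARK_N_THREADS_SKETCH` macro) is hypothetical and only illustrates the pattern; the real macro's signature and types may differ.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical registry entry: a name, a thread count, and the benchmark body.
struct BenchmarkEntry {
  std::string name;
  uint32_t num_threads;
  uint64_t (*func)();
};

inline std::vector<BenchmarkEntry> &registry() {
  static std::vector<BenchmarkEntry> entries;
  return entries;
}

// Constructing a Registrar appends one entry to the registry.
struct Registrar {
  Registrar(const char *name, uint32_t threads, uint64_t (*func)()) {
    assert(threads > 0);
    registry().push_back({name, threads, func});
  }
};

// The thread count is an ordinary argument rather than one of a fixed set of
// flags, so any value can be requested.
#define BENCHMARK_N_THREADS_SKETCH(Name, Func, Threads)                        \
  static Registrar Name##_registrar(#Name, Threads, Func)
```

Each use of the macro creates one static object whose constructor performs the registration, so a benchmark is registered simply by being declared.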
|
|
This PR changes the output to resemble Google Benchmark's. For example:
```
Running Suite: LlvmLibcIsAlNumGpuBenchmark
Benchmark | Cycles | Min | Max | Iterations | Time (ns) | Stddev | Threads |
-----------------------------------------------------------------------------------------------------
IsAlnum | 92 | 76 | 482 | 23 | 86500 | 76 | 64 |
IsAlnumSingleThread | 87 | 76 | 302 | 20 | 72000 | 49 | 1 |
IsAlnumSingleWave | 87 | 76 | 302 | 20 | 72000 | 49 | 32 |
IsAlnumCapital | 89 | 76 | 299 | 17 | 78500 | 52 | 64 |
IsAlnumNotAlnum | 87 | 76 | 303 | 20 | 76000 | 49 | 64 |
```
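The per-row statistics in the table above (mean cycles, min, max, and standard deviation over the iterations) can be accumulated without storing every sample. This sketch uses Welford's online algorithm and assumes that "Cycles" is the mean and "Stddev" is the population standard deviation; the harness may compute them differently, and `CycleStats` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>

// Running statistics over per-iteration cycle counts (Welford's algorithm).
struct CycleStats {
  uint64_t count = 0;
  double mean = 0.0;
  double m2 = 0.0; // sum of squared deviations from the running mean
  uint64_t min = std::numeric_limits<uint64_t>::max();
  uint64_t max = 0;

  void add(uint64_t cycles) {
    ++count;
    double x = static_cast<double>(cycles);
    double delta = x - mean;
    mean += delta / static_cast<double>(count);
    m2 += delta * (x - mean); // uses both the old and the updated mean
    min = std::min(min, cycles);
    max = std::max(max, cycles);
  }

  // Population standard deviation of all samples added so far.
  double stddev() const {
    assert(count > 0);
    return std::sqrt(m2 / static_cast<double>(count));
  }
};
```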
|
|
This PR runs benchmarks on 32 threads (a single warp on NVPTX) by
default and adds the option of single-threaded benchmarks. A benchmark
can be made to run on a single thread with the
`SINGLE_THREADED_BENCHMARK()` macro.
I chose to use a flag here so that other options can be added in the
future.
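A flag-based design like the one described can be sketched as a bitmask, where each option takes its own bit and later options take the next free bits. The enum values, `DEFAULT_THREADS`, and `threads_for` are hypothetical names; only the 32-thread default and the single-thread option come from the text above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical option bits; new options would use the next free bit.
enum BenchmarkFlags : uint32_t {
  BENCH_NONE = 0,
  BENCH_SINGLE_THREADED = 1u << 0,
};

constexpr uint32_t DEFAULT_THREADS = 32; // a single warp on NVPTX

// Maps a flag set to the number of threads a benchmark should run on.
uint32_t threads_for(uint32_t flags) {
  return (flags & BENCH_SINGLE_THREADED) ? 1u : DEFAULT_THREADS;
}
```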
|
|
This PR replaces our old method of reducing the benchmark results, which
used an array, with atomics. This should help us implement
single-threaded benchmarks.
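Reducing with atomics means each thread folds its own measurement into shared totals directly, instead of writing to a per-thread array slot that is reduced afterwards. The sketch below is a host-side analogue in which `std::thread` stands in for GPU threads; `AtomicResults` and `run_threads` are hypothetical names, and the device code would use GPU atomics rather than `std::atomic`.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>
#include <vector>

// Shared totals updated directly by every thread.
struct AtomicResults {
  std::atomic<uint64_t> total_cycles{0};
  std::atomic<uint64_t> min_cycles{UINT64_MAX};
  std::atomic<uint64_t> max_cycles{0};
  std::atomic<uint32_t> samples{0};

  void record(uint64_t cycles) {
    total_cycles.fetch_add(cycles, std::memory_order_relaxed);
    samples.fetch_add(1, std::memory_order_relaxed);
    // std::atomic has no fetch_min/fetch_max before C++26, so use CAS loops.
    // On failure, compare_exchange_weak reloads `cur` and the loop retries.
    uint64_t cur = min_cycles.load(std::memory_order_relaxed);
    while (cycles < cur &&
           !min_cycles.compare_exchange_weak(cur, cycles,
                                             std::memory_order_relaxed))
      ;
    cur = max_cycles.load(std::memory_order_relaxed);
    while (cycles > cur &&
           !max_cycles.compare_exchange_weak(cur, cycles,
                                             std::memory_order_relaxed))
      ;
  }
};

// Launches one thread per measurement; each records its value concurrently.
void run_threads(AtomicResults &results, const std::vector<uint64_t> &per_thread) {
  std::vector<std::thread> threads;
  for (uint64_t cycles : per_thread)
    threads.emplace_back([&results, cycles] { results.record(cycles); });
  for (std::thread &t : threads)
    t.join();
}
```

Because nothing is indexed by thread ID, the same code works unchanged whether one thread or many contribute, which fits the single-threaded use case mentioned above.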
|
|
This is a part of #97655.
|
|
declaration" (#98593)
Reverts llvm/llvm-project#98075
bots are broken
|
|
This is a part of #97655.
|
|
Previously, registering multiple benchmarks in the same file would only
report the results for the last benchmark to run. This PR fixes the
issue.
@jhuber6
|
|
This PR fixes linting issues discovered by `cppcheck`.
Fixes: https://github.com/llvm/llvm-project/issues/96863
|
|
This PR adds microbenchmarking infrastructure for NVPTX. `nvlink`
cannot perform LTO, so `libc` functions cannot be inlined, and the
resulting function call overhead is not adjusted for during
microbenchmarking.
|