path: root/libc/benchmarks
Age | Commit message | Author | Files | Lines
2025-11-12 | [runtimes][GTest] LLVM-independent unittests (#164794) | Michael Kruse | 1 | -2/+4
The LLVM-customized GTest has a dependency on LLVM to support `llvm::raw_ostream` and hence has to link to LLVMSupport. The runtimes use the LLVMSupport from the bootstrapping LLVM build. The problem is that the bootstrapping compiler and the runtimes target can diverge in their ABI, even in the runtimes default build. For instance, Clang is built using gcc, which uses libstdc++, but the runtimes are built by Clang, which can be configured to use libcxx by default. Although that builder does not use gcc, this issue has caused [flang-aarch64-libcxx](https://lab.llvm.org/buildbot/#/builders/89) to break, and it is still (again?) broken.

This patch makes the runtimes' GTest independent of LLVMSupport so that no runtimes component links against LLVM components.

Runtime projects that use GTest unittests:

* flang-rt
* libc
* compiler-rt: Adds `gtest-all.cpp` with [GTEST_NO_LLVM_SUPPORT=1](https://github.com/llvm/llvm-project/blob/f801b6f67ea896d6e4d2de38bce9a79689ceb254/compiler-rt/CMakeLists.txt#L723) to each unittest without using `llvm_gtest`. Not touched by this PR.
* openmp: Handled by #159416. Not touched for now by this PR to avoid conflict.

The current state of this PR tries to reuse https://github.com/llvm/llvm-project/blob/main/third-party/unittest/CMakeLists.txt as much as possible, although personally I would prefer to make it use "modern CMake" style. third-party/unittest/CMakeLists.txt detects whether it is used in a runtimes build and adjusts accordingly. It creates one target for LLVM (`llvm_gtest`, NFCI) and another for the runtimes (`runtimes_gtest`). It is not possible to reuse `llvm_gtest` for both, since `llvm_gtest` is imported using `find_package(LLVM)` if configured with LLVM_INSTALL_GTEST. An alias `default_gtest` selects between the two. `default_gtest` could also be used for openmp, which also supports both standalone and [LLVM_ENABLE_PROJECTS](https://github.com/llvm/llvm-project/pull/152189) build modes.
2025-09-24 | [libc][NFC] Remove usage of the C keyword `I`. (#160567) | lntue | 3 | -8/+8
2025-08-27 | [libc][gpu] Add exp/log benchmarks and flexible input generation (#155727) | Leandro Lacerda | 13 | -114/+744
This patch adds GPU benchmarks for the exp (`exp`, `expf`, `expf16`) and log (`log`, `logf`, `logf16`) families of math functions.

Adding these benchmarks revealed a key limitation in the existing framework: the input generation mechanism was hardcoded to a single strategy that sampled numbers with a uniform distribution of their unbiased exponents. While this strategy is effective for values spanning multiple orders of magnitude, it is not suitable for linear ranges. The previous framework lacked the flexibility to support this.

### Summary of Changes

**1. Framework Refactoring for Flexible Input Sampling:**

The GPU benchmark framework was refactored to support multiple, pluggable input sampling strategies.

* **`Random.h`:** A new header was created to house the `RandomGenerator` and the new distribution classes.
* **Distribution Classes:** Two sampling strategies were implemented:
  * `UniformExponent`: Formalizes the previous logic of sampling numbers with a uniform distribution of their unbiased exponents. It can now also be configured to produce only positive values, which is essential for functions like `log`.
  * `UniformLinear`: A new strategy that samples numbers from a uniform distribution over a linear interval `[min, max)`.
* **`MathPerf` Update:** The `MathPerf` class was updated with a generic `run_throughput` method that is templated on a distribution object. This makes the framework extensible to future sampling strategies.

**2. New Benchmarks for `exp` and `log`:**

Using the newly refactored framework, benchmarks were added for `exp`, `expf`, `expf16`, `log`, `logf`, and `logf16`. The test intervals were carefully chosen to measure the performance of distinct behavioral regions of each function.
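To make the difference between the two strategies concrete, here is a minimal host-side sketch in C++. It is not the libc `Random.h` code: the struct names echo the commit message, but the fields, the use of `std::mt19937_64`, and the `operator()` interface are assumptions for illustration. It also assumes the exponent range stays inside the normal (non-subnormal) range of `double`.

```cpp
#include <cassert>
#include <cmath>
#include <random>

// Hypothetical sketch of the two sampling strategies; not the libc Random.h API.

// Samples doubles whose unbiased exponent is uniform in [min_exp, max_exp].
// Assumes the exponent range stays within the normal range of double.
struct UniformExponent {
  int min_exp;
  int max_exp;
  bool positive_only;

  double operator()(std::mt19937_64 &rng) const {
    assert(min_exp <= max_exp);
    std::uniform_int_distribution<int> exp_dist(min_exp, max_exp);
    // A significand in [1, 2) keeps the exponent of the result equal to e.
    std::uniform_real_distribution<double> sig_dist(1.0, 2.0);
    int e = exp_dist(rng);
    double x = std::ldexp(sig_dist(rng), e);
    if (!positive_only && (rng() & 1u))
      x = -x;  // random sign when negative values are allowed
    return x;
  }
};

// Samples doubles uniformly from the linear interval [min, max).
struct UniformLinear {
  double min;
  double max;

  double operator()(std::mt19937_64 &rng) const {
    assert(min < max);
    std::uniform_real_distribution<double> dist(min, max);
    return dist(rng);
  }
};
```

The exponent sampler puts, in expectation, as many samples in `[2^-10, 2^-9)` as in `[2^9, 2^10)`, while the linear sampler gives equal density per unit length, so a narrow interval such as `[1, 2)` is covered evenly.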
2025-08-16 | [libc][gpu] Disable loop unrolling in the throughput benchmark loop (#153971) | Leandro Lacerda | 2 | -0/+16
This patch makes GPU throughput benchmark results more comparable across targets by disabling loop unrolling in the benchmark loop.

Motivation:

* PTX (post-LTO) evidence on NVPTX: for libc `sin`, the generated PTX shows the `throughput` loop unrolled 8x at `N=128` (one iteration advances the input pointer by 64 bytes = 8 doubles), interleaving eight independent chains before the back-edge. This hides latency and significantly reduces cycles/call as the batch size `N` grows.
* Observed scaling (NVPTX measurements): with unrolling enabled, `sin` dropped from ~3,100 cycles/call at `N=1` to ~360 at `N=128`. After enforcing `#pragma clang loop unroll(disable)`, results stabilized (e.g., from ~3,100 cycles/call at `N=1` to ~2,700 at `N=128`).
* libdevice contrast: the libdevice `sin` path did not exhibit a similar drop in our measurements, and the PTX appears as compact internal calls rather than a long FMA chain, leaving less ILP for the outer loop to extract.

What this change does:

* Applies `#pragma clang loop unroll(disable)` to the GPU `throughput()` loop in both NVPTX and AMDGPU backends.

Leaving unrolling entirely to the optimizer makes apples-to-apples comparisons uneven (e.g., libc vs. vendor). Disabling unrolling yields fairer, more consistent numbers.
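A minimal sketch of where such a hint goes, assuming a simplified loop: `throughput_sketch` is a hypothetical name, the cycle-counter reads of the real NVPTX/AMDGPU backends are omitted, and results are summed into `sink` only so the calls cannot be discarded as dead code.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical throughput loop (timer reads omitted): applies f to n inputs.
// The Clang loop hint asks the optimizer not to unroll the loop, so it cannot
// interleave several independent calls per iteration. Compilers other than
// Clang ignore this pragma.
template <typename F>
double throughput_sketch(F f, const double *inputs, std::size_t n) {
  assert(inputs != nullptr || n == 0);
  double sink = 0.0;  // consume results so the calls are not optimized away
#pragma clang loop unroll(disable)
  for (std::size_t i = 0; i < n; ++i)
    sink += f(inputs[i]);
  return sink;
}
```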
2025-08-15 | [libc] Polish GPU benchmarking (#153900) | Leandro Lacerda | 9 | -153/+22
This patch provides cleanups and improvements for the GPU benchmarking infrastructure. The key changes are:

- Fix benchmark convergence bug: Round up the scaled iteration count (ceil) to ensure it grows properly. The previous truncation logic caused the iteration count to get stuck.
- Resolve the remaining compiler warning.
- Remove unused `BenchmarkLogger` files: This is dead code that added maintenance and cognitive overhead without providing functionality.
- Improve build hygiene: Clean up headers and CMake dependencies to strictly follow the 'include what you use' (IWYU) principle.
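A minimal sketch of the convergence bug, assuming the iteration count is grown by a floating-point scale factor each round; the function names and the factor `1.5` are hypothetical, not taken from the libc sources.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical growth step: multiply the iteration count by `scale` (> 1).
// Truncation can return the same count (1 * 1.5 -> 1), so the count never grows.
uint32_t next_iterations_truncate(uint32_t iters, double scale) {
  return static_cast<uint32_t>(iters * scale);
}

// Rounding up makes the count grow for any positive count (overflow aside).
uint32_t next_iterations_ceil(uint32_t iters, double scale) {
  assert(scale > 1.0);
  return static_cast<uint32_t>(std::ceil(iters * scale));
}
```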
2025-08-15 | [libc] Improve GPU benchmarking (#153512) | Leandro Lacerda | 10 | -227/+496
This patch improves the GPU benchmarking in this way:

* Replace `rand`/`srand` with a deterministic per-thread RNG seeded by `call_index`: reproducible, apples-to-apples libc vs vendor comparisons.
* Fix input generation: sample the unbiased exponent uniformly in `[min_exp, max_exp]`, clamp bounds, and skip `Inf`, `NaN`, `-0.0`, and `+0.0`.
* Fix standard deviation: use an explicit estimator from sums and sums-of-squares (`sqrt(E[x^2] − E[x]^2)`) across samples.
* Fix throughput overhead: subtract a loop-only baseline inside NVPTX/AMDGPU timing backends so `benchmark()` gets cycles-per-call already corrected (no `overhead()` call).
* Adapt existing math benchmarks to the new RNG/timing plumbing (plumb `call_index`, drop `rand/srand`, clean includes).
* Correct inter-thread aggregation: use iteration-weighted pooling to compute the global mean/variance, ensuring statistically sound `Cycles (Mean)` and `Stddev`.
* Remove `Time / Iteration` column from the results table: it reported per-thread convergence time (not per-call latency) and was redundant/misleading next to `Cycles (Mean)`.
* Remove unused `BenchmarkLogger` files: dead code that added maintenance and cognitive overhead without providing functionality.

---

## TODO (before merge)

* [ ] Investigate compiler warnings and address their root causes.
* [x] Review how per-thread results are aggregated into the overall result.

## Follow-ups (future PRs)

* Add support to run throughput benchmarks with uniform (linear) input distributions, alongside the current log2-uniform scheme.
* Review/adjust the configuration and coverage of existing math benchmarks.
* Add more math benchmarks (e.g., `exp`/`expf`, others).
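The sums-of-squares estimator and the iteration-weighted pooling can be sketched together: if each thread keeps its raw count, sum, and sum of squares, merging threads is just adding those totals, so a thread that ran more iterations automatically carries more weight. `SampleStats` is a hypothetical name, and the sketch computes the population (not sample) standard deviation; the libc implementation may differ on both points.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical per-thread accumulator of per-call cycle counts.
struct SampleStats {
  uint64_t count = 0;
  double sum = 0.0;     // sum of x
  double sum_sq = 0.0;  // sum of x^2

  void add(double x) {
    ++count;
    sum += x;
    sum_sq += x * x;
  }

  // Adding raw totals weights each thread by how many samples it contributed.
  void merge(const SampleStats &other) {
    count += other.count;
    sum += other.sum;
    sum_sq += other.sum_sq;
  }

  double mean() const {
    assert(count > 0);
    return sum / static_cast<double>(count);
  }

  // Population standard deviation sqrt(E[x^2] - E[x]^2), clamped at zero
  // because floating-point cancellation can make the difference slightly negative.
  double stddev() const {
    double m = mean();
    double var = sum_sq / static_cast<double>(count) - m * m;
    return std::sqrt(var > 0.0 ? var : 0.0);
  }
};
```

For per-thread samples `{2, 4}` and `{6}`, averaging the two per-thread means would give 4.5, while pooling gives the overall mean of 4.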
2025-07-23 | [libc][NFC] Add stdint.h proxy header to fix dependency issue with <stdint.h> includes. (#150303) | lntue | 9 | -9/+11
https://github.com/llvm/llvm-project/issues/149993
2025-07-18 | [libc] Fix GPU benchmarking | Joseph Huber | 6 | -55/+106
2025-01-23 | [libc][NFC] Strip all training whitespace and missing newlines (#124163) | Joseph Huber | 2 | -2/+2
2024-12-10 | [libc] move bcmp, bzero, bcopy, index, rindex, strcasecmp, strncasecmp to strings.h (#118899) | Nick Desaulniers | 1 | -3/+3
docgen relies on the convention that we have a file foo.cpp in `libc/src/<header>/`. Because the above functions weren't in `libc/src/strings/` but rather `libc/src/string/`, docgen could not find that we had implemented these. Rather than add special carve-outs to docgen, let's fix up our sources for these 7 functions to stick with the existing conventions the rest of the codebase follows.

Link: #118860
Fixes: #118875
2024-12-06 | [libc] Remove automemcpy folder (#118781) | Guillaume Chatelet | 16 | -2276/+0
The build is currently broken and we don't have the resources to keep it up to date :-/
2024-11-14 | Fix build issues with libc mem* benchmarks (#115982) | David Peixotto | 3 | -5/+4
Fix a few issues found when trying to build the benchmark:

Errors

1. Unable to find include "src/__support/macros/config.h" in LibcMemoryBenchmarkMain.cpp

Warnings

2. Unused variable warning `Index` in MemorySizeDistributions.cpp
3. Fix deprecation warning for const-ref version of `DoNotOptimize`.

warning: 'DoNotOptimize<void *>' is deprecated: The const-ref version of this method can permit undesired compiler optimizations in benchmarks
2024-09-17 | [libc][benchmarks] Tidy uses of raw_string_ostream (NFC) | Youngsuk Kim | 2 | -2/+2
As specified in the docs, 1) raw_string_ostream is always unbuffered and 2) the underlying buffer may be used directly (see 65b13610a5226b84889b923bae884ba395ad084d for further reference).

Avoid unneeded calls to raw_string_ostream::str() to avoid excess indirection.
2024-08-18 | [libc][gpu] Add Atan2 Benchmarks (#104708) | jameshu15869 | 6 | -5/+137
This PR adds benchmarking for `atan2()`, `__nv_atan2()`, and `__ocml_atan2_f64()` using the same setup as `sin()`. This PR also adds support for throughput benchmarking of functions with two inputs.
2024-08-11 | Revert "libc: Remove `extern "C"` from main declarations" (#102827) | Schrodinger ZHU Yifan | 1 | -1/+1
Reverts llvm/llvm-project#102825
2024-08-11 | libc: Remove `extern "C"` from main declarations (#102825) | David Blaikie | 1 | -1/+1
This is invalid in C++, and clang recently started warning on it as of #101853
2024-08-08 | [libc][gpu] Add Sinf Benchmarks (#102532) | jameshu15869 | 2 | -20/+41
This PR adds benchmarking for `sinf()` using the same setup as `sin()` but with a smaller range for floats.
2024-08-08 | [libc] [gpu] Fix Minor Benchmark UI Issues (#102529) | jameshu15869 | 2 | -9/+11
Previously, `AmdgpuSinTwoPow_128` and others were too large for their table cells. This PR shortens the names to `AmdSin...`. Some `-` characters were also missing from the separator; this PR instead builds the separator string from the length of the headers.
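A minimal sketch of the separator fix, assuming the header row is already built as a string (`make_separator` is a hypothetical name): deriving the dashes from the header length keeps the two rows the same width even when column titles change.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: derive the separator row from the header row so their
// widths always match, instead of hard-coding a run of dashes.
std::string make_separator(const std::string &header) {
  assert(!header.empty());
  return std::string(header.size(), '-');
}
```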
2024-08-08 | [libc] [gpu] Add Generic, NvSin, and OcmlSinf64 Throughput Benchmark (#101917) | jameshu15869 | 6 | -80/+128
This PR implements https://github.com/lntue/llvm-project/commit/2a158426d4b90ffaa3eaecc9bc10e5aed11f1bcf to provide better throughput benchmarking for libc `sin()` and `__nv_sin()`. These changes have not been tested on AMDGPU yet, only compiled.
2024-08-05 | [libc] [gpu] Change Time To Be Per Iteration (#101919) | jameshu15869 | 1 | -5/+5
Previously, the time field was the total time taken to run all iterations of the benchmark. This PR changes the displayed value to the average time taken by each iteration.
2024-07-30 | [libc] Only link in the appropriate architecture's device libs | Joseph Huber | 1 | -17/+19
2024-07-30 | [libc] Add AMDGPU Sin Benchmark (#101120) | jameshu15869 | 2 | -1/+23
This PR adds support for benchmarking `__ocml_sin_f64()` against `sin()`. This PR is currently a draft because I do not have access to an AMD GPU and was not able to test the PR, but the code compiled when I ran `ninja gpu-benchmark` from `runtimes-amdgcn-amd-amdhsa-bins`.

Co-authored-by: Joseph Huber <huberjn@outlook.com>
2024-07-29 | [libc] Add Generic and NVPTX Sin Benchmark (#99795) | jameshu15869 | 8 | -7/+156
This PR adds sin benchmarking for a range of values and on a pregenerated random distribution.
2024-07-27 | [libc] Make NVPTX benchmarks use LTO for linking | Joseph Huber | 1 | -0/+2
Summary: Now that we can do LTO, we can make the benchmarks more accurate by allowing optimization + inlining of the implementation.
2024-07-26 | [libc] Add Minimum Time and Iterations, Reduce Epsilon (#100838) | jameshu15869 | 2 | -2/+4
This PR adds minimums (50 iterations, 500 us, and epsilon of 0.0001) to ensure that all benchmarks run at least a set number of times before outputting a final measurement.
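The three minimums could combine into a stopping rule roughly like the following. Only the threshold values (50 iterations, 500 µs, epsilon 0.0001) come from the commit message; `StopCriteria`, `should_stop`, and the use of the relative change in the running mean as the convergence test are assumptions.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical stopping rule. Threshold values are from the commit message;
// the field names and the convergence test are assumptions.
struct StopCriteria {
  uint32_t min_iterations = 50;
  uint64_t min_duration_us = 500;
  double epsilon = 0.0001;  // max relative change of the running mean
};

bool should_stop(const StopCriteria &c, uint32_t iterations,
                 uint64_t elapsed_us, double prev_mean, double mean) {
  assert(mean >= 0.0);  // cycle counts are non-negative
  if (iterations < c.min_iterations || elapsed_us < c.min_duration_us)
    return false;  // always honour the minimums first
  double rel_change = std::fabs(mean - prev_mean) / (mean > 0.0 ? mean : 1.0);
  return rel_change < c.epsilon;
}
```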
2024-07-22 | [libc] Fix invalid format specifier in benchmark | Joseph Huber | 1 | -16/+11
Summary: This value is a uint32_t but is printed as a uint64_t, leading to invalid offsets when done on AMDGPU due to its packed format extending past the buffer.
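A short illustration of this class of bug, not the benchmark's actual print statement: the format specifier must match the argument's type, and `<cinttypes>` provides `PRIu32` for `uint32_t`. `format_u32` is a hypothetical helper.

```cpp
#include <cassert>
#include <cinttypes>
#include <cstddef>
#include <cstdint>
#include <cstdio>

// Print a uint32_t with its matching specifier (PRIu32). Pairing a uint32_t
// argument with a 64-bit specifier such as PRIu64 is undefined behaviour; with
// a tightly packed variadic argument buffer, the 8-byte read can run past the
// 4 bytes that were actually passed.
int format_u32(char *buf, std::size_t size, uint32_t value) {
  assert(buf != nullptr && size > 0);
  return std::snprintf(buf, size, "%" PRIu32, value);
}
```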
2024-07-21 | [libc] Add N Threads Benchmark Helper (#99834) | jameshu15869 | 2 | -16/+15
This PR adds a `BENCHMARK_N_THREADS()` helper to register benchmarks with a specific number of threads. It replaces the flags used originally, allowing any number of threads.
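A hypothetical sketch of how a registration macro that takes a thread count can replace per-mode flags. `BENCHMARK_N_THREADS_SKETCH`, `BenchmarkRegistrar`, and `benchmark_registry` are invented for illustration and do not reflect the signature of the real `BENCHMARK_N_THREADS()` macro.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical registry; the real libc benchmark types and macro differ.
struct BenchmarkEntry {
  std::string name;
  uint32_t num_threads;
  uint64_t (*func)();
};

inline std::vector<BenchmarkEntry> &benchmark_registry() {
  static std::vector<BenchmarkEntry> entries;
  return entries;
}

struct BenchmarkRegistrar {
  BenchmarkRegistrar(const char *name, uint32_t num_threads, uint64_t (*func)()) {
    assert(num_threads > 0);
    benchmark_registry().push_back({name, num_threads, func});
  }
};

// Registers FUNC as "SUITE.NAME" to run on NUM_THREADS threads.
#define BENCHMARK_N_THREADS_SKETCH(SUITE, NAME, FUNC, NUM_THREADS)       \
  static BenchmarkRegistrar SUITE##_##NAME##_registrar(#SUITE "." #NAME, \
                                                       NUM_THREADS, FUNC)
```

With this shape, a single-thread or single-warp benchmark is just `NUM_THREADS = 1` or `32` rather than a separate flag.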
2024-07-21 | [libc] Improve Benchmark UI (#99796) | jameshu15869 | 3 | -18/+62
This PR changes the output to resemble Google Benchmark, e.g.:

```
Running Suite: LlvmLibcIsAlNumGpuBenchmark
Benchmark | Cycles | Min | Max | Iterations | Time (ns) | Stddev | Threads |
-----------------------------------------------------------------------------------------------------
IsAlnum | 92 | 76 | 482 | 23 | 86500 | 76 | 64 |
IsAlnumSingleThread | 87 | 76 | 302 | 20 | 72000 | 49 | 1 |
IsAlnumSingleWave | 87 | 76 | 302 | 20 | 72000 | 49 | 32 |
IsAlnumCapital | 89 | 76 | 299 | 17 | 78500 | 52 | 64 |
IsAlnumNotAlnum | 87 | 76 | 303 | 20 | 76000 | 49 | 64 |
```
2024-07-19 | [libc] Add AMDGPU Timing to CMake (#99603) | jameshu15869 | 1 | -1/+1
`libc/benchmarks/gpu/timing/CMakeLists.txt` did not correctly build `amdgpu` utils. This PR fixes that issue by adding `amdgpu` to the loop that adds the correct sub directories.
2024-07-18 | [libc] Add Multithreaded GPU Benchmarks (#98964) | jameshu15869 | 5 | -5/+30
This PR runs benchmarks on 32 threads (a single warp on NVPTX) by default, adding the option for single-threaded benchmarks. We can specify that a benchmark should be run on a single thread using the `SINGLE_THREADED_BENCHMARK()` macro. I chose to use a flag here so that other options could be added in the future.
2024-07-17 | [libc] Add Kernel Resource Usage to nvptx-loader (#97503) | jameshu15869 | 1 | -1/+3
This PR allows `nvptx-loader` to read the resource usage of `_start`, `_begin`, and `_end` when executing CUDA binaries.

Example output:

```
$ nvptx-loader --print-resource-usage libc/benchmarks/gpu/src/ctype/libc.benchmarks.gpu.src.ctype.isalnum_benchmark.__build__
[ RUN ] LlvmLibcIsAlNumGpuBenchmark.IsAlnumWrapper
[ OK ] LlvmLibcIsAlNumGpuBenchmark.IsAlnumWrapper: 93 cycles, 76 min, 470 max, 23 iterations, 78000 ns, 80 stddev
_begin registers: 25
_start registers: 80
_end registers: 62
```

---------

Co-authored-by: Joseph Huber <huberjn@outlook.com>
2024-07-15 | [libc] Use Atomics in GPU Benchmarks (#98842) | jameshu15869 | 3 | -43/+97
This PR replaces our old method of reducing the benchmark results, which used an array, with atomics. This should help us implement single-threaded benchmarks.
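A host-side sketch of reducing results with atomics rather than a per-thread array, using `std::atomic`; the GPU code would use its own atomic wrappers, and `AtomicResult` and its fields are hypothetical. Portable C++17 has no atomic `fetch_min`, so the minimum is maintained with a compare-and-swap loop.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical shared result: each thread folds in its measurement with atomic
// read-modify-write operations instead of writing to its own array slot.
struct AtomicResult {
  std::atomic<uint64_t> cycles_sum{0};
  std::atomic<uint64_t> min_cycles{UINT64_MAX};
  std::atomic<uint32_t> samples{0};

  void update(uint64_t cycles) {
    cycles_sum.fetch_add(cycles, std::memory_order_relaxed);
    samples.fetch_add(1, std::memory_order_relaxed);
    // CAS loop for the minimum: on failure `cur` is reloaded with the current
    // value, and the loop ends once ours is no longer smaller or the swap wins.
    uint64_t cur = min_cycles.load(std::memory_order_relaxed);
    while (cycles < cur &&
           !min_cycles.compare_exchange_weak(cur, cycles,
                                             std::memory_order_relaxed)) {
    }
  }
};
```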
2024-07-12 | [libc] Migrate to using LIBC_NAMESPACE_DECL for namespace declaration (#98597) | Petr Hosek | 10 | -20/+29
This is a part of #97655.
2024-07-12 | Revert "[libc] Migrate to using LIBC_NAMESPACE_DECL for namespace declaration" (#98593) | Mehdi Amini | 10 | -29/+20
Reverts llvm/llvm-project#98075

bots are broken
2024-07-11 | [libc] Migrate to using LIBC_NAMESPACE_DECL for namespace declaration (#98075) | Petr Hosek | 10 | -20/+29
This is a part of #97655.
2024-07-11 | [libc] Correctly Run Multiple Benchmarks in the Same File (#98467) | jameshu15869 | 2 | -5/+16
There was previously an issue where registering multiple benchmarks in the same file would only give the results for the last benchmark to run. This PR fixes the issue. @jhuber6
2024-07-10 | [libc] Add Timing Utils for AMDGPU (#96828) | jameshu15869 | 3 | -1/+120
PR for adding AMDGPU timing utils for benchmarking. I was not able to test this code since I do not have an AMD GPU, but I was able to successfully compile it using `-DRUNTIMES_amdgcn-amd-amdhsa_LIBC_GPU_TEST_ARCHITECTURE=gfx90a -DRUNTIMES_amdgcn-amd-amdhsa_LIBC_GPU_LOADER_EXECUTABLE=echo -DRUNTIMES_amdgcn_amd-amdhsa_LIBC_GPU_TARGET_ARCHITECTURE=gfx90a` to force the code to compile without having an AMD GPU on my machine.

@jhuber6
2024-07-06 | [libc] Fix Cppcheck Issues (#96999) | jameshu15869 | 2 | -15/+14
This PR fixes linting issues discovered by `cppcheck`. Fixes: https://github.com/llvm/llvm-project/issues/96863
2024-06-26 | [libc] NVPTX Profiling (#92009) | jameshu15869 | 15 | -0/+624
PR for adding microbenchmarking infrastructure for NVPTX. `nvlink` cannot perform LTO, so we cannot inline `libc` functions and this function call overhead is not adjusted for during microbenchmarking.
2024-02-29 | [libc] Revert https://github.com/llvm/llvm-project/pull/83199 since it broke Fuchsia. (#83374) | lntue | 1 | -1/+1
With a header fix-forward for GPU builds.
2024-02-27 | [libc] Add "include/" to the LLVM include directories (#83199) | Joseph Huber | 1 | -1/+2
Summary: Recent changes added an include path in the float128 type that used the internal `libc` path to find the macro. This doesn't work once it's installed because we need to search from the root of the install dir. This patch adds "include/" to the include path so that our inclusion of installed headers always match the internal use.
2024-02-23 | [libc][NFC] Remove all trailing spaces from libc (#82831) | Joseph Huber | 1 | -2/+2
Summary: There are a lot of random trailing spaces on various lines. This patch just got rid of all of them with `sed 's/\ \+$//g'`.
2023-10-26 | [libc] Add --sweep-min-size flag for benchmarks (#70302) | Dmitry Vyukov | 1 | -4/+10
We have --sweep-max-size, it's reasonable to have --sweep-min-size as well. It can be used when working on the logic for larger sizes, or to collect a profile for larger sizes only.
2023-09-26 | [libc] Mass replace enclosing namespace (#67032) | Guillaume Chatelet | 6 | -30/+30
This is step 4 of https://discourse.llvm.org/t/rfc-customizable-namespace-to-allow-testing-the-libc-when-the-system-libc-is-also-llvms-libc/73079
2023-09-20 | [reland][libc][cmake] Tidy compiler includes (#66783) (#66878) | Guillaume Chatelet | 1 | -0/+3
This is a reland of #66783 a35a3b75b219247eb9ff6784d1a0fe562f72d415 fixing the benchmark breakage.
2023-08-07 | [test][libc] Fix aligned_alloc argument | Vitaly Buka | 1 | -2/+3
Size must be a multiple of Alignment.

Reviewed By: gchatelet

Differential Revision: https://reviews.llvm.org/D157247
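A minimal sketch of the constraint described in this commit, assuming a power-of-two alignment: round the requested size up to the next multiple of the alignment before calling `std::aligned_alloc` (C++17, `<cstdlib>`). `round_up_to_alignment` and `alloc_aligned` are hypothetical helpers, not the test's actual code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Round size up to a multiple of alignment; alignment must be a power of two.
std::size_t round_up_to_alignment(std::size_t size, std::size_t alignment) {
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  return (size + alignment - 1) & ~(alignment - 1);
}

// Hypothetical wrapper: pass aligned_alloc a size that is a multiple of the
// alignment. The caller releases the memory with std::free.
void *alloc_aligned(std::size_t size, std::size_t alignment) {
  return std::aligned_alloc(alignment, round_up_to_alignment(size, alignment));
}
```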
2023-05-11 | [libc][benchmark] Do not force static linking | Guillaume Chatelet | 1 | -1/+0
Being able to link statically depends on other CMake options and choice of libc.
2023-05-11 | [libc] Allows cross compilation of membenchmarks | Guillaume Chatelet | 1 | -8/+17
This patch makes sure:

- we pass the correct compiler options when building Google benchmarks,
- we only import the C++ version of the memory functions.

The change in libc/cmake/modules/LLVMLibCTestRules.cmake is here to make sure CMake can generate the right command line in the presence of the CMAKE_CROSSCOMPILING_EMULATOR option.

Relevant documentation:
https://cmake.org/cmake/help/latest/variable/CMAKE_CROSSCOMPILING_EMULATOR.html
https://cmake.org/cmake/help/latest/command/add_custom_command.html#command:add_custom_command

"
If COMMAND specifies an executable target name (created by the `add_executable()` command), it will automatically be replaced by the location of the executable created at build time if either of the following is true:

- The target is not being cross-compiled (i.e. the CMAKE_CROSSCOMPILING variable is not set to true).
- New in version 3.6: The target is being cross-compiled and an emulator is provided (i.e. its CROSSCOMPILING_EMULATOR target property is set). In this case, the contents of CROSSCOMPILING_EMULATOR will be prepended to the command before the location of the target executable.
"

Reviewed By: gchatelet

Differential Revision: https://reviews.llvm.org/D150200
2023-02-10 | [NFC][TargetParser] Replace uses of llvm/Support/Host.h | Archibald Elliott | 1 | -1/+1
The forwarding header is left in place because of its use in `polly/lib/External/isl/interface/extract_interface.cc`, but I have added a GCC warning about the fact it is deprecated, because it is used in `isl` from where it is included by Polly.
2023-01-14 | [libc] Use std::optional instead of llvm::Optional (NFC) | Kazu Hirata | 3 | -9/+11
This is part of an effort to migrate from llvm::Optional to std::optional: https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716