path: root/libc/src/string/memory_utils
Age | Commit message | Author | Files | Lines
2025-12-04 | Reland Refactor WIDE_READ to allow finer control over high-performance function selection (#165613) (#170738) | Sterling-Augustine | 3 | -14/+21
[The previous commit had an incorrect default case when FIND_FIRST_CHARACTER_WIDE_READ_IMPL was not specified in config.json. This PR is identical to that one with one line fixed.]
As we implement more high-performance string-related functions, we have found a need for better control over their selection than the big-hammer LIBC_CONF_STRING_LENGTH_WIDE_READ. For example, I have a memchr implementation coming, and unless I implement it in every variant, a simple binary value doesn't work.
This PR gives finer-grained control over high-performance functions than the generic LIBC_CONF_UNSAFE_WIDE_READ option. For any function they like, the user can now select one of four implementations at build time:
1. element, which reads byte-by-byte (or wchar by wchar)
2. wide, which reads by unsigned long
3. generic, which uses standard Clang vector implementations, if available
4. arch, which uses an architecture-specific implementation
(Reading the code carefully, you may note that a user can actually specify any namespace they want, so we aren't technically limited to those four.)
We may also want to switch from command-line #defines, as is currently done, to something more like llvm-project/llvm/include/llvm/Config/llvm-config.h.cmake, and #including the resulting file, which would move quite a bit of complexity out of the command line. But that's a future problem.
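Below is a minimal sketch of build-time selection by namespace. It is not the code from the PR. The macro `EXAMPLE_STRLEN_IMPL` and the namespaces `element` and `wide` are hypothetical, and the sketch assumes the choice arrives as a command-line define such as `-DEXAMPLE_STRLEN_IMPL=wide`. Because the macro is pasted in front of `::string_length`, any namespace that defines that function would be accepted. That matches the note above about not being limited to four choices.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <cstdint>
#include <cstring>

namespace element {
// Reads one character at a time; never touches memory past the terminator.
inline size_t string_length(const char *s) {
  size_t n = 0;
  while (s[n] != '\0')
    ++n;
  return n;
}
} // namespace element

namespace wide {
// True if any byte of v is zero (classic word-at-a-time test).
inline bool has_zero_byte(unsigned long v) {
  constexpr unsigned long ones = ~0UL / 0xFF;             // 0x0101...01
  constexpr unsigned long highs = ones << (CHAR_BIT - 1); // 0x8080...80
  return ((v - ones) & ~v & highs) != 0;
}

// Reads one unsigned long at a time once aligned. The aligned loads may read
// bytes past the terminator, but never past the word that contains it.
inline size_t string_length(const char *s) {
  const char *p = s;
  // Byte-by-byte until p is word aligned.
  while (reinterpret_cast<uintptr_t>(p) % sizeof(unsigned long) != 0) {
    if (*p == '\0')
      return static_cast<size_t>(p - s);
    ++p;
  }
  // Word-at-a-time scan until a word contains a zero byte.
  unsigned long word;
  for (;; p += sizeof(unsigned long)) {
    std::memcpy(&word, p, sizeof(word));
    if (has_zero_byte(word))
      break;
  }
  // Locate the exact terminator inside that word.
  while (*p != '\0')
    ++p;
  return static_cast<size_t>(p - s);
}
} // namespace wide

// Hypothetical build-time switch, e.g. -DEXAMPLE_STRLEN_IMPL=wide.
#ifndef EXAMPLE_STRLEN_IMPL
#define EXAMPLE_STRLEN_IMPL element
#endif

inline size_t my_strlen(const char *s) {
  return EXAMPLE_STRLEN_IMPL::string_length(s);
}
```

With this approach, adding a new variant means adding one namespace. No entry in a fixed list of modes is needed.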
2025-12-04 | Revert "Refactor WIDE_READ to allow finer control over high-performance function selection" (#170717) | Sterling-Augustine | 3 | -21/+14
Reverts llvm/llvm-project#165613
Breaks build bot
2025-12-04 | Refactor WIDE_READ to allow finer control over high-performance function selection (#165613) | Sterling-Augustine | 3 | -14/+21
[This is more of a straw proposal than a ready-for-merging PR. I got started thinking about what this might look like, and ended up just implementing something as a proof of concept. Totally open to other methods and ideas.]
As we implement more high-performance string-related functions, we have found a need for better control over their selection than the big-hammer LIBC_CONF_STRING_LENGTH_WIDE_READ. For example, I have a memchr implementation coming, and unless I implement it in every variant, a simple binary value doesn't work.
This PR gives finer-grained control over high-performance functions than the generic LIBC_CONF_UNSAFE_WIDE_READ option. For any function they like, the user can now select one of four implementations at build time:
1. element, which reads byte-by-byte (or wchar by wchar)
2. wide, which reads by unsigned long
3. generic, which uses standard Clang vector implementations, if available
4. arch, which uses an architecture-specific implementation
(Reading the code carefully, you may note that a user can actually specify any namespace they want, so we aren't technically limited to those four.)
We may also want to switch from command-line #defines, as is currently done, to something more like llvm-project/llvm/include/llvm/Config/llvm-config.h.cmake, and #including the resulting file, which would move quite a bit of complexity out of the command line. But that's a future problem.
2025-11-10 | [libc] add an SVE implementation of strlen (#167259) | Schrodinger ZHU Yifan | 1 | -5/+58
This PR creates an SVE-based implementation of strlen by translating the AOR code in tree. A microbenchmark shows improvements over NEON when N >= 64. Although both implementations fall behind glibc by a large margin, this may be a good starting point for exploring SVE implementations.
Together with the PR:
1. Added two more strlen tests with special nul symbols.
2. Added a strlen fuzzer and fixed a typo in the previous heap fuzzer.
```
=== strlen(16 bytes) ===
libc: 1.56115 ns/call, 9.54499 GiB/s
neon: 1.59393 ns/call, 9.34867 GiB/s
sve: 1.66097 ns/call, 8.97134 GiB/s
=== strlen(64 bytes) ===
libc: 2.06967 ns/call, 28.7991 GiB/s
neon: 2.59914 ns/call, 22.9325 GiB/s
sve: 2.58628 ns/call, 23.0465 GiB/s
=== strlen(256 bytes) ===
libc: 3.74165 ns/call, 63.7202 GiB/s
neon: 8.98243 ns/call, 26.5428 GiB/s
sve: 7.36426 ns/call, 32.3751 GiB/s
=== strlen(1024 bytes) ===
libc: 10.5327 ns/call, 90.5438 GiB/s
neon: 34.363 ns/call, 27.7529 GiB/s
sve: 26.9329 ns/call, 35.4092 GiB/s
=== strlen(4096 bytes) ===
libc: 37.7304 ns/call, 101.104 GiB/s
neon: 145.911 ns/call, 26.144 GiB/s
sve: 103.208 ns/call, 36.9612 GiB/s
=== strlen(1048576 bytes) ===
libc: 9623.4 ns/call, 101.478 GiB/s
neon: 36138.2 ns/call, 27.023 GiB/s
sve: 26605.6 ns/call, 36.7051 GiB/s
```
2025-10-13 | Revert "[libc] Implement branchless head-tail comparison for bcmp" (#162859) | Guillaume Chatelet | 2 | -77/+41
Reverts llvm/llvm-project#107540
This PR demonstrated improvements on micro-benchmarks, but the gains did not seem to materialize in production. We are reverting this change for now to get more data. This PR might be reintegrated later once we're more confident in its effects.
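For reference, a head-tail comparison covers any size in `[N, 2N]` with two possibly overlapping N-byte comparisons. One is anchored at the start of the buffers and one at the end, and the two results are combined without a branch. The sketch below assumes N = 8 and 64-bit loads through `memcpy`. `load64` and `bcmp_8_to_16` are hypothetical names, and this is not the reverted code itself.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: loads 8 bytes without alignment or aliasing concerns.
inline uint64_t load64(const unsigned char *p) {
  uint64_t v;
  std::memcpy(&v, p, sizeof(v));
  return v;
}

// Compares `count` bytes, 8 <= count <= 16. Returns 0 if equal, nonzero
// otherwise (bcmp semantics: only equality matters, not ordering).
inline uint64_t bcmp_8_to_16(const void *a, const void *b, size_t count) {
  assert(count >= 8 && count <= 16);
  const auto *pa = static_cast<const unsigned char *>(a);
  const auto *pb = static_cast<const unsigned char *>(b);
  // Head: first 8 bytes. Tail: last 8 bytes. They overlap when count < 16,
  // so every byte in [0, count) is covered by at least one of them.
  const uint64_t head = load64(pa) ^ load64(pb);
  const uint64_t tail = load64(pa + count - 8) ^ load64(pb + count - 8);
  // Combine with OR instead of branching on the head result.
  return head | tail;
}
```

OR-ing the two differences keeps the instruction sequence identical for every size in the range. The cost is that both loads are always performed.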
2025-10-13 | [libc] Use UMAXV.4S to reduce bcmp result. | Peter Collingbourne | 1 | -12/+6
We can use UMAXV.4S to reduce the comparison result in a single instruction. This improves performance by roughly 4% on Apple M1:
```
Summary
  bin/libc.src.string.bcmp_benchmark3 --study-name="new bcmp" --sweep-mode --sweep-max-size=128 --output=/dev/null --num-trials=10 ran
    1.01 ± 0.02 times faster than bin/libc.src.string.bcmp_benchmark3 --study-name="new bcmp" --sweep-mode --sweep-max-size=128 --output=/dev/null --num-trials=10
    1.01 ± 0.03 times faster than bin/libc.src.string.bcmp_benchmark3 --study-name="new bcmp" --sweep-mode --sweep-max-size=128 --output=/dev/null --num-trials=10
    1.01 ± 0.03 times faster than bin/libc.src.string.bcmp_benchmark3 --study-name="new bcmp" --sweep-mode --sweep-max-size=128 --output=/dev/null --num-trials=10
    1.01 ± 0.02 times faster than bin/libc.src.string.bcmp_benchmark2 --study-name="new bcmp" --sweep-mode --sweep-max-size=128 --output=/dev/null --num-trials=10
    1.02 ± 0.03 times faster than bin/libc.src.string.bcmp_benchmark2 --study-name="new bcmp" --sweep-mode --sweep-max-size=128 --output=/dev/null --num-trials=10
    1.03 ± 0.03 times faster than bin/libc.src.string.bcmp_benchmark2 --study-name="new bcmp" --sweep-mode --sweep-max-size=128 --output=/dev/null --num-trials=10
    1.03 ± 0.03 times faster than bin/libc.src.string.bcmp_benchmark2 --study-name="new bcmp" --sweep-mode --sweep-max-size=128 --output=/dev/null --num-trials=10
    1.05 ± 0.02 times faster than bin/libc.src.string.bcmp_benchmark1 --study-name="new bcmp" --sweep-mode --sweep-max-size=128 --output=/dev/null --num-trials=10
    1.05 ± 0.02 times faster than bin/libc.src.string.bcmp_benchmark1 --study-name="new bcmp" --sweep-mode --sweep-max-size=128 --output=/dev/null --num-trials=10
    1.05 ± 0.03 times faster than bin/libc.src.string.bcmp_benchmark1 --study-name="new bcmp" --sweep-mode --sweep-max-size=128 --output=/dev/null --num-trials=10
    1.05 ± 0.02 times faster than bin/libc.src.string.bcmp_benchmark1 --study-name="new bcmp" --sweep-mode --sweep-max-size=128 --output=/dev/null --num-trials=10
```
(1 = original, 2 = a variant of this patch that uses UMAXV.16B, 3 = this patch)
Reviewers: michaelrj-google, gchatelet, overmighty, SchrodingerZhu
Pull Request: https://github.com/llvm/llvm-project/pull/99260
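This sketch shows the reduction step and assumes AArch64 with Advanced SIMD. The two 16-byte blocks are XORed. `vmaxvq_u32` (the ACLE intrinsic for `UMAXV` over four 32-bit lanes) then collapses the difference into one scalar that is nonzero exactly when the blocks differ. `blocks_differ_16` is a hypothetical name. The portable branch exists only so the sketch builds on other targets.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

#if defined(__aarch64__) && defined(__ARM_NEON)
#include <arm_neon.h>
#endif

// Returns nonzero iff the 16 bytes at a and b differ.
inline uint32_t blocks_differ_16(const unsigned char *a,
                                 const unsigned char *b) {
#if defined(__aarch64__) && defined(__ARM_NEON)
  const uint8x16_t diff = veorq_u8(vld1q_u8(a), vld1q_u8(b));
  // A single UMAXV.4S: the maximum of the four 32-bit lanes is zero only if
  // every lane, and therefore every byte, is zero.
  return vmaxvq_u32(vreinterpretq_u32_u8(diff));
#else
  // Portable stand-in with the same result: max over four 32-bit lanes.
  uint32_t result = 0;
  for (int lane = 0; lane < 4; ++lane) {
    uint32_t x, y;
    std::memcpy(&x, a + 4 * lane, 4);
    std::memcpy(&y, b + 4 * lane, 4);
    const uint32_t d = x ^ y;
    result = d > result ? d : result;
  }
  return result;
#endif
}
```

bcmp only needs a zero or nonzero answer, so the lane width does not affect correctness. The `.16B` and `.4S` variants in the benchmark differ only in speed.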
2025-10-01 | [libc] Unify and extend no_sanitize attributes for strlen. (#161316) | Alexey Samsonov | 3 | -5/+4
Fast strlen implementations (naive wide reads, SIMD-based, and x86_64/aarch64-optimized versions) may all perform technically out-of-bounds reads, which leads to reports under ASan, HWASan (on ARM machines), and also TSan (which can also detect heap out-of-bounds reads). So we need to explicitly disable instrumentation for all three sanitizers.
Tragically, Clang didn't support the `[[gnu::no_sanitize]]` syntax until recently, and since we support both GCC and Clang, we have to fall back to the `__attribute__` syntax.
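The sketch below applies the `__attribute__` spelling to a wide read. The macro `EXAMPLE_NO_SANITIZE_OOB_ACCESS` and the function `aligned_word_has_nul` are hypothetical. The sketch assumes that reasonably recent GCC and Clang releases accept `no_sanitize` with these sanitizer names, so check the minimum versions your build supports.

```cpp
#include <cassert>
#include <climits>
#include <cstdint>
#include <cstring>

// Hypothetical macro: disable ASan, HWASan and TSan instrumentation for one
// function, using the GNU attribute spelling accepted by GCC and Clang.
#if defined(__has_attribute)
#if __has_attribute(no_sanitize)
#define EXAMPLE_NO_SANITIZE_OOB_ACCESS \
  __attribute__((no_sanitize("address", "hwaddress", "thread")))
#endif
#endif
#ifndef EXAMPLE_NO_SANITIZE_OOB_ACCESS
#define EXAMPLE_NO_SANITIZE_OOB_ACCESS // attribute unavailable: expands to nothing
#endif

// Hypothetical wide-read helper: loads the whole aligned unsigned long that
// contains p and reports whether any byte in it is zero. Part of that word
// may lie outside the object p points into, which is the kind of read the
// sanitizers above would otherwise report.
EXAMPLE_NO_SANITIZE_OOB_ACCESS
inline bool aligned_word_has_nul(const char *p) {
  const uintptr_t mask = ~static_cast<uintptr_t>(sizeof(unsigned long) - 1);
  const char *aligned =
      reinterpret_cast<const char *>(reinterpret_cast<uintptr_t>(p) & mask);
  unsigned long word;
  std::memcpy(&word, aligned, sizeof(word));
  constexpr unsigned long ones = ~0UL / 0xFF;             // 0x0101...01
  constexpr unsigned long highs = ones << (CHAR_BIT - 1); // 0x8080...80
  return ((word - ones) & ~word & highs) != 0;
}
```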
2025-09-29 | [libc][msvc] fix mathlib build on WoA (#161258) | Schrodinger ZHU Yifan | 1 | -2/+13
Fix build errors encountered when building math library on WoA.
1. Skip FEnv equality check for MSVC
2. Provide a placeholder type for vector types.
2025-09-26 | [libc] Update the memory helper functions for simd types (#160174) | Joseph Huber | 1 | -2/+3
Summary: This unifies the interface to just be a bunch of `load` and `store` functions that optionally accept a mask / indices for gathers and scatters with masks. I had to rename this from `load` and `store` because it conflicts with the other version in `op_generic`. I might just work around that with a trait instead.
2025-09-16 | [libc] Clean up mask helpers after allowing implicit conversions (#158681) | Joseph Huber | 1 | -2/+2
Summary: I landed a change in clang that allows integral vectors to implicitly convert to boolean ones. This means I can simplify the interface and remove the need to cast to bool on every use. Also do some other cleanups of the traits.
2025-09-12 | [libc] Some MSVC compatibility changes for src/string/memory_utils. (#158393) | lntue | 4 | -0/+27
2025-09-02 | [libc] Add more elementwise wrapper functions (#156515) | Joseph Huber | 1 | -0/+1
Summary: Fills out some of the missing fundamental floating point operations. These just wrap the elementwise builtin of the same name.
2025-09-02 | [libc] Implement generic SIMD helper 'simd.h' and implement strlen (#152605) | Joseph Huber | 1 | -0/+53
Summary: This PR introduces a new 'simd.h' header that implements an interface similar to the proposed `stdx::simd` in C++. However, we instead wrap around the LLVM internal type. This makes heavy use of the Clang vector extensions and boolean vectors, using primitive vector types instead of a class (there are many benefits to this). I use this interface to implement a generic strlen, and propose that we also use it for math. Right now this requires a feature only introduced in clang-22.
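Here is a rough sketch of a vector-width strlen using the GCC/Clang `vector_size` extension. It does not reproduce the `simd.h` interface, and `byte16`, `zero_lanes`, and `vector_strlen` are hypothetical names. The sketch assumes 16-byte blocks and scans aligned blocks. As a result, the first load may include bytes before `s` and the last load may include bytes after the terminator, but each load stays within a single aligned block.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Sixteen 8-bit lanes via the GCC/Clang vector extension.
typedef unsigned char byte16 __attribute__((vector_size(16)));

// Returns a bitmask with bit i set when lane i of v is zero.
inline uint32_t zero_lanes(byte16 v) {
  const byte16 zero = {};
  auto eq = (v == zero); // lane-wise: all ones where equal, 0 elsewhere
  uint32_t bits = 0;
  for (int i = 0; i < 16; ++i)
    if (eq[i])
      bits |= 1u << i;
  return bits;
}

inline size_t vector_strlen(const char *s) {
  constexpr uintptr_t kWidth = sizeof(byte16);
  const uintptr_t addr = reinterpret_cast<uintptr_t>(s);
  const char *block = reinterpret_cast<const char *>(addr & ~(kWidth - 1));
  const unsigned skip = static_cast<unsigned>(addr & (kWidth - 1));
  byte16 v;
  std::memcpy(&v, block, kWidth);
  // Ignore zero bytes that precede s in the first aligned block.
  uint32_t bits = (zero_lanes(v) >> skip) << skip;
  while (bits == 0) {
    block += kWidth;
    std::memcpy(&v, block, kWidth);
    bits = zero_lanes(v);
  }
  // The lowest set bit marks the terminator within the current block.
  return static_cast<size_t>(block + __builtin_ctz(bits) - s);
}
```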
2025-08-21 | Fix wide read defaults | Joseph Huber | 1 | -1/+2
2025-08-21 | Reapply "[libc] Enable wide-read memory operations by default on Linux (#154602)" (#154640) | Joseph Huber | 2 | -15/+21
Reland after the sanitizer and arm32 builds complained.
2025-08-20 | Revert "[libc] Enable wide-read memory operations by default on Linux (#154602)" | Joseph Huber | 2 | -18/+14
This reverts commit c80d1483c6d787edf62ff9e86b1e97af5eb5abf9.
2025-08-20 | [libc] Enable wide-read memory operations by default on Linux (#154602) | Joseph Huber | 2 | -14/+18
Summary: This patch changes the Linux build to use wide reads in the memory operations by default. These memory functions will now potentially read outside of the bounds explicitly allowed by the current function. While this is technically undefined behavior in the standard, plenty of C library implementations do it. It will not cause a segmentation fault on Linux as long as the read does not cross a page boundary, and because we are only *reading* memory, it should not have atomic effects.
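The page-boundary argument can be made concrete. This sketch assumes 4 KiB pages and a power-of-two read width that divides the page size. `kPageSize`, `crosses_page`, and `align_down` are hypothetical names. An aligned load can never straddle a page, because every page start is itself aligned to the read width.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr uintptr_t kPageSize = 4096; // assumption: 4 KiB pages

// True if reading `width` bytes starting at `addr` touches two pages.
constexpr bool crosses_page(uintptr_t addr, size_t width) {
  return (addr % kPageSize) + width > kPageSize;
}

// Rounds addr down to a multiple of width (a power of two dividing the page
// size). A width-byte load from the result stays inside one page.
constexpr uintptr_t align_down(uintptr_t addr, size_t width) {
  return addr & ~(static_cast<uintptr_t>(width) - 1);
}

// Compile-time spot checks: an unaligned 8-byte read 3 bytes before a page
// end crosses into the next page; the aligned read covering it does not.
static_assert(crosses_page(2 * kPageSize - 3, 8));
static_assert(!crosses_page(align_down(2 * kPageSize - 3, 8), 8));
```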
2025-08-19 | Add vector-based strlen implementation for x86_64 and aarch64 (#152389) | Sterling-Augustine | 2 | -0/+153
These replace the default LIBC_CONF_STRING_UNSAFE_WIDE_READ implementation on x86_64 and aarch64. They are substantially faster than both the character-by-character implementation and the original unsafe_wide_read implementation. Some results are below. I have been unable to performance-test the aarch64 version, but I suspect speedups similar to avx2.
```
Function: strlen
Variant: char wide ull sse2 avx2 avx512
=============================================================================================================================================================
length=1, alignment=1: 13.18 20.47 (-55.24%) 20.21 (-53.27%) 32.50 (-146.54%) 26.05 (-97.61%) 18.03 (-36.74%)
length=1, alignment=0: 12.80 34.92 (-172.89%) 20.01 (-56.39%) 17.52 (-36.86%) 17.78 (-38.92%) 18.04 (-40.94%)
length=2, alignment=2: 9.91 19.02 (-91.95%) 12.64 (-27.52%) 11.06 (-11.59%) 9.48 ( 4.38%) 9.48 ( 4.34%)
length=2, alignment=0: 9.56 26.88 (-181.24%) 12.64 (-32.31%) 11.06 (-15.73%) 11.06 (-15.72%) 11.83 (-23.80%)
length=3, alignment=3: 8.31 10.45 (-25.84%) 8.28 ( 0.32%) 8.28 ( 0.36%) 6.21 ( 25.28%) 6.21 ( 25.24%)
length=3, alignment=0: 8.39 14.53 (-73.20%) 8.28 ( 1.33%) 7.24 ( 13.69%) 7.56 ( 9.94%) 7.25 ( 13.65%)
length=4, alignment=4: 9.84 21.76 (-121.24%) 15.55 (-58.11%) 6.57 ( 33.18%) 5.02 ( 48.98%) 6.00 ( 39.00%)
length=4, alignment=0: 8.64 13.70 (-58.51%) 7.28 ( 15.73%) 6.37 ( 26.31%) 6.36 ( 26.36%) 6.36 ( 26.36%)
length=5, alignment=5: 11.85 23.81 (-100.97%) 12.17 ( -2.67%) 5.68 ( 52.09%) 4.87 ( 58.94%) 6.48 ( 45.33%)
length=5, alignment=0: 11.82 13.64 (-15.42%) 7.27 ( 38.45%) 6.36 ( 46.15%) 6.37 ( 46.11%) 6.36 ( 46.14%)
length=6, alignment=6: 10.50 19.37 (-84.56%) 13.64 (-29.93%) 6.54 ( 37.71%) 6.89 ( 34.35%) 9.45 ( 10.01%)
length=6, alignment=0: 14.96 14.05 ( 6.04%) 6.49 ( 56.62%) 5.68 ( 62.04%) 5.68 ( 62.04%) 13.15 ( 12.05%)
length=7, alignment=7: 10.97 18.02 (-64.35%) 14.59 (-33.06%) 6.36 ( 41.96%) 5.46 ( 50.25%) 5.46 ( 50.25%)
length=7, alignment=0: 10.96 15.76 (-43.77%) 15.37 (-40.15%) 6.96 ( 36.51%) 5.68 ( 48.22%) 7.04 ( 35.83%)
length=4, alignment=0: 8.66 13.69 (-58.02%) 7.28 ( 16.00%) 6.37 ( 26.44%) 6.37 ( 26.52%) 6.61 ( 23.74%)
length=4, alignment=7: 8.87 17.35 (-95.73%) 12.18 (-37.39%) 5.68 ( 35.94%) 4.87 ( 45.11%) 6.00 ( 32.36%)
length=4, alignment=2: 8.67 10.05 (-15.91%) 7.28 ( 16.01%) 7.37 ( 15.02%) 5.46 ( 37.02%) 5.47 ( 36.89%)
length=2, alignment=2: 5.64 10.01 (-77.64%) 7.29 (-29.34%) 6.37 (-13.04%) 5.46 ( 3.19%) 5.46 ( 3.19%)
length=8, alignment=0: 12.78 16.52 (-29.33%) 18.27 (-43.00%) 11.82 ( 7.47%) 9.83 ( 23.03%) 11.46 ( 10.27%)
length=8, alignment=7: 14.24 17.30 (-21.49%) 12.16 ( 14.59%) 5.68 ( 60.14%) 4.87 ( 65.83%) 6.23 ( 56.28%)
length=8, alignment=3: 12.34 26.15 (-111.98%) 12.20 ( 1.14%) 6.50 ( 47.34%) 4.87 ( 60.54%) 6.18 ( 49.94%)
length=5, alignment=3: 10.95 19.74 (-80.30%) 12.17 (-11.11%) 5.68 ( 48.16%) 4.87 ( 55.56%) 5.96 ( 45.55%)
length=16, alignment=0: 20.33 29.29 (-44.08%) 36.18 (-77.97%) 5.68 ( 72.06%) 5.68 ( 72.08%) 10.60 ( 47.86%)
length=16, alignment=7: 19.29 17.52 ( 9.16%) 12.98 ( 32.73%) 7.05 ( 63.47%) 4.87 ( 74.75%) 6.23 ( 67.71%)
length=16, alignment=4: 20.54 25.18 (-22.56%) 15.42 ( 24.92%) 7.31 ( 64.43%) 4.87 ( 76.29%) 5.98 ( 70.88%)
length=10, alignment=4: 14.59 21.26 (-45.71%) 12.17 ( 16.58%) 5.68 ( 61.07%) 4.87 ( 66.65%) 6.00 ( 58.91%)
length=32, alignment=0: 35.46 22.00 ( 37.95%) 16.22 ( 54.26%) 7.32 ( 79.35%) 5.68 ( 83.98%) 7.01 ( 80.22%)
length=32, alignment=7: 35.23 24.14 ( 31.48%) 16.22 ( 53.96%) 7.30 ( 79.28%) 8.76 ( 75.12%) 6.14 ( 82.58%)
length=32, alignment=5: 35.16 28.56 ( 18.76%) 16.22 ( 53.87%) 7.30 ( 79.23%) 6.77 ( 80.75%) 9.82 ( 72.07%)
length=21, alignment=5: 26.47 27.66 ( -4.49%) 15.04 ( 43.17%) 6.90 ( 73.95%) 4.87 ( 81.60%) 6.04 ( 77.18%)
length=64, alignment=0: 66.45 25.16 ( 62.14%) 22.70 ( 65.83%) 12.99 ( 80.44%) 7.47 ( 88.77%) 8.70 ( 86.90%)
length=64, alignment=7: 64.75 27.78 ( 57.10%) 22.72 ( 64.91%) 10.85 ( 83.25%) 7.46 ( 88.48%) 8.68 ( 86.60%)
length=64, alignment=6: 67.26 28.58 ( 57.51%) 22.70 ( 66.24%) 11.26 ( 83.25%) 9.46 ( 85.94%) 13.90 ( 79.33%)
length=42, alignment=6: 73.42 27.97 ( 61.91%) 19.46 ( 73.49%) 8.92 ( 87.84%) 6.49 ( 91.16%) 6.00 ( 91.83%)
length=128, alignment=0: 172.07 39.18 ( 77.23%) 35.68 ( 79.26%) 13.02 ( 92.43%) 12.98 ( 92.46%) 9.76 ( 94.33%)
length=128, alignment=7: 163.98 43.79 ( 73.30%) 36.03 ( 78.03%) 15.68 ( 90.44%) 11.35 ( 93.08%) 10.51 ( 93.59%)
length=128, alignment=7: 185.86 40.27 ( 78.33%) 36.04 ( 80.61%) 13.78 ( 92.58%) 11.35 ( 93.89%) 10.49 ( 94.36%)
length=85, alignment=7: 121.61 55.66 ( 54.23%) 32.34 ( 73.40%) 13.88 ( 88.59%) 7.30 ( 94.00%) 8.72 ( 92.83%)
length=256, alignment=0: 295.54 66.48 ( 77.50%) 61.63 ( 79.15%) 19.54 ( 93.39%) 12.97 ( 95.61%) 12.45 ( 95.79%)
length=256, alignment=7: 308.06 78.92 ( 74.38%) 61.63 ( 80.00%) 22.90 ( 92.57%) 12.97 ( 95.79%) 13.23 ( 95.71%)
length=256, alignment=8: 295.32 65.83 ( 77.71%) 61.62 ( 79.13%) 23.19 ( 92.15%) 12.97 ( 95.61%) 13.50 ( 95.43%)
length=170, alignment=8: 234.39 48.79 ( 79.18%) 43.79 ( 81.32%) 16.22 ( 93.08%) 13.97 ( 94.04%) 10.48 ( 95.53%)
length=512, alignment=0: 563.75 116.89 ( 79.27%) 114.99 ( 79.60%) 62.71 ( 88.88%) 19.58 ( 96.53%) 17.76 ( 96.85%)
length=512, alignment=7: 580.53 120.91 ( 79.17%) 114.47 ( 80.28%) 37.75 ( 93.50%) 19.55 ( 96.63%) 18.68 ( 96.78%)
length=512, alignment=9: 584.05 128.35 ( 78.02%) 114.74 ( 80.35%) 39.09 ( 93.31%) 19.76 ( 96.62%) 18.71 ( 96.80%)
length=341, alignment=9: 405.84 90.87 ( 77.61%) 78.79 ( 80.59%) 28.77 ( 92.91%) 14.60 ( 96.40%) 14.15 ( 96.51%)
length=1024, alignment=0: 1143.61 247.03 ( 78.40%) 243.70 ( 78.69%) 75.59 ( 93.39%) 67.02 ( 94.14%) 28.99 ( 97.46%)
length=1024, alignment=7: 1124.55 267.87 ( 76.18%) 259.16 ( 76.95%) 64.96 ( 94.22%) 33.05 ( 97.06%) 30.91 ( 97.25%)
length=1024, alignment=10: 1459.58 257.79 ( 82.34%) 239.91 ( 83.56%) 65.00 ( 95.55%) 33.10 ( 97.73%) 30.33 ( 97.92%)
length=682, alignment=10: 732.89 163.67 ( 77.67%) 170.54 ( 76.73%) 46.48 ( 93.66%) 24.32 ( 96.68%) 21.44 ( 97.07%)
length=2048, alignment=0: 2141.96 451.61 ( 78.92%) 448.00 ( 79.08%) 133.24 ( 93.78%) 61.22 ( 97.14%) 80.08 ( 96.26%)
length=2048, alignment=7: 2145.05 458.26 ( 78.64%) 449.99 ( 79.02%) 140.19 ( 93.46%) 60.26 ( 97.19%) 51.71 ( 97.59%)
length=2048, alignment=11: 2162.61 463.37 ( 78.57%) 448.07 ( 79.28%) 140.29 ( 93.51%) 59.51 ( 97.25%) 51.59 ( 97.61%)
length=1365, alignment=11: 1439.74 322.86 ( 77.58%) 310.84 ( 78.41%) 116.08 ( 91.94%) 42.43 ( 97.05%) 36.15 ( 97.49%)
length=4096, alignment=0: 4278.68 871.60 ( 79.63%) 865.25 ( 79.78%) 252.50 ( 94.10%) 161.17 ( 96.23%) 94.97 ( 97.78%)
length=4096, alignment=7: 4253.01 871.62 ( 79.51%) 864.21 ( 79.68%) 243.90 ( 94.27%) 171.17 ( 95.98%) 95.14 ( 97.76%)
length=4096, alignment=12: 4252.18 879.66 ( 79.31%) 863.68 ( 79.69%) 244.26 ( 94.26%) 185.36 ( 95.64%) 93.61 ( 97.80%)
length=2730, alignment=12: 2868.22 597.65 ( 79.16%) 586.22 ( 79.56%) 175.09 ( 93.90%) 120.35 ( 95.80%) 101.35 ( 96.47%)
length=0, alignment=0: 4.87 8.11 (-66.73%) 6.49 (-33.34%) 5.80 (-19.26%) 5.68 (-16.67%) 6.86 (-40.91%)
length=32, alignment=0: 33.82 22.36 ( 33.89%) 17.03 ( 49.66%) 7.30 ( 78.42%) 5.68 ( 83.22%) 7.50 ( 77.83%)
length=64, alignment=0: 66.20 26.76 ( 59.58%) 23.22 ( 64.93%) 12.99 ( 80.37%) 7.34 ( 88.92%) 8.44 ( 87.25%)
length=96, alignment=0: 130.26 31.62 ( 75.72%) 30.00 ( 76.97%) 11.39 ( 91.26%) 10.54 ( 91.91%) 8.68 ( 93.34%)
length=128, alignment=0: 164.66 39.05 ( 76.29%) 35.68 ( 78.33%) 13.07 ( 92.07%) 12.97 ( 92.12%) 9.59 ( 94.18%)
length=160, alignment=0: 196.63 45.18 ( 77.02%) 42.16 ( 78.56%) 14.65 ( 92.55%) 10.87 ( 94.47%) 9.31 ( 95.27%)
length=192, alignment=0: 225.50 52.71 ( 76.63%) 49.61 ( 78.00%) 16.22 ( 92.81%) 11.36 ( 94.96%) 11.08 ( 95.09%)
length=224, alignment=0: 261.08 57.57 ( 77.95%) 55.82 ( 78.62%) 17.84 ( 93.17%) 12.16 ( 95.34%) 11.51 ( 95.59%)
length=256, alignment=0: 295.13 65.56 ( 77.79%) 62.59 ( 78.79%) 19.46 ( 93.41%) 13.12 ( 95.56%) 12.33 ( 95.82%)
length=288, alignment=0: 325.69 72.16 ( 77.84%) 69.20 ( 78.75%) 21.08 ( 93.53%) 13.94 ( 95.72%) 12.32 ( 96.22%)
length=320, alignment=0: 364.18 78.78 ( 78.37%) 75.69 ( 79.21%) 22.71 ( 93.77%) 14.70 ( 95.96%) 14.46 ( 96.03%)
length=352, alignment=0: 391.40 84.87 ( 78.32%) 82.15 ( 79.01%) 24.50 ( 93.74%) 15.62 ( 96.01%) 14.27 ( 96.35%)
length=384, alignment=0: 428.50 91.43 ( 78.66%) 88.70 ( 79.30%) 26.16 ( 93.90%) 17.29 ( 95.97%) 15.04 ( 96.49%)
length=416, alignment=0: 457.30 98.23 ( 78.52%) 95.02 ( 79.22%) 27.81 ( 93.92%) 17.22 ( 96.23%) 15.05 ( 96.71%)
length=448, alignment=0: 488.38 104.52 ( 78.60%) 101.87 ( 79.14%) 31.22 ( 93.61%) 18.07 ( 96.30%) 16.89 ( 96.54%)
length=480, alignment=0: 526.44 109.61 ( 79.18%) 108.11 ( 79.46%) 31.11 ( 94.09%) 18.88 ( 96.41%) 17.10 ( 96.75%)
length=512, alignment=0: 556.50 117.29 ( 78.92%) 113.78 ( 79.56%) 62.57 ( 88.76%) 19.88 ( 96.43%) 17.80 ( 96.80%)
length=576, alignment=0: 622.17 152.93 ( 75.42%) 127.58 ( 79.49%) 39.34 ( 93.68%) 21.31 ( 96.58%) 19.99 ( 96.79%)
length=640, alignment=0: 691.01 142.56 ( 79.37%) 161.78 ( 76.59%) 39.20 ( 94.33%) 22.98 ( 96.67%) 20.13 ( 97.09%)
length=704, alignment=0: 756.90 156.31 ( 79.35%) 176.19 ( 76.72%) 45.03 ( 94.05%) 24.82 ( 96.72%) 22.33 ( 97.05%)
length=768, alignment=0: 826.23 193.17 ( 76.62%) 188.41 ( 77.20%) 50.81 ( 93.85%) 27.46 ( 96.68%) 23.25 ( 97.19%)
length=832, alignment=0: 890.17 204.81 ( 76.99%) 201.61 ( 77.35%) 53.77 ( 93.96%) 27.73 ( 96.88%) 25.06 ( 97.18%)
length=896, alignment=0: 959.52 217.89 ( 77.29%) 213.86 ( 77.71%) 57.99 ( 93.96%) 29.53 ( 96.92%) 26.29 ( 97.26%)
length=960, alignment=0: 1024.52 231.06 ( 77.45%) 227.05 ( 77.84%) 60.36 ( 94.11%) 32.29 ( 96.85%) 27.94 ( 97.27%)
length=1024, alignment=0: 1086.71 244.17 ( 77.53%) 239.87 ( 77.93%) 64.72 ( 94.04%) 72.38 ( 93.34%) 28.72 ( 97.36%)
length=1152, alignment=0: 1231.48 270.22 ( 78.06%) 266.47 ( 78.36%) 73.38 ( 94.04%) 40.24 ( 96.73%) 32.42 ( 97.37%)
length=1280, alignment=0: 1349.29 295.45 ( 78.10%) 292.69 ( 78.31%) 111.80 ( 91.71%) 42.44 ( 96.85%) 34.59 ( 97.44%)
length=1408, alignment=0: 1487.13 322.57 ( 78.31%) 318.18 ( 78.60%) 84.47 ( 94.32%) 44.35 ( 97.02%) 37.31 ( 97.49%)
length=1536, alignment=0: 1623.52 347.98 ( 78.57%) 344.24 ( 78.80%) 108.31 ( 93.33%) 49.82 ( 96.93%) 39.94 ( 97.54%)
length=1664, alignment=0: 1748.88 373.80 ( 78.63%) 370.03 ( 78.84%) 118.76 ( 93.21%) 52.89 ( 96.98%) 42.93 ( 97.55%)
length=1792, alignment=0: 1886.22 399.59 ( 78.82%) 397.39 ( 78.93%) 127.32 ( 93.25%) 53.64 ( 97.16%) 45.39 ( 97.59%)
length=1920, alignment=0: 2018.37 425.98 ( 78.89%) 422.31 ( 79.08%) 126.70 ( 93.72%) 57.08 ( 97.17%) 48.12 ( 97.62%)
length=2048, alignment=0: 2167.09 451.70 ( 79.16%) 447.70 ( 79.34%) 141.68 ( 93.46%) 61.63 ( 97.16%) 79.06 ( 96.35%)
length=2304, alignment=0: 2422.03 503.63 ( 79.21%) 502.23 ( 79.26%) 149.62 ( 93.82%) 73.10 ( 96.98%) 56.97 ( 97.65%)
length=2560, alignment=0: 2678.68 556.84 ( 79.21%) 553.24 ( 79.35%) 161.06 ( 93.99%) 127.74 ( 95.23%) 58.81 ( 97.80%)
length=2816, alignment=0: 2941.95 608.70 ( 79.31%) 604.03 ( 79.47%) 171.85 ( 94.16%) 87.11 ( 97.04%) 67.08 ( 97.72%)
length=3072, alignment=0: 3229.89 660.14 ( 79.56%) 659.19 ( 79.59%) 183.85 ( 94.31%) 140.25 ( 95.66%) 73.01 ( 97.74%)
length=3328, alignment=0: 3496.08 713.05 ( 79.60%) 710.00 ( 79.69%) 209.72 ( 94.00%) 138.78 ( 96.03%) 77.81 ( 97.77%)
length=3584, alignment=0: 3756.52 766.19 ( 79.60%) 763.94 ( 79.66%) 214.16 ( 94.30%) 146.36 ( 96.10%) 83.43 ( 97.78%)
length=3840, alignment=0: 4017.15 817.43 ( 79.65%) 819.77 ( 79.59%) 242.07 ( 93.97%) 164.56 ( 95.90%) 89.72 ( 97.77%)
length=4096, alignment=0: 4281.59 867.87 ( 79.73%) 864.71 ( 79.80%) 243.33 ( 94.32%) 173.11 ( 95.96%) 95.65 ( 97.77%)
length=4608, alignment=0: 4810.30 977.80 ( 79.67%) 985.03 ( 79.52%) 271.13 ( 94.36%) 190.62 ( 96.04%) 107.82 ( 97.76%)
length=5120, alignment=0: 5380.16 1075.77 ( 80.00%) 1071.80 ( 80.08%) 294.27 ( 94.53%) 206.04 ( 96.17%) 141.90 ( 97.36%)
length=5632, alignment=0: 5925.70 1195.61 ( 79.82%) 1193.68 ( 79.86%) 323.42 ( 94.54%) 223.55 ( 96.23%) 125.28 ( 97.89%)
length=6144, alignment=0: 6402.20 1285.52 ( 79.92%) 1281.04 ( 79.99%) 342.68 ( 94.65%) 234.84 ( 96.33%) 167.01 ( 97.39%)
length=6656, alignment=0: 6997.01 1387.32 ( 80.17%) 1384.21 ( 80.22%) 365.93 ( 94.77%) 269.89 ( 96.14%) 176.40 ( 97.48%)
length=7168, alignment=0: 7454.76 1492.10 ( 79.98%) 1488.45 ( 80.03%) 391.92 ( 94.74%) 280.81 ( 96.23%) 187.73 ( 97.48%)
length=7680, alignment=0: 8163.34 1608.43 ( 80.30%) 1615.98 ( 80.20%) 460.03 ( 94.36%) 299.86 ( 96.33%) 201.40 ( 97.53%)
```
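On x86_64, the core of a vector strlen is a byte-wise compare against zero followed by a movemask. Below is a minimal SSE2 sketch, not the committed code. It assumes 16-byte aligned loads, so a block never crosses a page. `sse2_strlen` is a hypothetical name, and the byte loop in the `#else` branch is there only so the sketch builds on non-SSE2 targets.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

inline size_t sse2_strlen(const char *s) {
#if defined(__SSE2__)
  const uintptr_t addr = reinterpret_cast<uintptr_t>(s);
  const char *block = reinterpret_cast<const char *>(addr & ~uintptr_t{15});
  const unsigned skip = static_cast<unsigned>(addr & 15);
  const __m128i zero = _mm_setzero_si128();
  // First aligned block: bit i of the movemask is set when byte i is zero;
  // drop the bits for bytes that precede s.
  __m128i v = _mm_load_si128(reinterpret_cast<const __m128i *>(block));
  unsigned bits =
      (static_cast<unsigned>(_mm_movemask_epi8(_mm_cmpeq_epi8(v, zero))) >>
       skip) << skip;
  while (bits == 0) {
    block += 16;
    v = _mm_load_si128(reinterpret_cast<const __m128i *>(block));
    bits = static_cast<unsigned>(_mm_movemask_epi8(_mm_cmpeq_epi8(v, zero)));
  }
  // Lowest set bit = first zero byte in the current block.
  return static_cast<size_t>(block + __builtin_ctz(bits) - s);
#else
  size_t n = 0; // non-SSE2 fallback so the sketch still builds
  while (s[n] != '\0')
    ++n;
  return n;
#endif
}
```

AVX2 and AVX-512 versions follow the same shape with 32- and 64-byte blocks. That matches the growing advantage for long strings in the table above.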
2025-07-23 | [libc][NFC] Add stdint.h proxy header to fix dependency issue with <stdint.h> includes. (#150303) | lntue | 4 | -4/+4
https://github.com/llvm/llvm-project/issues/149993
2025-07-17 | [libc] Improve Cortex `memset` and `memcpy` functions (#149044) | Guillaume Chatelet | 5 | -99/+313
The code for `memcpy` is the same as in #148204, but it fixes the build bot error by using `static_assert(cpp::always_false<decltype(access)>)` instead of `static_assert(false)` (older compilers fail on `static_assert(false)` in `constexpr` `else` bodies).
The code for `memset` is new and vastly improves performance over the current byte-per-byte implementation.
Both `memset` and `memcpy` implementations use prefetching for sizes >= 64. This slightly lowers performance for sizes between 64 and 256 but improves throughput for larger sizes.
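The `static_assert` workaround is a general C++ pattern. Before the C++23 change that relaxed the rule, a compiler could diagnose a non-dependent `static_assert(false)` in a template even when its `if constexpr` branch was discarded. The fix is to make the condition depend on a template parameter. In this sketch, a hypothetical `always_false` stands in for `cpp::always_false`, and the tag types and `store_u32` are also hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>

struct AlignedAccess {};
struct UnalignedAccess {};

// Hypothetical stand-in for cpp::always_false: false, but dependent on T,
// so it is only evaluated when a branch using it is instantiated.
template <typename T> inline constexpr bool always_false = false;

template <typename AccessTag>
inline void store_u32(unsigned char *dst, uint32_t value, AccessTag access) {
  if constexpr (std::is_same_v<decltype(access), AlignedAccess>) {
    assert(reinterpret_cast<uintptr_t>(dst) % alignof(uint32_t) == 0);
    std::memcpy(dst, &value, sizeof(value)); // single word-sized store
  } else if constexpr (std::is_same_v<decltype(access), UnalignedAccess>) {
    const auto *src = reinterpret_cast<const unsigned char *>(&value);
    for (size_t i = 0; i < sizeof(value); ++i) // byte stores, any address
      dst[i] = src[i];
  } else {
    // Plain static_assert(false) here could break older compilers even
    // when this branch is discarded.
    static_assert(always_false<decltype(access)>, "unsupported access tag");
  }
}
```

Instantiating `store_u32` with any other tag type fails at compile time with the message above.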
2025-07-16 | Revert "[libc][NFC] refactor Cortex `memcpy` code" (#149035) | Guillaume Chatelet | 3 | -150/+98
Reverts llvm/llvm-project#148204
`libc-arm32-qemu-debian-dbg` is failing, reverting and investigating
2025-07-16 | [libc][NFC] refactor Cortex `memcpy` code (#148204) | Guillaume Chatelet | 3 | -98/+150
This patch is in preparation for the Cortex `memset` implementation. It improves the codegen by generating a prefetch for large sizes.
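Here is a sketch of prefetching in a fill loop. It assumes 64-byte cache lines. `fill_with_prefetch` is hypothetical and is not the Cortex code. It uses the GCC/Clang builtin `__builtin_prefetch(addr, rw)` with `rw = 1` to hint an upcoming write, and it only prefetches a line that lies inside the buffer.

```cpp
#include <cassert>
#include <cstddef>

constexpr size_t kLine = 64; // assumption: 64-byte cache lines

// Hypothetical fill routine. While at least two lines remain, it prefetches
// the next line for writing before filling the current one.
inline void *fill_with_prefetch(void *dst, int value, size_t count) {
  auto *p = static_cast<unsigned char *>(dst);
  const auto byte = static_cast<unsigned char>(value);
  while (count >= kLine) {
    if (count >= 2 * kLine)
      __builtin_prefetch(p + kLine, /*rw=*/1); // next line is in bounds
    for (size_t i = 0; i < kLine; ++i)
      p[i] = byte;
    p += kLine;
    count -= kLine;
  }
  for (size_t i = 0; i < count; ++i) // tail shorter than one line
    p[i] = byte;
  return dst;
}
```

The prefetch only pays off once there is enough data ahead to hide memory latency. That fits the trade-off the Cortex entry above reports for sizes between 64 and 256.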
2025-06-26 | [libc] Improve memcpy for ARM Cortex-M supporting unaligned accesses. (#144872) | Guillaume Chatelet | 4 | -1/+222
This implementation has been compiled with the [pigweed toolchain](https://pigweed.dev/toolchain.html) and tested on:
- Raspberry Pi Pico 2 with the following options: `--target=armv8m.main-none-eabi` `-march=armv8m.main+fp+dsp` `-mcpu=cortex-m33`
- Raspberry Pi Pico with the following options: `--target=armv6m-none-eabi` `-march=armv6m` `-mcpu=cortex-m0+`
They both compile down to a little more than 200 bytes and are between 2 and 10 times faster than byte-per-byte copies.
For best performance, the following options can be set in `libc/config/baremetal/arm/config.json`:
```
{
  "codegen": {
    "LIBC_CONF_KEEP_FRAME_POINTER": {
      "value": false
    }
  },
  "general": {
    "LIBC_ADD_NULL_CHECKS": {
      "value": false
    }
  }
}
```
2025-05-02 | [libc] Add support for string/memory_utils functions for AArch64 without HW FP/SIMD (#137592) | William | 8 | -45/+148
Add conditional compilation to support AArch64 without vector registers and/or hardware FPUs by using the generic implementation.
**Context:** A few functions were hard-coded to use vector registers/hardware FPUs. This meant that libc would not compile on architectures that did not support these features. This fix falls back on the generic implementation if a feature is not supported.
2025-03-14 | [libc] Fix memmove macros for unrecognized targets | Joseph Huber | 1 | -2/+2
2025-03-14 | [libc] Default to `byte_per_byte` instead of erroring (#131340) | Joseph Huber | 5 | -20/+10
Summary: Right now a lot of the memory functions error if we don't have specific handling for them. This is weird because we have a generic implementation that should just be used whenever someone hasn't written a more optimized version. This allows us to use the `libc` headers with more architectures from the `shared/` directory without worrying about it breaking.
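The fallback policy from the AArch64 FP/SIMD and `byte_per_byte` entries above can be sketched with predefined target macros. A target-specific version is picked when one exists; otherwise the generic one is used rather than stopping with `#error`. This is a minimal sketch. `copy_impl`, `copy_byte_per_byte`, and `kSelectedCopyImpl` are hypothetical, and `__ARM_NEON` is the ACLE macro defined when Advanced SIMD is available.

```cpp
#include <cassert>
#include <cstddef>

// Generic fallback: correct on every target, no special hardware needed.
inline void copy_byte_per_byte(char *dst, const char *src, size_t count) {
  for (size_t i = 0; i < count; ++i)
    dst[i] = src[i];
}

#if defined(__x86_64__)
// Placeholder: an x86_64-specific version would go here.
inline void copy_impl(char *dst, const char *src, size_t count) {
  copy_byte_per_byte(dst, src, count);
}
constexpr const char *kSelectedCopyImpl = "x86_64";
#elif defined(__aarch64__) && defined(__ARM_NEON)
// Placeholder: a version using vector registers would go here.
inline void copy_impl(char *dst, const char *src, size_t count) {
  copy_byte_per_byte(dst, src, count);
}
constexpr const char *kSelectedCopyImpl = "aarch64 (Advanced SIMD)";
#else
// Unrecognized target, or AArch64 without FP/SIMD: use the generic
// version instead of failing the build with #error.
inline void copy_impl(char *dst, const char *src, size_t count) {
  copy_byte_per_byte(dst, src, count);
}
constexpr const char *kSelectedCopyImpl = "byte_per_byte";
#endif
```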
2025-03-10 | [libc] Add `-Wno-sign-conversion` & re-attempt `-Wconversion` (#129811) | Vinay Deshmukh | 1 | -3/+4
Relates to https://github.com/llvm/llvm-project/issues/119281#issuecomment-2699470459
2025-03-05 | Revert "[libc] Enable -Wconversion for tests. (#127523)" | Augie Fackler | 1 | -2/+2
This reverts commit 1e6e845d49a336e9da7ca6c576ec45c0b419b5f6 because it changed the 1st parameter of adjust() to be unsigned, but libc itself calls adjust() with a negative argument in align_backward() in op_generic.h.
2025-03-04 | [libc] Fix casts for arm32 after Wconversion (#129771) | Michael Jones | 1 | -1/+2
Followup to #127523 There were some test failures on arm32 after enabling Wconversion. There were some tests that were failing due to missing casts. Also I changed BigInt's `safe_get_at` back to being signed since it needed the ability to be negative.
2025-03-04 | [libc] Enable -Wconversion for tests. (#127523) | Vinay Deshmukh | 1 | -2/+2
Relates to: #119281
2025-02-05 | [libc] Fix all imports of src/string/memory_utils (#114939) | Krishna Pandey | 22 | -24/+34
Fixed imports for all files *within* `libc/src/string/memory_utils`. Note: This doesn't include **all** files that need to be fixed. Fixes #86579
2024-11-25 | [libc] suppress string warning in case intrinsics are defined as macros (#117640) | Schrodinger ZHU Yifan | 1 | -0/+3
2024-11-13 | [libc] Rename libc/src/__support/endian.h to endian_internal.h (#115950) | Daniel Thornburgh | 2 | -2/+2
This prevents a conflict with the Linux system endian.h when built in overlay mode for CPP files in __support. This issue appeared in PR #106259.
2024-10-22 | [libc][x86] copy one cache line at a time to prevent the use of `rep;movsb` (#113161) | Guillaume Chatelet | 1 | -8/+9
When using `-mprefer-vector-width=128` with `-march=sandybridge`, copying 3 cache lines in one go (192B) gets converted into `rep;movsb`, which translates into a 60% hit in performance.
Consecutive calls to `__builtin_memcpy_inline` (the implementation behind `builtin::Memcpy::block_offset`) are not coalesced by the compiler, so calling it three times in a row generates the desired assembly. It only differs in the interleaving of the loads and stores and does not affect performance.
This is needed to reland https://github.com/llvm/llvm-project/pull/108939.
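This sketch shows the workaround: three fixed-size 64-byte copies instead of one 192-byte copy. It assumes Clang's `__builtin_memcpy_inline`, which requires a constant size. When `__has_builtin` does not report that builtin, the sketch falls back to `__builtin_memcpy`, which carries no guarantee of being inlined. `copy_block` and `copy_three_cache_lines` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>

#ifdef __has_builtin
#if __has_builtin(__builtin_memcpy_inline)
#define EXAMPLE_HAVE_MEMCPY_INLINE 1
#endif
#endif

constexpr size_t kCacheLine = 64; // assumption: 64-byte cache lines

// Fixed-size copy; Size is a compile-time constant as the builtin requires.
template <size_t Size>
inline void copy_block(char *__restrict dst, const char *__restrict src) {
#ifdef EXAMPLE_HAVE_MEMCPY_INLINE
  __builtin_memcpy_inline(dst, src, Size);
#else
  __builtin_memcpy(dst, src, Size);
#endif
}

// Copies 192 bytes as three separate 64-byte blocks rather than one
// 192-byte block, mirroring the split described in the commit above.
inline void copy_three_cache_lines(char *__restrict dst,
                                   const char *__restrict src) {
  copy_block<kCacheLine>(dst, src);
  copy_block<kCacheLine>(dst + kCacheLine, src + kCacheLine);
  copy_block<kCacheLine>(dst + 2 * kCacheLine, src + 2 * kCacheLine);
}
```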
2024-10-06 | [libc] Clean up some include in `libc`. (#110980) | c8ef | 11 | -11/+10
The patch primarily cleans up some incorrect includes. The `LIBC_INLINE` macro is defined in `attributes.h`, not `config.h`. There appears to be no need to change the CMake and Bazel build files.
2024-09-06 | [libc] Implement branchless head-tail comparison for bcmp (#107540) | Vitaly Goldshteyn | 2 | -41/+77
Binary size changes:
| Bytes (cache lines) | before | after |
|---------------------|----------|---------|
| sse4 | 419 (7) | 288 (5) |
| avx | 430 (7) | 308 (5) |
| avx512f | 589 (10) | 390 (7) |
Benchmarks for different CPUs using https://github.com/google/fleetbench.
- indus-cascadelake
```
name old speed new speed delta
BM_LIBC_Bcmp_Fleet_L1 1.96GB/s ± 1% 2.19GB/s ± 0% +11.49% (p=0.000 n=29+24)
BM_LIBC_Bcmp_Fleet_L2 1.90GB/s ± 1% 2.14GB/s ± 1% +12.68% (p=0.000 n=29+24)
BM_LIBC_Bcmp_Fleet_LLC 513MB/s ± 4% 531MB/s ± 4% +3.53% (p=0.000 n=24+24)
BM_LIBC_Bcmp_Fleet_Cold 452MB/s ± 3% 456MB/s ± 4% ~ (p=0.103 n=30+30)
BM_LIBC_Bcmp_0_L1 [Bcmp_0] 2.98GB/s ± 1% 3.15GB/s ± 1% +5.59% (p=0.000 n=29+30)
BM_LIBC_Bcmp_0_L2 [Bcmp_0] 2.86GB/s ± 1% 3.07GB/s ± 1% +7.21% (p=0.000 n=29+30)
BM_LIBC_Bcmp_0_LLC [Bcmp_0] 738MB/s ± 7% 751MB/s ± 3% +1.68% (p=0.000 n=24+25)
BM_LIBC_Bcmp_0_Cold [Bcmp_0] 643MB/s ± 3% 642MB/s ± 4% ~ (p=0.522 n=29+30)
BM_LIBC_Bcmp_1_L1 [Bcmp_1] 3.08GB/s ± 0% 3.25GB/s ± 0% +5.35% (p=0.000 n=28+30)
BM_LIBC_Bcmp_1_L2 [Bcmp_1] 2.97GB/s ± 1% 3.17GB/s ± 1% +6.65% (p=0.000 n=29+30)
BM_LIBC_Bcmp_1_LLC [Bcmp_1] 901MB/s ±59% 871MB/s ±36% ~ (p=0.676 n=29+27)
BM_LIBC_Bcmp_1_Cold [Bcmp_1] 686MB/s ± 4% 686MB/s ± 3% ~ (p=0.934 n=29+30)
BM_LIBC_Bcmp_2_L1 [Bcmp_2] 1.63GB/s ± 0% 1.80GB/s ± 1% +10.19% (p=0.000 n=29+30)
BM_LIBC_Bcmp_2_L2 [Bcmp_2] 1.57GB/s ± 1% 1.75GB/s ± 1% +11.46% (p=0.000 n=29+30)
BM_LIBC_Bcmp_2_LLC [Bcmp_2] 451MB/s ±61% 427MB/s ±28% ~ (p=0.469 n=29+25)
BM_LIBC_Bcmp_2_Cold [Bcmp_2] 353MB/s ± 4% 354MB/s ± 5% ~ (p=0.467 n=30+30)
BM_LIBC_Bcmp_3_L1 [Bcmp_3] 1.91GB/s ± 1% 2.10GB/s ± 1% +9.90% (p=0.000 n=29+29)
BM_LIBC_Bcmp_3_L2 [Bcmp_3] 1.84GB/s ± 1% 2.03GB/s ± 1% +10.63% (p=0.000 n=29+30)
BM_LIBC_Bcmp_3_LLC [Bcmp_3] 491MB/s ±24% 538MB/s ±24% +9.66% (p=0.000 n=24+27)
BM_LIBC_Bcmp_3_Cold [Bcmp_3] 417MB/s ± 4% 421MB/s ± 3% ~ (p=0.063 n=30+29)
BM_LIBC_Bcmp_4_L1 [Bcmp_4] 761MB/s ± 1% 867MB/s ± 1% +14.02% (p=0.000 n=28+30)
BM_LIBC_Bcmp_4_L2 [Bcmp_4] 748MB/s ± 1% 860MB/s ± 1% +15.04% (p=0.000 n=30+30)
BM_LIBC_Bcmp_4_LLC [Bcmp_4] 227MB/s ±29% 260MB/s ±64% +14.70% (p=0.000 n=26+27)
BM_LIBC_Bcmp_4_Cold [Bcmp_4] 187MB/s ± 3% 191MB/s ± 5% +2.26% (p=0.000 n=30+30)
BM_LIBC_Bcmp_5_L1 [Bcmp_5] 1.48GB/s ± 1% 1.71GB/s ± 1% +15.26% (p=0.000 n=29+30)
BM_LIBC_Bcmp_5_L2 [Bcmp_5] 1.42GB/s ± 1% 1.67GB/s ± 1% +17.68% (p=0.000 n=29+29)
BM_LIBC_Bcmp_5_LLC [Bcmp_5] 412MB/s ±34% 519MB/s ±80% +25.87% (p=0.000 n=27+30)
BM_LIBC_Bcmp_5_Cold [Bcmp_5] 336MB/s ± 4% 343MB/s ± 6% +2.05% (p=0.000 n=30+30)
BM_LIBC_Bcmp_6_L1 [Bcmp_6] 2.87GB/s ± 0% 3.24GB/s ± 1% +12.88% (p=0.000 n=26+30)
BM_LIBC_Bcmp_6_L2 [Bcmp_6] 2.78GB/s ± 1% 3.20GB/s ± 1% +15.15% (p=0.000 n=26+30)
BM_LIBC_Bcmp_6_LLC [Bcmp_6] 926MB/s ±43% 1227MB/s ±76% +32.53% (p=0.000 n=27+30)
BM_LIBC_Bcmp_6_Cold [Bcmp_6] 716MB/s ± 4% 737MB/s ± 6% +3.02% (p=0.000 n=28+29)
BM_LIBC_Bcmp_7_L1 [Bcmp_7] 1.54GB/s ± 1% 1.56GB/s ± 0% +1.40% (p=0.000 n=29+30)
BM_LIBC_Bcmp_7_L2 [Bcmp_7] 1.47GB/s ± 1% 1.52GB/s ± 1% +2.97% (p=0.000 n=27+30)
BM_LIBC_Bcmp_7_LLC [Bcmp_7] 351MB/s ±23% 436MB/s ±83% +24.04% (p=0.005 n=24+29)
BM_LIBC_Bcmp_7_Cold [Bcmp_7] 283MB/s ± 4% 282MB/s ± 4% ~ (p=0.644 n=30+30)
BM_LIBC_Bcmp_8_L1 [Bcmp_8] 824MB/s ± 1% 1048MB/s ± 1% +27.18% (p=0.000 n=29+30)
BM_LIBC_Bcmp_8_L2 [Bcmp_8] 808MB/s ± 1% 1027MB/s ± 1% +27.12% (p=0.000 n=29+29)
BM_LIBC_Bcmp_8_LLC [Bcmp_8] 317MB/s ±79% 332MB/s ±74% ~ (p=0.338 n=30+29)
BM_LIBC_Bcmp_8_Cold [Bcmp_8] 207MB/s ± 5% 212MB/s ± 5% +2.27% (p=0.000 n=30+30)
```
- indus-skylake
```
name old speed new speed delta
BM_LIBC_Bcmp_Fleet_L1 2.06GB/s ± 2% 2.25GB/s ± 3% +9.66% (p=0.000 n=27+24)
BM_LIBC_Bcmp_Fleet_L2 1.96GB/s ± 2% 2.17GB/s ± 2% +10.61% (p=0.000 n=30+24)
BM_LIBC_Bcmp_Fleet_LLC 1.18GB/s ± 6% 1.32GB/s ± 5% +12.27% (p=0.000 n=28+28)
BM_LIBC_Bcmp_Fleet_Cold 456MB/s ± 2% 466MB/s ± 2% +2.22% (p=0.000 n=28+28)
BM_LIBC_Bcmp_0_L1 [Bcmp_0] 3.08GB/s ± 2% 3.20GB/s ± 1% +3.72% (p=0.000 n=28+22)
BM_LIBC_Bcmp_0_L2 [Bcmp_0] 2.92GB/s ± 1% 3.05GB/s ± 2% +4.49% (p=0.000 n=23+23)
BM_LIBC_Bcmp_0_LLC [Bcmp_0] 1.83GB/s ± 8% 1.94GB/s ± 4% +6.24% (p=0.000 n=25+27)
BM_LIBC_Bcmp_0_Cold [Bcmp_0] 654MB/s ± 2% 659MB/s ± 2% +0.76% (p=0.012 n=30+29)
BM_LIBC_Bcmp_1_L1 [Bcmp_1] 3.19GB/s ± 2% 3.34GB/s ± 2% +4.41% (p=0.000 n=26+23)
BM_LIBC_Bcmp_1_L2 [Bcmp_1] 3.05GB/s ± 2% 3.21GB/s ± 2% +5.32% (p=0.000 n=28+25)
BM_LIBC_Bcmp_1_LLC [Bcmp_1] 1.95GB/s ± 4% 2.03GB/s ±10% +3.61% (p=0.000 n=27+30)
BM_LIBC_Bcmp_1_Cold [Bcmp_1] 700MB/s ± 2% 702MB/s ± 2% ~ (p=0.150 n=30+30)
BM_LIBC_Bcmp_2_L1 [Bcmp_2] 1.69GB/s ± 2% 1.85GB/s ± 1% +9.31% (p=0.000 n=30+26)
BM_LIBC_Bcmp_2_L2 [Bcmp_2] 1.60GB/s ± 2% 1.78GB/s ± 2% +10.90% (p=0.000 n=26+27)
BM_LIBC_Bcmp_2_LLC [Bcmp_2] 1.01GB/s ± 5% 1.12GB/s ± 5% +11.40% (p=0.000 n=27+28)
BM_LIBC_Bcmp_2_Cold [Bcmp_2] 355MB/s ± 3% 360MB/s ± 3% +1.46% (p=0.000 n=30+30)
BM_LIBC_Bcmp_3_L1 [Bcmp_3] 1.98GB/s ± 2% 2.15GB/s ± 2% +8.89% (p=0.000 n=29+27)
BM_LIBC_Bcmp_3_L2 [Bcmp_3] 1.87GB/s ± 3% 2.05GB/s ± 2% +10.06% (p=0.000 n=30+26)
BM_LIBC_Bcmp_3_LLC [Bcmp_3] 1.19GB/s ± 4% 1.31GB/s ± 6% +9.82% (p=0.000 n=27+29)
BM_LIBC_Bcmp_3_Cold [Bcmp_3] 424MB/s ± 3% 431MB/s ± 3% +1.58% (p=0.000 n=28+30)
BM_LIBC_Bcmp_4_L1 [Bcmp_4] 849MB/s ± 2% 949MB/s ± 2% +11.84% (p=0.000 n=27+28)
BM_LIBC_Bcmp_4_L2 [Bcmp_4] 815MB/s ± 3% 913MB/s ± 3% +12.06% (p=0.000 n=29+30)
BM_LIBC_Bcmp_4_LLC [Bcmp_4] 512MB/s ± 9% 571MB/s ± 7% +11.40% (p=0.000 n=30+30)
BM_LIBC_Bcmp_4_Cold [Bcmp_4] 187MB/s ± 3% 192MB/s ± 2% +2.56% (p=0.000 n=30+28)
BM_LIBC_Bcmp_5_L1 [Bcmp_5] 1.55GB/s ± 2% 1.77GB/s ± 3% +13.93% (p=0.000 n=30+28)
BM_LIBC_Bcmp_5_L2 [Bcmp_5] 1.47GB/s ± 2% 1.70GB/s ± 2% +15.96% (p=0.000 n=27+26)
BM_LIBC_Bcmp_5_LLC [Bcmp_5] 939MB/s ± 5% 1084MB/s ± 4% +15.36% (p=0.000 n=28+27)
BM_LIBC_Bcmp_5_Cold [Bcmp_5] 340MB/s ± 2% 347MB/s ± 3% +1.93% (p=0.000 n=30+30)
BM_LIBC_Bcmp_6_L1 [Bcmp_6] 3.06GB/s ± 3% 3.40GB/s ± 2% +11.13% (p=0.000 n=30+28)
BM_LIBC_Bcmp_6_L2 [Bcmp_6] 2.89GB/s ± 3% 3.24GB/s ± 2% +12.20% (p=0.000 n=29+26)
BM_LIBC_Bcmp_6_LLC [Bcmp_6] 1.93GB/s ± 4% 2.09GB/s ±11% +8.16% (p=0.000 n=26+30)
BM_LIBC_Bcmp_6_Cold [Bcmp_6] 746MB/s ± 2% 762MB/s ± 2% +2.11% (p=0.000 n=30+28)
BM_LIBC_Bcmp_7_L1 [Bcmp_7] 1.59GB/s ± 2% 1.62GB/s ± 2% +1.72% (p=0.000 n=25+27)
BM_LIBC_Bcmp_7_L2 [Bcmp_7] 1.49GB/s ± 2% 1.53GB/s ± 2% +2.62% (p=0.000 n=27+29)
BM_LIBC_Bcmp_7_LLC [Bcmp_7] 852MB/s ±10% 909MB/s ± 6% +6.71% (p=0.000 n=30+29)
BM_LIBC_Bcmp_7_Cold [Bcmp_7] 283MB/s ± 3% 283MB/s ± 2% ~ (p=0.617 n=30+27)
BM_LIBC_Bcmp_8_L1 [Bcmp_8] 891MB/s ± 2% 1083MB/s ± 2% +21.64% (p=0.000 n=27+24)
BM_LIBC_Bcmp_8_L2 [Bcmp_8] 855MB/s ± 2% 1045MB/s ± 1% +22.31% (p=0.000 n=25+23)
BM_LIBC_Bcmp_8_LLC [Bcmp_8] 568MB/s ± 7% 659MB/s ± 8% +16.04% (p=0.000 n=29+30)
BM_LIBC_Bcmp_8_Cold [Bcmp_8] 207MB/s ± 2% 212MB/s ± 2% +2.31% (p=0.000 n=30+27)
```
- arcadia-rome
```
name old speed new speed delta
BM_LIBC_Bcmp_Fleet_L1 2.16GB/s ± 2% 2.27GB/s ± 2% +5.13% (p=0.000 n=26+30)
BM_LIBC_Bcmp_Fleet_L2 2.15GB/s ± 2% 2.25GB/s ± 2% +4.64% (p=0.000 n=27+30)
BM_LIBC_Bcmp_Fleet_LLC 1.73GB/s ± 3% 1.81GB/s ± 3% +4.66% (p=0.000 n=25+28)
BM_LIBC_Bcmp_Fleet_Cold 494MB/s ± 1% 496MB/s ± 2% +0.45% (p=0.023 n=22+24)
BM_LIBC_Bcmp_0_L1 [Bcmp_0] 3.30GB/s ± 1% 3.24GB/s ± 2% -1.70% (p=0.000 n=27+30)
BM_LIBC_Bcmp_0_L2 [Bcmp_0] 3.23GB/s ± 2% 3.19GB/s ± 2% -1.28% (p=0.000 n=28+28)
BM_LIBC_Bcmp_0_LLC [Bcmp_0] 2.59GB/s ± 3% 2.58GB/s ± 2% -0.65% (p=0.010 n=26+26)
BM_LIBC_Bcmp_0_Cold [Bcmp_0] 720MB/s ± 1% 707MB/s ± 3% -1.75% (p=0.000 n=22+25)
BM_LIBC_Bcmp_1_L1 [Bcmp_1] 3.37GB/s ± 1% 3.36GB/s ± 2% ~ (p=0.102 n=28+29)
BM_LIBC_Bcmp_1_L2 [Bcmp_1] 3.32GB/s ± 2% 3.30GB/s ± 2% -0.51% (p=0.038 n=28+29)
BM_LIBC_Bcmp_1_LLC [Bcmp_1] 2.67GB/s ± 4% 2.70GB/s ± 4% +0.96% (p=0.009 n=28+27)
BM_LIBC_Bcmp_1_Cold [Bcmp_1] 755MB/s ± 1% 751MB/s ± 2% -0.57% (p=0.000 n=22+25)
BM_LIBC_Bcmp_2_L1 [Bcmp_2] 1.79GB/s ± 1% 1.86GB/s ± 2% +3.92% (p=0.000 n=27+29)
BM_LIBC_Bcmp_2_L2 [Bcmp_2] 1.77GB/s ± 2% 1.82GB/s ± 2% +2.99% (p=0.000 n=28+29)
BM_LIBC_Bcmp_2_LLC [Bcmp_2] 1.41GB/s ± 4% 1.47GB/s ± 3% +3.97% (p=0.000 n=28+28)
BM_LIBC_Bcmp_2_Cold [Bcmp_2] 386MB/s ± 1% 389MB/s ±
1% +0.60% (p=0.000 n=21+23) BM_LIBC_Bcmp_3_L1 [Bcmp_3] 2.07GB/s ± 2% 2.17GB/s ± 2% +4.87% (p=0.000 n=29+30) BM_LIBC_Bcmp_3_L2 [Bcmp_3] 2.07GB/s ± 2% 2.13GB/s ± 2% +3.02% (p=0.000 n=28+30) BM_LIBC_Bcmp_3_LLC [Bcmp_3] 1.66GB/s ± 2% 1.73GB/s ± 2% +4.08% (p=0.000 n=29+26) BM_LIBC_Bcmp_3_Cold [Bcmp_3] 466MB/s ± 2% 469MB/s ± 3% +0.66% (p=0.001 n=22+25) BM_LIBC_Bcmp_4_L1 [Bcmp_4] 861MB/s ± 1% 964MB/s ± 2% +11.98% (p=0.000 n=29+29) BM_LIBC_Bcmp_4_L2 [Bcmp_4] 853MB/s ± 2% 935MB/s ± 2% +9.54% (p=0.000 n=28+29) BM_LIBC_Bcmp_4_LLC [Bcmp_4] 707MB/s ± 3% 743MB/s ± 4% +5.08% (p=0.000 n=29+29) BM_LIBC_Bcmp_4_Cold [Bcmp_4] 199MB/s ± 3% 199MB/s ± 2% ~ (p=0.107 n=29+25) BM_LIBC_Bcmp_5_L1 [Bcmp_5] 1.65GB/s ± 1% 1.75GB/s ± 2% +6.15% (p=0.000 n=29+29) BM_LIBC_Bcmp_5_L2 [Bcmp_5] 1.64GB/s ± 3% 1.73GB/s ± 2% +5.37% (p=0.000 n=29+29) BM_LIBC_Bcmp_5_LLC [Bcmp_5] 1.32GB/s ± 2% 1.40GB/s ± 2% +6.21% (p=0.000 n=28+27) BM_LIBC_Bcmp_5_Cold [Bcmp_5] 370MB/s ± 3% 371MB/s ± 2% +0.16% (p=0.008 n=29+25) BM_LIBC_Bcmp_6_L1 [Bcmp_6] 3.25GB/s ± 2% 3.47GB/s ± 2% +6.74% (p=0.000 n=28+29) BM_LIBC_Bcmp_6_L2 [Bcmp_6] 3.26GB/s ± 1% 3.44GB/s ± 1% +5.43% (p=0.000 n=28+29) BM_LIBC_Bcmp_6_LLC [Bcmp_6] 2.66GB/s ± 2% 2.79GB/s ± 3% +4.90% (p=0.000 n=27+29) BM_LIBC_Bcmp_6_Cold [Bcmp_6] 812MB/s ± 3% 799MB/s ± 2% -1.57% (p=0.000 n=29+25) BM_LIBC_Bcmp_7_L1 [Bcmp_7] 1.71GB/s ± 2% 1.66GB/s ± 2% -3.14% (p=0.000 n=29+29) BM_LIBC_Bcmp_7_L2 [Bcmp_7] 1.63GB/s ± 2% 1.59GB/s ± 2% -2.50% (p=0.000 n=29+28) BM_LIBC_Bcmp_7_LLC [Bcmp_7] 1.25GB/s ± 4% 1.25GB/s ± 2% ~ (p=0.530 n=28+26) BM_LIBC_Bcmp_7_Cold [Bcmp_7] 311MB/s ± 3% 308MB/s ± 1% ~ (p=0.127 n=29+24) BM_LIBC_Bcmp_8_L1 [Bcmp_8] 869MB/s ± 2% 1098MB/s ± 2% +26.28% (p=0.000 n=27+29) BM_LIBC_Bcmp_8_L2 [Bcmp_8] 873MB/s ± 2% 1075MB/s ± 1% +23.06% (p=0.000 n=27+29) BM_LIBC_Bcmp_8_LLC [Bcmp_8] 743MB/s ± 4% 859MB/s ± 4% +15.58% (p=0.000 n=27+27) BM_LIBC_Bcmp_8_Cold [Bcmp_8] 221MB/s ± 4% 221MB/s ± 3% +0.14% (p=0.034 n=29+25) ``` - ixion-haswell ``` name old speed new speed delta 
BM_LIBC_Bcmp_Fleet_L1 2.27GB/s ± 5% 2.41GB/s ± 6% +6.10% (p=0.000 n=29+28) BM_LIBC_Bcmp_Fleet_L2 2.14GB/s ± 6% 2.33GB/s ± 5% +9.21% (p=0.000 n=29+30) BM_LIBC_Bcmp_Fleet_LLC 1.30GB/s ± 9% 1.43GB/s ± 8% +9.85% (p=0.000 n=30+30) BM_LIBC_Bcmp_Fleet_Cold 475MB/s ± 6% 475MB/s ± 5% ~ (p=0.839 n=30+29) BM_LIBC_Bcmp_0_L1 [Bcmp_0] 3.38GB/s ± 7% 3.46GB/s ± 6% +2.35% (p=0.009 n=30+29) BM_LIBC_Bcmp_0_L2 [Bcmp_0] 3.20GB/s ± 5% 3.32GB/s ± 6% +3.52% (p=0.000 n=28+30) BM_LIBC_Bcmp_0_LLC [Bcmp_0] 1.88GB/s ± 9% 2.00GB/s ± 6% +6.63% (p=0.000 n=30+28) BM_LIBC_Bcmp_0_Cold [Bcmp_0] 664MB/s ± 6% 655MB/s ± 6% -1.32% (p=0.025 n=30+30) BM_LIBC_Bcmp_1_L1 [Bcmp_1] 3.50GB/s ± 8% 3.61GB/s ±10% +3.09% (p=0.001 n=29+30) BM_LIBC_Bcmp_1_L2 [Bcmp_1] 3.32GB/s ± 7% 3.48GB/s ± 8% +4.89% (p=0.000 n=29+30) BM_LIBC_Bcmp_1_LLC [Bcmp_1] 2.02GB/s ± 7% 2.14GB/s ± 9% +5.82% (p=0.000 n=28+29) BM_LIBC_Bcmp_1_Cold [Bcmp_1] 716MB/s ± 6% 709MB/s ± 5% -0.97% (p=0.040 n=30+28) BM_LIBC_Bcmp_2_L1 [Bcmp_2] 1.83GB/s ± 7% 1.97GB/s ± 8% +7.90% (p=0.000 n=30+30) BM_LIBC_Bcmp_2_L2 [Bcmp_2] 1.74GB/s ± 6% 1.92GB/s ± 6% +10.29% (p=0.000 n=30+29) BM_LIBC_Bcmp_2_LLC [Bcmp_2] 1.05GB/s ± 9% 1.15GB/s ± 9% +9.73% (p=0.000 n=30+30) BM_LIBC_Bcmp_2_Cold [Bcmp_2] 379MB/s ± 6% 372MB/s ± 6% -1.74% (p=0.012 n=30+30) BM_LIBC_Bcmp_3_L1 [Bcmp_3] 2.17GB/s ± 5% 2.29GB/s ± 6% +5.61% (p=0.000 n=29+30) BM_LIBC_Bcmp_3_L2 [Bcmp_3] 2.02GB/s ± 6% 2.20GB/s ± 6% +8.75% (p=0.000 n=29+30) BM_LIBC_Bcmp_3_LLC [Bcmp_3] 1.22GB/s ± 8% 1.34GB/s ± 9% +9.19% (p=0.000 n=30+30) BM_LIBC_Bcmp_3_Cold [Bcmp_3] 447MB/s ± 3% 441MB/s ± 7% -1.40% (p=0.033 n=30+30) BM_LIBC_Bcmp_4_L1 [Bcmp_4] 902MB/s ± 6% 995MB/s ±10% +10.37% (p=0.000 n=30+30) BM_LIBC_Bcmp_4_L2 [Bcmp_4] 863MB/s ± 5% 945MB/s ±11% +9.50% (p=0.000 n=29+30) BM_LIBC_Bcmp_4_LLC [Bcmp_4] 528MB/s ±11% 559MB/s ±12% +5.75% (p=0.000 n=30+30) BM_LIBC_Bcmp_4_Cold [Bcmp_4] 183MB/s ± 4% 181MB/s ± 7% ~ (p=0.088 n=28+30) BM_LIBC_Bcmp_5_L1 [Bcmp_5] 1.70GB/s ± 6% 1.87GB/s ± 8% +10.14% (p=0.000 n=29+29) BM_LIBC_Bcmp_5_L2 [Bcmp_5] 
1.60GB/s ± 5% 1.80GB/s ± 9% +12.61% (p=0.000 n=29+30) BM_LIBC_Bcmp_5_LLC [Bcmp_5] 994MB/s ±13% 1094MB/s ± 8% +10.10% (p=0.000 n=29+30) BM_LIBC_Bcmp_5_Cold [Bcmp_5] 362MB/s ± 6% 358MB/s ± 7% ~ (p=0.123 n=30+30) BM_LIBC_Bcmp_6_L1 [Bcmp_6] 3.31GB/s ± 5% 3.67GB/s ± 6% +10.90% (p=0.000 n=28+30) BM_LIBC_Bcmp_6_L2 [Bcmp_6] 3.11GB/s ± 5% 3.53GB/s ± 5% +13.59% (p=0.000 n=30+30) BM_LIBC_Bcmp_6_LLC [Bcmp_6] 1.98GB/s ± 9% 2.18GB/s ± 8% +10.34% (p=0.000 n=30+30) BM_LIBC_Bcmp_6_Cold [Bcmp_6] 754MB/s ± 5% 752MB/s ± 5% ~ (p=0.592 n=30+30) BM_LIBC_Bcmp_7_L1 [Bcmp_7] 1.72GB/s ± 5% 1.72GB/s ± 6% ~ (p=0.549 n=29+29) BM_LIBC_Bcmp_7_L2 [Bcmp_7] 1.61GB/s ± 7% 1.63GB/s ± 8% ~ (p=0.191 n=30+29) BM_LIBC_Bcmp_7_LLC [Bcmp_7] 913MB/s ± 8% 905MB/s ± 9% ~ (p=0.423 n=30+30) BM_LIBC_Bcmp_7_Cold [Bcmp_7] 304MB/s ± 6% 287MB/s ± 4% -5.57% (p=0.000 n=30+30) BM_LIBC_Bcmp_8_L1 [Bcmp_8] 961MB/s ± 5% 1124MB/s ± 6% +16.94% (p=0.000 n=30+30) BM_LIBC_Bcmp_8_L2 [Bcmp_8] 915MB/s ± 8% 1100MB/s ± 7% +20.16% (p=0.000 n=30+30) BM_LIBC_Bcmp_8_LLC [Bcmp_8] 593MB/s ± 8% 669MB/s ± 8% +12.92% (p=0.000 n=30+30) BM_LIBC_Bcmp_8_Cold [Bcmp_8] 220MB/s ± 4% 220MB/s ± 6% ~ (p=0.572 n=30+30) ``` Co-authored-by: goldvitaly@google.com <%username%@google.com>
2024-08-29[libc][x86] Use prefetch for write for memcpy (#90450)Guillaume Chatelet1-13/+20
Currently when `LIBC_COPT_MEMCPY_X86_USE_SOFTWARE_PREFETCHING` is set we prefetch memory for read on the source buffer. This patch adds prefetch for write on the destination buffer.
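The sketch below illustrates the difference between a read prefetch on the source and a write prefetch on the destination. It copies in fixed-size blocks and issues a `__builtin_prefetch` hint for each buffer. It is not LLVM libc's implementation. `copy_with_prefetch`, the 64-byte block size, and the 256-byte lookahead are hypothetical choices made for illustration. `__builtin_prefetch(addr, rw, locality)` is a GCC/Clang builtin: `rw = 0` requests a prefetch for read and `rw = 1` a prefetch for write.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical tuning values, chosen only for illustration.
constexpr std::size_t kBlockSize = 64;          // assumed cache-line size
constexpr std::size_t kPrefetchDistance = 256;  // assumed lookahead in bytes

// Copies `count` bytes from `src` to `dst` (buffers must not overlap).
// The source is prefetched ahead for reading and the destination ahead for
// writing.
void copy_with_prefetch(char *__restrict dst, const char *__restrict src,
                        std::size_t count) {
  std::size_t offset = 0;
  for (; offset + kBlockSize <= count; offset += kBlockSize) {
    // Only form prefetch addresses that stay inside both buffers.
    if (offset + kPrefetchDistance < count) {
      __builtin_prefetch(src + offset + kPrefetchDistance, /*rw=*/0, /*locality=*/3);
      __builtin_prefetch(dst + offset + kPrefetchDistance, /*rw=*/1, /*locality=*/3);
    }
    std::memcpy(dst + offset, src + offset, kBlockSize);
  }
  // Tail: fewer than kBlockSize bytes remain.
  std::memcpy(dst + offset, src + offset, count - offset);
}
```

Prefetches are only hints. Whether `rw = 1` maps to a dedicated write-prefetch instruction, such as x86 `PREFETCHW`, depends on the target and the compiler flags.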
2024-07-12[libc] Migrate to using LIBC_NAMESPACE_DECL for namespace declaration (#98597)Petr Hosek33-73/+109
This is a part of #97655.
2024-07-12Revert "[libc] Migrate to using LIBC_NAMESPACE_DECL for namespace ↵Mehdi Amini33-109/+73
declaration" (#98593) Reverts llvm/llvm-project#98075 because the bots are broken.
2024-07-11[libc] Migrate to using LIBC_NAMESPACE_DECL for namespace declaration (#98075)Petr Hosek33-73/+109
This is a part of #97655.
2024-05-31[libc][NFC] Allow compilation of `memcpy` with `-m32` (#93790)Guillaume Chatelet2-3/+3
Needed to support i386 (#93709).
2024-05-14[libc][bug] Fix out of bound write in memcpy w/ software prefetching (#90591)Guillaume Chatelet1-2/+14
This patch adds tests for `memcpy` and `memset` that make sure we don't access buffers out of bounds. The tests rely on POSIX `mmap` / `mprotect` and work only when FULL_BUILD_MODE is disabled. The bug showed up while enabling software prefetching: `loop_and_tail_offset` always runs at least one loop iteration, but in some configurations the loop-unrolled prefetching needed only the tail operation and no loop iterations at all.
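Out-of-bounds tests like these typically rely on guard pages. The following is a minimal sketch of that technique, assuming a POSIX system that provides `MAP_ANONYMOUS`:

1. Map one extra page.
2. Mark that page `PROT_NONE` with `mprotect`.
3. Right-align the buffer so that its last byte sits immediately before the inaccessible page.

Any read or write past the end then faults instead of silently succeeding. `GuardedBuffer`, `make_guarded`, and `release_guarded` are hypothetical names. They are not the helpers used in LLVM libc's tests.

```cpp
#include <sys/mman.h>
#include <unistd.h>

#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical guarded allocation: `size` usable bytes that end exactly at
// the start of a PROT_NONE page.
struct GuardedBuffer {
  char *mapping;             // start of the whole mapping (for munmap)
  std::size_t mapping_size;  // data pages + 1 guard page
  char *data;                // first usable byte
  std::size_t size;          // number of usable bytes
};

GuardedBuffer make_guarded(std::size_t size) {
  const auto page = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
  const std::size_t data_pages = (size + page - 1) / page;
  const std::size_t total = (data_pages + 1) * page;
  void *m = mmap(nullptr, total, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (m == MAP_FAILED) return {nullptr, 0, nullptr, 0};
  char *base = static_cast<char *>(m);
  char *guard = base + data_pages * page;
  // Any access to the last page now raises SIGSEGV.
  if (mprotect(guard, page, PROT_NONE) != 0) {
    munmap(base, total);
    return {nullptr, 0, nullptr, 0};
  }
  // Right-align the buffer so one byte past the end lands in the guard page.
  return {base, total, guard - size, size};
}

void release_guarded(const GuardedBuffer &b) {
  if (b.mapping != nullptr) munmap(b.mapping, b.mapping_size);
}
```

This layout catches only overruns. Accesses before the start of the buffer go undetected, so catching underflow as well would require a second guard page in front.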
2024-03-27[libc] Remove obsolete LIBC_HAS_BUILTIN macro (#86554)Marc Auberer2-7/+6
Fixes #86546 and removes the macro `LIBC_HAS_BUILTIN`. The macro was needed to support older compilers that did not provide `__has_builtin`, but all of the compilers we support now do (see https://libc.llvm.org/compiler_support.html). All uses now call `__has_builtin` directly. cc @nickdesaulniers
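The sketch below shows a builtin feature-tested with `__has_builtin` directly, with no wrapper macro. The `copy8` function is hypothetical. `__builtin_memcpy_inline` is a Clang builtin that requires a compile-time constant size. Compilers that do not provide it take the `std::memcpy` fallback.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper that copies exactly 8 bytes.
// __has_builtin is used directly; no LIBC_HAS_BUILTIN-style wrapper.
inline void copy8(char *dst, const char *src) {
#if __has_builtin(__builtin_memcpy_inline)
  // Clang: the copy is expanded inline and never calls an external memcpy.
  __builtin_memcpy_inline(dst, src, 8);
#else
  // Compilers without this builtin (e.g. GCC) take the portable path.
  std::memcpy(dst, src, 8);
#endif
}
```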
2024-03-09[libc] Provide `LIBC_TYPES_HAS_INT64` (#83441)Guillaume Chatelet1-6/+3
Umbrella bug #83182
2024-03-05[libc] suppress readability-identifier-naming for std::numeric_limits ↵Nick Desaulniers1-0/+13
interfaces (#83921) These templates are made to match the ergonomics of std::numeric_limits. Because our style for constexpr variables is ALL_CAPS, we must silence the linter for these manually. Link: https://clang.llvm.org/extra/clang-tidy/#suppressing-undesired-diagnostics
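The following minimal sketch shows the suppression pattern. It assumes a clang-tidy configuration that flags lowercase constexpr variable names. A `numeric_limits`-style trait keeps its standard-looking lowercase member name and silences only `readability-identifier-naming` on that line, using a `NOLINT(check-name)` comment. `my_limits` is a hypothetical name, not the template changed in this commit.

```cpp
#include <cassert>
#include <climits>
#include <limits>

// Hypothetical trait mirroring the shape of std::numeric_limits<T>.
template <typename T> struct my_limits;

template <> struct my_limits<unsigned int> {
  // Lowercase to match std::numeric_limits<unsigned int>::digits. The NOLINT
  // comment silences only the naming check; other checks stay active.
  static constexpr int digits = static_cast<int>(sizeof(unsigned int) * CHAR_BIT); // NOLINT(readability-identifier-naming)
  static constexpr unsigned int max() { return UINT_MAX; }
};
```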
2024-03-05[libc] fix readability-identifier-naming in memory_utils/utils.h (#83919)Nick Desaulniers2-8/+5
Fixes: libc/src/string/memory_utils/utils.h:345:13: warning: invalid case style for member 'offset_' [readability-identifier-naming] A trailing underscore on members is google3 style, not LLVM style. Removing the underscore alone is insufficient, because we would then have two members (the data member and its getter) with the same identifier, which is a compile-time error. Instead, remove the getter and access the renamed member directly, which is now public.
2024-03-05[libc] fix more readability-identifier-naming lints (#83914)Nick Desaulniers5-76/+78
Found via:
$ ninja -k2000 libc-lint 2>&1 | grep readability-identifier-naming
Auto fixed via:
$ clang-tidy -p build/compile_commands.json \
    -checks="-*,readability-identifier-naming" \
    <filename> --fix
This doesn't fix all instances, just the obvious simple cases where it makes sense to change the identifier names. Subsequent PRs will fix up the stragglers.
2024-02-28[libc] fix typo introduced in inline_bcmp_byte_per_byte (#83356)Nick Desaulniers1-1/+1
My global find+replace was overzealous and broke post-submit unit tests. Link: #83345
2024-02-28[libc] fix readability-identifier-naming.ConstexprFunctionCase (#83345)Nick Desaulniers10-29/+29
Codify that we use lower_case for readability-identifier-naming.ConstexprFunctionCase and then fix the 11 violations (rather than codify UPPER_CASE and have to fix the 170 violations).
2024-02-28[libc] fix clang-tidy llvm-header-guard warnings (#82679)Nick Desaulniers2-6/+6
Towards the goal of getting `ninja libc-lint` back to green, fix the numerous instances of:
warning: header guard does not follow preferred style [llvm-header-guard]
This is because many of our header guards start with `__LLVM` rather than `LLVM`. To filter just these warnings:
$ ninja -k2000 libc-lint 2>&1 | grep llvm-header-guard
To automatically apply fixits:
$ find libc/src libc/include libc/test -name \*.h | \
    xargs -n1 -I {} clang-tidy {} -p build/compile_commands.json \
    -checks='-*,llvm-header-guard' --fix --quiet
Some manual cleanup is still necessary, as headers that were missing header guards outright will have them inserted before the license block (we prefer them after).