| Age | Commit message (Collapse) | Author | Files | Lines |
|
3a653afd45709432181952c0ffdb53eceb0939ae removed the inclusion of
`<__utility/declval.h>` from `<__algorithm/unwrap_range.h>`. However,
`unwrap_range.h` still needs to use `std::declval`. So we should restore
the inclusion.
The building failure with Clang modules was already caught by CI.
|
|
(#177853)
Part of https://github.com/llvm/llvm-project/issues/102817.
This patch attempts to optimize the performance of
`ranges::fold_left_with_iter` for segmented iterators.
- before
```
# | rng::fold_left(vector<int>)/8 2.78 ns 2.78 ns 241953718
# | rng::fold_left(vector<int>)/32 12.2 ns 12.2 ns 57579851
# | rng::fold_left(vector<int>)/50 19.2 ns 19.2 ns 36487764
# | rng::fold_left(vector<int>)/8192 3226 ns 3226 ns 216811
# | rng::fold_left(vector<int>)/1048576 441842 ns 441839 ns 1592
# | rng::fold_left(deque<int>)/8 2.83 ns 2.83 ns 243888678
# | rng::fold_left(deque<int>)/32 16.6 ns 16.6 ns 42297458
# | rng::fold_left(deque<int>)/50 22.3 ns 22.3 ns 31387998
# | rng::fold_left(deque<int>)/8192 2492 ns 2492 ns 281637
# | rng::fold_left(deque<int>)/1048576 324936 ns 324936 ns 2154
# | rng::fold_left(list<int>)/8 2.54 ns 2.54 ns 275946635
# | rng::fold_left(list<int>)/32 16.2 ns 16.2 ns 42901634
# | rng::fold_left(list<int>)/50 54.7 ns 54.7 ns 12767450
# | rng::fold_left(list<int>)/8192 15154 ns 15154 ns 56744
# | rng::fold_left(list<int>)/1048576 4976906 ns 4976867 ns 158
```
- after
```
# | rng::fold_left(vector<int>)/8 2.74 ns 2.74 ns 255954900
# | rng::fold_left(vector<int>)/32 12.1 ns 12.1 ns 57843462
# | rng::fold_left(vector<int>)/50 19.2 ns 19.2 ns 36422594
# | rng::fold_left(vector<int>)/8192 3202 ns 3202 ns 218265
# | rng::fold_left(vector<int>)/1048576 435718 ns 435709 ns 1609
# | rng::fold_left(deque<int>)/8 2.52 ns 2.52 ns 277288254
# | rng::fold_left(deque<int>)/32 14.1 ns 14.1 ns 52244463
# | rng::fold_left(deque<int>)/50 16.2 ns 16.2 ns 43131857
# | rng::fold_left(deque<int>)/8192 1695 ns 1695 ns 415620
# | rng::fold_left(deque<int>)/1048576 277729 ns 277731 ns 2532
# | rng::fold_left(list<int>)/8 2.55 ns 2.55 ns 277025050
# | rng::fold_left(list<int>)/32 16.2 ns 16.2 ns 43058857
# | rng::fold_left(list<int>)/50 54.7 ns 54.7 ns 12705516
# | rng::fold_left(list<int>)/8192 15236 ns 15235 ns 56840
# | rng::fold_left(list<int>)/1048576 4827263 ns 4827147 ns 152
```
|
|
We currently have a custom utility `__reverse_range`, which does
basically the same thing as `views::reverse` and the only place where we
use it is in `ranges::reverse_copy`. Instead of this special utility, we
can simply use `views::reverse`.
This has originally been introduced due to compile time concerns.
However, there doesn't seem to actually be a significant compile time
regression overall.
|
|
This is necessary when optimizing algorithms for segmented iterators to
reduce boilerplate code.
related:
-
https://github.com/llvm/llvm-project/pull/177853#discussion_r2754820322
-
https://github.com/llvm/llvm-project/pull/164266#discussion_r2447129525
|
|
We can use a relatively simple `if constexpr` chain instead of SFINAE
and class template specialization, making the functions much simpler to
understand.
|
|
vector<bool>::iterator specialization (#173384)
|
|
This PR implements a generic backend-agnostic parallel `std::is_sorted`
based on `std::transform_reduce`.
While this approach is suboptimal comparing a direct backend-specific
implementation, since it doesn't support early termination and requires
a reduction operation, it does show speedup when the dataset is large
enough and the comparator is not absolutely trivial.
Parent issue: #99938
|
|
possible (#176762)
This allows us to make use of any optimizations to `std::swap_ranges` in
`ranges::swap_ranges`.
This patch also moves some code specific to `ranges::swap_ranges` into
`ranges_swap_ranges.h`.
|
|
We don't need to use `std::distance`, since we know for a fact that we
have random access iterators in that place. Instead, we can just
subtract the iterators, avoiding a bunch of template machinery and
imrpoving compile times a bit.
|
|
Implement the `ranges::shift_left` algorithm from
[P2440R1](https://wg21.link/P2440R1).
Closes: #134061
---------
Co-authored-by: Hui Xie <hui.xie1990@gmail.com>
Co-authored-by: Louis Dionne <ldionne.2@gmail.com>
|
|
#175913 removed that `__builtin_assume_dereferenceable(ptr, 0)` implies
`ptr != nullptr`, which should allow us to use the builtin with LLVM 23.
This reverts commit 776c09c212e945fdceeae240b42c38df3dd34727.
|
|
|
|
`__builtin_assume_dereferenceable` currently implies that the pointer
given to it is non-null, even if the size is zero. This causes
miscompilations, since we consider [nullptr, nullptr) to be a valid
range.
Reverts llvm/llvm-project#167697
|
|
`__unwrap_iter` doesn't need to SFINAE away, so we can just check inside
the function body whether an iterator is copy constructible. This
reduces the overload set, improving compile times a bit.
|
|
```
Benchmark 4ecfaa602f56 80d5ac247d34 Difference % Difference
---------------------------- -------------- -------------- ------------ --------------
bm_find_if_autovectorization 1901.51 306.12 -1595.39 -83.90%
```
|
|
This changes the algorithm to more efficiently skip ranges which cannot
match the needle for random access iterators. Specifically, we now
search for a mismatching element from the back of the subrange we want
to check. When a mismatch occurs we can directly start one after the
mismatched element, since there cannot possibly be a matching subrange
starting between the start of the subrange we checked and the mismatched
element (since all elements have to be equal). The algorithm also
remembers the subrange which was already match as being equal and
doesn't try to compare it a second time, reducing the time spent in case
of a match.
```
Benchmark old new Difference % Difference
--------------------------------------------------- -------------- -------------- ------------ --------------
rng::search_n(deque<int>)_(no_match)/1000 458.33 14.22 -444.11 -96.90%
rng::search_n(deque<int>)_(no_match)/1024 456.17 13.89 -442.28 -96.95%
rng::search_n(deque<int>)_(no_match)/1048576 453420.38 17.69 -453402.69 -100.00%
rng::search_n(deque<int>)_(no_match)/8192 3566.08 17.60 -3548.49 -99.51%
rng::search_n(deque<int>,_pred)_(no_match)/1000 597.88 15.25 -582.63 -97.45%
rng::search_n(deque<int>,_pred)_(no_match)/1024 608.42 15.39 -593.03 -97.47%
rng::search_n(deque<int>,_pred)_(no_match)/1048576 594533.99 18.91 -594515.08 -100.00%
rng::search_n(deque<int>,_pred)_(no_match)/8192 4670.23 18.88 -4651.35 -99.60%
rng::search_n(list<int>)_(no_match)/1000 733.72 730.22 -3.50 -0.48%
rng::search_n(list<int>)_(no_match)/1024 759.93 753.10 -6.84 -0.90%
rng::search_n(list<int>)_(no_match)/1048576 833841.54 813483.75 -20357.79 -2.44%
rng::search_n(list<int>)_(no_match)/8192 8352.18 8417.31 65.14 0.78%
rng::search_n(list<int>,_pred)_(no_match)/1000 776.79 789.72 12.93 1.66%
rng::search_n(list<int>,_pred)_(no_match)/1024 788.42 806.70 18.28 2.32%
rng::search_n(list<int>,_pred)_(no_match)/1048576 955536.40 982976.81 27440.41 2.87%
rng::search_n(list<int>,_pred)_(no_match)/8192 8874.02 8915.18 41.16 0.46%
rng::search_n(vector<int>)_(no_match)/1000 212.69 3.79 -208.90 -98.22%
rng::search_n(vector<int>)_(no_match)/1024 219.67 3.70 -215.96 -98.31%
rng::search_n(vector<int>)_(no_match)/1048576 209622.54 3.67 -209618.87 -100.00%
rng::search_n(vector<int>)_(no_match)/8192 1643.80 3.83 -1639.98 -99.77%
rng::search_n(vector<int>,_pred)_(no_match)/1000 461.93 7.55 -454.38 -98.36%
rng::search_n(vector<int>,_pred)_(no_match)/1024 472.43 7.74 -464.69 -98.36%
rng::search_n(vector<int>,_pred)_(no_match)/1048576 546180.29 8.71 -546171.58 -100.00%
rng::search_n(vector<int>,_pred)_(no_match)/8192 3786.26 7.88 -3778.38 -99.79%
std::search_n(deque<int>)_(no_match)/1000 455.53 14.19 -441.34 -96.88%
std::search_n(deque<int>)_(no_match)/1024 459.79 13.98 -445.81 -96.96%
std::search_n(deque<int>)_(no_match)/1048576 449780.32 17.99 -449762.33 -100.00%
std::search_n(deque<int>)_(no_match)/8192 3508.55 17.97 -3490.58 -99.49%
std::search_n(deque<int>,_pred)_(no_match)/1000 571.53 17.16 -554.37 -97.00%
std::search_n(deque<int>,_pred)_(no_match)/1024 584.43 17.09 -567.34 -97.08%
std::search_n(deque<int>,_pred)_(no_match)/1048576 581418.31 19.16 -581399.15 -100.00%
std::search_n(deque<int>,_pred)_(no_match)/8192 4661.97 19.36 -4642.61 -99.58%
std::search_n(list<int>)_(no_match)/1000 722.45 710.39 -12.06 -1.67%
std::search_n(list<int>)_(no_match)/1024 748.50 727.08 -21.42 -2.86%
std::search_n(list<int>)_(no_match)/1048576 821655.28 784520.12 -37135.16 -4.52%
std::search_n(list<int>)_(no_match)/8192 7941.73 8002.05 60.32 0.76%
std::search_n(list<int>,_pred)_(no_match)/1000 766.59 786.31 19.72 2.57%
std::search_n(list<int>,_pred)_(no_match)/1024 785.92 804.43 18.51 2.35%
std::search_n(list<int>,_pred)_(no_match)/1048576 948252.76 969125.41 20872.65 2.20%
std::search_n(list<int>,_pred)_(no_match)/8192 8658.99 8825.71 166.72 1.93%
std::search_n(vector<int>)_(no_match)/1000 210.36 3.47 -206.89 -98.35%
std::search_n(vector<int>)_(no_match)/1024 217.60 4.13 -213.47 -98.10%
std::search_n(vector<int>)_(no_match)/1048576 209386.43 3.51 -209382.92 -100.00%
std::search_n(vector<int>)_(no_match)/8192 1643.79 3.50 -1640.29 -99.79%
std::search_n(vector<int>,_pred)_(no_match)/1000 460.88 5.44 -455.45 -98.82%
std::search_n(vector<int>,_pred)_(no_match)/1024 475.36 5.43 -469.93 -98.86%
std::search_n(vector<int>,_pred)_(no_match)/1048576 682722.75 7.15 -682715.60 -100.00%
std::search_n(vector<int>,_pred)_(no_match)/8192 3779.95 5.43 -3774.52 -99.86%
Geomean 4956.15 87.96 -4868.19 -98.23%
```
Fixes #129327
|
|
Clang provides lambdas as an extension in C++03 now, so we can use them
to simplify our code a bit.
|
|
`[[nodiscard]]` should be applied to functions where discarding the
return value is most likely a correctness issue.
- https://libcxx.llvm.org/CodingGuidelines.html
Towards #172124
|
|
This implements a new algorithm for `rotate` with random access
iterators, which uses `swap_ranges`. This reduces cache misses and
allows for vectorization.
Apple M4:
```
Benchmark old new Difference % Difference
--------------------------------------------------- -------------- -------------- ------------ --------------
rng::rotate(deque<int>)_(1_element_backward)/1024 46.17 45.13 -1.04 -2.26%
rng::rotate(deque<int>)_(1_element_backward)/32 4.90 4.92 0.02 0.45%
rng::rotate(deque<int>)_(1_element_backward)/50 6.12 6.02 -0.10 -1.56%
rng::rotate(deque<int>)_(1_element_backward)/8192 329.97 330.49 0.52 0.16%
rng::rotate(deque<int>)_(1_element_forward)/1024 42.20 42.99 0.79 1.87%
rng::rotate(deque<int>)_(1_element_forward)/32 4.99 5.26 0.27 5.37%
rng::rotate(deque<int>)_(1_element_forward)/50 6.33 6.48 0.14 2.28%
rng::rotate(deque<int>)_(1_element_forward)/8192 317.40 318.68 1.28 0.40%
rng::rotate(deque<int>)_(by_1/2)/1024 185.42 184.23 -1.18 -0.64%
rng::rotate(deque<int>)_(by_1/2)/32 8.72 8.99 0.27 3.14%
rng::rotate(deque<int>)_(by_1/2)/50 12.03 11.60 -0.43 -3.59%
rng::rotate(deque<int>)_(by_1/2)/8192 1549.94 1549.30 -0.64 -0.04%
rng::rotate(deque<int>)_(by_1/3)/1024 1886.50 409.41 -1477.09 -78.30%
rng::rotate(deque<int>)_(by_1/3)/32 46.51 21.68 -24.84 -53.40%
rng::rotate(deque<int>)_(by_1/3)/50 77.70 29.85 -47.85 -61.58%
rng::rotate(deque<int>)_(by_1/3)/8192 22814.91 3171.92 -19642.99 -86.10%
rng::rotate(deque<int>)_(by_1/4)/1024 811.39 283.28 -528.11 -65.09%
rng::rotate(deque<int>)_(by_1/4)/32 30.30 14.44 -15.86 -52.35%
rng::rotate(deque<int>)_(by_1/4)/50 76.23 29.29 -46.94 -61.58%
rng::rotate(deque<int>)_(by_1/4)/8192 7154.50 2357.48 -4797.02 -67.05%
rng::rotate(list<int>)_(1_element_backward)/1024 1483.46 742.31 -741.15 -49.96%
rng::rotate(list<int>)_(1_element_backward)/32 16.17 16.37 0.20 1.24%
rng::rotate(list<int>)_(1_element_backward)/50 29.31 29.20 -0.10 -0.34%
rng::rotate(list<int>)_(1_element_backward)/8192 8896.07 8949.56 53.49 0.60%
rng::rotate(list<int>)_(1_element_forward)/1024 1481.36 742.24 -739.12 -49.89%
rng::rotate(list<int>)_(1_element_forward)/32 15.72 16.22 0.50 3.19%
rng::rotate(list<int>)_(1_element_forward)/50 28.91 28.87 -0.03 -0.12%
rng::rotate(list<int>)_(1_element_forward)/8192 9595.63 9846.80 251.16 2.62%
rng::rotate(list<int>)_(by_1/2)/1024 833.67 408.22 -425.46 -51.03%
rng::rotate(list<int>)_(by_1/2)/32 8.88 8.76 -0.11 -1.25%
rng::rotate(list<int>)_(by_1/2)/50 15.67 15.73 0.06 0.40%
rng::rotate(list<int>)_(by_1/2)/8192 6392.01 6597.30 205.30 3.21%
rng::rotate(list<int>)_(by_1/3)/1024 1360.66 872.00 -488.66 -35.91%
rng::rotate(list<int>)_(by_1/3)/32 19.72 19.37 -0.35 -1.80%
rng::rotate(list<int>)_(by_1/3)/50 33.15 33.12 -0.04 -0.12%
rng::rotate(list<int>)_(by_1/3)/8192 10807.14 11216.71 409.58 3.79%
rng::rotate(list<int>)_(by_1/4)/1024 624.92 625.80 0.88 0.14%
rng::rotate(list<int>)_(by_1/4)/32 15.28 15.37 0.09 0.60%
rng::rotate(list<int>)_(by_1/4)/50 32.27 32.78 0.52 1.60%
rng::rotate(list<int>)_(by_1/4)/8192 11266.66 9594.79 -1671.87 -14.84%
rng::rotate(vector<bool>)_(1_element_backward)/1024 32.10 31.94 -0.16 -0.51%
rng::rotate(vector<bool>)_(1_element_backward)/32 18.48 18.07 -0.41 -2.21%
rng::rotate(vector<bool>)_(1_element_backward)/50 18.49 18.19 -0.30 -1.64%
rng::rotate(vector<bool>)_(1_element_backward)/8192 113.76 112.91 -0.85 -0.75%
rng::rotate(vector<bool>)_(1_element_forward)/1024 30.87 30.72 -0.15 -0.48%
rng::rotate(vector<bool>)_(1_element_forward)/32 18.16 17.98 -0.18 -1.01%
rng::rotate(vector<bool>)_(1_element_forward)/50 17.98 17.98 0.01 0.03%
rng::rotate(vector<bool>)_(1_element_forward)/8192 111.84 111.50 -0.34 -0.31%
rng::rotate(vector<bool>)_(by_1/2)/1024 9.27 9.34 0.07 0.75%
rng::rotate(vector<bool>)_(by_1/2)/32 18.32 17.90 -0.42 -2.29%
rng::rotate(vector<bool>)_(by_1/2)/50 18.14 17.69 -0.45 -2.49%
rng::rotate(vector<bool>)_(by_1/2)/8192 16.23 16.80 0.57 3.49%
rng::rotate(vector<bool>)_(by_1/3)/1024 58.89 58.74 -0.15 -0.25%
rng::rotate(vector<bool>)_(by_1/3)/32 18.16 17.74 -0.41 -2.28%
rng::rotate(vector<bool>)_(by_1/3)/50 18.15 17.72 -0.43 -2.38%
rng::rotate(vector<bool>)_(by_1/3)/8192 164.52 166.17 1.65 1.00%
rng::rotate(vector<bool>)_(by_1/4)/1024 13.44 14.20 0.76 5.65%
rng::rotate(vector<bool>)_(by_1/4)/32 18.38 17.86 -0.52 -2.84%
rng::rotate(vector<bool>)_(by_1/4)/50 18.26 17.75 -0.51 -2.79%
rng::rotate(vector<bool>)_(by_1/4)/8192 34.49 34.59 0.11 0.31%
rng::rotate(vector<int>)_(1_element_backward)/1024 37.22 37.55 0.33 0.88%
rng::rotate(vector<int>)_(1_element_backward)/32 3.02 3.01 -0.00 -0.10%
rng::rotate(vector<int>)_(1_element_backward)/50 5.31 5.33 0.02 0.31%
rng::rotate(vector<int>)_(1_element_backward)/8192 302.96 295.02 -7.94 -2.62%
rng::rotate(vector<int>)_(1_element_forward)/1024 36.78 36.97 0.19 0.51%
rng::rotate(vector<int>)_(1_element_forward)/32 3.08 3.19 0.11 3.56%
rng::rotate(vector<int>)_(1_element_forward)/50 5.33 5.33 -0.00 -0.05%
rng::rotate(vector<int>)_(1_element_forward)/8192 288.25 285.28 -2.97 -1.03%
rng::rotate(vector<int>)_(by_1/2)/1024 32.49 32.13 -0.36 -1.10%
rng::rotate(vector<int>)_(by_1/2)/32 3.80 2.63 -1.17 -30.82%
rng::rotate(vector<int>)_(by_1/2)/50 5.29 3.69 -1.59 -30.13%
rng::rotate(vector<int>)_(by_1/2)/8192 256.13 252.05 -4.07 -1.59%
rng::rotate(vector<int>)_(by_1/3)/1024 1389.57 135.85 -1253.72 -90.22%
rng::rotate(vector<int>)_(by_1/3)/32 21.73 11.99 -9.73 -44.80%
rng::rotate(vector<int>)_(by_1/3)/50 40.23 11.88 -28.35 -70.48%
rng::rotate(vector<int>)_(by_1/3)/8192 11040.23 946.51 -10093.72 -91.43%
rng::rotate(vector<int>)_(by_1/4)/1024 335.13 49.05 -286.08 -85.36%
rng::rotate(vector<int>)_(by_1/4)/32 12.32 6.06 -6.26 -50.83%
rng::rotate(vector<int>)_(by_1/4)/50 40.54 12.54 -28.00 -69.07%
rng::rotate(vector<int>)_(by_1/4)/8192 2648.84 391.91 -2256.93 -85.20%
std::rotate(deque<int>)_(1_element_backward)/1024 45.81 45.88 0.08 0.16%
std::rotate(deque<int>)_(1_element_backward)/32 4.81 4.85 0.04 0.76%
std::rotate(deque<int>)_(1_element_backward)/50 6.11 6.10 -0.01 -0.13%
std::rotate(deque<int>)_(1_element_backward)/8192 329.99 330.63 0.64 0.19%
std::rotate(deque<int>)_(1_element_forward)/1024 42.45 42.21 -0.24 -0.57%
std::rotate(deque<int>)_(1_element_forward)/32 5.03 5.14 0.11 2.23%
std::rotate(deque<int>)_(1_element_forward)/50 5.96 6.04 0.09 1.43%
std::rotate(deque<int>)_(1_element_forward)/8192 316.09 317.73 1.64 0.52%
std::rotate(deque<int>)_(by_1/2)/1024 185.34 185.41 0.07 0.04%
std::rotate(deque<int>)_(by_1/2)/32 8.08 8.57 0.49 6.01%
std::rotate(deque<int>)_(by_1/2)/50 11.13 11.61 0.48 4.35%
std::rotate(deque<int>)_(by_1/2)/8192 1549.74 1527.97 -21.77 -1.40%
std::rotate(deque<int>)_(by_1/3)/1024 1837.15 410.26 -1426.89 -77.67%
std::rotate(deque<int>)_(by_1/3)/32 46.73 22.85 -23.88 -51.11%
std::rotate(deque<int>)_(by_1/3)/50 78.05 29.61 -48.43 -62.06%
std::rotate(deque<int>)_(by_1/3)/8192 22784.08 3174.02 -19610.06 -86.07%
std::rotate(deque<int>)_(by_1/4)/1024 874.13 282.91 -591.22 -67.64%
std::rotate(deque<int>)_(by_1/4)/32 31.14 14.31 -16.83 -54.05%
std::rotate(deque<int>)_(by_1/4)/50 77.39 28.67 -48.72 -62.96%
std::rotate(deque<int>)_(by_1/4)/8192 7207.73 2344.73 -4863.00 -67.47%
std::rotate(list<int>)_(1_element_backward)/1024 1490.45 745.80 -744.65 -49.96%
std::rotate(list<int>)_(1_element_backward)/32 16.15 16.43 0.28 1.71%
std::rotate(list<int>)_(1_element_backward)/50 29.28 29.29 0.01 0.02%
std::rotate(list<int>)_(1_element_backward)/8192 9305.23 10372.35 1067.13 11.47%
std::rotate(list<int>)_(1_element_forward)/1024 1479.13 743.71 -735.42 -49.72%
std::rotate(list<int>)_(1_element_forward)/32 15.86 16.21 0.35 2.20%
std::rotate(list<int>)_(1_element_forward)/50 28.91 28.90 -0.01 -0.04%
std::rotate(list<int>)_(1_element_forward)/8192 9796.34 8826.58 -969.76 -9.90%
std::rotate(list<int>)_(by_1/2)/1024 851.06 411.44 -439.61 -51.66%
std::rotate(list<int>)_(by_1/2)/32 8.90 8.80 -0.10 -1.10%
std::rotate(list<int>)_(by_1/2)/50 15.69 15.69 0.00 0.02%
std::rotate(list<int>)_(by_1/2)/8192 6479.34 6599.58 120.24 1.86%
std::rotate(list<int>)_(by_1/3)/1024 865.82 877.81 11.99 1.38%
std::rotate(list<int>)_(by_1/3)/32 19.52 19.44 -0.09 -0.44%
std::rotate(list<int>)_(by_1/3)/50 33.67 33.14 -0.53 -1.59%
std::rotate(list<int>)_(by_1/3)/8192 10513.57 10355.02 -158.55 -1.51%
std::rotate(list<int>)_(by_1/4)/1024 639.35 655.30 15.95 2.49%
std::rotate(list<int>)_(by_1/4)/32 15.22 15.45 0.24 1.55%
std::rotate(list<int>)_(by_1/4)/50 32.51 32.93 0.42 1.29%
std::rotate(list<int>)_(by_1/4)/8192 9883.06 9284.28 -598.79 -6.06%
std::rotate(vector<bool>)_(1_element_backward)/1024 30.77 30.95 0.18 0.58%
std::rotate(vector<bool>)_(1_element_backward)/32 17.73 17.57 -0.16 -0.89%
std::rotate(vector<bool>)_(1_element_backward)/50 17.74 17.58 -0.16 -0.91%
std::rotate(vector<bool>)_(1_element_backward)/8192 111.37 111.04 -0.33 -0.29%
std::rotate(vector<bool>)_(1_element_forward)/1024 30.16 29.92 -0.24 -0.81%
std::rotate(vector<bool>)_(1_element_forward)/32 17.22 17.38 0.16 0.94%
std::rotate(vector<bool>)_(1_element_forward)/50 17.22 17.36 0.14 0.83%
std::rotate(vector<bool>)_(1_element_forward)/8192 111.28 111.19 -0.09 -0.08%
std::rotate(vector<bool>)_(by_1/2)/1024 9.20 9.65 0.46 4.96%
std::rotate(vector<bool>)_(by_1/2)/32 17.23 17.12 -0.11 -0.62%
std::rotate(vector<bool>)_(by_1/2)/50 17.18 17.04 -0.14 -0.82%
std::rotate(vector<bool>)_(by_1/2)/8192 15.93 16.92 0.99 6.22%
std::rotate(vector<bool>)_(by_1/3)/1024 57.99 58.05 0.06 0.10%
std::rotate(vector<bool>)_(by_1/3)/32 17.14 17.06 -0.08 -0.47%
std::rotate(vector<bool>)_(by_1/3)/50 17.09 16.95 -0.15 -0.87%
std::rotate(vector<bool>)_(by_1/3)/8192 164.14 165.70 1.56 0.95%
std::rotate(vector<bool>)_(by_1/4)/1024 14.00 14.21 0.21 1.49%
std::rotate(vector<bool>)_(by_1/4)/32 17.20 17.20 -0.00 -0.00%
std::rotate(vector<bool>)_(by_1/4)/50 17.09 17.10 0.01 0.08%
std::rotate(vector<bool>)_(by_1/4)/8192 33.90 34.16 0.26 0.76%
std::rotate(vector<int>)_(1_element_backward)/1024 37.04 37.54 0.50 1.35%
std::rotate(vector<int>)_(1_element_backward)/32 3.01 3.01 0.00 0.04%
std::rotate(vector<int>)_(1_element_backward)/50 5.30 4.96 -0.34 -6.44%
std::rotate(vector<int>)_(1_element_backward)/8192 302.63 286.92 -15.71 -5.19%
std::rotate(vector<int>)_(1_element_forward)/1024 36.83 36.88 0.05 0.15%
std::rotate(vector<int>)_(1_element_forward)/32 3.08 3.06 -0.02 -0.56%
std::rotate(vector<int>)_(1_element_forward)/50 5.34 5.07 -0.26 -4.95%
std::rotate(vector<int>)_(1_element_forward)/8192 283.85 284.99 1.14 0.40%
std::rotate(vector<int>)_(by_1/2)/1024 32.12 32.19 0.06 0.20%
std::rotate(vector<int>)_(by_1/2)/32 3.83 2.22 -1.61 -42.16%
std::rotate(vector<int>)_(by_1/2)/50 4.28 4.38 0.09 2.19%
std::rotate(vector<int>)_(by_1/2)/8192 256.28 252.63 -3.65 -1.42%
std::rotate(vector<int>)_(by_1/3)/1024 1386.79 142.76 -1244.03 -89.71%
std::rotate(vector<int>)_(by_1/3)/32 21.76 11.54 -10.22 -46.98%
std::rotate(vector<int>)_(by_1/3)/50 40.30 11.79 -28.51 -70.74%
std::rotate(vector<int>)_(by_1/3)/8192 11017.09 949.95 -10067.14 -91.38%
std::rotate(vector<int>)_(by_1/4)/1024 335.51 48.69 -286.83 -85.49%
std::rotate(vector<int>)_(by_1/4)/32 12.31 6.09 -6.22 -50.51%
std::rotate(vector<int>)_(by_1/4)/50 40.82 13.03 -27.79 -68.07%
std::rotate(vector<int>)_(by_1/4)/8192 2647.90 392.58 -2255.32 -85.17%
Geomean 76.74 56.58 -20.16 -26.27%
```
Fixes #54949
|
|
the ranges is known (#171585)
This allows us to merge some optimizations common between the 3-leg
overload and the two ranges overload.
In some cases this could also improve performance, since we avoid
checking one of the iterators if the size if the ranges is known. I
haven't been able to show any improvements though.
This is also a prerequisite for optimizing `std::search`.
|
|
specialization (#172270)
|
|
template (#173151)
|
|
This patch optimizes how `for_each` iterates over trees by using
recursion and storing pointers to the next nodes on the stack. This
avoids pointer chasing through the `__parent_` pointer, reducing cache
misses. It also makes use of the compiler being able tail-call optimize
the recursive function, removing back-tracking the iterators have to do.
```
Benchmark old new Difference % Difference
-------------------------------------------------- -------------- -------------- ------------ --------------
rng::for_each(map<int>)/32 35.19 26.67 -8.52 -24.21%
rng::for_each(map<int>)/50 64.13 40.68 -23.45 -36.57%
rng::for_each(map<int>)/8 5.06 6.49 1.43 28.21%
rng::for_each(map<int>)/8192 22893.89 9266.68 -13627.21 -59.52%
rng::for_each(map<int>::iterator)/32 35.51 26.88 -8.63 -24.31%
rng::for_each(map<int>::iterator)/50 64.39 41.24 -23.15 -35.95%
rng::for_each(map<int>::iterator)/8 5.12 5.93 0.81 15.80%
rng::for_each(map<int>::iterator)/8192 21283.14 9736.83 -11546.31 -54.25%
rng::for_each(multimap<int>)/32 35.22 26.61 -8.61 -24.45%
rng::for_each(multimap<int>)/50 64.10 40.07 -24.03 -37.49%
rng::for_each(multimap<int>)/8 5.08 6.69 1.61 31.70%
rng::for_each(multimap<int>)/8192 23130.44 9026.16 -14104.28 -60.98%
rng::for_each(multimap<int>::iterator)/32 35.40 25.08 -10.32 -29.15%
rng::for_each(multimap<int>::iterator)/50 64.19 38.15 -26.04 -40.56%
rng::for_each(multimap<int>::iterator)/8 5.04 5.25 0.22 4.31%
rng::for_each(multimap<int>::iterator)/8192 22875.97 9392.08 -13483.89 -58.94%
rng::for_each(multiset<int>)/32 35.82 27.11 -8.72 -24.33%
rng::for_each(multiset<int>)/50 62.92 41.59 -21.32 -33.89%
rng::for_each(multiset<int>)/8 4.79 6.79 2.00 41.70%
rng::for_each(multiset<int>)/8192 22642.68 9280.95 -13361.73 -59.01%
rng::for_each(multiset<int>::iterator)/32 35.76 26.71 -9.04 -25.28%
rng::for_each(multiset<int>::iterator)/50 63.44 39.00 -24.44 -38.53%
rng::for_each(multiset<int>::iterator)/8 4.90 5.21 0.30 6.18%
rng::for_each(multiset<int>::iterator)/8192 19930.45 9867.60 -10062.85 -50.49%
rng::for_each(set<int>)/32 35.90 27.30 -8.60 -23.96%
rng::for_each(set<int>)/50 63.15 40.75 -22.40 -35.47%
rng::for_each(set<int>)/8 4.77 6.83 2.06 43.23%
rng::for_each(set<int>)/8192 20262.77 9381.57 -10881.20 -53.70%
rng::for_each(set<int>::iterator)/32 36.02 26.42 -9.60 -26.64%
rng::for_each(set<int>::iterator)/50 63.29 37.97 -25.32 -40.01%
rng::for_each(set<int>::iterator)/8 4.72 5.22 0.50 10.50%
rng::for_each(set<int>::iterator)/8192 20041.91 9831.91 -10210.00 -50.94%
```
|
|
This removes a bit of code duplication and might simplify future
segmented iterator optimitations.
|
|
This essentially reverts #100685 and fixes the bidirectional and random
access specializations to be actually used.
```
Benchmark old new Difference % Difference
------------------------------------------------------------ -------------- -------------- ------------ --------------
rng::find_end(deque<int>)_(match_near_end)/1000 366.91 47.63 -319.28 -87.02%
rng::find_end(deque<int>)_(match_near_end)/1024 3273.31 35.42 -3237.89 -98.92%
rng::find_end(deque<int>)_(match_near_end)/8192 171608.41 285.04 -171323.38 -99.83%
rng::find_end(deque<int>)_(near_matches)/1000 31808.40 19214.35 -12594.05 -39.59%
rng::find_end(deque<int>)_(near_matches)/1024 37428.72 20773.87 -16654.85 -44.50%
rng::find_end(deque<int>)_(near_matches)/8192 1719468.34 1213967.45 -505500.89 -29.40%
rng::find_end(deque<int>)_(process_all)/1000 275.81 336.29 60.49 21.93%
rng::find_end(deque<int>)_(process_all)/1024 258.88 320.36 61.47 23.74%
rng::find_end(deque<int>)_(process_all)/1048576 277117.41 327640.37 50522.96 18.23%
rng::find_end(deque<int>)_(process_all)/8192 2166.36 2533.52 367.16 16.95%
rng::find_end(deque<int>)_(same_length)/1000 1280.06 362.53 -917.53 -71.68%
rng::find_end(deque<int>)_(same_length)/1024 1419.99 417.58 -1002.40 -70.59%
rng::find_end(deque<int>)_(same_length)/8192 11363.81 2870.63 -8493.18 -74.74%
rng::find_end(deque<int>)_(single_element)/1000 277.22 363.52 86.31 31.13%
rng::find_end(deque<int>)_(single_element)/1024 257.11 353.94 96.84 37.66%
rng::find_end(deque<int>)_(single_element)/8192 2059.02 2762.29 703.27 34.16%
rng::find_end(deque<int>,_pred)_(match_near_end)/1000 696.84 70.07 -626.77 -89.94%
rng::find_end(deque<int>,_pred)_(match_near_end)/1024 4774.82 70.75 -4704.07 -98.52%
rng::find_end(deque<int>,_pred)_(match_near_end)/8192 267492.37 549.57 -266942.81 -99.79%
rng::find_end(deque<int>,_pred)_(near_matches)/1000 39414.88 31070.43 -8344.46 -21.17%
rng::find_end(deque<int>,_pred)_(near_matches)/1024 38168.52 32362.18 -5806.34 -15.21%
rng::find_end(deque<int>,_pred)_(near_matches)/8192 2594717.16 1938056.79 -656660.38 -25.31%
rng::find_end(deque<int>,_pred)_(process_all)/1000 600.88 586.92 -13.96 -2.32%
rng::find_end(deque<int>,_pred)_(process_all)/1024 613.00 592.66 -20.33 -3.32%
rng::find_end(deque<int>,_pred)_(process_all)/1048576 600059.65 603440.98 3381.33 0.56%
rng::find_end(deque<int>,_pred)_(process_all)/8192 4850.32 4764.56 -85.76 -1.77%
rng::find_end(deque<int>,_pred)_(same_length)/1000 1514.90 700.34 -814.57 -53.77%
rng::find_end(deque<int>,_pred)_(same_length)/1024 1561.14 705.80 -855.34 -54.79%
rng::find_end(deque<int>,_pred)_(same_length)/8192 12544.84 5024.45 -7520.39 -59.95%
rng::find_end(deque<int>,_pred)_(single_element)/1000 603.79 650.63 46.84 7.76%
rng::find_end(deque<int>,_pred)_(single_element)/1024 614.93 656.43 41.50 6.75%
rng::find_end(deque<int>,_pred)_(single_element)/8192 4885.89 5225.71 339.82 6.96%
rng::find_end(forward_list<int>)_(match_near_end)/1000 770.05 769.32 -0.73 -0.09%
rng::find_end(forward_list<int>)_(match_near_end)/1024 4833.13 4733.24 -99.90 -2.07%
rng::find_end(forward_list<int>)_(match_near_end)/8192 259324.32 261066.84 1742.52 0.67%
rng::find_end(forward_list<int>)_(near_matches)/1000 38301.11 38608.61 307.50 0.80%
rng::find_end(forward_list<int>)_(near_matches)/1024 39370.54 39878.59 508.05 1.29%
rng::find_end(forward_list<int>)_(near_matches)/8192 2527338.50 2527722.47 383.97 0.02%
rng::find_end(forward_list<int>)_(process_all)/1000 713.63 720.74 7.11 1.00%
rng::find_end(forward_list<int>)_(process_all)/1024 727.81 731.60 3.79 0.52%
rng::find_end(forward_list<int>)_(process_all)/1048576 757728.47 766470.14 8741.67 1.15%
rng::find_end(forward_list<int>)_(process_all)/8192 5821.05 5817.80 -3.25 -0.06%
rng::find_end(forward_list<int>)_(same_length)/1000 1458.99 1454.50 -4.49 -0.31%
rng::find_end(forward_list<int>)_(same_length)/1024 1507.73 1515.78 8.05 0.53%
rng::find_end(forward_list<int>)_(same_length)/8192 20432.32 18658.93 -1773.39 -8.68%
rng::find_end(forward_list<int>)_(single_element)/1000 712.41 708.41 -4.00 -0.56%
rng::find_end(forward_list<int>)_(single_element)/1024 728.05 728.78 0.73 0.10%
rng::find_end(forward_list<int>)_(single_element)/8192 5795.48 6332.88 537.40 9.27%
rng::find_end(forward_list<int>,_pred)_(match_near_end)/1000 843.67 846.77 3.10 0.37%
rng::find_end(forward_list<int>,_pred)_(match_near_end)/1024 5267.90 5343.84 75.94 1.44%
rng::find_end(forward_list<int>,_pred)_(match_near_end)/8192 280912.75 286141.10 5228.35 1.86%
rng::find_end(forward_list<int>,_pred)_(near_matches)/1000 43386.35 44489.38 1103.03 2.54%
rng::find_end(forward_list<int>,_pred)_(near_matches)/1024 44929.84 45608.55 678.71 1.51%
rng::find_end(forward_list<int>,_pred)_(near_matches)/8192 2723281.29 2765369.43 42088.14 1.55%
rng::find_end(forward_list<int>,_pred)_(process_all)/1000 763.13 763.85 0.72 0.09%
rng::find_end(forward_list<int>,_pred)_(process_all)/1024 796.98 773.40 -23.58 -2.96%
rng::find_end(forward_list<int>,_pred)_(process_all)/1048576 858071.76 846166.06 -11905.69 -1.39%
rng::find_end(forward_list<int>,_pred)_(process_all)/8192 6282.19 6244.95 -37.24 -0.59%
rng::find_end(forward_list<int>,_pred)_(same_length)/1000 1560.18 1583.03 22.86 1.47%
rng::find_end(forward_list<int>,_pred)_(same_length)/1024 1603.94 1612.22 8.28 0.52%
rng::find_end(forward_list<int>,_pred)_(same_length)/8192 16907.98 15638.35 -1269.63 -7.51%
rng::find_end(forward_list<int>,_pred)_(single_element)/1000 746.72 754.08 7.36 0.99%
rng::find_end(forward_list<int>,_pred)_(single_element)/1024 761.27 771.75 10.48 1.38%
rng::find_end(forward_list<int>,_pred)_(single_element)/8192 6166.83 6687.87 521.04 8.45%
rng::find_end(list<int>)_(match_near_end)/1000 793.99 67.06 -726.93 -91.55%
rng::find_end(list<int>)_(match_near_end)/1024 4682.12 79.82 -4602.31 -98.30%
rng::find_end(list<int>)_(match_near_end)/8192 263187.10 582.64 -262604.46 -99.78%
rng::find_end(list<int>)_(near_matches)/1000 38066.70 34687.59 -3379.11 -8.88%
rng::find_end(list<int>)_(near_matches)/1024 39721.77 36150.04 -3571.73 -8.99%
rng::find_end(list<int>)_(near_matches)/8192 2543369.85 2247297.03 -296072.82 -11.64%
rng::find_end(list<int>)_(process_all)/1000 716.89 726.65 9.76 1.36%
rng::find_end(list<int>)_(process_all)/1024 742.41 744.05 1.64 0.22%
rng::find_end(list<int>)_(process_all)/1048576 822449.08 873801.46 51352.38 6.24%
rng::find_end(list<int>)_(process_all)/8192 7704.49 9766.50 2062.02 26.76%
rng::find_end(list<int>)_(same_length)/1000 1508.19 710.90 -797.28 -52.86%
rng::find_end(list<int>)_(same_length)/1024 1540.23 735.35 -804.88 -52.26%
rng::find_end(list<int>)_(same_length)/8192 22786.44 10752.45 -12033.98 -52.81%
rng::find_end(list<int>)_(single_element)/1000 699.16 734.76 35.60 5.09%
rng::find_end(list<int>)_(single_element)/1024 717.09 750.91 33.82 4.72%
rng::find_end(list<int>)_(single_element)/8192 9502.45 10289.21 786.76 8.28%
rng::find_end(list<int>,_pred)_(match_near_end)/1000 841.98 83.86 -758.12 -90.04%
rng::find_end(list<int>,_pred)_(match_near_end)/1024 5463.71 76.95 -5386.76 -98.59%
rng::find_end(list<int>,_pred)_(match_near_end)/8192 287070.76 647.14 -286423.62 -99.77%
rng::find_end(list<int>,_pred)_(near_matches)/1000 43878.61 38899.00 -4979.61 -11.35%
rng::find_end(list<int>,_pred)_(near_matches)/1024 45672.50 40520.68 -5151.82 -11.28%
rng::find_end(list<int>,_pred)_(near_matches)/8192 2764800.76 2495879.89 -268920.87 -9.73%
rng::find_end(list<int>,_pred)_(process_all)/1000 764.46 774.78 10.32 1.35%
rng::find_end(list<int>,_pred)_(process_all)/1024 786.81 793.05 6.24 0.79%
rng::find_end(list<int>,_pred)_(process_all)/1048576 934166.34 954637.60 20471.26 2.19%
rng::find_end(list<int>,_pred)_(process_all)/8192 9509.24 10209.73 700.49 7.37%
rng::find_end(list<int>,_pred)_(same_length)/1000 1545.67 782.96 -762.71 -49.34%
rng::find_end(list<int>,_pred)_(same_length)/1024 1580.94 796.87 -784.08 -49.60%
rng::find_end(list<int>,_pred)_(same_length)/8192 21558.41 13370.92 -8187.49 -37.98%
rng::find_end(list<int>,_pred)_(single_element)/1000 766.49 762.81 -3.68 -0.48%
rng::find_end(list<int>,_pred)_(single_element)/1024 784.75 781.47 -3.28 -0.42%
rng::find_end(list<int>,_pred)_(single_element)/8192 9722.26 10399.11 676.85 6.96%
rng::find_end(vector<int>)_(match_near_end)/1000 267.82 25.34 -242.48 -90.54%
rng::find_end(vector<int>)_(match_near_end)/1024 2259.46 25.78 -2233.68 -98.86%
rng::find_end(vector<int>)_(match_near_end)/8192 119747.92 214.53 -119533.39 -99.82%
rng::find_end(vector<int>)_(near_matches)/1000 16913.73 14102.20 -2811.53 -16.62%
rng::find_end(vector<int>)_(near_matches)/1024 16097.97 14767.26 -1330.71 -8.27%
rng::find_end(vector<int>)_(near_matches)/8192 1102803.07 823463.30 -279339.78 -25.33%
rng::find_end(vector<int>)_(process_all)/1000 233.43 380.28 146.85 62.91%
rng::find_end(vector<int>)_(process_all)/1024 238.86 389.32 150.46 62.99%
rng::find_end(vector<int>)_(process_all)/1048576 269619.36 391698.75 122079.39 45.28%
rng::find_end(vector<int>)_(process_all)/8192 2011.46 3061.40 1049.94 52.20%
rng::find_end(vector<int>)_(same_length)/1000 632.19 253.50 -378.69 -59.90%
rng::find_end(vector<int>)_(same_length)/1024 556.53 254.87 -301.66 -54.20%
rng::find_end(vector<int>)_(same_length)/8192 4597.26 2095.57 -2501.68 -54.42%
rng::find_end(vector<int>)_(single_element)/1000 231.57 417.64 186.06 80.35%
rng::find_end(vector<int>)_(single_element)/1024 236.41 427.03 190.62 80.63%
rng::find_end(vector<int>)_(single_element)/8192 1918.95 3367.29 1448.33 75.48%
rng::find_end(vector<int>,_pred)_(match_near_end)/1000 581.49 52.67 -528.82 -90.94%
rng::find_end(vector<int>,_pred)_(match_near_end)/1024 3545.40 53.74 -3491.65 -98.48%
rng::find_end(vector<int>,_pred)_(match_near_end)/8192 190482.78 432.30 -190050.48 -99.77%
rng::find_end(vector<int>,_pred)_(near_matches)/1000 28878.24 24723.01 -4155.23 -14.39%
rng::find_end(vector<int>,_pred)_(near_matches)/1024 30035.85 25597.45 -4438.40 -14.78%
rng::find_end(vector<int>,_pred)_(near_matches)/8192 1858596.45 1584796.11 -273800.34 -14.73%
rng::find_end(vector<int>,_pred)_(process_all)/1000 518.92 813.46 294.53 56.76%
rng::find_end(vector<int>,_pred)_(process_all)/1024 531.17 710.20 179.03 33.70%
rng::find_end(vector<int>,_pred)_(process_all)/1048576 674064.13 905070.15 231006.01 34.27%
rng::find_end(vector<int>,_pred)_(process_all)/8192 4254.34 6372.76 2118.43 49.79%
rng::find_end(vector<int>,_pred)_(same_length)/1000 1106.96 526.23 -580.73 -52.46%
rng::find_end(vector<int>,_pred)_(same_length)/1024 1133.60 539.70 -593.90 -52.39%
rng::find_end(vector<int>,_pred)_(same_length)/8192 8988.10 4302.83 -4685.27 -52.13%
rng::find_end(vector<int>,_pred)_(single_element)/1000 528.11 523.69 -4.42 -0.84%
rng::find_end(vector<int>,_pred)_(single_element)/1024 539.58 838.49 298.91 55.40%
rng::find_end(vector<int>,_pred)_(single_element)/8192 4301.43 7313.22 3011.79 70.02%
std::find_end(deque<int>)_(match_near_end)/1000 347.82 38.56 -309.26 -88.91%
std::find_end(deque<int>)_(match_near_end)/1024 3340.80 34.54 -3306.27 -98.97%
std::find_end(deque<int>)_(match_near_end)/8192 171599.83 281.87 -171317.96 -99.84%
std::find_end(deque<int>)_(near_matches)/1000 29703.68 19712.27 -9991.41 -33.64%
std::find_end(deque<int>)_(near_matches)/1024 32312.41 20008.21 -12304.20 -38.08%
std::find_end(deque<int>)_(near_matches)/8192 1851286.99 1216112.34 -635174.65 -34.31%
std::find_end(deque<int>)_(process_all)/1000 256.69 315.96 59.27 23.09%
std::find_end(deque<int>)_(process_all)/1024 260.97 305.42 44.45 17.03%
std::find_end(deque<int>)_(process_all)/1048576 273310.08 309499.13 36189.05 13.24%
std::find_end(deque<int>)_(process_all)/8192 2071.33 2606.57 535.25 25.84%
std::find_end(deque<int>)_(same_length)/1000 1422.58 441.07 -981.51 -68.99%
std::find_end(deque<int>)_(same_length)/1024 1844.27 350.75 -1493.52 -80.98%
std::find_end(deque<int>)_(same_length)/8192 14681.69 2839.26 -11842.43 -80.66%
std::find_end(deque<int>)_(single_element)/1000 291.63 344.82 53.19 18.24%
std::find_end(deque<int>)_(single_element)/1024 257.97 330.19 72.21 27.99%
std::find_end(deque<int>)_(single_element)/8192 2220.10 2505.02 284.92 12.83%
std::find_end(deque<int>,_pred)_(match_near_end)/1000 694.70 69.60 -625.11 -89.98%
std::find_end(deque<int>,_pred)_(match_near_end)/1024 4735.45 71.12 -4664.33 -98.50%
std::find_end(deque<int>,_pred)_(match_near_end)/8192 267417.02 561.03 -266855.99 -99.79%
std::find_end(deque<int>,_pred)_(near_matches)/1000 42199.71 31597.49 -10602.22 -25.12%
std::find_end(deque<int>,_pred)_(near_matches)/1024 38007.49 32362.16 -5645.33 -14.85%
std::find_end(deque<int>,_pred)_(near_matches)/8192 2607708.49 1935799.88 -671908.60 -25.77%
std::find_end(deque<int>,_pred)_(process_all)/1000 599.65 552.71 -46.94 -7.83%
std::find_end(deque<int>,_pred)_(process_all)/1024 615.88 554.17 -61.71 -10.02%
std::find_end(deque<int>,_pred)_(process_all)/1048576 598471.63 599441.79 970.16 0.16%
std::find_end(deque<int>,_pred)_(process_all)/8192 4853.45 4394.20 -459.25 -9.46%
std::find_end(deque<int>,_pred)_(same_length)/1000 1511.68 797.64 -714.04 -47.23%
std::find_end(deque<int>,_pred)_(same_length)/1024 1568.63 810.85 -757.78 -48.31%
std::find_end(deque<int>,_pred)_(same_length)/8192 12609.34 5092.02 -7517.32 -59.62%
std::find_end(deque<int>,_pred)_(single_element)/1000 601.22 628.80 27.58 4.59%
std::find_end(deque<int>,_pred)_(single_element)/1024 613.25 627.15 13.89 2.27%
std::find_end(deque<int>,_pred)_(single_element)/8192 4823.85 4795.25 -28.60 -0.59%
std::find_end(forward_list<int>)_(match_near_end)/1000 762.64 769.74 7.10 0.93%
std::find_end(forward_list<int>)_(match_near_end)/1024 4767.93 4840.87 72.94 1.53%
std::find_end(forward_list<int>)_(match_near_end)/8192 260275.68 260835.21 559.53 0.21%
std::find_end(forward_list<int>)_(near_matches)/1000 38020.76 38197.53 176.77 0.46%
std::find_end(forward_list<int>)_(near_matches)/1024 39028.86 39333.38 304.51 0.78%
std::find_end(forward_list<int>)_(near_matches)/8192 2524921.48 2523470.32 -1451.16 -0.06%
std::find_end(forward_list<int>)_(process_all)/1000 699.95 699.93 -0.02 -0.00%
std::find_end(forward_list<int>)_(process_all)/1024 715.24 712.07 -3.17 -0.44%
std::find_end(forward_list<int>)_(process_all)/1048576 755926.33 756976.31 1049.98 0.14%
std::find_end(forward_list<int>)_(process_all)/8192 5696.72 5672.92 -23.81 -0.42%
std::find_end(forward_list<int>)_(same_length)/1000 1485.84 1480.19 -5.65 -0.38%
std::find_end(forward_list<int>)_(same_length)/1024 1493.62 1516.95 23.33 1.56%
std::find_end(forward_list<int>)_(same_length)/8192 16833.75 13551.42 -3282.33 -19.50%
std::find_end(forward_list<int>)_(single_element)/1000 688.87 675.02 -13.85 -2.01%
std::find_end(forward_list<int>)_(single_element)/1024 688.89 691.59 2.69 0.39%
std::find_end(forward_list<int>)_(single_element)/8192 5735.87 6748.85 1012.98 17.66%
std::find_end(forward_list<int>,_pred)_(match_near_end)/1000 836.01 853.28 17.27 2.07%
std::find_end(forward_list<int>,_pred)_(match_near_end)/1024 5259.92 5299.30 39.39 0.75%
std::find_end(forward_list<int>,_pred)_(match_near_end)/8192 279479.85 285593.49 6113.65 2.19%
std::find_end(forward_list<int>,_pred)_(near_matches)/1000 42577.60 44550.54 1972.94 4.63%
std::find_end(forward_list<int>,_pred)_(near_matches)/1024 44374.19 45697.95 1323.76 2.98%
std::find_end(forward_list<int>,_pred)_(near_matches)/8192 2711138.03 2742988.33 31850.30 1.17%
std::find_end(forward_list<int>,_pred)_(process_all)/1000 752.03 762.75 10.72 1.43%
std::find_end(forward_list<int>,_pred)_(process_all)/1024 767.04 781.48 14.44 1.88%
std::find_end(forward_list<int>,_pred)_(process_all)/1048576 843453.35 861838.82 18385.47 2.18%
std::find_end(forward_list<int>,_pred)_(process_all)/8192 6241.65 6308.05 66.40 1.06%
std::find_end(forward_list<int>,_pred)_(same_length)/1000 2384.18 1589.21 -794.97 -33.34%
std::find_end(forward_list<int>,_pred)_(same_length)/1024 2428.97 1617.17 -811.80 -33.42%
std::find_end(forward_list<int>,_pred)_(same_length)/8192 16961.22 14972.86 -1988.36 -11.72%
std::find_end(forward_list<int>,_pred)_(single_element)/1000 743.31 752.77 9.47 1.27%
std::find_end(forward_list<int>,_pred)_(single_element)/1024 763.62 768.70 5.08 0.67%
std::find_end(forward_list<int>,_pred)_(single_element)/8192 6189.73 6934.04 744.31 12.02%
std::find_end(list<int>)_(match_near_end)/1000 773.76 76.41 -697.35 -90.12%
std::find_end(list<int>)_(match_near_end)/1024 4715.36 69.09 -4646.27 -98.53%
std::find_end(list<int>)_(match_near_end)/8192 264864.51 584.19 -264280.32 -99.78%
std::find_end(list<int>)_(near_matches)/1000 37650.69 35233.45 -2417.24 -6.42%
std::find_end(list<int>)_(near_matches)/1024 39239.25 36699.13 -2540.13 -6.47%
std::find_end(list<int>)_(near_matches)/8192 2543446.71 2252625.27 -290821.44 -11.43%
std::find_end(list<int>)_(process_all)/1000 718.00 724.59 6.59 0.92%
std::find_end(list<int>)_(process_all)/1024 735.14 746.70 11.57 1.57%
std::find_end(list<int>)_(process_all)/1048576 812620.48 869606.78 56986.30 7.01%
std::find_end(list<int>)_(process_all)/8192 8217.98 8462.53 244.55 2.98%
std::find_end(list<int>)_(same_length)/1000 1500.85 716.45 -784.39 -52.26%
std::find_end(list<int>)_(same_length)/1024 1534.13 736.62 -797.51 -51.98%
std::find_end(list<int>)_(same_length)/8192 20274.06 10621.82 -9652.24 -47.61%
std::find_end(list<int>)_(single_element)/1000 717.05 725.64 8.60 1.20%
std::find_end(list<int>)_(single_element)/1024 732.87 742.44 9.57 1.31%
std::find_end(list<int>)_(single_element)/8192 9835.11 11896.39 2061.28 20.96%
std::find_end(list<int>,_pred)_(match_near_end)/1000 845.46 75.09 -770.37 -91.12%
std::find_end(list<int>,_pred)_(match_near_end)/1024 5301.60 77.14 -5224.46 -98.54%
std::find_end(list<int>,_pred)_(match_near_end)/8192 281976.13 648.87 -281327.25 -99.77%
std::find_end(list<int>,_pred)_(near_matches)/1000 44076.98 39576.32 -4500.67 -10.21%
std::find_end(list<int>,_pred)_(near_matches)/1024 45531.64 41020.11 -4511.54 -9.91%
std::find_end(list<int>,_pred)_(near_matches)/8192 2756383.66 2503085.29 -253298.37 -9.19%
std::find_end(list<int>,_pred)_(process_all)/1000 766.06 764.48 -1.58 -0.21%
std::find_end(list<int>,_pred)_(process_all)/1024 780.35 799.51 19.15 2.45%
std::find_end(list<int>,_pred)_(process_all)/1048576 894643.71 898947.94 4304.24 0.48%
std::find_end(list<int>,_pred)_(process_all)/8192 8436.41 9977.74 1541.33 18.27%
std::find_end(list<int>,_pred)_(same_length)/1000 1545.22 784.29 -760.92 -49.24%
std::find_end(list<int>,_pred)_(same_length)/1024 1583.27 808.52 -774.74 -48.93%
std::find_end(list<int>,_pred)_(same_length)/8192 21850.99 10896.50 -10954.48 -50.13%
std::find_end(list<int>,_pred)_(single_element)/1000 752.03 755.00 2.97 0.39%
std::find_end(list<int>,_pred)_(single_element)/1024 774.22 784.14 9.92 1.28%
std::find_end(list<int>,_pred)_(single_element)/8192 10219.43 10396.49 177.05 1.73%
std::find_end(vector<int>)_(match_near_end)/1000 277.37 28.45 -248.91 -89.74%
std::find_end(vector<int>)_(match_near_end)/1024 2247.56 25.80 -2221.76 -98.85%
std::find_end(vector<int>)_(match_near_end)/8192 119785.10 212.44 -119572.66 -99.82%
std::find_end(vector<int>)_(near_matches)/1000 16351.34 14073.13 -2278.21 -13.93%
std::find_end(vector<int>)_(near_matches)/1024 16656.33 14654.36 -2001.97 -12.02%
std::find_end(vector<int>)_(near_matches)/8192 1181392.88 828918.96 -352473.91 -29.84%
std::find_end(vector<int>)_(process_all)/1000 231.14 235.80 4.66 2.01%
std::find_end(vector<int>)_(process_all)/1024 235.87 232.06 -3.81 -1.61%
std::find_end(vector<int>)_(process_all)/1048576 239922.25 238229.38 -1692.87 -0.71%
std::find_end(vector<int>)_(process_all)/8192 1837.43 1802.25 -35.19 -1.91%
std::find_end(vector<int>)_(same_length)/1000 632.59 252.80 -379.79 -60.04%
std::find_end(vector<int>)_(same_length)/1024 524.51 257.58 -266.94 -50.89%
std::find_end(vector<int>)_(same_length)/8192 5159.01 2090.12 -3068.89 -59.49%
std::find_end(vector<int>)_(single_element)/1000 229.56 250.47 20.91 9.11%
std::find_end(vector<int>)_(single_element)/1024 234.86 252.18 17.32 7.37%
std::find_end(vector<int>)_(single_element)/8192 1825.74 1981.90 156.16 8.55%
std::find_end(vector<int>,_pred)_(match_near_end)/1000 574.17 52.98 -521.19 -90.77%
std::find_end(vector<int>,_pred)_(match_near_end)/1024 3525.35 54.03 -3471.32 -98.47%
std::find_end(vector<int>,_pred)_(match_near_end)/8192 190155.81 423.41 -189732.40 -99.78%
std::find_end(vector<int>,_pred)_(near_matches)/1000 28541.98 24598.37 -3943.61 -13.82%
std::find_end(vector<int>,_pred)_(near_matches)/1024 29696.55 25675.27 -4021.28 -13.54%
std::find_end(vector<int>,_pred)_(near_matches)/8192 1846970.41 1596191.84 -250778.57 -13.58%
std::find_end(vector<int>,_pred)_(process_all)/1000 519.71 592.14 72.43 13.94%
std::find_end(vector<int>,_pred)_(process_all)/1024 529.74 491.07 -38.67 -7.30%
std::find_end(vector<int>,_pred)_(process_all)/1048576 631923.41 643729.57 11806.16 1.87%
std::find_end(vector<int>,_pred)_(process_all)/8192 4215.05 3909.30 -305.75 -7.25%
std::find_end(vector<int>,_pred)_(same_length)/1000 1095.46 524.99 -570.47 -52.08%
std::find_end(vector<int>,_pred)_(same_length)/1024 1117.95 537.65 -580.31 -51.91%
std::find_end(vector<int>,_pred)_(same_length)/8192 8923.95 4307.13 -4616.83 -51.74%
std::find_end(vector<int>,_pred)_(single_element)/1000 516.52 656.32 139.80 27.07%
std::find_end(vector<int>,_pred)_(single_element)/1024 528.82 673.72 144.90 27.40%
std::find_end(vector<int>,_pred)_(single_element)/8192 4210.37 5529.52 1319.15 31.33%
Geomean 6995.43 3440.97 -3554.46 -50.81%
```
|
|
The segmented iterator optimized implementation of find now unwraps
iterators when processing each segments.
As a result, it is able to take better advantage to some find
specializations: calling memchr/wmemchr for vector<vector<{char,int}>>
```
Benchmark Baseline Candidate Difference % Difference
-------------------------------------------------------------- ---------- ----------- ------------ --------------
rng::find(join_view(deque<deque<int>>))_(process_all)/1024 71.13 61.19 -9.94 -13.97
rng::find(join_view(deque<deque<int>>))_(process_all)/32768 2359.19 2237.02 -122.17 -5.18
rng::find(join_view(deque<deque<int>>))_(process_all)/50 16.88 17.59 0.71 4.20
rng::find(join_view(deque<deque<int>>))_(process_all)/8 15.59 16.10 0.51 3.27
rng::find(join_view(deque<deque<int>>))_(process_all)/8192 647.01 532.75 -114.26 -17.66
rng::find(join_view(list<vector<int>>))_(process_all)/1024 689.76 680.74 -9.02 -1.31
rng::find(join_view(list<vector<int>>))_(process_all)/32768 22284.95 21500.26 -784.69 -3.52
rng::find(join_view(list<vector<int>>))_(process_all)/50 32.77 32.12 -0.65 -1.98
rng::find(join_view(list<vector<int>>))_(process_all)/8 6.11 5.92 -0.19 -3.11
rng::find(join_view(list<vector<int>>))_(process_all)/8192 5527.88 5373.43 -154.45 -2.79
rng::find(join_view(vector<list<int>>))_(process_all)/1024 1305.59 1264.04 -41.55 -3.18
rng::find(join_view(vector<list<int>>))_(process_all)/32768 42840.88 43322.64 481.76 1.12
rng::find(join_view(vector<list<int>>))_(process_all)/50 57.52 62.35 4.82 8.38
rng::find(join_view(vector<list<int>>))_(process_all)/8 6.06 5.98 -0.07 -1.18
rng::find(join_view(vector<list<int>>))_(process_all)/8192 20700.53 21431.66 731.12 3.53
rng::find(join_view(vector<vector<char>>))_(process_all)/1024 310.64 18.34 -292.30 -94.09
rng::find(join_view(vector<vector<char>>))_(process_all)/32768 9424.96 531.99 -8892.97 -94.36
rng::find(join_view(vector<vector<char>>))_(process_all)/50 18.58 3.25 -15.32 -82.49
rng::find(join_view(vector<vector<char>>))_(process_all)/8 4.81 2.98 -1.84 -38.13
rng::find(join_view(vector<vector<char>>))_(process_all)/8192 2437.50 126.88 -2310.62 -94.79
rng::find(join_view(vector<vector<int>>))_(process_all)/1024 297.10 41.70 -255.39 -85.96
rng::find(join_view(vector<vector<int>>))_(process_all)/32768 9662.42 1822.05 -7840.36 -81.14
rng::find(join_view(vector<vector<int>>))_(process_all)/50 22.29 5.10 -17.19 -77.11
rng::find(join_view(vector<vector<int>>))_(process_all)/8 3.73 3.13 -0.60 -16.05
rng::find(join_view(vector<vector<int>>))_(process_all)/8192 2399.68 356.10 -2043.58 -85.16
```
|
|
vector::insert to assign (#157444)
This reduces the amount of code we have to maintain a bit.
This also simplifies `vector` by using the internal API instead of
`#if`s to switch based on language dialect.
|
|
|
|
```
---------------------------------------------------
Benchmark old new
---------------------------------------------------
BM_num_get<bool> 86.5 ns 32.3 ns
BM_num_get<long> 82.1 ns 30.3 ns
BM_num_get<long long> 85.2 ns 33.4 ns
BM_num_get<unsigned short> 85.3 ns 31.2 ns
BM_num_get<unsigned int> 84.2 ns 31.1 ns
BM_num_get<unsigned long> 83.6 ns 31.9 ns
BM_num_get<unsigned long long> 87.7 ns 31.5 ns
BM_num_get<float> 116 ns 114 ns
BM_num_get<double> 114 ns 114 ns
BM_num_get<long double> 113 ns 114 ns
BM_num_get<void*> 151 ns 144 ns
```
This patch applies multiple optimizations:
- Stages two and three of do_get are merged and a custom integer parser
has been implemented
This avoids allocations, removes the need for strto{,u}ll and avoids
__stage2_int_loop (avoiding extra writes to memory)
- std::find has been replaced with __atoms_offset, which uses vector
instructions to look for a character
Fixes #158100
Fixes #158102
|
|
This allows propagating optimizations to different algorithms by just
optimizing the lowest one. This is especially relevant now that we start
optimizing how we're iterating through ranges (e.g. the segmented
iterator optimizations) and adding assumptions so the compier can better
leverage semantics guaranteed by the standard (e.g.
`__builtin_assume_dereferenceable`).
|
|
This patch does two things.
(1) It replaces SFINAE with `if constexpr`, avoiding some overload
resolution and unnecessary boilerplate.
(2) It removes an overload from `__for_each_n` to forward to
`__for_each`, since `__for_each` doesn't provide any further
optimizations.
|
|
Part of #102817.
This patch optimizes `rng::generate_n` for segmented iterators by
forwarding the implementation directly to `std::generate_n`.
- before
```
rng::generate_n(deque<int>)/32 21.7 ns 22.0 ns 32000000
rng::generate_n(deque<int>)/50 30.8 ns 30.7 ns 22400000
rng::generate_n(deque<int>)/1024 492 ns 488 ns 1120000
rng::generate_n(deque<int>)/8192 3938 ns 3924 ns 179200
```
- after
```
rng::generate_n(deque<int>)/32 11.0 ns 11.0 ns 64000000
rng::generate_n(deque<int>)/50 16.2 ns 16.1 ns 40727273
rng::generate_n(deque<int>)/1024 292 ns 286 ns 2240000
rng::generate_n(deque<int>)/8192 2291 ns 2302 ns 298667
```
|
|
It seems this was accidentally included; there's no use of std::move in
this header.
|
|
Part of #102817.
This is a natural follow-up to #163006. We are forwarding
`std::generate_n` to `std::__for_each_n` (`std::for_each_n` needs
c++17), resulting in improved performance for segmented iterators.
before:
```
std::generate_n(deque<int>)/32 17.5 ns 17.3 ns 40727273
std::generate_n(deque<int>)/50 25.7 ns 25.5 ns 26352941
std::generate_n(deque<int>)/1024 490 ns 487 ns 1445161
std::generate_n(deque<int>)/8192 3908 ns 3924 ns 179200
```
after:
```
std::generate_n(deque<int>)/32 11.1 ns 11.0 ns 64000000
std::generate_n(deque<int>)/50 16.1 ns 16.0 ns 44800000
std::generate_n(deque<int>)/1024 291 ns 292 ns 2357895
std::generate_n(deque<int>)/8192 2269 ns 2250 ns 298667
```
|
|
Part of #102817.
This patch attempts to optimize the performance of `std::generate` for
segmented iterators. Below are the benchmark numbers from
`libcxx\test\benchmarks\algorithms\modifying\generate.bench.cpp`. Test
cases that use segmented iterators have also been added.
- before
```
std::generate(deque<int>)/32 194 ns 193 ns 3733333
std::generate(deque<int>)/50 276 ns 276 ns 2488889
std::generate(deque<int>)/1024 5096 ns 5022 ns 112000
std::generate(deque<int>)/8192 40806 ns 40806 ns 17231
```
- after
```
std::generate(deque<int>)/32 106 ns 105 ns 6400000
std::generate(deque<int>)/50 139 ns 138 ns 4977778
std::generate(deque<int>)/1024 2713 ns 2699 ns 248889
std::generate(deque<int>)/8192 18983 ns 19252 ns 37333
```
---------
Co-authored-by: A. Jiang <de34@live.cn>
|
|
(#132665)
This patch optimizes `std::fill`, `std::fill_n`, `std::ranges::fill`,
and `std::ranges::fill_n` for segmented iterators, achieving substantial
performance improvements. Specifically, for `deque<int>` iterators, the
performance improvements are above 10x for all these algorithms. The
optimization also enables filling segmented memory of `deque<int>` to
approach the performance of filling contiguous memory of `vector<int>`.
Benchmark results comparing the before and after implementations are
provided below. For additional context, we’ve included `vector<int>`
results, which remain unchanged, as this patch specifically targets
segmented iterators and leaves non-segmented iterator behavior
untouched.
Fixes two subtasks outlined in #102817.
#### `fill_n`
```
-----------------------------------------------------------------------------
Benchmark Before After Speedup
-----------------------------------------------------------------------------
std::fill_n(deque<int>)/32 11.4 ns 2.28 ns 5.0x
std::fill_n(deque<int>)/50 19.7 ns 3.40 ns 5.8x
std::fill_n(deque<int>)/1024 391 ns 37.3 ns 10.5x
std::fill_n(deque<int>)/8192 3174 ns 301 ns 10.5x
std::fill_n(deque<int>)/65536 26504 ns 2951 ns 9.0x
std::fill_n(deque<int>)/1048576 407960 ns 80658 ns 5.1x
rng::fill_n(deque<int>)/32 14.3 ns 2.15 ns 6.6x
rng::fill_n(deque<int>)/50 20.2 ns 3.22 ns 6.3x
rng::fill_n(deque<int>)/1024 381 ns 37.8 ns 10.1x
rng::fill_n(deque<int>)/8192 3101 ns 294 ns 10.5x
rng::fill_n(deque<int>)/65536 25098 ns 2926 ns 8.6x
rng::fill_n(deque<int>)/1048576 394342 ns 78874 ns 5.0x
std::fill_n(vector<int>)/32 1.76 ns 1.72 ns 1.0x
std::fill_n(vector<int>)/50 3.00 ns 2.73 ns 1.1x
std::fill_n(vector<int>)/1024 38.4 ns 37.9 ns 1.0x
std::fill_n(vector<int>)/8192 258 ns 252 ns 1.0x
std::fill_n(vector<int>)/65536 2993 ns 2889 ns 1.0x
std::fill_n(vector<int>)/1048576 80328 ns 80468 ns 1.0x
rng::fill_n(vector<int>)/32 1.99 ns 1.35 ns 1.5x
rng::fill_n(vector<int>)/50 2.66 ns 2.12 ns 1.3x
rng::fill_n(vector<int>)/1024 37.7 ns 35.8 ns 1.1x
rng::fill_n(vector<int>)/8192 253 ns 250 ns 1.0x
rng::fill_n(vector<int>)/65536 2922 ns 2930 ns 1.0x
rng::fill_n(vector<int>)/1048576 79739 ns 79742 ns 1.0x
```
#### `fill`
```
--------------------------------------------------------------------------
Benchmark Before After Speedup
--------------------------------------------------------------------------
std::fill(deque<int>)/32 13.7 ns 2.45 ns 5.6x
std::fill(deque<int>)/50 21.7 ns 4.57 ns 4.7x
std::fill(deque<int>)/1024 367 ns 38.5 ns 9.5x
std::fill(deque<int>)/8192 2896 ns 247 ns 11.7x
std::fill(deque<int>)/65536 23723 ns 2907 ns 8.2x
std::fill(deque<int>)/1048576 379043 ns 79885 ns 4.7x
rng::fill(deque<int>)/32 13.6 ns 2.70 ns 5.0x
rng::fill(deque<int>)/50 23.4 ns 3.94 ns 5.9x
rng::fill(deque<int>)/1024 377 ns 37.9 ns 9.9x
rng::fill(deque<int>)/8192 2914 ns 286 ns 10.2x
rng::fill(deque<int>)/65536 23612 ns 2939 ns 8.0x
rng::fill(deque<int>)/1048576 379841 ns 80079 ns 4.7x
std::fill(vector<int>)/32 1.99 ns 1.79 ns 1.1x
std::fill(vector<int>)/50 3.05 ns 3.06 ns 1.0x
std::fill(vector<int>)/1024 37.6 ns 38.0 ns 1.0x
std::fill(vector<int>)/8192 255 ns 257 ns 1.0x
std::fill(vector<int>)/65536 2966 ns 2981 ns 1.0x
std::fill(vector<int>)/1048576 78300 ns 80348 ns 1.0x
rng::fill(vector<int>)/32 1.77 ns 1.75 ns 1.0x
rng::fill(vector<int>)/50 4.85 ns 2.31 ns 2.1x
rng::fill(vector<int>)/1024 39.6 ns 36.1 ns 1.1x
rng::fill(vector<int>)/8192 238 ns 251 ns 0.9x
rng::fill(vector<int>)/65536 2941 ns 2918 ns 1.0x
rng::fill(vector<int>)/1048576 80497 ns 80442 ns 1.0x
```
---------
Co-authored-by: Louis Dionne <ldionne.2@gmail.com>
Co-authored-by: A. Jiang <de34@live.cn>
|
|
Currently, `size_t` and `__libcpp_is_constant_evaluated` are obtained by
transitive inclusion in `<__algorithm/find.h>` when `<cwchar>` is not
included. This broke module build when `_LIBCPP_HAS_WIDE_CHARACTERS` is
`1` and caused CI failure. We should explicitly include the
corresponding internal headers.
|
|
This renames all the `iterator_traits` alises to be
`__iterator_<type-name>`. e.g `iterator_traits<T>::value_type` will be
`__iterator_value_type<T>`.
|
|
(#160738) (#161485)
This reverts commit b86aaacf28b358b187071bc87075f1faa2d65c4e.
The issue in LLVM has been fixed now.
|
|
(#161362)
|
|
```
Apple M4:
-----------------------------------------------------------------------------
Benchmark old new
-----------------------------------------------------------------------------
std::find(vector<char>) (bail 25%)/8 1.43 ns 1.44 ns
std::find(vector<char>) (bail 25%)/1024 5.54 ns 5.59 ns
std::find(vector<char>) (bail 25%)/8192 38.4 ns 39.1 ns
std::find(vector<char>) (bail 25%)/32768 134 ns 136 ns
std::find(vector<int>) (bail 25%)/8 1.56 ns 1.57 ns
std::find(vector<int>) (bail 25%)/1024 65.3 ns 65.4 ns
std::find(vector<int>) (bail 25%)/8192 465 ns 464 ns
std::find(vector<int>) (bail 25%)/32768 1832 ns 1832 ns
std::find(vector<long long>) (bail 25%)/8 0.920 ns 1.20 ns
std::find(vector<long long>) (bail 25%)/1024 65.2 ns 31.2 ns
std::find(vector<long long>) (bail 25%)/8192 464 ns 255 ns
std::find(vector<long long>) (bail 25%)/32768 1833 ns 992 ns
std::find(vector<char>) (process all)/8 1.21 ns 1.22 ns
std::find(vector<char>) (process all)/50 1.92 ns 1.93 ns
std::find(vector<char>) (process all)/1024 16.6 ns 16.9 ns
std::find(vector<char>) (process all)/8192 134 ns 136 ns
std::find(vector<char>) (process all)/32768 488 ns 503 ns
std::find(vector<int>) (process all)/8 2.45 ns 2.48 ns
std::find(vector<int>) (process all)/50 12.7 ns 12.7 ns
std::find(vector<int>) (process all)/1024 236 ns 236 ns
std::find(vector<int>) (process all)/8192 1830 ns 1834 ns
std::find(vector<int>) (process all)/32768 7351 ns 7346 ns
std::find(vector<long long>) (process all)/8 2.02 ns 1.45 ns
std::find(vector<long long>) (process all)/50 12.0 ns 6.12 ns
std::find(vector<long long>) (process all)/1024 235 ns 123 ns
std::find(vector<long long>) (process all)/8192 1830 ns 983 ns
std::find(vector<long long>) (process all)/32768 7306 ns 3969 ns
std::find(vector<bool>) (process all)/8 1.14 ns 1.15 ns
std::find(vector<bool>) (process all)/50 1.16 ns 1.17 ns
std::find(vector<bool>) (process all)/1024 4.51 ns 4.53 ns
std::find(vector<bool>) (process all)/8192 33.6 ns 33.5 ns
std::find(vector<bool>) (process all)/1048576 3660 ns 3660 ns
```
|
|
(#160738)
Reverts llvm/llvm-project#157866
It breaks a lot of sanitizer buildbots
|
|
This patch adds additional overloads to `map::at` in case its known that
the argument is transparently comparable to the key type. This avoids
actually constructing the key type in some cases, potentially removing
allocations.
```
--------------------------------------------------------
Benchmark old new
--------------------------------------------------------
BM_map_find_string_literal 12.8 ns 2.68 ns
```
|
|
`static` is not allowed inside constexpr functions in C++ versions below
23. This is causing a build error when compiled with GCC and warning
when compiled with Clang. This patch fixes this.
---------
Co-authored-by: A. Jiang <de34@live.cn>
|
|
```
Apple M4:
-----------------------------------------------------------------------------------------------------
Benchmark old new
-----------------------------------------------------------------------------------------------------
BM_MakeHeap_uint32_Random_1 0.285 ns 0.271 ns
BM_MakeHeap_uint32_Random_4 2.09 ns 1.80 ns
BM_MakeHeap_uint32_Random_16 1.85 ns 1.83 ns
BM_MakeHeap_uint32_Random_64 1.92 ns 1.50 ns
BM_MakeHeap_uint32_Random_256 2.10 ns 1.87 ns
BM_MakeHeap_uint32_Random_1024 1.73 ns 1.86 ns
BM_MakeHeap_uint32_Random_16384 2.17 ns 2.05 ns
BM_MakeHeap_uint32_Random_262144 1.77 ns 1.77 ns
BM_MakeHeap_uint32_Ascending_1 0.288 ns 0.277 ns
BM_MakeHeap_uint32_Ascending_4 0.658 ns 0.481 ns
BM_MakeHeap_uint32_Ascending_16 0.636 ns 0.637 ns
BM_MakeHeap_uint32_Ascending_64 0.643 ns 0.601 ns
BM_MakeHeap_uint32_Ascending_256 0.710 ns 0.636 ns
BM_MakeHeap_uint32_Ascending_1024 0.747 ns 0.660 ns
BM_MakeHeap_uint32_Ascending_16384 0.713 ns 0.633 ns
BM_MakeHeap_uint32_Ascending_262144 0.769 ns 0.731 ns
BM_MakeHeap_uint32_Descending_1 0.294 ns 0.280 ns
BM_MakeHeap_uint32_Descending_4 0.379 ns 0.305 ns
BM_MakeHeap_uint32_Descending_16 0.376 ns 0.268 ns
BM_MakeHeap_uint32_Descending_64 0.358 ns 0.271 ns
BM_MakeHeap_uint32_Descending_256 0.377 ns 0.284 ns
BM_MakeHeap_uint32_Descending_1024 0.355 ns 0.267 ns
BM_MakeHeap_uint32_Descending_16384 0.348 ns 0.248 ns
BM_MakeHeap_uint32_Descending_262144 0.349 ns 0.247 ns
BM_MakeHeap_uint32_SingleElement_1 0.292 ns 0.280 ns
BM_MakeHeap_uint32_SingleElement_4 0.570 ns 0.332 ns
BM_MakeHeap_uint32_SingleElement_16 0.635 ns 0.604 ns
BM_MakeHeap_uint32_SingleElement_64 0.653 ns 0.567 ns
BM_MakeHeap_uint32_SingleElement_256 0.703 ns 0.609 ns
BM_MakeHeap_uint32_SingleElement_1024 0.737 ns 0.604 ns
BM_MakeHeap_uint32_SingleElement_16384 0.699 ns 0.574 ns
BM_MakeHeap_uint32_SingleElement_262144 0.803 ns 0.684 ns
BM_MakeHeap_uint32_PipeOrgan_1 0.291 ns 0.284 ns
BM_MakeHeap_uint32_PipeOrgan_4 0.588 ns 0.399 ns
BM_MakeHeap_uint32_PipeOrgan_16 0.648 ns 1.12 ns
BM_MakeHeap_uint32_PipeOrgan_64 0.662 ns 0.771 ns
BM_MakeHeap_uint32_PipeOrgan_256 0.723 ns 0.672 ns
BM_MakeHeap_uint32_PipeOrgan_1024 0.749 ns 0.674 ns
BM_MakeHeap_uint32_PipeOrgan_16384 0.708 ns 0.638 ns
BM_MakeHeap_uint32_PipeOrgan_262144 0.786 ns 0.743 ns
BM_MakeHeap_uint32_Heap_1 0.298 ns 0.282 ns
BM_MakeHeap_uint32_Heap_4 0.396 ns 0.308 ns
BM_MakeHeap_uint32_Heap_16 0.377 ns 0.268 ns
BM_MakeHeap_uint32_Heap_64 0.356 ns 0.271 ns
BM_MakeHeap_uint32_Heap_256 0.378 ns 0.290 ns
BM_MakeHeap_uint32_Heap_1024 0.356 ns 0.275 ns
BM_MakeHeap_uint32_Heap_16384 0.348 ns 0.252 ns
BM_MakeHeap_uint32_Heap_262144 0.347 ns 0.250 ns
BM_MakeHeap_uint32_QuickSortAdversary_1 0.290 ns 0.284 ns
BM_MakeHeap_uint32_QuickSortAdversary_4 0.627 ns 0.409 ns
BM_MakeHeap_uint32_QuickSortAdversary_16 0.640 ns 0.653 ns
BM_MakeHeap_uint32_QuickSortAdversary_64 0.577 ns 0.484 ns
BM_MakeHeap_uint32_QuickSortAdversary_256 0.613 ns 0.521 ns
BM_MakeHeap_uint32_QuickSortAdversary_1024 0.652 ns 0.514 ns
BM_MakeHeap_uint32_QuickSortAdversary_16384 0.428 ns 0.308 ns
BM_MakeHeap_uint32_QuickSortAdversary_262144 0.373 ns 0.261 ns
BM_MakeHeap_uint64_Random_1 0.291 ns 0.281 ns
BM_MakeHeap_uint64_Random_4 2.20 ns 1.97 ns
BM_MakeHeap_uint64_Random_16 1.93 ns 1.70 ns
BM_MakeHeap_uint64_Random_64 1.89 ns 1.48 ns
BM_MakeHeap_uint64_Random_256 2.10 ns 1.99 ns
BM_MakeHeap_uint64_Random_1024 1.41 ns 2.04 ns
BM_MakeHeap_uint64_Random_16384 2.12 ns 1.83 ns
BM_MakeHeap_uint64_Random_262144 1.83 ns 1.63 ns
BM_MakeHeap_uint64_Ascending_1 0.288 ns 0.283 ns
BM_MakeHeap_uint64_Ascending_4 0.624 ns 0.493 ns
BM_MakeHeap_uint64_Ascending_16 0.648 ns 0.688 ns
BM_MakeHeap_uint64_Ascending_64 0.671 ns 0.634 ns
BM_MakeHeap_uint64_Ascending_256 0.739 ns 0.680 ns
BM_MakeHeap_uint64_Ascending_1024 0.761 ns 0.698 ns
BM_MakeHeap_uint64_Ascending_16384 0.740 ns 0.685 ns
BM_MakeHeap_uint64_Ascending_262144 0.849 ns 0.837 ns
BM_MakeHeap_uint64_Descending_1 0.304 ns 0.287 ns
BM_MakeHeap_uint64_Descending_4 0.395 ns 0.312 ns
BM_MakeHeap_uint64_Descending_16 0.374 ns 0.276 ns
BM_MakeHeap_uint64_Descending_64 0.358 ns 0.272 ns
BM_MakeHeap_uint64_Descending_256 0.389 ns 0.303 ns
BM_MakeHeap_uint64_Descending_1024 0.361 ns 0.278 ns
BM_MakeHeap_uint64_Descending_16384 0.352 ns 0.253 ns
BM_MakeHeap_uint64_Descending_262144 0.350 ns 0.251 ns
BM_MakeHeap_uint64_SingleElement_1 0.295 ns 0.285 ns
BM_MakeHeap_uint64_SingleElement_4 0.569 ns 0.358 ns
BM_MakeHeap_uint64_SingleElement_16 0.649 ns 0.652 ns
BM_MakeHeap_uint64_SingleElement_64 0.673 ns 0.565 ns
BM_MakeHeap_uint64_SingleElement_256 0.732 ns 0.651 ns
BM_MakeHeap_uint64_SingleElement_1024 0.759 ns 0.632 ns
BM_MakeHeap_uint64_SingleElement_16384 0.748 ns 0.614 ns
BM_MakeHeap_uint64_SingleElement_262144 0.947 ns 0.797 ns
BM_MakeHeap_uint64_PipeOrgan_1 0.295 ns 0.284 ns
BM_MakeHeap_uint64_PipeOrgan_4 0.601 ns 0.496 ns
BM_MakeHeap_uint64_PipeOrgan_16 0.655 ns 1.18 ns
BM_MakeHeap_uint64_PipeOrgan_64 0.682 ns 0.803 ns
BM_MakeHeap_uint64_PipeOrgan_256 0.759 ns 0.710 ns
BM_MakeHeap_uint64_PipeOrgan_1024 0.759 ns 0.713 ns
BM_MakeHeap_uint64_PipeOrgan_16384 0.739 ns 0.696 ns
BM_MakeHeap_uint64_PipeOrgan_262144 0.870 ns 0.849 ns
BM_MakeHeap_uint64_Heap_1 0.290 ns 0.284 ns
BM_MakeHeap_uint64_Heap_4 0.413 ns 0.314 ns
BM_MakeHeap_uint64_Heap_16 0.378 ns 0.277 ns
BM_MakeHeap_uint64_Heap_64 0.362 ns 0.272 ns
BM_MakeHeap_uint64_Heap_256 0.389 ns 0.303 ns
BM_MakeHeap_uint64_Heap_1024 0.362 ns 0.283 ns
BM_MakeHeap_uint64_Heap_16384 0.352 ns 0.253 ns
BM_MakeHeap_uint64_Heap_262144 0.350 ns 0.251 ns
BM_MakeHeap_uint64_QuickSortAdversary_1 0.293 ns 0.284 ns
BM_MakeHeap_uint64_QuickSortAdversary_4 0.606 ns 0.494 ns
BM_MakeHeap_uint64_QuickSortAdversary_16 0.645 ns 0.691 ns
BM_MakeHeap_uint64_QuickSortAdversary_64 0.607 ns 0.511 ns
BM_MakeHeap_uint64_QuickSortAdversary_256 0.641 ns 0.537 ns
BM_MakeHeap_uint64_QuickSortAdversary_1024 0.657 ns 0.529 ns
BM_MakeHeap_uint64_QuickSortAdversary_16384 0.435 ns 0.316 ns
BM_MakeHeap_uint64_QuickSortAdversary_262144 0.382 ns 0.266 ns
BM_MakeHeap_pair<uint32, uint32>_Random_1 0.297 ns 0.378 ns
BM_MakeHeap_pair<uint32, uint32>_Random_4 2.80 ns 0.765 ns
BM_MakeHeap_pair<uint32, uint32>_Random_16 2.92 ns 2.20 ns
BM_MakeHeap_pair<uint32, uint32>_Random_64 3.17 ns 3.64 ns
BM_MakeHeap_pair<uint32, uint32>_Random_256 3.44 ns 3.20 ns
BM_MakeHeap_pair<uint32, uint32>_Random_1024 3.35 ns 3.64 ns
BM_MakeHeap_pair<uint32, uint32>_Random_16384 3.22 ns 3.50 ns
BM_MakeHeap_pair<uint32, uint32>_Random_262144 3.61 ns 3.46 ns
BM_MakeHeap_pair<uint32, uint32>_Ascending_1 0.291 ns 0.379 ns
BM_MakeHeap_pair<uint32, uint32>_Ascending_4 0.779 ns 0.436 ns
BM_MakeHeap_pair<uint32, uint32>_Ascending_16 0.943 ns 1.01 ns
BM_MakeHeap_pair<uint32, uint32>_Ascending_64 1.17 ns 1.26 ns
BM_MakeHeap_pair<uint32, uint32>_Ascending_256 1.38 ns 1.44 ns
BM_MakeHeap_pair<uint32, uint32>_Ascending_1024 1.37 ns 1.43 ns
BM_MakeHeap_pair<uint32, uint32>_Ascending_16384 1.29 ns 1.31 ns
BM_MakeHeap_pair<uint32, uint32>_Ascending_262144 1.37 ns 1.40 ns
BM_MakeHeap_pair<uint32, uint32>_Descending_1 0.292 ns 0.382 ns
BM_MakeHeap_pair<uint32, uint32>_Descending_4 0.440 ns 0.347 ns
BM_MakeHeap_pair<uint32, uint32>_Descending_16 0.529 ns 0.520 ns
BM_MakeHeap_pair<uint32, uint32>_Descending_64 0.540 ns 0.527 ns
BM_MakeHeap_pair<uint32, uint32>_Descending_256 0.637 ns 0.583 ns
BM_MakeHeap_pair<uint32, uint32>_Descending_1024 0.552 ns 0.513 ns
BM_MakeHeap_pair<uint32, uint32>_Descending_16384 0.522 ns 0.488 ns
BM_MakeHeap_pair<uint32, uint32>_Descending_262144 0.515 ns 0.492 ns
BM_MakeHeap_pair<uint32, uint32>_SingleElement_1 0.299 ns 0.377 ns
BM_MakeHeap_pair<uint32, uint32>_SingleElement_4 0.787 ns 0.474 ns
BM_MakeHeap_pair<uint32, uint32>_SingleElement_16 1.07 ns 0.921 ns
BM_MakeHeap_pair<uint32, uint32>_SingleElement_64 1.20 ns 1.15 ns
BM_MakeHeap_pair<uint32, uint32>_SingleElement_256 1.30 ns 1.27 ns
BM_MakeHeap_pair<uint32, uint32>_SingleElement_1024 1.30 ns 1.31 ns
BM_MakeHeap_pair<uint32, uint32>_SingleElement_16384 1.29 ns 1.28 ns
BM_MakeHeap_pair<uint32, uint32>_SingleElement_262144 1.37 ns 1.38 ns
BM_MakeHeap_pair<uint32, uint32>_PipeOrgan_1 0.293 ns 0.385 ns
BM_MakeHeap_pair<uint32, uint32>_PipeOrgan_4 0.677 ns 0.438 ns
BM_MakeHeap_pair<uint32, uint32>_PipeOrgan_16 1.04 ns 1.00 ns
BM_MakeHeap_pair<uint32, uint32>_PipeOrgan_64 1.20 ns 1.27 ns
BM_MakeHeap_pair<uint32, uint32>_PipeOrgan_256 1.40 ns 1.43 ns
BM_MakeHeap_pair<uint32, uint32>_PipeOrgan_1024 1.36 ns 1.43 ns
BM_MakeHeap_pair<uint32, uint32>_PipeOrgan_16384 1.27 ns 1.31 ns
BM_MakeHeap_pair<uint32, uint32>_PipeOrgan_262144 1.37 ns 1.41 ns
BM_MakeHeap_pair<uint32, uint32>_Heap_1 0.292 ns 0.378 ns
BM_MakeHeap_pair<uint32, uint32>_Heap_4 0.440 ns 0.380 ns
BM_MakeHeap_pair<uint32, uint32>_Heap_16 0.560 ns 0.606 ns
BM_MakeHeap_pair<uint32, uint32>_Heap_64 0.588 ns 0.573 ns
BM_MakeHeap_pair<uint32, uint32>_Heap_256 0.632 ns 0.607 ns
BM_MakeHeap_pair<uint32, uint32>_Heap_1024 0.598 ns 0.580 ns
BM_MakeHeap_pair<uint32, uint32>_Heap_16384 0.576 ns 0.563 ns
BM_MakeHeap_pair<uint32, uint32>_Heap_262144 0.572 ns 0.561 ns
BM_MakeHeap_pair<uint32, uint32>_QuickSortAdversary_1 0.304 ns 0.379 ns
BM_MakeHeap_pair<uint32, uint32>_QuickSortAdversary_4 0.823 ns 0.430 ns
BM_MakeHeap_pair<uint32, uint32>_QuickSortAdversary_16 1.08 ns 1.03 ns
BM_MakeHeap_pair<uint32, uint32>_QuickSortAdversary_64 1.18 ns 1.23 ns
BM_MakeHeap_pair<uint32, uint32>_QuickSortAdversary_256 1.39 ns 1.43 ns
BM_MakeHeap_pair<uint32, uint32>_QuickSortAdversary_1024 1.36 ns 1.42 ns
BM_MakeHeap_pair<uint32, uint32>_QuickSortAdversary_16384 1.28 ns 1.32 ns
BM_MakeHeap_pair<uint32, uint32>_QuickSortAdversary_262144 1.34 ns 1.37 ns
BM_MakeHeap_tuple<uint32, uint64, uint32>_Random_1 0.276 ns 0.511 ns
BM_MakeHeap_tuple<uint32, uint64, uint32>_Random_4 4.25 ns 1.96 ns
BM_MakeHeap_tuple<uint32, uint64, uint32>_Random_16 4.84 ns 3.77 ns
BM_MakeHeap_tuple<uint32, uint64, uint32>_Random_64 5.53 ns 4.93 ns
BM_MakeHeap_tuple<uint32, uint64, uint32>_Random_256 5.30 ns 5.06 ns
BM_MakeHeap_tuple<uint32, uint64, uint32>_Random_1024 5.29 ns 5.02 ns
BM_MakeHeap_tuple<uint32, uint64, uint32>_Random_16384 5.53 ns 5.31 ns
BM_MakeHeap_tuple<uint32, uint64, uint32>_Random_262144 5.49 ns 5.29 ns
BM_MakeHeap_tuple<uint32, uint64, uint32>_Ascending_1 0.275 ns 0.443 ns
BM_MakeHeap_tuple<uint32, uint64, uint32>_Ascending_4 1.22 ns 0.764 ns
BM_MakeHeap_tuple<uint32, uint64, uint32>_Ascending_16 1.39 ns 1.49 ns
BM_MakeHeap_tuple<uint32, uint64, uint32>_Ascending_64 1.66 ns 1.76 ns
BM_MakeHeap_tuple<uint32, uint64, uint32>_Ascending_256 1.85 ns 1.99 ns
BM_MakeHeap_tuple<uint32, uint64, uint32>_Ascending_1024 1.90 ns 2.02 ns
BM_MakeHeap_tuple<uint32, uint64, uint32>_Ascending_16384 2.15 ns 2.27 ns
BM_MakeHeap_tuple<uint32, uint64, uint32>_Ascending_262144 2.25 ns 2.37 ns
BM_MakeHeap_tuple<uint32, uint64, uint32>_Descending_1 0.274 ns 0.445 ns
BM_MakeHeap_tuple<uint32, uint64, uint32>_Descending_4 0.939 ns 0.520 ns
BM_MakeHeap_tuple<uint32, uint64, uint32>_Descending_16 1.02 ns 0.811 ns
BM_MakeHeap_tuple<uint32, uint64, uint32>_Descending_64 1.01 ns 0.941 ns
BM_MakeHeap_tuple<uint32, uint64, uint32>_Descending_256 1.03 ns 1.01 ns
BM_MakeHeap_tuple<uint32, uint64, uint32>_Descending_1024 0.969 ns 0.947 ns
BM_MakeHeap_tuple<uint32, uint64, uint32>_Descending_16384 0.882 ns 0.876 ns
BM_MakeHeap_tuple<uint32, uint64, uint32>_Descending_262144 0.893 ns 0.871 ns
BM_MakeHeap_tuple<uint32, uint64, uint32>_SingleElement_1 0.276 ns 0.443 ns
BM_MakeHeap_tuple<uint32, uint64, uint32>_SingleElement_4 1.34 ns 0.870 ns
BM_MakeHeap_tuple<uint32, uint64, uint32>_SingleElement_16 1.60 ns 1.59 ns
BM_MakeHeap_tuple<uint32, uint64, uint32>_SingleElement_64 1.91 ns 2.00 ns
BM_MakeHeap_tuple<uint32, uint64, uint32>_SingleElement_256 1.91 ns 2.08 ns
BM_MakeHeap_tuple<uint32, uint64, uint32>_SingleElement_1024 1.91 ns 2.10 ns
BM_MakeHeap_tuple<uint32, uint64, uint32>_SingleElement_16384 2.13 ns 2.35 ns
BM_MakeHeap_tuple<uint32, uint64, uint32>_SingleElement_262144 2.25 ns 2.48 ns
BM_MakeHeap_tuple<uint32, uint64, uint32>_PipeOrgan_1 0.275 ns 0.446 ns
BM_MakeHeap_tuple<uint32, uint64, uint32>_PipeOrgan_4 1.07 ns 0.671 ns
BM_MakeHeap_tuple<uint32, uint64, uint32>_PipeOrgan_16 1.36 ns 1.44 ns
BM_MakeHeap_tuple<uint32, uint64, uint32>_PipeOrgan_64 1.70 ns 1.80 ns
BM_MakeHeap_tuple<uint32, uint64, uint32>_PipeOrgan_256 1.88 ns 2.05 ns
BM_MakeHeap_tuple<uint32, uint64, uint32>_PipeOrgan_1024 1.92 ns 2.06 ns
BM_MakeHeap_tuple<uint32, uint64, uint32>_PipeOrgan_16384 2.11 ns 2.26 ns
BM_MakeHeap_tuple<uint32, uint64, uint32>_PipeOrgan_262144 2.18 ns 2.34 ns
BM_MakeHeap_tuple<uint32, uint64, uint32>_Heap_1 0.274 ns 0.441 ns
BM_MakeHeap_tuple<uint32, uint64, uint32>_Heap_4 0.938 ns 0.587 ns
BM_MakeHeap_tuple<uint32, uint64, uint32>_Heap_16 1.01 ns 0.873 ns
BM_MakeHeap_tuple<uint32, uint64, uint32>_Heap_64 1.08 ns 1.00 ns
BM_MakeHeap_tuple<uint32, uint64, uint32>_Heap_256 1.21 ns 1.18 ns
BM_MakeHeap_tuple<uint32, uint64, uint32>_Heap_1024 1.37 ns 1.29 ns
BM_MakeHeap_tuple<uint32, uint64, uint32>_Heap_16384 1.31 ns 1.26 ns
BM_MakeHeap_tuple<uint32, uint64, uint32>_Heap_262144 1.29 ns 1.25 ns
BM_MakeHeap_tuple<uint32, uint64, uint32>_QuickSortAdversary_1 0.275 ns 0.447 ns
BM_MakeHeap_tuple<uint32, uint64, uint32>_QuickSortAdversary_4 1.22 ns 0.764 ns
BM_MakeHeap_tuple<uint32, uint64, uint32>_QuickSortAdversary_16 1.35 ns 1.46 ns
BM_MakeHeap_tuple<uint32, uint64, uint32>_QuickSortAdversary_64 1.63 ns 1.73 ns
BM_MakeHeap_tuple<uint32, uint64, uint32>_QuickSortAdversary_256 1.83 ns 1.87 ns
BM_MakeHeap_tuple<uint32, uint64, uint32>_QuickSortAdversary_1024 1.81 ns 1.94 ns
BM_MakeHeap_tuple<uint32, uint64, uint32>_QuickSortAdversary_16384 2.06 ns 2.19 ns
BM_MakeHeap_tuple<uint32, uint64, uint32>_QuickSortAdversary_262144 2.14 ns 2.31 ns
BM_MakeHeap_string_Random_1 0.289 ns 0.446 ns
BM_MakeHeap_string_Random_4 7.43 ns 4.84 ns
BM_MakeHeap_string_Random_16 9.73 ns 8.92 ns
BM_MakeHeap_string_Random_64 11.5 ns 11.5 ns
BM_MakeHeap_string_Random_256 12.6 ns 12.4 ns
BM_MakeHeap_string_Random_1024 13.2 ns 13.2 ns
BM_MakeHeap_string_Random_16384 15.4 ns 15.4 ns
BM_MakeHeap_string_Random_262144 21.4 ns 21.5 ns
BM_MakeHeap_string_Ascending_1 0.287 ns 0.447 ns
BM_MakeHeap_string_Ascending_4 3.22 ns 2.44 ns
BM_MakeHeap_string_Ascending_16 3.77 ns 3.62 ns
BM_MakeHeap_string_Ascending_64 4.84 ns 5.17 ns
BM_MakeHeap_string_Ascending_256 5.79 ns 6.04 ns
BM_MakeHeap_string_Ascending_1024 5.93 ns 6.60 ns
BM_MakeHeap_string_Ascending_16384 6.84 ns 7.25 ns
BM_MakeHeap_string_Ascending_262144 13.5 ns 14.3 ns
BM_MakeHeap_string_Descending_1 0.293 ns 0.447 ns
BM_MakeHeap_string_Descending_4 2.61 ns 1.83 ns
BM_MakeHeap_string_Descending_16 2.64 ns 2.60 ns
BM_MakeHeap_string_Descending_64 2.75 ns 2.74 ns
BM_MakeHeap_string_Descending_256 3.78 ns 3.57 ns
BM_MakeHeap_string_Descending_1024 3.20 ns 3.51 ns
BM_MakeHeap_string_Descending_16384 3.57 ns 3.85 ns
BM_MakeHeap_string_Descending_262144 6.27 ns 6.39 ns
BM_MakeHeap_string_SingleElement_1 0.291 ns 0.448 ns
BM_MakeHeap_string_SingleElement_4 3.88 ns 2.71 ns
BM_MakeHeap_string_SingleElement_16 5.08 ns 4.96 ns
BM_MakeHeap_string_SingleElement_64 6.14 ns 6.29 ns
BM_MakeHeap_string_SingleElement_256 6.13 ns 6.46 ns
BM_MakeHeap_string_SingleElement_1024 5.98 ns 6.60 ns
BM_MakeHeap_string_SingleElement_16384 5.88 ns 6.39 ns
BM_MakeHeap_string_SingleElement_262144 9.95 ns 10.2 ns
BM_MakeHeap_string_PipeOrgan_1 0.292 ns 0.447 ns
BM_MakeHeap_string_PipeOrgan_4 2.97 ns 2.51 ns
BM_MakeHeap_string_PipeOrgan_16 3.76 ns 3.91 ns
BM_MakeHeap_string_PipeOrgan_64 4.89 ns 5.20 ns
BM_MakeHeap_string_PipeOrgan_256 5.77 ns 6.09 ns
BM_MakeHeap_string_PipeOrgan_1024 6.14 ns 6.40 ns
BM_MakeHeap_string_PipeOrgan_16384 6.83 ns 7.32 ns
BM_MakeHeap_string_PipeOrgan_262144 13.8 ns 14.6 ns
BM_MakeHeap_string_Heap_1 0.288 ns 0.515 ns
BM_MakeHeap_string_Heap_4 3.62 ns 4.20 ns
BM_MakeHeap_string_Heap_16 5.36 ns 5.23 ns
BM_MakeHeap_string_Heap_64 5.79 ns 5.38 ns
BM_MakeHeap_string_Heap_256 5.70 ns 5.40 ns
BM_MakeHeap_string_Heap_1024 5.78 ns 5.37 ns
BM_MakeHeap_string_Heap_16384 6.09 ns 5.67 ns
BM_MakeHeap_string_Heap_262144 6.37 ns 5.96 ns
BM_MakeHeap_string_QuickSortAdversary_1 0.282 ns 0.448 ns
BM_MakeHeap_string_QuickSortAdversary_4 7.45 ns 5.60 ns
BM_MakeHeap_string_QuickSortAdversary_16 9.76 ns 8.85 ns
BM_MakeHeap_string_QuickSortAdversary_64 11.5 ns 11.2 ns
BM_MakeHeap_string_QuickSortAdversary_256 12.0 ns 11.8 ns
BM_MakeHeap_string_QuickSortAdversary_1024 12.2 ns 12.0 ns
BM_MakeHeap_string_QuickSortAdversary_16384 13.7 ns 13.6 ns
BM_MakeHeap_string_QuickSortAdversary_262144 14.1 ns 14.8 ns
BM_MakeHeap_float_Random_1 0.287 ns 0.287 ns
BM_MakeHeap_float_Random_4 2.29 ns 2.60 ns
BM_MakeHeap_float_Random_16 4.00 ns 2.48 ns
BM_MakeHeap_float_Random_64 4.41 ns 1.92 ns
BM_MakeHeap_float_Random_256 4.73 ns 2.05 ns
BM_MakeHeap_float_Random_1024 4.90 ns 2.27 ns
BM_MakeHeap_float_Random_16384 4.42 ns 2.27 ns
BM_MakeHeap_float_Random_262144 4.72 ns 1.39 ns
BM_MakeHeap_float_Ascending_1 0.291 ns 0.293 ns
BM_MakeHeap_float_Ascending_4 0.633 ns 0.428 ns
BM_MakeHeap_float_Ascending_16 0.638 ns 0.874 ns
BM_MakeHeap_float_Ascending_64 0.614 ns 0.698 ns
BM_MakeHeap_float_Ascending_256 0.663 ns 0.713 ns
BM_MakeHeap_float_Ascending_1024 0.660 ns 0.761 ns
BM_MakeHeap_float_Ascending_16384 0.628 ns 0.725 ns
BM_MakeHeap_float_Ascending_262144 0.629 ns 0.814 ns
BM_MakeHeap_float_Descending_1 0.290 ns 0.290 ns
BM_MakeHeap_float_Descending_4 0.421 ns 0.316 ns
BM_MakeHeap_float_Descending_16 0.302 ns 0.225 ns
BM_MakeHeap_float_Descending_64 0.293 ns 0.212 ns
BM_MakeHeap_float_Descending_256 0.314 ns 0.246 ns
BM_MakeHeap_float_Descending_1024 0.300 ns 0.231 ns
BM_MakeHeap_float_Descending_16384 0.308 ns 0.205 ns
BM_MakeHeap_float_Descending_262144 0.309 ns 0.203 ns
BM_MakeHeap_float_SingleElement_1 0.289 ns 0.292 ns
BM_MakeHeap_float_SingleElement_4 0.569 ns 0.347 ns
BM_MakeHeap_float_SingleElement_16 0.538 ns 0.825 ns
BM_MakeHeap_float_SingleElement_64 0.585 ns 0.727 ns
BM_MakeHeap_float_SingleElement_256 0.603 ns 0.708 ns
BM_MakeHeap_float_SingleElement_1024 0.618 ns 0.760 ns
BM_MakeHeap_float_SingleElement_16384 0.599 ns 0.726 ns
BM_MakeHeap_float_SingleElement_262144 0.723 ns 0.820 ns
BM_MakeHeap_float_PipeOrgan_1 0.289 ns 0.291 ns
BM_MakeHeap_float_PipeOrgan_4 0.457 ns 0.420 ns
BM_MakeHeap_float_PipeOrgan_16 0.670 ns 1.32 ns
BM_MakeHeap_float_PipeOrgan_64 0.764 ns 0.889 ns
BM_MakeHeap_float_PipeOrgan_256 0.793 ns 0.757 ns
BM_MakeHeap_float_PipeOrgan_1024 0.755 ns 0.764 ns
BM_MakeHeap_float_PipeOrgan_16384 0.723 ns 0.723 ns
BM_MakeHeap_float_PipeOrgan_262144 0.654 ns 0.817 ns
BM_MakeHeap_float_Heap_1 0.291 ns 0.289 ns
BM_MakeHeap_float_Heap_4 0.388 ns 0.316 ns
BM_MakeHeap_float_Heap_16 0.317 ns 0.225 ns
BM_MakeHeap_float_Heap_64 0.353 ns 0.213 ns
BM_MakeHeap_float_Heap_256 0.361 ns 0.246 ns
BM_MakeHeap_float_Heap_1024 0.381 ns 0.233 ns
BM_MakeHeap_float_Heap_16384 0.390 ns 0.205 ns
BM_MakeHeap_float_Heap_262144 0.379 ns 0.202 ns
BM_MakeHeap_float_QuickSortAdversary_1 0.295 ns 0.289 ns
BM_MakeHeap_float_QuickSortAdversary_4 0.640 ns 0.422 ns
BM_MakeHeap_float_QuickSortAdversary_16 0.658 ns 0.871 ns
BM_MakeHeap_float_QuickSortAdversary_64 0.574 ns 0.659 ns
BM_MakeHeap_float_QuickSortAdversary_256 0.631 ns 0.550 ns
BM_MakeHeap_float_QuickSortAdversary_1024 0.617 ns 0.552 ns
BM_MakeHeap_float_QuickSortAdversary_16384 0.424 ns 0.283 ns
BM_MakeHeap_float_QuickSortAdversary_262144 0.386 ns 0.219 ns
```
Fixes #120752
|
|
We've built up quite a few links directly to github within the code
base. We should instead use `llvm.org/PR<issue-number>` to link to bugs,
since that is resilient to the bug tracker changing in the future. This
is especially relevant for tests linking to bugs, since they will
probably be there for decades to come. A nice side effect is that these
links are significantly shorter than the GH links, making them much less
of an eyesore.
This patch also replaces a few links that linked to the old bugzilla
instance on llvm.org.
|
|
In #135303, we started using `__bit_log2` instead of `__log2i` inside
`std::sort`. However, `__bit_log2` has a precondition that `__log2i`
didn't have, which is that the input is non-zero. While it technically
makes no sense to request the logarithm of 0, `__log2i` handled that
case and returned 0 without issues.
After switching to `__bit_log2`, passing 0 as an input results in an
unsigned integer overflow which can trigger
`-fsanitize=unsigned-integer-overflow`. While not technically UB in
itself, it's clearly not intended either.
To fix this, we add an internal assertion to `__bit_log2` which catches
the issue in our test suite, and we make sure not to violate
`__bit_log2`'s preconditions before we call it from `std::sort`.
|
|
|
|
Libc++'s policy is to support only the latest released Xcode, which is
Xcode 16.x. We did update our CI jobs to Xcode 16.x, but we forgot to
update the documentation, which still mentioned Xcode 15. This patch
updates the documentation and cleans up outdated mentions of
apple-clang-15 in the test suite.
|
|
(#132896)
Previously, the segmented iterator optimization was limited to `std::{for_each, for_each_n}`. This patch
extends the optimization to `std::ranges::for_each` and `std::ranges::for_each_n`, ensuring consistent
optimizations across these algorithms. This patch first generalizes the `std` algorithms by introducing
a `Projection` parameter, which is set to `__identity` for the `std` algorithms. Then we let the `ranges`
algorithms to directly call their `std` counterparts with a general `__proj` argument. Benchmarks
demonstrate performance improvements of up to 21.4x for ``std::deque::iterator`` and 22.3x for
``join_view`` of ``vector<vector<char>>``.
Addresses a subtask of #102817.
|