aboutsummaryrefslogtreecommitdiff
path: root/libstdc++-v3/include/std
AgeCommit message (Collapse)AuthorFilesLines
35 hourslibstdc++: Implement C++23 P2590R2 - Explicit lifetime management [PR106658]Jakub Jelinek1-0/+129
As I can't think of how the middle-end would treat __builtin_start_lifetime_as other than a blackbox and probably would need to be implemented as such inline asm in RTL, this patch just implements it using inline asm in the library. If not anything else, it can serve as fallback before we and/or clang get some builtin for it. Right now the inline asms pretend (potential) read from and write to the whole memory region and make optimizers forget where the return value points to. If the optimizers don't know where it points to, I think that should be good enough, but I'm a little bit afraid of possibly future optimizations trying to optimize q->c = 1; q->d = 2; auto p = std::start_lifetime_as<S>(q); if (p == reinterpret_cast<decltype (p)>(q)) return p->a + p->b; that because of the guarding condition or perhaps assertion we could simply use the q pointer in MEM_REFs with S type and be surprised by TBAA. Though if it is a must-alias case, then we should be fine as well. Though guess that would be the same case with a builtin. 2025-09-18 Jakub Jelinek <jakub@redhat.com> PR c++/106658 * include/bits/version.def: Implement C++23 P2590R2 - Explicit lifetime management. (start_lifetime_as): New. * include/bits/version.h: Regenerate. * include/std/memory (std::start_lifetime_as, std::start_lifetime_as_array): New function templates. * src/c++23/std.cc.in (std::start_lifetime_as, std::start_lifetime_as_array): Export. * testsuite/std/memory/start_lifetime_as/start_lifetime_as.cc: New test.
3 dayslibstdc++: Optimize determination of std::tuple_cat return typeJonathan Wakely2-27/+26
The std::tuple_cat function has to determine a std::tuple return type from zero or more tuple-like arguments. This uses the __make_tuple class template to transform a tuple-like type into a std::tuple, and the __combine_tuples class template to combine zero or more std::tuple types into a single std::tuple type. This change optimizes the __make_tuple class template to use an _Index_tuple and pack expansion instead of recursive instantiation, and optimizes __combine_tuples to use fewer levels of recursion. For ranges::adjacent_view's __detail::__repeated_tuple helper we can just use the __make_tuple class template directly, instead of doing overload resolution on std::tuple_cat to get its return type. libstdc++-v3/ChangeLog: * include/std/ranges (__detail::__repeated_tuple): Use __make_tuple helper alias directly, instead of doing overload resolution on std::tuple_cat. * include/std/tuple (__make_tuple_impl): Remove. (__do_make_tuple): Replace recursion with _Index_tuple and pack expansion. (__make_tuple): Adjust to new __do_make_tuple definition. (__combine_tuples<tuple<T1s...>, tuple<T2s...>, Rem...>): Replace with a partial specialization for exactly two tuples and a partial specialization for three or more tuples. Reviewed-by: Patrick Palka <ppalka@redhat.com>
3 dayslibstdc++: Fix missing change to views::pairwise from P2165R4 [PR121956]Jonathan Wakely1-3/+1
ranges::adjacent_view::_Iterator::value_type should have been changed by r14-8710-g65b4cba9d6a9ff to always produce std::tuple, even for the N == 2 views::pairwise specialization. libstdc++-v3/ChangeLog: PR libstdc++/121956 * include/std/ranges (adjacent_view::_Iterator::value_type): Always define as std::tuple<T, N>, not std::pair<T, T>. * testsuite/std/ranges/adaptors/adjacent/1.cc: Check value type of views::pairwise. Reviewed-by: Patrick Palka <ppalka@redhat.com>
3 dayslibstdc++: Fix more missing uses of iter_difference_t [PR119820]Jonathan Wakely1-1/+1
libstdc++-v3/ChangeLog: PR libstdc++/119820 * include/bits/ranges_algo.h (__shuffle_fn): Use ranges::distance to get difference type value to add to iterator. * include/std/format (__formatter_str::_M_format_range): Use ranges::next to increment iterator by a size_t value. Reviewed-by: Patrick Palka <ppalka@redhat.com>
7 dayslibstdc++: Fix algorithms to use iterators' difference_type for arithmetic ↵Jonathan Wakely1-1/+1
[PR121890] Whenever we use operator+ or similar operators on random access iterators we need to be careful to use the iterator's difference_type rather than some other integer type. It's not guaranteed that an expression with an arbitrary integer type, such as `it + 1u`, has the same effects as `it + iter_difference_t<It>(1)`. Some of our algorithms need changes to cast values to the correct type, or to use std::next or ranges::next instead of `it + n`. Several tests also need fixes where the arithmetic occurs directly in the test. The __gnu_test::random_access_iterator_wrapper class template is adjusted to have deleted operators that make programs ill-formed if the argument to relevant operators is not the difference_type. This will make it easier to avoid regressing in future. libstdc++-v3/ChangeLog: PR libstdc++/121890 * include/bits/ranges_algo.h (ranges::rotate, ranges::shuffle) (__insertion_sort, __unguarded_partition_pivot, __introselect): Use ranges::next to advance iterators. Use local variables in rotate to avoid duplicate expressions. (ranges::push_heap, ranges::pop_heap, ranges::partial_sort) (ranges::partial_sort_copy): Use ranges::prev. (__final_insertion_sort): Use iter_difference_t<Iter> for operand of operator+ on iterator. * include/bits/ranges_base.h (ranges::advance): Use iterator's difference_type for all iterator arithmetic. * include/bits/stl_algo.h (__search_n_aux, __rotate) (__insertion_sort, __unguarded_partition_pivot, __introselect) (__final_insertion_sort, for_each_n, random_shuffle): Likewise. Use local variables in __rotate to avoid duplicate expressions. * include/bits/stl_algobase.h (__fill_n_a, __lc_rai::__newlast1): Likewise. * include/bits/stl_heap.h (push_heap): Likewise. (__is_heap_until): Add static_assert. (__is_heap): Convert distance to difference_type. * include/std/functional (boyer_moore_searcher::operator()): Use iterator's difference_type for iterator arithmetic. * testsuite/util/testsuite_iterators.h (random_access_iterator_wrapper): Add deleted overloads of operators that should be called with difference_type. * testsuite/24_iterators/range_operations/advance.cc: Use ranges::next. * testsuite/25_algorithms/heap/constrained.cc: Use ranges::next and ranges::prev. * testsuite/25_algorithms/nth_element/58800.cc: Use std::next. * testsuite/25_algorithms/nth_element/constrained.cc: Use ptrdiff_t for loop variable. * testsuite/25_algorithms/nth_element/random_test.cc: Use iterator's difference_type instead of int. * testsuite/25_algorithms/partial_sort/check_compare_by_value.cc: Use std::next. * testsuite/25_algorithms/partial_sort/constrained.cc: Use ptrdiff_t for loop variable. * testsuite/25_algorithms/partial_sort/random_test.cc: Use iterator's difference_type instead of int. * testsuite/25_algorithms/partial_sort_copy/constrained.cc: Use ptrdiff_t for loop variable. * testsuite/25_algorithms/partial_sort_copy/random_test.cc: Use iterator's difference_type instead of int. * testsuite/std/ranges/adaptors/drop.cc: Use ranges::next. * testsuite/25_algorithms/fill_n/diff_type.cc: New test. * testsuite/25_algorithms/lexicographical_compare/diff_type.cc: New test. Reviewed-by: Patrick Palka <ppalka@redhat.com> Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>
8 dayslibstdc++: Use consteval for _S_noexcept() helper functionsJonathan Wakely1-1/+1
These _S_noexcept() functions are only used in noexcept-specifiers and never need to be called at runtime. They can be immediate functions, i.e. consteval. libstdc++-v3/ChangeLog: * include/bits/iterator_concepts.h (_IterMove::_S_noexcept) (_IterSwap::_S_noexcept): Change constexpr to consteval. * include/bits/ranges_base.h (_Begin::_S_noexcept) (_End::_S_noexcept, _RBegin::_S_noexcept, _REnd::_S_noexcept) (_Size::_S_noexcept, _Empty::_S_noexcept, _Data::_S_noexcept): Likewise. * include/std/concepts (_Swap::_S_noexcept): Likewise. Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>
8 dayslibstdc++: Remove trailing whitespace in <syncstream>Jonathan Wakely1-1/+1
libstdc++-v3/ChangeLog: * include/std/syncstream: Remove trailing whitespace.
9 dayslibstdc++: Enforce Mandates: for Boyer-Moore searchersJonathan Wakely1-0/+12
C++17 has a 'Requires:' precondition that the two random access iterator types have the same value type. In C++20 that is a 'Mandates:' requirement which we must diagnose. Although we could diagnose it in C++17, that might be a breaking change for any users relying on it today. Also I am lazy and wanted to use C++20's std::iter_value_t for the checks. So this only enforces the requirement for C++20 and later. libstdc++-v3/ChangeLog: * include/std/functional (boyer_moore_searcher::operator()): Add static_assert. (boyer_moore_horspool_searcher::operator()): Likewise. * testsuite/20_util/function_objects/121782.cc: New test.
9 dayslibstdc++: Rename _S-prefixed identifiers in <mdspan>.Luc Grosheintz1-17/+17
In libstdc++ the prefix _S is used for static members only. In <mdspan> there's several type aliases that also used the prefix _S. They now use a single leading underscore follow by a capital letter instead. libstdc++-v3/ChangeLog: * include/std/mdspan (_ExtentsStorage::_Base): New name for _S_base. (_ExtentsStorage::_Storage): New name for _S_storage. (extents::_Storage): New name for _S_storage. (layout_stride::mapping::_Strides): New name for _S_stries_t. * testsuite/23_containers/mdspan/class_mandate_neg.cc: Update test to the new error message. Reviewed-by: Jonathan Wakely <jwakely@redhat.com> Signed-off-by: Luc Grosheintz <luc.grosheintz@gmail.com>
9 dayslibstdc++: Apply LWG4351 to CTAD of span/mdspan.Luc Grosheintz1-1/+3
The concept __integral_constant_like doesn't consider traits with a boolean member `value` as an integer constant. This is done to reject various completely unrelated traits like is_const, is_abstract, etc. LWG4351 adjusts the check to strip references and cv qualifiers before checking if `value` is bool. The immediate context is constant_wrapper which defines: template<...> struct constant_wrapper { static constexpr const auto& value = ...; }; Without LWG4351, std::cw<true> and std::cw<false> would both be considered integer constants (by __integral_constant_like); but both std::{true,false}_type are not considered integer constants. Hence, LWG4351 removes inconsistent behaviour between std::integral_constant and std::constant_wrapper. libstdc++-v3/ChangeLog: * include/std/span (__integral_constant_like): Use remove_cvref_t before checking if _Tp::value is boolean. * testsuite/23_containers/mdspan/extents/misc.cc: Update test. * testsuite/23_containers/mdspan/mdspan.cc: Ditto. * testsuite/23_containers/span/deduction.cc: Ditto. Reviewed-by: Jonathan Wakely <jwakely@redhat.com> Signed-off-by: Luc Grosheintz <luc.grosheintz@gmail.com>
9 dayslibstdc++: Use _Drop_iter<_CharT> for formattable concept checking [PR121765]Tomasz Kamiński1-37/+83
When producing output, the libstdc++ format implementation only uses _Sink_iter specializations. Since users cannot construct basic_format_context, this is the only iterator type actually used. The __format_padded helper relies on this property to efficiently pad sequences from tuples and ranges. However, the standard's formattable concept requires a generic format function in formatters that works with any iterator type. This is intended to future-proof the implementation by allowing new format_context types. Previously, libstdc++ used back_insert_iterator<basic_string<_CharT>> for this purpose. Normally, concept checks only instantiate function signatures, but with user-defined formatters and deduced return types, the function body and all called functions are instantiated. This could trigger a static assertion error in the range/tuple formatter that assumed the iterator was a _Sink_iter (see included test). This patch resolves the issue by replacing the _Iter_for_t alias with the internal _Drop_iter. This iterator's sematnics is to drop elements, so __format_padded can handle it by simply returning the input iterator, which still produces the required behavior [1]. An alternative of using _Sink_iter was considered but rejected because it would allow formatters to pass formattable requirements while only supporting format_context and wformat_context, which seems counter to the design intent (the std/format/formatter/concept.cc fails). [1] The standard's wording defines format functions as producing an output representation, but does not explicitly require a formatter to be invoked for each element. This allows the use of _Drop_iter to pass the concept check without generating any output. PR libstdc++/121765 libstdc++-v3/ChangeLog: * include/std/format (__format::_Drop_iter): Define. (_Iter_for_t::type): Change alias to _Drop_iter. (__format::__format_padded): Return __fc.out() for _Drop_iter. * testsuite/std/format/pr121765.cc: New test. Reviewed-by: Jonathan Wakely <jwakely@redhat.com> Signed-off-by: Tomasz Kamiński <tkaminsk@redhat.com>
10 dayslibstdc++: Rename _CwFixedValue::_S_type memberJonathan Wakely1-7/+7
Rename _S_type to __type as it's not a static member. Also rename _Tp to _Xv because it's not a type. libstdc++-v3/ChangeLog: * include/std/type_traits (_CwFixedValue::_S_type): Rename to __type. (constant_wrapper): Rename template parameter in declaration to match later definition.
11 dayslibstdc++: Rename template parameter of std::constant_wrapperJonathan Wakely1-3/+3
This fixes: FAIL: 17_intro/badnames.cc -std=gnu++26 (test for excess errors) libstdc++-v3/ChangeLog: * include/std/type_traits (constant_wrapper): Rename template parameter to avoid BADNAME.
11 dayslibstdc++: Make syncbuf _S_get_mutex definition extern.Nathan Myers1-16/+6
This patch creates a global function __syncbuf_get_mutex, gated by _GLIBCXX_HAS_GTHREADS, replacing a static instantiated member _S_get_mutex used in syncbuf<> construction, and makes the global symbol visible. A static local table of 16 mutexes is shared among all specializations of syncbuf<>, chosen on construction by a hash of the wrapped streambuf's address. It detaches the implementation of _S_get_mutex from the C++20 ABI. libstdc++-v3/ChangeLog: * include/std/syncstream: (syncbuf<>::__mutex) Remove _S_get_mutex, use extern function instead. * src/c++20/syncbuf.cc: Define global __syncbuf_get_mutex. * src/c++20/Makefile.am: Mention syncbuf.cc. * src/c++20/Makefile.in: Regenerate. * config/abi/pre/gnu.ver: Mention mangled __syncbuf_get_mutex.
11 dayslibstdc++: Adjust span/mdspan CTAD for P2781R9.Luc Grosheintz1-1/+2
A usecase for P2781R9 is more ergonomic creation of span and mdspan with mixed static and dynamic extents, e.g.: span(ptr, cw<3>) extents(cw<3>, 5, cw<7>) mdspan(ptr, cw<3>, 5, cw<7>) should be deduced as: span<..., 3> extents<..., 3, dyn, 7> mdspan<..., extents<..., 3, dyn, 7>> The change required is to strip cv-qualifiers and references from `_Tp::value`, because of: template<_CwFixedValue _X, typename> struct constant_wrapper : _CwOperators { static constexpr const auto& value = _X._M_data; libstdc++-v3/ChangeLog: * include/std/span (__integral_constant_like): Allow the member `value` of a constant wrapping type to be a const reference of an integer. * testsuite/23_containers/mdspan/extents/misc.cc: Add test for cw and constant_wrapper. * testsuite/23_containers/mdspan/mdspan.cc: Ditto. * testsuite/23_containers/span/deduction.cc: Ditto. Reviewed-by: Jonathan Wakely <jwakely@redhat.com> Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com> Signed-off-by: Luc Grosheintz <luc.grosheintz@gmail.com>
11 dayslibstdc++: Implement constant_wrapper, cw from P2781R9.Luc Grosheintz1-0/+371
This is a partial implementation of P2781R9. It adds std::cw and std::constant_wrapper, but doesn't modify __integral_constant_like for span/mdspan. libstdc++-v3/ChangeLog: * include/bits/version.def (constant_wrapper): Add. * include/bits/version.h: Regenerate. * include/std/type_traits (_CwFixedValue): New class. (_IndexSequence): New struct. (_BuildIndexSequence): New struct. (_ConstExprParam): New concept. (_CwOperators): New struct. (constant_wrapper): New struct. (cw): New global constant. * src/c++23/std.cc.in (constant_wrapper): Add. (cw): Add. * testsuite/20_util/constant_wrapper/adl.cc: New test. * testsuite/20_util/constant_wrapper/ex.cc: New test. * testsuite/20_util/constant_wrapper/generic.cc: New test. * testsuite/20_util/constant_wrapper/instantiate.cc: New test. * testsuite/20_util/constant_wrapper/op_comma_neg.cc: New test. * testsuite/20_util/constant_wrapper/version.cc: New test. Reviewed-by: Jonathan Wakely <jwakely@redhat.com> Co-authored-by: Tomasz Kamiński <tkaminsk@redhat.com> Signed-off-by: Luc Grosheintz <luc.grosheintz@gmail.com> Signed-off-by: Tomasz Kamiński <tkaminsk@redhat.com>
14 dayslibstdc++: Make join_view::_Iterator::_M_get_inner noexcept [PR121804]Patrick Palka1-2/+2
Since this helper (added in r16-3576-g7f7f1878eedd80) is used in the noexcept-spec of iter_move and iter_swap, it in turn needs an accurate noexcept-spec. PR libstdc++/121804 libstdc++-v3/ChangeLog: * include/std/ranges (join_view::_Iterator::_M_get_inner): Mark noexcept. * testsuite/std/ranges/adaptors/join.cc (test16): New test. Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com> Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
2025-09-04libstdc++: Conditionalize LWG 3569 changes to join_viewPatrick Palka1-12/+37
LWG 3569 adjusted join_view's iterator specification to handle non default-constructible iterators by wrapping the corresponding data member in std::optional, which we followed suit in r13-2649-g7aa80c82ecf3a3. But this wrapping is unnecessary for iterators that are already default-constructible. Rather than unconditionally using std::optional here, which introduces time/space overhead, this patch conditionalizes our LWG 3569 changes on the iterator in question being non-forward (and thus non default-constructible). We check forwardness instead of default-constructibility in order to accommodate input-only iterators that satisfy but do not model default_initializable, e.g. whose default constructor is underconstrained. libstdc++-v3/ChangeLog: * include/std/ranges (join_view::_Iterator::_M_satisfy): Adjust to handle non-std::optional _M_inner as per before LWG 3569. (join_view::_Iterator::_M_get_inner): New. (join_view::_Iterator::_M_inner): Don't wrap in std::optional if the iterator is forward. Initialize. (join_view::_Iterator::operator*): Use _M_get_inner instead of *_M_inner. (join_view::_Iterator::operator++): Likewise. (join_view::_Iterator::iter_move): Likewise. (join_view::_Iterator::iter_swap): Likewise. Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com> Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
2025-09-04libstdc++: Reuse _Bind_back_t functor in ranges::_PartialTomasz Kamiński1-92/+12
This patch refactors ranges::_Partial to be implemented using _Bind_back_t. This allows it to benefit from the changes in r16-3398-g250dd5b5604fbc, specifically making the closure trivially copyable. Since _Bind_back_t already provides an optimized implementation for a single bound argument, specializations for _Partial with a single argument are now removed. We still preserve a specialization of _Partial for trivially copy-constructible arguments that define only a const overload of operator(). To avoid re-checking invocability constraints, this specialization calls the now-public, unconstrained _Binder::_S_call static method instead of the constrained _Binder::operator(). The primary specialization of _Partial retains its operator(), which uses a simpler __adaptor_invocable constraint that does not consider member pointers, as they are not relevant here. This implementation also calls _Binder::_S_call to avoid re-performing overload resolution and invocability checks for _Binder::operator(). Finally, the _M_binder member (_Bind_back_t) is now marked [[no_unique_address]]. This is beneficial as ranges::_Partial is used with ranges::to, which commonly has zero or empty bound arguments (e.g., stateless allocators, comparators, or hash functions). libstdc++-v3/ChangeLog: * include/bits/binders.h (_Binder::_S_call): Make public. * include/std/ranges (ranges::_Partial<_Adaptor, _Args...>): Replace tuple<_Args...> with _Bind_back_t<_Adaptor, _Args...>. (ranges::_Partial<_Adaptor, _Arg>): Remove. Reviewed-by: Patrick Palka <ppalka@redhat.com> Reviewed-by: Jonathan Wakely <jwakely@redhat.com> Signed-off-by: Tomasz Kamiński <tkaminsk@redhat.com>
2025-09-04libstdc++: Move _Binder and related aliases to separate file.Tomasz Kamiński1-186/+1
bits/binders.h is already mapped in libstdc++-v3/doc/doxygen/stdheader.cc. libstdc++-v3/ChangeLog: * include/Makefile.am: Add bits/binders.h * include/Makefile.in: Add bits/binders.h * include/std/functional (std::_Indexed_bound_arg, std::_Binder) (std::__make_bound_args, std::_Bind_front_t, std::_Bind_back_t): Moved to bits/binders.h file, that is now included. * include/bits/binders.h: New file. Reviewed-by: Patrick Palka <ppalka@redhat.com> Reviewed-by: Jonathan Wakely <jwakely@redhat.com> Signed-off-by: Tomasz Kamiński <tkaminsk@redhat.com>
2025-09-04libstdc++: Merge bind_front and bind_back bindersTomasz Kamiński1-102/+70
The _Bind_front and _Bind_back class templates are now merged into a single _Binder implementation that accepts _Back as a template parameter. This makes the bind_back implementation available in C++20 mode, allowing it to be used for range adaptor closures. With zero bound arguments, bind_back and bind_front have equivalent functionality. Consequently, _Bind_back_t now produces the same type as bind_front (_Binder<false, _Fd>). A simple copy of the functor cannot be returned in this case, as it would visibly affect overload resolution (see included test cases). We also replace std::invoke in internal functions, with std::__invoke. libstdc++-v3/ChangeLog: * include/std/functional: (std::_Indexed_bound_arg): Fixed indentation. (__Bound_arg_storage::_S_apply_front) (__Bound_arg_storage::_S_apply_front): Merged into _S_apply. (__Bound_arg_storage::_S_apply): Merged above, add _Back template parameter, replace std::invoke with std::__invoke. (std::_Bind_front): Renamed to std::_Binder and add _Back template parameter. (std::_Binder): Renamed from std::_Bind_front. (_Binder::_Result_t, _Binder::_S_noexcept_invoke): Define. (_Binder::operator()): Use _Result_t and _S_noexcept_invoke. (_Binder::_S_call): Handle zero args specially, replace std::invoke with std::__invoke. (std::_Bind_front_t, std::_Bind_back_t): Defined in terms of _Binder. (std::_Bind_back): Merged into _Binder. * testsuite/20_util/function_objects/bind_back/1.cc: New tests. * testsuite/20_util/function_objects/bind_back/111327.cc: Updated error messages. * testsuite/20_util/function_objects/bind_front/1.cc: New tests. * testsuite/20_util/function_objects/bind_front/111327.cc: Updated error messages. Reviewed-by: Patrick Palka <ppalka@redhat.com> Reviewed-by: Jonathan Wakely <jwakely@redhat.com> Signed-off-by: Tomasz Kamiński <tkaminsk@redhat.com>
2025-09-04libstdc++: Add _GLIBCXX_RESOLVE_LIB_DEFECTS for 4314 in <mdspan>.Luc Grosheintz1-1/+7
In r16-2328-g29d53f6213e0a1 we fixed a bug related to user-defined objects that can convert to an integers only via an rvalue reference. The same commit also implemented LWG 4314 [1], but didn't mark it with _GLIBCXX_RESOLVE_LIB_DEFECTS. This commit adds the missing markers. [1]: https://cplusplus.github.io/LWG/issue4314 It also fixes one cases of trailing white-space near a ctor for aligned_accessor. libstdc++-v3/ChangeLog: * include/std/mdspan (layout_left::mapping::operator()): Add _GLIBCXX_RESOLVE_LIB_DEFECTS marker for 4314. (layout_left::mapping::operator()): Ditto. (layout_stride::mapping::operator()): Ditto. Reviewed-by: Jonathan Wakely <jwakely@redhat.com> Signed-off-by: Luc Grosheintz <luc.grosheintz@gmail.com>
2025-09-03libstdc++: Implement LWG4222 'expected' constructor from a single value ↵Yihan Wang1-0/+1
missing a constraint libstdc++-v3/ChangeLog: * include/std/expected (expected(U&&)): Add missing constraint as per LWG 4222. * testsuite/20_util/expected/lwg4222.cc: New test. Signed-off-by: Yihan Wang <yronglin777@gmail.com>
2025-09-03libstdc++: Restore C++20 <chrono> support for old std::string ABIJonathan Wakely1-5/+8
The r16-3416-g806de30f51c8b9 change to use __cpp_lib_chrono in preprocessor conditions broke support for <chrono> for freestanding and the COW std::string ABI. That happened because __cpp_lib_chrono is only defined to the C++20 value for hosted and for the new ABI, because the full set of C++20 features are not defined for freestanding and tzdb is not defined for the old ABI. This introduces a new internal feature test macro that corresponds to the features that are always supported (e.g. chrono::local_time, chrono::year, chrono::weekday). libstdc++-v3/ChangeLog: * include/bits/version.def (chrono_cxx20): Define. * include/bits/version.h: Regenerate. * include/std/chrono: Check __glibcxx_chrono_cxx20 instead of __cpp_lib_chrono for C++20 features that don't require the new std::string ABI and/or can be used for freestanding. * src/c++20/clock.cc: Adjust preprocessor condition. Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>
2025-09-02libstdc++: Move _Index_tuple, _Build_index_tuple to <type_traits>.Luc Grosheintz1-0/+22
As preparation for implementing std::constant_wrapper that's part of the C++26 version of the <type_traits> header, the two classes _Index_tuple and _Build_index_tuple are moved to <type_traits>. These two helpers are needed by std::constant_wrapper to initialize the elements of one C array with another. Since, <bits/utility.h> already includes <type_traits> this solution avoids creating a very small header file for just these two internal classes. This approach doesn't move std::index_sequence and related code to <type_traits> and therefore doesn't change which headers provide user-facing features. libstdc++-v3/ChangeLog: * include/bits/utility.h (_Index_tuple): Move to <type_traits>. (_Build_index_tuple): Ditto. * include/std/type_traits (_Index_tuple): Ditto. (_Build_index_tuple): Ditto. Reviewed-by: Jonathan Wakely <jwakely@redhat.com> Signed-off-by: Luc Grosheintz <luc.grosheintz@gmail.com>
2025-08-28libstdc++: Implement C++26 <debugging> features [PR119670]Jonathan Wakely1-0/+77
This implements P2546R5 (Debugging Support), including the P2810R4 (is_debugger_present is_replaceable) changes, allowing std::is_debugger_present to be replaced by the program. It would be good to provide a macOS definition of is_debugger_present as per https://developer.apple.com/library/archive/qa/qa1361/_index.html but that isn't included in this change. The src/c++26/debugging.cc file defines a global volatile int which can be set by debuggers to indicate when they are attached and detached from a running process. This allows std::is_debugger_present() to give a reliable answer, and additionally allows a debugger to choose how std::breakpoint() should behave. Setting the global to a positive value will cause std::breakpoint() to use that value as an argument to std::raise, so debuggers that prefer SIGABRT for breakpoints can select that. By default std::breakpoint() will use a platform-specific action such as the INT3 instruction on x86, or GCC's __builtin_trap(). On Linux the std::is_debugger_present() function checks whether the process is being traced by a process named "gdb", "gdbserver" or "lldb-server", to try to avoid interpreting other tracing processes (such as strace) as a debugger. There have been comments suggesting this isn't desirable and that std::is_debugger_present() should just return true for any tracing process (which is the case for non-Linux targets that support the ptrace system call). libstdc++-v3/ChangeLog: PR libstdc++/119670 * acinclude.m4 (GLIBCXX_CHECK_DEBUGGING): Check for facilities needed by <debugging>. * config.h.in: Regenerate. * configure: Regenerate. * configure.ac: Use GLIBCXX_CHECK_DEBUGGING. * include/Makefile.am: Add new header. * include/Makefile.in: Regenerate. * include/bits/version.def (debugging): Add. * include/bits/version.h: Regenerate. * include/precompiled/stdc++.h: Add new header. * src/c++26/Makefile.am: Add new file. * src/c++26/Makefile.in: Regenerate. * include/std/debugging: New file. * src/c++26/debugging.cc: New file. * testsuite/19_diagnostics/debugging/breakpoint.cc: New test. * testsuite/19_diagnostics/debugging/breakpoint_if_debugging.cc: New test. * testsuite/19_diagnostics/debugging/is_debugger_present.cc: New test. * testsuite/19_diagnostics/debugging/is_debugger_present-2.cc: New test. Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>
2025-08-28libstdc++: Remove implicit type conversions in std::complexWeslley da Silva Pereira1-7/+8
The current implementation of `complex<_Tp>` assumes that int `int` is implicitly convertible to `_Tp`, e.g., when using `complex<_Tp>(1)`. This patch transforms the implicit conversions into explicit type casts. As a result, `std::complex` is now able to support more types. One example is the type `Eigen::Half` from https://eigen.tuxfamily.org/dox-devel/Half_8h_source.html which does not implement implicit type conversions. libstdc++-v3/ChangeLog: * include/std/complex (polar, __complex_sqrt, pow) (__complex_pow_unsigned): Use explicit conversions from int to the complex value_type.
2025-08-28libstdc++: Constrain bitset(const CharT*) constructor [PR121046]Jonathan Wakely1-1/+7
Asking std::is_constructible_v<std::bitset<1>, NonTrivial*> gives an error, rather than answering the query. The problem is that the constructor for std::bitset("010101") is not constrained to only accept pointers to char-like types, and for the second parameter (which has a default argument) std::basic_string_view<CharT> gets instantiated. If the type is not char-like then that has undefined behaviour, and might trigger a static_assert to fail in the body of std::basic_string_view. We can fix it by constraining that constructor using the requirements for char-like types from [strings.general] p1. I've submitted LWG 4294 and proposed making this change in the standard. libstdc++-v3/ChangeLog: PR libstdc++/121046 * include/std/bitset (bitset(const CharT*, ...)): Add constraints on CharT type. * testsuite/23_containers/bitset/lwg4294.cc: New test.
2025-08-27libstdc++: Move tai_- and gps_clock::now impls out of ABINathan Myers1-17/+15
This patch moves std::tai_clock::now() and std::tai_clock::now() definitions from header inlines to static members invoked via a normal function call, in service of stabilizing the C++20 ABI. It also changes #if guards to mention the actual __cpp_lib_* feature gated, not just the language version, for clarity. New global function symbols std::chrono::tai_clock::now and std::chrono::gps_clock::now are exported. libstdc++-v3/ChangeLog: * include/std/chrono (gps_clock::now, tai_clock::now): Remove inline definitions. * src/c++20/clock.cc (gps_clock::now, tai_clock::now): New file for out-of-line now() impls. * src/c++20/Makefile.am: Mention clock.cc. * src/c++20/Makefile.in: Regenerate. * config/abi/pre/gnu.ver: add mangled now() symbols.
2025-08-26libstdc++/ranges: Prefer using offset-based _CachedPositionPatrick Palka1-2/+0
The offset-based partial specialization of _CachedPosition for random-access iterators is currently only selected if the offset type is smaller than the iterator type. Before r12-1018-g46ed811bcb4b86 this made sense since the main partial specialization only stored the iterator (incorrectly). After that bugfix, the main partial specialization now effectively stores a std::optional<iter> so the size constraint is inaccurate. And this main partial specialization must invalidate itself upon copy/move unlike the offset-based partial specialization. So I think we should just always prefer the offset-based _CachedPosition for a random-access iterator, even if the offset type happens to be larger than the iterator type. libstdc++-v3/ChangeLog: * include/std/ranges (__detail::_CachedPosition): Remove additional size constraint on the offset-based partial specialization. Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com> Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
2025-08-26libstdc++: Refactor bound arguments storage for bind_front/backTomasz Kamiński1-26/+87
This patch refactors the implementation of bind_front and bind_back to avoid using std::tuple for argument storage. Instead, bound arguments are now: * stored directly if there is only one, * within a dedicated _Bound_arg_storage otherwise. _Bound_arg_storage is less expensive to instantiate and access than std::tuple. It can also be trivially copyable, as it doesn't require a non-trivial assignment operator for reference types. Storing a single argument directly provides similar benefits compared to both one element tuple or _Bound_arg_storage. _Bound_arg_storage holds each argument in an _Indexed_bound_arg base object. The base class is parameterized by both type and index to allow storing multiple arguments of the same type. Invocations are handled by _S_apply_front amd _S_apply_back static functions, which simulate explicit object parameters. To facilitate this, the __like_t alias template is now unconditionally available since C++11 in bits/move.h. libstdc++-v3/ChangeLog: * include/bits/move.h (std::__like_impl, std::__like_t): Make available in c++11. * include/std/functional (std::_Indexed_bound_arg) (std::_Bound_arg_storage, std::__make_bound_args): Define. (std::_Bind_front, std::_Bind_back): Use _Bound_arg_storage. * testsuite/20_util/function_objects/bind_back/1.cc: Expand test to cover cases of 0, 1, many bound args. * testsuite/20_util/function_objects/bind_back/111327.cc: Likewise. * testsuite/20_util/function_objects/bind_front/1.cc: Likewise. * testsuite/20_util/function_objects/bind_front/111327.cc: Likewise. Reviewed-by: Jonathan Wakely <jwakely@redhat.com> Reviewed-by: Patrick Palka <ppalka@redhat.com> Signed-off-by: Tomasz Kamiński <tkaminsk@redhat.com>
2025-08-26libstdc++: Specialize _Never_valueless_alt for jthread, stop_token and ↵Tomasz Kamiński2-0/+33
stop_source The move constructors for stop_source and stop_token are equivalent to copying and clearing the raw pointer, as they are wrappers for a counted-shared state. For jthread, the move constructor performs a member-wise move of stop_source and thread. While std::thread could also have a _Never_valueless_alt specialization due to its inexpensive move (only moving a handle), doing so now would change the ABI. This patch takes the opportunity to correct this behavior for jthread, before C++20 API is marked stable. libstdc++-v3/ChangeLog: * include/std/stop_token (__variant::_Never_valueless_alt): Declare. (__variant::_Never_valueless_alt<std::stop_token>) (__variant::_Never_valueless_alt<std::stop_source>): Define. * include/std/thread: (__variant::_Never_valueless_alt): Declare. (__variant::_Never_valueless_alt<std::jthread>): Define.
2025-08-21libstdc++: Check _GLIBCXX_USE_PTHREAD_MUTEX_CLOCKLOCK with #if [PR121496]Jonathan Wakely1-2/+2
The change in r14-905-g3b7cb33033fbe6 to disable the use of pthread_mutex_clocklock when TSan is active assumed that the _GLIBCXX_USE_PTHREAD_MUTEX_CLOCKLOCK macro was always checked with #if rather than #ifdef, which was not true. This makes the checks use #if consistently. libstdc++-v3/ChangeLog: PR libstdc++/121496 * include/std/mutex (__timed_mutex_impl::_M_try_wait_until): Change preprocessor condition to use #if instead of #ifdef. (recursive_timed_mutex::_M_clocklock): Likewise. * testsuite/30_threads/timed_mutex/121496.cc: New test. Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>
2025-08-21libstdc++: Implement aligned_accessor from mdspan [PR120994]Luc Grosheintz1-0/+53
This commit completes the implementation of P2897R7 by implementing and testing the template class aligned_accessor. PR libstdc++/120994 libstdc++-v3/ChangeLog: * include/bits/version.def (aligned_accessor): Add. * include/bits/version.h: Regenerate. * include/std/mdspan (aligned_accessor): New class. * src/c++23/std.cc.in (aligned_accessor): Add. * testsuite/23_containers/mdspan/accessors/generic.cc: Add tests for aligned_accessor. * testsuite/23_containers/mdspan/accessors/aligned_neg.cc: New test. * testsuite/23_containers/mdspan/version.cc: Add test for __cpp_lib_aligned_accessor. Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com> Signed-off-by: Luc Grosheintz <luc.grosheintz@gmail.com>
2025-08-21libstdc++: Implement is_sufficiently_aligned [PR120994]Luc Grosheintz1-0/+1
This commit implements and tests the function is_sufficiently_aligned from P2897R7. PR libstdc++/120994 libstdc++-v3/ChangeLog: * include/bits/align.h (is_sufficiently_aligned): New function. * include/bits/version.def (is_sufficiently_aligned): Add. * include/bits/version.h: Regenerate. * include/std/memory: Add __glibcxx_want_is_sufficiently_aligned. * src/c++23/std.cc.in (is_sufficiently_aligned): Add. * testsuite/20_util/headers/memory/version.cc: Add test for __cpp_lib_is_sufficiently_aligned. * testsuite/20_util/is_sufficiently_aligned/1.cc: New test. Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com> Signed-off-by: Luc Grosheintz <luc.grosheintz@gmail.com>
2025-08-21libstdc++: Fix std::numeric_limits<__float128>::max_digits10 [PR121374]Jonathan Wakely1-1/+1
When I added this explicit specialization in r14-1433-gf150a084e25eaa I used the wrong value for the number of mantissa digits (I used 112 instead of 113). Then when I refactored it in r14-1582-g6261d10521f9fd I used the value calculated from the incorrect value (35 instead of 36). libstdc++-v3/ChangeLog: PR libstdc++/121374 * include/std/limits (numeric_limits<__float128>::max_digits10): Fix value. * testsuite/18_support/numeric_limits/128bit.cc: Check value.
2025-08-21libstdc++: Implement std::dims from <mdspan>.Luc Grosheintz1-0/+5
This commit implements the C++26 feature std::dims described in P2389R2. It sets the feature testing macro to 202406 and adds tests. Also fixes the test mdspan/version.cc libstdc++-v3/ChangeLog: * include/bits/version.def (mdspan): Set value for C++26. * include/bits/version.h: Regenerate. * include/std/mdspan (dims): Add. * src/c++23/std.cc.in (dims): Add. * testsuite/23_containers/mdspan/extents/misc.cc: Add tests. * testsuite/23_containers/mdspan/version.cc: Update test. Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com> Signed-off-by: Luc Grosheintz <luc.grosheintz@gmail.com>
2025-08-21libstdc++: Simplify precomputed partial products in <mdspan>.Luc Grosheintz1-21/+23
Prior to this commit, the partial products of static extents in <mdspan> was done in a loop that calls a function that computes the partial product. The complexity is quadratic in the rank. This commit removes the quadratic complexity. libstdc++-v3/ChangeLog: * include/std/mdspan (__static_prod): Delete. (__fwd_partial_prods): Compute at compile-time in O(rank), not O(rank**2). (__rev_partial_prods): Ditto. (__size): Inline __static_prod. Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com> Signed-off-by: Luc Grosheintz <luc.grosheintz@gmail.com>
2025-08-21libstdc++: Reduce size static storage for __fwd_prod in mdspan.Luc Grosheintz1-2/+2
This fixes an oversight in a previous commit that improved mdspan related code. Because __size doesn't use __fwd_prod, __fwd_prod(__rank) is not needed anymore. Hence, one can shrink the size of __fwd_partial_prods. libstdc++-v3/ChangeLog: * include/std/mdspan (__fwd_partial_prods): Reduce size of the array by 1 element. Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com> Signed-off-by: Luc Grosheintz <luc.grosheintz@gmail.com>
2025-08-21libstdc++: Replace numeric_limit with __int_traits in mdspan.Luc Grosheintz1-12/+14
Using __int_traits avoids the need to include <limits> from <mdspan>. This in turn should reduce the size of the pre-compiled <mdspan>. Similar refactoring was carried out for PR92546. Unfortunately, ./gcc/xgcc -std=c++23 -P -E -x c++ - -include mdspan | wc -l shows a decrease by 1(!) line. This is due to bits/max_size_type.h which includes <limits>. libstdc++-v3/ChangeLog: * include/std/mdspan (__valid_static_extent): Replace numeric_limits with __int_traits. (extents::_S_ctor_explicit): Ditto. (extents::__static_quotient): Ditto. (layout_stride::mapping::mapping): Ditto. (mdspan::size): Ditto. * testsuite/23_containers/mdspan/extents/class_mandates_neg.cc: Update test with additional diagnostics. Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com> Signed-off-by: Luc Grosheintz <luc.grosheintz@gmail.com>
2025-08-21libstdc++: Improve extents::operator==.Luc Grosheintz1-4/+5
An interesting case to consider is: bool same11(const std::extents<int, dyn, 2, 3>& e1, const std::extents<int, dyn, dyn, 3>& e2) { return e1 == e2; } Which has the following properties: - There's no mismatching static extents, preventing any short-circuiting. - There's a comparison between dynamic and static extents. - There's one trivial comparison: ... && 3 == 3. Let E[i] denote the array of static extents, D[k] denote the array of dynamic extents and k[i] be the index of the i-th extent in D. (Naturally, k[i] is only meaningful if i is a dynamic extent). The previous implementation results in assembly that's more or less a literal translation of: for (i = 0; i < 3; ++i) e1 = E1[i] == -1 ? D1[k1[i]] : E1[i]; e2 = E2[i] == -1 ? D2[k2[i]] : E2[i]; if e1 != e2: return false return true; While the proposed method results in assembly for if(D1[0] == D2[0]) return false; return 2 == D2[1]; i.e. 110: 8b 17 mov edx,DWORD PTR [rdi] 112: 31 c0 xor eax,eax 114: 39 16 cmp DWORD PTR [rsi],edx 116: 74 08 je 120 <same11+0x10> 118: c3 ret 119: 0f 1f 80 00 00 00 00 nop DWORD PTR [rax+0x0] 120: 83 7e 04 02 cmp DWORD PTR [rsi+0x4],0x2 124: 0f 94 c0 sete al 127: c3 ret It has the following nice properties: - It eliminated the indirection D[k[i]], because k[i] is known at compile time. Saving us a comparison E[i] == -1 and conditionally loading k[i]. - It eliminated the trivial condition 3 == 3. The result is code that only loads the required values and performs exactly the number of comparisons needed by the algorithm. It also results in smaller object files. Therefore, this seems like a sensible change. We've check several other examples, including fully statically determined cases and high-rank examples. The example given above illustrates the other cases well. The constexpr condition: if constexpr (!_S_is_compatible_extents<...>) return false; is no longer needed, because the optimizer correctly handles this case. However, it's retained for clarity/certainty. libstdc++-v3/ChangeLog: * include/std/mdspan (extents::operator==): Replace loop with pack expansion. Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com> Signed-off-by: Luc Grosheintz <luc.grosheintz@gmail.com>
2025-08-21libstdc++: Reduce indirection in extents::extent.Luc Grosheintz1-10/+25
In both fully static and dynamic extents the comparison static_extent(i) == dynamic_extent is known at compile time. As a result, extents::extent doesn't need to perform the check at runtime. An illustrative example is: using E = std::extents<int, 3, 5, 7, 11, 13, 17>; int required_span_size(const typename Layout::mapping<E>& m) { return m.required_span_size(); } Prior to this commit the generated code (on -O2) is: 2a0: b9 01 00 00 00 mov ecx,0x1 2a5: 31 d2 xor edx,edx 2a7: 66 66 2e 0f 1f 84 00 data16 cs nop WORD PTR [rax+rax*1+0x0] 2ae: 00 00 00 00 2b2: 66 66 2e 0f 1f 84 00 data16 cs nop WORD PTR [rax+rax*1+0x0] 2b9: 00 00 00 00 2bd: 0f 1f 00 nop DWORD PTR [rax] 2c0: 48 8b 04 d5 00 00 00 mov rax,QWORD PTR [rdx*8+0x0] 2c7: 00 2c8: 48 83 f8 ff cmp rax,0xffffffffffffffff 2cc: 0f 84 00 00 00 00 je 2d2 <required_span_size_6d_static+0x32> 2d2: 83 e8 01 sub eax,0x1 2d5: 0f af 04 97 imul eax,DWORD PTR [rdi+rdx*4] 2d9: 48 83 c2 01 add rdx,0x1 2dd: 01 c1 add ecx,eax 2df: 48 83 fa 06 cmp rdx,0x6 2e3: 75 db jne 2c0 <required_span_size_6d_static+0x20> 2e5: 89 c8 mov eax,ecx 2e7: c3 ret which is a scalar loop, and notably includes the check 308: 48 83 f8 ff cmp rax,0xffffffffffffffff to assert that the static extent is indeed not -1. Note, that on -O3 the optimizer eliminates the comparison; and generates a sequence of scalar operations: lea, shl, add and mov. The aim of this commit is to eliminate this comparison also for -O2. With the optimization applied we get: 2e0: f3 0f 6f 0f movdqu xmm1,XMMWORD PTR [rdi] 2e4: 66 0f 6f 15 00 00 00 movdqa xmm2,XMMWORD PTR [rip+0x0] 2eb: 00 2ec: 8b 57 10 mov edx,DWORD PTR [rdi+0x10] 2ef: 66 0f 6f c1 movdqa xmm0,xmm1 2f3: 66 0f 73 d1 20 psrlq xmm1,0x20 2f8: 66 0f f4 c2 pmuludq xmm0,xmm2 2fc: 66 0f 73 d2 20 psrlq xmm2,0x20 301: 8d 14 52 lea edx,[rdx+rdx*2] 304: 66 0f f4 ca pmuludq xmm1,xmm2 308: 66 0f 70 c0 08 pshufd xmm0,xmm0,0x8 30d: 66 0f 70 c9 08 pshufd xmm1,xmm1,0x8 312: 66 0f 62 c1 punpckldq xmm0,xmm1 316: 66 0f 6f c8 movdqa xmm1,xmm0 31a: 66 0f 73 d9 08 psrldq xmm1,0x8 31f: 66 0f fe c1 paddd xmm0,xmm1 323: 66 0f 6f c8 movdqa xmm1,xmm0 327: 66 0f 73 d9 04 psrldq xmm1,0x4 32c: 66 0f fe c1 paddd xmm0,xmm1 330: 66 0f 7e c0 movd eax,xmm0 334: 8d 54 90 01 lea edx,[rax+rdx*4+0x1] 338: 8b 47 14 mov eax,DWORD PTR [rdi+0x14] 33b: c1 e0 04 shl eax,0x4 33e: 01 d0 add eax,edx 340: c3 ret Which shows eliminating the trivial comparison, unlocks a new set of optimizations, i.e. SIMD-vectorization. In particular, the loop has been vectorized by loading the first four constants from aligned memory; the first four strides from non-aligned memory, then computes the product and reduction. It interleaves the above with computing 1 + 12*S[4] + 16*S[5] (as scalar operations) and then finishes the reduction. A similar effect can be observed for fully dynamic extents. libstdc++-v3/ChangeLog: * include/std/mdspan (__mdspan::__all_static): New function. (__mdspan::_StaticExtents::_S_is_dyn): Inline and eliminate. (__mdspan::_ExtentsStorage::_S_is_dynamic): New method. (__mdspan::_ExtentsStorage::_M_extent): Use _S_is_dynamic. Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com> Signed-off-by: Luc Grosheintz <luc.grosheintz@gmail.com>
2025-08-21libstdc++: Improve nearly fully dynamic extents in mdspan.Luc Grosheintz1-2/+2
One previous commit optimized fully dynamic extents; and another refactored __size such that __fwd_prod is valid for __r = 0, ..., rank (exclusive). Therefore, by noticing that __rev_prod (and __fwd_prod) never accesses the first (or last) extent, one can avoid pre-computing partial products of static extents in those cases, if all other extents are dynamic. We check that the size of the reference object file decreases further and the .rodata sections for __fwd_prod<dyn, ..., dyn, 11> __rev_prod<3, dyn, ..., dyn> are absent. libstdc++-v3/ChangeLog: * include/std/mdspan (__fwd_prods): Relax condition for fully-dynamic extents to cover (dyn, ..., dyn, X). (__rev_partial_prods): Analogous for (X, dyn, ..., dyn). Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com> Signed-off-by: Luc Grosheintz <luc.grosheintz@gmail.com>
2025-08-21libstdc++: Improve fully dynamic extents in mdspan.Luc Grosheintz1-10/+53
In mdspan related code, for extents with no static extents, i.e. only dynamic extents, the following simplifications can be made: - The array of dynamic extents has size rank. - The two arrays dynamic-index and dynamic-index-inv become trivial, e.g. k[i] == i. - All elements of the arrays __{fwd,rev}_partial_prods are 1. This commits eliminates the arrays for dynamic-index, dynamic-index-inv and __{fwd,rev}_partial_prods. It also removes the indirection k[i] == i from the source code, which isn't as relevant because the optimizer is (often) capable of eliminating the indirection. To check if it's working we look at: using E2 = std::extents<int, dyn, dyn, dyn, dyn>; int stride_left_E2(const std::layout_left::mapping<E2>& m, size_t r) { return m.stride(r); } which generates the following 0000000000000190 <stride_left_E2>: 190: 48 c1 e6 02 shl rsi,0x2 194: 74 22 je 1b8 <stride_left_E2+0x28> 196: 48 01 fe add rsi,rdi 199: b8 01 00 00 00 mov eax,0x1 19e: 66 90 xchg ax,ax 1a0: 48 63 17 movsxd rdx,DWORD PTR [rdi] 1a3: 48 83 c7 04 add rdi,0x4 1a7: 48 0f af c2 imul rax,rdx 1ab: 48 39 fe cmp rsi,rdi 1ae: 75 f0 jne 1a0 <stride_left_E2+0x10> 1b0: c3 ret 1b1: 0f 1f 80 00 00 00 00 nop DWORD PTR [rax+0x0] 1b8: b8 01 00 00 00 mov eax,0x1 1bd: c3 ret We see that: - There's no code to load the partial product of static extents. - There's no indirection D[k[i]], it's just D[i] (as before). On a test file which computes both mapping::stride(r) and mapping::required_span_size, we check for static storage with objdump -h we don't see the NTTP _Extents, anything (anymore) related to _StaticExtents, __fwd_partial_prods or __rev_partial_prods. We also check that the size of the reference object file (described three commits prior) reduced by a few percent from 41.9kB to 39.4kB. libstdc++-v3/ChangeLog: * include/std/mdspan (__mdspan::__all_dynamic): New function. (__mdspan::_StaticExtents::_S_dynamic_index): Convert to method. (__mdspan::_StaticExtents::_S_dynamic_index_inv): Ditto. (__mdspan::_StaticExtents): New specialization for fully dynamic extents. (__mdspan::__fwd_prod): New constexpr if branch to avoid instantiating __fwd_partial_prods. (__mdspan::__rev_prod): Ditto. Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com> Signed-off-by: Luc Grosheintz <luc.grosheintz@gmail.com>
2025-08-21libstdc++: Improve low-rank layout_{left,right}::stride.Luc Grosheintz1-6/+28
The methods layout_{left,right}::mapping::stride are defined as \prod_{i = 0}^r E[i] \prod_{i = r+1}^n E[i] This is computed as the product of a precomputed static product and the product of the required dynamic extents. Disassembly shows that even for low-rank extents, i.e. rank == 1 and rank == 2, with at least one dynamic extent, the generated code loads two values; and then runs the loop over at most one element, e.g. for stride_left_d5 defined below the generated code is: 220: 48 8b 04 f5 00 00 00 mov rax,QWORD PTR [rsi*8+0x0] 227: 00 228: 31 d2 xor edx,edx 22a: 48 85 c0 test rax,rax 22d: 74 23 je 252 <stride_left_d5+0x32> 22f: 48 8b 0c f5 00 00 00 mov rcx,QWORD PTR [rsi*8+0x0] 236: 00 237: 48 c1 e1 02 shl rcx,0x2 23b: 74 13 je 250 <stride_left_d5+0x30> 23d: 48 01 f9 add rcx,rdi 240: 48 63 17 movsxd rdx,DWORD PTR [rdi] 243: 48 83 c7 04 add rdi,0x4 247: 48 0f af c2 imul rax,rdx 24b: 48 39 f9 cmp rcx,rdi 24e: 75 f0 jne 240 <stride_left_d5+0x20> 250: 89 c2 mov edx,eax 252: 89 d0 mov eax,edx 254: c3 ret If there's no dynamic extents, it simply loads the precomputed product of static extents. For rank == 1 the answer is the constant `1`; for rank == 2 it's either 1 or extents.extent(k), with k == 0 for layout_left and k == 1 for layout_right. Consider, using Ed = std::extents<int, dyn>; int stride_left_d(const std::layout_left::mapping<Ed>& m, size_t r) { return m.stride(r); } using E3d = std::extents<int, 3, dyn>; int stride_left_3d(const std::layout_left::mapping<E3d>& m, size_t r) { return m.stride(r); } using Ed5 = std::extents<int, dyn, 5>; int stride_left_d5(const std::layout_left::mapping<Ed5>& m, size_t r) { return m.stride(r); } The optimized code for these three cases is: 0000000000000060 <stride_left_d>: 60: b8 01 00 00 00 mov eax,0x1 65: c3 ret 0000000000000090 <stride_left_3d>: 90: 48 83 fe 01 cmp rsi,0x1 94: 19 c0 sbb eax,eax 96: 83 e0 fe and eax,0xfffffffe 99: 83 c0 03 add eax,0x3 9c: c3 ret 00000000000000a0 <stride_left_d5>: a0: b8 01 00 00 00 mov eax,0x1 a5: 48 85 f6 test rsi,rsi a8: 74 02 je ac <stride_left_d5+0xc> aa: 8b 07 mov eax,DWORD PTR [rdi] ac: c3 ret For rank == 1 it simply returns 1 (as expected). For rank == 2, it either implements a branchless formula, or conditionally loads one value. In all cases involving a dynamic extent this seems like it's always doing clearly less work, both in terms of computation and loads. In cases not involving a dynamic extent, it replaces loading one value with a branchless sequence of four instructions. This commit also refactors __size to no use any of the precomputed arrays. This prevents instantiating __{fwd,rev}_partial_prods for low-rank extents. This results in a further size reduction of a reference object file (described two commits prior) by 9% from 46.0kB to 41.9kB. In a prior commit we optimized __size to produce better object code by precomputing the static products. This refactor enables the optimizer to generate the same optimized code. libstdc++-v3/ChangeLog: * include/std/mdspan (__mdspan::__fwd_prod): Optimize for rank <= 2. (__mdspan::__rev_prod): Ditto. (__mdspan::__size): Refactor to use a pre-computed product, not a partial product. Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com> Signed-off-by: Luc Grosheintz <luc.grosheintz@gmail.com>
2025-08-21libstdc++: Precompute products of static extents.Luc Grosheintz1-25/+52
Let E denote an multi-dimensional extent; n the rank of E; r = 0, ..., n; E[i] the i-th extent; and D[k] be the (possibly empty) array of dynamic extents. The two partial products for r = 0, ..., n: \prod_{i = 0}^r E[i] (fwd) \prod_{i = r+1}^n E[i] (rev) can be computed as the product of static and dynamic extents. The static fwd and rev product can be computed at compile time for all values of r. Three methods are directly affected by this optimization: layout_left::mapping::stride layout_right::mapping::stride mdspan::size We'll check the generated code (-O2) for all three methods for a generic (artificially) high-dimensional multi-dimensional extents. Consider a generic case: using Extents = std::extents<int, 3, 5, dyn, dyn, dyn, 7, dyn>; int stride_left(const std::layout_left::mapping<Extents>& m, size_t r) { return m.stride(r); } The code generated prior to this commit: 4f0: 66 0f 6f 05 00 00 00 movdqa xmm0,XMMWORD PTR [rip+0x0] # 4f8 4f7: 00 4f8: 48 83 c6 01 add rsi,0x1 4fc: 48 c7 44 24 e8 ff ff mov QWORD PTR [rsp-0x18],0xffffffffffffffff 503: ff ff 505: 48 8d 04 f5 00 00 00 lea rax,[rsi*8+0x0] 50c: 00 50d: 0f 29 44 24 b8 movaps XMMWORD PTR [rsp-0x48],xmm0 512: 66 0f 76 c0 pcmpeqd xmm0,xmm0 516: 0f 29 44 24 c8 movaps XMMWORD PTR [rsp-0x38],xmm0 51b: 66 0f 6f 05 00 00 00 movdqa xmm0,XMMWORD PTR [rip+0x0] # 523 522: 00 523: 0f 29 44 24 d8 movaps XMMWORD PTR [rsp-0x28],xmm0 528: 48 83 f8 38 cmp rax,0x38 52c: 74 72 je 5a0 <stride_right_E1+0xb0> 52e: 48 8d 54 04 b8 lea rdx,[rsp+rax*1-0x48] 533: 4c 8d 4c 24 f0 lea r9,[rsp-0x10] 538: b8 01 00 00 00 mov eax,0x1 53d: 0f 1f 00 nop DWORD PTR [rax] 540: 48 8b 0a mov rcx,QWORD PTR [rdx] 543: 49 89 c0 mov r8,rax 546: 4c 0f af c1 imul r8,rcx 54a: 48 83 f9 ff cmp rcx,0xffffffffffffffff 54e: 49 0f 45 c0 cmovne rax,r8 552: 48 83 c2 08 add rdx,0x8 556: 49 39 d1 cmp r9,rdx 559: 75 e5 jne 540 <stride_right_E1+0x50> 55b: 48 85 c0 test rax,rax 55e: 74 38 je 598 <stride_right_E1+0xa8> 560: 48 8b 14 f5 00 00 00 mov rdx,QWORD PTR [rsi*8+0x0] 567: 00 568: 48 c1 e2 02 shl rdx,0x2 56c: 48 83 fa 10 cmp rdx,0x10 570: 74 1e je 590 <stride_right_E1+0xa0> 572: 48 8d 4f 10 lea rcx,[rdi+0x10] 576: 48 01 d7 add rdi,rdx 579: 0f 1f 80 00 00 00 00 nop DWORD PTR [rax+0x0] 580: 48 63 17 movsxd rdx,DWORD PTR [rdi] 583: 48 83 c7 04 add rdi,0x4 587: 48 0f af c2 imul rax,rdx 58b: 48 39 f9 cmp rcx,rdi 58e: 75 f0 jne 580 <stride_right_E1+0x90> 590: c3 ret 591: 0f 1f 80 00 00 00 00 nop DWORD PTR [rax+0x0] 598: c3 ret 599: 0f 1f 80 00 00 00 00 nop DWORD PTR [rax+0x0] 5a0: b8 01 00 00 00 mov eax,0x1 5a5: eb b9 jmp 560 <stride_right_E1+0x70> 5a7: 66 0f 1f 84 00 00 00 nop WORD PTR [rax+rax*1+0x0] 5ae: 00 00 which seems to be performing: preparatory_work(); ret = 1 for(i = 0; i < rank; ++i) tmp = ret * E[i] if E[i] != -1 ret = tmp for(i = 0; i < rank_dynamic; ++i) ret *= D[i] This commit reduces it down to: 270: 48 8b 04 f5 00 00 00 mov rax,QWORD PTR [rsi*8+0x0] 277: 00 278: 31 d2 xor edx,edx 27a: 48 85 c0 test rax,rax 27d: 74 33 je 2b2 <stride_right_E1+0x42> 27f: 48 8b 14 f5 00 00 00 mov rdx,QWORD PTR [rsi*8+0x0] 286: 00 287: 48 c1 e2 02 shl rdx,0x2 28b: 48 83 fa 10 cmp rdx,0x10 28f: 74 1f je 2b0 <stride_right_E1+0x40> 291: 48 8d 4f 10 lea rcx,[rdi+0x10] 295: 48 01 d7 add rdi,rdx 298: 0f 1f 84 00 00 00 00 nop DWORD PTR [rax+rax*1+0x0] 29f: 00 2a0: 48 63 17 movsxd rdx,DWORD PTR [rdi] 2a3: 48 83 c7 04 add rdi,0x4 2a7: 48 0f af c2 imul rax,rdx 2ab: 48 39 f9 cmp rcx,rdi 2ae: 75 f0 jne 2a0 <stride_right_E1+0x30> 2b0: 89 c2 mov edx,eax 2b2: 89 d0 mov eax,edx 2b4: c3 ret Loosely speaking this does the following: 1. Load the starting position k in the array of dynamic extents; and return if possible. 2. Load the partial product of static extents. 3. Computes the \prod_{i = k}^d D[i] where d is the number of dynamic extents in a loop. It shows that the span used for passing in the dynamic extents is completely eliminated; and the fact that the product always runs to the end of the array of dynamic extents is used by the compiler to eliminate one indirection to determine the end position in the array of dynamic extents. The analogous code is generated for layout_left. Next, consider using E2 = std::extents<int, 3, 5, dyn, dyn, 7, dyn, 11>; int size2(const std::mdspan<double, E2>& md) { return md.size(); } on immediately preceding commit the generated code is 10: 66 0f 6f 05 00 00 00 movdqa xmm0,XMMWORD PTR [rip+0x0] # 18 17: 00 18: 49 89 f8 mov r8,rdi 1b: 48 8d 44 24 b8 lea rax,[rsp-0x48] 20: 48 c7 44 24 e8 0b 00 mov QWORD PTR [rsp-0x18],0xb 27: 00 00 29: 48 8d 7c 24 f0 lea rdi,[rsp-0x10] 2e: ba 01 00 00 00 mov edx,0x1 33: 0f 29 44 24 b8 movaps XMMWORD PTR [rsp-0x48],xmm0 38: 66 0f 76 c0 pcmpeqd xmm0,xmm0 3c: 0f 29 44 24 c8 movaps XMMWORD PTR [rsp-0x38],xmm0 41: 66 0f 6f 05 00 00 00 movdqa xmm0,XMMWORD PTR [rip+0x0] # 49 48: 00 49: 0f 29 44 24 d8 movaps XMMWORD PTR [rsp-0x28],xmm0 4e: 66 66 2e 0f 1f 84 00 data16 cs nop WORD PTR [rax+rax*1+0x0] 55: 00 00 00 00 59: 0f 1f 80 00 00 00 00 nop DWORD PTR [rax+0x0] 60: 48 8b 08 mov rcx,QWORD PTR [rax] 63: 48 89 d6 mov rsi,rdx 66: 48 0f af f1 imul rsi,rcx 6a: 48 83 f9 ff cmp rcx,0xffffffffffffffff 6e: 48 0f 45 d6 cmovne rdx,rsi 72: 48 83 c0 08 add rax,0x8 76: 48 39 c7 cmp rdi,rax 79: 75 e5 jne 60 <size2+0x50> 7b: 48 85 d2 test rdx,rdx 7e: 74 18 je 98 <size2+0x88> 80: 49 63 00 movsxd rax,DWORD PTR [r8] 83: 49 63 48 04 movsxd rcx,DWORD PTR [r8+0x4] 87: 48 0f af c1 imul rax,rcx 8b: 41 0f af 40 08 imul eax,DWORD PTR [r8+0x8] 90: 0f af c2 imul eax,edx 93: c3 ret 94: 0f 1f 40 00 nop DWORD PTR [rax+0x0] 98: 31 c0 xor eax,eax 9a: c3 ret which is needlessly long. The current commit reduces it down to: 10: 48 63 07 movsxd rax,DWORD PTR [rdi] 13: 48 63 57 04 movsxd rdx,DWORD PTR [rdi+0x4] 17: 48 0f af c2 imul rax,rdx 1b: 0f af 47 08 imul eax,DWORD PTR [rdi+0x8] 1f: 69 c0 83 04 00 00 imul eax,eax,0x483 25: c3 ret Which simply computes the product: D[0] * D[1] * D[2] * const where const is the product of all static extents. Meaning the loop to compute the product of dynamic extents has been fully unrolled and all constants are perfectly precomputed. The size of the object file described in the previous commit reduces by 17% from 55.8kB to 46.0kB. libstdc++-v3/ChangeLog: * include/std/mdspan (__mdspan::__static_prod): New function. (__mdspan::__fwd_partial_prods): Constexpr array of partial forward products. (__mdspan::__fwd_partial_prods): Same for reverse partial products. (__mdspan::__static_extents_prod): Delete function. (__mdspan::__extents_prod): Renamed from __exts_prod and refactored. include/std/mdspan (__mdspan::__fwd_prod): Compute as the product of pre-computed static static and the product of dynamic extents. (__mdspan::__rev_prod): Ditto. Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com> Signed-off-by: Luc Grosheintz <luc.grosheintz@gmail.com>
2025-08-21libstdc++: Reduce template instantiations in <mdspan>.Luc Grosheintz1-19/+37
In mdspan related code involving static extents, often the IndexType is part of the template parameters, even though it's not needed. This commit extracts the parts of _ExtentsStorage not related to IndexType into a separate class _StaticExtents. It also prefers passing the array of static extents, instead of the whole extents object where possible. The size of an object file compiled with -O2 that instantiates Layout::mapping<extents<IndexType, Indices...>::stride Layout::mapping<extents<IndexType, Indices...>::required_span_size for the product of - eight IndexTypes - three Layouts, - nine choices of Indices... decreases by 19% from 69.2kB to 55.8kB. libstdc++-v3/ChangeLog: * include/std/mdspan (__mdspan::_StaticExtents): Extract non IndexType related code from _ExtentsStorage. (__mdspan::_ExtentsStorage): Use _StaticExtents. (__mdspan::__static_extents): Return reference to NTTP of _StaticExtents. (__mdspan::__contains_zero): New overload. (__mdspan::__exts_prod, __mdspan::__static_quotient): Use span to avoid copying __sta_exts. Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com> Signed-off-by: Luc Grosheintz <luc.grosheintz@gmail.com>
2025-07-28libstdc++: Fix style issues in <mdspan>.Luc Grosheintz1-13/+4
libstdc++-v3/ChangeLog: * include/std/mdspan: Small stylistic adjustments. Signed-off-by: Luc Grosheintz <luc.grosheintz@gmail.com>
2025-07-28libstdc++: Support braces as arguments for std::erase on inplace_vector ↵Tomasz Kamiński1-1/+1
[PR121196] PR libstdc++/121196 libstdc++-v3/ChangeLog: * include/std/inplace_vector (std::erase): Provide default argument for _Up parameter. * testsuite/23_containers/inplace_vector/erasure.cc: Add test for using braces-init-list as arguments to erase_if and use function to verify content of inplace_vector Reviewed-by: Patrick Palka <ppalka@redhat.com> Signed-off-by: Tomasz Kamiński <tkaminsk@redhat.com>
2025-07-22libstdc++: Fix obvious mistake in inplace_vector::assign_range [PR119137]Tomasz Kamiński1-1/+1
In case of input iterators, the loop that assigns to existing elements should run up to number of elements in vector (_M_size) not capacity (_Nm). PR libstdc++/119137 libstdc++-v3/ChangeLog: * include/std/inplace_vector (inplace_vector::assign_range): Replace _Nm with _M_size in the assigment loop.