|
As I can't think of how the middle-end would treat
__builtin_start_lifetime_as other than a blackbox and probably would
need to be implemented as such inline asm in RTL, this patch
just implements it using inline asm in the library.
If nothing else, it can serve as a fallback until we and/or clang
get a builtin for it.
Right now the inline asms pretend to (potentially) read from and write to the
whole memory region and make the optimizers forget where the return value
points to.
If the optimizers don't know where it points to, I think that should be
good enough, but I'm a little bit afraid of possible future optimizations
trying to optimize
  q->c = 1;
  q->d = 2;
  auto p = std::start_lifetime_as<S>(q);
  if (p == reinterpret_cast<decltype (p)>(q))
    return p->a + p->b;
on the reasoning that, because of the guarding condition or perhaps an
assertion, we could simply use the q pointer in MEM_REFs with S type, and
then be surprised by TBAA.
Though if it is a must-alias case, then we should be fine as well.
Though I guess that would be the same case with a builtin.
2025-09-18 Jakub Jelinek <jakub@redhat.com>
PR c++/106658
* include/bits/version.def: Implement C++23 P2590R2 - Explicit
lifetime management.
(start_lifetime_as): New.
* include/bits/version.h: Regenerate.
* include/std/memory (std::start_lifetime_as,
std::start_lifetime_as_array): New function templates.
* src/c++23/std.cc.in (std::start_lifetime_as,
std::start_lifetime_as_array): Export.
* testsuite/std/memory/start_lifetime_as/start_lifetime_as.cc: New test.
|
|
The std::tuple_cat function has to determine a std::tuple return type
from zero or more tuple-like arguments. This uses the __make_tuple class
template to transform a tuple-like type into a std::tuple, and the
__combine_tuples class template to combine zero or more std::tuple types
into a single std::tuple type.
This change optimizes the __make_tuple class template to use an
_Index_tuple and pack expansion instead of recursive instantiation, and
optimizes __combine_tuples to use fewer levels of recursion.
For ranges::adjacent_view's __detail::__repeated_tuple helper we can
just use the __make_tuple class template directly, instead of doing
overload resolution on std::tuple_cat to get its return type.
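As a rough illustration of the index-sequence technique described above, the sketch below converts a tuple-like type into a std::tuple type with a single pack expansion. The names to_tuple_impl and to_tuple_t are hypothetical and use the public std::index_sequence; they are not the library's internal __make_tuple or _Index_tuple.

```cpp
#include <array>
#include <cstddef>
#include <tuple>
#include <type_traits>
#include <utility>

// Hypothetical helper (not the libstdc++ internals): name every element type
// of a tuple-like type in one pack expansion, instead of peeling elements off
// one recursive instantiation at a time.
template<typename TupleLike, typename Indices>
struct to_tuple_impl;

template<typename TupleLike, std::size_t... Is>
struct to_tuple_impl<TupleLike, std::index_sequence<Is...>>
{
  using type = std::tuple<std::tuple_element_t<Is, TupleLike>...>;
};

template<typename TupleLike>
using to_tuple_t = typename to_tuple_impl<
  TupleLike,
  std::make_index_sequence<std::tuple_size<TupleLike>::value>>::type;

static_assert(std::is_same_v<to_tuple_t<std::pair<int, char>>,
                             std::tuple<int, char>>);
static_assert(std::is_same_v<to_tuple_t<std::array<long, 3>>,
                             std::tuple<long, long, long>>);
static_assert(std::is_same_v<to_tuple_t<std::tuple<>>, std::tuple<>>);
```

The number of class template instantiations here does not grow with a recursion depth, which is the general motivation for preferring pack expansion in this kind of helper.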
libstdc++-v3/ChangeLog:
* include/std/ranges (__detail::__repeated_tuple): Use
__make_tuple helper alias directly, instead of doing overload
resolution on std::tuple_cat.
* include/std/tuple (__make_tuple_impl): Remove.
(__do_make_tuple): Replace recursion with _Index_tuple and pack
expansion.
(__make_tuple): Adjust to new __do_make_tuple definition.
(__combine_tuples<tuple<T1s...>, tuple<T2s...>, Rem...>): Replace
with a partial specialization for exactly two tuples and a
partial specialization for three or more tuples.
Reviewed-by: Patrick Palka <ppalka@redhat.com>
|
|
ranges::adjacent_view::_Iterator::value_type should have been changed by
r14-8710-g65b4cba9d6a9ff to always produce std::tuple, even for the
N == 2 views::pairwise specialization.
libstdc++-v3/ChangeLog:
PR libstdc++/121956
* include/std/ranges (adjacent_view::_Iterator::value_type):
Always define as std::tuple<T, N>, not std::pair<T, T>.
* testsuite/std/ranges/adaptors/adjacent/1.cc: Check value type
of views::pairwise.
Reviewed-by: Patrick Palka <ppalka@redhat.com>
|
|
libstdc++-v3/ChangeLog:
PR libstdc++/119820
* include/bits/ranges_algo.h (__shuffle_fn): Use
ranges::distance to get difference type value to add to
iterator.
* include/std/format (__formatter_str::_M_format_range):
Use ranges::next to increment iterator by a size_t value.
Reviewed-by: Patrick Palka <ppalka@redhat.com>
|
|
[PR121890]
Whenever we use operator+ or similar operators on random access
iterators we need to be careful to use the iterator's difference_type
rather than some other integer type. It's not guaranteed that an
expression with an arbitrary integer type, such as `it + 1u`, has the
same effects as `it + iter_difference_t<It>(1)`.
Some of our algorithms need changes to cast values to the correct type,
or to use std::next or ranges::next instead of `it + n`. Several tests
also need fixes where the arithmetic occurs directly in the test.
The __gnu_test::random_access_iterator_wrapper class template is
adjusted to have deleted operators that make programs ill-formed if the
argument to relevant operators is not the difference_type. This will
make it easier to avoid regressing in future.
libstdc++-v3/ChangeLog:
PR libstdc++/121890
* include/bits/ranges_algo.h (ranges::rotate, ranges::shuffle)
(__insertion_sort, __unguarded_partition_pivot, __introselect):
Use ranges::next to advance iterators. Use local variables in
rotate to avoid duplicate expressions.
(ranges::push_heap, ranges::pop_heap, ranges::partial_sort)
(ranges::partial_sort_copy): Use ranges::prev.
(__final_insertion_sort): Use iter_difference_t<Iter>
for operand of operator+ on iterator.
* include/bits/ranges_base.h (ranges::advance): Use iterator's
difference_type for all iterator arithmetic.
* include/bits/stl_algo.h (__search_n_aux, __rotate)
(__insertion_sort, __unguarded_partition_pivot, __introselect)
(__final_insertion_sort, for_each_n, random_shuffle): Likewise.
Use local variables in __rotate to avoid duplicate expressions.
* include/bits/stl_algobase.h (__fill_n_a, __lc_rai::__newlast1):
Likewise.
* include/bits/stl_heap.h (push_heap): Likewise.
(__is_heap_until): Add static_assert.
(__is_heap): Convert distance to difference_type.
* include/std/functional (boyer_moore_searcher::operator()): Use
iterator's difference_type for iterator arithmetic.
* testsuite/util/testsuite_iterators.h
(random_access_iterator_wrapper): Add deleted overloads of
operators that should be called with difference_type.
* testsuite/24_iterators/range_operations/advance.cc: Use
ranges::next.
* testsuite/25_algorithms/heap/constrained.cc: Use ranges::next
and ranges::prev.
* testsuite/25_algorithms/nth_element/58800.cc: Use std::next.
* testsuite/25_algorithms/nth_element/constrained.cc: Use
ptrdiff_t for loop variable.
* testsuite/25_algorithms/nth_element/random_test.cc: Use
iterator's difference_type instead of int.
* testsuite/25_algorithms/partial_sort/check_compare_by_value.cc:
Use std::next.
* testsuite/25_algorithms/partial_sort/constrained.cc: Use
ptrdiff_t for loop variable.
* testsuite/25_algorithms/partial_sort/random_test.cc: Use
iterator's difference_type instead of int.
* testsuite/25_algorithms/partial_sort_copy/constrained.cc:
Use ptrdiff_t for loop variable.
* testsuite/25_algorithms/partial_sort_copy/random_test.cc:
Use iterator's difference_type instead of int.
* testsuite/std/ranges/adaptors/drop.cc: Use ranges::next.
* testsuite/25_algorithms/fill_n/diff_type.cc: New test.
* testsuite/25_algorithms/lexicographical_compare/diff_type.cc:
New test.
Reviewed-by: Patrick Palka <ppalka@redhat.com>
Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>
|
|
These _S_noexcept() functions are only used in noexcept-specifiers and
never need to be called at runtime. They can be immediate functions,
i.e. consteval.
libstdc++-v3/ChangeLog:
* include/bits/iterator_concepts.h (_IterMove::_S_noexcept)
(_IterSwap::_S_noexcept): Change constexpr to consteval.
* include/bits/ranges_base.h (_Begin::_S_noexcept)
(_End::_S_noexcept, _RBegin::_S_noexcept, _REnd::_S_noexcept)
(_Size::_S_noexcept, _Empty::_S_noexcept, _Data::_S_noexcept):
Likewise.
* include/std/concepts (_Swap::_S_noexcept): Likewise.
Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>
|
|
libstdc++-v3/ChangeLog:
* include/std/syncstream: Remove trailing whitespace.
|
|
C++17 has a 'Requires:' precondition that the two random access iterator
types have the same value type. In C++20 that is a 'Mandates:'
requirement which we must diagnose.
Although we could diagnose it in C++17, that might be a breaking change
for any users relying on it today. Also I am lazy and wanted to use
C++20's std::iter_value_t for the checks. So this only enforces the
requirement for C++20 and later.
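The shape of such a check can be sketched as follows. toy_searcher is a hypothetical stand-in (it only tests whether the text starts with the pattern), and the static_assert is enabled for C++20 and later only, mirroring the decision described above.

```cpp
#include <iterator>
#include <type_traits>

// Hypothetical toy searcher, not std::boyer_moore_searcher.
template<typename PatIter>
struct toy_searcher
{
  PatIter first;
  PatIter last;

  // Returns true if [tfirst, tlast) begins with the stored pattern.
  template<typename TextIter>
  bool starts_with_pattern(TextIter tfirst, TextIter tlast) const
  {
#if __cplusplus >= 202002L
    // Diagnose mismatched value types at compile time in C++20 and later.
    static_assert(std::is_same_v<std::iter_value_t<PatIter>,
                                 std::iter_value_t<TextIter>>,
                  "pattern and text iterators must have the same value type");
#endif
    auto p = first;
    for (; p != last && tfirst != tlast; ++p, ++tfirst)
      if (!(*p == *tfirst))
        return false;
    return p == last;
  }
};
```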
libstdc++-v3/ChangeLog:
* include/std/functional (boyer_moore_searcher::operator()): Add
static_assert.
(boyer_moore_horspool_searcher::operator()): Likewise.
* testsuite/20_util/function_objects/121782.cc: New test.
|
|
In libstdc++ the prefix _S is used for static members only. In <mdspan>
there were several type aliases that also used the prefix _S. They now use
a single leading underscore followed by a capital letter instead.
libstdc++-v3/ChangeLog:
* include/std/mdspan (_ExtentsStorage::_Base): New name for
_S_base.
(_ExtentsStorage::_Storage): New name for _S_storage.
(extents::_Storage): New name for _S_storage.
(layout_stride::mapping::_Strides): New name for
_S_strides_t.
* testsuite/23_containers/mdspan/class_mandate_neg.cc: Update
test to the new error message.
Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
Signed-off-by: Luc Grosheintz <luc.grosheintz@gmail.com>
|
|
The concept __integral_constant_like doesn't consider traits with a
boolean member `value` as an integer constant. This is done to reject
various completely unrelated traits like is_const, is_abstract, etc.
LWG4351 adjusts the check to strip references and cv qualifiers before
checking if `value` is bool. The immediate context is constant_wrapper
which defines:
  template<...>
  struct constant_wrapper
  {
    static constexpr const auto& value = ...;
  };
Without LWG4351, std::cw<true> and std::cw<false> would both be
considered integer constants (by __integral_constant_like); but both
std::{true,false}_type are not considered integer constants. Hence,
LWG4351 removes inconsistent behaviour between std::integral_constant
and std::constant_wrapper.
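The effect can be sketched with simplified variable templates. Everything below is a hypothetical reduction: the real __integral_constant_like concept has further requirements, and ref_constant merely imitates a wrapper whose `value` is a const reference.

```cpp
#include <type_traits>

// C++17 spelling of std::remove_cvref_t.
template<typename T>
using strip_cvref_t = std::remove_cv_t<std::remove_reference_t<T>>;

// Simplified check where only the bool test fails to strip references.
template<typename T>
constexpr bool like_before_v =
  std::is_integral_v<strip_cvref_t<decltype(T::value)>>
  && !std::is_same_v<bool, std::remove_const_t<decltype(T::value)>>;

// Simplified check with cv-qualifiers and references stripped in both places.
template<typename T>
constexpr bool like_after_v =
  std::is_integral_v<strip_cvref_t<decltype(T::value)>>
  && !std::is_same_v<bool, strip_cvref_t<decltype(T::value)>>;

// Hypothetical wrapper whose `value` is a const reference.
template<auto X>
struct ref_constant
{
  static constexpr auto stored = X;
  static constexpr const auto& value = stored;
};

static_assert(like_after_v<std::integral_constant<int, 3>>);
static_assert(!like_after_v<std::true_type>);
static_assert(like_after_v<ref_constant<3>>);
// The bool-valued wrapper slips through unless the reference is stripped.
static_assert(like_before_v<ref_constant<true>>);
static_assert(!like_after_v<ref_constant<true>>);
```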
libstdc++-v3/ChangeLog:
* include/std/span (__integral_constant_like): Use
remove_cvref_t before checking if _Tp::value is boolean.
* testsuite/23_containers/mdspan/extents/misc.cc: Update test.
* testsuite/23_containers/mdspan/mdspan.cc: Ditto.
* testsuite/23_containers/span/deduction.cc: Ditto.
Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
Signed-off-by: Luc Grosheintz <luc.grosheintz@gmail.com>
|
|
When producing output, the libstdc++ format implementation only uses _Sink_iter
specializations. Since users cannot construct basic_format_context, this is the
only iterator type actually used. The __format_padded helper relies on this
property to efficiently pad sequences from tuples and ranges.
However, the standard's formattable concept requires a generic format function
in formatters that works with any iterator type. This is intended to
future-proof the implementation by allowing new format_context types. Previously,
libstdc++ used back_insert_iterator<basic_string<_CharT>> for this purpose.
Normally, concept checks only instantiate function signatures, but with
user-defined formatters and deduced return types, the function body and all
called functions are instantiated. This could trigger a static assertion error
in the range/tuple formatter that assumed the iterator was a _Sink_iter
(see included test).
This patch resolves the issue by replacing the _Iter_for_t alias with the
internal _Drop_iter. This iterator's semantics are to drop elements, so
__format_padded can handle it by simply returning the input iterator, which
still produces the required behavior [1].
An alternative of using _Sink_iter was considered but rejected because it would
allow formatters to pass formattable requirements while only supporting
format_context and wformat_context, which seems counter to the design intent
(the std/format/formatter/concept.cc test fails).
[1] The standard's wording defines format functions as producing an output
representation, but does not explicitly require a formatter to be invoked
for each element. This allows the use of _Drop_iter to pass the concept check
without generating any output.
PR libstdc++/121765
libstdc++-v3/ChangeLog:
* include/std/format (__format::_Drop_iter): Define.
(_Iter_for_t::type): Change alias to _Drop_iter.
(__format::__format_padded): Return __fc.out() for
_Drop_iter.
* testsuite/std/format/pr121765.cc: New test.
Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
Signed-off-by: Tomasz Kamiński <tkaminsk@redhat.com>
|
|
Rename _S_type to __type as it's not a static member.
Also rename _Tp to _Xv because it's not a type.
libstdc++-v3/ChangeLog:
* include/std/type_traits (_CwFixedValue::_S_type): Rename to
__type.
(constant_wrapper): Rename template parameter in declaration to
match later definition.
|
|
This fixes:
FAIL: 17_intro/badnames.cc -std=gnu++26 (test for excess errors)
libstdc++-v3/ChangeLog:
* include/std/type_traits (constant_wrapper): Rename template
parameter to avoid BADNAME.
|
|
This patch creates a global function __syncbuf_get_mutex, gated by
_GLIBCXX_HAS_GTHREADS, replacing a static instantiated member
_S_get_mutex used in syncbuf<> construction, and makes the global
symbol visible. A static local table of 16 mutexes is shared among
all specializations of syncbuf<>, chosen on construction by a hash
of the wrapped streambuf's address.
It detaches the implementation of _S_get_mutex from the C++20 ABI.
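The general idea of a shared mutex table can be sketched as below. This is only an assumption-laden illustration: get_mutex_for is a hypothetical name, and the use of std::hash with a modulo is a guess at one reasonable way to pick a slot, not the hash used by the library.

```cpp
#include <cstddef>
#include <functional>
#include <mutex>

// Hypothetical stand-in for an out-of-line lookup function: one table of 16
// mutexes shared by all callers, with the slot chosen from the address.
inline std::mutex& get_mutex_for(const void* p)
{
  constexpr std::size_t table_size = 16;
  static std::mutex table[table_size];
  // Assumption: any stable hash of the address would do here.
  return table[std::hash<const void*>{}(p) % table_size];
}
```

The same address always maps to the same mutex, while different addresses may share one, which trades some contention for a fixed-size table.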
libstdc++-v3/ChangeLog:
* include/std/syncstream (syncbuf<>::__mutex): Remove _S_get_mutex,
use extern function instead.
* src/c++20/syncbuf.cc: Define global __syncbuf_get_mutex.
* src/c++20/Makefile.am: Mention syncbuf.cc.
* src/c++20/Makefile.in: Regenerate.
* config/abi/pre/gnu.ver: Mention mangled __syncbuf_get_mutex.
|
|
A use case for P2781R9 is more ergonomic creation of span and mdspan with
mixed static and dynamic extents, e.g.:
  span(ptr, cw<3>)
  extents(cw<3>, 5, cw<7>)
  mdspan(ptr, cw<3>, 5, cw<7>)
should be deduced as:
  span<..., 3>
  extents<..., 3, dyn, 7>
  mdspan<..., extents<..., 3, dyn, 7>>
The change required is to strip cv-qualifiers and references from
`_Tp::value`, because of:
  template<_CwFixedValue _X, typename>
  struct constant_wrapper : _CwOperators
  {
    static constexpr const auto& value = _X._M_data;
libstdc++-v3/ChangeLog:
* include/std/span (__integral_constant_like): Allow the member
`value` of a constant wrapping type to be a const reference of
an integer.
* testsuite/23_containers/mdspan/extents/misc.cc: Add test for
cw and constant_wrapper.
* testsuite/23_containers/mdspan/mdspan.cc: Ditto.
* testsuite/23_containers/span/deduction.cc: Ditto.
Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>
Signed-off-by: Luc Grosheintz <luc.grosheintz@gmail.com>
|
|
This is a partial implementation of P2781R9. It adds std::cw and
std::constant_wrapper, but doesn't modify __integral_constant_like for
span/mdspan.
libstdc++-v3/ChangeLog:
* include/bits/version.def (constant_wrapper): Add.
* include/bits/version.h: Regenerate.
* include/std/type_traits (_CwFixedValue): New class.
(_IndexSequence): New struct.
(_BuildIndexSequence): New struct.
(_ConstExprParam): New concept.
(_CwOperators): New struct.
(constant_wrapper): New struct.
(cw): New global constant.
* src/c++23/std.cc.in (constant_wrapper): Add.
(cw): Add.
* testsuite/20_util/constant_wrapper/adl.cc: New test.
* testsuite/20_util/constant_wrapper/ex.cc: New test.
* testsuite/20_util/constant_wrapper/generic.cc: New test.
* testsuite/20_util/constant_wrapper/instantiate.cc: New test.
* testsuite/20_util/constant_wrapper/op_comma_neg.cc: New test.
* testsuite/20_util/constant_wrapper/version.cc: New test.
Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
Co-authored-by: Tomasz Kamiński <tkaminsk@redhat.com>
Signed-off-by: Luc Grosheintz <luc.grosheintz@gmail.com>
Signed-off-by: Tomasz Kamiński <tkaminsk@redhat.com>
|
|
Since this helper (added in r16-3576-g7f7f1878eedd80) is used in the
noexcept-spec of iter_move and iter_swap, it in turn needs an accurate
noexcept-spec.
PR libstdc++/121804
libstdc++-v3/ChangeLog:
* include/std/ranges (join_view::_Iterator::_M_get_inner):
Mark noexcept.
* testsuite/std/ranges/adaptors/join.cc (test16): New test.
Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>
Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
|
|
LWG 3569 adjusted join_view's iterator specification to handle non
default-constructible iterators by wrapping the corresponding data member
in std::optional, which we followed in r13-2649-g7aa80c82ecf3a3.
But this wrapping is unnecessary for iterators that are already
default-constructible. Rather than unconditionally using std::optional
here, which introduces time/space overhead, this patch conditionalizes
our LWG 3569 changes on the iterator in question being non-forward (and
thus non default-constructible). We check forwardness instead of
default-constructibility in order to accommodate input-only iterators
that satisfy but do not model default_initializable, e.g. whose default
constructor is underconstrained.
libstdc++-v3/ChangeLog:
* include/std/ranges (join_view::_Iterator::_M_satisfy):
Adjust to handle non-std::optional _M_inner as per before LWG 3569.
(join_view::_Iterator::_M_get_inner): New.
(join_view::_Iterator::_M_inner): Don't wrap in std::optional if
the iterator is forward. Initialize.
(join_view::_Iterator::operator*): Use _M_get_inner instead
of *_M_inner.
(join_view::_Iterator::operator++): Likewise.
(join_view::_Iterator::iter_move): Likewise.
(join_view::_Iterator::iter_swap): Likewise.
Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>
Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
|
|
This patch refactors ranges::_Partial to be implemented using _Bind_back_t.
This allows it to benefit from the changes in r16-3398-g250dd5b5604fbc,
specifically making the closure trivially copyable. Since _Bind_back_t
already provides an optimized implementation for a single bound argument,
specializations for _Partial with a single argument are now removed.
We still preserve a specialization of _Partial for trivially copy-constructible
arguments that define only a const overload of operator(). To avoid
re-checking invocability constraints, this specialization calls the now-public,
unconstrained _Binder::_S_call static method instead of the constrained
_Binder::operator().
The primary specialization of _Partial retains its operator(), which
uses a simpler __adaptor_invocable constraint that does not consider
member pointers, as they are not relevant here. This implementation also
calls _Binder::_S_call to avoid re-performing overload resolution and
invocability checks for _Binder::operator().
Finally, the _M_binder member (_Bind_back_t) is now marked
[[no_unique_address]]. This is beneficial as ranges::_Partial is used with
ranges::to, which commonly has zero or empty bound arguments (e.g., stateless
allocators, comparators, or hash functions).
libstdc++-v3/ChangeLog:
* include/bits/binders.h (_Binder::_S_call): Make public.
* include/std/ranges (ranges::_Partial<_Adaptor, _Args...>):
Replace tuple<_Args...> with _Bind_back_t<_Adaptor, _Args...>.
(ranges::_Partial<_Adaptor, _Arg>): Remove.
Reviewed-by: Patrick Palka <ppalka@redhat.com>
Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
Signed-off-by: Tomasz Kamiński <tkaminsk@redhat.com>
|
|
bits/binders.h is already mapped in libstdc++-v3/doc/doxygen/stdheader.cc.
libstdc++-v3/ChangeLog:
* include/Makefile.am: Add bits/binders.h
* include/Makefile.in: Add bits/binders.h
* include/std/functional (std::_Indexed_bound_arg, std::_Binder)
(std::__make_bound_args, std::_Bind_front_t, std::_Bind_back_t):
Moved to bits/binders.h file, that is now included.
* include/bits/binders.h: New file.
Reviewed-by: Patrick Palka <ppalka@redhat.com>
Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
Signed-off-by: Tomasz Kamiński <tkaminsk@redhat.com>
|
|
The _Bind_front and _Bind_back class templates are now merged into a single
_Binder implementation that accepts _Back as a template parameter. This makes
the bind_back implementation available in C++20 mode, allowing it to be used
for range adaptor closures.
With zero bound arguments, bind_back and bind_front have equivalent
functionality. Consequently, _Bind_back_t now produces the same type as
bind_front (_Binder<false, _Fd>). A simple copy of the functor cannot be
returned in this case, as it would visibly affect overload resolution
(see included test cases).
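A much reduced sketch of a single binder template selected by a boolean parameter is shown below. toy_binder, toy_bind_front and toy_bind_back are hypothetical names; the sketch stores exactly one bound argument and ignores the value-category and noexcept handling of the real implementation.

```cpp
#include <functional>
#include <utility>

// Hypothetical single-argument binder: Back selects whether the bound
// argument is passed after (true) or before (false) the call arguments.
template<bool Back, typename Fn, typename Bound>
struct toy_binder
{
  Fn fn;
  Bound bound;

  template<typename... Args>
  decltype(auto) operator()(Args&&... args) const
  {
    if constexpr (Back)
      return std::invoke(fn, std::forward<Args>(args)..., bound);
    else
      return std::invoke(fn, bound, std::forward<Args>(args)...);
  }
};

template<typename Fn, typename Bound>
auto toy_bind_front(Fn fn, Bound bound)
{
  return toy_binder<false, Fn, Bound>{std::move(fn), std::move(bound)};
}

template<typename Fn, typename Bound>
auto toy_bind_back(Fn fn, Bound bound)
{
  return toy_binder<true, Fn, Bound>{std::move(fn), std::move(bound)};
}
```

With a subtraction callable `sub(a, b)`, `toy_bind_front(sub, 10)(3)` computes 10 - 3 and `toy_bind_back(sub, 10)(3)` computes 3 - 10.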
We also replace std::invoke with std::__invoke in internal functions.
libstdc++-v3/ChangeLog:
* include/std/functional (std::_Indexed_bound_arg): Fixed
indentation.
(__Bound_arg_storage::_S_apply_front)
(__Bound_arg_storage::_S_apply_back): Merged into _S_apply.
(__Bound_arg_storage::_S_apply): Merged above, add _Back template
parameter, replace std::invoke with std::__invoke.
(std::_Bind_front): Renamed to std::_Binder and add _Back
template parameter.
(std::_Binder): Renamed from std::_Bind_front.
(_Binder::_Result_t, _Binder::_S_noexcept_invoke): Define.
(_Binder::operator()): Use _Result_t and _S_noexcept_invoke.
(_Binder::_S_call): Handle zero args specially, replace std::invoke
with std::__invoke.
(std::_Bind_front_t, std::_Bind_back_t): Defined in terms
of _Binder.
(std::_Bind_back): Merged into _Binder.
* testsuite/20_util/function_objects/bind_back/1.cc: New tests.
* testsuite/20_util/function_objects/bind_back/111327.cc: Updated
error messages.
* testsuite/20_util/function_objects/bind_front/1.cc: New tests.
* testsuite/20_util/function_objects/bind_front/111327.cc: Updated
error messages.
Reviewed-by: Patrick Palka <ppalka@redhat.com>
Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
Signed-off-by: Tomasz Kamiński <tkaminsk@redhat.com>
|
|
In r16-2328-g29d53f6213e0a1 we fixed a bug related to user-defined
objects that can convert to an integer only via an rvalue reference.
The same commit also implemented LWG 4314 [1], but didn't mark it with
_GLIBCXX_RESOLVE_LIB_DEFECTS. This commit adds the missing markers.
[1]: https://cplusplus.github.io/LWG/issue4314
It also fixes one case of trailing whitespace near a ctor for
aligned_accessor.
libstdc++-v3/ChangeLog:
* include/std/mdspan (layout_left::mapping::operator()): Add
_GLIBCXX_RESOLVE_LIB_DEFECTS marker for 4314.
(layout_right::mapping::operator()): Ditto.
(layout_stride::mapping::operator()): Ditto.
Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
Signed-off-by: Luc Grosheintz <luc.grosheintz@gmail.com>
|
|
missing a constraint
libstdc++-v3/ChangeLog:
* include/std/expected (expected(U&&)): Add missing constraint
as per LWG 4222.
* testsuite/20_util/expected/lwg4222.cc: New test.
Signed-off-by: Yihan Wang <yronglin777@gmail.com>
|
|
The r16-3416-g806de30f51c8b9 change to use __cpp_lib_chrono in
preprocessor conditions broke support for <chrono> for freestanding and
the COW std::string ABI. That happened because __cpp_lib_chrono is only
defined to the C++20 value for hosted and for the new ABI, because the
full set of C++20 features are not defined for freestanding and tzdb is
not defined for the old ABI.
This introduces a new internal feature test macro that corresponds to
the features that are always supported (e.g. chrono::local_time,
chrono::year, chrono::weekday).
libstdc++-v3/ChangeLog:
* include/bits/version.def (chrono_cxx20): Define.
* include/bits/version.h: Regenerate.
* include/std/chrono: Check __glibcxx_chrono_cxx20 instead of
__cpp_lib_chrono for C++20 features that don't require the new
std::string ABI and/or can be used for freestanding.
* src/c++20/clock.cc: Adjust preprocessor condition.
Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>
|
|
As preparation for implementing std::constant_wrapper that's part of the
C++26 version of the <type_traits> header, the two classes _Index_tuple
and _Build_index_tuple are moved to <type_traits>. These two helpers are
needed by std::constant_wrapper to initialize the elements of one C
array with another.
Since <bits/utility.h> already includes <type_traits>, this solution
avoids creating a very small header file for just these two internal
classes. This approach doesn't move std::index_sequence and related code
to <type_traits> and therefore doesn't change which headers provide
user-facing features.
libstdc++-v3/ChangeLog:
* include/bits/utility.h (_Index_tuple): Move to <type_traits>.
(_Build_index_tuple): Ditto.
* include/std/type_traits (_Index_tuple): Ditto.
(_Build_index_tuple): Ditto.
Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
Signed-off-by: Luc Grosheintz <luc.grosheintz@gmail.com>
|
|
This implements P2546R5 (Debugging Support), including the P2810R4
(is_debugger_present is_replaceable) changes, allowing
std::is_debugger_present to be replaced by the program.
It would be good to provide a macOS definition of is_debugger_present as
per https://developer.apple.com/library/archive/qa/qa1361/_index.html
but that isn't included in this change.
The src/c++26/debugging.cc file defines a global volatile int which can
be set by debuggers to indicate when they are attached and detached from
a running process. This allows std::is_debugger_present() to give a
reliable answer, and additionally allows a debugger to choose how
std::breakpoint() should behave. Setting the global to a positive value
will cause std::breakpoint() to use that value as an argument to
std::raise, so debuggers that prefer SIGABRT for breakpoints can select
that. By default std::breakpoint() will use a platform-specific action
such as the INT3 instruction on x86, or GCC's __builtin_trap().
On Linux the std::is_debugger_present() function checks whether the
process is being traced by a process named "gdb", "gdbserver" or
"lldb-server", to try to avoid interpreting other tracing processes
(such as strace) as a debugger. There have been comments suggesting this
isn't desirable and that std::is_debugger_present() should just return
true for any tracing process (which is the case for non-Linux targets
that support the ptrace system call).
libstdc++-v3/ChangeLog:
PR libstdc++/119670
* acinclude.m4 (GLIBCXX_CHECK_DEBUGGING): Check for facilities
needed by <debugging>.
* config.h.in: Regenerate.
* configure: Regenerate.
* configure.ac: Use GLIBCXX_CHECK_DEBUGGING.
* include/Makefile.am: Add new header.
* include/Makefile.in: Regenerate.
* include/bits/version.def (debugging): Add.
* include/bits/version.h: Regenerate.
* include/precompiled/stdc++.h: Add new header.
* src/c++26/Makefile.am: Add new file.
* src/c++26/Makefile.in: Regenerate.
* include/std/debugging: New file.
* src/c++26/debugging.cc: New file.
* testsuite/19_diagnostics/debugging/breakpoint.cc: New test.
* testsuite/19_diagnostics/debugging/breakpoint_if_debugging.cc:
New test.
* testsuite/19_diagnostics/debugging/is_debugger_present.cc: New
test.
* testsuite/19_diagnostics/debugging/is_debugger_present-2.cc:
New test.
Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>
|
|
The current implementation of `complex<_Tp>` assumes that
`int` is implicitly convertible to `_Tp`, e.g., when using
`complex<_Tp>(1)`.
This patch transforms the implicit conversions into explicit type casts.
As a result, `std::complex` is now able to support more types. One
example is the type `Eigen::Half` from
https://eigen.tuxfamily.org/dox-devel/Half_8h_source.html which does not
implement implicit type conversions.
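The difference between the two spellings can be sketched with a hypothetical number type that only converts explicitly from int. explicit_number and pow_unsigned are illustrative names and are not taken from <complex> or from Eigen.

```cpp
// Hypothetical number type offering only explicit conversions.
struct explicit_number
{
  double v;
  explicit explicit_number(int i) : v(i) { }
  explicit explicit_number(double d) : v(d) { }

  friend explicit_number operator*(explicit_number a, explicit_number b)
  { return explicit_number(a.v * b.v); }
};

// Generic exponentiation by squaring. Writing `T y = 1;` would require an
// implicit conversion from int; `T(1)` works with explicit-only types too.
template<typename T>
T pow_unsigned(T x, unsigned n)
{
  T y = T(1);
  while (n != 0)
    {
      if (n & 1u)
        y = y * x;
      x = x * x;
      n >>= 1;
    }
  return y;
}
```

The same template still works for built-in arithmetic types, since `T(1)` is also valid for them.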
libstdc++-v3/ChangeLog:
* include/std/complex (polar, __complex_sqrt, pow)
(__complex_pow_unsigned): Use explicit conversions from int to
the complex value_type.
|
|
Asking std::is_constructible_v<std::bitset<1>, NonTrivial*> gives an
error, rather than answering the query. The problem is that the
constructor for std::bitset("010101") is not constrained to only accept
pointers to char-like types, and for the second parameter (which has a
default argument) std::basic_string_view<CharT> gets instantiated. If
the type is not char-like then that has undefined behaviour, and might
trigger a static_assert to fail in the body of std::basic_string_view.
We can fix it by constraining that constructor using the requirements
for char-like types from [strings.general] p1. I've submitted LWG 4294
and proposed making this change in the standard.
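How such a constraint lets a type trait answer the query can be sketched on a hypothetical class. toy_bits and is_char_like_v are illustrative, and the trait is only an assumed approximation of the char-like requirements, not the wording of the standard or the condition used in <bitset>.

```cpp
#include <type_traits>

// Assumed approximation of "char-like": a non-array, trivially copyable,
// trivially default constructible, standard-layout type.
template<typename T>
constexpr bool is_char_like_v =
  !std::is_array_v<T>
  && std::is_trivially_copyable_v<T>
  && std::is_trivially_default_constructible_v<T>
  && std::is_standard_layout_v<T>;

// Hypothetical class with a constrained constructor from a character pointer.
struct toy_bits
{
  template<typename CharT,
           typename = std::enable_if_t<is_char_like_v<CharT>>>
  explicit toy_bits(const CharT*) { }
};

struct NonTrivial
{
  NonTrivial();
  ~NonTrivial();
};

// The constraint removes the constructor from overload resolution, so the
// trait yields false instead of triggering an error in the constructor.
static_assert(std::is_constructible_v<toy_bits, const char*>);
static_assert(!std::is_constructible_v<toy_bits, NonTrivial*>);
```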
libstdc++-v3/ChangeLog:
PR libstdc++/121046
* include/std/bitset (bitset(const CharT*, ...)): Add
constraints on CharT type.
* testsuite/23_containers/bitset/lwg4294.cc: New test.
|
|
This patch moves the std::chrono::tai_clock::now() and
std::chrono::gps_clock::now() definitions from header inlines to static
members invoked via a
normal function call, in service of stabilizing the C++20 ABI.
It also changes #if guards to mention the actual __cpp_lib_*
feature gated, not just the language version, for clarity.
New global function symbols std::chrono::tai_clock::now
and std::chrono::gps_clock::now are exported.
libstdc++-v3/ChangeLog:
* include/std/chrono (gps_clock::now, tai_clock::now): Remove
inline definitions.
* src/c++20/clock.cc (gps_clock::now, tai_clock::now): New file
for out-of-line now() impls.
* src/c++20/Makefile.am: Mention clock.cc.
* src/c++20/Makefile.in: Regenerate.
* config/abi/pre/gnu.ver: add mangled now() symbols.
|
|
The offset-based partial specialization of _CachedPosition for
random-access iterators is currently only selected if the offset type is
smaller than the iterator type. Before r12-1018-g46ed811bcb4b86 this
made sense since the main partial specialization only stored the
iterator (incorrectly). After that bugfix, the main partial
specialization now effectively stores a std::optional<iter> so the
size constraint is inaccurate. And this main partial specialization
must invalidate itself upon copy/move unlike the offset-based partial
specialization. So I think we should just always prefer the
offset-based _CachedPosition for a random-access iterator, even if the
offset type happens to be larger than the iterator type.
libstdc++-v3/ChangeLog:
* include/std/ranges (__detail::_CachedPosition): Remove
additional size constraint on the offset-based partial
specialization.
Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>
Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
|
|
This patch refactors the implementation of bind_front and bind_back to avoid
using std::tuple for argument storage. Instead, bound arguments are now:
* stored directly if there is only one,
* within a dedicated _Bound_arg_storage otherwise.
_Bound_arg_storage is less expensive to instantiate and access than std::tuple.
It can also be trivially copyable, as it doesn't require a non-trivial assignment
operator for reference types. Storing a single argument directly provides similar
benefits compared to both one element tuple or _Bound_arg_storage.
_Bound_arg_storage holds each argument in an _Indexed_bound_arg base object.
The base class is parameterized by both type and index to allow storing
multiple arguments of the same type. Invocations are handled by _S_apply_front
and _S_apply_back static functions, which simulate explicit object parameters.
To facilitate this, the __like_t alias template is now unconditionally available
since C++11 in bits/move.h.
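The storage layout described above can be sketched as follows. All names (indexed_arg, arg_storage, make_args) are hypothetical, and the sketch uses plain const member functions rather than static functions that simulate explicit object parameters, so it does not preserve value categories the way the real code is described as doing.

```cpp
#include <cstddef>
#include <functional>
#include <utility>

// One base class per bound argument. The index makes the bases distinct even
// when several arguments have the same type.
template<std::size_t I, typename T>
struct indexed_arg
{
  T value;
};

template<typename Indices, typename... Ts>
struct arg_storage;

template<std::size_t... Is, typename... Ts>
struct arg_storage<std::index_sequence<Is...>, Ts...> : indexed_arg<Is, Ts>...
{
  // Invoke fn with the stored arguments first, then the call arguments.
  template<typename Fn, typename... Args>
  decltype(auto) apply_front(Fn&& fn, Args&&... args) const
  {
    return std::invoke(std::forward<Fn>(fn),
                       static_cast<const indexed_arg<Is, Ts>&>(*this).value...,
                       std::forward<Args>(args)...);
  }

  // Invoke fn with the call arguments first, then the stored arguments.
  template<typename Fn, typename... Args>
  decltype(auto) apply_back(Fn&& fn, Args&&... args) const
  {
    return std::invoke(std::forward<Fn>(fn),
                       std::forward<Args>(args)...,
                       static_cast<const indexed_arg<Is, Ts>&>(*this).value...);
  }
};

// Aggregate-initialize each indexed base from the corresponding argument.
template<typename... Ts>
arg_storage<std::index_sequence_for<Ts...>, Ts...> make_args(Ts... ts)
{
  return {{std::move(ts)}...};
}
```

Because the storage is an aggregate of plain members, it stays trivially copyable whenever all stored types are, which is one of the benefits mentioned above.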
libstdc++-v3/ChangeLog:
* include/bits/move.h (std::__like_impl, std::__like_t): Make
available in c++11.
* include/std/functional (std::_Indexed_bound_arg)
(std::_Bound_arg_storage, std::__make_bound_args): Define.
(std::_Bind_front, std::_Bind_back): Use _Bound_arg_storage.
* testsuite/20_util/function_objects/bind_back/1.cc: Expand
test to cover cases of 0, 1, many bound args.
* testsuite/20_util/function_objects/bind_back/111327.cc: Likewise.
* testsuite/20_util/function_objects/bind_front/1.cc: Likewise.
* testsuite/20_util/function_objects/bind_front/111327.cc: Likewise.
Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
Reviewed-by: Patrick Palka <ppalka@redhat.com>
Signed-off-by: Tomasz Kamiński <tkaminsk@redhat.com>
|
|
stop_source
The move constructors for stop_source and stop_token are equivalent to
copying and clearing the raw pointer, as they are wrappers for a
counted-shared state.
For jthread, the move constructor performs a member-wise move of stop_source
and thread. While std::thread could also have a _Never_valueless_alt
specialization due to its inexpensive move (only moving a handle), doing
so now would change the ABI. This patch takes the opportunity to correct
this behavior for jthread, before C++20 API is marked stable.
libstdc++-v3/ChangeLog:
* include/std/stop_token (__variant::_Never_valueless_alt): Declare.
(__variant::_Never_valueless_alt<std::stop_token>)
(__variant::_Never_valueless_alt<std::stop_source>): Define.
* include/std/thread (__variant::_Never_valueless_alt): Declare.
(__variant::_Never_valueless_alt<std::jthread>): Define.
|
|
The change in r14-905-g3b7cb33033fbe6 to disable the use of
pthread_mutex_clocklock when TSan is active assumed that the
_GLIBCXX_USE_PTHREAD_MUTEX_CLOCKLOCK macro was always checked with #if
rather than #ifdef, which was not true.
This makes the checks use #if consistently.
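The difference between the two checks can be sketched as below, assuming the feature is disabled by defining the macro to 0 rather than leaving it undefined. FEATURE_X is a hypothetical stand-in for the real macro.

```cpp
// Hypothetical macro: defined, but with a value meaning "disabled".
#define FEATURE_X 0

#ifdef FEATURE_X
constexpr bool seen_by_ifdef = true;   // taken: the macro is defined
#else
constexpr bool seen_by_ifdef = false;
#endif

#if FEATURE_X
constexpr bool seen_by_if = true;
#else
constexpr bool seen_by_if = false;     // taken: the macro's value is 0
#endif

// #ifdef still enables the code path, #if does not.
static_assert(seen_by_ifdef && !seen_by_if, "the two checks disagree");
```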
libstdc++-v3/ChangeLog:
PR libstdc++/121496
* include/std/mutex (__timed_mutex_impl::_M_try_wait_until):
Change preprocessor condition to use #if instead of #ifdef.
(recursive_timed_mutex::_M_clocklock): Likewise.
* testsuite/30_threads/timed_mutex/121496.cc: New test.
Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>
|
|
This commit completes the implementation of P2897R7 by implementing and
testing the template class aligned_accessor.
PR libstdc++/120994
libstdc++-v3/ChangeLog:
* include/bits/version.def (aligned_accessor): Add.
* include/bits/version.h: Regenerate.
* include/std/mdspan (aligned_accessor): New class.
* src/c++23/std.cc.in (aligned_accessor): Add.
* testsuite/23_containers/mdspan/accessors/generic.cc: Add tests
for aligned_accessor.
* testsuite/23_containers/mdspan/accessors/aligned_neg.cc: New test.
* testsuite/23_containers/mdspan/version.cc: Add test for
__cpp_lib_aligned_accessor.
Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>
Signed-off-by: Luc Grosheintz <luc.grosheintz@gmail.com>
|
|
This commit implements and tests the function is_sufficiently_aligned
from P2897R7.
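As a rough illustration of what such a check does, here is a hypothetical stand-in written for C++17. The names sufficiently_aligned and demo_alignment are made up for this sketch, and it assumes the usual flat mapping from pointers to integers, which is implementation-defined.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in: true if ptr is a multiple of Align bytes.
template<std::size_t Align, typename T>
bool sufficiently_aligned(T* ptr)
{
  static_assert(Align != 0 && (Align & (Align - 1)) == 0,
                "Align must be a power of two");
  return reinterpret_cast<std::uintptr_t>(ptr) % Align == 0;
}

inline bool demo_alignment()
{
  alignas(16) static unsigned char buffer[32] = {};
  return sufficiently_aligned<16>(buffer)         // start of the buffer
      && !sufficiently_aligned<16>(buffer + 1)    // one byte off
      && sufficiently_aligned<1>(buffer + 1);     // every pointer is 1-aligned
}
```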
PR libstdc++/120994
libstdc++-v3/ChangeLog:
* include/bits/align.h (is_sufficiently_aligned): New function.
* include/bits/version.def (is_sufficiently_aligned): Add.
* include/bits/version.h: Regenerate.
* include/std/memory: Add __glibcxx_want_is_sufficiently_aligned.
* src/c++23/std.cc.in (is_sufficiently_aligned): Add.
* testsuite/20_util/headers/memory/version.cc: Add test for
__cpp_lib_is_sufficiently_aligned.
* testsuite/20_util/is_sufficiently_aligned/1.cc: New test.
Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>
Signed-off-by: Luc Grosheintz <luc.grosheintz@gmail.com>
|
|
When I added this explicit specialization in r14-1433-gf150a084e25eaa I
used the wrong value for the number of mantissa digits (I used 112
instead of 113). Then when I refactored it in r14-1582-g6261d10521f9fd I
used the value calculated from the incorrect value (35 instead of 36).
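The arithmetic can be checked with a short sketch. It assumes max_digits10 is derived from the mantissa width d as 2 + floor(d * log10(2)), with 643/2136 as a rational approximation of log10(2). The function name is hypothetical and the formula is stated here as an assumption, not quoted from the header.

```cpp
// Hypothetical helper: decimal digits needed to round-trip a binary
// floating-point value with the given number of mantissa bits.
constexpr int max_digits10_from_bits(int mantissa_bits)
{
  return static_cast<int>(2 + mantissa_bits * 643L / 2136);
}

static_assert(max_digits10_from_bits(24) == 9, "float");
static_assert(max_digits10_from_bits(53) == 17, "double");
static_assert(max_digits10_from_bits(112) == 35, "the incorrect value");
static_assert(max_digits10_from_bits(113) == 36, "the corrected value");
```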
libstdc++-v3/ChangeLog:
PR libstdc++/121374
* include/std/limits (numeric_limits<__float128>::max_digits10):
Fix value.
* testsuite/18_support/numeric_limits/128bit.cc: Check value.
|
|
This commit implements the C++26 feature std::dims described in P2389R2.
It sets the feature testing macro to 202406 and adds tests.
Also fixes the test mdspan/version.cc.
libstdc++-v3/ChangeLog:
* include/bits/version.def (mdspan): Set value for C++26.
* include/bits/version.h: Regenerate.
* include/std/mdspan (dims): Add.
* src/c++23/std.cc.in (dims): Add.
* testsuite/23_containers/mdspan/extents/misc.cc: Add tests.
* testsuite/23_containers/mdspan/version.cc: Update test.
Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>
Signed-off-by: Luc Grosheintz <luc.grosheintz@gmail.com>
|
|
Prior to this commit, the partial products of static extents in <mdspan>
were computed in a loop that called a function computing each partial
product from scratch. The complexity was quadratic in the rank.
This commit removes the quadratic complexity.
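A linear-time computation can be sketched with a single running product. This is an illustration only: fwd_partial_prods here is a free function over std::array, dyn is a hypothetical marker for a dynamic extent, and dynamic extents are skipped (treated as a factor of 1).

```cpp
#include <array>
#include <cstddef>

// Hypothetical marker for a dynamic extent.
constexpr std::size_t dyn = static_cast<std::size_t>(-1);

// out[r] is the product of all static extents with index < r.
// One pass with a running product gives O(rank) instead of O(rank**2).
template<std::size_t N>
constexpr std::array<std::size_t, N>
fwd_partial_prods(const std::array<std::size_t, N>& exts)
{
  std::array<std::size_t, N> out{};
  std::size_t running = 1;
  for (std::size_t r = 0; r < N; ++r)
    {
      out[r] = running;
      if (exts[r] != dyn)
        running *= exts[r];
    }
  return out;
}

constexpr std::array<std::size_t, 4> example{3, dyn, 5, 7};
static_assert(fwd_partial_prods(example)[3] == 15, "3 * 5, dyn skipped");
```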
libstdc++-v3/ChangeLog:
* include/std/mdspan (__static_prod): Delete.
(__fwd_partial_prods): Compute at compile-time in O(rank), not
O(rank**2).
(__rev_partial_prods): Ditto.
(__size): Inline __static_prod.
Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>
Signed-off-by: Luc Grosheintz <luc.grosheintz@gmail.com>
|
|
This fixes an oversight in a previous commit that improved mdspan
related code. Because __size doesn't use __fwd_prod, __fwd_prod(__rank)
is not needed anymore. Hence, one can shrink the size of
__fwd_partial_prods.
libstdc++-v3/ChangeLog:
* include/std/mdspan (__fwd_partial_prods): Reduce size of the
array by 1 element.
Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>
Signed-off-by: Luc Grosheintz <luc.grosheintz@gmail.com>
|
|
Using __int_traits avoids the need to include <limits> from <mdspan>.
This in turn should reduce the size of the pre-compiled <mdspan>.
Similar refactoring was carried out for PR92546. Unfortunately,
./gcc/xgcc -std=c++23 -P -E -x c++ - -include mdspan | wc -l
shows a decrease by 1(!) line. This is due to bits/max_size_type.h which
includes <limits>.
libstdc++-v3/ChangeLog:
* include/std/mdspan (__valid_static_extent): Replace
numeric_limits with __int_traits.
(extents::_S_ctor_explicit): Ditto.
(extents::__static_quotient): Ditto.
(layout_stride::mapping::mapping): Ditto.
(mdspan::size): Ditto.
* testsuite/23_containers/mdspan/extents/class_mandates_neg.cc:
Update test with additional diagnostics.
Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>
Signed-off-by: Luc Grosheintz <luc.grosheintz@gmail.com>
|
|
An interesting case to consider is:
bool same11(const std::extents<int, dyn, 2, 3>& e1,
const std::extents<int, dyn, dyn, 3>& e2)
{ return e1 == e2; }
Which has the following properties:
- There are no mismatching static extents, preventing any
short-circuiting.
- There's a comparison between dynamic and static extents.
- There's one trivial comparison: ... && 3 == 3.
Let E[i] denote the array of static extents, D[k] denote the array of
dynamic extents and k[i] be the index of the i-th extent in D.
(Naturally, k[i] is only meaningful if i is a dynamic extent).
The previous implementation results in assembly that's more or less a
literal translation of:
for (i = 0; i < 3; ++i)
e1 = E1[i] == -1 ? D1[k1[i]] : E1[i];
e2 = E2[i] == -1 ? D2[k2[i]] : E2[i];
if e1 != e2:
return false
return true;
While the proposed method results in assembly for
if(D1[0] != D2[0]) return false;
return 2 == D2[1];
i.e.
110: 8b 17 mov edx,DWORD PTR [rdi]
112: 31 c0 xor eax,eax
114: 39 16 cmp DWORD PTR [rsi],edx
116: 74 08 je 120 <same11+0x10>
118: c3 ret
119: 0f 1f 80 00 00 00 00 nop DWORD PTR [rax+0x0]
120: 83 7e 04 02 cmp DWORD PTR [rsi+0x4],0x2
124: 0f 94 c0 sete al
127: c3 ret
It has the following nice properties:
- It eliminated the indirection D[k[i]], because k[i] is known at
compile time. Saving us a comparison E[i] == -1 and conditionally
loading k[i].
- It eliminated the trivial condition 3 == 3.
The result is code that only loads the required values and performs
exactly the number of comparisons needed by the algorithm. It also
results in smaller object files. Therefore, this seems like a sensible
change. We've checked several other examples, including fully statically
determined cases and high-rank examples. The example given above
illustrates the other cases well.
The constexpr condition:
if constexpr (!_S_is_compatible_extents<...>)
return false;
is no longer needed, because the optimizer correctly handles this case.
However, it's retained for clarity/certainty.
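The pack-expansion approach can be sketched as below. This is a simplified C++17 model, not the <mdspan> implementation: Extents, same and same_impl are hypothetical names, and the dynamic extents are stored in a plain std::array<int, ...>. The point is that each comparison sees a compile-time index, so k[i] and the static values fold away.

```cpp
#include <array>
#include <cstddef>
#include <utility>

constexpr std::size_t dyn = static_cast<std::size_t>(-1);

template<std::size_t... Es>
struct Extents
{
  static constexpr std::size_t rank = sizeof...(Es);
  static constexpr std::array<std::size_t, rank> stat{Es...};
  static constexpr std::size_t rank_dynamic
    = (std::size_t(Es == dyn) + ... + std::size_t(0));

  std::array<int, rank_dynamic> d;   // the array D of dynamic extents

  // k[i]: position of the i-th extent in D.
  static constexpr std::size_t dyn_index(std::size_t i)
  {
    std::size_t k = 0;
    for (std::size_t j = 0; j < i; ++j)
      k += (stat[j] == dyn);
    return k;
  }

  // The index is a template argument, so the branch and k[i] are
  // resolved at compile time.
  template<std::size_t I>
  constexpr int extent() const
  {
    if constexpr (stat[I] == dyn)
      {
        constexpr std::size_t k = dyn_index(I);
        return d[k];
      }
    else
      return static_cast<int>(stat[I]);
  }
};

template<std::size_t... A, std::size_t... B, std::size_t... I>
constexpr bool
same_impl(const Extents<A...>& a, const Extents<B...>& b,
          std::index_sequence<I...>)
{ return ((a.template extent<I>() == b.template extent<I>()) && ...); }

template<std::size_t... A, std::size_t... B>
constexpr bool same(const Extents<A...>& a, const Extents<B...>& b)
{
  if constexpr (sizeof...(A) != sizeof...(B))
    return false;
  else
    return same_impl(a, b, std::make_index_sequence<sizeof...(A)>{});
}

constexpr Extents<dyn, 2, 3> e1{{7}};
constexpr Extents<dyn, dyn, 3> e2{{7, 2}};
static_assert(same(e1, e2), "7 == 7, 2 == 2, 3 == 3");
```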
libstdc++-v3/ChangeLog:
* include/std/mdspan (extents::operator==): Replace loop with
pack expansion.
Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>
Signed-off-by: Luc Grosheintz <luc.grosheintz@gmail.com>
|
|
In both fully static and dynamic extents the comparison
static_extent(i) == dynamic_extent
is known at compile time. As a result, extents::extent doesn't
need to perform the check at runtime.
An illustrative example is:
using E = std::extents<int, 3, 5, 7, 11, 13, 17>;
int required_span_size(const typename Layout::mapping<E>& m)
{ return m.required_span_size(); }
Prior to this commit the generated code (on -O2) is:
2a0: b9 01 00 00 00 mov ecx,0x1
2a5: 31 d2 xor edx,edx
2a7: 66 66 2e 0f 1f 84 00 data16 cs nop WORD PTR [rax+rax*1+0x0]
2ae: 00 00 00 00
2b2: 66 66 2e 0f 1f 84 00 data16 cs nop WORD PTR [rax+rax*1+0x0]
2b9: 00 00 00 00
2bd: 0f 1f 00 nop DWORD PTR [rax]
2c0: 48 8b 04 d5 00 00 00 mov rax,QWORD PTR [rdx*8+0x0]
2c7: 00
2c8: 48 83 f8 ff cmp rax,0xffffffffffffffff
2cc: 0f 84 00 00 00 00 je 2d2 <required_span_size_6d_static+0x32>
2d2: 83 e8 01 sub eax,0x1
2d5: 0f af 04 97 imul eax,DWORD PTR [rdi+rdx*4]
2d9: 48 83 c2 01 add rdx,0x1
2dd: 01 c1 add ecx,eax
2df: 48 83 fa 06 cmp rdx,0x6
2e3: 75 db jne 2c0 <required_span_size_6d_static+0x20>
2e5: 89 c8 mov eax,ecx
2e7: c3 ret
which is a scalar loop, and notably includes the check
2c8: 48 83 f8 ff cmp rax,0xffffffffffffffff
to assert that the static extent is indeed not -1. Note that on -O3 the
optimizer eliminates the comparison and generates a sequence of scalar
operations: lea, shl, add and mov. The aim of this commit is to
eliminate this comparison also for -O2. With the optimization applied we
get:
2e0: f3 0f 6f 0f movdqu xmm1,XMMWORD PTR [rdi]
2e4: 66 0f 6f 15 00 00 00 movdqa xmm2,XMMWORD PTR [rip+0x0]
2eb: 00
2ec: 8b 57 10 mov edx,DWORD PTR [rdi+0x10]
2ef: 66 0f 6f c1 movdqa xmm0,xmm1
2f3: 66 0f 73 d1 20 psrlq xmm1,0x20
2f8: 66 0f f4 c2 pmuludq xmm0,xmm2
2fc: 66 0f 73 d2 20 psrlq xmm2,0x20
301: 8d 14 52 lea edx,[rdx+rdx*2]
304: 66 0f f4 ca pmuludq xmm1,xmm2
308: 66 0f 70 c0 08 pshufd xmm0,xmm0,0x8
30d: 66 0f 70 c9 08 pshufd xmm1,xmm1,0x8
312: 66 0f 62 c1 punpckldq xmm0,xmm1
316: 66 0f 6f c8 movdqa xmm1,xmm0
31a: 66 0f 73 d9 08 psrldq xmm1,0x8
31f: 66 0f fe c1 paddd xmm0,xmm1
323: 66 0f 6f c8 movdqa xmm1,xmm0
327: 66 0f 73 d9 04 psrldq xmm1,0x4
32c: 66 0f fe c1 paddd xmm0,xmm1
330: 66 0f 7e c0 movd eax,xmm0
334: 8d 54 90 01 lea edx,[rax+rdx*4+0x1]
338: 8b 47 14 mov eax,DWORD PTR [rdi+0x14]
33b: c1 e0 04 shl eax,0x4
33e: 01 d0 add eax,edx
340: c3 ret
This shows that eliminating the trivial comparison unlocks a new set of
optimizations, i.e. SIMD vectorization. In particular, the loop has been
vectorized by loading the first four constants from aligned memory; the
first four strides from non-aligned memory, then computes the product
and reduction. It interleaves the above with computing 1 + 12*S[4] +
16*S[5] (as scalar operations) and then finishes the reduction.
A similar effect can be observed for fully dynamic extents.
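The compile-time dispatch can be sketched as follows. ExtentsStorage here is a hypothetical, simplified stand-in for the internal class, with int extents and made-up member names. The check against the dynamic marker is only evaluated at run time for mixed extents.

```cpp
#include <array>
#include <cstddef>

constexpr std::size_t dyn = static_cast<std::size_t>(-1);

template<std::size_t... Es>
struct ExtentsStorage
{
  static constexpr std::size_t rank = sizeof...(Es);
  static constexpr std::array<std::size_t, rank> stat{Es...};
  static constexpr bool all_static = ((Es != dyn) && ...);
  static constexpr bool all_dynamic = ((Es == dyn) && ...);
  static constexpr std::size_t rank_dynamic
    = (std::size_t(Es == dyn) + ... + std::size_t(0));

  std::array<int, rank_dynamic> d{};

  // For fully static or fully dynamic extents the answer does not
  // depend on r, so no comparison is emitted.
  static constexpr bool is_dynamic(std::size_t r)
  {
    if constexpr (all_static)
      return false;
    else if constexpr (all_dynamic)
      return true;
    else
      return stat[r] == dyn;
  }

  static constexpr std::size_t dynamic_index(std::size_t r)
  {
    std::size_t k = 0;
    for (std::size_t j = 0; j < r; ++j)
      k += (stat[j] == dyn);
    return k;
  }

  constexpr int extent(std::size_t r) const
  {
    if (is_dynamic(r))
      return d[dynamic_index(r)];
    return static_cast<int>(stat[r]);
  }
};
```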
libstdc++-v3/ChangeLog:
* include/std/mdspan (__mdspan::__all_static): New function.
(__mdspan::_StaticExtents::_S_is_dyn): Inline and eliminate.
(__mdspan::_ExtentsStorage::_S_is_dynamic): New method.
(__mdspan::_ExtentsStorage::_M_extent): Use _S_is_dynamic.
Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>
Signed-off-by: Luc Grosheintz <luc.grosheintz@gmail.com>
|
|
One previous commit optimized fully dynamic extents; and another
refactored __size such that __fwd_prod is valid for __r = 0, ..., rank
(exclusive).
Therefore, by noticing that __rev_prod (and __fwd_prod) never accesses
the first (or last) extent, one can avoid pre-computing partial products
of static extents in those cases, if all other extents are dynamic.
We check that the size of the reference object file decreases further
and the .rodata sections for
__fwd_prod<dyn, ..., dyn, 11>
__rev_prod<3, dyn, ..., dyn>
are absent.
libstdc++-v3/ChangeLog:
* include/std/mdspan (__fwd_prods): Relax condition for fully-dynamic
extents to cover (dyn, ..., dyn, X).
(__rev_partial_prods): Analogous for (X, dyn, ..., dyn).
Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>
Signed-off-by: Luc Grosheintz <luc.grosheintz@gmail.com>
|
|
In mdspan related code, for extents with no static extents, i.e. only
dynamic extents, the following simplifications can be made:
- The array of dynamic extents has size rank.
- The two arrays dynamic-index and dynamic-index-inv become
trivial, e.g. k[i] == i.
- All elements of the arrays __{fwd,rev}_partial_prods are 1.
This commit eliminates the arrays for dynamic-index, dynamic-index-inv
and __{fwd,rev}_partial_prods. It also removes the indirection k[i] == i
from the source code, which isn't as relevant because the optimizer is
(often) capable of eliminating the indirection.
To check if it's working we look at:
using E2 = std::extents<int, dyn, dyn, dyn, dyn>;
int stride_left_E2(const std::layout_left::mapping<E2>& m, size_t r)
{ return m.stride(r); }
which generates the following
0000000000000190 <stride_left_E2>:
190: 48 c1 e6 02 shl rsi,0x2
194: 74 22 je 1b8 <stride_left_E2+0x28>
196: 48 01 fe add rsi,rdi
199: b8 01 00 00 00 mov eax,0x1
19e: 66 90 xchg ax,ax
1a0: 48 63 17 movsxd rdx,DWORD PTR [rdi]
1a3: 48 83 c7 04 add rdi,0x4
1a7: 48 0f af c2 imul rax,rdx
1ab: 48 39 fe cmp rsi,rdi
1ae: 75 f0 jne 1a0 <stride_left_E2+0x10>
1b0: c3 ret
1b1: 0f 1f 80 00 00 00 00 nop DWORD PTR [rax+0x0]
1b8: b8 01 00 00 00 mov eax,0x1
1bd: c3 ret
We see that:
- There's no code to load the partial product of static extents.
- There's no indirection D[k[i]], it's just D[i] (as before).
On a test file which computes both mapping::stride(r) and
mapping::required_span_size, we check for static storage with
objdump -h
we don't see the NTTP _Extents, anything (anymore) related to
_StaticExtents, __fwd_partial_prods or __rev_partial_prods. We also
check that the size of the reference object file (described three
commits prior) reduced by a few percent from 41.9kB to 39.4kB.
libstdc++-v3/ChangeLog:
* include/std/mdspan (__mdspan::__all_dynamic): New function.
(__mdspan::_StaticExtents::_S_dynamic_index): Convert to method.
(__mdspan::_StaticExtents::_S_dynamic_index_inv): Ditto.
(__mdspan::_StaticExtents): New specialization for fully dynamic
extents.
(__mdspan::__fwd_prod): New constexpr if branch to avoid
instantiating __fwd_partial_prods.
(__mdspan::__rev_prod): Ditto.
Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>
Signed-off-by: Luc Grosheintz <luc.grosheintz@gmail.com>
|
|
The methods layout_{left,right}::mapping::stride are defined
as
\prod_{i = 0}^r E[i]
\prod_{i = r+1}^n E[i]
This is computed as the product of a precomputed static product and the
product of the required dynamic extents.
Disassembly shows that even for low-rank extents, i.e. rank == 1 and
rank == 2, with at least one dynamic extent, the generated code loads
two values; and then runs the loop over at most one element, e.g. for
stride_left_d5 defined below the generated code is:
220: 48 8b 04 f5 00 00 00 mov rax,QWORD PTR [rsi*8+0x0]
227: 00
228: 31 d2 xor edx,edx
22a: 48 85 c0 test rax,rax
22d: 74 23 je 252 <stride_left_d5+0x32>
22f: 48 8b 0c f5 00 00 00 mov rcx,QWORD PTR [rsi*8+0x0]
236: 00
237: 48 c1 e1 02 shl rcx,0x2
23b: 74 13 je 250 <stride_left_d5+0x30>
23d: 48 01 f9 add rcx,rdi
240: 48 63 17 movsxd rdx,DWORD PTR [rdi]
243: 48 83 c7 04 add rdi,0x4
247: 48 0f af c2 imul rax,rdx
24b: 48 39 f9 cmp rcx,rdi
24e: 75 f0 jne 240 <stride_left_d5+0x20>
250: 89 c2 mov edx,eax
252: 89 d0 mov eax,edx
254: c3 ret
If there are no dynamic extents, it simply loads the precomputed product
of static extents.
For rank == 1 the answer is the constant `1`; for rank == 2 it's either 1 or
extents.extent(k), with k == 0 for layout_left and k == 1 for
layout_right.
Consider,
using Ed = std::extents<int, dyn>;
int stride_left_d(const std::layout_left::mapping<Ed>& m, size_t r)
{ return m.stride(r); }
using E3d = std::extents<int, 3, dyn>;
int stride_left_3d(const std::layout_left::mapping<E3d>& m, size_t r)
{ return m.stride(r); }
using Ed5 = std::extents<int, dyn, 5>;
int stride_left_d5(const std::layout_left::mapping<Ed5>& m, size_t r)
{ return m.stride(r); }
The optimized code for these three cases is:
0000000000000060 <stride_left_d>:
60: b8 01 00 00 00 mov eax,0x1
65: c3 ret
0000000000000090 <stride_left_3d>:
90: 48 83 fe 01 cmp rsi,0x1
94: 19 c0 sbb eax,eax
96: 83 e0 fe and eax,0xfffffffe
99: 83 c0 03 add eax,0x3
9c: c3 ret
00000000000000a0 <stride_left_d5>:
a0: b8 01 00 00 00 mov eax,0x1
a5: 48 85 f6 test rsi,rsi
a8: 74 02 je ac <stride_left_d5+0xc>
aa: 8b 07 mov eax,DWORD PTR [rdi]
ac: c3 ret
For rank == 1 it simply returns 1 (as expected). For rank == 2, it
either implements a branchless formula, or conditionally loads one
value. In all cases involving a dynamic extent this seems like it's
always doing clearly less work, both in terms of computation and loads.
In cases not involving a dynamic extent, it replaces loading one value
with a branchless sequence of four instructions.
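The low-rank special cases can be sketched like this. It is a simplified model in which all extents live in one std::array<int, Rank>; the real code distinguishes static and dynamic extents, and the function name is only borrowed for illustration.

```cpp
#include <array>
#include <cstddef>

// Product of the extents with index < r, as used by a layout_left stride.
template<std::size_t Rank>
constexpr int fwd_prod(const std::array<int, Rank>& exts, std::size_t r)
{
  if constexpr (Rank <= 1)
    return 1;                          // rank 1: always the constant 1
  else if constexpr (Rank == 2)
    return r == 0 ? 1 : exts[0];       // rank 2: either 1 or extent(0)
  else
    {
      int prod = 1;                    // generic case: plain loop
      for (std::size_t i = 0; i < r; ++i)
        prod *= exts[i];
      return prod;
    }
}

constexpr std::array<int, 1> r1{7};
constexpr std::array<int, 2> r2{3, 9};
constexpr std::array<int, 3> r3{2, 3, 4};
static_assert(fwd_prod(r1, 0) == 1, "rank 1");
static_assert(fwd_prod(r2, 0) == 1 && fwd_prod(r2, 1) == 3, "rank 2");
static_assert(fwd_prod(r3, 2) == 6, "generic");
```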
This commit also refactors __size to not use any of the precomputed
arrays. This prevents instantiating __{fwd,rev}_partial_prods for
low-rank extents. This results in a further size reduction of a
reference object file (described two commits prior) by 9% from 46.0kB to
41.9kB.
In a prior commit we optimized __size to produce better object code by
precomputing the static products. This refactor enables the optimizer to
generate the same optimized code.
libstdc++-v3/ChangeLog:
* include/std/mdspan (__mdspan::__fwd_prod): Optimize
for rank <= 2.
(__mdspan::__rev_prod): Ditto.
(__mdspan::__size): Refactor to use a pre-computed product, not
a partial product.
Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>
Signed-off-by: Luc Grosheintz <luc.grosheintz@gmail.com>
|
|
Let E denote a multi-dimensional extent; n the rank of E; r = 0, ...,
n; E[i] the i-th extent; and D[k] be the (possibly empty) array of
dynamic extents.
The two partial products for r = 0, ..., n:
\prod_{i = 0}^r E[i] (fwd)
\prod_{i = r+1}^n E[i] (rev)
can be computed as the product of static and dynamic extents. The static
fwd and rev product can be computed at compile time for all values of r.
Three methods are directly affected by this optimization:
layout_left::mapping::stride
layout_right::mapping::stride
mdspan::size
We'll check the generated code (-O2) for all three methods for generic,
(artificially) high-dimensional extents.
Consider a generic case:
using Extents = std::extents<int, 3, 5, dyn, dyn, dyn, 7, dyn>;
int stride_left(const std::layout_left::mapping<Extents>& m, size_t r)
{ return m.stride(r); }
The code generated prior to this commit:
4f0: 66 0f 6f 05 00 00 00 movdqa xmm0,XMMWORD PTR [rip+0x0] # 4f8
4f7: 00
4f8: 48 83 c6 01 add rsi,0x1
4fc: 48 c7 44 24 e8 ff ff mov QWORD PTR [rsp-0x18],0xffffffffffffffff
503: ff ff
505: 48 8d 04 f5 00 00 00 lea rax,[rsi*8+0x0]
50c: 00
50d: 0f 29 44 24 b8 movaps XMMWORD PTR [rsp-0x48],xmm0
512: 66 0f 76 c0 pcmpeqd xmm0,xmm0
516: 0f 29 44 24 c8 movaps XMMWORD PTR [rsp-0x38],xmm0
51b: 66 0f 6f 05 00 00 00 movdqa xmm0,XMMWORD PTR [rip+0x0] # 523
522: 00
523: 0f 29 44 24 d8 movaps XMMWORD PTR [rsp-0x28],xmm0
528: 48 83 f8 38 cmp rax,0x38
52c: 74 72 je 5a0 <stride_right_E1+0xb0>
52e: 48 8d 54 04 b8 lea rdx,[rsp+rax*1-0x48]
533: 4c 8d 4c 24 f0 lea r9,[rsp-0x10]
538: b8 01 00 00 00 mov eax,0x1
53d: 0f 1f 00 nop DWORD PTR [rax]
540: 48 8b 0a mov rcx,QWORD PTR [rdx]
543: 49 89 c0 mov r8,rax
546: 4c 0f af c1 imul r8,rcx
54a: 48 83 f9 ff cmp rcx,0xffffffffffffffff
54e: 49 0f 45 c0 cmovne rax,r8
552: 48 83 c2 08 add rdx,0x8
556: 49 39 d1 cmp r9,rdx
559: 75 e5 jne 540 <stride_right_E1+0x50>
55b: 48 85 c0 test rax,rax
55e: 74 38 je 598 <stride_right_E1+0xa8>
560: 48 8b 14 f5 00 00 00 mov rdx,QWORD PTR [rsi*8+0x0]
567: 00
568: 48 c1 e2 02 shl rdx,0x2
56c: 48 83 fa 10 cmp rdx,0x10
570: 74 1e je 590 <stride_right_E1+0xa0>
572: 48 8d 4f 10 lea rcx,[rdi+0x10]
576: 48 01 d7 add rdi,rdx
579: 0f 1f 80 00 00 00 00 nop DWORD PTR [rax+0x0]
580: 48 63 17 movsxd rdx,DWORD PTR [rdi]
583: 48 83 c7 04 add rdi,0x4
587: 48 0f af c2 imul rax,rdx
58b: 48 39 f9 cmp rcx,rdi
58e: 75 f0 jne 580 <stride_right_E1+0x90>
590: c3 ret
591: 0f 1f 80 00 00 00 00 nop DWORD PTR [rax+0x0]
598: c3 ret
599: 0f 1f 80 00 00 00 00 nop DWORD PTR [rax+0x0]
5a0: b8 01 00 00 00 mov eax,0x1
5a5: eb b9 jmp 560 <stride_right_E1+0x70>
5a7: 66 0f 1f 84 00 00 00 nop WORD PTR [rax+rax*1+0x0]
5ae: 00 00
which seems to be performing:
preparatory_work();
ret = 1
for(i = 0; i < rank; ++i)
tmp = ret * E[i]
if E[i] != -1
ret = tmp
for(i = 0; i < rank_dynamic; ++i)
ret *= D[i]
This commit reduces it down to:
270: 48 8b 04 f5 00 00 00 mov rax,QWORD PTR [rsi*8+0x0]
277: 00
278: 31 d2 xor edx,edx
27a: 48 85 c0 test rax,rax
27d: 74 33 je 2b2 <stride_right_E1+0x42>
27f: 48 8b 14 f5 00 00 00 mov rdx,QWORD PTR [rsi*8+0x0]
286: 00
287: 48 c1 e2 02 shl rdx,0x2
28b: 48 83 fa 10 cmp rdx,0x10
28f: 74 1f je 2b0 <stride_right_E1+0x40>
291: 48 8d 4f 10 lea rcx,[rdi+0x10]
295: 48 01 d7 add rdi,rdx
298: 0f 1f 84 00 00 00 00 nop DWORD PTR [rax+rax*1+0x0]
29f: 00
2a0: 48 63 17 movsxd rdx,DWORD PTR [rdi]
2a3: 48 83 c7 04 add rdi,0x4
2a7: 48 0f af c2 imul rax,rdx
2ab: 48 39 f9 cmp rcx,rdi
2ae: 75 f0 jne 2a0 <stride_right_E1+0x30>
2b0: 89 c2 mov edx,eax
2b2: 89 d0 mov eax,edx
2b4: c3 ret
Loosely speaking this does the following:
1. Load the starting position k in the array of dynamic extents; and
return if possible.
2. Load the partial product of static extents.
3. Compute \prod_{i = k}^d D[i] in a loop, where d is the number of
dynamic extents.
It shows that the span used for passing in the dynamic extents is
completely eliminated; and the fact that the product always runs to the
end of the array of dynamic extents is used by the compiler to eliminate
one indirection to determine the end position in the array of dynamic
extents.
The analogous code is generated for layout_left.
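The split into a precomputed static factor and a loop over dynamic extents can be sketched for the reverse product as below. RevTables, make_rev_tables, rev_tables and RightMapping are hypothetical names for this illustration, and the layout of the tables is an assumption rather than the layout used in <mdspan>.

```cpp
#include <array>
#include <cstddef>

constexpr std::size_t dyn = static_cast<std::size_t>(-1);

template<std::size_t N>
struct RevTables
{
  std::array<std::size_t, N> static_prod{};  // static extents with index > r
  std::array<std::size_t, N> dyn_begin{};    // first index in D for index > r
};

template<std::size_t... Es>
constexpr RevTables<sizeof...(Es)> make_rev_tables()
{
  constexpr std::size_t n = sizeof...(Es);
  constexpr std::array<std::size_t, n> stat{Es...};
  RevTables<n> t{};
  std::size_t prod = 1;
  std::size_t k = (std::size_t(Es == dyn) + ... + std::size_t(0));
  for (std::size_t r = n; r-- > 0;)
    {
      t.static_prod[r] = prod;
      t.dyn_begin[r] = k;
      if (stat[r] == dyn)
        --k;              // D[k] is now part of every product further left
      else
        prod *= stat[r];
    }
  return t;
}

// Computed once per set of static extents, at compile time.
template<std::size_t... Es>
inline constexpr RevTables<sizeof...(Es)> rev_tables
  = make_rev_tables<Es...>();

template<std::size_t... Es>
struct RightMapping
{
  static constexpr std::size_t rank_dynamic
    = (std::size_t(Es == dyn) + ... + std::size_t(0));

  std::array<int, rank_dynamic> d;

  constexpr std::size_t stride(std::size_t r) const
  {
    // Load the static factor, then multiply the dynamic extents from
    // the starting position to the end of D.
    std::size_t s = rev_tables<Es...>.static_prod[r];
    for (std::size_t k = rev_tables<Es...>.dyn_begin[r];
         k < rank_dynamic; ++k)
      s *= static_cast<std::size_t>(d[k]);
    return s;
  }
};

constexpr RightMapping<3, 5, dyn, dyn, dyn, 7, dyn> m{{2, 3, 4, 6}};
static_assert(m.stride(6) == 1, "no extents to the right");
static_assert(m.stride(4) == 42, "7 * 6");
static_assert(m.stride(0) == 5040, "5 * 2 * 3 * 4 * 7 * 6");
```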
Next, consider
using E2 = std::extents<int, 3, 5, dyn, dyn, 7, dyn, 11>;
int size2(const std::mdspan<double, E2>& md)
{ return md.size(); }
On the immediately preceding commit the generated code is
10: 66 0f 6f 05 00 00 00 movdqa xmm0,XMMWORD PTR [rip+0x0] # 18
17: 00
18: 49 89 f8 mov r8,rdi
1b: 48 8d 44 24 b8 lea rax,[rsp-0x48]
20: 48 c7 44 24 e8 0b 00 mov QWORD PTR [rsp-0x18],0xb
27: 00 00
29: 48 8d 7c 24 f0 lea rdi,[rsp-0x10]
2e: ba 01 00 00 00 mov edx,0x1
33: 0f 29 44 24 b8 movaps XMMWORD PTR [rsp-0x48],xmm0
38: 66 0f 76 c0 pcmpeqd xmm0,xmm0
3c: 0f 29 44 24 c8 movaps XMMWORD PTR [rsp-0x38],xmm0
41: 66 0f 6f 05 00 00 00 movdqa xmm0,XMMWORD PTR [rip+0x0] # 49
48: 00
49: 0f 29 44 24 d8 movaps XMMWORD PTR [rsp-0x28],xmm0
4e: 66 66 2e 0f 1f 84 00 data16 cs nop WORD PTR [rax+rax*1+0x0]
55: 00 00 00 00
59: 0f 1f 80 00 00 00 00 nop DWORD PTR [rax+0x0]
60: 48 8b 08 mov rcx,QWORD PTR [rax]
63: 48 89 d6 mov rsi,rdx
66: 48 0f af f1 imul rsi,rcx
6a: 48 83 f9 ff cmp rcx,0xffffffffffffffff
6e: 48 0f 45 d6 cmovne rdx,rsi
72: 48 83 c0 08 add rax,0x8
76: 48 39 c7 cmp rdi,rax
79: 75 e5 jne 60 <size2+0x50>
7b: 48 85 d2 test rdx,rdx
7e: 74 18 je 98 <size2+0x88>
80: 49 63 00 movsxd rax,DWORD PTR [r8]
83: 49 63 48 04 movsxd rcx,DWORD PTR [r8+0x4]
87: 48 0f af c1 imul rax,rcx
8b: 41 0f af 40 08 imul eax,DWORD PTR [r8+0x8]
90: 0f af c2 imul eax,edx
93: c3 ret
94: 0f 1f 40 00 nop DWORD PTR [rax+0x0]
98: 31 c0 xor eax,eax
9a: c3 ret
which is needlessly long. The current commit reduces it down to:
10: 48 63 07 movsxd rax,DWORD PTR [rdi]
13: 48 63 57 04 movsxd rdx,DWORD PTR [rdi+0x4]
17: 48 0f af c2 imul rax,rdx
1b: 0f af 47 08 imul eax,DWORD PTR [rdi+0x8]
1f: 69 c0 83 04 00 00 imul eax,eax,0x483
25: c3 ret
Which simply computes the product:
D[0] * D[1] * D[2] * const
where const is the product of all static extents. Meaning the loop to
compute the product of dynamic extents has been fully unrolled and
all constants are perfectly precomputed.
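The constant 0x483 in the listing is the product of the static extents, 3 * 5 * 7 * 11 = 1155. A small sketch reproduces it; static_prod and SizeSketch are hypothetical names, and the dynamic extents are modelled as a std::array<int, ...>.

```cpp
#include <array>
#include <cstddef>

constexpr std::size_t dyn = static_cast<std::size_t>(-1);

// Product of all static extents; dynamic extents count as 1.
template<std::size_t... Es>
inline constexpr std::size_t static_prod
  = ((Es == dyn ? std::size_t(1) : Es) * ... * std::size_t(1));

template<std::size_t... Es>
struct SizeSketch
{
  static constexpr std::size_t rank_dynamic
    = (std::size_t(Es == dyn) + ... + std::size_t(0));

  std::array<int, rank_dynamic> d;

  // size = (compile-time constant) * D[0] * ... * D[rank_dynamic - 1]
  constexpr std::size_t size() const
  {
    std::size_t s = static_prod<Es...>;
    for (std::size_t k = 0; k < rank_dynamic; ++k)
      s *= static_cast<std::size_t>(d[k]);
    return s;
  }
};

static_assert(static_prod<3, 5, dyn, dyn, 7, dyn, 11> == 0x483,
              "matches the immediate in the final imul");
```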
The size of the object file described in the previous commit reduces
by 17% from 55.8kB to 46.0kB.
libstdc++-v3/ChangeLog:
* include/std/mdspan (__mdspan::__static_prod): New function.
(__mdspan::__fwd_partial_prods): Constexpr array of partial
forward products.
(__mdspan::__rev_partial_prods): Same for reverse partial
products.
(__mdspan::__static_extents_prod): Delete function.
(__mdspan::__extents_prod): Renamed from __exts_prod and refactored.
(__mdspan::__fwd_prod): Compute as the product of the pre-computed
static product and the product of dynamic extents.
(__mdspan::__rev_prod): Ditto.
Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>
Signed-off-by: Luc Grosheintz <luc.grosheintz@gmail.com>
|
|
In mdspan related code involving static extents, often the IndexType is
part of the template parameters, even though it's not needed.
This commit extracts the parts of _ExtentsStorage not related to
IndexType into a separate class _StaticExtents.
It also prefers passing the array of static extents, instead of the
whole extents object where possible.
The size of an object file compiled with -O2 that instantiates
Layout::mapping<extents<IndexType, Indices...>>::stride
Layout::mapping<extents<IndexType, Indices...>>::required_span_size
for the product of
- eight IndexTypes
- three Layouts,
- nine choices of Indices...
decreases by 19% from 69.2kB to 55.8kB.
libstdc++-v3/ChangeLog:
* include/std/mdspan (__mdspan::_StaticExtents): Extract non IndexType
related code from _ExtentsStorage.
(__mdspan::_ExtentsStorage): Use _StaticExtents.
(__mdspan::__static_extents): Return reference to NTTP of _StaticExtents.
(__mdspan::__contains_zero): New overload.
(__mdspan::__exts_prod, __mdspan::__static_quotient): Use span to avoid
copying __sta_exts.
Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>
Signed-off-by: Luc Grosheintz <luc.grosheintz@gmail.com>
|
|
libstdc++-v3/ChangeLog:
* include/std/mdspan: Small stylistic adjustments.
Signed-off-by: Luc Grosheintz <luc.grosheintz@gmail.com>
|
|
[PR121196]
PR libstdc++/121196
libstdc++-v3/ChangeLog:
* include/std/inplace_vector (std::erase): Provide default argument
for _Up parameter.
* testsuite/23_containers/inplace_vector/erasure.cc: Add test for
using braced-init-list as arguments to erase_if and use function
to verify content of inplace_vector.
Reviewed-by: Patrick Palka <ppalka@redhat.com>
Signed-off-by: Tomasz Kamiński <tkaminsk@redhat.com>
|
|
In case of input iterators, the loop that assigns to existing elements
should run up to the number of elements in the vector (_M_size), not the capacity (_Nm).
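The loop bound can be illustrated with a toy container. TinyVec is hypothetical and value-initializes all of its storage, so the wrong bound would only produce a wrong size here, whereas in a real inplace_vector the slots past the current size are uninitialized storage and must not be assigned to.

```cpp
#include <cstddef>
#include <new>

template<typename T, std::size_t Cap>
struct TinyVec
{
  T data[Cap] = {};
  std::size_t size = 0;

  // Works for single-pass input iterators: the length is unknown up front.
  template<typename InputIt>
  void assign(InputIt first, InputIt last)
  {
    std::size_t i = 0;
    // Overwrite existing elements only: the bound is size, not Cap.
    for (; i < size && first != last; ++i, (void)++first)
      data[i] = *first;
    if (first == last)
      {
        size = i;                      // input was shorter: shrink
        return;
      }
    // Input is longer than the current contents: append the rest.
    for (; first != last; ++first)
      {
        if (size == Cap)
          throw std::bad_alloc();
        data[size++] = *first;
      }
  }
};
```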
PR libstdc++/119137
libstdc++-v3/ChangeLog:
* include/std/inplace_vector (inplace_vector::assign_range):
Replace _Nm with _M_size in the assignment loop.
|