Age Commit message Author Files Lines
2025-08-23c++/modules: Provide definitions of synthesized methods outside their ↵Nathaniel Shead4-0/+59
defining module [PR120499] In the PR, we're getting a linker error from _Vector_impl's destructor never getting emitted. This is because of a combination of factors: 1. in imp-member-4_a, the destructor is not used and so there is no definition generated. 2. in imp-member-4_b, the destructor gets synthesized (as part of the synthesis for Coll's destructor) but is not ODR-used and so does not get emitted. Despite there being a definition provided in this TU, the destructor is still considered imported and so isn't streamed into the module body. 3. in imp-member-4_c, we need to ODR-use the destructor but we only got a forward declaration from imp-member-4_b, so we cannot emit a body. The point of failure here is step 2; this function has effectively been declared in the imp-member-4_b module, and so we shouldn't treat it as imported. This way we'll properly stream the body so that importers can emit it. PR c++/120499 gcc/cp/ChangeLog: * method.cc (synthesize_method): Set the instantiating module. gcc/testsuite/ChangeLog: * g++.dg/modules/imp-member-4_a.C: New test. * g++.dg/modules/imp-member-4_b.C: New test. * g++.dg/modules/imp-member-4_c.C: New test. Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
2025-08-23Daily bump.GCC Administrator3-1/+130
2025-08-23rs6000: Add shift count guards to avoid undefined behavior [PR118890]Kishan Parmar1-2/+2
This patch adds missing guards on shift amounts to prevent UB when the shift count equals or exceeds HOST_BITS_PER_WIDE_INT. In the patch (r16-2666-g647bd0a02789f1), shift counts were only checked for nonzero but not for being within valid bounds. This patch tightens those conditions by enforcing that shift counts are greater than zero and less than HOST_BITS_PER_WIDE_INT. 2025-08-23 Kishan Parmar <kishan@linux.ibm.com> gcc/ PR target/118890 * config/rs6000/rs6000.cc (can_be_rotated_to_negative_lis): Add bounds checks for shift counts to prevent undefined behavior. (rs6000_emit_set_long_const): Likewise.
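The guard described above can be illustrated with a minimal sketch. This is not the rs6000 code; `kBits` stands in for HOST_BITS_PER_WIDE_INT, and `shift_count_ok` and `shl_or` are hypothetical names chosen for illustration only.

```cpp
#include <cstdint>

// Stand-in for HOST_BITS_PER_WIDE_INT (assumed to be 64 here).
constexpr int kBits = 64;

// A shift count is usable only if it is strictly between 0 and the type width.
// Checking only "count != 0" would still allow count >= kBits, which is
// undefined behavior for the shift itself.
constexpr bool shift_count_ok(int count)
{
  return count > 0 && count < kBits;
}

// Hypothetical helper: shift when the count is in range, otherwise return the
// caller-supplied fallback instead of performing an undefined shift.
constexpr std::uint64_t shl_or(std::uint64_t value, int count,
                               std::uint64_t fallback)
{
  return shift_count_ok(count) ? (value << count) : fallback;
}
```

The point of the sketch is only that the range check happens before the shift expression is evaluated.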
2025-08-22[PR rtl-optimization/120553] Improve selecting between constants based on ↵Jeff Law9-0/+783
sign bit test While working to remove mvconst_internal I stumbled over a regression in the code to handle signed division by a power of two. In that sequence we want to select between 0, 2^n-1 by pairing a sign bit splat with a subsequent logical right shift. This can be done without branches or conditional moves. Playing with it a bit made me realize there's a handful of selections we can do based on a sign bit test. Essentially there are two broad cases. Clearing bits after the sign bit splat. So we have 0, -1, if we clear bits the 0 stays as-is, but the -1 could easily turn into 2^n-1, ~2^n-1, or some small constants. Setting bits after the sign bit splat. If we have 0, -1, setting bits the -1 stays as-is, but the 0 can turn into 2^n, a small constant, etc. Shreya and I originally started looking at target patterns to do this, essentially discovering conditional move forms of the selects and rewriting them into something more efficient. That got out of control pretty quickly and it relied on if-conversion to initially create the conditional move. The better solution is to actually discover the cases during if-conversion itself. That catches cases that were previously being missed, checks cost models, and is actually simpler since we don't have to distinguish between things like ori and bseti, instead we just emit the natural RTL and let the target figure it out. In the ifcvt implementation we put these cases just before trying the traditional conditional move sequences. Essentially these are a last attempt before trying the generalized conditional move sequence. This has been bootstrapped and regression tested on aarch64, riscv, ppc64le, s390x, alpha, m68k, sh4eb, x86_64 and probably a couple others I've forgotten. It's also been tested on the other embedded targets. Obviously the new tests are risc-v specific, so that testing was primarily to make sure we didn't ICE, generate incorrect code or regress existing target-specific tests. 
Raphael has some changes to attack this from the gimple direction as well. I think the latest version of those is on me to push through internal review. PR rtl-optimization/120553 gcc/ * ifcvt.cc (noce_try_sign_bit_splat): New function. (noce_process_if_block): Use it. gcc/testsuite/ * gcc.target/riscv/pr120553-1.c: New test. * gcc.target/riscv/pr120553-2.c: New test. * gcc.target/riscv/pr120553-3.c: New test. * gcc.target/riscv/pr120553-4.c: New test. * gcc.target/riscv/pr120553-5.c: New test. * gcc.target/riscv/pr120553-6.c: New test. * gcc.target/riscv/pr120553-7.c: New test. * gcc.target/riscv/pr120553-8.c: New test.
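The two selection families described in this commit can be sketched at source level as follows. This is an illustration of the arithmetic only, not the RTL that noce_try_sign_bit_splat emits; the function names are hypothetical, and the sketch assumes a 32-bit int with arithmetic right shift of negative values (guaranteed in C++20, and GCC's documented behavior for earlier standards).

```cpp
#include <cstdint>

// Sign-bit splat: 0 for non-negative x, all ones (-1) for negative x.
constexpr std::uint32_t sign_splat(std::int32_t x)
{
  return static_cast<std::uint32_t>(x >> 31);
}

// "Clearing bits" case: x < 0 ? 2^n - 1 : 0, valid for 1 <= n <= 31.
// The logical right shift clears the high bits of the -1; the 0 stays 0.
constexpr std::uint32_t select_low_mask(std::int32_t x, int n)
{
  return sign_splat(x) >> (32 - n);
}

// "Setting bits" case: x < 0 ? -1 : 2^n, valid for 0 <= n <= 31.
// OR-ing a bit into -1 leaves it unchanged; the 0 becomes 2^n.
constexpr std::uint32_t select_or_bit(std::int32_t x, int n)
{
  return sign_splat(x) | (std::uint32_t{1} << n);
}
```

Neither helper contains a branch or conditional move, which is the property the commit relies on.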
2025-08-22Pass representative of live SLP node to vect_create_epilog_for_reductionRichard Biener1-2/+3
We passed the reduc_info which is close, but the representative is more spot on and will not collide with making the reduc_info a distinct type. * tree-vect-loop.cc (vectorizable_live_operation): Pass the representative of the PHIs node to vect_create_epilog_for_reduction.
2025-08-22Fixups around reduction info and STMT_VINFO_REDUC_VECTYPE_INRichard Biener1-6/+6
STMT_VINFO_REDUC_VECTYPE_IN exists on relevant reduction stmts, not the reduction info. And STMT_VINFO_DEF_TYPE exists on the reduction info. The following fixes up a few places. * tree-vect-loop.cc (vectorizable_lane_reducing): Get reduction info properly. Adjust checks according to comments. (vectorizable_reduction): Do not set STMT_VINFO_REDUC_VECTYPE_IN on the reduc info. (vect_transform_reduction): Query STMT_VINFO_REDUC_VECTYPE_IN on the actual reduction stmt, not the info.
2025-08-22RISC-V: Add testcase for scalar unsigned SAT_MUL form 3Pan Li27-0/+382
Add run and asm check test cases for scalar unsigned SAT_MUL form 3. gcc/testsuite/ChangeLog: * gcc.target/riscv/sat/sat_arith.h: Add test helper macros. * gcc.target/riscv/sat/sat_u_mul-4-u16-from-u128.c: New test. * gcc.target/riscv/sat/sat_u_mul-4-u16-from-u32.c: New test. * gcc.target/riscv/sat/sat_u_mul-4-u16-from-u64.c: New test. * gcc.target/riscv/sat/sat_u_mul-4-u16-from-u64.rv32.c: New test. * gcc.target/riscv/sat/sat_u_mul-4-u32-from-u128.c: New test. * gcc.target/riscv/sat/sat_u_mul-4-u32-from-u64.c: New test. * gcc.target/riscv/sat/sat_u_mul-4-u32-from-u64.rv32.c: New test. * gcc.target/riscv/sat/sat_u_mul-4-u64-from-u128.c: New test. * gcc.target/riscv/sat/sat_u_mul-4-u8-from-u128.c: New test. * gcc.target/riscv/sat/sat_u_mul-4-u8-from-u16.c: New test. * gcc.target/riscv/sat/sat_u_mul-4-u8-from-u32.c: New test. * gcc.target/riscv/sat/sat_u_mul-4-u8-from-u64.c: New test. * gcc.target/riscv/sat/sat_u_mul-4-u8-from-u64.rv32.c: New test. * gcc.target/riscv/sat/sat_u_mul-run-4-u16-from-u128.c: New test. * gcc.target/riscv/sat/sat_u_mul-run-4-u16-from-u32.c: New test. * gcc.target/riscv/sat/sat_u_mul-run-4-u16-from-u64.c: New test. * gcc.target/riscv/sat/sat_u_mul-run-4-u16-from-u64.rv32.c: New test. * gcc.target/riscv/sat/sat_u_mul-run-4-u32-from-u128.c: New test. * gcc.target/riscv/sat/sat_u_mul-run-4-u32-from-u64.c: New test. * gcc.target/riscv/sat/sat_u_mul-run-4-u32-from-u64.rv32.c: New test. * gcc.target/riscv/sat/sat_u_mul-run-4-u64-from-u128.c: New test. * gcc.target/riscv/sat/sat_u_mul-run-4-u8-from-u128.c: New test. * gcc.target/riscv/sat/sat_u_mul-run-4-u8-from-u16.c: New test. * gcc.target/riscv/sat/sat_u_mul-run-4-u8-from-u32.c: New test. * gcc.target/riscv/sat/sat_u_mul-run-4-u8-from-u64.c: New test. * gcc.target/riscv/sat/sat_u_mul-run-4-u8-from-u64.rv32.c: New test. Signed-off-by: Pan Li <pan2.li@intel.com>
2025-08-22Match: Add form 3 for unsigned SAT_MULPan Li1-1/+26
This patch matches the unsigned SAT_MUL form 3, shown below:
  #define DEF_SAT_U_MUL_FMT_3(NT, WT)             \
  NT __attribute__((noinline))                    \
  sat_u_mul_##NT##_from_##WT##_fmt_3 (NT a, NT b) \
  {                                               \
    WT x = (WT)a * (WT)b;                         \
    if ((x >> sizeof(a) * 8) == 0)                \
      return (NT)x;                               \
    else                                          \
      return (NT)-1;                              \
  }
WT is one of uint16_t, uint32_t, uint64_t and uint128_t, and NT is one of uint8_t, uint16_t, uint32_t and uint64_t. gcc/ChangeLog: * match.pd: Add form 3 for unsigned SAT_MUL. Signed-off-by: Pan Li <pan2.li@intel.com>
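As a concrete instance of form 3, here is a sketch with NT = uint8_t and WT = uint16_t, written as a plain constexpr function instead of the macro. The function name is chosen for illustration and is not one of the testsuite names.

```cpp
#include <cstdint>

// Saturating unsigned multiply, form 3: multiply in the wide type, then test
// whether anything landed above the narrow type's bits.
constexpr std::uint8_t sat_u_mul_u8_from_u16(std::uint8_t a, std::uint8_t b)
{
  // 255 * 255 = 65025 fits in uint16_t, so the wide product never overflows.
  std::uint16_t x = static_cast<std::uint16_t>(
      static_cast<std::uint16_t>(a) * static_cast<std::uint16_t>(b));
  if ((x >> (sizeof(a) * 8)) == 0)
    return static_cast<std::uint8_t>(x);   // product fits in the narrow type
  return static_cast<std::uint8_t>(-1);    // saturate to the maximum value
}
```

For example, 10 * 10 yields 100, while 16 * 16 = 256 does not fit in eight bits and saturates to 255.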
2025-08-22Emit the TLS call after NOTE_INSN_FUNCTION_BEGH.J. Lu3-4/+44
For the beginning basic block: (note 4 0 2 2 [bb 2] NOTE_INSN_BASIC_BLOCK) (note 2 4 26 2 NOTE_INSN_FUNCTION_BEG) emit the TLS call after NOTE_INSN_FUNCTION_BEG. gcc/ PR target/121635 * config/i386/i386-features.cc (ix86_emit_tls_call): Emit the TLS call after NOTE_INSN_FUNCTION_BEG. gcc/testsuite/ PR target/121635 * gcc.target/i386/pr121635-1a.c: New test. * gcc.target/i386/pr121635-1b.c: Likewise. Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
2025-08-22Use REDUC_GROUP_FIRST_ELEMENT lessRichard Biener1-34/+33
REDUC_GROUP_FIRST_ELEMENT is often checked to see whether we are dealing with a SLP reduction or a reduction chain. When we are in the context of analyzing the reduction (so we are sure the SLP instance we see is correct), then we can use the SLP instance kind instead. * tree-vect-loop.cc (get_initial_defs_for_reduction): Adjust comment. (vect_create_epilog_for_reduction): Get at the reduction kind via the instance, re-use the slp_reduc flag instead of checking REDUC_GROUP_FIRST_ELEMENT again. Remove unreachable code. (vectorizable_reduction): Compute a reduc_chain flag from the SLP instance kind, avoid REDUC_GROUP_FIRST_ELEMENT checks. (vect_transform_cycle_phi): Likewise. (vectorizable_live_operation): Check the SLP instance kind instead of REDUC_GROUP_FIRST_ELEMENT.
2025-08-22testsuite: Fix g++.dg/abi/mangle83.C for -fshort-enumsNathaniel Shead1-2/+2
Linaro CI informed me that this test fails on ARM thumb-m7-hard-eabi. This appears to be because the target defaults to -fshort-enums, and so the mangled names are inaccurate. This patch just disables the implicit type enum test for this case. gcc/testsuite/ChangeLog: * g++.dg/abi/mangle83.C: Disable implicit enum test for -fshort-enums. Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
2025-08-22Decouple parloops from vect reduction infra some moreRichard Biener1-66/+48
The following removes the use of STMT_VINFO_REDUC_* from parloops, also fixing a mistake with analyzing double reductions which rely on the outer loop vinfo so the inner loop is properly detected as nested. * tree-parloops.cc (parloops_is_simple_reduction): Pass in double reduction inner loop LC phis and query that. (parloops_force_simple_reduction): Similar, but set it. Check for valid reduction types here. (valid_reduction_p): Remove. (gather_scalar_reductions): Adjust, fixup double reduction inner loop processing.
2025-08-22RTEMS: Add riscv multilibsSebastian Huber1-2/+7
gcc/ChangeLog: * config/riscv/t-rtems: Add -mstrict-align multilibs for targets without support for misaligned access in hardware.
2025-08-21[arm] require armv7 support for [PR120424]Alexandre Oliva2-1/+4
Without stating the architecture version required by the test, test runs with options that are incompatible with the required architecture version fail, e.g. -mfloat-abi=hard. armv7 was not covered by the long list of arm variants in target-supports.exp, so add it, and use it for the effective target requirement and for the option. for gcc/testsuite/ChangeLog PR rtl-optimization/120424 * lib/target-supports.exp (arm arches): Add arm_arch_v7. * g++.target/arm/pr120424.C: Require armv7 support. Use dg-add-options arm_arch_v7 instead of explicit -march=armv7.
2025-08-22Daily bump.GCC Administrator9-1/+336
2025-08-21Fortran: Fix NULL pointer issue.Steven G. Kargl2-2/+10
PR fortran/121627 gcc/fortran/ChangeLog: * module.cc (create_int_parameter_array): Avoid NULL pointer dereference and enhance error message. gcc/testsuite/ChangeLog: * gfortran.dg/pr121627.f90: New test.
2025-08-21pru: libgcc: Add software implementation for multiplicationDimitar Dimitrov6-1/+134
For cores without a hardware multiplier, set respective optabs with library functions which use software implementation of multiplication. The implementation was copied from the RL78 backend. gcc/ChangeLog: * config/pru/pru.cc (pru_init_libfuncs): Set softmpy libgcc functions for optab multiplication entries if TARGET_OPT_MUL option is not set. libgcc/ChangeLog: * config/pru/libgcc-eabi.ver: Add __pruabi_softmpyi and __pruabi_softmpyll symbols. * config/pru/t-pru: Add softmpy source files. * config/pru/pru-softmpy.h: New file. * config/pru/softmpyi.c: New file. * config/pru/softmpyll.c: New file. Signed-off-by: Dimitar Dimitrov <dimitar@dinux.eu>
2025-08-21pru: Define multilib for different core variantsDimitar Dimitrov3-1/+33
Enable multilib builds for contemporary PRU core versions (AM335x and later), and older versions present in AM18xx. gcc/ChangeLog: * config.gcc: Include pru/t-multilib. * config/pru/pru.h (MULTILIB_DEFAULTS): Define. * config/pru/t-multilib: New file. Signed-off-by: Dimitar Dimitrov <dimitar@dinux.eu>
2025-08-21pru: Add options to disable MUL/FILL/ZERO instructionsDimitar Dimitrov6-15/+42
Older PRU core versions (e.g. in AM1808 SoC) do not support XIN, XOUT, FILL, ZERO instructions. Add GCC command line options to optionally disable generation of those instructions, so that code can be executed on such older PRU cores. gcc/ChangeLog: * common/config/pru/pru-common.cc (TARGET_DEFAULT_TARGET_FLAGS): Keep multiplication, FILL and ZERO instructions enabled by default. * config/pru/pru.md (prumov<mode>): Gate code generation on TARGET_OPT_FILLZERO. (mov<mode>): Ditto. (zero_extendqidi2): Ditto. (zero_extendhidi2): Ditto. (zero_extendsidi2): Ditto. (@pru_ior_fillbytes<mode>): Ditto. (@pru_and_zerobytes<mode>): Ditto. (@<code>di3): Ditto. (mulsi3): Gate code generation on TARGET_OPT_MUL. * config/pru/pru.opt: Add mmul and mfillzero options. * config/pru/pru.opt.urls: Regenerate. * config/rl78/rl78.opt.urls: Regenerate. * doc/invoke.texi: Document new options. Signed-off-by: Dimitar Dimitrov <dimitar@dinux.eu>
2025-08-21c: Add folding of nullptr_t in some cases [PR121478]Andrew Pinski3-3/+58
The middle-end does not fully understand NULLPTR_TYPE. So it gets confused a lot of the time when dealing with it. This adds the folding that is similarly done in the C++ front-end already. In some cases it should produce slightly better code as there is no reason to load from a nullptr_t variable as it is always NULL. The following is handled: nullptr_v ==/!= nullptr_v -> true/false (ptr)nullptr_v -> (ptr)0, nullptr_v f(nullptr_v) -> f ((nullptr, nullptr_v)) The last one is for conversion inside ... . Bootstrapped and tested on x86_64-linux-gnu. PR c/121478 gcc/c/ChangeLog: * c-fold.cc (c_fully_fold_internal): Fold nullptr_t ==/!= nullptr_t. * c-typeck.cc (convert_arguments): Handle conversion from nullptr_t for varargs. (convert_for_assignment): Handle conversions from nullptr_t to pointer type specially. gcc/testsuite/ChangeLog: * gcc.dg/torture/pr121478-1.c: New test. Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
2025-08-21c++: constexpr clobber of const [PR121068]Jason Merrill2-0/+29
Since r16-3022, 20_util/variant/102912.cc was failing in C++20 and above due to wrong errors about destruction modifying a const object; destruction is OK. PR c++/121068 gcc/cp/ChangeLog: * constexpr.cc (cxx_eval_store_expression): Allow clobber of a const object. gcc/testsuite/ChangeLog: * g++.dg/cpp2a/constexpr-dtor18.C: New test.
2025-08-21RISC-V: testsuite: Fix DejaGnu support for riscv_zvfhPaul-Antoine Arras13-18/+14
Call check_effective_target_riscv_zvfh_ok rather than check_effective_target_riscv_zvfh in vx_vf_*run-1-f16.c run tests and ensure that they are actually run. Also fix remove_options_for_riscv_zvfh. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vx_vf/vf_vfmacc-run-1-f16.c: Call check_effective_target_riscv_zvfh_ok rather than check_effective_target_riscv_zvfh. * gcc.target/riscv/rvv/autovec/vx_vf/vf_vfmadd-run-1-f16.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf_vfmsac-run-1-f16.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf_vfmsub-run-1-f16.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf_vfnmacc-run-1-f16.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf_vfnmadd-run-1-f16.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf_vfnmsac-run-1-f16.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf_vfnmsub-run-1-f16.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf_vfwmacc-run-1-f16.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf_vfwmsac-run-1-f16.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf_vfwnmacc-run-1-f16.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf_vfwnmsac-run-1-f16.c: Likewise. * lib/target-supports.exp (check_effective_target_riscv_zvfh_ok): Append zvfh instead of v to march. (remove_options_for_riscv_zvfh): Remove duplicate and call remove_ rather than add_options_for_riscv_z_ext.
2025-08-21rtl-ssa: Add missing live-out uses [PR121619]Richard Sandiford4-1/+74
This PR is another bug in the rtl-ssa code to manage live-out uses. It seems that this didn't get much coverage until recently. In the testcase, late-combine first removed a register-to-register move by substituting into all uses, some of which were in other EBBs. This was done after checking make_uses_available, which (as expected) says that single dominating definitions are available everywhere that the definition dominates. But the update failed to add appropriate live-out uses, so a later parallelisation attempt tried to move the new destination into a later block. gcc/ PR rtl-optimization/121619 * rtl-ssa/functions.h (function_info::commit_make_use_available): Declare. * rtl-ssa/blocks.cc (function_info::commit_make_use_available): New function. * rtl-ssa/changes.cc (function_info::apply_changes_to_insn): Use it. gcc/testsuite/ PR rtl-optimization/121619 * gcc.dg/pr121619.c: New test.
2025-08-21libstdc++: Use pthread_mutex_clocklock when TSan is active [PR121496]Jonathan Wakely2-2/+2
This reverts r14-905-g3b7cb33033fbe6 which disabled the use of pthread_mutex_clocklock when TSan is active. That's no longer needed, because GCC has TSan interceptors for pthread_mutex_clocklock since GCC 15.1 and Clang has them since 18.1.0 (released March 2024). The interceptor was added by https://github.com/llvm/llvm-project/pull/75713 libstdc++-v3/ChangeLog: PR libstdc++/121496 * acinclude.m4 (GLIBCXX_CHECK_PTHREAD_MUTEX_CLOCKLOCK): Do not use _GLIBCXX_TSAN in _GLIBCXX_USE_PTHREAD_MUTEX_CLOCKLOCK macro. * configure: Regenerate. Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>
2025-08-21libstdc++: Check _GLIBCXX_USE_PTHREAD_MUTEX_CLOCKLOCK with #if [PR121496]Jonathan Wakely2-2/+16
The change in r14-905-g3b7cb33033fbe6 to disable the use of pthread_mutex_clocklock when TSan is active assumed that the _GLIBCXX_USE_PTHREAD_MUTEX_CLOCKLOCK macro was always checked with #if rather than #ifdef, which was not true. This makes the checks use #if consistently. libstdc++-v3/ChangeLog: PR libstdc++/121496 * include/std/mutex (__timed_mutex_impl::_M_try_wait_until): Change preprocessor condition to use #if instead of #ifdef. (recursive_timed_mutex::_M_clocklock): Likewise. * testsuite/30_threads/timed_mutex/121496.cc: New test. Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>
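The difference between the two preprocessor checks can be shown with a small sketch. `DEMO_USE_FEATURE` is a hypothetical macro standing in for a feature macro that is defined but set to 0; it is not the libstdc++ macro itself.

```cpp
// Hypothetical feature macro: defined, but with the value 0 (disabled).
#define DEMO_USE_FEATURE 0

// #ifdef only asks whether the macro is defined, so this branch is taken
// even though the feature is meant to be off.
#ifdef DEMO_USE_FEATURE
constexpr bool enabled_per_ifdef = true;
#else
constexpr bool enabled_per_ifdef = false;
#endif

// #if evaluates the macro's value, so the disabled state is respected.
#if DEMO_USE_FEATURE
constexpr bool enabled_per_if = true;
#else
constexpr bool enabled_per_if = false;
#endif
```

A macro that can be defined to 0 therefore has to be tested with #if everywhere, which is what the commit makes consistent.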
2025-08-21tree-optimization/111494 - reduction vectorization with signed UBRichard Biener3-1/+56
The following makes sure to pun arithmetic that's used in vectorized reduction to unsigned when overflow invokes undefined behavior. PR tree-optimization/111494 * gimple-fold.h (arith_code_with_undefined_signed_overflow): Declare. * gimple-fold.cc (arith_code_with_undefined_signed_overflow): Export. * tree-vect-stmts.cc (vectorizable_operation): Use unsigned arithmetic for operations participating in a reduction.
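A source-level sketch of why the punning matters: a vectorized reduction accumulates per lane, which reorders the additions, so a lane can overflow even when the original scalar order does not. The two-lane function below is a hypothetical illustration, not the vectorizer's output; it assumes that converting an out-of-range unsigned value back to int32_t wraps (guaranteed in C++20, GCC's documented behavior before that).

```cpp
#include <cstddef>
#include <cstdint>

// Two-lane reduction: even indices go to lane 0, odd indices to lane 1.
// Accumulating in uint32_t makes the intermediate wraparound well defined.
constexpr std::int32_t two_lane_sum(const std::int32_t *data, std::size_t n)
{
  std::uint32_t lane0 = 0;
  std::uint32_t lane1 = 0;
  for (std::size_t i = 0; i < n; ++i)
    {
      if (i % 2 == 0)
        lane0 += static_cast<std::uint32_t>(data[i]);
      else
        lane1 += static_cast<std::uint32_t>(data[i]);
    }
  return static_cast<std::int32_t>(lane0 + lane1);
}

// In scalar order INT32_MAX + (-1) + 1 + 0 never overflows, but lane 0
// computes INT32_MAX + 1, which would be undefined in signed arithmetic.
constexpr std::int32_t demo_values[] = {INT32_MAX, -1, 1, 0};
```

With unsigned lanes the final result still equals the scalar sum, because addition modulo 2^32 is associative.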
2025-08-21x86-64: Emit the TLS call after NOTE_INSN_BASIC_BLOCKH.J. Lu3-3/+86
For a basic block with only a label: (code_label 78 11 77 3 14 (nil) [1 uses]) (note 77 78 54 3 [bb 3] NOTE_INSN_BASIC_BLOCK) emit the TLS call after NOTE_INSN_BASIC_BLOCK, instead of before NOTE_INSN_BASIC_BLOCK, to avoid x.c: In function ‘aout_16_write_syms’: x.c:54:1: error: NOTE_INSN_BASIC_BLOCK is missing for block 3 54 | } | ^ x.c:54:1: error: NOTE_INSN_BASIC_BLOCK 77 in middle of basic block 3 during RTL pass: x86_cse x.c:54:1: internal compiler error: verify_flow_info failed gcc/ PR target/121607 * config/i386/i386-features.cc (ix86_emit_tls_call): Emit the TLS call after NOTE_INSN_BASIC_BLOCK in a basic block with only a label. gcc/testsuite/ PR target/121607 * gcc.target/i386/pr121607-1a.c: New test. * gcc.target/i386/pr121607-1b.c: Likewise. Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
2025-08-21libstdc++: Implement aligned_accessor from mdspan [PR120994]Luc Grosheintz9-3/+197
This commit completes the implementation of P2897R7 by implementing and testing the template class aligned_accessor. PR libstdc++/120994 libstdc++-v3/ChangeLog: * include/bits/version.def (aligned_accessor): Add. * include/bits/version.h: Regenerate. * include/std/mdspan (aligned_accessor): New class. * src/c++23/std.cc.in (aligned_accessor): Add. * testsuite/23_containers/mdspan/accessors/generic.cc: Add tests for aligned_accessor. * testsuite/23_containers/mdspan/accessors/aligned_neg.cc: New test. * testsuite/23_containers/mdspan/version.cc: Add test for __cpp_lib_aligned_accessor. Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com> Signed-off-by: Luc Grosheintz <luc.grosheintz@gmail.com>
2025-08-21libstdc++: Implement is_sufficiently_aligned [PR120994]Luc Grosheintz7-0/+74
This commit implements and tests the function is_sufficiently_aligned from P2897R7. PR libstdc++/120994 libstdc++-v3/ChangeLog: * include/bits/align.h (is_sufficiently_aligned): New function. * include/bits/version.def (is_sufficiently_aligned): Add. * include/bits/version.h: Regenerate. * include/std/memory: Add __glibcxx_want_is_sufficiently_aligned. * src/c++23/std.cc.in (is_sufficiently_aligned): Add. * testsuite/20_util/headers/memory/version.cc: Add test for __cpp_lib_is_sufficiently_aligned. * testsuite/20_util/is_sufficiently_aligned/1.cc: New test. Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com> Signed-off-by: Luc Grosheintz <luc.grosheintz@gmail.com>
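For readers unfamiliar with the check, the idea behind an alignment test can be sketched as below. This is not the libstdc++ implementation; `address_is_aligned` and `pointer_is_aligned` are hypothetical names, and the sketch assumes the alignment is a power of two.

```cpp
#include <cstddef>
#include <cstdint>

// An address is aligned to a power-of-two alignment when its low bits are zero.
constexpr bool address_is_aligned(std::uintptr_t address, std::size_t alignment)
{
  return (address & (alignment - 1)) == 0;
}

// Pointer wrapper: converts the pointer to an integer and applies the same test.
template <std::size_t Alignment, typename T>
bool pointer_is_aligned(const T *ptr)
{
  static_assert(Alignment != 0 && (Alignment & (Alignment - 1)) == 0,
                "Alignment must be a power of two");
  return address_is_aligned(reinterpret_cast<std::uintptr_t>(ptr), Alignment);
}
```

The integer helper is constexpr so the bit test can be checked at compile time; the pointer wrapper cannot be, because of the reinterpret_cast.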
2025-08-21libstdc++: Fix std::numeric_limits<__float128>::max_digits10 [PR121374]Jonathan Wakely2-1/+6
When I added this explicit specialization in r14-1433-gf150a084e25eaa I used the wrong value for the number of mantissa digits (I used 112 instead of 113). Then when I refactored it in r14-1582-g6261d10521f9fd I used the value calculated from the incorrect value (35 instead of 36). libstdc++-v3/ChangeLog: PR libstdc++/121374 * include/std/limits (numeric_limits<__float128>::max_digits10): Fix value. * testsuite/18_support/numeric_limits/128bit.cc: Check value.
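The arithmetic behind the two values can be checked with a short sketch. max_digits10 for a binary type with d mantissa digits is ceil(1 + d * log10(2)); the helper below uses the integer ratio 643/2136 as an approximation of log10(2), and its name is chosen for illustration.

```cpp
// Decimal digits needed to round-trip a binary floating-point value with
// `mantissa_bits` bits of precision: ceil(1 + mantissa_bits * log10(2)),
// written as 2 + floor(mantissa_bits * 643 / 2136).
constexpr int max_digits10_for(int mantissa_bits)
{
  return 2 + mantissa_bits * 643 / 2136;
}

static_assert(max_digits10_for(113) == 36, "113 mantissa digits give 36");
static_assert(max_digits10_for(112) == 35, "the off-by-one input gives 35");
```

This reproduces both numbers in the commit message: 112 digits lead to 35, and the correct 113 digits lead to 36.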
2025-08-21libstdc++: Suppress some more additional diagnostics [PR117294]Jonathan Wakely2-0/+3
libstdc++-v3/ChangeLog: PR c++/117294 * testsuite/20_util/optional/cons/value_neg.cc: Prune additional output for C++20 and later. * testsuite/20_util/scoped_allocator/69293_neg.cc: Match additional error for C++20 and later.
2025-08-21libstdc++: Implement std::dims from <mdspan>.Luc Grosheintz6-5/+32
This commit implements the C++26 feature std::dims described in P2389R2. It sets the feature testing macro to 202406 and adds tests. Also fixes the test mdspan/version.cc. libstdc++-v3/ChangeLog: * include/bits/version.def (mdspan): Set value for C++26. * include/bits/version.h: Regenerate. * include/std/mdspan (dims): Add. * src/c++23/std.cc.in (dims): Add. * testsuite/23_containers/mdspan/extents/misc.cc: Add tests. * testsuite/23_containers/mdspan/version.cc: Update test. Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com> Signed-off-by: Luc Grosheintz <luc.grosheintz@gmail.com>
2025-08-21libstdc++: Simplify precomputed partial products in <mdspan>.Luc Grosheintz1-21/+23
Prior to this commit, the partial products of static extents in <mdspan> were computed in a loop that called a function to compute each partial product from scratch, so the complexity was quadratic in the rank. This commit removes the quadratic complexity. libstdc++-v3/ChangeLog: * include/std/mdspan (__static_prod): Delete. (__fwd_partial_prods): Compute at compile-time in O(rank), not O(rank**2). (__rev_partial_prods): Ditto. (__size): Inline __static_prod. Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com> Signed-off-by: Luc Grosheintz <luc.grosheintz@gmail.com>
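The linear-time idea can be sketched with a running product. This is a simplified illustration, not the libstdc++ code: it ignores dynamic extents entirely, and `Products`, `fwd_partial_prods` and the demo arrays are hypothetical names.

```cpp
#include <cstddef>

// Small aggregate so the result can be returned from a constexpr function.
template <std::size_t N>
struct Products
{
  std::size_t value[N];
};

// Forward partial products in a single pass:
// out[i] = extents[0] * ... * extents[i-1], with out[0] = 1.
// Each entry reuses the running product, so the work is O(N), not O(N**2).
template <std::size_t N>
constexpr Products<N> fwd_partial_prods(const std::size_t (&extents)[N])
{
  Products<N> out{};
  std::size_t running = 1;
  for (std::size_t i = 0; i < N; ++i)
    {
      out.value[i] = running;
      running *= extents[i];
    }
  return out;
}

constexpr std::size_t demo_extents[] = {3, 5, 7};
constexpr Products<3> demo_prods = fwd_partial_prods(demo_extents);
```

Recomputing each entry independently would multiply 0 + 1 + ... + (N-1) factors in total, which is where the quadratic cost came from.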
2025-08-21libstdc++: Reduce size static storage for __fwd_prod in mdspan.Luc Grosheintz1-2/+2
This fixes an oversight in a previous commit that improved mdspan related code. Because __size doesn't use __fwd_prod, __fwd_prod(__rank) is not needed anymore. Hence, one can shrink the size of __fwd_partial_prods. libstdc++-v3/ChangeLog: * include/std/mdspan (__fwd_partial_prods): Reduce size of the array by 1 element. Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com> Signed-off-by: Luc Grosheintz <luc.grosheintz@gmail.com>
2025-08-21xtensa: Small improvement to "*btrue_INT_MIN"Takayuki 'January June' Suwa1-10/+7
This patch changes the implementation of the insn to test whether the result itself is negative or not, rather than the MSB of the result of the ABS machine instruction. This eliminates the need to consider bit- endianness and allows for longer branch distances. /* example */ extern void foo(int); void test0(int a) { if (a == -2147483648) foo(a); } void test1(int a) { if (a != -2147483648) foo(a); } ;; before (endianness: little) test0: entry sp, 32 abs a8, a2 bbci a8, 31, .L1 mov.n a10, a2 call8 foo .L1: retw.n test1: entry sp, 32 abs a8, a2 bbsi a8, 31, .L4 mov.n a10, a2 call8 foo .L4: retw.n ;; after (endianness-independent) test0: entry sp, 32 abs a8, a2 bgez a8, .L1 mov.n a10, a2 call8 foo .L1: retw.n test1: entry sp, 32 abs a8, a2 bltz a8, .L4 mov.n a10, a2 call8 foo .L4: retw.n gcc/ChangeLog: * config/xtensa/xtensa.md (*btrue_INT_MIN): Change the branch insn condition to test for a negative number rather than testing for the MSB.
2025-08-21libstdc++: Replace numeric_limit with __int_traits in mdspan.Luc Grosheintz2-12/+17
Using __int_traits avoids the need to include <limits> from <mdspan>. This in turn should reduce the size of the pre-compiled <mdspan>. Similar refactoring was carried out for PR92546. Unfortunately, ./gcc/xgcc -std=c++23 -P -E -x c++ - -include mdspan | wc -l shows a decrease by 1(!) line. This is due to bits/max_size_type.h which includes <limits>. libstdc++-v3/ChangeLog: * include/std/mdspan (__valid_static_extent): Replace numeric_limits with __int_traits. (extents::_S_ctor_explicit): Ditto. (extents::__static_quotient): Ditto. (layout_stride::mapping::mapping): Ditto. (mdspan::size): Ditto. * testsuite/23_containers/mdspan/extents/class_mandates_neg.cc: Update test with additional diagnostics. Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com> Signed-off-by: Luc Grosheintz <luc.grosheintz@gmail.com>
2025-08-21libstdc++: Improve extents::operator==.Luc Grosheintz1-4/+5
An interesting case to consider is: bool same11(const std::extents<int, dyn, 2, 3>& e1, const std::extents<int, dyn, dyn, 3>& e2) { return e1 == e2; } Which has the following properties: - There are no mismatching static extents, preventing any short-circuiting. - There's a comparison between dynamic and static extents. - There's one trivial comparison: ... && 3 == 3. Let E[i] denote the array of static extents, D[k] denote the array of dynamic extents and k[i] be the index of the i-th extent in D. (Naturally, k[i] is only meaningful if i is a dynamic extent). The previous implementation results in assembly that's more or less a literal translation of: for (i = 0; i < 3; ++i) e1 = E1[i] == -1 ? D1[k1[i]] : E1[i]; e2 = E2[i] == -1 ? D2[k2[i]] : E2[i]; if e1 != e2: return false return true; While the proposed method results in assembly for if(D1[0] != D2[0]) return false; return 2 == D2[1]; i.e. 110: 8b 17 mov edx,DWORD PTR [rdi] 112: 31 c0 xor eax,eax 114: 39 16 cmp DWORD PTR [rsi],edx 116: 74 08 je 120 <same11+0x10> 118: c3 ret 119: 0f 1f 80 00 00 00 00 nop DWORD PTR [rax+0x0] 120: 83 7e 04 02 cmp DWORD PTR [rsi+0x4],0x2 124: 0f 94 c0 sete al 127: c3 ret It has the following nice properties: - It eliminated the indirection D[k[i]], because k[i] is known at compile time. Saving us a comparison E[i] == -1 and conditionally loading k[i]. - It eliminated the trivial condition 3 == 3. The result is code that only loads the required values and performs exactly the number of comparisons needed by the algorithm. It also results in smaller object files. Therefore, this seems like a sensible change. We've checked several other examples, including fully statically determined cases and high-rank examples. The example given above illustrates the other cases well. The constexpr condition: if constexpr (!_S_is_compatible_extents<...>) return false; is no longer needed, because the optimizer correctly handles this case. However, it's retained for clarity/certainty. 
libstdc++-v3/ChangeLog: * include/std/mdspan (extents::operator==): Replace loop with pack expansion. Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com> Signed-off-by: Luc Grosheintz <luc.grosheintz@gmail.com>
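The loop-to-pack-expansion change can be illustrated with a much reduced model. The sketch below is not the libstdc++ implementation: `MiniExtents`, `equal_impl` and `extents_equal` are hypothetical, the storage keeps one slot per extent for simplicity (only slots of dynamic extents are read), rank 0 is not handled, and C++17 is assumed for the fold expression.

```cpp
#include <cstddef>
#include <utility>

constexpr std::size_t dyn = static_cast<std::size_t>(-1);

// Hypothetical miniature extents: static extents live in the type, runtime
// values in `dynamic`.
template <std::size_t... E>
struct MiniExtents
{
  std::size_t dynamic[sizeof...(E)];

  // The index is a template parameter, so whether extent I is static or
  // dynamic is decided at compile time, with no runtime lookup.
  template <std::size_t I>
  constexpr std::size_t extent() const
  {
    constexpr std::size_t statics[] = {E...};
    return statics[I] == dyn ? dynamic[I] : statics[I];
  }
};

// One comparison per index, expanded as a fold over && instead of a loop.
template <std::size_t... A, std::size_t... B, std::size_t... I>
constexpr bool equal_impl(const MiniExtents<A...> &a,
                          const MiniExtents<B...> &b,
                          std::index_sequence<I...>)
{
  return ((a.template extent<I>() == b.template extent<I>()) && ...);
}

template <std::size_t... A, std::size_t... B>
constexpr bool extents_equal(const MiniExtents<A...> &a,
                             const MiniExtents<B...> &b)
{
  if constexpr (sizeof...(A) != sizeof...(B))
    return false;
  else
    return equal_impl(a, b, std::make_index_sequence<sizeof...(A)>{});
}

// Mirrors the same11 example: (dyn, 2, 3) compared with (dyn, dyn, 3).
constexpr MiniExtents<dyn, 2, 3> demo_e1{{4, 0, 0}};
constexpr MiniExtents<dyn, dyn, 3> demo_e2{{4, 2, 0}};
constexpr MiniExtents<dyn, dyn, 3> demo_e3{{4, 5, 0}};
```

Because every term of the fold is a separate expression with a constant index, a comparison such as 3 == 3 is visible to the compiler as a constant and can be dropped.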
2025-08-21libstdc++: Reduce indirection in extents::extent.Luc Grosheintz1-10/+25
In both fully static and dynamic extents the comparison static_extent(i) == dynamic_extent is known at compile time. As a result, extents::extent doesn't need to perform the check at runtime. An illustrative example is: using E = std::extents<int, 3, 5, 7, 11, 13, 17>; int required_span_size(const typename Layout::mapping<E>& m) { return m.required_span_size(); } Prior to this commit the generated code (on -O2) is: 2a0: b9 01 00 00 00 mov ecx,0x1 2a5: 31 d2 xor edx,edx 2a7: 66 66 2e 0f 1f 84 00 data16 cs nop WORD PTR [rax+rax*1+0x0] 2ae: 00 00 00 00 2b2: 66 66 2e 0f 1f 84 00 data16 cs nop WORD PTR [rax+rax*1+0x0] 2b9: 00 00 00 00 2bd: 0f 1f 00 nop DWORD PTR [rax] 2c0: 48 8b 04 d5 00 00 00 mov rax,QWORD PTR [rdx*8+0x0] 2c7: 00 2c8: 48 83 f8 ff cmp rax,0xffffffffffffffff 2cc: 0f 84 00 00 00 00 je 2d2 <required_span_size_6d_static+0x32> 2d2: 83 e8 01 sub eax,0x1 2d5: 0f af 04 97 imul eax,DWORD PTR [rdi+rdx*4] 2d9: 48 83 c2 01 add rdx,0x1 2dd: 01 c1 add ecx,eax 2df: 48 83 fa 06 cmp rdx,0x6 2e3: 75 db jne 2c0 <required_span_size_6d_static+0x20> 2e5: 89 c8 mov eax,ecx 2e7: c3 ret which is a scalar loop, and notably includes the check 308: 48 83 f8 ff cmp rax,0xffffffffffffffff to assert that the static extent is indeed not -1. Note, that on -O3 the optimizer eliminates the comparison; and generates a sequence of scalar operations: lea, shl, add and mov. The aim of this commit is to eliminate this comparison also for -O2. 
With the optimization applied we get: 2e0: f3 0f 6f 0f movdqu xmm1,XMMWORD PTR [rdi] 2e4: 66 0f 6f 15 00 00 00 movdqa xmm2,XMMWORD PTR [rip+0x0] 2eb: 00 2ec: 8b 57 10 mov edx,DWORD PTR [rdi+0x10] 2ef: 66 0f 6f c1 movdqa xmm0,xmm1 2f3: 66 0f 73 d1 20 psrlq xmm1,0x20 2f8: 66 0f f4 c2 pmuludq xmm0,xmm2 2fc: 66 0f 73 d2 20 psrlq xmm2,0x20 301: 8d 14 52 lea edx,[rdx+rdx*2] 304: 66 0f f4 ca pmuludq xmm1,xmm2 308: 66 0f 70 c0 08 pshufd xmm0,xmm0,0x8 30d: 66 0f 70 c9 08 pshufd xmm1,xmm1,0x8 312: 66 0f 62 c1 punpckldq xmm0,xmm1 316: 66 0f 6f c8 movdqa xmm1,xmm0 31a: 66 0f 73 d9 08 psrldq xmm1,0x8 31f: 66 0f fe c1 paddd xmm0,xmm1 323: 66 0f 6f c8 movdqa xmm1,xmm0 327: 66 0f 73 d9 04 psrldq xmm1,0x4 32c: 66 0f fe c1 paddd xmm0,xmm1 330: 66 0f 7e c0 movd eax,xmm0 334: 8d 54 90 01 lea edx,[rax+rdx*4+0x1] 338: 8b 47 14 mov eax,DWORD PTR [rdi+0x14] 33b: c1 e0 04 shl eax,0x4 33e: 01 d0 add eax,edx 340: c3 ret Which shows eliminating the trivial comparison, unlocks a new set of optimizations, i.e. SIMD-vectorization. In particular, the loop has been vectorized by loading the first four constants from aligned memory; the first four strides from non-aligned memory, then computes the product and reduction. It interleaves the above with computing 1 + 12*S[4] + 16*S[5] (as scalar operations) and then finishes the reduction. A similar effect can be observed for fully dynamic extents. libstdc++-v3/ChangeLog: * include/std/mdspan (__mdspan::__all_static): New function. (__mdspan::_StaticExtents::_S_is_dyn): Inline and eliminate. (__mdspan::_ExtentsStorage::_S_is_dynamic): New method. (__mdspan::_ExtentsStorage::_M_extent): Use _S_is_dynamic. Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com> Signed-off-by: Luc Grosheintz <luc.grosheintz@gmail.com>
2025-08-21libstdc++: Improve nearly fully dynamic extents in mdspan.Luc Grosheintz1-2/+2
One previous commit optimized fully dynamic extents, and another refactored __size such that __fwd_prod is valid for __r = 0, ..., rank (exclusive). Therefore, by noticing that __rev_prod (and __fwd_prod) never accesses the first (or last) extent, one can avoid pre-computing partial products of static extents in those cases, if all other extents are dynamic. We check that the size of the reference object file decreases further and that the .rodata sections for __fwd_prod<dyn, ..., dyn, 11> and __rev_prod<3, dyn, ..., dyn> are absent. libstdc++-v3/ChangeLog: * include/std/mdspan (__fwd_prods): Relax condition for fully-dynamic extents to cover (dyn, ..., dyn, X). (__rev_partial_prods): Analogous for (X, dyn, ..., dyn). Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com> Signed-off-by: Luc Grosheintz <luc.grosheintz@gmail.com>
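A rough sketch of the relaxed condition, under the assumption that the forward product for r only multiplies E[0], ..., E[r-1] with r < rank. The helper names (dyn, dynamic_except_last, dynamic_except_first, static_fwd_prod) are hypothetical and are not the identifiers used in <mdspan>.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <limits>

// Hypothetical stand-in for std::dynamic_extent.
inline constexpr std::size_t dyn = std::numeric_limits<std::size_t>::max();

// True for the pattern (dyn, ..., dyn, X): only the last extent may be static.
template <std::size_t N>
constexpr bool dynamic_except_last(const std::array<std::size_t, N>& e) {
  for (std::size_t i = 0; i + 1 < N; ++i)
    if (e[i] != dyn) return false;
  return true;
}

// True for the pattern (X, dyn, ..., dyn): only the first extent may be static.
template <std::size_t N>
constexpr bool dynamic_except_first(const std::array<std::size_t, N>& e) {
  for (std::size_t i = 1; i < N; ++i)
    if (e[i] != dyn) return false;
  return true;
}

// Static part of the forward product: static extents among E[0..r).
template <std::size_t N>
constexpr std::size_t static_fwd_prod(const std::array<std::size_t, N>& e,
                                      std::size_t r) {
  std::size_t p = 1;
  for (std::size_t i = 0; i < r; ++i)
    if (e[i] != dyn) p *= e[i];
  return p;
}
```

For (dyn, dyn, 11) the static part is 1 for every r below the rank, so a table of partial products carries no information and need not be emitted.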
2025-08-21libstdc++: Improve fully dynamic extents in mdspan.Luc Grosheintz1-10/+53
In mdspan related code, for extents with no static extents, i.e. only dynamic extents, the following simplifications can be made: - The array of dynamic extents has size rank. - The two arrays dynamic-index and dynamic-index-inv become trivial, i.e. k[i] == i. - All elements of the arrays __{fwd,rev}_partial_prods are 1. This commit eliminates the arrays for dynamic-index, dynamic-index-inv and __{fwd,rev}_partial_prods. It also removes the indirection k[i] == i from the source code, which isn't as relevant because the optimizer is (often) capable of eliminating the indirection. To check that it's working we look at: using E2 = std::extents<int, dyn, dyn, dyn, dyn>; int stride_left_E2(const std::layout_left::mapping<E2>& m, size_t r) { return m.stride(r); } which generates the following 0000000000000190 <stride_left_E2>: 190: 48 c1 e6 02 shl rsi,0x2 194: 74 22 je 1b8 <stride_left_E2+0x28> 196: 48 01 fe add rsi,rdi 199: b8 01 00 00 00 mov eax,0x1 19e: 66 90 xchg ax,ax 1a0: 48 63 17 movsxd rdx,DWORD PTR [rdi] 1a3: 48 83 c7 04 add rdi,0x4 1a7: 48 0f af c2 imul rax,rdx 1ab: 48 39 fe cmp rsi,rdi 1ae: 75 f0 jne 1a0 <stride_left_E2+0x10> 1b0: c3 ret 1b1: 0f 1f 80 00 00 00 00 nop DWORD PTR [rax+0x0] 1b8: b8 01 00 00 00 mov eax,0x1 1bd: c3 ret We see that: - There's no code to load the partial product of static extents. - There's no indirection D[k[i]], it's just D[i] (as before). On a test file which computes both mapping::stride(r) and mapping::required_span_size, we check for static storage with objdump -h; we no longer see the NTTP _Extents or anything related to _StaticExtents, __fwd_partial_prods or __rev_partial_prods. We also check that the size of the reference object file (described three commits prior) is reduced by a few percent from 41.9kB to 39.4kB. libstdc++-v3/ChangeLog: * include/std/mdspan (__mdspan::__all_dynamic): New function. (__mdspan::_StaticExtents::_S_dynamic_index): Convert to method. (__mdspan::_StaticExtents::_S_dynamic_index_inv): Ditto.
(__mdspan::_StaticExtents): New specialization for fully dynamic extents. (__mdspan::__fwd_prod): New constexpr if branch to avoid instantiating __fwd_partial_prods. (__mdspan::__rev_prod): Ditto. Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com> Signed-off-by: Luc Grosheintz <luc.grosheintz@gmail.com>
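To illustrate the simplification, here is a small sketch in which the dynamic-index lookup collapses to the identity when all extents are dynamic. The names static_extents_sketch, dynamic_index, fwd_prod and dyn are hypothetical; the real code uses a class specialization and precomputed arrays, whereas this sketch uses `if constexpr` and a counting loop for brevity.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <limits>

// Hypothetical stand-in for std::dynamic_extent.
inline constexpr std::size_t dyn = std::numeric_limits<std::size_t>::max();

template <std::size_t... Exts>
struct static_extents_sketch {
  static constexpr std::size_t rank = sizeof...(Exts);
  static constexpr bool all_dynamic = ((Exts == dyn) && ...);
  static constexpr std::array<std::size_t, rank> exts{Exts...};

  // Number of dynamic extents among E[0..r).
  static constexpr std::size_t dynamic_index(std::size_t r) {
    if constexpr (all_dynamic) {
      return r;  // k[i] == i, no table and no loop needed
    } else {
      std::size_t k = 0;
      for (std::size_t i = 0; i < r; ++i)
        k += (exts[i] == dyn) ? 1 : 0;
      return k;
    }
  }
};

// Product of E[0..r), given the dynamic extents in D.
template <std::size_t... Exts>
constexpr long fwd_prod(const int* D, std::size_t r) {
  using S = static_extents_sketch<Exts...>;
  long ret = 1;
  if constexpr (!S::all_dynamic) {
    // Static contribution; skipped entirely for fully dynamic extents.
    for (std::size_t i = 0; i < r; ++i)
      if (S::exts[i] != dyn) ret *= static_cast<long>(S::exts[i]);
  }
  for (std::size_t k = 0; k < S::dynamic_index(r); ++k)
    ret *= D[k];
  return ret;
}
```

In the fully dynamic instantiation only the final loop over D survives, which matches the shape of the stride_left_E2 listing.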
2025-08-21libstdc++: Improve low-rank layout_{left,right}::stride.Luc Grosheintz1-6/+28
The methods layout_{left,right}::mapping::stride are defined as \prod_{i = 0}^r E[i] and \prod_{i = r+1}^n E[i], respectively. Each is computed as the product of a precomputed static product and the product of the required dynamic extents. Disassembly shows that even for low-rank extents, i.e. rank == 1 and rank == 2, with at least one dynamic extent, the generated code loads two values and then runs the loop over at most one element, e.g. for stride_left_d5 defined below the generated code is: 220: 48 8b 04 f5 00 00 00 mov rax,QWORD PTR [rsi*8+0x0] 227: 00 228: 31 d2 xor edx,edx 22a: 48 85 c0 test rax,rax 22d: 74 23 je 252 <stride_left_d5+0x32> 22f: 48 8b 0c f5 00 00 00 mov rcx,QWORD PTR [rsi*8+0x0] 236: 00 237: 48 c1 e1 02 shl rcx,0x2 23b: 74 13 je 250 <stride_left_d5+0x30> 23d: 48 01 f9 add rcx,rdi 240: 48 63 17 movsxd rdx,DWORD PTR [rdi] 243: 48 83 c7 04 add rdi,0x4 247: 48 0f af c2 imul rax,rdx 24b: 48 39 f9 cmp rcx,rdi 24e: 75 f0 jne 240 <stride_left_d5+0x20> 250: 89 c2 mov edx,eax 252: 89 d0 mov eax,edx 254: c3 ret If there are no dynamic extents, it simply loads the precomputed product of static extents. For rank == 1 the answer is the constant `1`; for rank == 2 it's either 1 or extents.extent(k), with k == 0 for layout_left and k == 1 for layout_right.
Consider: using Ed = std::extents<int, dyn>; int stride_left_d(const std::layout_left::mapping<Ed>& m, size_t r) { return m.stride(r); } using E3d = std::extents<int, 3, dyn>; int stride_left_3d(const std::layout_left::mapping<E3d>& m, size_t r) { return m.stride(r); } using Ed5 = std::extents<int, dyn, 5>; int stride_left_d5(const std::layout_left::mapping<Ed5>& m, size_t r) { return m.stride(r); } The optimized code for these three cases is: 0000000000000060 <stride_left_d>: 60: b8 01 00 00 00 mov eax,0x1 65: c3 ret 0000000000000090 <stride_left_3d>: 90: 48 83 fe 01 cmp rsi,0x1 94: 19 c0 sbb eax,eax 96: 83 e0 fe and eax,0xfffffffe 99: 83 c0 03 add eax,0x3 9c: c3 ret 00000000000000a0 <stride_left_d5>: a0: b8 01 00 00 00 mov eax,0x1 a5: 48 85 f6 test rsi,rsi a8: 74 02 je ac <stride_left_d5+0xc> aa: 8b 07 mov eax,DWORD PTR [rdi] ac: c3 ret For rank == 1 it simply returns 1 (as expected). For rank == 2, it either implements a branchless formula, or conditionally loads one value. In all cases involving a dynamic extent the new code appears to do clearly less work, both in terms of computation and loads. In cases not involving a dynamic extent, it replaces loading one value with a branchless sequence of four instructions. This commit also refactors __size to not use any of the precomputed arrays. This prevents instantiating __{fwd,rev}_partial_prods for low-rank extents. This results in a further size reduction of a reference object file (described two commits prior) by 9% from 46.0kB to 41.9kB. In a prior commit we optimized __size to produce better object code by precomputing the static products. This refactor enables the optimizer to generate the same optimized code. libstdc++-v3/ChangeLog: * include/std/mdspan (__mdspan::__fwd_prod): Optimize for rank <= 2. (__mdspan::__rev_prod): Ditto. (__mdspan::__size): Refactor to use a pre-computed product, not a partial product.
Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com> Signed-off-by: Luc Grosheintz <luc.grosheintz@gmail.com>
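The low-rank special cases described in this commit can be sketched like this. It is only an illustration under simplifying assumptions: the extents are passed as a plain std::array<int, Rank> rather than an extents object, and stride_left / stride_right are hypothetical free functions, not the library's __fwd_prod and __rev_prod.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// layout_left: stride(r) is the product of the extents before position r.
template <std::size_t Rank>
constexpr int stride_left([[maybe_unused]] const std::array<int, Rank>& e,
                          [[maybe_unused]] std::size_t r) {
  if constexpr (Rank == 1) {
    return 1;                  // the only stride is always 1
  } else if constexpr (Rank == 2) {
    return r == 0 ? 1 : e[0];  // either 1 or extent(0)
  } else {
    int p = 1;                 // general case: loop over E[0..r)
    for (std::size_t i = 0; i < r; ++i) p *= e[i];
    return p;
  }
}

// layout_right: stride(r) is the product of the extents after position r.
template <std::size_t Rank>
constexpr int stride_right([[maybe_unused]] const std::array<int, Rank>& e,
                           [[maybe_unused]] std::size_t r) {
  if constexpr (Rank == 1) {
    return 1;
  } else if constexpr (Rank == 2) {
    return r == 1 ? 1 : e[1];  // either 1 or extent(1)
  } else {
    int p = 1;                 // general case: loop over E[r+1..Rank)
    for (std::size_t i = r + 1; i < Rank; ++i) p *= e[i];
    return p;
  }
}
```

For rank 2 the result is a two-way select, which is what lets the compiler emit either the branchless cmp/sbb/and/add sequence or a single conditional load.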
2025-08-21libstdc++: Precompute products of static extents.Luc Grosheintz1-25/+52
Let E denote a multi-dimensional extent; n the rank of E; r = 0, ..., n; E[i] the i-th extent; and D[k] be the (possibly empty) array of dynamic extents. The two partial products for r = 0, ..., n: \prod_{i = 0}^r E[i] (fwd) and \prod_{i = r+1}^n E[i] (rev) can each be computed as the product of static and dynamic extents. The static fwd and rev products can be computed at compile time for all values of r. Three methods are directly affected by this optimization: layout_left::mapping::stride, layout_right::mapping::stride and mdspan::size. We'll check the generated code (-O2) for all three methods using generic, artificially high-dimensional extents. Consider a generic case: using Extents = std::extents<int, 3, 5, dyn, dyn, dyn, 7, dyn>; int stride_left(const std::layout_left::mapping<Extents>& m, size_t r) { return m.stride(r); } The code generated prior to this commit: 4f0: 66 0f 6f 05 00 00 00 movdqa xmm0,XMMWORD PTR [rip+0x0] # 4f8 4f7: 00 4f8: 48 83 c6 01 add rsi,0x1 4fc: 48 c7 44 24 e8 ff ff mov QWORD PTR [rsp-0x18],0xffffffffffffffff 503: ff ff 505: 48 8d 04 f5 00 00 00 lea rax,[rsi*8+0x0] 50c: 00 50d: 0f 29 44 24 b8 movaps XMMWORD PTR [rsp-0x48],xmm0 512: 66 0f 76 c0 pcmpeqd xmm0,xmm0 516: 0f 29 44 24 c8 movaps XMMWORD PTR [rsp-0x38],xmm0 51b: 66 0f 6f 05 00 00 00 movdqa xmm0,XMMWORD PTR [rip+0x0] # 523 522: 00 523: 0f 29 44 24 d8 movaps XMMWORD PTR [rsp-0x28],xmm0 528: 48 83 f8 38 cmp rax,0x38 52c: 74 72 je 5a0 <stride_right_E1+0xb0> 52e: 48 8d 54 04 b8 lea rdx,[rsp+rax*1-0x48] 533: 4c 8d 4c 24 f0 lea r9,[rsp-0x10] 538: b8 01 00 00 00 mov eax,0x1 53d: 0f 1f 00 nop DWORD PTR [rax] 540: 48 8b 0a mov rcx,QWORD PTR [rdx] 543: 49 89 c0 mov r8,rax 546: 4c 0f af c1 imul r8,rcx 54a: 48 83 f9 ff cmp rcx,0xffffffffffffffff 54e: 49 0f 45 c0 cmovne rax,r8 552: 48 83 c2 08 add rdx,0x8 556: 49 39 d1 cmp r9,rdx 559: 75 e5 jne 540 <stride_right_E1+0x50> 55b: 48 85 c0 test rax,rax 55e: 74 38 je 598 <stride_right_E1+0xa8> 560: 48 8b 14 f5 00 00 00 mov rdx,QWORD PTR
[rsi*8+0x0] 567: 00 568: 48 c1 e2 02 shl rdx,0x2 56c: 48 83 fa 10 cmp rdx,0x10 570: 74 1e je 590 <stride_right_E1+0xa0> 572: 48 8d 4f 10 lea rcx,[rdi+0x10] 576: 48 01 d7 add rdi,rdx 579: 0f 1f 80 00 00 00 00 nop DWORD PTR [rax+0x0] 580: 48 63 17 movsxd rdx,DWORD PTR [rdi] 583: 48 83 c7 04 add rdi,0x4 587: 48 0f af c2 imul rax,rdx 58b: 48 39 f9 cmp rcx,rdi 58e: 75 f0 jne 580 <stride_right_E1+0x90> 590: c3 ret 591: 0f 1f 80 00 00 00 00 nop DWORD PTR [rax+0x0] 598: c3 ret 599: 0f 1f 80 00 00 00 00 nop DWORD PTR [rax+0x0] 5a0: b8 01 00 00 00 mov eax,0x1 5a5: eb b9 jmp 560 <stride_right_E1+0x70> 5a7: 66 0f 1f 84 00 00 00 nop WORD PTR [rax+rax*1+0x0] 5ae: 00 00 which seems to be performing: preparatory_work(); ret = 1 for(i = 0; i < rank; ++i) tmp = ret * E[i] if E[i] != -1 ret = tmp for(i = 0; i < rank_dynamic; ++i) ret *= D[i] This commit reduces it down to: 270: 48 8b 04 f5 00 00 00 mov rax,QWORD PTR [rsi*8+0x0] 277: 00 278: 31 d2 xor edx,edx 27a: 48 85 c0 test rax,rax 27d: 74 33 je 2b2 <stride_right_E1+0x42> 27f: 48 8b 14 f5 00 00 00 mov rdx,QWORD PTR [rsi*8+0x0] 286: 00 287: 48 c1 e2 02 shl rdx,0x2 28b: 48 83 fa 10 cmp rdx,0x10 28f: 74 1f je 2b0 <stride_right_E1+0x40> 291: 48 8d 4f 10 lea rcx,[rdi+0x10] 295: 48 01 d7 add rdi,rdx 298: 0f 1f 84 00 00 00 00 nop DWORD PTR [rax+rax*1+0x0] 29f: 00 2a0: 48 63 17 movsxd rdx,DWORD PTR [rdi] 2a3: 48 83 c7 04 add rdi,0x4 2a7: 48 0f af c2 imul rax,rdx 2ab: 48 39 f9 cmp rcx,rdi 2ae: 75 f0 jne 2a0 <stride_right_E1+0x30> 2b0: 89 c2 mov edx,eax 2b2: 89 d0 mov eax,edx 2b4: c3 ret Loosely speaking this does the following: 1. Load the starting position k in the array of dynamic extents; and return if possible. 2. Load the partial product of static extents. 3. Computes the \prod_{i = k}^d D[i] where d is the number of dynamic extents in a loop. 
It shows that the span used for passing in the dynamic extents is completely eliminated; and the fact that the product always runs to the end of the array of dynamic extents is used by the compiler to eliminate one indirection to determine the end position in the array of dynamic extents. The analogous code is generated for layout_left. Next, consider using E2 = std::extents<int, 3, 5, dyn, dyn, 7, dyn, 11>; int size2(const std::mdspan<double, E2>& md) { return md.size(); } On the immediately preceding commit the generated code is 10: 66 0f 6f 05 00 00 00 movdqa xmm0,XMMWORD PTR [rip+0x0] # 18 17: 00 18: 49 89 f8 mov r8,rdi 1b: 48 8d 44 24 b8 lea rax,[rsp-0x48] 20: 48 c7 44 24 e8 0b 00 mov QWORD PTR [rsp-0x18],0xb 27: 00 00 29: 48 8d 7c 24 f0 lea rdi,[rsp-0x10] 2e: ba 01 00 00 00 mov edx,0x1 33: 0f 29 44 24 b8 movaps XMMWORD PTR [rsp-0x48],xmm0 38: 66 0f 76 c0 pcmpeqd xmm0,xmm0 3c: 0f 29 44 24 c8 movaps XMMWORD PTR [rsp-0x38],xmm0 41: 66 0f 6f 05 00 00 00 movdqa xmm0,XMMWORD PTR [rip+0x0] # 49 48: 00 49: 0f 29 44 24 d8 movaps XMMWORD PTR [rsp-0x28],xmm0 4e: 66 66 2e 0f 1f 84 00 data16 cs nop WORD PTR [rax+rax*1+0x0] 55: 00 00 00 00 59: 0f 1f 80 00 00 00 00 nop DWORD PTR [rax+0x0] 60: 48 8b 08 mov rcx,QWORD PTR [rax] 63: 48 89 d6 mov rsi,rdx 66: 48 0f af f1 imul rsi,rcx 6a: 48 83 f9 ff cmp rcx,0xffffffffffffffff 6e: 48 0f 45 d6 cmovne rdx,rsi 72: 48 83 c0 08 add rax,0x8 76: 48 39 c7 cmp rdi,rax 79: 75 e5 jne 60 <size2+0x50> 7b: 48 85 d2 test rdx,rdx 7e: 74 18 je 98 <size2+0x88> 80: 49 63 00 movsxd rax,DWORD PTR [r8] 83: 49 63 48 04 movsxd rcx,DWORD PTR [r8+0x4] 87: 48 0f af c1 imul rax,rcx 8b: 41 0f af 40 08 imul eax,DWORD PTR [r8+0x8] 90: 0f af c2 imul eax,edx 93: c3 ret 94: 0f 1f 40 00 nop DWORD PTR [rax+0x0] 98: 31 c0 xor eax,eax 9a: c3 ret which is needlessly long.
The current commit reduces it down to: 10: 48 63 07 movsxd rax,DWORD PTR [rdi] 13: 48 63 57 04 movsxd rdx,DWORD PTR [rdi+0x4] 17: 48 0f af c2 imul rax,rdx 1b: 0f af 47 08 imul eax,DWORD PTR [rdi+0x8] 1f: 69 c0 83 04 00 00 imul eax,eax,0x483 25: c3 ret This simply computes the product: D[0] * D[1] * D[2] * const where const is the product of all static extents (3 * 5 * 7 * 11 = 1155 = 0x483). This means the loop to compute the product of dynamic extents has been fully unrolled and all constants are perfectly precomputed. The size of the object file described in the previous commit reduces by 17% from 55.8kB to 46.0kB. libstdc++-v3/ChangeLog: * include/std/mdspan (__mdspan::__static_prod): New function. (__mdspan::__fwd_partial_prods): Constexpr array of partial forward products. (__mdspan::__rev_partial_prods): Same for reverse partial products. (__mdspan::__static_extents_prod): Delete function. (__mdspan::__extents_prod): Renamed from __exts_prod and refactored. (__mdspan::__fwd_prod): Compute as the product of the pre-computed static product and the product of dynamic extents. (__mdspan::__rev_prod): Ditto. Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com> Signed-off-by: Luc Grosheintz <luc.grosheintz@gmail.com>
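A minimal sketch of the precomputation, assuming the forward product for r covers E[0..r). The names tables_sketch, fwd_partial_prods, dynamic_index, fwd_prod and dyn are hypothetical and only loosely mirror the ChangeLog entries; the tables are built once per extents pack at compile time, and the runtime part multiplies in the required dynamic extents.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <limits>

// Hypothetical stand-in for std::dynamic_extent.
inline constexpr std::size_t dyn = std::numeric_limits<std::size_t>::max();

// out[r] = product of the static extents among E[0..r), for r = 0..N.
template <std::size_t N>
constexpr std::array<std::size_t, N + 1>
fwd_partial_prods(std::array<std::size_t, N> exts) {
  std::array<std::size_t, N + 1> out{};
  std::size_t p = 1;
  for (std::size_t r = 0; r <= N; ++r) {
    out[r] = p;
    if (r < N && exts[r] != dyn) p *= exts[r];
  }
  return out;
}

// out[r] = number of dynamic extents among E[0..r), for r = 0..N.
template <std::size_t N>
constexpr std::array<std::size_t, N + 1>
dynamic_index(std::array<std::size_t, N> exts) {
  std::array<std::size_t, N + 1> out{};
  std::size_t k = 0;
  for (std::size_t r = 0; r <= N; ++r) {
    out[r] = k;
    if (r < N && exts[r] == dyn) ++k;
  }
  return out;
}

// Compile-time tables for one pack of extents.
template <std::size_t... Exts>
struct tables_sketch {
  static constexpr std::array<std::size_t, sizeof...(Exts)> exts{Exts...};
  static constexpr auto fwd = fwd_partial_prods(exts);
  static constexpr auto idx = dynamic_index(exts);
};

// Product of E[0..r): one table load times a loop over dynamic extents.
template <std::size_t... Exts>
constexpr std::size_t fwd_prod(const int* D, std::size_t r) {
  using T = tables_sketch<Exts...>;
  std::size_t ret = T::fwd[r];
  for (std::size_t k = 0; k < T::idx[r]; ++k)
    ret *= static_cast<std::size_t>(D[k]);
  return ret;
}

// The static part of extents<int, 3, 5, dyn, dyn, 7, dyn, 11> is 0x483.
static_assert(tables_sketch<3, 5, dyn, dyn, 7, dyn, 11>::fwd[7] == 1155);
```

Taking r equal to the rank gives the full size, i.e. the constant 1155 times the three dynamic extents, matching the single imul by 0x483 in the listing.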
2025-08-21libstdc++: Reduce template instantiations in <mdspan>.Luc Grosheintz1-19/+37
In mdspan related code involving static extents, often the IndexType is part of the template parameters, even though it's not needed. This commit extracts the parts of _ExtentsStorage not related to IndexType into a separate class _StaticExtents. It also prefers passing the array of static extents, instead of the whole extents object where possible. The size of an object file compiled with -O2 that instantiates Layout::mapping<extents<IndexType, Indices...>>::stride and Layout::mapping<extents<IndexType, Indices...>>::required_span_size for the product of - eight IndexTypes, - three Layouts, - nine choices of Indices... decreases by 19% from 69.2kB to 55.8kB. libstdc++-v3/ChangeLog: * include/std/mdspan (__mdspan::_StaticExtents): Extract non IndexType related code from _ExtentsStorage. (__mdspan::_ExtentsStorage): Use _StaticExtents. (__mdspan::__static_extents): Return reference to NTTP of _StaticExtents. (__mdspan::__contains_zero): New overload. (__mdspan::__exts_prod, __mdspan::__static_quotient): Use span to avoid copying __sta_exts. Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com> Signed-off-by: Luc Grosheintz <luc.grosheintz@gmail.com>
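The refactoring pattern can be sketched as below. The class names static_extents_sketch and extents_storage_sketch are hypothetical simplifications of _StaticExtents and _ExtentsStorage: everything that depends only on the pack of extents goes into a class without IndexType, so that storages differing only in IndexType share one instantiation of it.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <limits>
#include <type_traits>

// Hypothetical stand-in for std::dynamic_extent.
inline constexpr std::size_t dyn = std::numeric_limits<std::size_t>::max();

// Depends only on the extents pack, not on IndexType.
template <std::size_t... Exts>
struct static_extents_sketch {
  static constexpr std::size_t rank = sizeof...(Exts);
  static constexpr std::size_t rank_dynamic =
      (std::size_t(Exts == dyn) + ... + std::size_t(0));
  static constexpr std::array<std::size_t, rank> exts{Exts...};
};

// Only the part that really needs IndexType stays here.
template <typename IndexType, std::size_t... Exts>
struct extents_storage_sketch {
  using static_type = static_extents_sketch<Exts...>;
  std::array<IndexType, static_type::rank_dynamic> dyn_exts{};
};
```

With eight index types and nine extent packs, the static part would then be instantiated nine times rather than seventy-two, which is one plausible source of the reported size reduction.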
2025-08-21Merge BB and loop path in vect_analyze_stmtRichard Biener4-64/+37
We now have common patterns for most of the vectorizable_* calls, so merge them. This also avoids calling vectorizable_early_exit for BB vect and clarifies signatures of it and vectorizable_phi. * tree-vectorizer.h (vectorizable_phi): Take bb_vec_info. (vectorizable_early_exit): Take loop_vec_info. * tree-vect-loop.cc (vectorizable_phi): Adjust. * tree-vect-slp.cc (vect_slp_analyze_operations): Likewise. (vectorize_slp_instance_root_stmt): Likewise. * tree-vect-stmts.cc (vectorizable_early_exit): Likewise. (vect_transform_stmt): Likewise. (vect_analyze_stmt): Merge the sequences of vectorizable_* where common.
2025-08-21MAINTAINERS: Update my email address and stand down as AArch64 maintainerRichard Sandiford1-2/+1
Today is my last working day at Arm, so this patch switches my MAINTAINERS entries to my personal email address. (It turns out that I never updated some of the later entries...oops) In order to avoid setting false expectations, and to try to avoid getting in the way, I'm also standing down as an AArch64 maintainer, effective EOB today. I might still end up reviewing the odd AArch64 patch under global reviewership though, depending on how things go :) ChangeLog: * MAINTAINERS: Update my email address and stand down as AArch64 maintainer.
2025-08-21Fortran: gfortran PDT component access [PR84122, PR85942]Paul Thomas4-2/+193
2025-08-21 Paul Thomas <pault@gcc.gnu.org> gcc/fortran PR fortran/84122 * parse.cc (parse_derived): PDT type parameters are not allowed an explicit access specification and must appear before a PRIVATE statement. If a PRIVATE statement is seen, mark all the other components as PRIVATE. PR fortran/85942 * simplify.cc (get_kind): Convert a PDT KIND component into a specification expression using the default initializer. gcc/testsuite/ PR fortran/84122 * gfortran.dg/pdt_38.f03: New test. PR fortran/85942 * gfortran.dg/pdt_39.f03: New test.
2025-08-20c++: pointer to auto member function [PR120757]Jason Merrill2-2/+26
Here r13-1210 correctly changed &A<int>::foo to not be considered type-dependent, but tsubst_expr of the OFFSET_REF got confused trying to tsubst a type that involved auto. Fixed by getting the type from the member rather than tsubst. PR c++/120757 gcc/cp/ChangeLog: * pt.cc (tsubst_expr) [OFFSET_REF]: Don't tsubst the type. gcc/testsuite/ChangeLog: * g++.dg/cpp1y/auto-fn66.C: New test.
2025-08-21Daily bump.GCC Administrator7-1/+223
2025-08-20c++: lambda capture and shadowing [PR121553]Marek Polacek5-7/+23
P2036 says that this: [x=1]{ int x; } should be rejected, but with my P2036 patch we started giving an error for the attached testcase as well, breaking Dolphin. So let's keep the error only for init-captures. PR c++/121553 gcc/cp/ChangeLog: * name-lookup.cc (check_local_shadow): Check !is_normal_capture_proxy. gcc/testsuite/ChangeLog: * g++.dg/warn/Wshadow-19.C: Revert P2036 changes. * g++.dg/warn/Wshadow-6.C: Likewise. * g++.dg/warn/Wshadow-20.C: New test. * g++.dg/warn/Wshadow-21.C: New test. Reviewed-by: Jason Merrill <jason@redhat.com>
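As a small illustration of the distinction, the following sketch shows a simple (non-init) capture that is shadowed by a local variable in the lambda body. This is not the testcase from the PR; the function name is hypothetical, and the expectation that it compiles (possibly with a -Wshadow warning) is based on the behaviour this commit describes restoring. The init-capture form is left in a comment because it is the case that is meant to be rejected.

```cpp
#include <cassert>

// Hypothetical example; not taken from the GCC testsuite.
inline int shadow_after_simple_capture() {
  int x = 1;
  // Simple capture: the body may still declare its own local named x.
  auto f = [x] {
    int y = x;      // refers to the captured x
    int x = y + 1;  // local x shadows the capture from here on
    return x;
  };
  // By contrast, P2036 says the init-capture form should be rejected:
  //   [x = 1] { int x; }
  return f();
}
```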
2025-08-20Regenerate common.opt.urls for -fdiagnostics-show-contextQing Zhao1-0/+6
When -fdiagnostics-show-context[=DEPTH] was added, it was documented, but common.opt.urls wasn't regenerated. gcc/ChangeLog: * common.opt.urls: Regenerate.