2024-04-16  optimize Zicond conditional select cases.  (Fei Gao, 2 files changed, -1/+17)

When one of the two input operands is 0, ADD and IOR are functionally
equivalent. ADD is slightly preferred over IOR because ADD has a higher
likelihood of being implemented as a compressed instruction: C.ADD uses
the CR format with any of the 32 RVI registers available, while C.OR
uses the CA format, limited to just 8 of them.

Conditional select, if-zero case: rd = (rc == 0) ? rs1 : rs2

before patch:

        czero.nez rd, rs1, rc
        czero.eqz rtmp, rs2, rc
        or rd, rd, rtmp

after patch:

        czero.eqz rd, rs1, rc
        czero.nez rtmp, rs2, rc
        add rd, rd, rtmp

The same trick applies to the conditional select, if-non-zero case:
rd = (rc != 0) ? rs1 : rs2

gcc/ChangeLog:

        * config/riscv/riscv.cc (riscv_expand_conditional_move): Replace
        OR with ADD when expanding zicond if possible.

gcc/testsuite/ChangeLog:

        * gcc.target/riscv/zicond-prefer-add-to-or.c: New test.
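For reference, a minimal C function of the shape this expansion targets
(an illustrative sketch, not taken from the patch or its testsuite):

        /* Compile with -march=rv64gc_zicond (illustrative).  Exactly one
           of the two czero results is zero, so the final combine can use
           add instead of or, and add is more likely to compress to c.add.  */
        long
        csel_if_zero (long rc, long rs1, long rs2)
        {
          return rc == 0 ? rs1 : rs2;
        }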
2024-04-16  [strub] improve handling of indirected volatile parms [PR112938]  (Alexandre Oliva, 2 files changed, -0/+19)

The earlier patch for PR112938 arranged for volatile parms to be made
indirect in internal strub wrapped bodies.

The first problem that remained, more evident, was that the indirected
parameter remained volatile, despite the indirection, but it wasn't
regimplified, so indirecting it was malformed gimple.

Regimplifying turned out not to be needed. The best course of action was
to drop the volatility from the by-reference parm, that was being
unexpectedly inherited from the original volatile parm.

That exposed another problem: the dereferences would then lose their
volatile status, so we had to bring volatile back to them.

for gcc/ChangeLog

        PR middle-end/112938
        * ipa-strub.cc (pass_ipa_strub::execute): Drop volatility from
        indirected parm.
        (maybe_make_indirect): Restore volatility in dereferences.

for gcc/testsuite/ChangeLog

        PR middle-end/112938
        * g++.dg/strub-internal-pr112938.cc: New.
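A sketch of the kind of function involved (hypothetical; the actual PR
testcase may differ, and the attribute spelling below assumes the
documented strub ("internal") mode):

        // Under internal strub, the wrapper passes 'x' to the wrapped
        // body by reference; the parm itself must drop its volatility
        // while dereferences of it stay volatile.
        int __attribute__ ((strub ("internal")))
        f (volatile int x)
        {
          return x + 1;
        }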
2024-04-16  LoongArch: Add indexes for some compilation options.  (Lulu Cheng, 12 files changed, -13/+23)

gcc/ChangeLog:

        * config/loongarch/loongarch.opt.urls: Regenerate.
        * config/mn10300/mn10300.opt.urls: Likewise.
        * config/msp430/msp430.opt.urls: Likewise.
        * config/nds32/nds32-elf.opt.urls: Likewise.
        * config/nds32/nds32-linux.opt.urls: Likewise.
        * config/nds32/nds32.opt.urls: Likewise.
        * config/pru/pru.opt.urls: Likewise.
        * config/riscv/riscv.opt.urls: Likewise.
        * config/rx/rx.opt.urls: Likewise.
        * config/sh/sh.opt.urls: Likewise.
        * config/sparc/sparc.opt.urls: Likewise.
        * doc/invoke.texi: Add indexes for some compilation options.
2024-04-16  Daily bump.  (GCC Administrator, 6 files changed, -1/+119)
2024-04-15  AVR: Add 8 more avrxmega3 MCUs.  (Georg-Johann Lay, 2 files changed, -1/+9)

gcc/
        * config/avr/avr-mcus.def: Add: avr16du14, avr16du20, avr16du28,
        avr16du32, avr32du14, avr32du20, avr32du28, avr32du32.
        * doc/avr-mmcu.texi: Rebuild.
2024-04-15  ada: Add documentation for Exceptional_Cases  (Piotr Trojanek, 4 files changed, -807/+876)

Add a minimal description for pragma and aspect Exceptional_Cases, based
on the similarly minimal descriptions of other SPARK contracts.

gcc/ada/
        * doc/gnat_rm/implementation_defined_aspects.rst
        (Exceptional_Cases): Add description for aspect.
        * doc/gnat_rm/implementation_defined_pragmas.rst
        (Exceptional_Cases): Add description for pragma.
        * gnat_rm.texi: Regenerate.
        * gnat_ugn.texi: Regenerate.
2024-04-15  Guard longjmp in test to not inf loop [PR114720]  (Jørgen Kvalsvik, 1 file changed, -1/+13)

Guard the longjmp to not infinitely loop. The longjmp (jump) function is
called unconditionally to make test flow simpler, but the jump
destination would return to a point in main that would call longjmp
again. The longjmp is really there to exercise the then-branch of
setjmp, to verify coverage is accurately counted in the presence of
complex edges.

        PR gcov-profile/114720

gcc/testsuite/ChangeLog:

        * gcc.misc-tests/gcov-22.c: Guard longjmp to not loop.
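A sketch of the guarding pattern (illustrative; gcov-22.c itself is more
involved):

        #include <setjmp.h>

        static jmp_buf buf;
        static int jumped = 0;

        static void jump (void)
        {
          /* Guard: without it, control returns to the setjmp site in
             main, which calls jump () again, looping forever.  */
          if (!jumped)
            {
              jumped = 1;
              longjmp (buf, 1);
            }
        }

        int main (void)
        {
          setjmp (buf);   /* both the first pass and the longjmp land here */
          jump ();        /* called unconditionally to keep the flow simple */
          return 0;
        }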
2024-04-15  RISC-V: Add VLS to mask vec_extract [PR114668].  (Robin Dapp, 2 files changed, -2/+37)

This adds the missing VLS modes to the mask extract expanders.

gcc/ChangeLog:

        PR target/114668
        * config/riscv/autovec.md: Add VLS.

gcc/testsuite/ChangeLog:

        * gcc.target/riscv/rvv/autovec/pr114668.c: New test.
2024-04-15  gcov-profile/114715 - missing coverage for switch  (Richard Biener, 2 files changed, -0/+31)

The following avoids missing coverage for the line of a switch
statement, which happens when gimplification emits a BIND_EXPR wrapping
the switch, as that prevents us from setting locations on the containing
statements via annotate_all_with_location. Instead set the location of
the GIMPLE switch directly.

        PR gcov-profile/114715
        * gimplify.cc (gimplify_switch_expr): Set the location of the
        GIMPLE switch.
        * gcc.misc-tests/gcov-24.c: New testcase.
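A sketch of the kind of code that exposed it (illustrative; the actual
gcov-24.c testcase may differ):

        int classify (int x)
        {
          switch (x)   /* this line used to get no coverage count when the
                          gimplifier wrapped the switch in a BIND_EXPR */
            {
            case 0:
              return 0;
            default:
              return 1;
            }
        }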
2024-04-15  x86: Allow TImode offsettable memory only with 8-bit constant  (H.J. Lu, 5 files changed, -17/+50)

The x86 instruction size limit is 15 bytes. If an NDD instruction has a
segment prefix byte, a 4-byte opcode prefix, a MODRM byte, a SIB byte, a
4-byte displacement and a 4-byte immediate, adding an address size
prefix will exceed the size limit. Change TImode ADD, AND, OR and XOR to
allow offsettable memory only with an 8-bit signed integer constant,
which is encoded with a 1-byte immediate, if the address size prefix is
used.

gcc/
        PR target/114696
        * config/i386/i386.md (isa): Add apx_ndd_64.
        (enabled): Likewise.
        (*add<dwi>3_doubleword): Change rjO to r,ro,jO with 8-bit signed
        integer constant and enable jO only for apx_ndd_64.
        (*add<dwi>3_doubleword_cc_overflow_1): Likewise.
        (*and<dwi>3_doubleword): Likewise.
        (*<code><dwi>3_doubleword): Likewise.

gcc/testsuite/
        PR target/114696
        * gcc.target/i386/apx-ndd-x32-2a.c: New test.
        * gcc.target/i386/apx-ndd-x32-2b.c: Likewise.
        * gcc.target/i386/apx-ndd-x32-2c.c: Likewise.
        * gcc.target/i386/apx-ndd-x32-2d.c: Likewise.
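A sketch of the affected shape of code (illustrative; assumes APX NDD
codegen on x32, where the address-size prefix comes into play):

        /* A doubleword (TImode) operation on memory: with a full 4-byte
           immediate plus the address-size prefix the encoding could
           exceed 15 bytes, so only 8-bit immediates like this one may
           keep the offsettable memory operand.  */
        __int128 add_small (__int128 *p)
        {
          return *p + 100;   /* 100 fits in a signed 8-bit immediate */
        }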
2024-04-15  middle-end: adjust loop upper bounds when peeling for gaps and early break [PR114403].  (Tamar Christina, 3 files changed, -8/+124)

This fixes a bug with the interaction between peeling for gaps and early
break.

Before I go further, I'll first explain how I understand this to work
for loops with a single exit.

When peeling for gaps we peel N < VF iterations to scalar. This happens
by removing N iterations from the calculation of niters such that
vect_iters * VF == niters is always false. In other words, when we exit
the vector loop we always fall to the scalar loop. The loop bounds
adjustment guarantees this.

Because of this we potentially execute a vector loop iteration less.
That is, if you're at the boundary condition where niters % VF, by
peeling one or more scalar iterations the vector loop executes one less.
This is accounted for by the adjustments in vect_transform_loop. This
adjustment happens differently based on whether the vector loop can be
partial or not.

Peeling for gaps sets the bias to 0 and then:

when not partial: we take the floor of (scalar_upper_bound / VF) - 1 to
get the vector latch iteration count.

when loop is partial: for a single exit this means the loop is masked;
we take the ceil to account for the fact that the loop can handle the
final partial iteration using masking.

Note that there's no difference between ceil and floor on the boundary
condition. There is a difference however when you're slightly above it,
i.e. if scalar iterates 14 times and VF = 4 and we peel 1 iteration for
gaps.

The partial loop does ((13 + 0) / 4) - 1 == 2 vector iterations, and in
effect the partial iteration is ignored and it's done as scalar. This is
fine because the niters modification has capped the vector iteration at
2. So that when we reduce the induction values you end up entering the
scalar code with ind_var.2 = ind_var.1 + 2 * VF.

Now let's look at early breaks. To make it easier I'll focus on the
specific testcase:

        char buffer[64];

        __attribute__ ((noipa))
        buff_t *copy (buff_t *first, buff_t *last)
        {
          char *buffer_ptr = buffer;
          char *const buffer_end = &buffer[SZ-1];
          int store_size = sizeof(first->Val);
          while (first != last && (buffer_ptr + store_size) <= buffer_end)
            {
              const char *value_data = (const char *)(&first->Val);
              __builtin_memcpy(buffer_ptr, value_data, store_size);
              buffer_ptr += store_size;
              ++first;
            }

          if (first == last)
            return 0;

          return first;
        }

Here the first, early exit is on the condition:

        (buffer_ptr + store_size) <= buffer_end

and the main exit is on condition:

        first != last

This is important, as this bug only manifests itself when the first exit
has a known constant iteration count that's lower than the latch exit
count.

Because buffer holds 64 bytes, and VF = 4, unroll = 2, we end up
processing 16 bytes per iteration. So the exit has a known bounds of
8 + 1. The vectorizer correctly analyzes this:

        Statement (exit)if (ivtmp_21 != 0)
         is executed at most 8 (bounded by 8) + 1 times in loop 1.

and as a consequence the IV is bound by 9:

        # vect_vec_iv_.14_117 = PHI <_118(9), { 9, 8, 7, 6 }(20)>
        ...
        vect_ivtmp_21.16_124 = vect_vec_iv_.14_117 + { 18446744073709551615, 18446744073709551615, 18446744073709551615, 18446744073709551615 };
        mask_patt_22.17_126 = vect_ivtmp_21.16_124 != { 0, 0, 0, 0 };
        if (mask_patt_22.17_126 == { -1, -1, -1, -1 })
          goto <bb 3>; [88.89%]
        else
          goto <bb 30>; [11.11%]

The important bits are these: in this example the value of
last - first = 416. The calculated vector iteration count is:

        x = (((ptr2 - ptr1) - 16) / 16) + 1 = 27

The bounds generated, adjusting for gaps:

        x == (((x - 1) >> 2) << 2)

which means we'll always fall through to the scalar code, as intended.

Here are two key things to note:

1. In this loop, the early exit will always be the one taken. When it's
   taken we enter the scalar loop with the correct induction value to
   apply the gap peeling.

2. If the main exit is taken, the induction values assume you've
   finished all vector iterations, i.e. they assume you have completed
   24 iterations, as we treat the main exit the same for normal loop
   vect and early break when not PEELED. This means the induction value
   is adjusted to ind_var.2 = ind_var.1 + 24 * VF;

So what's going wrong? The vectorizer's codegen is correct and
efficient; however, when we adjust the upper bounds, that code knows
that the loop's upper bound is based on the early exit, i.e. 8 latch
iterations, or in other words, it thinks the loop iterates once. This is
incorrect, as the vector loop iterates twice: it has set up the
induction value such that it exits at the early exit, so it in effect
iterates 2.5x times.

Because the upper bound is incorrect, when we unroll it now exits from
the main exit, which uses the incorrect induction value.

So there are three ways to fix this:

1. If we take the position that the main exit should support both
   premature exits and final exits, then
   vect_update_ivs_after_vectorizer needs to be skipped for this case,
   and vectorizable_induction updated with a third case where we reduce
   with a LAST reduction based on the IVs instead of assuming you're at
   the end of the vector loop.

   I don't like this approach. I don't think we should add a third
   induction style to cover up an issue introduced by unrolling. It
   makes the code harder to follow and makes main exits harder to
   reason about.

2. We could say that vec_init_loop_exit_info should pick the exit which
   has the smallest known iteration count. This would turn this case
   into a PEELED case and the induction values would be correct, as
   we'd always recalculate them from a reduction. This is suboptimal
   though, as the reason we pick the latch exit as the IV one is to
   prevent having to rotate the loop. This results in more efficient
   code for what we assume is the common case, i.e. the main exit.

3. In PR113734 we've established that for vectorization of early breaks
   we must always treat the loop as partial. Here "partial" means that
   we have enough vector elements to start the iteration, but we may
   take an early exit and so never reach the latch/main exit.

   This requirement is overwritten by the peeling-for-gaps adjustment
   of the upper bound. I believe the bug is simply that this shouldn't
   be done. The adjustment here is to indicate that the main exit
   always leads to the scalar loop when peeling for gaps. But this
   invariant is already always true for all early exits. Remember that
   early exits restart the scalar loop at the start of the vector
   iteration, so the induction values will start it where we want to do
   the gaps peeling.

I think option 3 is the correct fix, and also one that doesn't degrade
code quality.

gcc/ChangeLog:

        PR tree-optimization/114403
        * tree-vect-loop.cc (vect_transform_loop): Adjust upper bounds
        for when peeling for gaps and early break.

gcc/testsuite/ChangeLog:

        PR tree-optimization/114403
        * gcc.dg/vect/vect-early-break_124-pr114403.c: New test.
        * gcc.dg/vect/vect-early-break_125-pr114403.c: New test.
2024-04-15  Inline 'gcc/rust/Make-lang.in:RUST_LIBDEPS' into single user  (Thomas Schwinge, 1 file changed, -3/+1)

gcc/rust/
        * Make-lang.in (RUST_LIBDEPS): Inline into single user.
2024-04-15  Add 'gcc/rust/Make-lang.in:LIBPROC_MACRO_INTERNAL'  (Thomas Schwinge, 1 file changed, -2/+4)

... to avoid verbatim repetition.

gcc/rust/
        * Make-lang.in (LIBPROC_MACRO_INTERNAL): New.
        (RUST_LIBDEPS, crab1$(exeext)): Use it.
2024-04-15  Inline 'gcc/rust/Make-lang.in:RUST_LDFLAGS' into single user  (Thomas Schwinge, 1 file changed, -2/+1)

gcc/rust/
        * Make-lang.in (RUST_LDFLAGS): Inline into single user.
2024-04-15  Remove 'libgrust/libproc_macro_internal' from 'gcc/rust/Make-lang.in:RUST_LDFLAGS'  (Thomas Schwinge, 1 file changed, -1/+1)

This isn't necessary, as the full path to 'libproc_macro_internal.a' is
specified elsewhere.

gcc/rust/
        * Make-lang.in (RUST_LDFLAGS): Remove
        'libgrust/libproc_macro_internal'.
2024-04-15  testsuite: i386: Restrict gcc.target/i386/fhardened-1.c etc. to Linux/GNU  (Rainer Orth, 2 files changed, -0/+2)

The new gcc.target/i386/fhardened-1.c etc. tests FAIL on Solaris/x86 and
Darwin/x86:

        FAIL: gcc.target/i386/fhardened-1.c (test for excess errors)
        FAIL: gcc.target/i386/fhardened-2.c (test for excess errors)

        Excess errors:
        cc1: warning: '-fhardened' not supported for this target

Support for -fhardened is restricted to HAVE_FHARDENED_SUPPORT in
toplev.cc (process_options), which again is only defined for linux*|gnu*
targets in gcc/configure.ac. Accordingly, this patch restricts the tests
to those two, as is already done in gcc.target/i386/cf_check-6.c.

Tested on i386-pc-solaris2.11 and x86_64-pc-linux-gnu.

2024-04-15  Rainer Orth  <ro@CeBiTec.Uni-Bielefeld.DE>

gcc/testsuite:
        * gcc.target/i386/fhardened-1.c: Restrict to Linux/GNU.
        * gcc.target/i386/fhardened-2.c: Likewise.
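The usual DejaGnu way to express such a target restriction (a sketch;
the exact directive in the committed tests may differ):

        /* { dg-do compile { target *-*-linux* *-*-gnu* } } */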
2024-04-15  attribs: Don't crash on NULL TREE_TYPE in diag_attr_exclusions [PR114634]  (Jakub Jelinek, 2 files changed, -1/+14)

The enumerator still doesn't have TREE_TYPE set, but
diag_attr_exclusions assumes that all decls must have types. I think it
is better, in something as unimportant as diag_attr_exclusions, to be
more robust: if there is no type, it can just diagnose exclusions on the
DECL_ATTRIBUTES, like for types it only diagnoses them on
TYPE_ATTRIBUTES.

2024-04-15  Jakub Jelinek  <jakub@redhat.com>

        PR c++/114634
        * attribs.cc (diag_attr_exclusions): Set attrs[1] to NULL_TREE
        for decls with NULL TREE_TYPE.

        * g++.dg/ext/attrib68.C: New test.
2024-04-15  c++: Only emit exported GMF usings [PR114600]  (Nathaniel Shead, 2 files changed, -1/+15)

A typo in r14-6978 made us emit too many things. This ensures that we
don't emit using-declarations from the GMF that we don't need to.

        PR c++/114600

gcc/cp/ChangeLog:

        * module.cc (depset::hash::add_binding_entity): Require both
        WMB_Using and WMB_Export for GMF entities.

gcc/testsuite/ChangeLog:

        * g++.dg/modules/using-14.C: New test.

Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
Co-authored-by: Patrick Palka <ppalka@redhat.com>
2024-04-15  Daily bump.  (GCC Administrator, 3 files changed, -1/+12)
2024-04-14  c++: Setup aliases imported from modules [PR106820]  (Nathaniel Shead, 3 files changed, -0/+22)

I wonder if more generally we need to be doing more work when importing
definitions from header units, especially to handle all the work that
'make_rtl_for_nonlocal_decl' and 'rest_of_decl_compilation' would have
been performing. But this patch fixes at least one missing step.

        PR c++/106820

gcc/cp/ChangeLog:

        * module.cc (trees_in::decl_value): Assemble alias when needed.

gcc/testsuite/ChangeLog:

        * g++.dg/modules/pr106820_a.H: New test.
        * g++.dg/modules/pr106820_b.C: New test.

Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
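A sketch of the shape of header-unit content at issue (hypothetical;
the actual pr106820 testcases may differ):

        // pr106820-style header unit (illustrative)
        extern "C" int impl () { return 42; }
        extern "C" int api () __attribute__ ((alias ("impl")));
        // When this header unit is imported, the alias must still be
        // assembled in the importing translation unit.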
2024-04-14  Daily bump.  (GCC Administrator, 3 files changed, -1/+52)
2024-04-13  Regenerate c.opt.urls  (Mark Wielaard, 1 file changed, -0/+3)

Fixes: df7bfdb7dbf2 ("c++: reference cast, conversion fn [PR113141]")

A new warning option -Wcast-user-defined was added to c.opt and
documented in doc/invoke.texi, but c.opt.urls wasn't regenerated.

gcc/c-family/ChangeLog:

        * c.opt.urls: Regenerate.
2024-04-13  c++/modules: make bits_in/out move-constructible  (Patrick Palka, 1 file changed, -0/+2)

gcc/cp/ChangeLog:

        * module.cc (struct bytes_in::bits_in): Define defaulted move
        ctor.
        (struct bytes_out::bits_out): Likewise.
2024-04-13  c++/modules: optimize tree flag streaming  (Patrick Palka, 1 file changed, -193/+262)

One would expect consecutive calls to bytes_in/out::b for streaming
adjacent bits, as is done for tree flag streaming, to at least be
optimized by the compiler into individual bit operations using
statically known bit positions (and ideally combined into larger sized
reads/writes). Unfortunately this doesn't happen because the compiler
has trouble tracking the values of this->bit_pos and this->bit_val
across the calls, likely because the compiler doesn't know the value of
'this'.

Thus for each consecutive bit stream operation, bit_pos and bit_val are
loaded from 'this', checked if buffering is needed, and finally the bit
is extracted from bit_val according to the (unknown) bit_pos, even
though relative to the previous operation (if we didn't need to buffer)
bit_val is unchanged and bit_pos is just 1 larger. This ends up being
quite slow, with tree_node_bools taking 10% of time when streaming in
the std module.

This patch improves this by making tracking of bit_pos and bit_val
easier for the compiler. Rather than bit_pos and bit_val being members
of the (effectively global) bytes_in/out objects, this patch factors
out the bit streaming code/state into separate classes bits_in/out that
get constructed locally as needed for bit streaming. Since these
objects are now clearly local, the compiler can more easily track their
values and optimize away redundant buffering checks. And since bit
streaming is intended to be batched, it's natural for these new classes
to be RAII-enabled such that the bit stream is flushed upon destruction.

In order to make the most of this improved tracking of bit position,
this patch changes parts where we conditionally stream a tree flag to
unconditionally stream (the flag or a dummy value). That way the number
of bits streamed and the respective bit positions are as statically
known as reasonably possible. In lang_decl_bools and lang_type_bools
this patch makes us flush the current bit buffer at the start so that
subsequent bit positions are in turn statically known. And in
core_bools, we can add explicit early exits utilizing invariants that
the compiler can't figure out itself (e.g. a tree code can't have both
TS_TYPE_COMMON and TS_DECL_COMMON, and if a tree code doesn't have
TS_DECL_COMMON then it doesn't have TS_DECL_WITH_VIS).

This patch also moves the definitions of the relevant streaming classes
into anonymous namespaces so that the compiler can make more informed
decisions about inlining their member functions.

After this patch, compile time for a simple Hello World using the std
module is reduced by 7% with a release compiler. The on-disk size of
the std module increases by 0.4% (presumably due to the extra flushing
done in lang_decl_bools and lang_type_bools). The bit stream out
performance isn't improved as much as the stream in due to the
spans/lengths instrumentation performed on stream out (which maybe
should be disabled for release builds?).

gcc/cp/ChangeLog:

        * module.cc: Update comment about classes defined within.
        (class data): Enclose in an anonymous namespace.
        (data::calc_crc): Moved from bytes::calc_crc.
        (class bytes): Remove. Move bit_flush to namespace scope.
        (class bytes_in): Enclose in an anonymous namespace. Inherit
        directly from data and adjust accordingly. Move b and bflush
        members to bits_in.
        (class bytes_out): As above. Remove is_set static data member.
        (bit_flush): Moved from class bytes.
        (struct bytes_in::bits_in): Define.
        (struct bytes_out::bits_out): Define.
        (bytes_in::stream_bits): Define.
        (bytes_out::stream_bits): Define.
        (bytes_out::bflush): Moved to bits_out/in.
        (bytes_in::bflush): Likewise.
        (bytes_in::bfill): Removed.
        (bytes_out::b): Moved to bits_out/in.
        (bytes_in::b): Likewise.
        (class trees_in): Enclose in an anonymous namespace.
        (class trees_out): Enclose in an anonymous namespace.
        (trees_out::core_bools): Add bits_out/in parameter and use it.
        Unconditionally stream a bit for public_flag. Add early exits
        as appropriate.
        (trees_in::core_bools): Likewise.
        (trees_out::lang_decl_bools): Add bits_out/in parameter and use
        it. Flush the current bit buffer at the start. Unconditionally
        stream a bit for module_keyed_decls_p.
        (trees_in::lang_decl_bools): Likewise.
        (trees_out::lang_type_bools): Add bits_out/in parameter and use
        it. Flush the current bit buffer at the start.
        (trees_in::lang_type_bools): Likewise.
        (trees_out::tree_node_bools): Construct a bits_out object and
        use/pass it.
        (trees_in::tree_node_bools): Likewise.
        (trees_out::decl_value): Likewise.
        (trees_in::decl_value): Likewise.
        (module_state::write_define): Likewise.
        (module_state::read_define): Likewise.

Reviewed-by: Jason Merrill <jason@redhat.com>
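A minimal self-contained sketch of the design (illustrative only; the
real bits_in/out classes in module.cc differ in detail):

        #include <cstdint>
        #include <vector>

        // Local RAII bit-packer: because the object lives in a local,
        // the compiler can track bit_pos/bit_val across consecutive
        // b() calls and fold them into statically-positioned bit ops.
        struct bits_out_sketch
        {
          std::vector<uint32_t> &sink;   // stand-in for the byte stream
          uint32_t bit_val = 0;
          unsigned bit_pos = 0;

          explicit bits_out_sketch (std::vector<uint32_t> &s) : sink (s) {}
          ~bits_out_sketch () { bflush (); }   // flush on destruction

          void b (bool x)
          {
            bit_val |= uint32_t (x) << bit_pos;
            if (++bit_pos == 32)
              bflush ();
          }

          void bflush ()
          {
            if (bit_pos)
              sink.push_back (bit_val);
            bit_val = 0;
            bit_pos = 0;
          }
        };

        void write_flags (std::vector<uint32_t> &out,
                          bool f1, bool f2, bool f3)
        {
          bits_out_sketch bits (out);
          bits.b (f1);   // adjacent calls: bit positions 0, 1, 2 are
          bits.b (f2);   // statically known, so this can compile down
          bits.b (f3);   // to a few or/shift operations and one store
        }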
2024-04-13  Daily bump.  (GCC Administrator, 6 files changed, -1/+277)
2024-04-13  aarch64: Add rcpc3 dependency on rcpc2 and rcpc  (Andrew Carlotti, 3 files changed, -4/+5)

We don't yet have a separate feature flag for FEAT_LRCPC2 (and adding
one will require extending the feature bitmask). Instead, make the
FEAT_LRCPC2 patterns available when either armv8.4-a or +rcpc3 is
specified. We already have a +rcpc flag, so this dependency can be
specified directly. Also add an explicit dependence on +rcpc to the
FEAT_LRCPC2 patterns, so that they are disabled with armv8.4-a+norcpc.

The cpunative test needed updating because it used an invalid Features
list, since lrcpc3 requires both ilrcpc and lrcpc to be present. Without
this change, host_detect_local_cpu would return the architecture string
'armv8-a+dotprod+crc+crypto+rcpc3+norcpc'.

gcc/ChangeLog:

        * config/aarch64/aarch64-option-extensions.def: Add RCPC to
        RCPC3 dependencies.
        * config/aarch64/aarch64.h (AARCH64_ISA_RCPC8_4): Add test for
        RCPC3 bit.

gcc/testsuite/ChangeLog:

        * gcc.target/aarch64/cpunative/info_24: Include lrcpc and ilrcpc.
2024-04-13  aarch64: Enable +cssc for armv8.9-a  (Andrew Carlotti, 1 file changed, -1/+1)

FEAT_CSSC is mandatory in the architecture from Armv8.9.

gcc/ChangeLog:

        * config/aarch64/aarch64-arches.def: Add CSSC to V8_9A
        dependencies.
2024-04-12  c++: ICE with temporary of class type in array DMI [PR109966]  (Marek Polacek, 3 files changed, -39/+92)

This ICE started with the fairly complicated r13-765. We crash in
gimplify_var_or_parm_decl because a stray VAR_DECL leaked there.

The problem is ultimately that potential_prvalue_result_of wasn't
correctly handling arrays, and replace_placeholders_for_class_temp_r
replaced a PLACEHOLDER_EXPR in a TARGET_EXPR which is used in the
context of copy elision. If I have

        M m[2] = { M{""}, M{""} };

then we don't invoke the M(const M&) copy-ctor.

One part of the fix is to use TARGET_EXPR_ELIDING_P rather than
potential_prvalue_result_of. That unfortunately doesn't handle a case
like

        struct N { N(M); };
        N arr[2] = { M{""}, M{""} };

because TARGET_EXPRs that initialize a function argument are not marked
TARGET_EXPR_ELIDING_P even though gimplify_arg drops such TARGET_EXPRs
on the floor. We can use a pset to avoid replacing placeholders in them.

I made an attempt to use set_target_expr_eliding in
convert_for_arg_passing but that regressed constexpr-diag1.C, and does
not seem like a prudent change in stage 4 anyway.

        PR c++/109966

gcc/cp/ChangeLog:

        * typeck2.cc (potential_prvalue_result_of): Remove.
        (replace_placeholders_for_class_temp_r): Check
        TARGET_EXPR_ELIDING_P. Use a pset. Don't replace_placeholders in
        TARGET_EXPRs that initialize a function argument.

gcc/testsuite/ChangeLog:

        * g++.dg/cpp1y/nsdmi-aggr20.C: New test.
        * g++.dg/cpp1y/nsdmi-aggr21.C: New test.
2024-04-12  rs6000: Add OPTION_MASK_POWER8 [PR101865]  (Will Schmidt, 13 files changed, -9/+245)

The bug in PR101865 is that the _ARCH_PWR8 predefine macro is
conditional upon TARGET_DIRECT_MOVE, which can be false for some
-mcpu=power8 compiles if the -mno-altivec or -mno-vsx options are used.
The solution here is to create a new OPTION_MASK_POWER8 mask that is
true for -mcpu=power8, regardless of Altivec or VSX enablement.

Unfortunately, the only way to create an OPTION_MASK_* mask is to
create a new option, which we have done here, but marked it as
WarnRemoved since we do not want users using it. For stage1, we will
look into how we can create ISA mask flags for use in the compiler
without the need for explicit options.

2024-04-12  Will Schmidt  <will_schmidt@linux.ibm.com>
            Peter Bergner  <bergner@linux.ibm.com>

gcc/
        PR target/101865
        * config/rs6000/rs6000-builtin.cc (rs6000_builtin_is_supported):
        Use TARGET_POWER8.
        * config/rs6000/rs6000-c.cc (rs6000_target_modify_macros): Use
        OPTION_MASK_POWER8.
        * config/rs6000/rs6000-cpus.def (POWERPC_MASKS): Add
        OPTION_MASK_POWER8.
        (ISA_2_7_MASKS_SERVER): Likewise.
        * config/rs6000/rs6000.cc (rs6000_option_override_internal):
        Update comment. Use OPTION_MASK_POWER8 and TARGET_POWER8.
        * config/rs6000/rs6000.h (TARGET_SYNC_HI_QI): Use TARGET_POWER8.
        * config/rs6000/rs6000.md (define_attr "isa"): Add p8.
        (define_attr "enabled"): Handle it.
        (define_insn "prefetch"): Use TARGET_POWER8.
        * config/rs6000/rs6000.opt (mpower8-internal): New.

gcc/testsuite/
        PR target/101865
        * gcc.target/powerpc/predefine-p7-novsx.c: New test.
        * gcc.target/powerpc/predefine-p8-noaltivec-novsx.c: New test.
        * gcc.target/powerpc/predefine-p8-noaltivec.c: New test.
        * gcc.target/powerpc/predefine-p8-novsx.c: New test.
        * gcc.target/powerpc/predefine-p8-pragma-vsx.c: New test.
        * gcc.target/powerpc/predefine-p9-novsx.c: New test.
2024-04-12  c++/modules: local type merging [PR99426]  (Patrick Palka, 6 files changed, -31/+222)

One known missing piece in the modules implementation is merging of a
streamed-in local type (class or enum) with the corresponding in-TU
version of the local type. This missing piece turns out to cause a
hard-to-reduce use-after-free GC issue, due to the entity_ary not being
marked as a GC root (deliberately), and manifests as a serialization
error on stream-in as in PR99426 (see comment #6 for a reduction). It's
also reproducible on trunk when running the xtreme-header tests without
-fno-module-lazy.

This patch implements this missing piece, making us merge such local
types according to their position within the containing function's
definition, analogous to how we merge FIELD_DECLs of a class according
to their index in the TYPE_FIELDS list.

        PR c++/99426

gcc/cp/ChangeLog:

        * module.cc (merge_kind::MK_local_type): New enumerator.
        (merge_kind_name): Update.
        (trees_out::chained_decls): Move BLOCK-specific handling of
        DECL_LOCAL_DECL_P decls to ...
        (trees_out::core_vals) <case BLOCK>: ... here. Stream BLOCK_VARS
        manually.
        (trees_in::core_vals) <case BLOCK>: Stream BLOCK_VARS manually.
        Handle deduplicated local types.
        (trees_out::key_local_type): Define.
        (trees_in::key_local_type): Define.
        (trees_out::get_merge_kind) <case FUNCTION_DECL>: Return
        MK_local_type for a local type.
        (trees_out::key_mergeable) <case FUNCTION_DECL>: Use
        key_local_type.
        (trees_in::key_mergeable) <case FUNCTION_DECL>: Likewise.
        (trees_in::is_matching_decl): Be flexible with type mismatches
        for local entities.
        (trees_in::register_duplicate): Also register the
        DECL_TEMPLATE_RESULT of a TEMPLATE_DECL as a duplicate.
        (depset_cmp): Return 0 for equal IDENTIFIER_HASH_VALUEs.

gcc/testsuite/ChangeLog:

        * g++.dg/modules/merge-17.h: New test.
        * g++.dg/modules/merge-17_a.H: New test.
        * g++.dg/modules/merge-17_b.C: New test.
        * g++.dg/modules/xtreme-header-7_a.H: New test.
        * g++.dg/modules/xtreme-header-7_b.C: New test.

Reviewed-by: Jason Merrill <jason@redhat.com>
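A sketch of the kind of code that needs local-type merging
(illustrative; the actual merge-17 tests may differ):

        // shared.h — textually included in one TU and reached via a
        // header unit in another; the two copies of S must merge by
        // S's position within f's definition.
        inline auto f ()
        {
          struct S { int x; };   // local type
          return S{42};
        }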
2024-04-12  c++: reference cast, conversion fn [PR113141]  (Jason Merrill, 4 files changed, -1/+48)

The second testcase in 113141 is a separate issue: we first decide that
the conversion is ill-formed, but then when recalculating, the special
c_cast_p handling makes us think it's OK. We don't want that; it should
continue to fall back to the reinterpret_cast interpretation. And while
we're here, let's warn that we're not using the conversion function.

Note that the standard seems to say that in this case we should treat
(Matrix &) as const_cast<Matrix &>(static_cast<const Matrix &>(X)),
which would use the conversion operator, but that doesn't match
existing practice, so let's resolve that another day. I've raised this
issue with CWG; at the moment I lean toward never binding a temporary
in a C-style cast to reference type, which would also be a change from
existing practice.

        PR c++/113141

gcc/c-family/ChangeLog:

        * c.opt: Add -Wcast-user-defined.

gcc/ChangeLog:

        * doc/invoke.texi: Document -Wcast-user-defined.

gcc/cp/ChangeLog:

        * call.cc (reference_binding): For an invalid cast, warn and
        don't recalculate.

gcc/testsuite/ChangeLog:

        * g++.dg/conversion/ref12.C: New test.

Co-authored-by: Patrick Palka <ppalka@redhat.com>
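A hedged sketch of the pattern being diagnosed (the names mirror the
commit message; the actual ref12.C testcase may differ):

        struct Matrix { int m[4]; };
        struct Expr
        {
          operator Matrix () const { return Matrix (); }  // conversion fn
        };

        Matrix &g (Expr &x)
        {
          // The C-style reference cast keeps the reinterpret_cast
          // interpretation rather than calling operator Matrix ();
          // -Wcast-user-defined now points out the unused conversion.
          return (Matrix &) x;
        }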
2024-04-12  c++: reference list-init, conversion fn [PR113141]  (Jason Merrill, 4 files changed, -4/+56)

The original testcase in PR113141 is an instance of CWG1996; the
standard fails to consider conversion functions when initializing a
reference directly from an initializer-list of one element, but then
does consider them when initializing a temporary. I have a proposed fix
for this defect, which is implemented here.

        DR 1996
        PR c++/113141

gcc/cp/ChangeLog:

        * call.cc (reference_binding): Check direct binding from a
        single-element list.

gcc/testsuite/ChangeLog:

        * g++.dg/cpp0x/initlist-ref1.C: New test.
        * g++.dg/cpp0x/initlist-ref2.C: New test.
        * g++.dg/cpp0x/initlist-ref3.C: New test.

Co-authored-by: Patrick Palka <ppalka@redhat.com>
2024-04-12  Regenerate opt.urls  (Tatsuyuki Ishi, 1 file changed, -0/+2)

Fixes: 97069657c4e ("RISC-V: Implement TLS Descriptors.")

gcc/ChangeLog:

        * config/riscv/riscv.opt.urls: Regenerated.

Reviewed-by: Palmer Dabbelt <palmer@rivosinc.com>
Acked-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-04-12  c++: Fix bogus warnings about ignored annotations [PR114691]  (Jakub Jelinek, 2 files changed, -1/+27)

The middle-end warns about the ANNOTATE_EXPR added for while/for loops
if they declare a var inside of the loop condition. This is because the
assumption is that the ANNOTATE_EXPR argument is used immediately in a
COND_EXPR (later GIMPLE_COND), but simplify_loop_decl_cond wraps the
ANNOTATE_EXPR inside of a TRUTH_NOT_EXPR, so it no longer holds.

The following patch fixes that by adding the TRUTH_NOT_EXPR inside of
the ANNOTATE_EXPR argument if any.

2024-04-12  Jakub Jelinek  <jakub@redhat.com>

        PR c++/114691
        * semantics.cc (simplify_loop_decl_cond): Use cp_build_unary_op
        with TRUTH_NOT_EXPR on ANNOTATE_EXPR argument (if any) rather
        than ANNOTATE_EXPR itself.

        * g++.dg/ext/pr114691.C: New test.
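A sketch of code that used to trigger the bogus warning (illustrative;
the committed pr114691.C may differ):

        void f (int *p)
        {
          // The condition declares a variable, so the front end rewrites
          // the loop via simplify_loop_decl_cond; the unroll annotation
          // used to end up under a TRUTH_NOT_EXPR and provoke a spurious
          // "ignored annotation" warning.
          #pragma GCC unroll 4
          while (int v = *p++)
            if (v < 0)
              break;
        }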
2024-04-12  c++: templated substitution into lambda-expr, cont [PR114393]  (Patrick Palka, 2 files changed, -2/+21)

The original PR114393 testcase is unfortunately still not accepted
after r14-9938-g081c1e93d56d35, due to return type deduction confusion
when a lambda-expr is used as a default template argument. The below
reduced testcase demonstrates the bug.

Here when forming the dependent specialization b_v<U> we substitute the
default argument of F, a lambda-expr, with _Descriptor=U. (In this case
in_template_context is true since we're in the context of the template
c_v, so we don't defer.) This substitution in turn lowers the level of
the lambda's auto return type from 2 to 1 and so later, when
instantiating c_v<int, char>, we wrongly substitute this auto with the
template argument at level=0,index=0, i.e. int, instead of going
through do_auto_deduction which would yield char.

One way to fix this would be to use a level-less auto to represent a
deduced return type of a lambda, but that might be too invasive of a
change at this stage, and it might be better to do this across the
board for all deduced return types.

Another way would be to pass tf_partial from coerce_template_parms
during dependent substitution into a default template argument so that
the substitution doesn't do any level-lowering, but that wouldn't do
the right thing in this case due to the tf_partial early exit in the
LAMBDA_EXPR case of tsubst_expr.

Yet another way, and the approach that this patch takes, is to just
defer all dependent substitution into a lambda-expr, building upon the
logic added in r14-9938-g081c1e93d56d35. This also helps ensure
LAMBDA_EXPR_REGEN_INFO consists only of the concrete template arguments
that were ultimately substituted into the most general lambda.

        PR c++/114393

gcc/cp/ChangeLog:

        * pt.cc (tsubst_lambda_expr): Also defer all dependent
        substitution.

gcc/testsuite/ChangeLog:

        * g++.dg/cpp2a/lambda-targ2a.C: New test.

Reviewed-by: Jason Merrill <jason@redhat.com>
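A hedged sketch of the shape of the reduced testcase (modeled on the
names in the commit message; lambda-targ2a.C itself may differ):

        template <class _Descriptor, auto F = [] { return _Descriptor (); }>
        constexpr auto b_v = F ();

        template <class T, class U>
        constexpr auto c_v = b_v<U>;   // dependent specialization:
                                       // substitution into the default
                                       // lambda argument is now deferred,
                                       // so the lambda's auto return type
                                       // deduces to U, not T

        static_assert (c_v<int, char> == '\0');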
2024-04-12  c++: Diagnose or avoid constexpr dtors in classes with virtual bases [PR114426]  (Jakub Jelinek, 3 files changed, -10/+21)

I had another look at this P1 PR today. You said in the "c++: fix
in-charge parm in constexpr" mail back in December (as well as in the
r14-6507 commit message):

"Since a class with vbases can't have constexpr 'tors there isn't
actually a need for an in-charge parameter in a destructor"

but the ICE is because the destructor is marked implicitly constexpr.
https://eel.is/c++draft/dcl.constexpr#3.2 says that a destructor of a
class with virtual bases is not constexpr-suitable, but we were
actually implementing this just for constructors, so clearly my fault
from the https://wg21.link/P0784R7 implementation. That paper clearly
added that sentence in there and removed a similar sentence just from
the constructor case.

So, the following patch makes sure the

        else if (CLASSTYPE_VBASECLASSES (DECL_CONTEXT (fun)))
          {
            ret = false;
            if (complain)
              error ("%q#T has virtual base classes", DECL_CONTEXT (fun));
          }

hunk is done not just for DECL_CONSTRUCTOR_P (fun), but also
DECL_DESTRUCTOR_P (fun) - in that case just for cxx_dialect >= cxx20,
as for cxx_dialect < cxx20 we already set ret = false; and diagnose a
different error, so no need to diagnose two.

2024-04-12  Jakub Jelinek  <jakub@redhat.com>

        PR c++/114426
        * constexpr.cc (is_valid_constexpr_fn): Return false/diagnose
        with complain for destructors in classes with virtual bases.

        * g++.dg/cpp2a/pr114426.C: New test.
        * g++.dg/cpp2a/constexpr-dtor16.C: New test.
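A minimal example of the rule being enforced (a sketch; the committed
tests may differ):

        struct B { };
        struct D : virtual B
        {
          // Per [dcl.constexpr]/3.2, a destructor of a class with
          // virtual bases is not constexpr-suitable; in C++20 this is
          // now rejected rather than ICEing when the destructor is
          // implicitly treated as constexpr.
          constexpr ~D () { }
        };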
2024-04-12  match: Fix `!a?b:c` and `a?~t:t` patterns for signed 1-bit types [PR114666]  (Andrew Pinski, 2 files changed, -1/+18)

The problem is that the `!a?b:c` pattern will create a COND_EXPR with a
1-bit signed integer, which breaks patterns like `a?~t:t`. This rejects
the case where we have a signed operand for both patterns.

Note for GCC 15, I am going to look at the canonicalization of `a?~t:t`
where t was a constant, since I think keeping it a COND_EXPR might be
more canonical and is what VRP produces from the same IR; if anything,
expand should handle which one is better.

Bootstrapped and tested on x86_64-linux-gnu with no regressions.

        PR tree-optimization/114666

gcc/ChangeLog:

        * match.pd (`!a?b:c`): Reject signed types for the condition.
        (`a?~t:t`): Likewise.

gcc/testsuite/ChangeLog:

        * gcc.c-torture/execute/bitfld-signed1-1.c: New test.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
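A sketch of the problematic shape (illustrative; the committed
bitfld-signed1-1.c may differ):

        struct S { signed int b : 1; };   /* signed 1-bit field: 0 or -1 */

        int f (struct S *s, int x, int y)
        {
          /* Folding `!a ? b : c` into a COND_EXPR keyed on the 1-bit
             signed value, and then matching `a ? ~t : t` against it,
             goes wrong because the "true" value here is -1, not 1.  */
          return !s->b ? x : y;
        }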
2024-04-12  aarch64: Avoid using mismatched ZERO ZA sizes  (Richard Sandiford, 2 files changed, -11/+15)

The svzero_mask_za intrinsic tried to use the shortest combination of
.b, .h, .s and .d tiles, allowing mixtures of sizes where necessary.
However, Iain S pointed out that LLVM instead requires the tiles to
have the same suffix. GAS supports both versions, so this patch
generates the LLVM-friendly form.

gcc/
        * config/aarch64/aarch64.cc (aarch64_output_sme_zero_za): Require
        all tiles to have the same suffix.

gcc/testsuite/
        * gcc.target/aarch64/sme/acle-asm/zero_mask_za.c
        (zero_mask_za_ab, zero_mask_za_d7, zero_mask_za_bf): Expect a
        list of .d tiles instead of a mixture.
2024-04-12  s390: testsuite: Xfail range-sincos.c and vrp-float-abs-1.c  (Stefan Schulze Frielinghaus, 2 files changed, -2/+2)

As mentioned in PR114678, those failures will be fixed by
https://gcc.gnu.org/pipermail/gcc-patches/2024-March/648303.html

For GCC 14 just xfail them, which should be reverted once the patch is
applied.

gcc/testsuite/ChangeLog:

        * gcc.dg/tree-ssa/range-sincos.c: Xfail for s390.
        * gcc.dg/tree-ssa/vrp-float-abs-1.c: Dito.
2024-04-12  c++: templated substitution into lambda-expr [PR114393]  (Patrick Palka, 6 files changed, -2/+70)

The below testcases use a lambda-expr as a template argument, and they
all trip over the below added tsubst_lambda_expr sanity check,
ultimately because current_template_parms is empty, which causes
push_template_decl to return error_mark_node from the call to
begin_lambda_type. Were it not for the sanity check, this silent
error_mark_node result leads to nonsensical errors down the line, or
silent breakage.

In the first testcase, we hit this assert during instantiation of the
dependent alias template-id c1_t<_Data> from instantiate_template,
which clears current_template_parms via push_to_top_level. Similar
story for the second testcase. For the third testcase we hit the assert
during partial instantiation of the member template from
instantiate_class_template, which similarly calls push_to_top_level.

These testcases illustrate that templated substitution into a
lambda-expr is not always possible, in particular when we lost the
relevant template context.

I experimented with recovering the template context by making
tsubst_lambda_expr fall back to using scope_chain->prev->template_parms
if current_template_parms is empty, which worked, but seemed like a
hack. I also experimented with preserving the template context by
keeping current_template_parms set during instantiate_template for a
dependent specialization, which also worked, but it's at odds with the
fact that we cache dependent specializations (and so they should be
independent of the template context).

So instead of trying to make such substitution work, this patch uses
the extra-args mechanism to defer templated substitution into a
lambda-expr when we lost the relevant template context.

        PR c++/114393
        PR c++/107457
        PR c++/93595

gcc/cp/ChangeLog:

        * cp-tree.h (LAMBDA_EXPR_EXTRA_ARGS): Define.
        (tree_lambda_expr::extra_args): New field.
        * module.cc (trees_out::core_vals) <case LAMBDA_EXPR>: Stream
        LAMBDA_EXPR_EXTRA_ARGS.
        (trees_in::core_vals) <case LAMBDA_EXPR>: Likewise.
        * pt.cc (has_extra_args_mechanism_p): Return true for
        LAMBDA_EXPR.
        (tree_extra_args): Handle LAMBDA_EXPR.
        (tsubst_lambda_expr): Use LAMBDA_EXPR_EXTRA_ARGS to defer
        templated substitution into a lambda-expr if we lost the
        template context. Add sanity check for error_mark_node result
        from begin_lambda_type.

gcc/testsuite/ChangeLog:

        * g++.dg/cpp2a/lambda-targ2.C: New test.
        * g++.dg/cpp2a/lambda-targ3.C: New test.
        * g++.dg/cpp2a/lambda-targ4.C: New test.

Reviewed-by: Jason Merrill <jason@redhat.com>
2024-04-12  RISC-V: Fix Werror=sign-compare in riscv_validate_vector_type  (Pan Li, 1 file changed, -5/+5)

This patch would like to fix a Werror=sign-compare similar to below:

        gcc/config/riscv/riscv.cc: In function ‘void riscv_validate_vector_type(const_tree, const char*)’:
        gcc/config/riscv/riscv.cc:5614:23: error: comparison of integer
        expressions of different signedness: ‘int’ and ‘unsigned int’
        [-Werror=sign-compare]
         5614 |   if (TARGET_MIN_VLEN < required_min_vlen)

TARGET_MIN_VLEN is *int* by default, but the required_min_vlen returned
from riscv_vector_required_min_vlen is *unsigned*. Thus, adjust the
related function and reference variable(s) to int type to avoid this
kind of Werror.

The below test suite is passed for this patch:

* The rv64gcv fully regression tests.

gcc/ChangeLog:

        * config/riscv/riscv.cc (riscv_vector_float_type_p): Take int as
        the return value instead of unsigned.
        (riscv_vector_element_bitsize): Ditto.
        (riscv_vector_required_min_vlen): Ditto.
        (riscv_validate_vector_type): Take int type for local
        variable(s).

Signed-off-by: Pan Li <pan2.li@intel.com>
2024-04-12  analyzer: Bail out on function pointer for -Wanalyzer-allocation-size  (Stefan Schulze Frielinghaus, 1 file changed, -0/+4)

On s390, pr94688.c is failing due to the excess error:

        pr94688.c:6:5: warning: allocated buffer size is not a multiple
        of the pointee's size [CWE-131] [-Wanalyzer-allocation-size]

This is because on s390 functions are by default aligned to an 8-byte
boundary, and during function type construction size is set to the
function boundary. Thus, for the assignment

        a.0_1 = (void (*<T237>) ()) &a;

we have that the right-hand side is pointing to a 4-byte memory region
whereas the size of the function pointer is 8 bytes, and a warning is
emitted. Since -Wanalyzer-allocation-size is not about pointers to
code, bail out early.

gcc/analyzer/ChangeLog:

        * region-model.cc (region_model::check_region_size): Bail out
        early on function pointers.
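A sketch of the kind of code involved (modeled on pr94688.c as quoted
above; illustrative):

        void a (void);

        void g (void)
        {
          /* The cast source is treated as pointing to a 4-byte region,
             while the function-pointer type is 8 bytes wide on s390;
             the analyzer now skips allocation-size checking for
             pointers to code.  */
          void (*p) (void) = (void (*) (void)) &a;
          (void) p;
        }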
2024-04-12  tree-cfg: Make the verifier returns_twice message translatable  (Jakub Jelinek, 1 file changed, -7/+12)

While translation of the verifier messages is questionable - that case
is something that ideally should never happen except to gcc developers,
and so presumably English should be fine - we use error etc. APIs and
those imply translations, and some translators translate it.

The following patch adjusts the code such that we don't emit

        appel returns_twice est not first dans le bloc de base 33

in French (i.e. 2 English words in the middle of a French message).
Similarly Swedish or Ukrainian. Note, the German translator did
differentiate between these verifier messages vs. normal user facing
ones, and translated it to:

        "Interner Fehler: returns_twice call is %s in basic block %d"

so just a German prefix before the English message.

2024-04-12  Jakub Jelinek  <jakub@redhat.com>

        * tree-cfg.cc (gimple_verify_flow_info): Make the misplaced
        returns_twice diagnostics translatable.
2024-04-12  Limit special asan/ubsan/bitint returns_twice handling to calls in bbs with abnormal pred [PR114687]  (Jakub Jelinek, 3 files changed, -3/+27)

The tree-cfg.cc verifier only diagnoses returns_twice calls preceded by
non-label/debug stmts if it is in a bb with an abnormal predecessor.
The following testcase shows that if a user lies in the attributes (a
function which never returns can't be pure, and can't return twice when
it doesn't ever return at all), when we figure it out, we can remove
the abnormal edges to the "returns_twice" call and perhaps the whole
.ABNORMAL_DISPATCHER etc. edge_before_returns_twice_call then ICEs
because it can't find such an edge.

The following patch limits the special handling to calls in bbs where
the verifier requires that.

2024-04-12  Jakub Jelinek  <jakub@redhat.com>

        PR sanitizer/114687
        * gimple-iterator.cc (gsi_safe_insert_before): Only use
        edge_before_returns_twice_call if bb_has_abnormal_pred.
        (gsi_safe_insert_seq_before): Likewise.
        * gimple-lower-bitint.cc (bitint_large_huge::lower_call): Only
        push to m_returns_twice_calls if bb_has_abnormal_pred.

        * gcc.dg/asan/pr114687.c: New test.
2024-04-12  testsuite: Fix loop-interchange-16.c  (Stefan Schulze Frielinghaus, 1 file changed, -0/+1)

Prevent loop unrolling of the innermost loop, because otherwise we are
left with no loop interchange for targets like s390 which have a more
aggressive loop unrolling strategy.

gcc/testsuite/ChangeLog:

        * gcc.dg/tree-ssa/loop-interchange-16.c: Prevent loop unrolling
        of the innermost loop.
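One conventional way to pin that down in a test (a sketch; the
committed fix may use a different mechanism):

        /* Placed immediately before the innermost loop; per the GCC
           manual, an unroll factor of 0 or 1 asks the compiler not to
           unroll it.  */
        #pragma GCC unroll 1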
2024-04-12  RISC-V: Bugfix ICE non-vector in TARGET_FUNCTION_VALUE_REGNO_P  (Pan Li, 5 files changed, -1/+45)

This patch would like to fix an ICE, when vector is not enabled, in the
TARGET_FUNCTION_VALUE_REGNO_P hook implementation. The vector regno is
available if and only if TARGET_VECTOR is true. The previous
implementation missed this condition and then resulted in an ICE for
the rv64gc build option without vector.

The below test suites are passed for this patch:

* The rv64gcv fully regression tests.
* The rv64gc fully regression tests.

        PR target/114639

gcc/ChangeLog:

        * config/riscv/riscv.cc (riscv_function_value_regno_p): Add
        TARGET_VECTOR predicate for V_RETURN regno.

gcc/testsuite/ChangeLog:

        * gcc.target/riscv/pr114639-1.c: New test.
        * gcc.target/riscv/pr114639-2.c: New test.
        * gcc.target/riscv/pr114639-3.c: New test.
        * gcc.target/riscv/pr114639-4.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>
2024-04-12  Daily bump.  (GCC Administrator, 4 files changed, -1/+243)
2024-04-11  btf: fix a possibly misleading asm debug comment  (David Faust, 1 file changed, -34/+50)

This patch fixes a small error that could occur in the debug comment
when emitting a type reference with btf_asm_type_ref.

While working on a previous patch, I noticed the following in the asm
output for the test btf-bitfields-4.c:

        ...
        .long 0x39      # MEMBER 'c' idx=3
        .long 0x6       # btm_type: (BTF_KIND_UNKN '')
        ...
        .long 0x34      # TYPE 6 BTF_KIND_INT 'char'

The type for member 'c' is correct, but the comment for the member
incorrectly reads "BTF_KIND_UNKN ''". This was caused by an incorrect
type lookup in btf_asm_type_ref that could happen if the source file
has types which can be represented in CTF but not in BTF.

This patch fixes the issue by changing btf_asm_type_ref to work fully
in the CTF ID space until writing out the final BTF ID. That ensures
types are correctly identified when writing the asm debug comments,
like the following fixed comment for the above case:

        ...
        .long 0x39      # MEMBER 'c' idx=3
        .long 0x6       # btm_type: (BTF_KIND_INT 'char')
        ...

Note that there was no problem with the actual BTF information; the
only error was in the comment. This patch does not change the output
BTF information, and no tests were affected.

gcc/
        * btfout.cc (btf_asm_type_ref): Convert IDs to BTF internally
        and fix potentially looking up wrong type for asm debug comment
        info. Split into...
        (btf_asm_datasec_type_ref): ... This. New.
        (btf_asm_datasec_entry): Call it here, instead of
        btf_asm_type_ref.
        (btf_asm_type, btf_asm_array, btf_asm_varent, btf_asm_sou_member)
        (btf_asm_func_arg, btf_asm_func_type): Adapt btf_asm_type_ref
        call.
2024-04-11  btf: emit non-representable bitfield as void  (David Faust, 2 files changed, -28/+28)

This patch fixes an issue with mangled BTF that could occur when a
struct type contains a bitfield member which cannot be represented in
BTF. It is undefined what should happen in such cases, but we can at
least do something reasonable.

Commit 936dd627cd9 "btf: do not skip members of data type with type id
BTF_VOID_TYPEID" made a similar change for un-representable
non-bitfield members, but had an unintended side-effect of mangling
BTF for un-representable bitfields: the struct (or union) would account
for the offending bitfield in its member count, but the bitfield member
itself was not emitted, making the member count incorrect.

This change ensures that non-representable bitfield members of struct
and union types are always emitted with BTF_VOID_TYPEID. This avoids
corrupting the BTF information for the entire struct or union type.

gcc/
        * btfout.cc (btf_asm_sou_member): Always emit non-representable
        bitfield members as having 'void' type. Refactor slightly.

gcc/testsuite/
        * gcc.dg/debug/btf/btf-bitfields-4.c: Add two new checks.
2024-04-11  aarch64: Fix _BitInt testcases  (Andre Vieira (lists), 2 files changed, -36/+24)

This patch fixes some testisms introduced by:

        commit 5aa3fec38cc6f52285168b161bab1a869d864b44
        Author: Andre Vieira <andre.simoesdiasvieira@arm.com>
        Date:   Wed Apr 10 16:29:46 2024 +0100

            aarch64: Add support for _BitInt

The testcases were relying on an unnecessary sign-extend that is no
longer generated. The tested version was just slightly behind top of
trunk when the patch was committed, and the codegen had changed, for
the better, by then.

gcc/testsuite/ChangeLog:

        * gcc.target/aarch64/bitfield-bitint-abi-align16.c (g1, g8, g16,
        g1p, g8p, g16p): Remove unnecessary sbfx.
        * gcc.target/aarch64/bitfield-bitint-abi-align8.c (g1, g8, g16,
        g1p, g8p, g16p): Likewise.