Age | Commit message | Author | Files | Lines
3 hoursDaily bump.HEADtrunkmasterGCC Administrator9-1/+284
7 hoursc++: improve nesting in print_z_candidate [PR121966]David Malcolm1-5/+5
Comment #2 of PR c++/121966 notes that the "inherited here" messages should be nested *within* the note they describe. Implemented by this patch, which also nests other notes emitted for rejection_reason within the first note of print_z_candidate. gcc/cp/ChangeLog: PR c++/121966 * call.cc (print_z_candidate): Consolidate instances of auto_diagnostic_nesting_level into one, above the "inherited here" message so that any such message is nested within the note, and any messages emitted due to the switch on rejection_reason are similarly nested within the note. Signed-off-by: David Malcolm <dmalcolm@redhat.com>
7 hoursc++: fix count of z candidates for non-viable candidates, nesting [PR121966]David Malcolm1-1/+5
In r15-6116-gd3dd24acd74605 I updated print_z_candidates to show the number of candidates, and a number for each candidate. PR c++/121966 notes that the printed count is sometimes higher than what's actually printed: I missed the case where candidates in the list aren't printed due to not being viable. Fixed thusly. gcc/cp/ChangeLog: PR c++/121966 * call.cc (print_z_candidates): Copy the filtering logic on viable candidates from the printing loop to the counting loop, so that num_candidates matches the number of iterations of the latter loop. Signed-off-by: David Malcolm <dmalcolm@redhat.com>
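The fix above amounts to running the same filter in the counting loop and in the printing loop. A minimal sketch of that idea follows; the `candidate` struct, the `is_printed` predicate, and the note format are hypothetical stand-ins, not the actual code in call.cc.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a candidate; not GCC's z_candidate.
struct candidate {
  std::string name;
  bool viable;
};

// Single filter shared by both loops (assumed rule: skip non-viable ones).
static bool is_printed(const candidate &c) { return c.viable; }

std::vector<std::string> describe(const std::vector<candidate> &cands) {
  // Counting loop: applies the same filter as the printing loop below,
  // so the total matches the number of notes actually emitted.
  int num_candidates = 0;
  for (const candidate &c : cands)
    if (is_printed(c))
      ++num_candidates;

  // Printing loop: numbers each emitted candidate against that total.
  std::vector<std::string> notes;
  int n = 0;
  for (const candidate &c : cands) {
    if (!is_printed(c))
      continue;
    ++n;
    notes.push_back("candidate " + std::to_string(n) + " of "
                    + std::to_string(num_candidates) + ": " + c.name);
  }
  return notes;
}
```

Keeping the predicate in one function makes it harder for the two loops to drift apart again.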
7 hourstestsuite: add 'std-' prefix to c++ analyzer test casesDavid Malcolm2-0/+0
gcc/testsuite/ChangeLog: * g++.dg/analyzer/unique_ptr-1.C: Rename to... * g++.dg/analyzer/std-unique_ptr-1.C: ...this. * g++.dg/analyzer/unique_ptr-2.C: Rename to... * g++.dg/analyzer/std-unique_ptr-2.C: ...this. Signed-off-by: David Malcolm <dmalcolm@redhat.com>
7 hourssarif-replay: fix uninitialized m_debug_physical_locationsDavid Malcolm1-0/+1
In r16-2766-g7969e4859ed007 I added a new field to replay_opts but forgot to initialize it in set_defaults. Fixed thusly. Spotted thanks to valgrind. gcc/ChangeLog: * sarif-replay.cc (set_defaults): Initialize m_debug_physical_locations. Signed-off-by: David Malcolm <dmalcolm@redhat.com>
7 hoursuninclude: Add lib/gcc/<anything>/include as a possible include dirAndrew Pinski1-2/+3
While running uninclude on PR99912's preprocessed source, uninclude didn't uninclude some of the x86_64 target headers. This was because `lib/gcc/<anything>/include` was not recognized as a possible system include dir, although `gcc-lib/<anything>/include` was already supported. contrib/ChangeLog: * uninclude: Add `lib/gcc/<anything>/include`.
9 hoursforwprop: Fix up "nop" copies after recent changes [PR121962]Andrew Pinski2-2/+83
After r16-3887-g597b50abb0d2fc, the check to see if the copy is a nop copy becomes inefficient. The code goes into an infinite loop as the copy keeps on being propagated over and over again. That is, if we have:
```
struct s1 *b = &a.t;
a.t = *b;
p = *b;
```
This goes into an infinite loop, propagating the `MEM[&a]` over and over again. To solve this, a new function is needed for the comparison that is similar to new_src_based_on_copy. PR tree-optimization/121962 gcc/ChangeLog: * tree-ssa-forwprop.cc (same_for_assignment): New function. (optimize_agr_copyprop_1): Use same_for_assignment to check for nop copies. (optimize_agr_copyprop): Likewise. gcc/testsuite/ChangeLog: * gcc.dg/torture/pr121962-1.c: New test. Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
9 hoursforwprop: Add a quick out for new_src_based_on_copy when both are declsAndrew Pinski1-0/+4
If both operands that are being compared are decls, operand_equal_p will already handle that case so an early out can be done here. Bootstrapped and tested on x86_64-linux-gnu. gcc/ChangeLog: * tree-ssa-forwprop.cc (new_src_based_on_copy): An early out if both are decls. Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
9 hoursforwprop: Handle memcpy for arguments with respect to copiesAndrew Pinski2-62/+106
This moves the code used in optimize_agr_copyprop_1 (r16-3887-g597b50abb0d) to handle this same case into its own new function and uses it inside optimize_agr_copyprop_arg. This allows removing more copies that show up only in arguments. Bootstrapped and tested on x86_64-linux-gnu. gcc/ChangeLog: * tree-ssa-forwprop.cc (optimize_agr_copyprop_1): Split out the case where `operand_equal_p (dest, src2)` is false into ... (new_src_based_on_copy): This. New function. (optimize_agr_copyprop_arg): Use new_src_based_on_copy instead of operand_equal_p to find the new src. gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/copy-prop-aggregate-arg-2.c: New test. Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
9 hourslibstdc++/ranges: Fix more wrong value type init from reference type [PR111861]Patrick Palka2-5/+5
As in r16-3912-g412a1f78b53709, this fixes some other spots where we wrongly use a deduced type and non-direct-initialization when trying to initialize a value type from an iterator's reference type. PR libstdc++/111861 libstdc++-v3/ChangeLog: * include/bits/ranges_algo.h (ranges::unique_copy): When initializing a value type object from *iter, use direct-initialization and don't use a deduced type. (ranges::push_heap): Use direct-initialization when initializing a value type object from ranges::iter_move. (ranges::max): As in ranges::unique_copy. * include/bits/ranges_util.h (ranges::min): Likewise. Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
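To illustrate why a deduced type is the wrong choice here, the sketch below uses `std::vector<bool>`, whose iterator's reference type is a proxy. It uses `std::iterator_traits<...>::value_type` so that it compiles before C++20; the libstdc++ ranges code works with `iter_value_t` instead, and the function name `demo_proxy` is made up for this example.

```cpp
#include <cassert>
#include <iterator>
#include <type_traits>
#include <vector>

bool demo_proxy() {
  std::vector<bool> v{true, false};
  auto it = v.begin();

  // Deduced type: a proxy object that still refers to v[0].
  auto proxy = *it;

  // Named value type plus direct-initialization: an independent bool copy.
  using value_type = std::iterator_traits<decltype(it)>::value_type;
  value_type copy(*it);

  static_assert(std::is_same<value_type, bool>::value,
                "value type of vector<bool>::iterator is bool");
  static_assert(!std::is_same<decltype(proxy), bool>::value,
                "auto deduces the proxy reference type, not bool");

  v[0] = false;           // modify the underlying element
  return copy && !proxy;  // the copy kept 'true', the proxy now reads 'false'
}
```

Direct-initialization (`value_type copy(*it);`) also covers value types whose constructor from the reference type is explicit, which copy-initialization would reject.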
11 hoursImplement -fexternal-blas64 option.Thomas Koenig7-15/+63
Libraries like Intel MKL use 64-bit integers in their API, but gfortran up to now only provides external BLAS for matmul with 32-bit integers. This straightforward patch provides a new option -fexternal-blas64 to remedy that situation. gcc/fortran/ChangeLog: * frontend-passes.cc (optimize_namespace): Handle flag_external_blas64. (call_external_blas): If flag_external_blas is set, use gfc_integer_4_kind as the argument kind, gfc_integer_8_kind otherwise. * gfortran.h (gfc_integer_8_kind): Define. * invoke.texi: Document -fexternal-blas64. * lang.opt: Add -fexternal-blas64. * lang.opt.urls: Regenerated. * options.cc (gfc_post_options): -fexternal-blas is incompatible with -fexternal-blas64. gcc/testsuite/ChangeLog: * gfortran.dg/matmul_blas_3.f90: New test.
14 hours[PR tree-optimization/58727] Don't over-simplify constantsShreya Munnangi2-0/+29
Here's Shreya's next patch. In pr58727 we have a case where the tree/gimple optimizers have decided to "simplify" constants involved in logical ops by turning off as many bits as they can in the hope that the simplified constant will be easier/smaller to encode. That "simplified" constant gets passed down into the RTL optimizers where it can ultimately cause a missed optimization. Concretely let's assume we have insns 6, 7, 9 as shown in the combine dump below:
> Trying 6, 7 -> 9:
>     6: r139:SI=r141:SI&0xfffffffffffffffd
>       REG_DEAD r141:SI
>     7: r140:SI=r139:SI&0xffffffffffbfffff
>       REG_DEAD r139:SI
>     9: r137:SI=r140:SI|0x2
>       REG_DEAD r140:SI
We can obviously see that insn 6 is redundant as the bit we turn off would be turned on by insn 9. But combine ultimately tries to generate:
> (set (reg:SI 137 [ _3 ])
>     (ior:SI (and:SI (reg:SI 141 [ a ])
>             (const_int -4194307 [0xffffffffffbffffd]))
>         (const_int 2 [0x2])))
That does actually match a pattern on RISC-V, but it's a pattern that generates two bit-clear insns (or a bit-clear followed by andi and a pattern we'll be removing someday). But if instead we IOR 0x2 back into the simplified constant we get:
> (set (reg:SI 137 [ _3 ])
>     (ior:SI (and:SI (reg:SI 141 [ a ])
>             (const_int -4194305 [0xffffffffffbfffff]))
>         (const_int 2 [0x2])))
That doesn't match, but when split by generic code in the combiner we get:
> Successfully matched this instruction:
> (set (reg:SI 140)
>     (and:SI (reg:SI 141 [ a ])
>         (const_int -4194305 [0xffffffffffbfffff])))
> Successfully matched this instruction:
> (set (reg:SI 137 [ _3 ])
>     (ior:SI (reg:SI 140)
>         (const_int 2 [0x2])))
Which is bclr+bset/ori, i.e., we dropped one of the logical AND operations. Bootstrapped and regression tested on x86 and riscv. Regression tested on the 30 or so embedded targets as well without new failures. I'll give this a couple days for folks to chime in before pushing on Shreya's behalf. 
This doesn't fix pr58727 for the other targets as they would need target dependent hackery. Jeff PR tree-optimization/58727 gcc/ * simplify-rtx.cc (simplify_context::simplify_binary_operation_1): In (A & C1) | C2, if (C1|C2) results in a constant with a single bit clear, then adjust C1 appropriately. gcc/testsuite/ * gcc.target/riscv/pr58727.c: New test.
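The transformation relies on the identity (A & C1) | C2 == (A & (C1 | C2)) | C2: any bit that C2 sets is set in the result regardless of what the AND did to it. The sketch below checks this with the constants from the dump; the helper names are invented for illustration and the condition is a plain reading of the ChangeLog text, not the code in simplify-rtx.cc.

```cpp
#include <cassert>
#include <cstdint>

// True if exactly one bit of c is zero, i.e. ~c is a power of two.
constexpr bool single_bit_clear(std::uint64_t c) {
  return ~c != 0 && (~c & (~c - 1)) == 0;
}

// Hypothetical helper: fold C2 back into C1 when that leaves a mask with a
// single clear bit (which a bit-clear instruction can encode directly).
constexpr std::uint64_t adjust_c1(std::uint64_t c1, std::uint64_t c2) {
  return single_bit_clear(c1 | c2) ? (c1 | c2) : c1;
}

// The expression shape being simplified: (A & C1) | C2.
constexpr std::uint64_t and_ior(std::uint64_t a, std::uint64_t c1,
                                std::uint64_t c2) {
  return (a & c1) | c2;
}

// Constants from the combine dump above.
static_assert(adjust_c1(0xffffffffffbffffdULL, 0x2) == 0xffffffffffbfffffULL,
              "IOR-ing 0x2 back in leaves only bit 22 clear");
```

The adjusted mask clears one bit, so the AND becomes a single bit-clear and the IOR a single bit-set.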
14 hours[gimplefe] fix SSA operand creationRichard Biener2-60/+161
When transitioning gcc.dg/torture/pr84830.c to a GIMPLE testcase to feed the IL into PRE that caused the original issue (and verify it's still there with the fix reverted), I noticed we put up SSA operands before having fully parsed the function and thus with not all variables having the final TREE_ADDRESSABLE state. The following fixes this, delaying update_stmt calls to when we create PHI nodes. It also makes the pr84830.c not rely on the particular fake exit edge source location by making the loop have an exit. gcc/c/ * gimple-parser.cc (c_parser_parse_gimple_body): Initialize SSA operands for each stmt. (c_parser_gimple_compound_statement): Append stmts without updating SSA operands. gcc/testsuite/ * gcc.dg/torture/pr84830.c: Turn into GIMPLE unit test for PRE.
16 hourss390: testsuite: Fix bitops-{1,2}.c and andc-splitter-2.cStefan Schulze Frielinghaus3-14/+18
After r16-2649-g0340177d54d tests fail for gcc.target/s390/arch13/bitops-{1,2}.c since sign extends in conjunction with (subreg (not a)) are folded, now. That is, of course, wanted. Since the original tests were about 32-bit operations, circumvent the sign extend by not returning a value but rather writing it to memory. Similar for andc-splitter-2.c sign extends are folded there, too. Since the test is not about 32- or 64-bit adjust the scan assembler directives only. gcc/testsuite/ChangeLog: * gcc.target/s390/arch13/bitops-1.c: Do not return a 32bit value but write it to memory. * gcc.target/s390/arch13/bitops-2.c: Ditto. * gcc.target/s390/md/andc-splitter-2.c: Adjust scan assembler directive because sign extends are folded, now.
18 hoursPreserve TREE_THIS_NOTRAP during inlining in more casesEric Botcazou1-21/+27
For parameters passed by reference, the Ada compiler sets TREE_THIS_NOTRAP on their dereference to prevent tree_could_trap_p from returning true and then causing a new basic block to be created for every access to them, given that in Ada the -fnon-call-exceptions flag is enabled by default. However, when the subprogram is inlined, this TREE_THIS_NOTRAP flag cannot be blindly preserved because the call may pass the dereference of a pointer as the argument: even if the compiler generates a check that the pointer is not null just before, preserving TREE_THIS_NOTRAP could cause an access to be hoisted before the check; therefore it gets cleared for parameters. Now that's suboptimal if the argument is a full object because accessing it through the dereference of the parameter cannot trap, which causes MEM_REFs of the form MEM_REF [&DECL] to be considered as trapping in the case where the nominal subtype of DECL is self-referential. gcc/ * tree-inline.cc (maybe_copy_this_notrap): New function. Also copy the TREE_THIS_NOTRAP flag for parameters when the argument is a full object and the parameter's type is self-referential. (remap_gimple_op_r): Call maybe_copy_this_notrap. (copy_tree_body_r): Likewise.
19 hourstestsuite, objective-c: Fix duplicate test names in 'special'.Iain Sandoe5-128/+31
For macOS/Darwin, we run Objective-C tests for both the GNU and NeXT runtimes (and these runs are usually differentiated by identifying the runtime in the test name). However, the 'special' sub-set of tests had a non-standard driver since it needs two sources for each test (but did not report the runtime in the test name and so shows duplicates). We can now automate the multi-source case with dg-additional-sources but need to do a little work to filter these additional sources from the set (since they also have a .m suffix). This addresses the FIXME in the original driver. Resolving the duplicated names means amending the reported name to include the runtime as a differentiator; as a result, test comparisons will temporarily report new and missing tests for any comparison that includes this change. gcc/testsuite/ChangeLog: * objc.dg/special/load-category-1.m: Add second source. * objc.dg/special/load-category-2.m: Likewise. * objc.dg/special/load-category-3.m: Likewise. * objc.dg/special/unclaimed-category-1.m: Likewise. * objc.dg/special/special.exp: Rewrite to make use of generic testsuite facilities. Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
20 hourstestsuite: arm: Simplify fp16-aapcs testsTorbjörn SVENSSON5-218/+24
Reduce fp16-aapcs testcases to return value testing since parameter passing is already tested in aapcs/vfp*.c. gcc/testsuite/ChangeLog: * gcc.target/arm/fp16-aapcs.c: New test. * gcc.target/arm/fp16-aapcs-1.c: Removed. * gcc.target/arm/fp16-aapcs-2.c: Likewise. * gcc.target/arm/fp16-aapcs-3.c: Likewise. * gcc.target/arm/fp16-aapcs-4.c: Likewise. Signed-off-by: Torbjörn SVENSSON <torbjorn.svensson@foss.st.com>
21 hourslibgomp: Init hash table for 'indirect'-clause of 'declare target' on the ↵Tobias Burnus7-101/+295
host [PR114445, PR119857] Especially with unified-shared memory and especially with C++'s virtual functions, it is not uncommon to have on the device a function pointer that points to the host function - but has an associated device. If the pointed-to function is (explicitly or implicitly) 'declare target' with the 'indirect' clause, it is added to the lookup table. Before this commit, the conversion of the lookup table into a lookup hash table happened every time a device kernel was launched on the first team - albeit if already converted, the function immediately returned. Ignoring the overhead, there was also a race: If multiple teams were launched, it could happen that another team of the same target region already tried to use the lookup table while it was still being created. Likewise when launching a kernel with 'nowait' and directly afterward another kernel, there could be a race of creating the table. With this commit, the creation of the hash table has been moved to the host-plugin's GOMP_OFFLOAD_load_image. The previous code stored a pointer to the host/device pointer array, which makes it hard when creating the hash table on the host (data is needed for finding the slot) - but accessing it on the device (where the lookup has to work as well). As the hash-table implementation (only) supports integral values as payload (0 and 1 having special meaning), the solution was to move to a uint128_t variable to store both the host and device address. As the host-side library is typically dynamically linked and the device-side one statically, there is the problem of backward compatibility. The current implementation permits both older binaries and newer libgomp and newer binaries with older libgomp. I could imagine us breaking the latter eventually, but for now there is up and downward compatibility. (Obviously, the race is only fixed if new + new is combined.) 
Code-wise, GOMP_INDIRECT_ADDR_MAP exists on the device and was updated to point to the host/device-address array. Now additionally GOMP_INDIRECT_ADDR_HMAP exists, which contains the hash-table map. If the latter exists, libgomp only updates it and the former remains a NULL pointer; it is also untouched if there are no indirect functions. Being NULL therefore avoids the call to the device-side build_indirect_map. The code also currently supports having no hash table and a linear walk. I think that remained from testing; due to the backward-compat feature, it can actually be turned off on either side. libgomp/ChangeLog: PR libgomp/119857 PR libgomp/114445 * config/accel/target-indirect.c: Change to use uint128_t instead of a struct as data structure and add GOMP_INDIRECT_ADDR_HMAP as host-accessible variable. (struct indirect_map_t): Remove. (USE_HASHTAB_LOOKUP, INDIRECT_DEV_ADDR, INDIRECT_HOST_ADDR, SET_INDIRECT_HOST_ADDR, SET_INDIRECT_ADDRS): Define. (htab_free): Use __builtin_unreachable. (htab_hash, htab_eq, GOMP_target_map_indirect_ptr, build_indirect_map): Update for new representation and new pointer-to-hash variable. * config/gcn/team.c (gomp_gcn_enter_kernel): Only call build_indirect_map when GOMP_INDIRECT_ADDR_MAP. * config/nvptx/team.c (gomp_nvptx_main): Likewise. * libgomp-plugin.h (GOMP_INDIRECT_ADDR_HMAP): Define. * plugin/plugin-gcn.c: Conditionally include build-target-indirect-htab.h. (USE_HASHTAB_LOOKUP_FOR_INDIRECT): Define. (create_target_indirect_map): New prototype. (GOMP_OFFLOAD_load_image): Update to create the device's indirect-function hash table on the host. * plugin/plugin-nvptx.c: Conditionally include build-target-indirect-htab.h. (USE_HASHTAB_LOOKUP_FOR_INDIRECT): Define. (create_target_indirect_map): New prototype. (GOMP_OFFLOAD_load_image): Update to create the device's indirect-function hash table on the host. * plugin/build-target-indirect-htab.h: New file.
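The idea of storing a host/device address pair as one integral hash-table payload can be sketched as follows. This is only an illustration: the type and function names are hypothetical, the choice of which half holds which address is an assumption (the commit message does not say), and `unsigned __int128` is a GCC/Clang extension.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 128-bit payload type (compiler extension, not standard C++).
__extension__ typedef unsigned __int128 packed_addr_t;

// Assumed layout: host address in the low 64 bits, device address in the
// high 64 bits. The real libgomp macros may use a different layout.
inline packed_addr_t pack_addrs(std::uint64_t host, std::uint64_t dev) {
  return (static_cast<packed_addr_t>(dev) << 64) | host;
}

inline std::uint64_t host_addr(packed_addr_t p) {
  return static_cast<std::uint64_t>(p);        // low half
}

inline std::uint64_t dev_addr(packed_addr_t p) {
  return static_cast<std::uint64_t>(p >> 64);  // high half
}
```

With a nonzero device address in the high half, the packed value can never collide with the reserved payload values 0 and 1 mentioned above.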
21 hourslibgomp: Add Fortran version of acc_copyout_finalize_async and ↵Tobias Burnus5-51/+320
acc_delete_finalize_async OpenACC 2.5 added several functions for C and Fortran; while acc_{copyout,delete}{,_finalize,_async} exist for both, for some reason only the C version of acc_{copyout,delete}_finalize_async was actually added, even though the documentation (.texi) and the .map file listed also the auxiliary Fortran functions! OpenACC 2.5 added the Fortran version with the following odd interface: 'type, dimension(:[,:]...)'. In OpenACC 2.6, it was then updated to the Fortran 2018 syntax: 'type(*), dimension(..)', which is also used in openacc.f90 internally. This commit now also updates the documentation to the newer syntax - plus fixes a function-name typo: acc_delete_async_finalize should have the _async at the end not in the middle! libgomp/ChangeLog: * libgomp.map (OACC_2.5): Move previously unimplemented acc_{copyout,delete}_finalize_async_{32,64,array}_h_ to ... (OACC_2.6.1): ... here. * libgomp.texi (acc_copyin, acc_present_or_copyin, acc_create, acc_present_or_create, acc_copyout, acc_update_device, acc_update_self, acc_is_present): Use 'type(*), dimension(..)' instead of 'type, dimension(:[,:]...)' for Fortran. (acc_delete): Likewise; change acc_delete_async_finalize to acc_delete_finalize_async. * openacc.f90 (openacc_internal): Add interfaces for acc_{copyout,delete}_finalize_async_{{32,64,array}_h,_l}. (openacc): Add generic interfaces for acc_copyout_finalize_async and acc_delete_finalize_async. (acc_{copyout,delete}_finalize_async_{32,64,array}_h): New. * openacc_lib.h: Add generic interfaces for acc_copyout_finalize_async and acc_delete_finalize_async. * testsuite/libgomp.oacc-fortran/pr92970-1.f90: New test.
24 hoursRISC-V: Add test for vec_duplicate + vwmulu.vv signed combine with GR2VR ↵Pan Li12-1/+69
cost 0, 1 and 15 Add asm dump check and run test for vec_duplicate + vwmulu.vv combine to vwmulu.vx, with the GR2VR cost is 0, 2 and 15. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u16.c: Add asm check for vwmulu.vx. * gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u32.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u64.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u16.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u32.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u64.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u16.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u32.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u64.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx_widen.h: Add test helper macros. * gcc.target/riscv/rvv/autovec/vx_vf/vx_widen_data.h: Add test data for vwmulu.vx run test. * gcc.target/riscv/rvv/autovec/vx_vf/vx_vwmulu-run-1-u64.c: New test. Signed-off-by: Pan Li <pan2.li@intel.com>
24 hoursRISC-V: Add test for vec_duplicate + vwsubu.vv signed combine with GR2VR ↵Pan Li12-2/+70
cost 0, 1 and 15 Add asm dump check and run test for vec_duplicate + vwsubu.vv combine to vwsubu.vx, with the GR2VR cost is 0, 2 and 15. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u16.c: Add asm check for vwsubu.vx. * gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u32.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u64.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u16.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u32.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u64.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u16.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u32.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u64.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx_widen.h: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx_widen_data.h: Add test data for run test. * gcc.target/riscv/rvv/autovec/vx_vf/vx_vwsubu-run-1-u64.c: New test. Signed-off-by: Pan Li <pan2.li@intel.com>
24 hoursRISC-V: Add test for vec_duplicate + vwaddu.vv signed combine with GR2VR ↵Pan Li13-0/+194
cost 0, 1 and 15 Add asm dump check and run test for vec_duplicate + vwaddu.vv combine to vwaddu.vx, with the GR2VR cost is 0, 2 and 15. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u16.c: Add asm check for vwaddu.vx. * gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u32.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u64.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u16.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u32.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u64.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u16.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u32.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u64.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx_vwaddu-run-1-u64.c: New test. * gcc.target/riscv/rvv/autovec/vx_vf/vx_widen.h: New test. * gcc.target/riscv/rvv/autovec/vx_vf/vx_widen_data.h: New test. * gcc.target/riscv/rvv/autovec/vx_vf/vx_widen_vx_run.h: New test. Signed-off-by: Pan Li <pan2.li@intel.com>
24 hoursRISC-V: Combine vec_duplicate + vwaddu.vv to vwaddu.vx on GR2VR costPan Li3-0/+61
This patch combines vec_duplicate + vwaddu.vv into vwaddu.vx. The related pattern depends on the cost of vec_duplicate from GR2VR: late-combine takes action if the GR2VR cost is zero, and rejects the combination if the GR2VR cost is greater than zero. Assume the GR2VR cost is 0.
Before this patch:
    beq a3,zero,.L8
    vsetvli a5,zero,e32,m1,ta,ma
    vmv.v.x v2,a2
    ...
  .L3:
    vsetvli a5,a3,e32,m1,ta,ma
    ...
    vwaddu.vv v1,v2,v3
    ...
    bne a3,zero,.L3
After this patch:
    beq a3,zero,.L8
    ...
  .L3:
    vsetvli a5,a3,e32,m1,ta,ma
    ...
    vwaddu.vx v1,a2,v3
    ...
    bne a3,zero,.L3
The pattern of this patch only works on DImode, i.e. the pattern below.
    v1:RVVM1DImode = (zero_extend:RVVM1DImode v2:RVVM1SImode) + (vec_dup:RVVM1DImode (zero_extend:DImode x2:SImode));
Unfortunately, for uint16_t to uint32_t or uint8_t to uint16_t, we lose this extend op after expand. For uint16_t => uint32_t we have:
    (set (reg:SI 149) (subreg/s/v:SI (reg/v:DI 146 [ rs1 ]) 0))
For uint32_t => uint64_t we have:
    (set (reg:DI 148 [ _6 ]) (zero_extend:DI (subreg/s/u:SI (reg/v:DI 146 [ rs1 ]) 0)))
We can see there is no zero_extend for uint16_t to uint32_t, so the pattern above cannot match. Combine will therefore try the pattern below for uint16_t to uint32_t.
    v1:RVVM1SImode = (zero_extend:RVVM1SImode v2:RVVM1HImode) + (vec_dup:RVVM1SImode (subreg:SImode (:DImode x2:SImode)))
But it cannot match the vwaddu semantics, thus we need separate handling of vwaddu.vv for uint16_t to uint32_t, as well as for uint8_t to uint16_t. gcc/ChangeLog: * config/riscv/autovec-opt.md (*widen_first_<any_extend:su>_vx_<mode>): Add helper bridge pattern for vwaddu.vx combine. (*widen_<any_widen_binop:optab>_<any_extend:su>_vx_<mode>): Add new pattern to match vwaddu.vx combine. * config/riscv/iterators.md: Add code attr to get extend CODE. * config/riscv/vector-iterators.md: Add Dmode iterator for widen. 
Signed-off-by: Pan Li <pan2.li@intel.com>
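The commit message does not show the source that produces the assembly above. A loop of roughly the following shape is the kind of code a widening unsigned add with a scalar operand comes from; the function name and signature are assumptions for illustration, not the macros from the testsuite headers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Widening unsigned add of a scalar to each element: both the vector element
// and the scalar rs1 are zero-extended from 32 to 64 bits before the add,
// which corresponds to the DImode pattern described in the commit message.
void widen_add_u32(std::uint64_t *out, const std::uint32_t *in,
                   std::uint32_t rs1, std::size_t n) {
  for (std::size_t i = 0; i < n; i++)
    out[i] = static_cast<std::uint64_t>(in[i])
             + static_cast<std::uint64_t>(rs1);
}
```

Whether a given compiler turns this loop into vwaddu.vx depends on the target options and the GR2VR cost setting, so the instruction selection should be checked in the generated assembly rather than assumed.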
24 hoursi386/testsuite: Correct res_ref2 array size for avx512bw-vpmov{,us}wb-2.cHaochen Jiang2-2/+2
Both of the tests under 128 bit are raising: warning: writing 16 bytes into a region of size 8 [-Wstringop-overflow=] when compiling, leading to a test fail. The warning is caused by the incorrect array size for res_ref2. The wrong size caused the overflow. Correct them in this patch to fix the test fail. gcc/testsuite/ChangeLog: * gcc.target/i386/avx512bw-vpmovuswb-2.c: Correct res_ref2 array size. * gcc.target/i386/avx512bw-vpmovwb-2.c: Ditto.
24 hoursi386/testsuite: Fix scan tree dump in vect-epilogue-4.cHaochen Jiang1-1/+2
vect-epilogue-4.c uses masked 64-byte vectors to vectorize the epilogue. As with the r16-876 fix for vect-epilogue-5.c, we need to adjust the scan tree dump. gcc/testsuite/ChangeLog: * gcc.target/i386/vect-epilogues-4.c: Fix for epilogue vect tree dump.
27 hourslibstdc++: Explicitly pass -Wsystem-headers in tests that need itPatrick Palka5-1/+5
When running libstdc++ tests using an installed gcc (as opposed to an in-tree gcc), we naturally use system stdlib headers instead of the in-tree headers. But warnings from within system headers are suppressed by default, so tests that check for such warnings spuriously fail in such a setup. This patch makes us compile such tests with -Wsystem-headers so that they consistently pass. libstdc++-v3/ChangeLog: * testsuite/20_util/bind/dangling_ref.cc: Compile with -Wsystem-headers. * testsuite/20_util/ratio/operations/ops_overflow_neg.cc: Likewise. * testsuite/20_util/unique_ptr/lwg4148.cc: Likewise. * testsuite/29_atomics/atomic/operators/pointer_partial_void.cc: Likewise. * testsuite/30_threads/packaged_task/cons/dangling_ref.cc: Likewise. Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
27 hoursDaily bump.GCC Administrator6-1/+207
29 hoursc: Reject gimple and rtl functions as nested functions [PR121421]Andrew Pinski2-1/+20
These two don't make sense as nested functions as they both don't handle the unnesting and/or have support for the static chain. So let's reject them. Bootstrapped and tested on x86_64-linux-gnu. PR c/121421 gcc/c/ChangeLog: * c-parser.cc (c_parser_declaration_or_fndef): Error out for gimple and rtl functions as nested functions. gcc/testsuite/ChangeLog: * gcc.dg/gimplefe-error-16.c: New test. Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
34 hoursdocs: Adjust -Wimplicit-fallthrough= documentation for C23Jakub Jelinek1-1/+1
I've noticed that in the -Wimplicit-fallthrough= documentation we talk about [[fallthrough]]; for C++17 but don't mention that it is also the standard way to suppress the warning for C23. 2025-09-16 Jakub Jelinek <jakub@redhat.com> * doc/invoke.texi (Wimplicit-fallthrough=): Document that also C23 provides a standard way to suppress the warning with [[fallthrough]];.
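For reference, this is how the attribute is used to mark a deliberate fall through. The example is written as C++17; per the commit message the same `[[fallthrough]];` spelling is standard in C23. The function `classify` is made up for illustration.

```cpp
#include <cassert>

int classify(int n) {
  int score = 0;
  switch (n) {
  case 0:
    score += 1;
    [[fallthrough]];  // deliberate: case 0 also gets the case 1 handling
  case 1:
    score += 10;
    break;
  default:
    score = -1;
    break;
  }
  return score;
}
```

Without the attribute, -Wimplicit-fallthrough would be expected to warn at the `case 1:` label.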
34 hourslibstdc++: Optimize determination of std::tuple_cat return typeJonathan Wakely2-27/+26
The std::tuple_cat function has to determine a std::tuple return type from zero or more tuple-like arguments. This uses the __make_tuple class template to transform a tuple-like type into a std::tuple, and the __combine_tuples class template to combine zero or more std::tuple types into a single std::tuple type. This change optimizes the __make_tuple class template to use an _Index_tuple and pack expansion instead of recursive instantiation, and optimizes __combine_tuples to use fewer levels of recursion. For ranges::adjacent_view's __detail::__repeated_tuple helper we can just use the __make_tuple class template directly, instead of doing overload resolution on std::tuple_cat to get its return type. libstdc++-v3/ChangeLog: * include/std/ranges (__detail::__repeated_tuple): Use __make_tuple helper alias directly, instead of doing overload resolution on std::tuple_cat. * include/std/tuple (__make_tuple_impl): Remove. (__do_make_tuple): Replace recursion with _Index_tuple and pack expansion. (__make_tuple): Adjust to new __do_make_tuple definition. (__combine_tuples<tuple<T1s...>, tuple<T2s...>, Rem...>): Replace with a partial specialization for exactly two tuples and a partial specialization for three or more tuples. Reviewed-by: Patrick Palka <ppalka@redhat.com>
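The technique of replacing recursive instantiation with an index pack can be sketched with standard facilities. Here `std::index_sequence` plays the role of libstdc++'s internal `_Index_tuple`, and the names `to_tuple_impl` and `to_tuple_t` are invented for this example; they are not the library's `__do_make_tuple` or `__make_tuple`.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <tuple>
#include <type_traits>
#include <utility>

template<typename TupleLike, typename Indices>
struct to_tuple_impl;

// One instantiation expands all element types at once via the index pack,
// instead of peeling off one element per recursive instantiation.
template<typename TupleLike, std::size_t... Is>
struct to_tuple_impl<TupleLike, std::index_sequence<Is...>> {
  using type = std::tuple<typename std::tuple_element<Is, TupleLike>::type...>;
};

template<typename TupleLike>
using to_tuple_t = typename to_tuple_impl<
    TupleLike,
    std::make_index_sequence<std::tuple_size<TupleLike>::value>>::type;

static_assert(std::is_same<to_tuple_t<std::pair<int, char>>,
                           std::tuple<int, char>>::value,
              "pair<int, char> maps to tuple<int, char>");
static_assert(std::is_same<to_tuple_t<std::array<long, 3>>,
                           std::tuple<long, long, long>>::value,
              "array<long, 3> maps to tuple<long, long, long>");
```

The instantiation depth stays constant in the number of elements, which is the compile-time saving the commit is after.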
35 hourslibstdc++: ranges::rotate should not use 'auto' with ranges::iter_move ↵Jonathan Wakely1-2/+2
[PR121913] The r16-3835-g7801236069a95c change to use ranges::iter_move should also have used iter_value_t<_Iter> to ensure we get an object of the value type, not a proxy reference. libstdc++-v3/ChangeLog: PR libstdc++/121913 * include/bits/ranges_algo.h (__rotate_fn::operator()): Use iter_value_t<_Iter> instead of deduced type. Reviewed-by: Patrick Palka <ppalka@redhat.com>
35 hourslibstdc++: Fix missing change to views::pairwise from P2165R4 [PR121956]Jonathan Wakely2-3/+14
ranges::adjacent_view::_Iterator::value_type should have been changed by r14-8710-g65b4cba9d6a9ff to always produce std::tuple, even for the N == 2 views::pairwise specialization. libstdc++-v3/ChangeLog: PR libstdc++/121956 * include/std/ranges (adjacent_view::_Iterator::value_type): Always define as std::tuple<T, N>, not std::pair<T, T>. * testsuite/std/ranges/adaptors/adjacent/1.cc: Check value type of views::pairwise. Reviewed-by: Patrick Palka <ppalka@redhat.com>
35 hoursxtensa: Simplify the definition of REGNO_OK_FOR_BASE_P() and avoid calling ↵Takayuki 'January June' Suwa2-3/+4
it directly In recent gcc versions, REGNO_OK_FOR_BASE_P() is not called directly, but rather via regno_ok_for_base_p() which is a wrapper in gcc/addresses.h. The wrapper obtains a hard register number from pseudo via reg_renumber array, so REGNO_OK_FOR_BASE_P() does not need to take this into consideration. On the other hand, since there is only one use of REGNO_OK_FOR_BASE_P() in the target-specific code, it would make more sense to simplify the definition of REGNO_OK_FOR_BASE_P() and replace its call with that of regno_ok_for_base_p(). gcc/ChangeLog: * config/xtensa/xtensa.cc (#include): Add "addresses.h". * config/xtensa/xtensa.h (REGNO_OK_FOR_BASE_P): Simplify to just a call to GP_REG_P(). (BASE_REG_P): Replace REGNO_OK_FOR_BASE_P() with the equivalent call to regno_ok_for_base_p().
39 hoursAArch64: Add isnan expander [PR 66462]Wilco Dijkstra2-0/+49
Add an expander for isnan using integer arithmetic. Since isnan is just a compare, enable it only with -fsignaling-nans to avoid generating spurious exceptions. This fixes part of PR66462. int isnan1 (float x) { return __builtin_isnan (x); } Before: fcmp s0, s0 cset w0, vs ret After: fmov w1, s0 mov w0, -16777216 cmp w0, w1, lsl 1 cset w0, cc ret gcc: PR middle-end/66462 * config/aarch64/aarch64.md (isnan<mode>2): Add new expander. gcc/testsuite: PR middle-end/66462 * gcc.target/aarch64/pr66462.c: Update test.
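The "After" sequence can be read as the following integer test, shown here as a portable sketch that assumes `float` is IEEE 754 binary32. Shifting left by one discards the sign bit; what remains is greater than 0xFF000000 exactly when the exponent is all ones and the mantissa is nonzero. The function name `isnan_bits` is made up for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

bool isnan_bits(float x) {
  std::uint32_t bits;
  std::memcpy(&bits, &x, sizeof bits);  // reinterpret without a float compare
  // 0xFF000000 is the constant -16777216 from the assembly above;
  // infinity shifts to exactly 0xFF000000, NaNs to something larger.
  return static_cast<std::uint32_t>(bits << 1) > 0xFF000000u;
}
```

Because no floating-point comparison is executed, a signaling NaN input does not raise an exception, which is why the expander is tied to -fsignaling-nans.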
40 hoursUnify last two vect_transform_slp_perm_load callsRichard Biener1-16/+23
The following unifies the vect_transform_slp_perm_load call done in vectorizable_load with that eventually done in get_load_store_type. On the way it fixes the conditions on which we can allow VMAT_ELEMENTWISE or VMAT_GATHER_SCATTER when there's a SLP permutation (and we arrange to not code generate that). In particular that only works for single-lane SLP of non-grouped loads or groups of size one. VMAT_ELEMENTWISE does not (yet?) materialize a permutation upon vector build but still relies on vect_transform_slp_perm_load. * tree-vect-stmts.cc (get_load_store_type): Get in a flag whether a SLP_TREE_LOAD_PERMUTATION on the node can be code generated and use it. Fix the condition on using strided gather/scatter to avoid dropping a meaningful permutation. (vectorizable_store): Adjust. (vectorizable_load): Analyze the permutation early and pass the result down to get_load_store_type. Fix the condition on when we are allowed to elide a load permutation.
42 hourslibstdc++: Do not use _GLIBCXX_MAKE_MOVE_ITERATOR for C++17Jonathan Wakely1-6/+6
The _GLIBCXX_MAKE_MOVE_ITERATOR macro is needed for code that needs to compile as C++98, where it just produces the original iterator. In std::uninitialized_move and std::uninitialized_move_n we can just call std::make_move_iterator directly. libstdc++-v3/ChangeLog: * include/bits/stl_uninitialized.h (uninitialized_move) (uninitialized_move_n): Replace _GLIBCXX_MAKE_MOVE_ITERATOR with std::make_move_iterator. Reviewed-by: Patrick Palka <ppalka@redhat.com>
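As a rough illustration of why the macro is unnecessary in C++17-only code, a move into uninitialized storage can be expressed as a copy through move iterators. The name `uninitialized_move_sketch` is hypothetical, and this is a simplified stand-in rather than the libstdc++ implementation.

```cpp
#include <iterator>
#include <memory>

// Hypothetical stand-in for std::uninitialized_move: wrapping the source
// range in move iterators makes std::uninitialized_copy move-construct each
// element into the raw destination storage.
template <typename InputIt, typename ForwardIt>
ForwardIt uninitialized_move_sketch(InputIt first, InputIt last, ForwardIt dest)
{
    return std::uninitialized_copy(std::make_move_iterator(first),
                                   std::make_move_iterator(last),
                                   dest);
}
```

In C++98 mode there are no move iterators, which is why shared code paths need a macro that falls back to the original iterator; functions that only exist from C++17 onward have no such constraint.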
42 hourslibstdc++: Fix more missing uses of iter_difference_t [PR119820]Jonathan Wakely2-2/+2
libstdc++-v3/ChangeLog: PR libstdc++/119820 * include/bits/ranges_algo.h (__shuffle_fn): Use ranges::distance to get difference type value to add to iterator. * include/std/format (__formatter_str::_M_format_range): Use ranges::next to increment iterator by a size_t value. Reviewed-by: Patrick Palka <ppalka@redhat.com>
43 hoursaarch64: Force vector in SVE gimple_folder::fold_active_lanes_to.Jennifer Schmitz9-0/+81
An ICE was reported in the following test case: svint8_t foo(svbool_t pg, int8_t op2) { return svmul_n_s8_z(pg, svdup_s8(1), op2); } with a type mismatch in 'vec_cond_expr': _4 = VEC_COND_EXPR <v16_2(D), v32_3(D), { 0, ... }>; The reason is that svmul_impl::fold folds calls where one of the operands is all ones to the other operand using gimple_folder::fold_active_lanes_to. However, we implicitly assumed that the argument that is passed to fold_active_lanes_to is a vector type. In the given test case op2 is a scalar type, resulting in the type mismatch in the vec_cond_expr. This patch fixes the ICE by forcing a vector type of the argument in fold_active_lanes_to before the statement with the vec_cond_expr. In the initial version of this patch, the force_vector statement was placed in svmul_impl::fold, but it was moved to fold_active_lanes_to to align it with fold_const_binary which takes care of the fixup from scalar to vector type using vector_const_binop. The patch was bootstrapped and tested on aarch64-linux-gnu, no regression. OK for trunk? OK to backport to GCC 15? Signed-off-by: Jennifer Schmitz <jschmitz@nvidia.com> gcc/ PR target/121602 * config/aarch64/aarch64-sve-builtins.cc (gimple_folder::fold_active_lanes_to): Add force_vector statement. gcc/testsuite/ PR target/121602 * gcc.target/aarch64/sve/acle/asm/mul_s16.c: New test. * gcc.target/aarch64/sve/acle/asm/mul_s32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/mul_s64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/mul_s8.c: Likewise. * gcc.target/aarch64/sve/acle/asm/mul_u16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/mul_u32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/mul_u64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/mul_u8.c: Likewise.
43 hoursada: Fix error message for Stream_SizeRonan Desplanques3-315/+232
Before this patch, confirming Stream_Size aspect specifications on elementary types were incorrectly rejected when the stream size was 128, and the error messages emitted for Stream_Size aspect errors gave incorrect possible values. This patch fixes this. The most significant part of the fix is a new subprogram in Exp_Strm, Get_Primitives, that makes it possible to retrieve a precise list of supported stream sizes, but also to select the right runtime streaming primitives for a given type. Using the latter, this patch factorizes code that was present in both Build_Elementary_Input_Call and Build_Elementary_Write_Call. gcc/ada/ChangeLog: * exp_strm.ads (Get_Primitives): New function. * exp_strm.adb (Get_Primitives): Likewise. (Build_Elementary_Input_Call, Build_Elementary_Write_Call): use Get_Primitives. (Has_Stream_Standard_Rep): Add formal parameter and rename to... (Is_Stream_Standard_Rep): New function. * sem_ch13.adb (Analyze_Attribute_Definition_Clause): Fix error emission.
43 hoursada: Revert "Remove dependence on secondary stack for type with controlled component"Gary Dismukes3-28/+47
This reverts commit 91b51fc42b167eedaaded6360c490a4306bc5c55.
44 hoursAda, libgnarl: Fix Ada bootstrap for Darwin.Iain Sandoe1-2/+1
Recent changes to Ada have produced a new diagnostic: s-osinte.adb:34:18: warning: unit "Interfaces.C.Extensions"... which causes a bootstrap fail on Darwin when Ada is enabled. Fixed thus. PR ada/114065 gcc/ada/ChangeLog: * libgnarl/s-osinte__darwin.adb: Add and reference clause for Interfaces.C, remove clause for Interfaces.C.Extensions. Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
48 hoursRISC-V: Allow profiles input in '--with-arch' option.Jiawei2-4/+32
Allow RISC-V profiles as input to '--with-arch', and validate them against 'riscv-profiles.def'. gcc/ChangeLog: * config.gcc: Accept RISC-V profiles in `--with-arch`. * config/riscv/arch-canonicalize: Add profile detection and skip canonicalization for profiles.
48 hoursRISC-V: Configure Profiles definitions in the definition file.Jiawei2-55/+91
Move the RISC-V profile definitions into 'riscv-profiles.def'. Add comments for 'riscv_profiles'. gcc/ChangeLog: * common/config/riscv/riscv-common.cc (struct riscv_profiles): Add comments. (RISCV_PROFILE): Removed. * config/riscv/riscv-profiles.def: New file.
48 hoursRISC-V: Imply zicsr for sdtrig and ssstrict extensions.Dongyan Chen1-2/+2
This patch implies zicsr for sdtrig and ssstrict extensions. According to the riscv-privileged spec, the sdtrig and ssstrict extensions are privileged extensions, so they should imply zicsr. gcc/ChangeLog: * config/riscv/riscv-ext.def: Imply zicsr.
2 daysi386/testsuite: Fix non unique name testsHaochen Jiang23-66/+53
After r16-3651, the compare_tests script explicitly reports tests that share the same name, which helps us review all the tests we have. Most of the duplicates are unintentional typos (e.g., testing the same vector size repeatedly in scan-assembler). This commit fixes them. gcc/testsuite/ChangeLog: * gcc.target/i386/avx512bw-vpackssdw-1.c: Fix xmm/ymm mask tests. * gcc.target/i386/avx512bw-vpacksswb-1.c: Ditto. * gcc.target/i386/avx512bw-vpackusdw-1.c: Ditto. * gcc.target/i386/avx512bw-vpackuswb-1.c: Ditto. * gcc.target/i386/avx512bw-vpermw-1.c: Test xmm. * gcc.target/i386/avx512bw-vpmulhw-1.c: Fix xmm/ymm mask tests. * gcc.target/i386/avx512f-vec-init.c: Remove duplicate test. * gcc.target/i386/avx512fp16-13.c: Fix test for aligned load. * gcc.target/i386/avx512fp16-conjugation-1.c: Revise the test to test more precisely on masks. * gcc.target/i386/avx512fp16vl-conjugation-1.c: Ditto. * gcc.target/i386/avx512vbmi-vpermb-1.c: Test xmm. * gcc.target/i386/avx512vl-vcvtpd2ps-1.c: Fix scan asm. * gcc.target/i386/avx512vl-vinsert-1.c: Fix typo. * gcc.target/i386/avx512vl-vpmulld-1.c: Fix xmm/ymm mask tests. * gcc.target/i386/avx512vl-vptestmd-1.c: Ditto. * gcc.target/i386/bitwise_mask_op-1.c: Fix typo. * gcc.target/i386/cond_op_shift_q-1.c: Test both vpsra{,v} and vpsll{,v}. * gcc.target/i386/cond_op_shift_ud-1.c: Ditto. * gcc.target/i386/cond_op_shift_uq-1.c: Ditto. * gcc.target/i386/memcpy-pr95886.c: Fix the wrong const int. * gcc.target/i386/part-vect-sqrtph-1.c: Remove duplicate test. * gcc.target/i386/pr107432-7.c: Test vpmov{s,z}xbw instead of vpmov{s,z}xbd. * gcc.target/i386/pr88828-0.c: Fix pblendw scan asm.
2 daysOptimize vpermpd to vbroadcastf128 for specific permutations.liuhongt3-0/+51
gcc/ChangeLog: * config/i386/predicates.md (avx_vbroadcast128_operand): New predicate. * config/i386/sse.md (*avx_vbroadcastf128_<mode>_perm): New pre_reload splitter. gcc/testsuite/ChangeLog: * gcc.target/i386/avx_vbroadcastf128.c: New test.
2 daysDaily bump.GCC Administrator7-1/+516
2 days[analyzer] another function name that returns a pointer to errnoAlexandre Oliva1-0/+1
Add __get_errno_ptr() as yet another synonym for __errno_location. for gcc/analyzer/ChangeLog * kf.cc (register_known_functions): Add __get_errno_ptr.
2 daysaarch64: move pr113356.C under g++.targetClément Chigot1-0/+0
This test requires a C++ compiler. for gcc/testsuite/ChangeLog * gcc.target/aarch64/pr113356.C: Move to ... * g++.target/aarch64/pr113356.C: ... here.
2 days[ppc] [vxworks] allow code model selectionAlexandre Oliva1-0/+5
Bring code model selection logic to vxworks.h as well. for gcc/ChangeLog * config/rs6000/vxworks.h (TARGET_CMODEL, SET_CMODEL): Define.