riscv-gnu-toolchain/gcc.git

Age	Commit message	Author	Files	Lines
2024-05-10	RISC-V: Fix typos in code or comment [NFC]	Kito Cheng	2	-50/+50
	Just found some typos when fixing bugs and then used aspell to find a few more; this patch does nothing other than fix typos. gcc/ChangeLog: * config/riscv/riscv-vsetvl.cc: Fix typos in comments. (get_all_predecessors): Ditto. (pre_vsetvl::m_unknow_info): Rename to... (pre_vsetvl::m_unknown_info): this. (pre_vsetvl::compute_vsetvl_def_data): Rename m_unknow_info to m_unknown_info. (pre_vsetvl::cleaup): Rename to... (pre_vsetvl::cleanup): this. (pre_vsetvl::compute_vsetvl_def_data): Fix typos. (pass_vsetvl::lazy_vsetvl): Update function name and fix typos. * config/riscv/riscv.cc: Fix typos in comments. (struct machine_function): Fix typo in comments. (riscv_valid_lo_sum_p): Ditto. (riscv_force_address): Ditto. (riscv_immediate_operand_p): Ditto. (riscv_in_small_data_p): Ditto. (riscv_first_stack_step): Ditto. (riscv_expand_prologue): Ditto. (riscv_convert_vector_chunks): Ditto. (riscv_override_options_internal): Ditto. (get_common_costs): Ditto.
2024-05-10	driver: Move -fdiagnostics-urls= early like -fdiagnostics-color= [PR114980]	Xi Ruoyao	1	-0/+13
	In GCC 14 we started to emit URLs for "command-line option <option> is valid for <language> but not <another language>" and "-Werror= argument '-Werror=<option>' is not valid for <language>" warnings. So we should have moved -fdiagnostics-urls= early like -fdiagnostics-color=; otherwise -fdiagnostics-urls= cannot control URLs in these warnings. No test cases are added because with TERM=xterm-256color PR114980 already triggers some test failures. gcc/ChangeLog: PR driver/114980 * opts-common.cc (prune_options): Move -fdiagnostics-urls= early like -fdiagnostics-color=.
2024-05-09	[committed] [RISC-V] Provide splitting guidance to combine to facilitate shNadd.uw generation	Jeff Law	2	-0/+52
	This fixes a minor code quality issue I found while comparing GCC and LLVM. Essentially we want to do a bit of re-association to generate shNadd.uw instructions. Combine does the right thing and finds all the necessary instructions, reassociates the operands, combines constants, etc. Where it fails is in finding a good split point. The backend can trivially provide guidance on how to split via a define_split pattern. This has survived both Ventana's internal CI system (rv64gcb) as well as my own (rv64gc, rv32gcv). I'll wait for the external CI system to give the all-clear before pushing. gcc/ * config/riscv/bitmanip.md: Add splitter for shadd feeding another add instruction. gcc/testsuite/ * gcc.target/riscv/zba-shadduw.c: New test.
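A rough model of the arithmetic the new splitter targets: `shNadd_uw` spells out the Zba semantics (`rd = rs2 + (zext32(rs1) << N)`), and two hypothetical functions compute the same value with and without re-association, the second as an `sh3add.uw` whose result feeds an ordinary add. This sketches the arithmetic only, not the define_split in bitmanip.md.

```cpp
#include <cstdint>

// Semantics of the Zba shNadd.uw instructions (N = 1, 2, 3): zero-extend the
// low 32 bits of rs1, shift left by N, add rs2.  Unsigned arithmetic wraps
// modulo 2^64, like the 64-bit register.
constexpr uint64_t shNadd_uw(unsigned n, uint64_t rs1, uint64_t rs2) {
  return rs2 + (static_cast<uint64_t>(static_cast<uint32_t>(rs1)) << n);
}

// Hypothetical source shape: base + (zero-extended 32-bit index << 3) + constant.
constexpr uint64_t scaled_index(uint64_t base, uint32_t idx, uint64_t off) {
  return base + (static_cast<uint64_t>(idx) << 3) + off;
}

// Hypothetical split form: an sh3add.uw feeding a plain add of the constant.
constexpr uint64_t scaled_index_split(uint64_t base, uint32_t idx, uint64_t off) {
  return shNadd_uw(3, idx, base) + off;
}
```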
2024-05-10	Revert: "Enable prange support." [PR114985]	Aldy Hernandez	13	-18/+31
	This reverts commit 36e877996936abd8bd08f8b1d983c8d1023a5842 until the IPA pass is fixed with regard to POINTER = POINTER <RELOP> POINTER.
2024-05-09	Constant fold {-1,-1} << 1 in simplify-rtx.cc	Roger Sayle	1	-0/+54
	This patch addresses a missed optimization opportunity in the RTL optimization passes. The function simplify_const_binary_operation will constant fold binary operators with two CONST_INT operands, and those with two CONST_VECTOR operands, but is missing compile-time evaluation of binary operators with a CONST_VECTOR and a CONST_INT, such as vector shifts and rotates. The first version of this patch didn't contain a switch statement to explicitly check for valid binary opcodes, which bootstrapped and regression tested fine, but my paranoia has got the better of me, so this version now checks that VEC_SELECT or some funky (future) rtx_code doesn't cause problems. 2024-05-09 Roger Sayle <roger@nextmovesoftware.com> gcc/ChangeLog * simplify-rtx.cc (simplify_const_binary_operation): Constant fold binary operations where the LHS is CONST_VECTOR and the RHS is CONST_INT (or CONST_DOUBLE) such as vector shifts.
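The newly folded case can be pictured as element-wise evaluation: for a vector left shift by a scalar count, every element of the constant vector is shifted by the same amount, so `{-1,-1} << 1` becomes `{-2,-2}`. The sketch below uses a hypothetical `std::array` helper; it is not the `simplify_const_binary_operation` code, which operates on RTL constants.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical helper modelling (ashift (const_vector ...) (const_int count)):
// the scalar count is applied to each element.  count must be below 64.
template <std::size_t N>
constexpr std::array<int64_t, N> fold_vec_ashift(std::array<int64_t, N> v, unsigned count) {
  for (std::size_t i = 0; i < N; ++i)
    // Shift as unsigned so negative elements are well defined, then wrap back.
    v[i] = static_cast<int64_t>(static_cast<uint64_t>(v[i]) << count);
  return v;
}
```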
2024-05-09	c++: failure to suppress -Wsizeof-array-div in template [PR114983]	Marek Polacek	3	-0/+30
	-Wsizeof-array-div offers a way to suppress the warning by wrapping the second operand of the division in parens: sizeof (samplesBuffer) / (sizeof(unsigned char)) but this doesn't work in a template, because we fail to propagate the suppression bits. Do it, then. The finish_parenthesized_expr hunk is not needed because suppress_warning isn't very fine-grained. But I think it makes sense to be explicit and not rely on OPT_Wparentheses also suppressing OPT_Wsizeof_array_div. PR c++/114983 gcc/cp/ChangeLog: * pt.cc (tsubst_expr) <case SIZEOF_EXPR>: Use copy_warning. * semantics.cc (finish_parenthesized_expr): Also suppress -Wsizeof-array-div. gcc/testsuite/ChangeLog: * g++.dg/warn/Wsizeof-array-div3.C: New test.
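A minimal C++ version of the idiom, using a hypothetical `buffer_bytes` template: dividing the size of an `int` array by `sizeof (unsigned char)` is deliberately a byte count, and the parentheses around the divisor mark that as intentional. Per the message above, before this fix g++ could still warn here because the suppression was not propagated when the template was instantiated.

```cpp
#include <cstddef>

// Hypothetical template: byte size of a 4-element array of T.  For T = int
// the divisor is not the element size, so without the parentheses
// -Wsizeof-array-div would warn.
template <typename T>
constexpr std::size_t buffer_bytes() {
  T samplesBuffer[4] = {};
  return sizeof (samplesBuffer) / (sizeof (unsigned char));
}
```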
2024-05-09	testsuite: Fix up pr84508* tests [PR84508]	Jakub Jelinek	2	-2/+4
	The tests FAIL on x86_64-linux with /usr/bin/ld: cannot find -lubsan collect2: error: ld returned 1 exit status compiler exited with status 1 FAIL: gcc.target/i386/pr84508-1.c (test for excess errors) Excess errors: /usr/bin/ld: cannot find -lubsan The problem is that only .dg/ubsan/ubsan.exp calls ubsan_init, which adds the needed search paths for the libubsan library. So, link/run tests for -fsanitize=undefined need to go into gcc.dg/ubsan/ or g++.dg/ubsan/, even when they are target specific. 2024-05-09 Jakub Jelinek <jakub@redhat.com> PR target/84508 * gcc.target/i386/pr84508-1.c: Move to ... * gcc.dg/ubsan/pr84508-1.c: ... here. Restrict to i?86/x86_64 non-ia32 targets. * gcc.target/i386/pr84508-2.c: Move to ... * gcc.dg/ubsan/pr84508-2.c: ... here. Restrict to i?86/x86_64 non-ia32 targets.
2024-05-09	PR modula2/115003 exporting a symbol to outer scope with a name clash causes ICE	Gaius Mulley	1	-0/+1
	An ICE will occur if an unknown symbol is exported and causes a name clash. The error mechanism attempts to find the scope of an unknown symbol. This patch adds a missing case clause to GetScope and returns NulSym if the scope is an unknown symbol. gcc/m2/ChangeLog: PR modula2/115003 * gm2-compiler/SymbolTable.mod (GetScope): Add UndefinedSym case clause and return NulSym. Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>
2024-05-09	c++: lambda capturing structured bindings [PR85889]	Marek Polacek	3	-1/+23
	<https://wg21.link/p1381r1> clarifies that it's OK to capture structured bindings. [expr.prim.lambda.capture]/4 says "The identifier in a simple-capture shall denote a local entity" and [basic.pre]/3: "An entity is a [...] structured binding". It doesn't appear that this was made a DR, so, strictly speaking, we should have a -Wc++20-extensions warning, like clang++. PR c++/85889 gcc/cp/ChangeLog: * lambda.cc (add_capture): Add a pedwarn for capturing structured bindings. gcc/testsuite/ChangeLog: * g++.dg/cpp2a/decomp3.C: Use -Wno-c++20-extensions. * g++.dg/cpp1z/decomp60.C: New test.
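A small example of the construct being diagnosed, using a hypothetical `sum_of_pair` function: the lambda captures the structured bindings `a` and `b` by copy. This is valid C++20; per the ChangeLog, g++ now issues a pedwarn (-Wc++20-extensions) for it in earlier language modes, and other compilers may reject it outright there.

```cpp
#include <cassert>
#include <utility>

// Hypothetical function: capture two structured bindings by copy.
int sum_of_pair(std::pair<int, int> p) {
  auto [a, b] = p;
  auto add = [a, b] { return a + b; };   // capturing structured bindings
  return add();
}
```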
2024-05-09	sra: Do not leave work for DSE (that it can sometimes not perform)	Martin Jambor	3	-6/+17
	When looking again at the g++.dg/tree-ssa/pr109849.C testcase we discovered that it generates terrible store-to-load forwarding stalls because SRA was leaving behind aggregate loads but all the stores were by scalar parts and DSE failed to remove the useless load. SRA has all the knowledge to remove the statement even now, so this small patch makes it do so. With this patch, the g++.dg/tree-ssa/pr109849.C micro-benchmark runs 9 times faster (on an AMD EPYC 75F3 machine). gcc/ChangeLog: 2024-04-18 Martin Jambor <mjambor@suse.cz> * tree-sra.cc (sra_modify_assign): Remove the original statement also when dealing with a store to a fully covered aggregate from a non-candidate. gcc/testsuite/ChangeLog: 2024-04-23 Martin Jambor <mjambor@suse.cz> * g++.dg/tree-ssa/pr109849.C: Also check that the aggregate store to cur disappears. * gcc.dg/tree-ssa/ssa-dse-26.c: Instead of relying on DSE, check that the unwanted stores were removed at early SRA time.
2024-05-09	Manually update entries for the Revert Revert commits.	Jakub Jelinek	2	-0/+23

2024-05-09	Daily bump.	GCC Administrator	7	-1/+1254

2024-05-09	RISC-V: Make full-vec-move1.c test robust for optimization	Pan Li	1	-2/+4
	While investigating support for early-break autovectorization, we noticed that the test full-vec-move1.c is optimized to 'return 0;' in the body of main. Because the value of type V turns out to be a compile-time constant, the second loop is reduced to assert (true), and the ccp4 pass eliminates these statements and just returns 0. typedef int16_t V __attribute__((vector_size (128))); int main () { V v; for (int i = 0; i < sizeof (v) / sizeof (v[0]); i++) (v)[i] = i; V res = v; for (int i = 0; i < sizeof (v) / sizeof (v[0]); i++) assert (res[i] == i); // will be optimized to assert (true) } This patch introduces an extern function that uses res[i], which prevents the ccp4 optimization. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vls-vlmax/full-vec-move1.c: Introduce extern func use to get rid of ccp4 optimization. Signed-off-by: Pan Li <pan2.li@intel.com>
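The shape of the fix, sketched in C++ with GCC's vector extension: each element of `res` is passed to a function the optimizer cannot see through, so the checks cannot be folded to `true`. The commit uses an extern function; here a hypothetical `check_elem` marked `noipa` plays that role so the block stays self-contained.

```cpp
#include <cassert>
#include <cstdint>

typedef int16_t V __attribute__ ((vector_size (128)));

// Hypothetical stand-in for the extern function: noipa stops GCC from
// inlining or analysing the call, so the checked values stay observable.
__attribute__ ((noipa)) bool check_elem (int16_t value, int index) {
  return value == index;
}

bool vector_move_ok () {
  V v;
  constexpr int n = sizeof (V) / sizeof (int16_t);   // 64 lanes
  for (int i = 0; i < n; i++)
    v[i] = i;
  V res = v;                                         // the full-vector move under test
  bool ok = true;
  for (int i = 0; i < n; i++)
    ok = check_elem (res[i], i) && ok;
  return ok;
}
```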
2024-05-09	testsuite: Fix up vector-subaccess-1.C test for ia32 [PR89224]	Jakub Jelinek	1	-0/+1
	The test FAILs on i686-linux due to .../gcc/testsuite/g++.dg/torture/vector-subaccess-1.C:16:6: warning: SSE vector argument without SSE enabled changes the ABI [-Wpsabi] excess warnings. This fixes it by adding -Wno-psabi, like commonly done in other tests. 2024-05-09 Jakub Jelinek <jakub@redhat.com> PR c++/89224 * g++.dg/torture/vector-subaccess-1.C: Add -Wno-psabi as additional options.
2024-05-09	MIPS: Support constraint 'w' for MSA instruction	YunQiang Su	2	-0/+12
	Support syntax like: asm volatile ("fmadd.d %w0, %w1, %w2" : "+w"(a): "w"(b), "w"(c)); gcc * config/mips/constraints.md: Add new constraint 'w'. gcc/testsuite * gcc.target/mips/msa-inline-asm.c: New test.
2024-05-09	RISC-V: Add tests for cpymemsi expansion	Christoph Müllner	4	-0/+116
	cpymemsi expansion has been available for RISC-V since the initial port. However, there are no tests to detect regressions. This patch adds such tests. Three of the tests target the expansion requirements (known length and alignment). One test reuses an existing memcpy test from the by-pieces framework (gcc/testsuite/gcc.dg/torture/inline-mem-cpy-1.c). gcc/testsuite/ChangeLog: * gcc.target/riscv/cpymemsi-1.c: New test. * gcc.target/riscv/cpymemsi-2.c: New test. * gcc.target/riscv/cpymemsi-3.c: New test. * gcc.target/riscv/cpymemsi.c: New test. Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
2024-05-09	i386: Fix some intrinsics without alignment requirements.	Hu, Lin1	4	-9/+33
	gcc/ChangeLog: PR target/84508 * config/i386/emmintrin.h (_mm_load_sd): Remove alignment requirement. (_mm_store_sd): Ditto. (_mm_loadh_pd): Ditto. (_mm_loadl_pd): Ditto. (_mm_storel_pd): Add alignment requirement. * config/i386/xmmintrin.h (_mm_loadh_pi): Remove alignment requirement. (_mm_loadl_pi): Ditto. (_mm_load_ss): Ditto. (_mm_store_ss): Ditto. gcc/testsuite/ChangeLog: PR target/84508 * gcc.target/i386/pr84508-1.c: New test. * gcc.target/i386/pr84508-2.c: Ditto.
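To illustrate what "no alignment requirement" means for a load such as `_mm_load_sd`, the sketch below reads an 8-byte `double` from an arbitrary address via `std::memcpy`, which compilers typically lower to a single unaligned load. The helper name is hypothetical, and this is not how emmintrin.h implements the intrinsic.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper: load a double from p without assuming 8-byte alignment.
inline double load_double_unaligned (const void *p) {
  double d;
  std::memcpy (&d, p, sizeof d);
  return d;
}
```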
2024-05-09	[ranger] Force buffer alignment in Value_Range [PR114912]	Aldy Hernandez	1	-12/+18
	gcc/ChangeLog: PR tree-optimization/114912 * value-range.h (class Value_Range): Use a union.
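A rough sketch of why a union helps, using hypothetical stand-in types: a plain `unsigned char` buffer only guarantees an alignment of 1, so constructing a range object in it with placement new can be misaligned, whereas a union of the candidate payload types gets the size and alignment of its most demanding member. This is not the actual `Value_Range` layout.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical stand-ins for the range classes a Value_Range can hold.
struct int_range_like   { long long lo, hi; };
struct float_range_like { double lo, hi; bool maybe_nan; };

class value_range_sketch {
  // Union storage: sized and aligned for the largest, most-aligned member.
  union storage {
    int_range_like   ir;
    float_range_like fr;
    storage () {}            // no member is active until one is constructed
  } m_storage;

public:
  // Construct an integer range in place and return it.
  int_range_like *set_int (long long lo, long long hi) {
    return new (&m_storage.ir) int_range_like{lo, hi};
  }
  static constexpr std::size_t storage_alignment () { return alignof (storage); }
};

static_assert (value_range_sketch::storage_alignment () >= alignof (float_range_like), "");
```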
2024-05-09	[prange] Reword dispatch error message	Aldy Hernandez	1	-1/+2
	After reading the ICE for the PR, it's obvious the error message is rather cryptic. This makes it less so. gcc/ChangeLog: * range-op.cc (range_op_handler::discriminator_fail): Reword error message.
2024-05-09	i386: fix ix86_hardreg_mov_ok with lra_in_progress	konglin1	1	-1/+2
	Originally, eliminate_regs_in_insn transforms (parallel [ (set (reg:QI 130) (plus:QI (subreg:QI (reg:DI 19 frame) 0) (const_int 96))) (clobber (reg:CC 17 flag))]) {addqi_1} to (set (reg:QI 130) (subreg:QI (reg:DI 19 frame) 0)) {movqi_internal} during verify_changes. But with the no-flags add, it transforms (set (reg:QI 5 di) (plus:QI (subreg:QI (reg:DI 19 frame) 0) (const_int 96))) {addqi_1_nf} to (set (reg:QI 5 di) (subreg:QI (reg:DI 19 frame) 0)) {addqi_1_nf}. There are no extra clobbers at the end, and its dest reg is just a hard reg, so ix86_hardreg_mov_ok returns false. It therefore fails to update the insn and causes the ICE when transforming to movqi_internal. But this is actually OK and safe for ix86_hardreg_mov_ok when lra_in_progress. SPEC2017 was also tested, and performance was not affected. gcc/ChangeLog: * config/i386/i386.cc (ix86_hardreg_mov_ok): Relax hard reg mov restriction when lra in progress.
2024-05-08	[PATCH v1 1/1] RISC-V: Nan-box the result of movbf on soft-bf16	Xiao Zeng	3	-25/+77
	1. This patch implements Nan-boxing of bf16. 2. Please refer to the Nan-box implementation of hf16 in: <https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=057dc349021660c40699fb5c98fd9cac8e168653> 3. The discussion about Nan-box can be found on the website: <https://www.mail-archive.com/search?q=Nan-box+the+result+of+movhf+on+soft-fp16&l=gcc-patches%40gcc.gnu.org> 4. The following tests pass with this patch: * The full riscv regression test suite. gcc/ChangeLog: * config/riscv/riscv.cc (riscv_legitimize_move): Expand movbf with Nan-boxing value. * config/riscv/riscv.md (movbf_softfloat_boxing): New pattern. gcc/testsuite/ChangeLog: * gcc.target/riscv/_Bfloat16-nanboxing.c: New test.
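NaN-boxing, as the RISC-V ISA defines it, stores a value narrower than FLEN in an FP register with every higher bit set to 1. The hypothetical helpers below compute and check that bit pattern for a 16-bit bf16 payload in a 32- or 64-bit register; for example, bf16 1.0 (`0x3F80`) boxed for FLEN=64 is `0xFFFFFFFFFFFF3F80`.

```cpp
#include <cstdint>

// Hypothetical helper: NaN-box a value_bits-wide payload (value_bits < 64)
// into an FLEN-bit register image (flen is 32 or 64).
constexpr uint64_t nan_box (uint64_t value, unsigned value_bits, unsigned flen) {
  uint64_t reg_mask = flen == 64 ? ~0ULL : ((1ULL << flen) - 1);
  uint64_t low_mask = (1ULL << value_bits) - 1;
  // All register bits above the payload are 1; the payload occupies the low bits.
  return (reg_mask & ~low_mask) | (value & low_mask);
}

// Hypothetical check: does the register image hold a properly boxed payload?
constexpr bool is_nan_boxed (uint64_t reg, unsigned value_bits, unsigned flen) {
  return nan_box (reg, value_bits, flen) == reg;
}
```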
2024-05-08	[RISC-V][V2] Fix incorrect if-then-else nesting of Zbs usage in constant synthesis	Jeff Law	1	-40/+41
	Reposting without the patch that ignores whitespace. The CI system doesn't like including both patches; that generates a failure to apply, and none of the tests actually get run. So I managed to goof the if-then-else level of the bseti bits last week. They were supposed to be a last-ditch effort to improve the result, but ended up inside a conditional where they don't really belong. I almost always use Zba, Zbb and Zbs together, so it slipped by. So it's NFC if you always test with Zbb and Zbs enabled together. But if you enabled Zbs without Zbb you'd see a failure to use bseti. gcc/ * config/riscv/riscv.cc (riscv_build_integer_1): Fix incorrect if-then-else nesting of Zbs code.
2024-05-08	AVR: target/114981 - Support __builtin_powi[l] / __powidf2.	Georg-Johann Lay	1	-0/+33
	This supports __powidf2 by means of a double wrapper for already existing f7_powi (renamed to __f7_powi by f7-renames.h). It tweaks the implementation so that it does not perform trivial multiplications with 1.0 any more, but instead uses a move. It also fixes the last statement of f7_powi, which was wrong. Notice that f7_powi was unused until now. PR target/114981 libgcc/config/avr/libf7/ * libf7-common.mk (F7_ASM_PARTS): Add D_powi * libf7-asm.sx (F7MOD_D_powi_, __powidf2): New module and function. * libf7.c (f7_powi): Fix last (wrong) statement. Tweak trivial multiplications with 1.0. gcc/testsuite/ * gcc.target/avr/pr114981-powil.c: New test.
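For reference, `__builtin_powi` computes x raised to an integer power. A common way to evaluate it is square-and-multiply; the hypothetical `powi_sketch` below also seeds the accumulator by copying a power of x rather than multiplying into 1.0, mirroring the "move instead of multiplication with 1.0" tweak described above. It is a plain C++ sketch, not the libf7 code.

```cpp
// Hypothetical square-and-multiply powi.
constexpr double powi_sketch (double x, int n) {
  // Work on the magnitude as unsigned so that n == INT_MIN is safe.
  unsigned m = n < 0 ? 0u - static_cast<unsigned> (n) : static_cast<unsigned> (n);
  if (m == 0)
    return 1.0;
  // Square x past the trailing zero bits of the exponent.
  while ((m & 1) == 0) {
    x *= x;
    m >>= 1;
  }
  double result = x;   // a copy, not 1.0 * x
  m >>= 1;
  while (m != 0) {
    x *= x;
    if (m & 1)
      result *= x;
    m >>= 1;
  }
  return n < 0 ? 1.0 / result : result;
}
```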
2024-05-08	[PR114810][LRA]: Recognize alternatives with lack of available registers for insn and demote them.	Vladimir N. Makarov	1	-2/+41
	PR114810 was fixed in a machine-dependent way. This patch fixes the PR on the LRA side. LRA chose the alternative with constraints `&r,r,ro` on i686 when all operands are DImode and only 6 general regs are available. The patch recognizes such cases and significantly increases the alternative's cost. It does not reject the alternative completely, so the fix is safe, but it might not work for all potentially possible cases of register shortage, as register classes can have any relations, including subsets and intersections. gcc/ChangeLog: PR target/114810 * lra-constraints.cc (process_alt_operands): Calculate union reg class for the alternative, peak matched regs and required reload regs. Recognize alternatives with lack of available registers and make them costly. Add debug print about this case.
2024-05-08	c++: #pragma doesn't disable -Wunused-label [PR113582]	Marek Polacek	4	-6/+42
	The PR complains that void do_something(){ #pragma GCC diagnostic push #pragma GCC diagnostic ignored "-Wunused-label" start:; #pragma GCC diagnostic pop } #1 doesn't work. That's because we warn_for_unused_label only while we're in finish_function, meaning we're at #1 where we're outside the #pragma region. We can use suppress_warning + warning_suppressed_p to fix this. Note that I'm not using TREE_USED. Propagating it in tsubst_stmt/LABEL_EXPR from decl to label would mean that we don't warn in do_something2, but I think we want the warning there: we're in a template and the goto is a discarded statement. PR c++/113582 gcc/c-family/ChangeLog: * c-warn.cc (warn_for_unused_label): Don't warn if -Wunused-label has been suppressed for the label. gcc/cp/ChangeLog: * parser.cc (cp_parser_label_for_labeled_statement): suppress_warning if it's not enabled at input_location. * pt.cc (tsubst_stmt): Call copy_warning. gcc/testsuite/ChangeLog: * g++.dg/warn/Wunused-label-4.C: New test.
2024-05-08	match: `a CMP nonnegative ? a : ABS<a>` simplified to just `ABS<a>` [PR112392]	Andrew Pinski	2	-0/+49
	We can optimize `a == nonnegative ? a : ABS<a>`, `a > nonnegative ? a : ABS<a>` and `a >= nonnegative ? a : ABS<a>` into `ABS<a>`. This allows removal of some extra comparisons and extra conditional moves in some cases. I don't remember where I found this, but it is simple to add, so let's add it. Bootstrapped and tested on x86_64-linux-gnu with no regressions. Note I have a secondary pattern for the equal case as either a or nonnegative could be used. PR tree-optimization/112392 gcc/ChangeLog: * match.pd (`x CMP nonnegative ? x : ABS<x>`): New pattern; where CMP is ==, > and >=. (`x CMP nonnegative@y ? y : ABS<x>`): New pattern. gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/phi-opt-41.c: New test. Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
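Why the fold is valid: if c is nonnegative, each of `a == c`, `a > c` and `a >= c` implies `a >= 0`, and for such `a`, `ABS<a> == a`, so both arms of the conditional agree. The hypothetical functions below spell out the before and after forms for `int` (excluding `INT_MIN`, where `std::abs` is undefined).

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical "before" forms; each requires c >= 0.
inline int before_ge (int a, int c) { return a >= c ? a : std::abs (a); }
inline int before_gt (int a, int c) { return a > c ? a : std::abs (a); }
inline int before_eq (int a, int c) { return a == c ? a : std::abs (a); }

// Hypothetical "after" form: the comparison and select are gone.
inline int after_fold (int a) { return std::abs (a); }
```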
2024-05-08	tree-ssa-sink: Improve code sinking pass	Ajit Kumar Agarwal	2	-4/+16
	Currently, code sinking sinks code to use points in loops with the same nesting depth. The following patch improves code sinking by placing the sunk code at the beginning of the block, after the labels. 2024-05-08 Ajit Kumar Agarwal <aagarwa1@linux.ibm.com> gcc/ChangeLog: PR tree-optimization/81953 * tree-ssa-sink.cc (statement_sink_location): Sink statements at the beginning of the basic block after labels. gcc/testsuite/ChangeLog: PR tree-optimization/81953 * gcc.dg/tree-ssa/ssa-sink-21.c: New test.
2024-05-08	RISC-V: Cover sign-extensions in lshr<GPR:mode>3_zero_extend_4	Christoph Müllner	6	-8/+198
	The lshr<GPR:mode>3_zero_extend_4 pattern targets bit extraction with zero-extension. This pattern represents the canonical form of zero-extensions of a logical right shift. The same optimization can be applied to sign-extensions. Given the two optimizations are so similar, this patch converts the existing one to also cover the sign-extension case. gcc/ChangeLog: * config/riscv/iterators.md (ashiftrt): New code attribute 'extract_shift' and adding extractions to optab. * config/riscv/riscv.md (lshr<GPR:mode>3_zero_extend_4): Rename to... (<any_extract:optab><GPR:mode>3):...this and add support for sign-extensions. gcc/testsuite/ChangeLog: * gcc.target/riscv/extend-shift-helpers.h: Add helpers for sign-extension. * gcc.target/riscv/sign-extend-rshift-32.c: New test. * gcc.target/riscv/sign-extend-rshift-64.c: New test. * gcc.target/riscv/sign-extend-rshift.c: New test. Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
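A sign-extending bit-field extraction with two shifts can be sketched as follows: move the field to the top of the register with a left shift, then shift it back down arithmetically (an `srai` where the zero-extending form uses `srli`), which replicates the field's top bit. The helper name is hypothetical, and this shows the arithmetic rather than the pattern in riscv.md.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: sign-extract len bits starting at bit pos.
// Requires 0 < len and pos + len <= 64.  Relies on >> of a negative int64_t
// being arithmetic (guaranteed since C++20, and GCC's behaviour before that).
inline int64_t sign_extract (uint64_t x, unsigned pos, unsigned len) {
  assert (len > 0 && pos + len <= 64);
  int64_t top = static_cast<int64_t> (x << (64 - pos - len));  // slli
  return top >> (64 - len);                                     // srai
}
```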
2024-05-08	RISC-V: Add zero_extract support for rv64gc	Christoph Müllner	6	-0/+222
	The combiner attempts to optimize a zero-extension of a logical right shift using zero_extract. We already utilize this optimization for those cases that result in a single instruction. Let's add an insn_and_split pattern that also matches the generic case, where we can emit an optimized slli/srli sequence. Tested with SPEC CPU 2017 (rv64gc). PR target/111501 gcc/ChangeLog: * config/riscv/riscv.md (lshr<GPR:mode>3_zero_extend_4): New pattern for zero-extraction. gcc/testsuite/ChangeLog: * gcc.target/riscv/extend-shift-helpers.h: New test. * gcc.target/riscv/pr111501.c: New test. * gcc.target/riscv/zero-extend-rshift-32.c: New test. * gcc.target/riscv/zero-extend-rshift-64.c: New test. * gcc.target/riscv/zero-extend-rshift.c: New test. Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
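The generic slli/srli case can be written out in C++: shift the field to the top of the 64-bit register, then shift it logically down to bit 0. The hypothetical `zero_extract` below takes a bit position and length; for instance, `(uint64_t) (uint32_t) (x >> 8)` equals `zero_extract (x, 8, 32)`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: zero-extract len bits starting at bit pos.
// Requires 0 < len and pos + len <= 64.
inline uint64_t zero_extract (uint64_t x, unsigned pos, unsigned len) {
  assert (len > 0 && pos + len <= 64);
  return (x << (64 - pos - len)) >> (64 - len);   // slli, then srli
}
```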
2024-05-08	RISC-V: Cover sign-extensions in lshrsi3_zero_extend_2	Christoph Müllner	3	-4/+25
	The pattern lshrsi3_zero_extend_2 extracts the MSB bits of the lower 32-bit word and zero-extends them back to DImode. This is realized using srliw, which operates on 32-bit registers. The same optimization can be applied to sign-extensions when emitting a sraiw instead of the srliw. Given these two optimizations are so similar, this patch simply converts the existing one to also cover the sign-extension case. gcc/ChangeLog: * config/riscv/iterators.md (sraiw): New code iterator 'any_extract'. New code attribute 'extract_sidi_shift'. * config/riscv/riscv.md (lshrsi3_zero_extend_2): Rename to... (lshrsi3_extend_2):...this and add support for sign-extensions. gcc/testsuite/ChangeLog: * gcc.target/riscv/sign-extend-1.c: Test sraiw 24 and sraiw 16. Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
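A C++ model of the two RV64 word shifts, under hypothetical function names: both operate on the low 32 bits and sign-extend the 32-bit result to 64 bits. `srliw` shifts logically, so for a nonzero shift amount bit 31 of the result is clear and the value is effectively zero-extended; `sraiw` shifts arithmetically, giving the sign-extended extraction.

```cpp
#include <cstdint>

// Hypothetical model of srliw; 0 <= sh < 32.
constexpr int64_t srliw (uint64_t rs1, unsigned sh) {
  return static_cast<int32_t> (static_cast<uint32_t> (rs1) >> sh);
}

// Hypothetical model of sraiw; 0 <= sh < 32.  Assumes the usual two's
// complement conversion and arithmetic >> on int32_t (as GCC does).
constexpr int64_t sraiw (uint64_t rs1, unsigned sh) {
  return static_cast<int32_t> (static_cast<uint32_t> (rs1)) >> sh;
}
```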
2024-05-08	RISC-V: Add test for sraiw-31 special case	Christoph Müllner	1	-0/+14
	We already optimize a sign-extension of a right-shift by 31 in <optab>si3_extend. Let's add a test for that (similar to zero-extend-1.c). gcc/testsuite/ChangeLog: * gcc.target/riscv/sign-extend-1.c: New test. Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
2024-05-08	Fix SLP reduction initial value for pointer reductions	Richard Biener	1	-1/+8
	For pointer reductions we need to convert the initial value to the vector component integer type. * tree-vect-loop.cc (get_initial_defs_for_reduction): Convert initial value to the vector component type.
2024-05-08	Fix non-grouped SLP load/store accounting in alignment peeling	Richard Biener	1	-2/+5
	When we have a non-grouped access we bogusly multiply by zero. This shows most with single-lane SLP but also happens with the multi-lane splat case. * tree-vect-data-refs.cc (vect_enhance_data_refs_alignment): Properly guard DR_GROUP_SIZE access with STMT_VINFO_GROUPED_ACCESS.
2024-05-08	aarch64: Fix typo in aarch64-ldp-fusion.cc:combine_reg_notes [PR114936]	Alex Coplan	1	-2/+2
	This fixes a typo in combine_reg_notes in the load/store pair fusion pass. As it stands, the calls to filter_notes store any REG_FRAME_RELATED_EXPR to fr_expr with the following association: - i2 -> fr_expr[0] - i1 -> fr_expr[1] but then the checks inside the following if statement expect the opposite (more natural) association, i.e.: - i2 -> fr_expr[1] - i1 -> fr_expr[0] this patch fixes the oversight by swapping the fr_expr indices in the calls to filter_notes. In hindsight it would probably have been less confusing / error-prone to have combine_reg_notes take an array of two insns, then we wouldn't have to mix 1-based and 0-based indexing as well as remembering to call filter_notes in reverse program order. This however is a minimal fix for backporting purposes. gcc/ChangeLog: PR target/114936 * config/aarch64/aarch64-ldp-fusion.cc (combine_reg_notes): Ensure insn iN has its REG_FRAME_RELATED_EXPR (if any) stored in FR_EXPR[N-1], thus matching the correspondence expected by the copy_rtx calls.
2024-05-08	tree-ssa-loop-prefetch.cc: Honour -fno-unroll-loops	Stefan Schulze Frielinghaus	1	-0/+4
	This fixes a couple of tests (gcc.dg/vect/pr109011-.c) on s390 where loops are unrolled although -fno-unroll-loops is specified. gcc/ChangeLog: tree-ssa-loop-prefetch.cc (determine_unroll_factor): Honour -fno-unroll-loops.
2024-05-08	AVR: target/114975 - Add combine-pattern for __parityqi2.	Georg-Johann Lay	2	-1/+33
	PR target/114975 gcc/ * config/avr/avr.md: Add combine pattern for 8-bit parity detection. gcc/testsuite/ * gcc.target/avr/pr114975-parity.c: New test.
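The source-level shape the new pattern presumably targets is an 8-bit parity through `__builtin_parity`, shown below next to a portable XOR-folding reference. Both function names are hypothetical, and which libgcc helper avr-gcc actually calls depends on the compiler version.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical: 8-bit parity via the GCC builtin (argument promoted to unsigned).
inline int parity8 (uint8_t x) { return __builtin_parity (x); }

// Hypothetical portable reference.
constexpr int parity8_fold (uint8_t x) {
  unsigned v = x;
  v ^= v >> 4;   // fold the high nibble onto the low nibble
  v ^= v >> 2;
  v ^= v >> 1;
  return static_cast<int> (v & 1);   // bit 0 now holds the XOR of all 8 bits
}
```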
2024-05-08	AVR: target/114975 - Add combine-pattern for __popcountqi2.	Georg-Johann Lay	2	-0/+30
	PR target/114975 gcc/ * config/avr/avr.md: Add combine pattern for 8-bit popcount detection. gcc/testsuite/ * gcc.target/avr/pr114975-popcount.c: New test.
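Similarly, an 8-bit population count written with `__builtin_popcount` is the kind of expression this pattern is meant to recognize; it is shown here next to a portable reference that clears the lowest set bit until none remain. Both function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical: 8-bit popcount via the GCC builtin (argument promoted to unsigned).
inline int popcount8 (uint8_t x) { return __builtin_popcount (x); }

// Hypothetical portable reference: v &= v - 1 clears the lowest set bit.
constexpr int popcount8_ref (uint8_t x) {
  int n = 0;
  for (unsigned v = x; v != 0; v &= v - 1)
    ++n;
  return n;
}
```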
2024-05-08	Fix and speedup IDF pruning by dominator	Richard Biener	1	-22/+25
	When insert_updated_phi_nodes_for tries to skip pruning the IDF to blocks dominated by the nearest common dominator of the set of definition blocks, it compares against ENTRY_BLOCK, but that's never going to be the common dominator. In fact if it ever were the code fails to copy IDF to PRUNED_IDF, leading to wrong code. The following fixes that by avoiding the copy and pruning from the IDF in-place as well as using the more appropriate check against the single successor of the ENTRY_BLOCK. * tree-into-ssa.cc (insert_updated_phi_nodes_for): Skip pruning when the nearest common dominator is the successor of ENTRY_BLOCK. Do not copy IDF but prune it directly.
2024-05-08	reassoc: Fix up optimize_range_tests_to_bit_test [PR114965]	Jakub Jelinek	2	-1/+32
	The optimize_range_tests_to_bit_test optimization normally emits a range test first: if (entry_test_needed) { tem = build_range_check (loc, optype, unshare_expr (exp), false, lowi, high); if (tem == NULL_TREE || is_gimple_val (tem)) continue; } so during the bit test we already know that exp is in the [lowi, high] range, but the range test is skipped if we have range info telling us it isn't necessary. Also, normally it emits shifts by an exp - lowi counter, but has an optimization to use just exp as the counter if the mask isn't a more expensive constant in that case and lowi is > 0 and high is smaller than prec. The following testcase is miscompiled because both of these abnormal cases are triggered. The range of exp is [43, 43][48, 48][95, 95], so on a 64-bit arch we decide we don't need the entry test, because 95 - 43 < 64. And we also decide to use just exp as the counter, because the range test tests just for exp == 43 || exp == 48, so high is smaller than 64 too. Because 95 is in the exp range, we can't do that; we'd either need to do a range test first, i.e. if (exp - 43U <= 48U - 43U) if ((1UL << exp) & mask1) or need to subtract lowi from the shift counter, i.e. if ((1UL << (exp - 43)) & mask2) but we can't skip both unless r.upper_bound () is < prec. The following patch ensures that. 2024-05-08 Jakub Jelinek <jakub@redhat.com> PR tree-optimization/114965 * tree-ssa-reassoc.cc (optimize_range_tests_to_bit_test): Don't try to optimize away exp - lowi subtraction from shift count unless entry test is emitted or unless r.upper_bound () is smaller than prec. * gcc.c-torture/execute/pr114965.c: New test.
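The safe form of the transformation can be written out for the testcase's values: with the entry range test, `exp - 43` is known to lie in [0, 5] before it is used as a shift count, so the mask only needs bits 0 and 5. The hypothetical `in_set` below is a C++ sketch of that form; an `exp` of 95 simply fails the range test instead of reaching the shift.

```cpp
#include <cstdint>

// Hypothetical mask: bit i stands for exp == 43 + i (bit 0 for 43, bit 5 for 48).
constexpr uint64_t kMask_43_48 = (1ULL << (43 - 43)) | (1ULL << (48 - 43));

// exp == 43 || exp == 48 as an entry range test plus a bit test.  The unsigned
// subtraction wraps for exp < 43, so one comparison checks 43 <= exp <= 48,
// and the shift count is then in [0, 5].
constexpr bool in_set (unsigned exp) {
  return exp - 43u <= 48u - 43u && ((1ULL << (exp - 43u)) & kMask_43_48) != 0;
}
```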
2024-05-08	Minor tweaks to code computing modular multiplicative inverse	Eric Botcazou	4	-60/+71
	This removes the last parameter of choose_multiplier, which is unused, adds another assertion and more details to the description and various comments. Likewise for the closely related invert_mod2n, except for the last parameter. [changelog] * expmed.h (choose_multiplier): Tweak description and remove last parameter. * expmed.cc (choose_multiplier): Likewise. Add assertion for the third parameter and add details to various comments. (invert_mod2n): Tweak description and add assertion for the first parameter. (expand_divmod): Adjust calls to choose_multiplier. * tree-vect-generic.cc (expand_vector_divmod): Likewise. * tree-vect-patterns.cc (vect_recog_divmod_pattern): Likewise.
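For context, the multiplicative inverse of an odd number modulo a power of two (which `invert_mod2n` computes, there for a caller-chosen number of bits) can be found with Newton's iteration `y <- y * (2 - x * y)`. The sketch below fixes the width at 64 bits; the function name is hypothetical and this is not GCC's code. Such an inverse turns an exact division by an odd constant into a multiplication.

```cpp
#include <cstdint>

// Hypothetical helper: inverse of an odd x modulo 2^64.
// For odd x, x * x == 1 (mod 8), so y = x is correct in the low 3 bits; each
// step doubles the correct bits (3, 6, 12, 24, 48, 96), so five steps cover 64.
// Unsigned arithmetic wraps, i.e. it works modulo 2^64.
constexpr uint64_t inverse_mod_2_64 (uint64_t x) {   // precondition: x is odd
  uint64_t y = x;
  for (int i = 0; i < 5; ++i)
    y *= 2 - x * y;
  return y;
}
```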
2024-05-08	x86: Fix cmov cost model issue [PR109549]	konglin1	2	-5/+2
	(if_then_else:SI (eq (reg:CCZ 17 flags) (const_int 0 [0])) (reg/v:SI 101 [ e ]) (reg:SI 102)) The cost is 8 for the rtx; the cost for (eq (reg:CCZ 17 flags) (const_int 0 [0])) is 4, but this is just an operator, so its cost does not need to be computed for cmov. gcc/ChangeLog: PR target/109549 * config/i386/i386.cc (ix86_rtx_costs): The XEXP (x, 0) for cmov is an operator; there is no need to compute its cost. gcc/testsuite/ChangeLog: * gcc.target/i386/cmov6.c: Fixed.
2024-05-08	Enable prange support.	Aldy Hernandez	13	-31/+18
	This throws the switch on prange. After this patch, it is no longer valid to store a pointer in an irange (or vice versa). Instead, they must go in prange, which is faster and more memory efficient. I will push this now, so I have time to do any follow-up bugfixing before going on paternity leave. There are various cleanups we plan on doing after this patch (faster intersect/union, remove range-op-mixed.h, remove value_range in favor of int_range_max, reclaim the name for the Value_Range temporary, clean up range-ops, etc etc). But we will hold off on those for now to make it easier to revert this patch, if for some reason we need to do so while I'm away. Tested on x86-64 Linux. gcc/ChangeLog: * gimple-range-cache.cc (sbr_sparse_bitmap::sbr_sparse_bitmap): Change irange to prange. * gimple-range-fold.cc (fold_using_range::fold_stmt): Same. (fold_using_range::range_of_address): Same. * gimple-range-fold.h (range_of_address): Same. * gimple-range-infer.cc (gimple_infer_range::add_nonzero): Same. * gimple-range-op.cc (class cfn_strlen): Same. * gimple-range-path.cc (path_range_query::adjust_for_non_null_uses): Same. * gimple-ssa-warn-access.cc (pass_waccess::check_pointer_uses): Same. * tree-ssa-structalias.cc (find_what_p_points_to): Same. * range-op-ptr.cc (range_op_table::initialize_pointer_ops): Remove hybrid entries in table. * range-op.cc (range_op_table::range_op_table): Add pointer entries for bitwise and/or and min/max. * value-range.cc (irange::verify_range): Add assert. * value-range.h (irange::varying_compatible_p): Remove check for error_mark_node. (irange::supports_p): Remove pointer support. * ipa-cp.h (ipa_supports_p): Add prange support.
2024-05-08	Revert "Revert "testsuite/gcc.target/cris/pr93372-2.c: Handle xpass from combine improvement""	Hans-Peter Nilsson	1	-7/+8
	This reverts commit 39f81924d88e3cc197fc3df74204c9b5e01e12f7.
2024-05-08	c++/modules: Stream unmergeable temporaries by value again [PR114856]	Nathaniel Shead	5	-1/+26
	In r14-9266-g2823b4d96d9ec4 I gave all temporary vars a DECL_CONTEXT, including those at namespace or global scope, so that they could be properly merged across importers. However, not all of these temporary vars are actually supposed to be mergeable. For instance, in the attached testcase we have an unnamed temporary var used in the NSDMI of a class member, which cannot be properly merged -- but it also doesn't need to be, as it'll be thrown away when the class type itself is merged anyway. This patch reverts the change made above and instead makes a weaker adjustment that only causes temporary vars with linkage to have a DECL_CONTEXT to merge from. This way these unnamed, "unmergeable" temporaries are properly streamed by value again. PR c++/114856 gcc/cp/ChangeLog: * call.cc (make_temporary_var_for_ref_to_temp): Set context for temporaries with linkage. * init.cc (create_temporary_var): Revert to only set context when in a function decl. gcc/testsuite/ChangeLog: * g++.dg/modules/pr114856.h: New test. * g++.dg/modules/pr114856_a.H: New test. * g++.dg/modules/pr114856_b.C: New test. Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com> Reviewed-by: Jason Merrill <jason@redhat.com> Reviewed-by: Patrick Palka <ppalka@redhat.com>
2024-05-07	c++/c-common: Fix convert_vector_to_array_for_subscript for qualified vector types [PR89224]	Andrew Pinski	4	-3/+32
	After r7-987-gf17a223de829cb, the access for the elements of a vector type would lose the qualifiers. So if we had `constvector[0]`, the type of the element of the array would not have const on it. This was due to a missing build_qualified_type for the inner type of the vector when building the array type. We need to add back the call to build_qualified_type and now the access has the correct qualifiers. So overload resolution, and whether the access is an lvalue or rvalue, are now handled correctly. Note we now correctly reject the testcase gcc.dg/pr83415.c which was incorrectly accepted after r7-987-gf17a223de829cb. Built and tested for aarch64-linux-gnu. PR c++/89224 gcc/c-family/ChangeLog: * c-common.cc (convert_vector_to_array_for_subscript): Call build_qualified_type for the inner type. gcc/cp/ChangeLog: * constexpr.cc (cxx_eval_array_reference): Compare main variants for the vector/array types instead of the types directly. gcc/testsuite/ChangeLog: * g++.dg/torture/vector-subaccess-1.C: New test. * gcc.dg/pr83415.c: Change warning to error. Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
2024-05-07	DCE __cxa_atexit calls where the function is pure/const [PR19661]	Andrew Pinski	7	-0/+201
	In C++ you sometimes have a destructor that is "empty", for example with unions or with arrays. The front end might not know that it is empty either, so the removal has to be done during optimization. To implement it, I added it to DCE, where we mark whether a statement is necessary or not. Bootstrapped and tested on x86_64-linux-gnu with no regressions. Changes since v1: * v2: Add support for __aeabi_atexit for arm-eabi. Add extra comments. Add cxa_atexit-5.C testcase for -fPIC case. * v3: Fix testcases for the __aeabi_atexit (forgot to do in the v2). PR tree-optimization/19661 gcc/ChangeLog: * tree-ssa-dce.cc (is_cxa_atexit): New function. (is_removable_cxa_atexit_call): New function. (mark_stmt_if_obviously_necessary): Don't mark removable cxa_atexit calls. (mark_all_reaching_defs_necessary_1): Likewise. (propagate_necessity): Likewise. gcc/testsuite/ChangeLog: * g++.dg/tree-ssa/cxa_atexit-1.C: New test. * g++.dg/tree-ssa/cxa_atexit-2.C: New test. * g++.dg/tree-ssa/cxa_atexit-3.C: New test. * g++.dg/tree-ssa/cxa_atexit-4.C: New test. * g++.dg/tree-ssa/cxa_atexit-5.C: New test. * g++.dg/tree-ssa/cxa_atexit-6.C: New test. Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
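A minimal example of the kind of code this change targets, assuming a typical Itanium-ABI target where `__cxa_atexit` is used to register destructors; `Holder`, `global_holder` and `read_holder` are hypothetical names. Because `~Holder` is user-provided, the type is not trivially destructible, so the front end still registers the destructor for the namespace-scope object, even though running it has no observable effect. Compiling with something like `g++ -O2 -fdump-tree-optimized` and searching the dump for `__cxa_atexit` is one way to see whether a particular GCC version drops the call; the exact dump contents are not verified here.

```cpp
#include <cassert>

// User-provided but empty destructor: not trivial, yet free of side effects.
struct Holder {
  union {
    int i;
    float f;
  };
  Holder() : i(42) {}
  ~Holder() {}  // empty body: running it at exit changes nothing observable
};

// Namespace-scope object: its destructor is registered at startup, which is
// the registration call the DCE change may now remove.
Holder global_holder;

int read_holder() { return global_holder.i; }

// Usage: the object behaves the same whether or not the registration survives.
void check_holder() { assert(read_holder() == 42); }
```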
2024-05-07	MATCH: Add some more value_replacement simplifications (a != 0 ? expr : 0) ↵	Andrew Pinski	2	-0/+57
	to match This adds a few more of what is currently done in phiopt's value_replacement to match. I noticed this when I was hooking up phiopt's value_replacement code to use match and disabling the old code. But this can be done independently from hooking up phiopt's value_replacement, as phiopt is already hooked up to match for the simplified versions. /* a != 0 ? a / b : 0 -> a / b iff b is nonzero. */ /* a != 0 ? a * b : 0 -> a * b */ /* a != 0 ? a & b : 0 -> a & b */ We prefer the `cond ? a : 0` forms to allow optimization of `a * cond`, which uses that form. Bootstrapped and tested on x86_64-linux-gnu with no regressions. PR tree-optimization/114894 gcc/ChangeLog: * match.pd (`a != 0 ? a / b : 0`): New pattern. (`a != 0 ? a * b : 0`): New pattern. (`a != 0 ? a & b : 0`): New pattern. gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/phi-opt-value-5.c: New test. Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
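The three rewrites rely on each operation yielding 0 when `a == 0`. A small C++ check of those equivalences, written as plain functions; the `*_guarded`/`*_simplified` names and the tested range are illustrative assumptions, not code from GCC. The division case needs `b` to be nonzero because the guarded form never divides when `a == 0`, while the simplified form always does.

```cpp
#include <cassert>

// Forms as they might appear in source: the operation guarded by a != 0.
int mul_guarded(int a, int b) { return a != 0 ? a * b : 0; }
int and_guarded(int a, int b) { return a != 0 ? (a & b) : 0; }
int div_guarded(int a, int b) { return a != 0 ? a / b : 0; }

// Forms the new match.pd patterns rewrite them to.
int mul_simplified(int a, int b) { return a * b; }  // 0 * b == 0
int and_simplified(int a, int b) { return a & b; }  // 0 & b == 0
int div_simplified(int a, int b) { return a / b; }  // 0 / b == 0, needs b != 0

// Exhaustively compare both forms for a, b in [lo, hi]; b == 0 is skipped for
// division because a / 0 is undefined behavior in the simplified form.
bool forms_agree(int lo, int hi) {
  for (int a = lo; a <= hi; ++a) {
    for (int b = lo; b <= hi; ++b) {
      if (mul_guarded(a, b) != mul_simplified(a, b)) return false;
      if (and_guarded(a, b) != and_simplified(a, b)) return false;
      if (b != 0 && div_guarded(a, b) != div_simplified(a, b)) return false;
    }
  }
  return true;
}
```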
2024-05-07	[committed][RISC-V] Turn on overlap_op_by_pieces for generic-ooo tuning	Jeff Law	1	-1/+1
	Per quick email exchange with Palmer. Given the triviality, I'm just pushing it. gcc/ * config/riscv/riscv.cc (generic_ooo_tune_info): Turn on overlap_op_by_pieces.
2024-05-07	[committed] [RISC-V] Allow uarchs to set TARGET_OVERLAP_OP_BY_PIECES_P	Christoph Müllner	3	-0/+117
	This is almost exclusively work from the VRULL team. As we've discussed in past Tuesday meetings, we'd like to have a knob in the tuning structure to indicate that overlapped stores during move_by_pieces expansion of memcpy & friends are acceptable. This patch adds that capability to our tuning structure. It's off for all the uarchs upstream, but we have been using it inside Ventana for our uarch with success. So technically it's NFC upstream, but it puts in place infrastructure that multiple organizations likely need. gcc/ * config/riscv/riscv.cc (struct riscv_tune_param): Add new "overlap_op_by_pieces" field. (rocket_tune_info, sifive_7_tune_info): Set it. (sifive_p400_tune_info, sifive_p600_tune_info): Likewise. (thead_c906_tune_info, xiangshan_nanhu_tune_info): Likewise. (generic_ooo_tune_info, optimize_size_tune_info): Likewise. (riscv_overlap_op_by_pieces): New function. (TARGET_OVERLAP_OP_BY_PIECES_P): Define. gcc/testsuite/ * gcc.target/riscv/memcpy-nonoverlapping.c: New test. * gcc.target/riscv/memset-nonoverlapping.c: New test.
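To picture what "overlapped stores" means for a small memcpy expansion, here is a hand-written C++ sketch of copying 7 bytes: a non-overlapping sequence needs 4 + 2 + 1 byte moves, while an overlapping one uses two 4-byte moves that both touch byte 3. This only illustrates the access pattern; it is not GCC's move_by_pieces code, and `copy7_pieces`/`copy7_overlapping` are hypothetical names. Whether the two-store form is preferable on a given uarch is exactly what the overlap_op_by_pieces tuning field is meant to express.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Non-overlapping: three stores of 4, 2 and 1 bytes.
void copy7_pieces(unsigned char *dst, const unsigned char *src) {
  std::uint32_t w;
  std::uint16_t h;
  std::memcpy(&w, src, 4);
  std::memcpy(dst, &w, 4);      // bytes 0..3
  std::memcpy(&h, src + 4, 2);
  std::memcpy(dst + 4, &h, 2);  // bytes 4..5
  dst[6] = src[6];              // byte 6
}

// Overlapping: two 4-byte stores; byte 3 is written twice with the same value.
void copy7_overlapping(unsigned char *dst, const unsigned char *src) {
  std::uint32_t head, tail;
  std::memcpy(&head, src, 4);      // load bytes 0..3
  std::memcpy(&tail, src + 3, 4);  // load bytes 3..6
  std::memcpy(dst, &head, 4);      // store bytes 0..3
  std::memcpy(dst + 3, &tail, 4);  // store bytes 3..6
}
```

Both functions assume non-overlapping `src` and `dst` buffers, matching memcpy semantics, and neither writes past byte 6 of the destination.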
2024-05-07	c++: Implement C++26 P2893R3 - Variadic friends [PR114459]	Jakub Jelinek	6	-41/+169
	The following patch implements the C++26 P2893R3 - Variadic friends paper. The paper allows friend type declarations to specify more than one friend type specifier and allows each of them to be followed by .... The patch doesn't introduce tentative parsing of the friend-type-declaration non-terminal, but rather just extends the existing parsing of friend declarations that end with ; after the declaration specifiers to the cases where they end with ...; or , or ..., In that case it pedwarns for cxx_dialect < cxx26, handles the ..., and, if there is a , continues in a loop to parse the further friend type specifiers. 2024-05-07 Jakub Jelinek <jakub@redhat.com> PR c++/114459 gcc/c-family/ * c-cppbuiltin.cc (c_cpp_builtins): Predefine __cpp_variadic_friend=202403L for C++26. gcc/cp/ * parser.cc (cp_parser_member_declaration): Implement C++26 P2893R3 - Variadic friends. Parse friend type declarations with ... or with more than one friend type specifier. * friend.cc (make_friend_class): Allow TYPE_PACK_EXPANSION. * pt.cc (instantiate_class_template): Handle PACK_EXPANSION_P in friend classes. gcc/testsuite/ * g++.dg/cpp26/feat-cxx26.C (__cpp_variadic_friend): Add test. * g++.dg/cpp26/variadic-friend1.C: New test.
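A minimal sketch of the pack-expanded friend declaration `friend Ts...;` that P2893R3 permits. It is guarded by the `__cpp_variadic_friend` macro (202403L) that this patch predefines for C++26, so it still compiles, as a no-op, with compilers or language modes that lack the feature. The `Passkey`, `A`, `B` and `variadic_friend_demo` names are hypothetical.

```cpp
#include <cassert>

#if defined(__cpp_variadic_friend) && __cpp_variadic_friend >= 202403L
struct A;
struct B;

// Only the types in Ts may construct a Passkey: one declaration befriends the pack.
template <class... Ts>
class Passkey {
  friend Ts...;         // C++26: pack expansion in a friend type declaration
  Passkey() = default;  // private, so only friends can create a key
};

struct A {
  static int use() { Passkey<A, B> key; (void)key; return 1; }
};
struct B {
  static int use() { Passkey<A, B> key; (void)key; return 1; }
};

constexpr bool kHasVariadicFriends = true;
int variadic_friend_demo() { return A::use() + B::use(); }
#else
constexpr bool kHasVariadicFriends = false;
int variadic_friend_demo() { return 0; }  // feature not available in this mode
#endif
```

Before C++26, the same effect needs one `friend` declaration per type, so the class cannot befriend an arbitrary template parameter pack.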