path: root/gcc
Age  Commit message  Author  Files  Lines
2023-11-11mode-switching: Add a target-configurable confluence operatorRichard Sandiford4-28/+186
The mode-switching pass assumed that all of an entity's modes were mutually exclusive. However, the upcoming SME changes have an entity with some overlapping modes, so that there is sometimes a "superunion" mode that contains two given modes. We can use this relationship to pass something more helpful than "don't know" to the emit hook. This patch adds a new hook that targets can use to specify a mode confluence operator. With mutually exclusive modes, it's possible to compute a block's incoming and outgoing modes by looking at its availability sets. With the confluence operator, we instead need to solve a full dataflow problem. However, when emitting a mode transition, the upcoming SME use of mode-switching benefits from having as much information as possible about the starting mode. Calculating this information is definitely worth the compile time. The dataflow problem is written to work before and after the LCM problem has been solved. A later patch makes use of this. While there (since git blame would ping me for the reindented code), I used a lambda to avoid the cut-&-pasted loops. gcc/ * target.def (mode_switching.confluence): New hook. * doc/tm.texi (TARGET_MODE_CONFLUENCE): New @hook. * doc/tm.texi.in: Regenerate. * mode-switching.cc (confluence_info): New variable. (mode_confluence, forward_confluence_n, forward_transfer): New functions. (optimize_mode_switching): Use them to calculate mode_in when TARGET_MODE_CONFLUENCE is defined.
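The idea of a confluence operator feeding a forward dataflow problem can be sketched as follows. This is only an illustration and not the code in mode-switching.cc: the bitmask encoding of modes, `kUnknown`, `Block`, `confluence` and `solve_mode_in` are all hypothetical names chosen for the example, and a real target would define its own "superunion" relationship.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical encoding: a mode is a set of state bits, so the "superunion"
// of two modes is their bitwise OR.
using Mode = unsigned;
constexpr Mode kUnknown = ~0u;  // stands in for "don't know"

// Hypothetical confluence operator: a mode that contains both inputs, or
// kUnknown if the union uses bits outside the set of valid modes.
Mode confluence(Mode a, Mode b, Mode valid_mask) {
  if (a == kUnknown || b == kUnknown) return kUnknown;
  Mode u = a | b;
  return (u & ~valid_mask) ? kUnknown : u;
}

struct Block {
  std::vector<int> preds;  // indices of predecessor blocks
  bool sets_mode;          // does the block force a specific mode?
  Mode mode_set;           // the mode it forces, if sets_mode is true
};

// Forward dataflow, iterated to a fixed point.  Block 0 is the entry block.
// mode_in is the confluence of the predecessors' mode_out; a block that does
// not set the mode passes its incoming mode through unchanged.
std::vector<Mode> solve_mode_in(const std::vector<Block>& cfg,
                                Mode entry_mode, Mode valid_mask) {
  const std::size_t n = cfg.size();
  std::vector<Mode> in(n, kUnknown), out(n, kUnknown);
  std::vector<bool> visited(n, false);
  bool changed = true;
  while (changed) {
    changed = false;
    for (std::size_t b = 0; b < n; ++b) {
      bool have = false;
      Mode m = kUnknown;
      if (b == 0) { m = entry_mode; have = true; }
      for (int p : cfg[b].preds) {
        if (!visited[p]) continue;  // predecessor not computed yet
        m = have ? confluence(m, out[p], valid_mask) : out[p];
        have = true;
      }
      if (!have) continue;
      Mode o = cfg[b].sets_mode ? cfg[b].mode_set : m;
      if (!visited[b] || in[b] != m || out[b] != o) {
        in[b] = m;
        out[b] = o;
        visited[b] = true;
        changed = true;
      }
    }
  }
  return in;
}
```

On a diamond-shaped CFG whose two arms set modes 1 and 2, the join block sees mode 3 (the superunion) rather than "don't know", which is the extra information the emit hook can then use.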
2023-11-11mode-switching: Use 1-based edge aux fieldsRichard Sandiford1-4/+4
The pass used the edge aux field to record which mode change should happen on the edge, with -1 meaning "none". It's more convenient for later patches to leave aux zero for "none", and use numbers based at 1 to record a change. gcc/ * mode-switching.cc (commit_mode_sets): Use 1-based edge aux values.
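A minimal sketch of the 1-based convention, assuming a pointer-sized aux field. `Edge`, `record_change`, `has_change` and `change_index` are hypothetical helpers for illustration, not functions from the pass.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for an edge with a pointer-sized aux field.
struct Edge {
  void* aux = nullptr;  // zero means "no mode change on this edge"
};

// Store the 0-based change index i as i + 1, so that a freshly cleared
// aux field already encodes "none".
inline void record_change(Edge& e, int i) {
  e.aux = reinterpret_cast<void*>(static_cast<std::intptr_t>(i) + 1);
}

inline bool has_change(const Edge& e) { return e.aux != nullptr; }

// Only meaningful when has_change (e) is true.
inline int change_index(const Edge& e) {
  return static_cast<int>(reinterpret_cast<std::intptr_t>(e.aux)) - 1;
}
```

The benefit is that no explicit initialization to -1 is needed: an aux field that was never written already means "none".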
2023-11-11mode-switching: Pass the set of live registers to the after hookRichard Sandiford8-13/+20
This patch passes the set of live hard registers to the after hook, like the previous one did for the needed hook. gcc/ * target.def (mode_switching.after): Add a regs_live parameter. * doc/tm.texi: Regenerate. * config/epiphany/epiphany-protos.h (epiphany_mode_after): Update accordingly. * config/epiphany/epiphany.cc (epiphany_mode_needed): Likewise. (epiphany_mode_after): Likewise. * config/i386/i386.cc (ix86_mode_after): Likewise. * config/riscv/riscv.cc (riscv_mode_after): Likewise. * config/sh/sh.cc (sh_mode_after): Likewise. * mode-switching.cc (optimize_mode_switching): Likewise.
2023-11-11mode-switching: Pass set of live registers to the needed hookRichard Sandiford9-13/+27
The emit hook already takes the set of live hard registers as input. This patch passes it to the needed hook too. SME uses this to optimise the mode choice based on whether state is live or dead. The main caller already had access to the required info, but the special handling of return values did not. gcc/ * target.def (mode_switching.needed): Add a regs_live parameter. * doc/tm.texi: Regenerate. * config/epiphany/epiphany-protos.h (epiphany_mode_needed): Update accordingly. * config/epiphany/epiphany.cc (epiphany_mode_needed): Likewise. * config/epiphany/mode-switch-use.cc (insert_uses): Likewise. * config/i386/i386.cc (ix86_mode_needed): Likewise. * config/riscv/riscv.cc (riscv_mode_needed): Likewise. * config/sh/sh.cc (sh_mode_needed): Likewise. * mode-switching.cc (optimize_mode_switching): Likewise. (create_pre_exit): Likewise, using the DF simulate functions to calculate the required information.
2023-11-11mode-switching: Allow targets to set the mode for EH handlersRichard Sandiford4-1/+19
The mode-switching pass already had hooks to say what mode an entity is in on entry to a function and what mode it must be in on return. For SME, we also want to say what mode an entity is guaranteed to be in on entry to an exception handler. gcc/ * target.def (mode_switching.eh_handler): New hook. * doc/tm.texi.in (TARGET_MODE_EH_HANDLER): New @hook. * doc/tm.texi: Regenerate. * mode-switching.cc (optimize_mode_switching): Use eh_handler to get the mode on entry to an exception handler.
2023-11-11mode-switching: Tweak entry/exit handlingRichard Sandiford1-19/+15
An entity isn't transparent in a block that requires a specific mode. optimize_mode_switching took that into account for normal insns, but didn't for the exit block. Later patches misbehaved because of this. In contrast, an entity was correctly marked as non-transparent in the entry block, but the reasoning seemed a bit convoluted. It also referred to a function that no longer exists. Since KILL = ~TRANSP, the entity is by definition not transparent in a block that defines the entity, so I think we can make it so without comment. Finally, the exit handling was nested in the entry handling, but that doesn't seem necessary. A target could say that an entity is undefined on entry but must be defined on return, on a "be liberal in what you accept, be conservative in what you do" principle. gcc/ * mode-switching.cc (optimize_mode_switching): Mark the exit block as nontransparent if it requires a specific mode. Handle the entry and exit mode as sibling rather than nested concepts. Remove outdated comment.
2023-11-11mode-switching: Simplify recording of transparencyRichard Sandiford1-8/+11
For a given block, an entity is either transparent for all modes or for none. Each update to the transparency set therefore used a loop like: for (i = 0; i < no_mode; i++) clear_mode_bit (transp[bb->index], j, i); This patch instead starts out with a bit-per-block bitmap and updates the main bitmap at the end. This isn't much of a simplification on its own. The main purpose is to simplify later patches. gcc/ * mode-switching.cc (optimize_mode_switching): Initially compute transparency in a bit-per-block bitmap.
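The approach can be sketched as below, using standard containers instead of GCC's sbitmap types. `expand_transparency` is a hypothetical name; the sketch only shows the "one bit per block first, per-mode bits at the end" shape described above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Expand a one-bit-per-block transparency summary into per-block, per-mode
// bits.  Because an entity is transparent in a block for all modes or for
// none, each row is simply num_modes copies of the block's single bit.
std::vector<std::vector<bool>> expand_transparency(
    const std::vector<bool>& block_transp, std::size_t num_modes) {
  std::vector<std::vector<bool>> transp;
  transp.reserve(block_transp.size());
  for (bool t : block_transp)
    transp.emplace_back(num_modes, t);
  return transp;
}
```

During the scan over insns, only the single per-block bit has to be cleared, which replaces the per-mode loop quoted above.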
2023-11-11mode-switching: Fix the mode passed to the emit hookRichard Sandiford1-13/+17
optimize_mode_switching passes an entity's current mode (if known) to the emit hook. However, the mode that it passed ignored the effect of the after hook. Instead, the mode for the first emit call in a block was taken from the incoming mode, whereas the mode for each subsequent emit call was taken from the result of the previous call. The previous pass through the insns already calculated the correct mode, so this patch records it in the seginfo structure. (There was a 32-bit hole on 64-bit hosts, so this doesn't increase the size of the structure for them.) gcc/ * mode-switching.cc (seginfo): Add a prev_mode field. (new_seginfo): Take and initialize the prev_mode. (optimize_mode_switching): Update calls accordingly. Use the recorded modes during the emit phase, rather than computing one on the fly.
2023-11-11mode-switching: Avoid quadratic list operationRichard Sandiford1-16/+8
add_seginfo chained insn information to the end of a list by starting at the head of the list. This patch avoids the quadraticness by keeping track of the tail pointer. gcc/ * mode-switching.cc (add_seginfo): Replace head pointer with a pointer to the tail pointer. (optimize_mode_switching): Update calls accordingly.
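The "pointer to the tail pointer" technique looks roughly like this. `SegInfo` and `append` are simplified, hypothetical stand-ins for the real seginfo structure and add_seginfo.

```cpp
#include <cassert>

// Simplified stand-in for a list node.
struct SegInfo {
  int insn_id;
  SegInfo* next;
  explicit SegInfo(int id) : insn_id(id), next(nullptr) {}
};

// Append in O(1).  tail_ptr always points at the null link that terminates
// the list: initially the head pointer itself, afterwards the last node's
// next field.
inline void append(SegInfo**& tail_ptr, SegInfo* node) {
  node->next = nullptr;
  *tail_ptr = node;
  tail_ptr = &node->next;
}
```

A caller starts with `SegInfo* head = nullptr; SegInfo** tail = &head;` and passes `tail` to every `append` call, so no call ever walks the list from the head.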
2023-11-11mode-switching: Add note problemRichard Sandiford1-0/+1
optimize_mode_switching uses REG_DEAD notes to track register liveness, but it failed to tell DF to calculate up-to-date notes. Noticed by inspection. I don't have a testcase that fails because of this. gcc/ * mode-switching.cc (optimize_mode_switching): Call df_note_add_problem.
2023-11-11mode-switching: Tweak the macro/hook documentationRichard Sandiford3-54/+84
I found the documentation for the mode-switching macros/hooks a bit hard to follow at first. This patch tries to add the information that I think would have made it easier to understand. Of course, documentation preferences are personal, and so I could be changing something that others understood to something that seems impenetrable.

Some notes on specific changes:

- "in an optimizing compilation" didn't seem accurate; the pass is run even at -O0, and often needs to be for correctness.
- "at run time" meant when the compiler was run, rather than when the compiled code was run.
- Removing the list of optional macros isn't a clarification, but it means that upcoming patches don't create an absurdly long list.
- I don't really understand the purpose of TARGET_MODE_PRIORITY, so I mostly left that alone.

gcc/
* target.def: Tweak documentation of mode-switching hooks.
* doc/tm.texi.in (OPTIMIZE_MODE_SWITCHING): Tweak documentation.
(NUM_MODES_FOR_MODE_SWITCHING): Likewise.
* doc/tm.texi: Regenerate.
2023-11-11c: Synthesize nonnull attribute for parameters declared with static [PR110815]Martin Uecker4-12/+29
Parameters declared with `static` are nonnull. We synthesize an artificial nonnull attribute for such parameters to get the same warnings and optimizations. Bootstrapped and regression tested on x86. PR c/110815 PR c/112428 gcc/c-family: * c-attribs.cc (build_attr_access_from_parms): Synthesize nonnull attribute for parameters declared with `static`. gcc: * gimple-ssa-warn-access.cc (pass_waccess::maybe_check_access_sizes): Remove warning for parameters declared with `static`. gcc/testsuite: * gcc.dg/Wnonnull-8.c: Adapt test. * gcc.dg/Wnonnull-9.c: New test.
2023-11-11Make scan-assembler* ignore LTO sectionsJoern Rennecke4-6/+41
gcc/testsuite/ * lib/scanasm.exp (scan-assembler-times): Disregard LTO sections. (scan-assembler-dem, scan-assembler-dem-not): Likewise. (dg-scan): Likewise, if name starts with scan-assembler. (scan-raw-assembler): New proc. * gcc.dg/pr61868.c: Use scan-raw-assembler. * gcc.dg/scantest-lto.c: New test. gcc/ * doc/sourcebuild.texi (Scan the assembly output): Document change.
2023-11-11RISC-V: Add test for PR112469Juzhe-Zhong1-0/+13
PR112469 (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112469) has been fixed by Richard's patch: https://gcc.gnu.org/pipermail/gcc-patches/2023-November/635994.html Add a test to avoid regressions. Committed. PR target/112469 gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/pr112469.c: New test.
2023-11-10testsuite: fix lambda-decltype3.C in C++11Marek Polacek1-0/+2
This fixes FAIL: g++.dg/cpp0x/lambda/lambda-decltype3.C -std=c++11 (test for excess errors) due to lambda-decltype3.C:25:6: error: lambda capture initializers only available with '-std=c++14' or '-std=gnu++14' [-Wc++14-extensions] gcc/testsuite/ChangeLog: * g++.dg/cpp0x/lambda/lambda-decltype3.C: Check __cpp_init_captures.
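Guarding an init-capture with the `__cpp_init_captures` feature-test macro can look like the sketch below. The function `add_one` is a made-up example and is not taken from the test file; it only shows the guard pattern.

```cpp
#include <cassert>

// Use a C++14 init-capture when the compiler advertises support for it,
// and fall back to a plain by-value capture otherwise (e.g. -std=c++11).
int add_one(int x) {
#if __cpp_init_captures
  auto f = [y = x + 1] { return y; };
#else
  int y = x + 1;
  auto f = [y] { return y; };
#endif
  return f();
}
```

Both branches compute the same value, so the code builds cleanly under -std=c++11 without triggering the -Wc++14-extensions diagnostic quoted above.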
2023-11-10[PATCH] doc: Add fpatchable-function-entry to Option-Summary page[PR110983]Mao1-1/+2
gcc/ PR middle-end/110983 * doc/invoke.texi (Option Summary): Add -fpatchable-function-entry.
2023-11-10RISC-V: Fix indentation of "length" attribute for branches and jumpsMaciej W. Rozycki1-11/+17
The "length" attribute calculation expressions for branches and jumps are incorrectly and misleadingly indented, and they overrun the 80 column limit as well, which makes them hard to follow. Correct all these issues. gcc/ * config/riscv/riscv.md (length): Fix indentation for branch and jump length calculation expressions.
2023-11-10c23: recursive type checking of tagged typeMartin Uecker1-202/+58
Adapt the old and unused code for type checking for C23. gcc/c/: * c-typeck.cc (struct comptypes_data): Add anon_field flag. (comptypes, comptypes_check_unum_int, comptypes_check_different_types): Remove old cache. (tagged_tu_types_compatible_p): Rewrite.
2023-11-10g++: Rely on dg-do-what-default to avoid running pr102788.cc on non-vector targetsPatrick O'Neill1-1/+0
Testcases in g++.dg/vect rely on check_vect_support_and_set_flags to set dg-do-what-default and avoid running vector tests on non-vector targets. The testcase in this patch overwrites the default with dg-do run. Removing the dg-do run directive resolves this issue for non-vector targets (while still running the tests on vector targets). gcc/testsuite/ChangeLog: * g++.dg/vect/pr102788.cc: Remove dg-do run directive. Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>
2023-11-10Handle constant CONSTRUCTORs in operand_compareEric Botcazou3-7/+118
This teaches operand_compare to compare constant CONSTRUCTORs, which is quite helpful for so-called fat pointers in Ada, i.e. objects that are semantically pointers but are represented by structures made up of two pointers. This is modeled on the implementation present in the ICF pass. gcc/ * fold-const.cc (operand_compare::operand_equal_p) <CONSTRUCTOR>: Deal with nonempty constant CONSTRUCTORs. (operand_compare::hash_operand) <CONSTRUCTOR>: Hash DECL_FIELD_OFFSET and DECL_FIELD_BIT_OFFSET for FIELD_DECLs. gcc/testsuite/ * gnat.dg/opt103.ads, gnat.dg/opt103.adb: New test.
2023-11-10[IRA]: Check autoinc and memory address after temporary equivalence substitutionVladimir N. Makarov2-1/+48
My previous RA patches to take register equivalence into account do temporary register equivalence substitution to find out whether the equivalence can be consumed by insns. The insn with the substitution is checked for validity using target-dependent code. This code expects that autoinc operations work on a register, but this register can be substituted by equivalent memory. The patch fixes this problem. The patch also adds a check that the substitution can be consumed in a memory address too. gcc/ChangeLog: PR target/112337 * ira-costs.cc (validate_autoinc_and_mem_addr_p): New function. (equiv_can_be_consumed_p): Use it. gcc/testsuite/ChangeLog: PR target/112337 * gcc.target/arm/pr112337.c: New.
2023-11-10ada: Fix syntax errorAndris Pavēnis1-3/+3
gcc/ada/ * expect.c (__gnat_waitpid): fix syntax errors
2023-11-10c++: decltype of (by-value captured reference) [PR79620]Patrick Palka3-2/+37
The capture_decltype handling in finish_decltype_type wasn't looking through implicit INDIRECT_REF (added by convert_from_reference), which caused us to incorrectly resolve decltype((r)) to float& below. This patch fixes this, and adds an assert to outer_automatic_var_p to help prevent against such bugs. We still don't fully accept the example ultimately because for the decltype inside the lambda's trailing return type, at that point we're in lambda type scope but not yet in lambda function scope that the capture_decltype handling looks for (which is an orthogonal bug). PR c++/79620 gcc/cp/ChangeLog: * cp-tree.h (STRIP_REFERENCE_REF): Define. * semantics.cc (outer_var_p): Assert REFERENCE_REF_P is false. (finish_decltype_type): Look through implicit INDIRECT_REF when deciding whether to call capture_decltype. gcc/testsuite/ChangeLog: * g++.dg/cpp0x/lambda/lambda-decltype3.C: New test. Reviewed-by: Jason Merrill <jason@redhat.com>
2023-11-10c++: decltype of capture proxy [PR79378, PR96917]Patrick Palka3-1/+96
We typically don't see capture proxies in finish_decltype_type because process_outer_var_ref is a no-op within an unevaluated context and so a use of a captured variable within decltype resolves to the captured variable, not the capture. But we can see them during decltype(auto) deduction and for decltype of an init-capture, which suggests we need to handle capture proxies specially within finish_decltype_type after all. This patch adds such handling. PR c++/79378 PR c++/96917 gcc/cp/ChangeLog: * semantics.cc (finish_decltype_type): Handle an id-expression naming a capture proxy specially. gcc/testsuite/ChangeLog: * g++.dg/cpp1y/decltype-auto7.C: New test. * g++.dg/cpp1y/lambda-init20.C: New test. Reviewed-by: Jason Merrill <jason@redhat.com>
2023-11-10Allow md iterators to include other iteratorsRichard Sandiford3-47/+47
This patch allows an .md iterator to include the contents of previous iterators, possibly with an extra condition attached. Too much indirection might become hard to follow, so for the AArch64 changes I tried to stick to things that seemed likely to be uncontroversial: (a) structure iterators that combine modes for different sizes and vector counts (b) iterators that explicitly duplicate another iterator (for iterating over the cross product) gcc/ * read-rtl.cc (md_reader::read_mapping): Allow iterators to include other iterators. * doc/md.texi: Document the change. * config/aarch64/iterators.md (DREG2, VQ2, TX2, DX2, SX2): Include the iterator that is being duplicated, rather than reproducing it. (VSTRUCT_D): Redefine using VSTRUCT_[234]D. (VSTRUCT_Q): Likewise VSTRUCT_[234]Q. (VSTRUCT_2QD, VSTRUCT_3QD, VSTRUCT_4QD, VSTRUCT_QD): Redefine using the individual D and Q iterators.
2023-11-10i386: Clear stack protector scratch with zero/sign-extend instructionUros Bizjak1-8/+66
Use unrelated register initializations using zero/sign-extend instructions to clear the stack protector scratch register. Handle only SI -> DImode extensions for 64-bit targets, as this is the only extension that triggers the peephole a non-negligible number of times. Also use an explicit check for word_mode instead of a mode iterator in peephole2 patterns to avoid pattern explosion. gcc/ChangeLog: * config/i386/i386.md (stack_protect_set_1 peephole2): Explicitly check operand 2 for word_mode. (stack_protect_set_1 peephole2 #2): Ditto. (stack_protect_set_2 peephole2): Ditto. (stack_protect_set_3 peephole2): Ditto. (*stack_protect_set_4z_<mode>_di): New insn pattern. (*stack_protect_set_4s_<mode>_di): Ditto. (stack_protect_set_4 peephole2): New peephole2 pattern to substitute stack protector scratch register clear with unrelated register initialization involving zero/sign-extend instruction.
2023-11-10i386: Fix ashift insn mnemonic in shift code attributeUros Bizjak1-1/+1
gcc/ChangeLog: * config/i386/i386.md (shift): Use SAL instead of SLL for ashift insn mnemonic.
2023-11-10Middle-end: Fix bug of induction variable vectorization for RVVJuzhe-Zhong2-1/+62
PR: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112438

1. The SELECT_VL result is not necessarily VF in a non-final iteration, so the current GIMPLE IR is wrong:

   ...
   _35 = .SELECT_VL (ivtmp_33, VF);
   _21 = vect_vec_iv_.8_22 + { VF, ... };

E.g. consider a total iteration count N = 6 and VF = 4. SELECT_VL is defined so that its output in a non-final iteration need not be VF; the value depends on the hardware implementation. Suppose we have an RVV CPU core whose vsetvl distributes the workload evenly. It may process 3 elements in the 1st iteration and 3 elements in the last iteration. Then the induction variable update here:

   _21 = vect_vec_iv_.8_22 + { POLY_INT_CST [4, 4], ... };

is wrong: it adds VF, which is 4, although we didn't actually process 4 elements. It should add 3, which is the result of SELECT_VL. So the correct IR should be:

   _36 = .SELECT_VL (ivtmp_34, VF);
   _22 = (int) _36;
   vect_cst__21 = [vec_duplicate_expr] _22;

2. This issue only happens with non-SLP vectorization of a single rgroup, since:

   if (LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo))
     {
       tree iv_type = LOOP_VINFO_RGROUP_IV_TYPE (loop_vinfo);
       if (direct_internal_fn_supported_p (IFN_SELECT_VL, iv_type,
                                           OPTIMIZE_FOR_SPEED)
           && LOOP_VINFO_LENS (loop_vinfo).length () == 1
           && LOOP_VINFO_LENS (loop_vinfo)[0].factor == 1 && !slp
           && (!LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
               || !LOOP_VINFO_VECT_FACTOR (loop_vinfo).is_constant ()))
         LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo) = true;
     }

3. This issue doesn't appear in nested loops, whether LOOP_VINFO_USING_SELECT_VL_P is true or false, since:

   # vect_vec_iv_.6_5 = PHI <_19(3), { 0, ... }(5)>
   # vect_diff_15.7_20 = PHI <vect_diff_9.8_22(3), vect_diff_18.5_11(5)>
   _19 = vect_vec_iv_.6_5 + { 1, ... };
   vect_diff_9.8_22 = .COND_LEN_ADD ({ -1, ... }, vect_vec_iv_.6_5, vect_diff_15.7_20, vect_diff_15.7_20, _28, 0);
   ivtmp_1 = ivtmp_4 + 4294967295;
   ....
   <bb 5> [local count: 6549826]:
   # vect_diff_18.5_11 = PHI <vect_diff_9.8_22(4), { 0, ... }(2)>
   # ivtmp_26 = PHI <ivtmp_27(4), 40(2)>
   _28 = .SELECT_VL (ivtmp_26, POLY_INT_CST [4, 4]);
   goto <bb 3>; [100.00%]

Note the induction variable IR:

   _21 = vect_vec_iv_.8_22 + { POLY_INT_CST [4, 4], ... };

updates the induction variable independently of VF (it doesn't care how many elements are processed in the iteration). The update is loop invariant, so it won't be a problem even if LOOP_VINFO_USING_SELECT_VL_P is true.

Testing passed. OK for trunk?

PR tree-optimization/112438

gcc/ChangeLog:
* tree-vect-loop.cc (vectorizable_induction): Bugfix when LOOP_VINFO_USING_SELECT_VL_P.

gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/pr112438.c: New test.
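The N = 6, VF = 4 scenario can be reproduced with a small scalar simulation. This is a sketch under assumptions: `select_vl_even` is a hypothetical model of a core that splits the remaining work evenly, and `iv_starts` is an illustrative helper; neither is vectorizer code.

```cpp
#include <cassert>
#include <vector>

// Hypothetical SELECT_VL model: split the remaining elements evenly over
// the iterations that are still needed.  The result never exceeds vf.
int select_vl_even(int remaining, int vf) {
  int iters = (remaining + vf - 1) / vf;   // iterations still needed
  return (remaining + iters - 1) / iters;  // ceil (remaining / iters)
}

// Return the value of the induction variable at the start of each vector
// iteration.  With step_by_vl the IV advances by the SELECT_VL result;
// otherwise it advances by vf, as in the incorrect IR.
std::vector<int> iv_starts(int n, int vf, bool step_by_vl) {
  std::vector<int> starts;
  int iv = 0;
  for (int remaining = n; remaining > 0;) {
    int vl = select_vl_even(remaining, vf);
    starts.push_back(iv);
    iv += step_by_vl ? vl : vf;
    remaining -= vl;
  }
  return starts;
}
```

With this model, `iv_starts(6, 4, true)` yields starts 0 and 3, matching the elements actually processed, while `iv_starts(6, 4, false)` yields 0 and 4, so the second iteration would see an induction value that skips element 3.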
2023-11-10RISC-V: Add combine optimization by slideup for vec_init vectorizationJuzhe-Zhong11-0/+1101
This patch is a small optimization for vector initialization. Discovered when I am evaluating benchmarks. Consider this following case: void foo3 (int8_t *out, int8_t x, int8_t y) { v16qi v = {y, y, y, y, y, y, y, x, x, x, x, x, x, x, x, x}; *(v16qi*)out = v; } Before this patch: vsetivli zero,16,e8,m1,ta,ma vmv.v.x v1,a2 vslide1down.vx v1,v1,a1 vslide1down.vx v1,v1,a1 vslide1down.vx v1,v1,a1 vslide1down.vx v1,v1,a1 vslide1down.vx v1,v1,a1 vslide1down.vx v1,v1,a1 vslide1down.vx v1,v1,a1 vslide1down.vx v1,v1,a1 vslide1down.vx v1,v1,a1 vse8.v v1,0(a0) ret After this patch: vsetivli zero,16,e8,m1,ta,ma vmv.v.x v1,a1 vmv.v.x v2,a2 vslideup.vi v1,v2,8 vse8.v v1,0(a0) ret gcc/ChangeLog: * config/riscv/riscv-protos.h (enum insn_type): New enum. * config/riscv/riscv-v.cc (rvv_builder::combine_sequence_use_slideup_profitable_p): New function. (expand_vector_init_slideup_combine_sequence): Ditto. (expand_vec_init): Add slideup combine optimization. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vls/def.h: Add combine test. * gcc.target/riscv/rvv/autovec/vls-vlmax/combine-1.c: New test. * gcc.target/riscv/rvv/autovec/vls/combine-1.c: New test. * gcc.target/riscv/rvv/autovec/vls/combine-2.c: New test. * gcc.target/riscv/rvv/autovec/vls/combine-3.c: New test. * gcc.target/riscv/rvv/autovec/vls/combine-4.c: New test. * gcc.target/riscv/rvv/autovec/vls/combine-5.c: New test. * gcc.target/riscv/rvv/autovec/vls/combine-6.c: New test. * gcc.target/riscv/rvv/autovec/vls/combine-7.c: New test.
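The effect of combining two broadcasts with a slide-up can be modelled in scalar code. This is an illustrative sketch only: `V16`, `splat` and `slideup` are hypothetical helpers that mimic vmv.v.x and vslideup.vi with vl = 16, and the half-and-half layout is chosen for the example rather than taken from the testcases.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using V16 = std::array<std::int8_t, 16>;

// Model of vmv.v.x: broadcast a scalar to every element.
V16 splat(std::int8_t x) {
  V16 v;
  v.fill(x);
  return v;
}

// Model of vslideup.vi dest, src, offset with vl = 16: elements below
// offset keep dest's value, and dest[i] = src[i - offset] for i >= offset.
V16 slideup(V16 dest, const V16& src, unsigned offset) {
  for (unsigned i = offset; i < dest.size(); ++i)
    dest[i] = src[i - offset];
  return dest;
}
```

For a vector whose lanes 0..7 hold `a` and lanes 8..15 hold `b`, `slideup(splat(a), splat(b), 8)` needs two broadcasts and one slide, instead of one slide1down per trailing element.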
2023-11-10RISC-V: testsuite: Fix 32-bit FAILs.Robin Dapp33-331/+335
This patch fixes several more FAILs that would only show in 32-bit runs. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/binop/vmul-zvfh-run.c: Adjust. * gcc.target/riscv/rvv/autovec/binop/vsub-zvfh-run.c: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_narrow_shift_run-3.c: Ditto. * gcc.target/riscv/rvv/autovec/cond/pr111401.c: Ditto. * gcc.target/riscv/rvv/autovec/conversions/vfcvt-itof-zvfh-run.c: Ditto. * gcc.target/riscv/rvv/autovec/conversions/vfcvt_rtz-zvfh-run.c: Ditto. * gcc.target/riscv/rvv/autovec/conversions/vfncvt-ftoi-zvfh-run.c: Ditto. * gcc.target/riscv/rvv/autovec/conversions/vfncvt-itof-template.h: Ditto. * gcc.target/riscv/rvv/autovec/conversions/vfncvt-itof-zvfh-run.c: Ditto. * gcc.target/riscv/rvv/autovec/conversions/vfncvt-zvfh-run.c: Ditto. * gcc.target/riscv/rvv/autovec/conversions/vfwcvt-ftoi-zvfh-run.c: Ditto. * gcc.target/riscv/rvv/autovec/conversions/vfwcvt-itof-zvfh-run.c: Ditto. * gcc.target/riscv/rvv/autovec/conversions/vfwcvt-zvfh-run.c: Ditto. * gcc.target/riscv/rvv/autovec/slp-mask-run-1.c: Ditto. * gcc.target/riscv/rvv/autovec/ternop/ternop_run_zvfh-1.c: Ditto. * gcc.target/riscv/rvv/autovec/ternop/ternop_run_zvfh-10.c: Ditto. * gcc.target/riscv/rvv/autovec/ternop/ternop_run_zvfh-11.c: Ditto. * gcc.target/riscv/rvv/autovec/ternop/ternop_run_zvfh-12.c: Ditto. * gcc.target/riscv/rvv/autovec/ternop/ternop_run_zvfh-2.c: Ditto. * gcc.target/riscv/rvv/autovec/ternop/ternop_run_zvfh-3.c: Ditto. * gcc.target/riscv/rvv/autovec/ternop/ternop_run_zvfh-4.c: Ditto. * gcc.target/riscv/rvv/autovec/ternop/ternop_run_zvfh-5.c: Ditto. * gcc.target/riscv/rvv/autovec/ternop/ternop_run_zvfh-6.c: Ditto. * gcc.target/riscv/rvv/autovec/ternop/ternop_run_zvfh-7.c: Ditto. * gcc.target/riscv/rvv/autovec/ternop/ternop_run_zvfh-8.c: Ditto. * gcc.target/riscv/rvv/autovec/ternop/ternop_run_zvfh-9.c: Ditto. * gcc.target/riscv/rvv/autovec/unop/vfsqrt-run.c: Ditto. * gcc.target/riscv/rvv/autovec/unop/vfsqrt-rv32gcv.c: Ditto. 
* gcc.target/riscv/rvv/autovec/unop/vfsqrt-rv64gcv.c: Ditto. * gcc.target/riscv/rvv/autovec/unop/vfsqrt-template.h: Ditto. * gcc.target/riscv/rvv/autovec/unop/vfsqrt-zvfh-run.c: Ditto. * gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-zvfh-run.c: Ditto. * gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-zvfh-run.c: Ditto.
2023-11-10vect: Look through pattern stmt in fold_left_reduction.Robin Dapp2-1/+11
It appears as if we "look through" a statement pattern in vect_finish_replace_stmt but not before when we replace the newly created vector statement's lhs. Then the lhs is the statement pattern's lhs while in vect_finish_replace_stmt we assert that it's from the statement the pattern replaced. This patch uses vect_orig_stmt on the scalar destination's definition so the replaced statement is used everywhere. gcc/ChangeLog: PR tree-optimization/112464 * tree-vect-loop.cc (vectorize_fold_left_reduction): Use vect_orig_stmt on scalar_dest_def_info. gcc/testsuite/ChangeLog: * gcc.target/i386/pr112464.c: New test.
2023-11-10RISC-V: XTheadMemPair: Fix missing fcsr handling in ISR prologue/epilogueJin Ma2-28/+46
The t0 register is used as a temporary register for interrupts, so it needs special treatment. It is necessary to avoid using "th.ldd" in the interrupt program to stop the subsequent operation of the t0 register, so they need to exchange positions in the function "riscv_for_each_saved_reg". gcc/ChangeLog: * config/riscv/riscv.cc (riscv_for_each_saved_reg): Place the interrupt operation before the XTheadMemPair.
2023-11-10tree-optimization/110221 - SLP and loop mask/lenRichard Biener2-0/+27
The following fixes the issue that when SLP stmts are internal defs but appear invariant because they end up only using invariant defs then they get scheduled outside of the loop. This nice optimization breaks down when loop masks or lens are applied since those are not explicitly tracked as dependences. The following makes sure to never schedule internal defs outside of the vectorized loop when the loop uses masks/lens. PR tree-optimization/110221 * tree-vect-slp.cc (vect_schedule_slp_node): When loop masking / len is applied make sure to not schedule internal defs outside of the loop. * gfortran.dg/pr110221.f: New testcase.
2023-11-10vect: Don't set excess bits in uniform masksAndrew Stubbs1-2/+14
AVX ignores any excess bits in the mask (at least for vector sizes >=8), but AMD GCN magically uses a larger vector than was intended (the smaller sizes are "fake"), leading to wrong-code. This patch fixes amdgcn execution failures in gcc.dg/vect/pr81740-1.c, gfortran.dg/c-interop/contiguous-1.f90, gfortran.dg/c-interop/ff-descriptor-7.f90, and others. gcc/ChangeLog: * expr.cc (store_constructor): Add "and" operation to uniform mask generation.
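The "and" step can be illustrated on a plain 64-bit integer mask. This is a sketch, not the store_constructor code: `uniform_mask` is a hypothetical helper and the 64-lane limit is an assumption made for the example.

```cpp
#include <cassert>
#include <cstdint>

// Build an integer mask for nunits lanes that are all true or all false.
// Without the final AND, an "all true" mask would also set the bits above
// nunits, which matters when the hardware vector is wider than intended.
std::uint64_t uniform_mask(bool value, unsigned nunits) {
  std::uint64_t all = value ? ~std::uint64_t{0} : std::uint64_t{0};
  std::uint64_t live = nunits >= 64 ? ~std::uint64_t{0}
                                    : ((std::uint64_t{1} << nunits) - 1);
  return all & live;
}
```

For a 4-lane vector, `uniform_mask(true, 4)` is `0xF` rather than all ones, so lanes beyond the logical vector size stay inactive.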
2023-11-10amdgcn: Fix v_add constraints (pr112308)Andrew Stubbs1-50/+68
The instruction doesn't allow "B" constants for the vop3b encoding (used when the cc register isn't VCC), so fix the pattern and all the insns that might get split to it post-reload. Also switch to the new constraint format for ease of adding new alternatives. gcc/ChangeLog: PR target/112308 * config/gcn/gcn-valu.md (add<mode>3<exec_clobber>): Fix B constraint and switch to the new format. (add<mode>3_dup<exec_clobber>): Likewise. (add<mode>3_vcc<exec_vcc>): Likewise. (add<mode>3_vcc_dup<exec_vcc>): Likewise. (add<mode>3_vcc_zext_dup): Likewise. (add<mode>3_vcc_zext_dup_exec): Likewise. (add<mode>3_vcc_zext_dup2): Likewise. (add<mode>3_vcc_zext_dup2_exec): Likewise.
2023-11-10middle-end/112469 - fix missing converts in vec_cond_expr simplificationRichard Biener2-4/+16
The following avoids type inconsistencies in .COND_op generated by simplifications of VEC_COND_EXPRs. PR middle-end/112469 * match.pd (cond ? op a : b -> .COND_op (cond, a, b)): Add missing view_converts. * gcc.dg/torture/pr112469.c: New testcase.
2023-11-10amdgcn: Fix vector min/max ICEAndrew Stubbs1-1/+9
The DImode min/max instructions need a clobber that SImode does not, so add the special case to the reduction expand code. gcc/ChangeLog: * config/gcn/gcn.cc (gcn_expand_reduc_scalar): Add clobber to DImode min/max instructions.
2023-11-10LoongArch: Fix instruction name typo in lsx_vreplgr2vr_<lsxfmt_f> templateChenghui Pan1-1/+1
gcc/ChangeLog: * config/loongarch/lsx.md: Fix instruction name typo in lsx_vreplgr2vr_<lsxfmt_f> template.
2023-11-10RISC-V: Robustify vec_init pattern[NFC]Juzhe-Zhong1-1/+13
Although current GCC does not ICE when I create an FP16 vec_init case with -march=rv64gcv (no ZVFH), the current vec_init pattern looks wrong: the V_VLS FP16 predicate is TARGET_VECTOR_ELEN_FP_16, whereas vec_init needs vfslide1down/vfslide1up. It makes more sense to robustify the vec_init patterns by splitting them into 2 patterns (one integer, the other float), like the other autovectorization patterns. gcc/ChangeLog: * config/riscv/autovec.md (vec_init<mode><vel>): Split patterns.
2023-11-10Revert "RISC-V: Support vec_init for trailing same element"Pan Li18-3179/+0
This reverts commit e7f4040d9d6ec40c48ada940168885d7dde03af9 as it introduces some legacy vmv insns.
2023-11-10RISC-V: Support vec_init for trailing same elementPan Li18-0/+3179
This patch would like to support the vec_init for the trailing same element in the array. For example as below typedef double vnx16df __attribute__ ((vector_size (128))); __attribute__ ((noipa)) void f_vnx16df (double a, double b, double *out) { vnx16df v = {a, a, a, b, b, b, b, b, b, b, b, b, b, b, b, b}; *(vnx16df *) out = v; } Before this patch: f_vnx16df: vsetivli zero,16,e64,m8,ta,ma vfmv.v.f v8,fa0 vfslide1down.vf v8,v8,fa1 vfslide1down.vf v8,v8,fa1 vfslide1down.vf v8,v8,fa1 vfslide1down.vf v8,v8,fa1 vfslide1down.vf v8,v8,fa1 vfslide1down.vf v8,v8,fa1 vfslide1down.vf v8,v8,fa1 vfslide1down.vf v8,v8,fa1 vfslide1down.vf v8,v8,fa1 vfslide1down.vf v8,v8,fa1 vfslide1down.vf v8,v8,fa1 vfslide1down.vf v8,v8,fa1 vfslide1down.vf v8,v8,fa1 vs8r.v v8,0(a0) ret After this patch: f_vnx16df: vsetivli zero,16,e64,m8,ta,ma vfmv.v.f v16,fa1 vfslide1up.vf v8,v16,fa0 vmv8r.v v16,v8 vfslide1up.vf v8,v16,fa0 vmv8r.v v16,v8 vfslide1up.vf v8,v16,fa0 vs8r.v v8,0(a0) ret gcc/ChangeLog: * config/riscv/riscv-v.cc (expand_vector_init_trailing_same_elem): New fun impl to expand the insn when trailing same elements. (expand_vec_init): Try trailing same elements when vec_init. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vls-vlmax/init-same-tail-1.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/init-same-tail-2.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/init-same-tail-3.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/init-same-tail-4.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/init-same-tail-5.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/init-same-tail-run-1.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/init-same-tail-run-2.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/init-same-tail-run-3.c: New test. * gcc.target/riscv/rvv/autovec/vls/init-same-tail-1.c: New test. * gcc.target/riscv/rvv/autovec/vls/init-same-tail-2.c: New test. * gcc.target/riscv/rvv/autovec/vls/init-same-tail-3.c: New test. 
* gcc.target/riscv/rvv/autovec/vls/init-same-tail-4.c: New test. * gcc.target/riscv/rvv/autovec/vls/init-same-tail-5.c: New test. * gcc.target/riscv/rvv/autovec/vls/init-same-tail-6.c: New test. * gcc.target/riscv/rvv/autovec/vls/init-same-tail-7.c: New test. * gcc.target/riscv/rvv/autovec/vls/init-same-tail-8.c: New test. * gcc.target/riscv/rvv/autovec/vls/init-same-tail-9.c: New test. Signed-off-by: Pan Li <pan2.li@intel.com>
2023-11-10test: Fix FAIL of pr97428.c for RVVJuzhe-Zhong1-0/+1
gcc/testsuite/ChangeLog: * gcc.dg/vect/pr97428.c: Add additional compile option for riscv.
2023-11-10RISC-V: Move cond_copysign from combine pattern to autovec patternJuzhe-Zhong2-22/+22
Since cond_copysign is now supported in match.pd (middle-end), we don't need to support conditional copysign in the RTL combine pass. Instead, we can support it directly with an explicit cond_copysign optab. Conditional copysign tests are already available in the testsuite, so no new tests are needed. gcc/ChangeLog: * config/riscv/autovec-opt.md (*cond_copysign<mode>): Remove. * config/riscv/autovec.md (cond_copysign<mode>): New pattern.
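As a rough illustration of the kind of source a conditional copysign covers, here is a scalar C loop. This is a sketch only: the function name cond_copysign_loop, the choice of a[i] as the value for inactive elements, and the assumption that the vectorizer would match this loop to cond_copysign on a given target are all illustrative, not taken from the patch or its tests.

```c
#include <math.h>
#include <stddef.h>

/* For each element, copy the sign of b[i] onto a[i] only where
   cond[i] is nonzero; otherwise pass a[i] through unchanged.  */
void
cond_copysign_loop (double *restrict r, const double *restrict a,
                    const double *restrict b, const int *restrict cond,
                    size_t n)
{
  for (size_t i = 0; i < n; i++)
    r[i] = cond[i] ? copysign (a[i], b[i]) : a[i];
}
```

The point of a dedicated optab is that the condition and the copysign can be expressed as one masked operation rather than being stitched together later by combine.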
2023-11-10 Internal-fn: Add FLOATN support for l/ll round and rint [PR/112432] Pan Li 1 -4/+4
Functions defined with DEF_EXT_LIB_FLOATN_NX_BUILTINS should use DEF_INTERNAL_FLT_FLOATN_FN instead of DEF_INTERNAL_FLT_FN so that they get FLOATN support. Based on the glibc API and the GCC builtins, the table below shows which functions support FLOATN.

+---------+-------+-------------------------------------+
|         | glibc | gcc: DEF_EXT_LIB_FLOATN_NX_BUILTINS |
+---------+-------+-------------------------------------+
| iceil   | N     | N                                   |
| ifloor  | N     | N                                   |
| irint   | N     | N                                   |
| iround  | N     | N                                   |
| lceil   | N     | N                                   |
| lfloor  | N     | N                                   |
| lrint   | Y     | Y                                   |
| lround  | Y     | Y                                   |
| llceil  | N     | N                                   |
| llfloor | N     | N                                   |
| llrint  | Y     | Y                                   |
| llround | Y     | Y                                   |
+---------+-------+-------------------------------------+

This patch adds FLOATN support for:
1. lrint
2. lround
3. llrint
4. llround

The following tests pass with this patch:
1. x86 bootstrap and regression test.
2. aarch64 regression test.
3. riscv regression tests.

PR target/112432

gcc/ChangeLog:
* internal-fn.def (LRINT): Add FLOATN support.
(LROUND): Ditto.
(LLRINT): Ditto.
(LLROUND): Ditto.

Signed-off-by: Pan Li <pan2.li@intel.com>
2023-11-09 [committed] Improve single bit zero extraction on H8. Jeff Law 1 -2/+68
When zero extracting a single bit bitfield from bits 16..31 on the H8 we currently generate some pretty bad code. The fundamental issue is we can't shift efficiently and there's no trivial way to extract a single bit out of the high half word of an SImode value. What usually happens is we use a synthesized right shift to get the single bit into the desired position, then a bit-and to mask off everything we don't care about. The shifts are expensive, even using tricks like half and quarter word moves to implement shift-by-16 and shift-by-8. Additionally, a logical right shift must clear out the upper bits, which is redundant since we're going to mask things with &1 later. This patch provides a consistently better sequence for such extractions. The general form moves the high half into the low half, extracts the bit into C, clears the destination, then moves C into the destination, with a few special cases. This also avoids all the shenanigans for H8/SX, which has a much more capable shifter. It's not single cycle, but it is reasonably efficient. This has been regression tested on the H8 without issues. Pushing to the trunk momentarily. jeff ps. Yes, supporting zero extraction of multi-bit fields might be improvable as well. But I've already spent more time on this than I can reasonably justify. gcc/ * config/h8300/combiner.md (single bit sign_extract): Avoid recently added patterns for H8/SX. (single bit zero_extract): New patterns.
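The two strategies described above can be compared at the C level. This is a sketch under stated assumptions: the function names are hypothetical, "C" is taken to be the carry flag and is modelled here as a plain int, and nothing below reflects the actual RTL patterns in combiner.md.

```c
#include <assert.h>
#include <stdint.h>

/* Straightforward form: shift the bit down to position 0, then mask.  */
uint32_t
extract_bit_shift (uint32_t x, unsigned bit)
{
  return (x >> bit) & 1u;
}

/* Shape of the new sequence for bits 16..31, written step by step.  */
uint32_t
extract_bit_high_half (uint32_t x, unsigned bit)
{
  assert (bit >= 16 && bit <= 31);

  /* Move the high half word into the low half word.  */
  uint16_t high = (uint16_t) (x >> 16);

  /* Extract the single bit into a flag (modelling "C").  */
  int carry = (high & (1u << (bit - 16))) != 0;

  /* Clear the destination, then move the flag into it.  */
  uint32_t dest = 0;
  dest |= (uint32_t) carry;
  return dest;
}
```

Both functions return the same value for every bit in 16..31; the second just avoids a variable-length shift of the full 32-bit value, which is the expensive part on this target.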
2023-11-10 Fix wrong code due to vec_merge + pcmp to blendvb splitter. liuhongt 2 -2/+110
gcc/ChangeLog: PR target/112443 * config/i386/sse.md (*avx2_pcmp<mode>3_4): Fix swap condition from LT to GT since there's a NOT in the pattern. (*avx2_pcmp<mode>3_5): Ditto. gcc/testsuite/ChangeLog: * g++.target/i386/pr112443.C: New test.
2023-11-10 bpf: fix pseudo-c asm emitted for *mulsidi3_zeroextend Jose E. Marchesi 3 -8/+26
This patch fixes the pseudo-c BPF assembly syntax used for *mulsidi3_zeroextend, which was being emitted as: rN *= wM instead of the proper way to denote a mul32 in pseudo-C syntax: wN *= wM Includes test. Tested in bpf-unknown-none-gcc target in x86_64-linux-gnu host. gcc/ChangeLog: * config/bpf/bpf.cc (bpf_print_register): Accept modifier code 'W' to force emitting register names using the wN form. * config/bpf/bpf.md (*mulsidi3_zeroextend): Force operands to always use wN written form in pseudo-C assembly syntax. gcc/testsuite/ChangeLog: * gcc.target/bpf/mulsidi3-zeroextend-pseudoc.c: New test.
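To make the difference between the two spellings concrete, here is a small C model of a 32-bit multiply whose result is zero-extended to 64 bits, next to a full 64-bit multiply. The function names are hypothetical, and whether GCC selects *mulsidi3_zeroextend for the first function is an assumption; the code only illustrates the arithmetic that a mul32 denotes.

```c
#include <stdint.h>

/* 32-bit multiply: the product wraps modulo 2^32 and is then
   zero-extended to 64 bits (the "wN *= wM" form).  */
uint64_t
mul32_zext (uint32_t a, uint32_t b)
{
  return (uint64_t) (a * b);
}

/* Full 64-bit multiply, for contrast (an "rN *= rM" form).  */
uint64_t
mul64 (uint64_t a, uint64_t b)
{
  return a * b;
}
```

For inputs such as 0x10000 * 0x10000 the two give different results, which is why the assembly has to name the 32-bit register form on both sides.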
2023-11-10 bpf: testsuite: fix expected regexp in gcc.target/bpf/ldxdw.c Jose E. Marchesi 1 -1/+1
gcc/testsuite/ChangeLog: * gcc.target/bpf/ldxdw.c: Fix regexp with expected result.
2023-11-09 diagnostics: cleanups to diagnostic-show-locus.cc David Malcolm 6 -73/+104
Reduce implicit usage of line_table global, and move source printing to within diagnostic_context. gcc/ChangeLog: * diagnostic-show-locus.cc (layout::m_line_table): New field. (compatible_locations_p): Convert to... (layout::compatible_locations_p): ...this, replacing uses of line_table global with m_line_table. (layout::layout): Convert "richloc" param from a pointer to a const reference. Initialize m_line_table member. (layout::maybe_add_location_range): Replace uses of line_table global with m_line_table. Pass the latter to linemap_client_expand_location_to_spelling_point. (layout::print_leading_fixits): Pass m_line_table to affects_line_p. (layout::print_trailing_fixits): Likewise. (gcc_rich_location::add_location_if_nearby): Update for change to layout ctor params. (diagnostic_show_locus): Convert to... (diagnostic_context::maybe_show_locus): ...this, converting richloc param from a pointer to a const reference. Make "loc" const. Split out printing part of function to... (diagnostic_context::show_locus): ...this. (selftest::test_offset_impl): Update for change to layout ctor params. (selftest::test_layout_x_offset_display_utf8): Likewise. (selftest::test_layout_x_offset_display_tab): Likewise. (selftest::test_tab_expansion): Likewise. * diagnostic.h (diagnostic_context::maybe_show_locus): New decl. (diagnostic_context::show_locus): New decl. (diagnostic_show_locus): Convert from a decl to an inline function. * gdbinit.in (break-on-diagnostic): Update from a breakpoint on diagnostic_show_locus to one on diagnostic_context::maybe_show_locus. * genmatch.cc (linemap_client_expand_location_to_spelling_point): Add "set" param and use it in place of line_table global. * input.cc (expand_location_1): Likewise. (expand_location): Update for new param of expand_location_1. (expand_location_to_spelling_point): Likewise. (linemap_client_expand_location_to_spelling_point): Add "set" param and use it in place of line_table global. 
* tree-diagnostic-path.cc (event_range::print): Pass line_table for new param of linemap_client_expand_location_to_spelling_point. libcpp/ChangeLog: * include/line-map.h (rich_location::get_expanded_location): Make const. (rich_location::get_line_table): New accessor. (rich_location::m_line_table): Make the pointer be const. (rich_location::m_have_expanded_location): Make mutable. (rich_location::m_expanded_location): Likewise. (fixit_hint::affects_line_p): Add const line_maps * param. (linemap_client_expand_location_to_spelling_point): Likewise. * line-map.cc (rich_location::get_expanded_location): Make const. Pass m_line_table to linemap_client_expand_location_to_spelling_point. (rich_location::maybe_add_fixit): Likewise. (fixit_hint::affects_line_p): Add set param and pass to linemap_client_expand_location_to_spelling_point. Signed-off-by: David Malcolm <dmalcolm@redhat.com>
2023-11-09 Add missing declaration of get_restrict in C++ interface Guillaume Gomez 1 -0/+1
gcc/jit/ChangeLog: * libgccjit++.h: