path: root/gcc
Age  Commit message  Author  Files  Lines
2022-02-25testsuite: Fix ASAN error [PR104687]Martin Liska1-1/+1
PR testsuite/104687 gcc/testsuite/ChangeLog: * gcc.dg/lto/20090717_0.c: Fix asan error.
2022-02-25arc: Fail conditional move expand patternsClaudiu Zissulescu2-6/+22
If the movcc comparison is not valid it triggers an assert in the current implementation. This behavior is not needed as we can FAIL the movcc expand pattern. gcc/ * config/arc/arc.cc (gen_compare_reg): Return NULL_RTX if the comparison is not valid. * config/arc/arc.md (movsicc): Fail if comparison is not valid. (movdicc): Likewise. (movsfcc): Likewise. (movdfcc): Likewise. Signed-off-by: Claudiu Zissulescu <claziss@synopsys.com>
2022-02-25tree-optimization/103037 - PRE simplifying valueized expressionsRichard Biener4-22/+68
This fixes a long-standing issue in PRE where we track valueized expressions in our expression sets that we use for PHI translation, code insertion but also feed into match-and-simplify via vn_nary_simplify. But that's not what is expected from vn_nary_simplify or match-and-simplify which assume we are simplifying with operands available at the point of the expression so they can use contextual information on the SSA names like ranges. While the VN side was updated to ensure this with the rewrite to RPO VN, thereby removing all workarounds that nullified such contextual info on all SSA names, the PRE side still suffers from this. The following patch tries to apply minimal surgery at this point and makes PRE track un-valueized expressions in the expression sets but only for the NARY kind (both NAME and CONSTANT do not suffer from this issue), leaving the REFERENCE kind alone. The REFERENCE kind is important when trying to remove the workarounds still in place in compute_avail for code hoisting, but that's a separate issue and we have a working workaround in place. Doing this comes at the cost of duplicating the VN IL on the PRE side for NARY and eventually some extra overhead for translated expressions that is difficult to assess. 2022-02-25 Richard Biener <rguenther@suse.de> PR tree-optimization/103037 * tree-ssa-sccvn.h (alloc_vn_nary_op_noinit): Declare. (vn_nary_length_from_stmt): Likewise. (init_vn_nary_op_from_stmt): Likewise. (vn_nary_op_compute_hash): Likewise. * tree-ssa-sccvn.cc (alloc_vn_nary_op_noinit): Export. (vn_nary_length_from_stmt): Likewise. (init_vn_nary_op_from_stmt): Likewise. (vn_nary_op_compute_hash): Likewise. * tree-ssa-pre.cc (pre_expr_obstack): New obstack. (get_or_alloc_expr_for_nary): Pass in the value-id to use, (re-)compute the hash value and if the expression is not found allocate it from pre_expr_obstack. 
(phi_translate_1): Do not insert the NARY found in the VN tables but build a PRE expression from the valueized NARY with the value-id we eventually found. (find_or_generate_expression): Assert we have an entry for constant values. (compute_avail): Insert not valueized expressions into EXP_GEN using the value-id from the VN tables. (init_pre): Allocate pre_expr_obstack. (fini_pre): Free pre_expr_obstack. * gcc.dg/torture/pr103037.c: New testcase.
2022-02-25i386: Use a new temp slot kind for splitter to floatdi<mode>2_i387_with_xmm [PR104674]Jakub Jelinek3-3/+34
As mentioned in the PR, the following testcase is miscompiled for similar reasons as the already fixed PR78791 - we use SLOT_TEMP slots in various places during expansion and during expansion we can guarantee that the lifetimes of those temporary slots don't overlap. But the following splitter uses SLOT_TEMP too and in between expansion and split1 there is a possibility that something extends the lifetime of SLOT_TEMP created slots across an instruction that will be split by this splitter. The following patch fixes it by using a new temp slot kind to make sure it doesn't reuse a SLOT_TEMP that could be live across the instruction. 2022-02-25 Jakub Jelinek <jakub@redhat.com> PR target/104674 * config/i386/i386.h (enum ix86_stack_slot): Add SLOT_FLOATxFDI_387. * config/i386/i386.md (splitter to floatdi<mode>2_i387_with_xmm): Use SLOT_FLOATxFDI_387 rather than SLOT_TEMP. * gcc.target/i386/pr104674.c: New test.
2022-02-25warning-control: Comment spelling fixJakub Jelinek1-1/+1
This fixes a spelling mistake I found while looking at warning-control implementation. 2022-02-25 Jakub Jelinek <jakub@redhat.com> * warning-control.cc (get_nowarn_spec): Comment spelling fix.
2022-02-25internal-fn: Call do_pending_stack_adjust in expand_SPACESHIP [PR104679]Jakub Jelinek2-0/+24
The following testcase is miscompiled on ia32 at -O2, because when expand_SPACESHIP is called, we have a pending stack adjustment from the foo call right before it. Now, ix86_expand_fp_spaceship uses emit_jump_insn several times but then emit_jump also several times. While emit_jump_insn doesn't do do_pending_stack_adjust (), emit_jump does, so we end up with: ... 8: call [`_Z3foodl'] argc:0x10 REG_CALL_DECL `_Z3foodl' 9: r88:DF=[`a'] 10: r89:HI=unspec[cmp(r88:DF,0.0)] 25 11: flags:CC=unspec[r89:HI] 26 12: pc={(unordered(flags:CCFP,0))?L27:pc} REG_BR_PROB 536868 66: NOTE_INSN_BASIC_BLOCK 4 13: pc={(uneq(flags:CCFP,0))?L19:pc} REG_BR_PROB 214748364 67: NOTE_INSN_BASIC_BLOCK 5 14: pc={(flags:CCFP>0)?L23:pc} REG_BR_PROB 536870916 68: NOTE_INSN_BASIC_BLOCK 6 15: r86:SI=0xffffffffffffffff 16: {sp:SI=sp:SI+0x10;clobber flags:CC;} REG_ARGS_SIZE 0 17: pc=L29 18: barrier 19: L19: 69: NOTE_INSN_BASIC_BLOCK 7 ... The sp += 16 pending stack adjustment was emitted in the middle of the sequence and is effective only for the single case of the 4 possibilities where .SPACESHIP returns -1, in all other cases the stack isn't adjusted and so we ICE during dwarf2cfi. Now, we could either call do_pending_stack_adjust in ix86_expand_fp_spaceship, or use there calls that actually don't call do_pending_stack_adjust (but having the stack adjustment across branches is generally undesirable), or we can call it in expand_SPACESHIP for all targets (note, just i386 currently implements it). I chose the generic code because e.g. expand_{addsub,neg,mul}_overflow in the same file also call do_pending_stack_adjust in internal-fn.cc for the same reasons, that it is expected that most if not all targets will expand those through jumps and we don't want all of the targets to need to deal with that. 2022-02-25 Jakub Jelinek <jakub@redhat.com> PR middle-end/104679 * internal-fn.cc (expand_SPACESHIP): Call do_pending_stack_adjust. * g++.dg/torture/pr104679.C: New test.
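To make the four outcomes mentioned above concrete, here is a minimal C sketch of a three-way floating-point comparison that follows the branch order visible in the dump (unordered, equal, greater, otherwise less). The names `spaceship_double` and `spaceship_demo` are hypothetical, and the result value 2 for the unordered case is an assumption made for illustration; only the -1 result is stated in the message.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical helper: three-way compare of two doubles.
   Returns 2 if the operands are unordered (a NaN is involved; the value 2
   is an assumption), 0 if equal, 1 if a > b, and -1 otherwise (a < b).  */
int
spaceship_double (double a, double b)
{
  if (isunordered (a, b))
    return 2;
  if (a == b)
    return 0;
  if (a > b)
    return 1;
  return -1;
}

/* Each result is reached through a different branch, so an adjustment
   emitted on only one path would not apply to the other three.  */
void
spaceship_demo (void)
{
  assert (spaceship_double (1.0, 2.0) == -1);
  assert (spaceship_double (2.0, 2.0) == 0);
  assert (spaceship_double (3.0, 2.0) == 1);
  assert (spaceship_double (NAN, 2.0) == 2);
}
```

Only the first path in this sketch corresponds to the -1 case where the `sp += 16` adjustment landed in the dump.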
2022-02-25match.pd: Don't create BIT_NOT_EXPRs for COMPLEX_TYPE [PR104675]Jakub Jelinek3-1/+50
We don't support BIT_{AND,IOR,XOR,NOT}_EXPR on complex types, &/|/^ are just rejected for them, and ~ is parsed as CONJ_EXPR. So, we should avoid simplifications which turn valid complex type expressions into something that will ICE during expansion. 2022-02-25 Jakub Jelinek <jakub@redhat.com> PR tree-optimization/104675 * match.pd (-A - 1 -> ~A, -1 - A -> ~A): Don't simplify for COMPLEX_TYPE. * gcc.dg/pr104675-1.c: New test. * gcc.dg/pr104675-2.c: New test.
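As a small illustration of the two simplifications named in the ChangeLog entry, the following C sketch checks the integer identities -A - 1 == ~A and -1 - A == ~A. The function names are hypothetical, and unsigned arithmetic is used here only so that the sketch avoids signed overflow. The identity has no counterpart for complex operands because, as the message says, ~ is parsed as a conjugate there.

```c
#include <assert.h>
#include <limits.h>

/* -A - 1, computed in unsigned arithmetic (wraps modulo 2^N).  */
unsigned int
neg_minus_one (unsigned int a)
{
  return -a - 1u;
}

/* -1 - A, the second form named in the ChangeLog entry.  */
unsigned int
minus_one_minus (unsigned int a)
{
  return -1u - a;
}

/* Both forms agree with bitwise NOT for integer operands.  */
void
bit_not_demo (void)
{
  assert (neg_minus_one (5u) == ~5u);
  assert (minus_one_minus (5u) == ~5u);
}
```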
2022-02-24Revert commit r12-5852-g50e8b0c9bca6cdc57804f860ec5311b641753fbbAlexandre Oliva1-1/+1
The patch for PR103302 caused PR104121, and extended the live ranges of LRA reloads. for gcc/ChangeLog PR target/104121 PR target/103302 * expr.cc (emit_move_multi_word): Restore clobbers during LRA.
2022-02-24Add testcase from PR103845Alexandre Oliva1-0/+29
This problem was already fixed as part of PR104263: the abnormal edge that remained from before inlining didn't make sense after inlining. So this patch adds only the testcase. for gcc/testsuite/ChangeLog PR tree-optimization/103845 PR tree-optimization/104263 * gcc.dg/pr103845.c: New.
2022-02-24Cope with NULL dw_cfi_cfa_locAlexandre Oliva2-0/+24
In def_cfa_0, we may set the 2nd operand's dw_cfi_cfa_loc to NULL, but then cfi_oprnd_equal_p calls cfa_equal_p with a NULL dw_cfa_location*. This patch arranges for us to tolerate NULL dw_cfi_cfa_loc. for gcc/ChangeLog PR middle-end/104540 * dwarf2cfi.cc (cfi_oprnd_equal_p): Cope with NULL dw_cfi_cfa_loc. for gcc/testsuite/ChangeLog PR middle-end/104540 * g++.dg/pr104540.C: New.
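One way an equality predicate can tolerate NULL operands is sketched below in C. This is not the dwarf2cfi.cc code: `struct loc`, its fields, and `loc_equal_p` are hypothetical stand-ins chosen for illustration, and the sketch only shows the general shape of checking the pointers before dereferencing either one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical location record; not the real dw_cfa_location.  */
struct loc
{
  unsigned int reg;
  long offset;
};

/* Compare two locations, treating NULL as equal only to NULL.  */
bool
loc_equal_p (const struct loc *a, const struct loc *b)
{
  /* If either pointer is NULL, do not dereference either of them.  */
  if (a == NULL || b == NULL)
    return a == b;
  return a->reg == b->reg && a->offset == b->offset;
}

void
loc_equal_demo (void)
{
  struct loc x = { 7, 16 };
  struct loc y = { 7, 16 };
  assert (loc_equal_p (&x, &y));
  assert (!loc_equal_p (&x, NULL));
  assert (loc_equal_p (NULL, NULL));
}
```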
2022-02-24Copy EH phi args for throwing hardened comparesAlexandre Oliva2-3/+45
When we duplicate a throwing compare for hardening, the EH edge from the original compare gets duplicated for the inverted compare, but we failed to adjust any PHI nodes in the EH block. This patch adds the needed adjustment, copying the PHI args from those of the preexisting edge. for gcc/ChangeLog PR tree-optimization/103856 * gimple-harden-conditionals.cc (non_eh_succ_edge): Enable the eh edge to be requested through an extra parameter. (pass_harden_compares::execute): Copy PHI args in the EH dest block for the new EH edge added for the inverted compare. for gcc/testsuite/ChangeLog PR tree-optimization/103856 * g++.dg/pr103856.C: New.
2022-02-25Daily bump.GCC Administrator5-1/+95
2022-02-24Fix attr-retain-* testcases for 32-bit PowerPC.Pat Haugen2-0/+4
PR testsuite/100407 gcc/testsuite/ * gcc.c-torture/compile/attr-retain-1.c: Add -G0 for 32-bit PowerPC. * gcc.c-torture/compile/attr-retain-2.c: Likewise.
2022-02-24Fortran: frontend code for F2018 QUIET specifier to STOP and ERROR STOPHarald Anlauf8-13/+189
Fortran 2018 allows for a QUIET specifier to the STOP and ERROR STOP statements. Whilst the gfortran library code has provided support for this specifier for quite some time, the frontend implementation was missing. gcc/fortran/ChangeLog: PR fortran/84519 * dump-parse-tree.cc (show_code_node): Dump QUIET specifier when present. * match.cc (gfc_match_stopcode): Implement parsing of F2018 QUIET specifier. F2018 stopcodes may have non-default integer kind. * resolve.cc (gfc_resolve_code): Add checks for QUIET argument. * trans-stmt.cc (gfc_trans_stop): Pass QUIET specifier to call of library function. gcc/testsuite/ChangeLog: PR fortran/84519 * gfortran.dg/stop_1.f90: New test. * gfortran.dg/stop_2.f: New test. * gfortran.dg/stop_3.f90: New test. * gfortran.dg/stop_4.f90: New test.
2022-02-24RISC-V: Document the degree of position independence that medany affordsPalmer Dabbelt1-0/+4
The code generated by -mcmodel=medany is defined to be position-independent, but is not guaranteed to function correctly when linked into position-independent executables or libraries. See the recent discussion at the psABI specification [1] for more details. It would be better to reject these invalid sequences when linking, but as pointed out in a recent LD bug [2] there may be some compatibility issues related to the PCREL_HI20 relocations used to initialize GP. Given the complexity here it's unlikely we'll be able to reject these sequences any time soon, so instead just document that these may not work. [1]: https://github.com/riscv-non-isa/riscv-elf-psabi-doc/issues/245 [2]: https://sourceware.org/bugzilla/show_bug.cgi?id=28789 gcc/ChangeLog: * doc/invoke.texi (RISC-V -mcmodel=medany): Document the degree of position independence that -mcmodel=medany affords. Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2022-02-24Fix clang warning in pt.ccMartin Liska1-1/+1
Fixes: gcc/cp/pt.cc:13755:23: warning: suggest braces around initialization of subobject [-Wmissing-braces] tree_vec_map in = { fn, nullptr }; gcc/cp/ChangeLog: * pt.cc (defarg_insts_for): Use braces for subobject.
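The warning concerns an aggregate whose first member is itself an aggregate. A minimal C sketch of the braced form follows; `struct map_base`, `struct vec_map` and `make_entry` are hypothetical stand-ins for illustration and are not the GCC types, whose layout is only assumed to be similar.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical nested aggregate: the first member is itself a struct.  */
struct map_base
{
  const void *from;
};

struct vec_map
{
  struct map_base base;
  const void *to;
};

/* The inner braces initialize the 'base' subobject explicitly, which is
   the form that -Wmissing-braces asks for.  */
struct vec_map
make_entry (const void *fn)
{
  struct vec_map in = { { fn }, NULL };
  return in;
}

void
braces_demo (void)
{
  int key = 0;
  struct vec_map m = make_entry (&key);
  assert (m.base.from == &key);
  assert (m.to == NULL);
}
```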
2022-02-24bpf: do not --enable-gcov for bpf-*-* targetsJose E. Marchesi2-4/+20
This patch changes the build machinery in order to disable the build of GCOV (both compiler and libgcc) in bpf-*-* targets. The reason for this change is that BPF is (currently) too restricted to support the coverage instrumentation. Tested in bpf-unknown-none and x86_64-linux-gnu targets. 2022-02-23 Jose E. Marchesi <jose.marchesi@oracle.com> gcc/ChangeLog PR target/104656 * configure.ac: --disable-gcov if targeting bpf-*. * configure: Regenerate. libgcc/ChangeLog PR target/104656 * configure.ac: --disable-gcov if targeting bpf-*. * configure: Regenerate.
2022-02-24tree-optimization/104676 - free nb_iterations after loop distributionRichard Biener2-1/+36
Loop distribution can release SSA names used in nb_iterations, make sure to release those. 2022-02-24 Richard Biener <rguenther@suse.de> PR tree-optimization/104676 * tree-loop-distribution.cc (loop_distribution::execute): Do a full scev_reset. * gcc.dg/torture/pr104676.c: New testcase.
2022-02-24sccvn: Fix visit_reference_op_call value numbering of vdefs [PR104601]Jakub Jelinek2-7/+51
The following testcase is miscompiled, because -fipa-pure-const discovers that bar is const, but when sccvn during fre3 sees # .MEM_140 = VDEF <.MEM_96> *__pred$__d_43 = _50 (_49); where _50 value numbers to &bar, it value numbers .MEM_140 to vuse_ssa_val (gimple_vuse (stmt)). For const/pure calls that return a SSA_NAME (or don't have lhs) that is fine, those calls don't store anything, but if the lhs is present and not an SSA_NAME, value numbering the vdef to anything but itself means that e.g. walk_non_aliased_vuses won't consider the call, but the call acts as a store to its lhs. When it is ignored, sccvn will return whatever has been stored to the lhs earlier. I've bootstrapped/regtested an earlier version of this patch, which did the if (!lhs && gimple_call_lhs (stmt)) changed |= set_ssa_val_to (vdef, vdef); part before else if (vnresult->result_vdef), and that regressed +FAIL: gcc.dg/pr51879-16.c scan-tree-dump-times pre "foo \\\\(" 1 +FAIL: gcc.dg/pr51879-16.c scan-tree-dump-times pre "foo2 \\\\(" 1 so this updated patch uses result_vdef there as before and only otherwise (which I think must be the const/pure case) decides based on whether the lhs is non-SSA_NAME. 2022-02-24 Jakub Jelinek <jakub@redhat.com> PR tree-optimization/104601 * tree-ssa-sccvn.cc (visit_reference_op_call): For calls with non-SSA_NAME lhs value number vdef to itself instead of e.g. the vuse value number. * g++.dg/torture/pr104601.C: New test.
2022-02-24[nvptx] Add missing t-omp-device isasTom de Vries2-2/+8
In t-omp-device we list isas that can be used in omp declare variant like so: ... #pragma omp declare variant (f30) match (device={isa("sm_30")}) ... and in nvptx_omp_device_kind_arch_isa we handle them. Update both to reflect the current list of isas. Tested on x86_64-linux with nvptx accelerator. gcc/ChangeLog: 2022-02-23 Tom de Vries <tdevries@suse.de> * config/nvptx/nvptx.cc (nvptx_omp_device_kind_arch_isa): Handle sm_70, sm_75 and sm_80. * config/nvptx/t-omp-device: Add sm_53, sm_70, sm_75 and sm_80. Co-Authored-By: Tobias Burnus <tobias@codesourcery.com>
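For context, a complete translation unit using the pragma quoted above might look like the following C sketch. The function bodies, the return values and the helper `variant_demo` are assumptions for illustration. The pragma takes effect only when OpenMP is enabled (for example with -fopenmp); otherwise it is ignored and the base function is always called. On a host that does not report the "sm_30" isa, the base function is expected to be chosen.

```c
#include <assert.h>

/* Variant meant for devices whose isa matches "sm_30".  */
int
f30 (void)
{
  return 30;
}

/* Base function.  With OpenMP enabled, calls may be redirected to f30
   when the device isa selector matches.  */
#pragma omp declare variant (f30) match (device={isa("sm_30")})
int
f (void)
{
  return 0;
}

/* On a host without the sm_30 isa, the base version is called.  */
void
variant_demo (void)
{
  assert (f30 () == 30);
  assert (f () == 0);
}
```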
2022-02-24[nvptx] Add shf.{l,r}.wrap insnTom de Vries3-0/+59
Ptx contains funnel shift operations shf.l.wrap and shf.r.wrap that can be used to implement 32-bit left or right rotate. Add define_insns rotlsi3 and rotrsi3. Tested on nvptx. gcc/ChangeLog: 2022-02-23 Tom de Vries <tdevries@suse.de> * config/nvptx/nvptx.md (define_insn "rotlsi3", define_insn "rotrsi3"): New define_insn. gcc/testsuite/ChangeLog: 2022-02-23 Tom de Vries <tdevries@suse.de> * gcc.target/nvptx/rotate-run.c: New test. * gcc.target/nvptx/rotate.c: New test.
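The 32-bit rotates that these patterns implement can be written in portable C as shown in the sketch below. The names `rotl32` and `rotr32` are hypothetical, and whether a given compiler maps this idiom to shf.l.wrap or shf.r.wrap on nvptx is not claimed here.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate x left by n bits; the count is reduced modulo 32 so that no
   shift by 32 or more is ever performed.  */
uint32_t
rotl32 (uint32_t x, unsigned int n)
{
  n &= 31;
  return (x << n) | (x >> (-n & 31));
}

/* Rotate x right by n bits, with the same count handling.  */
uint32_t
rotr32 (uint32_t x, unsigned int n)
{
  n &= 31;
  return (x >> n) | (x << (-n & 31));
}

void
rotate_demo (void)
{
  assert (rotl32 (0x80000001u, 1) == 0x00000003u);
  assert (rotr32 (0x00000003u, 1) == 0x80000001u);
}
```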
2022-02-24[nvptx] Fix dummy location in gen_commentTom de Vries1-1/+1
I committed "[nvptx] Add -mptx-comment", but tested it in combination with the proposed "[final] Handle compiler-generated asm insn" ( https://gcc.gnu.org/pipermail/gcc-patches/2022-February/590721.html ), so by itself the commit introduced some regressions: ... FAIL: gcc.dg/20020426-2.c (internal compiler error: Segmentation fault) FAIL: gcc.dg/analyzer/zlib-3.c (internal compiler error: Segmentation fault) FAIL: gcc.dg/pr101223.c (internal compiler error: Segmentation fault) FAIL: gcc.dg/torture/pr80764.c -O2 (internal compiler error: Segmentation fault) ... These are due to cfun->function_start_locus == 0. Fix these by using DECL_SOURCE_LOCATION (cfun->decl) instead. Tested on nvptx. gcc/ChangeLog: 2022-02-23 Tom de Vries <tdevries@suse.de> * config/nvptx/nvptx.cc (gen_comment): Use DECL_SOURCE_LOCATION (cfun->decl) instead of cfun->function_start_locus.
2022-02-24Fix typo in <code>v1ti3.liuhongt2-2/+16
For EVEX-encoded vp{xor,or,and}, a suffix is needed; otherwise the assembler reports an error for vpxor %xmm0, %xmm31, %xmm1 Error: unsupported instruction `vpxor' gcc/ChangeLog: * config/i386/sse.md (<code>v1ti3): Add suffix and replace isa attr of alternative 2 from avx to avx512vl. gcc/testsuite/ChangeLog: * gcc.target/i386/avx512vl-logicsuffix-1.c: New test.
2022-02-24Daily bump.GCC Administrator5-1/+125
2022-02-23analyzer: handle __attribute__((const)) [PR104434]David Malcolm13-6/+954
When testing -fanalyzer on openblas-0.3, I noticed slightly over 2000 false positives from -Wanalyzer-malloc-leak on code like this: if( LAPACKE_lsame( vect, 'b' ) || LAPACKE_lsame( vect, 'p' ) ) { pt_t = (lapack_complex_float*) LAPACKE_malloc( sizeof(lapack_complex_float) * ldpt_t * MAX(1,n) ); [...snip...] } [...snip lots of code...] if( LAPACKE_lsame( vect, 'b' ) || LAPACKE_lsame( vect, 'q' ) ) { LAPACKE_free( pt_t ); } where LAPACKE_lsame is a char-comparison function implemented in a different TU. The analyzer naively considers the execution path where: LAPACKE_lsame( vect, 'b' ) || LAPACKE_lsame( vect, 'p' ) is true at the malloc guard, but then false at the free guard, which is thus a memory leak. This patch makes -fanalyzer respect __attribute__((const)), so that the analyzer treats such functions as returning the same value when given the same inputs. I've filed https://github.com/xianyi/OpenBLAS/issues/3543 suggesting that LAPACKE_lsame be annotated with __attribute__((const)); with that, and with this patch, the false positives seem to be fixed. gcc/analyzer/ChangeLog: PR analyzer/104434 * analyzer.h (class const_fn_result_svalue): New decl. * region-model-impl-calls.cc (call_details::get_manager): New. * region-model-manager.cc (region_model_manager::get_or_create_const_fn_result_svalue): New. (region_model_manager::log_stats): Log m_const_fn_result_values_map. * region-model.cc (const_fn_p): New. (maybe_get_const_fn_result): New. (region_model::on_call_pre): Handle fndecls with __attribute__((const)) by calling the above rather than making a conjured_svalue. * region-model.h (visitor::visit_const_fn_result_svalue): New. (region_model_manager::get_or_create_const_fn_result_svalue): New decl. (region_model_manager::const_fn_result_values_map_t): New typedef. (region_model_manager::m_const_fn_result_values_map): New field. (call_details::get_manager): New decl. * svalue.cc (svalue::cmp_ptr): Handle SK_CONST_FN_RESULT. 
(const_fn_result_svalue::dump_to_pp): New. (const_fn_result_svalue::dump_input): New. (const_fn_result_svalue::accept): New. * svalue.h (enum svalue_kind): Add SK_CONST_FN_RESULT. (svalue::dyn_cast_const_fn_result_svalue): New. (class const_fn_result_svalue): New. (is_a_helper <const const_fn_result_svalue *>::test): New. (template <> struct default_hash_traits<const_fn_result_svalue::key_t>): New. gcc/testsuite/ChangeLog: PR analyzer/104434 * gcc.dg/analyzer/attr-const-1.c: New test. * gcc.dg/analyzer/attr-const-2.c: New test. * gcc.dg/analyzer/attr-const-3.c: New test. * gcc.dg/analyzer/pr104434-const.c: New test. * gcc.dg/analyzer/pr104434-nonconst.c: New test. * gcc.dg/analyzer/pr104434.h: New test. Signed-off-by: David Malcolm <dmalcolm@redhat.com>
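A reduced C sketch of the pattern described in this commit is given below. The function `same_char_nocase` is a hypothetical stand-in for a char-comparison helper such as LAPACKE_lsame, written with plain ASCII arithmetic so that it reads no global state, and `guarded_alloc` is an assumed caller that uses the same guard at the allocation and at the release. How the analyzer reports on this exact code is not claimed here.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical case-insensitive ASCII comparison.  The const attribute
   promises that the result depends only on the arguments.  */
__attribute__ ((const)) int
same_char_nocase (char a, char b)
{
  int ua = (a >= 'a' && a <= 'z') ? a - 'a' + 'A' : a;
  int ub = (b >= 'a' && b <= 'z') ? b - 'a' + 'A' : b;
  return ua == ub;
}

/* Allocate and release under the same guard.  Returns 1 if a buffer was
   obtained, 0 otherwise.  */
int
guarded_alloc (char vect)
{
  void *buf = NULL;
  int allocated;

  if (same_char_nocase (vect, 'b'))
    buf = malloc (16);
  allocated = (buf != NULL);

  /* Same call, same arguments: with the const attribute the second
     result can be assumed equal to the first.  */
  if (same_char_nocase (vect, 'b'))
    free (buf);
  return allocated;
}

void
const_attr_demo (void)
{
  assert (same_char_nocase ('b', 'B'));
  assert (guarded_alloc ('x') == 0);
}
```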
2022-02-23c++: Add new test [PR79493]Marek Polacek1-0/+7
A nice side effect of r12-1822 was improving the diagnostic we emit for the following test. PR c++/79493 gcc/testsuite/ChangeLog: * g++.dg/diagnostic/undeclared1.C: New test.
2022-02-23c++: Add fixed test [PR70077]Marek Polacek1-0/+17
Fixed with r10-1280. PR c++/70077 gcc/testsuite/ChangeLog: * g++.dg/cpp0x/noexcept76.C: New test.
2022-02-23middle-end/104644 - recursion with bswap match.pd patternRichard Biener4-14/+23
The following patch avoids infinite recursion during generic folding. The (cmp (bswap @0) INTEGER_CST@1) simplification relies on (bswap @1) actually being simplified, if it is not simplified, we just move the bswap from one operand to the other and if @0 is also INTEGER_CST, we apply the same rule next. The reason why bswap @1 isn't folded to INTEGER_CST is that the INTEGER_CST has TREE_OVERFLOW set on it and fold-const-call.cc predicate punts in such cases: static inline bool integer_cst_p (tree t) { return TREE_CODE (t) == INTEGER_CST && !TREE_OVERFLOW (t); } The patch uses ! modifier to ensure the bswap is simplified and extends support to GENERIC by means of requiring !EXPR_P which is not perfect but a conservative approximation. 2022-02-22 Richard Biener <rguenther@suse.de> PR tree-optimization/104644 * doc/match-and-simplify.texi: Amend ! documentation. * genmatch.cc (expr::gen_transform): Code-generate ! support for GENERIC. (parser::parse_expr): Allow ! for GENERIC. * match.pd (cmp (bswap @0) INTEGER_CST@1): Use ! modifier on bswap. * gcc.dg/pr104644.c: New test. Co-Authored-by: Jakub Jelinek <jakub@redhat.com>
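The effect of the simplification for an equality comparison can be seen in the C sketch below: comparing a byte-swapped value against a constant gives the same answer as comparing the original value against the byte-swapped constant. The function names and the constants are chosen for illustration only, and `__builtin_bswap32` is the GCC builtin (also available in Clang).

```c
#include <assert.h>
#include <stdint.h>

/* Original form: the swap is applied to the variable operand.  */
int
cmp_swapped (uint32_t x)
{
  return __builtin_bswap32 (x) == 0x11223344u;
}

/* Simplified form: the swap has been folded into the constant.  */
int
cmp_direct (uint32_t x)
{
  return x == 0x44332211u;
}

void
bswap_cmp_demo (void)
{
  assert (cmp_swapped (0x44332211u) == cmp_direct (0x44332211u));
  assert (cmp_swapped (0x11223344u) == cmp_direct (0x11223344u));
}
```

The rewrite is only useful when the swap of the constant folds away, which is the condition the ! modifier checks in the patch.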
2022-02-23Support SSA name declarations with pointer typeRichard Biener3-6/+66
Currently we fail to parse int * _3; as SSA name and instead get a VAR_DECL because of the way the C frontends declarator specs work. That causes havoc if those supposed SSA names are used in PHIs or in other places where VAR_DECLs are not allowed. The following fixes the pointer case in an ad-hoc way - for more complex type declarators we probably have to find a way to re-use the C frontend grokdeclarator without actually creating a VAR_DECL there (or maybe make it create an SSA name). Pointers appear too often to be neglected though, thus the following ad-hoc fix for this. This also adds verification that we do not end up with SSA names without definitions as can happen when reducing a GIMPLE testcase. Instead of working through segfaults one-by-one we emit errors for all of those at once now. 2022-02-23 Richard Biener <rguenther@suse.de> gcc/c * gimple-parser.cc (c_parser_parse_gimple_body): Diagnose SSA names without definition. (c_parser_gimple_declaration): Handle pointer typed SSA names. gcc/testsuite/ * gcc.dg/gimplefe-49.c: New testcase. * gcc.dg/gimplefe-error-13.c: Likewise.
2022-02-23tree-optimization/101636 - CTOR vectorization ICERichard Biener3-6/+135
The following fixes an ICE when vectorizing the defs of a CTOR results in a different vector type than expected. That can happen with AARCH64 SVE and a fixed vector length as noted in r10-5979 and on x86 with AVX512 mask CTORs and trying to re-vectorize using SSE as shown in this bug. The fix is simply to reject the vectorization when it didn't produce the desired type. 2022-02-23 Richard Biener <rguenther@suse.de> PR tree-optimization/101636 * tree-vect-slp.cc (vect_print_slp_tree): Dump the vector type of the node. (vect_slp_analyze_operations): Make sure the CTOR is vectorized with an expected type. (vectorize_slp_instance_root_stmt): Revert r10-5979 fix. * gcc.target/i386/pr101636.c: New testcase. * c-c++-common/torture/pr101636.c: Likewise.
2022-02-23warn-recursion: Don't warn for __builtin_calls in gnu_inline extern inline functions [PR104633]Jakub Jelinek4-5/+72
The first two testcases show different ways how e.g. the glibc _FORTIFY_SOURCE wrappers are implemented, and on Winfinite-recursion-3.c the new -Winfinite-recursion warning emits a false positive warning. It is a false positive because when a builtin with 2 names is called through the __builtin_ name (but not all builtins have a name prefixed exactly like that) from extern inline function with gnu_inline semantics, it doesn't mean the compiler will ever attempt to use the user inline wrapper for the call, the __builtin_ just does what the builtin function is expected to do and either expands into some compiler generated code, or if the compiler decides to emit a call it will use an actual definition of the function, but that is not the extern inline gnu_inline function which is never emitted out of line. Compared to that, in Winfinite-recursion-5.c the extern inline gnu_inline wrapper calls the builtin by the same name as the function's name and in that case it is infinite recursion, we actually try to inline the recursive call and also error because the recursion is infinite during inlining; without always_inline we wouldn't error but it is still infinite recursion, the user has no control on how many recursive calls we actually inline. 2022-02-22 Jakub Jelinek <jakub@redhat.com> PR c/104633 * gimple-warn-recursion.cc (pass_warn_recursion::find_function_exit): Don't warn about calls to corresponding builtin from extern inline gnu_inline wrappers. * gcc.dg/Winfinite-recursion-3.c: New test. * gcc.dg/Winfinite-recursion-4.c: New test. * gcc.dg/Winfinite-recursion-5.c: New test.
2022-02-23nvptx: Back-end portion of a fix for PR target/104489.Roger Sayle1-1/+2
This one line fix/tweak is the back-end specific change for a fix for PR target/104489, that allows the ISA for GCC's nvptx backend to be bumped to sm_53. The machine-independent middle-end pieces were posted here: https://gcc.gnu.org/pipermail/gcc-patches/2022-February/590139.html 2022-02-23 Roger Sayle <roger@nextmovesoftware.com> gcc/ChangeLog PR target/104489 * config/nvptx/nvptx.md (*movhf_insn): Add subregs_ok attribute.
2022-02-23arm: Fix typo in auto-vectorized MVE comparisonsChristophe Lyon1-2/+2
I made a last minute renaming of mve_const_bool_vec_to_hi () into mve_bool_vec_to_const () and forgot to update the call sites in vfp.md accordingly. Committed as obvious. 2022-02-23 Christophe Lyon <christophe.lyon@arm.com> gcc/ PR target/100757 PR target/101325 * config/arm/vfp.md (thumb2_movhi_vfp, thumb2_movhi_fp16): Fix typo.
2022-02-23x86: Update Intel architectures ISA support in documentation.Cui,Lili1-87/+98
The ISA support documented for Intel architectures is inconsistent with what the compiler actually enables, so update all of the entries. gcc/Changelog: * doc/invoke.texi: Update documents for Intel architectures.
2022-02-23Daily bump.GCC Administrator4-1/+495
2022-02-22libgo: update README.gccIan Lance Taylor1-1/+1
Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/387514
2022-02-22rs6000: Move g++.dg/ext powerpc tests to g++.targetPaul A. Clarke28-32/+32
Also adjust DejaGnu directives, as specifically requiring "powerpc*-*-*" is no longer required. 2021-02-22 Paul A. Clarke <pc@us.ibm.com> gcc/testsuite * g++.dg/ext/altivec-1.C: Move to g++.target/powerpc, adjust dg directives. * g++.dg/ext/altivec-2.C: Likewise. * g++.dg/ext/altivec-3.C: Likewise. * g++.dg/ext/altivec-4.C: Likewise. * g++.dg/ext/altivec-5.C: Likewise. * g++.dg/ext/altivec-6.C: Likewise. * g++.dg/ext/altivec-7.C: Likewise. * g++.dg/ext/altivec-8.C: Likewise. * g++.dg/ext/altivec-9.C: Likewise. * g++.dg/ext/altivec-10.C: Likewise. * g++.dg/ext/altivec-11.C: Likewise. * g++.dg/ext/altivec-12.C: Likewise. * g++.dg/ext/altivec-13.C: Likewise. * g++.dg/ext/altivec-14.C: Likewise. * g++.dg/ext/altivec-15.C: Likewise. * g++.dg/ext/altivec-16.C: Likewise. * g++.dg/ext/altivec-17.C: Likewise. * g++.dg/ext/altivec-18.C: Likewise. * g++.dg/ext/altivec-cell-1.C: Likewise. * g++.dg/ext/altivec-cell-2.C: Likewise. * g++.dg/ext/altivec-cell-3.C: Likewise. * g++.dg/ext/altivec-cell-4.C: Likewise. * g++.dg/ext/altivec-cell-5.C: Likewise. * g++.dg/ext/altivec-types-1.C: Likewise. * g++.dg/ext/altivec-types-2.C: Likewise. * g++.dg/ext/altivec-types-3.C: Likewise. * g++.dg/ext/altivec-types-4.C: Likewise. * g++.dg/ext/undef-bool-1.C: Likewise.
2022-02-22Fortran: skip compile-time shape check if constructor shape is not knownHarald Anlauf2-0/+30
gcc/fortran/ChangeLog: PR fortran/104619 * resolve.cc (resolve_structure_cons): Skip shape check if shape of constructor cannot be determined at compile time. gcc/testsuite/ChangeLog: PR fortran/104619 * gfortran.dg/derived_constructor_comps_7.f90: New test.
2022-02-22Restore bootstrap on x86_64-pc-linux-gnuRoger Sayle1-8/+11
This patch resolves the bootstrap failure on x86_64-pc-linux-gnu. 2022-02-22 Roger Sayle <roger@nextmovesoftware.com> gcc/ChangeLog * config/i386/i386-expand.cc (ix86_expand_cmpxchg_loop): Restore bootstrap.
2022-02-22Get rid of 'gcc/omp-oacc-neuter-broadcast.cc:oacc_build_component_ref'Thomas Schwinge4-39/+18
Clean-up for commit e2a58ed6dc5293602d0d168475109caa81ad0f0d "openacc: Middle-end worker-partitioning support": as of commit 2a3f9f6532bb21d8ab6f16fbe9ee603f6b1405f2 "openacc: Shared memory layout optimisation", we're no longer running into the vectorizer ICEs for '!ADDR_SPACE_GENERIC_P'. gcc/ * omp-low.cc (omp_build_component_ref): Move function... * omp-general.cc (omp_build_component_ref): ... here. Remove 'static'. * omp-general.h (omp_build_component_ref): Declare function. * omp-oacc-neuter-broadcast.cc (oacc_build_component_ref): Remove function. (build_receiver_ref, build_sender_ref): Call 'omp_build_component_ref' instead.
2022-02-22Further simplify 'gcc/omp-oacc-neuter-broadcast.cc:record_field_map_t'Thomas Schwinge1-8/+4
Now that I've resolved GCC 'hash_map' issues (a while ago already), we may further simplify this after commit 049eda8274b7394523238b17ab12c3e2889f253e "Avoid 'GTY' use for 'gcc/omp-oacc-neuter-broadcast.cc:field_map'": as 'hash_map' Value, directly store 'field_map_t' objects, not pointers to manually allocated 'field_map_t' objects. gcc/ * omp-oacc-neuter-broadcast.cc (record_field_map_t): Further simplify. Adjust all users.
2022-02-22rs6000: Fix GC on rs6000.c decls for atomic handling (PR88134)Segher Boessenkool1-6/+5
In PR88134 it is pointed out that we do not have GTY markup for some variables we use for atomic. So, let's add that. 2022-02-22 Segher Boessenkool <segher@kernel.crashing.org> PR target/88134 * config/rs6000/rs6000.cc (atomic_hold_decl, atomic_clear_decl, atomic_update_decl): Add GTY markup.
2022-02-22arm: Add VPR_REG to ALL_REGSChristophe Lyon1-1/+1
VPR_REG should be part of ALL_REGS, this patch fixes this omission. Most of the work of this patch series was carried out while I was working at STMicroelectronics as a Linaro assignee. 2022-02-22 Christophe Lyon <christophe.lyon@arm.com> gcc/ * config/arm/arm.h (REG_CLASS_CONTENTS): Add VPR_REG to ALL_REGS.
2022-02-22arm: Convert more MVE/CDE builtins to predicate qualifiersChristophe Lyon3-78/+58
This patch covers a few non-load/store builtins where we do not use the <mode> iterator and thus we cannot use <MVE_vpred>. Most of the work of this patch series was carried out while I was working at STMicroelectronics as a Linaro assignee. 2022-02-22 Christophe Lyon <christophe.lyon@arm.com> gcc/ PR target/100757 PR target/101325 * config/arm/arm-builtins.cc (CX_UNARY_UNONE_QUALIFIERS): Use predicate. (CX_BINARY_UNONE_QUALIFIERS): Likewise. (CX_TERNARY_UNONE_QUALIFIERS): Likewise. (TERNOP_NONE_NONE_NONE_UNONE_QUALIFIERS): Delete. (QUADOP_NONE_NONE_NONE_NONE_UNONE_QUALIFIERS): Delete. (QUADOP_UNONE_UNONE_UNONE_UNONE_UNONE_QUALIFIERS): Delete. * config/arm/arm_mve_builtins.def: Use predicated qualifiers. * config/arm/mve.md: Use VxBI instead of HI.
2022-02-22arm: Convert more load/store MVE builtins to predicate qualifiersChristophe Lyon2-43/+43
This patch covers a few builtins where we do not use the <mode> iterator and thus we cannot use <MVE_vpred>. For v2di instructions, we keep the HI mode for predicates. Most of the work of this patch series was carried out while I was working at STMicroelectronics as a Linaro assignee. 2022-02-22 Christophe Lyon <christophe.lyon@arm.com> gcc/ PR target/100757 PR target/101325 * config/arm/arm-builtins.cc (STRSBS_P_QUALIFIERS): Use predicate qualifier. (STRSBU_P_QUALIFIERS): Likewise. (LDRGBS_Z_QUALIFIERS): Likewise. (LDRGBU_Z_QUALIFIERS): Likewise. (LDRGBWBXU_Z_QUALIFIERS): Likewise. (LDRGBWBS_Z_QUALIFIERS): Likewise. (LDRGBWBU_Z_QUALIFIERS): Likewise. (STRSBWBS_P_QUALIFIERS): Likewise. (STRSBWBU_P_QUALIFIERS): Likewise. * config/arm/mve.md: Use VxBI instead of HI.
2022-02-22arm: Convert more MVE builtins to predicate qualifiersChristophe Lyon3-543/+569
This patch covers all builtins that have an HI operand and use the <mode> iterator, thus we can replace HI with <MVE_vpred>. Most of the work of this patch series was carried out while I was working at STMicroelectronics as a Linaro assignee. 2022-02-22 Christophe Lyon <christophe.lyon@arm.com> gcc/ PR target/100757 PR target/101325 * config/arm/arm-builtins.cc (TERNOP_UNONE_UNONE_NONE_UNONE_QUALIFIERS): Change to ... (TERNOP_UNONE_UNONE_NONE_PRED_QUALIFIERS): ... this. (TERNOP_UNONE_UNONE_IMM_UNONE_QUALIFIERS): Change to ... (TERNOP_UNONE_UNONE_IMM_PRED_QUALIFIERS): ... this. (TERNOP_NONE_NONE_IMM_UNONE_QUALIFIERS): Change to ... (TERNOP_NONE_NONE_IMM_PRED_QUALIFIERS): ... this. (TERNOP_NONE_NONE_UNONE_UNONE_QUALIFIERS): Change to ... (TERNOP_NONE_NONE_UNONE_PRED_QUALIFIERS): ... this. (QUADOP_UNONE_UNONE_NONE_NONE_UNONE_QUALIFIERS): Change to ... (QUADOP_UNONE_UNONE_NONE_NONE_PRED_QUALIFIERS): ... this. (QUADOP_NONE_NONE_NONE_NONE_PRED_QUALIFIERS): New. (QUADOP_NONE_NONE_NONE_IMM_UNONE_QUALIFIERS): Change to ... (QUADOP_NONE_NONE_NONE_IMM_PRED_QUALIFIERS): ... this. (QUADOP_UNONE_UNONE_UNONE_UNONE_PRED_QUALIFIERS): New. (QUADOP_UNONE_UNONE_NONE_IMM_UNONE_QUALIFIERS): Change to ... (QUADOP_UNONE_UNONE_NONE_IMM_PRED_QUALIFIERS): ... this. (QUADOP_NONE_NONE_UNONE_IMM_UNONE_QUALIFIERS): Change to ... (QUADOP_NONE_NONE_UNONE_IMM_PRED_QUALIFIERS): ... this. (QUADOP_UNONE_UNONE_UNONE_IMM_UNONE_QUALIFIERS): Change to ... (QUADOP_UNONE_UNONE_UNONE_IMM_PRED_QUALIFIERS): ... this. (QUADOP_UNONE_UNONE_UNONE_NONE_UNONE_QUALIFIERS): Change to ... (QUADOP_UNONE_UNONE_UNONE_NONE_PRED_QUALIFIERS): ... this. (STRS_P_QUALIFIERS): Use predicate qualifier. (STRU_P_QUALIFIERS): Likewise. (STRSU_P_QUALIFIERS): Likewise. (STRSS_P_QUALIFIERS): Likewise. (LDRGS_Z_QUALIFIERS): Likewise. (LDRGU_Z_QUALIFIERS): Likewise. (LDRS_Z_QUALIFIERS): Likewise. (LDRU_Z_QUALIFIERS): Likewise. (QUINOP_UNONE_UNONE_UNONE_UNONE_IMM_UNONE_QUALIFIERS): Change to ... 
(QUINOP_UNONE_UNONE_UNONE_UNONE_IMM_PRED_QUALIFIERS): ... this. (BINOP_NONE_NONE_PRED_QUALIFIERS): New. (BINOP_UNONE_UNONE_PRED_QUALIFIERS): New. * config/arm/arm_mve_builtins.def: Use new predicated qualifiers. * config/arm/mve.md: Use MVE_VPRED instead of HI.
2022-02-22arm: Convert remaining MVE vcmp builtins to predicate qualifiersChristophe Lyon3-144/+145
This is mostly a mechanical change, only tested by the intrinsics expansion tests. Most of the work of this patch series was carried out while I was working at STMicroelectronics as a Linaro assignee. 2022-02-22 Christophe Lyon <christophe.lyon@arm.com> gcc/ PR target/100757 PR target/101325 * config/arm/arm-builtins.cc (BINOP_UNONE_NONE_NONE_QUALIFIERS): Delete. (TERNOP_UNONE_NONE_NONE_UNONE_QUALIFIERS): Change to ... (TERNOP_PRED_NONE_NONE_PRED_QUALIFIERS): ... this. (TERNOP_PRED_UNONE_UNONE_PRED_QUALIFIERS): New. * config/arm/arm_mve_builtins.def (vcmp*q_n_, vcmp*q_m_f): Use new predicated qualifiers. * config/arm/mve.md (mve_vcmp<mve_cmp_op>q_n_<mode>) (mve_vcmp*q_m_f<mode>): Use MVE_VPRED instead of HI.
2022-02-22arm: Fix vcond_mask expander for MVE (PR target/100757)Christophe Lyon12-130/+227
The problem in this PR is that we call VPSEL with a mask of vector type instead of HImode. This happens because operand 3 in vcond_mask is the pre-computed vector comparison and has vector type. This patch fixes it by implementing TARGET_VECTORIZE_GET_MASK_MODE, returning the appropriate VxBI mode when targeting MVE. In turn, this implies implementing vec_cmp<mode><MVE_vpred>, vec_cmpu<mode><MVE_vpred> and vcond_mask_<mode><MVE_vpred>, and we can move vec_cmp<mode><v_cmp_result>, vec_cmpu<mode><mode> and vcond_mask_<mode><v_cmp_result> back to neon.md since they are not used by MVE anymore. The new *<MVE_vpred> patterns listed above are implemented in mve.md since they are only valid for MVE. However this may make maintenance/comparison more painful than having all of them in vec-common.md. In the process, we can get rid of the recently added vcond_mve parameter of arm_expand_vector_compare. Compared to neon.md's vcond_mask_<mode><v_cmp_result> before my "arm: Auto-vectorization for MVE: vcmp" patch (r12-834), it keeps the VDQWH iterator added in r12-835 (to have V4HF/V8HF support), as well as the (!<Is_float_mode> || flag_unsafe_math_optimizations) condition which was not present before r12-834 although SF modes were enabled by VDQW (I think this was a bug). Using TARGET_VECTORIZE_GET_MASK_MODE has the advantage that we no longer need to generate vpsel with vectors of 0 and 1: the masks are now merged via scalar 'ands' instructions operating on 16-bit masks after converting the boolean vectors. In addition, this patch fixes a problem in arm_expand_vcond() where the result would be a vector of 0 or 1 instead of operand 1 or 2. Since we want to skip gcc.dg/signbit-2.c for MVE, we also add a new arm_mve effective target. 
Reducing the number of iterations in pr100757-3.c from 32 to 8, we generate the code below:

float a[32];
float fn1(int d) {
  float c = 4.0f;
  for (int b = 0; b < 8; b++)
    if (a[b] != 2.0f)
      c = 5.0f;
  return c;
}

fn1:
        ldr     r3, .L3+48
        vldr.64 d4, .L3              // q2=(2.0,2.0,2.0,2.0)
        vldr.64 d5, .L3+8
        vldrw.32        q0, [r3]     // q0=a(0..3)
        adds    r3, r3, #16
        vcmp.f32        eq, q0, q2   // cmp a(0..3) == (2.0,2.0,2.0,2.0)
        vldrw.32        q1, [r3]     // q1=a(4..7)
        vmrs    r3, P0
        vcmp.f32        eq, q1, q2   // cmp a(4..7) == (2.0,2.0,2.0,2.0)
        vmrs    r2, P0  @ movhi
        ands    r3, r3, r2           // r3=select(a(0..3)) & select(a(4..7))
        vldr.64 d4, .L3+16           // q2=(5.0,5.0,5.0,5.0)
        vldr.64 d5, .L3+24
        vmsr    P0, r3
        vldr.64 d6, .L3+32           // q3=(4.0,4.0,4.0,4.0)
        vldr.64 d7, .L3+40
        vpsel   q3, q3, q2           // q3=vcond_mask(4.0,5.0)
        vmov.32 r2, q3[1]            // keep the scalar max
        vmov.32 r0, q3[3]
        vmov.32 r3, q3[2]
        vmov.f32        s11, s12
        vmov    s15, r2
        vmov    s14, r3
        vmaxnm.f32      s15, s11, s15
        vmaxnm.f32      s15, s15, s14
        vmov    s14, r0
        vmaxnm.f32      s15, s15, s14
        vmov    r0, s15
        bx      lr
.L4:
        .align  3
.L3:
        .word   1073741824      // 2.0f
        .word   1073741824
        .word   1073741824
        .word   1073741824
        .word   1084227584      // 5.0f
        .word   1084227584
        .word   1084227584
        .word   1084227584
        .word   1082130432      // 4.0f
        .word   1082130432
        .word   1082130432
        .word   1082130432

This patch adds tests that trigger an ICE without this fix. The pr100757*.c testcases are derived from gcc.c-torture/compile/20160205-1.c, forcing the use of MVE, and using various types and return values different from 0 and 1 to avoid commonalization with boolean masks. In addition, since we should not need these masks, the tests make sure they are not present. Most of the work of this patch series was carried out while I was working at STMicroelectronics as a Linaro assignee. 2022-02-22 Christophe Lyon <christophe.lyon@arm.com> PR target/100757 gcc/ * config/arm/arm-protos.h (arm_get_mask_mode): New prototype. (arm_expand_vector_compare): Update prototype. * config/arm/arm.cc (TARGET_VECTORIZE_GET_MASK_MODE): New.
(arm_vector_mode_supported_p): Add support for VxBI modes. (arm_expand_vector_compare): Remove useless generation of vpsel. (arm_expand_vcond): Fix select operands. (arm_get_mask_mode): New. * config/arm/mve.md (vec_cmp<mode><MVE_vpred>): New. (vec_cmpu<mode><MVE_vpred>): New. (vcond_mask_<mode><MVE_vpred>): New. * config/arm/vec-common.md (vec_cmp<mode><v_cmp_result>) (vec_cmpu<mode><mode>, vcond_mask_<mode><v_cmp_result>): Move to ... * config/arm/neon.md (vec_cmp<mode><v_cmp_result>) (vec_cmpu<mode><mode>, vcond_mask_<mode><v_cmp_result>): ... here and disable for MVE. * doc/sourcebuild.texi (arm_mve): Document new effective-target. gcc/testsuite/ PR target/100757 * gcc.target/arm/simd/pr100757-2.c: New. * gcc.target/arm/simd/pr100757-3.c: New. * gcc.target/arm/simd/pr100757-4.c: New. * gcc.target/arm/simd/pr100757.c: New. * gcc.dg/signbit-2.c: Skip when targeting ARM/MVE. * lib/target-supports.exp (check_effective_target_arm_mve): New.
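The mask merging described in the commit above can be modelled in plain C. The following is only a sketch under simplifying assumptions, not the code GCC emits: the helper names vcmp_eq_f32x4, vpsel_f32x4 and fn1_model are hypothetical, each 32-bit lane is assumed to own four consecutive bits of a 16-bit predicate, and the select tests only the lowest bit of each lane's group rather than selecting byte by byte.

```c
#include <stdint.h>

/* Hypothetical model of a 4 x f32 equality compare: each 32-bit lane
   owns four bits of the 16-bit predicate, all set when the lane matches.  */
static uint16_t vcmp_eq_f32x4(const float *v, float k)
{
  uint16_t p = 0;
  for (int i = 0; i < 4; i++)
    if (v[i] == k)
      p |= (uint16_t)(0xFu << (4 * i));
  return p;
}

/* Hypothetical model of a predicated select: lane i takes t[i] when its
   predicate bits are set, f[i] otherwise (simplified to one test per lane).  */
static void vpsel_f32x4(float *dst, const float *t, const float *f, uint16_t p)
{
  for (int i = 0; i < 4; i++)
    dst[i] = ((p >> (4 * i)) & 1u) ? t[i] : f[i];
}

/* Model of fn1 for 8 elements: two compares, one scalar AND on the
   16-bit masks, one select, then a max reduction over the four lanes.  */
static float fn1_model(const float *a)
{
  const float four[4] = { 4.0f, 4.0f, 4.0f, 4.0f };
  const float five[4] = { 5.0f, 5.0f, 5.0f, 5.0f };
  uint16_t p = (uint16_t)(vcmp_eq_f32x4(a, 2.0f) & vcmp_eq_f32x4(a + 4, 2.0f));
  float sel[4];
  vpsel_f32x4(sel, four, five, p);
  float m = sel[0];
  for (int i = 1; i < 4; i++)
    if (sel[i] > m)
      m = sel[i];
  return m;
}
```

A lane keeps 4.0 only if both halves of the array compare equal to 2.0 in that lane, so any mismatching element makes the reduced maximum 5.0, which matches the scalar loop in fn1.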
2022-02-22arm: Implement auto-vectorized MVE comparisons with vectors of boolean predicatesChristophe Lyon12-42/+268
We make use of qualifier_predicate to describe MVE builtins prototypes, restricting to auto-vectorizable vcmp* and vpsel builtins, as they are exercised by the tests added earlier in the series. Special handling is needed for mve_vpselq because it has a v2di variant, which has no natural VPR.P0 representation: we keep HImode for it. The vector_compare expansion code is updated to use the right VxBI mode instead of HI for the result. We extend the existing thumb2_movhi_vfp and thumb2_movhi_fp16 patterns to use the new MVE_7_HI iterator which covers HI and the new VxBI modes, in conjunction with the new DB constraint for a constant vector of booleans. This patch also adds tests derived from the one provided in PR target/101325: there is a compile-only test because I did not have access to anything that could execute MVE code until recently. I have been able to add an executable test since QEMU supports MVE. Instead of adding arm_v8_1m_mve_hw, I update arm_mve_hw so that it uses add_options_for_arm_v8_1m_mve_fp, like arm_neon_hw does. This ensures arm_mve_hw passes even if the toolchain does not generate MVE code by default. Most of the work of this patch series was carried out while I was working at STMicroelectronics as a Linaro assignee. 2022-02-22 Christophe Lyon <christophe.lyon@arm.com> Richard Sandiford <richard.sandiford@arm.com> gcc/ PR target/100757 PR target/101325 * config/arm/arm-builtins.cc (BINOP_PRED_UNONE_UNONE_QUALIFIERS) (BINOP_PRED_NONE_NONE_QUALIFIERS) (TERNOP_NONE_NONE_NONE_PRED_QUALIFIERS) (TERNOP_UNONE_UNONE_UNONE_PRED_QUALIFIERS): New. * config/arm/arm-protos.h (mve_bool_vec_to_const): New. * config/arm/arm.cc (arm_hard_regno_mode_ok): Handle new VxBI modes. (arm_mode_to_pred_mode): New. (arm_expand_vector_compare): Use the right VxBI mode instead of HI. (arm_expand_vcond): Likewise. (simd_valid_immediate): Handle MODE_VECTOR_BOOL. (mve_bool_vec_to_const): New. (neon_make_constant): Call mve_bool_vec_to_const when needed.
* config/arm/arm_mve_builtins.def (vcmpneq_, vcmphiq_, vcmpcsq_) (vcmpltq_, vcmpleq_, vcmpgtq_, vcmpgeq_, vcmpeqq_, vcmpneq_f) (vcmpltq_f, vcmpleq_f, vcmpgtq_f, vcmpgeq_f, vcmpeqq_f, vpselq_u) (vpselq_s, vpselq_f): Use new predicated qualifiers. * config/arm/constraints.md (DB): New. * config/arm/iterators.md (MVE_7, MVE_7_HI): New mode iterators. (MVE_VPRED, MVE_vpred): New attribute iterators. * config/arm/mve.md (@mve_vcmp<mve_cmp_op>q_<mode>) (@mve_vcmp<mve_cmp_op>q_f<mode>, @mve_vpselq_<supf><mode>) (@mve_vpselq_f<mode>): Use MVE_VPRED instead of HI. (@mve_vpselq_<supf>v2di): Define separately. (mov<mode>): New expander for VxBI modes. * config/arm/vfp.md (thumb2_movhi_vfp, thumb2_movhi_fp16): Use MVE_7_HI iterator and add support for DB constraint. gcc/testsuite/ PR target/100757 PR target/101325 * gcc.dg/rtl/arm/mve-vxbi.c: New test. * gcc.target/arm/simd/pr101325.c: New. * gcc.target/arm/simd/pr101325-2.c: New. * lib/target-supports.exp (check_effective_target_arm_mve_hw): Use add_options_for_arm_v8_1m_mve_fp.
2022-02-22arm: Implement MVE predicates as vectors of booleansChristophe Lyon11-56/+162
This patch implements support for vectors of booleans to support MVE predicates, instead of HImode. Since the ABI mandates pred16_t (aka uint16_t) to represent predicates in intrinsics prototypes, we introduce a new "predicate" type qualifier so that we can map the HImode arguments and return values of the relevant builtins to the appropriate vector of booleans (VxBI). We have to update test_vector_ops_duplicate, because it iterates using an offset in bytes, where we would need to iterate in bits: we stop iterating when we reach the end of the vector of booleans. In addition, we have to fix the underlying definition of vectors of booleans because ARM/MVE needs a different representation than AArch64/SVE. With ARM/MVE the 'true' bit is duplicated over the element size, so that a true element of V4BI is represented by '0b1111'. This patch updates the aarch64 definition of VNx*BI as needed. Most of the work of this patch series was carried out while I was working at STMicroelectronics as a Linaro assignee. 2022-02-22 Christophe Lyon <christophe.lyon@arm.com> Richard Sandiford <richard.sandiford@arm.com> gcc/ PR target/100757 PR target/101325 * config/aarch64/aarch64-modes.def (VNx16BI, VNx8BI, VNx4BI, VNx2BI): Update definition. * config/arm/arm-builtins.cc (arm_init_simd_builtin_types): Add new simd types. (arm_init_builtin): Map predicate vectors arguments to HImode. (arm_expand_builtin_args): Move HImode predicate arguments to VxBI rtx. Move return value to HImode rtx. * config/arm/arm-builtins.h (arm_type_qualifiers): Add qualifier_predicate. * config/arm/arm-modes.def (B2I, B4I, V16BI, V8BI, V4BI): New modes. * config/arm/arm-simd-builtin-types.def (Pred1x16_t, Pred2x8_t, Pred4x4_t): New. * emit-rtl.cc (init_emit_once): Handle all boolean modes. * genmodes.cc (mode_data): Add boolean field. (blank_mode): Initialize it. (make_complex_modes): Fix handling of boolean modes. (make_vector_modes): Likewise. (VECTOR_BOOL_MODE): Use new COMPONENT parameter.
(make_vector_bool_mode): Likewise. (BOOL_MODE): New. (make_bool_mode): New. (emit_insn_modes_h): Fix generation of boolean modes. (emit_class_narrowest_mode): Likewise. * machmode.def (VECTOR_BOOL_MODE): Document new COMPONENT parameter. Use new BOOL_MODE instead of FRACTIONAL_INT_MODE to define BImode. * rtx-vector-builder.cc (rtx_vector_builder::find_cached_value): Fix handling of constm1_rtx for VECTOR_BOOL. * simplify-rtx.cc (native_encode_rtx): Fix support for VECTOR_BOOL. (native_decode_vector_rtx): Likewise. (test_vector_ops_duplicate): Skip vec_merge test with vectors of booleans. * varasm.cc (output_constant_pool_2): Likewise.
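The predicate layout described in the commit above, where the 'true' bit is duplicated over the element size, can be illustrated with a small C sketch. It assumes a 16-bit predicate split evenly between the lanes (4 bits per lane for V4BI, 2 for V8BI, 1 for V16BI); pack_pred is a hypothetical helper written for this illustration and is not part of GCC or of the MVE intrinsics.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: pack NELTS boolean lanes (NELTS must be 4, 8 or 16)
   into a 16-bit predicate, duplicating each lane's value over all the
   bits that belong to that lane.  */
static uint16_t pack_pred(const bool *lanes, unsigned nelts)
{
  unsigned bits_per_lane = 16u / nelts;   /* 4 for V4BI, 2 for V8BI, 1 for V16BI */
  uint16_t lane_mask = (uint16_t)((1u << bits_per_lane) - 1u);
  uint16_t pred = 0;

  for (unsigned i = 0; i < nelts; i++)
    if (lanes[i])
      pred |= (uint16_t)(lane_mask << (i * bits_per_lane));
  return pred;
}
```

With four lanes, a single true element contributes the group 0b1111, so lanes {true, false, false, true} pack to 0xF00F. The same four-lane pattern expressed with one bit per lane would instead be 0b1001, which is the difference in representation the commit message refers to.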