path: root/gcc
Age | Commit message | Author | Files | Lines
2021-08-11 | Fortran: Fix c_float128 and c_float128_complex definitions. | Sandra Loosemore | 2 | -6/+21
gfc_float128_type_node is only non-NULL on targets that support a 128-bit type that is not long double. Use float128_type_node instead when computing the value of the kind constants c_float128 and c_float128_complex from the ISO_C_BINDING intrinsic module; this also ensures it actually corresponds to __float128 (the IEEE encoding) and not some other 128-bit floating-point type. 2021-08-11 Sandra Loosemore <sandra@codesourcery.com> gcc/fortran/ * iso-c-binding.def (c_float128, c_float128_complex): Check float128_type_node instead of gfc_float128_type_node. * trans-types.c (gfc_init_kinds, gfc_build_real_type): Update comments re supported 128-bit floating-point types.
2021-08-11 | Fix gcc.dg/lto/pr48622_0.c testcase | Richard Biener | 1 | -0/+6
This fixes the testcase to not rely on the reference to ashift_qi_1 being optimized out by RTL optimization via help of the initregs pass that changes comparisons of uninitialized data with a comparison that is always false. 2021-08-11 Richard Biener <rguenther@suse.de> * gcc.dg/lto/pr48622_1.c: Provide non-LTO definition of ashift_qi_1.
2021-08-11 | target/101788 - avoid decomposing hard-register "loads" | Richard Biener | 1 | -1/+2
This avoids decomposing hard-register accesses that masquerade as loads. 2021-08-11 Richard Biener <rguenther@suse.de> PR target/101877 * tree-ssa-forwprop.c (pass_forwprop::execute): Do not decompose hard-register accesses.
2021-08-11 | Adjust volatile handling of the operand scanner | Richard Biener | 2 | -10/+6
The GIMPLE SSA operand scanner handles COMPONENT_REFs that are not marked TREE_THIS_VOLATILE but have a TREE_THIS_VOLATILE FIELD_DECL as volatile. That's inconsistent in how TREE_THIS_VOLATILE testing on GENERIC refs works which requires operand zero of component references to mirror TREE_THIS_VOLATILE to the ref so that testing TREE_THIS_VOLATILE on the outermost reference is enough to determine the volatileness. The following patch thus removes FIELD_DECL scanning from the GIMPLE SSA operand scanner, possibly leaving fewer stmts marked as gimple_has_volatile_ops. It shows we miss at least one case in the fortran frontend, though there's a suspicious amount of COMPONENT_REF creation compared to little setting of TREE_THIS_VOLATILE. This fixes the FAIL of gfortran.dg/volatile11.f90 that would otherwise occur. Visually inspecting fortran/ reveals a bunch of likely to fix cases but I don't know the constraints of 'volatile' uses in the fortran language to assess whether some of these are not necessary. 2021-08-09 Richard Biener <rguenther@suse.de> gcc/ * tree-ssa-operands.c (operands_scanner::get_expr_operands): Do not look at COMPONENT_REF FIELD_DECLs TREE_THIS_VOLATILE to determine has_volatile_ops. gcc/fortran/ * trans-common.c (create_common): Set TREE_THIS_VOLATILE on the COMPONENT_REF if the field is volatile.
2021-08-11 | Small tweak to expand_used_vars | Eric Botcazou | 1 | -6/+3
This completes the replacement of DECL_ATTRIBUTES (current_function_decl) with the attribs local variable. gcc/ * cfgexpand.c (expand_used_vars): Reuse attribs local variable.
2021-08-11 | Fix min_flags handling in mod-ref | Jan Hubicka | 2 | -11/+62
gcc/ChangeLog: 2021-08-11 Jan Hubicka <hubicka@ucw.cz> Alexandre Oliva <oliva@adacore.com> * ipa-modref.c (modref_lattice::dump): Fix escape_point's min_flags dumping. (modref_lattice::merge_deref): Fix handling of indirect escape points. (update_escape_summary_1): Likewise. (update_escape_summary): Likewise. (ipa_merge_modref_summary_after_inlining): Likewise. gcc/testsuite/ChangeLog: * c-c++-common/modref-dse.c: New test.
2021-08-11 | middle-end/101858 - avoid shift of pointer in folding | Richard Biener | 2 | -0/+11
This makes sure to not generate a shift of pointer types in simplification of X < (cast) (1 << Y). 2021-08-11 Richard Biener <rguenther@suse.de> PR middle-end/101858 * fold-const.c (fold_binary_loc): Guard simplification of X < (cast) (1 << Y) to integer types. * gcc.dg/pr101858.c: New testcase.
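To illustrate why the guard matters, here is a minimal sketch that assumes the simplification in question rewrites `X < (1 << Y)` into `(X >> Y) == 0`. The two forms agree for unsigned integers, but the second form would need to shift X itself, which has no meaning when X is a pointer. The function names are hypothetical and do not come from fold-const.c.

```cpp
#include <cstdint>

// Original form of the comparison: X < (1 << Y).
bool less_than_power_of_two(std::uint32_t x, unsigned y) {
  return x < (std::uint32_t{1} << y);
}

// Rewritten form: (X >> Y) == 0. Only meaningful for integer X;
// for a pointer X this would require shifting a pointer value.
bool shifted_is_zero(std::uint32_t x, unsigned y) {
  return (x >> y) == 0;
}

// Checks that both forms agree for small inputs, keeping y below the bit width.
bool forms_agree() {
  for (std::uint32_t x = 0; x < 1024; ++x)
    for (unsigned y = 0; y < 32; ++y)
      if (less_than_power_of_two(x, y) != shifted_is_zero(x, y))
        return false;
  return true;
}
```

Restricting the rewrite to integer types keeps the shift on an operand for which shifting is defined.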
2021-08-11 | tree-optimization/101861 - fix gather use for non-gather refs | Richard Biener | 1 | -1/+2
My previous change broke the usage of gather for strided loads. The following fixes it. 2021-08-11 Richard Biener <rguenther@suse.de> PR tree-optimization/101861 * tree-vect-stmts.c (vectorizable_load): Fix error in previous change with regard to gather vectorization.
2021-08-11 | arm/66791: Replace builtins for vdup_n and vmov_n intrinsics. | prathamesh.kulkarni | 3 | -73/+63
gcc/ChangeLog: PR target/66791 * config/arm/arm_neon.h (vdup_n_s8): Replace call to builtin with constructor. (vdup_n_s16): Likewise. (vdup_n_s32): Likewise. (vdup_n_s64): Likewise. (vdup_n_u8): Likewise. (vdup_n_u16): Likewise. (vdup_n_u32): Likewise. (vdup_n_u64): Likewise. (vdup_n_p8): Likewise. (vdup_n_p16): Likewise. (vdup_n_p64): Likewise. (vdup_n_f16): Likewise. (vdup_n_f32): Likewise. (vdupq_n_s8): Likewise. (vdupq_n_s16): Likewise. (vdupq_n_s32): Likewise. (vdupq_n_s64): Likewise. (vdupq_n_u8): Likewise. (vdupq_n_u16): Likewise. (vdupq_n_u32): Likewise. (vdupq_n_u64): Likewise. (vdupq_n_p8): Likewise. (vdupq_n_p16): Likewise. (vdupq_n_p64): Likewise. (vdupq_n_f16): Likewise. (vdupq_n_f32): Likewise. (vmov_n_s8): Replace call to builtin with call to corresponding vdup_n intrinsic. (vmov_n_s16): Likewise. (vmov_n_s32): Likewise. (vmov_n_s64): Likewise. (vmov_n_u8): Likewise. (vmov_n_u16): Likewise. (vmov_n_u32): Likewise. (vmov_n_u64): Likewise. (vmov_n_p8): Likewise. (vmov_n_p16): Likewise. (vmov_n_f16): Likewise. (vmov_n_f32): Likewise. (vmovq_n_s8): Likewise. (vmovq_n_s16): Likewise. (vmovq_n_s32): Likewise. (vmovq_n_s64): Likewise. (vmovq_n_u8): Likewise. (vmovq_n_u16): Likewise. (vmovq_n_u32): Likewise. (vmovq_n_u64): Likewise. (vmovq_n_p8): Likewise. (vmovq_n_p16): Likewise. (vmovq_n_f16): Likewise. (vmovq_n_f32): Likewise. * config/arm/arm_neon_builtins.def: Remove entries for vdup_n. gcc/testsuite/ChangeLog: PR target/66791 * gcc.target/arm/pr51534.c: Adjust test.
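The shape of this change can be sketched with the generic GNU vector extension, which GCC and Clang both accept. This is not the arm_neon.h code: the type `v4si` and the helpers `dup_n_s32`, `mov_n_s32` and `lane` are hypothetical stand-ins chosen so the example builds on any target.

```cpp
// Generic GNU vector type (GCC/Clang extension); a stand-in for int32x4_t.
typedef int v4si __attribute__((vector_size(16)));

// Hypothetical counterpart of vdupq_n_s32: build the vector with a
// brace-enclosed initializer rather than calling a target builtin.
static inline v4si dup_n_s32(int a) {
  v4si r = {a, a, a, a};
  return r;
}

// Hypothetical counterpart of vmovq_n_s32: forward to the dup helper.
static inline v4si mov_n_s32(int a) {
  return dup_n_s32(a);
}

// Reads one lane of the vector (i must be in the range 0..3).
static inline int lane(v4si v, int i) {
  return v[i];
}
```

Expressing the duplicate as a plain vector initializer leaves the choice of instruction to the compiler's generic vector handling.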
2021-08-11 | Ada: Remove debug line number for DECL_IGNORED_P functions | Bernd Edlinger | 1 | -1/+3
PR101598 pointed out that it is inappropriate for ignored Ada decls to receive the source line number that was recorded in the function decl's DECL_SOURCE_LOCATION. Therefore set the DECL_SOURCE_LOCATION of all front-end-generated Ada decls with DECL_IGNORED_P to UNKNOWN_LOCATION. 2021-08-11 Bernd Edlinger <bernd.edlinger@hotmail.de> PR debug/101598 * gcc-interface/trans.c (Subprogram_Body_to_gnu): Set the DECL_SOURCE_LOCATION of DECL_IGNORED_P gnu_subprog_decl to UNKNOWN_LOCATION.
2021-08-10 | compiler: don't crash on a, b := int(0) | Ian Lance Taylor | 2 | -3/+14
Fixes PR go/101851 Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/341330
2021-08-11 | Extend ldexp{s,d}f3 to vscalefs{s,d} when TARGET_AVX512F and TARGET_SSE_MATH. | liuhongt | 3 | -8/+83
gcc/ChangeLog: PR target/98309 * config/i386/i386.md (ldexp<mode>3): Extend to vscalefs[sd] when TARGET_AVX512F and TARGET_SSE_MATH. gcc/testsuite/ChangeLog: PR target/98309 * gcc.target/i386/pr98309-1.c: New test. * gcc.target/i386/pr98309-2.c: New test.
2021-08-11 | gcc.dg/uninit-pred-9_b.c: Xfail for CRIS too | Hans-Peter Nilsson | 1 | -1/+1
Adding to the growing list, for autotester accounting purposes. FWIW I see this fails for m68k too: https://gcc.gnu.org/pipermail/gcc-testresults/2021-August/712395.html and moxie: https://gcc.gnu.org/pipermail/gcc-testresults/2021-August/712389.html and pru: https://gcc.gnu.org/pipermail/gcc-testresults/2021-August/712366.html testsuite: PR middle-end/101674 * gcc.dg/uninit-pred-9_b.c: Xfail for cris-*-* too.
2021-08-11 | Daily bump. | GCC Administrator | 5 | -1/+185
2021-08-10 | openmp: Fix up cp/parser.c build with GCC 4.8 to 6 | Jakub Jelinek | 1 | -2/+2
Christophe Lyon reported that cp/parser.c no longer compiles with GCC 4.8.5 after my recent OpenMP changes. A goto out; there crosses odsd variable declaration, and odsd has a vec<...> member where vec has = default; default constructor and gcc before r7-2822-gd0b0fbd9fce2f30a82558bf2308b3a7b56c2f364 treated that as error. Fixed by moving the declaration earlier before the goto. Tested on x86_64-linux with GCC 4.8.5 system gcc, committed to trunk as obvious. 2021-08-10 Jakub Jelinek <jakub@redhat.com> * parser.c (cp_parser_member_declaration): Move odsd declaration before cp_parser_using_declaration call to avoid errors with GCC 4.8 to 6.
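The fix pattern, declaring the variable before the `goto` so that no jump crosses its declaration, can be sketched as follows. `DeclData` and `count_clauses` are hypothetical names, and `std::vector` is used only as a stand-in: unlike GCC's own `vec`, it has a non-trivial constructor, so jumping over it would be rejected by any conforming compiler, not just by older GCC releases.

```cpp
#include <vector>

// Hypothetical stand-in for the parser's local variable with a vector member.
struct DeclData {
  std::vector<int> clauses;
};

int count_clauses(bool bail_out_early) {
  int result = 0;
  DeclData data;  // declared before the goto, so no jump crosses it
  if (bail_out_early)
    goto out;
  data.clauses.push_back(1);
  result = static_cast<int>(data.clauses.size());
out:
  return result;
}
```

Had `data` been declared between the `goto` and the `out` label, the jump would bypass its initialization, which is the situation the commit avoids.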
2021-08-10 | gfortran: Fix in-build-tree testing [PR101305, PR101660] | Tobias Burnus | 1 | -1/+1
ISO_Fortran_binding.h is written in the build dir - hence, a previous commit added it as an include directory for in-build-tree testing. However, it turned out that -I$specdir/libgfortran interferes with reading .mod files as they are then no longer regarded as intrinsic modules. Solution: Create an extra include/ directory in the libgfortran build dir and copy ISO_Fortran_binding.h to that directory. As -B$specdir/libgfortran already causes gfortran to read that include subdirectory, the -I flag is no longer needed. PR libfortran/101305 PR fortran/101660 PR testsuite/101847 libgfortran/ChangeLog: * Makefile.am (ISO_Fortran_binding.h): Create include/ in the build dir and copy the include file to it. (clean-local): Add for removing the 'include' directory. * Makefile.in: Regenerate. gcc/testsuite/ChangeLog: * lib/gfortran.exp (gfortran_init): Remove -I$specpath/libgfortran from the string used to set GFORTRAN_UNDER_TEST.
2021-08-10 | Enable gcc.target/i386/pr88531-1a.c for all targets | H.J. Lu | 1 | -1/+1
PR tree-optimization/101809 * gcc.target/i386/pr88531-1a.c: Enable for all targets.
2021-08-10 | i386: Allow some V32HImode and V64QImode permutations even without AVX512BW [PR80355] | Jakub Jelinek | 2 | -4/+32
When working on the PR, I've noticed we generate terrible code for V32HImode or V64QImode permutations for -mavx512f -mno-avx512bw. Generally we can't do much with such permutations, but since PR68655 we can handle at least some, those expressible using V16SImode or V8DImode permutations, but that wasn't reachable, because ix86_vectorize_vec_perm_const didn't even try, it said without TARGET_AVX512BW it can't do anything, and with it can do everything, no d.testing_p attempts. This patch makes it try it for TARGET_AVX512F && !TARGET_AVX512BW. The first hunk is to avoid ICE, expand_vec_perm_even_odd_1 asserts d->vmode isn't V32HImode because expand_vec_perm_1 for AVX512BW handles already all permutations, but when we let it through without !TARGET_AVX512BW, expand_vec_perm_1 doesn't handle it. If we want, that hunk can be dropped if we implement in expand_vec_perm_even_odd_1 and its helper the even permutation as vpmovdw + vpmovdw + vinserti64x4 and odd permutation as vpsrld $16 + vpsrld $16 + vpmovdw + vpmovdw + vinserti64x4. 2021-08-10 Jakub Jelinek <jakub@redhat.com> PR target/80355 * config/i386/i386-expand.c (expand_vec_perm_even_odd): Return false for V32HImode if !TARGET_AVX512BW. (ix86_vectorize_vec_perm_const) <case E_V32HImode, case E_V64QImode>: If !TARGET_AVX512BW and TARGET_AVX512F and d.testing_p, don't fail early, but actually check the permutation. * gcc.target/i386/avx512f-pr80355-2.c: New test.
2021-08-10 | tree-optimization/101809 - support emulated gather for double[int] | Richard Biener | 1 | -17/+30
This adds emulated gather support for index vectors with more elements than the data vector. The internal function gather vectorization code doesn't currently handle this (but the builtin decl code does). This allows vectorization of double data gather with int indexes on 32bit platforms where there isn't an implicit widening to 64bit present. 2021-08-10 Richard Biener <rguenther@suse.de> PR tree-optimization/101809 * tree-vect-stmts.c (get_load_store_type): Allow emulated gathers with offset vector nunits being a constant multiple of the data vector nunits. (vect_get_gather_scatter_ops): Use the appropriate nunits for the offset vector defs. (vectorizable_store): Adjust call to vect_get_gather_scatter_ops. (vectorizable_load): Likewise. Handle the case of less offset vectors than data vectors.
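A loop of the kind described, double data read through int indexes, might look like the sketch below. It is an illustration only, not the PR's testcase; the function names are hypothetical, and whether a given compiler vectorizes it depends on target and options.

```cpp
#include <cstddef>

// Gather-style load: double elements selected through 32-bit int indexes.
// With equal vector sizes, one index vector holds more elements than one
// data vector, which is the case the commit message describes.
void gather(double *out, const double *data, const int *idx, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i)
    out[i] = data[idx[i]];
}

// Small self-check helper: gathers elements 3, 0 and 2 and sums them.
double gather_demo_sum() {
  const double data[4] = {1.0, 2.0, 3.0, 4.0};
  const int idx[3] = {3, 0, 2};
  double out[3] = {0.0, 0.0, 0.0};
  gather(out, data, idx, 3);
  return out[0] + out[1] + out[2];
}
```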
2021-08-10 | i386: Improve single operand AVX512F permutations [PR80355] | Jakub Jelinek | 2 | -0/+107
On the following testcase we emit vmovdqa32 .LC0(%rip), %zmm1 vpermd %zmm0, %zmm1, %zmm0 and vmovdqa64 .LC1(%rip), %zmm1 vpermq %zmm0, %zmm1, %zmm0 instead of vshufi32x4 $78, %zmm0, %zmm0, %zmm0 and vshufi64x2 $78, %zmm0, %zmm0, %zmm0 we can emit with the patch. We have patterns that match two argument permutations for vshuf[if]*, but for one argument it doesn't trigger. Either we can add two patterns for that, or we would need to add another routine to i386-expand.c that would transform under certain condition these cases to the two argument vshuf*, doing it in sse.md looked simpler. We don't need this for 32-byte vectors, we already emit single insn permutation that doesn't need memory op there. 2021-08-10 Jakub Jelinek <jakub@redhat.com> PR target/80355 * config/i386/sse.md (*avx512f_shuf_<shuffletype>64x2_1<mask_name>_1, *avx512f_shuf_<shuffletype>32x4_1<mask_name>_1): New define_insn patterns. * gcc.target/i386/avx512f-pr80355-1.c: New test.
2021-08-10 | openmp: Add support for declare simd and declare variant in a attribute syntax | Jakub Jelinek | 6 | -28/+726
This patch adds support for declare simd and declare variant in attribute syntax. Either in attribute-specifier-seq at the start of declaration, in that case it has similar restriction to pragma-syntax, that there is a single function declaration/definition in the declaration, rather than variable declaration or more than one function declarations or mix of function and variable declarations. Or after the declarator id, in that case it applies just to the single function declaration and the same declaration can have multiple such attributes. Or both. Furthermore, cp_parser_statement has been adjusted so that it doesn't accept [[omp::directive (parallel)]] etc. before statements that don't take attributes at all, or where those attributes don't appertain to the statement but something else (e.g. to label, using directive, declaration, etc.). 2021-08-10 Jakub Jelinek <jakub@redhat.com> gcc/cp/ * parser.h (struct cp_omp_declare_simd_data): Remove in_omp_attribute_pragma and clauses members, add loc and attribs. (struct cp_oacc_routine_data): Remove loc member, add clauses member. * parser.c (cp_finalize_omp_declare_simd): New function. (cp_parser_handle_statement_omp_attributes): Mention in function comment the function is used also for attribute-declaration. (cp_parser_handle_directive_omp_attributes): New function. (cp_parser_statement): Don't call cp_parser_handle_statement_omp_attributes if statement doesn't have attribute-specifier-seq at the beginning at all or if if those attributes don't appertain to the statement. (cp_parser_simple_declaration): Call cp_parser_handle_directive_omp_attributes and cp_finalize_omp_declare_simd. (cp_parser_explicit_instantiation): Likewise. (cp_parser_init_declarator): Initialize prefix_attributes only after parsing declarators. (cp_parser_direct_declarator): Call cp_parser_handle_directive_omp_attributes and cp_finalize_omp_declare_simd. (cp_parser_member_declaration): Likewise. (cp_parser_single_declaration): Likewise. 
(cp_parser_omp_declare_simd): Don't initialize data.in_omp_attribute_pragma, instead initialize data.attribs[0] and data.attribs[1]. (cp_finish_omp_declare_variant): Remove in_omp_attribute_pragma argument, instead use parser->lexer->in_omp_attribute_pragma. (cp_parser_late_parsing_omp_declare_simd): Adjust cp_finish_omp_declare_variant caller. Handle attribute-syntax declare simd/variant. gcc/testsuite/ * g++.dg/gomp/attrs-1.C (bar): Add missing semicolon after [[omp::directive (threadprivate (t2))]]. Add tests with if/while/switch after parallel in attribute syntax. (corge): Add missing omp:: before directive. * g++.dg/gomp/attrs-2.C (bar): Add missing semicolon after [[omp::directive (threadprivate (t2))]]. * g++.dg/gomp/attrs-10.C: New test. * g++.dg/gomp/attrs-11.C: New test.
2021-08-10 | i386: Fix typos in amxbf16 runtime test. | Hongyu Wang | 1 | -3/+3
gcc/testsuite/ChangeLog: * gcc.target/i386/amxbf16-dpbf16ps-2.c: Fix typos.
2021-08-10 | tree-optimization/101801 - rework generic vector vectorization more | Richard Biener | 3 | -8/+56
This builds on top of the vect_worthwhile_without_simd_p refactoring done earlier. It was wrong in dropping the apparent double checks for operation support since the optab check can happen with an integer vector emulation mode and thus succeed but vector lowering might not actually support the operation on word_mode. The following patch adds a vect_emulated_vector_p helper and re-instantiates the check where it was previously. It also adds appropriate costing of the scalar stmts emitted by vector lowering to vectorizable_operation which should be the only place such operations are synthesized. I've also cared for the case where the vector mode is supported but the operation is not (though I think this will be unlikely given we're talking about plus, minus and negate). This fixes the observed FAIL of gcc.dg/tree-ssa/gen-vect-11b.c with -m32 where we end up vectorizing a multiplication that ends up being torn down to scalars again by vector lowering. I'm not super happy about all the other places where we're now and previously feeding scalar modes to optab checks where we want to know whether we can vectorize something, but well. 2021-09-08 Richard Biener <rguenther@suse.de> PR tree-optimization/101801 PR tree-optimization/101819 * tree-vectorizer.h (vect_emulated_vector_p): Declare. * tree-vect-loop.c (vect_emulated_vector_p): New function. (vectorizable_reduction): Re-instantiate a check for emulated operations. * tree-vect-stmts.c (vectorizable_shift): Likewise. (vectorizable_operation): Likewise. Cost emulated vector operations according to the scalar sequence synthesized by vector lowering.
2021-08-10 | middle-end/101824 - properly handle volatiles in nested fn lowering | Richard Biener | 2 | -0/+20
When we build the COMPONENT_REF of a formerly volatile local off the FRAME decl we have to make sure to mark the COMPONENT_REF as TREE_THIS_VOLATILE. While the GIMPLE operand scanner looks at the FIELD_DECL this is not how volatile GENERIC refs work. 2021-08-09 Richard Biener <rguenther@suse.de> PR middle-end/101824 * tree-nested.c (get_frame_field): Mark the COMPONENT_REF as volatile in case the variable was. * gcc.dg/tree-ssa/pr101824.c: New testcase.
2021-08-10 | Evaluate arguments of sizeof that are structs of variable size. | Martin Uecker | 2 | -1/+19
Evaluate arguments of sizeof for all types of variable size and not just for VLAs. This fixes some issues related to [PR29970] where statement expressions need to be evaluated so that the size is well defined. 2021-08-10 Martin Uecker <muecker@gwdg.de> gcc/c/ PR c/29970 * c-typeck.c (c_expr_sizeof_expr): Evaluate size expressions for structs of variable size. gcc/testsuite/ PR c/29970 * gcc.dg/vla-stexp-1.c: New test.
2021-08-09 | x86: Optimize load of const FP all bits set vectors | H.J. Lu | 4 | -6/+29
Check float_vector_all_ones_operand for vector floating-point modes to optimize load of const floating-point all bits set vectors. gcc/ PR target/101804 * config/i386/constraints.md (BC): Document for integer SSE constant all bits set operand. (BF): New constraint for const floating-point all bits set vectors. * config/i386/i386.c (standard_sse_constant_p): Likewise. (standard_sse_constant_opcode): Likewise. * config/i386/sse.md (sseconstm1): New mode attribute. (mov<mode>_internal): Replace BC with <sseconstm1>. gcc/testsuite/ PR target/101804 * gcc.target/i386/avx2-gather-2.c: Pass -march=skylake instead of "-mavx2 -mtune=skylake". Scan vpcmpeqd.
2021-08-10 | Support cond_ashr/lshr/ashl for vector integer modes under AVX512. | liuhongt | 14 | -0/+272
gcc/ChangeLog: * config/i386/sse.md (cond_<insn><mode>): New expander. (VI248_AVX512VLBW): New mode iterator. * config/i386/predicates.md (nonimmediate_or_const_vec_dup_operand): New predicate. gcc/testsuite/ChangeLog: * gcc.target/i386/cond_op_shift_d-1.c: New test. * gcc.target/i386/cond_op_shift_d-2.c: New test. * gcc.target/i386/cond_op_shift_q-1.c: New test. * gcc.target/i386/cond_op_shift_q-2.c: New test. * gcc.target/i386/cond_op_shift_ud-1.c: New test. * gcc.target/i386/cond_op_shift_ud-2.c: New test. * gcc.target/i386/cond_op_shift_uq-1.c: New test. * gcc.target/i386/cond_op_shift_uq-2.c: New test. * gcc.target/i386/cond_op_shift_uw-1.c: New test. * gcc.target/i386/cond_op_shift_uw-2.c: New test. * gcc.target/i386/cond_op_shift_w-1.c: New test. * gcc.target/i386/cond_op_shift_w-2.c: New test.
2021-08-10 | Daily bump. | GCC Administrator | 3 | -1/+191
2021-08-09 | Ensure toupper and tolower follow the expected pattern. | Andrew MacLeod | 2 | -0/+22
If the parameter is not compatible with the LHS, assume this is not really a builtin function to avoid a trap. gcc/ PR tree-optimization/101741 * gimple-range-fold.cc (fold_using_range::range_of_builtin_call): Check type of parameter for toupper/tolower. gcc/testsuite/ * gcc.dg/pr101741.c: New.
2021-08-09 | ipa: Fix testsuite/gcc.dg/ipa/remref-6.c | Martin Jambor | 2 | -2/+2
I forgot to add -fdump-ipa-inline to the options of testsuite/gcc.dg/ipa/remref-6.c, so the dump scan tests did not PASS but ended up as UNRESOLVED. Fixing that revealed that one of the dumps it was looking for had a double space, so I removed it too. gcc/ChangeLog: 2021-08-09 Martin Jambor <mjambor@suse.cz> PR testsuite/101654 * ipa-prop.c (propagate_controlled_uses): Removed a spurious space. gcc/testsuite/ChangeLog: 2021-08-09 Martin Jambor <mjambor@suse.cz> PR testsuite/101654 * gcc.dg/ipa/remref-6.c: Added missing -fdump-ipa-inline option.
2021-08-09 | Verify destination[source] of a load[store] instruction is a register. | Pat Haugen | 1 | -2/+12
gcc/ChangeLog: * config/rs6000/rs6000.c (is_load_insn1): Verify destination is a register. (is_store_insn1): Verify source is a register.
2021-08-09 | i386: Name V2SF logic insns [PR101812] | Uros Bizjak | 2 | -1/+13
Name V2SF logic insns, so expand_simple_binop works with V2SF modes. 2021-08-09 Uroš Bizjak <ubizjak@gmail.com> gcc/ PR target/101812 * config/i386/mmx.md (<any_logic:code>v2sf3): Rename from *mmx_<any_logic:code>v2sf3 gcc/testsuite/ PR target/101812 * gcc.target/i386/pr101812.c: New test.
2021-08-09 | Cross-reference parts adapted in 'gcc/omp-oacc-neuter-broadcast.cc' | Thomas Schwinge | 3 | -1/+15
gcc/ * config/nvptx/nvptx.c: Cross-reference parts adapted in 'gcc/omp-oacc-neuter-broadcast.cc'. * omp-low.c: Likewise. * omp-oacc-neuter-broadcast.cc: Cross-reference parts adapted from the above files.
2021-08-09 | amdgcn: Enable OpenACC worker partitioning for AMD GCN | Julian Brown | 2 | -17/+3
gcc/ * config/gcn/gcn.c (gcn_init_builtins): Override decls for BUILT_IN_GOACC_SINGLE_START, BUILT_IN_GOACC_SINGLE_COPY_START, BUILT_IN_GOACC_SINGLE_COPY_END and BUILT_IN_GOACC_BARRIER. (gcn_goacc_validate_dims): Turn on worker partitioning unconditionally. (gcn_fork_join): Update comment. * config/gcn/gcn.opt (flag_worker_partitioning): Remove. (macc_experimental_workers): Remove unused option. libgomp/ * plugin/plugin-gcn.c (gcn_exec): Change default number of workers to 16. * testsuite/libgomp.oacc-c-c++-common/acc_prof-kernels-1.c [acc_device_radeon]: Update. * testsuite/libgomp.oacc-c-c++-common/loop-dim-default.c [ACC_DEVICE_TYPE_radeon]: Likewise. * testsuite/libgomp.oacc-c-c++-common/parallel-dims.c [acc_device_radeon]: Likewise. * testsuite/libgomp.oacc-c-c++-common/routine-wv-2.c [ACC_DEVICE_TYPE_radeon]: Likewise. * testsuite/libgomp.oacc-fortran/optional-reduction.f90: XFAIL for 'openacc_radeon_accel_selected' and '-O0'. * testsuite/libgomp.oacc-fortran/reduction-7.f90: Likewise. Co-Authored-By: Kwok Cheung Yeung <kcy@codesourcery.com> Co-Authored-By: Thomas Schwinge <thomas@codesourcery.com>
2021-08-09 | openacc: Middle-end worker-partitioning support | Julian Brown | 11 | -34/+1584
This patch implements worker-partitioning support in the middle end, by rewriting gimple. The OpenACC execution model requires that code can run in either "worker single" mode where only a single worker per gang is active, or "worker partitioned" mode, where multiple workers per gang are active. This means we need to do something equivalent to spawning additional workers when transitioning from worker-single to worker-partitioned mode. However, GPUs typically fix the number of threads of invoked kernels at launch time, so we need to do something with the "extra" threads when they are not wanted. The scheme used is to conditionalise each basic block that executes in "worker single" mode for worker 0 only. Conditional branches are handled specially so "idle" (non-0) workers follow along with worker 0. On transitioning to "worker partitioned" mode, any variables modified by worker 0 are propagated to the other workers via GPU shared memory. Special care is taken for routine calls, writes through pointers, and so forth, as follows: - There are two types of function calls to consider in worker-single mode: "normal" calls to maths library routines, etc. are called from worker 0 only. OpenACC routines may contain worker-partitioned loops themselves, so are called from all workers, including "idle" ones. - SSA names set in worker-single mode, but used in worker-partitioned mode, are copied to shared memory in worker 0. Other workers retrieve the value from the appropriate shared-memory location after a barrier, and new phi nodes are introduced at the convergence point to resolve the worker 0/other worker copies of the value. - Local scalar variables (on the stack) also need special handling. We broadcast any variables that are written in the current worker-single block, and that are read in any worker-partitioned block. (This is believed to be safe, and is flow-insensitive to ease analysis.) - Local aggregates (arrays and composites) on the stack are *not* broadcast. 
Instead we force gimple stmts modifying elements/fields of local aggregates into fully-partitioned mode. The RHS of the assignment is a scalar, and is thus subject to broadcasting as above. - Writes through pointers may affect any local variable that has its address taken. We use points-to analysis to determine the set of potentially-affected variables for a given pointer indirection. We broadcast any such variable which is used in worker-partitioned mode, on a per-block basis for any block containing a write through a pointer. Some slides about the implementation (from 2018) are available at: https://jtb20.github.io/gcnworkers.pdf gcc/ * Makefile.in (OBJS): Add omp-oacc-neuter-broadcast.o. * doc/tm.texi.in (TARGET_GOACC_CREATE_WORKER_BROADCAST_RECORD): Add documentation hook. * doc/tm.texi: Regenerate. * omp-oacc-neuter-broadcast.cc: New file. * omp-builtins.def (BUILT_IN_GOACC_BARRIER) (BUILT_IN_GOACC_SINGLE_START, BUILT_IN_GOACC_SINGLE_COPY_START) (BUILT_IN_GOACC_SINGLE_COPY_END): New builtins. * passes.def (pass_omp_oacc_neuter_broadcast): Add pass. * target.def (goacc.create_worker_broadcast_record): Add target hook. * tree-pass.h (make_pass_omp_oacc_neuter_broadcast): Add prototype. * config/gcn/gcn-protos.h (gcn_goacc_adjust_propagation_record): Rename prototype to... (gcn_goacc_create_worker_broadcast_record): ... this. * config/gcn/gcn-tree.c (gcn_goacc_adjust_propagation_record): Rename function to... (gcn_goacc_create_worker_broadcast_record): ... this. * config/gcn/gcn.c (TARGET_GOACC_ADJUST_PROPAGATION_RECORD): Rename to... (TARGET_GOACC_CREATE_WORKER_BROADCAST_RECORD): ... this. Co-Authored-By: Nathan Sidwell <nathan@codesourcery.com> (via 'gcc/config/nvptx/nvptx.c' master) Co-Authored-By: Kwok Cheung Yeung <kcy@codesourcery.com> Co-Authored-By: Thomas Schwinge <thomas@codesourcery.com>
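The neuter-and-broadcast scheme described above can be approximated by a small sequential simulation on the host. This is only a conceptual sketch: `kWorkers`, `run_gang` and `shared_slot` are hypothetical names, the workers are simulated one after another instead of running in parallel, and nothing here reflects the actual code in omp-oacc-neuter-broadcast.cc.

```cpp
#include <array>
#include <cstddef>

constexpr std::size_t kWorkers = 4;  // hypothetical number of workers per gang

// Simulates one gang. Each element of `priv` is one worker's private copy of x.
std::array<int, kWorkers> run_gang(int input) {
  std::array<int, kWorkers> priv{};  // per-worker copies, zero-initialised
  int shared_slot = 0;               // stands in for GPU shared memory

  // Worker-single block: conditionalised so that only worker 0 executes it.
  for (std::size_t w = 0; w < kWorkers; ++w) {
    if (w == 0) {
      priv[w] = input * 2;     // work performed by worker 0 only
      shared_slot = priv[w];   // worker 0 publishes the value
    }
  }

  // A barrier would sit here on a real device. After it, the idle
  // workers pick up the broadcast value from the shared slot.
  for (std::size_t w = 1; w < kWorkers; ++w)
    priv[w] = shared_slot;

  // Worker-partitioned block: every worker now uses its own copy of x.
  std::array<int, kWorkers> out{};
  for (std::size_t w = 0; w < kWorkers; ++w)
    out[w] = priv[w] + static_cast<int>(w);
  return out;
}
```

For an input of 5, worker 0 computes 10, the other workers receive 10 through the shared slot, and each worker then adds its own index.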
2021-08-09 | PR101609: Use the correct iterator for AArch64 vector right shift pattern | Tejas Belagod | 3 | -9/+89
Loops containing long long shifts fail to vectorize due to the vectorizer not being able to recognize long long right shifts. This is due to a bug in the iterator used for the vashr and vlshr patterns in aarch64-simd.md. 2021-08-09 Tejas Belagod <tejas.belagod@arm.com> gcc/ChangeLog PR target/101609 * config/aarch64/aarch64-simd.md (vlshr<mode>3, vashr<mode>3): Use the right iterator. gcc/testsuite/ChangeLog * gcc.target/aarch64/vect-shr-reg.c: New testcase. * gcc.target/aarch64/vect-shr-reg-run.c: Likewise.
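A loop of the kind the message refers to, a right shift of `long long` elements by a per-element amount, could look like this. It is an illustrative sketch with hypothetical names and not one of the new testcases; the demo values are kept non-negative so the shift result does not depend on how negative values are shifted.

```cpp
#include <cstddef>

// Right shift of each 64-bit element by a per-element shift amount.
void shift_right(long long *dst, const long long *src,
                 const long long *amount, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i)
    dst[i] = src[i] >> amount[i];
}

// Small self-check helper: 64 >> 3 is 8 and 1024 >> 10 is 1.
long long shift_demo() {
  const long long src[2] = {64, 1024};
  const long long amount[2] = {3, 10};
  long long dst[2] = {0, 0};
  shift_right(dst, src, amount, 2);
  return dst[0] + dst[1];
}
```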
2021-08-09 | Remove 'gcc/omp-offload.c' from 'GTFILES' | Thomas Schwinge | 1 | -1/+0
Given that it doesn't contain any 'GTY' markers, no 'gcc/gt-omp-offload.h' file gets generated (and '#include'd anywhere). Small fix-up for r243673 (Git commit 629b3d75c8c5a244d891a9c292bca6912d4b0dd9) "Split omp-low into multiple files". gcc/ * Makefile.in (GTFILES): Remove '$(srcdir)/omp-offload.c'.
2021-08-09 | Don't consider '-foffload-abi' in 'DEF_GOACC_BUILTIN', 'DEF_GOMP_BUILTIN' | Thomas Schwinge | 3 | -9/+7
Since Tom's PR64707 commit r220037 (Git commit 1506ae0e1e865fb7a42fc37a47f1799b71f21c53) "Make fopenmp an LTO option" as well as PR64672 commit r220038 (Git commit a0c88d0629a33161add8d5bc083f1e59f3f756f7) "Make fopenacc an LTO option", we're now actually passing '-fopenacc'/'-fopenmp' to the 'mkoffload's, which will pass these on to the offload compilers. gcc/ * builtins.def (DEF_GOACC_BUILTIN, DEF_GOMP_BUILTIN): Don't consider '-foffload-abi'. * common.opt (-foffload-abi): Remove 'Var', 'Init'. * opts.c (common_handle_option) <-foffload-abi> [ACCEL_COMPILER]: Ignore.
2021-08-09 | Sanity check that 'Init' doesn't appear without 'Var' in '*.opt' files | Thomas Schwinge | 1 | -2/+6
... as that doesn't make sense. @item Init(@var{value}) The variable specified by the @code{Var} property should be statically initialized to @var{value}. [...] gcc/ * optc-gen.awk: Sanity check that 'Init' doesn't appear without 'Var'.
2021-08-09 | [OpenACC] Clean up unused 'BUILT_IN_ACC_GET_DEVICE_TYPE' | Thomas Schwinge | 1 | -2/+0
Unused as of r229767 (Git commit e50146711b7200e8f822c6d8239430c682b76e4f) "OpenACC reductions". gcc/ * omp-builtins.def (BUILT_IN_ACC_GET_DEVICE_TYPE): Remove.
2021-08-09 | [documentation] No need anymore to "mention ['gt-*.h' file] as a dependency in the 'Makefile'" | Thomas Schwinge | 1 | -2/+1
... as of r202907 (Git commit b6541edc52ed57b6e47150396356d3080ba81034) "remove explicit dependencies". gcc/ * doc/gty.texi (Files): Update.
2021-08-09 | [documentation] Fix GTY header file example | Thomas Schwinge | 1 | -1/+1
Fix-up for CVS 'gcc/doc/gty.texi' r1.6 (Subversion r55857, Git commit cba57c9d40057fa78efc9a404ab4ae7101a59dcb) "Minor doc updates" gcc/ * doc/gty.texi (Files): Fix GTY header file example.
2021-08-09 | Improve handling of unknown sign bit in CCP. | Roger Sayle | 4 | -17/+107
This middle-end patch implements several related improvements to tree-ssa's conditional (bit) constant propagation pass. The current code handling ordered comparisons contains the comment "If the most significant bits are not known we know nothing" which is not entirely true [this test even prevents this pass understanding these comparisons always have a zero or one result]. This patch introduces a new value_mask_to_min_max helper function, that understands the different semantics of the most significant bit on signed vs. unsigned values. This allows us to generalize ordered comparisons, GE_EXPR, GT_EXPR, LE_EXPR and LT_EXPR, where the code is tweaked to correctly handle the potential equal cases. Then finally support is added for the related tree codes MIN_EXPR, MAX_EXPR, ABS_EXPR and ABSU_EXPR. Regression testing revealed three test cases in the testsuite that were checking for specific optimizations that are now being performed earlier than expected. These tests can continue to check their original transformations by explicitly adding -fno-tree-ccp to their dg-options (some already specify -fno-ipa-vrp or -fno-tree-forwprop for the same reason). 2021-08-09 Roger Sayle <roger@nextmovesoftware.com> gcc/ChangeLog * tree-ssa-ccp.c (value_mask_to_min_max): Helper function to determine the upper and lower bounds from a mask-value pair. (bit_value_unop) [ABS_EXPR, ABSU_EXPR]: Add support for absolute value and unsigned absolute value expressions. (bit_value_binop): Initialize *VAL's precision. [LT_EXPR, LE_EXPR]: Use value_mask_to_min_max to determine upper and lower bounds of operands. Add LE_EXPR/GE_EXPR support when the operands are unknown but potentially equal. [MIN_EXPR, MAX_EXPR]: Support minimum/maximum expressions. gcc/testsuite/ChangeLog * gcc.dg/pr68217.c: Add -fno-tree-ccp option. * gcc.dg/tree-ssa/vrp24.c: Add -fno-tree-ccp option. * g++.dg/ipa/pure-const-3.C: Add -fno-tree-ccp option.
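The idea of deriving bounds from a value/mask pair can be sketched on 8-bit quantities, assuming the usual convention that a set mask bit means "this bit is unknown". The sketch below is not the GCC implementation, which works on wide integers of arbitrary precision; `Range8`, `unsigned_bounds` and `signed_bounds` are hypothetical names.

```cpp
#include <cstdint>

struct Range8 {
  int min;
  int max;
};

// Unsigned view: every unknown bit is 0 for the minimum and 1 for the maximum.
Range8 unsigned_bounds(std::uint8_t value, std::uint8_t mask) {
  int lo = value & ~mask & 0xFF;
  int hi = (value | mask) & 0xFF;
  return {lo, hi};
}

// Signed view: an unknown sign bit behaves the other way round, since
// setting it yields the smallest values and clearing it the largest.
Range8 signed_bounds(std::uint8_t value, std::uint8_t mask) {
  int lo = value & ~mask & 0xFF;   // known-one bits only
  int hi = (value | mask) & 0xFF;  // known-one bits plus all unknown bits
  if (mask & 0x80) {
    lo |= 0x80;  // minimum: sign bit set, other unknown bits clear
    hi &= 0x7F;  // maximum: sign bit clear, other unknown bits set
  }
  // Reinterpret the 8-bit patterns as two's-complement values.
  auto to_signed = [](int bits) { return bits >= 0x80 ? bits - 0x100 : bits; };
  return {to_signed(lo), to_signed(hi)};
}
```

With value 0x00 and mask 0x83, the unsigned bounds are 0 and 131 while the signed bounds are -128 and 3, so an ordered comparison against, say, 200 can still be decided even though the most significant bit is unknown.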
2021-08-09testsuite/lib/gfortran.exp: Add -I for ISO*.h [PR101305, PR101660]Tobias Burnus20-21/+27
This patch adds -I$specdir/libgfortran to GFORTRAN_UNDER_TEST, when set by proc gfortran_init. As the $specdir depends on the multilib setting, it has to be re-set for a different multilib; hence, we track whether a previous call to gfortran_init set that var or whether it was set differently. gcc/testsuite/ PR libfortran/101305 PR fortran/101660 * lib/gfortran.exp (gfortran_init): Add -I $specdir/libgfortran to GFORTRAN_UNDER_TEST; update it when set by previous gfortran_init call. * gfortran.dg/ISO_Fortran_binding_1.c: Use <...> not "..." for ISO_Fortran_binding.h's #include. * gfortran.dg/ISO_Fortran_binding_10.c: Likewise. * gfortran.dg/ISO_Fortran_binding_11.c: Likewise. * gfortran.dg/ISO_Fortran_binding_12.c: Likewise. * gfortran.dg/ISO_Fortran_binding_15.c: Likewise. * gfortran.dg/ISO_Fortran_binding_16.c: Likewise. * gfortran.dg/ISO_Fortran_binding_17.c: Likewise. * gfortran.dg/ISO_Fortran_binding_18.c: Likewise. * gfortran.dg/ISO_Fortran_binding_3.c: Likewise. * gfortran.dg/ISO_Fortran_binding_5.c: Likewise. * gfortran.dg/ISO_Fortran_binding_6.c: Likewise. * gfortran.dg/ISO_Fortran_binding_7.c: Likewise. * gfortran.dg/ISO_Fortran_binding_8.c: Likewise. * gfortran.dg/ISO_Fortran_binding_9.c: Likewise. * gfortran.dg/PR94327.c: Likewise. * gfortran.dg/PR94331.c: Likewise. * gfortran.dg/bind_c_array_params_3_aux.c: Likewise. * gfortran.dg/iso_fortran_binding_uint8_array_driver.c: Likewise. * gfortran.dg/pr93524.c: Likewise.
2021-08-09aarch64: Expand %<w> correctly according to mode iteratorBin Cheng1-1/+1
Pattern "*extend<SHORT:mode><GPI:mode>2_aarch64" is duplicated from the corresponding zero_extend pattern; however, %<w> needs to be expanded according to its mode iterator because the smov instruction differs from umov. 2021-08-09 Bin Cheng <bin.cheng@linux.alibaba.com> gcc/ * config/aarch64/aarch64.md (*extend<SHORT:mode><GPI:mode>2_aarch64): Use %<GPI:w>0.
2021-08-09testsuite: aarch64: Fix invalid SVE testsJonathan Wright5-48/+24
Some scan-assembler tests for SVE code generation were erroneously split over multiple lines, which made them invalid. This patch gets the tests working again by putting each test on a single line. The extract_[1234].c tests are corrected to expect that extracted 32-bit values are moved into 'w' registers rather than 'x' registers. gcc/testsuite/ChangeLog: 2021-08-06 Jonathan Wright <jonathan.wright@arm.com> * gcc.target/aarch64/sve/dup_lane_1.c: Don't split scan-assembler tests over multiple lines. Expect 32-bit result values in 'w' registers. * gcc.target/aarch64/sve/extract_1.c: Likewise. * gcc.target/aarch64/sve/extract_2.c: Likewise. * gcc.target/aarch64/sve/extract_3.c: Likewise. * gcc.target/aarch64/sve/extract_4.c: Likewise.
2021-08-09testsuite: aarch64: Fix failing vector structure tests on big-endianJonathan Wright1-1/+1
Recent refactoring of the arm_neon.h header enabled better code generation for intrinsics that manipulate vector structures. New tests were also added to verify the benefit of these changes. It now transpires that the code generation improvements are observed only on little-endian systems. This patch restricts the code generation tests to little-endian targets. gcc/testsuite/ChangeLog: 2021-08-04 Jonathan Wright <jonathan.wright@arm.com> * gcc.target/aarch64/vector_structure_intrinsics.c: Restrict tests to little-endian targets.
2021-08-09Daily bump.GCC Administrator3-1/+9
2021-08-08lra: Fix s/otput/output/ typo in debug outputSergei Trofimovich1-1/+1
gcc/ * lra-constraints.c: Fix s/otput/output/ typo.
2021-08-08Fix c6x test compromised by recent improvements to bswap & rotatesJeff Law1-2/+6
gcc/testsuite * gcc.target/tic6x/rotdi16-scan.c: Pull rotate into its own function.