path: root/gcc
Age  Commit message  Author  Files  Lines
2022-10-19  Fix omp-expand.cc's expand_omp_target for OpenACC  Tobias Burnus  2  -1/+6
In OG12 commit a6c1eccffb161130351d891dc87f5afe54f8075c, "Fortran/OpenMP: Support mapping of DT with allocatable components" the size of the addr/sizes/kind arrays was passed as 4th argument. However, OpenACC uses >3 arguments for its own purpose, e.g. to handle noncontiguous arrays by passing an array descriptor there. This patch restores the previous behaviour for OpenACC, fixing testcases like libgomp.oacc-c-c++-common/noncontig_array-1.c. gcc/ * omp-expand.cc (expand_omp_target): Fix OpenACC in case there are more than 3 arguments to the builtin function.
2022-10-19  ChangeLog for "Fortran: Fix delinearization regression"  Tobias Burnus  2  -0/+14
Forgot to update gcc/fortran/ChangeLog.omp and to include the following in the previous commit, i.e. commit 76b773a4a2d1daf0b83e50cd999bc38f8dd047be. gcc/fortran/ChangeLog: * trans-array.cc (non_negative_strides_array_p): Fix handling of GFC_DECL_SAVED_DESCRIPTOR. (gfc_conv_array_ref): Use ARRAY_REF again when possible. gcc/testsuite/ChangeLog: * gfortran.dg/gomp/affinity-clause-1.f90: Revert to upstream version, update one scan-tree item. * gfortran.dg/gomp/depend-4.f90: Revert to upstream version. * gfortran.dg/gomp/depend-5.f90: Likewise. * gfortran.dg/gomp/depend-6.f90: Likewise.
2022-10-19  Fortran: Fix delinearization regression  Tobias Burnus  5  -92/+91
The delinearization patch "Fortran: delinearize multi-dimensional array accesses", OG12 commit 39a8c371fda6136cf77c74895a00b136409e0ba3, uses gfc_build_array_ref for the non-delinearization path. The generated code depends on whether there can be negative strides or not, an addition to that function in r12-8230-g7964ab6c364 - adding a Boolean argument. The follow-up OG12 commit "Fix Fortran array-access regressions", 9fb0076b11eb2774b620bcf2171d55c7d1fb899f, also added this argument to the call in gfc_conv_array_ref, but always evaluating as false. This commit changes it to a call to non_negative_strides_array_p. (Note: for 'se->expr' not 'base'; the former could be 'arraydesc' while the latter is then 'arraydesc.data' whose TREE_TYPE does not contain information about the array type.) However, doing so revealed a bug in non_negative_strides_array_p, fixed in this commit but also submitted as "Fortran: Fix non_negative_strides_array_p" to mainline, https://gcc.gnu.org/pipermail/gcc-patches/2022-October/603883.html As a side effect of this commit, several testcases now pass and the OG12-only changes to depend-{4,5,6}.f90 and affinity-clause-1.f90 could be undone, except that the latter now uses the delinearized array syntax in one case, which is an improvement (as honored in the scan-dump-tree). Hence, this commit (partially) reverts the commits: 21c806f73fc gfortran.dg/gomp/{depend-5,scope-6}.f90: Update scan-tree-dump 014fc7cd451 Fix dg- pattern for gomp/{affinity-clause-1.f90,uses_allocators-3.f90} 2d8aa5cc5d3 gfortran.dg/gomp/depend-6.f90: minor fix + dump update d77133b29fc gfortran.dg/gomp/depend-4.f90: minor fix + dump update The main testcase for non_negative_strides_array_p is gfortran.dg/array_reference_3.f90, which now also passes. Additionally, this change prevents some unintended implicit mapping such that libgomp.fortran/map-alloc-comp-{4,6}.f90 failed before - and now pass again.
2022-10-19  Remove undefined behaviour from testcase.  Andrew MacLeod  1  -1/+1
There was a patch posted to remove the undefined behaviour from this testcase, but it appears never to have been applied. gcc/testsuite/ PR tree-optimization/102892 * gcc.dg/pr102892-1.c: Remove undefined behaviour.
2022-10-19  rs6000: Fix the condition with frame_pointer_needed_indeed [PR96072]  Kewen Lin  2  -1/+15
As PR96072 shows, the code adding REG_CFA_DEF_CFA reg note makes one assumption that we have emitted one insn which restores the frame pointer previously. That part of code was guarded with flag frame_pointer_needed before, it was consistent, but it was replaced with flag frame_pointer_needed_indeed since commit r10-7981. It caused ICE due to unexpected NULL insn. PR target/96072 gcc/ChangeLog: * config/rs6000/rs6000-logue.cc (rs6000_emit_epilogue): Update the condition for adding REG_CFA_DEF_CFA reg note with frame_pointer_needed_indeed. gcc/testsuite/ChangeLog: * gcc.target/powerpc/pr96072.c: New test. (cherry picked from commit 5be0950d22209f5ba69d244387228e12389a8470)
2022-10-19  rs6000: Fix condition of define_expand vec_shr_<mode> [PR100645]  Kewen Lin  2  -1/+14
PR100645 exposes one latent bug in define_expand vec_shr_<mode> that the current condition TARGET_ALTIVEC is too loose. The mode iterator VEC_L contains a few modes, they are not always supported as vector mode, VECTOR_UNIT_ALTIVEC_OR_VSX_P should be used like some other VEC_L usages. PR target/100645 gcc/ChangeLog: * config/rs6000/vector.md (vec_shr_<mode>): Replace condition TARGET_ALTIVEC with VECTOR_UNIT_ALTIVEC_OR_VSX_P. gcc/testsuite/ChangeLog: * gcc.target/powerpc/pr100645.c: New test. (cherry picked from commit bfad7069b74c97000b698191c1945f07a6192db5)
2022-10-19  Daily bump.  GCC Administrator  1  -1/+1
2022-10-18  Merge branch 'releases/gcc-12' into devel/omp/gcc-12  Tobias Burnus  32  -128/+1234
Merge up to r12-8843-g912bdd5cfb92f6dd58accd755ad14f47c0df619e (18th Oct 2022)
2022-10-18  Daily bump.  GCC Administrator  3  -1/+136
2022-10-17  Fix register count when not splitting Complex IEEE 128-bit args.  Pat Haugen  1  -0/+6
For ABI_V4, we do not split complex args. This created a problem because even though an arg would be passed in two VSX regs, we were only advancing the function arg counter by one VSX register. Fixed with this patch. PR target/99685 gcc/ * config/rs6000/rs6000-call.cc (rs6000_function_arg_advance_1): Bump register count when not splitting IEEE 128-bit Complex. (cherry picked from commit 2ee68beee709e48fce85b8892ff9985acc6a91a8)
2022-10-17  Fortran: Fixes for kind=4 characters strings [PR107266]  Tobias Burnus  7  -12/+153
PR fortran/107266 gcc/fortran/ * trans-expr.cc (gfc_conv_string_parameter): Use passed type to honor character kind. * trans-types.cc (gfc_sym_type): Honor character kind. * trans-decl.cc (gfc_conv_cfi_to_gfc): Fix handling kind=4 character strings. gcc/testsuite/ * gfortran.dg/char4_decl.f90: New test. * gfortran.dg/char4_decl-2.f90: New test. (cherry picked from commit c610cf20ebb3444ef4224d789aca670a12f5da40)
2022-10-17  tree-optimization/107254 - check and support live lanes from permutes  Richard Biener  2  -5/+77
The following fixes an omission from adding SLP permute nodes which is live lanes originating from those. We have to check that we can extract the lane and have to actually code generate them. PR tree-optimization/107254 * tree-vect-slp.cc (vect_slp_analyze_node_operations_1): For permutes also analyze live lanes. (vect_schedule_slp_node): For permutes also code generate live lane extracts. * gfortran.dg/vect/pr107254.f90: New testcase. (cherry picked from commit 9ed4a849afb5b18b462bea311e7eee454c2c9f68)
2022-10-17  tree-optimization/107212 - SLP reduction of reduction paths  Richard Biener  3  -7/+63
The following fixes an issue with how we handle epilogue generation for SLP reductions of reduction paths where the actual live lanes are not "canonical". We need to make sure to identify all live lanes as reductions and thus have to iterate over all participating SLP lanes when walking the reduction SSA use-def chain. Also the previous attempt likely to mitigate such issue in vectorizable_live_operation is misguided and has to be removed. PR tree-optimization/107212 * tree-vect-loop.cc (vectorizable_reduction): Make sure to set STMT_VINFO_REDUC_DEF for all live lanes in a SLP reduction. (vectorizable_live_operation): Do not pun to the SLP node representative for reduction epilogue generation. * gcc.dg/vect/pr107212-1.c: New testcase. * gcc.dg/vect/pr107212-2.c: Likewise. (cherry picked from commit ee467644c53ee2f7d633a8e1f53603feafab4351)
2022-10-17  tree-optimization/107160 - avoid reusing multiple accumulators  Richard Biener  2  -1/+43
Epilogue vectorization is not set up to re-use a vectorized accumulator consisting of more than one vector. For non-SLP we always reduce to a single but for SLP that isn't happening. In such a case we currently miscompile the epilog so avoid this. PR tree-optimization/107160 * tree-vect-loop.cc (vect_create_epilog_for_reduction): Do not register accumulator if we failed to reduce it to a single vector. * gcc.dg/vect/pr107160.c: New testcase. (cherry picked from commit 5cbaf84c191b9a3e3cb26545c808d208bdbf2ab5)
2022-10-17  tree-optimization/107107 - tail-merging VN wrong-code  Richard Biener  2  -14/+28
The following fixes an unintended(?) side-effect of the special MODIFY_EXPR expression entries we add for tail-merging during VN. We shouldn't value-number the virtual operand differently here. PR tree-optimization/107107 * tree-ssa-sccvn.cc (visit_reference_op_store): Do not affect value-numbering when doing the tail merging MODIFY_EXPR lookup. * gcc.dg/pr107107.c: New testcase. (cherry picked from commit 85333b9265720fc4e49397301cb16324d2b89aa7)
2022-10-17  tree-optimization/106922 - extend same-val clobber FRE  Richard Biener  2  -3/+55
The following extends the skipping of same valued stores to handle an arbitrary number of them as long as they are from the same value (which we now record). That's an obvious extension which allows to optimize the m_engaged member of std::optional more reliably. PR tree-optimization/106922 * tree-ssa-sccvn.cc (vn_reference_lookup_3): Allow an arbitrary number of same valued skipped stores. * g++.dg/torture/pr106922.C: New testcase. (cherry picked from commit af611afe5fcc908a6678b5b205fb5af7d64fbcb2)
2022-10-17  testsuite: Fix up pr106922.C test  Jakub Jelinek  1  -2/+2
On Thu, Sep 22, 2022 at 01:10:08PM +0200, Richard Biener via Gcc-patches wrote: > * g++.dg/tree-ssa/pr106922.C: Adjust. > --- a/gcc/testsuite/g++.dg/tree-ssa/pr106922.C > +++ b/gcc/testsuite/g++.dg/tree-ssa/pr106922.C > @@ -87,5 +87,4 @@ void testfunctionfoo() { > } > } > > -// { dg-final { scan-tree-dump-times "Found fully redundant value" 4 "pre" { xfail { ! lp64 } } } } > -// { dg-final { scan-tree-dump-not "m_initialized" "cddce3" { xfail { ! lp64 } } } } > +// { dg-final { scan-tree-dump-not "m_initialized" "dce3" } } I've noticed +UNRESOLVED: g++.dg/tree-ssa/pr106922.C -std=gnu++20 scan-tree-dump-not dce3 "m_initialized" +UNRESOLVED: g++.dg/tree-ssa/pr106922.C -std=gnu++2b scan-tree-dump-not dce3 "m_initialized" with this change, both on x86_64 and i686. The dump is still cddce3, additionally as the last reference to the pre dump is gone, not sure it is worth creating that dump. With the following patch, there aren't FAILs nor UNRESOLVED tests with GXX_TESTSUITE_STDS=98,11,14,17,20,2b make check-g++ RUNTESTFLAGS="--target_board=unix\{-m32,-m64\} dg.exp='pr106922.C'" 2022-09-23 Jakub Jelinek <jakub@redhat.com> PR tree-optimization/106922 * g++.dg/tree-ssa/pr106922.C: Scan in cddce3 dump rather than dce3. Remove -fdump-tree-pre-details from dg-options. (cherry picked from commit a0de11d0d22054b6fd76a0730a3ec807542379d0)
2022-10-17  tree-optimization/106922 - missed FRE/PRE  Richard Biener  3  -32/+93
The following enhances the store-with-same-value trick in vn_reference_lookup_3 by not only looking for a = val; *ptr = val; .. = a; but also *ptr = val; other = x; .. = a; where the earlier store is more than one hop away. It does this by queueing the actual value to compare until after the walk but as disadvantage only allows a single such skipped store from a constant value. Unfortunately we cannot handle defs from non-constants this way since we're prone to pick up values from the past loop iteration this way and we have no good way to identify values that are invariant in the currently iterated cycle. That's why we keep the single-hop lookup for those cases. gcc.dg/tree-ssa/pr87126.c would be a testcase that's un-XFAILed when we'd handle those as well. PR tree-optimization/106922 * tree-ssa-sccvn.cc (vn_walk_cb_data::same_val): New member. (vn_walk_cb_data::finish): Perform delayed verification of a skipped may-alias. (vn_reference_lookup_pieces): Likewise. (vn_reference_lookup): Likewise. (vn_reference_lookup_3): When skipping stores of the same value also handle constant stores that are more than a single VDEF away by delaying the verification. * gcc.dg/tree-ssa/ssa-fre-100.c: New testcase. * g++.dg/tree-ssa/pr106922.C: Adjust. (cherry picked from commit 9baee6181b4e427e0b5ba417e51424c15858dce7)
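The first pattern named above (a = val; *ptr = val; .. = a;) can be written out as a small C++ function. This is only an illustrative sketch: the names `a` and `load_after_same_value_store` are invented here, and the commit message does not say whether a particular GCC version forwards `val` for this exact code.

```cpp
// Illustrative global; the name is an assumption made for this sketch.
int a;

// a = val; *ptr = val; .. = a;
// The store through ptr may alias 'a', but it writes the same value that
// 'a' already holds, so the final load of 'a' yields 'val' either way.
int load_after_same_value_store(int *ptr, int val)
{
  a = val;
  *ptr = val;
  return a;
}
```

Calling it with a pointer to an unrelated int, or with `&a` itself, returns `val` in both cases, which is why the intervening store can be skipped when looking up the value of `a`.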
2022-10-17  GCN: Restore build with GCC 4.8  Thomas Schwinge  2  -7/+15
For example, for "g++-4.8 (Ubuntu 4.8.4-2ubuntu1~14.04.4) 4.8.4", the recent commit r13-3220-g45381d6f9f4e7b5c7b062f5ad8cc9788091c2d07 "amdgcn: add multiple vector sizes" broke the build:

    In file included from [...]/source-gcc/gcc/coretypes.h:458:0,
                     from [...]/source-gcc/gcc/config/gcn/gcn.cc:24:
    [...]/source-gcc/gcc/config/gcn/gcn.cc: In function ‘machine_mode VnMODE(int, machine_mode)’:
    ./insn-modes.h:42:71: error: temporary of non-literal type ‘scalar_int_mode’ in a constant expression
     #define QImode (scalar_int_mode ((scalar_int_mode::from_int) E_QImode))
                                                                           ^
    [...]/source-gcc/gcc/config/gcn/gcn.cc:405:10: note: in expansion of macro ‘QImode’
         case QImode:
              ^
    In file included from [...]/source-gcc/gcc/coretypes.h:478:0,
                     from [...]/source-gcc/gcc/config/gcn/gcn.cc:24:
    [...]/source-gcc/gcc/machmode.h:410:7: note: ‘scalar_int_mode’ is not literal because:
     class scalar_int_mode
           ^
    [...]/source-gcc/gcc/machmode.h:410:7: note: ‘scalar_int_mode’ is not an aggregate, does not have a trivial default constructor, and has no constexpr constructor that is not a copy or move constructor
    [...]

Addressing this like similar issues have been addressed in the past. gcc/ * config/gcn/gcn.cc (VnMODE): Use 'case E_QImode:' instead of 'case QImode:', etc. (cherry picked from commit 612de72b0d2904b5a5a2b487ce4cb907c768a947)
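The difference between 'case QImode:' and 'case E_QImode:' can be sketched with stand-in types. Everything named below (`mode_id`, `scalar_mode_wrapper`, `QI`, `bytes_for`) is hypothetical and is not GCC's actual machmode.h code. The sketch assumes a wrapper class without a constexpr constructor, which is how the old compiler treats the real class according to the diagnostic above.

```cpp
// Illustrative enum of mode identifiers (names invented for this sketch).
enum mode_id { E_QI, E_HI, E_SI };

// Wrapper class whose constructor is not constexpr, so it is not a literal
// type and a temporary of it cannot appear in a constant expression.
class scalar_mode_wrapper
{
public:
  scalar_mode_wrapper(mode_id id) : m_id(id) {}
  operator mode_id() const { return m_id; }

private:
  mode_id m_id;
};

// Macro in the style of "QImode": expands to a temporary of the wrapper.
#define QI (scalar_mode_wrapper(E_QI))

int bytes_for(mode_id m)
{
  switch (m)
    {
    // "case QI:" would be rejected here: a case label needs a constant
    // expression, and the wrapper temporary is not one.
    case E_QI: return 1;   // the plain enumerator is a constant expression
    case E_HI: return 2;
    case E_SI: return 4;
    }
  return 0;
}
```

The macro is still usable as an ordinary run-time value, for example `bytes_for(QI)`, which converts the wrapper back to the enumerator; only the case labels have to use the `E_` form.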
2022-10-17  Daily bump.  GCC Administrator  1  -1/+1
2022-10-16  Daily bump.  GCC Administrator  1  -1/+1
2022-10-15  Daily bump.  GCC Administrator  2  -1/+10
2022-10-14  [og12] OpenACC: Don't gang-privatize artificial variables  Julian Brown  2  -0/+27
This patch prevents compiler-generated artificial variables from being treated as privatization candidates for OpenACC. The rationale is that e.g. "gang-private" variables actually must be shared by each worker and vector spawned within a particular gang, but that sharing is not necessary for any compiler-generated variable (at least at present, but no such need is anticipated either). Variables on the stack (and machine registers) are already private per-"thread" (gang, worker and/or vector), and that's fine for artificial variables. Several tests need their scan output patterns adjusted to compensate. 2022-10-14 Julian Brown <julian@codesourcery.com> gcc/ * omp-low.cc (oacc_privatization_candidate_p): Artificial vars are not privatization candidates. libgomp/ * testsuite/libgomp.oacc-fortran/declare-1.f90: Adjust scan output. * testsuite/libgomp.oacc-fortran/host_data-5.F90: Likewise. * testsuite/libgomp.oacc-fortran/if-1.f90: Likewise. * testsuite/libgomp.oacc-fortran/print-1.f90: Likewise. * testsuite/libgomp.oacc-fortran/privatized-ref-2.f90: Likewise.
2022-10-14  [og12] amdgcn: Use FLAT addressing for all functions with pointer arguments  Julian Brown  2  -6/+15
The GCN backend uses a heuristic to determine whether to use FLAT or GLOBAL addressing in a particular (offload) function: namely, if a function takes a pointer-to-scalar parameter, it is assumed that the pointer may refer to "flat scratch" space, and thus FLAT addressing must be used instead of GLOBAL. I came up with this heuristic initially whilst working on support for moving OpenACC gang-private variables into local-data share (scratch) memory. The assumption that only scalar variables would be transformed in that way turned out to be wrong. For example, prior to the next patch in the series, Fortran compiler-generated temporary structures were treated as gang private and moved to LDS space, typically overflowing the region allocated for such variables. That will no longer happen after that patch is applied, but there may be other cases of structs moving to LDS space now or in the future that this patch may be needed for. 2022-10-14 Julian Brown <julian@codesourcery.com> gcc/ * config/gcn/gcn.cc (gcn_detect_incoming_pointer_arg): Any pointer argument forces FLAT addressing mode, not just pointer-to-non-aggregate.
2022-10-14  Fix PR target/107248  Eric Botcazou  1  -12/+12
This is the infamous PR rtl-optimization/38644 rearing its ugly head for leaf functions on SPARC more than a decade later... Richard E.'s generic solution has never been implemented so let's do as other RISC back-ends did. gcc/ PR target/107248 * config/sparc/sparc.cc (sparc_expand_prologue): Emit a frame blockage for leaf functions. (sparc_flat_expand_prologue): Emit frame instead of full blockage. (sparc_expand_epilogue): Emit a frame blockage for leaf functions. (sparc_flat_expand_epilogue): Emit frame instead of full blockage.
2022-10-14  Daily bump.  GCC Administrator  4  -1/+27
2022-10-13  c++: ICE with VEC_INIT_EXPR and defarg [PR106925]  Marek Polacek  2  -2/+18
Since r12-8066, in cxx_eval_vec_init we perform expand_vec_init_expr while processing the default argument in this test. At this point start_preparsed_function hasn't yet set current_function_decl. expand_vec_init_expr then leads to maybe_splice_retval_cleanup which checks DECL_CONSTRUCTOR_P (current_function_decl) without checking that c_f_d is non-null first. It seems correct that c_f_d is null here, so it seems to me that maybe_splice_retval_cleanup should check c_f_d as in the following patch. PR c++/106925 gcc/cp/ChangeLog: * except.cc (maybe_splice_retval_cleanup): Check current_function_decl. Make the bool const. gcc/testsuite/ChangeLog: * g++.dg/cpp0x/initlist-defarg3.C: New test. (cherry picked from commit 3130e70dab1e64a7b014391fe941090d5f3b6b7d)
2022-10-13  install.texi: gcn - update llvm requirements, gcn/nvptx - newlib use version  Tobias Burnus  1  -5/+26
gcc/ * doc/install.texi (Specific): Add missing items to bullet list. (amdgcn): Update LLVM requirements, use version not date for newlib. (nvptx): Use version not git hash for newlib. (cherry picked from commit e886ebd17965d78f609b62479f4f48085108389c)
2022-10-13  Daily bump.  GCC Administrator  3  -1/+48
2022-10-12  fortran: Move clobbers after evaluation of all arguments [PR106817]  Mikael Morin  2  -2/+47
For actual arguments whose dummy is INTENT(OUT), we used to generate clobbers on them at the same time we generated the argument reference for the function call. This was wrong if for an argument coming later, the value expression was depending on the value of the just-clobbered argument, and we passed an undefined value in that case. With this change, clobbers are collected separately and appended to the procedure call preliminary code after all the arguments have been evaluated. PR fortran/106817 gcc/fortran/ChangeLog: * trans-expr.cc (gfc_conv_procedure_call): Collect all clobbers to their own separate block. Append the block of clobbers to the procedure preliminary block after the argument evaluation codes for all the arguments. gcc/testsuite/ChangeLog: * gfortran.dg/intent_optimize_4.f90: New test. (cherry picked from commit 29919bf3b6449bafd02e795abbb1966e3990c1fc)
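The ordering problem described above can be modelled in a few lines of C++. This is a hypothetical model, not the Fortran front end's code: `Var` and both function names are invented, and a "clobber" is represented simply by clearing a `defined` flag. It models a call of the form foo(x, x + 1) whose first dummy argument is INTENT(OUT).

```cpp
// Hypothetical model of a variable: 'defined' is false once clobbered.
struct Var
{
  bool defined;
  int value;
};

// Old order: the clobber of x is emitted together with the first argument,
// so the second argument expression (x + 1) reads an undefined x.
Var second_arg_eager_clobber(Var x)
{
  x.defined = false;                        // clobber emitted with argument 1
  Var second = { x.defined, x.value + 1 };  // argument 2 evaluated afterwards
  return second;
}

// New order: evaluate every argument first, then apply the collected clobbers.
Var second_arg_deferred_clobber(Var x)
{
  Var second = { x.defined, x.value + 1 };  // argument 2 sees the defined x
  x.defined = false;                        // clobbers appended after evaluation
  return second;
}
```

With x defined and holding 41, the eager variant hands the callee a second argument marked undefined, while the deferred variant passes a defined 42.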
2022-10-12  fortran: Fix invalid function decl clobber ICE [PR105012]  Mikael Morin  2  -1/+29
The fortran frontend, as result symbol for a function without declared result symbol, uses the function symbol itself. This caused an invalid clobber of a function decl to be emitted, leading to an ICE, whereas the intended behaviour was to clobber the function result variable. This change fixes the problem by getting the decl from the just-retrieved variable reference after the call to gfc_conv_expr_reference, instead of copying it from the frontend symbol. PR fortran/105012 gcc/fortran/ChangeLog: * trans-expr.cc (gfc_conv_procedure_call): Retrieve variable from the just calculated variable reference. gcc/testsuite/ChangeLog: * gfortran.dg/intent_out_15.f90: New test. (cherry picked from commit edaf1e005c90b311c39b46d85cea17befbece112)
2022-10-12  fortran: Move the clobber generation code  Mikael Morin  2  -30/+33
This change inlines the clobber generation code from gfc_conv_expr_reference to the single caller from where the add_clobber flag can be true, and removes the add_clobber argument. What motivates this is the standard making the procedure call a cause for a variable to become undefined, which translates to a clobber generation, so clobber generation should be closely related to procedure call generation, whereas it is rather orthogonal to variable reference generation. Thus the generation of the clobber feels more appropriate in gfc_conv_procedure_call than in gfc_conv_expr_reference. Behaviour remains unchanged. gcc/fortran/ChangeLog: * trans.h (gfc_conv_expr_reference): Remove add_clobber argument. * trans-expr.cc (gfc_conv_expr_reference): Ditto. Inline code depending on add_clobber and conditions controlling it ... (gfc_conv_procedure_call): ... to here. (cherry picked from commit 2b393f6f83903cb836676bbd042c1b99a6e7e6f7)
2022-10-12  [OG12] amdgcn: Fixup "Add builtin for vectorized DFmode fabs operation"  Andrew Stubbs  2  -1/+6
The function was taken away by the "add multiple vector sizes" patch. 2022-10-11 Andrew Stubbs <ams@codesourcery.com> gcc/ * config/gcn/gcn.cc (gcn_expand_builtin_1): Change gcn_full_exec_reg to get_exec.
2022-10-12  amdgcn: vector testsuite tweaks  Andrew Stubbs  15  -14/+53
The testsuite needs a few tweaks following my patches to add multiple vector sizes for amdgcn. gcc/testsuite/ChangeLog: * gcc.dg/pr104464.c: Xfail on amdgcn. * gcc.dg/signbit-2.c: Likewise. * gcc.dg/signbit-5.c: Likewise. * gcc.dg/vect/bb-slp-68.c: Likewise. * gcc.dg/vect/bb-slp-cond-1.c: Change expectations on amdgcn. * gcc.dg/vect/bb-slp-subgroups-3.c: Likewise. * gcc.dg/vect/no-vfa-vect-depend-2.c: Change expectations for multiple vector sizes. * gcc.dg/vect/pr33953.c: Likewise. * gcc.dg/vect/pr65947-12.c: Likewise. * gcc.dg/vect/pr65947-13.c: Likewise. * gcc.dg/vect/pr80631-2.c: Likewise. * gcc.dg/vect/slp-reduc-4.c: Likewise. * gcc.dg/vect/trapv-vect-reduc-4.c: Likewise. * lib/target-supports.exp (available_vector_sizes): Add more sizes for amdgcn.
2022-10-12  amdgcn: Add vector integer negate insn  Andrew Stubbs  2  -0/+20
Another example of the vectorizer needing explicit insns where the scalar expander just works. gcc/ChangeLog: * config/gcn/gcn-valu.md (neg<mode>2): New define_expand.
2022-10-12  amdgcn: vec_init for multiple vector sizes  Andrew Stubbs  3  -26/+155
Implements vec_init when the input is a vector of smaller vectors, or of vector MEM types, or a smaller vector duplicated several times. gcc/ChangeLog: * config/gcn/gcn-valu.md (vec_init<V_ALL:mode><V_ALL_ALT:mode>): New. * config/gcn/gcn.cc (GEN_VN): Add andvNsi3, subvNsi3. (GEN_VNM): Add gathervNm_expr. (GEN_VN_NOEXEC): Add vec_seriesvNsi. (gcn_expand_vector_init): Add initialization of vectors from smaller vectors.
2022-10-12  amdgcn: Add vec_extract for partial vectors  Andrew Stubbs  4  -1/+55
Add vec_extract expanders for all valid pairs of vector types. gcc/ChangeLog: * config/gcn/gcn-protos.h (get_exec): Add prototypes for two variants. * config/gcn/gcn-valu.md (vec_extract<V_ALL:mode><V_ALL_ALT:mode>): New define_expand. * config/gcn/gcn.cc (get_exec): Export the existing function. Add a new overload variant.
2022-10-12  amdgcn: Resolve insn conditions at compile time  Andrew Stubbs  3  -4/+40
GET_MODE_NUNITS isn't a compile time constant, so we end up with many impossible insns in the machine description. Adding MODE_VF allows the insns to be eliminated completely. gcc/ChangeLog: * config/gcn/gcn-valu.md (<cvt_name><VCVT_MODE:mode><VCVT_FMODE:mode>2<exec>): Use MODE_VF. (<cvt_name><VCVT_FMODE:mode><VCVT_IMODE:mode>2<exec>): Likewise. * config/gcn/gcn.h (MODE_VF): New macro.
2022-10-12  amdgcn: add multiple vector sizes  Andrew Stubbs  5  -425/+1015
The vector sizes are simulated using implicit masking, but they make life easier for the autovectorizer and SLP passes. gcc/ChangeLog: * config/gcn/gcn-modes.def (VECTOR_MODE): Add new modes V32QI, V32HI, V32SI, V32DI, V32TI, V32HF, V32SF, V32DF, V16QI, V16HI, V16SI, V16DI, V16TI, V16HF, V16SF, V16DF, V8QI, V8HI, V8SI, V8DI, V8TI, V8HF, V8SF, V8DF, V4QI, V4HI, V4SI, V4DI, V4TI, V4HF, V4SF, V4DF, V2QI, V2HI, V2SI, V2DI, V2TI, V2HF, V2SF, V2DF. (ADJUST_ALIGNMENT): Likewise. * config/gcn/gcn-protos.h (gcn_full_exec): Delete. (gcn_full_exec_reg): Delete. (gcn_scalar_exec): Delete. (gcn_scalar_exec_reg): Delete. (vgpr_1reg_mode_p): Use inner mode to identify vector registers. (vgpr_2reg_mode_p): Likewise. (vgpr_vector_mode_p): Use VECTOR_MODE_P. * config/gcn/gcn-valu.md (V_QI, V_HI, V_HF, V_SI, V_SF, V_DI, V_DF, V_QIHI, V_1REG, V_INT_1REG, V_INT_1REG_ALT, V_FP_1REG, V_2REG, V_noQI, V_noHI, V_INT_noQI, V_INT_noHI, V_ALL, V_ALL_ALT, V_INT, V_FP): Add additional vector modes. (V64_SI, V64_DI, V64_ALL, V64_FP): New iterators. (scalar_mode, SCALAR_MODE, vnsi, VnSI, vndi, VnDI, sdwa): Add additional vector mode mappings. (mov<mode>): Implement vector length conversions. (ldexp<mode>3<exec>): Use VnSI. (frexp<mode>_exp2<exec>): Likewise. (VCVT_MODE, VCVT_FMODE, VCVT_IMODE): Add additional vector modes. (reduc_<reduc_op>_scal_<mode>): Use V64_ALL. (fold_left_plus_<mode>): Use V64_FP. (*<reduc_op>_dpp_shr_<mode>): Use V64_1REG. (*<reduc_op>_dpp_shr_<mode>): Use V64_DI. (*plus_carry_dpp_shr_<mode>): Use V64_INT_1REG. (*plus_carry_in_dpp_shr_<mode>): Use V64_SI. (*plus_carry_dpp_shr_<mode>): Use V64_DI. (mov_from_lane63_<mode>): Use V64_2REG. * config/gcn/gcn.cc (VnMODE): New function. (gcn_can_change_mode_class): Support multiple vector sizes. (gcn_modes_tieable_p): Likewise. (gcn_operand_part): Likewise. (gcn_scalar_exec): Delete function. (gcn_scalar_exec_reg): Delete function. (gcn_full_exec): Delete function. (gcn_full_exec_reg): Delete function.
(gcn_inline_fp_constant_p): Support multiple vector sizes. (gcn_fp_constant_p): Likewise. (A): New macro. (GEN_VN_NOEXEC): New macro. (GEN_VNM_NOEXEC): New macro. (GEN_VN): New macro. (GEN_VNM): New macro. (GET_VN_FN): New macro. (CODE_FOR): New macro. (CODE_FOR_OP): New macro. (gen_mov_with_exec): Delete function. (gen_duplicate_load): Delete function. (gcn_expand_vector_init): Support multiple vector sizes. (strided_constant): Likewise. (gcn_addr_space_legitimize_address): Likewise. (gcn_expand_scalar_to_vector_address): Likewise. (gcn_expand_scaled_offsets): Likewise. (gcn_secondary_reload): Likewise. (gcn_valid_cvt_p): Likewise. (gcn_expand_builtin_1): Likewise. (gcn_make_vec_perm_address): Likewise. (gcn_vectorize_vec_perm_const): Likewise. (gcn_vector_mode_supported_p): Likewise. (gcn_autovectorize_vector_modes): New hook. (gcn_related_vector_mode): Support multiple vector sizes. (gcn_expand_dpp_shr_insn): Add FIXME comment. (gcn_md_reorg): Support multiple vector sizes. (print_reg): Likewise. (print_operand): Likewise. (TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES): New hook.
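The idea of simulating a shorter vector by implicit masking, as the commit message for "amdgcn: add multiple vector sizes" describes, can be sketched as follows. This is a hypothetical model and not the back end's implementation: `kLanes`, `exec_for_lanes` and `masked_add` are invented names, and a 64-element array stands in for a 64-lane hardware vector.

```cpp
#include <array>
#include <cstdint>

// Assumed hardware vector width for this sketch.
constexpr unsigned kLanes = 64;

// Execution mask with the low n lanes enabled (all lanes when n >= 64).
std::uint64_t exec_for_lanes(unsigned n)
{
  return n >= kLanes ? ~std::uint64_t{0} : (std::uint64_t{1} << n) - 1;
}

// A 64-lane add that only writes the lanes enabled in 'exec'; a 4-lane
// vector add is then the same operation with exec_for_lanes(4).
void masked_add(std::array<int, kLanes> &dst,
                const std::array<int, kLanes> &a,
                const std::array<int, kLanes> &b,
                std::uint64_t exec)
{
  for (unsigned i = 0; i < kLanes; ++i)
    if (exec & (std::uint64_t{1} << i))
      dst[i] = a[i] + b[i];
}
```

Lanes outside the mask keep their previous contents, so the upper part of the register is simply ignored when a smaller vector mode is in use.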
2022-10-12  vect: while_ult for integer masks  Andrew Stubbs  4  -6/+46
Add a vector length parameter needed by amdgcn without breaking aarch64. All amdgcn vector masks are DImode, regardless of vector length, so we can't tell what length is implied simply from the operator mode. (Even if we used different integer modes there's no mode small enough to differentiate a 2 or 4 lane mask). Without knowing the intended length we end up using a mask with too many lanes enabled, which leads to undefined behaviour. The extra operand is not added for vector mask types so AArch64 does not need to be adjusted. gcc/ChangeLog: * config/gcn/gcn-valu.md (while_ultsidi): Limit mask length using operand 3. * doc/md.texi (while_ult): Document new operand 3 usage. * internal-fn.cc (expand_while_optab_fn): Set operand 3 when lhs_type maps to a non-vector mode.
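Why the lane count matters for an integer mask can be shown with a small model of the operation. This is a sketch under stated assumptions, not the gcn-valu.md expander: `while_ult_mask` and its parameters are invented names, the mask is a 64-bit integer standing in for a DImode value, and overflow of `start + i` is ignored.

```cpp
#include <cstdint>

// Lane i is enabled when start + i < end, but only for lanes below 'nlanes'.
// 'nlanes' plays the role of the extra length operand described above.
std::uint64_t while_ult_mask(std::uint64_t start, std::uint64_t end,
                             unsigned nlanes)
{
  std::uint64_t mask = 0;
  for (unsigned i = 0; i < nlanes && i < 64; ++i)
    if (start + i < end)
      mask |= std::uint64_t{1} << i;
  return mask;
}
```

For a loop with 100 remaining iterations, `while_ult_mask(0, 100, 4)` gives 0xF, whereas computing the mask without a length limit (`nlanes` = 64) enables all 64 lanes, which is the "too many lanes enabled" situation for a 4-lane vector.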
2022-10-12  Daily bump.  GCC Administrator  3  -1/+96
2022-10-11  arm: Fix constant immediates predicates and constraints for some MVE builtins  Christophe Lyon  1  -15/+15
Several MVE builtins incorrectly use the same predicate/constraint pair for several modes, which does not match the specification. This patch uses the appropriate iterator instead. 2022-09-06 Christophe Lyon <christophe.lyon@arm.com> gcc/ * config/arm/mve.md (mve_vqshluq_n_s<mode>): Use MVE_pred/MVE_constraint instead of mve_imm_7/Ra. (mve_vqshluq_m_n_s<mode>): Likewise. (mve_vqrshrnbq_n_<supf><mode>): Use MVE_pred3/MVE_constraint3 instead of mve_imm_8/Rb. (mve_vqrshrunbq_n_s<mode>): Likewise. (mve_vqrshrntq_n_<supf><mode>): Likewise. (mve_vqrshruntq_n_s<mode>): Likewise. (mve_vrshrnbq_n_<supf><mode>): Likewise. (mve_vrshrntq_n_<supf><mode>): Likewise. (mve_vqrshrnbq_m_n_<supf><mode>): Likewise. (mve_vqrshrntq_m_n_<supf><mode>): Likewise. (mve_vrshrnbq_m_n_<supf><mode>): Likewise. (mve_vrshrntq_m_n_<supf><mode>): Likewise. (mve_vqrshrunbq_m_n_s<mode>): Likewise. (mve_vsriq_n_<supf><mode>): Use MVE_pred2/MVE_constraint2 instead of mve_imm_selective_upto_8/Rg. (mve_vsriq_m_n_<supf><mode>): Likewise. (cherry-picked from c3fb6658c7670e446f2fd00984404d971e416b3c)
2022-10-11  tree-optimization/106934 - avoid BIT_FIELD_REF of bitfields  Richard Biener  2  -0/+13
The following avoids creating BIT_FIELD_REF of bitfields in update-address-taken. The patch doesn't implement punning to a full precision integer type but leaves a comment according to that. PR tree-optimization/106934 * tree-ssa.cc (non_rewritable_mem_ref_base): Avoid BIT_FIELD_REFs of bitfields. (maybe_rewrite_mem_ref_base): Likewise. * gfortran.dg/pr106934.f90: New testcase. (cherry picked from commit 05f5c42cb42c5088187d44cc45a5f671d19ad8c5)
2022-10-11  tree-optimization/106922 - PRE and virtual operand translation  Richard Biener  2  -6/+103
PRE implicitly keeps virtual operands at the block's incoming version but the explicit updating point during PHI translation fails to trigger when there are no PHIs at all in a block. Later lazy updating then fails because of a too loose block check. A similar issue plagues reference invalidation when checking the ANTIC_OUT to ANTIC_IN translation. The following fixes both and makes the lazy updating work. The diagnostic testcase unfortunately requires boost so the testcase is the one I reduced for a missed optimization in PRE. The testcase fails with -m32 on x86_64 because we optimize too much before PRE which causes PRE to not trigger so we fail to eliminate a full redundancy. I'm going to open a separate bug for this. Hopefully the !lp64 selector is good enough. PR tree-optimization/106922 * tree-ssa-pre.cc (translate_vuse_through_block): Only keep the VUSE if its def dominates PHIBLOCK. (prune_clobbered_mems): Rewrite logic so we check whether a value dies in a block when the VUSE def doesn't dominate it. * g++.dg/tree-ssa/pr106922.C: New testcase. (cherry picked from commit 5edf02ed2b6de024f83a023d046a6a18f645bc83)
2022-10-11  tree-optimization/106892 - avoid invalid pointer association in predcom  Richard Biener  2  -2/+46
When predictive commoning builds a reference for iteration N it prematurely associates a constant offset into the MEM_REF offset operand which can be invalid if the base pointer then points outside of an object which alias-analysis does not consider valid. PR tree-optimization/106892 * tree-predcom.cc (ref_at_iteration): Do not associate the constant part of the offset into the MEM_REF offset operand, across a non-zero offset. * gcc.dg/torture/pr106892.c: New testcase. (cherry picked from commit a8b0b13da7379feb31950a9d2ad74b98a29c547f)
2022-10-11  tree-optimization/105937 - avoid uninit diagnostics crossing iterations  Richard Biener  2  -2/+247
The following avoids adding PHIs to the worklist for uninit processing if we reach them following backedges. That confuses predicate analysis because it assumes the use is happening in the same iteration as the definition. For the testcase in the PR the situation is like

    void foo (int val)
    {
      int uninit;
      # val = PHI <..> (B)
      for (..)
        {
          if (..)
            {
              .. = val; (C)
              val = uninit;
            }
          # val = PHI <..> (A)
        }
    }

and starting from (A) with 'uninit' as argument we arrive at (B) and from there at (C). Predicate analysis then tries to prove the predicate of (B) (not the backedge) can prove that the path from (B) to (C) is unreachable which isn't really what is necessary - that's what we'd need to do when the preheader edge of the loop were the edge with the uninitialized def. So the following makes those cases intentionally false negatives. PR tree-optimization/105937 * tree-ssa-uninit.cc (find_uninit_use): Do not queue PHIs on backedges. (execute_late_warn_uninitialized): Mark backedges. * g++.dg/uninit-pr105937.C: New testcase. (cherry picked from commit c77fae1ca796d6ea06d5cd437909905c3d3d771c)
2022-10-11  Merge branch 'releases/gcc-12' into devel/omp/gcc-12  Tobias Burnus  27  -63/+563
Merge up to r12-8817-g97374f25e1ee7ea45293c244f29425c9f9abcf5a (11th Oct 2022)
2022-10-11  Daily bump.  GCC Administrator  1  -1/+1
2022-10-10  Daily bump.  GCC Administrator  1  -1/+1
2022-10-09  Daily bump.  GCC Administrator  3  -1/+25