vect_better_loop_vinfo_p
While experimenting with some backend costs for Advanced SIMD and SVE I
hit many cases where GCC would pick SVE for VLA auto-vectorisation even when
the backend very clearly presented cheaper costs for Advanced SIMD.
For a simple float addition loop the SVE costs were:
vec.c:9:21: note: Cost model analysis:
Vector inside of loop cost: 28
Vector prologue cost: 2
Vector epilogue cost: 0
Scalar iteration cost: 10
Scalar outside cost: 0
Vector outside cost: 2
prologue iterations: 0
epilogue iterations: 0
Minimum number of vector iterations: 1
Calculated minimum iters for profitability: 4
and for Advanced SIMD (Neon) they're:
vec.c:9:21: note: Cost model analysis:
Vector inside of loop cost: 11
Vector prologue cost: 0
Vector epilogue cost: 0
Scalar iteration cost: 10
Scalar outside cost: 0
Vector outside cost: 0
prologue iterations: 0
epilogue iterations: 0
Calculated minimum iters for profitability: 0
vec.c:9:21: note: Runtime profitability threshold = 4
yet the SVE one was always picked. With guidance from Richard this seems
to be due to the vinfo comparisons in vect_better_loop_vinfo_p, in
particular the part with the big comment explaining the
estimated_rel_new * 2 <= estimated_rel_old heuristic.
This patch extends the comparisons by introducing a three-way estimate
kind for poly_int values that the backend can distinguish.
This allows vect_better_loop_vinfo_p to ask for minimum, maximum and
likely estimates and pick Advanced SIMD over SVE when it is clearly cheaper.
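To make the three estimate kinds concrete, here is a minimal standalone C++ sketch. It is not GCC's implementation. The enum, the `poly2` type, `estimate` and `cheaper_per_scalar_iter` are hypothetical. The min/max cases assume SVE's architectural vector-length range of 128 to 2048 bits, and the "likely" case takes an assumed tuning value. The real vect_better_loop_vinfo_p weighs more than a single per-iteration cost comparison.

```cpp
#include <cstdint>

// Three ways of collapsing a runtime-variable quantity into one number.
// Illustrative names only; GCC's own enumerators may differ.
enum estimate_kind { EST_MIN, EST_MAX, EST_LIKELY };

// A degree-1 poly_int-style value c0 + c1 * x.  For SVE, x is assumed to
// be (VL / 128) - 1, i.e. 0 for 128-bit vectors up to 15 for 2048-bit ones.
struct poly2
{
  std::int64_t c0, c1;
};

const std::int64_t max_x = 15;

// Estimate VALUE for KIND.  LIKELY_X is a hypothetical tuning input saying
// which vector length the target expects (e.g. 1 for 256-bit vectors).
std::int64_t
estimate (poly2 value, estimate_kind kind, std::int64_t likely_x)
{
  std::int64_t at_min_vl = value.c0;
  std::int64_t at_max_vl = value.c0 + value.c1 * max_x;
  switch (kind)
    {
    case EST_MIN:
      return at_min_vl < at_max_vl ? at_min_vl : at_max_vl;
    case EST_MAX:
      return at_min_vl < at_max_vl ? at_max_vl : at_min_vl;
    case EST_LIKELY:
      return value.c0 + value.c1 * likely_x;
    }
  return at_min_vl;
}

// True if a loop body costing COST_A per vector iteration with vectorization
// factor VF_A is cheaper per scalar iteration than COST_B with VF_B.
// Cross-multiplying keeps the comparison in integer arithmetic.
bool
cheaper_per_scalar_iter (std::int64_t cost_a, std::int64_t vf_a,
                         std::int64_t cost_b, std::int64_t vf_b)
{
  return cost_a * vf_b < cost_b * vf_a;
}
```

Plugging in the float-addition costs quoted above, and assuming a VF of 4 + 4x for SVE and 4 for Advanced SIMD, gives the following results. With a likely 256-bit SVE implementation (x = 1), Advanced SIMD wins because 11 × 8 = 88 < 28 × 4 = 112. At the maximum vector length, SVE would win instead. This shows that which estimate is used can change the decision.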
gcc/
* target.h (enum poly_value_estimate_kind): Define.
(estimated_poly_value): Take an estimate kind argument.
* target.def (estimated_poly_value): Update definition for the
above.
* doc/tm.texi: Regenerate.
* targhooks.c (estimated_poly_value): Update prototype.
* tree-vect-loop.c (vect_better_loop_vinfo_p): Use min, max and
likely estimates of VF to pick between vinfos.
* config/aarch64/aarch64.c (aarch64_cmp_autovec_modes): Use
estimated_poly_value instead of aarch64_estimated_poly_value.
(aarch64_estimated_poly_value): Take a kind argument and handle
it.
|
|
Clang didn't like sizeof (uintset::value) in a templated context. It is
not clear where the problem lies: an ambiguity in the standard, GCC
erroneously accepting, or Clang erroneously rejecting. Either way, this
patch avoids that construct.
PR c++/98340
gcc/cp/
* module.cc (uintset<T>::hash::add): Use uintset (0u).MEMBER,
rather than uintset::MEMBER.
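The workaround can be illustrated with a hypothetical, much-reduced class template. The log does not show `uintset`'s real layout or constructor, so the constructor taking `T` below is an assumption made so that `uintset (0u)` is well formed. Because `sizeof` does not evaluate its operand, no object is actually created.

```cpp
#include <cstddef>

// Hypothetical reduction of the pattern: a class template with a nested
// helper that needs the size of a member of the enclosing class.
template <typename T>
struct uintset
{
  T value;

  explicit uintset (T v) : value (v) {}

  struct hash
  {
    // Instead of sizeof (uintset::value), which names the non-static
    // member directly, form an (unevaluated) object and access the member
    // through it.  Inside the template, "uintset" is the
    // injected-class-name, i.e. uintset<T>.
    static constexpr std::size_t value_size ()
    {
      return sizeof (uintset (0u).value);
    }
  };
};
```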
|
|
gcc/ChangeLog
2020-12-17 Andrea Corallo <andrea.corallo@arm.com>
* config/arm/arm_neon.h (vcreate_p64): Remove call to
'__builtin_neon_vcreatedi'.
|
|
Processing op1_range for a conversion between a non-pointer and a pointer
shouldn't do any fancy math.
gcc/
PR tree-optimization/97750
* range-op.cc (operator_cast::op1_range): Handle pointers better.
gcc/testsuite/
* gcc.dg/pr97750.c: New.
|
|
The RTL SSA merge broke SPARC bootstrap:
In file included from ./tm_p.h:4,
from /vol/gcc/src/hg/master/local/gcc/rtl-ssa.h:54,
from /vol/gcc/src/hg/master/local/gcc/fwprop.c:29:
/vol/gcc/src/hg/master/local/gcc/config/sparc/sparc-protos.h:45:47: error: use of enum 'memmodel' without previous declaration
extern void sparc_emit_membar_for_model (enum memmodel, int, int);
^~~~~~~~
and similarly in rtl-ssa/functions.cc, rtl-ssa/changes.cc, and
rtl-ssa/insns.cc.
Fixed by moving the memmodel.h include in rtl-ssa.h before tm_p.h.
Tested on sparc-sun-solaris2.11 and i386-pc-solaris2.11.
2020-12-17 Rainer Orth <ro@CeBiTec.Uni-Bielefeld.DE>
gcc:
* rtl-ssa.h: Include memmodel.h before tm_p.h.
|
|
When breaking out the sample server from the gcc/cp directory, it lost
its check for mmap, and the sample resolver just assumed it was there.
Fixed thusly. The non-mapping paths in module.cc weren't (recently)
exercised, and led to a signedness warning. Finally, I'd missed
c++tools's config.h.in in the gcc_update script. There I took the
opportunity of adding a 'tools' segment to the dependency lists.
PR bootstrap/98300
contrib/
* gcc_update: Add c++tools/config.h.in.
c++tools/
* configure.ac: Check for sys/mman.h.
* resolver.cc: Don't assume mmap, O_CLOEXEC are available. Use
xmalloc.
* config.h.in: Regenerated.
* configure: Regenerated.
gcc/cp/
* module.cc: Fix ::read, ::write result signedness comparisons.
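Here is a sketch of the kind of fallback described above. It assumes the usual autoconf convention that checking for sys/mman.h with `AC_CHECK_HEADERS` defines `HAVE_SYS_MMAN_H`. The function name `load_file` is hypothetical, and the sketch uses `std::string` for the buffer rather than `xmalloc`. The `read` loop also shows one way to avoid a signed/unsigned comparison: check the `ssize_t` result for errors first, then cast it to `size_t`.

```cpp
#include <cstddef>
#include <string>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>
#ifdef HAVE_SYS_MMAN_H  // assumed to come from the configure check
#include <sys/mman.h>
#endif

// Not every host provides O_CLOEXEC; fall back to no flag.
#ifndef O_CLOEXEC
#define O_CLOEXEC 0
#endif

// Read the whole of PATH into OUT; return false on failure.
static bool
load_file (const char *path, std::string &out)
{
  int fd = open (path, O_RDONLY | O_CLOEXEC);
  if (fd < 0)
    return false;
  struct stat st;
  if (fstat (fd, &st) != 0)
    {
      close (fd);
      return false;
    }
  std::size_t size = static_cast<std::size_t> (st.st_size);
  bool ok = true;
#ifdef HAVE_SYS_MMAN_H
  // Map the file and copy it out.
  if (size)
    {
      void *map = mmap (nullptr, size, PROT_READ, MAP_PRIVATE, fd, 0);
      if (map == MAP_FAILED)
        ok = false;
      else
        {
          out.assign (static_cast<const char *> (map), size);
          munmap (map, size);
        }
    }
  else
    out.clear ();
#else
  // No mmap: read the file in a loop.
  out.resize (size);
  std::size_t done = 0;
  while (done < size)
    {
      ssize_t n = read (fd, &out[done], size - done);
      if (n <= 0)  // error or unexpected end of file
        {
          ok = false;
          break;
        }
      // N is known to be positive here, so the cast is safe.
      done += static_cast<std::size_t> (n);
    }
#endif
  close (fd);
  return ok;
}
```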
|
|
As mentioned in the PR, shrink-wrapping disqualifies basic blocks that
have an EDGE_CROSSING incoming edge from prologue placement.
I don't see why that is necessary; those edges seem to be redirected
just fine, both on x86_64 and powerpc64. In the former case, they
are usually conditional jumps that patch_jump_insn can handle just fine:
after all, they were previously crossing and will be crossing after
the redirection too, just to a different label. And in the powerpc64
case, it is a simple_jump instead that again seems to be handled by
patch_jump_insn just fine.
Sure, redirecting an edge that was previously not crossing to be crossing or
vice versa can fail, but that is not what shrink-wrapping needs.
I also tested this patch with GCC 8 and don't see ICEs there either
(though, of course, I'm not suggesting we should backport this to release
branches).
The old ICEs may have been fixed years ago by the PR87475 fix or some
other one.
2020-12-17 Jakub Jelinek <jakub@redhat.com>
PR rtl-optimization/98289
* shrink-wrap.c (can_get_prologue): Don't punt on EDGE_CROSSING
incoming edges.
* gcc.target/i386/pr98289.c: New test.
* gcc.dg/torture/pr98289.c: New test.
|
|
gcc/ada/
* libgnat/a-tags.ads, libgnat/a-tags.adb (CW_Membership): Move
to spec to allow inlining.
gcc/testsuite/
* gnat.dg/debug15.adb: Remove fragile testcase.
|
|
gcc/ada/
* checks.adb: Remove, not used.
* checks.ads: Likewise.
* exp_ch6.adb: Likewise.
* exp_ch7.adb: Likewise.
* exp_ch7.ads: Likewise.
* exp_fixd.adb: Likewise.
* exp_tss.adb: Likewise.
* exp_tss.ads: Likewise.
* exp_util.adb: Likewise.
* exp_util.ads: Likewise.
* gnat1drv.adb: Likewise.
* libgnat/s-finmas.adb: Likewise.
* libgnat/s-finmas.ads: Likewise.
* libgnat/system-aix.ads: Likewise.
* libgnat/system-darwin-arm.ads: Likewise.
* libgnat/system-darwin-ppc.ads: Likewise.
* libgnat/system-darwin-x86.ads: Likewise.
* libgnat/system-djgpp.ads: Likewise.
* libgnat/system-dragonfly-x86_64.ads: Likewise.
* libgnat/system-freebsd.ads: Likewise.
* libgnat/system-hpux-ia64.ads: Likewise.
* libgnat/system-hpux.ads: Likewise.
* libgnat/system-linux-alpha.ads: Likewise.
* libgnat/system-linux-arm.ads: Likewise.
* libgnat/system-linux-hppa.ads: Likewise.
* libgnat/system-linux-ia64.ads: Likewise.
* libgnat/system-linux-m68k.ads: Likewise.
* libgnat/system-linux-mips.ads: Likewise.
* libgnat/system-linux-ppc.ads: Likewise.
* libgnat/system-linux-riscv.ads: Likewise.
* libgnat/system-linux-s390.ads: Likewise.
* libgnat/system-linux-sh4.ads: Likewise.
* libgnat/system-linux-sparc.ads: Likewise.
* libgnat/system-linux-x86.ads: Likewise.
* libgnat/system-lynxos178-ppc.ads: Likewise.
* libgnat/system-lynxos178-x86.ads: Likewise.
* libgnat/system-mingw.ads: Likewise.
* libgnat/system-qnx-aarch64.ads: Likewise.
* libgnat/system-rtems.ads: Likewise.
* libgnat/system-solaris-sparc.ads: Likewise.
* libgnat/system-solaris-x86.ads: Likewise.
* libgnat/system-vxworks-arm-rtp-smp.ads: Likewise.
* libgnat/system-vxworks-arm-rtp.ads: Likewise.
* libgnat/system-vxworks-arm.ads: Likewise.
* libgnat/system-vxworks-e500-kernel.ads: Likewise.
* libgnat/system-vxworks-e500-rtp-smp.ads: Likewise.
* libgnat/system-vxworks-e500-rtp.ads: Likewise.
* libgnat/system-vxworks-e500-vthread.ads: Likewise.
* libgnat/system-vxworks-ppc-kernel.ads: Likewise.
* libgnat/system-vxworks-ppc-ravenscar.ads: Likewise.
* libgnat/system-vxworks-ppc-rtp-smp.ads: Likewise.
* libgnat/system-vxworks-ppc-rtp.ads: Likewise.
* libgnat/system-vxworks-ppc-vthread.ads: Likewise.
* libgnat/system-vxworks-ppc.ads: Likewise.
* libgnat/system-vxworks-x86-kernel.ads: Likewise.
* libgnat/system-vxworks-x86-rtp-smp.ads: Likewise.
* libgnat/system-vxworks-x86-rtp.ads: Likewise.
* libgnat/system-vxworks-x86-vthread.ads: Likewise.
* libgnat/system-vxworks-x86.ads: Likewise.
* libgnat/system-vxworks7-aarch64-rtp-smp.ads: Likewise.
* libgnat/system-vxworks7-aarch64.ads: Likewise.
* libgnat/system-vxworks7-arm-rtp-smp.ads: Likewise.
* libgnat/system-vxworks7-arm.ads: Likewise.
* libgnat/system-vxworks7-e500-kernel.ads: Likewise.
* libgnat/system-vxworks7-e500-rtp-smp.ads: Likewise.
* libgnat/system-vxworks7-e500-rtp.ads: Likewise.
* libgnat/system-vxworks7-ppc-kernel.ads: Likewise.
* libgnat/system-vxworks7-ppc-rtp-smp.ads: Likewise.
* libgnat/system-vxworks7-ppc-rtp.ads: Likewise.
* libgnat/system-vxworks7-ppc64-kernel.ads: Likewise.
* libgnat/system-vxworks7-ppc64-rtp-smp.ads: Likewise.
* libgnat/system-vxworks7-x86-kernel.ads: Likewise.
* libgnat/system-vxworks7-x86-rtp-smp.ads: Likewise.
* libgnat/system-vxworks7-x86-rtp.ads: Likewise.
* libgnat/system-vxworks7-x86_64-kernel.ads: Likewise.
* libgnat/system-vxworks7-x86_64-rtp-smp.ads: Likewise.
* repinfo.adb: Likewise.
* repinfo.ads: Likewise.
* rtsfind.ads: Likewise.
* sem_aux.adb: Likewise.
* sem_aux.ads: Likewise.
* sem_ch13.adb: Likewise.
* sem_ch13.ads: Likewise.
* sem_util.adb (Validity_Checks_Suppressed, TSS,
Is_All_Null_Statements, Known_Non_Negative,
Non_Limited_Designated_Type, Get_Binary_Nkind, Get_Unary_Nkind,
Is_Protected_Operation, Number_Components, Package_Body,
Validate_Independence, Independence_Checks): Likewise; update
comments.
* targparm.adb: Likewise.
* targparm.ads (AAM, AAM_Str, Fractional_Fixed_Ops,
Frontend_Layout, Make_Detach_Call, Target_Has_Fixed_Ops, Detach,
Back_End_Layout, Create_Dynamic_SO_Ref, Get_Dynamic_SO_Entity,
Is_Dynamic_SO_Ref, Is_Static_SO_Ref,
Fractional_Fixed_Ops_On_Target): Likewise.
* validsw.adb (Save_Validity_Check_Options,
Set_Default_Validity_Check_Options): Likewise.
* validsw.ads: Likewise.
|
|
gcc/ada/
* symbols.ads, symbols.adb: Remove, no longer used.
|
|
gcc/ada/
* sem_util.adb (New_Requires_Transient_Scope): Renamed to
Requires_Transient_Scope.
(Requires_Transient_Scope, Old_Requires_Transient_Scope,
Results_Differ): Removed.
* debug.adb: Remove -gnatdQ.
|
|
gcc/ada/
* libgnat/s-valrea.adb (Need_Extra): Fix comment.
|
|
gcc/ada/
* sem_ch5.adb (Analyze_Case_Statement): Move modification of
Unblocked_Exit_Count after early return statements; fix typo in
comment.
|
|
gcc/ada/
* sem_ch5.adb (Analyze_Case_Statement): Change local variable
Exp to constant; remove unreferenced Last_Choice variable;
reduce scope of other variables.
(Analyze_If_Statement): Reduce scope of a local variable; add
comment.
|
|
gcc/ada/
* opt.ads (Multiple_Unit_Index): Refine type from Int to Nat.
|
|
gcc/ada/
* sem_util.adb (In_Check_Node): Add guard and rename Node to
Par, just like it is done in surrounding routines, e.g.
In_Assertion_Expression_Pragma and In_Generic_Formal_Package.
|
|
gcc/ada/
* libgnat/a-cbdlli.adb, libgnat/a-cbdlli.ads,
libgnat/a-cdlili.adb, libgnat/a-cdlili.ads,
libgnat/a-cidlli.adb, libgnat/a-cidlli.ads,
libgnat/a-cobove.adb, libgnat/a-cobove.ads,
libgnat/a-coinve.adb, libgnat/a-coinve.ads,
libgnat/a-convec.adb, libgnat/a-convec.ads: Add *_Vector
operations, remove default for Count, rename Append_One to be
Append.
|
|
gcc/ada/
* sem_res.adb (Resolve_Declare_Expression): Need to establish a
transient scope in case Expression (N) requires actions to be
wrapped. Code cleanup.
* exp_ch7.adb, exp_ch11.adb: Code cleanup.
|
|
gcc/ada/
* par-ch3.adb (P_Identifier_Declarations): Reuse
Error_Msg_Ada_2020_Feature for object renaming without subtype.
* par-ch4.adb (P_Primary): Likewise for target name.
(P_Iterated_Component_Association): Likewise for iterated
component.
(P_Declare_Expression): Likewise for declare expression.
* par-ch6.adb (P_Formal_Part): Likewise for aspect on formal
parameter.
* sem_aggr.adb (Resolve_Delta_Aggregate): Ditto.
* sem_ch8.adb (Analyze_Object_Renaming): Reuse
Error_Msg_Ada_2020_Feature.
* sem_ch13.adb (Validate_Aspect_Aggregate): Reuse
Error_Msg_Ada_2020_Feature; use lower case for "aspect" and
don't use underscore for "Ada_2020"; don't give up on analysis
in Ada 2012 mode.
(Validate_Aspect_Stable_Properties): Reuse
Error_Msg_Ada_2020_Feature; use lower case for "aspect"; minor
style fixes.
|
|
gcc/ada/
* sem_ch4.adb (Analyze_Selected_Component): Request a compile
time error replacement in Apply_Compile_Time_Constraint_Error
in case of an invalid field.
* sem_ch3.adb (Create_Constrained_Components): Take advantage of
Gather_Components also in the case of a record extension and
also constrain records in the case of compile time known discriminant
values, as already done in gigi.
* sem_util.ads, sem_util.adb (Gather_Components): New parameter
Allow_Compile_Time to allow compile time known (but non static)
discriminant values, needed by Create_Constrained_Components,
and new parameter Include_Interface_Tag.
(Is_Dependent_Component_Of_Mutable_Object): Use Original_Node to
perform check on the original tree.
(Is_Object_Reference): Likewise. Only call Original_Node when
relevant via a new function Safe_Prefix.
(Is_Static_Discriminant_Component, In_Check_Node): New.
(Is_Actual_Out_Or_In_Out_Parameter): New.
* exp_ch4.adb (Expand_N_Selected_Component): Remove no longer needed
code that prevented static evaluation of discriminants in more cases.
* exp_ch5.adb (Expand_N_Loop_Statement): Simplify expansion of loops
with an N_Raise_xxx_Error node to avoid confusing the code generator.
(Make_Component_List_Assign): Try to find a constrained type to
extract discriminant values from, so that the case statement
built gets an opportunity to be folded by
Expand_N_Case_Statement.
(Expand_Assign_Record): Update comments, code cleanups.
* sem_attr.adb (Analyze_Attribute): Perform most of the analysis
on the original prefix node to deal properly with a prefix rewritten
as a N_Raise_xxx_Error.
* sem_ch5.adb (Analyze_Loop_Parameter_Specification): Handle properly
a discrete subtype definition being rewritten as N_Raise_xxx_Error.
* sem_ch8.adb (Analyze_Object_Renaming): Handle N_Raise_xxx_Error
nodes as part of the expression being renamed.
* sem_eval.ads, sem_eval.adb (Fold, Eval_Selected_Component): New.
(Compile_Time_Known_Value, Expr_Value, Expr_Rep_Value): Evaluate
static discriminant component values.
* sem_res.adb (Resolve_Selected_Component): Call
Eval_Selected_Component.
|
|
gcc/ada/
* exp_ch4.adb (Expand_N_Unchecked_Type_Conversion): Remove
folding of discrete values.
* exp_intr.adb (Expand_Unc_Conversion): Analyze, resolve and
evaluate (if possible) calls to instances of
Ada.Unchecked_Conversion after they have been expanded into
N_Unchecked_Type_Conversion.
* sem_eval.adb (Eval_Unchecked_Conversion): Add folding of
discrete values.
|
|
gcc/ada/
* Makefile.rtl (GNATRTL_NONTASKING_OBJS): Likewise.
* exp_imgv.adb (Expand_Value_Attribute): Use RE_Value_Long_Float in
lieu of RE_Value_Long_Long_Float as fallback for fixed-point types.
Also use it for Long_Long_Float if it has same size as Long_Float.
* libgnat/s-imgrea.adb: Replace Powten_Table with Powten_LLF.
* libgnat/s-powflt.ads: New file.
* libgnat/s-powlfl.ads: Likewise.
* libgnat/s-powtab.ads: Rename to...
* libgnat/s-powllf.ads: ...this.
* libgnat/s-valflt.ads: Add with clause for System.Powten_Flt and
pass its table as actual parameter to System.Val_Real.
* libgnat/s-vallfl.ads: Likewise for System.Powten_LFlt.
* libgnat/s-valllf.ads: Likewise for System.Powten_LLF.
* libgnat/s-valrea.ads: Add Maxpow and Powten_Address parameters.
* libgnat/s-valrea.adb: Add pragma Warnings (Off).
(Need_Extra): New boolean constant.
(Precision_Limit): Set it according to Need_Extra.
(Impl): Adjust actual parameter.
(Integer_to_Rea): Add assertion on the machine radix. Take into
account the extra digit only if Need_Extra is true. Reimplement
the computation of the final value for bases 2, 4, 8, 10 and 16.
* libgnat/s-valued.adb (Impl): Adjust actual parameter.
(Scan_Decimal): Add pragma Unreferenced.
(Value_Decimal): Likewise.
* libgnat/s-valuef.adb (Impl): Adjust actual parameter.
* libgnat/s-valuer.ads (Floating): Remove.
(Round): New formal parameter.
* libgnat/s-valuer.adb (Round_Extra): New procedure.
(Scan_Decimal_Digits): Use it to round the extra digit if Round
is set to True in the instantiation.
(Scan_Integral_Digits): Likewise.
|
|
gcc/ada/
* libgnat/system-lynxos178-ppc.ads,
libgnat/system-lynxos178-x86.ads: Fix small typo in comments.
|
|
gcc/ada/
* exp_dbug.adb (Get_Encoded_Name): Generate encodings for fixed
point types only if -fgnat-encodings=all is specified.
|
|
gcc/ada/
* checks.adb (Build_Discriminant_Checks): Add condition to
replace references to the current instance of the type when we
are within an Init_Proc.
(Replace_Current_Instance): Examine a given node and replace the
current instance of the type with the corresponding _init
formal.
(Search_And_Replace_Current_Instance): Traverse proc which calls
Replace_Current_Instance in order to replace all references
within a given expression.
|
|
gcc/ada/
* par-ch12.adb (P_Formal_Derived_Type_Definition): Complain
about formal type with aspect specification, which only becomes
legal in Ada 2020.
* par-ch9.adb (P_Protected_Operation_Declaration_Opt): Reuse
Error_Msg_Ada_2005_Extension.
(P_Entry_Declaration): Likewise.
* scng.adb (Scan): Improve diagnostics for target_name; emit an
error, but otherwise continue, in modes earlier than Ada 2020.
|
|
gcc/ada/
* libgnat/a-cbsyqu.ads (Implementation): Provide a box
initialization for the element array used internally to
represent the queue, so that its components are properly
initialized if the given element type has default
initialization. Suppress warnings on the rest of the package in
case the element type has no default or discriminant, because it
is bound to be confusing to the user.
|
|
gcc/ada/
* sem_util.adb (Inherit_Predicate_Flags): No-op before Ada 2012.
|
|
gcc/ada/
* exp_ch7.adb (Make_Final_Call, Make_Init_Call): Take protected
types into account.
* sem_util.ads: Fix typo.
|
|
gcc/ada/
* checks.adb: Rework error messages.
* exp_ch3.adb: Likewise.
* freeze.adb: Likewise.
* lib-load.adb: Likewise.
* par-ch12.adb: Likewise.
* par-ch3.adb: Likewise.
* par-ch4.adb: Likewise.
* par-ch9.adb: Likewise.
* sem_aggr.adb: Likewise.
* sem_attr.adb: Likewise.
* sem_cat.adb: Likewise.
* sem_ch10.adb: Likewise.
* sem_ch12.adb: Likewise.
(Instantiate_Type): Fix CODEFIX comment, applicable only on
continuation message, and identify the second message as a
continuation.
* sem_ch13.adb: Rework error messages.
* sem_ch3.adb: Likewise.
* sem_ch4.adb: Likewise.
* sem_ch5.adb: Likewise.
* sem_ch6.adb: Likewise.
* sem_ch8.adb: Likewise.
* sem_ch9.adb: Likewise.
* sem_prag.adb: Likewise.
* sem_res.adb: Likewise.
* sem_util.adb: Likewise.
(Wrong_Type): Fix CODEFIX comment, applicable only on
continuation message, and identify the second message as a
continuation.
* symbols.adb: Rework error messages.
gcc/testsuite/
* gnat.dg/interface6.adb, gnat.dg/not_null.adb,
gnat.dg/protected_func.adb: Adjust error messages.
|
|
gcc/ada/
* sem_attr.adb (OK_Self_Reference): Return True if node does not
come from source (e.g. a rewritten aggregate).
|
|
gcc/ada/
* sem_ch13.adb (Parse_Aspect_Stable_Properties): Fix style;
limit the scope of local variables; remove extra assignment in
Extract_Entity.
(Validate_Aspect_Stable_Properties): Simplify with procedural
Next.
|
|
When cross-compiling GCC with target libc headers available and
configure option --enable-s390-excess-float-precision has been omitted,
identify whether they clamp float_t to double or respect
__FLT_EVAL_METHOD__ via a compile test that coerces the build-system
compiler to use the target headers. Then derive the setting from that.
gcc/ChangeLog:
2020-12-16 Marius Hillenbrand <mhillen@linux.ibm.com>
* configure.ac: Change --enable-s390-excess-float-precision
default behavior for cross compiles with target headers.
* configure: Regenerate.
* doc/install.texi: Adjust documentation.
|
|
gcc/fortran/ChangeLog:
PR fortran/92587
* match.c (gfc_match_assignment): Move gfc_find_vtab call from here ...
* resolve.c (gfc_resolve_code): ... to here.
gcc/testsuite/ChangeLog:
PR fortran/92587
* gfortran.dg/finalize_37.f90: New test.
|
|
The dependency check for FORALL constructs already handled pointer
components to derived types, but missed allocatables. Fix that.
gcc/fortran/ChangeLog:
PR fortran/98307
* trans-stmt.c (check_forall_dependencies): Extend dependency
check to allocatable components of derived types.
gcc/testsuite/ChangeLog:
PR fortran/98307
* gfortran.dg/forall_19.f90: New test.
|
|
2020-12-16 Takayuki 'January June' Suwa <jjsuwa_sys3175@yahoo.co.jp>
gcc/
* config/xtensa/xtensa.md (*ashlsi3_1, *ashlsi3_3x, *ashrsi3_3x)
(*lshrsi3_3x): New patterns.
gcc/testsuite/
* gcc.target/xtensa/shifts.c: New test.
|
|
This patch rewrites fwprop.c to use the RTL SSA framework. It tries
as far as possible to mimic the old behaviour, even in cases where
that doesn't fit naturally with the new framework. I've added ???
comments to mark those places, but I think “fixing” them should
be done separately to make bisection easier.
In particular:
* The old implementation iterated over uses, and after a successful
substitution, the new insn's uses were added to the end of the list.
The pass still processed those uses, but because it processed them at
the end, it didn't fully optimise one instruction before propagating
it into the next.
The new version follows the same approach for comparison purposes,
but I'd like to drop that as a follow-on patch.
* The old implementation operated on single use sites (DF_REF_LOCs).
This doesn't work well for instructions with match_dups, where it's
necessary to update both an operand and its dups at the same time.
For example, attempting to substitute into a divmod instruction would
fail because only the div or the mod side would be updated.
The new version again follows this to some extent for comparison
purposes (although not exactly). Again I'd like to drop it as a
follow-on patch.
One difference is that if a register occurs in multiple MEM addresses
in a set, the new version will try to update them all at once. This is
what causes the SVE ACLE st4* output to improve.
Also, the old version didn't naturally guarantee termination (PR79405),
whereas the new one does.
gcc/
* fwprop.c: Rewrite to use the RTL SSA framework.
gcc/testsuite/
* gcc.dg/rtl/x86_64/test-return-const.c.before-fwprop.c: Don't
expect insn updates to be deferred.
* gcc.target/aarch64/sve/acle/asm/st4_s8.c: Expect the addition
to be folded into the address.
* gcc.target/aarch64/sve/acle/asm/st4_u8.c: Likewise.
|
|
This patch adds the RTL SSA infrastructure itself. The following
fwprop.c patch will make use of it.
gcc/
* configure.ac: Add rtl-ssa to the list of dependence directories.
* configure: Regenerate.
* Makefile.in (rtl-ssa-warn): New variable.
(OBJS): Add the rtl-ssa object files.
* emit-rtl.h (rtl_data::ssa): New field.
* rtl-ssa.h: New file.
* system.h: Include <functional> when INCLUDE_FUNCTIONAL is defined.
* rtl-ssa/access-utils.h: New file.
* rtl-ssa/accesses.h: New file.
* rtl-ssa/accesses.cc: Likewise.
* rtl-ssa/blocks.h: New file.
* rtl-ssa/blocks.cc: Likewise.
* rtl-ssa/change-utils.h: Likewise.
* rtl-ssa/changes.h: New file.
* rtl-ssa/changes.cc: Likewise.
* rtl-ssa/functions.h: New file.
* rtl-ssa/functions.cc: Likewise.
* rtl-ssa/insn-utils.h: Likewise.
* rtl-ssa/insns.h: New file.
* rtl-ssa/insns.cc: Likewise.
* rtl-ssa/internals.inl: Likewise.
* rtl-ssa/is-a.inl: Likewise.
* rtl-ssa/member-fns.inl: Likewise.
* rtl-ssa/movement.h: Likewise.
|
|
This patch adds some documentation to rtl.texi about the SSA form.
It only really describes the high-level structure -- I think for
API-level stuff it's better to rely on function comments instead.
gcc/
* doc/rtl.texi (RTL SSA): New node.
|
|
This patch adds a routine for finding a “simple” SET for a register
definition. See the comment in the patch for details.
gcc/
* rtl.h (simple_regno_set): Declare.
* rtlanal.c (simple_regno_set): New function.
|
|
This patch adds some classes for gathering the list of registers
and memory that are read and written by an instruction, along
with various properties about the accesses. In some ways it's
similar to the information that DF collects for registers,
but extended to memory. The main reason for using it instead
of DF is that it can analyse tentative changes to instructions
before they've been committed.
The classes also collect general information about the instruction,
since it's cheap to do and helps to avoid multiple walks of the same
RTL pattern.
I've tried to optimise the code quite a bit, since with later patches
it becomes relatively performance-sensitive. See the discussion in
the comments for the trade-offs involved.
I put the declarations in a new rtlanal.h header file since it
seemed a bit excessive to put so much new inline stuff in rtl.h.
gcc/
* rtlanal.h: New file.
(MEM_REGNO): New constant.
(rtx_obj_flags): New namespace.
(rtx_obj_reference, rtx_properties): New classes.
(growing_rtx_properties, vec_rtx_properties_base): Likewise.
(vec_rtx_properties): New alias.
* rtlanal.c: Include it.
(rtx_properties::try_to_add_reg): New function.
(rtx_properties::try_to_add_dest): Likewise.
(rtx_properties::try_to_add_src): Likewise.
(rtx_properties::try_to_add_pattern): Likewise.
(rtx_properties::try_to_add_insn): Likewise.
(vec_rtx_properties_base::grow): Likewise.
|
|
When using validate_change to make a group of changes, you have
to remember to cancel them if something goes wrong. This patch
adds an RAII class to make that easier. See the comments in the
patch for details and examples.
gcc/
* recog.h (insn_change_watermark): New class.
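The idea of the RAII class can be shown with a self-contained toy. `change_log` stands in for the validate_change machinery and `change_watermark` for insn_change_watermark. Both are hypothetical reimplementations over plain `int`s rather than rtx operands, and the `keep ()` method is an assumption about how a caller would commit the changes.

```cpp
#include <cstddef>
#include <vector>

// Toy change log: each entry remembers a location and the value it held
// before the change.
struct change_log
{
  struct change
  {
    int *loc;
    int old_value;
  };
  std::vector<change> changes;

  void change_value (int *loc, int new_value)
  {
    changes.push_back ({loc, *loc});
    *loc = new_value;
  }

  std::size_t num_changes () const { return changes.size (); }

  // Undo every change made after the first NUM changes, newest first.
  void cancel_changes (std::size_t num)
  {
    while (changes.size () > num)
      {
        *changes.back ().loc = changes.back ().old_value;
        changes.pop_back ();
      }
  }
};

// RAII guard: on destruction it cancels any changes made since it was
// constructed, unless keep () has been called.
class change_watermark
{
public:
  explicit change_watermark (change_log &log)
    : m_log (log), m_old_num (log.num_changes ()), m_keep (false) {}

  ~change_watermark ()
  {
    if (!m_keep)
      m_log.cancel_changes (m_old_num);
  }

  void keep () { m_keep = true; }

  change_watermark (const change_watermark &) = delete;
  change_watermark &operator= (const change_watermark &) = delete;

private:
  change_log &m_log;
  std::size_t m_old_num;
  bool m_keep;
};
```

A caller can then make a series of tentative changes, return early on any failure without explicit cleanup, and call `keep ()` only once everything has validated.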
|
|
This patch adds yet another way of propagating into an instruction and
simplifying the result. (The net effect of the series is to keep the
total number of propagation approaches the same though, since a later
patch removes the fwprop.c routines.)
One of the drawbacks of the validate_replace_* routines is that
they only do simple simplifications, mostly canonicalisations:
  /* Do changes needed to keep rtx consistent. Don't do any other
     simplifications, as it is not our job. */
  if (simplify)
    simplify_while_replacing (loc, to, object, op0_mode);
But substituting can often lead to real simplification opportunities.
simplify-rtx.c:simplify_replace_rtx does fully simplify the result,
but it only operates on specific rvalues rather than full instruction
patterns. It is also nondestructive, which means that it returns a
new rtx whenever a substitution or simplification was possible.
This can create quite a bit of garbage rtl in the context of a
speculative recog, where changing the contents of a pointer is
often enough.
The new routines are therefore supposed to provide simplify_replace_rtx-
style substitution in recog. They go to some effort to prevent garbage
rtl from being created.
At the moment, the new routines fail if the pattern would still refer
to the old "from" value in some way. That might be unnecessary in
some contexts; if so, it could be put behind a configuration parameter.
gcc/
* recog.h (insn_propagation): New class.
* recog.c (insn_propagation::apply_to_mem_1): New function.
(insn_propagation::apply_to_rvalue_1): Likewise.
(insn_propagation::apply_to_lvalue_1): Likewise.
(insn_propagation::apply_to_pattern_1): Likewise.
(insn_propagation::apply_to_pattern): Likewise.
(insn_propagation::apply_to_rvalue): Likewise.
|
|
In some cases, it can be convenient to roll back the changes that
have been made by validate_change to see how things looked before,
then reroll the changes. For example, this makes it possible
to defer calculating the cost of an instruction until we know that
the result is actually needed. It can also make dumps easier to read.
This patch adds a couple of helper functions for doing that.
gcc/
* recog.h (temporarily_undo_changes, redo_changes): Declare.
* recog.c (temporarily_undone_changes): New variable.
(validate_change_1, confirm_change_group): Check that it's zero.
(cancel_changes): Likewise.
(swap_change, temporarily_undo_changes): New functions.
(redo_changes): Likewise.
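One way to undo and redo without losing either version is to store, for each change, the location plus the value that is *not* currently installed, and then swap the two. The toy below is hypothetical and models changes to plain `int`s rather than rtx operands. Undo runs newest-first and redo runs oldest-first, so that repeated changes to the same location unwind correctly. The `temporarily_undone` counter mirrors the idea of refusing new changes while some are rolled back.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Toy change group.  Each entry records a location and the value that is
// not currently stored there: the old value while the change is applied,
// the new value while it is temporarily undone.
struct change_group
{
  struct change
  {
    int *loc;
    int other;
  };
  std::vector<change> changes;
  // Number of changes currently rolled back; new changes are only allowed
  // while this is zero.
  std::size_t temporarily_undone = 0;

  void change_value (int *loc, int new_value)
  {
    assert (temporarily_undone == 0);
    changes.push_back ({loc, *loc});
    *loc = new_value;
  }

  // Exchange the installed value with the stored one.
  static void swap_change (change &c) { std::swap (*c.loc, c.other); }

  // Roll back changes NUM onwards, newest first, without discarding them.
  void temporarily_undo_changes (std::size_t num)
  {
    assert (temporarily_undone == 0 && num <= changes.size ());
    for (std::size_t i = changes.size (); i-- > num;)
      swap_change (changes[i]);
    temporarily_undone = changes.size () - num;
  }

  // Reapply the changes rolled back above, oldest first.
  void redo_changes (std::size_t num)
  {
    assert (temporarily_undone == changes.size () - num);
    for (std::size_t i = num; i < changes.size (); ++i)
      swap_change (changes[i]);
    temporarily_undone = 0;
  }
};
```

For example, a caller could temporarily undo the group to cost the original instruction and then redo it, which is how the cost calculation can be deferred.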
|
|
A later patch wants to be able to use the validate_change machinery
to reduce the XVECLEN of a PARALLEL. This should be more efficient
than allocating a separate PARALLEL at a possibly distant memory
location, especially since the new PARALLEL would be garbage rtl if
the new pattern turns out not to match. Combine already pulls this
trick with SUBST_INT.
This patch adds a general helper for doing that.
gcc/
* recog.h (validate_change_xveclen): Declare.
* recog.c (change_t::old_len): New field.
(validate_change_1): Add a new_len parameter. Conditionally
replace the XVECLEN of an rtx, avoiding single-element PARALLELs.
(validate_change_xveclen): New function.
(cancel_changes): Undo changes made by validate_change_xveclen.
|
|
One of the recurring warts of RTL is that multiplication by a power
of 2 is represented as a MULT inside a MEM but as an ASHIFT outside
a MEM. It would obviously be better if we didn't have this kind of
context sensitivity, but it would be difficult to remove.
Currently the simplify-rtx.c routines are hard-coded for the
ASHIFT form. This means that some callers have to convert the
ASHIFTs “back” into MULTs after calling the simplify-rtx.c
routines; see fwprop.c:canonicalize_address for an example.
I think we can relieve some of the pain by wrapping the simplify-rtx.c
routines in a simple class that tracks whether the expression occurs
in a MEM or not, so that no post-processing is needed.
An obvious concern is whether passing the “this” pointer around
will slow things down or bloat the code. I can't measure any
increase in compile time after applying the patch. Sizewise,
simplify-rtx.o text increases by 2.3% in default-checking builds
and 4.1% in release-checking builds.
I realise the MULT/ASHIFT thing isn't the most palatable
reason for doing this, but I think it might be useful for
other things in future, such as using local nonzero_bits
hooks/virtual functions instead of the global hooks.
The obvious alternative would be to add a static variable
and hope that it is always updated correctly.
Later patches make use of this.
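The following toy mimics a context object that knows whether it is inside a MEM, so that callers get the canonical MULT or ASHIFT form without post-processing. It is a hypothetical illustration: expressions are plain strings and only multiplication of a register by a constant is handled. The real simplify_context operates on rtx and covers all the simplify_* routines listed below.

```cpp
#include <string>

// Hypothetical stand-in for a simplification context.  Only the MEM-depth
// tracking is modelled; expressions are represented as strings.
class toy_simplify_context
{
public:
  // How many MEMs enclose the expression currently being built.
  unsigned int mem_depth = 0;

  // Build REG * FACTOR in canonical form: a power of 2 greater than 1
  // becomes an ASHIFT outside a MEM but stays a MULT inside one.
  std::string gen_mult (const std::string &reg, unsigned long factor) const
  {
    int shift = exact_log2 (factor);
    if (shift > 0 && mem_depth == 0)
      return "(ashift " + reg + " " + std::to_string (shift) + ")";
    return "(mult " + reg + " " + std::to_string (factor) + ")";
  }

  // Build a MEM whose address is REG * FACTOR, bumping mem_depth while
  // the address is being simplified.
  std::string gen_mem_scaled (const std::string &reg, unsigned long factor)
  {
    ++mem_depth;
    std::string addr = gen_mult (reg, factor);
    --mem_depth;
    return "(mem " + addr + ")";
  }

private:
  // Return log2 (X) if X is a power of 2, otherwise -1.
  static int exact_log2 (unsigned long x)
  {
    if (x == 0 || (x & (x - 1)) != 0)
      return -1;
    int n = 0;
    while (x >>= 1)
      ++n;
    return n;
  }
};
```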
gcc/
* rtl.h (simplify_context): New class.
(simplify_unary_operation, simplify_binary_operation): Use it.
(simplify_ternary_operation, simplify_relational_operation): Likewise.
(simplify_subreg, simplify_gen_unary, simplify_gen_binary): Likewise.
(simplify_gen_ternary, simplify_gen_relational): Likewise.
(simplify_gen_subreg, lowpart_subreg): Likewise.
* simplify-rtx.c (simplify_gen_binary): Turn into a member function
of simplify_context.
(simplify_gen_unary, simplify_gen_ternary, simplify_gen_relational)
(simplify_truncation, simplify_unary_operation): Likewise.
(simplify_unary_operation_1, simplify_byte_swapping_operation)
(simplify_associative_operation, simplify_logical_relational_operation)
(simplify_binary_operation, simplify_binary_operation_series)
(simplify_distributive_operation, simplify_plus_minus): Likewise.
(simplify_relational_operation, simplify_relational_operation_1)
(simplify_cond_clz_ctz, simplify_merge_mask): Likewise.
(simplify_ternary_operation, simplify_subreg, simplify_gen_subreg)
(lowpart_subreg): Likewise.
(simplify_binary_operation_1): Likewise. Test mem_depth when
deciding whether the ASHIFT or MULT form is canonical.
(simplify_merge_mask): Use simplify_context.
|
|
verify_changes has a test for whether a particular hard register
is a user-defined register asm. A later patch needs to test the
same thing, so this patch splits it out into a helper.
gcc/
* rtl.h (register_asm_p): Declare.
* recog.c (verify_changes): Split out the test for whether
a hard register is a register asm to...
* rtlanal.c (register_asm_p): ...this new function.
|