|
There is no need to guard against routine Contains being called on
No_Elist, because it will return False. Code cleanup related to handling
of primitive operations in GNATprove; semantics is unaffected.
gcc/ada/
* sem_prag.adb (Record_Possible_Body_Reference): Remove call to Present.
* sem_util.adb (Find_Untagged_Type_Of): Likewise.
|
|
The handling of finalization is delicate during the expansion of aggregates
since the generated assignments must not cause the finalization of the RHS.
That's why the No_Ctrl_Actions flag is set on them and the adjustments are
generated manually.
This was not done in the case of an array of arrays with controlled components
when its subaggregates are not expanded in place but instead are replaced by
temporaries, leading to double free or memory corruption.
gcc/ada/
* exp_aggr.adb (Initialize_Array_Component): Remove obsolete code.
(Expand_Array_Aggregate): In the case where a temporary is created
and the parent is an assignment statement with No_Ctrl_Actions set,
set Is_Ignored_Transient on the temporary.
|
|
The problem is that the ghost mode of the instance is used to analyze the
parent of the generic body, whose own ghost mode has nothing to do with it.
gcc/ada/
* sem_ch12.adb (Instantiate_Package_Body): Set the ghost mode to
that of the instance only after loading the generic's parent.
(Instantiate_Subprogram_Body): Likewise.
|
|
Code cleanup; semantics is unaffected.
gcc/ada/
* exp_ch4.adb (Expand_Set_Membership): Simplify by using Evolve_Or_Else.
|
|
Code cleanup; semantics is unaffected.
gcc/ada/
* exp_ch4.adb (Is_OK_Object_Reference): Replace loop with a call to
Unqual_Conv; consequently, change object from variable to constant;
replace an IF statement with an AND THEN expression.
|
|
We used to count protected entries by iterating over component
declarations, but then switched to iterating over entities and
left some code that is no longer needed. Cleanup; semantics is
unaffected (except possibly for fixing an assertion failure in developer
builds when there is a pragma among entry family declarations).
gcc/ada/
* exp_ch9.adb
(Build_Entry_Count_Expression): Remove loop over component declaration;
consequently remove a parameter that is no longer used; adapt callers.
(Make_Task_Create_Call): Refine type of a local variable.
|
|
When iterating over record components we must ignore pragmas.
Minor bug, as pragmas within record components do not appear often.
gcc/ada/
* sem_cat.adb (Check_Non_Static_Default_Expr): Detect components inside
loop, not in the loop condition itself.
|
|
gcc/ada/
* libgnat/a-cbdlli.ads (List): Move Nodes component to the end.
|
|
An instantiation of the package compiled with -gnatw.q yields:
warning: in instantiation at a-crdlli.ads:317 [-gnatw.q]
warning: record layout may cause performance issues [-gnatw.q]
warning: in instantiation at a-crdlli.ads:317 [-gnatw.q]
warning: component "Nodes" whose length depends on a discriminant [-gnatw.q]
warning: in instantiation at a-crdlli.ads:317 [-gnatw.q]
warning: comes too early and was moved down [-gnatw.q]
gcc/ada/
* libgnat/a-crdlli.ads (List): Move Nodes component to the end.
|
|
This rejects the Unrestricted_Access attribute applied to an aliased array
with a constrained nominal subtype when its type is resolved to be a thin
pointer. The reason is that supporting this case would require the aliased
array to contain its bounds, and this is the case only for aliased arrays
whose nominal subtype is unconstrained.
gcc/ada/
* sem_attr.adb (Is_Thin_Pointer_To_Unc_Array): New predicate.
(Resolve_Attribute): Apply the static matching legality rule to an
Unrestricted_Access attribute applied to an aliased prefix if the
type is a thin pointer. Call Is_Thin_Pointer_To_Unc_Array for the
aliasing legality rule as well.
|
|
Code cleanup; semantics is unaffected.
gcc/ada/
* sem_util.adb (Is_Null_Record_Definition): Use First_Non_Pragma and
Next_Non_Pragma to ignore pragmas within component list.
|
|
Routine Get_Argument works differently for generic units (as explained
in its comment), but it failed to reliably detect such units when their
kind is temporarily made non-generic (for resolving recursive calls, as
explained in the comment at the end of Is_Generic_Declaration_Or_Body).
With this patch the frontend will look at the decorated expression of
the Global contract attached to the Global aspect; previously it was
looking at the undecorated expression attached to the corresponding
pragma.
gcc/ada/
* sem_prag.adb (Get_Argument): Improve detection of generic units.
|
|
Code cleanup; semantics is unaffected.
gcc/ada/
* sem_ch4.adb (Check_Action_OK): Replace low-level test with a
high-level routine.
* sem_ch13.adb (Is_Predicate_Static): Likewise.
|
|
gcc/ada/
* exp_ch9.adb
(Expand_N_Conditional_Entry_Call): Factorize code to avoid
duplicating subtrees; required to avoid problems when the copied
code has implicit labels.
* sem_util.ads (New_Copy_Separate_List): Removed.
(New_Copy_Separate_Tree): Removed.
* sem_util.adb (New_Copy_Separate_List): Removed.
(New_Copy_Separate_Tree): Removed.
|
|
Calls to Length on No_List intentionally return 0, so explicit guards
against No_List are unnecessary. Code cleanup; semantics is unaffected.
gcc/ada/
* sem_ch13.adb (Check_Component_List): Local variable Compl is now
a constant; a nested block is no longer needed.
|
|
Assorted cleanups related to recent fixes of aggregate handling for
GNATprove; semantics is unaffected.
gcc/ada/
* sem_aggr.adb
(Resolve_Record_Aggregate): Remove useless assignment.
* sem_aux.adb
(Has_Variant_Part): Remove useless guard; this routine is only called
on type entities (and now will crash in other cases).
* sem_ch3.adb
(Create_Constrained_Components): Only assign Assoc_List when necessary;
tune whitespace.
(Is_Variant_Record): Refactor repeated calls to Parent.
* sem_util.adb
(Gather_Components): Assert that discriminant association has just one
choice in component_association; refactor repeated calls to Next.
* sem_util.ads
(Gather_Components): Tune whitespace in comment.
|
|
Component items in a record declaration might include pragmas, which
must be ignored when detecting components with default expressions.
More a code cleanup than a bugfix, as it only affects artificial corner
cases. Found while fixing missing legality checks for variant component
declarations.
gcc/ada/
* sem_ch3.adb (Check_CPP_Type_Has_No_Defaults): Iterate with
First_Non_Pragma and Next_Non_Pragma.
* exp_dist.adb (Append_Record_Traversal): Likewise.
|
|
gcc/ada/
* exp_ch9.adb (Build_Class_Wide_Master): Remember internal blocks
that have a task master entity declaration.
(Build_Master_Entity): Code cleanup.
* sem_util.ads (Is_Internal_Block): New subprogram.
* sem_util.adb (Is_Internal_Block): New subprogram.
|
|
A call to First on an empty list intentionally returns Empty.
gcc/ada/
* sem_util.adb (Gather_Components): Remove guard for empty list of
components.
|
|
It breaks the Allow_Integer_Address special mode.
Add a new standard_address parameter to gigi and alphabetize the others; this is
necessary when addresses are not treated like integers.
gcc/ada/
* back_end.adb (Call_Back_End): Add gigi_standard_address to the
signature of the gigi procedure and alphabetize other parameters.
Pass Standard_Address as actual parameter for it.
* cstand.adb (Create_Standard): Do not set Is_Descendant_Of_Address
on Standard_Address.
* gcc-interface/gigi.h (gigi): Add a standard_address parameter and
alphabetize others.
* gcc-interface/trans.cc (gigi): Likewise. Record a builtin address
type and save it as the type for Standard.Address.
|
|
This commit fixes two CodePeer crashes that were introduced when the
format of the controlling tag changed.
gcc/ada/
* exp_disp.adb (Expand_Dispatching_Call): Handle new Controlling_Tag.
* sem_scil.adb (Check_SCIL_Node): Treat N_Object_Renaming_Declaration as
N_Object_Declaration.
|
|
Code cleanup; semantics is unaffected.
gcc/ada/
* exp_aggr.adb
(Build_Constrained_Type): Remove local constants that were shadowing
equivalent global constants; replace a wrapper that calls
Make_Integer_Literal with a numeric literal; remove explicit
Aliased_Present parameter which is equivalent to the default value.
(Check_Bounds): Remove unused initial value.
(Expand_Array_Aggregate): Use aggregate type from the context.
|
|
This extends the delaying mechanism present in the cases where the instance
is not at library level, so as to wait until after the instantiation of the
body is performed, before generating the finalizer of the compilation unit.
gcc/ada/
* einfo.ads (Delay_Cleanups): Document new usage.
* exp_ch7.ads (Build_Finalizer): New declaration.
* exp_ch7.adb (Build_Finalizer.Process_Declarations): Do not treat
library-level package instantiations specially.
(Build_Finalizer): Return early for package bodies and specs that
are not compilation units instead of using a more convoluted test.
(Expand_N_Package_Body): Do not build a finalizer if Delay_Cleanups
is set on the defining entity.
(Expand_N_Package_Declaration): Likewise.
* inline.ads (Pending_Body_Info): Reorder and add Fin_Scop.
(Add_Pending_Instantiation): Add Fin_Scop parameter.
* inline.adb (Add_Pending_Instantiation): Likewise and copy it into
the Pending_Body_Info appended to Pending_Instantiations.
(Add_Scope_To_Clean): Change parameter name to Scop and remove now
irrelevant processing.
(Cleanup_Scopes): Deal with scopes that are package specs or bodies.
(Instantiate_Body): For package instantiations, deal specially with
scopes that are package bodies and with scopes that are dynamic.
Pass the resulting scope to Add_Scope_To_Clean directly.
* sem_ch12.adb (Analyze_Package_Instantiation): In the case where a
body is needed, compute the enclosing finalization scope and pass it
in the call to Add_Pending_Instantiation.
(Inline_Instance_Body): Adjust aggregate passed in the calls to
Instantiate_Package_Body.
(Load_Parent_Of_Generic): Likewise.
|
|
gcc/ada/
* sem_util.adb (Compile_Time_Constraint_Error): Test the Ekind.
|
|
Code cleanup; semantics is unaffected.
gcc/ada/
* exp_aggr.adb (Build_Constrained_Type): Use List_Length to count
expressions in consecutive subaggregates.
|
|
Remove the hard-coded definition and follow the standard practice of using
computed os_constants for opaque type declarations.
gcc/ada/
* libgnarl/s-osinte__qnx.ads (sigset_t): Modify
declaration to use system.os_constants computed
value. Align it.
|
|
They are problematic on platforms where the provenance of pointers must be
tracked throughout their lifetime.
gcc/ada/
* exp_sel.adb: Add clauses for Sem_Util, remove them for Opt, Sinfo
and Sinfo.Nodes.
(Build_K): Always use 'Tag of the object.
(Build_S_Assignment): Likewise.
|
|
Code cleanup related to work on expression functions for GNATprove
(which require accessibility checks even when they are not expanded
and thus have no explicit return statements).
gcc/ada/
* accessibility.adb
(Is_Formal_Of_Current_Function): This routine expects an entity
reference and not the entity itself, so its parameter is a Node_Id
and not an Entity_Id.
|
|
Code cleanup only; semantics is unaffected.
gcc/ada/
* exp_aggr.adb
(Build_Array_Aggr_Code): Change variable to constant.
(Check_Same_Aggr_Bounds): Fix style; remove unused initial value.
|
|
Before this patch, in some situations, a subprogram call could be
expanded before the extra formals for the subprogram were created.
This patch fixes the problem in those situations.
gcc/ada/
* sem_ch6.adb (Analyze_Subprogram_Body_Helper): Create extra formals
in more situations.
|
|
gcc/ada/
* checks.adb (Selected_Range_Checks): Add guards to protect calls
to Expr_Value on bounds.
|
|
Both predicates bail out if the bounds of the range are not known at compile
time, whereas Compile_Time_Compare can deal with them in specific cases.
gcc/ada/
* sem_eval.ads (Is_Null_Range): Remove requirements of compile-time
known bounds and add WARNING line.
(Not_Null_Range): Remove requirements of compile-time known bounds.
* sem_eval.adb (Is_Null_Range): Fall back to Compile_Time_Compare.
(Not_Null_Range): Likewise.
* fe.h (Is_Null_Range): New predicate.
|
|
gcc/ChangeLog:
* config/i386/i386-expand.cc (ix86_expand_vecop_qihi):
Do not disable call to ix86_expand_vecop_qihi2.
|
|
GENERAL_REGS and mode.
r14-172-g0368d169492017 replaces GENERAL_REGS with NO_REGS in the cost
calculation when the preferred register class is not known yet.
It regressed powerpc PR109610 and PR109858; using NO_REGS looks too aggressive
when the mode can be allocated with GENERAL_REGS.
This patch takes a step back: it still uses GENERAL_REGS when
hard_regno_mode_ok holds for the mode and GENERAL_REGS, and otherwise uses NO_REGS.
gcc/ChangeLog:
PR target/109610
PR target/109858
* ira-costs.cc (scan_one_insn): Only use NO_REGS in cost
calculation when !hard_regno_mode_ok for GENERAL_REGS and
mode, otherwise still use GENERAL_REGS.
|
|
gcc/ChangeLog:
* config/riscv/riscv.cc (vector_zero_call_used_regs): Add
explicit VL and drop VL in ops.
Signed-off-by: Juzhe-Zhong <juzhe.zhong@rivai.ai>
|
|
gcc/testsuite/ChangeLog:
* gcc.dg/nested-vla-1.c: Require effective target trampolines.
* gcc.dg/nested-vla-2.c: Ditto.
* gcc.dg/nested-vla-3.c: Ditto.
Signed-off-by: Dimitar Dimitrov <dimitar@dinux.eu>
|
|
gcc/ChangeLog:
* sched-deps.cc (sched_macro_fuse_insns): Do not fuse insns that are
in different basic blocks.
|
|
Rewrite ix86_expand_vecop_qihi2 to expand to 2x-wider (e.g. V16QI -> V16HImode)
instructions when available. Currently, the compiler generates the following
assembly for V16QImode multiplication (-mavx2):
vpunpcklbw %xmm0, %xmm0, %xmm3
vpunpcklbw %xmm1, %xmm1, %xmm2
vpunpckhbw %xmm0, %xmm0, %xmm0
movl $255, %eax
vpunpckhbw %xmm1, %xmm1, %xmm1
vpmullw %xmm3, %xmm2, %xmm2
vmovd %eax, %xmm3
vpmullw %xmm0, %xmm1, %xmm1
vpbroadcastw %xmm3, %xmm3
vpand %xmm2, %xmm3, %xmm0
vpand %xmm1, %xmm3, %xmm3
vpackuswb %xmm3, %xmm0, %xmm0
and only with -mavx512bw -mavx512vl generates:
vpmovzxbw %xmm1, %ymm1
vpmovzxbw %xmm0, %ymm0
vpmullw %ymm1, %ymm0, %ymm0
vpmovwb %ymm0, %xmm0
The patched compiler generates more optimized code, performing the
multiplication in 2x-wider mode, in cases where the missing truncate
instruction has to be emulated with a permutation (-mavx2):
vpmovzxbw %xmm0, %ymm0
vpmovzxbw %xmm1, %ymm1
movl $255, %eax
vpmullw %ymm1, %ymm0, %ymm1
vmovd %eax, %xmm0
vpbroadcastw %xmm0, %ymm0
vpand %ymm1, %ymm0, %ymm0
vpackuswb %ymm0, %ymm0, %ymm0
vpermq $216, %ymm0, %ymm0
The patch also adjusts cost calculation of V*QImode emulations to account
for generation of 2x-wider mode instructions.
gcc/ChangeLog:
* config/i386/i386-expand.cc (ix86_expand_vecop_qihi2):
Rewrite to expand to 2x-wider (e.g. V16QI -> V16HImode)
instructions when available. Emulate truncation via
ix86_expand_vec_perm_const_1 when native truncate insn
is not available.
(ix86_expand_vecop_qihi_partial) <case MULT>: Use pmovzx
when available. Trivially rename some variables.
(ix86_expand_vecop_qihi): Unconditionally call ix86_expand_vecop_qihi2.
* config/i386/i386.cc (ix86_multiplication_cost): Rewrite cost
calculation of V*QImode emulations to account for generation of
2x-wider mode instructions.
(ix86_shift_rotate_cost): Update cost calculation of V*QImode
emulations to account for generation of 2x-wider mode instructions.
gcc/testsuite/ChangeLog:
* gcc.target/i386/avx512vl-pr95488-1.c: Revert 2023-05-18 change.
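The widening strategy described in this commit can be modelled in portable scalar C. This is only a sketch of the arithmetic under stated assumptions, not the code in i386-expand.cc: the names `mul_bytes_direct` and `mul_bytes_widened` are hypothetical, and the loops stand in for the vector instructions named in the comments.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LANES 16  /* one V16QI vector holds sixteen 8-bit lanes */

/* Reference result: each lane is the low 8 bits of the product. */
static void mul_bytes_direct(const uint8_t *a, const uint8_t *b, uint8_t *out)
{
    for (size_t i = 0; i < LANES; i++)
        out[i] = (uint8_t)(a[i] * b[i]);
}

/* Model of the 2x-wider expansion: zero-extend every lane to 16 bits
   (vpmovzxbw), multiply in the wider mode (vpmullw), then truncate.
   Truncation is modelled as a mask with 255 followed by a narrowing
   store, mirroring the vpand/vpackuswb emulation shown for -mavx2. */
static void mul_bytes_widened(const uint8_t *a, const uint8_t *b, uint8_t *out)
{
    uint16_t wide[LANES];

    for (size_t i = 0; i < LANES; i++) {
        uint16_t wa = a[i];                 /* zero extension */
        uint16_t wb = b[i];
        wide[i] = (uint16_t)(wa * wb);      /* keep the low 16 bits */
    }
    for (size_t i = 0; i < LANES; i++) {
        uint16_t masked = wide[i] & 0xFF;   /* keep the low byte only */
        assert(masked <= 255);              /* so a saturating pack cannot clamp */
        out[i] = (uint8_t)masked;
    }
}
```

Both functions should agree on every lane, which is why the mask with 255 before the unsigned saturating pack is sufficient to emulate a truncation.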
|
|
avr-common.cc introduces the following options that are set depending
on optimization level: -mgas-isr-prologues, -mmain-is-OS-task and
-fsplit-wide-types-early. The inliner thinks that different options
disallow cross-optimization inlining, so provide can_inline_p.
gcc/
PR target/104327
* config/avr/avr.cc (avr_can_inline_p): New static function.
(TARGET_CAN_INLINE_P): Define to that function.
|
|
There is already a pattern in avr.md that matches single-bit transfers
from one register to another one, but it only handled bit 0 of 8-bit
registers. This change makes that pattern more generic so that it matches
more such single-bit transfers.
gcc/
PR target/82931
* config/avr/avr.md (*movbitqi.0): Rename to *movbit<mode>.0-6.
Handle any bit position and use mode QISI.
* config/avr/avr.cc (avr_rtx_costs_1) [IOR]: Return a cost
of 2 insns for bit-transfer of respective style.
gcc/testsuite/
PR target/82931
* gcc.target/avr/pr82931.c: New test.
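For illustration, a single-bit transfer of the kind discussed in this commit can be written in C as below. This is a sketch only: the helper name `copy_bit` is hypothetical, and whether this exact source shape is what the new test exercises or what the insn pattern matches is an assumption, not something the commit message states.

```c
#include <assert.h>
#include <stdint.h>

/* Copy bit 'pos' of src into bit 'pos' of dst, leaving the other bits
   of dst untouched. */
static uint8_t copy_bit(uint8_t dst, uint8_t src, unsigned pos)
{
    assert(pos < 8);
    uint8_t mask = (uint8_t)(1u << pos);
    return (uint8_t)((dst & (uint8_t)~mask) | (src & mask));
}
```

For example, `copy_bit(0x00, 0xFF, 3)` yields `0x08`: only bit 3 is taken from the source.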
|
|
MVE_5 and MVE_6 iterators are the same: this patch replaces MVE_6 with
MVE_5 everywhere in mve.md and removes MVE_6 from iterators.md.
2023-05-25 Christophe Lyon <christophe.lyon@linaro.org>
gcc/
* config/arm/iterators.md (MVE_6): Remove.
* config/arm/mve.md: Replace MVE_6 with MVE_5.
|
|
This patch adds support for a decrementing IV, following the flow designed by
Richard:
(1) In vect_set_loop_condition_partial_vectors, for the first iteration of:
call vect_set_loop_controls_directly.
(2) vect_set_loop_controls_directly calculates "step" as in your patch.
If rgc has 1 control, this step is the SSA name created for that
control. Otherwise the step is a fresh SSA name, as in your patch.
(3) vect_set_loop_controls_directly stores this step somewhere for later
use, probably in LOOP_VINFO. Let's use "S" to refer to this stored
step.
(4) After the vect_set_loop_controls_directly call above, and outside
the "if" statement that now contains vect_set_loop_controls_directly,
check whether rgc->controls.length () > 1. If so, use
vect_adjust_loop_lens_control to set the controls based on S.
Then the only caller of vect_adjust_loop_lens_control is
vect_set_loop_condition_partial_vectors. And the starting
step for vect_adjust_loop_lens_control is always S.
This patch has been well tested for single-rgroup and multiple-rgroup (SLP)
and passes all testcases in the RISC-V port.
Signed-off-by: Ju-Zhe Zhong <juzhe.zhong@rivai.ai>
Co-Authored-By: Richard Sandiford <richard.sandiford@arm.com>
gcc/ChangeLog:
* tree-vect-loop-manip.cc (vect_adjust_loop_lens_control): New
function.
(vect_set_loop_controls_directly): Add decrement IV support.
(vect_set_loop_condition_partial_vectors): Ditto.
* tree-vect-loop.cc (_loop_vec_info::_loop_vec_info): New
variable.
* tree-vectorizer.h (LOOP_VINFO_USING_DECREMENTING_IV_P): New
macro.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/partial/multiple_rgroup-3.c: New test.
* gcc.target/riscv/rvv/autovec/partial/multiple_rgroup-4.c: New test.
* gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-3.c: New test.
* gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-4.c: New test.
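The idea of a decrementing IV can be sketched in scalar C for the single-control case. This is a model under stated assumptions, not the vectorizer's output: `VF`, `min_size` and `add_with_decrementing_iv` are hypothetical names, and the inner loop stands in for one length-controlled vector operation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical vectorization factor: elements handled per vector iteration. */
#define VF 4

static size_t min_size(size_t a, size_t b) { return a < b ? a : b; }

/* Add b to a element-wise. The induction variable 'remaining' counts down
   from n; each iteration processes len = min(remaining, VF) elements and
   then subtracts len. Returns the number of iterations executed. */
static size_t add_with_decrementing_iv(int *a, const int *b, size_t n)
{
    size_t remaining = n;
    size_t iterations = 0;

    while (remaining > 0) {
        size_t len = min_size(remaining, VF);   /* the per-iteration step */
        size_t base = n - remaining;
        for (size_t i = 0; i < len; i++)        /* one length-controlled op */
            a[base + i] += b[base + i];
        assert(len <= remaining);
        remaining -= len;
        iterations++;
    }
    return iterations;
}
```

With `n = 10` and `VF = 4` the steps are 4, 4 and 2, so the loop runs three times. The multiple-rgroup case, where further controls are derived from the stored step S, is not modelled here.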
|
|
This patch annotates the complex add and mla patterns for vec-concat-zero.
Testing showed an interesting bug in our MD patterns where they were defined to match:
(plus:VHSDF (match_operand:VHSDF 1 "register_operand" "0")
(unspec:VHSDF [(match_operand:VHSDF 2 "register_operand" "w")
(match_operand:VHSDF 3 "register_operand" "w")
(match_operand:SI 4 "const_int_operand" "n")]
FCMLA))
but the canonicalisation rules for PLUS require the more "complex" operand to be first so
during combine when the new substituted patterns were attempted to be formed combine/recog would
try to match:
(plus:V2SF (unspec:V2SF [
(reg:V2SF 100)
(reg:V2SF 101)
(const_int 0 [0])
] UNSPEC_FCMLA270)
(reg:V2SF 99))
instead. This patch fixes the operands of the PLUS RTX in these patterns.
Similar patterns for the dot-product instructions already used the right order.
Bootstrapped and tested on aarch64-none-linux-gnu and aarch64_be-none-elf.
gcc/ChangeLog:
PR target/99195
* config/aarch64/aarch64-simd.md (aarch64_fcadd<rot><mode>): Rename to...
(aarch64_fcadd<rot><mode><vczle><vczbe>): ... This.
Fix canonicalization of PLUS operands.
(aarch64_fcmla<rot><mode>): Rename to...
(aarch64_fcmla<rot><mode><vczle><vczbe>): ... This.
Fix canonicalization of PLUS operands.
(aarch64_fcmla_lane<rot><mode>): Rename to...
(aarch64_fcmla_lane<rot><mode><vczle><vczbe>): ... This.
Fix canonicalization of PLUS operands.
(aarch64_fcmla_laneq<rot>v4hf): Rename to...
(aarch64_fcmla_laneq<rot>v4hf<vczle><vczbe>): ... This.
Fix canonicalization of PLUS operands.
(aarch64_fcmlaq_lane<rot><mode>): Fix canonicalization of PLUS operands.
gcc/testsuite/ChangeLog:
PR target/99195
* gcc.target/aarch64/simd/pr99195_9.c: New test.
|
|
This patch implements a number of scalar data processing intrinsics from ACLE
that were requested by some users. Some of these have fast single-instruction
sequences for Armv6 and later, but even for earlier versions they can still emit
an inline sequence or a call to libgcc (and ACLE recommends them being unconditionally
available).
Chris Sidebottom wrote most of the patch, I just cleaned it up, wired up some builtins
and adjusted the tests.
Bootstrapped and tested on arm-none-linux-gnueabihf.
Co-authored-by: Chris Sidebottom <chris.sidebottom@arm.com>
gcc/ChangeLog:
* config/arm/arm.md (rbitsi2): Rename to...
(arm_rbit): ... This.
(ctzsi2): Adjust for the above.
(arm_rev16si2): Convert to define_expand.
(arm_rev16si2_alt1): New pattern.
(arm_rev16si2_alt): Rename to...
(*arm_rev16si2_alt2): ... This.
* config/arm/arm_acle.h (__ror, __rorl, __rorll, __clz, __clzl, __clzll,
__cls, __clsl, __clsll, __revsh, __rev, __revl, __revll, __rev16,
__rev16l, __rev16ll, __rbit, __rbitl, __rbitll): Define intrinsics.
* config/arm/arm_acle_builtins.def (rbit, rev16si2): Define builtins.
gcc/testsuite/ChangeLog:
* gcc.target/arm/acle/data-intrinsics-armv6.c: New test.
* gcc.target/arm/acle/data-intrinsics-assembly.c: New test.
* gcc.target/arm/acle/data-intrinsics-rbit.c: New test.
* gcc.target/arm/acle/data-intrinsics.c: New test.
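As a reading aid, the behaviour of a few of these intrinsics can be written as portable reference models in C. These are sketches, not the definitions in arm_acle.h: the `ref_` names are hypothetical, and the semantics (including a result of 32 for a zero argument to the 32-bit count-leading-zeros) follow my reading of ACLE.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate right by n bits; n is reduced modulo 32 to avoid undefined shifts. */
static uint32_t ref_ror(uint32_t x, uint32_t n)
{
    n &= 31u;
    return n == 0 ? x : (x >> n) | (x << (32u - n));
}

/* Reverse the byte order of the whole word. */
static uint32_t ref_rev(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000FF00u)
         | ((x << 8) & 0x00FF0000u) | (x << 24);
}

/* Reverse the bytes within each 16-bit halfword. */
static uint32_t ref_rev16(uint32_t x)
{
    return ((x & 0xFF00FF00u) >> 8) | ((x & 0x00FF00FFu) << 8);
}

/* Reverse the order of all 32 bits. */
static uint32_t ref_rbit(uint32_t x)
{
    uint32_t r = 0;
    for (int i = 0; i < 32; i++) {
        r = (r << 1) | (x & 1u);
        x >>= 1;
    }
    return r;
}

/* Count leading zero bits; returns 32 for zero. */
static unsigned ref_clz(uint32_t x)
{
    unsigned n = 0;
    for (uint32_t bit = 0x80000000u; bit != 0 && (x & bit) == 0; bit >>= 1)
        n++;
    assert(n <= 32);
    return n;
}
```

Models like these are useful mainly for checking results on targets where the intrinsic expands to an inline sequence or a library call rather than a single instruction.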
|
|
In r11-966-g9a182ef9ee011935d827ab5c6c9a7cd8e22257d8 we introduced a
simplification to emit_move_insn that attempts to simplify moves of the form:
(set (subreg:M1 (reg:M2 ...)) (constant C))
where M1 and M2 are of equal mode size. That is problematic for the splitter
vfp.md:no_literal_pool_df_immediate in the arm backend, which tries to pun an
lvalue DFmode pseudo into DImode and assign a constant to it with
emit_move_insn, as the new transformation simply undoes this, and we end up
splitting indefinitely.
This patch changes things around in the arm backend so that we use a
DImode temporary (instead of DFmode) and first load the DImode constant
into the pseudo, and then pun the pseudo into DFmode as an rvalue in a
reg -> reg move. I believe this should be semantically equivalent but
avoids the pathological behaviour seen in the PR.
gcc/ChangeLog:
PR target/109800
* config/arm/arm.md (movdf): Generate temporary pseudo in DImode
instead of DFmode.
* config/arm/vfp.md (no_literal_pool_df_immediate): Rather than punning an
lvalue DFmode pseudo into DImode, use a DImode pseudo and pun it into
DFmode as an rvalue.
gcc/testsuite/ChangeLog:
PR target/109800
* gcc.target/arm/pure-code/pr109800.c: New test.
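At the source level, the order of operations chosen by this fix (materialise the 64-bit integer image first, then reinterpret it as a double) looks roughly like the C sketch below. This is an analogy only, not the RTL the splitter emits: `double_from_bits` is a hypothetical helper, and the bit patterns in the usage note assume IEEE 754 binary64 doubles.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Load the 64-bit integer image of a constant first, then reinterpret
   it as a double. memcpy is the portable way to express the pun. */
static double double_from_bits(uint64_t bits)
{
    double d;
    assert(sizeof d == sizeof bits);
    memcpy(&d, &bits, sizeof d);
    return d;
}
```

Under the IEEE 754 assumption, `double_from_bits(0x3FF0000000000000ULL)` is 1.0.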
|
|
The following properly handles pattern matching generated COND_EXPRs
which can still have embedded compares in vectorizable_condition
which will always code generate the masked vector variant. We
were requiring vcond with embedded comparisons instead of also
allowing (as code generated) split compare and VEC_COND_EXPR.
This fixes some of the fallout when removing vcond{,u,eq} expanders
from the x86 backend.
PR target/109955
* tree-vect-stmts.cc (vectorizable_condition): For
embedded comparisons also handle the case when the target
only provides vec_cmp and vcond_mask.
|
|
The current ARC TLS Local Dynamic model uses two anchors to access
data, namely `.tdata` and `.tbss`. This implementation is unnecessarily
complicated. However, the TLS Local Dynamic model has better results
using the Global Dynamic model and anchors.
gcc/ChangeLog:
* config/arc/arc.cc (arc_call_tls_get_addr): Simplify access using
TLS Local Dynamic.
Signed-off-by: Claudiu Zissulescu <claziss@gmail.com>
|
|
gcc/ChangeLog:
* config/aarch64/aarch64.cc (scalar_move_insn_p): New function.
(seq_cost_ignoring_scalar_moves): Likewise.
(aarch64_expand_vector_init): Call seq_cost_ignoring_scalar_moves.
|
|
While optimising some vector math library code with intrinsics we stumbled upon the issue in the testcase.
The compiler should be generating a FACGT instruction but instead we generate:
foo(__Float32x4_t, __Float32x4_t, __Float32x4_t):
fabs v0.4s, v0.4s
adrp x0, .LC0
ldr q31, [x0, #:lo12:.LC0]
fcmgt v0.4s, v0.4s, v31.4s
ret
This is because the vcagtq_f32 intrinsic is open-coded in arm_neon.h as
return vabsq_f32 (__a) > vabsq_f32 (__b)
thus relying on the optimisers to merge it back together. But since one of the arms of the comparison
is a vector constant the combine pass optimises the abs into it and tries matching:
(set (reg:V4SI 101)
(neg:V4SI (gt:V4SI (reg:V4SF 100)
(const_vector:V4SF [
(const_double:SF 1.0e+2 [0x0.c8p+7]) repeated x4
]))))
and
(set (reg:V4SI 101)
(neg:V4SI (gt:V4SI (abs:V4SF (reg:V4SF 104))
(reg:V4SF 103))))
instead of what we want:
(insn 13 9 14 2 (set (reg/i:V4SI 32 v0)
(neg:V4SI (gt:V4SI (abs:V4SF (reg:V4SF 98))
(abs:V4SF (reg:V4SF 96)))))
I don't really see a good way around that with our current implementation of these intrinsics.
Therefore this patch reimplements these intrinsics with aarch64 builtins that generate the RTL for these
instructions directly. Apparently we already had them defined in aarch64-simd-builtins.def and have been
using them for the fp16 case already.
I realise that this approach is against the general principle of expressing intrinsics in the higher-level constructs,
so I'm willing to listen to counter-arguments.
That said, the FACGT/FACGE instructions are as fast as the non-ABS comparison instructions on all microarchitectures that I know of
so it should always be a win to have them in the merged form rather than split the fabs step separately or try to hoist it.
And the testcase does come from real library code that we're trying to optimise.
With this patch for the testcase we generate:
foo:
adrp x0, .LC0
ldr q31, [x0, #:lo12:.LC0]
facgt v0.4s, v0.4s, v31.4s
ret
gcc/ChangeLog:
* config/aarch64/arm_neon.h (vcage_f64): Reimplement with builtins.
(vcage_f32): Likewise.
(vcages_f32): Likewise.
(vcageq_f32): Likewise.
(vcaged_f64): Likewise.
(vcageq_f64): Likewise.
(vcagts_f32): Likewise.
(vcagt_f32): Likewise.
(vcagt_f64): Likewise.
(vcagtq_f32): Likewise.
(vcagtd_f64): Likewise.
(vcagtq_f64): Likewise.
(vcale_f32): Likewise.
(vcale_f64): Likewise.
(vcaled_f64): Likewise.
(vcales_f32): Likewise.
(vcaleq_f32): Likewise.
(vcaleq_f64): Likewise.
(vcalt_f32): Likewise.
(vcalt_f64): Likewise.
(vcaltd_f64): Likewise.
(vcaltq_f32): Likewise.
(vcaltq_f64): Likewise.
(vcalts_f32): Likewise.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/simd/facgt_constpool_1.c: New test.
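The semantics that the absolute-compare intrinsics are expected to have can be modelled in scalar C. This is a sketch for reasoning about results, not the arm_neon.h implementation: `abs_f32`, `facgt_lane` and `facgt_v4` are hypothetical names, and the four-lane helper only mirrors the shape of vcagtq_f32.

```c
#include <assert.h>
#include <stdint.h>

/* Absolute value without <math.h>; the sign of zero does not matter
   for the comparison below. */
static float abs_f32(float x)
{
    return x < 0.0f ? -x : x;
}

/* Scalar model of one lane of an absolute compare greater-than:
   all-ones when |a| > |b|, zero otherwise. */
static uint32_t facgt_lane(float a, float b)
{
    return abs_f32(a) > abs_f32(b) ? UINT32_MAX : 0u;
}

/* Four-lane model corresponding to the shape of vcagtq_f32. */
static void facgt_v4(const float a[4], const float b[4], uint32_t out[4])
{
    assert(a != 0 && b != 0 && out != 0);
    for (int i = 0; i < 4; i++)
        out[i] = facgt_lane(a[i], b[i]);
}
```

When one operand is a positive constant vector such as the 100.0 in the testcase, taking its absolute value is a no-op, which is how the abs on that arm can be folded away before the combined pattern is matched.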
|