Age | Commit message (Collapse) | Author | Files | Lines |
|
As we can always broadcast an integer constant to a vector register
allow them in riscv_const_insns. We need as many instructions as
it takes to generate the constant and one vmv.vx.
gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_const_insns): Allow
const_vec_duplicates.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vmv-imm-fixed-rv32.c: Add vmv.v.x
tests.
* gcc.target/riscv/rvv/autovec/vmv-imm-fixed-rv64.c: Dito.
* gcc.target/riscv/rvv/autovec/vmv-imm-run.c: Dito.
* gcc.target/riscv/rvv/autovec/vmv-imm-rv32.c: Dito.
* gcc.target/riscv/rvv/autovec/vmv-imm-rv64.c: Dito.
* gcc.target/riscv/rvv/autovec/vmv-imm-template.h: Dito.
|
|
The patch doesn't handle:
1. cast64_to_32,
2. memory source with rsize < range.
gcc/ChangeLog:
PR middle-end/108938
* gimple-ssa-store-merging.cc (is_bswap_or_nop_p): New
function, cut from original find_bswap_or_nop function.
(find_bswap_or_nop): Add a new parameter, detect bswap +
rotate and save rotate result in the new parameter.
(bswap_replace): Add a new parameter to indicate rotate and
generate rotate stmt if needed.
(maybe_optimize_vector_constructor): Adjust for new rotate
parameter in the upper 2 functions.
(pass_optimize_bswap::execute): Ditto.
(imm_store_chain_info::output_merged_store): Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/i386/pr108938-1.c: New test.
* gcc.target/i386/pr108938-2.c: New test.
* gcc.target/i386/pr108938-3.c: New test.
* gcc.target/i386/pr108938-load-1.c: New test.
* gcc.target/i386/pr108938-load-2.c: New test.
|
|
This patch converts the patterns for the integer widen and pairwise-add instructions
to standard RTL operations. The pairwise addition withing a vector can be represented
as an addition of two vec_selects, one selecting the even elements, and one selecting odd.
Thus for the intrinsic vpaddlq_s8 we can generate:
(set (reg:V8HI 92)
(plus:V8HI (vec_select:V8HI (sign_extend:V16HI (reg/v:V16QI 93 [ a ]))
(parallel [
(const_int 0 [0])
(const_int 2 [0x2])
(const_int 4 [0x4])
(const_int 6 [0x6])
(const_int 8 [0x8])
(const_int 10 [0xa])
(const_int 12 [0xc])
(const_int 14 [0xe])
]))
(vec_select:V8HI (sign_extend:V16HI (reg/v:V16QI 93 [ a ]))
(parallel [
(const_int 1 [0x1])
(const_int 3 [0x3])
(const_int 5 [0x5])
(const_int 7 [0x7])
(const_int 9 [0x9])
(const_int 11 [0xb])
(const_int 13 [0xd])
(const_int 15 [0xf])
]))))
Similarly for the accumulating forms where there's an extra outer PLUS for the accumulation.
We already have the handy helper functions aarch64_stepped_int_parallel_p and
aarch64_gen_stepped_int_parallel defined in aarch64.cc that we can make use of to define
the right predicate for the VEC_SELECT PARALLEL.
This patch allows us to remove some code iterators and the UNSPEC definitions for SADDLP and UADDLP.
UNSPEC_UADALP and UNSPEC_SADALP are retained because they are used by SVE2 patterns still.
Bootstrapped and tested on aarch64-none-linux-gnu and aarch64_be-none-elf.
gcc/ChangeLog:
* config/aarch64/aarch64-simd.md (aarch64_<sur>adalp<mode>): Delete.
(aarch64_<su>adalp<mode>): New define_expand.
(*aarch64_<su>adalp<mode><vczle><vczbe>_insn): New define_insn.
(aarch64_<su>addlp<mode>): Convert to define_expand.
(*aarch64_<su>addlp<mode><vczle><vczbe>_insn): New define_insn.
* config/aarch64/iterators.md (UNSPEC_SADDLP, UNSPEC_UADDLP): Delete.
(ADALP): Likewise.
(USADDLP): Likewise.
* config/aarch64/predicates.md (vect_par_cnst_even_or_odd_half): Define.
|
|
This patch reimplements the MD patterns for the UHADD,SHADD,UHSUB,SHSUB,URHADD,SRHADD instructions using
standard RTL operations rather than unspecs. The correct RTL representations involves widening
the inputs before adding them and halving, followed by a truncation back to the original mode.
An unfortunate wart in the patch is that we end up having very similar expanders for the intrinsics
through the aarch64_<su>h<ADDSUB:optab><mode> and aarch64_<su>rhadd<mode> names and the standard names
for the vector averaging optabs <su>avg<mode>3_floor and <su>avg<mode>3_ceil.
I'd like to reuse <su>avg<mode>3_ceil for the intrinsics builtin as well but our scheme
in aarch64-simd-builtins.def and aarch64-builtins.cc makes it awkward by only allowing mappings
of entries in aarch64-simd-builtins.def to:
0 - CODE_FOR_aarch64_<name><mode>
1-9 - CODE_FOR_<name><mode><1-9>
10 - CODE_FOR_<name><mode>
whereas here we want a string after the <mode> i.e. CODE_FOR_uavg<mode>3_ceil.
This patch adds a bit of remapping logic in aarch64-builtins.cc before the construction of the
builtin info that remaps the CODE_FOR_* definitions in aarch64-simd-builtins.def to the
optab-derived ones. CODE_FOR_aarch64_srhaddv4si gets remapped to CODE_FOR_avgv4si3_ceil, for example.
It's a bit specific to this case, but this solution requires the least invasive changes while avoiding
having duplicate expanders just for the sake of a different pattern name.
Bootstrapped and tested on aarch64-none-linux-gnu and aarch64_be-none-elf.
gcc/ChangeLog:
* config/aarch64/aarch64-builtins.cc (VAR1): Move to after inclusion of
aarch64-builtin-iterators.h. Add definition to remap shadd, uhadd,
srhadd, urhadd builtin codes for standard optab ones.
* config/aarch64/aarch64-simd.md (<u>avg<mode>3_floor): Rename to...
(<su_optab>avg<mode>3_floor): ... This. Expand to RTL codes rather than
unspec.
(<u>avg<mode>3_ceil): Rename to...
(<su_optab>avg<mode>3_ceil): ... This. Expand to RTL codes rather than
unspec.
(aarch64_<su>hsub<mode>): New define_expand.
(aarch64_<sur>h<addsub><mode><vczle><vczbe>): Split into...
(*aarch64_<su>h<ADDSUB:optab><mode><vczle><vczbe>_insn): ... This...
(*aarch64_<su>rhadd<mode><vczle><vczbe>_insn): ... And this.
|
|
gcc/testsuite/
PR sanitizer/82501
* c-c++-common/asan/pointer-compare-1.c: Disable use of small data
on RISC-V.
|
|
gcc/
PR target/110036
* config/riscv/riscv.cc (riscv_asan_shadow_offset): Update to
match libsanitizer.
|
|
This patch expresses the intrinsics for the SRA and RSRA instructions with
standard RTL codes rather than relying on UNSPECs.
These instructions perform a vector shift right plus accumulate with an
optional rounding constant addition for the RSRA variant.
There are a number of interesting points:
* The scalar-in-SIMD-registers variant for DImode SRA e.g. ssra d0, d1, #N
is left using the UNSPECs. Expressing it as a DImode plus+shift led to all
kinds of trouble as it started matching the existing define_insns for
"add x0, x0, asr #N" instructions and adding the SRA form as an extra
alternative required a significant amount of deduplication of iterators and
things still didn't work out well. I decided not to tackle that case in
this patch. It can be attempted later.
* For the RSRA variants that add a rounding constant (1 << (shift-1)) the
addition is notionally performed in a wider mode than the input types so that
overflow is handled properly. In RTL this can be represented with an appropriate
extend operation followed by a truncate back to the original modes.
However for 128-bit input modes such as V4SI we don't have appropriate modes
defined for this widening i.e. we'd need a V4DI mode to represent the
intermediate widened result. This patch defines such modes for
V16HI,V8SI,V4DI,V2TI. These will come handy in the future too as we have
more Advanced SIMD instruction that have similar intermediate widening
semantics.
* The above new modes led to a problem with stor-layout.cc. The new modes only
exist for the sake of the RTL optimisers understanding the semantics of the
instruction but are not indended to be moved to and from register or memory,
assigned to types, used as TYPE_MODE or participate in auto-vectorisation.
This is expressed in aarch64 by aarch64_classify_vector_mode returning zero
for these new modes. However, the code in stor-layout.cc:<mode_for_vector>
explicitly doesn't check this when picking a TYPE_MODE due to modes being made
potentially available later through target switching (PR38240).
This led to these modes being picked as TYPE_MODE for declarations such as:
typedef int16_t vnx8hi __attribute__((vector_size (32))) when 256-bit
fixed-length SVE modes are available and vector_type_mode later struggling
to rectify this.
This issue is addressed with the new target hook
TARGET_VECTOR_MODE_SUPPORTED_ANY_TARGET_P that is intended to check if a
vector mode can be used in any legal target attribute configuration of the
port, as opposed to the existing TARGET_VECTOR_MODE_SUPPORTED_P that checks
only the initial target configuration. This allows a simple adjustment in
stor-layout.cc that still disqualifies these limited modes early on while
allowing consideration of modes that can be turned on in the future with
target attributes.
Bootstrapped and tested on aarch64-none-linux-gnu.
gcc/ChangeLog:
* config/aarch64/aarch64-modes.def (V16HI, V8SI, V4DI, V2TI): New modes.
* config/aarch64/aarch64-protos.h (aarch64_const_vec_rnd_cst_p):
Declare prototype.
(aarch64_const_vec_rsra_rnd_imm_p): Likewise.
* config/aarch64/aarch64-simd.md (*aarch64_simd_sra<mode>): Rename to...
(aarch64_<sra_op>sra_n<mode>_insn): ... This.
(aarch64_<sra_op>rsra_n<mode>_insn): New define_insn.
(aarch64_<sra_op>sra_n<mode>): New define_expand.
(aarch64_<sra_op>rsra_n<mode>): Likewise.
(aarch64_<sur>sra_n<mode>): Rename to...
(aarch64_<sur>sra_ndi): ... This.
* config/aarch64/aarch64.cc (aarch64_classify_vector_mode): Add
any_target_p argument.
(aarch64_extract_vec_duplicate_wide_int): Define.
(aarch64_const_vec_rsra_rnd_imm_p): Likewise.
(aarch64_const_vec_rnd_cst_p): Likewise.
(aarch64_vector_mode_supported_any_target_p): Likewise.
(TARGET_VECTOR_MODE_SUPPORTED_ANY_TARGET_P): Likewise.
* config/aarch64/iterators.md (UNSPEC_SRSRA, UNSPEC_URSRA): Delete.
(VSRA): Adjust for the above.
(sur): Likewise.
(V2XWIDE): New mode_attr.
(vec_or_offset): Likewise.
(SHIFTEXTEND): Likewise.
* config/aarch64/predicates.md (aarch64_simd_rsra_rnd_imm_vec): New
predicate.
* doc/tm.texi (TARGET_VECTOR_MODE_SUPPORTED_P): Adjust description to
clarify that it applies to current target options.
(TARGET_VECTOR_MODE_SUPPORTED_ANY_TARGET_P): Document.
* doc/tm.texi.in: Regenerate.
* stor-layout.cc (mode_for_vector): Check
vector_mode_supported_any_target_p when iterating through vector modes.
* target.def (TARGET_VECTOR_MODE_SUPPORTED_P): Adjust description to
clarify that it applies to current target options.
(TARGET_VECTOR_MODE_SUPPORTED_ANY_TARGET_P): Define.
|
|
The previous fix to get_storage_model_access was incomplete and needs to be
extended to the node itself.
gcc/ada/
* gcc-interface/trans.cc (get_storage_model_access): Also strip any
type conversion in the node when unwinding the components.
|
|
It comes from a small oversight in get_storage_model_access.
gcc/ada/
* gcc-interface/trans.cc (node_is_component): Remove parentheses.
(node_is_type_conversion): New predicate.
(get_atomic_access): Use it.
(get_storage_model_access): Likewise and look into the parent to
find a component if it returns true.
(present_in_lhs_or_actual_p): Likewise.
|
|
gcc/ada/
* gcc-interface/trans.cc (Attribute_to_gnu) <Attr_Size>: Check that
the storage model has Copy_From before instantiating loads for it.
<Attr_Length>: Likewise.
<Attr_Bit_Position>: Likewise.
(gnat_to_gnu) <N_Indexed_Component>: Likewise.
<N_Slice>: Likewise.
|
|
When using 'Address on an object with a size clause, gigi would end up
creating a copy and using its address instead of the one of the original
object, leading to incorrect behavior. Remove the conversion (that
triggers the copy) when 'Address is applied to a declaration.
gcc/ada/
* gcc-interface/trans.cc (Attribute_to_gnu): Also strip conversion
in case of DECL.
|
|
This works around the limitations present for the support of arrays in the
middle-end by clearing the TREE_OVERFLOW flag for arrays with zero length.
gcc/ada/
* gcc-interface/decl.cc (gnat_to_gnu_entity) <E_Array_Type>: Use a
local variable for the GNAT index type.
<E_Array_Subtype>: Likewise. Call Is_Null_Range on the bounds and
force the zero on TYPE_SIZE and TYPE_SIZE_UNIT if it returns true.
|
|
gcc/ada/
* gcc-interface/trans.cc (gnat_to_gnu) <N_Op_Mod>: Test the
precision of the operation rather than that of the result type.
|
|
No functional changes.
gcc/ada/
* gcc-interface/decl.cc (gnat_to_gnu_entity) <E_Variable>: Replace
integer_zero_node with null_pointer_node for pointer types.
* gcc-interface/trans.cc (gnat_gimplify_expr) <NULL_EXPR>: Likewise.
* gcc-interface/utils.cc (maybe_pad_type): Do not attempt to make a
packable type from a fat pointer type.
* gcc-interface/utils2.cc (build_atomic_load): Use a local variable.
(build_atomic_store): Likewise.
|
|
gcc/ada/
* gcc-interface/misc.cc (internal_error_function): Be prepared for
an input_location set to UNKNOWN_LOCATION.
|
|
The code generator must now be prepared to translate assignment statements
to objects allocated with a storage model and that are not initialized yet.
gcc/ada/
* gcc-interface/trans.cc (Attribute_to_gnu) <Attr_Size>: Tweak.
(gnat_to_gnu) <N_Assignment_Statement>: Declare a local variable.
For a target with a storage model, use the Actual_Designated_Subtype
to compute the size if it is present.
|
|
As the additional temporaries required by the semantics of nonnative storage
models are now created by the front-end, in particular for actual parameters
and assignment statements, the corresponding code in gigi can be removed.
gcc/ada/
* gcc-interface/trans.cc (Call_to_gnu): Remove code implementing the
by-copy semantics for actuals with nonnative storage models.
(gnat_to_gnu) <N_Assignment_Statement>: Remove code instantiating a
temporary for assignments between nonnative storage models.
|
|
gcc/ada/
* gcc-interface/decl.cc (range_cannot_be_superflat): Return true
immediately if Cannot_Be_Superflat is set.
* gcc-interface/misc.cc (gnat_post_options): Do not override the
-Wstringop-overflow setting.
|
|
This also removes some obsolete stuff.
gcc/ada/
* gcc-interface/Make-lang.in (ADA_CFLAGS): Move up.
(ALL_ADAFLAGS): Add $(NO_PIE_CFLAGS).
(ada/mdll.o): Remove.
(ada/mdll-fil.o): Likewise.
(ada/mdll-utl.o): Likewise.
|
|
Don't require storage access for explicit dereferences used as
lvalue (e.g. Some_Access.all'Address) or for renamings.
gcc/ada/
* gcc-interface/trans.cc (get_storage_model_access): Don't require
storage model access for dereference used as lvalue or renamings.
|
|
This streamlines the handling of qualified expressions in the expansion of
aggregates and plugs a couple of loopholes that may cause memory leaks.
gcc/ada/
* exp_aggr.adb (Build_Array_Aggr_Code): Move the declaration of Typ
to the beginning.
(Initialize_Array_Component): Test the unqualified version of the
expression for the nested array case.
(Initialize_Ctrl_Array_Component): Do not duplicate the expression
here. Do the pattern matching of the unqualified version of it.
(Gen_Assign): Call Unqualify to compute Expr_Q and use Expr_Q in
subsequent pattern matching.
(Initialize_Ctrl_Record_Component): Do the pattern matching of the
unqualified version of the aggregate.
(Build_Record_Aggr_Code): Call Unqualify.
(Convert_Aggr_In_Assignment): Likewise.
(Convert_Aggr_In_Object_Decl): Likewise.
(Component_OK_For_Backend): Likewise.
(Is_Delayed_Aggregate): Likewise.
|
|
This extends an earlier fix done for the others choice of an array aggregate
to all the choices of the aggregate, since the same sharing issue may happen
when the choices are not contiguous.
gcc/ada/
* exp_aggr.adb (Build_Array_Aggr_Code.Get_Assoc_Expr): Duplicate the
expression here instead of...
(Build_Array_Aggr_Code): ...here.
|
|
This happens when the peculiar check emitted by Check_Large_Modular_Array
is applied to an object whose actual subtype is an itype with dynamic size,
because the first reference to the itype in the expanded code may turn out
to be within the raise statement, which is problematic for the eloboration
of this itype by the code generator at library level.
gcc/ada/
* freeze.adb (Check_Large_Modular_Array): Fix head comment, use
Standard_Long_Long_Integer_Size directly and generate a reference
just before the raise statement if the Etype of the object is an
itype declared in an open scope.
|
|
The original fix makes it possible to create transient scopes around return
statements in more cases, but it overlooks that transient scopes are reused
and, in particular, that they can be promoted to secondary stack management.
gcc/ada/
* exp_ch7.adb (Find_Enclosing_Transient_Scope): Return the index in
the scope table instead of the scope's entity.
(Establish_Transient_Scope): If an enclosing scope already exists,
do not set the Uses_Sec_Stack flag on it if the node to be wrapped
is a return statement which requires secondary stack management.
|
|
This commit changes the runtime on aarch64-linux to use the Linux
version of s-tsmona.adb, so as to add support for this functionality
on aarch64-linux.
gcc/ada/
* Makefile.rtl: Use libgnat/s-tsmona__linux.adb on
aarch64-linux. Link libgnat with -ldl, as the use of
s-tsmona__linux.adb requires it.
|
|
For access-to-subprogram types with Pre/Post aspects we create a wrapper
routine that evaluates these aspects. Spec of this wrapper was created
always, while its body was only created when expansion was enabled.
Now we only create these wrappers when expansion is enabled. In
particular, we don't create them in GNATprove mode; instead, GNATprove
picks the Pre/Post expressions directly from the aspects.
gcc/ada/
* exp_ch3.adb
(Build_Access_Subprogram_Wrapper_Body): Build wrapper body if requested
by routine that builds wrapper spec.
* sem_ch3.adb
(Analyze_Full_Type_Declaration): Only build wrapper when expander is
active.
(Build_Access_Subprogram_Wrapper):
Remove special-case for GNATprove.
|
|
gcc/ada/
* doc/gnat_ugn/building_executable_programs_with_gnat.rst: Fix minor issues.
* doc/gnat_ugn/the_gnat_compilation_model.rst: Fix minor issues.
* gnat_ugn.texi: Regenerate.
|
|
The Default_Stack_Size function does not check that the binder specified
default stack size is greater than the minimum stack size for the runtime.
This can result in tasks using default stack sizes less than the minimum
stack size because the Adjust_Storage_Size only adjusts storages sizes for
tasks that explicitly specify a storage size. To avoid this, the binder
specified default stack size is round up to the minimum stack size if
required.
gcc/ada/
* libgnat/s-parame.adb: Check that Default_Stack_Size >=
Minimum_Stack_size.
* libgnat/s-parame__rtems.adb: Ditto.
* libgnat/s-parame__vxworks.adb: Check that Default_Stack_Size >=
Minimum_Stack_size and use the proper Minimum_Stack_Size if
Stack_Check_Limits is enabled.
|
|
This happens when the expression of the return statement is a call that does
not return on the same stack as the enclosing function.
gcc/ada/
* sem_res.adb (Resolve_Call): Restrict previous change to calls that
return on the same stack as the enclosing function. Tidy up.
|
|
gcc/ada/
* libgnat/a-cidlli.adb (Put_Image): Simplify.
* libgnat/a-coinve.adb (Put_Image): Likewise.
|
|
The compiler fails to capture global references during the analysis of the
aspect on the generic type because it analyzes a copy of the expression.
gcc/ada/
* exp_util.adb (Build_DIC_Procedure_Body.Add_Own_DIC): When inside
a generic unit, preanalyze the expression directly.
(Build_Invariant_Procedure_Body.Add_Own_Invariants): Likewise.
|
|
The coding style rules require to avoid using FIXME comments. ??? is
preferred.
gcc/ada/
* init.c: Replace FIXME by ???
|
|
Make some changes in reassoc pass to make it more friendly to fma pass later.
Using FMA instead of mult + add reduces register pressure and insruction
retired.
There are mainly two changes
1. Put no-mult ops and mult ops alternately at the end of the queue, which is
conducive to generating more fma and reducing the loss of FMA when breaking
the chain.
2. Rewrite the rewrite_expr_tree_parallel function to try to build parallel
chains according to the given correlation width, keeping the FMA chance as
much as possible.
With the patch applied
On ICX:
507.cactuBSSN_r: Improved by 1.7% for multi-copy .
503.bwaves_r : Improved by 0.60% for single copy .
507.cactuBSSN_r: Improved by 1.10% for single copy .
519.lbm_r : Improved by 2.21% for single copy .
no measurable changes for other benchmarks.
On aarch64
507.cactuBSSN_r: Improved by 1.7% for multi-copy.
503.bwaves_r : Improved by 6.00% for single-copy.
no measurable changes for other benchmarks.
TEST1:
float
foo (float a, float b, float c, float d, float *e)
{
return *e + a * b + c * d ;
}
For "-Ofast -mfpmath=sse -mfma" GCC generates:
vmulss %xmm3, %xmm2, %xmm2
vfmadd132ss %xmm1, %xmm2, %xmm0
vaddss (%rdi), %xmm0, %xmm0
ret
With this patch GCC generates:
vfmadd213ss (%rdi), %xmm1, %xmm0
vfmadd231ss %xmm2, %xmm3, %xmm0
ret
TEST2:
for (int i = 0; i < N; i++)
{
a[i] += b[i]* c[i] + d[i] * e[i] + f[i] * g[i] + h[i] * j[i] + k[i] * l[i] + m[i]* o[i] + p[i];
}
For "-Ofast -mfpmath=sse -mfma" GCC generates:
vmovapd e(%rax), %ymm4
vmulpd d(%rax), %ymm4, %ymm3
addq $32, %rax
vmovapd c-32(%rax), %ymm5
vmovapd j-32(%rax), %ymm6
vmulpd h-32(%rax), %ymm6, %ymm2
vmovapd a-32(%rax), %ymm6
vaddpd p-32(%rax), %ymm6, %ymm0
vmovapd g-32(%rax), %ymm7
vfmadd231pd b-32(%rax), %ymm5, %ymm3
vmovapd o-32(%rax), %ymm4
vmulpd m-32(%rax), %ymm4, %ymm1
vmovapd l-32(%rax), %ymm5
vfmadd231pd f-32(%rax), %ymm7, %ymm2
vfmadd231pd k-32(%rax), %ymm5, %ymm1
vaddpd %ymm3, %ymm0, %ymm0
vaddpd %ymm2, %ymm0, %ymm0
vaddpd %ymm1, %ymm0, %ymm0
vmovapd %ymm0, a-32(%rax)
cmpq $8192, %rax
jne .L4
vzeroupper
ret
with this patch applied GCC breaks the chain with width = 2 and generates 6 fma:
vmovapd a(%rax), %ymm2
vmovapd c(%rax), %ymm0
addq $32, %rax
vmovapd e-32(%rax), %ymm1
vmovapd p-32(%rax), %ymm5
vmovapd g-32(%rax), %ymm3
vmovapd j-32(%rax), %ymm6
vmovapd l-32(%rax), %ymm4
vmovapd o-32(%rax), %ymm7
vfmadd132pd b-32(%rax), %ymm2, %ymm0
vfmadd132pd d-32(%rax), %ymm5, %ymm1
vfmadd231pd f-32(%rax), %ymm3, %ymm0
vfmadd231pd h-32(%rax), %ymm6, %ymm1
vfmadd231pd k-32(%rax), %ymm4, %ymm0
vfmadd231pd m-32(%rax), %ymm7, %ymm1
vaddpd %ymm1, %ymm0, %ymm0
vmovapd %ymm0, a-32(%rax)
cmpq $8192, %rax
jne .L2
vzeroupper
ret
gcc/ChangeLog:
PR tree-optimization/98350
* tree-ssa-reassoc.cc
(rewrite_expr_tree_parallel): Rewrite this function.
(rank_ops_for_fma): New.
(reassociate_bb): Handle new function.
gcc/testsuite/ChangeLog:
PR tree-optimization/98350
* gcc.dg/pr98350-1.c: New test.
* gcc.dg/pr98350-2.c: Ditto.
|
|
gcc/ChangeLog:
* rtl.h (rtx_addr_can_trap_p): Change return type from int to bool.
(rtx_unstable_p): Ditto.
(reg_mentioned_p): Ditto.
(reg_referenced_p): Ditto.
(reg_used_between_p): Ditto.
(reg_set_between_p): Ditto.
(modified_between_p): Ditto.
(no_labels_between_p): Ditto.
(modified_in_p): Ditto.
(reg_set_p): Ditto.
(multiple_sets): Ditto.
(set_noop_p): Ditto.
(noop_move_p): Ditto.
(reg_overlap_mentioned_p): Ditto.
(dead_or_set_p): Ditto.
(dead_or_set_regno_p): Ditto.
(find_reg_fusage): Ditto.
(find_regno_fusage): Ditto.
(side_effects_p): Ditto.
(volatile_refs_p): Ditto.
(volatile_insn_p): Ditto.
(may_trap_p_1): Ditto.
(may_trap_p): Ditto.
(may_trap_or_fault_p): Ditto.
(computed_jump_p): Ditto.
(auto_inc_p): Ditto.
(loc_mentioned_in_p): Ditto.
* rtlanal.cc (computed_jump_p_1): Adjust forward declaration.
(rtx_unstable_p): Change return type from int to bool
and adjust function body accordingly.
(rtx_addr_can_trap_p): Ditto.
(reg_mentioned_p): Ditto.
(no_labels_between_p): Ditto.
(reg_used_between_p): Ditto.
(reg_referenced_p): Ditto.
(reg_set_between_p): Ditto.
(reg_set_p): Ditto.
(modified_between_p): Ditto.
(modified_in_p): Ditto.
(multiple_sets): Ditto.
(set_noop_p): Ditto.
(noop_move_p): Ditto.
(reg_overlap_mentioned_p): Ditto.
(dead_or_set_p): Ditto.
(dead_or_set_regno_p): Ditto.
(find_reg_fusage): Ditto.
(find_regno_fusage): Ditto.
(remove_node_from_insn_list): Ditto.
(volatile_insn_p): Ditto.
(volatile_refs_p): Ditto.
(side_effects_p): Ditto.
(may_trap_p_1): Ditto.
(may_trap_p): Ditto.
(may_trap_or_fault_p): Ditto.
(computed_jump_p): Ditto.
(auto_inc_p): Ditto.
(loc_mentioned_in_p): Ditto.
* combine.cc (can_combine_p): Update indirect function.
|
|
Even though we can't support floating-point operations which are
depending
on FRM yet, (for example vfadd support is blocked) since the RVV
intrinsic doc is not updated
and we can't support mode switching for this.
We can support floating-point to integer conversion now since it's not
depending on FRM and
we don't need mode switching support for this ('rtz' conversions
independent FRM).
Signed-off-by: Juzhe-Zhong <juzhe.zhong@rivai.ai>
gcc/ChangeLog:
* config/riscv/autovec.md (<optab><mode><vconvert>2): New pattern.
* config/riscv/iterators.md: New attribute.
* config/riscv/vector-iterators.md: New attribute.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/conversions/vfcvt_rtz-run.c: New test.
* gcc.target/riscv/rvv/autovec/conversions/vfcvt_rtz-rv32gcv.c: New test.
* gcc.target/riscv/rvv/autovec/conversions/vfcvt_rtz-rv64gcv.c: New test.
* gcc.target/riscv/rvv/autovec/conversions/vfcvt_rtz-template.h: New test.
|
|
Notice there is warning:
../../../riscv-gcc/gcc/config/riscv/riscv.md:1356:32: warning:
comparison between signed and unsigned integer expressions
[-Wsign-compare]
if (INTVAL (operands[2]) == GET_MODE_MASK (HImode))
../../../riscv-gcc/gcc/config/riscv/riscv.md:1358:37: warning:
comparison between signed and unsigned integer expressions
[-Wsign-compare]
else if (INTVAL (operands[2]) == GET_MODE_MASK (SImode))
../../../riscv-gcc/gcc/config/riscv/riscv.md: In function ‘rtx_def*
gen_anddi3(rtx, rtx, rtx)’:
../../../riscv-gcc/gcc/config/riscv/riscv.md:1356:32: warning:
comparison between signed and unsigned integer expressions
[-Wsign-compare]
if (INTVAL (operands[2]) == GET_MODE_MASK (HImode))
../../../riscv-gcc/gcc/config/riscv/riscv.md:1358:37: warning:
comparison between signed and unsigned integer expressions
[-Wsign-compare]
else if (INTVAL (operands[2]) == GET_MODE_MASK (SImode))
Add unsigned conversion to fix this warning.
Signed-off-by: Juzhe-Zhong <juzhe.zhong@rivai.ai>
gcc/ChangeLog:
* config/riscv/riscv.md: Fix signed and unsigned comparison
warning.
|
|
Like FMA, Add FNMA (VNMSAC or VNMSUB) auto-vectorization support.
Signed-off-by: Juzhe-Zhong <juzhe.zhong@rivai.ai>
gcc/ChangeLog:
* config/riscv/autovec.md (fnma<mode>4): New pattern.
(*fnma<mode>): Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/ternop/ternop-4.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop-5.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop-6.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run-4.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run-5.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run-6.c: New test.
|
|
|
|
This patch allows less instructions to be used when TARGET_XTHEADCONDMOV is enabled.
Provide an example from the existing testcases.
Testcase:
int ConEmv_imm_imm_reg(int x, int y){
if (x == 1000) return 10;
return y;
}
Cflags:
-O2 -march=rv64gc_xtheadcondmov -mabi=lp64d
before patch:
ConEmv_imm_imm_reg:
addi a5,a0,-1000
li a0,10
th.mvnez a0,zero,a5
th.mveqz a1,zero,a5
or a0,a0,a1
ret
after patch:
ConEmv_imm_imm_reg:
addi a5,a0,-1000
li a0,10
th.mvnez a0,a1,a5
ret
Signed-off-by: Die Li <lidie@eswincomputing.com>
gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_expand_conditional_move_onesided):
Delete.
(riscv_expand_conditional_move): Reuse the TARGET_SFB_ALU expand
process for TARGET_XTHEADCONDMOV
gcc/testsuite/ChangeLog:
* gcc.target/riscv/xtheadcondmov-indirect-rv32.c: Update the output.
* gcc.target/riscv/xtheadcondmov-indirect-rv64.c: Likewise.
|
|
gcc/ChangeLog:
PR target/110021
* config/i386/i386-expand.cc (ix86_expand_vecop_qihi2): Also require
TARGET_AVX512BW to generate truncv16hiv16qi2.
|
|
In the case where the target supports extension instructions,
it is preferable to use that instead of doing the same in other ways.
For the following case
void foo (unsigned long a, unsigned long* ptr) {
ptr[0] = a & 0xffffffffUL;
ptr[1] &= 0xffffffffUL;
}
GCC generates
foo:
li a5,-1
srli a5,a5,32
and a0,a0,a5
sd a0,0(a1)
ld a4,8(a1)
and a5,a4,a5
sd a5,8(a1)
ret
but it will be profitable to generate this one
foo:
zext.w a0,a0
sd a0,0(a1)
lwu a5,8(a1)
sd a5,8(a1)
ret
This patch fixes mentioned issue.
It supports HI -> DI, HI->SI and SI -> DI extensions.
gcc/ChangeLog:
* config/riscv/riscv.md (and<mode>3): New expander.
(*and<mode>3) New pattern.
* config/riscv/predicates.md (arith_operand_or_mode_mask): New
predicate.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/and-extend-1.c: New test
* gcc.target/riscv/and-extend-2.c: New test
|
|
This patch would like to remove unnecessary comments of some self
explained parameters and try a better name to avoid misleading.
Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:
* config/riscv/riscv-v.cc (emit_vlmax_insn): Remove unnecessary
comments and rename local variables.
(emit_nonvlmax_insn): Diito.
(emit_vlmax_merge_insn): Ditto.
(emit_vlmax_cmp_insn): Ditto.
(emit_vlmax_cmp_mu_insn): Ditto.
(emit_scalar_move_insn): Ditto.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
|
|
This patch would like to remove the magic number in the riscv-v.cc, and
align the same value to one macro.
Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:
* config/riscv/riscv-v.cc (emit_vlmax_insn): Eliminate the
magic number.
(emit_nonvlmax_insn): Ditto.
(emit_vlmax_merge_insn): Ditto.
(emit_vlmax_cmp_insn): Ditto.
(emit_vlmax_cmp_mu_insn): Ditto.
(expand_vec_series): Ditto.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
This patch would like to optimize the VLS vector initialization like
repeating sequence. From the vslide1down to the vmerge with a simple
cost model, aka every instruction only has 1 cost.
Given code with -march=rv64gcv_zvl256b --param riscv-autovec-preference=fixed-vlmax
typedef int64_t vnx32di __attribute__ ((vector_size (256)));
__attribute__ ((noipa)) void
f_vnx32di (int64_t a, int64_t b, int64_t *out)
{
vnx32di v = {
a, b, a, b, a, b, a, b,
a, b, a, b, a, b, a, b,
a, b, a, b, a, b, a, b,
a, b, a, b, a, b, a, b,
};
*(vnx32di *) out = v;
}
Before this patch:
vslide1down.vx (x31 times)
After this patch:
li a5,-1431654400
addi a5,a5,-1365
li a3,-1431654400
addi a3,a3,-1366
slli a5,a5,32
add a5,a5,a3
vsetvli a4,zero,e64,m8,ta,ma
vmv.v.x v8,a0
vmv.s.x v0,a5
vmerge.vxm v8,v8,a1,v0
vs8r.v v8,0(a2)
Since we dont't have SEW = 128 in vec_duplicate, we can't combine ab into
SEW = 128 element and then broadcast this big element.
Signed-off-by: Pan Li <pan2.li@intel.com>
Co-Authored by: Juzhe-Zhong <juzhe.zhong@rivai.ai>
gcc/ChangeLog:
* config/riscv/riscv-protos.h (enum insn_type): New type.
* config/riscv/riscv-v.cc (RVV_INSN_OPERANDS_MAX): New macro.
(rvv_builder::can_duplicate_repeating_sequence_p): Align the referenced
class member.
(rvv_builder::get_merged_repeating_sequence): Ditto.
(rvv_builder::repeating_sequence_use_merge_profitable_p): New function
to evaluate the optimization cost.
(rvv_builder::get_merge_scalar_mask): New function to get the merge
mask.
(emit_scalar_move_insn): New function to emit vmv.s.x.
(emit_vlmax_integer_move_insn): New function to emit vlmax vmv.v.x.
(emit_nonvlmax_integer_move_insn): New function to emit nonvlmax
vmv.v.x.
(get_repeating_sequence_dup_machine_mode): New function to get the dup
machine mode.
(expand_vector_init_merge_repeating_sequence): New function to perform
the optimization.
(expand_vec_init): Add this vector init optimization.
* config/riscv/riscv.h (BITS_PER_WORD): New macro.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-4.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-5.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-run-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-run-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-run-3.c: New test.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
This extends the condition to more cases involving debug instructions.
gcc/
* tree-ssa-loop-manip.cc (create_iv): Try harder to find a SLOC to
put onto the increment when it is inserted after the position.
|
|
The Ada compiler gives a bogus warning:
storage_offset1.ads:16:52: warning: Constraint_Error will be raised at run
time [enabled by default]
Ironically enough, this occurs because of an intermediate conversion to an
unsigned type which is supposed to hide overflows but is counter-productive
for constants because TREE_OVERFLOW is always set for them, so it ends up
setting a bogus TREE_OVERFLOW when converting back to the original type.
The fix simply redirects INTEGER_CSTs to the other, direct path without the
intermediate conversion to the unsigned type.
gcc/
* match.pd ((T)P - (T)(P + A) -> -(T) A): Avoid artificial overflow
on constants.
gcc/testsuite/
* gnat.dg/specs/storage_offset1.ads: New test.
|
|
In s-oscons-tmplt.c, sigset is defined inside the HAVE_SOCKETS bloc.
A platform could require sigset without supporting sockets.
gcc/ada/
* s-oscons-tmplt.c: move the definition of sigset out of the
HAVE_SOCKETS bloc.
|
|
g-spogwa.adb is the body of the procedure GNAT.Sockets.Poll.G_Wait.
This is a socket specific procedure. It should only be built for
systems that support sockets.
gcc/ada/
* Makefile.rtl: Move g-spogwa$(objext) from GNATRTL_NONTASKING_OBJS
to GNATRTL_SOCKETS_OBJS
|
|
It occurs during the instantiation because the compiler forgets the context
of the generic declaration.
gcc/ada/
* freeze.adb (Wrap_Imported_Subprogram): Use Copy_Subprogram_Spec in
both cases to copy the spec of the subprogram.
|