|
This patch adds floating-point autovec expanders for vfneg, vfabs as well as
vfsqrt and the accompanying tests.
Similarly to the binop tests, there are flavors for zvfh now.
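As an illustration (my sketch, not one of the new testsuite files), the kind
of loop these expanders cover looks like:

void
test_vfneg (float *restrict dst, float *restrict src, int n)
{
  for (int i = 0; i < n; i++)
    dst[i] = -src[i];    /* vfneg.v */
}

with analogous loops using __builtin_fabsf and __builtin_sqrtf for vfabs
and vfsqrt.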
gcc/ChangeLog:
* config/riscv/autovec.md (<optab><mode>2): Add unop expanders.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/unop/abs-run.c: Add FP.
* gcc.target/riscv/rvv/autovec/unop/abs-rv32gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/unop/abs-rv64gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/unop/abs-template.h: Add FP.
* gcc.target/riscv/rvv/autovec/unop/vneg-run.c: Add FP.
* gcc.target/riscv/rvv/autovec/unop/vneg-rv32gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/unop/vneg-rv64gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/unop/vneg-template.h: Add FP.
* gcc.target/riscv/rvv/autovec/unop/abs-zvfh-run.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vfsqrt-run.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vfsqrt-rv32gcv.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vfsqrt-rv64gcv.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vfsqrt-template.h: New test.
* gcc.target/riscv/rvv/autovec/unop/vfsqrt-zvfh-run.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vneg-zvfh-run.c: New test.
* gcc.target/riscv/rvv/autovec/zvfhmin-1.c: Add unops.
|
|
This implements the floating-point autovec expanders for binary
operations: vfadd, vfsub, vfdiv, vfmul, vfmax, vfmin and adds
tests.
The existing tests are split up into non-_Float16 and _Float16
flavors as we cannot rely on the zvfh extension being present.
As long as we do not have full middle-end support we need
-ffast-math for the tests.
In order to allow proper _Float16 support this patch disables
general _Float16 promotion to float when TARGET_ZVFH is defined,
similar to TARGET_ZFH or TARGET_ZHINX.
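For illustration (a sketch rather than a verbatim test), the zvfh flavors
exercise loops like:

_Float16 a[1024], b[1024], c[1024];

void
test_vfadd (void)
{
  for (int i = 0; i < 1024; i++)
    c[i] = a[i] + b[i];    /* vfadd.vv, needs zvfh and (for now) -ffast-math */
}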
gcc/ChangeLog:
* config/riscv/autovec.md (<optab><mode>3): Implement binop
expander.
* config/riscv/riscv-protos.h (emit_vlmax_fp_insn): Declare.
(enum vxrm_field_enum): Rename this...
(enum fixed_point_rounding_mode): ...to this.
(enum frm_field_enum): Rename this...
(enum floating_point_rounding_mode): ...to this.
* config/riscv/riscv-v.cc (emit_vlmax_fp_insn): New function.
* config/riscv/riscv.cc (riscv_const_insns): Clarify const
vector handling.
(riscv_libgcc_floating_mode_supported_p): Adjust comment.
(riscv_excess_precision): Do not convert to float for ZVFH.
* config/riscv/vector-iterators.md: Add VF_AUTO iterator.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/binop/vadd-run.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vadd-rv32gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vadd-rv64gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vadd-template.h: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vdiv-run.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vdiv-rv32gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vdiv-rv64gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vdiv-template.h: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vmax-run.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vmax-rv32gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vmax-rv64gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vmax-template.h: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vmin-run.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vmin-rv32gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vmin-rv64gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vmin-template.h: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vmul-run.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vmul-rv32gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vmul-rv64gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vmul-template.h: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vrem-rv32gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vsub-run.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vsub-rv32gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vsub-rv64gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vsub-template.h: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vadd-zvfh-run.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vdiv-zvfh-run.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vmax-zvfh-run.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vmin-zvfh-run.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vmul-zvfh-run.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vsub-zvfh-run.c: New test.
* lib/target-supports.exp: Add riscv_vector_hw and riscv_zvfh_hw
target selectors.
|
|
When the destination register of a vmv.x.s needs to be sign extended to
XLEN we currently emit an sext insn. Since vmv.x.s performs this
automatically this patch adds two instruction patterns that include
sign_extend for the destination operand.
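For example (hypothetical code), extracting a narrow element into a wider
scalar:

typedef int v4si __attribute__ ((vector_size (16)));

long
get (v4si v)
{
  return v[3];    /* vmv.x.s sign-extends to XLEN; no separate sext insn */
}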
gcc/ChangeLog:
* config/riscv/vector-iterators.md: Add VI_QH iterator.
* config/riscv/autovec-opt.md
(@pred_extract_first_sextdi<mode>): New vmv.x.s pattern
that includes sign extension.
(@pred_extract_first_sextsi<mode>): Ditto for SImode.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-1.c: Ensure
that no sext insns are present.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-4.c: Ditto.
|
|
This implements the vec_set and vec_extract patterns for integer and
floating-point data types. For vec_set we broadcast the insert value to
a vector register and then perform a vslideup with effective length 1 to
the requested index.
vec_extract is done by sliding down the requested element to index 0
and v(f)mv.[xf].s to a scalar register.
The patch does not include vector-vector extraction which
will be done at a later time.
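A sketch of code that maps onto these patterns (illustrative only):

typedef int v4si __attribute__ ((vector_size (16)));

v4si
set (v4si a, int x)
{
  a[2] = x;       /* broadcast x, then vslideup with effective length 1 */
  return a;
}

int
get (v4si a)
{
  return a[3];    /* vslidedown to index 0, then vmv.x.s */
}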
gcc/ChangeLog:
* config/riscv/autovec.md (vec_set<mode>): Implement.
(vec_extract<mode><vel>): Implement.
* config/riscv/riscv-protos.h (enum insn_type): Add slide insn.
(emit_vlmax_slide_insn): Declare.
(emit_nonvlmax_slide_tu_insn): Declare.
(emit_scalar_move_insn): Export.
(emit_nonvlmax_integer_move_insn): Export.
* config/riscv/riscv-v.cc (emit_vlmax_slide_insn): New function.
(emit_nonvlmax_slide_tu_insn): New function.
(emit_vlmax_masked_mu_insn): No change.
(emit_vlmax_integer_move_insn): Export.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-4.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-zvfh-run.c:
New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-4.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-run.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-zvfh-run.c:
New test.
|
|
This patch adds the missing (u)int8_t types to the binop tests.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/binop/shift-run.c: Adapt for
(u)int8_t.
* gcc.target/riscv/rvv/autovec/binop/shift-rv32gcv.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/shift-rv64gcv.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/shift-template.h: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vadd-run.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vadd-rv32gcv.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vadd-rv64gcv.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vadd-template.h: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vand-run.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vand-rv32gcv.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vand-rv64gcv.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vand-template.h: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vdiv-run.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vdiv-rv32gcv.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vdiv-rv64gcv.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vdiv-template.h: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vmax-run.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vmax-rv32gcv.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vmax-rv64gcv.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vmax-template.h: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vmin-run.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vmin-rv32gcv.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vmin-rv64gcv.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vmin-template.h: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vmul-run.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vmul-rv32gcv.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vmul-rv64gcv.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vmul-template.h: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vor-run.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vor-rv32gcv.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vor-rv64gcv.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vor-template.h: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vrem-run.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vrem-rv32gcv.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vrem-rv64gcv.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vrem-template.h: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vsub-run.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vsub-rv32gcv.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vsub-rv64gcv.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vsub-template.h: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vxor-run.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vxor-rv32gcv.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vxor-rv64gcv.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vxor-template.h: Ditto.
|
|
This implements fully masked vectorization or a masked epilog for
AVX512 style masks which single themselves out by representing
each lane with a single bit and by using integer modes for the mask
(both much like GCN).
AVX512 is also special in that it doesn't have any instruction
to compute the mask from a scalar IV like SVE has with while_ult.
Instead the masks are produced by vector compares and the loop
control retains the scalar IV (mainly to avoid dependences on
mask generation, a suitable mask test instruction is available).
Like RVV, code generation prefers a decrementing IV, though IVOPTs
messes things up in some cases, removing that IV and replacing it
with an incrementing one used for address generation.
One of the motivating testcases is from PR108410 which in turn
is extracted from x264 where large size vectorization shows
issues with small trip loops. Execution time there improves
compared to classic AVX512 with AVX2 epilogues for the cases
of less than 32 iterations.
size  scalar    128    256    512   512e   512f
   1    9.42  11.32   9.35  11.17  15.13  16.89
   2    5.72   6.53   6.66   6.66   7.62   8.56
   3    4.49   5.10   5.10   5.74   5.08   5.73
   4    4.10   4.33   4.29   5.21   3.79   4.25
   6    3.78   3.85   3.86   4.76   2.54   2.85
   8    3.64   1.89   3.76   4.50   1.92   2.16
  12    3.56   2.21   3.75   4.26   1.26   1.42
  16    3.36   0.83   1.06   4.16   0.95   1.07
  20    3.39   1.42   1.33   4.07   0.75   0.85
  24    3.23   0.66   1.72   4.22   0.62   0.70
  28    3.18   1.09   2.04   4.20   0.54   0.61
  32    3.16   0.47   0.41   0.41   0.47   0.53
  34    3.16   0.67   0.61   0.56   0.44   0.50
  38    3.19   0.95   0.95   0.82   0.40   0.45
  42    3.09   0.58   1.21   1.13   0.36   0.40
'size' specifies the number of actual iterations, 512e is for
a masked epilog and 512f for the fully masked loop. From
4 scalar iterations on, the AVX512 masked epilog code is clearly
the winner; the fully masked variant is clearly worse and
its size benefit is also tiny.
This patch does not enable using fully masked loops or
masked epilogues by default. More work on cost modeling
and vectorization kind selection on x86_64 is necessary
for this.
Implementation-wise this introduces LOOP_VINFO_PARTIAL_VECTORS_STYLE
which could be exploited further to unify some of the flags
we have right now, but there didn't seem to be many easy things
to merge, so I'm leaving this for followups.
Mask requirements as registered by vect_record_loop_mask are kept in their
original form and recorded in a hash_set now instead of being
processed into a vector of rgroup_controls. Instead that's now
left to the final analysis phase, which tries forming the rgroup_controls
vector using while_ult and, if that fails, tries the AVX512 style,
which needs a different organization and instead fills a hash_map
with the relevant info. vect_get_loop_mask now has two implementations,
one for each of the two mask styles we then have.
I have decided against interweaving vect_set_loop_condition_partial_vectors
with conditions to do AVX512 style masking and instead opted to
"duplicate" this to vect_set_loop_condition_partial_vectors_avx512.
Likewise for vect_verify_full_masking vs vect_verify_full_masking_avx512.
The vect_prepare_for_masked_peels hunk might run into issues with
SVE, I didn't check yet but using LOOP_VINFO_RGROUP_COMPARE_TYPE
looked odd.
Bootstrapped and tested on x86_64-unknown-linux-gnu. I've run
the testsuite with --param vect-partial-vector-usage=2 with and
without -fno-vect-cost-model and filed two bugs, one ICE (PR110221)
and one latent wrong-code (PR110237).
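For reference, a minimal loop of the kind benchmarked above (my example,
not the PR108410 testcase):

void
f (int *restrict a, int *restrict b, int n)
{
  for (int i = 0; i < n; i++)
    a[i] += b[i];
}

built with AVX512 enabled and --param vect-partial-vector-usage=2, the
setting also used for the testsuite runs mentioned above.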
* tree-vectorizer.h (enum vect_partial_vector_style): New.
(_loop_vec_info::partial_vector_style): Likewise.
(LOOP_VINFO_PARTIAL_VECTORS_STYLE): Likewise.
(rgroup_controls::compare_type): Add.
(vec_loop_masks): Change from a typedef to auto_vec<>
to a structure.
* tree-vect-loop-manip.cc (vect_set_loop_condition_partial_vectors):
Adjust. Convert niters_skip to compare_type.
(vect_set_loop_condition_partial_vectors_avx512): New function
implementing the AVX512 partial vector codegen.
(vect_set_loop_condition): Dispatch to the correct
vect_set_loop_condition_partial_vectors_* function based on
LOOP_VINFO_PARTIAL_VECTORS_STYLE.
(vect_prepare_for_masked_peels): Compute LOOP_VINFO_MASK_SKIP_NITERS
in the original niter type.
* tree-vect-loop.cc (_loop_vec_info::_loop_vec_info): Initialize
partial_vector_style.
(can_produce_all_loop_masks_p): Adjust.
(vect_verify_full_masking): Produce the rgroup_controls vector
here. Set LOOP_VINFO_PARTIAL_VECTORS_STYLE on success.
(vect_verify_full_masking_avx512): New function implementing
verification of AVX512 style masking.
(vect_verify_loop_lens): Set LOOP_VINFO_PARTIAL_VECTORS_STYLE.
(vect_analyze_loop_2): Also try AVX512 style masking.
Adjust condition.
(vect_estimate_min_profitable_iters): Implement AVX512 style
mask producing cost.
(vect_record_loop_mask): Do not build the rgroup_controls
vector here but record masks in a hash-set.
(vect_get_loop_mask): Implement AVX512 style mask query,
complementing the existing while_ult style.
|
|
This adds a loop_vinfo argument for future use, making the next
patch smaller.
* tree-vectorizer.h (vect_get_loop_mask): Add loop_vec_info
argument.
* tree-vect-loop.cc (vect_get_loop_mask): Likewise.
(vectorize_fold_left_reduction): Adjust.
(vect_transform_reduction): Likewise.
(vectorizable_live_operation): Likewise.
* tree-vect-stmts.cc (vectorizable_call): Likewise.
(vectorizable_operation): Likewise.
(vectorizable_store): Likewise.
(vectorizable_load): Likewise.
(vectorizable_condition): Likewise.
|
|
This commit fixes an ICE when an optimize attribute changes the prevailing
optimization level.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105069 describes the
same ICE for the sh target, where the fix was to enable save/restore of
target specific options modified via TARGET_OPTIMIZATION_TABLE hook.
For the AVR target, -mgas-isr-prologues and -mmain-is-OS_task are those
target specific options. As they enable generation of more optimal code,
this commit adds the Optimization option property to those option records,
and that fixes the ICE.
Regression run shows no regressions, and >100 new PASSes.
PR target/110086
gcc/ChangeLog:
* config/avr/avr.opt (mgas-isr-prologues, mmain-is-OS_task):
Add Optimization option property.
gcc/testsuite/ChangeLog:
* gcc.target/avr/pr110086.c: New test.
|
|
This patch adds a new 2-instruction constant synthesis pattern:
- A non-negative square value whose square root fits into a signed 12-bit
  immediate:
  => "MOVI(.N) Ax, simm12" + "MULL Ax, Ax, Ax"
Due to the execution cost of the integer multiply instruction (MULL), this
synthesis works only when the 32-bit Integer Multiply Option is configured
and optimizing for size is specified.
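For example (hypothetical register choice), 1522756 == 1234 * 1234 and 1234
fits into simm12, so the constant can be synthesized as
"MOVI(.N) a9, 1234" + "MULL a9, a9, a9".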
gcc/ChangeLog:
* config/xtensa/xtensa.cc (xtensa_constantsynth_2insn):
Add new pattern for the abovementioned case.
|
|
It used to always return a constant 4, which is the same as the default
behavior, but doesn't take into account the effects of secondary
reloads.
Therefore, the implementation of this target hook is removed.
gcc/ChangeLog:
* config/xtensa/xtensa.cc
(TARGET_MEMORY_MOVE_COST, xtensa_memory_move_cost): Remove.
|
|
There is a const_anchor facility in cse.cc. This const_anchor
supports generating new constants by adding small offsets to an
existing constant. For example:
void __attribute__ ((noinline)) foo (long long *a)
{
*a++ = 0x2351847027482577LL;
*a++ = 0x2351847027482578LL;
}
The second constant (0x2351847027482578LL) can be computed by adding '1'
to the first constant (0x2351847027482577LL).
This is profitable if more than one instruction is needed to build the
second constant.
* For rs6000, we can enable this functionality, as the instruction
'addi' is just for this when the gap is smaller than 0x8000.
* One potential side effect of this feature:
Comparing with
"r101=0x2351847027482577LL
...
r201=0x2351847027482578LL"
The new r201 will be "r201=r101+1", and then r101 will live longer,
which can increase pressure when allocating registers.
But I feel this would be acceptable for this const_anchor feature.
With this feature, for GCC source code and SPEC object files, the
significant change is the improvement of "addi" vs. "2 or more
insns: lis+or.."; it also exposes some other optimization
opportunities, like combine/jump2. The side effect also occurs in a
few cases, but it does not impact overall performance.
gcc/ChangeLog:
* config/rs6000/rs6000.cc (TARGET_CONST_ANCHOR): New define.
gcc/testsuite/ChangeLog:
* gcc.target/powerpc/const_anchors.c: New test.
* gcc.target/powerpc/try_const_anchors_ice.c: New test.
|
|
The const_anchor in cse.cc supports integer constants only.
There is a "gcc_assert (SCALAR_INT_MODE_P (mode))" in
try_const_anchors.
In the latest code, some non-integer modes are used with const int.
For example:
"set (mem/c:BLK (xx) (const_int 0 [0]))" occurs in md files of
rs6000, i386, arm, and pa. For this, the mode may be BLKmode.
Pattern "(set (strict_low_part (xx)) (const_int xx))" could
be generated in a few ports. For this, the mode may be VOIDmode.
So, avoiding modes other than scalar integer modes in try_const_anchors
is needed.
Some discussions in the previous thread:
https://gcc.gnu.org/pipermail/gcc-patches/2023-June/621097.html
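A minimal sketch of the resulting guard (the committed hunk may differ in
detail):

/* In try_const_anchors: bail out instead of asserting.  */
if (!SCALAR_INT_MODE_P (mode))
  return NULL_RTX;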
gcc/ChangeLog:
* cse.cc (try_const_anchors): Check SCALAR_INT_MODE.
|
|
The packing in vpacksswb/vpackssdw is not a simple concat, it's an
interweave from src1 and src2 for every 128 bits (or 64 bits for the
ss_truncate result), i.e.:
dst[192-255] = ss_truncate (src2[128-255])
dst[128-191] = ss_truncate (src1[128-255])
dst[64-127] = ss_truncate (src2[0-127])
dst[0-63] = ss_truncate (src1[0-127])
The patch refined those patterns with an extra vec_select for the
interweave.
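In intrinsics terms the lane-wise interweave is visible as (a sketch using
the AVX2 intrinsic):

#include <immintrin.h>

__m256i
pack (__m256i a, __m256i b)
{
  /* Per 128-bit lane: saturated words of a, then saturated words of b.  */
  return _mm256_packs_epi16 (a, b);    /* vpacksswb */
}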
gcc/ChangeLog:
PR target/110235
* config/i386/sse.md (<sse2_avx2>_packsswb<mask_name>):
Substitute with ..
(sse2_packsswb<mask_name>): .. this, ..
(avx2_packsswb<mask_name>): .. this and ..
(avx512bw_packsswb<mask_name>): .. this.
(<sse2_avx2>_packssdw<mask_name>): Substitute with ..
(sse2_packssdw<mask_name>): .. this, ..
(avx2_packssdw<mask_name>): .. this and ..
(avx512bw_packssdw<mask_name>): .. this.
gcc/testsuite/ChangeLog:
* gcc.target/i386/avx512bw-vpackssdw-3.c: New test.
* gcc.target/i386/avx512bw-vpacksswb-3.c: New test.
|
|
Reimplement packuswb/packusdw with UNSPEC_US_TRUNCATE instead of
us_truncate.
packuswb/packusdw does unsigned saturation for a signed source, but RTL
us_truncate means unsigned saturation for an unsigned source.
So for value -1, packuswb will produce 0, but us_truncate produces
255. The patch reimplements those related patterns and functions with
UNSPEC_US_TRUNCATE instead of us_truncate.
gcc/ChangeLog:
PR target/110235
* config/i386/i386-expand.cc (ix86_split_mmx_pack): Use
UNSPEC_US_TRUNCATE instead of original us_truncate for
packusdw/packuswb.
* config/i386/mmx.md (mmx_pack<s_trunsuffix>swb): Substitute
with ..
(mmx_packsswb): .. this and ..
(mmx_packuswb): .. this.
(mmx_packusdw): Use UNSPEC_US_TRUNCATE instead of original
us_truncate.
(s_trunsuffix): Removed code iterator.
(any_s_truncate): Ditto.
* config/i386/sse.md (<sse2_avx2>_packuswb<mask_name>): Use
UNSPEC_US_TRUNCATE instead of original us_truncate.
(<sse4_1_avx2>_packusdw<mask_name>): Ditto.
* config/i386/i386.md (UNSPEC_US_TRUNCATE): New unspec_c_enum.
|
|
|
This patch would like to fix one typo when calling GET_MODE_CLASS by mode.
Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:
* config/riscv/riscv-vector-builtins-bases.cc: Fix one typo.
|
|
gcc/testsuite/ChangeLog:
* gcc.dg/lto/20091013-1_0.c: Disable stringop-overread warning.
|
|
gcc/ChangeLog:
* rtl.h (*rtx_equal_p_callback_function):
Change return type from int to bool.
(rtx_equal_p): Ditto.
(*hash_rtx_callback_function): Ditto.
* rtl.cc (rtx_equal_p): Change return type from int to bool
and adjust function body accordingly.
* early-remat.cc (scratch_equal): Ditto.
* sel-sched-ir.cc (skip_unspecs_callback): Ditto.
(hash_with_unspec_callback): Ditto.
|
|
This patch removes stor-layout.o from the front end and also removes
back end header files from gcc-consolidation.h.
gcc/m2/ChangeLog:
PR modula2/110284
* Make-lang.in (m2_OBJS): Assign $(GM2_C_OBJS).
(GM2_C_OBJS): Remove m2/stor-layout.o.
(m2/stor-layout.o): Remove rule.
* gm2-gcc/gcc-consolidation.h (rtl.h): Remove include.
(df.h): Remove include.
(except.h): Remove include.
Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>
|
|
Testing the V2 version of Manolis's fold-mem-offsets patch exposed a minor bug
in the arc backend.
The movsf_insn pattern has constraints which allow storing certain constants
to memory. reload/lra will target those alternatives under the right
circumstances. However the insn's condition requires that one of the two
operands must be a register.
Thus if a pass were to force re-recognition of the pattern we can get an
unrecognized insn failure.
This patch adjusts the conditions to more closely match movsi_insn. More
specifically it allows storing a constant into a limited set of memory
operands (as defined by the Usc constraint). movqi_insn has the same
core problem and gets the same solution.
Committed after the tester validated there are no regressions.
gcc/
* config/arc/arc.md (movqi_insn): Allow certain constants to
be stored into memory in the pattern's condition.
(movsf_insn): Similarly.
|
|
This patch extends ipa-fnsummary to anticipate statements that will be removed
by SRA. This is done by looking for calls passing addresses of automatic
variables. In the function body we look for dereferences from pointers of such
variables and mark them with the new not_sra_candidate condition.
This is just a first step which is overly optimistic. We do not try to prove that
a given automatic variable will not be SRAed even after inlining. We now also
optimistically assume that the transformation will always happen. I will restrict
this in a followup patch, but I think it is useful to gather some data on how
much code is affected by this.
This is motivated by PR109849 where we fail to fully inline push_back.
The patch alone does not solve the problem even for -O3, but improves
analysis in this case.
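A sketch of the pattern being anticipated (illustrative code):

static void
g (int *p)
{
  *p += 1;        /* dereference of a pointer to an automatic */
}

int
f (void)
{
  int x = 0;
  g (&x);         /* call passing the address of an automatic variable */
  return x;       /* after inlining, SRA can remove the loads/stores */
}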
gcc/ChangeLog:
PR tree-optimization/109849
* ipa-fnsummary.cc (evaluate_conditions_for_known_args): Add new parameter
ES; handle ipa_predicate::not_sra_candidate.
(evaluate_properties_for_edge): Pass es to
evaluate_conditions_for_known_args.
(ipa_fn_summary_t::duplicate): Handle sra candidates.
(dump_ipa_call_summary): Dump points_to_possible_sra_candidate.
(load_or_store_of_ptr_parameter): New function.
(points_to_possible_sra_candidate_p): New function.
(analyze_function_body): Initialize points_to_possible_sra_candidate;
determine sra predicates.
(estimate_ipcp_clone_size_and_time): Update call of
evaluate_conditions_for_known_args.
(remap_edge_params): Update points_to_possible_sra_candidate.
(read_ipa_call_summary): Stream points_to_possible_sra_candidate
(write_ipa_call_summary): Likewise.
* ipa-predicate.cc (ipa_predicate::add_clause): Handle not_sra_candidate.
(dump_condition): Dump it.
* ipa-predicate.h (struct inline_param_summary): Add
points_to_possible_sra_candidate.
gcc/testsuite/ChangeLog:
PR tree-optimization/109849
* g++.dg/ipa/devirt-45.C: Update template.
|
|
This patch refactors the three places in the i386.md backend where we
set the carry flag into a new ix86_expand_carry helper function, which
allows Jakub's recently added uaddc<mode>5 and usubc<mode>5 expanders
to take advantage of the recently added support for the stc instruction.
2023-06-18 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
* config/i386/i386-expand.cc (ix86_expand_carry): New helper
function for setting the carry flag.
(ix86_expand_builtin) <handlecarry>: Use it here.
* config/i386/i386-protos.h (ix86_expand_carry): Prototype here.
* config/i386/i386.md (uaddc<mode>5): Use ix86_expand_carry.
(usubc<mode>5): Likewise.
|
|
This clean-up improves consistency within i386.md by using QImode for
the constant shift count in patterns that specify a mode.
2023-06-18 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
* config/i386/i386.md (*concat<mode><dwi>3_1): Use QImode
for the immediate constant shift count.
(*concat<mode><dwi>3_2): Likewise.
(*concat<mode><dwi>3_3): Likewise.
(*concat<mode><dwi>3_4): Likewise.
(*concat<mode><dwi>3_5): Likewise.
(*concat<mode><dwi>3_6): Likewise.
|
|
Use default argument when callback function is not required to merge
rtx_equal_p and hash_rtx functions with their callback variants.
gcc/ChangeLog:
* cse.cc (hash_rtx_cb): Rename to hash_rtx.
(hash_rtx): Remove.
* early-remat.cc (remat_candidate_hasher::equal): Update
to call rtx_equal_p with rtx_equal_p_callback_function argument.
* rtl.cc (rtx_equal_p_cb): Rename to rtx_equal_p.
(rtx_equal_p): Remove.
* rtl.h (rtx_equal_p): Add rtx_equal_p_callback_function
argument with NULL default value.
(rtx_equal_p_cb): Remove function declaration.
(hash_rtx_cb): Ditto.
(hash_rtx): Add hash_rtx_callback_function argument
with NULL default value.
* sel-sched-ir.cc (free_nop_pool): Update function comment.
(skip_unspecs_callback): Ditto.
(vinsn_init): Update to call hash_rtx with
hash_rtx_callback_function argument.
(vinsn_equal_p): Ditto.
|
|
This patch adds support for the float16 tuple type.
gcc/ChangeLog:
* config/riscv/genrvv-type-indexer.cc (valid_type): Enable FP16 tuple.
* config/riscv/riscv-modes.def (RVV_TUPLE_MODES): New macro.
(ADJUST_ALIGNMENT): Ditto.
(RVV_TUPLE_PARTIAL_MODES): Ditto.
(ADJUST_NUNITS): Ditto.
* config/riscv/riscv-vector-builtins-types.def (vfloat16mf4x2_t):
New types.
(vfloat16mf4x3_t): Ditto.
(vfloat16mf4x4_t): Ditto.
(vfloat16mf4x5_t): Ditto.
(vfloat16mf4x6_t): Ditto.
(vfloat16mf4x7_t): Ditto.
(vfloat16mf4x8_t): Ditto.
(vfloat16mf2x2_t): Ditto.
(vfloat16mf2x3_t): Ditto.
(vfloat16mf2x4_t): Ditto.
(vfloat16mf2x5_t): Ditto.
(vfloat16mf2x6_t): Ditto.
(vfloat16mf2x7_t): Ditto.
(vfloat16mf2x8_t): Ditto.
(vfloat16m1x2_t): Ditto.
(vfloat16m1x3_t): Ditto.
(vfloat16m1x4_t): Ditto.
(vfloat16m1x5_t): Ditto.
(vfloat16m1x6_t): Ditto.
(vfloat16m1x7_t): Ditto.
(vfloat16m1x8_t): Ditto.
(vfloat16m2x2_t): Ditto.
(vfloat16m2x3_t): Ditto.
(vfloat16m2x4_t): Ditto.
(vfloat16m4x2_t): Ditto.
* config/riscv/riscv-vector-builtins.def (vfloat16mf4x2_t): New macro.
(vfloat16mf4x3_t): Ditto.
(vfloat16mf4x4_t): Ditto.
(vfloat16mf4x5_t): Ditto.
(vfloat16mf4x6_t): Ditto.
(vfloat16mf4x7_t): Ditto.
(vfloat16mf4x8_t): Ditto.
(vfloat16mf2x2_t): Ditto.
(vfloat16mf2x3_t): Ditto.
(vfloat16mf2x4_t): Ditto.
(vfloat16mf2x5_t): Ditto.
(vfloat16mf2x6_t): Ditto.
(vfloat16mf2x7_t): Ditto.
(vfloat16mf2x8_t): Ditto.
(vfloat16m1x2_t): Ditto.
(vfloat16m1x3_t): Ditto.
(vfloat16m1x4_t): Ditto.
(vfloat16m1x5_t): Ditto.
(vfloat16m1x6_t): Ditto.
(vfloat16m1x7_t): Ditto.
(vfloat16m1x8_t): Ditto.
(vfloat16m2x2_t): Ditto.
(vfloat16m2x3_t): Ditto.
(vfloat16m2x4_t): Ditto.
(vfloat16m4x2_t): Ditto.
* config/riscv/riscv-vector-switch.def (TUPLE_ENTRY): New.
* config/riscv/riscv.md: New.
* config/riscv/vector-iterators.md: New.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/base/tuple-28.c: New test.
* gcc.target/riscv/rvv/base/tuple-29.c: New test.
* gcc.target/riscv/rvv/base/tuple-30.c: New test.
* gcc.target/riscv/rvv/base/tuple-31.c: New test.
* gcc.target/riscv/rvv/base/tuple-32.c: New test.
|
|
|
This patch splits out two (independent) minor changes to i386-expand.cc's
ix86_expand_move from a larger patch, given that it's better to review
and commit these independent pieces separately from a more complex patch.
The first change is to test for CONST_WIDE_INT_P before calling
ix86_convert_const_wide_int_to_broadcast. Whilst stepping through
this function in gdb, I was surprised that the code was continually
jumping into this function with operands that obviously weren't
appropriate.
The second change is to generalize the optimization for efficiently
moving a TImode value to V1TImode (via V2DImode), to cover all 128-bit
vector modes.
Hence for the test case:
typedef unsigned long uv2di __attribute__ ((__vector_size__ (16)));
uv2di foo2(__int128 x) { return (uv2di)x; }
we'd previously move via memory with:
foo2: movq %rdi, -24(%rsp)
movq %rsi, -16(%rsp)
movdqa -24(%rsp), %xmm0
ret
with this patch we now generate with -O2 (the same as V1TImode):
foo2: movq %rdi, %xmm0
movq %rsi, %xmm1
punpcklqdq %xmm1, %xmm0
ret
and with -O2 -msse4 the even better:
foo2: movq %rdi, %xmm0
pinsrq $1, %rsi, %xmm0
ret
The new test case is unimaginatively called sse2-v1ti-mov-2.c given
the original test case just for V1TI mode was called sse2-v1ti-mov-1.c.
2023-06-17 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
* config/i386/i386-expand.cc (ix86_expand_move): Check that OP1 is
CONST_WIDE_INT_P before calling ix86_convert_wide_int_to_broadcast.
Generalize special case for converting TImode to V1TImode to handle
all 128-bit vector conversions.
gcc/testsuite/ChangeLog
* gcc.target/i386/sse2-v1ti-mov-2.c: New test case.
|
|
gcc-ar: Remove code duplication.
Preparatory refactoring that simplifies things by eliminating
some duplicated code, before trying to fix PR 77576.
I believe this stands on its own regardless of the PR.
It also saves one nargv element when we have a plugin and
three when not.
gcc/
* gcc-ar.cc (main): Refactor to slightly reduce code
duplication. Avoid unnecessary elements in nargv.
|
|
The RVV integer reduction has 3 different patterns for zve128+, zve64
and zve32. They take the same iterator with different attributes.
However, we need the generated function code_for_reduc (code, mode1, mode2).
The implementation of code_for_reduc may look like below.
code_for_reduc (code, mode1, mode2)
{
if (code == max && mode1 == VNx1QI && mode2 == VNx1QI)
return CODE_FOR_pred_reduc_maxvnx1qivnx16qi; // ZVE128+
if (code == max && mode1 == VNx1QI && mode2 == VNx1QI)
return CODE_FOR_pred_reduc_maxvnx1qivnx8qi; // ZVE64
if (code == max && mode1 == VNx1QI && mode2 == VNx1QI)
return CODE_FOR_pred_reduc_maxvnx1qivnx4qi; // ZVE32
}
Thus there will be a problem here. For example with zve32, we will have
code_for_reduc (max, VNx1QI, VNx1QI), which will return the code of
ZVE128+ instead of ZVE32 logically.
This patch merges the 3 patterns into one pattern, and passes both the
input_vector and the ret_vector modes to code_for_reduc. For example, ZVE32
will be code_for_reduc (max, VNx1QI, VNx8QI), and then the correct code of
ZVE32 will be returned as expected.
Please note both GCC 13 and 14 are impacted by this issue.
Signed-off-by: Pan Li <pan2.li@intel.com>
Co-authored-by: Juzhe-Zhong <juzhe.zhong@rivai.ai>
PR target/110265
gcc/ChangeLog:
* config/riscv/riscv-vector-builtins-bases.cc: Add ret_mode for
integer reduction expand.
* config/riscv/vector-iterators.md: Add VQI, VHI, VSI and VDI,
and the LMUL1 attr respectively.
* config/riscv/vector.md
(@pred_reduc_<reduc><mode><vlmul1>): Removed.
(@pred_reduc_<reduc><mode><vlmul1_zve64>): Likewise.
(@pred_reduc_<reduc><mode><vlmul1_zve32>): Likewise.
(@pred_reduc_<reduc><VQI:mode><VQI_LMUL1:mode>): New pattern.
(@pred_reduc_<reduc><VHI:mode><VHI_LMUL1:mode>): Likewise.
(@pred_reduc_<reduc><VSI:mode><VSI_LMUL1:mode>): Likewise.
(@pred_reduc_<reduc><VDI:mode><VDI_LMUL1:mode>): Likewise.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/base/pr110265-1.c: New test.
* gcc.target/riscv/rvv/base/pr110265-1.h: New test.
* gcc.target/riscv/rvv/base/pr110265-2.c: New test.
* gcc.target/riscv/rvv/base/pr110265-2.h: New test.
* gcc.target/riscv/rvv/base/pr110265-3.c: New test.
|
|
This patch fixes an issue that happens on both GCC-13 and GCC-14.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110264
The testcase is too big and I failed to reduce it, so I didn't append a
test to this patch.
This patch should not only land in GCC-14 but also be backported to GCC-13.
PR target/110264
gcc/ChangeLog:
* config/riscv/riscv-vsetvl.cc (insert_vsetvl): Fix bug.
|
|
In CL 384695 I simplified the code that built lists of benchmarks,
examples, and fuzz tests, and managed to break it. This CL corrects
the code to once again make the benchmarks available, and to run
the examples with output and the fuzz targets.
Doing this revealed a test failure in internal/fuzz on 32-bit x86:
a signalling NaN is turned into a quiet NaN on the 387 floating-point
stack that GCC uses by default. This CL skips the test.
Fixes golang/go#60826
Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/503798
|
|
While the design of these builtins in clang is questionable,
rather than being say
unsigned __builtin_addc (unsigned, unsigned, bool, bool *)
so that it is clear they add two [0, 0xffffffff] range numbers
plus one [0, 1] range carry in and give [0, 0xffffffff] range
return plus [0, 1] range carry out, they actually instead
add 3 [0, 0xffffffff] values together but the carry out
isn't then the expected [0, 2] value because
0xffffffffULL + 0xffffffff + 0xffffffff is 0x2fffffffd,
but just [0, 1] whether there was any overflow at all.
It is something used in the wild and shorter to write than the
corresponding
#define __builtin_addc(a,b,carry_in,carry_out) \
({ unsigned _s; \
unsigned _c1 = __builtin_uadd_overflow (a, b, &_s); \
unsigned _c2 = __builtin_uadd_overflow (_s, carry_in, &_s); \
*(carry_out) = (_c1 | _c2); \
_s; })
and so a canned builtin for something people could often use.
It isn't that hard to maintain on the GCC side, as we just lower
it to two .ADD_OVERFLOW calls early, and the already committed
pattern recognition code can then make .UADDC/.USUBC calls out of
that if the carry in is in [0, 1] range and the corresponding
optab is supported by the target.
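Typical usage for a double-word addition would then be (illustrative):

unsigned
add_dw (unsigned a_lo, unsigned a_hi, unsigned b_lo, unsigned b_hi,
        unsigned *hi)
{
  unsigned c;
  unsigned lo = __builtin_addc (a_lo, b_lo, 0, &c);
  *hi = __builtin_addc (a_hi, b_hi, c, &c);
  return lo;
}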
2023-06-16 Jakub Jelinek <jakub@redhat.com>
PR middle-end/79173
* builtin-types.def (BT_FN_UINT_UINT_UINT_UINT_UINTPTR,
BT_FN_ULONG_ULONG_ULONG_ULONG_ULONGPTR,
BT_FN_ULONGLONG_ULONGLONG_ULONGLONG_ULONGLONG_ULONGLONGPTR): New
types.
* builtins.def (BUILT_IN_ADDC, BUILT_IN_ADDCL, BUILT_IN_ADDCLL,
BUILT_IN_SUBC, BUILT_IN_SUBCL, BUILT_IN_SUBCLL): New builtins.
* builtins.cc (fold_builtin_addc_subc): New function.
(fold_builtin_varargs): Handle BUILT_IN_{ADD,SUB}C{,L,LL}.
* doc/extend.texi (__builtin_addc, __builtin_subc): Document.
* gcc.target/i386/pr79173-11.c: New test.
* gcc.dg/builtin-addc-1.c: New test.
|
|
The following testcase ICEs, because I misremembered what the return value
from match_arith_overflow is. It isn't true if __builtin_*_overflow was
matched, but it is true only in the BIT_NOT_EXPR case if stmt was removed.
So, if match_arith_overflow matches something, gsi_stmt (gsi) will not
be stmt and match_uaddc_usubc will be confused and can ICE.
The following patch fixes it by checking if gsi_stmt (gsi) == stmt,
in that case we know it is still a PLUS_EXPR/MINUS_EXPR and we can try to
pattern match it further as UADDC/USUBC.
2023-06-16 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/110271
* tree-ssa-math-opts.cc (math_opts_dom_walker::after_dom_children)
<case PLUS_EXPR>: Ignore return value from match_arith_overflow,
instead call match_uaddc_usubc only if gsi_stmt (gsi) is still stmt.
* gcc.c-torture/compile/pr110271.c: New test.
|
|
As discussed in
https://gcc.gnu.org/pipermail/gcc-patches/2023-June/621976.html this
should put the autotools generated files in sync with what they were
generated from (and make an automated checker happy).
Tested by bootstrapping on top of a commit from only a few revisions ago.
zlib/ChangeLog:
2023-06-16 Martin Jambor <mjambor@suse.cz>
* Makefile.in: Regenerate.
* configure: Likewise.
gcc/ChangeLog:
2023-06-16 Martin Jambor <mjambor@suse.cz>
* configure: Regenerate.
|
|
This patch addresses the last remaining issue with PR target/31985, that
GCC could make better use of memory addressing modes when implementing
double word addition. This is achieved by adding a define_insn_and_split
that combines an *add<dwi>3_doubleword with a *concat<mode><dwi>3, so
that the components of the concat can be used directly, without first
being loaded into a double word register.
For test_c in the bugzilla PR:
Before:
pushl %ebx
subl $16, %esp
movl 28(%esp), %eax
movl 36(%esp), %ecx
movl 32(%esp), %ebx
movl 24(%esp), %edx
addl %ecx, %eax
adcl %ebx, %edx
movl %eax, 8(%esp)
movl %edx, 12(%esp)
addl $16, %esp
popl %ebx
ret
After:
test_c:
subl $20, %esp
movl 36(%esp), %eax
movl 32(%esp), %edx
addl 28(%esp), %eax
adcl 24(%esp), %edx
movl %eax, 8(%esp)
movl %edx, 12(%esp)
addl $20, %esp
ret
2023-06-16 Roger Sayle <roger@nextmovesoftware.com>
Uros Bizjak <ubizjak@gmail.com>
gcc/ChangeLog
PR target/31985
* config/i386/i386.md (*add<dwi>3_doubleword_concat): New
define_insn_and_split combine *add<dwi>3_doubleword with
a *concat<mode><dwi>3 for more efficient lowering after reload.
gcc/testsuite/ChangeLog
PR target/31985
* gcc.target/i386/pr31985.c: New test case.
|
|
IRA adds conflicts to the pseudos from insns that can throw exceptions
internally even if the exception code is final for the function and
the pseudo value is not used in the exception code. This results in
spilling a pseudo in a loop (see
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110215).
The following patch fixes the problem.
PR rtl-optimization/110215
gcc/ChangeLog:
* ira-lives.cc: Include except.h.
(process_bb_node_lives): Ignore conflicts from cleanup exceptions
when the pseudo does not live at the exception landing pad.
|
|
macOS SDK headers using the CF_ENUM macro can expand to invalid C++ code
of the form:
typedef enum T : BaseType T;
i.e. an elaborated-type-specifier with an additional enum-base.
Upstream LLVM can be made to accept the above construct with
-Wno-error=elaborated-enum-base.
This patch adds the -Welaborated-enum-base warning to GCC and adjusts
the C++ parser to emit this warning instead of rejecting this code
outright.
The macro expansion in the macOS headers occurs in the case that the
compiler declares support for enums with underlying type using
__has_feature, see
https://gcc.gnu.org/pipermail/gcc-patches/2023-May/618450.html
GCC rejecting this construct outright means that GCC fails to bootstrap
on Darwin in the case that it (correctly) implements __has_feature and
declares support for C++ enums with underlying type.
With this patch, GCC can bootstrap on Darwin in combination with the
(WIP) __has_feature patch posted at:
https://gcc.gnu.org/pipermail/gcc-patches/2023-May/617878.html
gcc/c-family/ChangeLog:
* c.opt (Welaborated-enum-base): New.
gcc/ChangeLog:
* doc/invoke.texi: Document -Welaborated-enum-base.
gcc/cp/ChangeLog:
* parser.cc (cp_parser_enum_specifier): Don't reject
elaborated-type-specifier with enum-base, instead emit new
Welaborated-enum-base warning.
gcc/testsuite/ChangeLog:
* g++.dg/cpp0x/enum40.C: Adjust expected diagnostics.
* g++.dg/cpp0x/forw_enum6.C: Likewise.
* g++.dg/cpp0x/elab-enum-base.C: New test.
|
|
Similar to the low-half patterns, we want to match both ashiftrt and
lshiftrt with the truncate for SHRN2. We reuse the SHIFTRT iterator
and the AARCH64_VALID_SHRN_OP check to help, but because we expand the
high-half patterns by their gen_* names we need to disambiguate all the
different trunc+shift combinations in the pattern name, which leads to a
slight renaming of the builtins. The AARCH64_VALID_SHRN_OP check on the
expander and the define_insns ensures that no invalid combination ends
up getting matched.
Bootstrapped and tested on aarch64-none-linux-gnu and
aarch64_be-none-elf.
gcc/ChangeLog:
* config/aarch64/aarch64-simd-builtins.def (shrn2_n): Rename builtins to...
(ushrn2_n): ... This.
(sqshrn2_n): Rename builtins to...
(ssqshrn2_n): ... This.
(uqshrn2_n): Rename builtins to...
(uqushrn2_n): ... This.
* config/aarch64/arm_neon.h (vqshrn_high_n_s16): Adjust for the above.
(vqshrn_high_n_s32): Likewise.
(vqshrn_high_n_s64): Likewise.
(vqshrn_high_n_u16): Likewise.
(vqshrn_high_n_u32): Likewise.
(vqshrn_high_n_u64): Likewise.
(vshrn_high_n_s16): Likewise.
(vshrn_high_n_s32): Likewise.
(vshrn_high_n_s64): Likewise.
(vshrn_high_n_u16): Likewise.
(vshrn_high_n_u32): Likewise.
(vshrn_high_n_u64): Likewise.
* config/aarch64/aarch64-simd.md (aarch64_<shrn_op>shrn2_n<mode>_insn_le):
Rename to...
(aarch64_<shrn_op><sra_op>shrn2_n<mode>_insn_le): ... This.
Use SHIFTRT iterator and AARCH64_VALID_SHRN_OP check.
(aarch64_<shrn_op>shrn2_n<mode>_insn_be): Rename to...
(aarch64_<shrn_op><sra_op>shrn2_n<mode>_insn_be): ... This.
Use SHIFTRT iterator and AARCH64_VALID_SHRN_OP check.
(aarch64_<shrn_op>shrn2_n<mode>): Rename to...
(aarch64_<shrn_op><sra_op>shrn2_n<mode>): ... This.
Update expander for the above.
|
|
This patch is large in lines of code, but it is a fairly regular
extension of the first patch as it converts the high-half patterns
to standard RTL codes in the same fashion as the first patch did for the
low-half ones.
This now allows us to remove the unspec codes for these instructions as
there are no more uses of them left.
Bootstrapped and tested on aarch64-none-linux-gnu and
aarch64_be-none-elf.
gcc/ChangeLog:
* config/aarch64/aarch64-simd-builtins.def (shrn2): Rename builtins to...
(shrn2_n): ... This.
(rshrn2): Rename builtins to...
(rshrn2_n): ... This.
* config/aarch64/arm_neon.h (vrshrn_high_n_s16): Adjust for the above.
(vrshrn_high_n_s32): Likewise.
(vrshrn_high_n_s64): Likewise.
(vrshrn_high_n_u16): Likewise.
(vrshrn_high_n_u32): Likewise.
(vrshrn_high_n_u64): Likewise.
(vshrn_high_n_s16): Likewise.
(vshrn_high_n_s32): Likewise.
(vshrn_high_n_s64): Likewise.
(vshrn_high_n_u16): Likewise.
(vshrn_high_n_u32): Likewise.
(vshrn_high_n_u64): Likewise.
* config/aarch64/aarch64-simd.md (*aarch64_<srn_op>shrn<mode>2_vect_le):
Delete.
(*aarch64_<srn_op>shrn<mode>2_vect_be): Likewise.
(aarch64_shrn2<mode>_insn_le): Likewise.
(aarch64_shrn2<mode>_insn_be): Likewise.
(aarch64_shrn2<mode>): Likewise.
(aarch64_rshrn2<mode>_insn_le): Likewise.
(aarch64_rshrn2<mode>_insn_be): Likewise.
(aarch64_rshrn2<mode>): Likewise.
(aarch64_<sur>q<r>shr<u>n2_n<mode>_insn_le): Likewise.
(aarch64_<shrn_op>shrn2_n<mode>_insn_le): New define_insn.
(aarch64_<sur>q<r>shr<u>n2_n<mode>_insn_be): Delete.
(aarch64_<shrn_op>shrn2_n<mode>_insn_be): New define_insn.
(aarch64_<sur>q<r>shr<u>n2_n<mode>): Delete.
(aarch64_<shrn_op>shrn2_n<mode>): New define_expand.
(aarch64_<shrn_op>rshrn2_n<mode>_insn_le): New define_insn.
(aarch64_<shrn_op>rshrn2_n<mode>_insn_be): New define_insn.
(aarch64_<shrn_op>rshrn2_n<mode>): New define_expand.
(aarch64_sqshrun2_n<mode>_insn_le): New define_insn.
(aarch64_sqshrun2_n<mode>_insn_be): New define_insn.
(aarch64_sqshrun2_n<mode>): New define_expand.
(aarch64_sqrshrun2_n<mode>_insn_le): New define_insn.
(aarch64_sqrshrun2_n<mode>_insn_be): New define_insn.
(aarch64_sqrshrun2_n<mode>): New define_expand.
* config/aarch64/iterators.md (UNSPEC_SQSHRUN, UNSPEC_SQRSHRUN,
UNSPEC_SQSHRN, UNSPEC_UQSHRN, UNSPEC_SQRSHRN, UNSPEC_UQRSHRN):
Delete unspec values.
(VQSHRN_N): Delete int iterator.
|
|
The first patch in the series has some fallout in the testsuite,
particularly gcc.target/aarch64/shrn-combine-2.c.
Our previous patterns for SHRN matched both
(truncate (ashiftrt (x) (N))) and (truncate (lshiftrt (x) (N)))
as these are equivalent for the shift amounts involved.
In our refactoring, however, we mapped shrn to truncate+lshiftrt.
The fix here is to iterate over ashiftrt,lshiftrt in the pattern for it.
However, we don't want to allow ashiftrt for us_truncate or lshiftrt for
ss_truncate from the ALL_TRUNC iterator.
This patch adds an AARCH64_VALID_SHRN_OP helper to gate the valid
combinations of truncations and shifts.
Bootstrapped and tested on aarch64-none-linux-gnu and
aarch64_be-none-elf.
gcc/ChangeLog:
* config/aarch64/aarch64.h (AARCH64_VALID_SHRN_OP): Define.
* config/aarch64/aarch64-simd.md
(*aarch64_<shrn_op>shrn_n<mode>_insn<vczle><vczbe>): Rename to...
(*aarch64_<shrn_op><shrn_s>shrn_n<mode>_insn<vczle><vczbe>): ... This.
Use SHIFTRT iterator and add AARCH64_VALID_SHRN_OP to condition.
* config/aarch64/iterators.md (shrn_s): New code attribute.
|
|
Some instructions from the previous patch have scalar forms:
SQSHRN,SQRSHRN,UQSHRN,UQRSHRN,SQSHRUN,SQRSHRUN.
This patch converts the patterns for these to use standard RTL codes.
Their MD patterns deviate slightly from the vector forms mostly due to
things like operands being scalar rather than vectors.
One nuance is in the SQSHRUN,SQRSHRUN patterns. These end in a truncate
to the scalar narrow mode e.g. SI -> QI. This gets simplified by the
RTL passes to a subreg rather than keeping it as a truncate.
So we end up representing these without the truncate and in the expander
read the narrow subreg in order to comply with the expected width of the
intrinsic.
Bootstrapped and tested on aarch64-none-linux-gnu and
aarch64_be-none-elf.
gcc/ChangeLog:
* config/aarch64/aarch64-simd.md (aarch64_<sur>q<r>shr<u>n_n<mode>):
Rename to...
(aarch64_<shrn_op>shrn_n<mode>): ... This. Reimplement with RTL codes.
(*aarch64_<shrn_op>rshrn_n<mode>_insn): New define_insn.
(aarch64_sqrshrun_n<mode>_insn): Likewise.
(aarch64_sqshrun_n<mode>_insn): Likewise.
(aarch64_<shrn_op>rshrn_n<mode>): New define_expand.
(aarch64_sqshrun_n<mode>): Likewise.
(aarch64_sqrshrun_n<mode>): Likewise.
* config/aarch64/iterators.md (V2XWIDE): Add HI and SI modes.
|
|
This patch reimplements the MD patterns for the instructions that
perform narrowing right shifts with optional rounding and saturation
using standard RTL codes rather than unspecs.
There are four groups of patterns involved:
* Simple narrowing shifts with optional signed or unsigned truncation:
SHRN, SQSHRN, UQSHRN. These are expressed as a truncation operation of
a right shift. The matrix of valid combinations looks like this:
            | ashiftrt | lshiftrt |
------------+----------+----------+
ss_truncate |  SQSHRN  |    X     |
us_truncate |     X    |  UQSHRN  |
truncate    |     X    |   SHRN   |
------------+----------+----------+
* Narrowing shifts with rounding with optional signed or unsigned
truncation: RSHRN, SQRSHRN, UQRSHRN. These follow the same
combinations of truncation and shift codes as above, but also perform
intermediate widening of the results in order to represent the addition
of the rounding constant. This group also corrects an existing
inaccuracy for RSHRN where we don't currently model the intermediate
widening for rounding.
* The somewhat special "Signed saturating Shift Right Unsigned Narrow":
SQSHRUN. Similar to the SQXTUN instructions, these perform a
saturating truncation that isn't represented by US_TRUNCATE or
SS_TRUNCATE but needs to use a clamping operation followed by a
TRUNCATE.
* The rounding version of the above: SQRSHRUN. It needs the special
clamping truncate representation but with an intermediate widening and
rounding addition.
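In C terms the rounding forms compute roughly (a sketch of the semantics,
not the RTL representation):

/* RSHRN-style rounding narrowing shift, 16 -> 8 bits, for 0 < n <= 8.  */
unsigned char
rshrn_scalar (unsigned short x, int n)
{
  unsigned int wide = (unsigned int) x + (1u << (n - 1));  /* widen + round */
  return (unsigned char) (wide >> n);                      /* shift + narrow */
}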
Besides using standard RTL codes for all of the above instructions, this
patch allows us to get rid of the explicit define_insns and
define_expands for SHRN and RSHRN.
Bootstrapped and tested on aarch64-none-linux-gnu and
aarch64_be-none-elf. We've got pretty thorough execute tests in
advsimd-intrinsics.exp that exercise these, and many instances of these
instructions get constant-folded away during optimisation while the
validation still passes (during development, where I was figuring out the
details of the semantics, they did uncover failures), so I'm fairly
confident in the representation.
gcc/ChangeLog:
* config/aarch64/aarch64-simd-builtins.def (shrn): Rename builtins to...
(shrn_n): ... This.
(rshrn): Rename builtins to...
(rshrn_n): ... This.
* config/aarch64/arm_neon.h (vshrn_n_s16): Adjust for the above.
(vshrn_n_s32): Likewise.
(vshrn_n_s64): Likewise.
(vshrn_n_u16): Likewise.
(vshrn_n_u32): Likewise.
(vshrn_n_u64): Likewise.
(vrshrn_n_s16): Likewise.
(vrshrn_n_s32): Likewise.
(vrshrn_n_s64): Likewise.
(vrshrn_n_u16): Likewise.
(vrshrn_n_u32): Likewise.
(vrshrn_n_u64): Likewise.
* config/aarch64/aarch64-simd.md
(*aarch64_<srn_op>shrn<mode><vczle><vczbe>): Delete.
(aarch64_shrn<mode>): Likewise.
(aarch64_rshrn<mode><vczle><vczbe>_insn): Likewise.
(aarch64_rshrn<mode>): Likewise.
(aarch64_<sur>q<r>shr<u>n_n<mode>_insn<vczle><vczbe>): Likewise.
(aarch64_<sur>q<r>shr<u>n_n<mode>): Likewise.
(*aarch64_<shrn_op>shrn_n<mode>_insn<vczle><vczbe>): New define_insn.
(*aarch64_<shrn_op>rshrn_n<mode>_insn<vczle><vczbe>): Likewise.
(*aarch64_sqshrun_n<mode>_insn<vczle><vczbe>): Likewise.
(*aarch64_sqrshrun_n<mode>_insn<vczle><vczbe>): Likewise.
(aarch64_<shrn_op>shrn_n<mode>): New define_expand.
(aarch64_<shrn_op>rshrn_n<mode>): Likewise.
(aarch64_sqshrun_n<mode>): Likewise.
(aarch64_sqrshrun_n<mode>): Likewise.
* config/aarch64/iterators.md (ALL_TRUNC): New code iterator.
(TRUNCEXTEND): New code attribute.
(TRUNC_SHIFT): Likewise.
(shrn_op): Likewise.
* config/aarch64/predicates.md (aarch64_simd_umax_quarter_mode):
New predicate.
|
|
This patch would like to fix one maybe-uninitialized warning. Aka:
riscv-vsetvl.cc:4354:3: error: 'vsetvl_rinsn' may be used uninitialized [-Werror=maybe-uninitialized]
Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:
* config/riscv/riscv-vsetvl.cc
(pass_vsetvl::global_eliminate_vsetvl_insn): Initialize var by NULL.
|
|
The following adds two patterns simplifying comparisons,
uns < (typeof uns)(uns != 0) is always false and x != (typeof x)(x == 0)
is always true.
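For instance (my example), the first pattern lets

int
f (unsigned x)
{
  return x < (unsigned) (x != 0);    /* folds to 0 */
}

fold to a constant: x == 0 gives 0 < 0 and x >= 1 gives x < 1, both false.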
PR tree-optimization/110278
* match.pd (uns < (typeof uns)(uns != 0) -> false): New.
(x != (typeof x)(x == 0) -> true): Likewise.
|
|
It adjusts the preprocess, compile and link flags, which allows changing
the default -lmsvcrt library to another one provided by the MinGW runtime.
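For example (illustrative invocation), "gcc hello.c -mcrtdll=msvcr120"
would link against msvcr120 instead of the default msvcrt.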
gcc/ChangeLog:
* config/i386/mingw-w64.h (CPP_SPEC): Adjust for -mcrtdll=.
(REAL_LIBGCC_SPEC): New define.
* config/i386/mingw.opt: Add mcrtdll=
* config/i386/mingw32.h (CPP_SPEC): Adjust for -mcrtdll=.
(REAL_LIBGCC_SPEC): Adjust for -mcrtdll=.
(STARTFILE_SPEC): Adjust for -mcrtdll=.
* doc/invoke.texi: Add mcrtdll= documentation.
Signed-off-by: Jonathan Yong <10walls@gmail.com>
|
|
Support for __attribute__ ((code_readable)). Takes up to one argument of
"yes", "no" or "pcrel". This will change the code readability setting for just
that function. If no argument is supplied, then the setting is "yes".
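Usage might look like (illustrative):

void __attribute__ ((code_readable ("pcrel"))) f (void);
void __attribute__ ((code_readable)) g (void);    /* same as "yes" */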
gcc/ChangeLog:
* config/mips/mips.cc (enum mips_code_readable_setting): New enum.
(mips_handle_code_readable_attr): New static function.
(mips_get_code_readable_attr): New static enum function.
(mips_set_current_function): Set the code_readable mode.
(mips_option_override): Same as above.
* doc/extend.texi: Document code_readable.
gcc/testsuite/ChangeLog:
* gcc.target/mips/code-readable-attr-1.c: New test.
* gcc.target/mips/code-readable-attr-2.c: New test.
* gcc.target/mips/code-readable-attr-3.c: New test.
* gcc.target/mips/code-readable-attr-4.c: New test.
* gcc.target/mips/code-readable-attr-5.c: New test.
|
|
The following makes sure we optimize x != 0 using range info
via tree_expr_nonzero_p via match.pd.
PR tree-optimization/110269
* fold-const.cc (fold_binary_loc): Merge x != 0 folding
with tree_expr_nonzero_p ...
* match.pd (cmp (convert? addr@0) integer_zerop): With this
pattern.
* gcc.dg/tree-ssa/pr110269.c: New testcase.
|
|
PR c/107583 notes that we weren't issuing a hint for
struct foo {
time_t mytime; /* missing <time.h> include should trigger fixit */
};
in the C frontend.
The root cause is that one of the "unknown type name" diagnostics
was missing logic to emit hints, which this patch fixes.
gcc/c/ChangeLog:
PR c/107583
* c-parser.cc (c_parser_declspecs): Add hints to "unknown type
name" error.
gcc/testsuite/ChangeLog:
PR c/107583
* c-c++-common/spellcheck-pr107583.c: New test.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
|