Age | Commit message (Collapse) | Author | Files | Lines |
|
Ideally, the linker will be queried for its version and that will be
used to determine capabilities that cannot be discovered from
reasonable configuration testing.
When building cross tools, this might not be possible, and we have
strategies for providing useful defaults. These are adjusted here to
refect current choices.
gcc/ChangeLog:
* config/darwin.h (MIN_LD64_NO_COAL_SECTS): Adjust.
Amend handling for LD64_VERSION fallback defaults.
|
|
The darwinN.h headers (with the sole exception of darwin7.h,
which contains a target macro definition) now only contain
values that set fall-backs for cross-compilations, these can
be provided from the config.gcc script which means we no longer
need the darwinN.h - so delete them.
gcc/ChangeLog:
* config.gcc: Compute default version information
from the configured target. Likewise defaults for
ld64.
* config/darwin10.h: Removed.
* config/darwin12.h: Removed.
* config/darwin9.h: Removed.
* config/rs6000/darwin8.h: Removed.
|
|
Darwin defines ASM_OUTPUT_ALIGNED_DECL_COMMON which is used in
preference to ASM_OUTPUT_ALIGNED_COMMON, which makes the latter
definition dead code. Remove this.
gcc/ChangeLog:
* config/darwin9.h (ASM_OUTPUT_ALIGNED_COMMON): Delete.
|
|
We now need a modern (C++11) toolchain to bootstrap GCC, so there's no
need to skip the stack protect for Darwin < 9.
gcc/ChangeLog:
* config/darwin9.h (STACK_CHECK_STATIC_BUILTIN): Move from here..
* config/darwin.h (STACK_CHECK_STATIC_BUILTIN): .. to here.
|
|
There is no need to make the LINK_GCC_C_SEQUENCE_SPEC conditional on
configuration parameters, it is adequately conditionalized on the
macosx-version-min.
gcc/ChangeLog:
* config/darwin10.h (LINK_GCC_C_SEQUENCE_SPEC): Move from
here...
* config/darwin.h (LINK_GCC_C_SEQUENCE_SPEC): ... to here.
|
|
The darwinN.h headers were (presumably) introduced to allow specs to be
adjusted when there was no mmacosx-version-min handling, or that was
considered unreliable.
We have version-specific specs for the values that have configuration
data, and the version is set in the driver (so may be considered
reliably present).
Some of the 'darwinN.h' content has become dead code, and the reminder
is either conditionalised on version information (or is setting values
used as fall-backs in cross-compilations).
With the changes needed for Darwin20 / macOS 11 the 'darwnN.h' headers
are now too unwieldy to be useful - so this series moves the relevant
specs definitons to the common 'darwin.h' header and then finally uses
the config.gcc script to supply the fall-back defaults for cross-
compilations.
We can then delete all but the main header, since the darwinN.h are
unused.
This change moves a spec from darwin10.h to the main darwin.h
target header.
gcc/ChangeLog:
* config/darwin10.h (LINK_GCC_C_SEQUENCE_SPEC): Move the spec
for the Darwin10 unwinder stub from here ...
* config/darwin.h (LINK_COMMAND_SPEC_A): ... to here.
|
|
The toolchain now requires a C++11 compiler to bootstrap and
none of the older Darwin toolchains which were based on stabs
debugging are suitable. We can simplify the debug setup now.
gcc/ChangeLog:
* config/darwin.h (DSYMUTIL_SPEC): Default to DWARF
(ASM_DEBUG_SPEC):Only define if the assembler supports
stabs.
(PREFERRED_DEBUGGING_TYPE): Default to DWARF.
(DARWIN_PREFER_DWARF): Define.
* config/darwin9.h (PREFERRED_DEBUGGING_TYPE): Remove.
(DARWIN_PREFER_DWARF): Likewise
(DSYMUTIL_SPEC): Likewise.
(COLLECT_RUN_DSYMUTIL): Likewise.
(ASM_DEBUG_SPEC): Likewise.
(ASM_DEBUG_OPTION_SPEC): Likewise.
|
|
There is no need for combine splitters to emit insn patterns with clobbers,
the pass is smart enough to add clobbers to patterns as necessary.
2020-12-30 Uroš Bizjak <ubizjak@gmail.com>
gcc/
* config/i386/i386.md: Remove unnecessary clobbers
from combine splitters.
|
|
[PR98461]
The following patch adds combine splitters to optimize:
- vpcmpeqd %ymm1, %ymm1, %ymm1
- vpandn %ymm1, %ymm0, %ymm0
vpmovmskb %ymm0, %eax
+ notl %eax
etc. (for vectors with less than 32 elements with xorl instead of notl).
2020-12-30 Jakub Jelinek <jakub@redhat.com>
PR target/98461
* config/i386/sse.md (<sse2_avx2>_pmovmskb): Add splitters
for pmovmskb of NOT vector.
* gcc.target/i386/sse2-pr98461.c: New test.
* gcc.target/i386/avx2-pr98461.c: New test.
|
|
Generate MAC(U) instruction instead of MACD(U) when the destination
register is already choosen as ACCL register.
gcc/
2020-12-29 Claudiu Zissulescu <claziss@synopsys.com>
* config/arc/arc.md (maddsidi4_split): Skip macd gen, use mac insn
instead.
(macd): Update register letters.
(umaddsidi4_split): Skip macdu gen, use macu insn instead.
(macdu): Update register letters.
Signed-off-by: Claudiu Zissulescu <claziss@synopsys.com>
|
|
The ARC code contains code which should only work with the old reload
pass. Such code is found in arc_secondary_reload hook, however it was
not properly quarded. Reverse the if-condition predicate such that
req_equiv_mem is called when lra is not in progress.
gcc/
2020-12-29 Claudiu Zissulescu <claziss@synopsys.com>
* config/arc/arc.c (arc_secondary_reload): Flip if-condition
predicates.
Signed-off-by: Claudiu Zissulescu <claziss@synopsys.com>
|
|
The REGNO_OK_FOR_BASE_P is using reg_renumber array. However, it is
not always defined. Use it only when it is defined.
gcc/
2020-12-29 Claudiu Zissulescu <claziss@synopsys.com>
* config/arc/arc.h (REGNO_OK_FOR_BASE_P): Check if defined
reg_renumber.
Signed-off-by: Claudiu Zissulescu <claziss@synopsys.com>
|
|
We need an temporary register when moving data from a cached memory to
an uncached memory. Fix this issue and add a test for it.
gcc/
2020-12-29 Claudiu Zissulescu <claziss@synopsys.com>
* config/arc/arc.c (prepare_move_operands): Use a temporary
registers when we have cached mem-to-uncached mem moves.
gcc/testsuite/
2020-12-29 Vladimir Isaev <isaev@synopsys.com>
* gcc.target/arc/uncached-9.c: New test.
Signed-off-by: Claudiu Zissulescu <claziss@synopsys.com>
|
|
Update movdi, movdf and mov vectors not to use predicated vadd2
instructions. vadd2 is used as a "fast" move in these patterns. This
fixes a number of failures in dejagnu.
gcc/
2020-12-29 Claudiu Zissulescu <claziss@synopsys.com>
* config/arc/arc.md (movdi_insn): Update pattern, no predicated
vadd2 usage.
(movdf_insn): Likewise.
* config/arc/simdext.md (movVEC_insn): Likewise.
Signed-off-by: Claudiu Zissulescu <claziss@synopsys.com>
|
|
Use copy_to_reg where appropriate, use int_mode_for_mode
and fix comment indentation.
2020-12-29 Uroš Bizjak <ubizjak@gmail.com>
gcc/
* config/i386/i386-expand.c (ix86_gen_TWO52): Use REAL_MODE_FORMAT
to determine number of mantissa bits. Use real_2expN instead
of real_ldexp.
(ix86_expand_rint): Use copy_to_reg.
(ix86_expand_floorceildf_32): Ditto.
(ix86_expand_truncdf_32): Ditto.
(ix86_expand_rounddf_32): Ditto.
(ix86_expand_floorceil): Use copy_to_reg and int_mode_for_mode.
(ix86_expand_trunc): Ditto.
(ix86_expand_round): Ditto.
|
|
x86_expand_rint expander uses x86_sse_copysign_to_positive, which
is unable to change the sign from - to +. When FE_DOWNWARD rounding
direction is in effect, the expanded sequence that involves subtraction
can trigger x - x = -0.0 special rule. x86_sse_copysign_to_positive
fails to change the sign of the intermediate value, assumed to always
be positive, back to positive.
The patch adds one extra fabs that strips the sign from the intermediate
value when flag_rounding_math is in effect.
2020-12-28 Uroš Bizjak <ubizjak@gmail.com>
gcc/
PR target/96793
* config/i386/i386-expand.c (ix86_expand_rint):
Remove the sign of the intermediate value for flag_rounding_math.
gcc/testsuite/
PR target/96793
* gcc.target/i386/pr96793-2.c: New test.
|
|
It is possible to avoid the call to force_reg and use existing
temporary register in ix86_expand_trunc, ix86_expand_round and
ix86_expand_rounddf_32 expanders.
2020-12-28 Uroš Bizjak <ubizjak@gmail.com>
gcc/
* config/i386/i386-expand.c (ix86_expand_trunc): Use
existing temporary register to avoid a call to force_reg.
|
|
gcc/ChangeLog:
* config/i386/i386.md (optab): New code attr.
* config/i386/sse.md (<code>v32qiv32hi2): Rename to ...
(<optab>v32qiv32hi2) ... this.
(<code>v16qiv16hi2): Likewise.
(<code>v8qiv8hi2): Likewise.
(<code>v16qiv16si2): Likewise.
(<code>v8qiv8si2): Likewise.
(<code>v4qiv4si2): Likewise.
(<code>v16hiv16si2): Likewise.
(<code>v8hiv8si2): Likewise.
(<code>v4hiv4si2): Likewise.
(<code>v8qiv8di2): Likewise.
(<code>v4qiv4di2): Likewise.
(<code>v2qiv2di2): Likewise.
(<code>v8hiv8di2): Likewise.
(<code>v4hiv4di2): Likewise.
(<code>v2hiv2di2): Likewise.
(<code>v8siv8di2): Likewise.
(<code>v4siv4di2): Likewise.
(<code>v2siv2di2): Likewise.
gcc/testsuite/ChangeLog:
* gcc.target/i386/pr92658-avx2-2.c: New test.
* gcc.target/i386/pr92658-avx512bw-2.c: Likewise.
* gcc.target/i386/pr92658-sse4-2.c: Likewise.
|
|
The subprocess return string is raw bytes in python3, it must decode
before used as string, verifed with python2 and python3.
gcc/ChangeLog:
* config/riscv/multilib-generator (arch_canonicalize): Call
decode for the subprocess return value.
|
|
The shift to macOS version 11 also means that '11' without any
following '.x' is accepted as a valid version number. This adjusts
the validation code to accept this and map it to 11.0.0 which
matches what the clang toolchain appears to do.
gcc/ChangeLog:
* config/darwin-driver.c (validate_macosx_version_min): Allow
MACOSX_DEPLOYMENT_TARGET=11.
(darwin_default_min_version): Adjust warning spelling to avoid
an apostrophe.
|
|
x86_expand_truncdf_32 expander uses x86_sse_copysign_to_positive, which
is unable to change the sign from - to +. When FE_DOWNWARD rounding
direction is in effect, the expanded sequence that involves subtraction
can trigger x - x = -0.0 special rule. x86_sse_copysign_to_positive
fails to change the sign of the intermediate value, assumed to always
be positive, back to positive.
The patch adds one extra fabs that strips the sign from the intermediate
value when flag_rounding_math is in effect.
2020-12-23 Uroš Bizjak <ubizjak@gmail.com>
gcc/
PR target/96793
* config/i386/i386-expand.c (ix86_expand_truncdf_32):
Remove the sign of the intermediate value for flag_rounding_math.
gcc/testsuite/
PR target/96793
* gcc.target/i386/pr96793-1.c: New test.
|
|
The type attribute "alu_shfit_imm" is subdivided into
"alu_shift_imm_lsl_1to4" and "alu_shift_imm_other", to accommodate
optimazations of some microarchitectures.
Here is the detailed discussion.
https://gcc.gnu.org/pipermail/gcc/2020-September/233594.html
gcc/
* config/arm/types.md (define_attr "autodetect_type"): New.
(define_attr "type"): Subdivide alu_shift_imm.
* config/arm/common.md: New file.
* config/aarch64/predicates.md:Include common.md.
* config/arm/predicates.md:Include common.md.
* config/aarch64/aarch64.md (*add_<shift>_<mode>): Set autodetect_type.
(*add_<shift>_si_uxtw): Likewise.
(*sub_<shift>_<mode>): Likewise.
(*sub_<shift>_si_uxtw): Likewise.
(*neg_<shift>_<mode>2): Likewise.
(*neg_<shift>_si2_uxtw): Likewise.
* config/arm/arm.md (*addsi3_carryin_shift): Likewise.
(add_not_shift_cin): Likewise.
(*subsi3_carryin_shift): Likewise.
(*subsi3_carryin_shift_alt): Likewise.
(*rsbsi3_carryin_shift): Likewise.
(*rsbsi3_carryin_shift_alt): Likewise.
(*arm_shiftsi3): Likewise.
(*<arith_shift_insn>_multsi): Likewise.
(*<arith_shift_insn>_shiftsi): Likewise.
(subsi3_carryin): Set new type.
(*if_arith_move): Set new type.
(*if_move_arith): Set new type.
(define_attr "core_cycles"): Use new type.
* config/arm/arm-fixed.md (arm_ssatsihi_shift): Set autodetect_type.
* config/arm/thumb2.md (*orsi_not_shiftsi_si): Likewise.
(*thumb2_shiftsi3_short): Set new type.
* config/aarch64/falkor.md (falkor_alu_1_xyz): Use new type.
* config/aarch64/saphira.md (saphira_alu_1_xyz): Likewise.
* config/aarch64/thunderx.md (thunderx_arith_shift): Likewise.
* config/aarch64/thunderx2t99.md (thunderx2t99_alu_shift): Likewise.
* config/aarch64/thunderx3t110.md (thunderx3t110_alu_shift): Likewise.
(thunderx3t110_alu_shift1): Likewise.
* config/aarch64/tsv110.md (tsv110_alu_shift): Likewise.
* config/arm/arm1020e.md (1020alu_shift_op): Likewise.
* config/arm/arm1026ejs.md (alu_shift_op): Likewise.
* config/arm/arm1136jfs.md (11_alu_shift_op): Likewise.
* config/arm/arm926ejs.md (9_alu_op): Likewise.
* config/arm/cortex-a15.md (cortex_a15_alu_shift): Likewise.
* config/arm/cortex-a17.md (cortex_a17_alu_shiftimm): Likewise.
* config/arm/cortex-a5.md (cortex_a5_alu_shift): Likewise.
* config/arm/cortex-a53.md (cortex_a53_alu_shift): Likewise.
* config/arm/cortex-a57.md (cortex_a57_alu_shift): Likewise.
* config/arm/cortex-a7.md (cortex_a7_alu_shift): Likewise.
* config/arm/cortex-a8.md (cortex_a8_alu_shift): Likewise.
* config/arm/cortex-a9.md (cortex_a9_dp_shift): Likewise.
* config/arm/cortex-m4.md (cortex_m4_alu): Likewise.
* config/arm/cortex-m7.md (cortex_m7_alu_shift): Likewise.
* config/arm/cortex-r4.md (cortex_r4_alu_shift): Likewise.
* config/arm/exynos-m1.md (exynos_m1_alu_shift): Likewise.
* config/arm/fa526.md (526_alu_shift_op): Likewise.
* config/arm/fa606te.md (606te_alu_op): Likewise.
* config/arm/fa626te.md (626te_alu_shift_op): Likewise.
* config/arm/fa726te.md (726te_alu_shift_op): Likewise.
* config/arm/fmp626.md (mp626_alu_shift_op): Likewise.
* config/arm/marvell-pj4.md (pj4_shift): Likewise.
(pj4_shift_conds): Likewise.
(pj4_alu_shift): Likewise.
(pj4_alu_shift_conds): Likewise.
* config/arm/xgene1.md (xgene1_alu): Likewise.
* config/arm/arm.c (xscale_sched_adjust_cost): Likewise.
|
|
x86_expand_floorceil expander uses x86_sse_copysign_to_positive, which
is unable to change the sign from - to +. When FE_DOWNWARD rounding
direction is in effect, the expanded sequence that involves subtraction
can trigger x - x = -0.0 special rule. x86_sse_copysign_to_positive
fails to change the sign of the intermediate value, assumed to always
be positive, back to positive.
The patch adds one extra fabs that strips the sign from the intermediate
value when flag_rounding_math is in effect.
2020-12-22 Uroš Bizjak <ubizjak@gmail.com>
gcc/
PR target/96793
* config/i386/i386-expand.c (ix86_expand_floorceil):
Remove the sign of the intermediate value for flag_rounding_math.
(ix86_expand_floorceildf_32): Ditto.
gcc/testsuite/
PR target/96793
* gcc.target/i386/pr96793.c: New test.
|
|
gcc/ChangeLog
* config/i386/i386.md (*one_cmpl<mode>2_1): Fix typo, change
alternative from 2 to 1 in attr isa.
|
|
With the change to macOS 11 and Darwin20, the algorithm for mapping
kernel version to macOS version has changed.
We now have darwin 20.X.Y => macOS 11.(X > 0 ? X - 1 : 0).??.
It currently unclear if the Y will be mapped to macOS patch version
and, if so, whether it will be one-based or 0-based.
Likewise, it's unknown if Darwin 21 will map to macOS 12, so these
entries are unchanged for the present.
gcc/ChangeLog:
* config/darwin-driver.c (darwin_find_version_from_kernel):
Compute the minor OS version from the minor kernel version.
|
|
2020-12-20 Max Filippov <jcmvbkbc@gmail.com>
gcc/
* config/xtensa/xtensa.md (bswapsi2, bswapdi2): New patterns.
gcc/testsuite/
* gcc.target/xtensa/bswap.c: New test.
libgcc/
* config/xtensa/lib1funcs.S (__bswapsi2, __bswapdi2): New
functions.
* config/xtensa/t-xtensa (LIB1ASMFUNCS): Add _bswapsi2 and
_bswapdi2.
|
|
When compiling with -msve-vector-bits=128, aarch64_preferred_simd_mode
would pass the same vector width to aarch64_simd_container_mode for
both SVE and Advanced SIMD, and so Advanced SIMD would always “win”.
This patch instead makes it choose directly between SVE and Advanced
SIMD modes, so that aarch64-autovec-preference==2 and
aarch64-autovec-preference==4 work for this configuration.
(aarch64-autovec-preference shouldn't affect aarch64_simd_container_mode
because that would have an ABI impact for things like GNU vectors.)
gcc/
* config/aarch64/aarch64.c (aarch64_preferred_simd_mode): Use
aarch64_full_sve_mode and aarch64_vq_mode directly, instead of
going via aarch64_simd_container_mode.
|
|
Seems when I split the patch I forgot to include these into the rot iterator..
The uncommitted hunks were still in my local tree so didn't notice.
gcc/ChangeLog:
* config/arm/iterators.md (rot): Add UNSPEC_VCMUL, UNSPEC_VCMUL90,
UNSPEC_VCMUL180, UNSPEC_VCMUL270.
|
|
This patch adds support for -mcpu=cortex-a78c command line option.
For more information about this processor, see [0]:
[0] https://developer.arm.com/ip-products/processors/cortex-a/cortex-a78c
gcc/ChangeLog:
* config/arm/arm-cpus.in: Add Cortex-A78C core.
* config/arm/arm-tables.opt: Regenerate.
* config/arm/arm-tune.md: Regenerate.
* doc/invoke.texi: Update docs.
|
|
vect_better_loop_vinfo_p
While experimenting with some backend costs for Advanced SIMD and SVE I
hit many cases where GCC would pick SVE for VLA auto-vectorisation even when
the backend very clearly presented cheaper costs for Advanced SIMD.
For a simple float addition loop the SVE costs were:
vec.c:9:21: note: Cost model analysis:
Vector inside of loop cost: 28
Vector prologue cost: 2
Vector epilogue cost: 0
Scalar iteration cost: 10
Scalar outside cost: 0
Vector outside cost: 2
prologue iterations: 0
epilogue iterations: 0
Minimum number of vector iterations: 1
Calculated minimum iters for profitability: 4
and for Advanced SIMD (Neon) they're:
vec.c:9:21: note: Cost model analysis:
Vector inside of loop cost: 11
Vector prologue cost: 0
Vector epilogue cost: 0
Scalar iteration cost: 10
Scalar outside cost: 0
Vector outside cost: 0
prologue iterations: 0
epilogue iterations: 0
Calculated minimum iters for profitability: 0
vec.c:9:21: note: Runtime profitability threshold = 4
yet the SVE one was always picked. With guidance from Richard this seems
to be due to the vinfo comparisons in vect_better_loop_vinfo_p, in
particular the part with the big comment explaining the
estimated_rel_new * 2 <= estimated_rel_old heuristic.
This patch extends the comparisons by introducing a three-way estimate
kind for poly_int values that the backend can distinguish.
This allows vect_better_loop_vinfo_p to ask for minimum, maximum and
likely estimates and pick Advanced SIMD overs SVE when it is clearly cheaper.
gcc/
* target.h (enum poly_value_estimate_kind): Define.
(estimated_poly_value): Take an estimate kind argument.
* target.def (estimated_poly_value): Update definition for the
above.
* doc/tm.texi: Regenerate.
* targhooks.c (estimated_poly_value): Update prototype.
* tree-vect-loop.c (vect_better_loop_vinfo_p): Use min, max and
likely estimates of VF to pick between vinfos.
* config/aarch64/aarch64.c (aarch64_cmp_autovec_modes): Use
estimated_poly_value instead of aarch64_estimated_poly_value.
(aarch64_estimated_poly_value): Take a kind argument and handle
it.
|
|
gcc/ChangeLog
2020-12-17 Andrea Corallo <andrea.corallo@arm.com>
* config/arm/arm_neon.h (vcreate_p64): Remove call to
'__builtin_neon_vcreatedi'.
|
|
2020-12-16 Takayuki 'January June' Suwa <jjsuwa_sys3175@yahoo.co.jp>
gcc/
* config/xtensa/xtensa.md (*ashlsi3_1, *ashlsi3_3x, *ashrsi3_3x)
(*lshrsi3_3x): New patterns.
gcc/testsuite/
* gcc.target/xtensa/shifts.c: New test.
|
|
This implements support for powerpc64le architecture on FreeBSD. Since
we don't have powerpcle (32-bit), I did not add support for powerpcle
here. This remains to be changed if there is powerpcle support in the
future.
2020-12-15 Piotr Kubaj <pkubaj@FreeBSD.org>
gcc/
* config.gcc (powerpc*le-*-freebsd*): Add.
* configure.ac (powerpc*le-*-freebsd*): Ditto.
* configure: Regenerate.
* config/rs6000/freebsd64.h (ASM_SPEC_COMMON): Use ENDIAN_SELECT.
(DEFAULT_ASM_ENDIAN): Add little endian support.
(LINK_OS_FREEBSD_SPEC64): Ditto.
|
|
2020-12-16 Takayuki 'January June' Suwa <jjsuwa_sys3175@yahoo.co.jp>
gcc/
* config/xtensa/xtensa.c (xtensa_emit_move_sequence): Try to
replace 'l32r' with 'movi' + 'slli' when optimizing for size.
* config/xtensa/xtensa.md (movdi): Split loading DI mode constant
into register pair into two loads of SI mode constants.
|
|
This refactors the complex numbers bits of MVE to go through the same unspecs
as the NEON variant.
This is pre-work to allow code to be shared between NEON and MVE for the complex
vectorization patches.
gcc/ChangeLog:
* config/arm/arm_mve.h (__arm_vcmulq_rot90_f16):
(__arm_vcmulq_rot270_f16, _arm_vcmulq_rot180_f16, __arm_vcmulq_f16,
__arm_vcmulq_rot90_f32, __arm_vcmulq_rot270_f32,
__arm_vcmulq_rot180_f32, __arm_vcmulq_f32, __arm_vcmlaq_f16,
__arm_vcmlaq_rot180_f16, __arm_vcmlaq_rot270_f16,
__arm_vcmlaq_rot90_f16, __arm_vcmlaq_f32, __arm_vcmlaq_rot180_f32,
__arm_vcmlaq_rot270_f32, __arm_vcmlaq_rot90_f32): Update builtin calls.
* config/arm/arm_mve_builtins.def (vcmulq_f, vcmulq_rot90_f,
vcmulq_rot180_f, vcmulq_rot270_f, vcmlaq_f, vcmlaq_rot90_f,
vcmlaq_rot180_f, vcmlaq_rot270_f): Removed.
(vcmulq, vcmulq_rot90, vcmulq_rot180, vcmulq_rot270, vcmlaq,
vcmlaq_rot90, vcmlaq_rot180, vcmlaq_rot270): New.
* config/arm/iterators.md (mve_rot): Add UNSPEC_VCMLA, UNSPEC_VCMLA90,
UNSPEC_VCMLA180, UNSPEC_VCMLA270, UNSPEC_VCMUL, UNSPEC_VCMUL90,
UNSPEC_VCMUL180, UNSPEC_VCMUL270.
(VCMUL): New.
* config/arm/mve.md (mve_vcmulq_f<mode, mve_vcmulq_rot180_f<mode>,
mve_vcmulq_rot270_f<mode>, mve_vcmulq_rot90_f<mode>, mve_vcmlaq_f<mode>,
mve_vcmlaq_rot180_f<mode>, mve_vcmlaq_rot270_f<mode>,
mve_vcmlaq_rot90_f<mode>): Removed.
(mve_vcmlaq<mve_rot><mode>, mve_vcmulq<mve_rot><mode>,
mve_vcaddq<mve_rot><mode>, cadd<rot><mode>3, mve_vcaddq<mve_rot><mode>):
New.
* config/arm/unspecs.md (UNSPEC_VCMUL90, UNSPEC_VCMUL270, UNSPEC_VCMUL,
UNSPEC_VCMUL180): New.
(VCMULQ_F, VCMULQ_ROT180_F, VCMULQ_ROT270_F, VCMULQ_ROT90_F,
VCMLAQ_F, VCMLAQ_ROT180_F, VCMLAQ_ROT90_F, VCMLAQ_ROT270_F): Removed.
|
|
This adds implementation for the optabs for complex additions. With this the
following C code:
void f90 (float complex a[restrict N], float complex b[restrict N],
float complex c[restrict N])
{
for (int i=0; i < N; i++)
c[i] = a[i] + (b[i] * I);
}
generates
f90:
add r3, r2, #1600
.L2:
vld1.32 {q8}, [r0]!
vld1.32 {q9}, [r1]!
vcadd.f32 q8, q8, q9, #90
vst1.32 {q8}, [r2]!
cmp r3, r2
bne .L2
bx lr
instead of
f90:
add r3, r2, #1600
.L2:
vld2.32 {d24-d27}, [r0]!
vld2.32 {d20-d23}, [r1]!
vsub.f32 q8, q12, q11
vadd.f32 q9, q13, q10
vst2.32 {d16-d19}, [r2]!
cmp r3, r2
bne .L2
bx lr
gcc/ChangeLog:
* config/arm/arm_mve.h (__arm_vcaddq_rot90_u8, __arm_vcaddq_rot270_u8,
__arm_vcaddq_rot90_s8, __arm_vcaddq_rot270_s8,
__arm_vcaddq_rot90_u16, __arm_vcaddq_rot270_u16,
__arm_vcaddq_rot90_s16, __arm_vcaddq_rot270_s16,
__arm_vcaddq_rot90_u32, __arm_vcaddq_rot270_u32,
__arm_vcaddq_rot90_s32, __arm_vcaddq_rot270_s32,
__arm_vcaddq_rot90_f16, __arm_vcaddq_rot270_f16,
__arm_vcaddq_rot90_f32, __arm_vcaddq_rot270_f32): Update builtin calls.
* config/arm/arm_mve_builtins.def (vcaddq_rot90_u, vcaddq_rot270_u,
vcaddq_rot90_s, vcaddq_rot270_s, vcaddq_rot90_f, vcaddq_rot270_f):
Removed.
(vcaddq_rot90, vcaddq_rot270): New.
* config/arm/constraints.md (Dz): Include MVE.
* config/arm/iterators.md (mve_rot): New.
(supf): Remove VCADDQ_ROT270_S, VCADDQ_ROT270_U, VCADDQ_ROT90_S,
VCADDQ_ROT90_U.
(VCADDQ_ROT270, VCADDQ_ROT90): Removed.
* config/arm/mve.md (mve_vcaddq_rot270_<supf><mode,
mve_vcaddq_rot90_<supf><mode>, mve_vcaddq_rot270_f<mode>,
mve_vcaddq_rot90_f<mode>): Removed.
(mve_vcaddq<mve_rot><mode>, mve_vcaddq<mve_rot><mode>): New.
* config/arm/unspecs.md (VCADDQ_ROT270_S, VCADDQ_ROT90_S,
VCADDQ_ROT270_U, VCADDQ_ROT90_U, VCADDQ_ROT270_F,
VCADDQ_ROT90_F): Removed.
* config/arm/vec-common.md (cadd<rot><mode>3): New.
|
|
This adds implementation for the optabs for add complex operations. With this
the following C code:
void f90 (float complex a[restrict N], float complex b[restrict N],
float complex c[restrict N])
{
for (int i=0; i < N; i++)
c[i] = a[i] + (b[i] * I);
}
generates
f90:
mov x3, 0
.p2align 3,,7
.L2:
ldr q0, [x0, x3]
ldr q1, [x1, x3]
fcadd v0.4s, v0.4s, v1.4s, #90
str q0, [x2, x3]
add x3, x3, 16
cmp x3, 1600
bne .L2
ret
instead of
f90:
add x3, x1, 1600
.p2align 3,,7
.L2:
ld2 {v4.4s - v5.4s}, [x0], 32
ld2 {v2.4s - v3.4s}, [x1], 32
fsub v0.4s, v4.4s, v3.4s
fadd v1.4s, v5.4s, v2.4s
st2 {v0.4s - v1.4s}, [x2], 32
cmp x3, x1
bne .L2
ret
gcc/ChangeLog:
* config/aarch64/aarch64-simd.md (cadd<rot><mode>3): New.
* config/aarch64/iterators.md (SVE2_INT_CADD_OP): New.
* config/aarch64/aarch64-sve.md (cadd<rot><mode>3): New.
* config/aarch64/aarch64-sve2.md (cadd<rot><mode>3): New.
|
|
Prefixed instructions should not have their length explicitly set to '8'. The function get_attr_length() will adjust the length appropriately based on the value of the "prefixed" attribute.
2020-12-16 Pat Haugen <pthaugen@linux.ibm.com>
gcc/
* config/rs6000/mma.md (*movxo, mma_<vvi4i4i8>, mma_<avvi4i4i8>,
mma_<vvi4i4i2>, mma_<avvi4i4i2>, mma_<vvi4i4>, mma_<avvi4i4>,
mma_<pvi4i2>, mma_<apvi4i2>, mma_<vvi4i4i4>, mma_<avvi4i4i4>):
Remove explicit setting of length attribute.
|
|
gcc/brig/ChangeLog:
* lang.opt: Remove usage of Report.
gcc/c-family/ChangeLog:
* c.opt: Remove usage of Report.
gcc/ChangeLog:
* common.opt: Remove usage of Report.
* config/aarch64/aarch64.opt: Ditto.
* config/alpha/alpha.opt: Ditto.
* config/arc/arc.opt: Ditto.
* config/arm/arm.opt: Ditto.
* config/avr/avr.opt: Ditto.
* config/bfin/bfin.opt: Ditto.
* config/bpf/bpf.opt: Ditto.
* config/c6x/c6x.opt: Ditto.
* config/cr16/cr16.opt: Ditto.
* config/cris/cris.opt: Ditto.
* config/cris/elf.opt: Ditto.
* config/csky/csky.opt: Ditto.
* config/darwin.opt: Ditto.
* config/fr30/fr30.opt: Ditto.
* config/frv/frv.opt: Ditto.
* config/ft32/ft32.opt: Ditto.
* config/gcn/gcn.opt: Ditto.
* config/i386/cygming.opt: Ditto.
* config/i386/i386.opt: Ditto.
* config/ia64/ia64.opt: Ditto.
* config/ia64/ilp32.opt: Ditto.
* config/linux-android.opt: Ditto.
* config/linux.opt: Ditto.
* config/lm32/lm32.opt: Ditto.
* config/m32r/m32r.opt: Ditto.
* config/m68k/m68k.opt: Ditto.
* config/mcore/mcore.opt: Ditto.
* config/microblaze/microblaze.opt: Ditto.
* config/mips/mips.opt: Ditto.
* config/mmix/mmix.opt: Ditto.
* config/mn10300/mn10300.opt: Ditto.
* config/moxie/moxie.opt: Ditto.
* config/msp430/msp430.opt: Ditto.
* config/nds32/nds32.opt: Ditto.
* config/nios2/elf.opt: Ditto.
* config/nios2/nios2.opt: Ditto.
* config/nvptx/nvptx.opt: Ditto.
* config/pa/pa.opt: Ditto.
* config/pdp11/pdp11.opt: Ditto.
* config/pru/pru.opt: Ditto.
* config/riscv/riscv.opt: Ditto.
* config/rl78/rl78.opt: Ditto.
* config/rs6000/aix64.opt: Ditto.
* config/rs6000/linux64.opt: Ditto.
* config/rs6000/rs6000.opt: Ditto.
* config/rs6000/sysv4.opt: Ditto.
* config/rx/elf.opt: Ditto.
* config/rx/rx.opt: Ditto.
* config/s390/s390.opt: Ditto.
* config/s390/tpf.opt: Ditto.
* config/sh/sh.opt: Ditto.
* config/sol2.opt: Ditto.
* config/sparc/long-double-switch.opt: Ditto.
* config/sparc/sparc.opt: Ditto.
* config/tilegx/tilegx.opt: Ditto.
* config/tilepro/tilepro.opt: Ditto.
* config/v850/v850.opt: Ditto.
* config/visium/visium.opt: Ditto.
* config/vms/vms.opt: Ditto.
* config/vxworks.opt: Ditto.
* config/xtensa/xtensa.opt: Ditto.
gcc/lto/ChangeLog:
* lang.opt: Remove usage of Report.
|
|
This patch is to use paradoxical subreg instead of
zero_extend for promoting QI/HI to SI/DI when we
want to construct one vector with these modes.
Since we do the gpr->vsx movement and vector merge
or pack later, the high part is useless and safe to
use paradoxical subreg. It can avoid useless rlwinms
generated for signed cases.
Bootstrapped/regtested on powerpc64le-linux-gnu P9.
gcc/ChangeLog:
* config/rs6000/rs6000.c (rs6000_expand_vector_init): Use
paradoxical subreg instead of zero_extend for QI/HI promotion.
gcc/testsuite/ChangeLog:
* gcc.target/powerpc/pr96933-1.c: Adjusted to check no rlwinm.
* gcc.target/powerpc/pr96933-2.c: Likewise.
|
|
gcc/
2020-12-16 Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org>
PR target/66791
* config/arm/arm_neon.h: Replace calls to __builtin_vcgt* by
<, > operators in vclt and vcgt intrinsics respectively.
* config/arm/arm_neon_builtins.def: Remove entry for
vcgt and vcgtu.
|
|
gcc/
2020-12-16 Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org>
PR target/66791
* config/arm/arm_neon.h: Replace calls to __builtin_vneg* by - operator
in vneg intrinsics.
* config/arm/arm_neon_builtins.def: Remove entry for vneg.
|
|
gcc/
2020-12-16 Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org>
PR target/66791
* config/arm/arm_neon.h: Replace calls to __builtin_vcreate*
in vcreate intrinsics.
* config/arm/arm_neon_builtins.def: Remove entry for vcreate.
|
|
The following testcase fails to compile. The problem is that
when ix86_option_override_internal is called the first time for command
line, it sees -mtune= wasn't present on the command line and so as fallback
sets ix86_tune_string to ix86_arch_string value ("x86-64-v2"), but
ix86_tune_specified is false, so we don't find the tuning in the table
but don't error on it.
When processing the target attribute, ix86_tune_string is what
it was earlier left with, but this time ix86_tune_specified is true and
so we error on it.
The following patch does what is done already e.g. for "x86-64" march,
in particular default the tuning to "generic".
2020-12-15 Jakub Jelinek <jakub@redhat.com>
PR target/98274
* config/i386/i386-options.c (ix86_option_override_internal): Set
ix86_tune_string to "generic" even when it wasn't specified and
ix86_arch_string is "x86-64-v2", "x86-64-v3" or "x86-64-v4".
Remove useless {}s around a single statement.
* gcc.target/i386/pr98274.c: New test.
|
|
If somebody has -march=x86-64-v2 (or -v3 or -v4) in $CFLAGS, $CXXFLAGS etc.,
then -m32 or -mabi=ms stops working.
What is worse, if one configures gcc --with-arch-64=x86-64-v2 (or -v3 or -v4),
then -mabi=ms stops working.
I think that is a nightmare user experience. It is ok that x86-64-v[234]
behave slightly different from other -march= options (in that they imply
unless overridden -mtune=generic rather then -mtune= equal to the -march
argument), but the error when one mixes it with -mabi=ms, or -m32 doesn't
improve anything.
It is true that the exact option set is only defined in the x86-64 psABI
(IMHO that is a mistake too, we should copy that into the GCC documentation
like we document it for any other -march= option), but there is no reason
why that exact set of CPU features can't be used for other ABIs, it is just
a set of CPU features. If we add micro-architecture levels to the 32-bit
ABI (I doubt anyone wants to do that, but just hypothetically), then those
micro-architecture levels wouldn't certainly be called x86-64-v* but perhaps
i386-v*.
In the tests, __GCC_HAVE_SYNC_COMPARE_AND_SWAP_16 can't be expected on -m32
not because the CPU feature wouldn't be set, but because the instruction
is 64-bit only and 32-bit code doesn't have __int128 etc. support.
2020-12-15 Jakub Jelinek <jakub@redhat.com>
* config/i386/i386-options.c (ix86_option_override_internal): Don't
error on -march=x86-64-v[234] with -m32 or -mabi=ms.
* config.gcc: Don't reject --with-arch=x86-64-v[234] or
--with-arch_32=x86-64-v[234].
* doc/invoke.texi (-march=x86-64-v[234]): Document what the option
does for other ABIs.
* gcc.target/i386/x86-64-v2.c: Don't expect
__GCC_HAVE_SYNC_COMPARE_AND_SWAP_16 to be defined with -m32.
* gcc.target/i386/x86-64-v2-other.c: New test.
* gcc.target/i386/x86-64-v2-msabi.c: New test.
* gcc.target/i386/x86-64-v3.c: Fix a comment pasto. Don't expect
__GCC_HAVE_SYNC_COMPARE_AND_SWAP_16 to be defined with -m32.
* gcc.target/i386/x86-64-v3-other.c: New test.
* gcc.target/i386/x86-64-v3-msabi.c: New test.
* gcc.target/i386/x86-64-v4.c:Don't expect
__GCC_HAVE_SYNC_COMPARE_AND_SWAP_16 to be defined with -m32.
* gcc.target/i386/x86-64-v4-other.c: New test.
* gcc.target/i386/x86-64-v4-msabi.c: New test.
|
|
2020-12-14 Max Filippov <jcmvbkbc@gmail.com>
gcc/
* config/xtensa/predicates.md (addsubx_operand): Change accepted
values from 2/4/8 to 1..3.
* config/xtensa/xtensa.md (*addx, *subx): Change RTL pattern
to use 'ashift' instead of 'mult'. Update operands[3] value.
gcc/testsuite/
* gcc.target/xtensa/pr98285.c: New test.
|
|
gcc/ChangeLog:
2020-12-13 Piotr Kubaj <pkubaj@FreeBSD.org>
Gerald Pfeifer <gerald@pfeifer.com>
* config/rs6000/freebsd64.h (PROCESSOR_DEFAULT): Update
to PROCESSOR_PPC7450.
(PROCESSOR_DEFAULT64): Update to PROCESSOR_POWER8.
|
|
Add support for --with-tune. Like --with-cpu and --with-arch, the argument is
validated and transformed into a -mtune option to be processed like any other
command-line option. --with-tune has no effect if a -mcpu or -mtune option
is used. The validating code didn't allow --with-cpu=native, so explicitly
allow that.
Co-authored-by: Delia Burduv <delia.burduv@arm.com>
Bootstrap OK, regress pass, OK to commit?
2020-09-03 Wilco Dijkstra <wdijkstr@arm.com>
gcc/
* config.gcc (aarch64*-*-*): Add --with-tune. Support --with-cpu=native.
* config/aarch64/aarch64.h (OPTION_DEFAULT_SPECS): Add --with-tune.
gcc/testsuite/
* lib/target-supports.exp (check_effective_target_tune_cortex_a76): New
effective target test.
* gcc.target/aarch64/with-tune-config.c: New test.
* gcc.target/aarch64/with-tune-march.c: Likewise.
* gcc.target/aarch64/with-tune-mcpu.c: Likewise.
* gcc.target/aarch64/with-tune-mtune.c: Likewise.
|
|
This patch enables MVE vneg instructions for auto-vectorization. MVE
vnegq insns in mve.md are modified to use 'neg' instead of unspec
expression. The neg<mode>2 expander is added to vec-common.md.
Existing patterns in neon.md are prefixed with neon_.
It's not clear why we have different patterns for VDQW
and VH in neon.md, when WDQWH handles both, and patterns
with VDQ have provision for attributes for FP modes.
Another question is why <absneg_str><mode>2 always sets
neon_abs<q> type when it also handles neon_neq<q> cases.
2020-12-11 Christophe Lyon <christophe.lyon@linaro.org>
gcc/
* config/arm/mve.md (mve_vnegq_f): Use 'neg' instead of unspec.
(mve_vnegq_s): Likewise.
* config/arm/neon.md (neg<mode>2): Rename into neon_neg<mode>2.
(<absneg_str><mode>2): Rename into neon_<absneg_str><mode>2.
(neon_v<absneg_str><mode>): Call gen_neon_<absneg_str><mode>2.
(vashr<mode>3): Call gen_neon_neg<mode>2.
(vlshr<mode>3): Call gen_neon_neg<mode>2.
(neon_vneg<mode>): Call gen_neon_neg<mode>2.
* config/arm/unspecs.md (VNEGQ_F, VNEGQ_S): Remove.
* config/arm/vec-common.md (neg<mode>2): New expander.
gcc/testsuite/
* gcc.target/arm/simd/mve-vneg.c: Add tests for vneg.
|
|
This patch enables MVE vmvnq instructions for auto-vectorization. MVE
vmvnq insns in mve.md are modified to use 'not' instead of unspec
expression to support one_cmpl<mode>2. The one_cmpl<mode>2 expander
is added to vec-common.md.
2020-12-11 Christophe Lyon <christophe.lyon@linaro.org>
gcc/
* config/arm/iterators.md (VDQNOTM2): New mode iterator.
(supf): Remove VMVNQ_S and VMVNQ_U.
(VMVNQ): Remove.
* config/arm/mve.md (mve_vmvnq_u<mode>): New entry for vmvn
instruction using expression not.
(mve_vmvnq_s<mode>): New expander.
* config/arm/neon.md (one_cmpl<mode>2): Renamed into
one_cmpl<mode>2_neon.
* config/arm/unspecs.md (VMVNQ_S, VMVNQ_U): Remove.
* config/arm/vec-common.md (one_cmpl<mode>2): New expander.
gcc/testsuite/
* gcc.target/arm/simd/mve-vmvn.c: Add tests for vmvn.
|