author Wilco Dijkstra <wdijkstr@arm.com> 2020-03-06 18:29:02 +0000
committer Wilco Dijkstra <wdijkstr@arm.com> 2020-03-06 18:29:02 +0000
commit 0b8393221177617f19e7c5c5c692b8c59f85fffb
tree f6035c29b0c16c3c2ce9c4a0511327622bf47bd9 /gcc/fortran/trans-array.c
parent 3e5c062e96c11a6eaef1cbf94b5992391a850dbf
[AArch64] Use intrinsics for widening multiplies (PR91598)

Inline assembler instructions don't have latency info and the scheduler does
not attempt to schedule them at all - it does not even honor latencies of
asm source operands. As a result, SIMD intrinsics which are implemented using
inline assembler perform very poorly, particularly on in-order cores. Add new
patterns and intrinsics for widening multiplies, which results in a 63% speedup
for the example in the PR, thus fixing the reported regression.

    gcc/
	PR target/91598
	* config/aarch64/aarch64-builtins.c (TYPES_TERNOPU_LANE): Add define.
	* config/aarch64/aarch64-simd.md (aarch64_vec_<su>mult_lane<Qlane>):
	Add new insn for widening lane mul.
	(aarch64_vec_<su>mlal_lane<Qlane>): Likewise.
	* config/aarch64/aarch64-simd-builtins.def: Add intrinsics.
	* config/aarch64/arm_neon.h:
	(vmlal_lane_s16): Expand using intrinsics rather than inline asm.
	(vmlal_lane_u16): Likewise.
	(vmlal_lane_s32): Likewise.
	(vmlal_lane_u32): Likewise.
	(vmlal_laneq_s16): Likewise.
	(vmlal_laneq_u16): Likewise.
	(vmlal_laneq_s32): Likewise.
	(vmlal_laneq_u32): Likewise.
	(vmull_lane_s16): Likewise.
	(vmull_lane_u16): Likewise.
	(vmull_lane_s32): Likewise.
	(vmull_lane_u32): Likewise.
	(vmull_laneq_s16): Likewise.
	(vmull_laneq_u16): Likewise.
	(vmull_laneq_s32): Likewise.
	(vmull_laneq_u32): Likewise.
	* config/aarch64/iterators.md (Vcondtype): New iterator for lane mul.
	(Qlane): Likewise.
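
For readers unfamiliar with the affected intrinsics, the following is a portable scalar sketch of what vmull_lane_s16 and vmlal_lane_s16 compute: each 16-bit element of one vector is multiplied by a single selected lane of another, widened to 32 bits, and (for the mlal form) added to an accumulator. The helper names ref_vmull_lane_s16 and ref_vmlal_lane_s16 are hypothetical and used for illustration only; this is not the code from the commit, which expresses the operations as machine-description patterns and builtins rather than C loops.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of vmull_lane_s16:
   out[i] = (int32) b[i] * (int32) v[lane], for a 4-element vector.  */
static void
ref_vmull_lane_s16 (int32_t out[4], const int16_t b[4],
                    const int16_t v[4], int lane)
{
  assert (lane >= 0 && lane < 4);
  for (int i = 0; i < 4; i++)
    /* A product of two int16_t values always fits in int32_t.  */
    out[i] = (int32_t) b[i] * (int32_t) v[lane];
}

/* Hypothetical scalar model of vmlal_lane_s16:
   out[i] = acc[i] + (int32) b[i] * (int32) v[lane].
   The addition is done in unsigned arithmetic so that it wraps modulo
   2^32 instead of invoking signed-overflow undefined behaviour.  */
static void
ref_vmlal_lane_s16 (int32_t out[4], const int32_t acc[4],
                    const int16_t b[4], const int16_t v[4], int lane)
{
  assert (lane >= 0 && lane < 4);
  for (int i = 0; i < 4; i++)
    {
      int32_t prod = (int32_t) b[i] * (int32_t) v[lane];
      out[i] = (int32_t) ((uint32_t) acc[i] + (uint32_t) prod);
    }
}
```

Because the real intrinsics now expand to builtins instead of opaque asm statements, the compiler can see these multiply and accumulate operations and schedule them using the target's latency information, which is the point of the change described above.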
Diffstat (limited to 'gcc/fortran/trans-array.c')
0 files changed, 0 insertions, 0 deletions