author | Roger Sayle <roger@nextmovesoftware.com> | 2022-02-04 04:13:53 +0100 |
---|---|---|
committer | Tom de Vries <tdevries@suse.de> | 2022-02-10 09:01:54 +0100 |
commit | 9bacd7af2e3bba9ddad17e7de4e2d299419d819d (patch) | |
tree | dcaa5a85b4ee43e818edd22d0b05953252b4935f /gcc/fortran | |
parent | f68c3de7fc9065d8c9ac75b3736ea27abffdce45 (diff) | |
PR target/104345: Use nvptx "set" instruction for cond ? -1 : 0
This patch addresses the "increased register pressure" regression on
nvptx-none caused by my change to transition the backend to a
STORE_FLAG_VALUE = 1 target. This improved code generation for the
more common case of producing 0/1 Boolean values, but unfortunately
made things marginally worse when a 0/-1 mask value is desired.
Unfortunately, nvptx kernels are extremely sensitive to changes in
register usage, which was observable in the reported PR.
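To illustrate the distinction drawn above between a 0/1 Boolean and a 0/-1 mask, here is a minimal C sketch. The function names (flag01, mask0m1, select_by_mask, check_mask_examples) are hypothetical and do not come from the patch or its test case. The sketch only shows why source code might ask for the mask form, and says nothing about the PTX that the backend emits.

```c
#include <assert.h>

/* 0/1 Boolean: the value a comparison produces in C, matching a
   target where STORE_FLAG_VALUE is 1. */
static int flag01(int a, int b)
{
    return a == b;
}

/* 0/-1 mask: every bit set when the comparison holds, none otherwise. */
static int mask0m1(int a, int b)
{
    return -(a == b);
}

/* Branch-free select driven by a 0/-1 mask: x when mask is -1, y when 0. */
static int select_by_mask(int mask, int x, int y)
{
    return (x & mask) | (y & ~mask);
}

/* Small self-check of the three helpers above. */
static void check_mask_examples(void)
{
    assert(flag01(7, 7) == 1);
    assert(flag01(7, 8) == 0);
    assert(mask0m1(7, 7) == -1);
    assert(mask0m1(7, 8) == 0);
    assert(select_by_mask(mask0m1(7, 7), 10, 20) == 10);
    assert(select_by_mask(mask0m1(7, 8), 10, 20) == 20);
}
```

With a 0/1 flag as the native comparison result, producing the mask takes an extra negation unless the backend folds it away, which is the case this patch targets.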
This patch provides optimizations for -(cond ? 1 : 0), effectively
simplifying this into cond ? -1 : 0, where these ternary operators are
provided by nvptx's selp instruction, and for the specific case of
SImode, using (restoring) nvptx's "set" instruction (which avoids
the need for a predicate register).
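The simplification described above can be checked at the source level with a short C sketch. All function names here are hypothetical, and this is not the content of the new neg-selp.c test. The bitwise-not pair is an assumption: the ChangeLog lists a pattern for the bitwise not of a selp, and the sketch assumes the analogous identity of pushing the operation into both arms of the conditional.

```c
#include <assert.h>

/* Negation applied to a 0/1 conditional result. */
static int neg_of_flag(int a, int b)
{
    return -(a == b ? 1 : 0);
}

/* The same value with the negation folded into the arms. */
static int neg_folded(int a, int b)
{
    return a == b ? -1 : 0;
}

/* Bitwise not applied to a conditional select of two values. */
static int not_of_select(int a, int b, int x, int y)
{
    return ~(a != b ? x : y);
}

/* The same value with the bitwise not folded into the arms. */
static int not_folded(int a, int b, int x, int y)
{
    return a != b ? ~x : ~y;
}

/* Compare each folded form against the unfolded one on a small grid. */
static void check_selp_identities(void)
{
    for (int a = -2; a <= 2; a++) {
        for (int b = -2; b <= 2; b++) {
            assert(neg_of_flag(a, b) == neg_folded(a, b));
            assert(not_of_select(a, b, 1, 0) == not_folded(a, b, 1, 0));
            assert(not_of_select(a, b, 5, -7) == not_folded(a, b, 5, -7));
        }
    }
}
```

In both folded forms the unary operation acts only on constants or on the select operands, so a single conditional select yields the final value without a separate negate or not.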
This patch has been tested on nvptx-none hosted on x86_64-pc-linux-gnu
with a "make" and "make -k check" with no new failures. Unfortunately,
the exact register usage of an nvptx kernel depends upon the version of
the CUDA drivers being used (and the hardware), but I believe this
change should resolve the PR (for Thomas) by improving code generation
for the cases that regressed.
gcc/ChangeLog:
PR target/104345
* config/nvptx/nvptx.md (sel_true<mode>): Fix indentation.
(sel_false<mode>): Likewise.
(define_code_iterator eqne): New code iterator for EQ and NE.
(*selp<mode>_neg_<code>): New define_insn_and_split to optimize
the negation of a selp instruction.
(*selp<mode>_not_<code>): New define_insn_and_split to optimize
the bitwise not of a selp instruction.
(*setcc_int<mode>): Use set instruction for neg:SI of a selp.
gcc/testsuite/ChangeLog:
PR target/104345
* gcc.target/nvptx/neg-selp.c: New test case.