author | Maciej W. Rozycki <macro@embecosm.com> | 2021-11-14 21:01:51 +0000 |
---|---|---|
committer | Maciej W. Rozycki <macro@embecosm.com> | 2021-11-14 21:01:51 +0000 |
commit | 3057f1ab737582a9fb37a3fb967ed8bf3659f2f4 (patch) | |
tree | 96126a8a9dda99de35f2c35d237f3cee443e79ee /gcc/lra-int.h | |
parent | e9a53a4f764c37b50aff68811c5d37fcd6f38adb (diff) | |
VAX: Add the `setmemhi' instruction
The MOVC5 machine instruction has `memset' semantics if encoded with a
zero source length[1]:
"4. MOVC5 with a zero source length operand is the preferred way
to fill a block of memory with the fill character."
Use that instruction to implement the `setmemhi' instruction. Use the
AP register in the register deferred mode for the source address to
yield the shortest possible encoding of the otherwise unused operand,
observing that the address is never dereferenced if the source length is
zero.
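The fill behaviour relied on above can be modelled in portable C.  This
is only a sketch for illustration: `movc5_model' and `fill_block' are
hypothetical names, not part of GCC or of this change, and the model
leaves out the register and condition-code side effects of the real
instruction.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model of MOVC5 srclen, srcaddr, fill, dstlen, dstaddr.
   Illustration only; register and condition-code results are omitted.  */
static void movc5_model(unsigned short srclen, const unsigned char *src,
                        unsigned char fill, unsigned short dstlen,
                        unsigned char *dst)
{
    /* Only as many bytes as fit in the destination are copied.  */
    size_t ncopy = srclen < dstlen ? srclen : dstlen;

    /* Copy phase: the source is read only if there is something to copy,
       so with a zero source length SRC is never dereferenced.  */
    if (ncopy != 0)
        memmove(dst, src, ncopy);

    /* Fill phase: the rest of the destination gets the fill character.  */
    memset(dst + ncopy, fill, (size_t)dstlen - ncopy);
}

/* memset-like wrapper: zero source length and a dummy source address.  */
static void *fill_block(void *dst, int c, unsigned short len)
{
    movc5_model(0, NULL, (unsigned char)c, len, dst);
    return dst;
}
```

With a source length of zero the copy phase is skipped entirely, which
is why any cheaply encoded address will do for the source operand.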
The use of this instruction yields consistently better performance, at
least with the Mariah VAX implementation, for a variable-length `memset'
call expanded inline as a single MOVC5 operation compared to an
equivalent libcall invocation:
Length: 1, time elapsed: 0.971789 (builtin), 2.847303 (libcall)
Length: 2, time elapsed: 0.907904 (builtin), 2.728259 (libcall)
Length: 3, time elapsed: 1.038311 (builtin), 2.917245 (libcall)
Length: 4, time elapsed: 0.775305 (builtin), 2.686088 (libcall)
Length: 7, time elapsed: 1.112331 (builtin), 2.992968 (libcall)
Length: 8, time elapsed: 0.856882 (builtin), 2.764885 (libcall)
Length: 15, time elapsed: 1.256086 (builtin), 3.096660 (libcall)
Length: 16, time elapsed: 1.001962 (builtin), 2.888131 (libcall)
Length: 31, time elapsed: 1.590456 (builtin), 3.774164 (libcall)
Length: 32, time elapsed: 1.288909 (builtin), 3.629622 (libcall)
Length: 63, time elapsed: 3.430285 (builtin), 5.269789 (libcall)
Length: 64, time elapsed: 3.265147 (builtin), 5.113156 (libcall)
Length: 127, time elapsed: 6.438772 (builtin), 8.268305 (libcall)
Length: 128, time elapsed: 6.268991 (builtin), 8.114557 (libcall)
Length: 255, time elapsed: 12.417338 (builtin), 14.259678 (libcall)
(times given in seconds per 1000000 `memset' invocations for the given
length made in a loop). It is clear from these figures that the hardware
coalesces data for consecutive bytes rather than naively writing them one
by one, as for lengths that are powers of 2 the figures are consistently
lower than the ones for their respective next lower lengths.
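The power-of-2 observation can be checked mechanically against the
figures quoted above.  The following is a small sketch; the table is
copied from the timings in this message, while the names `samples' and
`pow2_faster' are hypothetical and used only for this check.

```c
#include <stddef.h>

struct sample { unsigned len; double builtin; double libcall; };

/* Timings from the table above, in seconds per 1000000 invocations.  */
static const struct sample samples[] = {
    {   1,  0.971789,  2.847303 }, {   2,  0.907904,  2.728259 },
    {   3,  1.038311,  2.917245 }, {   4,  0.775305,  2.686088 },
    {   7,  1.112331,  2.992968 }, {   8,  0.856882,  2.764885 },
    {  15,  1.256086,  3.096660 }, {  16,  1.001962,  2.888131 },
    {  31,  1.590456,  3.774164 }, {  32,  1.288909,  3.629622 },
    {  63,  3.430285,  5.269789 }, {  64,  3.265147,  5.113156 },
    { 127,  6.438772,  8.268305 }, { 128,  6.268991,  8.114557 },
    { 255, 12.417338, 14.259678 },
};

#define NSAMPLES (sizeof samples / sizeof samples[0])

static int is_pow2(unsigned n)
{
    return n != 0 && (n & (n - 1)) == 0;
}

/* Count the power-of-2 lengths whose builtin time is lower than that of
   the entry for the next lower length; *PAIRS receives the number of
   (length - 1, length) pairs found in the table.  */
static unsigned pow2_faster(unsigned *pairs)
{
    unsigned faster = 0;

    *pairs = 0;
    for (size_t i = 1; i < NSAMPLES; i++)
        if (is_pow2(samples[i].len)
            && samples[i - 1].len + 1 == samples[i].len) {
            ++*pairs;
            if (samples[i].builtin < samples[i - 1].builtin)
                ++faster;
        }
    return faster;
}
```

For the table above this finds 7 such pairs, and the power-of-2 length
is the faster one in all 7.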
The use of MOVC5 also takes at least 4 bytes less of machine code, as it
avoids encoding the address of `memset' needed for the CALLS instruction
used to make a libcall, as well as the extra PUSHL instructions needed to
pass arguments to the call, since those can be encoded directly as the
respective operands of the MOVC5 instruction.
It is perhaps worth noting too that for constant lengths we prefer to
emit up to 5 individual MOVx instructions rather than a single MOVC5
instruction to clear memory, and for consistency we copy this behavior
here for filling memory with another value too, even though there may be
a performance advantage with a string copy in comparison to a piecemeal
copy, e.g.:
Length: 40, time elapsed: 2.183192 (string), 2.638878 (piecemeal)
But this is something for another change as it will have to be carefully
evaluated.
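As a rough illustration of the two cases discussed, the sketch below
shows a variable-length fill next to a short constant-length one.  The
function names are hypothetical and this is not the test case added by
this change; which expansion the compiler actually picks for either
function depends on the target, the optimization options and the
SET_RATIO setting, so the comments state what may happen rather than
what will.

```c
#include <string.h>

/* Variable length that fits the 16-bit length operand: a candidate for
   inline expansion via `setmemhi' as a single MOVC5.  */
void *fill_variable(void *p, int c, unsigned short n)
{
    return memset(p, c, n);
}

/* Short constant length and constant fill value: a candidate for
   piecemeal expansion with a few individual MOVx stores instead.  */
void *fill_const16(void *p)
{
    return memset(p, 0x55, 16);
}
```

Both functions have plain `memset' semantics regardless of how they are
expanded, so they can be compared on any host.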
[1] DEC STD 032-0 "VAX Architecture Standard", Digital Equipment
Corporation, A-DS-EL-00032-00-0 Rev J, December 15, 1989, Section
3.10 "Character-String Instructions", p. 3-163
gcc/
* config/vax/vax.h (SET_RATIO): New macro.
* config/vax/vax.md (UNSPEC_SETMEM_FILL): New constant.
(setmemhi): New expander.
(setmemhi1): New insn and splitter.
(*setmemhi1): New insn.
gcc/testsuite/
* gcc.target/vax/setmem.c: New test.