author | Roger Sayle <roger@nextmovesoftware.com> | 2023-06-07 23:35:15 +0100 |
---|---|---|
committer | Roger Sayle <roger@nextmovesoftware.com> | 2023-06-07 23:35:15 +0100 |
commit | eba3565ce6d766c006cbf1f7f293bbd1226a682d (patch) | |
tree | 31e99a4c0a6e3dcb19b9b6a1161df54c108c94f0 /gcc/range-op-float.cc | |
parent | 28db36e2cfca1b7106adc8d371600fa3a325c4e2 (diff) | |
Add support for stc and cmc instructions in i386.md
This patch is the latest revision of my patch to add support for the
STC (set carry flag) and CMC (complement carry flag) instructions to
the i386 backend, incorporating Uros' previous feedback. The significant
changes are (i) the inclusion of CMC, (ii) the use of an UNSPEC for the
pattern, and (iii) the use of a new X86_TUNE_SLOW_STC tuning flag to select
alternate implementations on pentium4 (which has a notoriously slow STC)
when not optimizing for size.
An example of the use of the stc instruction is:
unsigned int foo (unsigned int a, unsigned int b, unsigned int *c) {
  return __builtin_ia32_addcarryx_u32 (1, a, b, c);
}
which previously generated:
        movl    $1, %eax
        addb    $-1, %al
        adcl    %esi, %edi
        setc    %al
        movl    %edi, (%rdx)
        movzbl  %al, %eax
        ret
with this patch now generates:
        stc
        adcl    %esi, %edi
        setc    %al
        movl    %edi, (%rdx)
        movzbl  %al, %eax
        ret
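The effect of the two sequences above can be modelled in portable C. This is only a sketch for illustration: the helper names (carry_from_addb_m1, adc32, foo_model, check_stc_model) are hypothetical and not part of the patch, and the model assumes that any nonzero carry-in is treated as 1. It shows why "movl $1, %eax; addb $-1, %al" was a roundabout way of setting the carry flag: adding 0xFF to a nonzero byte always carries out of bit 7, which is exactly the state that a single stc produces.

```c
#include <assert.h>
#include <stdint.h>

/* Carry-out of the 8-bit addition x + 0xFF, i.e. the value that
   "addb $-1, %al" leaves in CF: set exactly when x is nonzero.  */
unsigned carry_from_addb_m1 (uint8_t x)
{
  return ((unsigned) x + 0xFFu) >> 8;
}

/* Model of a 32-bit add-with-carry: stores the low 32 bits of
   a + b + carry_in in *sum and returns the carry-out (0 or 1).
   Assumption: any nonzero carry_in counts as 1.  */
unsigned adc32 (unsigned carry_in, uint32_t a, uint32_t b, uint32_t *sum)
{
  uint64_t wide = (uint64_t) a + b + (carry_in != 0);
  *sum = (uint32_t) wide;
  return (unsigned) (wide >> 32);
}

/* Hypothetical stand-in for foo: the carry-in is the constant 1,
   which is what stc (or the older movl/addb pair) establishes.  */
unsigned foo_model (uint32_t a, uint32_t b, uint32_t *c)
{
  return adc32 (1, a, b, c);
}

/* Sanity checks for the model.  */
void check_stc_model (void)
{
  uint32_t sum;
  assert (carry_from_addb_m1 (1) == 1);  /* old sequence sets CF */
  assert (foo_model (2, 3, &sum) == 0 && sum == 6);
  assert (foo_model (0xFFFFFFFFu, 0, &sum) == 1 && sum == 0);
}
```

Calling check_stc_model () from a test driver exercises both the carry-setting idiom and the add-with-carry model.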
An example of the use of the cmc instruction (where the carry from
a first adc is inverted/complemented as input to a second adc) is:
unsigned int o1, o2;

unsigned int bar (unsigned int a, unsigned int b,
                  unsigned int c, unsigned int d)
{
  unsigned int c1 = __builtin_ia32_addcarryx_u32 (1, a, b, &o1);
  return __builtin_ia32_addcarryx_u32 (c1 ^ 1, c, d, &o2);
}
which previously generated:
        movl    $1, %eax
        addb    $-1, %al
        adcl    %esi, %edi
        setnc   %al
        movl    %edi, o1(%rip)
        addb    $-1, %al
        adcl    %ecx, %edx
        setc    %al
        movl    %edx, o2(%rip)
        movzbl  %al, %eax
        ret
and now generates:
        stc
        adcl    %esi, %edi
        cmc
        movl    %edi, o1(%rip)
        adcl    %ecx, %edx
        setc    %al
        movl    %edx, o2(%rip)
        movzbl  %al, %eax
        ret
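The carry chain in this second example can likewise be written out in portable C. Again this is a hedged sketch rather than anything taken from the patch: adc32, bar_model, model_o1, model_o2 and check_cmc_model are hypothetical names, and nonzero carry-in is assumed to behave as 1. The "c1 ^ 1" between the two additions is the step that the new code performs directly on the flag with cmc, instead of materialising the carry in %al with setnc and re-creating it with addb.

```c
#include <assert.h>
#include <stdint.h>

/* Model of a 32-bit add-with-carry: stores the low 32 bits of
   a + b + carry_in in *sum and returns the carry-out (0 or 1).
   Assumption: any nonzero carry_in counts as 1.  */
unsigned adc32 (unsigned carry_in, uint32_t a, uint32_t b, uint32_t *sum)
{
  uint64_t wide = (uint64_t) a + b + (carry_in != 0);
  *sum = (uint32_t) wide;
  return (unsigned) (wide >> 32);
}

/* Hypothetical stand-ins for the globals o1 and o2.  */
uint32_t model_o1, model_o2;

/* Hypothetical stand-in for bar.  */
unsigned bar_model (uint32_t a, uint32_t b, uint32_t c, uint32_t d)
{
  unsigned c1 = adc32 (1, a, b, &model_o1);  /* stc; adcl */
  return adc32 (c1 ^ 1, c, d, &model_o2);    /* cmc; adcl; setc */
}

/* Sanity checks for the model.  */
void check_cmc_model (void)
{
  /* First add carries out, so the inverted carry-in is 0.  */
  assert (bar_model (0xFFFFFFFFu, 0, 5, 6) == 0);
  assert (model_o1 == 0 && model_o2 == 11);
  /* First add does not carry, so the inverted carry-in is 1.  */
  assert (bar_model (1, 2, 0xFFFFFFFFu, 0) == 1);
  assert (model_o1 == 4 && model_o2 == 0);
}
```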
This version implements Uros' suggestions/refinements: (i) avoid the
UNSPEC_CMC by using the canonical RTL idiom for *x86_cmc, (ii) use
peephole2s to convert x86_stc and *x86_cmc into alternate forms on
TARGET_SLOW_STC CPUs (pentium4), when a suitable QImode register is
available, and (iii) prefer the addqi_cconly_overflow idiom (addb $-1,%al)
over negqi_ccc_1 (neg %al) for setting the carry from a QImode value.
These changes required two minor edits to i386.cc: ix86_cc_mode had
to be tweaked to suggest CCCmode for the new *x86_cmc pattern, and
*x86_cmc needed to be handled/parameterized in ix86_rtx_costs so that
combine would appreciate that this complex RTL expression was actually
a fast, single byte instruction [i.e. preferable].
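The two idioms compared in item (iii) leave the same value in the carry flag, which a small exhaustive check in C can illustrate. This is a sketch based on the documented flag behaviour of the x86 NEG and ADD instructions, not on the patch itself; the function names are hypothetical. The idioms differ only in the value left in the register (neg leaves -x, addb $-1 leaves x - 1), which does not matter when only the flag is consumed.

```c
#include <assert.h>
#include <stdint.h>

/* CF after "neg %al": clear only when the operand is zero.  */
unsigned carry_from_neg (uint8_t x)
{
  return x != 0;
}

/* CF after "addb $-1, %al": the carry-out of x + 0xFF.  */
unsigned carry_from_addb_m1 (uint8_t x)
{
  return ((unsigned) x + 0xFFu) >> 8;
}

/* Checks that both idioms produce the same carry for every
   QImode (8-bit) value.  */
void check_idioms_agree (void)
{
  for (unsigned x = 0; x <= 0xFFu; x++)
    assert (carry_from_neg ((uint8_t) x)
            == carry_from_addb_m1 ((uint8_t) x));
}
```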
2023-06-07 Roger Sayle <roger@nextmovesoftware.com>
Uros Bizjak <ubizjak@gmail.com>
gcc/ChangeLog
* config/i386/i386-expand.cc (ix86_expand_builtin) <handlecarry>:
Use new x86_stc instruction when the carry flag must be set.
* config/i386/i386.cc (ix86_cc_mode): Use CCCmode for *x86_cmc.
(ix86_rtx_costs): Provide accurate rtx_costs for *x86_cmc.
* config/i386/i386.h (TARGET_SLOW_STC): New define.
* config/i386/i386.md (UNSPEC_STC): New UNSPEC for stc.
(x86_stc): New define_insn.
(define_peephole2): Convert x86_stc into alternate implementation
on pentium4 without -Os when a QImode register is available.
(*x86_cmc): New define_insn.
(define_peephole2): Convert *x86_cmc into alternate implementation
on pentium4 without -Os when a QImode register is available.
(*setccc): New define_insn_and_split for a no-op CCCmode move.
(*setcc_qi_negqi_ccc_1_<mode>): New define_insn_and_split to
recognize (and eliminate) the carry flag being copied to itself.
(*setcc_qi_negqi_ccc_2_<mode>): Likewise.
* config/i386/x86-tune.def (X86_TUNE_SLOW_STC): New tuning flag.
gcc/testsuite/ChangeLog
* gcc.target/i386/cmc-1.c: New test case.
* gcc.target/i386/stc-1.c: Likewise.