|
|
|
This pass detects cases of expensive store forwarding and tries to
avoid them by reordering the stores and using suitable bit insertion
sequences. For example it can transform this:
strb w2, [x1, 1]
ldr x0, [x1] # Expensive store forwarding to larger load.
To:
ldr x0, [x1]
strb w2, [x1]
bfi x0, x2, 0, 8
Assembly like this can appear with bitfields or type punning / unions.
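A minimal C sketch of source that can lead to this pattern, assuming a union used for type punning. The names `punned_u64`, `store_then_load`, `byte_shift` and `load_then_insert` are illustrative and do not come from the patch. The second function writes out by hand the load-then-bit-insert form that the pass aims for; the bit position of the inserted byte depends on the byte order of the host.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative union: one 64-bit value aliased with its eight bytes.  */
union punned_u64
{
  uint64_t whole;
  uint8_t bytes[8];
};

/* Narrow store followed by a wider load of the same memory.  This is
   the shape that can trigger an expensive store forwarding.  */
static uint64_t
store_then_load (union punned_u64 *p, unsigned idx, uint8_t v)
{
  assert (idx < 8);
  p->bytes[idx] = v;
  return p->whole;
}

/* Bit position of byte IDX inside the 64-bit value on this host.  */
static unsigned
byte_shift (unsigned idx)
{
  const union punned_u64 probe = { .whole = 1 };
  int little_endian = probe.bytes[0] == 1;
  return little_endian ? 8 * idx : 8 * (7 - idx);
}

/* Hand-written equivalent: load first, do the store, then insert the
   stored byte into the loaded value with a mask and a shift.  */
static uint64_t
load_then_insert (union punned_u64 *p, unsigned idx, uint8_t v)
{
  assert (idx < 8);
  uint64_t loaded = p->whole;
  unsigned shift = byte_shift (idx);
  p->bytes[idx] = v;
  loaded &= ~((uint64_t) 0xff << shift);
  loaded |= (uint64_t) v << shift;
  return loaded;
}
```

Both functions leave memory in the same state and return the same value; only the order of the load and the store differs.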
On stress-ng when running the cpu-union microbenchmark the following
speedups have been observed.
Neoverse-N1: +29.4%
Intel Coffeelake: +13.1%
AMD 5950X: +17.5%
The transformation is rejected on cases that cause store_bit_field to
generate subreg expressions on different register classes. Files
avoid-store-forwarding-4.c and avoid-store-forwarding-5.c contain such
cases and have been marked as XFAIL.
Because store_bit_field biases its operands, machines with
BITS_BIG_ENDIAN != BYTES_BIG_ENDIAN need special handling. The
need for this was exposed by an issue on the H8 architecture,
which uses big-endian ordering, but BITS_BIG_ENDIAN is false. In that
case, the START parameter of store_bit_field needs to be calculated
from the end of the destination register.
gcc/ChangeLog:
* Makefile.in (OBJS): Add avoid-store-forwarding.o.
* common.opt (favoid-store-forwarding): New option.
* common.opt.urls: Regenerate.
* doc/invoke.texi: New param store-forwarding-max-distance.
* doc/passes.texi: Document new pass.
* doc/tm.texi: Regenerate.
* doc/tm.texi.in: Document new pass.
* params.opt (store-forwarding-max-distance): New param.
* passes.def: Add pass_rtl_avoid_store_forwarding before
pass_early_remat.
* target.def (avoid_store_forwarding_p): New DEFHOOK.
* target.h (struct store_fwd_info): Declare.
* targhooks.cc (default_avoid_store_forwarding_p): New function.
* targhooks.h (default_avoid_store_forwarding_p): Declare.
* tree-pass.h (make_pass_rtl_avoid_store_forwarding): Declare.
* avoid-store-forwarding.cc: New file.
* avoid-store-forwarding.h: New file.
* timevar.def (TV_AVOID_STORE_FORWARDING): New timevar.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/avoid-store-forwarding-1.c: New test.
* gcc.target/aarch64/avoid-store-forwarding-2.c: New test.
* gcc.target/aarch64/avoid-store-forwarding-3.c: New test.
* gcc.target/aarch64/avoid-store-forwarding-4.c: New test.
* gcc.target/aarch64/avoid-store-forwarding-5.c: New test.
* gcc.target/x86_64/abi/callabi/avoid-store-forwarding-1.c: New test.
* gcc.target/x86_64/abi/callabi/avoid-store-forwarding-2.c: New test.
Co-authored-by: Philipp Tomsich <philipp.tomsich@vrull.eu>
Signed-off-by: Philipp Tomsich <philipp.tomsich@vrull.eu>
Signed-off-by: Konstantinos Eleftheriou <konstantinos.eleftheriou@vrull.eu>
|
|
This adds a conditional store optimization for the vectorizer as a pattern.
The vectorizer already supports modifying memory accesses because of the pattern
based gather/scatter recognition.
Doing it in the vectorizer allows us to still keep the ability to vectorize such
loops for architectures that don't have MASK_STORE support, whereas doing this
in ifcvt makes us commit to MASK_STORE.
Concretely for this loop:
void foo1 (char *restrict a, int *restrict b, int *restrict c, int n, int stride)
{
if (stride <= 1)
return;
for (int i = 0; i < n; i++)
{
int res = c[i];
int t = b[i+stride];
if (a[i] != 0)
res = t;
c[i] = res;
}
}
today we generate:
.L3:
ld1b z29.s, p7/z, [x0, x5]
ld1w z31.s, p7/z, [x2, x5, lsl 2]
ld1w z30.s, p7/z, [x1, x5, lsl 2]
cmpne p15.b, p6/z, z29.b, #0
sel z30.s, p15, z30.s, z31.s
st1w z30.s, p7, [x2, x5, lsl 2]
add x5, x5, x4
whilelo p7.s, w5, w3
b.any .L3
which in gimple is:
vect_res_18.9_68 = .MASK_LOAD (vectp_c.7_65, 32B, loop_mask_67);
vect_t_20.12_74 = .MASK_LOAD (vectp.10_72, 32B, loop_mask_67);
vect__9.15_77 = .MASK_LOAD (vectp_a.13_75, 8B, loop_mask_67);
mask__34.16_79 = vect__9.15_77 != { 0, ... };
vect_res_11.17_80 = VEC_COND_EXPR <mask__34.16_79, vect_t_20.12_74, vect_res_18.9_68>;
.MASK_STORE (vectp_c.18_81, 32B, loop_mask_67, vect_res_11.17_80);
A MASK_STORE is already conditional, so there's no need to perform the load of
the old values and the VEC_COND_EXPR. This patch makes it so we generate:
vect_res_18.9_68 = .MASK_LOAD (vectp_c.7_65, 32B, loop_mask_67);
vect__9.15_77 = .MASK_LOAD (vectp_a.13_75, 8B, loop_mask_67);
mask__34.16_79 = vect__9.15_77 != { 0, ... };
.MASK_STORE (vectp_c.18_81, 32B, mask__34.16_79, vect_res_18.9_68);
which generates:
.L3:
ld1b z30.s, p7/z, [x0, x5]
ld1w z31.s, p7/z, [x1, x5, lsl 2]
cmpne p7.b, p7/z, z30.b, #0
st1w z31.s, p7, [x2, x5, lsl 2]
add x5, x5, x4
whilelo p7.s, w5, w3
b.any .L3
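As a scalar model of the transformation, the sketch below compares the "load old value, select, always store" form with the "store only where the condition holds" form that a masked store expresses per lane. The function names and the `check_equivalence` helper are illustrative; they are not part of the patch.

```c
#include <assert.h>

/* Form seen before the patch: always store, choosing between the new
   value and the value that was already in c[i].  */
static void
select_then_store (const char *a, const int *b, int *c, int n, int stride)
{
  if (stride <= 1)
    return;
  for (int i = 0; i < n; i++)
    {
      int res = c[i];
      int t = b[i + stride];
      if (a[i] != 0)
        res = t;
      c[i] = res;
    }
}

/* Form after the patch: store only where the condition holds, so the
   old value of c[i] never has to be loaded.  */
static void
conditional_store (const char *a, const int *b, int *c, int n, int stride)
{
  if (stride <= 1)
    return;
  for (int i = 0; i < n; i++)
    if (a[i] != 0)
      c[i] = b[i + stride];
}

/* Run both forms on the same input and check that they agree.  */
static void
check_equivalence (void)
{
  const char a[4] = { 1, 0, 1, 0 };
  const int b[6] = { 10, 20, 30, 40, 50, 60 };
  int c1[4] = { 1, 2, 3, 4 };
  int c2[4] = { 1, 2, 3, 4 };

  select_then_store (a, b, c1, 4, 2);
  conditional_store (a, b, c2, 4, 2);
  for (int i = 0; i < 4; i++)
    assert (c1[i] == c2[i]);
  assert (c1[0] == 30 && c1[1] == 2 && c1[2] == 50 && c1[3] == 4);
}
```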
gcc/ChangeLog:
PR tree-optimization/115531
* tree-vect-patterns.cc (vect_cond_store_pattern_same_ref): New.
(vect_recog_cond_store_pattern): New.
(vect_vect_recog_func_ptrs): Use it.
* target.def (conditional_operation_is_expensive): New.
* doc/tm.texi: Regenerate.
* doc/tm.texi.in: Document it.
* targhooks.cc (default_conditional_operation_is_expensive): New.
* targhooks.h (default_conditional_operation_is_expensive): New.
|
|
Currently the mode for a floating point type is determined
by taking the type precision (size) and calling
mode_for_size to get the first mode which has this size in
the specified class. On Powerpc, we have three modes
(TF/KF/IF) with the same mode precision 128 (see [1]), so
this processing forces us to place TF first, requires more
adjustments in some generic code to avoid unexpected mode
conversions, and would get even worse if we got rid of TF
one day. As Joseph pointed out in [2], "floating types
should have their mode, not a poorly defined precision
value". Following the suggestion from Joseph and Richi,
this patch introduces one hook, mode_for_floating_type,
which returns the corresponding mode for type float, double
or long double. The default implementation returns SFmode
for float and DFmode for double or long double. For ports
which need special treatment, there are some other patches
with their own port specific implementation (referring to
how {,LONG_}DOUBLE_TYPE_SIZE get used there). For all
generic uses of {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE, depending
on the context, some are replaced with TYPE_PRECISION of
the corresponding type node and others are replaced with
GET_MODE_PRECISION on the mode from mode_for_floating_type.
This patch also poisons {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE,
so most definitions of {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE in
port specific code are removed. A few are worth keeping for
readability, and those are renamed with a port specific
prefix.
[1] https://gcc.gnu.org/pipermail/gcc-patches/2024-May/651017.html
[2] https://gcc.gnu.org/pipermail/gcc-patches/2024-May/651209.html
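The behaviour described above can be modelled outside the compiler. The sketch below uses stand-in enums (`mock_mode`, `mock_tree_index`) in place of GCC's machine modes and tree indices, so every identifier in it is an assumption made for illustration. The second function shows what a port override with a 128-bit long double might look like; it does not reproduce any particular port.

```c
#include <assert.h>

/* Stand-ins for GCC's machine modes and tree indices, so the sketch
   compiles on its own.  Only the naming mirrors GCC.  */
enum mock_mode { MOCK_SFmode, MOCK_DFmode, MOCK_TFmode };
enum mock_tree_index
{
  MOCK_TI_FLOAT_TYPE,
  MOCK_TI_DOUBLE_TYPE,
  MOCK_TI_LONG_DOUBLE_TYPE
};

/* Default behaviour as described in the commit message: SFmode for
   float, DFmode for double and long double.  */
static enum mock_mode
mock_default_mode_for_floating_type (enum mock_tree_index ti)
{
  if (ti == MOCK_TI_FLOAT_TYPE)
    return MOCK_SFmode;
  assert (ti == MOCK_TI_DOUBLE_TYPE || ti == MOCK_TI_LONG_DOUBLE_TYPE);
  return MOCK_DFmode;
}

/* Hypothetical port override: long double gets a 128-bit mode, the
   other types fall back to the default.  */
static enum mock_mode
mock_port_mode_for_floating_type (enum mock_tree_index ti)
{
  if (ti == MOCK_TI_LONG_DOUBLE_TYPE)
    return MOCK_TFmode;
  return mock_default_mode_for_floating_type (ti);
}
```

The point of the hook is that the port names the mode directly, so nothing has to search for "the first mode with precision 128".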
gcc/jit/ChangeLog:
* jit-recording.cc (recording::memento_of_get_type::get_size): Update
macros {FLOAT,DOUBLE,LONG_DOUBLE}_TYPE_SIZE by calling
targetm.c.mode_for_floating_type with
TI_{FLOAT,DOUBLE,LONG_DOUBLE}_TYPE.
gcc/ChangeLog:
* coretypes.h (enum tree_index): Forward declaration.
* defaults.h (FLOAT_TYPE_SIZE): Remove.
(DOUBLE_TYPE_SIZE): Likewise.
(LONG_DOUBLE_TYPE_SIZE): Likewise.
* doc/rtl.texi: Update document by replacing {FLOAT,DOUBLE}_TYPE_SIZE
with C type {float,double}.
* doc/tm.texi.in: Document new hook mode_for_floating_type, remove
document entries for {FLOAT,DOUBLE,LONG_DOUBLE}_TYPE_SIZE and
update document for WIDEST_HARDWARE_FP_SIZE.
* doc/tm.texi: Regenerate.
* emit-rtl.cc (init_emit_once): Replace DOUBLE_TYPE_SIZE by
calling targetm.c.mode_for_floating_type with TI_DOUBLE_TYPE.
* real.h (REAL_VALUE_TO_TARGET_LONG_DOUBLE): Use TYPE_PRECISION of
long_double_type_node to replace LONG_DOUBLE_TYPE_SIZE.
* system.h (FLOAT_TYPE_SIZE): Poison.
(DOUBLE_TYPE_SIZE): Likewise.
(LONG_DOUBLE_TYPE_SIZE): Likewise.
* target.def (mode_for_floating_type): New hook.
* targhooks.cc (default_mode_for_floating_type): New function.
(default_scalar_mode_supported_p): Update macros
{FLOAT,DOUBLE,LONG_DOUBLE}_TYPE_SIZE by calling
targetm.c.mode_for_floating_type with
TI_{FLOAT,DOUBLE,LONG_DOUBLE}_TYPE.
* targhooks.h (default_mode_for_floating_type): New declaration.
* tree-core.h (enum tree_index): Specify underlying type unsigned
to sync with forward declaration in coretypes.h.
(NUM_FLOATN_TYPES): Explicitly convert to int.
(NUM_FLOATNX_TYPES): Likewise.
(NUM_FLOATN_NX_TYPES): Likewise.
* tree.cc (build_common_tree_nodes): Update macros
{FLOAT,DOUBLE,LONG_DOUBLE}_TYPE_SIZE by calling
targetm.c.mode_for_floating_type with
TI_{FLOAT,DOUBLE,LONG_DOUBLE}_TYPE and set type mode accordingly.
* config/arc/arc.h (FLOAT_TYPE_SIZE): Remove.
(DOUBLE_TYPE_SIZE): Likewise.
(LONG_DOUBLE_TYPE_SIZE): Likewise.
* config/bpf/bpf.h (FLOAT_TYPE_SIZE): Remove.
(DOUBLE_TYPE_SIZE): Likewise.
(LONG_DOUBLE_TYPE_SIZE): Likewise.
* config/epiphany/epiphany.h (FLOAT_TYPE_SIZE): Remove.
(DOUBLE_TYPE_SIZE): Likewise.
(LONG_DOUBLE_TYPE_SIZE): Likewise.
* config/fr30/fr30.h (FLOAT_TYPE_SIZE): Remove.
(DOUBLE_TYPE_SIZE): Likewise.
(LONG_DOUBLE_TYPE_SIZE): Likewise.
* config/frv/frv.h (FLOAT_TYPE_SIZE): Remove.
(DOUBLE_TYPE_SIZE): Likewise.
(LONG_DOUBLE_TYPE_SIZE): Likewise.
* config/ft32/ft32.h (FLOAT_TYPE_SIZE): Remove.
(DOUBLE_TYPE_SIZE): Likewise.
(LONG_DOUBLE_TYPE_SIZE): Likewise.
* config/gcn/gcn.h (FLOAT_TYPE_SIZE): Remove.
(DOUBLE_TYPE_SIZE): Likewise.
(LONG_DOUBLE_TYPE_SIZE): Likewise.
* config/iq2000/iq2000.h (FLOAT_TYPE_SIZE): Remove.
(DOUBLE_TYPE_SIZE): Likewise.
(LONG_DOUBLE_TYPE_SIZE): Likewise.
* config/lm32/lm32.h (FLOAT_TYPE_SIZE): Remove.
(DOUBLE_TYPE_SIZE): Likewise.
(LONG_DOUBLE_TYPE_SIZE): Likewise.
* config/m32c/m32c.h (FLOAT_TYPE_SIZE): Remove.
(DOUBLE_TYPE_SIZE): Likewise.
(LONG_DOUBLE_TYPE_SIZE): Likewise.
* config/m32r/m32r.h (FLOAT_TYPE_SIZE): Remove.
(DOUBLE_TYPE_SIZE): Likewise.
(LONG_DOUBLE_TYPE_SIZE): Likewise.
* config/microblaze/microblaze.h (FLOAT_TYPE_SIZE): Remove.
(DOUBLE_TYPE_SIZE): Likewise.
(LONG_DOUBLE_TYPE_SIZE): Likewise.
* config/mmix/mmix.h (FLOAT_TYPE_SIZE): Remove.
(DOUBLE_TYPE_SIZE): Likewise.
(LONG_DOUBLE_TYPE_SIZE): Likewise.
* config/moxie/moxie.h (FLOAT_TYPE_SIZE): Remove.
(DOUBLE_TYPE_SIZE): Likewise.
(LONG_DOUBLE_TYPE_SIZE): Likewise.
* config/msp430/msp430.h (FLOAT_TYPE_SIZE): Remove.
(DOUBLE_TYPE_SIZE): Likewise.
(LONG_DOUBLE_TYPE_SIZE): Likewise.
* config/nds32/nds32.h (FLOAT_TYPE_SIZE): Remove.
(DOUBLE_TYPE_SIZE): Likewise.
(LONG_DOUBLE_TYPE_SIZE): Likewise.
* config/nios2/nios2.h (FLOAT_TYPE_SIZE): Remove.
(DOUBLE_TYPE_SIZE): Likewise.
(LONG_DOUBLE_TYPE_SIZE): Likewise.
* config/nvptx/nvptx.h (FLOAT_TYPE_SIZE): Remove.
(DOUBLE_TYPE_SIZE): Likewise.
(LONG_DOUBLE_TYPE_SIZE): Likewise.
* config/or1k/or1k.h (FLOAT_TYPE_SIZE): Remove.
(DOUBLE_TYPE_SIZE): Likewise.
(LONG_DOUBLE_TYPE_SIZE): Likewise.
* config/pdp11/pdp11.h (FLOAT_TYPE_SIZE): Remove.
(DOUBLE_TYPE_SIZE): Likewise.
(LONG_DOUBLE_TYPE_SIZE): Likewise.
* config/pru/pru.h (FLOAT_TYPE_SIZE): Remove.
(DOUBLE_TYPE_SIZE): Likewise.
(LONG_DOUBLE_TYPE_SIZE): Likewise.
* config/stormy16/stormy16.h (FLOAT_TYPE_SIZE): Remove.
(DOUBLE_TYPE_SIZE): Likewise.
(LONG_DOUBLE_TYPE_SIZE): Likewise.
* config/visium/visium.h (FLOAT_TYPE_SIZE): Remove.
(DOUBLE_TYPE_SIZE): Likewise.
(LONG_DOUBLE_TYPE_SIZE): Likewise.
* config/xtensa/xtensa.h (FLOAT_TYPE_SIZE): Remove.
(DOUBLE_TYPE_SIZE): Likewise.
(LONG_DOUBLE_TYPE_SIZE): Likewise.
* config/rs6000/rs6000.cc (TARGET_C_MODE_FOR_FLOATING_TYPE): New macro.
(rs6000_c_mode_for_floating_type): New function.
* config/rs6000/rs6000.h (FLOAT_TYPE_SIZE): Remove.
(DOUBLE_TYPE_SIZE): Likewise.
(LONG_DOUBLE_TYPE_SIZE): Likewise.
* config/aarch64/aarch64.cc (aarch64_c_mode_for_floating_type):
New function.
(TARGET_C_MODE_FOR_FLOATING_TYPE): New macro.
* config/aarch64/aarch64.h (FLOAT_TYPE_SIZE): Remove.
(DOUBLE_TYPE_SIZE): Likewise.
(LONG_DOUBLE_TYPE_SIZE): Likewise.
* config/alpha/alpha.cc (alpha_c_mode_for_floating_type): New
function.
(TARGET_C_MODE_FOR_FLOATING_TYPE): New macro.
* config/alpha/alpha.h (FLOAT_TYPE_SIZE): Remove.
(DOUBLE_TYPE_SIZE): Likewise.
(LONG_DOUBLE_TYPE_SIZE): Likewise.
* config/avr/avr.cc (avr_c_mode_for_floating_type): New
function.
(TARGET_C_MODE_FOR_FLOATING_TYPE): New macro.
* config/avr/avr.h (FLOAT_TYPE_SIZE): Remove.
(DOUBLE_TYPE_SIZE): Likewise.
(LONG_DOUBLE_TYPE_SIZE): Likewise.
* config/i386/i386.cc (ix86_c_mode_for_floating_type): New
function.
(TARGET_C_MODE_FOR_FLOATING_TYPE): New macro.
* config/i386/i386.h (FLOAT_TYPE_SIZE): Remove.
(DOUBLE_TYPE_SIZE): Likewise.
(LONG_DOUBLE_TYPE_SIZE): Likewise.
* config/ia64/ia64.cc (ia64_c_mode_for_floating_type): New
function.
(TARGET_C_MODE_FOR_FLOATING_TYPE): New macro.
* config/ia64/ia64.h (FLOAT_TYPE_SIZE): Remove.
(DOUBLE_TYPE_SIZE): Likewise.
(LONG_DOUBLE_TYPE_SIZE): Likewise.
* config/riscv/riscv.cc (riscv_c_mode_for_floating_type): New function.
(TARGET_C_MODE_FOR_FLOATING_TYPE): New macro.
* config/riscv/riscv.h (FLOAT_TYPE_SIZE): Remove.
(DOUBLE_TYPE_SIZE): Likewise.
(LONG_DOUBLE_TYPE_SIZE): Likewise.
* config/rl78/rl78.cc (TARGET_C_MODE_FOR_FLOATING_TYPE): New macro.
(rl78_c_mode_for_floating_type): New function.
* config/rl78/rl78.h (FLOAT_TYPE_SIZE): Remove.
(DOUBLE_TYPE_SIZE): Likewise.
(LONG_DOUBLE_TYPE_SIZE): Likewise.
* config/rx/rx.cc (rx_c_mode_for_floating_type): New function.
(TARGET_C_MODE_FOR_FLOATING_TYPE): New macro.
* config/rx/rx.h (FLOAT_TYPE_SIZE): Remove.
(DOUBLE_TYPE_SIZE): Likewise.
(LONG_DOUBLE_TYPE_SIZE): Likewise.
* config/s390/s390.cc (s390_c_mode_for_floating_type): New function.
(TARGET_C_MODE_FOR_FLOATING_TYPE): New macro.
* config/s390/s390.h (FLOAT_TYPE_SIZE): Remove.
(DOUBLE_TYPE_SIZE): Likewise.
(LONG_DOUBLE_TYPE_SIZE): Likewise.
* config/sh/sh.cc (sh_c_mode_for_floating_type): New function.
(TARGET_C_MODE_FOR_FLOATING_TYPE): New macro.
* config/sh/sh.h (LONG_DOUBLE_TYPE_SIZE): Remove.
* config/h8300/h8300.cc (h8300_c_mode_for_floating_type): New
function.
(TARGET_C_MODE_FOR_FLOATING_TYPE): New macro.
* config/h8300/h8300.h (FLOAT_TYPE_SIZE): Remove.
(DOUBLE_TYPE_SIZE): Remove.
(LONG_DOUBLE_TYPE_SIZE): Remove.
(DOUBLE_TYPE_MODE): New macro.
* config/h8300/linux.h (DOUBLE_TYPE_SIZE): Remove.
(DOUBLE_TYPE_MODE): New macro.
* config/loongarch/loongarch.cc (loongarch_c_mode_for_floating_type):
New function.
(TARGET_C_MODE_FOR_FLOATING_TYPE): New macro.
* config/loongarch/loongarch.h (FLOAT_TYPE_SIZE): Remove.
(DOUBLE_TYPE_SIZE): Remove.
(LONG_DOUBLE_TYPE_SIZE): Rename to ...
(LA_LONG_DOUBLE_TYPE_SIZE): ... this.
(UNITS_PER_FPVALUE): Replace LONG_DOUBLE_TYPE_SIZE with
LA_LONG_DOUBLE_TYPE_SIZE.
(MAX_FIXED_MODE_SIZE): Likewise.
(STRUCTURE_SIZE_BOUNDARY): Likewise.
(BIGGEST_ALIGNMENT): Likewise.
* config/m68k/m68k.cc (m68k_c_mode_for_floating_type): New function.
(TARGET_C_MODE_FOR_FLOATING_TYPE): New macro.
* config/m68k/m68k.h (LONG_DOUBLE_TYPE_SIZE): Remove.
(LONG_DOUBLE_TYPE_MODE): New macro.
* config/m68k/netbsd-elf.h (LONG_DOUBLE_TYPE_SIZE): Remove.
(LONG_DOUBLE_TYPE_MODE): New macro.
* config/mips/mips.cc (mips_c_mode_for_floating_type): New function.
(TARGET_C_MODE_FOR_FLOATING_TYPE): New macro.
* config/mips/mips.h (UNITS_PER_FPVALUE): Replace LONG_DOUBLE_TYPE_SIZE
with MIPS_LONG_DOUBLE_TYPE_SIZE.
(MAX_FIXED_MODE_SIZE): Likewise.
(STRUCTURE_SIZE_BOUNDARY): Likewise.
(BIGGEST_ALIGNMENT): Likewise.
(FLOAT_TYPE_SIZE): Remove.
(DOUBLE_TYPE_SIZE): Remove.
(LONG_DOUBLE_TYPE_SIZE): Rename to ...
(MIPS_LONG_DOUBLE_TYPE_SIZE): ... this.
* config/mips/n32-elf.h (LONG_DOUBLE_TYPE_SIZE): Rename to ...
(MIPS_LONG_DOUBLE_TYPE_SIZE): ... this.
* config/pa/pa.cc (pa_c_mode_for_floating_type): New function.
(TARGET_C_MODE_FOR_FLOATING_TYPE): New macro.
(pa_scalar_mode_supported_p): Rename FLOAT_TYPE_SIZE to
PA_FLOAT_TYPE_SIZE, rename DOUBLE_TYPE_SIZE to PA_DOUBLE_TYPE_SIZE
and rename LONG_DOUBLE_TYPE_SIZE to PA_LONG_DOUBLE_TYPE_SIZE.
* config/pa/pa.h (PA_FLOAT_TYPE_SIZE): New macro.
(PA_DOUBLE_TYPE_SIZE): Likewise.
(PA_LONG_DOUBLE_TYPE_SIZE): Likewise.
* config/pa/pa-64.h (FLOAT_TYPE_SIZE): Rename to ...
(PA_FLOAT_TYPE_SIZE): ... this.
(DOUBLE_TYPE_SIZE): Rename to ...
(PA_DOUBLE_TYPE_SIZE): ... this.
(LONG_DOUBLE_TYPE_SIZE): Rename to ...
(PA_LONG_DOUBLE_TYPE_SIZE): ... this.
* config/pa/pa-hpux.h (LONG_DOUBLE_TYPE_SIZE): Rename to ...
(PA_LONG_DOUBLE_TYPE_SIZE): ... this.
* config/sparc/sparc.cc (sparc_c_mode_for_floating_type): New function.
(TARGET_C_MODE_FOR_FLOATING_TYPE): New macro.
(FLOAT_TYPE_SIZE): Remove.
(DOUBLE_TYPE_SIZE): Likewise.
(LONG_DOUBLE_TYPE_SIZE): Likewise.
(sparc_type_code): Replace FLOAT_TYPE_SIZE with TYPE_PRECISION of
float_type_node.
* config/sparc/sparc.h (FLOAT_TYPE_SIZE): Remove.
(DOUBLE_TYPE_SIZE): Remove.
* config/sparc/freebsd.h (LONG_DOUBLE_TYPE_SIZE): Rename to ...
(SPARC_LONG_DOUBLE_TYPE_SIZE): ... this.
* config/sparc/linux.h (LONG_DOUBLE_TYPE_SIZE): Rename to ...
(SPARC_LONG_DOUBLE_TYPE_SIZE): ... this.
* config/sparc/linux64.h (LONG_DOUBLE_TYPE_SIZE): Rename to ...
(SPARC_LONG_DOUBLE_TYPE_SIZE): ... this.
* config/sparc/netbsd-elf.h (LONG_DOUBLE_TYPE_SIZE): Rename to ...
(SPARC_LONG_DOUBLE_TYPE_SIZE): ... this.
* config/sparc/openbsd64.h (LONG_DOUBLE_TYPE_SIZE): Rename to ...
(SPARC_LONG_DOUBLE_TYPE_SIZE): ... this.
* config/sparc/sol2.h (LONG_DOUBLE_TYPE_SIZE): Rename to ...
(SPARC_LONG_DOUBLE_TYPE_SIZE): ... this.
* config/sparc/sp-elf.h (LONG_DOUBLE_TYPE_SIZE): Rename to ...
(SPARC_LONG_DOUBLE_TYPE_SIZE): ... this.
* config/sparc/sp64-elf.h (LONG_DOUBLE_TYPE_SIZE): Rename to ...
(SPARC_LONG_DOUBLE_TYPE_SIZE): ... this.
* config/bfin/bfin.h (FLOAT_TYPE_SIZE): Rename to ...
(BFIN_FLOAT_TYPE_SIZE): ... this.
(DOUBLE_TYPE_SIZE): Rename to ...
(BFIN_DOUBLE_TYPE_SIZE): ... this.
(LONG_DOUBLE_TYPE_SIZE): Remove.
(UNITS_PER_FLOAT): Replace FLOAT_TYPE_SIZE with BFIN_FLOAT_TYPE_SIZE.
(UNITS_PER_DOUBLE): Replace DOUBLE_TYPE_SIZE with
BFIN_DOUBLE_TYPE_SIZE.
|
|
In cfgexpand, there is a branch optimization which tests
targetm.gen_ccmp_first == NULL. However, for targets like x86-64 the
hook is implemented, but that alone does not indicate that ccmp is
enabled. Add a new target hook TARGET_HAVE_CCMP and use it in place of
the middle-end check for the existence of gen_ccmp_first to avoid
misoptimization.
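The difference between the two checks can be sketched with a mock hook table. `struct mock_target` and the helper functions below are assumptions made for illustration and do not match GCC's real target structure; the sketch only shows why "the hook is implemented" and "the feature is enabled" are separate questions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Mock of the two relevant hooks; not GCC's real target structure.  */
struct mock_target
{
  int (*gen_ccmp_first) (void);   /* may be set even when ccmp is off */
  bool (*have_ccmp) (void);       /* explicit query added by the patch */
};

static int mock_gen_ccmp_first (void) { return 0; }
static bool mock_ccmp_disabled (void) { return false; }
static bool mock_ccmp_enabled (void) { return true; }

/* Old middle-end test: only asks whether the hook is implemented.  */
static bool
old_check (const struct mock_target *t)
{
  return t->gen_ccmp_first != NULL;
}

/* New test: asks the target whether ccmp is actually available.  */
static bool
new_check (const struct mock_target *t)
{
  assert (t->have_ccmp != NULL);
  return t->have_ccmp ();
}
```

For a target that implements the generator but has the feature switched off, `old_check` answers true while `new_check` answers false, which is the mismatch the patch removes.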
gcc/ChangeLog:
PR target/115370
PR target/115463
* target.def (have_ccmp): New target hook.
* targhooks.cc (default_have_ccmp): New function.
* targhooks.h (default_have_ccmp): New prototype.
* doc/tm.texi.in: Add TARGET_HAVE_CCMP.
* doc/tm.texi: Regenerate.
* cfgexpand.cc (expand_gimple_cond): Call targetm.have_ccmp
instead of checking if targetm.gen_ccmp_first exists.
* expr.cc (expand_expr_real_gassign): Likewise.
* config/i386/i386.cc (ix86_have_ccmp): New target hook to
check if APX_CCMP enabled.
(TARGET_HAVE_CCMP): Define.
|
|
on mingw ia32 [PR114968]
__cxa_atexit/__cxa_thread_atexit/__cxa_throw functions accept function
pointers that usually point directly to destructors rather than to
wrappers around them.
Now, mingw ia32 uses implicitly __attribute__((thiscall)) calling
conventions for METHOD_TYPE (where the this pointer is passed in %ecx
register, the rest on the stack), so these functions use:
in config/os/mingw32/os_defines.h:
#if defined (__i386__)
#define _GLIBCXX_CDTOR_CALLABI __thiscall
#endif
in libsupc++/cxxabi.h
__cxa_atexit(void (_GLIBCXX_CDTOR_CALLABI *)(void*), void*, void*) _GLIBCXX_NOTHROW;
__cxa_thread_atexit(void (_GLIBCXX_CDTOR_CALLABI *)(void*), void*, void *) _GLIBCXX_NOTHROW;
__cxa_throw(void*, std::type_info*, void (_GLIBCXX_CDTOR_CALLABI *) (void *))
__attribute__((__noreturn__));
Now, mingw for some weird reason uses
#define TARGET_CXX_USE_ATEXIT_FOR_CXA_ATEXIT hook_bool_void_true
so it never actually uses __cxa_atexit, but does use __cxa_thread_atexit
and __cxa_throw. Recent changes for modules result in more detailed
__cxa_*atexit/__cxa_throw prototypes precreated by the compiler, and if
that happens and one also includes <cxxabi.h>, the compiler complains about
mismatches in the prototypes.
One thing is the missing thiscall attribute on the FUNCTION_TYPE, the
other problem is that all of atexit/__cxa_atexit/__cxa_thread_atexit
get function pointer types created by a single function,
get_atexit_fn_ptr_type (), which creates it depending on if atexit
or __cxa_atexit will be used as either void(*)(void) or void(*)(void *),
but when using atexit and __cxa_thread_atexit it uses the wrong function
type for __cxa_thread_atexit.
The following patch adds a target hook to add the thiscall attribute to the
function pointers, and splits the get_atexit_fn_ptr_type () function into
get_atexit_fn_ptr_type () and get_cxa_atexit_fn_ptr_type (), the former always
creates shared void(*)(void) type, the latter creates either
void(*)(void*) (on most targets) or void(__attribute__((thiscall))*)(void*)
(on mingw ia32). To avoid wasting another GTY global tree on it, and
because cleanup_type, used for the same purpose for __cxa_throw, should be
the same, the code changes it to use that type too.
In register_dtor_fn then based on the decision whether to use atexit,
__cxa_atexit or __cxa_thread_atexit it picks the right function pointer
type, and also if it decides to emit a __tcf_* wrapper for the cleanup,
uses that type for that wrapper so that it agrees on calling convention.
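The two callback shapes involved can be shown in plain C. This is only a sketch: the `mock_` registration functions stand in for atexit and __cxa_thread_atexit, and the calling convention attribute that matters on mingw ia32 is left out because it is target specific.

```c
#include <assert.h>
#include <stddef.h>

/* The two function pointer types discussed above.  */
typedef void (*atexit_fn) (void);        /* shape taken by atexit */
typedef void (*cxa_atexit_fn) (void *);  /* shape taken by __cxa_*atexit */

static int cleanups_run;

static void
cleanup_no_arg (void)
{
  cleanups_run++;
}

static void
cleanup_with_object (void *obj)
{
  *(int *) obj = 0;   /* "destroy" the object */
  cleanups_run++;
}

/* Mock registries, one slot each, standing in for the real functions.  */
static atexit_fn plain_slot;
static cxa_atexit_fn object_slot;
static void *object_arg;

static void
mock_atexit (atexit_fn fn)
{
  assert (fn != NULL);
  plain_slot = fn;
}

static void
mock_cxa_thread_atexit (cxa_atexit_fn fn, void *obj)
{
  assert (fn != NULL);
  object_slot = fn;
  object_arg = obj;
}

/* Run whatever was registered, passing the object where one is expected.  */
static void
run_mock_cleanups (void)
{
  if (object_slot)
    object_slot (object_arg);
  if (plain_slot)
    plain_slot ();
}
```

A single shared pointer type cannot serve both registries, which is why the commit splits the type construction in two.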
2024-05-10 Jakub Jelinek <jakub@redhat.com>
PR target/114968
gcc/
* target.def (use_atexit_for_cxa_atexit): Remove spurious space
from comment.
(adjust_cdtor_callabi_fntype): New cxx target hook.
* targhooks.h (default_cxx_adjust_cdtor_callabi_fntype): Declare.
* targhooks.cc (default_cxx_adjust_cdtor_callabi_fntype): New
function.
* doc/tm.texi.in (TARGET_CXX_ADJUST_CDTOR_CALLABI_FNTYPE): Add.
* doc/tm.texi: Regenerate.
* config/i386/i386.cc (ix86_cxx_adjust_cdtor_callabi_fntype): New
function.
(TARGET_CXX_ADJUST_CDTOR_CALLABI_FNTYPE): Redefine.
gcc/cp/
* cp-tree.h (atexit_fn_ptr_type_node, cleanup_type): Adjust macro
comments.
(get_cxa_atexit_fn_ptr_type): Declare.
* decl.cc (get_atexit_fn_ptr_type): Adjust function comment, only
build type for atexit argument.
(get_cxa_atexit_fn_ptr_type): New function.
(get_atexit_node): Call get_cxa_atexit_fn_ptr_type rather than
get_atexit_fn_ptr_type when using __cxa_atexit.
(get_thread_atexit_node): Call get_cxa_atexit_fn_ptr_type
rather than get_atexit_fn_ptr_type.
(start_cleanup_fn): Add ob_parm argument, call
get_cxa_atexit_fn_ptr_type or get_atexit_fn_ptr_type depending
on it and create PARM_DECL also based on that argument.
(register_dtor_fn): Adjust start_cleanup_fn caller, use
get_cxa_atexit_fn_ptr_type rather than get_atexit_fn_ptr_type
for use_dtor casts.
* except.cc (build_throw): Use get_cxa_atexit_fn_ptr_type ().
|
|
|
|
This patch adds support for the "target_version" attribute to the middle
end and the C++ frontend, which will be used to implement function
multiversioning in the aarch64 backend.
On targets that don't use the "target" attribute for multiversioning,
there is no conflict between the "target" and "target_clones"
attributes. This patch therefore makes the mutual exclusion in
C-family, D and Ada conditional upon the value of the
expanded_clones_attribute target hook.
The "target_version" attribute is only added to C++ in this patch,
because this is currently the only frontend which supports
multiversioning using the "target" attribute. Support for the
"target_version" attribute will be extended to C at a later date.
Targets that currently use the "target" attribute for function
multiversioning (i.e. i386 and rs6000) are not affected by this patch.
gcc/ChangeLog:
* attribs.cc (decl_attributes): Pass attribute name to target.
(is_function_default_version): Update comment to specify
incompatibility with target_version attributes.
* cgraphclones.cc (cgraph_node::create_version_clone_with_body):
Call valid_version_attribute_p for target_version attributes.
* defaults.h (TARGET_HAS_FMV_TARGET_ATTRIBUTE): New macro.
* target.def (valid_version_attribute_p): New hook.
* doc/tm.texi.in: Add new hook.
* doc/tm.texi: Regenerate.
* multiple_target.cc (create_dispatcher_calls): Remove redundant
is_function_default_version check.
(expand_target_clones): Use target macro to pick attribute name.
* targhooks.cc (default_target_option_valid_version_attribute_p):
New.
* targhooks.h (default_target_option_valid_version_attribute_p):
New.
* tree.h (DECL_FUNCTION_VERSIONED): Update comment to include
target_version attributes.
gcc/c-family/ChangeLog:
* c-attribs.cc (attr_target_exclusions): Make
target/target_clones exclusion target-dependent.
(attr_target_clones_exclusions): Ditto, and add target_version.
(attr_target_version_exclusions): New.
(c_common_attribute_table): Add target_version.
(handle_target_version_attribute): New.
(handle_target_attribute): Amend comment.
(handle_target_clones_attribute): Ditto.
gcc/ada/ChangeLog:
* gcc-interface/utils.cc (attr_target_exclusions): Make
target/target_clones exclusion target-dependent.
(attr_target_clones_exclusions): Ditto.
gcc/d/ChangeLog:
* d-attribs.cc (attr_target_exclusions): Make
target/target_clones exclusion target-dependent.
(attr_target_clones_exclusions): Ditto.
gcc/cp/ChangeLog:
* decl2.cc (check_classfn): Update comment to include
target_version attributes.
|
|
We have the following two hooks into the call expansion code:
- TARGET_CALL_ARGS is called for each argument before arguments
are moved into hard registers.
- TARGET_END_CALL_ARGS is called after the end of the call
sequence (specifically, after any return value has been
moved to a pseudo).
This patch adds a TARGET_START_CALL_ARGS hook that is called before
the TARGET_CALL_ARGS sequence. This means that TARGET_START_CALL_ARGS
and TARGET_END_CALL_ARGS bracket the region in which argument registers
might be live. They also bracket a region in which the only call
emitted by target-independent code is the call to the target function
itself. (For example, TARGET_START_CALL_ARGS happens after any use of
memcpy to copy arguments, and TARGET_END_CALL_ARGS happens before any
use of memcpy to copy the result.)
Also, the patch adds the cumulative argument structure as an argument
to the hooks, so that the target can use it to record and retrieve
information about the call as a whole.
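The ordering of the three hooks can be modelled with a small trace. Everything below is a stand-in: `struct mock_cumulative_args` replaces the target-defined cumulative argument structure, and the mock hooks ignore the other parameters that the real hooks receive.

```c
#include <assert.h>
#include <string.h>

/* Stand-in for the cumulative argument structure; the real type is
   defined by each target.  */
struct mock_cumulative_args
{
  int nargs_seen;
  char trace[64];
};

static void
mock_start_call_args (struct mock_cumulative_args *cum)
{
  cum->nargs_seen = 0;
  strcat (cum->trace, "S");
}

static void
mock_call_args (struct mock_cumulative_args *cum)
{
  cum->nargs_seen++;
  strcat (cum->trace, "A");
}

static void
mock_end_call_args (struct mock_cumulative_args *cum)
{
  strcat (cum->trace, "E");
}

/* Model of one call expansion with NARGS arguments.  "C" marks the
   point where the call instruction itself would be emitted.  */
static void
mock_expand_call (struct mock_cumulative_args *cum, int nargs)
{
  assert (nargs >= 0 && nargs < 60);
  cum->trace[0] = '\0';
  mock_start_call_args (cum);
  for (int i = 0; i < nargs; i++)
    mock_call_args (cum);
  strcat (cum->trace, "C");
  mock_end_call_args (cum);
}
```

Because the same structure is passed to all three hooks, state recorded at the start (here a simple counter) is still available when the call sequence ends.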
The TARGET_CALL_ARGS docs said:
While generating RTL for a function call, this target hook is invoked once
for each argument passed to the function, either a register returned by
``TARGET_FUNCTION_ARG`` or a memory location. It is called just
- before the point where argument registers are stored.
The last bit was true for normal calls, but for libcalls the hook was
invoked earlier, before stack arguments have been copied. I don't think
this caused a practical difference for nvptx (the only port to use the
hooks) since I wouldn't expect any libcalls to take stack parameters.
gcc/
* doc/tm.texi.in: Add TARGET_START_CALL_ARGS.
* doc/tm.texi: Regenerate.
* target.def (start_call_args): New hook.
(call_args, end_call_args): Add a parameter for the cumulative
argument information.
* hooks.h (hook_void_rtx_tree): Delete.
* hooks.cc (hook_void_rtx_tree): Likewise.
* targhooks.h (hook_void_CUMULATIVE_ARGS): Declare.
(hook_void_CUMULATIVE_ARGS_rtx_tree): Likewise.
* targhooks.cc (hook_void_CUMULATIVE_ARGS): New function.
(hook_void_CUMULATIVE_ARGS_rtx_tree): Likewise.
* calls.cc (expand_call): Call start_call_args before computing
and storing stack parameters. Pass the cumulative argument
information to call_args and end_call_args.
(emit_library_call_value_1): Likewise.
* config/nvptx/nvptx.cc (nvptx_call_args): Add a cumulative
argument parameter.
(nvptx_end_call_args): Likewise.
|
|
In <https://gcc.gnu.org/pipermail/gcc-patches/2023-August/628748.html>
I proposed -fhardened, a new umbrella option that enables a reasonable set
of hardening flags. The read of the room seems to be that the option
would be useful. So here's a patch implementing that option.
Currently, -fhardened enables:
-D_FORTIFY_SOURCE=3 (or =2 for older glibcs)
-D_GLIBCXX_ASSERTIONS
-ftrivial-auto-var-init=zero
-fPIE -pie -Wl,-z,relro,-z,now
-fstack-protector-strong
-fstack-clash-protection
-fcf-protection=full (x86 GNU/Linux only)
-fhardened will not override options that were specified on the command line
(before or after -fhardened). For example,
-D_FORTIFY_SOURCE=1 -fhardened
means that _FORTIFY_SOURCE=1 will be used. Similarly,
-fhardened -fstack-protector
will not enable -fstack-protector-strong.
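The precedence rule can be sketched as "apply the hardened default only if the user said nothing". The `struct mock_option` record and `apply_hardened_default` below are hypothetical and are not how the option machinery in GCC is written; they only model the rule stated above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical record of one option: whether the user set it
   explicitly, and its current value.  */
struct mock_option
{
  bool set_by_user;
  int value;
};

/* Apply a -fhardened default only when the user did not set the
   option, regardless of where it appeared on the command line.  */
static void
apply_hardened_default (struct mock_option *opt, int hardened_value)
{
  assert (opt != 0);
  if (!opt->set_by_user)
    opt->value = hardened_value;
}
```

With `-D_FORTIFY_SOURCE=1 -fhardened`, the record for _FORTIFY_SOURCE would be marked as set by the user with value 1, so the hardened default of 3 is not applied.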
Currently, -fhardened is only supported on GNU/Linux.
In DW_AT_producer it is reflected only as -fhardened; it doesn't expand
to anything. This patch provides -Whardened, enabled by default, which
warns when -fhardened couldn't enable a particular option. I think most
often it will say that _FORTIFY_SOURCE wasn't enabled because optimizations
were not enabled.
gcc/c-family/ChangeLog:
* c-opts.cc: Include "target.h".
(c_finish_options): Maybe cpp_define _FORTIFY_SOURCE
and _GLIBCXX_ASSERTIONS.
gcc/ChangeLog:
* common.opt (Whardened, fhardened): New options.
* config.in: Regenerate.
* config/bpf/bpf.cc: Include "opts.h".
(bpf_option_override): If flag_stack_protector_set_by_fhardened_p, do
not inform that -fstack-protector does not work.
* config/i386/i386-options.cc (ix86_option_override_internal): When
-fhardened, maybe enable -fcf-protection=full.
* config/linux-protos.h (linux_fortify_source_default_level): Declare.
* config/linux.cc (linux_fortify_source_default_level): New.
* config/linux.h (TARGET_FORTIFY_SOURCE_DEFAULT_LEVEL): Redefine.
* configure: Regenerate.
* configure.ac: Check if the linker supports '-z now' and '-z relro'.
Check if -fhardened is supported on $target_os.
* doc/invoke.texi: Document -fhardened and -Whardened.
* doc/tm.texi: Regenerate.
* doc/tm.texi.in (TARGET_FORTIFY_SOURCE_DEFAULT_LEVEL): Add.
* gcc.cc (driver_handle_option): Remember if any link options or -static
were specified on the command line.
(process_command): When -fhardened, maybe enable -pie and
-Wl,-z,relro,-z,now.
* opts.cc (flag_stack_protector_set_by_fhardened_p): New global.
(finish_options): When -fhardened, enable
-ftrivial-auto-var-init=zero and -fstack-protector-strong.
(print_help_hardened): New.
(print_help): Call it.
* opts.h (flag_stack_protector_set_by_fhardened_p): Declare.
* target.def (fortify_source_default_level): New target hook.
* targhooks.cc (default_fortify_source_default_level): New.
* targhooks.h (default_fortify_source_default_level): Declare.
* toplev.cc (process_options): When -fhardened, enable
-fstack-clash-protection. If flag_stack_protector_set_by_fhardened_p,
do not warn that -fstack-protector not supported for this target.
Don't enable -fhardened when !HAVE_FHARDENED_SUPPORT.
gcc/testsuite/ChangeLog:
* gcc.misc-tests/help.exp: Test -fhardened.
* c-c++-common/fhardened-1.S: New test.
* c-c++-common/fhardened-1.c: New test.
* c-c++-common/fhardened-10.c: New test.
* c-c++-common/fhardened-11.c: New test.
* c-c++-common/fhardened-12.c: New test.
* c-c++-common/fhardened-13.c: New test.
* c-c++-common/fhardened-14.c: New test.
* c-c++-common/fhardened-15.c: New test.
* c-c++-common/fhardened-2.c: New test.
* c-c++-common/fhardened-3.c: New test.
* c-c++-common/fhardened-4.c: New test.
* c-c++-common/fhardened-5.c: New test.
* c-c++-common/fhardened-6.c: New test.
* c-c++-common/fhardened-7.c: New test.
* c-c++-common/fhardened-8.c: New test.
* c-c++-common/fhardened-9.c: New test.
* gcc.target/i386/cf_check-6.c: New test.
|
|
This reverts commit 8cdcea51c0fd753e6a652c9b236e91b3a6e0911c.
gcc/c-family/ChangeLog:
* c-cppbuiltin.cc (c_cpp_builtins): Do not define
__LIBGCC_GCOV_TYPE_SIZE.
gcc/ChangeLog:
* config/sparc/rtemself.h (SPARC_GCOV_TYPE_SIZE): Remove.
* config/sparc/sparc.cc (sparc_gcov_type_size): Likewise.
(TARGET_GCOV_TYPE_SIZE): Likewise.
* coverage.cc (get_gcov_type): Use LONG_LONG_TYPE_SIZE instead
of removed target hook.
* doc/tm.texi: Regenerate.
* doc/tm.texi.in (TARGET_GCOV_TYPE_SIZE): Remove.
* target.def: Likewise.
* targhooks.cc (default_gcov_type_size): Likewise.
* targhooks.h (default_gcov_type_size): Likewise.
libgcc/ChangeLog:
* libgcov.h (gcov_type): Use LONG_LONG_TYPE_SIZE.
(gcov_type_unsigned): Likewise.
|
|
The following patch introduces the middle-end part of the _BitInt
support, a new BITINT_TYPE, handling it where needed, except the lowering
pass and sanitizer support.
2023-09-06 Jakub Jelinek <jakub@redhat.com>
PR c/102989
* tree.def (BITINT_TYPE): New type.
* tree.h (TREE_CHECK6, TREE_NOT_CHECK6): Define.
(NUMERICAL_TYPE_CHECK, INTEGRAL_TYPE_P): Include
BITINT_TYPE.
(BITINT_TYPE_P): Define.
(CONSTRUCTOR_BITFIELD_P): Return true even for BLKmode bit-fields if
they have BITINT_TYPE type.
(tree_check6, tree_not_check6): New inline functions.
(any_integral_type_check): Include BITINT_TYPE.
(build_bitint_type): Declare.
* tree.cc (tree_code_size, wide_int_to_tree_1, cache_integer_cst,
build_zero_cst, type_hash_canon_hash, type_cache_hasher::equal,
type_hash_canon): Handle BITINT_TYPE.
(bitint_type_cache): New variable.
(build_bitint_type): New function.
(signed_or_unsigned_type_for, verify_type_variant, verify_type):
Handle BITINT_TYPE.
(tree_cc_finalize): Free bitint_type_cache.
* builtins.cc (type_to_class): Handle BITINT_TYPE.
(fold_builtin_unordered_cmp): Handle BITINT_TYPE like INTEGER_TYPE.
* cfgexpand.cc (expand_debug_expr): Punt on BLKmode BITINT_TYPE
INTEGER_CSTs.
* convert.cc (convert_to_pointer_1, convert_to_real_1,
convert_to_complex_1): Handle BITINT_TYPE like INTEGER_TYPE.
(convert_to_integer_1): Likewise. For BITINT_TYPE don't check
GET_MODE_PRECISION (TYPE_MODE (type)).
* doc/generic.texi (BITINT_TYPE): Document.
* doc/tm.texi.in (TARGET_C_BITINT_TYPE_INFO): New.
* doc/tm.texi: Regenerated.
* dwarf2out.cc (base_type_die, is_base_type, modified_type_die,
gen_type_die_with_usage): Handle BITINT_TYPE.
(rtl_for_decl_init): Punt on BLKmode BITINT_TYPE INTEGER_CSTs or
handle those which fit into shwi.
* expr.cc (expand_expr_real_1): Define EXTEND_BITINT macro, reduce
to bitfield precision reads from BITINT_TYPE vars, parameters or
memory locations. Expand large/huge BITINT_TYPE INTEGER_CSTs into
memory.
* fold-const.cc (fold_convert_loc, make_range_step): Handle
BITINT_TYPE.
(extract_muldiv_1): For BITINT_TYPE use TYPE_PRECISION rather than
GET_MODE_SIZE (SCALAR_INT_TYPE_MODE).
(native_encode_int, native_interpret_int, native_interpret_expr):
Handle BITINT_TYPE.
* gimple-expr.cc (useless_type_conversion_p): Make BITINT_TYPE
to some other integral type or vice versa conversions non-useless.
* gimple-fold.cc (gimple_fold_builtin_memset): Punt for BITINT_TYPE.
(clear_padding_unit): Mention in comment that _BitInt types don't need
to fit either.
(clear_padding_bitint_needs_padding_p): New function.
(clear_padding_type_may_have_padding_p): Handle BITINT_TYPE.
(clear_padding_type): Likewise.
* internal-fn.cc (expand_mul_overflow): For unsigned non-mode
precision operands force pos_neg? to 1.
(expand_MULBITINT, expand_DIVMODBITINT, expand_FLOATTOBITINT,
expand_BITINTTOFLOAT): New functions.
* internal-fn.def (MULBITINT, DIVMODBITINT, FLOATTOBITINT,
BITINTTOFLOAT): New internal functions.
* internal-fn.h (expand_MULBITINT, expand_DIVMODBITINT,
expand_FLOATTOBITINT, expand_BITINTTOFLOAT): Declare.
* match.pd (non-equality compare simplifications from fold_binary):
Punt if TYPE_MODE (arg1_type) is BLKmode.
* pretty-print.h (pp_wide_int): Handle printing of large precision
wide_ints which would buffer overflow digit_buffer.
* stor-layout.cc (finish_bitfield_representative): For bit-fields
with BITINT_TYPE, prefer representatives with precisions in
multiple of limb precision.
(layout_type): Handle BITINT_TYPE. Handle COMPLEX_TYPE with BLKmode
element type and assert it is BITINT_TYPE.
* target.def (bitint_type_info): New C target hook.
* target.h (struct bitint_info): New type.
* targhooks.cc (default_bitint_type_info): New function.
* targhooks.h (default_bitint_type_info): Declare.
* tree-pretty-print.cc (dump_generic_node): Handle BITINT_TYPE.
Handle printing large wide_ints which would buffer overflow
digit_buffer.
* tree-ssa-sccvn.cc: Include target.h.
(eliminate_dom_walker::eliminate_stmt): Punt for large/huge
BITINT_TYPE.
* tree-switch-conversion.cc (jump_table_cluster::emit): For more than
64-bit BITINT_TYPE subtract low bound from expression and cast to
64-bit integer type both the controlling expression and case labels.
* typeclass.h (enum type_class): Add bitint_type_class enumerator.
* varasm.cc (output_constant): Handle BITINT_TYPE INTEGER_CSTs.
* vr-values.cc (check_for_binary_op_overflow): Use widest2_int rather
than widest_int.
(simplify_using_ranges::simplify_internal_call_using_ranges): Use
unsigned_type_for rather than build_nonstandard_integer_type.
|
|
As PR110248 shows, some middle-end passes like IVOPTs can
query the target hook legitimate_address_p with some
artificially constructed rtx to determine whether some
addressing modes are supported by target for some gimple
statement. But for now the existing legitimate_address_p
only checks the given mode, it's unable to distinguish
some special cases unfortunately, for example, for LEN_LOAD
ifn on Power port, we would expand it with lxvl hardware
insn, which only supports one register to hold the address
(the other register is holding the length), that is we
don't support base (reg) + index (reg) addressing mode for
sure. But hook legitimate_address_p only considers the
given mode which would be some vector mode for LEN_LOAD
ifn, and we do support base + index addressing mode for
normal vector load and store insns, so the hook will return
true for the query unexpectedly.
This patch is to introduce one extra argument of type
code_helper for hook legitimate_address_p, it makes targets
able to handle some special case like what's described
above.
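The effect of the extra argument can be illustrated with a toy model. This is only a sketch under stated assumptions: `AddrForm`, `OpCode` and `toy_legitimate_address_p` are hypothetical stand-ins, not GCC's rtx, code_helper or the real hook, and `OpCode::Unknown` merely plays the role that ERROR_MARK has as the default.

```cpp
// Toy model only: these enums and the function are hypothetical stand-ins,
// not GCC's rtx, code_helper, or the real target hook.
enum class AddrForm { BaseOnly, BasePlusIndex };
enum class OpCode { Unknown, LenLoad };  // Unknown plays the role of ERROR_MARK

// Without the OpCode argument the hook could only look at the mode and
// would accept base + index for every vector access.
bool toy_legitimate_address_p(AddrForm addr, OpCode op = OpCode::Unknown) {
  if (op == OpCode::LenLoad)
    return addr == AddrForm::BaseOnly;  // the other register holds the length
  return true;                          // ordinary vector load/store
}
```

A caller that does not know the operation passes nothing and gets the old mode-only answer; a caller that is costing a LEN_LOAD passes the code and gets base + index rejected.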
PR tree-optimization/110248
gcc/ChangeLog:
* coretypes.h (class code_helper): Add forward declaration.
* doc/tm.texi: Regenerate.
* lra-constraints.cc (valid_address_p): Call target hook
targetm.addr_space.legitimate_address_p with an extra parameter
ERROR_MARK as its prototype changes.
* recog.cc (memory_address_addr_space_p): Likewise.
* reload.cc (strict_memory_address_addr_space_p): Likewise.
* target.def (legitimate_address_p, addr_space.legitimate_address_p):
Extend with one more argument of type code_helper, update the
documentation accordingly.
* targhooks.cc (default_legitimate_address_p): Adjust for the
new code_helper argument.
(default_addr_space_legitimate_address_p): Likewise.
* targhooks.h (default_legitimate_address_p): Likewise.
(default_addr_space_legitimate_address_p): Likewise.
* config/aarch64/aarch64.cc (aarch64_legitimate_address_hook_p): Adjust
with extra unnamed code_helper argument with default ERROR_MARK.
* config/alpha/alpha.cc (alpha_legitimate_address_p): Likewise.
* config/arc/arc.cc (arc_legitimate_address_p): Likewise.
* config/arm/arm-protos.h (arm_legitimate_address_p): Likewise.
(tree.h): New include for tree_code ERROR_MARK.
* config/arm/arm.cc (arm_legitimate_address_p): Adjust with extra
unnamed code_helper argument with default ERROR_MARK.
* config/avr/avr.cc (avr_addr_space_legitimate_address_p): Likewise.
* config/bfin/bfin.cc (bfin_legitimate_address_p): Likewise.
* config/bpf/bpf.cc (bpf_legitimate_address_p): Likewise.
* config/c6x/c6x.cc (c6x_legitimate_address_p): Likewise.
* config/cris/cris-protos.h (cris_legitimate_address_p): Likewise.
(tree.h): New include for tree_code ERROR_MARK.
* config/cris/cris.cc (cris_legitimate_address_p): Adjust with extra
unnamed code_helper argument with default ERROR_MARK.
* config/csky/csky.cc (csky_legitimate_address_p): Likewise.
* config/epiphany/epiphany.cc (epiphany_legitimate_address_p):
Likewise.
* config/frv/frv.cc (frv_legitimate_address_p): Likewise.
* config/ft32/ft32.cc (ft32_addr_space_legitimate_address_p): Likewise.
* config/gcn/gcn.cc (gcn_addr_space_legitimate_address_p): Likewise.
* config/h8300/h8300.cc (h8300_legitimate_address_p): Likewise.
* config/i386/i386.cc (ix86_legitimate_address_p): Likewise.
* config/ia64/ia64.cc (ia64_legitimate_address_p): Likewise.
* config/iq2000/iq2000.cc (iq2000_legitimate_address_p): Likewise.
* config/lm32/lm32.cc (lm32_legitimate_address_p): Likewise.
* config/loongarch/loongarch.cc (loongarch_legitimate_address_p):
Likewise.
* config/m32c/m32c.cc (m32c_legitimate_address_p): Likewise.
(m32c_addr_space_legitimate_address_p): Likewise.
* config/m32r/m32r.cc (m32r_legitimate_address_p): Likewise.
* config/m68k/m68k.cc (m68k_legitimate_address_p): Likewise.
* config/mcore/mcore.cc (mcore_legitimate_address_p): Likewise.
* config/microblaze/microblaze-protos.h (tree.h): New include for
tree_code ERROR_MARK.
(microblaze_legitimate_address_p): Adjust with extra unnamed
code_helper argument with default ERROR_MARK.
* config/microblaze/microblaze.cc (microblaze_legitimate_address_p):
Likewise.
* config/mips/mips.cc (mips_legitimate_address_p): Likewise.
* config/mmix/mmix.cc (mmix_legitimate_address_p): Likewise.
* config/mn10300/mn10300.cc (mn10300_legitimate_address_p): Likewise.
* config/moxie/moxie.cc (moxie_legitimate_address_p): Likewise.
* config/msp430/msp430.cc (msp430_legitimate_address_p): Likewise.
(msp430_addr_space_legitimate_address_p): Adjust with extra code_helper
argument with default ERROR_MARK and adjust the call to function
msp430_legitimate_address_p.
* config/nds32/nds32.cc (nds32_legitimate_address_p): Adjust with extra
unnamed code_helper argument with default ERROR_MARK.
* config/nios2/nios2.cc (nios2_legitimate_address_p): Likewise.
* config/nvptx/nvptx.cc (nvptx_legitimate_address_p): Likewise.
* config/or1k/or1k.cc (or1k_legitimate_address_p): Likewise.
* config/pa/pa.cc (pa_legitimate_address_p): Likewise.
* config/pdp11/pdp11.cc (pdp11_legitimate_address_p): Likewise.
* config/pru/pru.cc (pru_addr_space_legitimate_address_p): Likewise.
* config/riscv/riscv.cc (riscv_legitimate_address_p): Likewise.
* config/rl78/rl78-protos.h (rl78_as_legitimate_address): Likewise.
(tree.h): New include for tree_code ERROR_MARK.
* config/rl78/rl78.cc (rl78_as_legitimate_address): Adjust with
extra unnamed code_helper argument with default ERROR_MARK.
* config/rs6000/rs6000.cc (rs6000_legitimate_address_p): Likewise.
(rs6000_debug_legitimate_address_p): Adjust with extra code_helper
argument and adjust the call to function rs6000_legitimate_address_p.
* config/rx/rx.cc (rx_is_legitimate_address): Adjust with extra
unnamed code_helper argument with default ERROR_MARK.
* config/s390/s390.cc (s390_legitimate_address_p): Likewise.
* config/sh/sh.cc (sh_legitimate_address_p): Likewise.
* config/sparc/sparc.cc (sparc_legitimate_address_p): Likewise.
* config/v850/v850.cc (v850_legitimate_address_p): Likewise.
* config/vax/vax.cc (vax_legitimate_address_p): Likewise.
* config/visium/visium.cc (visium_legitimate_address_p): Likewise.
* config/xtensa/xtensa.cc (xtensa_legitimate_address_p): Likewise.
* config/stormy16/stormy16-protos.h (xstormy16_legitimate_address_p):
Likewise.
(tree.h): New include for tree_code ERROR_MARK.
* config/stormy16/stormy16.cc (xstormy16_legitimate_address_p):
Adjust with extra unnamed code_helper argument with default
ERROR_MARK.
|
|
As has been discussed before, the following patch adds target hook
for math library function maximum errors measured in ulps.
The default is to return ~0U which is a magic maximum value which means
nothing is known about the precision of the math function.
The first argument is unsigned int because enum combined_fn isn't available
everywhere where target hooks are included but is expected to be given
the enum combined_fn value, although it should be used solely to find out
which kind of math function (say sin vs. cos vs. sqrt vs. exp10) rather
than its variant (f suffix, no suffix, l suffix, f128 suffix, ...), for
which there is the machine_mode argument.
The last argument is a bool, if it is false, the function should return
maximum known error in ulps for a given function (taking -frounding-math
into account if enabled), with 0.5ulps being represented as 0.
If it is true, it is about whether the function can return values outside of
an intrinsic finite range for the function and by how many ulps.
E.g. sin/cos should return result in [-1.,1], if the function is expected
to never return values outside of that finite interval, the hook should
return 0. Similarly for sqrt such range is [-0.,+Inf].
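The return-value convention described above can be sketched as follows. This is a simplified illustration, not the hook itself: `ToyFn` and `toy_libm_function_max_error` are hypothetical names, the machine_mode argument is omitted for brevity, and the ulp numbers are placeholders rather than measured glibc data.

```cpp
// Hypothetical stand-ins for combined_fn values; not GCC's enum.
enum ToyFn : unsigned { TOY_SIN, TOY_COS, TOY_SQRT, TOY_EXP10 };

// Returns a bound in ulps, or ~0U when nothing is known.
// The numbers are placeholders for illustration, not measured glibc data.
unsigned toy_libm_function_max_error(unsigned fn, bool boundary_p) {
  switch (fn) {
    case TOY_SQRT:
      // 0 encodes 0.5ulp when boundary_p is false, and "never leaves
      // [-0., +Inf]" when boundary_p is true.
      return 0;
    case TOY_SIN:
    case TOY_COS:
      // Assumed to stay within [-1., 1.]; assumed 2ulps maximum error.
      return boundary_p ? 0 : 2;
    default:
      return ~0U;  // unknown precision
  }
}
```

Callers would treat ~0U as "make no assumption", so an unhandled function is always the conservative case.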
The patch implements it for glibc only so far, I hope other maintainers
can submit details for Solaris, musl, perhaps BSDs, etc.
For glibc I've gathered data from:
1) https://www.gnu.org/software/libc/manual/html_node/Errors-in-Math-Functions.html
as latest published glibc data
2) https://www.gnu.org/software/libc/manual/2.22/html_node/Errors-in-Math-Functions.html
as a few years old glibc data
3) using attached libc-ulps.sh script from glibc git
4) using attached ulp-tester.c (how to invoke in file comment; tested
both x86_64, ppc64, ppc64le 50M pseudo-random values in all 4 rounding
modes, plus on x86_64 float/double sin/cos using libmvec - see
attached libmvec-wrapper.c as well)
5) using attached boundary-tester.c to test for whether sin/cos/sqrt return
values outside of the intrinsic ranges for those functions (again,
tested on x86_64, ppc64, ppc64le plus on x86_64 using libmvec as well;
libmvec with non-default rounding modes is pretty much random number
generator it seems)
The data is added to various hooks, the generic and generic glibc versions
being in targhooks.cc so that the various targets can easily override it.
The intent is that the generic glibc version handles most of the stuff
and specific target arch overrides handle the outliers or special cases.
The patch has special case for x86_64 when __FAST_MATH__ is defined (as
one can use in that case either libm or libmvec and we don't know which
one will be used; so it uses maximum of what libm provides and libmvec),
rs6000 (had to add one because cosf has 3ulps on ppc* rather than 1-2ulps
on most other targets; MODE_COMPOSITE_P could be in theory handled in the
generic code too, but as we have rs6000-linux specific function, it can be
done just there), arc-linux (because DFmode sin has 7ulps there compared to
1ulps on other targets, both in default rounding mode and in others) and
or1k-linux (while DFmode sin has 1ulps there for default rounding mode,
for other rounding modes it has up to 7ulps).
Now, for -frounding-math I'm trying to add a few ulps more because I expect
it to be much less tested, except that for boundary_p I try to use
the numbers I got from the 5) tester.
2023-04-28 Jakub Jelinek <jakub@redhat.com>
* target.def (libm_function_max_error): New target hook.
* doc/tm.texi.in (TARGET_LIBM_FUNCTION_MAX_ERROR): Add.
* doc/tm.texi: Regenerated.
* targhooks.h (default_libm_function_max_error,
glibc_linux_libm_function_max_error): Declare.
* targhooks.cc: Include case-cfn-macros.h.
(default_libm_function_max_error,
glibc_linux_libm_function_max_error): New functions.
* config/linux.h (TARGET_LIBM_FUNCTION_MAX_ERROR): Redefine.
* config/linux-protos.h (linux_libm_function_max_error): Declare.
* config/linux.cc: Include target.h and targhooks.h.
(linux_libm_function_max_error): New function.
* config/arc/arc.cc: Include targhooks.h and case-cfn-macros.h.
(arc_libm_function_max_error): New function.
(TARGET_LIBM_FUNCTION_MAX_ERROR): Redefine.
* config/i386/i386.cc (ix86_libc_has_fast_function): Formatting fix.
(ix86_libm_function_max_error): New function.
(TARGET_LIBM_FUNCTION_MAX_ERROR): Redefine.
* config/rs6000/rs6000-protos.h
(rs6000_linux_libm_function_max_error): Declare.
* config/rs6000/rs6000-linux.cc: Include target.h, targhooks.h, tree.h
and case-cfn-macros.h.
(rs6000_linux_libm_function_max_error): New function.
* config/rs6000/linux.h (TARGET_LIBM_FUNCTION_MAX_ERROR): Redefine.
* config/rs6000/linux64.h (TARGET_LIBM_FUNCTION_MAX_ERROR): Redefine.
* config/or1k/or1k.cc: Include targhooks.h and case-cfn-macros.h.
(or1k_libm_function_max_error): New function.
(TARGET_LIBM_FUNCTION_MAX_ERROR): Redefine.
|
|
This now implements a hook
preferred_div_as_shifts_over_mult that indicates whether a target prefers that
the vectorizer decomposes division as shifts rather than multiplication when
possible.
In order to be able to use this we need to check whether the current precision
has enough bits to do the operation without any of the additions overflowing.
We use range information to determine this and only do the operation if we're
sure an overflow won't occur. This now uses ranger to do this range check.
This seems to work better than vect_get_range_info which uses range_query, but I
have not switched the interface of vect_get_range_info over in this PR fix.
As Andy said before, initializing a ranger instance is cheap but not free, and if
the intention is to call it often during a pass it should be instantiated at
pass startup and passed along to the places that need it. This is a big
refactoring and doesn't seem right to do in this PR. But we should in GCC 14.
Currently we only instantiate it after a long series of much cheaper checks.
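To see why the range check matters, here is a minimal sketch for the division of a 16-bit value by 255 when every intermediate stays in 16 bits. It is an illustration under stated assumptions, not the vectorizer's code: `shifts_safe_in_u16` and `div255_u16` are hypothetical helpers, and the known upper bound `max_x` stands in for what the range query would report.

```cpp
#include <cstdint>

// Hypothetical helper: true when x + 257 cannot wrap in 16 bits for any
// x in [0, max_x], so the shift sequence is safe without widening.
bool shifts_safe_in_u16(uint16_t max_x) {
  return max_x <= UINT16_MAX - 257;
}

// x / 255 using only adds and shifts, with every intermediate truncated
// to 16 bits to model same-precision vector lanes.
uint16_t div255_u16(uint16_t x) {
  uint16_t t = static_cast<uint16_t>(x + 257) >> 8;
  return static_cast<uint16_t>(x + t) >> 8;
}
```

When the guard fails, the addition can wrap and the shift sequence gives a wrong quotient, so the decomposition is only used when the range proves the guard holds.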
gcc/ChangeLog:
PR target/108583
* target.def (preferred_div_as_shifts_over_mult): New.
* doc/tm.texi.in: Document it.
* doc/tm.texi: Regenerate.
* targhooks.cc (default_preferred_div_as_shifts_over_mult): New.
* targhooks.h (default_preferred_div_as_shifts_over_mult): New.
* tree-vect-patterns.cc (vect_recog_divmod_pattern): Use it.
gcc/testsuite/ChangeLog:
PR target/108583
* gcc.dg/vect/vect-div-bitmask-4.c: New test.
* gcc.dg/vect/vect-div-bitmask-5.c: New test.
|
|
This reverts the changes for the CAN_SPECIAL_DIV_BY_CONST hook.
gcc/ChangeLog:
PR target/108583
* doc/tm.texi (TARGET_VECTORIZE_CAN_SPECIAL_DIV_BY_CONST): Remove.
* doc/tm.texi.in: Likewise.
* explow.cc (round_push, align_dynamic_address): Revert previous patch.
* expmed.cc (expand_divmod): Likewise.
* expmed.h (expand_divmod): Likewise.
* expr.cc (force_operand, expand_expr_divmod): Likewise.
* optabs.cc (expand_doubleword_mod, expand_doubleword_divmod): Likewise.
* target.def (can_special_div_by_const): Remove.
* target.h: Remove tree-core.h include.
* targhooks.cc (default_can_special_div_by_const): Remove.
* targhooks.h (default_can_special_div_by_const): Remove.
* tree-vect-generic.cc (expand_vector_operation): Remove hook.
* tree-vect-patterns.cc (vect_recog_divmod_pattern): Remove hook.
* tree-vect-stmts.cc (vectorizable_operation): Remove hook.
|
|
ia32 with -mno-sse2 [PR108883]
_Float16 and decltype(0.0bf16) types are on x86 supported only with
-msse2. On x86_64 that is the default, but on ia32 it is not.
We should still emit fundamental type tinfo for those types in
libsupc++.a/libstdc++.*, regardless of whether libsupc++/libstdc++
is compiled with -msse2 or not, as user programs can be compiled
with different ISA flags from libsupc++/libstdc++ and if they
are compiled with -msse2 and use std::float16_t or std::bfloat16_t
and need RTTI for it, it should work out of the box. Furthermore,
libstdc++ ABI on ia32 shouldn't depend on whether the library
is compiled with -mno-sse or -msse2.
Unfortunately, just hacking up libsupc++ Makefile/configure so that
a single source is compiled with -msse2 isn't appropriate, because
that TU emits also code and the code should be able to run on CPUs
which libstdc++ supports. We could add [[gnu::attribute ("no-sse2")]]
there perhaps conditionally, but it all gets quite ugly.
The following patch instead adds a target hook which allows the backend
to temporarily tweak registered types such that emit_support_tinfos
emits whatever is needed.
Additionally, it makes emit_support_tinfo_1 call emit_tinfo_decl
immediately, so that temporarily created dummy types for emit_support_tinfo
purposes only can be nullified again afterwards. And removes the
previous fallback_* types used for dfloat*_type_node tinfos even when
decimal types aren't supported.
2023-03-03 Jakub Jelinek <jakub@redhat.com>
PR target/108883
gcc/
* target.h (emit_support_tinfos_callback): New typedef.
* targhooks.h (default_emit_support_tinfos): Declare.
* targhooks.cc (default_emit_support_tinfos): New function.
* target.def (emit_support_tinfos): New target hook.
* doc/tm.texi.in (emit_support_tinfos): Document it.
* doc/tm.texi: Regenerated.
* config/i386/i386.cc (ix86_emit_support_tinfos): New function.
(TARGET_EMIT_SUPPORT_TINFOS): Redefine.
gcc/cp/
* cp-tree.h (enum cp_tree_index): Remove CPTI_FALLBACK_DFLOAT*_TYPE
enumerators.
(fallback_dfloat32_type, fallback_dfloat64_type,
fallback_dfloat128_type): Remove.
* rtti.cc (emit_support_tinfo_1): If not emitted already, call
emit_tinfo_decl and remove from unemitted_tinfo_decls right away.
(emit_support_tinfos): Move &dfloat*_type_node from fundamentals array
into new fundamentals_with_fallback array. Call emit_support_tinfo_1
on elements of that array too, with the difference that if
the type is NULL, use a fallback REAL_TYPE for it temporarily.
Drop the !targetm.decimal_float_supported_p () handling. Call
targetm.emit_support_tinfos at the end.
* mangle.cc (write_builtin_type): Remove references to
fallback_dfloat*_type. Handle bfloat16_type_node mangling.
|
|
|
|
As discussed in PR98125, -fpatchable-function-entry with
SECTION_LINK_ORDER support doesn't work well on powerpc64
ELFv1 because the filled "Symbol" in
.section name,"flags"o,@type,Symbol
sits in .opd section instead of in the function_section
like .text or named .text*.
Since we already generate one label LPFE* which sits in
function_section of current_function_decl, this patch is
to reuse it as the symbol for the linked_to section. It
avoids the above ABI specific issue when using the symbol
concluded from current_function_decl.
Besides, with this support some previous workarounds can
be reverted.
PR target/99889
gcc/ChangeLog:
* config/rs6000/rs6000.cc (rs6000_print_patchable_function_entry):
Adjust to call function default_print_patchable_function_entry.
* targhooks.cc (default_print_patchable_function_entry_1): Remove and
move the flags preparation ...
(default_print_patchable_function_entry): ... here, adjust to use
current_function_funcdef_no for label no.
* targhooks.h (default_print_patchable_function_entry_1): Remove.
* varasm.cc (default_elf_asm_named_section): Adjust code for
__patchable_function_entries section support with LPFE label.
gcc/testsuite/ChangeLog:
* g++.dg/pr93195a.C: Remove the skip on powerpc*-*-* 64-bit.
* gcc.target/aarch64/pr92424-2.c: Adjust LPFE1 with LPFE0.
* gcc.target/aarch64/pr92424-3.c: Likewise.
* gcc.target/i386/pr93492-2.c: Likewise.
* gcc.target/i386/pr93492-3.c: Likewise.
* gcc.target/i386/pr93492-4.c: Likewise.
* gcc.target/i386/pr93492-5.c: Likewise.
|
|
In plenty of image and video processing code it's common to modify pixel values
by a widening operation and then scale them back into range by dividing by 255.
e.g.:
x = y / (2 ^ (bitsize (y)/2)-1)
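For a 16-bit y the divisor in this formula is 255. One well-known way to perform that division without a divide instruction is shown below as a sketch; `div255_widened` is a hypothetical name, and this is not necessarily the sequence any particular back end emits.

```cpp
#include <cstdint>

// Division of a 16-bit value by 255 without a divide instruction,
// carried out in 32-bit arithmetic so no intermediate sum can overflow.
uint16_t div255_widened(uint16_t y) {
  uint32_t x = y;
  return static_cast<uint16_t>((x + ((x + 257) >> 8)) >> 8);
}
```

The identity holds for every 16-bit input because the intermediate sums are computed in the wider type.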
This patch adds a new target hook can_special_div_by_const, similar to
can_vec_perm which can be called to check if a target will handle a particular
division in a special way in the back-end.
The vectorizer will then vectorize the division using the standard tree code
and at expansion time the hook is called again to generate the code for the
division.
A lot of the changes in the patch are to pass down the tree operands in all paths
that can lead to the divmod expansion so that the target hook always has the
type of the expression you're expanding since the types can change the
expansion.
gcc/ChangeLog:
* expmed.h (expand_divmod): Pass tree operands down in addition to RTX.
* expmed.cc (expand_divmod): Likewise.
* explow.cc (round_push, align_dynamic_address): Likewise.
* expr.cc (force_operand, expand_expr_divmod): Likewise.
* optabs.cc (expand_doubleword_mod, expand_doubleword_divmod):
Likewise.
* target.h: Include tree-core.h.
* target.def (can_special_div_by_const): New.
* targhooks.cc (default_can_special_div_by_const): New.
* targhooks.h (default_can_special_div_by_const): New.
* tree-vect-generic.cc (expand_vector_operation): Use it.
* doc/tm.texi.in: Document it.
* doc/tm.texi: Regenerate.
* tree-vect-patterns.cc (vect_recog_divmod_pattern): Check for support.
* tree-vect-stmts.cc (vectorizable_operation): Likewise.
gcc/testsuite/ChangeLog:
* gcc.dg/vect/vect-div-bitmask-1.c: New test.
* gcc.dg/vect/vect-div-bitmask-2.c: New test.
* gcc.dg/vect/vect-div-bitmask-3.c: New test.
* gcc.dg/vect/vect-div-bitmask.h: New file.
|
|
'TARGET_ASM_DESTRUCTOR'
... after commit 4ee35c11fd328728c12f3e086ae016ca94624bf8
"Restore default 'sorry' 'TARGET_ASM_CONSTRUCTOR', 'TARGET_ASM_DESTRUCTOR'".
No functional change.
gcc/
* Makefile.in (OBJS): Remove 'dbxout.o'.
* config/nvptx/nvptx.cc: Don't '#include "dbxout.h"'.
* dbxout.cc: Remove.
* dbxout.h: Likewise.
* target-def.h (TARGET_ASM_CONSTRUCTOR, TARGET_ASM_DESTRUCTOR):
Default to 'default_asm_out_constructor',
'default_asm_out_destructor'.
* targhooks.cc (default_asm_out_constructor)
(default_asm_out_destructor): New.
* targhooks.h (default_asm_out_constructor)
(default_asm_out_destructor): Declare.
|
|
Back story: When GCC is configured and built on non-glibc platforms,
it seems very little to no effort is made to enumerate the available
C99 libm functions. It is all or nothing for C99 libm. The patch
introduces a new function, used on only FreeBSD, to inform gcc that
it has C99 libm functions (minus a few which clearly GCC does not check
nor test).
2022-04-15 Steven G. Kargl <kargl@gcc.gnu.org>
PR target/89125
* config/freebsd.h: Define TARGET_LIBC_HAS_FUNCTION to be
bsd_libc_has_function.
* targhooks.cc (bsd_libc_has_function): New function.
Expand the supported math functions to include C99 libm.
* targhooks.h (bsd_libc_has_function): New Prototype.
|
|
Power ISA 2.07 (Power8) introduces transactional memory
feature but ISA3.1 (Power10) removes it. It exposes one
troublesome issue as PR102059 shows. Users define some
function with target pragma cpu=power10 then it calls one
function with attribute always_inline which inherits
command line option -mcpu=power8 which enables HTM
implicitly. The current isa_flags check doesn't allow this
inlining due to "target specific option mismatch" and an error
message is emitted.
Normally, the callee function isn't intended to exploit the HTM
feature, but the default flag setting makes it look as if it does.
As Richi raised in the PR, we have the fp_expressions flag in the
function summary, which allows us to check whether the function
actually contains any floating point expressions to avoid
overkill. So this patch follows a similar idea but is
more target specific, for this rs6000 port specific
requirement on HTM feature check, we would like to check
rs6000 specific HTM built-in functions and inline assembly,
it allows targets to do their own customized checks and
updates.
It introduces two target hooks need_ipa_fn_target_info and
update_ipa_fn_target_info. The former allows target to do
some previous check and decides to collect target specific
information for this function or not. For some special
case, it can predict the analysis result and set it early
without any scanning. The latter allows the
analyze_function_body to pass gimple stmts down just like
fp_expressions handlings, target can do its own tricks.
I put them together as one hook initially with one boolean
to indicate whether it's the initial call, but the code looked a
bit ugly; separating them seems more readable.
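The split into two hooks can be sketched with a toy model. Everything here is hypothetical: `ToyStmt`, `TOY_INFO_HTM` and the `toy_*` functions are stand-ins rather than GCC's gimple statements, rs6000 flags or hook signatures, and the convention that the update hook returns false to stop scanning is an assumption of this sketch.

```cpp
#include <vector>

// Hypothetical stand-ins: not GCC's gimple statements or rs6000 flags.
enum class ToyStmt { Plain, HtmBuiltin, InlineAsm };
constexpr unsigned TOY_INFO_HTM = 1u << 0;

// First hook: decide up front whether scanning is needed at all.
// If HTM is not enabled for the function, the answer is known early.
bool toy_need_fn_target_info(bool htm_enabled, unsigned &info) {
  info = 0;
  return htm_enabled;
}

// Second hook: called per statement; in this sketch it returns false
// once nothing more can be learned, so the caller may stop scanning.
bool toy_update_fn_target_info(unsigned &info, ToyStmt stmt) {
  if (stmt == ToyStmt::HtmBuiltin || stmt == ToyStmt::InlineAsm) {
    info |= TOY_INFO_HTM;
    return false;
  }
  return true;
}

// Driver in the role of analyze_function_body.
unsigned toy_analyze(bool htm_enabled, const std::vector<ToyStmt> &body) {
  unsigned info;
  if (!toy_need_fn_target_info(htm_enabled, info))
    return info;
  for (ToyStmt s : body)
    if (!toy_update_fn_target_info(info, s))
      break;
  return info;
}
```

An inlining check could then compare the collected bit instead of the raw option flags, which is the kind of relaxation the description above is after.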
gcc/ChangeLog:
PR ipa/102059
* config/rs6000/rs6000.c (TARGET_NEED_IPA_FN_TARGET_INFO): New macro.
(TARGET_UPDATE_IPA_FN_TARGET_INFO): Likewise.
(rs6000_need_ipa_fn_target_info): New function.
(rs6000_update_ipa_fn_target_info): Likewise.
(rs6000_can_inline_p): Adjust for ipa function summary target info.
* config/rs6000/rs6000.h (RS6000_FN_TARGET_INFO_HTM): New macro.
* ipa-fnsummary.c (ipa_dump_fn_summary): Adjust for ipa function
summary target info.
(analyze_function_body): Adjust for ipa function summary target info
and call hook rs6000_need_ipa_fn_target_info and
rs6000_update_ipa_fn_target_info.
(ipa_merge_fn_summary_after_inlining): Adjust for ipa function summary
target info.
(inline_read_section): Likewise.
(ipa_fn_summary_write): Likewise.
* ipa-fnsummary.h (ipa_fn_summary::target_info): New member.
* doc/tm.texi: Regenerate.
* doc/tm.texi.in (TARGET_UPDATE_IPA_FN_TARGET_INFO): Document new hook.
(TARGET_NEED_IPA_FN_TARGET_INFO): Likewise.
* target.def (update_ipa_fn_target_info): New hook.
(need_ipa_fn_target_info): Likewise.
* targhooks.c (default_need_ipa_fn_target_info): New function.
(default_update_ipa_fn_target_info): Likewise.
* targhooks.h (default_update_ipa_fn_target_info): New declare.
(default_need_ipa_fn_target_info): Likewise.
gcc/testsuite/ChangeLog:
PR ipa/102059
* gcc.dg/lto/pr102059-1_0.c: New test.
* gcc.dg/lto/pr102059-1_1.c: New test.
* gcc.dg/lto/pr102059-1_2.c: New test.
* gcc.dg/lto/pr102059-2_0.c: New test.
* gcc.dg/lto/pr102059-2_1.c: New test.
* gcc.dg/lto/pr102059-2_2.c: New test.
* gcc.target/powerpc/pr102059-1.c: New test.
* gcc.target/powerpc/pr102059-2.c: New test.
* gcc.target/powerpc/pr102059-3.c: New test.
|
|
|
|
The current vector cost interface has quite a bit of redundancy
built in. Each target that defines its own hooks has to replicate
the basic unsigned[3] management. Currently each target also
duplicates the cost adjustment for inner loops.
This patch instead defines a vector_costs class for holding
the scalar or vector cost and allows targets to subclass it.
There is then only one costing hook: to create a new costs
structure of the appropriate type. Everything else can be
virtual functions, with common concepts implemented in the
base class rather than in each target's derivation.
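The shape of that design can be sketched as follows. This is a heavily simplified, hypothetical model: the `toy_*` names, the argument lists and the inner-loop weight of 50 are assumptions for illustration and do not reproduce GCC's real vector_costs interface.

```cpp
#include <memory>

// Hypothetical, heavily simplified model; GCC's real vector_costs has a
// different interface.
enum ToyWhere { TOY_PROLOGUE, TOY_BODY, TOY_EPILOGUE };

class toy_vector_costs {
public:
  virtual ~toy_vector_costs() = default;

  // Targets override this; the default just records the adjusted cost.
  virtual void add_stmt_cost(int count, int stmt_cost, ToyWhere where,
                             bool in_inner_loop) {
    record(adjust_for_inner_loop(count * stmt_cost, in_inner_loop), where);
  }
  unsigned total(ToyWhere where) const { return m_costs[where]; }

protected:
  // Shared logic that each target previously duplicated.
  static unsigned adjust_for_inner_loop(unsigned cost, bool in_inner_loop) {
    return in_inner_loop ? cost * 50 : cost;  // weight is an arbitrary placeholder
  }
  void record(unsigned cost, ToyWhere where) { m_costs[where] += cost; }

private:
  unsigned m_costs[3] = {0, 0, 0};  // the "unsigned[3]" lives in one place
};

// A target-specific subclass only overrides what differs.
class toy_target_costs : public toy_vector_costs {
public:
  void add_stmt_cost(int count, int stmt_cost, ToyWhere where,
                     bool in_inner_loop) override {
    // Hypothetical target quirk: every statement costs one unit extra.
    record(adjust_for_inner_loop(count * (stmt_cost + 1), in_inner_loop), where);
  }
};

// The single remaining "hook": a factory for the right subclass.
std::unique_ptr<toy_vector_costs> toy_create_costs() {
  return std::make_unique<toy_target_costs>();
}
```

Because the object owns its storage and is destroyed through the virtual destructor, separate init, finish and destroy hooks are no longer needed in this model.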
This might seem like excess C++-ification, but it shaves
~100 LOC. I've also got some follow-on changes that become
significantly easier with this patch. Maybe it could help
with things like weighting blocks based on frequency too.
This will clash with Andre's unrolling patches. His patches
have priority so this patch should queue behind them.
The x86 and rs6000 parts fully convert to a self-contained class.
The equivalent aarch64 changes are more complex, so this patch
just does the bare minimum. A later patch will rework the
aarch64 bits.
gcc/
* target.def (targetm.vectorize.init_cost): Replace with...
(targetm.vectorize.create_costs): ...this.
(targetm.vectorize.add_stmt_cost): Delete.
(targetm.vectorize.finish_cost): Likewise.
(targetm.vectorize.destroy_cost_data): Likewise.
* doc/tm.texi.in (TARGET_VECTORIZE_INIT_COST): Replace with...
(TARGET_VECTORIZE_CREATE_COSTS): ...this.
(TARGET_VECTORIZE_ADD_STMT_COST): Delete.
(TARGET_VECTORIZE_FINISH_COST): Likewise.
(TARGET_VECTORIZE_DESTROY_COST_DATA): Likewise.
* doc/tm.texi: Regenerate.
* tree-vectorizer.h (vec_info::vec_info): Remove target_cost_data
parameter.
(vec_info::target_cost_data): Change from a void * to a vector_costs *.
(vector_costs): New class.
(init_cost): Take a vec_info and return a vector_costs.
(dump_stmt_cost): Remove data parameter.
(add_stmt_cost): Replace vinfo and data parameters with a vector_costs.
(add_stmt_costs): Likewise.
(finish_cost): Replace data parameter with a vector_costs.
(destroy_cost_data): Delete.
* tree-vectorizer.c (dump_stmt_cost): Remove data argument and
don't print it.
(vec_info::vec_info): Remove the target_cost_data parameter and
initialize the member variable to null instead.
(vec_info::~vec_info): Delete target_cost_data instead of calling
destroy_cost_data.
(vector_costs::add_stmt_cost): New function.
(vector_costs::finish_cost): Likewise.
(vector_costs::record_stmt_cost): Likewise.
(vector_costs::adjust_cost_for_freq): Likewise.
* tree-vect-loop.c (_loop_vec_info::_loop_vec_info): Update
call to vec_info::vec_info.
(vect_compute_single_scalar_iteration_cost): Update after above
changes to costing interface.
(vect_analyze_loop_operations): Likewise.
(vect_estimate_min_profitable_iters): Likewise.
(vect_analyze_loop_2): Initialize LOOP_VINFO_TARGET_COST_DATA
at the start_over point, where it needs to be recreated after
trying without slp. Update retry code accordingly.
* tree-vect-slp.c (_bb_vec_info::_bb_vec_info): Update call
to vec_info::vec_info.
(vect_slp_analyze_operation): Update after above changes to costing
interface.
(vect_bb_vectorization_profitable_p): Likewise.
* targhooks.h (default_init_cost): Replace with...
(default_vectorize_create_costs): ...this.
(default_add_stmt_cost): Delete.
(default_finish_cost, default_destroy_cost_data): Likewise.
* targhooks.c (default_init_cost): Replace with...
(default_vectorize_create_costs): ...this.
(default_add_stmt_cost): Delete, moving logic to vector_costs instead.
(default_finish_cost, default_destroy_cost_data): Delete.
* config/aarch64/aarch64.c (aarch64_vector_costs): Inherit from
vector_costs. Add a constructor.
(aarch64_init_cost): Replace with...
(aarch64_vectorize_create_costs): ...this.
(aarch64_add_stmt_cost): Replace with...
(aarch64_vector_costs::add_stmt_cost): ...this. Use record_stmt_cost
to adjust the cost for inner loops.
(aarch64_finish_cost): Replace with...
(aarch64_vector_costs::finish_cost): ...this.
(aarch64_destroy_cost_data): Delete.
(TARGET_VECTORIZE_INIT_COST): Replace with...
(TARGET_VECTORIZE_CREATE_COSTS): ...this.
(TARGET_VECTORIZE_ADD_STMT_COST): Delete.
(TARGET_VECTORIZE_FINISH_COST): Likewise.
(TARGET_VECTORIZE_DESTROY_COST_DATA): Likewise.
* config/i386/i386.c (ix86_vector_costs): New structure.
(ix86_init_cost): Replace with...
(ix86_vectorize_create_costs): ...this.
(ix86_add_stmt_cost): Replace with...
(ix86_vector_costs::add_stmt_cost): ...this. Use adjust_cost_for_freq
to adjust the cost for inner loops.
(ix86_finish_cost, ix86_destroy_cost_data): Delete.
(TARGET_VECTORIZE_INIT_COST): Replace with...
(TARGET_VECTORIZE_CREATE_COSTS): ...this.
(TARGET_VECTORIZE_ADD_STMT_COST): Delete.
(TARGET_VECTORIZE_FINISH_COST): Likewise.
(TARGET_VECTORIZE_DESTROY_COST_DATA): Likewise.
* config/rs6000/rs6000.c (TARGET_VECTORIZE_INIT_COST): Replace with...
(TARGET_VECTORIZE_CREATE_COSTS): ...this.
(TARGET_VECTORIZE_ADD_STMT_COST): Delete.
(TARGET_VECTORIZE_FINISH_COST): Likewise.
(TARGET_VECTORIZE_DESTROY_COST_DATA): Likewise.
(rs6000_cost_data): Inherit from vector_costs.
Add a constructor. Drop loop_info, cost and costing_for_scalar
in favor of the corresponding vector_costs member variables.
Add "m_" to the names of the remaining member variables and
initialize them.
(rs6000_density_test): Replace with...
(rs6000_cost_data::density_test): ...this.
(rs6000_init_cost): Replace with...
(rs6000_vectorize_create_costs): ...this.
(rs6000_update_target_cost_per_stmt): Replace with...
(rs6000_cost_data::update_target_cost_per_stmt): ...this.
(rs6000_add_stmt_cost): Replace with...
(rs6000_cost_data::add_stmt_cost): ...this. Use adjust_cost_for_freq
to adjust the cost for inner loops.
(rs6000_adjust_vect_cost_per_loop): Replace with...
(rs6000_cost_data::adjust_vect_cost_per_loop): ...this.
(rs6000_finish_cost): Replace with...
(rs6000_cost_data::finish_cost): ...this. Group loop code
into a single if statement and pass the loop_vinfo down to
subroutines.
(rs6000_destroy_cost_data): Delete.
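As a rough illustration of the shape of this change (not GCC's actual code), the sketch below replaces a set of separate hooks working on an opaque pointer with one factory that returns an object with virtual member functions. Every name prefixed with sketch_ is hypothetical, the simplified parameter lists are assumptions, and the inner-loop weight of 50 is an assumed placeholder.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for the cost locations used by the vectorizer.
enum sketch_cost_location { SKETCH_PROLOGUE = 0, SKETCH_BODY = 1, SKETCH_EPILOGUE = 2 };

// Plays the role of vector_costs: it owns the accumulated costs, so an
// ordinary destructor replaces a separate destroy hook.
class sketch_vector_costs
{
public:
  sketch_vector_costs () : m_finished (false)
  {
    m_costs[0] = m_costs[1] = m_costs[2] = 0;
  }
  virtual ~sketch_vector_costs () {}

  // Default costing: COUNT copies of a statement costing STMT_COST each.
  virtual unsigned add_stmt_cost (unsigned count, unsigned stmt_cost,
                                  bool in_inner_loop,
                                  sketch_cost_location where)
  {
    unsigned cost = adjust_cost_for_freq (in_inner_loop, where,
                                          count * stmt_cost);
    return record_stmt_cost (where, cost);
  }

  virtual void finish_cost () { m_finished = true; }

  // Totals are only meaningful once costing has finished.
  unsigned cost_at (sketch_cost_location where) const
  {
    assert (m_finished);
    return m_costs[where];
  }

protected:
  // Add COST to the running total for WHERE and return it.
  unsigned record_stmt_cost (sketch_cost_location where, unsigned cost)
  {
    m_costs[where] += cost;
    return cost;
  }

  // Weight body statements of an inner loop more heavily (assumed factor).
  unsigned adjust_cost_for_freq (bool in_inner_loop,
                                 sketch_cost_location where,
                                 unsigned cost) const
  {
    if (where == SKETCH_BODY && in_inner_loop)
      return cost * 50;
    return cost;
  }

private:
  unsigned m_costs[3];
  bool m_finished;
};

// A hypothetical target model: each statement costs one extra unit.
class sketch_target_costs : public sketch_vector_costs
{
public:
  unsigned add_stmt_cost (unsigned count, unsigned stmt_cost,
                          bool in_inner_loop,
                          sketch_cost_location where) override
  {
    unsigned cost = adjust_cost_for_freq (in_inner_loop, where,
                                          count * (stmt_cost + 1));
    return record_stmt_cost (where, cost);
  }
};

// Plays the role of the create_costs hook.
inline std::unique_ptr<sketch_vector_costs>
sketch_create_costs (bool use_target_model)
{
  if (use_target_model)
    return std::unique_ptr<sketch_vector_costs> (new sketch_target_costs ());
  return std::unique_ptr<sketch_vector_costs> (new sketch_vector_costs ());
}
```

A caller would obtain an object from sketch_create_costs, call add_stmt_cost for each statement, call finish_cost once, and then read the totals. A target in this sketch overrides only the member functions that differ from the default.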
|
|
Looking at calls.c:initialize_argument_information, I spotted some dead
code that seems to have been left behind from when MPX support was
removed.
This change removes that code as well as the associated target hooks
(which appear to be unused).
gcc/ChangeLog:
* calls.c (initialize_argument_information): Remove some dead
code, remove handling for function_arg returning const_int.
* doc/tm.texi: Delete documentation for unused target hooks.
* doc/tm.texi.in: Likewise.
* target.def (load_bounds_for_arg): Delete.
(store_bounds_for_arg): Delete.
(load_returned_bounds): Delete.
(store_returned_bounds): Delete.
* targhooks.c (default_load_bounds_for_arg): Delete.
(default_store_bounds_for_arg): Delete.
(default_load_returned_bounds): Delete.
(default_store_returned_bounds): Delete.
* targhooks.h (default_load_bounds_for_arg): Delete.
(default_store_bounds_for_arg): Delete.
(default_load_returned_bounds): Delete.
(default_store_returned_bounds): Delete.
|
|
If -fprofile-update=atomic is used, then the target must provide atomic
operations for the counters of the type returned by get_gcov_type().
This is a 64-bit type for targets which have a 64-bit long long type.
On 32-bit targets this could be an issue since they may not provide
64-bit atomic operations. Allow targets to override the default type
size with the new TARGET_GCOV_TYPE_SIZE target hook.
If a 32-bit gcov type size is used, then there is currently a warning in
libgcov-driver.c in a dead code block due to
sizeof (counter) == sizeof (gcov_unsigned_t):
libgcc/libgcov-driver.c: In function 'dump_counter':
libgcc/libgcov-driver.c:401:46: warning: right shift count >= width of type [-Wshift-count-overflow]
401 | dump_unsigned ((gcov_unsigned_t)(counter >> 32), dump_fn, arg);
| ^~
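One way to avoid the shift warning quoted above is to select the high-word code by the configured counter size. The following is a minimal sketch under assumptions: SKETCH_GCOV_TYPE_SIZE stands in for a size macro such as __LIBGCC_GCOV_TYPE_SIZE, the sketch_ names are hypothetical, and the low-word-first order is chosen for illustration only.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the configured gcov counter size in bits.
#ifndef SKETCH_GCOV_TYPE_SIZE
#define SKETCH_GCOV_TYPE_SIZE 64
#endif

#if SKETCH_GCOV_TYPE_SIZE > 32
typedef int64_t sketch_gcov_type;
#else
typedef int32_t sketch_gcov_type;
#endif

typedef uint32_t sketch_gcov_unsigned_t;

static_assert (sizeof (sketch_gcov_type) * 8 == SKETCH_GCOV_TYPE_SIZE,
               "counter type must match the configured size");

// Append COUNTER to OUT as two 32-bit words, low word first.
inline void
sketch_dump_counter (sketch_gcov_type counter,
                     std::vector<sketch_gcov_unsigned_t> &out)
{
  out.push_back ((sketch_gcov_unsigned_t) counter);
#if SKETCH_GCOV_TYPE_SIZE > 32
  out.push_back ((sketch_gcov_unsigned_t) (counter >> 32));
#else
  // A 32-bit counter has no high word; never shift it by 32.
  out.push_back (0);
#endif
}
```

Because the choice is made by the preprocessor, the compiler never sees a shift by the full width of a 32-bit counter.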
gcc/c-family/
* c-cppbuiltin.c (c_cpp_builtins): Define
__LIBGCC_GCOV_TYPE_SIZE if flag_building_libgcc is true.
gcc/
* config/sparc/rtemself.h (SPARC_GCOV_TYPE_SIZE): Define.
* config/sparc/sparc.c (sparc_gcov_type_size): New.
(TARGET_GCOV_TYPE_SIZE): Redefine if SPARC_GCOV_TYPE_SIZE is defined.
* coverage.c (get_gcov_type): Use targetm.gcov_type_size().
* doc/tm.texi (TARGET_GCOV_TYPE_SIZE): Add hook under "Misc".
* doc/tm.texi.in: Regenerate.
* target.def (gcov_type_size): New target hook.
* targhooks.c (default_gcov_type_size): New.
* targhooks.h (default_gcov_type_size): Declare.
* tree-profile.c (gimple_gen_edge_profiler): Use precision of
gcov_type_node.
(gimple_gen_time_profiler): Likewise.
libgcc/
* libgcov.h (gcov_type): Define using __LIBGCC_GCOV_TYPE_SIZE.
(gcov_type_unsigned): Likewise.
|
|
Currently, the doloop.xx variable uses the same type as niter, which may be
narrower than the word size. For some targets, it would be better to use a
word-size type. For example, on a 64-bit system, accessing a 32-bit value
may require a subreg. Using a 64-bit type may then be better for niter if
its value can be represented in both 32 bits and 64 bits.
This patch adds a target hook to query the preferred mode for the doloop IV,
and updates the mode accordingly.
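The representability condition can be sketched with plain integers. This is only an illustration under assumptions, not the code in tree-ssa-loop-ivopts.c: the sketch_ functions are hypothetical, bit widths stand in for machine modes, and the rule "use the preferred width when it is wider or when the maximum iteration count fits" is assumed.

```cpp
#include <cassert>
#include <cstdint>

// Largest value of an unsigned type that is BITS wide (BITS in 1..64).
inline uint64_t
sketch_max_value (unsigned bits)
{
  assert (bits >= 1 && bits <= 64);
  return bits == 64 ? ~(uint64_t) 0 : (((uint64_t) 1 << bits) - 1);
}

// Choose the counter width: take the preferred width when it is wider
// than niter's type, or when every possible count still fits in it.
inline unsigned
sketch_doloop_bits (unsigned niter_bits, unsigned preferred_bits,
                    uint64_t max_iterations)
{
  if (preferred_bits > niter_bits
      || max_iterations < sketch_max_value (preferred_bits))
    return preferred_bits;
  return niter_bits;
}

// The loop runs niter + 1 times.  Computed in 32 bits, this wraps to
// zero when niter is 0xffffffff.
inline uint32_t
sketch_doloop_base_32 (uint32_t niter)
{
  return niter + 1u;
}

// Widening to 64 bits before adding one keeps the exact count.
inline uint64_t
sketch_doloop_base_64 (uint32_t niter)
{
  return (uint64_t) niter + 1u;
}
```

The last two functions show why the order of widening and adding matters when niter + 1 could wrap in the narrower type.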
gcc/ChangeLog:
2021-07-29 Jiufu Guo <guojiufu@linux.ibm.com>
PR target/61837
* config/rs6000/rs6000.c (TARGET_PREFERRED_DOLOOP_MODE): New hook.
(rs6000_preferred_doloop_mode): New hook.
* doc/tm.texi: Regenerate.
* doc/tm.texi.in: Add hook preferred_doloop_mode.
* target.def (preferred_doloop_mode): New hook.
* targhooks.c (default_preferred_doloop_mode): New hook.
* targhooks.h (default_preferred_doloop_mode): New hook.
* tree-ssa-loop-ivopts.c (compute_doloop_base_on_mode): New function.
(add_iv_candidate_for_doloop): Call targetm.preferred_doloop_mode
and compute_doloop_base_on_mode.
gcc/testsuite/ChangeLog:
2021-07-29 Jiufu Guo <guojiufu@linux.ibm.com>
PR target/61837
* gcc.target/powerpc/pr61837.c: New test.
|
|
1. Replace PUSH_ARGS with a target calls hook, TARGET_PUSH_ARGUMENT, which
takes an integer argument. When it returns true, push instructions will
be used to pass outgoing arguments. If the argument is nonzero, it is
the number of bytes to push and indicates the PUSH instruction usage is
optional so that the backend can decide if PUSH instructions should be
generated. Otherwise, the argument is zero.
2. Implement x86 target hook which returns false when the number of bytes
to push is no less than 16 (8 for 32-bit targets) if vector load and store
can be used.
3. Remove target PUSH_ARGS definitions which return 0 as it is the same
as the default.
4. Define TARGET_PUSH_ARGUMENT of cr16 and m32c to always return true.
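The decision described in items 1 to 4 can be sketched as follows. This is a simplified model, not the i386 backend code: sketch_x86_config and its fields are hypothetical, and the real hook may test further conditions.

```cpp
#include <cassert>

// Hypothetical summary of the target state the decision depends on.
struct sketch_x86_config
{
  bool is_64bit;           // 64-bit target?
  bool has_vector_moves;   // can vector loads and stores be used?
  bool push_args_enabled;  // does the target use push at all?
};

// NPUSH == 0: the caller asks whether push instructions are used.
// NPUSH != 0: the number of bytes to push; using push is optional.
inline bool
sketch_x86_push_argument (const sketch_x86_config &cfg, unsigned npush)
{
  unsigned limit = cfg.is_64bit ? 16u : 8u;
  bool prefer_vector_move = cfg.has_vector_moves && npush >= limit;
  return cfg.push_args_enabled && !prefer_vector_move;
}

// Item 3: targets whose PUSH_ARGS was 0 match the default.
inline bool
sketch_default_push_argument (unsigned)
{
  return false;
}

// Item 4: cr16 and m32c always use push instructions.
inline bool
sketch_always_push_argument (unsigned)
{
  return true;
}
```

With vector moves available on a 64-bit target, an 8-byte argument is still pushed, while a 16-byte argument is not.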
gcc/
PR target/100704
* calls.c (expand_call): Replace PUSH_ARGS with
targetm.calls.push_argument (0).
(emit_library_call_value_1): Likewise.
* defaults.h (PUSH_ARGS): Removed.
(PUSH_ARGS_REVERSED): Replace PUSH_ARGS with
targetm.calls.push_argument (0).
* expr.c (block_move_libcall_safe_for_call_parm): Likewise.
(emit_push_insn): Pass the number of bytes to push to
targetm.calls.push_argument and pass 0 if ARGS_ADDR is 0.
* hooks.c (hook_bool_uint_true): New.
* hooks.h (hook_bool_uint_true): Likewise.
* rtlanal.c (nonzero_bits1): Replace PUSH_ARGS with
targetm.calls.push_argument (0).
* target.def (push_argument): Add a targetm.calls hook.
* targhooks.c (default_push_argument): New.
* targhooks.h (default_push_argument): Likewise.
* config/bpf/bpf.h (PUSH_ARGS): Removed.
* config/cr16/cr16.c (TARGET_PUSH_ARGUMENT): New.
* config/cr16/cr16.h (PUSH_ARGS): Removed.
* config/i386/i386.c (ix86_push_argument): New.
(TARGET_PUSH_ARGUMENT): Likewise.
* config/i386/i386.h (PUSH_ARGS): Removed.
* config/m32c/m32c.c (TARGET_PUSH_ARGUMENT): New.
* config/m32c/m32c.h (PUSH_ARGS): Removed.
* config/nios2/nios2.h (PUSH_ARGS): Likewise.
* config/pru/pru.h (PUSH_ARGS): Likewise.
* doc/tm.texi.in: Remove PUSH_ARGS documentation. Add
TARGET_PUSH_ARGUMENT hook.
* doc/tm.texi: Regenerated.
gcc/testsuite/
PR target/100704
* gcc.target/i386/pr100704-1.c: New test.
* gcc.target/i386/pr100704-2.c: Likewise.
* gcc.target/i386/pr100704-3.c: Likewise.
|
|
The rs6000 port function rs6000_density_test wants to know whether the
current cost model is for the scalar version of a loop or block, or for
the vector version. As Richi suggested, this patch introduces one
new parameter, costing_for_scalar, to the init_cost hook to pass down this
information explicitly.
gcc/ChangeLog:
* doc/tm.texi: Regenerated.
* target.def (init_cost): Add new parameter costing_for_scalar.
* targhooks.c (default_init_cost): Adjust for new parameter.
* targhooks.h (default_init_cost): Likewise.
* tree-vect-loop.c (_loop_vec_info::_loop_vec_info): Likewise.
(vect_compute_single_scalar_iteration_cost): Likewise.
(vect_analyze_loop_2): Likewise.
* tree-vect-slp.c (_bb_vec_info::_bb_vec_info): Likewise.
(vect_bb_vectorization_profitable_p): Likewise.
* tree-vectorizer.h (init_cost): Likewise.
* config/aarch64/aarch64.c (aarch64_init_cost): Likewise.
* config/i386/i386.c (ix86_init_cost): Likewise.
* config/rs6000/rs6000.c (rs6000_init_cost): Likewise.
|
|
The SECTION_LINK_ORDER changes broke powerpc64-linux ELFv1. Seems
that the assembler/linker relies on the symbol mentioned for the
"awo" section to be in the same section as the symbols mentioned in
the relocations in that section (i.e. labels for the patchable area
in this case). That is the case for most targets, including powerpc-linux
32-bit or powerpc64 ELFv2 (that one has -fpatchable-function-entry*
support broken for other reasons and it doesn't seem to be a regression).
But it doesn't work on powerpc64-linux ELFv1.
We emit:
.section ".opd","aw"
.align 3
_Z3foov:
.quad .L._Z3foov,.TOC.@tocbase,0
.previous
.type _Z3foov, @function
.L._Z3foov:
.section __patchable_function_entries,"awo",@progbits,_Z3foov
.align 3
.8byte .LPFE1
.section .text._Z3foov,"axG",@progbits,_Z3foov,comdat
.LPFE1:
nop
.LFB0:
.cfi_startproc
and because _Z3foov is in the .opd section rather than the function text
section, it doesn't work.
I'm afraid I don't know what exactly should be done, whether e.g.
it could use
.section __patchable_function_entries,"awo",@progbits,.L._Z3foov
instead, or whether the linker should be changed to handle it as is, or
something else.
But because we have a P1 regression that didn't see useful progress over the
4 months since it has been filed and we don't really have much time, below
is an attempt to do a targeted reversion of H.J.'s patch, basically act as
if HAVE_GAS_SECTION_LINK_ORDER is never true for powerpc64-linux ELFv1,
but for 32-bit or 64-bit ELFv2 keep working as is.
This would give us time to resolve it for GCC 12 properly.
2021-04-03 Jakub Jelinek <jakub@redhat.com>
PR testsuite/98125
* targhooks.h (default_print_patchable_function_entry_1): Declare.
* targhooks.c (default_print_patchable_function_entry_1): New function,
copied from default_print_patchable_function_entry with an added flags
argument.
(default_print_patchable_function_entry): Rewritten into a small
wrapper around default_print_patchable_function_entry_1.
* config/rs6000/rs6000.c (TARGET_ASM_PRINT_PATCHABLE_FUNCTION_ENTRY):
Redefine.
(rs6000_print_patchable_function_entry): New function.
* g++.dg/pr93195a.C: Skip on powerpc*-*-* 64-bit.
|
|
|
|
commit 64432b680eab0bddbe9a4ad4798457cf6a14ad60
Author: Kyrylo Tkachov <kyrylo.tkachov@arm.com>
Date: Thu Dec 17 18:02:37 2020 +0000
vect, aarch64: Extend SVE vs Advanced SIMD costing decisions in vect_better_loop_vinfo_p
changed default_estimated_poly_value to
HOST_WIDE_INT
default_estimated_poly_value (poly_int64 x, poly_value_estimate_kind)
{
return x.coeffs[0];
}
Update default_estimated_poly_value prototype in targhooks.h to match it.
* targhooks.h (default_estimated_poly_value): Updated.
|
|
This patch introduces maybe_emit_call_builtin___clear_cache for the
builtin expander machinery and the trampoline initializers to use to
clear the instruction cache, removing a source of inconsistencies and
subtle errors in low-level machinery.
I've adjusted all trampoline_init implementations that used to issue
explicit calls to __clear_cache or similar to use this new primitive.
Specifically on vxworks targets, we needed to drop the __clear_cache
symbol in libgcc, for reasons related to linking that I didn't need
to understand, and we wanted to call cacheTextUpdate directly, despite
the different calling conventions: the second argument is a length
rather than the end address.
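The difference between the two calling conventions amounts to one subtraction. The sketch below only models the argument pairs. The sketch_ names are hypothetical, and no real cache-maintenance function is called.

```cpp
#include <cassert>
#include <cstdint>

// The two arguments that would be passed to the cache-clearing function.
struct sketch_cache_call
{
  uintptr_t first;
  uintptr_t second;
};

// __clear_cache convention: (begin, end).
inline sketch_cache_call
sketch_clear_cache_args (uintptr_t begin, uintptr_t end)
{
  assert (begin <= end);
  sketch_cache_call call = { begin, end };
  return call;
}

// cacheTextUpdate convention: (begin, length in bytes).
inline sketch_cache_call
sketch_cache_text_update_args (uintptr_t begin, uintptr_t end)
{
  assert (begin <= end);
  sketch_cache_call call = { begin, end - begin };
  return call;
}
```

A target-level override therefore has to convert the end address into a length before emitting the call.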
So I introduced a target hook to enable target OS-level overriding of
builtin __clear_cache call emission, retaining nearly (*) the same
logic to govern the decision on whether to emit a call (or nothing, or
a machine-dependent insn) but enabling a call to a target
system-defined function with different calling conventions to be
issued, without having to modify .md files of the various
architectures supported by the target system to introduce or modify
clear_cache insns.
(*) I write "nearly" mainly because, when not optimizing, we'd issue a
call regardless, but since the call may now be overridden, I added it
to the set of builtins that are not directly turned into calls when
not optimizing, following the normal expansion path instead. It
wouldn't be hard to skip the emission of cache-clearing insns when not
optimizing, but it didn't seem very important, especially for the new
uses from trampoline init.
Another difference that might be relevant is that now we expand
the begin and end arguments unconditionally. This might make a
difference if they have side effects. That's pretty much impossible
at expand time, but I thought I'd mention it.
I have NOT modified targets that did not issue cache-clearing calls in
trampoline init to use the new clear_cache-calling infrastructure even
if it would expand to nothing. I have considered doing so, to have
__builtin___clear_cache and trampoline init call cacheTextUpdate on
all vxworks targets, but decided not to, since on targets that don't
do any cache clearing, cacheTextUpdate ought to be a no-op, even
though rs6000 seems to use icbi and dcbf instructions in the function
called to initialize a trampoline, but AFAICT not in the __clear_cache
builtin. Hopefully target maintainers will have a look and take
advantage of this new piece of infrastructure to remove such
(apparent?) inconsistencies. Not rs6000 and others that call asm-coded
trampoline setup instructions, for sure, but they might wish to
introduce a CLEAR_INSN_CACHE macro or a clear_cache expander if they
don't have one.
for gcc/ChangeLog
* builtins.c (default_emit_call_builtin___clear_cache): New.
(maybe_emit_call_builtin___clear_cache): New.
(expand_builtin___clear_cache): Split into the above.
(expand_builtin): Do not issue clear_cache call any more.
* builtins.h (maybe_emit_call_builtin___clear_cache): Declare.
* config/aarch64/aarch64.c (aarch64_trampoline_init): Use
maybe_emit_call_builtin___clear_cache.
* config/arc/arc.c (arc_trampoline_init): Likewise.
* config/arm/arm.c (arm_trampoline_init): Likewise.
* config/c6x/c6x.c (c6x_initialize_trampoline): Likewise.
* config/csky/csky.c (csky_trampoline_init): Likewise.
* config/m68k/linux.h (FINALIZE_TRAMPOLINE): Likewise.
* config/tilegx/tilegx.c (tilegx_trampoline_init): Likewise.
* config/tilepro/tilepro.c (tilepro_trampoline_init): Ditto.
* config/vxworks.c: Include rtl.h, memmodel.h, and optabs.h.
(vxworks_emit_call_builtin___clear_cache): New.
* config/vxworks.h (CLEAR_INSN_CACHE): Drop.
(TARGET_EMIT_CALL_BUILTIN___CLEAR_CACHE): Define.
* target.def (trampoline_init): In the documentation, refer to
maybe_emit_call_builtin___clear_cache.
(emit_call_builtin___clear_cache): New.
* doc/tm.texi.in: Add new hook point.
(CLEAR_CACHE_INSN): Remove duplicate 'both'.
* doc/tm.texi: Rebuilt.
* targhooks.h (default_emit_call_builtin___clear_cache):
Declare.
* tree.h (BUILTIN_ASM_NAME_PTR): New.
for libgcc/ChangeLog
* config/t-vxworks (LIB2ADD): Drop.
* config/t-vxworks7 (LIB2ADD): Likewise.
* config/vxcache.c: Remove.
|
|
Handling stack variables has three features.
1) Ensure HWASAN required alignment for stack variables
When tagging shadow memory, we need to ensure that each tag granule is
only used by one variable at a time.
This is done by ensuring that each tagged variable is aligned to the tag
granule representation size, and that the end of each
object is also aligned, so that the start of any other data stored on the
stack is in a different granule.
This patch ensures the above by forcing the stack pointer to be aligned
before and after allocating any stack objects. Since we are forcing
alignment we also use `align_local_variable` to ensure this new alignment
is advertised properly through SET_DECL_ALIGN.
2) Put tags into each stack variable pointer
Make sure that every pointer to a stack variable includes a tag of some
sort on it.
The way tagging works is:
1) For every new stack frame, a random tag is generated.
2) A base register is formed from the stack pointer value and this
random tag.
3) References to stack variables are now formed with RTL describing an
offset from this base in both tag and value.
The random tag generation is handled by a backend hook. This hook
decides whether to introduce a random tag or use the stack background
based on the parameter hwasan-random-frame-tag. Using the stack
background is necessary for testing and bootstrap. It is necessary
during bootstrap to avoid breaking the `configure` test program for
determining stack direction.
Using the stack background means that every stack frame has the initial
tag of zero and variables are tagged with incrementing tags from 1,
which also makes debugging a bit easier.
Backend hooks define the size of a tag, the layout of the HWASAN shadow
memory, and handle emitting the code that inserts and extracts tags from a
pointer.
3) For each stack variable, tag and untag the shadow stack on function
prologue and epilogue.
On entry to each function we tag the relevant shadow stack region for
each stack variable. This stack region is tagged to match the tag added to
each pointer to that variable.
This is the first patch where we use the HWASAN shadow space, so we need
to add in the libhwasan initialisation code that creates this shadow
memory region into the binary we produce. This instrumentation is done
in `compile_file`.
When exiting a function we need to ensure the shadow stack for this
function has no remaining tags. Without clearing the shadow stack area
for this stack frame, later function calls could get false positives
when those later function calls check untagged areas (such as parameters
passed on the stack) against a shadow stack area with left-over tag.
Hence we ensure that the entire stack frame is cleared on function exit.
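The arithmetic behind the alignment and tagging steps above can be sketched as follows. The constants are assumptions for illustration (an 8-bit tag stored in the top byte of a 64-bit pointer and a 16-byte granule), and all sketch_ names are hypothetical. Real targets supply these values and operations through the backend hooks.

```cpp
#include <cassert>
#include <cstdint>

const unsigned sketch_tag_bits = 8;    // assumed tag size in bits
const unsigned sketch_tag_shift = 56;  // assumed: tag lives in the top byte
const uint64_t sketch_granule = 16;    // assumed granule size in bytes
const uint64_t sketch_tag_mask
  = (((uint64_t) 1 << sketch_tag_bits) - 1) << sketch_tag_shift;

// Round N up to a granule boundary so that no two objects share a granule.
inline uint64_t
sketch_align_to_granule (uint64_t n)
{
  return (n + sketch_granule - 1) & ~(sketch_granule - 1);
}

// Strip the tag, leaving only the address bits.
inline uint64_t
sketch_untagged_pointer (uint64_t ptr)
{
  return ptr & ~sketch_tag_mask;
}

// Replace the tag of PTR with TAG.
inline uint64_t
sketch_set_tag (uint64_t ptr, uint8_t tag)
{
  return sketch_untagged_pointer (ptr) | ((uint64_t) tag << sketch_tag_shift);
}

// Read the tag back out of PTR.
inline uint8_t
sketch_extract_tag (uint64_t ptr)
{
  return (uint8_t) ((ptr & sketch_tag_mask) >> sketch_tag_shift);
}

// Tag of the INDEX-th variable in a frame: the frame tag plus an offset,
// truncated to the tag size so that it wraps around.
inline uint8_t
sketch_variable_tag (uint8_t frame_tag, unsigned index)
{
  return (uint8_t) ((frame_tag + index) & ((1u << sketch_tag_bits) - 1));
}
```

With a frame tag of zero, as when using the stack background, the first variable in this sketch gets tag 1, the next tag 2, and so on.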
config/ChangeLog:
* bootstrap-hwasan.mk: Disable random frame tags for stack-tagging
during bootstrap.
gcc/ChangeLog:
* asan.c (struct hwasan_stack_var): New.
(hwasan_sanitize_p): New.
(hwasan_sanitize_stack_p): New.
(hwasan_sanitize_allocas_p): New.
(initialize_sanitizer_builtins): Define new builtins.
(ATTR_NOTHROW_LIST): New macro.
(hwasan_current_frame_tag): New.
(hwasan_frame_base): New.
(stack_vars_base_reg_p): New.
(hwasan_maybe_emit_frame_base_init): New.
(hwasan_record_stack_var): New.
(hwasan_get_frame_extent): New.
(hwasan_increment_frame_tag): New.
(hwasan_record_frame_init): New.
(hwasan_emit_prologue): New.
(hwasan_emit_untag_frame): New.
(hwasan_finish_file): New.
(hwasan_truncate_to_tag_size): New.
* asan.h (hwasan_record_frame_init): New declaration.
(hwasan_record_stack_var): New declaration.
(hwasan_emit_prologue): New declaration.
(hwasan_emit_untag_frame): New declaration.
(hwasan_get_frame_extent): New declaration.
(hwasan_maybe_emit_frame_base_init): New declaration.
(hwasan_frame_base): New declaration.
(stack_vars_base_reg_p): New declaration.
(hwasan_current_frame_tag): New declaration.
(hwasan_increment_frame_tag): New declaration.
(hwasan_truncate_to_tag_size): New declaration.
(hwasan_finish_file): New declaration.
(hwasan_sanitize_p): New declaration.
(hwasan_sanitize_stack_p): New declaration.
(hwasan_sanitize_allocas_p): New declaration.
(HWASAN_TAG_SIZE): New macro.
(HWASAN_TAG_GRANULE_SIZE): New macro.
(HWASAN_STACK_BACKGROUND): New macro.
* builtin-types.def (BT_FN_VOID_PTR_UINT8_PTRMODE): New.
* builtins.def (DEF_SANITIZER_BUILTIN): Enable for HWASAN.
* cfgexpand.c (align_local_variable): When using hwasan ensure
alignment to tag granule.
(align_frame_offset): New.
(expand_one_stack_var_at): For hwasan use tag offset.
(expand_stack_vars): Record stack objects for hwasan.
(expand_one_stack_var_1): Record stack objects for hwasan.
(init_vars_expansion): Initialise hwasan state.
(expand_used_vars): Emit hwasan prologue and generate hwasan epilogue.
(pass_expand::execute): Emit hwasan base initialization if needed.
* doc/tm.texi (TARGET_MEMTAG_TAG_SIZE,TARGET_MEMTAG_GRANULE_SIZE,
TARGET_MEMTAG_INSERT_RANDOM_TAG,TARGET_MEMTAG_ADD_TAG,
TARGET_MEMTAG_SET_TAG,TARGET_MEMTAG_EXTRACT_TAG,
TARGET_MEMTAG_UNTAGGED_POINTER): Document new hooks.
* doc/tm.texi.in (TARGET_MEMTAG_TAG_SIZE,TARGET_MEMTAG_GRANULE_SIZE,
TARGET_MEMTAG_INSERT_RANDOM_TAG,TARGET_MEMTAG_ADD_TAG,
TARGET_MEMTAG_SET_TAG,TARGET_MEMTAG_EXTRACT_TAG,
TARGET_MEMTAG_UNTAGGED_POINTER): Document new hooks.
* explow.c (get_dynamic_stack_base): Take new `base` argument.
* explow.h (get_dynamic_stack_base): Take new `base` argument.
* sanitizer.def (BUILT_IN_HWASAN_INIT): New.
(BUILT_IN_HWASAN_TAG_MEM): New.
* target.def (target_memtag_tag_size,target_memtag_granule_size,
target_memtag_insert_random_tag,target_memtag_add_tag,
target_memtag_set_tag,target_memtag_extract_tag,
target_memtag_untagged_pointer): New hooks.
* targhooks.c (HWASAN_SHIFT): New.
(HWASAN_SHIFT_RTX): New.
(default_memtag_tag_size): New default hook.
(default_memtag_granule_size): New default hook.
(default_memtag_insert_random_tag): New default hook.
(default_memtag_add_tag): New default hook.
(default_memtag_set_tag): New default hook.
(default_memtag_extract_tag): New default hook.
(default_memtag_untagged_pointer): New default hook.
* targhooks.h (default_memtag_tag_size): New default hook.
(default_memtag_granule_size): New default hook.
(default_memtag_insert_random_tag): New default hook.
(default_memtag_add_tag): New default hook.
(default_memtag_set_tag): New default hook.
(default_memtag_extract_tag): New default hook.
(default_memtag_untagged_pointer): New default hook.
* toplev.c (compile_file): Call hwasan_finish_file when finished.
|
|
These flags can't be used at the same time as any of the other
sanitizers.
We add an equivalent flag to -static-libasan in -static-libhwasan to
ensure static linking.
The -fsanitize=kernel-hwaddress option is for compiling code that targets the
kernel. This flag has defaults to match the LLVM implementation and
sets some other behaviors to work in the kernel (e.g. accounting for
the fact that the stack pointer will have 0xff in the top byte and to not
call the userspace library initialisation routines).
The defaults are that we do not sanitize variables on the stack and
always recover from a detected bug.
Since we are introducing a few more conflicts between sanitizer flags we
refactor the checking for such conflicts to use a helper function which
makes checking for such conflicts easier and more consistent.
We introduce a backend hook `targetm.memtag.can_tag_addresses` that
indicates to the mid-end whether a target has a feature like AArch64 TBI
where the top byte of an address is ignored.
Without this feature hwasan sanitization is not done.
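A conflict-checking helper of the kind mentioned above can be sketched with a bitmask. This is an illustration under assumptions: the flag values are hypothetical rather than those in flag-types.h, and the exact set of sanitizers treated as conflicting is assumed.

```cpp
#include <cassert>

// Hypothetical sanitizer flag bits.
enum sketch_sanitize_code
{
  SKETCH_ADDRESS = 1u << 0,
  SKETCH_KERNEL_ADDRESS = 1u << 1,
  SKETCH_THREAD = 1u << 2,
  SKETCH_HWADDRESS = 1u << 3,
  SKETCH_KERNEL_HWADDRESS = 1u << 4
};

// True when FLAGS has at least one bit from LEFT and one from RIGHT.
inline bool
sketch_options_conflict (unsigned flags, unsigned left, unsigned right)
{
  return (flags & left) != 0 && (flags & right) != 0;
}

// Check the hwasan flags against the other sanitizers and each other.
inline bool
sketch_has_hwasan_conflict (unsigned flags)
{
  const unsigned hwasan = SKETCH_HWADDRESS | SKETCH_KERNEL_HWADDRESS;
  const unsigned others
    = SKETCH_ADDRESS | SKETCH_KERNEL_ADDRESS | SKETCH_THREAD;
  return sketch_options_conflict (flags, hwasan, others)
         || sketch_options_conflict (flags, SKETCH_HWADDRESS,
                                     SKETCH_KERNEL_HWADDRESS);
}
```

Routing every pairwise check through one helper keeps the rules in a single place, so adding a new sanitizer only means adding one more call.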
gcc/ChangeLog:
* common.opt (flag_sanitize_recover): Default for kernel
hwaddress.
(static-libhwasan): New cli option.
* config/aarch64/aarch64.c (aarch64_can_tag_addresses): New.
(TARGET_MEMTAG_CAN_TAG_ADDRESSES): New.
* config/gnu-user.h (LIBHWASAN_EARLY_SPEC): hwasan equivalent of
asan command line flags.
* cppbuiltin.c (define_builtin_macros_for_compilation_flags):
Add hwasan equivalent of __SANITIZE_ADDRESS__.
* doc/invoke.texi: Document hwasan command line flags.
* doc/tm.texi: Document new hook.
* doc/tm.texi.in: Document new hook.
* flag-types.h (enum sanitize_code): New sanitizer values.
* gcc.c (STATIC_LIBHWASAN_LIBS): New macro.
(LIBHWASAN_SPEC): New macro.
(LIBHWASAN_EARLY_SPEC): New macro.
(SANITIZER_EARLY_SPEC): Update to include hwasan.
(SANITIZER_SPEC): Update to include hwasan.
(sanitize_spec_function): Use hwasan options.
* opts.c (finish_options): Describe conflicts between address
sanitizers.
(find_sanitizer_argument): New.
(report_conflicting_sanitizer_options): New.
(sanitizer_opts): Introduce new sanitizer flags.
(common_handle_option): Add defaults for kernel sanitizer.
* params.opt (hwasan-instrument-stack): New.
(hwasan-random-frame-tag): New.
(hwasan-instrument-allocas): New.
(hwasan-instrument-reads): New.
(hwasan-instrument-writes): New.
(hwasan-instrument-mem-intrinsics): New.
* target.def (HOOK_PREFIX): Add new hook.
(can_tag_addresses): Add new hook under memtag prefix.
* targhooks.c (default_memtag_can_tag_addresses): New.
* targhooks.h (default_memtag_can_tag_addresses): New decl.
* toplev.c (process_options): Ensure hwasan only on
architectures that advertise the possibility.
|
|
This new feature causes the compiler to zero a subset of all call-used
registers at function return. This is used to increase program security
by either mitigating Return-Oriented Programming (ROP) attacks or
preventing information leakage through registers.
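A possible use of the per-function attribute is sketched below. The function and macro names are hypothetical, and the choice of "used-gpr" is only an example value. The guard lets the same source build with compilers that lack the attribute.

```cpp
#include <cassert>

// Apply the attribute only when the compiler reports support for it.
#if defined(__has_attribute)
# if __has_attribute(zero_call_used_regs)
#  define SKETCH_ZERO_REGS __attribute__ ((zero_call_used_regs ("used-gpr")))
# endif
#endif
#ifndef SKETCH_ZERO_REGS
# define SKETCH_ZERO_REGS
#endif

// Hypothetical security-sensitive function.  With the attribute active,
// the call-used general-purpose registers it used are zeroed at return.
SKETCH_ZERO_REGS int
sketch_check_pin (int entered, int expected)
{
  return entered == expected;
}
```

The same policy can be requested for a whole translation unit with the -fzero-call-used-regs option instead of annotating individual functions.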
gcc/ChangeLog:
2020-10-30 Qing Zhao <qing.zhao@oracle.com>
H.J.Lu <hjl.tools@gmail.com>
* common.opt: Add new option -fzero-call-used-regs
* config/i386/i386.c (zero_call_used_regno_p): New function.
(zero_call_used_regno_mode): Likewise.
(zero_all_vector_registers): Likewise.
(zero_all_st_registers): Likewise.
(zero_all_mm_registers): Likewise.
(ix86_zero_call_used_regs): Likewise.
(TARGET_ZERO_CALL_USED_REGS): Define.
* df-scan.c (df_epilogue_uses_p): New function.
(df_get_exit_block_use_set): Replace EPILOGUE_USES with
df_epilogue_uses_p.
* df.h (df_epilogue_uses_p): Declare.
* doc/extend.texi: Document the new zero_call_used_regs attribute.
* doc/invoke.texi: Document the new -fzero-call-used-regs option.
* doc/tm.texi: Regenerate.
* doc/tm.texi.in (TARGET_ZERO_CALL_USED_REGS): New hook.
* emit-rtl.h (struct rtl_data): New field must_be_zero_on_return.
* flag-types.h (namespace zero_regs_flags): New namespace.
* function.c (gen_call_used_regs_seq): New function.
(class pass_zero_call_used_regs): New class.
(pass_zero_call_used_regs::execute): New function.
(make_pass_zero_call_used_regs): New function.
* optabs.c (expand_asm_reg_clobber_mem_blockage): New function.
* optabs.h (expand_asm_reg_clobber_mem_blockage): Declare.
* opts.c (zero_call_used_regs_opts): New structure array
initialization.
(parse_zero_call_used_regs_options): New function.
(common_handle_option): Handle -fzero-call-used-regs.
* opts.h (zero_call_used_regs_opts): New structure array.
* passes.def: Add new pass pass_zero_call_used_regs.
* recog.c (valid_insn_p): New function.
* recog.h (valid_insn_p): Declare.
* resource.c (init_resource_info): Replace EPILOGUE_USES with
df_epilogue_uses_p.
* target.def (zero_call_used_regs): New hook.
* targhooks.c (default_zero_call_used_regs): New function.
* targhooks.h (default_zero_call_used_regs): Declare.
* tree-pass.h (make_pass_zero_call_used_regs): Declare.
gcc/c-family/ChangeLog:
2020-10-30 Qing Zhao <qing.zhao@oracle.com>
H.J.Lu <hjl.tools@gmail.com>
* c-attribs.c (c_common_attribute_table): Add new attribute
zero_call_used_regs.
(handle_zero_call_used_regs_attribute): New function.
gcc/testsuite/ChangeLog:
2020-10-30 Qing Zhao <qing.zhao@oracle.com>
H.J.Lu <hjl.tools@gmail.com>
* c-c++-common/zero-scratch-regs-1.c: New test.
* c-c++-common/zero-scratch-regs-10.c: New test.
* c-c++-common/zero-scratch-regs-11.c: New test.
* c-c++-common/zero-scratch-regs-2.c: New test.
* c-c++-common/zero-scratch-regs-3.c: New test.
* c-c++-common/zero-scratch-regs-4.c: New test.
* c-c++-common/zero-scratch-regs-5.c: New test.
* c-c++-common/zero-scratch-regs-6.c: New test.
* c-c++-common/zero-scratch-regs-7.c: New test.
* c-c++-common/zero-scratch-regs-8.c: New test.
* c-c++-common/zero-scratch-regs-9.c: New test.
* c-c++-common/zero-scratch-regs-attr-usages.c: New test.
* gcc.target/i386/zero-scratch-regs-1.c: New test.
* gcc.target/i386/zero-scratch-regs-10.c: New test.
* gcc.target/i386/zero-scratch-regs-11.c: New test.
* gcc.target/i386/zero-scratch-regs-12.c: New test.
* gcc.target/i386/zero-scratch-regs-13.c: New test.
* gcc.target/i386/zero-scratch-regs-14.c: New test.
* gcc.target/i386/zero-scratch-regs-15.c: New test.
* gcc.target/i386/zero-scratch-regs-16.c: New test.
* gcc.target/i386/zero-scratch-regs-17.c: New test.
* gcc.target/i386/zero-scratch-regs-18.c: New test.
* gcc.target/i386/zero-scratch-regs-19.c: New test.
* gcc.target/i386/zero-scratch-regs-2.c: New test.
* gcc.target/i386/zero-scratch-regs-20.c: New test.
* gcc.target/i386/zero-scratch-regs-21.c: New test.
* gcc.target/i386/zero-scratch-regs-22.c: New test.
* gcc.target/i386/zero-scratch-regs-23.c: New test.
* gcc.target/i386/zero-scratch-regs-24.c: New test.
* gcc.target/i386/zero-scratch-regs-25.c: New test.
* gcc.target/i386/zero-scratch-regs-26.c: New test.
* gcc.target/i386/zero-scratch-regs-27.c: New test.
* gcc.target/i386/zero-scratch-regs-28.c: New test.
* gcc.target/i386/zero-scratch-regs-29.c: New test.
* gcc.target/i386/zero-scratch-regs-30.c: New test.
* gcc.target/i386/zero-scratch-regs-31.c: New test.
* gcc.target/i386/zero-scratch-regs-3.c: New test.
* gcc.target/i386/zero-scratch-regs-4.c: New test.
* gcc.target/i386/zero-scratch-regs-5.c: New test.
* gcc.target/i386/zero-scratch-regs-6.c: New test.
* gcc.target/i386/zero-scratch-regs-7.c: New test.
* gcc.target/i386/zero-scratch-regs-8.c: New test.
* gcc.target/i386/zero-scratch-regs-9.c: New test.
|
|
GCC has a target hook TARGET_LIBC_HAS_FUNCTION, which tells the compiler
which functions it can expect to be present in libc.
The default target hook does not include the sincos functions.
The nvptx port of newlib does include sincos and sincosf, but not sincosl.
The target hook TARGET_LIBC_HAS_FUNCTION does not distinguish between sincos,
sincosf and sincosl, so if we enable it for the sincos functions, then for
test.c:
...
long double x, a, b;
int main (void) {
x = 0.5;
a = sinl (x);
b = cosl (x);
printf ("a: %f\n", (double)a);
printf ("b: %f\n", (double)b);
return 0;
}
...
we introduce a regression:
...
$ gcc test.c -lm -O2
unresolved symbol sincosl
collect2: error: ld returned 1 exit status
...
Add a type argument to target hook TARGET_LIBC_HAS_FUNCTION, and use it
in nvptx_libc_has_function to enable sincos and sincosf, but not sincosl.
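The effect of the extra argument can be sketched like this. The enums and function names are hypothetical simplifications: GCC passes a tree type rather than an enum, and returning true when no type is given is an assumption of this sketch.

```cpp
#include <cassert>

// Hypothetical stand-ins for the function class and the type argument.
enum sketch_function_class { SKETCH_FUNCTION_C99_MISC, SKETCH_FUNCTION_SINCOS };
enum sketch_fp_type { SKETCH_NO_TYPE, SKETCH_FLOAT, SKETCH_DOUBLE, SKETCH_LONG_DOUBLE };

// Default in this sketch: the sincos family is not assumed to exist.
inline bool
sketch_default_libc_has_function (sketch_function_class fn_class,
                                  sketch_fp_type)
{
  return fn_class != SKETCH_FUNCTION_SINCOS;
}

// nvptx-like target: sincos and sincosf exist, sincosl does not.
inline bool
sketch_nvptx_libc_has_function (sketch_function_class fn_class,
                                sketch_fp_type type)
{
  if (fn_class == SKETCH_FUNCTION_SINCOS)
    {
      if (type == SKETCH_NO_TYPE)
        return true;  // assumed: no type means "any of the family"
      return type == SKETCH_FLOAT || type == SKETCH_DOUBLE;
    }
  return sketch_default_libc_has_function (fn_class, type);
}
```

With the type available, a pass that combines sinl and cosl can ask about long double specifically and leave the two calls alone.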
Built and reg-tested on x86_64-linux.
Built and tested on nvptx.
gcc/ChangeLog:
2020-09-28 Tobias Burnus <tobias@codesourcery.com>
Tom de Vries <tdevries@suse.de>
* builtins.c (expand_builtin_cexpi, fold_builtin_sincos): Update
targetm.libc_has_function call.
* builtins.def (DEF_C94_BUILTIN, DEF_C99_BUILTIN, DEF_C11_BUILTIN)
(DEF_C2X_BUILTIN, DEF_C99_COMPL_BUILTIN, DEF_C99_C90RES_BUILTIN):
Same.
* config/darwin-protos.h (darwin_libc_has_function): Update prototype.
* config/darwin.c (darwin_libc_has_function): Add arg.
* config/linux-protos.h (linux_libc_has_function): Update prototype.
* config/linux.c (linux_libc_has_function): Add arg.
* config/i386/i386.c (ix86_libc_has_function): Update
targetm.libc_has_function call.
* config/nvptx/nvptx.c (nvptx_libc_has_function): New function.
(TARGET_LIBC_HAS_FUNCTION): Redefine to nvptx_libc_has_function.
* convert.c (convert_to_integer_1): Update targetm.libc_has_function
call.
* match.pd: Same.
* target.def (libc_has_function): Add arg.
* doc/tm.texi: Regenerate.
* targhooks.c (default_libc_has_function, gnu_libc_has_function)
(no_c99_libc_has_function): Add arg.
* targhooks.h (default_libc_has_function, no_c99_libc_has_function)
(gnu_libc_has_function): Update prototype.
* tree-ssa-math-opts.c (pass_cse_sincos::execute): Update
targetm.libc_has_function call.
gcc/fortran/ChangeLog:
2020-09-30 Tom de Vries <tdevries@suse.de>
* f95-lang.c (gfc_init_builtin_functions): Update
targetm.libc_has_function call.
|
|
This adds a vectype parameter to add_stmt_cost which avoids the need
to pass down a (wrong) stmt_info just to carry this information.
Useful for invariants which do not have a stmt_info associated.
2020-05-13 Richard Biener <rguenther@suse.de>
* target.def (add_stmt_cost): Add new vectype parameter.
* targhooks.c (default_add_stmt_cost): Adjust.
* targhooks.h (default_add_stmt_cost): Likewise.
* config/aarch64/aarch64.c (aarch64_add_stmt_cost): Take new
vectype parameter.
* config/arm/arm.c (arm_add_stmt_cost): Likewise.
* config/i386/i386.c (ix86_add_stmt_cost): Likewise.
* config/rs6000/rs6000.c (rs6000_add_stmt_cost): Likewise.
* tree-vectorizer.h (stmt_info_for_cost::vectype): Add.
(dump_stmt_cost): Add new vectype parameter.
(add_stmt_cost): Likewise.
(record_stmt_cost): Likewise.
(record_stmt_cost): Add overload with old signature.
* tree-vect-loop.c (vect_compute_single_scalar_iteration_cost):
Adjust.
(vect_get_known_peeling_cost): Likewise.
(vect_estimate_min_profitable_iters): Likewise.
* tree-vectorizer.c (dump_stmt_cost): Add new vectype parameter.
* tree-vect-stmts.c (record_stmt_cost): Likewise.
(vect_prologue_cost_for_slp_op): Remove stmt_vec_info parameter
and pass down correct vectype and NULL stmt_info.
(vect_model_simple_cost): Adjust.
(vect_model_store_cost): Likewise.
|
|
gcc/
* config.gcc: Add riscv-shorten-memrefs.o to extra_objs for riscv.
* config/riscv/riscv-passes.def: New file.
* config/riscv/riscv-protos.h (make_pass_shorten_memrefs): Declare.
* config/riscv/riscv-shorten-memrefs.c: New file.
* config/riscv/riscv.c (tree-pass.h): New include.
(riscv_compressed_reg_p): New function.
(riscv_compressed_lw_offset_p): Likewise.
(riscv_compressed_lw_address_p): Likewise.
(riscv_shorten_lw_offset): Likewise.
(riscv_legitimize_address): Attempt to convert base + large_offset
to compressible new_base + small_offset.
(riscv_address_cost): Make anticipated compressed load/stores
cheaper for code size than uncompressed load/stores.
(riscv_register_priority): Move compressed register check to
riscv_compressed_reg_p.
* config/riscv/riscv.h (C_S_BITS): Define.
(CSW_MAX_OFFSET): Define.
* config/riscv/riscv.opt (mshorten-memrefs): New option.
* config/riscv/t-riscv (riscv-shorten-memrefs.o): New rule.
(PASSES_EXTRA): Add riscv-passes.def.
* doc/invoke.texi: Document -mshorten-memrefs.
* config/riscv/riscv.c (riscv_new_address_profitable_p): New function.
(TARGET_NEW_ADDRESS_PROFITABLE_P): Define.
* doc/tm.texi: Regenerate.
* doc/tm.texi.in (TARGET_NEW_ADDRESS_PROFITABLE_P): New hook.
* sched-deps.c (attempt_change): Use old address if it is cheaper than
new address.
* target.def (new_address_profitable_p): New hook.
* targhooks.c (default_new_address_profitable_p): New function.
* targhooks.h (default_new_address_profitable_p): Declare.
gcc/testsuite/
* gcc.target/riscv/shorten-memrefs-1.c: New test.
* gcc.target/riscv/shorten-memrefs-2.c: New test.
* gcc.target/riscv/shorten-memrefs-3.c: New test.
* gcc.target/riscv/shorten-memrefs-4.c: New test.
* gcc.target/riscv/shorten-memrefs-5.c: New test.
* gcc.target/riscv/shorten-memrefs-6.c: New test.
* gcc.target/riscv/shorten-memrefs-7.c: New test.
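The riscv_legitimize_address entry above describes rewriting base + large_offset as new_base + small_offset. A rough sketch of that arithmetic follows, assuming non-negative word-aligned offsets and the 0..124 byte range that the RISC-V compressed c.lw/c.sw encodings can address. The names are hypothetical and this is not the code in riscv.c.

```c
#include <assert.h>

/* Largest offset a compressed word load/store can encode:
   a 5-bit immediate scaled by 4.  */
#define MODEL_CLW_MAX_OFFSET 124

struct model_split
{
  long base_adjust;   /* Added once to the base register.  */
  long small_offset;  /* Left in the memory reference.  */
};

/* Split OFFSET so that the remaining small offset fits a compressed
   load/store.  Assumes OFFSET is non-negative and a multiple of 4.  */
struct model_split
model_shorten_offset (long offset)
{
  assert (offset >= 0 && offset % 4 == 0);
  struct model_split s;
  /* Keep the low 7 bits; alignment guarantees the result is <= 124.  */
  s.small_offset = offset % (MODEL_CLW_MAX_OFFSET + 4);
  s.base_adjust = offset - s.small_offset;
  return s;
}
```

For an offset of 2060 this yields a base adjustment of 2048 and a small offset of 12, so several nearby accesses can share one adjusted base and each use a compressed instruction.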
|
|
C++ makes mismatched prototype and implementation OK.
2020-05-05 Richard Biener <rguenther@suse.de>
* targhooks.h (default_add_stmt_cost): Add vec_info * parameter.
|
|
From-SVN: r279813
|
|
2019-11-27 Richard Biener <rguenther@suse.de>
* target.def (TARGET_VECTORIZE_BUILTIN_CONVERSION): Remove.
* targhooks.c (default_builtin_vectorized_conversion): Likewise.
* targhooks.h (default_builtin_vectorized_conversion): Likewise.
* optabs-tree.c (supportable_convert_operation): Do not call
targetm.vectorize.builtin_conversion. Remove unused decl parameter.
* optabs-tree.h (supportable_convert_operation): Adjust.
* doc/tm.texi.in (TARGET_VECTORIZE_BUILTIN_CONVERSION): Remove.
* doc/tm.texi: Regenerate.
* tree-ssa-forwprop.c (simplify_vector_constructor): Adjust.
* tree-vect-generic.c (expand_vector_conversion): Likewise.
* tree-vect-stmts.c (vect_gen_widened_results_half): Remove
unused decl parameter and adjust.
(vect_create_vectorized_promotion_stmts): Likewise.
(vectorizable_conversion): Adjust.
From-SVN: r278765
|
|
This patch adds a mode in which the vectoriser tries each available
base vector mode and picks the one with the lowest cost. The new
behaviour is selected by autovectorize_vector_modes.
The patch keeps the current behaviour of preferring a VF of
loop->simdlen over any larger or smaller VF, regardless of costs
or target preferences.
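As a simplified illustration of "try each mode and keep the cheapest", the sketch below picks the candidate with the lowest summed cost. The struct and function names are hypothetical, and the comparison deliberately ignores vectorization factors and simdlen, so it should not be read as the logic of vect_better_loop_vinfo_p.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record of the costs computed for one candidate mode.  */
struct model_mode_cost
{
  const char *mode_name;
  unsigned inside_cost;   /* Cost of the vector loop body.  */
  unsigned outside_cost;  /* Cost of prologue/epilogue code.  */
};

/* Return the index of the candidate with the lowest total cost.
   Ties keep the earlier candidate, i.e. the target's preferred order.  */
size_t
model_pick_cheapest_mode (const struct model_mode_cost *cands, size_t n)
{
  assert (cands != NULL && n > 0);
  size_t best = 0;
  for (size_t i = 1; i < n; i++)
    {
      unsigned cost = cands[i].inside_cost + cands[i].outside_cost;
      unsigned best_cost = cands[best].inside_cost + cands[best].outside_cost;
      if (cost < best_cost)
        best = i;
    }
  return best;
}
```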
2019-11-16 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* target.h (VECT_COMPARE_COSTS): New constant.
* target.def (autovectorize_vector_modes): Return a bitmask of flags.
* doc/tm.texi: Regenerate.
* targhooks.h (default_autovectorize_vector_modes): Update accordingly.
* targhooks.c (default_autovectorize_vector_modes): Likewise.
* config/aarch64/aarch64.c (aarch64_autovectorize_vector_modes):
Likewise.
* config/arc/arc.c (arc_autovectorize_vector_modes): Likewise.
* config/arm/arm.c (arm_autovectorize_vector_modes): Likewise.
* config/i386/i386.c (ix86_autovectorize_vector_modes): Likewise.
* config/mips/mips.c (mips_autovectorize_vector_modes): Likewise.
* tree-vectorizer.h (_loop_vec_info::vec_outside_cost)
(_loop_vec_info::vec_inside_cost): New member variables.
* tree-vect-loop.c (_loop_vec_info::_loop_vec_info): Initialize them.
(vect_better_loop_vinfo_p, vect_joust_loop_vinfos): New functions.
(vect_analyze_loop): When autovectorize_vector_modes returns
VECT_COMPARE_COSTS, try vectorizing the loop with each available
vector mode and pick the one with the lowest cost.
(vect_estimate_min_profitable_iters): Record the computed costs
in the loop_vec_info.
From-SVN: r278336
|
|
This is another patch in the series to remove the assumption that
all modes involved in vectorisation have to be the same size.
Rather than have the target provide a list of vector sizes,
it makes the target provide a list of vector "approaches",
with each approach represented by a mode.
A later patch will pass this mode to targetm.vectorize.related_mode
to get the vector mode for a given element mode. Until then, the modes
simply act as an alternative way of specifying the vector size.
2019-11-14 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* target.h (vector_sizes, auto_vector_sizes): Delete.
(vector_modes, auto_vector_modes): New typedefs.
* target.def (autovectorize_vector_sizes): Replace with...
(autovectorize_vector_modes): ...this new hook.
* doc/tm.texi.in (TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES):
Replace with...
(TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES): ...this new hook.
* doc/tm.texi: Regenerate.
* targhooks.h (default_autovectorize_vector_sizes): Delete.
(default_autovectorize_vector_modes): New function.
* targhooks.c (default_autovectorize_vector_sizes): Delete.
(default_autovectorize_vector_modes): New function.
* omp-general.c (omp_max_vf): Use autovectorize_vector_modes instead
of autovectorize_vector_sizes. Use the number of units in the mode
to calculate the maximum VF.
* omp-low.c (omp_clause_aligned_alignment): Use
autovectorize_vector_modes instead of autovectorize_vector_sizes.
Use a loop based on related_mode to iterate through all supported
vector modes for a given scalar mode.
* optabs-query.c (can_vec_mask_load_store_p): Use
autovectorize_vector_modes instead of autovectorize_vector_sizes.
* tree-vect-loop.c (vect_analyze_loop, vect_transform_loop): Likewise.
* tree-vect-slp.c (vect_slp_bb_region): Likewise.
* config/aarch64/aarch64.c (aarch64_autovectorize_vector_sizes):
Replace with...
(aarch64_autovectorize_vector_modes): ...this new function.
(TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES): Delete.
(TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES): Define.
* config/arc/arc.c (arc_autovectorize_vector_sizes): Replace with...
(arc_autovectorize_vector_modes): ...this new function.
(TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES): Delete.
(TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES): Define.
* config/arm/arm.c (arm_autovectorize_vector_sizes): Replace with...
(arm_autovectorize_vector_modes): ...this new function.
(TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES): Delete.
(TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES): Define.
* config/i386/i386.c (ix86_autovectorize_vector_sizes): Replace with...
(ix86_autovectorize_vector_modes): ...this new function.
(TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES): Delete.
(TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES): Define.
* config/mips/mips.c (mips_autovectorize_vector_sizes): Replace with...
(mips_autovectorize_vector_modes): ...this new function.
(TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES): Delete.
(TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES): Define.
From-SVN: r278236
|
|
This patch passes the data vector mode to get_mask_mode, rather than its
size and nunits. This is a bit simpler and allows targets to distinguish
between modes that happen to have the same size and number of elements.
2019-11-14 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* target.def (get_mask_mode): Take a vector mode itself as argument,
instead of properties about the vector mode.
* doc/tm.texi: Regenerate.
* targhooks.h (default_get_mask_mode): Update to reflect new
get_mask_mode interface.
* targhooks.c (default_get_mask_mode): Likewise. Use
related_int_vector_mode.
* optabs-query.c (can_vec_mask_load_store_p): Update call
to get_mask_mode.
* tree-vect-stmts.c (check_load_store_masking): Likewise, checking
first that the original mode really is a vector.
* tree.c (build_truth_vector_type_for): Likewise.
* config/aarch64/aarch64.c (aarch64_get_mask_mode): Update for new
get_mask_mode interface.
(aarch64_expand_sve_vcond): Update call accordingly.
* config/gcn/gcn.c (gcn_vectorize_get_mask_mode): Update for new
get_mask_mode interface.
* config/i386/i386.c (ix86_get_mask_mode): Likewise.
From-SVN: r278233
|
|
This patch is the first of a series that tries to remove two
assumptions:
(1) that all vectors involved in vectorisation must be the same size
(2) that there is only one vector mode for a given element mode and
number of elements
Relaxing (1) helps with targets that support multiple vector sizes or
that require the number of elements to stay the same. E.g. if we're
vectorising code that operates on narrow and wide elements, and the
narrow elements use 64-bit vectors, then on AArch64 it would normally
be better to use 128-bit vectors rather than pairs of 64-bit vectors
for the wide elements.
Relaxing (2) makes it possible for -msve-vector-bits=128 to produce
fixed-length code for SVE. It also allows unpacked/half-size SVE
vectors to work with -msve-vector-bits=256.
The patch adds a new hook that targets can use to control how we
move from one vector mode to another. The hook takes a starting vector
mode, a new element mode, and (optionally) a new number of elements.
The flexibility needed for (1) comes in when the number of elements
isn't specified.
All callers in this patch specify the number of elements, but a later
vectoriser patch doesn't.
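The shape of such a hook can be sketched as follows. This is a toy model with hypothetical names: a vector mode is reduced to an element width and an element count, and the choice made when the count is unspecified (keep the total vector size) is only one possible policy, assumed here for illustration; a target could choose differently.

```c
#include <assert.h>
#include <stddef.h>

/* Toy description of a vector mode.  */
struct model_vec_mode
{
  unsigned elem_bits;
  unsigned nunits;
};

/* Given a starting vector mode, a new element width and an optional
   element count (0 means "not specified"), compute a related mode.
   Returns 1 and fills *OUT on success, 0 if no such mode exists.  */
int
model_related_mode (struct model_vec_mode start, unsigned new_elem_bits,
                    unsigned nunits, struct model_vec_mode *out)
{
  assert (out != NULL);
  if (new_elem_bits == 0)
    return 0;
  if (nunits == 0)
    {
      /* Assumed policy: preserve the total size of the starting mode.  */
      unsigned total_bits = start.elem_bits * start.nunits;
      if (total_bits % new_elem_bits != 0)
        return 0;
      nunits = total_bits / new_elem_bits;
    }
  out->elem_bits = new_elem_bits;
  out->nunits = nunits;
  return 1;
}
```

Starting from sixteen 8-bit elements and asking for 32-bit elements gives four elements when the count is left unspecified, but sixteen elements (a vector four times as large) when the caller insists on keeping the count.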
2019-11-14 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* target.def (related_mode): New hook.
* doc/tm.texi.in (TARGET_VECTORIZE_RELATED_MODE): New hook.
* doc/tm.texi: Regenerate.
* targhooks.h (default_vectorize_related_mode): Declare.
* targhooks.c (default_vectorize_related_mode): New function.
* machmode.h (related_vector_mode): Declare.
* stor-layout.c (related_vector_mode): New function.
* expmed.c (extract_bit_field_1): Use it instead of mode_for_vector.
* optabs-query.c (qimode_for_vec_perm): Likewise.
* tree-vect-stmts.c (get_group_load_store_type): Likewise.
(vectorizable_store, vectorizable_load): Likewise.
From-SVN: r278229
|
|
This patch replaces get_call_reg_set_usage with insn_callee_abi,
which returns the ABI of the target of a call insn. The ABI's
full_reg_clobbers corresponds to regs_invalidated_by_call,
whereas many callers instead passed call_used_or_fixed_regs, i.e.:
(regs_invalidated_by_call | fixed_reg_set)
The patch slavishly preserves the "| fixed_reg_set" for these callers;
later patches will clean this up.
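The relationship between the two sets can be shown with a small bitmask model. The 64-bit typedef and the function names are hypothetical stand-ins for GCC's HARD_REG_SET and its helpers, assuming at most 64 hard registers.

```c
#include <assert.h>
#include <stdint.h>

/* One bit per hard register; a stand-in for HARD_REG_SET.  */
typedef uint64_t model_reg_set;

/* The set many callers used to pass: registers clobbered by a call
   plus the fixed registers.  */
model_reg_set
model_call_used_or_fixed (model_reg_set regs_invalidated_by_call,
                          model_reg_set fixed_reg_set)
{
  return regs_invalidated_by_call | fixed_reg_set;
}

/* Return 1 if register REGNO is a member of SET, 0 otherwise.  */
int
model_reg_in_set (model_reg_set set, unsigned regno)
{
  assert (regno < 64);
  return (int) ((set >> regno) & 1);
}
```

A caller that previously relied on the combined set sees the same membership results if it ORs the fixed registers into the ABI's full clobber set by hand, which is what the patch preserves.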
2019-09-30 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* target.def (insn_callee_abi): New hook.
(remove_extra_call_preserved_regs): Delete.
* doc/tm.texi.in (TARGET_INSN_CALLEE_ABI): New macro.
(TARGET_REMOVE_EXTRA_CALL_PRESERVED_REGS): Delete.
* doc/tm.texi: Regenerate.
* targhooks.h (default_remove_extra_call_preserved_regs): Delete.
* targhooks.c (default_remove_extra_call_preserved_regs): Delete.
* config/aarch64/aarch64.c (aarch64_simd_call_p): Constify the
insn argument.
(aarch64_remove_extra_call_preserved_regs): Delete.
(aarch64_insn_callee_abi): New function.
(TARGET_REMOVE_EXTRA_CALL_PRESERVED_REGS): Delete.
(TARGET_INSN_CALLEE_ABI): New macro.
* rtl.h (get_call_fndecl): Declare.
(cgraph_rtl_info): Fix formatting. Tweak comment for
function_used_regs. Remove function_used_regs_valid.
* rtlanal.c (get_call_fndecl): Moved from final.c.
* function-abi.h (insn_callee_abi): Declare.
(target_function_abi_info): Mention insn_callee_abi.
* function-abi.cc (fndecl_abi): Handle flag_ipa_ra in a similar
way to how get_call_reg_set_usage did.
(insn_callee_abi): New function.
* regs.h (get_call_reg_set_usage): Delete.
* final.c: Include function-abi.h.
(collect_fn_hard_reg_usage): Add fixed and stack registers to
function_used_regs before the main loop rather than afterwards.
Use insn_callee_abi instead of get_call_reg_set_usage. Exit early
if function_used_regs ends up not being useful.
(get_call_fndecl): Move to rtlanal.c.
(get_call_cgraph_rtl_info, get_call_reg_set_usage): Delete.
* caller-save.c: Include function-abi.h.
(setup_save_areas, save_call_clobbered_regs): Use insn_callee_abi
instead of get_call_reg_set_usage.
* cfgcleanup.c: Include function-abi.h.
(old_insns_match_p): Use insn_callee_abi instead of
get_call_reg_set_usage.
* cgraph.h (cgraph_node::rtl_info): Take a const_tree instead of
a tree.
* cgraph.c (cgraph_node::rtl_info): Likewise. Initialize
function_used_regs.
* df-scan.c: Include function-abi.h.
(df_get_call_refs): Use insn_callee_abi instead of
get_call_reg_set_usage.
* ira-lives.c: Include function-abi.h.
(process_bb_node_lives): Use insn_callee_abi instead of
get_call_reg_set_usage.
* lra-lives.c: Include function-abi.h.
(process_bb_lives): Use insn_callee_abi instead of
get_call_reg_set_usage.
* postreload.c: Include function-abi.h.
(reload_combine): Use insn_callee_abi instead of
get_call_reg_set_usage.
* regcprop.c: Include function-abi.h.
(copyprop_hardreg_forward_1): Use insn_callee_abi instead of
get_call_reg_set_usage.
* resource.c: Include function-abi.h.
(mark_set_resources, mark_target_live_regs): Use insn_callee_abi
instead of get_call_reg_set_usage.
* var-tracking.c: Include function-abi.h.
(dataflow_set_clear_at_call): Use insn_callee_abi instead of
get_call_reg_set_usage.
From-SVN: r276309
|
|
bt-load.c has AFAIK been dead code since the removal of the SH5 port
in 2016. I have a patch series that would need to update the liveness
tracking in a nontrivial way, so it seemed better to remove the pass
rather than install an untested and probably bogus change.
2019-09-09 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* Makefile.in (OBJS): Remove bt-load.o.
* doc/invoke.texi (fbranch-target-load-optimize): Delete.
(fbranch-target-load-optimize2, fbtr-bb-exclusive): Likewise.
* common.opt (fbranch-target-load-optimize): Mark as Ignore and
document that the option no longer does anything.
(fbranch-target-load-optimize2, fbtr-bb-exclusive): Likewise.
* target.def (branch_target_register_class): Delete.
(branch_target_register_callee_saved): Likewise.
* doc/tm.texi.in (TARGET_BRANCH_TARGET_REGISTER_CLASS): Likewise.
(TARGET_BRANCH_TARGET_REGISTER_CALLEE_SAVED): Likewise.
* doc/tm.texi: Regenerate.
* tree-pass.h (make_pass_branch_target_load_optimize1): Delete.
(make_pass_branch_target_load_optimize2): Likewise.
* passes.def (pass_branch_target_load_optimize1): Likewise.
(pass_branch_target_load_optimize2): Likewise.
* targhooks.h (default_branch_target_register_class): Likewise.
* targhooks.c (default_branch_target_register_class): Likewise.
* opt-suggestions.c (test_completion_valid_options): Remove
-fbtr-bb-exclusive from the list of test options.
* bt-load.c: Remove.
From-SVN: r275521
|
|
The hook is passed the unpromoted type mode instead of the promoted mode.
The aarch64 definition is redundant, but worth keeping for emphasis.
2019-08-20 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* target.def (callee_copies): Take a function_arg_info instead
of a mode, type and named flag.
* doc/tm.texi: Regenerate.
* targhooks.h (hook_callee_copies_named): Take a function_arg_info
instead of a mode, type and named flag.
(hook_bool_CUMULATIVE_ARGS_mode_tree_bool_false): Delete.
(hook_bool_CUMULATIVE_ARGS_mode_tree_bool_true): Likewise.
(hook_bool_CUMULATIVE_ARGS_arg_info_true): New function.
* targhooks.c (hook_callee_copies_named): Take a function_arg_info
instead of a mode, type and named flag.
(hook_bool_CUMULATIVE_ARGS_mode_tree_bool_false): Delete.
(hook_bool_CUMULATIVE_ARGS_mode_tree_bool_true): Likewise.
(hook_bool_CUMULATIVE_ARGS_arg_info_true): New function.
* calls.h (reference_callee_copied): Take a function_arg_info
instead of a mode, type and named flag.
* calls.c (reference_callee_copied): Likewise.
(initialize_argument_information): Update call accordingly.
(emit_library_call_value_1): Likewise.
* function.c (gimplify_parameters): Likewise.
* config/aarch64/aarch64.c (TARGET_CALLEE_COPIES): Define to
hook_bool_CUMULATIVE_ARGS_arg_info_false instead of
hook_bool_CUMULATIVE_ARGS_mode_tree_bool_false.
* config/c6x/c6x.c (c6x_callee_copies): Delete.
(TARGET_CALLEE_COPIES): Define to
hook_bool_CUMULATIVE_ARGS_arg_info_true instead.
* config/epiphany/epiphany.c (TARGET_CALLEE_COPIES): Define to
hook_bool_CUMULATIVE_ARGS_arg_info_true instead of
hook_bool_CUMULATIVE_ARGS_mode_tree_bool_true.
* config/mips/mips.c (mips_callee_copies): Take a function_arg_info
instead of a mode, type and named flag.
* config/mmix/mmix.c (TARGET_CALLEE_COPIES): Define to
hook_bool_CUMULATIVE_ARGS_arg_info_true instead of
hook_bool_CUMULATIVE_ARGS_mode_tree_bool_true.
* config/mn10300/mn10300.c (TARGET_CALLEE_COPIES): Likewise.
* config/msp430/msp430.c (msp430_callee_copies): Delete.
(TARGET_CALLEE_COPIES): Define to
hook_bool_CUMULATIVE_ARGS_arg_info_true instead.
* config/pa/pa.c (pa_callee_copies): Take a function_arg_info
instead of a mode, type and named flag.
* config/sh/sh.c (sh_callee_copies): Likewise.
* config/v850/v850.c (TARGET_CALLEE_COPIES): Define to
hook_bool_CUMULATIVE_ARGS_arg_info_true instead of
hook_bool_CUMULATIVE_ARGS_mode_tree_bool_true.
From-SVN: r274702
|