author     Jakub Jelinek <jakub@redhat.com>    2022-10-14 09:37:01 +0200
committer  Jakub Jelinek <jakub@redhat.com>    2022-10-14 09:37:01 +0200
commit     c2565a31c1622ab0926aeef4a6579413e121b9f9
tree       0182fba3c78ebcdc1d59f6c1ca9605ee62da6fd2 /gcc/expr.cc
parent     16ec267063c8ce60769888d4097bcd158410adc8
middle-end, c++, i386, libgcc: std::bfloat16_t and __bf16 arithmetic support
Here is a complete patch to add std::bfloat16_t support on
x86 (AArch64 and ARM are left for later). Almost no BFmode optabs
are added by the patch, so for binops/unops it extends the operands to
SFmode first and then truncates the result back to BFmode.
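As a rough C model of that binop strategy (a sketch, not GCC's actual RTL expansion; the bf16_t, bf16_to_float, float_to_bf16 and bf16_add names are invented here, assuming the usual bfloat16 layout as the high 16 bits of an IEEE single):

```c
#include <stdint.h>
#include <string.h>

typedef uint16_t bf16_t;                /* bfloat16: high 16 bits of a float */

static float
bf16_to_float (bf16_t b)
{
  uint32_t i = (uint32_t) b << 16;      /* BF -> SF is exact: just shift up */
  float f;
  memcpy (&f, &i, sizeof f);
  return f;
}

static bf16_t
float_to_bf16 (float f)
{
  uint32_t i;
  memcpy (&i, &f, sizeof i);
  /* Round to nearest, ties to even (the patch's SF -> BF formula).  */
  return (bf16_t) ((i + 0x7fff + ((i >> 16) & 1)) >> 16);
}

/* A BFmode binop without its own optab: extend both operands to
   SFmode, do the operation there, truncate the result back.  */
static bf16_t
bf16_add (bf16_t a, bf16_t b)
{
  return float_to_bf16 (bf16_to_float (a) + bf16_to_float (b));
}
```

This only models the lowering; the real expansion happens on RTL and also has to honor NaN and rounding semantics, as the rest of the message discusses.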
For {HF,SF,DF,XF,TF}mode -> BFmode conversions, libgcc has implementations
of all of them, so that we avoid double rounding. For
BFmode -> {DF,XF,TF}mode conversions, to avoid growing libgcc too much,
it emits a BFmode -> SFmode conversion first and then converts to the even
wider mode; neither step should be imprecise.
For BFmode -> HFmode, it first emits a precise BFmode -> SFmode conversion
and then SFmode -> HFmode, because neither format is a subset or superset
of the other, while SFmode is a superset of both.
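That subset/superset argument can be sanity-checked on the formats' bit layouts; a minimal sketch under the simplification that a format covers another exactly when it has at least as many exponent and mantissa bits (the struct and the covers predicate are illustrative, NaN payloads and subnormal details ignored):

```c
/* Exponent and explicit-mantissa bit counts of the formats involved.  */
struct fmt { int exp_bits, man_bits; };

static const struct fmt bf16 = { 8, 7 };        /* arm_bfloat_half_format */
static const struct fmt fp16 = { 5, 10 };       /* ieee_half_format */
static const struct fmt fp32 = { 8, 23 };       /* ieee_single_format */

/* f can represent every finite value of g iff it is at least as wide
   in both the exponent and the mantissa.  */
static int
covers (struct fmt f, struct fmt g)
{
  return f.exp_bits >= g.exp_bits && f.man_bits >= g.man_bits;
}
```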
expr.cc then contains a -ffast-math optimization of the BF -> SF and
SF -> BF conversions when we don't optimize for space (and, for the latter,
when -frounding-math isn't enabled either).
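On raw bits the two inline expansions look like this (a sketch with uint types standing in for the RTL the patch emits; the function names are invented):

```c
#include <stdint.h>

/* BF -> SF: bfloat16 is the top half of an IEEE single, so widening
   is just a 16-bit shift up (valid when sNaNs need not be raised on).  */
static uint32_t
bf_to_sf_bits (uint16_t b)
{
  return (uint32_t) b << 16;
}

/* SF -> BF under round-to-nearest: add 0x7fff plus the bit that will
   become the new LSB, so exact halfway cases round to even.  */
static uint16_t
sf_to_bf_bits (uint32_t i)
{
  return (uint16_t) ((i + 0x7fff + ((i >> 16) & 1)) >> 16);
}
```

Note that the formula mishandles NaNs (a NaN with a small payload can round into an infinity), which is why the patch guards the expansion with !HONOR_NANS.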
For x86, perhaps a truncsfbf2 optab could be defined for TARGET_AVX512BF16,
but IMNSHO it should FAIL if !flag_finite_math || flag_rounding_math
|| !flag_unsafe_math_optimizations, because I think the insn doesn't
raise on sNaNs, hardcodes round to nearest and flushes denormals to zero.
By default (unless x86 -fexcess-precision=16 is used) we use float excess
precision for BFmode, so values are truncated only on explicit casts and
assignments.
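The effect of excess precision is easy to see in a small model (illustrative helpers, not GCC code): with only 8 bits of precision, rounding to BFmode after every operation can lose an addend that float excess precision keeps.

```c
#include <stdint.h>
#include <string.h>

static float
bf_val (uint16_t b)                     /* bfloat16 bits -> float value */
{
  uint32_t i = (uint32_t) b << 16;
  float f;
  memcpy (&f, &i, sizeof f);
  return f;
}

static uint16_t
bf_round (float f)                      /* float -> nearest bfloat16 bits */
{
  uint32_t i;
  memcpy (&i, &f, sizeof i);
  return (uint16_t) ((i + 0x7fff + ((i >> 16) & 1)) >> 16);
}

/* a + b + b, truncating to BFmode after each addition...  */
static uint16_t
sum_truncating (uint16_t a, uint16_t b)
{
  uint16_t t = bf_round (bf_val (a) + bf_val (b));
  return bf_round (bf_val (t) + bf_val (b));
}

/* ...versus keeping float excess precision and truncating once.  */
static uint16_t
sum_excess (uint16_t a, uint16_t b)
{
  return bf_round (bf_val (a) + bf_val (b) + bf_val (b));
}
```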
The patch introduces a single __bf16 builtin, __builtin_nansf16b,
because (__bf16) __builtin_nansf ("") would turn the sNaN into a qNaN,
and uses the f16b suffix instead of bf16 because there would be an ambiguity
between log and logb: __builtin_logbf16 could be either log with a bf16
suffix or logb with an f16 suffix. In other cases libstdc++ should mostly
use __builtin_*f for the std::bfloat16_t overloads (we have a problem with
std::nextafter though, but that one we also have for std::float16_t).
2022-10-14 Jakub Jelinek <jakub@redhat.com>
gcc/
* tree-core.h (enum tree_index): Add TI_BFLOAT16_TYPE.
* tree.h (bfloat16_type_node): Define.
* tree.cc (excess_precision_type): Promote bfloat16_type_mode
like float16_type_mode.
(build_common_tree_nodes): Initialize bfloat16_type_node if
BFmode is supported.
* expmed.h (maybe_expand_shift): Declare.
* expmed.cc (maybe_expand_shift): No longer static.
* expr.cc (convert_mode_scalar): Don't ICE on BF -> HF or HF -> BF
conversions. If there is no optab, handle BF -> {DF,XF,TF,HF}
conversions as separate BF -> SF -> {DF,XF,TF,HF} conversions, add
-ffast-math generic implementation for BF -> SF and SF -> BF
conversions.
* builtin-types.def (BT_BFLOAT16, BT_FN_BFLOAT16_CONST_STRING): New.
* builtins.def (BUILT_IN_NANSF16B): New builtin.
* fold-const-call.cc (fold_const_call): Handle CFN_BUILT_IN_NANSF16B.
* config/i386/i386.cc (classify_argument): Handle E_BCmode.
(ix86_libgcc_floating_mode_supported_p): Also return true for BFmode
for -msse2.
(ix86_mangle_type): Mangle BFmode as DF16b.
(ix86_invalid_conversion, ix86_invalid_unary_op,
ix86_invalid_binary_op): Remove.
(TARGET_INVALID_CONVERSION, TARGET_INVALID_UNARY_OP,
TARGET_INVALID_BINARY_OP): Don't redefine.
* config/i386/i386-builtins.cc (ix86_bf16_type_node): Remove.
(ix86_register_bf16_builtin_type): Use bfloat16_type_node rather than
ix86_bf16_type_node, only create it if still NULL.
* config/i386/i386-builtin-types.def (BFLOAT16): Likewise.
* config/i386/i386.md (cbranchbf4, cstorebf4): New expanders.
gcc/c-family/
* c-cppbuiltin.cc (c_cpp_builtins): If bfloat16_type_node,
predefine __BFLT16_*__ macros and for C++23 also
__STDCPP_BFLOAT16_T__. Predefine bfloat16_type_node related
macros for -fbuilding-libgcc.
* c-lex.cc (interpret_float): Handle CPP_N_BFLOAT16.
gcc/c/
* c-typeck.cc (convert_arguments): Don't promote __bf16 to
double.
gcc/cp/
* cp-tree.h (extended_float_type_p): Return true for
bfloat16_type_node.
* typeck.cc (cp_compare_floating_point_conversion_ranks): Set
extended{1,2} if mv{1,2} is bfloat16_type_node. Adjust comment.
gcc/testsuite/
* lib/target-supports.exp (check_effective_target_bfloat16,
check_effective_target_bfloat16_runtime, add_options_for_bfloat16):
New.
* gcc.dg/torture/bfloat16-basic.c: New test.
* gcc.dg/torture/bfloat16-builtin.c: New test.
* gcc.dg/torture/bfloat16-builtin-issignaling-1.c: New test.
* gcc.dg/torture/bfloat16-complex.c: New test.
* gcc.dg/torture/builtin-issignaling-1.c: Allow to be includable
from bfloat16-builtin-issignaling-1.c.
* gcc.dg/torture/floatn-basic.h: Allow to be includable from
bfloat16-basic.c.
* gcc.target/i386/vect-bfloat16-typecheck_2.c: Adjust expected
diagnostics.
* gcc.target/i386/sse2-bfloat16-scalar-typecheck.c: Likewise.
* gcc.target/i386/vect-bfloat16-typecheck_1.c: Likewise.
* g++.target/i386/bfloat_cpp_typecheck.C: Likewise.
libcpp/
* include/cpplib.h (CPP_N_BFLOAT16): Define.
* expr.cc (interpret_float_suffix): Handle bf16 and BF16 suffixes for
C++.
libgcc/
* config/i386/t-softfp (softfp_extensions): Add bfsf.
(softfp_truncations): Add tfbf xfbf dfbf sfbf hfbf.
(CFLAGS-extendbfsf2.c, CFLAGS-truncsfbf2.c, CFLAGS-truncdfbf2.c,
CFLAGS-truncxfbf2.c, CFLAGS-trunctfbf2.c, CFLAGS-trunchfbf2.c): Add
-msse2.
* config/i386/libgcc-glibc.ver (GCC_13.0.0): Export
__extendbfsf2 and __trunc{s,d,x,t,h}fbf2.
* config/i386/sfp-machine.h (_FP_NANSIGN_B): Define.
* config/i386/64/sfp-machine.h (_FP_NANFRAC_B): Define.
* config/i386/32/sfp-machine.h (_FP_NANFRAC_B): Define.
* soft-fp/brain.h: New file.
* soft-fp/truncsfbf2.c: New file.
* soft-fp/truncdfbf2.c: New file.
* soft-fp/truncxfbf2.c: New file.
* soft-fp/trunctfbf2.c: New file.
* soft-fp/trunchfbf2.c: New file.
* soft-fp/truncbfhf2.c: New file.
* soft-fp/extendbfsf2.c: New file.
libiberty/
* cp-demangle.h (D_BUILTIN_TYPE_COUNT): Increment.
* cp-demangle.c (cplus_demangle_builtin_types): Add std::bfloat16_t
entry.
(cplus_demangle_type): Demangle DF16b.
* testsuite/demangle-expected (_Z3xxxDF16b): New test.
Diffstat (limited to 'gcc/expr.cc')
-rw-r--r--  gcc/expr.cc | 150
1 file changed, 149 insertions(+), 1 deletion(-)
diff --git a/gcc/expr.cc b/gcc/expr.cc
index b897b6d..4c892d6 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -344,7 +344,11 @@ convert_mode_scalar (rtx to, rtx from, int unsignedp)
   gcc_assert ((GET_MODE_PRECISION (from_mode)
               != GET_MODE_PRECISION (to_mode))
              || (DECIMAL_FLOAT_MODE_P (from_mode)
-                 != DECIMAL_FLOAT_MODE_P (to_mode)));
+                 != DECIMAL_FLOAT_MODE_P (to_mode))
+             || (REAL_MODE_FORMAT (from_mode) == &arm_bfloat_half_format
+                 && REAL_MODE_FORMAT (to_mode) == &ieee_half_format)
+             || (REAL_MODE_FORMAT (to_mode) == &arm_bfloat_half_format
+                 && REAL_MODE_FORMAT (from_mode) == &ieee_half_format));
 
   if (GET_MODE_PRECISION (from_mode) == GET_MODE_PRECISION (to_mode))
     /* Conversion between decimal float and binary float, same size.  */
@@ -364,6 +368,150 @@ convert_mode_scalar (rtx to, rtx from, int unsignedp)
       return;
     }
 
+#ifdef HAVE_SFmode
+  if (REAL_MODE_FORMAT (from_mode) == &arm_bfloat_half_format
+      && REAL_MODE_FORMAT (SFmode) == &ieee_single_format)
+    {
+      if (GET_MODE_PRECISION (to_mode) > GET_MODE_PRECISION (SFmode))
+	{
+	  /* To cut down on libgcc size, implement
+	     BFmode -> {DF,XF,TF}mode conversions by
+	     BFmode -> SFmode -> {DF,XF,TF}mode conversions.  */
+	  rtx temp = gen_reg_rtx (SFmode);
+	  convert_mode_scalar (temp, from, unsignedp);
+	  convert_mode_scalar (to, temp, unsignedp);
+	  return;
+	}
+      if (REAL_MODE_FORMAT (to_mode) == &ieee_half_format)
+	{
+	  /* Similarly, implement BFmode -> HFmode as
+	     BFmode -> SFmode -> HFmode conversion where SFmode
+	     has a superset of BFmode values.  We don't need
+	     to handle sNaNs by raising an exception and turning
+	     them into qNaNs though, as that can be done in the
+	     SFmode -> HFmode conversion too.  */
+	  rtx temp = gen_reg_rtx (SFmode);
+	  int save_flag_finite_math_only = flag_finite_math_only;
+	  flag_finite_math_only = true;
+	  convert_mode_scalar (temp, from, unsignedp);
+	  flag_finite_math_only = save_flag_finite_math_only;
+	  convert_mode_scalar (to, temp, unsignedp);
+	  return;
+	}
+      if (to_mode == SFmode
+	  && !HONOR_NANS (from_mode)
+	  && !HONOR_NANS (to_mode)
+	  && optimize_insn_for_speed_p ())
+	{
+	  /* If we don't expect sNaNs, for BFmode -> SFmode we can just
+	     shift the bits up.  */
+	  machine_mode fromi_mode, toi_mode;
+	  if (int_mode_for_size (GET_MODE_BITSIZE (from_mode),
+				 0).exists (&fromi_mode)
+	      && int_mode_for_size (GET_MODE_BITSIZE (to_mode),
+				    0).exists (&toi_mode))
+	    {
+	      start_sequence ();
+	      rtx fromi = lowpart_subreg (fromi_mode, from, from_mode);
+	      rtx tof = NULL_RTX;
+	      if (fromi)
+		{
+		  rtx toi = gen_reg_rtx (toi_mode);
+		  convert_mode_scalar (toi, fromi, 1);
+		  toi
+		    = maybe_expand_shift (LSHIFT_EXPR, toi_mode, toi,
+					  GET_MODE_PRECISION (to_mode)
+					  - GET_MODE_PRECISION (from_mode),
+					  NULL_RTX, 1);
+		  if (toi)
+		    {
+		      tof = lowpart_subreg (to_mode, toi, toi_mode);
+		      if (tof)
+			emit_move_insn (to, tof);
+		    }
+		}
+	      insns = get_insns ();
+	      end_sequence ();
+	      if (tof)
+		{
+		  emit_insn (insns);
+		  return;
+		}
+	    }
+	}
+    }
+
+  if (REAL_MODE_FORMAT (from_mode) == &ieee_single_format
+      && REAL_MODE_FORMAT (to_mode) == &arm_bfloat_half_format
+      && !HONOR_NANS (from_mode)
+      && !HONOR_NANS (to_mode)
+      && !flag_rounding_math
+      && optimize_insn_for_speed_p ())
+    {
+      /* If we don't expect qNaNs nor sNaNs and can assume rounding
+	 to nearest, we can expand the conversion inline as
+	 (fromi + 0x7fff + ((fromi >> 16) & 1)) >> 16.  */
+      machine_mode fromi_mode, toi_mode;
+      if (int_mode_for_size (GET_MODE_BITSIZE (from_mode),
+			     0).exists (&fromi_mode)
+	  && int_mode_for_size (GET_MODE_BITSIZE (to_mode),
+				0).exists (&toi_mode))
+	{
+	  start_sequence ();
+	  rtx fromi = lowpart_subreg (fromi_mode, from, from_mode);
+	  rtx tof = NULL_RTX;
+	  do
+	    {
+	      if (!fromi)
+		break;
+	      int shift = (GET_MODE_PRECISION (from_mode)
+			   - GET_MODE_PRECISION (to_mode));
+	      rtx temp1
+		= maybe_expand_shift (RSHIFT_EXPR, fromi_mode, fromi,
+				      shift, NULL_RTX, 1);
+	      if (!temp1)
+		break;
+	      rtx temp2
+		= expand_binop (fromi_mode, and_optab, temp1, const1_rtx,
+				NULL_RTX, 1, OPTAB_DIRECT);
+	      if (!temp2)
+		break;
+	      rtx temp3
+		= expand_binop (fromi_mode, add_optab, fromi,
+				gen_int_mode ((HOST_WIDE_INT_1U
+					       << (shift - 1)) - 1,
+					      fromi_mode), NULL_RTX,
+				1, OPTAB_DIRECT);
+	      if (!temp3)
+		break;
+	      rtx temp4
+		= expand_binop (fromi_mode, add_optab, temp3, temp2,
+				NULL_RTX, 1, OPTAB_DIRECT);
+	      if (!temp4)
+		break;
+	      rtx temp5 = maybe_expand_shift (RSHIFT_EXPR, fromi_mode,
+					      temp4, shift, NULL_RTX, 1);
+	      if (!temp5)
+		break;
+	      rtx temp6 = lowpart_subreg (toi_mode, temp5, fromi_mode);
+	      if (!temp6)
+		break;
+	      tof = lowpart_subreg (to_mode, force_reg (toi_mode, temp6),
+				    toi_mode);
+	      if (tof)
+		emit_move_insn (to, tof);
+	    }
+	  while (0);
+	  insns = get_insns ();
+	  end_sequence ();
+	  if (tof)
+	    {
+	      emit_insn (insns);
+	      return;
+	    }
+	}
+    }
+#endif
+
   /* Otherwise use a libcall.  */
   libcall = convert_optab_libfunc (tab, to_mode, from_mode);