author     Jakub Jelinek <jakub@redhat.com>    2022-10-14 09:37:01 +0200
committer  Jakub Jelinek <jakub@redhat.com>    2022-10-14 09:37:01 +0200
commit     c2565a31c1622ab0926aeef4a6579413e121b9f9
tree       0182fba3c78ebcdc1d59f6c1ca9605ee62da6fd2 /gcc/expr.cc
parent     16ec267063c8ce60769888d4097bcd158410adc8
middle-end, c++, i386, libgcc: std::bfloat16_t and __bf16 arithmetic support
Here is a complete patch to add std::bfloat16_t support on
x86 (AArch64 and ARM are left for later). Almost no BFmode optabs
are added by the patch, so for binops/unops it extends the operands to
SFmode first and then truncates the result back to BFmode.
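As a rough C model of that binop strategy (a sketch, not GCC's actual RTL expansion; the bf16_t, bf16_to_float, float_to_bf16 and bf16_add names are invented here, assuming the usual bfloat16 layout as the high 16 bits of an IEEE single):

```c
#include <stdint.h>
#include <string.h>

typedef uint16_t bf16_t;                /* bfloat16: high 16 bits of a float */

static float
bf16_to_float (bf16_t b)
{
  uint32_t i = (uint32_t) b << 16;      /* BF -> SF is exact: just shift up */
  float f;
  memcpy (&f, &i, sizeof f);
  return f;
}

static bf16_t
float_to_bf16 (float f)
{
  uint32_t i;
  memcpy (&i, &f, sizeof i);
  /* Round to nearest, ties to even (the patch's SF -> BF formula).  */
  return (bf16_t) ((i + 0x7fff + ((i >> 16) & 1)) >> 16);
}

/* A BFmode binop without its own optab: extend both operands to
   SFmode, do the operation there, truncate the result back.  */
static bf16_t
bf16_add (bf16_t a, bf16_t b)
{
  return float_to_bf16 (bf16_to_float (a) + bf16_to_float (b));
}
```

This only models the lowering; the real expansion happens on RTL and also has to honor NaN and rounding semantics, as the rest of the message discusses.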
For {HF,SF,DF,XF,TF}mode -> BFmode conversions, libgcc has implementations
of all of them, so that we avoid double rounding. For
BFmode -> {DF,XF,TF}mode conversions, to avoid growing libgcc too much,
it emits a BFmode -> SFmode conversion first and then converts to the even
wider mode; neither step should be imprecise.
For BFmode -> HFmode, it first emits a precise BFmode -> SFmode conversion
and then SFmode -> HFmode, because neither format is a subset or superset
of the other, while SFmode is a superset of both.
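That subset/superset argument can be sanity-checked on the formats' bit layouts; a minimal sketch under the simplification that a format covers another exactly when it has at least as many exponent and mantissa bits (the struct and the covers predicate are illustrative, NaN payloads and subnormal details ignored):

```c
/* Exponent and explicit-mantissa bit counts of the formats involved.  */
struct fmt { int exp_bits, man_bits; };

static const struct fmt bf16 = { 8, 7 };        /* arm_bfloat_half_format */
static const struct fmt fp16 = { 5, 10 };       /* ieee_half_format */
static const struct fmt fp32 = { 8, 23 };       /* ieee_single_format */

/* f can represent every finite value of g iff it is at least as wide
   in both the exponent and the mantissa.  */
static int
covers (struct fmt f, struct fmt g)
{
  return f.exp_bits >= g.exp_bits && f.man_bits >= g.man_bits;
}
```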
expr.cc then contains a -ffast-math optimization of the BF -> SF and
SF -> BF conversions when we don't optimize for space (and, for the latter,
when -frounding-math isn't enabled either).
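On raw bits the two inline expansions look like this (a sketch with uint types standing in for the RTL the patch emits; the function names are invented):

```c
#include <stdint.h>

/* BF -> SF: bfloat16 is the top half of an IEEE single, so widening
   is just a 16-bit shift up (valid when sNaNs need not be raised on).  */
static uint32_t
bf_to_sf_bits (uint16_t b)
{
  return (uint32_t) b << 16;
}

/* SF -> BF under round-to-nearest: add 0x7fff plus the bit that will
   become the new LSB, so exact halfway cases round to even.  */
static uint16_t
sf_to_bf_bits (uint32_t i)
{
  return (uint16_t) ((i + 0x7fff + ((i >> 16) & 1)) >> 16);
}
```

Note that the formula mishandles NaNs (a NaN with a small payload can round into an infinity), which is why the patch guards the expansion with !HONOR_NANS.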
For x86, perhaps a truncsfbf2 optab could be defined for TARGET_AVX512BF16,
but IMNSHO it should FAIL if !flag_finite_math || flag_rounding_math
|| !flag_unsafe_math_optimizations, because I think the insn doesn't
raise on sNaNs, hardcodes round to nearest and flushes denormals to zero.
By default (unless x86 -fexcess-precision=16 is used) we use float excess
precision for BFmode, so values are truncated only on explicit casts and
assignments.
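The effect of excess precision is easy to see in a small model (illustrative helpers, not GCC code): with only 8 bits of precision, rounding to BFmode after every operation can lose an addend that float excess precision keeps.

```c
#include <stdint.h>
#include <string.h>

static float
bf_val (uint16_t b)                     /* bfloat16 bits -> float value */
{
  uint32_t i = (uint32_t) b << 16;
  float f;
  memcpy (&f, &i, sizeof f);
  return f;
}

static uint16_t
bf_round (float f)                      /* float -> nearest bfloat16 bits */
{
  uint32_t i;
  memcpy (&i, &f, sizeof i);
  return (uint16_t) ((i + 0x7fff + ((i >> 16) & 1)) >> 16);
}

/* a + b + b, truncating to BFmode after each addition...  */
static uint16_t
sum_truncating (uint16_t a, uint16_t b)
{
  uint16_t t = bf_round (bf_val (a) + bf_val (b));
  return bf_round (bf_val (t) + bf_val (b));
}

/* ...versus keeping float excess precision and truncating once.  */
static uint16_t
sum_excess (uint16_t a, uint16_t b)
{
  return bf_round (bf_val (a) + bf_val (b) + bf_val (b));
}
```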
The patch introduces a single __bf16 builtin, __builtin_nansf16b,
because (__bf16) __builtin_nansf ("") would turn the sNaN into a qNaN,
and uses the f16b suffix instead of bf16 because there would be an ambiguity
between log and logb: __builtin_logbf16 could be either log with a bf16
suffix or logb with an f16 suffix. In other cases libstdc++ should mostly
use __builtin_*f for the std::bfloat16_t overloads (we have a problem with
std::nextafter though, but that one we also have for std::float16_t).
2022-10-14 Jakub Jelinek <jakub@redhat.com>
gcc/
* tree-core.h (enum tree_index): Add TI_BFLOAT16_TYPE.
* tree.h (bfloat16_type_node): Define.
* tree.cc (excess_precision_type): Promote bfloat16_type_mode
like float16_type_mode.
(build_common_tree_nodes): Initialize bfloat16_type_node if
BFmode is supported.
* expmed.h (maybe_expand_shift): Declare.
* expmed.cc (maybe_expand_shift): No longer static.
* expr.cc (convert_mode_scalar): Don't ICE on BF -> HF or HF -> BF
conversions. If there is no optab, handle BF -> {DF,XF,TF,HF}
conversions as separate BF -> SF -> {DF,XF,TF,HF} conversions, add
-ffast-math generic implementation for BF -> SF and SF -> BF
conversions.
* builtin-types.def (BT_BFLOAT16, BT_FN_BFLOAT16_CONST_STRING): New.
* builtins.def (BUILT_IN_NANSF16B): New builtin.
* fold-const-call.cc (fold_const_call): Handle CFN_BUILT_IN_NANSF16B.
* config/i386/i386.cc (classify_argument): Handle E_BCmode.
(ix86_libgcc_floating_mode_supported_p): Also return true for BFmode
for -msse2.
(ix86_mangle_type): Mangle BFmode as DF16b.
(ix86_invalid_conversion, ix86_invalid_unary_op,
ix86_invalid_binary_op): Remove.
(TARGET_INVALID_CONVERSION, TARGET_INVALID_UNARY_OP,
TARGET_INVALID_BINARY_OP): Don't redefine.
* config/i386/i386-builtins.cc (ix86_bf16_type_node): Remove.
(ix86_register_bf16_builtin_type): Use bfloat16_type_node rather than
ix86_bf16_type_node, only create it if still NULL.
* config/i386/i386-builtin-types.def (BFLOAT16): Likewise.
* config/i386/i386.md (cbranchbf4, cstorebf4): New expanders.
gcc/c-family/
* c-cppbuiltin.cc (c_cpp_builtins): If bfloat16_type_node,
predefine __BFLT16_*__ macros and for C++23 also
__STDCPP_BFLOAT16_T__. Predefine bfloat16_type_node related
macros for -fbuilding-libgcc.
* c-lex.cc (interpret_float): Handle CPP_N_BFLOAT16.
gcc/c/
* c-typeck.cc (convert_arguments): Don't promote __bf16 to
double.
gcc/cp/
* cp-tree.h (extended_float_type_p): Return true for
bfloat16_type_node.
* typeck.cc (cp_compare_floating_point_conversion_ranks): Set
extended{1,2} if mv{1,2} is bfloat16_type_node. Adjust comment.
gcc/testsuite/
* lib/target-supports.exp (check_effective_target_bfloat16,
check_effective_target_bfloat16_runtime, add_options_for_bfloat16):
New.
* gcc.dg/torture/bfloat16-basic.c: New test.
* gcc.dg/torture/bfloat16-builtin.c: New test.
* gcc.dg/torture/bfloat16-builtin-issignaling-1.c: New test.
* gcc.dg/torture/bfloat16-complex.c: New test.
* gcc.dg/torture/builtin-issignaling-1.c: Allow to be includable
from bfloat16-builtin-issignaling-1.c.
* gcc.dg/torture/floatn-basic.h: Allow to be includable from
bfloat16-basic.c.
* gcc.target/i386/vect-bfloat16-typecheck_2.c: Adjust expected
diagnostics.
* gcc.target/i386/sse2-bfloat16-scalar-typecheck.c: Likewise.
* gcc.target/i386/vect-bfloat16-typecheck_1.c: Likewise.
* g++.target/i386/bfloat_cpp_typecheck.C: Likewise.
libcpp/
* include/cpplib.h (CPP_N_BFLOAT16): Define.
* expr.cc (interpret_float_suffix): Handle bf16 and BF16 suffixes for
C++.
libgcc/
* config/i386/t-softfp (softfp_extensions): Add bfsf.
(softfp_truncations): Add tfbf xfbf dfbf sfbf hfbf.
(CFLAGS-extendbfsf2.c, CFLAGS-truncsfbf2.c, CFLAGS-truncdfbf2.c,
CFLAGS-truncxfbf2.c, CFLAGS-trunctfbf2.c, CFLAGS-trunchfbf2.c): Add
-msse2.
* config/i386/libgcc-glibc.ver (GCC_13.0.0): Export
__extendbfsf2 and __trunc{s,d,x,t,h}fbf2.
* config/i386/sfp-machine.h (_FP_NANSIGN_B): Define.
* config/i386/64/sfp-machine.h (_FP_NANFRAC_B): Define.
* config/i386/32/sfp-machine.h (_FP_NANFRAC_B): Define.
* soft-fp/brain.h: New file.
* soft-fp/truncsfbf2.c: New file.
* soft-fp/truncdfbf2.c: New file.
* soft-fp/truncxfbf2.c: New file.
* soft-fp/trunctfbf2.c: New file.
* soft-fp/trunchfbf2.c: New file.
* soft-fp/truncbfhf2.c: New file.
* soft-fp/extendbfsf2.c: New file.
libiberty/
* cp-demangle.h (D_BUILTIN_TYPE_COUNT): Increment.
* cp-demangle.c (cplus_demangle_builtin_types): Add std::bfloat16_t
entry.
(cplus_demangle_type): Demangle DF16b.
* testsuite/demangle-expected (_Z3xxxDF16b): New test.
Diffstat (limited to 'gcc/expr.cc')
-rw-r--r--  gcc/expr.cc | 150
1 file changed, 149 insertions(+), 1 deletion(-)
diff --git a/gcc/expr.cc b/gcc/expr.cc
index b897b6d..4c892d6 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -344,7 +344,11 @@ convert_mode_scalar (rtx to, rtx from, int unsignedp)
   gcc_assert ((GET_MODE_PRECISION (from_mode)
               != GET_MODE_PRECISION (to_mode))
              || (DECIMAL_FLOAT_MODE_P (from_mode)
-                 != DECIMAL_FLOAT_MODE_P (to_mode)));
+                 != DECIMAL_FLOAT_MODE_P (to_mode))
+             || (REAL_MODE_FORMAT (from_mode) == &arm_bfloat_half_format
+                 && REAL_MODE_FORMAT (to_mode) == &ieee_half_format)
+             || (REAL_MODE_FORMAT (to_mode) == &arm_bfloat_half_format
+                 && REAL_MODE_FORMAT (from_mode) == &ieee_half_format));
 
   if (GET_MODE_PRECISION (from_mode) == GET_MODE_PRECISION (to_mode))
     /* Conversion between decimal float and binary float, same size.  */
@@ -364,6 +368,150 @@ convert_mode_scalar (rtx to, rtx from, int unsignedp)
       return;
     }
 
+#ifdef HAVE_SFmode
+  if (REAL_MODE_FORMAT (from_mode) == &arm_bfloat_half_format
+      && REAL_MODE_FORMAT (SFmode) == &ieee_single_format)
+    {
+      if (GET_MODE_PRECISION (to_mode) > GET_MODE_PRECISION (SFmode))
+	{
+	  /* To cut down on libgcc size, implement
+	     BFmode -> {DF,XF,TF}mode conversions by
+	     BFmode -> SFmode -> {DF,XF,TF}mode conversions.  */
+	  rtx temp = gen_reg_rtx (SFmode);
+	  convert_mode_scalar (temp, from, unsignedp);
+	  convert_mode_scalar (to, temp, unsignedp);
+	  return;
+	}
+      if (REAL_MODE_FORMAT (to_mode) == &ieee_half_format)
+	{
+	  /* Similarly, implement BFmode -> HFmode as
+	     BFmode -> SFmode -> HFmode conversion where SFmode
+	     has a superset of BFmode values.  We don't need
+	     to handle sNaNs by raising an exception and turning
+	     them into qNaNs though, as that can be done in the
+	     SFmode -> HFmode conversion too.  */
+	  rtx temp = gen_reg_rtx (SFmode);
+	  int save_flag_finite_math_only = flag_finite_math_only;
+	  flag_finite_math_only = true;
+	  convert_mode_scalar (temp, from, unsignedp);
+	  flag_finite_math_only = save_flag_finite_math_only;
+	  convert_mode_scalar (to, temp, unsignedp);
+	  return;
+	}
+      if (to_mode == SFmode
+	  && !HONOR_NANS (from_mode)
+	  && !HONOR_NANS (to_mode)
+	  && optimize_insn_for_speed_p ())
+	{
+	  /* If we don't expect sNaNs, for BFmode -> SFmode we can just
+	     shift the bits up.  */
+	  machine_mode fromi_mode, toi_mode;
+	  if (int_mode_for_size (GET_MODE_BITSIZE (from_mode),
+				 0).exists (&fromi_mode)
+	      && int_mode_for_size (GET_MODE_BITSIZE (to_mode),
+				    0).exists (&toi_mode))
+	    {
+	      start_sequence ();
+	      rtx fromi = lowpart_subreg (fromi_mode, from, from_mode);
+	      rtx tof = NULL_RTX;
+	      if (fromi)
+		{
+		  rtx toi = gen_reg_rtx (toi_mode);
+		  convert_mode_scalar (toi, fromi, 1);
+		  toi
+		    = maybe_expand_shift (LSHIFT_EXPR, toi_mode, toi,
+					  GET_MODE_PRECISION (to_mode)
+					  - GET_MODE_PRECISION (from_mode),
+					  NULL_RTX, 1);
+		  if (toi)
+		    {
+		      tof = lowpart_subreg (to_mode, toi, toi_mode);
+		      if (tof)
+			emit_move_insn (to, tof);
+		    }
+		}
+	      insns = get_insns ();
+	      end_sequence ();
+	      if (tof)
+		{
+		  emit_insn (insns);
+		  return;
+		}
+	    }
+	}
+    }
+
+  if (REAL_MODE_FORMAT (from_mode) == &ieee_single_format
+      && REAL_MODE_FORMAT (to_mode) == &arm_bfloat_half_format
+      && !HONOR_NANS (from_mode)
+      && !HONOR_NANS (to_mode)
+      && !flag_rounding_math
+      && optimize_insn_for_speed_p ())
+    {
+      /* If we don't expect qNaNs nor sNaNs and can assume rounding
+	 to nearest, we can expand the conversion inline as
+	 (fromi + 0x7fff + ((fromi >> 16) & 1)) >> 16.  */
+      machine_mode fromi_mode, toi_mode;
+      if (int_mode_for_size (GET_MODE_BITSIZE (from_mode),
+			     0).exists (&fromi_mode)
+	  && int_mode_for_size (GET_MODE_BITSIZE (to_mode),
+				0).exists (&toi_mode))
+	{
+	  start_sequence ();
+	  rtx fromi = lowpart_subreg (fromi_mode, from, from_mode);
+	  rtx tof = NULL_RTX;
+	  do
+	    {
+	      if (!fromi)
+		break;
+	      int shift = (GET_MODE_PRECISION (from_mode)
+			   - GET_MODE_PRECISION (to_mode));
+	      rtx temp1
+		= maybe_expand_shift (RSHIFT_EXPR, fromi_mode, fromi,
+				      shift, NULL_RTX, 1);
+	      if (!temp1)
+		break;
+	      rtx temp2
+		= expand_binop (fromi_mode, and_optab, temp1, const1_rtx,
+				NULL_RTX, 1, OPTAB_DIRECT);
+	      if (!temp2)
+		break;
+	      rtx temp3
+		= expand_binop (fromi_mode, add_optab, fromi,
+				gen_int_mode ((HOST_WIDE_INT_1U
+					       << (shift - 1)) - 1,
+					      fromi_mode), NULL_RTX,
+				1, OPTAB_DIRECT);
+	      if (!temp3)
+		break;
+	      rtx temp4
+		= expand_binop (fromi_mode, add_optab, temp3, temp2,
+				NULL_RTX, 1, OPTAB_DIRECT);
+	      if (!temp4)
+		break;
+	      rtx temp5 = maybe_expand_shift (RSHIFT_EXPR, fromi_mode,
+					      temp4, shift, NULL_RTX, 1);
+	      if (!temp5)
+		break;
+	      rtx temp6 = lowpart_subreg (toi_mode, temp5, fromi_mode);
+	      if (!temp6)
+		break;
+	      tof = lowpart_subreg (to_mode, force_reg (toi_mode, temp6),
+				    toi_mode);
+	      if (tof)
+		emit_move_insn (to, tof);
+	    }
+	  while (0);
+	  insns = get_insns ();
+	  end_sequence ();
+	  if (tof)
+	    {
+	      emit_insn (insns);
+	      return;
+	    }
+	}
+    }
+#endif
+
   /* Otherwise use a libcall.  */
   libcall = convert_optab_libfunc (tab, to_mode, from_mode);