middle-end, c++, i386, libgcc: std::bfloat16_t and __bf16 arithmetic support

Here is a complete patch to add std::bfloat16_t support on x86 (AArch64 and ARM left for later). Almost no BFmode optabs are added by the patch, so for binops/unops it extends to SFmode first and then truncates back to BFmode.

For {HF,SF,DF,XF,TF}mode -> BFmode conversions, libgcc has implementations of all those conversions so that we avoid double rounding. For BFmode -> {DF,XF,TF}mode conversions, to avoid growing libgcc too much, it emits a BFmode -> SFmode conversion first and then converts to the even wider mode; neither step should be imprecise.

For BFmode -> HFmode, it first emits a precise BFmode -> SFmode conversion and then SFmode -> HFmode, because neither format is a subset or superset of the other, while SFmode is a superset of both.

expr.cc then contains a -ffast-math optimization of the BF -> SF and SF -> BF conversions if we don't optimize for space (and, for the latter, if -frounding-math isn't enabled either).

For x86, perhaps the truncsfbf2 optab could be defined for TARGET_AVX512BF16, but IMNSHO it should FAIL if !flag_finite_math || flag_rounding_math || !flag_unsafe_math_optimizations, because I think the insn doesn't raise on sNaNs, hardcodes round to nearest and flushes denormals to zero.

By default (unless x86 -fexcess-precision=16) we use float excess precision for BFmode, so we truncate only on explicit casts and assignments.

The patch introduces a single __bf16 builtin, __builtin_nansf16b, because (__bf16) __builtin_nansf ("") would turn the sNaN into a qNaN. It uses the f16b suffix instead of bf16 because bf16 would be ambiguous between log and logb: __builtin_logbf16 could be either log with a bf16 suffix or logb with an f16 suffix.

In other cases libstdc++ should mostly use __builtin_*f for std::bfloat16_t overloads (we have a problem with std::nextafter though, but we have that one for std::float16_t as well).

2022-10-14  Jakub Jelinek  <jakub@redhat.com>

gcc/
	* tree-core.h (enum tree_index): Add TI_BFLOAT16_TYPE.
	* tree.h (bfloat16_type_node): Define.
	* tree.cc (excess_precision_type): Promote bfloat16_type_mode like
	float16_type_mode.
	(build_common_tree_nodes): Initialize bfloat16_type_node if BFmode is
	supported.
	* expmed.h (maybe_expand_shift): Declare.
	* expmed.cc (maybe_expand_shift): No longer static.
	* expr.cc (convert_mode_scalar): Don't ICE on BF -> HF or HF -> BF
	conversions.  If there is no optab, handle BF -> {DF,XF,TF,HF}
	conversions as separate BF -> SF -> {DF,XF,TF,HF} conversions, add
	-ffast-math generic implementation for BF -> SF and SF -> BF
	conversions.
	* builtin-types.def (BT_BFLOAT16, BT_FN_BFLOAT16_CONST_STRING): New.
	* builtins.def (BUILT_IN_NANSF16B): New builtin.
	* fold-const-call.cc (fold_const_call): Handle CFN_BUILT_IN_NANSF16B.
	* config/i386/i386.cc (classify_argument): Handle E_BCmode.
	(ix86_libgcc_floating_mode_supported_p): Also return true for BFmode
	for -msse2.
	(ix86_mangle_type): Mangle BFmode as DF16b.
	(ix86_invalid_conversion, ix86_invalid_unary_op,
	ix86_invalid_binary_op): Remove.
	(TARGET_INVALID_CONVERSION, TARGET_INVALID_UNARY_OP,
	TARGET_INVALID_BINARY_OP): Don't redefine.
	* config/i386/i386-builtins.cc (ix86_bf16_type_node): Remove.
	(ix86_register_bf16_builtin_type): Use bfloat16_type_node rather than
	ix86_bf16_type_node, only create it if still NULL.
	* config/i386/i386-builtin-types.def (BFLOAT16): Likewise.
	* config/i386/i386.md (cbranchbf4, cstorebf4): New expanders.

gcc/c-family/
	* c-cppbuiltin.cc (c_cpp_builtins): If bfloat16_type_node,
	predefine __BFLT16_*__ macros and for C++23 also
	__STDCPP_BFLOAT16_T__.  Predefine bfloat16_type_node related
	macros for -fbuilding-libgcc.
	* c-lex.cc (interpret_float): Handle CPP_N_BFLOAT16.

gcc/c/
	* c-typeck.cc (convert_arguments): Don't promote __bf16 to
	double.

gcc/cp/
	* cp-tree.h (extended_float_type_p): Return true for
	bfloat16_type_node.
	* typeck.cc (cp_compare_floating_point_conversion_ranks): Set
	extended{1,2} if mv{1,2} is bfloat16_type_node.  Adjust comment.

gcc/testsuite/
	* lib/target-supports.exp (check_effective_target_bfloat16,
	check_effective_target_bfloat16_runtime, add_options_for_bfloat16):
	New.
	* gcc.dg/torture/bfloat16-basic.c: New test.
	* gcc.dg/torture/bfloat16-builtin.c: New test.
	* gcc.dg/torture/bfloat16-builtin-issignaling-1.c: New test.
	* gcc.dg/torture/bfloat16-complex.c: New test.
	* gcc.dg/torture/builtin-issignaling-1.c: Allow to be includable
	from bfloat16-builtin-issignaling-1.c.
	* gcc.dg/torture/floatn-basic.h: Allow to be includable from
	bfloat16-basic.c.
	* gcc.target/i386/vect-bfloat16-typecheck_2.c: Adjust expected
	diagnostics.
	* gcc.target/i386/sse2-bfloat16-scalar-typecheck.c: Likewise.
	* gcc.target/i386/vect-bfloat16-typecheck_1.c: Likewise.
	* g++.target/i386/bfloat_cpp_typecheck.C: Likewise.

libcpp/
	* include/cpplib.h (CPP_N_BFLOAT16): Define.
	* expr.cc (interpret_float_suffix): Handle bf16 and BF16 suffixes for
	C++.

libgcc/
	* config/i386/t-softfp (softfp_extensions): Add bfsf.
	(softfp_truncations): Add tfbf xfbf dfbf sfbf hfbf.
	(CFLAGS-extendbfsf2.c, CFLAGS-truncsfbf2.c, CFLAGS-truncdfbf2.c,
	CFLAGS-truncxfbf2.c, CFLAGS-trunctfbf2.c, CFLAGS-trunchfbf2.c): Add
	-msse2.
	* config/i386/libgcc-glibc.ver (GCC_13.0.0): Export
	__extendbfsf2 and __trunc{s,d,x,t,h}fbf2.
	* config/i386/sfp-machine.h (_FP_NANSIGN_B): Define.
	* config/i386/64/sfp-machine.h (_FP_NANFRAC_B): Define.
	* config/i386/32/sfp-machine.h (_FP_NANFRAC_B): Define.
	* soft-fp/brain.h: New file.
	* soft-fp/truncsfbf2.c: New file.
	* soft-fp/truncdfbf2.c: New file.
	* soft-fp/truncxfbf2.c: New file.
	* soft-fp/trunctfbf2.c: New file.
	* soft-fp/trunchfbf2.c: New file.
	* soft-fp/truncbfhf2.c: New file.
	* soft-fp/extendbfsf2.c: New file.

libiberty/
	* cp-demangle.h (D_BUILTIN_TYPE_COUNT): Increment.
	* cp-demangle.c (cplus_demangle_builtin_types): Add std::bfloat16_t
	entry.
	(cplus_demangle_type): Demangle DF16b.
	* testsuite/demangle-expected (_Z3xxxDF16b): New test.
author: Jakub Jelinek <jakub@redhat.com> 2022-10-14 09:37:01 +0200
committer: Jakub Jelinek <jakub@redhat.com> 2022-10-14 09:37:01 +0200
commit: c2565a31c1622ab0926aeef4a6579413e121b9f9
tree: 0182fba3c78ebcdc1d59f6c1ca9605ee62da6fd2 /gcc/config
parent: 16ec267063c8ce60769888d4097bcd158410adc8
4 files changed, 94 insertions, 69 deletions
diff --git a/gcc/config/i386/i386-builtin-types.def b/gcc/config/i386/i386-builtin-types.def
index 63a360b..2c27a4e 100644
--- a/gcc/config/i386/i386-builtin-types.def
+++ b/gcc/config/i386/i386-builtin-types.def
@@ -69,7 +69,7 @@ DEF_PRIMITIVE_TYPE (UINT16, short_unsigned_type_node)
 DEF_PRIMITIVE_TYPE (INT64, long_long_integer_type_node)
 DEF_PRIMITIVE_TYPE (UINT64, long_long_unsigned_type_node)
 DEF_PRIMITIVE_TYPE (FLOAT16, ix86_float16_type_node)
-DEF_PRIMITIVE_TYPE (BFLOAT16, ix86_bf16_type_node)
+DEF_PRIMITIVE_TYPE (BFLOAT16, bfloat16_type_node)
 DEF_PRIMITIVE_TYPE (FLOAT, float_type_node)
 DEF_PRIMITIVE_TYPE (DOUBLE, double_type_node)
 DEF_PRIMITIVE_TYPE (FLOAT80, float80_type_node)
diff --git a/gcc/config/i386/i386-builtins.cc b/gcc/config/i386/i386-builtins.cc
index b91aba1..b5c651a 100644
--- a/gcc/config/i386/i386-builtins.cc
+++ b/gcc/config/i386/i386-builtins.cc
@@ -126,7 +126,6 @@ BDESC_VERIFYS (IX86_BUILTIN_MAX,
 static GTY(()) tree ix86_builtin_type_tab[(int) IX86_BT_LAST_CPTR + 1];
 
 tree ix86_float16_type_node = NULL_TREE;
-tree ix86_bf16_type_node = NULL_TREE;
 tree ix86_bf16_ptr_type_node = NULL_TREE;
 
 /* Retrieve an element from the above table, building some of
@@ -1372,16 +1371,18 @@ ix86_register_float16_builtin_type (void)
 static void
 ix86_register_bf16_builtin_type (void)
 {
-  ix86_bf16_type_node = make_node (REAL_TYPE);
-  TYPE_PRECISION (ix86_bf16_type_node) = 16;
-  SET_TYPE_MODE (ix86_bf16_type_node, BFmode);
-  layout_type (ix86_bf16_type_node);
+  if (bfloat16_type_node == NULL_TREE)
+    {
+      bfloat16_type_node = make_node (REAL_TYPE);
+      TYPE_PRECISION (bfloat16_type_node) = 16;
+      SET_TYPE_MODE (bfloat16_type_node, BFmode);
+      layout_type (bfloat16_type_node);
+    }
 
   if (!maybe_get_identifier ("__bf16") && TARGET_SSE2)
     {
-      lang_hooks.types.register_builtin_type (ix86_bf16_type_node,
-					    "__bf16");
-      ix86_bf16_ptr_type_node = build_pointer_type (ix86_bf16_type_node);
+      lang_hooks.types.register_builtin_type (bfloat16_type_node, "__bf16");
+      ix86_bf16_ptr_type_node = build_pointer_type (bfloat16_type_node);
     }
 }
 
diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index ff4de2d..480db35 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -2423,6 +2423,7 @@ classify_argument (machine_mode mode, const_tree type,
       classes[1] = X86_64_SSEUP_CLASS;
       return 2;
     case E_HCmode:
+    case E_BCmode:
       classes[0] = X86_64_SSE_CLASS;
       if (!(bit_offset % 64))
 	return 1;
@@ -22428,7 +22429,7 @@ ix86_libgcc_floating_mode_supported_p (scalar_float_mode mode)
      be defined by the C front-end for AVX512FP16 intrinsics.  We will
      issue an error in ix86_expand_move for HFmode if AVX512FP16 isn't
      enabled.  */
-  return ((mode == HFmode && TARGET_SSE2)
+  return (((mode == HFmode || mode == BFmode) && TARGET_SSE2)
 	  ? true
 	  : default_libgcc_floating_mode_supported_p (mode));
 }
@@ -22731,7 +22732,7 @@ ix86_mangle_type (const_tree type)
   switch (TYPE_MODE (type))
     {
     case E_BFmode:
-      return "u6__bf16";
+      return "DF16b";
     case E_HFmode:
       /* _Float16 is "DF16_".
 	 Align with clang's decision in https://reviews.llvm.org/D33719. */
@@ -22747,55 +22748,6 @@ ix86_mangle_type (const_tree type)
     }
 }
 
-/* Return the diagnostic message string if conversion from FROMTYPE to
-   TOTYPE is not allowed, NULL otherwise.  */
-
-static const char *
-ix86_invalid_conversion (const_tree fromtype, const_tree totype)
-{
-  if (element_mode (fromtype) != element_mode (totype))
-    {
-      /* Do no allow conversions to/from BFmode scalar types.  */
-      if (TYPE_MODE (fromtype) == BFmode)
-	return N_("invalid conversion from type %<__bf16%>");
-      if (TYPE_MODE (totype) == BFmode)
-	return N_("invalid conversion to type %<__bf16%>");
-    }
-
-  /* Conversion allowed.  */
-  return NULL;
-}
-
-/* Return the diagnostic message string if the unary operation OP is
-   not permitted on TYPE, NULL otherwise.  */
-
-static const char *
-ix86_invalid_unary_op (int op, const_tree type)
-{
-  /* Reject all single-operand operations on BFmode except for &.  */
-  if (element_mode (type) == BFmode && op != ADDR_EXPR)
-    return N_("operation not permitted on type %<__bf16%>");
-
-  /* Operation allowed.  */
-  return NULL;
-}
-
-/* Return the diagnostic message string if the binary operation OP is
-   not permitted on TYPE1 and TYPE2, NULL otherwise.  */
-
-static const char *
-ix86_invalid_binary_op (int op ATTRIBUTE_UNUSED, const_tree type1,
-			   const_tree type2)
-{
-  /* Reject all 2-operand operations on BFmode.  */
-  if (element_mode (type1) == BFmode
-      || element_mode (type2) == BFmode)
-    return N_("operation not permitted on type %<__bf16%>");
-
-  /* Operation allowed.  */
-  return NULL;
-}
-
 static GTY(()) tree ix86_tls_stack_chk_guard_decl;
 
 static tree
@@ -24853,15 +24805,6 @@ ix86_libgcc_floating_mode_supported_p
 #undef TARGET_MANGLE_TYPE
 #define TARGET_MANGLE_TYPE ix86_mangle_type
 
-#undef TARGET_INVALID_CONVERSION
-#define TARGET_INVALID_CONVERSION ix86_invalid_conversion
-
-#undef TARGET_INVALID_UNARY_OP
-#define TARGET_INVALID_UNARY_OP ix86_invalid_unary_op
-
-#undef TARGET_INVALID_BINARY_OP
-#define TARGET_INVALID_BINARY_OP ix86_invalid_binary_op
-
 #undef TARGET_STACK_PROTECT_GUARD
 #define TARGET_STACK_PROTECT_GUARD ix86_stack_protect_guard
 
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 8e84752..6688d92 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -1644,6 +1644,48 @@
   DONE;
 })
 
+(define_expand "cbranchbf4"
+  [(set (reg:CC FLAGS_REG)
+	(compare:CC (match_operand:BF 1 "cmp_fp_expander_operand")
+		    (match_operand:BF 2 "cmp_fp_expander_operand")))
+   (set (pc) (if_then_else
+	      (match_operator 0 "comparison_operator"
+	       [(reg:CC FLAGS_REG)
+		(const_int 0)])
+	      (label_ref (match_operand 3))
+	      (pc)))]
+  ""
+{
+  rtx op1 = gen_lowpart (HImode, operands[1]);
+  if (CONST_INT_P (op1))
+    op1 = simplify_const_unary_operation (FLOAT_EXTEND, SFmode,
+					  operands[1], BFmode);
+  else
+    {
+      rtx t1 = gen_reg_rtx (SImode);
+      emit_insn (gen_zero_extendhisi2 (t1, op1));
+      emit_insn (gen_ashlsi3 (t1, t1, GEN_INT (16)));
+      op1 = gen_lowpart (SFmode, t1);
+    }
+  rtx op2 = gen_lowpart (HImode, operands[2]);
+  if (CONST_INT_P (op2))
+    op2 = simplify_const_unary_operation (FLOAT_EXTEND, SFmode,
+					  operands[2], BFmode);
+  else
+    {
+      rtx t2 = gen_reg_rtx (SImode);
+      emit_insn (gen_zero_extendhisi2 (t2, op2));
+      emit_insn (gen_ashlsi3 (t2, t2, GEN_INT (16)));
+      op2 = gen_lowpart (SFmode, t2);
+    }
+  do_compare_rtx_and_jump (op1, op2, GET_CODE (operands[0]), 0,
+			   SFmode, NULL_RTX, NULL,
+			   as_a <rtx_code_label *> (operands[3]),
+			   /* Unfortunately this isn't propagated.  */
+			   profile_probability::even ());
+  DONE;
+})
+
 (define_expand "cstorehf4"
   [(set (reg:CC FLAGS_REG)
 	(compare:CC (match_operand:HF 2 "cmp_fp_expander_operand")
@@ -1659,6 +1701,45 @@
   DONE;
 })
 
+(define_expand "cstorebf4"
+  [(set (reg:CC FLAGS_REG)
+	(compare:CC (match_operand:BF 2 "cmp_fp_expander_operand")
+		    (match_operand:BF 3 "cmp_fp_expander_operand")))
+   (set (match_operand:QI 0 "register_operand")
+	(match_operator 1 "comparison_operator"
+	  [(reg:CC FLAGS_REG)
+	   (const_int 0)]))]
+  ""
+{
+  rtx op1 = gen_lowpart (HImode, operands[2]);
+  if (CONST_INT_P (op1))
+    op1 = simplify_const_unary_operation (FLOAT_EXTEND, SFmode,
+					  operands[2], BFmode);
+  else
+    {
+      rtx t1 = gen_reg_rtx (SImode);
+      emit_insn (gen_zero_extendhisi2 (t1, op1));
+      emit_insn (gen_ashlsi3 (t1, t1, GEN_INT (16)));
+      op1 = gen_lowpart (SFmode, t1);
+    }
+  rtx op2 = gen_lowpart (HImode, operands[3]);
+  if (CONST_INT_P (op2))
+    op2 = simplify_const_unary_operation (FLOAT_EXTEND, SFmode,
+					  operands[3], BFmode);
+  else
+    {
+      rtx t2 = gen_reg_rtx (SImode);
+      emit_insn (gen_zero_extendhisi2 (t2, op2));
+      emit_insn (gen_ashlsi3 (t2, t2, GEN_INT (16)));
+      op2 = gen_lowpart (SFmode, t2);
+    }
+  rtx res = emit_store_flag_force (operands[0], GET_CODE (operands[1]),
+				   op1, op2, SFmode, 0, 1);
+  if (!rtx_equal_p (res, operands[0]))
+    emit_move_insn (operands[0], res);
+  DONE;
+})
+
 (define_expand "cstore<mode>4"
   [(set (reg:CC FLAGS_REG)
 	(compare:CC (match_operand:MODEF 2 "cmp_fp_expander_operand")