Age | Commit message (Collapse) | Author | Files | Lines |
|
There are 3 possible relations range-ops might care about, but only the one
most likely to be needed is supplied. This patch provides a new class
relation_trio which allows 3 relations to be passed in a single word.
fold_range (), op1_range (), and op2_range () are adjusted to take a
relation_trio class instead of a relation_kind, then the routine can
extract which relation it wants to work with.
* gimple-range-fold.cc (fold_using_range::range_of_range_op):
Provide relation_trio class.
* gimple-range-gori.cc (gori_compute::refine_using_relation):
Provide relation_trio class.
(gori_compute::refine_using_relation): Ditto.
(gori_compute::compute_operand1_range): Provide lhs_op2 and
op1_op2 relations via relation_trio class.
(gori_compute::compute_operand2_range): Ditto.
* gimple-range-op.cc (gimple_range_op_handler::calc_op1): Use
relation_trio instead of relation_kind.
(gimple_range_op_handler::calc_op2): Ditto.
(*::fold_range): Ditto.
* gimple-range-op.h (gimple_range_op::calc_op1): Adjust prototypes.
(gimple_range_op::calc_op2): Adjust prototypes.
* range-op-float.cc (*::fold_range): Use relation_trio instead of
relation_kind.
(*::op1_range): Ditto.
(*::op2_range): Ditto.
* range-op.cc (*::fold_range): Use relation_trio instead of
relation_kind.
(*::op1_range): Ditto.
(*::op2_range): Ditto.
* range-op.h (class range_operator): Adjust prototypes.
(class range_operator_float): Ditto.
(class range_op_handler): Adjust prototypes.
(relop_early_resolve): Pickup op1_op2 relation from relation_trio.
* value-relation.cc (VREL_LAST): Adjust use to be one past the end of
the enum.
(relation_oracle::validate_relation): Use relation_trio in call
to fold_range.
* value-relation.h (enum relation_kind_t): Add VREL_LAST as
final element.
(class relation_trio): New.
(TRIO_VARYING, TRIO_SHIFT, TRIO_MASK): New.
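
To illustrate the idea of passing three relations in a single word, here is a minimal sketch. It is not the code from value-relation.h: the enumerators, the 8-bit field width, and every name ending in _sketch are assumptions made up for this example.

```cpp
#include <cassert>

// Hypothetical relation kinds; the real enum in value-relation.h differs.
enum rel_kind_sketch
{
  REL_VARYING = 0, REL_LT, REL_LE, REL_GT, REL_GE, REL_EQ, REL_NE, REL_LAST
};

// Assumed field width: 8 bits per relation, so three fit in one unsigned.
constexpr unsigned SHIFT_SKETCH = 8;
constexpr unsigned MASK_SKETCH = (1u << SHIFT_SKETCH) - 1;

class trio_sketch
{
public:
  trio_sketch (rel_kind_sketch lhs_op1, rel_kind_sketch lhs_op2,
	       rel_kind_sketch op1_op2)
    : m_bits (unsigned (lhs_op1)
	      | (unsigned (lhs_op2) << SHIFT_SKETCH)
	      | (unsigned (op1_op2) << (2 * SHIFT_SKETCH)))
  {
    // Every kind must fit in its field.
    assert (lhs_op1 < REL_LAST && lhs_op2 < REL_LAST && op1_op2 < REL_LAST);
  }

  // Each accessor shifts its field down and masks off the others.
  rel_kind_sketch lhs_op1 () const
  { return rel_kind_sketch (m_bits & MASK_SKETCH); }
  rel_kind_sketch lhs_op2 () const
  { return rel_kind_sketch ((m_bits >> SHIFT_SKETCH) & MASK_SKETCH); }
  rel_kind_sketch op1_op2 () const
  { return rel_kind_sketch ((m_bits >> (2 * SHIFT_SKETCH)) & MASK_SKETCH); }

private:
  unsigned m_bits;
};
```

A routine that receives such a value can extract only the relation it cares about, which is the point made in the commit message.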
|
|
Calling clear_nan on an undefined type traps, so call set_varying first.
Other tweaks for correctness.
* range-op-float.cc (foperator_not_equal::op1_range): Check for
VREL_EQ after singleton.
(foperator_unordered::op1_range): Set VARYING before calling
clear_nan().
(foperator_ordered::op1_range): Set rather than clear NAN if both
operands are the same.
|
|
The oracle will not register nonsense/useless relations, class
value_relation shouldn't either.
* value-relation.cc (value_relation::dump): Change message.
* value-relation.h (value_relation::set_relation): If op1 is the
same as op2 do not create a relation.
|
|
For example, for "g++-4.8 (Ubuntu 4.8.4-2ubuntu1~14.04.4) 4.8.4", the recent
commit r13-3220-g45381d6f9f4e7b5c7b062f5ad8cc9788091c2d07
"amdgcn: add multiple vector sizes" broke the build:
In file included from [...]/source-gcc/gcc/coretypes.h:458:0,
from [...]/source-gcc/gcc/config/gcn/gcn.cc:24:
[...]/source-gcc/gcc/config/gcn/gcn.cc: In function ‘machine_mode VnMODE(int, machine_mode)’:
./insn-modes.h:42:71: error: temporary of non-literal type ‘scalar_int_mode’ in a constant expression
#define QImode (scalar_int_mode ((scalar_int_mode::from_int) E_QImode))
^
[...]/source-gcc/gcc/config/gcn/gcn.cc:405:10: note: in expansion of macro ‘QImode’
case QImode:
^
In file included from [...]/source-gcc/gcc/coretypes.h:478:0,
from [...]/source-gcc/gcc/config/gcn/gcn.cc:24:
[...]/source-gcc/gcc/machmode.h:410:7: note: ‘scalar_int_mode’ is not literal because:
class scalar_int_mode
^
[...]/source-gcc/gcc/machmode.h:410:7: note: ‘scalar_int_mode’ is not an aggregate, does not have a trivial default constructor, and has no constexpr constructor that is not a copy or move constructor
[...]
Addressing this like similar issues have been addressed in the past.
gcc/
* config/gcn/gcn.cc (VnMODE): Use 'case E_QImode:' instead of
'case QImode:', etc.
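
The diagnostic above comes down to a case label needing a constant expression, which an object of a class type without a constexpr constructor cannot provide. The following reduced sketch shows the shape of the problem and of the fix. All types and names (mode_sketch, scalar_mode_sketch, QI_SKETCH and so on) are invented for illustration and are not GCC's.

```cpp
#include <cassert>

enum mode_sketch { E_QI_SKETCH, E_HI_SKETCH, E_SI_SKETCH };

// Wrapper class with a non-constexpr constructor, so it is not a
// literal type and cannot appear in a constant expression.
class scalar_mode_sketch
{
public:
  explicit scalar_mode_sketch (mode_sketch m) : m_mode (m) {}
  operator mode_sketch () const { return m_mode; }
private:
  mode_sketch m_mode;
};

// Analogous to the QImode macro: expands to a temporary of class type.
#define QI_SKETCH (scalar_mode_sketch (E_QI_SKETCH))

// Using the macro at run time is fine.
inline mode_sketch qi_mode_sketch () { return QI_SKETCH; }

int bytes_for_mode_sketch (mode_sketch m)
{
  switch (m)
    {
    // 'case QI_SKETCH:' would require a constant expression of class
    // type; the plain enumerator is always a valid case label.
    case E_QI_SKETCH: return 1;
    case E_HI_SKETCH: return 2;
    case E_SI_SKETCH: return 4;
    }
  assert (false && "unknown mode");
  return 0;
}
```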
|
|
Added in 2015 r229696 (commit 1b223a9f3489296c625bdb7cc764196d04fd9231)
"defer mark_addressable calls during expand till the end of expand",
it has never been used 'extern'ally.
gcc/
* gimple-expr.cc (mark_addressable_2): Tag as 'static'.
|
|
'libgomp.c/reverse-offload-sm30.c'
That is, '-mptx=_' is only valid in '-foffload-options=nvptx-none', too.
Fix test case added in recent
commit r13-2625-g6b43f556f392a7165582aca36a19fe7389d995b2 "nvptx/mkoffload.cc:
Warn instead of error when reverse offload is not possible".
libgomp/
* testsuite/libgomp.c/reverse-offload-sm30.c: Fix nvptx-specific
'-foffload-options' syntax.
|
|
The following picks up the prototype by Ju-Zhe Zhong for vectorizing
first order recurrences. That solves two TSVC missed optimization PRs.
There's a new scalar cycle def kind, vect_first_order_recurrence
and its handling of the backedge value vectorization is complicated
by the fact that the vectorized value isn't the PHI but instead
a (series of) permute(s) shifting in the recurring value from the
previous iteration. I've implemented this by creating both the
single vectorized PHI and the series of permutes when vectorizing
the scalar PHI but leave the backedge values in both unassigned.
The backedge values are (for the testcases) computed by a load
which is also the place after which the permutes are inserted.
That placement also restricts the cases we can handle (without
resorting to code motion).
I added both costing and SLP handling though SLP handling is
restricted to the case where a single vectorized PHI is enough.
Missing is epilogue handling - while prologue peeling would
be handled transparently by adjusting iv_phi_p the epilogue
case doesn't work with just inserting a scalar LC PHI since
that a) keeps the scalar load live and b) that load is the
wrong one, it has to be the last, much like when we'd vectorize
the LC PHI as live operation. Unfortunately LIVE
compute/analysis happens too early before we decide on
peeling. When using fully masked loop vectorization the
vect-recurr-6.c works as expected though.
I have tested this on x86_64 for now, but since epilogue
handling is missing there are probably no practical cases.
My prototype WHILE_ULT AVX512 patch can handle vect-recurr-6.c
just fine but I didn't feel like running SPEC within SDE nor
is the WHILE_ULT patch complete enough.
PR tree-optimization/99409
PR tree-optimization/99394
* tree-vectorizer.h (vect_def_type::vect_first_order_recurrence): Add.
(stmt_vec_info_type::recurr_info_type): Likewise.
(vectorizable_recurr): New function.
* tree-vect-loop.cc (vect_phi_first_order_recurrence_p): New
function.
(vect_analyze_scalar_cycles_1): Look for first order
recurrences.
(vect_analyze_loop_operations): Handle them.
(vect_transform_loop): Likewise.
(vectorizable_recurr): New function.
(maybe_set_vectorized_backedge_value): Handle the backedge value
setting in the first order recurrence PHI and the permutes.
* tree-vect-stmts.cc (vect_analyze_stmt): Handle first order
recurrences.
(vect_transform_stmt): Likewise.
(vect_is_simple_use): Likewise.
(vect_is_simple_use): Likewise.
* tree-vect-slp.cc (vect_get_and_check_slp_defs): Likewise.
(vect_build_slp_tree_2): Likewise.
(vect_schedule_scc): Handle the backedge value setting in the
first order recurrence PHI and the permutes.
* gcc.dg/vect/vect-recurr-1.c: New testcase.
* gcc.dg/vect/vect-recurr-2.c: Likewise.
* gcc.dg/vect/vect-recurr-3.c: Likewise.
* gcc.dg/vect/vect-recurr-4.c: Likewise.
* gcc.dg/vect/vect-recurr-5.c: Likewise.
* gcc.dg/vect/vect-recurr-6.c: Likewise.
* gcc.dg/vect/tsvc/vect-tsvc-s252.c: Un-XFAIL.
* gcc.dg/vect/tsvc/vect-tsvc-s254.c: Likewise.
* gcc.dg/vect/tsvc/vect-tsvc-s291.c: Likewise.
Co-authored-by: Ju-Zhe Zhong <juzhe.zhong@rivai.ai>
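
For readers unfamiliar with the term, a first order recurrence is a loop in which each iteration uses a value produced by the previous iteration. The scalar sketch below shows that shape. It is a hypothetical illustration, not one of the vect-recurr-*.c testcases, and the function name is made up.

```cpp
#include <cassert>
#include <cstddef>

// Each iteration reads 't', the value loaded in the previous iteration.
void first_order_recurrence_sketch (int *out, const int *in,
				    std::size_t n, int t)
{
  assert (out != nullptr && in != nullptr);
  for (std::size_t i = 0; i < n; ++i)
    {
      out[i] = in[i] - t;  // use of the recurring value
      t = in[i];           // backedge value for the next iteration
    }
}
```

In a vectorized form the recurring value for lane 0 comes from the last lane of the previous vector iteration, which is why the message describes permutes that shift in the value from the previous iteration.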
|
|
On many architectures, there is a padding gap after the how array
member, and cfa_how can be moved there. This reduces the size of the
struct and the amount of memory that uw_frame_state_for has to clear.
There is no measurable performance benefit from this on x86-64 (even
though the memset goes from 120 to 112 bytes), but it seems to be a
good idea to do anyway.
libgcc/
* unwind-dw2.h (struct frame_state_reg_info): Move cfa_how member
and reduce its size.
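
The effect of moving a small member into a padding gap can be seen with a reduced sketch. The two structs below are invented for illustration and are much simpler than struct frame_state_reg_info; the member names only echo the ones mentioned above, and the exact sizes depend on the target ABI.

```cpp
// Before: the one-byte member follows a wider member, so it ends up
// in its own padded slot at the end of the struct.
struct before_sketch
{
  unsigned char how[5];
  long cfa_offset;
  unsigned char cfa_how;
};

// After: the one-byte member sits in the gap that follows the array.
struct after_sketch
{
  unsigned char how[5];
  unsigned char cfa_how;
  long cfa_offset;
};

// The reordered layout is never larger; on typical ABIs it is smaller.
static_assert (sizeof (after_sketch) <= sizeof (before_sketch),
	       "moving the member into the gap must not grow the struct");
```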
|
|
libstdc++-v3/ChangeLog:
* include/std/charconv (__cpp_lib_constexpr_charconv): Define to
correct value.
* include/std/version (__cpp_lib_constexpr_charconv): Likewise.
* testsuite/20_util/to_chars/constexpr.cc: Check correct value.
* testsuite/20_util/to_chars/version.cc: Likewise.
|
|
gcc/ChangeLog:
* config/riscv/t-riscv: Change tab into 2 spaces.
|
|
This patch fixes my mistake in the previous commit.
Since "mangle_builtin_type" is a global function that will be called in riscv.cc,
it is reasonable to move it down so that it stays with the other global functions.
gcc/ChangeLog:
* config/riscv/riscv-vector-builtins.cc (mangle_builtin_type): Move down the function.
|
|
stdint.h is considered a freestanding header by C, and a valid stdint.h
is required for certain parts of libstdc++'s configuration, so we should
simply provide one when we have no other way (i.e. newlib or
user-specified sysroot) of getting one.
* config.gcc: --target=*-elf --without-{newlib,headers} should
provide stdint.h.
|
|
gcc/ChangeLog:
* common/config/i386/cpuinfo.h:
(get_intel_cpu): Handle Meteorlake.
* common/config/i386/i386-common.cc:
(processor_alias_table): Add Meteorlake.
|
|
gcc/ChangeLog:
* common/config/i386/cpuinfo.h:
(get_intel_cpu): Handle Raptorlake.
* common/config/i386/i386-common.cc:
(processor_alias_table): Add Raptorlake.
|
|
|
|
GCC does not allow the operand of an autoinc addressing mode to
overlap with another source operand in the same insn. This is primarily
enforced with insn conditions. However, cases can slip through LRA
and reload. To address those scenarios we'll take an idea from the
pdp11 port for describing the restriction in constraints as well.
To implement that we need register classes and constraints which are
"all general purpose hardware registers except r0". And similarly for
r1..r7(sp).
This patch adds those register classes and constraints, but does not
yet use them.
gcc/
* config/h8300/constraints.md (Z0..Z7): New register
constraints.
* config/h8300/h8300.h (reg_class): Add new classes.
(REG_CLASS_NAMES): Similarly.
(REG_CLASS_CONTENTS): Similarly.
|
|
I want to use Z as a multi-letter constraint. So first we have to
adjust the existing use of Z. This does not affect code generation.
gcc/
* config/h8300/constraints.md (Zz constraint): Renamed
from "z".
* config/h8300/movepush.md (movqi_h8sx, movhi_h8sx): Adjust
constraint to use Zz instead of Z.
|
|
gcc/
* config/h8300/h8300.cc (h8300_register_move_cost): Fix typo.
|
|
|
|
The only remaining use of print_raw is conditionally compiled, so when
libstdc++ is built without debug backtrace support, there's an unused
function warning for it. Move it inside the conditional block.
libstdc++-v3/ChangeLog:
* src/c++11/debug.cc (print_raw): Move inside #if block.
|
|
Some of the helper functions use static constexpr local variables, which
is not permitted in a core constant expression. Removing the 'static'
seems to have negligible performance effect for __to_chars and
__to_chars_16. For __from_chars_alnum_to_val removing the 'static'
causes a significant performance impact for base 36 conversions. Use a
consteval lambda instead.
libstdc++-v3/ChangeLog:
* include/bits/charconv.h (__to_chars_10_impl): Add constexpr
for C++23. Remove 'static' from array.
* include/std/charconv (__cpp_lib_constexpr_charconv): Define.
(__to_chars, __to_chars_16): Remove 'static' from array, add
constexpr.
(__to_chars_10, __to_chars_8, __to_chars_2, __to_chars_i)
(to_chars, __raise_and_add, __from_chars_pow2_base)
(__from_chars_alnum, from_chars): Add constexpr.
(__from_chars_alnum_to_val): Avoid local static during constant
evaluation. Add constexpr.
* include/std/version (__cpp_lib_constexpr_charconv): Define.
* testsuite/20_util/from_chars/constexpr.cc: New test.
* testsuite/20_util/to_chars/constexpr.cc: New test.
* testsuite/20_util/to_chars/version.cc: New test.
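
As a small illustration of the "remove 'static'" part of this change, the sketch below uses a non-static constexpr local array inside a constexpr function, which is allowed from C++14 on. The function name is hypothetical and the code is not taken from <charconv>. It does not show the consteval lambda variant that the message mentions for __from_chars_alnum_to_val.

```cpp
// Map a digit value to its character; usable in constant expressions.
constexpr char digit_char_sketch (unsigned value)
{
  // Non-static local table: a 'static' here would make the function
  // unusable in constant evaluation under the rules described above.
  constexpr char digits[] = "0123456789abcdefghijklmnopqrstuvwxyz";
  return digits[value % 36];
}

static_assert (digit_char_sketch (7) == '7', "decimal digit");
static_assert (digit_char_sketch (35) == 'z', "last base-36 digit");
```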
|
|
The _Std_pair concept used in <bits/uses_allocator_args.h> handles const
qualified pairs, but not volatile qualified. That's because it just uses
__is_pair which is specialized for const pairs.
This removes the partial specialization __is_pair<const pair<T,U>>, so
that __is_pair is now only true for cv-unqualified pairs. Then _Std_pair
needs to explicitly use remove_cv_t for the argument to __is_pair.
The other use of __is_pair is in map::insert(Pair&&) which doesn't want
to handle volatile so should just use remove_const_t.
libstdc++-v3/ChangeLog:
* include/bits/stl_map.h (map::insert(Pair&&)): Use
remove_const_t on argument to __is_pair.
* include/bits/stl_pair.h (__is_pair<const pair<T,U>>): Remove
partial specialization.
* include/bits/uses_allocator_args.h (_Std_pair): Use
remove_cv_t as per LWG 3677.
* testsuite/20_util/uses_allocator/lwg3677.cc: New test.
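
The approach described above can be sketched with a stand-alone trait. The names is_pair_sketch, is_cv_pair_sketch and is_const_pair_sketch are made up for this example; libstdc++'s internal __is_pair and _Std_pair are not reproduced here.

```cpp
#include <type_traits>
#include <utility>

// True only for cv-unqualified std::pair specializations.
template<typename T>
struct is_pair_sketch : std::false_type { };

template<typename T, typename U>
struct is_pair_sketch<std::pair<T, U>> : std::true_type { };

// A caller that accepts any cv-qualified pair strips both qualifiers.
template<typename T>
constexpr bool is_cv_pair_sketch
  = is_pair_sketch<std::remove_cv_t<T>>::value;

// A caller that accepts const but not volatile strips only const.
template<typename T>
constexpr bool is_const_pair_sketch
  = is_pair_sketch<std::remove_const_t<T>>::value;

static_assert (!is_pair_sketch<const std::pair<int, int>>::value,
	       "the base trait no longer matches const pairs");
static_assert (is_cv_pair_sketch<const volatile std::pair<int, int>>,
	       "remove_cv_t handles const volatile");
static_assert (!is_const_pair_sketch<volatile std::pair<int, int>>,
	       "remove_const_t leaves volatile in place");
```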
|
|
|
|
C2x has, like C++, adopted rules for identifiers based directly on an
unversioned normative reference to Unicode. Make libcpp follow those
rules for c2x / gnu2x standards (this involves bringing back a flag
separate from the C++ one for whether to use these identifier rules,
but this time enabled for all C++ language versions since that was the
conclusion adopted for C++ identifier handling).
There is one change here that affects C++. I believe the new
normative requirement for NFC only applies to identifiers, not to the
use of identifier-continue characters in pp-numbers, where there is no
such requirement and so the diagnostic ought to be a warning not a
pedwarn in pp-numbers, and that this is the case for both C and C++.
Bootstrapped with no regressions for x86_64-pc-linux-gnu.
libcpp/
* charset.cc (ucn_valid_in_identifier): Check xid_identifiers not
cplusplus to determine whether to use CXX23 and NXX23 flags.
* include/cpplib.h (struct cpp_options): Add xid_identifiers.
* init.cc (struct lang_flags, lang_defaults): Add xid_identifiers.
(cpp_set_lang): Set xid_identifiers.
* lex.cc (warn_about_normalization): Add parameter identifier.
Only pedwarn about non-NFC for identifiers, not pp-numbers.
(_cpp_lex_direct): Update calls to warn_about_normalization.
gcc/testsuite/
* gcc.dg/cpp/c2x-ucnid-1-utf8.c, gcc.dg/cpp/c2x-ucnid-1.c: New
tests.
|
|
gcc/fortran/ChangeLog:
PR fortran/100971
* resolve.cc (resolve_transfer): Extend check for permissibility
of polymorphic elements in a data transfer to arrays.
gcc/testsuite/ChangeLog:
PR fortran/100971
* gfortran.dg/der_io_5.f90: New test.
|
|
gcc/ChangeLog:
* value-range.cc (frange::set): Implement distinction between
HONOR_SIGNED_ZEROS and MODE_HAS_SIGNED_ZEROS.
|
|
copysign(MAGNITUDE, SIGN) is implemented as the absolute of MAGNITUDE,
with SIGN applied. If the sign of "SIGN" cannot be determined, we
return a range of [-MAGNITUDE, +MAGNITUDE].
gcc/ChangeLog:
* gimple-range-op.cc (class cfn_copysign): New.
(gimple_range_op_handler::maybe_builtin_call): Add
CFN_BUILT_IN_COPYSIGN*.
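
The rule described above can be sketched on a toy interval type. This is an illustration only: range_sketch, sign_sketch and copysign_range_sketch are hypothetical names, plain doubles stand in for frange, and NaN handling is ignored.

```cpp
#include <algorithm>
#include <cassert>

struct range_sketch { double lo, hi; };

enum class sign_sketch { positive, negative, unknown };

range_sketch copysign_range_sketch (range_sketch magnitude, sign_sketch sign)
{
  assert (magnitude.lo <= magnitude.hi);

  // First compute the range of the absolute value of MAGNITUDE.
  double abs_lo, abs_hi;
  if (magnitude.lo >= 0)
    {
      abs_lo = magnitude.lo;
      abs_hi = magnitude.hi;
    }
  else if (magnitude.hi <= 0)
    {
      abs_lo = -magnitude.hi;
      abs_hi = -magnitude.lo;
    }
  else
    {
      abs_lo = 0;
      abs_hi = std::max (-magnitude.lo, magnitude.hi);
    }

  // Then apply the sign, if it is known.
  switch (sign)
    {
    case sign_sketch::positive: return { abs_lo, abs_hi };
    case sign_sketch::negative: return { -abs_hi, -abs_lo };
    case sign_sketch::unknown: break;
    }
  // Unknown sign: the result may be anywhere in [-|M|, +|M|].
  return { -abs_hi, abs_hi };
}
```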
|
|
gcc/testsuite/
* gfortran.dg/c-interop/deferred-character-2.f90: Use 'dg-do run'.
|
|
[-Inf, -Inf] is being flushed to [-Inf, -0.0] because real_isdenormal
is being overly pessimistic. It is missing a check for rvc_normal.
This doesn't cause problems in real.cc because all uses of
real_isdenormal are already on the rvc_normal path. The uses in
value-range.cc however, are not.
This patch adds a check for rvc_normal.
gcc/ChangeLog:
* real.h (real_isdenormal): Check rvc_normal.
* value-range.cc (range_tests_floats): New test.
|
|
For a zero-sized static pool we can completely elide all code for the EH
pool.
We no longer need to adjust the static buffer size to ensure at least
one free_entry can be created in it, because we no longer use a static
buffer at all if obj_count == 0. If the buffer exists, obj_count >= 1
and the buffer will be much larger than sizeof(free_entry).
libstdc++-v3/ChangeLog:
* libsupc++/eh_alloc.cc [USE_POOL]: New macro.
[!USE_POOL] (__gnu_cxx::__freeres, pool): Do not define.
[_GLIBCXX_EH_POOL_STATIC] (pool::arena): Do not use std::max.
(__cxxabiv1::__cxa_allocate_exception) [!USE_POOL]: Do not use
pool.
(__cxxabiv1::__cxa_free_exception) [!USE_POOL]: Likewise.
(__cxxabiv1::__cxa_allocate_dependent_exception) [!USE_POOL]:
Likewise.
(__cxxabiv1::__cxa_free_dependent_exception) [!USE_POOL]:
Likewise.
|
|
Replace two uses of print_raw where it's clearer to just use fprintf
directly. Then the only remaining use of print_raw is as the print_func
argument of pretty_print. When called by pretty_print the count is
either a positive integer or -1, so we can simplify print_raw itself.
Remove the default argument, because it's never used. Remove the check
for nbc == 0, which never happens (but would be harmless if it did).
Replace the conditional expression with a single call to fprintf, using
INT_MAX as the maximum length.
libstdc++-v3/ChangeLog:
* src/c++11/debug.cc (print_raw): Simplify.
(print_word): Print indentation by calling fprintf directly.
(_Error_formatter::_M_error): Print unindented string by calling
fprintf directly.
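
A single fprintf call with a precision argument is enough to print either a bounded prefix or the whole string. The sketch below shows the idea. It is not the code in debug.cc: the function name and signature are assumptions, and it writes to a caller-supplied stream.

```cpp
#include <cassert>
#include <climits>
#include <cstdio>

// Print at most nbc characters of str, or the whole string when nbc
// is negative.  Returns the number of characters written.
int print_raw_sketch (std::FILE *out, const char *str, int nbc)
{
  assert (out != nullptr && str != nullptr);
  if (nbc < 0)
    nbc = INT_MAX;  // no practical limit: stop at the terminating NUL
  // "%.*s" takes the maximum number of characters from the argument list.
  return std::fprintf (out, "%.*s", nbc, str);
}
```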
|
|
gcc/ChangeLog:
* gimple-range-op.cc
(gimple_range_op_handler::maybe_builtin_call): Replace
CFN_BUILTIN_SIGNBIT* cases with CASE_FLT_FN.
|
|
[-Inf, +Inf] was being chopped correctly for -ffinite-math-only, but
[-Inf, -Inf] was not. This was latent because a bug in
real_isdenormal is causing us to flush -Inf to zero.
gcc/ChangeLog:
* value-range.cc (frange::set): Normalize ranges for both bounds.
|
|
Similar to what we do for NANs when !HONOR_NANS and Inf when
flag_finite_math_only, we can remove -0.0 from the range at creation
time.
We were kinda sorta doing this because there is a bug in
real_isdenormal that is causing flush_denormals_to_zero to saturate
[x, -0.0] to [x, +0.0] when !HONOR_SIGNED_ZEROS. Fixing this bug
(upcoming), causes us to leave -0.0 in places where we aren't
expecting it (the intersection code).
gcc/ChangeLog:
* value-range.cc (frange::set): Drop -0.0 for !HONOR_SIGNED_ZEROS.
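
Dropping -0.0 at range creation time amounts to canonicalizing zero bounds. A minimal sketch on a toy range type follows. The names frange_sketch and normalize_zeros_sketch are hypothetical, and the boolean parameter stands in for the HONOR_SIGNED_ZEROS query.

```cpp
#include <cassert>
#include <cmath>

struct frange_sketch { double lo, hi; };

// When signed zeros are not honored, replace any -0.0 bound by +0.0.
frange_sketch normalize_zeros_sketch (frange_sketch r,
				      bool honor_signed_zeros)
{
  assert (r.lo <= r.hi);
  if (!honor_signed_zeros)
    {
      // -0.0 compares equal to 0.0, so test the sign bit explicitly.
      if (r.lo == 0.0 && std::signbit (r.lo))
	r.lo = 0.0;
      if (r.hi == 0.0 && std::signbit (r.hi))
	r.hi = 0.0;
    }
  return r;
}
```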
|
|
The FUNCTION_DECL we build for __dynamic_cast has an empty DECL_CONTEXT
but trees_out::tree_node expects FUNCTION_DECLs to have non-empty
DECL_CONTEXT, thus we crash when streaming out the dynamic_cast in the
below testcase.
This patch naively fixes this by setting DECL_CONTEXT for __dynamic_cast
appropriately. I suppose we should push it into the namespace too, like
we do for __cxa_atexit which is similarly lazily declared.
PR c++/106304
gcc/cp/ChangeLog:
* constexpr.cc (cxx_dynamic_cast_fn_p): Check for abi_node
instead of global_namespace.
* rtti.cc (build_dynamic_cast_1): Set DECL_CONTEXT and
DECL_SOURCE_LOCATION when building dynamic_cast_node. Push
it into the namespace.
gcc/testsuite/ChangeLog:
* g++.dg/modules/pr106304_a.C: New test.
* g++.dg/modules/pr106304_b.C: New test.
|
|
gcc/ChangeLog:
* gimple-range-op.cc
(gimple_range_op_handler::maybe_builtin_call): Add
CFN_BUILT_IN_SIGNBIT[FL]* entries.
|
|
The following fixes an omission made when adding SLP permute nodes:
live lanes originating from those were not handled. We have to check
that we can extract the lane and have to actually code generate them.
PR tree-optimization/107254
* tree-vect-slp.cc (vect_slp_analyze_node_operations_1):
For permutes also analyze live lanes.
(vect_schedule_slp_node): For permutes also code generate
live lane extracts.
* gfortran.dg/vect/pr107254.f90: New testcase.
|
|
This is the infamous PR rtl-optimization/38644 rearing its ugly head for
leaf functions on SPARC more than a decade later... Richard E.'s generic
solution has never been implemented so let's do as other RISC back-ends did.
gcc/
PR target/107248
* config/sparc/sparc.cc (sparc_expand_prologue): Emit a frame
blockage for leaf functions.
(sparc_flat_expand_prologue): Emit frame instead of full blockage.
(sparc_expand_epilogue): Emit a frame blockage for leaf functions.
(sparc_flat_expand_epilogue): Emit frame instead of full blockage.
|
|
This makes the comment easier to read in the source, without altering
the Doxygen output.
libstdc++-v3/ChangeLog:
* include/std/iostream: Use markdown in Doxygen comment.
|
|
Add a test to catch regression in line counts for labels on top of
then/else blocks. Only the 'goto <label>' should contribute to the line
counter for the label, not the if.
gcc/testsuite/ChangeLog:
* gcc.misc-tests/gcov-4.c: New testcase.
|
|
The coverage support will under some conditions decide to split edges to
accurately report coverage. By running the test suite with/without this
edge splitting a small diff shows up, addressed by this patch, which
should catch future regressions.
Removing the edge splitting:
$ diff --git a/gcc/profile.cc b/gcc/profile.cc
--- a/gcc/profile.cc
+++ b/gcc/profile.cc
@@ -1244,19 +1244,7 @@ branch_prob (bool thunk)
Don't do that when the locuses match, so
if (blah) goto something;
is not computed twice. */
- if (last
- && gimple_has_location (last)
- && !RESERVED_LOCATION_P (e->goto_locus)
- && !single_succ_p (bb)
- && (LOCATION_FILE (e->goto_locus)
- != LOCATION_FILE (gimple_location (last))
- || (LOCATION_LINE (e->goto_locus)
- != LOCATION_LINE (gimple_location (last)))))
- {
- basic_block new_bb = split_edge (e);
- edge ne = single_succ_edge (new_bb);
- ne->goto_locus = e->goto_locus;
- }
+
if ((e->flags & (EDGE_ABNORMAL | EDGE_ABNORMAL_CALL))
&& e->dest != EXIT_BLOCK_PTR_FOR_FN (cfun))
need_exit_edge = 1;
Assuming the .gcov files from make check-gcc RUNTESTFLAGS=gcov.exp are
kept:
$ diff -r no-split-edge with-split-edge | grep -C 2 -E "^[<>]\s\s"
diff -r sans-split-edge/gcc/gcov-4.c.gcov with-split-edge/gcc/gcov-4.c.gcov
228c228
< -: 224: break;
---
> 1: 224: break;
231c231
< -: 227: break;
---
> #####: 227: break;
237c237
< -: 233: break;
---
> 2: 233: break;
gcc/testsuite/ChangeLog:
* g++.dg/gcov/gcov-1.C: Add line count check.
* gcc.misc-tests/gcov-4.c: Likewise.
|
|
Here is a complete patch to add std::bfloat16_t support on
x86 (AArch64 and ARM left for later). Almost no BFmode optabs
are added by the patch, so for binops/unops it extends to SFmode
first and then truncates back to BFmode.
For {HF,SF,DF,XF,TF}mode -> BFmode conversions libgcc has implementations
of all those conversions so that we avoid double rounding, for
BFmode -> {DF,XF,TF}mode conversions to avoid growing libgcc too much
it emits BFmode -> SFmode conversion first and then converts to the even
wider mode, neither step should be imprecise.
For BFmode -> HFmode, it first emits a precise BFmode -> SFmode conversion
and then SFmode -> HFmode, because neither format is a subset or superset
of the other, while SFmode is a superset of both.
expr.cc then contains a -ffast-math optimization of the BF -> SF and
SF -> BF conversions if we don't optimize for space (and for the latter
if -frounding-math isn't enabled either).
For x86, perhaps truncsfbf2 optab could be defined for TARGET_AVX512BF16
but IMNSHO should FAIL if !flag_finite_math || flag_rounding_math
|| !flag_unsafe_math_optimizations, because I think the insn doesn't
raise on sNaNs, hardcodes round to nearest and flushes denormals to zero.
By default (unless x86 -fexcess-precision=16) we use float excess
precision for BFmode, so truncate only on explicit casts and assignments.
The patch introduces a single __bf16 builtin - __builtin_nansf16b,
because (__bf16) __builtin_nansf ("") will drop the sNaN into qNaN,
and uses f16b suffix instead of bf16 because there would be ambiguity on
log vs. logb - __builtin_logbf16 could be either log with bf16 suffix
or logb with f16 suffix. In other cases libstdc++ should mostly use
__builtin_*f for std::bfloat16_t overloads (we have a problem with
std::nextafter though but that one we have also for std::float16_t).
2022-10-14 Jakub Jelinek <jakub@redhat.com>
gcc/
* tree-core.h (enum tree_index): Add TI_BFLOAT16_TYPE.
* tree.h (bfloat16_type_node): Define.
* tree.cc (excess_precision_type): Promote bfloat16_type_mode
like float16_type_mode.
(build_common_tree_nodes): Initialize bfloat16_type_node if
BFmode is supported.
* expmed.h (maybe_expand_shift): Declare.
* expmed.cc (maybe_expand_shift): No longer static.
* expr.cc (convert_mode_scalar): Don't ICE on BF -> HF or HF -> BF
conversions. If there is no optab, handle BF -> {DF,XF,TF,HF}
conversions as separate BF -> SF -> {DF,XF,TF,HF} conversions, add
-ffast-math generic implementation for BF -> SF and SF -> BF
conversions.
* builtin-types.def (BT_BFLOAT16, BT_FN_BFLOAT16_CONST_STRING): New.
* builtins.def (BUILT_IN_NANSF16B): New builtin.
* fold-const-call.cc (fold_const_call): Handle CFN_BUILT_IN_NANSF16B.
* config/i386/i386.cc (classify_argument): Handle E_BCmode.
(ix86_libgcc_floating_mode_supported_p): Also return true for BFmode
for -msse2.
(ix86_mangle_type): Mangle BFmode as DF16b.
(ix86_invalid_conversion, ix86_invalid_unary_op,
ix86_invalid_binary_op): Remove.
(TARGET_INVALID_CONVERSION, TARGET_INVALID_UNARY_OP,
TARGET_INVALID_BINARY_OP): Don't redefine.
* config/i386/i386-builtins.cc (ix86_bf16_type_node): Remove.
(ix86_register_bf16_builtin_type): Use bfloat16_type_node rather than
ix86_bf16_type_node, only create it if still NULL.
* config/i386/i386-builtin-types.def (BFLOAT16): Likewise.
* config/i386/i386.md (cbranchbf4, cstorebf4): New expanders.
gcc/c-family/
* c-cppbuiltin.cc (c_cpp_builtins): If bfloat16_type_node,
predefine __BFLT16_*__ macros and for C++23 also
__STDCPP_BFLOAT16_T__. Predefine bfloat16_type_node related
macros for -fbuilding-libgcc.
* c-lex.cc (interpret_float): Handle CPP_N_BFLOAT16.
gcc/c/
* c-typeck.cc (convert_arguments): Don't promote __bf16 to
double.
gcc/cp/
* cp-tree.h (extended_float_type_p): Return true for
bfloat16_type_node.
* typeck.cc (cp_compare_floating_point_conversion_ranks): Set
extended{1,2} if mv{1,2} is bfloat16_type_node. Adjust comment.
gcc/testsuite/
* lib/target-supports.exp (check_effective_target_bfloat16,
check_effective_target_bfloat16_runtime, add_options_for_bfloat16):
New.
* gcc.dg/torture/bfloat16-basic.c: New test.
* gcc.dg/torture/bfloat16-builtin.c: New test.
* gcc.dg/torture/bfloat16-builtin-issignaling-1.c: New test.
* gcc.dg/torture/bfloat16-complex.c: New test.
* gcc.dg/torture/builtin-issignaling-1.c: Allow to be includable
from bfloat16-builtin-issignaling-1.c.
* gcc.dg/torture/floatn-basic.h: Allow to be includable from
bfloat16-basic.c.
* gcc.target/i386/vect-bfloat16-typecheck_2.c: Adjust expected
diagnostics.
* gcc.target/i386/sse2-bfloat16-scalar-typecheck.c: Likewise.
* gcc.target/i386/vect-bfloat16-typecheck_1.c: Likewise.
* g++.target/i386/bfloat_cpp_typecheck.C: Likewise.
libcpp/
* include/cpplib.h (CPP_N_BFLOAT16): Define.
* expr.cc (interpret_float_suffix): Handle bf16 and BF16 suffixes for
C++.
libgcc/
* config/i386/t-softfp (softfp_extensions): Add bfsf.
(softfp_truncations): Add tfbf xfbf dfbf sfbf hfbf.
(CFLAGS-extendbfsf2.c, CFLAGS-truncsfbf2.c, CFLAGS-truncdfbf2.c,
CFLAGS-truncxfbf2.c, CFLAGS-trunctfbf2.c, CFLAGS-trunchfbf2.c): Add
-msse2.
* config/i386/libgcc-glibc.ver (GCC_13.0.0): Export
__extendbfsf2 and __trunc{s,d,x,t,h}fbf2.
* config/i386/sfp-machine.h (_FP_NANSIGN_B): Define.
* config/i386/64/sfp-machine.h (_FP_NANFRAC_B): Define.
* config/i386/32/sfp-machine.h (_FP_NANFRAC_B): Define.
* soft-fp/brain.h: New file.
* soft-fp/truncsfbf2.c: New file.
* soft-fp/truncdfbf2.c: New file.
* soft-fp/truncxfbf2.c: New file.
* soft-fp/trunctfbf2.c: New file.
* soft-fp/trunchfbf2.c: New file.
* soft-fp/truncbfhf2.c: New file.
* soft-fp/extendbfsf2.c: New file.
libiberty/
* cp-demangle.h (D_BUILTIN_TYPE_COUNT): Increment.
* cp-demangle.c (cplus_demangle_builtin_types): Add std::bfloat16_t
entry.
(cplus_demangle_type): Demangle DF16b.
* testsuite/demangle-expected (_Z3xxxDF16b): New test.
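
The BF -> SF step is exact because a bfloat16 value has the same layout as the upper 16 bits of an IEEE binary32 value. The sketch below shows that on raw bit patterns. It is an illustration under stated assumptions, not the expr.cc or libgcc code: float is assumed to be IEEE binary32, the function names are made up, and the narrowing direction simply truncates instead of rounding as the libgcc routines do.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert (sizeof (float) == sizeof (std::uint32_t),
	       "sketch assumes a 32-bit float");

// Widen: the bfloat16 bits become the high half of the binary32 bits.
float bf16_bits_to_float_sketch (std::uint16_t bits)
{
  std::uint32_t wide = std::uint32_t (bits) << 16;
  float f;
  std::memcpy (&f, &wide, sizeof f);
  return f;
}

// Narrow by truncation: drop the low 16 bits without rounding.
std::uint16_t float_to_bf16_bits_truncate_sketch (float f)
{
  std::uint32_t wide;
  std::memcpy (&wide, &f, sizeof wide);
  return std::uint16_t (wide >> 16);
}

// Example: 1.0f is 0x3F800000 in binary32, so its bfloat16 pattern
// is 0x3F80.
inline void bf16_example_sketch ()
{
  assert (bf16_bits_to_float_sketch (0x3F80) == 1.0f);
  assert (float_to_bf16_bits_truncate_sketch (1.0f) == 0x3F80);
}
```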
|
|
PR87390]
The following incremental patch implements the C11 behavior (for all C++
versions) for
cond ? int : float
cond ? float : int
int cmp float
float cmp int
where int is any integral type, float any floating point type with
excess precision and cmp ==, !=, >, <, >=, <= and <=>.
2022-10-14 Jakub Jelinek <jakub@redhat.com>
PR c/82071
PR c/87390
PR c++/107097
gcc/cp/
* cp-tree.h (cp_ep_convert_and_check): Remove.
* cvt.cc (cp_ep_convert_and_check): Remove.
* call.cc (build_conditional_expr): Use excess precision for ?: with
one arm floating and another integral. Don't convert first to
semantic result type from integral types.
(convert_like_internal): Don't call cp_ep_convert_and_check, instead
just strip EXCESS_PRECISION_EXPR before calling cp_convert_and_check
or cp_convert.
* typeck.cc (cp_build_binary_op): Set may_need_excess_precision
for comparisons or SPACESHIP_EXPR with at least one operand integral.
Don't compute semantic_result_type if build_type is non-NULL. Call
cp_convert_and_check instead of cp_ep_convert_and_check.
gcc/testsuite/
* gcc.target/i386/excess-precision-8.c: For C++ wrap abort and
exit declarations into extern "C" block.
* gcc.target/i386/excess-precision-10.c: Likewise.
* g++.target/i386/excess-precision-7.C: Remove.
* g++.target/i386/excess-precision-8.C: New test.
* g++.target/i386/excess-precision-9.C: Remove.
* g++.target/i386/excess-precision-10.C: New test.
* g++.target/i386/excess-precision-12.C: New test.
|
|
The following patch implements excess precision support for C++.
Like for C, it uses EXCESS_PRECISION_EXPR tree to say that its operand
is evaluated in excess precision and what the semantic type of the
expression is.
In most places I've followed what the C FE does in similar spots, so
e.g. for binary ops if one or both operands are already
EXCESS_PRECISION_EXPR, strip those away or for operations that might need
excess precision (+, -, *, /) check if the operands should use excess
precision and convert to that type and at the end wrap into
EXCESS_PRECISION_EXPR with the common semantic type.
This patch follows the C99 handling where it differs from C11 handling.
There are some cases which needed to be handled differently, the C FE can
just strip EXCESS_PRECISION_EXPR (replace it with its operand) when handling
explicit cast, but that IMHO isn't right for C++ - which exact
conversion should be used (e.g. if user conversion or standard or their
sequence) should be decided based on the semantic type (i.e. type of
EXCESS_PRECISION_EXPR), and that decision continues in convert_like* where
we pick the right user conversion, again, if say some class has ctor
from double and long double and we are on ia32 with standard excess
precision promoting float/double to long double, then we should pick the
ctor from double. Or when some other class has ctor from just double,
and EXCESS_PRECISION_EXPR semantic type is float, we should choose the
user ctor from double, but actually just convert the long double excess
precision to double and not to float first. We need to make sure
even identity conversion converts from excess precision to the semantic one
though, but if identity is chained with other conversions, we don't want
the identity next_conversion to drop to semantic precision only to widen
afterwards.
The existing testcases tweaks were for cases on i686-linux where excess
precision breaks those tests, e.g. if we have
double d = 4.2;
if (d == 4.2)
then it does the expected thing only with -fexcess-precision=fast,
because with -fexcess-precision=standard it is actually
double d = 4.2;
if ((long double) d == 4.2L)
where 4.2L is different from 4.2. I've added -fexcess-precision=fast
to some tests and changed other tests to use constants that are exactly
representable and don't suffer from these excess precision issues.
There is one exception, pr68180.C looks like a bug in the patch which is
also present in the C FE (so I'd like to get it resolved incrementally
in both). Reduced testcase:
typedef float __attribute__((vector_size (16))) float32x4_t;
float32x4_t foo(float32x4_t x, float y) { return x + y; }
with -m32 -std=c11 -Wno-psabi or -m32 -std=c++17 -Wno-psabi
it is rejected with:
pr68180.c:2:52: error: conversion of scalar ‘long double’ to vector ‘float32x4_t’ {aka ‘__vector(4) float’} involves truncation
but without excess precision (say just -std=c11 -Wno-psabi or -std=c++17 -Wno-psabi)
it is accepted. Perhaps we should pass down the semantic type to
scalar_to_vector and use the semantic type rather than excess precision type
in the diagnostics.
2022-10-14 Jakub Jelinek <jakub@redhat.com>
PR middle-end/323
PR c++/107097
gcc/
* doc/invoke.texi (-fexcess-precision=standard): Mention that the
option now also works in C++.
gcc/c-family/
* c-common.def (EXCESS_PRECISION_EXPR): Remove comment part about
the tree being specific to C/ObjC.
* c-opts.cc (c_common_post_options): Handle flag_excess_precision
in C++ the same as in C.
* c-lex.cc (interpret_float): Set const_type to excess_precision ()
even for C++.
gcc/cp/
* parser.cc (cp_parser_primary_expression): Handle
EXCESS_PRECISION_EXPR with REAL_CST operand the same as REAL_CST.
* cvt.cc (cp_ep_convert_and_check): New function.
* call.cc (build_conditional_expr): Add excess precision support.
When type_after_usual_arithmetic_conversions returns error_mark_node,
use gcc_checking_assert that it is because of uncomparable floating
point ranks instead of checking all those conditions and make it
work also with complex types.
(convert_like_internal): Likewise. Add NESTED_P argument, pass true
to recursive calls to convert_like.
(convert_like): Add NESTED_P argument, pass it through to
convert_like_internal. For other overload pass false to it.
(convert_like_with_context): Pass false to NESTED_P.
(convert_arg_to_ellipsis): Add excess precision support.
(magic_varargs_p): For __builtin_is{finite,inf,inf_sign,nan,normal}
and __builtin_fpclassify return 2 instead of 1, document what it
means.
(build_over_call): Don't handle former magic 2 which is no longer
used, instead for magic 1 remove EXCESS_PRECISION_EXPR.
(perform_direct_initialization_if_possible): Pass false to NESTED_P
convert_like argument.
* constexpr.cc (cxx_eval_constant_expression): Handle
EXCESS_PRECISION_EXPR.
(potential_constant_expression_1): Likewise.
* pt.cc (tsubst_copy, tsubst_copy_and_build): Likewise.
* cp-tree.h (cp_ep_convert_and_check): Declare.
* cp-gimplify.cc (cp_fold): Handle EXCESS_PRECISION_EXPR.
* typeck.cc (cp_common_type): For COMPLEX_TYPEs, return error_mark_node
if recursive call returned it.
(convert_arguments): For magic 1 remove EXCESS_PRECISION_EXPR.
(cp_build_binary_op): Add excess precision support. When
cp_common_type returns error_mark_node, use gcc_checking_assert that
it is because of uncomparable floating point ranks instead of checking
all those conditions and make it work also with complex types.
(cp_build_unary_op): Likewise.
(cp_build_compound_expr): Likewise.
(build_static_cast_1): Remove EXCESS_PRECISION_EXPR.
gcc/testsuite/
* gcc.target/i386/excess-precision-1.c: For C++ wrap abort and
exit declarations into extern "C" block.
* gcc.target/i386/excess-precision-2.c: Likewise.
* gcc.target/i386/excess-precision-3.c: Likewise. Remove
check_float_nonproto and check_double_nonproto tests for C++.
* gcc.target/i386/excess-precision-7.c: For C++ wrap abort and
exit declarations into extern "C" block.
* gcc.target/i386/excess-precision-9.c: Likewise.
* g++.target/i386/excess-precision-1.C: New test.
* g++.target/i386/excess-precision-2.C: New test.
* g++.target/i386/excess-precision-3.C: New test.
* g++.target/i386/excess-precision-4.C: New test.
* g++.target/i386/excess-precision-5.C: New test.
* g++.target/i386/excess-precision-6.C: New test.
* g++.target/i386/excess-precision-7.C: New test.
* g++.target/i386/excess-precision-9.C: New test.
* g++.target/i386/excess-precision-11.C: New test.
* c-c++-common/dfp/convert-bfp-10.c: Add -fexcess-precision=fast
as dg-additional-options.
* c-c++-common/dfp/compare-eq-const.c: Likewise.
* g++.dg/cpp1z/constexpr-96862.C: Likewise.
* g++.dg/cpp1z/decomp12.C (main): Use 2.25 instead of 2.3 to
avoid excess precision differences.
* g++.dg/other/thunk1.C: Add -fexcess-precision=fast
as dg-additional-options.
* g++.dg/vect/pr64410.cc: Likewise.
* g++.dg/cpp1y/pr68180.C: Likewise.
* g++.dg/vect/pr89653.cc: Likewise.
* g++.dg/cpp0x/variadic-tuple.C: Likewise.
* g++.dg/cpp0x/nsdmi-union1.C: Use 4.25 instead of 4.2 to
avoid excess precision differences.
* g++.old-deja/g++.brendan/copy9.C: Add -fexcess-precision=fast
as dg-additional-options.
* g++.old-deja/g++.brendan/overload7.C: Likewise.
|
|
Implement the C2x feature of storage class specifiers in compound
literals. Such storage class specifiers (static, register or
thread_local; also constexpr, but we don't yet have C2x constexpr
support implemented) can be used before the type name (not mixed with
type specifiers, unlike in declarations) and have the same semantics
and constraints as for declarations of named objects. Also allow GNU
__thread to be used, given that thread_local can be.
Bootstrapped with no regressions for x86_64-pc-linux-gnu.
gcc/c/
* c-decl.cc (build_compound_literal): Add parameter scspecs.
Handle storage class specifiers.
* c-parser.cc (c_token_starts_compound_literal)
(c_parser_compound_literal_scspecs): New.
(c_parser_postfix_expression_after_paren_type): Add parameter
scspecs. Call pedwarn_c11 for use of storage class specifiers.
Update call to build_compound_literal.
(c_parser_cast_expression, c_parser_sizeof_expression)
(c_parser_alignof_expression): Handle storage class specifiers for
compound literals. Update calls to
c_parser_postfix_expression_after_paren_type.
(c_parser_postfix_expression): Update syntax comment.
* c-tree.h (build_compound_literal): Update prototype.
* c-typeck.cc (c_mark_addressable): Diagnose taking address of
register compound literal.
gcc/testsuite/
* gcc.dg/c11-complit-1.c, gcc.dg/c11-complit-2.c,
gcc.dg/c11-complit-3.c, gcc.dg/c2x-complit-2.c,
gcc.dg/c2x-complit-3.c, gcc.dg/c2x-complit-4.c,
gcc.dg/c2x-complit-5.c, gcc.dg/c2x-complit-6.c,
gcc.dg/c2x-complit-7.c, gcc.dg/c90-complit-2.c,
gcc.dg/gnu2x-complit-1.c, gcc.dg/gnu2x-complit-2.c: New tests.
|
|
|
|
If you compile the testcase with -O2 -fno-inline -Wall, you get:
In function 'process_array3':
cc1: warning: 'process_array4' accessing 4 bytes in a region of size 3 [-Wstringop-overflow=]
cc1: note: referencing argument 1 of type 'char[4]'
t.c:6:6: note: in a call to function 'process_array4'
6 | void process_array4 (char a[4], int n)
| ^~~~~~~~~~~~~~
cc1: warning: 'process_array4' accessing 4 bytes in a region of size 3 [-Wstringop-overflow=]
cc1: note: referencing argument 1 of type 'char[4]'
t.c:6:6: note: in a call to function 'process_array4'
That's because the ICF IPA pass has found the two functions to be identical
and turned process_array3 into a wrapper of process_array4.
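The testcase itself is not quoted in this message, so the following is only a hypothetical sketch of the kind of input that fits the description: two functions with identical bodies whose array parameters differ only in their declared bounds. The function names follow the diagnostics above; the `process` helper and the `sink` variable are assumptions added to keep the sketch self-contained.

```c
/* Hypothetical sketch; not the actual gcc.dg/Wstringop-overflow-89.c.  */

int sink;   /* assumed stand-in for an externally visible side effect */

void process (char c)
{
  sink += c;
}

void process_array4 (char a[4], int n)
{
  for (int i = 0; i < n; i++)
    process (a[i]);
}

/* Same body as process_array4, so ICF may turn this function into a thunk
   that forwards its 3-element argument to process_array4.  */
void process_array3 (char a[3], int n)
{
  for (int i = 0; i < n; i++)
    process (a[i]);
}
```

Once the bodies are merged, the forwarding call passes a parameter declared as char[3] to one declared as char[4], which is presumably what the access warning objected to before calls made from thunks were skipped.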
gcc/
* gimple-ssa-warn-access.cc (pass_waccess::check_call): Return
early for calls made from thunks.
gcc/testsuite/
* gcc.dg/Wstringop-overflow-89.c: New test.
|
|
Split out from the C++ contracts patch.
gcc/cp/ChangeLog:
* cp-tree.h: Fix whitespace.
* parser.h: Fix whitespace.
* decl.cc: Fix whitespace.
* parser.cc: Fix whitespace.
* pt.cc: Fix whitespace.
|
|
gcc/analyzer/ChangeLog:
PR analyzer/107210
* svalue.cc (constant_svalue::maybe_fold_bits_within): Only
attempt to extract individual bits when tree_fits_uhwi_p.
gcc/testsuite/ChangeLog:
PR analyzer/107210
* gfortran.dg/analyzer/pr107210.f90: New test.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
|
|
Fortranized testcases of commits r13-3257-ga58a965eb73
and r13-3258-g0ec4e93fb9f.
libgomp/ChangeLog:
* testsuite/libgomp.fortran/task-7.f90: New test.
* testsuite/libgomp.fortran/task-8.f90: New test.
* testsuite/libgomp.fortran/task-in-explicit-1.f90: New test.
* testsuite/libgomp.fortran/task-in-explicit-2.f90: New test.
* testsuite/libgomp.fortran/task-in-explicit-3.f90: New test.
* testsuite/libgomp.fortran/task-reduction-17.f90: New test.
* testsuite/libgomp.fortran/task-reduction-18.f90: New test.
|