The read-thread-pointer test may require GCC to be configured
with --enable-tls. If it is not, the x4 (aka tp) register will not
be present in the assembly code.
This patch makes the dg checks require TLS. The test is run
if GCC is configured with --enable-tls and marked as unsupported
if it is configured with --disable-tls.
Configured with --enable-tls:
=== gcc Summary ===
# of expected passes 16
Configured with --disable-tls:
=== gcc Summary ===
# of unsupported tests 8
gcc/testsuite/ChangeLog:
* gcc.target/riscv/read-thread-pointer.c: Add required tls.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
In early PHIOPT mode, the original minmax_replacement would
replace a PHI node with up to 2 min/max expressions in some cases;
this patch allows the sequence/match code to do that too.
OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
gcc/ChangeLog:
* tree-ssa-phiopt.cc (phiopt_early_allow): Allow for
up to 2 min/max expressions in the sequence/match code.
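For illustration only, here is a minimal C sketch of a branchy computation whose result can be expressed with two min/max operations. The function names (clamp_branchy, clamp_minmax, min_int, max_int) are hypothetical and do not come from the patch, and the sketch does not claim to reproduce the exact patterns phiopt matches. The two forms agree only under the stated assumption lo <= hi.

```c
#include <assert.h>

/* Branchy form: the value of r at the return is what a PHI node
   would merge from the three paths.  Assumes lo <= hi.  */
int clamp_branchy(int x, int lo, int hi)
{
    int r;
    assert(lo <= hi);
    if (x < lo)
        r = lo;
    else if (x > hi)
        r = hi;
    else
        r = x;
    return r;
}

/* Hypothetical helpers standing in for MIN_EXPR and MAX_EXPR.  */
static int min_int(int a, int b) { return a < b ? a : b; }
static int max_int(int a, int b) { return a > b ? a : b; }

/* The same result written as a sequence of two min/max expressions.  */
int clamp_minmax(int x, int lo, int hi)
{
    assert(lo <= hi);
    return min_int(max_int(x, lo), hi);
}
```

Calling both functions with the same arguments, for example (15, 0, 10), should give the same value as long as lo <= hi holds.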
|
|
While looking into moving optimizations from minmax_replacement
in phiopt to match.pd, I noticed that min/max were considered
trapping even if -ffinite-math-only was being used. This changes
those expressions to be treated like comparisons so that they are
not considered trapping if -ffinite-math-only is on.
OK? Bootstrapped and tested with no regressions on x86_64-linux-gnu.
gcc/ChangeLog:
* rtlanal.cc (may_trap_p_1): Treat SMIN/SMAX like
COMPARISON.
* tree-eh.cc (operation_could_trap_helper_p): Treat
MIN_EXPR/MAX_EXPR like other comparisons.
|
|
This simple patch moves the body of store_elim_worker
directly into pass_cselim::execute.
It also removes unneeded prototypes.
OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
gcc/ChangeLog:
* tree-ssa-phiopt.cc (cond_store_replacement): Remove
prototype.
(cond_if_else_store_replacement): Likewise.
(get_non_trapping): Likewise.
(store_elim_worker): Move into ...
(pass_cselim::execute): This.
|
|
Now that store elimination and phiopt do not
share outer code, we can move tree_ssa_phiopt_worker
directly into pass_phiopt::execute and remove
many declarations (prototypes) from the file.
gcc/ChangeLog:
* tree-ssa-phiopt.cc (two_value_replacement): Remove
prototype.
(match_simplify_replacement): Likewise.
(factor_out_conditional_conversion): Likewise.
(value_replacement): Likewise.
(minmax_replacement): Likewise.
(spaceship_replacement): Likewise.
(cond_removal_in_builtin_zero_pattern): Likewise.
(hoist_adjacent_loads): Likewise.
(tree_ssa_phiopt_worker): Move into ...
(pass_phiopt::execute): this.
|
|
The last cleanups made it easier to see
that we should split the store elimination
worker out of the tree_ssa_phiopt_worker function.
OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
gcc/ChangeLog:
* tree-ssa-phiopt.cc (tree_ssa_phiopt_worker): Remove
do_store_elim argument and split that part out to ...
(store_elim_worker): This new function.
(pass_cselim::execute): Call store_elim_worker.
(pass_phiopt::execute): Update call to tree_ssa_phiopt_worker.
|
|
I'm at Ventana now. Change my email address accordingly. Also, add
myself to the DCO list.
ChangeLog:
* MAINTAINERS: Change my email address.
|
|
I noticed this after adding a sanity check that the upper bound on the number
of iterations never drops to -1. It seems to be a relatively common case
(happening a few hundred times in the testsuite and also during bootstrap)
that loop-ch duplicates enough so the loop itself no longer loops.
This is later detected in loop unrolling, but since we test the number
of iterations anyway it seems better to do that earlier.
* cfgloopmanip.h (unloop_loops): Export.
* tree-ssa-loop-ch.cc (ch_base::copy_headers): Unloop loops
that no longer loop.
* tree-ssa-loop-ivcanon.cc (unloop_loops): Export; do not free
vectors of loops to unloop.
(canonicalize_induction_variables): Free vectors here.
(tree_unroll_loops_completely): Free vectors here.
|
|
The following generalizes the range-op for __builtin_expect
by using the fnspec machinery.
PR tree-optimization/109170
* gimple-range-op.cc (gimple_range_op_handler::maybe_builtin_call):
Handle __builtin_expect and similar via cfn_pass_through_arg1
and inspecting the call's fnspec.
* builtins.cc (builtin_fnspec): Handle BUILT_IN_EXPECT
and BUILT_IN_EXPECT_WITH_PROBABILITY.
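As a reminder of why a pass-through treatment is natural here, the following sketch shows that __builtin_expect returns its first argument unchanged; the second argument is only a hint. __builtin_expect is a GNU extension available in GCC and Clang. The function names below are hypothetical and the sketch says nothing about how the ranger implements this internally.

```c
#include <assert.h>

/* The call returns x itself; 42 is only the value the programmer
   claims to expect, so anything known about x carries over to the
   result.  */
long pass_through(long x)
{
    return __builtin_expect(x, 42);
}

/* Typical use: mark the error path as unlikely.  The branch still
   behaves exactly as if the condition were tested directly.  */
long checked_div(long num, long den)
{
    if (__builtin_expect(den == 0, 0))
        return 0;
    return num / den;
}
```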
|
|
There are still shells on some systems that lack the ability to start
scripts when not using the shell name explicitly. Adjust genmultilib
to use ${CONFIG_SHELL-/bin/sh} the same way configure does.
for gcc/ChangeLog
* genmultilib: Use CONFIG_SHELL to run sub-scripts.
|
|
The old legacy code would allow building ranges containing symbolics,
even though the entire ranger ecosystem does not handle them. These
were normalized into non-zero ranges by helper functions in VRP
(range_fold_*_expr) before calling the ranger. The only users of
these functions should have been legacy VRP, which is no more.
However, a handful of users crept into IPA, even though these
functions should never have been called outside of VRP or vr-values.
The issue here is that IPA is building a range of [&foo, &foo] and
expecting range_fold_binary to normalize it to non-zero. Fixed by
adding a helper function before calling the range_op handler.
I think this covers the problematic ranges. If not, I'll come up
with something more generalized that does not involve polluting
irange::set with the normalization code. After all, this only
involves a handful of IPA places.
I've also added an assert in irange::set() making it easier to detect
any possible fallout without having to drill deep into the setter.
gcc/ChangeLog:
PR tree-optimization/109639
* ipa-cp.cc (ipa_value_range_from_jfunc): Normalize range.
(propagate_vr_across_jump_function): Same.
* ipa-fnsummary.cc (evaluate_conditions_for_known_args): Same.
* ipa-prop.h (ipa_range_set_and_normalize): New.
* value-range.cc (irange::set): Assert min and max are INTEGER_CST.
|
|
When we simplify a BIT_FIELD_REF of a CTOR like { _1, _2, _3, _4 }
and attempt to produce (view converted) { _1, _2 } for a selected
subset we fail to realize this cannot be done from match.pd since
we have no way to write the resulting CTOR "operation" and the
built CTOR { _1, _2 } isn't a GIMPLE value.
This kind of simplification has to be done in forwprop (or would
need a match.pd syntax extension) where we can split out the CTOR
to a separate stmt.
The following disables this particular simplification when we are
simplifying GIMPLE. With enhanced IL checking this otherwise
causes ICEs in the testsuite from vectorized code.
* match.pd (BIT_FIELD_REF CONSTRUCTOR@0 @1 @2): Do not
create a CTOR operand in the result when simplifying GIMPLE.
|
|
When for example complex lowering wants to extract the imaginary
part of a complex variable for lowering a complex move we can
end up with it generating __imag <VIEW_CONVERT_EXPR <_22> > which
is valid GENERIC. It then feeds that to the gimplifier via
force_gimple_operand but that fails to split up this chain
of handled components, generating invalid GIMPLE caught by
verification when PR109644 is fixed.
The following rectifies this by noting in gimplify_compound_lval
when the base object which we gimplify first ends up being a
register.
* gimplify.cc (gimplify_compound_lval): When the base
gimplified to a register make sure to split up chains
of operations.
|
|
The following addresses IPA param manipulation (through IPA SRA)
replacing
BIT_FIELD_REF <*this_8(D), 8, 56>
with
BIT_FIELD_REF <VIEW_CONVERT_EXPR<const struct profile_count>(ISRA.814), 8, 56>
which is supposed to be invalid GIMPLE (ISRA.814 is a register).
There's currently insufficient checking in place to catch this in the
IL verifier but I am working on that as part of fixing PR109594.
The solution for the particular testcase where I ran into this is
to split the conversion to a separate stmt. Generally the modification
phase is set up for this but the extra_stmts sequence isn't passed
around everywhere. The following passes it to modify_expression
from modify_assignment when rewriting the RHS.
PR ipa/109607
* ipa-param-manipulation.h
(ipa_param_body_adjustments::modify_expression): Add extra_stmts
argument.
* ipa-param-manipulation.cc
(ipa_param_body_adjustments::modify_expression): Likewise.
When we need a conversion and the replacement is a register
split the conversion out.
(ipa_param_body_adjustments::modify_assignment): Pass
extra_stmts to RHS modify_expression.
* g++.dg/torture/pr109607.C: New testcase.
|
|
libstdc++-v3/ChangeLog:
* include/bits/mofunc_impl.h: Fix typo in doxygen comment.
* include/std/format: Likewise.
|
|
libstdc++-v3/ChangeLog:
* doc/doxygen/user.cfg.in (FORMULA_TRANSPARENT, DOT_FONTNAME)
(DOT_FONTSIZE, DOT_TRANSPARENT): Remove obsolete options.
|
|
Including the header source code in the doxygen-generated PDF file makes
it too large, and causes pdflatex to run out of memory. If we only set
SOURCE_BROWSER=YES for the HTML docs then we won't include the sources
in the PDF file.
There are several macros defined for std::valarray that are only used to
generate repetitive code and then #undef'd. Those aren't useful in the
doxygen docs, especially the ones that reuse the same name in different
files. Omitting them avoids warnings about duplicate labels in the
refman.tex file.
libstdc++-v3/ChangeLog:
* doc/doxygen/user.cfg.in (SOURCE_BROWSER): Only set to YES for
HTML docs.
* include/bits/gslice_array.h (_DEFINE_VALARRAY_OPERATOR): Omit
from doxygen docs.
* include/bits/indirect_array.h (_DEFINE_VALARRAY_OPERATOR):
Likewise.
* include/bits/mask_array.h (_DEFINE_VALARRAY_OPERATOR):
Likewise.
* include/bits/slice_array.h (_DEFINE_VALARRAY_OPERATOR):
Likewise.
* include/std/valarray (_DEFINE_VALARRAY_UNARY_OPERATOR)
(_DEFINE_VALARRAY_AUGMENTED_ASSIGNMENT)
(_DEFINE_VALARRAY_EXPR_AUGMENTED_ASSIGNMENT)
(_DEFINE_BINARY_OPERATOR): Likewise.
|
|
libstdc++-v3/ChangeLog:
* include/bits/memory_resource.h: Improve doxygen comments.
* include/std/memory_resource: Likewise.
|
|
libstdc++-v3/ChangeLog:
PR libstdc++/40380
* include/bits/basic_string.h: Improve doxygen comments.
* include/bits/cow_string.h: Likewise.
* include/bits/forward_list.h: Likewise.
* include/bits/fs_dir.h: Likewise.
* include/bits/fs_path.h: Likewise.
* include/bits/quoted_string.h: Likewise.
* include/bits/stl_bvector.h: Likewise.
* include/bits/stl_map.h: Likewise.
* include/bits/stl_multimap.h: Likewise.
* include/bits/stl_multiset.h: Likewise.
* include/bits/stl_set.h: Likewise.
* include/bits/stl_vector.h: Likewise.
* include/bits/unordered_map.h: Likewise.
* include/bits/unordered_set.h: Likewise.
* include/std/filesystem: Likewise.
* include/std/iomanip: Likewise.
|
|
This changes std::random_device constructors to throw std::system_error
(with EINVAL as the error code) when the constructor argument is
invalid. We can also throw std::system_error when read(2) fails so that
the exception includes the additional information provided by errno.
As noted in the PR, this is consistent with libc++, and doesn't break
any existing code which catches std::runtime_error, because those
handlers will still catch std::system_error.
libstdc++-v3/ChangeLog:
PR libstdc++/105081
* src/c++11/random.cc (__throw_syserr): New function.
(random_device::_M_init, random_device::_M_init_pretr1): Use new
function for bad tokens.
(random_device::_M_getval): Use new function for read errors.
* testsuite/util/testsuite_random.h (random_device_available):
Change catch handler to use std::system_error.
|
|
On the following testcase we ICE, because after we emit the
"variable-sized object may not be initialized except with an empty initializer"
error we don't really reset the initializer to error_mark_node and then at
-Wformat checking time we ICE on seeing a STRING_CST initializer for a VLA.
The following patch just arranges for error_mark_node to be returned after
the error diagnostics.
2023-04-27 Jakub Jelinek <jakub@redhat.com>
PR c/109409
* c-parser.cc (c_parser_initializer): Move diagnostics about
initialization of variable sized object with non-empty initializer
after c_parser_expr_no_commas call and ret.set_error (); after it.
* gcc.dg/pr109409.c: New test.
|
|
The change to allow empty initializers in C broke error-recovery on the
following testcase. We emit the "function %qD is initialized like a
variable" error early; if the initializer is non-empty, we just emit
another error that the initializer is invalid. Previously if it was empty,
we'd emit another error that scalar is being initialized by empty
initializer (not really correct), but now we instead just try to
build_zero_cst for the FUNCTION_TYPE and ICE on it.
The following patch just emits the same diagnostics for the empty
initializers as we emit for the non-empty ones.
2023-04-27 Jakub Jelinek <jakub@redhat.com>
PR c/107682
PR c/109412
* c-typeck.cc (pop_init_level): If constructor_type is FUNCTION_TYPE,
reject empty initializer as invalid.
* gcc.dg/pr109412.c: New test.
|
|
gcc/ChangeLog:
* doc/extend.texi (Zero Length): Describe example.
|
|
We fail to verify the constraints under which we allow handled
components to wrap registers. The gcc.dg/pr70022.c testcase shows
that we happily end up with
_2 = VIEW_CONVERT_EXPR<int[4]>(v_1(D))
as produced by SSA rewrite and update_address_taken. But the intent
was that we wrap registers with at most a single level of handled
components and specifically only allow __real, __imag, BIT_FIELD_REF
and VIEW_CONVERT_EXPR on them, but not ARRAY_REF or COMPONENT_REF.
Together with the improved gimple_load predicate taking advantage
of the above and ASAN this eventually ICEd.
The following fixes update_address_taken as to this constraint.
PR tree-optimization/109594
* tree-ssa.cc (non_rewritable_mem_ref_base): Constrain
what we rewrite to a register based on the above.
|
|
RISC-V will emit ".option nopic" when -fno-pie is in effect, which
matches the generic pattern. Just like done for Alpha, special-case
RISC-V.
gcc/testsuite/
* c-c++-common/patchable_function_entry-decl.c: Special-case
RISC-V.
* c-c++-common/patchable_function_entry-default.c: Likewise.
* c-c++-common/patchable_function_entry-definition.c: Likewise.
|
|
For PR61445 I removed this assert, but PR108242 demonstrated why it's still
useful; to avoid regressing the former testcase I check pattern_defined
in the assert.
This reverts r212524.
PR c++/61445
gcc/cp/ChangeLog:
* pt.cc (instantiate_decl): Assert !defer_ok for local
class members.
|
|
With this, execution time for e.g. __moddi3 goes from 59 to 40 cycles in
the "fast" case or from 290 to 200 cycles in the "slow" case (when the
!TARGET_HAS_NO_HW_DIVIDE variant calls division and modulus functions
for 32-bit SImode), as exposed by gcc.c-torture/execute/arith-rand-ll.c
compiled for -march=v10.
Unfortunately, it just puts a performance improvement "dent" of 0.07%
in an arith-rand-ll.c-based performance test - where all loops are also
reduced to 1/10.
The size of every affected libgcc function is reduced to less than
half and they are all now leaf functions.
* config/cris/t-cris (HOST_LIBGCC2_CFLAGS): Add
-DTARGET_HAS_NO_HW_DIVIDE.
|
|
This patch fixes whitespace errors introduced with
https://gcc.gnu.org/pipermail/gcc-patches/2023-April/616807.html
2023-04-26 Patrick O'Neill <patrick@rivosinc.com>
gcc/ChangeLog:
* config/riscv/riscv.cc: Fix whitespace.
* config/riscv/sync.md: Fix whitespace.
Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>
|
|
It occurred to me that we have a perfectly good DECL_INITIAL field to put
the instantiated DMI into, we don't need a separate hash table.
gcc/cp/ChangeLog:
* init.cc (nsdmi_inst): Remove.
(maybe_instantiate_nsdmi_init): Use DECL_INITIAL instead.
|
|
The earlier fix for PR109241 avoided the crash by handling a type with no
TREE_BINFO. But we want to move toward doing the partial substitution of
classes in generic lambdas, so let's take a step in that direction.
PR c++/109241
gcc/cp/ChangeLog:
* pt.cc (instantiate_class_template): Do partially instantiate.
(tsubst_expr): Do call complete_type for partial instantiations.
|
|
Normally we re-instantiate a function declaration when we start to
instantiate the body in case of multiple declarations. In this wacky
testcase, this causes a problem because the type of the w_counter parameter
depends on its declaration not being in scope yet, so the name lookup only
finds the previous declaration. This isn't a problem for member functions,
since they aren't subject to argument-dependent lookup. So let's just skip
the regeneration for hidden friends.
PR c++/69836
gcc/cp/ChangeLog:
* pt.cc (regenerate_decl_from_template): Skip unique friends.
gcc/testsuite/ChangeLog:
* g++.dg/template/friend76.C: New test.
|
|
This introduces an early exit test to most_specialized_partial_spec for
templates which have no partial specializations, saving some unnecessary
work during class template instantiation in the common case. In passing,
modernize the code a bit.
gcc/cp/ChangeLog:
* pt.cc (most_specialized_partial_spec): Exit early when
DECL_TEMPLATE_SPECIALIZATIONS is empty. Move local variable
declarations closer to their first use. Remove redundant
flag_concepts test. Remove redundant forward declaration.
|
|
Sparsely used ssa caches can benefit from using a bitmap to
determine if a name already has an entry. Utilize it in the path query
and remove its private bitmap for tracking the same info.
Also use it in the "assume" query class.
PR tree-optimization/108697
* gimple-range-cache.cc (ssa_global_cache::clear_range): Do
not clear the vector on an out of range query.
(ssa_cache::dump): Use dump_range_query instead of get_range.
(ssa_cache::dump_range_query): New.
(ssa_lazy_cache::dump_range_query): New.
(ssa_lazy_cache::set_range): New.
* gimple-range-cache.h (ssa_cache::dump_range_query): New.
(class ssa_lazy_cache): New.
(ssa_lazy_cache::ssa_lazy_cache): New.
(ssa_lazy_cache::~ssa_lazy_cache): New.
(ssa_lazy_cache::get_range): New.
(ssa_lazy_cache::clear_range): New.
(ssa_lazy_cache::clear): New.
(ssa_lazy_cache::dump): New.
* gimple-range-path.cc (path_range_query::path_range_query): Do
not allocate a ssa_cache object nor has_cache bitmap.
(path_range_query::~path_range_query): Do not free objects.
(path_range_query::clear_cache): Remove.
(path_range_query::get_cache): Adjust.
(path_range_query::set_cache): Remove.
(path_range_query::dump): Don't call through a pointer.
(path_range_query::internal_range_of_expr): Set cache directly.
(path_range_query::reset_path): Clear cache directly.
(path_range_query::ssa_range_in_phi): Fold with globals only.
(path_range_query::compute_ranges_in_phis): Simply set range.
(path_range_query::compute_ranges_in_block): Call cache directly.
* gimple-range-path.h (class path_range_query): Replace bitmap
and cache pointer with lazy cache object.
* gimple-range.h (class assume_query): Use ssa_lazy_cache.
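The idea of pairing a value table with a bitmap of occupied slots can be sketched in plain C as follows. This is a toy under stated assumptions, not the ssa_lazy_cache implementation: the struct, the function names, the fixed slot count and the int payload are all hypothetical. The point it shows is that only the small bitmap has to be cleared, while the value array can stay uninitialized.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define CACHE_SLOTS 256u   /* hypothetical fixed capacity */

struct lazy_cache {
    uint64_t active[CACHE_SLOTS / 64];  /* bit i set: slot i holds a value */
    int value[CACHE_SLOTS];             /* never cleared; guarded by the bitmap */
};

/* Clearing the cache only touches the bitmap, not the values.  */
void lazy_cache_clear(struct lazy_cache *c)
{
    memset(c->active, 0, sizeof c->active);
}

/* Store a value; returns true if the slot already had an entry.  */
bool lazy_cache_set(struct lazy_cache *c, unsigned idx, int v)
{
    assert(idx < CACHE_SLOTS);
    bool had = (c->active[idx / 64] >> (idx % 64)) & 1u;
    c->active[idx / 64] |= (uint64_t)1 << (idx % 64);
    c->value[idx] = v;
    return had;
}

/* Fetch a value; returns false if the slot has no entry.  */
bool lazy_cache_get(const struct lazy_cache *c, unsigned idx, int *out)
{
    if (idx >= CACHE_SLOTS || !((c->active[idx / 64] >> (idx % 64)) & 1u))
        return false;
    *out = c->value[idx];
    return true;
}
```

A caller would run lazy_cache_clear once before first use, then use the boolean results of set and get to tell whether an entry already existed.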
|
|
This renames the ssa_global_cache to be ssa_cache. The original use was
to function as a global cache, but its uses have expanded. Remove all mention
of "global" from the class and methods. Also add a has_range method.
* gimple-range-cache.cc (ssa_cache::ssa_cache): Rename.
(ssa_cache::~ssa_cache): Rename.
(ssa_cache::has_range): New.
(ssa_cache::get_range): Rename.
(ssa_cache::set_range): Rename.
(ssa_cache::clear_range): Rename.
(ssa_cache::clear): Rename.
(ssa_cache::dump): Rename and use get_range.
(ranger_cache::get_global_range): Use get_range and set_range.
(ranger_cache::range_of_def): Use get_range.
* gimple-range-cache.h (class ssa_cache): Rename class and methods.
(class ranger_cache): Use ssa_cache.
* gimple-range-path.cc (path_range_query::path_range_query): Use
ssa_cache.
(path_range_query::get_cache): Use get_range.
(path_range_query::set_cache): Use set_range.
* gimple-range-path.h (class path_range_query): Use ssa_cache.
* gimple-range.cc (assume_query::assume_range_p): Use get_range.
(assume_query::range_of_expr): Use get_range.
(assume_query::assume_query): Use set_range.
(assume_query::calculate_op): Use get_range and set_range.
* gimple-range.h (class assume_query): Use ssa_cache.
|
|
Add a sparse vector class for the cache and use it by default.
Rename the evrp_* params to vrp_*, and add a param for small CFGs which use
just the original basic vector.
* gimple-range-cache.cc (sbr_vector::sbr_vector): Add parameter
and local to optionally zero memory.
(sbr_vector::grow): Only zero memory if flag is set.
(class sbr_lazy_vector): New.
(sbr_lazy_vector::sbr_lazy_vector): New.
(sbr_lazy_vector::set_bb_range): New.
(sbr_lazy_vector::get_bb_range): New.
(sbr_lazy_vector::bb_range_p): New.
(block_range_cache::set_bb_range): Check flags and use sbr_lazy_vector.
* gimple-range-gori.cc (gori_map::calculate_gori): Use
param_vrp_switch_limit.
(gori_compute::gori_compute): Use param_vrp_switch_limit.
* params.opt (vrp_sparse_threshold): Rename from evrp_sparse_threshold.
(vrp_switch_limit): Rename from evrp_switch_limit.
(vrp_vector_threshold): New.
|
|
If either of the SSA names in a comparison does not have any equivalences
or relations, we can short-circuit the check slightly.
* value-relation.cc (dom_oracle::query_relation): Check early for lack
of any relation.
* value-relation.h (equiv_oracle::has_equiv_p): New.
|
|
If the direct dependence fields point directly to an ssa-name,
it's possible that an optimization frees an ssa-name, and the value
pointed to may now be in the free list. Simply maintain the ssa
version number instead.
PR tree-optimization/109417
* gimple-range-gori.cc (range_def_chain::register_dependency):
Save the ssa version number, not the pointer.
(gori_compute::may_recompute_p): No need to check if a dependency
is in the free list.
* gimple-range-gori.h (class range_def_chain): Change ssa1 and ssa2
fields to be unsigned int instead of trees.
(range_def_chain::depend1): Adjust.
(range_def_chain::depend2): Adjust.
* gimple-range.h: Include "ssa.h" to inline ssa_name().
|
|
AIX 7.2 minimum ISA is POWER7 and AIX 7.3 minimum ISA is POWER8.
This patch changes the aix72.h configuration to POWER7 with VSX enabled
by default (with the AIX VSX ABI limitations), matching LLVM on AIX,
and changes the aix73.h configuration to POWER8.
gcc/ChangeLog:
* config/rs6000/aix72.h (TARGET_DEFAULT): Use ISA_2_6_MASKS_SERVER.
* config/rs6000/aix73.h (TARGET_DEFAULT): Use ISA_2_7_MASKS_SERVER.
(PROCESSOR_DEFAULT): Use PROCESSOR_POWER8.
Signed-off-by: David Edelsohn <dje.gcc@gmail.com>
|
|
RISC-V has no support for subword atomic operations; code currently
generates libatomic library calls.
This patch changes the default behavior to inline subword atomic calls
(using the same logic as the existing library call).
Behavior can be specified using the -minline-atomics and
-mno-inline-atomics command line flags.
gcc/libgcc/config/riscv/atomic.c has the same logic implemented in asm.
This will need to stay for backwards compatibility and the
-mno-inline-atomics flag.
2023-04-18 Patrick O'Neill <patrick@rivosinc.com>
gcc/ChangeLog:
PR target/104338
* config/riscv/riscv-protos.h: Add helper function stubs.
* config/riscv/riscv.cc: Add helper functions for subword masking.
* config/riscv/riscv.opt: Add command-line flag.
* config/riscv/sync.md: Add masking logic and inline asm for fetch_and_op,
fetch_and_nand, CAS, and exchange ops.
* doc/invoke.texi: Add blurb regarding command-line flag.
libgcc/ChangeLog:
PR target/104338
* config/riscv/atomic.c: Add reference to duplicate logic.
gcc/testsuite/ChangeLog:
PR target/104338
* gcc.target/riscv/inline-atomics-1.c: New test.
* gcc.target/riscv/inline-atomics-2.c: New test.
* gcc.target/riscv/inline-atomics-3.c: New test.
* gcc.target/riscv/inline-atomics-4.c: New test.
* gcc.target/riscv/inline-atomics-5.c: New test.
* gcc.target/riscv/inline-atomics-6.c: New test.
* gcc.target/riscv/inline-atomics-7.c: New test.
* gcc.target/riscv/inline-atomics-8.c: New test.
Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
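The masking idea behind a subword atomic can be sketched in portable C11 as below. This is an assumption-laden illustration, not the code the patch emits: a word-sized compare-and-swap loop stands in for the target's word-sized atomic sequence, the function name subword_fetch_add_u8 is hypothetical, and the byte is selected by a "lane" counted from the least significant bits rather than by memory address, which sidesteps endianness.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Atomically add val to one 8-bit lane of a 32-bit word and return
   the previous lane value.  Lane 0 is bits 0..7, lane 3 is bits 24..31.  */
uint8_t subword_fetch_add_u8(_Atomic uint32_t *word, unsigned lane, uint8_t val)
{
    assert(lane < 4);
    unsigned shift = lane * 8u;
    uint32_t mask = (uint32_t)0xFFu << shift;
    uint32_t old = atomic_load(word);
    uint32_t desired;

    do {
        /* Extract the lane, do the 8-bit operation with wraparound,
           and merge the result back without touching the other lanes.  */
        uint32_t byte = (old & mask) >> shift;
        uint32_t sum = (byte + val) & 0xFFu;
        desired = (old & ~mask) | (sum << shift);
        /* On failure, old is refreshed with the current word value.  */
    } while (!atomic_compare_exchange_weak(word, &old, desired));

    return (uint8_t)((old & mask) >> shift);
}
```

The other fetch-and-op variants would differ only in how sum is computed from byte and val; the surrounding mask-and-merge steps stay the same in this sketch.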
|
|
2023-04-26 Patrick O'Neill <patrick@rivosinc.com>
* MAINTAINERS: Add myself.
Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>
|
|
Similar to the previous patch, we can reimplement the rshrn2 patterns using standard RTL codes
for shift, truncate and plus with the appropriate constants.
This allows us to get rid of UNSPEC_RSHRN entirely.
Bootstrapped and tested on aarch64-none-linux-gnu.
gcc/ChangeLog:
* config/aarch64/aarch64-simd.md (aarch64_rshrn2<mode>_insn_le):
Reimplement using standard RTL codes instead of unspec.
(aarch64_rshrn2<mode>_insn_be): Likewise.
(aarch64_rshrn2<mode>): Adjust for the above.
* config/aarch64/aarch64.md (UNSPEC_RSHRN): Delete.
|
|
This patch reimplements the backend patterns for the rshrn intrinsics using standard RTL codes rather than UNSPECS.
We already represent shrn as truncate of a shift. rshrn can be represented as truncate ((src + (1 << (shft - 1))) >> shft),
similar to how LLVM treats it.
I have a follow-up patch to do the same for the rshrn2 pattern, which will allow us to remove the UNSPEC_RSHRN entirely.
Bootstrapped and tested on aarch64-none-linux-gnu.
gcc/ChangeLog:
* config/aarch64/aarch64-simd.md (aarch64_rshrn<mode>_insn_le): Reimplement
with standard RTL codes instead of an UNSPEC.
(aarch64_rshrn<mode>_insn_be): Likewise.
(aarch64_rshrn<mode>): Adjust for the above.
* config/aarch64/predicates.md (aarch64_simd_rshrn_imm_vec): Define.
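The arithmetic described above can be checked on a single lane with a small C sketch. It assumes a 16-bit source element narrowed to 8 bits and a shift amount between 1 and 8; the function names shrn_u16 and rshrn_u16 are hypothetical scalar stand-ins, not the intrinsics themselves. The addition is done in a wider type so the rounding constant cannot overflow before the shift.

```c
#include <assert.h>
#include <stdint.h>

/* Plain narrowing shift: truncate (src >> shift).  */
uint8_t shrn_u16(uint16_t src, unsigned shift)
{
    assert(shift >= 1 && shift <= 8);
    return (uint8_t)(src >> shift);
}

/* Rounding narrowing shift:
   truncate ((src + (1 << (shift - 1))) >> shift).  */
uint8_t rshrn_u16(uint16_t src, unsigned shift)
{
    assert(shift >= 1 && shift <= 8);
    uint32_t rounded = (uint32_t)src + ((uint32_t)1 << (shift - 1));
    return (uint8_t)(rounded >> shift);
}
```

For src = 0x00FF and a shift of 4, the plain form gives 0x0F while the rounding form gives 0x10, since 255 + 8 = 263 and 263 >> 4 = 16. The result is truncated, not saturated, so 0xFFFF with a shift of 8 rounds up to 0x100 and narrows to 0x00.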
|
|
libsanitizer/ChangeLog:
* LOCAL_PATCHES: Change revision.
|
|
This patch tries to legitimise const0_rtx (aka the zero register)
as the base register for the RVV load/store instructions.
For example:
vint32m1_t test_vle32_v_i32m1_shortcut (size_t vl)
{
return __riscv_vle32_v_i32m1 ((int32_t *)0, vl);
}
Before this patch:
li a5,0
vsetvli zero,a1,e32,m1,ta,ma
vle32.v v24,0(a5) <- can propagate the const 0 to a5 here
vs1r.v v24,0(a0)
After this patch:
vsetvli zero,a1,e32,m1,ta,ma
vle32.v v24,0(zero)
vs1r.v v24,0(a0)
As shown above, this patch allows the const 0 (aka the zero
register) to be propagated to the base register of the RVV unit-stride load
in the combine pass. This may benefit the underlying RVV auto-vectorization.
However, the indexed load fails to get this optimization; that
will be taken care of in another patch.
gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_classify_address): Allow
const0_rtx for the RVV load/store.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/base/zero_base_load_store_optimization.c: New test.
Signed-off-by: Pan Li <pan2.li@intel.com>
Co-authored-by: Ju-Zhe Zhong <juzhe.zhong@rivai.ai>
|
|
This patch removes all the code paths guarded by legacy_mode_p(), thus
allowing us to re-use the int_range<1> idiom for a range of one
sub-range. This allows us to represent these simple ranges in a more
efficient manner.
gcc/ChangeLog:
* range-op.cc (range_op_cast_tests): Remove legacy support.
* value-range-storage.h (vrange_allocator::alloc_irange): Same.
* value-range.cc (irange::operator=): Same.
(get_legacy_range): Same.
(irange::copy_legacy_to_multi_range): Delete.
(irange::copy_to_legacy): Delete.
(irange::irange_set_anti_range): Delete.
(irange::set): Remove legacy support.
(irange::verify_range): Same.
(irange::legacy_lower_bound): Delete.
(irange::legacy_upper_bound): Delete.
(irange::legacy_equal_p): Delete.
(irange::operator==): Remove legacy support.
(irange::singleton_p): Same.
(irange::value_inside_range): Same.
(irange::contains_p): Same.
(intersect_ranges): Delete.
(irange::legacy_intersect): Delete.
(union_ranges): Delete.
(irange::legacy_union): Delete.
(irange::legacy_verbose_union_): Delete.
(irange::legacy_verbose_intersect): Delete.
(irange::irange_union): Remove legacy support.
(irange::irange_intersect): Same.
(irange::intersect): Same.
(irange::invert): Same.
(ranges_from_anti_range): Delete.
(gt_pch_nx): Adjust for legacy removal.
(gt_ggc_mx): Same.
(range_tests_legacy): Delete.
(range_tests_misc): Adjust for legacy removal.
(range_tests): Same.
* value-range.h (class irange): Same.
(irange::legacy_mode_p): Delete.
(ranges_from_anti_range): Delete.
(irange::nonzero_p): Adjust for legacy removal.
(irange::lower_bound): Same.
(irange::upper_bound): Same.
(irange::union_): Same.
(irange::intersect): Same.
(irange::set_nonzero): Same.
(irange::set_zero): Same.
* vr-values.cc (simplify_using_ranges::legacy_fold_cond_overflow): Same.
|
|
gcc/ChangeLog:
* value-range.cc (irange::copy_legacy_to_multi_range): Rewrite use
of range_has_numeric_bounds_p with irange API.
(range_has_numeric_bounds_p): Delete.
* value-range.h (range_has_numeric_bounds_p): Delete.
|
|
gcc/ChangeLog:
* tree-data-ref.cc (compute_distributive_range): Replace uses of
range_int_cst_p with irange API.
* tree-ssa-strlen.cc (get_range_strlen_dynamic): Same.
* tree-vrp.h (range_int_cst_p): Delete.
* vr-values.cc (check_for_binary_op_overflow): Replace uses of
range_int_cst_p with irange API.
(vr_set_zero_nonzero_bits): Same.
(range_fits_type_p): Same.
(simplify_using_ranges::simplify_casted_cond): Same.
* tree-vrp.cc (range_int_cst_p): Remove.
|