Age | Commit message (Collapse) | Author | Files | Lines |
|
gcc/ChangeLog:
PR target/102089
* config.gcc: MIPS: use N64 ABI by default if the triple ends
with -gnuabi64, which Debian has used since 2013.
|
|
I've hit a bootstrap-debug error involving large subprograms in
gcc/ada/sem_ch12.adb. I'm afraid I couldn't narrow it down to a
reasonable testcase.
thread1 made different decisions about a block containing a
builtin_eh_filter call because in one compilation, estimate_num_insns
found a cgraph_node for the builtin and could thus get to the
is_simple_builtin test, but in the other it didn't. With different
insn counts, one stage jump-threaded and the other didn't, and the
resulting code diverged quite a bit.
The reason the builtin had a cgraph_node in one case but not the other
was that modref got a chance to analyze the builtin call when it was
the first stmt in the block, and that created the cgraph_node.
However, when it was preceded by debug stmts, the loop in
analyze_function was cut short after the first debug stmt, because the
summary so far was not useful.
This patch fixes both issues: skip debug stmts in the analyze_function
loop, so as to prevent them from affecting any decisions in the loop,
and enable the insn count estimator to get to the is_simple_builtin
test when a cgraph_node has not been created for the builtin.
for gcc/ChangeLog
* ipa-modref.c (analyze_function): Skip debug stmts.
* tree-inline.c (estimate_num_insn): Consider builtins even
without a cgraph_node.
|
|
|
|
Even if the operand of -> has dependent type, if it's a pointer we know
that the result will be the target type of that pointer. This should avoid
some unnecessary TYPEOF_EXPR when looking up a name after ->.
gcc/cp/ChangeLog:
* typeck2.c (build_x_arrow): Do set TREE_TYPE when operand is
a dependent pointer.
|
|
gcc/
* config/h8300/bitfield.md (cstore<mode>4): Remove expander.
* config/h8300/h8300.c (h8300_expand_branch): Remove function.
* config/h8300/h8300-protos.h (h8300_expand_branch): Remove prototype.
* config/h8300/h8300.md (eqne): New code iterator.
(geultu, geultu_to_c): Similarly.
* config/h8300/testcompare.md (cstore<mode>4): Dummy expander.
(store_c_<mode>, store_c_i_<mode>): New define_insn_and_splits.
(cmp<mode>_c): New pattern.
|
|
Segher asked that I update the comments to include the d-form vector stores
(even though they wouldn't be generated by this test).
2021-08-25 Michael Meissner <meissner@linux.ibm.com>
gcc/testsuite/
* gcc.target/powerpc/float128-call.c: Update comments.
|
|
gcc/
* tree-ssa-dom.c (reduce_vector_comparison_to_scalar_comparison): New
function.
(dom_opt_dom_walker::optimize_stmt): Use it.
|
|
I built a compiler on a little endian power8 system where the default long
double was IEEE 128-bit instead of IBM 128-bit. I discovered that on
power8, we would generate a lxvd2x and xxpermdi to deal with the endianness
instead of the Altivec lxv.
In addition, I noticed the constant that was being loaded (1.0q) could be
loaded by the lxvkq instruction.
I rewrote the test to handle all forms of vector load and store that can
be generated.
2021-08-27 Michael Meissner <meissner@linux.ibm.com>
gcc/testsuite/
* gcc.target/powerpc/float128-call.c: Fix test for IEEE 128-bit
long double and power10.
|
|
Some newer assemblers emit section start temp symbols for mod init and term
sections if there is no suitable symbol present already.
The temp symbols are linker visible and therefore appear in the symbol tables.
Since the temp symbol number can vary when debug is enabled, that causes
compare-debug failures. The solution is to provide a stable linker-visible
symbol.
Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
gcc/ChangeLog:
* config/darwin.c (finalize_ctors): Add a section-start linker-
visible symbol.
(finalize_dtors): Likewise.
* config/darwin.h (MIN_LD64_INIT_TERM_START_LABELS): New.
|
|
2021-08-27 Bill Schmidt <wschmidt@linux.ibm.com>
gcc/
* config/rs6000/rs6000-call.c (rs6000-builtins.h): New #include.
(rs6000_init_builtins): Call rs6000_init_generated_builtins. Skip the
old initialization logic when new builtins are enabled.
* config/rs6000/rs6000-gen-builtins.c (write_decls): Rename
rs6000_autoinit_builtins to rs6000_init_generated_builtins.
(write_init_file): Likewise.
|
|
Although the cctools assembler is based on GNU GAS, it is from a
very old version (1.38) which does not support many of the features
that the target-supports tests expect.
gcc/testsuite/ChangeLog:
* lib/target-supports.exp: Exclude cctools assembler based on
GAS 1.38.
|
|
In r12-3048-ge0b6d0b39c6, the GAS version parameter was removed from
the gcc_GAS_CHECK_FEATURE macro. It seems that overlapping commit/test
cycles resulted in several AMDGCN and one Darwin commit with the now
extra parameter still present.
This causes wrong configure code to be generated when autoreconf is
used in the gcc directory.
Fixed by removing the extraneous parm from the AMDGCN and Darwin cases.
Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
gcc/ChangeLog:
* configure.ac (darwin2[[0-9]]* | darwin19*): Alter use of
gcc_GAS_CHECK_FEATURE to remove an extraneous parameter.
(amdgcn-* | gcn-*): Likewise.
|
|
Without the 'template', this function template compares 'traverse' to 'f',
and then compares the result to 'a'. Evidently it hasn't been instantiated
yet.
gcc/ChangeLog:
* symbol-summary.h: Added missing template keyword.
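The parsing problem described above can be reproduced in a small sketch. The names `Counter`, `bump` and `run` are hypothetical and not taken from symbol-summary.h; the point is only that, when the object has a dependent type, a member template call needs the `template` keyword so that `<` is not parsed as a less-than comparison.

```cpp
// Hypothetical class with a member function template.
struct Counter {
    int total = 0;

    template <int Step>
    void bump(int times) { total += Step * times; }
};

// T is a template parameter, so obj has a dependent type here.
template <typename T>
int run(T &obj, int times) {
    // Without 'template', 'obj.bump < 2 > (times)' would be parsed as two
    // comparisons, because the compiler cannot yet know that bump is a
    // template.
    obj.template bump<2>(times);
    return obj.total;
}
```

With a `Counter c`, calling `run(c, 3)` adds 2 three times.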
|
|
This fixes DCE to be able to elide dead control flow in an
infinite loop without an exit edge. This special situation is
handled well by the code finding an edge to preserve since there's
no chance it will find the exit edge and make the loop finite.
2021-08-27 Richard Biener <rguenther@suse.de>
PR tree-optimization/45178
* tree-ssa-dce.c (find_obviously_necessary_stmts): For
infinite loops without exit do not mark control dependent
edges of the latch necessary.
* gcc.dg/tree-ssa/ssa-dce-3.c: Adjust testcase.
|
|
gcc/ChangeLog:
PR target/101472
* config/i386/sse.md: (<avx512>scattersi<mode>): Add mask operand to
UNSPEC_VSIBADDR.
(<avx512>scatterdi<mode>): Likewise.
(*avx512f_scattersi<VI48F:mode>): Merge mask operand to set_dest.
(*avx512f_scatterdi<VI48F:mode>): Likewise.
gcc/testsuite/ChangeLog:
PR target/101472
* gcc.target/i386/avx512f-pr101472.c: New test.
* gcc.target/i386/avx512vl-pr101472.c: New test.
|
|
This patch adds support for letting the vectorizer vectorize the
scalar versions of some built-in functions on Power10.
gcc/ChangeLog:
* config/rs6000/rs6000.c (rs6000_builtin_md_vectorized_function): Add
support for built-in functions MISC_BUILTIN_DIVWE, MISC_BUILTIN_DIVWEU,
MISC_BUILTIN_DIVDE, MISC_BUILTIN_DIVDEU, P10_BUILTIN_CFUGED,
P10_BUILTIN_CNTLZDM, P10_BUILTIN_CNTTZDM, P10_BUILTIN_PDEPD and
P10_BUILTIN_PEXTD on Power10.
gcc/testsuite/ChangeLog:
* gcc.target/powerpc/dive-vectorize-1.c: New test.
* gcc.target/powerpc/dive-vectorize-1.h: New test.
* gcc.target/powerpc/dive-vectorize-2.c: New test.
* gcc.target/powerpc/dive-vectorize-2.h: New test.
* gcc.target/powerpc/dive-vectorize-run-1.c: New test.
* gcc.target/powerpc/dive-vectorize-run-2.c: New test.
* gcc.target/powerpc/p10-bifs-vectorize-1.c: New test.
* gcc.target/powerpc/p10-bifs-vectorize-1.h: New test.
* gcc.target/powerpc/p10-bifs-vectorize-run-1.c: New test.
|
|
This patch makes the prototypes of some Power10 built-in
functions consistent with the documentation and with their
vector versions. Otherwise, useless conversions
can be generated in gimple IR, and the vectorized versions
will have inconsistent types.
gcc/ChangeLog:
* config/rs6000/rs6000-call.c (builtin_function_type): Add unsigned
signedness for some Power10 bifs.
|
|
Further fixes to structure alignment when the structure is packed
and contains double. This patch checks for packed attribute
at the top level.
gcc/ChangeLog:
PR target/102068
* config/rs6000/rs6000.c (rs6000_adjust_field_align): Use
computed alignment if the entire struct has attribute packed.
|
|
A follow-up to https://gcc.gnu.org/pipermail/gcc-patches/2019-May/521983.html
gcc/
PR target/98167
PR target/43147
* config/i386/i386.c (ix86_gimple_fold_builtin): Fold
IX86_BUILTIN_SHUFPD512, IX86_BUILTIN_SHUFPS512,
IX86_BUILTIN_SHUFPD256, IX86_BUILTIN_SHUFPS,
IX86_BUILTIN_SHUFPS256.
(ix86_masked_all_ones): New function.
gcc/testsuite/
* gcc.target/i386/avx512f-vshufpd-1.c: Adjust testcase.
* gcc.target/i386/avx512f-vshufps-1.c: Adjust testcase.
* gcc.target/i386/pr43147.c: New test.
|
|
|
|
There is no point in checking RTXes before calling force_reg;
force_reg checks for a REG RTX by itself.
2021-08-26 Uroš Bizjak <ubizjak@gmail.com>
gcc/
* config/i386/i386.md (*btr<mode>_1): Call force_reg unconditionally.
(conditional moves with memory inputs splitters): Ditto.
* config/i386/sse.md (one_cmpl<mode>2): Simplify.
|
|
* ipa-modref-tree.h (modref_access_node::try_merge_with): Restart
search after merging.
|
|
2021-08-26 Bill Schmidt <wschmidt@linux.ibm.com>
gcc/
* config/rs6000/rs6000-overload.def: Add remaining overloads.
|
|
2021-06-07 Bill Schmidt <wschmidt@linux.ibm.com>
gcc/
* config/rs6000/rs6000-builtin-new.def: Add cell stanza.
|
|
2021-06-15 Bill Schmidt <wschmidt@linux.ibm.com>
gcc/
* config/rs6000/rs6000-builtin-new.def: Add ieee128-hw, dfp,
crypto, and htm stanzas.
|
|
2021-06-16 Bill Schmidt <wschmidt@linux.ibm.com>
gcc/
* config/rs6000/rs6000-builtin-new.def: Add mma stanza.
|
|
gcc/ChangeLog:
* tree-ssa-uninit.c (warn_uninit): Refactor and simplify.
(warn_uninit_phi_uses): Remove argument from calls to warn_uninit.
(warn_uninitialized_vars): Same. Reduce visibility of locals.
(warn_uninitialized_phi): Same.
|
|
This patch is the next in the series to improve bit bounds in tree-ssa's
bit CCP pass, this time: bounds for shifts and rotates by unknown amounts.
This allows us to optimize expressions such as ((x&15)<<(y&24))&64.
In this case, the expression (y&24) contains only two unknown bits,
and can therefore have only four possible values: 0, 8, 16 and 24.
From this (x&15)<<(y&24) has the nonzero bits 0x0f0f0f0f, and from
that ((x&15)<<(y&24))&64 must always be zero.
One clever use of computer science in this patch is the use of XOR
to efficiently enumerate bit patterns in Gray code order. As the
order in which we generate values is not significant, it's faster
and more convenient to enumerate values by flipping one bit at a
time, rather than in numerical order [which would require carry
bits and additional logic].
There's a pre-existing ??? comment in tree-ssa-ccp.c that we should
eventually be able to optimize (x<<(y|8))&255, but this patch takes the
conservatively paranoid approach of only optimizing cases where the
shift/rotate is guaranteed to be less than the target precision, and
therefore avoids changing any cases that potentially might invoke
undefined behavior. This patch does optimize (x<<((y&31)|8))&255.
2021-08-26 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
* tree-ssa-ccp.c (get_individual_bits): Helper function to
extract the individual bits from a widest_int constant (mask).
(gray_code_bit_flips): New read-only table for efficiently
enumerating permutations/combinations of bits.
(bit_value_binop) [LROTATE_EXPR, RROTATE_EXPR]: Handle rotates
by unknown counts that are guaranteed less than the target
precision and four or fewer unknown bits by enumeration.
[LSHIFT_EXPR, RSHIFT_EXPR]: Likewise, also handle shifts by
enumeration under the same conditions. Handle remaining
shifts as a mask based upon the minimum possible shift value.
gcc/testsuite/ChangeLog
* gcc.dg/tree-ssa/ssa-ccp-41.c: New test case.
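The enumeration idea from the message above can be sketched outside the compiler. This is a minimal standalone model, not the code in tree-ssa-ccp.c: `split_bits` and `possible_nonzero_bits` are hypothetical names, values are plain 32-bit integers, and the shift mask is assumed to be at most 31 so that every enumerated shift is below the precision.

```cpp
#include <cstdint>

// Hypothetical helper: split MASK into its individual set bits.
// Returns how many bits were found.
static int split_bits(std::uint32_t mask, std::uint32_t bits[32]) {
    int n = 0;
    while (mask != 0) {
        std::uint32_t low = mask & (~mask + 1);  // isolate the lowest set bit
        bits[n++] = low;
        mask ^= low;
    }
    return n;
}

// OR together (value_mask << s) for every shift amount s that can be formed
// from the unknown bits in shift_mask.  Assumes shift_mask <= 31.
static std::uint32_t possible_nonzero_bits(std::uint32_t value_mask,
                                           std::uint32_t shift_mask) {
    std::uint32_t bits[32];
    int n = split_bits(shift_mask, bits);
    std::uint32_t shift = 0;
    std::uint32_t result = value_mask;  // the shift == 0 case
    for (std::uint32_t step = 1; step < (1u << n); ++step) {
        // In Gray code order, step number 'step' flips the bit whose index
        // is the position of the lowest set bit of 'step'.
        int flip = 0;
        while (((step >> flip) & 1u) == 0)
            ++flip;
        shift ^= bits[flip];  // one XOR moves to the next shift amount
        result |= value_mask << shift;
    }
    return result;
}
```

For the example in the message, `possible_nonzero_bits(15, 24)` visits the shifts 0, 8, 24 and 16, and the accumulated mask has no bit in common with 64.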
|
|
As suggested by Richard Biener in the comments of PR middle-end/102029,
the new test "INTEGRAL_TYPE_P (type) && !POINTER_TYPE_P (type) ..." is
redundant, and just "INTEGRAL_TYPE_P (type)" is the preferred form.
2021-08-26 Roger Sayle <roger@nextmovesoftware.com>
Richard Biener <rguenther@suse.de>
gcc/ChangeLog
* match.pd (shift transformations): Remove a redundant
!POINTER_TYPE_P check.
|
|
We want to replace all REGs equal to FROM.
2021-08-26 Uroš Bizjak <ubizjak@gmail.com>
gcc/
PR target/102057
* config/i386/i386.md (cmove reg-reg move elimination peephole2s):
Set all_regs to true in the call to replace_rtx.
|
|
This patch makes insertion into the modref access tree smarter when the
--param modref-max-bases and modref-max-refs limits are hit. Instead of
giving up, we either give up on the base alias set (making it equal to ref)
or turn the alias set to 0. This lets us track useful info on quite large
functions, such as ggc_free.
gcc/ChangeLog:
* ipa-modref-tree.c (test_insert_search_collapse): Update test.
* ipa-modref-tree.h (modref_base_node::insert): Be smarter when
hitting --param modref-max-refs limit.
(modref_tree::insert_base): Be smarter when hitting
--param modref-max-bases limit. Add new parameter REF.
(modref_tree::insert): Update.
(modref_tree::merge): Update.
* ipa-modref.c (read_modref_records): Update.
|
|
gcc/ChangeLog:
* params.opt: (modref-max-adjustments): Add full stop.
|
|
gcc/ChangeLog:
* ipa-modref-tree.h (modref_ref_node::verify): New member
function.
(modref_ref_node::insert): Use it.
(modref_ref_node::try_merge_with): Fix off by one error.
|
|
gcc/ChangeLog:
* cgraph.h (create_version_clone_with_body): Add new parameter.
* cgraphclones.c: Likewise.
* multiple_target.c (create_dispatcher_calls): Do not use
numbered suffixes.
(create_target_clone): Likewise here.
gcc/testsuite/ChangeLog:
* gcc.target/i386/mvc5.c: Scan assembly names.
* gcc.target/i386/mvc7.c: Likewise.
* gcc.target/i386/pr95778-1.c: Update scanned patterns.
* gcc.target/i386/pr95778-2.c: Likewise.
Co-Authored-By: Stefan Kneifel <stefan.kneifel@bluewin.ch>
|
|
gcc/ChangeLog:
* doc/extend.texi: Add note about reserved priorities
to the constructor attribute.
Signed-off-by: Jonathan Yong <10walls@gmail.com>
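A short sketch of the constructor attribute with explicit priorities may help here. GCC reserves priorities 0 to 100 for the implementation, so the example uses 101 and 102. The function and variable names are hypothetical, and the code assumes a GCC-compatible compiler on a target that supports constructor priorities.

```cpp
// Plain zero-initialized storage, so it is ready before any constructor
// function runs.
static char init_order[4];
static int init_count;

// Declared first in the file but given the larger priority number,
// so it runs second.
__attribute__((constructor(102))) static void late_init() {
    init_order[init_count++] = 'B';
}

// The smaller priority number runs first.  101 is the lowest priority
// outside the reserved range.
__attribute__((constructor(101))) static void early_init() {
    init_order[init_count++] = 'A';
}
```

By the time `main` starts, `init_order` holds "AB": priority, not source order, decides which function runs first.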
|
|
|
|
gcc/testsuite:
* gcc.dg/tree-ssa/evrp1.c: Add -details to dump option.
* gcc.dg/tree-ssa/evrp2.c: Same.
* gcc.dg/tree-ssa/evrp3.c: Same.
* gcc.dg/tree-ssa/evrp4.c: Same.
* gcc.dg/tree-ssa/evrp6.c: Same.
* gcc.dg/tree-ssa/pr64130.c: Same.
|
|
This patch adds 3 more selections to target-supports.exp to see if we can
specify to use a particular long double format (IEEE 128-bit, IBM extended
double, 64-bit), and the library support will track the changes for the long
double. This is needed because two of the tests in the test suite use long
double, and they are actually testing IBM extended double.
This patch also makes the two tests that require long double to use the
IBM double-double encoding request that format explicitly. Switching the
format requires GLIBC 2.32 or greater.
I have run tests on a little endian power9 system with 3 compilers. There were
no regressions with these patches, and the two tests in the following patches
now work if the default long double is not IBM 128-bit:
* One compiler used the default IBM 128-bit format;
* One compiler used the IEEE 128-bit format; (and)
* One compiler used 64-bit long doubles.
I have also tested compilers on a big endian power8 system with a compiler
defaulting to power8 code generation and another with the default cpu
set. There were no regressions.
2021-08-25 Michael Meissner <meissner@linux.ibm.com>
gcc/testsuite/
PR target/94630
* gcc.target/powerpc/pr70117.c: Specify that we need the long double
type to be IBM 128-bit. Remove the code to use __ibm128.
* c-c++-common/dfp/convert-bfp-11.c: Specify that we need the long
double type to be IBM 128-bit. Run the test at -O2 optimization.
* lib/target-supports.exp (add_options_for_long_double_ibm128): New
function.
(check_effective_target_long_double_ibm128): New function.
(add_options_for_long_double_ieee128): New function.
(check_effective_target_long_double_ieee128): New function.
(add_options_for_long_double_64bit): New function.
(check_effective_target_long_double_64bit): New function.
|
|
switch
The C++ front end has code that avoids adding a break statement (to the IR)
if the previous block does not fall through. The problem is that the code
which checks whether the block may fall through does not handle a
CLEANUP_STMT; it assumes that one always falls through. This patch adds
handling for a CLEANUP_STMT that is not CLEANUP_EH_ONLY (the try/finally
case).
OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
gcc/cp/ChangeLog:
PR c++/66590
* cp-objcp-common.c (cxx_block_may_fallthru): Handle
CLEANUP_STMT for the case which will be try/finally.
gcc/testsuite/ChangeLog:
PR c++/66590
* g++.dg/warn/Wreturn-5.C: New test.
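The kind of source this concerns can be sketched as follows. This is an assumed shape, not the contents of Wreturn-5.C: `Guard` and `classify` are hypothetical names. The case body holds a local object with a destructor, which is the sort of construct that needs a cleanup, and the body always returns, so the trailing `break` can never be reached.

```cpp
// Hypothetical type whose destructor forces a cleanup around the case body.
struct Guard {
    int *counter;
    explicit Guard(int *c) : counter(c) {}
    ~Guard() { ++*counter; }
};

static int classify(int x, int *cleanups) {
    switch (x) {
    case 0: {
        Guard g(cleanups);  // destructor runs on every exit from this block
        return 10;          // the block never falls through...
    }
        break;              // ...so this break is unreachable
    default:
        return 20;
    }
    // No return here: every path through the switch has already returned.
}
```

Every path returns a value, so a warning about control reaching the end of the function would be a false positive for code of this shape.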
|
|
gcc/ChangeLog:
* gimple-range-cache.cc (ssa_global_cache::dump): Avoid printing
range table header alone.
* gimple-range.cc (gimple_ranger::export_global_ranges): Same.
|
|
The removal of remove_zero_width_bit_fields, in addition to triggering
some ABI issues that need solving anyway (ABI incompatibility between
C and C++), also resulted in UB inside of gcc: we now call build_zero_init,
which calls build_int_cst on an integral type with TYPE_PRECISION of 0.
Fixed by ignoring the zero width bitfields. I understand
build_value_init_noctor wants to initialize to 0 even unnamed bitfields
(of non-zero width), at least until we have some CONSTRUCTOR flag that says
that even all the padding bits should be cleared.
2021-08-25 Jakub Jelinek <jakub@redhat.com>
PR c++/102019
* init.c (build_value_init_noctor): Ignore unnamed zero-width
bitfields.
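For reference, a value-initialized class with an unnamed zero-width bit-field looks like the sketch below. The type `Flags` and the function `make_cleared` are hypothetical and are not the PR's testcase; the example only shows the source construct involved, where the named bit-fields are expected to be zero after value-initialization and the zero-width one has no bits to initialize.

```cpp
// Hypothetical type: two named bit-fields separated by an unnamed
// zero-width bit-field, which only affects layout.
struct Flags {
    unsigned low : 3;
    unsigned : 0;
    unsigned high : 5;
};

// Value-initialization of a class without a user-provided constructor
// zero-initializes its named members.
static Flags make_cleared() {
    return Flags();
}
```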
|
|
This patch adds the logic needed to merge neighbouring accesses in ipa-modref
summaries. This helps when analyzing array initializers and similar code. It
is a bit of work, since it breaks the property that the modref tree makes a
good lattice for dataflow: the access ranges can be extended indefinitely. For
this reason I added a counter tracking the number of adjustments and a cap to
limit them during the dataflow.
gcc/ChangeLog:
* doc/invoke.texi: Document --param modref-max-adjustments.
* ipa-modref-tree.c (test_insert_search_collapse): Update.
(test_merge): Update.
* ipa-modref-tree.h (struct modref_access_node): Add adjustments;
(modref_access_node::operator==): Fix handling of access ranges.
(modref_access_node::contains): Constify parameter; handle also
mismatched parm offsets.
(modref_access_node::update): New function.
(modref_access_node::merge): New function.
(unspecified_modref_access_node): Update constructor.
(modref_ref_node::insert_access): Add record_adjustments parameter;
handle merging.
(modref_ref_node::try_merge_with): New private function.
(modref_tree::insert): New record_adjustments parameter.
(modref_tree::merge): New record_adjustments parameter.
(modref_tree::copy_from): Update.
* ipa-modref.c (dump_access): Dump adjustments field.
(get_access): Update constructor.
(record_access): Update call of insert.
(record_access_lto): Update call of insert.
(merge_call_side_effects): Add record_adjustments parameter.
(get_access_for_fnspec): Update.
(process_fnspec): Update.
(analyze_call): Update.
(analyze_function): Update.
(read_modref_records): Update.
(ipa_merge_modref_summary_after_inlining): Update.
(propagate_unknown_call): Update.
(modref_propagate_in_scc): Update.
* params.opt (param-max-modref-adjustments=): New.
gcc/testsuite/ChangeLog:
* gcc.dg/ipa/modref-1.c: Update testcase.
* gcc.dg/tree-ssa/modref-4.c: Update testcase.
* gcc.dg/tree-ssa/modref-8.c: New test.
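The idea of merging neighbouring accesses under an adjustment cap can be modelled in a few lines. This is a simplified sketch under stated assumptions, not the data structure in ipa-modref-tree.h: `AccessRange` and `try_merge` are hypothetical, ranges are plain byte intervals, and refusing the merge once the cap is reached stands in for whatever conservative fallback the real code uses.

```cpp
// Hypothetical model of an access: bytes [offset, offset + size), plus a
// count of how many times the range has been widened.
struct AccessRange {
    long offset;
    long size;
    int adjustments;
};

// Merge 'b' into 'a' when the two ranges touch or overlap.  Widening 'a'
// counts as one adjustment; once the cap is reached no further widening is
// done, so a range cannot keep growing forever during dataflow.
static bool try_merge(AccessRange &a, const AccessRange &b,
                      int max_adjustments) {
    long a_end = a.offset + a.size;
    long b_end = b.offset + b.size;
    if (b.offset > a_end || a.offset > b_end)
        return false;  // there is a gap between the ranges

    long new_offset = a.offset < b.offset ? a.offset : b.offset;
    long new_end = a_end > b_end ? a_end : b_end;
    bool widened = new_offset != a.offset || new_end != a_end;
    if (widened) {
        if (a.adjustments >= max_adjustments)
            return false;  // cap reached: refuse to widen again
        ++a.adjustments;
    }
    a.offset = new_offset;
    a.size = new_end - new_offset;
    return true;
}
```

Two stores to adjacent array elements, such as bytes 0..3 and 4..7, collapse into one 8-byte range with a single recorded adjustment.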
|
|
I noticed that the built-in functions for xxspltiw, xxspltidp, xxsplti32dx,
xxpermx, and xxeval all used the 'vecsimple' type. These instructions are
permute instructions (3 cycle latency) and should use 'vecperm' instead.
While I was at it, I changed the UNSPEC name for xxspltidp to be
UNSPEC_XXSPLTIDP instead of UNSPEC_XXSPLTID.
2021-08-25 Michael Meissner <meissner@linux.ibm.com>
gcc/
* config/rs6000/vsx.md (UNSPEC_XXSPLTIDP): Rename from
UNSPEC_XXSPLTID.
(xxspltiw_v4si): Use vecperm type attribute.
(xxspltiw_v4si_inst): Use vecperm type attribute.
(xxspltiw_v4sf_inst): Likewise.
(xxspltidp_v2df): Use vecperm type attribute. Use
UNSPEC_XXSPLTIDP instead of UNSPEC_XXSPLTID.
(xxspltidp_v2df_inst): Likewise.
(xxsplti32dx_v4si): Use vecperm type attribute.
(xxsplti32dx_v4si_inst): Likewise.
(xxsplti32dx_v4sf_inst): Likewise.
(xxblend_<mode>): Likewise.
(xxpermx): Likewise.
(xxpermx_inst): Likewise.
(xxeval): Likewise.
|
|
Adds the logic to handle -finput-charset in layout_get_source_line(), so that
source lines are converted from their input encodings prior to being output by
diagnostics machinery. Also adds the ability to strip a UTF-8 BOM similarly.
gcc/c-family/ChangeLog:
PR other/93067
* c-opts.c (c_common_input_charset_cb): New function.
(c_common_post_options): Call new function
diagnostic_initialize_input_context().
gcc/d/ChangeLog:
PR other/93067
* d-lang.cc (d_input_charset_callback): New function.
(d_init): Call new function
diagnostic_initialize_input_context().
gcc/fortran/ChangeLog:
PR other/93067
* cpp.c (gfc_cpp_post_options): Call new function
diagnostic_initialize_input_context().
gcc/ChangeLog:
PR other/93067
* coretypes.h (typedef diagnostic_input_charset_callback): Declare.
* diagnostic.c (diagnostic_initialize_input_context): New function.
* diagnostic.h (diagnostic_initialize_input_context): Declare.
* input.c (default_charset_callback): New function.
(file_cache::initialize_input_context): New function.
(file_cache_slot::create): Added ability to convert the input
according to the input context.
(file_cache::file_cache): Initialize the new input context.
(class file_cache_slot): Added new m_alloc_offset member.
(file_cache_slot::file_cache_slot): Initialize the new member.
(file_cache_slot::~file_cache_slot): Handle potentially offset buffer.
(file_cache_slot::maybe_grow): Likewise.
(file_cache_slot::needs_read_p): Handle NULL fp, which is now possible.
(file_cache_slot::get_next_line): Likewise.
* input.h (class file_cache): Added input context member.
libcpp/ChangeLog:
PR other/93067
* charset.c (init_iconv_desc): Adapt to permit PFILE argument to
be NULL.
(_cpp_convert_input): Likewise. Also move UTF-8 BOM logic to...
(cpp_check_utf8_bom): ...here. New function.
(cpp_input_conversion_is_trivial): New function.
* files.c (read_file_guts): Allow PFILE argument to be NULL. Add
INPUT_CHARSET argument as an alternate source of this information.
(read_file): Pass the new argument to read_file_guts.
(cpp_get_converted_source): New function.
* include/cpplib.h (struct cpp_converted_source): Declare.
(cpp_get_converted_source): Declare.
(cpp_input_conversion_is_trivial): Declare.
(cpp_check_utf8_bom): Declare.
gcc/testsuite/ChangeLog:
PR other/93067
* gcc.dg/diagnostic-input-charset-1.c: New test.
* gcc.dg/diagnostic-input-utf8-bom.c: New test.
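Stripping a UTF-8 byte order mark amounts to checking for the three bytes EF BB BF at the start of the buffer. The sketch below is a standalone illustration with a hypothetical helper name, `utf8_bom_length`; it is not the libcpp function cpp_check_utf8_bom, whose signature is not shown in this log.

```cpp
#include <cstddef>

// Hypothetical helper: returns the number of bytes to skip at the start of
// BUF, which is 3 if the buffer begins with the UTF-8 BOM and 0 otherwise.
static std::size_t utf8_bom_length(const unsigned char *buf,
                                   std::size_t len) {
    if (len >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF)
        return 3;
    return 0;
}
```

A caller that wants to show a source line would advance the buffer by the returned length before printing it.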
|
|
2021-08-25 Ankur Saini <arsenic@sourceware.org>
gcc/analyzer/ChangeLog:
PR analyzer/101980
* engine.cc (exploded_graph::maybe_create_dynamic_call): Don't create
calls if max recursion limit is reached.
|
|
When we swap operands for SLP builds we lose track where exactly
pattern defs are - but we fail to update the any_pattern member
of the operands info. Do so conservatively.
2021-08-25 Richard Biener <rguenther@suse.de>
PR tree-optimization/102046
* tree-vect-slp.c (vect_build_slp_tree_2): Conservatively
update ->any_pattern when swapping operands.
* gcc.dg/vect/pr102046.c: New testcase.
|
|
For ASHIFT + ZERO_EXTEND pattern, combine pass failed to
match it to lea since it will generate non-canonical
zero-extend. Adjust predicate and cost_model to allow combine
for lea.
gcc/ChangeLog:
PR target/101716
* config/i386/i386.c (ix86_live_on_entry): Adjust comment.
(ix86_decompose_address): Remove retval check for ASHIFT,
allow non-canonical zero extend if AND mask covers ASHIFT
count.
(ix86_legitimate_address_p): Adjust condition for decompose.
(ix86_rtx_costs): Adjust cost for lea with non-canonical
zero-extend.
Co-Authored-By: Uros Bizjak <ubizjak@gmail.com>
gcc/testsuite/ChangeLog:
PR target/101716
* gcc.target/i386/pr101716.c: New test.
|
|
For code like:
    unsigned foo(unsigned val, unsigned start)
    {
        unsigned cnt = 0;
        for (unsigned i = start; i > val; ++i)
            cnt++;
        return cnt;
    }
The number of iterations should be about UINT_MAX - start.
There is function adjust_cond_for_loop_until_wrap which
handles similar work for const bases.
Like adjust_cond_for_loop_until_wrap, this patch enhances
number_of_iterations_cond/number_of_iterations_lt
to analyze the number of iterations for this kind of loop.
gcc/ChangeLog:
2021-08-25 Jiufu Guo <guojiufu@linux.ibm.com>
PR tree-optimization/101145
* tree-ssa-loop-niter.c (number_of_iterations_until_wrap):
New function.
(number_of_iterations_lt): Invoke above function.
(adjust_cond_for_loop_until_wrap):
Merge to number_of_iterations_until_wrap.
(number_of_iterations_cond): Update invokes for
adjust_cond_for_loop_until_wrap and number_of_iterations_lt.
gcc/testsuite/ChangeLog:
2021-08-25 Jiufu Guo <guojiufu@linux.ibm.com>
PR tree-optimization/101145
* gcc.dg/vect/pr101145.c: New test.
* gcc.dg/vect/pr101145.inc: New test.
* gcc.dg/vect/pr101145_1.c: New test.
* gcc.dg/vect/pr101145_2.c: New test.
* gcc.dg/vect/pr101145_3.c: New test.
* gcc.dg/vect/pr101145inf.c: New test.
* gcc.dg/vect/pr101145inf.inc: New test.
* gcc.dg/vect/pr101145inf_1.c: New test.
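The iteration count for a loop that runs until the counter wraps can be checked by brute force on a narrow type. The sketch below is an illustration only, not the analysis in tree-ssa-loop-niter.c: it uses an 8-bit counter so that every input can be tested, and `count_by_loop` and `count_closed_form` are hypothetical names. When start > val the loop runs from start up to the maximum value and stops once the counter wraps to 0, giving 256 - start iterations for 8 bits, which corresponds to UINT_MAX - start + 1 for unsigned.

```cpp
#include <cstdint>

// Same shape as foo() in the message, with an 8-bit counter.
static unsigned count_by_loop(std::uint8_t val, std::uint8_t start) {
    unsigned cnt = 0;
    for (std::uint8_t i = start; i > val; ++i)  // ++i wraps from 255 to 0
        cnt++;
    return cnt;
}

// Closed form: no iterations unless start > val; otherwise the loop runs
// until the counter wraps to 0, which is never greater than val.
static unsigned count_closed_form(std::uint8_t val, std::uint8_t start) {
    return start > val ? 256u - start : 0u;
}
```

For example, with val = 10 and start = 250 the counter takes the values 250 through 255, so both functions give 6.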
|
|
gcc/ChangeLog:
PR target/101471
* config/i386/avx512dqintrin.h (_mm512_fpclass_ps_mask): Fix
macro define in O0.
(_mm512_mask_fpclass_ps_mask): Ditto.
gcc/testsuite/ChangeLog:
PR target/101471
* gcc.target/i386/avx512f-pr101471.c: New test.
|
|
The existing vec_unpacku_{hi,lo} patterns support emulated unsigned
unpacking for short and char but not for int.
This patch adds support for vec_unpacku_{hi,lo}_v4si.
Meanwhile, the current implementation uses a vector permutation,
which requires one extra customized constant vector as
the permutation control vector (pcv). It is better to use a vector
merge high/low with a zero constant vector, which saves space
in the constant area as well as the cost of initializing the pcv in
the prologue. This patch switches to vector merging and
simplifies the patterns with iterators.
gcc/ChangeLog:
* config/rs6000/altivec.md (vec_unpacku_hi_v16qi): Remove.
(vec_unpacku_hi_v8hi): Likewise.
(vec_unpacku_lo_v16qi): Likewise.
(vec_unpacku_lo_v8hi): Likewise.
(vec_unpacku_hi_<VP_small_lc>): New define_expand.
(vec_unpacku_lo_<VP_small_lc>): Likewise.
gcc/testsuite/ChangeLog:
* gcc.target/powerpc/unpack-vectorize-1.c: New test.
* gcc.target/powerpc/unpack-vectorize-1.h: New test.
* gcc.target/powerpc/unpack-vectorize-2.c: New test.
* gcc.target/powerpc/unpack-vectorize-2.h: New test.
* gcc.target/powerpc/unpack-vectorize-3.c: New test.
* gcc.target/powerpc/unpack-vectorize-3.h: New test.
* gcc.target/powerpc/unpack-vectorize-run-1.c: New test.
* gcc.target/powerpc/unpack-vectorize-run-2.c: New test.
* gcc.target/powerpc/unpack-vectorize-run-3.c: New test.
* gcc.target/powerpc/unpack-vectorize.h: New test.
|