2021-11-10  arm: Initialize vector costing fields  (Christophe Lyon; 1 file, -7/+28)
The movi, dup and extract costing fields were recently added to struct vector_cost_table, but their initialization is missing for the arm (aarch32) specific descriptions. Although the arm port does not use these fields (only aarch64 does), this is causing warnings during the build, and even build failures when using gcc-4.8.5 as host compiler:

/gccsrc/gcc/config/arm/arm.c:1194:1: error: uninitialized const member 'vector_cost_table::movi'
 };
 ^
/gccsrc/gcc/config/arm/arm.c:1194:1: warning: missing initializer for member 'vector_cost_table::movi' [-Wmissing-field-initializers]
/gccsrc/gcc/config/arm/arm.c:1194:1: error: uninitialized const member 'vector_cost_table::dup'
/gccsrc/gcc/config/arm/arm.c:1194:1: warning: missing initializer for member 'vector_cost_table::dup' [-Wmissing-field-initializers]
/gccsrc/gcc/config/arm/arm.c:1194:1: error: uninitialized const member 'vector_cost_table::extract'
/gccsrc/gcc/config/arm/arm.c:1194:1: warning: missing initializer for member 'vector_cost_table::extract' [-Wmissing-field-initializers]

This patch uses the same initialization values as in aarch64 for consistency:

+    COSTS_N_INSNS (1),  /* movi.  */
+    COSTS_N_INSNS (2),  /* dup.  */
+    COSTS_N_INSNS (2)   /* extract.  */

2021-11-10 Christophe Lyon <christophe.lyon@foss.st.com> gcc/ * config/arm/arm.c (cortexa9_extra_costs, cortexa8_extra_costs, cortexa5_extra_costs, cortexa7_extra_costs, cortexa12_extra_costs, cortexa15_extra_costs, v7m_extra_costs): Initialize movi, dup and extract costing fields.
2021-11-10  path solver: Adjustments for use outside of the backward threader.  (Aldy Hernandez; 2 files, -19/+39)
Here are some enhancements to make it easier for other clients to use the path solver.

First, I've made the imports to the solver optional, since we can calculate them ourselves. However, I've left the ability to set them, since the backward threader adds a few SSA names in addition to the default ones. As a follow-up I may move all the import setup code from the threader to the solver, as the extra imports tend to improve the behavior slightly.

Second, Richi suggested an entry point where you just feed the solver an edge, which will be quite convenient for a subsequent patch adding a client in the header copying pass. This required some shuffling, since we'll be adding the blocks on the fly. There's now a vector copy, but the impact will be minimal, since these are just 5-6 entries at the most.

Tested on ppc64le Linux.

gcc/ChangeLog: * gimple-range-path.cc (path_range_query::path_range_query): Do not init m_path. (path_range_query::dump): Change m_path uses to non-pointer. (path_range_query::defined_outside_path): Same. (path_range_query::set_path): Same. (path_range_query::add_copies_to_imports): Same. (path_range_query::range_of_stmt): Same. (path_range_query::compute_outgoing_relations): Same. (path_range_query::compute_ranges): Imports are now optional. Implement overload that takes an edge. * gimple-range-path.h (class path_range_query): Make imports optional for compute_ranges. Add compute_ranges(edge) overload. Make m_path an auto_vec instead of a pointer and adjust accordingly.
2021-11-10  AArch64: do not keep negated mask and inverse mask live at the same time  (Tamar Christina; 7 files, -11/+55)
The following example:

void f11(double * restrict z, double * restrict w, double * restrict x, double * restrict y, int n)
{
    for (int i = 0; i < n; i++) {
        z[i] = (w[i] > 0) ? w[i] : y[i];
    }
}

currently generates:

        ptrue   p2.b, all
        ld1d    z0.d, p0/z, [x1, x2, lsl 3]
        fcmgt   p1.d, p2/z, z0.d, #0.0
        bic     p3.b, p2/z, p0.b, p1.b
        ld1d    z1.d, p3/z, [x3, x2, lsl 3]

and after the previous patches generates:

        ptrue   p3.b, all
        ld1d    z0.d, p0/z, [x1, x2, lsl 3]
        fcmgt   p1.d, p0/z, z0.d, #0.0
        fcmgt   p2.d, p3/z, z0.d, #0.0
        not     p1.b, p0/z, p1.b
        ld1d    z1.d, p1/z, [x3, x2, lsl 3]

where a duplicate comparison is performed for w[i] > 0. This is because in the vectorizer we're emitting a comparison for both a and ~a, where we just need to emit one of them and invert the other. After this patch we generate:

        ld1d    z0.d, p0/z, [x1, x2, lsl 3]
        fcmgt   p1.d, p0/z, z0.d, #0.0
        mov     p2.b, p1.b
        not     p1.b, p0/z, p1.b
        ld1d    z1.d, p1/z, [x3, x2, lsl 3]

In order to perform the check I have to fully expand the NOT stmts when recording them, as the SSA names for the top-level expressions differ but their arguments don't. E.g. in _31 = ~_34 the value of _34 differs but not the operands in _34. But we only do this when the operation is an ordered one, because mixing ordered and unordered expressions can lead to de-optimized code.

Note: This patch series is working incrementally towards generating the most efficient code for this and other loops in small steps. The mov is created by postreload when it does a late CSE.

gcc/ChangeLog: * tree-vectorizer.h (struct scalar_cond_masked_key): Add inverted_p. (default_hash_traits<scalar_cond_masked_key>): Likewise. * tree-vect-stmts.c (vectorizable_condition): Check if inverse of mask is live. * tree-vectorizer.c (scalar_cond_masked_key::get_cond_ops_from_tree): Register mask inverses. gcc/testsuite/ChangeLog: * gcc.target/aarch64/sve/pred-not-gen-1.c: Update testcase. * gcc.target/aarch64/sve/pred-not-gen-2.c: Update testcase.
* gcc.target/aarch64/sve/pred-not-gen-3.c: Update testcase. * gcc.target/aarch64/sve/pred-not-gen-4.c: Update testcase.
2021-11-10  middle-end: Add an RPO pass after successful vectorization  (Tamar Christina; 1 file, -21/+32)
Following my current SVE predicate optimization series, a problem has presented itself in that the way vector masks are generated for masked operations relies on CSE to share masks efficiently. The issue, however, is that masking is done using the & operator, and & is associative, so reassoc decides to reassociate the masked operations. This then prevents CSE from sharing an unmasked and a masked operation, leading to duplicate operations being performed.

To counter this we want to add an RPO pass over the vectorized loop body when vectorization succeeds. This makes it no longer reliant on the RTL-level CSE.

I have not added a testcase for this as it requires the changes in my patch series; however, the entire series relies on this patch to work, so all the tests there cover it.

gcc/ChangeLog: * tree-vectorizer.c (vectorize_loops): Do local CSE through RPVN upon successful vectorization.
2021-11-10  Grow sbr_vector in ranger's on-entry cache as needed.  (Andrew MacLeod; 1 file, -4/+31)
The on-entry cache does not expect the number of BBs to change. This could happen in various scenarios, recently in the suggestion to use ranger with loop unswitching and also with a work in progress to use the path solver in the loopch pass. This patch fixes both. This is a patch from Andrew, who tested it on x86-64 Linux. gcc/ChangeLog: * gimple-range-cache.cc (sbr_vector::grow): New. (sbr_vector::set_bb_range): Call grow. (sbr_vector::get_bb_range): Same. (sbr_vector::bb_range_p): Remove assert.
2021-11-10  AArch64: Remove shuffle pattern for rounding variant.  (Tamar Christina; 3 files, -34/+2)
This removes the patterns that optimize the rounding shift and narrow. The optimization is valid only for the truncating shift and narrow; for the rounding shift and narrow we need a different pattern that I will submit separately. This wasn't noticed before because the benchmarks did not run conformance as part of the run, which we now do, and this now passes again.

gcc/ChangeLog: * config/aarch64/aarch64-simd.md (*aarch64_topbits_shuffle<mode>_le ,*aarch64_topbits_shuffle<mode>_be): Remove. gcc/testsuite/ChangeLog: * gcc.target/aarch64/shrn-combine-8.c: Update. * gcc.target/aarch64/shrn-combine-9.c: Update.
2021-11-10  Extend modref by side-effect analysis  (Jan Hubicka; 4 files, -57/+219)
Make modref also collect info on whether a function has side effects. This allows pure/const function detection and also handling functions which do store some memory in a similar way to how we handle pure/const now. The code is symmetric to what ipa-pure-const does.

Modref is actually more capable at proving that a given function is pure/const (since it understands that a non-pure function can be called when it only modifies data on the stack), so we could retire ipa-pure-const's pure/const discovery at some point. However, this patch only does the analysis - the consumers of this flag will come next.

Bootstrapped/regtested x86_64-linux. I plan to commit it later today if there are no complaints.

gcc/ChangeLog: * ipa-modref.c: Include tree-eh.h. (modref_summary::modref_summary): Initialize side_effects. (struct modref_summary_lto): New bool field side_effects. (modref_summary_lto::modref_summary_lto): Initialize side_effects. (modref_summary::dump): Dump side_effects. (modref_summary_lto::dump): Dump side_effects. (merge_call_side_effects): Merge side effects. (process_fnspec): A call to a non-const/pure or looping function is a side effect. (analyze_call): Self-recursion is a side effect; handle special builtins. (analyze_load): Watch for volatile and throwing memory. (analyze_store): Likewise. (analyze_stmt): Watch for volatile asm. (analyze_function): Handle side_effects. (modref_summaries::duplicate): Duplicate side_effects. (modref_summaries_lto::duplicate): Likewise. (modref_write): Stream side_effects. (read_section): Likewise. (update_signature): Update. (propagate_unknown_call): Handle side_effects. (modref_propagate_in_scc): Likewise. * ipa-modref.h (struct modref_summary): Add side_effects. * ipa-pure-const.c (special_builtin_state): Rename to ... (builtin_safe_for_const_function_p): ... this one. (check_call): Update. (finite_function_p): Break out from ... (propagate_pure_const): ... here. * ipa-utils.h (finite_function_p): Declare.
2021-11-10  Fix typo in modref-13.c  (Jan Hubicka; 1 file, -1/+1)
gcc/testsuite/ChangeLog: 2021-11-10 Jan Hubicka <hubicka@ucw.cz> * gcc.dg/tree-ssa/modref-13.c: Fix typo.
2021-11-10  rs6000: Remove LINK_OS_EXTRA_SPEC{32,64} from --with-advance-toolchain  (Lucas A. M. Magalhaes; 1 file, -10/+0)
Historically this was added to fill gaps from ld.so.cache on early AT releases. These now just cause errors and rework. Since AT 5.0, AT's ld.so has used a correctly configured ld.so.cache and sets DT_INTERP to AT's ld.so. These two factors are sufficient for an AT-built program to get the correct libraries. GCC configured with --with-advance-toolchain has issues building glibc releases because it adds DT_RUNPATH to ld.so, and that's unsupported.

2021-11-10 Lucas A. M. Magalhães <lamm@linux.ibm.com> gcc/ * config.gcc (powerpc*-*-*): Remove -rpath from --with-advance-toolchain.
2021-11-10  attribs: Implement -Wno-attributes=vendor::attr [PR101940]  (Marek Polacek; 12 files, -12/+374)
It is desirable for -Wattributes to warn about e.g.

  [[deprecate]] void g(); // typo, should warn

However, -Wattributes also warns about vendor-specific attributes (that's because lookup_scoped_attribute_spec -> find_attribute_namespace finds nothing), which, with -Werror, causes grief. We don't want the -Wattributes warning for

  [[company::attr]] void f();

GCC warns because it doesn't know the "company" namespace; it only knows the "gnu" and "omp" namespaces. We could entirely disable warning about attributes in unknown scopes, but then the compiler would also miss typos like

  [[company::attrx]] void f();

or

  [[gmu::warn_used_result]] int write();

so that is not a viable solution. A workaround is to use a #pragma:

  #pragma GCC diagnostic push
  #pragma GCC diagnostic ignored "-Wattributes"
  [[company::attr]] void f() {}
  #pragma GCC diagnostic pop

but that's a mouthful, awkward to use, and could also hide typos. In fact, any macro-based solution doesn't seem like a way forward.

This patch implements -Wno-attributes=, which takes these arguments:

  company::attr
  company::

This option should go well with using @file: the user could have a file containing

  -Wno-attributes=vendor::attr1,vendor::attr2

and then invoke gcc with '@attrs' or similar. I've also added a new pragma, #pragma GCC diagnostic ignored_attributes, which has the same effect. The pragma along with the new option should help with various static analysis tools.

PR c++/101940 gcc/ChangeLog: * attribs.c (struct scoped_attributes): Add a bool member. (lookup_scoped_attribute_spec): Forward declare. (register_scoped_attributes): New bool parameter, defaulted to false. Use it. (handle_ignored_attributes_option): New function. (free_attr_data): New function. (init_attributes): Call handle_ignored_attributes_option. (attr_namespace_ignored_p): New function. (decl_attributes): Check attr_namespace_ignored_p before warning. * attribs.h (free_attr_data): Declare. (register_scoped_attributes): Adjust declaration. (handle_ignored_attributes_option): Declare.
(canonicalize_attr_name): New function template. (canonicalize_attr_name): Use it. * common.opt (Wattributes=): New option with a variable. * doc/extend.texi: Document #pragma GCC diagnostic ignored_attributes. * doc/invoke.texi: Document -Wno-attributes=. * opts.c (common_handle_option) <case OPT_Wattributes_>: Handle. * plugin.h (register_scoped_attributes): Adjust declaration. * toplev.c (compile_file): Call free_attr_data. gcc/c-family/ChangeLog: * c-pragma.c (handle_pragma_diagnostic): Handle #pragma GCC diagnostic ignored_attributes. gcc/testsuite/ChangeLog: * c-c++-common/Wno-attributes-1.c: New test. * c-c++-common/Wno-attributes-2.c: New test. * c-c++-common/Wno-attributes-3.c: New test.
2021-11-10  arm: enable cortex-a710 CPU  (Przemyslaw Wirkus; 4 files, -5/+20)
This patch adds support for the Cortex-A710 CPU to the arm port. gcc/ChangeLog: * config/arm/arm-cpus.in (cortex-a710): New CPU. * config/arm/arm-tables.opt: Regenerate. * config/arm/arm-tune.md: Regenerate. * doc/invoke.texi: Update docs.
2021-11-10  [AArch64] Fix bootstrap failure due to missing ATTRIBUTE_UNUSED  (Andre Vieira; 1 file, -1/+1)
gcc/ChangeLog: * config/aarch64/aarch64-builtins.c (aarch64_general_gimple_fold_builtin): Mark argument as unused.
2021-11-10  lto-wrapper: fix memory corruption.  (Martin Liska; 1 file, -1/+1)
The first argument of merge_and_complain is actually the vector where we merge options, and it should be propagated to the caller properly. Fixes:

==6656== Invalid read of size 8
==6656==    at 0x408056: merge_and_complain (lto-wrapper.c:335)
==6656==    by 0x408056: find_and_merge_options(int, long, char const*, vec<cl_decoded_option, va_heap, vl_ptr>, vec<cl_decoded_option, va_heap, vl_ptr>*, char const*) (lto-wrapper.c:1139)
==6656==    by 0x408AFC: run_gcc(unsigned int, char**) (lto-wrapper.c:1505)
==6656==    by 0x4061A2: main (lto-wrapper.c:2138)
==6656==  Address 0x4e69b18 is 344 bytes inside a block of size 1,768 free'd
==6656==    at 0x484339F: realloc (vg_replace_malloc.c:1192)
==6656==    by 0x4993C0: xrealloc (xmalloc.c:181)
==6656==    by 0x406A82: reserve<cl_decoded_option> (vec.h:290)
==6656==    by 0x406A82: reserve (vec.h:1858)
==6656==    by 0x406A82: vec<cl_decoded_option, va_heap, vl_ptr>::safe_push(cl_decoded_option const&) [clone .isra.0] (vec.h:1967)
==6656==    by 0x4077E0: merge_and_complain (lto-wrapper.c:457)
==6656==    by 0x4077E0: find_and_merge_options(int, long, char const*, vec<cl_decoded_option, va_heap, vl_ptr>, vec<cl_decoded_option, va_heap, vl_ptr>*, char const*) (lto-wrapper.c:1139)
==6656==    by 0x408AFC: run_gcc(unsigned int, char**) (lto-wrapper.c:1505)
==6656==    by 0x4061A2: main (lto-wrapper.c:2138)
==6656==  Block was alloc'd at
==6656==    at 0x483E70F: malloc (vg_replace_malloc.c:380)
==6656==    by 0x4993D7: xrealloc (xmalloc.c:179)
==6656==    by 0x407476: reserve<cl_decoded_option> (vec.h:290)
==6656==    by 0x407476: reserve (vec.h:1858)
==6656==    by 0x407476: reserve_exact (vec.h:1878)
==6656==    by 0x407476: create (vec.h:1893)
==6656==    by 0x407476: get_options_from_collect_gcc_options(char const*, char const*) (lto-wrapper.c:163)
==6656==    by 0x407674: find_and_merge_options(int, long, char const*, vec<cl_decoded_option, va_heap, vl_ptr>, vec<cl_decoded_option, va_heap, vl_ptr>*, char const*) (lto-wrapper.c:1132)
==6656==    by 0x408AFC: run_gcc(unsigned int, char**) (lto-wrapper.c:1505)
==6656==    by 0x4061A2: main (lto-wrapper.c:2138)

gcc/ChangeLog: * lto-wrapper.c (merge_and_complain): Make the first argument a reference type.
2021-11-10  aarch64: Tweak FMAX/FMIN iterators  (Richard Sandiford; 6 files, -51/+39)
There was some duplication between the maxmin_uns (uns for unspec rather than unsigned) int attribute and the optab int attribute. The difficulty for FMAXNM and FMINNM is that the instructions really correspond to two things: the smax/smin optabs for floats (used only for fast-math-like flags) and the fmax/fmin optabs (used for built-in functions). The optab attribute was consistently for the former but maxmin_uns had a mixture of both.

This patch renames maxmin_uns to fmaxmin and only uses it for the fmax and fmin optabs. The reductions that previously used the maxmin_uns attribute now use the optab attribute instead.

FMAX and FMIN are awkward in that they don't correspond to any optab. It's nevertheless useful to define them alongside the “real” optabs. Previously they were known as “smax_nan” and “smin_nan”, but the problem with those names is that smax and smin are only used for floats if NaNs don't matter. This patch therefore uses fmax_nan and fmin_nan instead.

There is still some inconsistency, in that the optab attribute handles UNSPEC_COND_FMAX but the fmaxmin attribute handles UNSPEC_FMAX. This is because the SVE FP instructions, being predicated, have to use unspecs in cases where the Advanced SIMD ones could use rtl codes. At least there are no duplicate entries though, so this seemed like the best compromise for now.

gcc/ * config/aarch64/iterators.md (optab): Use fmax_nan instead of smax_nan and fmin_nan instead of smin_nan. (maxmin_uns): Rename to... (fmaxmin): ...this and make the same changes. Remove entries unrelated to fmax* and fmin*. * config/aarch64/aarch64.md (<maxmin_uns><mode>3): Rename to... (<fmaxmin><mode>3): ...this. * config/aarch64/aarch64-simd.md (aarch64_<maxmin_uns>p<mode>): Rename to... (aarch64_<optab>p<mode>): ...this. (<maxmin_uns><mode>3): Rename to... (<fmaxmin><mode>3): ...this. (reduc_<maxmin_uns>_scal_<mode>): Rename to... (reduc_<optab>_scal_<mode>): ...this and update gen* call.
(aarch64_reduc_<maxmin_uns>_internal<mode>): Rename to... (aarch64_reduc_<optab>_internal<mode>): ...this. (aarch64_reduc_<maxmin_uns>_internalv2si): Rename to... (aarch64_reduc_<optab>_internalv2si): ...this. * config/aarch64/aarch64-sve.md (<maxmin_uns><mode>3): Rename to... (<fmaxmin><mode>3): ...this. * config/aarch64/aarch64-simd-builtins.def (smax_nan, smin_nan) Rename to... (fmax_nan, fmin_nan): ...this. * config/aarch64/arm_neon.h (vmax_f32, vmax_f64, vmaxq_f32, vmaxq_f64) (vmin_f32, vmin_f64, vminq_f32, vminq_f64, vmax_f16, vmaxq_f16) (vmin_f16, vminq_f16): Update accordingly.
2021-11-10  vect: Pass scalar_costs to finish_cost  (Richard Sandiford; 6 files, -18/+23)
When finishing the vector costs, it can be useful to know what the associated scalar costs were. This allows targets to read information collected about the original scalar loop when trying to make a final judgement about the cost of the vector code. This patch therefore passes the scalar costs to vector_costs::finish_cost. The parameter is null for the scalar costs themselves. gcc/ * tree-vectorizer.h (vector_costs::finish_cost): Take the corresponding scalar costs as a parameter. (finish_cost): Likewise. * tree-vect-loop.c (vect_compute_single_scalar_iteration_cost) (vect_estimate_min_profitable_iters): Update accordingly. * tree-vect-slp.c (vect_bb_vectorization_profitable_p): Likewise. * tree-vectorizer.c (vector_costs::finish_cost): Likewise. * config/aarch64/aarch64.c (aarch64_vector_costs::finish_cost): Likewise. * config/rs6000/rs6000.c (rs6000_cost_data::finish_cost): Likewise.
2021-11-10  vect: Keep scalar costs around longer  (Richard Sandiford; 2 files, -15/+19)
The scalar costs for a loop are fleeting, with only the final single_scalar_iteration_cost being kept for later comparison. This patch replaces single_scalar_iteration_cost with the cost structure, so that (with later patches) it's possible for targets to examine other target-specific cost properties as well. This will be done by passing the scalar costs to hooks where appropriate; targets shouldn't try to read the information directly from loop_vec_infos. gcc/ * tree-vectorizer.h (_loop_vec_info::scalar_costs): New member variable. (_loop_vec_info::single_scalar_iteration_cost): Delete. (LOOP_VINFO_SINGLE_SCALAR_ITERATION_COST): Delete. (vector_costs::total_cost): New function. * tree-vect-loop.c (_loop_vec_info::_loop_vec_info): Update after above changes. (_loop_vec_info::~_loop_vec_info): Delete scalar_costs. (vect_compute_single_scalar_iteration_cost): Store the costs in loop_vinfo->scalar_costs. (vect_estimate_min_profitable_iters): Get the scalar cost from loop_vinfo->scalar_costs.
2021-11-10  vect: Hookize better_loop_vinfo_p  (Richard Sandiford; 3 files, -137/+226)
One of the things we want to do on AArch64 is compare vector loops side-by-side and pick the best one. For some targets, we want this to be based on issue rates as well as the usual latency-based costs (at least for loops with relatively high iteration counts).

The current approach to doing this is: when costing vectorisation candidate A, try to guess what the other main candidate B will look like and adjust A's latency-based cost up or down based on the likely difference between A and B's issue rates. This effectively means that we try to cost parts of B at the same time as A, without actually being able to see B.

This is needlessly indirect and complex. It was a compromise due to the code being added (too) late in the GCC 11 cycle, so that target-independent changes weren't possible.

The target-independent code already compares two candidate loop_vec_infos side-by-side, so information about A and B above is available directly. This patch creates a way for targets to hook into this comparison.

The AArch64 code can therefore hook into better_main_loop_than_p to compare issue rates. If the issue rate comparison isn't decisive, the code can fall back to the normal latency-based comparison instead.

gcc/ * tree-vectorizer.h (vector_costs::better_main_loop_than_p) (vector_costs::better_epilogue_loop_than_p) (vector_costs::compare_inside_loop_cost) (vector_costs::compare_outside_loop_cost): Declare. * tree-vectorizer.c (vector_costs::better_main_loop_than_p) (vector_costs::better_epilogue_loop_than_p) (vector_costs::compare_inside_loop_cost) (vector_costs::compare_outside_loop_cost): New functions, containing code moved from... * tree-vect-loop.c (vect_better_loop_vinfo_p): ...here.
2021-11-10  vect: Remove vec_outside/inside_cost fields  (Richard Sandiford; 2 files, -21/+19)
The vector costs now use a common base class instead of being completely abstract. This means that there's no longer a need to record the inside and outside costs separately. gcc/ * tree-vectorizer.h (_loop_vec_info): Remove vec_outside_cost and vec_inside_cost. (vector_costs::outside_cost): New function. * tree-vect-loop.c (_loop_vec_info::_loop_vec_info): Update after above. (vect_estimate_min_profitable_iters): Likewise. (vect_better_loop_vinfo_p): Get the inside and outside costs from the loop_vec_infos' vector_costs.
2021-11-10  vect: Move vector costs to loop_vec_info  (Richard Sandiford; 4 files, -20/+18)
target_cost_data is in vec_info but is really specific to loop_vec_info. This patch moves it there and renames it to vector_costs, to distinguish it from scalar target costs. gcc/ * tree-vectorizer.h (vec_info::target_cost_data): Replace with... (_loop_vec_info::vector_costs): ...this. (LOOP_VINFO_TARGET_COST_DATA): Delete. * tree-vectorizer.c (vec_info::vec_info): Remove target_cost_data initialization. (vec_info::~vec_info): Remove corresponding delete. * tree-vect-loop.c (_loop_vec_info::_loop_vec_info): Initialize vector_costs to null. (_loop_vec_info::~_loop_vec_info): Delete vector_costs. (vect_analyze_loop_operations): Update after above changes. (vect_analyze_loop_2): Likewise. (vect_estimate_min_profitable_iters): Likewise. * tree-vect-slp.c (vect_slp_analyze_operations): Likewise.
2021-11-10  Make EAF flags more regular (and expressive)  (Jan Hubicka; 16 files, -172/+302)
I hoped that I was done with EAF-flags-related changes, but while looking into the Fortran testcases I noticed that I had designed them in an unnecessarily restricted way. I followed the scheme of NOESCAPE and NODIRECTESCAPE, which is however the only property that is naturally transitive.

This patch replaces the existing flags by 9 flags:

  EAF_UNUSED
  EAF_NO_DIRECT_CLOBBER and EAF_NO_INDIRECT_CLOBBER
  EAF_NO_DIRECT_READ and EAF_NO_INDIRECT_READ
  EAF_NO_DIRECT_ESCAPE and EAF_NO_INDIRECT_ESCAPE
  EAF_NOT_RETURNED_DIRECTLY and EAF_NOT_RETURNED_INDIRECTLY

So I have removed the unified EAF_DIRECT flag and made each of the flags come in a direct and an indirect variant. Newly, the indirect variant is not implied by the direct one (well, except for escape, but that is not special-cased in the code).

Consequently we can analyse e.g. the case where a function reads directly and clobbers indirectly, as in the following testcase:

  struct wrap { void **array; };

  __attribute__ ((noinline))
  void write_array (struct wrap *ptr)
  {
    ptr->array[0]=0;
  }

  int test ()
  {
    void *arrayval;
    struct wrap w = {&arrayval};
    write_array (&w);
    return w.array == &arrayval;
  }

This is pretty common in array descriptors and also C++ pointer wrappers or structures containing pointers to arrays. Another advantage is that for !binds_to_current_def_p functions we can still track the fact that the value is not clobbered indirectly, while previously we implied EAF_DIRECT for all three cases.

Finally the propagation becomes more regular and, I hope, easier to understand because the flags are handled in a symmetric way. In tree-ssa-structalias I now produce a "callarg" var_info as before and, if necessary, also "indircallarg" for the indirect accesses. I added some logic to optimize the common case where we cannot distinguish between direct and indirect.

gcc/ChangeLog: 2021-11-09 Jan Hubicka <hubicka@ucw.cz> * tree-core.h (EAF_DIRECT): Remove. (EAF_NOCLOBBER): Remove. (EAF_UNUSED): Remove. (EAF_NOESCAPE): Remove. (EAF_NO_DIRECT_CLOBBER): New.
(EAF_NO_INDIRECT_CLOBBER): New. (EAF_NODIRECTESCAPE): Remove. (EAF_NO_DIRECT_ESCAPE): New. (EAF_NO_INDIRECT_ESCAPE): New. (EAF_NOT_RETURNED): Remove. (EAF_NOT_RETURNED_INDIRECTLY): New. (EAF_NOREAD): Remove. (EAF_NO_DIRECT_READ): New. (EAF_NO_INDIRECT_READ): New. * gimple.c (gimple_call_arg_flags): Update for new flags. (gimple_call_retslot_flags): Update for new flags. * ipa-modref.c (dump_eaf_flags): Likewise. (remove_useless_eaf_flags): Likewise. (deref_flags): Likewise. (modref_lattice::init): Likewise. (modref_lattice::merge): Likewise. (modref_lattice::merge_direct_load): Likewise. (modref_lattice::merge_direct_store): Likewise. (modref_eaf_analysis::merge_call_lhs_flags): Likewise. (callee_to_caller_flags): Likewise. (modref_eaf_analysis::analyze_ssa_name): Likewise. (modref_eaf_analysis::propagate): Likewise. (modref_merge_call_site_flags): Likewise. * ipa-modref.h (interposable_eaf_flags): Likewise. * tree-ssa-alias.c: (ref_maybe_used_by_call_p_1) Likewise. * tree-ssa-structalias.c (handle_call_arg): Likewise. (handle_rhs_call): Likewise. * tree-ssa-uninit.c (maybe_warn_pass_by_reference): Likewise. gcc/testsuite/ChangeLog: * g++.dg/ipa/modref-1.C: Update template. * gcc.dg/ipa/modref-3.c: Update template. * gcc.dg/lto/modref-3_0.c: Update template. * gcc.dg/lto/modref-4_0.c: Update template. * gcc.dg/tree-ssa/modref-10.c: Update template. * gcc.dg/tree-ssa/modref-11.c: Update template. * gcc.dg/tree-ssa/modref-5.c: Update template. * gcc.dg/tree-ssa/modref-6.c: Update template. * gcc.dg/tree-ssa/modref-13.c: New test.
2021-11-10  testsuite: change vect_long to vect_long_long in complex tests.  (Tamar Christina; 4 files, -13/+9)
These tests are still failing on SPARC and it looks like this is because I need to use vect_long_long instead of vect_long. gcc/testsuite/ChangeLog: PR testsuite/103042 * gcc.dg/vect/complex/bb-slp-complex-add-pattern-long.c: Use vect_long_long instead of vect_long. * gcc.dg/vect/complex/bb-slp-complex-add-pattern-unsigned-long.c: Likewise. * gcc.dg/vect/complex/vect-complex-add-pattern-long.c: Likewise. * gcc.dg/vect/complex/vect-complex-add-pattern-unsigned-long.c: Likewise.
2021-11-10  middle-end: Fix signbit tests when run on an ISA with support for masks.  (Tamar Christina; 2 files, -1/+10)
These tests don't work on vector ISAs where the truth type doesn't match the vector mode of the operation. However, I still want the tests to run on these architectures, just with the ISA modes that enable masks turned off. This thus turns off SVE if it's on, and turns off AVX512 if it's on. gcc/testsuite/ChangeLog: * gcc.dg/signbit-2.c: Turn off masks. * gcc.dg/signbit-5.c: Likewise.
2021-11-10  vect: remove unused variable in complex numbers detection code.  (Tamar Christina; 1 file, -1/+0)
This removes an unused variable that Clang catches when compiling GCC. gcc/ChangeLog: * tree-vect-slp-patterns.c (complex_mul_pattern::matches): Remove l1node.
2021-11-10  libstdc++: Fix test for libstdc++ not including <unistd.h> [PR100117]  (Jonathan Wakely; 1 file, -1/+112)
The <cxxx> headers for the C library are not under our control, so we can't prevent them from including <unistd.h>. Change the PR 49745 test to only include the C++ library headers, not the <cxxx> ones. To ensure <bits/stdc++.h> isn't included automatically we need to use no_pch to disable PCH. libstdc++-v3/ChangeLog: PR libstdc++/100117 * testsuite/17_intro/headers/c++1998/49745.cc: Explicitly list all C++ headers instead of including <bits/stdc++.h>
2021-11-10  libstdc++: Disable gthreads weak symbols for glibc 2.34 [PR103133]  (Jonathan Wakely; 1 file, -0/+6)
Since Glibc 2.34 all pthreads symbols are defined directly in libc not libpthread, and since Glibc 2.32 we have used __libc_single_threaded to avoid unnecessary locking in single-threaded programs. This means there is no reason to avoid linking to libpthread now, and so no reason to use weak symbols defined in gthr-posix.h for all the pthread_xxx functions. libstdc++-v3/ChangeLog: PR libstdc++/100748 PR libstdc++/103133 * config/os/gnu-linux/os_defines.h (_GLIBCXX_GTHREAD_USE_WEAK): Define for glibc 2.34 and later.
2021-11-10  testsuite/102690 - XFAIL g++.dg/warn/Warray-bounds-16.C  (Richard Biener; 1 file, -3/+3)
This XFAILs the bogus diagnostic test and rectifies the expectation on the optimization. 2021-11-10 Richard Biener <rguenther@suse.de> PR testsuite/102690 * g++.dg/warn/Warray-bounds-16.C: XFAIL diagnostic part and optimization.
2021-11-10  [AArch64] Fix TBAA information when lowering NEON loads and stores to gimple  (Andre Vieira; 2 files, -18/+47)
This patch fixes the wrong TBAA information when lowering NEON loads and stores to gimple that showed up when bootstrapping with UBSAN. gcc/ChangeLog: * config/aarch64/aarch64-builtins.c (aarch64_general_gimple_fold_builtin): Change pointer alignment and alias. gcc/testsuite/ChangeLog: * gcc.target/aarch64/simd/lowering_tbaa.c: New test.
2021-11-10  [AArch64] Fix big-endian testisms introduced by NEON gimple lowering patch  (Andre Vieira; 3 files, -6/+12)
This patch reverts the tests for big-endian after the NEON gimple lowering patch. The earlier patch only lowers NEON loads and stores for little-endian, meaning the codegen now differs between endianness, so we need target-specific testing. gcc/testsuite/ChangeLog: * gcc.target/aarch64/fmla_intrinsic_1.c: Fix big-endian testism. * gcc.target/aarch64/fmls_intrinsic_1.c: Likewise. * gcc.target/aarch64/fmul_intrinsic_1.c: Likewise.
2021-11-10  Fix modref_tree::remap_params  (Jan Hubicka; 1 file, -1/+1)
gcc/ChangeLog: * ipa-modref-tree.h (modref_tree::remap_params): Fix off-by-one error.
2021-11-10  rs6000, libgcc: Fix up -Wmissing-prototypes warning on rs6000/linux-unwind.h  (Jakub Jelinek; 1 file, -1/+2)
Jonathan reported, and I've verified, a

  In file included from ../../../libgcc/unwind-dw2.c:412:
  ./md-unwind-support.h:398:6: warning: no previous prototype for ‘ppc_backchain_fallback’ [-Wmissing-prototypes]
    398 | void ppc_backchain_fallback (struct _Unwind_Context *context, void *a)
        |      ^~~~~~~~~~~~~~~~~~~~~~

warning on powerpc*-linux* libgcc builds. All the other MD_* macro functions are static, so I think the following is the right thing rather than adding a previous prototype for ppc_backchain_fallback.

2021-11-10 Jakub Jelinek <jakub@redhat.com> * config/rs6000/linux-unwind.h (ppc_backchain_fallback): Make it static, formatting fix.
2021-11-10Improve integer bit test on __atomic_fetch_[or|and]_* returnsliuhongt29-42/+1557
commit adedd5c173388ae505470df152b9cb3947339566
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Tue May 3 13:37:25 2016 +0200

    re PR target/49244 (__sync or __atomic builtins will not emit 'lock bts/btr/btc')

optimized bit test on __atomic_fetch_or_* and __atomic_fetch_and_* returns with lock bts/btr/btc by turning

  mask_2 = 1 << cnt_1;
  _4 = __atomic_fetch_or_* (ptr_6, mask_2, _3);
  _5 = _4 & mask_2;

into

  _4 = ATOMIC_BIT_TEST_AND_SET (ptr_6, cnt_1, 0, _3);
  _5 = _4;

and

  mask_6 = 1 << bit_5(D);
  _1 = ~mask_6;
  _2 = __atomic_fetch_and_4 (v_8(D), _1, 0);
  _3 = _2 & mask_6;
  _4 = _3 != 0;

into

  mask_6 = 1 << bit_5(D);
  _1 = ~mask_6;
  _11 = .ATOMIC_BIT_TEST_AND_RESET (v_8(D), bit_5(D), 1, 0);
  _4 = _11 != 0;

But it failed to optimize many equivalent, but slightly different cases:

1.
  _1 = __atomic_fetch_or_4 (ptr_6, 1, _3);
  _4 = (_Bool) _1;
2.
  _1 = __atomic_fetch_and_4 (ptr_6, ~1, _3);
  _4 = (_Bool) _1;
3.
  _1 = __atomic_fetch_or_4 (ptr_6, 1, _3);
  _7 = ~_1;
  _5 = (_Bool) _7;
4.
  _1 = __atomic_fetch_and_4 (ptr_6, ~1, _3);
  _7 = ~_1;
  _5 = (_Bool) _7;
5.
  _1 = __atomic_fetch_or_4 (ptr_6, 1, _3);
  _2 = (int) _1;
  _7 = ~_2;
  _5 = (_Bool) _7;
6.
  _1 = __atomic_fetch_and_4 (ptr_6, ~1, _3);
  _2 = (int) _1;
  _7 = ~_2;
  _5 = (_Bool) _7;
7.
  _1 = __atomic_fetch_or_4 (ptr_6, 0x80000000, _3);
  _5 = (signed int) _1;
  _4 = _5 < 0;
8.
  _1 = __atomic_fetch_and_4 (ptr_6, 0x7fffffff, _3);
  _5 = (signed int) _1;
  _4 = _5 < 0;
9.
  _1 = 1 << bit_4(D);
  mask_5 = (unsigned int) _1;
  _2 = __atomic_fetch_or_4 (v_7(D), mask_5, 0);
  _3 = _2 & mask_5;
10.
  mask_7 = 1 << bit_6(D);
  _1 = ~mask_7;
  _2 = (unsigned int) _1;
  _3 = __atomic_fetch_and_4 (v_9(D), _2, 0);
  _4 = (int) _3;
  _5 = _4 & mask_7;

We make

  mask_2 = 1 << cnt_1;
  _4 = __atomic_fetch_or_* (ptr_6, mask_2, _3);
  _5 = _4 & mask_2;

and

  mask_6 = 1 << bit_5(D);
  _1 = ~mask_6;
  _2 = __atomic_fetch_and_4 (v_8(D), _1, 0);
  _3 = _2 & mask_6;
  _4 = _3 != 0;

the canonical forms for this optimization and transform cases 1-9 to the equivalent canonical form. 
For cases 10 and 11, we simply remove the cast before __atomic_fetch_or_4/__atomic_fetch_and_4 with

  _1 = 1 << bit_4(D);
  _2 = __atomic_fetch_or_4 (v_7(D), _1, 0);
  _3 = _2 & _1;

and

  mask_7 = 1 << bit_6(D);
  _1 = ~mask_7;
  _3 = __atomic_fetch_and_4 (v_9(D), _1, 0);
  _6 = _3 & mask_7;
  _5 = (int) _6;

2021-11-04 H.J. Lu <hongjiu.lu@intel.com> Hongtao Liu <hongtao.liu@intel.com> gcc/ PR middle-end/102566 * match.pd (nop_atomic_bit_test_and_p): New match. * tree-ssa-ccp.c (convert_atomic_bit_not): New function. (gimple_nop_atomic_bit_test_and_p): New prototype. (optimize_atomic_bit_test_and): Transform equivalent, but slightly different cases to their canonical forms. gcc/testsuite/ PR middle-end/102566 * g++.target/i386/pr102566-1.C: New test. * g++.target/i386/pr102566-2.C: Likewise. * g++.target/i386/pr102566-3.C: Likewise. * g++.target/i386/pr102566-4.C: Likewise. * g++.target/i386/pr102566-5a.C: Likewise. * g++.target/i386/pr102566-5b.C: Likewise. * g++.target/i386/pr102566-6a.C: Likewise. * g++.target/i386/pr102566-6b.C: Likewise. * gcc.target/i386/pr102566-1a.c: Likewise. * gcc.target/i386/pr102566-1b.c: Likewise. * gcc.target/i386/pr102566-2.c: Likewise. * gcc.target/i386/pr102566-3a.c: Likewise. * gcc.target/i386/pr102566-3b.c: Likewise. * gcc.target/i386/pr102566-4.c: Likewise. * gcc.target/i386/pr102566-5.c: Likewise. * gcc.target/i386/pr102566-6.c: Likewise. * gcc.target/i386/pr102566-7.c: Likewise. * gcc.target/i386/pr102566-8a.c: Likewise. * gcc.target/i386/pr102566-8b.c: Likewise. * gcc.target/i386/pr102566-9a.c: Likewise. * gcc.target/i386/pr102566-9b.c: Likewise. * gcc.target/i386/pr102566-10a.c: Likewise. * gcc.target/i386/pr102566-10b.c: Likewise. * gcc.target/i386/pr102566-11.c: Likewise. * gcc.target/i386/pr102566-12.c: Likewise. * gcc.target/i386/pr102566-13.c: New test. * gcc.target/i386/pr102566-14.c: New test.
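As a hedged illustration (not part of the commit message), the C-level idiom behind the canonical forms above is an `__atomic_fetch_or`/`__atomic_fetch_and` whose result is tested against the same mask; on x86 GCC can turn the mask test into the carry flag of `lock bts`/`lock btr`, and the functions behave identically either way:

```c
#include <stdbool.h>

/* Atomically set bit `bit` in *word; return whether it was already set.
   This is the fetch_or + mask-test shape the pass canonicalizes.  */
bool
test_and_set_bit (unsigned int *word, unsigned int bit)
{
  unsigned int mask = 1u << bit;
  unsigned int old = __atomic_fetch_or (word, mask, __ATOMIC_SEQ_CST);
  return (old & mask) != 0;
}

/* Atomically clear bit `bit` in *word; return whether it was set.
   The fetch_and (~mask) + mask-test counterpart (lock btr).  */
bool
test_and_reset_bit (unsigned int *word, unsigned int bit)
{
  unsigned int mask = 1u << bit;
  unsigned int old = __atomic_fetch_and (word, ~mask, __ATOMIC_SEQ_CST);
  return (old & mask) != 0;
}
```

Compiled with `gcc -O2` on x86_64, the mask test can fold into the `bts`/`btr` carry flag instead of a separate `and`; the commit extends recognition to the slightly different source shapes listed above.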
2021-11-10[Ada] Minor cleanup in translation of calls to subprogramsEric Botcazou3-65/+60
gcc/ada/ * gcc-interface/ada-tree.h (DECL_STUBBED_P): Delete. * gcc-interface/decl.c (gnat_to_gnu_entity): Do not set it. * gcc-interface/trans.c (Call_to_gnu): Use GNAT_NAME local variable and adjust accordingly. Replace test on DECL_STUBBED_P with direct test on Convention and move it down in the processing.
2021-11-10[Ada] Warn for bidirectional charactersBob Duff2-4/+54
gcc/ada/ * scng.adb (Check_Bidi): New procedure to give warning. Note that this is called only for non-ASCII characters, so should not be an efficiency issue. (Slit): Call Check_Bidi for wide characters in string_literals. (Minus_Case): Call Check_Bidi for wide characters in comments. (Char_Literal_Case): Call Check_Bidi for wide characters in character_literals. Move Accumulate_Checksum down, because otherwise, if Err is True, the Code is uninitialized. * errout.ads: Make the obsolete nature of "Insertion character ?" more prominent; one should not have to read several paragraphs before finding out that it's obsolete.
2021-11-10[Ada] Avoid warnings regarding rep clauses in generics -- follow-onBob Duff1-1/+1
gcc/ada/ * repinfo.adb (List_Component_Layout): Initialize Sbit.
2021-11-10[Ada] Fix comments about expansion of array equalityPiotr Trojanek1-2/+3
gcc/ada/ * exp_ch4.adb (Expand_Array_Equality): Fix inconsistent casing in comment about the template for expansion of array equality; now we use lower case for true/false/boolean. (Handle_One_Dimension): Fix comment about the template for expansion of array equality.
2021-11-10[Ada] Avoid warnings regarding rep clauses in genericsBob Duff2-9/+26
gcc/ada/ * repinfo.adb (List_Common_Type_Info, List_Object_Info): Add check for In_Generic_Scope. (List_Component_Layout): Check for known static values. * sem_ch13.adb (Check_Record_Representation_Clause): Add check for In_Generic_Scope.
2021-11-10[Ada] ACATS BDC1002 shall not error on arbitrary aspectEtienne Servais10-76/+164
gcc/ada/ * aspects.adb, aspects.ads (Is_Aspect_Id): New function. * namet-sp.ads, namet-sp.adb (Aspect_Spell_Check, Attribute_Spell_Check): New Functions. * par-ch13.adb (Possible_Misspelled_Aspect): Removed. (With_Present): Use Aspect_Spell_Check, use Is_Aspect_Id. (Get_Aspect_Specifications): Use Aspect_Spell_Check, Is_Aspect_Id, Bad_Aspect. * par-sync.adb (Resync_Past_Malformed_Aspect): Use Is_Aspect_Id. * sem_ch13.adb (Check_One_Attr): Use Is_Aspect_Id. * sem_prag.adb (Process_Restrictions_Or_Restriction_Warnings): Introduce the Process_No_Specification_Of_Aspect, emit a warning instead of an error on unknown aspect, hint for typos. Introduce Process_No_Use_Of_Attribute to add spell check for attributes too. (Set_Error_Msg_To_Profile_Name): Use Is_Aspect_Id. * sem_util.adb (Bad_Attribute): Use Attribute_Spell_Check. (Bad_Aspect): New function. * sem_util.ads (Bad_Aspect): New function.
2021-11-10[Ada] Do not assume a priority value of zero is a valid priorityPatrick Bernardi3-7/+8
gcc/ada/ * libgnarl/s-taskin.adb (Initialize_ATCB): Initialize T.Common.Current_Priority to Priority'First. * libgnarl/s-taskin.ads (Unspecified_Priority): Redefined as -1. * libgnat/system-rtems.ads: Start priority range from 1, as 0 is reserved by the operating system.
2021-11-10[Ada] Prove double precision integer arithmetic unitPierre-Alexandre Bazin6-113/+2515
gcc/ada/ * libgnat/a-nbnbig.ads: Mark the unit as Pure. * libgnat/s-aridou.adb: Add contracts and ghost code for proof. (Scaled_Divide): Reorder operations and use of temporaries in two places to facilitate proof. * libgnat/s-aridou.ads: Add full functional contracts. * libgnat/s-arit64.adb: Mark in SPARK. * libgnat/s-arit64.ads: Add contracts similar to those from s-aridou.ads. * rtsfind.ads: Document the limitation that runtime units loading does not work for private with-clauses.
2021-11-10[Ada] Don't carry action bodies for expansion of array equalityPiotr Trojanek4-61/+34
gcc/ada/ * exp_ch3.adb (Make_Eq_Body): Adapt call to Expand_Record_Equality. * exp_ch4.ads, exp_ch4.adb (Expand_Composite_Equality): Remove Bodies parameter; adapt comment; fix style in body; adapt calls to Expand_Record_Equality. (Expand_Array_Equality): Adapt calls to Expand_Composite_Equality. (Expand_Record_Equality): Remove Bodies parameter; adapt comment; adapt call to Expand_Composite_Equality. * exp_ch8.adb (Build_Body_For_Renaming): Adapt call to Expand_Record_Equality.
2021-11-10[Ada] Use predefined equality for arrays inside recordsPiotr Trojanek1-68/+2
gcc/ada/ * exp_ch4.adb (Expand_Composite_Equality): Handle arrays inside records just like scalars; only records inside records need dedicated handling.
2021-11-10[Ada] Fix oversight in latest change to Has_Compatible_TypeEric Botcazou3-9/+26
gcc/ada/ * sem_type.ads (Has_Compatible_Type): Add For_Comparison parameter. * sem_type.adb (Has_Compatible_Type): Put back the reversed calls to Covers guarded with For_Comparison. * sem_ch4.adb (Analyze_Membership_Op) <Try_One_Interp>: Remove new reversed call to Covers and set For_Comparison to true instead. (Find_Comparison_Types) <Try_One_Interp>: Likewise (Find_Equality_Types) <Try_One_Interp>: Likewise.
2021-11-10[Ada] Create explicit ghost mirror unit for big integersYannick Moy5-9/+37
gcc/ada/ * Makefile.rtl: Add unit. * libgnat/a-nbnbin__ghost.adb: Move... * libgnat/a-nbnbig.adb: ... here. Mark ghost as ignored. * libgnat/a-nbnbin__ghost.ads: Move... * libgnat/a-nbnbig.ads: ... here. Add comment for purpose of this unit. Mark ghost as ignored. * libgnat/s-widthu.adb: Use new unit. * sem_aux.adb (First_Subtype): Adapt to the case of a ghost type whose freeze node is rewritten to a null statement.
2021-11-10[Ada] Fix Constraint_Error on regexp close bracket find algorithmEtienne Servais1-48/+59
gcc/ada/ * libgnat/s-regexp.adb (Check_Well_Formed_Pattern): Fix Constraint_Error on missing close bracket.
2021-11-10[Ada] Extend optimized equality of 2-element arraysPiotr Trojanek1-13/+11
gcc/ada/ * exp_ch4.adb (Expand_Array_Equality): Remove check of the array bound being an N_Range node; use Type_High_Bound/Type_Low_Bound, which handle all kinds of array bounds.
2021-11-10[Ada] Warn when interfaces swapped between full and partial viewEtienne Servais1-21/+53
gcc/ada/ * sem_ch3.adb (Derived_Type_Declaration): Introduce a subprogram for tree transformation. If a tree transformation is performed, then warn that it would be better to reorder the interfaces.
2021-11-10[Ada] Add guard against previous error for peculiar ACATS testEric Botcazou1-0/+6
gcc/ada/ * sem_ch4.adb (Find_Non_Universal_Interpretations): Add guard.
2021-11-10[Ada] Better error message on missing parenthesesYannick Moy1-2/+4
gcc/ada/ * par-ch4.adb (P_Primary): Adapt test for getting error message on missing parentheses.
2021-11-10Extend is_cond_scalar_reduction to handle bit_and/bit_xor/bit_ior.liuhongt4-8/+95
This will enable transformation like

  -  # sum1_50 = PHI <prephitmp_64(13), 0(4)>
  -  # sum2_52 = PHI <sum2_21(13), 0(4)>
  +  # sum1_50 = PHI <_87(13), 0(4)>
  +  # sum2_52 = PHI <_89(13), 0(4)>
     # ivtmp_62 = PHI <ivtmp_61(13), 64(4)>
     i.2_7 = (long unsigned int) i_49;
     _8 = i.2_7 * 8;
     ...
     vec1_i_38 = vec1_29 >> _10;
     vec2_i_39 = vec2_31 >> _10;
     _11 = vec1_i_38 & 1;
  -  _63 = tmp_37 ^ sum1_50;
  -  prephitmp_64 = _11 == 0 ? sum1_50 : _63;
  +  _ifc__86 = _11 != 0 ? tmp_37 : 0;
  +  _87 = sum1_50 ^ _ifc__86;
     _12 = vec2_i_39 & 1;

so that the vectorizer won't fail due to

  /* If this isn't a nested cycle or if the nested cycle reduction value
     is used ouside of the inner loop we cannot handle uses of the
     reduction value.  */
  if (nlatch_def_loop_uses > 1 || nphi_def_loop_uses > 1)
    {
      if (dump_enabled_p ())
        dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
                         "reduction used in loop.\n");
      return NULL;
    }

gcc/ChangeLog: PR tree-optimization/103126 * tree-vect-loop.c (neutral_op_for_reduction): Remove static. * tree-vectorizer.h (neutral_op_for_reduction): Declare. * tree-if-conv.c: Include tree-vectorizer.h. (is_cond_scalar_reduction): Handle BIT_XOR_EXPR/BIT_IOR_EXPR/BIT_AND_EXPR. (convert_scalar_cond_reduction): Ditto. gcc/testsuite/ChangeLog: * gcc.target/i386/ifcvt-reduction-logic-op.c: New test.
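A hedged sketch (not from the commit) of the kind of loop this enables: a conditional XOR reduction, which if-conversion can now rewrite using XOR's neutral element 0 so the reduction PHI has a single use and the vectorizer accepts it:

```c
/* Illustrative only: a guarded XOR reduction.  With this change,
   if-conversion can rewrite the body as
       sum ^= (mask[i] & 1) ? val[i] : 0;
   (0 being the neutral element of BIT_XOR_EXPR), matching the GIMPLE
   transformation shown in the commit message.  */
unsigned int
cond_xor_reduce (const unsigned int *mask, const unsigned int *val, int n)
{
  unsigned int sum = 0;
  for (int i = 0; i < n; i++)
    if (mask[i] & 1)
      sum ^= val[i];
  return sum;
}
```

The same shape with `|` or `&` in place of `^` is covered too, with neutral elements 0 and all-ones respectively.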
2021-11-10i386: Support complex fma/conj_fma for _Float16.konglin12-0/+63
Support cmla_optab, cmul_optab, cmla_conj_optab, cmul_conj_optab for vector _Float16. gcc/ChangeLog: * config/i386/sse.md (cmul<conj_op><mode>3): Add new define_expand. (cmla<conj_op><mode>4): Likewise. gcc/testsuite/ChangeLog: * gcc.target/i386/avx512fp16-vector-complex-float.c: New test.