Age | Commit message (Collapse) | Author | Files | Lines |
|
This patch folds certain reductions of X & CST to X[I] & CST[I] if I is
the only nonzero element of CST. This includes the motivating case in
which CST[I] is -1.
We could do the same for REDUC_MAX on unsigned types, but I wasn't sure
that that special case was worth it.
2019-01-07 Richard Sandiford <richard.sandiford@arm.com>
gcc/
PR tree-optimization/88598
* tree.h (single_nonzero_element): Declare.
* tree.c (single_nonzero_element): New function.
* match.pd: Fold certain reductions of X & CST to X[I] & CST[I]
if I is the only nonzero element of CST.
gcc/testsuite/
PR tree-optimization/88598
* gcc.dg/vect/pr88598-1.c: New test.
* gcc.dg/vect/pr88598-2.c: Likewise.
* gcc.dg/vect/pr88598-3.c: Likewise.
* gcc.dg/vect/pr88598-4.c: Likewise.
* gcc.dg/vect/pr88598-5.c: Likewise.
* gcc.dg/vect/pr88598-6.c: Likewise.
From-SVN: r267646
|
|
The PR has:
vect__6.24_42 = vect__5.23_41 * { 0.0, 1.0e+0, 0.0, 0.0 };
which for -fno-signed-zeros -fno-signaling-nans can be simplified to:
vect__6.24_42 = vect__5.23_41 & { 0, -1, 0, 0 };
I deliberately didn't handle COMPLEX_CST or CONSTRUCTOR in
initializer_each_zero_or_onep since there are no current use cases.
The patch also makes (un)signed_type_for handle floating-point types.
I tried to audit all callers and the few that handle null returns would
be unaffected.
2019-01-07 Richard Sandiford <richard.sandiford@arm.com>
gcc/
PR tree-optimization/88598
* tree.h (initializer_each_zero_or_onep): Declare.
* tree.c (initializer_each_zero_or_onep): New function.
(signed_or_unsigned_type_for): Handle float types too.
(unsigned_type_for, signed_type_for): Update comments accordingly.
* match.pd: Fold x * { 0 or 1, 0 or 1, ...} to
x & { 0 or -1, 0 or -1, ... }.
gcc/testsuite/
PR tree-optimization/88598
* gcc.dg/pr88598-1.c: New test.
* gcc.dg/pr88598-2.c: Likewise.
* gcc.dg/pr88598-3.c: Likewise.
* gcc.dg/pr88598-4.c: Likewise.
* gcc.dg/pr88598-5.c: Likewise.
From-SVN: r267645
|
|
* doc/install.texi: Replace references to x86_64-unknown-linux-gnu
with x86_64-pc-linux-gnu.
From-SVN: r267643
|
|
With PTX_MAX_VECTOR_LENGTH set to larger than PTX_WARP_SIZE, routines can be
called from offloading regions with vector-size set to larger than warp size.
OTOH, vector-partitionable routines assume warp-sized vector length.
Detect if we're calling a vector-partitionable routine from an offloading
region, and if so, fall back to warp-sized vector length in that region.
2019-01-07 Tom de Vries <tdevries@suse.de>
PR target/85486
* config/nvptx/nvptx.c (has_vector_partitionable_routine_calls_p): New
function.
(nvptx_goacc_validate_dims): Force vl32 if calling vector-partitionable
routines.
From-SVN: r267640
|
|
instead of V_512 and TARGET_AVX instead of...
* config/i386/sse.md (vec_extract<mode><ssehalfvecmodelower>): Use
V_256_512 iterator instead of V_512 and TARGET_AVX instead of
TARGET_AVX512F as condition.
From-SVN: r267639
|
|
PR debug/88723
* dwarf2out.c (const_ok_for_output_1): Remove redundant call to
const_not_ok_for_debug_p target hook.
(mem_loc_descriptor) <case UNSPEC>: Only call const_ok_for_output_1
on UNSPEC and subexpressions thereof if all subexpressions of the
UNSPEC are CONSTANT_P.
From-SVN: r267638
|
|
PR tree-optimization/88676
* tree-ssa-phiopt.c (two_value_replacement): New function.
(tree_ssa_phiopt_worker): Call it.
* gcc.dg/tree-ssa/pr88676.c: New test.
* gcc.dg/pr88676.c: New test.
* gcc.dg/tree-ssa/pr15826.c: Just verify there is no goto,
allow &.
From-SVN: r267634
|
|
since r266664)
PR sanitizer/88619
* cfgexpand.c (expand_stack_vars): Only align prev_offset to
ASAN_MIN_RED_ZONE_SIZE, not to maximum of that and alignb.
* c-c++-common/asan/pr88619.c: New test.
From-SVN: r267633
|
|
PR c++/85052
* tree-vect-generic.c: Include insn-config.h and recog.h.
(expand_vector_piecewise): Add defaulted ret_type argument,
if non-NULL, use that in preference to type for the result type.
(expand_vector_parallel): Formatting fix.
(do_vec_conversion, do_vec_narrowing_conversion,
expand_vector_conversion): New functions.
(expand_vector_operations_1): Call expand_vector_conversion
for VEC_CONVERT ifn calls.
* internal-fn.def (VEC_CONVERT): New internal function.
* internal-fn.c (expand_VEC_CONVERT): New function.
* fold-const-call.c (fold_const_vec_convert): New function.
(fold_const_call): Use it for CFN_VEC_CONVERT.
* doc/extend.texi (__builtin_convertvector): Document.
c-family/
* c-common.h (enum rid): Add RID_BUILTIN_CONVERTVECTOR.
(c_build_vec_convert): Declare.
* c-common.c (c_build_vec_convert): New function.
c/
* c-parser.c (c_parser_postfix_expression): Parse
__builtin_convertvector.
cp/
* cp-tree.h (cp_build_vec_convert): Declare.
* parser.c (cp_parser_postfix_expression): Parse
__builtin_convertvector.
* constexpr.c: Include fold-const-call.h.
(cxx_eval_internal_function): Handle IFN_VEC_CONVERT.
(potential_constant_expression_1): Likewise.
* semantics.c (cp_build_vec_convert): New function.
* pt.c (tsubst_copy_and_build): Handle CALL_EXPR to
IFN_VEC_CONVERT.
testsuite/
* c-c++-common/builtin-convertvector-1.c: New test.
* c-c++-common/torture/builtin-convertvector-1.c: New test.
* g++.dg/ext/builtin-convertvector-1.C: New test.
* g++.dg/cpp0x/constexpr-builtin4.C: New test.
From-SVN: r267632
|
|
Add support for vector reductions with openacc vector_length larger than
warp-size.
2019-01-07 Tom de Vries <tdevries@suse.de>
* config/nvptx/nvptx-protos.h (nvptx_output_red_partition): Declare.
* config/nvptx/nvptx.c (vector_red_size, vector_red_align,
vector_red_partition, vector_red_sym): New global variables.
(nvptx_option_override): Initialize vector_red_sym.
(nvptx_declare_function_name): Restore red_partition register.
(nvptx_file_end): Emit code to declare the vector reduction variables.
(nvptx_output_red_partition): New function.
(nvptx_expand_shared_addr): Add vector argument. Use it to handle
large vector reductions.
(enum nvptx_builtins): Add NVPTX_BUILTIN_VECTOR_ADDR.
(nvptx_init_builtins): Add VECTOR_ADDR.
(nvptx_expand_builtin): Update call to nvptx_expand_shared_addr.
Handle nvptx_expand_shared_addr.
(nvptx_get_shared_red_addr): Add vector argument and handle large
vectors.
(nvptx_goacc_reduction_setup): Add offload_attrs argument and handle
large vectors.
(nvptx_goacc_reduction_init): Likewise.
(nvptx_goacc_reduction_fini): Likewise.
(nvptx_goacc_reduction_teardown): Likewise.
(nvptx_goacc_reduction): Update calls to nvptx_goacc_reduction_{setup,
init,fini,teardown}.
(nvptx_init_axis_predicate): Initialize vector_red_partition.
(nvptx_set_current_function): Init vector_red_partition.
* config/nvptx/nvptx.md (UNSPECV_RED_PART): New unspecv.
(nvptx_red_partition): New insn.
* config/nvptx/nvptx.h (struct machine_function): Add red_partition.
From-SVN: r267631
|
|
When compiling an empty loop:
...
long long v1;
#pragma acc parallel num_gangs (640) num_workers(1) vector_length (128)
#pragma acc loop
for (v1 = 0; v1 < 20; v1 += 2)
;
...
the compiler emits two subsequent bar.syncs. This triggers some bug on my
quadro m1200 (I'm assuming in the ptxas/JIT compiler) that hangs the testcase.
This patch works around the bug by doing an optimization: we detect that this is
an empty loop (a forked immediately followed by a joining), and don't emit the
barriers.
The patch does not include the test-case yet, since vector_length (128) is not
yet supported at this point.
2019-01-07 Tom de Vries <tdevries@suse.de>
PR target/85381
* config/nvptx/nvptx.c (nvptx_process_pars): Don't emit barriers for
empty loops.
From-SVN: r267630
|
|
Add support for a per-worker broadcast buffer and barrier, to be used for
openacc vector_length larger than warp-size.
2019-01-07 Tom de Vries <tdevries@suse.de>
* config/nvptx/nvptx.c (oacc_bcast_partition): Declare.
(nvptx_option_override): Init oacc_bcast_partition.
(nvptx_init_oacc_workers): New function.
(nvptx_declare_function_name): Call nvptx_init_oacc_workers.
(nvptx_needs_shared_bcast): New function.
(nvptx_find_par): Generalize to enable vectors to use shared-memory
to propagate state.
(nvptx_shared_propagate): Initialize vector bcast partition and
synchronization state.
(nvptx_single): Generalize to enable vectors to use shared-memory
to propagate state.
(nvptx_process_pars): Likewise.
(nvptx_set_current_function): Initialize oacc_broadcast_partition.
* config/nvptx/nvptx.h (struct machine_function): Add
bcast_partition and sync_bar members.
From-SVN: r267629
|
|
Allow PTX_MAX_VECTOR_LENGTH to be define as larger than PTX_WARP_SIZE in
nvptx_goacc_validate_dims_1.
2019-01-07 Tom de Vries <tdevries@suse.de>
* config/nvptx/nvptx.c (nvptx_welformed_vector_length_p)
(nvptx_apply_dim_limits): New function.
(nvptx_goacc_validate_dims_1): Allow PTX_MAX_VECTOR_LENGTH larger than
PTX_WARP_SIZE.
From-SVN: r267628
|
|
Move warnings in nvptx_goacc_validate_dims_1 to as late as possible. This
allows us more flexibility in setting the dimensions.
2019-01-07 Tom de Vries <tdevries@suse.de>
* config/nvptx/nvptx.c (nvptx_goacc_validate_dims_1): Move warnings to
as late as possible.
From-SVN: r267627
|
|
Remove PTX_VECTOR_LENGTH and replace uses of it with PTX_DEFAULT_VECTOR_LENGTH,
PTX_MAX_VECTOR_LENGTH and PTX_WARP_SIZE.
2019-01-07 Tom de Vries <tdevries@suse.de>
* config/nvptx/nvptx.c (PTX_VECTOR_LENGTH): Remove.
(PTX_DEFAULT_VECTOR_LENGTH, PTX_MAX_VECTOR_LENGTH): Define.
(nvptx_goacc_validate_dims_1, nvptx_dim_limit)
(nvptx_goacc_reduction_fini): Use PTX_DEFAULT_VECTOR_LENGTH,
PTX_MAX_VECTOR_LENGTH and PTX_WARP_SIZE instead of PTX_VECTOR_LENGTH.
From-SVN: r267626
|
|
Add a few asserts to nvptx_goacc_validate_dims.
2019-01-07 Tom de Vries <tdevries@suse.de>
* config/nvptx/nvptx.c (nvptx_goacc_validate_dims): Add asserts.
From-SVN: r267625
|
|
Expose oacc_min_dims to backends.
2019-01-07 Tom de Vries <tdevries@suse.de>
* omp-offload.c (oacc_get_min_dim): New function.
* omp-offload.h (oacc_get_min_dim): Declare.
From-SVN: r267623
|
|
PR target/88521
* config/i386/i386.c (function_value_ms_64): Return small sturct in
AX_REG and float/double in FIRST_SSE_REG for 4 or 8 byte modes.
From-SVN: r267622
|
|
From-SVN: r267619
|
|
test starting with r248334)
PR tree-opt/86020
Revert:
2017-05-22 Jan Hubicka <hubicka@ucw.cz>
* ipa-inline.c (edge_badness): Use inlined_time instead of
inline_summaries->get.
From-SVN: r267612
|
|
* opts.c (enable_fdo_optimizations): Enable
version-loops-for-strides, loop-interchange, unrol-and-jam
and tree-loop-distribution.
* invoke.texi: Document newly enabled options.
From-SVN: r267611
|
|
2019-01-06 Thomas Koenig <tkoenig@gcc.gnu.org>
PR fortran/88658
* gfortran.h: Add macro gfc_real_4_kind
* simplify.c (simplify_min_max): Special case for the types of
AMAX0, AMIN0, MAX1 and MIN1, which actually change the types of
their arguments.
2019-01-06 Thomas Koenig <tkoenig@gcc.gnu.org>
PR fortran/88658
* gfortran.dg/min_max_type_2.f90: New test.
From-SVN: r267609
|
|
PR c/88363
* c-c++-common/attributes-4.c (falloc_align_int128,
falloc_size_int128): Guard with #ifdef __SIZEOF_INT128__.
From-SVN: r267608
|
|
From-SVN: r267606
|
|
* doc/invoke.texi (max-inline-insns-small): New parameters.
* ipa-inline.c (want_early_inline_function_p): simplify.
(want_inline_small_function_p): Fix pasto from previous patch;
use max-inline-insns-small bound.
* params.def (max-inline-insns-small): New param.
* ipa-fnsummary.c (analyze_function_body): Initialize time/size
variables correctly.
From-SVN: r267603
|
|
* doc/invoke.texi: Document max-inline-insns-size,
uninlined-function-insns, uninlined-function-time,
uninlined-thunk-insns and uninlined-thunk-time.
* params.def: Add max-inline-insns-size,
uninlined-function-insns, uninlined-function-time,
uninlined-thunk-insns and uninlined-thunk-time.
* ipa-fnsummary.c (compute_fn_summary, analyze_function_body): Use
new parameters.
* ipa-inline.c (can_inline_edge_by_limits_p,
want_inline_small_function_p): Use new parameters.
From-SVN: r267601
|
|
From-SVN: r267600
|
|
2019-01-05 Dominique d'Humieres <dominiq@gcc.gnu.org>
* gcc.dg/plugin/plugindir1.c: Adjust dg-prune-output for Darwin.
* gcc.dg/plugin/plugindir2.c: Likewise.
* gcc.dg/plugin/plugindir3.c: Likewise.
* gcc.dg/plugin/plugindir4.c: Likewise.
From-SVN: r267599
|
|
2019-01-05 Janus Weil <janus@gcc.gnu.org>
PR fortran/88009
* class.c (gfc_find_derived_vtab): Mark the _final component as
artificial.
(find_intrinsic_vtab): Ditto. Also add an extra check to avoid
dereferencing a null pointer and adjust indentation.
* resolve.c (resolve_fl_variable): Add extra check to avoid
dereferencing a null pointer. Move variable declarations to local scope.
(resolve_fl_procedure): Add extra check to avoid dereferencing a null
pointer.
* symbol.c (check_conflict): Suppress errors for artificial symbols.
2019-01-05 Janus Weil <janus@gcc.gnu.org>
PR fortran/88009
* gfortran.dg/blockdata_10.f90: New test case.
From-SVN: r267598
|
|
at function.c:783)
PR middle-end/82564
PR target/88620
* expr.c (expand_assignment): For calls returning VLA structures
if to_rtx is not a MEM, force it into a stack temporary.
* gcc.dg/nested-func-12.c: New test.
* gcc.c-torture/compile/pr82564.c: New test.
From-SVN: r267595
|
|
PR debug/88635
* dwarf2out.c (const_ok_for_output_1): Reject MINUS that contains
SYMBOL_REF, CODE_LABEL or UNSPEC in subexpressions of second argument.
Reject PLUS that contains SYMBOL_REF, CODE_LABEL or UNSPEC in
subexpressions of both operands.
(mem_loc_descriptor): Handle UNSPEC if target hook acks it and all the
subrtxes are CONSTANT_P.
* config/i386/i386.c (ix86_const_not_ok_for_debug_p): Revert
2018-11-09 changes.
* gcc.dg/debug/dwarf2/pr88635.c: New test.
From-SVN: r267594
|
|
2019-01-05 Dominique d'Humieres <dominiq@gcc.gnu.org>
PR target/60563
Missing PR entry in the previous commit.
From-SVN: r267593
|
|
2019-01-05 Dominique d'Humieres <dominiq@gcc.gnu.org>
* g++.dg/ext/sync-4.C: Add dg-xfail-run-if for darwin.
From-SVN: r267592
|
|
gcc/c-family/ChangeLog:
PR c/88546
* c-attribs.c (handle_copy_attribute): Avoid copying attribute leaf.
Handle C++ empty throw specification and C11 _Noreturn.
(has_attribute): Also handle C11 _Noreturn.
gcc/ChangeLog:
PR c/88546
* attribs.c (decls_mismatched_attributes): Avoid warning for attribute
leaf.
gcc/testsuite/ChangeLog:
PR c/88546
* g++.dg/ext/attr-copy.C: New test.
* gcc.dg/attr-copy-4.c: Disable macro expansion tracking.
* gcc.dg/attr-copy-6.c: New test.
* gcc.dg/attr-copy-7.c: New test.
From-SVN: r267591
|
|
Traceback routines, e.g. callers and funcentry, may call
__go_get_backtrace_state. If a profiling signal arrives while we
are in the critical section of __go_get_backtrace_state, it tries
to do a traceback, which also calls __go_get_backtrace_state,
which tries to enter the same critical section and will deadlock.
Prevent this deadlock by setting up runtime_in_callers before
calling __go_get_backtrace_state.
Found while investigating golang/go#29448. Will add a test in the
next CL.
Updates golang/go#29448.
Reviewed-on: https://go-review.googlesource.com/c/156037
From-SVN: r267590
|
|
From-SVN: r267589
|
|
From-SVN: r267588
|
|
gcc/c-family/ChangeLog:
PR c/88363
* c-attribs.c (positional_argument): Also accept enumerated types.
gcc/testsuite/ChangeLog:
PR c/88363
* c-c++-common/attributes-4.c: New test.
gcc/ChangeLog:
PR c/88363
* doc/extend.texi (attribute alloc_align, alloc_size): Update.
From-SVN: r267583
|
|
previous state afterwards.
* gdbinit.in: Turn off pagination for the skip commands, restore
it to previous state afterwards.
From-SVN: r267581
|
|
gcc.target/aarch64/combine_bfxil.c to 18 (PR/87763)
gcc/testsuite/Changelog:
2019-01-04 Sam Tebbs <sam.tebbs@arm.com>
PR gcc/87763
* gcc.target/aarch64/combine_bfxil.c: Change scan-assembler-times bfxil
count to 18.
From-SVN: r267579
|
|
From-SVN: r267575
|
|
2019-01-04 Thomas Koenig <tkoenig@gcc.gnu.org>
PR fortran/48543
* gfortran.dg/const_chararacter_merge.f90: Actually remove.
* gfortran.dg/merge_char_const.f90: Restore.
From-SVN: r267572
|
|
PR target/88594
* config/i386/i386.c (ix86_expand_divmod_libfunc): Use mode instead
of GET_MODE (opN) as modes of the libcall arguments.
* gcc.dg/pr88594.c: New test.
From-SVN: r267571
|
|
While their use for masking is indeed restricted to %k1...%k7, use as
"normal" insn operands also permits %k0. Remove the unnecessary
limitations, requiring quite a few testsuite adjustments.
Oddly enough some AVX512{F,DQ} test cases already check for %k[0-7],
while others did permit {%k0} - where they get touched here anyway this
gets fixed at the same time.
From-SVN: r267570
|
|
gcc/ChangeLog:
* calls.c (maybe_warn_nonstring_arg): Avoid assuming maxlen is set.
gcc/testsuite/ChangeLog:
* gcc.dg/Wstringop-truncation-6.c: New test.
From-SVN: r267569
|
|
From-SVN: r267568
|
|
lxvd2x/stxvd2x.
2019-01-03 Aaron Sawdey <acsawdey@linux.ibm.com>
* config/rs6000/rs6000-string.c (expand_block_move): Don't use
unaligned vsx and avoid lxvd2x/stxvd2x.
(gen_lvx_v4si_move): New function.
From-SVN: r267562
|
|
The vector length and maximum number of workers are known compile-time. Make
these easily available during code generation via new functions.
2019-01-03 Tom de Vries <tdevries@suse.de>
* config/nvptx/nvptx.c (MACH_VECTOR_LENGTH, MACH_MAX_WORKERS): Define.
(init_axis_dim, nvptx_mach_max_workers, nvptx_mach_vector_length): New
function.
* config/nvptx/nvptx.h (struct machine_function): Add axis_dims.
From-SVN: r267558
|
|
Factor out populate_offload_attrs from nvptx_reorg.
2019-01-03 Tom de Vries <tdevries@suse.de>
* config/nvptx/nvptx.c (struct offload_attrs): New.
(populate_offload_attrs): New function. Factor mask extraction out of
nvptx_reorg. Add extraction of dimensions.
(nvptx_reorg): Use populate_offload_attrs.
From-SVN: r267557
|
|
Add early-out cases for for oacc_min_dims_p and routine_p in
nvptx_goacc_validate_dims, allowing simplification of the rest of the
function.
2019-01-03 Tom de Vries <tdevries@suse.de>
* config/nvptx/nvptx.c (nvptx_goacc_validate_dims_1): Add early-out
cases for oacc_min_dims_p and routine_p. Add asserts for
oacc_default_dims_p and offload_region_p.
From-SVN: r267556
|