riscv-gnu-toolchain/gcc.git

Age	Commit message	Author	Files	Lines
2021-03-16	i386: Fix up _mm256_vzeroupper() handling [PR99563]	Jakub Jelinek	4	-13/+64
	My r10-6451-gb7b3378f91c0641f2ef4d88db22af62a571c9359 fix for vzeroupper vs. ms ABI apparently broke the explicit vzeroupper handling when the implicit vzeroupper handling is disabled. The epilogue_completed splitter for vzeroupper now adds clobbers for all registers which don't have explicit sets in the pattern and the sets are added during vzeroupper pass. Before my changes, for explicit user vzeroupper, we just weren't modelling its effects at all, it was just unspec that didn't tell that it clobbers the upper parts of all XMM < %xmm16 registers. But now the splitter will even for those add clobbers and as it has no sets, it will add clobbers for all registers, which means we optimize away anything that lived across that vzeroupper. The vzeroupper pass has two parts, one is the mode switching that computes where to put the implicit vzeroupper calls and puts them there, and then another that uses df to figure out what sets to add to all the vzeroupper. The former part should be done only under the conditions we have in the gate, but the latter as this PR shows needs to happen either if we perform the implicit vzeroupper additions, or if there are (or could be) any explicit vzeroupper instructions. As that function does df_analyze and walks the whole IL, I think it would be too expensive to run it always whenever TARGET_AVX, so this patch remembers if we've expanded at least one __builtin_ia32_vzeroupper in the function and runs that part of the vzeroupper pass both when the old condition is true or when this new flag is set. 2021-03-16 Jakub Jelinek <jakub@redhat.com> PR target/99563 * config/i386/i386.h (struct machine_function): Add has_explicit_vzeroupper bitfield. * config/i386/i386-expand.c (ix86_expand_builtin): Set cfun->machine->has_explicit_vzeroupper when expanding IX86_BUILTIN_VZEROUPPER. 
* config/i386/i386-features.c (rest_of_handle_insert_vzeroupper): Do the mode switching only when TARGET_VZEROUPPER, expensive optimizations turned on and not optimizing for size. (pass_insert_vzeroupper::gate): Enable even when cfun->machine->has_explicit_vzeroupper is set. * gcc.target/i386/avx-pr99563.c: New test.
2021-03-16	aarch64: Fix up aarch64_simd_clone_compute_vecsize_and_simdlen [PR99542]	Jakub Jelinek	7	-8/+28
	As the patch shows, there are several bugs in aarch64_simd_clone_compute_vecsize_and_simdlen. One is that for function declarations that aren't definitions it completely ignores argument types. Such decls don't have DECL_ARGUMENTS, but we can walk TYPE_ARG_TYPES instead, like the i386 backend does or like the simd cloning code in the middle end does too. Another problem is that it checks types of uniform arguments. That is unnecessary: uniform arguments are passed the way they normally are, as scalar arguments rather than vectors, so there is no reason not to support uniform arguments of a different size, long double, structures etc. 2021-03-16 Jakub Jelinek <jakub@redhat.com> PR target/99542 * config/aarch64/aarch64.c (aarch64_simd_clone_compute_vecsize_and_simdlen): If not a function definition, walk TYPE_ARG_TYPES list if non-NULL for argument types instead of DECL_ARGUMENTS. Ignore types for uniform arguments. * gcc.dg/gomp/pr99542.c: New test. * gcc.dg/gomp/pr59669-2.c (bar): Don't expect a warning on aarch64. * gcc.dg/gomp/simd-clones-2.c (setArray): Likewise. * g++.dg/vect/simd-clone-7.cc (bar): Likewise. * g++.dg/gomp/declare-simd-1.C (f37): Expect a different warning on aarch64. * gcc.dg/declare-simd.c (fn2): Expect a new warning on aarch64.
2021-03-16	testsuite: Fix up target selector syntax errors in modules/builtin-3*.C [PR99601]	Jakub Jelinek	2	-3/+3
	Without this patch I'm seeing: ERROR: tcl error sourcing /home/jakub/src/gcc/gcc/testsuite/g++.dg/modules/modules.exp. ERROR: unmatched open brace in list while executing "foreach op $tmp { switch [lindex $op 0] { "dg-options" { set std_prefix "-std=gnu++" if { [string match "-std=" [lindex $op 2]] } { ..." (procedure "module-init" line 7) invoked from within "module-init $src" invoked from within "if [runtest_file_p $runtests $src] { set tests [lsort [find [file dirname $src] [regsub {_a.[CHX]$} [file tail $src] {_[a-z].[CHX]}]]] set std_lis..." ("foreach" body line 3) invoked from within "foreach src [lsort [find $srcdir/$subdir {_a.[CHX}]] { # use the FOO_a.C name as the parallelization key if [runtest_file_p $runtests $src] {..." (file "/home/jakub/src/gcc/gcc/testsuite/g++.dg/modules/modules.exp" line 304) invoked from within "source /home/jakub/src/gcc/gcc/testsuite/g++.dg/modules/modules.exp" ("uplevel" body line 1) invoked from within "uplevel #0 source /home/jakub/src/gcc/gcc/testsuite/g++.dg/modules/modules.exp" invoked from within "catch "uplevel #0 source $test_file_name"" 2021-03-16 Jakub Jelinek <jakub@redhat.com> PR c++/99601 * g++.dg/modules/builtin-3_a.C: Fix target selector syntax errors. * g++.dg/modules/builtin-3_b.C: Likewise.
2021-03-15	libgo: update to Go 1.16.2 release	Ian Lance Taylor	1	-1/+1
	Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/301459
2021-03-15	Update gcc sv.po.	Joseph Myers	1	-434/+287
	* sv.po: Update.
2021-03-15	c++: Fix 2 testcases [PR 99601]	Nathan Sidwell	2	-3/+3
	I'd failed to correctly restrict some checks to lp64 x86 targets. PR c++/99601 gcc/testsuite/ * g++.dg/modules/builtin-3_a.C: Fix lp64 x86 detection. * g++.dg/modules/builtin-3_b.C: Fix lp64 x86 detection.
2021-03-15	coroutines : Convert await_ready () expressions to bool [PR99047].	Iain Sandoe	2	-1/+90
	The awaiter.await_ready() should be converted per [expr.await]/3 (3.6) await-ready is the expression e.await_ready(), contextually converted to bool. gcc/cp/ChangeLog: PR c++/99047 * coroutines.cc (expand_one_await_expression): If the await_ready() expression is not a boolean then convert it as required. gcc/testsuite/ChangeLog: PR c++/99047 * g++.dg/coroutines/pr99047.C: New test.
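The conversion described in the commit above can be illustrated outside of coroutines. This is only a sketch of what "contextually converted to bool" means; `ReadyFlag`, `Awaiter` and `is_ready` are hypothetical names invented for the example and are not taken from the GCC sources or the testcase.

```cpp
#include <cassert>

// Hypothetical stand-in for a non-bool result of await_ready().
// It can only be turned into bool through an explicit conversion.
struct ReadyFlag {
    int pending;  // outstanding operations (illustrative field)
    explicit operator bool() const { return pending == 0; }
};

// Hypothetical awaiter-like type; only await_ready() matters here.
struct Awaiter {
    int pending;
    ReadyFlag await_ready() const { return ReadyFlag{pending}; }
};

// A branch needs a real bool. static_cast<bool> performs the same explicit
// conversion that a contextual conversion to bool would select.
bool is_ready(const Awaiter &a) {
    assert(a.pending >= 0);
    return static_cast<bool>(a.await_ready());
}
```

Without the conversion, a statement such as `bool b = a.await_ready();` would be rejected for this type, which is presumably the kind of case the fix has to cope with.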
2021-03-15	coroutines : Handle rethrow from unhandled_exception [PR98704].	Iain Sandoe	2	-21/+145
	Although there is still some discussion in CWG 2451 on this, the implementors are agreed on the intent. When promise.unhandled_exception () is entered, the coroutine is considered to be still running - returning from the method will cause the final await expression to be evaluated. If the method throws, that action is considered to make the coroutine suspend (since, otherwise, it would be impossible to reclaim its resources, since one cannot destroy a running coro). The wording issue is to do with how to represent the place at which the coroutine should be considered suspended. For the implementation here, that place is immediately before the promise life-time ends. A handler for the rethrown exception, can thus call xxxx.destroy() which will run DTORs for the promise and any parameter copies [as needed] then the coroutine frame will be deallocated. At present, we also set "done=true" in this case (for compatibility with other current implementations). One might consider 'done()' to be misleading in the case of an abnormal termination - that is also part of the CWG 2451 discussion. gcc/cp/ChangeLog: PR c++/98704 * coroutines.cc (build_actor_fn): Make destroy index 1 correspond to the abnormal unhandled_exception() exit. Substitute the proxy for the resume index. (coro_rewrite_function_body): Arrange to reset the resume index and make done = true for a rethrown exception from unhandled_exception (). (morph_fn_to_coro): Adjust calls to build_actor_fn and coro_rewrite_function_body. gcc/testsuite/ChangeLog: PR c++/98704 * g++.dg/coroutines/torture/pr98704.C: New test.
2021-03-15	coroutines : Handle for await expressions in for stmts [PR98480].	Iain Sandoe	5	-0/+428
	The handling of await expressions in the init, condition and iteration expressions of for loops had been omitted. Fixed thus. gcc/cp/ChangeLog: PR c++/98480 * coroutines.cc (replace_continue): Rewrite continue into 'goto label'. (await_statement_walker): Handle await expressions in the initializer, condition and iteration expressions of for loops. gcc/testsuite/ChangeLog: PR c++/98480 * g++.dg/coroutines/pr98480.C: New test. * g++.dg/coroutines/torture/co-await-24-for-init.C: New test. * g++.dg/coroutines/torture/co-await-25-for-condition.C: New test. * g++.dg/coroutines/torture/co-await-26-for-iteration-expr.C: New test.
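A plausible reading of the `replace_continue` change above: once a for loop is lowered so that its iteration expression becomes an ordinary statement at the end of the body, a plain `continue` would skip that statement, so it has to become a jump to a label placed just before it. The sketch below shows the equivalence on an ordinary loop; the function names and the label name are made up for illustration and this is not the compiler's generated code.

```cpp
#include <cassert>

// Original form: 'continue' implicitly runs the iteration expression (++i).
int sum_odd_with_continue(int n) {
    assert(n >= 0);
    int sum = 0;
    for (int i = 0; i < n; ++i) {
        if (i % 2 == 0)
            continue;
        sum += i;
    }
    return sum;
}

// Lowered form: the iteration expression is an explicit statement, so the
// former 'continue' becomes a goto to the label in front of it.
int sum_odd_with_goto(int n) {
    assert(n >= 0);
    int sum = 0;
    int i = 0;
    while (i < n) {
        if (i % 2 == 0)
            goto iteration_expr;
        sum += i;
    iteration_expr:
        ++i;
    }
    return sum;
}
```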
2021-03-15	coroutines : Avoid generating empty statements [PR96749].	Iain Sandoe	3	-20/+123
	In the compiler-only idiom: " a = (target expr creates temp, op uses temp) " the target expression variable needs to be promoted to a frame one (if the expression has a suspend point). However, the only uses of the var are in the second part of the compound expression - and we were creating an empty statement corresponding to the (now unused) first arm. This then produces the spurious warnings noted. Fixed by avoiding generation of a separate variable nest for isolated target expressions (or similarly isolated co_awaits used in a function call). gcc/cp/ChangeLog: PR c++/96749 * coroutines.cc (flatten_await_stmt): Allow for the case where a target expression variable only has uses in the second part of a compound expression. (maybe_promote_temps): Avoid emitting empty statements. gcc/testsuite/ChangeLog: PR c++/96749 * g++.dg/coroutines/pr96749-1.C: New test. * g++.dg/coroutines/pr96749-2.C: New test.
2021-03-15	tree-optimization/98834 - fix optimization regression with _b_c_p	Richard Biener	2	-1/+82
	The following makes FRE optimize a load we formerly required SRA + CCP for which now run after we get rid of all __builtin_constant_p calls. 2021-03-15 Richard Biener <rguenther@suse.de> PR tree-optimization/98834 * tree-ssa-sccvn.c (vn_reference_lookup_3): Handle missing subsetting by truncating the access size. * g++.dg/opt/pr98834.C: New testcase.
2021-03-15	analyzer: fix missing comma in initializer	Martin Liska	1	-1/+1
	Fixes the following valid warning: gcc/analyzer/sm-file.cc:250:5: warning: suspicious concatenation of string literals in an array initialization; did you mean to separate the elements with a comma? [-Wstring-concatenation] gcc/analyzer/ChangeLog: * sm-file.cc (get_file_using_fns): Add missing comma in initializer.
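The class of bug behind this warning is easy to reproduce. In the sketch below the table contents are illustrative only (they are not the actual entries of sm-file.cc), and `count_of` and `contains` are hypothetical helpers: a missing comma makes the compiler concatenate two adjacent string literals, so the array silently loses an element and lookups for either name fail.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Intended table: three separate names.
static const char *const with_commas[] = {"fopen", "fclose", "fread"};

// Buggy table: no comma after "fclose", so the two adjacent literals are
// concatenated into the single element "fclosefread".
static const char *const missing_comma[] = {
    "fopen",
    "fclose"
    "fread",
};

// Number of elements in a table.
template <std::size_t N>
std::size_t count_of(const char *const (&)[N]) { return N; }

// Linear search for an exact name.
template <std::size_t N>
bool contains(const char *const (&table)[N], const char *name) {
    assert(name != nullptr);
    for (std::size_t i = 0; i < N; ++i)
        if (std::strcmp(table[i], name) == 0)
            return true;
    return false;
}
```

The buggy table has two elements instead of three, and neither "fclose" nor "fread" can be found in it.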
2021-03-15	znver3 tuning part 1	Jan Hubicka	2	-1/+135
	2021-03-15 Jan Hubicka <hubicka@ucw.cz> * config/i386/i386-options.c (processor_cost_table): Add znver3_cost. * config/i386/x86-tune-costs.h (znver3_cost): New global variable; copy of znver2_cost.
2021-03-15	Handle EXEC_IOLENGTH in doloop_contained_procedure_code.	Thomas Koenig	3	-0/+36
	This rather obvious patch fixes an ICE on valid which came about because I did not handle EXEC_IOLENGTH as start of an I/O statement when checking for the DO loop variable. This is an 11 regression. gcc/fortran/ChangeLog: PR fortran/99345 * frontend-passes.c (doloop_contained_procedure_code): Properly handle EXEC_IOLENGTH. gcc/testsuite/ChangeLog: PR fortran/99345 * gfortran.dg/do_check_16.f90: New test. * gfortran.dg/do_check_17.f90: New test.
2021-03-15	Fortran: Fix problem with allocate initialization [PR99545].	Paul Thomas	2	-1/+41
	2021-03-15 Paul Thomas <pault@gcc.gnu.org> gcc/fortran/ChangeLog PR fortran/99545 * trans-stmt.c (gfc_trans_allocate): Mark the initialization assignment by setting init_flag. gcc/testsuite/ChangeLog PR fortran/99545 * gfortran.dg/pr99545.f90: New test.
2021-03-15	OpenMP: Fix 'omp declare target' handling for vars [PR99509]	Tobias Burnus	2	-6/+37
	For variables with 'declare target' attribute, varpool_node::get_create marks variables as offload; however, if the node already exists, it is not updated. C/C++ may tag decl with 'declare target implicit', which may only be after varpool creation turned into 'declare target' or 'declare target link'; in this case, the tagging has to happen in the FE. gcc/c/ChangeLog: PR c++/99509 * c-decl.c (finish_decl): For 'omp declare target implicit' vars, ensure that the varpool node is marked as offloadable. gcc/cp/ChangeLog: PR c++/99509 * decl.c (cp_finish_decl): For 'omp declare target implicit' vars, ensure that the varpool node is marked as offloadable. libgomp/ChangeLog: PR c++/99509 * testsuite/libgomp.c-c++-common/declare_target-1.c: New test.
2021-03-15	Fix -Wstring-concatenation warning.	Martin Liska	1	-1/+1
	Fix the following clang warning: gcc/spellcheck.c:477:3: warning: suspicious concatenation of string literals in an array initialization; did you mean to separate the elements with a comma? [-Wstring-concatenation] gcc/ChangeLog: * spellcheck.c: Add missing comma in initialization.
2021-03-14	testsuite: fix typo in testcase pr99492.c	David Edelsohn	1	-3/+3
	gcc/testsuite/ChangeLog: * gcc.target/powerpc/pr99492.c: Fix typo.
2021-03-15	Daily bump.	GCC Administrator	4	-1/+25

2021-03-14	PR fortran/99112 - ICE with runtime diagnostics for SIZE intrinsic function	Harald Anlauf	3	-10/+59
	Add/fix handling of runtime checks for CLASS arguments with ALLOCATABLE or POINTER attribute. gcc/fortran/ChangeLog: * trans-expr.c (gfc_conv_procedure_call): Fix runtime checks for CLASS arguments. * trans-intrinsic.c (gfc_conv_intrinsic_size): Likewise. gcc/testsuite/ChangeLog: * gfortran.dg/pr99112.f90: New test. Co-authored-by: Paul Thomas <pault@gcc.gnu.org>
2021-03-14	i386: Some more -mavx512vl -mno-avx512bw fixes [PR99321]	Uros Bizjak	1	-34/+24
	2021-03-14 Uroš Bizjak <ubizjak@gmail.com> gcc/ * config/i386/sse.md (vec_extract<mode>): Merge alternative 0 with alternative 2 and alternative 1 with alternative 3 using YW register constraint. (vec_extract<PEXTR_MODE12:mode>_zext): Merge alternatives using YW register constraint. (vec_extractv16qi_zext): Ditto. (vec_extractv4si): Merge alternatives 4 and 5 using Yw register constraint. (*ssse3_palignr<mode>_perm): Use Yw instead of v for alternative 3.
2021-03-14	Daily bump.	GCC Administrator	4	-1/+38

2021-03-13	PR tree-optimization/99489 - ICE calling strncat after strcat	Martin Sebor	2	-1/+42
	gcc/ChangeLog: PR tree-optimization/99489 * builtins.c (gimple_call_alloc_size): Fail gracefully when argument is not a call statement. gcc/testsuite/ChangeLog: PR tree-optimization/99489 * gcc.dg/Wstringop-truncation-9.c: New test.
2021-03-13	Fortran: Fix for class defined operators [PR99125].	Paul Thomas	3	-3/+27
	2021-03-13 Paul Thomas <pault@gcc.gnu.org> gcc/fortran PR fortran/99125 * trans-array.c (gfc_conv_expr_descriptor): For deferred length components use the ss_info string length instead of gfc_get_expr_charlen. Make sure that the deferred string length is a variable before assigning to it. Otherwise use the expr. * trans-expr.c (gfc_conv_string_length): Make sure that the deferred string length is a variable before assigning to it. gcc/testsuite/ PR fortran/99125 * gfortran.dg/alloc_deferred_comp_1.f90: New test.
2021-03-13	match.pd: Don't optimize vector X + (X << C) -> X * (1 + (1 << C)) if there is no mult support [PR99544]	Jakub Jelinek	2	-2/+21
	E.g. on aarch64, the target has V2DImode addition and shift by scalar optabs, but doesn't have V2DImode multiply. The following testcase ICEs because this simplification is done after last lowering, but generally, even if it is done before that, turning it into a multiplication will not be an improvement because that means scalarization, while the former can be done in vectors. It would be nice if we added expansion support for vector multiplication by uniform constants using shifts and additions like we have for scalar multiplication, but that is something that can be done in stage1. 2021-03-13 Jakub Jelinek <jakub@redhat.com> PR tree-optimization/99544 * match.pd (X + (X << C) -> X * (1 + (1 << C))): Don't simplify if for vector types multiplication can't be done in type's mode. * gcc.dg/gomp/pr99544.c: New test.
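The identity behind this simplification can be checked on a two-lane value. The sketch below models a V2DImode vector as a plain struct of two 64-bit unsigned lanes; `V2DI`, `shift_add_form`, `multiply_form` and `same_lanes` are hypothetical names for the example and say nothing about how GCC represents or expands these operations.

```cpp
#include <cassert>
#include <cstdint>

// Toy model of a two-lane 64-bit vector (an assumption for illustration).
struct V2DI {
    std::uint64_t lane[2];
};

// X + (X << C): needs only lane-wise addition and shift by a scalar.
V2DI shift_add_form(V2DI x, unsigned c) {
    assert(c < 64);
    V2DI r;
    for (int i = 0; i < 2; ++i)
        r.lane[i] = x.lane[i] + (x.lane[i] << c);
    return r;
}

// X * (1 + (1 << C)): needs a lane-wise multiply, which the commit says
// aarch64 lacks for V2DImode.
V2DI multiply_form(V2DI x, unsigned c) {
    assert(c < 64);
    const std::uint64_t factor = 1 + (std::uint64_t{1} << c);
    V2DI r;
    for (int i = 0; i < 2; ++i)
        r.lane[i] = x.lane[i] * factor;
    return r;
}

// Lane-wise equality.
bool same_lanes(V2DI a, V2DI b) {
    return a.lane[0] == b.lane[0] && a.lane[1] == b.lane[1];
}
```

Both forms agree modulo 2^64, so the rewrite is valid arithmetic; the commit's point is that it is only profitable when the target can multiply in the vector mode.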
2021-03-12	misc/cgo/testcarchive: don't use == for string equality in C code	Ian Lance Taylor	1	-1/+1
	Backport of https://golang.org/cl/300993. For PR go/99553 Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/301458
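The pitfall named in the subject line is that `==` on two `char *` values compares addresses, not contents. A minimal sketch, written with the C library's `strcmp` through its C++ header; `same_text` and `same_pointer` are hypothetical helper names and not code from the backported change.

```cpp
#include <cassert>
#include <cstring>

// Compares the characters of two NUL-terminated strings.
bool same_text(const char *a, const char *b) {
    assert(a != nullptr && b != nullptr);
    return std::strcmp(a, b) == 0;
}

// Compares only the addresses: two equal strings stored in different
// buffers are reported as different.
bool same_pointer(const char *a, const char *b) {
    return a == b;
}
```

For two separate arrays that both hold "hello", `same_text` returns true while `same_pointer` returns false.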
2021-03-13	Daily bump.	GCC Administrator	5	-1/+129

2021-03-12	c++: ICE with using-decl [PR 99238]	Nathan Sidwell	5	-45/+52
	This ICE was caused by a stray TREE_VISITED marker. The lookup machinery was leaving it there due to the way I'd arranged for it to be cleared. That was presuming the name_lookup::value field didn't change, and that wasn't always true in the using-decl processing. I took the opportunity to break out a helper, and then call it immediately after lookups, rather than wait until destructor time. Added some asserts to the module machinery to catch further cases of this. PR c++/99238 gcc/cp/ * module.cc (depset::hash::add_binding_entity): Assert not visited. (depset::add::add_specializations): Likewise. * name-lookup.c (name_lookup::dedup): New. (name_lookup::~name_lookup): Assert not deduping. (name_lookup::restore_state): Likewise. (name_lookup::add_overload): Replace outlined code with dedup call. (name_lookup::add_value): Likewise. (name_lookup::search_namespace_only): Likewise. (name_lookup::adl_namespace_fns): Likewise. (name_lookup::adl_class_fns): Likewise. (name_lookup::search_adl): Likewise. Add clearing dedup call. (name_lookup::search_qualified): Likewise. (name_lookup::search_unqualified): Likewise. gcc/testsuite/ * g++.dg/modules/pr99238.h: New. * g++.dg/modules/pr99238_a.H: New. * g++.dg/modules/pr99238_b.H: New.
2021-03-12	Fix memory constraint bug in SPARC back-end	Eric Botcazou	4	-20/+16
	It's a bug exposed by the recent LRA changes, whereby the T constraint fails to behave properly when LRA is enabled (unlike when reload is enabled). The patch also gets rid of the awkward W constraint, which is strictly equivalent to m in 64-bit mode and, as a result, renames the w constraint into W. gcc/ PR target/99422 * config/sparc/constraints.md (w): Rename to... (W): ... this and ditch previous implementation. * config/sparc/sparc.md (movdi_insn_sp64): Replace W with m. (movdf_insn_sp64): Likewise. (mov<VM64:mode>_insn_sp64): Likewise. * config/sparc/sync.md (atomic_compare_and_swap<mode>_1): Replace w with W. (atomic_compare_and_swap_leon3_1): Likewise. (atomic_compare_and_swapdi_v8plus): Likewise. * config/sparc/sparc.c (memory_ok_for_ldd): Remove useless test on architecture and add missing address validity check during LRA.
2021-03-12	Fortran/OpenMP: Accept implicit-save DATA vars for threadprivate [PR99514]	Tobias Burnus	2	-5/+16
	gcc/fortran/ChangeLog: PR fortran/99514 * resolve.c (resolve_symbol): Accept vars which are in DATA and hence (either) implicit SAVE (or in common). gcc/testsuite/ChangeLog: PR fortran/99514 * gfortran.dg/gomp/threadprivate-1.f90: New test.
2021-03-12	Fortran/OpenMP: Fix use_device_{ptr,addr} with assumed-size array [PR98858]	Tobias Burnus	1	-1/+1
	gcc/ChangeLog: PR fortran/98858 * gimplify.c (omp_add_variable): Handle NULL_TREE as size occurring for assumed-size arrays in use_device_{ptr,addr}. libgomp/ChangeLog: PR fortran/98858 * testsuite/libgomp.fortran/use_device_ptr-3.f90: New test.
2021-03-12	i386: Hopefully last set of -mavx512vl -mno-avx512bw fixes [PR99321]	Jakub Jelinek	4	-239/+332
	This is the final patch of the series started with https://gcc.gnu.org/pipermail/gcc-patches/2021-March/566139.html and continued with https://gcc.gnu.org/pipermail/gcc-patches/2021-March/566356.html This time, I went through all the remaining instructions marked by gas as requiring both AVX512BW and AVX512VL and for each checked tmp-mddump.md, figured out if it ever could be a problem (e.g. instructions that require AVX512BW+AVX512VL, but didn't exist before AVX512F are usually fine, the patterns have the right conditions, the bugs are typically on pre-AVX512F patterns where we have just blindly added v while they actually can't access those unless AVX512BW+AVX512VL), added a test where possible (the test doesn't cover MMX though) and fixed md bugs. For mmx pextr[bw]/pinsr[bw] patterns it introduces per discussions a new YW constraint that only requires AVX512BW and not AVX512VL, because those instructions only require the former and not the latter when using EVEX encoding. There are some other interesting details, e.g. most of the 8 interleave patterns (vpunpck[hl]{bw,wd}) had correctly && <mask_avx512vl_condition> && <mask_avx512bw_condition> in the conditions because for masking it needs to be always EVEX encoded and then it needs both VL+BW, but 2 of those 8 had just && <mask_avx512vl_condition> and so again would run into the -mavx512vl -mno-avx512bw problems. Another problem different from others was mmx eq/gt comparisons, which were using Yv constraints, so would happily accept %xmm16+ registers for -mavx512vl, but there actually are no such EVEX encoded instructions, as AVX512 comparisons work with %k* registers instead. 
The newly added testcase without the patch fails with: /tmp/ccVROLo2.s: Assembler messages: /tmp/ccVROLo2.s:9: Error: unsupported instruction `vpabsb' /tmp/ccVROLo2.s:20: Error: unsupported instruction `vpabsb' /tmp/ccVROLo2.s:31: Error: unsupported instruction `vpabsw' /tmp/ccVROLo2.s:42: Error: unsupported instruction `vpabsw' /tmp/ccVROLo2.s:53: Error: unsupported instruction `vpaddsb' /tmp/ccVROLo2.s:64: Error: unsupported instruction `vpaddsb' /tmp/ccVROLo2.s:75: Error: unsupported instruction `vpaddsw' /tmp/ccVROLo2.s:86: Error: unsupported instruction `vpaddsw' /tmp/ccVROLo2.s:97: Error: unsupported instruction `vpsubsb' /tmp/ccVROLo2.s:108: Error: unsupported instruction `vpsubsb' /tmp/ccVROLo2.s:119: Error: unsupported instruction `vpsubsw' /tmp/ccVROLo2.s:130: Error: unsupported instruction `vpsubsw' /tmp/ccVROLo2.s:141: Error: unsupported instruction `vpaddusb' /tmp/ccVROLo2.s:152: Error: unsupported instruction `vpaddusb' /tmp/ccVROLo2.s:163: Error: unsupported instruction `vpaddusw' /tmp/ccVROLo2.s:174: Error: unsupported instruction `vpaddusw' /tmp/ccVROLo2.s:185: Error: unsupported instruction `vpsubusb' /tmp/ccVROLo2.s:196: Error: unsupported instruction `vpsubusb' /tmp/ccVROLo2.s:207: Error: unsupported instruction `vpsubusw' /tmp/ccVROLo2.s:218: Error: unsupported instruction `vpsubusw' /tmp/ccVROLo2.s:258: Error: unsupported instruction `vpaddusw' /tmp/ccVROLo2.s:269: Error: unsupported instruction `vpavgb' /tmp/ccVROLo2.s:280: Error: unsupported instruction `vpavgb' /tmp/ccVROLo2.s:291: Error: unsupported instruction `vpavgw' /tmp/ccVROLo2.s:302: Error: unsupported instruction `vpavgw' /tmp/ccVROLo2.s:475: Error: unsupported instruction `vpmovsxbw' /tmp/ccVROLo2.s:486: Error: unsupported instruction `vpmovsxbw' /tmp/ccVROLo2.s:497: Error: unsupported instruction `vpmovzxbw' /tmp/ccVROLo2.s:508: Error: unsupported instruction `vpmovzxbw' /tmp/ccVROLo2.s:548: Error: unsupported instruction `vpmulhuw' /tmp/ccVROLo2.s:559: Error: unsupported 
instruction `vpmulhuw' /tmp/ccVROLo2.s:570: Error: unsupported instruction `vpmulhw' /tmp/ccVROLo2.s:581: Error: unsupported instruction `vpmulhw' /tmp/ccVROLo2.s:592: Error: unsupported instruction `vpsadbw' /tmp/ccVROLo2.s:603: Error: unsupported instruction `vpsadbw' /tmp/ccVROLo2.s:643: Error: unsupported instruction `vpshufhw' /tmp/ccVROLo2.s:654: Error: unsupported instruction `vpshufhw' /tmp/ccVROLo2.s:665: Error: unsupported instruction `vpshuflw' /tmp/ccVROLo2.s:676: Error: unsupported instruction `vpshuflw' /tmp/ccVROLo2.s:687: Error: unsupported instruction `vpslldq' /tmp/ccVROLo2.s:698: Error: unsupported instruction `vpslldq' /tmp/ccVROLo2.s:709: Error: unsupported instruction `vpsrldq' /tmp/ccVROLo2.s:720: Error: unsupported instruction `vpsrldq' /tmp/ccVROLo2.s:899: Error: unsupported instruction `vpunpckhbw' /tmp/ccVROLo2.s:910: Error: unsupported instruction `vpunpckhbw' /tmp/ccVROLo2.s:921: Error: unsupported instruction `vpunpckhwd' /tmp/ccVROLo2.s:932: Error: unsupported instruction `vpunpckhwd' /tmp/ccVROLo2.s:943: Error: unsupported instruction `vpunpcklbw' /tmp/ccVROLo2.s:954: Error: unsupported instruction `vpunpcklbw' /tmp/ccVROLo2.s:965: Error: unsupported instruction `vpunpcklwd' /tmp/ccVROLo2.s:976: Error: unsupported instruction `vpunpcklwd' 2021-03-12 Jakub Jelinek <jakub@redhat.com> PR target/99321 * config/i386/constraints.md (YW): New internal constraint. * config/i386/sse.md (v_Yw): Add V4TI, V2TI, V1TI and TI cases. (<sse2_avx2>_<insn><mode>3<mask_name>, <sse2_avx2>_uavg<mode>3<mask_name>, abs<mode>2, <s>mul<mode>3_highpart<mask_name>): Use <v_Yw> instead of v in constraints. (<sse2_avx2>_psadbw): Use YW instead of v in constraints. (avx2_pmaddwd, sse2_pmaddwd, <code>v8hi3, <code>v16qi3, avx2_pmaddubsw256, ssse3_pmaddubsw128): Merge last two alternatives into one, use Yw instead of former x,v. (ashr<mode>3, <insn><mode>3): Use <v_Yw> instead of x in constraints of the last alternative. 
(<sse2_avx2>_packsswb<mask_name>, <sse2_avx2>_packssdw<mask_name>, <sse2_avx2>_packuswb<mask_name>, <sse4_1_avx2>_packusdw<mask_name>, <ssse3_avx2>_pmulhrsw<mode>3<mask_name>, <ssse3_avx2>_palignr<mode>, <ssse3_avx2>_pshufb<mode>3<mask_name>): Merge last two alternatives into one, use <v_Yw> instead of former x,v. (avx2_interleave_highv32qi<mask_name>, vec_interleave_highv16qi<mask_name>): Use Yw instead of v in constraints. Add && <mask_avx512bw_condition> to condition. (avx2_interleave_lowv32qi<mask_name>, vec_interleave_lowv16qi<mask_name>, avx2_interleave_highv16hi<mask_name>, vec_interleave_highv8hi<mask_name>, avx2_interleave_lowv16hi<mask_name>, vec_interleave_lowv8hi<mask_name>, avx2_pshuflw_1<mask_name>, sse2_pshuflw_1<mask_name>, avx2_pshufhw_1<mask_name>, sse2_pshufhw_1<mask_name>, avx2_<code>v16qiv16hi2<mask_name>, sse4_1_<code>v8qiv8hi2<mask_name>, sse4_1_<code>v8qiv8hi2<mask_name>_1, <sse2_avx2>_<insn><mode>3): Use Yw instead of v in constraints. * config/i386/mmx.md (Yv_Yw): New define_mode_attr. (mmx_<insn><mode>3, mmx_ashr<mode>3, mmx_<insn><mode>3): Use <Yv_Yw> instead of Yv in constraints. (mmx_<insn><mode>3, mmx_mulv4hi3, mmx_smulv4hi3_highpart, mmx_umulv4hi3_highpart, mmx_pmaddwd, mmx_<code>v4hi3, mmx_<code>v8qi3, mmx_pack<s_trunsuffix>swb, mmx_packssdw, mmx_punpckhbw, mmx_punpcklbw, mmx_punpckhwd, mmx_punpcklwd, mmx_uavgv8qi3, mmx_uavgv4hi3, mmx_psadbw): Use Yw instead of Yv in constraints. (mmx_pinsrw, mmx_pinsrb, mmx_pextrw, mmx_pextrw_zext, mmx_pextrb, mmx_pextrb_zext): Use YW instead of Yv in constraints. (mmx_eq<mode>3, mmx_gt<mode>3): Use x instead of Yv in constraints. (mmx_andnot<mode>3, mmx_<code><mode>3): Split last alternative into two, one with just x, another isa avx512vl with v. * gcc.target/i386/avx512vl-pr99321-2.c: New test.
2021-03-12	c++: Fix up calls to immediate functions returning reference [PR99507]	Jakub Jelinek	2	-0/+10
	build_cxx_call calls convert_from_reference at the end, so if an immediate function returns a reference, we were constant evaluating not just that call, but that call wrapped in an INDIRECT_REF. That unfortunately means it can constant evaluate to something non-addressable, so if code later needs to take its address it will fail. The following patch fixes that by undoing the convert_from_reference wrapping for the cxx_constant_value evaluation and re-adding it at the end. 2021-03-12 Jakub Jelinek <jakub@redhat.com> PR c++/99507 * call.c (build_over_call): For immediate evaluation of functions that return references, undo convert_from_reference effects before calling cxx_constant_value and call convert_from_reference afterwards. * g++.dg/cpp2a/consteval19.C: New test.
2021-03-12	analyzer: document new param	Martin Liska	1	-0/+4
	gcc/ChangeLog: * doc/invoke.texi: Add missing param documentation.
2021-03-12	Daily bump.	GCC Administrator	5	-1/+207

2021-03-11	compiler: create temporaries for heap variables	Ian Lance Taylor	6	-56/+99
	The compiler generally doesn't create a temporary for an expression that is a variable, because it's normally valid to simply reload the value from the variable. However, if the variable is in the heap, then loading the value is a pointer indirection. The process of creating GCC IR can cause the variable load and the pointer indirection to be split, such that the second evaluation only does the pointer indirection. If there are conditionals in between the two uses, this can cause the second use to load the pointer from an uninitialized register. Avoid this by introducing a new Expression method that returns whether it is safe to evaluate an expression multiple times, and use it everywhere. The test case is https://golang.org/cl/300789. Fixes golang/go#44383 Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/300809
2021-03-11	analyzer: new implementation of shortest feasible path [PR96374]	David Malcolm	20	-106/+1366
	The analyzer builds an exploded graph of (point,state) pairs and when it finds a problem, records a diagnostic at the relevant exploded node. Once it has finished exploring the graph, the analyzer needs to generate the shortest feasible path through the graph to each diagnostic's node. This is used: - for rejecting diagnostics that are infeasible (due to impossible sets of constraints), - for use in determining which diagnostic to use in each deduplication set (the one with the shortest path), and - for building checker_paths for the "winning" diagnostics, giving a list of events Prior to this patch the analyzer simply found the shortest path to the node, and then checked it for feasibility, which could lead to falsely rejecting diagnostics: "the shortest path, if feasible" is not the same as "the shortest feasible path" (PR analyzer/96374). An example is PR analyzer/93355, where this issue causes the analyzer to fail to emit a leak warning for a missing fclose on an error-handling path in intl/localealias.c. This patch implements a new algorithm for finding the shortest feasible path to an exploded node: instead of simply finding the shortest path, the new algorithm uses a worklist to iteratively build a tree of path prefixes, which are feasible paths by construction, until a path to the target node is found. The worklist is prioritized, so that the first feasible path discovered is the shortest possible feasible path. The algorithm continues trying paths until the target node is reached or a limit is exceeded, in which case the diagnostic is treated as being infeasible (which could still be a false negative, but is much less likely to happen than before). Iteratively building a tree of paths allows for work to be reused, and the tree can be dumped in .dot form (via a new -fdump-analyzer-feasibility option), making it much easier to debug compared to other approaches I tried. 
Doing so fixes the missing leak warning for PR analyzer/93355 and various other test cases. Testing: - I manually verified that the behavior is deterministic using 50 builds of pr93355-localealias.c. All dumps were identical. - I manually verified that it still builds with --disable-analyzer. - Lightly tested with valgrind; no additional issues. - Lightly performance tested, showing a slight speed regression to the analyzer relative to before the patch, but correctness for this issue is more important than the slight performance hit for the analyzer. gcc/ChangeLog: PR analyzer/96374 * Makefile.in (ANALYZER_OBJS): Add analyzer/feasible-graph.o and analyzer/trimmed-graph.o. * doc/analyzer.texi (Analyzer Paths): Rewrite description of feasibility checking to reflect new implementation. * doc/invoke.texi (-fdump-analyzer-feasibility): Document new option. * shortest-paths.h (shortest_paths::get_shortest_distance): New. gcc/analyzer/ChangeLog: PR analyzer/96374 * analyzer.opt (-param=analyzer-max-infeasible-edges=): New param. (fdump-analyzer-feasibility): New flag. * diagnostic-manager.cc: Include "analyzer/trimmed-graph.h" and "analyzer/feasible-graph.h". (epath_finder::epath_finder): Convert m_sep to a pointer and only create it if !flag_analyzer_feasibility. (epath_finder::~epath_finder): New. (epath_finder::m_sep): Convert to a pointer. (epath_finder::get_best_epath): Add param "diag_idx" and use it when logging. Rather than finding the shortest path and then checking feasibility, instead use explore_feasible_paths unless !flag_analyzer_feasibility, in which case simply use the shortest path, and note if it is infeasible. Update for m_sep becoming a pointer. (class feasible_worklist): New. (epath_finder::explore_feasible_paths): New. (epath_finder::process_worklist_item): New. (class dump_eg_with_shortest_path): New. (epath_finder::dump_trimmed_graph): New. (epath_finder::dump_feasible_graph): New. 
(saved_diagnostic::saved_diagnostic): Add "idx" param, using it on new field m_idx. (saved_diagnostic::to_json): Dump m_idx. (saved_diagnostic::calc_best_epath): Pass m_idx to get_best_epath. Remove assertion that m_problem was set when m_best_epath is NULL. (diagnostic_manager::add_diagnostic): Pass an index when creating saved_diagnostic instances. * diagnostic-manager.h (saved_diagnostic::saved_diagnostic): Add "idx" param. (saved_diagnostic::get_index): New accessor. (saved_diagnostic::m_idx): New field. * engine.cc (exploded_node::dump_dot): Call args.dump_extra_info. Move code to... (exploded_node::dump_processed_stmts): ...this new function and... (exploded_node::dump_saved_diagnostics): ...this new function. Add index of each diagnostic. (exploded_edge::dump_dot): Move bulk of code to... (exploded_edge::dump_dot_label): ...this new function. * exploded-graph.h (eg_traits::dump_args_t::dump_extra_info): New vfunc. (exploded_node::dump_processed_stmts): New decl. (exploded_node::dump_saved_diagnostics): New decl. (exploded_edge::dump_dot_label): New decl. * feasible-graph.cc: New file. * feasible-graph.h: New file. * trimmed-graph.cc: New file. * trimmed-graph.h: New file. gcc/testsuite/ChangeLog: PR analyzer/96374 * gcc.dg/analyzer/dot-output.c: Add -fdump-analyzer-feasibility to options. * gcc.dg/analyzer/feasibility-1.c (test_6): Remove xfail. (test_7): New. * gcc.dg/analyzer/pr93355-localealias-feasibility-2.c: Remove xfail. * gcc.dg/analyzer/pr93355-localealias-feasibility-3.c: Remove xfails. * gcc.dg/analyzer/pr93355-localealias-feasibility.c: Remove -fno-analyzer-feasibility from options. * gcc.dg/analyzer/pr93355-localealias.c: Likewise. * gcc.dg/analyzer/unknown-fns-4.c: Remove xfail.
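The difference between "the shortest path, if feasible" and "the shortest feasible path" described in this commit can be illustrated with a small standalone sketch. It is not the analyzer's code: the Edge and Prefix types, the function name, and the toy feasibility model (each edge may require one boolean variable to hold a given value) are all assumptions made for illustration, and the worklist is simply ordered by prefix length. The max_infeasible_edges argument only loosely mirrors the role of the new analyzer-max-infeasible-edges param.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical edge: taking src -> dst requires boolean variable `var`
// to equal `value`; var < 0 means the edge is unconditional.
struct Edge { int src; int dst; int var; bool value; };

// A path prefix that is feasible by construction: the nodes visited so
// far plus the constraints accumulated along the way.
struct Prefix {
  std::vector<int> nodes;
  std::map<int, bool> constraints;
};

// Comparator so that the shortest prefix is popped from the worklist first.
struct LongerIsLower {
  bool operator()(const Prefix &a, const Prefix &b) const {
    return a.nodes.size() > b.nodes.size();
  }
};

// Returns the node sequence of the shortest feasible path from origin to
// target, or an empty vector if a limit is hit or no feasible path exists.
std::vector<int>
shortest_feasible_path(const std::vector<Edge> &edges, int origin, int target,
                       std::size_t max_infeasible_edges,
                       std::size_t max_expansions = 10000) {
  std::priority_queue<Prefix, std::vector<Prefix>, LongerIsLower> worklist;
  worklist.push(Prefix{{origin}, {}});
  std::size_t infeasible_edges = 0;
  std::size_t expansions = 0;
  while (!worklist.empty()) {
    Prefix cur = worklist.top();
    worklist.pop();
    assert(!cur.nodes.empty());
    if (cur.nodes.back() == target)
      return cur.nodes;  // first hit is the shortest feasible path
    if (++expansions > max_expansions)
      return {};  // give up: treat the target as infeasible
    for (const Edge &e : edges) {
      if (e.src != cur.nodes.back())
        continue;
      Prefix next = cur;
      if (e.var >= 0) {
        auto it = next.constraints.find(e.var);
        if (it != next.constraints.end() && it->second != e.value) {
          // Contradicts an earlier constraint: reject this edge.
          if (++infeasible_edges > max_infeasible_edges)
            return {};
          continue;
        }
        next.constraints[e.var] = e.value;
      }
      next.nodes.push_back(e.dst);
      worklist.push(std::move(next));
    }
  }
  return {};
}
```

With edges 0->1 (x true), 1->3 (x false), 0->2 (x true), 2->4 (unconditional) and 4->3 (x true), the shortest path 0, 1, 3 is infeasible, and the sketch returns 0, 2, 4, 3 instead of rejecting the target outright.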
2021-03-11	analyzer: support reverse direction in shortest-paths.h	David Malcolm	3	-37/+125
	This patch generalizes shortest-paths.h so that it can be used to find the shortest path from each node to a given target node (on top of the existing support for finding the shortest path from a given origin node to each node). I've marked this as "analyzer" as this is the only code using shortest-paths.h. This patch is required by followup work to fix PR analyzer/96374. gcc/analyzer/ChangeLog: * diagnostic-manager.cc (epath_finder::epath_finder): Update shortest_paths init for new param. gcc/ChangeLog: * digraph.cc (selftest::test_shortest_paths): Update shortest_paths init for new param. Add test of SPS_TO_GIVEN_TARGET. * shortest-paths.h (enum shortest_path_sense): New. (shortest_paths::shortest_paths): Add "sense" param. Update for renamings. Generalize to use "sense" param. (shortest_paths::get_shortest_path): Rename param. (shortest_paths::m_sense): New field. (shortest_paths::m_prev): Rename... (shortest_paths::m_best_edge): ...to this. (shortest_paths::get_shortest_path): Update for renamings. Conditionalize flipping of path on sense of traversal.
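The "sense" parameter and the conditional flipping of the path can be sketched as follows. This is a simplified standalone model, not the code in shortest-paths.h: it assumes unit-length edges (so a breadth-first search suffices), the class name unit_shortest_paths and the edge struct are invented for illustration, and the enumerator SPS_FROM_GIVEN_ORIGIN is an assumed counterpart to the SPS_TO_GIVEN_TARGET named in the ChangeLog. It also assumes every queried node is reachable.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <queue>
#include <vector>

enum shortest_path_sense { SPS_FROM_GIVEN_ORIGIN, SPS_TO_GIVEN_TARGET };

struct edge { int src; int dst; };

class unit_shortest_paths {
public:
  unit_shortest_paths(int num_nodes, const std::vector<edge> &edges,
                      int given_node, shortest_path_sense sense)
    : m_edges(edges), m_sense(sense), m_given(given_node),
      m_dist(num_nodes, -1), m_best_edge(num_nodes, -1) {
    m_dist[given_node] = 0;
    std::queue<int> worklist;
    worklist.push(given_node);
    while (!worklist.empty()) {
      int n = worklist.front();
      worklist.pop();
      for (std::size_t i = 0; i < m_edges.size(); ++i) {
        // Forward sense follows edges src -> dst; the reverse sense
        // walks the same edges backwards, dst -> src.
        bool fwd = (m_sense == SPS_FROM_GIVEN_ORIGIN);
        int from = fwd ? m_edges[i].src : m_edges[i].dst;
        int to = fwd ? m_edges[i].dst : m_edges[i].src;
        if (from == n && m_dist[to] < 0) {
          m_dist[to] = m_dist[n] + 1;
          m_best_edge[to] = static_cast<int>(i);
          worklist.push(to);
        }
      }
    }
  }

  // Edge indices of the shortest path between the given node and
  // other_node, always listed in src-to-dst order.
  std::vector<int> get_shortest_path(int other_node) const {
    assert(m_dist[other_node] >= 0);  // sketch assumes reachability
    std::vector<int> path;
    int n = other_node;
    while (n != m_given) {
      int ei = m_best_edge[n];
      path.push_back(ei);
      n = (m_sense == SPS_FROM_GIVEN_ORIGIN) ? m_edges[ei].src
                                             : m_edges[ei].dst;
    }
    // Walking back towards an origin yields the path reversed; walking
    // forward towards a target already yields it in order.
    if (m_sense == SPS_FROM_GIVEN_ORIGIN)
      std::reverse(path.begin(), path.end());
    return path;
  }

private:
  std::vector<edge> m_edges;
  shortest_path_sense m_sense;
  int m_given;
  std::vector<int> m_dist;       // -1 until a node has been reached
  std::vector<int> m_best_edge;  // edge used to reach each node
};
```

For the edges 0->1, 1->2, 0->2, 2->3, asking a forward instance rooted at node 0 for the path to node 3 and asking a reverse instance rooted at node 3 for the path from node 0 both give the edge indices 2, 3.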
2021-03-11	analyzer: gracefully handle impossible paths in shortest-paths.h	David Malcolm	2	-24/+83
	This bulletproofs the shortest_paths code against unreachable nodes, gracefully handling them, rather than failing an assertion. I've marked this as "analyzer" as this is the only code using shortest-paths.h. This patch is required by followup work to fix PR analyzer/96374. gcc/ChangeLog: * digraph.cc (selftest::test_shortest_paths): Add test coverage for paths from B and C. * shortest-paths.h (shortest_paths::shortest_paths): Handle unreachable nodes, rather than asserting.
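One way to handle unreachable nodes gracefully is to stop the search once every remaining node is still at an "infinite" distance, leaving those nodes marked rather than asserting. The following standalone sketch shows that idea under stated assumptions: unit-length edges, a simple array-scanning Dijkstra, and the invented names distances_from and UNREACHABLE. It does not reproduce how shortest-paths.h itself is written.

```cpp
#include <cassert>
#include <climits>
#include <vector>

struct edge { int src; int dst; };

// Sentinel distance for nodes that cannot be reached from the origin.
const int UNREACHABLE = INT_MAX;

// Dijkstra with unit edge lengths, tolerating unreachable nodes.
std::vector<int> distances_from(int num_nodes, const std::vector<edge> &edges,
                                int origin) {
  assert(origin >= 0 && origin < num_nodes);
  std::vector<int> dist(num_nodes, UNREACHABLE);
  std::vector<bool> done(num_nodes, false);
  dist[origin] = 0;
  for (;;) {
    // Pick the closest node that is reachable and not yet finalized.
    int best = -1;
    for (int n = 0; n < num_nodes; ++n)
      if (!done[n] && dist[n] != UNREACHABLE
          && (best < 0 || dist[n] < dist[best]))
        best = n;
    if (best < 0)
      break;  // everything left is unreachable: stop instead of asserting
    done[best] = true;
    for (const edge &e : edges)
      if (e.src == best && dist[best] + 1 < dist[e.dst])
        dist[e.dst] = dist[best] + 1;
  }
  return dist;
}
```

In a chain 0->1->2->3, starting from node 1 leaves node 0 at UNREACHABLE while nodes 2 and 3 get distances 1 and 2.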
2021-03-11	aix: Use lcomm for TLS static data.	David Edelsohn	4	-9/+5
	GCC on AIX generates thread local uninitialized data in the common section, which could conflict with another module. This patch changes the code generation to place static uninitialized thread local data into the local common section specified with .lcomm. This change also removes the need to create a file-local name for the TBSS data. gcc/ChangeLog: 2021-03-11 David Edelsohn <dje.gcc@gmail.com> PR target/99094 * config/rs6000/rs6000.c (rs6000_xcoff_file_start): Don't create xcoff_tbss_section_name. * config/rs6000/xcoff.h (ASM_OUTPUT_TLS_COMMON): Use .lcomm. * xcoffout.c (xcoff_tbss_section_name): Delete. * xcoffout.h (xcoff_tbss_section_name): Delete.
2021-03-11	c++: Fix unhiding friend with imports [PR 99248]	Nathan Sidwell	5	-3/+19
	This was a simple thinko about which object held the reference to the binding vector. I also noticed stale code in the tree dumper, as I recently removed the flags from a lazy number. PR c++/99248 gcc/cp/ * name-lookup.c (lookup_elaborated_type_1): Access slot not bind when there's a binding vector. * ptree.c (cxx_print_xnode): Lazy flags are no longer a thing. gcc/testsuite/ * g++.dg/modules/pr99248.h: New. * g++.dg/modules/pr99248_a.H: New. * g++.dg/modules/pr99248_b.H: New.
2021-03-11	c++: template partial instantiation mismatch [PR 99528]	Nathan Sidwell	5	-45/+35
	This turned out to be an existing problem, which had been hidden by other bugs. Templated members of templated classes can end up instantiating the template itself, and we were not handling the mergeableness of that correctly. PR c++/99528 gcc/cp/ * module.cc (enum merge_kind): Delete MK_type_tmpl_spec, MK_decl_tmpl_spec. (trees_in::decl_value): Adjust add_mergeable_specialization call. (trees_out::get_merge_kind): Adjust detecting a partial template instantiation. (trees_out::key_mergeable): Adjust handling same. (trees_in::key_mergeable): Likewise. gcc/testsuite/ * g++.dg/modules/pr99528.h: New. * g++.dg/modules/pr99528_a.H: New. * g++.dg/modules/pr99528_b.H: New. * g++.dg/modules/pr99528_c.C: New.
2021-03-11	testsuite/98245 - adjust dump scanning of gcc.dg/vect/bb-slp-46.c	Richard Biener	1	-2/+2
	Checking the number of pluses is unreliable since the vector size isn't known. Instead see that the unwanted scalar compute is not there. 2021-03-11 Richard Biener <rguenther@suse.de> PR testsuite/98245 * gcc.dg/vect/bb-slp-46.c: Scan for the scalar compute instead of verifying the total number of adds.
2021-03-11	testsuite/97494 - XFAIL gcc.dg/vect/pr97428.c on !vect_hw_misalign	Richard Biener	1	-1/+3
	While we could at least vectorize it on targets which support re-alignment tokens we fail to do this because of imperfections in alignment analysis. XFAIL when the HW cannot deal with misaligned vector accesses for now. 2021-03-11 Richard Biener <rguenther@suse.de> PR testsuite/97494 * gcc.dg/vect/pr97428.c: XFAIL on !vect_hw_misalign.
2021-03-11	testsuite/97494 - XFAIL gcc.dg/vect/vect-complex-5.c on !vect_hw_misalign	Richard Biener	1	-1/+1
	This is a missed optimization due to bogus alignment analysis. 2021-03-11 Richard Biener <rguenther@suse.de> PR testsuite/97494 * gcc.dg/vect/vect-complex-5.c: XFAIL on !vect_hw_misalign.
2021-03-11	testsuite/97494 - amend gcc.dg/vect/slp-21.c	Richard Biener	1	-2/+2
	As reported in the PR, all powerpc64 targets fail with: FAIL: gcc.dg/vect/slp-21.c scan-tree-dump-times vect "vectorizing stmts using SLP" 2, because, like on arm, we now vectorize 4 opportunities. This adjusts the testcase to follow the arm example. 2021-03-11 Richard Biener <rguenther@suse.de> PR testsuite/97494 * gcc.dg/vect/slp-21.c: Adjust for powerpc64--*.
2021-03-11	tree-optimization/99523 - missing SSA decls in dumps	Richard Biener	1	-1/+6
	This makes sure to dump SSA names without identifier in the declaration part of a function dump. While we dump the anonymous variable decls the SSA names referencing them appear without a clear reference as to what anonymous variable is used (_3 vs. D.1234). 2021-03-11 Richard Biener <rguenther@suse.de> PR tree-optimization/99523 * tree-cfg.c (dump_function_to_file): Dump SSA names w/o identifier to the decls section as well, not only those without a VAR_DECL.
2021-03-11	icf: Check return type of internal fn calls [PR99517]	Jakub Jelinek	3	-1/+54
	The following testcase is miscompiled, because IPA-ICF considers the two functions identical. They aren't: the type of the .VEC_CONVERT call lhs is different. But for calls to internal functions, there is no fntype nor callee with a function type to compare, so all we compare is just the ifn, arguments and some call flags. The following patch fixes it by checking the internal fn calls like e.g. gimple assignments, where the type of the lhs is checked too. 2021-03-11 Jakub Jelinek <jakub@redhat.com> PR ipa/99517 * ipa-icf-gimple.c (func_checker::compare_gimple_call): For internal function calls with lhs fail if the lhs don't have compatible types. * gcc.target/i386/avx2-pr99517-1.c: New test. * gcc.target/i386/avx2-pr99517-2.c: New test.
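The gap described here can be modelled with a toy comparison. Everything in this sketch is hypothetical: the internal_call struct, the use of strings for types, the two function names, and the use of string equality in place of a real type-compatibility check. It only shows why comparing the ifn and the arguments alone treats two calls with different lhs types as identical.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a call to an internal function.
struct internal_call {
  std::string ifn;                     // e.g. "VEC_CONVERT"
  std::vector<std::string> arg_types;  // types of the arguments
  std::string lhs_type;                // empty if the call has no lhs
};

// Comparison in the spirit of the pre-fix behaviour: only the internal
// function and its arguments are looked at.
bool same_call_ignoring_lhs(const internal_call &a, const internal_call &b) {
  assert(!a.ifn.empty() && !b.ifn.empty());  // a call always names its ifn
  return a.ifn == b.ifn && a.arg_types == b.arg_types;
}

// Comparison in the spirit of the fix: the lhs type must match as well
// (string equality stands in for a type-compatibility check).
bool same_call_checking_lhs(const internal_call &a, const internal_call &b) {
  return same_call_ignoring_lhs(a, b) && a.lhs_type == b.lhs_type;
}
```

Two VEC_CONVERT calls taking the same argument type but producing different result types compare equal under the first function and unequal under the second.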
2021-03-11	cris: define HARD_FRAME_POINTER_REGNUM	Hans-Peter Nilsson	3	-20/+36
	Beware, tm.texi doesn't tell the whole story: a defined HARD_FRAME_POINTER_REGNUM (different to FRAME_POINTER_REGNUM) is supposed to make work easier for reload, being able to easily tell actual frame-pointer-related addresses from those that happen to use the same register or something to that effect. On reasonable code the performance effect is barely measurable. Looking at libgcc changes for -march=v10, the effect (where noticeable) is mostly indeterminate churn. Instances where it's not just insns moved around at no obvious effect: one more insn for addvdi3, subvdi3; two insns more in floatdisf; three insns shorter fixunsdfdi. Some of those seem related to pairing r8 with r9. The only effect on coremark is an infinitesimal positive effect from a three(!) cycles total (from the 15 calls) faster execution paths in vfprintf_r. Local microbenchmarks give similar results. With that in mind and not forgetting that expectations in the register allocator and reload lean towards HARD_FRAME_POINTER_REGNUM being defined (and different to FRAME_POINTER_REGNUM), or to wit, "all the kids do it", why not. Note that the offset at elimination really is 0. gcc: * config/cris/cris.h (HARD_FRAME_POINTER_REGNUM): Define. Change FRAME_POINTER_REGNUM to correspond to a new faked register faked_fp, part of GENNONACR_REGS like faked_ap. (CRIS_FAKED_REGS_CONTENTS): New helper macro. (FIRST_PSEUDO_REGISTER, FIXED_REGISTERS, CALL_USED_REGISTERS): (REG_ALLOC_ORDER, REG_CLASS_CONTENTS, REGNO_OK_FOR_BASE_P) (ELIMINABLE_REGS, REGISTER_NAMES): Adjust accordingly. * config/cris/cris.md (CRIS_FP_REGNUM): Renumber to new faked register. (CRIS_REAL_FP_REGNUM): New constant. * config/cris/cris.c (cris_reg_saved_in_regsave_area): Check for HARD_FRAME_POINTER_REGNUM instead of FRAME_POINTER_REGNUM. (cris_initial_elimination_offset): Handle elimination changes to HARD_FRAME_POINTER_REGNUM instead of FRAME_POINTER_REGNUM and add one from FRAME_POINTER_REGNUM to HARD_FRAME_POINTER_REGNUM. 
(cris_expand_prologue, cris_expand_epilogue): Emit code for hard_frame_pointer_rtx instead of frame_pointer_rtx.
2021-03-11	Daily bump.	GCC Administrator	7	-1/+165