Age, Commit message, Author, Files, Lines
2022-02-10[nvptx] Workaround sub.u16 driver JIT bugTom de Vries1-1/+8
There's an NVIDIA driver JIT bug that mishandles this code (minimized from builtin-arith-overflow-15.c): ... int main (void) { signed char r; unsigned char y = (unsigned char) 0x80; if (__builtin_sub_overflow ((unsigned char)0, (unsigned char)y, &r)) __builtin_abort (); return 0; } ... which at ptx level minimizes to: ... mov.u16 r22, 0x0080; st.local.u16 [frame_var],r22; ld.local.u16 r32,[frame_var]; sub.u16 r33,0x0000,r32; cvt.u32.u16 r35,r33; ... where we expect r35 == 0x0000ff80 but instead get 0xffffff80, and where using nvptx-none-run -O0 fixes the problem. [ See also https://github.com/vries/nvidia-bugs/tree/master/builtin-arith-overflow-15 . ] Try to work around the bug by using sub.s16 instead of sub.u16. Tested on nvptx. gcc/ChangeLog: 2022-02-07 Tom de Vries <tdevries@suse.de> PR target/97005 * config/nvptx/nvptx.md (define_insn "sub<mode>3"): Work around driver JIT bug by using sub.s16 instead of sub.u16.
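The arithmetic behind the expected and observed values can be modelled in plain C. This is only a sketch of the symptom, not a description of what the driver JIT does internally; the function names are made up for illustration. The observed value happens to match what a sign extension of the 16-bit difference would give.

```c
#include <assert.h>
#include <stdint.h>

/* Intended meaning of "sub.u16 r33,0x0000,r32; cvt.u32.u16 r35,r33":
   subtract modulo 2^16, then zero-extend the result to 32 bits.  */
uint32_t sub_u16_then_zext(uint16_t a, uint16_t b)
{
    uint16_t diff = (uint16_t)(a - b);  /* wraps modulo 2^16 */
    return (uint32_t)diff;              /* zero extension */
}

/* Model of the observed wrong value: the same 16-bit difference,
   but sign-extended to 32 bits (hypothetical model of the symptom).  */
uint32_t sub_u16_then_sext(uint16_t a, uint16_t b)
{
    uint16_t diff = (uint16_t)(a - b);
    /* Portable sign extension of a 16-bit pattern held in 32 bits.  */
    return ((uint32_t)diff ^ 0x8000u) - 0x8000u;
}

/* Small self-check using the values from the commit message.  */
void check_sub_u16_model(void)
{
    assert(sub_u16_then_zext(0x0000, 0x0080) == 0x0000ff80u);
    assert(sub_u16_then_sext(0x0000, 0x0080) == 0xffffff80u);
}
```

In two's complement, a 16-bit subtraction produces the same 16 result bits whether the operands are viewed as signed or unsigned, which is why replacing sub.u16 with sub.s16 should not change the intended result.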
2022-02-10Fortran/OpenMP: Avoid ICE for invalid char array in omp atomic [PR104329]Tobias Burnus2-3/+36
PR fortran/104329 gcc/fortran/ChangeLog: * openmp.cc (resolve_omp_atomic): Defer extra-code assert after other diagnostics. gcc/testsuite/ChangeLog: * gfortran.dg/gomp/atomic-28.f90: New test.
2022-02-10nvptx: Tweak constraints on copysign instructionsRoger Sayle1-2/+2
Many thanks to Thomas Schwinge for confirming my hypothesis that the register usage regression, PR target/104345, is solely due to libgcc's _muldc3 function. In addition to the isinf functionality in the previously proposed nvptx patch at https://gcc.gnu.org/pipermail/gcc-patches/2022-January/588453.html which significantly reduces the number of instructions in _muldc3, the patch below further reduces both the number of instructions and the number of explicitly declared registers, by permitting floating point constant immediate operands in nvptx's copysign instruction. Fingers-crossed, the combination with all of the previous proposed nvptx patches improves things. Ultimately, increasing register usage from 50 to 51 registers, reducing the number of concurrent threads by ~2%, can easily be countered if we're now executing significantly fewer instructions in each kernel, for a net performance win. This patch has been tested on nvptx-none hosted on x86_64-pc-linux-gnu with a "make" and "make -k check" with no new failures. gcc/ChangeLog: * config/nvptx/nvptx.md (copysign<mode>3): Allow immediate floating point constants as operands 1 and/or 2.
2022-02-10PR target/104345: Use nvptx "set" instruction for cond ? -1 : 0Roger Sayle2-4/+71
This patch addresses the "increased register pressure" regression on nvptx-none caused by my change to transition the backend to a STORE_FLAG_VALUE = 1 target. This improved code generation for the more common case of producing 0/1 Boolean values, but unfortunately made things marginally worse when a 0/-1 mask value is desired. Unfortunately, nvptx kernels are extremely sensitive to changes in register usage, which was observable in the reported PR. This patch provides optimizations for -(cond ? 1 : 0), effectively simplifying this into cond ? -1 : 0, where these ternary operators are provided by nvptx's selp instruction, and for the specific case of SImode, using (restoring) nvptx's "set" instruction (which avoids the need for a predicate register). This patch has been tested on nvptx-none hosted on x86_64-pc-linux-gnu with a "make" and "make -k check" with no new failures. Unfortunately, the exact register usage of an nvptx kernel depends upon the version of the CUDA drivers being used (and the hardware), but I believe this change should resolve the PR (for Thomas) by improving code generation for the cases that regressed. gcc/ChangeLog: PR target/104345 * config/nvptx/nvptx.md (sel_true<mode>): Fix indentation. (sel_false<mode>): Likewise. (define_code_iterator eqne): New code iterator for EQ and NE. (*selp<mode>_neg_<code>): New define_insn_and_split to optimize the negation of a selp instruction. (*selp<mode>_not_<code>): New define_insn_and_split to optimize the bitwise not of a selp instruction. (*setcc_int<mode>): Use set instruction for neg:SI of a selp. gcc/testsuite/ChangeLog: PR target/104345 * gcc.target/nvptx/neg-selp.c: New test case.
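The identities that the new selp patterns rely on can be written out in C. This is a minimal sketch with made-up function names; it is not the content of neg-selp.c.

```c
#include <assert.h>

/* -(cond ? 1 : 0): a 0/1 value followed by a separate negation.  */
int mask_via_negate(int x, int y)
{
    return -((x == y) ? 1 : 0);
}

/* cond ? -1 : 0: the same 0/-1 mask from a single select.  */
int mask_via_select(int x, int y)
{
    return (x == y) ? -1 : 0;
}

/* ~(cond ? a : b): a select followed by a bitwise not.  */
int not_of_select(int cond, int a, int b)
{
    return ~(cond ? a : b);
}

/* cond ? ~a : ~b: the not folded into the select operands.  */
int select_of_nots(int cond, int a, int b)
{
    return cond ? ~a : ~b;
}

/* Self-check that each pair agrees on a few inputs.  */
void check_selp_identities(void)
{
    assert(mask_via_negate(3, 3) == mask_via_select(3, 3));
    assert(mask_via_negate(3, 4) == mask_via_select(3, 4));
    assert(not_of_select(1, 5, 9) == select_of_nots(1, 5, 9));
    assert(not_of_select(0, 5, 9) == select_of_nots(0, 5, 9));
}
```

When the select operands are constants, the negation or complement can be applied to them ahead of time, so the extra instruction disappears.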
2022-02-10nvptx: Fix and use BI mode logic instructions (e.g. and.pred)Roger Sayle5-21/+67
This patch adds support for nvptx's BImode and.pred, or.pred and xor.pred instructions. Technically, nvptx.md previously defined andbi3, iorbi3 and xorbi3 instructions, but the assembly language mnemonic output for these was incorrect (e.g. and.b1) and would be rejected by the ptxas assembler. The most significant part of this patch is the new define_split which teaches the compiler to actually use these instructions when appropriate (exposing the latent bug above). After https://gcc.gnu.org/pipermail/gcc-patches/2022-January/587999.html, the function: int foo(int x, int y) { return (x==21) && (y==69); } when compiled with -O2 produces: mov.u32 %r26, %ar0; mov.u32 %r27, %ar1; setp.eq.u32 %r31, %r26, 21; setp.eq.u32 %r34, %r27, 69; selp.u32 %r37, 1, 0, %r31; selp.u32 %r38, 1, 0, %r34; and.b32 %value, %r37, %r38; with this patch we now save an extra instruction and generate: mov.u32 %r26, %ar0; mov.u32 %r27, %ar1; setp.eq.u32 %r31, %r26, 21; setp.eq.u32 %r34, %r27, 69; and.pred %r39, %r34, %r31; selp.u32 %value, 1, 0, %r39; This patch has been tested (on top of the patch mentioned above) on nvptx-none hosted on x86_64-pc-linux-gnu (including newlib) with a make and make -k check with no new failures. gcc/ChangeLog: * config/nvptx/nvptx.md (any_logic): Move code iterator earlier in machine description. (logic): Move code attribute earlier in machine description. (ilogic): New code attribute, like logic but "ior" for IOR. (and<mode>3, ior<mode>3, xor<mode>3): Delete. Replace with... (<ilogic><mode>3): New define_insn for HSDIM logic operations. (<ilogic>bi3): New define_insn for BI mode logic operations. (define_split): Lower logic operations from integer modes to BI mode predicate operations. gcc/testsuite/ChangeLog: * gcc.target/nvptx/bool-1.c: Update. * gcc.target/nvptx/bool-2.c: New test case for and.pred. * gcc.target/nvptx/bool-3.c: New test case for or.pred. * gcc.target/nvptx/bool-4.c: New test case for xor.pred.
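The difference between the two instruction sequences above can be mirrored in C by choosing where the conversion from predicate to integer happens. This is an illustrative sketch only: the function names are hypothetical and the code is not taken from bool-2.c, bool-3.c or bool-4.c.

```c
#include <assert.h>
#include <stdbool.h>

/* Old shape: convert each comparison to a 0/1 integer (two selp),
   then combine the integers (and.b32).  */
int and_of_ints(int x, int y)
{
    int a = (x == 21) ? 1 : 0;
    int b = (y == 69) ? 1 : 0;
    return a & b;
}

/* New shape: combine the predicates first (and.pred),
   then convert once (one selp).  */
int and_of_preds(int x, int y)
{
    bool p = (x == 21);
    bool q = (y == 69);
    bool r = p & q;
    return r ? 1 : 0;
}

/* The same idea for exclusive or (xor.pred).  */
int xor_of_preds(int x, int y)
{
    bool p = (x == 21);
    bool q = (y == 69);
    bool r = p ^ q;
    return r ? 1 : 0;
}

/* Self-check over all four predicate combinations.  */
void check_pred_logic(void)
{
    assert(and_of_ints(21, 69) == and_of_preds(21, 69));
    assert(and_of_ints(21, 0) == and_of_preds(21, 0));
    assert(and_of_ints(0, 69) == and_of_preds(0, 69));
    assert(and_of_ints(0, 0) == and_of_preds(0, 0));
}
```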
2022-02-10nvptx: Add support for 64-bit mul.hi (and other) instructionsRoger Sayle6-4/+216
Now that the middle-end MULT_HIGHPART_EXPR pieces are in place, this patch adds support for nvptx's mul.hi.s64 and mul.hi.u64 instructions, as previously reviewed (provisionally pre-approved) back in August 2020: https://gcc.gnu.org/pipermail/gcc-patches/2020-August/551373.html Since then a few things have changed, so this patch uses the new SMUL_HIGHPART and UMUL_HIGHPART RTX expressions, but the test cases remain the same. Like the x86_64 backend, this patch retains the "trunc" forms of these instructions (while the RTL optimizers/combine may still generate them). Given that we're rapidly approaching stage 4, I also took the liberty of including support in nvptx.md for a few other instructions. With the new 64-bit highpart multiplication instructions added above, we can now provide a define_expand for efficient 64-bit (to 128-bit) widening multiplications. This patch also adds support for nvptx's testp.infinite instruction (for implementing __builtin_isinf) and the not.pred instruction. As an example of the code generation improvements, the function int foo(double x) { return __builtin_isinf(x); } previously generated with -O2: mov.f64 %r26, %ar0; abs.f64 %r28, %r26; setp.leu.f64 %r31, %r28, 0d7fefffffffffffff; selp.u32 %r30, 1, 0, %r31; mov.u32 %r29, %r30; cvt.u16.u8 %r35, %r29; mov.u16 %r33, %r35; xor.b16 %r32, %r33, 1; cvt.u32.u16 %r34, %r32; cvt.u32.u8 %value, %r34; and with this patch now generates: mov.f64 %r23, %ar0; testp.infinite.f64 %r24, %r23; selp.u32 %value, 1, 0, %r24; This patch has been tested on nvptx-none hosted on x86_64-pc-linux-gnu (including newlib) with a make and make -k check with no new failures. gcc/ChangeLog: * config/nvptx/nvptx.md (UNSPEC_ISINF): New UNSPEC. (one_cmplbi2): New define_insn for not.pred. (mulditi3): New define_expand for signed widening multiply. (umulditi3): New define_expand for unsigned widening multiply. (smul<mode>3_highpart): New define_insn for signed highpart mult. 
(umul<mode>3_highpart): New define_insn for unsigned highpart mult. (*smulhi3_highpart_2): Renamed from smulhi3_highpart. (*smulsi3_highpart_2): Renamed from smulsi3_highpart. (*umulhi3_highpart_2): Renamed from umulhi3_highpart. (*umulsi3_highpart_2): Renamed from umulsi3_highpart. (*setcc<mode>_from_not_bi): New define_insn. (*setcc_isinf<mode>): New define_insn for testp.infinite. (isinf<mode>2): New define_expand. gcc/testsuite/ChangeLog: * gcc.target/nvptx/mul-hi64.c: New test case. * gcc.target/nvptx/umul-hi64.c: New test case. * gcc.target/nvptx/mul-wide64.c: New test case. * gcc.target/nvptx/umul-wide64.c: New test case. * gcc.target/nvptx/isinf.c: New test case.
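What a 64-bit highpart multiplication computes can be sketched in C using the 128-bit integer extension provided by GCC and Clang. The function names are made up for illustration, and the code assumes a target where `__int128` is available; the signed variant also assumes that right-shifting a negative value is an arithmetic shift, which is what GCC documents.

```c
#include <assert.h>
#include <stdint.h>

/* Upper 64 bits of the 128-bit unsigned product,
   the value mul.hi.u64 is meant to produce.  */
uint64_t umul_highpart64(uint64_t a, uint64_t b)
{
    unsigned __int128 product = (unsigned __int128)a * b;
    return (uint64_t)(product >> 64);
}

/* Upper 64 bits of the 128-bit signed product,
   the value mul.hi.s64 is meant to produce.  */
int64_t smul_highpart64(int64_t a, int64_t b)
{
    __int128 product = (__int128)a * b;
    return (int64_t)(product >> 64);  /* arithmetic shift on GCC */
}

/* Self-check with values whose products are easy to verify by hand.  */
void check_highpart64(void)
{
    /* 2^32 * 2^32 = 2^64, so the high half is 1.  */
    assert(umul_highpart64(1ull << 32, 1ull << 32) == 1u);
    /* (2^64 - 1)^2 = 2^128 - 2^65 + 1, so the high half is 2^64 - 2.  */
    assert(umul_highpart64(UINT64_MAX, UINT64_MAX) == 0xfffffffffffffffeull);
    /* -1 * 1 = -1, whose high half is all ones.  */
    assert(smul_highpart64(-1, 1) == -1);
}
```

A widening 64-bit to 128-bit multiplication is then the pair formed by the ordinary low-part product and this high part.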
2022-02-10nvptx: Expand QI mode operations using SI mode instructionsRoger Sayle2-7/+123
One of the unusual target features of the Nvidia PTX ISA is that it doesn't provide QI mode (byte sized) operations or registers. Somewhat conventionally, 8-bit quantities are read from/written to memory using special instructions, but stored internally using SImode (32-bit) registers. GCC's middle-end accommodates targets without QImode optabs by widening operations until suitable support is found, and with the current nvptx backend this means 16-bit HImode operations. The inconvenience is that nvptx is also a TARGET_TRULY_NOOP_TRUNCATION=false target, meaning that additional instructions are required to convert between the SImode registers used to hold QImode values, and the HImode registers used to operate on them (and back again). This results in a large amount of shuffling and type conversion in code dealing with bytes, i.e. using char or Boolean types. This patch improves the situation by providing expanders in the nvptx machine description to perform QImode operations natively in SImode instead of HImode. An alternate implementation might be to provide some form of target hook to specify which fallback modes to use during RTL expansion, but I think this requirement is unusual, and a solution entirely in the nvptx backend doesn't disturb/affect other targets. 
The improvements can be quite dramatic, as shown in the example below: int foo(int x, int y) { return (x==21) && (y==69); } previously with -O2 required 15 instructions: mov.u32 %r26, %ar0; mov.u32 %r27, %ar1; setp.eq.u32 %r31, %r26, 21; selp.u32 %r30, 1, 0, %r31; mov.u32 %r29, %r30; setp.eq.u32 %r34, %r27, 69; selp.u32 %r33, 1, 0, %r34; mov.u32 %r32, %r33; cvt.u16.u8 %r39, %r29; mov.u16 %r36, %r39; cvt.u16.u8 %r39, %r32; mov.u16 %r37, %r39; and.b16 %r35, %r36, %r37; cvt.u32.u16 %r38, %r35; cvt.u32.u8 %value, %r38; with this patch, now requires only 7 instructions: mov.u32 %r26, %ar0; mov.u32 %r27, %ar1; setp.eq.u32 %r31, %r26, 21; setp.eq.u32 %r34, %r27, 69; selp.u32 %r37, 1, 0, %r31; selp.u32 %r38, 1, 0, %r34; and.b32 %value, %r37, %r38; This patch has been tested on nvptx-none hosted on x86_64-pc-linux-gnu (including newlib) with a make and make -k check with no new failures. gcc/ChangeLog: * config/nvptx/nvptx.md (cmp<mode>): Renamed from *cmp<mode>. (setcc<mode>_from_bi): Additionally support QImode. (extendbi<mode>2): Additionally support QImode. (zero_extendbi<mode>2): Additionally support QImode. (any_sbinary, any_ubinary, any_sunary, any_uunary): New code iterators for signed and unsigned, binary and unary operations. (<sbinary>qi3, <ubinary>qi3, <sunary>qi2, <uunary>qi2): New expanders to perform QImode operations using SImode instructions. (cstoreqi4): New define_expand. (*ext_truncsi2_qi): New define_insn. (*zext_truncsi2_qi): New define_insn. gcc/testsuite/ChangeLog: * gcc.target/nvptx/bool-1.c: New test case.
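Why it is safe to perform byte operations in a wider mode can be seen from a small C model: the low 8 bits of a sum do not depend on how wide the intermediate was. This is a sketch with hypothetical function names, not code from the backend.

```c
#include <assert.h>
#include <stdint.h>

/* Byte addition carried out in a 16-bit intermediate,
   mirroring the old widening to HImode.  */
uint8_t add_qi_via_hi(uint8_t a, uint8_t b)
{
    uint16_t wide = (uint16_t)((uint16_t)a + (uint16_t)b);
    return (uint8_t)wide;   /* keep the low 8 bits */
}

/* Byte addition carried out in a 32-bit intermediate,
   mirroring the new expansion in SImode.  */
uint8_t add_qi_via_si(uint8_t a, uint8_t b)
{
    uint32_t wide = (uint32_t)a + (uint32_t)b;
    return (uint8_t)wide;   /* keep the low 8 bits */
}

/* Exhaustive self-check: both widths agree for every pair of bytes.  */
void check_qi_widening(void)
{
    for (unsigned a = 0; a < 256; a++)
        for (unsigned b = 0; b < 256; b++)
            assert(add_qi_via_hi((uint8_t)a, (uint8_t)b)
                   == add_qi_via_si((uint8_t)a, (uint8_t)b));
}
```

Since the values already live in 32-bit registers, choosing the 32-bit intermediate avoids the conversions to and from 16-bit registers that the 15-instruction sequence shows.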
2022-02-10nvptx: Improved support for HFMode including neghf2 and abshf2Roger Sayle5-0/+117
This patch adds more support for _Float16 (HFmode) to the nvptx backend. Currently negation, absolute value and floating point comparisons are implemented by promoting to float (SFmode). This patch adds suitable define_insns to nvptx.md, most conditional on TARGET_SM53 (-misa=sm_53). This patch also adds support for HFmode fused multiply-add. One subtlety is that neghf2 and abshf2 are implemented by (HImode) bit manipulation operations to update the sign bit. The NVidia PTX ISA documentation for neg.f16 and abs.f16 contains the caution "Future implementations may comply with the IEEE 754 standard by preserving the (NaN) payload and modifying only the sign bit". Given the availability of suitable replacements, I thought it best to provide IEEE 754 compliant implementations. If anyone observes a performance penalty from this choice I'm happy to provide a -ffast-math variant (or revisit this decision). This patch has been tested on nvptx-none hosted on x86_64-pc-linux-gnu (including newlib) with a make and make -k check with no new failures. gcc/ChangeLog: * config/nvptx/nvptx.md (*cmpf): New define_insn. (cstorehf4): New define_expand. (fmahf4): New define_insn. (neghf2): New define_insn. (abshf2): New define_insn. gcc/testsuite/ChangeLog: * gcc.target/nvptx/float16-3.c: New test case for neghf2. * gcc.target/nvptx/float16-4.c: New test case for abshf2. * gcc.target/nvptx/float16-5.c: New test case for fmahf4. * gcc.target/nvptx/float16-6.c: New test case.
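The bit manipulation used for neghf2 and abshf2 can be sketched on the raw 16-bit encoding of an IEEE 754 binary16 value, where bit 15 is the sign. The helper names are hypothetical, and the code works on `uint16_t` bit patterns rather than `_Float16` so that it does not depend on compiler support for that type.

```c
#include <assert.h>
#include <stdint.h>

/* Sign bit of an IEEE 754 binary16 value.  */
#define HF_SIGN_BIT 0x8000u

/* Negation: flip only the sign bit, leaving exponent and
   mantissa (including any NaN payload) untouched.  */
uint16_t hf_neg_bits(uint16_t bits)
{
    return (uint16_t)(bits ^ HF_SIGN_BIT);
}

/* Absolute value: clear only the sign bit.  */
uint16_t hf_abs_bits(uint16_t bits)
{
    return (uint16_t)(bits & (uint16_t)~HF_SIGN_BIT);
}

/* Self-check: 0x3c00 encodes 1.0 and 0xbc00 encodes -1.0;
   0x7e01 is a quiet NaN with a non-zero payload.  */
void check_hf_sign_ops(void)
{
    assert(hf_neg_bits(0x3c00) == 0xbc00);
    assert(hf_abs_bits(0xbc00) == 0x3c00);
    assert(hf_neg_bits(0x7e01) == 0xfe01);  /* payload preserved */
    assert(hf_abs_bits(0xfe01) == 0x7e01);  /* payload preserved */
}
```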
2022-02-10doc: Tweak the www.bitwizard.nl referenceGerald Pfeifer1-1/+1
gcc: * doc/install.texi (Specific): Change the www.bitwizard.nl reference to use https.
2022-02-09C, C++, Fortran, OpenMP: Add 'has_device_addr' clause to 'target' construct.Marcel Vollweiler38-82/+961
This patch adds the 'has_device_addr' clause to the OpenMP 'target' construct which was introduced in OpenMP 5.1 (OpenMP API 5.1 specification pp. 197ff): has_device_addr(list) "The has_device_addr clause indicates that its list items already have device addresses and therefore they may be directly accessed from a target device. If the device address of a list item is not for the device on which the target region executes, accessing the list item inside the region results in unspecified behavior. The list items may include array sections." (p. 200) "A list item may not be specified in both an is_device_ptr clause and a has_device_addr clause on the directive." (p. 202) "A list item that appears in an is_device_ptr or a has_device_addr clause must not be specified in any data-sharing attribute clause on the same target construct." (p. 203) gcc/c-family/ChangeLog: * c-omp.cc (c_omp_split_clauses): Added OMP_CLAUSE_HAS_DEVICE_ADDR case. * c-pragma.h (enum pragma_kind): Added 5.1 in comment. (enum pragma_omp_clause): Added PRAGMA_OMP_CLAUSE_HAS_DEVICE_ADDR. gcc/c/ChangeLog: * c-parser.cc (c_parser_omp_clause_name): Parse 'has_device_addr' clause. (c_parser_omp_variable_list): Handle array sections. (c_parser_omp_clause_has_device_addr): Added. (c_parser_omp_all_clauses): Added PRAGMA_OMP_CLAUSE_HAS_DEVICE_ADDR case. (c_parser_omp_target_exit_data): Added HAS_DEVICE_ADDR to OMP_CLAUSE_MASK. * c-typeck.cc (handle_omp_array_sections): Handle clause restrictions. (c_finish_omp_clauses): Handle array sections. gcc/cp/ChangeLog: * parser.cc (cp_parser_omp_clause_name): Parse 'has_device_addr' clause. (cp_parser_omp_var_list_no_open): Handle array sections. (cp_parser_omp_all_clauses): Added PRAGMA_OMP_CLAUSE_HAS_DEVICE_ADDR case. (cp_parser_omp_target_update): Added HAS_DEVICE_ADDR to OMP_CLAUSE_MASK. * semantics.cc (handle_omp_array_sections): Handle clause restrictions. (finish_omp_clauses): Handle array sections. 
gcc/fortran/ChangeLog: * dump-parse-tree.cc (show_omp_clauses): Added OMP_LIST_HAS_DEVICE_ADDR case. * gfortran.h: Added OMP_LIST_HAS_DEVICE_ADDR. * openmp.cc (enum omp_mask2): Added OMP_CLAUSE_HAS_DEVICE_ADDR. (gfc_match_omp_clauses): Parse HAS_DEVICE_ADDR clause. (resolve_omp_clauses): Same. * trans-openmp.cc (gfc_trans_omp_variable_list): Added OMP_LIST_HAS_DEVICE_ADDR case. (gfc_trans_omp_clauses): Firstprivatize of array descriptors. gcc/ChangeLog: * gimplify.cc (gimplify_scan_omp_clauses): Added cases for OMP_CLAUSE_HAS_DEVICE_ADDR and handle array sections. (gimplify_adjust_omp_clauses): Added OMP_CLAUSE_HAS_DEVICE_ADDR case. * omp-low.cc (scan_sharing_clauses): Handle OMP_CLAUSE_HAS_DEVICE_ADDR. (lower_omp_target): Same. * tree-core.h (enum omp_clause_code): Same. * tree-nested.cc (convert_nonlocal_omp_clauses): Same. (convert_local_omp_clauses): Same. * tree-pretty-print.cc (dump_omp_clause): Same. * tree.cc: Same. libgomp/ChangeLog: * libgomp.texi: Updated entry for HAS_DEVICE_ADDR. * target.c (copy_firstprivate_data): Copy only if host address is not NULL. * testsuite/libgomp.c++/target-has-device-addr-2.C: New test. * testsuite/libgomp.c++/target-has-device-addr-4.C: New test. * testsuite/libgomp.c++/target-has-device-addr-5.C: New test. * testsuite/libgomp.c++/target-has-device-addr-6.C: New test. * testsuite/libgomp.c-c++-common/target-has-device-addr-1.c: New test. * testsuite/libgomp.c/target-has-device-addr-3.c: New test. * testsuite/libgomp.fortran/target-has-device-addr-1.f90: New test. * testsuite/libgomp.fortran/target-has-device-addr-2.f90: New test. * testsuite/libgomp.fortran/target-has-device-addr-3.f90: New test. * testsuite/libgomp.fortran/target-has-device-addr-4.f90: New test. gcc/testsuite/ChangeLog: * c-c++-common/gomp/clauses-1.c: Added has_device_addr to test cases. * g++.dg/gomp/attrs-1.C: Added has_device_addr to test cases. * g++.dg/gomp/attrs-2.C: Added has_device_addr to test cases. 
* c-c++-common/gomp/target-has-device-addr-1.c: New test. * c-c++-common/gomp/target-has-device-addr-2.c: New test. * c-c++-common/gomp/target-is-device-ptr-1.c: New test. * c-c++-common/gomp/target-is-device-ptr-2.c: New test. * gfortran.dg/gomp/is_device_ptr-3.f90: New test. * gfortran.dg/gomp/target-has-device-addr-1.f90: New test. * gfortran.dg/gomp/target-has-device-addr-2.f90: New test.
2022-02-09AutoFDO: Don't try to promote indirect calls that result in recursive direct callsEugene Rozenfeld2-16/+78
AutoFDO tries to promote and inline all indirect calls that were promoted and inlined in the original binary and that are still hot. In the included test case, the promotion results in a direct call that is a recursive call. inline_call and optimize_inline_calls can't handle recursive calls at this stage. Currently, inline_call fails with a segmentation fault. This change leaves the indirect call alone if promotion will result in a recursive call. Tested on x86_64-pc-linux-gnu. gcc/ChangeLog: * auto-profile.cc (afdo_indirect_call): Don't attempt to promote indirect calls that will result in direct recursive calls. gcc/testsuite/ChangeLog: * g++.dg/tree-prof/indir-call-recursive-inlining.C: New test.
2022-02-09[COMMITTED] Fix PR aarch64/104474: ICE with vector float initializers and non-consts.Andrew Pinski4-1/+28
The problem here is that the aarch64 back-end was placing const0_rtx into the constant vector RTL even if the mode was a floating point mode. The fix is instead to use CONST0_RTX and pass the mode to select the correct zero (either const_int or const_double). Committed as obvious after a bootstrap/test on aarch64-linux-gnu with no regressions. PR target/104474 gcc/ChangeLog: * config/aarch64/aarch64.cc (aarch64_sve_expand_vector_init_handle_trailing_constants): Use CONST0_RTX instead of const0_rtx for the non-constant elements. gcc/testsuite/ChangeLog: * gcc.target/aarch64/sve/pr104474-1.c: New test. * gcc.target/aarch64/sve/pr104474-2.c: New test. * gcc.target/aarch64/sve/pr104474-3.c: New test.
2022-02-10Daily bump.GCC Administrator11-1/+515
2022-02-09analyzer: more uninit test coverageDavid Malcolm2-0/+204
In addition to other test coverage, this adds the examples from https://cwe.mitre.org/data/definitions/457.html (aka "CWE-457: Use of Uninitialized Variable") For reference, the output from -fanalyzer looks like this (after stripping away the DejaGnu directives): uninit-CWE-457-examples.c: In function 'example_2_bad_code': uninit-CWE-457-examples.c:56:3: warning: use of uninitialized value 'bN' [CWE-457] [-Wanalyzer-use-of-uninitialized-value] 56 | repaint(aN, bN); /* { dg-warning "use of uninitialized value 'bN'" } */ | ^~~~~~~~~~~~~~~ 'example_2_bad_code': events 1-4 | | 34 | int aN, bN; | | ^~ | | | | | (1) region created on stack here | 35 | switch (ctl) { | | ~~~~~~ | | | | | (2) following 'default:' branch... |...... | 51 | default: | | ~~~~~~~ | | | | | (3) ...to here |...... | 56 | repaint(aN, bN); | | ~~~~~~~~~~~~~~~ | | | | | (4) use of uninitialized value 'bN' here | uninit-CWE-457-examples.c: In function 'example_3_bad_code': uninit-CWE-457-examples.c:95:3: warning: use of uninitialized value 'test_string' [CWE-457] [-Wanalyzer-use-of-uninitialized-value] 95 | printf("%s", test_string); | ^~~~~~~~~~~~~~~~~~~~~~~~~ 'example_3_bad_code': events 1-4 | | 90 | char *test_string; | | ^~~~~~~~~~~ | | | | | (1) region created on stack here | 91 | if (i != err_val) | | ~ | | | | | (2) following 'false' branch (when 'i == err_val')... |...... | 95 | printf("%s", test_string); | | ~~~~~~~~~~~~~~~~~~~~~~~~~ | | | | | (3) ...to here | | (4) use of uninitialized value 'test_string' here | gcc/testsuite/ChangeLog: * gcc.dg/analyzer/uninit-1.c: Add test coverage for shifts, comparisons, +, -, *, /, and __builtin_strlen. * gcc.dg/analyzer/uninit-CWE-457-examples.c: New test. Signed-off-by: David Malcolm <dmalcolm@redhat.com>
2022-02-09compiler: don't warn for print()Ian Lance Taylor2-11/+2
We used to warn for calls to print(), because it doesn't do anything. However, a Go 1.18 test uses that call, and it is valid Go. Change the compiler to just accept it and compile it; this will produce calls to printlock and printunlock, and nothing else. Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/384355
2022-02-09compiler: use nil pointer for zero length string constantIan Lance Taylor2-4/+10
We used to pointlessly set the pointer of a zero length string constant to point to a zero byte constant. Instead, just use nil. Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/384354
2022-02-09compiler: treat notinheap types as not being pointersIan Lance Taylor4-8/+31
By definition, a type marked notinheap doesn't contain any pointers that the garbage collector cares about, and neither does a pointer to such a type. Change the type descriptors to consistently treat such types as not being pointers, by setting ptrdata to 0 and gcdata to nil. Change-Id: Id8466555ec493456ff5ff09f1670551414619bd2 Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/384118 Trust: Ian Lance Taylor <iant@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com>
2022-02-09Fortran: try simplifications during reductions of array constructorsHarald Anlauf2-6/+85
gcc/fortran/ChangeLog: PR fortran/66193 * arith.cc (reduce_binary_ac): When reducing binary expressions, try simplification. Handle case of empty constructor. (reduce_binary_ca): Likewise. gcc/testsuite/ChangeLog: PR fortran/66193 * gfortran.dg/array_constructor_55.f90: New test.
2022-02-09gccgo: link static libgo against -lrt on GNU/LinuxIan Lance Taylor5-7/+60
The upcoming Go 1.18 release requires linking against -lrt on GNU/Linux (only) in order to call timer_create and friends. Also change gotools to link the runtime test against -lrt. * gospec.cc (RTLIB, RT_LIBRARY): Define. (lang_specific_driver): Add -lrt if linking statically on GNU/Linux. * configure.ac (RT_LIBS): Define. * Makefile.am (check-runtime): Set GOLIBS to $(RT_LIBS). * configure, Makefile.in: Regenerate.
2022-02-09libstdc++: Fix deadlock in atomic wait [PR104442]Thomas Rodgers1-4/+3
This issue was observed as a deadlock in 29_atomics/atomic/wait_notify/100334.cc on VxWorks. When a wait is "laundered" (e.g. type T* does not suffice as a waitable address for the platform's native waiting primitive), the address waited is that of the _M_ver member of __waiter_pool_base, so several threads may wait on the same address for unrelated atomic<T> objects. As noted in the PR, __waiter::_M_do_wait_v correctly exits the wait for the thread whose data changed, but the other threads waiting on the same address were not reloading the value of _M_ver before re-entering the wait. Moving the spin call inside the loop accomplishes this, and is consistent with the predicate-accepting version of __waiter::_M_do_wait. libstdc++-v3/ChangeLog: PR libstdc++/104442 * include/bits/atomic_wait.h (__waiter::_M_do_wait_v): Move spin loop inside do loop so that threads failing the wait reload _M_ver.
2022-02-09testsuite: AIX fixesDavid Edelsohn2-2/+3
gcc/testsuite/ChangeLog: * gcc.dg/Wstringop-overflow-69.c: Add -Wno-psabi. * gcc.dg/loop-unswitch-6.c: Omit -fcompare-debug on AIX.
2022-02-09x86: Compile PR target/104441 tests with -march=x86-64H.J. Lu2-2/+2
Compile PR target/104441 tests with -march=x86-64 to fix test failures when GCC is configured with --with-arch=native --with-cpu=native. PR target/104441 * gcc.target/i386/pr104441-1a.c: Compile with -march=x86-64. * gcc.target/i386/pr104441-1b.c: Likewise.
2022-02-09c: Fix up __builtin_assoc_barrier handling in the C FE [PR104427]Jakub Jelinek4-2/+19
The following testcase ICEs, because when creating PAREN_EXPR for __builtin_assoc_barrier the FE doesn't do the usual tweaks for EXCESS_PRECISION_EXPR or C_MAYBE_CONST_EXPR. I believe that the declared effect of the builtin is just an association barrier, so e.g. excess precision should still be handled as if it weren't there. The following patch uses build_unary_op to handle those. 2022-02-09 Jakub Jelinek <jakub@redhat.com> PR c/104427 * c-parser.cc (c_parser_postfix_expression) <case RID_BUILTIN_ASSOC_BARRIER>: Use parser_build_unary_op instead of build1_loc to build PAREN_EXPR. * c-typeck.cc (build_unary_op): Handle PAREN_EXPR. * c-fold.cc (c_fully_fold_internal): Likewise. * gcc.dg/pr104427.c: New test.
2022-02-09i386: -mno-xsave should disable all relevant ISA flags [PR104462]Uros Bizjak2-1/+15
2022-02-09 Uroš Bizjak <ubizjak@gmail.com> gcc/ChangeLog: PR target/104462 * common/config/i386/i386-common.cc (OPTION_MASK_ISA2_XSAVE_UNSET): Also include OPTION_MASK_ISA2_AVX2_UNSET. gcc/testsuite/ChangeLog: PR target/104462 * gcc.target/i386/pr104462.c: New test.
2022-02-09i386: Force inputs to a register to avoid lowpart_subreg failure [PR104458]Uros Bizjak2-0/+16
Input operands can be in the form of: (subreg:DI (reg:V2SF 96) 0) which chokes lowpart_subreg. Force inputs to a register, which is preferable even when the input operand is from memory. 2022-02-09 Uroš Bizjak <ubizjak@gmail.com> gcc/ChangeLog: PR target/104458 * config/i386/i386-expand.cc (ix86_split_idivmod): Force operands[2] and operands[3] into a register. gcc/testsuite/ChangeLog: PR target/104458 * gcc.target/i386/pr104458.c: New test.
2022-02-09Avoid using predefined insn name for instruction with different semanticsJeff Law1-2/+7
This isn't technically a regression, but it only impacts the v850 target and fixes a long-standing code correctness issue. As outlined in slightly more detail in the PR, the v850 is using the pattern names "fnmasf4" and "fnmssf4" to generate fnmaf.s and fnmsf.s instructions respectively. Unfortunately fnmasf4 is expected to produce (-a * b) + c and fnmssf4 (-a * b) - c. Those v850 instructions actually negate the entire result. The fix is trivial. Use a different pattern name so that the combiner can still generate those instructions, but prevent those instructions from being used to implement GCC's notion of what fnmas and fnmss should be. This fixes pr97040 as well as a handful of testsuite failures for the v3e5 multilib. gcc/ PR target/97040 * config/v850/v850.md (*v850_fnmasf4): Renamed from fnmasf4. (*v850_fnmssf4): Renamed from fnmssf4.
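The mismatch described above is easy to see with concrete numbers. The sketch below models both readings in C, following the description in the commit message; the function names are made up, and the "v850" functions are a model of "negate the entire result", not a claim about the hardware beyond what the message states.

```c
#include <assert.h>

/* What GCC expects from the fnma pattern: (-a * b) + c.  */
float fnma_expected(float a, float b, float c)
{
    return (-a * b) + c;
}

/* Model of fnmaf.s as described: the whole result is negated.  */
float v850_fnmaf_model(float a, float b, float c)
{
    return -((a * b) + c);
}

/* What GCC expects from the fnms pattern: (-a * b) - c.  */
float fnms_expected(float a, float b, float c)
{
    return (-a * b) - c;
}

/* Model of fnmsf.s as described: the whole result is negated.  */
float v850_fnmsf_model(float a, float b, float c)
{
    return -((a * b) - c);
}

/* Self-check with a = 2, b = 3, c = 1; all values are exact in float.  */
void check_fnma_mismatch(void)
{
    assert(fnma_expected(2.0f, 3.0f, 1.0f) == -5.0f);
    assert(v850_fnmaf_model(2.0f, 3.0f, 1.0f) == -7.0f);
    assert(fnms_expected(2.0f, 3.0f, 1.0f) == -7.0f);
    assert(v850_fnmsf_model(2.0f, 3.0f, 1.0f) == -5.0f);
}
```

With these inputs the two readings differ by the sign of c, so using the instruction under the standard pattern name would silently produce wrong results whenever c is non-zero.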
2022-02-09-fgo-dump-spec: really name alignment field "_"Ian Lance Taylor2-35/+34
* godump.cc (go_force_record_alignment): Really name the alignment field "_" (complete 2021-12-29 change). * gcc.misc-tests/godump-1.c: Adjust for alignment field rename.
2022-02-09rs6000: Correct function prototypes for vec_replace_unalignedBill Schmidt4-35/+38
Due to a pasto error in the documentation, vec_replace_unaligned was implemented with the same function prototypes as vec_replace_elt. It was intended that vec_replace_unaligned always specify output vectors as having type vector unsigned char, to emphasize that elements are potentially misaligned by this built-in function. This patch corrects the misimplementation. 2022-02-04 Bill Schmidt <wschmidt@linux.ibm.com> gcc/ * config/rs6000/rs6000-builtins.def (VREPLACE_UN_UV2DI): Change function prototype. (VREPLACE_UN_UV4SI): Likewise. (VREPLACE_UN_V2DF): Likewise. (VREPLACE_UN_V2DI): Likewise. (VREPLACE_UN_V4SF): Likewise. (VREPLACE_UN_V4SI): Likewise. * config/rs6000/rs6000-overload.def (VEC_REPLACE_UN): Change all function prototypes. * config/rs6000/vsx.md (vreplace_un_<mode>): Remove define_expand. (vreplace_un_<mode>): New define_insn. gcc/testsuite/ * gcc.target/powerpc/vec-replace-word-runnable.c: Handle expected prototypes for each call to vec_replace_unaligned.
2022-02-09aarch64: Extend vec_concat patterns to 8-byte vectorsRichard Sandiford8-42/+430
This patch extends the previous support for 16-byte vec_concat so that it supports pairs of 4-byte elements. This too isn't strictly a regression fix, since the 8-byte forms weren't affected by the same problems as the 16-byte forms, but it leaves things in a more consistent state. gcc/ * config/aarch64/iterators.md (VDCSIF): New mode iterator. (VDBL): Handle SF. (single_wx, single_type, single_dtype, dblq): New mode attributes. * config/aarch64/aarch64-simd.md (load_pair_lanes<mode>): Extend from VDC to VDCSIF. (store_pair_lanes<mode>): Likewise. (*aarch64_combine_internal<mode>): Likewise. (*aarch64_combine_internal_be<mode>): Likewise. (*aarch64_combinez<mode>): Likewise. (*aarch64_combinez_be<mode>): Likewise. * config/aarch64/aarch64.cc (aarch64_classify_address): Handle 8-byte modes for ADDR_QUERY_LDP_STP_N. (aarch64_print_operand): Likewise for %y. gcc/testsuite/ * gcc.target/aarch64/vec-init-13.c: New test. * gcc.target/aarch64/vec-init-14.c: Likewise. * gcc.target/aarch64/vec-init-15.c: Likewise. * gcc.target/aarch64/vec-init-16.c: Likewise. * gcc.target/aarch64/vec-init-17.c: Likewise.
2022-02-09aarch64: Remove move_lo/hi_quad expandersRichard Sandiford1-93/+18
This patch is the second of two to remove the old move_lo/hi_quad expanders and move_hi_quad insns. gcc/ * config/aarch64/aarch64-simd.md (@aarch64_split_simd_mov<mode>): Use aarch64_combine instead of move_lo/hi_quad. Tabify. (move_lo_quad_<mode>, aarch64_simd_move_hi_quad_<mode>): Delete. (aarch64_simd_move_hi_quad_be_<mode>, move_hi_quad_<mode>): Delete. (vec_pack_trunc_<mode>): Take general_operand elements and use aarch64_combine rather than move_lo/hi_quad to combine them. (vec_pack_trunc_df): Likewise.
2022-02-09aarch64: Add a general vec_concat expanderRichard Sandiford4-76/+122
After previous patches, we have a (mostly new) group of vec_concat patterns as well as vestiges of the old move_lo/hi_quad patterns. (A previous patch removed the move_lo_quad insns, but we still have the move_hi_quad insns and both sets of expanders.) This patch is the first of two to remove the old move_lo/hi_quad stuff. It isn't technically a regression fix, but it seemed better to make the changes now rather than leave things in a half-finished and inconsistent state. This patch defines an aarch64_vec_concat expander that coerces the element operands into a valid form, including the ones added by the previous patch. This in turn lets us get rid of one move_lo/hi_quad pair. As a side-effect, it also means that vcombines of 2 vectors make better use of the available forms, like vec_inits of 2 scalars already do. gcc/ * config/aarch64/aarch64-protos.h (aarch64_split_simd_combine): Delete. * config/aarch64/aarch64-simd.md (@aarch64_combinez<mode>): Rename to... (*aarch64_combinez<mode>): ...this. (@aarch64_combinez_be<mode>): Rename to... (*aarch64_combinez_be<mode>): ...this. (@aarch64_vec_concat<mode>): New expander. (aarch64_combine<mode>): Use it. (@aarch64_simd_combine<mode>): Delete. * config/aarch64/aarch64.cc (aarch64_split_simd_combine): Delete. (aarch64_expand_vector_init): Use aarch64_vec_concat. gcc/testsuite/ * gcc.target/aarch64/vec-init-12.c: New test.
2022-02-09aarch64: Add more vec_combine patternsRichard Sandiford5-0/+360
vec_combine is really one instruction on aarch64, provided that the lowpart element is in the same register as the destination vector. This patch adds patterns for that. The patch fixes a regression from GCC 8. Before the patch: int64x2_t s64q_1(int64_t a0, int64_t a1) { if (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__) return (int64x2_t) { a1, a0 }; else return (int64x2_t) { a0, a1 }; } generated: fmov d0, x0 ins v0.d[1], x1 ins v0.d[1], x1 ret whereas GCC 8 generated the more respectable: dup v0.2d, x0 ins v0.d[1], x1 ret gcc/ * config/aarch64/predicates.md (aarch64_reg_or_mem_pair_operand): New predicate. * config/aarch64/aarch64-simd.md (*aarch64_combine_internal<mode>) (*aarch64_combine_internal_be<mode>): New patterns. gcc/testsuite/ * gcc.target/aarch64/vec-init-9.c: New test. * gcc.target/aarch64/vec-init-10.c: Likewise. * gcc.target/aarch64/vec-init-11.c: Likewise.
2022-02-09aarch64: Remove redundant vec_concat patternsRichard Sandiford2-35/+17
move_lo_quad_internal_<mode> and move_lo_quad_internal_be_<mode> partially duplicate the later aarch64_combinez{,_be}<mode> patterns. The duplication itself is a regression. The only substantive differences between the two are: * combinez uses vector MOV (ORR) instead of element MOV (DUP). The former seems more likely to be handled via renaming. * combinez disparages the GPR->FPR alternative whereas move_lo_quad gave it equal cost. The new test gives a token example of when the combinez behaviour helps. gcc/ * config/aarch64/aarch64-simd.md (move_lo_quad_internal_<mode>) (move_lo_quad_internal_be_<mode>): Delete. (move_lo_quad_<mode>): Use aarch64_combine<Vhalf> instead of the above. gcc/testsuite/ * gcc.target/aarch64/vec-init-8.c: New test.
2022-02-09aarch64: Generalise adjacency check for load_pair_lanesRichard Sandiford5-24/+62
This patch generalises the load_pair_lanes<mode> guard so that it uses aarch64_check_consecutive_mems to check for consecutive mems. It also allows the pattern to be used for STRICT_ALIGNMENT targets if the alignment is high enough. The main aim is to avoid an inline test, for the sake of a later patch that needs to repeat it. Reusing aarch64_check_consecutive_mems seemed simpler than writing an entirely new function. gcc/ * config/aarch64/aarch64-protos.h (aarch64_mergeable_load_pair_p): Declare. * config/aarch64/aarch64-simd.md (load_pair_lanes<mode>): Use aarch64_mergeable_load_pair_p instead of inline check. * config/aarch64/aarch64.cc (aarch64_expand_vector_init): Likewise. (aarch64_check_consecutive_mems): Allow the reversed parameter to be null. (aarch64_mergeable_load_pair_p): New function.
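As a rough illustration of the kind of source the adjacency check is about, the sketch below builds a two-lane vector from two elements that sit next to each other in memory. It uses the GNU C vector extension; the type name v2di and the function name load_adjacent_pair are made up for this example, and whether a given compiler version actually merges the two loads into one is not something the sketch guarantees.

```c
#include <stdint.h>

/* GNU C vector extension: a 128-bit vector holding two 64-bit lanes.
   The typedef name is hypothetical, chosen for this example only.  */
typedef int64_t v2di __attribute__ ((vector_size (16)));

/* Build a vector from p[0] and p[1].  The two elements are consecutive
   in memory, which is the situation a "consecutive mems" check is
   meant to recognise.  */
v2di
load_adjacent_pair (const int64_t *p)
{
  return (v2di) { p[0], p[1] };
}
```

Calling load_adjacent_pair on a two-element array yields a vector whose lanes hold those two elements in order.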
2022-02-09aarch64: Generalise vec_set predicateRichard Sandiford1-1/+1
The aarch64_simd_vec_set<mode> define_insn takes memory operands, so this patch makes the vec_set<mode> optab expander do the same. gcc/ * config/aarch64/aarch64-simd.md (vec_set<mode>): Allow the element to be an aarch64_simd_nonimmediate_operand.
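A minimal sketch of source code that sets one vector lane from a value held in memory, which is the case a memory-accepting element operand is relevant to. It relies on the GNU C vector extension; the names v4si and set_lane1_from_memory are hypothetical, and the instruction actually selected depends on the target and compiler version.

```c
#include <stdint.h>

/* GNU C vector extension: four 32-bit lanes in a 128-bit vector.
   The typedef name is hypothetical, chosen for this example only.  */
typedef int32_t v4si __attribute__ ((vector_size (16)));

/* Replace lane 1 of V with the value stored at P and return the
   updated vector.  The new element comes straight from memory.  */
v4si
set_lane1_from_memory (v4si v, const int32_t *p)
{
  v[1] = *p;
  return v;
}
```

Only lane 1 changes; the other three lanes keep their previous values.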
2022-02-09aarch64: Tighten general_operand predicatesRichard Sandiford1-3/+3
This patch fixes some cases in which *general_operand was used instead of *nonimmediate_operand by patterns that don't accept immediates. This avoids some complications with later patches. gcc/ * config/aarch64/aarch64-simd.md (aarch64_simd_vec_set<mode>): Use aarch64_simd_nonimmediate_operand instead of aarch64_simd_general_operand. (@aarch64_combinez<mode>): Use nonimmediate_operand instead of general_operand. (@aarch64_combinez_be<mode>): Likewise.
2022-02-09c++: memfn lookup consistency and using-decls [PR104432]Patrick Palka5-31/+73
In filter_memfn_lookup, we weren't correctly recognizing and matching up member functions introduced via a non-dependent using-decl. This caused us to crash in the below testcases in which we correctly pruned the overload set for the non-dependent call ahead of time, but then at instantiation time filter_memfn_lookup failed to match the selected function (introduced in each case by a non-dependent using-decl) to the corresponding function from the new lookup set. Such member functions need special handling in filter_memfn_lookup because they look exactly the same in the old and new lookup sets, whereas ordinary member functions that're defined in the (dependent) current class become more specialized in the new lookup set. This patch reworks the matching logic in filter_memfn_lookup so that it handles (member functions introduced by) non-dependent using-decls correctly, and is hopefully simpler overall. PR c++/104432 gcc/cp/ChangeLog: * call.cc (build_new_method_call): When a non-dependent call resolves to a specialization of a member template, always build the pruned overload set using the member template, not the specialization. * pt.cc (filter_memfn_lookup): New parameter newtype. Simplify and correct how members from the new lookup set are matched to those from the old one. (tsubst_baselink): Pass binfo_type as newtype to filter_memfn_lookup. gcc/testsuite/ChangeLog: * g++.dg/template/non-dependent19.C: New test. * g++.dg/template/non-dependent19a.C: New test. * g++.dg/template/non-dependent20.C: New test.
2022-02-09c++: modules and explicit(bool) [PR103752]Jason Merrill5-2/+48
We weren't streaming a C++20 dependent explicit-specifier. PR c++/103752 gcc/cp/ChangeLog: * module.cc (trees_out::core_vals): Stream explicit specifier. (trees_in::core_vals): Likewise. * pt.cc (store_explicit_specifier): No longer static. (tsubst_function_decl): Clear DECL_HAS_DEPENDENT_EXPLICIT_SPEC_P. * cp-tree.h (lookup_explicit_specifier): Declare. gcc/testsuite/ChangeLog: * g++.dg/modules/explicit-bool-1_b.C: New test. * g++.dg/modules/explicit-bool-1_a.H: New test.
2022-02-09middle-end/104464 - ISEL and non-call EH #2Richard Biener2-14/+25
The following adjusts the earlier change to still allow an uncritical replacement. 2022-02-09 Richard Biener <rguenther@suse.de> PR middle-end/104464 * gimple-isel.cc (gimple_expand_vec_cond_expr): Postpone throwing check to after unproblematic replacement. * gcc.dg/pr104464.c: New testcase.
2022-02-09c++: P2493 feature test macro updatesJason Merrill3-8/+8
The C++ committee just updated the values of these macros to reflect some late C++20 papers that we implement but others don't yet; see PR103891. gcc/c-family/ChangeLog: * c-cppbuiltin.cc (c_cpp_builtins): Update values of __cpp_constexpr and __cpp_concepts for C++20. gcc/testsuite/ChangeLog: * g++.dg/cpp23/feat-cxx2b.C: Adjust. * g++.dg/cpp2a/feat-cxx2a.C: Adjust.
2022-02-09[PATCH] PR tree-optimization/104420: Fix checks for constant folding X*0.0Roger Sayle6-8/+41
This patch resolves PR tree-optimization/104420, which is a P1 regression where, as observed by Jakub Jelinek, the conditions for constant folding x*0.0 are incorrect (following my patch for PR tree-optimization/96392). The multiplication x*0.0 may yield a negative zero result, -0.0, if X is negative (not just if x may be negative zero). Hence (without -ffast-math) (int)x*0.0 can't be optimized to 0.0, but (unsigned)x*0.0 can be constant folded. This adds a bunch of test cases to confirm the desired behaviour, and removes an incorrect test from gcc.dg/pr96392.c which checked for the wrong behaviour. 2022-02-09 Roger Sayle <roger@nextmovesoftware.com> gcc/ChangeLog PR tree-optimization/104420 * match.pd (mult @0 real_zerop): Tweak conditions for constant folding X*0.0 (or X*-0.0) to HONOR_SIGNED_ZEROS when appropriate. gcc/testsuite/ChangeLog PR tree-optimization/104420 * gcc.dg/pr104420-1.c: New test case. * gcc.dg/pr104420-2.c: New test case. * gcc.dg/pr104420-3.c: New test case. * gcc.dg/pr104420-4.c: New test case. * gcc.dg/pr96392.c: Remove incorrect test.
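The sign behaviour described above can be checked with a small C sketch. It assumes IEEE 754 arithmetic and a build without -ffast-math; the function names are invented for this example.

```c
#include <math.h>

/* Multiply a signed integer by 0.0.  For negative x the IEEE 754
   result is -0.0, so this cannot be folded to +0.0 in general.  */
double
mul_zero_signed (int x)
{
  return x * 0.0;
}

/* Multiply an unsigned integer by 0.0.  The converted operand is
   never negative, so the result is always +0.0.  */
double
mul_zero_unsigned (unsigned x)
{
  return x * 0.0;
}

/* Return nonzero if D is a zero with its sign bit set.  */
int
is_negative_zero (double d)
{
  return d == 0.0 && signbit (d) != 0;
}
```

Here mul_zero_signed (-1) produces a negative zero, while mul_zero_signed (1) and any call to mul_zero_unsigned produce a positive zero. The two zeros compare equal with ==, which is why the check goes through signbit.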
2022-02-09dwarf2out: Don't call expand_expr during early_dwarf [PR104407]Jakub Jelinek2-7/+49
As mentioned in the PR, since PR96690 r11-2834 we call rtl_for_decl_init which can call expand_expr already during early_dwarf. The comment and the PR explain that the intent is to ensure the referenced vars and functions are properly mangled because free_lang_data doesn't cover everything, like template parameters etc. It doesn't work well though, because expand_expr can set DECL_RTLs e.g. on referenced vars and keep them there, and they can be created e.g. with different MEM_ALIGN compared to what they would be created with if they were emitted later. So, the following patch stops calling rtl_for_decl_init and instead, for cases for which rtl_for_decl_init does anything at all, walks the initializer and ensures referenced vars or functions are mangled. 2022-02-09 Jakub Jelinek <jakub@redhat.com> PR debug/104407 * dwarf2out.cc (mangle_referenced_decls): New function. (tree_add_const_value_attribute): Don't call rtl_for_decl_init if early_dwarf. Instead walk the initializer and try to mangle vars or functions referenced from it. * g++.dg/debug/dwarf2/pr104407.C: New test.
2022-02-09Register non-null side effects properly.Andrew MacLeod7-42/+187
This patch adjusts uses of nonnull to accurately reflect "somewhere in block". It also adds the ability to register statement side effects within a block for ranger which will apply for the rest of the block. PR tree-optimization/104288 gcc/ * gimple-range-cache.cc (non_null_ref::set_nonnull): New. (non_null_ref::adjust_range): Move to header. (ranger_cache::range_of_def): Don't check non-null. (ranger_cache::entry_range): Don't check non-null. (ranger_cache::range_on_edge): Check for nonnull on normal edges. (ranger_cache::update_to_nonnull): New. (non_null_loadstore): New. (ranger_cache::block_apply_nonnull): New. * gimple-range-cache.h (class non_null_ref): Update prototypes. (non_null_ref::adjust_range): Move to here and inline. (class ranger_cache): Update prototypes. * gimple-range-path.cc (path_range_query::range_defined_in_block): Do not search dominators. (path_range_query::adjust_for_non_null_uses): Ditto. * gimple-range.cc (gimple_ranger::range_of_expr): Check on-entry for def overrides. Do not check nonnull. (gimple_ranger::range_on_entry): Check dominators for nonnull. (gimple_ranger::range_on_edge): Check for nonnull on normal edges. (gimple_ranger::register_side_effects): New. * gimple-range.h (gimple_ranger::register_side_effects): New. * tree-vrp.cc (rvrp_folder::fold_stmt): Call register_side_effects. gcc/testsuite/ * gcc.dg/pr104288.c: New.
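To make the idea of a statement side effect concrete, here is a small C sketch in which a dereference implies that the pointer is non-null for the rest of the block. The function name is hypothetical, and the sketch is not taken from the new test case; whether a particular compiler folds the later comparison is an assumption about its optimizer, not something the code asserts.

```c
#include <stddef.h>

/* The load through P can only execute without undefined behaviour if
   P is non-null.  After that statement, a range analysis may treat P
   as non-null for the remainder of the block, so the comparison below
   can be considered always false.  */
int
load_then_check (int *p)
{
  int v = *p;
  if (p == NULL)
    return -1;
  return v;
}
```

For any valid pointer the function simply returns the pointed-to value; the early return is only reachable on paths that have already invoked undefined behaviour.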
2022-02-09tree-optimization/104445 - check for vector extraction supportRichard Biener5-6/+67
This adds a missing check to epilogue reduction re-use, namely that we can do hi/lo extracts from the vector when demoting it to the epilogue vector size. I've chosen to add a can_vec_extract helper to optabs-query.h. In the future we might want to simplify the vectorizer's life by handling vector-from-vector extraction via BIT_FIELD_REFs during RTL expansion via the mode punning when the vec_extract is not directly supported. I'm not 100% sure we can always do the punning of the vec_extract result to a vector mode of the same size, but then I'm also not sure how to check for that (the vectorizer doesn't in other places it does that at the moment, but I suppose we eventually just go through memory there)? 2022-02-09 Richard Biener <rguenther@suse.de> PR tree-optimization/104445 PR tree-optimization/102832 * optabs-query.h (can_vec_extract): New. * optabs-query.cc (can_vec_extract): Likewise. * tree-vect-loop.cc (vect_find_reusable_accumulator): Check we can extract a hi/lo part from the larger vector, rework check iteration from larger to smaller sizes. * gcc.dg/vect/pr104445.c: New testcase.
2022-02-09x86: Add -m[no-]direct-extern-accessH.J. Lu32-17/+655
Add -m[no-]direct-extern-access and nodirect_extern_access attribute. -mdirect-extern-access is the default. For undefined data and function symbols that carry the nodirect_extern_access attribute, GOT is always used for access, including in PIE and non-PIE. With -mno-direct-extern-access: 1. Always use GOT to access undefined data and function symbols, including in PIE and non-PIE. This avoids copy relocations in executables and is compatible with existing executables and shared libraries. 2. In executables and shared libraries, bind symbols with the STV_PROTECTED visibility locally: a. The address of a data symbol is the address of the data body. b. For systems without function descriptors, the function pointer is the address of the function body. c. The resulting shared libraries may be incompatible with executables which have copy relocations on protected symbols or use executable PLT entries as function addresses for protected functions in shared libraries. 3. Update asm_preferred_eh_data_format to select PC relative EH encoding format with -mno-direct-extern-access to avoid copy relocation. 4. Add ix86_reloc_rw_mask for TARGET_ASM_RELOC_RW_MASK to avoid copy relocation with -mno-direct-extern-access. gcc/ PR target/35513 PR target/100593 * config/i386/gnu-property.cc: Include "i386-protos.h". (file_end_indicate_exec_stack_and_gnu_property): Generate a GNU_PROPERTY_1_NEEDED note for -mno-direct-extern-access or nodirect_extern_access attribute. * config/i386/i386-options.cc (handle_nodirect_extern_access_attribute): New function. (ix86_attribute_table): Add nodirect_extern_access attribute. * config/i386/i386-protos.h (ix86_force_load_from_GOT_p): Add a bool argument. (ix86_has_no_direct_extern_access): New. * config/i386/i386.cc (ix86_has_no_direct_extern_access): New. (ix86_force_load_from_GOT_p): Add a bool argument to indicate call operand. Force non-call load from GOT for -mno-direct-extern-access or nodirect_extern_access attribute. 
(legitimate_pic_address_disp_p): Avoid copy relocation in PIE for -mno-direct-extern-access or nodirect_extern_access attribute. (ix86_print_operand): Pass true to ix86_force_load_from_GOT_p for call operand. (asm_preferred_eh_data_format): Use PC-relative format for -mno-direct-extern-access to avoid copy relocation. Check ptr_mode instead of TARGET_64BIT when selecting DW_EH_PE_sdata4. (ix86_binds_local_p): Set ix86_has_no_direct_extern_access to true for -mno-direct-extern-access or nodirect_extern_access attribute. Don't treat protected data as extern and avoid copy relocation on common symbol with -mno-direct-extern-access or nodirect_extern_access attribute. (ix86_reloc_rw_mask): New to avoid copy relocation for -mno-direct-extern-access. (TARGET_ASM_RELOC_RW_MASK): New. * config/i386/i386.opt: Add -mdirect-extern-access. * doc/extend.texi: Document nodirect_extern_access attribute. * doc/invoke.texi: Document -m[no-]direct-extern-access. gcc/testsuite/ PR target/35513 PR target/100593 * g++.target/i386/pr35513-1.C: New file. * g++.target/i386/pr35513-2.C: Likewise. * gcc.target/i386/pr35513-1a.c: Likewise. * gcc.target/i386/pr35513-1b.c: Likewise. * gcc.target/i386/pr35513-2a.c: Likewise. * gcc.target/i386/pr35513-2b.c: Likewise. * gcc.target/i386/pr35513-3a.c: Likewise. * gcc.target/i386/pr35513-3b.c: Likewise. * gcc.target/i386/pr35513-4a.c: Likewise. * gcc.target/i386/pr35513-4b.c: Likewise. * gcc.target/i386/pr35513-5a.c: Likewise. * gcc.target/i386/pr35513-5b.c: Likewise. * gcc.target/i386/pr35513-6a.c: Likewise. * gcc.target/i386/pr35513-6b.c: Likewise. * gcc.target/i386/pr35513-7a.c: Likewise. * gcc.target/i386/pr35513-7b.c: Likewise. * gcc.target/i386/pr35513-8.c: Likewise. * gcc.target/i386/pr35513-9a.c: Likewise. * gcc.target/i386/pr35513-9b.c: Likewise. * gcc.target/i386/pr35513-10a.c: Likewise. * gcc.target/i386/pr35513-10b.c: Likewise. * gcc.target/i386/pr35513-11a.c: Likewise. * gcc.target/i386/pr35513-11b.c: Likewise. 
* gcc.target/i386/pr35513-12a.c: Likewise. * gcc.target/i386/pr35513-12b.c: Likewise.
2022-02-09x86: Check each component of source operand for AVX_U128_DIRTYH.J. Lu3-66/+168
commit 9775e465c1fbfc32656de77c618c61acf5bd905d Author: H.J. Lu <hjl.tools@gmail.com> Date: Tue Jul 27 07:46:04 2021 -0700 x86: Don't set AVX_U128_DIRTY when zeroing YMM/ZMM register called ix86_check_avx_upper_register to check mode on source operand. But ix86_check_avx_upper_register doesn't work on source operand like (vec_select:V2DI (reg/v:V4DI 23 xmm3 [orig:91 ymm ] [91]) (parallel [ (const_int 2 [0x2]) (const_int 3 [0x3]) ])) Add ix86_avx_u128_mode_source to check mode for each component of source operand. gcc/ PR target/104441 * config/i386/i386.cc (ix86_avx_u128_mode_source): New function. (ix86_avx_u128_mode_needed): Return AVX_U128_ANY for debug INSN. Call ix86_avx_u128_mode_source to check mode for each component of source operand. gcc/testsuite/ PR target/104441 * gcc.target/i386/pr104441-1a.c: New test. * gcc.target/i386/pr104441-1b.c: Likewise.
2022-02-09ICE: QImode(not SImode) operand should be passed to gen_vec_initv16qiqi in ashlv16qi3.liuhongt2-1/+27
ix86_expand_vector_init expects vals to be a parallel containing values of individual fields which should be either element mode of the vector mode, or a vector mode with the same element mode and smaller number of elements. But in the expander ashlv16qi3, the second operand is SImode which can't be directly passed to gen_vec_initv16qiqi. gcc/ChangeLog: PR target/104451 * config/i386/sse.md (<insn><mode>3): lowpart_subreg operands[2] from SImode to QImode. gcc/testsuite/ChangeLog: PR target/104451 * gcc.target/i386/pr104451.c: New test.
2022-02-09middle-end/104450 - ISEL and non-call EHRichard Biener2-10/+30
The following avoids merging a vector compare with EH with a VEC_COND_EXPR. We should be able to do fallback expansion, and if we really want the optimization we need quite some shuffling to arrange for the proper EH redirection in all cases, IMHO not worth it. 2022-02-09 Richard Biener <rguenther@suse.de> PR middle-end/104450 * gimple-isel.cc: Pass cfun around. (gimple_expand_vec_cond_expr): Do not combine a throwing comparison with the select. * g++.dg/torture/pr104450.C: New testcase.
2022-02-09target/104453 - guard call folding with NULL LHSRichard Biener2-0/+13
This guards shift builtin folding to do nothing when there is no LHS, similar to what other foldings do. 2022-02-09 Richard Biener <rguenther@suse.de> PR target/104453 * config/i386/i386.cc (ix86_gimple_fold_builtin): Guard shift folding for NULL LHS. * gcc.target/i386/pr104453.c: New testcase.
2022-02-08compiler: recognize Go 1.18 runtime/internal/atomic methodsIan Lance Taylor4-2/+174
The Go 1.18 library introduces specific types in runtime/internal/atomic. Recognize and optimize the methods on those types, as we do with the functions in runtime/internal/atomic. While we're here avoid getting confused by methods in any other package that we recognize specially. * go-gcc.cc (Gcc_backend::Gcc_backend): Define builtins __atomic_load_1 and __atomic_store_1. Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/383654
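For readers more familiar with the C side of GCC, the one-byte atomic operations named in the ChangeLog entry have a close analogue in the generic __atomic builtins. The sketch below is C, not Go frontend code, and the function names are invented for illustration. It assumes that the generic __atomic_load_n and __atomic_store_n builtins, applied to a one-byte type, correspond to the size-specific _1 variants.

```c
#include <stdint.h>

/* Atomically store VALUE into the one-byte object at FLAG with
   sequentially consistent ordering.  */
void
store_flag (uint8_t *flag, uint8_t value)
{
  __atomic_store_n (flag, value, __ATOMIC_SEQ_CST);
}

/* Atomically load the one-byte object at FLAG with sequentially
   consistent ordering.  */
uint8_t
load_flag (uint8_t *flag)
{
  return __atomic_load_n (flag, __ATOMIC_SEQ_CST);
}
```

A value written with store_flag is the value a subsequent load_flag on the same object returns.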