path: root/gcc
Age | Commit message | Author | Files | Lines
2023-04-30 | hwasan: adjust wording in expected output in tests | Martin Liska | 4 | -7/+7
gcc/testsuite/ChangeLog: * c-c++-common/hwasan/asan-pr70541.c: Adjust wording of expected output. * c-c++-common/hwasan/heap-overflow.c: Likewise. * c-c++-common/hwasan/sanity-check-pure-c.c: Likewise. * c-c++-common/hwasan/use-after-free.c: Likewise.
2023-04-30 | [PATCH] libcpp: suppress builtin macro redefined warnings for __LINE__ | Longjun Luo | 2 | -4/+69
From 0821df518b264e754d698d399f98be1a62945e32 Mon Sep 17 00:00:00 2001 From: Longjun Luo <luolongjuna@gmail.com> Date: Thu, 12 Jan 2023 23:59:54 +0800 Subject: [PATCH] libcpp: suppress builtin macro redefined warnings for __LINE__ As implied in gcc.gnu.org/legacy-ml/gcc-patches/2008-09/msg00076.html, gcc provides -Wno-builtin-macro-redefined to suppress the warning when redefining a builtin macro. However, at that time, there was no scenario for the __LINE__ macro. But when we try to build a live-patch, we compare sections by using -ffunction-sections. Some identical functions are considered changed because of the __LINE__ macro. At present, to detect such a change caused by the __LINE__ macro, we have to analyse code and maintain a function list. For example, in kpatch, check this commit github.com/dynup/kpatch/commit/0e1b95edeafa36edb7bcf11da6d1c00f76d7e03d. So, in this scenario, when we try to compare sections, it would be better to support suppressing builtin macro redefined warnings for the __LINE__ macro. libcpp: * init.cc (builtin_array): Do not always warn for a redefinition of __LINE__. gcc/testsuite * gcc.dg/builtin-redefine.c: Test for redefinition warnings for __LINE__. * gcc.dg/builtin-redefine-1.c: New test.
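To make the live-patch problem concrete, here is a minimal sketch (the function names are hypothetical and not taken from the patch). Both functions have identical logic, yet each embeds its own source line as a constant, so moving a function by one line changes its section contents. The commit message implies that a build can redefine __LINE__ and silence the resulting warning with -Wno-builtin-macro-redefined; how a particular build system passes that redefinition is not shown in the source.

```c
/* Hypothetical helper: returns the source line on which __LINE__ was
   expanded.  The value is a compile-time constant in the object code. */
int report_line(void)
{
    return __LINE__;
}

/* Same logic, but located on a later line, so the constant differs. */
int report_line_again(void)
{
    return __LINE__;
}
```

Comparing the two functions section by section would flag them as different even though only their position in the file differs.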
2023-04-30 | gcc: Use ld -r when checking for HAVE_LD_RO_RW_SECTION_MIXING | Joakim Nohlgård | 2 | -20/+26
Fall back to ld -r if ld -shared fails during configure. The check for HAVE_LD_RO_RW_SECTION_MIXING can fail on targets where ld does not support shared objects, even though the answer to the test should be 'read-write'. One such target is riscv64-unknown-elf. Failing this test results in a libgcc crtbegin.o which has a writable .eh_frame section leading to the default linker scripts placing the .eh_frame section in a writable memory segment, or a linker warning when using ld scripts that place .eh_frame unconditionally in ROM. gcc/ChangeLog: * configure: Regenerate. * configure.ac: Use ld -r in the check for HAVE_LD_RO_RW_SECTION_MIXING
2023-04-30 | Remove duplicate constants created between passes | Gaius Mulley | 3 | -65/+211
There is no need to re-create constant literals between passes. This patch creates a constant pool and reuses a constant literal providing it is created at the same location. This in turn avoids generating duplicate overflow error messages when encountering an out of range constant literal. gcc/m2/ChangeLog: * gm2-compiler/SymbolTable.mod (ConstLitPoolEntry): New pointer to record. (ConstLitSym): New field RangeError. (ConstLitPoolTree): New SymbolTree representing name to index. (ConstLitArray): New dynamic array containing pointers to a ConstLitPoolEntry. (CreateConstLit): New procedure function. (LookupConstLitPoolEntry): New procedure function. (AddConstLitPoolEntry): New procedure function. (MakeConstLit): Re-implemented to check the constant lit pool before calling CreateConstLit. * m2.flex: Add ability to decode binary constant literals. gcc/testsuite/ChangeLog: * gm2/pim/run/pass/constlitbase.mod: New test. Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>
2023-04-30 | Daily bump. | GCC Administrator | 3 | -1/+69
2023-04-30 | reload: Handle generating reloads that also clobbers flags | Hans-Peter Nilsson | 1 | -3/+26
* reload1.cc (emit_insn_if_valid_for_reload_1): Rename from emit_insn_if_valid_for_reload. (emit_insn_if_valid_for_reload): Call new helper, and if a SET fails to be recognized, also try emitting a parallel that clobbers TARGET_FLAGS_REGNUM, as applicable.
2023-04-29 | [xstormy16] Efficient HImode rotate left by a single bit. | Roger Sayle | 3 | -5/+41
This patch contains some minor tweaks to xstormy16's machine description, most significantly providing a pattern for HImode rotate left by a single bit that requires only two instructions.

    unsigned short foo(unsigned short x)
    {
      return (x << 1) | (x >> 15);
    }

currently with -O2 generates:

    foo:    mov r7,r2
            shr r7,#15
            shl r2,#1
            or r2,r7
            ret

with this patch, GCC now generates:

    foo:    shl r2,#1 | adc r2,#0
            ret

Additionally neghi2 is converted to a define_insn (so that the RTL optimizers see the negation semantics), and HImode rotations by 8-bits can now be recognized and implemented using swpb.

2023-04-29  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
    * config/stormy16/stormy16.md (neghi2): Convert from a define_expand to a define_insn.
    (*rotatehi_1): New define_insn for efficient 2 insn sequence.
    (*rotatehi_8, *rotaterthi_8): New define_insn to emit a swpb.

gcc/testsuite/ChangeLog
    * gcc.target/xstormy16/neghi2.c: New test case.
    * gcc.target/xstormy16/rotatehi-1.c: Likewise.
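The equivalence behind the two-instruction sequence can be sketched in portable C. This is only a model, assuming that the shift places the outgoing top bit in the carry flag and that the add-with-carry of zero folds it back into bit 0; the function names are illustrative.

```c
#include <stdint.h>

/* Portable rotate-left-by-one on a 16-bit value, as in the test case. */
uint16_t rotl1_portable(uint16_t x)
{
    return (uint16_t)((x << 1) | (x >> 15));
}

/* Model of the shift-then-add-carry sequence. */
uint16_t rotl1_shl_adc(uint16_t x)
{
    unsigned carry = x >> 15;               /* bit shifted out by the shl */
    uint16_t shifted = (uint16_t)(x << 1);  /* shl r2,#1 */
    return (uint16_t)(shifted + carry);     /* adc r2,#0 */
}
```

Because bit 0 of the shifted value is always zero, adding the carry can never overflow into bit 1, which is why the addition behaves like the OR in the portable form.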
2023-04-29 | [xstormy16] Recognize/support swpn (swap nibbles) instruction. | Roger Sayle | 5 | -0/+164
This patch adds support for xstormy16's swap nibbles instruction (swpn). For the test case:

    short foo(short x)
    {
      return (x&0xff00) | ((x<<4)&0xf0) | ((x>>4)&0x0f);
    }

GCC with -O2 currently generates the nine instruction sequence:

    foo:    mov r7,r2
            asr r2,#4
            and r2,#15
            mov.w r6,#-256
            and r6,r7
            or r2,r6
            shl r7,#4
            and r7,#255
            or r2,r7
            ret

with this patch, we now generate:

    foo:    swpn r2
            ret

To achieve this using combine's four instruction "combinations" requires a little wizardry. Firstly, define_insn_and_split are introduced to treat logical shifts followed by bitwise-AND as macro instructions that are split after reload. This is sufficient to recognize a QImode nibble swap, which can be implemented by swpn followed by either a zero-extension or a sign-extension from QImode to HImode. Then finally, in the correct context, a QImode swap-nibbles pattern can be combined to preserve the high-byte of a HImode word, matching the xstormy16's swpn semantics. The naming of the new code iterators is taken from i386.md.

2023-04-29  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
    * config/stormy16/stormy16.md (any_lshift): New code iterator.
    (any_or_plus): Likewise.
    (any_rotate): Likewise.
    (*<any_lshift>_and_internal): New define_insn_and_split to recognize a logical shift followed by an AND, and split it again after reload.
    (*swpn): New define_insn matching xstormy16's swpn.
    (*swpn_zext): New define_insn recognizing swpn followed by zero_extendqihi2, i.e. with the high byte set to zero.
    (*swpn_sext): Likewise, for swpn followed by cbw.
    (*swpn_sext_2): Likewise, for an alternate RTL form.
    (*swpn_zext_ior): A pre-reload splitter so that an swpn+zext+ior sequence is split in the correct place to recognize the *swpn_zext followed by any_or_plus (ior, xor or plus) instruction.

gcc/testsuite/ChangeLog
    * gcc.target/xstormy16/swpn-1.c: New QImode test case.
    * gcc.target/xstormy16/swpn-2.c: New zero_extend test case.
    * gcc.target/xstormy16/swpn-3.c: New sign_extend test case.
    * gcc.target/xstormy16/swpn-4.c: New HImode test case.
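For readers who want to check the HImode semantics by hand, the test expression can be restated with an unsigned type. This is a sketch only; `swap_low_nibbles` is an illustrative name, and using `uint16_t` instead of `short` is a simplification that avoids sign-extension questions.

```c
#include <stdint.h>

/* Swap the two nibbles of the low byte and keep the high byte intact. */
uint16_t swap_low_nibbles(uint16_t x)
{
    return (uint16_t)((x & 0xff00) | ((x << 4) & 0xf0) | ((x >> 4) & 0x0f));
}
```

Applying the function twice returns the original value, which is a quick sanity check on the masks.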
2023-04-29 | add glibc-stdint.h to vax and lm32 linux target (PR target/105525) | Mikael Pettersson | 1 | -2/+2
PR target/105525 is a build regression for the vax and lm32 linux targets present in gcc-12/13/head, where the builds fail due to unsatisfied references to __INTPTR_TYPE__ and __UINTPTR_TYPE__, caused by these two targets failing to provide glibc-stdint.h. Fixed thusly, tested by building crosses, which now succeeds. Ok for trunk? (Note I don't have commit rights.) PR target/105525 gcc/ * config.gcc (vax-*-linux*): Add glibc-stdint.h. (lm32-*-uclinux*): Likewise.
2023-04-29 | Adjust mips test for recent ifcvt costing changes | Jeff Law | 2 | -4/+4
MIPS ports have been failing a few tests since the change to add cost checks in another path through the if-converter pass. As with the other ports, these look like cases where we don't do good costing in the MIPS port. Someone who cares about MIPS will need to fix this properly. In the mean time this patch adjusts the branch cost when running the two affected tests and skips them at -Os. This is enough to verify that if conversion can still happen if the costs are adjusted. gcc/testsuite * gcc.target/mips/mips-ps-type-2.c: Adjust branch cost to encourage if-conversion. Skip for -Os. * gcc.target/mips/movcc-3.c: Similarly.
2023-04-29 | RISC-V: decouple stack allocation for rv32e w/o save-restore | Fei Gao | 2 | -22/+50
Currently in rv32e, stack allocation for GPR callee-saved registers is always 12 bytes w/o save-restore. Actually, for the case without save-restore, less stack memory can be reserved. This patch decouples stack allocation for rv32e w/o save-restore and makes riscv_compute_frame_info more readable.

Output of testcase rv32e_stack.c before patch:

    addi sp,sp,-16
    sw ra,12(sp)
    call getInt
    sw a0,0(sp)
    lw a0,0(sp)
    call PrintInts
    lw a5,0(sp)
    mv a0,a5
    lw ra,12(sp)
    addi sp,sp,16
    jr ra

after patch:

    addi sp,sp,-8
    sw ra,4(sp)
    call getInt
    sw a0,0(sp)
    lw a0,0(sp)
    call PrintInts
    lw a5,0(sp)
    mv a0,a5
    lw ra,4(sp)
    addi sp,sp,8
    jr ra

gcc/ChangeLog:
    * config/riscv/riscv.cc (riscv_avoid_save_libcall): helper function for riscv_use_save_libcall.
    (riscv_use_save_libcall): call riscv_avoid_save_libcall.
    (riscv_compute_frame_info): restructure to decouple stack allocation for rv32e w/o save-restore.

gcc/testsuite/ChangeLog:
    * gcc.target/riscv/rv32e_stack.c: New test.
2023-04-29 | Daily bump. | GCC Administrator | 7 | -1/+334
2023-04-29 | testsuite: Handle empty assembly lines in check-function-bodies | Hans-Peter Nilsson | 1 | -1/+1
I tried to make use of check-function-bodies for cris-elf and was a bit surprised to see it failing. There's a deliberate empty line after the filled delay slot of the return-function which was mishandled. I thought "aha" and tried to add an empty line (containing just a "**" prefix) to the match, but that didn't help. While it was added as input from the function's assembly output to-be-matched like any other line, it couldn't be matched: I had to use "...", which works but is...distracting. Some digging shows that an empty assembly line can't be deliberately matched because all matcher lines (lines starting with the prefix, the ubiquitous "**") are canonicalized by trimming leading whitespace (the "string trim" in check-function-bodies) and instead adding a leading TAB character, thus empty lines end up containing just a TAB. For usability it's better to treat empty lines as fluff than to uglify the test-case and the code to properly match them. Double-checking, no test-case tries to match a line containing just TAB (by providing a line containing just "**\s*", i.e. zero or more whitespace characters). * lib/scanasm.exp (parse_function_bodies): Set fluff to include empty lines (besides optionally leading whitespace).
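For context, a test that uses check-function-bodies typically looks like the sketch below. The function and the expected assembly are placeholders: the `ret` line is an assumption about the target's return mnemonic, and `...` stands for any number of lines, which is the workaround the message describes for lines that cannot be matched directly.

```c
/* { dg-do compile } */
/* { dg-options "-O2" } */
/* { dg-final { check-function-bodies "**" "" } } */

/*
** add_one:
**	...
**	ret
*/
int add_one(int x)
{
    return x + 1;
}
```

Every matcher line starts with the `**` prefix named in the dg-final directive; the rest of the line is compared against the compiler's assembly output for the function.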
2023-04-28 | Fix autoprofiledbootstrap build | Eugene Rozenfeld | 4 | -11/+88
1. Fix gcov version 2. Merge perf data collected when compiling the compiler and runtime libraries 3. Fix documentation typo Tested on x86_64-pc-linux-gnu. ChangeLog: * Makefile.in: Define PROFILE_MERGER * Makefile.tpl: Define PROFILE_MERGER gcc/c/ChangeLog: * Make-lang.in: Merge perf data collected when compiling cc1 and runtime libraries gcc/cp/ChangeLog: * Make-lang.in: Merge perf data collected when compiling cc1plus and runtime libraries gcc/lto/ChangeLog: * Make-lang.in: Merge perf data collected when compiling lto1 and runtime libraries gcc/ChangeLog: * doc/install.texi: Fix documentation typo
2023-04-28 | RISC-V: Add divmod expansion support | Matevos Mehrabyan | 6 | -0/+65
Hi all,

If we have division and remainder calculations with the same operands:

    a = b / c;
    d = b % c;

We can replace the calculation of remainder with multiplication + subtraction, using the result from the previous division:

    a = b / c;
    d = a * c;
    d = b - d;

Which will be faster. Currently, it isn't done for RISC-V. I've added an expander for DIVMOD which replaces 'rem' with 'mul + sub'.

Best regards,
Matevos.

gcc/ChangeLog:
    * config/riscv/iterators.md (only_div, paired_mod): New iterators.
    (u): Add div/udiv cases.
    * config/riscv/riscv-protos.h (riscv_use_divmod_expander): Prototype.
    * config/riscv/riscv.cc (struct riscv_tune_param): Add field for divmod expansion.
    (rocket_tune_info, sifive_7_tune_info): Initialize new field.
    (thead_c906_tune_info): Likewise.
    (optimize_size_tune_info): Likewise.
    (riscv_use_divmod_expander): New function.
    * config/riscv/riscv.md (<u>divmod<mode>4): New expander.

gcc/testsuite/ChangeLog:
    * gcc.target/riscv/divmod-1.c: New testcase.
    * gcc.target/riscv/divmod-2.c: New testcase.
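The identity the expander relies on can be checked in plain C. This is a sketch under the usual C99 rule that integer division truncates toward zero, so that (b / c) * c + b % c == b; the function name is illustrative, and c is assumed to be nonzero with no overflow in b / c.

```c
/* Remainder recovered from the quotient: rem = b - (b / c) * c. */
int rem_from_quotient(int b, int c)
{
    int a = b / c;   /* the division that is needed anyway */
    int d = a * c;   /* mul */
    return b - d;    /* sub */
}
```

The result carries the sign of the dividend, exactly like the `%` operator, so the rewrite is valid for negative operands as well.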
2023-04-28 | RISC-V: Added support clmul[r,h] instructions for Zbc extension. | Karen Sargsyan | 9 | -34/+97
clmul[h] instructions were added only for the ZBKC extension. This patch includes them in the ZBC extension too. Besides, added support of 'clmulr' instructions for ZBC extension. gcc/ChangeLog: * config/riscv/bitmanip.md: Added clmulr instruction. * config/riscv/riscv-builtins.cc (AVAIL): Add new. * config/riscv/riscv.md: (UNSPEC_CLMULR): Add new unspec type. (type): Add clmul * config/riscv/riscv-cmo.def: Added built-in function for clmulr. * config/riscv/crypto.md: Move clmul[h] instructions to bitmanip.md. * config/riscv/riscv-scalar-crypto.def: Move clmul[h] built-in functions to riscv-cmo.def. * config/riscv/generic.md: Add clmul to list of instructions using the generic_imul reservation. gcc/testsuite/ChangeLog: * gcc.target/riscv/zbc32.c: New test. * gcc.target/riscv/zbc64.c: New test.
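As a reading aid, the three carry-less multiply variants can be modelled in software. The sketch below follows my understanding of the Zbc definitions for a 32-bit register width; the function names are illustrative and are not the GCC built-ins added by the patch.

```c
#include <stdint.h>

/* Low 32 bits of the 32x32 carry-less product. */
uint32_t clmul32(uint32_t a, uint32_t b)
{
    uint32_t out = 0;
    for (int i = 0; i < 32; i++)
        if ((b >> i) & 1)
            out ^= a << i;
    return out;
}

/* High 32 bits (bits 63..32) of the carry-less product. */
uint32_t clmulh32(uint32_t a, uint32_t b)
{
    uint32_t out = 0;
    for (int i = 1; i < 32; i++)
        if ((b >> i) & 1)
            out ^= a >> (32 - i);
    return out;
}

/* "Reversed" variant: bits 62..31 of the carry-less product. */
uint32_t clmulr32(uint32_t a, uint32_t b)
{
    uint32_t out = 0;
    for (int i = 0; i < 32; i++)
        if ((b >> i) & 1)
            out ^= a >> (31 - i);
    return out;
}
```

Carry-less multiplication replaces the additions of an ordinary shift-and-add multiply with XOR, so 3 times 3 yields 5 rather than 9.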
2023-04-28 | RISC-V: Eliminate redundant zero extension of minu/maxu operands | Jivan Hakobyan | 3 | -3/+39
On RV64, the following code:

    unsigned Min(unsigned a, unsigned b)
    {
      return a < b ? a : b;
    }

compiles to:

    Min:
            zext.w a1,a1
            zext.w a0,a0
            minu a0,a1,a0
            sext.w a0,a0
            ret

This patch removes unnecessary zero extensions of minu/maxu operands.

gcc/ChangeLog:
    * config/riscv/bitmanip.md: Added expanders for minu/maxu instructions

gcc/testsuite/ChangeLog:
    * gcc.target/riscv/zbb-min-max-02.c: Updated scanning check.
    * gcc.target/riscv/zbb-min-max-03.c: New tests.
2023-04-28 | PHIOPT: Move two_value_replacement to match.pd | Andrew Pinski | 2 | -155/+96
This patch converts the two_value_replacement function into a match.pd pattern. It is a direct translation with only one minor change: it does not check for the {0,+-1} case, as that is handled earlier in match.pd, so there is no reason to do the extra check for it. OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions. gcc/ChangeLog: PR tree-optimization/100958 * tree-ssa-phiopt.cc (two_value_replacement): Remove. (pass_phiopt::execute): Don't call two_value_replacement. * match.pd (a !=/== CST1 ? CST2 : CST3): Add pattern to handle what two_value_replacement did.
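The commit message does not spell out the rewrite itself, so the following is only an illustration of the kind of simplification a two-valued conditional admits, not a statement of what match.pd emits. It assumes the tested variable is known to be either 1 or 2 and that the two result constants differ by one; all names are made up for the example.

```c
/* x is assumed to be either 1 or 2 (a two-value range). */
int select_form(int x)        { return x == 1 ? 5 : 6; }
int arithmetic_form(int x)    { return x + 4; }

/* Same idea with the result constants swapped. */
int select_swapped(int x)     { return x == 1 ? 6 : 5; }
int arithmetic_swapped(int x) { return 7 - x; }
```

In both pairs the branch disappears and a single add or subtract remains, which is the benefit such a pattern aims for.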
2023-04-28 | MATCH: Add patterns from phiopt's minmax_replacement | Andrew Pinski | 3 | -3/+26
This adds a few patterns from phiopt's minmax_replacement for (A CMP B) ? MIN/MAX<A, C> : MIN/MAX <B, C>. It is progress towards removing minmax_replacement from phiopt. There are still some more cases dealing with constants on the edges (0/INT_MAX) to handle in match. OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions. gcc/ChangeLog: * match.pd: Add patterns for "(A CMP B) ? MIN/MAX<A, C> : MIN/MAX <B, C>". gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/minmax-16.c: Update testcase slightly. * gcc.dg/tree-ssa/split-path-1.c: Also disable tree-loop-if-convert as that now does the combining.
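One instance of the pattern shape, with CMP taken as `<` and MIN on both arms, can be written out as below. The equivalence to a nested minimum is plain arithmetic; whether match.pd produces exactly this form is an assumption, and the helper names are illustrative.

```c
static int min_int(int a, int b) { return a < b ? a : b; }

/* (A < B) ? MIN<A, C> : MIN<B, C> */
int cond_of_mins(int a, int b, int c)
{
    return (a < b) ? min_int(a, c) : min_int(b, c);
}

/* The comparison only selects the smaller of A and B, so the whole
   expression is the minimum of all three values. */
int min_of_three(int a, int b, int c)
{
    return min_int(min_int(a, b), c);
}
```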
2023-04-28 | MATCH: Factor out code that for min max detection with constants | Andrew Pinski | 3 | -28/+48
This factors out some of the code from the min/max detection from match.pd into a function so it can be reused in other places. This is mainly used to detect the conversions of >= to > which causes the integer values to be changed by one. Changes since v1: * factor out the checks for INTEGER_CSTs so it is more obvious. OK? Bootstrapped and tested on x86_64-linux-gnu. gcc/ChangeLog: * match.pd: Factor out the deciding the min/max from the "(cond (cmp (convert1? x) c1) (convert2? x) c2)" pattern to ... * fold-const.cc (minmax_from_comparison): this new function. * fold-const.h (minmax_from_comparison): New prototype.
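The "changed by one" remark refers to comparisons that are canonicalized from `>=` to `>`, which shifts the integer constant. A small sketch of why the detection has to look through that (function names are illustrative, and the constant 5 is arbitrary):

```c
/* As written in source: compare with >= 5, select 5 otherwise. */
int clamp_ge(int x)   { return x >= 5 ? x : 5; }

/* After canonicalization the comparison constant becomes 4, while the
   selected constant stays 5. */
int clamp_gt(int x)   { return x > 4 ? x : 5; }

/* Both are simply MAX(x, 5). */
int max_with_5(int x) { return x > 5 ? x : 5; }
```

A matcher that required the two constants to be identical would miss the second form, even though it computes the same maximum.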
2023-04-28 | PR rtl-optimization/109476: Use ZERO_EXTEND instead of zeroing a SUBREG. | Roger Sayle | 2 | -11/+39
This patch fixes PR rtl-optimization/109476, which is a code quality regression affecting AVR. The cause is that the lower-subreg pass is sometimes overly aggressive, lowering the LSHIFTRT below: (insn 7 4 8 2 (set (reg:HI 51) (lshiftrt:HI (reg/v:HI 49 [ b ]) (const_int 8 [0x8]))) "t.ii":4:36 557 {lshrhi3} (nil)) into a pair of QImode SUBREG assignments: (insn 19 4 20 2 (set (subreg:QI (reg:HI 51) 0) (reg:QI 54 [ b+1 ])) "t.ii":4:36 86 {movqi_insn_split} (nil)) (insn 20 19 8 2 (set (subreg:QI (reg:HI 51) 1) (const_int 0 [0])) "t.ii":4:36 86 {movqi_insn_split} (nil)) but this idiom, SETs of SUBREGs, interferes with combine's ability to associate/fuse instructions. The solution, on targets that have a suitable ZERO_EXTEND (i.e. where the lower-subreg pass wouldn't itself split a ZERO_EXTEND, so "splitting_zext" is false), is to split/lower LSHIFTRT to a ZERO_EXTEND. To answer Richard's question in comment #10 of the bugzilla PR, the function resolve_shift_zext is called with one of four RTX codes, ASHIFTRT, LSHIFTRT, ZERO_EXTEND and ASHIFT, but only with LSHIFTRT can the setting of low_part and high_part SUBREGs be replaced by a ZERO_EXTEND. For ASHIFTRT, we require a sign extension, so don't set the high_part to zero; if we're splitting a ZERO_EXTEND then it doesn't make sense to replace it with a ZERO_EXTEND, and for ASHIFT we've played games to swap the high_part and low_part SUBREGs, so that we assign the low_part to zero (for double word shifts by greater than word size bits). 2023-04-28 Roger Sayle <roger@nextmovesoftware.com> gcc/ChangeLog PR rtl-optimization/109476 * lower-subreg.cc: Include explow.h for force_reg. (find_decomposable_shift_zext): Pass an additional SPEED_P argument. If decomposing a suitable LSHIFTRT and we're not splitting ZERO_EXTEND (based on the current SPEED_P), then use a ZERO_EXTEND instead of setting a high part SUBREG to zero, which helps combine. (decompose_multiword_subregs): Update call to resolve_shift_zext. 
gcc/testsuite/ChangeLog PR rtl-optimization/109476 * gcc.target/avr/mmcu/pr109476.c: New test case.
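In C terms, the two lowerings of a 16-bit logical shift right by 8 look as follows. This is a source-level analogy only, assuming 8-bit bytes; the RTL in the message operates on SUBREGs rather than C variables, and the function names are made up.

```c
#include <stdint.h>

/* SUBREG-style lowering: write each byte half separately. */
uint16_t shift_by_parts(uint16_t b)
{
    uint8_t low  = (uint8_t)(b >> 8);  /* low part := old high byte */
    uint8_t high = 0;                  /* high part := 0 */
    return (uint16_t)(((unsigned)high << 8) | low);
}

/* ZERO_EXTEND-style lowering: widen the old high byte with zeros. */
uint16_t shift_by_zext(uint16_t b)
{
    return (uint16_t)(uint8_t)(b >> 8);
}
```

The second form is a single operation on the whole value, which is the property the message says helps combine.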
2023-04-28 | Add emulated scatter capability to the vectorizer | Richard Biener | 7 | -35/+97
This adds a scatter vectorization capability to the vectorizer without target support by decomposing the offset and data vectors and then performing scalar stores in the order of vector lanes. This is aimed at cases where vectorizing the rest of the loop offsets the cost of vectorizing the scatter. The offset load is still vectorized and costed as such, but like with emulated gather those will be turned back to scalar loads by forwprop. * tree-vect-data-refs.cc (vect_analyze_data_refs): Always consider scatters. * tree-vect-stmts.cc (vect_model_store_cost): Pass in the gather-scatter info and cost emulated scatters accordingly. (get_load_store_type): Support emulated scatters. (vectorizable_store): Likewise. Emulate them by extracting scalar offsets and data, doing scalar stores. * gcc.dg/vect/pr25413a.c: Un-XFAIL everywhere. * gcc.dg/vect/vect-71.c: Likewise. * gcc.dg/vect/tsvc/vect-tsvc-s4113.c: Likewise. * gcc.dg/vect/tsvc/vect-tsvc-s491.c: Likewise. * gcc.dg/vect/tsvc/vect-tsvc-vas.c: Likewise.
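A rough C model of the emulation described above: offsets and data are fetched a vector at a time, then stored lane by lane. The lane count of 4 and all names are assumptions made for the sketch; the real transformation happens on GIMPLE vectors, not on C arrays.

```c
#include <stddef.h>

enum { LANES = 4 };  /* assumed vector width for this sketch */

/* Plain scalar scatter: dst[idx[i]] = src[i]. */
void scatter_scalar(int *dst, const int *idx, const int *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[idx[i]] = src[i];
}

/* Emulated scatter: take LANES offsets and data values at a time,
   then perform scalar stores in lane order. */
void scatter_emulated(int *dst, const int *idx, const int *src, size_t n)
{
    size_t i = 0;
    for (; i + LANES <= n; i += LANES) {
        int off[LANES], val[LANES];
        for (int l = 0; l < LANES; l++) {  /* stands in for vector loads */
            off[l] = idx[i + l];
            val[l] = src[i + l];
        }
        for (int l = 0; l < LANES; l++)    /* scalar stores, lane order */
            dst[off[l]] = val[l];
    }
    for (; i < n; i++)                     /* scalar epilogue */
        dst[idx[i]] = src[i];
}
```

Storing in lane order matters when two lanes hit the same index: the later lane must win, just as in the scalar loop.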
2023-04-28 | Adjust costing of emulated vectorized gather/scatter | Richard Biener | 4 | -4/+29
Emulated gather/scatter behave similarly to strided elementwise accesses in that they need to decompose the offset vector and construct or decompose the data vector, so handle them the same way, pessimizing the cases with many elements. For pr88531-2c.c instead of

    .L4:
            leaq (%r15,%rcx), %rdx
            incl %edi
            movl 16(%rdx), %r13d
            movl 24(%rdx), %r14d
            movl (%rdx), %r10d
            movl 4(%rdx), %r9d
            movl 8(%rdx), %ebx
            movl 12(%rdx), %r11d
            movl 20(%rdx), %r12d
            vmovss (%rax,%r14,4), %xmm2
            movl 28(%rdx), %edx
            vmovss (%rax,%r13,4), %xmm1
            vmovss (%rax,%r10,4), %xmm0
            vinsertps $0x10, (%rax,%rdx,4), %xmm2, %xmm2
            vinsertps $0x10, (%rax,%r12,4), %xmm1, %xmm1
            vinsertps $0x10, (%rax,%r9,4), %xmm0, %xmm0
            vmovlhps %xmm2, %xmm1, %xmm1
            vmovss (%rax,%rbx,4), %xmm2
            vinsertps $0x10, (%rax,%r11,4), %xmm2, %xmm2
            vmovlhps %xmm2, %xmm0, %xmm0
            vinsertf128 $0x1, %xmm1, %ymm0, %ymm0
            vmulps %ymm3, %ymm0, %ymm0
            vmovups %ymm0, (%r8,%rcx)
            addq $32, %rcx
            cmpl %esi, %edi
            jb .L4

we now prefer

    .L4:
            leaq 0(%rbp,%rdx,8), %rcx
            movl (%rcx), %r10d
            movl 4(%rcx), %ecx
            vmovss (%rsi,%r10,4), %xmm0
            vinsertps $0x10, (%rsi,%rcx,4), %xmm0, %xmm0
            vmulps %xmm1, %xmm0, %xmm0
            vmovlps %xmm0, (%rbx,%rdx,8)
            incq %rdx
            cmpl %edi, %edx
            jb .L4

    * config/i386/i386.cc (ix86_vector_costs::add_stmt_cost): Tame down element extracts and scalar loads for gather/scatter similar to elementwise strided accesses.
    * gcc.target/i386/pr89618-2.c: New testcase.
    * gcc.target/i386/pr88531-2b.c: Adjust.
    * gcc.target/i386/pr88531-2c.c: Likewise.
2023-04-28 | RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMCLR | Pan Li | 2 | -0/+323
When some RVV integer compare operators act on the same vector register without a mask, they can be simplified to VMCLR. This patch allows ne, lt, ltu, gt and gtu to perform this kind of simplification by adding one new define_split.

Given we have:

    vbool1_t test_shortcut_for_riscv_vmslt_case_0(vint8m8_t v1, size_t vl)
    {
      return __riscv_vmslt_vv_i8m8_b1(v1, v1, vl);
    }

Before this patch:

    vsetvli zero,a2,e8,m8,ta,ma
    vl8re8.v v24,0(a1)
    vmslt.vv v8,v24,v24
    vsetvli a5,zero,e8,m8,ta,ma
    vsm.v v8,0(a0)
    ret

After this patch:

    vsetvli zero,a2,e8,mf8,ta,ma
    vmclr.m v24    <- optimized to vmclr.m
    vsetvli zero,a5,e8,mf8,ta,ma
    vsm.v v24,0(a0)
    ret

As above, we may have one instruction eliminated and require fewer vector registers.

gcc/ChangeLog:
    * config/riscv/vector.md: Add new define split to perform the simplification.

gcc/testsuite/ChangeLog:
    * gcc.target/riscv/rvv/base/integer_compare_insn_shortcut.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>
Co-authored-by: kito-cheng <kito.cheng@sifive.com>
2023-04-28 | ipa/109652 - ICE in modification phase of IPA SRA | Richard Biener | 2 | -2/+45
There's another questionable IL transform by IPA SRA, replacing foo (p_1(D)->x) with foo (VIEW_CONVERT <union type> (ISRA.PARM.1)) where ISRA.PARM.1 is a register. Conversion of a register to an aggregate type is questionable but not entirely unreasonable and not within the set of IL I am rejecting when fixing PR109644. The following lets this slip through in IPA SRA transform by restricting re-gimplification to the case of register type results. To not break the previous testcase again we need to optimize the BIT_FIELD_REF <VIEW_CONVERT <...>, ...> case to elide the conversion. PR ipa/109652 * ipa-param-manipulation.cc (ipa_param_body_adjustments::modify_expression): Allow conversion of a register to a non-register type. Elide conversions inside BIT_FIELD_REFs. * gcc.dg/torture/pr109652.c: New testcase.
2023-04-28 | OpenACC: Stand-alone attach/detach clause fixes for Fortran [PR109622] | Julian Brown | 2 | -9/+39
This patch fixes several cases where multiple attach or detach mapping nodes were being created for stand-alone attach or detach clauses in Fortran. After the introduction of stricter checking later during compilation, these extra nodes could cause ICEs, as seen in the PR. The patch also fixes cases that "happened to work" previously where the user attaches/detaches a pointer to array using a descriptor, and (I think!) the "_data" field has offset zero, hence the same address as the descriptor as a whole. 2023-04-27 Julian Brown <julian@codesourcery.com> PR fortran/109622 gcc/fortran/ * trans-openmp.cc (gfc_trans_omp_clauses): Attach/detach clause fixes. gcc/testsuite/ * gfortran.dg/goacc/attach-descriptor.f90: Adjust expected output. libgomp/ * testsuite/libgomp.fortran/pr109622.f90: New test. * testsuite/libgomp.fortran/pr109622-2.f90: New test. * testsuite/libgomp.fortran/pr109622-3.f90: New test.
2023-04-28 | tree-optimization/109644 - missing IL checking | Richard Biener | 1 | -31/+44
We fail to verify the constraints under which we allow handled components to wrap registers. The gcc.dg/pr70022.c testcase shows that we happily end up with _2 = VIEW_CONVERT_EXPR<int[4]>(v_1(D)) as produced by SSA rewrite and update_address_taken. But the intent was that we wrap registers with at most a single level of handled components and specifically only allow __real, __imag, BIT_FIELD_REF and VIEW_CONVERT_EXPR on them, but not ARRAY_REF or COMPONENT_REF. The following makes IL verification stricter which catches the problem. PR tree-optimization/109644 * tree-cfg.cc (verify_types_in_gimple_reference): Check register constraints on the outermost VIEW_CONVERT_EXPR only. Do not allow register or invariant bases on multi-level or possibly variable index handled components.
2023-04-28 | Avoid more invalid GIMPLE with register bases | Richard Biener | 1 | -0/+5
The Ada frontend, for example with gnat.dg/inline2_pkg.adb, tends to create VIEW_CONVERT expressions with aggregate type even of non-aggregate entities. In this case for example return <retval> = (BIT_FIELD_REF <VIEW_CONVERT_EXPR<struct inline2_pkg__ieee_short_real>(number), 16, 16> & 32640) != 32640; currently gimplification and SSA rewrite turn this into _1 = BIT_FIELD_REF <VIEW_CONVERT_EXPR<struct inline2_pkg__ieee_short_real>(number_2(D)); which is two operations on a register. While as seen with PR109652 we might not want to completely rule out register to aggregate type VIEW_CONVERTs we definitely do not want to stack multiple ops here. The solution is to make sure the gimplifier puts a non-register as the base object. For the above this will add number.1 = number; and use number.1 in the compound reference. Code generation is unchanged, FRE optimizes this to BIT_FIELD_REF <number_2(D), ...>. I think BIT_FIELD_REF <VIEW_CONVERT (x), ...> could be always rewritten into BIT_FIELD_REF <x, ...>, but that's a separate thing. * gimplify.cc (gimplify_compound_lval): When there's a non-register type produced by one of the handled component operations make sure we get a non-register base.
2023-04-28 | tree-optimization/108752 - vectorize emulated vectors in lowered form | Richard Biener | 4 | -49/+125
The following makes sure to emit operations lowered to bit operations when vectorizing using emulated vectors. This avoids relying on the vector lowering pass adhering to the exact same cost considerations as the vectorizer. PR tree-optimization/108752 * tree-vect-generic.cc (build_replicated_const): Rename to build_replicated_int_cst and move to tree.{h,cc}. (do_plus_minus): Adjust. (do_negate): Likewise. * tree-vect-stmts.cc (vectorizable_operation): Emit emulated arithmetic vector operations in lowered form. * tree.h (build_replicated_int_cst): Declare. * tree.cc (build_replicated_int_cst): Moved from tree-vect-generic.cc build_replicated_const.
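"Lowered to bit operations" can be pictured with the classic word-parallel byte addition below. This is a generic sketch of the technique, not the code GCC generates; the function names are illustrative, with `replicate_u8` playing a role similar to the replicated-constant helper named in the ChangeLog.

```c
#include <stdint.h>

/* Replicate an 8-bit constant into every byte lane of a 32-bit word. */
uint32_t replicate_u8(uint8_t c)
{
    return (uint32_t)c * 0x01010101u;
}

/* Add four packed bytes without carries crossing lane boundaries. */
uint32_t add_u8x4(uint32_t a, uint32_t b)
{
    uint32_t low7 = replicate_u8(0x7f);
    uint32_t top  = replicate_u8(0x80);
    uint32_t sum  = (a & low7) + (b & low7);  /* carries stop at bit 7 */
    return sum ^ ((a ^ b) & top);             /* fix up the top bit per lane */
}
```

Masking off the top bit of each lane before the add guarantees that no carry leaves its byte; the final XOR restores the correct top bit of every lane.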
2023-04-28 | aarch64: PR target/99195 annotate more integer unary patterns for vec-concat with zero | Kyrylo Tkachov | 2 | -10/+32
More of the straightforward cases to annotate plus tests, this time for simple integer unary ops. Bootstrapped and tested on aarch64-none-linux-gnu and aarch64_be-none-elf. gcc/ChangeLog: PR target/99195 * config/aarch64/aarch64-simd.md (aarch64_rbit<mode>): Rename to... (aarch64_rbit<mode><vczle><vczbe>): ... This. (neg<mode>2): Rename to... (neg<mode>2<vczle><vczbe>): ... This. (abs<mode>2): Rename to... (abs<mode>2<vczle><vczbe>): ... This. (aarch64_abs<mode>): Rename to... (aarch64_abs<mode><vczle><vczbe>): ... This. (one_cmpl<mode>2): Rename to... (one_cmpl<mode>2<vczle><vczbe>): ... This. (clrsb<mode>2): Rename to... (clrsb<mode>2<vczle><vczbe>): ... This. (clz<mode>2): Rename to... (clz<mode>2<vczle><vczbe>): ... This. (popcount<mode>2): Rename to... (popcount<mode>2<vczle><vczbe>): ... This. gcc/testsuite/ChangeLog: PR target/99195 * gcc.target/aarch64/simd/pr99195_1.c: Add tests for unary integer ops.
2023-04-28 | Fortran: Fix (mostly) comment typos | Tobias Burnus | 22 | -53/+53
Only other changes are fixing the variable name a(b)breviated_modproc_decl and a few typos in gfortran.texi. gcc/fortran/ChangeLog: * gfortran.texi: Fix typos. * decl.cc: Fix typos in comments and in a variable name. * arith.cc: Fix comment typos. * check.cc: Likewise. * class.cc: Likewise. * dependency.cc: Likewise. * expr.cc: Likewise. * frontend-passes.cc: Likewise. * gfortran.h: Likewise. * intrinsic.cc: Likewise. * iresolve.cc: Likewise. * match.cc: Likewise. * module.cc: Likewise. * primary.cc: Likewise. * resolve.cc: Likewise. * simplify.cc: Likewise. * trans-array.cc: Likewise. * trans-decl.cc: Likewise. * trans-expr.cc: Likewise. * trans-intrinsic.cc: Likewise. * trans-openmp.cc: Likewise. * trans-stmt.cc: Likewise.
2023-04-28 | gimple-range-op: Handle sqrt (basic bounds only) | Jakub Jelinek | 3 | -1/+126
The following patch adds sqrt support (but similarly to sincos, only dumb basic ranges). Will improve this incrementally and sin/cos as well. 2023-04-28 Jakub Jelinek <jakub@redhat.com> * gimple-range-op.cc (class cfn_sqrt): New type. (op_cfn_sqrt): New variable. (gimple_range_op_handler::maybe_builtin_call): Handle CASE_CFN_SQRT{,_FN}. * gcc.dg/tree-ssa/range-sqrt.c: New test. * gfortran.dg/ieee/ieee_6.f90: Make x volatile to avoid ranger optimizing sqrt (-1) call away because it is only used in test for whether it returns NaN.
2023-04-28 | Implement range-op entry for sin/cos | Jakub Jelinek | 3 | -0/+143
On Tue, Apr 18, 2023 at 03:12:50PM +0200, Aldy Hernandez wrote: > [I don't know why I keep poking at floats. I must really like the pain. > > This is the range-op entry for sin/cos. It is meant to serve as an > example of what we can do for glibc math functions. It is by no means > exhaustive, just a stub to restrict the return range from sin/cos to > [-1.0, 1.0] with appropriate smarts of NANs. > > As can be seen in the testcase, we see sin() as well as > __builtin_sin() in the IL, and can resolve the resulting range > accordingly. Here is an updated version of the patch on top of the Add targetm.libm_function_max_error patch with all my comments incorporated into your patch (but still no handling of sin/cos ranges shorter than 2*M_PI). 2023-04-28 Aldy Hernandez <aldyh@redhat.com> Jakub Jelinek <jakub@redhat.com> * value-range.h (frange_nextafter): Declare. * gimple-range-op.cc (class cfn_sincos): New. (op_cfn_sin, op_cfn_cos): New variables. (gimple_range_op_handler::maybe_builtin_call): Handle CASE_CFN_{SIN,COS}{,_FN}. * gcc.dg/tree-ssa/range-sincos.c: New test.
2023-04-28 | Add targetm.libm_function_max_error | Jakub Jelinek | 15 | -1/+290
As has been discussed before, the following patch adds a target hook for math library function maximum errors measured in ulps. The default is to return ~0U which is a magic maximum value which means nothing is known about precision of the math function. The first argument is unsigned int because enum combined_fn isn't available everywhere where target hooks are included but is expected to be given the enum combined_fn value, although it should be used solely to find out which kind of math function (say sin vs. cos vs. sqrt vs. exp10) rather than its variant (f suffix, no suffix, l suffix, f128 suffix, ...), for which there is the machine_mode argument. The last argument is a bool, if it is false, the function should return maximum known error in ulps for a given function (taking -frounding-math into account if enabled), with 0.5ulps being represented as 0. If it is true, it is about whether the function can return values outside of an intrinsic finite range for the function and by how many ulps. E.g. sin/cos should return result in [-1.,1], if the function is expected to never return values outside of that finite interval, the hook should return 0. Similarly for sqrt such range is [-0.,+Inf]. The patch implements it for glibc only so far, I hope other maintainers can submit details for Solaris, musl, perhaps BSDs, etc.
For glibc I've gathered data from: 1) https://www.gnu.org/software/libc/manual/html_node/Errors-in-Math-Functions.html as latest published glibc data 2) https://www.gnu.org/software/libc/manual/2.22/html_node/Errors-in-Math-Functions.html as a few years old glibc data 3) using attached libc-ulps.sh script from glibc git 4) using attached ulp-tester.c (how to invoke in file comment; tested both x86_64, ppc64, ppc64le 50M pseudo-random values in all 4 rounding modes, plus on x86_64 float/double sin/cos using libmvec - see attached libmvec-wrapper.c as well) 5) using attached boundary-tester.c to test for whether sin/cos/sqrt return values outside of the intrinsic ranges for those functions (again, tested on x86_64, ppc64, ppc64le plus on x86_64 using libmvec as well; libmvec with non-default rounding modes is pretty much a random number generator it seems) The data is added to various hooks, the generic and generic glibc versions being in targhooks.cc so that the various targets can easily override it. The intent is that the generic glibc version handles most of the stuff and specific target arch overrides handle the outliers or special cases. The patch has special case for x86_64 when __FAST_MATH__ is defined (as one can use in that case either libm or libmvec and we don't know which one will be used; so it uses maximum of what libm provides and libmvec), rs6000 (had to add one because cosf has 3ulps on ppc* rather than 1-2ulps on most other targets; MODE_COMPOSITE_P could be in theory handled in the generic code too, but as we have rs6000-linux specific function, it can be done just there), arc-linux (because DFmode sin has 7ulps there compared to 1ulps on other targets, both in default rounding mode and in others) and or1k-linux (while DFmode sin has 1ulps there for default rounding mode, for other rounding modes it has up to 7ulps).
Now, for -frounding-math I'm trying to add a few ulps more because I expect it to be much less tested, except that for boundary_p I try to use the numbers I got from the 5) tester. 2023-04-28 Jakub Jelinek <jakub@redhat.com> * target.def (libm_function_max_error): New target hook. * doc/tm.texi.in (TARGET_LIBM_FUNCTION_MAX_ERROR): Add. * doc/tm.texi: Regenerated. * targhooks.h (default_libm_function_max_error, glibc_linux_libm_function_max_error): Declare. * targhooks.cc: Include case-cfn-macros.h. (default_libm_function_max_error, glibc_linux_libm_function_max_error): New functions. * config/linux.h (TARGET_LIBM_FUNCTION_MAX_ERROR): Redefine. * config/linux-protos.h (linux_libm_function_max_error): Declare. * config/linux.cc: Include target.h and targhooks.h. (linux_libm_function_max_error): New function. * config/arc/arc.cc: Include targhooks.h and case-cfn-macros.h. (arc_libm_function_max_error): New function. (TARGET_LIBM_FUNCTION_MAX_ERROR): Redefine. * config/i386/i386.cc (ix86_libc_has_fast_function): Formatting fix. (ix86_libm_function_max_error): New function. (TARGET_LIBM_FUNCTION_MAX_ERROR): Redefine. * config/rs6000/rs6000-protos.h (rs6000_linux_libm_function_max_error): Declare. * config/rs6000/rs6000-linux.cc: Include target.h, targhooks.h, tree.h and case-cfn-macros.h. (rs6000_linux_libm_function_max_error): New function. * config/rs6000/linux.h (TARGET_LIBM_FUNCTION_MAX_ERROR): Redefine. * config/rs6000/linux64.h (TARGET_LIBM_FUNCTION_MAX_ERROR): Redefine. * config/or1k/or1k.cc: Include targhooks.h and case-cfn-macros.h. (or1k_libm_function_max_error): New function. (TARGET_LIBM_FUNCTION_MAX_ERROR): Redefine.
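The hook contract described above can be sketched in plain C. This is only an illustration under stated assumptions, not the code added to targhooks.cc: the enum names and function names are hypothetical stand-ins for enum combined_fn, machine_mode and the real hook implementations. The only numbers used are the ones quoted in the message (DFmode sin: 1ulps generically, 7ulps on arc-linux), and ~0U means nothing is known.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for GCC's combined_fn and machine_mode. */
enum sketch_fn { SKETCH_FN_SIN, SKETCH_FN_COS, SKETCH_FN_SQRT };
enum sketch_mode { SKETCH_SFMODE, SKETCH_DFMODE };

/* Generic version: returns ~0U (nothing known) unless the case is
   one this sketch has a number for.  The first argument is unsigned,
   mirroring the hook signature described in the message.  */
unsigned
sketch_generic_max_error (unsigned fn, enum sketch_mode mode, bool boundary_p)
{
  if (!boundary_p && fn == SKETCH_FN_SIN && mode == SKETCH_DFMODE)
    return 1;   /* DFmode sin: 1ulps on most targets, per the message.  */
  return ~0U;   /* Unknown precision.  */
}

/* Target override: handles the outlier and defers everything else to
   the generic version, which is the layering the message describes.  */
unsigned
sketch_arc_max_error (unsigned fn, enum sketch_mode mode, bool boundary_p)
{
  if (!boundary_p && fn == SKETCH_FN_SIN && mode == SKETCH_DFMODE)
    return 7;   /* DFmode sin: 7ulps on arc-linux, per the message.  */
  return sketch_generic_max_error (fn, mode, boundary_p);
}
```

A caller would treat a ~0U result as "no usable bound" and only widen a computed range by the returned number of ulps when a smaller value comes back.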
2023-04-28testsuite/C++: suppress filename canonicalization in module testsJan Beulich6-6/+6
The pathname underneath gcm.cache/ is determined from the effective name used for the main input file of a particular module. When modules are built, no canonicalization occurs for the main input file. Hence the module file wouldn't be found if a different (the canonicalized) file name was used when importing that same module. (This is an effect of importing happening in the preprocessor, just like #include handling.) Since it doesn't look easy to make module generation use libcpp's maybe_shorter_path() (in fact I'd consider this a layering violation, while cloning the logic would - at least in principle - be prone to both going out of sync), simply suppress system header path canonicalization for the respective tests. gcc/testsuite/ * g++.dg/modules/alias-1_b.C: Add -fno-canonical-system-headers. * g++.dg/modules/alias-1_d.C: Likewise. * g++.dg/modules/alias-1_e.C: Likewise. * g++.dg/modules/alias-1_f.C: Likewise. * g++.dg/modules/cpp-6_c.C: Likewise. * g++.dg/modules/dir-only-2_b.C: Likewise.
2023-04-28testsuite/C++: cope with IPv6 being unavailableJan Beulich1-1/+1
When IPv6 is disabled in the kernel, the error message coming back from Cody::OpenInet6() differs from the only one expected so far. gcc/testsuite/ * g++.dg/modules/bad-mapper-3.C: Relax failure pattern.
2023-04-28harden-conditionals: detach values before comparesAlexandre Oliva2-10/+39
The optimization barriers inserted after compares enable GCC to derive information about the values from e.g. the taken paths, or the absence of exceptions. Move them before the original compares, so that the reversed compares test copies of the original operands, without further optimizations. for gcc/ChangeLog * gimple-harden-conditionals.cc (insert_edge_check_and_trap): Move detach value calls... (pass_harden_conditional_branches::execute): ... here. (pass_harden_compares::execute): Detach values before compares. for gcc/testsuite/ChangeLog * c-c++-common/torture/harden-cond-comp.c: New.
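The idea of testing detached copies with a reversed compare can be approximated by hand in C. This is a rough source-level sketch, not what the pass emits: the pass works on GIMPLE, and the helper names here are hypothetical. It assumes a GCC-compatible compiler for the empty inline asm and __builtin_trap.

```c
/* Optimization barrier: the empty asm claims to read and modify V, so
   the compiler cannot relate the returned copy to the original.  */
static inline int
detach_int (int v)
{
  __asm__ ("" : "+g" (v));
  return v;
}

/* Compute a < b, then re-check it with the reversed compare on copies
   that were detached before any compare was performed.  */
int
hardened_less (int a, int b)
{
  int a_copy = detach_int (a);
  int b_copy = detach_int (b);

  int result = a < b;
  int reversed = a_copy >= b_copy;

  /* Exactly one of the two compares must hold; otherwise trap.  */
  if (result == reversed)
    __builtin_trap ();
  return result;
}
```

Because the copies are taken before the original compare, nothing learned from the compare's outcome can be used to simplify the reversed test, which is the point of moving the barriers.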
2023-04-28Daily bump.GCC Administrator6-1/+186
2023-04-27Update gcc .po filesJoseph Myers19-37123/+37682
* be.po, da.po, de.po, el.po, es.po, fi.po, fr.po, hr.po, id.po, ja.po, nl.po, ru.po, sr.po, sv.po, tr.po, uk.po, vi.po, zh_CN.po, zh_TW.po: Update.
2023-04-27amdgcn: Fix addsub bugAndrew Stubbs1-8/+15
The vec_fmsubadd instruction actually had add twice, by mistake. Also improve code-gen for all the complex patterns by using properly undefined values. Mostly this just prevents the compiler reserving space in the stack frame. gcc/ChangeLog: * config/gcn/gcn-valu.md (cmul<conj_op><mode>3): Use gcn_gen_undef. (cml<addsub_as><mode>4): Likewise. (vec_addsub<mode>3): Likewise. (cadd<rot><mode>3): Likewise. (vec_fmaddsub<mode>4): Likewise. (vec_fmsubadd<mode>4): Likewise, and use sub for the odd lanes.
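A scalar reference for the corrected lane behaviour may help when reading the fix. This is a hypothetical helper, not code from gcn-valu.md: it assumes even lanes add the third operand and odd lanes subtract it, which is what "had add twice" and "use sub for the odd lanes" imply.

```c
#include <stddef.h>

/* Scalar model of the fixed vec_fmsubadd: multiply the first two
   operands, then add the third on even lanes and subtract it on odd
   lanes.  The buggy version effectively added on every lane.  */
void
sketch_fmsubadd (float *out, const float *a, const float *b,
                 const float *c, size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      float product = a[i] * b[i];
      out[i] = (i % 2 == 0) ? product + c[i] : product - c[i];
    }
}
```

With a = {1, 2, 3, 4}, b = {2, 2, 2, 2} and c = {1, 1, 1, 1}, the lanes come out as {3, 3, 7, 7}.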
2023-04-27c++: print conversion error at candidate locationJason Merrill2-1/+3
In testcases like this one, the printing of candidates in a diagnostic has been longer than necessary because it jumps back and forth between the call site and the candidate site. So here, we first say at the call site that no match was found; then we note the candidate site, and then explain why it's not suitable back at the call site, which means printing the call site line with caret again. With this patch, the conversion diagnostic is at the same location as the candidate, so we don't need to print any input line. gcc/cp/ChangeLog: * call.cc (print_conversion_rejection): Use iloc_sentinel. gcc/testsuite/ChangeLog: * g++.dg/template/copy1.C: Adjust error lines.
2023-04-27RISC-V: Add required tls to read thread pointer testPan Li1-0/+1
The read-thread-pointer test may require gcc to be configured with --enable-tls. If it is not, the x4 (aka tp) register will not be present in the assembly code. This patch requires tls for the dg checking. It will perform the test checking with --enable-tls and mark the test as unsupported with --disable-tls. Configured with --enable-tls: === gcc Summary === # of expected passes 16 Configured with --disable-tls: === gcc Summary === # of unsupported tests 8 gcc/testsuite/ChangeLog: * gcc.target/riscv/read-thread-pointer.c: Add required tls. Signed-off-by: Pan Li <pan2.li@intel.com>
2023-04-27PHIOPT: Allow MIN/MAX to have up to 2 MIN/MAX expressions for early phioptAndrew Pinski1-1/+15
In early PHIOPT mode, the original minmax_replacement would replace a PHI node with up to 2 min/max expressions in some cases; this patch allows for that too. OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions. gcc/ChangeLog: * tree-ssa-phiopt.cc (phiopt_early_allow): Allow for up to 2 min/max expressions in the sequence/match code.
2023-04-27MIN/MAX should be treated similar as comparisons for trappingAndrew Pinski2-0/+6
While looking into moving optimizations from minmax_replacement in phiopt to match.pd, I noticed that min/max were considered trapping even if -ffinite-math-only was being used. This changes those expressions to be treated like comparisons so that they are not considered trapping if -ffinite-math-only is on. OK? Bootstrapped and tested with no regressions on x86_64-linux-gnu. gcc/ChangeLog: * rtlanal.cc (may_trap_p_1): Treat SMIN/SMAX similar as COMPARISON. * tree-eh.cc (operation_could_trap_helper_p): Treat MIN_EXPR/MAX_EXPR similar as other comparisons.
2023-04-27PHIOPT: Move store_elim_worker into pass_cselim::executeAndrew Pinski1-131/+119
This simple patch moves the body of store_elim_worker directly into pass_cselim::execute. It also removes unneeded prototypes. OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions. gcc/ChangeLog: * tree-ssa-phiopt.cc (cond_store_replacement): Remove prototype. (cond_if_else_store_replacement): Likewise. (get_non_trapping): Likewise. (store_elim_worker): Move into ... (pass_cselim::execute): This.
2023-04-27PHIOPT: Rename tree_ssa_phiopt_worker to pass_phiopt::executeAndrew Pinski1-204/+181
Now that store elimination and phiopt do not share outer code, we can move tree_ssa_phiopt_worker directly into pass_phiopt::execute and remove many declarations (prototypes) from the file. gcc/ChangeLog: * tree-ssa-phiopt.cc (two_value_replacement): Remove prototype. (match_simplify_replacement): Likewise. (factor_out_conditional_conversion): Likewise. (value_replacement): Likewise. (minmax_replacement): Likewise. (spaceship_replacement): Likewise. (cond_removal_in_builtin_zero_pattern): Likewise. (hoist_adjacent_loads): Likewise. (tree_ssa_phiopt_worker): Move into ... (pass_phiopt::execute): this.
2023-04-27PHIOPT: Split out store elimination from phioptAndrew Pinski1-54/+126
Since the last cleanups, it became easier to see that we should split out the store elimination worker from the tree_ssa_phiopt_worker function. OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions. gcc/ChangeLog: * tree-ssa-phiopt.cc (tree_ssa_phiopt_worker): Remove do_store_elim argument and split that part out to ... (store_elim_worker): This new function. (pass_cselim::execute): Call store_elim_worker. (pass_phiopt::execute): Update call to tree_ssa_phiopt_worker.
2023-04-27Unloop loops that no longer loops in tree-ssa-loop-chJan Hubicka3-14/+43
I noticed this after adding a sanity check that the upper bound on the number of iterations never drops to -1. It seems to be a relatively common case (happening a few hundred times in the testsuite and also during bootstrap) that loop-ch duplicates enough that the loop itself no longer loops. This is later detected in loop unrolling, but since we test the number of iterations anyway it seems better to do that earlier. * cfgloopmanip.h (unloop_loops): Export. * tree-ssa-loop-ch.cc (ch_base::copy_headers): Unloop loops that no longer loop. * tree-ssa-loop-ivcanon.cc (unloop_loops): Export; do not free vectors of loops to unloop. (canonicalize_induction_variables): Free vectors here. (tree_unroll_loops_completely): Free vectors here.
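One hypothetical shape of a loop that stops looping once its header is copied is shown below. This is a hand-written source-level illustration, not output of the pass, and the function names are invented. After the header test is duplicated in front of the loop, the remaining back-edge test is known to be false, so the loop structure can be dropped.

```c
/* Original shape: the body can run at most once, because the second
   half of the condition fails as soon as i reaches 1.  */
int
first_or_zero_loop (const int *p, int n)
{
  int sum = 0;
  for (int i = 0; i < n && i < 1; i++)
    sum += p[i];
  return sum;
}

/* Equivalent code once the copied header guards the body and the
   never-taken back edge has been removed ("unlooped").  */
int
first_or_zero_unlooped (const int *p, int n)
{
  int sum = 0;
  if (n > 0)
    sum += p[0];
  return sum;
}
```

Both functions return p[0] when n is positive and 0 otherwise; the second simply no longer contains a loop for later passes to analyse.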
2023-04-27tree-optimization/109170 - bogus use-after-free with __builtin_expectRichard Biener2-8/+13
The following generalizes the range-op for __builtin_expect by using the fnspec machinery. PR tree-optimization/109170 * gimple-range-op.cc (gimple_range_op_handler::maybe_builtin_call): Handle __builtin_expect and similar via cfn_pass_through_arg1 and inspecting the call's fnspec. * builtins.cc (builtin_fnspec): Handle BUILT_IN_EXPECT and BUILT_IN_EXPECT_WITH_PROBABILITY.
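The property this relies on is that __builtin_expect returns the value of its first argument, so anything known about that argument's range also holds for the result. A minimal sketch, assuming a GCC-compatible compiler and a hypothetical wrapper name:

```c
/* __builtin_expect only carries a branch-prediction hint; its result
   is the value of the first argument, unchanged.  */
long
expect_passthrough (long value)
{
  return __builtin_expect (value, 0);
}
```

So expect_passthrough (42) yields 42 even though the hint says 0 is the expected value; the hint never alters the result.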
2023-04-27Use CONFIG_SHELL-/bin/sh in genmultilibAlexandre Oliva1-15/+15
There are still shells on some systems that lack the ability to start scripts when not using the shell name explicitly. Adjust genmultilib to use ${CONFIG_SHELL-/bin/sh} the same way configure does. for gcc/ChangeLog * genmultilib: Use CONFIG_SHELL to run sub-scripts.