|
Let
int8_t x = 127;
This DR says that while
x++;
invokes UB,
++x;
does not. The resolution was to make the first one valid. The
following test verifies that we don't report any errors in a constexpr
context.
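As a rough illustration of the case the DR covers (a sketch only, not the
contents of dr2855.C; the helper names are made up here), both forms can be
evaluated in a constant expression and compared:

```cpp
#include <cstdint>

// Hypothetical helpers for illustration, not taken from dr2855.C.
constexpr int post_increment()
{
  std::int8_t x = 127;
  x++;   // The form that DR 2855 makes valid.
  return x;
}

constexpr int pre_increment()
{
  std::int8_t x = 127;
  ++x;   // Computed in int after promotion, then converted back.
  return x;
}

// Both forms must be accepted in a constexpr context and agree.
static_assert(post_increment() == pre_increment(),
              "x++ and ++x should produce the same value");
```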
DR 2855
gcc/testsuite/ChangeLog:
* g++.dg/DRs/dr2855.C: New test.
|
|
We had an issue when expanding via cmo-zero for RV32.
This was fixed upstream, but we don't have an RV32 test.
Therefore, this patch introduces such a test.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/cmo-zicboz-zic64-1.c: Fix for rv32.
Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
|
|
Use UZP1 instead of INS when combining low and high halves of vectors.
UZP1 has 3 operands which improves register allocation, and is faster on
some microarchitectures.
gcc:
* config/aarch64/aarch64-simd.md (aarch64_combine_internal<mode>):
Use UZP1 instead of INS.
(aarch64_combine_internal_be<mode>): Likewise.
gcc/testsuite:
* gcc.target/aarch64/ldp_stp_16.c: Update to check for UZP1.
* gcc.target/aarch64/pr109072_1.c: Likewise.
* gcc.target/aarch64/vec-init-14.c: Likewise.
* gcc.target/aarch64/vec-init-9.c: Likewise.
|
|
While building more testcases for ipa-icf I noticed that there are two places
in aliasing code where we still compare TYPE_MAIN_VARIANT for pointer equality.
This is not a good idea for LTO since type merging may not happen, for example
when in one unit the pointed-to type is forward declared while in the other it
is fully defined. We have same_type_for_tbaa for that.
Bootstrapped/regtested x86_64-linux, OK?
gcc/ChangeLog:
* alias.cc (reference_alias_ptr_type_1): Use view_converted_memref_p.
* alias.h (view_converted_memref_p): Declare.
* tree-ssa-alias.cc (view_converted_memref_p): Export.
(ao_compare::compare_ao_refs): Use same_type_for_tbaa.
|
|
gcc.dg/ipa/ipa-icf-38.c currently FAILs on Solaris (SPARC and x86, 32
and 64-bit):
FAIL: gcc.dg/ipa/ipa-icf-38.c scan-ltrans-tree-dump-not optimized "Function bar"
As it turns out, this only happens when the Solaris linker is used; with
GNU ld the test PASSes just fine. In fact, that happens because gld
supports the lto-plugin while ld does not: in a Solaris build with gld,
the test FAILs the same way as with ld when -fno-use-linker-plugin is
passed, so this patch requires linker_plugin.
Tested on i386-pc-solaris2.11 (ld and gld) and x86_64-pc-linux-gnu.
2024-05-15 Rainer Orth <ro@CeBiTec.Uni-Bielefeld.DE>
gcc/testsuite:
PR ipa/85656
* gcc.dg/ipa/ipa-icf-38.c: Require linker_plugin.
|
|
g++.target/i386/pr97054.C currently FAILs on 64-bit Solaris/x86:
FAIL: g++.target/i386/pr97054.C -std=gnu++14 (test for excess errors)
UNRESOLVED: g++.target/i386/pr97054.C -std=gnu++14 compilation failed to produce executable
FAIL: g++.target/i386/pr97054.C -std=gnu++17 (test for excess errors)
UNRESOLVED: g++.target/i386/pr97054.C -std=gnu++17 compilation failed to produce executable
FAIL: g++.target/i386/pr97054.C -std=gnu++2a (test for excess errors)
UNRESOLVED: g++.target/i386/pr97054.C -std=gnu++2a compilation failed to produce executable
FAIL: g++.target/i386/pr97054.C -std=gnu++98 (test for excess errors)
UNRESOLVED: g++.target/i386/pr97054.C -std=gnu++98 compilation failed to produce executable
Excess errors:
/vol/gcc/src/hg/master/local/gcc/testsuite/g++.target/i386/pr97054.C:49:20: error: frame pointer required, but reserved
Since Solaris/x86 defaults to -fno-omit-frame-pointer, this patch
explicitly builds with -fomit-frame-pointer as is the default on other
x86 targets.
Tested on i386-pc-solaris2.11 (32 and 64-bit) and x86_64-pc-linux-gnu.
2024-05-15 Rainer Orth <ro@CeBiTec.Uni-Bielefeld.DE>
gcc/testsuite:
* g++.target/i386/pr97054.C (dg-options): Add -fomit-frame-pointer.
|
|
The current implementation of riscv_block_move_straight() emits a couple
of loads/stores with maximum width (e.g. 8-byte for RV64).
The remainder is handed over to move_by_pieces().
The by-pieces framework utilizes target hooks to decide about the emitted
instructions (e.g. unaligned accesses or overlapping accesses).
Since the current implementation will always request less than XLEN bytes
to be handled by the by-pieces infrastructure, it is impossible that
overlapping memory accesses can ever be emitted (the by-pieces code does
not know of any previous instructions that were emitted by the backend).
This patch changes the implementation of riscv_block_move_straight()
such that it utilizes the by-pieces framework if the remaining data
is less than 2*XLEN bytes, which is sufficient to enable overlapping
memory accesses (if the requirements for them are met).
The changes in the expansion can be seen in the adjustments of the
cpymem-NN-ooo test cases. The changes in the cpymem-NN tests are
caused by the different instruction ordering of the code emitted
by the by-pieces infrastructure, which emits alternating load/store
sequences.
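To illustrate what an overlapping access buys, here is a portable sketch
(not the backend code; the helper name and the fixed 8-byte width standing in
for XLEN on RV64 are assumptions) that copies between 8 and 16 bytes with
two word-sized accesses instead of a chain of narrower ones:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Copy n bytes, where 8 <= n <= 16, using two 8-byte accesses.
// When n < 16 the second access overlaps the first one.
// Hypothetical helper; 8 bytes stands in for XLEN on RV64.
inline void copy_with_overlap(void *dst, const void *src, std::size_t n)
{
  const unsigned char *s = static_cast<const unsigned char *>(src);
  unsigned char *d = static_cast<unsigned char *>(dst);
  std::uint64_t head, tail;

  std::memcpy(&head, s, 8);          // First word-sized load.
  std::memcpy(&tail, s + n - 8, 8);  // Last word-sized load, may overlap.
  std::memcpy(d, &head, 8);          // First word-sized store.
  std::memcpy(d + n - 8, &tail, 8);  // Last word-sized store, may overlap.
}
```

For n = 11 this needs two loads and two stores, whereas a non-overlapping
sequence would need an 8-byte, a 2-byte and a 1-byte access in each direction.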
gcc/ChangeLog:
* config/riscv/riscv-string.cc (riscv_block_move_straight):
Hand over up to 2xXLEN bytes to move_by_pieces().
gcc/testsuite/ChangeLog:
* gcc.target/riscv/cpymem-32-ooo.c: Adjustments for overlapping
access.
* gcc.target/riscv/cpymem-32.c: Adjustments for code emitted by
by-pieces.
* gcc.target/riscv/cpymem-64-ooo.c: Adjustments for overlapping
access.
* gcc.target/riscv/cpymem-64.c: Adjustments for code emitted by
by-pieces.
Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
|
|
A recent patch added the field overlap_op_by_pieces to the struct
riscv_tune_param, which is used by the TARGET_OVERLAP_OP_BY_PIECES_P()
hook. This hook is used by the by-pieces infrastructure to decide
if overlapping memory accesses should be emitted.
The changes in the expansion can be seen in the adjustments of the
cpymem test cases. These tests also reveal a limitation in the
RISC-V cpymem expansion that prevents this optimization as only
by-pieces cpymem expansions emit overlapping memory accesses.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/cpymem-32-ooo.c: Adjust for overlapping
access.
* gcc.target/riscv/cpymem-64-ooo.c: Likewise.
Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
|
|
The RISC-V cpymemsi expansion is called whenever the by-pieces
infrastructure will not take care of the builtin expansion.
The code emitted by the by-pieces infrastructure may include
unaligned accesses if riscv_slow_unaligned_access_p is false.
The RISC-V cpymemsi expansion is handled via riscv_expand_block_move().
The current implementation of this function does not check
riscv_slow_unaligned_access_p and never emits unaligned accesses.
Since by-pieces emits unaligned accesses, it is reasonable to implement
the same behaviour in the cpymemsi expansion. And that's what this patch
is doing.
The patch checks riscv_slow_unaligned_access_p at the entry and sets
the allowed alignment accordingly. This alignment is then propagated
down to the routines that emit the actual instructions.
The changes introduced by this patch can be seen in the adjustments
of the cpymem tests.
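The alignment decision described above can be modelled roughly as follows.
This is a simplified sketch with hypothetical names and plain integers; the
real code works on GCC-internal types and may differ in detail:

```cpp
// Hypothetical model: returns the alignment (in bits) that the expansion
// may assume for the memory accesses it emits.
constexpr unsigned allowed_align_bits(bool slow_unaligned_access,
                                      unsigned known_align_bits,
                                      unsigned bits_per_word)
{
  // Fast unaligned access: word-sized accesses are fine regardless of
  // the alignment known for the operands.
  if (!slow_unaligned_access)
    return bits_per_word;
  // Otherwise stay within the known alignment, capped at one word.
  return known_align_bits < bits_per_word ? known_align_bits : bits_per_word;
}

static_assert(allowed_align_bits(false, 8, 64) == 64,
              "fast unaligned access allows word-sized accesses");
static_assert(allowed_align_bits(true, 8, 64) == 8,
              "slow unaligned access keeps the known alignment");
```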
gcc/ChangeLog:
* config/riscv/riscv-string.cc (riscv_block_move_straight): Add
parameter align.
(riscv_adjust_block_mem): Replace parameter length by align.
(riscv_block_move_loop): Add parameter align.
(riscv_expand_block_move_scalar): Set alignment properly if the
target has fast unaligned access.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/cpymem-32-ooo.c: Adjust for unaligned access.
* gcc.target/riscv/cpymem-64-ooo.c: Likewise.
Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
|
|
We have two mechanisms in the RISC-V backend that expand
cpymem pattern: a) by-pieces, b) riscv_expand_block_move()
in riscv-string.cc. The by-pieces framework has higher priority
and emits a sequence of up to 15 instructions
(see use_by_pieces_infrastructure_p() for more details).
As a rule-of-thumb, by-pieces emits alternating load/store sequences
and the cpymem expansion in the backend emits a sequence of loads
followed by a sequence of stores.
Let's add some test cases to document the current behaviour
and to have tests to identify regressions.
Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
gcc/testsuite/ChangeLog:
* gcc.target/riscv/cpymem-32-ooo.c: New test.
* gcc.target/riscv/cpymem-32.c: New test.
* gcc.target/riscv/cpymem-64-ooo.c: New test.
* gcc.target/riscv/cpymem-64.c: New test.
|
|
The pointers_handled_p() method is an internal range-op helper to help
catch dispatch type mismatches for pointer operands. This is what
caught the IPA mismatch in PR114985.
This method is only a temporary measure to catch any incompatibilities
in the current pointer range-op entries. This patch returns true for
any *new* entries in the range-op table, as the current ones are
already fleshed out. This keeps us from having to implement this
boilerplate function for any new range-op entries.
PR tree-optimization/114995
* range-op-ptr.cc (range_operator::pointers_handled_p): Default to true.
|
|
When I was checking to make sure that all of the bugs dealing
with the case where gimple_can_duplicate_bb_p would return false were fixed,
I noticed that the code which was checking if a call statement was
returns_twice was checking all call statements rather than just the
last statement. Since calling gimple_call_flags has a small non-zero
overhead due to a few string comparisons, removing the uses of it
can give a small performance improvement. A returns_twice
function call will always end the basic block due to the check in
stmt_can_terminate_bb_p (and others). So checking only the last statement
is a small optimization and will be safe.
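The idea can be shown on a toy model of a basic block. The Stmt type and
both helper names below are invented for this sketch and are much simpler
than GCC's gimple statements:

```cpp
#include <vector>

// Toy statement model; GCC's gimple statements are far richer.
struct Stmt
{
  bool is_call;
  bool returns_twice;
};

// Before: inspect every call statement in the block.
inline bool has_returns_twice_all(const std::vector<Stmt> &bb)
{
  for (const Stmt &s : bb)
    if (s.is_call && s.returns_twice)
      return true;
  return false;
}

// After: a returns_twice call always ends its block, so only the last
// statement needs to be inspected.
inline bool has_returns_twice_last(const std::vector<Stmt> &bb)
{
  return !bb.empty() && bb.back().is_call && bb.back().returns_twice;
}
```

Both helpers agree as long as the invariant holds that a returns_twice call
can only be the last statement of a block.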
Bootstrapped and tested on x86_64-linux-gnu with no regressions.
PR tree-optimization/114301
gcc/ChangeLog:
* tree-cfg.cc (gimple_can_duplicate_bb_p): Check returns_twice
only on the last call statement rather than all.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
|
|
I should have double-checked the CI system before pushing Christoph's patches
for memset-zero. While I thought I'd checked CI state, I must have been
looking at the wrong patch from Christoph.
Anyway, this fixes the rv32 ICEs and disables one of the tests for rv32.
The test would need a revamp for rv32 as the expected output is all rv64 code
using "sd" instructions. I'm just not vested deeply enough into rv32 to adjust
the test to work in that environment though it should be fairly trivial to copy
the test and provide new expected output if someone cares enough.
Verified this fixes the rv32 failures in my tester:
> New tests that FAIL (6 tests):
>
> unix/-march=rv32gcv: gcc: gcc.target/riscv/cmo-zicboz-zic64-1.c -O1 (internal compiler error: in extract_insn, at recog.cc:2812)
> unix/-march=rv32gcv: gcc: gcc.target/riscv/cmo-zicboz-zic64-1.c -O1 (test for excess errors)
> unix/-march=rv32gcv: gcc: gcc.target/riscv/cmo-zicboz-zic64-1.c -O2 (internal compiler error: in extract_insn, at recog.cc:2812)
> unix/-march=rv32gcv: gcc: gcc.target/riscv/cmo-zicboz-zic64-1.c -O2 (test for excess errors)
> unix/-march=rv32gcv: gcc: gcc.target/riscv/cmo-zicboz-zic64-1.c -O3 -g (internal compiler error: in extract_insn, at recog.cc:2812)
> unix/-march=rv32gcv: gcc: gcc.target/riscv/cmo-zicboz-zic64-1.c -O3 -g (test for excess errors)
And after the ICE is fixed, these are eliminated by only running the test for
rv64:
> New tests that FAIL (3 tests):
>
> unix/-march=rv32gcv: gcc: gcc.target/riscv/cmo-zicboz-zic64-1.c -O1 check-function-bodies clear_buf_123
> unix/-march=rv32gcv: gcc: gcc.target/riscv/cmo-zicboz-zic64-1.c -O2 check-function-bodies clear_buf_123
> unix/-march=rv32gcv: gcc: gcc.target/riscv/cmo-zicboz-zic64-1.c -O3 -g check-function-bodies clear_buf_123
gcc/
* config/riscv/riscv-string.cc
(riscv_expand_block_clear_zicboz_zic64b): Handle rv32 correctly.
gcc/testsuite
* gcc.target/riscv/cmo-zicboz-zic64-1.c: Don't run on rv32.
|
|
ix86_expand_vec_perm_const_1 [PR107563]
Hi All
We've introduced a new subroutine in ix86_expand_vec_perm_const_1
to optimize vector shifting for the V16QI type on x86.
This patch uses a three-instruction sequence psrlw, psllw, and por
to handle specific vector shuffle operations more efficiently.
The change aims to improve assembly code generation for configurations
supporting SSE2.
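For illustration, here is a scalar model of what a psrlw/psllw/por sequence
with a shift count of 8 computes, assuming the shuffle in question swaps the
two bytes of every 16-bit lane. The type alias and function name are made up
for this sketch, and it does not use the actual SSE2 instructions:

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// A 128-bit vector viewed as eight 16-bit lanes.
using Lanes = std::array<std::uint16_t, 8>;

// Scalar model of psrlw $8 / psllw $8 / por applied to each lane.
inline Lanes swap_bytes_in_words(const Lanes &v)
{
  Lanes out{};
  for (std::size_t i = 0; i < v.size(); ++i)
    {
      std::uint16_t hi_to_lo = static_cast<std::uint16_t>(v[i] >> 8);  // psrlw
      std::uint16_t lo_to_hi = static_cast<std::uint16_t>(v[i] << 8);  // psllw
      out[i] = static_cast<std::uint16_t>(hi_to_lo | lo_to_hi);        // por
    }
  return out;
}
```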
Bootstrapped and tested on x86_64-linux-gnu, OK for trunk?
Best
Levy
gcc/ChangeLog:
PR target/107563
* config/i386/i386-expand.cc (expand_vec_perm_psrlw_psllw_por): New
subroutine.
(ix86_expand_vec_perm_const_1): Call expand_vec_perm_psrlw_psllw_por.
gcc/testsuite/ChangeLog:
PR target/107563
* g++.target/i386/pr107563-a.C: New test.
* g++.target/i386/pr107563-b.C: New test.
|
|
r14-4111-g6e92a6a2a72d3b made us check non-dependent simple assignment
expressions ahead of time and give them a type, as was already done for
compound assignments. Unlike for compound assignments however, if a
simple assignment resolves to an operator overload we represent it as a
(typed) MODOP_EXPR instead of a CALL_EXPR to the selected overload.
(I reckoned this was at worst a pessimization -- we'll just have to repeat
overload resolution at instantiation time.)
But this turns out to break the below testcase ultimately because
MODOP_EXPR (of non-reference type) is always treated as an lvalue
according to lvalue_kind, which is incorrect for the MODOP_EXPR
representing x=42.
We can fix this by representing such class assignment expressions as
CALL_EXPRs as well, but this turns out to require some tweaking of our
-Wparentheses warning logic and may introduce other fallout making it
unsuitable for backporting.
So this patch instead fixes lvalue_kind to consider the type of a
MODOP_EXPR representing a class assignment.
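A small sketch of the kind of code affected (illustrative only; this is not
the contents of non-dependent32.C and all names are made up): the class
assignment returns by value, so x = 42 is a prvalue and must select the
rvalue overload even inside a template.

```cpp
// Illustrative only; not the contents of non-dependent32.C.
struct A
{
  // Returns by value, so the expression x = 42 is a prvalue.
  A operator=(int) { return *this; }
};

inline int category(A &)  { return 0; }  // Chosen for lvalues.
inline int category(A &&) { return 1; }  // Chosen for rvalues.

template<class T>
int assign_category()
{
  A x;
  // Non-dependent class assignment inside a template.
  return category(x = 42);
}
```

With the value category computed correctly, assign_category<int>() selects
the A&& overload.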
PR c++/114994
gcc/cp/ChangeLog:
* tree.cc (lvalue_kind) <case MODOP_EXPR>: For a class
assignment, consider the result type.
gcc/testsuite/ChangeLog:
* g++.dg/template/non-dependent32.C: New test.
Reviewed-by: Jason Merrill <jason@redhat.com>
|
|
So this patch allows us to eliminate a redundant AND in some shift-add
style sequences. I think the testcase was reduced from xz by the RAU
team, but I'm not highly confident of that.
Specifically the AND is masking off the upper 32 bits of the un-shifted
value and there's an outer SIGN_EXTEND from SI to DI. However in the
RTL it's working on the post-shifted value, so the constant is left
shifted, so we have to account for that in the pattern's condition.
We can just drop the AND in this case. So instead we do a 64bit shift,
then a sign extending ADD utilizing the low part of that 64bit shift result.
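The redundancy can be checked with a small model of the two computations.
This is only a sketch under the description above: the function names are
invented, and the shift count is assumed to be smaller than 32.

```cpp
#include <cstdint>

// Sign-extend the low 32 bits of v to 64 bits (SIGN_EXTEND from SI to DI).
constexpr std::int64_t sext32(std::uint64_t v)
{
  return static_cast<std::int64_t>((v & 0xffffffffu) ^ 0x80000000u)
         - 0x80000000LL;
}

// Shift-add with the AND; the mask is pre-shifted as it appears in the RTL.
constexpr std::int64_t shift_add_masked(std::uint64_t x, std::uint64_t y,
                                        unsigned shift)
{
  return sext32(((x << shift) & (0xffffffffULL << shift)) + y);
}

// The same computation with the AND dropped.
constexpr std::int64_t shift_add_plain(std::uint64_t x, std::uint64_t y,
                                       unsigned shift)
{
  return sext32((x << shift) + y);
}

static_assert(shift_add_masked(0x123456789abcdef0ULL, 5, 3)
              == shift_add_plain(0x123456789abcdef0ULL, 5, 3),
              "the AND does not change the sign-extended low 32 bits");
```

The mask only clears bits at position 32 + shift and above, and those never
reach the low 32 bits that the sign extension consumes.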
This has run through Ventana's CI as well as my own. I'll wait for it
to run through the larger CI system before pushing.
Jeff
gcc/
* config/riscv/riscv.md: Add pattern for sign extended shift-add
sequence with a masked input.
gcc/testsuite
* gcc.target/riscv/shift-add-2.c: New test.
|
|
|
|
We currently ICE upon the following invalid snippet because we fail to
properly handle tsubst_arg_types returning error_mark_node in
build_deduction_guide.
== cut ==
template<class... Ts, class>
struct A { A(Ts...); };
A a;
== cut ==
This patch fixes this, and has been successfully tested on x86_64-pc-linux-gnu.
PR c++/105760
gcc/cp/ChangeLog:
* pt.cc (build_deduction_guide): Check for error_mark_node
result from tsubst_arg_types.
gcc/testsuite/ChangeLog:
* g++.dg/parse/error66.C: New test.
|
|
gcc/cp/ChangeLog:
* decl.cc (wrap_cleanups_r): Clarify comment.
* init.cc (build_vec_init): Update comment.
|
|
Commit r15-436-g44e7855e did not fix PR115013 for PRU because
SMALL_REGISTER_CLASS_P is not returning an accurate value for the PRU
backend.
Word mode for PRU backend is defined as 8-bit, yet all ALU operations
are preferred in 32-bit mode. Thus checking whether a register class
contains a single word_mode register would not classify the actually
single SImode register classes as small. This affected the
multiplication source and destination register classes.
Fix by implementing TARGET_CLASS_LIKELY_SPILLED_P to treat all register
classes with SImode or smaller size as likely spilled. This in turn
corrects the behaviour of SMALL_REGISTER_CLASS_P for PRU.
PR rtl-optimization/115013
gcc/ChangeLog:
* config/pru/pru.cc (pru_class_likely_spilled_p): Implement
to mark classes containing one SImode register as likely
spilled.
(TARGET_CLASS_LIKELY_SPILLED_P): Define.
Signed-off-by: Dimitar Dimitrov <dimitar@dinux.eu>
|
|
... if the constant can be represented as sum of two S12 values.
The two S12 values could instead be fused with subsequent ADD insn.
This helps
- avoid an additional LUI insn
- side benefits of not clobbering a reg
e.g.
                      | w/o patch           | w/ patch
long                  |                     |
plus(unsigned long i) | li   a5,4096        |
{                     | addi a5,a5,-2032    | addi a0, a0, 2047
  return i + 2064;    | add  a0,a0,a5       | addi a0, a0, 17
}                     | ret                 | ret
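The split itself is simple arithmetic. A minimal sketch follows; the struct
and function names are hypothetical, and the range accepted by the actual
predicate in the patch may be narrower:

```cpp
// Hypothetical result type for this sketch.
struct S12Pair
{
  bool ok;      // True if the constant was split into two S12 values.
  long first;   // Immediate of the first ADDI.
  long second;  // Immediate of the second ADDI.
};

// S12: signed 12-bit immediate as used by ADDI.
constexpr bool fits_s12(long v) { return v >= -2048 && v <= 2047; }

// Split c into two S12 immediates when c itself does not fit in one.
constexpr S12Pair split_sum_of_two_s12(long c)
{
  if (fits_s12(c))
    return {false, 0, 0};              // A single ADDI suffices.
  if (c > 0 && c <= 2 * 2047)
    return {true, 2047, c - 2047};
  if (c < 0 && c >= 2 * -2048)
    return {true, -2048, c + 2048};
  return {false, 0, 0};                // Out of range: needs LUI + ADDI.
}

static_assert(split_sum_of_two_s12(2064).first == 2047
              && split_sum_of_two_s12(2064).second == 17,
              "2064 = 2047 + 17, as in the example above");
```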
NOTE: In theory not having the constant in a standalone reg might seem less
CSE friendly, but for the workloads in consideration these materializations
come from very late LRA reloads, and follow-on GCSE is not doing much
currently.
The real benefit however is seen in base+offset computation for array
accesses and especially for stack accesses which are finalized late in
optim pipeline, during LRA register allocation. Often the finalized
offsets trigger LRA reloads resulting in mind boggling repetition of
exact same insn sequence including LUI based constant materialization.
This shaves off 290 billion dynamic instructions (QEMU icounts) in the
SPEC 2017 Cactu benchmark, which is over 10% of the workload. In the rest of
the suite, there is an additional 10 billion shaved, with both gains and
losses in individual workloads as is usual with compiler changes.
500.perlbench_r-0 | 1,214,534,029,025 | 1,212,887,959,387 |
500.perlbench_r-1 | 740,383,419,739 | 739,280,308,163 |
500.perlbench_r-2 | 692,074,638,817 | 691,118,734,547 |
502.gcc_r-0 | 190,820,141,435 | 190,857,065,988 |
502.gcc_r-1 | 225,747,660,839 | 225,809,444,357 | <- -0.02%
502.gcc_r-2 | 220,370,089,641 | 220,406,367,876 | <- -0.03%
502.gcc_r-3 | 179,111,460,458 | 179,135,609,723 | <- -0.02%
502.gcc_r-4 | 219,301,546,340 | 219,320,416,956 | <- -0.01%
503.bwaves_r-0 | 278,733,324,691 | 278,733,323,575 | <- -0.01%
503.bwaves_r-1 | 442,397,521,282 | 442,397,519,616 |
503.bwaves_r-2 | 344,112,218,206 | 344,112,216,760 |
503.bwaves_r-3 | 417,561,469,153 | 417,561,467,597 |
505.mcf_r | 669,319,257,525 | 669,318,763,084 |
507.cactuBSSN_r | 2,852,767,394,456 | 2,564,736,063,742 | <+ 10.10%
508.namd_r | 1,855,884,342,110 | 1,855,881,110,934 |
510.parest_r | 1,654,525,521,053 | 1,654,402,859,174 |
511.povray_r | 2,990,146,655,619 | 2,990,060,324,589 |
519.lbm_r | 1,158,337,294,525 | 1,158,337,294,529 |
520.omnetpp_r | 1,021,765,791,283 | 1,026,165,661,394 |
521.wrf_r | 1,715,955,652,503 | 1,714,352,737,385 |
523.xalancbmk_r | 849,846,008,075 | 849,836,851,752 |
525.x264_r-0 | 277,801,762,763 | 277,488,776,427 |
525.x264_r-1 | 927,281,789,540 | 926,751,516,742 |
525.x264_r-2 | 915,352,631,375 | 914,667,785,953 |
526.blender_r | 1,652,839,180,887 | 1,653,260,825,512 |
527.cam4_r | 1,487,053,494,925 | 1,484,526,670,770 |
531.deepsjeng_r | 1,641,969,526,837 | 1,642,126,598,866 |
538.imagick_r | 2,098,016,546,691 | 2,097,997,929,125 |
541.leela_r | 1,983,557,323,877 | 1,983,531,314,526 |
544.nab_r | 1,516,061,611,233 | 1,516,061,407,715 |
548.exchange2_r | 2,072,594,330,215 | 2,072,591,648,318 |
549.fotonik3d_r | 1,001,499,307,366 | 1,001,478,944,189 |
554.roms_r | 1,028,799,739,111 | 1,028,780,904,061 |
557.xz_r-0 | 363,827,039,684 | 363,057,014,260 |
557.xz_r-1 | 906,649,112,601 | 905,928,888,732 |
557.xz_r-2 | 509,023,898,187 | 508,140,356,932 |
997.specrand_fr | 402,535,577 | 403,052,561 |
999.specrand_ir | 402,535,577 | 403,052,561 |
This should still be considered damage control as the real/deeper fix
would be to reduce the number of LRA reloads or CSE/anchor those during
LRA constraint sub-pass (re)runs (that's a different PR/114729).
Implementation Details (for posterity)
--------------------------------------
- basic idea is to have a splitter selected via a new predicate for constant
being possible sum of two S12 and provide the transform.
This is however a 2 -> 2 transform which combine can't handle.
So we specify it using a define_insn_and_split.
- the initial loose "i" constraint caused LRA to accept invalid insns thus
needing a tighter new constraint as well.
- An additional fallback alternative with a catch-all "r" register
constraint is also needed to allow any "reloads" that LRA might
require for ADDI with a constant larger than S12.
Testing
--------
This is testsuite clean (rv64 only).
I'll rely on post-commit CI multilib run for any possible fallout for
other setups such as rv32.
| | gcc | g++ | gfortran |
| rv64imafdc_zba_zbb_zbs_zicond/ lp64d/ medlow | 41 / 17 | 8 / 3 | 7 / 2 |
| rv64imafdc_zba_zbb_zbs_zicond/ lp64d/ medlow | 41 / 17 | 8 / 3 | 7 / 2 |
I also threw this into a buildroot run, it obviously boots Linux to
userspace. bloat-o-meter on glibc and kernel shows an overall decrease in
static instruction counts with some minor spot increases.
These are generally in the category of
- LUI + ADDI are 2 byte each vs. two ADD being 4 byte each.
- Knock on effects due to inlining changes.
- Sometimes the slightly shorter 2-insn seq in a multi-exit function
can cause in-place epilogue duplication (vs. a jump back).
This is slightly larger but more efficient in execution.
In summary nothing to fret about.
| linux/scripts/bloat-o-meter build-gcc-240131/target/lib/libc.so.6 \
build-gcc-240131-new-splitter-1-variant/target/lib/libc.so.6
|
| add/remove: 0/0 grow/shrink: 21/49 up/down: 520/-3056 (-2536)
| Function old new delta
| getnameinfo 2756 2892 +136
...
| tempnam 136 144 +8
| padzero 276 284 +8
...
| __GI___register_printf_specifier 284 280 -4
| __EI_xdr_array 468 464 -4
| try_file_lock 268 260 -8
| pthread_create@GLIBC_2 3520 3508 -12
| __pthread_create_2_1 3520 3508 -12
...
| _nss_files_setnetgrent 932 904 -28
| _nss_dns_gethostbyaddr2_r 1524 1480 -44
| build_trtable 3312 3208 -104
| printf_positional 25000 22580 -2420
| Total: Before=2107999, After=2105463, chg -0.12%
Caveat:
------
Jeff noted during v2 review that the operand0 constraint !riscv_reg_frame_related
could potentially cause issues with hard reg cprop in future. If that
trips things up we will have to loosen the constraint while dialing down
the const range to (-2048 to 2032) as opposed to full S12 range of
(-2048 to 2047) to keep stack regs aligned.
gcc/ChangeLog:
* config/riscv/riscv.h: New macros to check for sum of two S12
range.
* config/riscv/constraints.md: New constraint.
* config/riscv/predicates.md: New Predicate.
* config/riscv/riscv.md: New splitter.
* config/riscv/riscv.cc (riscv_reg_frame_related): New helper.
* config/riscv/riscv-protos.h: New helper prototype.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/sum-of-two-s12-const-1.c: New test: checks
for new patterns output.
* gcc.target/riscv/sum-of-two-s12-const-2.c: Ditto.
* gcc.target/riscv/sum-of-two-s12-const-3.c: New test: should not
ICE.
Tested-by: Edwin Lu <ewlu@rivosinc.com> # pre-commit-CI #1520
Signed-off-by: Vineet Gupta <vineetg@rivosinc.com>
|
|
The following revisits the fix for PR99954 which was observed as
causing missed memcpy recognition and instead using memmove for
non-aliasing copies. While the original fix mitigated bogus
recognition of memcpy the root cause was not properly identified.
The root cause is dr_analyze_indices "failing" to handle union
references and leaving the DRs indices in a state that's not correctly
handled by dr_may_alias. The following mitigates this there
appropriately, restoring memcpy recognition for non-aliasing copies.
This makes us run into a latent issue in ptr_deref_may_alias_decl_p
when the pointer is something like &MEM[0].a in which case we fail
to handle non-SSA name pointers. Add code similar to what we have
in ptr_derefs_may_alias_p.
PR tree-optimization/99954
* tree-data-ref.cc (dr_may_alias_p): For bases that are
not completely analyzed fall back to TBAA and points-to.
* tree-loop-distribution.cc
(loop_distribution::classify_builtin_ldst): When there
is no dependence again classify as memcpy.
* tree-ssa-alias.cc (ptr_deref_may_alias_decl_p): Verify
the pointer is an SSA name.
* gcc.dg/tree-ssa/ldist-40.c: New testcase.
|
|
The Zicboz extension offers the cbo.zero instruction, which can be used
to clear (zero) a memory region corresponding to a cache block.
The Zic64b extension defines the cache block size to be 64 bytes.
If both extensions are available, it is possible to use cbo.zero
to clear memory, if the alignment and size constraints are met.
This patch implements this.
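As a rough model of the constraints (the names and the exact conditions are
assumptions made for this sketch, not the logic of
riscv_expand_block_clear_zicboz_zic64b), a clear could be planned like this:

```cpp
#include <cstddef>

// Cache block size guaranteed by Zic64b.
constexpr std::size_t cache_block_bytes = 64;

// Hypothetical plan for clearing a block of memory.
struct ClearPlan
{
  std::size_t cbo_zero_count;   // Number of cbo.zero instructions.
  std::size_t remainder_bytes;  // Bytes left for ordinary stores.
};

// Assumes cbo.zero is only usable when the destination is known to be
// aligned to a full cache block; any tail is cleared by other means.
constexpr ClearPlan plan_block_clear(std::size_t align_bytes,
                                     std::size_t length)
{
  return align_bytes >= cache_block_bytes
         ? ClearPlan{length / cache_block_bytes, length % cache_block_bytes}
         : ClearPlan{0, length};
}

static_assert(plan_block_clear(64, 128).cbo_zero_count == 2,
              "two aligned cache blocks need two cbo.zero instructions");
```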
gcc/ChangeLog:
* config/riscv/riscv-protos.h (riscv_expand_block_clear): New prototype.
* config/riscv/riscv-string.cc (riscv_expand_block_clear_zicboz_zic64b):
New function to expand a block-clear with cbo.zero.
(riscv_expand_block_clear): New RISC-V block-clear expansion function.
* config/riscv/riscv.md (setmem<mode>): New setmem expansion.
|
|
Let's add '\t' to the instruction match pattern to avoid false positive
matches when compiling with -flto.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/cmo-zicbom-1.c: Add \t to test pattern.
* gcc.target/riscv/cmo-zicbom-2.c: Likewise.
* gcc.target/riscv/cmo-zicbop-1.c: Likewise.
* gcc.target/riscv/cmo-zicbop-2.c: Likewise.
* gcc.target/riscv/cmo-zicboz-1.c: Likewise.
* gcc.target/riscv/cmo-zicboz-2.c: Likewise.
|
|
Make clear_by_pieces() available to other parts of the compiler,
similar to store_by_pieces().
gcc/ChangeLog:
* expr.cc (clear_by_pieces): Remove static from clear_by_pieces.
* expr.h (clear_by_pieces): Add prototype for clear_by_pieces.
|
|
On aarch64, I get this failure:
...
FAIL: gcc.dg/pr115066.c scan-assembler \\.byte\\t0xb\\t# Define macro strx
...
This happens because we expect to match:
...
.byte 0xb # Define macro strx
...
but instead we get:
...
.byte 0xb // Define macro strx
...
Fix this by not explicitly matching the comment marker.
Tested on aarch64 and x86_64.
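A sketch of the matching idea follows. The regular expression is written
for this example and is not necessarily the one used in pr115066.c, and the
function name is made up:

```cpp
#include <regex>
#include <string>

// Returns true if the assembly line defines the macro via the strx form,
// whatever comment marker the target's assembler syntax uses.
inline bool matches_define_strx(const std::string &line)
{
  static const std::regex pattern(R"(\.byte\t0xb\t.* Define macro strx)");
  return std::regex_search(line, pattern);
}
```

The same pattern then accepts both the "#" and the "//" comment markers shown
above.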
gcc/testsuite/ChangeLog:
2024-05-14 Tom de Vries <tdevries@suse.de>
* gcc.dg/pr115066.c: Don't match comment marker.
|
|
[PR107750]
gcc.dg/analyzer/fd-glibc-byte-stream-connection-server.c currently FAILs
on Solaris:
FAIL: gcc.dg/analyzer/fd-glibc-byte-stream-connection-server.c (test for
excess errors)
Excess errors:
/vol/gcc/src/hg/master/local/gcc/testsuite/gcc.dg/analyzer/fd-glibc-byte-stream-connection-server.c:91:3:
error: implicit declaration of function 'memset'
[-Wimplicit-function-declaration]
Solaris <sys/select.h> has
but no declaration of memset. While one can argue that this should be
fixed, it's easy enough to just include <string.h> instead, which is
what this patch does.
Tested on i386-pc-solaris2.11 and i686-pc-linux-gnu.
2024-05-14 Rainer Orth <ro@CeBiTec.Uni-Bielefeld.DE>
gcc/testsuite:
PR analyzer/107750
* gcc.dg/analyzer/fd-glibc-byte-stream-connection-server.c:
Include <string.h>.
|
|
Consider a hello world, compiled with -gsplit-dwarf and dwarf version 4, and
-g3:
...
$ gcc -gdwarf-4 -gsplit-dwarf /data/vries/hello.c -g3 -save-temps -dA
...
In section .debug_macro.dwo, we have:
...
.Ldebug_macro0:
.value 0x4 # DWARF macro version number
.byte 0x2 # Flags: 32-bit, lineptr present
.long .Lskeleton_debug_line0
.byte 0x3 # Start new file
.uleb128 0 # Included from line number 0
.uleb128 0x1 # file /data/vries/hello.c
.byte 0x5 # Define macro strp
.uleb128 0 # At line number 0
.uleb128 0x1d0 # The macro: "__STDC__ 1"
...
Given that we use a DW_MACRO_define_strp, we'd expect 0x1d0 to be an
offset into a .debug_str.dwo section.
But in fact, 0x1d0 is an index into the string offset table in
section .debug_str_offsets.dwo:
...
.long 0x34f0 # indexed string 0x1d0: __STDC__ 1
...
Add asserts that catch this inconsistency, and fix this by using
DW_MACRO_define_strx instead.
Tested on x86_64.
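The choice between the two forms can be summarised as below. The selector
function and its parameter are hypothetical; the opcode values are those of
the DWARF 5 macro encoding and match the 0x5 byte emitted for strp above:

```cpp
// Opcode values from the DWARF 5 macro information encoding.
enum MacroOpcode : unsigned char
{
  DW_MACRO_define_strp = 0x05,  // Operand is an offset into a string section.
  DW_MACRO_define_strx = 0x0b   // Operand is an index into the string
                                // offsets table.
};

// Hypothetical selector: use the indexed form when the string operand is
// emitted as an index, as it is for .debug_macro.dwo.
constexpr MacroOpcode define_opcode(bool operand_is_index)
{
  return operand_is_index ? DW_MACRO_define_strx : DW_MACRO_define_strp;
}

static_assert(define_opcode(true) == DW_MACRO_define_strx,
              "an index operand needs the strx form");
```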
gcc/ChangeLog:
2024-05-14 Tom de Vries <tdevries@suse.de>
PR debug/115066
* dwarf2out.cc (output_macinfo_op): Fix DW_MACRO_define_strx/strp
choice for v4 .debug_macro.dwo. Add asserts to check that choice.
gcc/testsuite/ChangeLog:
2024-05-14 Tom de Vries <tdevries@suse.de>
PR debug/115066
* gcc.dg/pr115066.c: New test.
|
|
This patch tames down the inliner on (multiply) self-recursive always_inline functions.
While we already have caps on recursive inlining, the testcase combines the early inliner
and the late inliner to get a very wide recursive inlining tree. The basic idea is to
ignore DISREGARD_INLINE_LIMITS when deciding on inlining self-recursive functions
(so we cut on the function being large) and clear the flag once it is detected.
I did not include the testcase since it still produces a lot of code and would
slow down testing. It also outputs many "inlining failed" messages, which is not
very nice, but it is hard to detect self-recursion cycles in full generality
when indirect calls and other tricks may happen.
gcc/ChangeLog:
PR ipa/113291
* ipa-inline.cc (enum can_inline_edge_by_limits_flags): New enum.
(can_inline_edge_by_limits_p): Take flags instead of multiple bools; add flag
for forcing inline limits.
(can_early_inline_edge_p): Update.
(want_inline_self_recursive_call_p): Update; use FORCE_LIMITS mode.
(check_callers): Update.
(update_caller_keys): Update.
(update_callee_keys): Update.
(recursive_inlining): Update.
(add_new_edges_to_heap): Update.
(speculation_useful_p): Update.
(inline_small_functions): Clear DECL_DISREGARD_INLINE_LIMITS on self recursion.
(flatten_function): Update.
(inline_to_all_callers_1): Update.
|
|
This patch enables overlapped by-piece operations by defining
TARGET_OVERLAP_OP_BY_PIECES_P to true. On rs6000, default move/set/clear
ratio is 2. So the overlap is only enabled with compare by-pieces.
gcc/
* config/rs6000/rs6000.cc (TARGET_OVERLAP_OP_BY_PIECES_P): Define.
gcc/testsuite/
* gcc.target/powerpc/block-cmp-9.c: New.
|
|
The implementation of User_Aspect_Definition uses subtype
Boolean_Aspects to decide which existing aspects can be used to define
new aspects. This subtype didn't include many of the SPARK aspects,
notably Always_Terminates.
gcc/ada/
* aspects.ads (Aspect_Id, Boolean_Aspect): Change categorization
of Boolean-valued SPARK aspects.
* sem_ch13.adb (Analyze_Aspect_Specification): Adapt CASE
statements to new classification of Boolean-valued SPARK
aspects.
|
|
This patch fixes a crash when the compiler emits a warning about
an unchecked conversion and -gnatdJ is enabled.
gcc/ada/
* sem_ch13.adb (Validate_Unchecked_Conversions): Add node
parameters to Error_Msg calls.
|
|
gcc/ada/
* sem_util.adb: Typo fix in comment.
* exp_aggr.adb: Likewise.
|
|
gcc/ada/
* exp_ch7.adb (Finalization Management): Add a short description of
the implementation of finalization chains.
|
|
The call to Build_Allocate_Deallocate_Proc must occur before the special
accessibility check for class-wide allocation is generated, because this
check comes with cleanup code.
gcc/ada/
* exp_ch4.adb (Expand_Allocator_Expression): Move the first call to
Build_Allocate_Deallocate_Proc up to before the accessibility check.
|
|
A recent change broke pragma Warnings when -gnatD is enabled in some
cases. This patch fixes this by caching more slocs at times when it's
known that they haven't been modified by -gnatD.
gcc/ada/
* errout.adb (Validate_Specific_Warnings): Adapt to record
definition change.
* erroutc.adb (Set_Specific_Warning_On, Set_Specific_Warning_Off,
Warning_Specifically_Suppressed): Likewise.
* erroutc.ads: Change record definition.
|
|
This decouples the attachment to the appropriate finalization collection of
dynamically allocated objects that need finalization from their allocation.
The current implementation immediately attaches them after allocating them,
which means that they will be finalized even if their initialization does
not complete successfully. The new implementation instead generates the
same sequence as the one generated for (statically) declared objects, that
is to say, allocation, initialization and attachment in this order.
gcc/ada/
* exp_ch3.adb (Build_Default_Initialization): Do not generate the
protection for finalization collections.
(Build_Heap_Or_Pool_Allocator): Set the No_Initialization flag on
the declaration of the temporary.
* exp_ch4.adb (Build_Aggregate_In_Place): Do not build an allocation
procedure here.
(Expand_Allocator_Expression): Build an allocation procedure, if it
is required, only just before rewriting the allocator.
(Expand_N_Allocator): Do not build an allocation procedure if the
No_Initialization flag is set on the allocator, except for those
generated for special return objects. In other cases, build an
allocation procedure, if it is required, only before rewriting
the allocator.
* exp_ch7.ads (Make_Address_For_Finalize): New function declaration.
* exp_ch7.adb (Finalization Management): Update description for
dynamically allocated objects.
(Make_Address_For_Finalize): Remove declaration.
(Find_Last_Init): Change to function and move to...
(Process_Object_Declaration): Adjust to above change.
* exp_util.ads (Build_Allocate_Deallocate_Proc): Add Mark parameter
with Empty default and document it.
(Find_Last_Init): New function declaration.
* exp_util.adb (Build_Allocate_Deallocate_Proc): Add Mark parameter
with Empty default and pass it in recursive call. Deal with type
conversions created for interface types. Adjust call sequence to
Allocate_Any_Controlled by changing Collection to In/Out parameter
and removing Finalize_Address parameter. For a controlled object,
generate a conditional call to Attach_Object_To_Collection for an
allocation and to Detach_Object_From_Collection for a deallocation.
(Find_Last_Init): ...here. Compute the initialization type for an
allocator whose designating type is class wide specifically and also
handle concurrent types.
* rtsfind.ads (RE_Id): Add RE_Attach_Object_To_Collection and
RE_Detach_Object_From_Collection.
(RE_Unit_Table): Add entries for RE_Attach_Object_To_Collection and
RE_Detach_Object_From_Collection.
* libgnat/s-finpri.ads (Finalization_Started): Delete.
(Attach_Node_To_Collection): Likewise.
(Detach_Node_From_Collection): Move to...
(Attach_Object_To_Collection): New procedure declaration.
(Detach_Object_From_Collection): Likewise.
(Finalization_Collection): Remove Atomic for Finalization_Started.
Add pragma Inline for Initialize.
* libgnat/s-finpri.adb: Add clause for Ada.Unchecked_Conversion.
(To_Collection_Node_Ptr): New instance of Ada.Unchecked_Conversion.
(Detach_Node_From_Collection): ...here.
(Attach_Object_To_Collection): New procedure.
(Detach_Object_From_Collection): Likewise.
(Finalization_Started): Delete.
(Finalize): Replace allocation with attachment in comments.
* libgnat/s-stposu.ads (Allocate_Any_Controlled): Rename parameter
Context_Subpool into Named_Subpool, parameter Context_Collection
into Collection and change it to In/Out, and remove Fin_Address.
* libgnat/s-stposu.adb: Remove clause for Ada.Unchecked_Conversion
and Finalization_Primitives.
(To_Collection_Node_Ptr): Delete.
(Allocate_Any_Controlled): Rename parameter Context_Subpool into
Named_Subpool, parameter Context_Collection into Collection and
change it to In/Out, and remove Fin_Address. Do not lock/unlock
and do not attach the object, instead only displace its address.
(Deallocate_Any_Controlled): Do not lock/unlock and do not detach
the object.
(Header_Size_With_Padding): Use qualified name for Header_Size.
|
|
A recent change to reduce duplication of compiler-generated Put_Image and
streaming subprograms introduced two regressions. One is yet another of the
many cases where generating these routines "on demand" (as opposed to at the
point of the associated type declaration) requires loosening the compiler's
enforcement of privacy. The other is a use-before-definition issue that
occurs because the declaration of a Put_Image procedure is not hoisted far
enough.
gcc/ada/
* exp_attr.adb (Build_And_Insert_Type_Attr_Subp): If a subprogram
associated with a (library-level) type declared in another unit is
to be inserted somewhere in a list, then insert it at the head of
the list.
* sem_ch5.adb (Analyze_Assignment): Normally a limited-type
assignment is illegal. Relax this rule if Comes_From_Source is
False and the type is not immutably limited.
|
|
This patch makes it so the diagnostics coming from occurrences of
pragma Compile_Time_Error and Compile_Time_Warning are emitted with
a node parameter so they don't cause a crash when -gnatdJ is enabled.
gcc/ada/
* errout.ads (Error_Msg): Add node parameter.
* errout.adb (Error_Msg): Add parameter and pass it to
the underlying call.
* sem_prag.adb (Validate_Compile_Time_Warning_Or_Error): Pass
pragma node when emitting errors.
|
|
This patch makes -gnatyz style check reports specify a node
ID. That is required since those checks are sometimes made during
semantic analysis of short-circuit operators, where the Current_Node
mechanism that -gnatdJ uses is not operational.
Check_Xtra_Parens_Precedence is moved from Styleg to Style to make
this possible.
gcc/ada/
* styleg.ads (Check_Xtra_Parens_Precedence): Moved ...
* style.ads (Check_Xtra_Parens_Precedence): ... here. Also
replace corresponding renaming.
* styleg.adb (Check_Xtra_Parens_Precedence): Moved ...
* style.adb (Check_Xtra_Parens_Precedence): ... here. Also use
Errout.Error_Msg and pass it a node parameter.
|
|
This eliminates a few oddities present in the expander for allocators and
aggregates present in allocators:
- Convert_Aggr_In_Allocator takes both Decl and Alloc parameters,
and inserts new code before Alloc for records and after Decl for arrays
through Convert_Array_Aggr_In_Allocator. Now, for the 3 (duplicated)
calls to the procedure, that's the same place. It also creates a new
list that it does not use in most cases.
- Expand_Allocator_Expression uses the same code sequence in 3 places
when the expression is an aggregate to build in place.
- Build_Allocate_Deallocate_Proc takes an Is_Allocate parameter that is
entirely determined by the N parameter: if N is an allocator, it must
be true; if N is a free statement, it must be false. Barring that,
the procedure either raises an assertion or Program_Error. It also
contains useless pattern matching code in the second part.
No functional changes.
gcc/ada/
* exp_aggr.ads (Convert_Aggr_In_Allocator): Rename Alloc into N,
replace Decl with Temp and adjust description.
(Convert_Aggr_In_Object_Decl): Alphabetize.
(Is_Delayed_Aggregate): Likewise.
* exp_aggr.adb (Convert_Aggr_In_Allocator): Rename Alloc into N
and replace Decl with Temp. Allocate a list only when needed.
(Convert_Array_Aggr_In_Allocator): Replace N with Decl and insert
new code before it.
* exp_ch4.adb (Build_Aggregate_In_Place): New procedure nested in
Expand_Allocator_Expression.
(Expand_Allocator_Expression): Call it to build aggregates in place.
Remove second parameter in calls to Build_Allocate_Deallocate_Proc.
(Expand_N_Allocator): Likewise.
* exp_ch13.adb (Expand_N_Free_Statement): Likewise.
* exp_util.ads (Build_Allocate_Deallocate_Proc): Remove Is_Allocate
parameter.
* exp_util.adb (Build_Allocate_Deallocate_Proc): Remove Is_Allocate
parameter and replace it with local variable of same name. Delete
useless pattern matching.
|
|
Before this patch, the default statuses of -gnatw.i and -gnatw.d were
reported incorrectly in the usage string used throughout GNAT tools.
This patch fixes this.
gcc/ada/
* usage.adb (Usage): Fix enabled-by-default indicators.
|
|
The parameters should be swapped to fit Fileapi.h documentation.
BOOL LocalFileTimeToFileTime(
[in] const FILETIME *lpLocalFileTime,
[out] LPFILETIME lpFileTime
);
gcc/ada/
* libgnat/s-win32.ads (LocalFileTimeToFileTime): Swap parameters.
|
|
This patch tweaks the calls made to Errout subprograms to report
violations of dependence restrictions, in order to fix a crash that
occurred with -gnatdJ and -fdiagnostics-format=json.
gcc/ada/
* restrict.adb (Violation_Of_No_Dependence): Tweak error
reporting calls.
|
|
This patch fixes a crash when -gnatdJ is enabled and a warning
must be emitted about an ineffective pragma Warnings clause.
Some modifications are made to the specific warnings machinery so
that warnings carry the ID of the pragma node they're about, so the
-gnatdJ mechanism can find an appropriate enclosing subprogram.
gcc/ada/
* sem_prag.adb (Analyze_Pragma): Adapt call to new signature.
* erroutc.ads (Set_Specific_Warning_Off): Change signature
and update documentation.
(Validate_Specific_Warnings): Move ...
* errout.adb: ... here and change signature. Also move body
of Validate_Specific_Warnings from erroutc.adb.
(Finalize): Adapt call.
* errout.ads (Set_Specific_Warning_Off): Adapt signature of
renaming.
* erroutc.adb (Set_Specific_Warning_Off): Adapt signature and
body.
(Validate_Specific_Warnings): Move to the body of Errout.
(Warning_Specifically_Suppressed): Adapt body.
|
|
The allocation strategy for objects of a discriminated type with defaulted
discriminants is not the same when the allocation is dynamic as when it is
static (i.e. a declaration): in the former case, the compiler allocates the
default size whereas, in the latter case, it allocates the maximum size.
This restores the default size, which was dropped during the refactoring.
gcc/ada/
* exp_aggr.adb (Build_Array_Aggr_Code): Pass N in the call to
Build_Initialization_Call.
(Build_Record_Aggr_Code): Likewise.
(Convert_Aggr_In_Object_Decl): Likewise.
(Initialize_Discriminants): Likewise.
* exp_ch3.ads (Build_Initialization_Call): Replace Loc with N.
* exp_ch3.adb (Build_Array_Init_Proc): Pass N in the call to
Build_Initialization_Call.
(Build_Default_Initialization): Likewise.
(Expand_N_Object_Declaration): Likewise.
(Build_Initialization_Call): Replace Loc with N parameter and add
Loc local variable. Build a default subtype for an allocator of
a discriminated type with defaulted discriminants.
(Build_Record_Init_Proc): Pass the declaration of components in the
call to Build_Initialization_Call.
* exp_ch6.adb (Make_CPP_Constructor_Call_In_Allocator): Pass the
allocator in the call to Build_Initialization_Call.
|
|
A previous change introduced an error in the diagnostic message about
overlapping actuals. This commit fixes this.
gcc/ada/
* sem_warn.adb (Warn_On_Overlapping_Actuals): Fix typo.
|
|
The compiler may either crash or incorrectly report errors when
a component association in a container aggregate is an if_expression
with an elsif part whose dependent expression is a call to a function
returning a result that requires finalization. The compiler complains
that a private type is expected, but a package or procedure name was
found. This is due to the compiler improperly associating expanded
calls to Finalize_Object with the aggregate, rather than the enclosing
object declaration being initialized by the aggregate, which can result
in the Finalize_Object procedure call being passed as an actual to
the Add_Unnamed operation of the container type and leading to a type
mismatch and the confusing error message. This is fixed by adjusting
the code that locates the proper context for insertion of Finalize_Object
calls to locate the enclosing declaration or statement rather than
stopping at the aggregate.
gcc/ada/
* exp_util.adb (Find_Hook_Context): Exclude N_*Aggregate Nkinds
of Parent (Par) from the early return in the second loop of the
In_Cond_Expr case, to prevent returning an aggregate from this
function rather than the enclosing declaration or statement.
|
|
Fix constructs that were flagged by CodePeer.
gcc/ada/
* exp_attr.adb: Replace 6 "not Present" tests with equivalent calls to "No".
|
|
Now that Default_Initialize_Object honors the No_Initialization flag in all
cases, objects of an access type declared without initialization expression
can no longer be considered as being automatically initialized to null.
gcc/ada/
* exp_ch3.adb (Expand_N_Object_Declaration): Examine the Expression
field after the call to Default_Initialize_Object in order to set
Is_Known_Null, as well as Is_Known_Non_Null, on an access object.
|