|
Throw the switch in range-ops to make full use of the value/mask
information instead of only the nonzero bits. This will cause most of
the operators implemented in range-ops to use the value/mask
information calculated by CCP's bit_value_binop() function, which
range-ops already uses. This opens up more optimization opportunities.
In follow-up patches I will change the global range setter
(set_range_info) to be able to save the value/mask pair, and make both
CCP and IPA able to save the known-ones bit info instead of
throwing it away.
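A hedged illustration (not from the patch): tracking known one bits, and not
just the known zeros implied by nonzero bits, lets a test like the following
fold away.
int f (int x)
{
  x |= 5;               /* value/mask info records that bit 0 is known 1.  */
  if ((x & 1) == 0)     /* Statically false once known ones are tracked.  */
    return 1;
  return 0;
}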
gcc/ChangeLog:
* range-op.cc (irange_to_masked_value): Remove.
(update_known_bitmask): Update irange value/mask pair instead of
only updating nonzero bits.
gcc/testsuite/ChangeLog:
* gcc.dg/pr83073.c: Adjust testcase.
|
|
Improve the profile update in loop-ch to handle the situation where the duplicated header
has a loop-invariant test. In this case we know that all of the count of the exit edge belongs to
the duplicated loop header edge and we can update probabilities accordingly.
Since we also do all the work to track this information from analysis to duplication,
I also added code to turn those conditionals into constants so we do not need a later
jump threading pass to clean up.
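As a hedged sketch (not the testcase), the shape in question looks like this;
the in-loop copy of the invariant "flag" test must be true whenever the loop
is entered, so it can be folded to a constant after header duplication.
void f (int *a, int n, int flag)
{
  int i = 0;
  while (i < n && flag)   /* "flag" is loop invariant.  */
    a[i++] = 0;
}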
This made me realize that the propagation was buggy in a few aspects:
1) it handled every PHI as a PHI in the header and incorrectly assigned some
PHIs to be IV-like when they are not
2) it did not check for novops calls that are not required to return the same
value on every invocation
3) I also added a check for asm statements since those are not necessarily
reproducible either.
I would like to do more changes, but tried to prevent this patch from
snowballing. The analysis of what statements will remain after duplication can
be improved. I think we should use the ranger query for basic blocks other than
the first one, too, and possibly drop the IV heuristics then. Also it seems that
a lot of this logic is pretty much the same as the analysis in the peeling pass,
so unifying them would be nice.
I also think I should move the profile update out of
gimple_duplicate_sese_region (it is now very specific to ch) and rename it,
since those regions are single entry multiple exit.
Bootstrapped/regtested x86_64-linux, OK?
Honza
gcc/ChangeLog:
* tree-cfg.cc (gimple_duplicate_sese_region): Add ORIG_ELIMINATED_EDGES
parameter and rewrite profile updating code to handle edges elimination.
* tree-cfg.h (gimple_duplicate_sese_region): Update prototype.
* tree-ssa-loop-ch.cc (loop_invariant_op_p): New function.
(loop_iv_derived_p): New function.
(should_duplicate_loop_header_p): Track invariant exit edges; fix handling
of PHIs and propagation of IV derived variables.
(ch_base::copy_headers): Pass around the invariant edges hash set.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/loop-ch-profile-1.c: Remove xfail.
|
|
Recently, two identical XTheadCondMov tests have been added, which both fail.
Let's fix that by changing the following:
* Merge both files into one (no need for separate tests for rv32 and rv64)
* Drop unrelated attribute check test (we already test for `th.mveqz`
and `th.mvnez` instructions, so there is little additional value)
* Fix the pattern to allow matching
Fixes: a1806f0918c0 ("RISC-V: Optimize TARGET_XTHEADCONDMOV")
gcc/testsuite/ChangeLog:
* gcc.target/riscv/xtheadcondmov-indirect-rv32.c: Moved to...
* gcc.target/riscv/xtheadcondmov-indirect.c: ...here.
* gcc.target/riscv/xtheadcondmov-indirect-rv64.c: Removed.
Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
|
|
This patch fixes the regression PR target/110598 caused by my recent
addition of a peephole2. The intention of that optimization was to
simplify zeroing a register, followed by an IOR, XOR or PLUS operation
on it into a move, or as described in the comment:
;; Peephole2 rega = 0; rega op= regb into rega = regb.
The issue is that I'd failed to consider the (rare and unusual) case
where regb is rega, in which the transformation leads to the incorrect
"rega = rega" when it should be "rega = 0". The minimal fix is to
add a !reg_mentioned_p check to the recent peephole2.
In addition to resolving the regression, I've added a second peephole2
to optimize the problematic case above, which contains a false
dependency and is therefore tricky to optimize elsewhere. This is an
improvement over GCC 13, which for example generates the redundant:
xorl %edx, %edx
xorq %rdx, %rdx
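A hedged sketch (not the PR testcase) of code that expands to the
rega = 0; rega op= regb shape the peephole targets; when regb happens to be
the same register as rega, the result must stay 0 rather than becoming rega's
stale value.
__int128 keep_low (unsigned long long y)
{
  __int128 r = 0;   /* rega = 0 */
  r |= y;           /* rega |= regb */
  return r;
}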
2023-07-12 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
PR target/110598
* config/i386/i386.md (peephole2): Check !reg_mentioned_p when
optimizing rega = 0; rega op= regb for op in [XOR,IOR,PLUS].
(peephole2): Simplify rega = 0; rega op= rega cases.
gcc/testsuite/ChangeLog
PR target/110598
* gcc.target/i386/pr110598.c: New test case.
|
|
Fix the case where declaring a parameter initialized using a pdt_len
reference did not simplify the reference to a constant.
2023-07-12 Andre Vehreschild <vehre@gcc.gnu.org>
gcc/fortran/ChangeLog:
PR fortran/102003
* expr.cc (find_inquiry_ref): Replace len of pdt_string by
constant.
(simplify_ref_chain): Ensure input to find_inquiry_ref is
NULL.
(gfc_match_init_expr): Prevent PDT analysis for function calls.
(gfc_pdt_find_component_copy_initializer): Get the initializer
value for given component.
* gfortran.h (gfc_pdt_find_component_copy_initializer): New
function.
* simplify.cc (gfc_simplify_len): Replace len() of PDT with pdt
component ref or constant.
gcc/testsuite/ChangeLog:
* gfortran.dg/pdt_33.f03: New test.
|
|
The following enhances the existing lowpart extraction support for
SLP VEC_PERM nodes to cover all vector aligned extractions. This
allows the existing bb-slp-pr95839.c testcase to be vectorized
with mips -mpaired-single and the new bb-slp-pr95839-3.c testcase
with SSE2.
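A hedged example (not one of the testcases) of a vector-aligned, non-lowpart
extraction of the kind now supported: taking the aligned upper half of a
wider vector.
typedef float v8sf __attribute__ ((vector_size (32)));
typedef float v4sf __attribute__ ((vector_size (16)));
v4sf high_half (v8sf x)
{
  /* Elements 4..7 are aligned to the V4SF size but are not the low part,
     so this previously required a full permute.  */
  return (v4sf) { x[4], x[5], x[6], x[7] };
}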
PR tree-optimization/110630
* tree-vect-slp.cc (vect_add_slp_permutation): New
offset parameter, honor that for the extract code generation.
(vectorizable_slp_permutation_1): Handle offsetted identities.
* gcc.dg/vect/bb-slp-pr95839.c: Make stricter.
* gcc.dg/vect/bb-slp-pr95839-3.c: New variant testcase.
|
|
This patch adds an obviously missing mult_high auto-vectorization pattern.
Consider the following case:
#define DEF_LOOP(TYPE) \
void __attribute__ ((noipa)) \
mod_##TYPE (TYPE *__restrict dst, TYPE *__restrict src, int count) \
{ \
for (int i = 0; i < count; ++i) \
dst[i] = src[i] / 17; \
}
#define TEST_ALL(T) \
  T (int32_t)
TEST_ALL (DEF_LOOP)
Before this patch:
mod_int32_t:
ble a2,zero,.L5
li a5,17
vsetvli a3,zero,e32,m1,ta,ma
vmv.v.x v2,a5
.L3:
vsetvli a5,a2,e8,mf4,ta,ma
vle32.v v1,0(a1)
vsetvli a3,zero,e32,m1,ta,ma
slli a4,a5,2
vdiv.vv v1,v1,v2
sub a2,a2,a5
vsetvli zero,a5,e32,m1,ta,ma
vse32.v v1,0(a0)
add a1,a1,a4
add a0,a0,a4
bne a2,zero,.L3
.L5:
ret
After this patch:
mod_int32_t:
ble a2,zero,.L5
li a5,2021163008
addiw a5,a5,-1927
vsetvli a3,zero,e32,m1,ta,ma
vmv.v.x v3,a5
.L3:
vsetvli a5,a2,e8,mf4,ta,ma
vle32.v v2,0(a1)
vsetvli a3,zero,e32,m1,ta,ma
slli a4,a5,2
vmulh.vv v1,v2,v3
sub a2,a2,a5
vsra.vi v2,v2,31
vsra.vi v1,v1,3
vsub.vv v1,v1,v2
vsetvli zero,a5,e32,m1,ta,ma
vse32.v v1,0(a0)
add a1,a1,a4
add a0,a0,a4
bne a2,zero,.L3
.L5:
ret
Even though a single "vdiv" is lowered into "1 vmulh + 2 vsra + 1 vsub" and
4 more instructions are generated, we believe it's much better than before
since division is very slow in hardware.
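For reference, a hedged scalar sketch of the arithmetic behind the vectorized
sequence above; the magic constant 2021163008 - 1927 = 0x78787879 is
ceil(2^35 / 17), and the sketch assumes arithmetic right shifts of signed
values.
#include <stdint.h>
int32_t div17 (int32_t x)
{
  int64_t m = 0x78787879;                  /* ceil (2^35 / 17) */
  int32_t hi = (int32_t) ((x * m) >> 32);  /* vmulh.vv */
  return (hi >> 3) - (x >> 31);            /* vsra.vi 3, vsra.vi 31, vsub.vv */
}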
gcc/ChangeLog:
* config/riscv/autovec.md (smul<mode>3_highpart): New pattern.
(umul<mode>3_highpart): Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/binop/mulh-1.c: New test.
* gcc.target/riscv/rvv/autovec/binop/mulh-2.c: New test.
* gcc.target/riscv/rvv/autovec/binop/mulh_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/binop/mulh_run-2.c: New test.
|
|
... in vec_dupv4sf / *vec_dupv4si. The respective broadcast insns are
never longer (yet sometimes shorter) than the corresponding VSHUFPS /
VPSHUFD, due to the immediate operand of the shuffle insns balancing the
(uniform) need for VEX3 in the broadcast ones. When EVEX encoding is
required, the broadcast insns are always shorter.
Add new alternatives to cover the AVX2 and AVX512 cases as appropriate.
While touching this anyway, switch to consistently using "sseshuf1" in
the "type" attributes for all shuffle forms.
gcc/
* config/i386/sse.md (vec_dupv4sf): Make first alternative use
vbroadcastss for AVX2. New AVX512F alternative.
(*vec_dupv4si): New AVX2 and AVX512F alternatives using
vpbroadcastd. Replace sselog1 by sseshuf1 in "type" attribute.
gcc/testsuite/
* gcc.target/i386/avx2-dupv4sf.c: New test.
* gcc.target/i386/avx2-dupv4si.c: Likewise.
* gcc.target/i386/avx512f-dupv4sf.c: Likewise.
* gcc.target/i386/avx512f-dupv4si.c: Likewise.
|
|
The current support of the bitfield-extraction instructions
th.ext and th.extu (XTheadBb extension) only covers sign_extract
and zero_extract. This patch adds support for sign_extend and
zero_extend to avoid any shifts for sign or zero extensions.
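A hedged illustration (not the new tests) of extensions that can now be
emitted as a single th.ext / th.extu instead of a shift pair:
long sext16 (long x)
{
  return (short) x;            /* th.ext rather than slli+srai.  */
}
unsigned long zext16 (unsigned long x)
{
  return (unsigned short) x;   /* th.extu rather than slli+srli.  */
}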
gcc/ChangeLog:
* config/riscv/riscv.md: No base-ISA extension splitter for XThead*.
* config/riscv/thead.md (*extend<SHORT:mode><SUPERQI:mode>2_th_ext):
New XThead extension INSN.
(*zero_extendsidi2_th_extu): New XThead extension INSN.
(*zero_extendhi<GPR:mode>2_th_extu): New XThead extension INSN.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/xtheadbb-ext-1.c: New test.
* gcc.target/riscv/xtheadbb-extu-1.c: New test.
Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
|
|
constraint of input operand to '0'
A false dependency happens when the destination is only updated by
pternlog. There is no false dependency when the destination is also used
as a source. So either a pxor should be inserted, or the input operand
should be given the constraint '0'.
gcc/ChangeLog:
PR target/110438
PR target/110202
* config/i386/predicates.md
(int_float_vector_all_ones_operand): New predicate.
* config/i386/sse.md (*vmov<mode>_constm1_pternlog_false_dep): New
define_insn.
(*<avx512>_cvtmask2<ssemodesuffix><mode>_pternlog_false_dep):
Ditto.
(*<avx512>_cvtmask2<ssemodesuffix><mode>_pternlog_false_dep):
Ditto.
(*<avx512>_cvtmask2<ssemodesuffix><mode>): Adjust to
define_insn_and_split to avoid false dependence.
(*<avx512>_cvtmask2<ssemodesuffix><mode>): Ditto.
(<mask_codefor>one_cmpl<mode>2<mask_name>): Adjust constraint
of operands 1 to '0' to avoid false dependence.
(*andnot<mode>3): Ditto.
(iornot<mode>3): Ditto.
(*<nlogic><mode>3): Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/i386/pr110438.c: New test.
* gcc.target/i386/pr100711-6.c: Adjust testcase.
|
|
gcc/ChangeLog:
* common/config/i386/cpuinfo.h
(get_intel_cpu): Handle Granite Rapids D.
* common/config/i386/i386-common.cc:
(processor_alias_table): Add graniterapids-d.
* common/config/i386/i386-cpuinfo.h
(enum processor_subtypes): Add INTEL_COREI7_GRANITERAPIDS_D.
* config.gcc: Add -march=graniterapids-d.
* config/i386/driver-i386.cc (host_detect_local_cpu):
Handle graniterapids-d.
* config/i386/i386.h: (PTA_GRANITERAPIDS_D): New.
* doc/extend.texi: Add graniterapids-d.
* doc/invoke.texi: Ditto.
gcc/testsuite/ChangeLog:
* g++.target/i386/mv16.C: Add graniterapids-d.
* gcc.target/i386/funcspec-56.inc: Handle new march.
|
|
Since commit 24a8acc, 128-bit intrinsics are enabled for VAES. However,
AVX512VL is not checked until we reach the pattern, which results in an
ICE.
Add an AVX512VL guard at the builtin level to report an error when checking
the ISA flags.
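A hedged illustration (not necessarily the new test) of a 128-bit VAES
intrinsic affected by the guard; compiling it with -mvaes but without
-mavx512vl now reports an error instead of ICEing.
#include <immintrin.h>
__m128i enc (__m128i a, __m128i b)
{
  return _mm_aesenc_epi128 (a, b);
}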
gcc/ChangeLog:
* config/i386/i386-builtins.cc (ix86_init_mmx_sse_builtins):
Add OPTION_MASK_ISA_AVX512VL.
* config/i386/i386-expand.cc (ix86_check_builtin_isa_match):
Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/i386/avx512vl-vaes-1.c: New test.
|
|
|
|
This patch recognizes a specific permutation pattern to which the compress
approach can be applied.
Consider the following case:
typedef int8_t vnx64i __attribute__ ((vector_size (64)));
#define MASK_64 \
1, 2, 3, 5, 7, 9, 10, 11, 12, 14, 15, 17, 19, 21, 22, 23, 26, 28, 30, 31, \
37, 38, 41, 46, 47, 53, 54, 55, 60, 61, 62, 63, 76, 77, 78, 79, 80, 81, \
82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, \
100, 101, 102, 103, 104, 105, 106, 107
void __attribute__ ((noinline, noclone)) test_1 (int8_t *x, int8_t *y, int8_t *out)
{
vnx64i v1 = *(vnx64i*)x;
vnx64i v2 = *(vnx64i*)y;
vnx64i v3 = __builtin_shufflevector (v1, v2, MASK_64);
*(vnx64i*)out = v3;
}
https://godbolt.org/z/P33nev6cW
Before this patch:
lui a4,%hi(.LANCHOR0)
addi a4,a4,%lo(.LANCHOR0)
vl4re8.v v4,0(a4)
li a4,64
vsetvli a5,zero,e8,m4,ta,mu
vl4re8.v v20,0(a0)
vl4re8.v v16,0(a1)
vmv.v.x v12,a4
vrgather.vv v8,v20,v4
vmsgeu.vv v0,v4,v12
vsub.vv v4,v4,v12
vrgather.vv v8,v16,v4,v0.t
vs4r.v v8,0(a2)
ret
After this patch:
lui a4,%hi(.LANCHOR0)
addi a4,a4,%lo(.LANCHOR0)
vsetvli a5,zero,e8,m4,ta,ma
vl4re8.v v12,0(a1)
vl4re8.v v8,0(a0)
vlm.v v0,0(a4)
vslideup.vi v4,v12,20
vcompress.vm v4,v8,v0
vs4r.v v4,0(a2)
ret
gcc/ChangeLog:
* config/riscv/riscv-protos.h (enum insn_type): Add vcompress optimization.
* config/riscv/riscv-v.cc (emit_vlmax_compress_insn): Ditto.
(shuffle_compress_patterns): Ditto.
(expand_vec_perm_const_1): Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vls-vlmax/compress-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/compress-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/compress-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/compress-4.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/compress-5.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/compress-6.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/compress_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/compress_run-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/compress_run-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/compress_run-4.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/compress_run-5.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/compress_run-6.c: New test.
|
|
Some of the analyzer out-of-bounds-diagram tests fail on AIX.
gcc/testsuite/ChangeLog:
* gcc.dg/analyzer/out-of-bounds-diagram-4.c: Skip on AIX.
* gcc.dg/analyzer/out-of-bounds-diagram-5-ascii.c: Same.
* gcc.dg/analyzer/out-of-bounds-diagram-5-unicode.c: Same.
* gcc.dg/analyzer/out-of-bounds-diagram-7.c: Same.
* gcc.dg/analyzer/out-of-bounds-diagram-13.c: Same.
* gcc.dg/analyzer/out-of-bounds-diagram-15.c: Same.
|
|
gcc/fortran/ChangeLog:
PR fortran/110288
* symbol.cc (gfc_copy_formal_args_intr): When deriving the formal
argument attributes from the actual ones for intrinsic procedure
calls, take special care of CHARACTER arguments so that we do not
wrongly treat them formally as deferred-length.
gcc/testsuite/ChangeLog:
PR fortran/110288
* gfortran.dg/findloc_10.f90: New test.
|
|
The vsx-vector-6.h file is included into the processor specific test files
vsx-vector-6.p7.c, vsx-vector-6.p8.c, and vsx-vector-6.p9.c. The .h file
contains a large number of vsx vector built-in tests. The processor
specific files contain the number of instructions that the tests are
expected to generate for that processor. The tests are compile only.
This patch reworks the tests into a series of files for related tests.
The new tests consist of a runnable test to verify the built-in argument
types and the functional correctness of each built-in. There is also a
compile only test that verifies the built-ins generate the expected number
of instructions for the various built-in tests.
gcc/testsuite/
* gcc.target/powerpc/vsx-vector-6-func-1op.h: New test file.
* gcc.target/powerpc/vsx-vector-6-func-1op-run.c: New test file.
* gcc.target/powerpc/vsx-vector-6-func-1op.c: New test file.
* gcc.target/powerpc/vsx-vector-6-func-2lop.h: New test file.
* gcc.target/powerpc/vsx-vector-6-func-2lop-run.c: New test file.
* gcc.target/powerpc/vsx-vector-6-func-2lop.c: New test file.
* gcc.target/powerpc/vsx-vector-6-func-2op.h: New test file.
* gcc.target/powerpc/vsx-vector-6-func-2op-run.c: New test file.
* gcc.target/powerpc/vsx-vector-6-func-2op.c: New test file.
* gcc.target/powerpc/vsx-vector-6-func-3op.h: New test file.
* gcc.target/powerpc/vsx-vector-6-func-3op-run.c: New test file.
* gcc.target/powerpc/vsx-vector-6-func-3op.c: New test file.
* gcc.target/powerpc/vsx-vector-6-func-cmp-all.h: New test file.
* gcc.target/powerpc/vsx-vector-6-func-cmp-all-run.c: New test file.
* gcc.target/powerpc/vsx-vector-6-func-cmp-all.c: New test file.
* gcc.target/powerpc/vsx-vector-6-func-cmp.h: New test file.
* gcc.target/powerpc/vsx-vector-6-func-cmp-run.c: New test file.
* gcc.target/powerpc/vsx-vector-6-func-cmp.c: New test file.
* gcc.target/powerpc/vsx-vector-6.h: Remove test file.
* gcc.target/powerpc/vsx-vector-6.p7.c: Remove test file.
* gcc.target/powerpc/vsx-vector-6.p8.c: Remove test file.
* gcc.target/powerpc/vsx-vector-6.p9.c: Remove test file.
|
|
The pr97428.c test assumes support for vectors of doubles, but some
targets only support vectors of floats, causing this test to fail with
such targets. Limit this test to targets that support vectors of
doubles then.
gcc/testsuite/
* gcc.dg/vect/pr97428.c: Limit to `vect_double' targets.
|
|
This patch combines basic blocks for static analysis of uninitialized
variables provided that they are not at the top of a loop, are not reached
by a conditional and are not reached after a procedure call. It also
avoids checking array accesses for static analysis. Finally the patch
adds switch modifiers to allow static analysis to include conditional
branches for subsequent basic block analysis.
gcc/ChangeLog:
* doc/gm2.texi (-Wuninit-variable-checking=) New item.
gcc/m2/ChangeLog:
* gm2-compiler/M2BasicBlock.def (InitBasicBlocksFromRange): New
parameter ScopeSym.
* gm2-compiler/M2BasicBlock.mod (ConvertQuads2BasicBlock): New
parameter ScopeSym.
(InitBasicBlocksFromRange): New parameter ScopeSym. Call
ConvertQuads2BasicBlock with ScopeSym.
(DisplayBasicBlocks): Uncomment.
* gm2-compiler/M2Code.mod: Replace VariableAnalysis with
ScopeBlockVariableAnalysis.
(InitialDeclareAndOptiomize): Add parameter scope.
(SecondDeclareAndOptimize): Add parameter scope.
* gm2-compiler/M2GCCDeclare.mod (DeclareConstructor): Add scope
parameter to DeclareTypesConstantsProceduresInRange.
(DeclareTypesConstantsProceduresInRange): New parameter scope.
Pass scope to DisplayQuadRange. Reformatted.
* gm2-compiler/M2GenGCC.def (ConvertQuadsToTree): New parameter
scope.
* gm2-compiler/M2GenGCC.mod (ConvertQuadsToTree): New parameter
scope.
* gm2-compiler/M2Optimize.mod (KnownReachable): New parameter
scope.
* gm2-compiler/M2Options.def (SetUninitVariableChecking): Add
arg parameter.
* gm2-compiler/M2Options.mod (SetUninitVariableChecking): Add
arg parameter and set boolean UninitVariableChecking and
UninitVariableConditionalChecking.
(UninitVariableConditionalChecking): New boolean set to FALSE.
* gm2-compiler/M2Quads.def (IsGoto): New procedure function.
(DisplayQuadRange): Add scope parameter.
(LoopAnalysis): Add scope parameter.
* gm2-compiler/M2Quads.mod: Import PutVarArrayRef.
(IsGoto): New procedure function.
(LoopAnalysis): Add scope parameter and use MetaErrorT1 instead
of WarnStringAt.
(BuildStaticArray): Call PutVarArrayRef.
(BuildDynamicArray): Call PutVarArrayRef.
(DisplayQuadRange): Add scope parameter.
(GetM2OperatorDesc): Add relational condition cases.
* gm2-compiler/M2Scope.def (ScopeProcedure): Add parameter.
* gm2-compiler/M2Scope.mod (DisplayScope): Pass scopeSym to
DisplayQuadRange.
(ForeachScopeBlockDo): Pass scopeSym to p.
* gm2-compiler/M2SymInit.def (VariableAnalysis): Rename to ...
(ScopeBlockVariableAnalysis): ... this.
* gm2-compiler/M2SymInit.mod (ScopeBlockVariableAnalysis): Add
scope parameter.
(bbEntry): New pointer to record.
(bbArray): New array.
(bbFreeList): New variable.
(errorList): New list.
(IssueConditional): New procedure.
(GenerateNoteFlow): New procedure.
(IssueWarning): New procedure.
(IsUniqueWarning): New procedure.
(CheckDeferredRecordAccess): Re-implement.
(CheckBinary): Add warning and lst parameters.
(CheckUnary): Add warning and lst parameters.
(CheckXIndr): Add warning and lst parameters.
(CheckIndrX): Add warning and lst parameters.
(CheckBecomes): Add warning and lst parameters.
(CheckComparison): Add warning and lst parameters.
(CheckReadBeforeInitQuad): Add warning and lst parameters to all
Check procedures. Add all case quadruple clauses.
(FilterCheckReadBeforeInitQuad): Add warning and lst parameters.
(CheckReadBeforeInitFirstBasicBlock): Add warning and lst parameters.
(bbArrayKill): New procedure.
(DumpBBEntry): New procedure.
(DumpBBArray): New procedure.
(DumpBBSequence): New procedure.
(TestBBSequence): New procedure.
(CreateBBPermultations): New procedure.
(ScopeBlockVariableAnalysis): New procedure.
(GetOp3): New procedure.
(GenerateCFG): New procedure.
(NewEntry): New procedure.
(AppendEntry): New procedure.
(init): Initialize bbFreeList and errorList.
* gm2-compiler/SymbolTable.def (PutVarArrayRef): New procedure.
(IsVarArrayRef): New procedure function.
* gm2-compiler/SymbolTable.mod (SymVar): ArrayRef new field.
(MakeVar): Set ArrayRef to FALSE.
(PutVarArrayRef): New procedure.
(IsVarArrayRef): New procedure function.
* gm2-gcc/init.cc (_M2_M2SymInit_init): New prototype.
(init_PerCompilationInit): Add call to _M2_M2SymInit_init.
* gm2-gcc/m2options.h (M2Options_SetUninitVariableChecking):
New definition.
* gm2-lang.cc (gm2_langhook_handle_option): Add new case
OPT_Wuninit_variable_checking_.
* lang.opt: Wuninit-variable-checking= new entry.
gcc/testsuite/ChangeLog:
* gm2/switches/uninit-variable-checking/cascade/fail/cascadedif.mod: New test.
* gm2/switches/uninit-variable-checking/cascade/fail/switches-uninit-variable-checking-cascade-fail.exp:
New test.
Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>
|
|
Here during ahead of time coercion of the variable template-id v1<int>,
since we pass only the innermost arguments to coerce_template_parms (and
outer arguments are still dependent at this point), substitution of the
default template argument V=U just lowers U from level 2 to level 1 rather
than replacing it with int as expected. Thus after coercion we incorrectly
end up with (effectively) v1<int, T> instead of v1<int, int>.
Coercion of a class/alias template-id on the other hand always passes
all levels of arguments, which avoids this issue. So this patch makes us
do the same for variable template-ids.
PR c++/110580
gcc/cp/ChangeLog:
* pt.cc (lookup_template_variable): Pass all levels of arguments
to coerce_template_parms, and use the parameters from the most
general template.
gcc/testsuite/ChangeLog:
* g++.dg/cpp1y/var-templ83.C: New test.
|
|
Antony Polukhin 2023-07-11 09:51:58 UTC
There's a typo at https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/testsuite/g%2B%2B.target/i386/pr110170.C;h=e638b12a5ee2264ecef77acca86432a9f24b103b;hb=d41a57c46df6f8f7dae0c0a8b349e734806a837b#l87
It should be `|| !test3() || !test3r()` rather than `|| !test3() || !test4r()`
gcc/testsuite/ChangeLog:
PR target/110170
* g++.target/i386/pr110170.C: Fix typo.
|
|
On ports with 32-bit long, the test produced excess errors:
gcc/testsuite/g++.dg/vect/pr110557.cc:12:8: warning: width of
'Item::y' exceeds its type
Reported-by: Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org>
gcc/testsuite/ChangeLog:
* g++.dg/vect/pr110557.cc: Use long long instead of long for
64-bit type.
(test): Remove an unnecessary cast.
|
|
|
|
D front-end changes:
- Import dmd v2.104.1.
- Deprecation phase ended for access to private method when
overloaded with public method.
D runtime changes:
- Import druntime v2.104.1.
- Linux input header translations were added to druntime.
- Integration with the Valgrind `memcheck' tool has been added
to the garbage collector.
Phobos changes:
- Import phobos v2.104.1.
gcc/d/ChangeLog:
* dmd/MERGE: Merge upstream dmd a88e1335f7.
* dmd/VERSION: Bump version to v2.104.1.
libphobos/ChangeLog:
* libdruntime/MERGE: Merge upstream druntime a88e1335f7.
* src/MERGE: Merge upstream phobos 1921d29df.
* config.h.in: Regenerate.
* configure: Regenerate.
* configure.ac (libphobos-checking): Add valgrind flag.
(DRUNTIME_LIBRARIES_VALGRIND): Call.
* libdruntime/Makefile.am (DRUNTIME_CSOURCES): Add
etc/valgrind/valgrind_.c.
(DRUNTIME_DSOURCES): Add etc/valgrind/valgrind.d.
(DRUNTIME_DSOURCES_LINUX): Add core/sys/linux/input.d,
core/sys/linux/input_event_codes.d, core/sys/linux/uinput.d.
* libdruntime/Makefile.in: Regenerate.
* m4/druntime/libraries.m4 (DRUNTIME_LIBRARIES_VALGRIND): Define.
|
|
Now that we cache level-lowered ttps we can end up processing the same
ttp multiple times via (multiple calls to) redeclare_class_template, so
we can't assume a ttp's DECL_CONTEXT is initially empty.
PR c++/110523
gcc/cp/ChangeLog:
* pt.cc (redeclare_class_template): Relax the ttp DECL_CONTEXT
assert, and downgrade it to a checking assert.
gcc/testsuite/ChangeLog:
* g++.dg/template/ttp37.C: New test.
|
|
After the recent MVE intrinsics re-implementation, LTO stopped working
because the intrinsics would no longer be defined.
The main part of the patch is simple and similar to what we do for
AArch64:
- call handle_arm_mve_h() from arm_init_mve_builtins to declare the
intrinsics when the compiler is in LTO mode
- actually implement arm_builtin_decl for MVE.
It was just a bit tricky to handle __ARM_MVE_PRESERVE_USER_NAMESPACE:
its value in the user code cannot be guessed at LTO time, so we always
have to assume that it was not defined. The led to a few fixes in the
way we register MVE builtins as placeholders or not. Without this
patch, we would just omit some versions of the inttrinsics when
__ARM_MVE_PRESERVE_USER_NAMESPACE is true. In fact, like for the C/C++
placeholders, we need to always keep entries for all of them to ensure
that we have a consistent numbering scheme.
2023-06-26 Christophe Lyon <christophe.lyon@linaro.org>
PR target/110268
gcc/
* config/arm/arm-builtins.cc (arm_init_mve_builtins): Handle LTO.
(arm_builtin_decl): Handle MVE builtins.
* config/arm/arm-mve-builtins.cc (builtin_decl): New function.
(add_unique_function): Fix handling of
__ARM_MVE_PRESERVE_USER_NAMESPACE.
(add_overloaded_function): Likewise.
* config/arm/arm-protos.h (builtin_decl): New declaration.
gcc/testsuite/
* gcc.target/arm/pr110268-1.c: New test.
* gcc.target/arm/pr110268-2.c: New test.
|
|
For arm targets, we generate many effective-targets with
check_effective_target_FUNC_multilib and
check_effective_target_arm_arch_FUNC_multilib which check if we can
link and execute a simple program with a given set of flags/multilibs.
In some cases however, it's possible to link but not to execute a
program, so this patch adds similar _link effective-targets which only
check if link succeeds.
The patch does not update the documentation as it already lacks the
numerous existing related effective-targets.
2023-07-07 Christophe Lyon <christophe.lyon@linaro.org>
gcc/testsuite/
* lib/target-supports.exp (arm_*FUNC_link): New effective-targets.
|
|
If a bit-field is signed and it's wider than the output type, we must
ensure the extracted result is sign-extended. But this was not handled
correctly.
For example:
struct Item
{
  int x : 8;
  long y : 55;
  bool z : 1;
};
The vectorized extraction of y was:
vect__ifc__49.29_110 =
MEM <vector(2) long unsigned int> [(struct Item *)vectp_a.27_108];
vect_patt_38.30_112 =
vect__ifc__49.29_110 & { 9223372036854775552, 9223372036854775552 };
vect_patt_39.31_113 = vect_patt_38.30_112 >> 8;
vect_patt_40.32_114 =
VIEW_CONVERT_EXPR<vector(2) long int>(vect_patt_39.31_113);
This is obviously incorrect. This patch implements it as:
vect__ifc__25.16_62 =
MEM <vector(2) long unsigned int> [(struct Item *)vectp_a.14_60];
vect_patt_31.17_63 =
VIEW_CONVERT_EXPR<vector(2) long int>(vect__ifc__25.16_62);
vect_patt_32.18_64 = vect_patt_31.17_63 << 1;
vect_patt_33.19_65 = vect_patt_32.18_64 >> 9;
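A hedged scalar sketch of the corrected extraction of y, assuming the usual
little-endian layout (x in bits 0-7, y in bits 8-62, z in bit 63) and
arithmetic right shifts of signed values:
long extract_y (unsigned long w)
{
  /* Shift the field to the top, then arithmetic-shift it back down so the
     sign bit of the 55-bit field is propagated; this matches the << 1 and
     >> 9 in the vectorized code above.  */
  return (long) (w << 1) >> 9;
}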
gcc/ChangeLog:
PR tree-optimization/110557
* tree-vect-patterns.cc (vect_recog_bitfield_ref_pattern):
Ensure the output sign-extended if necessary.
gcc/testsuite/ChangeLog:
PR tree-optimization/110557
* g++.dg/vect/pr110557.cc: New test.
|
|
This patch implements another of Uros' suggestions, to investigate an
insvti_lowpart_1 pattern to improve TImode parameter passing on x86_64.
In PR 88873, the RTL the middle-end expands for passing V2DF in TImode
is subtly different from what it does for V2DI in TImode, sufficiently so
that my explanations for why insvti_lowpart_1 isn't required don't apply
in this case.
This patch adds an insvti_lowpart_1 pattern, complementing the existing
insvti_highpart_1 pattern, and also a 32-bit variant, insvdi_lowpart_1.
Because the middle-end represents 128-bit constants using CONST_WIDE_INT
and 64-bit constants using CONST_INT, it's easiest to treat these as
different patterns, rather than attempt <dwi> parameterization.
This patch also includes a peephole2 (actually a pair) to transform
xchg instructions into mov instructions, when one of the destinations
is unused. This optimization is required to produce the optimal code
sequences below.
For the 64-bit case:
__int128 foo(__int128 x, unsigned long long y)
{
__int128 m = ~((__int128)~0ull);
__int128 t = x & m;
__int128 r = t | y;
return r;
}
Before:
xchgq %rdi, %rsi
movq %rdx, %rax
xorl %esi, %esi
xorl %edx, %edx
orq %rsi, %rax
orq %rdi, %rdx
ret
After:
movq %rdx, %rax
movq %rsi, %rdx
ret
For the 32-bit case:
long long bar(long long x, int y)
{
long long mask = ~0ull << 32;
long long t = x & mask;
long long r = t | (unsigned int)y;
return r;
}
Before:
pushl %ebx
movl 12(%esp), %edx
xorl %ebx, %ebx
xorl %eax, %eax
movl 16(%esp), %ecx
orl %ebx, %edx
popl %ebx
orl %ecx, %eax
ret
After:
movl 12(%esp), %eax
movl 8(%esp), %edx
ret
2023-07-10 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
* config/i386/i386.md (peephole2): Transform xchg insn with a
REG_UNUSED note to a (simple) move.
(*insvti_lowpart_1): New define_insn_and_split.
(*insvdi_lowpart_1): Likewise.
gcc/testsuite/ChangeLog
* gcc.target/i386/insvdi_lowpart-1.c: New test case.
* gcc.target/i386/insvti_lowpart-1.c: Likewise.
|
|
Following Uros' suggestion, this patch adds support for AVX512VL's
vpro[lr][dq] instructions to the recently added scalar-to-vector (STV)
enhancements to handle DImode and SImode rotations by a constant.
For the test cases:
unsigned long long rot1(unsigned long long x) {
return (x>>1) | (x<<63);
}
void mem1(unsigned long long *p) {
*p = rot1(*p);
}
with -m32 -O2 -mavx512vl, we currently generate:
rot1: movl 4(%esp), %eax
movl 8(%esp), %edx
movl %eax, %ecx
shrdl $1, %edx, %eax
shrdl $1, %ecx, %edx
ret
mem1: movl 4(%esp), %eax
vmovq (%eax), %xmm0
vpshufd $20, %xmm0, %xmm0
vpsrlq $1, %xmm0, %xmm0
vpshufd $136, %xmm0, %xmm0
vmovq %xmm0, (%eax)
ret
with this patch, we now generate:
rot1: vmovq 4(%esp), %xmm0
vprorq $1, %xmm0, %xmm0
vmovd %xmm0, %eax
vpextrd $1, %xmm0, %edx
ret
mem1: movl 4(%esp), %eax
vmovq (%eax), %xmm0
vprorq $1, %xmm0, %xmm0
vmovq %xmm0, (%eax)
ret
2023-07-10 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
* config/i386/i386-features.cc (compute_convert_gain): Tweak
gains/costs for ROTATE/ROTATERT by integer constant on AVX512VL.
(general_scalar_chain::convert_rotate): On TARGET_AVX512F generate
avx512vl_rolv2di or avx512vl_rolv4si when appropriate.
gcc/testsuite/ChangeLog
* gcc.target/i386/avx512vl-stv-rotatedi-1.c: New test case.
|
|
D front-end changes:
- Import dmd v2.104.0.
- Assignment-style syntax is now allowed for `alias this'.
- Overloading `extern(C)' functions is now an error.
D runtime changes:
- Import druntime v2.104.0.
Phobos changes:
- Import phobos v2.104.0.
- Better static assert messages when instantiating
`std.algorithm.iteration.permutations' with wrong inputs.
- Added `std.system.instructionSetArchitecture' and
`std.system.ISA'.
gcc/d/ChangeLog:
* dmd/MERGE: Merge upstream dmd 17ccd12af3.
* dmd/VERSION: Bump version to v2.104.0.
* Make-lang.in (D_FRONTEND_OBJS): Rename d/apply.o to
d/postordervisitor.o.
* d-codegen.cc (make_location_t): Update for new front-end interface.
(build_filename_from_loc): Likewise.
(build_assert_call): Likewise.
(build_array_bounds_call): Likewise.
(build_bounds_index_condition): Likewise.
(build_bounds_slice_condition): Likewise.
(build_frame_type): Likewise.
(get_frameinfo): Likewise.
* d-diagnostic.cc (d_diagnostic_report_diagnostic): Likewise.
* decl.cc (build_decl_tree): Likewise.
(start_function): Likewise.
* expr.cc (ExprVisitor::visit (NewExp *)): Replace code generation of
`new pointer' with front-end lowering.
* runtime.def (NEWITEMT): Remove.
(NEWITEMIT): Remove.
* toir.cc (IRVisitor::visit (LabelStatement *)): Update for new
front-end interface.
* typeinfo.cc (check_typeinfo_type): Likewise.
libphobos/ChangeLog:
* libdruntime/MERGE: Merge upstream druntime 17ccd12af3.
* src/MERGE: Merge upstream phobos 8d3800bee.
gcc/testsuite/ChangeLog:
* gdc.dg/asm4.d: Update test.
|
|
We have ix86_expand_sse_fp_minmax to detect min/max semantics, but
it requires rtx_equal_p for cmp_op0/cmp_op1 and if_true/if_false. For
the testcase in the PR, there's an extra move from cmp_op0 to if_true,
so ix86_expand_sse_fp_minmax fails.
This patch adds a pre_reload splitter to detect the min/max pattern.
Operand order in MINSS matters for signed zeros and NaNs, since the
instruction always returns the second operand when any operand is a NaN
or both operands are zero.
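A hedged sketch (not the PR testcase) of a min shape that maps directly onto
MINSD with the operand order the splitter has to preserve:
double my_fmin (double a, double b)
{
  /* Selects a when a < b, otherwise b.  A NaN in either operand yields b,
     matching MINSD's "return the second operand" rule, so no extra
     compare/blend is needed.  */
  return a < b ? a : b;
}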
gcc/ChangeLog:
PR target/110170
* config/i386/i386.md (*ieee_max<mode>3_1): New pre_reload
splitter to detect fp max pattern.
(*ieee_min<mode>3_1): Ditto, but for fp min pattern.
gcc/testsuite/ChangeLog:
* g++.target/i386/pr110170.C: New test.
* gcc.target/i386/pr110170.c: New test.
|
|
|
|
D front-end changes:
- Import dmd v2.104.0-beta.1.
- Better error message when attribute inference fails down the
call stack.
- Using `;' as an empty statement has been turned into an error.
- Using `in' parameters with non- `extern(D)' or `extern(C++)'
functions is deprecated.
- `in ref' on parameters has been deprecated in favor of
`-preview=in'.
- Throwing `immutable', `const', `inout', and `shared' qualified
objects is now deprecated.
- User Defined Attributes now parse Template Arguments.
D runtime changes:
- Import druntime v2.104.0-beta.1.
Phobos changes:
- Import phobos v2.104.0-beta.1.
- Better static assert messages when instantiating
`std.algorithm.comparison.clamp' with wrong inputs.
- `std.typecons.Rebindable' now supports all types.
gcc/d/ChangeLog:
* dmd/MERGE: Merge upstream dmd 28a3b24c2e.
* dmd/VERSION: Bump version to v2.104.0-beta.1.
* d-codegen.cc (build_bounds_slice_condition): Update for new
front-end interface.
* d-lang.cc (d_init_options): Likewise.
(d_handle_option): Likewise.
(d_post_options): Initialize global.compileEnv.
* expr.cc (ExprVisitor::visit (CatExp *)): Replace code generation
with new front-end lowering.
(ExprVisitor::visit (LoweredAssignExp *)): New method.
(ExprVisitor::visit (StructLiteralExp *)): Don't generate static
initializer symbols for structs defined in C sources.
* runtime.def (ARRAYCATT): Remove.
(ARRAYCATNTX): Remove.
libphobos/ChangeLog:
* libdruntime/MERGE: Merge upstream druntime 28a3b24c2e.
* src/MERGE: Merge upstream phobos 8ab95ded5.
gcc/testsuite/ChangeLog:
* gdc.dg/rtti1.d: Move array concat testcase to ...
* gdc.dg/nogc1.d: ... here. New test.
|
|
Dumps of profile_counts are quite hard to interpret since they are 64-bit fixed-point
values. In many cases one looks at a single function and it is better to think of
basic block frequency, that is, how many times it is executed per invocation. This
patch makes CFG dumps also print this info.
For example:
main()
{
for (int i = 0; i < 10; i++)
t();
}
the -fdump-tree-optimized-blocks-details now prints:
int main ()
{
unsigned int ivtmp_1;
unsigned int ivtmp_2;
;; basic block 2, loop depth 0, count 97603128 (estimated locally, freq 1.0000), maybe hot
;; prev block 0, next block 3, flags: (NEW, VISITED)
;; pred: ENTRY [always] count:97603128 (estimated locally, freq 1.0000) (FALLTHRU,EXECUTABLE)
;; succ: 3 [always] count:97603128 (estimated locally, freq 1.0000) (FALLTHRU,EXECUTABLE)
;; basic block 3, loop depth 1, count 976138697 (estimated locally, freq 10.0011), maybe hot
;; prev block 2, next block 4, flags: (NEW, VISITED)
;; pred: 3 [90.0% (guessed)] count:878535568 (estimated locally, freq 9.0011) (TRUE_VALUE,EXECUTABLE)
;; 2 [always] count:97603128 (estimated locally, freq 1.0000) (FALLTHRU,EXECUTABLE)
# ivtmp_2 = PHI <ivtmp_1(3), 10(2)>
t ();
ivtmp_1 = ivtmp_2 + 4294967295;
if (ivtmp_1 != 0)
goto <bb 3>; [90.00%]
else
goto <bb 4>; [10.00%]
;; succ: 3 [90.0% (guessed)] count:878535568 (estimated locally, freq 9.0011) (TRUE_VALUE,EXECUTABLE)
;; 4 [10.0% (guessed)] count:97603129 (estimated locally, freq 1.0000) (FALSE_VALUE,EXECUTABLE)
;; basic block 4, loop depth 0, count 97603128 (estimated locally, freq 1.0000), maybe hot
;; prev block 3, next block 1, flags: (NEW, VISITED)
;; pred: 3 [10.0% (guessed)] count:97603129 (estimated locally, freq 1.0000) (FALSE_VALUE,EXECUTABLE)
return 0;
;; succ: EXIT [always] count:97603128 (estimated locally, freq 1.0000) (EXECUTABLE)
}
This makes it easier to see that the inner bb is executed 10 times per invocation.
gcc/ChangeLog:
* cfg.cc (check_bb_profile): Dump counts with relative frequency.
(dump_edge_info): Likewise.
(dump_bb_info): Likewise.
* profile-count.cc (profile_count::dump): Add comma between quality and
freq.
gcc/testsuite/ChangeLog:
* gcc.dg/predict-22.c: Update template.
|
|
|
|
gcc/ChangeLog:
PR tree-optimization/110600
* cfgloopmanip.cc (scale_loop_profile): Add missing profile_dump check.
gcc/testsuite/ChangeLog:
PR tree-optimization/110600
* gcc.c-torture/compile/pr110600.c: New test.
|
|
2023-07-08 Steve Kargl <sgk@troutmask.apl.washington.edu>
gcc/fortran
PR fortran/99139
PR fortran/99368
* match.cc (gfc_match_namelist): Check for host associated or
defined types before applying default type.
(gfc_match_select_rank): Apply default type to selector of
unknown type if possible.
* resolve.cc (resolve_fl_variable): Do not apply local default
initialization to assumed rank entities.
gcc/testsuite/
PR fortran/99139
* gfortran.dg/pr99139.f90 : New test
PR fortran/99368
* gfortran.dg/pr99368.f90 : New test
|
|
In this testcase the profile is misupdated because the loop has two exits.
The first exit is the one eliminated by complete unrolling while the second exit remains.
We remove the first exit but forget that the source BB of the other exit will
then have a higher frequency, making the other exit more likely.
This patch fixes that in duplicate_loop_body_to_header_edge.
While looking into the resulting profiles I also noticed that in some cases
scale_loop_profile may drop probabilities to 0 incorrectly, either when
trying to update an exit from a nested loop (which has a similar problem) or when the profile
was inconsistent, as described in the comment below.
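A hedged sketch (not the testcase) of the shape involved: complete unrolling
removes the i < 4 exit, so the remaining early-return exit has to absorb its
probability.
int find (int *a, int n)
{
  for (int i = 0; i < 4; i++)   /* Exit eliminated by complete unrolling.  */
    if (a[i] == n)
      return i;                 /* Exit that remains after unrolling.  */
  return -1;
}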
gcc/ChangeLog:
PR middle-end/110590
* cfgloopmanip.cc (scale_loop_profile): Avoid scaling exits within
inner loops and be more careful about inconsistent profiles.
(duplicate_loop_body_to_header_edge): Fix profile update when eliminated
exit is followed by other exit.
gcc/testsuite/ChangeLog:
PR middle-end/110590
* gcc.dg/tree-prof/update-cunroll-2.c: Remove xfail.
* gcc.dg/tree-ssa/update-cunroll.c: Likewise.
|
|
gcc/fortran/ChangeLog:
PR fortran/92178
* trans-expr.cc (gfc_conv_procedure_call): Check procedures for
allocatable dummy arguments with INTENT(OUT) and move deallocation
of actual arguments after evaluation of argument expressions before
the procedure is executed.
gcc/testsuite/ChangeLog:
PR fortran/92178
* gfortran.dg/intent_out_16.f90: New test.
* gfortran.dg/intent_out_17.f90: New test.
* gfortran.dg/intent_out_18.f90: New test.
Co-authored-by: Steven G. Kargl <kargl@gcc.gnu.org>
|
|
gcc/fortran/ChangeLog:
PR fortran/110585
* arith.cc (gfc_compare_expr): Handle equality comparison of constant
complex gfc_expr arguments.
gcc/testsuite/ChangeLog:
PR fortran/110585
* gfortran.dg/findloc_9.f90: New test.
|
|
|
|
gcc/testsuite/ChangeLog:
* gcc.dg/pr43864-2.c: Avoid matching pre dump with details-blocks.
* gcc.dg/pr43864-3.c: Likewise.
* gcc.dg/pr43864-4.c: Likewise.
* gcc.dg/pr43864.c: Likewise.
* gcc.dg/unroll-7.c: xfail.
|
|
When we collect just user events for autofdo with lbr we get some events where branch
sources are kernel addresses and branch targets are user addresses. Without kernel MMAP
events create_gcov can't make sense of kernel addresses. Currently create_gcov fails if
it can't map at least 95% of events. We sometimes get below this threshold with just
user events. The change is to collect both user events and kernel events.
Tested on x86_64-pc-linux-gnu.
ChangeLog:
* Makefile.in: Collect both kernel and user events for autofdo
* Makefile.tpl: Collect both kernel and user events for autofdo
gcc/testsuite/ChangeLog:
* lib/target-supports.exp: Collect both kernel and user events for autofdo
|
|
Passing 128-bit integer (TImode) parameters on x86_64 can sometimes
result in surprising code. Consider the example below (from PR 43644):
unsigned __int128 foo(unsigned __int128 x, unsigned long long y) {
return x+y;
}
which currently results in 6 consecutive movq instructions:
foo: movq %rsi, %rax
movq %rdi, %rsi
movq %rdx, %rcx
movq %rax, %rdi
movq %rsi, %rax
movq %rdi, %rdx
addq %rcx, %rax
adcq $0, %rdx
ret
The underlying issue is that during RTL expansion, we generate the
following initial RTL for the x argument:
(insn 4 3 5 2 (set (reg:TI 85)
(subreg:TI (reg:DI 86) 0)) "pr43644-2.c":5:1 -1
(nil))
(insn 5 4 6 2 (set (subreg:DI (reg:TI 85) 8)
(reg:DI 87)) "pr43644-2.c":5:1 -1
(nil))
(insn 6 5 7 2 (set (reg/v:TI 84 [ x ])
(reg:TI 85)) "pr43644-2.c":5:1 -1
(nil))
which by combine/reload becomes
(insn 25 3 22 2 (set (reg/v:TI 84 [ x ])
(const_int 0 [0])) "pr43644-2.c":5:1 -1
(nil))
(insn 22 25 23 2 (set (subreg:DI (reg/v:TI 84 [ x ]) 0)
(reg:DI 93)) "pr43644-2.c":5:1 90 {*movdi_internal}
(expr_list:REG_DEAD (reg:DI 93)
(nil)))
(insn 23 22 28 2 (set (subreg:DI (reg/v:TI 84 [ x ]) 8)
(reg:DI 94)) "pr43644-2.c":5:1 90 {*movdi_internal}
(expr_list:REG_DEAD (reg:DI 94)
(nil)))
where the heavy use of SUBREG SET_DESTs creates challenges for both
combine and register allocation.
The improvement proposed here is to avoid these problematic SUBREGs
by adding (two) special cases to ix86_expand_move. For insn 4, which
sets a TImode destination from a paradoxical SUBREG, to assign the
lowpart, we can use an explicit zero extension (zero_extendditi2 was
added in July 2022), and for insn 5, which sets the highpart of a
TImode register we can use the *insvti_highpart_1 instruction (that
was added in May 2023, after being approved for stage1 in January).
This allows combine to work its magic, merging these insns into a
*concatditi3 and from there into other optimized forms.
So for the test case above, we now generate only a single movq:
foo: movq %rdx, %rax
xorl %edx, %edx
addq %rdi, %rax
adcq %rsi, %rdx
ret
But there is a little bad news. This patch causes two (minor) missed
optimization regressions on x86_64; gcc.target/i386/pr82580.c and
gcc.target/i386/pr91681-1.c. As shown in the test case above, we're
no longer generating adcq $0, but instead using xorl. For the other
FAIL, register allocation now has more freedom and is (arbitrarily)
choosing a register assignment that doesn't match what the test is
expecting. These issues are easier to explain and fix once this patch
is in the tree.
The good news is that this approach fixes a number of long standing
issues that need to be checked in Bugzilla, including PR target/110533
which was just opened/reported earlier this week.
2023-07-07 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
PR target/43644
PR target/110533
* config/i386/i386-expand.cc (ix86_expand_move): Convert SETs of
TImode destinations from paradoxical SUBREGs (setting the lowpart)
into explicit zero extensions. Use *insvti_highpart_1 instruction
to set the highpart of a TImode destination.
gcc/testsuite/ChangeLog
PR target/43644
PR target/110533
* gcc.target/i386/pr110533.c: New test case.
* gcc.target/i386/pr43644-2.c: Likewise.
|
|
Restrict the generating of CONST_DECLs for D manifest constants to just
scalars without pointers. It shouldn't happen that a reference to a
manifest constant has not been expanded within a function body during
codegen, but it has been found to occur in older versions of the D
front-end (PR98277), so if the decl of a non-scalar constant is
requested, just return its initializer as an expression.
PR d/108842
gcc/d/ChangeLog:
* decl.cc (DeclVisitor::visit (VarDeclaration *)): Only emit scalar
manifest constants.
(get_symbol_decl): Don't generate CONST_DECL for non-scalar manifest
constants.
* imports.cc (ImportVisitor::visit (VarDeclaration *)): New method.
gcc/testsuite/ChangeLog:
* gdc.dg/pr98277.d: Add more tests.
* gdc.dg/pr108842.d: New test.
|
|
Information about profile mismatches has been printed only with -details-blocks for some time.
I think it should be printed even by default to make it easier to spot when someone introduces
a new transform that breaks the profile, but I will send a separate RFC for that.
This patch enables details in all testcases that grep for "Invalid sum". There are 4 testcases
which fail:
gcc.dg/tree-ssa/loop-ch-profile-1.c
here the problem is that loop header duplication introduces a loop-invariant conditional that is
later updated by tree-ssa-dom, but dom does not take care of updating the profile.
Since loop-ch knows when it duplicates a loop invariant, we may be able to get this right.
The test is still useful since it tests that the profile is consistent right after ch.
gcc.dg/tree-prof/update-cunroll-2.c
This is about the profile updating code in duplicate_loop_body_to_header_edge being wrong when
the optimized-out exit is not the last one in the loop. In that case the probability of later
exits needs to be accounted for.
I will think about making this better - in general this does not seem to have an easy solution,
but for the special case of chained tests we can definitely account for the later exits.
gcc.dg/tree-ssa/update-unroll-1.c
This fails after aprefetch-invoked unrolling. I did not look into the details yet.
gcc.dg/tree-prof/update-unroll-2.c
This one seems similar to the previous one.
I decided to xfail these tests, deal with them incrementally, and filed PR110590.
gcc/testsuite/ChangeLog:
* g++.dg/tree-prof/indir-call-prof.C: Add block-details to dump flags.
* gcc.dg/pr43864-2.c: Likewise.
* gcc.dg/pr43864-3.c: Likewise.
* gcc.dg/pr43864-4.c: Likewise.
* gcc.dg/pr43864.c: Likewise.
* gcc.dg/tree-prof/cold_partition_label.c: Likewise.
* gcc.dg/tree-prof/indir-call-prof.c: Likewise.
* gcc.dg/tree-prof/update-cunroll-2.c: Likewise.
* gcc.dg/tree-prof/update-tailcall.c: Likewise.
* gcc.dg/tree-prof/val-prof-1.c: Likewise.
* gcc.dg/tree-prof/val-prof-2.c: Likewise.
* gcc.dg/tree-prof/val-prof-3.c: Likewise.
* gcc.dg/tree-prof/val-prof-4.c: Likewise.
* gcc.dg/tree-prof/val-prof-5.c: Likewise.
* gcc.dg/tree-ssa/fnsplit-1.c: Likewise.
* gcc.dg/tree-ssa/loop-ch-profile-2.c: Likewise.
* gcc.dg/tree-ssa/update-threading.c: Likewise.
* gcc.dg/tree-ssa/update-unswitch-1.c: Likewise.
* gcc.dg/unroll-7.c: Likewise.
* gcc.dg/unroll-8.c: Likewise.
* gfortran.dg/pr25623-2.f90: Likewise.
* gfortran.dg/pr25623.f90: Likewise.
* gcc.dg/tree-ssa/loop-ch-profile-1.c: Likewise; xfail.
* gcc.dg/tree-ssa/update-cunroll.c: Likewise; xfail.
* gcc.dg/tree-ssa/update-unroll-1.c: Likewise; xfail.
|
|
Fix two bugs in scale_loop_profile which crept in during my cleanups and
curiously enough did not show on the testcases we have so far.
The patch also adds the missing call to cap the iteration count of vectorized
loop epilogues.
The vectorizer profile needs more work, but I am trying to chase out obvious bugs first
so that the profile quality statistics become meaningful and we can try to improve on them.
Now we get:
Pass dump id and name |static mismatch|dynamic mismatch
                      |in count       |in count
107t cunrolli | 3 +3| 17251 +17251
116t vrp | 5 +2| 30908 +16532
118t dce | 3 -2| 17251 -13657
127t ch | 13 +10| 17251
131t dom | 39 +26| 17251
133t isolate-paths | 47 +8| 17251
134t reassoc | 49 +2| 17251
136t forwprop | 53 +4| 202501 +185250
159t cddce | 61 +8| 216211 +13710
161t ldist | 62 +1| 216211
172t ifcvt | 66 +4| 373711 +157500
173t vect | 143 +77| 9801947 +9428236
176t cunroll | 149 +6| 12006408 +2204461
183t loopdone | 146 -3| 11944469 -61939
195t fre | 142 -4| 11944469
197t dom | 141 -1| 13038435 +1093966
199t threadfull | 143 +2| 13246410 +207975
200t vrp | 145 +2| 13444579 +198169
204t dce | 143 -2| 13371315 -73264
206t sink | 141 -2| 13371315
211t cddce | 147 +6| 13372755 +1440
255t optimized | 145 -2| 13372755
256r expand | 141 -4| 13371197 -1558
258r into_cfglayout | 139 -2| 13371197
275r loop2_unroll | 143 +4| 16792056 +3420859
291r ce2 | 141 -2| 16811462
312r pro_and_epilogue | 161 +20| 16873400 +61938
315r jump2 | 167 +6| 20910158 +4036758
323r bbro | 160 -7| 16559844 -4350314
Vect still introduces 77 profile mismatches (same as without this patch),
however subsequent cunroll works much better, with 6 new mismatches compared to
78. Overall it reduces 229 mismatches to 160.
Also the overall runtime estimate is now reduced by 6.9%.
Previously the overall runtime estimate grew by 11%, which was a result of the fact
that the epilogue profile was pretty much the same as the profile of the original
loop.
Bootstrapped/regtested x86_64-linux, committed.
gcc/ChangeLog:
* cfgloopmanip.cc (scale_loop_profile): Fix computation of count_in and scaling blocks
after exit.
* tree-vect-loop-manip.cc (vect_do_peeling): Scale loop profile of the epilogue if bound
is known.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/vect-profile-upate.c: New test.
|
|
Do not reinitialize vector lanes to zero since they are already
initialized to zero.
gcc/ChangeLog:
* config/s390/s390.cc (vec_init): Fix default case
gcc/testsuite/ChangeLog:
* gcc.target/s390/vector/vec-init-3.c: New test.
|
|
For the given testcase a reload pseudo happened to occur only in reload
insns created on one constraint sub-pass. Therefore its initial class
(ALL_REGS) was not refined and the reload insns were not processed on
the next constraint sub-passes. This resulted into the wrong insn.
PR rtl-optimization/110372
gcc/ChangeLog:
* lra-assigns.cc (assign_by_spills): Add reload insns involving
reload pseudos with non-refined class to be processed on the next
sub-pass.
* lra-constraints.cc (enough_allocatable_hard_regs_p): New func.
(in_class_p): Use it.
(print_curr_insn_alt): New func.
(process_alt_operands): Use it. Improve debug info.
(curr_insn_transform): Use print_curr_insn_alt. Refine reload
pseudo class if it is not refined yet.
gcc/testsuite/ChangeLog:
* gcc.target/i386/pr110372.c: New.
|