aboutsummaryrefslogtreecommitdiff
path: root/gcc/config/i386/x86-tune.def
AgeCommit message (Collapse)AuthorFilesLines
2024-10-10Add a new tune avx256_avoid_vec_perm for SRF.liuhongt1-1/+6
According to Intel SOM[1], For Crestmont, most 256-bit Intel AVX2 instructions can be decomposed into two independent 128-bit micro-operations, except for a subset of Intel AVX2 instructions, known as cross-lane operations, can only compute the result for an element by utilizing one or more sources belonging to other elements. The 256-bit instructions listed below use more operand sources than can be natively supported by a single reservation station within these microarchitectures. They are decomposed into two μops, where the first μop resolves a subset of operand dependencies across two cycles. The dependent second μop executes the 256-bit operation by using a single 128-bit execution port for two consecutive cycles with a five-cycle latency for a total latency of seven cycles. VPERM2I128 ymm1, ymm2, ymm3/m256, imm8 VPERM2F128 ymm1, ymm2, ymm3/m256, imm8 VPERMPD ymm1, ymm2/m256, imm8 VPERMPS ymm1, ymm2, ymm3/m256 VPERMD ymm1, ymm2, ymm3/m256 VPERMQ ymm1, ymm2/m256, imm8 Instead of setting tune avx128_optimal for SRF, the patch add a new tune avx256_avoid_vec_perm for it. so by default, vectorizer still uses 256-bit VF if cost is profitable, but lowers to 128-bit whenever 256-bit vec_perm is needed for auto-vectorization. w/o vec_perm, performance of 256-bit vectorization should be similar as 128-bit ones(some benchmark results show it's even better than 128-bit vectorization since it enables more parallelism for convert cases.) [1] https://www.intel.com/content/www/us/en/content-details/814198/intel-64-and-ia-32-architectures-optimization-reference-manual-volume-1.html gcc/ChangeLog: * config/i386/i386.cc (ix86_vector_costs::ix86_vector_costs): Add new member m_num_avx256_vec_perm. (ix86_vector_costs::add_stmt_cost): Record 256-bit vec_perm. (ix86_vector_costs::finish_cost): Prevent vectorization for TAREGT_AVX256_AVOID_VEC_PERM when there's 256-bit vec_perm instruction. * config/i386/i386.h (TARGET_AVX256_AVOID_VEC_PERM): New Macro. * config/i386/x86-tune.def (X86_TUNE_AVX256_SPLIT_REGS): Add m_CORE_ATOM. (X86_TUNE_AVX256_AVOID_VEC_PERM): New tune. gcc/testsuite/ChangeLog: * gcc.target/i386/avx256_avoid_vec_perm.c: New test.
2024-10-10Add new microarchitecture tune for SRF/GRR/CWF.liuhongt1-0/+8
For Crestmont, 4-operand vex blendv instructions come from MSROM and is slower than 3-instructions sequence (op1 & mask) | (op2 & ~mask). legacy blendv instruction can still be handled by the decoder. The patch add a new tune which is enabled for all processors except for SRF/CWF. It will use vpand + vpandn + vpor instead of vpblendvb(similar for vblendvps/vblendvpd) for SRF/CWF. gcc/ChangeLog: * config/i386/i386-expand.cc (ix86_expand_sse_movcc): Guard instruction blendv generation under new tune. * config/i386/i386.h (TARGET_SSE_MOVCC_USE_BLENDV): New Macro. * config/i386/x86-tune.def (X86_TUNE_SSE_MOVCC_USE_BLENDV): New tune.
2024-09-11Enable tune fuse_move_and_alu for GNR.liuhongt1-1/+2
According to Intel Software Optimization Manual[1], the Redwood cove microarchitecture supports LD+OP and MOV+OP macro fusions. The patch enables MOV+OP tune for GNR. [1] https://www.intel.com/content/www/us/en/content-details/814198/intel-64-and-ia-32-architectures-optimization-reference-manual-volume-1.html gcc/ChangeLog: * config/i386/x86-tune.def (X86_TUNE_FUSE_MOV_AND_ALU): Enable for GNR and GNR-D.
2024-09-03Zen5 tuning part 3: scheduler tweaksJan Hubicka1-2/+9
this patch adds support for new fussion in znver5 documented in the optimization manual: The Zen5 microarchitecture adds support to fuse reg-reg MOV Instructions with certain ALU instructions. The following conditions need to be met for fusion to happen: - The MOV should be reg-reg mov with Opcode 0x89 or 0x8B - The MOV is followed by an ALU instruction where the MOV and ALU destination register match. - The ALU instruction may source only registers or immediate data. There cannot be any memory source. - The ALU instruction sources either the source or dest of MOV instruction. - If ALU instruction has 2 reg sources, they should be different. - The following ALU instructions can fuse with an older qualified MOV instruction: ADD ADC AND XOR OP SUB SBB INC DEC NOT SAL / SHL SHR SAR (I assume OP is OR) I also increased issue rate from 4 to 6. Theoretically znver5 can do more, but with our model we can't realy use it. Increasing issue rate to 8 leads to infinite loop in scheduler. Finally, I also enabled fuse_alu_and_branch since it is supported by znver5 (I think by earlier zens too). New fussion pattern moves quite few instructions around in common code: @@ -2210,13 +2210,13 @@ .cfi_offset 3, -32 leaq 63(%rsi), %rbx movq %rbx, %rbp + shrq $6, %rbp + salq $3, %rbp subq $16, %rsp .cfi_def_cfa_offset 48 movq %rdi, %r12 - shrq $6, %rbp - movq %rsi, 8(%rsp) - salq $3, %rbp movq %rbp, %rdi + movq %rsi, 8(%rsp) call _Znwm movq 8(%rsp), %rsi movl $0, 8(%r12) @@ -2224,8 +2224,8 @@ movq %rax, (%r12) movq %rbp, 32(%r12) testq %rsi, %rsi - movq %rsi, %rdx cmovns %rsi, %rbx + movq %rsi, %rdx sarq $63, %rdx shrq $58, %rdx sarq $6, %rbx which should help decoder bandwidth and perhaps also cache, though I was not able to measure off-noise effect on SPEC. gcc/ChangeLog: * config/i386/i386.h (TARGET_FUSE_MOV_AND_ALU): New tune. * config/i386/x86-tune-sched.cc (ix86_issue_rate): Updat for znver5. (ix86_adjust_cost): Add TODO about znver5 memory latency. (ix86_fuse_mov_alu_p): New. (ix86_macro_fusion_pair_p): Use it. * config/i386/x86-tune.def (X86_TUNE_FUSE_ALU_AND_BRANCH): Add ZNVER5. (X86_TUNE_FUSE_MOV_AND_ALU): New tune;
2024-09-03Zen5 tuning part 2: disable gather and scatterJan Hubicka1-6/+6
We disable gathers for zen4. It seems that gather has improved a bit compared to zen4 and Zen5 optimization manual suggests "Avoid GATHER instructions when the indices are known ahead of time. Vector loads followed by shuffles result in a higher load bandwidth." however the situation seems to be more complicated. gather is 5-10% loss on parest benchmark as well as 30% loss on sparse dot products in TSVC. Curiously enough breaking these out into microbenchmark reversed the situation and it turns out that the performance depends on how indices are distributed. gather is loss if indices are sequential, neutral if they are random and win for some strides (4, 8). This seems to be similar to earlier zens, so I think (especially for backporting znver5 support) that it makes sense to be conistent and disable gather unless we work out a good heuristics on when to use it. Since we typically do not know the indices in advance, I don't see how that can be done. I opened PR116582 with some examples of wins and loses gcc/ChangeLog: * config/i386/x86-tune.def (X86_TUNE_USE_GATHER_2PARTS): Disable for ZNVER5. (X86_TUNE_USE_SCATTER_2PARTS): Disable for ZNVER5. (X86_TUNE_USE_GATHER_4PARTS): Disable for ZNVER5. (X86_TUNE_USE_SCATTER_4PARTS): Disable for ZNVER5. (X86_TUNE_USE_GATHER_8PARTS): Disable for ZNVER5. (X86_TUNE_USE_SCATTER_8PARTS): Disable for ZNVER5.
2024-09-03Zen5 tuning part 1: avoid FMA chainsJan Hubicka1-4/+5
testing matrix multiplication benchmarks shows that FMA on a critical chain is a perofrmance loss over separate multiply and add. While the latency of 4 is lower than multiply + add (3+2) the problem is that all values needs to be ready before computation starts. While on znver4 AVX512 code fared well with FMA, it was because of the split registers. Znver5 benefits from avoding FMA on all widths. This may be different with the mobile version though. On naive matrix multiplication benchmark the difference is 8% with -O3 only since with -Ofast loop interchange solves the problem differently. It is 30% win, for example, on S323 from TSVC: real_t s323(struct args_t * func_args) { // recurrences // coupled recurrence initialise_arrays(__func__); gettimeofday(&func_args->t1, NULL); for (int nl = 0; nl < iterations/2; nl++) { for (int i = 1; i < LEN_1D; i++) { a[i] = b[i-1] + c[i] * d[i]; b[i] = a[i] + c[i] * e[i]; } dummy(a, b, c, d, e, aa, bb, cc, 0.); } gettimeofday(&func_args->t2, NULL); return calc_checksum(__func__); } gcc/ChangeLog: * config/i386/x86-tune.def (X86_TUNE_AVOID_128FMA_CHAINS): Enable for znver5. (X86_TUNE_AVOID_256FMA_CHAINS): Likewise. (X86_TUNE_AVOID_512FMA_CHAINS): Likewise.
2024-07-08x86: Update branch hint for Redwood Cove.H.J. Lu1-2/+11
According to Intel® 64 and IA-32 Architectures Optimization Reference Manual[1], Branch Hint is updated for Redwood Cove. --------cut from [1]------------------------- Starting with the Redwood Cove microarchitecture, if the predictor has no stored information about a branch, the branch has the Intel® SSE2 branch taken hint (i.e., instruction prefix 3EH), When the codec decodes the branch, it flips the branch’s prediction from not-taken to taken. It then flushes the pipeline in front of it and steers this pipeline to fetch the taken path of the branch. --------cut end ----------------------------- Split tune branch_prediction_hints into branch_prediction_hints_taken and branch_prediction_hints_not_taken, always generate branch hint for conditional branches, both tunes are disabled by default. [1] https://www.intel.com/content/www/us/en/content-details/821612/intel-64-and-ia-32-architectures-optimization-reference-manual-volume-1.html gcc/ * config/i386/i386.cc (ix86_print_operand): Always generate branch hint for conditional branches. * config/i386/i386.h (TARGET_BRANCH_PREDICTION_HINTS): Split into .. (TARGET_BRANCH_PREDICTION_HINTS_TAKEN): .. this, and .. (TARGET_BRANCH_PREDICTION_HINTS_NOT_TAKEN): .. this. * config/i386/x86-tune.def (X86_TUNE_BRANCH_PREDICTION_HINTS): Split into .. (X86_TUNE_BRANCH_PREDICTION_HINTS_TAKEN): .. this, and .. (X86_TUNE_BRANCH_PREDICTION_HINTS_NOT_TAKEN): .. this.
2024-06-19i386: Zhaoxin shijidadao enablementmayshao1-4/+4
This patch enables -march/-mtune=shijidadao, costs and tunings are set according to the characteristics of the processor. gcc/ChangeLog: * common/config/i386/cpuinfo.h (get_zhaoxin_cpu): Recognize shijidadao. * common/config/i386/i386-common.cc: Add shijidadao. * common/config/i386/i386-cpuinfo.h (enum processor_subtypes): Add ZHAOXIN_FAM7H_SHIJIDADAO. * config.gcc: Add shijidadao. * config/i386/driver-i386.cc (host_detect_local_cpu): Let -march=native recognize shijidadao processors. * config/i386/i386-c.cc (ix86_target_macros_internal): Add shijidadao. * config/i386/i386-options.cc (m_ZHAOXIN): Add m_SHIJIDADAO. (m_SHIJIDADAO): New definition. * config/i386/i386.h (enum processor_type): Add PROCESSOR_SHIJIDADAO. * config/i386/x86-tune-costs.h (struct processor_costs): Add shijidadao_cost. * config/i386/x86-tune-sched.cc (ix86_issue_rate): Add shijidadao. (ix86_adjust_cost): Ditto. * config/i386/x86-tune.def (X86_TUNE_USE_GATHER_2PARTS): Add m_SHIJIDADAO. (X86_TUNE_USE_GATHER_4PARTS): Ditto. (X86_TUNE_USE_GATHER_8PARTS): Ditto. (X86_TUNE_AVOID_128FMA_CHAINS): Ditto. * doc/extend.texi: Add details about shijidadao. * doc/invoke.texi: Ditto. gcc/testsuite/ChangeLog: * g++.target/i386/mv32.C: Handle new -march * gcc.target/i386/funcspec-56.inc: Ditto.
2024-06-14Remove one_if_conv for latest Intel processors.liuhongt1-2/+2
The tune is added by PR79390 for SciMark2 on Broadwell. For latest GCC, with and without the -mtune-ctrl=^one_if_conv_insn. GCC will generate the same binary for SciMark2. And for SPEC2017, there's no big impact for SKX/CLX/ICX, and small improvements on SPR and later. gcc/ChangeLog: * config/i386/x86-tune.def (X86_TUNE_ONE_IF_CONV_INSN): Remove latest Intel processors. Co-authored by: Lingling Kong <lingling.kong@intel.com>
2024-05-20i386: Remove Xeon Phi ISA supportHaochen Jiang1-52/+41
gcc/ChangeLog: * common/config/i386/cpuinfo.h (get_intel_cpu): Remove Xeon Phi cpus. (get_available_features): Remove Xeon Phi ISAs. * common/config/i386/i386-common.cc (OPTION_MASK_ISA_AVX512PF_SET): Removed. (OPTION_MASK_ISA_AVX512ER_SET): Ditto. (OPTION_MASK_ISA2_AVX5124FMAPS_SET): Ditto. (OPTION_MASK_ISA2_AVX5124VNNIW_SET): Ditto. (OPTION_MASK_ISA_PREFETCHWT1_SET): Ditto. (OPTION_MASK_ISA_AVX512F_UNSET): Remove AVX512PF and AVX512ER. (OPTION_MASK_ISA_AVX512PF_UNSET): Removed. (OPTION_MASK_ISA_AVX512ER_UNSET): Ditto. (OPTION_MASK_ISA2_AVX5124FMAPS_UNSET): Ditto. (OPTION_MASK_ISA2_AVX5124VNNIW_UNSET): Ditto. (OPTION_MASK_ISA_PREFETCHWT1_UNSET): Ditto. (OPTION_MASK_ISA2_AVX512F_UNSET): Remove AVX5124FMAPS and AVX5125VNNIW. (ix86_handle_option): Remove Xeon Phi options. (processor_names): Remove Xeon Phi cpus. (processor_alias_table): Ditto. * common/config/i386/i386-cpuinfo.h (enum processor_types): Ditto. (enum processor_features): Remove Xeon Phi ISAs. * common/config/i386/i386-isas.h: Ditto. * config.gcc: Remove Xeon Phi cpus and ISAs. * config/i386/avx5124fmapsintrin.h: Remove intrin support. * config/i386/avx5124vnniwintrin.h: Ditto. * config/i386/avx512erintrin.h: Ditto. * config/i386/avx512pfintrin.h: Ditto. * config/i386/cpuid.h (bit_AVX512PF): Removed. (bit_AVX512ER): Ditto. (bit_PREFETCHWT1): Ditto. (bit_AVX5124VNNIW): Ditto. (bit_AVX5124FMAPS): Ditto. * config/i386/driver-i386.cc (host_detect_local_cpu): Remove Xeon Phi. * config/i386/i386-builtin-types.def: Remove unused types. * config/i386/i386-builtin.def (BDESC): Remove builtins. * config/i386/i386-builtins.cc (ix86_init_mmx_sse_builtins): Ditto. * config/i386/i386-c.cc (ix86_target_macros_internal): Remove Xeon Phi cpus and ISAs. * config/i386/i386-expand.cc (ix86_expand_builtin): Remove Xeon Phi related handlers. (ix86_emit_swdivsf): Ditto. (ix86_emit_swsqrtsf): Ditto. * config/i386/i386-isa.def: Remove Xeon Phi ISAs. * config/i386/i386-options.cc (m_KNL): Removed. (m_KNM): Ditto. (isa2_opts): Remove Xeon Phi ISAs. (isa_opts): Ditto. (processor_cost_table): Remove Xeon Phi cpus. (ix86_valid_target_attribute_inner_p): Remove Xeon Phi ISAs. (ix86_option_override_internal): Remove Xeon Phi related handlers. * config/i386/i386-rust.cc (ix86_rust_target_cpu_info): Remove Xeon Phi ISAs. * config/i386/i386.cc (ix86_hard_regno_mode_ok): Remove Xeon Phi related handler. * config/i386/i386.h (TARGET_EMIT_VZEROUPPER): Removed. (enum processor_type): Remove Xeon Phi cpus. * config/i386/i386.md (prefetch): Remove PREFETCHWT1. (*prefetch_3dnow): Ditto. (*prefetch_prefetchwt1): Removed. * config/i386/i386.opt: Remove Xeon Phi ISAs. * config/i386/immintrin.h: Ditto. * config/i386/sse.md (VF1_AVX512ER_128_256): Removed. (rsqrt<mode>2): Change iterator from VF1_AVX512ER_128_256 to VF1_128_256. (GATHER_SCATTER_SF_MEM_MODE): Removed. (avx512pf_gatherpf<mode>sf): Ditto. (*avx512pf_gatherpf<VI48_512:mode>sf_mask): Ditto. (avx512pf_gatherpf<mode>df): Ditto. (*avx512pf_gatherpf<VI4_256_8_512:mode>df_mask): Ditto. (avx512pf_scatterpf<mode>sf): Ditto. (*avx512pf_scatterpf<VI48_512:mode>sf_mask): Ditto. (avx512pf_scatterpf<mode>df): Ditto. (*avx512pf_scatterpf<VI4_256_8_512:mode>df_mask): Ditto. (exp2<mode>2): Ditto. (avx512er_exp2<mode><mask_name><round_saeonly_name>): Ditto. (<mask_codefor>avx512er_rcp28<mode><mask_name><round_saeonly_name>): Ditto. (avx512er_vmrcp28<mode><mask_name><round_saeonly_name>): Ditto. (<mask_codefor>avx512er_rsqrt28<mode><mask_name><round_saeonly_name>): Ditto. (avx512er_vmrsqrt28<mode><mask_name><round_saeonly_name>): Ditto. (IMOD4): Ditto. (imod4_narrow): Ditto. (mov<mode>): Ditto. (*mov<mode>_internal): Ditto. (avx5124fmaddps_4fmaddps): Ditto. (avx5124fmaddps_4fmaddps_mask): Ditto. (avx5124fmaddps_4fmaddps_maskz): Ditto. (avx5124fmaddps_4fmaddss): Ditto. (avx5124fmaddps_4fmaddss_mask): Ditto. (avx5124fmaddps_4fmaddss_maskz): Ditto. (avx5124fmaddps_4fnmaddps): Ditto. (avx5124fmaddps_4fnmaddps_mask): Ditto. (avx5124fmaddps_4fnmaddps_maskz): Ditto. (avx5124fmaddps_4fnmaddss): Ditto. (avx5124fmaddps_4fnmaddss_mask): Ditto. (avx5124fmaddps_4fnmaddss_maskz): Ditto. (avx5124vnniw_vp4dpwssd): Ditto. (avx5124vnniw_vp4dpwssd_mask): Ditto. (avx5124vnniw_vp4dpwssd_maskz): Ditto. (avx5124vnniw_vp4dpwssds): Ditto. (avx5124vnniw_vp4dpwssds_mask): Ditto. (avx5124vnniw_vp4dpwssds_maskz): Ditto. * config/i386/x86-tune-sched.cc (ix86_issue_rate): Remove Xeon Phi cpus. (ix86_adjust_cost): Ditto. * config/i386/x86-tune.def (X86_TUNE_SCHEDULE): Ditto. (X86_TUNE_PARTIAL_REG_DEPENDENCY): Ditto. (X86_TUNE_MOVX): Ditto. (X86_TUNE_MEMORY_MISMATCH_STALL): Ditto. (X86_TUNE_ACCUMULATE_OUTGOING_ARGS): Ditto. (X86_TUNE_FOUR_JUMP_LIMIT): Ditto. (X86_TUNE_USE_INCDEC): Ditto. (X86_TUNE_INTEGER_DFMODE_MOVES): Ditto. (X86_TUNE_OPT_AGU): Ditto. (X86_TUNE_AVOID_LEA_FOR_ADDR): Ditto. (X86_TUNE_AVOID_MEM_OPND_FOR_CMOVE): Ditto. (X86_TUNE_USE_SAHF): Ditto. (X86_TUNE_USE_CLTD): Ditto. (X86_TUNE_USE_BT): Ditto. (X86_TUNE_ONE_IF_CONV_INSN): Ditto. (X86_TUNE_EXPAND_ABS): Ditto. (X86_TUNE_USE_SIMODE_FIOP): Ditto. (X86_TUNE_EXT_80387_CONSTANTS): Ditto. (X86_TUNE_SSE_UNALIGNED_LOAD_OPTIMAL): Ditto. (X86_TUNE_SSE_UNALIGNED_STORE_OPTIMAL): Ditto. (X86_TUNE_SPLIT_MEM_OPND_FOR_FP_CONVERTS): Ditto. (X86_TUNE_SLOW_PSHUFB): Ditto. (X86_TUNE_EMIT_VZEROUPPER): Removed. * config/i386/xmmintrin.h (enum _mm_hint): Remove _MM_HINT_ET1. * doc/extend.texi: Remove Xeon Phi. * doc/invoke.texi: Ditto. gcc/testsuite/ChangeLog: * g++.dg/other/i386-2.C: Remove Xeon Phi ISAs. * g++.dg/other/i386-3.C: Ditto. * g++.target/i386/mv28.C: Ditto. * gcc.target/i386/builtin_target.c: Ditto. * gcc.target/i386/sse-12.c: Ditto. * gcc.target/i386/sse-13.c: Ditto. * gcc.target/i386/sse-14.c: Ditto. * gcc.target/i386/sse-22.c: Ditto. * gcc.target/i386/sse-23.c: Ditto. * gcc.target/i386/sse-26.c: Ditto. * gcc.target/i386/avx5124fmadd-v4fmaddps-1.c: Removed. * gcc.target/i386/avx5124fmadd-v4fmaddps-2.c: Ditto. * gcc.target/i386/avx5124fmadd-v4fmaddss-1.c: Ditto. * gcc.target/i386/avx5124fmadd-v4fnmaddps-1.c: Ditto. * gcc.target/i386/avx5124fmadd-v4fnmaddps-2.c: Ditto. * gcc.target/i386/avx5124fmadd-v4fnmaddss-1.c: Ditto. * gcc.target/i386/avx5124vnniw-vp4dpwssd-1.c: Ditto. * gcc.target/i386/avx5124vnniw-vp4dpwssd-2.c: Ditto. * gcc.target/i386/avx5124vnniw-vp4dpwssds-1.c: Ditto. * gcc.target/i386/avx5124vnniw-vp4dpwssds-2.c: Ditto. * gcc.target/i386/avx512er-check.h: Ditto. * gcc.target/i386/avx512er-vexp2pd-1.c: Ditto. * gcc.target/i386/avx512er-vexp2pd-2.c: Ditto. * gcc.target/i386/avx512er-vexp2ps-1.c: Ditto. * gcc.target/i386/avx512er-vexp2ps-2.c: Ditto. * gcc.target/i386/avx512er-vrcp28pd-1.c: Ditto. * gcc.target/i386/avx512er-vrcp28pd-2.c: Ditto. * gcc.target/i386/avx512er-vrcp28ps-1.c: Ditto. * gcc.target/i386/avx512er-vrcp28ps-2.c: Ditto. * gcc.target/i386/avx512er-vrcp28ps-3.c: Ditto. * gcc.target/i386/avx512er-vrcp28ps-4.c: Ditto. * gcc.target/i386/avx512er-vrcp28sd-1.c: Ditto. * gcc.target/i386/avx512er-vrcp28sd-2.c: Ditto. * gcc.target/i386/avx512er-vrcp28ss-1.c: Ditto. * gcc.target/i386/avx512er-vrcp28ss-2.c: Ditto. * gcc.target/i386/avx512er-vrsqrt28pd-1.c: Ditto. * gcc.target/i386/avx512er-vrsqrt28pd-2.c: Ditto. * gcc.target/i386/avx512er-vrsqrt28ps-1.c: Ditto. * gcc.target/i386/avx512er-vrsqrt28ps-2.c: Ditto. * gcc.target/i386/avx512er-vrsqrt28ps-3.c: Ditto. * gcc.target/i386/avx512er-vrsqrt28ps-4.c: Ditto. * gcc.target/i386/avx512er-vrsqrt28ps-5.c: Ditto. * gcc.target/i386/avx512er-vrsqrt28ps-6.c: Ditto. * gcc.target/i386/avx512er-vrsqrt28sd-1.c: Ditto. * gcc.target/i386/avx512er-vrsqrt28sd-2.c: Ditto. * gcc.target/i386/avx512er-vrsqrt28ss-1.c: Ditto. * gcc.target/i386/avx512er-vrsqrt28ss-2.c: Ditto. * gcc.target/i386/avx512pf-vgatherpf0dpd-1.c: Ditto. * gcc.target/i386/avx512pf-vgatherpf0dps-1.c: Ditto. * gcc.target/i386/avx512pf-vgatherpf0qpd-1.c: Ditto. * gcc.target/i386/avx512pf-vgatherpf0qps-1.c: Ditto. * gcc.target/i386/avx512pf-vgatherpf1dpd-1.c: Ditto. * gcc.target/i386/avx512pf-vgatherpf1dps-1.c: Ditto. * gcc.target/i386/avx512pf-vgatherpf1qpd-1.c: Ditto. * gcc.target/i386/avx512pf-vgatherpf1qps-1.c: Ditto. * gcc.target/i386/avx512pf-vscatterpf0dpd-1.c: Ditto. * gcc.target/i386/avx512pf-vscatterpf0dps-1.c: Ditto. * gcc.target/i386/avx512pf-vscatterpf0qpd-1.c: Ditto. * gcc.target/i386/avx512pf-vscatterpf0qps-1.c: Ditto. * gcc.target/i386/avx512pf-vscatterpf1dpd-1.c: Ditto. * gcc.target/i386/avx512pf-vscatterpf1dps-1.c: Ditto. * gcc.target/i386/avx512pf-vscatterpf1qpd-1.c: Ditto. * gcc.target/i386/avx512pf-vscatterpf1qps-1.c: Ditto. * gcc.target/i386/pr104448.c: Ditto. * gcc.target/i386/pr82941-2.c: Ditto. * gcc.target/i386/pr82942-2.c: Ditto. * gcc.target/i386/pr82990-1.c: Ditto. * gcc.target/i386/pr82990-3.c: Ditto. * gcc.target/i386/pr82990-6.c: Ditto. * gcc.target/i386/pr82990-7.c: Ditto. * gcc.target/i386/pr89523-5.c: Ditto. * gcc.target/i386/pr89523-6.c: Ditto. * gcc.target/i386/pr91033.c: Ditto. * gcc.target/i386/prefetchwt1-1.c: Ditto.
2024-03-26cfgloopmanip, i386: Fix comment typosJakub Jelinek1-1/+1
I've noticed a comment typo in x86-tune.def and cfgloopmanip.cc has the same typo as well. 2024-03-26 Jakub Jelinek <jakub@redhat.com> * cfgloopmanip.cc (update_loop_exit_probability_scale_dom_bbs): Fix comment typo - multple -> multiple. * config/i386/x86-tune.def (X86_TUNE_ACCUMULATE_OUTGOING_ARGS): Likewise.
2024-03-18Add AMD znver5 processor enablement with scheduler modelJan Hubicka1-2/+2
2024-02-14 Jan Hubicka <jh@suse.cz> Karthiban Anbazhagan <Karthiban.Anbazhagan@amd.com> gcc/ChangeLog: * common/config/i386/cpuinfo.h (get_amd_cpu): Recognize znver5. * common/config/i386/i386-common.cc (processor_names): Add znver5. (processor_alias_table): Likewise. * common/config/i386/i386-cpuinfo.h (processor_types): Add new zen family. (processor_subtypes): Add znver5. * config.gcc (x86_64-*-* |...): Likewise. * config/i386/driver-i386.cc (host_detect_local_cpu): Let march=native detect znver5 cpu's. * config/i386/i386-c.cc (ix86_target_macros_internal): Add znver5. * config/i386/i386-options.cc (m_ZNVER5): New definition (processor_cost_table): Add znver5. * config/i386/i386.cc (ix86_reassociation_width): Likewise. * config/i386/i386.h (processor_type): Add PROCESSOR_ZNVER5 (PTA_ZNVER5): New definition. * config/i386/i386.md (define_attr "cpu"): Add znver5. (Scheduling descriptions) Add znver5.md. * config/i386/x86-tune-costs.h (znver5_cost): New definition. * config/i386/x86-tune-sched.cc (ix86_issue_rate): Add znver5. (ix86_adjust_cost): Likewise. * config/i386/x86-tune.def (avx512_move_by_pieces): Add m_ZNVER5. (avx512_store_by_pieces): Add m_ZNVER5. * doc/extend.texi: Add znver5. * doc/invoke.texi: Likewise. * config/i386/znver4.md: Rename to zn4zn5.md; combine znver4 and znver5 Scheduler. gcc/testsuite/ChangeLog: * g++.target/i386/mv29.C: Handle znver5 arch. * gcc.target/i386/funcspec-56.inc:Likewise.
2024-01-03Update copyright years.Jakub Jelinek1-1/+1
2023-12-29Disable FMADD in chains for Zen4 and genericJan Hubicka1-4/+4
this patch disables use of FMA in matrix multiplication loop for generic (for x86-64-v3) and zen4. I tested this on zen4 and Xenon Gold Gold 6212U. For Intel this is neutral both on the matrix multiplication microbenchmark (attached) and spec2k17 where the difference was within noise for Core. On core the micro-benchmark runs as follows: With FMA: 578,500,241 cycles:u # 3.645 GHz ( +- 0.12% ) 753,318,477 instructions:u # 1.30 insn per cycle ( +- 0.00% ) 125,417,701 branches:u # 790.227 M/sec ( +- 0.00% ) 0.159146 +- 0.000363 seconds time elapsed ( +- 0.23% ) No FMA: 577,573,960 cycles:u # 3.514 GHz ( +- 0.15% ) 878,318,479 instructions:u # 1.52 insn per cycle ( +- 0.00% ) 125,417,702 branches:u # 763.035 M/sec ( +- 0.00% ) 0.164734 +- 0.000321 seconds time elapsed ( +- 0.19% ) So the cycle count is unchanged and discrete multiply+add takes same time as FMA. While on zen: With FMA: 484875179 cycles:u # 3.599 GHz ( +- 0.05% ) (82.11%) 752031517 instructions:u # 1.55 insn per cycle 125106525 branches:u # 928.712 M/sec ( +- 0.03% ) (85.09%) 128356 branch-misses:u # 0.10% of all branches ( +- 0.06% ) (83.58%) No FMA: 375875209 cycles:u # 3.592 GHz ( +- 0.08% ) (80.74%) 875725341 instructions:u # 2.33 insn per cycle 124903825 branches:u # 1.194 G/sec ( +- 0.04% ) (84.59%) 0.105203 +- 0.000188 seconds time elapsed ( +- 0.18% ) The diffrerence is that Cores understand the fact that fmadd does not need all three parameters to start computation, while Zen cores doesn't. Since this seems noticeable win on zen and not loss on Core it seems like good default for generic. float a[SIZE][SIZE]; float b[SIZE][SIZE]; float c[SIZE][SIZE]; void init(void) { int i, j, k; for(i=0; i<SIZE; ++i) { for(j=0; j<SIZE; ++j) { a[i][j] = (float)i + j; b[i][j] = (float)i - j; c[i][j] = 0.0f; } } } void mult(void) { int i, j, k; for(i=0; i<SIZE; ++i) { for(j=0; j<SIZE; ++j) { for(k=0; k<SIZE; ++k) { c[i][j] += a[i][k] * b[k][j]; } } } } int main(void) { clock_t s, e; init(); s=clock(); mult(); e=clock(); printf(" mult took %10d clocks\n", (int)(e-s)); return 0; } gcc/ChangeLog: * config/i386/x86-tune.def (X86_TUNE_AVOID_128FMA_CHAINS, X86_TUNE_AVOID_256FMA_CHAINS): Enable for znver4 and Core.
2023-10-30i386: Zhaoxin yongfeng enablementMayshao1-37/+38
Enable -march/-mtune=yongfeng. Costs and tunings are set according to the characteristics of the processor. Add a new .md file to describe yongfeng processor. gcc/ChangeLog: * common/config/i386/cpuinfo.h (get_zhaoxin_cpu): Recognize yongfeng. * common/config/i386/i386-common.cc: Add yongfeng. * common/config/i386/i386-cpuinfo.h (enum processor_subtypes): Add ZHAOXIN_FAM7H_YONGFENG. * config.gcc: Add yongfeng. * config/i386/driver-i386.cc (host_detect_local_cpu): Let -march=native recognize yongfeng processors. * config/i386/i386-c.cc (ix86_target_macros_internal): Add yongfeng. * config/i386/i386-options.cc (m_YONGFENG): New definition. (m_ZHAOXIN): Ditto. * config/i386/i386.h (enum processor_type): Add PROCESSOR_YONGFENG. * config/i386/i386.md: Add yongfeng. * config/i386/lujiazui.md: Fix typo. * config/i386/x86-tune-costs.h (struct processor_costs): Add yongfeng costs. * config/i386/x86-tune-sched.cc (ix86_issue_rate): Add yongfeng. (ix86_adjust_cost): Ditto. * config/i386/x86-tune.def (X86_TUNE_SCHEDULE): Replace m_LUJIAZUI with m_ZHAOXIN. (X86_TUNE_PARTIAL_REG_DEPENDENCY): Ditto. (X86_TUNE_SSE_PARTIAL_REG_DEPENDENCY): Ditto. (X86_TUNE_SSE_PARTIAL_REG_FP_CONVERTS_DEPENDENCY): Ditto. (X86_TUNE_SSE_PARTIAL_REG_CONVERTS_DEPENDENCY): Ditto. (X86_TUNE_MOVX): Ditto. (X86_TUNE_MEMORY_MISMATCH_STALL): Ditto. (X86_TUNE_FUSE_CMP_AND_BRANCH_32): Ditto. (X86_TUNE_FUSE_CMP_AND_BRANCH_64): Ditto. (X86_TUNE_FUSE_CMP_AND_BRANCH_SOFLAGS): Ditto. (X86_TUNE_FUSE_ALU_AND_BRANCH): Ditto. (X86_TUNE_ACCUMULATE_OUTGOING_ARGS): Ditto. (X86_TUNE_USE_LEAVE): Ditto. (X86_TUNE_PUSH_MEMORY): Ditto. (X86_TUNE_LCP_STALL): Ditto. (X86_TUNE_INTEGER_DFMODE_MOVES): Ditto. (X86_TUNE_OPT_AGU): Ditto. (X86_TUNE_PREFER_KNOWN_REP_MOVSB_STOSB): Ditto. (X86_TUNE_MISALIGNED_MOVE_STRING_PRO_EPILOGUES): Ditto. (X86_TUNE_USE_SAHF): Ditto. (X86_TUNE_USE_BT): Ditto. (X86_TUNE_AVOID_FALSE_DEP_FOR_BMI): Ditto. (X86_TUNE_ONE_IF_CONV_INSN): Ditto. (X86_TUNE_AVOID_MFENCE): Ditto. (X86_TUNE_EXPAND_ABS): Ditto. (X86_TUNE_USE_SIMODE_FIOP): Ditto. (X86_TUNE_USE_FFREEP): Ditto. (X86_TUNE_EXT_80387_CONSTANTS): Ditto. (X86_TUNE_SSE_UNALIGNED_LOAD_OPTIMAL): Ditto. (X86_TUNE_SSE_UNALIGNED_STORE_OPTIMAL): Ditto. (X86_TUNE_SSE_TYPELESS_STORES): Ditto. (X86_TUNE_SSE_LOAD0_BY_PXOR): Ditto. (X86_TUNE_USE_GATHER_2PARTS): Add m_YONGFENG. (X86_TUNE_USE_GATHER_4PARTS): Ditto. (X86_TUNE_USE_GATHER_8PARTS): Ditto. (X86_TUNE_AVOID_128FMA_CHAINS): Ditto. * doc/extend.texi: Add details about yongfeng. * doc/invoke.texi: Ditto. * config/i386/yongfeng.md: New file to describe yongfeng processor. gcc/testsuite/ChangeLog: * g++.target/i386/mv32.C: Handle new -march. * gcc.target/i386/funcspec-56.inc: Ditto.
2023-10-27i386: Fiy typo in "partial_memory_read_stall" tune option.Uros Bizjak1-1/+1
gcc/ChangeLog: * config/i386/x86-tune.def (X86_TUNE_PARTIAL_MEMORY_READ_STALL): i386: Fiy typo in "partial_memory_read_stall" tune option.
2023-10-25i386: Narrow test instructions with immediate operands [PR111698]Uros Bizjak1-0/+8
Narrow test instructions with immediate operand that test memory location for zero. E.g. testl $0x00aa0000, mem can be converted to testb $0xaa, mem+2. Reject targets where reading (possibly unaligned) part of memory location after a large write to the same address causes store-to-load forwarding stall. PR target/111698 gcc/ChangeLog: * config/i386/x86-tune.def (X86_TUNE_PARTIAL_MEMORY_READ_STALL): New tune. * config/i386/i386.h (TARGET_PARTIAL_MEMORY_READ_STALL): New macro. * config/i386/i386.md: New peephole pattern to narrow test instructions with immediate operands that test memory locations for zero. gcc/testsuite/ChangeLog: * gcc.target/i386/pr111698.c: New test.
2023-10-18x86: Add m_CORE_HYBRID for hybrid clients tuningHaochen Jiang1-62/+51
gcc/Changelog: * config/i386/i386-options.cc (m_CORE_HYBRID): New. * config/i386/x86-tune.def: Replace hybrid client tune to m_CORE_HYBRID.
2023-10-09i386: Implement doubleword right shifts by 1 bit using s[ha]r+rcr.Roger Sayle1-0/+3
This patch tweaks the i386 back-end's ix86_split_ashr and ix86_split_lshr functions to implement doubleword right shifts by 1 bit, using a shift of the highpart that sets the carry flag followed by a rotate-carry-right (RCR) instruction on the lowpart. Conceptually this is similar to the recent left shift patch, but with two complicating factors. The first is that although the RCR sequence is shorter, and is a ~3x performance improvement on AMD, my microbenchmarking shows it ~10% slower on Intel. Hence this patch also introduces a new X86_TUNE_USE_RCR tuning parameter. The second is that I believe this is the first time a "rotate-right-through-carry" and a right shift that sets the carry flag from the least significant bit has been modelled in GCC RTL (on a MODE_CC target). For this I've used the i386 back-end's UNSPEC_CC_NE which seems appropriate. Finally rcrsi2 and rcrdi2 are separate define_insns so that we can use their generator functions. For the pair of functions: unsigned __int128 foo(unsigned __int128 x) { return x >> 1; } __int128 bar(__int128 x) { return x >> 1; } with -O2 -march=znver4 we previously generated: foo: movq %rdi, %rax movq %rsi, %rdx shrdq $1, %rsi, %rax shrq %rdx ret bar: movq %rdi, %rax movq %rsi, %rdx shrdq $1, %rsi, %rax sarq %rdx ret with this patch we now generate: foo: movq %rsi, %rdx movq %rdi, %rax shrq %rdx rcrq %rax ret bar: movq %rsi, %rdx movq %rdi, %rax sarq %rdx rcrq %rax ret 2023-10-09 Roger Sayle <roger@nextmovesoftware.com> gcc/ChangeLog * config/i386/i386-expand.cc (ix86_split_ashr): Split shifts by one into ashr[sd]i3_carry followed by rcr[sd]i2, if TARGET_USE_RCR or -Oz. (ix86_split_lshr): Likewise, split shifts by one bit into lshr[sd]i3_carry followed by rcr[sd]i2, if TARGET_USE_RCR or -Oz. * config/i386/i386.h (TARGET_USE_RCR): New backend macro. * config/i386/i386.md (rcrsi2): New define_insn for rcrl. (rcrdi2): New define_insn for rcrq. (<anyshiftrt><mode>3_carry): New define_insn for right shifts that set the carry flag from the least significant bit, modelled using UNSPEC_CC_NE. * config/i386/x86-tune.def (X86_TUNE_USE_RCR): New tuning parameter controlling use of rcr 1 vs. shrd, which is significantly faster on AMD processors. gcc/testsuite/ChangeLog * gcc.target/i386/rcr-1.c: New 64-bit test case. * gcc.target/i386/rcr-2.c: New 32-bit test case.
2023-08-24Fix target_clone ("arch=graniterapids-d") and target_clone ("arch=arrowlake-s")liuhongt1-31/+32
Both "graniterapid-d" and "graniterapids" are attached with PROCESSOR_GRANITERAPID in processor_alias_table but mapped to different __cpu_subtype in get_intel_cpu. And get_builtin_code_for_version will try to match the first PROCESSOR_GRANITERAPIDS in processor_alias_table which maps to "granitepraids" here. 861 else if (new_target->arch_specified && new_target->arch > 0) 1862 for (i = 0; i < pta_size; i++) 1863 if (processor_alias_table[i].processor == new_target->arch) 1864 { 1865 const pta *arch_info = &processor_alias_table[i]; 1866 switch (arch_info->priority) 1867 { 1868 default: 1869 arg_str = arch_info->name; This mismatch makes dispatch_function_versions check the preidcate of__builtin_cpu_is ("graniterapids") for "graniterapids-d" and causes the issue. The patch explicitly adds PROCESSOR_ARROWLAKE_S and PROCESSOR_GRANITERAPIDS_D to make a distinction. For "alderlake","raptorlake", "meteorlake" they share same isa, cost, tuning, and mapped to the same __cpu_type/__cpu_subtype in get_intel_cpu, so no need to add PROCESSOR_RAPTORLAKE and others. gcc/ChangeLog: * common/config/i386/i386-common.cc (processor_names): Add new member graniterapids-s and arrowlake-s. * config/i386/i386-options.cc (processor_alias_table): Update table with PROCESSOR_ARROWLAKE_S and PROCESSOR_GRANITERAPIDS_D. (m_GRANITERAPID_D): New macro. (m_ARROWLAKE_S): Ditto. (m_CORE_AVX512): Add m_GRANITERAPIDS_D. (processor_cost_table): Add icelake_cost for PROCESSOR_GRANITERAPIDS_D and alderlake_cost for PROCESSOR_ARROWLAKE_S. * config/i386/x86-tune.def: Hanlde m_ARROWLAKE_S same as m_ARROWLAKE. * config/i386/i386.h (enum processor_type): Add new member PROCESSOR_GRANITERAPIDS_D and PROCESSOR_ARROWLAKE_S. * config/i386/i386-c.cc (ix86_target_macros_internal): Handle PROCESSOR_GRANITERAPIDS_D and PROCESSOR_ARROWLAKE_S
2023-08-16Support -m[no-]gather -m[no-]scatter to enable/disable vectorization for all ↵liuhongt1-2/+2
gather/scatter instructions Rename original use_gather to use_gather_8parts, Support -mtune-ctrl={,^}use_gather to set/clear tune features use_gather_{2parts, 4parts, 8parts}. Support the new option -mgather as alias of -mtune-ctrl=, use_gather, ^use_gather. Similar for use_scatter. gcc/ChangeLog: * config/i386/i386-builtins.cc (ix86_vectorize_builtin_gather): Adjust for use_gather_8parts. * config/i386/i386-options.cc (parse_mtune_ctrl_str): Set/Clear tune features use_{gather,scatter}_{2parts, 4parts, 8parts} for -mtune-crtl={,^}{use_gather,use_scatter}. * config/i386/i386.cc (ix86_vectorize_builtin_scatter): Adjust for use_scatter_8parts * config/i386/i386.h (TARGET_USE_GATHER): Rename to .. (TARGET_USE_GATHER_8PARTS): .. this. (TARGET_USE_SCATTER): Rename to .. (TARGET_USE_SCATTER_8PARTS): .. this. * config/i386/x86-tune.def (X86_TUNE_USE_GATHER): Rename to (X86_TUNE_USE_GATHER_8PARTS): .. this. (X86_TUNE_USE_SCATTER): Rename to (X86_TUNE_USE_SCATTER_8PARTS): .. this. * config/i386/i386.opt: Add new options mgather, mscatter.
2023-08-16Software mitigation: Disable gather generation in vectorization for GDS ↵liuhongt1-3/+3
affected Intel Processors. For more details of GDS (Gather Data Sampling), refer to https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/advisory-guidance/gather-data-sampling.html After microcode update, there's performance regression. To avoid that, the patch disables gather generation in autovectorization but uses gather scalar emulation instead. gcc/ChangeLog: * config/i386/i386-options.cc (m_GDS): New macro. * config/i386/x86-tune.def (X86_TUNE_USE_GATHER_2PARTS): Don't enable for m_GDS. (X86_TUNE_USE_GATHER_4PARTS): Ditto. (X86_TUNE_USE_GATHER): Ditto. gcc/testsuite/ChangeLog: * gcc.target/i386/avx2-gather-2.c: Adjust options to keep gather vectorization. * gcc.target/i386/avx2-gather-6.c: Ditto. * gcc.target/i386/avx512f-pr88464-1.c: Ditto. * gcc.target/i386/avx512f-pr88464-5.c: Ditto. * gcc.target/i386/avx512vl-pr88464-1.c: Ditto. * gcc.target/i386/avx512vl-pr88464-11.c: Ditto. * gcc.target/i386/avx512vl-pr88464-3.c: Ditto. * gcc.target/i386/avx512vl-pr88464-9.c: Ditto. * gcc.target/i386/pr88531-1b.c: Ditto. * gcc.target/i386/pr88531-1c.c: Ditto.
2023-07-17Initial Lunar Lake, Arrow Lake and Arrow Lake S SupportMo, Zewei1-39/+53
gcc/ChangeLog: * common/config/i386/cpuinfo.h (get_intel_cpu): Handle Lunar Lake, Arrow Lake and Arrow Lake S. * common/config/i386/i386-common.cc: (processor_name): Add arrowlake. (processor_alias_table): Add arrow lake, arrow lake s and lunar lake. * common/config/i386/i386-cpuinfo.h (enum processor_subtypes): Add INTEL_COREI7_ARROWLAKE and INTEL_COREI7_ARROWLAKE_S. * config.gcc: Add -march=arrowlake and -march=arrowlake-s. * config/i386/driver-i386.cc (host_detect_local_cpu): Handle arrowlake-s. * config/i386/i386-c.cc (ix86_target_macros_internal): Add arrowlake. * config/i386/i386-options.cc (m_ARROWLAKE): New. (processor_cost_table): Add arrowlake. * config/i386/i386.h (enum processor_type): Add PROCESSOR_ARROWLAKE. * config/i386/x86-tune.def: Add m_ARROWLAKE. * doc/extend.texi: Add arrowlake and arrowlake-s. * doc/invoke.texi: Ditto. gcc/testsuite/ChangeLog: * g++.target/i386/mv16.C: Add arrowlake and arrowlake-s. * gcc.target/i386/funcspec-56.inc: Handle new march.
2023-06-07Add support for stc and cmc instructions in i386.mdRoger Sayle1-0/+4
This patch is the latest revision of my patch to add support for the STC (set carry flag) and CMC (complement carry flag) instructions to the i386 backend, incorporating Uros' previous feedback. The significant changes are (i) the inclusion of CMC, (ii) the use of UNSPEC for pattern, (iii) Use of a new X86_TUNE_SLOW_STC tuning flag to use alternate implementations on pentium4 (which has a notoriously slow STC) when not optimizing for size. An example of the use of the stc instruction is: unsigned int foo (unsigned int a, unsigned int b, unsigned int *c) { return __builtin_ia32_addcarryx_u32 (1, a, b, c); } which previously generated: movl $1, %eax addb $-1, %al adcl %esi, %edi setc %al movl %edi, (%rdx) movzbl %al, %eax ret with this patch now generates: stc adcl %esi, %edi setc %al movl %edi, (%rdx) movzbl %al, %eax ret An example of the use of the cmc instruction (where the carry from a first adc is inverted/complemented as input to a second adc) is: unsigned int bar (unsigned int a, unsigned int b, unsigned int c, unsigned int d) { unsigned int c1 = __builtin_ia32_addcarryx_u32 (1, a, b, &o1); return __builtin_ia32_addcarryx_u32 (c1 ^ 1, c, d, &o2); } which previously generated: movl $1, %eax addb $-1, %al adcl %esi, %edi setnc %al movl %edi, o1(%rip) addb $-1, %al adcl %ecx, %edx setc %al movl %edx, o2(%rip) movzbl %al, %eax ret and now generates: stc adcl %esi, %edi cmc movl %edi, o1(%rip) adcl %ecx, %edx setc %al movl %edx, o2(%rip) movzbl %al, %eax ret This version implements Uros' suggestions/refinements. (i) Avoid the UNSPEC_CMC by using the canonical RTL idiom for *x86_cmc, (ii) Use peephole2s to convert x86_stc and *x86_cmc into alternate forms on TARGET_SLOW_STC CPUs (pentium4), when a suitable QImode register is available, (iii) Prefer the addqi_cconly_overflow idiom (addb $-1,%al) over negqi_ccc_1 (neg %al) for setting the carry from a QImode value, These changes required two minor edits to i386.cc: ix86_cc_mode had to be tweaked to suggest CCCmode for the new *x86_cmc pattern, and *x86_cmc needed to be handled/parameterized in ix86_rtx_costs so that combine would appreciate that this complex RTL expression was actually a fast, single byte instruction [i.e. preferable]. 2022-06-07 Roger Sayle <roger@nextmovesoftware.com> Uros Bizjak <ubizjak@gmail.com> gcc/ChangeLog * config/i386/i386-expand.cc (ix86_expand_builtin) <handlecarry>: Use new x86_stc instruction when the carry flag must be set. * config/i386/i386.cc (ix86_cc_mode): Use CCCmode for *x86_cmc. (ix86_rtx_costs): Provide accurate rtx_costs for *x86_cmc. * config/i386/i386.h (TARGET_SLOW_STC): New define. * config/i386/i386.md (UNSPEC_STC): New UNSPEC for stc. (x86_stc): New define_insn. (define_peephole2): Convert x86_stc into alternate implementation on pentium4 without -Os when a QImode register is available. (*x86_cmc): New define_insn. (define_peephole2): Convert *x86_cmc into alternate implementation on pentium4 without -Os when a QImode register is available. (*setccc): New define_insn_and_split for a no-op CCCmode move. (*setcc_qi_negqi_ccc_1_<mode>): New define_insn_and_split to recognize (and eliminate) the carry flag being copied to itself. (*setcc_qi_negqi_ccc_2_<mode>): Likewise. * config/i386/x86-tune.def (X86_TUNE_SLOW_STC): New tuning flag. gcc/testsuite/ChangeLog * gcc.target/i386/cmc-1.c: New test case. * gcc.target/i386/stc-1.c: Likewise.
2023-05-27Disable avoid_false_dep_for_bmi for atom and icelake(and later) core processors.liuhongt1-1/+2
lzcnt/tzcnt has been fixed since skylake, popcnt has been fixed since icelake. At least for icelake and later intel Core processors, the errata tune is not needed. And the tune isn't need for ATOM either. gcc/ChangeLog: * config/i386/x86-tune.def (X86_TUNE_AVOID_FALSE_DEP_FOR_BMI): Remove ATOM and ICELAKE(and later) core processors.
2023-03-06Enable scatter for genericJan Hubicka1-3/+3
2023-03-06 Jan Hubicka <hubicka@ucw.cz> PR target/108429 * config/i386/x86-tune.def (X86_TUNE_USE_SCATTER_2PARTS): Enable for generic. (X86_TUNE_USE_SCATTER_4PARTS): Likewise. (X86_TUNE_USE_SCATTER): Likewise.
2023-02-07Enable 512 bit vector for zen4Jan Hubicka1-1/+1
While internally 512 registers are splits into two 256 halves, 512 bit vectors reduces number of instructions to retire and has chance to improve paralelism. There are few tsvc benchmarks that improves significantly: runtime benchmark 256bit 512bit s2275 48.57 20.67 -58% s311 32.29 16.06 -50% s312 32.30 16.07 -50% vsumr 32.30 16.07 -50% s314 10.77 5.42 -50% s313 21.52 10.85 -50% vdotr 43.05 21.69 -50% s316 10.80 5.64 -48% s235 61.72 33.91 -45% s161 15.91 9.95 -38% s3251 32.13 20.31 -36% And there are no benchmarks with off-noise regression. The basic matrix multiplication loop improves by 32%. It is also expected that 512 bit vectors are more power effecient (I can't masure that). The down side is that loops with low trip counts may get slower when the unvectorized prologue and epilogue is hit more often. With SPECfp this problem happens with x264 (12% regression) and bwaves (6% regression) and this is tracked in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108410 and will need more work on vectorizer to support masked epilogues. After some additional testing it seems that using 512 bit vectors by default is now overall better choice. Bootstrapped/regtested x86_64-linux. Plan to commit it tomorrow. * config/i386/x86-tune.def (X86_TUNE_AVX256_OPTIMAL): Turn off for znver4.
2023-01-16Disable gather/scatter for zen4Jan Hubicka1-4/+19
this patch adds more tunes for zen4: - new tunes for avx512 scater instructions. In micro benchmarks these seems consistent loss compared to open-coded coe - disable use of gather for zen4 While these are win for a micro benchmarks (based on TSVC), enabling gather is a loss for parest. So for now it seems safe to keep it off. - disable pass to avoid FMA chains for znver4 since fmadd was optimized and does not seem to cause regressions. * config/i386/i386.cc (ix86_vectorize_builtin_scatter): Guard scatter by TARGET_USE_SCATTER. * config/i386/i386.h (TARGET_USE_SCATTER_2PARTS, TARGET_USE_SCATTER_4PARTS, TARGET_USE_SCATTER): New macros. * config/i386/x86-tune.def (TARGET_USE_SCATTER_2PARTS, TARGET_USE_SCATTER_4PARTS, TARGET_USE_SCATTER): New tunes. (X86_TUNE_AVOID_256FMA_CHAINS, X86_TUNE_AVOID_512FMA_CHAINS): Disable for znver4. (X86_TUNE_USE_GATHER): Disable for zen4.
2023-01-16Update copyright years.Jakub Jelinek1-1/+1
2022-12-22Zen4 tuning part 2Jan Hubicka1-8/+15
Adds tunes needed for zen4 microarchitecture. I added two new knobs. TARGET_AVX512_SPLIT_REGS which is used to specify that internally 512 vectors are split to 256 vectors. This affects vectorization costs and reassociation width. It probably should also affect RTX costs however I doubt it is very useful since RTL optimizers are usually not judging between 256 and 512 vectors. I also added X86_TUNE_AVOID_256FMA_CHAINS. Since fma has improved in zen4 this flag may not be a win except for very specific benchmarks. I am still doing some more detailed testing here. Oherwise I disabled gathers on zen4 for 2 parts nad 4 parts. We can open code them and since the latencies has only increased since zen3 opencoding is better than actual instrucction. This shows at 4 tsvc benchmarks. I ended up setting AVX256_OPTIMAL. This is a compromise. There are some tsvc benchmarks that increase noticeably (up to 250%) however there are also few regressions. Most of these can be solved by incrasing vec_perm cost in the vectorizer. However this does not cure about 14% regression on x264 that is quite important. Here we produce vectorized loops for avx512 that probably would be faster if the loops in question had high enough iteration count. We hit this problem with avx256 too: since the loop iterates few times, only prologues/epilogues are used. Adding another round of prologue/epilogue code does not make it better. Finally I enabled avx stores for constnat sized memcpy and memset. I am not sure why this is an opt-in feature. I think for most hardware this is a win. gcc/ChangeLog: 2022-12-22 Jan Hubicka <hubicka@ucw.cz> * config/i386/i386-expand.cc (ix86_expand_set_or_cpymem): Add TARGET_AVX512_SPLIT_REGS * config/i386/i386-options.cc (ix86_option_override_internal): Honor x86_TONE_AVOID_256FMA_CHAINS. * config/i386/i386.cc (ix86_vec_cost): Honor TARGET_AVX512_SPLIT_REGS. (ix86_reassociation_width): Likewise. * config/i386/i386.h (TARGET_AVX512_SPLIT_REGS): New tune. * config/i386/x86-tune.def (X86_TUNE_USE_GATHER_2PARTS): Disable for znver4. (X86_TUNE_USE_GATHER_4PARTS): Likewise. (X86_TUNE_AVOID_256FMA_CHAINS): Set for znver4. (X86_TUNE_AVOID_512FMA_CHAINS): New utne; set for znver4. (X86_TUNE_AVX256_OPTIMAL): Add znver4. (X86_TUNE_AVX512_SPLIT_REGS): New tune. (X86_TUNE_AVX256_MOVE_BY_PIECES): Add znver1-3. (X86_TUNE_AVX256_STORE_BY_PIECES): Add znver1-3. (X86_TUNE_AVX512_MOVE_BY_PIECES): Add znver4. (X86_TUNE_AVX512_STORE_BY_PIECES): Add znver4.
2022-12-07i386: Avoid fma_chain for -march=alderlake and sapphirerapids.Hongyu Wang1-1/+2
For Alderlake there is similar issue like PR 81616, enable avoid_fma256_chain will also benefit on Intel latest platforms Alderlake and Sapphire Rapids. gcc/ChangeLog: * config/i386/x86-tune.def (X86_TUNE_AVOID_256FMA_CHAINS): Add m_SAPPHIRERAPIDS, m_ALDERLAKE and m_CORE_ATOM.
2022-11-17x86: Enable 256 move by pieces for ALDERLAKE machine.Lili Cui1-2/+2
gcc/ChangeLog: * config/i386/x86-tune.def (X86_TUNE_AVX256_MOVE_BY_PIECES): Add alderlake. (X86_TUNE_AVX256_STORE_BY_PIECES): Ditto. gcc/testsuite/ChangeLog: * gcc.target/i386/pieces-memset-50.c: New test.
2022-11-08Add m_CORE_ATOM for atom coresHaochen Jiang1-31/+40
gcc/ChangeLog: * config/i386/i386-options.cc (m_CORE_ATOM): New. * config/i386/x86-tune.def (X86_TUNE_SCHEDULE): Initial tune for CORE_ATOM. (X86_TUNE_PARTIAL_REG_DEPENDENCY): Ditto. (X86_TUNE_SSE_PARTIAL_REG_DEPENDENCY): Ditto. (X86_TUNE_SSE_PARTIAL_REG_FP_CONVERTS_DEPENDENCY): Ditto. (X86_TUNE_SSE_PARTIAL_REG_CONVERTS_DEPENDENCY): Ditto. (X86_TUNE_DEST_FALSE_DEP_FOR_GLC): Ditto. (X86_TUNE_MEMORY_MISMATCH_STALL): Ditto. (X86_TUNE_USE_LEAVE): Ditto. (X86_TUNE_PUSH_MEMORY): Ditto. (X86_TUNE_USE_INCDEC): Ditto. (X86_TUNE_INTEGER_DFMODE_MOVES): Ditto. (X86_TUNE_PREFER_KNOWN_REP_MOVSB_STOSB): Ditto. (X86_TUNE_MISALIGNED_MOVE_STRING_PRO_EPILOGUES): Ditto. (X86_TUNE_USE_SAHF): Ditto. (X86_TUNE_USE_BT): Ditto. (X86_TUNE_AVOID_FALSE_DEP_FOR_BMI): Ditto. (X86_TUNE_ONE_IF_CONV_INSN): Ditto. (X86_TUNE_AVOID_MFENCE): Ditto. (X86_TUNE_USE_SIMODE_FIOP): Ditto. (X86_TUNE_EXT_80387_CONSTANTS): Ditto. (X86_TUNE_SSE_UNALIGNED_LOAD_OPTIMAL): Ditto. (X86_TUNE_SSE_UNALIGNED_STORE_OPTIMAL): Ditto. (X86_TUNE_SSE_TYPELESS_STORES): Ditto. (X86_TUNE_SSE_LOAD0_BY_PXOR): Ditto. (X86_TUNE_AVOID_4BYTE_PREFIXES): Ditto. (X86_TUNE_USE_GATHER_2PARTS): Ditto. (X86_TUNE_USE_GATHER_4PARTS): Ditto. (X86_TUNE_USE_GATHER): Ditto.
2022-05-23[x86_64]: Zhaoxin lujiazui enablementMayshao1-42/+47
This patch fix Zhaoxin CPU vendor ID detection problem and add zhaoxin "lujiazui" processor support. Currently gcc can't recognize Zhaoxin CPU (vendor ID "CentaurHauls" and "Shanghai") if user use -march=native option, which is confusing for users. This patch enables -march=native in zhaoxin family 7th processor and -march/-mtune=lujiazui, costs and tunning are set according to the characteristics of the processor. We add a new md file to describe lujiazui pipeline. Testing: Bootstrap is ok, and no regressions for i386/x86-64 testsuite. Background: Related Zhaoxin linux kernel patch can be found at: https://lore.kernel.org/lkml/01042674b2f741b2aed1f797359bdffb@zhaoxin.com/ Related Zhaoxin glibc patch can be found at: https://sourceware.org/git/?p=glibc.git;a=commit;h=32ac0b988466785d6e3cc1dffc364bb26fc63193 gcc/ChangeLog: * common/config/i386/cpuinfo.h (get_zhaoxin_cpu): Detect the specific type of Zhaoxin CPU, and return Zhaoxin CPU name. (cpu_indicator_init): Handle Zhaoxin processors. * common/config/i386/i386-common.cc: Add lujiazui. * common/config/i386/i386-cpuinfo.h (enum processor_vendor): Add VENDOR_ZHAOXIN. (enum processor_types): Add ZHAOXIN_FAM7H. (enum processor_subtypes): Add ZHAOXIN_FAM7H_LUJIAZUI. * config.gcc: Add lujiazui. * config/i386/cpuid.h (signature_SHANGHAI_ebx): Add Signatures for zhaoxin (signature_SHANGHAI_ecx): Ditto. (signature_SHANGHAI_edx): Ditto. * config/i386/driver-i386.cc (host_detect_local_cpu): Let -march=native recognize lujiazui processors. * config/i386/i386-c.cc (ix86_target_macros_internal): Add lujiazui. * config/i386/i386-options.cc (m_LUJIAZUI): New_definition. * config/i386/i386.h (enum processor_type): Ditto. * config/i386/i386.md: Add lujiazui. * config/i386/x86-tune-costs.h (struct processor_costs): Add lujiazui costs. * config/i386/x86-tune-sched.cc (ix86_issue_rate): Add lujiazui. (ix86_adjust_cost): Ditto. * config/i386/x86-tune.def (X86_TUNE_SCHEDULE): Add lujiazui Tunnings. (X86_TUNE_PARTIAL_REG_DEPENDENCY): Ditto. (X86_TUNE_SSE_PARTIAL_REG_DEPENDENCY): Ditto. (X86_TUNE_SSE_PARTIAL_REG_FP_CONVERTS_DEPENDENCY): Ditto. (X86_TUNE_SSE_PARTIAL_REG_CONVERTS_DEPENDENCY): Ditto. (X86_TUNE_MOVX): Ditto. (X86_TUNE_MEMORY_MISMATCH_STALL): Ditto. (X86_TUNE_FUSE_CMP_AND_BRANCH_32): Ditto. (X86_TUNE_FUSE_CMP_AND_BRANCH_64): Ditto. (X86_TUNE_FUSE_CMP_AND_BRANCH_SOFLAGS): Ditto. (X86_TUNE_FUSE_ALU_AND_BRANCH): Ditto. (X86_TUNE_ACCUMULATE_OUTGOING_ARGS): Ditto. (X86_TUNE_USE_LEAVE): Ditto. (X86_TUNE_PUSH_MEMORY): Ditto. (X86_TUNE_LCP_STALL): Ditto. (X86_TUNE_USE_INCDEC): Ditto. (X86_TUNE_INTEGER_DFMODE_MOVES): Ditto. (X86_TUNE_OPT_AGU): Ditto. (X86_TUNE_PREFER_KNOWN_REP_MOVSB_STOSB): Ditto. (X86_TUNE_MISALIGNED_MOVE_STRING_PRO_EPILOGUES): Ditto. (X86_TUNE_USE_SAHF): Ditto. (X86_TUNE_USE_BT): Ditto. (X86_TUNE_AVOID_FALSE_DEP_FOR_BMI): Ditto. (X86_TUNE_ONE_IF_CONV_INSN): Ditto. (X86_TUNE_AVOID_MFENCE): Ditto. (X86_TUNE_EXPAND_ABS): Ditto. (X86_TUNE_USE_SIMODE_FIOP): Ditto. (X86_TUNE_USE_FFREEP): Ditto. (X86_TUNE_EXT_80387_CONSTANTS): Ditto. (X86_TUNE_SSE_UNALIGNED_LOAD_OPTIMAL): Ditto. (X86_TUNE_SSE_UNALIGNED_STORE_OPTIMAL): Ditto. (X86_TUNE_SSE_TYPELESS_STORES): Ditto. (X86_TUNE_SSE_LOAD0_BY_PXOR): Ditto. * doc/extend.texi: Add details about lujiazui. * doc/invoke.texi: Add details about lujiazui. * config/i386/lujiazui.md: Introduce lujiazui cpu and include new md file. gcc/testsuite/ChangeLog: * gcc.target/i386/funcspec-56.inc: Test -arch=lujiauzi and -tune=lujiazui. * g++.target/i386/mv32.C: Ditto. Signed-off-by: mayshao <mayshao-oc@zhaoxin.com>
2022-03-29Disable gathers for znver3 for vectors with 2 or 4 elementsJan Hubicka1-1/+12
gcc/ChangeLog: 2022-03-28 Jan Hubicka <hubicka@ucw.cz> * config/i386/i386-builtins.cc (ix86_vectorize_builtin_gather): Test TARGET_USE_GATHER_2PARTS and TARGET_USE_GATHER_4PARTS. * config/i386/i386.h (TARGET_USE_GATHER_2PARTS): New macro. (TARGET_USE_GATHER_4PARTS): New macro. * config/i386/x86-tune.def (X86_TUNE_USE_GATHER_2PARTS): New tune (X86_TUNE_USE_GATHER_4PARTS): New tune
2022-01-17Change references of .c files to .cc filesMartin Liska1-2/+2
ChangeLog: * MAINTAINERS: Rename .c names to .cc. contrib/ChangeLog: * filter-clang-warnings.py: Rename .c names to .cc. * gcc_update: Likewise. * paranoia.cc: Likewise. contrib/header-tools/ChangeLog: * README: Rename .c names to .cc. gcc/ChangeLog: * Makefile.in: Rename .c names to .cc. * alias.h: Likewise. * asan.cc: Likewise. * auto-profile.h: Likewise. * basic-block.h (struct basic_block_d): Likewise. * btfout.cc: Likewise. * builtins.cc (expand_builtin_longjmp): Likewise. (validate_arg): Likewise. (access_ref::offset_bounded): Likewise. * caller-save.cc (reg_restore_code): Likewise. (setup_save_areas): Likewise. * calls.cc (initialize_argument_information): Likewise. (expand_call): Likewise. (emit_library_call_value_1): Likewise. * cfg-flags.def (RTL): Likewise. (SIBCALL): Likewise. (CAN_FALLTHRU): Likewise. * cfganal.cc (post_order_compute): Likewise. * cfgcleanup.cc (try_simplify_condjump): Likewise. (merge_blocks_move_predecessor_nojumps): Likewise. (merge_blocks_move_successor_nojumps): Likewise. (merge_blocks_move): Likewise. (old_insns_match_p): Likewise. (try_crossjump_bb): Likewise. * cfgexpand.cc (expand_gimple_stmt): Likewise. * cfghooks.cc (split_block_before_cond_jump): Likewise. (profile_record_check_consistency): Likewise. * cfghooks.h: Likewise. * cfgrtl.cc (pass_free_cfg::execute): Likewise. (rtl_can_merge_blocks): Likewise. (try_redirect_by_replacing_jump): Likewise. (make_pass_outof_cfg_layout_mode): Likewise. (cfg_layout_can_merge_blocks_p): Likewise. * cgraph.cc (release_function_body): Likewise. (cgraph_node::get_fun): Likewise. * cgraph.h (struct cgraph_node): Likewise. (asmname_hasher::equal): Likewise. (cgraph_inline_failed_type): Likewise. (thunk_adjust): Likewise. (dump_callgraph_transformation): Likewise. (record_references_in_initializer): Likewise. (ipa_discover_variable_flags): Likewise. * cgraphclones.cc (GTY): Likewise. * cgraphunit.cc (symbol_table::finalize_compilation_unit): Likewise. * collect-utils.h (GCC_COLLECT_UTILS_H): Likewise. * collect2-aix.h (GCC_COLLECT2_AIX_H): Likewise. * collect2.cc (maybe_run_lto_and_relink): Likewise. * combine-stack-adj.cc: Likewise. * combine.cc (setup_incoming_promotions): Likewise. (combine_simplify_rtx): Likewise. (count_rtxs): Likewise. * common.opt: Likewise. * common/config/aarch64/aarch64-common.cc: Likewise. * common/config/arm/arm-common.cc (arm_asm_auto_mfpu): Likewise. * common/config/avr/avr-common.cc: Likewise. * common/config/i386/i386-isas.h (struct _isa_names_table): Likewise. * conditions.h: Likewise. * config.gcc: Likewise. * config/aarch64/aarch64-builtins.cc (aarch64_resolve_overloaded_memtag): Likewise. * config/aarch64/aarch64-protos.h (aarch64_classify_address): Likewise. (aarch64_get_extension_string_for_isa_flags): Likewise. * config/aarch64/aarch64-sve-builtins.cc (function_builder::add_function): Likewise. * config/aarch64/aarch64.cc (aarch64_regmode_natural_size): Likewise. (aarch64_sched_first_cycle_multipass_dfa_lookahead): Likewise. (aarch64_option_valid_attribute_p): Likewise. (aarch64_short_vector_p): Likewise. (aarch64_float_const_representable_p): Likewise. * config/aarch64/aarch64.h (DBX_REGISTER_NUMBER): Likewise. (ASM_OUTPUT_POOL_EPILOGUE): Likewise. (GTY): Likewise. * config/aarch64/cortex-a57-fma-steering.cc: Likewise. * config/aarch64/driver-aarch64.cc (contains_core_p): Likewise. * config/aarch64/t-aarch64: Likewise. * config/aarch64/x-aarch64: Likewise. * config/aarch64/x-darwin: Likewise. * config/alpha/alpha-protos.h: Likewise. * config/alpha/alpha.cc (alpha_scalar_mode_supported_p): Likewise. * config/alpha/alpha.h (LONG_DOUBLE_TYPE_SIZE): Likewise. (enum reg_class): Likewise. * config/alpha/alpha.md: Likewise. * config/alpha/driver-alpha.cc (AMASK_LOCKPFTCHOK): Likewise. * config/alpha/x-alpha: Likewise. * config/arc/arc-protos.h (arc_eh_uses): Likewise. * config/arc/arc.cc (ARC_OPT): Likewise. (arc_ccfsm_advance): Likewise. (arc_arg_partial_bytes): Likewise. (conditionalize_nonjump): Likewise. * config/arc/arc.md: Likewise. * config/arc/builtins.def: Likewise. * config/arc/t-arc: Likewise. * config/arm/arm-c.cc (arm_resolve_overloaded_builtin): Likewise. (arm_pragma_target_parse): Likewise. * config/arm/arm-protos.h (save_restore_target_globals): Likewise. (arm_cpu_cpp_builtins): Likewise. * config/arm/arm.cc (vfp3_const_double_index): Likewise. (shift_op): Likewise. (thumb2_final_prescan_insn): Likewise. (arm_final_prescan_insn): Likewise. (arm_asm_output_labelref): Likewise. (arm_small_register_classes_for_mode_p): Likewise. * config/arm/arm.h: Likewise. * config/arm/arm.md: Likewise. * config/arm/driver-arm.cc: Likewise. * config/arm/symbian.h: Likewise. * config/arm/t-arm: Likewise. * config/arm/thumb1.md: Likewise. * config/arm/x-arm: Likewise. * config/avr/avr-c.cc (avr_register_target_pragmas): Likewise. * config/avr/avr-fixed.md: Likewise. * config/avr/avr-log.cc (avr_log_vadump): Likewise. * config/avr/avr-mcus.def: Likewise. * config/avr/avr-modes.def (FRACTIONAL_INT_MODE): Likewise. * config/avr/avr-passes.def (INSERT_PASS_BEFORE): Likewise. * config/avr/avr-protos.h (make_avr_pass_casesi): Likewise. * config/avr/avr.cc (avr_option_override): Likewise. (avr_build_builtin_va_list): Likewise. (avr_mode_dependent_address_p): Likewise. (avr_function_arg_advance): Likewise. (avr_asm_output_aligned_decl_common): Likewise. * config/avr/avr.h (RETURN_ADDR_RTX): Likewise. (SUPPORTS_INIT_PRIORITY): Likewise. * config/avr/avr.md: Likewise. * config/avr/builtins.def: Likewise. * config/avr/gen-avr-mmcu-specs.cc (IN_GEN_AVR_MMCU_TEXI): Likewise. * config/avr/gen-avr-mmcu-texi.cc (IN_GEN_AVR_MMCU_TEXI): Likewise. (main): Likewise. * config/avr/t-avr: Likewise. * config/bfin/bfin.cc (frame_related_constant_load): Likewise. * config/bpf/bpf-protos.h (GCC_BPF_PROTOS_H): Likewise. * config/bpf/bpf.h (enum reg_class): Likewise. * config/bpf/t-bpf: Likewise. * config/c6x/c6x-protos.h (GCC_C6X_PROTOS_H): Likewise. * config/cr16/cr16-protos.h: Likewise. * config/cris/cris.cc (cris_address_cost): Likewise. (cris_side_effect_mode_ok): Likewise. (cris_init_machine_status): Likewise. (cris_emit_movem_store): Likewise. * config/cris/cris.h (INDEX_REG_CLASS): Likewise. (enum reg_class): Likewise. (struct cum_args): Likewise. * config/cris/cris.opt: Likewise. * config/cris/sync.md: Likewise. * config/csky/csky.cc (csky_expand_prologue): Likewise. * config/darwin-c.cc: Likewise. * config/darwin-f.cc: Likewise. * config/darwin-sections.def (zobj_const_section): Likewise. * config/darwin.cc (output_objc_section_asm_op): Likewise. (fprintf): Likewise. * config/darwin.h (GTY): Likewise. * config/elfos.h: Likewise. * config/epiphany/epiphany-sched.md: Likewise. * config/epiphany/epiphany.cc (epiphany_function_value): Likewise. * config/epiphany/epiphany.h (GTY): Likewise. (NO_FUNCTION_CSE): Likewise. * config/epiphany/mode-switch-use.cc: Likewise. * config/epiphany/predicates.md: Likewise. * config/epiphany/t-epiphany: Likewise. * config/fr30/fr30-protos.h: Likewise. * config/frv/frv-protos.h: Likewise. * config/frv/frv.cc (TLS_BIAS): Likewise. * config/frv/frv.h (ASM_OUTPUT_ALIGNED_LOCAL): Likewise. * config/ft32/ft32-protos.h: Likewise. * config/gcn/gcn-hsa.h (ASM_APP_OFF): Likewise. * config/gcn/gcn.cc (gcn_init_libfuncs): Likewise. * config/gcn/mkoffload.cc (copy_early_debug_info): Likewise. * config/gcn/t-gcn-hsa: Likewise. * config/gcn/t-omp-device: Likewise. * config/h8300/h8300-protos.h (GCC_H8300_PROTOS_H): Likewise. (same_cmp_following_p): Likewise. * config/h8300/h8300.cc (F): Likewise. * config/h8300/h8300.h (struct cum_arg): Likewise. (BRANCH_COST): Likewise. * config/i386/cygming.h (DEFAULT_PCC_STRUCT_RETURN): Likewise. * config/i386/djgpp.h (TARGET_ASM_LTO_END): Likewise. * config/i386/dragonfly.h (NO_PROFILE_COUNTERS): Likewise. * config/i386/driver-i386.cc (detect_caches_intel): Likewise. * config/i386/freebsd.h (NO_PROFILE_COUNTERS): Likewise. * config/i386/i386-c.cc (ix86_target_macros): Likewise. * config/i386/i386-expand.cc (get_mode_wider_vector): Likewise. * config/i386/i386-options.cc (ix86_set_func_type): Likewise. * config/i386/i386-protos.h (ix86_extract_perm_from_pool_constant): Likewise. (ix86_register_pragmas): Likewise. (ix86_d_has_stdcall_convention): Likewise. (i386_pe_seh_init_sections): Likewise. * config/i386/i386.cc (ix86_function_arg_regno_p): Likewise. (ix86_function_value_regno_p): Likewise. (ix86_compute_frame_layout): Likewise. (legitimize_pe_coff_symbol): Likewise. (output_pic_addr_const): Likewise. * config/i386/i386.h (defined): Likewise. (host_detect_local_cpu): Likewise. (CONSTANT_ADDRESS_P): Likewise. (DEFAULT_LARGE_SECTION_THRESHOLD): Likewise. (struct machine_frame_state): Likewise. * config/i386/i386.md: Likewise. * config/i386/lynx.h (ASM_OUTPUT_ALIGN): Likewise. * config/i386/mmx.md: Likewise. * config/i386/sse.md: Likewise. * config/i386/t-cygming: Likewise. * config/i386/t-djgpp: Likewise. * config/i386/t-gnu-property: Likewise. * config/i386/t-i386: Likewise. * config/i386/t-intelmic: Likewise. * config/i386/t-omp-device: Likewise. * config/i386/winnt-cxx.cc (i386_pe_type_dllimport_p): Likewise. (i386_pe_adjust_class_at_definition): Likewise. * config/i386/winnt.cc (gen_stdcall_or_fastcall_suffix): Likewise. (i386_pe_mangle_decl_assembler_name): Likewise. (i386_pe_encode_section_info): Likewise. * config/i386/x-cygwin: Likewise. * config/i386/x-darwin: Likewise. * config/i386/x-i386: Likewise. * config/i386/x-mingw32: Likewise. * config/i386/x86-tune-sched-core.cc: Likewise. * config/i386/x86-tune.def: Likewise. * config/i386/xm-djgpp.h (STANDARD_STARTFILE_PREFIX_1): Likewise. * config/ia64/freebsd.h: Likewise. * config/ia64/hpux.h (REGISTER_TARGET_PRAGMAS): Likewise. * config/ia64/ia64-protos.h (ia64_except_unwind_info): Likewise. * config/ia64/ia64.cc (ia64_function_value_regno_p): Likewise. (ia64_secondary_reload_class): Likewise. (bundling): Likewise. * config/ia64/ia64.h: Likewise. * config/ia64/ia64.md: Likewise. * config/ia64/predicates.md: Likewise. * config/ia64/sysv4.h: Likewise. * config/ia64/t-ia64: Likewise. * config/iq2000/iq2000.h (FUNCTION_MODE): Likewise. * config/iq2000/iq2000.md: Likewise. * config/linux.h (TARGET_HAS_BIONIC): Likewise. (if): Likewise. * config/m32c/m32c.cc (m32c_function_needs_enter): Likewise. * config/m32c/m32c.h (MAX_REGS_PER_ADDRESS): Likewise. * config/m32c/t-m32c: Likewise. * config/m32r/m32r-protos.h: Likewise. * config/m32r/m32r.cc (m32r_print_operand): Likewise. * config/m32r/m32r.h: Likewise. * config/m32r/m32r.md: Likewise. * config/m68k/m68k-isas.def: Likewise. * config/m68k/m68k-microarchs.def: Likewise. * config/m68k/m68k-protos.h (strict_low_part_peephole_ok): Likewise. (m68k_epilogue_uses): Likewise. * config/m68k/m68k.cc (m68k_call_tls_get_addr): Likewise. (m68k_sched_adjust_cost): Likewise. (m68k_sched_md_init): Likewise. * config/m68k/m68k.h (__transfer_from_trampoline): Likewise. (enum m68k_function_kind): Likewise. * config/m68k/m68k.md: Likewise. * config/m68k/m68kemb.h: Likewise. * config/m68k/uclinux.h (ENDFILE_SPEC): Likewise. * config/mcore/mcore-protos.h: Likewise. * config/mcore/mcore.cc (mcore_expand_insv): Likewise. (mcore_expand_prolog): Likewise. * config/mcore/mcore.h (TARGET_MCORE): Likewise. * config/mcore/mcore.md: Likewise. * config/microblaze/microblaze-protos.h: Likewise. * config/microblaze/microblaze.cc (microblaze_legitimate_pic_operand): Likewise. (microblaze_function_prologue): Likewise. (microblaze_function_epilogue): Likewise. (microblaze_select_section): Likewise. (microblaze_asm_output_mi_thunk): Likewise. (microblaze_eh_return): Likewise. * config/microblaze/microblaze.h: Likewise. * config/microblaze/microblaze.md: Likewise. * config/microblaze/t-microblaze: Likewise. * config/mips/driver-native.cc: Likewise. * config/mips/loongson2ef.md: Likewise. * config/mips/mips-protos.h (mips_expand_vec_cmp_expr): Likewise. * config/mips/mips.cc (mips_rtx_costs): Likewise. (mips_output_filename): Likewise. (mips_output_function_prologue): Likewise. (mips_output_function_epilogue): Likewise. (mips_output_mi_thunk): Likewise. * config/mips/mips.h: Likewise. * config/mips/mips.md: Likewise. * config/mips/t-mips: Likewise. * config/mips/x-native: Likewise. * config/mmix/mmix-protos.h: Likewise. * config/mmix/mmix.cc (mmix_option_override): Likewise. (mmix_dbx_register_number): Likewise. (mmix_expand_prologue): Likewise. * config/mmix/mmix.h: Likewise. * config/mmix/mmix.md: Likewise. * config/mmix/predicates.md: Likewise. * config/mn10300/mn10300.cc (mn10300_symbolic_operand): Likewise. (mn10300_legitimate_pic_operand_p): Likewise. * config/mn10300/mn10300.h (enum reg_class): Likewise. (NO_FUNCTION_CSE): Likewise. * config/moxie/moxie-protos.h: Likewise. * config/moxie/uclinux.h (TARGET_LIBC_HAS_FUNCTION): Likewise. * config/msp430/msp430-devices.cc (extract_devices_dir_from_exec_prefix): Likewise. * config/msp430/msp430.cc (msp430_gimplify_va_arg_expr): Likewise. (msp430_incoming_return_addr_rtx): Likewise. * config/msp430/msp430.h (msp430_get_linker_devices_include_path): Likewise. * config/msp430/t-msp430: Likewise. * config/nds32/nds32-cost.cc (nds32_rtx_costs_speed_prefer): Likewise. (nds32_rtx_costs_size_prefer): Likewise. (nds32_init_rtx_costs): Likewise. * config/nds32/nds32-doubleword.md: Likewise. * config/nds32/nds32.cc (nds32_memory_move_cost): Likewise. (nds32_builtin_decl): Likewise. * config/nds32/nds32.h (enum nds32_16bit_address_type): Likewise. (enum nds32_isr_nested_type): Likewise. (enum reg_class): Likewise. * config/nds32/predicates.md: Likewise. * config/nds32/t-nds32: Likewise. * config/nios2/nios2.cc (nios2_pragma_target_parse): Likewise. * config/nvptx/nvptx-protos.h: Likewise. * config/nvptx/nvptx.cc (nvptx_goacc_expand_var_decl): Likewise. * config/nvptx/nvptx.h (TARGET_CPU_CPP_BUILTINS): Likewise. * config/nvptx/t-nvptx: Likewise. * config/nvptx/t-omp-device: Likewise. * config/pa/elf.h: Likewise. * config/pa/pa-linux.h (GLOBAL_ASM_OP): Likewise. * config/pa/pa-netbsd.h (GLOBAL_ASM_OP): Likewise. * config/pa/pa-openbsd.h (TARGET_ASM_GLOBALIZE_LABEL): Likewise. * config/pa/pa-protos.h (pa_eh_return_handler_rtx): Likewise. (pa_legitimize_reload_address): Likewise. (pa_can_use_return_insn): Likewise. * config/pa/pa.cc (mem_shadd_or_shadd_rtx_p): Likewise. (som_output_text_section_asm_op): Likewise. * config/pa/pa.h (PROFILE_BEFORE_PROLOGUE): Likewise. * config/pa/pa.md: Likewise. * config/pa/som.h: Likewise. * config/pa/t-pa: Likewise. * config/pdp11/pdp11.cc (decode_pdp11_d): Likewise. * config/pdp11/pdp11.h: Likewise. * config/pdp11/pdp11.md: Likewise. * config/pdp11/t-pdp11: Likewise. * config/pru/pru.md: Likewise. * config/pru/t-pru: Likewise. * config/riscv/riscv-protos.h (NUM_SYMBOL_TYPES): Likewise. (riscv_gpr_save_operation_p): Likewise. (riscv_d_register_target_info): Likewise. (riscv_init_builtins): Likewise. * config/riscv/riscv.cc (riscv_output_mi_thunk): Likewise. * config/riscv/riscv.h (CSW_MAX_OFFSET): Likewise. * config/riscv/t-riscv: Likewise. * config/rl78/rl78.cc (rl78_asm_ctor_dtor): Likewise. * config/rl78/t-rl78: Likewise. * config/rs6000/aix.h: Likewise. * config/rs6000/aix71.h (ASM_SPEC_COMMON): Likewise. * config/rs6000/aix72.h (ASM_SPEC_COMMON): Likewise. * config/rs6000/aix73.h (ASM_SPEC_COMMON): Likewise. * config/rs6000/darwin.h (TARGET_ASM_GLOBALIZE_LABEL): Likewise. * config/rs6000/driver-rs6000.cc: Likewise. * config/rs6000/freebsd.h: Likewise. * config/rs6000/freebsd64.h: Likewise. * config/rs6000/lynx.h (ASM_OUTPUT_ALIGN): Likewise. * config/rs6000/rbtree.cc: Likewise. * config/rs6000/rbtree.h: Likewise. * config/rs6000/rs6000-c.cc (rs6000_target_modify_macros): Likewise. * config/rs6000/rs6000-call.cc (rs6000_invalid_builtin): Likewise. (rs6000_expand_builtin): Likewise. (rs6000_init_builtins): Likewise. * config/rs6000/rs6000-cpus.def: Likewise. * config/rs6000/rs6000-gen-builtins.cc (write_init_ovld_table): Likewise. * config/rs6000/rs6000-internal.h (ALTIVEC_REG_BIT): Likewise. (quad_address_offset_p): Likewise. * config/rs6000/rs6000-logue.cc (interesting_frame_related_regno): Likewise. (rs6000_emit_epilogue): Likewise. * config/rs6000/rs6000-overload.def: Likewise. * config/rs6000/rs6000-p8swap.cc: Likewise. * config/rs6000/rs6000-protos.h (GCC_RS6000_PROTOS_H): Likewise. (rs6000_const_f32_to_i32): Likewise. * config/rs6000/rs6000.cc (legitimate_lo_sum_address_p): Likewise. (rs6000_debug_legitimize_address): Likewise. (rs6000_mode_dependent_address): Likewise. (rs6000_adjust_priority): Likewise. (rs6000_c_mode_for_suffix): Likewise. * config/rs6000/rs6000.h (defined): Likewise. (LONG_DOUBLE_TYPE_SIZE): Likewise. * config/rs6000/rs6000.md: Likewise. * config/rs6000/sysv4.h: Likewise. * config/rs6000/t-linux: Likewise. * config/rs6000/t-linux64: Likewise. * config/rs6000/t-rs6000: Likewise. * config/rs6000/x-darwin: Likewise. * config/rs6000/x-darwin64: Likewise. * config/rs6000/x-rs6000: Likewise. * config/rs6000/xcoff.h (ASM_OUTPUT_LABELREF): Likewise. * config/rx/rx.cc (rx_expand_builtin): Likewise. * config/s390/constraints.md: Likewise. * config/s390/driver-native.cc: Likewise. * config/s390/htmxlintrin.h: Likewise. * config/s390/s390-builtins.def (B_DEF): Likewise. (OB_DEF_VAR): Likewise. * config/s390/s390-builtins.h: Likewise. * config/s390/s390-c.cc: Likewise. * config/s390/s390-opts.h: Likewise. * config/s390/s390-protos.h (s390_check_symref_alignment): Likewise. (s390_register_target_pragmas): Likewise. * config/s390/s390.cc (s390_init_builtins): Likewise. (s390_expand_plus_operand): Likewise. (s390_expand_atomic): Likewise. (s390_valid_target_attribute_inner_p): Likewise. * config/s390/s390.h (LONG_DOUBLE_TYPE_SIZE): Likewise. * config/s390/s390.md: Likewise. * config/s390/t-s390: Likewise. * config/s390/vx-builtins.md: Likewise. * config/s390/x-native: Likewise. * config/sh/divtab-sh4-300.cc (main): Likewise. * config/sh/divtab-sh4.cc (main): Likewise. * config/sh/divtab.cc (main): Likewise. * config/sh/elf.h: Likewise. * config/sh/sh-protos.h (sh_fsca_int2sf): Likewise. * config/sh/sh.cc (SYMBOL_FLAG_FUNCVEC_FUNCTION): Likewise. (sh_struct_value_rtx): Likewise. (sh_remove_reg_dead_or_unused_notes): Likewise. * config/sh/sh.h (MIN_UNITS_PER_WORD): Likewise. * config/sh/t-sh: Likewise. * config/sol2-protos.h (solaris_override_options): Likewise. * config/sol2.h: Likewise. * config/sparc/driver-sparc.cc: Likewise. * config/sparc/freebsd.h: Likewise. * config/sparc/sparc-protos.h (make_pass_work_around_errata): Likewise. * config/sparc/sparc.cc (sparc_output_mi_thunk): Likewise. (sparc_asan_shadow_offset): Likewise. * config/sparc/sparc.h: Likewise. * config/sparc/sparc.md: Likewise. * config/sparc/t-sparc: Likewise. * config/sparc/x-sparc: Likewise. * config/stormy16/stormy16.cc (xstormy16_mode_dependent_address_p): Likewise. * config/t-darwin: Likewise. * config/t-dragonfly: Likewise. * config/t-freebsd: Likewise. * config/t-glibc: Likewise. * config/t-linux: Likewise. * config/t-netbsd: Likewise. * config/t-openbsd: Likewise. * config/t-pnt16-warn: Likewise. * config/t-sol2: Likewise. * config/t-vxworks: Likewise. * config/t-winnt: Likewise. * config/tilegx/t-tilegx: Likewise. * config/tilegx/tilegx-c.cc: Likewise. * config/tilegx/tilegx-protos.h (tilegx_function_profiler): Likewise. * config/tilegx/tilegx.md: Likewise. * config/tilepro/t-tilepro: Likewise. * config/tilepro/tilepro-c.cc: Likewise. * config/v850/t-v850: Likewise. * config/v850/v850-protos.h: Likewise. * config/v850/v850.cc (F): Likewise. * config/v850/v850.h (enum reg_class): Likewise. (SLOW_BYTE_ACCESS): Likewise. * config/vax/vax.cc (vax_mode_dependent_address_p): Likewise. * config/vax/vax.h (enum reg_class): Likewise. * config/vax/vax.md: Likewise. * config/visium/visium.cc (visium_legitimate_address_p): Likewise. * config/visium/visium.h: Likewise. * config/vms/t-vms: Likewise. * config/vms/vms-crtlmap.map: Likewise. * config/vms/vms-protos.h (vms_c_get_vms_ver): Likewise. * config/vx-common.h: Likewise. * config/x-darwin: Likewise. * config/x-hpux: Likewise. * config/x-linux: Likewise. * config/x-netbsd: Likewise. * config/x-openbsd: Likewise. * config/x-solaris: Likewise. * config/xtensa/xtensa-protos.h (xtensa_mem_offset): Likewise. * config/xtensa/xtensa.cc (xtensa_option_override): Likewise. * config/xtensa/xtensa.h: Likewise. * configure.ac: Likewise. * context.cc: Likewise. * convert.h: Likewise. * coretypes.h: Likewise. * coverage.cc: Likewise. * coverage.h: Likewise. * cppdefault.h (struct default_include): Likewise. * cprop.cc (local_cprop_pass): Likewise. (one_cprop_pass): Likewise. * cse.cc (hash_rtx_cb): Likewise. (fold_rtx): Likewise. * ctfc.h (ctfc_get_num_vlen_bytes): Likewise. * data-streamer.h (bp_unpack_var_len_int): Likewise. (streamer_write_widest_int): Likewise. * dbgcnt.def: Likewise. * dbxout.cc (dbxout_early_global_decl): Likewise. (dbxout_common_check): Likewise. * dbxout.h: Likewise. * debug.h (struct gcc_debug_hooks): Likewise. (dump_go_spec_init): Likewise. * df-core.cc: Likewise. * df-scan.cc (df_insn_info_delete): Likewise. (df_insn_delete): Likewise. * df.h (debug_df_chain): Likewise. (can_move_insns_across): Likewise. * dfp.cc (decimal_from_binary): Likewise. * diagnostic-color.cc: Likewise. * diagnostic-event-id.h: Likewise. * diagnostic-show-locus.cc (test_one_liner_labels): Likewise. * diagnostic.cc (bt_callback): Likewise. (num_digits): Likewise. * doc/avr-mmcu.texi: Likewise. * doc/cfg.texi: Likewise. * doc/contrib.texi: Likewise. * doc/cppinternals.texi: Likewise. * doc/extend.texi: Likewise. * doc/generic.texi: Likewise. * doc/gimple.texi: Likewise. * doc/gty.texi: Likewise. * doc/invoke.texi: Likewise. * doc/loop.texi: Likewise. * doc/lto.texi: Likewise. * doc/match-and-simplify.texi: Likewise. * doc/md.texi: Likewise. * doc/optinfo.texi: Likewise. * doc/options.texi: Likewise. * doc/passes.texi: Likewise. * doc/plugins.texi: Likewise. * doc/rtl.texi: Likewise. * doc/sourcebuild.texi: Likewise. * doc/tm.texi: Likewise. * doc/tm.texi.in: Likewise. * doc/tree-ssa.texi: Likewise. * dojump.cc (do_jump): Likewise. * dojump.h: Likewise. * dumpfile.cc (test_impl_location): Likewise. (test_capture_of_dump_calls): Likewise. * dumpfile.h (enum dump_kind): Likewise. (class dump_location_t): Likewise. (dump_enabled_p): Likewise. (enable_rtl_dump_file): Likewise. (dump_combine_total_stats): Likewise. * dwarf2asm.cc (dw2_asm_output_delta_uleb128): Likewise. * dwarf2ctf.h (ctf_debug_finish): Likewise. * dwarf2out.cc (dwarf2out_begin_prologue): Likewise. (struct loc_descr_context): Likewise. (rtl_for_decl_location): Likewise. (gen_subprogram_die): Likewise. (gen_label_die): Likewise. (is_trivial_indirect_ref): Likewise. (dwarf2out_late_global_decl): Likewise. (dwarf_file_hasher::hash): Likewise. (dwarf2out_end_source_file): Likewise. (dwarf2out_define): Likewise. (dwarf2out_early_finish): Likewise. * dwarf2out.h (struct dw_fde_node): Likewise. (struct dw_discr_list_node): Likewise. (output_loc_sequence_raw): Likewise. * emit-rtl.cc (gen_raw_REG): Likewise. (maybe_set_max_label_num): Likewise. * emit-rtl.h (struct rtl_data): Likewise. * errors.cc (internal_error): Likewise. (trim_filename): Likewise. * et-forest.cc: Likewise. * except.cc (init_eh_for_function): Likewise. * explow.cc (promote_ssa_mode): Likewise. (get_dynamic_stack_size): Likewise. * explow.h: Likewise. * expmed.h: Likewise. * expr.cc (safe_from_p): Likewise. (expand_expr_real_2): Likewise. (expand_expr_real_1): Likewise. * file-prefix-map.cc (remap_filename): Likewise. * final.cc (app_enable): Likewise. (make_pass_compute_alignments): Likewise. (final_scan_insn_1): Likewise. (final_scan_insn): Likewise. * fixed-value.h (fixed_from_string): Likewise. * flag-types.h (NO_DEBUG): Likewise. (DWARF2_DEBUG): Likewise. (VMS_DEBUG): Likewise. (BTF_DEBUG): Likewise. (enum ctf_debug_info_levels): Likewise. * fold-const.cc (const_binop): Likewise. (fold_binary_loc): Likewise. (fold_checksum_tree): Likewise. * fp-test.cc: Likewise. * function.cc (expand_function_end): Likewise. * function.h (struct function): Likewise. * fwprop.cc (should_replace_address): Likewise. * gcc-main.cc: Likewise. * gcc-rich-location.h (class gcc_rich_location): Likewise. * gcc-symtab.h: Likewise. * gcc.cc (MIN_FATAL_STATUS): Likewise. (driver_handle_option): Likewise. (quote_spec_arg): Likewise. (driver::finalize): Likewise. * gcc.h (set_input): Likewise. * gcov-dump.cc: Likewise. * gcov.cc (solve_flow_graph): Likewise. * gcse-common.cc: Likewise. * gcse.cc (make_pass_rtl_hoist): Likewise. * genattr-common.cc: Likewise. * genattrtab.cc (min_fn): Likewise. (write_const_num_delay_slots): Likewise. * genautomata.cc: Likewise. * genconditions.cc (write_one_condition): Likewise. * genconstants.cc: Likewise. * genemit.cc (gen_exp): Likewise. * generic-match-head.cc: Likewise. * genextract.cc: Likewise. * gengenrtl.cc (always_void_p): Likewise. * gengtype-parse.cc (gtymarker_opt): Likewise. * gengtype-state.cc (state_writer::state_writer): Likewise. (write_state_trailer): Likewise. (equals_type_number): Likewise. (read_state): Likewise. * gengtype.cc (open_base_files): Likewise. (struct file_rule_st): Likewise. (header_dot_h_frul): Likewise. * gengtype.h: Likewise. * genmatch.cc (main): Likewise. * genmddeps.cc: Likewise. * genmodes.cc (emit_mode_inner): Likewise. (emit_mode_unit_size): Likewise. * genpeep.cc (gen_peephole): Likewise. * genpreds.cc (write_tm_preds_h): Likewise. * genrecog.cc (validate_pattern): Likewise. (write_header): Likewise. (main): Likewise. * gensupport.cc (change_subst_attribute): Likewise. (traverse_c_tests): Likewise. (add_predicate): Likewise. (init_predicate_table): Likewise. * gensupport.h (struct optab_pattern): Likewise. (get_num_insn_codes): Likewise. (maybe_eval_c_test): Likewise. (struct pred_data): Likewise. * ggc-internal.h: Likewise. * gimple-fold.cc (maybe_fold_reference): Likewise. (get_range_strlen_tree): Likewise. * gimple-fold.h (gimple_stmt_integer_valued_real_p): Likewise. * gimple-low.cc: Likewise. * gimple-match-head.cc (directly_supported_p): Likewise. * gimple-pretty-print.h: Likewise. * gimple-ssa-sprintf.cc (format_percent): Likewise. (adjust_range_for_overflow): Likewise. * gimple-streamer.h: Likewise. * gimple.h (struct GTY): Likewise. (is_gimple_resx): Likewise. * gimplify.cc (gimplify_expr): Likewise. (gimplify_init_constructor): Likewise. (omp_construct_selector_matches): Likewise. (gimplify_omp_target_update): Likewise. (gimplify_omp_ordered): Likewise. (gimplify_va_arg_expr): Likewise. * graphite-isl-ast-to-gimple.cc (should_copy_to_new_region): Likewise. * haifa-sched.cc (increase_insn_priority): Likewise. (try_ready): Likewise. (sched_create_recovery_edges): Likewise. * ifcvt.cc (find_if_case_1): Likewise. (find_if_case_2): Likewise. * inchash.h: Likewise. * incpath.cc (add_env_var_paths): Likewise. * input.cc (dump_location_info): Likewise. (assert_loceq): Likewise. (test_lexer_string_locations_concatenation_1): Likewise. (test_lexer_string_locations_concatenation_2): Likewise. (test_lexer_string_locations_concatenation_3): Likewise. * input.h (BUILTINS_LOCATION): Likewise. (class string_concat_db): Likewise. * internal-fn.cc (expand_MUL_OVERFLOW): Likewise. (expand_LOOP_VECTORIZED): Likewise. * ipa-cp.cc (make_pass_ipa_cp): Likewise. * ipa-fnsummary.cc (remap_freqcounting_preds_after_dup): Likewise. (ipa_fn_summary_t::duplicate): Likewise. (make_pass_ipa_fn_summary): Likewise. * ipa-fnsummary.h (enum ipa_hints_vals): Likewise. * ipa-free-lang-data.cc (fld_simplified_type): Likewise. (free_lang_data_in_decl): Likewise. * ipa-inline.cc (compute_inlined_call_time): Likewise. (inline_always_inline_functions): Likewise. * ipa-inline.h (free_growth_caches): Likewise. (inline_account_function_p): Likewise. * ipa-modref.cc (modref_access_analysis::analyze_stmt): Likewise. (modref_eaf_analysis::analyze_ssa_name): Likewise. * ipa-param-manipulation.cc (ipa_param_body_adjustments::mark_dead_statements): Likewise. (ipa_param_body_adjustments::remap_with_debug_expressions): Likewise. * ipa-prop.cc (ipa_set_node_agg_value_chain): Likewise. * ipa-prop.h (IPA_UNDESCRIBED_USE): Likewise. (unadjusted_ptr_and_unit_offset): Likewise. * ipa-reference.cc (make_pass_ipa_reference): Likewise. * ipa-reference.h (GCC_IPA_REFERENCE_H): Likewise. * ipa-split.cc (consider_split): Likewise. * ipa-sra.cc (isra_read_node_info): Likewise. * ipa-utils.h (struct ipa_dfs_info): Likewise. (recursive_call_p): Likewise. (ipa_make_function_pure): Likewise. * ira-build.cc (ira_create_allocno): Likewise. (ira_flattening): Likewise. * ira-color.cc (do_coloring): Likewise. (update_curr_costs): Likewise. * ira-conflicts.cc (process_regs_for_copy): Likewise. * ira-int.h (struct ira_emit_data): Likewise. (ira_prohibited_mode_move_regs): Likewise. (ira_get_dup_out_num): Likewise. (ira_destroy): Likewise. (ira_tune_allocno_costs): Likewise. (ira_implicitly_set_insn_hard_regs): Likewise. (ira_build_conflicts): Likewise. (ira_color): Likewise. * ira-lives.cc (process_bb_node_lives): Likewise. * ira.cc (class ira_spilled_reg_stack_slot): Likewise. (setup_uniform_class_p): Likewise. (def_dominates_uses): Likewise. * ira.h (ira_nullify_asm_goto): Likewise. * langhooks.cc (lhd_post_options): Likewise. * langhooks.h (class substring_loc): Likewise. (struct lang_hooks_for_tree_inlining): Likewise. (struct lang_hooks_for_types): Likewise. (struct lang_hooks): Likewise. * libfuncs.h (synchronize_libfunc): Likewise. * loop-doloop.cc (doloop_condition_get): Likewise. * loop-init.cc (fix_loop_structure): Likewise. * loop-invariant.cc: Likewise. * lower-subreg.h: Likewise. * lra-constraints.cc (curr_insn_transform): Likewise. * lra-int.h (struct lra_insn_reg): Likewise. (lra_undo_inheritance): Likewise. (lra_setup_reload_pseudo_preferenced_hard_reg): Likewise. (lra_split_hard_reg_for): Likewise. (lra_coalesce): Likewise. (lra_final_code_change): Likewise. * lra-spills.cc (lra_final_code_change): Likewise. * lra.cc (lra_process_new_insns): Likewise. * lto-compress.h (struct lto_compression_stream): Likewise. * lto-streamer-out.cc (DFS::DFS_write_tree_body): Likewise. (write_symbol): Likewise. * lto-streamer.h (enum LTO_tags): Likewise. (lto_value_range_error): Likewise. (lto_append_block): Likewise. (lto_streamer_hooks_init): Likewise. (stream_read_tree_ref): Likewise. (lto_prepare_function_for_streaming): Likewise. (select_what_to_stream): Likewise. (omp_lto_input_declare_variant_alt): Likewise. (cl_optimization_stream_in): Likewise. * lto-wrapper.cc (append_compiler_options): Likewise. * machmode.def: Likewise. * machmode.h (struct int_n_data_t): Likewise. * main.cc (main): Likewise. * match.pd: Likewise. * omp-builtins.def (BUILT_IN_GOMP_CRITICAL_NAME_END): Likewise. (BUILT_IN_GOMP_LOOP_ULL_ORDERED_RUNTIME_NEXT): Likewise. * omp-expand.cc (expand_omp_atomic_fetch_op): Likewise. (make_pass_expand_omp_ssa): Likewise. * omp-low.cc (struct omp_context): Likewise. (struct omp_taskcopy_context): Likewise. (lower_omp): Likewise. * omp-oacc-neuter-broadcast.cc (omp_sese_active_worker_call): Likewise. (mask_name): Likewise. (omp_sese_dump_pars): Likewise. (worker_single_simple): Likewise. * omp-offload.cc (omp_finish_file): Likewise. (execute_oacc_loop_designation): Likewise. * optabs-query.cc (lshift_cheap_p): Likewise. * optc-gen.awk: Likewise. * optc-save-gen.awk: Likewise. * optinfo-emit-json.cc (optrecord_json_writer::optrecord_json_writer): Likewise. * opts-common.cc: Likewise. * output.h (app_enable): Likewise. (output_operand_lossage): Likewise. (insn_current_reference_address): Likewise. (get_insn_template): Likewise. (output_quoted_string): Likewise. * pass_manager.h (struct register_pass_info): Likewise. * plugin.cc: Likewise. * plugin.def (PLUGIN_ANALYZER_INIT): Likewise. * plugin.h (invoke_plugin_callbacks): Likewise. * pointer-query.cc (handle_mem_ref): Likewise. * postreload-gcse.cc (alloc_mem): Likewise. * predict.h (enum prediction): Likewise. (add_reg_br_prob_note): Likewise. * prefix.h: Likewise. * profile.h (get_working_sets): Likewise. * read-md.cc: Likewise. * read-md.h (struct mapping): Likewise. (class md_reader): Likewise. (class noop_reader): Likewise. * read-rtl-function.cc (function_reader::create_function): Likewise. (function_reader::extra_parsing_for_operand_code_0): Likewise. * read-rtl.cc (initialize_iterators): Likewise. * real.cc: Likewise. * real.h (struct real_value): Likewise. (format_helper::format_helper): Likewise. (real_hash): Likewise. (real_can_shorten_arithmetic): Likewise. * recog.cc (struct target_recog): Likewise. (offsettable_nonstrict_memref_p): Likewise. (constrain_operands): Likewise. * recog.h (MAX_RECOG_ALTERNATIVES): Likewise. (which_op_alt): Likewise. (struct insn_gen_fn): Likewise. * reg-notes.def (REG_NOTE): Likewise. * reg-stack.cc: Likewise. * regs.h (reg_is_parm_p): Likewise. * regset.h: Likewise. * reload.cc (push_reload): Likewise. (find_reloads): Likewise. (find_reloads_address_1): Likewise. (find_replacement): Likewise. (refers_to_regno_for_reload_p): Likewise. (refers_to_mem_for_reload_p): Likewise. * reload.h (push_reload): Likewise. (deallocate_reload_reg): Likewise. * reload1.cc (emit_input_reload_insns): Likewise. * reorg.cc (relax_delay_slots): Likewise. * rtl.def (UNKNOWN): Likewise. (SEQUENCE): Likewise. (BARRIER): Likewise. (ASM_OPERANDS): Likewise. (EQ_ATTR_ALT): Likewise. * rtl.h (struct GTY): Likewise. (LABEL_NAME): Likewise. (LABEL_ALT_ENTRY_P): Likewise. (SUBREG_BYTE): Likewise. (get_stack_check_protect): Likewise. (dump_rtx_statistics): Likewise. (unwrap_const_vec_duplicate): Likewise. (subreg_promoted_mode): Likewise. (gen_lowpart_common): Likewise. (operand_subword): Likewise. (immed_wide_int_const): Likewise. (decide_function_section): Likewise. (active_insn_p): Likewise. (delete_related_insns): Likewise. (try_split): Likewise. (val_signbit_known_clear_p): Likewise. (simplifiable_subregs): Likewise. (set_insn_deleted): Likewise. (subreg_get_info): Likewise. (remove_free_EXPR_LIST_node): Likewise. (finish_subregs_of_mode): Likewise. (get_mem_attrs): Likewise. (lookup_constant_def): Likewise. (rtx_to_tree_code): Likewise. (hash_rtx): Likewise. (condjump_in_parallel_p): Likewise. (validate_subreg): Likewise. (make_compound_operation): Likewise. (schedule_ebbs): Likewise. (print_inline_rtx): Likewise. (fixup_args_size_notes): Likewise. (expand_dec): Likewise. (prepare_copy_insn): Likewise. (mark_elimination): Likewise. (valid_mode_changes_for_regno): Likewise. (make_debug_expr_from_rtl): Likewise. (delete_vta_debug_insns): Likewise. (simplify_using_condition): Likewise. (set_insn_locations): Likewise. (fatal_insn_not_found): Likewise. (word_register_operation_p): Likewise. * rtlanal.cc (get_call_fndecl): Likewise. (side_effects_p): Likewise. (subreg_nregs): Likewise. (rtx_cost): Likewise. (canonicalize_condition): Likewise. * rtlanal.h (rtx_properties::try_to_add_note): Likewise. * run-rtl-passes.cc (run_rtl_passes): Likewise. * sanitizer.def (BUILT_IN_ASAN_VERSION_MISMATCH_CHECK): Likewise. * sched-deps.cc (add_dependence_1): Likewise. * sched-ebb.cc (begin_move_insn): Likewise. (add_deps_for_risky_insns): Likewise. (advance_target_bb): Likewise. * sched-int.h (reemit_notes): Likewise. (struct _haifa_insn_data): Likewise. (HID): Likewise. (DEP_CANCELLED): Likewise. (debug_ds): Likewise. (number_in_ready): Likewise. (schedule_ebbs_finish): Likewise. (find_modifiable_mems): Likewise. * sched-rgn.cc (debug_rgn_dependencies): Likewise. * sel-sched-dump.cc (dump_lv_set): Likewise. * sel-sched-dump.h: Likewise. * sel-sched-ir.cc (sel_insn_rtx_cost): Likewise. (setup_id_reg_sets): Likewise. (has_dependence_p): Likewise. (sel_num_cfg_preds_gt_1): Likewise. (bb_ends_ebb_p): Likewise. * sel-sched-ir.h (struct _list_node): Likewise. (struct idata_def): Likewise. (bb_next_bb): Likewise. * sel-sched.cc (vinsn_writes_one_of_regs_p): Likewise. (choose_best_pseudo_reg): Likewise. (verify_target_availability): Likewise. (can_speculate_dep_p): Likewise. (sel_rank_for_schedule): Likewise. * selftest-run-tests.cc (selftest::run_tests): Likewise. * selftest.h (class auto_fix_quotes): Likewise. * shrink-wrap.cc (handle_simple_exit): Likewise. * shrink-wrap.h: Likewise. * simplify-rtx.cc (simplify_context::simplify_associative_operation): Likewise. (simplify_context::simplify_gen_vec_select): Likewise. * spellcheck-tree.h: Likewise. * spellcheck.h: Likewise. * statistics.h (struct function): Likewise. * stmt.cc (conditional_probability): Likewise. * stmt.h: Likewise. * stor-layout.h: Likewise. * streamer-hooks.h: Likewise. * stringpool.h: Likewise. * symtab.cc (symbol_table::change_decl_assembler_name): Likewise. * target.def (HOOK_VECTOR_END): Likewise. (type.): Likewise. * target.h (union cumulative_args_t): Likewise. (by_pieces_ninsns): Likewise. (class predefined_function_abi): Likewise. * targhooks.cc (default_translate_mode_attribute): Likewise. * timevar.def: Likewise. * timevar.h (class timer): Likewise. * toplev.h (enable_rtl_dump_file): Likewise. * trans-mem.cc (collect_bb2reg): Likewise. * tree-call-cdce.cc (gen_conditions_for_pow): Likewise. * tree-cfg.cc (remove_bb): Likewise. (verify_gimple_debug): Likewise. (remove_edge_and_dominated_blocks): Likewise. (push_fndecl): Likewise. * tree-cfgcleanup.h (GCC_TREE_CFGCLEANUP_H): Likewise. * tree-complex.cc (expand_complex_multiplication): Likewise. (expand_complex_div_straight): Likewise. * tree-core.h (enum tree_index): Likewise. (enum operand_equal_flag): Likewise. * tree-eh.cc (honor_protect_cleanup_actions): Likewise. * tree-if-conv.cc (if_convertible_gimple_assign_stmt_p): Likewise. * tree-inline.cc (initialize_inlined_parameters): Likewise. * tree-inline.h (force_value_to_type): Likewise. * tree-nested.cc (get_chain_decl): Likewise. (walk_all_functions): Likewise. * tree-object-size.h: Likewise. * tree-outof-ssa.cc: Likewise. * tree-parloops.cc (create_parallel_loop): Likewise. * tree-pretty-print.cc (print_generic_expr_to_str): Likewise. (dump_generic_node): Likewise. * tree-profile.cc (tree_profiling): Likewise. * tree-sra.cc (maybe_add_sra_candidate): Likewise. * tree-ssa-address.cc: Likewise. * tree-ssa-alias.cc: Likewise. * tree-ssa-alias.h (ao_ref::max_size_known_p): Likewise. (dump_alias_stats): Likewise. * tree-ssa-ccp.cc: Likewise. * tree-ssa-coalesce.h: Likewise. * tree-ssa-live.cc (remove_unused_scope_block_p): Likewise. * tree-ssa-loop-manip.cc (copy_phi_node_args): Likewise. * tree-ssa-loop-unswitch.cc: Likewise. * tree-ssa-math-opts.cc: Likewise. * tree-ssa-operands.cc (class operands_scanner): Likewise. * tree-ssa-pre.cc: Likewise. * tree-ssa-reassoc.cc (optimize_ops_list): Likewise. (debug_range_entry): Likewise. * tree-ssa-sccvn.cc (eliminate_dom_walker::eliminate_stmt): Likewise. * tree-ssa-sccvn.h (TREE_SSA_SCCVN_H): Likewise. * tree-ssa-scopedtables.cc (add_expr_commutative): Likewise. (equal_mem_array_ref_p): Likewise. * tree-ssa-strlen.cc (is_strlen_related_p): Likewise. * tree-ssa-strlen.h (get_range_strlen_dynamic): Likewise. * tree-ssa-tail-merge.cc (stmt_local_def): Likewise. * tree-ssa-ter.h: Likewise. * tree-ssa-threadupdate.h (enum bb_dom_status): Likewise. * tree-streamer-in.cc (lto_input_ts_block_tree_pointers): Likewise. * tree-streamer-out.cc (pack_ts_block_value_fields): Likewise. (write_ts_block_tree_pointers): Likewise. * tree-streamer.h (struct streamer_tree_cache_d): Likewise. (streamer_read_tree_bitfields): Likewise. (streamer_write_integer_cst): Likewise. * tree-vect-patterns.cc (apply_binop_and_append_stmt): Likewise. (vect_synth_mult_by_constant): Likewise. * tree-vect-stmts.cc (vectorizable_operation): Likewise. * tree-vectorizer.cc: Likewise. * tree-vectorizer.h (class auto_purge_vect_location): Likewise. (vect_update_inits_of_drs): Likewise. (vect_get_mask_type_for_stmt): Likewise. (vect_rgroup_iv_might_wrap_p): Likewise. (cse_and_gimplify_to_preheader): Likewise. (vect_free_slp_tree): Likewise. (vect_pattern_recog): Likewise. (vect_stmt_dominates_stmt_p): Likewise. * tree.cc (initialize_tree_contains_struct): Likewise. (need_assembler_name_p): Likewise. (type_with_interoperable_signedness): Likewise. * tree.def (SWITCH_EXPR): Likewise. * tree.h (TYPE_SYMTAB_ADDRESS): Likewise. (poly_int_tree_p): Likewise. (inlined_function_outer_scope_p): Likewise. (tree_code_for_canonical_type_merging): Likewise. * value-prof.cc: Likewise. * value-prof.h (get_nth_most_common_value): Likewise. (find_func_by_profile_id): Likewise. * value-range.cc (vrp_operand_equal_p): Likewise. * value-range.h: Likewise. * var-tracking.cc: Likewise. * varasm.cc (default_function_section): Likewise. (function_section_1): Likewise. (assemble_variable): Likewise. (handle_vtv_comdat_section): Likewise. * vec.h (struct vec_prefix): Likewise. * vmsdbgout.cc (full_name): Likewise. * vtable-verify.cc: Likewise. * vtable-verify.h (struct vtv_graph_node): Likewise. * xcoffout.cc: Likewise. * xcoffout.h (DEBUG_SYMS_TEXT): Likewise. gcc/ada/ChangeLog: * Make-generated.in: Rename .c names to .cc. * adaint.c: Likewise. * ctrl_c.c (dummy_handler): Likewise. * gcc-interface/Makefile.in: Likewise. * gcc-interface/config-lang.in: Likewise. * gcc-interface/decl.cc (concat_name): Likewise. (init_gnat_decl): Likewise. * gcc-interface/gigi.h (concat_name): Likewise. (init_gnat_utils): Likewise. (build_call_raise_range): Likewise. (gnat_mark_addressable): Likewise. (gnat_protect_expr): Likewise. (gnat_rewrite_reference): Likewise. * gcc-interface/lang-specs.h (ADA_DUMPS_OPTIONS): Likewise. * gcc-interface/utils.cc (GTY): Likewise. (add_deferred_type_context): Likewise. (init_gnat_utils): Likewise. * gcc-interface/utils2.cc (gnat_stable_expr_p): Likewise. (gnat_protect_expr): Likewise. (gnat_stabilize_reference_1): Likewise. (gnat_rewrite_reference): Likewise. * gsocket.h: Likewise. * init.cc (__gnat_error_handler): Likewise. * libgnarl/s-intman.ads: Likewise. * libgnarl/s-osinte__android.ads: Likewise. * libgnarl/s-osinte__darwin.ads: Likewise. * libgnarl/s-osinte__hpux.ads: Likewise. * libgnarl/s-osinte__linux.ads: Likewise. * libgnarl/s-osinte__qnx.ads: Likewise. * libgnarl/s-taskin.ads: Likewise. * rtfinal.cc: Likewise. * s-oscons-tmplt.c (CND): Likewise. * set_targ.ads: Likewise. gcc/analyzer/ChangeLog: * analyzer.cc (is_special_named_call_p): Rename .c names to .cc. (is_named_call_p): Likewise. * region-model-asm.cc (deterministic_p): Likewise. * region.cc (field_region::get_relative_concrete_offset): Likewise. * sm-malloc.cc (method_p): Likewise. * supergraph.cc (superedge::dump_dot): Likewise. gcc/c-family/ChangeLog: * c-ada-spec.cc: Rename .c names to .cc. * c-ada-spec.h: Likewise. * c-common.cc (c_build_vec_convert): Likewise. (warning_candidate_p): Likewise. * c-common.h (enum rid): Likewise. (build_real_imag_expr): Likewise. (finish_label_address_expr): Likewise. (c_get_substring_location): Likewise. (c_build_bind_expr): Likewise. (conflict_marker_get_final_tok_kind): Likewise. (c_parse_error): Likewise. (check_missing_format_attribute): Likewise. (invalid_array_size_error): Likewise. (warn_for_multistatement_macros): Likewise. (build_attr_access_from_parms): Likewise. * c-cppbuiltin.cc (c_cpp_builtins): Likewise. * c-format.cc: Likewise. * c-gimplify.cc (c_gimplify_expr): Likewise. * c-indentation.h: Likewise. * c-objc.h (objc_prop_attr_kind_for_rid): Likewise. * c-omp.cc (c_omp_predetermined_mapping): Likewise. * c-opts.cc (c_common_post_options): Likewise. (set_std_cxx23): Likewise. * c-pragma.cc (handle_pragma_redefine_extname): Likewise. * c-pretty-print.h: Likewise. gcc/c/ChangeLog: * Make-lang.in: Rename .c names to .cc. * c-convert.cc: Likewise. * c-decl.cc (struct lang_identifier): Likewise. (pop_scope): Likewise. (finish_decl): Likewise. * c-objc-common.h (GCC_C_OBJC_COMMON): Likewise. * c-parser.cc (c_parser_skip_to_end_of_block_or_statement): Likewise. * c-parser.h (GCC_C_PARSER_H): Likewise. * c-tree.h (c_keyword_starts_typename): Likewise. (finish_declspecs): Likewise. (c_get_alias_set): Likewise. (enum c_oracle_request): Likewise. (tag_exists_p): Likewise. (set_c_expr_source_range): Likewise. * c-typeck.cc (c_common_type): Likewise. (c_finish_omp_clauses): Likewise. * config-lang.in: Likewise. gcc/cp/ChangeLog: * Make-lang.in: Rename .c names to .cc. * config-lang.in: Likewise. * constexpr.cc (cxx_eval_constant_expression): Likewise. * coroutines.cc (morph_fn_to_coro): Likewise. * cp-gimplify.cc (cp_gimplify_expr): Likewise. * cp-lang.cc (struct lang_hooks): Likewise. (get_template_argument_pack_elems_folded): Likewise. * cp-objcp-common.cc (cp_tree_size): Likewise. (cp_unit_size_without_reusable_padding): Likewise. (pop_file_scope): Likewise. (cp_pushdecl): Likewise. * cp-objcp-common.h (GCC_CP_OBJCP_COMMON): Likewise. (cxx_simulate_record_decl): Likewise. * cp-tree.h (struct named_label_entry): Likewise. (current_function_return_value): Likewise. (more_aggr_init_expr_args_p): Likewise. (get_function_version_dispatcher): Likewise. (common_enclosing_class): Likewise. (strip_fnptr_conv): Likewise. (current_decl_namespace): Likewise. (do_aggregate_paren_init): Likewise. (cp_check_const_attributes): Likewise. (qualified_name_lookup_error): Likewise. (generic_targs_for): Likewise. (mark_exp_read): Likewise. (is_global_friend): Likewise. (maybe_reject_flexarray_init): Likewise. (module_token_lang): Likewise. (handle_module_option): Likewise. (literal_integer_zerop): Likewise. (build_extra_args): Likewise. (build_if_nonnull): Likewise. (maybe_check_overriding_exception_spec): Likewise. (finish_omp_target_clauses): Likewise. (maybe_warn_zero_as_null_pointer_constant): Likewise. (cxx_print_error_function): Likewise. (decl_in_std_namespace_p): Likewise. (merge_exception_specifiers): Likewise. (mangle_module_global_init): Likewise. (cxx_block_may_fallthru): Likewise. (fold_builtin_source_location): Likewise. (enum cp_oracle_request): Likewise. (subsumes): Likewise. (cp_finish_injected_record_type): Likewise. (vtv_build_vtable_verify_fndecl): Likewise. (cp_tree_c_finish_parsing): Likewise. * cvt.cc (diagnose_ref_binding): Likewise. (convert_to_void): Likewise. (convert_force): Likewise. (type_promotes_to): Likewise. * decl.cc (make_unbound_class_template_raw): Likewise. (cxx_init_decl_processing): Likewise. (check_class_member_definition_namespace): Likewise. (cxx_maybe_build_cleanup): Likewise. * decl2.cc (maybe_emit_vtables): Likewise. * error.cc (dump_function_name): Likewise. * init.cc (is_class_type): Likewise. (build_new_1): Likewise. * lang-specs.h: Likewise. * method.cc (make_alias_for_thunk): Likewise. * module.cc (specialization_add): Likewise. (module_state::read_cluster): Likewise. * name-lookup.cc (check_extern_c_conflict): Likewise. * name-lookup.h (struct cxx_binding): Likewise. * parser.cc (cp_parser_identifier): Likewise. * parser.h (struct cp_parser): Likewise. * pt.cc (has_value_dependent_address): Likewise. (push_tinst_level_loc): Likewise. * semantics.cc (finish_omp_clauses): Likewise. (finish_omp_atomic): Likewise. * tree.cc (cp_save_expr): Likewise. (cp_free_lang_data): Likewise. * typeck.cc (cp_common_type): Likewise. (strip_array_domain): Likewise. (rationalize_conditional_expr): Likewise. (check_return_expr): Likewise. * vtable-class-hierarchy.cc: Likewise. gcc/d/ChangeLog: * d-gimplify.cc: Rename .c names to .cc. * d-incpath.cc: Likewise. * lang-specs.h: Likewise. gcc/fortran/ChangeLog: * check.cc (gfc_check_all_any): Rename .c names to .cc. * class.cc (find_intrinsic_vtab): Likewise. * config-lang.in: Likewise. * cpp.cc (cpp_define_builtins): Likewise. * data.cc (get_array_index): Likewise. * decl.cc (match_clist_expr): Likewise. (get_proc_name): Likewise. (gfc_verify_c_interop_param): Likewise. (gfc_get_pdt_instance): Likewise. (gfc_match_formal_arglist): Likewise. (gfc_get_type_attr_spec): Likewise. * dependency.cc: Likewise. * error.cc (gfc_format_decoder): Likewise. * expr.cc (check_restricted): Likewise. (gfc_build_default_init_expr): Likewise. * f95-lang.cc: Likewise. * gfc-internals.texi: Likewise. * gfortran.h (enum match): Likewise. (enum procedure_type): Likewise. (enum oacc_routine_lop): Likewise. (gfc_get_pdt_instance): Likewise. (gfc_end_source_files): Likewise. (gfc_mpz_set_hwi): Likewise. (gfc_get_option_string): Likewise. (gfc_find_sym_in_expr): Likewise. (gfc_errors_to_warnings): Likewise. (gfc_real_4_kind): Likewise. (gfc_free_finalizer): Likewise. (gfc_sym_get_dummy_args): Likewise. (gfc_check_intrinsic_standard): Likewise. (gfc_free_case_list): Likewise. (gfc_resolve_oacc_routines): Likewise. (gfc_check_vardef_context): Likewise. (gfc_free_association_list): Likewise. (gfc_implicit_pure_function): Likewise. (gfc_ref_dimen_size): Likewise. (gfc_compare_actual_formal): Likewise. (gfc_resolve_wait): Likewise. (gfc_dt_upper_string): Likewise. (gfc_generate_module_code): Likewise. (gfc_delete_bbt): Likewise. (debug): Likewise. (gfc_build_block_ns): Likewise. (gfc_dep_difference): Likewise. (gfc_invalid_null_arg): Likewise. (gfc_is_finalizable): Likewise. (gfc_fix_implicit_pure): Likewise. (gfc_is_size_zero_array): Likewise. (gfc_is_reallocatable_lhs): Likewise. * gfortranspec.cc: Likewise. * interface.cc (compare_actual_expr): Likewise. * intrinsic.cc (add_functions): Likewise. * iresolve.cc (gfc_resolve_matmul): Likewise. (gfc_resolve_alarm_sub): Likewise. * iso-c-binding.def: Likewise. * lang-specs.h: Likewise. * libgfortran.h (GFC_STDERR_UNIT_NUMBER): Likewise. * match.cc (gfc_match_label): Likewise. (gfc_match_symbol): Likewise. (match_derived_type_spec): Likewise. (copy_ts_from_selector_to_associate): Likewise. * match.h (gfc_match_call): Likewise. (gfc_get_common): Likewise. (gfc_match_omp_end_single): Likewise. (gfc_match_volatile): Likewise. (gfc_match_bind_c): Likewise. (gfc_match_literal_constant): Likewise. (gfc_match_init_expr): Likewise. (gfc_match_array_constructor): Likewise. (gfc_match_end_interface): Likewise. (gfc_match_print): Likewise. (gfc_match_expr): Likewise. * matchexp.cc (next_operator): Likewise. * mathbuiltins.def: Likewise. * module.cc (free_true_name): Likewise. * openmp.cc (gfc_resolve_omp_parallel_blocks): Likewise. (gfc_omp_save_and_clear_state): Likewise. * parse.cc (parse_union): Likewise. (set_syms_host_assoc): Likewise. * resolve.cc (resolve_actual_arglist): Likewise. (resolve_elemental_actual): Likewise. (check_host_association): Likewise. (resolve_typebound_function): Likewise. (resolve_typebound_subroutine): Likewise. (gfc_resolve_expr): Likewise. (resolve_assoc_var): Likewise. (resolve_typebound_procedures): Likewise. (resolve_equivalence_derived): Likewise. * simplify.cc (simplify_bound): Likewise. * symbol.cc (gfc_set_default_type): Likewise. (gfc_add_ext_attribute): Likewise. * target-memory.cc (gfc_target_interpret_expr): Likewise. * target-memory.h (gfc_target_interpret_expr): Likewise. * trans-array.cc (gfc_get_cfi_dim_sm): Likewise. (gfc_conv_shift_descriptor_lbound): Likewise. (gfc_could_be_alias): Likewise. (gfc_get_dataptr_offset): Likewise. * trans-const.cc: Likewise. * trans-decl.cc (trans_function_start): Likewise. (gfc_trans_deferred_vars): Likewise. (generate_local_decl): Likewise. (gfc_generate_function_code): Likewise. * trans-expr.cc (gfc_vptr_size_get): Likewise. (gfc_trans_class_array_init_assign): Likewise. (POWI_TABLE_SIZE): Likewise. (gfc_conv_procedure_call): Likewise. (gfc_trans_arrayfunc_assign): Likewise. * trans-intrinsic.cc (gfc_conv_intrinsic_len): Likewise. (gfc_conv_intrinsic_loc): Likewise. (conv_intrinsic_event_query): Likewise. * trans-io.cc (gfc_build_st_parameter): Likewise. * trans-openmp.cc (gfc_omp_check_optional_argument): Likewise. (gfc_omp_unshare_expr_r): Likewise. (gfc_trans_omp_array_section): Likewise. (gfc_trans_omp_clauses): Likewise. * trans-stmt.cc (trans_associate_var): Likewise. (gfc_trans_deallocate): Likewise. * trans-stmt.h (gfc_trans_class_init_assign): Likewise. (gfc_trans_deallocate): Likewise. (gfc_trans_oacc_declare): Likewise. * trans-types.cc: Likewise. * trans-types.h (enum gfc_packed): Likewise. * trans.cc (N_): Likewise. (trans_code): Likewise. * trans.h (gfc_build_compare_string): Likewise. (gfc_conv_expr_type): Likewise. (gfc_trans_deferred_vars): Likewise. (getdecls): Likewise. (gfc_get_array_descr_info): Likewise. (gfc_omp_firstprivatize_type_sizes): Likewise. (GTY): Likewise. gcc/go/ChangeLog: * config-lang.in: Rename .c names to .cc. * go-backend.cc: Likewise. * go-lang.cc: Likewise. * gospec.cc: Likewise. * lang-specs.h: Likewise. gcc/jit/ChangeLog: * config-lang.in: Rename .c names to .cc. * docs/_build/texinfo/libgccjit.texi: Likewise. * docs/internals/index.rst: Likewise. * jit-builtins.cc (builtins_manager::make_builtin_function): Likewise. * jit-playback.cc (fold_const_var): Likewise. (playback::context::~context): Likewise. (new_field): Likewise. (new_bitfield): Likewise. (new_compound_type): Likewise. (playback::compound_type::set_fields): Likewise. (global_set_init_rvalue): Likewise. (load_blob_in_ctor): Likewise. (new_global_initialized): Likewise. (double>): Likewise. (new_string_literal): Likewise. (as_truth_value): Likewise. (build_call): Likewise. (playback::context::build_cast): Likewise. (new_array_access): Likewise. (new_field_access): Likewise. (dereference): Likewise. (postprocess): Likewise. (add_jump): Likewise. (add_switch): Likewise. (build_goto_operands): Likewise. (playback::context::read_dump_file): Likewise. (init_types): Likewise. * jit-recording.cc (recording::context::get_int_type): Likewise. * jit-recording.h: Likewise. * libgccjit.cc (compatible_types): Likewise. (gcc_jit_context_acquire): Likewise. (gcc_jit_context_release): Likewise. (gcc_jit_context_new_child_context): Likewise. (gcc_jit_type_as_object): Likewise. (gcc_jit_context_get_type): Likewise. (gcc_jit_context_get_int_type): Likewise. (gcc_jit_type_get_pointer): Likewise. (gcc_jit_type_get_const): Likewise. (gcc_jit_type_get_volatile): Likewise. (gcc_jit_type_dyncast_array): Likewise. (gcc_jit_type_is_bool): Likewise. (gcc_jit_type_is_pointer): Likewise. (gcc_jit_type_is_integral): Likewise. (gcc_jit_type_dyncast_vector): Likewise. (gcc_jit_type_is_struct): Likewise. (gcc_jit_vector_type_get_num_units): Likewise. (gcc_jit_vector_type_get_element_type): Likewise. (gcc_jit_type_unqualified): Likewise. (gcc_jit_type_dyncast_function_ptr_type): Likewise. (gcc_jit_function_type_get_return_type): Likewise. (gcc_jit_function_type_get_param_count): Likewise. (gcc_jit_function_type_get_param_type): Likewise. (gcc_jit_context_new_array_type): Likewise. (gcc_jit_context_new_field): Likewise. (gcc_jit_field_as_object): Likewise. (gcc_jit_context_new_struct_type): Likewise. (gcc_jit_struct_as_type): Likewise. (gcc_jit_struct_set_fields): Likewise. (gcc_jit_struct_get_field_count): Likewise. (gcc_jit_context_new_union_type): Likewise. (gcc_jit_context_new_function_ptr_type): Likewise. (gcc_jit_param_as_rvalue): Likewise. (gcc_jit_context_new_function): Likewise. (gcc_jit_function_get_return_type): Likewise. (gcc_jit_function_dump_to_dot): Likewise. (gcc_jit_block_get_function): Likewise. (gcc_jit_global_set_initializer_rvalue): Likewise. (gcc_jit_rvalue_get_type): Likewise. (gcc_jit_context_new_rvalue_from_int): Likewise. (gcc_jit_context_one): Likewise. (gcc_jit_context_new_rvalue_from_double): Likewise. (gcc_jit_context_null): Likewise. (gcc_jit_context_new_string_literal): Likewise. (valid_binary_op_p): Likewise. (gcc_jit_context_new_binary_op): Likewise. (gcc_jit_context_new_comparison): Likewise. (gcc_jit_context_new_call): Likewise. (is_valid_cast): Likewise. (gcc_jit_context_new_cast): Likewise. (gcc_jit_object_get_context): Likewise. (gcc_jit_object_get_debug_string): Likewise. (gcc_jit_lvalue_access_field): Likewise. (gcc_jit_rvalue_access_field): Likewise. (gcc_jit_rvalue_dereference_field): Likewise. (gcc_jit_rvalue_dereference): Likewise. (gcc_jit_lvalue_get_address): Likewise. (gcc_jit_lvalue_set_tls_model): Likewise. (gcc_jit_lvalue_set_link_section): Likewise. (gcc_jit_function_new_local): Likewise. (gcc_jit_block_add_eval): Likewise. (gcc_jit_block_add_assignment): Likewise. (is_bool): Likewise. (gcc_jit_block_end_with_conditional): Likewise. (gcc_jit_block_add_comment): Likewise. (gcc_jit_block_end_with_jump): Likewise. (gcc_jit_block_end_with_return): Likewise. (gcc_jit_block_end_with_void_return): Likewise. (case_range_validator::case_range_validator): Likewise. (case_range_validator::validate): Likewise. (case_range_validator::get_wide_int): Likewise. (gcc_jit_block_end_with_switch): Likewise. (gcc_jit_context_set_str_option): Likewise. (gcc_jit_context_set_int_option): Likewise. (gcc_jit_context_set_bool_option): Likewise. (gcc_jit_context_set_bool_allow_unreachable_blocks): Likewise. (gcc_jit_context_set_bool_use_external_driver): Likewise. (gcc_jit_context_add_command_line_option): Likewise. (gcc_jit_context_add_driver_option): Likewise. (gcc_jit_context_enable_dump): Likewise. (gcc_jit_context_compile): Likewise. (gcc_jit_context_compile_to_file): Likewise. (gcc_jit_context_set_logfile): Likewise. (gcc_jit_context_dump_reproducer_to_file): Likewise. (gcc_jit_context_get_first_error): Likewise. (gcc_jit_context_get_last_error): Likewise. (gcc_jit_result_get_code): Likewise. (gcc_jit_result_get_global): Likewise. (gcc_jit_rvalue_set_bool_require_tail_call): Likewise. (gcc_jit_type_get_aligned): Likewise. (gcc_jit_type_get_vector): Likewise. (gcc_jit_function_get_address): Likewise. (gcc_jit_version_patchlevel): Likewise. (gcc_jit_block_add_extended_asm): Likewise. (gcc_jit_extended_asm_as_object): Likewise. (gcc_jit_extended_asm_set_volatile_flag): Likewise. (gcc_jit_extended_asm_set_inline_flag): Likewise. (gcc_jit_extended_asm_add_output_operand): Likewise. (gcc_jit_extended_asm_add_input_operand): Likewise. (gcc_jit_extended_asm_add_clobber): Likewise. * notes.txt: Likewise. gcc/lto/ChangeLog: * config-lang.in: Rename .c names to .cc. * lang-specs.h: Likewise. * lto-common.cc (gimple_register_canonical_type_1): Likewise. * lto-common.h: Likewise. * lto-dump.cc (lto_main): Likewise. * lto-lang.cc (handle_fnspec_attribute): Likewise. (lto_getdecls): Likewise. (lto_init): Likewise. * lto.cc (lto_main): Likewise. * lto.h: Likewise. gcc/objc/ChangeLog: * Make-lang.in: Rename .c names to .cc. * config-lang.in: Likewise. * lang-specs.h: Likewise. * objc-act.cc (objc_build_component_ref): Likewise. (objc_copy_binfo): Likewise. (lookup_method_in_hash_lists): Likewise. (objc_finish_foreach_loop): Likewise. * objc-act.h (objc_common_init_ts): Likewise. * objc-gnu-runtime-abi-01.cc: Likewise. * objc-lang.cc (struct lang_hooks): Likewise. * objc-map.cc: Likewise. * objc-next-runtime-abi-01.cc (generate_objc_symtab_decl): Likewise. * objc-runtime-shared-support.cc: Likewise. * objc-runtime-shared-support.h (build_protocol_initializer): Likewise. gcc/objcp/ChangeLog: * Make-lang.in: Rename .c names to .cc. * config-lang.in: Likewise. * lang-specs.h: Likewise. * objcp-decl.cc (objcp_end_compound_stmt): Likewise. * objcp-lang.cc (struct lang_hooks): Likewise. gcc/po/ChangeLog: * EXCLUDES: Rename .c names to .cc. libcpp/ChangeLog: * Makefile.in: Rename .c names to .cc. * charset.cc (convert_escape): Likewise. * directives.cc (directive_diagnostics): Likewise. (_cpp_handle_directive): Likewise. (lex_macro_node): Likewise. * include/cpplib.h (struct _cpp_file): Likewise. (PURE_ZERO): Likewise. (cpp_defined): Likewise. (cpp_error_at): Likewise. (cpp_forall_identifiers): Likewise. (cpp_compare_macros): Likewise. (cpp_get_converted_source): Likewise. (cpp_read_state): Likewise. (cpp_directive_only_process): Likewise. (struct cpp_decoded_char): Likewise. * include/line-map.h (enum lc_reason): Likewise. (enum location_aspect): Likewise. * include/mkdeps.h: Likewise. * init.cc (cpp_destroy): Likewise. (cpp_finish): Likewise. * internal.h (struct cpp_reader): Likewise. (_cpp_defined_macro_p): Likewise. (_cpp_backup_tokens_direct): Likewise. (_cpp_destroy_hashtable): Likewise. (_cpp_has_header): Likewise. (_cpp_expand_op_stack): Likewise. (_cpp_commit_buff): Likewise. (_cpp_restore_special_builtin): Likewise. (_cpp_bracket_include): Likewise. (_cpp_replacement_text_len): Likewise. (ufputs): Likewise. * line-map.cc (linemap_macro_loc_to_exp_point): Likewise. (linemap_check_files_exited): Likewise. (line_map_new_raw): Likewise. * traditional.cc (enum ls): Likewise.
2022-01-16[i386] GLC tuning: Break false dependency for dest register.wwwhhhyyy1-0/+6
For GoldenCove micro-architecture, force insert zero-idiom in asm template to break false dependency of dest register for several insns. The related insns are: VPERM/D/Q/PS/PD VRANGEPD/PS/SD/SS VGETMANTSS/SD/SH VGETMANDPS/PD - mem version only VPMULLQ VFMULCSH/PH VFCMULCSH/PH gcc/ChangeLog: * config/i386/i386.h (TARGET_DEST_FALSE_DEP_FOR_GLC): New macro. * config/i386/sse.md (<avx512>_<complexopname>_<mode><maskc_name><round_name>): Insert zero-idiom in output template when attr enabled, set new attribute to true for non-mask/maskz insn. (avx512fp16_<complexopname>sh_v8hf<mask_scalarc_name><round_scalarcz_name>): Likewise. (avx512dq_mul<mode>3<mask_name>): Likewise. (<avx2_avx512>_permvar<mode><mask_name>): Likewise. (avx2_perm<mode>_1<mask_name>): Likewise. (avx512f_perm<mode>_1<mask_name>): Likewise. (avx512dq_rangep<mode><mask_name><round_saeonly_name>): Likewise. (avx512dq_ranges<mode><mask_scalar_name><round_saeonly_scalar_name>): Likewise. (<avx512>_getmant<mode><mask_name><round_saeonly_name>): Likewise. (avx512f_vgetmant<mode><mask_scalar_name><round_saeonly_scalar_name>): Likewise. * config/i386/subst.md (mask3_dest_false_dep_for_glc_cond): New subst_attr. (mask4_dest_false_dep_for_glc_cond): Likewise. (mask6_dest_false_dep_for_glc_cond): Likewise. (mask10_dest_false_dep_for_glc_cond): Likewise. (maskc_dest_false_dep_for_glc_cond): Likewise. (mask_scalar4_dest_false_dep_for_glc_cond): Likewise. (mask_scalarc_dest_false_dep_for_glc_cond): Likewise. * config/i386/x86-tune.def (X86_TUNE_DEST_FALSE_DEP_FOR_GLC): New DEF_TUNE enabled for m_SAPPHIRERAPIDS and m_ALDERLAKE gcc/testsuite/ChangeLog: * gcc.target/i386/avx2-dest-false-dep-for-glc.c: New test. * gcc.target/i386/avx512dq-dest-false-dep-for-glc.c: Ditto. * gcc.target/i386/avx512f-dest-false-dep-for-glc.c: Ditto. * gcc.target/i386/avx512fp16-dest-false-dep-for-glc.c: Ditto. * gcc.target/i386/avx512fp16vl-dest-false-dep-for-glc.c: Ditto. * gcc.target/i386/avx512vl-dest-false-dep-for-glc.c: Ditto.
2022-01-03Update copyright years.Jakub Jelinek1-1/+1
2021-12-04i386, ipa-modref: Comment spelling fixJakub Jelinek1-2/+2
This patch fixes spelling of prefer (misspelled as preffer). 2021-12-04 Jakub Jelinek <jakub@redhat.com> * config/i386/x86-tune.def (X86_TUNE_PARTIAL_REG_DEPENDENCY): Fix comment typo, Preffer -> prefer. * ipa-modref-tree.c (modref_access_node::closer_pair_p): Likewise.
2021-12-03x86: Add -mmove-max=bits and -mstore-max=bitsH.J. Lu1-0/+10
Add -mmove-max=bits and -mstore-max=bits to enable 256-bit/512-bit move and store, independent of -mprefer-vector-width=bits: 1. Add X86_TUNE_AVX512_MOVE_BY_PIECES and X86_TUNE_AVX512_STORE_BY_PIECES which are enabled for Intel Sapphire Rapids processor. 2. Add -mmove-max=bits to set the maximum number of bits can be moved from memory to memory efficiently. The default value is derived from X86_TUNE_AVX512_MOVE_BY_PIECES, X86_TUNE_AVX256_MOVE_BY_PIECES, and the preferred vector width. 3. Add -mstore-max=bits to set the maximum number of bits can be stored to memory efficiently. The default value is derived from X86_TUNE_AVX512_STORE_BY_PIECES, X86_TUNE_AVX256_STORE_BY_PIECES and the preferred vector width. gcc/ PR target/103269 * config/i386/i386-expand.c (ix86_expand_builtin): Pass PVW_NONE and PVW_NONE to ix86_target_string. * config/i386/i386-options.c (ix86_target_string): Add arguments for move_max and store_max. (ix86_target_string::add_vector_width): New lambda. (ix86_debug_options): Pass ix86_move_max and ix86_store_max to ix86_target_string. (ix86_function_specific_print): Pass ptr->x_ix86_move_max and ptr->x_ix86_store_max to ix86_target_string. (ix86_valid_target_attribute_tree): Handle x_ix86_move_max and x_ix86_store_max. (ix86_option_override_internal): Set the default x_ix86_move_max and x_ix86_store_max. * config/i386/i386-options.h (ix86_target_string): Add prefer_vector_width and prefer_vector_width. * config/i386/i386.h (TARGET_AVX256_MOVE_BY_PIECES): Removed. (TARGET_AVX256_STORE_BY_PIECES): Likewise. (MOVE_MAX): Use 64 if ix86_move_max or ix86_store_max == PVW_AVX512. Use 32 if ix86_move_max or ix86_store_max >= PVW_AVX256. (STORE_MAX_PIECES): Use 64 if ix86_store_max == PVW_AVX512. Use 32 if ix86_store_max >= PVW_AVX256. * config/i386/i386.opt: Add -mmove-max=bits and -mstore-max=bits. * config/i386/x86-tune.def (X86_TUNE_AVX512_MOVE_BY_PIECES): New. (X86_TUNE_AVX512_STORE_BY_PIECES): Likewise. * doc/invoke.texi: Document -mmove-max=bits and -mstore-max=bits. gcc/testsuite/ PR target/103269 * gcc.target/i386/pieces-memcpy-17.c: New test. * gcc.target/i386/pieces-memcpy-18.c: Likewise. * gcc.target/i386/pieces-memcpy-19.c: Likewise. * gcc.target/i386/pieces-memcpy-20.c: Likewise. * gcc.target/i386/pieces-memcpy-21.c: Likewise. * gcc.target/i386/pieces-memset-45.c: Likewise. * gcc.target/i386/pieces-memset-46.c: Likewise. * gcc.target/i386/pieces-memset-47.c: Likewise. * gcc.target/i386/pieces-memset-48.c: Likewise. * gcc.target/i386/pieces-memset-49.c: Likewise.
2021-12-01i386: Fix up some minor formatting issues and one inconsistencyJakub Jelinek1-4/+4
While looking at a proposed vendor backport, I've noticed some formatting issues in x86-tune.def. Also, in all spots m_GENERIC comes last, except one recently changed, I think it is useful to have m_GENERIC always last for consistency. 2021-12-01 Jakub Jelinek <jakub@redhat.com> * config/i386/x86-tune.def (X86_TUNE_SCHEDULE, X86_TUNE_SSE_PARTIAL_REG_FP_CONVERTS_DEPENDENCY, X86_TUNE_MISALIGNED_MOVE_STRING_PRO_EPILOGUES): Formatting fixes. (X86_TUNE_USE_GATHER): Put m_GENERIC last for consistency.
2021-11-11x86: Update -mtune=alderlakeCui,Lili1-28/+30
Update mtune for alderlake, Alder Lake Intel Hybrid Technology will not support Intel® AVX-512. ISA features such as Intel® AVX, AVX-VNNI, Intel® AVX2, and UMONITOR/UMWAIT/TPAUSE are supported. gcc/ChangeLog * config/i386/i386-options.c (m_CORE_AVX2): Remove Alderlake from m_CORE_AVX2. (processor_cost_table): Use alderlake_cost for Alderlake. * config/i386/i386.c (ix86_sched_init_global): Handle Alderlake. * config/i386/x86-tune-costs.h (struct processor_costs): Add alderlake cost. * config/i386/x86-tune-sched.c (ix86_issue_rate): Change Alderlake issue rate to 4. (ix86_adjust_cost): Handle Alderlake. * config/i386/x86-tune.def (X86_TUNE_SCHEDULE): Enable for Alderlake. (X86_TUNE_PARTIAL_REG_DEPENDENCY): Likewise. (X86_TUNE_SSE_PARTIAL_REG_DEPENDENCY): Likewise. (X86_TUNE_SSE_PARTIAL_REG_FP_CONVERTS_DEPENDENCY): Likewise. (X86_TUNE_SSE_PARTIAL_REG_CONVERTS_DEPENDENCY): Likewise. (X86_TUNE_MEMORY_MISMATCH_STALL): Likewise. (X86_TUNE_USE_LEAVE): Likewise. (X86_TUNE_PUSH_MEMORY): Likewise. (X86_TUNE_USE_INCDEC): Likewise. (X86_TUNE_INTEGER_DFMODE_MOVES): Likewise. (X86_TUNE_MISALIGNED_MOVE_STRING_PRO_EPILOGUES): Likewise. (X86_TUNE_USE_SAHF): Likewise. (X86_TUNE_USE_BT): Likewise. (X86_TUNE_AVOID_FALSE_DEP_FOR_BMI): Likewise. (X86_TUNE_ONE_IF_CONV_INSN): Likewise. (X86_TUNE_AVOID_MFENCE): Likewise. (X86_TUNE_USE_SIMODE_FIOP): Likewise. (X86_TUNE_EXT_80387_CONSTANTS): Likewise. (X86_TUNE_SSE_UNALIGNED_LOAD_OPTIMAL): Likewise. (X86_TUNE_SSE_UNALIGNED_STORE_OPTIMAL): Likewise. (X86_TUNE_SSE_TYPELESS_STORES): Likewise. (X86_TUNE_SSE_LOAD0_BY_PXOR): Likewise. (X86_TUNE_AVOID_4BYTE_PREFIXES): Likewise. (X86_TUNE_USE_GATHER): Disable for Alderlake. (X86_TUNE_AVX256_MOVE_BY_PIECES): Likewise. (X86_TUNE_AVX256_STORE_BY_PIECES): Likewise.
2021-09-17x86: Add TARGET_SSE_PARTIAL_REG_[FP_]CONVERTS_DEPENDENCYH.J. Lu1-0/+15
1. Replace TARGET_SSE_PARTIAL_REG_DEPENDENCY with TARGET_SSE_PARTIAL_REG_FP_CONVERTS_DEPENDENCY in SSE FP to FP splitters. 2. Replace TARGET_SSE_PARTIAL_REG_DEPENDENCY with TARGET_SSE_PARTIAL_REG_CONVERTS_DEPENDENCY in SSE INT to FP splitters. 3. Also check TARGET_SSE_PARTIAL_REG_FP_CONVERTS_DEPENDENCY and TARGET_SSE_PARTIAL_REG_DEPENDENCY when handling avx_partial_xmm_update attribute. Don't convert AVX partial XMM register update if there is no partial SSE register dependency for SSE conversion. gcc/ * config/i386/i386-features.c (remove_partial_avx_dependency): Also check TARGET_SSE_PARTIAL_REG_FP_CONVERTS_DEPENDENCY and and TARGET_SSE_PARTIAL_REG_CONVERTS_DEPENDENCY before generating vxorps. * config/i386/i386.h (TARGET_SSE_PARTIAL_REG_FP_CONVERTS_DEPENDENCY): New. (TARGET_SSE_PARTIAL_REG_CONVERTS_DEPENDENCY): Likewise. * config/i386/i386.md (SSE FP to FP splitters): Replace TARGET_SSE_PARTIAL_REG_DEPENDENCY with TARGET_SSE_PARTIAL_REG_FP_CONVERTS_DEPENDENCY. (SSE INT to FP splitter): Replace TARGET_SSE_PARTIAL_REG_DEPENDENCY with TARGET_SSE_PARTIAL_REG_CONVERTS_DEPENDENCY. * config/i386/x86-tune.def (X86_TUNE_SSE_PARTIAL_REG_FP_CONVERTS_DEPENDENCY): New. (X86_TUNE_SSE_PARTIAL_REG_CONVERTS_DEPENDENCY): Likewise. gcc/testsuite/ * gcc.target/i386/avx-covert-1.c: New file. * gcc.target/i386/avx-fp-covert-1.c: Likewise. * gcc.target/i386/avx-int-covert-1.c: Likewise. * gcc.target/i386/sse-covert-1.c: Likewise. * gcc.target/i386/sse-fp-covert-1.c: Likewise. * gcc.target/i386/sse-int-covert-1.c: Likewise.
2021-09-17x86: Update memcpy/memset inline strategies for -mtune=tremontH.J. Lu1-1/+1
Simply memcpy and memset inline strategies to avoid branches for -mtune=tremont: 1. Create Tremont cost model from generic cost model. 2. With MOVE_RATIO and CLEAR_RATIO == 17, GCC will use integer/vector load and store for up to 16 * 16 (256) bytes when the data size is fixed and known. 3. Inline only if data size is known to be <= 256. a. Use "rep movsb/stosb" with simple code sequence if the data size is a constant. b. Use loop if data size is not a constant. 4. Use memcpy/memset libray function if data size is unknown or > 256. * config/i386/i386-options.c (processor_cost_table): Use tremont_cost for Tremont. * config/i386/x86-tune-costs.h (tremont_memcpy): New. (tremont_memset): Likewise. (tremont_cost): Likewise. * config/i386/x86-tune.def (X86_TUNE_PREFER_KNOWN_REP_MOVSB_STOSB): Enable for Tremont.
2021-09-17x86: Update -mtune=tremontH.J. Lu1-18/+19
Initial -mtune=tremont update 1. Use Haswell scheduling model. 2. Assume that stack engine allows to execute push&pop instructions in parall. 3. Prepare for scheduling pass as -mtune=generic. 4. Use the same issue rate as -mtune=generic. 5. Enable partial_reg_dependency. 6. Disable accumulate_outgoing_args 7. Enable use_leave 8. Enable push_memory 9. Disable four_jump_limit 10. Disable opt_agu 11. Disable avoid_lea_for_addr 12. Disable avoid_mem_opnd_for_cmove 13. Enable misaligned_move_string_pro_epilogues 14. Enable use_cltd 16. Enable avoid_false_dep_for_bmi 17. Enable avoid_mfence 18. Disable expand_abs 19. Enable sse_typeless_stores 20. Enable sse_load0_by_pxor 21. Disable split_mem_opnd_for_fp_converts 22. Disable slow_pshufb 23. Enable partial_reg_dependency This is the first patch to tune for Tremont. With all patches applied, performance impacts on SPEC CPU 2017 are: 500.perlbench_r 1.81% 502.gcc_r 0.57% 505.mcf_r 1.16% 520.omnetpp_r 0.00% 523.xalancbmk_r 0.00% 525.x264_r 4.55% 531.deepsjeng_r 0.00% 541.leela_r 0.39% 548.exchange2_r 1.13% 557.xz_r 0.00% geomean for intrate 0.95% 503.bwaves_r 0.00% 507.cactuBSSN_r 6.94% 508.namd_r 12.37% 510.parest_r 1.01% 511.povray_r 3.70% 519.lbm_r 36.61% 521.wrf_r 8.79% 526.blender_r 2.91% 527.cam4_r 6.23% 538.imagick_r 0.28% 544.nab_r 21.99% 549.fotonik3d_r 3.63% 554.roms_r -1.20% geomean for fprate 7.50% gcc/ChangeLog * common/config/i386/i386-common.c: Use Haswell scheduling model for Tremont. * config/i386/i386.c (ix86_sched_init_global): Prepare for Tremont scheduling pass. * config/i386/x86-tune-sched.c (ix86_issue_rate): Change Tremont issue rate to 4. (ix86_adjust_cost): Handle Tremont. * config/i386/x86-tune.def (X86_TUNE_SSE_PARTIAL_REG_DEPENDENCY): Enable for Tremont. (X86_TUNE_USE_LEAVE): Likewise. (X86_TUNE_PUSH_MEMORY): Likewise. (X86_TUNE_MISALIGNED_MOVE_STRING_PRO_EPILOGUES): Likewise. (X86_TUNE_USE_CLTD): Likewise. (X86_TUNE_AVOID_FALSE_DEP_FOR_BMI): Likewise. (X86_TUNE_AVOID_MFENCE): Likewise. (X86_TUNE_SSE_TYPELESS_STORES): Likewise. (X86_TUNE_SSE_LOAD0_BY_PXOR): Likewise. (X86_TUNE_ACCUMULATE_OUTGOING_ARGS): Disable for Tremont. (X86_TUNE_FOUR_JUMP_LIMIT): Likewise. (X86_TUNE_OPT_AGU): Likewise. (X86_TUNE_AVOID_LEA_FOR_ADDR): Likewise. (X86_TUNE_AVOID_MEM_OPND_FOR_CMOVE): Likewise. (X86_TUNE_EXPAND_ABS): Likewise. (X86_TUNE_SPLIT_MEM_OPND_FOR_FP_CONVERTS): Likewise. (X86_TUNE_SLOW_PSHUFB): Likewise.
2021-09-13x86: Add TARGET_AVX256_[MOVE|STORE]_BY_PIECESH.J. Lu1-0/+11
1. Add TARGET_AVX256_MOVE_BY_PIECES to perform move by-pieces operation with 256-bit AVX instructions. 2. Add TARGET_AVX256_STORE_BY_PIECES to perform move and store by-pieces operations with 256-bit AVX instructions. They are enabled only for Intel Alder Lake and Intel processors with AVX512. gcc/ PR target/101935 * config/i386/i386.h (TARGET_AVX256_MOVE_BY_PIECES): New. (TARGET_AVX256_STORE_BY_PIECES): Likewise. (MOVE_MAX): Check TARGET_AVX256_MOVE_BY_PIECES and TARGET_AVX256_STORE_BY_PIECES instead of TARGET_AVX256_SPLIT_UNALIGNED_LOAD and TARGET_AVX256_SPLIT_UNALIGNED_STORE. (STORE_MAX_PIECES): Check TARGET_AVX256_STORE_BY_PIECES instead of TARGET_AVX256_SPLIT_UNALIGNED_STORE. * config/i386/x86-tune.def (X86_TUNE_AVX256_MOVE_BY_PIECES): New. (X86_TUNE_AVX256_STORE_BY_PIECES): Likewise. gcc/testsuite/ PR target/101935 * g++.target/i386/pr80566-1.C: Add -mtune-ctrl=avx256_store_by_pieces. * gcc.target/i386/pr100865-4a.c: Likewise. * gcc.target/i386/pr100865-10a.c: Likewise. * gcc.target/i386/pr90773-20.c: Likewise. * gcc.target/i386/pr90773-21.c: Likewise. * gcc.target/i386/pr90773-22.c: Likewise. * gcc.target/i386/pr90773-23.c: Likewise. * g++.target/i386/pr80566-2.C: Add -mtune-ctrl=avx256_move_by_pieces. * gcc.target/i386/eh_return-1.c: Likewise. * gcc.target/i386/pr90773-26.c: Likewise. * gcc.target/i386/pieces-memcpy-12.c: Replace -mtune=haswell with -mtune-ctrl=avx256_move_by_pieces. * gcc.target/i386/pieces-memcpy-15.c: Likewise. * gcc.target/i386/pieces-memset-2.c: Replace -mtune=haswell with -mtune-ctrl=avx256_store_by_pieces. * gcc.target/i386/pieces-memset-5.c: Likewise. * gcc.target/i386/pieces-memset-11.c: Likewise. * gcc.target/i386/pieces-memset-14.c: Likewise. * gcc.target/i386/pieces-memset-20.c: Likewise. * gcc.target/i386/pieces-memset-23.c: Likewise. * gcc.target/i386/pieces-memset-29.c: Likewise. * gcc.target/i386/pieces-memset-30.c: Likewise. * gcc.target/i386/pieces-memset-33.c: Likewise. * gcc.target/i386/pieces-memset-34.c: Likewise. * gcc.target/i386/pieces-memset-44.c: Likewise. * gcc.target/i386/pieces-memset-37.c: Replace -mtune=generic with -mtune-ctrl=avx256_store_by_pieces.
2021-08-18Add x86 tune to enable v2df vector reduction by paddpd.liuhongt1-0/+5
The tune is disabled by default. gcc/ChangeLog: PR target/97147 * config/i386/i386.h (TARGET_V2DF_REDUCTION_PREFER_HADDPD): New macro. * config/i386/sse.md (*sse3_haddv2df3_low): Add TARGET_V2DF_REDUCTION_PREFER_HADDPD. (*sse3_hsubv2df3_low): Ditto. * config/i386/x86-tune.def (X86_TUNE_V2DF_REDUCTION_PREFER_HADDPD): New tune. gcc/testsuite/ChangeLog: PR target/97147 * gcc.target/i386/pr54400.c: Adjust testcase. * gcc.target/i386/pr94147.c: New test.
2021-04-06x86: Update memcpy/memset inline strategies for Skylake family CPUsH.J. Lu1-2/+1
Simply memcpy and memset inline strategies to avoid branches for Skylake family CPUs: 1. With MOVE_RATIO and CLEAR_RATIO == 17, GCC will use integer/vector load and store for up to 16 * 16 (256) bytes when the data size is fixed and known. 2. Inline only if data size is known to be <= 256. a. Use "rep movsb/stosb" with simple code sequence if the data size is a constant. b. Use loop if data size is not a constant. 3. Use memcpy/memset libray function if data size is unknown or > 256. On Cascadelake processor with -march=native -Ofast -flto, 1. Performance impacts of SPEC CPU 2017 rate are: 500.perlbench_r 0.17% 502.gcc_r -0.36% 505.mcf_r 0.00% 520.omnetpp_r 0.08% 523.xalancbmk_r -0.62% 525.x264_r 1.04% 531.deepsjeng_r 0.11% 541.leela_r -1.09% 548.exchange2_r -0.25% 557.xz_r 0.17% Geomean -0.08% 503.bwaves_r 0.00% 507.cactuBSSN_r 0.69% 508.namd_r -0.07% 510.parest_r 1.12% 511.povray_r 1.82% 519.lbm_r 0.00% 521.wrf_r -1.32% 526.blender_r -0.47% 527.cam4_r 0.23% 538.imagick_r -1.72% 544.nab_r -0.56% 549.fotonik3d_r 0.12% 554.roms_r 0.43% Geomean 0.02% 2. Significant impacts on eembc benchmarks are: eembc/idctrn01 9.23% eembc/nnet_test 29.26% gcc/ * config/i386/x86-tune-costs.h (skylake_memcpy): Updated. (skylake_memset): Likewise. (skylake_cost): Change CLEAR_RATIO to 17. * config/i386/x86-tune.def (X86_TUNE_PREFER_KNOWN_REP_MOVSB_STOSB): Replace m_CANNONLAKE, m_ICELAKE_CLIENT, m_ICELAKE_SERVER, m_TIGERLAKE and m_SAPPHIRERAPIDS with m_SKYLAKE and m_CORE_AVX512. gcc/testsuite/ * gcc.target/i386/memcpy-strategy-9.c: New test. * gcc.target/i386/memcpy-strategy-10.c: Likewise. * gcc.target/i386/memcpy-strategy-11.c: Likewise. * gcc.target/i386/memset-strategy-7.c: Likewise. * gcc.target/i386/memset-strategy-8.c: Likewise. * gcc.target/i386/memset-strategy-9.c: Likewise.
2021-03-31x86: Update memcpy/memset inline strategies for Ice LakeH.J. Lu1-0/+7
Simply memcpy and memset inline strategies to avoid branches for -mtune=icelake: 1. With MOVE_RATIO and CLEAR_RATIO == 17, GCC will use integer/vector load and store for up to 16 * 16 (256) bytes when the data size is fixed and known. 2. Inline only if data size is known to be <= 256. a. Use "rep movsb/stosb" with simple code sequence if the data size is a constant. b. Use loop if data size is not a constant. 3. Use memcpy/memset libray function if data size is unknown or > 256. On Ice Lake processor with -march=native -Ofast -flto, 1. Performance impacts of SPEC CPU 2017 rate are: 500.perlbench_r -0.93% 502.gcc_r 0.36% 505.mcf_r 0.31% 520.omnetpp_r -0.07% 523.xalancbmk_r -0.53% 525.x264_r -0.09% 531.deepsjeng_r -0.19% 541.leela_r 0.16% 548.exchange2_r 0.22% 557.xz_r -1.64% Geomean -0.24% 503.bwaves_r -0.01% 507.cactuBSSN_r 0.00% 508.namd_r 0.12% 510.parest_r 0.07% 511.povray_r 0.29% 519.lbm_r 0.00% 521.wrf_r -0.38% 526.blender_r 0.16% 527.cam4_r 0.18% 538.imagick_r 0.76% 544.nab_r -0.84% 549.fotonik3d_r -0.07% 554.roms_r -0.01% Geomean 0.02% 2. Significant impacts on eembc benchmarks are: eembc/nnet_test 9.90% eembc/mp2decoddata2 16.42% eembc/textv2data3 -4.86% eembc/qos 12.90% gcc/ * config/i386/i386-expand.c (expand_set_or_cpymem_via_rep): For TARGET_PREFER_KNOWN_REP_MOVSB_STOSB, don't convert QImode to SImode. (decide_alg): For TARGET_PREFER_KNOWN_REP_MOVSB_STOSB, use "rep movsb/stosb" only for known sizes. * config/i386/i386-options.c (processor_cost_table): Use Ice Lake cost for Cannon Lake, Ice Lake, Tiger Lake, Sapphire Rapids and Alder Lake. * config/i386/i386.h (TARGET_PREFER_KNOWN_REP_MOVSB_STOSB): New. * config/i386/x86-tune-costs.h (icelake_memcpy): New. (icelake_memset): Likewise. (icelake_cost): Likewise. * config/i386/x86-tune.def (X86_TUNE_PREFER_KNOWN_REP_MOVSB_STOSB): New. gcc/testsuite/ * gcc.target/i386/memcpy-strategy-5.c: New test. * gcc.target/i386/memcpy-strategy-6.c: Likewise. * gcc.target/i386/memcpy-strategy-7.c: Likewise. * gcc.target/i386/memcpy-strategy-8.c: Likewise. * gcc.target/i386/memset-strategy-3.c: Likewise. * gcc.target/i386/memset-strategy-4.c: Likewise. * gcc.target/i386/memset-strategy-5.c: Likewise. * gcc.target/i386/memset-strategy-6.c: Likewise.
2021-03-17Enable gather on zen3 hardware.Jan Hubicka1-1/+1
For TSVC it get used by 5 benchmarks with following runtime improvements: s4114: 1.424 -> 1.209 (84.9017%) s4115: 2.021 -> 1.065 (52.6967%) s4116: 1.549 -> 0.854 (55.1323%) s4117: 1.386 -> 1.193 (86.075%) vag: 2.741 -> 1.940 (70.7771%) there is regression in s4112: 1.115 -> 1.184 (106.188%) The internal loop is: for (int i = 0; i < LEN_1D; i++) { a[i] += b[ip[i]] * s; } (so a standard accmulate and add with indirect addressing) 40a400: c5 fe 6f 24 03 vmovdqu (%rbx,%rax,1),%ymm4 40a405: c5 fc 28 da vmovaps %ymm2,%ymm3 40a409: 48 83 c0 20 add $0x20,%rax 40a40d: c4 e2 65 92 04 a5 00 vgatherdps %ymm3,0x594100(,%ymm4,4),%ymm0 40a414: 41 59 00 40a417: c4 e2 75 a8 80 e0 34 vfmadd213ps 0x5b34e0(%rax),%ymm1,%ymm0 40a41e: 5b 00 40a420: c5 fc 29 80 e0 34 5b vmovaps %ymm0,0x5b34e0(%rax) 40a427: 00 40a428: 48 3d 00 f4 01 00 cmp $0x1f400,%rax 40a42e: 75 d0 jne 40a400 <s4112+0x60> compared to: 40a280: 49 63 14 04 movslq (%r12,%rax,1),%rdx 40a284: 48 83 c0 04 add $0x4,%rax 40a288: c5 fa 10 04 95 00 41 vmovss 0x594100(,%rdx,4),%xmm0 40a28f: 59 00 40a291: c4 e2 71 a9 80 fc 34 vfmadd213ss 0x5b34fc(%rax),%xmm1,%xmm0 40a298: 5b 00 40a29a: c5 fa 11 80 fc 34 5b vmovss %xmm0,0x5b34fc(%rax) 40a2a1: 00 40a2a2: 48 3d 00 f4 01 00 cmp $0x1f400,%rax 40a2a8: 75 d6 jne 40a280 <s4112+0x40> Looking at instructions latencies - fmadd is 4 cycles - vgatherdps is 39 So vgather iself is 4.8 cycle per iteration and probably CPU is able to execute rest out of order getting clos to 4 cycles per iteration (it can do 2 loads in parallel, one store and rest fits easily to execution resources). That would explain 20% slowdown. gimple internal loop is: _2 = a[i_38]; _3 = (long unsigned int) i_38; _4 = _3 * 4; _5 = ip_18 + _4; _6 = *_5; _7 = b[_6]; _8 = _7 * s_19; _9 = _2 + _8; a[i_38] = _9; i_28 = i_38 + 1; ivtmp_52 = ivtmp_53 - 1; if (ivtmp_52 != 0) goto <bb 8>; [98.99%] else goto <bb 4>; [1.01%] 0x25bac30 a[i_38] 1 times scalar_load costs 12 in body 0x25bac30 *_5 1 times scalar_load costs 12 in body 0x25bac30 b[_6] 1 times scalar_load costs 12 in body 0x25bac30 _7 * s_19 1 times scalar_stmt costs 12 in body 0x25bac30 _2 + _8 1 times scalar_stmt costs 12 in body 0x25bac30 _9 1 times scalar_store costs 16 in body so 19 cycles estimate of scalar load 0x2668630 a[i_38] 1 times vector_load costs 12 in body 0x2668630 *_5 1 times unaligned_load (misalign -1) costs 12 in body 0x2668630 b[_6] 8 times scalar_load costs 96 in body 0x2668630 _7 * s_19 1 times scalar_to_vec costs 4 in prologue 0x2668630 _7 * s_19 1 times vector_stmt costs 12 in body 0x2668630 _2 + _8 1 times vector_stmt costs 12 in body 0x2668630 _9 1 times vector_store costs 16 in body so 40 cycles per 8x vectorized body tsvc.c:3450:27: note: operating only on full vectors. tsvc.c:3450:27: note: Cost model analysis: Vector inside of loop cost: 160 Vector prologue cost: 4 Vector epilogue cost: 0 Scalar iteration cost: 76 Scalar outside cost: 0 Vector outside cost: 4 prologue iterations: 0 epilogue iterations: 0 Calculated minimum iters for profitability: 1 I think this generally suffers from GIGO principle. One problem seems to be that we do not know about fmadd yet and compute it as two instructions (6 cycles instead of 4). More importnat problem is that we do not account the parallelism at all. I do not see how to disable the vecotrization here without bumping gather costs noticeably off reality and thus we probably can try to experiment with this if more similar problems are found. Icc is also using gather in s1115 and s128. For s1115 the vectorization does not seem to help and s128 gets slower. Clang and aocc does not use gathers. * config/i386/x86-tune-costs.h (struct processor_costs): Update costs of gather to match reality. * config/i386/x86-tune.def (X86_TUNE_USE_GATHER): Enable for znver3.