Age | Commit message | Author | Files | Lines | |
---|---|---|---|---|---|
2023-10-09 | Fixes for profile count/probability maintenance | Eugene Rozenfeld | 2 | -3/+3 | |
Verifier checks have recently been strengthened to check that all counts and probabilities are initialized. The checks fired during autoprofiledbootstrap build and this patch fixes it. Tested on x86_64-pc-linux-gnu. gcc/ChangeLog: * auto-profile.cc (afdo_calculate_branch_prob): Fix count comparisons * tree-vect-loop-manip.cc (vect_do_peeling): Guard against zero count when scaling loop profile | |||||
2023-10-09 | analyzer: fix build with gcc < 6 | David Malcolm | 1 | -1/+2 | |
gcc/analyzer/ChangeLog: * access-diagram.cc (boundaries::add): Explicitly state "boundaries::" scope for "kind" enum. Signed-off-by: David Malcolm <dmalcolm@redhat.com> | |||||
2023-10-09 | Ensure float equivalences include + and - zero. | Andrew MacLeod | 4 | -0/+44 | |
A floating point equivalence may not properly reflect both signs of zero, so be pessimistic and ensure both signs are included. PR tree-optimization/111694 gcc/ * gimple-range-cache.cc (ranger_cache::fill_block_cache): Adjust equivalence range. * value-relation.cc (adjust_equivalence_range): New. * value-relation.h (adjust_equivalence_range): New prototype. gcc/testsuite/ * gcc.dg/pr111694.c: New. | |||||
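The reason both signs of zero must be kept can be sketched in plain C. This is only an illustration of the IEEE 754 behaviour behind the fix, not code from the patch; the helper names `equal_but_distinguishable` and `reciprocal_is_positive` are made up for this example.

```c
#include <math.h>
#include <stdbool.h>

/* Illustrative helper (not part of GCC): true when a and b compare
   equal yet differ in sign, which happens only for +0.0 versus -0.0. */
bool equal_but_distinguishable(double a, double b)
{
    return a == b && (!signbit(a) != !signbit(b));
}

/* Illustrative helper (not part of GCC): division observes the sign of
   zero, since 1.0 / +0.0 is +inf while 1.0 / -0.0 is -inf. */
bool reciprocal_is_positive(double x)
{
    return 1.0 / x > 0.0;
}
```

Because `x == y` holds for `x = 0.0` and `y = -0.0` while `1.0 / x` and `1.0 / y` differ, a range derived from an equivalence with a zero has to cover both signs to stay safe.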
2023-10-09 | Remove unused get_identity_relation. | Andrew MacLeod | 3 | -25/+2 | |
Turns out we didn't need this, as there are no unordered relations managed by the oracle. * gimple-range-gori.cc (gori_compute::compute_operand1_range): Do not call get_identity_relation. (gori_compute::compute_operand2_range): Ditto. * value-relation.cc (get_identity_relation): Remove. * value-relation.h (get_identity_relation): Remove prototype. | |||||
2023-10-09 | RISC-V Regression test: Fix slp-perm-4.c FAIL for RVV | Juzhe-Zhong | 1 | -1/+1 | |
RVV vectorizes it with stride5 load_lanes. gcc/testsuite/ChangeLog: * gcc.dg/vect/slp-perm-4.c: Adapt test for stride5 load_lanes. | |||||
2023-10-09 | RISC-V Regression tests: Fix FAIL of pr97832* for RVV | Juzhe-Zhong | 3 | -6/+6 | |
These cases are vectorized by vec_load_lanes with strided = 8 instead of SLP with -fno-vect-cost-model. gcc/testsuite/ChangeLog: * gcc.dg/vect/pr97832-2.c: Adapt dump check for target supports load_lanes with stride = 8. * gcc.dg/vect/pr97832-3.c: Ditto. * gcc.dg/vect/pr97832-4.c: Ditto. | |||||
2023-10-09 | RISC-V Regression test: Fix FAIL of slp-12a.c | Juzhe-Zhong | 1 | -1/+1 | |
This case is vectorized by stride8 load_lanes. gcc/testsuite/ChangeLog: * gcc.dg/vect/slp-12a.c: Adapt for stride 8 load_lanes. | |||||
2023-10-09 | RISC-V Regression test: Fix FAIL of slp-reduc-4.c for RVV | Juzhe-Zhong | 1 | -1/+1 | |
RVV vectorizes this case with stride8 load_lanes. gcc/testsuite/ChangeLog: * gcc.dg/vect/slp-reduc-4.c: Adapt test for stride8 load_lanes. | |||||
2023-10-09 | RISC-V Regression test: Adapt SLP tests like ARM SVE | Juzhe-Zhong | 2 | -2/+2 | |
Like ARM SVE, RVV is vectorizing these 2 cases in the same way. gcc/testsuite/ChangeLog: * gcc.dg/vect/slp-23.c: Add RVV like ARM SVE. * gcc.dg/vect/slp-perm-10.c: Ditto. | |||||
2023-10-09 | RISC-V: Add initial pipeline description for an out-of-order core. | Robin Dapp | 7 | -4/+382 | |
This adds a pipeline description for a generic out-of-order core. Latency and units are not based on any real processor but more or less educated guesses what such a processor would look like. In order to account for latency scaling by LMUL != 1, sched_adjust_cost is implemented. It will scale an instruction's latency by its LMUL so an LMUL == 8 instruction will take 8 times the number of cycles the same instruction with LMUL == 1 would take. As this potentially causes very high latencies which, in turn, might lead to scheduling anomalies and a higher number of vsetvls emitted this feature is only enabled when specifying -madjust-lmul-cost. Additionally, in order to easily recognize pre-RA vsetvls this patch introduces an insn type vsetvl_pre which is used in sched_adjust_cost. In the future we might also want a latency adjustment similar to lmul for reductions, i.e. make the latency dependent on the type and its number of units. gcc/ChangeLog: * config/riscv/riscv-cores.def (RISCV_TUNE): Add parameter. * config/riscv/riscv-opts.h (enum riscv_microarchitecture_type): Add generic_ooo. * config/riscv/riscv.cc (riscv_sched_adjust_cost): Implement scheduler hook. (TARGET_SCHED_ADJUST_COST): Define. * config/riscv/riscv.md (no,yes"): Include generic-ooo.md * config/riscv/riscv.opt: Add -madjust-lmul-cost. * config/riscv/generic-ooo.md: New file. * config/riscv/vector.md: Add vsetvl_pre. | |||||
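The LMUL scaling described above can be sketched as a small standalone function. This is a hypothetical model, not the body of `riscv_sched_adjust_cost`: the name `scaled_latency`, its parameters, and the choice to leave LMUL values other than 2, 4 and 8 unscaled are all assumptions made for illustration.

```c
/* Hypothetical model of the latency adjustment (not GCC code).
   base_latency:     cycles for the instruction at LMUL == 1
   lmul:             register group multiplier of the instruction
   adjust_lmul_cost: nonzero when the behaviour of -madjust-lmul-cost
                     is requested */
int scaled_latency(int base_latency, int lmul, int adjust_lmul_cost)
{
    if (!adjust_lmul_cost)
        return base_latency;            /* feature is opt-in */

    if (lmul == 2 || lmul == 4 || lmul == 8)
        return base_latency * lmul;     /* LMUL == 8 costs 8x */

    /* Assumption: LMUL == 1 and anything else stay unscaled here. */
    return base_latency;
}
```

For instance, a 3-cycle instruction at LMUL == 8 is modelled as 24 cycles, which shows how quickly the latencies grow and why the commit message keeps the feature behind an option.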
2023-10-09 | RISC-V: Support movmisalign of RVV VLA modes | Juzhe-Zhong | 3 | -12/+17 | |
This patch fixed these following FAILs in regressions: FAIL: gcc.dg/vect/slp-perm-11.c -flto -ffat-lto-objects scan-tree-dump-times vect "vectorizing stmts using SLP" 1 FAIL: gcc.dg/vect/slp-perm-11.c scan-tree-dump-times vect "vectorizing stmts using SLP" 1 FAIL: gcc.dg/vect/vect-bitfield-read-2.c -flto -ffat-lto-objects scan-tree-dump-not optimized "Invalid sum" FAIL: gcc.dg/vect/vect-bitfield-read-2.c scan-tree-dump-not optimized "Invalid sum" FAIL: gcc.dg/vect/vect-bitfield-read-4.c -flto -ffat-lto-objects scan-tree-dump-not optimized "Invalid sum" FAIL: gcc.dg/vect/vect-bitfield-read-4.c scan-tree-dump-not optimized "Invalid sum" FAIL: gcc.dg/vect/vect-bitfield-write-2.c -flto -ffat-lto-objects scan-tree-dump-not optimized "Invalid sum" FAIL: gcc.dg/vect/vect-bitfield-write-2.c scan-tree-dump-not optimized "Invalid sum" FAIL: gcc.dg/vect/vect-bitfield-write-3.c -flto -ffat-lto-objects scan-tree-dump-not optimized "Invalid sum" FAIL: gcc.dg/vect/vect-bitfield-write-3.c scan-tree-dump-not optimized "Invalid sum" Previously, I removed the movmisalign pattern to fix the execution FAILs in this commit: https://github.com/gcc-mirror/gcc/commit/f7bff24905a6959f85f866390db2fff1d6f95520 I was thinking that RVV doesn't allow misaligned at the beginning so I removed that pattern. However, after deep investigation && reading RVV ISA again and experiment on SPIKE, I realized I was wrong. RVV ISA reference: https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc#vector-memory-alignment-constraints "If an element accessed by a vector memory instruction is not naturally aligned to the size of the element, either the element is transferred successfully or an address misaligned exception is raised on that element." It's obvious that RVV ISA does allow misaligned vector load/store. 
And experiment and confirm on SPIKE: [jzzhong@rios-cad122:/work/home/jzzhong/work/toolchain/riscv/gcc/gcc/testsuite/gcc.dg/vect]$~/work/toolchain/riscv/build/dev-rv64gcv_zfh-lp64d-medany-newlib-spike-debug/install/bin/spike --isa=rv64gcv --varch=vlen:128,elen:64 ~/work/toolchain/riscv/build/dev-rv64gcv_zfh-lp64d-medany-newlib-spike-debug/install/riscv64-unknown-elf/bin/pk64 a.out bbl loader z 0000000000000000 ra 0000000000010158 sp 0000003ffffffb40 gp 0000000000012c48 tp 0000000000000000 t0 00000000000110da t1 000000000000000f t2 0000000000000000 s0 0000000000013460 s1 0000000000000000 a0 0000000000012ef5 a1 0000000000012018 a2 0000000000012a71 a3 000000000000000d a4 0000000000000004 a5 0000000000012a71 a6 0000000000012a71 a7 0000000000012018 s2 0000000000000000 s3 0000000000000000 s4 0000000000000000 s5 0000000000000000 s6 0000000000000000 s7 0000000000000000 s8 0000000000000000 s9 0000000000000000 sA 0000000000000000 sB 0000000000000000 t3 0000000000000000 t4 0000000000000000 t5 0000000000000000 t6 0000000000000000 pc 0000000000010258 va/inst 00000000020660a7 sr 8000000200006620 Store/AMO access fault! [jzzhong@rios-cad122:/work/home/jzzhong/work/toolchain/riscv/gcc/gcc/testsuite/gcc.dg/vect]$~/work/toolchain/riscv/build/dev-rv64gcv_zfh-lp64d-medany-newlib-spike-debug/install/bin/spike --misaligned --isa=rv64gcv --varch=vlen:128,elen:64 ~/work/toolchain/riscv/build/dev-rv64gcv_zfh-lp64d-medany-newlib-spike-debug/install/riscv64-unknown-elf/bin/pk64 a.out bbl loader We can see SPIKE can pass previous *FAILED* execution tests with specifying --misaligned to SPIKE. So, to honor RVV ISA SPEC, we should add movmisalign pattern back base on the investigations I have done since it can improve multiple vectorization tests and fix dumple FAILs. This patch adds TARGET_VECTOR_MISALIGN_SUPPORTED to decide whether we support misalign pattern for VLA modes (By default it is enabled). 
Consider this following case: struct s { unsigned i : 31; char a : 4; }; #define N 32 #define ELT0 {0x7FFFFFFFUL, 0} #define ELT1 {0x7FFFFFFFUL, 1} #define ELT2 {0x7FFFFFFFUL, 2} #define ELT3 {0x7FFFFFFFUL, 3} #define RES 48 struct s A[N] = { ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3}; int __attribute__ ((noipa)) f(struct s *ptr, unsigned n) { int res = 0; for (int i = 0; i < n; ++i) res += ptr[i].a; return res; } -O3 -S -fno-vect-cost-model (default strict-align): f: mv a4,a0 beq a1,zero,.L9 addiw a5,a1,-1 li a3,14 vsetivli zero,16,e64,m8,ta,ma bleu a5,a3,.L3 andi a5,a0,127 bne a5,zero,.L3 srliw a3,a1,4 slli a3,a3,7 li a0,15 slli a0,a0,32 add a3,a3,a4 mv a5,a4 li a2,32 vmv.v.x v16,a0 vsetvli zero,zero,e32,m4,ta,ma vmv.v.i v4,0 .L4: vsetvli zero,zero,e64,m8,ta,ma vle64.v v8,0(a5) addi a5,a5,128 vand.vv v8,v8,v16 vsetvli zero,zero,e32,m4,ta,ma vnsrl.wx v8,v8,a2 vadd.vv v4,v4,v8 bne a5,a3,.L4 li a3,0 andi a5,a1,15 vmv.s.x v1,a3 andi a3,a1,-16 vredsum.vs v1,v4,v1 vmv.x.s a0,v1 mv a2,a0 beq a5,zero,.L15 slli a5,a3,3 add a5,a4,a5 lw a0,4(a5) andi a0,a0,15 addiw a4,a3,1 addw a0,a0,a2 bgeu a4,a1,.L15 lw a2,12(a5) andi a2,a2,15 addiw a4,a3,2 addw a0,a2,a0 bgeu a4,a1,.L15 lw a2,20(a5) andi a2,a2,15 addiw a4,a3,3 addw a0,a2,a0 bgeu a4,a1,.L15 lw a2,28(a5) andi a2,a2,15 addiw a4,a3,4 addw a0,a2,a0 bgeu a4,a1,.L15 lw a2,36(a5) andi a2,a2,15 addiw a4,a3,5 addw a0,a2,a0 bgeu a4,a1,.L15 lw a2,44(a5) andi a2,a2,15 addiw a4,a3,6 addw a0,a2,a0 bgeu a4,a1,.L15 lw a2,52(a5) andi a2,a2,15 addiw a4,a3,7 addw a0,a2,a0 bgeu a4,a1,.L15 lw a4,60(a5) andi a4,a4,15 addw a4,a4,a0 addiw a2,a3,8 mv a0,a4 bgeu a2,a1,.L15 lw a0,68(a5) andi a0,a0,15 addiw a2,a3,9 addw a0,a0,a4 bgeu a2,a1,.L15 lw a2,76(a5) andi a2,a2,15 addiw a4,a3,10 addw a0,a2,a0 bgeu a4,a1,.L15 lw a2,84(a5) andi a2,a2,15 addiw a4,a3,11 addw a0,a2,a0 bgeu a4,a1,.L15 lw a2,92(a5) andi 
a2,a2,15 addiw a4,a3,12 addw a0,a2,a0 bgeu a4,a1,.L15 lw a2,100(a5) andi a2,a2,15 addiw a4,a3,13 addw a0,a2,a0 bgeu a4,a1,.L15 lw a4,108(a5) andi a4,a4,15 addiw a3,a3,14 addw a0,a4,a0 bgeu a3,a1,.L15 lw a5,116(a5) andi a5,a5,15 addw a0,a5,a0 ret .L9: li a0,0 .L15: ret .L3: mv a5,a4 slli a4,a1,32 srli a1,a4,29 add a1,a5,a1 li a0,0 .L7: lw a4,4(a5) andi a4,a4,15 addi a5,a5,8 addw a0,a4,a0 bne a5,a1,.L7 ret -O3 -S -mno-strict-align -fno-vect-cost-model: f: beq a1,zero,.L4 slli a1,a1,32 li a5,15 vsetvli a4,zero,e64,m1,ta,ma slli a5,a5,32 srli a1,a1,32 li a6,32 vmv.v.x v3,a5 vsetvli zero,zero,e32,mf2,ta,ma vmv.v.i v2,0 .L3: vsetvli a5,a1,e64,m1,ta,ma vle64.v v1,0(a0) vsetvli a3,zero,e64,m1,ta,ma slli a2,a5,3 vand.vv v1,v1,v3 sub a1,a1,a5 vsetvli zero,zero,e32,mf2,ta,ma add a0,a0,a2 vnsrl.wx v1,v1,a6 vsetvli zero,a5,e32,mf2,tu,ma vadd.vv v2,v2,v1 bne a1,zero,.L3 li a5,0 vsetvli a3,zero,e32,mf2,ta,ma vmv.s.x v1,a5 vredsum.vs v2,v2,v1 vmv.x.s a0,v2 ret .L4: li a0,0 ret We can see it improves this case codegen a lot. gcc/ChangeLog: * config/riscv/riscv-opts.h (TARGET_VECTOR_MISALIGN_SUPPORTED): New macro. * config/riscv/riscv.cc (riscv_support_vector_misalignment): Depend on movmisalign pattern. * config/riscv/vector.md (movmisalign<mode>): New pattern. | |||||
2023-10-09 | THead: Fix missing CFI directives for th.sdd in prologue. | Xianmiao Qu | 2 | -5/+35 | |
When generating CFI directives for the store-pair instruction, if we add two parallel REG_FRAME_RELATED_EXPR expr_lists like (expr_list:REG_FRAME_RELATED_EXPR (set (mem/c:DI (plus:DI (reg/f:DI 2 sp) (const_int 8 [0x8])) [1 S8 A64]) (reg:DI 1 ra)) (expr_list:REG_FRAME_RELATED_EXPR (set (mem/c:DI (reg/f:DI 2 sp) [1 S8 A64]) (reg:DI 8 s0)) only the first expr_list will be recognized by the dwarf2out_frame_debug function. So, here we generate a SEQUENCE expression of REG_FRAME_RELATED_EXPR, which includes two sub-expressions of RTX_FRAME_RELATED_P. Then the dwarf2out_frame_debug_expr function will iterate through all the sub-expressions and generate the corresponding CFI directives. gcc/ * config/riscv/thead.cc (th_mempair_save_regs): Fix missing CFI directives for store-pair instruction. gcc/testsuite/ * gcc.target/riscv/xtheadmempair-4.c: New test. | |||||
2023-10-09 | tree-optimization/111715 - improve TBAA for access paths with pun | Richard Biener | 2 | -1/+48 | |
The following improves basic TBAA for access paths formed by C++ abstraction where we are able to combine a path from an address-taking operation with a path based on that access using a pun to avoid memory access semantics on the address-taking part. The trick is to identify the point the semantic memory access path starts which allows us to use the alias set of the outermost access instead of only that of the base of this path. PR tree-optimization/111715 * alias.cc (reference_alias_ptr_type_1): When we have a type-punning ref at the base search for the access path part that's still semantically valid. * gcc.dg/tree-ssa/ssa-fre-102.c: New testcase. | |||||
2023-10-09 | RISC-V: Refine bswap16 auto vectorization code gen | Pan Li | 5 | -2/+188 | |
Update in v2 * Remove emit helper functions. * Take expand_binop instead. Original log: This patch refines the code gen for bswap16. We will have VEC_PERM_EXPR after rtl expand when invoking __builtin_bswap. It will generate about 9 instructions in the loop as below, no matter whether it is bswap16, bswap32 or bswap64. .L2: 1 vle16.v v4,0(a0) 2 vmv.v.x v2,a7 3 vand.vv v2,v6,v2 4 slli a2,a5,1 5 vrgatherei16.vv v1,v4,v2 6 sub a4,a4,a5 7 vse16.v v1,0(a3) 8 add a0,a0,a2 9 add a3,a3,a2 bne a4,zero,.L2 But for bswap16 we may have an even simpler code gen, which has only 7 instructions in the loop as below. .L5 1 vle8.v v2,0(a5) 2 addi a5,a5,32 3 vsrl.vi v4,v2,8 4 vsll.vi v2,v2,8 5 vor.vv v4,v4,v2 6 vse8.v v4,0(a4) 7 addi a4,a4,32 bne a5,a6,.L5 Unfortunately, this approach makes the instruction count in the loop grow to 13 and 24 for bswap32 and bswap64. Thus, we refine the code gen for bswap16 only, and leave both bswap32 and bswap64 as is. gcc/ChangeLog: * config/riscv/riscv-v.cc (shuffle_bswap_pattern): New func impl for shuffle bswap. (expand_vec_perm_const_1): Add handling for shuffle bswap pattern. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vls/perm-4.c: Adjust checker. * gcc.target/riscv/rvv/autovec/unop/bswap16-0.c: New test. * gcc.target/riscv/rvv/autovec/unop/bswap16-run-0.c: New test. * gcc.target/riscv/rvv/autovec/vls/bswap16-0.c: New test. Signed-off-by: Pan Li <pan2.li@intel.com> | |||||
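The shift-and-or sequence (vsrl, vsll, vor) in the second loop corresponds to the following scalar identity. This is a minimal sketch for illustration only; the function names `bswap16_shift_or` and `bswap16_array` are hypothetical and do not appear in the patch or its tests.

```c
#include <stddef.h>
#include <stdint.h>

/* Swap the two bytes of a 16-bit value with two shifts and an or,
   mirroring vsrl.vi / vsll.vi / vor.vv on each 16-bit element. */
uint16_t bswap16_shift_or(uint16_t x)
{
    return (uint16_t)((x >> 8) | (x << 8));
}

/* Element-wise loop of the kind an auto-vectorizer would see. */
void bswap16_array(uint16_t *dst, const uint16_t *src, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        dst[i] = bswap16_shift_or(src[i]);
}
```

For 32-bit and 64-bit elements the same approach needs more shifts, masks and ors per element, which is consistent with the commit's observation that the instruction count grows for bswap32 and bswap64.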
2023-10-09 | RISC-V Regression test: Fix FAIL of pr45752.c for RVV | Juzhe-Zhong | 1 | -1/+1 | |
RVV uses load_lanes with stride = 5 to vectorize this case with -fno-vect-cost-model instead of SLP. gcc/testsuite/ChangeLog: * gcc.dg/vect/pr45752.c: Adapt dump check for target supports load_lanes with stride = 5. | |||||
2023-10-09 | testsuite: Fix vect_cond_arith_* dump checks for RVV. | Robin Dapp | 4 | -14/+14 | |
gcc/testsuite/ChangeLog: * gcc.dg/vect/vect-cond-arith-2.c: Also match COND_LEN. * gcc.dg/vect/vect-cond-arith-4.c: Ditto. * gcc.dg/vect/vect-cond-arith-5.c: Ditto. * gcc.dg/vect/vect-cond-arith-6.c: Ditto. | |||||
2023-10-09 | RISC-V Regression test: Fix FAIL of fast-math-slp-38.c for RVV | Juzhe-Zhong | 1 | -1/+1 | |
Reference: https://godbolt.org/z/G9jzf5Grh RVV is able to vectorize this case using SLP. However, with -fno-vect-cost-model, RVV vectorizes it by vec_load_lanes with stride 6. gcc/testsuite/ChangeLog: * gcc.dg/vect/fast-math-slp-38.c: Add ! vect_strided6. | |||||
2023-10-09 | i386: Implement doubleword right shifts by 1 bit using s[ha]r+rcr. | Roger Sayle | 6 | -0/+101 | |
This patch tweaks the i386 back-end's ix86_split_ashr and ix86_split_lshr functions to implement doubleword right shifts by 1 bit, using a shift of the highpart that sets the carry flag followed by a rotate-carry-right (RCR) instruction on the lowpart. Conceptually this is similar to the recent left shift patch, but with two complicating factors. The first is that although the RCR sequence is shorter, and is a ~3x performance improvement on AMD, my microbenchmarking shows it ~10% slower on Intel. Hence this patch also introduces a new X86_TUNE_USE_RCR tuning parameter. The second is that I believe this is the first time a "rotate-right-through-carry" and a right shift that sets the carry flag from the least significant bit has been modelled in GCC RTL (on a MODE_CC target). For this I've used the i386 back-end's UNSPEC_CC_NE which seems appropriate. Finally rcrsi2 and rcrdi2 are separate define_insns so that we can use their generator functions. For the pair of functions: unsigned __int128 foo(unsigned __int128 x) { return x >> 1; } __int128 bar(__int128 x) { return x >> 1; } with -O2 -march=znver4 we previously generated: foo: movq %rdi, %rax movq %rsi, %rdx shrdq $1, %rsi, %rax shrq %rdx ret bar: movq %rdi, %rax movq %rsi, %rdx shrdq $1, %rsi, %rax sarq %rdx ret with this patch we now generate: foo: movq %rsi, %rdx movq %rdi, %rax shrq %rdx rcrq %rax ret bar: movq %rsi, %rdx movq %rdi, %rax sarq %rdx rcrq %rax ret 2023-10-09 Roger Sayle <roger@nextmovesoftware.com> gcc/ChangeLog * config/i386/i386-expand.cc (ix86_split_ashr): Split shifts by one into ashr[sd]i3_carry followed by rcr[sd]i2, if TARGET_USE_RCR or -Oz. (ix86_split_lshr): Likewise, split shifts by one bit into lshr[sd]i3_carry followed by rcr[sd]i2, if TARGET_USE_RCR or -Oz. * config/i386/i386.h (TARGET_USE_RCR): New backend macro. * config/i386/i386.md (rcrsi2): New define_insn for rcrl. (rcrdi2): New define_insn for rcrq. 
(<anyshiftrt><mode>3_carry): New define_insn for right shifts that set the carry flag from the least significant bit, modelled using UNSPEC_CC_NE. * config/i386/x86-tune.def (X86_TUNE_USE_RCR): New tuning parameter controlling use of rcr 1 vs. shrd, which is significantly faster on AMD processors. gcc/testsuite/ChangeLog * gcc.target/i386/rcr-1.c: New 64-bit test case. * gcc.target/i386/rcr-2.c: New 32-bit test case. | |||||
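The shift-then-rotate-through-carry idea from the i386 patch above can be modelled in portable C by treating a 128-bit value as two 64-bit halves. This is only a sketch of the arithmetic, not the RTL the patch emits; the type `u128_parts` and the functions `lshr1` and `ashr1` are hypothetical names introduced for this example.

```c
#include <stdint.h>

/* Hypothetical two-word representation of a 128-bit value. */
typedef struct {
    uint64_t lo;
    uint64_t hi;
} u128_parts;

/* Logical right shift by one bit: shift the high part (shr), keep the
   bit that falls out as the "carry", then rotate it into the top of
   the low part (rcr). */
u128_parts lshr1(u128_parts x)
{
    uint64_t carry = x.hi & 1;
    u128_parts r;
    r.hi = x.hi >> 1;
    r.lo = (x.lo >> 1) | (carry << 63);
    return r;
}

/* Arithmetic right shift by one bit: same as above, except that the
   high part keeps its sign bit (sar). */
u128_parts ashr1(u128_parts x)
{
    uint64_t carry = x.hi & 1;
    uint64_t sign = x.hi & UINT64_C(0x8000000000000000);
    u128_parts r;
    r.hi = (x.hi >> 1) | sign;
    r.lo = (x.lo >> 1) | (carry << 63);
    return r;
}
```

The only link between the two halves is the single carried bit, which is why the two-instruction shr/sar plus rcr sequence is sufficient for a shift count of exactly one.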
2023-10-09 | Allow -mno-evex512 usage | Haochen Jiang | 4 | -1/+40 | |
gcc/ChangeLog: * config/i386/i386.opt: Allow -mno-evex512. gcc/testsuite/ChangeLog: * gcc.target/i386/noevex512-1.c: New test. * gcc.target/i386/noevex512-2.c: Ditto. * gcc.target/i386/noevex512-3.c: Ditto. | |||||
2023-10-09 | Support -mevex512 for AVX512FP16 intrins | Haochen Jiang | 1 | -23/+21 | |
gcc/ChangeLog: * config/i386/sse.md (V48H_AVX512VL): Add TARGET_EVEX512. (VFH): Ditto. (VF2H): Ditto. (VFH_AVX512VL): Ditto. (VHFBF): Ditto. (VHF_AVX512VL): Ditto. (VI2H_AVX512VL): Ditto. (VI2F_256_512): Ditto. (VF48_I1248): Remove unused iterator. (VF48H_AVX512VL): Add TARGET_EVEX512. (VF_AVX512): Remove unused iterator. (REDUC_PLUS_MODE): Add TARGET_EVEX512. (REDUC_SMINMAX_MODE): Ditto. (FMAMODEM): Ditto. (VFH_SF_AVX512VL): Ditto. (VEC_PERM_AVX2): Ditto. Co-authored-by: Hu, Lin1 <lin1.hu@intel.com> | |||||
2023-10-09 | Support -mevex512 for AVX512{IFMA,VBMI,VNNI,BF16,VPOPCNTDQ,VBMI2,BITALG,VP2INTERSECT},VAES,GFNI,VPCLMULQDQ intrins | Haochen Jiang | 2 | -27/+31 | |
gcc/ChangeLog: * config/i386/sse.md (VI1_AVX512VL): Add TARGET_EVEX512. (VI8_FVL): Ditto. (VI1_AVX512F): Ditto. (VI1_AVX512VNNI): Ditto. (VI1_AVX512VL_F): Ditto. (VI12_VI48F_AVX512VL): Ditto. (*avx512f_permvar_truncv32hiv32qi_1): Ditto. (sdot_prod<mode>): Ditto. (VEC_PERM_AVX2): Ditto. (VPERMI2): Ditto. (VPERMI2I): Ditto. (vpmadd52<vpmadd52type>v8di): Ditto. (usdot_prod<mode>): Ditto. (vpdpbusd_v16si): Ditto. (vpdpbusds_v16si): Ditto. (vpdpwssd_v16si): Ditto. (vpdpwssds_v16si): Ditto. (VI48_AVX512VP2VL): Ditto. (avx512vp2intersect_2intersectv16si): Ditto. (VF_AVX512BF16VL): Ditto. (VF1_AVX512_256): Ditto. gcc/testsuite/ChangeLog: * gcc.target/i386/pr90096.c: Adjust error message. Co-authored-by: Hu, Lin1 <lin1.hu@intel.com> | |||||
2023-10-09 | Support -mevex512 for AVX512BW intrins | Haochen Jiang | 4 | -126/+128 | |
gcc/Changelog: * config/i386/i386-expand.cc (ix86_expand_vector_init_duplicate): Make sure there is EVEX512 enabled. (ix86_expand_vecop_qihi2): Refuse V32QI->V32HI when no EVEX512. * config/i386/i386.cc (ix86_hard_regno_mode_ok): Disable 64 bit mask when !TARGET_EVEX512. * config/i386/i386.md (avx512bw_512): New. (SWI1248_AVX512BWDQ_64): Add TARGET_EVEX512. (*zero_extendsidi2): Change isa to avx512bw_512. (kmov_isa): Ditto. (*anddi_1): Ditto. (*andn<mode>_1): Change isa to kmov_isa. (*<code><mode>_1): Ditto. (*notxor<mode>_1): Ditto. (*one_cmpl<mode>2_1): Ditto. (*one_cmplsi2_1_zext): Change isa to avx512bw_512. (*ashl<mode>3_1): Change isa to kmov_isa. (*lshr<mode>3_1): Ditto. * config/i386/sse.md (VI12HFBF_AVX512VL): Add TARGET_EVEX512. (VI1248_AVX512VLBW): Ditto. (VHFBF_AVX512VL): Ditto. (VI): Ditto. (VIHFBF): Ditto. (VI_AVX2): Ditto. (VI1_AVX512): Ditto. (VI12_256_512_AVX512VL): Ditto. (VI2_AVX2_AVX512BW): Ditto. (VI2_AVX512VNNIBW): Ditto. (VI2_AVX512VL): Ditto. (VI2HFBF_AVX512VL): Ditto. (VI8_AVX2_AVX512BW): Ditto. (VIMAX_AVX2_AVX512BW): Ditto. (VIMAX_AVX512VL): Ditto. (VI12_AVX2_AVX512BW): Ditto. (VI124_AVX2_24_AVX512F_1_AVX512BW): Ditto. (VI248_AVX512VL): Ditto. (VI248_AVX512VLBW): Ditto. (VI248_AVX2_8_AVX512F_24_AVX512BW): Ditto. (VI248_AVX512BW): Ditto. (VI248_AVX512BW_AVX512VL): Ditto. (VI248_512): Ditto. (VI124_256_AVX512F_AVX512BW): Ditto. (VI_AVX512BW): Ditto. (VIHFBF_AVX512BW): Ditto. (SWI1248_AVX512BWDQ): Ditto. (SWI1248_AVX512BW): Ditto. (SWI1248_AVX512BWDQ2): Ditto. (*knotsi_1_zext): Ditto. (define_split for zero_extend + not): Ditto. (kunpckdi): Ditto. (REDUC_SMINMAX_MODE): Ditto. (VEC_EXTRACT_MODE): Ditto. (*avx512bw_permvar_truncv16siv16hi_1): Ditto. (*avx512bw_permvar_truncv16siv16hi_1_hf): Ditto. (truncv32hiv32qi2): Ditto. (avx512bw_<code>v32hiv32qi2): Ditto. (avx512bw_<code>v32hiv32qi2_mask): Ditto. (avx512bw_<code>v32hiv32qi2_mask_store): Ditto. (usadv64qi): Ditto. (VEC_PERM_AVX2): Ditto. (AVX512ZEXTMASK): Ditto. (SWI24_MASK): New. 
(vec_pack_trunc_<mode>): Change iterator to SWI24_MASK. (avx512bw_packsswb<mask_name>): Add TARGET_EVEX512. (avx512bw_packssdw<mask_name>): Ditto. (avx512bw_interleave_highv64qi<mask_name>): Ditto. (avx512bw_interleave_lowv64qi<mask_name>): Ditto. (<mask_codefor>avx512bw_pshuflwv32hi<mask_name>): Ditto. (<mask_codefor>avx512bw_pshufhwv32hi<mask_name>): Ditto. (vec_unpacks_lo_di): Ditto. (SWI48x_MASK): New. (vec_unpacks_hi_<mode>): Change iterator to SWI48x_MASK. (avx512bw_umulhrswv32hi3<mask_name>): Add TARGET_EVEX512. (VI1248_AVX512VL_AVX512BW): Ditto. (avx512bw_<code>v32qiv32hi2<mask_name>): Ditto. (*avx512bw_zero_extendv32qiv32hi2_1): Ditto. (*avx512bw_zero_extendv32qiv32hi2_2): Ditto. (<insn>v32qiv32hi2): Ditto. (pbroadcast_evex_isa): Change isa attribute to avx512bw_512. (VPERMI2): Add TARGET_EVEX512. (VPERMI2I): Ditto. | |||||
2023-10-09 | Support -mevex512 for AVX512DQ intrins | Haochen Jiang | 3 | -17/+31 | |
gcc/ChangeLog: * config/i386/i386-expand.cc (ix86_expand_sse2_mulvxdi3): Add TARGET_EVEX512 for 512 bit usage. * config/i386/i386.cc (standard_sse_constant_opcode): Ditto. * config/i386/sse.md (VF1_VF2_AVX512DQ): Ditto. (VF1_128_256VL): Ditto. (VF2_AVX512VL): Ditto. (VI8_256_512): Ditto. (<mask_codefor>fixuns_trunc<mode><sseintvecmodelower>2<mask_name>): Ditto. (AVX512_VEC): Ditto. (AVX512_VEC_2): Ditto. (VI4F_BRCST32x2): Ditto. (VI8F_BRCST64x2): Ditto. | |||||
2023-10-09 | Support -mevex512 for AVX512F intrins | Haochen Jiang | 6 | -333/+442 | |
gcc/ChangeLog: * config/i386/i386-builtins.cc (ix86_vectorize_builtin_gather): Disable 512 bit gather when !TARGET_EVEX512. * config/i386/i386-expand.cc (ix86_valid_mask_cmp_mode): Add TARGET_EVEX512. (ix86_expand_int_sse_cmp): Ditto. (ix86_expand_vector_init_one_nonzero): Disable subroutine when !TARGET_EVEX512. (ix86_emit_swsqrtsf): Add TARGET_EVEX512. (ix86_vectorize_vec_perm_const): Disable subroutine when !TARGET_EVEX512. * config/i386/i386.cc (standard_sse_constant_p): Add TARGET_EVEX512. (standard_sse_constant_opcode): Ditto. (ix86_get_ssemov): Ditto. (ix86_legitimate_constant_p): Ditto. (ix86_vectorize_builtin_scatter): Diable 512 bit scatter when !TARGET_EVEX512. * config/i386/i386.md (avx512f_512): New. (movxi): Add TARGET_EVEX512. (*movxi_internal_avx512f): Ditto. (*movdi_internal): Change alternative 12 to ?Yv. Adjust mode for alternative 13. (*movsi_internal): Change alternative 8 to ?Yv. Adjust mode for alternative 9. (*movhi_internal): Change alternative 11 to *Yv. (*movdf_internal): Change alternative 12 to Yv. (*movsf_internal): Change alternative 5 to Yv. Adjust mode for alternative 5 and 6. (*mov<mode>_internal): Change alternative 4 to Yv. (define_split for convert SF to DF): Add TARGET_EVEX512. (extendbfsf2_1): Ditto. * config/i386/predicates.md (bcst_mem_operand): Disable predicate for 512 bit when !TARGET_EVEX512. * config/i386/sse.md (VMOVE): Add TARGET_EVEX512. (V48_AVX512VL): Ditto. (V48_256_512_AVX512VL): Ditto. (V48H_AVX512VL): Ditto. (VI12_AVX512VL): Ditto. (V): Ditto. (V_512): Ditto. (V_256_512): Ditto. (VF): Ditto. (VF1_VF2_AVX512DQ): Ditto. (VFH): Ditto. (VFB): Ditto. (VF1): Ditto. (VF1_AVX2): Ditto. (VF2): Ditto. (VF2H): Ditto. (VF2_512_256): Ditto. (VF2_512_256VL): Ditto. (VF_512): Ditto. (VFB_512): Ditto. (VI48_AVX512VL): Ditto. (VI1248_AVX512VLBW): Ditto. (VF_AVX512VL): Ditto. (VFH_AVX512VL): Ditto. (VF1_AVX512VL): Ditto. (VI): Ditto. (VIHFBF): Ditto. (VI_AVX2): Ditto. (VI8): Ditto. (VI8_AVX512VL): Ditto. (VI2_AVX512F): Ditto. 
(VI4_AVX512F): Ditto. (VI4_AVX512VL): Ditto. (VI48_AVX512F_AVX512VL): Ditto. (VI8_AVX2_AVX512F): Ditto. (VI8_AVX_AVX512F): Ditto. (V8FI): Ditto. (V16FI): Ditto. (VI124_AVX2_24_AVX512F_1_AVX512BW): Ditto. (VI248_AVX512VLBW): Ditto. (VI248_AVX2_8_AVX512F_24_AVX512BW): Ditto. (VI248_AVX512BW): Ditto. (VI248_AVX512BW_AVX512VL): Ditto. (VI48_AVX512F): Ditto. (VI48_AVX_AVX512F): Ditto. (VI12_AVX_AVX512F): Ditto. (VI148_512): Ditto. (VI124_256_AVX512F_AVX512BW): Ditto. (VI48_512): Ditto. (VI_AVX512BW): Ditto. (VIHFBF_AVX512BW): Ditto. (VI4F_256_512): Ditto. (VI48F_256_512): Ditto. (VI48F): Ditto. (VI12_VI48F_AVX512VL): Ditto. (V32_512): Ditto. (AVX512MODE2P): Ditto. (STORENT_MODE): Ditto. (REDUC_PLUS_MODE): Ditto. (REDUC_SMINMAX_MODE): Ditto. (*andnot<mode>3): Change isa attribute to avx512f_512. (*andnot<mode>3): Ditto. (<code><mode>3): Ditto. (<code>tf3): Ditto. (FMAMODEM): Add TARGET_EVEX512. (FMAMODE_AVX512): Ditto. (VFH_SF_AVX512VL): Ditto. (avx512f_fix_notruncv16sfv16si<mask_name><round_name>): Ditto. (fix<fixunssuffix>_truncv16sfv16si2<mask_name><round_saeonly_name>): Ditto. (avx512f_cvtdq2pd512_2): Ditto. (avx512f_cvtpd2dq512<mask_name><round_name>): Ditto. (fix<fixunssuffix>_truncv8dfv8si2<mask_name><round_saeonly_name>): Ditto. (<mask_codefor>avx512f_cvtpd2ps512<mask_name><round_name>): Ditto. (vec_unpacks_lo_v16sf): Ditto. (vec_unpacks_hi_v16sf): Ditto. (vec_unpacks_float_hi_v16si): Ditto. (vec_unpacks_float_lo_v16si): Ditto. (vec_unpacku_float_hi_v16si): Ditto. (vec_unpacku_float_lo_v16si): Ditto. (vec_pack_sfix_trunc_v8df): Ditto. (avx512f_vec_pack_sfix_v8df): Ditto. (<mask_codefor>avx512f_unpckhps512<mask_name>): Ditto. (<mask_codefor>avx512f_unpcklps512<mask_name>): Ditto. (<mask_codefor>avx512f_movshdup512<mask_name>): Ditto. (<mask_codefor>avx512f_movsldup512<mask_name>): Ditto. (AVX512_VEC): Ditto. (AVX512_VEC_2): Ditto. (vec_extract_lo_v64qi): Ditto. (vec_extract_hi_v64qi): Ditto. (VEC_EXTRACT_MODE): Ditto. 
(<mask_codefor>avx512f_unpckhpd512<mask_name>): Ditto. (avx512f_movddup512<mask_name>): Ditto. (avx512f_unpcklpd512<mask_name>): Ditto. (*<avx512>_vternlog<mode>_all): Ditto. (*<avx512>_vpternlog<mode>_1): Ditto. (*<avx512>_vpternlog<mode>_2): Ditto. (*<avx512>_vpternlog<mode>_3): Ditto. (avx512f_shufps512_mask): Ditto. (avx512f_shufps512_1<mask_name>): Ditto. (avx512f_shufpd512_mask): Ditto. (avx512f_shufpd512_1<mask_name>): Ditto. (<mask_codefor>avx512f_interleave_highv8di<mask_name>): Ditto. (<mask_codefor>avx512f_interleave_lowv8di<mask_name>): Ditto. (vec_dupv2df<mask_name>): Ditto. (trunc<pmov_src_lower><mode>2): Ditto. (*avx512f_<code><pmov_src_lower><mode>2): Ditto. (*avx512f_vpermvar_truncv8div8si_1): Ditto. (avx512f_<code><pmov_src_lower><mode>2_mask): Ditto. (avx512f_<code><pmov_src_lower><mode>2_mask_store): Ditto. (truncv8div8qi2): Ditto. (avx512f_<code>v8div16qi2): Ditto. (*avx512f_<code>v8div16qi2_store_1): Ditto. (*avx512f_<code>v8div16qi2_store_2): Ditto. (avx512f_<code>v8div16qi2_mask): Ditto. (*avx512f_<code>v8div16qi2_mask_1): Ditto. (*avx512f_<code>v8div16qi2_mask_store_1): Ditto. (avx512f_<code>v8div16qi2_mask_store_2): Ditto. (vec_widen_umult_even_v16si<mask_name>): Ditto. (*vec_widen_umult_even_v16si<mask_name>): Ditto. (vec_widen_smult_even_v16si<mask_name>): Ditto. (*vec_widen_smult_even_v16si<mask_name>): Ditto. (VEC_PERM_AVX2): Ditto. (one_cmpl<mode>2): Ditto. (<mask_codefor>one_cmpl<mode>2<mask_name>): Ditto. (*one_cmpl<mode>2_pternlog_false_dep): Ditto. (define_split to xor): Ditto. (*andnot<mode>3): Ditto. (define_split for ior): Ditto. (*iornot<mode>3): Ditto. (*xnor<mode>3): Ditto. (*<nlogic><mode>3): Ditto. (<mask_codefor>avx512f_interleave_highv16si<mask_name>): Ditto. (<mask_codefor>avx512f_interleave_lowv16si<mask_name>): Ditto. (avx512f_pshufdv3_mask): Ditto. (avx512f_pshufd_1<mask_name>): Ditto. (*vec_extractv4ti): Ditto. (VEXTRACTI128_MODE): Ditto. (define_split to vec_extract): Ditto. (VI1248_AVX512VL_AVX512BW): Ditto. 
(<mask_codefor>avx512f_<code>v16qiv16si2<mask_name>): Ditto. (<insn>v16qiv16si2): Ditto. (avx512f_<code>v16hiv16si2<mask_name>): Ditto. (<insn>v16hiv16si2): Ditto. (avx512f_zero_extendv16hiv16si2_1): Ditto. (avx512f_<code>v8qiv8di2<mask_name>): Ditto. (*avx512f_<code>v8qiv8di2<mask_name>_1): Ditto. (*avx512f_<code>v8qiv8di2<mask_name>_2): Ditto. (<insn>v8qiv8di2): Ditto. (avx512f_<code>v8hiv8di2<mask_name>): Ditto. (<insn>v8hiv8di2): Ditto. (avx512f_<code>v8siv8di2<mask_name>): Ditto. (*avx512f_zero_extendv8siv8di2_1): Ditto. (*avx512f_zero_extendv8siv8di2_2): Ditto. (<insn>v8siv8di2): Ditto. (avx512f_roundps512_sfix): Ditto. (vashrv8di3): Ditto. (vashrv16si3): Ditto. (pbroadcast_evex_isa): Change isa attribute to avx512f_512. (vec_dupv4sf): Add TARGET_EVEX512. (*vec_dupv4si): Ditto. (*vec_dupv2di): Ditto. (vec_dup<mode>): Change isa attribute to avx512f_512. (VPERMI2): Add TARGET_EVEX512. (VPERMI2I): Ditto. (VEC_INIT_MODE): Ditto. (VEC_INIT_HALF_MODE): Ditto. (<mask_codefor>avx512f_vcvtph2ps512<mask_name><round_saeonly_name>): Ditto. (avx512f_vcvtps2ph512_mask_sae): Ditto. (<mask_codefor>avx512f_vcvtps2ph512<mask_name><round_saeonly_name>): Ditto. (*avx512f_vcvtps2ph512<merge_mask_name>): Ditto. (INT_BROADCAST_MODE): Ditto. | |||||
2023-10-09 | Disable zmm register and 512 bit libmvec call when !TARGET_EVEX512 | Haochen Jiang | 4 | -33/+42 | |
gcc/ChangeLog: * config/i386/i386-expand.cc (ix86_broadcast_from_constant): Disable zmm broadcast for !TARGET_EVEX512. * config/i386/i386-options.cc (ix86_option_override_internal): Do not use PVW_512 when no-evex512. (ix86_simd_clone_adjust): Add evex512 target into string. * config/i386/i386.cc (type_natural_mode): Report ABI warning when using zmm register w/o evex512. (ix86_return_in_memory): Do not allow zmm when !TARGET_EVEX512. (ix86_hard_regno_mode_ok): Ditto. (ix86_set_reg_reg_cost): Ditto. (ix86_rtx_costs): Ditto. (ix86_vector_mode_supported_p): Ditto. (ix86_preferred_simd_mode): Ditto. (ix86_get_mask_mode): Ditto. (ix86_simd_clone_compute_vecsize_and_simdlen): Disable 512 bit libmvec call when !TARGET_EVEX512. (ix86_simd_clone_usable): Ditto. * config/i386/i386.h (BIGGEST_ALIGNMENT): Disable 512 alignment when !TARGET_EVEX512 (MOVE_MAX): Do not use PVW_512 when !TARGET_EVEX512. (STORE_MAX_PIECES): Ditto. | |||||
2023-10-09 | [PATCH 5/5] Add OPTION_MASK_ISA2_EVEX512 for 512 bit builtins | Haochen Jiang | 1 | -78/+78 | |
gcc/ChangeLog: * config/i386/i386-builtin.def (BDESC): Add OPTION_MASK_ISA2_EVEX512. | |||||
2023-10-09 | [PATCH 4/5] Add OPTION_MASK_ISA2_EVEX512 for 512 bit builtins | Haochen Jiang | 1 | -94/+94 | |
gcc/ChangeLog: * config/i386/i386-builtin.def (BDESC): Add OPTION_MASK_ISA2_EVEX512. | |||||
2023-10-09 | [PATCH 3/5] Add OPTION_MASK_ISA2_EVEX512 for 512 bit builtins | Haochen Jiang | 1 | -113/+113 | |
gcc/ChangeLog: * config/i386/i386-builtin.def (BDESC): Add OPTION_MASK_ISA2_EVEX512. | |||||
2023-10-09 | [PATCH 2/5] Add OPTION_MASK_ISA2_EVEX512 for 512 bit builtins | Haochen Jiang | 1 | -47/+47 | |
gcc/ChangeLog: * config/i386/i386-builtin.def (BDESC): Add OPTION_MASK_ISA2_EVEX512. | |||||
2023-10-09 | [PATCH 1/5] Add OPTION_MASK_ISA2_EVEX512 for 512 bit builtins | Haochen Jiang | 2 | -348/+372 | |
gcc/ChangeLog: * config/i386/i386-builtin.def (BDESC): Add OPTION_MASK_ISA2_EVEX512. * config/i386/i386-builtins.cc (ix86_init_mmx_sse_builtins): Ditto. | |||||
2023-10-09 | [PATCH 5/5] Push evex512 target for 512 bit intrins | Haochen Jiang | 1 | -2678/+2705 | |
gcc/ChangeLog: * config/i386/avx512fp16intrin.h: Add evex512 target for 512 bit intrins. Co-authored-by: Hu, Lin1 <lin1.hu@intel.com> | |||||
2023-10-09 | [PATCH 4/5] Push evex512 target for 512 bit intrins | Haochen Jiang | 18 | -221/+282 | |
gcc/ChangeLog: * config.gcc: Add avx512bitalgvlintrin.h. * config/i386/avx5124fmapsintrin.h: Add evex512 target for 512 bit intrins. * config/i386/avx5124vnniwintrin.h: Ditto. * config/i386/avx512bf16intrin.h: Ditto. * config/i386/avx512bitalgintrin.h: Add evex512 target for 512 bit intrins. Split 128/256 bit intrins to avx512bitalgvlintrin.h. * config/i386/avx512erintrin.h: Add evex512 target for 512 bit intrins. * config/i386/avx512ifmaintrin.h: Ditto. * config/i386/avx512pfintrin.h: Ditto. * config/i386/avx512vbmi2intrin.h: Ditto. * config/i386/avx512vbmiintrin.h: Ditto. * config/i386/avx512vnniintrin.h: Ditto. * config/i386/avx512vp2intersectintrin.h: Ditto. * config/i386/avx512vpopcntdqintrin.h: Ditto. * config/i386/gfniintrin.h: Ditto. * config/i386/immintrin.h: Add avx512bitalgvlintrin.h. * config/i386/vaesintrin.h: Add evex512 target for 512 bit intrins. * config/i386/vpclmulqdqintrin.h: Ditto. * config/i386/avx512bitalgvlintrin.h: New. | |||||
2023-10-09 | [PATCH 4/5] Push evex512 target for 512 bit intrins | Haochen Jiang | 1 | -138/+153 | |
gcc/ChangeLog: * config/i386/avx512bwintrin.h: Add evex512 target for 512 bit intrins. | |||||
2023-10-09 | [PATCH 2/5] Push evex512 target for 512 bit intrins | Haochen Jiang | 1 | -455/+467 | |
gcc/ChangeLog: * config/i386/avx512dqintrin.h: Add evex512 target for 512 bit intrins. | |||||
2023-10-09 | [PATCH 1/5] Push evex512 target for 512 bit intrins | Haochen Jiang | 1 | -3666/+3745 | |
gcc/ChangeLog: * config/i386/avx512fintrin.h: Add evex512 target for 512 bit intrins. | |||||
2023-10-09 | Initial support for -mevex512 | Haochen Jiang | 4 | -2/+43 | |
gcc/ChangeLog: * common/config/i386/i386-common.cc (OPTION_MASK_ISA2_EVEX512_SET): New. (OPTION_MASK_ISA2_EVEX512_UNSET): Ditto. (ix86_handle_option): Handle EVEX512. * config/i386/i386-c.cc (ix86_target_macros_internal): Handle EVEX512. Add __EVEX256__ when AVX512VL is set. * config/i386/i386-options.cc (isa2_opts): Handle EVEX512. (ix86_valid_target_attribute_inner_p): Ditto. (ix86_option_override_internal): Set EVEX512 target if it is not explicitly set when AVX512 is enabled. Disable AVX512{PF,ER,4VNNIW,4FMAPS} for -mno-evex512. * config/i386/i386.opt: Add mevex512. Temporarily RejectNegative. | |||||
2023-10-09 | TEST: Fix dump FAIL for RVV (RISC-V vector) | Juzhe-Zhong | 1 | -1/+2 | |
As https://godbolt.org/z/3K9oK7fx3 shows, ARM SVE reports FOLD_EXTRACT_LAST 2 times whereas RVV reports it 4 times. This is because RISC-V doesn't enable vec_pack_trunc, so the conversion and fold_extract_last fail during the first analysis and succeed only during the second. As a result, RVV shows "FOLD_EXTRACT_LAST" 4 times. gcc/testsuite/ChangeLog: * gcc.dg/vect/vect-cond-reduc-4.c: Add vect_pack_trunc variant. | |||||
2023-10-09 | rs6000: support 32bit inline lrint | Haochen Gui | 4 | -1/+65 | |
gcc/ PR target/88558 * config/rs6000/rs6000.md (lrint<mode>di2): Remove TARGET_FPRND from insn condition. (lrint<mode>si2): New insn pattern for 32bit lrint. gcc/testsuite/ PR target/106769 * gcc.target/powerpc/pr88558.h: New. * gcc.target/powerpc/pr88558-p7.c: New. * gcc.target/powerpc/pr88558-p8.c: New. | |||||
2023-10-09 | rs6000: enable SImode in FP register on P7 | Haochen Gui | 2 | -8/+9 | |
gcc/ PR target/88558 * config/rs6000/rs6000.cc (rs6000_hard_regno_mode_ok_uncached): Enable SImode on FP registers for P7. * config/rs6000/rs6000.md (*movsi_internal1): Add fmr for SImode move between FP registers. Set attribute isa of stfiwx to "*" and attribute of stxsiwx to "p7". | |||||
2023-10-09 | s390: Make use of new copysign RTL | Stefan Schulze Frielinghaus | 1 | -4/+2 | |
gcc/ChangeLog: * config/s390/s390.md: Make use of new copysign RTL. | |||||
2023-10-09 | [i386] APX EGPR: fix missing patterns that prohibit egpr | Hongyu Wang | 1 | -2/+2 | |
For some patterns, the m/Bm constraint in alternatives 0 and 1 could result in an EGPR being allocated for the memory operand under -mapxf. Use jm/ja instead. gcc/ChangeLog: * config/i386/sse.md (vec_concatv2di): Replace constraint "m" with "jm" for alternative 0 and 1 of operand 2. (sse4_1_<code><mode>3<mask_name>): Replace constraint "Bm" with "ja" for alternative 0 and 1 of operand 2. | |||||
2023-10-09 | Daily bump. | GCC Administrator | 8 | -1/+401 | |
2023-10-08 | libcpp: eliminate LINEMAPS_{ORDINARY,MACRO}_MAPS | David Malcolm | 2 | -18/+2 | |
libcpp/ChangeLog: * include/line-map.h (LINEMAPS_ORDINARY_MAPS): Delete. (LINEMAPS_MACRO_MAPS): Delete. * line-map.cc (linemap_tracks_macro_expansion_locs_p): Update for deletion of LINEMAPS_MACRO_MAPS. (linemap_get_statistics): Likewise. Signed-off-by: David Malcolm <dmalcolm@redhat.com> | |||||
2023-10-08 | libcpp: eliminate LINEMAPS_{,ORDINARY_,MACRO_}CACHE | David Malcolm | 3 | -37/+13 | |
It's simpler to use field access than to go through these inline functions that look as if they are macros. No functional change intended. libcpp/ChangeLog: * include/line-map.h (maps_info_ordinary::cache): Rename to... (maps_info_ordinary::m_cache): ...this. (maps_info_macro::cache): Rename to... (maps_info_macro::m_cache): ...this. (LINEMAPS_CACHE): Delete. (LINEMAPS_ORDINARY_CACHE): Delete. (LINEMAPS_MACRO_CACHE): Delete. * init.cc (read_original_filename): Update for adding "m_" prefix. * line-map.cc (linemap_add): Eliminate LINEMAPS_ORDINARY_CACHE in favor of a simple field access. (linemap_enter_macro): Likewise for LINEMAPS_MACRO_CACHE. (linemap_ordinary_map_lookup): Likewise for LINEMAPS_ORDINARY_CACHE, twice. (linemap_lookup_macro_index): Likewise for LINEMAPS_MACRO_CACHE. Signed-off-by: David Malcolm <dmalcolm@redhat.com> | |||||
2023-10-08 | libcpp: eliminate LINEMAPS_LAST_ALLOCATED{,_ORDINARY,_MACRO}_MAP | David Malcolm | 1 | -25/+0 | |
Nothing uses these; delete them. libcpp/ChangeLog: * include/line-map.h (LINEMAPS_LAST_ALLOCATED_MAP): Delete. (LINEMAPS_LAST_ALLOCATED_ORDINARY_MAP): Delete. (LINEMAPS_LAST_ALLOCATED_MACRO_MAP): Delete. Signed-off-by: David Malcolm <dmalcolm@redhat.com> | |||||
2023-10-08 | analyzer: improvements to out-of-bounds diagrams [PR111155] | David Malcolm | 10 | -181/+644 | |
Update out-of-bounds diagrams to show existing string values, and the initial write index within a string buffer. For example, given the out-of-bounds write in strcat in: void test (void) { char buf[10]; strcpy (buf, "hello"); strcat (buf, " world!"); } the diagram improves from: ┌─────┬─────┬────┬────┬────┐┌─────┬─────┬─────┐ │ [0] │ [1] │[2] │[3] │[4] ││ [5] │ [6] │ [7] │ ├─────┼─────┼────┼────┼────┤├─────┼─────┼─────┤ │ ' ' │ 'w' │'o' │'r' │'l' ││ 'd' │ '!' │ NUL │ ├─────┴─────┴────┴────┴────┴┴─────┴─────┴─────┤ │ string literal (type: 'char[8]') │ └─────────────────────────────────────────────┘ │ │ │ │ │ │ │ │ │ │ │ │ │ │ │ │ v v v v v v v v ┌─────┬────────────────────────────────────────┬────┐┌─────────────────┐ │ [0] │ ... │[9] ││ │ ├─────┴────────────────────────────────────────┴────┤│after valid range│ │ 'buf' (type: 'char[10]') ││ │ └───────────────────────────────────────────────────┘└─────────────────┘ ├─────────────────────────┬─────────────────────────┤├────────┬────────┤ │ │ ╭─────────┴────────╮ ╭─────────┴─────────╮ │capacity: 10 bytes│ │overflow of 3 bytes│ ╰──────────────────╯ ╰───────────────────╯ to: ┌────┬────┬────┬────┬────┐┌─────┬─────┬─────┐ │[0] │[1] │[2] │[3] │[4] ││ [5] │ [6] │ [7] │ ├────┼────┼────┼────┼────┤├─────┼─────┼─────┤ │' ' │'w' │'o' │'r' │'l' ││ 'd' │ '!' │ NUL │ ├────┴────┴────┴────┴────┴┴─────┴─────┴─────┤ │ string literal (type: 'char[8]') │ └───────────────────────────────────────────┘ │ │ │ │ │ │ │ │ │ │ │ │ │ │ │ │ v v v v v v v v ┌─────┬────────────────────┬────┬──────────────┬────┐┌─────────────────┐ │ [0] │ ... │[5] │ ... 
│[9] ││ │ ├─────┼────┬────┬────┬────┬┼────┼──────────────┴────┘│ │ │ 'h' │'e' │'l' │'l' │'o' ││NUL │ │after valid range│ ├─────┴────┴────┴────┴────┴┴────┴───────────────────┐│ │ │ 'buf' (type: 'char[10]') ││ │ └───────────────────────────────────────────────────┘└─────────────────┘ ├─────────────────────────┬─────────────────────────┤├────────┬────────┤ │ │ ╭─────────┴────────╮ ╭─────────┴─────────╮ │capacity: 10 bytes│ │overflow of 3 bytes│ ╰──────────────────╯ ╰───────────────────╯ gcc/analyzer/ChangeLog: PR analyzer/111155 * access-diagram.cc (boundaries::boundaries): Add logger param (boundaries::add): Add logging. (boundaries::get_hard_boundaries_in_range): New. (boundaries::m_logger): New field. (boundaries::get_table_x_for_offset): Make public. (class svalue_spatial_item): New. (class compound_svalue_spatial_item): New. (add_ellipsis_to_gaps): New. (valid_region_spatial_item::valid_region_spatial_item): Add theme param. Initialize m_boundaries, m_existing_sval, and m_existing_sval_spatial_item. (valid_region_spatial_item::add_boundaries): Set m_boundaries. Add boundaries for any m_existing_sval_spatial_item. (valid_region_spatial_item::add_array_elements_to_table): Rewrite creation of min/max index in terms of maybe_add_array_index_to_table. Rewrite ellipsis code using add_ellipsis_to_gaps. Add index values for any hard boundaries within the valid region. (valid_region_spatial_item::maybe_add_array_index_to_table): New, based on code formerly in add_array_elements_to_table. (valid_region_spatial_item::make_table): Make use of m_existing_sval_spatial_item, if any. (valid_region_spatial_item::m_boundaries): New field. (valid_region_spatial_item::m_existing_sval): New field. (valid_region_spatial_item::m_existing_sval_spatial_item): New field. (class svalue_spatial_item): Rename to... (class written_svalue_spatial_item): ...this. (class string_region_spatial_item): Rename to.. (class string_literal_spatial_item): ...this. Add "kind". 
(string_literal_spatial_item::add_boundaries): Use m_kind to determine kind of boundary. Update for renaming of m_actual_bits to m_bits. (string_literal_spatial_item::make_table): Likewise. Support not displaying a row for byte indexes, and not displaying a row for the type. (string_literal_spatial_item::add_column_for_byte): Make byte index row optional. (svalue_spatial_item::make): Convert to... (make_written_svalue_spatial_item): ...this. (make_existing_svalue_spatial_item): New. (access_diagram_impl::access_diagram_impl): Pass theme to m_valid_region_spatial_item ctor. Update for renaming of m_svalue_spatial_item. (access_diagram_impl::find_boundaries): Pass logger to boundaries. Update for renaming of... (access_diagram_impl::m_svalue_spatial_item): Rename to... (access_diagram_impl::m_written_svalue_spatial_item): ...this. gcc/testsuite/ChangeLog: PR analyzer/111155 * c-c++-common/analyzer/out-of-bounds-diagram-strcat-2.c: New test. * c-c++-common/analyzer/out-of-bounds-diagram-strcat.c: New test. * gcc.dg/analyzer/out-of-bounds-diagram-17.c: Update expected result to show the existing content of "buf" and the index at which the write starts. * gcc.dg/analyzer/out-of-bounds-diagram-18.c: Likewise. * gcc.dg/analyzer/out-of-bounds-diagram-19.c: Likewise. * gcc.dg/analyzer/out-of-bounds-diagram-6.c: Update expected output. gcc/ChangeLog: PR analyzer/111155 * text-art/table.cc (table::maybe_set_cell_span): New. (table::add_other_table): New. * text-art/table.h (class table::cell_placement): Add class table as a friend. (table::add_rows): New. (table::add_row): Reimplement in terms of add_rows. (table::maybe_set_cell_span): New decl. (table::add_other_table): New decl. * text-art/types.h (operator+): New operator for rect + coord. Signed-off-by: David Malcolm <dmalcolm@redhat.com> | |||||
2023-10-08 | libcpp: eliminate COMBINE_LOCATION_DATA | David Malcolm | 8 | -119/+118 | |
This patch eliminates the function "COMBINE_LOCATION_DATA" (which hasn't been a macro since r6-739-g0501dbd932a7e9) and the function "get_combined_adhoc_loc" in favor of a new line_maps::get_or_create_combined_loc member function. No functional change intended. gcc/cp/ChangeLog: * module.cc (module_state::read_location): Update for renaming of get_combined_adhoc_loc. gcc/ChangeLog: * genmatch.cc (main): Update for "m_" prefix of some fields of line_maps. * input.cc (make_location): Update for removal of COMBINE_LOCATION_DATA. (dump_line_table_statistics): Update for "m_" prefix of some fields of line_maps. (location_with_discriminator): Update for removal of COMBINE_LOCATION_DATA. (line_table_test::line_table_test): Update for "m_" prefix of some fields of line_maps. * toplev.cc (general_init): Likewise. * tree.cc (set_block): Update for removal of COMBINE_LOCATION_DATA. (set_source_range): Likewise. libcpp/ChangeLog: * include/line-map.h (line_maps::reallocator): Rename to... (line_maps::m_reallocator): ...this. (line_maps::round_alloc_size): Rename to... (line_maps::m_round_alloc_size): ...this. (line_maps::location_adhoc_data_map): Rename to... (line_maps::m_location_adhoc_data_map): ...this. (line_maps::num_optimized_ranges): Rename to... (line_maps::m_num_optimized_ranges): ..this. (line_maps::num_unoptimized_ranges): Rename to... (line_maps::m_num_unoptimized_ranges): ...this. (get_combined_adhoc_loc): Delete decl. (COMBINE_LOCATION_DATA): Delete. * lex.cc (get_location_for_byte_range_in_cur_line): Update for removal of COMBINE_LOCATION_DATA. (warn_about_normalization): Likewise. (_cpp_lex_direct): Likewise. * line-map.cc (line_maps::~line_maps): Update for "m_" prefix of some fields of line_maps. (rebuild_location_adhoc_htab): Likewise. (can_be_stored_compactly_p): Convert to... (line_maps::can_be_stored_compactly_p): ...this private member function. (get_combined_adhoc_loc): Convert to... 
(line_maps::get_or_create_combined_loc): ...this public member function. (line_maps::make_location): Update for removal of COMBINE_LOCATION_DATA. (get_data_from_adhoc_loc): Update for "m_" prefix of some fields of line_maps. (get_discriminator_from_adhoc_loc): Likewise. (get_location_from_adhoc_loc): Likewise. (get_range_from_adhoc_loc): Convert to... (line_maps::get_range_from_adhoc_loc): ...this private member function. (line_maps::get_range_from_loc): Update for conversion of get_range_from_adhoc_loc to a member function. (linemap_init): Update for "m_" prefix of some fields of line_maps. (line_map_new_raw): Likewise. (linemap_enter_macro): Likewise. (linemap_get_statistics): Likewise. Signed-off-by: David Malcolm <dmalcolm@redhat.com> | |||||
2023-10-08 | libcpp: "const" and other cleanups | David Malcolm | 4 | -80/+143 | |
No functional change intended. gcc/ChangeLog: * input.cc (make_location): Move implementation to line_maps::make_location. libcpp/ChangeLog: * include/line-map.h (line_maps::pure_location_p): New decl. (line_maps::get_pure_location): New decl. (line_maps::get_range_from_loc): New decl. (line_maps::get_start): New. (line_maps::get_finish): New. (line_maps::make_location): New decl. (get_range_from_loc): Make line_maps param const. (get_discriminator_from_loc): Likewise. (pure_location_p): Likewise. (get_pure_location): Likewise. (linemap_check_files_exited): Likewise. (linemap_tracks_macro_expansion_locs_p): Likewise. (linemap_location_in_system_header_p): Likewise. (linemap_location_from_macro_definition_p): Likewise. (linemap_macro_map_loc_unwind_toward_spelling): Likewise. (linemap_included_from_linemap): Likewise. (first_map_in_common): Likewise. (linemap_compare_locations): Likewise. (linemap_location_before_p): Likewise. (linemap_resolve_location): Likewise. (linemap_unwind_toward_expansion): Likewise. (linemap_unwind_to_first_non_reserved_loc): Likewise. (linemap_expand_location): Likewise. (linemap_get_file_highest_location): Likewise. (linemap_get_statistics): Likewise. (linemap_dump_location): Likewise. (linemap_dump): Likewise. (line_table_dump): Likewise. * internal.h (linemap_get_expansion_line): Likewise. (linemap_get_expansion_filename): Likewise. * line-map.cc (can_be_stored_compactly_p): Likewise. (get_data_from_adhoc_loc): Drop redundant "class". (get_discriminator_from_adhoc_loc): Likewise. (get_location_from_adhoc_loc): Likewise. (get_range_from_adhoc_loc): Likewise. (get_range_from_loc): Make const and move implementation to... (line_maps::get_range_from_loc): ...this new function. (get_discriminator_from_loc): Make line_maps param const. (pure_location_p): Make const and move implementation to... (line_maps::pure_location_p): ...this new function. (get_pure_location): Make const and move implementation to... 
(line_maps::get_pure_location): ...this new function. (linemap_included_from_linemap): Make line_maps param const. (linemap_check_files_exited): Likewise. (linemap_tracks_macro_expansion_locs_p): Likewise. (linemap_macro_map_loc_unwind_toward_spelling): Likewise. (linemap_get_expansion_line): Likewise. (linemap_get_expansion_filename): Likewise. (linemap_location_in_system_header_p): Likewise. (first_map_in_common_1): Likewise. (linemap_compare_locations): Likewise. (linemap_macro_loc_to_spelling_point): Likewise. (linemap_macro_loc_to_def_point): Likewise. (linemap_macro_loc_to_exp_point): Likewise. (linemap_resolve_location): Likewise. (linemap_location_from_macro_definition_p): Likewise. (linemap_unwind_toward_expansion): Likewise. (linemap_unwind_to_first_non_reserved_loc): Likewise. (linemap_expand_location): Likewise. (linemap_dump): Likewise. (linemap_dump_location): Likewise. (linemap_get_file_highest_location): Likewise. (linemap_get_statistics): Likewise. (line_table_dump): Likewise. Signed-off-by: David Malcolm <dmalcolm@redhat.com> | |||||
2023-10-08 | diagnostics: fix ICE on sarif output when source file is unreadable [PR111700] | David Malcolm | 2 | -2/+22 | |
gcc/ChangeLog: PR driver/111700 * input.cc (file_cache::add_file): Update leading comment to clarify that it can fail. (file_cache::lookup_or_add_file): Likewise. (file_cache::get_source_file_content): Gracefully handle lookup_or_add_file failing. gcc/testsuite/ChangeLog: PR driver/111700 * c-c++-common/diagnostic-format-sarif-file-pr111700.c: New test. Signed-off-by: David Malcolm <dmalcolm@redhat.com> | |||||
2023-10-08 | Support signbit/xorsign/copysign/abs/neg/and/xor/ior/andn for V2HF/V4HF. | liuhongt | 5 | -1/+328 | |
gcc/ChangeLog: * config/i386/i386.cc (ix86_build_const_vector): Handle V2HF and V4HFmode. (ix86_build_signbit_mask): Ditto. * config/i386/mmx.md (mmxintvecmode): Ditto. (<code><mode>2): New define_expand. (*mmx_<code><mode>): New define_insn_and_split. (*mmx_nabs<mode>2): Ditto. (*mmx_andnot<mode>3): New define_insn. (<code><mode>3): Ditto. (copysign<mode>3): New define_expand. (xorsign<mode>3): Ditto. (signbit<mode>2): Ditto. gcc/testsuite/ChangeLog: * gcc.target/i386/part-vect-absneghf.c: New test. * gcc.target/i386/part-vect-copysignhf.c: New test. * gcc.target/i386/part-vect-xorsignhf.c: New test. |