riscv-gnu-toolchain/gcc.git

Age	Commit message	Author	Files	Lines
2023-06-20	ada: Fix edge case in Ada.Calendar.Formatting.Time_Of	Ronan Desplanques	1	-28/+3
	Before this patch, Ada.Calendar.Formatting.Time_Of executed extra code when passed a number of seconds equal to the number of seconds in a day. This caused the result to be off, perhaps because a statement resetting the number of seconds to zero was missing. Instead of adding such a statement, this patch removes the special handling of the problematic case, which gives the intended result. gcc/ada/ * libgnat/a-calfor.adb (Time_Of): Fix handling of special case.
2023-06-20	x86: correct and improve "*vec_dupv2di"	Jan Beulich	2	-6/+35
	The input constraint for the %vmovddup alternative was wrong, as the upper 16 XMM registers require AVX512VL to be used with this insn. To compensate, introduce a new alternative permitting all 32 registers, by broadcasting to the full 512 bits in that case if AVX512VL is not available. gcc/ * config/i386/sse.md (vec_dupv2di): Correct %vmovddup input constraint. Add new AVX512F alternative. gcc/testsuite/ * gcc.target/i386/avx512f-dupv2di.c: New test.
2023-06-20	debug/110295 - mixed up early/late debug for member DIEs	Richard Biener	2	-1/+21
	When we process a scope typedef during early debug creation, the decl is TYPE_DECL_IS_STUB, and we have already created a DIE for the type that is still in limbo, we end up just re-parenting that type DIE instead of properly creating a DIE for the decl (which would eventually pick up the now completed type and create DIEs for the members). Instead, this is currently deferred to the second time we come here, when we annotate the DIEs with locations late; by then the type DIE is no longer in limbo and we fall through to doing the job for the decl. The following makes sure we perform the necessary early tasks by continuing with the decl DIE creation after setting a parent for the limbo type DIE. PR debug/110295 * dwarf2out.cc (process_scope_var): Continue processing the decl after setting a parent in case the existing DIE was in limbo. * g++.dg/debug/pr110295.C: New testcase.
2023-06-20	RISC-V: Fix fails of testcases	Juzhe-Zhong	4	-4/+4
	FAIL: gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-run.c -std=c99 -O3 -ftree-vectorize --param riscv-autovec-preference=fixed-vlmax (test for excess errors) Excess errors: xgcc: fatal error: Cannot find suitable multilib set for '-march=rv64imafdcv_zicsr_zifencei_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b'/'-mabi=lp64d' compilation terminated. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-run.c: Fix fail. * gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-zvfh-run.c: Ditto. * gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-run.c: Ditto. * gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-zvfh-run.c: Ditto.
2023-06-20	RISC-V: Add tuple vector mode psABI checking and simplify code	Lehua Ding	41	-74/+104
	This patch does several things: 1. Adds the missing check for tuple vector modes. 2. Extends the scope of the check to all vector types; previously it covered only scalable vector types. 3. Simplifies the logic that determines whether a type will be lowered to a vector mode. gcc/ChangeLog: * config/riscv/riscv.cc (riscv_scalable_vector_type_p): Delete. (riscv_arg_has_vector): Simplify. (riscv_pass_in_vector_p): Adjust warning message. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/fixed-vlmax-1.c: Add -Wno-psabi option. * gcc.target/riscv/rvv/autovec/vls-vlmax/merge-1.c: Ditto. * gcc.target/riscv/rvv/autovec/vls-vlmax/merge-2.c: Ditto. * gcc.target/riscv/rvv/autovec/vls-vlmax/merge-3.c: Ditto. * gcc.target/riscv/rvv/autovec/vls-vlmax/merge-4.c: Ditto. * gcc.target/riscv/rvv/autovec/vls-vlmax/merge-5.c: Ditto. * gcc.target/riscv/rvv/autovec/vls-vlmax/merge-6.c: Ditto. * gcc.target/riscv/rvv/autovec/vls-vlmax/merge-7.c: Ditto. * gcc.target/riscv/rvv/autovec/vls-vlmax/merge_run-1.c: Ditto. * gcc.target/riscv/rvv/autovec/vls-vlmax/merge_run-2.c: Ditto. * gcc.target/riscv/rvv/autovec/vls-vlmax/merge_run-3.c: Ditto. * gcc.target/riscv/rvv/autovec/vls-vlmax/merge_run-4.c: Ditto. * gcc.target/riscv/rvv/autovec/vls-vlmax/merge_run-5.c: Ditto. * gcc.target/riscv/rvv/autovec/vls-vlmax/merge_run-6.c: Ditto. * gcc.target/riscv/rvv/autovec/vls-vlmax/merge_run-7.c: Ditto. * gcc.target/riscv/rvv/autovec/vls-vlmax/perm-1.c: Ditto. * gcc.target/riscv/rvv/autovec/vls-vlmax/perm-2.c: Ditto. * gcc.target/riscv/rvv/autovec/vls-vlmax/perm-3.c: Ditto. * gcc.target/riscv/rvv/autovec/vls-vlmax/perm-4.c: Ditto. * gcc.target/riscv/rvv/autovec/vls-vlmax/perm-5.c: Ditto. * gcc.target/riscv/rvv/autovec/vls-vlmax/perm-6.c: Ditto. * gcc.target/riscv/rvv/autovec/vls-vlmax/perm-7.c: Ditto. * gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-1.c: Ditto. * gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-2.c: Ditto. 
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-3.c: Ditto. * gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-4.c: Ditto. * gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-5.c: Ditto. * gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-6.c: Ditto. * gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-7.c: Ditto. * gcc.target/riscv/rvv/base/pr110119-1.c: Ditto. * gcc.target/riscv/rvv/base/pr110119-2.c: Ditto. * gcc.target/riscv/vector-abi-1.c: Ditto. * gcc.target/riscv/vector-abi-2.c: Ditto. * gcc.target/riscv/vector-abi-3.c: Ditto. * gcc.target/riscv/vector-abi-4.c: Ditto. * gcc.target/riscv/vector-abi-5.c: Ditto. * gcc.target/riscv/vector-abi-6.c: Ditto. * gcc.target/riscv/vector-abi-7.c: New test. * gcc.target/riscv/vector-abi-8.c: New test. * gcc.target/riscv/vector-abi-9.c: New test.
2023-06-20	Daily bump.	GCC Administrator	7	-1/+751

2023-06-19	libcpp: reject codepoints above 0x10FFFF	Ben Boeckel	1	-0/+7
	Unicode does not support such values because they are unrepresentable in UTF-16. libcpp/ * charset.cc: Reject encodings of codepoints above 0x10FFFF. UTF-16 does not support such codepoints, and therefore Unicode as a whole rejects such values. Signed-off-by: Ben Boeckel <ben.boeckel@kitware.com>
2023-06-19	RISC-V: Save and restore FCSR in interrupt functions to avoid program errors.	Jin Ma	5	-3/+102
	To prevent interrupt functions from changing the FCSR as seen by the interrupted code, it must be saved at the beginning of the function and restored at the end. gcc/ChangeLog: * config/riscv/riscv.cc (riscv_compute_frame_info): Allocate frame for FCSR. (riscv_for_each_saved_reg): Save and restore FCSR in interrupt functions. * config/riscv/riscv.md (riscv_frcsr): New patterns. (riscv_fscsr): Likewise. gcc/testsuite/ChangeLog: * gcc.target/riscv/interrupt-fcsr-1.c: New test. * gcc.target/riscv/interrupt-fcsr-2.c: New test. * gcc.target/riscv/interrupt-fcsr-3.c: New test.
2023-06-19	Do not allow "x + 0.0" to "x" optimization with -fsignaling-nans	Toru Kisuki	1	-1/+2
	gcc/ PR rtl-optimization/110305 * simplify-rtx.cc (simplify_context::simplify_binary_operation_1): Handle HONOR_SNANS for x + 0.0.
2023-06-19	optimize std::max early	Jan Hubicka	4	-2/+19
	We currently produce very bad code for loops that use std::vector as a stack, since we fail to inline push_back, which in turn prevents SRA, and we fail to optimize out some store-to-load pairs. I looked into why this function is not inlined, given that clang does inline it. We currently estimate it at 66 instructions, while the inline limits are 15 at -O2 and 30 at -O3. Clang has a similar estimate but still decides to inline at -O2. I looked into why the body is so large, and one problem I spotted is the way std::max is implemented, by taking and returning references to the values: const T& max( const T& a, const T& b ); This makes it necessary to store the values to memory and load them later, and max is used by the code computing the new size of the vector on resize. We optimize this to MAX_EXPR, but only during late optimizations. I think this is a common enough coding pattern that we ought to make it transparent to early opts and IPA. The following is the easiest fix: it simply adds a phiprop pass to the early optimizations, which turns the PHI of address values into a PHI of values, so that later FRE can propagate values across memory, phiopt can discover the MAX_EXPR pattern and DSE can remove the memory stores. gcc/ChangeLog: PR tree-optimization/109811 PR tree-optimization/109849 * passes.def: Add phiprop to early optimization passes. * tree-ssa-phiprop.cc: Allow cloning. gcc/testsuite/ChangeLog: PR tree-optimization/109811 PR tree-optimization/109849 * gcc.dg/tree-ssa/phiprop-1.c: New test. * gcc.dg/tree-ssa/pr21463.c: Adjust template.
2023-06-19	AArch64: convert some patterns to compact MD syntax	Tamar Christina	1	-83/+78
	Hi All, This converts some patterns in the AArch64 backend to use the new compact syntax. gcc/ChangeLog: * config/aarch64/aarch64.md (arches): Add nosimd. (mov<mode>_aarch64, movsi_aarch64, *movdi_aarch64): Rewrite to compact syntax.
2023-06-19	New compact syntax for insn and insn_split in Machine Descriptions.	Tamar Christina	4	-3/+709
	This patch adds support for a compact syntax for specifying constraints in instruction patterns. Credit for the idea goes to Richard Earnshaw. With this new syntax we want a clean break from the current limitations, to make something that is hopefully easier to use and maintain. The motivation for this compact syntax is that it is often quite hard to correlate the entries in the constraints list, the attributes and the instruction list: one has to count, which is tedious. Additionally, changing a single alternative in the insn changes multiple lines in a diff, making it harder to see what is going on. This new syntax takes into account many of the common things that are done in MD files. It's also worth saying that this version is intended to deal with the common case of string-based alternatives. For C chunks we have some ideas, but those are not addressed here. It's easiest to explain with an example.
	Normal syntax:
	(define_insn_and_split "movsi_aarch64"
	  [(set (match_operand:SI 0 "nonimmediate_operand" "=r,k,r,r,r,r, r,w, m, m, r, r, r, w,r,w, w")
	        (match_operand:SI 1 "aarch64_mov_operand" " r,r,k,M,n,Usv,m,m,rZ,w,Usw,Usa,Ush,rZ,w,w,Ds"))]
	  "(register_operand (operands[0], SImode)
	    || aarch64_reg_or_zero (operands[1], SImode))"
	  "@
	   mov\\t%w0, %w1
	   mov\\t%w0, %w1
	   mov\\t%w0, %w1
	   mov\\t%w0, %1
	   #
	   * return aarch64_output_sve_cnt_immediate (\"cnt\", \"%x0\", operands[1]);
	   ldr\\t%w0, %1
	   ldr\\t%s0, %1
	   str\\t%w1, %0
	   str\\t%s1, %0
	   adrp\\t%x0, %A1\;ldr\\t%w0, [%x0, %L1]
	   adr\\t%x0, %c1
	   adrp\\t%x0, %A1
	   fmov\\t%s0, %w1
	   fmov\\t%w0, %s1
	   fmov\\t%s0, %s1
	   * return aarch64_output_scalar_simd_mov_immediate (operands[1], SImode);"
	  "CONST_INT_P (operands[1]) && !aarch64_move_imm (INTVAL (operands[1]), SImode)
	   && REG_P (operands[0]) && GP_REGNUM_P (REGNO (operands[0]))"
	  [(const_int 0)]
	  "{
	     aarch64_expand_mov_immediate (operands[0], operands[1]);
	     DONE;
	   }"
	  ;; The "mov_imm" type for CNT is just a placeholder.
	  [(set_attr "type" "mov_reg,mov_reg,mov_reg,mov_imm,mov_imm,mov_imm,load_4, load_4,store_4,store_4,load_4,adr,adr,f_mcr,f_mrc,fmov,neon_move")
	   (set_attr "arch" ",,,,,sve,,fp,,fp,,,,fp,fp,fp,simd")
	   (set_attr "length" "4,4,4,4,, 4,4, 4,4, 4,8,4,4, 4, 4, 4, 4")
	  ]
	)
	New syntax:
	(define_insn_and_split "movsi_aarch64"
	  [(set (match_operand:SI 0 "nonimmediate_operand")
	        (match_operand:SI 1 "aarch64_mov_operand"))]
	  "(register_operand (operands[0], SImode)
	    || aarch64_reg_or_zero (operands[1], SImode))"
	  {@ [cons: =0, 1; attrs: type, arch, length]
	     [r , r  ; mov_reg  , *   , 4] mov\t%w0, %w1
	     [k , r  ; mov_reg  , *   , 4] ^
	     [r , k  ; mov_reg  , *   , 4] ^
	     [r , M  ; mov_imm  , *   , 4] mov\t%w0, %1
	     [r , n  ; mov_imm  , *   ,16] #
	     /* The "mov_imm" type for CNT is just a placeholder.  */
	     [r , Usv; mov_imm  , sve , 4] << aarch64_output_sve_cnt_immediate ("cnt", "%x0", operands[1]);
	     [r , m  ; load_4   , *   , 4] ldr\t%w0, %1
	     [w , m  ; load_4   , fp  , 4] ldr\t%s0, %1
	     [m , rZ ; store_4  , *   , 4] str\t%w1, %0
	     [m , w  ; store_4  , fp  , 4] str\t%s1, %0
	     [r , Usw; load_4   , *   , 8] adrp\t%x0, %A1\;ldr\t%w0, [%x0, %L1]
	     [r , Usa; adr      , *   , 4] adr\t%x0, %c1
	     [r , Ush; adr      , *   , 4] adrp\t%x0, %A1
	     [w , rZ ; f_mcr    , fp  , 4] fmov\t%s0, %w1
	     [r , w  ; f_mrc    , fp  , 4] fmov\t%w0, %s1
	     [w , w  ; fmov     , fp  , 4] fmov\t%s0, %s1
	     [w , Ds ; neon_move, simd, 4] << aarch64_output_scalar_simd_mov_immediate (operands[1], SImode);
	  }
	  "CONST_INT_P (operands[1]) && !aarch64_move_imm (INTVAL (operands[1]), SImode)
	   && REG_P (operands[0]) && GP_REGNUM_P (REGNO (operands[0]))"
	  [(const_int 0)]
	  {
	    aarch64_expand_mov_immediate (operands[0], operands[1]);
	    DONE;
	  }
	)
	The main syntax rules are as follows (see the docs for the full rules):
	- The template must start with "{@" and end with "}" to use the new syntax.
	- "{@" is followed by a layout in square brackets, which is "cons:" followed by a list of match_operand/match_scratch IDs, then a semicolon, then the same for attributes ("attrs:"). Both sections are optional (so you can use only cons, or only attrs, or both), and cons must come before attrs if present.
	- Each alternative begins with any amount of whitespace.
	- Following the whitespace is a comma-separated list of constraints and/or attributes within brackets [], with sections separated by a semicolon.
	- Following the closing ']' is any amount of whitespace, and then the actual asm output.
	- Spaces are allowed in the list (they will simply be removed).
	- All alternatives should be specified: a blank list should be "[,,]", "[,,;,]" etc., not "[]" or "" (however, I have found that genattr may segfault if you leave certain attributes empty).
	- The actual constraint string in the match_operand or match_scratch, and the attribute string in the set_attr, must be blank or an empty string (you can't combine the old and new syntaxes).
	- The common idiom "* return" can be shortened by using "<<".
	- Any unexpanded iterators left during processing will result in an error at compile time. If for some reason <> is needed in the output, it must be escaped using \.
	- Within an {@ block, both multiline and single-line C comments are allowed, but when used outside of a C block they must be the only non-whitespace blocks on the line.
	- Inside an {@ block, any unexpanded iterators will result in a compile-time fault instead of incorrect assembly being generated at runtime. If the literal <> is needed in the output, it needs to be escaped with \<\>.
	- This check is not performed inside C blocks (lines starting with *).
	- Instead of copying the previous instruction again in the next pattern, one can use ^ to refer to the previous asm string.
	This patch works by blindly transforming the new syntax into the old syntax, so it doesn't do extensive checking. However, it does verify that:
	- The correct number of constraints/attributes are specified.
	- You haven't mixed old and new syntax.
	- The specified operand IDs/attribute names actually exist.
	- You don't have duplicate cons.
	If something goes wrong, it may write invalid constraints/attributes/template back into the rtx. But this shouldn't matter, because error_at will cause the program to fail on exit anyway. Because this transformation occurs as early as possible (before patterns are queued), the rest of the compiler can completely ignore the new syntax and assume that the old syntax will always be used. This doesn't seem to have any measurable effect on the runtime of the gen programs. gcc/ChangeLog: * gensupport.cc (class conlist, add_constraints, add_attributes, skip_spaces, expect_char, preprocess_compact_syntax, parse_section_layout, parse_section, convert_syntax): New. (process_rtx): Check for conversion. * genoutput.cc (process_template): Check for unresolved iterators. (class data): Add compact_syntax_p. (gen_insn): Use it. * gensupport.h (compact_syntax): New. (hash-set.h): Include. * doc/md.texi: Document it. Co-Authored-By: Omar Tahir <Omar.Tahir2@arm.com>
2023-06-19	recog: Change return type of predicate functions from int to bool	Uros Bizjak	4	-89/+91
	Also change some internal variables to bool and change the return type of split_all_insns_noflow to void. gcc/ChangeLog: * recog.h (check_asm_operands): Change return type from int to bool. (insn_invalid_p): Ditto. (verify_changes): Ditto. (apply_change_group): Ditto. (constrain_operands): Ditto. (constrain_operands_cached): Ditto. (validate_replace_rtx_subexp): Ditto. (validate_replace_rtx): Ditto. (validate_replace_rtx_part): Ditto. (validate_replace_rtx_part_nosimplify): Ditto. (added_clobbers_hard_reg_p): Ditto. (peep2_regno_dead_p): Ditto. (peep2_reg_dead_p): Ditto. (store_data_bypass_p): Ditto. (if_test_bypass_p): Ditto. * rtl.h (split_all_insns_noflow): Change return type from unsigned int to void. * genemit.cc (output_added_clobbers_hard_reg_p): Change return type of generated added_clobbers_hard_reg_p from int to bool and adjust function body accordingly. Change "used" variable type from int to bool. * recog.cc (check_asm_operands): Change return type from int to bool and adjust function body accordingly. (insn_invalid_p): Ditto. Change "is_asm" variable to bool. (verify_changes): Change return type from int to bool. (apply_change_group): Change return type from int to bool and adjust function body accordingly. (validate_replace_rtx_subexp): Change return type from int to bool. (validate_replace_rtx): Ditto. (validate_replace_rtx_part): Ditto. (validate_replace_rtx_part_nosimplify): Ditto. (constrain_operands_cached): Ditto. (constrain_operands): Ditto. Change "lose" and "win" variables type from int to bool. (split_all_insns_noflow): Change return type from unsigned int to void and adjust function body accordingly. (peep2_regno_dead_p): Change return type from int to bool. (peep2_reg_dead_p): Ditto. (peep2_find_free_register): Change "success" variable type from int to bool. (store_data_bypass_p_1): Change return type from int to bool. (store_data_bypass_p): Ditto.
2023-06-19	RISC-V: Fix VWEXTF iterator requirement	Li Xu	1	-6/+6
	gcc/ChangeLog: * config/riscv/vector-iterators.md: zvfh/zvfhmin depends on the Zve32f extension.
2023-06-19	RISC-V: Bugfix for RVV widening reduction in ZVE32/64	Pan Li	11	-199/+253
	The RVV widening reduction has 3 different patterns for zve128+, zve64 and zve32. They take the same iterator with different attributes. However, we need the generated function code_for_reduc (code, mode1, mode2). The implementation of code_for_reduc may look like the following:
	code_for_reduc (code, mode1, mode2)
	{
	  if (code == max && mode1 == VNx1HF && mode2 == VNx1HF)
	    return CODE_FOR_pred_reduc_maxvnx1hfvnx16hf; // ZVE128+
	  if (code == max && mode1 == VNx1HF && mode2 == VNx1HF)
	    return CODE_FOR_pred_reduc_maxvnx1hfvnx8hf; // ZVE64
	  if (code == max && mode1 == VNx1HF && mode2 == VNx1HF)
	    return CODE_FOR_pred_reduc_maxvnx1hfvnx4hf; // ZVE32
	}
	Thus there is a problem here. For example, for zve32 we will have code_for_reduc (max, VNx1HF, VNx1HF), which returns the ZVE128+ code instead of the ZVE32 one. This patch merges the 3 patterns into one pattern and passes both the input vector mode and the return vector mode to code_for_reduc. For example, ZVE32 will use code_for_reduc (max, VNx1HF, VNx2HF), and the correct ZVE32 code will be returned as expected. Please note that both GCC 13 and 14 are affected by this issue. Signed-off-by: Pan Li <pan2.li@intel.com> Co-Authored by: Juzhe-Zhong <juzhe.zhong@rivai.ai> gcc/ChangeLog: PR target/110299 * config/riscv/riscv-vector-builtins-bases.cc: Adjust expand for modes. * config/riscv/vector-iterators.md: Remove VWLMUL1, VWLMUL1_ZVE64, VWLMUL1_ZVE32, VI_ZVE64, VI_ZVE32, VWI, VWI_ZVE64, VWI_ZVE32, VF_ZVE63 and VF_ZVE32. * config/riscv/vector.md (@pred_widen_reduc_plus<v_su><mode><vwlmul1>): Removed. (@pred_widen_reduc_plus<v_su><mode><vwlmul1_zve64>): Ditto. (@pred_widen_reduc_plus<v_su><mode><vwlmul1_zve32>): Ditto. (@pred_widen_reduc_plus<order><mode><vwlmul1>): Ditto. (@pred_widen_reduc_plus<order><mode><vwlmul1_zve64>): Ditto. (@pred_widen_reduc_plus<v_su><VQI:mode><VHI_LMUL1:mode>): New pattern. (@pred_widen_reduc_plus<v_su><VHI:mode><VSI_LMUL1:mode>): Ditto. (@pred_widen_reduc_plus<v_su><VSI:mode><VDI_LMUL1:mode>): Ditto. 
(@pred_widen_reduc_plus<order><VHF:mode><VSF_LMUL1:mode>): Ditto. (@pred_widen_reduc_plus<order><VSF:mode><VDF_LMUL1:mode>): Ditto. gcc/testsuite/ChangeLog: PR target/110299 * gcc.target/riscv/rvv/base/pr110299-1.c: New test. * gcc.target/riscv/rvv/base/pr110299-1.h: New test. * gcc.target/riscv/rvv/base/pr110299-2.c: New test. * gcc.target/riscv/rvv/base/pr110299-2.h: New test. * gcc.target/riscv/rvv/base/pr110299-3.c: New test. * gcc.target/riscv/rvv/base/pr110299-3.h: New test. * gcc.target/riscv/rvv/base/pr110299-4.c: New test. * gcc.target/riscv/rvv/base/pr110299-4.h: New test.
2023-06-19	RISC-V: Bugfix for RVV float reduction in ZVE32/64	Pan Li	7	-216/+366
	The RVV floating-point reduction has 3 different patterns for zve128+, zve64 and zve32. They take the same iterator with different attributes. However, we need the generated function code_for_reduc (code, mode1, mode2). The implementation of code_for_reduc may look like the following:
	code_for_reduc (code, mode1, mode2)
	{
	  if (code == max && mode1 == VNx1HF && mode2 == VNx1HF)
	    return CODE_FOR_pred_reduc_maxvnx1hfvnx16hf; // ZVE128+
	  if (code == max && mode1 == VNx1HF && mode2 == VNx1HF)
	    return CODE_FOR_pred_reduc_maxvnx1hfvnx8hf; // ZVE64
	  if (code == max && mode1 == VNx1HF && mode2 == VNx1HF)
	    return CODE_FOR_pred_reduc_maxvnx1hfvnx4hf; // ZVE32
	}
	Thus there is a problem here. For example, for zve32 we will have code_for_reduc (max, VNx1HF, VNx1HF), which returns the ZVE128+ code instead of the ZVE32 one. This patch merges the 3 patterns into one pattern and passes both the input vector mode and the return vector mode to code_for_reduc. For example, ZVE32 will use code_for_reduc (max, VNx1HF, VNx2HF), and the correct ZVE32 code will be returned as expected. Please note that both GCC 13 and 14 are affected by this issue. Signed-off-by: Pan Li <pan2.li@intel.com> Co-Authored by: Juzhe-Zhong <juzhe.zhong@rivai.ai> gcc/ChangeLog: PR target/110277 * config/riscv/riscv-vector-builtins-bases.cc: Adjust expand for ret_mode. * config/riscv/vector-iterators.md: Add VHF, VSF, VDF, VHF_LMUL1, VSF_LMUL1, VDF_LMUL1, and remove unused attr. * config/riscv/vector.md (@pred_reduc_<reduc><mode><vlmul1>): Removed. (@pred_reduc_<reduc><mode><vlmul1_zve64>): Ditto. (@pred_reduc_<reduc><mode><vlmul1_zve32>): Ditto. (@pred_reduc_plus<order><mode><vlmul1>): Ditto. (@pred_reduc_plus<order><mode><vlmul1_zve32>): Ditto. (@pred_reduc_plus<order><mode><vlmul1_zve64>): Ditto. (@pred_reduc_<reduc><VHF:mode><VHF_LMUL1:mode>): New pattern. (@pred_reduc_<reduc><VSF:mode><VSF_LMUL1:mode>): Ditto. (@pred_reduc_<reduc><VDF:mode><VDF_LMUL1:mode>): Ditto. (@pred_reduc_plus<order><VHF:mode><VHF_LMUL1:mode>): Ditto. 
(@pred_reduc_plus<order><VSF:mode><VSF_LMUL1:mode>): Ditto. (@pred_reduc_plus<order><VDF:mode><VDF_LMUL1:mode>): Ditto. gcc/testsuite/ChangeLog: PR target/110277 * gcc.target/riscv/rvv/base/pr110277-1.c: New test. * gcc.target/riscv/rvv/base/pr110277-1.h: New test. * gcc.target/riscv/rvv/base/pr110277-2.c: New test. * gcc.target/riscv/rvv/base/pr110277-2.h: New test.
2023-06-19	amdgcn: implement vector div and mod libfuncs	Andrew Stubbs	110	-54/+2249
	Also divmod, but only for scalar modes, for now (because there are no complex int vectors yet). gcc/ChangeLog: * config/gcn/gcn.cc (gcn_expand_divmod_libfunc): New function. (gcn_init_libfuncs): Add div and mod functions for all modes. Add placeholders for divmod functions. (TARGET_EXPAND_DIVMOD_LIBFUNC): Define. libgcc/ChangeLog: * config/gcn/lib2-divmod-di.c: Reimplement like lib2-divmod.c. * config/gcn/lib2-divmod.c: Likewise. * config/gcn/lib2-gcn.h: Add new types and prototypes for all the new vector libfuncs. * config/gcn/t-amdgcn: Add new files. * config/gcn/amdgcn_veclib.h: New file. * config/gcn/lib2-vec_divmod-di.c: New file. * config/gcn/lib2-vec_divmod-hi.c: New file. * config/gcn/lib2-vec_divmod-qi.c: New file. * config/gcn/lib2-vec_divmod.c: New file. gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/predcom-2.c: Avoid vectors on amdgcn. * gcc.dg/unroll-8.c: Likewise. * gcc.dg/vect/slp-26.c: Change expected results on amdgcn. * lib/target-supports.exp (check_effective_target_vect_int_mod): Add amdgcn. (check_effective_target_divmod): Likewise. * gcc.target/gcn/simd-math-3-16.c: New test. * gcc.target/gcn/simd-math-3-2.c: New test. * gcc.target/gcn/simd-math-3-32.c: New test. * gcc.target/gcn/simd-math-3-4.c: New test. * gcc.target/gcn/simd-math-3-8.c: New test. * gcc.target/gcn/simd-math-3-char-16.c: New test. * gcc.target/gcn/simd-math-3-char-2.c: New test. * gcc.target/gcn/simd-math-3-char-32.c: New test. * gcc.target/gcn/simd-math-3-char-4.c: New test. * gcc.target/gcn/simd-math-3-char-8.c: New test. * gcc.target/gcn/simd-math-3-char-run-16.c: New test. * gcc.target/gcn/simd-math-3-char-run-2.c: New test. * gcc.target/gcn/simd-math-3-char-run-32.c: New test. * gcc.target/gcn/simd-math-3-char-run-4.c: New test. * gcc.target/gcn/simd-math-3-char-run-8.c: New test. * gcc.target/gcn/simd-math-3-char-run.c: New test. * gcc.target/gcn/simd-math-3-char.c: New test. * gcc.target/gcn/simd-math-3-long-16.c: New test. 
* gcc.target/gcn/simd-math-3-long-2.c: New test. * gcc.target/gcn/simd-math-3-long-32.c: New test. * gcc.target/gcn/simd-math-3-long-4.c: New test. * gcc.target/gcn/simd-math-3-long-8.c: New test. * gcc.target/gcn/simd-math-3-long-run-16.c: New test. * gcc.target/gcn/simd-math-3-long-run-2.c: New test. * gcc.target/gcn/simd-math-3-long-run-32.c: New test. * gcc.target/gcn/simd-math-3-long-run-4.c: New test. * gcc.target/gcn/simd-math-3-long-run-8.c: New test. * gcc.target/gcn/simd-math-3-long-run.c: New test. * gcc.target/gcn/simd-math-3-long.c: New test. * gcc.target/gcn/simd-math-3-run-16.c: New test. * gcc.target/gcn/simd-math-3-run-2.c: New test. * gcc.target/gcn/simd-math-3-run-32.c: New test. * gcc.target/gcn/simd-math-3-run-4.c: New test. * gcc.target/gcn/simd-math-3-run-8.c: New test. * gcc.target/gcn/simd-math-3-run.c: New test. * gcc.target/gcn/simd-math-3-short-16.c: New test. * gcc.target/gcn/simd-math-3-short-2.c: New test. * gcc.target/gcn/simd-math-3-short-32.c: New test. * gcc.target/gcn/simd-math-3-short-4.c: New test. * gcc.target/gcn/simd-math-3-short-8.c: New test. * gcc.target/gcn/simd-math-3-short-run-16.c: New test. * gcc.target/gcn/simd-math-3-short-run-2.c: New test. * gcc.target/gcn/simd-math-3-short-run-32.c: New test. * gcc.target/gcn/simd-math-3-short-run-4.c: New test. * gcc.target/gcn/simd-math-3-short-run-8.c: New test. * gcc.target/gcn/simd-math-3-short-run.c: New test. * gcc.target/gcn/simd-math-3-short.c: New test. * gcc.target/gcn/simd-math-3.c: New test. * gcc.target/gcn/simd-math-4-char-run.c: New test. * gcc.target/gcn/simd-math-4-char.c: New test. * gcc.target/gcn/simd-math-4-long-run.c: New test. * gcc.target/gcn/simd-math-4-long.c: New test. * gcc.target/gcn/simd-math-4-run.c: New test. * gcc.target/gcn/simd-math-4-short-run.c: New test. * gcc.target/gcn/simd-math-4-short.c: New test. * gcc.target/gcn/simd-math-4.c: New test. * gcc.target/gcn/simd-math-5-16.c: New test. * gcc.target/gcn/simd-math-5-32.c: New test. 
* gcc.target/gcn/simd-math-5-4.c: New test. * gcc.target/gcn/simd-math-5-8.c: New test. * gcc.target/gcn/simd-math-5-char-16.c: New test. * gcc.target/gcn/simd-math-5-char-32.c: New test. * gcc.target/gcn/simd-math-5-char-4.c: New test. * gcc.target/gcn/simd-math-5-char-8.c: New test. * gcc.target/gcn/simd-math-5-char-run-16.c: New test. * gcc.target/gcn/simd-math-5-char-run-32.c: New test. * gcc.target/gcn/simd-math-5-char-run-4.c: New test. * gcc.target/gcn/simd-math-5-char-run-8.c: New test. * gcc.target/gcn/simd-math-5-char-run.c: New test. * gcc.target/gcn/simd-math-5-char.c: New test. * gcc.target/gcn/simd-math-5-long-16.c: New test. * gcc.target/gcn/simd-math-5-long-32.c: New test. * gcc.target/gcn/simd-math-5-long-4.c: New test. * gcc.target/gcn/simd-math-5-long-8.c: New test. * gcc.target/gcn/simd-math-5-long-run-16.c: New test. * gcc.target/gcn/simd-math-5-long-run-32.c: New test. * gcc.target/gcn/simd-math-5-long-run-4.c: New test. * gcc.target/gcn/simd-math-5-long-run-8.c: New test. * gcc.target/gcn/simd-math-5-long-run.c: New test. * gcc.target/gcn/simd-math-5-long.c: New test. * gcc.target/gcn/simd-math-5-run-16.c: New test. * gcc.target/gcn/simd-math-5-run-32.c: New test. * gcc.target/gcn/simd-math-5-run-4.c: New test. * gcc.target/gcn/simd-math-5-run-8.c: New test. * gcc.target/gcn/simd-math-5-run.c: New test. * gcc.target/gcn/simd-math-5-short-16.c: New test. * gcc.target/gcn/simd-math-5-short-32.c: New test. * gcc.target/gcn/simd-math-5-short-4.c: New test. * gcc.target/gcn/simd-math-5-short-8.c: New test. * gcc.target/gcn/simd-math-5-short-run-16.c: New test. * gcc.target/gcn/simd-math-5-short-run-32.c: New test. * gcc.target/gcn/simd-math-5-short-run-4.c: New test. * gcc.target/gcn/simd-math-5-short-run-8.c: New test. * gcc.target/gcn/simd-math-5-short-run.c: New test. * gcc.target/gcn/simd-math-5-short.c: New test. * gcc.target/gcn/simd-math-5.c: New test.
2023-06-19	amdgcn: Delete inactive libfuncs	Andrew Stubbs	3	-126/+0
	The HImode libfuncs weren't called and trying to enable them fails because TARGET_PROMOTE_FUNCTION_MODE wants to widen the arguments but the signedness isn't known. libgcc/ChangeLog: * config/gcn/lib2-gcn.h (QItype, UQItype, HItype, UHItype): Delete. (__divhi3, __modhi3, __udivhi3, __umodhi3): Delete. * config/gcn/t-amdgcn: Don't build lib2-divmod-hi.c. * config/gcn/lib2-divmod-hi.c: Removed.
2023-06-19	vect: vectorize via libfuncs	Andrew Stubbs	2	-3/+7
	This patch allows vectorization when the libfuncs are defined. gcc/ChangeLog: * tree-vect-generic.cc: Include optabs-libfuncs.h. (get_compute_type): Check optab_libfunc. * tree-vect-stmts.cc: Include optabs-libfuncs.h. (vectorizable_operation): Check optab_libfunc.
2023-06-19	amdgcn: minimal V64TImode vector support	Andrew Stubbs	3	-130/+299
	Just enough support for TImode vectors to exist, load, store, move, without any real instructions available. This is primarily for the use of divmodv64di4, which uses TImode to return a pair of DImode values. gcc/ChangeLog: * config/gcn/gcn-protos.h (vgpr_4reg_mode_p): New function. * config/gcn/gcn-valu.md (V_4REG, V_4REG_ALT): New iterators. (V_MOV, V_MOV_ALT): Likewise. (scalar_mode, SCALAR_MODE): Add TImode. (vnsi, VnSI, vndi, VnDI): Likewise. (vec_merge, vec_merge_with_clobber, vec_merge_with_vcc): Use V_MOV. (mov<mode>, mov<mode>_unspec): Use V_MOV. (mov<mode>_4reg): New insn. (mov<mode>_exec): New 4reg variant. (mov<mode>_sgprbase): Likewise. (reload_in<mode>, reload_out<mode>): Use V_MOV. (vec_set<mode>): Likewise. (vec_duplicate<mode><exec>): New 4reg variant. (vec_extract<mode><scalar_mode>): Likewise. (vec_extract<V_ALL:mode><V_ALL_ALT:mode>): Rename to ... (vec_extract<V_MOV:mode><V_MOV_ALT:mode>): ... this, and use V_MOV. (vec_extract<V_4REG:mode><V_4REG_ALT:mode>_nop): New 4reg variant. (fold_extract_last_<mode>): Use V_MOV. (vec_init<V_ALL:mode><V_ALL_ALT:mode>): Rename to ... (vec_init<V_MOV:mode><V_MOV_ALT:mode>): ... this, and use V_MOV. (gather_load<mode><vnsi>, gather<mode>_expr<exec>, gather<mode>_insn_1offset<exec>, gather<mode>_insn_1offset_ds<exec>, gather<mode>_insn_2offsets<exec>): Use V_MOV. (scatter_store<mode><vnsi>, scatter<mode>_expr<exec_scatter>, scatter<mode>_insn_1offset<exec_scatter>, scatter<mode>_insn_1offset_ds<exec_scatter>, scatter<mode>_insn_2offsets<exec_scatter>): Likewise. (maskload<mode>di, maskstore<mode>di, mask_gather_load<mode><vnsi>, mask_scatter_store<mode><vnsi>): Likewise. * config/gcn/gcn.cc (gcn_class_max_nregs): Use vgpr_4reg_mode_p. (gcn_hard_regno_mode_ok): Likewise. (GEN_VNM): Add TImode support. (USE_TI): New macro. Separate TImode operations from non-TImode ones. (gcn_vector_mode_supported_p): Add V64TImode, V32TImode, V16TImode, V8TImode, and V2TImode. (print_operand): Add 'J' and 'K' print codes.
2023-06-19	Remove -save-temps from tests using -flto	Richard Biener	9	-9/+9
	The following removes -save-temps, which doesn't seem to serve any good purpose, from tests that also run with -flto added. That can cause ltrans files to race with other multilibs being tested, and I'm frequently seeing linker complaints that the architecture doesn't match here. I'm not sure whether the .ltrans.o files end up in a directory that is not gccN/ specific or whether different multilibs end up sharing the same directory (not sure if it's easily possible to avoid that). * gcc.dg/vect/vect-bic-bitmask-2.c: Remove -save-temps. * gcc.dg/vect/vect-bic-bitmask-3.c: Likewise. * gcc.dg/vect/vect-bic-bitmask-4.c: Likewise. * gcc.dg/vect/vect-bic-bitmask-5.c: Likewise. * gcc.dg/vect/vect-bic-bitmask-6.c: Likewise. * gcc.dg/vect/vect-bic-bitmask-8.c: Likewise. * gcc.dg/vect/vect-bic-bitmask-9.c: Likewise. * gcc.dg/vect/vect-bic-bitmask-10.c: Likewise. * gcc.dg/vect/vect-bic-bitmask-11.c: Likewise.
2023-06-19	tree-optimization/110298 - CFG cleanup and stale nb_iterations	Richard Biener	2	-3/+24
	When unrolling, we eventually kill the nb_iterations info since it may refer to removed SSA names. But we do this only after cleaning up the CFG, which in turn can end up accessing it. Fixed by swapping the two. PR tree-optimization/110298 * tree-ssa-loop-ivcanon.cc (tree_unroll_loops_completely): Clear number of iterations info before cleaning up the CFG. * gcc.dg/torture/pr110298.c: New testcase.
2023-06-19	Fix DejaGnu directive syntax error in 'libgomp.c/target-51.c'	Thomas Schwinge	1	-1/+1
	ERROR: libgomp.c/target-51.c: unknown dg option: \} for "}" Fix-up for recent commit 01fe115ba7eafebcf97bbac9e157038a003d0c85 "libgomp.c/target-51.c: Accept more error-msg variants in dg-output". libgomp/ * testsuite/libgomp.c/target-51.c: Fix DejaGnu directive syntax error.
2023-06-19	simplify-rtx: Simplify VEC_CONCAT of SUBREG and VEC_CONCAT from same vector	Kyrylo Tkachov	2	-0/+39
	In the testcase for this patch we try to vec_concat the lowpart and highpart of a vector, but the lowpart is expressed as a subreg. simplify-rtx.cc does not recognise this and combine ends up trying to match: Trying 7 -> 8: 7: r93:V2SI=vec_select(r95:V4SI,parallel) 8: r97:V4SI=vec_concat(r95:V4SI#0,r93:V2SI) REG_DEAD r95:V4SI REG_DEAD r93:V2SI Failed to match this instruction: (set (reg:V4SI 97) (vec_concat:V4SI (subreg:V2SI (reg/v:V4SI 95 [ a ]) 0) (vec_select:V2SI (reg/v:V4SI 95 [ a ]) (parallel:V4SI [ (const_int 2 [0x2]) (const_int 3 [0x3]) ])))) This should be just (set (reg:V4SI 97) (reg:V4SI 95)). This patch adds such a simplification. The testcase is a bit artificial, but I do have other aarch64-specific patterns that I want to optimise later that rely on this simplification happening. Without this patch for the testcase we generate: foo: dup d31, v0.d[1] ins v0.d[1], v31.d[0] ret whereas we should just not generate anything as the operation is ultimately a no-op. Bootstrapped and tested on aarch64-none-linux-gnu and aarch64_be-none-elf. gcc/ChangeLog: * simplify-rtx.cc (simplify_context::simplify_binary_operation_1): Simplify vec_concat of lowpart subreg and high part vec_select. gcc/testsuite/ChangeLog: * gcc.target/aarch64/simd/low-high-combine_1.c: New test.
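A minimal scalar sketch of why the new simplification is valid: it models a V4SI value as four int32_t lanes and checks that concatenating the lowpart (elements 0 and 1) with a vec_select of elements 2 and 3 rebuilds the original vector. The struct and function names are hypothetical, and mapping the lowpart to elements 0 and 1 assumes little-endian lane numbering; this is not GCC's simplify-rtx code.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar stand-ins for V4SI and V2SI values.  */
typedef struct { int32_t e[4]; } v4si;
typedef struct { int32_t e[2]; } v2si;

/* (subreg:V2SI (reg:V4SI a) 0): the low part, elements 0 and 1
   (assuming little-endian lane numbering).  */
static v2si lowpart_v2si (v4si a)
{
  v2si r = { { a.e[0], a.e[1] } };
  return r;
}

/* (vec_select:V2SI a (parallel [2 3])): the high part.  */
static v2si select_high_v2si (v4si a)
{
  v2si r = { { a.e[2], a.e[3] } };
  return r;
}

/* (vec_concat:V4SI lo hi).  */
static v4si concat_v4si (v2si lo, v2si hi)
{
  v4si r = { { lo.e[0], lo.e[1], hi.e[0], hi.e[1] } };
  return r;
}

/* Returns 1 if concat (lowpart (a), select_high (a)) reproduces a,
   i.e. the whole expression can be replaced by a itself.  */
static int concat_is_identity (v4si a)
{
  v4si r = concat_v4si (lowpart_v2si (a), select_high_v2si (a));
  return memcmp (&r, &a, sizeof a) == 0;
}
```

Because the result is always the input vector, combine can replace the whole set with a plain register copy, which is what removes the dup/ins pair in the aarch64 output.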
2023-06-19	Doc update: -foffload-options= examples + OpenMP in Fortran intrinsic modules	Tobias Burnus	2	-5/+17
	With LTO, the -O.. flags of the host are passed on to the LTO compiler, which also includes the offloading compilers. Therefore, using -foffload-options=-O3 is misleading, as it implies that without it only the default optimization level is used. Hence, this flag has now been removed from the usage examples. The Fortran documentation lists the contents (except for API routines) of the intrinsic OpenMP modules OMP_LIB and OMP_LIB_KINDS; this commit adds two missing named constants and also links to the OpenMP 5.1 and 5.2 specs for completeness. gcc/ChangeLog: * doc/invoke.texi (-foffload-options): Remove '-O3' from the examples. gcc/fortran/ChangeLog: * intrinsic.texi (OpenMP Modules OMP_LIB and OMP_LIB_KINDS): Also add references to the OpenMP 5.1 and 5.2 spec; add omp_initial_device and omp_invalid_device named constants.
2023-06-19	vect: Restore aarch64 bootstrap	Richard Sandiford	1	-1/+2
	gcc/ * tree-vect-loop-manip.cc (vect_set_loop_condition_partial_vectors): Handle null niters_skip.
2023-06-19	Fix build of aarch64	Richard Biener	1	-1/+2
	The following fixes a reference to the LOOP_VINFO_MASKS array in the aarch64 backend after my changes. * config/aarch64/aarch64.cc (aarch64_vector_costs::analyze_loop_vinfo): Fix reference to LOOP_VINFO_MASKS.
2023-06-19	avr: Fix wrong array bounds warning on SFR access	Senthil Kumar Selvaraj	4	-7/+33
	The warning was raised on accessing SFRs at addresses below the default page size, as gcc considers accesses to addresses in the first page of memory suspicious. This doesn't apply to an embedded target like the avr, where both flash and RAM have zero as a valid address. Zero is also a valid address in named address spaces (__memx, flash<n> etc.). This commit implements TARGET_ADDR_SPACE_ZERO_ADDRESS_VALID for the avr target and reports to gcc that zero is a valid address on all address spaces. It also disables flag_delete_null_pointer_checks based on the target hook, and modifies target-supports.exp to add avr to the list of targets that always keep null pointer checks. This fixes a bunch of DejaGNU failures that occur otherwise. PR target/105523 gcc/ChangeLog: * common/config/avr/avr-common.cc: Remove setting of OPT_fdelete_null_pointer_checks. * config/avr/avr.cc (avr_option_override): Clear flag_delete_null_pointer_checks if zero_address_valid. (avr_addr_space_zero_address_valid): New function. (TARGET_ADDR_SPACE_ZERO_ADDRESS_VALID): Provide target hook. gcc/testsuite/ChangeLog: * lib/target-supports.exp (check_effective_target_keeps_null_pointer_checks): Add avr. * gcc.target/avr/pr105523.c: New test.
2023-06-19	VECT: Support LEN_MASK_{LOAD,STORE} ifn && optabs	Ju-Zhe Zhong	5	-10/+106
	This patch adds LEN_MASK_ LOAD/STORE to support flow control for targets like RISC-V that use length in loop control. Loads/stores are normalized into LEN_MASK_ LOAD/STORE as long as either the length or the mask is valid. Length is the outcome of SELECT_VL or MIN_EXPR. Mask is the outcome of a comparison. The LEN_MASK_ LOAD/STORE format is defined as follows: 1). LEN_MASK_LOAD (ptr, align, length, mask). 2). LEN_MASK_STORE (ptr, align, length, mask, vec). Consider the following 4 cases (VLA: variable-length auto-vectorization; VLS: specific-length auto-vectorization): Case 1 (VLS, -mrvv-vector-bits=128): Code: for (int i = 0; i < 4; i++) a[i] = b[i] + c[i]; IR (does not use LEN_MASK_): v1 = MEM (...) v2 = MEM (...) v3 = v1 + v2 MEM[...] = v3 Case 2 (VLS, -mrvv-vector-bits=128): Code: for (int i = 0; i < 4; i++) if (cond[i]) a[i] = b[i] + c[i]; IR (LEN_MASK_ with length = VF, mask = comparison): mask = comparison v1 = LEN_MASK_LOAD (length = VF, mask) v2 = LEN_MASK_LOAD (length = VF, mask) v3 = v1 + v2 LEN_MASK_STORE (length = VF, mask, v3) Case 3 (VLA): Code: for (int i = 0; i < n; i++) a[i] = b[i] + c[i]; IR: loop_len = SELECT_VL or MIN v1 = LEN_MASK_LOAD (length = loop_len, mask = {-1,-1,...}) v2 = LEN_MASK_LOAD (length = loop_len, mask = {-1,-1,...}) v3 = v1 + v2 LEN_MASK_STORE (length = loop_len, mask = {-1,-1,...}, v3) Case 4 (VLA): Code: for (int i = 0; i < n; i++) if (cond[i]) a[i] = b[i] + c[i]; IR: loop_len = SELECT_VL or MIN mask = comparison v1 = LEN_MASK_LOAD (length = loop_len, mask) v2 = LEN_MASK_LOAD (length = loop_len, mask) v3 = v1 + v2 LEN_MASK_STORE (length = loop_len, mask, v3) Co-authored-by: Robin Dapp <rdapp.gcc@gmail.com> gcc/ChangeLog: * doc/md.texi: Add len_mask{load,store}. * genopinit.cc (main): Ditto. (CMP_NAME): Ditto. * internal-fn.cc (len_maskload_direct): Ditto. (len_maskstore_direct): Ditto. (expand_call_mem_ref): Ditto. (expand_partial_load_optab_fn): Ditto. (expand_len_maskload_optab_fn): Ditto. (expand_partial_store_optab_fn): Ditto.
(expand_len_maskstore_optab_fn): Ditto. (direct_len_maskload_optab_supported_p): Ditto. (direct_len_maskstore_optab_supported_p): Ditto. * internal-fn.def (LEN_MASK_LOAD): Ditto. (LEN_MASK_STORE): Ditto. * optabs.def (OPTAB_CD): Ditto.
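A hedged scalar model of the LEN_MASK_LOAD/LEN_MASK_STORE semantics described above, assuming a lane is active only when its index is below length and its mask bit is set. The helper names, the fixed VF of 4, ignoring the align operand, and zeroing inactive load lanes are illustrative assumptions, not GCC's internal-fn expansion. cond_add mirrors Case 4, with loop_len computed the MIN way.

```c
#include <stdbool.h>
#include <stddef.h>

enum { VF = 4 };  /* hypothetical vectorization factor */

/* Model of LEN_MASK_LOAD: lane i is loaded only if i < length and
   mask[i]; inactive lanes are zeroed here purely for illustration.  */
static void len_mask_load (int dst[VF], const int *ptr, size_t length,
                           const bool mask[VF])
{
  for (size_t i = 0; i < VF; i++)
    dst[i] = (i < length && mask[i]) ? ptr[i] : 0;
}

/* Model of LEN_MASK_STORE: only active lanes touch memory.  */
static void len_mask_store (int *ptr, size_t length, const bool mask[VF],
                            const int vec[VF])
{
  for (size_t i = 0; i < VF; i++)
    if (i < length && mask[i])
      ptr[i] = vec[i];
}

/* Case 4: for (i < n) if (cond[i]) a[i] = b[i] + c[i];  */
static void cond_add (int *a, const int *b, const int *c, const int *cond,
                      size_t n)
{
  for (size_t i = 0; i < n; i += VF)
    {
      size_t loop_len = n - i < VF ? n - i : VF;  /* MIN (n - i, VF) */
      bool mask[VF];
      int v1[VF], v2[VF], v3[VF];
      for (size_t j = 0; j < VF; j++)             /* mask = comparison */
        mask[j] = j < loop_len && cond[i + j] != 0;
      len_mask_load (v1, b + i, loop_len, mask);
      len_mask_load (v2, c + i, loop_len, mask);
      for (size_t j = 0; j < VF; j++)
        v3[j] = v1[j] + v2[j];
      len_mask_store (a + i, loop_len, mask, v3);
    }
}
```

Cases 1 to 3 fall out of the same model by setting length = VF, the mask to all ones, or both.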
2023-06-19	RISC-V: Add autovec FP unary operations.	Robin Dapp	17	-35/+284
	This patch adds floating-point autovec expanders for vfneg and vfabs, as well as vfsqrt, and the accompanying tests. Similarly to the binop tests, there are flavors for zvfh now. gcc/ChangeLog: * config/riscv/autovec.md (<optab><mode>2): Add unop expanders. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/unop/abs-run.c: Add FP. * gcc.target/riscv/rvv/autovec/unop/abs-rv32gcv.c: Add FP. * gcc.target/riscv/rvv/autovec/unop/abs-rv64gcv.c: Add FP. * gcc.target/riscv/rvv/autovec/unop/abs-template.h: Add FP. * gcc.target/riscv/rvv/autovec/unop/vneg-run.c: Add FP. * gcc.target/riscv/rvv/autovec/unop/vneg-rv32gcv.c: Add FP. * gcc.target/riscv/rvv/autovec/unop/vneg-rv64gcv.c: Add FP. * gcc.target/riscv/rvv/autovec/unop/vneg-template.h: Add FP. * gcc.target/riscv/rvv/autovec/unop/abs-zvfh-run.c: New test. * gcc.target/riscv/rvv/autovec/unop/vfsqrt-run.c: New test. * gcc.target/riscv/rvv/autovec/unop/vfsqrt-rv32gcv.c: New test. * gcc.target/riscv/rvv/autovec/unop/vfsqrt-rv64gcv.c: New test. * gcc.target/riscv/rvv/autovec/unop/vfsqrt-template.h: New test. * gcc.target/riscv/rvv/autovec/unop/vfsqrt-zvfh-run.c: New test. * gcc.target/riscv/rvv/autovec/unop/vneg-zvfh-run.c: New test. * gcc.target/riscv/rvv/autovec/zvfhmin-1.c: Add unops.
2023-06-19	RISC-V: Add autovec FP binary operations.	Robin Dapp	37	-49/+582
	This implements the floating-point autovec expanders for binary operations: vfadd, vfsub, vfdiv, vfmul, vfmax, vfmin and adds tests. The existing tests are split up into non-_Float16 and _Float16 flavors as we cannot rely on the zvfh extension being present. As long as we do not have full middle-end support, we need -ffast-math for the tests. In order to allow proper _Float16 support, this patch disables general _Float16 promotion to float when TARGET_ZVFH is defined, similar to TARGET_ZFH or TARGET_ZHINX. gcc/ChangeLog: * config/riscv/autovec.md (<optab><mode>3): Implement binop expander. * config/riscv/riscv-protos.h (emit_vlmax_fp_insn): Declare. (enum vxrm_field_enum): Rename this... (enum fixed_point_rounding_mode): ...to this. (enum frm_field_enum): Rename this... (enum floating_point_rounding_mode): ...to this. * config/riscv/riscv-v.cc (emit_vlmax_fp_insn): New function. * config/riscv/riscv.cc (riscv_const_insns): Clarify const vector handling. (riscv_libgcc_floating_mode_supported_p): Adjust comment. (riscv_excess_precision): Do not convert to float for ZVFH. * config/riscv/vector-iterators.md: Add VF_AUTO iterator. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/binop/vadd-run.c: Add FP. * gcc.target/riscv/rvv/autovec/binop/vadd-rv32gcv.c: Add FP. * gcc.target/riscv/rvv/autovec/binop/vadd-rv64gcv.c: Add FP. * gcc.target/riscv/rvv/autovec/binop/vadd-template.h: Add FP. * gcc.target/riscv/rvv/autovec/binop/vdiv-run.c: Add FP. * gcc.target/riscv/rvv/autovec/binop/vdiv-rv32gcv.c: Add FP. * gcc.target/riscv/rvv/autovec/binop/vdiv-rv64gcv.c: Add FP. * gcc.target/riscv/rvv/autovec/binop/vdiv-template.h: Add FP. * gcc.target/riscv/rvv/autovec/binop/vmax-run.c: Add FP. * gcc.target/riscv/rvv/autovec/binop/vmax-rv32gcv.c: Add FP. * gcc.target/riscv/rvv/autovec/binop/vmax-rv64gcv.c: Add FP. * gcc.target/riscv/rvv/autovec/binop/vmax-template.h: Add FP. * gcc.target/riscv/rvv/autovec/binop/vmin-run.c: Add FP. * gcc.target/riscv/rvv/autovec/binop/vmin-rv32gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vmin-rv64gcv.c: Add FP. * gcc.target/riscv/rvv/autovec/binop/vmin-template.h: Add FP. * gcc.target/riscv/rvv/autovec/binop/vmul-run.c: Add FP. * gcc.target/riscv/rvv/autovec/binop/vmul-rv32gcv.c: Add FP. * gcc.target/riscv/rvv/autovec/binop/vmul-rv64gcv.c: Add FP. * gcc.target/riscv/rvv/autovec/binop/vmul-template.h: Add FP. * gcc.target/riscv/rvv/autovec/binop/vrem-rv32gcv.c: Add FP. * gcc.target/riscv/rvv/autovec/binop/vsub-run.c: Add FP. * gcc.target/riscv/rvv/autovec/binop/vsub-rv32gcv.c: Add FP. * gcc.target/riscv/rvv/autovec/binop/vsub-rv64gcv.c: Add FP. * gcc.target/riscv/rvv/autovec/binop/vsub-template.h: Add FP. * gcc.target/riscv/rvv/autovec/binop/vadd-zvfh-run.c: New test. * gcc.target/riscv/rvv/autovec/binop/vdiv-zvfh-run.c: New test. * gcc.target/riscv/rvv/autovec/binop/vmax-zvfh-run.c: New test. * gcc.target/riscv/rvv/autovec/binop/vmin-zvfh-run.c: New test. * gcc.target/riscv/rvv/autovec/binop/vmul-zvfh-run.c: New test. * gcc.target/riscv/rvv/autovec/binop/vsub-zvfh-run.c: New test. * lib/target-supports.exp: Add riscv_vector_hw and riscv_zvfh_hw target selectors.
2023-06-19	RISC-V: Add sign-extending variants for vmv.x.s.	Robin Dapp	6	-0/+42
	When the destination register of a vmv.x.s needs to be sign extended to XLEN, we currently emit an sext insn. Since vmv.x.s performs this automatically, this patch adds two instruction patterns that include sign_extend for the destination operand. gcc/ChangeLog: * config/riscv/vector-iterators.md: Add VI_QH iterator. * config/riscv/autovec-opt.md (@pred_extract_first_sextdi<mode>): New vmv.x.s pattern that includes sign extension. (@pred_extract_first_sextsi<mode>): Ditto for SImode. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-1.c: Ensure that no sext insns are present. * gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-2.c: Ditto. * gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-3.c: Ditto. * gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-4.c: Ditto.
2023-06-19	RISC-V: Implement vec_set and vec_extract.	Robin Dapp	15	-2/+1323
	This implements the vec_set and vec_extract patterns for integer and floating-point data types. For vec_set we broadcast the insert value to a vector register and then perform a vslideup with effective length 1 to the requested index. vec_extract is done by sliding down the requested element to index 0 and v(f)mv.[xf].s to a scalar register. The patch does not include vector-vector extraction which will be done at a later time. gcc/ChangeLog: * config/riscv/autovec.md (vec_set<mode>): Implement. (vec_extract<mode><vel>): Implement. * config/riscv/riscv-protos.h (enum insn_type): Add slide insn. (emit_vlmax_slide_insn): Declare. (emit_nonvlmax_slide_tu_insn): Declare. (emit_scalar_move_insn): Export. (emit_nonvlmax_integer_move_insn): Export. * config/riscv/riscv-v.cc (emit_vlmax_slide_insn): New function. (emit_nonvlmax_slide_tu_insn): New function. (emit_vlmax_masked_mu_insn): No change. (emit_vlmax_integer_move_insn): Export. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-1.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-2.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-3.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-4.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-zvfh-run.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-1.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-2.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-3.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-4.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-run.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-zvfh-run.c: New test.
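A scalar sketch of the slide-based strategy described above, assuming an 8-element int32 vector. vec_set broadcasts the value and performs a vslideup with vl = idx + 1, so only element idx is written and the tail is left undisturbed; vec_extract slides the element down to index 0 and reads it, as vmv.x.s would. All names are hypothetical models of the RVV instructions, not the RISC-V backend code.

```c
#include <stddef.h>
#include <stdint.h>

enum { NELTS = 8 };  /* hypothetical VLMAX */

/* Model of vslideup.vx with tail-undisturbed policy:
   vd[i] = vs2[i - offset] for offset <= i < vl; other lanes unchanged.  */
static void vslideup (int32_t vd[NELTS], const int32_t vs2[NELTS],
                      size_t offset, size_t vl)
{
  for (size_t i = offset; i < vl && i < NELTS; i++)
    vd[i] = vs2[i - offset];
}

/* Model of vslidedown.vx: vd[i] = vs2[i + offset]; lanes past the end
   read as 0.  */
static void vslidedown (int32_t vd[NELTS], const int32_t vs2[NELTS],
                        size_t offset)
{
  for (size_t i = 0; i < NELTS; i++)
    vd[i] = i + offset < NELTS ? vs2[i + offset] : 0;
}

/* vec_set: broadcast, then slide up with effective length 1.  */
static void vec_set (int32_t v[NELTS], int32_t val, size_t idx)
{
  int32_t tmp[NELTS];
  for (size_t i = 0; i < NELTS; i++)  /* vmv.v.x: broadcast */
    tmp[i] = val;
  vslideup (v, tmp, idx, idx + 1);    /* writes only v[idx] */
}

/* vec_extract: slide element idx down to 0, then read element 0.  */
static int32_t vec_extract (const int32_t v[NELTS], size_t idx)
{
  int32_t tmp[NELTS];
  vslidedown (tmp, v, idx);
  return tmp[0];                      /* vmv.x.s */
}
```

The tail-undisturbed policy on the slideup is what keeps the elements above idx intact; with a tail-agnostic policy they could be clobbered.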
2023-06-19	RISC-V: Add (u)int8_t to binop tests.	Robin Dapp	44	-70/+171
	This patch adds the missing (u)int8_t types to the binop tests. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/binop/shift-run.c: Adapt for (u)int8_t. * gcc.target/riscv/rvv/autovec/binop/shift-rv32gcv.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/shift-rv64gcv.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/shift-template.h: Ditto. * gcc.target/riscv/rvv/autovec/binop/vadd-run.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vadd-rv32gcv.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vadd-rv64gcv.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vadd-template.h: Ditto. * gcc.target/riscv/rvv/autovec/binop/vand-run.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vand-rv32gcv.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vand-rv64gcv.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vand-template.h: Ditto. * gcc.target/riscv/rvv/autovec/binop/vdiv-run.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vdiv-rv32gcv.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vdiv-rv64gcv.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vdiv-template.h: Ditto. * gcc.target/riscv/rvv/autovec/binop/vmax-run.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vmax-rv32gcv.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vmax-rv64gcv.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vmax-template.h: Ditto. * gcc.target/riscv/rvv/autovec/binop/vmin-run.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vmin-rv32gcv.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vmin-rv64gcv.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vmin-template.h: Ditto. * gcc.target/riscv/rvv/autovec/binop/vmul-run.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vmul-rv32gcv.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vmul-rv64gcv.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vmul-template.h: Ditto. * gcc.target/riscv/rvv/autovec/binop/vor-run.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vor-rv32gcv.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vor-rv64gcv.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vor-template.h: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vrem-run.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vrem-rv32gcv.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vrem-rv64gcv.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vrem-template.h: Ditto. * gcc.target/riscv/rvv/autovec/binop/vsub-run.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vsub-rv32gcv.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vsub-rv64gcv.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vsub-template.h: Ditto. * gcc.target/riscv/rvv/autovec/binop/vxor-run.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vxor-rv32gcv.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vxor-rv64gcv.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vxor-template.h: Ditto.
2023-06-19	libgomp.c/target-51.c: Accept more error-msg variants in dg-output	Tobias Burnus	1	-2/+1
	Depending on the details, the testcase can fail with different but related messages; all of the following could be observed for this testcase: libgomp: OMP_TARGET_OFFLOAD is set to MANDATORY, but device cannot be used for offloading libgomp: OMP_TARGET_OFFLOAD is set to MANDATORY, but device not found libgomp: OMP_TARGET_OFFLOAD is set to MANDATORY, but only the host device is available Before, the last two were tested for with 'target offload_device' and '! offload_device', respectively. Now, all three are accepted by matching '.' already after 'but' and without distinguishing whether the effective target is an offload_device or not. (For completeness, there is a fourth error that follows this pattern: 'OMP_TARGET_OFFLOAD is set to MANDATORY, but device is finalized'.) libgomp/ * testsuite/libgomp.c/target-51.c: Accept more error msg variants as expected dg-output.
2023-06-19	AVX512 fully masked vectorization	Richard Biener	3	-65/+636
	This implements fully masked vectorization or a masked epilog for AVX512 style masks, which single themselves out by representing each lane with a single bit and by using integer modes for the mask (both are much like GCN). AVX512 is also special in that it doesn't have any instruction to compute the mask from a scalar IV like SVE has with while_ult. Instead the masks are produced by vector compares and the loop control retains the scalar IV (mainly to avoid dependences on mask generation; a suitable mask test instruction is available). Like RVV, code generation prefers a decrementing IV, though IVOPTs messes things up in some cases by removing that IV and replacing it with an incrementing one used for address generation. One of the motivating testcases is from PR108410, which in turn is extracted from x264, where large size vectorization shows issues with small trip loops. Execution time there improves compared to classic AVX512 with AVX2 epilogues for the cases of fewer than 32 iterations.
	size scalar    128    256    512   512e   512f
	   1   9.42  11.32   9.35  11.17  15.13  16.89
	   2   5.72   6.53   6.66   6.66   7.62   8.56
	   3   4.49   5.10   5.10   5.74   5.08   5.73
	   4   4.10   4.33   4.29   5.21   3.79   4.25
	   6   3.78   3.85   3.86   4.76   2.54   2.85
	   8   3.64   1.89   3.76   4.50   1.92   2.16
	  12   3.56   2.21   3.75   4.26   1.26   1.42
	  16   3.36   0.83   1.06   4.16   0.95   1.07
	  20   3.39   1.42   1.33   4.07   0.75   0.85
	  24   3.23   0.66   1.72   4.22   0.62   0.70
	  28   3.18   1.09   2.04   4.20   0.54   0.61
	  32   3.16   0.47   0.41   0.41   0.47   0.53
	  34   3.16   0.67   0.61   0.56   0.44   0.50
	  38   3.19   0.95   0.95   0.82   0.40   0.45
	  42   3.09   0.58   1.21   1.13   0.36   0.40
	'size' specifies the number of actual iterations, 512e is for a masked epilog and 512f for the fully masked loop. From 4 scalar iterations on, the AVX512 masked epilog code is clearly the winner; the fully masked variant is clearly worse and its size benefit is also tiny. This patch does not enable using fully masked loops or masked epilogues by default. More work on cost modeling and vectorization kind selection on x86_64 is necessary for this.
Implementation-wise, this introduces LOOP_VINFO_PARTIAL_VECTORS_STYLE, which could be exploited further to unify some of the flags we have right now, but there didn't seem to be many easy things to merge, so I'm leaving this for followups. Mask requirements as registered by vect_record_loop_mask are kept in their original form and recorded in a hash_set now instead of being processed to a vector of rgroup_controls. Instead that's now left to the final analysis phase, which tries forming the rgroup_controls vector using while_ult and, if that fails, now tries AVX512 style, which needs a different organization and instead fills a hash_map with the relevant info. vect_get_loop_mask now has two implementations, one for each of the two mask styles we then have. I have decided against interweaving vect_set_loop_condition_partial_vectors with conditions to do AVX512 style masking and instead opted to "duplicate" this to vect_set_loop_condition_partial_vectors_avx512. Likewise for vect_verify_full_masking vs vect_verify_full_masking_avx512. The vect_prepare_for_masked_peels hunk might run into issues with SVE; I didn't check yet, but using LOOP_VINFO_RGROUP_COMPARE_TYPE looked odd. Bootstrapped and tested on x86_64-unknown-linux-gnu. I've run the testsuite with --param vect-partial-vector-usage=2 with and without -fno-vect-cost-model and filed two bugs, one ICE (PR110221) and one latent wrong-code (PR110237). * tree-vectorizer.h (enum vect_partial_vector_style): New. (_loop_vec_info::partial_vector_style): Likewise. (LOOP_VINFO_PARTIAL_VECTORS_STYLE): Likewise. (rgroup_controls::compare_type): Add. (vec_loop_masks): Change from a typedef to auto_vec<> to a structure. * tree-vect-loop-manip.cc (vect_set_loop_condition_partial_vectors): Adjust. Convert niters_skip to compare_type. (vect_set_loop_condition_partial_vectors_avx512): New function implementing the AVX512 partial vector codegen.
(vect_set_loop_condition): Dispatch to the correct vect_set_loop_condition_partial_vectors_* function based on LOOP_VINFO_PARTIAL_VECTORS_STYLE. (vect_prepare_for_masked_peels): Compute LOOP_VINFO_MASK_SKIP_NITERS in the original niter type. * tree-vect-loop.cc (_loop_vec_info::_loop_vec_info): Initialize partial_vector_style. (can_produce_all_loop_masks_p): Adjust. (vect_verify_full_masking): Produce the rgroup_controls vector here. Set LOOP_VINFO_PARTIAL_VECTORS_STYLE on success. (vect_verify_full_masking_avx512): New function implementing verification of AVX512 style masking. (vect_verify_loop_lens): Set LOOP_VINFO_PARTIAL_VECTORS_STYLE. (vect_analyze_loop_2): Also try AVX512 style masking. Adjust condition. (vect_estimate_min_profitable_iters): Implement AVX512 style mask producing cost. (vect_record_loop_mask): Do not build the rgroup_controls vector here but record masks in a hash-set. (vect_get_loop_mask): Implement AVX512 style mask query, complementing the existing while_ult style.
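A minimal model of AVX512-style loop masking as described above: one bit per lane packed into an integer, produced by comparing lane indices against the remaining iteration count, with a decrementing scalar IV controlling the loop. The 16-lane width and the names loop_mask and masked_add are hypothetical; this is not the code the vectorizer generates.

```c
#include <stdint.h>

enum { LANES = 16 };  /* hypothetical: 16 x int32 in a 512-bit vector */

/* Mask for one vector iteration: compare {0, 1, ..., 15} < remaining,
   packing one bit per lane into an integer (like a k-register).  */
static uint16_t loop_mask (uint32_t remaining)
{
  uint16_t mask = 0;
  for (uint32_t lane = 0; lane < LANES; lane++)
    if (lane < remaining)
      mask |= (uint16_t) (1u << lane);
  return mask;
}

/* Fully masked loop for a[i] += b[i], i < n: the decrementing scalar
   IV `remaining` controls the loop; every iteration is predicated.  */
static void masked_add (int32_t *a, const int32_t *b, uint32_t n)
{
  uint32_t i = 0;
  for (uint32_t remaining = n; remaining > 0; )
    {
      uint16_t mask = loop_mask (remaining);
      for (uint32_t lane = 0; lane < LANES; lane++)
        if (mask & (1u << lane))
          a[i + lane] += b[i + lane];
      i += LANES;
      remaining = remaining > LANES ? remaining - LANES : 0;
    }
}
```

Keeping the scalar IV for the exit test, rather than testing the mask, mirrors the commit's point that loop control should not depend on mask generation.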
2023-06-19	Add loop_vinfo argument to vect_get_loop_mask	Richard Biener	3	-25/+30
	This adds a loop_vinfo argument for future use, making the next patch smaller. * tree-vectorizer.h (vect_get_loop_mask): Add loop_vec_info argument. * tree-vect-loop.cc (vect_get_loop_mask): Likewise. (vectorize_fold_left_reduction): Adjust. (vect_transform_reduction): Likewise. (vectorizable_live_operation): Likewise. * tree-vect-stmts.cc (vectorizable_call): Likewise. (vectorizable_operation): Likewise. (vectorizable_store): Likewise. (vectorizable_load): Likewise. (vectorizable_condition): Likewise.
2023-06-19	OpenMP (C/C++): Keep pointer value of unmapped ptr with default mapping ↵	Tobias Burnus	8	-16/+392
	[PR110270] For C/C++ pointers, default implicit mapping firstprivatizes the pointer, but if the memory it points to is mapped, then it is updated to point to the device memory (by attaching a zero sized array section of the pointed-to storage). However, if the pointed-to storage wasn't mapped, the pointer was set to NULL on the device side (OpenMP 5.0/5.1 semantic). With this commit, the pointer retains the on-host address in that case (OpenMP 5.2 semantic). The new semantic avoids an explicit map/firstprivate/is_device_ptr in the following sensible cases: Special values (e.g. pointer or 0x1, 0x2 etc.), explicitly device allocated memory (e.g. omp_target_alloc), and with (unified) shared memory. (Note: With (U)SM, mappings still must be tracked, at least when omp_target_associate_ptr does not fail when passing in two distinct pointers.) libgomp/ PR middle-end/110270 * target.c (gomp_map_vars_internal): Copy host value instead of NULL for GOMP_MAP_ZERO_LEN_ARRAY_SECTION if not mapped. * libgomp.texi (OpenMP 5.2 Impl.): Mark as 'Y'. * testsuite/libgomp.c/target-19.c: Update expected value. * testsuite/libgomp.c++/target-18.C: Likewise. * testsuite/libgomp.c++/target-19.C: Likewise. * testsuite/libgomp.c-c++-common/requires-unified-addr-2.c: New test. * testsuite/libgomp.c-c++-common/target-implicit-map-3.c: New test. * testsuite/libgomp.c-c++-common/target-implicit-map-4.c: New test.
2023-06-19	avr: Fix ICE on optimize attribute.	Senthil Kumar Selvaraj	2	-2/+7
	This commit fixes an ICE when an optimize attribute changes the prevailing optimization level. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105069 describes the same ICE for the sh target, where the fix was to enable save/restore of target specific options modified via the TARGET_OPTIMIZATION_TABLE hook. For the AVR target, -mgas-isr-prologues and -mmain-is-OS_task are those target specific options. As they enable generation of more optimal code, this commit adds the Optimization option property to those option records, and that fixes the ICE. Regression run shows no regressions, and >100 new PASSes. PR target/110086 gcc/ChangeLog: * config/avr/avr.opt (mgas-isr-prologues, mmain-is-OS_task): Add Optimization option property. gcc/testsuite/ChangeLog: * gcc.target/avr/pr110086.c: New test.
2023-06-18	xtensa: constantsynth: Add new 2-insns synthesis pattern	Takayuki 'January June' Suwa	1	-2/+10
	This patch adds a new 2-instructions constant synthesis pattern: - A non-negative square value that root can fit into a signed 12-bit: => "MOVI(.N) Ax, simm12" + "MULL Ax, Ax, Ax" Due to the execution cost of the integer multiply instruction (MULL), this synthesis works only when the 32-bit Integer Multiply Option is configured and optimize for size is specified. gcc/ChangeLog: * config/xtensa/xtensa.cc (xtensa_constantsynth_2insn): Add new pattern for the abovementioned case.
2023-06-18	xtensa: Remove TARGET_MEMORY_MOVE_COST hook	Takayuki 'January June' Suwa	1	-13/+0
	It used to always return a constant 4, which is the same as the default behavior, but doesn't take into account the effects of secondary reloads. Therefore, the implementation of this target hook is removed. gcc/ChangeLog: * config/xtensa/xtensa.cc (TARGET_MEMORY_MOVE_COST, xtensa_memory_move_cost): Remove.
2023-06-19	rs6000: Enable const_anchor for 'addi'	Jiufu Guo	3	-0/+40
	There is a functionality called const_anchor in cse.cc. const_anchor supports generating new constants by adding a small gap/offset to an existing constant. For example: void __attribute__ ((noinline)) foo (long long *a) { *a++ = 0x2351847027482577LL; *a++ = 0x2351847027482578LL; } The second constant (0x2351847027482578LL) can be computed by adding '1' to the first constant (0x2351847027482577LL). This is profitable if more than one instruction is needed to build the second constant. For rs6000, we can enable this functionality, as the instruction 'addi' does exactly this when the gap is smaller than 0x8000. * One potential side effect of this feature: compared with "r101=0x2351847027482577LL ... r201=0x2351847027482578LL", the new r201 will be "r201=r101+1", so r101 will live longer, which could increase register pressure during allocation. But I feel this would be acceptable for this const_anchor feature. With this feature, for GCC source code and SPEC object files, the significant changes are improvements of "addi" vs. "2 or more insns: lis+or.."; it also exposes some other optimization opportunities, like combine/jump2. The side effect also occurs in a few cases, but it does not impact overall performance. gcc/ChangeLog: * config/rs6000/rs6000.cc (TARGET_CONST_ANCHOR): New define. gcc/testsuite/ChangeLog: * gcc.target/powerpc/const_anchors.c: New test. * gcc.target/powerpc/try_const_anchors_ice.c: New test.
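A simplified sketch of the idea only, not GCC's try_const_anchors (which works through anchor values recorded by cse): a second constant can be derived from a register already holding a first one with a single addi when their difference fits in addi's signed 16-bit immediate. addi_reachable is a hypothetical helper name.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical check: can `wanted` be formed as known + imm with one
   addi, where imm is a signed 16-bit immediate (-0x8000..0x7fff)?  */
static bool addi_reachable (int64_t known, int64_t wanted, int64_t *diff)
{
  /* Subtract in unsigned arithmetic to avoid signed overflow; the
     conversion back to int64_t wraps on GCC.  */
  int64_t d = (int64_t) ((uint64_t) wanted - (uint64_t) known);
  if (d < -0x8000 || d >= 0x8000)
    return false;
  *diff = d;
  return true;
}
```

For the foo example in the commit message, the difference is 1, so the second store's constant becomes a single addi from the first instead of a multi-instruction lis/ori sequence.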
2023-06-19	Check SCALAR_INT_MODE_P in try_const_anchors	Jiufu Guo	1	-3/+2
	The const_anchor in cse.cc supports integer constants only. There is a "gcc_assert (SCALAR_INT_MODE_P (mode))" in try_const_anchors. In the latest code, some non-integer modes are used with const int. For example: "(set (mem/c:BLK (xx)) (const_int 0 [0]))" occurs in md files of rs6000, i386, arm, and pa. For this, the mode may be BLKmode. The pattern "(set (strict_low_part (xx)) (const_int xx))" could be generated in a few ports. For this, the mode may be VOIDmode. So try_const_anchors needs to skip modes other than scalar integer modes. Some discussions in the previous thread: https://gcc.gnu.org/pipermail/gcc-patches/2023-June/621097.html gcc/ChangeLog: * cse.cc (try_const_anchors): Check SCALAR_INT_MODE.
2023-06-19	Refined 256/512-bit vpacksswb/vpackssdw patterns.	liuhongt	3	-18/+252
	The packing in vpacksswb/vpackssdw is not a simple concat; it's an interleave of src1 and src2 for every 128 bits (or 64 bits for the ss_truncate result), i.e. dst[192-255] = ss_truncate (src2[128-255]), dst[128-191] = ss_truncate (src1[128-255]), dst[64-127] = ss_truncate (src2[0-127]), dst[0-63] = ss_truncate (src1[0-127]). The patch refines those patterns with an extra vec_select for the interleave. gcc/ChangeLog: PR target/110235 * config/i386/sse.md (<sse2_avx2>_packsswb<mask_name>): Substitute with .. (sse2_packsswb<mask_name>): .. this, .. (avx2_packsswb<mask_name>): .. this and .. (avx512bw_packsswb<mask_name>): .. this. (<sse2_avx2>_packssdw<mask_name>): Substitute with .. (sse2_packssdw<mask_name>): .. this, .. (avx2_packssdw<mask_name>): .. this and .. (avx512bw_packssdw<mask_name>): .. this. gcc/testsuite/ChangeLog: * gcc.target/i386/avx512bw-vpackssdw-3.c: New test. * gcc.target/i386/avx512bw-vpacksswb-3.c: New test.
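A scalar model of the 256-bit vpackssdw data movement spelled out above, assuming int32 sources and int16 results: each 128-bit lane takes four saturated values from src1 and then four from src2. The helper names are hypothetical and this is not the sse.md pattern itself.

```c
#include <stdint.h>

/* Signed saturation from 32 to 16 bits (ss_truncate).  */
static int16_t ss_truncate_si_hi (int32_t x)
{
  if (x > INT16_MAX)
    return INT16_MAX;
  if (x < INT16_MIN)
    return INT16_MIN;
  return (int16_t) x;
}

/* 256-bit vpackssdw: per 128-bit lane, four words from src1, then
   four words from src2 (not a plain concat of src1 and src2).  */
static void packssdw_256 (int16_t dst[16], const int32_t src1[8],
                          const int32_t src2[8])
{
  for (int lane = 0; lane < 2; lane++)
    for (int j = 0; j < 4; j++)
      {
        dst[lane * 8 + j] = ss_truncate_si_hi (src1[lane * 4 + j]);
        dst[lane * 8 + 4 + j] = ss_truncate_si_hi (src2[lane * 4 + j]);
      }
}
```

dst[8..11] (bits 128-191) come from src1[4..7] (bits 128-255), which is exactly the interleave a simple concat pattern fails to describe.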
2023-06-19	Reimplement packuswb/packusdw with UNSPEC_US_TRUNCATE instead of original ↵	liuhongt	4	-30/+59
	us_truncate. packuswb/packusdw does unsigned saturation of a signed source, but RTL us_truncate means unsigned saturation of an unsigned source. So for the value -1, packuswb will produce 0, but us_truncate produces 255. The patch reimplements those related patterns and functions with UNSPEC_US_TRUNCATE instead of us_truncate. gcc/ChangeLog: PR target/110235 * config/i386/i386-expand.cc (ix86_split_mmx_pack): Use UNSPEC_US_TRUNCATE instead of original us_truncate for packusdw/packuswb. * config/i386/mmx.md (mmx_pack<s_trunsuffix>swb): Substitute with .. (mmx_packsswb): .. this and .. (mmx_packuswb): .. this. (mmx_packusdw): Use UNSPEC_US_TRUNCATE instead of original us_truncate. (s_trunsuffix): Removed code iterator. (any_s_truncate): Ditto. * config/i386/sse.md (<sse2_avx2>_packuswb<mask_name>): Use UNSPEC_US_TRUNCATE instead of original us_truncate. (<sse4_1_avx2>_packusdw<mask_name>): Ditto. * config/i386/i386.md (UNSPEC_US_TRUNCATE): New unspec_c_enum.
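A short sketch contrasting the two saturation semantics described above for a 16-bit to 8-bit pack; the helper names are hypothetical. The same bit pattern 0xFFFF gives 0 when read as a signed source (packuswb) and 255 when read as an unsigned source (RTL us_truncate).

```c
#include <stdint.h>

/* packuswb: signed 16-bit source, unsigned 8-bit saturation.  */
static uint8_t pack_us_signed_src (int16_t x)
{
  if (x < 0)
    return 0;
  if (x > UINT8_MAX)
    return UINT8_MAX;
  return (uint8_t) x;
}

/* RTL us_truncate: the 16-bit source is treated as unsigned.  */
static uint8_t us_truncate_hi_qi (uint16_t x)
{
  return x > UINT8_MAX ? UINT8_MAX : (uint8_t) x;
}
```

The two only agree for non-negative inputs, which is why modelling packuswb with us_truncate was wrong for negative lanes.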
2023-06-19	Daily bump.	GCC Administrator	4	-1/+174

2023-06-19	RISC-V: Fix one typo for reduc expand GET_MODE_CLASS	Pan Li	1	-1/+1
	This patch fixes a typo in the GET_MODE_CLASS usage of the reduction expansion. Signed-off-by: Pan Li <pan2.li@intel.com> gcc/ChangeLog: * config/riscv/riscv-vector-builtins-bases.cc: Fix one typo.
2023-06-18	Silence warning in gcc.dg/lto/20091013-1_0.c	Jan Hubicka	1	-1/+1
	gcc/testsuite/ChangeLog: * gcc.dg/lto/20091013-1_0.c: Disable stringop-overread warning.
2023-06-18	RTL: Change return type of predicate and callback functions from int to bool	Uros Bizjak	4	-34/+34
	gcc/ChangeLog: * rtl.h (rtx_equal_p_callback_function): Change return type from int to bool. (rtx_equal_p): Ditto. (hash_rtx_callback_function): Ditto. * rtl.cc (rtx_equal_p): Change return type from int to bool and adjust function body accordingly. * early-remat.cc (scratch_equal): Ditto. * sel-sched-ir.cc (skip_unspecs_callback): Ditto. (hash_with_unspec_callback): Ditto.
2023-06-18	PR modula2/110284 Remove stor-layout.o and backend header files	Gaius Mulley	2	-10/+1
	This patch removes stor-layout.o from the front end and also removes back end header files from gcc-consolidation.h. gcc/m2/ChangeLog: PR modula2/110284 * Make-lang.in (m2_OBJS): Assign $(GM2_C_OBJS). (GM2_C_OBJS): Remove m2/stor-layout.o. (m2/stor-layout.o): Remove rule. * gm2-gcc/gcc-consolidation.h (rtl.h): Remove include. (df.h): Remove include. (except.h): Remove include. Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>