Age | Commit message | Author | Files | Lines
2023-05-30 | RISC-V: Allow all const_vec_duplicates as constants. | Robin Dapp | 7 | -43/+74
As we can always broadcast an integer constant to a vector register, allow them in riscv_const_insns. We need as many instructions as it takes to generate the constant, plus one vmv.v.x.

gcc/ChangeLog:
  * config/riscv/riscv.cc (riscv_const_insns): Allow const_vec_duplicates.

gcc/testsuite/ChangeLog:
  * gcc.target/riscv/rvv/autovec/vmv-imm-fixed-rv32.c: Add vmv.v.x tests.
  * gcc.target/riscv/rvv/autovec/vmv-imm-fixed-rv64.c: Ditto.
  * gcc.target/riscv/rvv/autovec/vmv-imm-run.c: Ditto.
  * gcc.target/riscv/rvv/autovec/vmv-imm-rv32.c: Ditto.
  * gcc.target/riscv/rvv/autovec/vmv-imm-rv64.c: Ditto.
  * gcc.target/riscv/rvv/autovec/vmv-imm-template.h: Ditto.
2023-05-30 | Detect bswap + rotate for byte permutation in pass_bswap. | liuhongt | 6 | -27/+342
The patch doesn't handle: 1. cast64_to_32, 2. memory source with rsize < range. gcc/ChangeLog: PR middle-end/108938 * gimple-ssa-store-merging.cc (is_bswap_or_nop_p): New function, cut from original find_bswap_or_nop function. (find_bswap_or_nop): Add a new parameter, detect bswap + rotate and save rotate result in the new parameter. (bswap_replace): Add a new parameter to indicate rotate and generate rotate stmt if needed. (maybe_optimize_vector_constructor): Adjust for new rotate parameter in the upper 2 functions. (pass_optimize_bswap::execute): Ditto. (imm_store_chain_info::output_merged_store): Ditto. gcc/testsuite/ChangeLog: * gcc.target/i386/pr108938-1.c: New test. * gcc.target/i386/pr108938-2.c: New test. * gcc.target/i386/pr108938-3.c: New test. * gcc.target/i386/pr108938-load-1.c: New test. * gcc.target/i386/pr108938-load-2.c: New test.
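As an illustration of the kind of byte permutation the subject line describes, here is a minimal sketch, not taken from the patch or its tests. It assumes a 32-bit value and shows a hand-written permutation (swapping the two bytes inside each 16-bit half) that equals a byte swap followed by a 16-bit rotate. The function names are hypothetical, and whether pass_bswap recognises this exact source form is not claimed here.

```c
#include <stdint.h>

/* Hand-written permutation: swap the two bytes inside each 16-bit half.
   For bytes b3 b2 b1 b0 (b3 most significant) the result is b2 b3 b0 b1. */
static uint32_t swap_bytes_in_halves(uint32_t x)
{
    return ((x & 0x00ff00ffu) << 8) | ((x & 0xff00ff00u) >> 8);
}

/* The same permutation expressed as bswap + rotate:
   bswap gives b0 b1 b2 b3, rotating by 16 bits gives b2 b3 b0 b1. */
static uint32_t bswap_then_rotate(uint32_t x)
{
    uint32_t swapped = __builtin_bswap32(x);
    return (swapped << 16) | (swapped >> 16);
}
```

Both functions compute the same result for every input, which is the equivalence a bswap + rotate detection can exploit.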
2023-05-30 | aarch64: Convert ADDLP and ADALP patterns to standard RTL codes | Kyrylo Tkachov | 3 | -19/+74
This patch converts the patterns for the integer widen and pairwise-add instructions to standard RTL operations. The pairwise addition within a vector can be represented as an addition of two vec_selects, one selecting the even elements and one selecting the odd elements. Thus for the intrinsic vpaddlq_s8 we can generate:

    (set (reg:V8HI 92)
      (plus:V8HI
        (vec_select:V8HI (sign_extend:V16HI (reg/v:V16QI 93 [ a ]))
          (parallel [
            (const_int 0 [0]) (const_int 2 [0x2])
            (const_int 4 [0x4]) (const_int 6 [0x6])
            (const_int 8 [0x8]) (const_int 10 [0xa])
            (const_int 12 [0xc]) (const_int 14 [0xe])
          ]))
        (vec_select:V8HI (sign_extend:V16HI (reg/v:V16QI 93 [ a ]))
          (parallel [
            (const_int 1 [0x1]) (const_int 3 [0x3])
            (const_int 5 [0x5]) (const_int 7 [0x7])
            (const_int 9 [0x9]) (const_int 11 [0xb])
            (const_int 13 [0xd]) (const_int 15 [0xf])
          ]))))

Similarly for the accumulating forms, where there is an extra outer PLUS for the accumulation. We already have the handy helper functions aarch64_stepped_int_parallel_p and aarch64_gen_stepped_int_parallel defined in aarch64.cc that we can use to define the right predicate for the VEC_SELECT PARALLEL. This patch allows us to remove some code iterators and the UNSPEC definitions for SADDLP and UADDLP. UNSPEC_UADALP and UNSPEC_SADALP are retained because they are still used by SVE2 patterns.

Bootstrapped and tested on aarch64-none-linux-gnu and aarch64_be-none-elf.

gcc/ChangeLog:
  * config/aarch64/aarch64-simd.md (aarch64_<sur>adalp<mode>): Delete.
  (aarch64_<su>adalp<mode>): New define_expand.
  (*aarch64_<su>adalp<mode><vczle><vczbe>_insn): New define_insn.
  (aarch64_<su>addlp<mode>): Convert to define_expand.
  (*aarch64_<su>addlp<mode><vczle><vczbe>_insn): New define_insn.
  * config/aarch64/iterators.md (UNSPEC_SADDLP, UNSPEC_UADDLP): Delete.
  (ADALP): Likewise.
  (USADDLP): Likewise.
  * config/aarch64/predicates.md (vect_par_cnst_even_or_odd_half): Define.
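The RTL in this commit message (a plus of two vec_selects over a sign-extended input) can be read as the following scalar model. This is only a sketch of the semantics for the signed 8-bit case, assuming little-endian lane numbering. The function name paddl_s8_model is hypothetical and is not part of GCC or the ACLE.

```c
#include <stdint.h>

/* Scalar model of a widening pairwise add on 16 signed bytes:
   out[i] = sign_extend(in[2*i]) + sign_extend(in[2*i + 1]).  */
static void paddl_s8_model(const int8_t in[16], int16_t out[8])
{
    for (int i = 0; i < 8; i++) {
        int16_t even = (int16_t) in[2 * i];      /* vec_select of even lanes */
        int16_t odd  = (int16_t) in[2 * i + 1];  /* vec_select of odd lanes  */
        out[i] = (int16_t) (even + odd);         /* cannot overflow 16 bits  */
    }
}
```

Because each input is widened before the addition, extreme pairs such as 127 + 127 or -128 + -128 are represented exactly in the 16-bit result.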
2023-05-30 | aarch64: Reimplement v(r)hadd and vhsub intrinsics with RTL codes | Kyrylo Tkachov | 2 | -16/+95
This patch reimplements the MD patterns for the UHADD, SHADD, UHSUB, SHSUB, URHADD and SRHADD instructions using standard RTL operations rather than unspecs. The correct RTL representation involves widening the inputs before adding them and halving, followed by a truncation back to the original mode.

An unfortunate wart in the patch is that we end up having very similar expanders for the intrinsics through the aarch64_<su>h<ADDSUB:optab><mode> and aarch64_<su>rhadd<mode> names and the standard names for the vector averaging optabs <su>avg<mode>3_floor and <su>avg<mode>3_ceil. I'd like to reuse <su>avg<mode>3_ceil for the intrinsics builtin as well, but our scheme in aarch64-simd-builtins.def and aarch64-builtins.cc makes it awkward by only allowing mappings of entries in aarch64-simd-builtins.def to:

  0   - CODE_FOR_aarch64_<name><mode>
  1-9 - CODE_FOR_<name><mode><1-9>
  10  - CODE_FOR_<name><mode>

whereas here we want a string after the <mode>, i.e. CODE_FOR_uavg<mode>3_ceil.

This patch adds a bit of remapping logic in aarch64-builtins.cc before the construction of the builtin info that remaps the CODE_FOR_* definitions in aarch64-simd-builtins.def to the optab-derived ones. CODE_FOR_aarch64_srhaddv4si gets remapped to CODE_FOR_avgv4si3_ceil, for example. It's a bit specific to this case, but this solution requires the least invasive changes while avoiding duplicate expanders just for the sake of a different pattern name.

Bootstrapped and tested on aarch64-none-linux-gnu and aarch64_be-none-elf.

gcc/ChangeLog:
  * config/aarch64/aarch64-builtins.cc (VAR1): Move to after inclusion of aarch64-builtin-iterators.h. Add definition to remap shadd, uhadd, srhadd, urhadd builtin codes for standard optab ones.
  * config/aarch64/aarch64-simd.md (<u>avg<mode>3_floor): Rename to...
  (<su_optab>avg<mode>3_floor): ... This. Expand to RTL codes rather than unspec.
  (<u>avg<mode>3_ceil): Rename to...
  (<su_optab>avg<mode>3_ceil): ... This. Expand to RTL codes rather than unspec.
  (aarch64_<su>hsub<mode>): New define_expand.
  (aarch64_<sur>h<addsub><mode><vczle><vczbe>): Split into...
  (*aarch64_<su>h<ADDSUB:optab><mode><vczle><vczbe>_insn): ... This...
  (*aarch64_<su>rhadd<mode><vczle><vczbe>_insn): ... And this.
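The "widen, add, halve, truncate" representation described in this commit can be sketched per element in C. This is an illustrative model for unsigned 8-bit lanes only. The function names are hypothetical, and the code is not the MD pattern itself.

```c
#include <stdint.h>

/* Halving add (floor average): the sum is formed in a wider type so the
   carry out of bit 7 is not lost, then halved and truncated.  */
static uint8_t uhadd_u8_model(uint8_t a, uint8_t b)
{
    uint16_t wide = (uint16_t) ((uint16_t) a + (uint16_t) b);
    return (uint8_t) (wide >> 1);
}

/* Rounding halving add (ceil average): same, with 1 added before halving.  */
static uint8_t urhadd_u8_model(uint8_t a, uint8_t b)
{
    uint16_t wide = (uint16_t) ((uint16_t) a + (uint16_t) b + 1u);
    return (uint8_t) (wide >> 1);
}
```

Doing the addition in the narrow type first would give the wrong answer for large inputs: 255 + 255 truncated to 8 bits is 254, and halving that yields 127 instead of 255.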
2023-05-30 | riscv: add work around for PR sanitizer/82501 | Andreas Schwab | 1 | -0/+1
gcc/testsuite/ PR sanitizer/82501 * c-c++-common/asan/pointer-compare-1.c: Disable use of small data on RISC-V.
2023-05-30 | riscv: update riscv_asan_shadow_offset | Andreas Schwab | 1 | -4/+3
gcc/ PR target/110036 * config/riscv/riscv.cc (riscv_asan_shadow_offset): Update to match libsanitizer.
2023-05-30 | stor-layout, aarch64: Express SRA intrinsics with RTL codes | Kyrylo Tkachov | 10 | -31/+206
This patch expresses the intrinsics for the SRA and RSRA instructions with standard RTL codes rather than relying on UNSPECs. These instructions perform a vector shift right plus accumulate, with an optional rounding constant addition for the RSRA variant. There are a number of interesting points:

* The scalar-in-SIMD-registers variant for DImode SRA, e.g. ssra d0, d1, #N, is left using the UNSPECs. Expressing it as a DImode plus+shift led to all kinds of trouble, as it started matching the existing define_insns for "add x0, x0, asr #N" instructions, and adding the SRA form as an extra alternative required a significant amount of deduplication of iterators and things still didn't work out well. I decided not to tackle that case in this patch. It can be attempted later.

* For the RSRA variants that add a rounding constant (1 << (shift-1)), the addition is notionally performed in a wider mode than the input types so that overflow is handled properly. In RTL this can be represented with an appropriate extend operation followed by a truncate back to the original modes. However, for 128-bit input modes such as V4SI we don't have appropriate modes defined for this widening, i.e. we'd need a V4DI mode to represent the intermediate widened result. This patch defines such modes for V16HI, V8SI, V4DI, V2TI. These will come in handy in the future too, as we have more Advanced SIMD instructions that have similar intermediate widening semantics.

* The above new modes led to a problem with stor-layout.cc. The new modes only exist for the sake of the RTL optimisers understanding the semantics of the instruction but are not intended to be moved to and from register or memory, assigned to types, used as TYPE_MODE or participate in auto-vectorisation. This is expressed in aarch64 by aarch64_classify_vector_mode returning zero for these new modes. 
However, the code in stor-layout.cc:<mode_for_vector> explicitly doesn't check this when picking a TYPE_MODE due to modes being made potentially available later through target switching (PR38240). This led to these modes being picked as TYPE_MODE for declarations such as: typedef int16_t vnx8hi __attribute__((vector_size (32))) when 256-bit fixed-length SVE modes are available and vector_type_mode later struggling to rectify this. This issue is addressed with the new target hook TARGET_VECTOR_MODE_SUPPORTED_ANY_TARGET_P that is intended to check if a vector mode can be used in any legal target attribute configuration of the port, as opposed to the existing TARGET_VECTOR_MODE_SUPPORTED_P that checks only the initial target configuration. This allows a simple adjustment in stor-layout.cc that still disqualifies these limited modes early on while allowing consideration of modes that can be turned on in the future with target attributes. Bootstrapped and tested on aarch64-none-linux-gnu. gcc/ChangeLog: * config/aarch64/aarch64-modes.def (V16HI, V8SI, V4DI, V2TI): New modes. * config/aarch64/aarch64-protos.h (aarch64_const_vec_rnd_cst_p): Declare prototype. (aarch64_const_vec_rsra_rnd_imm_p): Likewise. * config/aarch64/aarch64-simd.md (*aarch64_simd_sra<mode>): Rename to... (aarch64_<sra_op>sra_n<mode>_insn): ... This. (aarch64_<sra_op>rsra_n<mode>_insn): New define_insn. (aarch64_<sra_op>sra_n<mode>): New define_expand. (aarch64_<sra_op>rsra_n<mode>): Likewise. (aarch64_<sur>sra_n<mode>): Rename to... (aarch64_<sur>sra_ndi): ... This. * config/aarch64/aarch64.cc (aarch64_classify_vector_mode): Add any_target_p argument. (aarch64_extract_vec_duplicate_wide_int): Define. (aarch64_const_vec_rsra_rnd_imm_p): Likewise. (aarch64_const_vec_rnd_cst_p): Likewise. (aarch64_vector_mode_supported_any_target_p): Likewise. (TARGET_VECTOR_MODE_SUPPORTED_ANY_TARGET_P): Likewise. * config/aarch64/iterators.md (UNSPEC_SRSRA, UNSPEC_URSRA): Delete. (VSRA): Adjust for the above. 
(sur): Likewise. (V2XWIDE): New mode_attr. (vec_or_offset): Likewise. (SHIFTEXTEND): Likewise. * config/aarch64/predicates.md (aarch64_simd_rsra_rnd_imm_vec): New predicate. * doc/tm.texi (TARGET_VECTOR_MODE_SUPPORTED_P): Adjust description to clarify that it applies to current target options. (TARGET_VECTOR_MODE_SUPPORTED_ANY_TARGET_P): Document. * doc/tm.texi.in: Regenerate. * stor-layout.cc (mode_for_vector): Check vector_mode_supported_any_target_p when iterating through vector modes. * target.def (TARGET_VECTOR_MODE_SUPPORTED_P): Adjust description to clarify that it applies to current target options. (TARGET_VECTOR_MODE_SUPPORTED_ANY_TARGET_P): Define.
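The rounding constant handling described in this commit, where (1 << (shift-1)) is added in a wider mode before shifting and truncating, can be modelled per element as follows. This is a sketch for unsigned 32-bit lanes with a shift amount assumed to be in the range 1..32. The function name is hypothetical and the code is not the aarch64 pattern.

```c
#include <stdint.h>

/* Scalar model of an unsigned rounding shift-right-accumulate:
   acc + ((x + (1 << (shift - 1))) >> shift),
   with the rounding addition done in 64 bits so it cannot overflow.
   Assumes 1 <= shift <= 32.  */
static uint32_t ursra_u32_model(uint32_t acc, uint32_t x, unsigned shift)
{
    uint64_t wide = (uint64_t) x + ((uint64_t) 1 << (shift - 1));
    return acc + (uint32_t) (wide >> shift);
}
```

With x = 0xFFFFFFFF and shift = 1, the widened sum is 0x100000000 and the shifted result is 0x80000000. Performing the addition in 32 bits would wrap to 0 and lose the result entirely, which is why the intermediate wider modes are needed.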
2023-05-30 | ada: Fix wrong access for qualified aggregate with storage model | Eric Botcazou | 1 | -3/+6
The previous fix to get_storage_model_access was incomplete and needs to be extended to the node itself. gcc/ada/ * gcc-interface/trans.cc (get_storage_model_access): Also strip any type conversion in the node when unwinding the components.
2023-05-30 | ada: Fix internal error on qualified aggregate with storage model | Eric Botcazou | 1 | -17/+19
It comes from a small oversight in get_storage_model_access. gcc/ada/ * gcc-interface/trans.cc (node_is_component): Remove parentheses. (node_is_type_conversion): New predicate. (get_atomic_access): Use it. (get_storage_model_access): Likewise and look into the parent to find a component if it returns true. (present_in_lhs_or_actual_p): Likewise.
2023-05-30 | ada: Add missing guards for degenerate storage models | Eric Botcazou | 1 | -5/+10
gcc/ada/ * gcc-interface/trans.cc (Attribute_to_gnu) <Attr_Size>: Check that the storage model has Copy_From before instantiating loads for it. <Attr_Length>: Likewise. <Attr_Bit_Position>: Likewise. (gnat_to_gnu) <N_Indexed_Component>: Likewise. <N_Slice>: Likewise.
2023-05-30 | ada: Fix incorrect copies being used with 'Address | Marc Poulhiès | 1 | -4/+9
When using 'Address on an object with a size clause, gigi would end up creating a copy and using its address instead of the one of the original object, leading to incorrect behavior. Remove the conversion (that triggers the copy) when 'Address is applied to a declaration. gcc/ada/ * gcc-interface/trans.cc (Attribute_to_gnu): Also strip conversion in case of DECL.
2023-05-30 | ada: Fix bogus Storage_Error on dynamic array with static zero length | Eric Botcazou | 1 | -4/+21
This works around the limitations present for the support of arrays in the middle-end by clearing the TREE_OVERFLOW flag for arrays with zero length. gcc/ada/ * gcc-interface/decl.cc (gnat_to_gnu_entity) <E_Array_Type>: Use a local variable for the GNAT index type. <E_Array_Subtype>: Likewise. Call Is_Null_Range on the bounds and force the zero on TYPE_SIZE and TYPE_SIZE_UNIT if it returns true.
2023-05-30 | ada: Fix minor issue with Mod operator | Eric Botcazou | 1 | -4/+4
gcc/ada/ * gcc-interface/trans.cc (gnat_to_gnu) <N_Op_Mod>: Test the precision of the operation rather than that of the result type.
2023-05-30 | ada: Minor generic tweaks left and right | Eric Botcazou | 4 | -12/+14
No functional changes. gcc/ada/ * gcc-interface/decl.cc (gnat_to_gnu_entity) <E_Variable>: Replace integer_zero_node with null_pointer_node for pointer types. * gcc-interface/trans.cc (gnat_gimplify_expr) <NULL_EXPR>: Likewise. * gcc-interface/utils.cc (maybe_pad_type): Do not attempt to make a packable type from a fat pointer type. * gcc-interface/utils2.cc (build_atomic_load): Use a local variable. (build_atomic_store): Likewise.
2023-05-30 | ada: Make internal_error_function more robust | Eric Botcazou | 1 | -6/+16
gcc/ada/ * gcc-interface/misc.cc (internal_error_function): Be prepared for an input_location set to UNKNOWN_LOCATION.
2023-05-30 | ada: Adjust again the implementation of storage models | Eric Botcazou | 1 | -22/+29
The code generator must now be prepared to translate assignment statements to objects allocated with a storage model and that are not initialized yet. gcc/ada/ * gcc-interface/trans.cc (Attribute_to_gnu) <Attr_Size>: Tweak. (gnat_to_gnu) <N_Assignment_Statement>: Declare a local variable. For a target with a storage model, use the Actual_Designated_Subtype to compute the size if it is present.
2023-05-30 | ada: Simplify the implementation of storage models | Eric Botcazou | 1 | -103/+27
As the additional temporaries required by the semantics of nonnative storage models are now created by the front-end, in particular for actual parameters and assignment statements, the corresponding code in gigi can be removed. gcc/ada/ * gcc-interface/trans.cc (Call_to_gnu): Remove code implementing the by-copy semantics for actuals with nonnative storage models. (gnat_to_gnu) <N_Assignment_Statement>: Remove code instantiating a temporary for assignments between nonnative storage models.
2023-05-30 | ada: Make use of Cannot_Be_Superflat flag on N_Range nodes | Eric Botcazou | 2 | -3/+4
gcc/ada/ * gcc-interface/decl.cc (range_cannot_be_superflat): Return true immediately if Cannot_Be_Superflat is set. * gcc-interface/misc.cc (gnat_post_options): Do not override the -Wstringop-overflow setting.
2023-05-30 | ada: Disable PIE mode during the build of the Ada front-end | Eric Botcazou | 1 | -13/+3
This also removes some obsolete stuff. gcc/ada/ * gcc-interface/Make-lang.in (ADA_CFLAGS): Move up. (ALL_ADAFLAGS): Add $(NO_PIE_CFLAGS). (ada/mdll.o): Remove. (ada/mdll-fil.o): Likewise. (ada/mdll-utl.o): Likewise.
2023-05-30 | ada: Fix storage model handling for dereference as lvalue and renamings | Marc Poulhiès | 1 | -3/+21
Don't require storage access for explicit dereferences used as lvalue (e.g. Some_Access.all'Address) or for renamings. gcc/ada/ * gcc-interface/trans.cc (get_storage_model_access): Don't require storage model access for dereference used as lvalue or renamings.
2023-05-30 | ada: Small cleanups and fixes in expansion of aggregates | Eric Botcazou | 1 | -62/+28
This streamlines the handling of qualified expressions in the expansion of aggregates and plugs a couple of loopholes that may cause memory leaks. gcc/ada/ * exp_aggr.adb (Build_Array_Aggr_Code): Move the declaration of Typ to the beginning. (Initialize_Array_Component): Test the unqualified version of the expression for the nested array case. (Initialize_Ctrl_Array_Component): Do not duplicate the expression here. Do the pattern matching of the unqualified version of it. (Gen_Assign): Call Unqualify to compute Expr_Q and use Expr_Q in subsequent pattern matching. (Initialize_Ctrl_Record_Component): Do the pattern matching of the unqualified version of the aggregate. (Build_Record_Aggr_Code): Call Unqualify. (Convert_Aggr_In_Assignment): Likewise. (Convert_Aggr_In_Object_Decl): Likewise. (Component_OK_For_Backend): Likewise. (Is_Delayed_Aggregate): Likewise.
2023-05-30 | ada: Fix wrong expansion of array aggregate with noncontiguous choices | Eric Botcazou | 1 | -20/+18
This extends an earlier fix done for the others choice of an array aggregate to all the choices of the aggregate, since the same sharing issue may happen when the choices are not contiguous. gcc/ada/ * exp_aggr.adb (Build_Array_Aggr_Code.Get_Assoc_Expr): Duplicate the expression here instead of... (Build_Array_Aggr_Code): ...here.
2023-05-30 | ada: Fix internal error on array constant in expression function | Eric Botcazou | 1 | -4/+21
This happens when the peculiar check emitted by Check_Large_Modular_Array is applied to an object whose actual subtype is an itype with dynamic size, because the first reference to the itype in the expanded code may turn out to be within the raise statement, which is problematic for the elaboration of this itype by the code generator at library level.

gcc/ada/
  * freeze.adb (Check_Large_Modular_Array): Fix head comment, use Standard_Long_Long_Integer_Size directly and generate a reference just before the raise statement if the Etype of the object is an itype declared in an open scope.
2023-05-30 | ada: Fix fallout of recent fix for missing finalization | Eric Botcazou | 1 | -10/+26
The original fix makes it possible to create transient scopes around return statements in more cases, but it overlooks that transient scopes are reused and, in particular, that they can be promoted to secondary stack management. gcc/ada/ * exp_ch7.adb (Find_Enclosing_Transient_Scope): Return the index in the scope table instead of the scope's entity. (Establish_Transient_Scope): If an enclosing scope already exists, do not set the Uses_Sec_Stack flag on it if the node to be wrapped is a return statement which requires secondary stack management.
2023-05-30 | ada: Add System.Traceback.Symbolic.Module_Name support on AArch64 Linux | Joel Brobecker | 1 | -0/+2
This commit changes the runtime on aarch64-linux to use the Linux version of s-tsmona.adb, so as to add support for this functionality on aarch64-linux. gcc/ada/ * Makefile.rtl: Use libgnat/s-tsmona__linux.adb on aarch64-linux. Link libgnat with -ldl, as the use of s-tsmona__linux.adb requires it.
2023-05-30 | ada: Only build access-to-subprogram wrappers when expander is active | Piotr Trojanek | 2 | -14/+2
For access-to-subprogram types with Pre/Post aspects we create a wrapper routine that evaluates these aspects. Spec of this wrapper was created always, while its body was only created when expansion was enabled. Now we only create these wrappers when expansion is enabled. In particular, we don't create them in GNATprove mode; instead, GNATprove picks the Pre/Post expressions directly from the aspects. gcc/ada/ * exp_ch3.adb (Build_Access_Subprogram_Wrapper_Body): Build wrapper body if requested by routine that builds wrapper spec. * sem_ch3.adb (Analyze_Full_Type_Declaration): Only build wrapper when expander is active. (Build_Access_Subprogram_Wrapper): Remove special-case for GNATprove.
2023-05-30 | ada: Fix minor issues in user's guide | Ronan Desplanques | 3 | -36/+32
gcc/ada/ * doc/gnat_ugn/building_executable_programs_with_gnat.rst: Fix minor issues. * doc/gnat_ugn/the_gnat_compilation_model.rst: Fix minor issues. * gnat_ugn.texi: Regenerate.
2023-05-30 | ada: Ensure Default_Stack_Size is greater than Minimum_Stack_Size | Johannes Kliemann | 3 | -2/+13
The Default_Stack_Size function does not check that the binder-specified default stack size is greater than the minimum stack size for the runtime. This can result in tasks using default stack sizes smaller than the minimum stack size, because Adjust_Storage_Size only adjusts storage sizes for tasks that explicitly specify a storage size. To avoid this, the binder-specified default stack size is rounded up to the minimum stack size if required.

gcc/ada/
  * libgnat/s-parame.adb: Check that Default_Stack_Size >= Minimum_Stack_size.
  * libgnat/s-parame__rtems.adb: Ditto.
  * libgnat/s-parame__vxworks.adb: Check that Default_Stack_Size >= Minimum_Stack_size and use the proper Minimum_Stack_Size if Stack_Check_Limits is enabled.
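The rounding rule described in this commit amounts to clamping the default to a lower bound. The following is a language-neutral sketch written in C rather than Ada. The function and parameter names are hypothetical, and it does not model any other logic in the s-parame runtime units, such as how an unset binder value is handled.

```c
/* Sketch: never hand out a default stack size below the runtime minimum.
   Both values are byte counts; names are illustrative only.  */
static long clamp_default_stack_size(long binder_default, long minimum)
{
    return binder_default < minimum ? minimum : binder_default;
}
```

A binder default larger than the minimum is returned unchanged, so only undersized defaults are affected.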
2023-05-30 | ada: Fix regression of secondary stack management in return statements | Eric Botcazou | 1 | -38/+31
This happens when the expression of the return statement is a call that does not return on the same stack as the enclosing function. gcc/ada/ * sem_res.adb (Resolve_Call): Restrict previous change to calls that return on the same stack as the enclosing function. Tidy up.
2023-05-30 | ada: Use generalized loop iteration in Put_Image routines | Eric Botcazou | 2 | -16/+10
gcc/ada/ * libgnat/a-cidlli.adb (Put_Image): Simplify. * libgnat/a-coinve.adb (Put_Image): Likewise.
2023-05-30 | ada: Fix visibility error with DIC or Type_Invariant aspect on generic type | Eric Botcazou | 1 | -2/+17
The compiler fails to capture global references during the analysis of the aspect on the generic type because it analyzes a copy of the expression. gcc/ada/ * exp_util.adb (Build_DIC_Procedure_Body.Add_Own_DIC): When inside a generic unit, preanalyze the expression directly. (Build_Invariant_Procedure_Body.Add_Own_Invariants): Likewise.
2023-05-30 | ada: Fix coding style in init.c | Cedric Landet | 1 | -2/+2
The coding style rules discourage FIXME comments; ??? is preferred.

gcc/ada/
  * init.c: Replace FIXME by ???
2023-05-30 | Handle FMA friendly in reassoc pass | Lili Cui | 3 | -83/+217
Make some changes in the reassoc pass to make it friendlier to the later FMA pass. Using FMA instead of mult + add reduces register pressure and the number of instructions retired.

There are two main changes:
1. Put no-mult ops and mult ops alternately at the end of the queue, which is conducive to generating more FMAs and reduces the loss of FMAs when breaking the chain.
2. Rewrite the rewrite_expr_tree_parallel function to try to build parallel chains according to the given correlation width, keeping as many FMA opportunities as possible.

With the patch applied:

On ICX:
  507.cactuBSSN_r: Improved by 1.7% for multi-copy.
  503.bwaves_r: Improved by 0.60% for single copy.
  507.cactuBSSN_r: Improved by 1.10% for single copy.
  519.lbm_r: Improved by 2.21% for single copy.
  No measurable changes for other benchmarks.

On aarch64:
  507.cactuBSSN_r: Improved by 1.7% for multi-copy.
  503.bwaves_r: Improved by 6.00% for single-copy.
  No measurable changes for other benchmarks.

TEST1:

    float
    foo (float a, float b, float c, float d, float *e)
    {
      return *e + a * b + c * d ;
    }

For "-Ofast -mfpmath=sse -mfma" GCC generates:

    vmulss %xmm3, %xmm2, %xmm2
    vfmadd132ss %xmm1, %xmm2, %xmm0
    vaddss (%rdi), %xmm0, %xmm0
    ret

With this patch GCC generates:

    vfmadd213ss (%rdi), %xmm1, %xmm0
    vfmadd231ss %xmm2, %xmm3, %xmm0
    ret

TEST2:

    for (int i = 0; i < N; i++)
    {
      a[i] += b[i]* c[i] + d[i] * e[i] + f[i] * g[i] + h[i] * j[i] + k[i] * l[i] + m[i]* o[i] + p[i];
    }

For "-Ofast -mfpmath=sse -mfma" GCC generates:

    vmovapd e(%rax), %ymm4
    vmulpd d(%rax), %ymm4, %ymm3
    addq $32, %rax
    vmovapd c-32(%rax), %ymm5
    vmovapd j-32(%rax), %ymm6
    vmulpd h-32(%rax), %ymm6, %ymm2
    vmovapd a-32(%rax), %ymm6
    vaddpd p-32(%rax), %ymm6, %ymm0
    vmovapd g-32(%rax), %ymm7
    vfmadd231pd b-32(%rax), %ymm5, %ymm3
    vmovapd o-32(%rax), %ymm4
    vmulpd m-32(%rax), %ymm4, %ymm1
    vmovapd l-32(%rax), %ymm5
    vfmadd231pd f-32(%rax), %ymm7, %ymm2
    vfmadd231pd k-32(%rax), %ymm5, %ymm1
    vaddpd %ymm3, %ymm0, %ymm0
    vaddpd %ymm2, %ymm0, %ymm0
    vaddpd %ymm1, %ymm0, %ymm0
    vmovapd %ymm0, a-32(%rax)
    cmpq $8192, %rax
    jne .L4
    vzeroupper
    ret

With this patch applied GCC breaks the chain with width = 2 and generates 6 FMAs:

    vmovapd a(%rax), %ymm2
    vmovapd c(%rax), %ymm0
    addq $32, %rax
    vmovapd e-32(%rax), %ymm1
    vmovapd p-32(%rax), %ymm5
    vmovapd g-32(%rax), %ymm3
    vmovapd j-32(%rax), %ymm6
    vmovapd l-32(%rax), %ymm4
    vmovapd o-32(%rax), %ymm7
    vfmadd132pd b-32(%rax), %ymm2, %ymm0
    vfmadd132pd d-32(%rax), %ymm5, %ymm1
    vfmadd231pd f-32(%rax), %ymm3, %ymm0
    vfmadd231pd h-32(%rax), %ymm6, %ymm1
    vfmadd231pd k-32(%rax), %ymm4, %ymm0
    vfmadd231pd m-32(%rax), %ymm7, %ymm1
    vaddpd %ymm1, %ymm0, %ymm0
    vmovapd %ymm0, a-32(%rax)
    cmpq $8192, %rax
    jne .L2
    vzeroupper
    ret

gcc/ChangeLog:
  PR tree-optimization/98350
  * tree-ssa-reassoc.cc (rewrite_expr_tree_parallel): Rewrite this function.
  (rank_ops_for_fma): New.
  (reassociate_bb): Handle new function.

gcc/testsuite/ChangeLog:
  PR tree-optimization/98350
  * gcc.dg/pr98350-1.c: New test.
  * gcc.dg/pr98350-2.c: Ditto.
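The width = 2 result for TEST2 can be read as two independent accumulator chains, each step having the multiply-add shape that an FMA instruction implements. The sketch below mirrors that shape for one loop iteration in plain C. It is an illustration of the resulting association order, not the reassoc algorithm itself. The function name and the array-based parameter layout are assumptions made for brevity.

```c
/* v holds the twelve multiplicands in the order b, c, d, e, f, g, h, j,
   k, l, m, o from TEST2; a and p are the two non-multiplied terms.
   Each statement has the form acc + x * y, i.e. one FMA candidate.  */
static double sum_products_two_chains(const double v[12], double a, double p)
{
    double chain0 = a + v[0] * v[1];     /* a + b*c */
    double chain1 = p + v[2] * v[3];     /* p + d*e */
    chain0 = chain0 + v[4] * v[5];       /* + f*g   */
    chain1 = chain1 + v[6] * v[7];       /* + h*j   */
    chain0 = chain0 + v[8] * v[9];       /* + k*l   */
    chain1 = chain1 + v[10] * v[11];     /* + m*o   */
    return chain0 + chain1;              /* single final add */
}
```

The two chains have no data dependence on each other until the final addition, so their multiply-adds can be issued in parallel, and all six products are folded into multiply-add steps instead of separate multiplies and adds.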
2023-05-30 | rtlanal: Change return type of predicate functions from int to bool | Uros Bizjak | 3 | -275/+275
gcc/ChangeLog: * rtl.h (rtx_addr_can_trap_p): Change return type from int to bool. (rtx_unstable_p): Ditto. (reg_mentioned_p): Ditto. (reg_referenced_p): Ditto. (reg_used_between_p): Ditto. (reg_set_between_p): Ditto. (modified_between_p): Ditto. (no_labels_between_p): Ditto. (modified_in_p): Ditto. (reg_set_p): Ditto. (multiple_sets): Ditto. (set_noop_p): Ditto. (noop_move_p): Ditto. (reg_overlap_mentioned_p): Ditto. (dead_or_set_p): Ditto. (dead_or_set_regno_p): Ditto. (find_reg_fusage): Ditto. (find_regno_fusage): Ditto. (side_effects_p): Ditto. (volatile_refs_p): Ditto. (volatile_insn_p): Ditto. (may_trap_p_1): Ditto. (may_trap_p): Ditto. (may_trap_or_fault_p): Ditto. (computed_jump_p): Ditto. (auto_inc_p): Ditto. (loc_mentioned_in_p): Ditto. * rtlanal.cc (computed_jump_p_1): Adjust forward declaration. (rtx_unstable_p): Change return type from int to bool and adjust function body accordingly. (rtx_addr_can_trap_p): Ditto. (reg_mentioned_p): Ditto. (no_labels_between_p): Ditto. (reg_used_between_p): Ditto. (reg_referenced_p): Ditto. (reg_set_between_p): Ditto. (reg_set_p): Ditto. (modified_between_p): Ditto. (modified_in_p): Ditto. (multiple_sets): Ditto. (set_noop_p): Ditto. (noop_move_p): Ditto. (reg_overlap_mentioned_p): Ditto. (dead_or_set_p): Ditto. (dead_or_set_regno_p): Ditto. (find_reg_fusage): Ditto. (find_regno_fusage): Ditto. (remove_node_from_insn_list): Ditto. (volatile_insn_p): Ditto. (volatile_refs_p): Ditto. (side_effects_p): Ditto. (may_trap_p_1): Ditto. (may_trap_p): Ditto. (may_trap_or_fault_p): Ditto. (computed_jump_p): Ditto. (auto_inc_p): Ditto. (loc_mentioned_in_p): Ditto. * combine.cc (can_combine_p): Update indirect function.
2023-05-30 | RISC-V: Add floating-point to integer conversion RVV auto-vectorization support | Juzhe-Zhong | 7 | -1/+110
We can't yet support floating-point operations that depend on FRM (for example, vfadd support is blocked), since the RVV intrinsic doc is not updated and we can't support mode switching for them. We can, however, support floating-point to integer conversion now, since it does not depend on FRM and needs no mode switching support ('rtz' conversions are independent of FRM).

Signed-off-by: Juzhe-Zhong <juzhe.zhong@rivai.ai>

gcc/ChangeLog:
  * config/riscv/autovec.md (<optab><mode><vconvert>2): New pattern.
  * config/riscv/iterators.md: New attribute.
  * config/riscv/vector-iterators.md: New attribute.

gcc/testsuite/ChangeLog:
  * gcc.target/riscv/rvv/autovec/conversions/vfcvt_rtz-run.c: New test.
  * gcc.target/riscv/rvv/autovec/conversions/vfcvt_rtz-rv32gcv.c: New test.
  * gcc.target/riscv/rvv/autovec/conversions/vfcvt_rtz-rv64gcv.c: New test.
  * gcc.target/riscv/rvv/autovec/conversions/vfcvt_rtz-template.h: New test.
2023-05-30 | RISC-V: Fix warning in riscv.md | Juzhe-Zhong | 1 | -2/+2
The build emits the following warnings:

    ../../../riscv-gcc/gcc/config/riscv/riscv.md:1356:32: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
       if (INTVAL (operands[2]) == GET_MODE_MASK (HImode))
    ../../../riscv-gcc/gcc/config/riscv/riscv.md:1358:37: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
       else if (INTVAL (operands[2]) == GET_MODE_MASK (SImode))
    ../../../riscv-gcc/gcc/config/riscv/riscv.md: In function ‘rtx_def* gen_anddi3(rtx, rtx, rtx)’:
    ../../../riscv-gcc/gcc/config/riscv/riscv.md:1356:32: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
       if (INTVAL (operands[2]) == GET_MODE_MASK (HImode))
    ../../../riscv-gcc/gcc/config/riscv/riscv.md:1358:37: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
       else if (INTVAL (operands[2]) == GET_MODE_MASK (SImode))

Add an unsigned conversion to fix these warnings.

Signed-off-by: Juzhe-Zhong <juzhe.zhong@rivai.ai>

gcc/ChangeLog:
  * config/riscv/riscv.md: Fix signed and unsigned comparison warning.
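The general shape of such a -Wsign-compare fix can be sketched outside of GCC's sources. Here int64_t and uint64_t are stand-ins (an assumption) for the signed value returned by INTVAL and the unsigned value produced by GET_MODE_MASK. The function name is hypothetical and this is not the actual riscv.md change.

```c
#include <stdbool.h>
#include <stdint.h>

/* Comparing a signed value with an unsigned mask directly triggers
   -Wsign-compare; converting the signed side explicitly makes the
   intended unsigned comparison visible and silences the warning.  */
static bool is_mode_mask(int64_t value, uint64_t mode_mask)
{
    return (uint64_t) value == mode_mask;
}
```

The explicit conversion does not change the result of the comparison, since the implicit conversion would have done the same thing; it only documents the intent.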
2023-05-30 | RISC-V: Add RVV FNMA auto-vectorization support | Juzhe-Zhong | 7 | -0/+432
Like FMA, add FNMA (VNMSAC or VNMSUB) auto-vectorization support.

Signed-off-by: Juzhe-Zhong <juzhe.zhong@rivai.ai>

gcc/ChangeLog:
  * config/riscv/autovec.md (fnma<mode>4): New pattern.
  (*fnma<mode>): Ditto.

gcc/testsuite/ChangeLog:
  * gcc.target/riscv/rvv/autovec/ternop/ternop-4.c: New test.
  * gcc.target/riscv/rvv/autovec/ternop/ternop-5.c: New test.
  * gcc.target/riscv/rvv/autovec/ternop/ternop-6.c: New test.
  * gcc.target/riscv/rvv/autovec/ternop/ternop_run-4.c: New test.
  * gcc.target/riscv/rvv/autovec/ternop/ternop_run-5.c: New test.
  * gcc.target/riscv/rvv/autovec/ternop/ternop_run-6.c: New test.
2023-05-30 | Daily bump. | GCC Administrator | 3 | -1/+41
2023-05-29 | RISC-V: Optimize TARGET_XTHEADCONDMOV | Die Li | 3 | -98/+42
This patch allows fewer instructions to be used when TARGET_XTHEADCONDMOV is enabled. Here is an example from the existing testcases.

Testcase:

    int ConEmv_imm_imm_reg(int x, int y){
      if (x == 1000) return 10;
      return y;
    }

Cflags: -O2 -march=rv64gc_xtheadcondmov -mabi=lp64d

Before patch:

    ConEmv_imm_imm_reg:
      addi a5,a0,-1000
      li a0,10
      th.mvnez a0,zero,a5
      th.mveqz a1,zero,a5
      or a0,a0,a1
      ret

After patch:

    ConEmv_imm_imm_reg:
      addi a5,a0,-1000
      li a0,10
      th.mvnez a0,a1,a5
      ret

Signed-off-by: Die Li <lidie@eswincomputing.com>

gcc/ChangeLog:
  * config/riscv/riscv.cc (riscv_expand_conditional_move_onesided): Delete.
  (riscv_expand_conditional_move): Reuse the TARGET_SFB_ALU expand process for TARGET_XTHEADCONDMOV.

gcc/testsuite/ChangeLog:
  * gcc.target/riscv/xtheadcondmov-indirect-rv32.c: Update the output.
  * gcc.target/riscv/xtheadcondmov-indirect-rv64.c: Likewise.
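To see why the shorter sequence is sufficient, the three instructions after the patch can be traced with a small C model. The helper th_mvnez_model is a hypothetical stand-in that assumes the usual conditional-move reading of th.mvnez rd, rs1, rs2, namely "if rs2 is nonzero, rd takes rs1, otherwise rd is unchanged".

```c
/* Assumed semantics of th.mvnez rd, rs1, rs2: rd = (rs2 != 0) ? rs1 : rd.  */
static int th_mvnez_model(int rd, int rs1, unsigned rs2)
{
    return rs2 != 0 ? rs1 : rd;
}

/* Trace of the "after patch" sequence for ConEmv_imm_imm_reg.  */
static int cond_move_after_patch(int x, int y)
{
    unsigned a5 = (unsigned) x - 1000u;   /* addi a5,a0,-1000 */
    int a0 = 10;                          /* li a0,10         */
    a0 = th_mvnez_model(a0, y, a5);       /* th.mvnez a0,a1,a5 */
    return a0;
}
```

Since the destination register already holds the constant 10 when the condition fails, one conditional move that overwrites it with y is enough, and the second conditional move and the or from the earlier sequence are not needed.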
2023-05-29 | i386: Also require TARGET_AVX512BW to generate truncv16hiv16qi2 [PR110021] | Uros Bizjak | 1 | -1/+1
gcc/ChangeLog: PR target/110021 * config/i386/i386-expand.cc (ix86_expand_vecop_qihi2): Also require TARGET_AVX512BW to generate truncv16hiv16qi2.
2023-05-29 | RISC-V: Use extension instructions instead of bitwise "and" | Jivan Hakobyan | 4 | -1/+104
When the target supports extension instructions, it is preferable to use them instead of achieving the same result in other ways. For the following case

    void foo (unsigned long a, unsigned long* ptr) {
      ptr[0] = a & 0xffffffffUL;
      ptr[1] &= 0xffffffffUL;
    }

GCC generates

    foo:
      li a5,-1
      srli a5,a5,32
      and a0,a0,a5
      sd a0,0(a1)
      ld a4,8(a1)
      and a5,a4,a5
      sd a5,8(a1)
      ret

but it is more profitable to generate this one

    foo:
      zext.w a0,a0
      sd a0,0(a1)
      lwu a5,8(a1)
      sd a5,8(a1)
      ret

This patch fixes the issue described above. It supports HI -> DI, HI -> SI and SI -> DI extensions.

gcc/ChangeLog:
  * config/riscv/riscv.md (and<mode>3): New expander.
  (*and<mode>3): New pattern.
  * config/riscv/predicates.md (arith_operand_or_mode_mask): New predicate.

gcc/testsuite/ChangeLog:
  * gcc.target/riscv/and-extend-1.c: New test.
  * gcc.target/riscv/and-extend-2.c: New test.
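The rewrite relies on a simple identity: masking with a mode mask is the same operation as zero-extending from that mode. A minimal C sketch of the identity, using fixed-width types and hypothetical function names:

```c
#include <stdint.h>

/* "and" with the SImode mask ...  */
static uint64_t and_with_si_mask(uint64_t a)
{
    return a & 0xffffffffULL;
}

/* ... equals zero-extension of the low 32 bits (what zext.w computes).  */
static uint64_t zero_extend_si(uint64_t a)
{
    return (uint64_t) (uint32_t) a;
}

/* The same holds for the HImode mask and a 16-bit zero-extension.  */
static uint64_t and_with_hi_mask(uint64_t a)
{
    return a & 0xffffULL;
}

static uint64_t zero_extend_hi(uint64_t a)
{
    return (uint64_t) (uint16_t) a;
}
```

Because the two forms agree for every input, the compiler is free to pick whichever is cheaper on the target, which avoids materialising the mask constant in a register.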
2023-05-29RISC-V: Refactor comments and naming of riscv-v.cc.Pan Li1-47/+49
This patch removes unnecessary comments on some self-explanatory parameters and chooses better names to avoid confusion. Signed-off-by: Pan Li <pan2.li@intel.com> gcc/ChangeLog: * config/riscv/riscv-v.cc (emit_vlmax_insn): Remove unnecessary comments and rename local variables. (emit_nonvlmax_insn): Ditto. (emit_vlmax_merge_insn): Ditto. (emit_vlmax_cmp_insn): Ditto. (emit_vlmax_cmp_mu_insn): Ditto. (emit_scalar_move_insn): Ditto.
2023-05-29Daily bump.GCC Administrator4-1/+404
2023-05-29RISC-V: Eliminate the magic number in riscv-v.ccPan Li1-31/+46
This patch removes the magic number in riscv-v.cc and replaces its occurrences with a single macro. Signed-off-by: Pan Li <pan2.li@intel.com> gcc/ChangeLog: * config/riscv/riscv-v.cc (emit_vlmax_insn): Eliminate the magic number. (emit_nonvlmax_insn): Ditto. (emit_vlmax_merge_insn): Ditto. (emit_vlmax_cmp_insn): Ditto. (emit_vlmax_cmp_mu_insn): Ditto. (expand_vec_series): Ditto.
2023-05-29RISC-V: Using merge approach to optimize repeating sequence in vec_initPan Li11-6/+457
This patch optimizes VLS vector initialization for repeating sequences, switching from vslide1down to vmerge under a simple cost model in which every instruction has a cost of 1.

Given code with -march=rv64gcv_zvl256b --param riscv-autovec-preference=fixed-vlmax

    typedef int64_t vnx32di __attribute__ ((vector_size (256)));

    __attribute__ ((noipa)) void
    f_vnx32di (int64_t a, int64_t b, int64_t *out)
    {
      vnx32di v = {
        a, b, a, b, a, b, a, b,
        a, b, a, b, a, b, a, b,
        a, b, a, b, a, b, a, b,
        a, b, a, b, a, b, a, b,
      };
      *(vnx32di *) out = v;
    }

Before this patch:

    vslide1down.vx (x31 times)

After this patch:

    li a5,-1431654400
    addi a5,a5,-1365
    li a3,-1431654400
    addi a3,a3,-1366
    slli a5,a5,32
    add a5,a5,a3
    vsetvli a4,zero,e64,m8,ta,ma
    vmv.v.x v8,a0
    vmv.s.x v0,a5
    vmerge.vxm v8,v8,a1,v0
    vs8r.v v8,0(a2)

Since we don't have SEW = 128 in vec_duplicate, we can't combine ab into a SEW = 128 element and then broadcast this big element.

Signed-off-by: Pan Li <pan2.li@intel.com>
Co-authored-by: Juzhe-Zhong <juzhe.zhong@rivai.ai>

gcc/ChangeLog:
* config/riscv/riscv-protos.h (enum insn_type): New type.
* config/riscv/riscv-v.cc (RVV_INSN_OPERANDS_MAX): New macro.
(rvv_builder::can_duplicate_repeating_sequence_p): Align the referenced class member.
(rvv_builder::get_merged_repeating_sequence): Ditto.
(rvv_builder::repeating_sequence_use_merge_profitable_p): New function to evaluate the optimization cost.
(rvv_builder::get_merge_scalar_mask): New function to get the merge mask.
(emit_scalar_move_insn): New function to emit vmv.s.x.
(emit_vlmax_integer_move_insn): New function to emit vlmax vmv.v.x.
(emit_nonvlmax_integer_move_insn): New function to emit nonvlmax vmv.v.x.
(get_repeating_sequence_dup_machine_mode): New function to get the dup machine mode.
(expand_vector_init_merge_repeating_sequence): New function to perform the optimization.
(expand_vec_init): Add this vector init optimization.
* config/riscv/riscv.h (BITS_PER_WORD): New macro.

gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-4.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-5.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-run-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-run-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-run-3.c: New test.
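The broadcast-then-merge idea in this commit can be sketched as a scalar C model. This is not the code in riscv-v.cc: merge_mask and broadcast_then_merge are hypothetical names, and the model assumes at most 64 elements so that the mask fits in one scalar register, as in the example above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Build a mask with bit i set when element i of the repeating sequence
   equals `scalar`, i.e. when the merge should pick the scalar operand. */
uint64_t merge_mask(const int64_t *pattern, size_t npatterns,
                    size_t nelts, int64_t scalar) {
  assert(npatterns > 0 && nelts <= 64);
  uint64_t mask = 0;
  for (size_t i = 0; i < nelts; i++)
    if (pattern[i % npatterns] == scalar)
      mask |= (uint64_t)1 << i;
  return mask;
}

/* Scalar model of vmv.v.x followed by vmerge.vxm: broadcast `base` to
   every element, then overwrite the elements selected by `mask`. */
void broadcast_then_merge(int64_t *out, size_t nelts, int64_t base,
                          int64_t scalar, uint64_t mask) {
  assert(nelts <= 64);
  for (size_t i = 0; i < nelts; i++)   /* vmv.v.x */
    out[i] = base;
  for (size_t i = 0; i < nelts; i++)   /* vmerge.vxm */
    if ((mask >> i) & 1)
      out[i] = scalar;
}
```

For the sequence a, b, a, b, ... the mask has every odd bit set. This matches the constant that the li/addi/slli/add sequence in the commit message materializes (0xAAAAAAAAAAAAAAAA, of which only the low 32 bits matter for 32 elements).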
2023-05-29Fix incorrect SLOC inherited by induction variable incrementEric Botcazou1-4/+4
This extends the condition to more cases involving debug instructions. gcc/ * tree-ssa-loop-manip.cc (create_iv): Try harder to find a SLOC to put onto the increment when it is inserted after the position.
2023-05-29Fix artificial overflow during GENERIC foldingEric Botcazou2-0/+27
The Ada compiler gives a bogus warning: storage_offset1.ads:16:52: warning: Constraint_Error will be raised at run time [enabled by default] Ironically enough, this occurs because of an intermediate conversion to an unsigned type which is supposed to hide overflows but is counter-productive for constants because TREE_OVERFLOW is always set for them, so it ends up setting a bogus TREE_OVERFLOW when converting back to the original type. The fix simply redirects INTEGER_CSTs to the other, direct path without the intermediate conversion to the unsigned type. gcc/ * match.pd ((T)P - (T)(P + A) -> -(T) A): Avoid artificial overflow on constants. gcc/testsuite/ * gnat.dg/specs/storage_offset1.ads: New test.
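The arithmetic identity behind the folding rule named in this ChangeLog entry, (T)P - (T)(P + A) -> -(T)A, can be checked in C using unsigned arithmetic, where wraparound is well defined. This sketch only illustrates the identity; it does not model TREE_OVERFLOW or the match.pd machinery, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Unfolded form: P - (P + A), computed modulo 2^64. */
uint64_t diff_unfolded(uint64_t p, uint64_t a) {
  return p - (p + a);
}

/* Folded form: -A, computed modulo 2^64. */
uint64_t diff_folded(uint64_t a) {
  return (uint64_t)0 - a;
}

/* The two forms agree for every P and A, including values that wrap. */
void check_fold(uint64_t p, uint64_t a) {
  assert(diff_unfolded(p, a) == diff_folded(a));
}
```

Because the identity already holds modulo 2^64, a constant A can be negated directly in the original type, which is the direct path the commit message says INTEGER_CSTs now take.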
2023-05-29ada: Define sigset for systems that do not support socketsCedric Landet1-5/+5
In s-oscons-tmplt.c, sigset is defined inside the HAVE_SOCKETS block. A platform could require sigset without supporting sockets. gcc/ada/ * s-oscons-tmplt.c: Move the definition of sigset out of the HAVE_SOCKETS block.
2023-05-29ada: Set g-spogwa as a GNATRTL_SOCKETS_OBJSCedric Landet1-2/+1
g-spogwa.adb is the body of the procedure GNAT.Sockets.Poll.G_Wait. This is a socket-specific procedure. It should only be built for systems that support sockets. gcc/ada/ * Makefile.rtl: Move g-spogwa$(objext) from GNATRTL_NONTASKING_OBJS to GNATRTL_SOCKETS_OBJS.
2023-05-29ada: Fix spurious error on imported generic function with preconditionEric Botcazou1-7/+1
It occurs during the instantiation because the compiler forgets the context of the generic declaration. gcc/ada/ * freeze.adb (Wrap_Imported_Subprogram): Use Copy_Subprogram_Spec in both cases to copy the spec of the subprogram.