path: root/gcc
Age | Commit message | Author | Files | Lines
10 days | ada: Multiple levels of ghost code | Viljar Indus | 28 | -838/+2654
Adds support for the new language feature that allows ghost entities and assertion pragmas and aspects to be associated with a new kind of entity called an assertion level.

Added support for a new pragma Assertion_Level that declares new assertion levels. This pragma consists of the level name and the assertion levels it depends on.

    pragma Assertion_Level (L1);
    pragma Assertion_Level (L2);
    pragma Assertion_Level (L3, Depends => [L1, L2]);

There are two special assertion levels that are considered to be declared in the Standard package and that have unique properties.

Assertion level Runtime is always considered to be Checked. Its assertion policy cannot be changed, and all other assertion levels are considered to depend on this level.

Assertion level Static is always considered to be Ignored. Its assertion policy cannot be changed. Assertion levels that depend on this level can never be activated either.

Aspect Ghost now supports the assertion level as a possible argument:

    ... with Ghost => Level;

All pragmas and aspects that were considered valid assertion kinds for pragma Assertion_Policy now support assertion level associations. An association consists of an assertion level and a set of existing arguments. A given pragma or aspect can have multiple assertion level associations, e.g.

    pragma Assert
      (Level1 => (Check => Expr1, Message => "Msg1"),
       Level2 => (Check => Expr2, Message => "Msg2"));

It is possible to set an explicit Assertion_Policy for these levels, which controls the policy for all entities associated with them.

Setting the policy Check for a given level also sets the policy Check for all of the levels that it depends on. From the previous example,
    pragma Assertion_Policy (L3 => Check);

is equivalent to:

    pragma Assertion_Policy (L1 => Check);
    pragma Assertion_Policy (L2 => Check);
    pragma Assertion_Policy (L3 => Check);

Setting the policy Ignore for a given level means that the policy Ignore is also applied to all the levels that depend on it, e.g.

    pragma Assertion_Policy (L2 => Ignore);

is equivalent to:

    pragma Assertion_Policy (L2 => Ignore);
    pragma Assertion_Policy (L3 => Ignore);

Since ghost regions can now contain other ghost regions with a different assertion policy, new rules had to be added for those situations to ensure valid compilation.

Additionally, all rules that checked for compatible assertion policies now also check for compatible assertion level dependencies.

Ghost entities A and B are considered assertion level dependent when
* A or B does not have an associated assertion level.
* Both A and B have an assertion level and either
  * the level of A is or depends on the level of B.
  * the level of B cannot be enabled (is or depends on Static)

gcc/ada/ChangeLog:

* atree.adb (Mark_New_Ghost_Node): Store the assertion level on the entity.
* contracts.adb (Analyze_Package_Contract): Add support for multiple pragma Initial_Condition originating from multiple assertion levels.
* cstand.adb (Make_Assertion_Level_Definition): New function that creates a new Assertion_Level and adds it to the Assertion_Levels table.
(Create_Standard): Add definitions for assertion levels defined in Standard.
(Print_Standard): Add assertion level pragmas to the output.
* exp_ch6.adb (Check_Subprogram_Variant): Add support for multiple Subprogram_Variant pragmas created by assertion levels.
* einfo.ads: Add info for the new nodes and attributes.
* exp_prag.adb (Consequence_Error): Fix error message string corruption caused by another call to the internal strings during the call to Make_Procedure_Call_Statement.
(Expand_Pragma_Initial_Condition): Ensure all ghost related attributes are copied to the new pragma.
(Expand_Pragma_Loop_Variant): Likewise.
(Expand_Pragma_Subprogram_Variant): Likewise. Additionally create a new Subprogram_Variant function for each pragma associated with an assertion level.
* exp_util.adb (Add_DIC_Check): Ensure all ghost related attributes are copied to the new pragma.
(Build_DIC_Procedure_Body): Add support for multiple DIC pragmas created from assertion levels.
* gen_il-fields.ads (Aspect_Ghost_Assertion_Level): New field.
(Original_Aspect): New field.
(Original_Pragma): New field.
(Pragma_Ghost_Assertion_Level): New field.
(Child_Levels): New field.
(Ghost_Assertion_Level): New field.
(Parent_Levels): New field.
* gen_il-gen-gen_entities.adb: Add Ghost_Assertion_Level field for all entities. Add new E_Assertion_Level entity for storing assertion levels.
* gen_il-gen-gen_nodes.adb: Add Aspect_Ghost_Assertion_Level for N_Aspect to store the assertion level associated with the aspect. Add Original_Aspect to store the original aspect from which an aspect that was transformed from an aspect with an assertion level originated. Add Pragma_Ghost_Assertion_Level and Original_Pragma to store the same information for N_Pragma nodes.
* gen_il-types.ads: Add new entity kind E_Assertion_Level.
* ghost.adb (Assertion_Level_Error_Msg): Create constant for error messages using the same main error message.
(Ghost_Policy_Error_Msg): Likewise.
(Assertion_Level_To_Name): New subprogram.
(Check_Valid_Ghost_Declaration): New subprogram.
(Get_Ghost_Aspect): New subprogram.
(Get_Ghost_Assertion_Level): New subprogram.
(Ghost_Policy_In_Effect): New subprogram.
(Install_Ghost_Region): New subprogram.
(Mark_And_Set_Ghost_Region): New subprogram.
(Mark_Ghost_Declaration_Or_Body): Add new argument for assertion levels.
(Check_Ghost_Completion): Update ghost policy calculation with assertion levels. Refactor error message.
(Is_OK_Statement): Add new checks for valid assertion policies and assertion levels.
(Is_OK_Pragma): Refactor the calculation of valid ghost pragmas.
(Check_Ghost_Policy): Make the checks ghost region based.
(Check_Ghost_Context): Refactor the order of checks.
(Check_Ghost_Formal_Procedure_Or_Package): Relax the checks for overriding procedures. Now only ignored subprograms cannot be overridden by checked or non-ghost subprograms.
(Check_Ghost_Primitive): Relax conditions for primitive operations. Now only checked primitive subprograms are considered invalid for ignored tagged types. Add assertion level compatibility checks.
(Check_Ghost_Refinement): Relax conditions for ghost refinements. Add assertion level compatibility checks for refinements.
(Install_Ghost_Region): Store the current region and the assertion for that region in the ghost config.
(Enables_Ghostness): Refactor implementation to support assertion levels.
(Is_Subject_To_Ghost): Simplify implementation.
(Mark_And_Set_Ghost_Assignment): Refactor implementation.
(Mark_And_Set_Ghost_Body): Add support for assertion levels.
(Mark_And_Set_Ghost_Completion): Likewise.
(Mark_And_Set_Ghost_Declaration): Likewise.
(Mark_And_Set_Ghost_Instantiation): Likewise.
(Mark_And_Set_Ghost_Procedure_Call): Refactor implementation.
(Mark_Ghost_Declaration_Or_Body): Add support for assertion levels.
(Set_Ghost_Mode): Likewise.
* ghost.ads (Assertion_Level_From_Arg): New subprogram.
(Install_Ghost_Region): Add argument Level for assertion levels.
(Is_Assertion_Level_Dependent): New subprogram.
* lib-xref.ads: Add new mapping for E_Assertion_Level entities.
* opt.ads (Ghost_Config_Type): Add new members Ghost_Assertion_Mode and Current_Region to the structure.
* par-prag.adb (Prag): Add new pragma name Assertion_Level.
* rtsfind.adb (Load_RTU): Update the arguments for the call to Install_Ghost_Region.
* sem.adb (Do_Analyze): Likewise.
* sem_ch13.adb (Convert_Aspect_With_Assertion_Levels): New subprogram.
(Make_Aitem_Pragma): Copy ghost mode attributes from the aspect to the pragma.
(Analyze_Aspect_Specifications): Convert aspects that have an assertion level association into aspects without the association, using the originally supported syntax and with the assertion level stored on the aspect node. Update duplicate detection to avoid reporting duplicates for aspects with assertion levels that originated from the same aspect.
* sem_prag.adb (Apply_Check_Policy): New subprogram.
(Get_Applicable_Policy): New subprogram.
(Mark_Is_Checked): New subprogram.
(Mark_Is_Disabled): New subprogram.
(Mark_Is_Ignored): New subprogram.
(Check_Arg_Is_One_Of): Remove versions that had a specific number of arguments and replace them with one that takes a list.
(Create_Pragma_Without_Assertion_Level): New subprogram.
(Assertion_Level_Pragma_Comes_From_Source): New subprogram.
(Analyze_Pragma): Replace aspects that have an assertion level with aspects without them where the level is stored on the pragma node.
(Abstract_State): Add support for assertion levels in ghost Abstract_State pragmas.
(Assert): Update argument handling for Assert-like pragmas.
(Assertion_Level): Add a new section to support the analysis of pragma Assertion_Level.
(Assertion_Policy): Add support for setting the policy for assertion levels.
(Check): Update argument handling. Update the assertion policy application process.
(Check_Policy): Add support for assertion levels. Also add Check_Policy pragmas for assertion level dependencies to the stack of known Check_Policy pragmas.
(Default_Initial_Condition): Reject the use of DIC with assertion levels. Update duplication checks.
(Ghost): Add support for assertion levels. Fix issue where assertion levels with Ghost => False were treated as ghost.
(Predicate): Update the policy handling of Ghost_Predicate.
(Analyze_Refined_State_In_Decl_Part): Create a new ghost region for analyzing Refined_State.
(Check_Applicable_Policy): Refactor the implementation.
Break it down into Get_Applicable_Policy and Apply_Check_Policy.
(Check_Kind): Removed. Replaced by Get_Applicable_Policy and Apply_Check_Policy.
(Initialize): Initialize the table storing all known assertion levels.
* sem_prag.ads (Find_Assertion_Level): New subprogram.
(Insert_Assertion_Level): New subprogram.
(Check_Applicable_Policy): Add new argument Level.
(Check_Kind): Removed. Merged with Get_Applicable_Policy.
(Get_Assertion_Level): New subprogram.
(Is_Valid_Assertion_Level): New subprogram.
* sem_util.adb (Copy_Assertion_Policy_Attributes): New function for copying the ghost related attributes from one pragma to another.
(Copy_Subprogram_Spec): Additionally copy the level from the spec.
(Depends_On_Level): New function for checking if one level depends on another level.
(From_Same_Aspect): New function for checking whether the aspects originate from the same original aspect.
(From_Same_Pragma): New function for checking whether the pragmas originate from the same original aspect or pragma.
(Get_Subprogram_Entity): Avoid a crash when called while the entity has not been set for the subprogram.
(Has_Assertion_Level_Argument): New function for checking whether the aspect or a pragma has an argument that is using an assertion level association.
(Policy_In_Effect): Add an additional argument for the level that should be checked along with the assertion name.
* sem_util.ads (Copy_Assertion_Policy_Attributes): New function.
(Depends_On_Level): Likewise.
(From_Same_Aspect): Likewise.
(From_Same_Pragma): Likewise.
(Has_Assertion_Level_Argument): Likewise.
(Is_Same_Or_Depends_On_Level): Likewise.
(Policy_In_Effect): Add new argument Level.
* sinfo.ads: Add documentation for all the new attributes that were added to the nodes and entities.
* snames.ads-tmpl: Add new entries for Name_Assertion_Level, Name_uDefault_Assertion_Level and Pragma_Assertion_Level.
* stand.ads: Add new entities for the predefined assertion levels.
(Standard_Level_Static): Definition for the predefined Static level that is always ignored.
(Standard_Level_Runtime): Definition for the predefined Runtime level that is always checked.
(Standard_Level_Default): Definition for the implicit Default level that is given for ghost entities that were not associated with an assertion level (e.g. Ghost => True).
* tbuild.adb (Make_Assertion_Level): New function for constructing an assertion level.
* tbuild.ads (Make_Assertion_Level): Likewise.
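The Assertion_Policy propagation rules described in this commit message can be modelled outside the compiler. The C sketch below is an illustration only and is not how GNAT implements the feature: the names `set_check`, `set_ignore`, `depends_on` and the fixed-size table are hypothetical. It encodes the example levels (L3 depends on L1 and L2) and applies the two rules: Check propagates to the levels a level depends on, Ignore propagates to the levels that depend on it.

```c
#include <stdbool.h>

/* Hypothetical model of the levels from the commit message:
     pragma Assertion_Level (L1);
     pragma Assertion_Level (L2);
     pragma Assertion_Level (L3, Depends => [L1, L2]);  */
enum { L1, L2, L3, NUM_LEVELS };
enum policy { POLICY_UNSET, POLICY_CHECK, POLICY_IGNORE };

/* depends_on[a][b] is true when level a directly depends on level b.  */
static const bool depends_on[NUM_LEVELS][NUM_LEVELS] = {
  [L3] = { [L1] = true, [L2] = true },
};

/* Setting Check on a level also sets Check on every level it depends on.  */
static void
set_check (enum policy policies[NUM_LEVELS], int level)
{
  policies[level] = POLICY_CHECK;
  for (int dep = 0; dep < NUM_LEVELS; dep++)
    if (depends_on[level][dep] && policies[dep] != POLICY_CHECK)
      set_check (policies, dep);
}

/* Setting Ignore on a level also sets Ignore on every level that
   depends on it.  */
static void
set_ignore (enum policy policies[NUM_LEVELS], int level)
{
  policies[level] = POLICY_IGNORE;
  for (int user = 0; user < NUM_LEVELS; user++)
    if (depends_on[user][level] && policies[user] != POLICY_IGNORE)
      set_ignore (policies, user);
}
```

In this model, `set_check (policies, L3)` marks L1, L2 and L3 as checked, and `set_ignore (policies, L2)` marks L2 and L3 as ignored while leaving L1 untouched, matching the two "is equivalent to" examples in the message. The predefined Runtime and Static levels are deliberately left out of the sketch.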
10 days | c++: Fix mangling of _Float16 template args [PR121801] | Matthias Kretz | 2 | -1/+27
Signed-off-by: Matthias Kretz <m.kretz@gsi.de> gcc/testsuite/ChangeLog: PR c++/121801 * g++.dg/abi/pr121801.C: New test. gcc/cp/ChangeLog: PR c++/121801 * mangle.cc (write_real_cst): Handle 16-bit real and assert that reals have 16 bits or a multiple of 32 bits.
10 days | x86: Enable SSE4.1 ceil/floor/trunc for -Os | H.J. Lu | 3 | -5/+54
Enable SSE4.1 ceil/floor/trunc for -Os to replace a function call with roundss or roundsd by dropping the !flag_trapping_math check. gcc/ PR target/121861 * config/i386/i386.cc (ix86_optab_supported_p): Drop !flag_trapping_math check for floor_optab, ceil_optab and btrunc_optab. gcc/testsuite/ PR target/121861 * gcc.target/i386/pr121861-1a.c: New file. * gcc.target/i386/pr121861-1b.c: Likewise. Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
10 days | Use vpermil{ps,pd} instead of vperm{d,q} when permutation is in-lane. | liuhongt | 7 | -14/+78
gcc/ChangeLog: * config/i386/i386-expand.cc (expand_vec_perm_vpermil): Extend to handle V8SImode. * config/i386/i386.cc (avx_vpermilp_parallel): Extend to handle vector integer modes with same vector size and same component size. * config/i386/sse.md (<sse2_avx_avx512f>_vpermilp<mode><mask_name>): Ditto. (V48_AVX): New mode iterator. (ssefltmodesuffix): Extend for V16SI/V8DI/V16SF/V8DF. gcc/testsuite/ChangeLog: * gcc.target/i386/avx256_avoid_vec_perm-3.c: New test. * gcc.target/i386/avx256_avoid_vec_perm-4.c: New test. * gcc.target/i386/avx512bw-vpalignr-4.c: Adjust testcase. * gcc.target/i386/avx512vl-vpalignr-4.c: Ditto.
10 days | Exclude fake cross-lane permutation from avx256_avoid_vec_perm. | liuhongt | 4 | -3/+103
SLP may treat a broadcast as a kind of vec_perm; this patch checks the permutation index to exclude such false positives. gcc/ChangeLog: * config/i386/i386.cc (ix86_vector_costs::add_stmt_cost): Check permutation index for vec_perm, don't count it if we know it's not a cross-lane permutation. gcc/testsuite/ChangeLog: * gcc.target/i386/avx256_avoid_vec_perm.c: Adjust testcase. * gcc.target/i386/avx256_avoid_vec_perm-2.c: New test. * gcc.target/i386/avx256_avoid_vec_perm-5.c: New test.
10 days | Daily bump. | GCC Administrator | 6 | -1/+291
11 days | c: Update TLS model after processing a TLS variable | H.J. Lu | 1 | -2/+14
Set a tentative TLS model in grokvardecl and update the TLS model with the default TLS access model after a TLS variable has been fully processed if the default TLS access model is stronger. gcc/c/ PR c/107419 * c-decl.cc (c_decl_attributes): Update TLS model with the default TLS access model if the default TLS access model is stronger. (grokdeclarator): Set a tentative TLS model which will be updated by c_decl_attributes later. Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
11 days | Ada: Make -fdump-ada-spec deal with pointers to anonymous structure | Eric Botcazou | 1 | -3/+23
This is about -fdump-ada-spec not generating the definition of the structure for pointers to anonymous structure as structure elements. gcc/c-family: PR ada/121544 * c-ada-spec.cc (dump_ada_node) <POINTER_TYPE>: Dump the name of anonymous tagged pointed-to types specially. (dump_nested_type) <POINTER_TYPE>: Recurse on anonymous pointed-to types declared in the same file. Set TREE_VISITED on the underlying DECL of the field type, if any.
11 days | Testsuite: Fix spurious failure of ACATS-4 test cxai033 | Eric Botcazou | 1 | -2/+2
This tentatively applies the same tweak as in other similar cases. gcc/testsuite/ PR ada/121532 * ada/acats-4/tests/cxa/cxai033.a: Use Long_Switch_To_New_Task constant instead of Switch_To_New_Task in delay statements.
11 days | testsuite: Another fixup for fixed-point/bitint-1.c test | Xi Ruoyao | 1 | -1/+1
Besides r16-3595, there's another bug in this test: with -std=c23 the token _Sat isn't recognized as a keyword at all, so an error message different from the expected one is emitted. Fix it by using -std=gnu23 instead. gcc/testsuite: * gcc.dg/fixed-point/bitint-1.c (dg-options): Use -std=gnu23 instead of -std=c23.
11 days | tree-optimization/121844 - IVOPTs and asm goto in latch | Richard Biener | 2 | -6/+23
When there's an asm goto in the latch of a loop we may not use IP_END IVs since instantiating those would (need to) split the latch edge which in turn invalidates IP_NORMAL position handling. This is a revision of the PR107997 fix. PR tree-optimization/107997 PR tree-optimization/121844 * tree-ssa-loop-ivopts.cc (allow_ip_end_pos_p): Do not allow IP_END for latches ending with a control stmt. (create_new_iv): Do not split the latch edge, instead assert that's not necessary. * gcc.dg/torture/pr121844.c: New testcase.
11 days | RISC-V: Add pattern for vector-scalar widening floating-point add | Paul-Antoine Arras | 12 | -3/+74
This pattern enables the combine pass (or late-combine, depending on the case) to merge a float_extend'ed vec_duplicate into a plus RTL instruction. Before this patch, we have four instructions, e.g.: fcvt.d.s fa0,fa0 vsetvli a5,zero,e64,m1,ta,ma vfmv.v.f v3,fa0 vfwadd.wv v1,v3,v2 After, we get only one: vfwadd.vf v1,v2,fa0 gcc/ChangeLog: * config/riscv/autovec-opt.md (*vfwadd_vf_<mode>): New pattern to combine float_extend + vec_duplicate + vfwadd.vv into vfwadd.vf. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vx_vf/vf-1-f16.c: Add vfwadd. * gcc.target/riscv/rvv/autovec/vx_vf/vf-1-f32.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-2-f16.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-2-f32.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-3-f16.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-3-f32.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-4-f16.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-4-f32.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf_binop.h (DEF_VF_BINOP_WIDEN_CASE_0): Fix OP. * gcc.target/riscv/rvv/autovec/vx_vf/vf_vfwadd-run-1-f16.c: New test. * gcc.target/riscv/rvv/autovec/vx_vf/vf_vfwadd-run-1-f32.c: New test.
11 days | RISC-V: Adjust tt-ascalon-d8 branch cost | Anton Blanchard | 1 | -1/+1
If-conversion isn't being applied to this nbench code:

    #include <stdint.h>

    #define INTERNAL_FPF_PRECISION 4
    typedef uint16_t u16;

    void ShiftMantLeft1(u16 *carry, u16 *mantissa)
    {
      int i;
      int new_carry;
      u16 accum;

      for(i=INTERNAL_FPF_PRECISION-1;i>=0;i--)
      {
        accum=mantissa[i];
        new_carry=accum & 0x8000;
        accum=accum<<1;
        if(*carry)
          accum|=1;
        *carry=new_carry;
        mantissa[i]=accum;
      }
      return;
    }

Bumping branch_cost from 3 to 4 triggers if-conversion, improving the nbench FP EMULATION result on Ascalon significantly. There's a risk that more aggressive use of conditional zero instructions will negatively impact workloads that predict well, but we haven't seen anything obvious.

gcc/ChangeLog:

* config/riscv/riscv.cc (tt_ascalon_d8_tune_info): Increase branch_cost from 3 to 4.
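To make the effect of if-conversion on the loop above concrete, here is a hand-written sketch, not compiler output. The function names are hypothetical. The second function replaces the conditional `accum |= 1` with an unconditional OR of a 0/1 value, which is the kind of branch-free form that if-conversion aims for; the code the compiler actually generates for Ascalon is not shown in the commit and is not claimed here.

```c
#include <stdint.h>

#define INTERNAL_FPF_PRECISION 4
typedef uint16_t u16;

/* Same logic as the nbench routine: the incoming carry is merged
   with a conditional branch.  */
void
shift_mant_left1_branchy (u16 *carry, u16 *mantissa)
{
  for (int i = INTERNAL_FPF_PRECISION - 1; i >= 0; i--)
    {
      u16 accum = mantissa[i];
      int new_carry = accum & 0x8000;
      accum = (u16) (accum << 1);
      if (*carry)
        accum |= 1;
      *carry = (u16) new_carry;
      mantissa[i] = accum;
    }
}

/* Branch-free equivalent: (*carry != 0) evaluates to 0 or 1 and is
   OR-ed in unconditionally.  */
void
shift_mant_left1_branchless (u16 *carry, u16 *mantissa)
{
  for (int i = INTERNAL_FPF_PRECISION - 1; i >= 0; i--)
    {
      u16 accum = mantissa[i];
      int new_carry = accum & 0x8000;
      accum = (u16) ((accum << 1) | (*carry != 0));
      *carry = (u16) new_carry;
      mantissa[i] = accum;
    }
}
```

Both functions produce identical results for every input; only the control flow differs. Whether the branch-free form is faster depends on how well the branch predicts, which is the trade-off the commit message mentions.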
11 days | tree-optimization/121830 - SLP cycle detection confused by nested cycle | Richard Biener | 3 | -3/+16
The SLP reduc-index computation is confused by having an outer reduction inner loop nested cycle fed by another non-reduction nested cycle. Instead of undoing the unfortunate mixing of outer reduction inner cycles with general nested cycles the following instead distinguishes them by not setting STMT_VINFO_REDUC_DEF on the non-reduction nested cycles. PR tree-optimization/121830 * tree-vect-loop.cc (vect_analyze_scalar_cycles_1): Only set STMT_VINFO_REDUC_DEF on reductions. * tree-vect-slp.cc (vect_build_slp_tree_2): Identify reduction PHIs by a set STMT_VINFO_REDUC_DEF instead of their def type. * gcc.dg/vect/pr121830.c: New testcase.
11 days | tree-optimization/121829 - bogus CFG with asm goto | Richard Biener | 2 | -15/+39
When the vectorizer removes a forwarder created earlier by split_edge it uses redirect_edge_pred for convenience and efficiency. That breaks down when the edge split is originating from an asm goto as that is a jump that needs adjustments from redirect_edge_and_branch. The following factors out a simple vect_remove_forwarder handling this situation appropriately. PR tree-optimization/121829 * tree-vect-loop-manip.cc (vect_remove_forwarder): New function. (slpeel_tree_duplicate_loop_to_edge_cfg): Use it. * gcc.dg/torture/pr121829.c: New testcase.
11 days | doc: Document the -folding option for -fdump-tree-* [PR114892] | Alex Coplan | 1 | -0/+3
I noticed that the -fdump-tree-*-folding flag isn't documented in the Developer options section of invoke.texi; this patch fixes that. gcc/ChangeLog: PR tree-optimization/114892 * doc/invoke.texi (Developer Options): Document -folding option for -fdump-tree-*.
11 days | [AutoFDO] Check count initialization to fix ICE with AutoFDO | Kugan Vivekanandarajah | 1 | -1/+2
Fix ICE with AutoFDO by adding initialization check before accessing IPA counts to avoid issues with uninitialized profile counts in self-recursive clone processing. gcc/ChangeLog: 2025-09-08 Kugan Vivekanandarajah <kvivekananda@nvidia.com> * ipa-cp.cc (gather_count_of_non_rec_edges): Check count initialization before adding to total. Signed-off-by: Kugan Vivekanandarajah <kvivekananda@nvidia.com>
11 days | RISC-V: Add pattern for vector-scalar single-width floating-point reverse sub | Paul-Antoine Arras | 17 | -0/+234
This pattern enables the combine pass (or late-combine, depending on the case) to merge a vec_duplicate into a minus RTL instruction. The vec_duplicate is the minuend operand. Before this patch, we have two instructions, e.g.: vfmv.v.f v2,fa0 vfsub.vv v1,v2,v1 After, we get only one: vfrsub.vf v1,v1,fa0 gcc/ChangeLog: * config/riscv/autovec-opt.md (*vfrsub_vf_<mode>): New pattern to combine vec_duplicate + vfsub.vv into vfrsub.vf. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vx_vf/vf-1-f16.c: Add vfrsub. * gcc.target/riscv/rvv/autovec/vx_vf/vf-1-f32.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-1-f64.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-2-f16.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-2-f32.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-2-f64.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-3-f16.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-3-f32.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-3-f64.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-4-f16.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-4-f32.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-4-f64.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf_binop_data.h: Add data for vfrsub. * gcc.target/riscv/rvv/autovec/vx_vf/vf_vfrsub-run-1-f16.c: New test. * gcc.target/riscv/rvv/autovec/vx_vf/vf_vfrsub-run-1-f32.c: New test. * gcc.target/riscv/rvv/autovec/vx_vf/vf_vfrsub-run-1-f64.c: New test.
11 days | RISC-V: Add pattern for vector-scalar single-width floating-point sub | Paul-Antoine Arras | 19 | -7/+239
This pattern enables the combine pass (or late-combine, depending on the case) to merge a vec_duplicate into a minus RTL instruction. The vec_duplicate is the subtrahend operand. Before this patch, we have two instructions, e.g.: vfmv.v.f v2,fa0 vfsub.vv v1,v1,v2 After, we get only one: vfsub.vf v1,v1,fa0 gcc/ChangeLog: * config/riscv/autovec-opt.md (*vfsub_vf_<mode>): New pattern to combine vec_duplicate + vfsub.vv into vfsub.vf. * config/riscv/vector.md (@pred_<optab><mode>_scalar): Allow VLS modes. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vls/floating-point-sub-2.c: Adjust scan dumps. * gcc.target/riscv/rvv/autovec/vx_vf/vf-1-f16.c: Add vfsub. * gcc.target/riscv/rvv/autovec/vx_vf/vf-1-f32.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-1-f64.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-2-f16.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-2-f32.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-2-f64.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-3-f16.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-3-f32.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-3-f64.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-4-f16.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-4-f32.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-4-f64.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf_binop_data.h: Add data for vfsub. * gcc.target/riscv/rvv/autovec/vx_vf/vf_vfsub-run-1-f16.c: New test. * gcc.target/riscv/rvv/autovec/vx_vf/vf_vfsub-run-1-f32.c: New test. * gcc.target/riscv/rvv/autovec/vx_vf/vf_vfsub-run-1-f64.c: New test.
11 days | RISC-V: Add pattern for vector-scalar single-width floating-point add | Paul-Antoine Arras | 20 | -3/+236
This pattern enables the combine pass (or late-combine, depending on the case) to merge a vec_duplicate into a plus RTL instruction. Before this patch, we have two instructions, e.g.: vfmv.v.f v2,fa0 vfadd.vv v1,v1,v2 After, we get only one: vfadd.vf v1,v1,fa0 gcc/ChangeLog: * config/riscv/autovec-opt.md (*vfadd_vf_<mode>): New pattern to combine vec_duplicate + vfadd.vv into vfadd.vf. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vls/floating-point-add-2.c: Adjust scan dump. * gcc.target/riscv/rvv/autovec/vls/floating-point-add-3.c: Likewise. * gcc.target/riscv/rvv/autovec/vls/floating-point-sub-3.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-1-f16.c: Add vfadd. * gcc.target/riscv/rvv/autovec/vx_vf/vf-1-f32.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-1-f64.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-2-f16.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-2-f32.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-2-f64.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-3-f16.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-3-f32.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-3-f64.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-4-f16.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-4-f32.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-4-f64.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf_binop_data.h: Add data for vfadd. * gcc.target/riscv/rvv/autovec/vx_vf/vf_vfadd-run-1-f16.c: New test. * gcc.target/riscv/rvv/autovec/vx_vf/vf_vfadd-run-1-f32.c: New test. * gcc.target/riscv/rvv/autovec/vx_vf/vf_vfadd-run-1-f64.c: New test.
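The single-width add, sub and reverse-sub patterns in this series target loops of the following shape. This is a minimal sketch with hypothetical function names; the real tests are generated from macros in vf_binop.h, which are not reproduced here. Whether the vectorizer actually emits the `.vf` forms depends on the selected -march (it needs the V extension), the optimization level and the cost model.

```c
#include <stddef.h>

/* out[i] = in[i] + f: the broadcast scalar is an addend
   (candidate for vfadd.vf).  */
void
vec_add_scalar (double *restrict out, const double *restrict in,
                double f, size_t n)
{
  for (size_t i = 0; i < n; i++)
    out[i] = in[i] + f;
}

/* out[i] = in[i] - f: the broadcast scalar is the subtrahend
   (candidate for vfsub.vf).  */
void
vec_sub_scalar (double *restrict out, const double *restrict in,
                double f, size_t n)
{
  for (size_t i = 0; i < n; i++)
    out[i] = in[i] - f;
}

/* out[i] = f - in[i]: the broadcast scalar is the minuend
   (candidate for vfrsub.vf).  */
void
vec_rsub_scalar (double *restrict out, const double *restrict in,
                 double f, size_t n)
{
  for (size_t i = 0; i < n; i++)
    out[i] = f - in[i];
}
```

In each loop the scalar `f` would otherwise be broadcast into a vector register with vfmv.v.f before a `.vv` instruction; the new patterns let combine fold that broadcast into the arithmetic instruction.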
11 days | RISC-V: Add pattern for vector-scalar widening floating-point multiply | Paul-Antoine Arras | 14 | -6/+139
This pattern enables the combine pass (or late-combine, depending on the case) to merge a float_extend'ed vec_duplicate into a mult RTL instruction. Before this patch, we have six instructions, e.g.: fcvt.d.s fa0,fa0 vsetvli a5,zero,e64,m1,ta,ma vfmv.v.f v3,fa0 vfwcvt.f.f.v v1,v2 vsetvli zero,zero,e64,m1,ta,ma vfmul.vv v1,v3,v1 After, we get only one: vfwmul.vf v1,v2,fa0 gcc/ChangeLog: * config/riscv/autovec-opt.md (*vfwmul_vf_<mode>): New pattern to combine float_extend + vec_duplicate + vfmul.vv into vfwmul.vf. * config/riscv/vector.md (*@pred_dual_widen_<optab><mode>_scalar): Swap operands to match the RTL emitted by expand, i.e. first float_extend then vec_duplicate. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vx_vf/vf-1-f16.c: Add vfwmul. * gcc.target/riscv/rvv/autovec/vx_vf/vf-1-f32.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-2-f16.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-2-f32.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-3-f16.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-3-f32.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-4-f16.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-4-f32.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf_binop.h: Add support for widening variants. * gcc.target/riscv/rvv/autovec/vx_vf/vf_binop_widen_run.h: New test helper. * gcc.target/riscv/rvv/autovec/vx_vf/vf_vfwmul-run-1-f16.c: New test. * gcc.target/riscv/rvv/autovec/vx_vf/vf_vfwmul-run-1-f32.c: New test.
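The widening patterns apply when narrow inputs are extended before the operation. A hedged C sketch of such loops follows; the function names are hypothetical, and whether vfwmul.vf or vfwadd.vf is emitted depends on the target flags and the vectorizer's decisions.

```c
#include <stddef.h>

/* float inputs are widened to double before the multiply, so the
   result is twice as wide as the operands (candidate for vfwmul.vf).  */
void
vec_widen_mul_scalar (double *restrict out, const float *restrict in,
                      float f, size_t n)
{
  for (size_t i = 0; i < n; i++)
    out[i] = (double) in[i] * (double) f;
}

/* Same shape with an addition (candidate for vfwadd.vf).  */
void
vec_widen_add_scalar (double *restrict out, const float *restrict in,
                      float f, size_t n)
{
  for (size_t i = 0; i < n; i++)
    out[i] = (double) in[i] + (double) f;
}
```

The explicit casts correspond to the float_extend operations named in the ChangeLog: one on the vector elements and one on the scalar before it is duplicated.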
11 days | RISC-V: Add patterns for vector-scalar IEEE floating-point max | Paul-Antoine Arras | 13 | -10/+32
These patterns enable the combine pass (or late-combine, depending on the case) to merge a vec_duplicate into an unspec_vfmax RTL instruction. Before this patch, we have two instructions, e.g.: vfmv.v.f v2,fa0 vfmax.vv v1,v2,v1 After, we get only one: vfmax.vf v1,v1,fa0 In some cases, it also shaves off one vsetvli. gcc/ChangeLog: * config/riscv/autovec-opt.md (*vfmin_vf_ieee_<mode>): Rename into... (*v<ieee_fmaxmin_op>_vf_<mode>): New pattern to combine vec_duplicate + vf{max,min}.vv (unspec) into vf{max,min}.vf. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vx_vf/vf-5-f16.c: Add vfmax. * gcc.target/riscv/rvv/autovec/vx_vf/vf-5-f32.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-5-f64.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-6-f16.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-6-f32.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-6-f64.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-7-f16.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-7-f32.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-7-f64.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-8-f16.c: Add vfmax. Also add missing -fno-fast-math. * gcc.target/riscv/rvv/autovec/vx_vf/vf-8-f32.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-8-f64.c: Likewise.
11 days | doc: Remove references to Binutils 2.7 requirements | Gerald Pfeifer | 1 | -5/+1
GNU Binutils 2.7 was released in 1996, no realistic need to point it out as a minimal requirement. gcc: * doc/extend.texi (SH Function Attributes): Remove reference to GNU Binutils 2.7 requirement. (H8/300 Variable Attributes): Ditto.
11 days | Fortran: Correct variable typespec in PDT specification exprs [PR84008] | Paul Thomas | 2 | -0/+32
2025-09-08 Paul Thomas <pault@gcc.gnu.org> gcc/fortran PR fortran/84008 * decl.cc (insert_parameter_exprs): Correct the typespec of new variable declarations, where the type is set to BT_PROCEDURE as a precaution for resolution of the whole program unit. gcc/testsuite/ PR fortran/84008 * gfortran.dg/pdt_45.f03: New test.
11 days | strlen: Handle empty constructor as memset for combining with malloc to calloc [PR87900] | Andrew Pinski | 4 | -5/+111
This was noticed when turning memset (with constant size) into a store of an empty constructor but can be reproduced without that. In this case we have the following IR:
```
p_3 = __builtin_malloc (4096);
*p_3 = {};
```
Here we can treat the store as a memset. So this patch adds an optimization for malloc/constructor similar to the existing one for malloc/memset. This patch is on top of https://gcc.gnu.org/pipermail/gcc-patches/2025-April/681439.html (it calls allow_memset_malloc_to_calloc but that can be removed if that patch is rejected). Changes since v1: * v2: Correctly return false from handle_assign after removing stmt. Bootstrapped and tested on x86_64-linux-gnu. PR tree-optimization/87900 gcc/ChangeLog: * tree-ssa-strlen.cc (strlen_pass::handle_assign): Add RHS argument. For empty constructor RHS, see if can combine with a previous malloc into a calloc. (strlen_pass::check_and_optimize_call): Update call to handle_assign; passing NULL_TREE for RHS. (strlen_pass::check_and_optimize_stmt): Update call to handle_assign. gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/calloc-10.c: New test. * gcc.dg/tree-ssa/calloc-11.c: New test. * gcc.dg/tree-ssa/calloc-12.c: New test. Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
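A source-level sketch of code that could lead to IR like the above. The struct and function names are hypothetical, and it is an assumption that the zero-initializing aggregate assignment is represented as an empty-constructor store (`*p = {}`) at the point where the strlen pass runs; the tests added by the commit (calloc-10.c to calloc-12.c) are not reproduced here.

```c
#include <stdlib.h>

struct page
{
  unsigned char bytes[4096];
};

/* Allocate a page and clear it with an aggregate assignment rather
   than an explicit memset call.  */
struct page *
alloc_page (void)
{
  struct page *p = malloc (sizeof *p);
  if (p)
    *p = (struct page) { 0 };  /* whole-object zero store */
  return p;
}
```

If the store is recognized, the malloc plus zero store pair can be handled like malloc plus memset, that is, as a candidate for a single calloc call.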
11 days | strlen: Don't do the malloc+memset->calloc optimization in some cases [PR83022] | Andrew Pinski | 5 | -0/+169
This fixes a long-standing (since GCC 5) issue where the malloc+memset->calloc optimization would happen even if the memset was not always executed. This is a variant of Nathan's patch: https://inbox.sourceware.org/gcc-patches/f4b5d106-8176-b7bd-709b-d435188783b0@acm.org/ Jeff Law had suggested to look at probabilities of the basic blocks to see if it is profitable or not; I am not totally convinced that is a good idea. Though this is an extended version of Nathan's patch as it uses post domination to see if the memset is always called after the condition of null-ness. PR tree-optimization/83022 gcc/ChangeLog: * tree-ssa-strlen.cc (last_stmt_ptr_check): New function. (allow_memset_malloc_to_calloc): New function. (strlen_pass::handle_builtin_memset): Check to see if it is a good idea to do the malloc+memset->calloc optimization. (printf_strlen_execute): Free post dom info. gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/calloc-6.c: New test. * gcc.dg/tree-ssa/calloc-7.c: New test. * gcc.dg/tree-ssa/calloc-8.c: New test. * gcc.dg/tree-ssa/calloc-9.c: New test. Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
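The distinction this commit draws can be illustrated with two small functions. This is a sketch based on the description above, with hypothetical names; it does not reproduce the calloc-6.c to calloc-9.c tests, and it makes no claim about what a particular GCC version emits.

```c
#include <stdlib.h>
#include <string.h>

/* The memset runs whenever the allocation succeeded, so it always
   follows the null check.  Per the commit message, this is the shape
   where turning the pair into calloc remains reasonable.  */
void *
alloc_zeroed (size_t n)
{
  void *p = malloc (n);
  if (p)
    memset (p, 0, n);
  return p;
}

/* The memset runs only when the caller asks for it.  Rewriting this
   to calloc would zero the memory on paths that never needed it,
   which is the case the commit is meant to stop.  */
void *
alloc_maybe_zeroed (size_t n, int want_zero)
{
  void *p = malloc (n);
  if (p && want_zero)
    memset (p, 0, n);
  return p;
}
```

In the second function the memset is guarded by a condition unrelated to the null check, so it is not always executed after a successful malloc.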
11 daysgcc: regenerate common.opt.urlsSam James1-0/+3
Needed to add -fdep-fusion. gcc/ChangeLog: * common.opt.urls: Regenerate.
11 daysDaily bump.GCC Administrator4-1/+166
12 daysforwprop: Improve rejection of overlapping for copyprop of aggregates [PR121841]Andrew Pinski2-3/+30
Here we have: tmp = src1[0]; dest1[0] = tmp; where src1 and dest1 are decls. We currently reject this as the bases are different but since the bases are decls we know they won't overlap. This adds the extra check to allow this. Bootstrapped and tested on x86_64-linux-gnu. PR tree-optimization/121841 gcc/ChangeLog: * tree-ssa-forwprop.cc (optimize_agr_copyprop_1): Allow two different decls as bases as non-overlapping bases. gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/copy-prop-aggregate-struct-1.c: New test. Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
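A minimal C sketch of source that leads to this shape, assuming two distinct global arrays of a hypothetical `struct item`; it is not the commit's `copy-prop-aggregate-struct-1.c` test. Because `src1` and `dest1` are separate declarations, they cannot overlap, so copying through `tmp` is equivalent to copying directly.

```c
#include <assert.h>

/* Hypothetical aggregate type used only for this illustration.  */
struct item
{
  int a[4];
};

struct item src1[2] = { { { 1, 2, 3, 4 } }, { { 5, 6, 7, 8 } } };
struct item dest1[2];

/* Copy through a temporary.  Since src1 and dest1 are distinct decls,
   this has the same effect as `dest1[0] = src1[0];`.  */
void
copy_first (void)
{
  struct item tmp = src1[0];
  dest1[0] = tmp;
}
```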
12 daysgcc: introduce the dep_fusion passJeff Law1-3/+3
>> + >> + // opt_pass methods: >> + opt_pass *clone () override { return new pass_dep_fusion (m_ctxt); } >> + bool gate (function *) override; >> + unsigned int execute (function *) override; > > Wouldn't it be better to add 'final' along with 'override' to opt_pass > vfuncs? > (See commit 725793af78064fa605ea6d9376aaf99ecb71467b, etc.) Yea. It's easily missed. Fixed in the obvious way. Bootstrapped and regression tested on x86_64. Pushed to the trunk. gcc/ * dep-fusion.cc: Mark clone, gate and execute methods as final.
12 daysRISC-V: Add support for the XAndesvdot ISA extension.Kuan-Lin Chen20-3/+2488
This extension defines vector instructions to calculate the signed/unsigned dot product of four SEW/4-bit data and accumulate the result into a SEW-bit element for all elements in a vector register. gcc/ChangeLog: * config/riscv/andes-vector-builtins-bases.cc (nds_vd4dot): New class. (class nds_vd4dotsu): New class. * config/riscv/andes-vector-builtins-bases.h: New def. * config/riscv/andes-vector-builtins-functions.def (nds_vd4dots): Ditto. (nds_vd4dotsu): Ditto. (nds_vd4dotu): Ditto. * config/riscv/andes-vector.md (@pred_nds_vd4dot<su><mode>): New pattern. (@pred_nds_vd4dotsu<mode>): New pattern. * config/riscv/genrvv-type-indexer.cc (main): Modify sew of QUAD_FIX, QUAD_FIX_SIGNED and QUAD_FIX_UNSIGNED. * config/riscv/riscv-vector-builtins.cc (qexti_vvvv_ops): New operand information. (qexti_su_vvvv_ops): New operand information. (qextu_vvvv_ops): New operand information. * config/riscv/riscv-vector-builtins.h (XANDESVDOT_EXT): New def. (required_ext_to_isa_name): Add case XANDESVDOT_EXT. (required_extensions_specified): Ditto. (struct function_group_info): Ditto. * config/riscv/vector-iterators.md (NDS_QUAD_FIX): New iterator. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/xandesvector/non-policy/non-overloaded/nds_vd4dots.c: New test. * gcc.target/riscv/rvv/xandesvector/non-policy/non-overloaded/nds_vd4dotsu.c: New test. * gcc.target/riscv/rvv/xandesvector/non-policy/non-overloaded/nds_vd4dotu.c: New test. * gcc.target/riscv/rvv/xandesvector/non-policy/overloaded/nds_vd4dots.c: New test. * gcc.target/riscv/rvv/xandesvector/non-policy/overloaded/nds_vd4dotsu.c: New test. * gcc.target/riscv/rvv/xandesvector/non-policy/overloaded/nds_vd4dotu.c: New test. * gcc.target/riscv/rvv/xandesvector/policy/non-overloaded/nds_vd4dots.c: New test. * gcc.target/riscv/rvv/xandesvector/policy/non-overloaded/nds_vd4dotsu.c: New test. * gcc.target/riscv/rvv/xandesvector/policy/non-overloaded/nds_vd4dotu.c: New test. 
* gcc.target/riscv/rvv/xandesvector/policy/overloaded/nds_vd4dots.c: New test. * gcc.target/riscv/rvv/xandesvector/policy/overloaded/nds_vd4dotsu.c: New test. * gcc.target/riscv/rvv/xandesvector/policy/overloaded/nds_vd4dotu.c: New test.
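As a rough scalar model of what the signed variant computes for one element, assuming SEW = 32 so that each source element holds four 8-bit lanes: the function name `vd4dots_elem` is hypothetical, this is not the intrinsic itself, and wraparound of the accumulator is ignored.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of one destination element for SEW = 32 (assumption):
   split each 32-bit source into four signed 8-bit lanes, multiply the
   lanes pairwise, and add the four products to the accumulator.  */
int32_t
vd4dots_elem (int32_t acc, uint32_t a, uint32_t b)
{
  for (int lane = 0; lane < 4; lane++)
    {
      /* Extract the lane and sign-extend it from 8 bits by hand.  */
      int x = (int) ((a >> (8 * lane)) & 0xff);
      int y = (int) ((b >> (8 * lane)) & 0xff);
      if (x >= 128)
        x -= 256;
      if (y >= 128)
        y -= 256;
      acc += x * y;
    }
  return acc;
}
```

For example, with `a = 0x01020304` and `b = 0x01010101` the lanes of `a` are 4, 3, 2 and 1, each multiplied by 1, so an accumulator of 5 becomes 15.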
12 days[RISC-V] Fix ordering of pipeline modelsJeff Law1-1/+1
I missed that the new ascalon pipeline description was put into the wrong place during review. The net result is that tests which wanted to use generic-ooo explicitly for stability in the test output ended up getting a different pipeline model and different codegen than the test expected. This tripped a small number of vsetvl failures in the testsuite. This has spun on riscv64-elf and riscv32-elf in my tester and fixes the regression. I'm going to go ahead and push it as I'm likely offline this afternoon/evening and don't want anyone else to waste their time chasing the regression down. gcc/ * config/riscv/riscv-opts.h (riscv_microarchitecture_type): Fix ordering.
12 daysRISC-V: Add support for the XAndesvpackfph ISA extension.Kuan-Lin Chen19-1/+1381
This extension defines vector instructions to extract a pair of FP16 data from a floating-point register. Multiply the top FP16 data with the FP16 elements and add the result with the bottom FP16 data. gcc/ChangeLog: * common/config/riscv/riscv-common.cc: Turn on VECTOR_ELEN_FP_16 for XAndesvpackfph. * config/riscv/andes-vector-builtins-bases.cc (nds_vfpmad): New class. * config/riscv/andes-vector-builtins-bases.h: New def. * config/riscv/andes-vector-builtins-functions.def (nds_vfpmadt): Ditto. (nds_vfpmadb): Ditto. (nds_vfpmadt_frm): Ditto. (nds_vfpmadb_frm): Ditto. * config/riscv/andes-vector.md (@pred_nds_vfpmad<nds_tb><mode>): New pattern. * config/riscv/riscv-vector-builtins-types.def (DEF_RVV_F16_OPS): New def. * config/riscv/riscv-vector-builtins.cc (f16_ops): Ditto * config/riscv/riscv-vector-builtins.def (float32_type_node): Ditto. * config/riscv/riscv-vector-builtins.h (XANDESVPACKFPH_EXT): Ditto. (required_ext_to_isa_name): Add case XANDESVPACKFPH_EXT. (required_extensions_specified): Ditto. * config/riscv/vector-iterators.md (VHF): New iterator. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/xandesvector/non-policy/non-overloaded/nds_vfpmadb.c: New test. * gcc.target/riscv/rvv/xandesvector/non-policy/non-overloaded/nds_vfpmadt.c: New test. * gcc.target/riscv/rvv/xandesvector/non-policy/overloaded/nds_vfpmadb.c: New test. * gcc.target/riscv/rvv/xandesvector/non-policy/overloaded/nds_vfpmadt.c: New test. * gcc.target/riscv/rvv/xandesvector/policy/non-overloaded/nds_vfpmadb.c: New test. * gcc.target/riscv/rvv/xandesvector/policy/non-overloaded/nds_vfpmadt.c: New test. * gcc.target/riscv/rvv/xandesvector/policy/overloaded/nds_vfpmadb.c: New test. * gcc.target/riscv/rvv/xandesvector/policy/overloaded/nds_vfpmadt.c: New test.
12 daysc++: Update TLS model after processing a TLS variableH.J. Lu5-4/+61
Set a tentative TLS model in grokvardecl and update the TLS model with the default TLS access model after a TLS variable has been fully processed if the default TLS access model is stronger. gcc/cp/ PR c++/107393 * decl.cc (grokvardecl): Set a tentative TLS model which will be updated by cplus_decl_attributes later. * decl2.cc (cplus_decl_attributes): Update TLS model with the default TLS access model if the default TLS access model is stronger. * pt.cc (tsubst_decl): Set TLS model only after processing a variable. gcc/testsuite/ PR c++/107393 * g++.dg/tls/pr107393-1.C: New test. * g++.dg/tls/pr107393-2.C: Likewise. Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
12 daysAVR: ad target/121794 - Invoke zero_reg less.Georg-Johann Lay1-5/+5
gcc/ PR target/121794 * config/avr/avr.md (cmpqi3): Use cpi R,0 if possible.
12 daysRISC-V: Add test for vec_duplicate + vnmsub.vv unsigned combine with GR2VR ↵Pan Li16-0/+76
cost 0, 1 and 15 Add asm dump check and run test for vec_duplicate + vnmsub.vvm combine to vnmsub.vx, with the GR2VR cost is 0, 2 and 15. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u16.c: Add asm check for vnmsub.vx. * gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u32.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u64.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u8.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u16.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u32.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u64.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u8.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u16.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u32.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u64.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u8.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx_vnmsub-run-1-u16.c: New test. * gcc.target/riscv/rvv/autovec/vx_vf/vx_vnmsub-run-1-u32.c: New test. * gcc.target/riscv/rvv/autovec/vx_vf/vx_vnmsub-run-1-u64.c: New test. * gcc.target/riscv/rvv/autovec/vx_vf/vx_vnmsub-run-1-u8.c: New test. Signed-off-by: Pan Li <pan2.li@intel.com>
12 daysRISC-V: Add test for vec_duplicate + vnmsub.vv signed combine with GR2VR ↵Pan Li18-0/+446
cost 0, 1 and 15 Add asm dump check and run test for vec_duplicate + vnmsub.vv combine to vnmsub.vx, with the GR2VR cost is 0, 2 and 15. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i16.c: Add asm check for vnmsub.vx. * gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i32.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i64.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i8.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-2-i16.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-2-i32.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-2-i64.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-2-i8.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-3-i16.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-3-i32.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-3-i64.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-3-i8.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx_ternary.h: Add test helper macros. * gcc.target/riscv/rvv/autovec/vx_vf/vx_ternary_data.h: Add test data for run test. * gcc.target/riscv/rvv/autovec/vx_vf/vx_vnmsub-run-1-i16.c: New test. * gcc.target/riscv/rvv/autovec/vx_vf/vx_vnmsub-run-1-i32.c: New test. * gcc.target/riscv/rvv/autovec/vx_vf/vx_vnmsub-run-1-i64.c: New test. * gcc.target/riscv/rvv/autovec/vx_vf/vx_vnmsub-run-1-i8.c: New test. Signed-off-by: Pan Li <pan2.li@intel.com>
12 daysRISC-V: Combine vec_duplicate + vnmsub.vv to vnmsub.vx on GR2VR costPan Li2-27/+31
This patch combines vec_duplicate + vnmsub.vv into vnmsub.vx. The related pattern depends on the cost of vec_duplicate from GR2VR: late-combine performs the combination if the GR2VR cost is zero and rejects it if the GR2VR cost is greater than zero. Assume we have example code like below, with a GR2VR cost of 0. Before this patch: 11 │ beq a3,zero,.L8 12 │ vsetvli a5,zero,e32,m1,ta,ma 13 │ vmv.v.x v2,a2 ... 16 │ .L3: 17 │ vsetvli a5,a3,e32,m1,ta,ma ... 22 │ vnmsub.vv v1,v2,v3 ... 25 │ bne a3,zero,.L3 After this patch: 11 │ beq a3,zero,.L8 ... 14 │ .L3: 15 │ vsetvli a5,a3,e32,m1,ta,ma ... 20 │ vnmsub.vx v1,a2,v3 ... 23 │ bne a3,zero,.L3 gcc/ChangeLog: * config/riscv/autovec-opt.md (*vnmsac_vx_<mode>): Rename from. (*mul_minus_vx_<mode>): Rename to and add nmsub support. * config/riscv/vector.md (@pred_vnmsac_vx_<mode>): Rename from. (@pred_mul_minus_vx_<mode>): Rename to and add nmsub support. (*pred_nmsac_<mode>_scalar_undef): Rename from. (*pred_mul_minus_vx<mode>_undef): Rename to and add nmsub support. Signed-off-by: Pan Li <pan2.li@intel.com>
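A scalar loop of roughly the following shape is the kind of source that could be auto-vectorized into the sequences above. This is a sketch under assumptions: `nmsub_scalar` is a hypothetical name rather than one of the testsuite helpers, and the mapping relies on vnmsub.vx computing `vd[i] = -(x * vd[i]) + vs2[i]`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* For each element, overwrite vd[i] with vs2[i] - x * vd[i].  The
   scalar x is what vec_duplicate would broadcast into a vector
   register before this patch; with vnmsub.vx it stays in a GPR.  */
void
nmsub_scalar (int32_t *vd, const int32_t *vs2, int32_t x, size_t n)
{
  assert (vd != NULL && vs2 != NULL);
  for (size_t i = 0; i < n; i++)
    vd[i] = vs2[i] - x * vd[i];
}
```

Keeping `x` in a general-purpose register removes the `vmv.v.x` broadcast from the loop preheader, which is why the combination is only attractive when the GR2VR cost says the broadcast is not otherwise free.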
12 daysDaily bump.GCC Administrator5-1/+224
12 daysdoc: drop verify-canonical-types=1 refSam James1-2/+1
--param verify-canonical-types was removed back in r0-81986-g7313518b90b280. The same verification is controlled via our generic checking framework these days. gcc/ChangeLog: * doc/generic.texi (TYPE_CANONICAL): Don't mention long-removed --param verify-canonical-types.
12 daysdep_fusion: Fix if target does not have macro fusion [PR121835]Andrew Pinski1-0/+4
This new pass will ICE if the target does not define the macro_fusion_pair_p hook. The pass will not be useful in that case so it is best to return early. Pushed as obvious after a bootstrap on x86_64-linux-gnu. PR rtl-optimization/121835 gcc/ChangeLog: * dep-fusion.cc (pass_dep_fusion::execute): Return early if macro_fusion_pair_p is null. Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
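The guard can be pictured with a toy model in plain C. Everything here is hypothetical (`toy_target`, `toy_pair_p`, `toy_dep_fusion_execute`) and it is not GCC code; it only shows the pattern of checking an optional hook for null before calling through it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an optional target hook.  */
typedef bool (*fusion_pair_hook) (int prev_insn, int curr_insn);

struct toy_target
{
  fusion_pair_hook macro_fusion_pair_p; /* May be NULL.  */
};

/* Sample hook for the toy model: two "instructions" fuse when the
   second id directly follows the first.  */
bool
toy_pair_p (int prev_insn, int curr_insn)
{
  return curr_insn == prev_insn + 1;
}

/* Count adjacent fusible pairs.  Returns 0 immediately when the target
   provides no hook, instead of calling through a null pointer.  */
unsigned int
toy_dep_fusion_execute (const struct toy_target *target,
                        const int *insns, size_t n)
{
  if (!target->macro_fusion_pair_p)
    return 0;

  unsigned int pairs = 0;
  for (size_t i = 1; i < n; i++)
    if (target->macro_fusion_pair_p (insns[i - 1], insns[i]))
      pairs++;
  return pairs;
}
```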
13 daysgcc: introduce the dep_fusion passArtemiy Volkov7-3/+169
Presently, the scheduler code only considers consecutive instructions for macro-op fusion (see sched-deps.cc::sched_macro_fuse_insns () for details). This patch introduces the new dep_fusion pass, which is intended to uncover more fusion opportunities by reordering eligible instructions to form fusible pairs (based solely on the value of the TARGET_SCHED_MACRO_FUSION_PAIR_P hook). This is achieved by using the RTL-SSA framework, and only the single-use instructions are considered for the first instruction of a pair. Aside from reordering instructions, this pass also sets the SCHED_GROUP flag for the second instruction so that following passes can implement special handling of the fused pairs. For instance, RA and regrename should make use of this information to preserve single-output property for some of such pairs. Accordingly, in passes.def, this patch adds two invocations of the new pass: just before IRA and just before regrename. The new pass is enabled at -O2+ and -Os. gcc/ChangeLog: * Makefile.in (OBJS): Add dep-fusion.o. * common.opt (fdep-fusion): Add option. * dep-fusion.cc: New pass. * doc/invoke.texi: Document it. * opts.cc (default_options_table): Enable it at -O2+ and -Os. * passes.def: Insert two instances of dep_fusion. * tree-pass.h (make_pass_dep_fusion): Declare new function.
13 daysdoc: fix -momit-leaf-frame-pointer typoSam James1-1/+1
For x86, the option is -momit-leaf-frame-pointer, not -fomit-leaf-frame-pointer. gcc/ChangeLog: * doc/invoke.texi (x86 Options): Fix '-momit-leaf-frame-pointer' typo.
13 daysforwprop: Factor out the memcpy followed by memset optimizationAndrew Pinski1-198/+212
As simplify_builtin_call adds more and more optimization, it is getting bigger and bigger and easier to misunderstand, so this factors out the memcpy followed by memset optimization (which was the original optimization added). Bootstrapped and tested on x86_64-linux-gnu. gcc/ChangeLog: * tree-ssa-forwprop.cc (simplify_builtin_call): Factor out the memcpy followed by a memset optimization to ... (simplify_builtin_memcpy_memset): Here. New function. Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
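The message does not spell out the optimization itself. As an illustration only, and assuming it is the merge of a memcpy from a string constant with a memset that continues right after it, the two functions below show a before and after shape. The names `fill_separate` and `fill_merged` are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Before (assumed shape): copy a 4-byte string constant, then pad the
   next 3 bytes with spaces in a separate call.  */
void
fill_separate (char *p)
{
  assert (p != NULL);
  memcpy (p, "abcd", 4);
  memset (p + 4, ' ', 3);
}

/* After (assumed shape): a single 7-byte copy from a longer constant
   that already contains the padding.  */
void
fill_merged (char *p)
{
  assert (p != NULL);
  memcpy (p, "abcd   ", 7);
}
```

Both functions leave the same 7 bytes in the buffer, which is what makes the merge a pure code-size and speed change.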
13 daysforwprop: Factor out memchr optimization to its own functionAndrew Pinski1-60/+70
As more optimizations are added to forwprop's simplify_builtin_call, this function is becoming harder and harder to understand. To help simplify things, this factors out the memchr optimization to its own function like what was done when memcmp optimization was added. Bootstrapped and tested on x86_64-linux-gnu. gcc/ChangeLog: * tree-ssa-forwprop.cc (simplify_builtin_call): Factor out the memchr optimization to ... (simplify_builtin_memchr): Here. New function. Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
13 daysipa: Fix build on MacOSSimon Martin1-0/+1
The build is broken on MacOS since r16-3581-g1da3c4d90e678a because ipa-inline-transform.cc uses std::max but does not include <algorithm>. This patch fixes it by defining INCLUDE_ALGORITHM in that file. gcc/ChangeLog: * ipa-inline-transform.cc: Define INCLUDE_ALGORITHM.
13 daysinstall: Properly capitalize GNU BinutilsGerald Pfeifer1-12/+12
gcc: PR target/69374 * doc/install.texi (Prerequisites): Properly capitalize GNU Binutils. (Configuration): Ditto. (Building): Ditto. (Specific): Ditto.
13 daysdoc: consistently say 'whole-program' where appropriateSam James1-4/+4
Unchanged instances are deliberate. gcc/ChangeLog: * doc/invoke.texi: Say 'whole-program' consistently where appropriate.
13 daysdoc: consistently spell 'GNU Binutils'Sam James1-2/+2
gcc/ChangeLog: * doc/invoke.texi: Capitalize 'GNU Binutils' consistently.
13 daysdoc: update incremental link vs binutils informationSam James1-7/+6
GNU Binutils now supports linking LTO and non-LTO objects into a single mixed object file as of 2.44. Update the text to reflect this and fix some minor grammar issues while at it. gcc/ChangeLog: PR ipa/116410 * doc/invoke.texi (Link Options): Update -flinker-output= text to reflect GNU Binutils changes. Fix grammar.