path: root/gcc
2023-12-18  RISC-V: Enable vect test for RV32  (Juzhe-Zhong, 1 file changed, -3/+4)
gcc/testsuite/ChangeLog: * lib/target-supports.exp: Add RV32.
2023-12-18  RISC-V: Fix natural regsize for fixed-vlmax of -march=rv64gc_zve32f  (Juzhe-Zhong, 4 files changed, -3/+102)
This patch fixes 12 ICEs from "full coverage" testing:

Running target riscv-sim/-march=rv64gc_zve32f/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-lmul=dynamic/--param=riscv-autovec-preference=fixed-vlmax
FAIL: gcc.dg/torture/pr96513.c -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions (internal compiler error: Segmentation fault)
FAIL: gcc.dg/torture/pr96513.c -O3 -g (internal compiler error: Segmentation fault)

Running target riscv-sim/-march=rv64gc_zve32f/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-lmul=m4/--param=riscv-autovec-preference=fixed-vlmax
FAIL: gcc.dg/torture/pr111048.c -O2 (internal compiler error: Segmentation fault)
FAIL: gcc.dg/torture/pr111048.c -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions (internal compiler error: Segmentation fault)
FAIL: gcc.dg/torture/pr111048.c -O3 -g (internal compiler error: Segmentation fault)
FAIL: gcc.dg/torture/pr96513.c -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions (internal compiler error: Segmentation fault)
FAIL: gcc.dg/torture/pr96513.c -O3 -g (internal compiler error: Segmentation fault)

Running target riscv-sim/-march=rv64gc_zve32f/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-lmul=m8/--param=riscv-autovec-preference=fixed-vlmax
FAIL: gcc.dg/torture/pr96513.c -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions (internal compiler error: Segmentation fault)
FAIL: gcc.dg/torture/pr96513.c -O3 -g (internal compiler error: Segmentation fault)

Running target riscv-sim/-march=rv64gc_zve32f/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-preference=fixed-vlmax
FAIL: gcc.c-torture/execute/20000801-1.c -O2 (internal compiler error: Segmentation fault)
FAIL: gcc.c-torture/execute/20000801-1.c -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions (internal compiler error: Segmentation fault)
FAIL: gcc.c-torture/execute/20000801-1.c -O3 -g (internal compiler error: Segmentation fault)

The root cause of those ICEs is that the vector register size is 32 bits whereas the scalar register size is 64 bits; that is, vector regsize < scalar regsize with -march=rv64gc_zve32f and FIXED-VLMAX. The original natural regsize, based on the scalar register size, is therefore incorrect. Instead, we should return the minimum of the vector regsize and the scalar regsize.

gcc/ChangeLog:

	* config/riscv/riscv.cc (riscv_regmode_natural_size): Fix ICE for FIXED-VLMAX of -march=rv32gc_zve32f.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/autovec/bug-4.c: New test.
	* gcc.target/riscv/rvv/autovec/bug-5.c: New test.
	* gcc.target/riscv/rvv/autovec/bug-6.c: New test.
2023-12-18  tree-object-size: Robustify alloc_size attribute handling [PR113013]  (Jakub Jelinek, 2 files changed, -9/+35)
The following testcase ICEs because we aren't careful enough with the alloc_size attribute. We do check that such an argument exists (although we wouldn't handle correctly functions with more than INT_MAX arguments), but didn't check that it is a scalar integer; the ICE comes from trying to fold_convert a structure to sizetype. Given that the attribute can also appear on non-prototyped functions where the arguments aren't known, I don't see how the FE could diagnose that, and because we already handle the case where the argument doesn't exist, I think we should also verify that the argument is a scalar integer convertible to sizetype. Furthermore, given that this is not just used in diagnostics but for code generation, I think it is better to punt on arguments with larger precision than sizetype, as the upper bits would otherwise be truncated. The patch also fixes some formatting issues and avoids duplication of the fold_convert, plus removes an unnecessary check for if (arg1 >= 0), which always holds after if (arg1 < 0) return ...;

2023-12-18  Jakub Jelinek  <jakub@redhat.com>

	PR tree-optimization/113013
	* tree-object-size.cc (alloc_object_size): Return size_unknown if corresponding argument(s) don't have integral type or have integral type with higher precision than sizetype. Don't check arg1 >= 0 uselessly. Compare argument indexes against gimple_call_num_args in unsigned type rather than int. Formatting fixes.
	* gcc.dg/pr113013.c: New test.
2023-12-18  testsuite: Fix up abi-tag25a.C test for C++11  (Jakub Jelinek, 1 file changed, -1/+1)
Line 11 of abi-tag25.C is wrapped in #if __cpp_variable_templates which isn't defined for -std=c++11, so we can't expect a warning in that case either. 2023-12-18 Jakub Jelinek <jakub@redhat.com> * g++.dg/abi/abi-tag25a.C: Expect second dg-warning only for c++14 and later.
2023-12-18  RISC-V: Bugfix for the RVV const vector  (Pan Li, 1 file changed, -1/+1)
This patch would like to fix one bug of const vector for interleave. Assume we need to generate an interleaved const vector like the one below:

V = {4, -4, 3, -3, 2, -2, 1, -1,}

Before this patch:

vsetvl a3, zero, e64, m8, ta, ma
vid.v v8              v8 = {0, 1, 2, 3, 4}
li a6, -1
vmul.vx v8, v8, a6    v8 = {-0, -1, -2, -3, -4}
vadd.vi v24, v8, 4    v24 = { 4, 3, 2, 1, 0}
vadd.vi v8, v8, -4    v8 = {-4, -5, -6, -7, -8}
li a6, 32
vsll.vx v8, v8, a6    v8 = {0, -4, 0, -5, 0, -6, 0, -7,} for e32
vor v24, v24, v8      v24 = {4, -4, 3, -5, 2, -6, 1, -7,} for e32

After this patch:

vsetvli a6,zero,e64,m8,ta,ma
vid.v v8              v8 = {0, 1, 2, 3, 4}
li a7,-1
vmul.vx v16,v8,a7     v16 = {-0, -1, -2, -3, -4}
vadd.vi v16,v16,4     v16 = { 4, 3, 2, 1, 0}
vadd.vi v8,v8,-4      v8 = {-4, -3, -2, -1, 0}
li a7,32
vsll.vx v8,v8,a7      v8 = {0, -4, 0, -3, 0, -2,} for e32
vor.vv v16,v16,v8     v16 = {4, -4, 3, -3, 2, -2,} for e32

It is not easy to add an asm check stable enough for this case, as we need to check that the vadd -4 target comes from the vid output, which crosses 4 instructions up to that point. Thus there is no test here; the change will be covered by gcc.dg/vect/pr92420.c in the follow-up patches.

gcc/ChangeLog:

	* config/riscv/riscv-v.cc (expand_const_vector): Take step2 instead of step1 for the second series.

Signed-off-by: Pan Li <pan2.li@intel.com>
2023-12-18  testsuite: Fix cpymem-1.c dump checks under different riscv-sim for RVV.  (xuli, 1 file changed, -3/+26)
gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/base/cpymem-1.c: Fix checks.
2023-12-18  LoongArch: Add support for D frontend.  (liushuyu, 4 files changed, -0/+108)
gcc/ChangeLog: * config.gcc: Add loongarch-d.o to d_target_objs for LoongArch architecture. * config/loongarch/t-loongarch: Add object target for loongarch-d.cc. * config/loongarch/loongarch-d.cc (loongarch_d_target_versions): add interface function to define builtin D versions for LoongArch architecture. (loongarch_d_handle_target_float_abi): add interface function to define builtin D traits for LoongArch architecture. (loongarch_d_register_target_info): add interface function to register loongarch_d_handle_target_float_abi function. * config/loongarch/loongarch-d.h (loongarch_d_target_versions): add function prototype. (loongarch_d_register_target_info): Likewise. libphobos/ChangeLog: * configure.tgt: Enable libphobos for LoongArch architecture. * libdruntime/gcc/sections/elf.d: Add TLS_DTV_OFFSET constant for LoongArch64. * libdruntime/gcc/unwind/generic.d: Add __aligned__ constant for LoongArch64.
2023-12-18  RISC-V: Add viota missed avl_type attribute  (xuli, 2 files changed, -1/+76)
This patch fixes the following FAIL when LMUL = 8:

riscv-sim/-march=rv64gcv/-mabi=lp64d/-mcmodel=medany/--param=riscv-autovec-lmul=m8/--param=riscv-autovec-preference=scalable
FAIL: gcc.dg/vect/slp-multitypes-2.c execution test

The root cause is that we missed the viota avl_type, so we end up with an incorrect vsetvl configuration:

vsetvli zero,a2,e64,m8,ta,ma
viota.m v16,v0

The 'a2' value is a garbage value.

After this patch:

vsetvli a4,zero,e64,m8,ta,ma
viota.m v16,v0

gcc/ChangeLog:

	* config/riscv/vector.md: Add viota avl_type attribute.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/autovec/bug-2.c: New test.
2023-12-18  RISC-V: Fix POLY INT handle bug  (Pan Li, 2 files changed, -4/+45)
This patch fixes the following FAIL:

Running target riscv-sim/-march=rv64gcv/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-lmul=m8
FAIL: gcc.dg/vect/fast-math-vect-complex-3.c execution test

The root cause is that we generate incorrect codegen for (const_poly_int:DI [549755813888, 549755813888]).

Before this patch:

li a7,0
vmv.v.x v0,a7

After this patch:

csrr a2,vlenb
slli a2,a2,33
vmv.v.x v0,a2

gcc/ChangeLog:

	* config/riscv/riscv.cc (riscv_expand_mult_with_const_int): Change int into HOST_WIDE_INT.
	(riscv_legitimize_poly_move): Ditto.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/autovec/bug-3.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>
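A quick arithmetic check (an editorial illustration, not part of the commit) of why a plain int cannot carry this coefficient:

    // The offending poly_int coefficient is 2^39.  Truncated to 32 bits it
    // becomes 0, which is consistent with the bogus "li a7,0" above; hence
    // the switch to HOST_WIDE_INT.
    static_assert (549755813888LL == (1LL << 39),
                   "the coefficient needs a 64-bit host type");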
2023-12-18  Daily bump.  (GCC Administrator, 4 files changed, -1/+61)
2023-12-17  Fortran: fix argument passing to CONTIGUOUS,TARGET dummy [PR97592]  (Harald Anlauf, 2 files changed, -1/+237)
gcc/fortran/ChangeLog: PR fortran/97592 * trans-expr.cc (gfc_conv_procedure_call): For a contiguous dummy with the TARGET attribute, the effective argument may still be contiguous even if the actual argument is not simply-contiguous. Allow packing to be decided at runtime by _gfortran_internal_pack. gcc/testsuite/ChangeLog: PR fortran/97592 * gfortran.dg/contiguous_15.f90: New test.
2023-12-17  LoongArch: Add alslsi3_extend  (Xi Ruoyao, 1 file changed, -0/+12)
Following the instruction cost fix, we are generating

alsl.w  $a0, $a0, $a0, 4

instead of

li.w    $t0, 17
mul.w   $a0, $a0, $t0

for "x * 17", because alsl.w is 4 times faster than mul.w. But we didn't have a sign-extending pattern for alsl.w, causing an extra slli.w instruction to be generated to sign-extend $a0. Add the pattern to remove the redundant extension.

gcc/ChangeLog:

	* config/loongarch/loongarch.md (alslsi3_extend): New define_insn.
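For illustration, a hedged sketch (not taken from the testsuite) of the kind of source this is about:

    // With the new alslsi3_extend pattern, this should compile on loongarch64
    // to a single sign-extending alsl.w rather than alsl.w plus a redundant
    // slli.w.
    int
    times17 (int x)
    {
      return x * 17;
    }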
2023-12-17  LoongArch: Fix instruction costs [PR112936]  (Xi Ruoyao, 3 files changed, -29/+43)
Replace the instruction costs in loongarch_rtx_cost_data constructor based on micro-benchmark results on LA464 and LA664. This allows optimizations like "x * 17" to alsl, and "x * 68" to alsl and slli. gcc/ChangeLog: PR target/112936 * config/loongarch/loongarch-def.cc (loongarch_rtx_cost_data::loongarch_rtx_cost_data): Update instruction costs per micro-benchmark results. (loongarch_rtx_cost_optimize_size): Set all instruction costs to (COSTS_N_INSNS (1) + 1). * config/loongarch/loongarch.cc (loongarch_rtx_costs): Remove special case for multiplication when optimizing for size. Adjust division cost when TARGET_64BIT && !TARGET_DIV32. Account the extra cost when TARGET_CHECK_ZERO_DIV and optimizing for speed. gcc/testsuite/ChangeLog PR target/112936 * gcc.target/loongarch/mul-const-reduction.c: New test.
2023-12-17  LoongArch: Include rtl.h for COSTS_N_INSNS instead of hard coding our own  (Xi Ruoyao, 1 file changed, -2/+1)
With loongarch-def.cc switched from C to C++, we can include rtl.h for COSTS_N_INSNS, instead of hard coding our own. This is a non-functional change for now, but it will make the code more future-proof in case COSTS_N_INSNS in rtl.h is ever changed.

gcc/ChangeLog:

	* config/loongarch/loongarch-def.cc (rtl.h): Include.
	(COSTS_N_INSNS): Remove the macro definition.
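For reference, the definition pulled in from rtl.h is just a scaling macro (quoted here on the assumption that it is unchanged in current sources):

    /* From gcc/rtl.h: convert a count of instructions into the cost units
       used by the rtx cost hooks.  */
    #define COSTS_N_INSNS(N) ((N) * 4)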
2023-12-17  install: Streamline the hppa*-hp-hpux* section  (Gerald Pfeifer, 1 file changed, -21/+2)
gcc: PR target/69374 * doc/install.texi (Specific) <hppa*-hp-hpux*>: Remove a note on GCC 4.3. Remove details on how the HP assembler, which we document as not working, breaks. <hppa*-hp-hpux11>: Note that only the HP linker is supported.
2023-12-17  doc: Remove references to buildstat.html  (Gerald Pfeifer, 1 file changed, -52/+1)
gcc: PR other/69374 * doc/install.texi (Installing GCC): Remove reference to buildstat.html. (Testing): Ditto. (Final install): Remove section on submitting information for buildstat.html. Adjust the request for feedback.
2023-12-17  Daily bump.  (GCC Administrator, 9 files changed, -1/+343)
2023-12-17  c++: Seed namespaces for bindings [PR106363]  (Nathaniel Shead, 3 files changed, -3/+20)
Currently the first depset for an EK_BINDING is not seeded. This breaks the attached testcase as then the namespace is not considered referenced yet during streaming, but we've already finished importing. There doesn't seem to be any particular reason I could find for skipping the first depset for bindings, and removing the condition doesn't appear to cause any test failures, so this patch removes that check. PR c++/106363 gcc/cp/ChangeLog: * module.cc (module_state::write_cluster): Don't skip first depset for bindings. gcc/testsuite/ChangeLog: * g++.dg/modules/pr106363_a.C: New test. * g++.dg/modules/pr106363_b.C: New test. Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
2023-12-16  analyzer: add sarif properties for bounds checking diagnostics  (David Malcolm, 7 files changed, -0/+208)
As a followup to r14-6057-g12b67d1e13b3cf, add SARIF property bags for -Wanalyzer-out-of-bounds, to help with debugging these warnings. This was very helpful with PR analyzer/112792. gcc/analyzer/ChangeLog: * analyzer.cc: Include "tree-pretty-print.h" and "diagnostic-event-id.h". (tree_to_json): New. (diagnostic_event_id_to_json): New. (bit_offset_to_json): New. (byte_offset_to_json): New. * analyzer.h (tree_to_json): New decl. (diagnostic_event_id_to_json): New decl. (bit_offset_to_json): New decl. (byte_offset_to_json): New decl. * bounds-checking.cc: Include "diagnostic-format-sarif.h". (out_of_bounds::maybe_add_sarif_properties): New. (concrete_out_of_bounds::maybe_add_sarif_properties): New. (concrete_past_the_end::maybe_add_sarif_properties): New. (symbolic_past_the_end::maybe_add_sarif_properties): New. * region-model.cc (region_to_value_map::to_json): New. (region_model::to_json): New. * region-model.h (region_to_value_map::to_json): New decl. (region_model::to_json): New decl. * store.cc (bit_range::to_json): New. (byte_range::to_json): New. * store.h (bit_range::to_json): New decl. (byte_range::to_json): New decl. Signed-off-by: David Malcolm <dmalcolm@redhat.com>
2023-12-16  json: fix escaping of object keys  (David Malcolm, 1 file changed, -40/+54)
gcc/ChangeLog: * json.cc (print_escaped_json_string): New, taken from string::print. (object::print): Use it for printing keys. (string::print): Move implementation to print_escaped_json_string. (selftest::test_writing_objects): Add a key containing quote, backslash, and control characters. Signed-off-by: David Malcolm <dmalcolm@redhat.com>
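A standalone sketch of the escaping rule that now applies to object keys as well as string values (illustrative C++ only, not the code in json.cc):

    #include <cstdio>
    #include <string>

    // Escape '"', backslash and control characters so the result is a valid
    // JSON string body per RFC 8259.
    static std::string
    escape_json (const std::string &s)
    {
      std::string out;
      for (unsigned char c : s)
        {
          if (c == '"' || c == '\\')
            {
              out += '\\';
              out += static_cast<char> (c);
            }
          else if (c < 0x20)
            {
              char buf[8];
              std::snprintf (buf, sizeof buf, "\\u%04x", static_cast<unsigned> (c));
              out += buf;
            }
          else
            out += static_cast<char> (c);
        }
      return out;
    }

Before the fix only string values went through this path; a key containing such characters produced malformed JSON output.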
2023-12-16  analyzer: use bit-level granularity for concrete bounds-checking [PR112792]  (David Malcolm, 6 files changed, -183/+512)
PR analyzer/112792 reports false positives from -fanalyzer's bounds-checking on certain packed structs containing bitfields e.g. in the Linux kernel's drivers/dma/idxd/device.c: union msix_perm { struct { u32 rsvd2 : 8; u32 pasid : 20; }; u32 bits; } __attribute__((__packed__)); The root cause is that the bounds-checking is done using byte offsets and ranges; in the above, an access of "pasid" is treated as a 32-bit access starting one byte inside the union, thus accessing byte offsets 1-4 when only offsets 0-3 are valid. This patch updates the bounds-checking to use bit offsets and ranges wherever possible - for concrete offsets and capacities. In the above accessing "pasid" is treated as bits 8-27 of a 32-bit region, fixing the false positive. Symbolic offsets and ranges are still handled at byte granularity. gcc/analyzer/ChangeLog: PR analyzer/112792 * bounds-checking.cc (out_of_bounds::oob_region_creation_event_capacity): Rename "capacity" to "byte_capacity". Layout fix. (out_of_bounds::::add_region_creation_events): Rename "capacity" to "byte_capacity". (class concrete_out_of_bounds): Rename m_out_of_bounds_range to m_out_of_bounds_bits and convert from a byte_range to a bit_range. (concrete_out_of_bounds::get_out_of_bounds_bytes): New. (concrete_past_the_end::concrete_past_the_end): Rename param "byte_bound" to "bit_bound". Initialize m_byte_bound. (concrete_past_the_end::subclass_equal_p): Update for renaming of m_byte_bound to m_bit_bound. (concrete_past_the_end::m_bit_bound): New field. (concrete_buffer_overflow::concrete_buffer_overflow): Convert param "range" from byte_range to bit_range. Rename param "byte_bound" to "bit_bound". (concrete_buffer_overflow::emit): Update for bits vs bytes. (concrete_buffer_overflow::describe_final_event): Split into... (concrete_buffer_overflow::describe_final_event_as_bytes): ...this (concrete_buffer_overflow::describe_final_event_as_bits): ...and this. (concrete_buffer_over_read::concrete_buffer_over_read): Convert param "range" from byte_range to bit_range. Rename param "byte_bound" to "bit_bound". (concrete_buffer_over_read::emit): Update for bits vs bytes. (concrete_buffer_over_read::describe_final_event): Split into... (concrete_buffer_over_read::describe_final_event_as_bytes): ...this (concrete_buffer_over_read::describe_final_event_as_bits): ...and this. (concrete_buffer_underwrite::concrete_buffer_underwrite): Convert param "range" from byte_range to bit_range. (concrete_buffer_underwrite::describe_final_event): Split into... (concrete_buffer_underwrite::describe_final_event_as_bytes): ...this (concrete_buffer_underwrite::describe_final_event_as_bits): ...and this. (concrete_buffer_under_read::concrete_buffer_under_read): Convert param "range" from byte_range to bit_range. (concrete_buffer_under_read::describe_final_event): Split into... (concrete_buffer_under_read::describe_final_event_as_bytes): ...this (concrete_buffer_under_read::describe_final_event_as_bits): ...and this. (region_model::check_region_bounds): Use bits for concrete values, and rename locals to indicate whether we're dealing with bits or bytes. Specifically, replace "num_bytes_sval" with "num_bits_sval", and get it from reg's "get_bit_size_sval". Replace "num_bytes_tree" with "num_bits_tree". Rename "capacity" to "byte_capacity". Rename "cst_capacity_tree" to "cst_byte_capacity_tree". Replace "offset" and "num_bytes_unsigned" with "bit_offset" and "num_bits_unsigned" respectively, converting from byte_offset_t to bit_offset_t. 
Replace "out" and "read_bytes" with "bits_outside" and "read_bits" respectively, converting from byte_range to bit_range. Convert "buffer" from byte_range to bit_range. Replace "byte_bound" with "bit_bound". * region.cc (region::get_bit_size_sval): New. (offset_region::get_bit_offset): New. (offset_region::get_bit_size_sval): New. (sized_region::get_bit_size_sval): New. (bit_range_region::get_bit_size_sval): New. * region.h (region::get_bit_size_sval): New vfunc. (offset_region::get_bit_offset): New decl. (offset_region::get_bit_size_sval): New decl. (sized_region::get_bit_size_sval): New decl. (bit_range_region::get_bit_size_sval): New decl. * store.cc (bit_range::intersects_p): New, based on byte_range::intersects_p. (bit_range::exceeds_p): New, based on byte_range::exceeds_p. (bit_range::falls_short_of_p): New, based on byte_range::falls_short_of_p. (byte_range::intersects_p): Delete. (byte_range::exceeds_p): Delete. (byte_range::falls_short_of_p): Delete. * store.h (bit_range::intersects_p): New overload. (bit_range::exceeds_p): New. (bit_range::falls_short_of_p): New. (byte_range::intersects_p): Delete. (byte_range::exceeds_p): Delete. (byte_range::falls_short_of_p): Delete. gcc/testsuite/ChangeLog: PR analyzer/112792 * c-c++-common/analyzer/out-of-bounds-pr112792.c: New test. Signed-off-by: David Malcolm <dmalcolm@redhat.com>
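To make the failure mode concrete, here is a sketch mirroring the union quoted above (illustrative only, not the new testcase):

    // GNU extension: anonymous struct inside a union, as in the kernel code.
    union msix_perm
    {
      struct
      {
        unsigned int rsvd2 : 8;
        unsigned int pasid : 20;
      };
      unsigned int bits;
    } __attribute__ ((packed));

    unsigned int
    get_pasid (const union msix_perm *p)
    {
      // Reads bits 8..27 of the 4-byte union.  The old byte-granular check
      // modelled this as a 4-byte access starting at byte offset 1 and
      // warned; the bit-granular check does not.
      return p->pasid;
    }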
2023-12-16  Fortran: Prevent unwanted finalization with -w option [PR112459]  (Paul Thomas, 3 files changed, -2/+43)
2023-12-16 Paul Thomas <pault@gcc.gnu.org> gcc/fortran PR fortran/112459 * trans-array.cc (gfc_trans_array_constructor_value): Replace gfc_notification_std with explicit logical expression that selects F2003/2008 and excludes -std=default/gnu. * trans-expr.cc (gfc_conv_expr): Ditto. gcc/testsuite/ PR fortran/112459 * gfortran.dg/pr112459.f90: New test.
2023-12-16  Fortran: Fix problems with class array function selectors [PR112834]  (Paul Thomas, 6 files changed, -6/+109)
2023-12-16  Paul Thomas  <pault@gcc.gnu.org>

gcc/fortran
	PR fortran/112834
	* match.cc (build_associate_name): Fix whitespace issues.
	(select_type_set_tmp): If the selector is of unknown type, go to the SELECT TYPE selector to see if this is a function and, if the result is available, use its typespec.
	* parse.cc (parse_associate): Again, use the function result if the type of the selector result is unknown.
	* trans-stmt.cc (trans_associate_var): The expression has to be of type class for class_target to be true. Convert and fix class functions. Pass the fixed expression.

	PR fortran/111853
	* resolve.cc (gfc_expression_rank): Avoid null dereference.

gcc/testsuite/
	PR fortran/112834
	* gfortran.dg/associate_63.f90: New test.

	PR fortran/111853
	* gfortran.dg/pr111853.f90: New test.
2023-12-16  c++: Fix unchecked use of CLASSTYPE_AS_BASE [PR113031]  (Nathaniel Shead, 2 files changed, -1/+36)
My previous commit (naively) assumed that a TREE_CODE of RECORD_TYPE or UNION_TYPE was sufficient for optype to be considered a "class type". However, this does not account for e.g. template type parameters of record or union type. This patch corrects to check for CLASS_TYPE_P before checking for as-base conversion. PR c++/113031 gcc/cp/ChangeLog: * constexpr.cc (cxx_fold_indirect_ref_1): Check for CLASS_TYPE before using CLASSTYPE_AS_BASE. gcc/testsuite/ChangeLog: * g++.dg/cpp0x/pr113031.C: New test. Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
2023-12-16  [aarch64] Add function multiversioning support  (Andrew Carlotti, 18 files changed, -57/+1139)
This adds initial support for function multiversioning on aarch64 using the target_version and target_clones attributes. This loosely follows the Beta specification in the ACLE [1], although with some differences that still need to be resolved (possibly as follow-up patches). Existing function multiversioning implementations are broken in various ways when used across translation units. This includes placing resolvers in the wrong translation units, and using symbol mangling that callers to unintentionally bypass the resolver in some circumstances. Fixing these issues for aarch64 will require modifications to our ACLE specification. It will also require further adjustments to existing middle end code, to facilitate different mangling and resolver placement while preserving existing target behaviours. The list of function multiversioning features specified in the ACLE is also inconsistent with the list of features supported in target option extensions. I intend to resolve some or all of these inconsistencies at a later stage. The target_version attribute is currently only supported in C++, since this is the only frontend with existing support for multiversioning using the target attribute. On the other hand, this patch happens to enable multiversioning with the target_clones attribute in Ada and D, as well as the entire C family, using their existing frontend support. This patch also does not support the following aspects of the Beta specification: - The target_clones attribute should allow an implicit unlisted "default" version. - There should be an option to disable function multiversioning at compile time. - Unrecognised target names in a target_clones attribute should be ignored (with an optional warning). This current patch raises an error instead. [1] https://github.com/ARM-software/acle/blob/main/main/acle.md#function-multi-versioning gcc/ChangeLog: * config/aarch64/aarch64-feature-deps.h (fmv_deps_<FEAT_NAME>): Define aarch64_feature_flags mask foreach FMV feature. * config/aarch64/aarch64-option-extensions.def: Use new macros to define FMV feature extensions. * config/aarch64/aarch64.cc (aarch64_option_valid_attribute_p): Check for target_version attribute after processing target attribute. (aarch64_fmv_feature_data): New. (aarch64_parse_fmv_features): New. (aarch64_process_target_version_attr): New. (aarch64_option_valid_version_attribute_p): New. (get_feature_mask_for_version): New. (compare_feature_masks): New. (aarch64_compare_version_priority): New. (build_ifunc_arg_type): New. (make_resolver_func): New. (add_condition_to_bb): New. (dispatch_function_versions): New. (aarch64_generate_version_dispatcher_body): New. (aarch64_get_function_versions_dispatcher): New. (aarch64_common_function_versions): New. (aarch64_mangle_decl_assembler_name): New. (TARGET_OPTION_VALID_VERSION_ATTRIBUTE_P): New implementation. (TARGET_OPTION_EXPANDED_CLONES_ATTRIBUTE): New implementation. (TARGET_OPTION_FUNCTION_VERSIONS): New implementation. (TARGET_COMPARE_VERSION_PRIORITY): New implementation. (TARGET_GENERATE_VERSION_DISPATCHER_BODY): New implementation. (TARGET_GET_FUNCTION_VERSIONS_DISPATCHER): New implementation. (TARGET_MANGLE_DECL_ASSEMBLER_NAME): New implementation. * config/aarch64/aarch64.h (TARGET_HAS_FMV_TARGET_ATTRIBUTE): Set target macro. * config/arm/aarch-common.h (enum aarch_parse_opt_result): Add new value to report duplicate FMV feature. * common/config/aarch64/cpuinfo.h: New file. 
libgcc/ChangeLog: * config/aarch64/cpuinfo.c (enum CPUFeatures): Move to shared copy in gcc/common gcc/testsuite/ChangeLog: * gcc.target/aarch64/options_set_17.c: Reorder expected flags. * gcc.target/aarch64/cpunative/native_cpu_0.c: Ditto. * gcc.target/aarch64/cpunative/native_cpu_13.c: Ditto. * gcc.target/aarch64/cpunative/native_cpu_16.c: Ditto. * gcc.target/aarch64/cpunative/native_cpu_17.c: Ditto. * gcc.target/aarch64/cpunative/native_cpu_18.c: Ditto. * gcc.target/aarch64/cpunative/native_cpu_19.c: Ditto. * gcc.target/aarch64/cpunative/native_cpu_20.c: Ditto. * gcc.target/aarch64/cpunative/native_cpu_21.c: Ditto. * gcc.target/aarch64/cpunative/native_cpu_22.c: Ditto. * gcc.target/aarch64/cpunative/native_cpu_6.c: Ditto. * gcc.target/aarch64/cpunative/native_cpu_7.c: Ditto.
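For illustration, the attribute forms this enables on aarch64 (target_version is C++ only at this stage; the feature names follow the ACLE FMV spec linked above, and the exact set accepted by this initial patch may differ):

    // Per-version definitions, dispatched at run time through an ifunc
    // resolver.
    __attribute__ ((target_version ("default")))
    int accelerated (void) { return 0; }

    __attribute__ ((target_version ("sve")))
    int accelerated (void) { return 1; }

    // Or a single body cloned for several targets.  Note that "default"
    // currently has to be listed explicitly; an implicit default is listed
    // above as future work.
    __attribute__ ((target_clones ("default", "dotprod", "sve")))
    int cloned (void) { return 2; }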
2023-12-16  Add support for target_version attribute  (Andrew Carlotti, 14 files changed, -23/+124)
This patch adds support for the "target_version" attribute to the middle end and the C++ frontend, which will be used to implement function multiversioning in the aarch64 backend. On targets that don't use the "target" attribute for multiversioning, there is no conflict between the "target" and "target_clones" attributes. This patch therefore makes the mutual exclusion in C-family, D and Ada conditonal upon the value of the expanded_clones_attribute target hook. The "target_version" attribute is only added to C++ in this patch, because this is currently the only frontend which supports multiversioning using the "target" attribute. Support for the "target_version" attribute will be extended to C at a later date. Targets that currently use the "target" attribute for function multiversioning (i.e. i386 and rs6000) are not affected by this patch. gcc/ChangeLog: * attribs.cc (decl_attributes): Pass attribute name to target. (is_function_default_version): Update comment to specify incompatibility with target_version attributes. * cgraphclones.cc (cgraph_node::create_version_clone_with_body): Call valid_version_attribute_p for target_version attributes. * defaults.h (TARGET_HAS_FMV_TARGET_ATTRIBUTE): New macro. * target.def (valid_version_attribute_p): New hook. * doc/tm.texi.in: Add new hook. * doc/tm.texi: Regenerate. * multiple_target.cc (create_dispatcher_calls): Remove redundant is_function_default_version check. (expand_target_clones): Use target macro to pick attribute name. * targhooks.cc (default_target_option_valid_version_attribute_p): New. * targhooks.h (default_target_option_valid_version_attribute_p): New. * tree.h (DECL_FUNCTION_VERSIONED): Update comment to include target_version attributes. gcc/c-family/ChangeLog: * c-attribs.cc (attr_target_exclusions): Make target/target_clones exclusion target-dependent. (attr_target_clones_exclusions): Ditto, and add target_version. (attr_target_version_exclusions): New. (c_common_attribute_table): Add target_version. (handle_target_version_attribute): New. (handle_target_attribute): Amend comment. (handle_target_clones_attribute): Ditto. gcc/ada/ChangeLog: * gcc-interface/utils.cc (attr_target_exclusions): Make target/target_clones exclusion target-dependent. (attr_target_clones_exclusions): Ditto. gcc/d/ChangeLog: * d-attribs.cc (attr_target_exclusions): Make target/target_clones exclusion target-dependent. (attr_target_clones_exclusions): Ditto. gcc/cp/ChangeLog: * decl2.cc (check_classfn): Update comment to include target_version attributes.
2023-12-16  ada: Improve attribute exclusion handling  (Andrew Carlotti, 1 file changed, -37/+33)
Change the handling of some attribute mutual exclusions to use the generic attribute exclusion lists, and fix some asymmetric exclusions by adding the exclusions for always_inline after noinline or target_clones. Aside from the new always_inline exclusions, the only change is functionality is the choice of warning message displayed. All warnings about attribute mutual exclusions now use the same message. gcc/ada/ChangeLog: * gcc-interface/utils.cc (attr_noinline_exclusions): New. (attr_always_inline_exclusions): Ditto. (attr_target_exclusions): Ditto. (attr_target_clones_exclusions): Ditto. (gnat_internal_attribute_table): Add new exclusion lists. (handle_noinline_attribute): Remove custom exclusion handling. (handle_target_attribute): Ditto. (handle_target_clones_attribute): Ditto.
2023-12-16  c-family: Simplify attribute exclusion handling  (Andrew Carlotti, 3 files changed, -52/+34)
This patch changes the handling of mutual exclusions involving the target and target_clones attributes to use the generic attribute exclusion lists. Additionally, the duplicate handling for the always_inline and noinline attribute exclusion is removed. The only change in functionality is the choice of warning message displayed - due to either a change in the wording for mutual exclusion warnings, or a change in the order in which different checks occur. gcc/c-family/ChangeLog: * c-attribs.cc (attr_always_inline_exclusions): New. (attr_target_exclusions): Ditto. (attr_target_clones_exclusions): Ditto. (c_common_attribute_table): Add new exclusion lists. (handle_noinline_attribute): Remove custom exclusion handling. (handle_always_inline_attribute): Ditto. (handle_target_attribute): Ditto. (handle_target_clones_attribute): Ditto. gcc/testsuite/ChangeLog: * g++.target/i386/mvc2.C: * g++.target/i386/mvc3.C:
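As an illustration of the kind of conflict now reported through the shared exclusion tables (the exact warning wording is what changes, not whether a warning is given):

    // always_inline and noinline are mutually exclusive; GCC ignores one of
    // the two attributes and warns about the conflicting pair.
    __attribute__ ((noinline))
    __attribute__ ((always_inline))
    inline int conflicted (void) { return 0; }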
2023-12-16  aarch64: Fix +nopredres, +nols64 and +nomops  (Andrew Carlotti, 3 files changed, -10/+23)
For native cpu feature detection, certain features have no entry in /proc/cpuinfo, so have to be assumed to be present whenever the detected cpu is supposed to support that feature. However, the logic for this was mistakenly implemented by excluding these features from part of aarch64_get_extension_string_for_isa_flags. This function is also used elsewhere when canonicalising explicit feature sets, which may require removing features that are normally implied by the specified architecture version. This change reenables generation of +nopredres, +nols64 and +nomops during canonicalisation, by relocating the misplaced native cpu detection logic. gcc/ChangeLog: * common/config/aarch64/aarch64-common.cc (struct aarch64_option_extension): Remove unused field. (all_extensions): Ditto. (aarch64_get_extension_string_for_isa_flags): Remove filtering of features without native detection. * config/aarch64/driver-aarch64.cc (host_detect_local_cpu): Explicitly add expected features that lack cpuinfo detection. gcc/testsuite/ChangeLog: * gcc.target/aarch64/options_set_28.c: New test.
2023-12-16  aarch64: Fix +nocrypto handling  (Andrew Carlotti, 5 files changed, -15/+43)
Additionally, replace all checks for the AARCH64_FL_CRYPTO bit with checks for (AARCH64_FL_AES | AARCH64_FL_SHA2) instead. The value of the AARCH64_FL_CRYPTO bit within isa_flags is now ignored, but it is retained because removing it would make processing the data in option-extensions.def significantly more complex. This bug should have been picked up by an existing test, but a missing newline meant that the pattern incorrectly allowed "+crypto+nocrypto". gcc/ChangeLog: * common/config/aarch64/aarch64-common.cc (aarch64_get_extension_string_for_isa_flags): Fix generation of the "+nocrypto" extension. * config/aarch64/aarch64.h (AARCH64_ISA_CRYPTO): Remove. (TARGET_CRYPTO): Remove. * config/aarch64/aarch64-c.cc (aarch64_update_cpp_builtins): Don't use TARGET_CRYPTO. gcc/testsuite/ChangeLog: * gcc.target/aarch64/options_set_4.c: Add terminating newline. * gcc.target/aarch64/options_set_27.c: New test.
2023-12-16  Daily bump.  (GCC Administrator, 6 files changed, -1/+453)
2023-12-15  [PATCH v4 2/3] RISC-V: Update XCValu constraints to match other vendors  (Mary Bennett, 2 files changed, -9/+10)
gcc/ChangeLog: * config/riscv/constraints.md: CVP2 -> CV_alu_pow2. * config/riscv/corev.md: Likewise.
2023-12-15  [PATCH v4 1/3] RISC-V: Add support for XCVelw extension in CV32E40P  (Mary Bennett, 10 files changed, -0/+60)
Spec: github.com/openhwgroup/core-v-sw/blob/master/specifications/corev-builtin-spec.md Contributors: Mary Bennett <mary.bennett@embecosm.com> Nandni Jamnadas <nandni.jamnadas@embecosm.com> Pietra Ferreira <pietra.ferreira@embecosm.com> Charlie Keaney Jessica Mills Craig Blackmore <craig.blackmore@embecosm.com> Simon Cook <simon.cook@embecosm.com> Jeremy Bennett <jeremy.bennett@embecosm.com> Helene Chelin <helene.chelin@embecosm.com> gcc/ChangeLog: * common/config/riscv/riscv-common.cc: Add XCVelw. * config/riscv/corev.def: Likewise. * config/riscv/corev.md: Likewise. * config/riscv/riscv-builtins.cc (AVAIL): Likewise. * config/riscv/riscv-ftypes.def: Likewise. * config/riscv/riscv.opt: Likewise. * doc/extend.texi: Add XCVelw builtin documentation. * doc/sourcebuild.texi: Likewise. gcc/testsuite/ChangeLog: * gcc.target/riscv/cv-elw-elw-compile-1.c: Create test for cv.elw. * lib/target-supports.exp: Add proc for the XCVelw extension.
2023-12-15  [PATCH] RISC-V: Add -fno-vect-cost-model to pr112773 testcase  (Patrick O'Neill, 1 file changed, -1/+1)
The testcase for pr112773 started passing after r14-6472-g8501edba91e which was before the actual fix. This patch adds -fno-vect-cost-model which prevents the testcase from passing due to the vls change. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/partial/pr112773.c: Add -fno-vect-cost-model. Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>
2023-12-15  Re: [PATCH] RISC-V: fix scalar crypto patterns  (Jeff Law, 15 files changed, -67/+298)
A handful of the scalar crypto instructions are supposed to take a constant integer argument 0..3 inclusive and one should accept 0..10. A suitable constraint was created and used for this purpose (D03 and DsA), but the operand's predicate is "register_operand". That's just wrong. This patch adds a new predicates "const_0_3_operand" and "const_0_10_operand" and fixes the relevant insns to use the appropriate predicate. It drops the now unnecessary constraints. The testsuite was broken in a way that made it consistent with the compiler, so the tests passed, when they really should have been issuing errors all along. This patch adjusts the existing tests so that they all expect a diagnostic on the invalid operand usage (including out of range constants). It adds new tests with proper constants, testing the extremes of valid values. PR target/110201 gcc/ * config/riscv/constraints.md (D03, DsA): Remove unused constraints. * config/riscv/predicates.md (const_0_3_operand): New predicate. (const_0_10_operand): Likewise. * config/riscv/crypto.md (riscv_aes32dsi): Use new predicate. Drop unnecessary constraint. (riscv_aes32dsmi, riscv_aes64im, riscv_aes32esi): Likewise. (riscv_aes32esmi, *riscv_<sm4_op>_si): Likewise. (riscv_<sm4_op>_di_extend, riscv_<sm4_op>_si): Likewise. gcc/testsuite * gcc.target/riscv/zknd32.c: Verify diagnostics are issued for invalid builtin arguments. * gcc.target/riscv/zknd64.c: Likewise. * gcc.target/riscv/zkne32.c: Likewise. * gcc.target/riscv/zkne64.c: Likewise. * gcc.target/riscv/zksed32.c: Likewise. * gcc.target/riscv/zksed64.c: Likewise. * gcc.target/riscv/zknd32-2.c: New test * gcc.target/riscv/zknd64-2.c: Likewise. * gcc.target/riscv/zkne32-2.c: Likewise. * gcc.target/riscv/zkne64-2.c: Likewise. * gcc.target/riscv/zksed32-2.c: Likewise. * gcc.target/riscv/zksed64-2.c: Likewise. Co-authored-by: Liao Shihua <shihua@iscas.ac.cn>
2023-12-15  fortran: Update degree trigs documentation.  (Jerry DeLisle, 2 files changed, -22/+19)
This is only some cleanup.

gcc/fortran/ChangeLog:

	PR fortran/112783
	* intrinsic.texi: Fix where no COMPLEX allowed.
	* invoke.texi: Clarify -fdec-math.
2023-12-15  aarch64: Add new load/store pair fusion pass.  (Alex Coplan, 7 files changed, -2/+2763)
This adds a new aarch64-specific RTL-SSA pass dedicated to forming load and store pairs (LDPs and STPs). As a motivating example for the kind of thing this improves, take the following testcase: extern double c[20]; double f(double x) { double y = x*x; y += c[16]; y += c[17]; y += c[18]; y += c[19]; return y; } for which we currently generate (at -O2): f: adrp x0, c add x0, x0, :lo12:c ldp d31, d29, [x0, 128] ldr d30, [x0, 144] fmadd d0, d0, d0, d31 ldr d31, [x0, 152] fadd d0, d0, d29 fadd d0, d0, d30 fadd d0, d0, d31 ret but with the pass, we generate: f: .LFB0: adrp x0, c add x0, x0, :lo12:c ldp d31, d29, [x0, 128] fmadd d0, d0, d0, d31 ldp d30, d31, [x0, 144] fadd d0, d0, d29 fadd d0, d0, d30 fadd d0, d0, d31 ret The pass is local (only considers a BB at a time). In theory, it should be possible to extend it to run over EBBs, at least in the case of pure (MEM_READONLY_P) loads, but this is left for future work. The pass works by identifying two kinds of bases: tree decls obtained via MEM_EXPR, and RTL register bases in the form of RTL-SSA def_infos. If a candidate memory access has a MEM_EXPR base, then we track it via this base, and otherwise if it is of a simple reg + <imm> form, we track it via the RTL-SSA def_info for the register. For each BB, for a given kind of base, we build up a hash table mapping the base to an access_group. The access_group data structure holds a list of accesses at each offset relative to the same base. It uses a splay tree to support efficient insertion (while walking the bb), and the nodes are chained using a linked list to support efficient iteration (while doing the transformation). For each base, we then iterate over the access_group to identify adjacent accesses, and try to form load/store pairs for those insns that access adjacent memory. The pass is currently run twice, both before and after register allocation. The first copy of the pass is run late in the pre-RA RTL pipeline, immediately after sched1, since it was found that sched1 was increasing register pressure when the pass was run before. The second copy of the pass runs immediately before peephole2, so as to get any opportunities that the existing ldp/stp peepholes can handle. There are some cases that we punt on before RA, e.g. accesses relative to eliminable regs (such as the soft frame pointer). We do this since we can't know the elimination offset before RA, and we want to avoid the RA reloading the offset (due to being out of ldp/stp immediate range) as this can generate worse code. The post-RA copy of the pass is there to pick up the crumbs that were left behind / things we punted on in the pre-RA pass. Among other things, it's needed to handle accesses relative to the stack pointer. It can also handle code that didn't exist at the time the pre-RA pass was run (spill code, prologue/epilogue code). This is an initial implementation, and there are (among other possible improvements) the following notable caveats / missing features that are left for future work, but could give further improvements: - Moving accesses between BBs within in an EBB, see above. - Out-of-range opportunities: currently the pass refuses to form pairs if there isn't a suitable base register with an immediate in range for ldp/stp, but it can be profitable to emit anchor addresses in the case that there are four or more out-of-range nearby accesses that can be formed into pairs. This is handled by the current ldp/stp peepholes, so it would be good to support this in the future. 
- Discovery: currently we prioritize MEM_EXPR bases over RTL bases, which can lead to us missing opportunities in the case that two accesses have distinct MEM_EXPR bases (i.e. different DECLs) but they are still adjacent in memory (e.g. adjacent variables on the stack). I hope to address this for GCC 15, hopefully getting to the point where we can remove the ldp/stp peepholes and scheduling hooks. Furthermore it would be nice to make the pass aware of section anchors (adding these as a third kind of base) allowing merging accesses to adjacent variables within the same section. gcc/ChangeLog: * config.gcc: Add aarch64-ldp-fusion.o to extra_objs for aarch64. * config/aarch64/aarch64-passes.def: Add copies of pass_ldp_fusion before and after RA. * config/aarch64/aarch64-protos.h (make_pass_ldp_fusion): Declare. * config/aarch64/aarch64.opt (-mearly-ldp-fusion): New. (-mlate-ldp-fusion): New. (--param=aarch64-ldp-alias-check-limit): New. (--param=aarch64-ldp-writeback): New. * config/aarch64/t-aarch64: Add rule for aarch64-ldp-fusion.o. * config/aarch64/aarch64-ldp-fusion.cc: New file. * doc/invoke.texi (AArch64 Options): Document new -m{early,late}-ldp-fusion options.
2023-12-15  aarch64: Rewrite non-writeback ldp/stp patterns  (Alex Coplan, 8 files changed, -334/+293)
This patch overhauls the load/store pair patterns with two main goals: 1. Fixing a correctness issue (the current patterns are not RA-friendly). 2. Allowing more flexibility in which operand modes are supported, and which combinations of modes are allowed in the two arms of the load/store pair, while reducing the number of patterns required both in the source and in the generated code. The correctness issue (1) is due to the fact that the current patterns have two independent memory operands tied together only by a predicate on the insns. Since LRA only looks at the constraints, one of the memory operands can get reloaded without the other one being changed, leading to the insn becoming unrecognizable after reload. We fix this issue by changing the patterns such that they only ever have one memory operand representing the entire pair. For the store case, we use an unspec to logically concatenate the register operands before storing them. For the load case, we use unspecs to extract the "lanes" from the pair mem, with the second occurrence of the mem matched using a match_dup (such that there is still really only one memory operand as far as the RA is concerned). In terms of the modes used for the pair memory operands, we canonicalize these to V2x4QImode, V2x8QImode, and V2x16QImode. These modes have not only the correct size but also correct alignment requirement for a memory operand representing an entire load/store pair. Unlike the other two, V2x4QImode didn't previously exist, so had to be added with the patch. As with the previous patch generalizing the writeback patterns, this patch aims to be flexible in the combinations of modes supported by the patterns without requiring a large number of generated patterns by using distinct mode iterators. The new scheme means we only need a single (generated) pattern for each load/store operation of a given operand size. For the 4-byte and 8-byte operand cases, we use the GPI iterator to synthesize the two patterns. The 16-byte case is implemented as a separate pattern in the source (due to only having a single possible alternative). Since the UNSPEC patterns can't be interpreted by the dwarf2cfi code, we add REG_CFA_OFFSET notes to the store pair insns emitted by aarch64_save_callee_saves, so that correct CFI information can still be generated. Furthermore, we now unconditionally generate these CFA notes on frame-related insns emitted by aarch64_save_callee_saves. This is done in case that the load/store pair pass forms these into pairs, in which case the CFA notes would be needed. We also adjust the ldp/stp peepholes to generate the new form. This is done by switching the generation to use the aarch64_gen_{load,store}_pair interface, making it easier to change the form in the future if needed. (Likewise, the upcoming aarch64 load/store pair pass also makes use of this interface). This patch also adds an "ldpstp" attribute to the non-writeback load/store pair patterns, which is used by the post-RA load/store pair pass to identify existing patterns and see if they can be promoted to writeback variants. One potential concern with using unspecs for the patterns is that it can block optimization by the generic RTL passes. This patch series tries to mitigate this in two ways: 1. The pre-RA load/store pair pass runs very late in the pre-RA pipeline. 2. A later patch in the series adjusts the aarch64 mem{cpy,set} expansion to emit individual loads/stores instead of ldp/stp. 
These should then be formed back into load/store pairs much later in the RTL pipeline by the new load/store pair pass. gcc/ChangeLog: * config/aarch64/aarch64-ldpstp.md: Abstract ldp/stp representation from peepholes, allowing use of new form. * config/aarch64/aarch64-modes.def (V2x4QImode): Define. * config/aarch64/aarch64-protos.h (aarch64_finish_ldpstp_peephole): Declare. (aarch64_swap_ldrstr_operands): Delete declaration. (aarch64_gen_load_pair): Adjust parameters. (aarch64_gen_store_pair): Likewise. * config/aarch64/aarch64-simd.md (load_pair<DREG:mode><DREG2:mode>): Delete. (vec_store_pair<DREG:mode><DREG2:mode>): Delete. (load_pair<VQ:mode><VQ2:mode>): Delete. (vec_store_pair<VQ:mode><VQ2:mode>): Delete. * config/aarch64/aarch64.cc (aarch64_pair_mode_for_mode): New. (aarch64_gen_store_pair): Adjust to use new unspec form of stp. Drop second mem from parameters. (aarch64_gen_load_pair): Likewise. (aarch64_pair_mem_from_base): New. (aarch64_save_callee_saves): Emit REG_CFA_OFFSET notes for frame-related saves. Adjust call to aarch64_gen_store_pair (aarch64_restore_callee_saves): Adjust calls to aarch64_gen_load_pair to account for change in interface. (aarch64_process_components): Likewise. (aarch64_classify_address): Handle 32-byte pair mems in LDP_STP_N case. (aarch64_print_operand): Likewise. (aarch64_copy_one_block_and_progress_pointers): Adjust calls to account for change in aarch64_gen_{load,store}_pair interface. (aarch64_set_one_block_and_progress_pointer): Likewise. (aarch64_finish_ldpstp_peephole): New. (aarch64_gen_adjusted_ldpstp): Adjust to use generation helper. * config/aarch64/aarch64.md (ldpstp): New attribute. (load_pair_sw_<SX:mode><SX2:mode>): Delete. (load_pair_dw_<DX:mode><DX2:mode>): Delete. (load_pair_dw_<TX:mode><TX2:mode>): Delete. (*load_pair_<ldst_sz>): New. (*load_pair_16): New. (store_pair_sw_<SX:mode><SX2:mode>): Delete. (store_pair_dw_<DX:mode><DX2:mode>): Delete. (store_pair_dw_<TX:mode><TX2:mode>): Delete. (*store_pair_<ldst_sz>): New. (*store_pair_16): New. (*load_pair_extendsidi2_aarch64): Adjust to use new form. (*zero_extendsidi2_aarch64): Likewise. * config/aarch64/iterators.md (VPAIR): New. * config/aarch64/predicates.md (aarch64_mem_pair_operand): Change to a special predicate derived from aarch64_mem_pair_operator.
2023-12-15  aarch64: Generalize writeback ldp/stp patterns  (Alex Coplan, 4 files changed, -118/+261)
Thus far the writeback forms of ldp/stp have been exclusively used in prologue and epilogue code for saving/restoring of registers to/from the stack. As such, forms of ldp/stp that weren't needed for prologue/epilogue code weren't supported by the aarch64 backend. This patch generalizes the load/store pair writeback patterns to allow: - Base registers other than the stack pointer. - Modes that weren't previously supported. - Combinations of distinct modes provided they have the same size. - Pre/post variants that weren't previously needed in prologue/epilogue code. We make quite some effort to avoid a combinatorial explosion in the number of patterns generated (and those in the source) by making extensive use of special predicates. An updated version of the upcoming ldp/stp pass can generate the writeback forms, so this patch is motivated by that. This patch doesn't add zero-extending or sign-extending forms of the writeback patterns; that is left for future work. gcc/ChangeLog: * config/aarch64/aarch64-protos.h (aarch64_ldpstp_operand_mode_p): Declare. * config/aarch64/aarch64.cc (aarch64_gen_storewb_pair): Build RTL directly instead of invoking named pattern. (aarch64_gen_loadwb_pair): Likewise. (aarch64_ldpstp_operand_mode_p): New. * config/aarch64/aarch64.md (loadwb_pair<GPI:mode>_<P:mode>): Replace with ... (*loadwb_post_pair_<ldst_sz>): ... this. Generalize as described in cover letter. (loadwb_pair<GPF:mode>_<P:mode>): Delete (superseded by the above). (*loadwb_post_pair_16): New. (*loadwb_pre_pair_<ldst_sz>): New. (loadwb_pair<TX:mode>_<P:mode>): Delete. (*loadwb_pre_pair_16): New. (storewb_pair<GPI:mode>_<P:mode>): Replace with ... (*storewb_pre_pair_<ldst_sz>): ... this. Generalize as described in cover letter. (*storewb_pre_pair_16): New. (storewb_pair<GPF:mode>_<P:mode>): Delete. (*storewb_post_pair_<ldst_sz>): New. (storewb_pair<TX:mode>_<P:mode>): Delete. (*storewb_post_pair_16): New. * config/aarch64/predicates.md (aarch64_mem_pair_operator): New. (pmode_plus_operator): New. (aarch64_ldp_reg_operand): New. (aarch64_stp_reg_operand): New.
2023-12-15  aarch64: Fix up printing of ldp/stp with -msve-vector-bits=128  (Alex Coplan, 1 file changed, -1/+7)
Later patches allow using SVE modes in ldp/stp with -msve-vector-bits=128, so we need to make sure that we don't use SVE addressing modes when printing the address for the ldp/stp. This patch does that. gcc/ChangeLog: * config/aarch64/aarch64.cc (aarch64_print_address_internal): Handle SVE modes when printing ldp/stp addresses.
2023-12-15  aarch64: Fix up aarch64_print_operand xzr/wzr case  (Alex Coplan, 2 files changed, -2/+11)
This adjusts aarch64_print_operand to recognize zero rtxes in modes other than VOIDmode. This allows us to use xzr/wzr for zero vectors, for example. We extract the test into a helper function, aarch64_const_zero_rtx_p, since this predicate is needed by later patches. gcc/ChangeLog: * config/aarch64/aarch64-protos.h (aarch64_const_zero_rtx_p): New. * config/aarch64/aarch64.cc (aarch64_const_zero_rtx_p): New. Use it ... (aarch64_print_operand): ... here. Recognize CONST0_RTXes in modes other than VOIDmode.
2023-12-15  aarch64, testsuite: Fix up pr103147-10.[cC]  (Alex Coplan, 2 files changed, -2/+2)
This disables scheduling in the pr103147-10 tests. The tests use check-function-bodies, and upcoming changes lead to a different schedule. gcc/testsuite/ChangeLog: * g++.target/aarch64/pr103147-10.C: Add -fno-schedule-insns{,2} to dg-options. * gcc.target/aarch64/pr103147-10.c: Likewise.
2023-12-15  aarch64, testsuite: Allow ldp/stp on SVE regs with -msve-vector-bits=128  (Alex Coplan, 2 files changed, -0/+61)
Later patches in the series allow ldp and stp to use SVE modes if -msve-vector-bits=128 is provided. This patch therefore adjusts tests that pass -msve-vector-bits=128 to allow ldp/stp to save/restore SVE registers. gcc/testsuite/ChangeLog: * gcc.target/aarch64/sve/pcs/stack_clash_1_128.c: Allow ldp/stp saves of SVE registers. * gcc.target/aarch64/sve/pcs/struct_3_128.c: Likewise.
2023-12-15  aarch64, testsuite: Fix up auto-init-padding tests  (Alex Coplan, 5 files changed, -12/+16)
The tests currently depend on memcpy lowering forming stps at -O0, but we no longer want to form stps during memcpy lowering, but instead in the upcoming load/store pair fusion pass. This patch therefore tweaks affected tests to enable optimizations (-O1), and adjusts the tests to avoid parts of the structures being optimized away where necessary. gcc/testsuite/ChangeLog: * gcc.target/aarch64/auto-init-padding-1.c: Add -O to options, adjust test to work with optimizations enabled. * gcc.target/aarch64/auto-init-padding-2.c: Add -O to options. * gcc.target/aarch64/auto-init-padding-3.c: Add -O to options, adjust test to work with optimizations enabled. * gcc.target/aarch64/auto-init-padding-4.c: Likewise. * gcc.target/aarch64/auto-init-padding-9.c: Likewise.
2023-12-15  [PATCH] RISC-V: Add Zvfbfmin extension to the -march= option  (Xiao Zeng, 6 files changed, -0/+104)
This patch would like to add new sub extension (aka Zvfbfmin) to the -march= option. It introduces a new data type BF16. Depending on different usage scenarios, the Zvfbfmin extension may depend on 'V' or 'Zve32f'. This patch only implements dependencies in scenario of Embedded Processor. In scenario of Application Processor, it is necessary to explicitly indicate the dependent 'V' extension. You can locate more information about Zvfbfmin from below spec doc. https://github.com/riscv/riscv-bfloat16/releases/download/20231027/riscv-bfloat16.pdf gcc/ChangeLog: * common/config/riscv/riscv-common.cc: (riscv_implied_info): Add zvfbfmin item. (riscv_ext_version_table): Ditto. (riscv_ext_flag_table): Ditto. * config/riscv/riscv.opt: (MASK_ZVFBFMIN): New macro. (MASK_VECTOR_ELEN_BF_16): Ditto. (TARGET_ZVFBFMIN): Ditto. gcc/testsuite/ChangeLog: * gcc.target/riscv/arch-31.c: New test. * gcc.target/riscv/arch-32.c: New test. * gcc.target/riscv/predef-32.c: New test. * gcc.target/riscv/predef-33.c: New test.
2023-12-15  PR modula2/112946 ICE assignment of string to enumeration or set  (Gaius Mulley, 7 files changed, -107/+324)
This patch introduces type checking during FoldBecomes and also adds set/string/enum checking to the type checker. FoldBecomes has been re-written, tidied up and re-factored. gcc/m2/ChangeLog: PR modula2/112946 * gm2-compiler/M2Check.mod (checkConstMeta): New procedure function. (checkConstEquivalence): New procedure function. (doCheckPair): Add call to checkConstEquivalence. * gm2-compiler/M2GenGCC.mod (ResolveConstantExpressions): Call FoldBecomes with reduced parameters. (FoldBecomes): Re-write. (TryDeclareConst): New procedure. (RemoveQuads): New procedure. (DeclaredOperandsBecomes): New procedure function. (TypeCheckBecomes): New procedure function. (PerformFoldBecomes): New procedure. * gm2-compiler/M2Range.mod (FoldAssignment): Call AssignmentTypeCompatible to check des expr compatibility. * gm2-compiler/M2SymInit.mod (CheckReadBeforeInitQuad): Remove parameter lst. (FilterCheckReadBeforeInitQuad): Remove parameter lst. (CheckReadBeforeInitFirstBasicBlock): Remove parameter lst. Call FilterCheckReadBeforeInitQuad without lst. gcc/testsuite/ChangeLog: PR modula2/112946 * gm2/iso/fail/badassignment.mod: New test. * gm2/iso/fail/badexpression.mod: New test. * gm2/iso/fail/badexpression2.mod: New test. Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>
2023-12-15  c++: section attribute on templates [PR70435, PR88061]  (Patrick Palka, 6 files changed, -0/+59)
The section attribute currently has no effect on templates because the call to set_decl_section_name only happens at parse time (on the dependent decl) and not also at instantiation time. This patch fixes this by propagating the section name from the template to the instantiation. PR c++/70435 PR c++/88061 gcc/cp/ChangeLog: * pt.cc (tsubst_function_decl): Propagate DECL_SECTION_NAME via set_decl_section_name. (tsubst_decl) <case VAR_DECL>: Likewise. gcc/testsuite/ChangeLog: * g++.dg/ext/attr-section1.C: New test. * g++.dg/ext/attr-section1a.C: New test. * g++.dg/ext/attr-section2.C: New test. * g++.dg/ext/attr-section2a.C: New test. * g++.dg/ext/attr-section2b.C: New test.
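A sketch of what now works (the section name here is an arbitrary example):

    // The section attribute on the template is now propagated to each
    // instantiation instead of being silently dropped.
    template <typename T>
    __attribute__ ((section (".hot_text")))
    void kernel (T *) {}

    template void kernel<int> (int *);   // emitted in .hot_text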
2023-12-15  c++: abi_tag attribute on templates [PR109715]  (Patrick Palka, 3 files changed, -0/+38)
We need to look through TEMPLATE_DECL when looking up the abi_tag attribute (as with other function/variable declaration attributes). PR c++/109715 gcc/cp/ChangeLog: * mangle.cc (get_abi_tags): Strip TEMPLATE_DECL before looking up the abi_tag attribute. gcc/testsuite/ChangeLog: * g++.dg/abi/abi-tag25.C: New test. * g++.dg/abi/abi-tag25a.C: New test.
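A sketch of the effect (the tag name is an arbitrary example):

    // The abi_tag on the template is now found when mangling instantiations,
    // so this instantiation's symbol carries the [abi:v2] tag.
    template <typename T>
    __attribute__ ((abi_tag ("v2")))
    void api (T) {}

    template void api<int> (int);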
2023-12-15  Fix tests for gomp  (Andre Vieira, 3 files changed, -4/+0)
This is to fix testisms initially introduced by: commit f5fc001a84a7dbb942a6252b3162dd38b4aae311 Author: Andre Vieira <andre.simoesdiasvieira@arm.com> Date: Mon Dec 11 14:24:41 2023 +0000 aarch64: enable mixed-types for aarch64 simdclones gcc/testsuite/ChangeLog: * gcc.dg/gomp/pr87887-1.c: Fixed test. * gcc.dg/gomp/pr89246-1.c: Likewise. * gcc.dg/gomp/simd-clones-2.c: Likewise. libgomp/ChangeLog: * testsuite/libgomp.c/declare-variant-1.c: Fixed test. * testsuite/libgomp.fortran/declare-simd-1.f90: Likewise.
2023-12-15  AArch64: Add inline memmove expansion  (Wilco Dijkstra, 6 files changed, -113/+123)
Add support for inline memmove expansions. The generated code is identical as for memcpy, except that all loads are emitted before stores rather than being interleaved. The maximum size is 256 bytes which requires at most 16 registers. gcc/ChangeLog: * config/aarch64/aarch64.opt (aarch64_mops_memmove_size_threshold): Change default. * config/aarch64/aarch64.md (cpymemdi): Add a parameter. (movmemdi): Call aarch64_expand_cpymem. * config/aarch64/aarch64.cc (aarch64_copy_one_block): Rename function, simplify, support storing generated loads/stores. (aarch64_expand_cpymem): Support expansion of memmove. * config/aarch64/aarch64-protos.h (aarch64_expand_cpymem): Add bool arg. gcc/testsuite/ChangeLog: * gcc.target/aarch64/memmove.c: Add new test. * gcc.target/aarch64/memmove2.c: Likewise.
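For illustration, the kind of call now expanded inline on aarch64 (sizes up to 256 bytes, with all loads emitted before any stores so overlapping copies stay correct); a hedged sketch, not one of the new tests:

    #include <cstring>

    // At -O2 on aarch64 this should now become a short sequence of loads
    // followed by stores rather than a call to memmove.
    void
    shift_left (char *buf)
    {
      std::memmove (buf, buf + 8, 128);
    }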