path: root/gcc
Age | Commit message | Author | Files | Lines
2024-11-06 | Darwin: Fix a narrowing warning. | Iain Sandoe | 1 | -1/+1
cdtor_record needs to have an unsigned entry for the position in order to match with vec_safe_length.
gcc/ChangeLog:
* config/darwin.cc (cdtor_record): Make position unsigned.
Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
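A minimal sketch of the kind of narrowing involved, using hypothetical names (`cdtor_record_sketch`, `safe_length`) rather than the real darwin.cc code. It assumes the record is brace-initialized from a length helper that, like GCC's `vec_safe_length`, returns `unsigned`. Narrowing a non-constant `unsigned` to an `int` member inside braces is diagnosed by -Wnarrowing; declaring the member `unsigned` avoids that.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a record like cdtor_record in darwin.cc.
struct cdtor_record_sketch
{
  int priority;
  unsigned position;  // unsigned, matching the result type of safe_length
};

// Hypothetical stand-in for vec_safe_length, which returns unsigned.
static unsigned
safe_length (const std::vector<cdtor_record_sketch> *v)
{
  return v ? static_cast<unsigned> (v->size ()) : 0;
}

void
add_record (std::vector<cdtor_record_sketch> &records, int priority)
{
  // Had 'position' been declared 'int', this brace initialization would
  // narrow a non-constant unsigned value to int and be diagnosed by
  // -Wnarrowing.
  cdtor_record_sketch r = { priority, safe_length (&records) };
  records.push_back (r);
}
```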
2024-11-06 | openmp: Fix signed/unsigned warning | Andrew Stubbs | 1 | -1/+1
My previous patch broke things when building with -Werror.
gcc/ChangeLog:
* omp-general.cc (omp_max_vf): Cast the constant to poly_uint64.
2024-11-06 | openmp: Add testcases for omp_max_vf | Andrew Stubbs | 1 | -0/+37
Ensure that the GOMP_MAX_VF does the right thing for explicit schedules, when offloading is enabled ("target" directives are present), and is inactive otherwise.
libgomp/ChangeLog:
* testsuite/libgomp.c/max_vf-1.c: New test.
* testsuite/libgomp.c/max_vf-2.c: New test.
gcc/testsuite/ChangeLog:
* gcc.dg/gomp/max_vf-1.c: New test.
2024-11-06 | openmp: Add IFN_GOMP_MAX_VF | Andrew Stubbs | 4 | -8/+34
Delay the omp_max_vf call until after the host and device compilers have diverged so that the max_vf value can be tuned exactly right on both variants.
This change means that the ompdevlow pass must be enabled for functions that use OpenMP directives with both "simd" and "schedule" enabled.
gcc/ChangeLog:
* internal-fn.cc (expand_GOMP_MAX_VF): New function.
* internal-fn.def (GOMP_MAX_VF): New internal function.
* omp-expand.cc (omp_adjust_chunk_size): Emit IFN_GOMP_MAX_VF when called in offload context, otherwise assume host context.
* omp-offload.cc (execute_omp_device_lower): Expand IFN_GOMP_MAX_VF.
2024-11-06 | openmp: use offload max_vf for chunk_size | Andrew Stubbs | 1 | -8/+28
The chunk size for SIMD loops should be right for the current device; too big allocates too much memory, too small is inefficient. Getting it wrong doesn't actually break anything though.
This patch attempts to choose the optimal setting based on the context. Both host-fallback and device will get the same chunk size, but device performance is the most important in this case.
gcc/ChangeLog:
* omp-expand.cc (is_in_offload_region): New function.
(omp_adjust_chunk_size): Add pass-through "offload" parameter.
(get_ws_args_for): Likewise.
(determine_parallel_type): Use is_in_offload_region to adjust call to get_ws_args_for.
(expand_omp_for_generic): Likewise.
(expand_omp_for_static_chunk): Likewise.
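For context, the OpenMP specification says that with the `simd` schedule modifier the chunk size is rounded up to a multiple of an implementation-defined SIMD width, which is why the chosen max_vf changes the chunk size. A minimal sketch of that rounding, assuming the vectorization factor is a power of two; `round_chunk_to_vf` is a hypothetical helper, not GCC's omp_adjust_chunk_size.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: round a schedule(simd:...) chunk size up to a
// multiple of the vectorization factor VF. Assumes VF is a power of two,
// so adding VF - 1 and masking off the low bits rounds up.
std::uint64_t
round_chunk_to_vf (std::uint64_t chunk, std::uint64_t vf)
{
  assert (vf != 0 && (vf & (vf - 1)) == 0);  // VF must be a power of two
  if (chunk == 0 || vf == 1)
    return chunk;
  return (chunk + vf - 1) & ~(vf - 1);
}
```

With a VF of 64, a requested chunk of 5 becomes 64, while a VF of 4 gives 8; this is the "too big" versus "too small" trade-off the commit message describes.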
2024-11-06 | openmp: Tune omp_max_vf for offload targets | Andrew Stubbs | 5 | -6/+20
If requested, return the vectorization factor appropriate for the offload device, if any.
This change gives a significant speedup in the BabelStream "dot" benchmark on amdgcn.
The omp_adjust_chunk_size use case is set to "false" for now, but I intend to change that in a follow-up patch.
Note that NVPTX SIMT offload does not use this code path.
gcc/ChangeLog:
* gimple-loop-versioning.cc (loop_versioning::loop_versioning): Set omp_max_vf to offload == false.
* omp-expand.cc (omp_adjust_chunk_size): Likewise.
* omp-general.cc (omp_max_vf): Add "offload" parameter, and detect amdgcn offload devices.
* omp-general.h (omp_max_vf): Likewise.
* omp-low.cc (lower_rec_simd_input_clauses): Pass offload state to omp_max_vf.
2024-11-06 | Add details output for assume processing. | Andrew MacLeod | 1 | -19/+115
The Assume pass simply produces results, with no indication of how it arrived at the results it gets. Add some output to the details listing.
The only functional change is when gori is used to calculate a range more than once (i.e., multiple uses): we now load the merged range rather than just using the last calculated one.
* tree-assume.cc (assume_query::assume_query): Add debug output.
(assume_query::update_parms): Likewise.
(assume_query::calculate_phi): Likewise.
(assume_query::calculate_op): Likewise. Also pick up any merged path values.
(assume_query::calculate_stmt): Likewise.
2024-11-06 | testsuite: add infinite recursion test case [PR63388] | David Malcolm | 1 | -0/+21
gcc/testsuite/ChangeLog:
PR c++/63388
* g++.dg/analyzer/infinite-recursion-pr63388.C: New test.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
2024-11-06 | diagnostics: fix typo in comment | David Malcolm | 1 | -1/+1
gcc/ChangeLog:
* diagnostic.h (class diagnostic_context): Fix typo in leading comment.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
2024-11-06 | libstdc++: Deprecate useless <cxxx> compatibility headers for C++17 | Jonathan Wakely | 1 | -0/+2
These headers make no sense for C++ programs, because they either define different content to the corresponding <xxx.h> C header, or define nothing at all in namespace std. They were all deprecated in C++17, so add deprecation warnings to them, which can be disabled with -Wno-deprecated. For C++20 and later these headers are no longer in the standard at all, so compiling with _GLIBCXX_USE_DEPRECATED defined to 0 will give an error when they are included.
Because #warning is non-standard before C++23 we need to use pragmas to ignore -Wc++23-extensions for the -Wsystem-headers -pedantic case.
One g++ test needs adjustment because it includes <ciso646>, but that can be made conditional on the __cplusplus value without any reduction in test coverage.
For the library tests, consolidate the std_c++0x_neg.cc XFAIL tests into the macros.cc test, using dg-error with a { target c++98_only } selector. This avoids having two separate test files, one for C++98 and one for everything later. Also add tests for the <xxx.h> headers to ensure that they behave as expected and don't give deprecated warnings.
libstdc++-v3/ChangeLog:
* doc/xml/manual/evolution.xml: Document deprecations.
* doc/html/*: Regenerate.
* include/c_compatibility/complex.h (_GLIBCXX_COMPLEX_H): Move include guard to start of file. Include <complex> directly instead of <ccomplex>.
* include/c_compatibility/tgmath.h: Include <cmath> and <complex> directly, instead of <ctgmath>.
* include/c_global/ccomplex: Add deprecated #warning for C++17 and #error for C++20 if _GLIBCXX_USE_DEPRECATED == 0.
* include/c_global/ciso646: Likewise.
* include/c_global/cstdalign: Likewise.
* include/c_global/cstdbool: Likewise.
* include/c_global/ctgmath: Likewise.
* include/c_std/ciso646: Likewise.
* include/precompiled/stdc++.h: Do not include ccomplex, ciso646, cstdalign, cstdbool, or ctgmath in C++17 and later.
* testsuite/18_support/headers/cstdalign/macros.cc: Check for warnings and errors for unsupported dialects.
* testsuite/18_support/headers/cstdbool/macros.cc: Likewise.
* testsuite/26_numerics/headers/ctgmath/complex.cc: Likewise.
* testsuite/27_io/objects/char/1.cc: Do not include <ciso646>.
* testsuite/27_io/objects/wchar_t/1.cc: Likewise.
* testsuite/18_support/headers/cstdbool/std_c++0x_neg.cc: Removed.
* testsuite/18_support/headers/cstdalign/std_c++0x_neg.cc: Removed.
* testsuite/26_numerics/headers/ccomplex/std_c++0x_neg.cc: Removed.
* testsuite/26_numerics/headers/ctgmath/std_c++0x_neg.cc: Removed.
* testsuite/18_support/headers/ciso646/macros.cc: New test.
* testsuite/18_support/headers/ciso646/macros.h.cc: New test.
* testsuite/18_support/headers/cstdbool/macros.h.cc: New test.
* testsuite/26_numerics/headers/ccomplex/complex.cc: New test.
* testsuite/26_numerics/headers/ccomplex/complex.h.cc: New test.
* testsuite/26_numerics/headers/ctgmath/complex.h.cc: New test.
gcc/testsuite/ChangeLog:
* g++.old-deja/g++.other/headers1.C: Do not include ciso646 for C++17 and later.
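The g++ test adjustment amounts to guarding the include on `__cplusplus`. A minimal sketch of that pattern (the `implies` function is only illustrative); no coverage is lost because `and`, `or` and `not` are alternative tokens built into the C++ language, so the header is not needed to use them.

```cpp
#include <cassert>

// Only include <ciso646> before C++17, where it is not deprecated.
#if __cplusplus < 201703L
#include <ciso646>
#endif

// The alternative tokens work with or without the header.
bool
implies (bool a, bool b)
{
  return not a or b;
}
```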
2024-11-06 | ipcp don't propagate where not needed | Michal Jires | 3 | -20/+50
This patch disables propagation of IPCP information into partitions where all instances of the node are marked to be inlined.
Motivation:
Incremental LTO needs stable values between compilations to be effective. This requirement fails with the following example:
void heavily_used_function(int);
...
heavily_used_function(__LINE__);
IPCP creates a long list of all __LINE__ arguments, and then propagates it with every function clone, even though for inlined functions this information is not useful.
gcc/ChangeLog:
* ipa-prop.cc (write_ipcp_transformation_info): Disable unneeded value propagation.
* lto-cgraph.cc (lto_symtab_encoder_encode): Default values.
(lto_symtab_encoder_always_inlined_p): New.
(lto_set_symtab_encoder_not_always_inlined): New.
(add_node_to): Set always inlined.
* lto-streamer.h (struct lto_encoder_entry): New field.
(lto_symtab_encoder_always_inlined_p): New.
2024-11-06 | store-merging: Apply --param=store-merging-max-size= in more spots [PR117439] | Jakub Jelinek | 2 | -1/+36
Store merging assumes a merged region won't be too large. The assumption is e.g. in using inappropriate types in various spots (e.g. int for bit sizes and bit positions in a few spots, or unsigned for the total size in bytes of the merged region), in doing XNEWVEC for the whole total size of the merged region and preparing everything in there and even that XALLOCAVEC in two spots. The last case is what was breaking the test below in the patch, 64MB XALLOCAVEC is just too large, but even with that fixed I think we just shouldn't be merging gigabyte large merge groups.
We already have --param=store-merging-max-size= parameter, right now with 65536 bytes maximum (if needed, we could raise that limit a little bit). That parameter is currently used when merging two adjacent stores, if the size of the already merged bitregion together with the new store's bitregion is above that limit, we don't merge those. I guess initially that was sufficient, at that time a store was always limited to MAX_BITSIZE_MODE_ANY_INT bits. But later on we've added support for empty ctors ({} and even later {CLOBBER}) and also added another spot where we merge further stores into the merge group, if there is some overlap, we can merge various other stores in one coalesce_immediate_stores iteration. And, we weren't applying the --param=store-merging-max-size= parameter in either of those cases. So a single store can be gigabytes long, and if there is some overlap, we can extend the region again to gigabytes in size.
The following patch attempts to apply that parameter even in those cases. So, if testing if it should merge the merged group with info (we've already punted if those together are above the parameter) and some other stores, the first two hunks just punt if that would make the merge group too large. And the third hunk doesn't even add stores which are over the limit.
2024-11-06 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/117439
* gimple-ssa-store-merging.cc (imm_store_chain_info::coalesce_immediate_stores): Punt if merging of any of the additional overlapping stores would result in growing the bitregion size over param_store_merging_max_size.
(pass_store_merging::process_store): Terminate all aliasing chains for stores with bitregion larger than param_store_merging_max_size.
* g++.dg/opt/pr117439.C: New test.
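The core of the fix is a size check whenever a merge group would grow. A minimal sketch, assuming a hypothetical `merge_group` that tracks its bit region as the half-open range [start, end) in bits, and a limit given in bytes as with --param=store-merging-max-size=; this is not the actual gimple-ssa-store-merging.cc code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical model of a merge group's bit region [start, end).
struct merge_group
{
  std::uint64_t bitregion_start;
  std::uint64_t bitregion_end;
};

// Return true if extending GROUP to also cover [start, end) keeps the
// merged region within MAX_SIZE_BYTES; otherwise the caller should punt.
bool
can_extend (const merge_group &group, std::uint64_t start,
            std::uint64_t end, std::uint64_t max_size_bytes)
{
  std::uint64_t new_start = std::min (group.bitregion_start, start);
  std::uint64_t new_end = std::max (group.bitregion_end, end);
  return new_end - new_start <= max_size_bytes * 8;  // 8 bits per byte
}
```

A single oversized store can be screened the same way by treating it as a group on its own, which loosely corresponds to the third hunk's refusal to add stores that are already over the limit.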
2024-11-06 | store-merging: Don't use sub_byte_op_p mode for empty_ctor_p unless necessary [PR117439] | Jakub Jelinek | 1 | -4/+5
encode_tree_to_bitpos uses the more expensive sub_byte_op_p mode in which it has to allocate a buffer and do various extra work like shifting the bits etc. if bitlen or bitpos aren't multiples of BITS_PER_UNIT, or if bitlen doesn't have a corresponding integer mode. The last case is explained later in the comments:
/* The native_encode_expr machinery uses TYPE_MODE to determine how many bytes to write. This means it can write more than ROUND_UP (bitlen, BITS_PER_UNIT) / BITS_PER_UNIT bytes (for example write 8 bytes for a bitlen of 40). Skip the bytes that are not within bitlen and zero out the bits that are not relevant as well (that may contain a sign bit due to sign-extension). */
Now, we've later added empty_ctor_p support, either {} CONSTRUCTOR or {CLOBBER}, which doesn't use native_encode_expr at all, just memset, so that case doesn't need those fancy games unless bitlen or bitpos aren't multiples of BITS_PER_UNIT (unlikely, but let's pretend it is possible).
The following patch makes us use the fast path even for empty_ctor_p which occupy full bytes, we can just memset that in the provided buffer and don't need to XALLOCAVEC another buffer.
This patch in itself fixes the testcase from the PR (which was about using huge XALLOCAVEC), but I want to do some other changes, to be posted in a next patch.
2024-11-06 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/117439
* gimple-ssa-store-merging.cc (encode_tree_to_bitpos): For empty_ctor_p use !sub_byte_op_p even if bitlen doesn't have an integral mode.
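The fast path works because an empty constructor only ever writes zeros. A minimal sketch of the distinction, using a hypothetical `clear_bits` helper over a byte buffer (not GCC's encode_tree_to_bitpos) and assuming bits are numbered from the least significant bit of each byte: a byte-aligned region is cleared with one memset, and only unaligned regions need bit-level work.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: zero BITLEN bits starting at bit BITPOS of BUF,
// numbering bits from the least significant bit of each byte.
void
clear_bits (unsigned char *buf, std::size_t bitpos, std::size_t bitlen)
{
  if (bitpos % 8 == 0 && bitlen % 8 == 0)
    {
      // Fast path: whole bytes, so a plain memset suffices and no
      // temporary buffer is needed.
      std::memset (buf + bitpos / 8, 0, bitlen / 8);
      return;
    }
  // Slow path: clear the region bit by bit.
  for (std::size_t i = bitpos; i < bitpos + bitlen; ++i)
    buf[i / 8] &= static_cast<unsigned char> (~(1u << (i % 8)));
}
```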
2024-11-06 | Fortran: F2008 passing of internal procs to a proc pointer [PR117434] | Paul Thomas | 4 | -2/+234
2024-11-06 Paul Thomas <pault@gcc.gnu.org>
gcc/fortran
PR fortran/117434
* interface.cc (gfc_compare_actual_formal): Skip 'Expected a procedure pointer error' if the formal argument typespec has an interface and the type of the actual arg is BT_PROCEDURE.
gcc/testsuite/
PR fortran/117434
* gfortran.dg/proc_ptr_54.f90: New test. This is temporarily compile-only until one one seven four five five is fixed.
* gfortran.dg/proc_ptr_55.f90: New test.
* gfortran.dg/proc_ptr_56.f90: New test.
2024-11-06 | i386: Add OPTION_MASK_ISA2_EVEX512 for some AVX512 instructions. | Hu, Lin1 | 2 | -5/+33
gcc/ChangeLog:
PR target/117304
* config/i386/i386-builtin.def: Add OPTION_MASK_ISA2_EVEX512 for some AVX512 512-bits instructions.
gcc/testsuite/ChangeLog:
PR target/117304
* gcc.target/i386/pr117304-1.c: New test.
2024-11-06 | Intel MOVRS tests: Also scan (%e.x) | H.J. Lu | 3 | -40/+40
Since x32 uses (%reg32), instead of (%r.x), also scan (%e.x).
* gcc.target/i386/avx10_2-512-movrs-1.c: Also scan (%e.x).
* gcc.target/i386/avx10_2-movrs-1.c: Likewise.
* gcc.target/i386/movrs-1.c: Likewise.
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
2024-11-06 | gcc.target/i386/apx-ndd.c: Also scan (%edi) | H.J. Lu | 1 | -1/+1
Since x32 uses (%edi), instead of (%rdi), also scan (%edi).
* gcc.target/i386/apx-ndd.c: Also scan (%edi).
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
2024-11-06 | Daily bump. | GCC Administrator | 6 | -1/+290
2024-11-05 | fortran: dynamically allocate error_buffer [PR117442] | David Malcolm | 1 | -11/+13
PR fortran/117442 reports a crash on exit of f951 when configured with --enable-gather-detailed-mem-stats.
The crash happens if any diagnostics were ever buffered into error_buffer. The root cause is that error_buffer is statically allocated and thus has a non-trivial destructor called at exit. If error_buffer's diagnostic_buffer ever buffered anything, then a diagnostic_per_format_buffer will have been created for the buffer per-output-sink, and the destructors for these call into the mem-stats subsystem, which has already been cleaned up.
The simplest fix is to allocate error_buffer on the heap, rather than statically, which fixes the crash.
There's a comment about error_buffer:
/* pp_error_buffer is statically allocated. This simplifies memory management when using gfc_push/pop_error. */
added by Manu in r6-1748-g5862c189c2c3c2 while fixing PR fortran/66528. The comment appears to be out of date.
I've tested maxerrors.f90 under valgrind, and it's clean with the patch.
gcc/fortran/ChangeLog:
PR fortran/117442
* error.cc (error_buffer): Convert to a pointer so it can be heap-allocated.
(gfc_error_now): Update for error_buffer being heap-allocated.
(gfc_clear_error): Likewise.
(gfc_error_flag_test): Likewise.
(gfc_error_check): Likewise.
(gfc_push_error): Likewise.
(gfc_pop_error): Likewise.
(gfc_diagnostics_init): Allocate error_buffer on the heap, rather than statically.
(gfc_diagnostics_finish): Delete error_buffer.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
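The pattern behind this fix: an object with static storage duration is destroyed at exit, in an order the program does not control, possibly after subsystems its destructor relies on are gone; a heap-allocated object is destroyed exactly where the code says. A minimal sketch with a hypothetical `diag_buffer` type standing in for the real diagnostic buffer; the init/finish function names only mirror the ChangeLog entries and are not the Fortran front end's code.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a diagnostic buffer with a non-trivial destructor.
struct diag_buffer
{
  std::vector<std::string> pending;
};

// Before: 'static diag_buffer error_buffer;' would be destroyed during
// program exit, after other cleanup may already have run.
// After: a pointer, allocated and freed explicitly.
static diag_buffer *error_buffer = nullptr;

void
diagnostics_init ()
{
  error_buffer = new diag_buffer;
}

void
diagnostics_finish ()
{
  // Runs at a well-defined point, while everything else is still alive.
  delete error_buffer;
  error_buffer = nullptr;
}
```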
2024-11-05 | match: Fix comment for `X != 0 ? X + ~0 : 0` transformation | Andrew Pinski | 1 | -1/+1
Just a small comment fix: the `(` was in the wrong location, making it look like it was transforming into `(X - X) != 0` rather than `X - (X != 0)`.
Pushed as obvious after a quick build for x86_64-linux-gnu.
gcc/ChangeLog:
* match.pd (X != 0 ? X + ~0 : 0): Fix comment.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
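The corrected comment describes an identity in modular (unsigned) arithmetic: `X + ~0` is `X - 1`, and when X is 0 both forms yield 0. A small exhaustive check over 8-bit values, written as an illustration of the identity rather than as GCC's match.pd pattern:

```cpp
#include <cassert>
#include <cstdint>

// Left-hand form: X != 0 ? X + ~0 : 0, for an 8-bit unsigned X.
std::uint8_t
lhs (std::uint8_t x)
{
  if (x == 0)
    return 0;
  // ~0 in 8-bit arithmetic is 0xFF, so X + ~0 wraps around to X - 1.
  return static_cast<std::uint8_t> (x + 0xFFu);
}

// Right-hand form: X - (X != 0).
std::uint8_t
rhs (std::uint8_t x)
{
  return static_cast<std::uint8_t> (x - (x != 0));
}

// Check the identity for every 8-bit value.
bool
identity_holds ()
{
  for (unsigned v = 0; v <= 0xFF; ++v)
    {
      std::uint8_t x = static_cast<std::uint8_t> (v);
      if (lhs (x) != rhs (x))
        return false;
    }
  return true;
}
```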
2024-11-05 | testsuite: arm: Use effective-target for pr68620 and pr78041 tests | Torbjörn SVENSSON | 2 | -4/+5
gcc/testsuite/ChangeLog:
* gcc.target/arm/pr68620.c: Use effective-target arm_neon.
* gcc.target/arm/pr78041.c: Use effective-target arm_arch_v7a.
Signed-off-by: Torbjörn SVENSSON <torbjorn.svensson@foss.st.com>
2024-11-05 | testsuite: arm: Relax register selection [PR116623] | Torbjörn SVENSSON | 1 | -4/+6
Since r15-1619-g3b9b8d6cfdf, test5 and test8 fail because "ip" might be used and r3 might be moved to another register for later dereference.
gcc/testsuite/ChangeLog:
PR testsuite/116623
* gcc.target/arm/mve/dlstp-compile-asm-2.c: Align test5 and test8 with changes in r15-1619-g3b9b8d6cfdf.
Signed-off-by: Torbjörn SVENSSON <torbjorn.svensson@foss.st.com>
2024-11-05 | testsuite: arm: Use effective-target for pr98636.c test | Torbjörn SVENSSON | 1 | -1/+3
The test case assumes that -mfp16-format=alternative is accepted for the target, but not all targets support this flag. One such target is Cortex-M85, which supports FP16 but not the alternative format.
gcc/testsuite/ChangeLog:
* gcc.target/arm/pr98636.c: Use effective-target arm_fp16_alternative.
Signed-off-by: Torbjörn SVENSSON <torbjorn.svensson@foss.st.com>
2024-11-05 | c: gimplefe: Only allow an identifier before ? [PR117445] | Andrew Pinski | 1 | -5/+3
Since r13-707-g68e0063397ba82, COND_EXPR/VEC_COND_EXPR has not allowed a comparison as the first operand, but the gimple front-end was not updated for this change and compilation would fail later on. An assert was added with r15-4791-gb60031e8f9f8fe, which meant an ICE would happen from the gimple FE.
This removes support for parsing `?:` expressions except when the condition is an identifier.
Bootstrapped and tested on x86_64-linux-gnu.
gcc/c/ChangeLog:
PR c/117445
* gimple-parser.cc (c_parser_gimple_statement): Remove support for comparisons before the query (`?`) token.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
2024-11-05 | PR target/117449: Restrict vector rotate match and split to pre-reload | Kyrylo Tkachov | 3 | -7/+18
The vector rotate splitter has some logic to deal with post-reload splitting but not all cases in aarch64_emit_opt_vec_rotate are post-reload-safe. In particular the ROTATE+XOR expansion for TARGET_SHA3 can create RTL that can later be simplified to a simple ROTATE post-reload, which would then match the insn again and try to split it. So do a clean split pre-reload and avoid going down this path post-reload by restricting the insn_and_split to can_create_pseudo_p ().
Bootstrapped and tested on aarch64-none-linux.
Signed-off-by: Kyrylo Tkachov <ktkachov@nvidia.com>
gcc/
PR target/117449
* config/aarch64/aarch64-simd.md (*aarch64_simd_rotate_imm<mode>): Match only when can_create_pseudo_p ().
* config/aarch64/aarch64.cc (aarch64_emit_opt_vec_rotate): Assume can_create_pseudo_p ().
gcc/testsuite/
PR target/117449
* gcc.c-torture/compile/pr117449.c: New test.
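Some background on why the ROTATE+XOR form can collapse back to a plain ROTATE: the SHA3 XAR instruction rotates each 64-bit lane of the XOR of two operands right by an immediate, and XORing with zero is the identity. Assuming the expansion XORs with a zero operand (an assumption; the commit message does not spell out the operands), the result is just a rotate. A scalar model of that identity, with hypothetical helper names:

```cpp
#include <cassert>
#include <cstdint>

// Rotate a 64-bit value right by N bits; requires 0 < N < 64.
std::uint64_t
ror64 (std::uint64_t x, unsigned n)
{
  return (x >> n) | (x << (64 - n));
}

// Scalar model of one XAR lane: rotate right the XOR of two operands.
std::uint64_t
xar64 (std::uint64_t a, std::uint64_t b, unsigned n)
{
  return ror64 (a ^ b, n);
}
```

Because `xar64 (x, 0, n) == ror64 (x, n)` for every x, a later simplification can turn the XAR form back into a plain rotate, which is what made the post-reload re-match possible.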
2024-11-05 | testsuite: Fix up gcc.target/powerpc/safe-indirect-jump-3.c test [PR117444] | Peter Bergner | 1 | -1/+1
The test safe-indirect-jump-3.c FAILs on powerpc64le-linux with the change in jump table generation behavior with commit r15-4756-g06bc3a734e8890, since it is compiled without optimization and expects jump tables to be generated. Add an explicit -fjump-tables to dg-options to get the old behavior back.
2024-11-05 Peter Bergner <bergner@linux.ibm.com>
gcc/testsuite/
PR testsuite/117444
* gcc.target/powerpc/safe-indirect-jump-3.c: Add -fjump-tables to dg-options.
2024-11-05 | c++: allow array mem-init with -fpermissive [PR116634] | Jason Merrill | 5 | -5/+8
We've accidentally accepted this forever (at least as far back as 4.7), but it's always been ill-formed; this was PR59465. And we didn't accept it for scalar types. But rather than switch to a hard error for this code, let's give a permerror so affected code can continue to work with -fpermissive.
PR c++/116634
gcc/cp/ChangeLog:
* init.cc (can_init_array_with_p): Allow PR59465 case with permerror.
gcc/testsuite/ChangeLog:
* g++.dg/diagnostic/aggr-init1.C: Expect warning with -fpermissive.
* g++.dg/init/array62.C: Adjust diagnostic.
* g++.dg/init/array63.C: Adjust diagnostic.
* g++.dg/init/array64.C: Adjust diagnostic.
2024-11-05 | c++: Fix crash during NRV optimization with invalid input [PR117099, PR117129] | Simon Martin | 4 | -1/+59
PR117099 and PR117129 are ICEs upon invalid code that happen when NRVO is activated, and both due to the fact that we don't consistently set current_function_return_value to error_mark_node upon error in finish_return_expr.
This patch fixes this inconsistency which fixes both cases since we skip calling finalize_nrv when current_function_return_value is error_mark_node.
PR c++/117099
PR c++/117129
gcc/cp/ChangeLog:
* typeck.cc (check_return_expr): Upon error, set current_function_return_value to error_mark_node.
gcc/testsuite/ChangeLog:
* g++.dg/parse/crash78.C: New test.
* g++.dg/parse/crash78a.C: New test.
* g++.dg/parse/crash79.C: New test.
2024-11-05 | c++: Don't crash upon invalid placement new operator [PR117101] | Simon Martin | 3 | -2/+25
We currently crash upon the following invalid code (notice the "void void**" parameter)
=== cut here ===
using size_t = decltype(sizeof(int));
void *operator new(size_t, void void **p) noexcept
{
  return p;
}
int x;
void f() {
  int y;
  new (&y) int(x);
}
=== cut here ===
The problem is that in this case, we end up with a NULL_TREE parameter list for the new operator because of the error, and (1) coerce_new_type wrongly complains about the first parameter type not being size_t, (2) std_placement_new_fn_p blindly accesses the parameter list, hence a crash.
This patch does NOT address #1 since we can't easily distinguish between a new operator declaration without parameters from one with erroneous parameters (and it's not worth the risk to refactor and break things for an error recovery issue) hence a dg-bogus in new52.C, but it does address #2 and the ICE by simply checking the first parameter against NULL_TREE.
It also adds a new testcase checking that we complain about new operators with no or invalid first parameters, since we did not have any.
PR c++/117101
gcc/cp/ChangeLog:
* init.cc (std_placement_new_fn_p): Check first_arg against NULL_TREE.
gcc/testsuite/ChangeLog:
* g++.dg/init/new52.C: New test.
* g++.dg/init/new53.C: New test.
2024-11-05 | c++: Defer -fstrong-eval-order processing to template instantiation time [PR117158] | Simon Martin | 3 | -1/+44
Since r10-3793-g1a37b6d9a7e57c, we ICE upon the following valid code with -std=c++17 and above
=== cut here ===
struct Base {
  unsigned int *intarray;
};
template <typename T> struct Sub : public Base {
  bool Get(int i) {
    return (Base::intarray[++i] == 0);
  }
};
=== cut here ===
The problem is that from c++17 on, we use -fstrong-eval-order and need to wrap the array access expression into a SAVE_EXPR. We do so at template declaration time, and end up calling contains_placeholder_p with a SCOPE_REF, that it does not handle well.
This patch fixes this by deferring the wrapping into SAVE_EXPR to instantiation time for templates, when the SCOPE_REF will have been turned into a COMPONENT_REF.
PR c++/117158
gcc/cp/ChangeLog:
* typeck.cc (cp_build_array_ref): Only wrap array expression into a SAVE_EXPR at template instantiation time.
gcc/testsuite/ChangeLog:
* g++.dg/cpp1z/eval-order13.C: New test.
* g++.dg/parse/crash77.C: New test.
2024-11-05 | testsuite: fix testcase pr110279-1.c | Di Zhao | 1 | -2/+2
The test case is for targets that support FMA. Previously the "target" selector was missing from the dg-final command.
gcc/testsuite/ChangeLog:
PR tree-optimization/110279
* gcc.dg/pr110279-1.c: Add target selector.
2024-11-05 | Support vector float_extend from __bf16 to float. | liuhongt | 6 | -1/+144
It is supported via a vector permutation with a zero vector.
gcc/ChangeLog:
* config/i386/i386-expand.cc (ix86_expand_vector_bf2sf_with_vec_perm): New function.
* config/i386/i386-protos.h (ix86_expand_vector_bf2sf_with_vec_perm): New declaration.
* config/i386/mmx.md (extendv2bfv2sf2): New expander.
* config/i386/sse.md (extend<sf_cvt_bf16_lower><mode>2): Ditto.
(VF1_AVX512BW): New mode iterator.
(sf_cvt_bf16): Add V4SF.
(sf_cvt_bf16_lower): New mode attr.
gcc/testsuite/ChangeLog:
* gcc.target/i386/avx512bw-extendbf2sf.c: New test.
* gcc.target/i386/sse2-extendbf2sf.c: New test.
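Why a permutation with a zero vector is enough: a bfloat16 value is the upper 16 bits of an IEEE single-precision float, so widening is exact and just places those bits in the high half with zeros below. Viewed as 16-bit lanes on a little-endian target, that interleaves zeros (even lanes) with the bf16 inputs (odd lanes). A scalar-array sketch of the idea, assuming little-endian layout; `bf16_to_float` is a hypothetical name, not the i386 expander.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Widen N bfloat16 values to float by interleaving with zeros:
// 16-bit lane 2*i gets 0 (low half), lane 2*i+1 gets the bf16 bits.
// Assumes a little-endian target.
void
bf16_to_float (const std::uint16_t *in, float *out, std::size_t n)
{
  for (std::size_t i = 0; i < n; ++i)
    {
      std::uint16_t lanes[2] = { 0, in[i] };
      std::memcpy (&out[i], lanes, sizeof lanes);
    }
}
```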
2024-11-05 | Support vector float_truncate for SF to BF. | liuhongt | 7 | -0/+172
Generate a native instruction whenever possible; otherwise use a vector permutation with odd indices.
gcc/ChangeLog:
* config/i386/i386-expand.cc (ix86_expand_vector_sf2bf_with_vec_perm): New function.
* config/i386/i386-protos.h (ix86_expand_vector_sf2bf_with_vec_perm): New declaration.
* config/i386/mmx.md (truncv2sfv2bf2): New expander.
* config/i386/sse.md (truncv4sfv4bf2): Ditto.
(truncv8sfv8bf2): Ditto.
(truncv16sfv16bf2): Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/i386/avx512bf16-truncsfbf.c: New test.
* gcc.target/i386/avx512bw-truncsfbf.c: New test.
* gcc.target/i386/ssse3-truncsfbf.c: New test.
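Conversely, on a little-endian target the odd 16-bit lanes of a float vector hold the upper half of each float, so selecting them yields bfloat16 bit patterns. A sketch of that permutation under the little-endian assumption, with a hypothetical function name; note that simply dropping the low half truncates the mantissa instead of rounding to nearest-even, as the last test case shows.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Narrow N floats to bfloat16 by keeping the odd 16-bit lanes, i.e. the
// high half of each float. Assumes a little-endian target. The low 16
// mantissa bits are discarded (truncation, no rounding).
void
float_to_bf16 (const float *in, std::uint16_t *out, std::size_t n)
{
  for (std::size_t i = 0; i < n; ++i)
    {
      std::uint16_t lanes[2];
      std::memcpy (lanes, &in[i], sizeof lanes);
      out[i] = lanes[1];  // odd lane = upper 16 bits
    }
}
```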
2024-11-05 | c++: Mark replaceable global operator new/delete with const std::nothrow_t& argument as DECL_IS_REPLACEABLE_OPERATOR [PR117370] | Jakub Jelinek | 3 | -4/+43
cxx_init_decl_processing predeclares 12 out of the 20 replaceable global new/delete operators and sets DECL_IS_REPLACEABLE_OPERATOR on those. But it doesn't handle the remaining 8, in particular
void* operator new(std::size_t, const std::nothrow_t&) noexcept;
void* operator new[](std::size_t, const std::nothrow_t&) noexcept;
void operator delete(void*, const std::nothrow_t&) noexcept;
void operator delete[](void*, const std::nothrow_t&) noexcept;
void* operator new(std::size_t, std::align_val_t, const std::nothrow_t&) noexcept;
void* operator new[](std::size_t, std::align_val_t, const std::nothrow_t&) noexcept;
void operator delete(void*, std::align_val_t, const std::nothrow_t&) noexcept;
void operator delete[](void*, std::align_val_t, const std::nothrow_t&) noexcept;
The following patch sets that flag during grok_op_properties for those, so that they don't need to be predeclared.
The patch doesn't fix the whole PR, as some work is needed on the CDDCE side too, unlike the throwing operator new case the if (ptr) conditional around operator delete isn't removed by VRP and so we need to handle conditional delete for unconditional new.
2024-11-05 Jakub Jelinek <jakub@redhat.com>
PR c++/117370
* cp-tree.h (is_std_class): Declare.
* constexpr.cc (is_std_class): New function.
(is_std_allocator): Use it.
* decl.cc (grok_op_properties): Mark global replaceable operator new/delete operators with const std::nothrow_t & last argument with DECL_IS_REPLACEABLE_OPERATOR.
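For reference, a `new (std::nothrow)` expression calls the nothrow form of the replaceable operator new, which returns a null pointer instead of throwing; that is why such code typically pairs it with a delete guarded by a null check, the `if (ptr)` conditional mentioned above. A small usage sketch of the pattern (the function is illustrative only):

```cpp
#include <cassert>
#include <new>

// Allocate with the nothrow form of the replaceable global operator new;
// on failure this yields nullptr instead of throwing std::bad_alloc.
int
sum_with_nothrow_new (int a, int b)
{
  int *p = new (std::nothrow) int (a);
  int result = p ? *p + b : a + b;
  if (p)  // conditional delete paired with the allocation above
    delete p;
  return result;
}
```

Once the nothrow operators carry DECL_IS_REPLACEABLE_OPERATOR, the optimizer can in principle treat such a pair like the throwing case; per the commit message, handling the conditional delete still needs further work.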
2024-11-05 | i386: Handling exception input of __builtin_ia32_prefetch. [PR117416] | Hu, Lin1 | 3 | -0/+37
op1 should be between 0 and 2; add an error handler for it. op3 should be 0 or 1; raise a warning when op3 is an invalid value.
gcc/ChangeLog:
PR target/117416
* config/i386/i386-expand.cc (ix86_expand_builtin): Raise warning when op1 isn't in range of [0, 2] and set op1 as const0_rtx, and raise warning when op3 isn't in range of [0, 1].
gcc/testsuite/ChangeLog:
PR target/117416
* gcc.target/i386/pr117416-1.c: New test.
* gcc.target/i386/pr117416-2.c: Ditto.
2024-11-05 | middle-end/117433 - ICE with gimple BLKmode reg copy | Richard Biener | 2 | -1/+27
When we end up expanding an SSA name copy with BLKmode regs, which can happen for vectors, possibly wrapped in a NOP-conversion or a PAREN_EXPR, and we are not optimizing, we can end up with two BLKmode MEMs that expand_gimple_stmt_1 doesn't properly handle when expanding, trying to emit_move_insn them. Looking at store_expr, which is what expand_gimple_stmt_1 is really doing, reveals a lot of magic that's missing. It eventually falls back to emit_block_move (store_expr isn't exported), so this is what I ended up using here given I think we'll only have BLKmode "registers" for vectors.
PR middle-end/117433
* cfgexpand.cc (expand_gimple_stmt_1): Use emit_block_move when moving temp to BLKmode target.
* gcc.dg/pr117433.c: New testcase.
2024-11-04 | aarch64: remove falkor-tag-collision-avoidance pass | Andrew Pinski | 7 | -904/+2
This code is not well tested: there is only a single testcase (gcc.target/aarch64/pr94530.c) which enables this code, and it only checks that there is no ICE. The falkor cores have also not been supported by Qualcomm since 2019 or so. And I don't have a way to test these cores internally either.
Bootstrapped and tested on aarch64-linux-gnu.
gcc/ChangeLog:
* config/aarch64/aarch64-passes.def: Don't add pass_tag_collision_avoidance.
* config/aarch64/aarch64-protos.h (make_pass_tag_collision_avoidance): Remove.
* config/aarch64/aarch64-tuning-flags.def (RENAME_LOAD_REGS): Remove.
* config/aarch64/tuning_models/qdf24xx.h (qdf24xx_tunings): Set tuning flags to AARCH64_EXTRA_TUNE_NONE.
* config/aarch64/falkor-tag-collision-avoidance.cc: Removed.
* config/aarch64/t-aarch64 (falkor-tag-collision-avoidance.o): Remove.
* config.gcc (aarch64*-*-*): Remove falkor-tag-collision-avoidance.o from extra_objs.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
2024-11-04 | aarch64: Remove scheduling models for falkor and saphira | Andrew Pinski | 4 | -1253/+4
These 2 Qualcomm cores have long been gone, in that Qualcomm has not supported them since at least 2019. Removing them will, I think, make it easier to change the insn type attributes, instead of having to keep these models up to date.
Note this does not remove the cores, just the schedule models.
Bootstrapped and tested on aarch64-linux-gnu.
gcc/ChangeLog:
* config/aarch64/aarch64-cores.def (falkor): Use cortex-a57 scheduler.
(saphira): Likewise.
* config/aarch64/aarch64.md: Don't include falkor.md and saphira.md.
* config/aarch64/falkor.md: Removed.
* config/aarch64/saphira.md: Removed.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
2024-11-05 | i386: Utilize VCOMSBF16 for BF16 Comparisons with AVX10.2 | Levy Hsu | 6 | -28/+214
This patch enables the use of the VCOMSBF16 instruction from AVX10.2 for efficient BF16 comparisons.
gcc/ChangeLog:
* config/i386/i386-expand.cc (ix86_expand_branch): Handle BFmode when TARGET_AVX10_2_256 is enabled.
(ix86_prepare_fp_compare_args): Use SSE_FLOAT_MODE_SSEMATH_OR_HFBF_P.
(ix86_expand_fp_movcc): Ditto.
(ix86_expand_fp_compare): Handle BFmode under IX86_FPCMP_COMI.
* config/i386/i386.cc (ix86_multiplication_cost): Use SSE_FLOAT_MODE_SSEMATH_OR_HFBF_P.
(ix86_division_cost): Ditto.
(ix86_rtx_costs): Ditto.
(ix86_vector_costs::add_stmt_cost): Ditto.
* config/i386/i386.h (SSE_FLOAT_MODE_SSEMATH_OR_HF_P): Rename to ...
(SSE_FLOAT_MODE_SSEMATH_OR_HFBF_P): ...this, and add BFmode.
* config/i386/i386.md (*cmpibf): New define_insn.
gcc/testsuite/ChangeLog:
* gcc.target/i386/avx10_2-comibf-1.c: New test.
* gcc.target/i386/avx10_2-comibf-2.c: Ditto.
2024-11-05 | Handle T_HRESULT types in CodeView records | Mark Harmstone | 2 | -3/+25
Follow MSVC in having a special type value, T_HRESULT, for (signed) longs that have been typedef'd with the name "HRESULT". This is so that the debugger can display user-friendly constant names when debugging COM code.
gcc/
* dwarf2codeview.cc (get_type_num_typedef): New function.
(get_type_num): Call get_type_num_typedef.
* dwarf2codeview.h (T_HRESULT): Define.
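A minimal sketch of the mapping rule, using hypothetical enum values and a simplified type model rather than dwarf2codeview.cc's data structures: a typedef named HRESULT whose underlying type is signed long gets the special T_HRESULT number, and in this model any other typedef resolves to its underlying type.

```cpp
#include <cassert>
#include <string>

// Hypothetical type numbers; the real CodeView constants are defined in
// dwarf2codeview.h.
enum cv_type_num { TN_LONG, TN_ULONG, TN_HRESULT };

// Simplified model of a typedef: its name and underlying base type.
struct typedef_info
{
  std::string name;
  cv_type_num underlying;
};

cv_type_num
type_num_for_typedef (const typedef_info &td)
{
  // Follow MSVC: signed longs typedef'd as HRESULT get T_HRESULT.
  if (td.name == "HRESULT" && td.underlying == TN_LONG)
    return TN_HRESULT;
  // In this model, other typedefs resolve to their underlying type.
  return td.underlying;
}
```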
2024-11-05 | Write LF_POINTER CodeView types for pointers to member functions or data | Mark Harmstone | 2 | -0/+112
Translate DW_TAG_ptr_to_member_type DIEs into special extended LF_POINTER CodeView types.
gcc/
* dwarf2codeview.cc (struct codeview_custom_type): Add new fields to lf_pointer struct in union.
(write_lf_pointer): Write containing_class and ptr_to_mem_type if applicable.
(get_type_num_subroutine_type): Write correct containing_class_type if this is a pointer to a member function.
(get_type_num_ptr_to_member_type): New function.
(get_type_num): Call get_type_num_ptr_to_member_type.
* dwarf2codeview.h (CV_PTR_MODE_MASK, CV_PTR_MODE_PMEM): Define.
(CV_PTR_MODE_PMFUNC, CV_PMTYPE_D_Single, CV_PMTYPE_F_Single): Likewise.
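For reference, these are the two C++ constructs that DWARF describes with DW_TAG_ptr_to_member_type and that this change maps to extended LF_POINTER records naming the containing class: a pointer to data member and a pointer to member function. The `point` type is only illustrative.

```cpp
#include <cassert>

struct point
{
  int x;
  int y;
  int sum () const { return x + y; }
};

// Pointer to data member: selects a member of any 'point' object.
int point::*coord = &point::y;

// Pointer to member function: must be invoked on an object.
int (point::*method) () const = &point::sum;

int
read_via_members (const point &p)
{
  return p.*coord + (p.*method) ();
}
```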
2024-11-05 | Write LF_BCLASS records in CodeView struct definitions | Mark Harmstone | 1 | -0/+70
When writing the CodeView type definition for a struct, translate DW_TAG_inheritance DIEs into LF_BCLASS records, to record which other structs this one inherits from.
gcc/
* dwarf2codeview.cc (enum cv_leaf_type): Add LF_BCLASS.
(struct codeview_subtype): Add lf_bclass to union.
(write_cv_padding): Add declaration.
(write_lf_fieldlist): Handle LF_BCLASS records.
(add_struct_inheritance): New function.
(get_type_num_struct): Call add_struct_inheritance.
2024-11-05c++/modules: Merge default arguments [PR99274]Nathaniel Shead11-3/+206
When merging a newly imported declaration with an existing declaration, we don't currently propagate new default arguments, which causes issues when modularising header units. This patch adds logic to propagate default arguments to existing declarations on import, and to error if the defaults do not match.

	PR c++/99274

gcc/cp/ChangeLog:

	* module.cc (trees_in::is_matching_decl): Merge default arguments.
	* tree.cc (cp_tree_equal) <AGGR_INIT_EXPR>: Handle unification of AGGR_INIT_EXPRs with new VAR_DECL slots.

gcc/testsuite/ChangeLog:

	* g++.dg/modules/lambda-7.h: Skip ODR-violating declaration when testing ODR deduplication.
	* g++.dg/modules/lambda-7_b.C: Note we're testing ODR deduplication.
	* g++.dg/modules/default-arg-1_a.H: New test.
	* g++.dg/modules/default-arg-1_b.C: New test.
	* g++.dg/modules/default-arg-2_a.H: New test.
	* g++.dg/modules/default-arg-2_b.C: New test.
	* g++.dg/modules/default-arg-3.h: New test.
	* g++.dg/modules/default-arg-3_a.H: New test.
	* g++.dg/modules/default-arg-3_b.C: New test.

Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
Reviewed-by: Patrick Palka <ppalka@redhat.com>
Reviewed-by: Jason Merrill <jason@redhat.com>
2024-11-05c++/modules: Handle location exhaustion in write_location [PR105443]Nathaniel Shead1-8/+34
The 'location_t' type currently only stores a limited number of distinct locations. In some cases, if many modules are imported that sum up to a large number of locations, we may run out of room to represent new locations for these imported declarations. In such a case, any new declarations from the affected modules are simply given a location of "the module interface as a whole".

'write_location' sometimes gets confused when this happens: it finds that the location is a location we've noted to get streamed out, but it is inconsistent about whether it's an ordinary location from the current module or an imported location from a different module. This causes random-looking locations to be associated with these declarations, and occasionally (checking-only) ICEs.

This patch fixes the issue by first checking whether an ordinary location represents a module (rather than a location inside a module); if so, we instead write the location of the point where we imported this module. This continues recursively in case the importing location could not be stored either.

We only need to handle this in the IS_ORDINARY_LOC case: even for locations originally within macro expansions, the remapping logic for location exhaustion will make them look like ordinary locs again.

This is a relatively expensive addition, so the new check only runs if we've noted that resource exhaustion occurred while preparing imported line maps, or in checking builds.

	PR c++/105443

gcc/cp/ChangeLog:

	* module.cc (loc_spans::locs_exhausted_p): New field.
	(loc_spans::loc_spans): Initialise it.
	(loc_spans::locations_exhausted_p): New function.
	(module_state::read_prepare_maps): Move inform into...
	(loc_spans::report_location_exhaustion): ...this new function.
	(module_state::write_location): Check for writing module locations stored due to resource exhaustion.

Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
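The fix amounts to following import locations until it reaches a location that is not a whole-module stand-in. This is a heavily simplified model: struct loc_info and resolve_location are hypothetical and ignore GCC's actual line-map machinery. The model uses a loop in place of recursion, which is equivalent here.

```c
#include <stdbool.h>
#include <assert.h>

/* Hypothetical model of one location entry. After location exhaustion,
   an entry may stand for "the module interface as a whole"; in that
   case we also know where that module was imported. */
struct loc_info {
    bool is_module_loc;   /* entry denotes an entire module */
    unsigned import_loc;  /* location of the import of that module */
};

/* Replace whole-module stand-ins by their import location. The importing
   location may itself be a stand-in (its module also ran out of room),
   so keep following the chain. */
static unsigned resolve_location(const struct loc_info *table, unsigned loc)
{
    while (table[loc].is_module_loc)
        loc = table[loc].import_loc;
    return loc;
}
```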
2024-11-05Daily bump.GCC Administrator5-1/+606
2024-11-05simulate-thread tests: Silence gdb debuginfod warningH.J. Lu2-0/+18
When gdb defaults to using debuginfod, it emits a warning in the simulate-thread tests:

spawn gdb -nx -nw -batch -x /export/gnu/import/git/gitlab/x86-gcc/gcc/testsuite/gcc.dg/simulate-thread/simulate-thread.gdb ./atomic-load-int.exe
Breakpoint 1 at 0x4005cc: file /export/gnu/import/git/gitlab/x86-gcc/gcc/testsuite/gcc.dg/simulate-thread/atomic-load-int.c, line 97.
This GDB supports auto-downloading debuginfo from the following URLs:
  <https://debuginfod.fedoraproject.org/>
Enable debuginfod for this session? (y or [n]) [answered N; input not from terminal]
Debuginfod has been disabled.
To make this setting permanent, add 'set debuginfod enabled off' to .gdbinit.

Silence the gdb warning by setting DEBUGINFOD_URLS to "" and restoring it afterwards if it was set.

	PR testsuite/117300
	* g++.dg/simulate-thread/simulate-thread.exp: Set DEBUGINFOD_URLS to "" and restore it if it exists.
	* gcc.dg/simulate-thread/simulate-thread.exp: Likewise.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
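The .exp changes themselves are written in Tcl. The save/set/restore pattern they follow can be sketched in C with POSIX getenv, setenv and unsetenv. The helper names are hypothetical, and gdb would be spawned between the two calls. The variable is either put back to its old value or removed entirely if it was not set before.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>
#include <assert.h>

/* Saved state of one environment variable. */
struct saved_env {
    const char *name;
    char *value;   /* heap copy of the old value, or NULL if it was unset */
};

/* Remember NAME's current value (if any), then set it to "". */
static struct saved_env clear_env_var(const char *name)
{
    struct saved_env s = { name, NULL };
    const char *old = getenv(name);
    if (old != NULL) {
        s.value = strdup(old);   /* getenv's buffer may be overwritten */
        assert(s.value != NULL);
    }
    setenv(name, "", 1);
    return s;
}

/* Restore the old value, or remove the variable if it was unset. */
static void restore_env_var(struct saved_env *s)
{
    if (s->value != NULL) {
        setenv(s->name, s->value, 1);
        free(s->value);
        s->value = NULL;
    } else {
        unsetenv(s->name);
    }
}
```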
2024-11-05guality tests: Silence gdb debuginfod warningH.J. Lu3-0/+30
When gdb defaults to using debuginfod, it emits a warning in the guality tests:

Spawning: gdb -nx -nw -quiet -batch -x pr36728-2.gdb ./pr36728-2.exe
spawn gdb -nx -nw -quiet -batch -x pr36728-2.gdb ./pr36728-2.exe
Breakpoint 1 at 0x4004ba: file /export/gnu/import/git/gitlab/x86-gcc/gcc/testsuite/gcc.dg/guality/pr36728-2.c, line 18.
This GDB supports auto-downloading debuginfo from the following URLs:
  <https://debuginfod.fedoraproject.org/>
Enable debuginfod for this session? (y or [n]) [answered N; input not from terminal]
Debuginfod has been disabled.
To make this setting permanent, add 'set debuginfod enabled off' to .gdbinit.

Adding 'set debuginfod enabled off' to ~/.gdbinit does not make the warning go away, because the -nx option makes gdb ignore ~/.gdbinit. Silence the gdb warning by setting DEBUGINFOD_URLS to "" and restoring it afterwards if it was set.

	PR testsuite/117300
	* g++.dg/guality/guality.exp: Set DEBUGINFOD_URLS to "" and restore it if it exists.
	* gcc.dg/guality/guality.exp: Likewise.
	* gfortran.dg/guality/guality.exp: Likewise.

Co-authored-by: Andrew Pinski <quic_apinski@quicinc.com>
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
2024-11-04[PATCH v2 2/2] RISC-V: Disable by pieces for vector setmem length > UNITS_PER_WORDCraig Blackmore4-11/+35
For targets with fast unaligned access, the by-pieces infrastructure uses pieces of up to UNITS_PER_WORD bytes, resulting in more store instructions than needed. For example, gcc.target/riscv/rvv/base/setmem-2.c:f1 built with `-O3 -march=rv64gcv -mtune=thead-c906`:

```
f1:
	vsetivli zero,8,e8,mf2,ta,ma
	vmv.v.x v1,a1
	vsetivli zero,0,e32,mf2,ta,ma
	sb a1,14(a0)
	vmv.x.s a4,v1
	vsetivli zero,8,e16,m1,ta,ma
	vmv.x.s a5,v1
	vse8.v v1,0(a0)
	sw a4,8(a0)
	sh a5,12(a0)
	ret
```

The slow unaligned access version built with `-O3 -march=rv64gcv` used 15 sb instructions:

```
f1:
	sb a1,0(a0)
	sb a1,1(a0)
	sb a1,2(a0)
	sb a1,3(a0)
	sb a1,4(a0)
	sb a1,5(a0)
	sb a1,6(a0)
	sb a1,7(a0)
	sb a1,8(a0)
	sb a1,9(a0)
	sb a1,10(a0)
	sb a1,11(a0)
	sb a1,12(a0)
	sb a1,13(a0)
	sb a1,14(a0)
	ret
```

After this patch, the following is generated in both cases:

```
f1:
	vsetivli zero,15,e8,m1,ta,ma
	vmv.v.x v1,a1
	vse8.v v1,0(a0)
	ret
```

gcc/ChangeLog:

	* config/riscv/riscv.cc (riscv_use_by_pieces_infrastructure_p): New function.
	(TARGET_USE_BY_PIECES_INFRASTRUCTURE_P): Define.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/autovec/pr113469.c: Expect mf2 setmem.
	* gcc.target/riscv/rvv/base/setmem-2.c: Update f1 to expect straight-line vector memset.
	* gcc.target/riscv/rvv/base/setmem-3.c: Likewise.
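The decision this patch adds can be modelled as a predicate over the operation kind and the size. The sketch below is a hypothetical simplification, not the body of riscv_use_by_pieces_infrastructure_p. The enum borrows the names of GCC's by_pieces_operation values, and default_answer stands in for whatever the generic heuristic would have returned.

```c
#include <stdbool.h>
#include <stddef.h>
#include <assert.h>

/* Names modelled on GCC's by_pieces_operation; values are arbitrary. */
enum by_pieces_op {
    MOVE_BY_PIECES, STORE_BY_PIECES, SET_BY_PIECES,
    CLEAR_BY_PIECES, COMPARE_BY_PIECES
};

/* Hypothetical predicate: may the generic by-pieces expansion be used? */
static bool use_by_pieces(size_t size, enum by_pieces_op op,
                          size_t units_per_word, bool vector_setmem,
                          bool default_answer)
{
    /* A memset longer than one word goes to the vector setmem expander
       instead of a sequence of scalar sb/sh/sw/sd stores. */
    if (vector_setmem && op == SET_BY_PIECES && size > units_per_word)
        return false;
    return default_answer;   /* otherwise defer to the generic heuristic */
}
```

With UNITS_PER_WORD = 8 on rv64, the 15-byte memset in f1 is rejected for by-pieces expansion, which leaves it to the vector sequence shown above.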
2024-11-04[PATCH v2 1/2] RISC-V: Make vectorized memset handle more casesCraig Blackmore2-21/+22
`expand_vec_setmem` only generated vectorized memset if it fitted into a single vector store of at least (TARGET_MIN_VLEN / 8) bytes. Also, without dynamic LMUL the operation always used TARGET_MAX_LMUL, even if it would have fitted in a smaller LMUL.

Allow vectorized memset to be generated for smaller lengths and smaller LMUL by switching to use_vector_stringop_p. The smaller LMUL can be seen in setmem-3.c:f3. Smaller lengths will be seen after the second patch in this series, which selectively disables by pieces.

gcc/ChangeLog:

	* config/riscv/riscv-string.cc (use_vector_stringop_p): Add comment.
	(expand_vec_setmem): Use use_vector_stringop_p instead of check_vectorise_memory_operation.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/base/setmem-3.c: Expect smaller lmul.
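To see why smaller LMUL values matter, consider byte elements (e8). A register group of LMUL registers then holds (min_vlen / 8) * LMUL bytes. The sketch below is an illustrative model, not GCC's use_vector_stringop_p. It picks the smallest LMUL, fractional values included, that covers a given length, and it ignores other legality constraints on fractional LMUL. With min_vlen = 128, the minimum the V extension guarantees, a 15-byte memset needs m1, which is consistent with the `vsetivli zero,15,e8,m1` output of the follow-up patch. An 8-byte memset fits in mf2.

```c
#include <stddef.h>
#include <assert.h>

/* Hypothetical model. LMUL is expressed in eighths so that fractional
   values are integers: 1 = mf8, 2 = mf4, 4 = mf2, 8 = m1, ..., 64 = m8.
   Returns the smallest LMUL (in eighths) whose e8 capacity covers LEN,
   or 0 if LEN does not fit even at MAX_LMUL_EIGHTHS. */
static unsigned smallest_lmul_eighths(size_t len, unsigned min_vlen,
                                      unsigned max_lmul_eighths)
{
    size_t bytes_per_reg = min_vlen / 8;
    for (unsigned l = 1; l <= max_lmul_eighths; l *= 2)
        if (bytes_per_reg * l / 8 >= len)
            return l;
    return 0;
}
```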
2024-11-04libgccjit: Add convert vectorAntoni Boucher11-0/+286
gcc/jit/ChangeLog:

	* docs/topics/compatibility.rst (LIBGCCJIT_ABI_30): New ABI tag.
	* docs/topics/expressions.rst: Document gcc_jit_context_convert_vector.
	* jit-playback.cc (convert_vector): New method.
	* jit-playback.h: New method.
	* jit-recording.cc (recording::context::new_convert_vector,
	recording::convert_vector::replay_into,
	recording::convert_vector::visit_children,
	recording::convert_vector::make_debug_string,
	recording::convert_vector::write_reproducer): New methods.
	* jit-recording.h (class convert_vector): New class.
	(context::new_convert_vector): New method.
	* libgccjit.cc (gcc_jit_context_convert_vector): New function.
	* libgccjit.h (gcc_jit_context_convert_vector): New function.
	* libgccjit.map: New function.

gcc/testsuite/ChangeLog:

	* jit.dg/all-non-failing-tests.h: New test.
	* jit.dg/test-convert-vector.c: New test.
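At the C level, GCC already provides element-wise vector conversion through `__builtin_convertvector`. We assume the new gcc_jit_context_convert_vector entry point is its libgccjit counterpart; check docs/topics/expressions.rst for the exact semantics. The example below uses GCC's vector extensions to show the kind of conversion being exposed: each int lane is converted to a float lane, and the lane count stays the same.

```c
#include <assert.h>

typedef int   v4si __attribute__((vector_size(16)));
typedef float v4sf __attribute__((vector_size(16)));

/* Element-wise int -> float conversion of a whole vector. */
static v4sf to_float(v4si v)
{
    return __builtin_convertvector(v, v4sf);
}
```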