Age | Commit message | Author | Files | Lines
2024-10-08 | ada: Add External_Initialization extension | Ronan Desplanques | 26 | -54/+329
This patch introduces a GNAT extension that adds a new aspect, External_Initialization. A section is added to the reference manual with a description of what the aspect does. The implementation reuses existing mechanisms, in particular Sinput.L.Load_Source_File and Sem_Res.Set_String_Literal_Subtype. A new node kind is added, and nodes of that type are present in what is passed to the back ends. That makes it necessary to update the back ends to handle the new node type. The C interface is extended to make that possible. gcc/ada/ChangeLog: * aspects.ads: Add entities for External_Initialization. * checks.adb (Selected_Length_Checks): Add support for N_External_Initializer nodes. * doc/gnat_rm/gnat_language_extensions.rst: Add section for the added extension. * exp_util.adb (Insert_Actions): Add support for N_External_Initializer nodes. * fe.h (C_Source_Buffer): New function. * gen_il-fields.ads: Add new field. * gen_il-gen-gen_nodes.adb: Add N_External_Initializer node kind. * gen_il-gen.adb: Add new field type. * gen_il-types.ads: Add new node kind and new field type. * pprint.adb (Expr_Name): Handle new node kind. * sem.adb (Analyze): Add support for N_External_Initializer nodes. * sem_ch13.adb (Analyze_Aspect_Specifications, Check_Aspect_At_Freeze_Point): Add support for External_Initialization aspect. * sem_ch3.adb (Apply_External_Initialization): New subprogram. (Analyze_Object_Declaration): Add support for External_Initialization aspect. * sem_res.adb (Resolve_External_Initializer): New procedure. (Resolve): Add support for N_External_Initializer nodes. (Set_String_Literal_Subtype): Extend to handle N_External_Initializer nodes. * sinfo-utils.adb (Is_In_Union_Id): Adapt to new field addition. * sinfo.ads: Add documentation for new node kind and new field. * sinput.adb, sinput.ads (C_Source_Buffer): Add new C interface function. * snames.ads-tmpl: Add new aspect identifier. * sprint.adb (Sprint_Node_Actual): Add nop handling of N_External_Initializer nodes. 
* types.ads: Modify type to allow for new C interface. * gcc-interface/trans.cc (gnat_to_gnu): Handle new GNAT node type. * gcc-interface/Make-lang.in: Update list of stage1 run-time library units. * gnat-style.texi: Regenerate. * gnat_rm.texi: Regenerate. * gnat_ugn.texi: Regenerate.
2024-10-08 | ada: Use a-nallfl__wraplf.ads for Android | Olivier Hainque | 1 | -0/+1
This is the most common definition. Otherwise, from the default: a-nallfl.ads:51:13: ... intrinsic binding type mismatch on result a-nallfl.ads:51:13: ... intrinsic binding type mismatch on parameter 1 a-nallfl.ads:51:13: ... profile of "Sin" doesn't match the builtin it binds gcc/ada/ChangeLog: * Makefile.rtl (arm/aarch64-android): Associate a-nallfl.ads with libgnat/a-nallfl__wraplf.ads.
2024-10-08 | ada: Add System definitions of SIGSYS for Android | Olivier Hainque | 3 | -0/+3
This allows reusing a-intnam__linux.ads for Android. gcc/ada/ChangeLog: * libgnarl/s-linux__android-arm.ads: Define SIGSYS. * libgnarl/s-linux__android-aarch64.ads: Define SIGSYS. * libgnarl/s-osinte__android.ads: Expose SIGSYS value.
2024-10-08 | ada: Rework s-linux/osinte for arm/aarch64-android sigactions | Olivier Hainque | 4 | -5/+143
Building an aarch64-android compiler with the current sources, initially intended for arm-android, expectedly trips on problems. This change is meant to address:

```
.../gcc/ada/rts % ../../gnat1 -quiet ... a-stbufi.adb -I.
s-osinte.ads:591:07: error: component "sa_flags" overlaps "sa_mask" at line 590
```

s-linux__android.ads makes hardcoded assumptions on the size of sigset_t, based on observations performed in the course of the arm port. The system headers then show sa_flags placed VERY differently between the 32-bit and the 64-bit variants. See android-sysroot/usr/include/bits/signal_types.h

```
%if defined(__LP64__)
  int sa_flags; \
  union { \
    sighandler_t sa_handler; \
    void (*sa_sigaction)(int, struct siginfo*, void*); \
  }; \
  sigset_t sa_mask; \
  void (*sa_restorer)(void); \
%else
  union {
    sighandler_t sa_handler;
    void (*sa_sigaction)(int, struct siginfo*, void*);
  };
  sigset_t sa_mask;
  int sa_flags;
  void (*sa_restorer)(void);
```

gcc/ada/ChangeLog: * libgnarl/s-linux__android-arm.ads: New file, renaming of ... * libgnarl/s-linux__android.ads: ... this file. * libgnarl/s-linux__android-aarch64.ads: New file. Based on the -arm variant, with sa_ field positions adjusted. * Makefile.rtl (arm/aarch64-android pairs): Adjust accordingly. * libgnarl/s-osinte__android.ads: Rather than making assumptions on the actual type of the C sigset_t, use Os_Constants.SIZEOF_sigset_t to define an Ada sigset_t type of the proper size. Use C.int instead of unsigned_long for sa_flags.
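To make the layout difference concrete, the two field orders quoted in the message above can be mirrored in a small C++ sketch. The type names here (`handler_t`, `fake_sigset_t`, `sigaction_lp64_order`, `sigaction_ilp32_order`) are hypothetical stand-ins, not the Android or GNAT runtime definitions, and the real `sigset_t` size is deliberately not modelled.

```cpp
#include <cstddef>

// Hypothetical stand-ins for the real sysroot types (assumptions for
// illustration only; the actual sigset_t size differs per target).
typedef void (*handler_t)(int);
struct fake_sigset_t { unsigned long bits; };

// Field order of the __LP64__ variant: sa_flags comes first.
struct sigaction_lp64_order {
  int sa_flags;
  union {
    handler_t sa_handler;
    void (*sa_sigaction)(int, void*, void*);
  };
  fake_sigset_t sa_mask;
  void (*sa_restorer)(void);
};

// Field order of the 32-bit variant: sa_flags follows sa_mask.
struct sigaction_ilp32_order {
  union {
    handler_t sa_handler;
    void (*sa_sigaction)(int, void*, void*);
  };
  fake_sigset_t sa_mask;
  int sa_flags;
  void (*sa_restorer)(void);
};

// A record description written for one order places sa_flags on top of
// other fields when applied to the other order.
static_assert(offsetof(sigaction_lp64_order, sa_flags) == 0,
              "LP64 order: sa_flags is the first field");
static_assert(offsetof(sigaction_lp64_order, sa_flags)
                  < offsetof(sigaction_lp64_order, sa_mask),
              "LP64 order: sa_flags precedes sa_mask");
static_assert(offsetof(sigaction_ilp32_order, sa_flags)
                  > offsetof(sigaction_ilp32_order, sa_mask),
              "32-bit order: sa_flags follows sa_mask");
```

This is why a single Ada representation clause cannot serve both targets, and presumably why the patch splits the s-linux file into -arm and -aarch64 variants.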
2024-10-08 | ada: Account for aarch64 in init.c section for Android | Olivier Hainque | 1 | -0/+12
Unlike the ARM port already there, aarch64 is dwarf CFI based for unwinding and Android-Linux exposes kernel CFI for signal handlers. gcc/ada/ChangeLog: * init.c (__gnat_error_handler): Map signals straight to Ada exceptions, without a local CFI trampoline. (__gnat_adjust_context_for_raise): Guard arm specific code on __arm__ compilation. Do nothing otherwise, relying on libgcc's signal frame recognition for PC/RA adjustments.
2024-10-08 | ada: Extend arm-android section of Makefile.rtl to aarch64 | Olivier Hainque | 1 | -5/+223
gcc/ada/ChangeLog: * Makefile.rtl: Extend arm-android section to aarch64, in a similar fashion as other arm/aarch64 configurations. Introduce pair selection guards to prevent match of aarch64-linux-android on the regular aarch64-linux% cross as well.
2024-10-08 | ada: sem_prag.adb: ignore compile_time_{warning,error} in CodePeer mode | Ghjuvan Lacambre | 1 | -0/+5
GNAT sometimes needs help from the GCC back-end in order to check whether Compile_Time_{Warning,Error} are true. As CodePeer does not have access to a GCC back-end, it is unable to perform these checks. Thus we need to remove said pragmas from the tree. gcc/ada/ChangeLog: * sem_prag.adb (Process_Compile_Time_Warning_Or_Error): Turn Compile_Time pragmas into null nodes
2024-10-08 | contrib, libcpp, libstdc++: Update to Unicode 16.0 | Jakub Jelinek | 20 | -17890/+25060
It is autumn again and there is a new Unicode version 16.0. The following patch updates our Unicode stuff in contrib, libcpp and libstdc++ from that Unicode version. 2024-10-08 Jakub Jelinek <jakub@redhat.com> contrib/ * unicode/README: Update glibc git commit hash, replace Unicode 15 or 15.1 versions with 16. * unicode/gen_libstdcxx_unicode_data.py: Use 160000 instead of 150100 in _GLIBCXX_GET_UNICODE_DATA test. * unicode/from_glibc/utf8_gen.py: Updated from glibc 064c708c78cc2a6b5802dce73108fc0c1c6bfc80 commit. * unicode/DerivedCoreProperties.txt: Updated from Unicode 16.0. * unicode/emoji-data.txt: Likewise. * unicode/PropList.txt: Likewise. * unicode/GraphemeBreakProperty.txt: Likewise. * unicode/DerivedNormalizationProps.txt: Likewise. * unicode/NameAliases.txt: Likewise. * unicode/UnicodeData.txt: Likewise. * unicode/EastAsianWidth.txt: Likewise. gcc/testsuite/ * c-c++-common/cpp/named-universal-char-escape-1.c: Add tests for some Unicode 16.0 characters, both normal and generated. libcpp/ * makeucnid.cc (write_copyright): Update Unicode Copyright years. * makeuname2c.cc (generated_ranges): Adjust Unicode version from 15.1 to 16.0. Add EGYPTIAN HIEROGLYPH- generated range, adjust indexes in following entries. (write_copyright): Update Unicode Copyright years. * generated_cpp_wcwidth.h: Regenerated. * ucnid.h: Regenerated. * uname2c.h: Regenerated. libstdc++-v3/ * include/bits/unicode.h (std::__unicode::__v15_1_0): Rename inline namespace to ... (std::__unicode::__v16_0_0): ... this. (_GLIBCXX_GET_UNICODE_DATA): Change from 150100 to 160000. * include/bits/unicode-data.h: Regenerated. * testsuite/ext/unicode/properties.cc: Check for _Gcb_SpacingMark on U+11F03 rather than U+1D16D as the latter lost SpacingMark property in Unicode 16.0.
2024-10-08 | Recompute TYPE_MODE and DECL_MODE for vector_type for accelerator. | Prathamesh Kulkarni | 2 | -16/+28
gcc/ChangeLog: PR ipa/96265 * lto-streamer-in.cc (lto_read_tree_1): Set TYPE_MODE and DECL_MODE for vector_type if offloading is enabled. (lto_input_mode_table): Remove handling of vector modes. * tree-streamer-out.cc (pack_ts_decl_common_value_fields): Stream out VOIDmode for vector_type if offloading is enabled. (pack_ts_decl_common_value_fields): Likewise. Signed-off-by: Prathamesh Kulkarni <prathameshk@nvidia.com>
2024-10-08 | testsuite: Relax line number match in gfortran.dg/pr95690.f90 | Andreas Schwab | 1 | -2/+4
The actual line number is target dependent, and immaterial for the test. * gfortran.dg/pr95690.f90: Allow matching error message anywhere.
2024-10-08 | diagnostics: Fix compile error for MinGW <7.0 | Torbjörn SVENSSON | 2 | -2/+12
The define ENABLE_VIRTUAL_TERMINAL_PROCESSING was introduced in MinGW 7.0 Build failure when building with MinGW 5.0.3: .../gcc/diagnostic-color.cc: In function 'bool should_colorize()': .../gcc/diagnostic-color.cc:317:41: error: 'ENABLE_VIRTUAL_TERMINAL_PROCESSING' was not declared in this scope mode |= ENABLE_PROCESSED_OUTPUT | ENABLE_VIRTUAL_TERMINAL_PROCESSING; ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .../gcc/diagnostic-color.cc:317:41: note: suggested alternative: 'ENABLE_RTL_FLAG_CHECKING' mode |= ENABLE_PROCESSED_OUTPUT | ENABLE_VIRTUAL_TERMINAL_PROCESSING; ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ENABLE_RTL_FLAG_CHECKING .../gcc/diagnostic-color.cc: In function 'bool auto_enable_urls()': .../gcc/diagnostic-color.cc:407:50: error: 'ENABLE_VIRTUAL_TERMINAL_PROCESSING' was not declared in this scope if (GetConsoleMode (handle, &mode) && !(mode & ENABLE_VIRTUAL_TERMINAL_PROCESSING)) ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .../gcc/diagnostic-color.cc:407:50: note: suggested alternative: 'ENABLE_RTL_FLAG_CHECKING' if (GetConsoleMode (handle, &mode) && !(mode & ENABLE_VIRTUAL_TERMINAL_PROCESSING)) ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ENABLE_RTL_FLAG_CHECKING Makefile:1195: recipe for target 'diagnostic-color.o' failed make[1]: *** [diagnostic-color.o] Error 1 gcc/ChangeLog: * diagnostic-color.cc: Conditionally enable terminal processing based on define availability. * pretty-print.cc: Likewise. Signed-off-by: Torbjörn SVENSSON <torbjorn.svensson@foss.st.com>
2024-10-08 | LoongArch: Add support to annotate tablejump | Xi Ruoyao | 5 | -2/+46
This is per the request from the kernel developers. For generating the ORC unwind info, the objtool program needs to analyze the control flow of a .o file. If a jump table is used, objtool has to correlate the jump instruction with the table. On x86 (where objtool was initially developed) it's simple: a relocation entry naturally correlates them because one single instruction is used for table-based jump. But on a RISC machine objtool would have to reconstruct the data flow if it must find out the correlation on its own. So, emit an additional section to store the correlation info as pairs of addresses, each pair contains the address of a jump instruction (jr) and the address of the jump table. This is very trivial to implement in GCC. gcc/ChangeLog: * config/loongarch/genopts/loongarch.opt.in (mannotate-tablejump): New option. * config/loongarch/loongarch.opt: Regenerate. * config/loongarch/loongarch.md (tablejump<mode>): Emit additional correlation info between the jump instruction and the jump table, if -mannotate-tablejump. * doc/invoke.texi: Document -mannotate-tablejump. gcc/testsuite/ChangeLog: * gcc.target/loongarch/jump-table-annotate.c: New test. Suggested-by: Tiezhu Yang <yangtiezhu@loongson.cn>
2024-10-08 | RISC-V: Add an implicit dependency for Zawrs | Xiao Zeng | 3 | -0/+63
There is a description in <https://github.com/riscv/riscv-isa-manual/blob/main/src/zawrs.adoc>: "The instructions in the Zawrs extension are only useful in conjunction with the LR instruction, which is provided by the Zalrsc component of the A extension." It can be concluded that: zawrs -> zalrsc. gcc/ChangeLog: * common/config/riscv/riscv-common.cc: zawrs -> zalrsc. gcc/testsuite/ChangeLog: * gcc.target/riscv/predef-38.c: New test. * gcc.target/riscv/predef-39.c: New test. Signed-off-by: Xiao Zeng <zengxiao@eswincomputing.com>
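As a rough illustration of what an implicit dependency such as zawrs -> zalrsc means, the sketch below expands a requested extension set with everything it implies. The helper `expand_implied` and its table representation are hypothetical and are not taken from riscv-common.cc; only the zawrs -> zalrsc edge comes from the commit message.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: returns the requested extensions plus everything
// they imply, following implication edges transitively.
std::set<std::string> expand_implied(
    const std::set<std::string>& requested,
    const std::multimap<std::string, std::string>& implies) {
  std::set<std::string> enabled(requested);
  std::vector<std::string> worklist(requested.begin(), requested.end());
  while (!worklist.empty()) {
    std::string ext = worklist.back();
    worklist.pop_back();
    auto range = implies.equal_range(ext);
    for (auto it = range.first; it != range.second; ++it) {
      // insert().second is true only when the extension was newly enabled.
      if (enabled.insert(it->second).second)
        worklist.push_back(it->second);
    }
  }
  return enabled;
}
```

With a table containing the single edge {"zawrs", "zalrsc"}, requesting zawrs alone yields a set that also contains zalrsc, which is the behaviour the new predef tests presumably check through the predefined macros.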
2024-10-08 | Daily bump. | GCC Administrator | 11 | -1/+425
2024-10-07 | Move gfortran.dg/gomp/allocate-static.f90 to libgomp.fortran/ | Tobias Burnus | 1 | -28/+0
The testcase was turned into a 'dg-do run' check to check for the alignment, but this only works in testsuite/gfortran.dg, causing link errors for out-of-tree testing. The test was added in r15-4104-ga8caeaacf499d5. gcc/testsuite/: * gfortran.dg/gomp/allocate-static.f90: Move to libgomp/testsuite/. libgomp/: * testsuite/libgomp.fortran/allocate-static.f90: Moved from gcc/testsuite/ as it is a dg-do run test; use real omp_lib_kinds instead of local definition
2024-10-07 | libgomp.texi: Update and cleanup of Impl. Status of OpenMP TR13 | Tobias Burnus | 1 | -30/+40
libgomp/ChangeLog: * libgomp.texi (OpenMP Technical Report 13): Wording cleanup; sort as in Appendix B; add missing items; remove duplicates.
2024-10-07 | libcpp: Use constexpr for _cpp_trigraph_map initialization for C++14 | Jakub Jelinek | 2 | -2/+17
The _cpp_trigraph_map initialization used to be done for C99+ using designated initializers, but can't be done that way for C++, because array designators are only an extension there and don't allow skipping elements or going backwards. But we can get the same effect using a C++14 constexpr constructor. With the following patch we get rid of the runtime initialization and the array can be in .rodata. 2024-10-07 Jakub Jelinek <jakub@redhat.com> * internal.h (_cpp_trigraph_map_s): New type for C++14 or later. (_cpp_trigraph_map_d): New variable for C++14 or later. (_cpp_trigraph_map): Define to _cpp_trigraph_map_d.map for C++14 or later. * init.cc (init_trigraph_map): Define to nothing for C++14 or later. (TRIGRAPH_MAP, END, s): Define differently for C++14 or later.
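The general technique can be sketched as follows. This is a minimal illustration of filling a sparse lookup table in a C++14 constexpr constructor, using hypothetical names (`trigraph_table`, `table_d`); it is not the libcpp code, which goes through the TRIGRAPH_MAP macros mentioned in the ChangeLog.

```cpp
// Sketch only: a sparse 256-entry table filled at compile time.
// Requires C++14 or later (assignments inside a constexpr constructor).
struct trigraph_table {
  unsigned char map[256];

  constexpr trigraph_table() : map() {  // map() zero-fills every entry
    // Third character of each trigraph -> replacement character.
    map['='] = '#';   map[')'] = ']';  map['!'] = '|';
    map['('] = '[';   map['\''] = '^'; map['>'] = '}';
    map['/'] = '\\';  map['<'] = '{';  map['-'] = '~';
  }
};

// A constexpr object has constant initialization, so no startup code is
// needed and the compiler is free to place the data in read-only storage.
constexpr trigraph_table table_d{};

static_assert(table_d.map['='] == '#', "mapped entry");
static_assert(table_d.map['a'] == 0, "unmapped entries stay zero");
```

Unlike C99 designated initializers, the assignments can appear in any order and skip arbitrary indices, which is the property the message says C++ array designators lack.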
2024-10-07 | Implement MAXLOC and MINLOC for unsigned. | Thomas Koenig | 66 | -59/+30131
gcc/fortran/ChangeLog: * check.cc (gfc_check_minloc_maxloc): Handle BT_UNSIGNED. * trans-intrinsic.cc (gfc_conv_intrinsic_minmaxloc): Likewise. * gfortran.texi: Document MAXLOC and MINLOC for UNSIGNED. libgfortran/ChangeLog: * Makefile.am: Add files for unsigned MINLOC and MAXLOC. * Makefile.in: Regenerated. * gfortran.map: Add files for unsigned MINLOC and MAXLOC. * generated/maxloc0_16_m1.c: New file. * generated/maxloc0_16_m16.c: New file. * generated/maxloc0_16_m2.c: New file. * generated/maxloc0_16_m4.c: New file. * generated/maxloc0_16_m8.c: New file. * generated/maxloc0_4_m1.c: New file. * generated/maxloc0_4_m16.c: New file. * generated/maxloc0_4_m2.c: New file. * generated/maxloc0_4_m4.c: New file. * generated/maxloc0_4_m8.c: New file. * generated/maxloc0_8_m1.c: New file. * generated/maxloc0_8_m16.c: New file. * generated/maxloc0_8_m2.c: New file. * generated/maxloc0_8_m4.c: New file. * generated/maxloc0_8_m8.c: New file. * generated/maxloc1_16_m1.c: New file. * generated/maxloc1_16_m2.c: New file. * generated/maxloc1_16_m4.c: New file. * generated/maxloc1_16_m8.c: New file. * generated/maxloc1_4_m1.c: New file. * generated/maxloc1_4_m16.c: New file. * generated/maxloc1_4_m2.c: New file. * generated/maxloc1_4_m4.c: New file. * generated/maxloc1_4_m8.c: New file. * generated/maxloc1_8_m1.c: New file. * generated/maxloc1_8_m16.c: New file. * generated/maxloc1_8_m2.c: New file. * generated/maxloc1_8_m4.c: New file. * generated/maxloc1_8_m8.c: New file. * generated/minloc0_16_m1.c: New file. * generated/minloc0_16_m16.c: New file. * generated/minloc0_16_m2.c: New file. * generated/minloc0_16_m4.c: New file. * generated/minloc0_16_m8.c: New file. * generated/minloc0_4_m1.c: New file. * generated/minloc0_4_m16.c: New file. * generated/minloc0_4_m2.c: New file. * generated/minloc0_4_m4.c: New file. * generated/minloc0_4_m8.c: New file. * generated/minloc0_8_m1.c: New file. * generated/minloc0_8_m16.c: New file. * generated/minloc0_8_m2.c: New file. 
* generated/minloc0_8_m4.c: New file. * generated/minloc0_8_m8.c: New file. * generated/minloc1_16_m1.c: New file. * generated/minloc1_16_m16.c: New file. * generated/minloc1_16_m2.c: New file. * generated/minloc1_16_m4.c: New file. * generated/minloc1_16_m8.c: New file. * generated/minloc1_4_m1.c: New file. * generated/minloc1_4_m16.c: New file. * generated/minloc1_4_m2.c: New file. * generated/minloc1_4_m4.c: New file. * generated/minloc1_4_m8.c: New file. * generated/minloc1_8_m1.c: New file. * generated/minloc1_8_m16.c: New file. * generated/minloc1_8_m2.c: New file. * generated/minloc1_8_m4.c: New file. * generated/minloc1_8_m8.c: New file. gcc/testsuite/ChangeLog: * gfortran.dg/unsigned_35.f90: New test.
2024-10-07 | [RISC-V] Add splitters to restore condops generation after recent phiopt changes | Jeff Law | 7 | -21/+131
V2: Fix typo in ChangeLog. Remove now extraneous comment in cset-sext.c. Throttle back branch cost to 1 in various tests -- Andrew P's recent improvements to phiopt regressed on the riscv testsuite. Essentially the new code presented to the RTL optimizers is straightline code rather than branchy for the CE pass to analyze and optimize. In the absence of conditional move support or sfb, the new code would be better. Unfortunately the presented form isn't a great fit for xventanacondops, zicond or xtheadcondmov. The net is the resulting code is actually slightly worse than before. Essentially sne+czero turned into sne+sne+and. Thankfully, combine is presented with (and (ne (op1) (const_int 0)) (ne (op2) (const_int 0))) As the RHS of a set. We can use a 3->2 splitter to guide combine on how to profitably rewrite the sequence in a form suitable for condops. Just splitting that would be enough to fix the regression, but I'm fairly confident that other cases need to be handled and would have regressed had the testsuite been more thorough. One arm of the AND is going to turn into an sCC instruction. We have a variety of those that we support. The codes vary as do the allowed operands of the sCC. That produces a set of new splitters to handle those cases. The other arm is going to turn into a czero (or similar) instruction. That one can be generalized to eq/ne. So another set for that generalization. We can remove a couple of XFAILs in the rv32 space as it's behaving much more like rv64 at this point. For SFB targets it's unclear if the new code is better or worse. In both cases it's a 3 instruction sequence. So I just adjusted the test. If the new code is worse for SFB, someone with an understanding of the tradeoffs for an SFB target will need to make adjustments. Tested in my tester on rv64gcv and rv32gc. Will wait for the pre-commit testers to render their verdict before moving forward. gcc/ * config/riscv/iterators.md (scc_0): New code iterator. 
* config/riscv/zicond.md: New splitters to improve code generated for cases like (and (scc) (scc)) for zicond, xventanacondops, xtheadcondmov. gcc/testsuite/ * gcc.target/riscv/cset-sext-sfb.c: Turn off ssa-phiopt. * gcc.target/riscv/cset-sext-thead.c: Do not check CE output anymore. * gcc.target/riscv/cset-sext-ventana.c: Similarly. Adjust branch cost. * gcc.target/riscv/cset-sext-zicond.c: Similarly. * gcc.target/riscv/cset-sext.c: Similarly. No longer allow "neg" in asm output.
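The equivalence behind "sne+czero" versus "sne+sne+and" can be checked with a small model. The sketch below assumes the Zicond semantics of czero.eqz (the result is zero when the condition register is zero, otherwise the value register); the function names are illustrative and do not correspond to anything in the RISC-V backend.

```cpp
#include <cstdint>

// Model of czero.eqz rd, value, cond (assumed Zicond semantics).
constexpr std::int64_t czero_eqz(std::int64_t value, std::int64_t cond) {
  return cond == 0 ? 0 : value;
}

// The straight-line form described in the message: sne + sne + and.
constexpr std::int64_t sne_sne_and(std::int64_t a, std::int64_t b) {
  return (a != 0) & (b != 0);
}

// The condops form the splitters aim for: one sne feeding one czero.
constexpr std::int64_t sne_czero(std::int64_t a, std::int64_t b) {
  return czero_eqz(b != 0, a);
}

// Both forms agree on every zero/nonzero combination of the inputs.
static_assert(sne_sne_and(0, 0) == sne_czero(0, 0), "0,0");
static_assert(sne_sne_and(0, 7) == sne_czero(0, 7), "0,nz");
static_assert(sne_sne_and(5, 0) == sne_czero(5, 0), "nz,0");
static_assert(sne_sne_and(5, 7) == sne_czero(5, 7), "nz,nz");
```

One arm of the AND stays an sCC (here the `b != 0`), while the other arm is folded into the conditional-zero instruction, which saves one instruction on targets with condops.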
2024-10-07 | c: ICE in build_counted_by_ref [PR116735] | qing zhao | 2 | -14/+43
When handling the counted_by attribute, if the corresponding field doesn't exist, in addition to issuing an error, we should also remove the already added non-existing "counted_by" attribute from the field_decl. PR c/116735 gcc/c/ChangeLog: * c-decl.cc (verify_counted_by_attribute): Remove the attribute when error. gcc/testsuite/ChangeLog: * gcc.dg/flex-array-counted-by-9.c: New test.
2024-10-07 | c++: -Wmismatched-tags and modules | Jason Merrill | 1 | -1/+1
In Wmismatched-tags-6.C, we try to compare two declarations of the Cp alias template, and ICE trying to check whether they're in module purview. We need to check DECL_LANG_SPECIFIC like elsewhere in the compiler. gcc/cp/ChangeLog: * decl.cc (duplicate_decls): Only check PURVIEW_P if DECL_LANG_SPECIFIC.
2024-10-07 | c++: require_deduced_type and modules | Jason Merrill | 3 | -10/+11
With modules more variables have DECL_LANG_SPECIFIC, so we were failing to call require_deduced_type in constexpr-if30.C. gcc/cp/ChangeLog: * decl2.cc (mark_used): Always check require_deduced_type. gcc/testsuite/ChangeLog: * g++.dg/cpp0x/auto43.C: Adjust diagnostic. * g++.dg/cpp2a/lambda-generic7.C: Likewise.
2024-10-07 | c++: modules don't require preprocessor output | Jason Merrill | 7 | -39/+73
init_modules has rejected -M -fmodules-ts on the premise that module dependency analysis requires macro expansion, but this is no longer accurate; P1857 prohibited module directives produced by macro expansion. They can still be dependent on #if directives, but those are still handled with -fdirectives-only. What wasn't working was -M or -dM, because cpp_scan_nooutput never called module_token_pre to implement the import. The simplest fix is to use the -fdirectives-only scan when modules are enabled and teach directives_only_cb about flag_no_output. gcc/cp/ChangeLog: * module.cc (init_modules): Don't warn about -M. gcc/c-family/ChangeLog: * c-ppoutput.cc (preprocess_file): For modules, use directives-only scan even with flag_no_output. (directives_only_cb): Respect flag_no_output. gcc/ChangeLog: * doc/invoke.texi (C++ Module Preprocessing): Allow -M, refer to -fdeps. gcc/testsuite/ChangeLog: * g++.dg/modules/macro-8_a.H: New test. * g++.dg/modules/macro-8_b.C: New test. * g++.dg/modules/macro-8_c.C: New test. * g++.dg/modules/macro-8_d.C: New test.
2024-10-07 | arm: fix bootstrap issue with arm_noce_conversion_profitable_p patch [NFC] | Andre Vieira | 1 | -2/+2
This obvious patch fixes two warnings introduced with the implementation of the arm_noce_conversion_profitable_p hook. gcc/ChangeLog: * config/arm/arm.cc (arm_noce_conversion_profitable_p): Remove unused argument name. (arm_is_v81m_cond_insn): Initialize variable.
2024-10-07 | gcc: Remove executable permissions of testcases and *.md files | Jakub Jelinek | 15 | -0/+0
I've noticed some files were marked as executable, as can be seen with find . \( -name \*.[chSC] -o -name \*.md -o -name \*.cc \) -a -perm /111 | xargs ls -l This commit fixes that. 2024-10-07 Jakub Jelinek <jakub@redhat.com> gcc/ * config/riscv/vector-crypto.md: Remove executable permissions. gcc/testsuite/ * gcc.target/aarch64/uxtl-combine-1.c: Remove executable permissions. * gcc.target/aarch64/uxtl-combine-2.c: Likewise. * gcc.target/aarch64/uxtl-combine-3.c: Likewise. * gcc.target/aarch64/uxtl-combine-4.c: Likewise. * gcc.target/aarch64/uxtl-combine-5.c: Likewise. * gcc.target/aarch64/uxtl-combine-6.c: Likewise. * gcc.target/gcn/complex.c: Likewise. * gcc.target/i386/avx2-bf16-vec-absneg.c: Likewise. * gcc.target/i386/avx512f-bf16-vec-absneg.c: Likewise. * gcc.target/i386/pr104371-2.c: Likewise. * gcc.target/i386/pr115146.c: Likewise. * gcc.target/i386/vpermt2-special-bf16-shufflue.c: Likewise. * g++.target/i386/pr107563-a.C: Likewise. * g++.target/i386/pr107563-b.C: Likewise.
2024-10-07 | middle-end: reorder masking priority of math functions | Victor Do Nascimento | 2 | -9/+42
Given the categorization of math built-in functions as `ECF_CONST', when if-converting their uses, their calls are not masked and are thus called with an all-true predicate. This, however, is not appropriate where built-ins have library equivalents, wherein they may exhibit highly architecture-specific behaviors. For example, vectorized implementations may delegate the computation of values outside a certain acceptable numerical range to special (non-vectorized) routines which considerably slow down computation. As numerical simulation programs often perform bounds checks on input values prior to math calls, conditionally assigning default output values for out-of-bounds input and skipping the math call altogether, these fallback implementations should seldom be called in the execution of vectorized code. If, however, we don't apply any masking to these math functions, we end up effectively executing both if and else branches for these values, leading to considerable performance degradation on scientific workloads. We therefore invert the order of handling of math function calls in `if_convertible_stmt_p' to prioritize the handling of their library-provided implementations over the equivalent internal function. gcc/ChangeLog: * tree-if-conv.cc (if_convertible_stmt_p): Check for explicit function declaration before IFN fallback. gcc/testsuite/ChangeLog: * gcc.dg/vect/vect-fncall-mask-math.c: New.
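The kind of source loop this message is concerned with looks roughly like the sketch below: a bounds check guards the math call and out-of-range inputs get a default value. The function `guarded_exp` and its parameters are made up for illustration; the actual test case is gcc.dg/vect/vect-fncall-mask-math.c, whose contents are not reproduced here.

```cpp
#include <cmath>
#include <cstddef>

// Illustrative loop: only in-range inputs reach the math call.
void guarded_exp(const double* in, double* out, std::size_t n,
                 double lo, double hi, double fallback) {
  for (std::size_t i = 0; i < n; ++i) {
    if (in[i] >= lo && in[i] <= hi)
      out[i] = std::exp(in[i]);  // the call that should be masked
    else
      out[i] = fallback;         // out-of-range lanes skip the call
  }
}
```

If the if-converted, vectorized form calls the math routine with an all-true predicate, the out-of-range lanes are still evaluated and may take the slow special-case path of a vector math library, even though their results are then discarded in favour of `fallback`.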
2024-10-07 | vect: Add more dump messages for VLA SLP permutation [PR116583] | Richard Sandiford | 1 | -2/+14
Taking the !repeating_p route for VLA vectors causes analysis to fail, but it wasn't clear from the dump files when this had happened, and which node caused it. gcc/ PR tree-optimization/116583 * tree-vect-slp.cc (vectorizable_slp_permutation_1): Add more dump messages.
2024-10-07 | vect: Support more VLA SLP permutations [PR116583] | Richard Sandiford | 3 | -29/+82
This is the main patch for PR116583. Previously, we only supported VLA SLP permutations for which the output and inputs have the same number of lanes, and for which that number of lanes divides the number of vector elements. The patch extends this to handle: (1) "packs" of a single 2N-vector input into an N-vector output (2) "unpacks" of N-vector inputs into an XN-vector output Hopefully the comments in the code explain the approach. The contents of the: for (unsigned i = 0; i < ncopies; ++i) loop do not change; the patch simply adds an outer loop around it. The patch removes the XFAIL in slp-13.c and also improves the SVE vect.exp results with vect-force-slp=1. I haven't added new tests specifically for this, since presumably the existing ones will cover it once the SLP switch is flipped. gcc/ PR tree-optimization/116583 * tree-vect-slp.cc (vectorizable_slp_permutation_1): Handle variable-length pack and unpack permutations. gcc/testsuite/ PR tree-optimization/116583 * gcc.dg/vect/slp-13.c: Remove xfail for vect_variable_length. * gcc.dg/vect/slp-13-big-array.c: Likewise.
2024-10-07 | vect: Restructure repeating_p case for SLP permutations [PR116583] | Richard Sandiford | 1 | -34/+41
The repeating_p case previously handled the specific situation in which the inputs have N lanes and the output has N lanes, where N divides the number of vector elements. In that case, every output uses the same permute vector. The code was therefore structured so that the outer loop only constructed one permute vector, with an inner loop generating as many VEC_PERM_EXPRs from it as required. However, the main patch for PR116583 adds support for cycling through N permute vectors, rather than just having one. The current structure doesn't really handle that case well. (We'd need to interleave the results after generating them, which sounds a bit fragile.) This patch instead makes the transform phase calculate each output vector's permutation explicitly, like for the !repeating_p path. As a bonus, it gets rid of one use of SLP_TREE_NUMBER_OF_VEC_STMTS. This arguably undermines one of the justifications for using repeating_p for constant-length vectors: that the repeating_p path involved less work than the !repeating_p path. That justification does still hold for the analysis phase, though, and that should be the more time-sensitive part. And the other justification -- to get more coverage of the code -- still applies. So I'd prefer that we continue to use repeating_p for constant-length vectors unless that causes a known missed optimisation. gcc/ PR tree-optimization/116583 * tree-vect-slp.cc (vectorizable_slp_permutation_1): Remove the noutputs_per_mask inner loop and instead generate a separate permute vector for each output.
2024-10-07 | vect: Variable lane indices in vectorizable_slp_permutation_1 [PR116583] | Richard Sandiford | 1 | -7/+9
The main patch for PR116583 needs to create variable indices into an input vector. This pre-patch changes the types to allow that. There is no pretty-print format for poly_uint64 because of issues with passing C++ objects through "...". gcc/ PR tree-optimization/116583 * tree-vect-slp.cc (vectorizable_slp_permutation_1): Using poly_uint64 for scalar lane indices.
2024-10-07 | aarch64: Fix general permutes of svbfloat16_ts | Richard Sandiford | 3 | -18/+27
Testing gcc.target/aarch64/sve/permute_2.c without the associated GCC patch triggered an unrecognisable insn ICE for the svbfloat16_t tests. This was because the implementation of general two-vector permutes requires two TBLs and an ORR, with the ORR being represented as an unspec for floating-point modes. The associated pattern did not cover VNx8BF. gcc/ * config/aarch64/iterators.md (SVE_I): Move further up file. (SVE_F): New mode iterator. (SVE_ALL): Redefine in terms of SVE_I and SVE_F. * config/aarch64/aarch64-sve.md (*<LOGICALF:optab><mode>3): Extend to all SVE_F. gcc/testsuite/ * gcc.target/aarch64/sve/permute_5.c: New test.
2024-10-07 | aarch64: Handle SVE modes in aarch64_evpc_reencode [PR116583] | Richard Sandiford | 5 | -9/+633
For Advanced SIMD modes, aarch64_evpc_reencode tests whether a permute in a narrow element mode can be done more cheaply in a wider mode. For example, { 0, 1, 8, 9, 4, 5, 12, 13 } on V8HI is a natural TRN1 on V4SI ({ 0, 4, 2, 6 }). This patch extends the code to handle SVE data and predicate modes as well. This is a prerequisite to getting good results for PR116583. gcc/ PR target/116583 * config/aarch64/aarch64.cc (aarch64_coalesce_units): New function, extending the Advanced SIMD handling from... (aarch64_evpc_reencode): ...here to SVE data and predicate modes. gcc/testsuite/ PR target/116583 * gcc.target/aarch64/sve/permute_1.c: New test. * gcc.target/aarch64/sve/permute_2.c: Likewise. * gcc.target/aarch64/sve/permute_3.c: Likewise. * gcc.target/aarch64/sve/permute_4.c: Likewise.
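The V8HI to V4SI example in the message can be reproduced with a short sketch of the index arithmetic. The helper `coalesce_pairs` is hypothetical and handles only one doubling step on fixed-length index lists; it is not the aarch64_coalesce_units implementation.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: try to express a permute on narrow elements as a
// permute on elements twice as wide. This works only if every adjacent
// pair of narrow indices selects both halves of one wide element, in order.
bool coalesce_pairs(const std::vector<unsigned>& narrow,
                    std::vector<unsigned>& wide) {
  if (narrow.size() % 2 != 0)
    return false;
  std::vector<unsigned> result;
  for (std::size_t i = 0; i < narrow.size(); i += 2) {
    unsigned first = narrow[i];
    // The pair must start on an even index and be consecutive.
    if (first % 2 != 0 || narrow[i + 1] != first + 1)
      return false;
    result.push_back(first / 2);
  }
  wide = result;
  return true;
}
```

Applied to { 0, 1, 8, 9, 4, 5, 12, 13 } this produces { 0, 4, 2, 6 }, the TRN1 pattern named in the message. A selector such as { 1, 2, 8, 9, 4, 5, 12, 13 } is rejected because its first pair straddles two wide elements.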
2024-10-07 | testsuite: Unset torture_current_flags after use | Richard Sandiford | 1 | -0/+1
Before running a test with specific torture options, gcc-dg-runtest sets the global variable torture_current_flags to the set of torture options that will be used. However, it never unset the variable afterwards, which meant that the last options would hang around and potentially confuse later non-torture tests. I saw this with a follow-on patch to check-function-bodies, but it's probably possible to construct artificial test combinations that expose it with check-function-bodies's existing flag filtering. gcc/testsuite/ * lib/gcc-dg.exp (gcc-dg-runtest): Unset torture_current_flags after each test.
2024-10-07 | tree-optimization/116990 - missed control flow check in vect_analyze_loop_form | Richard Biener | 1 | -3/+2
The following fixes checking for unsupported control flow in vectorization to also cover the outer loop body. PR tree-optimization/116990 * tree-vect-loop.cc (vect_analyze_loop_form): Check the current loop body for control flow.
2024-10-07 | tree-optimization/116982 - analyze scalar loop exit early | Richard Biener | 4 | -9/+27
The following makes sure to discover the scalar loop IV exit during analysis as failure to do so (if DCE and friends are disabled this can happen due to if-conversion doing DCE and FRE on the if-converted loop) would ICE later. I refrained from larger refactoring to be able to eventually backport. PR tree-optimization/116982 * tree-vectorizer.h (vect_analyze_loop): Pass in .LOOP_VECTORIZED call. (vect_analyze_loop_form): Likewise. * tree-vect-loop.cc (vect_analyze_loop_form): Reject loops where we cannot determine an IV exit for the scalar loop. (vect_analyze_loop): Adjust. * tree-vectorizer.cc (try_vectorize_loop_1): Likewise. * tree-parloops.cc (gather_scalar_reductions): Likewise.
2024-10-07 | testsuite: Prevent unrolling of main in LTO test [PR116683] | Alex Coplan | 1 | -0/+1
In r15-3585-g9759f6299d9633cabac540e5c893341c708093ac I added a test which started failing on PowerPC. The test checks that we unroll exactly one loop three times with the following: // { dg-final { scan-ltrans-rtl-dump-times "Unrolled loop 3 times" 1 "loop2_unroll" } } which passes on most targets. However, on PowerPC, the loop in main gets unrolled too, causing the scan-ltrans-rtl-dump-times check to fail as the statement now appears twice in the dump. I think the extra unrolling is due to different unrolling heuristics in the rs6000 port. This patch therefore explicitly tries to block the unrolling in main with an appropriate #pragma. gcc/testsuite/ChangeLog: PR testsuite/116683 * g++.dg/ext/pragma-unroll-lambda-lto.C (main): Add #pragma to prevent unrolling of the setup loop.
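For readers unfamiliar with the mechanism, a loop can be excluded from unrolling with GCC's loop pragma, as in the sketch below. The exact pragma added to pragma-unroll-lambda-lto.C is not quoted in the message, so the use of `#pragma GCC unroll 0` here is an assumption, and `fill_and_sum` is a made-up function.

```cpp
// Fills data[0..n) with 0, 1, ..., n-1 and returns the sum of the values.
int fill_and_sum(int* data, int n) {
  // Setup loop: an unroll factor of 0 (or 1) asks GCC not to unroll it,
  // so it cannot add a second "Unrolled loop" line to the RTL dump.
#pragma GCC unroll 0
  for (int i = 0; i < n; ++i)
    data[i] = i;

  int sum = 0;
  for (int i = 0; i < n; ++i)
    sum += data[i];
  return sum;
}
```

Pinning down the setup loop this way keeps the dump scan dependent only on the loop under test, independent of each port's unrolling heuristics.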
2024-10-07ssa-math-opts, i386: Improve spaceship expansion [PR116896]Jakub Jelinek9-41/+373
The PR notes that we don't emit optimal code for the C++ spaceship operator if the result is returned as an integer rather than the result just being compared against different values and different code executed based on that. So e.g. for template <typename T> auto foo (T x, T y) { return x <=> y; } for floating point types, signed integer types and unsigned integer types. auto in that case is std::strong_ordering or std::partial_ordering, which are fancy C++ abstractions around a struct with a signed char member which is -1, 0, 1 for the strong ordering and -1, 0, 1, 2 for the partial ordering (but for -ffast-math 2 is never the case). I'm afraid functions like that are fairly common and unless they are inlined, we really need to map the comparison to those -1, 0, 1 or -1, 0, 1, 2 values. Now, for floating point spaceship I've in the past already added an optimization (with tree-ssa-math-opts.cc discovery and named optab, the optab only defined on x86 though right now), which ensures there is just a single comparison instruction and then just tests based on flags. Now, if we have code like: auto a = x <=> y; if (a == std::partial_ordering::less) bar (); else if (a == std::partial_ordering::greater) baz (); else if (a == std::partial_ordering::equivalent) qux (); else if (a == std::partial_ordering::unordered) corge (); etc., that results in decent code generation, the spaceship named pattern on x86 optimizes for the jumps, so emits comparisons on the flags, followed by setting the result to -1, 0, 1, 2 and subsequent jump pass optimizes that well. 
But if the result needs to be stored into an integer and just returned that way or there are no immediate jumps based on it (or turned into some non-standard integer values like -42, 0, 36, 75 etc.), then CE doesn't do a good job for that, we end up with say comiss %xmm1, %xmm0 jp .L4 seta %al movl $0, %edx leal -1(%rax,%rax), %eax cmove %edx, %eax ret .L4: movl $2, %eax ret The jp is good, that is the unlikely case and can't be easily handled in straight line code due to the layout of the flags, but the rest uses cmov, which often isn't a win, and some odd arithmetic. With the patch below we can get instead xorl %eax, %eax comiss %xmm1, %xmm0 jp .L2 seta %al sbbl $0, %eax ret .L2: movl $2, %eax ret The patch changes the discovery in the generic code, by detecting if the future .SPACESHIP result is just used in a PHI with -1, 0, 1 or -1, 0, 1, 2 values (the latter for HONOR_NANS) and passes that as a flag in a new argument to .SPACESHIP ifn, so that the named pattern is told whether it should optimize for branches or for loading the result into a -1, 0, 1 (, 2) integer. Additionally, it doesn't detect just floating point <=> anymore, but also integer and unsigned integer, but in those cases only if an integer -1, 0, 1 is wanted (otherwise == and > or similar comparisons result in good code). The backend then can for those integer or unsigned integer <=>s return effectively (x > y) - (x < y) in a way that is efficient on the target (so for x86 with ensuring zero initialization first when needed before setcc; one for floating point and unsigned, where there is just one setcc and the second one optimized into sbb instruction, two for the signed int case). So e.g. for signed int we now emit xorl %edx, %edx xorl %eax, %eax cmpl %esi, %edi setl %dl setg %al subl %edx, %eax ret and for unsigned xorl %eax, %eax cmpl %esi, %edi seta %al sbbb $0, %al ret Note, I wonder if other targets wouldn't benefit from defining the named optab too... 
2024-10-07 Jakub Jelinek <jakub@redhat.com> PR middle-end/116896 * optabs.def (spaceship_optab): Use spaceship$a4 rather than spaceship$a3. * internal-fn.cc (expand_SPACESHIP): Expect 3 call arguments rather than 2, expand the last one, expect 4 operands of spaceship_optab. * tree-ssa-math-opts.cc: Include cfghooks.h. (optimize_spaceship): Check if a single PHI is initialized to -1, 0, 1, 2 or -1, 0, 1 values, in that case pass 1 as last (new) argument to .SPACESHIP and optimize away the comparisons, otherwise pass 0. Also check for integer comparisons rather than floating point, in that case do it only if there is a single PHI with -1, 0, 1 values and pass 1 to last argument of .SPACESHIP if the <=> is signed, 2 if unsigned. * config/i386/i386-protos.h (ix86_expand_fp_spaceship): Add another rtx argument. (ix86_expand_int_spaceship): Declare. * config/i386/i386-expand.cc (ix86_expand_fp_spaceship): Add arg3 argument, if it is const0_rtx, expand like before, otherwise emit optimized sequence for setting the result into a GPR. (ix86_expand_int_spaceship): New function. * config/i386/i386.md (UNSPEC_SETCC_SI_SLP): New UNSPEC code. (setcc_si_slp): New define_expand. (*setcc_si_slp): New define_insn_and_split. (setcc + setcc + movzbl): New define_peephole2. (spaceship<mode>3): Renamed to ... (spaceship<mode>4): ... this. Add an extra operand, pass it to ix86_expand_fp_spaceship. (spaceshipxf3): Renamed to ... (spaceshipxf4): ... this. Add an extra operand, pass it to ix86_expand_fp_spaceship. (spaceship<mode>4): New define_expand for SWI modes. * doc/md.texi (spaceship@var{m}3): Renamed to ... (spaceship@var{m}4): ... this. Document the meaning of last operand. * g++.target/i386/pr116896-1.C: New test. * g++.target/i386/pr116896-2.C: New test.
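The value mapping that the spaceship commit above describes can be sketched in plain C++ without `<=>`. This is only a model of the -1, 0, 1 (, 2) results, not the compiler's implementation. The helper names `three_way` and `partial_three_way` are hypothetical.

```cpp
#include <cmath>

// Integer three-way comparison written as (x > y) - (x < y).
// Each comparison yields a bool that promotes to 0 or 1, so the
// difference is -1, 0 or 1 for both signed and unsigned operands.
template <typename T>
constexpr int three_way(T x, T y) {
    return (x > y) - (x < y);
}

// Floating-point variant: returns 2 when the operands are unordered
// (at least one NaN), following the -1, 0, 1, 2 encoding that the
// commit message gives for the partial ordering.
inline int partial_three_way(double x, double y) {
    if (std::isnan(x) || std::isnan(y))
        return 2;
    return (x > y) - (x < y);
}
```

The unordered case is tested first because both `x > y` and `x < y` are false when a NaN is involved, which would otherwise be indistinguishable from equality.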
2024-10-07OpenMP: Allocate directive for static vars, clean upTobias Burnus17-114/+469
For the 'allocate' directive, remove the sorry for static variables and just keep using normal memory, but honor the requested alignment and set a DECL_ATTRIBUTE in case a target may want to make use of this later on. The documentation is updated accordingly. The C diagnostic to check for predefined allocators (req. for static vars) failed to accept GCC's ompx_gnu_... allocator, now fixed. (Fortran was already okay; but both now use new common #defined value for checking.) And while Fortran common block variables are still rejected, the check has been improved as before the sorry diagnostic did not work for common blocks in modules. Finally, for 'allocate' clause on the target/task/taskloop directives, there is now a warning for omp_thread_mem_alloc (i.e. predefined allocator with access = thread), which is undefined behavior according to the OpenMP specification. And, last, testing showed that var decl + static_assert sets TREE_USED but does not produce a statement list in C, which did run into an assert in gimplify. This special case is now also handled. gcc/c/ChangeLog: * c-parser.cc (c_parser_omp_allocate): Set alignment for alignof; accept static variables and fix predef allocator check. gcc/fortran/ChangeLog: * openmp.cc (is_predefined_allocator): Use gomp-constants.h consts. * trans-common.cc (translate_common): Reject OpenMP allocate directives. * trans-decl.cc (gfc_finish_var_decl): Handle allocate directive for static variables. (gfc_trans_deferred_vars): Update for the latter. gcc/ChangeLog: * gimplify.cc (gimplify_bind_expr): Fix corner case for OpenMP allocate directive. (gimplify_scan_omp_clauses): Warn if omp_thread_mem_alloc is used as allocator with the target/task/taskloop directive. include/ChangeLog: * gomp-constants.h (GOMP_OMP_PREDEF_ALLOC_MAX, GOMP_OMPX_PREDEF_ALLOC_MIN, GOMP_OMPX_PREDEF_ALLOC_MAX, GOMP_OMP_PREDEF_ALLOC_THREADS): New defines. 
libgomp/ChangeLog: * allocator.c: Add static asserts for new GOMP_OMP{,X}_PREDEF_ALLOC_{MIN,MAX} range values. * libgomp.texi (OpenMP Impl. Status): Allocate directive for static vars is now supported. Refer to PR for allocate clause. (Memory allocation): Update for static vars; minor word tweaking. gcc/testsuite/ChangeLog: * c-c++-common/gomp/allocate-9.c: Update for removed sorry. * gfortran.dg/gomp/allocate-15.f90: Likewise. * gfortran.dg/gomp/allocate-pinned-1.f90: Likewise. * gfortran.dg/gomp/allocate-4.f90: Likewise; add dg-error for previously missing diagnostic. * c-c++-common/gomp/allocate-18.c: New test. * c-c++-common/gomp/allocate-19.c: New test. * gfortran.dg/gomp/allocate-clause.f90: New test. * gfortran.dg/gomp/allocate-static-2.f90: New test. * gfortran.dg/gomp/allocate-static.f90: New test.
2024-10-07Handle non-grouped stores as single-lane SLP: adjust 'gcc.dg/vect/slp-26.c', GCNThomas Schwinge1-2/+2
As of commit d34cda720988674bcf8a24267c9e1ec61335d6de "Handle non-grouped stores as single-lane SLP", we see for '--target=amdgcn-amdhsa' (tested '-march=gfx908', '-march=gfx1100'): PASS: gcc.dg/vect/slp-26.c (test for excess errors) PASS: gcc.dg/vect/slp-26.c execution test PASS: gcc.dg/vect/slp-26.c scan-tree-dump-times vect "vectorized 1 loops" 1 [-PASS:-]{+FAIL:+} gcc.dg/vect/slp-26.c scan-tree-dump-times vect "vectorizing stmts using SLP" 1 gcc.dg/vect/slp-26.c: pattern found 2 times Apply the same change to 'amdgcn-*-*' as done for 'riscv_v'. gcc/testsuite/ * gcc.dg/vect/slp-26.c: Adjust GCN.
2024-10-07nvptx: Re-enable 'gcc.misc-tests/options.exp'Thomas Schwinge1-7/+3
..., just conditionalize its profiling test (as done elsewhere). The re-enabled test cases all PASS. For the record, for example for GCN target, this causes: Running [...]/gcc/testsuite/gcc.misc-tests/options.exp ... -PASS: compiler driver --coverage option(s) PASS: compiler driver -fdump-ipa-all-address option(s) PASS: compiler driver -fdump-ipa-all-alias option(s) PASS: compiler driver -fdump-ipa-all-all option(s) That was: Running [...]/gcc/testsuite/gcc.misc-tests/options.exp ... Executing on host: [xgcc] [...] --coverage [...] [...] ld: error: undefined symbol: __gcov_exit >>> referenced by /tmp/ccRGdqjA.o:(_sub_D_00100_1) >>> referenced by /tmp/ccRGdqjA.o:(_sub_D_00100_1) collect2: error: ld returned 1 exit status compiler exited with status 1 output is: [...] PASS: compiler driver --coverage option(s) ..., so that's nothing to worry about. gcc/testsuite/ * gcc.misc-tests/options.exp: Re-enable for nvptx.
2024-10-07nvptx: Re-enable all variants of 'c-c++-common/torture/complex-sign-mixed-add.c', 'c-c++-common/torture/complex-sign-mixed-sub.c'Thomas Schwinge2-2/+0
PASS with: $ ptxas --version ptxas: NVIDIA (R) Ptx optimizing assembler Copyright (c) 2005-2018 NVIDIA Corporation Built on Sun_Sep__9_21:06:46_CDT_2018 Cuda compilation tools, release 10.0, V10.0.145 ..., and execution with 'Driver Version: 361.93.02'. gcc/testsuite/ * c-c++-common/torture/complex-sign-mixed-add.c: Re-enable all variants for nvptx. * c-c++-common/torture/complex-sign-mixed-sub.c: Likewise.
2024-10-07nvptx: Re-enable 'gcc.dg/special/weak-2.c'Thomas Schwinge1-4/+0
PASSes with: $ ptxas --version ptxas: NVIDIA (R) Ptx optimizing assembler Copyright (c) 2005-2018 NVIDIA Corporation Built on Sun_Sep__9_21:06:46_CDT_2018 Cuda compilation tools, release 10.0, V10.0.145 ..., and execution with 'Driver Version: 361.93.02'. gcc/testsuite/ * gcc.dg/special/weak-2.c: Re-enable for nvptx.
2024-10-07nvptx: Re-enable all variants of 'gcc.c-torture/execute/20020529-1.c'Thomas Schwinge1-4/+0
Generally PASSes with: $ ptxas --version ptxas: NVIDIA (R) Ptx optimizing assembler Copyright (c) 2005-2018 NVIDIA Corporation Built on Sun_Sep__9_21:06:46_CDT_2018 Cuda compilation tools, release 10.0, V10.0.145 ..., and execution with 'Driver Version: 361.93.02'. Only the '-O1' execution test FAILs (pre-existing; to be analyzed later): nvptx-run: error getting kernel result: an illegal memory access was encountered (CUDA_ERROR_ILLEGAL_ADDRESS, 700) gcc/testsuite/ * gcc.c-torture/execute/20020529-1.c: Re-enable all variants for nvptx.
2024-10-07nvptx: Disable effective-target 'freestanding'Thomas Schwinge5-3/+4
After 2014's commit 157e859ffe3b5d43db1e19475711c1a3d21ab57a "remove picochip", the effective-target 'freestanding' (later) was only ever used for nvptx. However, the relevant I/O library functions have long been implemented in nvptx newlib. These test cases generally PASS, just a few need to get XFAILed; see <https://docs.nvidia.com/cuda/ptx-writers-guide-to-interoperability/#system-calls>, and then supposedly <https://docs.nvidia.com/cuda/cuda-c-programming-guide/#formatted-output> for description of the non-standard PTX 'vprintf' return value: > Unlike the C-standard 'printf()', which returns the number of characters > printed, CUDA's 'printf()' returns the number of arguments parsed. If no > arguments follow the format string, 0 is returned. If the format string is > NULL, -1 is returned. If an internal error occurs, -2 is returned. (I've tried a few variants to confirm that PTX 'vprintf' -- which supposedly is underlying the CUDA 'printf' -- is what's implementing this behavior.) Probably, we ought to fix that up in nvptx newlib. gcc/testsuite/ * gcc.c-torture/execute/printf-1.c: XFAIL for nvptx. * gcc.c-torture/execute/printf-chk-1.c: Likewise. * gcc.c-torture/execute/vprintf-1.c: Likewise. * gcc.c-torture/execute/vprintf-chk-1.c: Likewise. * lib/target-supports.exp (check_effective_target_freestanding): Disable for nvptx.
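The return-value mismatch quoted above can be made concrete with a small host-side sketch. The first helper uses the standard `snprintf` to obtain the character count that C-standard `printf` would return. The second, `cuda_style_result`, is a hypothetical model of the quoted CUDA convention and does not call any CUDA or PTX API; the -2 internal-error case is not modeled.

```cpp
#include <cstdio>

// Standard behaviour: the printf family returns the number of
// characters produced. snprintf with a null buffer and size 0 only
// computes that length without writing anything.
inline int chars_for_two_ints(int a, int b) {
    return std::snprintf(nullptr, 0, "%d %d", a, b);
}

// Hypothetical model of the convention quoted above: the result is the
// number of arguments parsed, or -1 if the format string is NULL.
inline int cuda_style_result(const char *format, int args_parsed) {
    if (format == nullptr)
        return -1;
    return args_parsed;
}
```

A test that checks the return value of `printf("%d %d", 10, 20)` expects 5 under the C standard, while the quoted convention would give 2, which would explain why tests of this kind need an XFAIL on nvptx.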
2024-10-07nvptx: Re-enable "ptxas times out" test casesThomas Schwinge8-10/+5
These are all quick to compile and generally PASS with: $ ptxas --version ptxas: NVIDIA (R) Ptx optimizing assembler Copyright (c) 2005-2018 NVIDIA Corporation Built on Sun_Sep__9_21:06:46_CDT_2018 Cuda compilation tools, release 10.0, V10.0.145 Only 'gcc.c-torture/compile/limits-fndefn.c' at '-O0' still has an issue, as indicated. Working around that with '-Wa,--no-verify', for now. gcc/testsuite/ * gcc.c-torture/compile/920501-4.c: Re-enable nvptx "ptxas times out" variants. * gcc.c-torture/compile/921011-1.c: Likewise. * gcc.c-torture/compile/pr34334.c: Likewise. * gcc.c-torture/compile/pr37056.c: Likewise. * gcc.c-torture/compile/pr39423-1.c: Likewise. * gcc.c-torture/compile/pr49049.c: Likewise. * gcc.c-torture/compile/pr59417.c: Likewise. * gcc.c-torture/compile/limits-fndefn.c: Likewise. Specify '-Wa,--no-verify' for nvptx '-O0'.
2024-10-07nvptx: Re-enable 'gcc.c-torture/compile/20080721-1.c'Thomas Schwinge1-1/+0
PASSes with: $ ptxas --version ptxas: NVIDIA (R) Ptx optimizing assembler Copyright (c) 2005-2018 NVIDIA Corporation Built on Sun_Sep__9_21:06:46_CDT_2018 Cuda compilation tools, release 10.0, V10.0.145 gcc/testsuite/ * gcc.c-torture/compile/20080721-1.c: Re-enable for nvptx.
2024-10-07Daily bump.GCC Administrator3-1/+29
2024-10-06testsuite: Require lto in three testsJohn David Anglin3-0/+3
2024-10-06 John David Anglin <danglin@gcc.gnu.org> gcc/testsuite/ChangeLog: * g++.dg/cpp0x/noexcept87.C: Require lto. * g++.dg/ext/pragma-unroll-lambda-lto.C: Likewise. * gcc.dg/enum-alias-3.c: Likewise.
2024-10-06hppa: Use stack slot SP-40 to copy between integer and floating-point registersJohn David Anglin2-26/+34
2024-10-06 John David Anglin <danglin@gcc.gnu.org> gcc/ChangeLog: * config/pa/pa-64.h (PA_SECONDARY_MEMORY_NEEDED): Define to false. Update comment. * config/pa/pa.md: Modify 64-bit move patterns to support copying between integer and floating-point registers using stack slot SP-40.
2024-10-06Add single-lane SLP support to .GOMP_SIMD_LANE vectorizationRichard Biener2-2/+35
The following adds basic support for single-lane SLP .GOMP_SIMD_LANE vectorization, in particular it enables SLP discovery. * tree-vect-slp.cc (no_arg_map): New. (vect_get_operand_map): Handle IFN_GOMP_SIMD_LANE. (vect_build_slp_tree_1): Likewise. * tree-vect-stmts.cc (vectorizable_call): Handle single-lane SLP for .GOMP_SIMD_LANE calls.