path: root/gcc
AgeCommit message (Collapse)AuthorFilesLines
12 daysada: Missing component clause warning for discriminant of Unchecked_Union typeSteve Baird1-2/+41
Even when -gnatw.c is enabled, no warning about a missing component clause should be generated if the placement of a discriminant of an Unchecked_Union type is left unspecified in a record representation clause (such a discriminant occupies no storage). In determining whether to generate such a warning, in some cases the compiler would incorrectly ignore an Unchecked_Union pragma occurring after the record representation clause. This could result in a spurious warning. gcc/ada/ChangeLog: * sem_ch13.adb (Analyze_Record_Representation_Clause): In deciding whether to generate a warning about a missing component clause, in addition to calling Is_Unchecked_Union also call a new local function, Unchecked_Union_Pragma_Pending, which checks for the case of a not-yet-analyzed Unchecked_Union pragma occurring later in the declaration list.
12 daysada: Improved error message when size of descendant type exceeds Size'Class limitSteve Baird1-20/+40
Improve the error message that is generated when the size of tagged type exceeds a Size'Class limit specified for an ancestor type. gcc/ada/ChangeLog: * mutably_tagged.adb (Make_CW_Size_Compile_Check): Include the value of the Size'Class limit in the message generated via a Compile_Time_Error pragma.
12 daysada: Remove leftover from rework of aspect representationRonan Desplanques1-11/+3
This patch removes some comments and object definitions that referred to a hacky use of the Entity field that had been removed by the latest rework of the internal representation of aspects. gcc/ada/ChangeLog: * sem_ch13.adb (Check_Aspect_At_Freeze_Point): Remove obsolete bits.
12 daysada: Fix error on Designated_Storage_Model with extensions disabledRonan Desplanques1-0/+1
The format string used for the error in that case requires setting the Error_Msg_Name_1 global variable. This was not done so this patch adds the missing assignment. gcc/ada/ChangeLog: * sem_ch13.adb (Analyze_Aspect_Specifications): Fix error emission.
12 daysRegenerate common.opt.urls and add period into common.optJan Hubicka2-1/+4
gcc/ChangeLog: * common.opt: Add period. * common.opt.urls: Regenerate.
12 daystree-optimization/120927 - 510.parest_r segfault with masked epilogRichard Biener3-4/+60
The following fixes bad alignment computation for epilog vectorization when as in this case for 510.parest_r and masked epilog vectorization with AVX512 we end up choosing AVX to vectorize the main loop and masked AVX512 (sic!) to vectorize the epilog. In that case alignment analysis for the epilog tries to force alignment of the base to 64, but that cannot possibly help the epilog when the main loop had used a vector mode with smaller alignment requirement. There's another issue, that the check whether the step preserves alignment needs to consider possibly previously involved VFs (here, the main loop's smaller VF) as well. These might not be the only case with problems for such a mode mix but at least there it seems wise to never use DR alignment forcing when analyzing an epilog. We get to choose this mode setup because the iteration over epilog modes doesn't prevent this, the maybe_ge (cached_vf_per_mode[0], first_vinfo_vf) skip is conditional on !supports_partial_vectors and it is also conditional on having a cached VF. Further nothing in vect_analyze_loop_1 rejects this setup - it might be conceivable that a target can do masking only for larger modes. There is a second reason we end up with this mode setup, which is that vect_need_peeling_or_partial_vectors_p says we do not need peeling or partial vectors when analyzing the main loop with AVX512 (if it would say so we'd have chosen a masked AVX512 epilog-only vectorization). It does that because it looks at LOOP_VINFO_COST_MODEL_THRESHOLD (which is not yet computed, so always zero at this point), and compares max_niter (5) against the VF (8), but not with equality as the comment says but with greater. This also needs looking at, PR120939. PR tree-optimization/120927 * tree-vect-data-refs.cc (vect_compute_data_ref_alignment): Do not force a DR's base alignment when analyzing an epilog loop. Check whether the step preserves alignment for all VFs possibly involved so far. * gcc.dg/vect/vect-pr120927.c: New testcase.
* gcc.dg/vect/vect-pr120927-2.c: Likewise.
12 daysc-family: Tweak ptr +- (expr +- cst) FE optimization [PR120837]Jakub Jelinek2-14/+67
The following testcase is miscompiled with -fsanitize=undefined but we introduce UB into the IL even without that flag. The optimization ptr +- (expr +- cst) when expr/cst have undefined overflow into (ptr +- cst) +- expr is sometimes simply not valid, without careful analysis on what ptr points to we don't know if it is valid to do (ptr +- cst) pointer arithmetics. E.g. on the testcase, ptr points to start of an array (actually conditionally one or another) and cst is -1, so ptr - 1 is invalid pointer arithmetics, while ptr + (expr - 1) can be valid if expr is at runtime always > 1 and smaller than size of the array ptr points to + 1. Unfortunately, removing this 1992-ish optimization altogether causes FAIL: c-c++-common/restrict-2.c -Wc++-compat scan-tree-dump-times lim2 "Moving statement" 11 FAIL: gcc.dg/tree-ssa/copy-headers-5.c scan-tree-dump ch2 "is now do-while loop" FAIL: gcc.dg/tree-ssa/copy-headers-5.c scan-tree-dump-times ch2 " if " 3 FAIL: gcc.dg/vect/pr57558-2.c scan-tree-dump vect "vectorized 1 loops" FAIL: gcc.dg/vect/pr57558-2.c -flto -ffat-lto-objects scan-tree-dump vect "vectorized 1 loops" regressions (restrict-2.c also for C++ in all std modes). I've been thinking about some match.pd optimization for signed integer addition/subtraction of constant followed by widening integral conversion followed by multiplication or left shift, but that wouldn't help 32-bit arches. So, instead at least for now, the following patch keeps doing the optimization, just doesn't perform it in pointer arithmetics. 
pointer_int_sum itself actually adds the multiplication by size_exp, so ptr + expr is turned into ptr p+ expr * size_exp, so this patch will try to optimize ptr + (expr +- cst) into ptr p+ ((sizetype)expr * size_exp +- (sizetype)cst * size_exp) and ptr - (expr +- cst) into ptr p+ -((sizetype)expr * size_exp +- (sizetype)cst * size_exp) 2025-07-04 Jakub Jelinek <jakub@redhat.com> PR c/120837 * c-common.cc (pointer_int_sum): Rewrite the intop PLUS_EXPR or MINUS_EXPR optimization into extension of both intop operands, their separate multiplication and then addition/subtraction followed by rest of pointer_int_sum handling after the multiplication. * gcc.dg/ubsan/pr120837.c: New test.
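The hazard described in this commit message can be illustrated with a small C sketch. It is not the PR testcase; the function names element_before and element_before_bytes are hypothetical and chosen only for illustration. The first function uses the source-level form ptr + (expr - 1), which is valid whenever 1 <= i <= n. Reassociating it into (ptr - 1) + expr would form a pointer before the start of the array, which is undefined. The second function mimics, by hand, the shape the patch is described as producing: both operands are converted to an unsigned size type, scaled separately, and combined before the single pointer addition.

```c
#include <assert.h>
#include <stddef.h>

/* Source-level form: a + (i - 1).  For 1 <= i <= n the intermediate
   pointer never leaves the array.  The reassociated form (a - 1) + i
   would compute a pointer before a[0], which is undefined behaviour
   even though the final address would be the same.  */
int element_before(const int *a, size_t n, int i)
{
    assert(i >= 1 && (size_t)i <= n);
    return *(a + (i - 1));
}

/* Hand-written analogue of
   ptr p+ ((sizetype)expr * size_exp - (sizetype)cst * size_exp):
   the offset is computed in unsigned arithmetic, so no invalid
   intermediate pointer is ever formed.  */
int element_before_bytes(const int *a, size_t n, int i)
{
    assert(i >= 1 && (size_t)i <= n);
    size_t off = (size_t)i * sizeof(int) - (size_t)1 * sizeof(int);
    return *(const int *)((const char *)a + off);
}
```

Both functions read the same element for any valid index; the point of the second is only that the constant is folded into the byte offset rather than into the pointer.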
12 daystestsuite: Rename a testXi Ruoyao1-0/+0
I mistyped the file name :(. gcc/testsuite/ChangeLog: PR target/120807 * gcc.c-torture/compile/pr120708.c: Rename to ... * gcc.c-torture/compile/pr120807.c: ... here.
12 daysLoongArch: Prevent subreg of subreg in CRCXi Ruoyao2-1/+22
The register_operand predicate can match subreg, then we'd have a subreg of subreg and it's invalid. Use lowpart_subreg to avoid the nested subreg. gcc/ChangeLog: * config/loongarch/loongarch.md (crc_combine): Avoid nested subreg. gcc/testsuite/ChangeLog: * gcc.c-torture/compile/pr120708.c: New test.
12 days[RISC-V] Add basic instrumentation to fusion detectionShreya Munnangi1-16/+64
We were looking to evaluate some changes from Artemiy that improve GCC's ability to discover fusible instruction pairs. There was no good way to get any static data out of the compiler about what kinds of fusions were happening. Yea, you could grub around the .sched dumps looking for the magic '+' annotation, then look around at the slim RTL representation and make an educated guess about what fused. But boy that was inconvenient. All we really needed was a quick note in the dump file that the target hook found a fusion pair and what kind was discovered. That made it easy to spot invalid fusions, evaluate the effectiveness of Artemiy's work, write/discover testcases for existing fusions and implement new fusions. So from a codegen standpoint this is NFC, it only affects dump file output. It's gone through the usual testing and I'll wait for pre-commit CI to churn through it before moving forward. gcc/ * config/riscv/riscv.cc (riscv_macro_fusion_pair_p): Add basic instrumentation to all cases where fusion is detected. Fix minor formatting goofs found in the process.
12 daysRISC-V: Add testcases for signed scalar SAT_ADD IMM form 2panciyan12-0/+440
This patch adds testcases for form 2, as shown below:

  T __attribute__((noinline)) \
  sat_s_add_imm_##T##_fmt_2##_##INDEX (T x) \
  { \
    T sum = (T)((UT)x + (UT)IMM); \
    return ((x ^ sum) < 0 && (x ^ IMM) >= 0) ? \
      (-(T)(x < 0) ^ MAX) : sum; \
  }

Passed the rv64gcv regression test. Signed-off-by: Ciyan Pan <panciyan@eswincomputing.com>

gcc/testsuite/ChangeLog:
* gcc.target/riscv/sat/sat_arith.h: Add signed scalar SAT_ADD IMM form2.
* gcc.target/riscv/sat/sat_s_add_imm-2-i16.c: New test.
* gcc.target/riscv/sat/sat_s_add_imm-2-i32.c: New test.
* gcc.target/riscv/sat/sat_s_add_imm-2-i64.c: New test.
* gcc.target/riscv/sat/sat_s_add_imm-2-i8.c: New test.
* gcc.target/riscv/sat/sat_s_add_imm-run-2-i16.c: New test.
* gcc.target/riscv/sat/sat_s_add_imm-run-2-i32.c: New test.
* gcc.target/riscv/sat/sat_s_add_imm-run-2-i64.c: New test.
* gcc.target/riscv/sat/sat_s_add_imm-run-2-i8.c: New test.
* gcc.target/riscv/sat/sat_s_add_imm_type_check-2-i16.c: New test.
* gcc.target/riscv/sat/sat_s_add_imm_type_check-2-i32.c: New test.
* gcc.target/riscv/sat/sat_s_add_imm_type_check-2-i8.c: New test.
12 daysMatch: Support for signed scalar SAT_ADD IMM form 2panciyan1-1/+12
This patch adds support for signed scalar SAT_ADD IMM form 2.

Form2:

  T __attribute__((noinline)) \
  sat_s_add_imm_##T##_fmt_2##_##INDEX (T x) \
  { \
    T sum = (T)((UT)x + (UT)IMM); \
    return ((x ^ sum) < 0 && (x ^ IMM) >= 0) ? \
      (-(T)(x < 0) ^ MAX) : sum; \
  }

Take the following form 2 instance as an example:

  DEF_SAT_S_ADD_IMM_FMT_2(0, int8_t, uint8_t, 9, INT8_MIN, INT8_MAX)

Before this patch:

  __attribute__((noinline))
  int8_t sat_s_add_imm_int8_t_fmt_2_0 (int8_t x)
  {
    int8_t sum;
    unsigned char x.0_1;
    unsigned char _2;
    signed char _3;
    signed char _4;
    _Bool _5;
    signed char _6;
    int8_t _7;
    int8_t _10;
    signed char _11;
    signed char _13;
    signed char _14;

    <bb 2> [local count: 1073741822]:
    x.0_1 = (unsigned char) x_8(D);
    _2 = x.0_1 + 9;
    sum_9 = (int8_t) _2;
    _3 = x_8(D) ^ sum_9;
    _4 = x_8(D) ^ 9;
    _13 = ~_3;
    _14 = _4 | _13;
    if (_14 >= 0)
      goto <bb 3>; [59.00%]
    else
      goto <bb 4>; [41.00%]

    <bb 3> [local count: 259738146]:
    _5 = x_8(D) < 0;
    _11 = (signed char) _5;
    _6 = -_11;
    _10 = _6 ^ 127;

    <bb 4> [local count: 1073741824]:
    # _7 = PHI <sum_9(2), _10(3)>
    return _7;
  }

After this patch:

  __attribute__((noinline))
  int8_t sat_s_add_imm_int8_t_fmt_2_0 (int8_t x)
  {
    int8_t _7;

    <bb 2> [local count: 1073741824]:
    _7 = .SAT_ADD (x_8(D), 9); [tail call]
    return _7;
  }

The following test suites pass with this patch:
1. The rv64gcv full regression tests.
2. The x86 bootstrap tests.
3. The x86 full regression tests.

Signed-off-by: Ciyan Pan <panciyan@eswincomputing.com>

gcc/ChangeLog:
* match.pd: Add signed scalar SAT_ADD IMM form2 matching.
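To make the form 2 idiom concrete, here is a hand expansion of the macro for T = int8_t, UT = uint8_t, IMM = 9, together with a straightforward widening reference. This is only a sketch: the names sat_s_add_imm_form2, sat_add_ref and check_form2_against_reference are hypothetical and do not appear in the patch or its tests. It also assumes, as GCC documents for its own behaviour, that converting an out-of-range value to int8_t wraps modulo 256.

```c
#include <assert.h>
#include <stdint.h>

#define IMM 9

/* Form 2 expanded by hand for int8_t.  The sum is computed in unsigned
   arithmetic so it cannot overflow; overflow is then detected from the
   sign bits: x and sum differ in sign while x and IMM agree in sign.  */
int8_t sat_s_add_imm_form2(int8_t x)
{
    int8_t sum = (int8_t)((uint8_t)x + (uint8_t)IMM);
    return ((x ^ sum) < 0 && (x ^ IMM) >= 0)
               ? (int8_t)(-(int8_t)(x < 0) ^ INT8_MAX)
               : sum;
}

/* Reference: add in a wider type, then clamp to the int8_t range.  */
int8_t sat_add_ref(int8_t x, int imm)
{
    int wide = (int)x + imm;
    if (wide > INT8_MAX)
        return INT8_MAX;
    if (wide < INT8_MIN)
        return INT8_MIN;
    return (int8_t)wide;
}

/* Exhaustively compare both versions over every int8_t input.  */
void check_form2_against_reference(void)
{
    for (int v = INT8_MIN; v <= INT8_MAX; v++)
        assert(sat_s_add_imm_form2((int8_t)v) == sat_add_ref((int8_t)v, IMM));
}
```

With a positive immediate only the upper bound can be hit, so inputs from 119 upward saturate to INT8_MAX while everything else is an ordinary addition.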
12 daysDaily bump.GCC Administrator6-1/+637
12 daysc++: trivial lambda pruning [PR120716]Jason Merrill3-1/+26
In this testcase there is nothing in the lambda except a static_assert which mentions a variable from the enclosing scope but does not odr-use it, so we want prune_lambda_captures to remove its capture. Since the lambda is so empty, there's nothing in the body except the DECL_EXPR of the capture proxy, so pop_stmt_list moves that into the enclosing STATEMENT_LIST and passes the 'body' STATEMENT_LIST to free_stmt_list. As a result, passing 'body' to prune_lambda_captures is wrong; we should instead pass the enclosing scope, i.e. cur_stmt_list. PR c++/120716 gcc/cp/ChangeLog: * lambda.cc (finish_lambda_function): Pass cur_stmt_list to prune_lambda_captures. gcc/testsuite/ChangeLog: * g++.dg/cpp0x/lambda/lambda-constexpr3.C: New test. * g++.dg/cpp0x/lambda/lambda-constexpr3a.C: New test.
12 daysc++: ICE with 'this' in lambda signature [PR120748]Jason Merrill4-6/+49
This testcase was crashing from infinite recursion in the diagnostic machinery, trying to print the lambda signature, which referred to the __this capture field in the lambda, which wanted to print the lambda again. But we don't want the signature to refer to the capture field; 'this' in an unevaluated context refers to the 'this' from the enclosing function, not the capture. After fixing that, we still wrongly rejected the B case because THIS_FORBIDDEN is set in a default (template) argument. Since we don't distinguish between THIS_FORBIDDEN being set for a default argument and it being set for a static member function, let's just ignore it if cp_unevaluated_operand; we'll give a better diagnostic for the static memfn case in finish_this_expr. PR c++/120748 gcc/cp/ChangeLog: * lambda.cc (lambda_expr_this_capture): Don't return a FIELD_DECL. * parser.cc (cp_parser_primary_expression): Ignore THIS_FORBIDDEN if cp_unevaluated_operand. gcc/testsuite/ChangeLog: * g++.dg/cpp2a/lambda-targ16.C: New test. * g++.dg/cpp0x/this1.C: Adjust diagnostics.
12 daysc++: Fix a pasto in the PR120471 fix [PR120940]Jakub Jelinek3-1/+30
No idea how this slipped in, I'm terribly sorry. Strangely nothing in the testsuite has caught this, so I've added a new test for that. 2025-07-03 Jakub Jelinek <jakub@redhat.com> PR c++/120940 * typeck.cc (cp_build_array_ref): Fix a pasto. * g++.dg/parse/pr120940.C: New test. * g++.dg/warn/Wduplicated-branches9.C: New test.
12 daysAda: Remove left-overs of front-end exception mechanismEric Botcazou2-31/+0
It was removed from the compiler a few releases ago. gcc/ada/ * gcc-interface/Makefile.in (gnatlib-sjlj): Delete. (gnatlib-zcx): Do not modify Frontend_Exceptions constant. * libgnat/system-linux-loongarch.ads (Frontend_Exceptions): Delete.
12 dayss390: More vec-perm-const cases.Juergen Christ3-2/+542
s390 missed constant vector permutation cases based on the vector pack instruction or changing the size of the vector elements during vector merge. This enables some more patterns that do not need to load a constant vector for permutation. gcc/ChangeLog: * config/s390/s390.cc (expand_perm_with_merge): Add size change cases. (expand_perm_with_pack): New function. (vectorize_vec_perm_const_1): Wire up new function. gcc/testsuite/ChangeLog: * gcc.target/s390/vector/vec-perm-merge-1.c: New test. * gcc.target/s390/vector/vec-perm-pack-1.c: New test. Signed-off-by: Juergen Christ <jchrist@linux.ibm.com>
12 daysOpenMP: Add omp_get_initial_device/omp_get_num_devices builtins: Fix test casesThomas Schwinge2-4/+4
With this fix-up for commit 387209938d2c476a67966c6ddbdbf817626f24a2 "OpenMP: Add omp_get_initial_device/omp_get_num_devices builtins", we progress: PASS: c-c++-common/gomp/omp_get_num_devices_initial_device.c (test for excess errors) PASS: c-c++-common/gomp/omp_get_num_devices_initial_device.c scan-tree-dump-not optimized "abort" -FAIL: c-c++-common/gomp/omp_get_num_devices_initial_device.c scan-tree-dump-times optimized "omp_get_num_devices;" 1 +PASS: c-c++-common/gomp/omp_get_num_devices_initial_device.c scan-tree-dump-times optimized "omp_get_num_devices" 1 PASS: c-c++-common/gomp/omp_get_num_devices_initial_device.c scan-tree-dump optimized "_1 = __builtin_omp_get_num_devices \\(\\);[\\r\\n]+[ ]+return _1;" ... etc. for offloading configurations. gcc/testsuite/ * c-c++-common/gomp/omp_get_num_devices_initial_device.c: Fix. * gfortran.dg/gomp/omp_get_num_devices_initial_device.f90: Likewise.
12 days[RISC-V][PR target/118886] Refine when two insns are signaled as fusion candidatesJeff Law1-57/+80
A number of folks have had their fingers in this code and it's going to take a few submissions to do everything we want to do. This patch is primarily concerned with avoiding signaling that fusion can occur in cases where it obviously should not be signaling fusion. Every DEC based fusion I'm aware of requires the first instruction to set a destination register that is both used and set again by the second instruction. If the two instructions set different registers, then the destination of the first instruction was not dead and would need to have a result produced. This is complicated by the fact that we have pseudo registers prior to reload. So the approach we take is to signal fusion prior to reload even if the destination registers don't match. Post reload we require them to match. That allows us to clean up the code ever-so-slightly. Second, we sometimes signaled fusion into loads that weren't scalar integer loads. I'm not aware of a design that's fusing into FP loads or vector loads. So those get rejected explicitly. Third, the store pair "fusion" code is cleaned up a little. We use fusion to model store pair commits since the basic properties for detection are the same. The point where they "fuse" is different. Also this code liked to "return false" at each step along the way if fusion wasn't possible. Future work for additional fusion cases makes that behavior undesirable. So the logic gets reworked a little bit to be more friendly to future work. Fourth, if we already fused the previous instruction, then we can't fuse it again. Signaling fusion in that case is, umm, bad as it creates an atomic blob of code from a scheduling standpoint. Hopefully I got everything correct with extracting this work out of a larger set of changes 🙂 We will contribute some instrumentation & testing code so if I botched things in a major way we'll soon have a way to test that and I'll be on the hook to fix any goof's.
From a correctness standpoint this should be a big fat nop. We've seen this make measurable differences in pico benchmarks, but obviously as you scale up to bigger stuff the gains largely disappear into the noise. This has been through Ventana's internal CI and my tester. I'll obviously wait for a verdict from the pre-commit tester. PR target/118886 gcc/ * config/riscv/riscv.cc (riscv_macro_fusion_pair_p): Check for fusion being disabled earlier. If PREV is already fused, then it can't be fused again. Be more selective about fusing when the destination registers do not match. Don't fuse into loads that aren't scalar integer modes. Revamp store pair commit support. Co-authored-by: Daniel Barboza <dbarboza@ventanamicro.com> Co-authored-by: Shreya Munnangi <smunnangi1@ventanamicro.com>
12 daystestsuite: Fix gcc.dg/ipa/pr120295.c on SolarisRainer Orth1-2/+2
gcc.dg/ipa/pr120295.c FAILs on Solaris: FAIL: gcc.dg/ipa/pr120295.c (test for excess errors) Excess errors: ld: warning: symbol 'glob' has differing types: (file /var/tmp//ccsDR59c.o type=OBJT; file /lib/libc.so type=FUNC); /var/tmp//ccsDR59c.o definition taken Fixed by renaming the glob variable to glob_ to avoid the conflict. Tested on i386-pc-solaris2.11 and x86_64-pc-linux-gnu. gcc/testsuite: * gcc.dg/ipa/pr120295.c (glob): Rename to glob_.
12 daysAArch64: make rules for CBZ/TBZ higher priorityKarl Meakin2-92/+105
Move the rules for CBZ/TBZ to be above the rules for CBB<cond>/CBH<cond>/CB<cond>. We want them to have higher priority because they can express larger displacements. gcc/ChangeLog: * config/aarch64/aarch64.md (aarch64_cbz<optab><mode>1): Move above rules for CBB<cond>/CBH<cond>/CB<cond>. (*aarch64_tbz<optab><mode>1): Likewise. gcc/testsuite/ChangeLog: * gcc.target/aarch64/cmpbr.c: Update tests.
12 daysAArch64: rules for CMPBR instructionsKarl Meakin6-447/+450
Add rules for lowering `cbranch<mode>4` to CBB<cond>/CBH<cond>/CB<cond> when CMPBR extension is enabled. gcc/ChangeLog: * config/aarch64/aarch64-protos.h (aarch64_cb_rhs): New function. * config/aarch64/aarch64.cc (aarch64_cb_rhs): Likewise. * config/aarch64/aarch64.md (cbranch<mode>4): Rename to ... (cbranch<GPI:mode>4): ...here, and emit CMPBR if possible. (cbranch<SHORT:mode>4): New expand rule. (aarch64_cb<INT_CMP:code><GPI:mode>): New insn rule. (aarch64_cb<INT_CMP:code><SHORT:mode>): Likewise. * config/aarch64/constraints.md (Uc0): New constraint. (Uc1): Likewise. (Uc2): Likewise. * config/aarch64/iterators.md (cmpbr_suffix): New mode attr. (INT_CMP): New code iterator. (cmpbr_imm_constraint): New code attr. gcc/testsuite/ChangeLog: * gcc.target/aarch64/cmpbr.c:
12 daysAArch64: precommit test for CMPBR instructionsKarl Meakin2-6/+1999
Commit the test file `cmpbr.c` before rules for generating the new instructions are added, so that the changes in codegen are more obvious in the next commit. gcc/testsuite/ChangeLog: * lib/target-supports.exp: Add `cmpbr` to the list of extensions. * gcc.target/aarch64/cmpbr.c: New test.
12 daysAArch64: recognize `+cmpbr` optionKarl Meakin3-0/+8
Add the `+cmpbr` option to enable the FEAT_CMPBR architectural extension. gcc/ChangeLog: * config/aarch64/aarch64-option-extensions.def (cmpbr): New option. * config/aarch64/aarch64.h (TARGET_CMPBR): New macro. * doc/invoke.texi (cmpbr): New option.
12 daysAArch64: make `far_branch` attribute a booleanKarl Meakin1-12/+10
The `far_branch` attribute only ever takes the values 0 or 1, so make it a `no/yes` valued string attribute instead. gcc/ChangeLog: * config/aarch64/aarch64.md (far_branch): Replace 0/1 with no/yes. (aarch64_bcond): Handle rename. (aarch64_cbz<optab><mode>1): Likewise. (*aarch64_tbz<optab><mode>1): Likewise. (@aarch64_tbz<optab><ALLI:mode><GPI:mode>): Likewise.
12 daysAArch64: add constants for branch displacementsKarl Meakin1-16/+44
Extract the hardcoded values for the minimum PC-relative displacements into named constants and document them. gcc/ChangeLog: * config/aarch64/aarch64.md (BRANCH_LEN_P_1MiB): New constant. (BRANCH_LEN_N_1MiB): Likewise. (BRANCH_LEN_P_32KiB): Likewise. (BRANCH_LEN_N_32KiB): Likewise.
12 daysAArch64: rename branch instruction rulesKarl Meakin4-15/+18
Give the `define_insn` rules used in lowering `cbranch<mode>4` to RTL more descriptive and consistent names: from now on, each rule is named after the AArch64 instruction that it generates. Also add comments to document each rule. gcc/ChangeLog: * config/aarch64/aarch64.md (condjump): Rename to ... (aarch64_bcond): ...here. (*compare_condjump<GPI:mode>): Rename to ... (*aarch64_bcond_wide_imm<GPI:mode>): ...here. (aarch64_cb<optab><mode>): Rename to ... (aarch64_cbz<optab><mode>1): ...here. (*cb<optab><mode>1): Rename to ... (*aarch64_tbz<optab><mode>1): ...here. (@aarch64_tb<optab><ALLI:mode><GPI:mode>): Rename to ... (@aarch64_tbz<optab><ALLI:mode><GPI:mode>): ...here. (restore_stack_nonlocal): Handle rename. (stack_protect_combined_test): Likewise. * config/aarch64/aarch64-simd.md (cbranch<mode>4): Likewise. * config/aarch64/aarch64-sme.md (aarch64_restore_za): Likewise. * config/aarch64/aarch64.cc (aarch64_gen_test_and_branch): Likewise.
12 daysAArch64: reformat branch instruction rulesKarl Meakin1-42/+42
Make the formatting of the RTL templates in the rules for branch instructions more consistent with each other. gcc/ChangeLog: * config/aarch64/aarch64.md (cbranch<mode>4): Reformat. (cbranchcc4): Likewise. (condjump): Likewise. (*compare_condjump<GPI:mode>): Likewise. (aarch64_cb<optab><mode>1): Likewise. (*cb<optab><mode>1): Likewise. (tbranch_<code><mode>3): Likewise. (@aarch64_tb<optab><ALLI:mode><GPI:mode>): Likewise.
12 daysAArch64: place branch instruction rules togetherKarl Meakin1-186/+201
The rules for conditional branches were spread throughout `aarch64.md`. Group them together so it is easier to understand how `cbranch<mode>4` is lowered to RTL. gcc/ChangeLog: * config/aarch64/aarch64.md (condjump): Move. (*compare_condjump<GPI:mode>): Likewise. (aarch64_cb<optab><mode>1): Likewise. (*cb<optab><mode>1): Likewise. (tbranch_<code><mode>3): Likewise. (@aarch64_tb<optab><ALLI:mode><GPI:mode>): Likewise.
12 daystree-optimization/120780: Support object size for containing objectsSiddhesh Poyarekar2-1/+322
MEM_REF cast of a subobject to its containing object has negative offsets, which objsz sees as an invalid access. Support this use case by peeking into the structure to validate that the containing object indeed contains a type of the subobject at that offset and if present, adjust the wholesize for the object to allow the negative offset. gcc/ChangeLog: PR tree-optimization/120780 * tree-object-size.cc (inner_at_offset, get_wholesize_for_memref): New functions. (addr_object_size): Call get_wholesize_for_memref. gcc/testsuite/ChangeLog: PR tree-optimization/120780 * gcc.dg/builtin-dynamic-object-size-pr120780.c: New test case. Signed-off-by: Siddhesh Poyarekar <siddhesh@gotplt.org>
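The pattern this commit targets is the familiar container_of-style cast, where a pointer to a member is converted back to a pointer to its enclosing structure by subtracting the member offset. The sketch below is not the PR testcase; the struct layout and the names outer_from_inner, read_trailer and whole_object_size are hypothetical. It uses the GCC builtin __builtin_object_size only to show where the object-size query sits; which value it yields depends on the compiler version and optimization level, so no result is claimed here.

```c
#include <assert.h>
#include <stddef.h>

struct inner { int value; };
struct outer { int header; struct inner in; int trailer; };

/* Recover the containing object from a pointer to its member.  In
   GIMPLE this shows up as a MEM_REF with a negative offset relative
   to the subobject pointer.  */
struct outer *outer_from_inner(struct inner *p)
{
    return (struct outer *)((char *)p - offsetof(struct outer, in));
}

/* An access through the recovered pointer that lies outside the
   subobject but inside the containing object.  */
int read_trailer(struct inner *p)
{
    return outer_from_inner(p)->trailer;
}

/* Object-size query on the recovered pointer (GCC builtin).  The
   result is compiler dependent; it is returned, not checked.  */
size_t whole_object_size(struct inner *p)
{
    struct outer *q = outer_from_inner(p);
    return __builtin_object_size(q, 0);
}
```

The interesting question for the object-size pass is whether the size reported for the recovered pointer covers the whole of struct outer rather than being treated as an invalid access.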
12 daysx86: Emit label only for __mcount_loc sectionH.J. Lu15-23/+261
commit ecc81e33123d7ac9c11742161e128858d844b99d Author: Andi Kleen <ak@linux.intel.com> Date: Fri Sep 26 04:06:40 2014 +0000 Add direct support for Linux kernel __fentry__ patching emitted a label, 1, for __mcount_loc section: 1: call mcount .section __mcount_loc, "a",@progbits .quad 1b .previous If __mcount_loc wasn't used, we got an unused label. Update x86_function_profiler to emit label only when __mcount_loc section is used. gcc/ PR target/120936 * config/i386/i386.cc (x86_print_call_or_nop): Add a label argument and use it to print label. (x86_function_profiler): Emit label only when __mcount_loc section is used. gcc/testsuite/ PR target/120936 * gcc.target/i386/pr120936-1.c: New test * gcc.target/i386/pr120936-2.c: Likewise. * gcc.target/i386/pr120936-3.c: Likewise. * gcc.target/i386/pr120936-4.c: Likewise. * gcc.target/i386/pr120936-5.c: Likewise. * gcc.target/i386/pr120936-6.c: Likewise. * gcc.target/i386/pr120936-7.c: Likewise. * gcc.target/i386/pr120936-8.c: Likewise. * gcc.target/i386/pr120936-9.c: Likewise. * gcc.target/i386/pr120936-10.c: Likewise. * gcc.target/i386/pr120936-11.c: Likewise. * gcc.target/i386/pr120936-12.c: Likewise. * gcc.target/i386/pr93492-3.c: Updated. * gcc.target/i386/pr93492-5.c: Likewise. Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
12 daysAdd -Wauto-profile warningJan Hubicka3-115/+684
This patch adds a new warning, -Wauto-profile, which warns about mismatches between profile data and function bodies. This is implemented during the offline pass where every function instance is compared with the actual gimple body (if available) and we verify that the statement locations in the profile data can be matched with statements in the function. Currently it is mostly useful to find bugs, but eventually I hope it will be useful for users to verify that auto-profile works as expected or to evaluate how much of an old auto-profile data can still be applied to current sources. There will probably always be some corner cases we cannot handle with the auto-profile format (such as functions with bodies in multiple files) that can be patched in compiled program. I also added logic to fix up missing discriminators in the function callsites. I am not sure how those happen (but they seem to go away with -fno-crossjumping) and will dig into it. Another problem is that without -flto at the train run inlined functions have dwarf names rather than symbol names. LLVM solves this by -gdebug-for-autoprofile flag that we could also have. With this flag we could output assembler names as well as multiplicities of statements. Building SPECint there are approx 7k profile mismatches. Bootstrapped/regtested x86_64-linux. Plan to commit it after some extra testing. gcc/ChangeLog: * auto-profile.cc (get_combined_location): Handle negative offsets; output better diagnostics. (get_relative_location_for_locus): Return -1 for unknown location. (function_instance::get_cgraph_node): New member function. (match_with_target): New function. (dump_stmt): New function. (function_instance::lookup_count): New function. (mark_expr_locations): New function. (function_instance::match): New function. (autofdo_source_profile::offline_external_functions): Do not repeat renaming; manage two worklists and do matching. (autofdo_source_profile::offline_unrealized_inlines): Simplify.
(afdo_set_bb_count): Do not look for lost discriminators. (auto_profile): Do not ICE when profile reading failed. * common.opt (Wauto-profile): New warning flag. * doc/invoke.texi (-Wauto-profile): Document.
12 daysMake inliner loop hints more aggressiveJan Hubicka1-6/+6
This patch makes loop inline hints more aggressive. If we know the iteration count or stride, we currently assume an improvement in time relative to the preheader count. I changed it to the header count, since this knowledge is likely to help unrolling and vectorizing, which bring benefits relative to that. * ipa-fnsummary.cc (analyze_function_body): For loop heuristics use header count instead of preheader count.
12 daysFix division by zero in ipa-cp.cc:update_profiling_infoJan Hubicka1-5/+6
This ICE has triggered for me during autoprofiledbootstrap. The code already takes the possible range into account, so I think in this case we can just push to one side of it. Bootstrapped/regtested x86_64-linux, OK? gcc/ChangeLog: * ipa-cp.cc (update_profiling_info): Watch for division by zero.
12 daysFortran: Remove corank conformability checks [PR120843]Andre Vehreschild2-32/+10
Remove the checks on corank conformability in expressions, because there is nothing in the standard about it. When a coarray has no coindexes it is treated like a non-coarray; when it has a full-corank coindex its result is a regular array. So there is nothing to check for corank conformability. PR fortran/120843 gcc/fortran/ChangeLog: * resolve.cc (resolve_operator): Remove conformability check, because it is not in the standard. gcc/testsuite/ChangeLog: * gfortran.dg/coarray/coindexed_6.f90: Enhance test to have coarray components covered.
13 daysaarch64: Drop const_int from aarch64_maskload_else_operandAlex Coplan2-3/+3
The "else operand" to maskload should always be a const_vector, never a const_int. This was just an issue I noticed while looking through the code, I don't have a testcase which shows a concrete problem due to this. Testing of that change alone showed ICEs with load lanes vectorization and SVE. That turned out to be because the backend pattern was missing a mode for the else operand (causing the middle-end to choose a const_int during expansion), fixed thusly. That in turn exposed an issue with the unpredicated load lanes expander which was using the wrong mode for the else operand, so fixed that too. gcc/ChangeLog: * config/aarch64/aarch64-sve.md (vec_load_lanes<mode><vsingle>): Expand else operand in subvector mode, as per optab documentation. (vec_mask_load_lanes<mode><vsingle>): Add missing mode for operand 3. * config/aarch64/predicates.md (aarch64_maskload_else_operand): Remove const_int.
13 daysdoc: Clarify mode of else operand for vec_mask_load_lanesmnAlex Coplan1-2/+2
This extends the documentation of the vec_mask_load_lanes<m><n> optab to explicitly state that the mode of the else operand is n, i.e. the mode of a single subvector. gcc/ChangeLog: * doc/md.texi (Standard Names): Clarify mode of else operand for vec_mask_load_lanesmn optab.
13 daysEnable ipa-cp cloning for cold wrappers of hot functionsJan Hubicka1-2/+6
ipa-cp cloning disables itself for all functions not passing opt_for_fn (node->decl, optimize_size), which disables it for cold wrappers of hot functions where we want to propagate. Since we later require the time saved to be considered hot, we do not need to make this early test. The patch also fixes a few other places where AFDO 0 disables ipa-cp. gcc/ChangeLog: * ipa-cp.cc (cs_interesting_for_ipcp_p): Handle correctly GLOBAL0 afdo counts. (ipcp_cloning_candidate_p): Do not rule out nodes !node->optimize_for_size_p (). (good_cloning_opportunity_p): Handle afdo counts as non-zero.
12 daysFix overflow in ipa-cp heuristicsJan Hubicka1-7/+7
ipa-cp converts sreal times to int, while the point of sreal is to accommodate very large values that can happen for loops with a large number of iterations and also when the profile is inconsistent. This happens with afdo in the testsuite, where the loop preheader is estimated to have 0 executions while the loop body has a large number of executions. Bootstrapped/regtested x86_64-linux, committed. gcc/ChangeLog: * ipa-cp.cc (hint_time_bonus): Return sreal and avoid conversions to integer. (good_cloning_opportunity_p): Avoid sreal to integer conversions. (perform_estimation_of_a_value): Update.
13 daysAuto-FDO/FDO profile comparatorJan Hubicka8-37/+170
The patch I sent from the airport only worked if you produced the gcda files with an unpatched compiler. For some reason auto-profile reading is intertwined with gcov reading, which is not necessary. Here is a cleaner version which also makes the format a bit more convenient. One can now grep as:

grep "bb.*fdo.*very hot.*cold" *.profile | sort -n -k 5 -r | less

digits_2/30 bb 307 fdo 10273284651 (very hot) afdo 0 (auto FDO) (cold) scaled 0 diff -10273284651, -100.00%
digits_2/30 bb 201 fdo 2295561442 (very hot) afdo 19074 (auto FDO) (cold) scaled 1341585 diff -2294219857, -99.94%
digits_2/30 bb 203 fdo 1236123372 (very hot) afdo 9537 (auto FDO) (cold) scaled 670792 diff -1235452580, -99.95%
digits_2/30 bb 200 fdo 1236123372 (very hot) afdo 9537 (auto FDO) (cold) scaled 670792 diff -1235452580, -99.95%
digits_2/30 bb 202 fdo 1059438070 (very hot) afdo 9537 (auto FDO) (cold) scaled 670792 diff -1058767278, -99.94%
new_solver/9 bb 246 fdo 413879041 (very hot) afdo 76594 (guessed) (cold) scaled 5387299 diff -408491742, -98.70%
new_solver/9 bb 167 fdo 413792205 (very hot) afdo 76594 (guessed) (cold) scaled 5387299 diff -408404906, -98.70%
new_solver/9 bb 159 fdo 387809230 (very hot) afdo 57182 (guessed) (cold) scaled 4021940 diff -383787290, -98.96%
new_solver/9 bb 158 fdo 387809230 (very hot) afdo 60510 (guessed) (cold) scaled 4256018 diff -383553212, -98.90%
new_solver/9 bb 138 fdo 387809230 (very hot) afdo 40917 (guessed) (cold) scaled 2877929 diff -384931301, -99.26%
new_solver/9 bb 137 fdo 387809230 (very hot) afdo 43298 (guessed) (cold) scaled 3045398 diff -384763832, -99.21%

This dumps basic blocks that have large counts under normal profile feedback but to which autofdo gives a small count (so they become cold). These indeed seem to be mostly basic blocks controlling loops. gcc/ChangeLog: * auto-profile.cc (afdo_hot_bb_threshod): New global variable. (maybe_hot_afdo_count_p): New function. (autofdo_source_profile::read): Do not set up dump file; set afdo_hot_bb_threshod.
(afdo_annotate_cfg): Handle partial training. (afdo_callsite_hot_enough_for_early_inline): Use maybe_hot_afdo_count_p. (auto_profile_offline::execute): Read autofdo file. * auto-profile.h (maybe_hot_afdo_count_p): Declare. (afdo_hot_bb_threshold): Declare. * coverage.cc (read_counts_file): Also set gcov_profile_info. (coverage_init): Do not read autofdo file. * opts.cc (enable_fdo_optimizations): Add autofdo parameter; do not set flag_branch_probabilities and flag_profile_values with it. (common_handle_option): Update. * passes.cc (finish_optimization_passes): Do not end branch prob here. (pass_manager::dump_profile_report): Also mark change after autofdo pass. * profile.cc: Include auto-profile.h. (gcov_profile_info): New global variable. (struct afdo_fdo_record): New structure. (compute_branch_probabilities): Record afdo profile. (end_branch_prob): Dump afdo/fdo profile comparison. * profile.h (gcov_profile_info): Declare. * tree-profile.cc (tree_profiling): Call end_branch_prob. (pass_ipa_tree_profile::gate): Also enable with autoFDO.
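For readers of the dump in the message above, the last two columns appear to follow from the "fdo" and "scaled" columns. The sketch below reproduces that arithmetic only; it is not the code in profile.cc. The struct `toy_profile_row` and both function names are hypothetical, and how the "scaled" value is derived from the raw afdo count is not stated in the message, so it is taken as an input.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical record mirroring two fields visible in one dump line.
struct toy_profile_row {
  std::int64_t fdo;     // count from normal profile feedback
  std::int64_t scaled;  // auto-FDO count after scaling, as printed
};

// Signed difference, matching the "diff" column of the dump.
std::int64_t profile_diff(const toy_profile_row &r) {
  return r.scaled - r.fdo;
}

// Relative difference in percent; assumes a non-zero FDO count.
double profile_diff_percent(const toy_profile_row &r) {
  assert(r.fdo != 0);
  return 100.0 * static_cast<double>(profile_diff(r)) /
         static_cast<double>(r.fdo);
}
```

For the "bb 201" line (fdo 2295561442, scaled 1341585) this gives a difference of -2294219857, which agrees with the dump. Note that `sort -n -k 5 -r` in the grep pipeline orders lines by the fifth whitespace-separated field, which is the fdo count.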
13 daysada: Fix poor code generated for return of Out parameter with access typeEric Botcazou1-2/+3
The record type of the return object is unnecessarily given BLKmode. gcc/ada/ChangeLog: * gcc-interface/decl.cc (type_contains_only_integral_data): Do not return false only because the type contains pointer data.
13 daysada: Enforce alignment constraint for large Object_Size clausesEric Botcazou1-1/+15
The constraint is that the Object_Size must be a multiple of the alignment in bits. But it's enforced only when the value of the clause is lower than the Value_Size rounded up to the alignment in bits, not for larger values. gcc/ada/ChangeLog: * gcc-interface/decl.cc (gnat_to_gnu_entity): Use default messages for errors reported for Object_Size clauses. (validate_size): Give an error for stand-alone objects of composite types if the specified size is not a multiple of the alignment.
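The constraint itself is simple arithmetic and can be sketched as follows. This is an illustration only, not the logic of validate_size: the function name `object_size_ok` and the constant `kBitsPerUnit` are hypothetical, and an 8-bit storage unit is assumed.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: 8 bits per storage unit.
constexpr std::uint64_t kBitsPerUnit = 8;

// True if the size (in bits) is a multiple of the alignment
// (given in storage units) expressed in bits.
bool object_size_ok(std::uint64_t object_size_bits,
                    std::uint64_t align_units) {
  assert(align_units > 0);
  return object_size_bits % (align_units * kBitsPerUnit) == 0;
}
```

With an alignment of 4 storage units (32 bits), sizes of 64 and 96 bits satisfy the constraint while 40 and 100 bits do not, regardless of how large the value is relative to the Value_Size.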
13 daysada: Fix alignment violation for mix of aligned and misaligned composite typesEric Botcazou1-18/+23
This happens when the chain of initialization procedures is called on the subcomponents and causes the creation of temporaries along the way out of alignment considerations. Now these temporaries are not necessary in the context and were not created until recently, so this gets rid of them. gcc/ada/ChangeLog: * gcc-interface/trans.cc (addressable_p): Add COMPG third parameter. <COMPONENT_REF>: Do not return true out of alignment considerations for non-strict-alignment targets if COMPG is set. (Call_to_gnu): Pass true as COMPG in the call to the addressable_p predicate if the called subprogram is an initialization procedure.
13 daysada: Fix wrong finalization of constrained subtype of unconstrained array typeEric Botcazou1-6/+32
This implements the Is_Constr_Array_Subt_With_Bounds flag for allocators. gcc/ada/ChangeLog: * gcc-interface/trans.cc (gnat_to_gnu) <N_Allocator>: Allocate the bounds alongside the data if the Is_Constr_Array_Subt_With_Bounds flag is set on the designated type. <N_Free_Statement>: Take into account the allocated bounds if the Is_Constr_Array_Subt_With_Bounds flag is set on the designated type.
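The general idea of allocating bounds alongside the data, and accounting for them again on deallocation, can be sketched as below. This is a generic illustration under stated assumptions, not what gigi emits: the actual layout used by GNAT may differ, and `toy_bounds`, `alloc_with_bounds`, `bounds_of` and `free_with_bounds` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical bounds record stored in front of the array data.
struct toy_bounds {
  int first;
  int last;
};

// Allocate a single block laid out as [bounds][data...] and return a
// pointer to the data part. Returns nullptr if allocation fails.
int *alloc_with_bounds(int first, int last) {
  long long len = static_cast<long long>(last) - first + 1;
  std::size_t n = len > 0 ? static_cast<std::size_t>(len) : 0;
  void *block = std::malloc(sizeof(toy_bounds) + n * sizeof(int));
  if (block == nullptr) return nullptr;
  toy_bounds *b = static_cast<toy_bounds *>(block);
  b->first = first;
  b->last = last;
  return reinterpret_cast<int *>(b + 1);
}

// Recover the bounds that sit immediately before the data.
const toy_bounds *bounds_of(const int *data) {
  assert(data != nullptr);
  return reinterpret_cast<const toy_bounds *>(data) - 1;
}

// Free the whole block: step back over the bounds before calling free.
void free_with_bounds(int *data) {
  if (data != nullptr) std::free(reinterpret_cast<toy_bounds *>(data) - 1);
}
```

The point of the sketch is the symmetry: whatever offset the allocator adds for the bounds, the deallocator must subtract, otherwise the wrong address is released.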
13 daysada: Fix missing error on too large Component_Size not multiple of storage unitEric Botcazou1-5/+11
This is a small regression introduced a few years ago. gcc/ada/ChangeLog: * gcc-interface/decl.cc (gnat_to_gnu_component_type): Validate the Component_Size like the size of a type only if the component type is actually packed.
13 daysada: Fix check for elaboration order on subprogram body stubsPiotr Trojanek1-1/+9
Fix an assertion failure occurring when elaboration checks were applied to a subprogram with a separate body. gcc/ada/ChangeLog: * sem_elab.adb (Check_Overriding_Primitive): Find early call region of the subprogram body declaration, not of the subprogram body stub.
13 daysada: More Tbuild cleanupBob Duff2-9/+4
Remove "Nmake_Assert => ..." on N_Unchecked_Type_Conversion at gen_il-gen-gen_nodes.adb:473 (was disabled). This was left over from commit 82a794419a00ea98b68d69b64363ae6746710de9 "Tbuild cleanup". In addition, the checks for "Is_Composite_Type" in Tbuild.Unchecked_Convert_To are narrowed to "not Is_Scalar_Type"; that way, useless duplicate unchecked conversions of access types will be removed as for composite types. gcc/ada/ChangeLog: * gen_il-gen-gen_nodes.adb (N_Unchecked_Type_Conversion): Remove useless Nmake_Assert. * tbuild.adb (Unchecked_Convert_To): Narrow the bitfield-related conditions.
13 daysada: Refine sanity check in Insert_ActionsRonan Desplanques1-11/+11
Insert_Actions performs a sanity check when it goes through an expression with actions while going up the tree. That check was not perfectly right before this patch and spuriously failed when inserting range checks in some situations. This patch makes the check more robust. gcc/ada/ChangeLog: * exp_util.adb (Insert_Actions): Fix check.
13 daysada: Make comment more preciseRonan Desplanques1-4/+5
gcc/ada/ChangeLog: * exp_ch6.adb (Expand_Ctrl_Function_Call): Precisify comment.