Age | Commit message (Collapse) | Author | Files | Lines |
|
llvm-svn: 363027
|
|
------------------------------------------------------------------------
r358042 | jimlin | 2019-04-09 18:56:32 -0700 (Tue, 09 Apr 2019) | 34 lines
[Sparc] Fix incorrect MI insertion position for spilling f128.
Summary:
Obviously, new built MI (sethi+add or sethi+xor+add) for constructing large offset
should be inserted before new created MI for storing even register into memory.
So the insertion position should be *StMI instead of II.
before fixed:
std %f0, [%g1+80]
sethi 4, %g1 <<<
add %g1, %sp, %g1 <<< this two instructions should be put before "std %f0, [%g1+80]".
sethi 4, %g1
add %g1, %sp, %g1
std %f2, [%g1+88]
after fixed:
sethi 4, %g1
add %g1, %sp, %g1
std %f0, [%g1+80]
sethi 4, %g1
add %g1, %sp, %g1
std %f2, [%g1+88]
Reviewers: venkatra, jyknight
Reviewed By: jyknight
Subscribers: jyknight, fedor.sergeev, jrtc27, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60397
------------------------------------------------------------------------
llvm-svn: 363010
|
|
------------------------------------------------------------------------
r351577 | ssijaric | 2019-01-18 11:34:20 -0800 (Fri, 18 Jan 2019) | 5 lines
Fix the buildbot issue introduced by r351421
The EXPENSIVE_CHECK x86_64 Windows buildbot is failing due to this change. Fix
the map access.
------------------------------------------------------------------------
llvm-svn: 363006
|
|
------------------------------------------------------------------------
r355154 | joerg | 2019-02-28 15:33:09 -0800 (Thu, 28 Feb 2019) | 2 lines
[PPC] Secure PLT only has meaning for PIC
------------------------------------------------------------------------
llvm-svn: 362729
|
|
------------------------------------------------------------------------
r361237 | maskray | 2019-05-21 03:41:25 -0700 (Tue, 21 May 2019) | 14 lines
[PPC64] Update LocalEntry from assigned symbols
On PowerPC64 ELFv2 ABI, functions may have 2 entry points: global and local.
The local entry point location of a function is stored in the st_other field of the symbol, as an offset relative to the global entry point.
In order to make symbol assignments (e.g. .equ/.set) work properly with this, PPCTargetELFStreamer already copies the local entry bits from the source symbol to the destination one, on emitAssignment(). The problem is that this copy is performed only at the assignment location, where the source symbol may not yet have processed the .localentry directive, that sets the local entry. This may cause the destination symbol to end up with wrong local entry information. Other symbol info is not affected by this because, in this case, the destination symbol value is actually a symbol reference.
This change keeps track of these assignments, and update all needed st_other fields when finish() is called.
Patch by Leandro Lupori!
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D56586
------------------------------------------------------------------------
llvm-svn: 362671
|
|
------------------------------------------------------------------------
r360442 | maskray | 2019-05-10 10:09:25 -0700 (Fri, 10 May 2019) | 12 lines
[MC][ELF] Copy top 3 bits of st_other to .symver aliases
On PowerPC64 ELFv2 ABI, the top 3 bits of st_other encode the local
entry offset. A versioned symbol alias created by .symver should copy
the bits from the source symbol.
This partly fixes PR41048. A full fix needs tracking of .set assignments
and updating st_other fields when finish() is called, see D56586.
Patch by Alfredo Dal'Ava JĂșnior
Differential Revision: https://reviews.llvm.org/D59436
------------------------------------------------------------------------
llvm-svn: 362669
|
|
------------------------------------------------------------------------
r360439 | maskray | 2019-05-10 09:24:57 -0700 (Fri, 10 May 2019) | 8 lines
[llvm-objdump] Print st_other
Add support for ".hidden" ".internal" ".protected" and " 0x%02x" for
other st_other bits used by some architectures.
Reviewed By: sfertile
Differential Revision: https://reviews.llvm.org/D61718
------------------------------------------------------------------------
llvm-svn: 362668
|
|
------------------------------------------------------------------------
r360293 | arsenm | 2019-05-08 15:09:57 -0700 (Wed, 08 May 2019) | 21 lines
AMDGPU: Select VOP3 form of add
The VOP3 form should always be the preferred selection, to be shrunk
later. This should only be an optimization issue, but this partially
works around a problem from clobbering VCC when SIFixSGPRCopies
rewrites an SCC defining operation directly to VCC.
3 of the testcases are regressions from failing to fold the immediate
in cases it should. These can be avoided by improving the VCC liveness
handling in SIFoldOperands. Simply increasing the threshold to
computeRegisterLiveness works, although this is common enough that VCC
liveness should probably be tracked throughout the pass. The hack of
leaving behind an implicit_def instruction to avoid breaking iterator
wastes instruction count, which inhibits finding the VCC def in long
chains of adds. Doing this however exposes different, worse looking
regressions from poor scheduling behavior. This could probably be
avoided around by forcing the shrink of the addc here, but the
scheduler should probably be fixed.
The r600 add test needs to be split out because it asserts on the
arguments in the new test during the calling convention lowering.
------------------------------------------------------------------------
llvm-svn: 362658
|
|
------------------------------------------------------------------------
r359899 | arsenm | 2019-05-03 08:37:07 -0700 (Fri, 03 May 2019) | 7 lines
AMDGPU: Select VOP3 form of sub
The VOP3 form should always be the preferred selection form to be
shrunk later.
The r600 sub test needs to be split out because it asserts on the
arguments in the new test during the calling convention lowering.
------------------------------------------------------------------------
llvm-svn: 362654
|
|
------------------------------------------------------------------------
r359898 | arsenm | 2019-05-03 08:21:53 -0700 (Fri, 03 May 2019) | 3 lines
AMDGPU: Support shrinking add with FI in SIFoldOperands
Avoids test regression in a future patch
------------------------------------------------------------------------
llvm-svn: 362648
|
|
llvm-svn: 362635
|
|
------------------------------------------------------------------------
r359891 | arsenm | 2019-05-03 07:40:10 -0700 (Fri, 03 May 2019) | 9 lines
AMDGPU: Replace shrunk instruction with dummy implicit_def
This was broken if the original operand was killed. The kill flag
would appear on both instructions, and fail the verifier. Keep the
kill flag, but remove the operands from the old instruction. This has
an added benefit of really reducing the use count for future folds.
Ideally the pass would be structured more like what PeepholeOptimizer
does to avoid this hack to avoid breaking instruction iterators.
------------------------------------------------------------------------
llvm-svn: 362634
|
|
------------------------------------------------------------------------
r353865 | sfertile | 2019-02-12 09:48:22 -0800 (Tue, 12 Feb 2019) | 1 line
[PowerPC] Fix printing of negative offsets in call instruction dissasembly.
------------------------------------------------------------------------
------------------------------------------------------------------------
r353866 | sfertile | 2019-02-12 09:49:04 -0800 (Tue, 12 Feb 2019) | 4 lines
[PPC64] Update tests to reflect change in printing of call operand. [NFC]
The printing of branch operands for call instructions was changed to properly
handle negative offsets. Updating the tests to reflect that.
------------------------------------------------------------------------
------------------------------------------------------------------------
r353874 | sfertile | 2019-02-12 12:03:04 -0800 (Tue, 12 Feb 2019) | 5 lines
Fix undefined behaviour in PPCInstPrinter::printBranchOperand.
Fix the undefined behaviour introduced by my previous patch r353865 (left
shifting a potentially negative value), which was caught by the bots that run
UBSan.
------------------------------------------------------------------------
llvm-svn: 362043
|
|
------------------------------------------------------------------------
r351523 | dylanmckay | 2019-01-17 22:10:41 -0800 (Thu, 17 Jan 2019) | 12 lines
[AVR] Expand 8/16-bit multiplication to libcalls on MCUs that don't have hardware MUL
This change modifies the LLVM ISel lowering settings so that
8-bit/16-bit multiplication is expanded to calls into the compiler
runtime library if the MCU being targeted does not support
multiplication in hardware.
Before this, MUL instructions would be generated on CPUs like the
ATtiny85, triggering a CPU reset due to an illegal instruction at
runtime.
First raised in https://github.com/avr-rust/rust/issues/124.
------------------------------------------------------------------------
llvm-svn: 361551
|
|
------------------------------------------------------------------------
r355038 | joerg | 2019-02-27 13:53:14 -0800 (Wed, 27 Feb 2019) | 3 lines
Default to Secure PLT on PPC for NetBSD and OpenBSD.
This matches the default settings of clang.
------------------------------------------------------------------------
llvm-svn: 360950
|
|
------------------------------------------------------------------------
r352806 | sbc | 2019-01-31 14:38:22 -0800 (Thu, 31 Jan 2019) | 5 lines
[WebAssembly] MC: Fix for outputing wasm object to /dev/null
Subscribers: dschuff, jgravelle-google, aheejin, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D57479
------------------------------------------------------------------------
llvm-svn: 360949
|
|
------------------------------------------------------------------------
r354846 | djg | 2019-02-25 21:20:19 -0800 (Mon, 25 Feb 2019) | 12 lines
[WebAssembly] Properly align fp128 arguments in outgoing varargs arguments
For outgoing varargs arguments, it's necessary to check the OrigAlign field
of the corresponding OutputArg entry to determine argument alignment, rather
than just computing an alignment from the argument value type. This is
because types like fp128 are split into multiple argument values, with
narrower types that don't reflect the ABI alignment of the full fp128.
This fixes the printf("printfL: %4.*Lf\n", 2, lval); testcase.
Differential Revision: https://reviews.llvm.org/D58656
------------------------------------------------------------------------
llvm-svn: 360826
|
|
------------------------------------------------------------------------
r359569 | russell_gallop | 2019-04-30 08:35:16 -0700 (Tue, 30 Apr 2019) | 7 lines
Add llvm-profdata to LLVM_TOOLCHAIN_TOOLS
This is required for using PGO on Windows but isn't in the Windows
release packages. Windows packages are built with
LLVM_INSTALL_TOOLCHAIN_ONLY so only includes llvm "tools" listed here.
Differential Revision: https://reviews.llvm.org/D61317
------------------------------------------------------------------------
llvm-svn: 360801
|
|
------------------------------------------------------------------------
r360099 | efriedma | 2019-05-06 16:21:59 -0700 (Mon, 06 May 2019) | 26 lines
[ARM] Glue register copies to tail calls.
This generally follows what other targets do. I don't completely
understand why the special case for tail calls existed in the first
place; even when the code was committed in r105413, call lowering didn't
work in the way described in the comments.
Stack protector lowering breaks if the register copies are not glued to
a tail call: we have to insert the stack protector check before the tail
call, and we choose the location based on the assumption that all
physical register dependencies of a tail call are adjacent to the tail
call. (See FindSplitPointForStackProtector.) This is sort of fragile,
but I don't see any reason to break that assumption.
I'm guessing nobody has seen this before just because it's hard to
convince the scheduler to actually schedule the code in a way that
breaks; even without the glue, the only computation that could actually
be scheduled after the register copies is the computation of the call
address, and the scheduler usually prefers to schedule that before the
copies anyway.
Fixes https://bugs.llvm.org/show_bug.cgi?id=41417
Differential Revision: https://reviews.llvm.org/D60427
------------------------------------------------------------------------
llvm-svn: 360793
|
|
------------------------------------------------------------------------
r359883 | arsenm | 2019-05-03 06:42:56 -0700 (Fri, 03 May 2019) | 6 lines
AMDGPU: Fix incorrect commute with sub when folding immediates
When a fold of an immediate into a sub/subrev required shrinking the
instruction, the wrong VOP2 opcode was used. This was using the VOP2
equivalent of the original instruction, not the commuted instruction
with the inverted opcode.
------------------------------------------------------------------------
llvm-svn: 360752
|
|
------------------------------------------------------------------------
r356982 | mstorsjo | 2019-03-26 02:02:44 -0700 (Tue, 26 Mar 2019) | 12 lines
[llvm-dlltool] Set a proper machine type for weak symbol object files
This makes GNU binutils not reject the libraries outright.
GNU ld handles weak externals slightly differently though, so it
can't use them for aliases in import libraries, but this makes GNU
ld able to use the rest of the import libraries.
LLD accepted object files with machine type 0 aka
IMAGE_FILE_MACHINE_UNKNOWN.
Differential Revision: https://reviews.llvm.org/D59742
------------------------------------------------------------------------
llvm-svn: 360750
|
|
------------------------------------------------------------------------
r360512 | ctopper | 2019-05-10 21:19:33 -0700 (Fri, 10 May 2019) | 5 lines
[X86] Don't emit MOVNTDQA loads from fast-isel without SSE4.1.
We were checking for SSE4.1 for FP types, but not integer 128-bit types.
Fixes PR41837.
------------------------------------------------------------------------
llvm-svn: 360749
|
|
------------------------------------------------------------------------
r359496 | mstorsjo | 2019-04-29 13:25:51 -0700 (Mon, 29 Apr 2019) | 8 lines
[X86] Run CFIInstrInserter on Windows if Dwarf is used
This is necessary since SVN r330706, as tail merging can include
CFI instructions since then.
This fixes PR40322 and PR40012.
Differential Revision: https://reviews.llvm.org/D61252
------------------------------------------------------------------------
llvm-svn: 359952
|
|
------------------------------------------------------------------------
r359834 | evandro | 2019-05-02 15:01:39 -0700 (Thu, 02 May 2019) | 3 lines
[AArch64] Update for Exynos
Fix the forwarding of multiplication results for Exynos M4.
------------------------------------------------------------------------
llvm-svn: 359946
|
|
------------------------------------------------------------------------
r357701 | mgorny | 2019-04-04 07:21:38 -0700 (Thu, 04 Apr 2019) | 8 lines
[llvm] [cmake] Add additional headers only if they exist
Modify the add_header_files_for_glob() function to only add files
that do exist, rather than all matches of the glob. This fixes CMake
error when one of the include directories (which happen to include
/usr/include) contain broken symlinks.
Differential Revision: https://reviews.llvm.org/D59632
------------------------------------------------------------------------
llvm-svn: 359857
|
|
------------------------------------------------------------------------
r355607 | petarj | 2019-03-07 08:31:08 -0800 (Thu, 07 Mar 2019) | 11 lines
[DebugInfo] Fix the type of the formated variable
Change the format type of *Personality and *LSDAAddress to PRIx64 since
they are of type uint64_t.
The problem was detected on mips builds, where it was printing junk values
and causing test failure.
Patch by Milos Stojanovic.
Differential Revision: https://reviews.llvm.org/D58451
------------------------------------------------------------------------
llvm-svn: 359851
|
|
------------------------------------------------------------------------
r356039 | atanasyan | 2019-03-13 04:04:38 -0700 (Wed, 13 Mar 2019) | 11 lines
[MIPS][microMIPS] Fix PseudoMTLOHI_MM matching and expansion
On micromips MipsMTLOHI is always matched to PseudoMTLOHI_DSP regardless
of +dsp argument. This patch checks is HasDSP predicate is present for
PseudoMTLOHI_DSP so PseudoMTLOHI_MM can be matched when appropriate.
Add expansion of PseudoMTLOHI_MM instruction into a mtlo/mthi pair.
Patch by Mirko Brkusanin.
Differential Revision: http://reviews.llvm.org/D59203
------------------------------------------------------------------------
llvm-svn: 358941
|
|
------------------------------------------------------------------------
r355825 | petarj | 2019-03-11 07:13:31 -0700 (Mon, 11 Mar 2019) | 10 lines
[MIPS][microMIPS] Add a pattern to match TruncIntFP
A pattern needed to match TruncIntFP was missing. This was causing multiple
tests from llvm test suite to fail during compilation for micromips.
Patch by Mirko Brkusanin.
Differential Revision: https://reviews.llvm.org/D58722
------------------------------------------------------------------------
llvm-svn: 358936
|
|
------------------------------------------------------------------------
r354882 | atanasyan | 2019-02-26 06:45:17 -0800 (Tue, 26 Feb 2019) | 4 lines
[mips] Emit `.module softfloat` directive
This change fixes crash on an assertion in case of using
`soft float` ABI for mips32r6 target.
------------------------------------------------------------------------
llvm-svn: 358934
|
|
------------------------------------------------------------------------
r354808 | nikic | 2019-02-25 10:54:17 -0800 (Mon, 25 Feb 2019) | 11 lines
[Mips] Fix missing masking in fast-isel of br (PR40325)
Fixes https://bugs.llvm.org/show_bug.cgi?id=40325 by zero extending
(and x, 1) the condition before branching on it.
To avoid regressing trivial cases, I'm combining emission of cmp+br
sequences for the single-use + same block case (similar to what we
do in x86). icmpbr1.ll still regresses due to the cross-bb usage
of the condition.
Differential Revision: https://reviews.llvm.org/D58576
------------------------------------------------------------------------
llvm-svn: 358925
|
|
------------------------------------------------------------------------
r354672 | petarj | 2019-02-22 06:53:58 -0800 (Fri, 22 Feb 2019) | 13 lines
[mips][micromips] fix filling delay slots for PseudoIndirectBranch_MM
Filling a delay slot in 32bit jump instructions with a 16bit instruction
can cause issues. According to the documentation such an operation is
unpredictable.
This patch adds opcode Mips::PseudoIndirectBranch_MM alongside
Mips::PseudoIndirectBranch and other instructions that are expanded to jr
instruction and do not allow a 16bit instruction in their delay slots.
Patch by Mirko Brkusanin.
Differential Revision: https://reviews.llvm.org/D58507
------------------------------------------------------------------------
llvm-svn: 358920
|
|
------------------------------------------------------------------------
r355854 | jonpa | 2019-03-11 12:00:37 -0700 (Mon, 11 Mar 2019) | 13 lines
[RegAlloc] Avoid compile time regression with multiple copy hints.
As a fix for https://bugs.llvm.org/show_bug.cgi?id=40986 ("excessive compile
time building opencollada"), this patch makes sure that no phys reg is hinted
more than once from getRegAllocationHints().
This handles the case were many virtual registers are assigned to the same
physreg. The previous compile time fix (r343686) in weightCalcHelper() only
made sure that physical/virtual registers are passed no more than once to
addRegAllocationHint().
Review: Dimitry Andric, Quentin Colombet
https://reviews.llvm.org/D59201
------------------------------------------------------------------------
llvm-svn: 358905
|
|
------------------------------------------------------------------------
r356924 | tstellar | 2019-03-25 10:01:29 -0700 (Mon, 25 Mar 2019) | 1 line
merge-request.sh: Update 8.0 metabug for 8.0.1
------------------------------------------------------------------------
llvm-svn: 357234
|
|
llvm-svn: 356714
|
|
llvm-svn: 356708
|
|
llvm-svn: 356034
|
|
llvm-svn: 355916
|
|
llvm-svn: 355584
|
|
------------------------------------------------------------------------
r355227 | ctopper | 2019-03-01 22:02:34 +0100 (Fri, 01 Mar 2019) | 3 lines
[X86] Add test case for D58805. NFC
This demonstrates dead store elimination removing a store that may alias a gather that uses null as its base.
------------------------------------------------------------------------
------------------------------------------------------------------------
r355228 | ctopper | 2019-03-01 22:02:40 +0100 (Fri, 01 Mar 2019) | 7 lines
[X86] Remove IntrArgMemOnly from target specific gather/scatter intrinsics
IntrArgMemOnly implies that only memory pointed to by pointer typed arguments will be accessed. But these intrinsics allow you to pass null to the pointer argument and put the full address into the index argument. Other passes won't be able to understand this.
A colleague found that ISPC was creating gathers like this and then dead store elimination removed some stores because it didn't understand what the gather was doing since the pointer argument was null.
Differential Revision: https://reviews.llvm.org/D58805
------------------------------------------------------------------------
llvm-svn: 355383
|
|
------------------------------------------------------------------------
r355136 | efriedma | 2019-02-28 21:38:45 +0100 (Thu, 28 Feb 2019) | 12 lines
[AArch64] [Windows] Don't skip constructing UnwindHelp.
In certain cases, the first non-frame-setup instruction in a function is
a branch. For example, it could be a cbz on an argument. Make sure we
correctly allocate the UnwindHelp, and find an appropriate register to
use to initialize it.
Fixes https://bugs.llvm.org/show_bug.cgi?id=40184
Differential Revision: https://reviews.llvm.org/D58752
------------------------------------------------------------------------
llvm-svn: 355313
|
|
------------------------------------------------------------------------
r352465 | mstorsjo | 2019-01-29 10:36:48 +0100 (Tue, 29 Jan 2019) | 17 lines
[COFF, ARM64] Don't put jump table into a separate COFF section for EK_LabelDifference32
Windows ARM64 has PIC relocation model and uses jump table kind
EK_LabelDifference32. This produces jump table entry as
".word LBB123 - LJTI1_2" which represents the distance between the block
and jump table.
A new relocation type (IMAGE_REL_ARM64_REL32) is needed to do the fixup
correctly if they are in different COFF section.
This change saves the jump table to the same COFF section as the
associated code. An ideal fix could be utilizing IMAGE_REL_ARM64_REL32
relocation type.
Patch by Tom Tan!
Differential Revision: https://reviews.llvm.org/D57277
------------------------------------------------------------------------
llvm-svn: 355311
|
|
------------------------------------------------------------------------
r355116 | ctopper | 2019-02-28 19:49:29 +0100 (Thu, 28 Feb 2019) | 7 lines
[X86] Don't peek through bitcasts before checking ISD::isBuildVectorOfConstantSDNodes in combineTruncatedArithmetic
We don't have any combines that can look through a bitcast to truncate a build vector of constants. So the truncate will stick around and give us something like this pattern (binop (trunc X), (trunc (bitcast (build_vector)))) which has two truncates in it. Which will be reversed by hoistLogicOpWithSameOpcodeHands in the generic DAG combiner. Thus causing an infinite loop.
Even if we had a combine for (truncate (bitcast (build_vector))), I think it would need to be implemented in getNode otherwise DAG combiner visit ordering would probably still visit the binop first and reverse it. Or combineTruncatedArithmetic would need to do its own constant folding.
Differential Revision: https://reviews.llvm.org/D58705
------------------------------------------------------------------------
------------------------------------------------------------------------
r355117 | ctopper | 2019-02-28 19:50:16 +0100 (Thu, 28 Feb 2019) | 1 line
[X86] Add test case that was supposed to go with r355116.
------------------------------------------------------------------------
llvm-svn: 355310
|
|
------------------------------------------------------------------------
r354505 | ibiryukov | 2019-02-20 20:08:06 +0100 (Wed, 20 Feb 2019) | 13 lines
[clangd] Store index in '.clangd/index' instead of '.clangd-index'
Summary: To take up the .clangd folder for other potential uses in the future.
Reviewers: kadircet, sammccall
Reviewed By: kadircet
Subscribers: ioeric, MaskRay, jkorous, arphaman, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D58440
------------------------------------------------------------------------
llvm-svn: 354981
|
|
llvm-svn: 354973
|
|
------------------------------------------------------------------------
r354207 | whitequark | 2019-02-16 23:33:10 +0100 (Sat, 16 Feb 2019) | 16 lines
[bindings/go] Fix building on 32-bit systems (ARM etc.)
Summary:
The patch in https://reviews.llvm.org/D53883 (by me) fails to build on 32-bit systems like ARM. Fix the array size to be less ridiculously large. 2<<20 should still be enough for all practical purposes.
Bug: https://bugs.llvm.org/show_bug.cgi?id=40426
Reviewers: whitequark, pcc
Reviewed By: whitequark
Subscribers: javed.absar, kristof.beyls, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D58030
------------------------------------------------------------------------
llvm-svn: 354956
|
|
By Jonathan Metzman!
Differential revision: https://reviews.llvm.org/D58676
llvm-svn: 354892
|
|
------------------------------------------------------------------------
r354733 | nikic | 2019-02-23 19:59:01 +0100 (Sat, 23 Feb 2019) | 10 lines
[WebAssembly] Fix select of and (PR40805)
Fixes https://bugs.llvm.org/show_bug.cgi?id=40805 introduced by
patterns added in D53676.
I'm removing the patterns entirely here, as they are not correct
in the general case. If necessary something more specific can be
added in the future.
Differential Revision: https://reviews.llvm.org/D58575
------------------------------------------------------------------------
llvm-svn: 354860
|
|
------------------------------------------------------------------------
r354756 | ctopper | 2019-02-24 20:33:37 +0100 (Sun, 24 Feb 2019) | 36 lines
[X86] Fix tls variable lowering issue with large code model
Summary:
The problem here is the lowering for tls variable. Below is the DAG for the code.
SelectionDAG has 11 nodes:
t0: ch = EntryToken
t8: i64,ch = load<(load 8 from `i8 addrspace(257)* null`, addrspace 257)> t0, Constant:i64<0>, undef:i64
t10: i64 = X86ISD::WrapperRIP TargetGlobalTLSAddress:i64<i32* @x> 0 [TF=10]
t11: i64,ch = load<(load 8 from got)> t0, t10, undef:i64
t12: i64 = add t8, t11
t4: i32,ch = load<(dereferenceable load 4 from @x)> t0, t12, undef:i64
t6: ch = CopyToReg t0, Register:i32 %0, t4
And when mcmodel is large, below instruction can NOT be folded.
t10: i64 = X86ISD::WrapperRIP TargetGlobalTLSAddress:i64<i32* @x> 0 [TF=10]
t11: i64,ch = load<(load 8 from got)> t0, t10, undef:i64
So "t11: i64,ch = load<(load 8 from got)> t0, t10, undef:i64" is lowered to " Morphed node: t11: i64,ch = MOV64rm<Mem:(load 8 from got)> t10, TargetConstant:i8<1>, Register:i64 $noreg, TargetConstant:i32<0>, Register:i32 $noreg, t0"
When llvm start to lower "t10: i64 = X86ISD::WrapperRIP TargetGlobalTLSAddress:i64<i32* @x> 0 [TF=10]", it fails.
The patch is to fold the load and X86ISD::WrapperRIP.
Fixes PR26906
Patch by LuoYuanke
Reviewers: craig.topper, rnk, annita.zhang, wxiao3
Reviewed By: rnk
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D58336
------------------------------------------------------------------------
llvm-svn: 354857
|
|
------------------------------------------------------------------------
r354764 | lebedevri | 2019-02-25 08:39:07 +0100 (Mon, 25 Feb 2019) | 123 lines
[XRay][tools] Revert "Use Support/JSON.h in llvm-xray convert"
Summary:
This reverts D50129 / rL338834: [XRay][tools] Use Support/JSON.h in llvm-xray convert
Abstractions are great.
Readable code is great.
JSON support library is a *good* idea.
However unfortunately, there is an internal detail that one needs
to be aware of in `llvm::json::Object` - it uses `llvm::DenseMap`.
So for **every** `llvm::json::Object`, even if you only store a single `int`
entry there, you pay the whole price of `llvm::DenseMap`.
Unfortunately, it matters for `llvm-xray`.
I was trying to analyse the `llvm-exegesis` analysis mode performance,
and for that i wanted to view the LLVM X-Ray log visualization in Chrome
trace viewer. And the `llvm-xray convert` is sluggish, and sometimes
even ended up being killed by OOM.
`xray-log.llvm-exegesis.lwZ0sT` was acquired from `llvm-exegesis`
(compiled with ` -fxray-instruction-threshold=128`)
analysis mode over `-benchmarks-file` with 10099 points (one full
latency measurement set), with normal runtime of 0.387s.
Timings:
Old: (copied from D58580)
```
$ perf stat -r 5 ./bin/llvm-xray convert -sort -symbolize -instr_map=./bin/llvm-exegesis -output-format=trace_event -output=/tmp/trace.yml xray-log.llvm-exegesis.lwZ0sT
Performance counter stats for './bin/llvm-xray convert -sort -symbolize -instr_map=./bin/llvm-exegesis -output-format=trace_event -output=/tmp/trace.yml xray-log.llvm-exegesis.lwZ0sT' (5 runs):
21346.24 msec task-clock # 1.000 CPUs utilized ( +- 0.28% )
314 context-switches # 14.701 M/sec ( +- 59.13% )
1 cpu-migrations # 0.037 M/sec ( +-100.00% )
2181354 page-faults # 102191.251 M/sec ( +- 0.02% )
85477442102 cycles # 4004415.019 GHz ( +- 0.28% ) (83.33%)
14526427066 stalled-cycles-frontend # 16.99% frontend cycles idle ( +- 0.70% ) (83.33%)
32371533721 stalled-cycles-backend # 37.87% backend cycles idle ( +- 0.27% ) (33.34%)
67896890228 instructions # 0.79 insn per cycle
# 0.48 stalled cycles per insn ( +- 0.03% ) (50.00%)
14592654840 branches # 683631198.653 M/sec ( +- 0.02% ) (66.67%)
212207534 branch-misses # 1.45% of all branches ( +- 0.94% ) (83.34%)
21.3502 +- 0.0585 seconds time elapsed ( +- 0.27% )
```
New:
```
$ perf stat -r 9 ./bin/llvm-xray convert -sort -symbolize -instr_map=./bin/llvm-exegesis -output-format=trace_event -output=/tmp/trace.yml xray-log.llvm-exegesis.lwZ0sT
Performance counter stats for './bin/llvm-xray convert -sort -symbolize -instr_map=./bin/llvm-exegesis -output-format=trace_event -output=/tmp/trace.yml xray-log.llvm-exegesis.lwZ0sT' (9 runs):
7178.38 msec task-clock # 1.000 CPUs utilized ( +- 0.26% )
182 context-switches # 25.402 M/sec ( +- 28.84% )
0 cpu-migrations # 0.046 M/sec ( +- 70.71% )
33701 page-faults # 4694.994 M/sec ( +- 0.88% )
28761053971 cycles # 4006833.933 GHz ( +- 0.26% ) (83.32%)
2028297997 stalled-cycles-frontend # 7.05% frontend cycles idle ( +- 1.61% ) (83.32%)
10773154901 stalled-cycles-backend # 37.46% backend cycles idle ( +- 0.38% ) (33.36%)
36199132874 instructions # 1.26 insn per cycle
# 0.30 stalled cycles per insn ( +- 0.03% ) (50.02%)
6434504227 branches # 896420204.421 M/sec ( +- 0.03% ) (66.68%)
73355176 branch-misses # 1.14% of all branches ( +- 1.46% ) (83.33%)
7.1807 +- 0.0190 seconds time elapsed ( +- 0.26% )
```
So using `llvm::json` nearly triples run-time on that test case.
(+3x is times, not percent.)
Memory:
Old:
```
total runtime: 39.88s.
bytes allocated in total (ignoring deallocations): 79.07GB (1.98GB/s)
calls to allocation functions: 33267816 (834135/s)
temporary memory allocations: 5832298 (146235/s)
peak heap memory consumption: 9.21GB
peak RSS (including heaptrack overhead): 147.98GB
total memory leaked: 1.09MB
```
New:
```
total runtime: 17.42s.
bytes allocated in total (ignoring deallocations): 5.12GB (293.86MB/s)
calls to allocation functions: 21382982 (1227284/s)
temporary memory allocations: 232858 (13364/s)
peak heap memory consumption: 350.69MB
peak RSS (including heaptrack overhead): 2.55GB
total memory leaked: 79.95KB
```
Diff:
```
total runtime: -22.46s.
bytes allocated in total (ignoring deallocations): -73.95GB (3.29GB/s)
calls to allocation functions: -11884834 (529155/s)
temporary memory allocations: -5599440 (249307/s)
peak heap memory consumption: -8.86GB
peak RSS (including heaptrack overhead): 0B
total memory leaked: -1.01MB
```
So using `llvm::json` increases *peak* memory consumption on *this* testcase ~+27x.
And total allocation count +15x. Both of these numbers are times, *not* percent.
And note that memory usage is clearly unbound with `llvm::json`, it directly depends
on the length of the log, so peak memory consumption is always increasing.
This isn't so with the dumb code, there is no accumulating memory consumption,
peak memory consumption is fixed. Naturally, that means it will handle *much*
larger logs without OOM'ing.
Readability is good, but the price is simply unacceptable here.
Too bad none of this analysis was done as part of the development/review D50129 itself.
Reviewers: dberris, kpw, sammccall
Reviewed By: dberris
Subscribers: riccibruno, hans, courbet, jdoerfert, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D58584
------------------------------------------------------------------------
llvm-svn: 354856
|
|
llvm-svn: 354659
|