Age | Commit message (Collapse) | Author | Files | Lines |
|
Dynamic relocations are expensive on ELF/Linux platforms because they
are applied in userspace on process startup. Therefore, it is worth
optimizing them to make PIE and PIC dylib builds faster. In +asserts
builds (non-NDEBUG), nikic identified these schedule class name string
pointers as the leading source of dynamic relocations. [1]
This change uses llvm::StringTable and the StringToOffsetTable TableGen
helper to turn the string pointers into 32-bit offsets into a separate
character array.
The number of dynamic relocations is reduced by ~60%:
❯ llvm-readelf --dyn-relocations lib/libLLVM.so | wc -l
381376 # before
155156 # after
The test suite time is modestly affected, but I'm running on a shared
noisy workstation VM with a ton of cores:
https://gist.github.com/rnk/f38882c2fe2e63d0eb58b8fffeab69de
Testing Time: 100.88s # before
Testing Time: 78.50s. # after
Testing Time: 96.25s. # before again
I haven't used any fancy hyperfine/denoising tools, but I think the
result is clearly visible and we should ship it.
[1] https://gist.github.com/nikic/554f0a544ca15d5219788f1030f78c5a
|
|
Since I've formatted the epsilon value, I don't think it's necessary to
escape it.
|
|
This patch removes the llvm:: prefix within llvm-exegesis where it is
not necessary. This is most occurrences of the prefix within exegesis as
exegesis is within the llvm namespace. This patch makes things more
consistent as the vast majority of the code did not use the llvm::
prefix for anything.
|
|
Identified with clangd.
|
|
D150312 added a TODO:
TODO: consider renaming the field `StartAtCycle` and `Cycles` to
`AcquireAtCycle` and `ReleaseAtCycle` respectively, to stress the
fact that resource allocation is now represented as an interval,
relatively to the issue cycle of the instruction.
This patch implements that TODO. This naming clarifies how to use these
fields in the scheduler. In addition it was confusing that `StartAtCycle` was
singular but `Cycles` was plural. This renaming fixes this inconsistency.
This commit as previously reverted since it missed renaming that came
down after rebasing. This version of the commit fixes those problems.
Differential Revision: https://reviews.llvm.org/D158568
|
|
This reverts commit 5b854f2c23ea1b000cb4cac4c0fea77326c03d43.
Build still failing.
|
|
D150312 added a TODO:
TODO: consider renaming the field `StartAtCycle` and `Cycles` to
`AcquireAtCycle` and `ReleaseAtCycle` respectively, to stress the
fact that resource allocation is now represented as an interval,
relatively to the issue cycle of the instruction.
This patch implements that TODO. This naming clarifies how to use these
fields in the scheduler. In addition it was confusing that `StartAtCycle` was
singular but `Cycles` was plural. This renaming fixes this inconsistency.
This commit as previously reverted since it missed renaming that came
down after rebasing. This version of the commit fixes those problems.
Differential Revision: https://reviews.llvm.org/D158568
|
|
This reverts commit 030d33409568b2f0ea61116e83fd40ca27ba33ac.
This commit is causing build failures
|
|
D150312 added a TODO:
TODO: consider renaming the field `StartAtCycle` and `Cycles` to
`AcquireAtCycle` and `ReleaseAtCycle` respectively, to stress the
fact that resource allocation is now represented as an interval,
relatively to the issue cycle of the instruction.
This patch implements that TODO. This naming clarifies how to use these
fields in the scheduler. In addition it was confusing that `StartAtCycle` was
singular but `Cycles` was plural. This renaming fixes this inconsistency.
Differential Revision: https://reviews.llvm.org/D158568
|
|
As part of preparing the reports, the Analysis class needs to print
machine instructions in a disassembled form. For this purpose, the class
has four fields (namely Context_, AsmInfo_, InstPrinter_ and Disasm_).
All the constructor of the Analysis class does is conditionally
initializing these four fields.
This commit factors out the logic for decoding machine code and printing
it in an assembler form into a separate DisassemblerHelper class.
~~
Huawei RRI, OS Lab
Reviewed By: courbet
Differential Revision: https://reviews.llvm.org/D147156
|
|
When llvm-exegesis was first introduced, it only supported benchmarking
individual instructions, hence the name for the data structure storing
the data corresponding to a benchmark being called InstructionBenchmark
made sense. However, now that benchmarking arbitrary snippets is
supported, InstructionBenchmark doesn't correspond to a single
instruction. This patch refactors InstructionBenchmark to be called
Benchmark to clean up this little bit of technical debt.
Reviewed By: courbet
Differential Revision: https://reviews.llvm.org/D146884
|
|
Differential Revision: https://reviews.llvm.org/D146104
|
|
With Mips fixes.
This reverts commit 7daf60e3440b22b79084bb325d823aa3ad8df0f3.
|
|
Breaks MIPS compile.
This reverts commit cc61c822e05c51e494c50d1e72f963750116ef08.
|
|
We were using the native triple to parse the benchmarks. Use the triple
from the benchmarks file.
Right now this still only allows analyzing files produced by the current
target until D133605 is in.
This also makes the `Analysis` class much less ad-hoc.
Differential Revision: https://reviews.llvm.org/D133697
|
|
Currently native clusterization simply groups all benchmarks
by the opcode of key instruction, but that is suboptimal in certain cases,
e.g. where we can already tell that the particular instructions
already resolve into different sched classes.
|
|
future-proof against C++23
C++23 will make these conversions ambiguous - so fix them to make the
codebase forward-compatible with C++23 (& a follow-up change I've made
will make this ambiguous/invalid even in <C++23 so we don't regress
this & it generally improves the code anyway)
|
|
MCObjectFileInfo
This makes it possible for targets to define their own MCObjectFileInfo.
This MCObjectFileInfo is then used to determine things like section alignment.
This is a follow up to D101462 and prepares for the RISCV backend defining the
text section alignment depending on the enabled extensions.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D101921
|
|
This untangles the MCContext and the MCObjectFileInfo. There is a circular
dependency between MCContext and MCObjectFileInfo. Currently this dependency
also exists during construction: You can't contruct a MOFI without a MCContext
without constructing the MCContext with a dummy version of that MOFI first.
This removes this dependency during construction. In a perfect world,
MCObjectFileInfo wouldn't depend on MCContext at all, but only be stored in the
MCContext, like other MC information. This is future work.
This also shifts/adds more information to the MCContext making it more
available to the different targets. Namely:
- TargetTriple
- ObjectFileType
- SubtargetInfo
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D101462
|
|
|
|
This is useful to set the baseline model for an unknown CPU.
Fixes PR50013.
Differential Revision: https://reviews.llvm.org/D100743
|
|
|
|
Summary:
This patch replaces std::find with llvm::is_contained where
appropriate.
Reviewers: efriedma, nhaehnle
Reviewed By: nhaehnle
Subscribers: arsenm, jvesely, nhaehnle, hiraditya, rogfer01, kerbowa, llvm-commits, vkmr
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D84489
|
|
The addition of `inverse_throughput` mode highlighted the disjointedness
of snippet generators and benchmark runners because it used the
`UopsSnippetGenerator` with the `LatencyBenchmarkRunner`.
To keep the code consistent tie the snippet generators to
parallelization/serialization rather than their benchmark runners.
Renaming `LatencySnippetGenerator` -> `SerialSnippetGenerator`.
Renaming `UopsSnippetGenerator` -> `ParallelSnippetGenerator`.
Differential Revision: https://reviews.llvm.org/D72928
|
|
The argument is llvm::null() everywhere except llvm::errs() in
llvm-objdump in -DLLVM_ENABLE_ASSERTIONS=On builds. It is used by no
target but X86 in -DLLVM_ENABLE_ASSERTIONS=On builds.
If we ever have the needs to add verbose log to disassemblers, we can
record log with a member function, instead of passing it around as an
argument.
|
|
printInst prints a branch/call instruction as `b offset` (there are many
variants on various targets) instead of `b address`.
It is a convention to use address instead of offset in most external
symbolizers/disassemblers. This difference makes `llvm-objdump -d`
output unsatisfactory.
Add `uint64_t Address` to printInst(), so that it can pass the argument to
printInstruction(). `raw_ostream &OS` is moved to the last to be
consistent with other print* methods.
The next step is to pass `Address` to printInstruction() (generated by
tablegen from the instruction set description). We can gradually migrate
targets to print addresses instead of offsets.
In any case, downstream projects which don't know `Address` can pass 0 as
the argument.
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D72172
|
|
MipsMCAsmInfo was using '$' prefix for Mips32 and '.L' for Mips64
regardless of -target-abi option. By passing MCTargetOptions to MCAsmInfo
we can find out Mips ABI and pick appropriate prefix.
Tags: #llvm, #clang, #lldb
Differential Revision: https://reviews.llvm.org/D66795
|
|
Reviewers: gchatelet
Subscribers: tschuett, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68780
llvm-svn: 374533
|
|
Summary: Second patch: in the lib.
Reviewers: gchatelet
Subscribers: nemanjai, tschuett, MaskRay, mgrang, jsji, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68692
llvm-svn: 374158
|
|
Now that we've moved to C++14, we no longer need the llvm::make_unique
implementation from STLExtras.h. This patch is a mechanical replacement
of (hopefully) all the llvm::make_unique instances across the monorepo.
llvm-svn: 369013
|
|
Summary:
It doesn't need anything from Analysis::SchedClassCluster class,
and takes ResolvedSchedClass as param, so this seems rather fitting.
Reviewers: courbet, gchatelet
Reviewed By: courbet
Subscribers: tschuett, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59994
llvm-svn: 357263
|
|
Summary:
`ResolvedSchedClass` will need to be used outside of `Analysis`
(before `InstructionBenchmarkClustering` even), therefore promote
it into a non-private top-level class, and while there also
move all of the functions that are only called by `ResolvedSchedClass`
into that same new file.
Reviewers: courbet, gchatelet
Reviewed By: courbet
Subscribers: mgorny, tschuett, mgrang, jdoerfert, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59993
llvm-svn: 357259
|
|
Summary:
The diff looks scary but it really isn't:
1. I moved the check for the number of measurements into `SchedClassClusterCentroid::validate()`
2. While there, added a check that we can only have a single inverse throughput measurement. I missed that when adding it initially.
3. In `Analysis::SchedClassCluster::measurementsMatch()` is called with the current LLVM values from schedule class and the values from Centroid.
3.1. The values from centroid we can already get from `SchedClassClusterCentroid::getAsPoint()`.
This isn't 100% a NFC, because previously for inverse throughput we used `min()`. I have asked whether i have done that correctly in
https://reviews.llvm.org/D57647?id=184939#inline-510384 but did not hear back. I think `avg()` should be used too, thus it is a fix.
3.2. Finally, refactor the computation of the LLVM-specified values into `Analysis::SchedClassCluster::getSchedClassPoint()`
I will need that function for [[ https://bugs.llvm.org/show_bug.cgi?id=41275 | PR41275 ]]
Reviewers: courbet, gchatelet
Reviewed By: courbet
Subscribers: tschuett, jdoerfert, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59951
llvm-svn: 357245
|
|
Summary:
This is an alternative to D59539.
Let's suppose we have measured 4 different opcodes, and got: `0.5`, `1.0`, `1.5`, `2.0`.
Let's suppose we are using `-analysis-clustering-epsilon=0.5`.
By default now we will start processing the `0.5` point, find that `1.0` is it's neighbor, add them to a new cluster.
Then we will notice that `1.5` is a neighbor of `1.0` and add it to that same cluster.
Then we will notice that `2.0` is a neighbor of `1.5` and add it to that same cluster.
So all these points ended up in the same cluster.
This may or may not be a correct implementation of dbscan clustering algorithm.
But this is rather horribly broken for the reasons of comparing the clusters with the LLVM sched data.
Let's suppose all those opcodes are currently in the same sched cluster.
If i specify `-analysis-inconsistency-epsilon=0.5`, then no matter
the LLVM values this cluster will **never** match the LLVM values,
and thus this cluster will **always** be displayed as inconsistent.
The solution is obviously to split off some of these opcodes into different sched cluster.
But how do i do that? Out of 4 opcodes displayed in the inconsistency report,
which ones are the "bad ones"? Which ones are the most different from the checked-in data?
I'd need to go in to the `.yaml` and look it up manually.
The trivial solution is to, when creating clusters, don't use the full dbscan algorithm,
but instead "pick some unclustered point, pick all unclustered points that are it's neighbor,
put them all into a new cluster, repeat". And just so as it happens, we can arrive
at that algorithm by not performing the "add neighbors of a neighbor to the cluster" step.
But that won't work well once we teach analyze mode to operate in on-1D mode
(i.e. on more than a single measurement type at a time), because the clustering would
depend on the order of the measurements.
Instead, let's just create a single cluster per opcode, and put all the points of that opcode into said cluster.
And simultaneously check that every point in that cluster is a neighbor of every other point in the cluster,
and if they are not, the cluster (==opcode) is unstable.
This is //yet another// step to bring me closer to being able to continue cleanup of bdver2 sched model..
Fixes [[ https://bugs.llvm.org/show_bug.cgi?id=40880 | PR40880 ]].
Reviewers: courbet, gchatelet
Reviewed By: courbet
Subscribers: tschuett, jdoerfert, RKSimon, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59820
llvm-svn: 357152
|
|
Summary:
This eps param is used for two distinct things:
* initial point clusterization
* checking clusters against the llvm values
What if one wants to only look at highly different clusters, without changing
the clustering itself? In particular, this helps to weed out noisy measurements
(since the clusterization epsilon is still small, so there is a better chance
that noisy measurements from the same opcode will go into different clusters)
By splitting it into two params it is now possible.
This is nearly-free performance-wise:
Old:
```
$ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency-1.yaml -analysis-inconsistencies-output-file=/tmp/clusters-old.html
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 10099 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-old.html'
...
Performance counter stats for './bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency-1.yaml -analysis-inconsistencies-output-file=/tmp/clusters-old.html' (25 runs):
390.01 msec task-clock # 0.998 CPUs utilized ( +- 0.25% )
12 context-switches # 31.735 M/sec ( +- 27.38% )
0 cpu-migrations # 0.000 K/sec
4745 page-faults # 12183.732 M/sec ( +- 0.54% )
1562711900 cycles # 4012303.327 GHz ( +- 0.24% ) (82.90%)
185567822 stalled-cycles-frontend # 11.87% frontend cycles idle ( +- 0.52% ) (83.30%)
392106234 stalled-cycles-backend # 25.09% backend cycles idle ( +- 1.31% ) (33.79%)
1839236666 instructions # 1.18 insn per cycle
# 0.21 stalled cycles per insn ( +- 0.15% ) (50.37%)
407035764 branches # 1045074878.710 M/sec ( +- 0.12% ) (66.80%)
10896459 branch-misses # 2.68% of all branches ( +- 0.17% ) (83.20%)
0.390629 +- 0.000972 seconds time elapsed ( +- 0.25% )
```
```
$ perf stat -r 9 ./bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency.yml -analysis-inconsistencies-output-file=/tmp/clusters-old.html
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 50572 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-old.html'
...
Performance counter stats for './bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency.yml -analysis-inconsistencies-output-file=/tmp/clusters-old.html' (9 runs):
6803.36 msec task-clock # 0.999 CPUs utilized ( +- 0.96% )
262 context-switches # 38.546 M/sec ( +- 23.06% )
0 cpu-migrations # 0.065 M/sec ( +- 76.03% )
13287 page-faults # 1953.206 M/sec ( +- 0.32% )
27252537904 cycles # 4006024.257 GHz ( +- 0.95% ) (83.31%)
1496314935 stalled-cycles-frontend # 5.49% frontend cycles idle ( +- 0.97% ) (83.32%)
16128404524 stalled-cycles-backend # 59.18% backend cycles idle ( +- 0.30% ) (33.37%)
17611143370 instructions # 0.65 insn per cycle
# 0.92 stalled cycles per insn ( +- 0.05% ) (50.04%)
3894906599 branches # 572537147.437 M/sec ( +- 0.03% ) (66.69%)
116314514 branch-misses # 2.99% of all branches ( +- 0.20% ) (83.35%)
6.8118 +- 0.0689 seconds time elapsed ( +- 1.01%)
```
New:
```
$ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency-1.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new.html
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 10099 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-new.html'
...
Performance counter stats for './bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency-1.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new.html' (25 runs):
400.14 msec task-clock # 0.998 CPUs utilized ( +- 0.66% )
12 context-switches # 29.429 M/sec ( +- 25.95% )
0 cpu-migrations # 0.100 M/sec ( +-100.00% )
4714 page-faults # 11796.496 M/sec ( +- 0.55% )
1603131306 cycles # 4011840.105 GHz ( +- 0.66% ) (82.85%)
199538509 stalled-cycles-frontend # 12.45% frontend cycles idle ( +- 2.40% ) (83.10%)
402249109 stalled-cycles-backend # 25.09% backend cycles idle ( +- 1.19% ) (34.05%)
1847783963 instructions # 1.15 insn per cycle
# 0.22 stalled cycles per insn ( +- 0.18% ) (50.64%)
407162722 branches # 1018925730.631 M/sec ( +- 0.12% ) (67.02%)
10932779 branch-misses # 2.69% of all branches ( +- 0.51% ) (83.28%)
0.40077 +- 0.00267 seconds time elapsed ( +- 0.67% )
lebedevri@pini-pini:/build/llvm-build-Clang-release$ perf stat -r 9 ./bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency.yml -analysis-inconsistencies-output-file=/tmp/clusters-new.html
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 50572 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-new.html'
...
Performance counter stats for './bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency.yml -analysis-inconsistencies-output-file=/tmp/clusters-new.html' (9 runs):
6947.79 msec task-clock # 1.000 CPUs utilized ( +- 0.90% )
217 context-switches # 31.236 M/sec ( +- 36.16% )
1 cpu-migrations # 0.096 M/sec ( +- 50.00% )
13258 page-faults # 1908.389 M/sec ( +- 0.34% )
27830796523 cycles # 4006032.286 GHz ( +- 0.89% ) (83.30%)
1504554006 stalled-cycles-frontend # 5.41% frontend cycles idle ( +- 2.10% ) (83.32%)
16716574843 stalled-cycles-backend # 60.07% backend cycles idle ( +- 0.65% ) (33.38%)
17755545931 instructions # 0.64 insn per cycle
# 0.94 stalled cycles per insn ( +- 0.09% ) (50.04%)
3897255686 branches # 560980426.597 M/sec ( +- 0.06% ) (66.70%)
117045395 branch-misses # 3.00% of all branches ( +- 0.47% ) (83.34%)
6.9507 +- 0.0627 seconds time elapsed ( +- 0.90% )
```
I.e. it's +2.6% slowdown for one whole sweep, or +2% for 5 whole sweeps.
Within noise i'd say.
Should help with [[ https://bugs.llvm.org/show_bug.cgi?id=40787 | PR40787 ]].
Reviewers: courbet, gchatelet
Reviewed By: courbet
Subscribers: tschuett, RKSimon, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D58476
llvm-svn: 354767
|
|
Summary:
Given an instruction `Opcode`, we can make benchmarks (measurements) of the
instruction characteristics/performance. Then, to facilitate further analysis
we group the benchmarks with *similar* characteristics into clusters.
Now, this is all not entirely deterministic. Some instructions have variable
characteristics, depending on their arguments. And thus, if we do several
benchmarks of the same instruction `Opcode`, we may end up with *different*
performance characteristics measurements. And when we then do clustering,
these several benchmarks of the same instruction `Opcode` may end up being
clustered into *different* clusters. This is not great for further analysis.
We shall find every `Opcode` with benchmarks not in just one cluster, and move
*all* the benchmarks of said `Opcode` into one new unstable cluster per `Opcode`.
I have solved this by making `ClusterId` a bit field, adding a `IsUnstable` bit,
and introducing `-analysis-display-unstable-clusters` switch to toggle between
displaying stable-only clusters and unstable-only clusters.
The reclusterization is deterministically stable, produces identical reports
between runs. (Or at least that is what i'm seeing, maybe it isn't)
Timings/comparisons:
old (current trunk/head) {F8303582}
```
$ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-old.html
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 43970 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-old.html'
...
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 43970 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-old.html'
Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-old.html' (25 runs):
6624.73 msec task-clock # 0.999 CPUs utilized ( +- 0.53% )
172 context-switches # 25.965 M/sec ( +- 29.89% )
0 cpu-migrations # 0.042 M/sec ( +- 56.54% )
31073 page-faults # 4690.754 M/sec ( +- 0.08% )
26538711696 cycles # 4006230.292 GHz ( +- 0.53% ) (83.31%)
2017496807 stalled-cycles-frontend # 7.60% frontend cycles idle ( +- 0.93% ) (83.32%)
13403650062 stalled-cycles-backend # 50.51% backend cycles idle ( +- 0.33% ) (33.37%)
19770706799 instructions # 0.74 insn per cycle
# 0.68 stalled cycles per insn ( +- 0.04% ) (50.04%)
4419821812 branches # 667207369.714 M/sec ( +- 0.03% ) (66.69%)
121741669 branch-misses # 2.75% of all branches ( +- 0.28% ) (83.34%)
6.6283 +- 0.0358 seconds time elapsed ( +- 0.54% )
```
patch, with reclustering but without filtering (i.e. outputting all the stable *and* unstable clusters) {F8303586}
```
$ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-all.html
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 43970 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-new-all.html'
...
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 43970 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-new-all.html'
Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-all.html' (25 runs):
6475.29 msec task-clock # 0.999 CPUs utilized ( +- 0.31% )
213 context-switches # 32.952 M/sec ( +- 23.81% )
1 cpu-migrations # 0.130 M/sec ( +- 43.84% )
31287 page-faults # 4832.057 M/sec ( +- 0.08% )
25939086577 cycles # 4006160.279 GHz ( +- 0.31% ) (83.31%)
1958812858 stalled-cycles-frontend # 7.55% frontend cycles idle ( +- 0.68% ) (83.32%)
13218961512 stalled-cycles-backend # 50.96% backend cycles idle ( +- 0.29% ) (33.37%)
19752995402 instructions # 0.76 insn per cycle
# 0.67 stalled cycles per insn ( +- 0.04% ) (50.04%)
4417079244 branches # 682195472.305 M/sec ( +- 0.03% ) (66.70%)
121510065 branch-misses # 2.75% of all branches ( +- 0.19% ) (83.34%)
6.4832 +- 0.0229 seconds time elapsed ( +- 0.35% )
```
Funnily, *this* measurement shows that said reclustering actually improved performance.
patch, with reclustering, only the stable clusters {F8303594}
```
$ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-stable.html
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 43970 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-new-stable.html'
...
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 43970 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-new-stable.html'
Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-stable.html' (25 runs):
6387.71 msec task-clock # 0.999 CPUs utilized ( +- 0.13% )
133 context-switches # 20.792 M/sec ( +- 23.39% )
0 cpu-migrations # 0.063 M/sec ( +- 61.24% )
31318 page-faults # 4903.256 M/sec ( +- 0.08% )
25591984967 cycles # 4006786.266 GHz ( +- 0.13% ) (83.31%)
1881234904 stalled-cycles-frontend # 7.35% frontend cycles idle ( +- 0.25% ) (83.33%)
13209749965 stalled-cycles-backend # 51.62% backend cycles idle ( +- 0.16% ) (33.36%)
19767554347 instructions # 0.77 insn per cycle
# 0.67 stalled cycles per insn ( +- 0.04% ) (50.03%)
4417480305 branches # 691618858.046 M/sec ( +- 0.03% ) (66.68%)
118676358 branch-misses # 2.69% of all branches ( +- 0.07% ) (83.33%)
6.3954 +- 0.0118 seconds time elapsed ( +- 0.18% )
```
Performance improved even further?! Makes sense i guess, less clusters to print.
patch, with reclustering, only the unstable clusters {F8303601}
```
$ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-unstable.html -analysis-display-unstable-clusters
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 43970 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-new-unstable.html'
...
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 43970 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-new-unstable.html'
Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-unstable.html -analysis-display-unstable-clusters' (25 runs):
6124.96 msec task-clock # 1.000 CPUs utilized ( +- 0.20% )
194 context-switches # 31.709 M/sec ( +- 20.46% )
0 cpu-migrations # 0.039 M/sec ( +- 49.77% )
31413 page-faults # 5129.261 M/sec ( +- 0.06% )
24536794267 cycles # 4006425.858 GHz ( +- 0.19% ) (83.31%)
1676085087 stalled-cycles-frontend # 6.83% frontend cycles idle ( +- 0.46% ) (83.32%)
13035595603 stalled-cycles-backend # 53.13% backend cycles idle ( +- 0.16% ) (33.36%)
18260877653 instructions # 0.74 insn per cycle
# 0.71 stalled cycles per insn ( +- 0.05% ) (50.03%)
4112411983 branches # 671484364.603 M/sec ( +- 0.03% ) (66.68%)
114066929 branch-misses # 2.77% of all branches ( +- 0.11% ) (83.32%)
6.1278 +- 0.0121 seconds time elapsed ( +- 0.20% )
```
This tells us that the actual `-analysis-inconsistencies-output-file=` outputting only takes ~0.4 sec for 43970 benchmark points (3 whole sweeps)
(Also, wow this is fast, it used to take several minutes originally)
Fixes [[ https://bugs.llvm.org/show_bug.cgi?id=40715 | PR40715 ]].
Reviewers: courbet, gchatelet
Reviewed By: courbet
Subscribers: tschuett, jdoerfert, llvm-commits, RKSimon
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D58355
llvm-svn: 354441
|
|
Summary:
D57000 / [[ https://bugs.llvm.org/show_bug.cgi?id=37698 | PR37698 ]] added support for measuring of the inverse throughput.
But the support for the analysis was not added.
This attempts to fix that. (analysis done o bdver2 / piledriver)
First, small-scale experiment:
```
$ ./bin/llvm-exegesis -num-repetitions=10000 -mode=inverse_throughput -opcode-name=BSF64rr
Check generated assembly with: /usr/bin/objdump -d /tmp/snippet-d0acdd.o
---
mode: inverse_throughput
key:
instructions:
- 'BSF64rr RAX RDX'
config: ''
register_initial_values:
- 'RDX=0x0'
cpu_name: bdver2
llvm_triple: x86_64-unknown-linux-gnu
num_repetitions: 10000
measurements:
- { key: inverse_throughput, value: 3.0278, per_snippet_value: 3.0278 }
error: ''
info: instruction has no tied variables picking Uses different from defs
assembled_snippet: 48BA0000000000000000480FBCC2480FBCC2480FBCC2480FBCC2480FBCC2480FBCC2480FBCC2480FBCC2480FBCC2480FBCC2480FBCC2480FBCC2480FBCC2480FBCC2480FBCC2480FBCC2C3
...
```
If we plug `bsfq %r12, %r10` into llvm-mca:
https://godbolt.org/z/ZtOyhJ
```
Dispatch Width: 4
uOps Per Cycle: 3.00
IPC: 0.50
Block RThroughput: 2.0
```
So RThroughput mismatch exists.
Now, let's upscale and analyse:
{F8207148}
`$ ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=1.0 -benchmarks-file=/tmp/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html`:
{F8207172}
{F8207197}
And if we now look at https://www.agner.org/optimize/instruction_tables.pdf,
`Reciprocal throughput` for `BSF r,r` is listed as `3`.
Yay?
Reviewers: courbet, gchatelet
Reviewed By: courbet
Subscribers: tschuett, RKSimon, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D57647
llvm-svn: 353023
|
|
to reflect the new license.
We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.
Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.
llvm-svn: 351636
|
|
double each time.
Summary:
Test data: 500kLOC of benchmark.yaml, 23Mb. (that is a subset of the actual uops benchmark i was trying to analyze!)
Old time: (D54382)
```
Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html' (16 runs):
9024.354355 task-clock (msec) # 1.000 CPUs utilized ( +- 0.18% )
...
9.0262 +- 0.0161 seconds time elapsed ( +- 0.18% )
```
New time:
```
Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html' (16 runs):
8996.541057 task-clock (msec) # 0.999 CPUs utilized ( +- 0.19% )
...
9.0045 +- 0.0172 seconds time elapsed ( +- 0.19% )
```
-~0.3%, not that much. But this isn't the important part.
Old:
* calls to allocation functions: 2109712
* temporary allocations: 33112
* bytes allocated in total (ignoring deallocations): 4.43 GB
New:
* calls to allocation functions: 2095345 (-0.68%)
* temporary allocations: 18745 (-43.39% !!!)
* bytes allocated in total (ignoring deallocations): 4.31 GB (-2.71%)
Reviewers: courbet, MaskRay, RKSimon, gchatelet, john.brawn
Reviewed By: courbet
Subscribers: tschuett, llvm-commits
Differential Revision: https://reviews.llvm.org/D54383
llvm-svn: 347199
|
|
Summary:
Test data: 500kLOC of benchmark.yaml, 23Mb. (that is a subset of the actual uops benchmark i was trying to analyze!)
Old time: (D54381)
```
$ time ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html &> /dev/null
real 0m10.487s
user 0m9.745s
sys 0m0.740s
```
New time:
```
$ time ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html &> /dev/null
real 0m9.599s
user 0m8.824s
sys 0m0.772s
```
Not that much, around -9%. But that is not the good part yet, again.
Old:
* calls to allocation functions: 3347676
* temporary allocations: 277818
* bytes allocated in total (ignoring deallocations): 10.52 GB
New:
* calls to allocation functions: 2109712 (-36%)
* temporary allocations: 33112 (-88%)
* bytes allocated in total (ignoring deallocations): 4.43 GB (-58% *sic*)
Reviewers: courbet, MaskRay, RKSimon, gchatelet, john.brawn
Reviewed By: courbet, MaskRay
Subscribers: tschuett, llvm-commits
Differential Revision: https://reviews.llvm.org/D54382
llvm-svn: 347198
|
|
Summary:
This allows simplifying references of llvm::foo with foo when the needs
come in the future.
Reviewers: courbet, gchatelet
Reviewed By: gchatelet
Subscribers: javed.absar, tschuett, llvm-commits
Differential Revision: https://reviews.llvm.org/D53455
llvm-svn: 344922
|
|
llvm-svn: 344774
|
|
That was not the proper fix: the variable is used in debug mode.
llvm-svn: 343685
|
|
llvm-svn: 343684
|
|
llvm-svn: 343682
|
|
Summary: See PR38884.
Reviewers: gchatelet
Subscribers: tschuett, RKSimon, llvm-commits
Differential Revision: https://reviews.llvm.org/D52825
llvm-svn: 343680
|
|
Summary: The key is now the resource name, not the resource id.
Reviewers: gchatelet
Subscribers: tschuett, RKSimon, llvm-commits
Differential Revision: https://reviews.llvm.org/D52607
llvm-svn: 343208
|
|
Summary: The convenience wrapper in STLExtras is available since rL342102.
Reviewers: dblaikie, javed.absar, JDevlieghere, andreadb
Subscribers: MatzeB, sanjoy, arsenm, dschuff, mehdi_amini, sdardis, nemanjai, jvesely, nhaehnle, sbc100, jgravelle-google, eraman, aheejin, kbarton, JDevlieghere, javed.absar, gbedwell, jrtc27, mgrang, atanasyan, steven_wu, george.burgess.iv, dexonsmith, kristina, jsji, llvm-commits
Differential Revision: https://reviews.llvm.org/D52573
llvm-svn: 343163
|
|
Summary:
THis is a backwards-compatible change (existing files will work as
expected).
See PR39082.
Reviewers: gchatelet
Subscribers: tschuett, llvm-commits
Differential Revision: https://reviews.llvm.org/D52546
llvm-svn: 343108
|
|
Summary: See PR38936 for context.
Reviewers: gchatelet
Subscribers: tschuett, llvm-commits
Differential Revision: https://reviews.llvm.org/D52500
llvm-svn: 343081
|