Age | Commit message (Collapse) | Author | Files | Lines |
|
I don't know what this emulation does exactly, but it missing a break
statement seems kind of obvious based on the 32-bit case above it.
|
|
This helps the compiler with optimization and fixes fallthru warnings.
|
|
|
|
The compiler pointed out that we're testing LAST_TIMER_REG and
LAST_COUNTER which are the same value ... and that's because we
set LAST_TIMER_REG to the wrong register. Fix the typo.
|
|
The compiler pointed out we checked AZ twice. Sort by name to avoid
that in the future, and to make it clearer that we have coverage of
all the bits. And add the bits we were missing.
The order here doesn't matter as it's just turning a pointer into a
human readable string when store tracing is enabled.
|
|
The scache vars aren't used by ports in the pbb & fast codepaths,
nor are they documented as inputs to the callbacks, so delete them
to avoid unused variable compiler warnings.
|
|
Pull out the common parts of the genmloop invocation into the common
code. This will make it easier to add more, and make the per-port
differences a little more obvious.
|
|
Fix one minor pointer-sign warning to enable warnings in general
for this file. Reading the data as signed and then returning it
as unsigned should be functionally the same in this case.
|
|
This function only uses prev_abuf, not abuf, and doesn't inline code
from the various ports on the fly, so abuf will never be used.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
These fix unused variable warnings in the generated sim.
|
|
This came up testing the CRC optimization work from Mariam@RAU.
Basically to optimize some CRC loops into table lookups or carryless
multiplies, we may need to do a bit reflection, which on the mcore
processor is done using a rotate instruction.
Unfortunately the simulator implementation of rotates has the exact same
problem as we saw with right shifts. The input value may have been sign
extended from 32 to 64 bits. When we rotate the extended value, we get
those sign extension bits and thus the wrong result.
The fix is the same. Rather than using a "long", use a uint32_t for the
type of the temporary. This fixes a handful of tests in the GCC testsuite:
|
|
We already build cleanly with these.
|
|
The push/pop insns only have 2 operands, so delete unused "c".
The pushret/popret insns use 2 operands, but they don't implement the
logic directly, they call the push/pop implementations. So delete the
unused "a" & "b".
|
|
The pmuls encoding is incorrect -- it looks like a copy & paste error
from the padd pmuls variant. The SuperH software manual covers this.
On the flip side, the manual lists pwsb & pwad as insns that exist,
but no description of what they do, what the insn name means, or the
actual encoding. Our sim implementation stubs them both out as nops.
Let's mark the fields to avoid unused variable warnings.
|
|
Mark a few things static/const, and clean up trailing whitespace.
|
|
These should be using the BF52x set of ports, not BF51x.
|
|
|
|
Fix a few problems caught by compiler warnings:
* Some of the asr & lsr insns were setting up the c state flag,
but then forgetting to set it in the PSW. Add it like the other
asr & lsr variants.
* Some of the dmulh insns were multiplying one of the source regs
against itself instead of against the other source reg.
* The sat16_cmp parallel insn was using the wrong register in the
compare -- the reg1 src/dst pair are used in the sat16 op, and
the reg2 src/dst pair are used in the add op.
|
|
The migration to local.mk in commit 0a129eb19a773d930d60b084209570f663db2053
accidentally listed the deps for all mloop steps as mloop.in instead of the
various variants that m32r uses.
Reported-by: Simon Marchi <simon.marchi@polymtl.ca>
|
|
The mloop.in code does this, but these variants do not. Use it to
avoid unused function warnings. The fast_p logic in these files
also looks off, but that'll require a bit more work to fixup.
CC m32r/mloopx.o
m32r/mloopx.c:37:1: error: ‘m32rxf_fill_argbuf_tp’ defined but not used [-Werror=unused-function]
37 | m32rxf_fill_argbuf_tp (const SIM_CPU *cpu, ARGBUF *abuf,
| ^~~~~~~~~~~~~~~~~~~~~
CC m32r/mloop2.o
m32r/mloop2.c:37:1: error: ‘m32r2f_fill_argbuf_tp’ defined but not used [-Werror=unused-function]
37 | m32r2f_fill_argbuf_tp (const SIM_CPU *cpu, ARGBUF *abuf,
| ^~~~~~~~~~~~~~~~~~~~~
Reported-by: Simon Marchi <simon.marchi@polymtl.ca>
Tested-By: Simon Marchi <simon.marchi@polymtl.ca>
|
|
When generating semantics.c from .igen source files, indenting the code
makes it more readable, but confuses compiler diagnostics. The latter
is a bit more important than the former, so bias towards that.
For example, with an introduced error, we can see w/gcc-13:
(before this change)
CC mn10300/semantics.o
../../../sim/mn10300/am33-2.igen: In function ‘semantic_dcpf_D1a’:
../../../sim/mn10300/am33-2.igen:11:5: error: ‘srcreg’ undeclared (first use in this function)
11 | srcreg = translate_rreg (SD_, RN2);
| ^~~~~~
(with this change)
CC mn10300/semantics.o
../../../sim/mn10300/am33-2.igen: In function ‘semantic_dcpf_D1a’:
../../../sim/mn10300/am33-2.igen:11:3: error: ‘srcreg’ undeclared (first use in this function)
11 | srcreg = translate_rreg (SD_, RN2);
| ^~~~~~
|
|
Running the H8 port through the GCC testsuite currently takes 4h 30m on my
fastest server -- that's roughly 1.5hrs per multilib tested and many tests are
disabled for various reasons.
To put that 1.5hr/multilib in perspective, that's roughly 3X the time for other
embedded targets. Clearly something isn't working as well as it should.
A bit of digging with perf shows that we're spending a crazy amount of time
decoding instructions in the H8 simulator. It's not hard to see why --
basically we take a blob of instruction data, then try to match it to every
instruction in the H8 opcode table starting at the beginning. That table has
~8000 entries (each different addressing mode is considered a different
instruction in the table).
Naturally my first thought was to sort the table and use a binary search to
find the right entry. That's made excessively complex due to the encoding on
the H8. Just getting the sort right would be much more complex than I'd
consider advisable.
Another thought was to build a mapping to the right entry for all the
instructions that can be disambiguated based on the first nibble (4 bits) of
instruction data and a mapping for those which can be disambiguated based on
the first byte of instruction data.
That seemed feasible until I realized that the H8/SX did some truly horrid
things with encoding branches in the 0x4XYY opcode space. It uses an "always
zero" bit in the offset to encode new semantic information. So we can't select
on just 0x4X. Ugh!
We could always to a custom decoder. I've done several through the years, they
can be very fast. But no way I can justify the time to do that.
So what I settled on was to first sort the opcode table by the first nibble,
then find the index of the first instruction for each nibble. Decoding uses
that index to start its search. This cuts the overall build/test by more than
half.
Next I adjusted the sort so that instructions that are not available on the
current sub architecture are put at the end of the table. This shaves another
~15% off the total cycle time.
The net of the two changes is on my fastest server we've gone from 4:30 to 1:40
running the GCC testsuite. Same test results before/after, of course. It's
still not fast, but it's a hell of a lot better.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|