Age | Commit message | Author | Files | Lines |
|
Upstream ccan uses (list, existing entry, new entry) parameter ordering
rather than (list, new entry, existing entry) ordering.
Switch these to make syncing with upstream simpler.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
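To illustrate the (list, existing entry, new entry) ordering described above, here is a minimal sketch. The `struct node`, `struct list`, `add_after()` and `add_before()` names are hypothetical stand-ins for illustration only; this is not the ccan implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical minimal circular list; not the ccan implementation. */
struct node {
	struct node *prev, *next;
};

struct list {
	struct node head;	/* sentinel entry */
};

static void list_init(struct list *l)
{
	l->head.prev = l->head.next = &l->head;
}

/* (list, existing entry, new entry): link new_node right after existing. */
static void add_after(struct list *l, struct node *existing,
		      struct node *new_node)
{
	(void)l;	/* a debug build could use the list to check membership */
	assert(existing && new_node);
	new_node->prev = existing;
	new_node->next = existing->next;
	existing->next->prev = new_node;
	existing->next = new_node;
}

/* (list, existing entry, new entry): link new_node right before existing. */
static void add_before(struct list *l, struct node *existing,
		       struct node *new_node)
{
	add_after(l, existing->prev, new_node);
}
```

With this ordering the entry already on the list always comes second and the entry being added comes last, for both helpers.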
|
|
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
|
|
SPDX makes it a simpler diff.
I have audited the commit history of each file to ensure that they are
exclusively authored by IBM and thus we have the right to relicense.
The motivation behind this is twofold:
1) We want to enable experiments with coreboot, which is GPLv2 licensed
2) An upcoming firmware component wants to incorporate code from skiboot
and code from the Linux kernel, which is GPLv2 licensed.
I have gone through the IBM internal way of gaining approval for this.
The following files are not exclusively authored by IBM, so are *not*
included in this update (I will be seeking approval from contributors):
core/direct-controls.c
core/flash.c
core/pcie-slot.c
external/common/arch_flash_unknown.c
external/common/rules.mk
external/gard/Makefile
external/gard/rules.mk
external/opal-prd/Makefile
external/pflash/Makefile
external/xscom-utils/Makefile
hdata/vpd.c
hw/dts.c
hw/ipmi/ipmi-watchdog.c
hw/phb4.c
include/cpu.h
include/phb4.h
include/platform.h
libflash/libffs.c
libstb/mbedtls/sha512.c
libstb/mbedtls/sha512.h
platforms/astbmc/barreleye.c
platforms/astbmc/garrison.c
platforms/astbmc/mihawk.c
platforms/astbmc/nicole.c
platforms/astbmc/p8dnu.c
platforms/astbmc/p8dtu.c
platforms/astbmc/p9dsu.c
platforms/astbmc/vesnin.c
platforms/rhesus/ec/config.h
platforms/rhesus/ec/gpio.h
platforms/rhesus/gpio.c
platforms/rhesus/rhesus.c
platforms/astbmc/talos.c
platforms/astbmc/romulus.c
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
[oliver: fixed up the drift]
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
|
|
The local_alloc() function tries to find the first allocatable local region
and allocate memory from it. If that region doesn't have sufficient memory,
it logs the warning below and tries to allocate memory from the next
available region.
Warning:
--------
[ 268.346728058,3] mem_alloc(0x800000, 0x800000, "hw/xive.c:1630", ibm,firmware-allocs-memory@0) failed !
[ 268.346732718,6] Memory regions:
[ 268.346734353,6] 0x000030500000..000030ffffff : ibm,firmware-heap
[ 268.346833468,5] 420 allocs of 0x00000058 bytes at core/device.c:41 (total 0x9060)
[ 268.346978805,5] 2965 allocs of 0x00000038 bytes at core/device.c:424 (total 0x28898)
[ 268.347035614,5] 434 allocs of 0x00000040 bytes at core/device.c:424 (total 0x6c80)
[ 268.347093567,5] 365 allocs of 0x00000028 bytes at libc/string/strdup.c:23 (total 0x3908)
[ 268.347136818,5] 84 allocs of 0x00000048 bytes at core/device.c:424 (total 0x17a0)
[ 268.347179123,5] 21 allocs of 0x00000030 bytes at libc/string/strdup.c:23 (total 0x3f0)
....
....
Hostboot reserves memory for various nodes and passes this information via HDAT.
In some cases there will be small memory holes between two reservations
(e.g. 16MB of free space between two reservations).
The add_region() function adds a new region to the head of the regions list.
mem_region_init() adds OPAL regions first and then hostboot regions, so
these smaller regions end up at the head of the list. If we have smaller
free regions then we may hit the above warning.
Hostboot uses the top of memory for various reservations. So if we sort the
memory regions, the allocator will use the bigger region (the region after
OPAL memory) for local allocation, and we will not hit the above warning.
Memory region layout with this patch:
0 - 756MB : OS reserved region (for loading kernel)
756MB - ~856MB : OPAL memory (actual size depends on PIR)
856MB - ~956MB : Memory for MPIPL (actual size depends on OPAL runtime size)
956MB - ... : We will have free memory after 956MB which we can use
for local_alloc(). Typically this will be multiple GBs.
So it works fine.
.... - top_mem: Hostboot reservations + small holes
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
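The effect of keeping the region list sorted can be sketched as below. This is only an illustration under assumptions: `struct region` and `region_insert_sorted()` are hypothetical names, and a plain singly linked list stands in for whatever list type the real code uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical region descriptor, for illustration only. */
struct region {
	const char *name;
	uint64_t start;
	uint64_t len;
	struct region *next;
};

/* Insert r so the list stays ordered by ascending start address. */
static void region_insert_sorted(struct region **head, struct region *r)
{
	struct region **pp = head;

	assert(head && r);
	/* Walk past every region that starts below the new one. */
	while (*pp && (*pp)->start < r->start)
		pp = &(*pp)->next;
	r->next = *pp;
	*pp = r;
}
```

With head insertion, a small hole near the top of memory that is added last would be tried first. With sorted insertion, the lower and typically much larger region is reached first, regardless of the order in which regions were added.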
|
|
This fixes quite a few sparse endian annotations across the tree.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
|
|
BUILD_ASSERT can not be used for constants generated by the assembler
or linker. This results in variable length arrays that do not catch
the failure condition. This was caught by sparse.
Remove these and add some equivalent as/ld checks which actually do
the right thing.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
|
|
The ibm,chip-id property is not sufficient for Linux to work out the NUMA
node that a pmem region is placed on. Add any nodes that are compatible
with "pmem-region" to the pass where we add affinity information to the
normal memory@ nodes.
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
|
|
Use Software Package Data Exchange (SPDX) to indicate license for each
file that is unique to skiboot.
At the same time, ensure the (C) who and years are correct.
See https://spdx.org/
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
[oliver: Added a few missing files]
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
|
|
The static analysis tool is arguably wrong and should go away.
But... I'm sick of keeping coming back to it and reviewing the false
positives enough to make a slight change to where ifdefs are.
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
Mambo image payloads get overwritten by the OS and by
fast reboot memory clearing because they have no region
defined. Add them, which allows fast reboot to work.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[stewart: fix up 'make check']
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
Currently we print one line for each allocation done at runtime when
dumping the memory allocations. We do a few thousand allocations at
boot so this can result in a huge amount of text being printed which
is a) slow to print, and b) can result in the log buffer overflowing
which destroys otherwise useful information.
This patch adds a de-duplication to this memory allocation dump by
merging "similar" allocations (same location, same size) into one.
Unfortunately, the algorithm used to do the de-duplication is quadratic,
but considering we only dump the allocations in the event of a fatal
error I think this is acceptable. I also did some benchmarking and found
that on a ZZ it takes ~3ms to do a dump with 12k allocations. On a Zaius
it's slightly longer at about ~10ms for 10k allocs. However, the
difference there was due to the output being written to the UART.
This patch also bumps the log level to PR_NOTICE. PR_INFO messages are
suppressed at the default log level, which probably isn't something you
want considering we only dump the allocations when we run out of skiboot
heap space.
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
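The quadratic merge of "similar" allocations (same location, same size) can be sketched roughly as follows. All names here (`struct alloc_rec`, `struct alloc_summary`, `summarise_allocs()`) are hypothetical, and a flat array stands in for the allocator's real bookkeeping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record of one live allocation. */
struct alloc_rec {
	const char *location;	/* e.g. "core/device.c:424" */
	size_t size;
	bool counted;		/* already merged into a summary row */
};

/* One output row: "<count> allocs of <size> bytes at <location>". */
struct alloc_summary {
	const char *location;
	size_t size;
	size_t count;
};

/*
 * Merge records with the same location and size. For every record not
 * yet counted, scan the rest of the array for matches: O(n^2) overall.
 * 'out' must have room for n rows. Returns the number of rows written.
 */
static size_t summarise_allocs(struct alloc_rec *recs, size_t n,
			       struct alloc_summary *out)
{
	size_t rows = 0;

	for (size_t i = 0; i < n; i++) {
		if (recs[i].counted)
			continue;
		out[rows].location = recs[i].location;
		out[rows].size = recs[i].size;
		out[rows].count = 1;
		for (size_t j = i + 1; j < n; j++) {
			if (recs[j].counted || recs[j].size != recs[i].size)
				continue;
			if (strcmp(recs[j].location, recs[i].location) != 0)
				continue;
			recs[j].counted = true;
			out[rows].count++;
		}
		rows++;
	}
	return rows;
}
```

The sketch needs no extra memory beyond one flag per record, which suits a dump that runs when the heap is already exhausted; the cost is the nested scan.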
|
|
Back in the dim dark past, mem_check() was written to take the
assumption that mem regions need to be sizeof(alloc_hdr) aligned.
I can't see any real reason for this, so change it to sizeof(long)
aligned as we count space by number of longs, so at least that kind of
makes sense.
We hit this assert in a future patch when preserving BOOTKERNEL across
fast reboots.
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
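A check of the relaxed alignment rule might look like the sketch below. The function name and its parameters are assumptions for illustration; the real check lives inside mem_check() and may be expressed differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: a region passes if both its start and its length
 * are multiples of sizeof(long), since space is counted in longs.
 */
static bool region_is_long_aligned(uint64_t start, uint64_t len)
{
	return (start % sizeof(long)) == 0 && (len % sizeof(long)) == 0;
}
```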
|
|
This can help with debugging when trying to do node local
allocations.
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
For many systems, scanning PCI takes about as much time as
zeroing all of RAM, so we may as well do them at the same time
and cut a few seconds off the total fast reboot time.
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
Arbitrarily pick 16GB as the unit of parallelism, and
split up clearing memory into jobs and schedule them
node-local to the memory (or on node 0 if we can't
work that out because it's the memory up to SKIBOOT_BASE)
This seems to cut at least ~40% time from memory zeroing on
fast-reboot on a 256GB Boston system.
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
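Splitting a range into 16GB units of work can be sketched as below. `struct clear_job` and `split_clear_jobs()` are hypothetical names, and the sketch only computes the chunks; scheduling each chunk on a CPU local to that memory is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CLEAR_CHUNK_BYTES (16ULL << 30)	/* 16GB unit of parallelism */

/* Hypothetical description of one clearing job. */
struct clear_job {
	uint64_t start;
	uint64_t len;
};

/*
 * Split [start, start + len) into chunks of at most 16GB.
 * Writes up to max_jobs entries and returns how many were written.
 */
static size_t split_clear_jobs(uint64_t start, uint64_t len,
			       struct clear_job *jobs, size_t max_jobs)
{
	size_t n = 0;

	while (len > 0 && n < max_jobs) {
		uint64_t this_len = len < CLEAR_CHUNK_BYTES ?
				    len : CLEAR_CHUNK_BYTES;

		jobs[n].start = start;
		jobs[n].len = this_len;
		start += this_len;
		len -= this_len;
		n++;
	}
	return n;
}
```

For example, a 40GB range yields two 16GB jobs followed by one 8GB job.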
|
|
We play funny business with printf format specifiers because
of how we do unit tests.
Fixes: c32943bfc1e254176ecab564fdb4752403a48cab
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Stack allocation first allocates a memory region sized to hold stacks
for all possible CPUs up to the maximum PIR of the architecture, zeros
the region, then initialises all stacks. Max PIR is 32768 on POWER9,
which is 512MB for stacks.
The stack region is then shrunk after CPUs are discovered, but this is
a bit of a hack, and it leaves a hole in the memory allocation regions
as it's done after mem regions are initialised.
0x000000000000..00002fffffff : ibm,os-reserve - OS
0x000030000000..0000303fffff : ibm,firmware-code - OPAL
0x000030400000..000030ffffff : ibm,firmware-heap - OPAL
0x000031000000..000031bfffff : ibm,firmware-data - OPAL
0x000031c00000..000031c0ffff : ibm,firmware-stacks - OPAL
*** gap ***
0x000051c00000..000051d01fff : ibm,firmware-allocs-memory@0 - OPAL
0x000051d02000..00007fffffff : ibm,firmware-allocs-memory@0 - OS
0x000080000000..000080b3cdff : initramfs - OPAL
0x000080b3ce00..000080b7cdff : ibm,fake-nvram - OPAL
0x000080b7ce00..0000ffffffff : ibm,firmware-allocs-memory@0 - OS
This change moves zeroing into the per-cpu stack setup. The boot CPU
stack is set up based on the current PIR. Then the size of the stack
region is set, by discovering the maximum PIR of the system from the
device tree, before mem regions are initialised.
This results in all memory being accounted within memory regions,
and less memory fragmentation of OPAL allocations.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
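The sizes quoted above follow from simple arithmetic: 512MB spread over 32768 stacks is 16KB per stack. The sketch below encodes that; the 16KB figure is derived from the numbers in this message rather than stated in it, and the macro and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed per-CPU stack size, derived as 512MB / 32768. */
#define STACK_SIZE_BYTES (16 * 1024ULL)

/* Size of a stack region that holds one stack per possible PIR value. */
static uint64_t stack_region_bytes(uint64_t nr_stack_slots)
{
	return nr_stack_slots * STACK_SIZE_BYTES;
}
```

Under the same assumption, the 64KB ibm,firmware-stacks region in the listing above (0x31c00000..0x31c0ffff) corresponds to four stacks, and the gap that follows it (0x31c00000 to 0x51c00000) is exactly 512MB.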
|
|
This improves the security and predictability of the fast reboot
environment.
There can not be a secure fence between fast reboots, because a
malicious OS can modify the firmware itself. However a well-behaved
OS can have a reasonable expectation that OS memory regions it has
modified will be cleared upon fast reboot.
The memory is zeroed after all other CPUs come up from fast reboot,
just before the new kernel is loaded and booted into. This allows
image preloading to run concurrently, and will allow parallelisation
of the clearing in future.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Run the mem_region sanity checkers before proceeding with fast
reboot.
This is the beginning of proactive sanity checks on opal data
for fast reboot (which complements the reactive disable_fast_reboot
cases). Re-using and sharing any kind of debug code and unit test code
for this is encouraged.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
OPAL exposes reserved memory regions through the device tree in both new
(nodes) and old (properties) formats.
However, the names used for these don't match - we use a generated cell
address for the nodes, but the plain region name for the properties.
This change, heavily based on code from Oliver O'Halloran
<oohall@gmail.com>, reworks the dt-generation code to firstly generate
the new-format nodes, then uses those same names to generate the
property data.
Reported-by: Deb McLemore <debmc@linux.vnet.ibm.com>
CC: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
[stewart: fix test case]
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Just print an error if a region failed to add - at least then there will
be a trace somewhere about the problem.
Fixes: CID 147251
Signed-off-by: Cyril Bur <cyril.bur@au1.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Regions with the no-map property should be handled separately to
"normal" firmware reservations. When creating mem_region regions
from a reserved-memory DT node use the no-map property to select
the right reservation type.
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
The only sensible thing to do if this fails is to abort() as we've
likely just failed reserving reserved memory regions, and nothing
good comes from that.
Found by static analysis
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
90% of what we print isn't useful to a normal user. This
dramatically reduces the amount of messages printed by
OPAL in normal circumstances.
We still need to add a way to bump the log level at boot
based on a BMC scratch register or some HDAT property.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Currently memory reservations are parsed, but since they are not
processed until mem_region_init() they don't appear in the output
device tree blob. Several bugs have been found with memory reservations
so we want them to be part of the test output.
Add them and clean up several usages of printf() since we want only the
dtb to appear in standard out.
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Add a new type of memory reservation that indicates a memory region is
only used by hardware and should not be touched by software. This is
needed for the in-memory tracing buffers. These reservations have the
"no-map" property which indicates that the host kernel should not setup
any virtual address mappings that cover this range, unless of course a
device driver does so explicitly.
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Currently all existing reservations are made by hostboot itself or on
behalf of some other part of system firmware (e.g. the OCCs). We want
to add a "true" hardware reservation type that should not be touched
by the host OS. To prepare for that this patch renames the existing
reservation type to reflect its actual usage.
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
New memory regions need to be either fully contained by an existing
region or completely disjoint. Right now we just fail silently or crash
with an assert, which is less than helpful. Printing some basic
information, such as the names of the overlapping regions, is helpful.
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
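The rule that a new region must be either fully contained by an existing region or completely disjoint from it can be sketched as a small classifier. The enum and function names are hypothetical and only illustrate the three cases; a caller would report the PARTIAL case together with both region names.

```c
#include <assert.h>
#include <stdint.h>

enum overlap {
	OVERLAP_DISJOINT,	/* no bytes in common: fine */
	OVERLAP_CONTAINED,	/* new region lies inside the existing one: fine */
	OVERLAP_PARTIAL,	/* anything else: report it */
};

/* Classify how [new_start, +new_len) relates to [ex_start, +ex_len). */
static enum overlap classify_overlap(uint64_t ex_start, uint64_t ex_len,
				     uint64_t new_start, uint64_t new_len)
{
	uint64_t ex_end = ex_start + ex_len;
	uint64_t new_end = new_start + new_len;

	if (new_end <= ex_start || new_start >= ex_end)
		return OVERLAP_DISJOINT;
	if (new_start >= ex_start && new_end <= ex_end)
		return OVERLAP_CONTAINED;
	return OVERLAP_PARTIAL;
}
```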
|
|
When a new memory region is added (e.g. for memory reserved by firmware)
the list of existing memory regions is iterated through and a cut-out is
made in any existing region that overlaps with the new one. Prior to the
HDAT reservations being made the region init process was always:
1) Create regions from the memory@<addr> DT nodes. (mostly large)
2) Create reserved regions from the device-tree. (mostly small)
When adding new regions we have assumed that the new region will only
ever intersect with at most one existing region, which it will split.
Adding reservations inside the HDAT parser breaks this because when
adding the memory@<addr> node regions we can potentially overlap with
multiple reserved regions. This patch fixes this by maintaining a
separate list of memory reservations and delaying merging them until
after the normal memory init has finished, similar to how DT
reservations are handled.
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
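The cut-out step, where a reservation splits the single existing region that contains it, can be sketched as follows. `struct range` and `cut_out()` are hypothetical names; the sketch covers only the one-region case that the original assumption relied on.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct range {
	uint64_t start;
	uint64_t len;
};

/*
 * Remove 'hole' from 'existing' (hole must lie fully inside it).
 * Writes the zero, one or two remaining pieces to out[] and returns
 * how many pieces there are.
 */
static size_t cut_out(struct range existing, struct range hole,
		      struct range out[2])
{
	uint64_t ex_end = existing.start + existing.len;
	uint64_t hole_end = hole.start + hole.len;
	size_t n = 0;

	assert(hole.start >= existing.start && hole_end <= ex_end);

	/* Piece below the hole, if any. */
	if (hole.start > existing.start)
		out[n++] = (struct range){ existing.start,
					   hole.start - existing.start };
	/* Piece above the hole, if any. */
	if (hole_end < ex_end)
		out[n++] = (struct range){ hole_end, ex_end - hole_end };
	return n;
}
```

A region being added on top of several earlier reservations would need this applied once per reservation, which is why deferring the reservations until the large memory regions exist keeps the simple one-region logic valid.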
|
|
Don't poison chunks that are already free and poison regions on
first allocation. This speeds things up dramatically.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
And use it to control the stack checker, memory poisoning and
CCAN's list debugging.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Found by smatch static analysis (http://smatch.sourceforge.net/):
core/mem_region.c:561 mem_check() warn: inconsistent indenting
core/mem_region.c:569 mem_check() warn: inconsistent indenting
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
We create our own inttypes.h to get the correct printf formatting for
64bit numbers.
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
SLW image)
Memory regions in skiboot have an interesting life cycle. First, we get
a bunch from the initial device tree or hdat specifying some existing
reserved ranges (as well as adding some of our own if they're missing)
but we also get ranges for the entirety of RAM.
The idea is that we can do node local allocations for per node resources
(which we do) and then, just prior to booting linux, we copy the reserved
memory regions to expose to linux along with a set of reserved regions
to cover the node local allocations.
The problem was that mem_range_is_reserved() was wanting subtly different
semantics for memory region type than region_is_reserved() provided.
That is, we were overriding the meaning of REGION_SKIBOOT_HEAP to mean both
"this is reserved by skiboot" *and* "this is a memory region that covers
all of memory and will be shrunk to cover just the memory we have allocated
for it just before we boot the payload (linux)".
So what would happen is we would ask "hey, is the memory holding the SLW
image reserved?" and we'd get the answer of "yes" but referring to the memory
region that covers the entirety of memory in a NUMA node, *not* meaning
our intent of "this will be reserved when we start linux".
To fix this, introduce a new memory region type REGION_MEMORY. This has
the semantics of a memory region that covers a block of memory that we can
allocate from (using local_alloc) and that the part that was allocated
will be passed to linux as reserved, but that the entire range will not
be reserved.
So our new semantics are:
- region_is_reservable() is true if the region *MAY* be reserved
(i.e. is one of the regions that cover the whole of memory OR is explicitly reserved)
- region_is_reserved() is true if the region *WILL* be reserved
(i.e. is explicitly reserved)
This way we check that the SLW image is explicitly reserved and if it isn't,
we reserve it.
Fixes: 58033e44
Acked-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
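The two predicates described above can be sketched over a region type alone. This is an illustration under assumptions: the enum below lists only a few types, REGION_OS is an assumed name for memory left to the OS, and the functions here take a type rather than a region.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative subset of region types; REGION_OS is an assumed name. */
enum region_type {
	REGION_OS,		/* never reserved */
	REGION_MEMORY,		/* covers a block of RAM; only the allocated
				 * part is reserved just before boot */
	REGION_SKIBOOT_HEAP,	/* explicitly reserved by skiboot */
	REGION_RESERVED,	/* explicitly reserved */
};

/* True if the region MAY end up reserved. */
static bool region_is_reservable(enum region_type type)
{
	return type != REGION_OS;
}

/* True if the region WILL be reserved as a whole. */
static bool region_is_reserved(enum region_type type)
{
	return type != REGION_OS && type != REGION_MEMORY;
}
```

Under this sketch a REGION_MEMORY region is reservable but not reserved, so asking whether the SLW image sits in reserved memory no longer gets a "yes" merely because a whole-of-node region covers it.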
|
|
Just bailing out this early in boot is perfectly acceptable.
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
If we fail to allocate memory at this point in boot, we should just
assert, there's really no coming back from not being able to reserve
our reserved memory.
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
See https://github.com/lucasdemarchi/codespel
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
This change adds a function to check whether a range of memory is
covered by one or more reservations.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
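One way such a coverage check could work is sketched below. It assumes the reservations are held in an array sorted by start address and do not overlap each other; `struct resv` and `range_is_covered()` are hypothetical names and not the function this change adds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct resv {
	uint64_t start;
	uint64_t len;
};

/*
 * Return true if every byte of [start, start + size) falls inside some
 * reservation. 'r' must be sorted by start address and non-overlapping.
 */
static bool range_is_covered(const struct resv *r, size_t n,
			     uint64_t start, uint64_t size)
{
	uint64_t cur = start;		/* first byte not yet proven covered */
	uint64_t end = start + size;

	for (size_t i = 0; i < n && cur < end; i++) {
		uint64_t r_end = r[i].start + r[i].len;

		if (r_end <= cur)
			continue;	/* reservation lies below cur */
		if (r[i].start > cur)
			return false;	/* gap before the next reservation */
		cur = r_end;		/* covered up to the end of this one */
	}
	return cur >= end;
}
```

The walk lets a range that straddles two adjacent reservations count as covered, which is the "one or more reservations" case.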
|
|
This change adds code to parse node-style memory reservations from the
incoming device-tree. If we find a reserved-memory node in either:
/reserved-memory/ or
/ibm,hostboot/reserved-memory
- then we use that in preference to the property-style reservations.
We copy those nodes as-is into /reserved-memory in the output tree, to
allow pass-through of any extra property data that those input nodes
contain.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
If the HB device tree provides a top-level reserved-memory node, we'll
abort(). We want to be able to handle a pre-populated reserved-memory
node in a future change, so handle this case gracefully.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
This change moves the property-style reserved-memory parsing code to a
separate function. We split dropping the properties into
mem_region_add_dt_reserved (before creating the updated range
properties), as we need to do that regardless of where the reservations
are parsed from.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
All current users of mem_reserve are actually wanting HW_RESERVED
memory; these reservations are for memory initialised pre-skiboot.
This change marks these regions as REGION_HW_RESERVED instead of
REGION_RESERVED. We also rename mem_reserve to mem_reserve_hw to reflect
this change.
This fixes an issue where the PRD daemon cannot find reserved ranges
(eg, the homer image) that have been created by skiboot itself.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
This change adds a function to iterate mem_regions.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Linux supports a newer memory reservation layout in the device-tree,
where each reservation is represented by a subnode under a top-level
"reserved-memory" node.
This change adds these nodes, using the mem_region names as the property
names (minus any cell addresses). The reserved-memory node looks like
this:
/ {
name = "reserved-memory";
ranges;
#address-cells = <0x2>;
#size-cells = <0x2>;
ibm,firmware-code@30000000 {
reg = <0x0 0x30000000 0x0 0x200000>;
};
ibm,firmware-data@30e00000 {
reg = <0x0 0x30e00000 0x0 0xc00000>;
};
ibm,firmware-stacks@31a00000 {
reg = <0x0 0x31a00000 0x0 0x8000000>;
};
ibm,firmware-allocs-memory@39a00000 {
reg = <0x0 0x39a00000 0x0 0x1c0200>;
};
ibm,firmware-heap@30200000 {
reg = <0x0 0x30200000 0x0 0xc00000>;
};
};
We also store a pointer to the reservation nodes in struct mem_region,
so they can be used by other skiboot code.
We keep the property-style reservation information (reserved-names and
reserved-ranges) unchanged.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
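The node names in the example above follow a name@address pattern. A minimal sketch of building such a name is shown below; `region_node_name()` is a hypothetical helper, not a function from this change.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Format "<region name>@<start address in hex>" into buf.
 * Returns the value of snprintf(), i.e. the length the full name needs.
 */
static int region_node_name(char *buf, size_t len, const char *name,
			    uint64_t start)
{
	return snprintf(buf, len, "%s@%" PRIx64, name, start);
}
```

For the first subnode in the listing, the inputs "ibm,firmware-code" and 0x30000000 give "ibm,firmware-code@30000000".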
|
|
This change allows the mem_region code to distinguish reserved memory
that was allocated before skiboot init, by introducing a new
mem_region_type member.
When we extract reserved ranges from the device tree, we mark them with
this new type.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
We'll want to store non-memory nodes in this pointer too.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
If we reserve any memory after mem_region_add_dt_reserved, that
reservation won't appear in the device tree. Ensure that we can't
add new regions after this point.
Also, add a testcase for the finalise, including some basic
reserved-ranges property checks.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
This change adds asserts to the mem_region calls that should have the
per-region lock held.
To keep the tests working, they need the lock_held_by_me() function. The
run-mem_region.c test has a bogus implementation of this, as it doesn't
do any locking at the moment. This will be addressed in a later change.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|