aboutsummaryrefslogtreecommitdiff
path: root/doc
diff options
context:
space:
mode:
authorStewart Smith <stewart@linux.ibm.com>2019-07-16 15:37:56 +1000
committerStewart Smith <stewart@linux.ibm.com>2019-07-16 15:37:56 +1000
commit3a6fdede6ce117facec0108afe716cf5d0472c3f (patch)
treec880316a89801cbb3f09d3ab3c6efa1d11d9ec41 /doc
parente4203d350e03b26780843655b3c8a7df5faae1e8 (diff)
downloadskiboot-6.4.zip
skiboot-6.4.tar.gz
skiboot-6.4.tar.bz2
skiboot v6.4 release notesv6.4
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
Diffstat (limited to 'doc')
-rw-r--r--doc/release-notes/skiboot-6.4.rst850
1 files changed, 850 insertions, 0 deletions
diff --git a/doc/release-notes/skiboot-6.4.rst b/doc/release-notes/skiboot-6.4.rst
new file mode 100644
index 0000000..f6f632a
--- /dev/null
+++ b/doc/release-notes/skiboot-6.4.rst
@@ -0,0 +1,850 @@
+.. _skiboot-6.4:
+
+skiboot-6.4
+===========
+
+skiboot v6.4 was released on Tuesday July 16th 2019. It is the first
+release of skiboot 6.4, which becomes the new stable release
+of skiboot following the 6.3 release, first released May 3rd 2019.
+
+Skiboot 6.4 will mark the basis for op-build v2.4.
+
+skiboot v6.4 contains all bug fixes as of :ref:`skiboot-6.0.20`,
+and :ref:`skiboot-6.3.2` (the currently maintained stable releases).
+
+For how the skiboot stable releases work, see :ref:`stable-rules` for details.
+
+Over skiboot 6.3, we have the following changes:
+
+.. _skiboot-6.4-new-features:
+
+New features
+------------
+
+Since skiboot v6.4-rc1:
+
+- npu2-opencapi: Add opencapi support on ZZ
+
+ This patch adds opencapi support on ZZ. It hard-codes the required
+ device tree entries for the NPU and links. The alternative was to use
+ HDAT, but it somehow proved too painful to do.
+
+ The new device tree entries activate the npu2 init code on ZZ. On
+ systems with no opencapi adapters, it should go unnoticed, as presence
+ detection will skip link training.
+
+Since skiboot v6.3:
+
+- platforms/nicole: Add new platform
+
+ The platform is a new platform from YADRO, it's a storage controller for
+ TATLIN server. It's Based on IBM Romulus reference design (POWER9).
+
+- platform/zz: Add new platform type
+
+ We have new platform type under ZZ. Lets add them. With this fix
+- nvram: Flag dangerous NVRAM options
+
+ Most nvram options used by skiboot are just for debug or testing for
+ regressions. They should never be used long term.
+
+ We've hit a number of issues in testing and the field where nvram
+ options have been set "temporarily" but haven't been properly cleared
+ after, resulting in crashes or real bugs being masked.
+
+ This patch marks most nvram options used by skiboot as dangerous and
+ prints a chicken to remind users of the problem.
+
+- hw/phb3: Add verbose EEH output
+
+ Add support for the pci-eeh-verbose NVRAM flag on PHB3. We've had this
+ on PHB4 since forever and it has proven very useful when debugging EEH
+ issues. When testing changes to the Linux kernel's EEH implementation
+ it's fairly common for the kernel to crash before printing the EEH log
+ so it's helpful to have it in the OPAL log where it can be dumped from
+ XMON.
+
+ Note that unlike PHB4 we do not enable verbose mode by default. The
+ nvram option must be used to explicitly enable it.
+
+- Experimental support for building without FSP code
+
+ Now, with CONFIG_FSP=0/1 we have:
+
+ - 1.6M/1.4M skiboot.lid
+ - 323K/375K skiboot.lid.xz
+
+- doc: travis-ci deploy docs!
+
+ Documentation is now automatically deployed if you configure Travis CI
+ appropriately (we have done this for the open-power branch of skiboot)
+
+- Big OPAL API Documentation improvement
+
+ A lot more OPAL API calls are now (at least somewhat) documented.
+- opal/hmi: Report NPU2 checkstop reason
+
+ The NPU2 is currently not passing any information to linux to explain
+ the cause of an HMI. NPU2 has three Fault Isolation Registers and over
+ 30 of those FIR bits are configured to raise an HMI by default. We
+ won't be able to fit all possible state in the 32-bit xstop_reason
+ field of the HMI event, but we can still try to encode up to 4 HMI
+ reasons.
+- opal-msg: Enhance opal-get-msg API
+
+ Linux uses :ref:`OPAL_GET_MSG` API to get OPAL messages. This interface
+ supports upto 8 params (64 bytes). We have a requirement to send bigger data to
+ Linux. This patch enhances OPAL to send bigger data to Linux.
+
+ - Linux will use "opal-msg-size" device tree property to allocate memory for
+ OPAL messages (previous patch increased "opal-msg-size" to 64K).
+ - Replaced `reserved` field in "struct opal_msg" with `size`. So that Linux
+ side opal_get_msg user can detect actual data size.
+ - If buffer size < actual message size, then opal_get_msg will copy partial
+ data and return OPAL_PARTIAL to Linux.
+ - Add new variable "extended" to "opal_msg_entry" structure to keep track
+ of messages that has more than 64byte data. We will allocate separate
+ memory for these messages and once kernel consumes message we will
+ release that memory.
+- core/opal: Increase opal-msg-size size
+
+ Kernel will use `opal-msg-size` property to allocate memory for opal_msg.
+ We want to send bigger data from OPAL to kernel. Hence increase
+ opal-msg-size to 64K.
+- hw/npu2-opencapi: Add initial support for allocating OpenCAPI LPC memory
+
+ Lowest Point of Coherency (LPC) memory allows the host to access memory on
+ an OpenCAPI device.
+
+ Define 2 OPAL calls, :ref:`OPAL_NPU_MEM_ALLOC` and :ref:`OPAL_NPU_MEM_RELEASE`, for
+ assigning and clearing the memory BAR. (We try to avoid using the term
+ "LPC" to avoid confusion with Low Pin Count.)
+
+ At present, we use a fixed location in the address space, which means we
+ are restricted to a single range of 4TB, on a single OpenCAPI device per
+ chip. In future, we'll use some chip ID extension magic to give us more
+ space, and some sort of allocator to assign ranges to more than one device.
+- core/fast-reboot: Add im-feeling-lucky option
+
+ Fast reboot gets disabled for a number of reasons e.g. the availability
+ of nvlink. However this doesn't actually affect the ability to perform fast
+ reboot if no nvlink device is actually present.
+
+ Add a nvram option for fast-reset where if it's set to
+ "im-feeling-lucky" then perform the fast-reboot irrespective of if it's
+ previously been disabled.
+
+- platforms/astbmc: Check for SBE validation step
+
+ On some POWER8 astbmc systems an update to the SBE requires pausing at
+ runtime to ensure integrity of the SBE. If this is required the BMC will
+ set a chassis boot option IPMI flag using the OEM parameter 0x62. If
+ Skiboot sees this flag is set it waits until the SBE update is complete
+ and the flag is cleared.
+
+ Unfortunately the mystery operation that validates the SBE also leaves
+ it in a bad state and unable to be used for timer operations. To
+ workaround this the flag is checked as soon as possible (ie. when IPMI
+ and the console are set up), and once complete the system is rebooted.
+- Add P9 DIO interrupt support
+
+ On P9 there are GPIO port 0, 1, 2 for GPIO interrupt, and DIO interrupt
+ is used to handle the interrupts.
+
+ Add support to the DIO interrupts:
+
+ 1. Add dio_interrupt_register(chip, port, callback) to register the
+ interrupt
+ 2. Add dio_interrupt_deregister(chip, port, callback) to deregister;
+ 3. When interrupt on the port occurs, callback is invoked, and the
+ interrupt status is cleared.
+
+
+Removed features
+----------------
+
+Since skiboot v6.3:
+
+- pci/iov: Remove skiboot VF tracking
+
+ This feature was added a few years ago in response to a request to make
+ the MaxPayloadSize (MPS) field of a Virtual Function match the MPS of the
+ Physical Function that hosts it.
+
+ The SR-IOV specification states the the MPS field of the VF is "ResvP".
+ This indicates the VF will use whatever MPS is configured on the PF and
+ that the field should be treated as a reserved field in the config space
+ of the VF. In other words, a SR-IOV spec compliant VF should always return
+ zero in the MPS field. Adding hacks in OPAL to make it non-zero is...
+ misguided at best.
+
+ Additionally, there is a bug in the way pci_device structures are handled
+ by VFs that results in a crash on fast-reboot that occurs if VFs are
+ enabled and then disabled prior to rebooting. This patch fixes the bug by
+ removing the code entirely. This patch has no impact on SR-IOV support on
+ the host operating system.
+- Remove POWER7 and POWER7+ support
+
+ It's been a good long while since either OPAL POWER7 user touched a
+ machine, and even longer since they'd have been okay using an old
+ version rather than tracking master.
+
+ There's also been no testing of OPAL on POWER7 systems for an awfully
+ long time, so it's pretty safe to assume that it's very much bitrotted.
+
+ It also saves a whole 14kb of xz compressed payload space.
+- Remove remnants of :ref:`OPAL_PCI_GET_PHB_DIAG_DATA`
+
+ Never present in a public OPAL release, and only kernels prior to 3.11
+ would ever attempt to call it.
+- Remove unused :ref:`OPAL_GET_XIVE_SOURCE`
+
+ While this call was technically implemented by skiboot, no code has ever called
+ it, and it was only ever implemented for the p7ioc-phb back-end (i.e. POWER7).
+ Since this call was unused in Linux, and that POWER7 with OPAL was only ever
+ available internally, so it should be safe to remove the call.
+- Remove unused :ref:`OPAL_PCI_GET_XIVE_REISSUE` and :ref:`OPAL_PCI_SET_XIVE_REISSUE`
+
+ These seem to be remnants of one of the OPAL incarnations prior to
+ OPALv3. These calls have never been implemented in skiboot, and never
+ used by an upstream kernel (nor a PowerKVM kernel).
+
+ It's rather safe to just document them as never existing.
+- Remove never implemented :ref:`OPAL_PCI_SET_PHB_TABLE_MEMORY` and document why
+
+ Not ever used by upstream linux or PowerKVM tree. Never implemented in
+ skiboot (not even in ancient internal only tree).
+
+ So, it's incredibly safe to remove.
+- Remove unused :ref:`OPAL_PCI_EEH_FREEZE_STATUS2`
+
+ This call was introduced all the way back at the end of 2012, before
+ OPAL was public. The #define for the OPAL call was introduced to the
+ Linux kernel in June 2013, and the call was never used in any kernel
+ tree ever (as far as we can find).
+
+ Thus, it's quite safe to remove this completely unused and completely
+ untested OPAL call.
+- Document the long removed :ref:`OPAL_REGISTER_OPAL_EXCEPTION_HANDLER` call
+
+ I'm pretty sure this was removed in one of our first ever service packs.
+
+ Fixes: https://github.com/open-power/skiboot/issues/98
+- Remove last remnants of :ref:`OPAL_PCI_SET_PHB_TCE_MEMORY` and :ref:`OPAL_PCI_SET_HUB_TCE_MEMORY`
+
+ Since we have not supported p5ioc systems since skiboot 5.2, it's pretty
+ safe to just wholesale remove these OPAL calls now.
+- Remove remnants of :ref:`OPAL_PCI_SET_PHB_TCE_MEMORY`
+
+ There's no reason we need remnants hanging around that aren't used, so
+ remove them and save a handful of bytes at runtime.
+
+ Simultaneously, document the OPAL call removal.
+
+
+Secure and Trusted Boot
+-----------------------
+
+Since skiboot v6.3:
+
+- trustedboot: Change PCR and event_type for the skiboot events
+
+ The existing skiboot events are being logged as EV_ACTION, however, the
+ TCG PC Client spec says that EV_ACTION events should have one of the
+ pre-defined strings in the event field recorded in the event log. For
+ instance:
+
+ - "Calling Ready to Boot",
+ - "Entering ROM Based Setup",
+ - "User Password Entered", and
+ - "Start Option ROM Scan.
+
+ None of the EV_ACTION pre-defined strings are applicable to the existing
+ skiboot events. Based on recent discussions with other POWER teams, this
+ patch proposes a convention on what PCR and event types should be used
+ for skiboot events. This also changes the skiboot source code to follow
+ the convention.
+
+ The TCG PC Client spec defines several event types, other than
+ EV_ACTION. However, many of them are specific to UEFI events and some
+ others are related to platform or CRTM events, which is more applicable
+ to hostboot events.
+
+ Currently, most of the hostboot events are extended to PCR[0,1] and
+ logged as either EV_PLATFORM_CONFIG_FLAGS, EV_S_CRTM_CONTENTS or
+ EV_POST_CODE. The "Node Id" and "PAYLOAD" events, though, are extended
+ to PCR[4,5,6] and logged as EV_COMPACT_HASH.
+
+ For the lack of an event type that fits the specific purpose,
+ EV_COMPACT_HASH seems to be the most adequate one due to its
+ flexibility. According to the TCG PC Client spec:
+
+ - May be used for any PCR except 0, 1, 2 and 3.
+ - The event field may be informative or may be hashed to generate the
+ digest field, depending on the component recording the event.
+
+ Additionally, the PCR[4,5] seem to be the most adequate PCRs. They would
+ be used for skiboot and some skiroot events. According to the TCG PC
+ Client, PCR[4] is intended to represent the entity that manages the
+ transition between the pre-OS and OS-present state of the platform.
+ PCR[4], along with PCR[5], identifies the initial OS loader.
+
+ In summary, for skiboot events:
+
+ - Events that represents data should be extended to PCR 4.
+ - Events that represents config should be extended to PCR 5.
+ - For the lack of an event type that fits the specific purpose,
+ both data and config events should be logged as EV_COMPACT_HASH.
+
+Sensors
+-------
+
+Since skiboot v6.3:
+
+- occ-sensors: Check if OCC is reset while reading inband sensors
+
+ OCC may not be able to mark the sensor buffer as invalid while going
+ down RESET. If OCC never comes back we will continue to read the stale
+ sensor data. So verify if OCC is reset while reading the sensor values
+ and propagate the appropriate error.
+
+IPMI
+----
+
+Since skiboot v6.3:
+
+- ipmi: ensure forward progress on ipmi_queue_msg_sync()
+
+ BT responses are handled using a timer doing the polling. To hope to
+ get an answer to an IPMI synchronous message, the timer needs to run.
+
+ We can't just check all timers though as there may be a timer that
+ wants a lock that's held by a code path calling ipmi_queue_msg_sync(),
+ and if we did enforce that as a requirement, it's a pretty subtle
+ API that is asking to be broken.
+
+ So, if we just run a poll function to crank anything that the IPMI
+ backend needs, then we should be fine.
+
+ This issue shows up very quickly under QEMU when loading the first
+ flash resource with the IPMI HIOMAP backend.
+
+NPU2
+----
+
+Since skiboot v6.4-rc1:
+
+- witherspoon: Add nvlink peers in finalise_dt()
+
+ This information is consumed by Linux so it needs to be in the DT. Move
+ it to finalise_dt().
+
+Since skiboot v6.3:
+
+- npu2: Increase timeout for L2/L3 cache purging
+
+ On NVLink2 bridge reset, we purge all L2/L3 caches in the system.
+ This is an asynchronous operation, we have a 2ms timeout here. There are
+ reports that this is not enough and "PURGE L3 on core xxx timed out"
+ messages appear (for the reference: on the test setup this takes
+ 280us..780us).
+
+ This defines the timeout as a macro and changes this from 2ms to 20ms.
+
+ This adds a tracepoint to tell how long it took to purge all the caches.
+- npu2: Purge cache when resetting a GPU
+
+ After putting all a GPU's links in reset, do a cache purge in case we
+ have CPU cache lines belonging to the now-unaccessible GPU memory.
+- npu2-opencapi: Mask 2 XSL errors
+
+ Commit f8dfd699f584 ("hw/npu2: Setup an error interrupt on some
+ opencapi FIRs") converted some FIR bits default action from system
+ checkstop to raising an error interrupt. For 2 XSL error events that
+ can be triggered by a misbehaving AFU, the error interrupt is raised
+ twice, once for each link (the XSL logic in the NPU is shared between
+ 2 links). So a badly behaving AFU could impact another, unsuspecting
+ opencapi adapter.
+
+ It doesn't look good and it turns out we can do better. We can mask
+ those 2 XSL errors. The error will also be picked up by the OTL logic,
+ which is per link. So we'll still get an error interrupt, but only on
+ the relevant link, and the other opencapi adapter can stay functional.
+- npu2: Clear fence state for a brick being reset
+
+ Resetting a GPU before resetting an NVLink leads to occasional HMIs
+ which fence some bricks and prevent the "reset_ntl" procedure from
+ succeeding at the "reset_ntl_release" step - the host system requires
+ reboot; there may be other cases like this as well.
+
+ This adds clearing of the fence bit in NPU.MISC.FENCE_STATE for
+ the NVLink which we are about to reset.
+- npu2: Fix clearing the FIR bits
+
+ FIR registers are SCOM-only so they cannot be accesses with the indirect
+ write, and yet we use SCOM-based addresses for these; fix this.
+
+- npu2: Reset NVLinks when resetting a GPU
+
+ Resetting a V100 GPU brings its NVLinks down and if an NPU tries using
+ those, an HMI occurs. We were lucky not to observe this as the bare metal
+ does not normally reset a GPU and when passed through, GPUs are usually
+ before NPUs in QEMU command line or Libvirt XML and because of that NPUs
+ are naturally reset first. However simple change of the device order
+ brings HMIs.
+
+ This defines a bus control filter for a PCI slot with a GPU with NVLinks
+ so when the host system issues secondary bus reset to the slot, it resets
+ associated NVLinks.
+- npu2: Reset PID wildcard and refcounter when mapped to LPID
+
+ Since 105d80f85b "npu2: Use unfiltered mode in XTS tables" we do not
+ register every PID in the XTS table so the table has one entry per LPID.
+ Then we added a reference counter to keep track of the entry use when
+ switching GPU between the host and guest systems (the "Fixes:" tag below).
+
+ The POWERNV platform setup creates such entries and references them
+ at the boot time when initializing IOMMUs and only removes it when
+ a GPU is passed through to a guest. This creates a problem as POWERNV
+ boots via kexec and no defererencing happens; the XTS table state remains
+ undefined. So when the host kernel boots, skiboot thinks there are valid
+ XTS entries and does not update the XTS table which breaks ATS.
+
+ This adds the reference counter and the XTS entry reset when a GPU is
+ assigned to LPID and we cannot rely on the kernel to clean that up.
+
+PHB4
+----
+
+Since skiboot v6.3:
+
+- hw/phb4: Make phb4_training_trace() more general
+
+ phb4_training_trace() is used to monitor the Link Training Status
+ State Machine (LTSSM) of the PHB's data link layer. Currently it is only
+ used to observe the LTSSM while bringing up the link, but sometimes it's
+ useful to see what's occurring in other situations (e.g. link disable, or
+ secondary bus reset). This patch renames it to phb4_link_trace() and
+ allows the target LTSSM state and a flexible timeout to help in these
+ situations.
+- hw/phb4: Make pci-tracing print at PR_NOTICE
+
+ When pci-tracing is enabled we print each trace status message and the
+ final trace status at PR_ERROR. The final status messages are similar to
+ those printed when we fail to train in the non-pci-tracing path and this
+ has resulted in spurious op-test failures.
+
+ This patch reduces the log-level of the tracing message to PR_NOTICE so
+ they're not accidently interpreted as actual error messages. PR_NOTICE
+ messages are still printed to the console during boot.
+- hw/phb4: Use read/write_reg in assert_perst
+
+ While the PHB is fenced we can't use the MMIO interface to access PHB
+ registers. While processing a complete reset we inject a PHB fence to
+ isolate the PHB from the rest of the system because the PHB won't
+ respond to MMIOs from the rest of the system while being reset.
+
+ We assert PERST after the fence has been erected which requires us to
+ use the XSCOM indirect interface to access the PHB registers rather than
+ the MMIO interface. Previously we did that when asserting PERST in the
+ CRESET path. However in b8b4c79d4419 ("hw/phb4: Factor out PERST
+ control"). This was re-written to use the raw in_be64() accessor. This
+ means that CRESET would not be asserted in the reset path. On some
+ Mellanox cards this would prevent them from re-loading their firmware
+ when the system was fast-reset.
+
+ This patch fixes the problem by replacing the raw {in|out}_be64()
+ accessors with the phb4_{read|write}_reg() functions.
+
+- hw/phb4: Assert Link Disable bit after ETU init
+
+ The cursed RAID card in ozrom1 has a bug where it ignores PERST being
+ asserted. The PCIe Base spec is a little vague about what happens
+ while PERST is asserted, but it does clearly specify that when
+ PERST is de-asserted the Link Training and Status State Machine
+ (LTSSM) of a device should return to the initial state (Detect)
+ defined in the spec and the link training process should restart.
+
+ This bug was worked around in 9078f8268922 ("phb4: Delay training till
+ after PERST is deasserted") by setting the link disable bit at the
+ start of the FRESET process and clearing it after PERST was
+ de-asserted. Although this fixed the bug, the patch offered no
+ explaination of why the fix worked.
+
+ In b8b4c79d4419 ("hw/phb4: Factor out PERST control") the link disable
+ workaround was moved into phb4_assert_perst(). This is called
+ always in the CRESET case, but a following patch resulted in
+ assert_perst() not being called if phb4_freset() was entered following a
+ CRESET since p->skip_perst was set in the CRESET handler. This is bad
+ since a side-effect of the CRESET is that the Link Disable bit is
+ cleared.
+
+ This, combined with the RAID card ignoring PERST results in the PCIe
+ link being trained by the PHB while we're waiting out the 100ms
+ ETU reset time. If we hack skiboot to print a DLP trace after returning
+ from phb4_hw_init() we get: ::
+
+ PHB#0001[0:1]: Initialization complete
+ PHB#0001[0:1]: TRACE:0x0000102101000000 0ms presence GEN1:x16:polling
+ PHB#0001[0:1]: TRACE:0x0000001101000000 23ms GEN1:x16:detect
+ PHB#0001[0:1]: TRACE:0x0000102101000000 23ms presence GEN1:x16:polling
+ PHB#0001[0:1]: TRACE:0x0000183101000000 29ms training GEN1:x16:config
+ PHB#0001[0:1]: TRACE:0x00001c5881000000 30ms training GEN1:x08:recovery
+ PHB#0001[0:1]: TRACE:0x00001c5883000000 30ms training GEN3:x08:recovery
+ PHB#0001[0:1]: TRACE:0x0000144883000000 33ms presence GEN3:x08:L0
+ PHB#0001[0:1]: TRACE:0x0000154883000000 33ms trained GEN3:x08:L0
+ PHB#0001[0:1]: CRESET: wait_time = 100
+ PHB#0001[0:1]: FRESET: Starts
+ PHB#0001[0:1]: FRESET: Prepare for link down
+ PHB#0001[0:1]: FRESET: Assert skipped
+ PHB#0001[0:1]: FRESET: Deassert
+ PHB#0001[0:1]: TRACE:0x0000154883000000 0ms trained GEN3:x08:L0
+ PHB#0001[0:1]: TRACE: Reached target state
+ PHB#0001[0:1]: LINK: Start polling
+ PHB#0001[0:1]: LINK: Electrical link detected
+ PHB#0001[0:1]: LINK: Link is up
+ PHB#0001[0:1]: LINK: Went down waiting for stabilty
+ PHB#0001[0:1]: LINK: DLP train control: 0x0000105101000000
+ PHB#0001[0:1]: CRESET: Starts
+
+ What has happened here is that the link is trained to 8x Gen3 33ms after
+ we return from phb4_init_hw(), and before we've waitined to 100ms
+ that we normally wait after re-initialising the ETU. When we "deassert"
+ PERST later on in the FRESET handler the link in L0 (normal) state. At
+ this point we try to read from the Vendor/Device ID register to verify
+ that the link is stable and immediately get a PHB fence due to a PCIe
+ Completion Timeout. Skiboot attempts to recover by doing another CRESET,
+ but this will encounter the same issue.
+
+ This patch fixes the problem by setting the Link Disable bit (by calling
+ phb4_assert_perst()) immediately after we return from phb4_init_hw().
+ This prevents the link from being trained while PERST is asserted which
+ seems to avoid the Completion Timeout. With the patch applied we get: ::
+
+ PHB#0001[0:1]: Initialization complete
+ PHB#0001[0:1]: TRACE:0x0000102101000000 0ms presence GEN1:x16:polling
+ PHB#0001[0:1]: TRACE:0x0000001101000000 23ms GEN1:x16:detect
+ PHB#0001[0:1]: TRACE:0x0000102101000000 23ms presence GEN1:x16:polling
+ PHB#0001[0:1]: TRACE:0x0000909101000000 29ms presence GEN1:x16:disabled
+ PHB#0001[0:1]: CRESET: wait_time = 100
+ PHB#0001[0:1]: FRESET: Starts
+ PHB#0001[0:1]: FRESET: Prepare for link down
+ PHB#0001[0:1]: FRESET: Assert skipped
+ PHB#0001[0:1]: FRESET: Deassert
+ PHB#0001[0:1]: TRACE:0x0000001101000000 0ms GEN1:x16:detect
+ PHB#0001[0:1]: TRACE:0x0000102101000000 0ms presence GEN1:x16:polling
+ PHB#0001[0:1]: TRACE:0x0000001101000000 24ms GEN1:x16:detect
+ PHB#0001[0:1]: TRACE:0x0000102101000000 36ms presence GEN1:x16:polling
+ PHB#0001[0:1]: TRACE:0x0000183101000000 97ms training GEN1:x16:config
+ PHB#0001[0:1]: TRACE:0x00001c5881000000 97ms training GEN1:x08:recovery
+ PHB#0001[0:1]: TRACE:0x00001c5883000000 97ms training GEN3:x08:recovery
+ PHB#0001[0:1]: TRACE:0x0000144883000000 99ms presence GEN3:x08:L0
+ PHB#0001[0:1]: TRACE: Reached target state
+ PHB#0001[0:1]: LINK: Start polling
+ PHB#0001[0:1]: LINK: Electrical link detected
+ PHB#0001[0:1]: LINK: Link is up
+ PHB#0001[0:1]: LINK: Link is stable
+ PHB#0001[0:1]: LINK: Card [9005:028c] Optimal Retry:disabled
+ PHB#0001[0:1]: LINK: Speed Train:GEN3 PHB:GEN4 DEV:GEN3
+ PHB#0001[0:1]: LINK: Width Train:x08 PHB:x08 DEV:x08
+ PHB#0001[0:1]: LINK: RX Errors Now:0 Max:8 Lane:0x0000
+
+
+Simulators
+----------
+
+Since skiboot v6.3:
+
+- external/mambo: Bump default POWER9 to Nimbus DD2.3
+- external/mambo: fix tcl startup code for mambo bogus net (repost)
+
+ This fixes a couple issues with external/mambo/skiboot.tcl so I can use the
+ mambo bogus net.
+
+ * newer distros (ubuntu 18.04) allow tap device to have a user specified
+ name instead of just tapN so we need to pass in a name not a number.
+ * need some kind of default for net_mac, and need the mconfig for it
+ to be set from an env var.
+- skiboot.tcl: Add option to wait for GDB server connection
+
+ Add an environment variable which makes Mambo wait for a connection
+ from gdb prior to starting simulation.
+- mambo: Integrate addr2line into backtrace command
+
+ Gives nice output like this: ::
+
+ systemsim % bt
+ pc: 0xC0000000002BF3D4 _savegpr0_28+0x0
+ lr: 0xC00000000004E0F4 opal_call+0x10
+ stack:0x000000000041FAE0 0xC00000000004F054 opal_check_token+0x20
+ stack:0x000000000041FB50 0xC0000000000500CC __opal_flush_console+0x88
+ stack:0x000000000041FBD0 0xC000000000050BF8 opal_flush_console+0x24
+ stack:0x000000000041FC00 0xC0000000001F9510 udbg_opal_putc+0x88
+ stack:0x000000000041FC40 0xC000000000020E78 udbg_write+0x7c
+ stack:0x000000000041FC80 0xC0000000000B1C44 console_unlock+0x47c
+ stack:0x000000000041FD80 0xC0000000000B2424 register_console+0x320
+ stack:0x000000000041FE10 0xC0000000003A5328 register_early_udbg_console+0x98
+ stack:0x000000000041FE80 0xC0000000003A4F14 setup_arch+0x68
+ stack:0x000000000041FEF0 0xC0000000003A0880 start_kernel+0x74
+ stack:0x000000000041FF90 0xC00000000000AC60 start_here_common+0x1c
+
+- mambo: Add addr2func for symbol resolution
+
+ If you supply a VMLINUX_MAP/SKIBOOT_MAP/USER_MAP addr2func can guess
+ at your symbol name. i.e. ::
+
+ systemsim % p pc
+ 0xC0000000002A68F8
+ systemsim % addr2func [p pc]
+ fdt_offset_ptr+0x78
+
+- lpc-port80h: Don't write port 80h when running under Simics
+
+ Simics doesn't model LPC port 80h. Writing to it terminates the
+ simulation due to an invalid LPC memory access. This patch adds a
+ check to ensure port 80h isn't accessed if we are running under
+ Simics.
+- device-tree: speed up fdt building on slow simulators
+
+ Trade size for speed and avoid de-duplicating strings in the fdt.
+ This costs about 2kB in fdt size, and saves about 8 million instructions
+ (almost half of all instructions) booting skiboot in mambo.
+- fast-reboot:: skip read-only memory checksum for slow simulators
+
+ Skip the fast reboot checksum, which costs about 4 million cycles
+ booting skiboot in mambo.
+- nx: remove check on the "qemu, powernv" property
+
+ commit 95f7b3b9698b ("nx: Don't abort on missing NX when using a QEMU
+ machine") introduced a check on the property "qemu,powernv" to skip NX
+ initialization when running under a QEMU machine.
+
+ The QEMU platforms now expose a QUIRK_NO_RNG in the chip. Testing the
+ "qemu,powernv" property is not necessary anymore.
+- plat/qemu: add a POWER8 and POWER9 platform
+
+ These new QEMU platforms have characteristics closer to real OpenPOWER
+ systems that we use today and define a different BMC depending on the
+ CPU type. New platform properties are introduced for each,
+ "qemu,powernv8", "qemu,powernv9" and these should be compatible with
+ existing QEMUs which only expose the "qemu,powernv" property
+- libc/string: speed up common string functions
+
+ Use compiler builtins for the string functions, and compile the
+ libc/string/ directory with -O2.
+
+ This reduces instructions booting skiboot in mambo by 2.9 million in
+ slow-sim mode, or 3.8 in normal mode, for less than 1kB image size
+ increase.
+
+ This can result in the compiler warning more cases of string function
+ problems.
+- external/mambo: Add an option to exit Mambo when the system is shutdown
+
+ Automatically exiting can be convenient for scripting. Will also exit
+ due to a HW crash (eg. unhandled exception).
+
+VESNIN platform
+---------------
+
+Since skiboot v6.3:
+
+- platforms/vesnin: PCI inventory via IPMI OEM
+
+ Replace raw protocol with OEM message supported by OpenBMC's IPMI
+ plugins.
+
+ BMC-side implementation (IPMI plug-in):
+ https://github.com/YADRO-KNS/phosphor-pci-inventory
+
+Utilities
+---------
+
+Since skiboot v6.3:
+
+- opal-gard: Account for ECC size when clearing partition
+
+ When 'opal-gard clear all' is run, it works by erasing the GUARD then
+ using blockevel_smart_write() to write nothing to the partition. This
+ second write call is needed because we rely on libflash to set the ECC
+ bits appropriately when the partition contained ECCed data.
+
+ The API for this is a little odd with the caller specifying how much
+ actual data to write, and libflash writing size + size/8 bytes
+ since there is one additional ECC byte for every eight bytes of data.
+
+ We currently do not account for the extra space consumed by the ECC data
+ in reset_partition() which is used to handle the 'clear all' command.
+ Which results in the paritition following the GUARD partition being
+ partially overwritten when the command is used. This patch fixes the
+ problem by reducing the length we would normally write by the number
+ of ECC bytes required.
+
+
+Build and debugging
+-------------------
+
+Since skiboot v6.3:
+
+- Disable -Waddress-of-packed-member for GCC9
+
+ We throw a bunch of errors in errorlog code otherwise, which we should
+ fix, but we don't *have* to yet.
+
+- Fix a lot of sparse warnings
+- With new GCC comes larger GCOV binaries
+
+ So we need to change our heap size to make more room for data/bss
+ without having to change where the console is or have more fun moving
+ things about.
+- Intentionally discard fini_array sections
+
+ Produced in a SKIBOOT_GCOV=1 build, and never called by skiboot.
+- external/trace: Add follow option to dump_trace
+
+ When monitoring traces, an option like the tail command's '-f' (follow)
+ is very useful. This option continues to append to the output as more
+ data arrives. Add an '-f' option to allow dump_trace to operate
+ similarly.
+
+ Tail also provides a '-s' (sleep time) option that
+ accompanies '-f'. This controls how often new input will be polled. Add
+ a '-s' option that will make dump_trace sleep for N milliseconds before
+ checking for new input.
+- external/trace: Add support for dumping multiple buffers
+
+ dump_trace only can dump one trace buffer at a time. It would be handy
+ to be able to dump multiple buffers and to see the entries from these
+ buffers displayed in correct timestamp order. Each trace buffer is
+ already sorted by timestamp so use a heap to implement an efficient
+ k-way merge. Use the CCAN heap to implement this sort. However the CCAN
+ heap does not have a 'heap_replace' operation. We need to 'heap_pop'
+ then 'heap_push' to replace the root which means rebalancing twice
+ instead of once.
+- external/trace: mmap trace buffers in dump_trace
+
+ The current lseek/read approach used in dump_trace does not correctly
+ handle certain aspects of the buffers. It does not use the start and end
+ position that is part of the buffer so it will not begin from the
+ correct location. It does not move back to the beginning of the trace
+ buffer file as the buffer wraps around. It also does not handle the
+ overflow case of the writer overwriting when the reader is up to.
+
+ Mmap the trace buffer file so that the existing reading functions in
+ extra/trace.c can be used. These functions already handle the cases of
+ wrapping and overflow. This reduces code duplication and uses functions
+ that are already unit tested. However this requires a kernel where the
+ trace buffer sysfs nodes are able to be mmaped (see
+ https://patchwork.ozlabs.org/patch/1056786/)
+- core/trace: Export trace buffers to sysfs
+
+ Every property in the device-tree under /ibm,opal/firmware/exports has a
+ sysfs node created in /firmware/opal/exports. Add properties with the
+ physical address and size for each trace buffer so they are exported.
+- core/trace: Add pir number to debug_descriptor
+
+ The names given to the trace buffers when exported to sysfs should show
+ what cpu they are associated with to make it easier to understand there
+ output. The debug_descriptor currently stores the address and length of
+ each trace buffer and this is used for adding properties to the device
+ tree. Extend debug_descriptor to include a cpu associated with each
+ trace. This will be used for creating properties in the device-tree
+ under /ibm,opal/firmware/exports/.
+- core/trace: Change trace buffer size
+
+ We want to be able to mmap the trace buffers to be used by the
+ dump_trace tool. As mmaping is done in terms of pages it makes sense
+ that the size of the trace buffers should be page aligned. This is
+ slightly complicated by the space taken up by the header at the
+ beginning of the trace and the room left for an extra trace entry at the
+ end of the buffer. Change the size of the buffer itself so that the
+ entire trace buffer size will be page aligned.
+- core/trace: Change buffer alignment from 4K to 64K
+
+ We want to be able to mmap the trace buffers to be used by the
+ dump_trace tool. This means that the trace bufferes must be page
+ aligned. Currently they are aligned to 4K. Most power systems have a
+ 64K page size. On systems with a 4K page size, 64K aligned will still be
+ page aligned. Change the allocation of the trace buffers to be 64K
+ aligned.
+
+ The trace_info struct that contains the trace buffer is actually what is
+ allocated aligned memory. This means the trace buffer itself is not
+ actually aligned and this is the address that is currently exposed
+ through sysfs. To get around this change the address that is exposed to
+ sysfs to be the trace_info struct. This means the lock in trace_info is
+ now visible too.
+- external/trace: Use correct width integer byte swapping
+
+ The trace_repeat struct uses be16 for storing the number of repeats.
+ Currently be32_to_cpu conversion is used to display this member. This
+ produces an incorrect value. Use be16_to_cpu instead.
+- core/trace: Put boot_tracebuf in correct location.
+
+ A position for the boot_tracebuf is allocated in skiboot.lds.S.
+ However, without a __section attribute the boot trace buffer is not
+ placed in the correct location, meaning that it also will not be
+ correctly aligned. Add the __section attribute to ensure it will be
+ placed in its allocated position.
+- core/lock: Add debug options to store backtrace of where lock was taken
+
+ Contrary to popular belief, skiboot developers are imperfect and
+ occasionally write locking bugs. When we exit skiboot, we check if we're
+ still holding any locks, and if so, we print an error with a list of the
+ locks currently held and the locations where they were taken.
+
+ However, this only tells us the location where lock() was called, which may
+ not be enough to work out what's going on. To give us more to go on with,
+ we can store backtrace data in the lock and print that out when we
+ unexpectedly still hold locks.
+
+ Because the backtrace data is rather big, we only enable this if
+ DEBUG_LOCKS_BACKTRACE is defined, which in turn is switched on when
+ DEBUG=1.
+
+ (We disable DEBUG_LOCKS_BACKTRACE in some of the memory allocation tests
+ because the locks used by the memory allocator take up too much room in the
+ fake skiboot heap.)
+- libfdt: upgrade to upstream dtc.git 243176c
+
+ Upgrade libfdt/ to github.com/dgibson/dtc.git 243176c ("Fix bogus
+ error on rebuild")
+
+ This copies dtc/libfdt/ to skiboot/libfdt/, with the only change in
+ that directory being the addition of README.skiboot and Makefile.inc.
+
+ This adds about 14kB text, 2.5kB compressed xz. This could be reduced
+ or mostly eliminated by cutting out fdt version checks and unused
+ code, but tracking upstream is a bigger benefit at the moment.
+
+ This loses commits:
+
+ - 14ed2b842f61 ("libfdt: add basic sanity check to fdt_open_into")
+ - bc7bb3d12bc1 ("sparse: fix declaration of fdt_strerror")
+
+ As well as some prehistoric similar kinds of things, which is the
+ punishment for us not being good downstream citizens and sending
+ things upstream! Syncing to upstream will make that effort simpler
+ in future.
+
+General Fixes
+-------------
+
+Since skiboot v6.4-rc1:
+
+- libflash: Fix broken continuations
+
+ Some of the libflash debug messages don't print a newlines at the end of
+ the line and assume that the next print will be contigious with the
+ last. This isn't true in skiboot since log messages are prefixed with a
+ timestamp. This results in funny looking output such as: ::
+
+ LIBFLASH: Verifying...
+ LIBFLASH: reading page 0x01963000..0x01964000...[3.084846885,7] same !
+ LIBFLASH: reading page 0x01964000..0x01965000...[3.086164489,7] same !
+
+ Fix this by moving the "same !" debug message to a new line with the
+ prefix "LIBFLASH: ..." to indicate it's a continuation of the last
+ statement.
+
+ First reported in https://github.com/open-power/skiboot/issues/51