From cce80be2fc7d9114ea0000349cc52f0947ea00f1 Mon Sep 17 00:00:00 2001
From: Stewart Smith
Date: Wed, 27 Jul 2016 17:43:04 +1000
Subject: doc/*.txt: rename .txt to .rst

Signed-off-by: Stewart Smith
---
 doc/bmc.rst | 57 +++++
 doc/bmc.txt | 57 -----
 doc/console-log.rst | 60 +++++
 doc/console-log.txt | 60 -----
 doc/device-tree.rst | 515 +++++++++++++++++++++++++++++++++++++++++++
 doc/device-tree.txt | 515 -------------------------------------------
 doc/error-logging.rst | 395 +++++++++++++++++++++++++++++++++
 doc/error-logging.txt | 395 ---------------------------------
 doc/gcov.rst | 62 ++++++
 doc/gcov.txt | 62 ------
 doc/memory.rst | 44 ++++
 doc/memory.txt | 44 ----
 doc/nvlink.rst | 157 +++++++++++++
 doc/nvlink.txt | 157 -------------
 doc/opal-spec.rst | 216 ++++++++++++++++++
 doc/opal-spec.txt | 216 ------------------
 doc/pci-slot.rst | 119 ++++++++++
 doc/pci-slot.txt | 119 ----------
 doc/pci.rst | 71 ++++++
 doc/pci.txt | 71 ------
 doc/stable-skiboot-rules.rst | 62 ++++++
 doc/stable-skiboot-rules.txt | 62 ------
 doc/versioning.rst | 107 +++++++++
 doc/versioning.txt | 107 ---------
 doc/xscom-node-bindings.rst | 57 +++++
 doc/xscom-node-bindings.txt | 57 -----
 26 files changed, 1922 insertions(+), 1922 deletions(-)
 create mode 100644 doc/bmc.rst
 delete mode 100644 doc/bmc.txt
 create mode 100644 doc/console-log.rst
 delete mode 100644 doc/console-log.txt
 create mode 100644 doc/device-tree.rst
 delete mode 100644 doc/device-tree.txt
 create mode 100644 doc/error-logging.rst
 delete mode 100644 doc/error-logging.txt
 create mode 100644 doc/gcov.rst
 delete mode 100644 doc/gcov.txt
 create mode 100644 doc/memory.rst
 delete mode 100644 doc/memory.txt
 create mode 100644 doc/nvlink.rst
 delete mode 100644 doc/nvlink.txt
 create mode 100644 doc/opal-spec.rst
 delete mode 100644 doc/opal-spec.txt
 create mode 100644 doc/pci-slot.rst
 delete mode 100644 doc/pci-slot.txt
 create mode 100644 doc/pci.rst
 delete mode 100644 doc/pci.txt
 create mode 100644 doc/stable-skiboot-rules.rst
 delete mode 100644 doc/stable-skiboot-rules.txt
 create mode 100644 doc/versioning.rst
 delete mode 100644 doc/versioning.txt
 create mode 100644 doc/xscom-node-bindings.rst
 delete mode 100644 doc/xscom-node-bindings.txt

diff --git a/doc/bmc.rst b/doc/bmc.rst
new file mode 100644
index 0000000..78c3b29
--- /dev/null
+++ b/doc/bmc.rst
@@ -0,0 +1,57 @@
+OPAL <--> BMC interactions
+==========================
+
+This document provides information about some of the user-visible interactions
+that skiboot performs with the BMC.
+
+IPMI sensors
+------------
+
+OPAL will interact with a few IPMI sensors during the boot process. These
+are:
+
+ * Boot Count [type 0xc3: OEM reserved]
+ * FW Boot progress [type 0x0f: System Firmware Progress]
+
+Boot Count: assertion type. When OPAL reaches a late stage of boot, it sets the
+boot count sensor to 0x02. This is intended to allow the BMC detect a failed
+or aborted boot, for switching to a known-good firmware image.
+
+FW Boot Progress: assertion type. During boot, skiboot will update this sensor
+to one of the IPMI-defined progress codes. The codes use by skiboot are:
+
+ * PCI Resource configuration (0x01)
+   - asserted as the PCI devices have been probed and resources allocated
+
+ * Motherboard init (0x14)
+   - asserted as the platform-specific components have been initialised
+
+ * OS boot (0x13)
+   - asserted after skiboot has loaded the PAYLOAD image, and is about to
+     boot it.
+
+Chassis control messages
+------------------------
+
+OPAL uses chassis control messages to instruct the BMC to remove power from
+the host. These messages are sent during graceful reboot and shutdown processes
+initiated by the host.
+
+For a BMC-initiated graceful power-down (or reboot), the BMC is expected to send
+an OEM-defined SEL message, using a SMS_ATN to trigger a BMC-to-host
+notification. This SEL has a type of 0xc0, and command of 0x04. The data0 field
+of the SEL indicates shutdown (0x0) or reboot (0x1).
+
+
+Watchdog support
+----------------
+
+OPAL supports a BMC watchdog during the boot process. This will be disabled
+before entering the OS.
+
+
+Real-time clock
+---------------
+
+On platforms where a real-time-clock is not available, skiboot may use the
+IPMI SEL Time as a real-time-clock device.
diff --git a/doc/bmc.txt b/doc/bmc.txt
deleted file mode 100644
index 78c3b29..0000000
--- a/doc/bmc.txt
+++ /dev/null
@@ -1,57 +0,0 @@
-OPAL <--> BMC interactions
-==========================
-
-This document provides information about some of the user-visible interactions
-that skiboot performs with the BMC.
-
-IPMI sensors
--------------
-
-OPAL will interact with a few IPMI sensors during the boot process. These
-are:
-
- * Boot Count [type 0xc3: OEM reserved]
- * FW Boot progress [type 0x0f: System Firmware Progress]
-
-Boot Count: assertion type. When OPAL reaches a late stage of boot, it sets the
-boot count sensor to 0x02. This is intended to allow the BMC detect a failed
-or aborted boot, for switching to a known-good firmware image.
-
-FW Boot Progress: assertion type. During boot, skiboot will update this sensor
-to one of the IPMI-defined progress codes. The codes use by skiboot are:
-
- * PCI Resource configuration (0x01)
-   - asserted as the PCI devices have been probed and resources allocated
-
- * Motherboard init (0x14)
-   - asserted as the platform-specific components have been initialised
-
- * OS boot (0x13)
-   - asserted after skiboot has loaded the PAYLOAD image, and is about to
-     boot it.
-
-Chassis control messages
--------------------------
-
-OPAL uses chassis control messages to instruct the BMC to remove power from
-the host. These messages are sent during graceful reboot and shutdown processes
-initiated by the host.
-
-For a BMC-initiated graceful power-down (or reboot), the BMC is expected to send
-an OEM-defined SEL message, using a SMS_ATN to trigger a BMC-to-host
-notification. This SEL has a type of 0xc0, and command of 0x04. The data0 field
-of the SEL indicates shutdown (0x0) or reboot (0x1).
-
-
-Watchdog support
------------------
-
-OPAL supports a BMC watchdog during the boot process. This will be disabled
-before entering the OS.
-
-
-Real-time clock
-----------------
-
-On platforms where a real-time-clock is not available, skiboot may use the
-IPMI SEL Time as a real-time-clock device.
diff --git a/doc/console-log.rst b/doc/console-log.rst
new file mode 100644
index 0000000..fbdd33b
--- /dev/null
+++ b/doc/console-log.rst
@@ -0,0 +1,60 @@
+SkiBoot Console Log
+-------------------
+
+Skiboot maintains a circular textual log buffer in memory.
+
+It can be accessed using any debugging method that can peek at
+memory contents. While the debug_descriptor does hold the location
+of the memory console, we're pretty keen on keeping its location
+static.
+
+Events are logged in the following format:
+[timebase,log_level] message
+
+You should use the new prlog() call for any log message and set the
+log level/priority appropriately.
+
+printf() is mapped to PR_PRINTF and should be phased out and replaced
+with prlog() calls.
+
+See timebase.h for full timebase explanation.
+
+Log level from skiboot.h:
+#define PR_EMERG 0
+#define PR_ALERT 1
+#define PR_CRIT 2
+#define PR_ERR 3
+#define PR_WARNING 4
+#define PR_NOTICE 5
+#define PR_PRINTF PR_NOTICE
+#define PR_INFO 6
+#define PR_DEBUG 7
+#define PR_TRACE 8
+#define PR_INSANE 9
+
+The console_log_levels byte in the debug_descriptor controls what
+messages are written to any console drivers (e.g. fsp, uart) and
+what level is just written to the in memory console (or not at all).
+
+This enables (advanced) users to vary what level of output they want
+at runtime in the memory console and through console drivers (fsp/uart)
+
+You can vary two things by poking in the debug descriptor:
+a) what log level is printed at all
+   e.g. only turn on PR_TRACE at specific points during runtime
+b) what log level goes out the fsp/uart console
+   defaults to PR_PRINTF
+
+We use two 4bit numbers (1 byte) for this in debug descriptor (saving
+some space, not needlessly wasting space that we may want in future).
+
+The default is 0x75 (7=PR_DEBUG to in memory console, 5=PR_PRINTF to drivers
+
+If you write 0x77 you will get debug info on uart/fsp console as
+well as in memory. If you write 0x95 you get PR_INSANE in memory but
+still only PR_NOTICE through drivers.
+
+People who write something like 0x1f will get a very quiet boot indeed.
+
+
+
diff --git a/doc/console-log.txt b/doc/console-log.txt
deleted file mode 100644
index fbdd33b..0000000
--- a/doc/console-log.txt
+++ /dev/null
@@ -1,60 +0,0 @@
-SkiBoot Console Log
---------------------
-
-Skiboot maintains a circular textual log buffer in memory.
-
-It can be accessed using any debugging method that can peek at
-memory contents. While the debug_descriptor does hold the location
-of the memory console, we're pretty keen on keeping its location
-static.
-
-Events are logged in the following format:
-[timebase,log_level] message
-
-You should use the new prlog() call for any log message and set the
-log level/priority appropriately.
-
-printf() is mapped to PR_PRINTF and should be phased out and replaced
-with prlog() calls.
-
-See timebase.h for full timebase explanation.
-
-Log level from skiboot.h:
-#define PR_EMERG 0
-#define PR_ALERT 1
-#define PR_CRIT 2
-#define PR_ERR 3
-#define PR_WARNING 4
-#define PR_NOTICE 5
-#define PR_PRINTF PR_NOTICE
-#define PR_INFO 6
-#define PR_DEBUG 7
-#define PR_TRACE 8
-#define PR_INSANE 9
-
-The console_log_levels byte in the debug_descriptor controls what
-messages are written to any console drivers (e.g. fsp, uart) and
-what level is just written to the in memory console (or not at all).
-
-This enables (advanced) users to vary what level of output they want
-at runtime in the memory console and through console drivers (fsp/uart)
-
-You can vary two things by poking in the debug descriptor:
-a) what log level is printed at all
-   e.g. only turn on PR_TRACE at specific points during runtime
-b) what log level goes out the fsp/uart console
-   defaults to PR_PRINTF
-
-We use two 4bit numbers (1 byte) for this in debug descriptor (saving
-some space, not needlessly wasting space that we may want in future).
-
-The default is 0x75 (7=PR_DEBUG to in memory console, 5=PR_PRINTF to drivers
-
-If you write 0x77 you will get debug info on uart/fsp console as
-well as in memory. If you write 0x95 you get PR_INSANE in memory but
-still only PR_NOTICE through drivers.
-
-People who write something like 0x1f will get a very quiet boot indeed.
-
-
-
diff --git a/doc/device-tree.rst b/doc/device-tree.rst
new file mode 100644
index 0000000..742ff43
--- /dev/null
+++ b/doc/device-tree.rst
@@ -0,0 +1,515 @@
+/*
+ * Sapphire device-tree requirements
+ *
+ * Version: 0.2.1
+ *
+ * This documents the generated device-tree requirements, this is based on
+ * a commented device-tree dump obtained from a DT generated by Sapphire
+ * itself from an HDAT.
+ */
+
+/*
+ * General comments:
+ *
+ * - skiboot does not require nodes to have phandle properties, but
+ *   if you have them then *all* nodes must have them including the
+ *   root of the device-tree (currently a HB bug !). It is recommended
+ *   to have them since they are needed to represent the cache levels.
+ *
+ *   NOTE: The example tree below only has phandle properties for
+ *   nodes that are referenced by other nodes. This is *not* correct
+ *   and is purely done for keeping this document smaller, make sure
+ *   to follow the rule above.
+ *
+ * - Only the "phandle" property is required. Sapphire also generates
+ *   a "linux,phandle" for backward compatibility but doesn't require
+ *   it as an input
+ *
+ * - Any property not specifically documented must be put in "as is"
+ *
+ * - All ibm,chip-id properties contain a HW chip ID which correspond
+ *   on P8 to the PIR value shifted right by 7 bits, ie. it's a 6-bit
+ *   value made of a 3-bit node number and a 3-bit chip number.
+ *
+ * - Unit addresses (@xxxx part of node names) should if possible use
+ *   lower case hexadecimal to be consistent with what skiboot does
+ *   and to help some stupid parsers out there...
+ */
+
+/*
+ * Version history
+ *
+ * 2013/10/08 : Version 0.1
+ *
+ * 2013/10/09 : Version 0.2
+ *
+ *   - Add comment about case of unit addresses
+ *   - Add missing lpc node definition
+ *   - Add example UART node on LPC
+ *   - Remove "status" property from PSI xsco nodes
+ *
+ * 2014/03/26 : Version 0.2.1
+ *
+ *   - Fix cpus/xxx/ibm,pa-features to be a byte array
+ */
+
+/dts-v1/;
+
+
+/*
+ * Here are the reserve map entries. They should exactly match the
+ * reserved-ranges property of the root node (see documentation
+ * of that property)
+ */
+
+/memreserve/ 0x00000007fe600000 0x0000000000100000;
+/memreserve/ 0x00000007fe200000 0x0000000000100000;
+/memreserve/ 0x0000000031e00000 0x00000000003e0000;
+/memreserve/ 0x0000000031000000 0x0000000000e00000;
+/memreserve/ 0x0000000030400000 0x0000000000c00000;
+/memreserve/ 0x0000000030000000 0x0000000000400000;
+/memreserve/ 0x0000000400000000 0x0000000000600450;
+
+/* Root node */
+/ {
+	/*
+	 * "compatible" properties are string lists (ASCII strings separated by
+	 * \0 characters) indicating the overall compatibility from the more
+	 * specific to the least specific.
+	 *
+	 * The root node compatible property *must* contain "ibm,powernv" for
+	 * Linux to have the powernv platform match the machine. It is recommended
+	 * to add a slightly more precise property (first in order) indicating more
+	 * precisely the board type. We don't currently do that in HDAT based
We don't currently do that in HDAT based + * setups but will. + * + * The standard naming is "vendor,name" so in your case, something like + * + * compatible = "goog,rhesus","ibm,powernv"; + * + * would work. Or even better: + * + * compatible = "goog,rhesus-v1","goog,rhesus","ibm,powernv"; + */ + compatible = "ibm,powernv"; + + /* mandatory */ + #address-cells = <0x2>; + #size-cells = <0x2>; + + /* User visible board name (will be shown in /proc/cpuinfo) */ + model = "Machine Name"; + + /* + * The reserved-names and reserve-names properties work hand in hand. The first one + * is a list of strings providing a "name" for each entry in the second one using + * the traditional "vendor,name" format. + * + * The reserved-ranges property contains a list of ranges, each in the form of 2 cells + * of address and 2 cells of size (64-bit x2 so each entry is 4 cells) indicating + * regions of memory that are reserved and must not be overwritten by skiboot or + * subsequently by the Linux Kernel. + * + * Corresponding entries must also be created in the "reserved map" part of the flat + * device-tree (which is a binary list in the header of the fdt). + * + * Unless a component (skiboot or Linux) specifically knows about a region (usually + * based on its name) and decides to change or remove it, all these regions are + * passed as-is to Linux and to subsequent kernels across kexec and are kept + * preserved. + * + * NOTE: Do *NOT* copy the entries below, they are just an example and are actually + * created by skiboot itself. They represent the SLW image as "detected" by reading + * the PBA BARs and skiboot own memory allocations. + * + * I would recommend that you put in there the SLW and OCC (or HOMER as one block + * if that's how you use it) and any additional memory you want to preserve such + * as FW log buffers etc... 
+	 */
+
+	reserved-names = "ibm,slw-image", "ibm,slw-image", "ibm,firmware-stacks", "ibm,firmware-data", "ibm,firmware-heap", "ibm,firmware-code", "memory@400000000";
+	reserved-ranges = <0x7 0xfe600000 0x0 0x100000 0x7 0xfe200000 0x0 0x100000 0x0 0x31e00000 0x0 0x3e0000 0x0 0x31000000 0x0 0xe00000 0x0 0x30400000 0x0 0xc00000 0x0 0x30000000 0x0 0x400000 0x4 0x0 0x0 0x600450>;
+
+	/* Mandatory */
+	cpus {
+		#address-cells = <0x1>;
+		#size-cells = <0x0>;
+
+		/*
+		 * The following node must exist for each *core* in the system. The unit
+		 * address (number after the @) is the hexadecimal HW CPU number (PIR value)
+		 * of thread 0 of that core.
+		 */
+		PowerPC,POWER8@20 {
+			/* mandatory/standard properties */
+			device_type = "cpu";
+			64-bit;
+			32-64-bridge;
+			graphics;
+			general-purpose;
+
+			/*
+			 * The "status" property indicate whether the core is functional. It's
+			 * a string containing "okay" for a good core or "bad" for a non-functional
+			 * one. You can also just ommit the non-functional ones from the DT
+			 */
+			status = "okay";
+
+			/*
+			 * This is the same value as the PIR of thread 0 of that core
+			 * (ie same as the @xx part of the node name)
+			 */
+			reg = <0x20>;
+
+			/* same as above */
+			ibm,pir = <0x20>;
+
+			/* chip ID of this core */
+			ibm,chip-id = <0x0>;
+
+			/*
+			 * interrupt server numbers (aka HW processor numbers) of all threads
+			 * on that core. This should have 8 numbers and the first one should
+			 * have the same value as the above ibm,pir and reg properties
+			 */
+			ibm,ppc-interrupt-server#s = <0x20 0x21 0x22 0x23 0x24 0x25 0x26 0x27>;
+
+			/*
+			 * This is the "architected processor version" as defined in PAPR. Just
Just + * stick to 0x0f000004 for P8 and things will be fine + */ + cpu-version = <0x0f000004>; + + /* + * These are various definitions of the page sizes and segment sizes + * supported by the MMU, those values are fine for P8 for now + */ + ibm,processor-segment-sizes = <0x1c 0x28 0xffffffff 0xffffffff>; + ibm,processor-page-sizes = <0xc 0x10 0x18 0x22>; + ibm,segment-page-sizes = <0xc 0x0 0x3 0xc 0x0 0x10 0x7 0x18 0x38 0x10 0x110 0x2 0x10 0x1 0x18 0x8 0x18 0x100 0x1 0x18 0x0 0x22 0x120 0x1 0x22 0x3>; + + /* + * Similarly that might need to be reviewed later but will do for now... + */ + ibm,pa-features = [0x6 0x0 0xf6 0x3f 0xc7 0x0 0x80 0xc0]; + + /* SLB size, use as-is */ + ibm,slb-size = <0x20>; + + /* VSX support, use as-is */ + ibm,vmx = <0x2>; + + /* DFP support, use as-is */ + ibm,dfp = <0x2>; + + /* PURR/SPURR support, use as-is */ + ibm,purr = <0x1>; + ibm,spurr = <0x1>; + + /* + * Old-style core clock frequency. Only create this property if the frequency fits + * in a 32-bit number. Do not create it if it doesn't + */ + clock-frequency = <0xf5552d00>; + + /* + * mandatory: 64-bit version of the core clock frequency, always create this + * property. 
+			 */
+			ibm,extended-clock-frequency = <0x0 0xf5552d00>;
+
+			/* Timebase freq has a fixed value, always use that */
+			timebase-frequency = <0x1e848000>;
+
+			/* Same */
+			ibm,extended-timebase-frequency = <0x0 0x1e848000>;
+
+			/* Use as-is, values might need to be adjusted but that will do for now */
+			reservation-granule-size = <0x80>;
+			d-tlb-size = <0x800>;
+			i-tlb-size = <0x0>;
+			tlb-size = <0x800>;
+			d-tlb-sets = <0x4>;
+			i-tlb-sets = <0x0>;
+			tlb-sets = <0x4>;
+			d-cache-block-size = <0x80>;
+			i-cache-block-size = <0x80>;
+			d-cache-size = <0x10000>;
+			i-cache-size = <0x8000>;
+			i-cache-sets = <0x4>;
+			d-cache-sets = <0x8>;
+			performance-monitor = <0x0 0x1>;
+
+			/*
+			 * optional: phandle of the node representing the L2 cache for this core,
+			 * note: it can also be named "next-level-cache", Linux will support both
+			 * and Sapphire doesn't currently use those properties, just passes them
+			 * along to Linux
+			 */
+			l2-cache = < 0x4 >;
+		};
+
+		/*
+		 * Cache nodes. Those are siblings of the processor nodes under /cpus and
+		 * represent the various level of caches.
+		 *
+		 * The unit address (and reg property) is mostly free-for-all as long as
+		 * there is no collisions. On HDAT machines we use the following encoding
+		 * which I encourage you to also follow to limit surprises:
+		 *
+		 *   L2 : (0x20 << 24) | PIR (PIR is PIR value of thread 0 of core)
+		 *   L3 : (0x30 << 24) | PIR
+		 *   L3.5 : (0x35 << 24) | PIR
+		 *
+		 * In addition, each cache points to the next level cache via its
+		 * own "l2-cache" (or "next-level-cache") property, so the core node
+		 * points to the L2, the L2 points to the L3 etc...
+		 */
+
+		l2-cache@20000020 {
+			phandle = <0x4>;
+			device_type = "cache";
+			reg = <0x20000020>;
+			status = "okay";
+			cache-unified;
+			d-cache-sets = <0x8>;
+			i-cache-sets = <0x8>;
+			d-cache-size = <0x80000>;
+			i-cache-size = <0x80000>;
+			l2-cache = <0x5>;
+		};
+
+		l3-cache@30000020 {
+			phandle = <0x5>;
+			device_type = "cache";
+			reg = <0x30000020>;
+			status = "bad";
+			cache-unified;
+			d-cache-sets = <0x8>;
+			i-cache-sets = <0x8>;
+			d-cache-size = <0x800000>;
+			i-cache-size = <0x800000>;
+		};
+
+	};
+
+	/*
+	 * Interrupt presentation controller (ICP) nodes
+	 *
+	 * There is some flexibility as to how many of these are presents since
+	 * a given node can represent multiple ICPs. When generating from HDAT we
+	 * chose to create one per core
+	 */
+	interrupt-controller@3ffff80020000 {
+		/* Mandatory */
+		compatible = "IBM,ppc-xicp", "IBM,power8-icp";
+		interrupt-controller;
+		#address-cells = <0x0>;
+		device_type = "PowerPC-External-Interrupt-Presentation";
+
+		/*
+		 * Range of HW CPU IDs represented by that node. In this example
+		 * the core starting at PIR 0x20 and 8 threads, which corresponds
+		 * to the CPU node of the example above. The property in theory
+		 * supports multiple ranges but Linux doesn't.
+		 */
+		ibm,interrupt-server-ranges = <0x20 0x8>;
+
+		/*
+		 * For each server in the above range, the physical address of the
+		 * ICP register block and its size. Since the root node #address-cells
+		 * and #size-cells properties are both "2", each entry is thus
+		 * 2 cells address and 2 cells size (64-bit each).
+		 */
+		reg = <0x3ffff 0x80020000 0x0 0x1000 0x3ffff 0x80021000 0x0 0x1000 0x3ffff 0x80022000 0x0 0x1000 0x3ffff 0x80023000 0x0 0x1000 0x3ffff 0x80024000 0x0 0x1000 0x3ffff 0x80025000 0x0 0x1000 0x3ffff 0x80026000 0x0 0x1000 0x3ffff 0x80027000 0x0 0x1000>;
+	};
+
+	/*
+	 * The "memory" nodes represent physical memory in the system. They
+	 * do not represent DIMMs, memory controllers or Centaurs, thus will
+	 * be expressed separately.
+	 *
+	 * In order to be able to handle affinity properly, we require that
+	 * a memory node is created for each range of memory that has a different
+	 * "affinity", which in practice means for each chip since we don't
+	 * support memory interleaved across multiple chips on P8.
+	 *
+	 * Additionally, it is *not* required that one chip = one memory node,
+	 * it is perfectly acceptable to break down the memory of one chip into
+	 * multiple memory nodes (typically skiboot does that if the two MCs
+	 * are not interlaved).
+	 */
+	memory@0 {
+		device_type = "memory";
+
+		/*
+		 * We support multiple entries in the ibm,chip-id property for
+		 * memory nodes in case the memory is interleaved across multiple
+		 * chips but that shouldn't happen on P8
+		 */
+		ibm,chip-id = <0x0>;
+
+		/* The "reg" property is 4 cells, as usual for a child of
+		 * the root node, 2 cells of address and 2 cells of size
+		 */
+		reg = <0x0 0x0 0x4 0x0>;
+	};
+
+	/*
+	 * The XSCOM node. This is the closest thing to a "chip" node we have.
+	 * there must be one per chip in the system (thus a DCM has two) and
+	 * while it represents the "parent" of various devices on the PIB/PCB
+	 * that we want to expose, it is also used to store all sort of
+	 * miscellaneous per-chip information on HDAT based systems (such
+	 * as VPDs).
+	 */
+	xscom@3fc0000000000 {
+		/* standard & mandatory */
+		#address-cells = <0x1>;
+		#size-cells = <0x1>;
+		scom-controller;
+		compatible = "ibm,xscom", "ibm,power8-xscom";
+
+		/* The chip ID as usual ... */
+		ibm,chip-id = <0x0>;
+
+		/* The base address of xscom for that chip */
+		reg = <0x3fc00 0x0 0x8 0x0>;
+
+		/*
+		 * This comes from HDAT and I *think* is the raw content of the
+		 * module VPD eeprom (and thus doesn't have a standard ASCII keyword
+		 * VPD format). We don't currently use it though ...
+		 */
+		ibm,module-vpd = < ... big pile of binary data ... >;
+
+		/* PSI host bridge XSCOM register set */
+		psihb@2010900 {
+			reg = <0x2010900 0x20>;
+			compatible = "ibm,power8-psihb-x", "ibm,psihb-x";
+		};
+
+		/* Chip TOD XSCOM register set */
+		chiptod@40000 {
+			reg = <0x40000 0x34>;
+			compatible = "ibm,power-chiptod", "ibm,power8-chiptod";
+
+			/*
+			 * Create that property with no value if this chip has
+			 * the Primary TOD in the topology. If it has the secondary
+			 * one (backup master ?) use "secondary".
+			 */
+			primary;
+		};
+
+		/* NX XSCOM register set */
+		nx@2010000 {
+			reg = <0x2010000 0x4000>;
+			compatible = "ibm,power-nx", "ibm,power8-nx";
+		};
+
+		/*
+		 * PCI "PE Master" XSCOM register set for each active PHB
+		 *
+		 * For now, do *not* create these if the PHB isn't connected,
+		 * clocked, or the PHY/HSS not configured.
+		 */
+		pbcq@2012000 {
+			reg = <0x2012000 0x20 0x9012000 0x5 0x9013c00 0x15>;
+			compatible = "ibm,power8-pbcq";
+
+			/* Indicate the PHB index on the chip, ie, 0,1 or 2 */
+			ibm,phb-index = <0x0>;
+
+			/* Create that property to use the IBM-style "A/B" dual input
+			 * slot presence detect mechanism.
+			 */
+			ibm,use-ab-detect;
+
+			/*
+			 * TBD: Lane equalization values. Not currently used by
+			 * skiboot but will have to be sorted out
+			 */
+			ibm,lane_eq = <0x0>;
+		};
+
+		pbcq@2012400 {
+			reg = <0x2012400 0x20 0x9012400 0x5 0x9013c40 0x15>;
+			compatible = "ibm,power8-pbcq";
+			ibm,phb-index = <0x1>;
+			ibm,use-ab-detect;
+			ibm,lane_eq = <0x0>;
+		};
+
+		/*
+		 * Here's the LPC bus. Ideally each chip has one but in
+		 * practice it's ok to only populate the ones actually
+		 * used for something. This is not an exact representation
+		 * of HW, in that case we would have eccb -> opb -> lpc,
+		 * but instead we just have an lpc node and the address is
+		 * the base of the ECCB register set for it
+		 *
+		 * Devices on the LPC are represented as children nodes,
+		 * see example below for a standard UART.
+		 */
+		lpc@b0020 {
+			/*
+			 * Empty property indicating this is the primary
+			 * LPC bus. It will be used for the default UART
+			 * if any and this is the bus that will be used
+			 * by Linux as the virtual 64k of IO ports
+			 */
+			primary;
+
+			/*
+			 * 2 cells of address, the first one indicates the
+			 * address type, see below
+			 */
+			#address-cells = <0x2>;
+			#size-cells = <0x1>;
+			reg = <0xb0020 0x4>;
+			compatible = "ibm,power8-lpc";
+
+			/*
+			 * Example device: a UART on IO ports.
+			 *
+			 * LPC address have 2 cells. The first cell is the
+			 * address type as follow:
+			 *
+			 *   0 : LPC memory space
+			 *   1 : LPC IO space
+			 *   2: LPC FW space
+			 *
+			 * (This corresponds to the OPAL_LPC_* arguments
+			 * passed to the opal_lpc_read/write functions)
+			 *
+			 * The unit address follows the old ISA convention
+			 * for open firmware which prefixes IO ports with "i".
+			 *
+			 * (This is not critical and can be 1,3f8 if that's
+			 * problematic to generate)
+			 */
+			serial@i3f8 {
+				reg = <0x1 0x3f8 8>;
+				compatible = "ns16550", "pnpPNP,501";
+
+				/* Baud rate generator base frequency */
+				clock-frequency = < 1843200 >;
+
+				/* Default speed to use */
+				current-speed = < 115200 >;
+
+				/* Historical, helps Linux */
+				device_type = "serial";
+
+				/*
+				 * Indicate which chip ID the interrupt
+				 * is routed to (we assume it will always
+				 * be the "host error interrupt" (aka
+				 * "TPM interrupt" of that chip).
+				 */
+				ibm,irq-chip-id = <0x0>;
+			}
+		};
+	};
+};
diff --git a/doc/device-tree.txt b/doc/device-tree.txt
deleted file mode 100644
index 742ff43..0000000
--- a/doc/device-tree.txt
+++ /dev/null
@@ -1,515 +0,0 @@
-/*
- * Sapphire device-tree requirements
- *
- * Version: 0.2.1
- *
- * This documents the generated device-tree requirements, this is based on
- * a commented device-tree dump obtained from a DT generated by Sapphire
- * itself from an HDAT.
- */
-
-/*
- * General comments:
- *
- * - skiboot does not require nodes to have phandle properties, but
- *   if you have them then *all* nodes must have them including the
- *   root of the device-tree (currently a HB bug !). It is recommended
- *   to have them since they are needed to represent the cache levels.
- *
- *   NOTE: The example tree below only has phandle properties for
- *   nodes that are referenced by other nodes. This is *not* correct
- *   and is purely done for keeping this document smaller, make sure
- *   to follow the rule above.
- *
- * - Only the "phandle" property is required. Sapphire also generates
- *   a "linux,phandle" for backward compatibility but doesn't require
- *   it as an input
- *
- * - Any property not specifically documented must be put in "as is"
- *
- * - All ibm,chip-id properties contain a HW chip ID which correspond
- *   on P8 to the PIR value shifted right by 7 bits, ie. it's a 6-bit
- *   value made of a 3-bit node number and a 3-bit chip number.
- *
- * - Unit addresses (@xxxx part of node names) should if possible use
- *   lower case hexadecimal to be consistent with what skiboot does
- *   and to help some stupid parsers out there...
- */
-
-/*
- * Version history
- *
- * 2013/10/08 : Version 0.1
- *
- * 2013/10/09 : Version 0.2
- *
- *   - Add comment about case of unit addresses
- *   - Add missing lpc node definition
- *   - Add example UART node on LPC
- *   - Remove "status" property from PSI xsco nodes
- *
- * 2014/03/26 : Version 0.2.1
- *
- *   - Fix cpus/xxx/ibm,pa-features to be a byte array
- */
-
-/dts-v1/;
-
-
-/*
- * Here are the reserve map entries. They should exactly match the
- * reserved-ranges property of the root node (see documentation
- * of that property)
- */
-
-/memreserve/ 0x00000007fe600000 0x0000000000100000;
-/memreserve/ 0x00000007fe200000 0x0000000000100000;
-/memreserve/ 0x0000000031e00000 0x00000000003e0000;
-/memreserve/ 0x0000000031000000 0x0000000000e00000;
-/memreserve/ 0x0000000030400000 0x0000000000c00000;
-/memreserve/ 0x0000000030000000 0x0000000000400000;
-/memreserve/ 0x0000000400000000 0x0000000000600450;
-
-/* Root node */
-/ {
-	/*
-	 * "compatible" properties are string lists (ASCII strings separated by
-	 * \0 characters) indicating the overall compatibility from the more
-	 * specific to the least specific.
-	 *
-	 * The root node compatible property *must* contain "ibm,powernv" for
-	 * Linux to have the powernv platform match the machine. It is recommended
-	 * to add a slightly more precise property (first in order) indicating more
-	 * precisely the board type. We don't currently do that in HDAT based
-	 * setups but will.
-	 *
-	 * The standard naming is "vendor,name" so in your case, something like
-	 *
-	 *   compatible = "goog,rhesus","ibm,powernv";
-	 *
-	 * would work. Or even better:
-	 *
-	 *   compatible = "goog,rhesus-v1","goog,rhesus","ibm,powernv";
-	 */
-	compatible = "ibm,powernv";
-
-	/* mandatory */
-	#address-cells = <0x2>;
-	#size-cells = <0x2>;
-
-	/* User visible board name (will be shown in /proc/cpuinfo) */
-	model = "Machine Name";
-
-	/*
-	 * The reserved-names and reserve-names properties work hand in hand. The first one
-	 * is a list of strings providing a "name" for each entry in the second one using
-	 * the traditional "vendor,name" format.
-	 *
-	 * The reserved-ranges property contains a list of ranges, each in the form of 2 cells
-	 * of address and 2 cells of size (64-bit x2 so each entry is 4 cells) indicating
-	 * regions of memory that are reserved and must not be overwritten by skiboot or
-	 * subsequently by the Linux Kernel.
-	 *
-	 * Corresponding entries must also be created in the "reserved map" part of the flat
-	 * device-tree (which is a binary list in the header of the fdt).
-	 *
-	 * Unless a component (skiboot or Linux) specifically knows about a region (usually
-	 * based on its name) and decides to change or remove it, all these regions are
-	 * passed as-is to Linux and to subsequent kernels across kexec and are kept
-	 * preserved.
-	 *
-	 * NOTE: Do *NOT* copy the entries below, they are just an example and are actually
-	 * created by skiboot itself. They represent the SLW image as "detected" by reading
-	 * the PBA BARs and skiboot own memory allocations.
-	 *
-	 * I would recommend that you put in there the SLW and OCC (or HOMER as one block
-	 * if that's how you use it) and any additional memory you want to preserve such
-	 * as FW log buffers etc...
-	 */
-
-	reserved-names = "ibm,slw-image", "ibm,slw-image", "ibm,firmware-stacks", "ibm,firmware-data", "ibm,firmware-heap", "ibm,firmware-code", "memory@400000000";
-	reserved-ranges = <0x7 0xfe600000 0x0 0x100000 0x7 0xfe200000 0x0 0x100000 0x0 0x31e00000 0x0 0x3e0000 0x0 0x31000000 0x0 0xe00000 0x0 0x30400000 0x0 0xc00000 0x0 0x30000000 0x0 0x400000 0x4 0x0 0x0 0x600450>;
-
-	/* Mandatory */
-	cpus {
-		#address-cells = <0x1>;
-		#size-cells = <0x0>;
-
-		/*
-		 * The following node must exist for each *core* in the system. The unit
-		 * address (number after the @) is the hexadecimal HW CPU number (PIR value)
-		 * of thread 0 of that core.
-		 */
-		PowerPC,POWER8@20 {
-			/* mandatory/standard properties */
-			device_type = "cpu";
-			64-bit;
-			32-64-bridge;
-			graphics;
-			general-purpose;
-
-			/*
-			 * The "status" property indicate whether the core is functional. It's
-			 * a string containing "okay" for a good core or "bad" for a non-functional
-			 * one. You can also just ommit the non-functional ones from the DT
-			 */
-			status = "okay";
-
-			/*
-			 * This is the same value as the PIR of thread 0 of that core
-			 * (ie same as the @xx part of the node name)
-			 */
-			reg = <0x20>;
-
-			/* same as above */
-			ibm,pir = <0x20>;
-
-			/* chip ID of this core */
-			ibm,chip-id = <0x0>;
-
-			/*
-			 * interrupt server numbers (aka HW processor numbers) of all threads
-			 * on that core. This should have 8 numbers and the first one should
-			 * have the same value as the above ibm,pir and reg properties
-			 */
-			ibm,ppc-interrupt-server#s = <0x20 0x21 0x22 0x23 0x24 0x25 0x26 0x27>;
-
-			/*
-			 * This is the "architected processor version" as defined in PAPR. Just
-			 * stick to 0x0f000004 for P8 and things will be fine
-			 */
-			cpu-version = <0x0f000004>;
-
-			/*
-			 * These are various definitions of the page sizes and segment sizes
-			 * supported by the MMU, those values are fine for P8 for now
-			 */
-			ibm,processor-segment-sizes = <0x1c 0x28 0xffffffff 0xffffffff>;
-			ibm,processor-page-sizes = <0xc 0x10 0x18 0x22>;
-			ibm,segment-page-sizes = <0xc 0x0 0x3 0xc 0x0 0x10 0x7 0x18 0x38 0x10 0x110 0x2 0x10 0x1 0x18 0x8 0x18 0x100 0x1 0x18 0x0 0x22 0x120 0x1 0x22 0x3>;
-
-			/*
-			 * Similarly that might need to be reviewed later but will do for now...
-			 */
-			ibm,pa-features = [0x6 0x0 0xf6 0x3f 0xc7 0x0 0x80 0xc0];
-
-			/* SLB size, use as-is */
-			ibm,slb-size = <0x20>;
-
-			/* VSX support, use as-is */
-			ibm,vmx = <0x2>;
-
-			/* DFP support, use as-is */
-			ibm,dfp = <0x2>;
-
-			/* PURR/SPURR support, use as-is */
-			ibm,purr = <0x1>;
-			ibm,spurr = <0x1>;
-
-			/*
-			 * Old-style core clock frequency. Only create this property if the frequency fits
-			 * in a 32-bit number. Do not create it if it doesn't
-			 */
-			clock-frequency = <0xf5552d00>;
-
-			/*
-			 * mandatory: 64-bit version of the core clock frequency, always create this
-			 * property.
- */ - ibm,extended-clock-frequency = <0x0 0xf5552d00>; - - /* Timebase freq has a fixed value, always use that */ - timebase-frequency = <0x1e848000>; - - /* Same */ - ibm,extended-timebase-frequency = <0x0 0x1e848000>; - - /* Use as-is, values might need to be adjusted but that will do for now */ - reservation-granule-size = <0x80>; - d-tlb-size = <0x800>; - i-tlb-size = <0x0>; - tlb-size = <0x800>; - d-tlb-sets = <0x4>; - i-tlb-sets = <0x0>; - tlb-sets = <0x4>; - d-cache-block-size = <0x80>; - i-cache-block-size = <0x80>; - d-cache-size = <0x10000>; - i-cache-size = <0x8000>; - i-cache-sets = <0x4>; - d-cache-sets = <0x8>; - performance-monitor = <0x0 0x1>; - - /* - * optional: phandle of the node representing the L2 cache for this core, - * note: it can also be named "next-level-cache", Linux will support both - * and Sapphire doesn't currently use those properties, just passes them - * along to Linux - */ - l2-cache = < 0x4 >; - }; - - /* - * Cache nodes. Those are siblings of the processor nodes under /cpus and - * represent the various level of caches. - * - * The unit address (and reg property) is mostly free-for-all as long as - * there is no collisions. On HDAT machines we use the following encoding - * which I encourage you to also follow to limit surprises: - * - * L2 : (0x20 << 24) | PIR (PIR is PIR value of thread 0 of core) - * L3 : (0x30 << 24) | PIR - * L3.5 : (0x35 << 24) | PIR - * - * In addition, each cache points to the next level cache via its - * own "l2-cache" (or "next-level-cache") property, so the core node - * points to the L2, the L2 points to the L3 etc... 
- */ - - l2-cache@20000020 { - phandle = <0x4>; - device_type = "cache"; - reg = <0x20000020>; - status = "okay"; - cache-unified; - d-cache-sets = <0x8>; - i-cache-sets = <0x8>; - d-cache-size = <0x80000>; - i-cache-size = <0x80000>; - l2-cache = <0x5>; - }; - - l3-cache@30000020 { - phandle = <0x5>; - device_type = "cache"; - reg = <0x30000020>; - status = "bad"; - cache-unified; - d-cache-sets = <0x8>; - i-cache-sets = <0x8>; - d-cache-size = <0x800000>; - i-cache-size = <0x800000>; - }; - - }; - - /* - * Interrupt presentation controller (ICP) nodes - * - * There is some flexibility as to how many of these are presents since - * a given node can represent multiple ICPs. When generating from HDAT we - * chose to create one per core - */ - interrupt-controller@3ffff80020000 { - /* Mandatory */ - compatible = "IBM,ppc-xicp", "IBM,power8-icp"; - interrupt-controller; - #address-cells = <0x0>; - device_type = "PowerPC-External-Interrupt-Presentation"; - - /* - * Range of HW CPU IDs represented by that node. In this example - * the core starting at PIR 0x20 and 8 threads, which corresponds - * to the CPU node of the example above. The property in theory - * supports multiple ranges but Linux doesn't. - */ - ibm,interrupt-server-ranges = <0x20 0x8>; - - /* - * For each server in the above range, the physical address of the - * ICP register block and its size. Since the root node #address-cells - * and #size-cells properties are both "2", each entry is thus - * 2 cells address and 2 cells size (64-bit each). - */ - reg = <0x3ffff 0x80020000 0x0 0x1000 0x3ffff 0x80021000 0x0 0x1000 0x3ffff 0x80022000 0x0 0x1000 0x3ffff 0x80023000 0x0 0x1000 0x3ffff 0x80024000 0x0 0x1000 0x3ffff 0x80025000 0x0 0x1000 0x3ffff 0x80026000 0x0 0x1000 0x3ffff 0x80027000 0x0 0x1000>; - }; - - /* - * The "memory" nodes represent physical memory in the system. They - * do not represent DIMMs, memory controllers or Centaurs, thus will - * be expressed separately. 
- * - * In order to be able to handle affinity properly, we require that - * a memory node is created for each range of memory that has a different - * "affinity", which in practice means for each chip since we don't - * support memory interleaved across multiple chips on P8. - * - * Additionally, it is *not* required that one chip = one memory node, - * it is perfectly acceptable to break down the memory of one chip into - * multiple memory nodes (typically skiboot does that if the two MCs - * are not interlaved). - */ - memory@0 { - device_type = "memory"; - - /* - * We support multiple entries in the ibm,chip-id property for - * memory nodes in case the memory is interleaved across multiple - * chips but that shouldn't happen on P8 - */ - ibm,chip-id = <0x0>; - - /* The "reg" property is 4 cells, as usual for a child of - * the root node, 2 cells of address and 2 cells of size - */ - reg = <0x0 0x0 0x4 0x0>; - }; - - /* - * The XSCOM node. This is the closest thing to a "chip" node we have. - * there must be one per chip in the system (thus a DCM has two) and - * while it represents the "parent" of various devices on the PIB/PCB - * that we want to expose, it is also used to store all sort of - * miscellaneous per-chip information on HDAT based systems (such - * as VPDs). - */ - xscom@3fc0000000000 { - /* standard & mandatory */ - #address-cells = <0x1>; - #size-cells = <0x1>; - scom-controller; - compatible = "ibm,xscom", "ibm,power8-xscom"; - - /* The chip ID as usual ... */ - ibm,chip-id = <0x0>; - - /* The base address of xscom for that chip */ - reg = <0x3fc00 0x0 0x8 0x0>; - - /* - * This comes from HDAT and I *think* is the raw content of the - * module VPD eeprom (and thus doesn't have a standard ASCII keyword - * VPD format). We don't currently use it though ... - */ - ibm,module-vpd = < ... big pile of binary data ... 
>; - - /* PSI host bridge XSCOM register set */ - psihb@2010900 { - reg = <0x2010900 0x20>; - compatible = "ibm,power8-psihb-x", "ibm,psihb-x"; - }; - - /* Chip TOD XSCOM register set */ - chiptod@40000 { - reg = <0x40000 0x34>; - compatible = "ibm,power-chiptod", "ibm,power8-chiptod"; - - /* - * Create that property with no value if this chip has - * the Primary TOD in the topology. If it has the secondary - * one (backup master ?) use "secondary". - */ - primary; - }; - - /* NX XSCOM register set */ - nx@2010000 { - reg = <0x2010000 0x4000>; - compatible = "ibm,power-nx", "ibm,power8-nx"; - }; - - /* - * PCI "PE Master" XSCOM register set for each active PHB - * - * For now, do *not* create these if the PHB isn't connected, - * clocked, or the PHY/HSS not configured. - */ - pbcq@2012000 { - reg = <0x2012000 0x20 0x9012000 0x5 0x9013c00 0x15>; - compatible = "ibm,power8-pbcq"; - - /* Indicate the PHB index on the chip, ie, 0,1 or 2 */ - ibm,phb-index = <0x0>; - - /* Create that property to use the IBM-style "A/B" dual input - * slot presence detect mechanism. - */ - ibm,use-ab-detect; - - /* - * TBD: Lane equalization values. Not currently used by - * skiboot but will have to be sorted out - */ - ibm,lane_eq = <0x0>; - }; - - pbcq@2012400 { - reg = <0x2012400 0x20 0x9012400 0x5 0x9013c40 0x15>; - compatible = "ibm,power8-pbcq"; - ibm,phb-index = <0x1>; - ibm,use-ab-detect; - ibm,lane_eq = <0x0>; - }; - - /* - * Here's the LPC bus. Ideally each chip has one but in - * practice it's ok to only populate the ones actually - * used for something. This is not an exact representation - * of HW, in that case we would have eccb -> opb -> lpc, - * but instead we just have an lpc node and the address is - * the base of the ECCB register set for it - * - * Devices on the LPC are represented as children nodes, - * see example below for a standard UART. - */ - lpc@b0020 { - /* - * Empty property indicating this is the primary - * LPC bus. 
It will be used for the default UART - * if any and this is the bus that will be used - * by Linux as the virtual 64k of IO ports - */ - primary; - - /* - * 2 cells of address, the first one indicates the - * address type, see below - */ - #address-cells = <0x2>; - #size-cells = <0x1>; - reg = <0xb0020 0x4>; - compatible = "ibm,power8-lpc"; - - /* - * Example device: a UART on IO ports. - * - * LPC address have 2 cells. The first cell is the - * address type as follow: - * - * 0 : LPC memory space - * 1 : LPC IO space - * 2: LPC FW space - * - * (This corresponds to the OPAL_LPC_* arguments - * passed to the opal_lpc_read/write functions) - * - * The unit address follows the old ISA convention - * for open firmware which prefixes IO ports with "i". - * - * (This is not critical and can be 1,3f8 if that's - * problematic to generate) - */ - serial@i3f8 { - reg = <0x1 0x3f8 8>; - compatible = "ns16550", "pnpPNP,501"; - - /* Baud rate generator base frequency */ - clock-frequency = < 1843200 >; - - /* Default speed to use */ - current-speed = < 115200 >; - - /* Historical, helps Linux */ - device_type = "serial"; - - /* - * Indicate which chip ID the interrupt - * is routed to (we assume it will always - * be the "host error interrupt" (aka - * "TPM interrupt" of that chip). - */ - ibm,irq-chip-id = <0x0>; - } - }; - }; -}; diff --git a/doc/error-logging.rst b/doc/error-logging.rst new file mode 100644 index 0000000..7c62520 --- /dev/null +++ b/doc/error-logging.rst @@ -0,0 +1,395 @@ +How to log errors on Sapphire and POWERNV: +========================================= + +Currently the errors reported by POWERNV/Sapphire (OPAL) interfaces +are in free form, where as errors reported by FSP is in standard Platform +Error Log (PEL) format. For out-of band management via IPMI interfaces, +it is necessary to push down the errors to FSP via mailbox +(reported by POWERNV/Sapphire) in PEL format. 
+ +PEL size can vary from 2K-16K bytes, fields of which needs to populated +based on the kind of event and error that needs to be reported. +All the information needed to be reported as part of the error, is +passed by user using the error-logging interfaces outlined below. +Following which, PEL structure is generated based on the input and +then passed on to FSP. + +Error logging interfaces in Sapphire: +==================================== + +Interfaces are provided for the user to log/report an error in Sapphire. +Using these interfaces relevant error information is collected and later +converted to PEL format and then pushed to FSP. + +Step 1: To report an error, invoke opal_elog_create() with required argument. + + struct errorlog *opal_elog_create(struct opal_err_info *e_info, + uint32_t tag); + + Parameters: + + struct opal_err_info *e_info: Struct to hold information identifying + error/event source. + + uint32_t tag: Unique value to identify the data. + Ideal to have ASCII value for 4-byte string. + + The opal_err_info struct holds several pieces of information to help + identify the error/event. The struct can be obtained via the + DEFINE_LOG_ENTRY macro as below - it only needs to be called once. + + DEFINE_LOG_ENTRY(OPAL_RC_ATTN, OPAL_PLATFORM_ERR_EVT, OPAL_CHIP, + OPAL_PLATFORM_FIRMWARE, OPAL_PREDICTIVE_ERR_GENERAL, + OPAL_NA); + + The various attributes set by this macro are described below. 
+ + uint8_t opal_error_event_type: Classification of error/events + type reported on OPAL + /* Platform Events/Errors: Report Machine Check Interrupt */ + #define OPAL_PLATFORM_ERR_EVT 0x01 + /* INPUT_OUTPUT: Report all I/O related events/errors */ + #define OPAL_INPUT_OUTPUT_ERR_EVT 0x02 + /* RESOURCE_DEALLOC: Hotplug events and errors */ + #define OPAL_RESOURCE_DEALLOC_ERR_EVT 0x03 + /* MISC: Miscellaneous error */ + #define OPAL_MISC_ERR_EVT 0x04 + + uint16_t component_id: Component ID of Sapphire component as + listed in include/errorlog.h + + uint8_t subsystem_id: ID of the sub-system reporting error. + /* OPAL Subsystem IDs listed for reporting events/errors */ + #define OPAL_PROCESSOR_SUBSYSTEM 0x10 + #define OPAL_MEMORY_SUBSYSTEM 0x20 + #define OPAL_IO_SUBSYSTEM 0x30 + #define OPAL_IO_DEVICES 0x40 + #define OPAL_CEC_HARDWARE 0x50 + #define OPAL_POWER_COOLING 0x60 + #define OPAL_MISC 0x70 + #define OPAL_SURVEILLANCE_ERR 0x7A + #define OPAL_PLATFORM_FIRMWARE 0x80 + #define OPAL_SOFTWARE 0x90 + #define OPAL_EXTERNAL_ENV 0xA0 + + uint8_t event_severity: Severity of the event/error to be reported + #define OPAL_INFO 0x00 + #define OPAL_RECOVERED_ERR_GENERAL 0x10 + + /* 0x2X series is to denote set of Predictive Error */ + /* 0x20 Generic predictive error */ + #define OPAL_PREDICTIVE_ERR_GENERAL 0x20 + /* 0x21 Predictive error, degraded performance */ + #define OPAL_PREDICTIVE_ERR_DEGRADED_PERF 0x21 + /* 0x22 Predictive error, fault may be corrected after reboot */ + #define OPAL_PREDICTIVE_ERR_FAULT_RECTIFY_REBOOT 0x22 + /* + * 0x23 Predictive error, fault may be corrected after reboot, + * degraded performance + */ + #define OPAL_PREDICTIVE_ERR_FAULT_RECTIFY_BOOT_DEGRADE_PERF 0x23 + /* 0x24 Predictive error, loss of redundancy */ + #define OPAL_PREDICTIVE_ERR_LOSS_OF_REDUNDANCY 0x24 + + /* 0x4X series for Unrecoverable Error */ + /* 0x40 Generic Unrecoverable error */ + #define OPAL_UNRECOVERABLE_ERR_GENERAL 0x40 + /* 0x41 Unrecoverable error bypassed with 
degraded performance */ + #define OPAL_UNRECOVERABLE_ERR_DEGRADE_PERF 0x41 + /* 0x44 Unrecoverable error bypassed with loss of redundancy */ + #define OPAL_UNRECOVERABLE_ERR_LOSS_REDUNDANCY 0x44 + /* 0x45 Unrecoverable error bypassed with loss of redundancy and performance */ + #define OPAL_UNRECOVERABLE_ERR_LOSS_REDUNDANCY_PERF 0x45 + /* 0x48 Unrecoverable error bypassed with loss of function */ + #define OPAL_UNRECOVERABLE_ERR_LOSS_OF_FUNCTION 0x48 + + #define OPAL_ERROR_PANIC 0x50 + + uint8_t event_subtype: Event Sub-type + #define OPAL_NA 0x00 + #define OPAL_MISCELLANEOUS_INFO_ONLY 0x01 + #define OPAL_PREV_REPORTED_ERR_RECTIFIED 0x10 + #define OPAL_SYS_RESOURCES_DECONFIG_BY_USER 0x20 + #define OPAL_SYS_RESOURCE_DECONFIG_PRIOR_ERR 0x21 + #define OPAL_RESOURCE_DEALLOC_EVENT_NOTIFY 0x22 + #define OPAL_CONCURRENT_MAINTENANCE_EVENT 0x40 + #define OPAL_CAPACITY_UPGRADE_EVENT 0x60 + #define OPAL_RESOURCE_SPARING_EVENT 0x70 + #define OPAL_DYNAMIC_RECONFIG_EVENT 0x80 + #define OPAL_NORMAL_SYS_PLATFORM_SHUTDOWN 0xD0 + #define OPAL_ABNORMAL_POWER_OFF 0xE0 + + uint8_t opal_srctype: SRC type, value should be OPAL_SRC_TYPE_ERROR. + SRC refers to System Reference Code. + It is 4 byte hexa-decimal number that reflects the + current system state. + Eg: BB821010, + 1st byte -> BB -> SRC Type + 2nd byte -> 82 -> Subsystem + 3rd, 4th byte -> Component ID and Reason Code + SRC needs to be generated on the fly depending on the state + of the system. All the parameters needed to generate a SRC + should be provided during reporting of an event/error. 
+ + + uint32_t reason_code: Reason for failure as stated in include/errorlog.h + for Sapphire + Eg: Reason code for code-update failures can be + OPAL_RC_CU_INIT -> Initialisation failure + OPAL_RC_CU_FLASH -> Flash failure + + +Step 2: Data can be appended to the user data section using the either of + the below two interfaces: + + void log_append_data(struct errorlog *buf, unsigned char *data, + uint16_t size) + + Parameters: + struct opal_errorlog *buf: + struct opal_errorlog *buf: struct opal_errorlog pointer returned + by opal_elog_create() call. + + unsigned char *data: Pointer to the dump data + + uint16_t size: Size of the dump data. + + void log_append_msg(struct errorlog *buf, const char *fmt, ...) + + Parameters: + struct opal_errorlog *buf: + struct opal_errorlog *buf: struct opal_errorlog pointer returned + by opal_elog_create() call. + + const char *fmt: Formatted error log string. + + Additional user data sections can be added to the error log to + separate data (eg. readable text vs binary data) by calling + log_add_section(). The interfaces in Step 2 operate on the 'last' + user data section of the error log. + + void log_add_section(struct errorlog *buf, uint32_t tag); + + Parameters: + struct opal_errorlog *buf: + struct opal_errorlog *buf: struct opal_errorlog pointer returned + by opal_elog_create() call. + + uint32_t tag: Unique value to identify the data. + Ideal to have ASCII value for 4-byte string. + +Step 3: Once all the data for an error is logged in, the error needs to be + committed in FSP. + + rc = elog_fsp_commit(buf); + Value of 0 is returned on success. + +In the process of committing an error to FSP, log info is first internally +converted to PEL format and then pushed to the FSP. All the errors logged +in Sapphire are again pushed up to POWERNV platform by the FSP and all the errors +reported by Sapphire and POWERNV are logged in FSP. 
+ +If the user does not intend to dump various user data sections, but just +log the error with some amount of description around that error, they can do +so using just the simple error logging interface + +log_simple_error(uint32_t reason_code, char *fmt, ...); + +Eg: log_simple_error(OPAL_RC_SURVE_STATUS, + "SURV: Error retreiving surveillance status: %d\n", + err_len); + +Using the reason code, an error log is generated with the information derived +from the look-up table, populated and committed to FSP. All of it +is done with just one call. + +Note: +==== +* For more information regarding error logging and PEL format + refer to PAPR doc and P7 PEL and SRC PLDD document. + +* Refer to include/errorlog.h for all the error logging + interface parameters and include/pel.h for PEL + structures. + +Sample error logging: +=================== + +DEFINE_LOG_ENTRY(OPAL_RC_ATTN, OPAL_PLATFORM_ERR_EVT, OPAL_ATTN, + OPAL_PLATFORM_FIRMWARE, OPAL_PREDICTIVE_ERR_GENERAL, + OPAL_NA); + +void report_error(int index) +{ + struct errorlog *buf; + char data1[] = "This is a sample user defined data section1"; + char data2[] = "Error logging sample. These are dummy errors. Section 2"; + char data3[] = "Sample error Sample error Sample error Sample error \ + Sample error abcdefghijklmnopqrstuvwxyz"; + int tag; + + printf("ELOG: In machine check report error index: %d\n", index); + + /* To report an error, create an error log with relevant information + * opal_elog_create(). Call returns a pre-allocated buffer of type + * 'struct errorlog' buffer with relevant fields updated. 
+ */ + + /* tag -> unique ascii tag to identify a particular data dump section */ + tag = 0x4b4b4b4b; + buf = opal_elog_create(&e_info(OPAL_RC_ATTN), tag); + if (buf == NULL) { + printf("ELOG: Error getting buffer.\n"); + return; + } + + /* Append data or text with log_append_data() or log_append_msg() */ + log_append_data(buf, data1, sizeof(data1)); + + /* In case of user wanting to add multiple sections of various dump data + * for better debug, data sections can be added using this interface + * void log_add_section(struct errorlog *buf, uint32_t tag); + */ + tag = 0x4c4c4c4c; + log_add_section(buf, tag); + log_append_data(buf, data2, sizeof(data2)); + log_append_data(buf, data3, sizeof(data3)); + + /* Once all info is updated, ready to be sent to FSP */ + printf("ELOG:commit to FSP\n"); + log_commit(buf); +} + + Sample output PEL dump got from FSP: + =================================== + $ errl -d -x 0x533C9B37 +| 00000000 50480030 01004154 20150728 02000500 PH.0..AT ..(.... | +| 00000010 20150728 02000566 4B000107 00000000 ..(...fK....... | +| 00000020 00000000 00000000 B0000002 533C9B37 ............S..7 | +| 00000030 55480018 01004154 80002000 00000000 UH....AT.. ..... | +| 00000040 00002000 01005300 50530050 01004154 .. ...S.PS.P..AT | +| 00000050 02000008 00000048 00000080 00000000 .......H........ | +| 00000060 00000000 00000000 00000000 00000000 ................ | +| 00000070 00000000 00000000 42423832 31343130 ........BB821410 | +| 00000080 20202020 20202020 20202020 20202020 | +| 00000090 20202020 20202020 4548004C 01004154 EH.L..AT | +| 000000A0 38323836 2D343241 31303738 34415400 8286-42A10784AT. | +| 000000B0 00000000 00000000 00000000 00000000 ................ | +| 000000C0 00000000 00000000 00000000 00000000 ................ | +| 000000D0 00000000 00000000 20150728 02000500 ........ ..(.... | +| 000000E0 00000000 4D54001C 01004154 38323836 ....MT....AT8286 | +| 000000F0 2D343241 31303738 34415400 00000000 -42A10784AT..... 
| +| 00000100 5544003C 01004154 4B4B4B4B 00340000 UD....ATKKKK.4.. | +| 00000110 54686973 20697320 61207361 6D706C65 This is a sample | +| 00000120 20757365 72206465 66696E65 64206461 user defined da | +| 00000130 74612073 65637469 6F6E3100 554400A7 ta section1.UD.. | +| 00000140 01004154 4C4C4C4C 009F0000 4572726F ..ATLLLL....Erro | +| 00000150 72206C6F 6767696E 67207361 6D706C65 r logging sample | +| 00000160 2E205468 65736520 61726520 64756D6D . These are dumm | +| 00000170 79206572 726F7273 2E205365 6374696F y errors. Sectio | +| 00000180 6E203200 53616D70 6C652065 72726F72 n 2.Sample error | +| 00000190 2053616D 706C6520 6572726F 72205361 Sample error Sa | +| 000001A0 6D706C65 20657272 6F722053 616D706C mple error Sampl | +| 000001B0 65206572 726F7220 09090953 616D706C e error ...Sampl | +| 000001C0 65206572 726F7220 61626364 65666768 e error abcdefgh | +| 000001D0 696A6B6C 6D6E6F70 71727374 75767778 ijklmnopqrstuvwx | +| 000001E0 797A00 yz. | +|------------------------------------------------------------------------------| +| Platform Event Log - 0x533C9B37 | +|------------------------------------------------------------------------------| +| Private Header | +|------------------------------------------------------------------------------| +| Section Version : 1 | +| Sub-section type : 0 | +| Created by : 4154 | +| Created at : 07/28/2015 02:00:05 | +| Committed at : 07/28/2015 02:00:05 | +| Creator Subsystem : OPAL | +| CSSVER : | +| Platform Log Id : 0xB0000002 | +| Entry Id : 0x533C9B37 | +| Total Log Size : 483 | +|------------------------------------------------------------------------------| +| User Header | +|------------------------------------------------------------------------------| +| Section Version : 1 | +| Sub-section type : 0 | +| Log Committed by : 4154 | +| Subsystem : Platform Firmware | +| Event Scope : Unknown - 0x00000000 | +| Event Severity : Predictive Error | +| Event Type : Not Applicable | +| Return Code : 0x00000000 | +| Action 
Flags : Report Externally | +| Action Status : Sent to Hypervisor | +|------------------------------------------------------------------------------| +| Primary System Reference Code | +|------------------------------------------------------------------------------| +| Section Version : 1 | +| Sub-section type : 0 | +| Created by : 4154 | +| SRC Format : 0x80 | +| SRC Version : 0x02 | +| Virtual Progress SRC : False | +| I5/OS Service Event Bit : False | +| Hypervisor Dump Initiated: False | +| Power Control Net Fault : False | +| | +| Valid Word Count : 0x08 | +| Reference Code : BB821410 | +| Hex Words 2 - 5 : 00000080 00000000 00000000 00000000 | +| Hex Words 6 - 9 : 00000000 00000000 00000000 00000000 | +| | +|------------------------------------------------------------------------------| +| Extended User Header | +|------------------------------------------------------------------------------| +| Section Version : 1 | +| Sub-section type : 0 | +| Created by : 4154 | +| Reporting Machine Type : 8286-42A | +| Reporting Serial Number : 10784AT | +| FW Released Ver : | +| FW SubSys Version : | +| Common Ref Time : 07/28/2015 02:00:05 | +| Symptom Id Len : 0 | +| Symptom Id : | +|------------------------------------------------------------------------------| +| Machine Type/Model & Serial Number | +|------------------------------------------------------------------------------| +| Section Version : 1 | +| Sub-section type : 0 | +| Created by : 4154 | +| Machine Type Model : 8286-42A | +| Serial Number : 10784AT | +|------------------------------------------------------------------------------| +| User Defined Data | +|------------------------------------------------------------------------------| +| Section Version : 1 | +| Sub-section type : 0 | +| Created by : 4154 | +| | +| 00000000 4B4B4B4B 00340000 54686973 20697320 KKKK.4..This is | +| 00000010 61207361 6D706C65 20757365 72206465 a sample user de | +| 00000020 66696E65 64206461 74612073 65637469 fined data 
secti | +| 00000030 6F6E3100 on1. | +| | +|------------------------------------------------------------------------------| +| User Defined Data | +|------------------------------------------------------------------------------| +| Section Version : 1 | +| Sub-section type : 0 | +| Created by : 4154 | +| | +| 00000000 4C4C4C4C 009F0000 4572726F 72206C6F LLLL....Error lo | +| 00000010 6767696E 67207361 6D706C65 2E205468 gging sample. Th | +| 00000020 65736520 61726520 64756D6D 79206572 ese are dummy er | +| 00000030 726F7273 2E205365 6374696F 6E203200 rors. Section 2. | +| 00000040 53616D70 6C652065 72726F72 2053616D Sample error Sam | +| 00000050 706C6520 6572726F 72205361 6D706C65 ple error Sample | +| 00000060 20657272 6F722053 616D706C 65206572 error Sample er | +| 00000070 726F7220 09090953 616D706C 65206572 ror ...Sample er | +| 00000080 726F7220 61626364 65666768 696A6B6C ror abcdefghijkl | +| 00000090 6D6E6F70 71727374 75767778 797A00 mnopqrstuvwxyz. | +| | +|------------------------------------------------------------------------------| + diff --git a/doc/error-logging.txt b/doc/error-logging.txt deleted file mode 100644 index 7c62520..0000000 --- a/doc/error-logging.txt +++ /dev/null @@ -1,395 +0,0 @@ -How to log errors on Sapphire and POWERNV: -========================================= - -Currently the errors reported by POWERNV/Sapphire (OPAL) interfaces -are in free form, where as errors reported by FSP is in standard Platform -Error Log (PEL) format. For out-of band management via IPMI interfaces, -it is necessary to push down the errors to FSP via mailbox -(reported by POWERNV/Sapphire) in PEL format. - -PEL size can vary from 2K-16K bytes, fields of which needs to populated -based on the kind of event and error that needs to be reported. -All the information needed to be reported as part of the error, is -passed by user using the error-logging interfaces outlined below. 
-Following which, PEL structure is generated based on the input and -then passed on to FSP. - -Error logging interfaces in Sapphire: -==================================== - -Interfaces are provided for the user to log/report an error in Sapphire. -Using these interfaces relevant error information is collected and later -converted to PEL format and then pushed to FSP. - -Step 1: To report an error, invoke opal_elog_create() with required argument. - - struct errorlog *opal_elog_create(struct opal_err_info *e_info, - uint32_t tag); - - Parameters: - - struct opal_err_info *e_info: Struct to hold information identifying - error/event source. - - uint32_t tag: Unique value to identify the data. - Ideal to have ASCII value for 4-byte string. - - The opal_err_info struct holds several pieces of information to help - identify the error/event. The struct can be obtained via the - DEFINE_LOG_ENTRY macro as below - it only needs to be called once. - - DEFINE_LOG_ENTRY(OPAL_RC_ATTN, OPAL_PLATFORM_ERR_EVT, OPAL_CHIP, - OPAL_PLATFORM_FIRMWARE, OPAL_PREDICTIVE_ERR_GENERAL, - OPAL_NA); - - The various attributes set by this macro are described below. - - uint8_t opal_error_event_type: Classification of error/events - type reported on OPAL - /* Platform Events/Errors: Report Machine Check Interrupt */ - #define OPAL_PLATFORM_ERR_EVT 0x01 - /* INPUT_OUTPUT: Report all I/O related events/errors */ - #define OPAL_INPUT_OUTPUT_ERR_EVT 0x02 - /* RESOURCE_DEALLOC: Hotplug events and errors */ - #define OPAL_RESOURCE_DEALLOC_ERR_EVT 0x03 - /* MISC: Miscellaneous error */ - #define OPAL_MISC_ERR_EVT 0x04 - - uint16_t component_id: Component ID of Sapphire component as - listed in include/errorlog.h - - uint8_t subsystem_id: ID of the sub-system reporting error. 
- /* OPAL Subsystem IDs listed for reporting events/errors */ - #define OPAL_PROCESSOR_SUBSYSTEM 0x10 - #define OPAL_MEMORY_SUBSYSTEM 0x20 - #define OPAL_IO_SUBSYSTEM 0x30 - #define OPAL_IO_DEVICES 0x40 - #define OPAL_CEC_HARDWARE 0x50 - #define OPAL_POWER_COOLING 0x60 - #define OPAL_MISC 0x70 - #define OPAL_SURVEILLANCE_ERR 0x7A - #define OPAL_PLATFORM_FIRMWARE 0x80 - #define OPAL_SOFTWARE 0x90 - #define OPAL_EXTERNAL_ENV 0xA0 - - uint8_t event_severity: Severity of the event/error to be reported - #define OPAL_INFO 0x00 - #define OPAL_RECOVERED_ERR_GENERAL 0x10 - - /* 0x2X series is to denote set of Predictive Error */ - /* 0x20 Generic predictive error */ - #define OPAL_PREDICTIVE_ERR_GENERAL 0x20 - /* 0x21 Predictive error, degraded performance */ - #define OPAL_PREDICTIVE_ERR_DEGRADED_PERF 0x21 - /* 0x22 Predictive error, fault may be corrected after reboot */ - #define OPAL_PREDICTIVE_ERR_FAULT_RECTIFY_REBOOT 0x22 - /* - * 0x23 Predictive error, fault may be corrected after reboot, - * degraded performance - */ - #define OPAL_PREDICTIVE_ERR_FAULT_RECTIFY_BOOT_DEGRADE_PERF 0x23 - /* 0x24 Predictive error, loss of redundancy */ - #define OPAL_PREDICTIVE_ERR_LOSS_OF_REDUNDANCY 0x24 - - /* 0x4X series for Unrecoverable Error */ - /* 0x40 Generic Unrecoverable error */ - #define OPAL_UNRECOVERABLE_ERR_GENERAL 0x40 - /* 0x41 Unrecoverable error bypassed with degraded performance */ - #define OPAL_UNRECOVERABLE_ERR_DEGRADE_PERF 0x41 - /* 0x44 Unrecoverable error bypassed with loss of redundancy */ - #define OPAL_UNRECOVERABLE_ERR_LOSS_REDUNDANCY 0x44 - /* 0x45 Unrecoverable error bypassed with loss of redundancy and performance */ - #define OPAL_UNRECOVERABLE_ERR_LOSS_REDUNDANCY_PERF 0x45 - /* 0x48 Unrecoverable error bypassed with loss of function */ - #define OPAL_UNRECOVERABLE_ERR_LOSS_OF_FUNCTION 0x48 - - #define OPAL_ERROR_PANIC 0x50 - - uint8_t event_subtype: Event Sub-type - #define OPAL_NA 0x00 - #define OPAL_MISCELLANEOUS_INFO_ONLY 0x01 - #define 
OPAL_PREV_REPORTED_ERR_RECTIFIED 0x10 - #define OPAL_SYS_RESOURCES_DECONFIG_BY_USER 0x20 - #define OPAL_SYS_RESOURCE_DECONFIG_PRIOR_ERR 0x21 - #define OPAL_RESOURCE_DEALLOC_EVENT_NOTIFY 0x22 - #define OPAL_CONCURRENT_MAINTENANCE_EVENT 0x40 - #define OPAL_CAPACITY_UPGRADE_EVENT 0x60 - #define OPAL_RESOURCE_SPARING_EVENT 0x70 - #define OPAL_DYNAMIC_RECONFIG_EVENT 0x80 - #define OPAL_NORMAL_SYS_PLATFORM_SHUTDOWN 0xD0 - #define OPAL_ABNORMAL_POWER_OFF 0xE0 - - uint8_t opal_srctype: SRC type, value should be OPAL_SRC_TYPE_ERROR. - SRC refers to System Reference Code. - It is 4 byte hexa-decimal number that reflects the - current system state. - Eg: BB821010, - 1st byte -> BB -> SRC Type - 2nd byte -> 82 -> Subsystem - 3rd, 4th byte -> Component ID and Reason Code - SRC needs to be generated on the fly depending on the state - of the system. All the parameters needed to generate a SRC - should be provided during reporting of an event/error. - - - uint32_t reason_code: Reason for failure as stated in include/errorlog.h - for Sapphire - Eg: Reason code for code-update failures can be - OPAL_RC_CU_INIT -> Initialisation failure - OPAL_RC_CU_FLASH -> Flash failure - - -Step 2: Data can be appended to the user data section using the either of - the below two interfaces: - - void log_append_data(struct errorlog *buf, unsigned char *data, - uint16_t size) - - Parameters: - struct opal_errorlog *buf: - struct opal_errorlog *buf: struct opal_errorlog pointer returned - by opal_elog_create() call. - - unsigned char *data: Pointer to the dump data - - uint16_t size: Size of the dump data. - - void log_append_msg(struct errorlog *buf, const char *fmt, ...) - - Parameters: - struct opal_errorlog *buf: - struct opal_errorlog *buf: struct opal_errorlog pointer returned - by opal_elog_create() call. - - const char *fmt: Formatted error log string. - - Additional user data sections can be added to the error log to - separate data (eg. 
readable text vs binary data) by calling - log_add_section(). The interfaces in Step 2 operate on the 'last' - user data section of the error log. - - void log_add_section(struct errorlog *buf, uint32_t tag); - - Parameters: - struct opal_errorlog *buf: - struct opal_errorlog *buf: struct opal_errorlog pointer returned - by opal_elog_create() call. - - uint32_t tag: Unique value to identify the data. - Ideal to have ASCII value for 4-byte string. - -Step 3: Once all the data for an error is logged in, the error needs to be - committed in FSP. - - rc = elog_fsp_commit(buf); - Value of 0 is returned on success. - -In the process of committing an error to FSP, log info is first internally -converted to PEL format and then pushed to the FSP. All the errors logged -in Sapphire are again pushed up to POWERNV platform by the FSP and all the errors -reported by Sapphire and POWERNV are logged in FSP. - -If the user does not intend to dump various user data sections, but just -log the error with some amount of description around that error, they can do -so using just the simple error logging interface - -log_simple_error(uint32_t reason_code, char *fmt, ...); - -Eg: log_simple_error(OPAL_RC_SURVE_STATUS, - "SURV: Error retreiving surveillance status: %d\n", - err_len); - -Using the reason code, an error log is generated with the information derived -from the look-up table, populated and committed to FSP. All of it -is done with just one call. - -Note: -==== -* For more information regarding error logging and PEL format - refer to PAPR doc and P7 PEL and SRC PLDD document. - -* Refer to include/errorlog.h for all the error logging - interface parameters and include/pel.h for PEL - structures. 
- -Sample error logging: -=================== - -DEFINE_LOG_ENTRY(OPAL_RC_ATTN, OPAL_PLATFORM_ERR_EVT, OPAL_ATTN, - OPAL_PLATFORM_FIRMWARE, OPAL_PREDICTIVE_ERR_GENERAL, - OPAL_NA); - -void report_error(int index) -{ - struct errorlog *buf; - char data1[] = "This is a sample user defined data section1"; - char data2[] = "Error logging sample. These are dummy errors. Section 2"; - char data3[] = "Sample error Sample error Sample error Sample error \ - Sample error abcdefghijklmnopqrstuvwxyz"; - int tag; - - printf("ELOG: In machine check report error index: %d\n", index); - - /* To report an error, create an error log with relevant information - * opal_elog_create(). Call returns a pre-allocated buffer of type - * 'struct errorlog' buffer with relevant fields updated. - */ - - /* tag -> unique ascii tag to identify a particular data dump section */ - tag = 0x4b4b4b4b; - buf = opal_elog_create(&e_info(OPAL_RC_ATTN), tag); - if (buf == NULL) { - printf("ELOG: Error getting buffer.\n"); - return; - } - - /* Append data or text with log_append_data() or log_append_msg() */ - log_append_data(buf, data1, sizeof(data1)); - - /* In case of user wanting to add multiple sections of various dump data - * for better debug, data sections can be added using this interface - * void log_add_section(struct errorlog *buf, uint32_t tag); - */ - tag = 0x4c4c4c4c; - log_add_section(buf, tag); - log_append_data(buf, data2, sizeof(data2)); - log_append_data(buf, data3, sizeof(data3)); - - /* Once all info is updated, ready to be sent to FSP */ - printf("ELOG:commit to FSP\n"); - log_commit(buf); -} - - Sample output PEL dump got from FSP: - =================================== - $ errl -d -x 0x533C9B37 -| 00000000 50480030 01004154 20150728 02000500 PH.0..AT ..(.... | -| 00000010 20150728 02000566 4B000107 00000000 ..(...fK....... | -| 00000020 00000000 00000000 B0000002 533C9B37 ............S..7 | -| 00000030 55480018 01004154 80002000 00000000 UH....AT.. ..... 
| -| 00000040 00002000 01005300 50530050 01004154 .. ...S.PS.P..AT | -| 00000050 02000008 00000048 00000080 00000000 .......H........ | -| 00000060 00000000 00000000 00000000 00000000 ................ | -| 00000070 00000000 00000000 42423832 31343130 ........BB821410 | -| 00000080 20202020 20202020 20202020 20202020 | -| 00000090 20202020 20202020 4548004C 01004154 EH.L..AT | -| 000000A0 38323836 2D343241 31303738 34415400 8286-42A10784AT. | -| 000000B0 00000000 00000000 00000000 00000000 ................ | -| 000000C0 00000000 00000000 00000000 00000000 ................ | -| 000000D0 00000000 00000000 20150728 02000500 ........ ..(.... | -| 000000E0 00000000 4D54001C 01004154 38323836 ....MT....AT8286 | -| 000000F0 2D343241 31303738 34415400 00000000 -42A10784AT..... | -| 00000100 5544003C 01004154 4B4B4B4B 00340000 UD....ATKKKK.4.. | -| 00000110 54686973 20697320 61207361 6D706C65 This is a sample | -| 00000120 20757365 72206465 66696E65 64206461 user defined da | -| 00000130 74612073 65637469 6F6E3100 554400A7 ta section1.UD.. | -| 00000140 01004154 4C4C4C4C 009F0000 4572726F ..ATLLLL....Erro | -| 00000150 72206C6F 6767696E 67207361 6D706C65 r logging sample | -| 00000160 2E205468 65736520 61726520 64756D6D . These are dumm | -| 00000170 79206572 726F7273 2E205365 6374696F y errors. Sectio | -| 00000180 6E203200 53616D70 6C652065 72726F72 n 2.Sample error | -| 00000190 2053616D 706C6520 6572726F 72205361 Sample error Sa | -| 000001A0 6D706C65 20657272 6F722053 616D706C mple error Sampl | -| 000001B0 65206572 726F7220 09090953 616D706C e error ...Sampl | -| 000001C0 65206572 726F7220 61626364 65666768 e error abcdefgh | -| 000001D0 696A6B6C 6D6E6F70 71727374 75767778 ijklmnopqrstuvwx | -| 000001E0 797A00 yz. 
| -|------------------------------------------------------------------------------| -| Platform Event Log - 0x533C9B37 | -|------------------------------------------------------------------------------| -| Private Header | -|------------------------------------------------------------------------------| -| Section Version : 1 | -| Sub-section type : 0 | -| Created by : 4154 | -| Created at : 07/28/2015 02:00:05 | -| Committed at : 07/28/2015 02:00:05 | -| Creator Subsystem : OPAL | -| CSSVER : | -| Platform Log Id : 0xB0000002 | -| Entry Id : 0x533C9B37 | -| Total Log Size : 483 | -|------------------------------------------------------------------------------| -| User Header | -|------------------------------------------------------------------------------| -| Section Version : 1 | -| Sub-section type : 0 | -| Log Committed by : 4154 | -| Subsystem : Platform Firmware | -| Event Scope : Unknown - 0x00000000 | -| Event Severity : Predictive Error | -| Event Type : Not Applicable | -| Return Code : 0x00000000 | -| Action Flags : Report Externally | -| Action Status : Sent to Hypervisor | -|------------------------------------------------------------------------------| -| Primary System Reference Code | -|------------------------------------------------------------------------------| -| Section Version : 1 | -| Sub-section type : 0 | -| Created by : 4154 | -| SRC Format : 0x80 | -| SRC Version : 0x02 | -| Virtual Progress SRC : False | -| I5/OS Service Event Bit : False | -| Hypervisor Dump Initiated: False | -| Power Control Net Fault : False | -| | -| Valid Word Count : 0x08 | -| Reference Code : BB821410 | -| Hex Words 2 - 5 : 00000080 00000000 00000000 00000000 | -| Hex Words 6 - 9 : 00000000 00000000 00000000 00000000 | -| | -|------------------------------------------------------------------------------| -| Extended User Header | -|------------------------------------------------------------------------------| -| Section Version : 1 | -| Sub-section type : 0 | 
-| Created by : 4154 | -| Reporting Machine Type : 8286-42A | -| Reporting Serial Number : 10784AT | -| FW Released Ver : | -| FW SubSys Version : | -| Common Ref Time : 07/28/2015 02:00:05 | -| Symptom Id Len : 0 | -| Symptom Id : | -|------------------------------------------------------------------------------| -| Machine Type/Model & Serial Number | -|------------------------------------------------------------------------------| -| Section Version : 1 | -| Sub-section type : 0 | -| Created by : 4154 | -| Machine Type Model : 8286-42A | -| Serial Number : 10784AT | -|------------------------------------------------------------------------------| -| User Defined Data | -|------------------------------------------------------------------------------| -| Section Version : 1 | -| Sub-section type : 0 | -| Created by : 4154 | -| | -| 00000000 4B4B4B4B 00340000 54686973 20697320 KKKK.4..This is | -| 00000010 61207361 6D706C65 20757365 72206465 a sample user de | -| 00000020 66696E65 64206461 74612073 65637469 fined data secti | -| 00000030 6F6E3100 on1. | -| | -|------------------------------------------------------------------------------| -| User Defined Data | -|------------------------------------------------------------------------------| -| Section Version : 1 | -| Sub-section type : 0 | -| Created by : 4154 | -| | -| 00000000 4C4C4C4C 009F0000 4572726F 72206C6F LLLL....Error lo | -| 00000010 6767696E 67207361 6D706C65 2E205468 gging sample. Th | -| 00000020 65736520 61726520 64756D6D 79206572 ese are dummy er | -| 00000030 726F7273 2E205365 6374696F 6E203200 rors. Section 2. 
| -| 00000040 53616D70 6C652065 72726F72 2053616D Sample error Sam | -| 00000050 706C6520 6572726F 72205361 6D706C65 ple error Sample | -| 00000060 20657272 6F722053 616D706C 65206572 error Sample er | -| 00000070 726F7220 09090953 616D706C 65206572 ror ...Sample er | -| 00000080 726F7220 61626364 65666768 696A6B6C ror abcdefghijkl | -| 00000090 6D6E6F70 71727374 75767778 797A00 mnopqrstuvwxyz. | -| | -|------------------------------------------------------------------------------| - diff --git a/doc/gcov.rst b/doc/gcov.rst new file mode 100644 index 0000000..956c5c8 --- /dev/null +++ b/doc/gcov.rst @@ -0,0 +1,62 @@ +GCOV for skiboot +---------------- + +Unit tests +---------- +All unit tests are built+run with gcov enabled. + +make coverage-report + +will generate a unit test coverage report like: +http://open-power.github.io/skiboot/coverage-report/ + +Skiboot +------- +You can now build Skiboot itself with gcov support, boot it on a machine, +do things, and then extract out gcda files to generate coverage reports +from real hardware (or a simulator). + +Building Skiboot with GCOV +-------------------------- + +SKIBOOT_GCOV=1 make + +You may need to "make clean" first. + +This will build a skiboot lid roughly *twice* the size. + +Flash/Install the skiboot.lid and boot. + +Extracting GCOV data +-------------------- +The way we extract the gcov data from a system is by dumping the contents +of skiboot memory and then parsing the data structures in user space with +the extract-gcov utility in the skiboot repo. + +mambo: + mysim memory fwrite 0x30000000 0x240000 skiboot.dump +FSP: + getmemproc 30000000 3407872 -fb skiboot.dump +linux (e.g. petitboot environment): + dd if=/proc/kcore skip=1572864 count=6656 of=skiboot.dump + +You basically need to dump out the first 3MB of skiboot memory. 
+ +Then you need to find out where the gcov data structures are: +perl -e "printf '0x%x', 0x30000000 + 0x`grep gcov_info_list skiboot.map|cut -f 1 -d ' '`" + +That address needs to be supplied to the extract-gcov utility: +./extract-gcov skiboot.dump 0x3023ec40 + +Once you've run extract-gcov, it will have extracted the gcda files +from the skiboot memory image. + +You can then run lcov: +lcov -b . -q -c -d . -o skiboot-boot.info \ +--gcov-tool +/opt/cross/gcc-4.8.0-nolibc/powerpc64-linux/bin/powerpc64-linux-gcov + +*IMPORTANT* you should point lcov to the gcov for the compiler you used +to build skiboot, otherwise you're likely to get errors. + + diff --git a/doc/gcov.txt b/doc/gcov.txt deleted file mode 100644 index 956c5c8..0000000 --- a/doc/gcov.txt +++ /dev/null @@ -1,62 +0,0 @@ -GCOV for skiboot ----------------- - -Unit tests ----------- -All unit tests are built+run with gcov enabled. - -make coverage-report - -will generate a unit test coverage report like: -http://open-power.github.io/skiboot/coverage-report/ - -Skiboot -------- -You can now build Skiboot itself with gcov support, boot it on a machine, -do things, and then extract out gcda files to generate coverage reports -from real hardware (or a simulator). - -Building Skiboot with GCOV --------------------------- - -SKIBOOT_GCOV=1 make - -You may need to "make clean" first. - -This will build a skiboot lid roughly *twice* the size. - -Flash/Install the skiboot.lid and boot. - -Extracting GCOV data --------------------- -The way we extract the gcov data from a system is by dumping the contents -of skiboot memory and then parsing the data structures in user space with -the extract-gcov utility in the skiboot repo. - -mambo: - mysim memory fwrite 0x30000000 0x240000 skiboot.dump -FSP: - getmemproc 30000000 3407872 -fb skiboot.dump -linux (e.g. petitboot environment): - dd if=/proc/kcore skip=1572864 count=6656 of=skiboot.dump - -You basically need to dump out the first 3MB of skiboot memory. 
- -Then you need to find out where the gcov data structures are: -perl -e "printf '0x%x', 0x30000000 + 0x`grep gcov_info_list skiboot.map|cut -f 1 -d ' '`" - -That address needs to be supplied to the extract-gcov utility: -./extract-gcov skiboot.dump 0x3023ec40 - -Once you've run extract-gcov, it will have extracted the gcda files -from the skiboot memory image. - -You can then run lcov: -lcov -b . -q -c -d . -o skiboot-boot.info \ ---gcov-tool -/opt/cross/gcc-4.8.0-nolibc/powerpc64-linux/bin/powerpc64-linux-gcov - -*IMPORTANT* you should point lcov to the gcov for the compiler you used -to build skiboot, otherwise you're likely to get errors. - - diff --git a/doc/memory.rst b/doc/memory.rst new file mode 100644 index 0000000..002d460 --- /dev/null +++ b/doc/memory.rst @@ -0,0 +1,44 @@ +Memory in skiboot +----------------- + +There are regions of memory we statically allocate for firmware as well as +a HEAP region for boot and runtime allocations. + +A design principle of skiboot is to attempt not to allocate memory at runtime, +or at least keep it to a minimum, and not do so in any critical code path +for the system to remain running. + +At no point during runtime should a skiboot memory allocation failure cause +the system to stop functioning. + +HEAP +---- + +Dynamic memory allocations go in a single heap. This is identified as +Region ibm,firmware-heap and appears as a reserved section in the device tree. + +Originally, it was 12582912 bytes in size (declared in mem_map.h). +Now, it is 13631488 bytes after being bumped as part of the GCOV work. + +We increased heap size as on larger systems, we were getting close to using +all the heap once skiboot became 2MB with GCOV. + +Heap usage is printed before running the payload. + +For example, as of writing, on a dual socket Tuleta: +[45215870591,5] SkiBoot skiboot-5.0.1-94-gb759ce2 starting... 
+[3680939340,5] CUPD: T side MI Keyword = SV830_027 +[3680942658,5] CUPD: T side ML Keyword = FW830.00 +[15404383291,5] Region ibm,firmware-heap free: 5378072 + +and on a palmetto: +[24748502575,5] SkiBoot skiboot-5.0.1-94-gb759ce2 starting... +[9870429550,5] Region ibm,firmware-heap free: 10814856 + +Our memory allocator is simple, so a use pattern of: +A = malloc(); +B = malloc(); +free(A); + +is likely to generate fragmentation, so it should generally be avoided +where possible. diff --git a/doc/memory.txt b/doc/memory.txt deleted file mode 100644 index 002d460..0000000 --- a/doc/memory.txt +++ /dev/null @@ -1,44 +0,0 @@ -Memory in skiboot ------------------ - -There are regions of memory we statically allocate for firmware as well as -a HEAP region for boot and runtime allocations. - -A design principle of skiboot is to attempt not to allocate memory at runtime, -or at least keep it to a minimum, and not do so in any critical code path -for the system to remain running. - -At no point during runtime should a skiboot memory allocation failure cause -the system to stop functioning. - -HEAP ----- - -Dynamic memory allocations go in a single heap. This is identified as -Region ibm,firmware-heap and appears as a reserved section in the device tree. - -Originally, it was 12582912 bytes in size (declared in mem_map.h). -Now, it is 13631488 bytes after being bumped as part of the GCOV work. - -We increased heap size as on larger systems, we were getting close to using -all the heap once skiboot became 2MB with GCOV. - -Heap usage is printed before running the payload. - -For example, as of writing, on a dual socket Tuleta: -[45215870591,5] SkiBoot skiboot-5.0.1-94-gb759ce2 starting... -[3680939340,5] CUPD: T side MI Keyword = SV830_027 -[3680942658,5] CUPD: T side ML Keyword = FW830.00 -[15404383291,5] Region ibm,firmware-heap free: 5378072 - -and on a palmetto: -[24748502575,5] SkiBoot skiboot-5.0.1-94-gb759ce2 starting...
-[9870429550,5] Region ibm,firmware-heap free: 10814856 - -Our memory allocator is simple, a use pattern of: -A = malloc(); -B = malloc(); -free(A); - -is likely to generate fragmentation, so it should generally be avoided -where possible. diff --git a/doc/nvlink.rst b/doc/nvlink.rst new file mode 100644 index 0000000..5aef539 --- /dev/null +++ b/doc/nvlink.rst @@ -0,0 +1,157 @@ +OPAL/Skiboot Nvlink Interface Documentation +---------------------------------------------------------------------- + +======== +Overview +======== + +NV-Link is a high speed interconnect that is used in conjunction with +a PCI-E connection to create an interface between chips that provides +very high data bandwidth. The PCI-E connection is used as the control +path to initiate and report status of large data transfers. The data +transfers themselves are sent over the NV-Link. + +On IBM Power systems the NV-Link hardware is similar to our standard +PCI hardware so to maximise code reuse the NV-Link is exposed as an +emulated PCI device through system firmware (OPAL/skiboot). Thus each +NV-Link capable device will appear as two devices on a system, the +real PCI-E device and at least one emulated PCI device used for the +NV-Link. + +Presently the NV-Link is only capable of data transfers initiated by +the target, thus the emulated PCI device will only handle registers +for link initialisation, DMA transfers and error reporting (EEH). + +==================== +Emulated PCI Devices +==================== + +Each link will be exported as an emulated PCI device with a minimum of +two emulated PCI devices per GPU. Emulated PCI devices are grouped per +GPU. + +The emulated PCI device will be exported as a standard PCI device by +the Linux kernel. It has a standard PCI configuration space to expose +necessary device parameters. The only functionality available is +related to the setup of DMA windows. 
+ +Configuration Space Parameters +------------------------------ + +Vendor ID = 0x1014 (IBM) +Device ID = 0x04ea +Revision ID = 0x00 +Class = 0x068000 (Bridge Device Other, ProgIf = 0x0) +BAR0/1 = TL/DL Registers + +TL/DL Registers +--------------- + +Each link has 128KB of TL/DL registers. These will always be mapped +to 64-bit BAR#0 of the emulated PCI device configuration space. + +BAR#0 + 128K +-----------+ + | NTL (64K) | +BAR#0 + 64K +-----------+ + | DL (64K) | +BAR#0 +-----------+ + +Vendor Specific Capabilities +---------------------------- + ++-----------------+----------------+----------------+----------------+ +| Version (0x02) | Cap Length | Next Cap Ptr | Cap ID (0x09) | ++-----------------+----------------+----------------+----------------+ +| Procedure Status Register | ++--------------------------------------------------------------------+ +| Procedure Control Register | ++---------------------------------------------------+----------------+ +| Reserved | PCI Dev Flag | Link Number | ++---------------------------------------------------+----------------+ + +Version + + This refers to the version of the NPU config space. Used by device + drivers to determine which fields of the config space they can + expect to be available. + +Procedure Control Register + + Used to start hardware procedures. + + Writes will start the corresponding procedure and set bit 31 in the + procedure status register. This register must not be written while + bit 31 is set in the status register. Performing a write while + another procedure is already in progress will abort that procedure. + + Reads will return the in progress procedure or the last completed + procedure number depending on the procedure status field.
+ + Procedure Numbers: + 0 - Abort in-progress procedure + 1 - NOP + 2 - Unsupported procedure + 3 - Unsupported procedure + 4 - Naples PHY - RESET + 5 - Naples PHY - TX_ZCAL + 6 - Naples PHY - RX_DCCAL + 7 - Naples PHY - TX_RXCAL_ENABLE + 8 - Naples PHY - TX_RXCAL_DISABLE + 9 - Naples PHY - RX_TRAINING + 10 - Naples NPU - RESET + 11 - Naples PHY - PHY preterminate + 12 - Naples PHY - PHY terminated + + Procedure 5 (TX_ZCAL) should only be run once. System firmware will + ensure this, so device drivers may call this procedure multiple + times. + +Procedure Status Register + + The procedure status register is used to determine when execution + of the procedure number in the control register is complete and if + it completed successfully. + + This register must be polled frequently to allow system firmware to + execute the procedures. + + Fields: + Bit 31 - Procedure in progress + Bit 30 - Procedure complete + Bit 3-0 - Procedure completion code + + Procedure completion codes: + 0 - Procedure completed successfully. + 1 - Transient failure. Procedure should be rerun. + 2 - Permanent failure. Procedure will never complete successfully. + 3 - Procedure aborted. + 4 - Unsupported procedure. + +PCI Device Flag + + Bit 0 is set only if an actual PCI device was bound to this + emulated device. + +Link Number + + Physical link number this emulated PCI device is associated + with. One of 0, 1, 4 or 5 (links 2 & 3 do not exist on Naples). + +Reserved + + These fields must be ignored and no value should be assumed. + +Interrupts +---------- + +Each link has a single DL/TL interrupt assigned to it. These will be +exposed as an LSI via the emulated PCI device. There are 4 links +consuming 4 LSI interrupts. The 4 remaining interrupts supported by the +corresponding PHB will be routed to the OS platform for the purpose of error +reporting.
+ +==================== +Device Tree Bindings +==================== + +See doc/device-tree/nvlink.txt diff --git a/doc/nvlink.txt b/doc/nvlink.txt deleted file mode 100644 index 5aef539..0000000 --- a/doc/nvlink.txt +++ /dev/null @@ -1,157 +0,0 @@ -OPAL/Skiboot Nvlink Interface Documentation ----------------------------------------------------------------------- - -======== -Overview -======== - -NV-Link is a high speed interconnect that is used in conjunction with -a PCI-E connection to create an interface between chips that provides -very high data bandwidth. The PCI-E connection is used as the control -path to initiate and report status of large data transfers. The data -transfers themselves are sent over the NV-Link. - -On IBM Power systems the NV-Link hardware is similar to our standard -PCI hardware so to maximise code reuse the NV-Link is exposed as an -emulated PCI device through system firmware (OPAL/skiboot). Thus each -NV-Link capable device will appear as two devices on a system, the -real PCI-E device and at least one emulated PCI device used for the -NV-Link. - -Presently the NV-Link is only capable of data transfers initiated by -the target, thus the emulated PCI device will only handle registers -for link initialisation, DMA transfers and error reporting (EEH). - -==================== -Emulated PCI Devices -==================== - -Each link will be exported as an emulated PCI device with a minimum of -two emulated PCI devices per GPU. Emulated PCI devices are grouped per -GPU. - -The emulated PCI device will be exported as a standard PCI device by -the Linux kernel. It has a standard PCI configuration space to expose -necessary device parameters. The only functionality available is -related to the setup of DMA windows. 
- -Configuration Space Parameters ------------------------------ - -Vendor ID = 0x1014 (IBM) -Device ID = 0x04ea -Revision ID = 0x00 -Class = 0x068000 (Bridge Device Other, ProgIf = 0x0) -BAR0/1 = TL/DL Registers - -TL/DL Registers ---------------- - -Each link has 128KB of TL/DL registers. These will always be mapped -to 64-bit BAR#0 of the emulated PCI device configuration space. - -BAR#0 + 128K +-----------+ - | NTL (64K) | -BAR#0 + 64K +-----------+ - | DL (64K) | -BAR#0 +-----------+ - -Vendor Specific Capabilities ----------------------------- - -+-----------------+----------------+----------------+----------------+ -| Version (0x02) | Cap Length | Next Cap Ptr | Cap ID (0x09) | -+-----------------+----------------+----------------+----------------+ -| Procedure Status Register | -+--------------------------------------------------------------------+ -| Procedure Control Register | -+---------------------------------------------------+----------------+ -| Reserved | PCI Dev Flag | Link Number | -+---------------------------------------------------+----------------+ - -Version - - This refers to the version of the NPU config space. Used by device - drivers to determine which fields of the config space they can - expect to be available. - -Procedure Control Register - - Used to start hardware procedures. - - Writes will start the corresponding procedure and set bit 31 in the - procedure status register. This register must not be written while - bit 31 is set in the status register. Performing a write while - another procudure is already in progress will abort that procedure. - - Reads will return the in progress procedure or the last completed - procedure number depending on the procedure status field. 
- - Procedure Numbers: - 0 - Abort in-progress procedure - 1 - NOP - 2 - Unsupported procedure - 3 - Unsupported procedure - 4 - Naples PHY - RESET - 5 - Naples PHY - TX_ZCAL - 6 - Naples PHY - RX_DCCAL - 7 - Naples PHY - TX_RXCAL_ENABLE - 8 - Naples PHY - TX_RXCAL_DISABLE - 9 - Naples PHY - RX_TRAINING - 10 - Naples NPU - RESET - 11 - Naples PHY - PHY preterminate - 12 - Naples PHY - PHY terminated - - Procedure 5 (TX_ZCAL) should only be run once. System firmware will - ensure this so device drivers may call this procedure mutiple - times. - -Procedure Status Register - - The procedure status register is used to determine when execution - of the procedure number in the control register is complete and if - it completed successfully. - - This register must be polled frequently to allow system firmware to - execute the procedures. - - Fields: - Bit 31 - Procedure in progress - Bit 30 - Procedure complete - Bit 3-0 - Procedure completion code - - Procedure completion codes: - 0 - Procedure completed successfully. - 1 - Transient failure. Procedure should be rerun. - 2 - Permanent failure. Procedure will never complete successfully. - 3 - Procedure aborted. - 4 - Unsupported procedure. - -PCI Device Flag - - Bit 0 is set only if an actual PCI device was bound to this - emulated device. - -Link Number - - Physical link number this emulated PCI device is associated - with. One of 0, 1, 4 or 5 (links 2 & 3 do not exist on Naples). - -Reserved - - These fields must be ignored and no value should be assumed. - -Interrupts ----------- - -Each link has a single DL/TL interrupt assigned to it. These will be -exposed as an LSI via the emulated PCI device. There are 4 links -consuming 4 LSI interrupts. The 4 remaining interrupts supported by the -corresponding PHB will be routed to OS platform for the purpose of error -reporting. 
- -==================== -Device Tree Bindings -==================== - -See doc/device-tree/nvlink.txt diff --git a/doc/opal-spec.rst b/doc/opal-spec.rst new file mode 100644 index 0000000..ea76e59 --- /dev/null +++ b/doc/opal-spec.rst @@ -0,0 +1,216 @@ +OPAL Specification +================== + +DRAFT - VERSION 0.0.1 AT BEST. + +COMMENTS ARE WELCOME - and indeed, needed. + +If you are reading this, congratulations: you're now reviewing it! + + +This document aims to define what it means to be OPAL compliant. + +While skiboot is the reference implementation, this documentation should +be complete enough that (given hardware documentation) someone could +create another implementation. It is not recommended that you do this though. + +Authors +------- +Stewart Smith : OPAL Architect, IBM + + +Definitions +----------- + +Host processor - the main POWER CPU (e.g. the POWER8 CPU) +Host OS - the operating system running on the host processor. +OPAL - OpenPOWER Abstraction Layer. + +What is OPAL? +------------- + +The OpenPower Abstraction Layer (OPAL) is boot and runtime firmware for +POWER systems. There are several components to what makes up a firmware +image for OpenPower machines. + +For example, there may be: +- BMC firmware + - Firmware that runs purely on the BMC. + - On IBM systems that have an FSP rather than a BMC, there is FSP firmware + - While essential to having the machine function, this firmware is not + part of the OPAL Specification. +- HostBoot + - HostBoot ( https://github.com/open-power/hostboot ) performs all + processor, bus and memory initialization within IBM POWER based systems. +- OCC Firmware + - On Chip Controller ( Firmware for OCC - a PPC405 core inside the IBM + POWER8 in charge of keeping the system thermally and power safe ). +- SkiBoot + - Boot and runtime services. +- A linux kernel and initramfs incorporating petitboot + - The bootloader.
This is where a user chooses what OS to boot, and + petitboot will use kexec to switch to the host Operating System + (for example, PowerKVM). + +While all of these components may be absolutely essential to power on, +boot and operate a specific OpenPower POWER8 system, the majority of +the code mentioned above can be thought of as implementation details +and not something that should form part of an OPAL Specification. + +For an OPAL system, we assume that the hardware is functioning and any +hardware management that is specific to a platform is performed by OPAL +firmware transparently to the host OS. + +The OPAL Specification focuses on the interface between firmware and the +Operating System. It does not dictate that any specific pieces of firmware +code be used, although re-inventing the wheel is strongly discouraged. + +The OPAL Specification explicitly allows for: +- A conforming implementation to not use any of the reference implementation + code. +- A conforming implementation to use any 64bit POWER ISA conforming processor, + and not be limited to the IBM POWER8. +- A conforming implementation to be a simulator, emulator or virtual environment +- A host OS other than Linux + +Explicitly not covered in this specification: +- A 32bit OPAL Specification + There is no reason this couldn't exist but the current specification is for + 64bit POWER systems only. + + +Boot Services +------------- + +An OPAL compliant firmware implementation will load and execute a payload +capable of booting a Host Operating System. + +The reference implementation loads a Linux kernel with an initramfs with +a minimal userspace and the petitboot boot loader - collectively referred +to as skiroot. + +The OPAL Specification explicitly allows variation in this payload. + +A requirement of the payload is that it MUST support loading and booting +an uncompressed vmlinux Linux kernel.
+[TODO: expand on what this actually means] + +An OPAL system MUST pass a device tree to the host kernel. +[TODO: expand the details, add device-tree section and spec] + +An OPAL system MUST provide the host kernel with enough information to +know how to call OPAL runtime services. +[TODO: expand on this. ] + +Explicitly not covered by the OPAL Specification: +- Kernel module ABI for skiroot kernel +- Userspace environment of skiroot +- That skiroot is Linux. + +Explicitly allowed: +- Replacing the payload with something of equal/similar functionality + (whether replacing skiroot with an implementation of Zork would be compliant + is left as an exercise for the reader) + +Payload Environment +------------------- +The payload is started with: +r3 = address of flattened device-tree (fdt) +r8 = OPAL base +r9 = OPAL entry + + +Runtime Services +---------------- + +An OPAL Specification compliant system provides runtime services to the host +Operating System via a standard interface. + +An OPAL call is made by calling opal_entry with: + * r0: OPAL Token + * r2: OPAL Base + * r3..r10: Args (up to 8) + +The OPAL API is defined in skiboot/doc/opal-api/ + +Not all OPAL APIs must be supported for a system to be compliant. When +called with an unsupported token, a compliant firmware implementation +MUST fail gracefully and not crash. Reporting a warning that an unsupported +token was called is okay, as compliant host Operating Systems should use +OPAL_CHECK_TOKEN to test for optional functionality. + +All parameters to OPAL calls are big endian. Little endian hosts MUST +appropriately convert parameters before passing them to OPAL. + +Machine state across OPAL calls: +- r1 is preserved +- r12 is scratch +- r13 - 31 preserved +- 64bit HV real mode +- big endian +- external interrupts disabled + +Detecting OPAL Support +---------------------- + +A Host OS may need to detect the presence of OPAL as it may support booting +under other platforms.
For example, a single Linux kernel can be built to boot +under OPAL and under PowerVM or qemu pseries machine type. + +The root node of the device tree MUST have compatible = "ibm,powernv". +See doc/device-tree.txt for more details +[TODO: make doc/device-tree.txt better] + +The presence of the "/ibm,opal" entry in the device tree signifies running +under OPAL. Additionally, the "/ibm,opal" node MUST have a compatibile property +listing "ibm,opal-v3". + +The "/ibm,opal" node MUST have the following properties: + +ibm,opal { + compatible = "ibm,opal-v3"; + opal-base-address = <>; + opal-entry-address = <>; + opal-runtime-size = <>; +} + +The compatible property MAY have other strings, such as a future "ibm,opal-v4". +These are reserved for future use. + +Some releases of the reference implementation (skiboot) have had compatible +contain "ibm,opal-v2" as well as "ibm,opal-v3". Host operating systems MUST +NOT rely on "ibm,opal-v2", this is a relic from early OPAL history. + +The "ibm,opal" node MUST have a child node named "firmware". It MUST contain +the following: + +firmware { + compatible = "ibm,opal-firmware"; +} + +It MUST contain one of the following two properties: git-id, version. +The git-id property is deprecated, and version SHOULD be used. These +are informative and MUST NOT be used by the host OS to determine anything +about the firmware environment. + +The version property is a textual representation of the OPAL version. +For example, it may be "skiboot-4.1" or other versioning described +in more detail in doc/versioning.txt + + +OPAL log +-------- + +OPAL implementations SHOULD have an in memory log where informational and +error messages are stored. If present it MUST be human readable and text based. +There is a separate facility (Platform Error Logs) for machine readable errors. + +A conforming implementation MAY also output the log to a serial port or similar. +An implementation MAY choose to only output certain log messages to a serial +port. 
+
+For example, the reference implementation (skiboot) by default filters log
+messages so that only higher priority log messages go over the serial port
+while more messages go to the in memory buffer.
+
+[TODO: add device-tree bits here]
diff --git a/doc/opal-spec.txt b/doc/opal-spec.txt
deleted file mode 100644
index ea76e59..0000000
--- a/doc/opal-spec.txt
+++ /dev/null
@@ -1,216 +0,0 @@
-OPAL Specification
-==================
-
-DRAFT - VERSION 0.0.1 AT BEST.
-
-COMMENTS ARE WELCOME - and indeed, needed.
-
-If you are reading this, congratulations: you're now reviewing it!
-
-
-This document aims to define what it means to be OPAL compliant.
-
-While skiboot is the reference implementation, this documentation should
-be complete enough that (given hardware documentation) create another
-implementation. It is not recommended that you do this though.
-
-Authors
--------
-Stewart Smith : OPAL Architect, IBM
-
-
-Definitions
------------
-
-Host processor - the main POWER CPU (e.g. the POWER8 CPU)
-Host OS - the operating system running on the host processor.
-OPAL - OpenPOWER Abstraction Layer.
-
-What is OPAL?
--------------
-
-The OpenPower Abstraction Layer (OPAL) is boot and runtime firmware for
-POWER systems. There are several components to what makes up a firmware
-image for OpenPower machines.
-
-For example, there may be:
-- BMC firmware
-  - Firmware that runs purely on the BMC.
-  - On IBM systems that have an FSP rather than a BMC, there is FSP firmware
-  - While essential to having the machine function, this firmware is not
-    part of the OPAL Specification.
-- HostBoot
-  - HostBoot ( https://github.com/open-power/hostboot ) performs all
-    processor, bus and memory initialization within IBM POWER based systems.
-- OCC Firmware
-  - On Chip Controller ( Firmware for OCC - a PPC405 core inside the IBM
-    POWER8 in charge of keeping the system thermally and power safe ).
-- SkiBoot
-  - Boot and runtime services.
-- A linux kernel and initramfs incorporating petitboot
-  - The bootloader. This is where a user chooses what OS to boot, and
-    petitboot will use kexec to switch to the host Operating System
-    (for example, PowerKVM).
-
-While all of these components may be absolutely essential to power on,
-boot and operate a specific OpenPower POWER8 system, the majority of
-the code mentioned above can be thought of as implementation details
-and not something that should form part of an OPAL Specification.
-
-For an OPAL system, we assume that the hardware is functioning and any
-hardware management that is specific to a platform is performed by OPAL
-firmware transparently to the host OS.
-
-The OPAL Specification focus on the interface between firmware and the
-Operating System. It does not dictate that any specific pieces of firmware
-code be used, although re-inventing the wheel is strongly discouraged.
-
-The OPAL Specification explicitly allows for:
-- A conforming implementation to not use any of the reference implementation
-  code.
-- A conforming implementation to use any 64bit POWER ISA conforming processor,
-  and not be limited to the IBM POWER8.
-- A conforming implementation to be a simulator, emulator or virtual environment
-- A host OS other than Linux
-
-Explicitly not covered in this specification:
-- A 32bit OPAL Specification
-  There is no reason this couldn't exist but the current specification is for
-  64bit POWER systems only.
-
-
-Boot Services
--------------
-
-An OPAL compliant firmware implementation will load and execute a payload
-capable of booting a Host Operating System.
-
-The reference implementation loads a Linux kernel with an initramfs with
-a minimal userspace and the petitboot boot loader - collectively referred
-to as skiroot.
-
-The OPAL Specification explicitly allows variation in this payload.
-
-A requirement of the payload is that it MUST support loading and booting
-an uncomppressed vmlinux Linux kernel.
-[TODO: expand on what this actually means]
-
-An OPAL system MUST pass a device tree to the host kernel.
-[TODO: expand the details, add device-tree section and spec]
-
-An OPAL system MUST provide the host kernel with enough information to
-know how to call OPAL runtime services.
-[TODO: expand on this. ]
-
-Explicitly not covered by the OPAL Specification:
-- Kernel module ABI for skiroot kernel
-- Userspace environment of skiroot
-- That skiroot is Linux.
-
-Explicitly allowed:
-- Replacing the payload with something of equal/similar functionality
-  (weather replacing skiroot with an implementation of Zork would be compliant
-   is left as an exercise for the reader)
-
-Payload Environment
--------------------
-The payload is started with:
-r3 = address of flattened device-tree (fdt)
-r8 = OPAL base
-r9 = OPAL entry
-
-
-Runtime Services
-----------------
-
-An OPAL Specification compliant system provides runtime services to the host
-Operating System via a standard interface.
-
-An OPAL call is made by calling opal_entry with:
- * r0: OPAL Token
- * r2: OPAL Base
- * r3..r10: Args (up to 8)
-
-The OPAL API is defined in skiboot/doc/opal-api/
-
-Not all OPAL APIs must be supported for a system to be compliant. When
-called with an unsupported token, a compliant firmware implementation
-MUST fail gracefully and not crash. Reporting a warning that an unsupported
-token was called is okay, as compliant host Operating Systems should use
-OPAL_CHECK_TOKEN to test for optional functionality.
-
-All parameters to OPAL calls are big endian. Little endian hosts MUST
-appropriately convert parameters before passing them to OPAL.
-
-Machine state across OPAL calls:
-- r1 is preserved
-- r12 is scratch
-- r13 - 31 preserved
-- 64bit HV real mode
-- big endian
-- external interrupts disabled
-
-Detecting OPAL Support
-----------------------
-
-A Host OS may need to detect the presence of OPAL as it may support booting
-under other platforms. For example, a single Linux kernel can be built to boot
-under OPAL and under PowerVM or qemu pseries machine type.
-
-The root node of the device tree MUST have compatible = "ibm,powernv".
-See doc/device-tree.txt for more details
-[TODO: make doc/device-tree.txt better]
-
-The presence of the "/ibm,opal" entry in the device tree signifies running
-under OPAL. Additionally, the "/ibm,opal" node MUST have a compatibile property
-listing "ibm,opal-v3".
-
-The "/ibm,opal" node MUST have the following properties:
-
-ibm,opal {
-    compatible = "ibm,opal-v3";
-    opal-base-address = <>;
-    opal-entry-address = <>;
-    opal-runtime-size = <>;
-}
-
-The compatible property MAY have other strings, such as a future "ibm,opal-v4".
-These are reserved for future use.
-
-Some releases of the reference implementation (skiboot) have had compatible
-contain "ibm,opal-v2" as well as "ibm,opal-v3". Host operating systems MUST
-NOT rely on "ibm,opal-v2", this is a relic from early OPAL history.
-
-The "ibm,opal" node MUST have a child node named "firmware". It MUST contain
-the following:
-
-firmware {
-    compatible = "ibm,opal-firmware";
-}
-
-It MUST contain one of the following two properties: git-id, version.
-The git-id property is deprecated, and version SHOULD be used. These
-are informative and MUST NOT be used by the host OS to determine anything
-about the firmware environment.
-
-The version property is a textual representation of the OPAL version.
-For example, it may be "skiboot-4.1" or other versioning described
-in more detail in doc/versioning.txt
-
-
-OPAL log
---------
-
-OPAL implementations SHOULD have an in memory log where informational and
-error messages are stored. If present it MUST be human readable and text based.
-There is a separate facility (Platform Error Logs) for machine readable errors.
-
-A conforming implementation MAY also output the log to a serial port or similar.
-An implementation MAY choose to only output certain log messages to a serial
-port.
-
-For example, the reference implementation (skiboot) by default filters log
-messages so that only higher priority log messages go over the serial port
-while more messages go to the in memory buffer.
-
-[TODO: add device-tree bits here]
diff --git a/doc/pci-slot.rst b/doc/pci-slot.rst
new file mode 100644
index 0000000..1b64f69
--- /dev/null
+++ b/doc/pci-slot.rst
@@ -0,0 +1,119 @@
+Overview
+========
+
+The PCI slots are instantiated to represent their associated properties and
+operations. The slot properties are exported to OS through the device tree
+node of the corresponding parent PCI device. The slot operations are used
+to accomodate requests from OS regarding the indicated PCI slot:
+
+   * PCI slot reset
+   * PCI slot property retrival
+
+The PCI slots are expected to be created by individual platforms based on
+the given templates, which are classified to PHB slot or normal one currently.
+The PHB slot is instantiated based on PHB types like P7IOC and PHB3. However,
+the normal PCI slots are created based on general RC (Root Complex), PCIE switch
+ports, PCIE-to-PCIx bridge. Individual platform may create PCI slot, which doesn't
+have existing template.
+
+The PCI slots are created at different stages according to their types. PHB slots
+are expected to be created once the PHB is register (struct platform::pci_setup_phb())
+because the PHB slot reset operations are required at early stage of PCI enumeration.
+The normal slots are populated after their parent PCI devices are instantiated at
+struct platform::pci_get_slot_info().
+
+The operation set supplied by the template might be overrided and reimplemented, or
+partially. It's usually done according to the VPD figured out by individual platforms.
+
+PCI Slot Operations
+===================
+
+The following operations are supported to one particular PCI slot. More details
+could be found from the definition of struct pci_slot_ops:
+
+get_presence_state      Check if any adapter connected to slot
+get_link_state          Retrieve PCIE link status: up, down, link width
+get_power_state         Retrieve the power status: on, off
+get_attention_state     Retrieve attention status: on, off, blinking
+get_latch_state         Retrieve latch status
+set_power_state         Configure the power status: on, off
+set_attention_state     Configure attention status: on, off, blinking
+
+prepare_link_change     Prepare PCIE link status change
+poll_link               Poll PCIE link until it's up or down permanently
+creset                  Complete reset, only available to PHB slot
+freset                  Fundamental reset
+pfreset                 Post fundamental reset
+hreset                  Hot reset
+poll                    Interface for OPAL API to drive internal state machine
+
+add_properties          Additional PCI slot properties seen by platform
+
+PCI Slot Properties
+===================
+
+The following PCI slot properties have been exported through PCI device tree
+node for a root port, a PCIE switch port, or a PCIE to PCIx bridge. If the
+individual platforms (e.g. Firenze and Apollo) have VPD for the PCI slot, they
+should extract the PCI slot properties from VPD and export them accordingly.
+
+ibm,reset-by-firmware   Boolean indicating whether the slot reset should be
+                        done in firmware
+ibm,slot-pluggable      Boolean indicating whether the slot is pluggable
+ibm,slot-power-ctl      Boolean indicating whether the slot has power control
+ibm,slot-wired-lanes    The number of hardware lanes that are wired
+ibm,slot-pwr-led-ctl    Presence of slot power led, and controlling entity
+ibm,slot-attn-led-ctl   Presence of slot ATTN led, and controlling entity
+
+PCI Hotplug
+===========
+
+The implementation of PCI slot hotplug heavily relies on its power state.
+Initially, the slot is powered off if there are no adapters behind it.
+Otherwise, the slot should be powered on.
+
+In hot add scenario, the adapter is physically inserted to PCI slot. Then
+the PCI slot is powered on by OPAL API opal_pci_set_power_state(). The
+power is supplied to the PCI slot, the adapter behind the PCI slot is
+probed and the device sub-tree (for hot added devices) is populated. A
+OPAL message is sent to OS on completion. The OS needs retrieve the device
+sub-tree through OPAL API opal_get_device_tree(), unflatten it and populate
+the device sub-tree. After that, the adapter behind the PCI slot should
+be probed and added to the system.
+
+On the other hand, the OS removes the adapter behind the PCI slot before
+calling opal_pci_set_power_state(). Skiboot cuts off the power supply to
+the PCI slot, removes the adapter behind the PCI slot and the corresponding
+device sub-tree. A OPAL message (OPAL_MSG_ASYNC_COMP) is sent to OS. The
+OS removes the device sub-tree for the adapter behind the PCI slot.
+
+The OPAL message used in PCI hotplug is comprised of 4 dwords in sequence:
+asychronous token from OS, PCI slot device node's phandle, OPAL_PCI_SLOT_POWER_{ON,
+OFF}, OPAL_SUCCESS or errcode.
+
+The states OPAL_PCI_SLOT_OFFLINE and OPAL_PCI_SLOT_ONLINE are used for removing
+or adding devices behind the slot. The device nodes in the device tree are
+removed or added accordingly, without actually changing the slot's power state.
+The API call will return OPAL_SUCCESS immediately and no further asynchronous
+message will be sent.
+
+PCI Slot on Apollo and Firenze
+==============================
+
+On IBM's Apollo and Firenze platform, the PCI VPD is fetched from dedicated LID,
+which is organized in so-called 1004, 1005, or 1006 format. 1006 mapping format
+isn't supported currently. The PCI slot properties are figured out from the VPD.
+On the other hand, there might have external power management entity hooked to
+I2C buses for one PCI slot. The fundamental reset operation of the PCI slot should
+be implemented based on the external power management entity for that case.
+
+On Firenze platform, PERST pin is accessible through bit#10 of PCI config register
+(offset: 0x80) for those PCI slots behind some PLX switch downstream ports. For
+those PCI slots, PERST pin is utilized to implement fundamental reset if external
+power management entity doesn't exist.
+
+For Apollo and Firenze platform, following PCI slot properties are exported through
+PCI device tree node except those generic properties (as above):
+
+ibm,slot-location-code  System location code string for the slot connector
+ibm,slot-label          Slot label, part of "ibm,slot-location-code"
diff --git a/doc/pci-slot.txt b/doc/pci-slot.txt
deleted file mode 100644
index 1b64f69..0000000
--- a/doc/pci-slot.txt
+++ /dev/null
@@ -1,119 +0,0 @@
-Overview
-========
-
-The PCI slots are instantiated to represent their associated properties and
-operations. The slot properties are exported to OS through the device tree
-node of the corresponding parent PCI device. The slot operations are used
-to accomodate requests from OS regarding the indicated PCI slot:
-
-   * PCI slot reset
-   * PCI slot property retrival
-
-The PCI slots are expected to be created by individual platforms based on
-the given templates, which are classified to PHB slot or normal one currently.
-The PHB slot is instantiated based on PHB types like P7IOC and PHB3. However,
-the normal PCI slots are created based on general RC (Root Complex), PCIE switch
-ports, PCIE-to-PCIx bridge. Individual platform may create PCI slot, which doesn't
-have existing template.
-
-The PCI slots are created at different stages according to their types. PHB slots
-are expected to be created once the PHB is register (struct platform::pci_setup_phb())
-because the PHB slot reset operations are required at early stage of PCI enumeration.
-The normal slots are populated after their parent PCI devices are instantiated at
-struct platform::pci_get_slot_info().
-
-The operation set supplied by the template might be overrided and reimplemented, or
-partially. It's usually done according to the VPD figured out by individual platforms.
-
-PCI Slot Operations
-===================
-
-The following operations are supported to one particular PCI slot. More details
-could be found from the definition of struct pci_slot_ops:
-
-get_presence_state      Check if any adapter connected to slot
-get_link_state          Retrieve PCIE link status: up, down, link width
-get_power_state         Retrieve the power status: on, off
-get_attention_state     Retrieve attention status: on, off, blinking
-get_latch_state         Retrieve latch status
-set_power_state         Configure the power status: on, off
-set_attention_state     Configure attention status: on, off, blinking
-
-prepare_link_change     Prepare PCIE link status change
-poll_link               Poll PCIE link until it's up or down permanently
-creset                  Complete reset, only available to PHB slot
-freset                  Fundamental reset
-pfreset                 Post fundamental reset
-hreset                  Hot reset
-poll                    Interface for OPAL API to drive internal state machine
-
-add_properties          Additional PCI slot properties seen by platform
-
-PCI Slot Properties
-===================
-
-The following PCI slot properties have been exported through PCI device tree
-node for a root port, a PCIE switch port, or a PCIE to PCIx bridge. If the
-individual platforms (e.g. Firenze and Apollo) have VPD for the PCI slot, they
-should extract the PCI slot properties from VPD and export them accordingly.
-
-ibm,reset-by-firmware   Boolean indicating whether the slot reset should be
-                        done in firmware
-ibm,slot-pluggable      Boolean indicating whether the slot is pluggable
-ibm,slot-power-ctl      Boolean indicating whether the slot has power control
-ibm,slot-wired-lanes    The number of hardware lanes that are wired
-ibm,slot-pwr-led-ctl    Presence of slot power led, and controlling entity
-ibm,slot-attn-led-ctl   Presence of slot ATTN led, and controlling entity
-
-PCI Hotplug
-===========
-
-The implementation of PCI slot hotplug heavily relies on its power state.
-Initially, the slot is powered off if there are no adapters behind it.
-Otherwise, the slot should be powered on.
-
-In hot add scenario, the adapter is physically inserted to PCI slot. Then
-the PCI slot is powered on by OPAL API opal_pci_set_power_state(). The
-power is supplied to the PCI slot, the adapter behind the PCI slot is
-probed and the device sub-tree (for hot added devices) is populated. A
-OPAL message is sent to OS on completion. The OS needs retrieve the device
-sub-tree through OPAL API opal_get_device_tree(), unflatten it and populate
-the device sub-tree. After that, the adapter behind the PCI slot should
-be probed and added to the system.
-
-On the other hand, the OS removes the adapter behind the PCI slot before
-calling opal_pci_set_power_state(). Skiboot cuts off the power supply to
-the PCI slot, removes the adapter behind the PCI slot and the corresponding
-device sub-tree. A OPAL message (OPAL_MSG_ASYNC_COMP) is sent to OS. The
-OS removes the device sub-tree for the adapter behind the PCI slot.
-
-The OPAL message used in PCI hotplug is comprised of 4 dwords in sequence:
-asychronous token from OS, PCI slot device node's phandle, OPAL_PCI_SLOT_POWER_{ON,
-OFF}, OPAL_SUCCESS or errcode.
-
-The states OPAL_PCI_SLOT_OFFLINE and OPAL_PCI_SLOT_ONLINE are used for removing
-or adding devices behind the slot. The device nodes in the device tree are
-removed or added accordingly, without actually changing the slot's power state.
-The API call will return OPAL_SUCCESS immediately and no further asynchronous
-message will be sent.
-
-PCI Slot on Apollo and Firenze
-==============================
-
-On IBM's Apollo and Firenze platform, the PCI VPD is fetched from dedicated LID,
-which is organized in so-called 1004, 1005, or 1006 format. 1006 mapping format
-isn't supported currently. The PCI slot properties are figured out from the VPD.
-On the other hand, there might have external power management entity hooked to
-I2C buses for one PCI slot. The fundamental reset operation of the PCI slot should
-be implemented based on the external power management entity for that case.
-
-On Firenze platform, PERST pin is accessible through bit#10 of PCI config register
-(offset: 0x80) for those PCI slots behind some PLX switch downstream ports. For
-those PCI slots, PERST pin is utilized to implement fundamental reset if external
-power management entity doesn't exist.
-
-For Apollo and Firenze platform, following PCI slot properties are exported through
-PCI device tree node except those generic properties (as above):
-
-ibm,slot-location-code  System location code string for the slot connector
-ibm,slot-label          Slot label, part of "ibm,slot-location-code"
diff --git a/doc/pci.rst b/doc/pci.rst
new file mode 100644
index 0000000..a139176
--- /dev/null
+++ b/doc/pci.rst
@@ -0,0 +1,71 @@
+IODA PE Setup Sequences
+-----------------------
+
+(WARNING: this was rescued from old internal documentation. Needs verification)
+
+To setup basic PE mappings, the host performs this basic sequence:
+
+    For ibm,opal-ioda2, prior to allocating PHB resources to PEs, the host must
+    allocate memory for PE structures and then calls
+    opal_pci_set_phb_table_memory( phb_id, rtt_addr, ivt_addr, ivt_len,
+    rrba_addr, peltv_addr) to define them to the PHB. OPAL returns
+    OPAL_UNSUPPORTED status for ibm,opal-ioda PHBs.
+
+    The host calls opal_pci_set_pe( phb_id, pe_number, bus, dev, func,
+    validate_mask, bus_mask, dev_mask, func mask) to map a PE to a PCI RID or
+    range of RIDs in the same PE domain.
+
+    The host calls opal_pci_set_peltv(phb_id, parent_pe, child_pe, state) to
+    set a parent PELT vector bit for the child PE argument to 1 (a child of the
+    parent) or 0 (not in the parent PE domain).
+
+IODA MMIO Setup Sequences
+-------------------------
+
+(WARNING: this was rescued from old internal documentation. Needs verification)
+
+
+    The host calls opal_pci_phb_mmio_enable( phb_id, window_type, window_num, 0x0) to disable the MMIO window.
+
+    The host calls opal_pci_set_phb_mmio_window( phb_id, mmio_window, starting_real_address, starting_pci_address, segment_size) to change the MMIO window location in PCI and/or processor real address space, or to change the size -- and corresponding window size -- of a particular MMIO window.
+
+    The host calls opal_pci_map_pe_mmio_window( pe_number, mmio_window, segment_number) to map PEs to window segments, for each segment mapped to each PE.
+
+    The host calls opal_pci_phb_mmio_enable( phb_id, window_type, window_num, 0x1) to enable the MMIO window.
+
+IODA MSI Setup Sequences
+------------------------
+
+(WARNING: this was rescued from old internal documentation. Needs verification)
+
+To setup MSIs:
+
+
+1. For ibm,opal-ioda PHBs, the host chooses an MVE for a PE to use and calls opal_pci_set_mve( phb_id, mve_number, pe_number,) to setup the MVE for the PE number. HAL treats this call as a NOP and returns hal_success status for ibm,opal-ioda2 PHBs.
+
+2. the host chooses an XIVE to use with a PE and calls
+
+    a. opal_pci_set_xive_pe( phb_id, xive_number, pe_number) to authorize that PE to signal that XIVE as an interrupt. The host must call this function for each XIVE assigned to a particular PE, but may use this call for all XIVEs prior to calling opel_pci_set_mve() to bind the PE XIVEs to an MVE.For MSI conventional, the host must bind a unique MVE for each sequential set of 32 XIVEs.
+
+    b. The host forms the interrupt_source_number from the combination of the device tree MSI property base BUID and XIVE number, as an input to opal_set_xive(interrupt_source_number, server_number, priority) and opal_get_xive(interrupt_source_number, server_number, priority) to set or return the server and priority numbers within an XIVE.
+
+    c. opal_get_msi_64[32](phb_id, mve_number, xive_num, msi_range, msi_address, message_data) to determine the MSI DMA address (32 or 64 bit) and message data value for that xive.
+
+    For MSI conventional, the host uses this for each sequential power of 2 set of 1 to 32 MSIs, to determine the MSI DMA address and starting message data value for that MSI range. For MSI-X, the host calls this uniquely for each MSI interrupt with an msi_range input value of 1.
+
+
+3. For ibm,opal-ioda PHBs, once the MVE and XIVRs are setup for a PE, the host calls opal_pci_set_mve_enable( phb_id, mve_number, state)to enable that MVE to be a valid target of MSI DMAs. The host may also call this function to disable an MVE when changing PE domains or states.
+
+IODA DMA Setup Sequences
+------------------------
+
+(WARNING: this was rescued from old internal documentation. Needs verification)
+
+
+
+To Manage DMA Windows :
+
+
+1. The host calls opal_pci_map_pe_dma_window( phb_id, dma_window_number, pe_number, tce_levels, tce_table_addr, tce_table_size, tce_page_size, utin64_t* pci_start_addr ) to setup a DMA window for a PE to translate through a TCE table structure in KVM memory.
+
+2. The host calls opal_pci_map_pe_dma_window_real( phb_id, dma_window_number, pe_number, mem_low_addr, mem_high_addr) to setup a DMA window for a PE that is translated (but validated by the PHB as an untranlsated address space authorized to this PE).
diff --git a/doc/pci.txt b/doc/pci.txt
deleted file mode 100644
index a139176..0000000
--- a/doc/pci.txt
+++ /dev/null
@@ -1,71 +0,0 @@
-IODA PE Setup Sequences
------------------------
-
-(WARNING: this was rescued from old internal documentation. Needs verification)
-
-To setup basic PE mappings, the host performs this basic sequence:
-
-    For ibm,opal-ioda2, prior to allocating PHB resources to PEs, the host must
-    allocate memory for PE structures and then calls
-    opal_pci_set_phb_table_memory( phb_id, rtt_addr, ivt_addr, ivt_len,
-    rrba_addr, peltv_addr) to define them to the PHB. OPAL returns
-    OPAL_UNSUPPORTED status for ibm,opal-ioda PHBs.
-
-    The host calls opal_pci_set_pe( phb_id, pe_number, bus, dev, func,
-    validate_mask, bus_mask, dev_mask, func mask) to map a PE to a PCI RID or
-    range of RIDs in the same PE domain.
-
-    The host calls opal_pci_set_peltv(phb_id, parent_pe, child_pe, state) to
-    set a parent PELT vector bit for the child PE argument to 1 (a child of the
-    parent) or 0 (not in the parent PE domain).
-
-IODA MMIO Setup Sequences
--------------------------
-
-(WARNING: this was rescued from old internal documentation. Needs verification)
-
-
-    The host calls opal_pci_phb_mmio_enable( phb_id, window_type, window_num, 0x0) to disable the MMIO window.
-
-    The host calls opal_pci_set_phb_mmio_window( phb_id, mmio_window, starting_real_address, starting_pci_address, segment_size) to change the MMIO window location in PCI and/or processor real address space, or to change the size -- and corresponding window size -- of a particular MMIO window.
-
-    The host calls opal_pci_map_pe_mmio_window( pe_number, mmio_window, segment_number) to map PEs to window segments, for each segment mapped to each PE.
-
-    The host calls opal_pci_phb_mmio_enable( phb_id, window_type, window_num, 0x1) to enable the MMIO window.
-
-IODA MSI Setup Sequences
-------------------------
-
-(WARNING: this was rescued from old internal documentation. Needs verification)
-
-To setup MSIs:
-
-
-1. For ibm,opal-ioda PHBs, the host chooses an MVE for a PE to use and calls opal_pci_set_mve( phb_id, mve_number, pe_number,) to setup the MVE for the PE number. HAL treats this call as a NOP and returns hal_success status for ibm,opal-ioda2 PHBs.
-
-2. the host chooses an XIVE to use with a PE and calls
-
-    a. opal_pci_set_xive_pe( phb_id, xive_number, pe_number) to authorize that PE to signal that XIVE as an interrupt. The host must call this function for each XIVE assigned to a particular PE, but may use this call for all XIVEs prior to calling opel_pci_set_mve() to bind the PE XIVEs to an MVE.For MSI conventional, the host must bind a unique MVE for each sequential set of 32 XIVEs.
-
-    b. The host forms the interrupt_source_number from the combination of the device tree MSI property base BUID and XIVE number, as an input to opal_set_xive(interrupt_source_number, server_number, priority) and opal_get_xive(interrupt_source_number, server_number, priority) to set or return the server and priority numbers within an XIVE.
-
-    c. opal_get_msi_64[32](phb_id, mve_number, xive_num, msi_range, msi_address, message_data) to determine the MSI DMA address (32 or 64 bit) and message data value for that xive.
-
-    For MSI conventional, the host uses this for each sequential power of 2 set of 1 to 32 MSIs, to determine the MSI DMA address and starting message data value for that MSI range. For MSI-X, the host calls this uniquely for each MSI interrupt with an msi_range input value of 1.
-
-
-3. For ibm,opal-ioda PHBs, once the MVE and XIVRs are setup for a PE, the host calls opal_pci_set_mve_enable( phb_id, mve_number, state)to enable that MVE to be a valid target of MSI DMAs. The host may also call this function to disable an MVE when changing PE domains or states.
-
-IODA DMA Setup Sequences
-------------------------
-
-(WARNING: this was rescued from old internal documentation. Needs verification)
-
-
-
-To Manage DMA Windows :
-
-
-1. The host calls opal_pci_map_pe_dma_window( phb_id, dma_window_number, pe_number, tce_levels, tce_table_addr, tce_table_size, tce_page_size, utin64_t* pci_start_addr ) to setup a DMA window for a PE to translate through a TCE table structure in KVM memory.
-
-2. The host calls opal_pci_map_pe_dma_window_real( phb_id, dma_window_number, pe_number, mem_low_addr, mem_high_addr) to setup a DMA window for a PE that is translated (but validated by the PHB as an untranlsated address space authorized to this PE).
diff --git a/doc/stable-skiboot-rules.rst b/doc/stable-skiboot-rules.rst
new file mode 100644
index 0000000..1db47a3
--- /dev/null
+++ b/doc/stable-skiboot-rules.rst
@@ -0,0 +1,62 @@
+Stable Skiboot tree/releases
+----------------------------
+
+If you're at all familiar with the Linux kernel stable trees, this should
+seem fairly familiar.
+
+The purpose of a -stable tree is to give vendors a stable base to create
+firmware releases from and to incorporate into service packs. New stable
+releases contain critical fixes only.
+
+As a general rule, on the most recent skiboot release gets a maintained
+-stable tree. If you wish to maintain an older tree, speak up! For example,
+with my IBMer hat on, we'll maintain branches that we ship in products.
+
+What patches are accepted?
+-------------------------- + +- Patches must be obviously correct and tested + - A Tested-by signoff is *important* +- A patch must fix a real bug +- No trivial patches; such fixups belong in the main branch +- It must not fix a purely theoretical problem unless you can prove how + it's exploitable +- The patch, or an equivalent one, must already be in master + - Submitting to both at the same time is okay, but backporting is better + +HOWTO submit to stable +---------------------- +Two ways: +1) Send patch to the skiboot@ list with "[PATCH stable]" in subject + - This targets the patch *ONLY* to the stable branch. + - Such commits will *NOT* be merged into master. + - Use this when: + a) cherry-picking a fix from master + b) fixing something that is only broken in stable + c) the fix in stable needs to be completely different than in master + If b or c: explain why. + - If cherry-picking, include the following at the top of your + commit message: + commit upstream. + - If the patch has been modified, explain why in the description. + +2) Add "Cc: stable" above your Signed-off-by line when sending to skiboot@ + - This targets the patch to master and stable. + - You can target a patch to a specific stable tree with: + Cc: stable # 5.1.x + and that will target it to the 5.1.x branch. + - You can ask for prerequisites to be cherry-picked: + Cc: stable # 5.1.x 55ae15b Ensure we run pollers in cpu_wait_job() + Cc: stable # 5.1.x + Which means: + 1) please git cherry-pick 55ae15b + 2) then apply this patch to 5.1.x. + +Trees +----- +- https://github.com/open-power/skiboot/tree/stable + git@github.com:open-power/skiboot.git (branches are skiboot-X.Y.x - e.g.
skiboot-5.1.x) + +- Some stable versions may last longer than others + - So there may be skiboot-5.1.x and skiboot-5.2.x actively maintained + and skiboot-5.1.x could possibly outlast skiboot-5.2.x diff --git a/doc/stable-skiboot-rules.txt b/doc/stable-skiboot-rules.txt deleted file mode 100644 index 1db47a3..0000000 --- a/doc/stable-skiboot-rules.txt +++ /dev/null @@ -1,62 +0,0 @@ -Stable Skiboot tree/releases ----------------------------- - -If you're at all familiar with the Linux kernel stable trees, this should -seem fairly familiar. - -The purpose of a -stable tree is to give vendors a stable base to create -firmware releases from and to incorporate into service packs. New stable -releases contain critical fixes only. - -As a general rule, only the most recent skiboot release gets a maintained --stable tree. If you wish to maintain an older tree, speak up! For example, -with my IBMer hat on, we'll maintain branches that we ship in products. - -What patches are accepted? --------------------------- - -- Patches must be obviously correct and tested - - A Tested-by signoff is *important* -- A patch must fix a real bug -- No trivial patches; such fixups belong in the main branch -- It must not fix a purely theoretical problem unless you can prove how - it's exploitable -- The patch, or an equivalent one, must already be in master - - Submitting to both at the same time is okay, but backporting is better - -HOWTO submit to stable ----------------------- -Two ways: -1) Send patch to the skiboot@ list with "[PATCH stable]" in subject - - This targets the patch *ONLY* to the stable branch. - - Such commits will *NOT* be merged into master. - - Use this when: - a) cherry-picking a fix from master - b) fixing something that is only broken in stable - c) the fix in stable needs to be completely different than in master - If b or c: explain why. - - If cherry-picking, include the following at the top of your - commit message: - commit upstream.
- - If the patch has been modified, explain why in the description. - -2) Add "Cc: stable" above your Signed-off-by line when sending to skiboot@ - - This targets the patch to master and stable. - - You can target a patch to a specific stable tree with: - Cc: stable # 5.1.x - and that will target it to the 5.1.x branch. - - You can ask for prerequisites to be cherry-picked: - Cc: stable # 5.1.x 55ae15b Ensure we run pollers in cpu_wait_job() - Cc: stable # 5.1.x - Which means: - 1) please git cherry-pick 55ae15b - 2) then apply this patch to 5.1.x. - -Trees ------ -- https://github.com/open-power/skiboot/tree/stable - git@github.com:open-power/skiboot.git (branches are skiboot-X.Y.x - e.g. skiboot-5.1.x) - -- Some stable versions may last longer than others - - So there may be skiboot-5.1.x and skiboot-5.2.x actively maintained - and skiboot-5.1.x could possibly outlast skiboot-5.2.x diff --git a/doc/versioning.rst b/doc/versioning.rst new file mode 100644 index 0000000..2bbad69 --- /dev/null +++ b/doc/versioning.rst @@ -0,0 +1,107 @@ +Versioning Scheme of skiboot +============================ + +History +------- +For roughly the first six months of public life, skiboot just presented a +git SHA1 as a version "number". This was "user visible" in two places: +1) /sys/firmware/opal/msglog + the familiar "SkiBoot 71664fd-dirty starting..." message +2) device tree: + /proc/device-tree/ibm,opal/firmware/git-id + +Builds were also referred to by date and by corresponding PowerKVM release. +Clearly, this was unlikely to be good practice going forward. + +As of skiboot-4.0, this scheme has changed and we now present a version +string instead. This better addresses the needs of everybody who is building +OpenPower systems. + + +Current practice +---------------- +The version string is constructed from a few places and is designed to +be *highly* informative about what you're running. For the most part, +it should be automatically constructed by the skiboot build system.
The +only times you need to do anything are when you are a) making an upstream +skiboot release or b) building firmware to release for your platform(s). + +OPAL/skiboot has several consumers, for example: +- IBM shipping POWER8 systems with an FSP (FW810.XX and future) +- OpenPower +- OpenPower partners manufacturing OpenPower systems +- developers, test and support needing to understand what code a system + is running + +and there are going to be several concurrently maintained releases in the wild, +likely built by different teams of people at different companies. + +tl;dr: you're likely going to see version numbers like this (for the +hypothetical platforms 'ketchup' and 'mustard'): +skiboot-4.0-ketchup-0 +skiboot-4.0-ketchup-1 +skiboot-4.1-mustard-4 +skiboot-4.1-ketchup-0 + +If you see *extra* things on the end of the version, then you're running +a custom build from a developer +(e.g. 'skiboot-4.0-1-g23f147e-stewart-dirty-f42fc40' means something to +us - explained below). + +If you see less, for example 'skiboot-4.0', then you're running a build +directly out of the main git tree. Those producing OPAL builds for users +must *not* ship like this, even if the tree is identical. + +Here are the components of the version string from master: + +skiboot-4.0-1-g23f147e-debug-occ-stewart-dirty-f42fc40 +^ ^^^ ^ ^^^^^^^ ^-------^ ^ ^ ^^^^^^^ +| | | | | | | | +| | | | | \ / - 'git diff|sha1sum' +| | | | | \ / +| | | | | - built from a dirty tree of $USER +| | | | | +| | | | - $EXTRA_VERSION (optional) +| | | | +| | | - git SHA1 of commit built +| | | +| | - commits ahead of skiboot-4.0 tag +| | +| - skiboot version number ---\ +| >-- from the 'skiboot-4.0' git tag + - product name (always skiboot) ---/ + + +When doing a release for a particular platform, you are expected to create +and tag a branch from master.
For the (hypothetical) ketchup platform which +is going to do a release based on skiboot-4.0, you would create a tag +'skiboot-4.0-ketchup-0' pointing to the same revision as the 'skiboot-4.0' tag +and then make any additional modifications to skiboot that were not in the 4.0 +release. So, you could ship a skiboot with the following version string: + +skiboot-4.0-ketchup-1 +^ ^^^ ^ ^ +| | | | +| | | - revision for this platform +| | | +| | | +| | - Platform name/version +| | +| - skiboot version number +| + - product name (always skiboot) + +This version string tells your users to expect what is in skiboot-4.0 plus +some revisions for your platform. + + +Practical Considerations +------------------------ + +You MUST correctly tag your git tree for sensible version numbers to be +generated. Look at the (generated) version.c file to confirm you're building +the correct version number. You will need annotated tags (git tag -a). + +If your build infrastructure does *not* build skiboot from a git tree, you +should specify SKIBOOT_VERSION as an environment variable (following this +versioning scheme), otherwise the build will fail. diff --git a/doc/versioning.txt b/doc/versioning.txt deleted file mode 100644 index 2bbad69..0000000 --- a/doc/versioning.txt +++ /dev/null @@ -1,107 +0,0 @@ -Versioning Scheme of skiboot -============================ - -History -------- -For roughly the first six months of public life, skiboot just presented a -git SHA1 as a version "number". This was "user visible" in two places: -1) /sys/firmware/opal/msglog - the familiar "SkiBoot 71664fd-dirty starting..." message -2) device tree: - /proc/device-tree/ibm,opal/firmware/git-id - -Builds were also referred to by date and by corresponding PowerKVM release. -Clearly, this was unlikely to be good practice going forward. - -As of skiboot-4.0, this scheme has changed and we now present a version -string instead. This better addresses the needs of everybody who is building -OpenPower systems. 
- - -Current practice ----------------- -The version string is constructed from a few places and is designed to -be *highly* informative about what you're running. For the most part, -it should be automatically constructed by the skiboot build system. The -only times you need to do anything are when you are a) making an upstream -skiboot release or b) building firmware to release for your platform(s). - -OPAL/skiboot has several consumers, for example: -- IBM shipping POWER8 systems with an FSP (FW810.XX and future) -- OpenPower -- OpenPower partners manufacturing OpenPower systems -- developers, test and support needing to understand what code a system - is running - -and there are going to be several concurrently maintained releases in the wild, -likely built by different teams of people at different companies. - -tl;dr: you're likely going to see version numbers like this (for the -hypothetical platforms 'ketchup' and 'mustard'): -skiboot-4.0-ketchup-0 -skiboot-4.0-ketchup-1 -skiboot-4.1-mustard-4 -skiboot-4.1-ketchup-0 - -If you see *extra* things on the end of the version, then you're running -a custom build from a developer -(e.g. 'skiboot-4.0-1-g23f147e-stewart-dirty-f42fc40' means something to -us - explained below). - -If you see less, for example 'skiboot-4.0', then you're running a build -directly out of the main git tree. Those producing OPAL builds for users -must *not* ship like this, even if the tree is identical.
- -Here are the components of the version string from master: - -skiboot-4.0-1-g23f147e-debug-occ-stewart-dirty-f42fc40 -^ ^^^ ^ ^^^^^^^ ^-------^ ^ ^ ^^^^^^^ -| | | | | | | | -| | | | | \ / - 'git diff|sha1sum' -| | | | | \ / -| | | | | - built from a dirty tree of $USER -| | | | | -| | | | - $EXTRA_VERSION (optional) -| | | | -| | | - git SHA1 of commit built -| | | -| | - commits ahead of skiboot-4.0 tag -| | -| - skiboot version number ---\ -| >-- from the 'skiboot-4.0' git tag - - product name (always skiboot) ---/ - - -When doing a release for a particular platform, you are expected to create -and tag a branch from master. For the (hypothetical) ketchup platform which -is going to do a release based on skiboot-4.0, you would create a tag -'skiboot-4.0-ketchup-0' pointing to the same revision as the 'skiboot-4.0' tag -and then make any additional modifications to skiboot that were not in the 4.0 -release. So, you could ship a skiboot with the following version string: - -skiboot-4.0-ketchup-1 -^ ^^^ ^ ^ -| | | | -| | | - revision for this platform -| | | -| | | -| | - Platform name/version -| | -| - skiboot version number -| - - product name (always skiboot) - -This version string tells your users to expect what is in skiboot-4.0 plus -some revisions for your platform. - - -Practical Considerations ------------------------- - -You MUST correctly tag your git tree for sensible version numbers to be -generated. Look at the (generated) version.c file to confirm you're building -the correct version number. You will need annotated tags (git tag -a). - -If your build infrastructure does *not* build skiboot from a git tree, you -should specify SKIBOOT_VERSION as an environment variable (following this -versioning scheme), otherwise the build will fail.
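The version-string diagram above amounts to a fixed concatenation order. As a rough illustration only, the C sketch below assembles a string from its components in that order. The function `format_skiboot_version()` and its parameters are hypothetical. Per the text, the real string is generated automatically by the skiboot build system from git. The sketch also assumes `git describe`-style behaviour, where a build sitting exactly on a tag shows only the tag name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of skiboot): build a version string in the
 * component order shown in the diagram:
 *   <tag>[-<commits>-g<sha1>][-<extra>][-<user>-dirty-<diffhash>]
 * Returns the snprintf() result, i.e. the length the full string needs.
 */
static int format_skiboot_version(char *buf, size_t len,
				  const char *tag,        /* e.g. "skiboot-4.0" */
				  unsigned int commits,   /* commits ahead of the tag */
				  const char *sha1,       /* abbreviated commit SHA1 */
				  const char *extra,      /* $EXTRA_VERSION, or NULL */
				  const char *user,       /* $USER, used for dirty trees */
				  const char *diff_hash)  /* 'git diff|sha1sum' prefix, NULL if clean */
{
	char ahead[64] = "";
	char extra_part[64] = "";
	char dirty_part[128] = "";

	assert(tag != NULL);

	/* Assumed git-describe behaviour: omit "-N-g<sha1>" when exactly on the tag. */
	if (commits > 0 && sha1)
		snprintf(ahead, sizeof(ahead), "-%u-g%s", commits, sha1);
	if (extra)
		snprintf(extra_part, sizeof(extra_part), "-%s", extra);
	/* A dirty tree appends the builder's name and a hash of the local diff. */
	if (diff_hash && user)
		snprintf(dirty_part, sizeof(dirty_part), "-%s-dirty-%s", user, diff_hash);

	return snprintf(buf, len, "%s%s%s%s", tag, ahead, extra_part, dirty_part);
}
```

With the values from the diagram (`"skiboot-4.0"`, 1, `"23f147e"`, `"debug-occ"`, `"stewart"`, `"f42fc40"`), this produces `skiboot-4.0-1-g23f147e-debug-occ-stewart-dirty-f42fc40`. Passing `NULL` for `extra` gives the shorter developer example, `skiboot-4.0-1-g23f147e-stewart-dirty-f42fc40`. A platform release such as `skiboot-4.0-ketchup-1` is simply a tag with nothing appended.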
diff --git a/doc/xscom-node-bindings.rst b/doc/xscom-node-bindings.rst new file mode 100644 index 0000000..0c2545e --- /dev/null +++ b/doc/xscom-node-bindings.rst @@ -0,0 +1,57 @@ +XSCOM regions +============= + +The top-level xscom nodes specify the mapping range from the 64-bit address +space into the PCB address space. + +There's one mapping range per chip xscom, and therefore one node per mapping range. + +/ +/xscom@/ +/xscom@/ +… +/xscom@/ + +- where is the xscom base address with the gcid-specific + bits (for chip n) OR-ed in. + +Each xscom node has the following properties: + + * #address-cells = 1 + * #size-cells = 1 + * reg = + * ibm,chip-id = gcid + * compatible = "ibm,xscom", "ibm,power8-scom" / "ibm,power7-xscom" + + +Chiplet endpoints +================= + +One sub-node per endpoint. Endpoints are defined by their (port, +endpoint-address) data on the PCB, and are named according to their endpoint +types: + +/xscom@/ +/xscom@/chiptod@ +/xscom@/lpc@ + +- where the is a single address (as distinct from the current + (gcid,base) format), consisting of the SCOM port and SCOM endpoint bits in + their 31-bit address format. + +Each endpoint node has the following properties: + + * reg = + * compatible - depends on endpoint type, e.g. "ibm,power8-chiptod" + +The endpoint address specifies the address on the PCB. So, to calculate the +MMIO address for a PCB register: + + mmio_addr = | (pcb_addr[1:27] << 4) + | (pcb_addr[28:31] << 3) + +Where: + + - xscom-base-addr is the address from the first two cells of the parent + node's reg property + - pcb_addr is the first cell of the endpoint's reg property diff --git a/doc/xscom-node-bindings.txt b/doc/xscom-node-bindings.txt deleted file mode 100644 index 0c2545e..0000000 --- a/doc/xscom-node-bindings.txt +++ /dev/null @@ -1,57 +0,0 @@ -XSCOM regions -============= - -The top-level xscom nodes specify the mapping range from the 64-bit address -space into the PCB address space.
- -There's one mapping range per chip xscom, and therefore one node per mapping range. - -/ -/xscom@/ -/xscom@/ -… -/xscom@/ - -- where is the xscom base address with the gcid-specific - bits (for chip n) OR-ed in. - -Each xscom node has the following properties: - - * #address-cells = 1 - * #size-cells = 1 - * reg = - * ibm,chip-id = gcid - * compatible = "ibm,xscom", "ibm,power8-scom" / "ibm,power7-xscom" - - -Chiplet endpoints -================= - -One sub-node per endpoint. Endpoints are defined by their (port, -endpoint-address) data on the PCB, and are named according to their endpoint -types: - -/xscom@/ -/xscom@/chiptod@ -/xscom@/lpc@ - -- where the is a single address (as distinct from the current - (gcid,base) format), consisting of the SCOM port and SCOM endpoint bits in - their 31-bit address format. - -Each endpoint node has the following properties: - - * reg = - * compatible - depends on endpoint type, e.g. "ibm,power8-chiptod" - -The endpoint address specifies the address on the PCB. So, to calculate the -MMIO address for a PCB register: - - mmio_addr = | (pcb_addr[1:27] << 4) - | (pcb_addr[28:31] << 3) - -Where: - - - xscom-base-addr is the address from the first two cells of the parent - node's reg property - - pcb_addr is the first cell of the endpoint's reg property -- cgit v1.1
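The MMIO address formula in the xscom bindings above can be sketched in C. This is a minimal sketch under stated assumptions. The bit ranges are read in IBM (MSB-0) numbering on the 32-bit reg cell, so `pcb_addr[1:27]` is the mask `0x7ffffff0` and `pcb_addr[28:31]` is `0xf`. Each field is shifted from the position where it already sits, rather than being extracted first. The two parent reg cells are combined high word first, as is usual for multi-cell device tree addresses. The helper names `xscom_base_from_reg()` and `xscom_mmio_addr()` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: combine the first two 32-bit cells of the parent
 * xscom node's reg property into a 64-bit base (first cell = high word). */
static uint64_t xscom_base_from_reg(uint32_t cell0, uint32_t cell1)
{
	return ((uint64_t)cell0 << 32) | cell1;
}

/* Hypothetical helper: map a 31-bit PCB address to its MMIO address.
 * IBM bit numbering: bit 0 is the MSB of the 32-bit cell. */
static uint64_t xscom_mmio_addr(uint64_t xscom_base, uint32_t pcb_addr)
{
	/* pcb_addr[1:27] (mask 0x7ffffff0) << 4: lands at bit 8 and above. */
	uint64_t upper = (uint64_t)(pcb_addr & 0x7ffffff0u) << 4;
	/* pcb_addr[28:31] (low nibble) << 3: lands within mask 0x78. */
	uint64_t lower = (uint64_t)(pcb_addr & 0x0000000fu) << 3;

	/* The PCB address is 31 bits wide, so bit 0 (the MSB) should be clear. */
	assert((pcb_addr & 0x80000000u) == 0);
	return xscom_base | upper | lower;
}
```

Take a made-up base of `0x0003fc0000000000` (reg cells `0x0003fc00`, `0x00000000`). With that base, PCB address `0x12345` maps to `0x0003fc0000123428`. Under this in-place reading, the two shifted fields never overlap: bits 8 and up versus bits 3 to 6. That is why it seems the likely intended interpretation, though the document does not spell it out.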