author    | Stewart Smith <stewart@linux.vnet.ibm.com> | 2017-10-02 12:08:25 +1100
committer | Stewart Smith <stewart@linux.vnet.ibm.com> | 2017-10-11 16:45:34 +1100
commit    | 696d378d7b7295366e115e89a785640bf72a5043
tree      | 83cfb7042a91d5d7489684f1b23ba44294663537 /doc
parent    | e363cd66debb6a83e64bdd3bbdbf0eff501443a8
fsp: return OPAL_BUSY_EVENT on failure sending FSP_CMD_POWERDOWN_NORM
We had a race condition between FSP Reset/Reload and powering down
the system from the host:
Roughly:
FSP                            Host
---                            ----
Power on
                               Power on
(inject EPOW)
(trigger FSP R/R)
                               Processes EPOW event, starts shutting down
                               calls OPAL_CEC_POWER_DOWN
(is still in R/R)
                               gets OPAL_INTERNAL_ERROR, spins in opal_poll_events
(FSP comes back)
                               spinning in opal_poll_events
(thinks host is running)
The call to OPAL_CEC_POWER_DOWN is only made once, as the reset/reload
error path for fsp_sync_msg() is to return -1, which means we give
the OS OPAL_INTERNAL_ERROR. That is fine, except that our own API
docs allow returning OPAL_BUSY when trying again later may be
successful, and we're ambiguous as to whether the OS should retry
on OPAL_INTERNAL_ERROR.
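For context, the pre-fix error path looked roughly like the sketch below. The function and helper names (ibm_fsp_cec_power_down, fsp_mkmsg, fsp_sync_msg, FSP_CMD_POWERDOWN_NORM) follow the skiboot FSP platform code, but this is a simplified illustration under those assumptions, not the verbatim source (which sits outside the doc-only diff shown here):

    /*
     * Simplified sketch (assumed names, not verbatim skiboot source) of the
     * pre-fix behaviour: a failure from fsp_sync_msg() -- which also occurs
     * while the FSP is mid Reset/Reload -- was surfaced as OPAL_INTERNAL_ERROR.
     */
    static int64_t ibm_fsp_cec_power_down(uint64_t request)
    {
        /* Ask the FSP for a normal power down */
        if (fsp_sync_msg(fsp_mkmsg(FSP_CMD_POWERDOWN_NORM, 1, request), true))
            return OPAL_INTERNAL_ERROR;

        return OPAL_SUCCESS;
    }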
For reference, the Linux code looks like this:
>static void __noreturn pnv_power_off(void)
>{
> long rc = OPAL_BUSY;
>
> pnv_prepare_going_down();
>
> while (rc == OPAL_BUSY || rc == OPAL_BUSY_EVENT) {
> rc = opal_cec_power_down(0);
> if (rc == OPAL_BUSY_EVENT)
> opal_poll_events(NULL);
> else
> mdelay(10);
> }
> for (;;)
> opal_poll_events(NULL);
>}
Which means that *practically* our only option is to return OPAL_BUSY
or OPAL_BUSY_EVENT.
We choose OPAL_BUSY_EVENT for FSP systems as we do want to ensure we're
running pollers to communicate with the FSP and do the final bits of
Reset/Reload handling before we power off the system.
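The platform-side change itself is not in the doc/ subtree shown by this diffstat; under the same assumed names as the sketch above, it amounts to reporting OPAL_BUSY_EVENT when the message cannot be sent, so the OS keeps running the pollers and the FSP Reset/Reload can finish:

    /*
     * Post-fix sketch (assumed names, not verbatim source): if the power-down
     * message cannot be sent (e.g. FSP mid Reset/Reload), ask the OS to poll
     * and retry instead of reporting an unrecoverable error.
     */
    static int64_t ibm_fsp_cec_power_down(uint64_t request)
    {
        if (fsp_sync_msg(fsp_mkmsg(FSP_CMD_POWERDOWN_NORM, 1, request), true))
            return OPAL_BUSY_EVENT;

        return OPAL_SUCCESS;
    }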
Additionally, we really should update our documentation to point out all
of these return codes and what action an OS should take for each.
CC: stable
Reported-by: Pridhiviraj Paidipeddi <ppaidipe@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
Reviewed-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Acked-by: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
Diffstat (limited to 'doc')
-rw-r--r-- | doc/opal-api/opal-cec-power-down-5.rst | 18
-rw-r--r-- | doc/opal-api/return-codes.rst          |  6
2 files changed, 20 insertions, 4 deletions
diff --git a/doc/opal-api/opal-cec-power-down-5.rst b/doc/opal-api/opal-cec-power-down-5.rst
index 6daea3d..bdcb84e 100644
--- a/doc/opal-api/opal-cec-power-down-5.rst
+++ b/doc/opal-api/opal-cec-power-down-5.rst
@@ -24,16 +24,28 @@ Return Values
 -------------
 
 ``OPAL_SUCCESS``
-  the power down was updated successful
+  the power down request was successful.
+  This may/may not result in immediate power down. An OS should
+  spin in a loop after getting `OPAL_SUCCESS` as it is likely that there
+  will be a delay before instructions stop being executed.
 
 ``OPAL_BUSY``
-  unable to power down, try again later
+  unable to power down, try again later.
+
+``OPAL_BUSY_EVENT``
+  Unable to power down, call `opal_run_pollers` and try again.
 
 ``OPAL_PARAMETER``
   a parameter was incorrect
 
 ``OPAL_INTERNAL_ERROR``
-  hal code sent incorrect data to hardware device
+  Something went wrong, and waiting and trying again is unlikely to be
+  successful. Although, considering that in a shutdown code path, there's
+  unlikely to be any other valid option to take, retrying is perfectly valid.
+
+  In older OPAL versions (prior to skiboot v5.9), on IBM FSP systems, this
+  return code was returned erroneously instead of OPAL_BUSY_EVENT during an
+  FSP Reset/Reload.
 
 ``OPAL_UNSUPPORTED``
   this platform does not support being powered off.
diff --git a/doc/opal-api/return-codes.rst b/doc/opal-api/return-codes.rst
index 03ea5c1..3ea4a3d 100644
--- a/doc/opal-api/return-codes.rst
+++ b/doc/opal-api/return-codes.rst
@@ -40,7 +40,8 @@ OPAL_BUSY
 
    #define OPAL_BUSY -2
 
-Try again later.
+Try again later. Related to `OPAL_BUSY_EVENT`, but `OPAL_BUSY` indicates that the
+caller need not call `OPAL_POLL_EVENTS` itself. **TODO** Clarify current situation.
 
 OPAL_PARTIAL
 ------------
@@ -126,6 +127,9 @@ OPAL_BUSY_EVENT
 
    #define OPAL_BUSY_EVENT -12
 
+The same as `OPAL_BUSY` but signals that the OS should call `OPAL_POLL_EVENTS` as
+that may be required to get into a state where the call will succeed.
+
 OPAL_HARDWARE_FROZEN
 --------------------
 ::