looks like we had a bit of a race condition: set_state() was
effectively just an assignment operation to pd[port].task_state. it's
called asynchronously from pd_set_suspend() in response to a
PD_SUSPEND message from the AP as well as from pd_task() before it
enters its main event loop. pd_task() can take a long time to get
that far because tcpci_tcpm_init() has a 300ms timeout, so the two
assignments can land in either order. last one wins.

similarly, when pd_task() is running its main loop, pd_set_suspend()
really needs to wait for pd_task() to actually enter the
PD_STATE_SUSPENDED state before the caller can assume that
pd_task() has stopped accessing the TCPC.

the particular failure case was when depthcharge would decide to do a
TCPC firmware update. it starts by sending a PD_SUSPEND to the EC,
then accesses the TCPC. unfortunately, pd_task() hadn't gotten
out of the way yet, so depthcharge and pd_task() were both accessing
the TCPC at the same time.

so, i'm adding a req_suspend_state flag to the pd_protocol struct so
we can tell pd_task() to suspend itself in a controlled manner. when
pd_task() is ready to do a state change - basically at the top of the
main event loop - it'll change to PD_STATE_SUSPENDED and clear the
req_suspend_state flag.
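
a rough sketch of the pd_task() side of this, not the actual diff.
the enum values, PD_PORT_COUNT and pd_task_loop_top() are trimmed-down
or made-up stand-ins for illustration; only set_state(), task_state,
req_suspend_state and PD_STATE_SUSPENDED come from the description
above.

```c
#include <stdbool.h>

/* trimmed-down stand-in for the EC's state enum (assumption) */
enum pd_states {
	PD_STATE_DISABLED,
	PD_STATE_SUSPENDED,
	PD_STATE_SNK_DISCONNECTED,
};

/* only the two fields that matter for this sketch */
struct pd_protocol {
	enum pd_states task_state;
	/* set by pd_set_suspend(), cleared by pd_task() */
	volatile bool req_suspend_state;
};

#define PD_PORT_COUNT 2 /* assumption for the sketch */
static struct pd_protocol pd[PD_PORT_COUNT];

/* in this sketch only pd_task() writes task_state */
static void set_state(int port, enum pd_states next)
{
	pd[port].task_state = next;
}

/*
 * hypothetical helper: what the top of one main-loop iteration
 * might look like once the flag exists.
 */
static void pd_task_loop_top(int port)
{
	if (pd[port].req_suspend_state) {
		set_state(port, PD_STATE_SUSPENDED);
		pd[port].req_suspend_state = false;
	}
	/* ... normal state machine handling would follow ... */
}
```

the point of the flag is that task_state ends up with a single
writer, so the "last one wins" problem goes away.
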
in any case, pd_set_suspend() still needs to wait around for pd_task()
to enter the suspended state as we don't have a fancy handshake
mechanism between these tasks.
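
the waiting side could be sketched as a bounded poll, roughly like
this. request_suspend_and_wait(), sleep_1ms() and the retry count are
hypothetical names and values, not what the CL necessarily uses, and
sleep_1ms() fakes the scheduler by running one pd_task() iteration so
the sketch works single-threaded.

```c
#include <stdbool.h>

enum pd_states { PD_STATE_SNK_DISCONNECTED, PD_STATE_SUSPENDED };

static volatile enum pd_states task_state = PD_STATE_SNK_DISCONNECTED;
static volatile bool req_suspend_state;

/* stand-in for pd_task() reaching the top of its main loop */
static void pd_task_loop_top(void)
{
	if (req_suspend_state) {
		task_state = PD_STATE_SUSPENDED;
		req_suspend_state = false;
	}
}

/*
 * hypothetical sleep hook. on the EC this would be a real sleep that
 * lets pd_task() get scheduled; here it just runs one loop iteration.
 */
static void sleep_1ms(void)
{
	pd_task_loop_top();
}

/* returns true once pd_task() has parked itself, false on timeout */
static bool request_suspend_and_wait(int max_tries)
{
	req_suspend_state = true;
	while (max_tries-- > 0) {
		if (task_state == PD_STATE_SUSPENDED)
			return true;
		sleep_1ms();
	}
	return task_state == PD_STATE_SUSPENDED;
}
```

the caller should only touch the TCPC after this returns true. a
timeout means pd_task() never got to the top of its loop in time.
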
TEST=in combination with some follow-on CLs, ps8751 firmware update
works properly where previously it needed a ~2 second delay
for the EC pd_task() to settle. the way to trigger the
failure was to insert or remove the power brick.
BRANCH=none
BUG=b:62356808
Change-Id: I363803ff60db31ccf84d592f8c9d1610fbe0f9ce
Signed-off-by: Caveh Jalali <caveh@google.com>
Reviewed-on: https://chromium-review.googlesource.com/544659
Reviewed-by: Shawn N <shawnn@chromium.org>