Timer initialization & conversion bug fixes

This fixes two race conditions that lead to a watchdog timeout:

1) ticks_to_usecs()

common/timer.c:process_timers() wraps its body in a
"while (next.val <= get_time().val)" loop meant to ensure that
it never returns after having scheduled an expired timer
(to address potential __hw_clock_event_set() overflows/underflows).
However get_time() through __hw_clock_source_read() calls ticks_to_usecs()
which "expands" the hw_rollover_count by a truncated clock_div_factor which
causes that loop condition to observe a "current time" that is up to ~15us
in the past (assuming a 24MHz clock). This race arises frequently with
workloads that repeatedly sleep for a short duration.

2) __hw_clock_event_irq()

The HW timer rollover interrupt was configured to be higher priority than
the event timer interrupt (i.e. it can preempt it) which is problematic if:
- There is a scheduled deadline soon after a "clksrc_high / .le.hi" boundary
- An earlier (before the clksrc_high rollover) event timer interrupt kicks in
- After the event timer interrupt handler gets to "now = get_time()"
  in common/timer.c:process_timers() the rollover interrupt triggers
  incrementing clksrc_high (i.e. the overflow case)
- The rollover interrupt handler arms the event timer to trigger at
  that deadline (mentioned in the first bullet) and returns
- The original event timer interrupt handler resumes execution but finds
  no events to schedule since the "timer_deadline[tskid].le.hi == now.le.hi"
  clause won't evaluate to true. It will then call __hw_clock_event_clear()
  before returning causing a watchdog timeout

This commit also contains a fix to properly initialize the HW timer
after a sysjump.

BRANCH=none
BUG=none
TEST=Reproduced both races and successfully tested the fix. The workload I was
     using to reproduce (typically within an hour) has been running smoothly
     for the past 24 hours.

Change-Id: Ic0b0958e66e701b52481fcfe506745ca1c892dd1
Signed-off-by: Nadim Taha <ntaha@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/347465
Reviewed-by: Bill Richardson <wfrichar@chromium.org>
This commit is contained in:
Nadim Taha
2016-05-27 01:41:51 -07:00
committed by Nadim Taha
parent d1beddc463
commit 00c1a0993f

View File

@@ -22,21 +22,12 @@
* this file.
*/
static uint32_t clock_mul_factor;
static uint32_t clock_div_factor;
static uint32_t hw_rollover_count;
static inline uint32_t ticks_to_usecs(uint32_t ticks)
{
return hw_rollover_count * clock_div_factor + ticks / clock_mul_factor;
}
static inline uint32_t usecs_to_ticks(uint32_t usecs)
{
/* Really large counts will just be scheduled more than once */
if (usecs >= clock_div_factor)
return 0xffffffff;
return usecs * clock_mul_factor;
return ((uint64_t)hw_rollover_count * 0xffffffff + ticks)
/ clock_mul_factor;
}
static void update_prescaler(void)
@@ -57,7 +48,6 @@ static void update_prescaler(void)
* Assume the clock rate is an integer multiple of MHz.
*/
clock_mul_factor = PCLK_FREQ / 1000000;
clock_div_factor = 0xffffffff / clock_mul_factor;
}
DECLARE_HOOK(HOOK_FREQ_CHANGE, update_prescaler, HOOK_PRIO_DEFAULT);
@@ -93,15 +83,16 @@ void __hw_clock_event_set(uint32_t deadline)
}
/*
* Handle event matches. It's lower priority than the HW rollover irq, so it
* will always be either before or after a rollover exception.
* Handle event matches. It's priority matches the HW rollover irq to prevent
* a race condition that could lead to a watchdog timeout if preempted after
* the get_time() call in process_timers().
*/
void __hw_clock_event_irq(void)
{
__hw_clock_event_clear();
process_timers(0);
}
DECLARE_IRQ(GC_IRQNUM_TIMEHS0_TIMINT2, __hw_clock_event_irq, 2);
DECLARE_IRQ(GC_IRQNUM_TIMEHS0_TIMINT2, __hw_clock_event_irq, 1);
uint32_t __hw_clock_source_read(void)
{
@@ -114,7 +105,9 @@ uint32_t __hw_clock_source_read(void)
void __hw_clock_source_set(uint32_t ts)
{
GR_TIMEHS_LOAD(0, 1) = 0xffffffff - usecs_to_ticks(ts);
hw_rollover_count = ((uint64_t)ts * clock_mul_factor) >> 32;
GR_TIMEHS_LOAD(0, 1) = 0xffffffff - ts * clock_mul_factor;
GR_TIMEHS_BGLOAD(0, 1) = 0xffffffff;
}
/* This handles rollover in the HW timer */