From 4cd4262034849da01eb88659af677b69f8169f06 Mon Sep 17 00:00:00 2001 From: Steven Rostedt Date: Wed, 26 Nov 2008 21:04:24 -0500 Subject: [PATCH] sched: prevent divide by zero error in cpu_avg_load_per_task Impact: fix divide by zero crash in scheduler rebalance irq While testing the branch profiler, I hit this crash: divide error: 0000 [#1] PREEMPT SMP [...] RIP: 0010:[] [] cpu_avg_load_per_task+0x50/0x7f [...] Call Trace: <0> [] find_busiest_group+0x3e5/0xcaa [] rebalance_domains+0x2da/0xa21 [] ? find_next_bit+0x1b2/0x1e6 [] run_rebalance_domains+0x112/0x19f [] __do_softirq+0xa8/0x232 [] call_softirq+0x1c/0x3e [] do_softirq+0x94/0x1cd [] irq_exit+0x6b/0x10e [] smp_apic_timer_interrupt+0xd3/0xff [] apic_timer_interrupt+0x13/0x20 The code for cpu_avg_load_per_task has: if (rq->nr_running) rq->avg_load_per_task = rq->load.weight / rq->nr_running; The runqueue lock is not held here, and there is nothing that prevents the rq->nr_running from going to zero after it passes the if condition. The branch profiler simply made the race window bigger. This patch saves off the rq->nr_running to a local variable and uses that for both the condition and the division. Signed-off-by: Steven Rostedt Peter Zijlstra Signed-off-by: Ingo Molnar --- kernel/sched.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/kernel/sched.c b/kernel/sched.c index 9b1e79371c2..700aa9a1413 100644 --- a/kernel/sched.c +++ b/kernel/sched.c @@ -1453,9 +1453,10 @@ static int task_hot(struct task_struct *p, u64 now, struct sched_domain *sd); static unsigned long cpu_avg_load_per_task(int cpu) { struct rq *rq = cpu_rq(cpu); + unsigned long nr_running = rq->nr_running; - if (rq->nr_running) - rq->avg_load_per_task = rq->load.weight / rq->nr_running; + if (nr_running) + rq->avg_load_per_task = rq->load.weight / nr_running; else rq->avg_load_per_task = 0; -- 2.41.0