[parisc-linux] keyboard_tasklet bug?
Ryan Bradetich
rbradetich@uswest.net
29 Nov 2001 00:45:26 -0700
Hello parisc-linux hackers,
I have spent the last couple of evenings exploring new (to me anyways)
parts of the kernel tracking down a SMP hang on my C200+. What I found
appears to be a more generic bug, so I'm posting it here for ideas on
how to fix it, or for someone to explain to me why this isn't a bug :)
After quite a bit of tracking the problem down, I figured out the kernel
wasn't halting, but was stuck in the following infinite loop from
tasklet_action() in kernel/softirq.c:
    while (list) {
        struct tasklet_struct *t = list;

        list = list->next;

        if (tasklet_trylock(t)) {
            if (!atomic_read(&t->count)) {
                if (!test_and_clear_bit(TASKLET_STATE_SCHED, &t->state))
                    BUG();
                t->func(t->data);
                tasklet_unlock(t);
                continue;
            }
            tasklet_unlock(t);
        }

        local_irq_disable();
        t->next = tasklet_vec[cpu].list;
        tasklet_vec[cpu].list = t;
        __cpu_raise_softirq(cpu, TASKLET_SOFTIRQ);
        local_irq_enable();
    }
I eventually figured out that the if(!atomic_read(&t->count)) was
failing... and the task would be added back into the list via the
following lines of code:
    t->next = tasklet_vec[cpu].list;
    tasklet_vec[cpu].list = t;
This loop would continue since the atomic_read(&t->count) was always
non-zero, and the task was always being put back on the list.
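
To make the re-queueing behaviour concrete, here is a minimal user-space
sketch of the loop above. It is only a model: the model_* names, the
plain int fields, and the pass limit are all assumptions made for
illustration, and locking and interrupt masking are left out entirely.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for struct tasklet_struct. */
struct model_tasklet {
    struct model_tasklet *next;
    int scheduled;                 /* models TASKLET_STATE_SCHED */
    int count;                     /* models ->count: non-zero means disabled */
    void (*func)(unsigned long);
    unsigned long data;
    int runs;                      /* how many times func has been called */
};

static struct model_tasklet *pending;  /* models tasklet_vec[cpu].list */
static int softirq_raised;             /* models the pending TASKLET_SOFTIRQ */

/* Do-nothing handler used by the model. */
static void model_noop(unsigned long data) { (void)data; }

/* Queue a tasklet once and ask for a softirq pass. */
static void model_schedule(struct model_tasklet *t)
{
    if (!t->scheduled) {
        t->scheduled = 1;
        t->next = pending;
        pending = t;
        softirq_raised = 1;
    }
}

/* One pass over the pending list, mirroring the quoted loop. */
static void model_tasklet_action(void)
{
    struct model_tasklet *list = pending;

    pending = NULL;
    softirq_raised = 0;

    while (list) {
        struct model_tasklet *t = list;

        list = list->next;

        if (t->count == 0) {
            assert(t->scheduled);  /* the kernel calls BUG() here */
            t->scheduled = 0;
            t->func(t->data);
            t->runs++;
            continue;
        }

        /* Disabled: put it back on the list and request another pass. */
        t->next = pending;
        pending = t;
        softirq_raised = 1;
    }
}

/* Run passes until nothing is pending or max_passes is reached.
 * Returns the number of passes made. */
static int model_run_softirqs(int max_passes)
{
    int passes = 0;

    while (softirq_raised && passes < max_passes) {
        model_tasklet_action();
        passes++;
    }
    return passes;
}
```

In this model, a tasklet that is scheduled while its count is still 1
uses up every pass it is given without ever running, and runs on the
very next pass once the count drops to 0.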
I figured out that the keyboard_tasklet was the tasklet the
atomic_read() check was failing on, and started to investigate why. It
turned out that the keyboard_tasklet was being initialized disabled via
the following macro from include/linux/interrupt.h:
    #define DECLARE_TASKLET_DISABLED(name, func, data) \
        struct tasklet_struct name = { NULL, 0, ATOMIC_INIT(1), func, data }
This macro initializes ->count to 1, which marks the tasklet as disabled.
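
As a rough illustration of what the disabled initializer means, the
sketch below models the count with a plain int. The demo_* names are
hypothetical and this is not the kernel's code; the field order simply
follows the initializer quoted above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct tasklet_struct, with a plain int
 * in place of atomic_t. */
struct demo_tasklet {
    struct demo_tasklet *next;
    unsigned long state;
    int count;                     /* 0 = enabled, >0 = disabled */
    void (*func)(unsigned long);
    unsigned long data;
};

/* Enabled from the start: count is 0. */
#define DEMO_DECLARE_TASKLET(name, func, data) \
    struct demo_tasklet name = { NULL, 0, 0, func, data }

/* Disabled from the start: count is 1. */
#define DEMO_DECLARE_TASKLET_DISABLED(name, func, data) \
    struct demo_tasklet name = { NULL, 0, 1, func, data }

/* Each disable must be balanced by one enable. */
static void demo_disable(struct demo_tasklet *t)
{
    t->count++;
}

static void demo_enable(struct demo_tasklet *t)
{
    assert(t->count > 0);          /* enabling an enabled tasklet is a bug */
    t->count--;
}

/* The handler may only run while count is 0. */
static int demo_can_run(const struct demo_tasklet *t)
{
    return t->count == 0;
}

static void demo_handler(unsigned long data) { (void)data; }

/* Models the keyboard tasklet: declared disabled, enabled later. */
DEMO_DECLARE_TASKLET_DISABLED(demo_kbd, demo_handler, 0);
```

Until demo_enable() has been called once on demo_kbd, demo_can_run()
reports that the handler must not be invoked.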
I also figured out that the keyboard_tasklet was being scheduled via
tasklet_schedule() before tasklet_enable() was called on it. (The
tasklet_enable() call provides a memory barrier, then calls atomic_dec()
on the ->count of the tasklet, making it 0.)
These traces show the paths to the first tasklet_schedule() and to the
tasklet_enable() of the keyboard_tasklet, starting with start_kernel()
since that is the common point between the two.

tasklet_schedule(&keyboard_tasklet)
-----------------------------------
1. start_kernel()
2. console_init()
3. con_init()
4. vc_init()
5. reset_terminal()
6. set_leds()
7. tasklet_schedule()

tasklet_enable(&keyboard_tasklet)
---------------------------------
1. start_kernel()
2. rest_init()
3. init() via kernel_thread()
4. do_basic_setup()
5. do_initcalls()
6. chr_dev_init()
7. tty_init()
8. kbd_init()
9. tasklet_enable()

Looking at start_kernel(), console_init() is the 9th function called,
whereas rest_init() is the last one, so the tasklet gets scheduled long
before it is enabled.
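
The ordering problem can be sketched with a tiny user-space model that
records when each call happens. Everything here (the boot_* names, the
step counter) is hypothetical and only mirrors the two call chains in
the traces above.

```c
#include <assert.h>

static int boot_count = 1;            /* the tasklet starts out disabled */
static int boot_step;                 /* incremented on every modeled call */
static int boot_sched_step = -1;      /* step at which it was scheduled */
static int boot_enable_step = -1;     /* step at which it was enabled */
static int boot_count_at_sched = -1;  /* count seen at scheduling time */

/* Models the scheduling call reached from set_leds(). */
static void boot_schedule(void)
{
    boot_sched_step = ++boot_step;
    boot_count_at_sched = boot_count;
}

/* Models the enabling call reached from kbd_init(). */
static void boot_enable(void)
{
    boot_enable_step = ++boot_step;
    boot_count--;
}

/* console_init() -> ... -> set_leds() */
static void boot_console_init(void)
{
    boot_schedule();
}

/* rest_init() -> init() -> ... -> kbd_init() */
static void boot_rest_init(void)
{
    boot_enable();
}

/* console_init() runs early, rest_init() runs last. */
static void boot_start_kernel(void)
{
    boot_console_init();
    boot_rest_init();
}
```

After boot_start_kernel() has run, boot_count_at_sched shows that the
count was still 1 when the tasklet was scheduled, which is the window
in which the loop in tasklet_action() keeps re-queueing it.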
I am not sure why this only showed up under SMP for me on the C200+, but
it was _very_ reproducible. As a temporary workaround (and to verify I'd
found the problem), I commented out the set_leds() call in
reset_terminal(), and the C200+ now boots both SMP and UP fine. I know
this is not the proper fix, but I am not sure how to fix this problem
properly, thus my post to the list :)
Thanks for reading, and any feedback welcome!
- Ryan