[parisc-linux] 2.4.18 SMP instability

Grant Grundler grundler@dsl2.external.hp.com
Sun, 26 May 2002 00:09:47 -0600


Robert Stanford wrote:
> Regarding the below post, have the SMP issues been worked out on 2.4.18
> yet? I'm running 2.4.18-25 and the machine seems to lock whenever I try
> to use apt with an smp kernel.

uhm...I see that I'm using UP kernels on my boxes right now.
I'll rebuild SMP and retest.

I did just find an SMP problem in the current EIEM (External Interrupt
Enable Mask) handling. Can't say if this is really causing any problems
right now, though. Stop reading now if you don't know about EIEM (or
don't want to).

If enable_irq or disable_irq gets called from a CPU other than
the one the device driver is supposed to interrupt, it will set the
EIEM bit in only *that* (the wrong) CPU. The result is the interrupt
will remain masked on the target CPU. I think the solution
is to keep a global "eiem_val" (set/clear bits there) to match
the global EIRR switch table. I've thought about moving to a
per-CPU EIEM/EIRR switch table, but that's more work than I
have time for right now, and it would have a similar problem.
For now, we just need to update EIEM on all CPUs whenever the
eiem_val global changes.
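A minimal userspace simulation of the failure mode and the proposed fix,
assuming 4 CPUs and an array standing in for each CPU's EIEM register.
Only "eiem_val" comes from the text above; cpu_eiem, enable_irq_local,
enable_irq_global, disable_irq_global, set_eiem_all_cpus and
irq_unmasked_on are hypothetical names for illustration. In the real
kernel, writing EIEM on the other CPUs would presumably need a cross-CPU
call; a plain loop stands in for that here.

```c
#define NR_CPUS 4 /* hypothetical CPU count for the simulation */

/* Simulated per-CPU EIEM registers: bit n set = external interrupt n unmasked. */
static unsigned long cpu_eiem[NR_CPUS];

/* Global copy of the enable mask, playing the role of the proposed eiem_val. */
static unsigned long eiem_val;

/* Current (buggy) pattern: only the calling CPU's EIEM is updated. */
static void enable_irq_local(int this_cpu, int irq)
{
    cpu_eiem[this_cpu] |= 1UL << irq;
}

/* Stand-in for writing EIEM on every CPU (assumed cross-CPU call in a kernel). */
static void set_eiem_all_cpus(unsigned long val)
{
    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        cpu_eiem[cpu] = val;
}

/* Proposed pattern: change the global mask, then push it to all CPUs. */
static void enable_irq_global(int irq)
{
    eiem_val |= 1UL << irq;
    set_eiem_all_cpus(eiem_val);
}

static void disable_irq_global(int irq)
{
    eiem_val &= ~(1UL << irq);
    set_eiem_all_cpus(eiem_val);
}

/* Returns nonzero if irq is unmasked in the given CPU's EIEM. */
static int irq_unmasked_on(int cpu, int irq)
{
    return (int)((cpu_eiem[cpu] >> irq) & 1UL);
}
```

With IRQ 5 routed to CPU 0, calling enable_irq_local(1, 5) from CPU 1
unmasks it only on CPU 1, so CPU 0 never sees it; enable_irq_global(5)
unmasks it on every CPU regardless of which CPU made the call.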

We do NOT currently distribute interrupts.
I did write a patch to distribute IO interrupts:
	ftp://ftp.parisc-linux.org/patches/irq_distr.diff

This diff can't be applied until the EIEM issue is fixed.

I suspect we don't (usually) have a problem with EIEM since all
interrupts are going to CPU 0 (aka Monarch) and nearly all driver
initialization takes place before the system is multithreaded.
The only other possibility is that processes only run on CPU 0,
i.e. when a device driver is loaded later, it always gets initialized
on the Monarch. This scenario would also match the "top" output, where
a 2-way system is always 50% idle and a 4-way is 75% idle.
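The arithmetic behind those numbers: if only one of N CPUs ever runs
anything, top can report at best (N-1)/N idle, which is 1/2 for a 2-way
and 3/4 for a 4-way. A tiny helper (min_idle_fraction is a hypothetical
name) makes that explicit:

```c
/* If only one of ncpus CPUs ever runs anything, and it is fully busy,
 * the lowest overall idle fraction top can show is (ncpus - 1) / ncpus. */
static double min_idle_fraction(int ncpus)
{
    return (double)(ncpus - 1) / ncpus;
}
```

So a 2-way stuck at 50% idle or a 4-way stuck at 75% idle under load is
exactly what one saturated CPU looks like.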

I'd like to learn some way of seeing which CPU is running which
processes. top doesn't seem to indicate that. I'll look at the sysstat
package later.
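One option that doesn't need sysstat: /proc/<pid>/stat carries a
"processor" field (field 39, documented in proc(5) as present since Linux
2.2.8) giving the CPU the task last ran on; procps' "ps -o psr" reports
the same thing. The sketch below just parses that field; last_cpu is a
hypothetical helper name, and it assumes a /proc mounted in the usual
place.

```c
#define _POSIX_C_SOURCE 200809L /* for strtok_r */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Return the CPU a task last ran on (field 39 of /proc/<pid>/stat),
 * or -1 if the file can't be read or parsed. pid may be "self". */
static int last_cpu(const char *pid)
{
    char path[64], buf[4096];
    snprintf(path, sizeof path, "/proc/%s/stat", pid);

    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    size_t n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';

    /* Field 2 (comm) is parenthesized and may contain spaces,
     * so start tokenizing after the last ')'. */
    char *p = strrchr(buf, ')');
    if (!p)
        return -1;

    int field = 2; /* the next token is field 3 (state) */
    char *save = NULL;
    for (char *tok = strtok_r(p + 1, " \n", &save); tok;
         tok = strtok_r(NULL, " \n", &save)) {
        if (++field == 39)
            return atoi(tok);
    }
    return -1; /* older kernel without the processor field */
}
```

Calling last_cpu() for each PID under /proc in a loop while apt is
running would show whether everything really sits on CPU 0.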

grant