[parisc-linux] N Class SMP pb ? (follow up)

Grant Grundler grundler@parisc-linux.org
Thu, 25 Sep 2003 17:35:00 -0600


On Thu, Sep 25, 2003 at 04:56:26PM +0200, Joel Soete wrote:
...
> As already mentioned in previous mail that I could read many 6, 15 (but
> it seems to be normal in UP kernel those interruption occurs)


Yes - 6 is ITLB miss and 15 is Data TLB miss.

> but (most interesting) it is the very first time that I got
> the message making failed the kernel:
> [...]
> handle_interruption(26, ...).

26 is "Data Memory Access rights Trap".
This sounds normal for Copy-On-Write.
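
For reference, the three interruption numbers that came up in this thread
can be decoded with a small lookup along these lines. This is only a
sketch: interruption_name() is a made-up helper for illustration, not a
kernel function, and it covers just the codes named above.

```c
#include <stddef.h>

/*
 * Hypothetical helper: map the PA-RISC interruption numbers discussed
 * in this thread to their names. Anything else returns NULL.
 */
static const char *interruption_name(int code)
{
    switch (code) {
    case 6:
        return "Instruction TLB miss";
    case 15:
        return "Data TLB miss";
    case 26:
        return "Data memory access rights trap";
    default:
        return NULL; /* not covered by this sketch */
    }
}
```

Seeing 6 and 15 scroll by is just normal TLB miss handling; 26 is the one
to look at, and even that one is expected on a Copy-On-Write fault.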

> SMP CALL FUNCTION TIMED OUT (CPU=1)

The IPI handler will time out if the other CPU doesn't ack
the function call within a second. This is bad.
It means either the other CPU never got the interrupt (locked up
with I-bit off) or the "unstarted_count" isn't coherent
between the CPUs.
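
Roughly, the wait looks like the user-space sketch below. This is not the
kernel source: struct call_data, wait_for_acks(), tick_before(), the
fake_now() tick source and the HZ value of 100 are all assumptions made so
the logic can be compiled and run outside the kernel. The only names taken
from this thread are "unstarted_count" and the "jiffies + HZ" deadline.

```c
#include <errno.h>
#include <stdatomic.h>

#define HZ 100UL /* assumed tick rate, for illustration only */

/* Wraparound-safe "a is earlier than b", in the style of time_before(). */
static int tick_before(unsigned long a, unsigned long b)
{
    return (long)(a - b) < 0;
}

struct call_data {
    atomic_int unstarted_count; /* CPUs that have not acked yet */
};

/* Tick source is passed in so the loop can run without real jiffies. */
typedef unsigned long (*tick_fn)(void);

/* Fake tick source: advances by one tick every time it is read. */
static unsigned long fake_ticks;
static unsigned long fake_now(void)
{
    return fake_ticks++;
}

/*
 * Spin until every CPU has acked or one second (HZ ticks) has passed.
 * Returns 0 on success, -ETIMEDOUT if some CPU never acked.
 */
static int wait_for_acks(struct call_data *data, tick_fn now)
{
    unsigned long timeout = now() + HZ;

    while (atomic_load(&data->unstarted_count) > 0 &&
           tick_before(now(), timeout))
        ; /* busy-wait */

    return atomic_load(&data->unstarted_count) > 0 ? -ETIMEDOUT : 0;
}
```

Note that both reads of the tick counter happen on the CPU doing the
waiting. The loop can only fall through to the timeout if the count it
sees never reaches zero, which is exactly the two cases above: the other
CPU never ran the handler, or its decrement never became visible here.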

> handle_interruption(26, ...).
>
> Could this be a pb with sync between cpu time ref?
> (because timeout = jiffies + HZ)

I don't think so, since jiffies is a global.
And it's always measured on the same CPU.

> I have also a look for where this function is called but never see its return
> code tested to launch a 'stack dump' and a stop of system?

You need to find out who is using smp_call_function() and which function
they are trying to invoke. I suspect it's coming from mm/slab.c but
I wouldn't know which of the three it might be.

grant