[parisc-linux] N Class SMP pb ? (follow up)

Grant Grundler grundler@parisc-linux.org
Fri, 26 Sep 2003 10:50:45 -0600


On Fri, Sep 26, 2003 at 05:46:35PM +0200, Joel Soete wrote:
> >It means either the other CPU never got the interrupt (locked up
> >with I-bit off) or the "unstarted_count" isn't coherent between the CPUs.
> 
> hmm how could I verify this hypothesis?

TOC the machine, run "ser pim", and look at the PSW in the TOC info for
each CPU. Bit 0 is the I-bit, IIRC.
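
A minimal sketch of that check, for decoding a PSW value copied by hand
from the PIM dump. It assumes "bit 0" means the least significant bit of
the PSW word (mask 0x1); PA-RISC manuals number bits from the most
significant end, so verify the mask before trusting the result. The names
PSW_I_MASK and psw_interrupts_enabled() are hypothetical.

```c
#include <stdint.h>

/* Assumed mask: I-bit taken as the least significant bit of the PSW. */
#define PSW_I_MASK 0x00000001u

/* Returns 1 if the I-bit is set (external interrupts enabled), else 0. */
static int psw_interrupts_enabled(uint64_t psw)
{
    return (psw & PSW_I_MASK) != 0;
}
```

A CPU whose PSW decodes to 0 here was running with interrupts masked when
the TOC hit, which would fit the "locked up with I-bit off" case.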

On second thought, I doubt unstarted_count is incoherent,
since it's a kernel global as well (like jiffies).

> >You need to find out who is using smp_call_function() and which function
> >they are trying to invoke. I suspect it's coming from mm/slab.c but
> >wouldn't know which of the three it might be.
> 
> Indeed, I can't find any other place where it is called, so I finally
> added a printk to each function calling smp_call_function_all_cpus().
> 
> That let me see several calls to kmem_tune_cpucache() (7 exactly, and no
> others), but I no longer get any 'SMP CALL FUNCTION TIMED OUT (CPU=1)'
> :(
> (I presume that, as before, the system crashes before it has a chance
> to flush its buffer?)
> 
> What do you think?

Could be.
Add mdelay(100) (or higher) after the lines of output you've added.
This works if it's a functional problem that's not timing dependent.
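
A rough sketch of what that could look like. trace_smp_caller() is a
hypothetical helper, not something in the tree; in the kernel it would use
the real printk() and mdelay() from <linux/kernel.h> and <linux/delay.h>.
The #else branch only provides userspace stand-ins so the sketch can be
compiled and tried outside the kernel.

```c
#ifdef __KERNEL__
#include <linux/kernel.h>
#include <linux/delay.h>
#else
/* Userspace stand-ins (assumptions, for trying the sketch only). */
#include <stdio.h>
#include <time.h>
#define KERN_DEBUG ""
#define printk printf
/* Busy-wait for roughly ms milliseconds of CPU time, like mdelay(). */
static void mdelay(unsigned int ms)
{
    clock_t end = clock() + (clock_t)ms * CLOCKS_PER_SEC / 1000;
    while (clock() < end)
        ;
}
#endif

/* Hypothetical helper: log the caller, then stall so the message has
 * time to reach the console before a possible lockup. */
static void trace_smp_caller(const char *who)
{
    printk(KERN_DEBUG "smp_call_function from %s\n", who);
    mdelay(100);
}
```

Calling it as trace_smp_caller("kmem_tune_cpucache") at each call site
gives the console time to drain, at the cost of changing the timing.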

Otherwise, set up kernel crash dump and use the tools from bruno/phi to view
the contents of the kernel message buffer.

grant