[parisc-linux] do_page_fault() infinite loop running 2.4.20-pa18 #9 SMP

John David Anglin dave@hiauly1.hia.nrc.ca
Sun, 5 Jan 2003 01:16:06 -0500 (EST)


> On Sat, Jan 04, 2003 at 02:38:15PM -0500, John David Anglin wrote:
> > This has been around for a while.  When using an SMP configuration, the
> > program expect "causes" a segmentation fault that results in do_page_fault()
> > going into an infinite loop.  The log data repeats indefinitely and
> > eventually fills /var.  For some reason, expect is not killed by the kernel
> > when this happens, although the loop can be broken by manually killing it.
> 
> This on gsyprf11? (running SMP 2.4.20-pa13 on a500-65)

We were running 2.4.20-pa18 earlier today.  I rebooted to see if
that would help and SMP 2.4.20-pa13 came up.  I think the sample
fault below was on 2.4.20-pa18.

> I'm hoping this is unrelated to my entry.S changes.

Possibly the entry.S changes are involved.  The IAOQ below points to an address in
the dynamic loader or a shared library.  I tried building a static
version of expect to see if I could locate which code was causing
the problem but it didn't work at all.  It caused page faults in
what was possibly a syscall.  The return pointer was still above
0x40000000.
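
For reading the IAOQ values in the dump below: on PA-RISC the two
low-order bits of each IAOQ entry hold the privilege level, so the
instruction address is the value with those bits masked off.  A minimal
Python sketch using the two values from the dump; the helper name
split_iaoq and the use of 0x40000000 as the boundary above which shared
objects are mapped are assumptions for illustration only:

```python
# Hypothetical helper for reading the dump; not part of any kernel tool.
SHLIB_BASE = 0x40000000  # assumed lower bound for mapped shared objects


def split_iaoq(value):
    """Return (instruction address, privilege level) for one IAOQ entry."""
    return value & ~0x3, value & 0x3


# The two IAOQ entries from the register dump (front and back of the queue).
for raw in (0x4025B45F, 0x4025B463):
    addr, priv = split_iaoq(raw)
    print(f"addr={addr:#010x} priv={priv} above_base={addr >= SHLIB_BASE}")
```

Both entries decode to privilege level 3 (user mode) at consecutive
addresses 0x4025b45c and 0x4025b460, above the assumed boundary.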

> But it certainly sounds like that kind of problem.
> 
> In -pa12, Randolph and I fixed:
> | revision 1.98
> | date: 2002/12/09 06:09:08;  author: tausq;  state: Exp;  lines: +2 -2
> | -pa12
> | fix interruption return path so that it will process signals after
> | handle_interruption()
> | (thanks to Grant for pointing this out)
> 
> Since I broke this with -pa11, maybe the rebuild of -pa13 picked
> up the old -pa11 entry.o?

Don't know.  However, I haven't seen the hang during gcc's configure
process.  That's where I first noticed the page fault problem that
you and Randolph fixed above.

> I'll rebuild from scratch to rule this out and reboot gsyprf11.
> 
> Perhaps a user space signal handler is interfering?
> 
> BTW, appended is one "expect" segfault info from dmesg output.
> Dmesg output is filled with the same PID and AFAICT the register dumps
> look identical too. "infinite" is about right.
> 
> grant
> 
> do_page_fault() pid=28552 command='expect' type=15 address=0x00000014
> 
>      YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI
> PSW: 00000000000001001111111100001111 Not tainted
> r00-03  0000000000000000 fffffffffffffffa 00000000403309c4 00000000403309d4
> r04-07  0000000040330970 000000004032ea28 0000000000000063 000000004032ea28
> r08-11  0000000000021110 0000000000205ff4 0000000000000006 0000000000003b1b
> r12-15  0000000000000001 0000000000000000 0000000000207d40 0000000000000001
> r16-19  0000000000000000 0000000000000001 0000000000000000 000000004032ea28
> r20-23  000000000000000b 000000000000000c 0000000000205628 00000000002055f8
> r24-27  0000000000000030 0000000000000000 0000000040330970 0000000000020d44
> r28-31  0000000000000002 00000000403309e8 00000000faf05a40 0000000000000000
> sr0-3   000000000037b780 000000000037b780 0000000000000000 000000000037b780
> sr4-7   000000000037b780 000000000037b780 000000000037b780 000000000037b780
> 
> IASQ: 000000000037b780 000000000037b780 IAOQ: 000000004025b45f 000000004025b463
> IIR: 0eb41290    ISR: 000000000037b780  IOR: 0000000000000014
> CPU:        1   CR30: 0000000030754000 CR31: 0000000000008020
> ORIG_R28: 0000000000000002
> 
> 
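
The letter row printed above the PSW value labels one bit per column,
so the set bits can be read off by pairing the two strings.  A rough
sketch, with both strings copied from the dump above; the function name
set_flags is hypothetical:

```python
# Strings copied from the dump; each label character sits over one PSW bit.
LABELS = "YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI"
PSW = "00000000000001001111111100001111"


def set_flags(labels, bits):
    """Return (bit index, label) for every bit that is set."""
    return [(i, name)
            for i, (name, bit) in enumerate(zip(labels, bits))
            if bit == "1"]


flags = set_flags(LABELS, PSW)
print("".join(name for _, name in flags))
```

For this dump the set bits are C, the eight carry/borrow bits, and
Q, P, D and I.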

Dave
-- 
J. David Anglin                                  dave.anglin@nrc.ca
National Research Council of Canada              (613) 990-0752 (FAX: 952-6605)