[parisc-linux] 2.4.18-pa35 SMP process hangs on a J200

Ryan Bradetich rbradetich@uswest.net
10 Jun 2002 23:36:46 -0600


Hello parisc-linux hackers,


I have heard of SMP hangs on the A500 from ESIEE and on Grant's system,
so I installed 2.4.18-pa35 on a dual-processor J200 to try to
duplicate the hangs and hopefully provide a different perspective on
the problem.

The good and bad news is that I can duplicate the process hangs on the 
J200 by simply running two instances of the setiathome program.

Digging into the system a bit, here is what I found:

	* The setiathome process that hung (PID 326) will hang any other
	  process that tries to access /proc/326/*.  (This is why top,
	  ps, etc. all hang after the process gets stuck.)

	* None of the other processes appear to be stuck (i.e. I can
	  access their /proc/PID/* information and the command will
	  return).

	* The processes that hang while trying to read from the stuck 
	  process go into a disk sleep and never return.

		This is the status of a "hung" process that accessed the
		stuck process:

		# cat status 
		Name:	ps
		State:	D (disk sleep)
		Tgid:	1362
		Pid:	1362
		PPid:	1359
		TracerPid:	0
		Uid:	0	0	0	0
		Gid:	0	0	0	0
		FDSize:	256
		Groups:	0 
		VmSize:	    3032 kB
		VmLck:	       0 kB
		VmRSS:	     880 kB
		VmData:	    1136 kB
		VmStk:	       0 kB
		VmExe:	      68 kB
		VmLib:	    1420 kB
		SigPnd:	0000000000000000
		SigBlk:	0000000000000000
		SigIgn:	8000000000000000
		SigCgt:	000000007f2ffef9
		CapInh:	0000000000000000
		CapPrm:	00000000fffffeff
		CapEff:	00000000fffffeff
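
For anyone who wants to check for this state from a script, here is a
minimal sketch of picking the State field out of status text like the
above.  The function names are hypothetical, and it runs on a pasted
sample rather than on a live /proc entry, since reading /proc/PID/* of
the stuck process itself would hang the reader as described above.

```python
def parse_status(text):
    """Parse the text of a /proc/PID/status file into a dict of strings."""
    fields = {}
    for line in text.splitlines():
        # Each line looks like "Name:<tab>value"; skip anything else.
        name, sep, value = line.strip().partition(":")
        if sep:
            fields[name.strip()] = value.strip()
    return fields


def is_uninterruptible(fields):
    """True if the State field reports D (disk sleep)."""
    return fields.get("State", "").startswith("D")


# Sample taken from the status dump above (shortened).
SAMPLE = "Name:\tps\nState:\tD (disk sleep)\nPid:\t1362\nPPid:\t1359\n"

fields = parse_status(SAMPLE)
print(fields["Name"], fields["Pid"], is_uninterruptible(fields))
```

This only reports that a process is in disk sleep; it says nothing
about which lock or resource the process is waiting on.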


It appears that the stuck process also affected my serial console login,
so I am not able to gather more information using the magic-sysrq
commands.  I will try to reboot the system, and see if I can keep a
console up and use the magic-sysrq commands next time the process gets
stuck.


Is this the same behavior people are seeing on the SMP A500s?  Any
ideas on where to continue debugging this?  (Is it possibly a deadlock?
Can I see what locks are being held by the stuck process?)  I will
continue to poke around and see what I can find as well.

Thanks,

- Ryan