[parisc-linux] Re:[parisc-linux-cvs] linux-2.6 kyle
Joel Soete
soete.joel at scarlet.be
Wed Aug 16 10:18:31 MDT 2006
> > On Mon, 2006-08-07 at 19:04 +0200, Joel Soete wrote:
> > > > On Mon, 2006-08-07 at 12:32 +0200, Joel Soete wrote:
> > > > > Oops my bad: I forgot the raw data, here there are
> > > >
> > > > Thanks, that's still not quite enough to say what they were doing at the
> > > > time.
> > > >
> > > > cpu0 ioaq is 10104c90 10104c94
> > > > cpu1 ioaq is 1024a1a4 1024a18c
> > > >
> > > > Could you do me the disassembly from the entry symbol down into each of
> > > > these so I can try to figure out where they're stuck? (Your prior
> > > > disassembly ends just a bit short).
> > > >
> > > No pb:
> > > CPU#0 is runing:
> > > as from System.map:
> >
> > Oh, that's actually not quite what I thought it was doing. Both CPU0
> > and 1 are spinning trying to acquire the rw->lock. That means something
> > else has it held for write. How many CPUs does this system have?
> This is a D380 2-way pa8000:
> processor : 0
> cpu family : PA-RISC 2.0
> cpu : PA8000 (PCX-U)
> cpu MHz : 180.000000
> model : 9000/820/D380
> model name : UL proc 2-way U180 1M/1M
> hversion : 0x00005aa0
> sversion : 0x00000491
> I-cache : 1024 KB
> D-cache : 1024 KB (WB, direct-mapped)
> ITLB entries : 96
> DTLB entries : 96 - shared with ITLB
> BTLB : not supported
> bogomips : 359.62
> software id : 2005347373
>
> with 256Mb of ram and 512Mb of swap.
>
> > And what were the rest of them doing?
> >
> It was runing a kernel 2.6.18-rc3-pa5 32bit smp (scsi drives ncr53c720 doesn't
> yet allow me to try 64bit) and the most its job at the hang moment was a
> kernel's compile loop:
> #while true ; do nice -n -4 make -j 2 clean ; nice -n -4 make -j 2 oldconfig;
> nice -n -4 make -j 2 vmlinux 2>&1 | tee -a /var/logs/k-loop ; done
>
> (A top showing me well at this moment the 2 cpu busy by cc1)
>
> (and no stree loop on fs).
>
On this same system: another test, another hang with same console messages:
BUG: soft lockup detected on CPU#1!
Backtrace:
[<101557b0>] softlockup_tick+0xd4/0x148
[<1013b88c>] update_process_times+0x3c/0x88
[<101091c0>] timer_interrupt+0xd0/0x1a8
[<10155c0c>] handle_IRQ_event+0x5c/0xa4
[<10155cec>] __do_IRQ+0x98/0x1d0
[<1010993c>] do_cpu_irq_mask+0xdc/0x194
[<1010c068>] intr_return+0x0/0x1c
BUG: soft lockup detected on CPU#0!
Backtrace:
[<101557b0>] softlockup_tick+0xd4/0x148
[<1013b88c>] update_process_times+0x3c/0x88
[<101091c0>] timer_interrupt+0xd0/0x1a8
[<10155c0c>] handle_IRQ_event+0x5c/0xa4
[<10155cec>] __do_IRQ+0x98/0x1d0
[<1010993c>] do_cpu_irq_mask+0xdc/0x194
[<1010c068>] intr_return+0x0/0x1c
the top showing:
top - 12:04:53 up 2 days, 22:52, 3 users, load average: 2.44, 2.43, 2.37
Tasks: 107 total, 3 running, 104 sleeping, 0 stopped, 0 zombie
Cpu0 : 25.9%us, 74.1%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu1 : 78.0%us, 15.6%sy, 0.0%ni, 0.7%id, 0.0%wa, 0.0%hi, 5.7%si, 0.0%st
Mem: 255124k total, 186492k used, 68632k free, 73616k buffers
Swap: 517480k total, 0k used, 517480k free, 47896k cached
PID PPID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ P WCHAN
COMMAND
13285 13284 root 21 -4 11336 7280 1992 R 80 2.9 0:03.33 1 lock_text cc1
2146 1 gkrellmd 15 0 5388 1500 1108 S 16 0.6 444:44.26 1 select
gkrellmd
3311 3266 root 16 0 2972 1404 1040 R 9 0.6 479:27.73 0 read top
3213 1 root 15 0 3924 1456 748 S 1 0.6 25:25.27 1 select
screen
13277 12826 root 15 -4 2940 1348 1056 S 1 0.5 0:00.20 1 wait sh
25814 25805 jso 15 0 9928 1832 1268 S 1 0.7 16:23.23 0 select
sshd
1 0 root 15 0 2380 828 680 S 0 0.3 0:32.94 1 select
init
(sorry I missed the 3rd running process ;_( )
but the toc is showing something a bit different:
----------------- Processor 0 TOC Information -------------------
General Registers 0 - 31
00-03 0000000000000000 0000000010247000 000000001013e694 000000001049ee80
04-07 000000000000000e 0000000010094ca0 000000001049e810 0000000000000001
08-11 000000001f810038 0000000010444280 00000000104cf9c0 000000001049e810
12-15 000000001049e810 000000000002ed58 000000000002e558 000000000001cbfc
16-19 000000001c830440 0000000000030d58 000000000002e558 00000000deaf1eed
20-23 0000000000000000 0000000000000000 0000000010133a90 00000000bb20f6f9
24-27 000000001f810038 0000000000000001 000000001049ee80 0000000010445010
28-31 0000000000000000 00000000104cf810 000000001c830980 000000001014ca1c
Control Registers 0 - 31
00-03 0000000000000000 0000000000000000 0000000000000000 0000000000000000
04-07 0000000000000000 0000000000000000 0000000000000000 0000000000000000
08-11 0000000000002112 0000000000000000 00000000000000c0 0000000000000017
12-15 0000000000000000 0000000000000000 000000000010b000 00000000ff000000
16-19 00002a121edeb615 0000000000000000 0000000010247518 000000000c68109c
20-23 0000000004240041 000000002789ee84 00000000002ceb0f 0000000080000000
24-27 00000000004a3000 000000000c4de000 00000000aaaaaaaa 00000000aaaaaaaa
28-31 00000000000003d0 0000000011111111 000000001c830000 00000000104f8000
Space Registers 0 - 7
00-03 00000000 00000000 00000000 00001089
04-07 00000000 00000000 00000000 00000000
IIA Space = 0x0000000000000000
IIA Offset = 0x0000000010247508
CPU State = 0x9e000001
----------------- Processor 1 TOC Information -------------------
General Registers 0 - 31
00-03 0000000000000000 0000000010247000 000000001013e694 000000001049ee80
04-07 000000000000000e 000000001009cca0 000000001049e810 0000000000000001
08-11 000000001e496938 0000000010444280 00000000104cf9c0 000000001049e810
12-15 00000000ffffffff 000000001049e810 0000000000000000 0000000000054ec0
16-19 000000001d7d8200 0000000000000000 000000001d7d8108 00000000deaf1eed
20-23 0000000000000001 0000000000000000 0000000010133a90 00000000bb20f6f9
24-27 000000001e496938 0000000000000001 000000001049ee80 0000000010445010
28-31 0000000000000000 00000000104cf810 000000001d7d8740 000000001014ca1c
Control Registers 0 - 31
00-03 0000000000000000 0000000000000000 0000000000000000 0000000000000000
04-07 0000000000000000 0000000000000000 0000000000000000 0000000000000000
08-11 0000000000000f2a 0000000000000000 00000000000000c0 000000000000001f
12-15 0000000000000000 0000000000000000 000000000010b000 00000000ff000000
16-19 00002a126188a653 0000000000000000 0000000010247518 000000000c68109c
20-23 0000000000000000 0000000000000000 00000000002cef0f 0000000080000000
24-27 00000000004a3000 000000000da30000 00000000aaaaaaaa 00000000400a04a0
28-31 00000000aaaaaaaa 0000000011111111 000000001d7d8000 0000000011111111
Space Registers 0 - 7
00-03 00000000 00000795 00000000 00000795
04-07 00000000 00000000 00000000 00000000
IIA Space = 0x0000000000000000
IIA Offset = 0x0000000010247508
CPU State = 0x9e000001
and the analysis give me:
----------------- Processor 0 TOC Information -------------------
GR of CPU[0]
00-03 0000000000000000 0000000010247000 000000001013e694 000000001049ee80
04-07 000000000000000e 0000000010094ca0 000000001049e810 0000000000000001
08-11 000000001f810038 0000000010444280 00000000104cf9c0 000000001049e810
12-15 000000001049e810 000000000002ed58 000000000002e558 000000000001cbfc
16-19 000000001c830440 0000000000030d58 000000000002e558 00000000deaf1eed
20-23 0000000000000000 0000000000000000 0000000010133a90 00000000bb20f6f9
24-27 000000001f810038 0000000000000001 000000001049ee80 0000000010445010
28-31 0000000000000000 00000000104cf810 000000001c830980 000000001014ca1c
GR[02] == rp = 000000001013e694
Func: send_group_sig_info, Off: 0x30, Addr: 0x1013e694
1013e690: 08 19 02 47 copy r25,r7
1013e694: 08 07 02 59 copy r7,r25
1013e698: 08 04 02 5a copy r4,r26
1013e69c: eb ff be 25 b,l 1013e5b4 <group_send_sig_info>,rp
GR[22] == t1(32bits) == arg4(64bits) = 0000000010133a90
GR[21] == t2(32bits) == arg5(64bits) = 0000000000000000
GR[20] == t3(32bits) == arg6(64bits) = 0000000000000000
GR[19] == t4(32bits) == arg7(64bits) = 00000000deaf1eed
GR[26] == arg0 = 000000001049ee80
GR[25] == arg1 = 0000000000000001
GR[24] == arg2 = 000000001f810038
GR[23] == arg3 = 00000000bb20f6f9
GR[27] == dp = 0000000010445010
Func: $global$, Off: 0x0, Addr: 0x10445010
GR[28] == ret0 = 0000000000000000
GR[29] == ret1 or sl = 00000000104cf810
GR[30] == sp = 000000001c830980
GR[31] == ble rp = 000000001014ca1c
Func: hrtimer_run_queues, Off: 0x18c, Addr: 0x1014ca1c
1014ca10: 08 04 02 56 copy r4,r22
1014ca14: e6 c0 20 00 be,l 0(sr4,r22),sr0,r31
1014ca18: 08 1f 02 42 copy r31,rp
1014ca1c: 08 06 02 5a copy r6,r26
CR of CPU[0]
00-03 0000000000000000 0000000000000000 0000000000000000 0000000000000000
04-07 0000000000000000 0000000000000000 0000000000000000 0000000000000000
08-11 0000000000002112 0000000000000000 00000000000000c0 0000000000000017
12-15 0000000000000000 0000000000000000 000000000010b000 00000000ff000000
16-19 00002a121edeb615 0000000000000000 0000000010247518 000000000c68109c
20-23 0000000004240041 000000002789ee84 00000000002ceb0f 0000000080000000
24-27 00000000004a3000 000000000c4de000 00000000aaaaaaaa 00000000aaaaaaaa
28-31 00000000000003d0 0000000011111111 000000001c830000 00000000104f8000
CR[00] == rctr = 0000000000000000
CR[08] == (Protection ID) pidr1 = 0000000000002112
CR[10] == ccr = 00000000000000c0
CR[11] == sar = 0000000000000017
CR[14] == iva = 000000000010b000
CR[15] == eiem = 00000000ff000000
CR[16] == itmr = 00002a121edeb615
CR[17] == pcsq = 0000000000000000
CR[18] == pcoq = 0000000010247518
CR[19] == iir = 000000000c68109c
CR[20] == isr = 0000000004240041
CR[21] == ior = 000000002789ee84
CR[22] == ipsw = 00000000002ceb0f
CR[23] == eirw = 0000000080000000
CR[24] == tr0 (ptov) = 00000000004a3000
CR[25] == tr1 (vtop) = 000000000c4de000
CR[26] == tr2 = 00000000aaaaaaaa
CR[27] == tr3 = 00000000aaaaaaaa
CR[28] == tr4 = 00000000000003d0
CR[29] == tr5 = 0000000011111111
CR[30] == tr6 = 000000001c830000
CR[31] == tr7 = 00000000104f8000
SR of CPU[0]
00-03 00000000 00000000 00000000 00001089
04-07 00000000 00000000 00000000 00000000
SR[0] = 00000000
SR[1] = 00000000
SR[2] = 00000000
SR[3] = 00001089
SR[4] = 00000000
SR[5] = 00000000
SR[6] = 00000000
SR[7] = 00000000
Need much more work !!!
SR[00] == ts0 = 00000000
SR[01] == ts1 = 00000000
SR[03] == cpp = 00001089
Not parsable address!
...
IIA Offset = 0x0000000010247508
...
e.g. IAOQ = 0x0000000010247508
Parse IAOQ = 0x0000000010247508 for CPU[0]
Func: _raw_read_lock, Off: 0x24, Addr: 0x10247508
10247500: 0c 60 15 dc ldcw,co 0(r3),ret0
10247504: 8f 80 20 1a cmpib,<>,n 0,ret0,10247518 <_raw_read_lock+0x34>
10247508: 0c 60 10 9c ldw 0(r3),ret0
1024750c: 8f 80 3f dd cmpib,<> 0,ret0,10247500 <_raw_read_lock+0x1c>
...
----------------- Processor 1 TOC Information -------------------
GR of CPU[1]
00-03 0000000000000000 0000000010247000 000000001013e694 000000001049ee80
04-07 000000000000000e 000000001009cca0 000000001049e810 0000000000000001
08-11 000000001e496938 0000000010444280 00000000104cf9c0 000000001049e810
12-15 00000000ffffffff 000000001049e810 0000000000000000 0000000000054ec0
16-19 000000001d7d8200 0000000000000000 000000001d7d8108 00000000deaf1eed
20-23 0000000000000001 0000000000000000 0000000010133a90 00000000bb20f6f9
24-27 000000001e496938 0000000000000001 000000001049ee80 0000000010445010
28-31 0000000000000000 00000000104cf810 000000001d7d8740 000000001014ca1c
GR[02] == rp = 000000001013e694
Func: send_group_sig_info, Off: 0x30, Addr: 0x1013e694
1013e690: 08 19 02 47 copy r25,r7
1013e694: 08 07 02 59 copy r7,r25
1013e698: 08 04 02 5a copy r4,r26
1013e69c: eb ff be 25 b,l 1013e5b4 <group_send_sig_info>,rp
GR[22] == t1(32bits) == arg4(64bits) = 0000000010133a90
GR[21] == t2(32bits) == arg5(64bits) = 0000000000000000
GR[20] == t3(32bits) == arg6(64bits) = 0000000000000001
GR[19] == t4(32bits) == arg7(64bits) = 00000000deaf1eed
GR[26] == arg0 = 000000001049ee80
GR[25] == arg1 = 0000000000000001
GR[24] == arg2 = 000000001e496938
GR[23] == arg3 = 00000000bb20f6f9
GR[27] == dp = 0000000010445010
Func: $global$, Off: 0x0, Addr: 0x10445010
GR[28] == ret0 = 0000000000000000
GR[29] == ret1 or sl = 00000000104cf810
GR[30] == sp = 000000001d7d8740
GR[31] == ble rp = 000000001014ca1c
Func: hrtimer_run_queues, Off: 0x18c, Addr: 0x1014ca1c
1014ca10: 08 04 02 56 copy r4,r22
1014ca14: e6 c0 20 00 be,l 0(sr4,r22),sr0,r31
1014ca18: 08 1f 02 42 copy r31,rp
1014ca1c: 08 06 02 5a copy r6,r26
CR of CPU[1]
00-03 0000000000000000 0000000000000000 0000000000000000 0000000000000000
04-07 0000000000000000 0000000000000000 0000000000000000 0000000000000000
08-11 0000000000000f2a 0000000000000000 00000000000000c0 000000000000001f
12-15 0000000000000000 0000000000000000 000000000010b000 00000000ff000000
16-19 00002a126188a653 0000000000000000 0000000010247518 000000000c68109c
20-23 0000000000000000 0000000000000000 00000000002cef0f 0000000080000000
24-27 00000000004a3000 000000000da30000 00000000aaaaaaaa 00000000400a04a0
28-31 00000000aaaaaaaa 0000000011111111 000000001d7d8000 0000000011111111
CR[00] == rctr = 0000000000000000
CR[08] == (Protection ID) pidr1 = 0000000000000f2a
CR[10] == ccr = 00000000000000c0
CR[11] == sar = 000000000000001f
CR[14] == iva = 000000000010b000
CR[15] == eiem = 00000000ff000000
CR[16] == itmr = 00002a126188a653
CR[17] == pcsq = 0000000000000000
CR[18] == pcoq = 0000000010247518
CR[19] == iir = 000000000c68109c
CR[20] == isr = 0000000000000000
CR[21] == ior = 0000000000000000
CR[22] == ipsw = 00000000002cef0f
CR[23] == eirw = 0000000080000000
CR[24] == tr0 (ptov) = 00000000004a3000
CR[25] == tr1 (vtop) = 000000000da30000
CR[26] == tr2 = 00000000aaaaaaaa
CR[27] == tr3 = 00000000400a04a0
CR[28] == tr4 = 00000000aaaaaaaa
CR[29] == tr5 = 0000000011111111
CR[30] == tr6 = 000000001d7d8000
CR[31] == tr7 = 0000000011111111
SR of CPU[1]
00-03 00000000 00000795 00000000 00000795
04-07 00000000 00000000 00000000 00000000
SR[0] = 00000000
SR[1] = 00000795
SR[2] = 00000000
SR[3] = 00000795
SR[4] = 00000000
SR[5] = 00000000
SR[6] = 00000000
SR[7] = 00000000
Need much more work !!!
SR[00] == ts0 = 00000000
SR[01] == ts1 = 00000795
SR[03] == cpp = 00000795
Not parsable address!
...
IIA Offset = 0x0000000010247508
...
e.g. IAOQ = 0x0000000010247508
Parse IAOQ = 0x0000000010247508 for CPU[1]
Func: _raw_read_lock, Off: 0x24, Addr: 0x10247508
10247500: 0c 60 15 dc ldcw,co 0(r3),ret0
10247504: 8f 80 20 1a cmpib,<>,n 0,ret0,10247518 <_raw_read_lock+0x34>
10247508: 0c 60 10 9c ldw 0(r3),ret0
1024750c: 8f 80 3f dd cmpib,<> 0,ret0,10247500 <_raw_read_lock+0x1c>
the _raw_read_lock() is pre-compile as:
void _raw_read_lock(rwlock_t *lock)
{
if (__builtin_expect(!!(lock->magic != 0xdeaf1eed), 0)) rwlock_bug(lock, "bad
magic");
__raw_read_lock(&lock->raw_lock);
}
and assembly as:
.section .text._raw_read_lock,"ax", at progbits
.align 4
.globl _raw_read_lock
.type _raw_read_lock, @function
_raw_read_lock:
.PROC
.CALLINFO FRAME=64,CALLS,SAVE_RP,ENTRY_GR=3
.ENTRY
stw %r2,-20(%r30)
ldil L'-558956544,%r28
ldo 7917(%r28),%r28
stw,ma %r3,64(%r30)
ldw 8(%r26),%r19
cmpb,<> %r28,%r19,.L84
copy %r26,%r3
.L68:
.L82:
#APP
ldcw,co 0(%r3),%r28
#NO_APP
cmpib,<>,n 0,%r28,.L85
.L83:
ldw 0(%r3),%r28
cmpib,<> 0,%r28,.L82
nop
b,n .L83
.L85:
ldw 4(%r3),%r28
ldo 1(%r28),%r28
stw %r28,4(%r3)
ldi 1,%r28
stw %r28,0(%r3)
ldw -84(%r30),%r2
bve (%r2)
ldw,mb -64(%r30),%r3
.L84:
ldil LR'.LC1,%r25
b,l rwlock_bug,%r2
ldo RR'.LC1(%r25),%r25
b,n .L68
.EXIT
.PROCEND
.size _raw_read_lock, .-_raw_read_lock
So here if i well understand the 2 cpu are spining simultaneously in the same
loop:
.L83:
ldw 0(%r3),%r28
cmpib,<> 0,%r28,.L82
nop
b,n .L83
Let me know if you need more info.
TIA,
Joel
----------
Club Scarlet : Tout le monde gagne! Si vous devenez aujourd'hui Scarlet One grace a un client existant de Scarlet, vous recevez tous les deux un cadeau d'une valeur de 50 euros! Surfez vite sur http://www.clubscarlet.be
More information about the parisc-linux
mailing list