[parisc-linux] Re:[parisc-linux-cvs] linux-2.6 kyle

Joel Soete soete.joel at scarlet.be
Wed Aug 16 10:18:31 MDT 2006


> > On Mon, 2006-08-07 at 19:04 +0200, Joel Soete wrote:
> > > > On Mon, 2006-08-07 at 12:32 +0200, Joel Soete wrote:
> > > > > Oops my bad: I forgot the raw data, here there are
> > > > 
> > > > Thanks, that's still not quite enough to say what they were doing at the
> > > > time.
> > > > 
> > > > cpu0 ioaq is 10104c90 10104c94
> > > > cpu1 ioaq is 1024a1a4 1024a18c
> > > > 
> > > > Could you do me the disassembly from the entry symbol down into each of
> > > > these so I can try to figure out where they're stuck? (Your prior
> > > > disassembly ends just a bit short).
> > > > 
> > > No pb:
> > > CPU#0 is runing:
> > > as from System.map:
> > 
> > Oh, that's actually not quite what I thought it was doing.  Both CPU0
> > and 1 are spinning trying to acquire the rw->lock.  That means something
> > else has it held for write.  How many CPUs does this system have?
> This is a D380 2-way pa8000:
> processor       : 0
> cpu family      : PA-RISC 2.0
> cpu             : PA8000 (PCX-U)
> cpu MHz         : 180.000000
> model           : 9000/820/D380
> model name      : UL proc 2-way U180 1M/1M
> hversion        : 0x00005aa0
> sversion        : 0x00000491
> I-cache         : 1024 KB
> D-cache         : 1024 KB (WB, direct-mapped)
> ITLB entries    : 96
> DTLB entries    : 96 - shared with ITLB
> BTLB            : not supported
> bogomips        : 359.62
> software id     : 2005347373
> 
> with 256Mb of ram and 512Mb of swap.
> 
> > And what were the rest of them doing?
> > 
> It was runing a kernel 2.6.18-rc3-pa5 32bit smp (scsi drives ncr53c720 doesn't
> yet allow me to try 64bit) and the most its job at the hang moment was a
> kernel's compile loop:
> #while true ; do nice -n -4 make -j 2 clean ; nice -n -4 make -j 2 oldconfig;
> nice -n -4 make -j 2 vmlinux 2>&1 | tee -a /var/logs/k-loop ; done
> 
> (A top showing me well at this moment the 2 cpu busy by cc1)
> 
> (and no stree loop on fs).
> 
On this same system: another test, another hang with same console messages:
BUG: soft lockup detected on CPU#1!
Backtrace:
 [<101557b0>] softlockup_tick+0xd4/0x148
 [<1013b88c>] update_process_times+0x3c/0x88
 [<101091c0>] timer_interrupt+0xd0/0x1a8
 [<10155c0c>] handle_IRQ_event+0x5c/0xa4
 [<10155cec>] __do_IRQ+0x98/0x1d0
 [<1010993c>] do_cpu_irq_mask+0xdc/0x194
 [<1010c068>] intr_return+0x0/0x1c

BUG: soft lockup detected on CPU#0!
Backtrace:
 [<101557b0>] softlockup_tick+0xd4/0x148
 [<1013b88c>] update_process_times+0x3c/0x88
 [<101091c0>] timer_interrupt+0xd0/0x1a8
 [<10155c0c>] handle_IRQ_event+0x5c/0xa4
 [<10155cec>] __do_IRQ+0x98/0x1d0
 [<1010993c>] do_cpu_irq_mask+0xdc/0x194
 [<1010c068>] intr_return+0x0/0x1c

the top showing:
top - 12:04:53 up 2 days, 22:52,  3 users,  load average: 2.44, 2.43, 2.37
Tasks: 107 total,   3 running, 104 sleeping,   0 stopped,   0 zombie
Cpu0  : 25.9%us, 74.1%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu1  : 78.0%us, 15.6%sy,  0.0%ni,  0.7%id,  0.0%wa,  0.0%hi,  5.7%si,  0.0%st
Mem:    255124k total,   186492k used,    68632k free,    73616k buffers
Swap:   517480k total,        0k used,   517480k free,    47896k cached

  PID  PPID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  P WCHAN   
 COMMAND
13285 13284 root      21  -4 11336 7280 1992 R   80  2.9   0:03.33 1 lock_text cc1
 2146     1 gkrellmd  15   0  5388 1500 1108 S   16  0.6 444:44.26 1 select  
 gkrellmd
 3311  3266 root      16   0  2972 1404 1040 R    9  0.6 479:27.73 0 read      top
 3213     1 root      15   0  3924 1456  748 S    1  0.6  25:25.27 1 select  
 screen
13277 12826 root      15  -4  2940 1348 1056 S    1  0.5   0:00.20 1 wait      sh
25814 25805 jso       15   0  9928 1832 1268 S    1  0.7  16:23.23 0 select  
 sshd
    1     0 root      15   0  2380  828  680 S    0  0.3   0:32.94 1 select  
 init

(sorry I missed the 3rd running process ;_( )

but the toc is showing something a bit different:

-----------------  Processor 0 TOC Information -------------------

General Registers 0 - 31
00-03  0000000000000000  0000000010247000  000000001013e694  000000001049ee80
04-07  000000000000000e  0000000010094ca0  000000001049e810  0000000000000001
08-11  000000001f810038  0000000010444280  00000000104cf9c0  000000001049e810
12-15  000000001049e810  000000000002ed58  000000000002e558  000000000001cbfc
16-19  000000001c830440  0000000000030d58  000000000002e558  00000000deaf1eed
20-23  0000000000000000  0000000000000000  0000000010133a90  00000000bb20f6f9
24-27  000000001f810038  0000000000000001  000000001049ee80  0000000010445010
28-31  0000000000000000  00000000104cf810  000000001c830980  000000001014ca1c

Control Registers 0 - 31
00-03  0000000000000000  0000000000000000  0000000000000000  0000000000000000
04-07  0000000000000000  0000000000000000  0000000000000000  0000000000000000
08-11  0000000000002112  0000000000000000  00000000000000c0  0000000000000017
12-15  0000000000000000  0000000000000000  000000000010b000  00000000ff000000
16-19  00002a121edeb615  0000000000000000  0000000010247518  000000000c68109c
20-23  0000000004240041  000000002789ee84  00000000002ceb0f  0000000080000000
24-27  00000000004a3000  000000000c4de000  00000000aaaaaaaa  00000000aaaaaaaa
28-31  00000000000003d0  0000000011111111  000000001c830000  00000000104f8000

Space Registers 0 - 7
00-03  00000000          00000000          00000000          00001089
04-07  00000000          00000000          00000000          00000000

IIA Space                    = 0x0000000000000000
IIA Offset                   = 0x0000000010247508
CPU State                    = 0x9e000001

-----------------  Processor 1 TOC Information -------------------

General Registers 0 - 31
00-03  0000000000000000  0000000010247000  000000001013e694  000000001049ee80
04-07  000000000000000e  000000001009cca0  000000001049e810  0000000000000001
08-11  000000001e496938  0000000010444280  00000000104cf9c0  000000001049e810
12-15  00000000ffffffff  000000001049e810  0000000000000000  0000000000054ec0
16-19  000000001d7d8200  0000000000000000  000000001d7d8108  00000000deaf1eed
20-23  0000000000000001  0000000000000000  0000000010133a90  00000000bb20f6f9
24-27  000000001e496938  0000000000000001  000000001049ee80  0000000010445010
28-31  0000000000000000  00000000104cf810  000000001d7d8740  000000001014ca1c

Control Registers 0 - 31
00-03  0000000000000000  0000000000000000  0000000000000000  0000000000000000
04-07  0000000000000000  0000000000000000  0000000000000000  0000000000000000
08-11  0000000000000f2a  0000000000000000  00000000000000c0  000000000000001f
12-15  0000000000000000  0000000000000000  000000000010b000  00000000ff000000
16-19  00002a126188a653  0000000000000000  0000000010247518  000000000c68109c
20-23  0000000000000000  0000000000000000  00000000002cef0f  0000000080000000
24-27  00000000004a3000  000000000da30000  00000000aaaaaaaa  00000000400a04a0
28-31  00000000aaaaaaaa  0000000011111111  000000001d7d8000  0000000011111111

Space Registers 0 - 7
00-03  00000000          00000795          00000000          00000795
04-07  00000000          00000000          00000000          00000000

IIA Space                    = 0x0000000000000000
IIA Offset                   = 0x0000000010247508
CPU State                    = 0x9e000001

and the analysis give me:
-----------------  Processor 0 TOC Information -------------------

GR of CPU[0]
00-03  0000000000000000  0000000010247000  000000001013e694  000000001049ee80
04-07  000000000000000e  0000000010094ca0  000000001049e810  0000000000000001
08-11  000000001f810038  0000000010444280  00000000104cf9c0  000000001049e810
12-15  000000001049e810  000000000002ed58  000000000002e558  000000000001cbfc
16-19  000000001c830440  0000000000030d58  000000000002e558  00000000deaf1eed
20-23  0000000000000000  0000000000000000  0000000010133a90  00000000bb20f6f9
24-27  000000001f810038  0000000000000001  000000001049ee80  0000000010445010
28-31  0000000000000000  00000000104cf810  000000001c830980  000000001014ca1c

GR[02] == rp = 000000001013e694

Func: send_group_sig_info, Off: 0x30, Addr: 0x1013e694

1013e690:       08 19 02 47     copy r25,r7
1013e694:       08 07 02 59     copy r7,r25
1013e698:       08 04 02 5a     copy r4,r26
1013e69c:       eb ff be 25     b,l 1013e5b4 <group_send_sig_info>,rp

GR[22] == t1(32bits) == arg4(64bits) = 0000000010133a90

GR[21] == t2(32bits) == arg5(64bits) = 0000000000000000

GR[20] == t3(32bits) == arg6(64bits) = 0000000000000000

GR[19] == t4(32bits) == arg7(64bits) = 00000000deaf1eed

GR[26] == arg0 = 000000001049ee80

GR[25] == arg1 = 0000000000000001

GR[24] == arg2 = 000000001f810038

GR[23] == arg3 = 00000000bb20f6f9

GR[27] == dp = 0000000010445010

Func: $global$, Off: 0x0, Addr: 0x10445010


GR[28] == ret0 = 0000000000000000

GR[29] == ret1 or sl = 00000000104cf810

GR[30] == sp = 000000001c830980

GR[31] == ble rp = 000000001014ca1c

Func: hrtimer_run_queues, Off: 0x18c, Addr: 0x1014ca1c

1014ca10:       08 04 02 56     copy r4,r22
1014ca14:       e6 c0 20 00     be,l 0(sr4,r22),sr0,r31
1014ca18:       08 1f 02 42     copy r31,rp
1014ca1c:       08 06 02 5a     copy r6,r26

CR of CPU[0]
00-03  0000000000000000  0000000000000000  0000000000000000  0000000000000000
04-07  0000000000000000  0000000000000000  0000000000000000  0000000000000000
08-11  0000000000002112  0000000000000000  00000000000000c0  0000000000000017
12-15  0000000000000000  0000000000000000  000000000010b000  00000000ff000000
16-19  00002a121edeb615  0000000000000000  0000000010247518  000000000c68109c
20-23  0000000004240041  000000002789ee84  00000000002ceb0f  0000000080000000
24-27  00000000004a3000  000000000c4de000  00000000aaaaaaaa  00000000aaaaaaaa
28-31  00000000000003d0  0000000011111111  000000001c830000  00000000104f8000

CR[00] == rctr = 0000000000000000

CR[08] == (Protection ID) pidr1 = 0000000000002112

CR[10] == ccr = 00000000000000c0

CR[11] == sar = 0000000000000017

CR[14] == iva = 000000000010b000

CR[15] == eiem = 00000000ff000000

CR[16] == itmr = 00002a121edeb615

CR[17] == pcsq = 0000000000000000

CR[18] == pcoq = 0000000010247518

CR[19] == iir = 000000000c68109c

CR[20] == isr = 0000000004240041

CR[21] == ior = 000000002789ee84

CR[22] == ipsw = 00000000002ceb0f

CR[23] == eirw = 0000000080000000

CR[24] == tr0 (ptov) = 00000000004a3000

CR[25] == tr1 (vtop) = 000000000c4de000

CR[26] == tr2 = 00000000aaaaaaaa

CR[27] == tr3 = 00000000aaaaaaaa

CR[28] == tr4 = 00000000000003d0

CR[29] == tr5 = 0000000011111111

CR[30] == tr6 = 000000001c830000

CR[31] == tr7 = 00000000104f8000

SR of CPU[0]
00-03  00000000          00000000          00000000          00001089
04-07  00000000          00000000          00000000          00000000
SR[0] = 00000000
SR[1] = 00000000
SR[2] = 00000000
SR[3] = 00001089
SR[4] = 00000000
SR[5] = 00000000
SR[6] = 00000000
SR[7] = 00000000
Need much more work !!!

SR[00] == ts0 = 00000000

SR[01] == ts1 = 00000000

SR[03] == cpp = 00001089
        Not parsable address!
...
IIA Offset                   = 0x0000000010247508
...

e.g. IAOQ = 0x0000000010247508

Parse IAOQ = 0x0000000010247508 for CPU[0]

Func: _raw_read_lock, Off: 0x24, Addr: 0x10247508

10247500:       0c 60 15 dc     ldcw,co 0(r3),ret0
10247504:       8f 80 20 1a     cmpib,<>,n 0,ret0,10247518 <_raw_read_lock+0x34>
10247508:       0c 60 10 9c     ldw 0(r3),ret0
1024750c:       8f 80 3f dd     cmpib,<> 0,ret0,10247500 <_raw_read_lock+0x1c>
        ...

-----------------  Processor 1 TOC Information -------------------

GR of CPU[1]
00-03  0000000000000000  0000000010247000  000000001013e694  000000001049ee80
04-07  000000000000000e  000000001009cca0  000000001049e810  0000000000000001
08-11  000000001e496938  0000000010444280  00000000104cf9c0  000000001049e810
12-15  00000000ffffffff  000000001049e810  0000000000000000  0000000000054ec0
16-19  000000001d7d8200  0000000000000000  000000001d7d8108  00000000deaf1eed
20-23  0000000000000001  0000000000000000  0000000010133a90  00000000bb20f6f9
24-27  000000001e496938  0000000000000001  000000001049ee80  0000000010445010
28-31  0000000000000000  00000000104cf810  000000001d7d8740  000000001014ca1c

GR[02] == rp = 000000001013e694

Func: send_group_sig_info, Off: 0x30, Addr: 0x1013e694

1013e690:       08 19 02 47     copy r25,r7
1013e694:       08 07 02 59     copy r7,r25
1013e698:       08 04 02 5a     copy r4,r26
1013e69c:       eb ff be 25     b,l 1013e5b4 <group_send_sig_info>,rp

GR[22] == t1(32bits) == arg4(64bits) = 0000000010133a90

GR[21] == t2(32bits) == arg5(64bits) = 0000000000000000

GR[20] == t3(32bits) == arg6(64bits) = 0000000000000001

GR[19] == t4(32bits) == arg7(64bits) = 00000000deaf1eed

GR[26] == arg0 = 000000001049ee80

GR[25] == arg1 = 0000000000000001

GR[24] == arg2 = 000000001e496938

GR[23] == arg3 = 00000000bb20f6f9

GR[27] == dp = 0000000010445010

Func: $global$, Off: 0x0, Addr: 0x10445010


GR[28] == ret0 = 0000000000000000

GR[29] == ret1 or sl = 00000000104cf810

GR[30] == sp = 000000001d7d8740

GR[31] == ble rp = 000000001014ca1c

Func: hrtimer_run_queues, Off: 0x18c, Addr: 0x1014ca1c

1014ca10:       08 04 02 56     copy r4,r22
1014ca14:       e6 c0 20 00     be,l 0(sr4,r22),sr0,r31
1014ca18:       08 1f 02 42     copy r31,rp
1014ca1c:       08 06 02 5a     copy r6,r26

CR of CPU[1]
00-03  0000000000000000  0000000000000000  0000000000000000  0000000000000000
04-07  0000000000000000  0000000000000000  0000000000000000  0000000000000000
08-11  0000000000000f2a  0000000000000000  00000000000000c0  000000000000001f
12-15  0000000000000000  0000000000000000  000000000010b000  00000000ff000000
16-19  00002a126188a653  0000000000000000  0000000010247518  000000000c68109c
20-23  0000000000000000  0000000000000000  00000000002cef0f  0000000080000000
24-27  00000000004a3000  000000000da30000  00000000aaaaaaaa  00000000400a04a0
28-31  00000000aaaaaaaa  0000000011111111  000000001d7d8000  0000000011111111

CR[00] == rctr = 0000000000000000

CR[08] == (Protection ID) pidr1 = 0000000000000f2a

CR[10] == ccr = 00000000000000c0

CR[11] == sar = 000000000000001f

CR[14] == iva = 000000000010b000

CR[15] == eiem = 00000000ff000000

CR[16] == itmr = 00002a126188a653

CR[17] == pcsq = 0000000000000000

CR[18] == pcoq = 0000000010247518

CR[19] == iir = 000000000c68109c

CR[20] == isr = 0000000000000000

CR[21] == ior = 0000000000000000

CR[22] == ipsw = 00000000002cef0f

CR[23] == eirw = 0000000080000000

CR[24] == tr0 (ptov) = 00000000004a3000

CR[25] == tr1 (vtop) = 000000000da30000

CR[26] == tr2 = 00000000aaaaaaaa

CR[27] == tr3 = 00000000400a04a0

CR[28] == tr4 = 00000000aaaaaaaa

CR[29] == tr5 = 0000000011111111

CR[30] == tr6 = 000000001d7d8000

CR[31] == tr7 = 0000000011111111

SR of CPU[1]
00-03  00000000          00000795          00000000          00000795
04-07  00000000          00000000          00000000          00000000
SR[0] = 00000000
SR[1] = 00000795
SR[2] = 00000000
SR[3] = 00000795
SR[4] = 00000000
SR[5] = 00000000
SR[6] = 00000000
SR[7] = 00000000
Need much more work !!!

SR[00] == ts0 = 00000000

SR[01] == ts1 = 00000795

SR[03] == cpp = 00000795
        Not parsable address!
...
IIA Offset                   = 0x0000000010247508
...

e.g. IAOQ = 0x0000000010247508

Parse IAOQ = 0x0000000010247508 for CPU[1]

Func: _raw_read_lock, Off: 0x24, Addr: 0x10247508

10247500:       0c 60 15 dc     ldcw,co 0(r3),ret0
10247504:       8f 80 20 1a     cmpib,<>,n 0,ret0,10247518 <_raw_read_lock+0x34>
10247508:       0c 60 10 9c     ldw 0(r3),ret0
1024750c:       8f 80 3f dd     cmpib,<> 0,ret0,10247500 <_raw_read_lock+0x1c>

the _raw_read_lock() is pre-compile as:
void _raw_read_lock(rwlock_t *lock)
{
 if (__builtin_expect(!!(lock->magic != 0xdeaf1eed), 0)) rwlock_bug(lock, "bad
magic");
 __raw_read_lock(&lock->raw_lock);
}

and assembly as:
        .section        .text._raw_read_lock,"ax", at progbits
        .align 4
.globl _raw_read_lock
        .type   _raw_read_lock, @function
_raw_read_lock:
        .PROC
        .CALLINFO FRAME=64,CALLS,SAVE_RP,ENTRY_GR=3
        .ENTRY
        stw %r2,-20(%r30)
        ldil L'-558956544,%r28
        ldo 7917(%r28),%r28
        stw,ma %r3,64(%r30)
        ldw 8(%r26),%r19
        cmpb,<> %r28,%r19,.L84
        copy %r26,%r3
.L68:
.L82:
#APP
        ldcw,co 0(%r3),%r28
#NO_APP
        cmpib,<>,n 0,%r28,.L85
.L83:
        ldw 0(%r3),%r28
        cmpib,<> 0,%r28,.L82
        nop
        b,n .L83
.L85:
        ldw 4(%r3),%r28
        ldo 1(%r28),%r28
        stw %r28,4(%r3)
        ldi 1,%r28
        stw %r28,0(%r3)
        ldw -84(%r30),%r2
        bve (%r2)
        ldw,mb -64(%r30),%r3
.L84:
        ldil LR'.LC1,%r25
        b,l rwlock_bug,%r2
        ldo RR'.LC1(%r25),%r25
        b,n .L68
        .EXIT
        .PROCEND
        .size   _raw_read_lock, .-_raw_read_lock

So here if i well understand the 2 cpu are spining simultaneously in the same
loop:
.L83:
        ldw 0(%r3),%r28
        cmpib,<> 0,%r28,.L82
        nop
        b,n .L83

Let me know if you need more info.

TIA,
    Joel



----------
Club Scarlet : Tout le monde gagne! Si vous devenez aujourd'hui Scarlet One grace a un client existant de Scarlet, vous recevez tous les deux un cadeau d'une valeur de 50 euros! Surfez vite sur http://www.clubscarlet.be




More information about the parisc-linux mailing list