[parisc-linux] Latest palinux crash -- VM problem?
Matthew Wilcox
matthew at wil.cx
Sat Sep 2 23:00:52 MDT 2006
We hit an HPMC earlier this evening running 2.6.18-rc5-pa1 on palinux.
Here's my analysis (the raw data is attached at the end).
The MCA dump fingers this culprit:
IIA Space (back entry) = 0x0000000000000000
IIA Offset (back entry) = 0x0000000010198ba4
The offset lands in sys_mprotect(). Specifically, it's the call to
flush_tlb_range() in the (inlined) change_protection() function:
10198ba4: 04 e0 52 00 pdtlb r0(sr1,r7)
10198ba8: 37 9c 00 02 ldo 1(ret0),ret0
10198bac: bf 85 3f e5 cmpb,*<> r5,ret0,10198ba4 <sys_mprotect+0x7fc>
10198bb0: 34 e7 20 00 ldo 1000(r7),r7
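As a rough model of what that loop does, here is a small Python sketch. It is not kernel code, and several things in it are assumptions: purge_loop and its parameter names are made up for illustration, ret0 is assumed to be zero on loop entry, and reading r5 as a page count and r7 as the start address is an inference from the register dump at the end, not something the dump states. The 0x1000 stride comes from "ldo 1000(r7),r7" (the disassembler prints the displacement in hex).

```python
PAGE_SIZE = 0x1000  # stride taken from "ldo 1000(r7),r7" (hex displacement)


def purge_loop(start, npages, page_size=PAGE_SIZE):
    """Return the addresses the pdtlb loop would purge, in order.

    Models the four instructions as a do-while loop. The names are
    hypothetical; only the instruction sequence comes from the disassembly.
    """
    if npages < 1:
        # The real loop tests after the first purge, so a zero count would
        # not terminate quickly; this sketch simply refuses it.
        raise ValueError("npages must be at least 1")

    purged = []
    addr = start   # r7
    count = 0      # ret0, ASSUMED to be zero on loop entry
    while True:
        purged.append(addr)        # pdtlb r0(sr1,r7)
        count += 1                 # ldo 1(ret0),ret0
        taken = count != npages    # cmpb,*<> r5,ret0,10198ba4
        addr += page_size          # delay slot: ldo 1000(r7),r7
        if not taken:
            break
    return purged


if __name__ == "__main__":
    # GR7 and GR5 from the HPMC dump; their roles are an inference.
    addrs = purge_loop(0x40bd7000, 0xd3)
    print(len(addrs), hex(addrs[0]), hex(addrs[-1] + PAGE_SIZE))
```

Fed GR7 = 0x40bd7000 and GR5 = 0xd3, the model purges one page at a time up to, but not including, 0x40caa000, which is the value sitting in GR8 through GR10. That fits a plain per-page purge over the mprotect range, though it proves nothing about the cause of the HPMC.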
At this point, there are two reasonable hypotheses:
1. Bad hardware
2. Bad software
The memory error log indicates an uncorrectable error. Unfortunately, I
don't understand it well enough to decode what it's saying.
Could it be a different manifestation of the same problem that bites
PA8800? That is, do we have the same address mapped twice and we're
upsetting Astro by writing back cachelines that are supposed to be on
the other CPU?
I should probably try to find Astro docs at some point so I can find out
how much it cares about this kind of thing.
The HPMC log:
Service Menu: Enter command > pim 0 hpmc
FIRMWARE INFORMATION
Firmware Version: 41.10
PROCESSOR PIM INFORMATION
----------------- Processor 0 HPMC Information - PDC Version: 41.10 ------
Timestamp = Sun Sep 3 03:06:18 GMT 2006 (20:06:09:03:03:06:18)
HPMC Chassis Codes
Chassis Code Extension
------------ ---------
0x0000082000ff6242 0x0000000000000000
0x1800082011006312 0xcb81000000000000
0x0000087000ff6292 0x000000f0f0000000
0x6000082070006062 0x0000000000000010
0x7000082070006082 0x0000000000392400
0x7000082379006133 0xc1bff0fffed08040
0x0000080080006310 0x0000000000000001
0x000008008000631f 0x0000000000000000
0x0000082000ff6452 0x0000000000000000
0x0000082000ff6402 0x0000000000000000
0x0000080080006300 0x0000000000000001
0x7000082382006343 0x0000000000070200
0x7000082382026343 0x0000000000070200
0x7000082382046343 0x0000000000070200
0x7000082382066343 0x0000000000070200
0x0000080089006200 0x0000000000000000
0x0000080086006200 0x0000000000000000
0x000008008000630f 0x0000000000000000
General Registers 0 - 31
00-03 0000000000000000 00000000105b60c0 0000000010198af0 000000009fe7ce58
04-07 00000000105a78c0 00000000000000d3 0000000040c00000 0000000040bd7000
08-11 0000000040caa000 0000000040caa000 0000000040caa000 00000000d0c9881c
12-15 0000000000000070 0000000040ca9fff 0000000040ca9fff 0000000000000b00
16-19 00000000000e1e00 00000000d0c9a004 00000000a096c3c0 0000000010000000
20-23 00000000facc8b40 0000000000000000 0000000000000000 0000000000000040
24-27 000000009fe7ce98 0000000040caa000 0000000010478000 00000000105a78c0
28-31 0000000000000001 0000000015fa0270 0000000015fa02b0 0000000000000000
Control Registers 0 - 31
00-03 0000000000000000 0000000000000000 0000000000000000 0000000000000000
04-07 0000000000000000 0000000000000000 0000000000000000 0000000000000000
08-11 000000000000db78 0000000000000000 00000000000000c0 0000000000000038
12-15 0000000000000000 0000000000000000 0000000000103000 ffc0000000000000
16-19 000011f2605064fd 0000000000000000 0000000010198bb0 0000000034e72000
20-23 0000000010240001 000000001e078000 000000ff080cef0f 8000000000000000
24-27 0000000000511000 00000000c0c9a000 0000000000041020 5555555555555555
28-31 000000f0f015e700 5555555555555555 0000000015fa0000 0000000010568000
Space Registers 0 - 7
00-03 036de000 036de000 00000000 036de000
04-07 00000000 00000000 00000000 00000000
IIA Space (back entry) = 0x0000000000000000
IIA Offset (back entry) = 0x0000000010198ba4
Check Type = 0x20000000
CPU State = 0x9e000004
Cache Check = 0x00000000
TLB Check = 0x00000000
Bus Check = 0x0010c03b
Assists Check = 0x00000000
Assist State = 0x00000000
Path Info = 0x00000000
System Responder Address = 0x0000000000000000
System Requestor Address = 0xfffffffffffa0000
Floating Point Registers 0 - 31
00-03 0000000000000000 0000000000000000 0000000000000000 0000000000000000
04-07 0000000010d48098 0000000010000000 0000080300000000 000000004fae8ac0
08-11 0000000000000000 00000000105a78c0 ffffffffffffff9c 0000000000000000
12-15 c06f020000000802 403cf49114843c00 40000e7014843c10 00000000105a78c0
16-19 0000000000000000 0000000000000001 00000000105b48c0 0000000010603000
20-23 0000000010453d80 00000000105487f0 0000000000000244 00000244a8b90fc5
24-27 0000000100000000 00000000105b70c0 00000000105a78c0 0000000000000802
28-31 0000000010143a08 00000000104f32c0 0000000017c841c0 0000000014844108
Check Summary = 0xcb81000000000000
Available Memory = 0x0000000100000000
CPU Diagnose Register 2 = 0x0301000000802004
CPU Status Register 0 = 0x2440c20000000000
CPU Status Register 1 = 0x8000200000000000
SADD LOG = 0x141ffcffffffffff
Read Short LOG = 0xc10080fff800a014
-------------- Memory Error Log Information --------------
Bus 0 Log Information
Timestamp = Sun Sep 3 03:06:18 GMT 2006 (20:06:09:03:03:06:18)
OV RQ RS ESTAT A C D corr unc fe cw pf
-- -- -- ----- - - - ---- --- -- -- --
X ERR_ERROR X X
Bus Requestor Address = 0xfffffffffffa0000
Bus Target Address = 0x0000000000000000
Bus Responder Address = 0xfffffffffed00000
Error Status Reg = 0x0000000000000010
Runway Control Reg = 0x0000021c00001418
Runway Address Reg = 0xc1bff0fffed08040
Runway Data High Reg = 0xe840c000083c025c
Runway Data Low Reg = 0xe840c000083c025c
Memory Address Reg = 0x000001ff3fffffff
Memory Address Corr Reg = 0x000001ff3fffffff
Memory Syndrome Reg = 0x0000000000000000
Memory Syndrome Corr Reg = 0x0000000000000000
Address/Control Parity Error Registers
Address/Control Parity Error Bit (mem_addr_par_stat) Not Set
------------ I/O Module Error Log Information ------------
Summary of IO subsystem log entries
-----------------------------------
Phys Loc Vendor Device Severity
Description (hex) Id Id CORR UNC FE CW
----------- ----- ------ ------ ----------------
System Bus Adapter RP 0x000000ffff04ff83 0x103c 0x1051 X
System Bus Adapter RP 0x000000ffff01ff83 0x103c 0x1051 X
System Bus Adapter RP 0x000000ffff02ff83 0x103c 0x1051 X
System Bus Adapter RP 0x000000ffff03ff83 0x103c 0x1051 X
Detail display of IO subsystem log entries
------------------------------------------
System Bus Adapter -- Rope Interface
------------------------------------------
Timestamp = Sun Sep 3 03:06:19 GMT 2006 (20:06:09:03:03:06:19)
OV RQ RS ESTAT A C D corr unc fe cw pf
-- -- -- ----- - - - ---- --- -- -- --
ERR_FUNCTION X
IO Requestor Address = 0x0000000000000000
IO Target Address = 0x0000000000000000
IO Responder Address = 0x0000000000000000
IO Physical Location = 0x000000ffffffff82
IO Hardware Path = 0x00ffffffffffff00
Module Error Register = 0x0000000000000000
Rope Physical Location = 0x000000ffff04ff83
System Bus Adapter -- Rope Interface
------------------------------------------
Timestamp = Sun Sep 3 03:06:19 GMT 2006 (20:06:09:03:03:06:19)
OV RQ RS ESTAT A C D corr unc fe cw pf
-- -- -- ----- - - - ---- --- -- -- --
ERR_FUNCTION X
IO Requestor Address = 0x0000000000000000
IO Target Address = 0x0000000000000000
IO Responder Address = 0x0000000000000000
IO Physical Location = 0x000000ffffffff82
IO Hardware Path = 0x00ffffffffffff00
Module Error Register = 0x0000000000000000
Rope Physical Location = 0x000000ffff01ff83
System Bus Adapter -- Rope Interface
------------------------------------------
Timestamp = Sun Sep 3 03:06:19 GMT 2006 (20:06:09:03:03:06:19)
OV RQ RS ESTAT A C D corr unc fe cw pf
-- -- -- ----- - - - ---- --- -- -- --
ERR_FUNCTION X
IO Requestor Address = 0x0000000000000000
IO Target Address = 0x0000000000000000
IO Responder Address = 0x0000000000000000
IO Physical Location = 0x000000ffffffff82
IO Hardware Path = 0x00ffffffffffff00
Module Error Register = 0x0000000000000000
Rope Physical Location = 0x000000ffff02ff83
System Bus Adapter -- Rope Interface
------------------------------------------
Timestamp = Sun Sep 3 03:06:19 GMT 2006 (20:06:09:03:03:06:19)
OV RQ RS ESTAT A C D corr unc fe cw pf
-- -- -- ----- - - - ---- --- -- -- --
ERR_FUNCTION X
IO Requestor Address = 0x0000000000000000
IO Target Address = 0x0000000000000000
IO Responder Address = 0x0000000000000000
IO Physical Location = 0x000000ffffffff82
IO Hardware Path = 0x00ffffffffffff00
Module Error Register = 0x0000000000000000
Rope Physical Location = 0x000000ffff03ff83