[parisc-linux] Unaligned access failures with apt-get on SMP
Ryan Bradetich
rbradetich@uswest.net
17 Jun 2002 14:43:49 -0600
John et al.,
I recompiled the Debian apt-get package, this time leaving the debug
symbols intact.
Here is the function that is causing the failure:
// DynamicMMap::Allocate - Pooled aligned allocation                    /*{{{*/
// ---------------------------------------------------------------------
/* This allocates an Item of size ItemSize so that it is aligned to its
   size in the file. */
unsigned long DynamicMMap::Allocate(unsigned long ItemSize)
{
   // Look for a matching pool entry
   Pool *I;
   Pool *Empty = 0;
   for (I = Pools; I != Pools + PoolCount; I++)
   {
      if (I->ItemSize == 0)
         Empty = I;
      if (I->ItemSize == ItemSize)
         break;
   }

   // No pool is allocated, use an unallocated one
   if (I == Pools + PoolCount)
   {
      // Woops, we ran out, the calling code should allocate more.
      if (Empty == 0)
      {
         _error->Error("Ran out of allocation pools");
         return 0;
      }

      I = Empty;
      I->ItemSize = ItemSize;
      I->Count = 0;
   }

   // Out of space, allocate some more
   if (I->Count == 0)
   {
      I->Count = 20*1024/ItemSize;
      I->Start = RawAllocate(I->Count*ItemSize,ItemSize);
   }

   I->Count--;
   unsigned long Result = I->Start;
   I->Start += ItemSize;
   return Result/ItemSize;
}
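
To make the arithmetic in that function easier to check in isolation, here
is a minimal standalone sketch of the same pooling scheme. It is not the apt
source: ToyMap, the bump allocator inside RawAllocate, the starting offset of
1, and the pool count of 4 are all assumptions made for illustration. Only
the body of Allocate follows the function quoted above (with the _error call
dropped).

```cpp
#include <cassert>

// Hypothetical stand-in for DynamicMMap. It hands out offsets, not pointers.
struct ToyMap
{
   struct Pool
   {
      unsigned long ItemSize = 0;   // 0 marks an unused pool slot
      unsigned long Start = 0;      // byte offset of the next free item
      unsigned long Count = 0;      // items left in the current chunk
   };

   enum { PoolCount = 4 };          // assumed value, not taken from apt
   Pool Pools[PoolCount];
   unsigned long Used = 1;          // next free byte; 0 is kept as the error value

   // Assumed bump allocator: round Used up to a multiple of Aln,
   // then reserve Size bytes starting there.
   unsigned long RawAllocate(unsigned long Size, unsigned long Aln)
   {
      assert(Aln != 0);
      unsigned long Result = (Used + Aln - 1) / Aln * Aln;
      Used = Result + Size;
      return Result;
   }

   // Same shape as the quoted function: returns an item index, 0 on failure.
   unsigned long Allocate(unsigned long ItemSize)
   {
      assert(ItemSize != 0);

      // Look for a pool of this item size, remembering a free slot
      Pool *I;
      Pool *Empty = 0;
      for (I = Pools; I != Pools + PoolCount; I++)
      {
         if (I->ItemSize == 0)
            Empty = I;
         if (I->ItemSize == ItemSize)
            break;
      }

      // No matching pool: claim a free slot, or fail if none is left
      if (I == Pools + PoolCount)
      {
         if (Empty == 0)
            return 0;
         I = Empty;
         I->ItemSize = ItemSize;
         I->Count = 0;
      }

      // Pool is empty: carve out a new 20 KB chunk aligned to ItemSize
      if (I->Count == 0)
      {
         I->Count = 20*1024/ItemSize;
         I->Start = RawAllocate(I->Count*ItemSize, ItemSize);
      }

      I->Count--;
      unsigned long Result = I->Start;
      I->Start += ItemSize;
      return Result/ItemSize;
   }
};
```

In this model, multiplying the returned index by ItemSize gives back a byte
offset that is a multiple of ItemSize, so for ItemSize=56 every offset is at
least 8-byte aligned relative to the start of the map. Whether the base
address of the real map is suitably aligned is a separate question that this
sketch does not address.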
Here is my gdb output while tracing the failure:
root@rebel:~# gdb /usr/bin/apt-get
GNU gdb 2002-04-01-cvs
Copyright 2002 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "hppa-linux"...
(gdb) b main
Breakpoint 1 at 0x27ea4: file apt-get.cc, line 2134.
(gdb) run install less
Starting program: /usr/bin/apt-get install less
Breakpoint 1, main (argc=3, argv=0x46e66) at apt-get.cc:2134
2134 CommandLine CmdL(Args,_config);
(gdb) b DynamicMMap::Allocate
Breakpoint 2 at 0x40050358: file contrib/mmap.cc, line 229.
(gdb) continue
Continuing.
Reading Package Lists... 0%
Breakpoint 2, DynamicMMap::Allocate(unsigned long) (this=0x4c900,
ItemSize=275112) at contrib/mmap.cc:229
229 Pool *Empty = 0;
(gdb) bt
#0 DynamicMMap::Allocate(unsigned long) (this=0x4c900, ItemSize=275112)
at contrib/mmap.cc:229
#1 0x400ba64c in pkgCacheGenerator::SelectFile(std::string,
std::string, pkgIndexFile const&, unsigned long) (this=0xbff01020, File=
{static npos = 4294967295, _M_dataplus = {<allocator<char>> =
{<No data fields>}, _M_p = 0x489f4 "/var/lib/dpkg/status"}, static
_S_empty_rep_storage = {0, 0, 1, 18, 1, 0}}, Site={static npos =
4294967295, _M_dataplus = {<allocator<char>> = {<No data fields>}, _M_p
= 0x432b4 ""}, static _S_empty_rep_storage = {0, 0, 1, 18, 1, 0}},
Index=@0x4bda0,
Flags=1) at pkgcachegen.cc:404
#2 0x400e5a14 in debStatusIndex::Merge(pkgCacheGenerator&, OpProgress&)
const (this=0x4bda0, Gen=@0xbff01020, Prog=@0xbff00d90) at
/usr/include/g++-v3/bits/basic_string.h:863
#3 0x400bbf8c in BuildCache(pkgCacheGenerator&, OpProgress&, unsigned
long&, unsigned long, std::__normal_iterator<pkgIndexFile**,
std::vector<pkgIndexFile*, std::allocator<pkgIndexFile*> > >,
std::__normal_iterator<pkgIndexFile**, std::vector<pkgIndexFile*,
std::allocator<pkgIndexFile*> > >) (Gen=@0xbff01020,
Progress=@0xbff00d90, CurrentSize=@0xbff01190,
TotalSize=107592,
Start={<iterator<std::random_access_iterator_tag,pkgIndexFile*,int,pkgIndexFile**,pkgIndexFile*&>> = {<No data fields>}, _M_current = 0x4c578}, End=
{<iterator<std::random_access_iterator_tag,pkgIndexFile*,int,pkgIndexFile**,pkgIndexFile*&>> = {<No data fields>}, _M_current = 0x4c57c})
at /usr/include/g++-v3/bits/stl_iterator.h:478
#4 0x400bd280 in pkgMakeStatusCache(pkgSourceList&, OpProgress&,
MMap**, bool) (List=@0xbff01020, Progress=@0xbff00d90,
OutMap=0xbff00990, AllowMem=224)
at /usr/include/g++-v3/bits/stl_vector.h:187
#5 0x400ad8d4 in pkgCacheFile::Open(OpProgress&, bool)
(this=0xbff00990, Progress=@0xbff00d90, WithLock=true) at
cachefile.cc:70
#6 0x0002b794 in CacheFile::Open(bool) (this=0xbff00990, WithLock=56)
at apt-get.cc:85
(gdb) n
DynamicMMap::Allocate(unsigned long) (this=0x4c900, ItemSize=275112) at
contrib/mmap.cc:226
226 {
(gdb) n
DynamicMMap::Allocate(unsigned long) (this=0x4c900, ItemSize=56) at
contrib/mmap.cc:230
230 for (I = Pools; I != Pools + PoolCount; I++)
(gdb) n
232 if (I->ItemSize == 0)
(gdb) n
234 if (I->ItemSize == ItemSize)
(gdb) n
230 for (I = Pools; I != Pools + PoolCount; I++)
(gdb) n
234 if (I->ItemSize == ItemSize)
(gdb) n
230 for (I = Pools; I != Pools + PoolCount; I++)
(gdb) n
234 if (I->ItemSize == ItemSize)
(gdb) n
230 for (I = Pools; I != Pools + PoolCount; I++)
(gdb) n
234 if (I->ItemSize == ItemSize)
(gdb) n
230 for (I = Pools; I != Pools + PoolCount; I++)
(gdb) n
234 if (I->ItemSize == ItemSize)
(gdb) n
230 for (I = Pools; I != Pools + PoolCount; I++)
(gdb) n
234 if (I->ItemSize == ItemSize)
(gdb) n
230 for (I = Pools; I != Pools + PoolCount; I++)
(gdb) n
234 if (I->ItemSize == ItemSize)
(gdb) n
239 if (I == Pools + PoolCount)
(gdb) n
254 if (I->Count == 0)
========> Things get interesting here <=======
(gdb) n
261 unsigned long Result = I->Start;
(gdb) n
263 return Result/ItemSize;
(gdb) n
260 I->Count--;
(gdb) n
263 return Result/ItemSize;
(gdb) n
260 I->Count--;
(gdb) n
Program received signal SIGBUS, Bus error.
DynamicMMap::Allocate(unsigned long) (this=0x4c900, ItemSize=56) at
contrib/mmap.cc:263
263 return Result/ItemSize;
It looks like the function gets exited twice, but I do not see
any recursion in the function, and the function is not listed twice
in the original backtrace I posted. Do we have a corrupt stack?
Or can you think of anything else? I would be glad to provide any
additional debugging output to anyone interested. I can also give
remote access to this system if someone is interested in looking
into this further.
Thanks,
- Ryan
On Sun, 2002-06-16 at 22:12, John David Anglin wrote:
> > any way I can tell from the binary?
>
> Not that I am aware of. On further thought, I think the user code is ok.
>
> Studying your original message further, I see that the printout from
> unaligned.c is fully consistent with the register dump and user code.
> Thus, I have to think that the problem is actually in the kernel.
>
> If the failure occurs all the time, I would put a break at 0x4005e47c
> and then set a large ignore count. Run the program and see how many
> times the break is hit before the fault occurs. Then, set the ignore
> count to 1 less than the number of hits and rerun. If the fault is
> deterministic, you should be able to determine the exact conditions
> which cause the "trap".
>
> Oh, I remember that gdb may not print r3 correctly with info reg.
> It's better to use p $r3 or printf "0x%x\n", $r3.
>
> Dave
> --
> J. David Anglin dave.anglin@nrc.ca
> National Research Council of Canada (613) 990-0752 (FAX: 952-6605)
>