[kernel] bug#161: optimize csum_partial_copy_from_user()
grundler@dsl2.external.hp.com (Grant Grundler),
161@bugs.parisc-linux.org
X-PA-RISC Linux-PR-Message: report 161
X-PA-RISC Linux-PR-Package: kernel
X-Loop: daniel_frazier@hp.com
Received: via spool by bugs@bugs.parisc-linux.org id=B.101612759020789
(code B ref -1); Thu, 14 Mar 2002 17:48:01 GMT
To: submit@bugs.parisc-linux.org
Message-Id: <20020314173950.266AB488A@dsl2.external.hp.com>
Date: Thu, 14 Mar 2002 10:39:50 -0700 (MST)
From: grundler@dsl2.external.hp.com (Grant Grundler)
Package: kernel
Version: any
Severity: wishlist
from Documentation/parisc/unwritten
csum_partial_copy
csum_partial_copy_from_user
arch/parisc/lib/checksum.c
We want optimized asm for both of those.
For PA2.0, we want a loop that can perform load,add,store per cycle.
This means interleaving registers in a loop and prefetch ~3 cachelines
ahead. The main loop for from_user flavor could look something like:
ldd,ma 8(%s3,src), b
ldd,ma 8(%s3,src), c
ldd,ma 8(%s3,src), d
big_loop:
ldd 192(dst), %r0 ; prefetch 3 cachelines ahead for write
ldw 192(%s3,src), %r0 ; prefetch 3 cachelines ahead for read
ldd,ma 8(%s3,src), a
addc x,b,x
std,ma b,8(dst)
ldd,ma 8(%s3,src), b
addc y,c,y
std,ma c,8(dst)
ldd,ma 8(%s3,src), c
addc z,d,z
std,ma d,8(dst)
ldd,ma 8(%s3,src), d
addc w,a,w
std,ma a,8(dst)
/* pseudo C */
if (src < (end & ~0x3fUL)) goto big_loop;
/* close up shop, fold cksum */
addc x,b,x
std,ma b,8(dst)
addc w,x,w
addc y,c,y
std,ma c,8(dst)
addc w,y,w
addc z,d,z
std,ma d,8(dst)
addc w,z,w
/* w needs to be folded smaller and returned */
return (csum_fold(w));
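The final csum_fold step folds the wide accumulator down to the 16-bit one's-complement result. A minimal C sketch of that folding (csum_fold_sketch is a hypothetical name, not the kernel's csum_fold):

```c
#include <assert.h>

/* Hypothetical sketch: fold a 64-bit running sum down to a 16-bit
 * one's-complement checksum, as the asm above must do with w. */
static unsigned short csum_fold_sketch(unsigned long sum)
{
	/* fold 64 -> 32: high and low halves, twice to absorb the carry */
	sum = (sum & 0xffffffffUL) + (sum >> 32);
	sum = (sum & 0xffffffffUL) + (sum >> 32);
	/* fold 32 -> 16, again twice for the carry */
	sum = (sum & 0xffff) + (sum >> 16);
	sum = (sum & 0xffff) + (sum >> 16);
	return (unsigned short)~sum;	/* one's complement of the folded sum */
}
```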
The above code still needs lots of boundary condition checking:
o verify src/dst are well aligned at start
o verify len is > 32 bytes
o handle misaligned leftovers
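For reference, the semantics the optimized asm has to preserve can be sketched as a plain C copy-and-sum loop (csum_copy_ref is a hypothetical name; it ignores the alignment and short-length cases listed above, and does no user-space fault handling):

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical reference: copy len bytes from src to dst while
 * accumulating a 16-bit one's-complement partial sum. Assumes len
 * is even; the boundary cases above are deliberately not handled. */
static uint32_t csum_copy_ref(void *dst, const void *src,
			      size_t len, uint32_t sum)
{
	const uint8_t *s = src;
	uint8_t *d = dst;

	while (len >= 2) {
		uint16_t w;
		memcpy(&w, s, 2);	/* sum as we copy */
		memcpy(d, &w, 2);
		sum += w;		/* carries folded below */
		s += 2;
		d += 2;
		len -= 2;
	}
	/* fold any carries back into the low 16 bits */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return sum;
}
```

The asm version wins by doing this 8 bytes at a time with addc absorbing carries in-flight, instead of folding per iteration.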