[parisc-linux-cvs] Re: 2.6.0-test4-pa12 __fls()

Sat, 6 Sep 2003 20:31:00 +0100

On Fri, Sep 05, 2003 at 11:59:14PM -0600, Grant Grundler wrote:
> > kudos to Joel Soete for mangling lamont's __ffs() to produce __fls().
> > Again, I added 64-bit support.  Booted on a500.
> > Didn't test 32-bit but fls_test.c included in above URL works in user space.
> > (162 cycles per loop iteration, 00:29:01 to complete on 400Mhz PA8500).

Sorry, but it's clearly broken.

> -#define fls(x) generic_fls(x)
> +static __inline__ unsigned long __fls(unsigned long x)
> +{
> +	unsigned long ret;
> +
> +	__asm__( " ldi    1,%1\n"
> +#if BITS_PER_LONG > 32
> +		" extrd,u,*<>  %0,63,32,%%r0\n"

if any of the bottom 32 bits are set ...

> +		" depd,*TR  %0,31,32,%0\n"

move the bottom 32 bits up into the top 32 bits

> +		" addi    32,%1,%1\n"

otherwise add 32

> +#endif

... and then do things that can't see the top 32 bits.

> +		" extru,<>  %0,15,16,%%r0\n"
> +		" zdep,TR  %0,15,16,%0\n"
> +		" addi     16,%1,%1\n"

think you could put in comments similar to the endian swapping?  it makes
the intent clearer to see.  as far as i can tell, the basic principle
here is..

if (top N bits clear)
	shift N bits right
else
	add N 

right?

seems to me this whole thing should be done as

if (any top N bits set)
	shift N bits left
else
	subtract N

> +		" extru,<>  %0,7,8,%%r19\n"

r19?  That's not mentioned as being clobbered.  I think you mean r0.

> +static __inline__ int fls(int x)
> +{
> +	return x ? (__fls((unsigned long)x) + 1) : 0;
> +}

we don't need an __fls, so implementing an fls that works directly would
be better.

-- 
"It's not Hollywood.  War is real, war is primarily not about defeat or
victory, it is about death.  I've seen thousands and thousands of dead bodies.
Do you think I want to have an academic debate on this subject?" -- Robert Fisk