[parisc-linux] [patch 2/2] backport of sba sg list management to ccio-dma

Joel Soete soete.joel at scarlet.be
Tue Oct 23 10:15:34 MDT 2007


> On Sat, Oct 20, 2007 at 01:23:28PM -0400, Kyle McMartin wrote:
> > On Sat, Oct 20, 2007 at 05:21:44PM +0000, Joel Soete wrote:
> > > Hello Kyle,
> > > 
> > > Hppay that you finaly receive it ;-)
> > > (btw sorry for boucing)
> > > 
> > 
> > I've looked over the code, and it looks fine to me. Will wait for an ack
> > from Grant though, since I have no ability to test ccio :/
> 
> We both have access to the FtC test ring - but I think we are also both
> too lazy to build a kernel and test it on a remote box. :)
> 
> In general I'm ok with this code since I trust Joel is testing
> it with both NIC and storage devices and I'm too lazy to.
> 
> Some comments/requests/ideas below but nothing critical.
> 
> Apologies that I'm cut/pasting from the mailing list archive:
>    
http://lists.parisc-linux.org/pipermail/parisc-linux/2007-October/031998.html
> 
> I didn't have the original mail in my inbox anymore.
> 
> 
> +static CCIO_INLINE int
> +get_iovp_order(unsigned long size)
> +{
> +	int order;
> +
> +	size = (size - 1) >> (IOVP_SHIFT - 1);
> +	order = -1;
> +	do {
> +		size >>= 1;
> +		order++;
> +	} while (size);
> +	return order;
> +}
> 
> Would this be faster if written to use "ffs()"?
> 
This seems to be better fls then ffs like:
int get_iovp_order_faster(unsigned long size)
{
	return fls((size - 1) >> (IOVP_SHIFT));
}

Even thought following test seems to be ok:
	unsigned int a;
	unsigned long ul;

	/* for proof of concept
	*/
	for (ul=0; ul<536870914; ul++) {
		if ( get_iovp_order(ul) != get_iovp_order_faster(ul) ) {
			printf("get_iovp_order(%ld) = %ld (0x%x)\n\n", ul, a = get_iovp_order(ul), a);
			printf("get_iovp_order_faster(%ld) = %ld (0x%x)\n\n", ul, a =
get_iovp_order_faster(ul), a);
			exit (1);
		}
	}

But what would give the kernel generic (include/asm-generic/page.h: this is
what get_iovp_order() is if IOVP_SHIFT==PAGE_SHIFT) get_order(0): 0?
the answer is 20 for paric and ia32?
 
> 
> ...
> +#define ROUNDUP(x, y)		((x + ((y)-1)) & ~((y)-1))
> 
> ROUNDUP got replace in SBA with ALIGN(). Please use ALIGN instead.

Yes I figure it out later sorry.
 
> 
> ...
> +	} else {
> +		/*
> +		** Search the resource bit map on well-aligned values.
> +		** "o" is the alignment.
> +		** We need the alignment to invalidate I/O TLB using
> +		** xxx HW features in the unmap path.
> +		*/
> +		unsigned long o = 1 << get_iovp_order(pages_wanted << IOVP_SHIFT);
> +		uint bitshiftcnt = ROUNDUP(ioc->res_bitshift, o);
> +		unsigned long mask;
> 
> CCIO doesn't have this feature AFAIK. The comment only applies to SBA.
> Please remove the second sentence of the comment.
Ok.

> Searching on "well-aligned" values is still a good idea.
> 
> 
> NOTE: Each 8 consecutive CCIO IO PDIR mappings land on the same IO TLB entry.
> 
> To avoid thrashing the IO TLB, ideally CCIO code will do the following:
> 
> o map_sg (ie storage devices) use 8 bit alignment (ie each mapping start
>  on a new byte boundary).  Traditional block storage can complete the
>  IOs out of order and in parallel (via different controllers). This
>  would thrash one IO TLB entry if 8 seperate 4K DMA mappings are inflight
>  at the same time.
> 
> o map_single() (ie NICs) to just use the next available bit for a given
>   PCI device.  NIC DMA patterns (at least for a given stream) are linear.
>   The ccio_map_single() should track the last bitmap octet used for both
>   DMA_TO_DEVICE and FROM_DEVICE for each PCI device.
>   Once that octet is "used", find the next 8 bit field in the bitmap that
>   is available.
> 
> For a single NIC and one or two SCSI controllers, this probably doesn't
> make much difference unless you start measuring latency. Throughput
> should be nearly the same.
> 
> My guess is the biggest perf gain will come from reducing the number
> of "sync" ops by implementing DELAYED_RESOURCE_CNT. 
> 
> 
> ...
>  #ifdef CCIO_SEARCH_TIME
> +	udelay(100);
> 
> Did you want to keep udelay() call?
> 
Most probably not (just need more test)

> 
> ...
> +#else /* DELAYED_RESOURCE_CNT == 0 */
> +
> +	ccio_free_range(ioc, iova, size);
> +
> +	/* force fdc's to be visible now */
> +	asm volatile("sync" :: );
> +
> +#endif /* DELAYED_RESOURCE_CNT == 0 */
> 
> Does the asm("sync") need to be outside the #endif ?
> Please make sure "sync" is called exactly once IFF 
> the IO PDIR is modified.
> 
Well understand but this check stay hard to me (but I will try to spend a bit
more time to check)

> 
> ...
> +#ifdef CCIO_MAP_STATS
> +		ioc->usg_pages += sg_dma_len(sglist) >> IOVP_SHIFT;
> +		ioc->usingle_calls--;   /* kluge since call is unmap_sg() */
> +#endif
> 
> I wouldn't add MAP_STATS here.
> They aren't enabled in SBA becuase they impact performance too much.
> I expect that will always be true and would seriously consider removing
> MAP_STATS code from SBA as well. Timing the bitmap search is
> probably the only critical bit of info that really matters. 
> And that's for developement/testing only.
> 
mmm, I would so let stay here but with just additional comment to prevent to
activate it outside this developement/testing context?

> hth,
> grant
> 
Yes, tx a lot,
    J.

>
---
Pack Scarlet One, ADSL 6 Mbps + Telephonie, a partir de EUR 29,95...
http://www.scarlet.be/




More information about the parisc-linux mailing list