[parisc-linux] Proposal for altering our Page Table layouts

Joel Soete soete.joel at tiscali.be
Fri Apr 9 14:12:51 MDT 2004


Hi James,

James Bottomley wrote:
> Current state of Play
> =====================
> 
> On PA, we currently have different page table layouts depending on
> whether we're running a 64 bit (LP64) or 32 bit (ILP32) kernel.  PA
> has a so-called software TLB, which means that each PA processor
> contains a number of fixed TLB entries and if the current virtual
> address is not in one of them the processor takes a TLB miss fault and
> the fault routine gets to locate the TLB entry and insert it (usually
> causing the processor to throw out another TLB entry).  This software
> TLB policy means that our page table structure is really up to us.
> 
> On ILP32 we have a 2 level page table, with a 4k directory pointing to
> a page of 4k containing the entries, each entry pointing to a physical
> page and taking 4 bytes (covering 1024*1024*4096 = 4GB total).
> 
> On LP64 we have a 3 level page table, with a 4k directory pointing to
> a 4k mid-directory pointing to a page of 4k containing entries.  Since
> our pointers here are 8 bytes, 4k only contains 512 of them, so we
> cover 512 * 512 * 512 * 4096 = 512GB
> 
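
The two coverage figures above can be checked with a few lines of C. This is only a sketch of the arithmetic, not kernel code: pt_coverage() and its parameters are hypothetical names made up here, and it assumes every level of the table is the same size.

```c
#include <stdint.h>

/*
 * Bytes of virtual address space covered by a page table with `levels`
 * levels, where every level is a table of `table_bytes` bytes holding
 * entries of `entry_bytes` each, and each leaf entry maps one page of
 * `page_bytes`.  Hypothetical helper, for illustration only.
 */
static uint64_t pt_coverage(unsigned levels, uint64_t table_bytes,
                            uint64_t entry_bytes, uint64_t page_bytes)
{
    uint64_t entries = table_bytes / entry_bytes;
    uint64_t span = page_bytes;

    for (unsigned i = 0; i < levels; i++)
        span *= entries;
    return span;
}
```

With these assumptions, pt_coverage(2, 4096, 4, 4096) gives 2^32 bytes (the 4GB ILP32 case) and pt_coverage(3, 4096, 8, 4096) gives 2^39 bytes (the 512GB LP64 case).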
> One disadvantage on LP64 is that even though our user-space is mostly
> ILP32, we still incur the overhead of a three level lookup.
> 
> Another problem with this is that each Page table Entry (PTE) needs to
> contain certain flags (some are mandated by Linux, others are needed
> to control the type of TLB entries we insert).  Since each PTE points
> to a page (and thus must be page aligned), we get the lower 12 bits of
> the address for the flags.  If you look in asm/pgtable.h, you'll see
> that all of those bits are already in use for 13 flags (we overload
> _PAGE_FILE and _PAGE_DIRTY).
> 
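
To illustrate why only 12 flag bits are available today, here is a minimal sketch of a 32-bit PTE that keeps the flags in the low bits of a page-aligned address. All names (sketch_pte_t, sketch_mk_pte and so on) are hypothetical and are not the ones used in asm/pgtable.h.

```c
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12
#define SKETCH_FLAG_MASK  ((uint32_t)((1u << SKETCH_PAGE_SHIFT) - 1))

typedef uint32_t sketch_pte_t;   /* hypothetical 4-byte PTE */

/* Combine a page-aligned address with flags stored in the low 12 bits. */
static sketch_pte_t sketch_mk_pte(uint32_t page_addr, uint32_t flags)
{
    return (page_addr & ~SKETCH_FLAG_MASK) | (flags & SKETCH_FLAG_MASK);
}

/* Recover the page address by masking off the flag bits. */
static uint32_t sketch_pte_page(sketch_pte_t pte)
{
    return pte & ~SKETCH_FLAG_MASK;
}

/* Recover the flags by masking off the address bits. */
static uint32_t sketch_pte_flags(sketch_pte_t pte)
{
    return pte & SKETCH_FLAG_MASK;
}
```

Any flag that would need bit 12 or above collides with the address, which is why a new "in cache" flag has nowhere to go in this layout.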
> In order to solve our cache flush penalty on fork/exec, and implement
> stingy flushing, we need to be able to mark a page as being "in
> cache", and would need an extra flag to do this with.  Additionally,
> at some point in the future it would be nice to be able to be adaptive
> about page size (i.e. r-x regions are just faulted binary text, we
> could cover them with 16k or even 64k pages for efficiency and Linux
> would be none the wiser).
> 
> To achieve all of this, we need quite a large expansion in the number
> of available flags.
> 
> So:
> 
> New Proposal for Page Table Layout
> ==================================
> 
> The proposal is:
> 
> 1) Make the PTE on both ILP32 and LP64 8 bytes.  Even on LP64, the
>    maximum addressable physical memory is 48 bits (256TB), so we can
>    use the top 16 bits for additional flags.  On ILP32 we'd have an
>    extra long, so again, we use the top 16 bits for flags and leave
>    the lower 16 bits of that long unused.  This gives us identical
>    PTE layouts on both ILP32 and LP64.
> 
> 2) Make the directories 8k in size (this has to be physically
>    contiguous because the TLB miss handler operates in absolute
>    space).
> 
> 3) Allocate all page tables in ZONE_DMA.  On PA, this means that the
>    physical address of every page table will be under 4GB, so we only
>    need *four* bytes for all of the directory entries. (The flags I'm
>    looking for are only in the PTE, we have plenty of extra space
>    still for directory flags).
> 
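
One possible reading of points 1) and 3), as a sketch only: an 8-byte PTE with 16 extra flag bits on top of a 48-bit physical address, and a 4-byte directory entry that is valid only for tables below 4GB. The exact bit positions, and keeping the existing flags in the low 12 bits, are assumptions made here; every sk_* name is hypothetical.

```c
#include <stdint.h>

#define SK_EXTRA_SHIFT 48
#define SK_ADDR_MASK   0x0000fffffffff000ULL  /* bits 47..12: page address */
#define SK_LOW_MASK    0x0000000000000fffULL  /* bits 11..0: existing flags */

typedef uint64_t sk_pte_t;   /* 8-byte PTE on both ILP32 and LP64 */
typedef uint32_t sk_pde_t;   /* 4-byte directory entry */

/* Build a PTE from a physical address, 16 extra flags and 12 low flags. */
static sk_pte_t sk_mk_pte(uint64_t phys, uint16_t extra, uint16_t low)
{
    return ((uint64_t)extra << SK_EXTRA_SHIFT) |
           (phys & SK_ADDR_MASK) |
           ((uint64_t)low & SK_LOW_MASK);
}

/* The 16 additional flag bits live in the top of the PTE. */
static uint16_t sk_pte_extra(sk_pte_t pte)
{
    return (uint16_t)(pte >> SK_EXTRA_SHIFT);
}

/* Physical page address, with both sets of flags masked off. */
static uint64_t sk_pte_phys(sk_pte_t pte)
{
    return pte & SK_ADDR_MASK;
}

/* A page table can be named by a 4-byte entry only if it sits below 4GB. */
static int sk_pde_fits(uint64_t table_phys)
{
    return (table_phys >> 32) == 0;
}
```

With 4-byte entries, an 8k directory holds 8192 / 4 = 2048 of them, and a 4k page of 8-byte PTEs holds 512.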
I would just like to take the opportunity to mention a problem I encounter on an N4k model
(typically requiring a 64-bit kernel) with 2 CPUs and 4GB of RAM. I can only run a UP kernel (2.6.5-pa5 :) ),
which uses only 2 of the 4GB of available RAM.
Thanks to Matthew (http://lists.parisc-linux.org/pipermail/parisc-linux/2004-February/022393.html)
and Grant (http://lists.parisc-linux.org/pipermail/parisc-linux/2004-February/022408.html),
I was able to figure out the following:
 > <==== actually returned by setup_bootmem() ====>
 > pmem_ranges[0].start_pfn = 0.
 > pmem_ranges[0].pages = 524288.
 > pmem_ranges[1].start_pfn = 1572864.
 >
So there is actually a gap that is too big for setup_bootmem():
(in arch/parisc/mm/init.c)
[snip]
#define MAX_GAP (0x40000000UL >> PAGE_SHIFT)

static void __init setup_bootmem(void)
{
[snip]
#ifdef __LP64__

#ifndef CONFIG_DISCONTIGMEM
[snip]
         for (i = 1; i < npmem_ranges; i++) {
                 if (pmem_ranges[i].start_pfn -
                         (pmem_ranges[i-1].start_pfn +
                          pmem_ranges[i-1].pages) > MAX_GAP) {
                         npmem_ranges = i;
                         break;
                 }
         }
#endif
[snip]
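
To make the effect of that loop concrete, here is a small user-space sketch of the same gap check applied to the ranges printed above. It assumes 4k pages (PAGE_SHIFT of 12); the sk_* names are hypothetical, and the page count of the second range is left as 0 because it is not shown in the output and the check does not use it.

```c
#include <stddef.h>

#define SK_PAGE_SHIFT 12                              /* assumed 4k pages */
#define SK_MAX_GAP    (0x40000000UL >> SK_PAGE_SHIFT) /* 1GB, in pages */

struct sk_range {
    unsigned long start_pfn;
    unsigned long pages;
};

/* Number of leading ranges kept before the first gap larger than MAX_GAP. */
static size_t sk_usable_ranges(const struct sk_range *r, size_t n)
{
    for (size_t i = 1; i < n; i++) {
        unsigned long prev_end = r[i - 1].start_pfn + r[i - 1].pages;

        if (r[i].start_pfn - prev_end > SK_MAX_GAP)
            return i;
    }
    return n;
}

/* The ranges reported above. */
static const struct sk_range sk_n4k[] = {
    { 0UL,       524288UL },
    { 1572864UL, 0UL },      /* page count not shown above; unused here */
};
```

The first range ends at pfn 524288 and the second starts at pfn 1572864, so the gap is 1048576 pages against a limit of 262144. sk_usable_ranges(sk_n4k, 2) therefore keeps only the first range, i.e. 524288 pages of 4k, which matches the 2GB I see.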

I tried to look into implementing 'CONFIG_DISCONTIGMEM', but I am not a developer and do not have enough kernel knowledge to do it.

Just in the hope this helps,
	Joel
> Now, if you put all this together, you'll see that for ILP32
> executables on the LP64 kernel, we only need a two level page table
> (2048 directory entries * 512 PTEs * 4096 = 4GB), saving us one level
> of indirect lookup.
> 
> Additionally, if we ever get around to implementing LP64 user binaries
> (and you know who you are...) we would then be able to address up to
> 2048 * 2048 * 512 * 4096 = 8TB of virtual space using a three level
> page table.
> 
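
The coverage of the proposed layout can be checked the same way. This is a sketch of the arithmetic only, assuming 8k directories with 4-byte entries at every directory level and one 4k page of 8-byte PTEs at the bottom; the SK_* and sk_* names are hypothetical.

```c
#include <stdint.h>

#define SK_DIR_ENTRIES (8192u / 4u)   /* 8k directory, 4-byte entries: 2048 */
#define SK_PTE_ENTRIES (4096u / 8u)   /* 4k PTE page, 8-byte PTEs: 512 */
#define SK_PAGE_BYTES  4096ULL

/*
 * Virtual space covered by `dir_levels` levels of directories on top of
 * one level of PTE pages (so dir_levels == 1 is a two level table).
 */
static uint64_t sk_new_coverage(unsigned dir_levels)
{
    uint64_t span = SK_PTE_ENTRIES * SK_PAGE_BYTES;

    for (unsigned i = 0; i < dir_levels; i++)
        span *= SK_DIR_ENTRIES;
    return span;
}
```

sk_new_coverage(1) is 2^32 bytes, enough for any ILP32 process, and sk_new_coverage(2) is 2^43 bytes for the three level case.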
> The disadvantages:
> 
> 1) Our directories become order one allocations.  Linux is
>    careful about this, so these types of allocations should be
>    plentiful and we only need one directory per ILP32 process anyway.
> 
> 2) We have to allocate with GFP_DMA.  Since very few people actually
>    have a PA machine with more than 4GB of RAM, this shouldn't be too
>    much of a problem.
> 
> The advantages:
> 
> 1) We get an extra sixteen PTE flags to play with.
> 
> 2) We use 2 level page tables for ILP32 user processes on LP64.
> 
> 3) We can unify the narrow and wide TLB miss handlers (we'd actually
>    predicate the 2 or 3 level lookup on the width of the user binary).
> 
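
As a rough picture of what predicating the lookup on the binary width could look like, here is a sketch that splits a virtual address into table indices for either a 2 or a 3 level walk. The index widths (11 bits per directory level, 9 bits for the PTE page, 12 bits of offset) follow from the sizes in the proposal, but the code and all names are hypothetical; the real miss handlers are written in assembly.

```c
#include <stdint.h>

struct sk_indices {
    unsigned pgd;     /* index into the top directory (2048 entries) */
    unsigned pmd;     /* index into the mid directory, 0 if unused */
    unsigned pte;     /* index into the PTE page (512 entries) */
    unsigned offset;  /* byte offset within the 4k page */
};

/* Split a virtual address; `wide` selects the 3 level (LP64) walk. */
static struct sk_indices sk_split(uint64_t va, int wide)
{
    struct sk_indices ix;

    ix.offset = (unsigned)(va & 0xfff);
    ix.pte = (unsigned)((va >> 12) & 0x1ff);
    if (wide) {
        ix.pmd = (unsigned)((va >> 21) & 0x7ff);
        ix.pgd = (unsigned)((va >> 32) & 0x7ff);
    } else {
        ix.pmd = 0;   /* narrow binaries skip the middle level */
        ix.pgd = (unsigned)((va >> 21) & 0x7ff);
    }
    return ix;
}
```

For a narrow process the top of the 4GB space, 0xffffffff, lands on the last directory entry (2047) and the last PTE (511), so a single 8k directory is enough.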
> James