    mm: introduce pte_special pte bit · 7e675137
    Nick Piggin authored
    s390, for one, cannot implement VM_MIXEDMAP with pfn_valid, due to its memory
    model (which is more dynamic than most).  Instead, the s390 developers proposed
    to implement it with an additional path through vm_normal_page(), using a bit
    in the pte to determine whether or not the page should be refcounted:
    
    vm_normal_page()
    {
    	...
    	if (unlikely(vma->vm_flags & (VM_PFNMAP|VM_MIXEDMAP))) {
    		if (vma->vm_flags & VM_MIXEDMAP) {
    #ifdef s390
    			if (!mixedmap_refcount_pte(pte))
    				return NULL;
    #else
    			if (!pfn_valid(pfn))
    				return NULL;
    #endif
    			goto out;
    		}
    	...
    }
    
    This is fine; however, if we are allowed to use a bit in the pte to determine
    refcountedness, we can use that to _completely_ replace all the vma-based
    schemes.  So instead of adding more cases to the already complex vma-based
    scheme, we can have a clearly separate and simple pte-based scheme (and get
    slightly better code generation in the process):
    
    vm_normal_page()
    {
    #ifdef s390
    	if (!mixedmap_refcount_pte(pte))
    		return NULL;
    	return pte_page(pte);
    #else
    	...
    #endif
    }
    
    And finally, we would rather make this concept usable by any architecture than
    make it s390-only, so implement a new type of pte state for this.
    Unfortunately the old vma-based code must stay, because some architectures may
    not be able to spare pte bits.  This makes vm_normal_page a little uglier
    than we would like, but the two cases are clearly separate.
    
    So introduce a pte_special pte state, and use it in mm/memory.c.  It is
    currently a no-op for all architectures, so this doesn't actually result in
    any compiled code changes to mm/memory.o.
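As a rough illustration of the two clearly separate cases, here is a minimal userspace sketch, not the kernel code. The toy types, the flag values, MAX_PFN, mem_map and the HAVE_PTE_SPECIAL switch are all assumptions made for this example, and the VM_PFNMAP handling is deliberately reduced to "always special". With the no-op accessors the pte-based branch is dead code, so only the vma-based path runs.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for kernel types; none of these match real kernel layouts. */
typedef struct { unsigned long val; } pte_t;
struct page { int refcount; };
struct vm_area_struct { unsigned long vm_flags; };

#define VM_PFNMAP   0x1UL	/* illustrative values, not the kernel's */
#define VM_MIXEDMAP 0x2UL
#define PAGE_SHIFT  12
#define MAX_PFN     16UL	/* hypothetical: pfns below this have a struct page */

static struct page mem_map[MAX_PFN];

static inline int pfn_valid(unsigned long pfn) { return pfn < MAX_PFN; }
static inline struct page *pfn_to_page(unsigned long pfn) { return &mem_map[pfn]; }
static inline unsigned long pte_pfn(pte_t pte) { return pte.val >> PAGE_SHIFT; }

/* No-op accessors, as on an architecture that cannot spare a pte bit. */
#ifndef HAVE_PTE_SPECIAL
#define HAVE_PTE_SPECIAL 0
static inline int pte_special(pte_t pte) { (void)pte; return 0; }
static inline pte_t pte_mkspecial(pte_t pte) { return pte; }
#endif

static struct page *vm_normal_page(struct vm_area_struct *vma, pte_t pte)
{
	unsigned long pfn = pte_pfn(pte);

	if (HAVE_PTE_SPECIAL) {
		/* pte-based scheme: the pte alone says whether to refcount. */
		if (!pte_special(pte)) {
			assert(pfn_valid(pfn));	/* stands in for a sanity check */
			return pfn_to_page(pfn);
		}
		return NULL;
	}

	/* vma-based scheme, reduced to the VM_MIXEDMAP case shown above. */
	if (vma->vm_flags & (VM_PFNMAP | VM_MIXEDMAP)) {
		if (vma->vm_flags & VM_MIXEDMAP) {
			if (!pfn_valid(pfn))
				return NULL;
			return pfn_to_page(pfn);
		}
		return NULL;	/* simplification: every VM_PFNMAP pte is special */
	}
	return pfn_to_page(pfn);
}
```

Because HAVE_PTE_SPECIAL is a compile-time constant in this sketch, a compiler can drop the unused branch entirely, which is the sense in which a no-op pte_special need not change the generated code.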
    
    BTW:
    I haven't moved vm_normal_page() into arch code, as an earlier suggestion
    proposed.  The reason is that, regardless of where vm_normal_page is actually
    implemented, the *abstraction* is still exactly the same. Also, while it
    depends on whether the architecture has pte_special or not, those are the
    only two possible cases, and it really isn't an arch-specific function --
    the role of the arch code should be to provide primitive functions and
    accessors with which to build the core code; pte_special does that. We do
    not want architectures to know or care about vm_normal_page itself, and
    we definitely don't want them being able to invent something new there
    out of sight of mm/ code. If we made vm_normal_page an arch function, then
    we would have to make vm_insert_mixed (next patch) an arch function too. So I
    don't think moving it to arch code fundamentally improves any abstractions,
    while it does practically make the code more difficult to follow, for both
    mm and arch developers, and easier to misuse.
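To make the division of labour concrete, here is a small sketch of what the arch side might supply if it can spare a bit: just the two accessors, nothing that knows about vm_normal_page. PTE_SPECIAL_BIT and its value are hypothetical, chosen only for this example; a real architecture would pick a software-available bit from its own pte layout.

```c
/* Toy pte type for illustration; not a real architecture's layout. */
typedef struct { unsigned long val; } pte_t;

/* Hypothetical software bit that the hardware is assumed to ignore. */
#define PTE_SPECIAL_BIT 0x200UL

/* Arch-provided primitive: is this pte marked as not refcounted? */
static inline int pte_special(pte_t pte)
{
	return (pte.val & PTE_SPECIAL_BIT) != 0;
}

/* Arch-provided primitive: return a copy of the pte with the mark set. */
static inline pte_t pte_mkspecial(pte_t pte)
{
	pte.val |= PTE_SPECIAL_BIT;
	return pte;
}
```

Core code would then test pte_special() when looking a page up and call pte_mkspecial() when installing a mapping that must not be refcounted, so the policy stays in one place under mm/.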
    
    [akpm@linux-foundation.org: build fix]
    Signed-off-by: Nick Piggin <npiggin@suse.de>
    Acked-by: Carsten Otte <cotte@de.ibm.com>
    Cc: Jared Hulbert <jaredeh@gmail.com>
    Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
    Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>