1. 30 Oct, 2009 15 commits
    • powerpc/mm: Cleanup initialization of hugepages on powerpc · d1837cba
      David Gibson authored
      This patch simplifies the logic used to initialize hugepages on
      powerpc.  The somewhat oddly named set_huge_psize() is renamed to
      add_huge_page_size() and now does all necessary verification of
      whether it's given a valid hugepage size (instead of just some) and
      instantiates the generic hstate structure (but no more).
      
      hugetlbpage_init() now steps through the available pagesizes, checks
      if they're valid for hugepages by calling add_huge_page_size() and
      initializes the kmem_caches for the hugepage pagetables.  This means
      we can now eliminate the mmu_huge_psizes array, since we no longer
      need to pass the sizing information for the pagetable caches from
      set_huge_psize() into hugetlbpage_init().
      
      Determination of the default huge page size is also moved from the
      hash code into the general hugepage code.
      Signed-off-by: David Gibson <dwg@au1.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • powerpc/mm: Allow more flexible layouts for hugepage pagetables · a4fe3ce7
      David Gibson authored
      Currently each available hugepage size uses a slightly different
      pagetable layout: that is, the bottom level table of pointers to
      hugepages is a different size, and may branch off from the normal page
      tables at a different level.  Every hugepage aware path that needs to
      walk the pagetables must therefore look up the hugepage size from the
      slice info first, and work out the correct way to walk the pagetables
      accordingly.  Future hardware is likely to add more possible hugepage
      sizes, more layout options and more mess.
      
      This patch therefore reworks the handling of hugepage pagetables to
      reduce this complexity.  In the new scheme, instead of having to
      consult the slice mask, pagetable walking code can check a flag in the
      PGD/PUD/PMD entries to see where to branch off to hugepage pagetables,
      and the entry also contains the information (essentially hugepage
      shift) necessary to then interpret that table without recourse to the
      slice mask.  This scheme can be extended neatly to handle multiple
      levels of self-describing "special" hugepage pagetables, although for
      now we assume only one level exists.
      
      This approach means that only the pagetable allocation path needs to
      know how the pagetables should be set out.  All other (hugepage)
      pagetable walking paths can just interpret the structure as they go.
      
      There already was a flag bit in PGD/PUD/PMD entries for hugepage
      directory pointers, but it was only used for debug.  We alter that
      flag bit to instead be a 0 in the MSB to indicate a hugepage pagetable
      pointer (normally it would be 1 since the pointer lies in the linear
      mapping).  This means that asm pagetable walking can test for (and
      punt on) hugepage pointers with the same test that checks for
      unpopulated page directory entries (beq becomes bge), since hugepage
      pointers will always be positive, and normal pointers always negative.
      
      While we're at it, we get rid of the confusing (and grep-defeating)
      #defining of hugepte_shift to be the same thing as mmu_huge_psizes.
      Signed-off-by: David Gibson <dwg@au1.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
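
The sign-based test described in the commit above can be sketched in userspace C. This is only an illustration, not the kernel's code: the type and function names are hypothetical, and placing the hugepage shift in the low six bits of the entry is an assumption made for the sketch (the commit says only that the entry carries "essentially hugepage shift").

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical classification of a PGD/PUD/PMD entry. */
enum dir_entry_kind { ENTRY_NONE, ENTRY_HUGEPD, ENTRY_NORMAL };

/* Assumption for this sketch: the hugepage shift sits in the low 6 bits. */
#define HUGEPD_SHIFT_MASK UINT64_C(0x3f)

static enum dir_entry_kind classify_entry(uint64_t entry)
{
    if (entry == 0)
        return ENTRY_NONE;      /* unpopulated directory entry */
    if ((entry >> 63) == 0)
        return ENTRY_HUGEPD;    /* MSB clear: hugepage pagetable pointer */
    return ENTRY_NORMAL;        /* MSB set: pointer into the linear mapping */
}

/*
 * One signed comparison covers both "empty" and "hugepage pointer", which
 * is the effect of turning beq into bge in the asm walkers. The cast relies
 * on two's complement conversion, as on all mainstream compilers.
 */
static int needs_slow_path(uint64_t entry)
{
    return (int64_t)entry >= 0;
}

/* Recover the hugepage shift from a hugepage directory entry. */
static unsigned hugepd_shift(uint64_t entry)
{
    assert(classify_entry(entry) == ENTRY_HUGEPD);
    return (unsigned)(entry & HUGEPD_SHIFT_MASK);
}
```

With this encoding a walker never needs the slice mask: it classifies the entry, and if it is a hugepage directory it reads the shift straight out of the entry.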
    • powerpc/mm: Cleanup management of kmem_caches for pagetables · a0668cdc
      David Gibson authored
      Currently we have a fair bit of rather fiddly code to manage the
      various kmem_caches used to store page tables of various levels.  We
      generally have two caches holding some combination of PGD, PUD and PMD
      tables, plus several more for the special hugepage pagetables.
      
      This patch cleans this all up by taking a different approach.  Rather
      than the caches being designated as for PUDs or for hugeptes for 16M
      pages, the caches are simply allocated to be a specific size.  Thus
      sharing of caches between different types/levels of pagetables happens
      naturally.  The pagetable size, where needed, is passed around encoded
      in the same way as {PGD,PUD,PMD}_INDEX_SIZE; that is n where the
      pagetable contains 2^n pointers.
      Signed-off-by: David Gibson <dwg@au1.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
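
A minimal userspace sketch of the "cache per size" idea from the commit above, assuming a simple array indexed by the shift n. The names (table_cache, pgtable_cache_get, MAX_PGTABLE_INDEX_SIZE) are hypothetical, and malloc() stands in for real kmem_cache creation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_PGTABLE_INDEX_SIZE 16   /* assumed upper bound for the sketch */

/* Stand-in for a kmem_cache: records only the object size and user count. */
struct table_cache {
    size_t object_size;
    unsigned users;
};

/* One slot per index size n; a table with 2^n pointers uses slot n. */
static struct table_cache *caches[MAX_PGTABLE_INDEX_SIZE + 1];

/* Return the cache for tables holding 2^shift pointers, creating it once. */
static struct table_cache *pgtable_cache_get(unsigned shift)
{
    assert(shift >= 1 && shift <= MAX_PGTABLE_INDEX_SIZE);
    if (!caches[shift]) {
        struct table_cache *c = malloc(sizeof(*c));
        if (!c)
            return NULL;
        c->object_size = sizeof(void *) << shift;
        c->users = 0;
        caches[shift] = c;
    }
    caches[shift]->users++;
    return caches[shift];
}
```

Because the lookup key is the size alone, two table types with the same index size (say a PUD and a hugepage table) end up sharing one cache without any special-casing.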
    • powerpc/mm: Make hpte_need_flush() correctly mask for multiple page sizes · f71dc176
      David Gibson authored
      Currently, hpte_need_flush() only correctly flushes the given address
      for normal pages.  Callers for hugepages are required to mask the
      address themselves.
      
      But hpte_need_flush() already looks up the page sizes for its own
      reasons, so this is a rather silly imposition on the callers.  This
      patch alters it to mask based on the pagesize it has looked up itself,
      and removes the awkward masking code in the hugepage caller.
      Signed-off-by: David Gibson <dwg@au1.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
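
The masking described in the commit above amounts to clearing the low bits of the address according to the page size that was looked up. A small sketch, assuming a hypothetical three-entry page-size table (the enum, array, and function name are illustrative, not the kernel's):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical page-size indices and their shifts: 4K, 64K and 16M pages. */
enum { PSIZE_4K, PSIZE_64K, PSIZE_16M, PSIZE_COUNT };
static const unsigned psize_shift[PSIZE_COUNT] = { 12, 16, 24 };

/* Round an address down to the start of the page of the given size. */
static uint64_t mask_for_flush(uint64_t addr, int psize)
{
    assert(psize >= 0 && psize < PSIZE_COUNT);
    return addr & ~((UINT64_C(1) << psize_shift[psize]) - 1);
}
```

Doing this once inside the flush helper means hugepage callers can pass any address within the page and no longer need their own masking code.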
    • powerpc: Add kdump support to Collaborative Memory Manager · 8be8cf5b
      Brian King authored
      When running Active Memory Sharing, the Collaborative Memory Manager (CMM)
      may mark some pages as "loaned" with the hypervisor. Periodically, the
      CMM will query the hypervisor for a loan request, which is a single signed
      value. When kexec'ing into a kdump kernel, the CMM driver in the kdump
      kernel is not aware of the pages the previous kernel had marked as "loaned",
      so the hypervisor and the CMM driver are out of sync. Fix the CMM driver
      to handle this scenario by ignoring requests to decrease the number of loaned
      pages if we don't think we have any pages loaned. Pages that are marked as
      "loaned" which are not in the balloon will automatically get switched to "active"
      the next time we touch the page. This also fixes the case where totalram_pages
      is smaller than min_mem_mb, which can occur during kdump.
      Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
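
The rule from the commit above (ignore a request to shrink the loan when the driver believes nothing is loaned) can be sketched as follows. The struct and function are hypothetical, and clamping an oversized decrease to the current loan is an extra assumption of this sketch, not something the commit message states.

```c
#include <assert.h>

/* Hypothetical driver state: pages this kernel believes it has loaned. */
struct cmm_state {
    long loaned_pages;
};

/*
 * Apply a signed loan request from the hypervisor (positive: loan more,
 * negative: take pages back). Returns the change actually applied.
 */
static long cmm_apply_loan_request(struct cmm_state *st, long request)
{
    assert(st && st->loaned_pages >= 0);
    if (request < 0) {
        /* A fresh kdump kernel has nothing in its balloon: ignore. */
        if (st->loaned_pages == 0)
            return 0;
        /* Assumption: never give back more than we hold. */
        if (request < -st->loaned_pages)
            request = -st->loaned_pages;
    }
    st->loaned_pages += request;
    return request;
}
```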
    • powerpc: Remove get_irq_desc() · 6cff46f4
      Michael Ellerman authored
      get_irq_desc() is a powerpc-specific version of irq_to_desc(). That
      is reason enough to remove it, but it also doesn't know about sparse
      irq_desc support which irq_to_desc() does (when we enable it).
      Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
      Acked-by: Grant Likely <grant.likely@secretlab.ca>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • powerpc/pseries: Use irq_has_action() in eeh_disable_irq() · 59e3f837
      Michael Ellerman authored
      Rather than open-coding our own check, use irq_has_action()
      to check if an irq has an action - i.e. is "in use".
      
      irq_has_action() doesn't take the descriptor lock, but it
      shouldn't matter - we're just using it as an indicator
      that the irq is in use. disable_irq_nosync() will also take
      the descriptor lock before doing anything.
      Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
      Acked-by: Grant Likely <grant.likely@secretlab.ca>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • powerpc: Make NR_IRQS a CONFIG option · 551b81f2
      Michael Ellerman authored
      The irq_desc array consumes quite a lot of space, and for systems
      that don't need or can't have 512 irqs it's just wasted space.
      
      The first 16 are reserved for ISA, so the minimum of 32 is really
      16 - and no one has asked for more than 512 so leave that as the
      maximum.
      Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • of/platform: Implement support for dev_pm_ops · d35ef90b
      Anton Vorontsov authored
      The Linux power management subsystem supports a vast number of new PM
      callbacks that are crucial for proper suspend and hibernation support
      in drivers.
      
      This patch implements support for dev_pm_ops, preserving support
      for legacy callbacks.
      Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
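
"Supporting dev_pm_ops while preserving legacy callbacks" usually means the bus prefers the new-style operations table when a driver provides one and otherwise falls back to the old hook. The mock below illustrates that dispatch in userspace. Every type and function here is hypothetical and deliberately simplified; it is not the of/platform code itself.

```c
#include <assert.h>
#include <stddef.h>

/* Mock device: records which suspend path ran (1 = pm ops, 2 = legacy). */
struct mock_device {
    int suspended_by;
};

/* Mock of a new-style PM operations table. */
struct mock_pm_ops {
    int (*suspend)(struct mock_device *dev);
};

/* Mock driver carrying both an optional pm table and a legacy hook. */
struct mock_driver {
    const struct mock_pm_ops *pm;
    int (*legacy_suspend)(struct mock_device *dev);
};

/* Sample callbacks used to exercise the dispatch. */
static int new_style_suspend(struct mock_device *dev)
{
    dev->suspended_by = 1;
    return 0;
}

static int old_style_suspend(struct mock_device *dev)
{
    dev->suspended_by = 2;
    return 0;
}

/* Bus-level suspend: prefer the pm table, else use the legacy callback. */
static int bus_suspend(const struct mock_driver *drv, struct mock_device *dev)
{
    assert(dev);
    if (!drv)
        return 0;                   /* no driver bound: nothing to do */
    if (drv->pm)
        return drv->pm->suspend ? drv->pm->suspend(dev) : 0;
    return drv->legacy_suspend ? drv->legacy_suspend(dev) : 0;
}
```

A driver that sets the pm table never has its legacy hook called, so existing drivers keep working unchanged while converted drivers opt in.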
    • powerpc/chrp: Use the same RTAS daemon as pSeries · 3d541c4b
      Benjamin Herrenschmidt authored
      The CHRP code has some fishy timer based code to scan the RTAS event
      log, which uses a 1KB stack buffer and doesn't even use the results.
      
      The pSeries code has a nicer daemon that allows userspace to read the
      event log and basically uses the same RTAS interface.
      
      This patch moves rtasd.c out of platform/pseries and makes it usable
      by CHRP, after removing the old crufty event log mechanism in there.
      
      The nvram logging part of the daemon is still only available on 64-bit
      since the underlying nvram management routines aren't currently shared.
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • powerpc: Move /proc/ppc64 to /proc/powerpc and add symlink · 188917e1
      Benjamin Herrenschmidt authored
      Some of the stuff in /proc/ppc64 such as the RTAS bits are actually
      useful to some 32-bit platforms. Rename the file, and create a
      symlink on 64-bit for backward compatibility.
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • powerpc: Make it possible to select hibernation on all PowerPCs · 64eb38a6
      Anton Vorontsov authored
      Just as with kexec, hibernation may fail even on well-tested platforms:
      some PCI device, a driver of which doesn't play well with hibernation,
      is enough to break resuming.
      
      Hibernation code is not much platform dependent, and hiding features only
      because these were not verified on a particular hardware is
      counterproductive: we just prevent the features from being widely tested.
      
      For example, with this patch I just tested hibernation on an MPC83xx
      board, and it works quite well, modulo a few drivers that need some
      fixing.
      
      So, let's make it possible to select hibernation support for all
      PowerPCs, then let's wait for any possible bug reports, and actually fix
      (or just collect ;-) the bugs instead of hiding them. If some platforms
      really can't stand hibernation, we can make a blacklist, with proper
      comments why exactly hibernation doesn't work, whether it is possible to
      fix, and what needs to be done to fix it.
      
      CONFIG_HIBERNATION is still =n by default, so the commit doesn't change
      anything apart from ability to set it to =y.
      
      I'm not sure if EXPERIMENTAL dependency is needed, I'd rather not add it
      for a few reasons:
      
      1) It doesn't matter much, for distro kernels user has no clue that some
         feature is experimental. Majority of defconfigs enable EXPERIMENTAL
         anyway (90 vs. 4, which, btw, means that EXPERIMENTAL is overused
         in Kconfigs);
      
      2) EXPERIMENTAL is a good thing for features that change default
         behaviour of a kernel, while for hibernation user has to explicitly
         issue 'echo disk > /sys/power/state' to trigger any hibernation bugs;
      
      3) Per init/Kconfig, EXPERIMENTAL is a good thing to scare and discourage
         users from 'widespread use of a feature', while we want to encourage
         that use.
      Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • powerpc/ps3: Use pr_devel() in ps3/mm.c · 7424639a
      Michael Ellerman authored
      The non-debug case in ps3/mm.c uses pr_debug(), so that the compiler
      still does type checks etc. and doesn't complain about unused
      variables in the non-debug case.
      
      However with DEBUG=n and CONFIG_DYNAMIC_DEBUG=y there's still code
      generated for those pr_debugs().
      
      size before:
         text    data     bss     dec     hex filename
        17553    4112      88   21753    54f9 arch/powerpc/platforms/ps3/mm.o

      size after:
         text    data     bss     dec     hex filename
         7377     776      88    8241    2031 arch/powerpc/platforms/ps3/mm.o
      Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
      Acked-by: Geoff Levand <geoffrey.levand@am.sony.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
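
The property the commit above relies on is a print macro that is still type-checked, and still counts as a use of its arguments, but generates no code unless DEBUG is defined. A userspace sketch of that pattern follows. pr_devel_sketch, expensive_arg and map_region are hypothetical names, printf() stands in for printk(), and `##__VA_ARGS__` is a GNU extension accepted by GCC and Clang.

```c
#include <assert.h>
#include <stdio.h>

/* Counts evaluations of the message argument, to show it is skipped. */
static int format_calls;

static int expensive_arg(void)
{
    format_calls++;
    return 42;
}

#ifdef DEBUG
#define pr_devel_sketch(fmt, ...) printf(fmt, ##__VA_ARGS__)
#else
/* Dead branch: arguments are parsed and type-checked but never evaluated. */
#define pr_devel_sketch(fmt, ...) \
    do { if (0) printf(fmt, ##__VA_ARGS__); } while (0)
#endif

static int map_region(int size)
{
    int pages = size / 4096;    /* referenced only by the debug message */

    pr_devel_sketch("mapping %d pages (cookie %d)\n", pages, expensive_arg());
    return 0;
}
```

Built without -DDEBUG, the call compiles away entirely, yet `pages` does not trigger an unused-variable warning and a mismatched format string would still be diagnosed.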
    • Alexey Dobriyan's avatar
    • Benjamin Herrenschmidt's avatar
  2. 29 Oct, 2009 25 commits