1. 31 Oct, 2002 9 commits
    • [PATCH] oprofile: tiny makefile tidy · 14029c7b
      John Levon authored
      14029c7b
    • [PATCH] fix timer_pit.c warning · 213afbef
      John Levon authored
      make x86_do_profile available when UP=y,LOCAL_APIC=n
      213afbef
    • [PATCH] hugetlbfs backing for SYSV shared memory · bba2dd58
      Andrew Morton authored
      From Bill Irwin
      
      Optionally back privileged processes' shm with hugetlbfs.
      
      One of the more common requests for and/or users of hugetlb interfaces
      in general are databases using shm.  This patch exports functionality
      mostly equivalent to tmpfs, adds the calling sequence to ipc/shm.c, and
      hashes out a small support function in fs/hugetlbfs/inode.c so that shm
      segments may be hugetlbpage-backed if userspace passes a flag to
      shmget().
      
      Access to this resource requires CAP_IPC_LOCK.
      bba2dd58
    • [PATCH] hugetlbfs file system · 9f3336ab
      Andrew Morton authored
      From Bill Irwin
      
      Tiny hugetlbpage ram-backed filesystem.
      
      Some way to export hugetlbfs through more standard system call
      interfaces was needed, and hugetlbfs already had inodes with ratnodes
      etc.  used to track offset -> page translations, so adding the rest of
      a filesystem around it was easy and natural.  Most of it is identical
      to ramfs, except ->f_op->mmap() is now just a wrapper around the
      hugetlb_prefault() to fill in the VMA, and to simplify it,
      ->readpage(), ->prepare_write(), and ->commit_write() are omitted.
      
      Permissions:
      
      (1) check capable(CAP_IPC_LOCK) in ->f_op->mmap()
              This may be redundant but it errors out with less state to
              clean up and at least clarifies the fact that checks are
              being performed at the relevant entry points.
      
      (2) check capable(CAP_IPC_LOCK) in hugetlbfs_zero_setup()
              This is called at shmget() time and is an actual potential
              security hole. hugetlb_prefault() does not perform this
              check itself, so it must be done here.
      9f3336ab
    • [PATCH] fix hugetlb thinko · 1541c38b
      Andrew Morton authored
      It's setting the page count on the wrong page.
      1541c38b
    • [PATCH] hugetlb fixes and cleanups · b2229e8d
      Andrew Morton authored
      huge_page_release()             -- hugepage refcounting
      free_huge_page()                -- separates freeing from inode refcounting
      unmap_hugepage_range()          -- unmapping refcounting hook when locked
      zap_hugepage_range()            -- unmapping refcounting hook when unlocked
      export setattr_mask()           -- hugetlbfs wants to call it
      export destroy_inode()          -- hugetlbfs wants to use it
      export unmap_vma()              -- hugetlbpage.c wants to use it
      unlock_page() in hugetlbpage.c  -- fixes deadlock in hugetlbfs_truncate()
      b2229e8d
    • [PATCH] Move hugetlb declarations into their own header · 5c7eb9d8
      Andrew Morton authored
      From Bill Irwin
      
      Move hugetlb and hugetlbfs declarations into a dedicated header file.
      
      Hugetlb's big #ifdeffed block in mm.h got a lot bigger with hugetlbfs.
      This patch basically attempts to remove the noise from mm.h by simply
      rearranging it into a new header, and fixing all users of hugetlb.
      5c7eb9d8
    • [PATCH] hugetlbpages: factor out some code for hugetlbfs · d38c229c
      Andrew Morton authored
      In order for hugetlbfs to operate, prefaulting the vma at mmap()-time
      while simultaneously instantiating and performing lookups on its
      ratcache entries is needed as an isolated operation.  This is
      implemented as part of a different function within hugetlbpage.c that
      ties it to inode and key lookup and allocation.
      
      The following patch simply moves the code already present into its own
      function, calls it, and makes it available for hugetlbfs to use.
      d38c229c
    • [PATCH] check QT only if needed · e66c772c
      Roman Zippel authored
      On Wed, 30 Oct 2002, Aaron Lehmann wrote:
      >
      > Now running 'make oldconfig' or 'make menuconfig' requires a Qt
      > installation. I believe that this is a bug because these still work
      > fine without Qt when the -k flag is passed to make.
      
      Yes, it's a bug. This fixes it without breaking xconfig.
      e66c772c
  2. 30 Oct, 2002 31 commits
    • Linux v2.5.45. For real this time. · b1b782f7
      Linus Torvalds authored
      b1b782f7
    • Merge master.kernel.org:/home/davem/BK/net-2.5 · dc85a09d
      Linus Torvalds authored
      into penguin.transmeta.com:/home/penguin/torvalds/repositories/kernel/linux
      dc85a09d
    • [PATCH] kNFSd: Convert nfsd to use a list of pages instead of one big buffer · a0e7d495
      Neil Brown authored
      This means:
        1/ We don't need an order-4 allocation for each nfsd that starts
        2/ We don't need an order-4 allocation in skb_linearize when
           we receive a 32K write request
        3/ It will be easier to incorporate the zero-copy read changes
      
      The pages are handed around using an xdr_buf (instead of svc_buf)
      much like the NFS client so future crypto code can use the same
      data structure for both client and server.
      
      The code assumes that most requests and replies fit in a single page.
      The exceptions are assumed to have some largish 'data' bit, and the
      rest must fit in a single page.
      The 'data' bits are file data, readdir data, and symlinks.
      There must be only one 'data' bit per request.
      This is all fine for nfs/nlm.
      
      This isn't complete:
        1/ NFSv4 hasn't been converted yet (it won't compile)
        2/ NFSv3 allows symlinks up to 4096, but the code will only support
           up to about 3800 at the moment
        3/ readdir responses are limited to about 3800.
      
      but I thought that patch was big enough, and the rest can come
      later.
      
      
      This patch introduces vfs_readv and vfs_writev as parallels to
      vfs_read and vfs_write.  This means there is a fair bit of
      duplication in read_write.c that should probably be tidied up...
      a0e7d495
    • [PATCH] kNFSd: nfsd_readdir changes. · 335c5fc7
      Neil Brown authored
      nfsd_readdir, the common readdir code for all versions of nfsd,
      contains a number of version-specific things with appropriate checks,
      and also does some xdr-encoding which rightly belongs elsewhere.
      
      This patch simplifies nfsd_readdir to do just the core stuff, and moves
      the version specifics into version specific files, and the xdr encoding
      into xdr encoding files.
      335c5fc7
    • [PATCH] kNFSd: Fix problem with buffer length with rpc/tcp · f319e5fa
      Neil Brown authored
      I forgot to add '1' for the record-length header in RPC/TCP.
       Thanks to  Hirokazu Takahashi <taka@valinux.co.jp>
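
      For context, a minimal sketch of the record marking that RPC over
      TCP uses, assuming the standard ONC RPC scheme: each fragment is
      preceded by one 4-byte big-endian header whose top bit flags the last
      fragment and whose low 31 bits hold the fragment length.  That the '1'
      above counts one 32-bit word is an assumption, and the helper names are
      hypothetical, not nfsd code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RM_LAST_FRAGMENT 0x80000000u  /* top bit: this is the final fragment */
#define RM_HEADER_WORDS  1u           /* the header occupies one 32-bit word */

/* Build the record marker for a fragment of 'len' bytes (host byte order). */
uint32_t rpc_record_marker(uint32_t len, int last)
{
    assert(len <= 0x7fffffffu);  /* the length field is only 31 bits wide */
    return len | (last ? RM_LAST_FRAGMENT : 0u);
}

/* Store the marker in network (big-endian) byte order. */
void rpc_put_marker(unsigned char out[4], uint32_t marker)
{
    out[0] = (unsigned char)(marker >> 24);
    out[1] = (unsigned char)(marker >> 16);
    out[2] = (unsigned char)(marker >> 8);
    out[3] = (unsigned char)marker;
}

/* Words needed for a TCP buffer: payload words plus the header word. */
size_t rpc_tcp_buffer_words(size_t payload_words)
{
    return payload_words + RM_HEADER_WORDS;
}
```

      Sizing a buffer with rpc_tcp_buffer_words() rather than the bare
      payload length is the kind of off-by-one-word adjustment this fix
      describes.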
      f319e5fa
    • [PATCH] kNFSd: Make sure exports_open cleans up on failure. · 988d8f66
      Neil Brown authored
      Currently if the kmalloc in exports_open fails,
      the seq_file isn't seq_released.
      
      We now do the kmalloc first, and make sure to kfree
      if seq_open fails.
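
      The ordering described above can be sketched in userspace as
      follows.  Everything here is a hypothetical stand-in (malloc for
      kmalloc, a fake seq_open with failure injection, an allocation counter
      to make leaks visible); it shows the pattern, not the actual nfsd code.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

int live_allocs;            /* allocations not yet freed */
int seq_open_should_fail;   /* test knob: make the fake seq_open fail */

struct seq_file_sketch {
    void *private_data;
    int opened;
};

void *tracked_alloc(size_t n)
{
    void *p = malloc(n);
    if (p)
        live_allocs++;
    return p;
}

void tracked_free(void *p)
{
    if (p) {
        free(p);
        live_allocs--;
    }
}

/* Stand-in for seq_open(): either marks the file open or fails. */
int seq_open_sketch(struct seq_file_sketch *m)
{
    if (seq_open_should_fail)
        return -EIO;
    m->opened = 1;
    return 0;
}

/* Allocate first, then open; free the buffer if the open fails. */
int exports_open_sketch(struct seq_file_sketch *m)
{
    void *namebuf = tracked_alloc(4096);
    int res;

    if (!namebuf)
        return -ENOMEM;     /* nothing has been opened yet, nothing to undo */
    res = seq_open_sketch(m);
    if (res) {
        tracked_free(namebuf);
        return res;
    }
    m->private_data = namebuf;
    return 0;
}

void exports_release_sketch(struct seq_file_sketch *m)
{
    assert(m->opened);
    tracked_free(m->private_data);
    m->private_data = NULL;
    m->opened = 0;
}
```

      With the allocation done first, each failure path has exactly one
      thing to undo.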
      988d8f66
    • [PATCH] kNFSd: Fix nfs shutdown problem. · b9d189e5
      Neil Brown authored
      The 'unexport everything' that happens when the
      last nfsd thread dies was shutting down too much -
      things that should only be shut down on module unload.
      b9d189e5
    • [PATCH] Remove sole CONFIG_MULTIQUAD in kernel source · 23518c21
      Matthew Dobson authored
      There is one remaining instance of CONFIG_MULTIQUAD in the kernel source.
      
      Fix it to use the proper CONFIG_X86_NUMAQ instead.
      23518c21
    • [PATCH] md: factor out MD superblock handling code · d571b483
      Neil Brown authored
      Define an interface for interpreting and updating superblocks
      so we can more easily define new formats.
      
      With this patch, (almost) all superblock layout information is
      located in a small set of routines dedicated to superblock
      handling.  This will allow us to provide a similar set for
      a different format.
      
      The two exceptions are:
       1/ autostart_array where the devices listed in the superblock
          are searched for.
       2/ raid5 'knows' the maximum number of devices for
           compute_parity.
      
      These will be addressed in a later patch.
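
      One way to picture such an interface is a table of per-format
      routines, so that callers never touch the on-disk layout directly.
      The sketch below is an illustration under assumptions: the struct, the
      two toy formats, their magic bytes and field offsets are all
      hypothetical and are not the md code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct array_info_sketch {
    uint32_t raid_disks;
};

/* One entry per superblock format: how to read it and how to write it. */
struct super_ops_sketch {
    const char *name;
    int  (*load)(const unsigned char *raw, struct array_info_sketch *out);
    void (*sync)(const struct array_info_sketch *in, unsigned char *raw);
};

/* Toy format A: magic 0xA1 at byte 0, disk count at byte 1. */
static int load_a(const unsigned char *raw, struct array_info_sketch *out)
{
    if (raw[0] != 0xA1)
        return -1;
    out->raid_disks = raw[1];
    return 0;
}

static void sync_a(const struct array_info_sketch *in, unsigned char *raw)
{
    assert(in->raid_disks <= 255);
    raw[0] = 0xA1;
    raw[1] = (unsigned char)in->raid_disks;
}

/* Toy format B: magic 0xB2 at byte 0, disk count at byte 4. */
static int load_b(const unsigned char *raw, struct array_info_sketch *out)
{
    if (raw[0] != 0xB2)
        return -1;
    out->raid_disks = raw[4];
    return 0;
}

static void sync_b(const struct array_info_sketch *in, unsigned char *raw)
{
    assert(in->raid_disks <= 255);
    raw[0] = 0xB2;
    raw[4] = (unsigned char)in->raid_disks;
}

const struct super_ops_sketch super_formats_sketch[] = {
    { "format-a", load_a, sync_a },
    { "format-b", load_b, sync_b },
};
```

      Adding a new format then means adding one more table entry rather
      than editing every place that reads a superblock field.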
      d571b483
    • Merge · 6932d2d5
      Linus Torvalds authored
      6932d2d5
    • [PATCH] x86-64 updates for 2.5.44 · d05e5732
      Andi Kleen authored
      A few updates for x86-64 in 2.5.44. Some of the bugs fixed were serious.
      
      - Don't count ACPI mappings in end_pfn. This shrinks mem_map a lot
        on many setups.
      - Fix mem= option. Remove custom mapping support.
      - Revert per_cpu implementation to the generic version. The optimized one
        that used %gs directly triggered too many toolkit problems and was a
        constant source of bugs.
      - Make sure pgd_offset_k works correctly for vmalloc mappings. This makes
        modules work again properly.
      - Export pci dma symbols
      - Export other symbols to make more modules work
      - Don't drop physical address bits >32bit on iommu free.
      - Add more prototypes to fix warnings
      - Resync pci subsystem with i386
      - Fix pci dma kernel option parsing.
      - Do PCI peer bus scanning after ACPI in case it missed some busses
        (that's a workaround - 2.5 ACPI seems to have some problems here that
        I need to investigate more closely)
      - Remove the .eh_frame on linking. This saves several hundred KB in the
        bzImage
      - Fix MTRR initialization. It works properly now on SMP again.
      - Fix kernel option parsing, it was broken by section name changes in
        init.h
      - A few other cleanups and fixes.
      - Fix nonatomic warning in ioport.c
      d05e5732
    • [PATCH] hot-n-cold pages: free and allocate hints · 8d6282a1
      Andrew Morton authored
      Add a `cold' hint to struct pagevec, and teach truncate and page
      reclaim to use it.
      
      Empirical testing showed that truncate's pages tend to be hot.  And page
      reclaim's are certainly cold.
      8d6282a1
    • [PATCH] hot-n-cold pages: use cold pages for readahead · 5019ce29
      Andrew Morton authored
      It is usually the case that pagecache reads use busmastering hardware
      to transfer the data into pagecache.  This invalidates the CPU cache of
      the pagecache pages.
      
      So use cache-cold pages for pagecache reads, to avoid wasting
      cache-hot pages.
      5019ce29
    • [PATCH] hot-n-cold pages: page allocator core · a206231b
      Andrew Morton authored
      Hot/Cold pages and zone->lock amortisation
      a206231b
    • [PATCH] hot-n-cold pages: bulk page freeing · 1d2652dd
      Andrew Morton authored
      Patch from Martin Bligh.
      
      Implements __free_pages_bulk().  Release multiple pages of a given
      order into the buddy lists, all within a single acquisition of the
      zone lock.

      This also removes current->local_pages, the per-task list of pages
      (which only ever contained one page) that existed to prevent other
      tasks from stealing pages which this task has just freed up.
      
      Given that we're freeing into the per-cpu caches, and that those are
      multipage caches, and the cpu-stickiness of the scheduler, I think
      current->local_pages is no longer needed.
      1d2652dd
    • [PATCH] hot-n-cold pages: bulk page allocator · 38e419f5
      Andrew Morton authored
      This is the hot-n-cold-pages series.  It introduces a per-cpu lockless
      LIFO pool in front of the page allocator, for three reasons:
      
      1: To reduce lock contention on the buddy lock: we allocate and free
         pages in, typically, 16-page chunks.
      
      2: To return cache-warm pages to page allocation requests.
      
      3: As infrastructure for a page reservation API which can be used to
         ensure that the GFP_ATOMIC radix-tree node and pte_chain allocations
         cannot fail.  That code is not complete, and does not absolutely
         require hot-n-cold pages.  It'll work OK though.
      
      We add two queues per CPU.  The "hot" queue contains pages which the
      freeing code thought were likely to be cache-hot.  By default, new
      allocations are satisfied from this queue.
      
      The "cold" queue contains pages which the freeing code expected to be
      cache-cold.  The cold queue is mainly for lock amortisation, although
      it is possible to explicitly allocate cold pages.  The readahead code
      does that.
      
      I have been hot and cold on these patches for quite some time - the
      benefit is not great.
      
      - 4% speedup in Randy Hron's benching of the autoconf regression
        tests on a 4-way.  Most of this came from savings in pte_alloc and
        pmd_alloc: the pagetable clearing code liked the warmer pages (some
        architectures still have the pgt_cache, and can perhaps do away with
        it).
      
      - 1% to 2% speedup in kernel compiles on my 4-way and Martin's 32-way.
      
      - 60% speedup in a little test program which writes 80 kbytes to a
        file and ftruncates it to zero again.  Ran four instances of that on
        4-way and it loved the cache warmth.
      
      - 2.5% speedup in Specweb testing on 8-way
      
      - The thing which won me over: an 11% increase in throughput of the
        SDET benchmark on an 8-way PIII:
      
      	with hot & cold:
      
      	RESULT for 8 users is 17971    +12.1%
      	RESULT for 16 users is 17026   +12.0%
      	RESULT for 32 users is 17009   +10.4%
      	RESULT for 64 users is 16911   +10.3%
      
      	without:
      
      	RESULT for 8 users is 16038
      	RESULT for 16 users is 15200
      	RESULT for 32 users is 15406
      	RESULT for 64 users is 15331
      
        SDET is a very old SPEC test which simulates a development
        environment with a large number of users.  Lots of users running a
        mix of shell commands, basically.
      
      
      These patches were written by Martin Bligh and myself.
      
      This one implements rmqueue_bulk() - a function for removing multiple
      pages of a given order from the buddy lists.
      
      This is for lock amortisation: take the highly-contended zone->lock
      with less frequency, do more work once it has been acquired.
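
      As a rough userspace model of the scheme described in this message:
      one CPU's hot and cold queues are LIFO arrays, pages are plain
      integers, and a counter stands in for zone->lock so the amortisation is
      visible.  The 16-page batch comes from the text above; QUEUE_MAX and
      all the *_sketch names are assumptions made for illustration, not the
      kernel implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BATCH     16   /* pages moved per lock acquisition */
#define QUEUE_MAX 32   /* hypothetical high-water mark for one queue */

struct buddy_sketch {
    int next_page;          /* stand-in for the buddy lists: fresh page ids */
    int pages_returned;     /* pages handed back by bulk frees */
    int lock_acquisitions;  /* how often the "zone lock" was taken */
};

struct page_queue {
    int pages[QUEUE_MAX];
    int count;
};

struct cpu_pool {
    struct page_queue hot;   /* pages expected to be cache-hot */
    struct page_queue cold;  /* pages expected to be cache-cold */
};

/* Pull up to n pages from the buddy allocator under one lock acquisition. */
void rmqueue_bulk_sketch(struct buddy_sketch *b, struct page_queue *q, int n)
{
    b->lock_acquisitions++;
    while (n-- > 0 && q->count < QUEUE_MAX)
        q->pages[q->count++] = b->next_page++;
}

/* Return the n oldest queued pages under one lock acquisition. */
void free_pages_bulk_sketch(struct buddy_sketch *b, struct page_queue *q, int n)
{
    int drop = n < q->count ? n : q->count;

    b->lock_acquisitions++;
    memmove(q->pages, q->pages + drop,
            (size_t)(q->count - drop) * sizeof q->pages[0]);
    q->count -= drop;
    b->pages_returned += drop;
}

/* Allocate one page; only an empty queue touches the lock. */
int alloc_page_sketch(struct cpu_pool *p, struct buddy_sketch *b, int cold)
{
    struct page_queue *q = cold ? &p->cold : &p->hot;

    if (q->count == 0)
        rmqueue_bulk_sketch(b, q, BATCH);
    assert(q->count > 0);
    return q->pages[--q->count];   /* LIFO: most recently queued page first */
}

/* Free one page; only a full queue touches the lock. */
void free_page_sketch(struct cpu_pool *p, struct buddy_sketch *b,
                      int page, int cold)
{
    struct page_queue *q = cold ? &p->cold : &p->hot;

    if (q->count == QUEUE_MAX)
        free_pages_bulk_sketch(b, q, BATCH);
    q->pages[q->count++] = page;
}
```

      In this model 32 consecutive allocations take the lock twice instead
      of 32 times, and a page freed as hot is the next one handed out.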
      38e419f5
    • [PATCH] percpu: convert global page accounting · afce7191
      Andrew Morton authored
      Convert global page state accounting to use per-cpu storage
      
      (I think this code remains a little buggy, btw.  Note how I do
      
      	per_cpu(page_states, cpu).member += (delta);
      
      This gets done at interrupt time and hence is assuming that
      the "+=" operation on a ulong is atomic wrt interrupts on
      all architectures. How do we feel about that assumption?)
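
      To make the layout concrete, here is a small userspace sketch of
      per-cpu counters that are only summed when a global value is wanted.
      The array, the struct and the function names are hypothetical
      stand-ins.  The sketch does nothing about the atomicity question raised
      above: where "+=" compiles to a separate load, add and store, an
      interrupt that updates the same counter in between can still lose an
      update.

```c
#include <assert.h>

#define NR_CPUS_SKETCH 4   /* hypothetical CPU count for the model */

struct page_state_sketch {
    unsigned long nr_dirty;
    unsigned long nr_writeback;
};

/* One slot per CPU; each CPU only ever writes its own slot. */
struct page_state_sketch page_states_sketch[NR_CPUS_SKETCH];

/* Adjust this CPU's counter. A negative delta wraps the unsigned slot,
 * which is fine because only the sum over all CPUs is meaningful. */
void mod_nr_dirty_sketch(int cpu, long delta)
{
    assert(cpu >= 0 && cpu < NR_CPUS_SKETCH);
    page_states_sketch[cpu].nr_dirty += (unsigned long)delta;
}

/* Global view: add up every CPU's slot (modulo-2^N arithmetic). */
unsigned long read_nr_dirty_sketch(void)
{
    unsigned long total = 0;

    for (int cpu = 0; cpu < NR_CPUS_SKETCH; cpu++)
        total += page_states_sketch[cpu].nr_dirty;
    return total;
}
```

      A page dirtied on one CPU and cleaned on another leaves one slot at
      +1 and another at "-1", yet the sum still comes out right.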
      afce7191
    • [PATCH] percpu: create an EXPORT_PER_CPU_SYMBOL() macro · 999eac41
      Andrew Morton authored
      This is needed so that per-cpu information in the core kernel can be
      accessed from modules.
      999eac41
    • [PATCH] percpu: convert buffer.c · e252fb96
      Andrew Morton authored
      Patch from Dipankar Sarma <dipankar@in.ibm.com>
      
      This patch makes per_cpu bh_accounting safe for cpu_possible
      allocation by using cpu notifiers.
      e252fb96
    • [PATCH] percpu: convert softirqs · c1bf37e9
      Andrew Morton authored
      Patch from Dipankar Sarma <dipankar@in.ibm.com>
      
      This patch makes per_cpu tasklet vectors safe for cpu_possible
      allocation by using CPU notifiers.
      c1bf37e9
    • [PATCH] percpu: convert timers · cf228cdc
      Andrew Morton authored
      Patch from Dipankar Sarma <dipankar@in.ibm.com>
      
      This patch changes the per-CPU data in timer management (tvec_bases)
      to use per_cpu data area and makes it safe for cpu_possible allocation
      by using CPU notifiers. End result - saving space.
      
      Depends on cpu_possible patch.
      cf228cdc
    • [PATCH] percpu: convert RCU · c12e16e2
      Andrew Morton authored
      Patch from Dipankar Sarma <dipankar@in.ibm.com>
      
      This patch converts RCU per_cpu data to use per_cpu data area
      and makes it safe for cpu_possible allocation by using CPU
      notifiers.
      c12e16e2
    • [PATCH] percpu: fix compile warning for UP builds · 0c83f291
      Andrew Morton authored
      A typical construct is:
      
      	int cpu = get_cpu();
      
      	foo = per_cpu(bar, cpu);
      	put_cpu();
      
      but this generates a compiler warning on uniprocessor builds: unused
      variable `cpu'.
      
      Add a dummy ref to `cpu' to per_cpu() to prevent this.
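
      A userspace sketch of the idea, assuming a uniprocessor-style macro
      that ignores its cpu argument.  The macro and function names are
      hypothetical and the kernel's actual definition may differ; the point
      is only the (void) reference that keeps the caller's variable "used".

```c
#include <assert.h>

/* Hypothetical stand-ins for get_cpu()/put_cpu() on a uniprocessor build. */
int get_cpu_sketch(void) { return 0; }
void put_cpu_sketch(void) { }

/* Naive UP definition: 'cpu' never appears in the expansion, so a caller's
 * local 'cpu' variable looks unused to the compiler. */
#define per_cpu_naive(var, cpu)  (var)

/* Same object, but with a dummy (void) reference to 'cpu'. Taking the
 * address and dereferencing keeps the result assignable. */
#define per_cpu_sketch(var, cpu) (*((void)(cpu), &(var)))

int bar = 42;

int read_bar(void)
{
    int cpu = get_cpu_sketch();
    int foo;

    foo = per_cpu_sketch(bar, cpu);   /* 'cpu' is referenced: no warning */
    put_cpu_sketch();
    return foo;
}

void write_bar(int value)
{
    int cpu = get_cpu_sketch();

    per_cpu_sketch(bar, cpu) = value; /* the expansion is still an lvalue */
    put_cpu_sketch();
    assert(bar == value);
}
```

      Swapping per_cpu_sketch for per_cpu_naive in read_bar() and building
      with warnings enabled should reproduce the unused-variable complaint.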
      0c83f291
    • [PATCH] percpu: balance_dirty_pages ratelimit counters · f98bf5ff
      Andrew Morton authored
      Convert balance_dirty_pages_ratelimited() to use percpu storage
      for the ratelimiting counters.
      f98bf5ff
    • [UDP]: Delete buggy assertion. · 4c664ca5
      Alexey Kuznetsov authored
      4c664ca5
    • [PATCH] slab: Use CPU notifiers · 4524ea04
      Andrew Morton authored
      - allocate memory for cpu buffers in cpu_up_prepare
      
      - start the timer in cpu_online
      
      - free the memory for cpu buffers in cpu_up_cancel.
      4524ea04
    • [PATCH] slab: additional code cleanup · b464df2e
      Andrew Morton authored
      From Manfred Spraul
      
      - remove all typedefs, except the kmem_bufctl_t.  It's a redefine for
        an int, i.e.  qualifies as tiny.
      
      - convert most macros to inline functions.
      b464df2e
    • [PATCH] slab: Remove cache_chain_lock · 716b7ab1
      Andrew Morton authored
      Manfred added a new lock to protect the global list of slab caches.  We
      already have a semaphore for that but he needs locking from timer
      context.
      
      So here we remove that lock and just do a down_trylock() on the
      existing semaphore.  If that fails give up - we'll try again next timer
      tick.
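
      The try-and-give-up pattern can be modelled in userspace with a C11
      atomic_flag standing in for the semaphore.  In the kernel,
      down_trylock() returns 0 when the semaphore was acquired; the helper
      below returns true on success instead, and all names here are
      hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

atomic_flag cache_chain_busy = ATOMIC_FLAG_INIT;  /* stand-in semaphore */
bool reaping;      /* true while a reaper is inside the critical section */
int reap_runs;     /* ticks that got the lock and did the work */
int reap_skips;    /* ticks that found the lock held and gave up */

/* Non-blocking acquire: never sleeps, so it is usable from timer context. */
bool try_lock_chain(void)
{
    return !atomic_flag_test_and_set(&cache_chain_busy);
}

void unlock_chain(void)
{
    atomic_flag_clear(&cache_chain_busy);
}

/* Timer callback: do the work if the lock is free, else wait for the
 * next tick rather than blocking. */
void timer_tick_sketch(void)
{
    if (!try_lock_chain()) {
        reap_skips++;
        return;
    }
    assert(!reaping);   /* nobody else may be walking the cache list */
    reaping = true;
    reap_runs++;        /* ... walk the list of caches here ... */
    reaping = false;
    unlock_chain();
}
```

      A skipped tick costs nothing but a delay, which is acceptable for
      periodic housekeeping of this kind.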
      716b7ab1
    • [PATCH] slab: Rework the slab timer code to use add_timer_on · bf19f75e
      Andrew Morton authored
      Manfred had all this weird code to schedule a kernel thread onto a
      different CPU just so that we could bond a timer to that CPU.
      
      Convert it all to use the new add_timer_on().
      bf19f75e
    • [PATCH] slab: reap timers · fd1425d5
      Andrew Morton authored
      - add a reap timer that returns stale objects from the cpu arrays
      - use list_for_each instead of while loops
      - /proc/slabinfo layout change, for a new field about reaping.
      
      Implementation:
      slab contains 2 caches that contain objects that might be usable to the
      system:
      - the cpu arrays contain objects that other cpus could use
      - the slabs_free list contains freeable slabs, i.e. pages that someone
      else might want.
      
      The patch now keeps track of accesses to the cpu arrays and to the free
      list. If there were no recent activities in one of the caches, part of
      the cache is flushed.
      
      Unlike <2.5.39, only a small part (~20%) is flushed each time:
      The older kernel would refill/drain bounce heavily under memory pressure:
      
      - kmem_cache_alloc: notices that there are no objects in the cpu
              cache, loads 120 objects from the slab lists, return 1.
              [assuming batchcount=120]
      - kmem_cache_reap is called due to memory pressure, finds 119
              objects in the cpu array and returns them to the slab lists.
      - repeat.
      
      In addition, the length of the free list is limited based on the free
      list accesses: a fixed "1" limit hurts the large object caches.
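
      A toy model of the two policies for the cpu arrays, to show why the
      partial flush avoids the refill/drain bounce.  The struct, the
      "touched" flag and the choice to take one fifth of the objects
      currently available are assumptions for illustration; the patch itself
      may compute the ~20% differently.

```c
#include <assert.h>
#include <stdbool.h>

struct cpu_array_sketch {
    int  avail;    /* objects currently cached for this CPU */
    bool touched;  /* set on every alloc/free, cleared by the reap timer */
};

/* Older behaviour as described above: drop everything that is cached.
 * Returns the number of objects given back to the slab lists. */
int reap_all_sketch(struct cpu_array_sketch *ac)
{
    int freed = ac->avail;

    ac->avail = 0;
    return freed;
}

/* Timer-driven reap: leave recently used arrays alone, otherwise give
 * back about 20% of what is cached. */
int reap_timer_sketch(struct cpu_array_sketch *ac)
{
    int tofree;

    assert(ac->avail >= 0);
    if (ac->touched) {
        ac->touched = false;   /* seen activity: check again next tick */
        return 0;
    }
    tofree = (ac->avail + 4) / 5;  /* one fifth, rounded up */
    ac->avail -= tofree;
    return tofree;
}
```

      With 119 cached objects, an idle array loses 24 on the first quiet
      tick and keeps 95, instead of being emptied and refilled from scratch.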
      
      That's the last part for now, next is: [not yet written]
      - cleanup: BUG_ON instead of if() BUG
      - OOM handling for enable_cpucaches
      - remove the unconditional might_sleep() from
              cache_alloc_debugcheck_before, and make that DEBUG dependent.
      - initial NUMA support, just to collect some stats:
              Which percentage of the objects are freed on the wrong
              node? 0.1% or 20%?
      fd1425d5
    • [PATCH] slab: uninline poisoning checks · 1aabbecc
      Andrew Morton authored
      remove inline from the cache poison checks: the functions are not
      performance critical.
      1aabbecc