1. 24 Aug, 2004 40 commits
    • [PATCH] clarify get_task_mm (mmgrab) · 7dbb1d67
      Hugh Dickins authored
      Clarify mmgrab by collapsing it into get_task_mm (in fork.c not inline),
      and commenting on the special case it is guarding against: when use_mm in
      an AIO daemon temporarily adopts the mm while it's on its way out.
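      A minimal userspace sketch of the guard, assuming C11 atomics and a pthread mutex in place of task_lock(). struct mm_model, struct task_model and get_task_mm_model() are hypothetical names, and the compare-and-swap loop stands in for the kernel's own locking. It shows only the idea: a reference is taken while mm_users is still nonzero, so an mm already on its way out is never revived.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins for mm_struct and task_struct. */
struct mm_model {
	atomic_int mm_users;          /* 0 means the mm is being torn down */
};

struct task_model {
	pthread_mutex_t lock;         /* plays the role of task_lock() */
	struct mm_model *mm;
};

/* Return the task's mm with an extra user reference, or NULL if the
 * task has no mm or its mm is already on its way out. */
struct mm_model *get_task_mm_model(struct task_model *task)
{
	struct mm_model *mm;

	pthread_mutex_lock(&task->lock);
	mm = task->mm;
	if (mm) {
		int users = atomic_load(&mm->mm_users);

		/* Increment only if nonzero: never resurrect a dying mm.
		 * On CAS failure, users is reloaded with the current value. */
		while (users != 0 &&
		       !atomic_compare_exchange_weak(&mm->mm_users, &users, users + 1))
			;
		if (users == 0)
			mm = NULL;
	}
	pthread_mutex_unlock(&task->lock);
	return mm;
}
```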
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] x86 bitops.h commentary on instruction reordering · c524e494
      Marcelo Tosatti authored
      Back when we were discussing the need for a memory barrier in sync_page(),
      it came to me (thanks Andrea!) that the bit operations can be perfectly
      reordered on architectures other than x86.
      
      I think the commentary in i386 bitops.h is misleading; it's worth noting
      that these operations may be reordered on other architectures.
      
      The clear_bit() comment already says so:
      
       * clear_bit() is atomic and may not be reordered.  However, it does
       * not contain a memory barrier, so if it is used for locking purposes,
       * you should call smp_mb__before_clear_bit() and/or smp_mb__after_clear_bit()
       * in order to ensure changes are visible on other processors.
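      The same point in C11 terms, as a hedged analogy rather than kernel code: a relaxed atomic_fetch_and() is atomic but orders nothing around it, so releasing a bit lock with it needs a release fence first, which is the job smp_mb__before_clear_bit() does in the kernel. The flag word, LOCK_BIT and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical bit lock on bit 0 of a word, modelled with C11 atomics. */
#define LOCK_BIT 0x1UL

static atomic_ulong flags;
static int shared_data;   /* protected by LOCK_BIT */

/* Like clear_bit(): atomic, but relaxed, so it orders nothing else. */
static void clear_bit_relaxed(unsigned long mask, atomic_ulong *word)
{
	atomic_fetch_and_explicit(word, ~mask, memory_order_relaxed);
}

int try_lock_bit(void)
{
	/* Acquire ordering: later accesses cannot move above the lock. */
	unsigned long old = atomic_fetch_or_explicit(&flags, LOCK_BIT,
						     memory_order_acquire);
	return !(old & LOCK_BIT);
}

void unlock_bit(void)
{
	/* Analogue of smp_mb__before_clear_bit(): make the protected
	 * stores visible before another CPU can see the bit clear. */
	atomic_thread_fence(memory_order_release);
	clear_bit_relaxed(LOCK_BIT, &flags);
}

void update_shared(int value)
{
	while (!try_lock_bit())
		;        /* spin */
	shared_data = value;
	unlock_bit();
}

int read_shared(void)
{
	return shared_data;
}
```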
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] rmaplock: swapoff use anon_vma · 69929041
      Hugh Dickins authored
      Swapoff can make good use of a page's anon_vma and index, while it's still
      left in swapcache, or once it's brought back in and the first pte mapped back:
      unuse_vma can go directly to just one page of only those vmas with the same
      anon_vma.  And unuse_process can skip any vmas without an anon_vma (extending
      the hugetlb check: hugetlb vmas have no anon_vma).
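      Roughly how a page's anon_vma and index let swapoff go straight to one address per vma, sketched with hypothetical simplified structures. It assumes the usual objrmap arithmetic (vm_start plus the page's offset from vm_pgoff, in pages) and skips vmas whose anon_vma is missing or different.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SHIFT 12
#define NO_ADDRESS ((unsigned long)-1)

/* Hypothetical, heavily simplified stand-ins for the kernel structures. */
struct anon_vma_model { int unused; };

struct vma_model {
	unsigned long vm_start, vm_end;   /* [vm_start, vm_end) */
	unsigned long vm_pgoff;           /* page index mapped at vm_start */
	struct anon_vma_model *anon_vma;  /* NULL: no anon pages (e.g. hugetlb) */
};

struct page_model {
	struct anon_vma_model *anon_vma;
	unsigned long index;              /* linear page index */
};

/* The one user address at which this vma could map the page,
 * or NO_ADDRESS if the vma cannot hold it. */
unsigned long page_address_in_vma_model(const struct page_model *page,
					const struct vma_model *vma)
{
	unsigned long address;

	if (!vma->anon_vma || vma->anon_vma != page->anon_vma)
		return NO_ADDRESS;       /* skip: no or different anon_vma */
	if (page->index < vma->vm_pgoff)
		return NO_ADDRESS;
	address = vma->vm_start + ((page->index - vma->vm_pgoff) << PAGE_SHIFT);
	if (address >= vma->vm_end)
		return NO_ADDRESS;
	return address;
}
```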
      
      This just hacks in on top of the existing procedure, still going through all
      the vmas of all the mms in mmlist.  A more elegant procedure might replace
      mmlist by a list of anon_vmas: but that would be more work to implement, with
      apparently more overhead in the common paths.
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] rmaplock: mm lock ordering · 9d9ae43b
      Hugh Dickins authored
      With page_map_lock out of the way, there's no need for page_referenced and
      try_to_unmap to use trylocks - provided we switch anon_vma->lock and
      mm->page_table_lock around in anon_vma_prepare.  Though I suppose it's
      possible that we'll find that vmscan makes better progress with trylocks than
      spinning - we're free to choose trylocks again if so.
      
      Try to update the mm lock ordering documentation in filemap.c.  But I still
      find it confusing, and I've no idea of where to stop.  So add an mm lock
      ordering list I can understand to rmap.c.
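      A sketch of why a consistent ordering lets both paths wait instead of trylocking, using pthread mutexes as stand-ins for the spinlocks; the structures and function names are hypothetical. Both paths nest mm->page_table_lock inside anon_vma->lock, so neither can hold one lock while waiting for a task that took the two in the opposite order.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-ins: mutexes play the role of the spinlocks. */
struct anon_vma_model {
	pthread_mutex_t lock;
	int nr_vmas;                 /* vmas linked to this anon_vma */
};

struct mm_model {
	pthread_mutex_t page_table_lock;
};

/* Fault path, cf. anon_vma_prepare(): anon_vma->lock is the outer lock,
 * mm->page_table_lock the inner one. */
void link_vma(struct anon_vma_model *av, struct mm_model *mm)
{
	pthread_mutex_lock(&av->lock);
	pthread_mutex_lock(&mm->page_table_lock);
	av->nr_vmas++;
	pthread_mutex_unlock(&mm->page_table_lock);
	pthread_mutex_unlock(&av->lock);
}

/* Reclaim path, cf. page_referenced()/try_to_unmap(): same nesting, so it
 * can simply wait for each lock rather than trylock and give up. */
int count_vmas(struct anon_vma_model *av, struct mm_model *mm)
{
	int n;

	pthread_mutex_lock(&av->lock);
	pthread_mutex_lock(&mm->page_table_lock);
	n = av->nr_vmas;
	pthread_mutex_unlock(&mm->page_table_lock);
	pthread_mutex_unlock(&av->lock);
	return n;
}
```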
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] rmaplock: SLAB_DESTROY_BY_RCU · 77631565
      Hugh Dickins authored
      With page_map_lock gone, how to stabilize page->mapping's anon_vma while
      acquiring anon_vma->lock in page_referenced_anon and try_to_unmap_anon?
      
      The page cannot actually be freed (vmscan holds reference), but however much
      we check page_mapped (which guarantees that anon_vma is in use - or would
      guarantee that if we added suitable barriers), there's no locking against page
      becoming unmapped the instant after, then anon_vma freed.
      
      It's okay to take anon_vma->lock after it's freed, so long as it remains a
      struct anon_vma (its list would become empty, or perhaps reused for an
      unrelated anon_vma: but no problem since we always check that the page located
      is the right one); but corruption if that memory gets reused for some other
      purpose.
      
      This is not unique: it's liable to be a problem whenever the kernel tries to
      approach a structure obliquely.  It's generally solved with an atomic
      reference count; but one advantage of anon_vma over anonmm is that it does not
      have such a count, and it would be a backward step to add one.
      
      Therefore...  implement SLAB_DESTROY_BY_RCU flag, to guarantee that such a
      kmem_cache_alloc'ed structure cannot get freed to other use while the
      rcu_read_lock is held i.e.  preempt disabled; and use that for anon_vma.
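      A simplified model of the reasoning above, assuming anon_vma objects come from type-stable memory (the guarantee SLAB_DESTROY_BY_RCU gives while rcu_read_lock is held, which this userspace sketch does not model). Locking through a stale pointer is harmless because it still points at some anon_vma; the caller then revalidates and backs off. The revalidation shown, rechecking the page's pointer, is a hypothetical simplification: the patch itself checks that each page it locates is the right one.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical model: anon_vma memory is only ever reused as another
 * anon_vma, so a stale pointer never refers to unrelated data. */
struct anon_vma_model {
	pthread_mutex_t lock;
};

struct page_model {
	_Atomic(struct anon_vma_model *) anon_vma;
};

/* Lock the page's anon_vma, or return NULL if it changed while we were
 * acquiring the lock (freed, and perhaps reused for another anon_vma). */
struct anon_vma_model *lock_anon_vma_model(struct page_model *page)
{
	struct anon_vma_model *av = atomic_load(&page->anon_vma);

	if (!av)
		return NULL;
	/* Safe only because av's memory is still some anon_vma. */
	pthread_mutex_lock(&av->lock);
	if (atomic_load(&page->anon_vma) != av) {
		pthread_mutex_unlock(&av->lock);   /* stale: back off */
		return NULL;
	}
	return av;
}
```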
      
      Fix concerns raised by Manfred: this flag is incompatible with poisoning and
      destructor, and kmem_cache_destroy needs to synchronize_kernel.
      
      I hope SLAB_DESTROY_BY_RCU may be useful elsewhere; but though it's safe for
      little anon_vma, I'd be reluctant to use it on any caches whose immediate
      shrinkage under pressure is important to the system.
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] rmaplock: kill page_map_lock · edcc56dc
      Hugh Dickins authored
      The pte_chains rmap used pte_chain_lock (bit_spin_lock on PG_chainlock) to
      lock its pte_chains.  We kept this (as page_map_lock: bit_spin_lock on
      PG_maplock) when we moved to objrmap.  But the file objrmap locks its vma tree
      with mapping->i_mmap_lock, and the anon objrmap locks its vma list with
      anon_vma->lock: so isn't the page_map_lock superfluous?
      
      Pretty much, yes.  The mapcount was protected by it, and needs to become an
      atomic: starting at -1 like page _count, so nr_mapped can be tracked precisely
      up and down.  The last page_remove_rmap can't clear anon page mapping any
      more, because of races with page_add_rmap; from which some BUG_ONs must go for
      the same reason, but they've served their purpose.
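      The -1 bias can be sketched with C11 atomics: the -1 to 0 and 0 to -1 transitions detect the first mapping and the last unmapping, so nr_mapped moves exactly once per page without a lock. The kernel's atomic_inc_and_test() and atomic_add_negative() express the same two tests; the structure and function names here are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical model of a page's mapcount: -1 means "not mapped". */
struct page_model {
	atomic_int mapcount;
};

static atomic_long nr_mapped;   /* pages with at least one pte */

void page_model_init(struct page_model *page)
{
	atomic_init(&page->mapcount, -1);
}

void page_add_rmap_model(struct page_model *page)
{
	/* Like atomic_inc_and_test(): true when the new value is 0. */
	if (atomic_fetch_add(&page->mapcount, 1) + 1 == 0)
		atomic_fetch_add(&nr_mapped, 1);
}

void page_remove_rmap_model(struct page_model *page)
{
	/* Like atomic_add_negative(-1, ...): true when the new value is < 0. */
	if (atomic_fetch_sub(&page->mapcount, 1) - 1 < 0)
		atomic_fetch_sub(&nr_mapped, 1);
}
```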
      
      vmscan decisions are naturally racy, little change there beyond removing
      page_map_lock/unlock.  But to stabilize the file-backed page->mapping against
      truncation while acquiring i_mmap_lock, page_referenced_file now needs page
      lock to be held even for refill_inactive_zone.  There's a similar issue in
      acquiring anon_vma->lock, where page lock doesn't help: which this patch
      pretends to handle, but actually it needs the next patch.
      
      Roughly 10% cut off lmbench fork numbers on my 2*HT*P4.  Must confess my
      testing failed to show the races even while they were knowingly exposed: would
      benefit from testing on racier equipment.
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] rmaplock: PageAnon in mapping · 6f055bc1
      Hugh Dickins authored
      First of a batch of five patches to eliminate rmap's page_map_lock, replace
      its trylocking by spinlocking, and use anon_vma to speed up swapoff.
      
      Patches updated from the originals against 2.6.7-mm7: nothing new so I won't
      spam the list, but including Manfred's SLAB_DESTROY_BY_RCU fixes, and omitting
      the unuse_process mmap_sem fix already in 2.6.8-rc3.
      
      
      This patch:
      
      Replace the PG_anon page->flags bit by setting the lower bit of the pointer in
      page->mapping when it's anon_vma: PAGE_MAPPING_ANON bit.
      
      We're about to eliminate the locking which kept the flags and mapping in
      synch: it's much easier to work on a local copy of page->mapping, than worry
      about whether flags and mapping are in synch (though I imagine it could be
      done, at greater cost, with some barriers).
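      Pointer tagging of this kind can be sketched in plain C: because the objects are at least 2-byte aligned, bit 0 of page->mapping is free to say "this is an anon_vma". PAGE_MAPPING_ANON is the name from the patch; the structures and helpers below are hypothetical. Note the single local copy of the pointer, which keeps the tag test and the decoded pointer consistent.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_MAPPING_ANON 1UL   /* low bit of page->mapping */

/* Hypothetical stand-ins; the int member makes them 4-byte aligned. */
struct address_space_model { int unused; };
struct anon_vma_model { int unused; };

struct page_model {
	struct address_space_model *mapping;
};

void set_page_anon(struct page_model *page, struct anon_vma_model *av)
{
	page->mapping = (struct address_space_model *)
		((uintptr_t)av | PAGE_MAPPING_ANON);
}

int page_anon_model(const struct page_model *page)
{
	return ((uintptr_t)page->mapping & PAGE_MAPPING_ANON) != 0;
}

struct anon_vma_model *page_anon_vma_model(const struct page_model *page)
{
	/* Work on one local copy, so the tag test and pointer agree. */
	uintptr_t mapping = (uintptr_t)page->mapping;

	if (!(mapping & PAGE_MAPPING_ANON))
		return NULL;
	return (struct anon_vma_model *)(mapping & ~PAGE_MAPPING_ANON);
}
```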
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] Fix /proc/pid/statm documentation · 52ad51e6
      Roger Luethi authored
      I really wanted /proc/pid/statm to die and I still believe the
      reasoning is valid.  As it doesn't look like that is going to happen,
      though, I offer this fix for the respective documentation.  Note: lrs/drs
      fields are switched.
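      For reference, a small parser for one line of /proc/<pid>/statm, assuming the field order size, resident, shared, trs (text), lrs (library), drs (data + stack), dt (dirty), all in pages, as listed in current proc(5). parse_statm() and struct statm are illustrative names; on 2.6 kernels lrs and dt may simply read as 0.

```c
#include <assert.h>
#include <stdio.h>

/* Fields of /proc/<pid>/statm, all counts in pages. */
struct statm {
	unsigned long size;      /* total program size */
	unsigned long resident;  /* resident set size */
	unsigned long shared;    /* shared pages */
	unsigned long text;      /* trs: text (code) */
	unsigned long lib;       /* lrs: library */
	unsigned long data;      /* drs: data + stack */
	unsigned long dirty;     /* dt: dirty pages */
};

/* Parse one statm line; returns 0 on success, -1 if a field is missing. */
int parse_statm(const char *line, struct statm *s)
{
	int n = sscanf(line, "%lu %lu %lu %lu %lu %lu %lu",
		       &s->size, &s->resident, &s->shared, &s->text,
		       &s->lib, &s->data, &s->dirty);

	return n == 7 ? 0 : -1;
}
```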
      Signed-off-by: Roger Luethi <rl@hellgate.ch>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] Automatically enable bigsmp on big HP machines · c178f392
      Arjan van de Ven authored
      This enables apic=bigsmp automatically on some big HP machines that need
      it.  This makes them boot without kernel parameters on a generic arch
      kernel.
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] ia64: dma_mapping fix · a6843b89
      William Lee Irwin III authored
      We need to be able to dereference struct device in
      include/asm-ia64/dma-mapping.h.
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] md: make MD no device warning KERN_WARNING · d033bbf5
      Andi Kleen authored
      Prevents some noise during boot up when no MD volumes are found.
      
      I think I picked it up from someone else, but I cannot remember from whom
      (sorry)
      
      Cc: Neil Brown <neilb@cse.unsw.edu.au>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] Make MAX_INIT_ARGS 32 · 54d68822
      Pete Zaitcev authored
      We at Red Hat have shipped with a larger limit for quite some time; it
      was required for installations on IBM mainframes (s390), which don't have
      a good way to pass arguments.
      
      There are a number of reasonable situations that go past the current limit
      of 8.  One that comes to mind is when you want to perform a manual vnc
      install on a headless machine using anaconda.  This requires passing in a
      number of parameters to get anaconda past the initial (no-gui) loader
      screens.
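      A hedged sketch of how such a limit bites, loosely modelled on the way init/main.c collects unrecognised boot options into argv_init; add_init_arg() is a hypothetical helper that returns -1 where the kernel would panic. Slot 0 holds the program name and one slot is kept for the terminating NULL, hence MAX_INIT_ARGS + 2.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_INIT_ARGS 32

/* argv for init: slot 0 is the program name, plus a terminating NULL. */
static const char *argv_init[MAX_INIT_ARGS + 2] = { "init", NULL };

/* Append a boot parameter the kernel did not recognise; returns -1
 * once MAX_INIT_ARGS parameters have already been collected. */
int add_init_arg(const char *param)
{
	int i;

	for (i = 0; argv_init[i]; i++) {
		if (i == MAX_INIT_ARGS)
			return -1;
	}
	argv_init[i] = param;
	return 0;
}
```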
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] AIO: workqueue context switch reduction · e84e486c
      Suparna Bhattacharya authored
      From: Chris Mason
      
      I compared the 2.6 pipetest results with the 2.4 suse kernel, and 2.6 was
      roughly 40% slower.  During the pipetest run, 2.6 generates ~600,000
      context switches per second while 2.4 generates 30 or so.
      
      aio-context-switch (attached) has a few changes that reduce our context
      switch rate, and bring performance back up to 2.4 levels.  These have only
      really been tested against pipetest, they might make other workloads worse.
      
      The basic theory behind the patch is that it is better for the userland
      process to call run_iocbs than it is to schedule away and let the worker
      thread do it.
      
      1) on io_submit, use run_iocbs instead of run_iocb
      2) on io_getevents, call run_iocbs if no events were available.
      3) don't let two procs call run_iocbs for the same context at the same
         time.  They just end up bouncing on spinlocks.
      
      The first three optimizations got me down to 360,000 context switches per
      second, and they help build a little structure to allow optimization #4,
      which uses queue_delayed_work(HZ/10) instead of queue_work. 
      
      That brings down the number of context switches to 2.4 levels.
      
      Adds aio_run_all_iocbs so that normal processes can run all the pending
      retries on the run list.  This allows worker threads to keep using list
      splicing, but regular procs get to run the list until it stays empty.  The
      end result should be less work for the worker threads.
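      Optimization #3, together with running the list until it stays empty, can be sketched as a per-context "running" flag taken with an atomic exchange: a task that finds it set returns at once instead of bouncing on the lock. The flag, counters and function name are hypothetical, and the sketch ignores the window where work arrives just before the flag is cleared, which real code has to cover by queueing work.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of an ioctx with a count of pending retries. */
struct ioctx_model {
	atomic_bool running;      /* someone is already running the list */
	atomic_int pending;       /* iocbs waiting on the run list */
	int completed;
};

/* Run pending retries unless another task is already doing it for this
 * context; contenders just return instead of spinning on a lock. */
bool run_iocbs_model(struct ioctx_model *ctx)
{
	if (atomic_exchange(&ctx->running, true))
		return false;            /* somebody else owns the list */
	while (atomic_load(&ctx->pending) > 0) {
		atomic_fetch_sub(&ctx->pending, 1);
		ctx->completed++;        /* stands in for one iocb retry */
	}
	atomic_store(&ctx->running, false);
	return true;
}
```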
      
      I was able to trigger short stalls (1sec) with aio-stress, and with the
      current patch they are gone.  Could be wishful thinking on my part though,
      please let me know how this works for you.
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] AIO: Splice runlist for fairness across io contexts · 068b52c1
      Suparna Bhattacharya authored
      This patch tries to be a little fairer across multiple io contexts in handling
      retries, helping make sure progress happens uniformly across different io
      contexts (especially if they are acting on independent queues).
      
      It splices the ioctx runlist before processing it in __aio_run_iocbs.  If
      new iocbs get added to the ctx in meantime, it queues a fresh workqueue
      entry instead of handling them right away, so that other ioctxs' retries get
      a chance to be processed before the newer entries in the queue.
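      The splice-then-requeue idea in miniature, using a hypothetical singly linked run list: the whole list is detached before processing, entries kicked again meanwhile land on the live list, and instead of looping on them the context asks for a fresh work item (a flag here, a fresh workqueue entry in the patch).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of an iocb on a context's run list. */
struct iocb_model {
	struct iocb_model *next;
	int remaining;            /* retries still needed */
};

struct ctx_model {
	struct iocb_model *run_list;
	bool requeued;            /* a fresh work entry was requested */
};

/* One retry; an unfinished iocb is kicked again onto the live list. */
static void retry_iocb(struct ctx_model *ctx, struct iocb_model *iocb)
{
	if (--iocb->remaining > 0) {
		iocb->next = ctx->run_list;
		ctx->run_list = iocb;
	}
}

/* Splice off the current run list and process only those entries; if
 * more arrived meanwhile, requeue rather than loop, so other contexts'
 * work already in the queue gets a turn first. */
void run_spliced(struct ctx_model *ctx)
{
	struct iocb_model *batch = ctx->run_list;

	ctx->run_list = NULL;
	while (batch) {
		struct iocb_model *next = batch->next;  /* retry may relink */

		retry_iocb(ctx, batch);
		batch = next;
	}
	if (ctx->run_list)
		ctx->requeued = true;
}
```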
      
      This might make a difference in a situation where retries are getting
      queued very fast on one ioctx, while the workqueue entry for another ioctx
      is stuck behind it.  I've only seen this occasionally earlier and can't
      recreate it consistently, but it may be worth including anyway.
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] AIO: retry infrastructure fixes and enhancements · 63b05203
      Suparna Bhattacharya authored
      From: Daniel McNeil <daniel@osdl.org>
      From: Chris Mason <mason@suse.com>
      
       AIO: retry infrastructure fixes and enhancements
      
       Reorganises, comments and fixes the AIO retry logic. Fixes 
       and enhancements include:
      
         - Split iocb setup and execution in io_submit
              (also fixes io_submit error reporting)
         - Use aio workqueue instead of keventd for retries
         - Default high level retry methods
         - Subtle use_mm/unuse_mm fix
         - Code commenting
         - Fix aio process hang on EINVAL (Daniel McNeil)
         - Hold the context lock across unuse_mm
         - Acquire task_lock in use_mm()
         - Allow fops to override the retry method with their own
         - Elevated ref count for AIO retries (Daniel McNeil)
         - set_fs needed when calling use_mm
         - Flush workqueue on __put_ioctx (Chris Mason)
         - Fix io_cancel to work with retries (Chris Mason)
         - Read-immediate option for socket/pipe retry support
      
       Note on default high-level retry methods support
       ================================================
      
       High-level retry methods allow an AIO request to be executed as a series of
       non-blocking iterations, where each iteration retries the remaining part of
       the request from where the last iteration left off, by reissuing the
       corresponding AIO fop routine with modified arguments representing the
       remaining I/O.  The retries are "kicked" via the AIO waitqueue callback
       aio_wake_function() which replaces the default wait queue entry used for
       blocking waits.
      
       The high level retry infrastructure is responsible for running the
       iterations in the mm context (address space) of the caller, and ensures that
       only one retry instance is active at a given time, thus relieving the fops
       themselves from having to deal with potential races of that sort.
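      A minimal model of a high-level retry, assuming a hypothetical non-blocking read that moves at most a few bytes per call and sometimes reports -EAGAIN. Each iteration reissues the read for the remaining part only, and the caller retries until the request completes. In the kernel the next iteration would be kicked by aio_wake_function() and run on the aio workqueue; here a plain loop stands in for that.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical non-blocking source: hands out at most `chunk` bytes per
 * call and returns -EAGAIN on every other call, like a slow pipe. */
struct source_model {
	const char *data;
	size_t len, pos, chunk;
	int calls;
};

static ssize_t nb_read(struct source_model *src, char *buf, size_t count)
{
	size_t n;

	if (src->calls++ % 2 == 0)
		return -EAGAIN;              /* would block: wait for a kick */
	n = src->len - src->pos;
	if (n > count)
		n = count;
	if (n > src->chunk)
		n = src->chunk;
	memcpy(buf, src->data + src->pos, n);
	src->pos += n;
	return (ssize_t)n;
}

/* Hypothetical iocb: what is left of the request. */
struct iocb_model {
	char *buf;
	size_t left;
	size_t done;
};

/* One retry iteration for the remaining part of the request.
 * Returns 1 when complete (or at end of data), 0 to be retried. */
int aio_retry_once(struct iocb_model *iocb, struct source_model *src)
{
	ssize_t ret = nb_read(src, iocb->buf, iocb->left);

	if (ret == -EAGAIN)
		return 0;
	iocb->buf += ret;                /* advance past what was read */
	iocb->left -= (size_t)ret;
	iocb->done += (size_t)ret;
	return iocb->left == 0 || ret == 0;
}
```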
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] cpqfc: add missing pci_enable_device() · 86b9159a
      Bjorn Helgaas authored
      Add pci_enable_device()/pci_disable_device().  In the past, drivers
      often worked without this, but it is now required in order to route
      PCI interrupts correctly.
      Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] de4x5.c: add missing pci_enable_device() · 0ad8ac84
      Bjorn Helgaas authored
      Add pci_enable_device()/pci_disable_device().  In the past, drivers
      often worked without this, but it is now required in order to route
      PCI interrupts correctly.
      Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] ioc3-eth.c: add missing pci_enable_device() · aa22e9a9
      Bjorn Helgaas authored
      Add pci_enable_device()/pci_disable_device().  In the past, drivers often
      worked without this, but it is now required in order to route PCI interrupts
      correctly.
      Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] hp100.c: add missing pci_enable_device() · 41b3f604
      Bjorn Helgaas authored
      Add pci_enable_device()/pci_disable_device().  In the past, drivers often
      worked without this, but it is now required in order to route PCI interrupts
      correctly.
      Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] ibmasm: add missing pci_enable_device() · 94ae67e9
      Bjorn Helgaas authored
      Add pci_enable_device()/pci_disable_device().  In the past, drivers often
      worked without this, but it is now required in order to route PCI
      interrupts correctly.
      Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] tpam_main.c: add missing pci_enable_device() · f09e59b4
      Bjorn Helgaas authored
      Add pci_enable_device()/pci_disable_device().  In the past, drivers
      often worked without this, but it is now required in order to route
      PCI interrupts correctly.
      Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] ip2main.c: add missing pci_enable_device() · dcd769e6
      Bjorn Helgaas authored
      I don't have this hardware, so this has been compiled but not tested.
      
      Add pci_enable_device()/pci_disable_device().  In the past, drivers often worked
      without this, but it is now required in order to route PCI interrupts
      correctly.  In addition, this driver incorrectly used the IRQ value from PCI
      config space rather than the one in the struct pci_dev.
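      The probe ordering these patches add, sketched as a userspace mock because pci_enable_device() and struct pci_dev only exist in-kernel; pci_dev_mock and the *_mock functions are hypothetical stand-ins. The mock enables the device first, only then reads dev->irq (never the raw config-space interrupt line), and pairs the enable with a disable on removal.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mock of the pieces a PCI probe routine touches. */
struct pci_dev_mock {
	bool enabled;
	unsigned int irq;            /* valid only after enabling */
	unsigned char cfg_irq_line;  /* raw byte from config space */
	unsigned int platform_irq;   /* what interrupt routing would assign */
};

static int pci_enable_device_mock(struct pci_dev_mock *dev)
{
	dev->enabled = true;
	/* Routing may assign an IRQ different from the config-space byte. */
	dev->irq = dev->platform_irq;
	return 0;
}

static void pci_disable_device_mock(struct pci_dev_mock *dev)
{
	dev->enabled = false;
}

/* Probe: enable first, then take the IRQ from the device structure. */
int probe_mock(struct pci_dev_mock *dev, unsigned int *irq_out)
{
	int err = pci_enable_device_mock(dev);

	if (err)
		return err;
	*irq_out = dev->irq;
	return 0;
}

/* Remove: undo the enable done in probe. */
void remove_mock(struct pci_dev_mock *dev)
{
	pci_disable_device_mock(dev);
}
```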
      Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] idt77252.c: add missing pci_enable_device() · fac6ab89
      Bjorn Helgaas authored
      Add pci_enable_device()/pci_disable_device().  In the past, drivers often
      worked without this, but it is now required in order to route PCI
      interrupts correctly.
      Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] don't pass mem_map into init functions · 01af8988
      Dave Hansen authored
        When using CONFIG_NONLINEAR, a zone's mem_map isn't contiguous, and isn't
        allocated in the same place.  This means that nonlinear doesn't really have
        a mem_map[] to pass into free_area_init_node() or memmap_init_zone() which
        makes any sense.
      
        So, this patch removes the 'struct page *mem_map' argument to both of
        those functions.  All non-NUMA architectures just pass a NULL in there,
        which is ignored.  The solution on the NUMA arches is to pass the mem_map in
        via the pgdat, which works just fine.
      
        To replace the removed arguments, a call to pfn_to_page(node_start_pfn) is
        made.  This is valid because all of the pfn_to_page() implementations rely
        only on the pgdats, which are already set up at this time.  Plus, the
        pfn_to_page() method should work for any future nonlinear-type code.  
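      A simplified model of the replacement, assuming a discontiguous layout where each pgdat owns its own node_mem_map: pfn_to_page() needs only the pgdats, so init code can recover a node's first struct page from node_start_pfn instead of taking a mem_map argument. The structure and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified discontiguous-memory model. */
struct page_model { unsigned long flags; };

struct pglist_data_model {
	unsigned long node_start_pfn;
	unsigned long node_spanned_pages;
	struct page_model *node_mem_map;   /* this node's own page array */
};

#define MAX_NODES 2
static struct pglist_data_model node_data[MAX_NODES];
static int nr_nodes;

/* pfn_to_page() that relies only on the pgdats. */
struct page_model *pfn_to_page_model(unsigned long pfn)
{
	for (int nid = 0; nid < nr_nodes; nid++) {
		struct pglist_data_model *pgdat = &node_data[nid];

		if (pfn >= pgdat->node_start_pfn &&
		    pfn < pgdat->node_start_pfn + pgdat->node_spanned_pages)
			return pgdat->node_mem_map + (pfn - pgdat->node_start_pfn);
	}
	return NULL;
}

/* Instead of a mem_map argument: derive the node's first page. */
struct page_model *node_first_page(int nid)
{
	return pfn_to_page_model(node_data[nid].node_start_pfn);
}
```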
      
        Finally, the patch creates a function: node_alloc_mem_map(), which I plan
        to effectively #ifdef out for nonlinear at some future date. 
      
        Compile tested and booted on SMP x86, NUMAQ, and ppc64.
      
      From: Jesse Barnes <jbarnes@engr.sgi.com>
      
        Fix up ia64 specific memory map init function in light of Dave's
        memmap_init cleanups.
      Signed-off-by: Jesse Barnes <jbarnes@sgi.com>
      
      From: Dave Hansen <haveblue@us.ibm.com>
      
        Looks like I missed a couple of architectures.  This patch, on top of my
        previous one and Jesse's should clean up the rest.
      
      From: William Lee Irwin III <wli@holomorphy.com>
      
        x86-64 wouldn't compile with NUMA support on, as node_alloc_mem_map()
        references mem_map outside #ifdefs on CONFIG_NUMA/CONFIG_DISCONTIGMEM.  This
        patch wraps that reference in such an #ifdef.
      
      From: William Lee Irwin III <wli@holomorphy.com>
      
        Initializing NODE_DATA(nid)->node_mem_map prior to calling it should do.
      
      From: Dave Hansen <haveblue@us.ibm.com>
      
        Rick, I bet you didn't think your nerf weapons would be so effective in
        getting that compile error fixed, did you?
      
        Applying the attached patch and commenting out this line:
      
        arch/i386/kernel/nmi.c: In function `proc_unknown_nmi_panic':
        arch/i386/kernel/nmi.c:558: too few arguments to function `proc_dointvec'
      
        will let it compile.  
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] watchdog: fix warning "defined but not used" · 14297505
      Guillaume Thouvenin authored
      Function wdtpci_init_one() in file wdt_pci.c generates a warning when
      compiling the watchdog driver.
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] first/next_cpu returns values > NR_CPUS · f9bbb9e8
      William Lee Irwin III authored
      Zwane Mwaikambo <zwane@fsmlabs.com> wrote:
      
        The following caused some fireworks whilst merging i386 cpu hotplug.
        any_online_cpu(0x2) returns 32 on i386 if we're forced to continue past the
        only set bit due to the additional find_first_bit in the find_next_bit i386
        implementation.  Not wanting to change current behaviour in the bitops
        primitives and since the NR_CPUS thing is a cpumask issue, I've opted to fix
        next_cpu() and first_cpu() instead.
      
      This might save a couple of lines of code.
      
      From: <akpm@osdl.org>
      
        Fix cross-arch ulong/int disaster with find_next_bit().
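      The clamp can be sketched as follows, assuming (as described above) a word-at-a-time find_next_bit() that may report the end of the word rather than the requested size; the helper names and the 8-CPU configuration are hypothetical. Clamping with a min() keeps callers from ever seeing a CPU number above NR_CPUS.

```c
#include <assert.h>

#define NR_CPUS 8
#define BITS_PER_WORD 32

/* Hypothetical word-at-a-time search that ignores `size` and reports
 * BITS_PER_WORD (32) when no further bit is set. */
static int find_next_bit_wordwise(unsigned int word, int size, int offset)
{
	(void)size;                      /* deliberately ignored */
	for (int bit = offset; bit < BITS_PER_WORD; bit++)
		if (word & (1u << bit))
			return bit;
	return BITS_PER_WORD;
}

static int min_int(int a, int b)
{
	return a < b ? a : b;
}

/* The fix: clamp so callers only ever see values <= NR_CPUS. */
int first_cpu_model(unsigned int mask)
{
	return min_int(NR_CPUS, find_next_bit_wordwise(mask, NR_CPUS, 0));
}

int next_cpu_model(int cpu, unsigned int mask)
{
	return min_int(NR_CPUS, find_next_bit_wordwise(mask, NR_CPUS, cpu + 1));
}
```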
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] New x86-64 merge · 1a87fc37
      Andi Kleen authored
      This fixes various issues in the previous update, in particular
      a kernel without CONFIG_GART_IOMMU should boot again.
      
      The kernel now discovers PCI bus<->CPU affinity on AMD systems.
      So far it is used by dma_alloc_coherent to allocate memory.
      Experimental patches to add this to sysfs exist, but they're not
      included yet. On systems with no memory on a CPU this information may
      be wrong.
      
      It has a new experimental CONFIG_UNORDERED_IO option. When enabled
      it uses write combining for stores to device I/O memory mappings. This
      may give better performance with some device drivers, but has a slight
      risk of breaking drivers (in general, if a driver works on ia64, ppc64 or sparc64
      it should also work). Based on some discussions with Grant Grundler.
      
      It requires the driver to use memory barriers properly. I would be interested 
      in feedback on any performance changes you're seeing. For a production system I
      would recommend keeping it turned off (although I run it on all my systems and
      haven't run into any problems yet)
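      What using memory barriers properly means for a driver, sketched as a hedged analogy: fill in the descriptor, order those stores, then ring the doorbell. A real driver would write ioremap()ed registers and use wmb(); this userspace model uses a C11 release fence on ordinary memory, and the register layout is hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical device: a descriptor plus a doorbell register. */
struct dev_regs_model {
	uint32_t desc_addr;
	uint32_t desc_len;
	_Atomic uint32_t doorbell;
};

void post_request(struct dev_regs_model *regs, uint32_t addr, uint32_t len)
{
	regs->desc_addr = addr;
	regs->desc_len = len;
	/* With unordered (write-combined) stores the device could see the
	 * doorbell before the descriptor; the barrier forbids that. The C11
	 * release fence stands in for the kernel's wmb(). */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&regs->doorbell, 1, memory_order_relaxed);
}
```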
      
      ACPI and Centrino speedstep is enabled now for Nocona systems.
      
      The IOMMU code does lazy merging by default now, which should be safe
      and may increase performance on block IO.  It also avoids SAC force by default
      now.
      
      The machine check code has been improved again; hopefully it is good
      now. It now logs machine check events from before the last reset.
      
      The x86-64 parts are now gcc 3.5 clean.
      
      And various other fixes
      
      - Update defconfig
      - Reset lost ticks on lost time warning, print RIP.
      - Make TASK_SIZE test for 32bit (Arjan van de Ven) 
      - Work around bug in generic code that broke pcibus_to_cpumask
      - Actually fix dummy iommu code
      - Compile i386 acpi and speedstep-centrino cpufreq modules
      - Export cpu_khz
      - Fix compilation without GART_IOMMU
      - Optimize find_*_bit functions for small fields
      - Discover nodes near PCI busses on K8 (Travis Betak, changed by me) 
      - Optimize gart tlb flush slightly
      - Add experimental CONFIG_UNORDERED_IO for unordered IO stores
      - Add 32bit emulation for PTRACE_GETEVENTMSG
      - Fix kernel_fpu_{begin,end} for preemptive kernels (Alexander Nyberg)
      - Readd proper check for biomerge (got lost) 
      - Set up 32bit vsyscall page for ptrace early
      - Add 32bit emulation for lookup_dcookie() for oprofile
      - Export copy_page / clear_page
      - Use rex prefix in save_init_fpu fxsave (Jan Beulich)
      - Make it compile again
      - Fix handling of hwdev == NULL (= ISA/LPC devices) in swiotlb
      - Convert PCI DMA code to dma devices
      - Change IOMMU code to use dummy fallback device instead of hardcoded
        NULL tests everywhere.
      - Test iommu_sac_force instead of nommu for DAC supported macro
        (will cause more drivers to use DAC)
      - Harden non-IOMMU dma_alloc_consistent code so it is less likely to fail.
      - Remove use of strsep in option parsers
      - Remove duplicated exports (Arjan van de Ven)
      - Fix EFAULT checking in ptrace (John Blackwood)
      - Update defconfig
      - Remove dead URL from boot/setup.S (R.J. Wysocki) 
      - Use compat_sigval_t instead of sigval_t32 (Al Viro)
      - Nanooptimization in 32bit ptregs calls
      - Fix gcc 3.5 compilation in mtrr.h 
      - Pass pt_regs as pointer to avoid illegal pass by reference (for gcc 3.5)
      - Make set_bit take int not long (Harald Dunkel)
      - Avoid panic on pci_map_sg and pci_alloc_consistent overflow in GART IOMMU
      - Handle large lost time delays in HPET code (Suresh B. Siddha)
      - Work around theoretical bugs in prefetch handling (suggested by Jamie Lokier)
      - Remove mtrr_strings declaration for gcc 3.5
      - Set KBUILD_IMAGE for make rpm (William Lee Irwin III)
      - Add iommu=noaperture to not touch the aperture
      - Clean up argument parsing for iommu= option
      - Export symbols for xchgadd based rwsems (still disabled)
      - Define iommu_bio_merge for !CONFIG_GART_IOMMU
      - Don't use backwards rep ; movsb for memmove
      - Move bitmap search functions out of line (saves 8k .text, from i386)
      - Convert bitmap search functions to 64bit accesses and optimize them
        a bit.
      - Handle corrupted page tables in page fault handler
      - Set iommu_merge (without force) to on by default again.
      - Don't do bio merging by default for iommu=merge. This should make it
        safe to use again
      - Add iommu=biomerge option to enable BIO merging (like old iommu=merge)
      - Fix iommu=memaper=... parsing
      - More MCE fixes (based on a patch by Eric Morton, heavily changed by me)
      - Fix check for banks causing exceptions
      - Allow MCEs to be reinitialized later even after mce=off, so they can be
        disabled at boot but reenabled later; fix wrong use of __initdata
      - Log left over machine checks after boot and resume
      - Fix missing prototype warning with CPU_FREQ on
      - Fix parsing of noexec=on (Ian Hastie)
      - Fix warning in ia32_binfmt.c
      - Resync time variable cpu frequency handling with i386
      - Resync msr.c with i386
      - Add 0x60 level 1 intel cache descriptor (from i386)
      - Remove duplicated 32bit ioctls (Arnd Bergmann)
      - Enable -msoft-float (from i386)
      - Use faster version of FPU hang fix - handle the exception
        * a bit experimental, if you see "kernel ... math error" events
          in the log please report.
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] preset loops_per_jiffy for faster booting · b1541be9
      Adam Kropelin authored
      Adds a kernel boot parameter "lpj=NNN" which allows the operator to specify
      the loops-per-jiffy value.  This shaves up to a quarter of a second off
      boot times, which are critical for embedded appliances.
      
      It's a bit thin, but the code is in __init.
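      A sketch of the mechanism, with illustrative names: the boot-parameter handler stores the operator's value, and the calibration step uses it instead of timing the delay loop. slow_calibration() is a hypothetical stand-in for the measurement, and its return value is arbitrary.

```c
#include <assert.h>
#include <stdlib.h>

static unsigned long preset_lpj;       /* from "lpj=NNN", 0 if absent */
static unsigned long loops_per_jiffy;

/* Boot-parameter handler: remember the operator-supplied value. */
int lpj_setup(const char *str)
{
	preset_lpj = strtoul(str, NULL, 0);
	return 1;
}

/* Hypothetical stand-in for the timed delay-loop calibration. */
static unsigned long slow_calibration(void)
{
	return 4980736;                /* arbitrary value for the sketch */
}

void calibrate_delay_model(void)
{
	if (preset_lpj)
		loops_per_jiffy = preset_lpj;   /* skip the measurement */
	else
		loops_per_jiffy = slow_calibration();
}
```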
      Signed-off-by: Adam Kropelin <akropel1@rochester.rr.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • Mika Kukkonen's avatar
      [PATCH] Fix drivers/isdn/hisax/avm_pci.c build warning when !CONFIG_ISAPNP · f2c2e878
      Mika Kukkonen authored
        CC [M]  drivers/isdn/hisax/avm_pci.o
      drivers/isdn/hisax/avm_pci.c: In function `setup_avm_pcipnp':
      drivers/isdn/hisax/avm_pci.c:817: warning: label `ready' defined but not used
      
      The patch is big because I replaced the '} else { ...  }' with 'goto ready; }'
      and so had to remove one level of indentation from the code.
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • Jeff Dike
      [PATCH] Make UML build and run · 51b9cfe9
      Jeff Dike authored
      This patch includes the following:
      	updated defconfig
      	move uml.lds.S and main.c from arch/um to arch/um/kernel per Sam's suggestions
      	steal bitops.c from arch/i386
      	convert all calls to open_private_file to dentry_open
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • Jeff Dike
      [PATCH] UML fixes · 7420e97c
      Jeff Dike authored
      The patch below fixes a few UML-specific bugs not related to the rest of the
      kernel:
      	a bogus error return and some formatting in the fork code
      	correct calculation of task.thread.kernel_stack
      	remove a bogus panic
      	a couple of fixes to allow UML to boot in the presence of exec-shield
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • Jeff Dike
      [PATCH] UML updates · 04fa5a56
      Jeff Dike authored
      The patch below brings UML up to date with interface changes and the like:
      	irq.c includes profile.h to bring in a missing definition
      	use the cpu_{set,clear} interface
      	use the new get_signal_to_deliver interface
      	define instruction_pointer
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • Coywolf Qi Hunt
      [PATCH] uml: remove a group of unused bh functions · 51461bbe
      Coywolf Qi Hunt authored
      This patch removes a group of unused bh functions in um.  This 2.2 legacy
      code should be cleaned up.
      Signed-off-by: Coywolf Qi Hunt <coywolf@greatcn.org>
      Cc: Jeff Dike <jdike@addtoit.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • Paolo 'Blaisorblade' Giarrusso
      [PATCH] uml: Fix os_process_pc and os_process_parent for corner cases. · 0d59a6c4
      Paolo 'Blaisorblade' Giarrusso authored
      Update os_process_pc and os_process_parent: PIDs can now exceed 32768 (so
      increase the number of digits), and make them work even when the command
      name contains spaces.
      Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade_spam@yahoo.it>
      Cc: Jeff Dike <jdike@addtoit.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • Paolo 'Blaisorblade' Giarrusso
      [PATCH] uml: little-kmalloc · a15968fe
      Paolo 'Blaisorblade' Giarrusso authored
      Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade_spam@yahoo.it>
      Cc: Jeff Dike <jdike@addtoit.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • Paolo 'Blaisorblade' Giarrusso
      [PATCH] uml: Make malloc() call vmalloc if needed. Needed for hostfs on 2.6 host. · fbb214aa
      Paolo 'Blaisorblade' Giarrusso authored
      From: Oleg Drokin <green@linuxhacker.ru>, Jeff Dike <jdike@addtoit.com>, and
      me
      
      With this patch, malloc calls vmalloc if size > 128K, and free detects
      whether to call vfree, kfree or __real_free().  The 2.4 version could forget
      to free() something; this has been fixed.
      Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade_spam@yahoo.it>
      Cc: Jeff Dike <jdike@addtoit.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • Paolo 'Blaisorblade' Giarrusso
      [PATCH] uml: Removes dead code in trap_kern.c · d32870ad
      Paolo 'Blaisorblade' Giarrusso authored
      That code comes from the out_of_memory section.  In 2.4 it was correct to put
      it under "default:", since it was reached when the handle_mm_fault() return
      value was not 0, 1 or 2, i.e. when it was 3 (OOM); the i386 code put it out
      of line for better performance.  Here, instead, the OOM case is handled on
      its own, so if handle_mm_fault() returns anything other than the listed
      cases, we must BUG().
      Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade_spam@yahoo.it>
      Cc: Jeff Dike <jdike@addtoit.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • Paolo 'Blaisorblade' Giarrusso
      [PATCH] uml: Avoids a panic for a legal situation · 3b2dcf38
      Paolo 'Blaisorblade' Giarrusso authored
      From: Alex Züpke <azu@sysgo.de>, and me
      
      SKAS mode is like 4G/4G (here we actually have 3G/3G) for guest processes, so
      when checking for kernel stack overflow, we must first make sure we are
      checking a kernel-space address.  Also, correctly test for stack overflows
      (i.e. check whether less than 1k of stack is left; see
      arch/i386/kernel/irq.c:do_IRQ()).  And THREAD_SIZE is not necessarily
      PAGE_SIZE * 2 (though this setting is almost never changed, so we didn't
      notice this).  Thanks to the good eye of Alex Züpke <azu@sysgo.de> for first
      spotting this bug and providing a test program:
      
      /*
       * trigger.c - triggers panic("Kernel stack overflow") in UML
       *
       * 20040630, azu@sysgo.de
       */
      
      #include <stdio.h>
      #include <fcntl.h>
      #include <unistd.h>
      #include <sys/types.h>
      #include <sys/stat.h>
      #include <sys/mman.h>
      #include <sys/wait.h>
      
      #define LOW  0xa0000000
      #define HIGH 0xb0000000
      
      int main(void)
      {
      	unsigned long addr;
      	int fd;
      
      	fd = open("/dev/zero", O_RDWR);
      	if(fd == -1)
      	{
      		perror("open /dev/zero");
      		return 1;
      	}
      
      	printf("This may take some time ... one more cup of coffee ...\n");
      
      	for(addr = LOW; addr < HIGH; addr += 0x1000)
      	{
      		pid_t p;
      		if(mmap((void*)addr, 0x1000, PROT_READ, MAP_SHARED | MAP_FIXED, fd, 0) == MAP_FAILED)
      		{
      			printf("mmap failed\n");
      			continue;
      		}
      
      		p = fork();
      		if(p == -1)
      			printf("fork failed\n");
      
      		if(p == 0)
      		{
      			/* child context: touch the freshly mapped page */
      			int *ptr = (int *)addr;
      			volatile int x;
      
      			x = *ptr;
      			(void)x;
      			_exit(0);	/* don't flush the parent's stdio buffers twice */
      		}
      		/* parent context */
      		if(p > 0)
      			waitpid(p, NULL, 0);
      
      		if(munmap((void*)addr, 0x1000) == -1)
      			printf("munmap failed\n");
      	}
      
      	close(fd);
      	printf("done\n");
      	return 0;
      }
      Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade_spam@yahoo.it>
      Cc: Jeff Dike <jdike@addtoit.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • Paolo 'Blaisorblade' Giarrusso
      [PATCH] uml: Adds some exports · 128995d8
      Paolo 'Blaisorblade' Giarrusso authored
      Adds some exports
      Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade_spam@yahoo.it>
      Cc: Jeff Dike <jdike@addtoit.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • Paolo 'Blaisorblade' Giarrusso
      [PATCH] uml: Handles correctly errno == EINTR in lots of places. · 65894e60
      Paolo 'Blaisorblade' Giarrusso authored
      In various places (mostly waitpid() calls), this patch makes sure that if a
      syscall returns with errno == EINTR, it is retried until it no longer fails
      that way.  It also defines a simple generic way to do this.
      
      Signed-off-by: <blaisorblade_spam@yahoo.it>
      Cc: Jeff Dike <jdike@addtoit.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>