1. 15 Feb, 2006 22 commits
    • [PATCH] x86: document sysenter path · 581141cb
      Albert D. Cahalan authored
      This path isn't obvious.  It looks as if the kernel will be taking three
      args from the user stack, but it only takes one from there.
      Signed-off-by: Albert Cahalan <acahalan@gmail.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] FRV: Use virtual interrupt disablement · 28baebae
      David Howells authored
      Make the FRV arch use virtual interrupt disablement because accesses to the
      processor status register (PSR) are relatively slow and because we will
      soon have the need to deal with multiple interrupt controls at the same
      time (separate h/w and inter-core interrupts).
      
      The way this is done is to dedicate one of the four integer condition code
      registers (ICC2) to maintaining a virtual interrupt disablement state
      whilst inside the kernel.  This uses the ICC2.Z flag (Zero) to indicate
      whether the interrupts are virtually disabled and the ICC2.C flag (Carry)
      to indicate whether the interrupts are physically disabled.
      
      ICC2.Z is set to indicate interrupts are virtually disabled.  ICC2.C is set
      to indicate interrupts are physically enabled.  Under normal running
      conditions Z==0 and C==1.
      
      Disabling interrupts with local_irq_disable() doesn't then actually
      physically disable interrupts - it merely sets ICC2.Z to 1.  Should an
      interrupt then happen, the exception prologue will note ICC2.Z is set and
      branch out of line using one instruction (an unlikely BEQ).  Here it will
      physically disable interrupts and clear ICC2.C.
      
      When it comes time to enable interrupts (local_irq_enable()), this simply
      clears the ICC2.Z flag and invokes a trap #2 if both Z and C flags are
      clear (the HI integer condition).  This can be done with the TIHI
      conditional trap instruction.
      
      The trap then physically reenables interrupts and sets ICC2.C again.  Upon
      returning the interrupt will be taken as interrupts will then be enabled.
      Note that whilst processing the trap, the whole exceptions system is
      disabled, and so an interrupt can't happen till it returns.
      
      If no pending interrupt had happened, ICC2.C would still be set, the HI
      condition would not be fulfilled, and no trap will happen.
      
      Saving interrupts (local_irq_save) is simply a matter of pulling the ICC2.Z
      flag out of the CCR register, shifting it down and masking it off.  This
      gives a result of 0 if interrupts were enabled and 1 if they weren't.
      
      Restoring interrupts (local_irq_restore) is then a matter of taking the
      saved value mentioned previously and XOR'ing it against 1.  If it was one,
      the result will be zero, and if it was zero the result will be non-zero.
      This result is then used to affect the ICC2.Z flag directly (it is a
      condition code flag after all).  An XOR instruction does not affect the
      Carry flag, and so that bit of state is unchanged.  The two flags can then
      be sampled to see if they're both zero using the trap (TIHI) as for the
      unconditional reenablement (local_irq_enable).
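      
      The flag protocol described above can be modelled in plain C as a small
      state machine.  This is only a simulation sketch, not FRV code: the struct
      and the virq_* names are hypothetical, the real implementation keeps the
      two bits in ICC2 and uses the TIHI conditional trap, and the model assumes
      that local_irq_save() also disables interrupts, as it conventionally does
      in Linux.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the two ICC2 flags; not the real FRV code. */
struct virq_state {
    bool z;        /* ICC2.Z: set = interrupts virtually disabled  */
    bool c;        /* ICC2.C: set = interrupts physically enabled  */
    bool pending;  /* an interrupt is waiting to be serviced       */
    int handled;   /* number of interrupts actually serviced       */
};

void virq_init(struct virq_state *s)
{
    s->z = false;  /* normal running conditions: Z==0 ... */
    s->c = true;   /* ... and C==1                        */
    s->pending = false;
    s->handled = 0;
}

/* local_irq_disable(): only sets Z, no physical masking. */
void virq_disable(struct virq_state *s)
{
    s->z = true;
}

/* A hardware interrupt arrives. */
void virq_raise(struct virq_state *s)
{
    if (!s->c) {              /* physically masked: stays pending */
        s->pending = true;
        return;
    }
    if (s->z) {               /* prologue sees Z set, branches out of line */
        s->c = false;         /* physically disable, clear C */
        s->pending = true;
        return;
    }
    s->handled++;             /* interrupts fully enabled: service it */
}

/* Models trap #2: physically re-enable, then the interrupt is taken. */
void virq_trap2(struct virq_state *s)
{
    s->c = true;
    if (s->pending) {
        s->pending = false;
        s->handled++;
    }
}

/* local_irq_enable(): clear Z, trap if Z==0 and C==0 (the HI condition). */
void virq_enable(struct virq_state *s)
{
    s->z = false;
    if (!s->z && !s->c)
        virq_trap2(s);
}

/* local_irq_save(): 0 if interrupts were enabled, 1 if they were not. */
unsigned virq_save(struct virq_state *s)
{
    unsigned flags = s->z ? 1u : 0u;
    s->z = true;              /* assumption: save also disables */
    return flags;
}

/* local_irq_restore(): XOR with 1; a zero result sets Z, C is untouched. */
void virq_restore(struct virq_state *s, unsigned flags)
{
    s->z = ((flags ^ 1u) == 0);
    if (!s->z && !s->c)
        virq_trap2(s);
}
```

      In this model, an interrupt raised between virq_disable() and
      virq_enable() is not serviced until virq_enable() takes the trap path.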
      
      This patch also:
      
       (1) Modifies the debugging stub (break.S) to handle single-stepping crossing
           into the trap #2 handler and into virtually disabled interrupts.
      
       (2) Removes superseded fixup pointers from the second instructions in the trap
           tables (there's now a separate fixup table for this).
      
       (3) Declares the trap #3 vector for use in .org directives in the trap table.
      
       (4) Moves irq_enter() and irq_exit() in do_IRQ() to avoid problems with
           virtual interrupt handling, and removes the duplicate code that has now
           been folded into irq_exit() (softirq and preemption handling).
      
       (5) Tells the compiler in the arch Makefile that ICC2 is now reserved.
      
       (6) Documents the in-kernel ABI, including the virtual interrupts.
      
       (7) Renames the old irq management functions to different names.
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] FRV: Miscellaneous fixes · 68f624fc
      David Howells authored
      Make various alterations and fixes to the FRV arch:
      
       (1) Resyncs the FRV system call collection with the i386 arch.
      
       (2) Discards __iounmap() as it's not used.
      
       (3) Fixes the use of the SWAP/SWAPI instruction to get the arguments the right
           way around in atomic.h, and also to get the asm constraints correct.
      
       (4) Moves copy_to/from_user_page() to asm/cacheflush.h to be consistent with
           other archs.
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] hrtimer: round up relative start time on low-res arches · 06027bdd
      Ingo Molnar authored
      CONFIG_TIME_LOW_RES is a temporary way for architectures to signal that
      they simply return xtime in do_gettimeoffset().  In this corner-case we
      want to round up by resolution when starting a relative timer, to avoid
      short timeouts.  This will go away with the GTOD framework.
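      
      One plausible reading of the rounding, sketched in C.  The function and
      parameter names are hypothetical and this is not the hrtimer code itself.
      If the clock read lags real time by up to one tick, padding a relative
      expiry by one resolution unit keeps the timer from firing early.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical sketch: compute the absolute expiry of a relative timer.
 * now_ns is the (possibly stale) clock reading, rel_ns the requested
 * delay, resolution_ns the clock resolution.  When low_res is non-zero
 * the expiry is padded by one resolution unit.
 */
int64_t relative_expiry_ns(int64_t now_ns, int64_t rel_ns,
                           int64_t resolution_ns, int low_res)
{
    int64_t expiry = now_ns + rel_ns;

    if (low_res)
        expiry += resolution_ns;   /* avoid an effectively short timeout */
    return expiry;
}
```

      For example, with a 10 ms resolution and a clock reading of 10 ms taken
      when the real time is 19 ms, a 5 ms relative timer would expire at 15 ms,
      which has already passed.  With the padding it expires at 25 ms.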
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Cc: Roman Zippel <zippel@linux-m68k.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] s390: fix __delay implementation · e35a6619
      Heiko Carstens authored
      Fix __delay implementation.  Called with an argument "1" or "0" it would
      loop nearly forever (since (1/2)-1 = 0xffffffff).
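      
      The wraparound can be reproduced with 32-bit unsigned arithmetic.  The
      helper names below are hypothetical, and the guarded variant is just one
      possible way to avoid the wrap, not necessarily what the patch does.

```c
#include <assert.h>
#include <stdint.h>

/* Loop count as described above: wraps when loops is 0 or 1. */
uint32_t old_delay_count(uint32_t loops)
{
    return (loops / 2) - 1;        /* 0 - 1 == 0xffffffff in uint32_t */
}

/* Assumed guard for illustration: never let the count wrap. */
uint32_t guarded_delay_count(uint32_t loops)
{
    uint32_t n = loops / 2;

    return n ? n - 1 : 0;
}
```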
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] fix a typo in the CPU_H8300H dependencies · 5a1342f7
      Adrian Bunk authored
      Jean-Luc Leger <reiga@dspnet.fr.eu.org> found this obvious typo.
      Signed-off-by: Adrian Bunk <bunk@stusta.de>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] sched: revert "filter affine wakeups" · d6077cb8
      Chen, Kenneth W authored
      Revert commit d7102e95:
      
          [PATCH] sched: filter affine wakeups
      
      This apparently caused a performance regression of more than 10% on the aim7
      benchmark.  The setup in use is a 16-CPU HP rx8620 with 64GB of memory and 12
      MSA1000s with 144 disks.  Each disk is 72GB with a single ext3 filesystem
      (courtesy of HP, who supplied the benchmark results).
      
      The problem is, for aim7, the wake-up pattern is random, but it still needs
      load balancing action in the wake-up path to achieve best performance.  With
      the above commit, lack of load balancing hurts that workload.
      
      However, for workloads like database transaction processing, the requirement
      is exactly opposite.  In the wake up path, best performance is achieved with
      absolutely zero load balancing.  We simply wake up the process on the CPU where
      it previously ran.  Worst performance is obtained when we do load
      balancing at wake up.
      
      There isn't an easy way to auto-detect the workload characteristics.  Ingo's
      earlier patch, which detects idle CPUs and decides whether to load balance,
      doesn't perform well with aim7 either, since all CPUs are busy (it causes an
      even bigger performance regression).
      
      Revert commit d7102e95, which causes more
      than 10% performance regression with aim7.
      Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
      Acked-by: Ingo Molnar <mingo@elte.hu>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] madvise MADV_DONTFORK/MADV_DOFORK · f8225661
      Michael S. Tsirkin authored
      Currently, copy-on-write may change the physical address of a page even if the
      user requested that the page is pinned in memory (either by mlock or by
      get_user_pages).  This happens if the process forks meanwhile, and the parent
      writes to that page.  As a result, the page is orphaned: in the case of
      get_user_pages, the application will never see any data that hardware DMAs into
      this page after the COW.  In case of mlock'd memory, the parent is not getting
      the realtime/security benefits of mlock.
      
      In particular, this affects the Infiniband modules which do DMA from and into
      user pages all the time.
      
      This patch adds madvise options to control whether a memory range is inherited
      across fork.  This is useful, e.g., when hardware is doing DMA from/into these
      pages.  It could also be useful to an application wanting to speed up its forks
      by cutting large areas out of consideration.
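      
      A minimal userspace sketch of how the two options might be used on a
      DMA-style buffer.  The helper names are hypothetical.  mmap(), madvise(),
      MADV_DONTFORK and MADV_DOFORK are the real interfaces, and the sketch
      assumes a Linux system whose headers expose them.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Map len bytes of anonymous memory and mark the range so that a child
 * created by fork() does not inherit it.  Returns NULL on failure.
 */
void *alloc_dontfork(size_t len)
{
    void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    if (buf == MAP_FAILED)
        return NULL;
    if (madvise(buf, len, MADV_DONTFORK) != 0) {
        munmap(buf, len);
        return NULL;
    }
    return buf;
}

/* Restore the default behaviour: the range is inherited across fork(). */
int allow_fork_again(void *buf, size_t len)
{
    return madvise(buf, len, MADV_DOFORK);
}
```

      Because the child never maps the range, a write by the parent after
      fork() does not trigger copy-on-write for those pages, so their physical
      addresses stay put.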
      Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
      Acked-by: Hugh Dickins <hugh@veritas.com>
      Cc: Michael Kerrisk <mtk-manpages@gmx.net>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] kprobes: Update Documentation/kprobes.txt · 8861da31
      Jim Keniston authored
      Update Documentation/kprobes.txt to reflect Kprobes enhancements and other
      recent developments.
      Acked-by: Ananth Mavinakayanahalli <mananth@in.ibm.com>
      Signed-off-by: Jim Keniston <jkenisto@us.ibm.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] Fix NULL pointer dereference in isdn_tty_at_cout · 61b9a26a
      Karsten Keil authored
      The changes in the tty-related code introduced wrong parentheses in an if
      condition in the isdn_tty_at_cout function.  This caused access to index -1
      in the dev->drv[] array.  This patch changes it back to the correct
      condition from the previous versions.
      Signed-off-by: Karsten Keil <kkeil@suse.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] fix x86 topology export in sysfs for subarchitectures · 8b09fb34
      James Bottomley authored
      The correct way to export hyperthreading based functions is to predicate
      them on CONFIG_X86_HT.  Without this, the topology exporting patch breaks
      the build on all non-PC x86 subarchitectures.
      Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] NLM: Fix the NLM_GRANTED callback checks · 5ac5f9d1
      Trond Myklebust authored
      If 2 threads attached to the same process are blocking on different locks on
      different files (maybe even on different servers) but have the same lock
      arguments (i.e.  same offset+length - actually quite common, since most
      processes try to lock the entire file) then the first GRANTED call that wakes
      one up will also wake the other.
      
      Currently when the NLM_GRANTED callback comes in, lockd walks the list of
      blocked locks in search of a match to the lock that the NLM server has
      granted.  Although it checks the lock pid, start and end, it fails to check
      the filehandle and the server address.
      
      By checking the filehandle and server IP address, we ensure that this only
      happens if the locks truly are referencing the same file.
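      
      The stricter match can be sketched as follows.  The struct layout, field
      names and function name are hypothetical stand-ins rather than the lockd
      data structures.  The point is only that the pid and range comparison is
      extended with the server address and the filehandle.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical description of a lock, for illustration only. */
struct lock_sketch {
    uint32_t pid;
    uint64_t start, end;
    uint32_t server_addr;      /* assumed IPv4 address of the NLM server */
    unsigned char fh[32];      /* filehandle bytes */
    size_t fh_len;
};

/* True only if the granted lock refers to the same blocked lock. */
bool granted_matches(const struct lock_sketch *blocked,
                     const struct lock_sketch *granted)
{
    /* The checks that were already made: pid, start and end. */
    if (blocked->pid != granted->pid ||
        blocked->start != granted->start ||
        blocked->end != granted->end)
        return false;

    /* The additional checks: server address and filehandle. */
    if (blocked->server_addr != granted->server_addr)
        return false;
    if (blocked->fh_len != granted->fh_len)
        return false;
    return memcmp(blocked->fh, granted->fh, blocked->fh_len) == 0;
}
```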
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] jbd: revert checkpoint list changes · 7c8903f6
      Mark Fasheh authored
      This patch reverts commit f93ea411:
        [PATCH] jbd: split checkpoint lists
      
      This broke journal_flush() for OCFS2, which is its method of being sure
      that metadata is sent to disk for another node.
      
      And two related commits 8d3c7fce and
      43c3e6f5 with the subjects:
        [PATCH] jbd: log_do_checkpoint fix
        [PATCH] jbd: remove_transaction fix
      
      These seem to be incremental bugfixes on the original patch and as such are
      no longer needed.
      Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
      Cc: Jan Kara <jack@ucw.cz>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] HPET: handle multiple ACPI EXTENDED_IRQ resources · be5efffb
      Bjorn Helgaas authored
      When the _CRS for a single HPET contains multiple EXTENDED_IRQ resources,
      we overwrote hdp->hd_nirqs every time we found one.
      
      So the driver worked when all the IRQs were described in a single
      EXTENDED_IRQ resource, but failed when multiple resources were used.
      (Strictly speaking, I think the latter is actually more correct, but both
      styles have been used.)
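      
      The difference between overwriting and accumulating the IRQ count can be
      sketched like this.  The struct, the limit and the function name are
      hypothetical and only model the bookkeeping, not the ACPI resource walk
      in hpet.c.

```c
#include <assert.h>

#define SKETCH_MAX_IRQ 32          /* assumed limit for the sketch */

/* Hypothetical stand-in for the per-HPET data. */
struct hpet_sketch {
    unsigned nirqs;
    unsigned irq[SKETCH_MAX_IRQ];
};

/*
 * Called once per EXTENDED_IRQ resource found in _CRS.  The IRQs are
 * appended after any already recorded, so nirqs grows across calls
 * instead of being reset for each resource.
 */
void hpet_add_irq_resource(struct hpet_sketch *hd,
                           const unsigned *irqs, unsigned count)
{
    for (unsigned i = 0; i < count && hd->nirqs < SKETCH_MAX_IRQ; i++)
        hd->irq[hd->nirqs++] = irqs[i];
}
```

      With this shape, one resource listing three IRQs and three resources
      listing one IRQ each both end with nirqs equal to 3.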
      
      Someday we should remove all the ACPI stuff from hpet.c and use PNP driver
      registration instead.  But currently PNP_MAX_IRQ is 2, and HPETs often have
      more IRQs.  Hint, hint, Adam :-)
      Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
      Acked-by: Bob Picco <robert.picco@hp.com>
      Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
      Cc: Len Brown <len.brown@intel.com>
      Cc: Adam Belay <ambx1@neo.rr.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] tty reference count fix · da965822
      Paul Fulghum authored
      Fix a hole where the tty structure can be released while its reference count is
      non-zero.  Existing code can sleep without tty_sem protection between deciding
      to release the tty structure (setting local variables tty_closing and
      otty_closing) and setting TTY_CLOSING to prevent further opens.  An open
      can occur during this interval causing release_dev() to free the tty
      structure while it is still referenced.
      
      This should fix bugzilla.kernel.org [Bug 6041] New: Unable to handle kernel
      paging request
      
      In Bug 6041, tty_open() oopses on accessing the tty structure it has
      successfully claimed.  Bug was on SMP machine with the same tty being
      opened and closed by multiple processes, and DEBUG_PAGEALLOC enabled.
      Signed-off-by: Paul Fulghum <paulkf@microgate.com>
      Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
      Cc: Jesper Juhl <jesper.juhl@gmail.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] compound page: no access_process_vm check · 16bf1348
      Hugh Dickins authored
      The PageCompound check before access_process_vm's set_page_dirty_lock is no
      longer necessary, so remove it.  But leave the PageCompound checks in
      bio_set_pages_dirty, dio_bio_complete and nfs_free_user_pages: at least some
      of those were introduced as a little optimization on hugetlb pages.
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] compound page: default destructor · d98c7a09
      Hugh Dickins authored
      Somehow I imagined that calling a NULL destructor would free a compound page
      rather than oopsing.  No, we must supply a default destructor, __free_pages_ok
      using the order noted by prep_compound_page.  hugetlb can still replace this
      as before with its own free_huge_page pointer.
      
      The case that needs this is not common: rarely does put_compound_page's
      put_page_testzero bring the count down to 0.  But if get_user_pages is applied
      to some part of a compound page, without immediate release (e.g.  AIO or
      Infiniband), then it's possible for its put_page to come after the containing
      vma has been unmapped and the driver done its free_pages.
      
      That's just the kind of case compound pages are supposed to be guarding
      against (but Nick points out, nor did PageReserved handle this right).
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] compound page: use page[1].lru · 41d78ba5
      Hugh Dickins authored
      If a compound page has its own put_page_testzero destructor (the only current
      example is free_huge_page), that is noted in page[1].mapping of the compound
      page.  But that's rather a poor place to keep it: functions which call
      set_page_dirty_lock after get_user_pages (e.g.  Infiniband's
      __ib_umem_release) ought to be checking first, otherwise set_page_dirty is
      liable to crash on what's not the address of a struct address_space.
      
      And now I'm about to make that worse: it turns out that every compound page
      needs a destructor, so we can no longer rely on hugetlb pages going their own
      special way, to avoid further problems of page->mapping reuse.  For example,
      not many people know that: on 50% of i386 -Os builds, the first tail page of a
      compound page purports to be PageAnon (when its destructor has an odd
      address), which surprises page_add_file_rmap.
      
      Keep the compound page destructor in page[1].lru.next instead.  And to free up
      the common pairing of mapping and index, also move compound page order from
      index to lru.prev.  Slab reuses page->lru too: but if we ever need slab to use
      compound pages, it can easily stack its use above this.
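      
      A simplified model of the new layout, with hypothetical struct and helper
      names (the real struct page has many more fields).  It assumes, as the
      text implies, that a set low bit in page->mapping is what marks a page as
      PageAnon.  That is why a destructor pointer with an odd address stored in
      ->mapping was misleading.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAPPING_ANON 1UL    /* low bit of ->mapping marks anon pages */

/* Hypothetical, heavily reduced versions of the kernel structures. */
struct list_head_sketch { void *next, *prev; };
struct page_sketch {
    void *mapping;
    unsigned long index;
    struct list_head_sketch lru;
};

typedef void (*compound_dtor_t)(struct page_sketch *);

/* The destructor lives in the first tail page's lru.next. */
void set_compound_dtor(struct page_sketch *page, compound_dtor_t dtor)
{
    page[1].lru.next = (void *)dtor;
}

compound_dtor_t get_compound_dtor(struct page_sketch *page)
{
    return (compound_dtor_t)page[1].lru.next;
}

/* The order lives in lru.prev, leaving mapping and index untouched. */
void set_compound_order(struct page_sketch *page, unsigned long order)
{
    page[1].lru.prev = (void *)(uintptr_t)order;
}

unsigned long get_compound_order(struct page_sketch *page)
{
    return (unsigned long)(uintptr_t)page[1].lru.prev;
}

/* Model of the PageAnon test: is the low bit of ->mapping set? */
bool looks_anon(const struct page_sketch *page)
{
    return ((uintptr_t)page->mapping & SKETCH_MAPPING_ANON) != 0;
}
```

      Since the tail page's ->mapping stays NULL in this layout, looks_anon()
      is false for it regardless of the destructor's address.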
      
      (akpm: decoded version of the above: the tail pages of a compound page now
      have ->mapping==NULL, so there's no need for the set_page_dirty[_lock]()
      caller to check that they're not compound pages before doing the dirty).
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] pktcdvd: Reduce stack usage · 72772323
      Peter Osterlund authored
      Reduce stack usage in the pkt_start_write() function.  Even though it's not
      currently a real problem, the pages and offsets arrays can be eliminated,
      which saves approximately 1000 bytes of stack space.
      Signed-off-by: Peter Osterlund <petero2@telia.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] pktcdvd: Don't unlock the door if the disc is in use · 948423e5
      Peter Osterlund authored
      Unlocking the door when the disc is in use is obviously not good, because then
      it's possible to eject the disc at the wrong time and cause severe disc data
      corruption.
      Signed-off-by: Peter Osterlund <petero2@telia.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] pktcdvd: Allow non-writable media to be mounted · 01fd9fda
      Peter Osterlund authored
      If opening for write fails, the open method should return -EROFS.  This makes
      "mount" try again with a read-only mount, instead of just giving up.
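      
      The interaction can be modelled as below.  Both functions are hypothetical
      stand-ins (one for the driver's open method, one for the retry logic in
      "mount") and only illustrate why the specific error code matters.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the driver's open method. */
int sketch_pkt_open(bool want_write, bool media_writable)
{
    if (want_write && !media_writable)
        return -EROFS;         /* tells the caller a read-only open may work */
    return 0;
}

/* Hypothetical model of mount: try read-write, fall back on -EROFS. */
int sketch_mount(bool media_writable, bool *mounted_ro)
{
    int err = sketch_pkt_open(true, media_writable);

    *mounted_ro = false;
    if (err == -EROFS) {
        *mounted_ro = true;
        err = sketch_pkt_open(false, media_writable);
    }
    return err;
}
```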
      Signed-off-by: Peter Osterlund <petero2@telia.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] pktcdvd: Don't spam the kernel log when nothing is wrong · 61a34937
      Peter Osterlund authored
      Change some messages that don't indicate an error so that they are only
      printed when debugging is enabled.
      Signed-off-by: Peter Osterlund <petero2@telia.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  2. 14 Feb, 2006 18 commits