1. 24 May, 2004 2 commits
    • [PATCH] ppc64: better stack traces · 14bc28ad
      Paul Mackerras authored
      This improves the stack traces we get on PPC64 by putting a marker in
      those stack frames that are created as a result of an interrupt or
      exception.  The marker is "regshere" (0x7265677368657265).
      
      With this, stack traces show where exceptions have occurred, which can
      be very useful.  This also improves the accuracy of the trace because
      the relevant return address can be in the link register at the time of
      the exception rather than on the stack.  We now print the PC and
      exception type for each exception frame, and then the link register if
      appropriate as the next item in the trace.
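
      The marker constant quoted above is the ASCII string "regshere" packed
      into a 64-bit word, most significant byte first. The sketch below
      decodes it and shows how a stack walker might recognise it. The names
      REGS_MARKER, marker_to_ascii and is_exception_marker are hypothetical
      and are not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Marker value quoted in the commit message. The macro name is an assumption. */
#define REGS_MARKER 0x7265677368657265ULL

/* Write the 8 bytes of `value`, most significant first, into `out`
 * as a NUL-terminated string. `out` must hold at least 9 bytes. */
static void marker_to_ascii(uint64_t value, char out[9])
{
    assert(out != NULL);
    for (int i = 0; i < 8; i++)
        out[i] = (char)((value >> (56 - 8 * i)) & 0xff);
    out[8] = '\0';
}

/* Hypothetical helper: does a 64-bit word read from a stack frame
 * carry the exception-frame marker? */
static int is_exception_marker(uint64_t word)
{
    return word == REGS_MARKER;
}
```

      Calling marker_to_ascii(REGS_MARKER, buf) fills buf with "regshere".
      Where in the frame the real code stores the marker is not stated in
      the message, so the sketch leaves that out.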
    • Merge bk://bk.arm.linux.org.uk/linux-2.6-rmk · e1ff5fe0
      Linus Torvalds authored
      into ppc970.osdl.org:/home/torvalds/v2.6/linux
  2. 25 May, 2004 1 commit
  3. 24 May, 2004 15 commits
  4. 23 May, 2004 18 commits
  5. 22 May, 2004 4 commits
    • Linux 2.6.7-rc1 · 86042707
      Linus Torvalds authored
    • [PATCH] bogus sigaltstack calls by rt_sigreturn · ce34221e
      Roland McGrath authored
      There is a longstanding bug in the rt_sigreturn system call.
      This exists in both 2.4 and 2.6, and for almost every platform.
      
      I am referring to this code in sys_rt_sigreturn (arch/i386/kernel/signal.c):
      
      	if (__copy_from_user(&st, &frame->uc.uc_stack, sizeof(st)))
      		goto badframe;
      	/* It is more difficult to avoid calling this function than to
      	   call it and ignore errors.  */
      	/*
      	 * THIS CANNOT WORK! "&st" is a kernel address, and "do_sigaltstack()"
      	 * takes a user address (and verifies that it is a user address). End
      	 * result: it does exactly _nothing_.
      	 */
      	do_sigaltstack(&st, NULL, regs->esp);
      
      As the comment says, this is bogus.  On vanilla i386 kernels, this is just
      harmlessly stupid--do_sigaltstack always does nothing and returns -EFAULT.
      
      However this code actually bites users on kernels using Ingo Molnar's 4G/4G
      address space layout changes.  There some kernel stack address might very
      well be a lovely and readable user address as well.  When that happens, we
      make a sigaltstack call with some random buffer, and then the fun begins.
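
      The difference between the two layouts can be illustrated with a toy
      model of the user-address check. This is only a sketch under stated
      assumptions: every name here is hypothetical, the 3G/1G boundary is
      assumed to be 0xC0000000, and the 4G/4G layout is simplified so that
      every 32-bit address counts as a user address.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Toy model: all names and constants below are illustrative assumptions,
 * not kernel definitions. */
enum toy_layout { TOY_LAYOUT_3G_1G, TOY_LAYOUT_4G_4G };

#define TOY_STACK_T_SIZE 12u /* assumed size of a 32-bit stack_t */

/* First address past the range accepted as "user" in each layout. */
static uint64_t toy_user_limit(enum toy_layout layout)
{
    return layout == TOY_LAYOUT_4G_4G ? 0x100000000ULL : 0xC0000000ULL;
}

/* Nonzero if [addr, addr + size) lies entirely below the user limit. */
static int toy_user_range_ok(enum toy_layout layout, uint32_t addr, uint32_t size)
{
    return (uint64_t)addr + size <= toy_user_limit(layout);
}

/* Stand-in for the pointer check at the start of a sigaltstack-style call:
 * reject a pointer that is not a user address, otherwise carry on. */
static int toy_do_sigaltstack(enum toy_layout layout, uint32_t uss_addr)
{
    if (!toy_user_range_ok(layout, uss_addr, TOY_STACK_T_SIZE))
        return -EFAULT;
    return 0; /* the real call would now read a stack_t from uss_addr */
}
```

      In this model a kernel stack address such as 0xC1234000 is rejected
      with -EFAULT under the 3G/1G split, but passes the check under the
      simplified 4G/4G layout, which is the case described above.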
      
      To my knowledge, this has produced trouble in the real world only for 4G
      i386 kernels (RHEL and Fedora "hugemem" kernels) on machines that actually
      have several GB of physical memory (and in programs that are actually using
      sigaltstack and handling a lot of signals).  However, the same clearly
      broken code has been blindly copied to most other architecture ports, and
      off hand I don't know the address space details of any other well enough to
      know if real kernel stack addresses and real user addresses are in fact
      disjoint as they are on i386 when not using the nonstandard 4GB address
      space layout.
      
      The obvious intent of the call being there in the first place is to permit
      a signal handler to diddle its ucontext_t.uc_stack before returning, and
      have this effect a sigaltstack call on the signal handler return.  This is
      not only an optimization vs doing the extra system call, but makes it
      possible to make a sigaltstack change when that handler itself was running
      on the signal stack.  AFAICT this has never actually worked before, so
      certainly no one depends on it.  But the code certainly suggests that
      someone intended at one time for that to be the behavior.  Thus I am
      inclined to fix it so it works in that way, though it has not done so before.
      It would also be reasonable enough to simply rip out the bogus call and not
      have this functionality.
      
      From the current state of code in both 2.4 and 2.6, there is no fathoming
      how this broken code came about.  It's actually much simpler to just make
      it work!  I can only presume that at some point in the past the sigaltstack
      implementation functions were different such that this made sense.  Of the
      few ports I've looked at briefly, only the ppc/ppc64 porters (go paulus!)
      actually tried to understand what the i386 code was doing and implemented
      it correctly rather than just carefully transliterating the bug.
      
      The patch below fixes only the i386 and x86_64 versions.  The x86_64
      patches I have not actually tested.  I think each and every arch (except
      ppc and ppc64) needs to make the corresponding fixes as well.  Note that
      there is a function to fix for each native arch, and then one for each
      emulation flavor.  The details differ minutely for getting the calls right
      in each emulation flavor, but I think that most or all of the arches with
      biarch/emulation support have similar enough code that each emulation
      flavor's fix will look very much like the arch/x86_64/ia32/ia32_signal.c
      patch here.
    • [PATCH] partial prefetch for vma_prio_tree_next · ad9beb31
      Andrew Morton authored
      From: Rajesh Venkatasubramanian <vrajesh@umich.edu>
      
      This patch adds prefetches for walking a vm_set.list.  Adding prefetches
      for prio tree traversals is tricky and may lead to cache thrashing.  So this
      patch adds prefetches only when walking a vm_set.list.
      
      I haven't done any benchmarks to show that this patch improves performance.
      However, this patch should help to improve performance when vm_set.lists
      are long, e.g., libc.  Since we only prefetch vmas that are guaranteed to
      be used in the near future, this patch should not result in cache thrashing,
      theoretically.
      
      I didn't add any NULL checks before prefetching because prefetch.h clearly
      says prefetch(0) is okay.
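
      The general pattern of prefetching the next element while walking a
      list can be sketched in user-space C. This is not the patch's code:
      struct node and sum_list are hypothetical, and the GCC/Clang builtin
      __builtin_prefetch stands in for the kernel's prefetch(). As with
      prefetch(0), the builtin is only a hint and does not fault on a NULL
      address, so no check is needed at the end of the list.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical singly linked list node, used only for illustration. */
struct node {
    struct node *next;
    long value;
};

/* Sum the list, hinting the CPU to fetch the next node while the
 * current one is being processed. */
static long sum_list(const struct node *head)
{
    long total = 0;

    for (const struct node *n = head; n != NULL; n = n->next) {
        /* Hint only: a NULL n->next at the tail is harmless. */
        __builtin_prefetch(n->next);
        total += n->value;
    }
    return total;
}
```

      Only nodes that the loop is certain to visit next are prefetched,
      which matches the reasoning above about avoiding wasted cache lines.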
    • [PATCH] rmap 40 better anon_vma sharing · 17e8935f
      Andrew Morton authored
      From: Hugh Dickins <hugh@veritas.com>
      
      anon_vma rmap will always necessarily be more restrictive about vma merging
      than before: according to the history of the vmas in an mm, they are liable to
      be allocated different anon_vma heads, and from that point on be unmergeable.
      
      Most of the time this doesn't matter at all; but in two cases it may matter.
      One case is that mremap refuses (-EFAULT) to span more than a single vma: so
      it is conceivable that some app has relied on vma merging prior to mremap in
      the past, and will now fail with anon_vma.  Conceivable but unlikely, let's
      cross that bridge if we come to it: and the right answer would be to extend
      mremap, which should not be exporting the kernel's implementation detail of
      vma to user interface.
      
      The other case that matters is when a reasonable repetitive sequence of
      syscalls and faults ends up with a large number of separate unmergeable vmas,
      instead of the single merged vma it could have.
      
      Andrea's mprotect-vma-merging patch fixed some such instances, but left other
      plausible cases unmerged.  There is no perfect solution, and the harder you
      try to allow vmas to be merged, the less efficient anon_vma becomes, in the
      extreme there being one to span the whole address space, from which hangs
      every private vma; but anonmm rmap is clearly superior to that extreme.
      
      Andrea's principle was that neighbouring vmas which could be mprotected into
      mergeable vmas should be allowed to share anon_vma: good insight.  His
      implementation was to arrange this sharing when trying vma merge, but that
      seems to be too early.  This patch sticks to the principle, but implements it
      in anon_vma_prepare, when handling the first write fault on a private vma:
      with better results.  The drawback is that this first write fault needs an
      extra find_vma_prev (whereas prev was already to hand when implementing
      anon_vma sharing at try-to-merge time).
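
      The sharing principle can be sketched with a toy model. This is a
      simplified illustration, not the patch: all toy_ names are
      hypothetical, the protection bits are assumed to be the low three flag
      bits, and "could be mprotected into mergeable" is reduced to "adjacent
      and identical apart from protection bits".

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-ins for the kernel structures; every name is an assumption. */
struct toy_anon_vma { int unused; };

struct toy_vma {
    unsigned long start, end;        /* covers [start, end) */
    unsigned long flags;
    struct toy_anon_vma *anon_vma;   /* NULL until the first write fault */
    struct toy_vma *prev, *next;     /* address-ordered neighbours */
};

#define TOY_PROT_MASK 0x7ul /* assumed: bits that mprotect could change */

/* a directly precedes b and they differ at most in protection bits. */
static int toy_could_share(const struct toy_vma *a, const struct toy_vma *b)
{
    return a != NULL && b != NULL && a->end == b->start &&
           ((a->flags ^ b->flags) & ~TOY_PROT_MASK) == 0;
}

/* On first write fault: reuse a suitable neighbour's anon_vma if there is
 * one, otherwise allocate a fresh one. Returns NULL if allocation fails. */
static struct toy_anon_vma *toy_anon_vma_prepare(struct toy_vma *vma)
{
    if (vma->anon_vma != NULL)
        return vma->anon_vma;

    if (vma->next != NULL && vma->next->anon_vma != NULL &&
        toy_could_share(vma, vma->next))
        vma->anon_vma = vma->next->anon_vma;
    else if (vma->prev != NULL && vma->prev->anon_vma != NULL &&
             toy_could_share(vma->prev, vma))
        vma->anon_vma = vma->prev->anon_vma;
    else
        vma->anon_vma = calloc(1, sizeof(*vma->anon_vma));

    return vma->anon_vma;
}
```

      In the real kernel the neighbours are not linked like this; the model
      keeps prev and next pointers to stand in for the find_vma_prev lookup
      mentioned above.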