1. 23 Nov, 2007 40 commits
    • Linux 2.1.132pre1 · 2e59abdf
      Linus Torvalds authored
      There's a new pre-patch out there. I'm back from Finland, and have caught
      up with just about half the email that I got during the stay. However,
      even the part I caught up with I may have partly missed something in,
      because (for obvious reasons) I didn't read them as carefully (*) as I
      usually do.
      
      This should fix at least part of the NFS problems people have reported:
      there was code to completely incorrectly invalidate quite valid write
      requests under some circumstances. The pre-patch also contains the first
      batch of patches merged in from Alan, and the "rmdir" problems should be
      fixed (mostly thanks to Al Viro).
      
      This pre-patch also gets rid of some imho completely unnecessary
      complexity in some of the VM memory freeing routines. There have been
      patches floating around that added more heuristics on when to do
      something, and this tries to get the same result by just removing old
      heuristics that didn't make much sense.
      
      	Linus
      
      (*) Even my usual "careful" is not very careful by other people's
      standards. So when _I_ say that I wasn't very careful, you should just
      assume that I was reading my email about as carefully as a hyper-active
      hedgehog on some serious uppers. Can you say "ignored email" three times
      quickly while chewing on an apple?
    • Linux 2.1.131 · ec274075
      Linus Torvalds authored
      2.1.131 is out there now - and will be the last kernel release for a
      while. I'm going to Finland for a week and a half, and will be back mid
      December. During that time I hope people will beat on this. I'll be able
      to read email when I'm gone, but as I haven't been back in over a year,
      I'm not very likely to.
      
      Alan, I haven't got any replies (positive or negative) about the VFS fixes in
      pre-2.1.131-3 (which are obviously in the real 131 too), so I hope that
      means that I successfully fixed all filesystems. The chance of that being
      true is remote, but hey, I can hope.  If not, I assume you'll be doing
      your ac patches anyway (any bugs wrt rmdir() should be fairly obvious once
      seen), and people might as well consider those official..
      
                      Linus
    • pre-patch-2.1.131-3 · 16c82539
      Linus Torvalds authored
      Ok,
       I've made a new pre-patch-2.1.131-3.
      
      The basic problem (that Alexander Viro correctly diagnosed) is that the
      inode locking was horribly and subtly wrong for the case of a "rmdir()"
      call. What rmdir() did was essentially something like
      
       - VFS: lock the directory that contains the directory to remove
         (this is normal and required to make sure that the name updates are
         completely atomic - so removing or adding anything requires you to hold
         the lock on the directory that contains the removee/addee)
       - low-level filesystem: lock the directory you're going to remove, in
         order to atomically check that it's empty.
      
      So far so good, the above makes tons of sense. HOWEVER, the problem is
      fairly obvious if anybody before Alexander had actually bothered to think
      about it: when we hold two locks, we had better make sure that we get the
      locks in the right order, or we may end up deadlocked with two (or more)
      processes getting the locks in the wrong order and waiting on each other.
      
      Now, if it was only rmdir(), things would be fine, because the directory
      hierarchy itself imposes a lock order for rmdir(). But we have another
      case that needs to lock two directories: "rename()". And that one doesn't
      have the same kind of obvious order, and uses a different way to order the
      two locks it gets. BOOM.
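
The deadlock above is the classic two-lock (ABBA) case: rmdir() takes the parent and then the child, rename() orders its two directories differently, and two processes can each end up holding the lock the other one wants. A common cure is to make every path take any pair of locks in one global order. The sketch below shows that idea with POSIX mutexes ordered by address; order_pair(), lock_pair() and unlock_pair() are hypothetical helpers, not the VFS code (which works with kernel semaphores).

```c
#include <pthread.h>
#include <stdint.h>

/* Hypothetical helper: pick which of two locks to take first.
 * Ordering by address gives every caller the same global order. */
static void order_pair(pthread_mutex_t *a, pthread_mutex_t *b,
                       pthread_mutex_t **first, pthread_mutex_t **second)
{
    if ((uintptr_t)a < (uintptr_t)b) {
        *first = a;
        *second = b;
    } else {
        *first = b;
        *second = a;
    }
}

/* Take both locks in the global order, whatever order the caller names them in. */
static void lock_pair(pthread_mutex_t *a, pthread_mutex_t *b)
{
    pthread_mutex_t *first, *second;

    if (a == b) {                 /* same lock: take it only once */
        pthread_mutex_lock(a);
        return;
    }
    order_pair(a, b, &first, &second);
    pthread_mutex_lock(first);
    pthread_mutex_lock(second);
}

/* Release order does not matter for deadlock avoidance. */
static void unlock_pair(pthread_mutex_t *a, pthread_mutex_t *b)
{
    pthread_mutex_unlock(a);
    if (a != b)
        pthread_mutex_unlock(b);
}
```

Whichever order a caller names the two mutexes in, the lower address is always locked first, so two callers can never each hold one lock of the same pair while waiting for the other.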
      
      As far as I can tell, this is a problem in 2.0.x too, but while it's a
      potential really nasty DoS-opening, it does have the saving grace that the
      window to trigger it is really really small. I don't know if you can
      actually make an exploit for it that has any real chance of hitting it,
      but it's at least conceptually possible.
      
      Now, the only sane fix was to actually make the VFS layer do all the
      locking for rmdir(), and thus let the VFS layer make sure the order is
      correct, so that low-level filesystems don't need to worry their pretty
      heads. I tried to do that in the previous pre-patch, and it worked well
      for ext2, but not all that much else. The problem was that too many
      filesystems "knew" what the rmdir() downcall used to do. Oh, well.
      
      Anyway, I've fixed the low-level filesystems as far as I can tell, and the
      end result is a much cleaner interface (and one less bug). But it's an
      interface change at a fairly late date, and while the fixes to smbfs etc
      looked for the most part obvious, I haven't been able to test them, so
      I've done most of them "blind".
      
      Sadly, this bug couldn't just be glossed over, because a normal user could
      (by knowing the exact right incantation) force tons of unkillable
      processes that held critical filesystem resources (any lookup on a
      directory that was locked would in turn also lock up). So I'd ask people
      who have done filesystems for Linux to look over my changes, and if the
      filesystems are not part of the standard distribution please look over
      your own locally maintained fs code. I think we can ignore 2.0.x by virtue
      of it probably being virtually impossible to trigger. I'll leave the
      decision up to Alan.
      
      Most specially, I'd like to have people who use/maintain vfat and umsdos
      filesystems to test out that I actually made those filesystems happy with
      my changes. The other filesystems were more straightforward.
      
      Oh, and thanks to Alexander. Not that I really needed another bug to fix,
      but it feels good to plug holes.
      
                              Linus
      
      The change is basically:
       - the VFS layer locks the directory to be removed for you (as opposed to
         just the directory that contains the directory to be removed as it used
         to). A lot of filesystems didn't actually do this, and it is required,
         because otherwise the test for an empty directory may be subverted by a
         clever hacker.
       - the VFS layer will have done a dcache "prune" operation on the
         directory, and if there were no other uses for that dcache entry, it
         will have done a "d_drop()" on it too.
       - the above essentially means that any filesystem can do a
              if (!list_empty(&dentry->d_hash))
                      return -EBUSY;
         to test whether there are other users of this directory. No need to do
         any extra pruning etc - if it's been dropped there won't be any new
         users of the dentry afterwards, so there are no races. So after doing
         the above test you know that you'll have exclusive access to the dentry
         forever.
         Most notably, the low-level filesystem should _not_ look at the
         dentry->d_count member to see how many users there are. The VFS layer
         currently artificially raises the dentry count to make sure
         "d_delete()" doesn't get rid of the inode early.
       - however: traditional local UNIX-type filesystems tend to want to allow
         removing of the directory even if it is in use by something else. This
         requires that the inode be accessible even after the rmdir() - even
         though it doesn't necessarily need to actually _do_ anything.
         For a normal UNIX-like filesystem this tends to be trivial and quite
         automatic behaviour, but you need to think about whether your
         filesystem is of the kind where the inode stays around even after the
         delete until we locally do the final "iput()". For example, on
         networked filesystems this is generally not true, simply because the
         server will have de-allocated the inode even if we still have a
         reference to it locally.
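
The list_empty(&dentry->d_hash) test works because dropping a dentry takes it off its hash chain and leaves its list node pointing at itself. The following user-space model is a sketch under that assumption: the list helpers are re-implemented here in the style of the kernel's circular doubly-linked lists, while struct toy_dentry, toy_d_drop(), toy_rmdir_check(), struct toy_inode and toy_iput() are hypothetical and heavily reduced. The last part models the UNIX-style rule from the final item: the inode outlives the rmdir() until the last local reference goes away.

```c
#include <errno.h>
#include <stdbool.h>

/* Minimal circular doubly-linked list in the style of the kernel's list_head. */
struct list_head {
    struct list_head *next, *prev;
};

static void INIT_LIST_HEAD(struct list_head *h)
{
    h->next = h;
    h->prev = h;
}

/* Insert 'entry' right after 'head'. */
static void list_add(struct list_head *entry, struct list_head *head)
{
    entry->next = head->next;
    entry->prev = head;
    head->next->prev = entry;
    head->next = entry;
}

/* Unlink 'entry' and point it at itself, so list_empty() on it is true. */
static void list_del_init(struct list_head *entry)
{
    entry->prev->next = entry->next;
    entry->next->prev = entry->prev;
    INIT_LIST_HEAD(entry);
}

static bool list_empty(const struct list_head *h)
{
    return h->next == h;
}

/* Hypothetical, heavily reduced dentry: only the hash-chain link. */
struct toy_dentry {
    struct list_head d_hash;
};

/* Model of d_drop(): unhash the dentry so no new lookup can find it. */
static void toy_d_drop(struct toy_dentry *d)
{
    list_del_init(&d->d_hash);
}

/* The filesystem-side check: still hashed means there may be other users. */
static int toy_rmdir_check(const struct toy_dentry *d)
{
    if (!list_empty(&d->d_hash))
        return -EBUSY;
    return 0;
}

/* Hypothetical inode: on-disk link count vs. in-memory references. */
struct toy_inode {
    unsigned int i_nlink;   /* directory entries naming this inode */
    unsigned int i_count;   /* in-memory users (open files, cwd, ...) */
    bool freed;             /* set once the inode would really be released */
};

/* Drop one reference; a local UNIX-like fs frees the inode only
 * when it is both unlinked and no longer referenced. */
static void toy_iput(struct toy_inode *inode)
{
    if (--inode->i_count == 0 && inode->i_nlink == 0)
        inode->freed = true;
}
```
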
    • Linux 2.1.131pre2 · b468356b
      Linus Torvalds authored
      There's a pre-131-2 patch there on ftp.kernel.org in the testing
      directory. This should have the NFS locking issues worked out (please
      test), and also has a rather subtle but potentially very nasty deadlock
      due to incorrect semaphore ordering with rmdir() hopefully fixed for good.
      Alan, the regparm patches are also there.
      
                      Linus
      
      nfs: write back everything whenever some lock is changed (not just for
           unlock), and always invalidates the caches.
    • The Basted Turkey Release (aka 2.1.130) · 2a86df06
      Linus Torvalds authored
      Following hot on the heels of the greased weasel, the basted turkey rears
      its handsome head.
      
      The basted turkey release fixes some problems that our dear weasel had,
      namely:
       - NFS reference counting was wrong. It had been wrong for a long time,
         but apparently the more aggressively asynchronous code was more easily
         able to show the resultant random memory corruption. That should be
         gone.
       - The UP flu fixed officially (this has been in most of the 2.1.129
         patches)
       - kernel_thread() used to be able to cause bad things in init-routines at
         bootup. Fixed.
       - itimers could lead to bad things in SMP under heavy itimer load.
       - various mm tweaks to make it behave better under load. Things for dirty
         buffers still under consideration.
       - IP masquerading check fixes.
       - acenic gigabit ethernet driver
       - some drunken revelers fixed some MCA issues.
       - alpha PCI setup updates and video drivers
       - hfs and minix filesystem fixes.
      
      On the whole, an excellent thing to do this evening, and goes together
      remarkably well with some good red wine. Amaze your friends and relatives
      by completely ignoring them, sitting in a corner with your own basted
      turkey, and getting wasted on red wine. Much more fun than your average
      Thanksgiving dinner,
      
      		Linus
    • pre-2.1.130-3 · 63f5d27a
      Linus Torvalds authored
      There's a new pre-patch for people who want to test these things out: I'll
      probably make a real 2.1.130 soon just to make sure all the silly problems
      in 2.1.129 are left behind (ie the UP flu in particular that people are
      still discussing even though there's a known cure).
      
      The pre-patch fixes a rather serious problem with wall-clock itimer
      functions, that admittedly was very very hard to trigger in real life (the
      only reason we found it was due to the diligent help from John Taves that
      saw sporadic problems under some very specific circumstances - thanks
      John).
      
      It also fixes a very silly NFS path revalidation issue: when we
      revalidated a cached NFS path component, we didn't update the revalidation
      time, so we ended up doing a lookup over the wire every time after the
      first time - essentially making the dcache useless for path component
      caching of NFS. If you use NFS heavily, you _will_ notice this change (it
      also fixes some rather ugly uses of dentries and inodes in the NFS code
      where we didn't update the counter so the inode wasn't guaranteed to even
      be there any more!).
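
The revalidation bug can be reproduced in a toy cache: if a successful revalidation does not refresh the entry's timestamp, the entry looks stale forever and every lookup goes back to the server. This is a hypothetical model (struct toy_entry, TOY_REVALIDATE_INTERVAL, toy_lookup() and a counter standing in for the wire), not the NFS client code.

```c
#include <stdbool.h>

#define TOY_REVALIDATE_INTERVAL 30  /* hypothetical: seconds a cached entry is trusted */

/* Hypothetical cached path component. */
struct toy_entry {
    long revalidated_at;   /* time of the last successful check with the server */
};

static int wire_lookups;   /* counts simulated round trips to the server */

/* Pretend to ask the server whether the name is still valid. */
static bool server_says_valid(void)
{
    wire_lookups++;
    return true;
}

/* Look up a cached entry at time 'now'.  With update_time == false this
 * mimics the bug: revalidation succeeds but the timestamp is never refreshed. */
static bool toy_lookup(struct toy_entry *e, long now, bool update_time)
{
    if (now - e->revalidated_at < TOY_REVALIDATE_INTERVAL)
        return true;                    /* fresh: answered from the cache */
    if (!server_says_valid())
        return false;
    if (update_time)
        e->revalidated_at = now;        /* the fix: remember when we checked */
    return true;
}
```

With the timestamp refreshed, ten lookups within ten seconds cost one simulated round trip instead of ten.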
      
      Also, thanks to Richard Gooch &co, who found the rather nasty race
      condition when a kernel thread was started from an init-region. The
      trivial fix was to not have the kernel thread function be inlined, but
      while fixing it was trivial, it wasn't trivial to notice in the first
      place. Good debugging.
      
      And the UP flu is obviously fixed here (as it was in earlier pre-patches
      and in various other patches floating around).
      
                              Linus
    • Import 2.1.130pre2 · 2eec9bc7
      Linus Torvalds authored
    • Linux 2.1.129 · c54c8322
      Linus Torvalds authored
      To a large degree it is more merges for PPC and Sparc (and
      somehow I must have missed ARM _again_, so I'll have to find that).
      
      But there's a few other things in there:
       - ncr53c8xx tag fix
       - more sound fixes.
       - NFS fixed
       - some subtle TCP issues fixed
       - and lots of mm smoothness tweaks (most of those have been floating
         around for some time - like getting rid of the last vestiges of page
         ages which just complicated and hurt the code)
      
      Have fun with it, and tell me if it breaks. But it won't. I'm finally
      getting the old "greased weasel" feeling back. In short, this is the much
      awaited perfect and bug-free release, and the only reason I don't call it
      2.2 is that I'm chicken.
      
      	Kvaa, kvaa,
      			Linus
    • Import 2.1.129pre6 · 03c31052
      Linus Torvalds authored
    • Import 2.1.129pre5 · ee537af3
      Linus Torvalds authored
    • Import 2.1.129pre4 · d3c10203
      Linus Torvalds authored
    • Linux 2.1.129-pre3 · 5f99a99e
      Linus Torvalds authored
      I don't know how I made an old pre-patch available: I've made a pre-3 that
      has the proper proc thing so that it compiles (it is otherwise identical
      to pre-2, so if you got pre-2 to compile by patching by hand, then there's
      no reason to get pre-3).
      
                      Linus
    • Import 2.1.129pre2 · 73f97101
      Linus Torvalds authored
    • Import 2.1.129pre1 · ef6a1333
      Linus Torvalds authored
    • Import 2.1.128 · d9c0ffee
      Linus Torvalds authored
    • Import 2.1.128pre1 · c2ef85f5
      Linus Torvalds authored
    • Linux 2.1.127 · 2470b27d
      Linus Torvalds authored
      Ok,
       after two fairly hectic weeks for me, 2.1.127 is finally out there.
      
      This kernel does:
       - various small but important networking fixes from Davem (thanks). One
         of them is the "anti-nagle" bit to allow programs that know what they
         are doing to avoid nagling by telling the kernel so. This is mainly
         things like Web servers and ftp-servers that can use this option
         together with "sendfile()".
       - scheduling timeout interface change: the new interface is much more
         logical than the old one, and allows us to get the jiffies wrap-around
         case right. Thanks to Andrea Arcangeli.
       - Various driver updates: specialix, sonycd,
       - Memory management fixups. Handle out-of-memory conditions correctly,
         and handle high memory load much more gracefully.
       - sparc and PowerPC architecture updates
       - 3c509 SMP fix, tlan PCI probe update.
       - scsi driver updates: ncr53c8xx, aic7xxx, dc390
       - filesystem updates: autofs, hfs, umsdos
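
The jiffies wrap-around case mentioned in the scheduling-timeout item is usually handled with a wrap-safe comparison: subtract the two counter values as unsigned and look at the sign of the difference. The sketch below shows that idiom in isolation; jiffies_after() and naive_after() are hypothetical names (later kernels provide a time_after() macro built on the same idea), and it does not reproduce the actual 2.1.127 timeout interface.

```c
#include <stdbool.h>

/* True if tick count 'a' is later than 'b', even across a wrap of the
 * unsigned counter, provided the two are less than half the range apart.
 * Relies on the usual two's-complement conversion from unsigned to long. */
static bool jiffies_after(unsigned long a, unsigned long b)
{
    return (long)(b - a) < 0;
}

/* The naive test breaks as soon as the counter wraps past its maximum. */
static bool naive_after(unsigned long a, unsigned long b)
{
    return a > b;
}
```

For example, with b just below the maximum and a a few ticks past the wrap, jiffies_after(a, b) is true while naive_after(a, b) is false.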
      
      Go, test, be happy,
      
                      Linus
    • Import 2.1.127pre7 · 852a91e8
      Linus Torvalds authored
    • >> Btw, I've been looking at why Andrea thinks his patches are needed, · c8cff325
      Linus Torvalds authored
      >> because I looked very deep and the patches really shouldn't have made any
      >> real difference..
      >> The reason - tadaam - is so silly that it's embarrassing. The thing is,
      >> that the things that should use GFP_USER don't. They use GFP_KERNEL
      >> instead, and that is sufficient to explain all the problems that Andrea
      >> saw. Because GFP_KERNEL will continue to allow allocations even after the
      >> freeing up of another page has failed.
      >> After fixing that in mm/memory.c and mm/filemap.c, the problem seems to be
      >> properly fixed.
      
      > I thought of changing that but I was not sure (and in fact some emails ago I
      > asked you that too). I have not changed that myself because I was
      > worried that userspace allocation could be too light. It would be
      > nice to know if using GFP_USER and disabling kswapd (at the end of
      > vmscan.c) causes process to segfaults (so that we can know if a real time
      > process can alloc/swapout memory safely).
      
      I wonder why it wasn't GFP_USER - that's exactly what the thing is there
      for, and I don't know when it was changed. Probably with the new page
      cache or something. I just looked at the memory allocator, and it looked
      like it was doing the right thing, and it _was_ - but because it was
      called with GFP_KERNEL it tried harder than it should have to return a
      good page even when it ran out of memory.
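
The difference described here is how hard the allocator keeps trying after it has failed to free another page. The toy allocator below models only that policy; TOY_GFP_USER, TOY_GFP_KERNEL, toy_get_free_page() and the reserve threshold are invented for illustration and are not the 2.1.x allocator or its flags.

```c
#include <stdbool.h>

/* Hypothetical allocation classes, loosely modelled on GFP_USER / GFP_KERNEL. */
enum toy_gfp { TOY_GFP_USER, TOY_GFP_KERNEL };

struct toy_pool {
    int free_pages;   /* pages currently free */
    int reserve;      /* hypothetical low-water mark kept back under pressure */
    int reclaimable;  /* pages a reclaim pass could still free */
};

/* Try to free one page; fails once nothing is left to reclaim. */
static bool toy_try_to_free_page(struct toy_pool *p)
{
    if (p->reclaimable == 0)
        return false;
    p->reclaimable--;
    p->free_pages++;
    return true;
}

/* Returns true if a page was handed out. */
static bool toy_get_free_page(struct toy_pool *p, enum toy_gfp gfp)
{
    if (p->free_pages <= p->reserve && !toy_try_to_free_page(p)) {
        /* Freeing failed.  A user allocation gives up here so the process
         * sees out-of-memory; a kernel allocation keeps going and eats
         * into the reserve until nothing at all is left. */
        if (gfp == TOY_GFP_USER || p->free_pages == 0)
            return false;
    }
    p->free_pages--;
    return true;
}
```

In this model, user-page faults allocated with the kernel class would drain the reserve instead of failing early, which matches the symptom described above.
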
      Anyway, I made a pre-patch-2.1.127-6 and put it on ftp.kernel.org (pre-4
      and pre-5 have been my internal pre-patches and don't show up there). This
      has the timeout code basic fixes and the mm fixes, and doesn't fall over
      for me with Andrea's memory load case.
      
      	Linus
    • Import 2.1.127pre3 · 8e1e477e
      Linus Torvalds authored
    • Linux 2.1.127pre2 · a93be803
      Linus Torvalds authored
      I just found a case that could certainly result in endless page faults,
      and an endless stream of __get_free_page() calls. It's been there forever,
      and I basically thought it could never happen, but thinking about it some
      more it can happen a lot more easily than I thought.
      
      The problem is that the page fault handling code will give up if it cannot
      allocate a page table entry. We have code in place to handle the final
      page allocation failure, but the "mid-way" failures just failed, and
      caused the page fault to be done over and over again.
      
      More importantly, this could happen from kernel mode when a system call
      was trying to fill in a user page, in which case it wouldn't even be
      interruptible.
      
      It's really unlikely to happen (because the page tables tend to be set up
      already), but I suspect it can be triggered by execve'ing a new process
      which is not going to have any existing page tables. Even then we're
      likely to have old pages available (the ones we freed from the previous
      process), but at least it doesn't sound impossible that this could be a
      problem.
      
      I've not seen this behaviour myself, but it could have caused Andrea's
      problems, especially the harder to find ones. Andrea, can you check this
      patch (against clean 2.1.126) out and see if it makes any difference to
      your testing?
      
      (Right now it does the wrong error code: it will cause a SIGSEGV instead
      of a SIGBUS when we run out of memory, but that's a small detail).
      Essentially, instead of trying to call "oom()" and sending a signal (which
      doesn't work for kernel level accesses anyway), the code returns the
      proper return value from handle_mm_fault(), which allows the caller to do
      the right thing (which can include following the exception tables). That
      way we can handle the case of running out of memory from a kernel mode
      access too..
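
The change described above, reporting the failure instead of silently refaulting, can be modelled as a fault handler that returns a status for every allocation step and a caller that decides what to do with it. This is a hypothetical sketch: toy_handle_fault(), toy_do_page_fault(), the TOY_* codes and the budget-based allocator are invented, and it does not reproduce the real handle_mm_fault() signature or return values.

```c
#include <stdbool.h>

/* Hypothetical fault results and caller actions. */
enum toy_fault { TOY_FAULT_OK, TOY_FAULT_OOM };
enum toy_action { TOY_RESUME, TOY_SEND_SIGBUS, TOY_KERNEL_FIXUP };

/* Simulated allocator: succeeds while budget remains. */
static bool toy_alloc(int *budget)
{
    if (*budget == 0)
        return false;
    (*budget)--;
    return true;
}

/* Each level of the walk (page middle directory, page table, the page
 * itself) may need an allocation.  Every failure, including the mid-way
 * ones, is reported to the caller. */
static enum toy_fault toy_handle_fault(int *budget, bool need_pmd, bool need_pte)
{
    if (need_pmd && !toy_alloc(budget))
        return TOY_FAULT_OOM;
    if (need_pte && !toy_alloc(budget))
        return TOY_FAULT_OOM;
    if (!toy_alloc(budget))
        return TOY_FAULT_OOM;
    return TOY_FAULT_OK;
}

/* Caller: a user-mode fault gets a signal (SIGBUS, the one the text says
 * it should be); a kernel-mode access would follow the exception tables. */
static enum toy_action toy_do_page_fault(int *budget, bool need_pmd,
                                         bool need_pte, bool user_mode)
{
    if (toy_handle_fault(budget, need_pmd, need_pte) == TOY_FAULT_OK)
        return TOY_RESUME;
    return user_mode ? TOY_SEND_SIGBUS : TOY_KERNEL_FIXUP;
}
```
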
      
      (This is also why the fault gets the wrong signal - I didn't bother to fix
      up the x86 fault handler all that much ;)
      Btw, the reason I'm sending out these patches in emails instead of just
      putting them on ftp.kernel.org is that the machine has had disk problems
      for the last week, and finally gave up completely last Friday or so. So
      ftp.kernel.org is down until we have a new raid array or the old one
      magically recovers.  Sorry about the spamming.
      
                      Linus
    • Linux 2.1.127pre1 · d7cc008e
      Linus Torvalds authored
      I have an alternate patch for low memory circumstances that I'd like you
      to test out.
      The problem with the old kswapd setup was at least partly that kswapd was
      woken up too late - by the time kswapd was woken up, it really had to work
      fairly hard. Also, kswapd really shouldn't be real-time at all: normally
      it should just be a fairly low-priority process, and the priority should
      grow as there is more urgent need for memory.
      This alternate approach seems to work for me, and is designed to avoid the
      "spikes" of heavy real-time kswapd activity during which the machine is
      fairly unusable in the old scheme.
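
One way to read "low priority normally, priority growing with the need for memory, and woken earlier" is as a pair of watermarks plus a priority that scales with how far free memory has fallen below the upper one. The sketch is hypothetical: toy_kswapd_wakeup(), toy_kswapd_priority(), the watermarks and the 0 to 10 priority scale are invented for illustration and are not the contents of this patch.

```c
#include <stdbool.h>

/* Hypothetical watermarks, in pages. */
struct toy_watermarks {
    int high;   /* start background reclaim below this: wake early */
    int low;    /* at or below this, memory is urgently short */
};

/* Wake kswapd as soon as free memory drops under the high mark,
 * instead of waiting until things are already tight. */
static bool toy_kswapd_wakeup(int free_pages, const struct toy_watermarks *w)
{
    return free_pages < w->high;
}

/* Priority from 0 (background) to 10 (most urgent), growing linearly
 * as free memory falls from the high mark to the low mark. */
static int toy_kswapd_priority(int free_pages, const struct toy_watermarks *w)
{
    if (free_pages >= w->high)
        return 0;
    if (free_pages <= w->low)
        return 10;
    return 10 * (w->high - free_pages) / (w->high - w->low);
}
```

Ramping the priority gradually is what avoids the sudden switch to heavy real-time activity that the old scheme produced.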
      
                      Linus
    • Linux-2.1.126 · 76df47b0
      Linus Torvalds authored
       - architecture updates for alpha and MIPS (and some minor PPC updates
         too)
       - joystick updates
       - MCA stuff from Alan. The guy has too much free time on his hands.
       - stallion driver cosmetic update
       - nasty SMP race with "task queues" (not the scheduling kind), where we
         were mixing atomic metaphors, resulting in a mess. Usually a benign
         one, but occasionally you could force oopses.
       - some floppy and ide updates
       - PS/2 mouse driver integrated into the PC keyboard controller. That got
         rid of a lot of really nasty problems (it's the same controller,
         accessing it from two different drivers was always messy)
       - various driver updates: floppy, ide, network drivers, sound, video..
       - various small FS fixes - finally _really_ getting the ENOENT vs ENOTDIR
         stuff right, nfsd updates, remounting fixes, filesize limits on NFS
         and smbfs, ntfs and ufs updates...
       - shm updates from Alan
       - cleanup of some MM stuff, I hope Andrea will re-do the patches and I'll
         look at the other parts.
       - unix fd garbage collection fix, getting rid of circular dependencies..
      
      And probably various other small fixes that I have thankfully forgotten
      about.
    • Import 2.1.126pre2 · 350e3226
      Linus Torvalds authored
    • Import 2.1.126pre1 · 79e1fe75
      Linus Torvalds authored
    • Linux-2.1.125 ... pre-2.2 imminent · c968c37a
      Linus Torvalds authored
      It seems that I've finally found the mysterious bug that caused some SMP
      machines to lock up at bootup if they had no keyboard enabled. It turns
      out that the keyboard was a complete red herring, and that it just changed
      timings of bottom half handling in particular. The real culprit was some
      misguided locking attempts by the console driver at a really bad time.
      
      Anyway, that means that the last of my personal show-stopper bugs in 2.1.x
      seems to be finally history. I still expect to sync up with Alan Cox's
      patches in particular, but I'm mentally getting ready for a real 2.2.
      
      I still haven't decided on whether I'll make the same kind of "pre-2.2"
      that I did before the 2.0 release, but there are strong psychological
      reasons to do so to get people to more actively test it out with a "this
      really should be stable" mindset.
      
      In the meantime, there's now 2.1.125. Most of 2.1.125 is driver updates
      for various things, most notably perhaps joystick and the new 5.10 version
      of the Adaptec aic7xxx driver by Doug Ledford (but there are various other
      driver updates). The fix for the mysterious lock-up is a few embarrassing
      lines removed, but makes me feel a lot better ;)
      
      Go forth, multiply, and fill the earth
    • Import 2.1.125pre2 · a972c494
      Linus Torvalds authored
    • Import 2.1.125pre1 · 9d50b265
      Linus Torvalds authored
    • Linux-2.1.124... · 830e4ab4
      Linus Torvalds authored
      .. is out there now, and includes:
      
       - subtle fix for lazy FP save and restore on x86. The bug has been there
         for a long time, but was apparently triggered by the re-write of the
         low-level scheduling function. It could result in corrupted i387 state
         under certain (admittedly fairly unlikely) circumstances.
       - various networking updates. Some of the bugs fixed could result in
         kernel Oopses. None of them were common, though.
       - fixes for both filesystem accounting and quota handling.
       - the much-ado-about-little video driver merge.
       - PPC and Sparc updates
       - i386/SMP interrupt handling falls back on the safe mode.. Please tell
         me whether there are still machines with problems.
       - some new network drivers and updates
       - final (we hope) IP masquerade update
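
The lazy FP save mentioned in the first item means the FPU registers are not saved on every context switch; instead the CPU remembers which task owns them and saves them only when another task first touches the FPU. The toy model below shows that bookkeeping. Everything in it (struct toy_task, struct toy_cpu, toy_switch_to(), toy_fpu_trap(), toy_fpu_add(), and a single double standing in for the i387 state) is hypothetical, and it reproduces neither the x86 code nor the bug that was fixed.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-task saved FPU state (one register stands in for the i387 state). */
struct toy_task {
    double saved_fpu;
};

/* Hypothetical single CPU. */
struct toy_cpu {
    double fpu_reg;              /* the live FPU register */
    struct toy_task *fpu_owner;  /* whose state is currently in fpu_reg */
    struct toy_task *current;    /* task running now */
    bool ts;                     /* models the "task switched" trap flag */
};

/* Context switch: leave the FPU alone, just arm the trap. */
static void toy_switch_to(struct toy_cpu *cpu, struct toy_task *next)
{
    cpu->current = next;
    cpu->ts = true;
}

/* First FP instruction after a switch: save the old owner, load ours. */
static void toy_fpu_trap(struct toy_cpu *cpu)
{
    if (cpu->fpu_owner != cpu->current) {
        if (cpu->fpu_owner != NULL)
            cpu->fpu_owner->saved_fpu = cpu->fpu_reg;  /* save lazily */
        cpu->fpu_reg = cpu->current->saved_fpu;        /* restore ours */
        cpu->fpu_owner = cpu->current;
    }
    cpu->ts = false;
}

/* A task adds to its FP register; traps first if the flag is armed. */
static void toy_fpu_add(struct toy_cpu *cpu, double x)
{
    if (cpu->ts)
        toy_fpu_trap(cpu);
    cpu->fpu_reg += x;
}
```

If a task switches away and back without anyone else using the FPU, nothing is saved or restored at all; the correctness of the scheme rests entirely on the owner bookkeeping staying in sync with the context switches.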
      
      I still have a problem with certain machines that apparently don't want to
      boot with the keyboard not plugged in even though they should. Kill me
      now. If you have problems with i386/SMP on a machine without a keyboard,
      plug one in and send me a report..
    • Import 2.1.124pre2 · b03770e4
      Linus Torvalds authored
    • Import 2.1.124pre1 · a172a8a2
      Linus Torvalds authored
    • Import 2.1.123 · add4b624
      Linus Torvalds authored
    • Import 2.1.123pre3 · d13a7654
      Linus Torvalds authored
    • Import 2.1.123pre2 · 36800b1c
      Linus Torvalds authored
    • Import 2.1.123pre1 · 20fca9a1
      Linus Torvalds authored
    • Import 2.1.122 · f85f3898
      Linus Torvalds authored
    • Import 2.1.122pre3 · 28c24777
      Linus Torvalds authored
    • Import 2.1.122pre2 · 5f61db3c
      Linus Torvalds authored
    • 2.1.122pre1 · 90b9165f
      Linus Torvalds authored
      This may or may not fix the APM problems, and the INITRD ones.  The
      INITRD one in particular was a case of a fairly inexplicable test that
      shouldn't have been there in the first place breaking when something
      completely unrelated was cleaned up..
      
      The APM breakage was simply due to it being in the wrong place.  The
      patch looks bigger than it really is - it really only moves the file to
      the proper directory, and makes sure that it should compile with the
      standard assembler..
      
                      Linus
    • Import 2.1.121 · b1173ae3
      Linus Torvalds authored