Commits · 2.1.132pre3 · Kirill Smelkov / linux

23 Nov, 2007 40 commits

Import 2.1.132pre3 · e2ba60b6
Linus Torvalds authored Nov 23, 2007

e2ba60b6

Linus Torvalds authored Nov 23, 2007

..is out there, and has everybodys favourite fix, ie the version number
has been bumped this time. In addition, compared to pre-1, it has:
 - autofs fix (uninitialized inode number could lead to "interesting"
   problems)
 - some more NFS fixes (file truncation with pending write-backs this time)
 - disable_irq()/enable_irq() now nests properly, as Alan convinced me
   (quite rightly) that they have to nest in order to work sanely with
   shared interrupt and multiple CPU's and various other schenarios.
 - more merges from Alan, we're getting closer to being synched up.
Most of the bulk of the thing is the irda stuff, that most people can
ignore.

                Linus

bd9c5382

Linux 2.1.132pre1 · 2e59abdf

Linus Torvalds authored Nov 23, 2007

There's a new pre-patch out there. I'm back from Finland, and have caught
up with just about half the email that I got during the stay. However,
even the part I caught up with I may have partly missed something in,
because (for obvious reasons) I didn't read them as carefully (*) as I
usually do.

This should fix at least part of the NFS problems people have reported:
there was code to completely incorrectly invalidate quite valid write
requests under some circumstances. The pre-patch also contains the first
batch of patches merged in from Alan, and the "rmdir" problems should be
fixed (mostly thanks to Al Viro).

This pre-patch also gets rid of some imho completely unnecessary
complexity in some of the VM memory freeing routines. There have been
patches floating around that added more heuristics on when to do
something, and this tries to get the same result by just removing old
heuristics that didn't make much sense.

	Linus

(*) Even my usual "careful" is not very careful by other peoples
standards. So when _I_ say that I wasn't very careful, you should just
assume that I was reading my email about as carefully as a hyper-active
hedgehog on some serious uppers. Can you say "ignored email" three times
quickly while chewing on an apple?

2e59abdf

Linux 2.1.131 · ec274075

Linus Torvalds authored Nov 23, 2007

2.1.131 is out there now - and will be the last kernel release for a
while. I'm going to Finland for a week and a half, and will be back mid
December. During that time I hope people will beat on this. I'll be able
to read email when I'm gone, but as I haven't been back in over a year,
I'm not very likely to.

Alan, I have got any replies (positive or negative) about the VFS fixes in
pre-2.1.131-3 (which are obviously in the real 131 too), so I hope that
means that I successfully fixed all filesystems. The chance of that being
true is remote, but hey, I can hope. If not, I assume you'll be doing
your ac patches anyway (any bugs wrt rmdir() should be fairly obvious once
seen), and people might as well consider those official..

Linus

ec274075

pre-patch-2.1.131-3 · 16c82539

Linus Torvalds authored Nov 23, 2007

Ok,
 I've made a new pre-patch-2.1.131-3.

The basic problem (that Alexander Viro correctly diagnosed) is that the
inode locking was horribly and subtly wrong for the case of a "rmdir()"
call. What rmdir() did was essentially something like

 - VFS: lock the directory that contains the directory to remove
   (this is normal and required to make sure that the name updates are
   completely atomic - so removing or adding anything requires you to hold
   the lock on the directory that contains the removee/addee)
 - low-level filesystem: lock the directory you're going to remove, in
   order to atomically check that it's empty.

So far so good, the above makes tons of sense. HOWEVER, the problem is
fairly obvious if anybody before Alexander had actually bothered to think
about it: when we hold two locks, we had better make sure that we get the
locks in the right order, or we may end up deadlocked with two (or more)
processes getting the locks in the wrong order and waiting on each other.

Now, if it was only rmdir(), things would be fine, because the directory
hierarchy itself imposes a lock order for rmdir(). But we have another
case that needs to lock two directories: "rename()". And that one doesn't
have the same kind of obvious order, and uses a different way to order the
two locks it gets. BOOM.

As far as I can tell, this is a problem in 2.0.x too, but while it's a
potential really nasty DoS-opening, it does have the saving grace that the
window to trigger it is really really small. I don't know if you can
actually make an exploit for it that has any real chance of hitting it,
but it's at least conceptually possible.

Now, the only sane fix was to actually make the VFS layer do all the
locking for rmdir(), and thus let the VFS layer make sure the order is
correct, so that low-level filesystems don't need to worry their pretty
heads. I tried to do that in the previous pre-patch, and it worked well
for ext2, but not all that much else. The problem was that too many
filesystems "knew" what the rmdir() downcall used to do. Oh, well.

Anyway, I've fixed the low-level filesystems as far as I can tell, and the
end result is a much cleaner interface (and one less bug). But it's an
interface change at a fairly late date, and while the fixes to smbfs etc
looked for the most part obvious, I haven't been able to test them, so
I've done most of them "blind".

Sadly, this bug couldn't just be glossed over, because a normal user could
(by knowing the exact right incantation) force tons of unkillable
processes that held critical filesystem resources (any lookup on a
directory that was locked would in turn also lock up). So I'd ask people
who have done filesystems for Linux to look over my changes, and if the
filesystems are not part of the standard distribution please look over
your own locally maintained fs code. I think we can ignore 2.0.x by virtue
of it probably being virtually impossible to trigger. I'll leave the
decision up to Alan.

Most specially, I'd like to have people who use/maintain vfat and umsdos
filesystems to test out that I actually made those filesystems happy with
my changes. The other filesystems were more straightforward.

Oh, and thanks to Alexander. Not that I really needed another bug to fix,
but it feels good to plug holes.

                        Linus

The change is basically:
 - the VFS layer locks the directory to be removed for you (as opposed to
   just the directory that contains the directory to be removed as it used
   to). A lot of filesystems didn't actually do this, and it is required,
   because otherwise the test for an empty directory may be subverted by a
   clever hacker.
 - the VFS layer will have done a dcache "prune" operation on the
   directory, and if there were no other uses for that dcache entry, it
   will have done a "d_drop()" on it too.
 - the above essentially means that any filesystem can do a
        if (!list_empty(&dentry->d_hash))
                return -EBUSY;
   to test whether there are other users of this directory. No need to do
   any extra pruning etc - if it's been dropped there won't be any new
   users of the dentry afterwards, so there are no races. So after doing
   the above test you know that you'll have exclusive access to the dentry
   forever.
   Most notably, the low-level filesystem should _not_ look at the
   dentry->d_count member to see how many users there are. The VFS layer
   currently artificially raises the dentry count to make sure
   "d_delete()" doesn't get rid of the inode early.
 - however: traditional local UNIX-type filesystems tend to want to allow
   removing of the directory even if it is in use by something else. This
   requires that the inode be accessible even after the rmdir() - even
   though it doesn't necessarily need to actually _do_ anything.
   For a normal UNIX-like filesystem this tends to be trivial and quite
   automatic behaviour, but you need to think about whether your
   filesystem is of the kind where the inode stays around even after the
   delete until we locally do the final "iput()". For example, on
   networked filesystems this is generally not true, simply because the
   server will have de-allocated the inode even if we still have a
   reference to it locally.

16c82539

Linux 2.1.131pre2 · b468356b

Linus Torvalds authored Nov 23, 2007

There's a pre-131-2 patch there on ftp.kernel.org in the testing
directory. This should have the NFS locking issues worked out (please
test), and also has a rather subtle but potentially very nasty deadlock
due to incorrect semaphore ordering with rmdir() hopefully fixed for good.
Alan, the regparm patches are also there.

                Linus

nfs: write back everything whenever some lock is changed (not just for
     unlock), and always invalidates the caches.

b468356b

The Basted Turkey Release (aka 2.1.130) · 2a86df06

Linus Torvalds authored Nov 23, 2007

Following hot on the heels of the greased weasel, the basted turkey rears
its handsome head.

The basted turkey release fixes some problems that our dear weasel had,
namely:
 - NFS reference counting was wrong. It had been wrong for a long time,
   but apparently the more aggressively asynchronous code was more easily
   able to show the resultant random memory corruption. That should be
   gone.
 - The UP flu fixed officially (this has been in most of the 2.1.129
   patches)
 - kernel_thread() used to be able to cause bad things in init-routines at
   bootup. Fixed.
 - itimers could lead to bad things in SMP under heavy itimer load.
 - various mm tweaks to make it behave better under load. Things for dirty
   buffers still under consideration.
 - IP masqerading check fixes.
 - acenic gigabit ethernet driver
 - some drunken revelers fixed some MCA issues.
 - alpha PCI setup updates and video drivers
 - hfs and minix filesystem fixes.

On the whole, an excellent thing to do this evening, and goes together
remarkably well with some good red wine. Amaze your friends and relatives
by completely ignoring them, sitting in a corner with your own basted
turkey, and getting wasted on red wine. Much more fun than your average
thanksgiving dinner,

		Linus

2a86df06

pre-2.1.130-3 · 63f5d27a

Linus Torvalds authored Nov 23, 2007

There's a new pre-patch for people who want to test these things out: I'll
probably make a real 2.1.130 soon just to make sure all the silly problems
in 2.1.129 are left behind (ie the UP flu in particular that people are
still discussing even though there's a known cure).

The pre-patch fixes a rather serious problem with wall-clock itimer
functions, that admittedly was very very hard to trigger in real life (the
only reason we found it was due to the diligent help from John Taves that
saw sporadic problems under some very specific circumstances - thanks
John).

It also fixes a very silly NFS path revalidation issue: when we
revalidated a cached NFS path component, we didn't update the revalidation
time, so we ended up doing a lookup over the wire every time after the
first time - essentially making the dcache useless for path component
caching of NFS. If you use NFS heavily, you _will_ notice this change (it
also fixes some rather ugly uses of dentries and inodes in the NFS code
where we didn't update the counter so the inode wasn't guaranteed to even
be there any more!).

Also, thanks to Richard Gooch &co, who found the rather nasty race
condition when a kernel thread was started from an init-region. The
trivial fix was to not have the kernel thread function be inlined, but
while fixing it was trivial, it wasn't trivial to notice in the first
place. Good debugging.

And the UP flu is obviously fixed here (as it was in earlier pre-patches
and in various other patches floating around).

                        Linus

63f5d27a

Import 2.1.130pre2 · 2eec9bc7
Linus Torvalds authored Nov 23, 2007

2eec9bc7

Linux 2.1.129 · c54c8322

Linus Torvalds authored Nov 23, 2007

To a large degree is more merges for PPC and Sparc (and
somehow I must have missed ARM _again_, so I'll have to find that).

But there's a few other things in there:
 - ncr53c8xx tag fix
 - more sound fixes.
 - NFS fixed
 - some subtle TCP issues fixed
 - and lots of mm smoothness tweaks (most of those have been floating
   around for some time - like getting rid of the last vestiges of page
   ages which just complicated and hurt the code)

Have fun with it, and tell me if it breaks. But it won't. I'm finally
getting the old "greased weasel" feeling back. In short, this is the much
awaited perfect and bug-free release, and the only reason I don't call it
2.2 is that I'm chicken.

	Kvaa, kvaa,
			Linus

c54c8322

Import 2.1.129pre6 · 03c31052
Linus Torvalds authored Nov 23, 2007

03c31052
Import 2.1.129pre5 · ee537af3
Linus Torvalds authored Nov 23, 2007

ee537af3
Import 2.1.129pre4 · d3c10203
Linus Torvalds authored Nov 23, 2007

d3c10203

Linux 2.1.129-pre3 · 5f99a99e

Linus Torvalds authored Nov 23, 2007

I don't know how I made an old pre-patch available: I've made a pre-3 that
has the proper proc thing so that it compiles (it is otherwise identical
to pre-2, so if you got pre-2 to compile by patching by hand, then there's
no reason to get pre-3).

                Linus

5f99a99e

Import 2.1.129pre2 · 73f97101
Linus Torvalds authored Nov 23, 2007

73f97101
Import 2.1.129pre1 · ef6a1333
Linus Torvalds authored Nov 23, 2007

ef6a1333
Import 2.1.128 · d9c0ffee
Linus Torvalds authored Nov 23, 2007

d9c0ffee
Import 2.1.128pre1 · c2ef85f5
Linus Torvalds authored Nov 23, 2007

c2ef85f5

Linux 2.1.127 · 2470b27d

Linus Torvalds authored Nov 23, 2007

Ok,
 after two fairly hectic weeks for me, 2.1.127 is finally out there.

This kernel does:
 - various small but important networking fixes from Davem (thanks). One
   of them is the "anti-nagle" bit to allow programs that know what they
   are doing to avoid nagling by telling the kernel so. This is mainly
   things like Web servers and ftp-servers that can use this option
   together with "sendfile()".
 - scheduling timeout interface change: the new interface is much more
   logical than the old one, and allows us to get the jiffies wrap-around
   case right. Thanks to Andrea Arcangeli.
 - Various driver updates: specialix, sonycd,
 - Memory management fixups. Handle out-of-memory conditions correctly,
   and handle high memory load much more gracefully.
 - sparc and PowerPC architecture updates
 - 3c509 SMP fix, tlan PCI probe update.
 - scsi driver updates: ncr53c8xx, aic7xxx, dc390
 - filesystem updates: autofs, hfs, umsdos

Go, test, be happy,

                Linus

2470b27d

Import 2.1.127pre7 · 852a91e8
Linus Torvalds authored Nov 23, 2007

852a91e8

>> Btw, I've been looking at why Andrea thinks he's patches are needed, · c8cff325

Linus Torvalds authored Nov 23, 2007

>> because I looked very deep and the patches really shouldn't have made any
>> real difference..
>> The reason - tadaam - is so silly that it's embarrassing. The thing is,
>> that the things that should use GFP_USER don't. They use GFP_KERNEL
>> instead, and that is sufficient to explain all the problems that Andrea
>> saw. Becuase GFP_KERNEL will continue to allow allocations even after the
>> freeing up of another page has failed.
>> After fixing that in mm/memory.c and mm/filemap.c, the problem seems to be
>> properly fixed.

> I thought to change that but I was not sure (and infact some email ago I
> asked that to you too). I have not changed that myself because I was
> worryed that userspace allocation could be too much light. It would be
> nice to know if using GFP_USER and disabling kswapd (at the end of
> vmscan.c) causes process to segfaults (so that we can know if a real time
> process can alloc/swapout memory safely).

I wonder why it wasn't GFP_USER - that's exactly what the thing is there
for, and I don't know when it was changed. Probably with the new page
cache or something. I just looked at the memory allocator, and it looked
like it was doing the right thing, and it _was_ - but because it was
called with GFP_KERNEL it tried harder than it should have to return a
good page even when it ran out of memory.
Anyway, I made a pre-patch-2.1.127-6 and put it on ftp.kernel.org (pre-4
and pre-5 have been my internal pre-patches and don't show up there). This
has the timeout code basic fixes and the mm fixes, and doesn't fall over
for me with Andreas memory load case.

	Linus

c8cff325

Import 2.1.127pre3 · 8e1e477e
Linus Torvalds authored Nov 23, 2007

8e1e477e

Linux 2.1.127pre2 · a93be803

Linus Torvalds authored Nov 23, 2007

I just found a case that could certainly result in endless page faults,
and an endless stream of __get_free_page() calls. It's been there forever,
and I bascially thought it could never happen, but thinking about it some
more it can happen a lot more easily than I thought.

The problem is that the page fault handling code will give up if it cannot
allocate a page table entry. We have code in place to handle the final
page allocation failure, but the "mid-way" failures just failed, and
caused the page fault to be done over and over again.

More importantly, this could happen from kernel mode when a system call
was trying to fill in a user page, in which case it wouldn't even be
interruptible.

It's really unlikely to happen (because the page tables tend to be set up
already), but I suspect it can be triggered by execve'ing a new process
which is not going to have any existing page tables. Even then we're
likely to have old pages available (the ones we free'd from the previous
process), but at least it doesn't sound impossible that this could be a
problem.

I've not seen this behaviour myself, but it could have caused Andrea's
problems, especially the harder to find ones. Andrea, can you check this
patch (against clean 2.1.126) out and see if it makes any difference to
your testing?

(Right now it does the wrong error code: it will cause a SIGSEGV instead
of a SIGBUS when we run out of memory, but that's a small detail).
Essentially, instead of trying to call "oom()" and sending a signal (which
doesn't work for kernel level accesses anyway), the code returns the
proper return value from handle_mm_fault(), which allows the caller to do
the right thing (which can include following the exception tables). That
way we can handle the case of running out of memory from a kernel mode
access too..

(This is also why the fault gets the wrong signal - I didn't bother to fix
up the x86 fault handler all that much ;)
Btw, the reason I'm sending out these patches in emails instead of just
putting them on ftp.kernel.org is that the machine has had disk problems
for the last week, and finally gave up completely last Friday or so. So
ftp.kernel.org is down until we have a new raid array or the old one
magically recovers.  Sorry about the spamming.

                Linus

a93be803

Linux 2.1.127pre1 · d7cc008e

Linus Torvalds authored Nov 23, 2007

I have an alternate patch for low memory circumstances that I'd like you
to test out.
The problem with the old kswapd setup was at least partly that kswapd was
woken up too late - by the time kswapd was woken up, it really had to work
fairly hard. Also, kswapd really shouldn't be real-time at all: normally
it should just be a fairly low-priority process, and the priority should
grow as there is more urgent need for memory.
This alternate approach seems to work for me, and is designed to avoid the
"spikes" of heavy real-time kswapd activity during which the machine is
fairly unusable in the old scheme.

                Linus

d7cc008e

Linux-2.1.126 · 76df47b0

Linus Torvalds authored Nov 23, 2007

 - architecture updates for alpha and MIPS (and some minor PPC updates
   too)
 - joystick updates
 - MCA stuff from Alan. The guy has too much free time on his hands.
 - stallion driver cosmetic update
 - nasty SMP race with "task queues" (not the scheduling kind), where we
   were mixing atomic metaphores, resulting in a mess. Usually a benign
   one, but occasionally you could force oopses.
 - some floppy and ide updates
 - PS/2 mouse driver integrated into the PC keyboard controller. That got
   rid of a lot of really nasty problems (it's the same controller,
   accessing it from two different drivers was always messy)
 - various driver updates: floppy, ide, network drivers, sound, video..
 - various small FS fixes - finally _really_ getting the ENOENT vs ENOTDIR
   stuff right, nfsd updates, remounting fixes, filesize limits on NFS
   and smbfs, ntfs and ufs updates...
 - shm updates from Alan
 - cleanup of some MM stuff, I hope Andrea will re-do the patches and I'll
   look at the other parts.
 - unix fd garbage collection fix, getting rid of circular dependencies..

And probably various other small fixes that I have thankfully forgotten
about.

76df47b0

Import 2.1.126pre2 · 350e3226
Linus Torvalds authored Nov 23, 2007

350e3226
Import 2.1.126pre1 · 79e1fe75
Linus Torvalds authored Nov 23, 2007

79e1fe75

Linux-2.1.125 ... pre-2.2 imminent · c968c37a

Linus Torvalds authored Nov 23, 2007

It seems that I've finally found the mysterious bug that caused some SMP
machines to lock up at bootup if they had no keyboard enabled. It turns
out that the keyboard was a complete red herring, and that it just changed
timings of bottom half handling in particular. The real culprit was some
misguided locking attempts by the console driver at a really bad time.

Anyway, that means that the last of my personal show-stopper bugs in 2.1.x
seems to be finally history. I still expect to sync up with Alan Cox's
patches in particular, but I'm mentally getting ready for a real 2.2.

I still haven't decided on whether I'll make the same kind of "pre-2.2"
that I did before the 2.0 release, but there are strong psychological
reasons to do so to get people to more actively test it out with a "this
really should be stable" mindset.

In the meantime, there's now 2.1.125. Most of 2.1.125 is driver updates
for various things, most notably perhaps joystick and the new 5.10 version
of the Adaptec aic7xxx driver by Doug Ledford (but there are various other
driver updates). The fix for the mysterious lock-up is a few embarrassing
lines removed, but makes me feel a lot better ;)

Go forth, multiply, and fill the earth

c968c37a

Import 2.1.125pre2 · a972c494
Linus Torvalds authored Nov 23, 2007

a972c494
Import 2.1.125pre1 · 9d50b265
Linus Torvalds authored Nov 23, 2007

9d50b265

Linux-2.1.124... · 830e4ab4

Linus Torvalds authored Nov 23, 2007

.. is out there now, and includes:

 - subtle fix for lazy FP save and restore on x86. The bug has been there
   for a long time, but was apparently triggered by the re-write of the
   low-level scheduling function. It could result in corrupted i387 state
   under certain (admittedly fairly unlikely) circumstances.
 - various networking updates. Some of the bugs fixed could result in
   kernel Oopses. None of them were common, though.
 - fixes for both filesystem accounting and quota handling.
 - the much-ado-about-little video driver merge.
 - PPC and Sparc updates
 - i386/SMP interrupt handling falls back on the safe mode.. Please tell
   me whether there are still machines with problems.
 - some new network drivers and updates
 - final (we hope) IP masquerade update

I still have a problem with certain machines that apparently don't want to
boot with the keyboard not plugged in even though they should. Kill me
now. If you have problems with i386/SMP on a machine without a keyboard,
plug one in and send me a report..

830e4ab4

Import 2.1.124pre2 · b03770e4
Linus Torvalds authored Nov 23, 2007

b03770e4
Import 2.1.124pre1 · a172a8a2
Linus Torvalds authored Nov 23, 2007

a172a8a2
Import 2.1.123 · add4b624
Linus Torvalds authored Nov 23, 2007

add4b624
Import 2.1.123pre3 · d13a7654
Linus Torvalds authored Nov 23, 2007

d13a7654
Import 2.1.123pre2 · 36800b1c
Linus Torvalds authored Nov 23, 2007

36800b1c
Import 2.1.123pre1 · 20fca9a1
Linus Torvalds authored Nov 23, 2007

20fca9a1
Import 2.1.122 · f85f3898
Linus Torvalds authored Nov 23, 2007

f85f3898
Import 2.1.122pre3 · 28c24777
Linus Torvalds authored Nov 23, 2007

28c24777
Import 2.1.122pre2 · 5f61db3c
Linus Torvalds authored Nov 23, 2007

5f61db3c