- 23 Nov, 2007 40 commits
-
-
Linus Torvalds authored
There's a new pre-patch out there. I'm back from Finland, and have caught up with just about half the email that I got during the stay. However, even the part I caught up with I may have partly missed something in, because (for obvious reasons) I didn't read them as carefully (*) as I usually do.

This should fix at least part of the NFS problems people have reported: there was code to completely incorrectly invalidate quite valid write requests under some circumstances. The pre-patch also contains the first batch of patches merged in from Alan, and the "rmdir" problems should be fixed (mostly thanks to Al Viro).

This pre-patch also gets rid of some imho completely unnecessary complexity in some of the VM memory freeing routines. There have been patches floating around that added more heuristics on when to do something, and this tries to get the same result by just removing old heuristics that didn't make much sense.

Linus

(*) Even my usual "careful" is not very careful by other people's standards. So when _I_ say that I wasn't very careful, you should just assume that I was reading my email about as carefully as a hyper-active hedgehog on some serious uppers. Can you say "ignored email" three times quickly while chewing on an apple?
-
Linus Torvalds authored
2.1.131 is out there now - and will be the last kernel release for a while. I'm going to Finland for a week and a half, and will be back mid December. During that time I hope people will beat on this. I'll be able to read email when I'm gone, but as I haven't been back in over a year, I'm not very likely to.

Alan, I haven't got any replies (positive or negative) about the VFS fixes in pre-2.1.131-3 (which are obviously in the real 131 too), so I hope that means that I successfully fixed all filesystems. The chance of that being true is remote, but hey, I can hope. If not, I assume you'll be doing your ac patches anyway (any bugs wrt rmdir() should be fairly obvious once seen), and people might as well consider those official..

Linus
-
Linus Torvalds authored
Ok, I've made a new pre-patch-2.1.131-3. The basic problem (that Alexander Viro correctly diagnosed) is that the inode locking was horribly and subtly wrong for the case of a "rmdir()" call. What rmdir() did was essentially something like - VFS: lock the directory that contains the directory to remove (this is normal and required to make sure that the name updates are completely atomic - so removing or adding anything requires you to hold the lock on the directory that contains the removee/addee) - low-level filesystem: lock the directory you're going to remove, in order to atomically check that it's empty. So far so good, the above makes tons of sense. HOWEVER, the problem is fairly obvious if anybody before Alexander had actually bothered to think about it: when we hold two locks, we had better make sure that we get the locks in the right order, or we may end up deadlocked with two (or more) processes getting the locks in the wrong order and waiting on each other. Now, if it was only rmdir(), things would be fine, because the directory hierarchy itself imposes a lock order for rmdir(). But we have another case that needs to lock two directories: "rename()". And that one doesn't have the same kind of obvious order, and uses a different way to order the two locks it gets. BOOM. As far as I can tell, this is a problem in 2.0.x too, but while it's a potential really nasty DoS-opening, it does have the saving grace that the window to trigger it is really really small. I don't know if you can actually make an exploit for it that has any real chance of hitting it, but it's at least conceptually possible. Now, the only sane fix was to actually make the VFS layer do all the locking for rmdir(), and thus let the VFS layer make sure the order is correct, so that low-level filesystems don't need to worry their pretty heads. I tried to do that in the previous pre-patch, and it worked well for ext2, but not all that much else. 
The problem was that too many filesystems "knew" what the rmdir() downcall used to do. Oh, well. Anyway, I've fixed the low-level filesystems as far as I can tell, and the end result is a much cleaner interface (and one less bug). But it's an interface change at a fairly late date, and while the fixes to smbfs etc looked for the most part obvious, I haven't been able to test them, so I've done most of them "blind". Sadly, this bug couldn't just be glossed over, because a normal user could (by knowing the exact right incantation) force tons of unkillable processes that held critical filesystem resources (any lookup on a directory that was locked would in turn also lock up). So I'd ask people who have done filesystems for Linux to look over my changes, and if the filesystems are not part of the standard distribution please look over your own locally maintained fs code. I think we can ignore 2.0.x by virtue of it probably being virtually impossible to trigger. I'll leave the decision up to Alan. Most specially, I'd like to have people who use/maintain vfat and umsdos filesystems to test out that I actually made those filesystems happy with my changes. The other filesystems were more straightforward. Oh, and thanks to Alexander. Not that I really needed another bug to fix, but it feels good to plug holes. Linus The change is basically: - the VFS layer locks the directory to be removed for you (as opposed to just the directory that contains the directory to be removed as it used to). A lot of filesystems didn't actually do this, and it is required, because otherwise the test for an empty directory may be subverted by a clever hacker. - the VFS layer will have done a dcache "prune" operation on the directory, and if there were no other uses for that dcache entry, it will have done a "d_drop()" on it too. - the above essentially means that any filesystem can do a if (!list_empty(&dentry->d_hash)) return -EBUSY; to test whether there are other users of this directory. 
No need to do any extra pruning etc - if it's been dropped there won't be any new users of the dentry afterwards, so there are no races. So after doing the above test you know that you'll have exclusive access to the dentry forever. Most notably, the low-level filesystem should _not_ look at the dentry->d_count member to see how many users there are. The VFS layer currently artificially raises the dentry count to make sure "d_delete()" doesn't get rid of the inode early. - however: traditional local UNIX-type filesystems tend to want to allow removing of the directory even if it is in use by something else. This requires that the inode be accessible even after the rmdir() - even though it doesn't necessarily need to actually _do_ anything. For a normal UNIX-like filesystem this tends to be trivial and quite automatic behaviour, but you need to think about whether your filesystem is of the kind where the inode stays around even after the delete until we locally do the final "iput()". For example, on networked filesystems this is generally not true, simply because the server will have de-allocated the inode even if we still have a reference to it locally.
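
The lock-ordering hazard described above can be sketched in user space. This is only an illustration of the general rule "always take a pair of locks in one agreed order", not the scheme the VFS layer actually uses: `struct dir_node`, `lock_two_dirs()` and `unlock_two_dirs()` are hypothetical names, pthread mutexes stand in for the inode locks, and ordering by address is an assumption chosen because it is easy to show.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical stand-in for a directory inode with its own lock. */
struct dir_node {
    pthread_mutex_t lock;
    const char *name;
};

/*
 * Take both locks in one global order (lowest address first), so two
 * callers locking the same pair can never each hold one lock while
 * waiting for the other. Ordering by address is an assumption of this
 * sketch, not what the kernel does.
 */
void lock_two_dirs(struct dir_node *a, struct dir_node *b)
{
    assert(a && b);
    if (a == b) {
        pthread_mutex_lock(&a->lock);
        return;
    }
    if ((uintptr_t)a > (uintptr_t)b) {
        struct dir_node *tmp = a;
        a = b;
        b = tmp;
    }
    pthread_mutex_lock(&a->lock);
    pthread_mutex_lock(&b->lock);
}

/* Release order does not matter for deadlock avoidance. */
void unlock_two_dirs(struct dir_node *a, struct dir_node *b)
{
    assert(a && b);
    pthread_mutex_unlock(&a->lock);
    if (a != b)
        pthread_mutex_unlock(&b->lock);
}
```

With this helper, a caller passing (parent, victim) and another passing (victim, parent) end up taking the two locks in the same order, which is the property the old rmdir()/rename() combination lacked.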
-
Linus Torvalds authored
There's a pre-131-2 patch there on ftp.kernel.org in the testing directory. This should have the NFS locking issues worked out (please test), and also has a rather subtle but potentially very nasty deadlock due to incorrect semaphore ordering with rmdir() hopefully fixed for good. Alan, the regparm patches are also there.

Linus

nfs: write back everything whenever some lock is changed (not just for unlock), and always invalidate the caches.
-
Linus Torvalds authored
Following hot on the heels of the greased weasel, the basted turkey rears its handsome head. The basted turkey release fixes some problems that our dear weasel had, namely:

 - NFS reference counting was wrong. It had been wrong for a long time, but apparently the more aggressively asynchronous code was more easily able to show the resultant random memory corruption. That should be gone.
 - The UP flu fixed officially (this has been in most of the 2.1.129 patches)
 - kernel_thread() used to be able to cause bad things in init-routines at bootup. Fixed.
 - itimers could lead to bad things in SMP under heavy itimer load.
 - various mm tweaks to make it behave better under load. Things for dirty buffers still under consideration.
 - IP masquerading check fixes.
 - acenic gigabit ethernet driver
 - some drunken revelers fixed some MCA issues.
 - alpha PCI setup updates and video drivers
 - hfs and minix filesystem fixes.

On the whole, an excellent thing to do this evening, and goes together remarkably well with some good red wine. Amaze your friends and relatives by completely ignoring them, sitting in a corner with your own basted turkey, and getting wasted on red wine.

Much more fun than your average thanksgiving dinner,

Linus
-
Linus Torvalds authored
There's a new pre-patch for people who want to test these things out: I'll probably make a real 2.1.130 soon just to make sure all the silly problems in 2.1.129 are left behind (ie the UP flu in particular that people are still discussing even though there's a known cure). The pre-patch fixes a rather serious problem with wall-clock itimer functions, that admittedly was very very hard to trigger in real life (the only reason we found it was due to the diligent help from John Taves that saw sporadic problems under some very specific circumstances - thanks John). It also fixes a very silly NFS path revalidation issue: when we revalidated a cached NFS path component, we didn't update the revalidation time, so we ended up doing a lookup over the wire every time after the first time - essentially making the dcache useless for path component caching of NFS. If you use NFS heavily, you _will_ notice this change (it also fixes some rather ugly uses of dentries and inodes in the NFS code where we didn't update the counter so the inode wasn't guaranteed to even be there any more!). Also, thanks to Richard Gooch &co, who found the rather nasty race condition when a kernel thread was started from an init-region. The trivial fix was to not have the kernel thread function be inlined, but while fixing it was trivial, it wasn't trivial to notice in the first place. Good debugging. And the UP flu is obviously fixed here (as it was in earlier pre-patches and in various other patches floating around). Linus
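
The NFS revalidation issue above comes down to one missing assignment, which a small user-space sketch can show. Everything here is hypothetical (`struct cached_name`, `lookup_over_wire()`, `revalidate()`, the tick units); it is not the NFS client code, only the shape of the problem: if a successful check does not refresh the timestamp, every later use goes over the wire again.

```c
#include <assert.h>

/* Hypothetical cached path component; not the real dentry layout. */
struct cached_name {
    unsigned long verified_at;   /* tick of the last successful check */
    unsigned long wire_lookups;  /* how often we asked the server */
};

/* Pretend to ask the server; always succeeds in this sketch. */
int lookup_over_wire(struct cached_name *c)
{
    c->wire_lookups++;
    return 1;
}

/* Returns 1 if the cached entry may be used, 0 if it must be dropped. */
int revalidate(struct cached_name *c, unsigned long now, unsigned long max_age)
{
    assert(c != 0);
    if (now - c->verified_at < max_age)
        return 1;               /* still fresh, no network traffic */
    if (!lookup_over_wire(c))
        return 0;
    /* Without this update, every call after the first expiry would
     * fall through to lookup_over_wire() again. */
    c->verified_at = now;
    return 1;
}
```

Calling `revalidate()` twice within `max_age` ticks of a successful check costs one wire lookup, not two.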
-
Linus Torvalds authored
-
Linus Torvalds authored
To a large degree this is more merges for PPC and Sparc (and somehow I must have missed ARM _again_, so I'll have to find that). But there's a few other things in there:

 - ncr53c8xx tag fix
 - more sound fixes.
 - NFS fixed
 - some subtle TCP issues fixed
 - and lots of mm smoothness tweaks (most of those have been floating around for some time - like getting rid of the last vestiges of page ages which just complicated and hurt the code)

Have fun with it, and tell me if it breaks. But it won't. I'm finally getting the old "greased weasel" feeling back. In short, this is the much awaited perfect and bug-free release, and the only reason I don't call it 2.2 is that I'm chicken.

Kvaa, kvaa,

Linus
-
Linus Torvalds authored
-
Linus Torvalds authored
-
Linus Torvalds authored
-
Linus Torvalds authored
I don't know how I made an old pre-patch available: I've made a pre-3 that has the proper proc thing so that it compiles (it is otherwise identical to pre-2, so if you got pre-2 to compile by patching by hand, then there's no reason to get pre-3). Linus
-
Linus Torvalds authored
-
Linus Torvalds authored
-
Linus Torvalds authored
-
Linus Torvalds authored
-
Linus Torvalds authored
Ok, after two fairly hectic weeks for me, 2.1.127 is finally out there. This kernel does:

 - various small but important networking fixes from Davem (thanks). One of them is the "anti-nagle" bit to allow programs that know what they are doing to avoid nagling by telling the kernel so. This is mainly things like Web servers and ftp-servers that can use this option together with "sendfile()".
 - scheduling timeout interface change: the new interface is much more logical than the old one, and allows us to get the jiffies wrap-around case right. Thanks to Andrea Arcangeli.
 - Various driver updates: specialix, sonycd
 - Memory management fixups. Handle out-of-memory conditions correctly, and handle high memory load much more gracefully.
 - sparc and PowerPC architecture updates
 - 3c509 SMP fix, tlan PCI probe update.
 - scsi driver updates: ncr53c8xx, aic7xxx, dc390
 - filesystem updates: autofs, hfs, umsdos

Go, test, be happy,

Linus
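
For readers wondering what "getting the jiffies wrap-around case right" involves, here is a minimal user-space sketch. It is not the new scheduling timeout interface itself; `tick_after()`, `expiry_for()`, `timed_out()` and `wrap_selfcheck()` are hypothetical names. It shows the usual trick of comparing tick counters through a signed difference, and it assumes two's-complement arithmetic and that the two ticks are less than half the counter range apart.

```c
#include <assert.h>
#include <limits.h>

/* Nonzero if tick a is later than tick b, even if the counter wrapped
 * in between. Relies on the unsigned difference being reinterpreted
 * as a signed value. */
int tick_after(unsigned long a, unsigned long b)
{
    return (long)(b - a) < 0;
}

/* Absolute expiry for a relative timeout; unsigned overflow wraps. */
unsigned long expiry_for(unsigned long now, unsigned long timeout)
{
    return now + timeout;
}

/* Nonzero once the expiry tick has been reached. */
int timed_out(unsigned long now, unsigned long expiry)
{
    return !tick_after(expiry, now);
}

/* Small self-check around the wrap point. */
void wrap_selfcheck(void)
{
    unsigned long now = ULONG_MAX - 5;
    unsigned long expiry = expiry_for(now, 10);  /* wraps past zero */

    assert(expiry < now);             /* a naive "<" compare is fooled */
    assert(!timed_out(now, expiry));  /* the signed compare is not */
    assert(timed_out(expiry, expiry));
}
```

A plain `now >= expiry` test would report the timeout as already expired in the case checked by `wrap_selfcheck()`, because the expiry value is numerically smaller after the wrap.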
-
Linus Torvalds authored
-
Linus Torvalds authored
>> because I looked very deep and the patches really shouldn't have made any
>> real difference..
>> The reason - tadaam - is so silly that it's embarrassing. The thing is,
>> that the things that should use GFP_USER don't. They use GFP_KERNEL
>> instead, and that is sufficient to explain all the problems that Andrea
>> saw. Because GFP_KERNEL will continue to allow allocations even after the
>> freeing up of another page has failed.
>> After fixing that in mm/memory.c and mm/filemap.c, the problem seems to be
>> properly fixed.
> I thought to change that but I was not sure (and in fact some email ago I
> asked that to you too). I have not changed that myself because I was
> worried that userspace allocation could be too much light. It would be
> nice to know if using GFP_USER and disabling kswapd (at the end of
> vmscan.c) causes process to segfaults (so that we can know if a real time
> process can alloc/swapout memory safely).

I wonder why it wasn't GFP_USER - that's exactly what the thing is there for, and I don't know when it was changed. Probably with the new page cache or something.

I just looked at the memory allocator, and it looked like it was doing the right thing, and it _was_ - but because it was called with GFP_KERNEL it tried harder than it should have to return a good page even when it ran out of memory.

Anyway, I made a pre-patch-2.1.127-6 and put it on ftp.kernel.org (pre-4 and pre-5 have been my internal pre-patches and don't show up there). This has the timeout code basic fixes and the mm fixes, and doesn't fall over for me with Andrea's memory load case.

Linus
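
The difference described above, where one class of allocation gives up once freeing a page has failed and the other keeps going, can be modelled in a few lines. This is a toy model under loud assumptions: `enum alloc_class`, `struct page_pool`, `try_to_free_one()` and `alloc_one()` are hypothetical, and the real allocator is far more involved. Only the policy split is meant to match the text.

```c
#include <assert.h>

/* Hypothetical allocation classes, loosely modelled on the roles the
 * text gives GFP_USER and GFP_KERNEL. */
enum alloc_class { ALLOC_USER, ALLOC_KERNEL };

struct page_pool {
    int free_pages;    /* pages currently free */
    int reclaimable;   /* pages that could still be freed */
};

/* Try to free one page; returns 1 on success, 0 if nothing is left. */
int try_to_free_one(struct page_pool *p)
{
    if (p->reclaimable == 0)
        return 0;
    p->reclaimable--;
    p->free_pages++;
    return 1;
}

/* Returns 1 if a page was handed out, 0 on failure. */
int alloc_one(struct page_pool *p, enum alloc_class cls, int low_water)
{
    assert(p != 0);
    if (p->free_pages <= low_water && !try_to_free_one(p)) {
        /* Freeing failed: user-class requests stop here, kernel-class
         * requests are still allowed to eat into what is left. */
        if (cls == ALLOC_USER)
            return 0;
    }
    if (p->free_pages == 0)
        return 0;
    p->free_pages--;
    return 1;
}
```

In this model a user-class request that is wrongly issued as kernel-class keeps draining the last free pages, which is the kind of behaviour the message attributes to the GFP_KERNEL mix-up.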
-
Linus Torvalds authored
-
Linus Torvalds authored
I just found a case that could certainly result in endless page faults, and an endless stream of __get_free_page() calls. It's been there forever, and I basically thought it could never happen, but thinking about it some more it can happen a lot more easily than I thought.

The problem is that the page fault handling code will give up if it cannot allocate a page table entry. We have code in place to handle the final page allocation failure, but the "mid-way" failures just failed, and caused the page fault to be done over and over again. More importantly, this could happen from kernel mode when a system call was trying to fill in a user page, in which case it wouldn't even be interruptible.

It's really unlikely to happen (because the page tables tend to be set up already), but I suspect it can be triggered by execve'ing a new process which is not going to have any existing page tables. Even then we're likely to have old pages available (the ones we free'd from the previous process), but at least it doesn't sound impossible that this could be a problem. I've not seen this behaviour myself, but it could have caused Andrea's problems, especially the harder to find ones.

Andrea, can you check this patch (against clean 2.1.126) out and see if it makes any difference to your testing? (Right now it does the wrong error code: it will cause a SIGSEGV instead of a SIGBUS when we run out of memory, but that's a small detail).

Essentially, instead of trying to call "oom()" and sending a signal (which doesn't work for kernel level accesses anyway), the code returns the proper return value from handle_mm_fault(), which allows the caller to do the right thing (which can include following the exception tables). That way we can handle the case of running out of memory from a kernel mode access too..
(This is also why the fault gets the wrong signal - I didn't bother to fix up the x86 fault handler all that much ;) Btw, the reason I'm sending out these patches in emails instead of just putting them on ftp.kernel.org is that the machine has had disk problems for the last week, and finally gave up completely last Friday or so. So ftp.kernel.org is down until we have a new raid array or the old one magically recovers. Sorry about the spamming. Linus
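
The change in control flow described in this message (return a result from the generic fault code and let the caller decide, rather than signalling from deep inside) can be sketched as follows. None of these names come from the patch: `struct mm_sketch`, `handle_fault_sketch()`, `arch_fault_sketch()` and the action codes are hypothetical, and what happens to a kernel-mode fault without a fixup entry is an assumption of the sketch.

```c
#include <assert.h>

enum fault_result { FAULT_OK, FAULT_OOM };

enum fault_action {
    ACT_RETURN,        /* fault resolved, resume */
    ACT_SIGNAL_USER,   /* deliver a signal to the faulting process */
    ACT_KERNEL_FIXUP,  /* follow the exception table, fail the syscall */
    ACT_KERNEL_FAIL    /* no fixup entry: assumed fatal in this sketch */
};

/* Hypothetical memory state: what is left to allocate from. */
struct mm_sketch {
    int page_tables_left;
    int pages_left;
};

/* Generic part: report failure to the caller instead of retrying or
 * signalling, including the "mid-way" page table failure. */
enum fault_result handle_fault_sketch(struct mm_sketch *mm)
{
    assert(mm != 0);
    if (mm->page_tables_left == 0 || mm->pages_left == 0)
        return FAULT_OOM;
    mm->page_tables_left--;
    mm->pages_left--;
    return FAULT_OK;
}

/* Caller (the architecture fault handler in the text) picks the action. */
enum fault_action arch_fault_sketch(struct mm_sketch *mm, int from_user,
                                    int has_fixup)
{
    if (handle_fault_sketch(mm) == FAULT_OK)
        return ACT_RETURN;
    if (from_user)
        return ACT_SIGNAL_USER;
    return has_fixup ? ACT_KERNEL_FIXUP : ACT_KERNEL_FAIL;
}
```

The point of the split is that the generic code never has to know whether the access came from user or kernel mode, so an out-of-memory fault taken inside a system call no longer loops forever.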
-
Linus Torvalds authored
I have an alternate patch for low memory circumstances that I'd like you to test out. The problem with the old kswapd setup was at least partly that kswapd was woken up too late - by the time kswapd was woken up, it really had to work fairly hard. Also, kswapd really shouldn't be real-time at all: normally it should just be a fairly low-priority process, and the priority should grow as there is more urgent need for memory. This alternate approach seems to work for me, and is designed to avoid the "spikes" of heavy real-time kswapd activity during which the machine is fairly unusable in the old scheme. Linus
-
Linus Torvalds authored
 - architecture updates for alpha and MIPS (and some minor PPC updates too)
 - joystick updates
 - MCA stuff from Alan. The guy has too much free time on his hands.
 - stallion driver cosmetic update
 - nasty SMP race with "task queues" (not the scheduling kind), where we were mixing atomic metaphors, resulting in a mess. Usually a benign one, but occasionally you could force oopses.
 - some floppy and ide updates
 - PS/2 mouse driver integrated into the PC keyboard controller. That got rid of a lot of really nasty problems (it's the same controller, accessing it from two different drivers was always messy)
 - various driver updates: floppy, ide, network drivers, sound, video..
 - various small FS fixes - finally _really_ getting the ENOENT vs ENOTDIR stuff right, nfsd updates, remounting fixes, filesize limits on NFS and smbfs, ntfs and ufs updates...
 - shm updates from Alan
 - cleanup of some MM stuff, I hope Andrea will re-do the patches and I'll look at the other parts.
 - unix fd garbage collection fix, getting rid of circular dependencies..

And probably various other small fixes that I have thankfully forgotten about.
-
Linus Torvalds authored
-
Linus Torvalds authored
-
Linus Torvalds authored
It seems that I've finally found the mysterious bug that caused some SMP machines to lock up at bootup if they had no keyboard enabled. It turns out that the keyboard was a complete red herring, and that it just changed timings of bottom half handling in particular. The real culprit was some misguided locking attempts by the console driver at a really bad time. Anyway, that means that the last of my personal show-stopper bugs in 2.1.x seems to be finally history. I still expect to sync up with Alan Cox's patches in particular, but I'm mentally getting ready for a real 2.2. I still haven't decided on whether I'll make the same kind of "pre-2.2" that I did before the 2.0 release, but there are strong psychological reasons to do so to get people to more actively test it out with a "this really should be stable" mindset. In the meantime, there's now 2.1.125. Most of 2.1.125 is driver updates for various things, most notably perhaps joystick and the new 5.10 version of the Adaptec aic7xxx driver by Doug Ledford (but there are various other driver updates). The fix for the mysterious lock-up is a few embarrassing lines removed, but makes me feel a lot better ;) Go forth, multiply, and fill the earth
-
Linus Torvalds authored
-
Linus Torvalds authored
-
Linus Torvalds authored
.. is out there now, and includes:

 - subtle fix for lazy FP save and restore on x86. The bug has been there for a long time, but was apparently triggered by the re-write of the low-level scheduling function. It could result in corrupted i387 state under certain (admittedly fairly unlikely) circumstances.
 - various networking updates. Some of the bugs fixed could result in kernel Oopses. None of them were common, though.
 - fixes for both filesystem accounting and quota handling.
 - the much-ado-about-little video driver merge.
 - PPC and Sparc updates
 - i386/SMP interrupt handling falls back on the safe mode.. Please tell me whether there are still machines with problems.
 - some new network drivers and updates
 - final (we hope) IP masquerade update

I still have a problem with certain machines that apparently don't want to boot with the keyboard not plugged in even though they should. Kill me now. If you have problems with i386/SMP on a machine without a keyboard, plug one in and send me a report..
-
Linus Torvalds authored
-
Linus Torvalds authored
-
Linus Torvalds authored
-
Linus Torvalds authored
-
Linus Torvalds authored
-
Linus Torvalds authored
-
Linus Torvalds authored
-
Linus Torvalds authored
-
Linus Torvalds authored
-
Linus Torvalds authored
This may or may not fix the APM problems, and the INITRD ones. The INITRD one in particular was a case of a fairly inexplicable test that shouldn't have been there in the first place breaking when something completely unrelated was cleaned up.. The APM breakage was simply due to it being in the wrong place. The patch looks bigger than it really is - it really only moves the file to the proper directory, and makes sure that it should compile with the standard assembler.. Linus
-
Linus Torvalds authored
-