1. 03 Oct, 2002 40 commits
    • Matthew Wilcox's avatar
      [PATCH] Remove another for_each_process loop · a62e0c44
      Matthew Wilcox authored
      Convert send_sigurg() to the for_each_task_pid() mechanism.  Also in
      the case where we were trying to send a signal to a non-existent PID,
      don't bother searching for -PID in the PGID array; we won't find it.
      a62e0c44
    • Linus Torvalds's avatar
      b1c725a7
    • Arjan van de Ven's avatar
      [PATCH] Remove sys_call_table export · f960dc50
      Arjan van de Ven authored
      The following patch removes the export of the sys_call_table.
      
      There are no uses of this export that are valid and correct. The uses I've
      found so far are
      
      1. Calling syscalls from inside kernel modules
      iBCS/Linux-abi used to do this (and this is the reason for the export
      in the first place); however, it no longer does, because newer gcc's
      (2.96/3.x) don't allow function pointer calls with a mismatching type.
      Also it's much better to just call the sys_foo functions directly (most
      are export symbol'd already and exporting more if needed wouldn't be a
      problem, they are clearly a stable interface). Since gcc no longer
      allows this (and I doubt older ones allowed it for all platforms), I
      consider this an invalid and unneeded use.
      
      2. Install new syscalls from kernel modules
      LiS seems to be doing this. The correct way to do this is how NFS does
      it for its syscall, and that doesn't need the syscall table to be
      exported for this. Without an in-kernel helper like NFS has, it is not
      possible to do this race free wrt module-unloads etc. I.e. this use of
      the export is unneeded and incorrect.
      
      3. Intercept system calls
      OProfile (and intel's vtune which is similar in function) used to do this;
      however what they really need is a notification on certain
      events (exec() mostly). The way modules do this is to store the original
      function pointer and install a new one that calls the old one after
      storing whatever info they need. This mechanism breaks badly when
      multiple such modules do this and then unload, uninstalling their
      handlers by restoring a saved pointer that may or may not point to a
      valid handler anymore.
      I.e. the use of the export in this case is just a band-aid due to the
      lack of a proper mechanism, and it is also incorrect and crash prone.
      
      4. Extend system calls
      The mechanism for this is identical to the previous one, except
      that now the actual syscall behavior is changed. I don't think open source
      modules do this (generally they don't need to, just adding things to the
      kernel proper works for them), however I've
      seen IBM's closed source cluster fs do this.
      The objections to the mechanism are the same as in 3. Also, this
      effectively changes the userspace ABI, which is undesirable.
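
The save/restore hazard described in point 3 can be reproduced in a small user-space model. This is only a sketch: the one-slot `table`, `struct hook_module`, and the `module_load`/`module_unload` helpers are hypothetical names for illustration, not kernel interfaces.

```c
#include <assert.h>
#include <stddef.h>

typedef long (*syscall_fn)(void);

static long real_exec(void) { return 0; }
static long hook_a(void) { return 1; }   /* handler owned by "module A" */
static long hook_b(void) { return 2; }   /* handler owned by "module B" */

/* One-slot stand-in for an exported syscall table (hypothetical). */
static syscall_fn table[1] = { real_exec };

struct hook_module {
    syscall_fn saved;   /* pointer found in the table at load time */
    syscall_fn mine;    /* handler this module installed */
};

static void module_load(struct hook_module *m, syscall_fn handler)
{
    assert(handler != NULL);
    m->mine = handler;
    m->saved = table[0];
    table[0] = handler;
}

static void module_unload(struct hook_module *m)
{
    /* Blindly restore the saved pointer, as the modules in point 3 do. */
    table[0] = m->saved;
}

/* Returns 1 if the table ends up pointing at an unloaded module's handler. */
static int demo_out_of_order_unload(void)
{
    struct hook_module a, b;

    table[0] = real_exec;
    module_load(&a, hook_a);   /* table: hook_a, a.saved = real_exec */
    module_load(&b, hook_b);   /* table: hook_b, b.saved = hook_a    */
    module_unload(&a);         /* table: real_exec, B's hook is lost */
    module_unload(&b);         /* table: hook_a, but A is gone       */
    return table[0] == hook_a;
}
```

In a real kernel the final state would be a pointer into freed module text, which is why the message calls this crash prone.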
      f960dc50
    • Alan Cox's avatar
      [PATCH] aacraid driver for 2.5 · 03f29536
      Alan Cox authored
      Forward port from 2.4
      03f29536
    • Alan Cox's avatar
      [PATCH] move tulip into ethernet 10,100 · c1a178bd
      Alan Cox authored
      c1a178bd
    • Alan Cox's avatar
      [PATCH] 2.5 Fix set_bit abuse in ATP driver · 389e5af6
      Alan Cox authored
      389e5af6
    • Alan Cox's avatar
      0ed5c741
    • Alan Cox's avatar
      [PATCH] PC110 pad docs are wrong · 4f94e73f
      Alan Cox authored
      Someone tweaked the PC110 documents changing touchpad to touchscreen,
      this changes it back because it is a touchpad and _not_ a touchscreen
      4f94e73f
    • Alan Cox's avatar
      [PATCH] disable GMX2000 · 66a95ea5
      Alan Cox authored
      The GMX code in the DRI is unfinished stuff.  You need the old 4.0 DRM
      for the GMX2000 until 4.3 at least
      66a95ea5
    • Alan Cox's avatar
      [PATCH] PATCH: 2.5 trivial - MCA comments · f355faea
      Alan Cox authored
      f355faea
    • Linus Torvalds's avatar
      Merge bk://linuxusb.bkbits.net/linus-2.5 · 9b42378c
      Linus Torvalds authored
      into penguin.transmeta.com:/home/penguin/torvalds/repositories/kernel/linux
      9b42378c
    • Manfred Spraul's avatar
      [PATCH] pipe bugfix /cleanup · c33585c5
      Manfred Spraul authored
      pipe_write contains a wakeup storm: two writers that write into the same
      fifo can wake each other up and spend 100% CPU time in
      wakeup/schedule without making any progress.
      
      The only regression I'm aware of is that
      
        $ dd if=/dev/zero | grep not_there
      
      will fail due to OOM, because grep does something like
      
      	for(;;) {
      		rlen = read(fd, buf, len);
      		if (rlen == len) {
      			len *= 2;
      			buf = realloc(buf, len);
      		}
      	}
      
      if it operates on pipes, and due to the improved syscall merging, read
      will always return the maximum possible amount of data. But that's a grep
      bug, not a kernel problem.
      c33585c5
    • Ingo Molnar's avatar
      [PATCH] workqueue lossage (fwd) · d8ac8dd7
      Ingo Molnar authored
      patch from DaveM
      d8ac8dd7
    • Ingo Molnar's avatar
      [PATCH] timer-2.5.40-F7 · afc14106
      Ingo Molnar authored
      This does a number of timer subsystem enhancements:
      
      - simplified timer initialization, now it's the cheapest possible thing:
      
          static inline void init_timer(struct timer_list * timer)
          {
                  timer->base = NULL;
          }
      
        since the timer functions already did a !timer->base check this did not
        have any effect on their fastpath.
      
      - the rule from now on is that timer->base is set upon activation of the
        timer, and cleared upon deactivation. This also made it possible to:
      
      - reorganize all the timer handling code to not assume anything about
        timer->entry.next and timer->entry.prev - this also removed lots of
        unnecessary cleaning of these fields, and lots of unnecessary list
        operations from the fastpath.
      
      - simplified del_timer_sync(): it now uses del_timer() plus some simple
        synchronization code. Note that this also fixes a bug: previously, if
        mod_timer (or add_timer) moved a currently executing timer to another
        CPU's timer vector, del_timer_sync() did not synchronize with the
        handler properly.
      
      - bugfix: moved run_local_timers() from scheduler_tick() into
        update_process_times() .. scheduler_tick() might be called from the fork
        code which will not quite have the intended effect ...
      
      - removed the APIC-timer-IRQ shifting done on SMP, Dipankar Sarma's
        testing shows no negative effects.
      
      - cleaned up include/linux/timer.h:
      
           - removed the timer_t typedef, and fixed up kernel/workqueue.c to use
             the 'struct timer_list' name instead.
      
           - removed unnecessary includes
      
           - renamed the 'list' field to 'entry' (it's an entry not a list head)
      
           - exchanged the 'function' and 'data' fields. This, besides being
             more logical, also unearthed the last few remaining places that
             initialized timers by assuming some given field ordering, the patch
             also fixes these places. (fs/xfs/pagebuf/page_buf.c,
             net/core/profile.c and net/ipv4/inetpeer.c)
      
           - removed the defunct sync_timers(), timer_enter() and timer_exit()
             prototypes.
      
           - added docbook-style comments.
      
      - other kernel/timer.c changes:
      
           - base->running_timer does not have to be volatile ...
      
           - added consistent comments to all the important functions.
      
           - made the sync-waiting in del_timer_sync preempt- and lowpower-
             friendly.
      
      i've compiled, booted & tested the patched kernel on x86 UP and SMP. I
      have tried moderately high networking load as well, to make sure the timer
      changes are correct - they appear to be.
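
The "base is set upon activation, cleared upon deactivation" rule can be modelled in user space as follows. This is a sketch under stated assumptions: `struct timer_base`, `model_add_timer()` and `model_del_timer()` are hypothetical stand-ins, and only `init_timer()` is taken from the message above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-CPU base: here it only counts active timers. */
struct timer_base {
    int active;
};

struct timer_list {
    struct timer_base *base;   /* non-NULL only while the timer is active */
};

/* As quoted in the commit message: initialization is a single store. */
static inline void init_timer(struct timer_list *timer)
{
    timer->base = NULL;
}

/* A timer is pending exactly when it has a base. */
static inline int timer_pending(const struct timer_list *timer)
{
    return timer->base != NULL;
}

/* Activation sets ->base (model only; no list handling or locking). */
static void model_add_timer(struct timer_base *base, struct timer_list *timer)
{
    assert(!timer_pending(timer));
    timer->base = base;
    base->active++;
}

/* Deactivation clears ->base. Returns 1 if the timer was active. */
static int model_del_timer(struct timer_list *timer)
{
    if (!timer->base)
        return 0;
    timer->base->active--;
    timer->base = NULL;
    return 1;
}
```

Because the pending test looks only at `base`, nothing in this model depends on the contents of the list linkage fields.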
      afc14106
    • Ingo Molnar's avatar
      [PATCH] sigfix-2.5.40-D6 · 794aa320
      Ingo Molnar authored
      This fixes all known signal semantics problems.
      
      sigwait() is really evil - i had to re-introduce ->real_blocked. When a
      signal has no handler defined then the actual action taken by the kernel
      depends on whether the sigwait()-ing thread was blocking the signal
      originally or not. If the signal was blocked => specific delivery to the
      thread, if the signal was not blocked => kill-all.
      
      fortunately this meant that PF_SIGWAIT could be killed - the real_blocked
      field contains all the necessary information to make the right decision
      at signal-sending time.
      
      i've also cleaned up and made the shared-pending code more robust: now
      there's a single central dequeue_signal() function that handles all the
      details. Plus upon unqueueing a shared-pending signal we now re-queue the
      signal to the current thread, which this time around is not going to end
      up in the shared-pending queue. This change handles the following case
      correctly: a signal was blocked in every thread, then one thread unblocks
      it and gets the signal delivered - but there's no handler for the signal
      => the correct action is to do a kill-all.
      
      i removed the unused shared_unblocked field as well, reported by Oleg
      Nesterov.
      
      now we pass both signal-tst1 and signal-tst2, so i'm confident that we got
      most of the details right.
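
The sigwait() rule described above (blocked originally => specific delivery, not blocked => kill-all) reduces to one mask test. The sketch below is a user-space model; `struct model_thread`, `sig_bit()` and `unhandled_signal_action()` are hypothetical names, and treating `real_blocked` as a plain 64-bit mask is an assumption.

```c
#include <assert.h>
#include <stdint.h>

enum delivery { DELIVER_TO_THREAD, KILL_ALL };

struct model_thread {
    uint64_t blocked;        /* current mask, altered while in sigwait() */
    uint64_t real_blocked;   /* mask as it was before sigwait() (assumed) */
};

/* Bit for signal number sig, numbered from 1. */
static uint64_t sig_bit(int sig)
{
    assert(sig >= 1 && sig <= 64);
    return (uint64_t)1 << (sig - 1);
}

/* Action for a signal with no handler that a sigwait()-ing thread receives. */
static enum delivery unhandled_signal_action(const struct model_thread *t,
                                             int sig)
{
    if (t->real_blocked & sig_bit(sig))
        return DELIVER_TO_THREAD;   /* was blocked originally */
    return KILL_ALL;                /* was not blocked originally */
}
```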
      794aa320
    • Ingo Molnar's avatar
      [PATCH] futex-2.5.40-B5 · 6a20c6fe
      Ingo Molnar authored
      This does a number of futex bugfixes, performance improvements and
      cleanups.
      
      The bugfixes are:
      
       - fix locking bug noticed by Martin Wirth: the ordering of
         page_table_lock, vcache_lock and futex_lock was inconsistent and
         created the possibility of an SMP deadlock.
      
       - fix spurious wakeup noticed by Andrew Morton: the get_user() in
         futex_wait() can set the task state to TASK_RUNNING.
      
       - fix futex_wake COW race, noticed by Martin Wirth - futex_wake() has to
         go through the same lookup rules as the futex_wait() code, otherwise it
         might end up trying to wake up based on the wrong physical page.
      
      Improvements:
      
       - speed up the basic addrs => page lookup done by the futex code. It used
         to do an unconditional get_user_pages() call, which did a vma lookup
         and other heavy-handed tactics - while the common case is that the
         page is mapped and available. Furthermore, due to the COW-race code we
         had to re-check the mapping anyway, which made the get_user_pages()
         thing pretty unnecessary. This inefficiency was noticed by Martin
         Wirth.
      
         the new lookup code first does a lightweight follow_page(), then if no
         page is present we do the get_user_pages() thing.
      
       - locking cleanups - the new lookup code made some things simpler, eg.
         the hash calculation can now be done in queue_me().
      
       - added comments
      
       - reduced include file use.
      
       - increased the futex hashtable.
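
The two-stage lookup from the "Improvements" list (lightweight check first, heavy path only when no page is present) has this general shape. It is a user-space model only: `fast_lookup()` and `slow_lookup()` are hypothetical stand-ins for follow_page() and get_user_pages(), whose real signatures and locking are not reproduced here.

```c
#include <assert.h>
#include <stddef.h>

struct model_page {
    int id;
};

struct model_mm {
    struct model_page *mapped;    /* page currently present, or NULL */
    struct model_page backing;    /* page the slow path can bring in */
    int slow_calls;               /* how often the slow path ran */
};

/* Cheap check: succeeds only if the page is already mapped. */
static struct model_page *fast_lookup(struct model_mm *mm)
{
    return mm->mapped;
}

/* Expensive path: makes the page present, then returns it. */
static struct model_page *slow_lookup(struct model_mm *mm)
{
    mm->slow_calls++;
    mm->mapped = &mm->backing;
    return mm->mapped;
}

/* Try the fast path first; fall back only when no page is present. */
static struct model_page *lookup_page(struct model_mm *mm)
{
    struct model_page *page;

    assert(mm != NULL);
    page = fast_lookup(mm);
    if (!page)
        page = slow_lookup(mm);
    return page;
}
```

In the common case where the page is already mapped, the slow path is never entered.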
      6a20c6fe
    • Ingo Molnar's avatar
      [PATCH] dump_stack() cleanup, BK-curr · 26f7ff2e
      Ingo Molnar authored
      This modifies x86's dump_stack() to print out just the backtrace, not
      the stack contents.  The patch also adds one more whitespace after the
      numeric EIP value.  The old dump looked this way:
      
        bad: scheduling while atomic!
        Stack: ffffffff c041c72f 0000006a 00000068 000000f0 c13e1f28 c04c49c0 c13e1f28
               c02a4099 c04c49c0 000000f0 00000000 00003104 c012592e 00003104 00003104
               ffffffff 34000286 00000282 00000000 00000000 c13e1f28 c04c49c0 c04c4468
        Call Trace:
         [<c011f009>]sys_gettimeofday+0x89/0x90
         [<c0113e40>]do_page_fault+0x0/0x49e
         [<c0107d63>]syscall_call+0x7/0xb
      
      the new output is:
      
        bad: scheduling while atomic!
        Call Trace:
         [<c011f009>] sys_gettimeofday+0x89/0x90
         [<c0113e40>] do_page_fault+0x0/0x49e
         [<c0107d63>] syscall_call+0x7/0xb
      
      much nicer and much more compact.
      26f7ff2e
    • Ivan Kokshaysky's avatar
      [PATCH] alpha compile fixes · 76c405d4
      Ivan Kokshaysky authored
      - alpha/kernel/signal.c: sigmask_lock to sig->siglock transition;
      - alpha/lib/Makefile: fix EV6 targets (restore EXTRA_AFLAGS accidentally
        killed by previous patch).
      76c405d4
    • Richard Henderson's avatar
      [PATCH] alpha strncpy fix · 0840c8ec
      Richard Henderson authored
      Ported across from a nearly identical fix to the glibc tree.  Under
      some conditions we'd read one too many source words and segfault.
      0840c8ec
    • Ivan Kokshaysky's avatar
      [PATCH] PCI: probing read-only BARs · 1307ef66
      Ivan Kokshaysky authored
      Some pci devices may have base address registers locked with non-zero values.
      Examples:
      - AGP aperture BAR of AMD-7xx host bridges: if the AGP window is disabled,
        this BAR is read-only and read as 0x00000008;
      - BAR0-4 of ALi IDE controllers can be non-zero and read-only.
      
      Obviously, we can't calculate the correct size of the respective region in
      this case (for AMD AGP window we'll get 4 GB resource - ouch).
      So I think that we should ignore r/o BARs (let the device specific
      fixups deal with them if needed).
      
      Patch appended (note that extra write(0)/read-back pair is required,
      as the BAR might be programmed with all 1s).
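
Why the extra write(0)/read-back pair is needed can be seen in a small model of a BAR with a mask of writable bits. This is a sketch, not the patch itself: `struct model_bar` and its helpers are hypothetical, and real code would use PCI config-space accessors.

```c
#include <assert.h>
#include <stdint.h>

struct model_bar {
    uint32_t value;      /* current register contents */
    uint32_t writable;   /* bits that software is able to change */
};

static uint32_t bar_read(const struct model_bar *b)
{
    return b->value;
}

static void bar_write(struct model_bar *b, uint32_t v)
{
    /* Only the writable bits take the new value. */
    b->value = (b->value & ~b->writable) | (v & b->writable);
}

/*
 * Returns 1 if the BAR is locked at a non-zero value. Writing all 1s
 * alone is not enough: a BAR already programmed with all 1s would read
 * back unchanged, so a second probe with 0 is needed.
 */
static int bar_is_locked(struct model_bar *b)
{
    uint32_t orig = bar_read(b);
    uint32_t ones, zeros;

    bar_write(b, 0xffffffffu);
    ones = bar_read(b);
    bar_write(b, 0);
    zeros = bar_read(b);
    bar_write(b, orig);          /* restore the original contents */

    return ones == zeros && ones != 0;
}
```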
      1307ef66
    • Linus Torvalds's avatar
      Merge penguin.transmeta.com:/home/penguin/torvalds/repositories/kernel/andrew · fda0b1ed
      Linus Torvalds authored
      into penguin.transmeta.com:/home/penguin/torvalds/repositories/kernel/linux
      fda0b1ed
    • Hugh Dickins's avatar
      [PATCH] shmem whitespace cleanups · 5de3d3bd
      Hugh Dickins authored
      Regularize the erratic whitespace conventions in mm/shmem.c.  Removal
      of blank line changes BUG_ON line numbers, otherwise builds the same.
      5de3d3bd
    • Hugh Dickins's avatar
      [PATCH] shmem: misc changes and cleanups · 42ec8004
      Hugh Dickins authored
      If PAGE_CACHE_SIZE were to differ from PAGE_SIZE, the VM_ACCT macro,
      and shmem_nopage's vm_pgoff manipulation, were still not quite right.
      
      Slip a cond_resched_lock into shmem_truncate's long loop; but not into
      shmem_unuse_inode's, since other locks held, and swapoff awful anyway.
      
      Move SetPageUptodate to where it's not already set.  Replace
      copy_from_user by __copy_from_user since access already verified.
      
      Replace BUG()s by BUG_ON()s.  Remove an uninteresting PAGE_BUG().
      42ec8004
    • Hugh Dickins's avatar
      [PATCH] shmem accounting fixes · 62fe4120
      Hugh Dickins authored
      If we're going to rely on struct page *s rather than virtual addresses
      for the metadata pages, let's count nr_swapped in the private field:
      these pages are only for storing swp_entry_ts, and need not be examined
      at all when nr_swapped is zero.
      62fe4120
    • Hugh Dickins's avatar
      [PATCH] put shmem metadata in highmem · 2729b9af
      Hugh Dickins authored
      wli suffered OOMs because tmpfs was allocating GFP_USER, for its
      metadata pages.  This patch allocates them GFP_HIGHUSER (default
      mapping->gfp_mask) and uses atomic kmaps to access (KM_USER0 for upper
      levels, KM_USER1 for lowest level).  shmem_unuse_inode and
      shmem_truncate rewritten alike to avoid repeated maps and unmaps of the
      same page: cr's truncate was much more elegant, but I couldn't quite
      see how to convert it.
      
      I do wonder whether this patch is a bloat too far for tmpfs, and even
      non-highmem configs will be penalised by page_address overhead (perhaps
      a further patch could get over that).  There is an attractive
      alternative (keep swp_entry_ts in the existing radix-tree, no metadata
      pages at all), but we haven't worked out an unhacky interface to that.
      For now at least, let's give tmpfs highmem metadata a spin.
      2729b9af
    • Hugh Dickins's avatar
      [PATCH] shmem: avoid metadata leakiness · 03844e4b
      Hugh Dickins authored
      akpm and wli each discovered unfortunate behaviour of dbench on tmpfs:
      after tmpfs has reached its data memory limit, dbench continues to
      lseek and write, and tmpfs carries on allocating unlimited metadata
      blocks to accommodate the data it then refuses.  That particular
      behaviour could be simply fixed by checking earlier; but I think tmpfs
      metablocks should be subject to the memory limit, and included in df
      and du accounting.  Also, manipulate inode->i_blocks under lock, which
      was missed before.
      03844e4b
    • Hugh Dickins's avatar
      [PATCH] consolidate shmem_getpage and shmem_getpage_locked · 7aa8800b
      Hugh Dickins authored
      The distinction between shmem_getpage and shmem_getpage_locked is not
      helpful, particularly now info->sem is gone; and shmem_getpage is
      confusingly tailored to shmem_nopage's expectations.  Put the code of
      shmem_getpage_locked into the frame of shmem_getpage, leaving its
      callers to unlock_page afterwards.
      7aa8800b
    • Hugh Dickins's avatar
      [PATCH] shmem: remove info->sem · cd7fef3d
      Hugh Dickins authored
      Between inode->i_sem and info->lock comes info->sem; but it doesn't
      guard thoroughly against the difficult races (truncate during read),
      and serializes reads from tmpfs unlike other filesystems.  I'd prefer
      to work with just i_sem and info->lock, backtracking when necessary
      (when another task allocates block or metablock at the same time).
      
      (I am not satisfied with the locked setting of next_index at the start
      of shmem_getpage_locked: it's one lock hold too many, and it doesn't
      really fix races against truncate better than before: another patch in
      a later batch will resolve that.)
      cd7fef3d
    • Hugh Dickins's avatar
      [PATCH] shmem truncate race fix · 91abc449
      Hugh Dickins authored
      The earlier partial truncation fix in shmem_truncate admits it is racy,
      and I've now seen that (though perhaps more likely when
      mpage_writepages was writing pages it shouldn't).  A cleaner fix is,
      not to repeat the memclear in shmem_truncate, but to hold the partial
      page in memory throughout truncation, by shmem_holdpage from
      shmem_notify_change.
      91abc449
    • Hugh Dickins's avatar
      [PATCH] add shmem_vm_writeback() · 3e884b46
      Hugh Dickins authored
      Give tmpfs its own shmem_vm_writeback (and empty shmem_writepages):
      going through the default mpage_writepages is very wrong for tmpfs,
      since that may write nearby pages while still mapped into mms, but
      "writing" converts pages from tmpfs file identity to swap backing
      identity: doing so while mapped breaks assumptions throughout, e.g. the
      shared file is liable to disintegrate into private instances.
      3e884b46
    • Hugh Dickins's avatar
      [PATCH] tmpfs: minor fixes · 83c69b86
      Hugh Dickins authored
      tmpfs contributes to the AltSysRqM swapcache add and delete statistics,
      but not to its find statistics: use lookup_swap_cache wrapper to
      find_get_page, to contribute to those statistics too.  Elsewhere, use
      existing info pointer and NAME_MAX definition.  (I'll be sending 2.4
      version to Marcelo shortly.)
      83c69b86
    • Hugh Dickins's avatar
      [PATCH] tmpfs: fake a non-zero size for directories · a76da73c
      Hugh Dickins authored
      Apparently some applications are confused by tmpfs's practice of
      returning zero for the size of directories.  In 2.4.20-pre6 Peter Anvin
      submitted a change to make tmpfs directories always have a size of "1".
      
      In the same spirit, this patch arranges for tmpfs directories to show
      up as having 20 * number_of_entries, including "." and "..".
      
      Apparently counting up the size of all the entries isn't worth the
      hassle.
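
The size rule above amounts to a one-line calculation. A minimal sketch, assuming the entry count passed in excludes "." and ".."; `MODEL_DIRENT_SIZE` and `model_dir_size()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Per-entry figure from the commit text; the macro name is hypothetical. */
#define MODEL_DIRENT_SIZE 20

/* Reported size of a directory holding `children` entries. */
static size_t model_dir_size(size_t children)
{
    /* Guard against overflow in the multiplication below. */
    assert(children <= SIZE_MAX / MODEL_DIRENT_SIZE - 2);
    return (children + 2) * MODEL_DIRENT_SIZE;   /* +2 for "." and ".." */
}
```

An empty directory therefore reports 40 rather than 0.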
      a76da73c
    • Hugh Dickins's avatar
      [PATCH] shmem_rename() fixes · 39d21233
      Hugh Dickins authored
      shmem_rename still didn't get parent directory link count quite right,
      in the case where you rename a directory in place of an empty directory
      (with rename syscall: doesn't happen like that with mv command); and it
      forgot to update new directory's ctime and mtime.  (I'll be sending 2.4
      version to Marcelo shortly.)
      39d21233
    • Hugh Dickins's avatar
      [PATCH] cleanup of page->flags manipulations · 6b5dbcf2
      Hugh Dickins authored
      I've had this patch hanging around for a couple of months (you liked an
      earlier version, but I never found time to resubmit it). It removes some
      unnecessary PageDirty and PageUptodate manipulations.
      
      add_to_page_cache can only receive a dirty page in the add_to_swap
      case, so deal with it there.  add_to_swap is better off using
      add_to_page_cache directly than add_to_swap_cache.  Keep move_to_ and
      _from_swap_cache simple, and don't fiddle with flags without reason.
      It's a little less efficient to correct clean->dirty list as an
      afterthought, but cuts unusual code from slow path.
      6b5dbcf2
    • Hugh Dickins's avatar
      [PATCH] tmpfs swapoff deadlock · a2495207
      Hugh Dickins authored
      tmpfs 1/5 swapoff deadlock: my igrab/iput around the yield in
      shmem_unuse_inode was rubbish, seems my testing never really hit the
      case until last week, when truncation of course deadlocked on the page
      held locked across the iput (at least I had the foresight to say "ugh!"
      there).  Don't yield here, switch over to the simple backoff I'd been
      using for months in the loopable tmpfs patch (yes, it could loop
      indefinitely for memory, that's already an issue to be dealt with
      later).  The return convention from shmem_unuse to try_to_unuse is
      inelegant (commented at both ends), but effective.
      a2495207
    • Andrew Morton's avatar
      [PATCH] convert direct-io to use bio_add_page() · c21c3ad0
      Andrew Morton authored
      From Badari Pulavarty.
      
      Use bio_add_page() in direct-io.c.
      c21c3ad0
    • Andrew Morton's avatar
      [PATCH] "io wait" process accounting · 7b88e5e0
      Andrew Morton authored
      Patch from Rik adds "I/O wait" statistics to /proc/stat.
      
      This allows us to determine how much system time is being spent
      awaiting IO completion.  This is an important statistic, as it tends to
      directly subtract from job completion time.
      
      procps-2.0.9 is OK with this, but doesn't report it.
      7b88e5e0
    • Andrew Morton's avatar
      [PATCH] add kswapd success accounting to /proc/vmstat · 7e96bae1
      Andrew Morton authored
      Tells us how many pages were reclaimed by kswapd.
      
      The `pgsteal' statistic tells us how many pages were reclaimed
      altogether.  So
      
      	pgsteal - kswapd_steal
      
      is the number of pages which were directly reclaimed by page allocating
      processes.
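
Since `pgsteal` counts all reclaimed pages and `kswapd_steal` counts kswapd's share of them, the direct-reclaim figure is the total minus that share. A minimal sketch with the two counters already read into a struct (`struct model_vmstat` and `direct_reclaim_pages()` are hypothetical names; parsing /proc/vmstat is left out):

```c
#include <assert.h>

struct model_vmstat {
    unsigned long pgsteal;        /* pages reclaimed altogether */
    unsigned long kswapd_steal;   /* pages reclaimed by kswapd */
};

/* Pages reclaimed directly by page-allocating processes. */
static unsigned long direct_reclaim_pages(const struct model_vmstat *s)
{
    /* kswapd's share cannot exceed the total in a consistent sample. */
    assert(s->pgsteal >= s->kswapd_steal);
    return s->pgsteal - s->kswapd_steal;
}
```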
      
      
      Also, the `pgscan' data is currently counting the number of pages
      scanned in shrink_cache() plus the number of pages scanned in
      refill_inactive_zone().  These are rather separate concepts, so I
      created the new `pgrefill' counter for refill_inactive_zone().
      `pgscan' is now just the number of pages scanned in shrink_cache().
      7e96bae1
    • Andrew Morton's avatar
      [PATCH] add /proc/vmstat (start of /proc/stat cleanup) · 15e19695
      Andrew Morton authored
      Moves the VM accounting out of /proc/stat and into /proc/vmstat.
      
      The VM accounting is now per-cpu.
      
      It also moves kstat.pgpgin and kstat.pgpgout into /proc/vmstat.
      Which is a bit of a duplication of /proc/diskstats (SARD), but it's
      easy, super-cheap and makes life a lot easier for all the system
      monitoring applications which we just broke.
      
      We now require procps 2.0.9.
      
      Updated versions of top and vmstat are available at http://surriel.com
      and the Cygnus CVS is up to date for these changes.  (Rik has the CVS
      info at the above site).
      
      This tidies up kernel_stat quite a lot - it now only contains CPU
      things (interrupts and CPU loads) and disk things.  So we now have:
      
      /proc/stat:	CPU things and disk things
      /proc/vmstat:	VM things	(plus pgpgin, pgpgout)
      
      The SARD patch removes the disk things from /proc/stat as well.
      15e19695
    • Andrew Morton's avatar
      [PATCH] truncate/invalidate_inode_pages rewrite · 735a2573
      Andrew Morton authored
      Rewrite these functions to use gang lookup.
      
      - This probably has similar performance to the old code in the common case.
      
      - It will be vastly quicker than current code for the worst case
        (single-page truncate).
      
      - invalidate_inode_pages() has been changed.  It used to use
        page_count(page) as the "is it mapped into pagetables" heuristic.  It
        now uses the (page->pte.direct != 0) heuristic.
      
      - Removes the worst cause of scheduling latency in the kernel.
      
      - It's a big code cleanup.
      
      - invalidate_inode_pages() has been changed to take an address_space
        *, not an inode *.
      
      - the maximum hold times for mapping->page_lock are enormously reduced,
        making it quite feasible to turn this into an irq-safe lock.  Which, it
        seems, is a requirement for sane AIO<->direct-io integration, as well
        as possibly other AIO things.
      
      (Thanks Hugh for fixing a bug in this one as well).
      
      (Christoph added some stuff too)
      735a2573