  1. 22 Dec, 2002 5 commits
    • Andi Kleen's avatar
      [PATCH] Make mem=nopentium clear cpu_has_pse · 0b9e43dc
      Andi Kleen authored
      "mem=nopentium" would clear the PSE bit in boot_cpu_data, but the CPU
      detection later would overwrite it again from CPUID.
      
      The large pages would be correctly disabled, but cpu_has_pse was lying.
      
      This patch makes sure it stays clear when the option is given.
      
      I also took the liberty of removing these obnoxious cpu capability
      printks, which give no useful information (the data can be obtained
      either from CPUID in user space in raw form or from /proc/cpuinfo in
      processed form).
      0b9e43dc
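The fix described above can be pictured as a mask that is applied again after every CPUID-based detection pass. This is a minimal sketch under stated assumptions, not the patch itself: every name in it is hypothetical, and only the position of PSE (bit 3 of the CPUID leaf 1 EDX feature word) is taken from the architecture.

```c
/* PSE is bit 3 of the CPUID leaf 1 EDX feature word. */
#define SKETCH_FEATURE_PSE (1u << 3)

/* Hypothetical stand-in for the capability word in boot_cpu_data. */
struct sketch_cpu {
    unsigned int caps;
};

/* Features the command line asked us to hide (hypothetical global). */
static unsigned int sketch_disabled_caps;

/* Called when "mem=nopentium" is parsed: remember the request. */
static void sketch_parse_mem_nopentium(void)
{
    sketch_disabled_caps |= SKETCH_FEATURE_PSE;
}

/*
 * Called whenever the CPU is (re)detected. Reapplying the mask here is
 * what keeps a later CPUID read from setting the bit again.
 */
static void sketch_detect_cpu(struct sketch_cpu *c, unsigned int cpuid_edx)
{
    c->caps = cpuid_edx;
    c->caps &= ~sketch_disabled_caps;
}
```

With this arrangement a hypothetical `cpu_has_pse` test on `caps` stays false no matter how many times detection runs after the option was parsed.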
    • Manfred Spraul's avatar
      [PATCH] reorder 'rep;nop;' in the spinlock macro · 5e163a89
      Manfred Spraul authored
      According to Intel's recommendation, 'rep;nop;' should be called before
      testing if the lock variable was modified (i.e. rep nop;cmp;jcc). The
      current implementation does it the wrong way around: first test, then
      wait, then branch. I've asked Asit Mallik from Intel, and he recommended
      changing it.
      
      It should be at least consistent: Right now, spinlock uses
      'cmp;rep nop;jcc', rwlock uses 'rep nop;cmp;jcc'
      5e163a89
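The recommended ordering (pause, then test, then branch) can be sketched in C. This is an illustration only, not the kernel's spinlock macro: the type and function names are hypothetical, the lock-word convention (1 means free, 0 or below means held) is an assumption, and the relax macro is defined locally as 'rep; nop' on x86 and as a no-op elsewhere.

```c
#include <stdatomic.h>

/* 'rep; nop' is the PAUSE hint on x86; elsewhere this sketch does nothing. */
#if defined(__i386__) || defined(__x86_64__)
#define sketch_cpu_relax() __asm__ __volatile__("rep; nop" ::: "memory")
#else
#define sketch_cpu_relax() ((void)0)
#endif

/* Assumed convention: 1 = free, <= 0 = held. */
typedef struct {
    atomic_int slock;
} sketch_spinlock_t;

static void sketch_spin_lock(sketch_spinlock_t *lock)
{
    for (;;) {
        /* Atomic decrement; an old value of 1 means we took the lock. */
        if (atomic_fetch_sub(&lock->slock, 1) == 1)
            return;

        /* Contended path: pause first, then test, then branch. */
        do {
            sketch_cpu_relax();
        } while (atomic_load(&lock->slock) <= 0);
    }
}

static void sketch_spin_unlock(sketch_spinlock_t *lock)
{
    atomic_store(&lock->slock, 1);
}
```

The `do { ... } while` form is what places the pause ahead of the comparison; a plain `while` loop with the pause in its body would test first and so reproduce the 'cmp;rep nop;jcc' order the message objects to.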
    • Linus Torvalds's avatar
      Merge http://linux-voyager.bkbits.net/dma-generic-mapping-2.5 · d21918b6
      Linus Torvalds authored
      into home.transmeta.com:/home/torvalds/v2.5/linux
      d21918b6
    • James Bottomley's avatar
      remove PCI_NEW_DMA_COMPAT_API · e6241a27
      James Bottomley authored
      use a #include mechanism for generic implementations of the pci_
      API in terms of the dma_ one
      e6241a27
    • Linus Torvalds's avatar
      Merge bk://linuxusb.bkbits.net/linus-2.5 · b163be65
      Linus Torvalds authored
      into home.transmeta.com:/home/torvalds/v2.5/linux
      b163be65
  2. 21 Dec, 2002 35 commits
    • Richard Henderson's avatar
      ba96dab4
    • Greg Kroah-Hartman's avatar
      Merge kroah.com:/home/linux/linux/BK/bleeding-2.5 · 7037193a
      Greg Kroah-Hartman authored
      into kroah.com:/home/linux/linux/BK/gregkh-2.5
      7037193a
    • James Keniston's avatar
      [PATCH] dev_printk macro · b874f98e
      James Keniston authored
      b874f98e
    • Henning Meier-Geinitz's avatar
      [PATCH] scanner.c: Support for devices with only one bulk-in endpoint · 6f815233
      Henning Meier-Geinitz authored
      This patch (originally from Sergey Vlasov) adds support for scanners
      with only one bulk-in endpoint. It's needed by all the GT-6801 based
      scanners like the Artec Ultima 2000 or some of the Mustek BearPaws.
      6f815233
    • Henning Meier-Geinitz's avatar
      [PATCH] scanner.h: add/fix vendor/product ids · 00945e82
      Henning Meier-Geinitz authored
      This patch adds additional vendor and product ids for Nikon, Mustek,
      Plustek, Genius, Epson, Canon, Umax, Hewlett-Packard, Benq, Agfa,
      and Minolta scanners. The entries for Benq, Genius and Plustek
      scanners have been updated.
      
      I've also increased the version number to 0.4.9 and brought the
      version numbers in scanner.c and scanner.h in sync.
      00945e82
    • David Brownell's avatar
      [PATCH] ehci, qtd submit and completions · a37d3ccc
      David Brownell authored
       > ... usb-storage gets unhappy when
       > it decides (why?  and unsuccessfully) to reset high speed
       > devices.  ...
      
      I don't know if that problem is resolved, but this patch
      makes the question moot by handling an earlier error correctly.
      
      The patch updates an incorrect test, so a short read will now
      be treated as one.  Please merge.
      
      This lets storage behave again.  As in, "mkfs -c" then copy
      about 8 GB around, then 'dbench'.
      a37d3ccc
    • Linus Torvalds's avatar
      Remove old pci_dma_supported(), this is done by the generic · 4e375211
      Linus Torvalds authored
      device DMA now (see <linux/pci.h> for the compat wrapper).
      4e375211
    • James Bottomley's avatar
      generic device DMA API · 1ebad6d8
      James Bottomley authored
      add dma_ API to mirror pci_ DMA API but phrased to use struct
      device instead of struct pci_dev.
      
      See Documentation/DMA-API.txt for details
      1ebad6d8
    • Linus Torvalds's avatar
      More mtrr/if.c fixes · 011f5659
      Linus Torvalds authored
       - printk is not an acceptable substitute for errors
       - fix indentation of mtrr_close()
       - fix duplicate mtrr "release" fn pointer initializer
      011f5659
    • Andrew Morton's avatar
      [PATCH] remove unused macro MAP_ALIGN() · 7a503673
      Andrew Morton authored
      Patch from Christoph Hellwig <hch@lst.de>
      
      remove unused macro MAP_ALIGN()
      7a503673
    • Andrew Morton's avatar
      [PATCH] remove memclass() · 9a7e870f
      Andrew Morton authored
      From hch.  Nothing is using the memclass() predicate.
      9a7e870f
    • Andrew Morton's avatar
      [PATCH] don't cacheline-align radix_tree_nodes · 2a17c650
      Andrew Morton authored
      They are 260 bytes.  We can get 15 per page without cacheline
      alignment.  But we're currently only getting ten per page on P4.
      2a17c650
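The packing figures above follow from simple rounding arithmetic. The sketch below reproduces them under assumptions that are not stated in the message: a 4096-byte page, a 128-byte cache line on P4, 4-byte alignment in the unaligned case, and no per-slab management overhead. The function names are hypothetical.

```c
#include <stddef.h>

/* Round n up to the next multiple of align. */
static size_t sketch_round_up(size_t n, size_t align)
{
    return (n + align - 1) / align * align;
}

/* How many objects of obj_size fit in one page once each is aligned. */
static size_t sketch_objs_per_page(size_t page_size, size_t obj_size,
                                   size_t align)
{
    return page_size / sketch_round_up(obj_size, align);
}
```

Under those assumptions, `sketch_objs_per_page(4096, 260, 4)` gives 15, while `sketch_objs_per_page(4096, 260, 128)` gives 10, because each 260-byte node is padded to 384 bytes.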
    • Andrew Morton's avatar
      [PATCH] hugetlbfs: set inode->i_size · 74bbb9c7
      Andrew Morton authored
      An `ls' in hugetlbfs currently shows all files having zero size.
      
      So, part-cosmetic, part-informative, we here set i_size to represent the
      index of the highest present page in the mapping, plus one.
      74bbb9c7
    • Andrew Morton's avatar
      [PATCH] hugetlb: report shared memory attachment counts · 165eaa86
      Andrew Morton authored
      From Rohit Seth
      
      Attached is a patch that passes the correct information back to user
      land for the number of attachments to a shared memory segment.  I could
      have made a few more changes to the way nattach is set for regular
      cases, but I want to limit the scope at this point.
      165eaa86
    • Andrew Morton's avatar
      [PATCH] hugetlb bugfixes · f19dc938
      Andrew Morton authored
      From Rohit Seth
      
      1) Bug fixes (mainly in the unsuccessful attempts of hugepages).
      
         i) not modifying the value of key for unsuccessful key
            allocation
      
         ii) Correct usage of mmap_sem in free_hugepages
      
         iii) Proper unlocking of key->lock for partial hugepage
              allocations
      
      
      2) Include the IPC_LOCK for permission to use hugepages via the
         syscall interface.  This brings the syscall interface into line with
         the hugetlbfs interface.
      
         It also permits users who are in the superuser group to
         access hugetlb resources.  This is so that database servers can run
         without elevated permissions.
      
      3) Increment the key_counts during forks to correctly identify the
         number of processes referencing a key.
      f19dc938
    • Andrew Morton's avatar
      [PATCH] ext3: fix buffer dirtying · 0c74aabb
      Andrew Morton authored
      This is a forward-port from 2.4.  One of Stephen's recent fixes.  I
      managed to merge up only half of it.  Here is the rest.  It should fix
      the assertion failure reported by Robert Macaulay
      <robert_macaulay@dell.com>
      
      "There was a race window in buffer refiling where we could temporarily
       expose the journal's internal BH_JBDDirect flag as BH_Dirty, which is
       visible to the rest of the VFS.  That doesn't affect the journaling,
       because we hold journal_head locks while the buffer is in this
       transient state, but bdflush can see the buffer and write it out
       unexpectedly, causing ext3 to find the buffer in an unexpected state
       later.
      
       The fix simply keeps the dirty bits clear during the internal buffer
       processing, restoring the state to the private BH_JBDDirect once
       refiling is complete."
      0c74aabb
    • Andrew Morton's avatar
      [PATCH] ext3 use-after-free bugfix · dd2f1160
      Andrew Morton authored
      If ext3_add_nondir() fails it will do an iput() of the inode.  But we
      continue to run ext3_mark_inode_dirty() against the potentially-freed
      inode.  This oopses when slab poisoning is enabled.
      
      Fix it so that we only run ext3_mark_inode_dirty() if the inode was
      successfully instantiated.
      dd2f1160
    • Andrew Morton's avatar
      [PATCH] rename locals in ext2_new_block() · 02d0c3df
      Andrew Morton authored
      Renames the local variables `bh2', `i', `j', `k', and `tmp' to
      something meaningful.  This brings ext2_new_block() into line with
      ext3_new_block().
      02d0c3df
    • Andrew Morton's avatar
      [PATCH] ext2: smarter block allocation startup · 7dcaa802
      Andrew Morton authored
      The same thing, for ext2.
      7dcaa802
    • Andrew Morton's avatar
      [PATCH] ext3: smarter block allocation startup · d2562c9d
      Andrew Morton authored
      When an ext3 (or ext2) file is first created the filesystem has to
      choose the initial starting block for its data allocations.  In the
      usual (new-file) case, that initial goal block is the zeroeth block of
      a particular blockgroup.
      
      This is the worst possible choice.  Because it _guarantees_ that this
      file's blocks will be pessimally intermingled with the blocks of
      another file which is growing within the same blockgroup.
      
      We've always had this problem with files in the same directory.  With
      the introduction of the Orlov allocator we now have the problem with
      files in different directories.  And it got noticed.  This is the cause
      of the post-Orlov 50% slowdown in dbench throughput on ext3 on
      write-through caching SCSI on SMP.  And 25% in ext2.
      
      It doesn't happen on uniprocessor because a single CPU will not exhibit
      sufficient concurrency in allocation against two or more files.
      
      It will happen on uniprocessor if the files are growing slowly.
      
      It has always happened if the files are in the same directory.
      
      ext2 has the same problem but it is significantly less damaging there
      because of ext2's eight-block per-inode preallocation window.
      
      The patch largely solves this problem by not always starting the
      allocation goal at the zeroeth block of the blockgroup.  We instead
      chop the blockgroup into sixteen starting points and select one of those
      based on the lower four bits of the calling process's PID.
      
      The PID was chosen as the index because this will help to ensure that
      related files have the same starting goal.  If one process is slowly
      writing two files in the same directory, we still lose.
      
      
      Using the PID in the heuristic is a bit weird.  As an alternative I
      tried using the file's directory's i_ino.  That fixed the dbench
      problem OK but caused a 15% slowdown in the fast-growth `untar a kernel
      tree' workload.  Because this approach will cause files which are in
      different directories to spread out more.  Suppressing that behaviour
      when the files are all being created by the same process is a
      reasonable heuristic.
      
      
      I changed dbench to never unlink its files, and used e2fsck to
      determine how many fragmented files were present after a `dbench 32'
      run.  With this patch and the next couple, ext2's fragmentation went
      from 22% to 13% and ext3's from 25% to 10.4%.
      d2562c9d
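The starting-point selection described above (sixteen slots per blockgroup, indexed by the low four bits of the PID) can be sketched as follows. The function name and the 32768-blocks-per-group figure used in the usage note are assumptions for illustration; the real allocator code may differ in detail.

```c
/*
 * Offset, within a blockgroup, of the initial allocation goal.
 * The group is divided into sixteen equally spaced starting points and
 * the low four bits of the caller's PID pick one of them.
 */
static unsigned long sketch_goal_offset(unsigned long blocks_per_group,
                                        unsigned int pid)
{
    unsigned long stride = blocks_per_group / 16;

    return stride * (pid & 15);
}
```

For a group of 32768 blocks the stride is 2048 blocks, so a process whose PID ends in 0x4 starts its search at block 8192 of the group. Every file created by that process gets the same offset, which matches the message's reasoning that related files should share a starting goal.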
    • Andrew Morton's avatar
      [PATCH] ext2/3: better starting group for S_ISREG files · 1cdf4231
      Andrew Morton authored
      ext2 places non-directory objects into the same blockgroup as their
      directory, as long as that directory has free inodes.  It does this
      even if there are no free blocks in that blockgroup (!).
      
      This means that if there are lots of files being created at a common
      point in the tree, they _all_ have the same starting blockgroup.  For
      each file we do a big search forwards for the first block and the
      allocations end up getting intermingled.
      
      So this patch will avoid placing new inodes in block groups which have
      no free blocks.
      
      So far so good.  But this means that if a lot of new files are being
      created under a directory (or multiple directories) which are in the
      same blockgroup, all the new inodes will overflow into the same
      blockgroup.  No improvement at all.
      
      So the patch arranges for the new inode locations to be "spread out"
      across different blockgroups if they are not going to be placed in
      their directory's block group.  This is done by adding parent->i_ino
      into the starting point for the quadratic hash.  i_ino was chosen so
      that files which are in the same directory will tend to all land in the
      same new blockgroup.
      1cdf4231
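One way to picture the spreading described above is a probe sequence whose starting point is shifted by the parent directory's inode number. The sketch below is an assumption-laden illustration, not the ext2/ext3 code: the structure, the function name, the doubling-step probe, and the linear fallback are all hypothetical.

```c
/* Hypothetical per-group summary. */
struct sketch_group {
    unsigned int free_inodes;
    unsigned int free_blocks;
};

/* Returns the chosen group index, or -1 if no group has a free inode. */
static int sketch_find_group(const struct sketch_group *g,
                             unsigned int ngroups,
                             unsigned int parent_group,
                             unsigned long parent_ino)
{
    unsigned int i, j;

    /* Prefer the directory's own group, but only if it has free blocks. */
    if (g[parent_group].free_inodes && g[parent_group].free_blocks)
        return (int)parent_group;

    /* Fold the parent's inode number into the start of the probe. */
    i = (unsigned int)((parent_group + parent_ino) % ngroups);
    for (j = 1; j < ngroups; j <<= 1) {
        i += j;
        if (i >= ngroups)
            i -= ngroups;
        if (g[i].free_inodes && g[i].free_blocks)
            return (int)i;
    }

    /* Last resort: linear scan for any group with a free inode. */
    i = parent_group;
    for (j = 0; j < ngroups; j++) {
        if (++i >= ngroups)
            i = 0;
        if (g[i].free_inodes)
            return (int)i;
    }
    return -1;
}
```

Because the shift depends only on the parent's inode number, files from the same directory follow the same probe sequence and tend to land in the same overflow group, while files from other directories in the same full blockgroup start elsewhere.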
    • Andrew Morton's avatar
      [PATCH] ext2/3 commentary and cleanup · 61432dbc
      Andrew Morton authored
      - Add some (much-needed) commentary to the ext2/ext3 block allocator
        state fields.
      
      - Remove the SEARCH_FROM_ZERO debug code.  I wrote that to trigger
        some race and it hasn't been used in a year.
      61432dbc
    • Andrew Morton's avatar
      [PATCH] Give kswapd writeback higher priority than pdflush · e386771c
      Andrew Morton authored
      The `low latency page reclaim' design works by preventing page
      allocators from blocking on request queues (and by preventing them from
      blocking against writeback of individual pages, but that is immaterial
      here).
      
      This has a problem under some situations.  pdflush (or a write(2)
      caller) could be saturating the queue with highmem pages.  This
      prevents anyone from writing back ZONE_NORMAL pages.  We end up doing
      enormous amounts of scanning.
      
      A test case is to mmap(MAP_SHARED) almost all of a 4G machine's memory,
      then kill the mmapping applications.  The machine instantly goes from
      0% of memory dirty to 95% or more.  pdflush kicks in and starts writing
      the least-recently-dirtied pages, which are all highmem.  The queue is
      congested so nobody will write back ZONE_NORMAL pages.  kswapd chews
      50% of the CPU scanning past dirty ZONE_NORMAL pages and page reclaim
      efficiency (pages_reclaimed/pages_scanned) falls to 2%.
      
      So this patch changes the policy for kswapd.  kswapd may use all of a
      request queue, and is prepared to block on request queues.
      
      What will now happen in the above scenario is:
      
      1: The page allocator scans some pages, fails to reclaim enough
         memory and takes a nap in blk_congestion_wait().
      
      2: kswapd() will scan the ZONE_NORMAL LRU and will start writing
         back pages.  (These pages will be rotated to the tail of the
         inactive list at IO-completion interrupt time).
      
         This writeback will saturate the queue with ZONE_NORMAL pages.
         Conveniently, pdflush will avoid the congested queues.  So we end up
         writing the correct pages.
      
      In this test, kswapd CPU utilisation falls from 50% to 2%, page reclaim
      efficiency rises from 2% to 40% and things are generally a lot happier.
      
      
      The downside is that kswapd may now do a lot less page reclaim,
      increasing page allocation latency, causing more direct reclaim,
      increasing lock contention in the VM, etc.  But I have not been able to
      demonstrate that in testing.
      
      
      The other problem is that there is only one kswapd, and there are lots
      of disks.  That is a generic problem - without being able to co-opt
      user processes we don't have enough threads to keep lots of disks saturated.
      
      One fix for this would be to add an additional "really congested"
      threshold in the request queues, so kswapd can still perform
      nonblocking writeout.  This gives kswapd priority over pdflush while
      allowing kswapd to feed many disk queues.  I doubt if this will be
      called for.
      e386771c
    • Andrew Morton's avatar
      [PATCH] Remove PF_NOWARN · 833cb2a6
      Andrew Morton authored
      We keep getting in a mess with the current->flags setting and
      unsetting.
      
      Remove current->flags:PF_NOWARN and create __GFP_NOWARN instead.
      833cb2a6
    • Andrew Morton's avatar
      [PATCH] misc fixes · 72c36b7d
      Andrew Morton authored
      - A C99 initialiser in drivers/char/mem.c
      
      - Remove unneeded deref in madvise_willneed()
      72c36b7d
    • Andrew Morton's avatar
      [PATCH] Add generic_file_readonly_mmap() for nommu · 503c99ef
      Andrew Morton authored
      Add a generic_file_readonly_mmap() for !CONFIG_MMU.
      503c99ef
    • Andrew Morton's avatar
      [PATCH] more informative slab poisoning · 4f781c84
      Andrew Morton authored
      slab poisons objects with 0x5a both when they are constructed and when
      they are freed.  So it is not possible to tell whether a deref of
      0x5a5a5a5a was a use-before-initialisation bug or a use-after-free bug.
      
      The patch changes it so that
      
      1) A deref of 0x5a5a5a5a means use-of-uninitialised-memory
      
      2) A deref of 0x6b6b6b6b means use-of-freed-memory.
      4f781c84
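The two poison patterns make a faulting value self-describing. A minimal sketch of the idea, with hypothetical names and only the two byte values taken from the message:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_POISON_UNINIT 0x5a /* written when an object is constructed */
#define SKETCH_POISON_FREED  0x6b /* written when an object is freed */

enum sketch_poison_kind {
    SKETCH_NOT_POISON,
    SKETCH_UNINITIALISED,
    SKETCH_FREED
};

/* Fill an object with the given poison byte. */
static void sketch_poison(void *obj, size_t size, int byte)
{
    memset(obj, byte, size);
}

/* Classify a 32-bit value seen in an oops or a debugger. */
static enum sketch_poison_kind sketch_classify(uint32_t value)
{
    if (value == 0x5a5a5a5au)
        return SKETCH_UNINITIALISED;
    if (value == 0x6b6b6b6bu)
        return SKETCH_FREED;
    return SKETCH_NOT_POISON;
}
```

A pointer read from a freed object now shows up as 0x6b6b6b6b rather than 0x5a5a5a5a, so the kind of bug can be read straight off the faulting address.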
    • Andrew Morton's avatar
      [PATCH] fix use-after-free bug in move_vma() · 5446f21e
      Andrew Morton authored
      move_vma() calls do_munmap() and then uses the memory at *new_vma.
      
      But when starting X11 it just happens that the memory which do_munmap()
      unmapped had the same start address and range as *new_vma.  So new_vma
      is freed by do_munmap().
      
      This was never noticed before because (vm_flags & VM_LOCKED) evaluates
      false when vm_flags is 0x5a5a5a5a.  But I just changed that to 0x6b6b6b6b
      and boom - we call make_pages_present() with start == end == 0x6b6b6b6b and
      it goes BUG.
      
      So I think the right fix here is for move_vma() to not inspect the values
      of any vma's after it has called do_munmap().
      
      The patch does that, for `new_vma'.
      
      The local variable `vma' is also being used after the call to do_munmap(),
      and this may also be a bug.  Proving that this is not so, and adding a
      comment to explain why is hereby added to Hugh's todo list ;)
      5446f21e
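Why the old poison value hid this bug comes down to a single bit. The sketch below assumes that VM_LOCKED is 0x00002000, which is the value I believe kernels of this era used; treat that constant and the function name as assumptions rather than as a quotation from the source tree.

```c
/* Assumed value of VM_LOCKED (bit 13); not taken from the commit itself. */
#define SKETCH_VM_LOCKED 0x00002000UL

/* Mirrors the (vm_flags & VM_LOCKED) test mentioned in the message. */
static int sketch_is_locked(unsigned long vm_flags)
{
    return (vm_flags & SKETCH_VM_LOCKED) != 0;
}
```

Bit 13 is clear in 0x5a5a5a5a (the second byte, 0x5a, is 0101 1010 in binary) and set in 0x6b6b6b6b (0x6b is 0110 1011), so under this assumption the test is false for the old poison and true for the new one, consistent with the behaviour described above.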
    • Andrew Morton's avatar
      [PATCH] fix a page dirtying race in vmscan.c · 985babe8
      Andrew Morton authored
      There's a small window in which another CPU could dirty the page after
      we've cleaned it, and before we've moved it to mapping->dirty_pages().
      The end result is a dirty page on mapping->locked_pages, which is
      wrong.
      
      So take mapping->page_lock before clearing the dirty bit.
      985babe8
    • Andrew Morton's avatar
      [PATCH] sync_fs deadlock fix · e101875d
      Andrew Morton authored
      Running a `mount -o remount' against ext3 deadlocks if there is heavy
      write activity.  It's a sort of AB/BA deadlock caused by calling
      log_wait_commit() under lock_super().  The caller holds lock_super()
      and is waiting for a commit, but the commit cannot complete because
      lock_super() is also used in the block allocator.
      
      The way we fixed this in the past is to drop the superblock lock inside
      ext3.  The way this patch fixes it is to arrange for lock_super() to
      not be held around the ->sync_fs() call.
      
      Also: sync_filesystems is on the sys_sync() path and is racy wrt
      unmount.  Check sb->s_root after taking sb->s_umount.
      e101875d
    • Linus Torvalds's avatar
      Sysenter cleanups (originals by Brian Gerst, updated and expanded by me): · d8ce4c5f
      Linus Torvalds authored
       - set up kernel stack pointer for sysenter at each context switch.
       - disable sysenter while in vm86 mode.
       - clean up mtrr number defines and SEP feature testing
      d8ce4c5f
    • Ivan Kokshaysky's avatar
      [PATCH] PCI: setup-xx fixes · 2ce208e5
      Ivan Kokshaysky authored
      Don't disable PCI devices before changing the BARs, as discussed
      recently.  Disabling PCI_COMMAND_MASTER bit is an obvious bug.
      
      Further, pdev_enable_device() is a leftover from very old (2.0, I guess)
      alpha PCI code.  It's used in pci_assign_unassigned_resources() to
      enable *every* PCI device in the system.  So, if we have two graphic
      cards on the same bus, both with legacy VGA IO...  oops.
      
      Actually, only alpha relied on that due to the lack of
      pcibios_enable_device (which has been already fixed).
      2ce208e5
    • Manfred Spraul's avatar
      [PATCH] new attempt at sys_poll allocation (was: Re: Poll patches..) · 9dd405aa
      Manfred Spraul authored
      This replaces the dynamically allocated two-level array in sys_poll with
      a dynamically allocated linked list.  The current implementation causes
      at least two alloc/free calls, even if only one or two descriptors are
      polled.  This reduces that to one alloc/free, and the .text segment is
      around 220 bytes shorter.  The microbenchmark that polls one pipe fd is
      around 30% faster.  [1140 cycles instead of 1604 cycles, Celeron mobile
      1.13 GHz]
      9dd405aa
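The "around 30% faster" figure can be checked against the quoted cycle counts. The helpers below are hypothetical and only restate the arithmetic; which of the two readings the author had in mind is not stated.

```c
/* Percentage of cycles saved per call: (before - after) / before. */
static double sketch_cycles_saved_pct(double before, double after)
{
    return (before - after) * 100.0 / before;
}

/* Percentage increase in calls per unit time: before / after - 1. */
static double sketch_rate_gain_pct(double before, double after)
{
    return (before / after - 1.0) * 100.0;
}
```

Going from 1604 to 1140 cycles is roughly 29% fewer cycles per call, or equivalently about 41% more calls in the same time.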