Commits · aaba9265318483297267400fbfce1c399b3ac018 · Kirill Smelkov / linux

15 Aug, 2002 21 commits

[PATCH] make pagemap_lru_lock irq-safe · aaba9265

Andrew Morton authored Aug 14, 2002

It is expensive for a CPU to take an interrupt while holding the page
LRU lock, because other CPUs will pile up on the lock while the
interrupt runs.

Disabling interrupts while holding the lock reduces contention by an
additional 30% on 4-way.  This is when the only source of interrupts is
disk completion.  The improvement will be higher with more CPUs and it
will be higher if there is networking happening.

The maximum hold time of this lock is 17 microseconds on 500 MHz PIII,
which is well inside the kernel's maximum interrupt latency (which was
100 usecs when I last looked, a year ago).

This optimisation is not needed on uniprocessor, but the patch disables
IRQs while holding pagemap_lru_lock anyway, so it becomes an irq-safe
spinlock, and pages can be moved from the LRU in interrupt context.

pagemap_lru_lock has been renamed to _pagemap_lru_lock to pick up any
missed uses, and to reliably break any out-of-tree patches which may be
using the old semantics.
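
The idea of an irq-safe lock can be illustrated in userspace, with the
caveat that this is only an analogy: signals stand in for interrupts and
a pthread mutex stands in for the spinlock.  In the kernel the
corresponding primitives are spin_lock_irqsave() and
spin_unlock_irqrestore().  All names below (lru_lock_irqsave,
lru_unlock_irqrestore, on_usr1, install_handler) are hypothetical and
are not taken from the patch.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <signal.h>

/* Stand-in for the LRU lock; signals play the role of interrupts. */
static pthread_mutex_t lru_lock = PTHREAD_MUTEX_INITIALIZER;
static volatile sig_atomic_t got_signal;

/* Hypothetical "interrupt handler": only records that it ran. */
static void on_usr1(int sig)
{
	(void)sig;
	got_signal = 1;
}

static void install_handler(void)
{
	signal(SIGUSR1, on_usr1);
}

/* Block all signals first, then take the lock: no handler can run on
 * this thread while the lock is held. */
static void lru_lock_irqsave(sigset_t *saved)
{
	sigset_t all;

	sigfillset(&all);
	pthread_sigmask(SIG_BLOCK, &all, saved);
	pthread_mutex_lock(&lru_lock);
}

/* Drop the lock, then restore the previous signal mask.  Any signal
 * that arrived in the meantime is delivered at this point. */
static void lru_unlock_irqrestore(const sigset_t *saved)
{
	pthread_mutex_unlock(&lru_lock);
	pthread_sigmask(SIG_SETMASK, saved, NULL);
}
```

A signal raised between the two calls stays pending until the mask is
restored, which mirrors how an interrupt is held off until the lock has
been released.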

aaba9265

[PATCH] batched removal of pages from the LRU · 008f707c

Andrew Morton authored Aug 14, 2002

Convert all the bulk callers of lru_cache_del() to use the batched
pagevec_lru_del() function.

Change truncate_complete_page() to not delete the page from the LRU.
Do it in page_cache_release() instead.  (This reintroduces the problem
with final-release-from-interrupt.  That gets fixed further on.)

This patch changes the truncate locking somewhat.  The removal from the
LRU now happens _after_ the page has been removed from the
address_space and has been unlocked.  So there is now a window where
the shrink_cache code can discover the to-be-freed page via the LRU
list.  But that's OK - the page is clean, its buffers (if any) are
clean.  It's not attached to any mapping.

008f707c

[PATCH] batched addition of pages to the LRU · 9eb76ee2

Andrew Morton authored Aug 14, 2002

The patch goes through the various places which were calling
lru_cache_add() against bulk pages and batches them up.

Also.  This whole patch series improves the behaviour of the system
under heavy writeback load.  There is a reduction in page allocation
failures, some reduction in loss of interactivity due to page
allocators getting stuck on writeback from the VM.  (This is still bad
though).

I think it's due to the change here in mpage_writepages().  That
function was originally unconditionally refiling written-back pages to
the head of the inactive list.  The theory being that they should be
moved out of the way of page allocators, who would end up waiting on
them.

It appears that this simply had the effect of pushing dirty, unwritten
data closer to the tail of the inactive list, making things worse.

So instead, if the caller is (typically) balance_dirty_pages() then
leave the pages where they are on the LRU.

If the caller is PF_MEMALLOC then the pages *have* to be refiled.  This
is because VM writeback is clustered along mapping->dirty_pages, and
it's almost certain that the pages which are being written are near the
tail of the LRU.  If they were left there, page allocators would block
on them too soon.  It would effectively become a synchronous write.

9eb76ee2

[PATCH] batched movement of lru pages in writeback · 823e0df8
Andrew Morton authored Aug 14, 2002
```
Makes mpage_writepages() move pages around on the LRU sixteen-at-a-time
rather than one-at-a-time.
```
823e0df8

[PATCH] multithread page reclaim · 3aa1dc77

Andrew Morton authored Aug 14, 2002

This patch multithreads the main page reclaim function, shrink_cache().

This function used to run under pagemap_lru_lock.  Instead, we grab
that lock, put 32 pages from the LRU into a private list, drop the
pagemap_lru_lock and then proceed to attempt to free those pages.

Any pages which were successfully reclaimed are batch-freed.  Pages
which were not reclaimed are re-added to the LRU.

This patch reduces pagemap_lru_lock contention on the 4-way by a factor
of thirty.

The shrink_cache() code has been simplified somewhat.

refill_inactive() was being called too often - often just to process
two or three pages.  Fiddled with that so it processes pages at the
same rate, but works on 32 pages at a time.

Added a couple of mark_page_accessed() calls into mm/memory.c from 2.4.
They seem appropriate.

Change the shrink_caches() logic so that it will still trickle through
the active list (via refill_inactive) even if the inactive list is much
larger than the active list.
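
The isolate-then-process pattern described above for shrink_cache() can
be sketched in userspace roughly as follows.  This is a simplified model
and not the kernel code: a pthread mutex stands in for pagemap_lru_lock,
the LRU is a singly linked list, and struct page, its reclaimable flag,
and the helpers lru_push, lru_length, isolate_batch and shrink_batch are
all hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define BATCH 32

struct page {
	struct page *next;
	int reclaimable;	/* hypothetical: "can be freed right now" */
};

static pthread_mutex_t lru_lock = PTHREAD_MUTEX_INITIALIZER;
static struct page *lru_head;

static void lru_push(struct page *p)
{
	pthread_mutex_lock(&lru_lock);
	p->next = lru_head;
	lru_head = p;
	pthread_mutex_unlock(&lru_lock);
}

static size_t lru_length(void)
{
	size_t n = 0;
	struct page *p;

	pthread_mutex_lock(&lru_lock);
	for (p = lru_head; p; p = p->next)
		n++;
	pthread_mutex_unlock(&lru_lock);
	return n;
}

/* Hold the lock only long enough to move up to BATCH pages onto a
 * private list. */
static struct page *isolate_batch(void)
{
	struct page *batch = NULL;
	int i;

	pthread_mutex_lock(&lru_lock);
	for (i = 0; i < BATCH && lru_head; i++) {
		struct page *p = lru_head;

		lru_head = p->next;
		p->next = batch;
		batch = p;
	}
	pthread_mutex_unlock(&lru_lock);
	return batch;
}

/* Work on the private list without the lock, then put the pages that
 * could not be reclaimed back under a single lock acquisition.
 * Returns the number of pages reclaimed. */
static int shrink_batch(void)
{
	struct page *batch = isolate_batch();
	struct page *keep = NULL, *keep_tail = NULL;
	int freed = 0;

	while (batch) {
		struct page *p = batch;

		batch = p->next;
		if (p->reclaimable) {
			freed++;	/* real code would batch-free here */
		} else {
			p->next = keep;
			if (!keep)
				keep_tail = p;
			keep = p;
		}
	}
	if (keep) {
		pthread_mutex_lock(&lru_lock);
		keep_tail->next = lru_head;
		lru_head = keep;
		pthread_mutex_unlock(&lru_lock);
	}
	return freed;
}
```

Because the expensive work happens on a private list, several threads
can run shrink_batch() at once and only meet briefly on the lock.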

3aa1dc77

[PATCH] pagevec infrastructure · 6a952840

Andrew Morton authored Aug 14, 2002

This is the first patch in a series of eight which address
pagemap_lru_lock contention, and which simplify the VM locking
hierarchy.

Most testing has been done with all eight patches applied, so it would
be best not to cherrypick, please.

The workload which was optimised was: 4x500MHz PIII CPUs, mem=512m, six
disks, six filesystems, six processes each flat-out writing a large
file onto one of the disks.  ie: heavy page replacement load.

The frequency with which pagemap_lru_lock is taken is reduced by 90%.

Lockmeter claims that pagemap_lru_lock contention on the 4-way has been
reduced by 98%.  Total amount of system time lost to lock spinning went
from 2.5% to 0.85%.

Anton ran a similar test on 8-way PPC: the reduction in system time was
around 25%, and the reduction in time spent playing with
pagemap_lru_lock was 80%.

	http://samba.org/~anton/linux/2.5.30/standard/
versus
	http://samba.org/~anton/linux/2.5.30/akpm/

Throughput changes on uniprocessor are modest: a 1% speedup with this
workload due to shortened code paths and improved cache locality.

The patches do two main things:

1: In almost all places where the kernel was doing something with
   lots of pages one-at-a-time, convert the code to do the same thing
   sixteen-pages-at-a-time.  Take the lock once rather than sixteen
   times.  Take the lock for the minimum possible time.

2: Multithread the pagecache reclaim function: don't hold
   pagemap_lru_lock while reclaiming pagecache pages.  That function
   was massively expensive.

One fallout from this work is that we never take any other locks while
holding pagemap_lru_lock.  So this lock conceptually disappears from
the VM locking hierarchy.


So.  This is all basically a code tweak to improve kernel scalability.
It does it by optimising the existing design, rather than by redesign.
There is little conceptual change to how the VM works.

This is as far as I can tweak it.  It seems that the results are now
acceptable on SMP.  But things are still bad on NUMA.  It is expected
that the per-zone LRU and per-zone LRU lock patches will fix NUMA as
well, but that has yet to be tested.


This first patch introduces `struct pagevec', which is the basic unit
of batched work.  It is simply:

struct pagevec {
	unsigned nr;
	struct page *pages[16];
};

pagevecs are used in the following patches to get the VM away from
page-at-a-time operations.

This patch includes all the pagevec library functions which are used in
later patches.
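
As a rough illustration of why batching helps, the following userspace
sketch counts how often the lock is taken when pages are added through a
pagevec.  Only the struct layout comes from the text above.  The mutex
(a stand-in for pagemap_lru_lock), the on_lru flag, the counter, and the
helpers pagevec_init, pagevec_add, pagevec_lru_add_sketch and
lru_add_many are assumptions made for this example and need not match
the library functions in the patch.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define PAGEVEC_SIZE 16

struct page {
	int on_lru;		/* stand-in for real LRU membership */
};

struct pagevec {
	unsigned nr;
	struct page *pages[PAGEVEC_SIZE];
};

static pthread_mutex_t lru_lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned long lru_lock_acquisitions;

static void pagevec_init(struct pagevec *pvec)
{
	pvec->nr = 0;
}

/* Add one page to the batch; returns the number of free slots left. */
static unsigned pagevec_add(struct pagevec *pvec, struct page *page)
{
	pvec->pages[pvec->nr++] = page;
	return PAGEVEC_SIZE - pvec->nr;
}

/* Put every page in the batch on the LRU under one lock acquisition,
 * then empty the batch. */
static void pagevec_lru_add_sketch(struct pagevec *pvec)
{
	unsigned i;

	pthread_mutex_lock(&lru_lock);
	lru_lock_acquisitions++;
	for (i = 0; i < pvec->nr; i++)
		pvec->pages[i]->on_lru = 1;
	pthread_mutex_unlock(&lru_lock);
	pagevec_init(pvec);
}

/* Add n pages, flushing whenever the pagevec fills up.  Returns how
 * many times the lock was taken. */
static unsigned long lru_add_many(struct page *pages, size_t n)
{
	struct pagevec pvec;
	unsigned long before = lru_lock_acquisitions;
	size_t i;

	pagevec_init(&pvec);
	for (i = 0; i < n; i++)
		if (pagevec_add(&pvec, &pages[i]) == 0)
			pagevec_lru_add_sketch(&pvec);
	if (pvec.nr)
		pagevec_lru_add_sketch(&pvec);
	return lru_lock_acquisitions - before;
}
```

With this sketch, adding 40 pages takes the lock three times (16 + 16 +
8) instead of forty.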

6a952840

[PATCH] lockd shouldn't call posix_unblock_lock here · ecc9d325

Matthew Wilcox authored Aug 14, 2002

nlmsvc_notify_blocked() is only called via the fl_notify() pointer which
is only called immediately after we already did a locks_delete_block(),
so calling posix_unblock_lock() here is always a NOP.

ecc9d325

[PATCH] Modular x86 MTRR driver. · 6a85ced0

Dave Jones authored Aug 14, 2002

This patch from Pat Mochel cleans up the hell that was mtrr.c
into something a lot more modular and easy to understand, by
doing the implementation-per-file as has been done to various
other things by Pat and myself over the last months.

It's functionally identical from a kernel internal point of view,
and a userspace point of view, and is basically just a very large
code clean up.

6a85ced0

[PATCH] stale thread detach debugging removal · 3b307fd5

Ingo Molnar authored Aug 14, 2002

one of the debugging tests triggered a false-positive BUG() when a
detached thread was straced.

3b307fd5

[PATCH] thread release infrastructure · d2b7244f

Ingo Molnar authored Aug 14, 2002

it is much cleaner to pass in the address of the user-space VM lock -
this will also enable arbitrary implementations of the stack-unlock, as
the fifth clone() parameter.

d2b7244f

[PATCH] init_tasks is not defined anywhere. · 86ae817e

Rusty Russell authored Aug 14, 2002

It's referenced by mips and mips64 (both far out of date), but never
actually defined anywhere.

86ae817e

Merge http://linuxusb.bkbits.net/linus-2.5 · edf3d92b
Linus Torvalds authored Aug 14, 2002
```
into home.transmeta.com:/home/torvalds/v2.5/linux
```
edf3d92b
[PATCH] es1371 synchronize_irq · 17454310
Petr Vandrovec authored Aug 14, 2002
```
Update ES1371 to new synchronize_irq() API.
```
17454310

[PATCH] broken cfb* support in the 2.5.31-bk · 9299c003

Petr Vandrovec authored Aug 14, 2002

line_length, type and visual moved from display struct to the fb_info's fix
structure during last fbdev updates. Unfortunately generic code was not updated
together, so now every fbdev driver is broken.

9299c003

[PATCH] Unicode characters 0x80-0x9F are valid ISO* characters · 26036678

Petr Vandrovec authored Aug 14, 2002

Characters 0x80-0x9F from ISO encodings are U+0080-U+009F, so map
them both ways. Otherwise you cannot use chars 0x80-0x9F in filenames
on filesystems using NLS.
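
A minimal sketch of the two-way mapping for this range, assuming an ISO
8859 style single-byte charset.  The function names c1_char2uni and
c1_uni2char are hypothetical; the kernel's NLS tables are not reproduced
here.

```c
#include <assert.h>
#include <stdint.h>

/* Charset byte 0x80-0x9F -> Unicode U+0080-U+009F.
 * Returns 0 on success, -1 if the byte is outside the range. */
static int c1_char2uni(uint8_t ch, uint16_t *uni)
{
	if (ch < 0x80 || ch > 0x9f)
		return -1;
	*uni = ch;
	return 0;
}

/* Unicode U+0080-U+009F -> charset byte 0x80-0x9F.
 * Returns 0 on success, -1 if the code point is outside the range. */
static int c1_uni2char(uint16_t uni, uint8_t *ch)
{
	if (uni < 0x0080 || uni > 0x009f)
		return -1;
	*ch = (uint8_t)uni;
	return 0;
}
```

Mapping in both directions is what lets a filename byte in this range
survive a round trip through Unicode.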

26036678

Merge http://linux-scsi.bkbits.net/scsi-for-linus-2.5 · f9969cbe
Linus Torvalds authored Aug 14, 2002
```
into home.transmeta.com:/home/torvalds/v2.5/linux
```
f9969cbe
Merge bk://ldm.bkbits.net/linux-2.5 · ad2d842b
Linus Torvalds authored Aug 14, 2002
```
into home.transmeta.com:/home/torvalds/v2.5/linux
```
ad2d842b
[PATCH] Trivial: remove sti from aic7xxx_old · 0352f6f5
Matthew Wilcox authored Aug 14, 2002
```
We don't need to reenable interrupts before calling panic.
```
0352f6f5
[PATCH] umem per-disk gendisks · 49ae70c0
Alexander Viro authored Aug 14, 2002

49ae70c0
[PATCH] dasd per-disk gendisks · 664aa7b2
Alexander Viro authored Aug 14, 2002

664aa7b2
[PATCH] acsi per-disk gendisks · bedbeab4
Alexander Viro authored Aug 14, 2002

bedbeab4

14 Aug, 2002 18 commits

Merge mulgrave.(none):/home/jejb/BK/53c700-2.5 · 909a019a
James Bottomley authored Aug 14, 2002
```
into mulgrave.(none):/home/jejb/BK/scsi-for-linus-2.5
```
909a019a
USB: changed usb_match_id to not need the usb_device pointer. · f601a8a6
Greg Kroah-Hartman authored Aug 14, 2002

f601a8a6
Merge ssh://linux-scsi@linux-scsi.bkbits.net/scsi-for-linus-2.5 · 130fbeeb
James Bottomley authored Aug 14, 2002
```
into mulgrave.(none):/home/jejb/BK/scsi-for-linus-2.5
```
130fbeeb

[PATCH] USB core cleanups · 16dc2073

David Brownell authored Aug 14, 2002

Moves some functions that are only used by usbfs to be private, and
documents some of the interface issues that need to be cleaned up.

16dc2073

USB: fixed DEVICE_ATTR usage in the ehci driver · 97a75be6
Greg Kroah-Hartman authored Aug 14, 2002

97a75be6
[SCSI debug driver] change DRIVER_ATTR usage · 8403fb48
James Bottomley authored Aug 14, 2002

8403fb48
Merge by hand · c6efcb49
James Bottomley authored Aug 14, 2002

c6efcb49
Merge mulgrave.(none):/home/jejb/BK/scsi-cpqfc-2.5 · 12c262b2
James Bottomley authored Aug 14, 2002
```
into mulgrave.(none):/home/jejb/BK/scsi-for-linus-2.5
```
12c262b2
Merge by hand · 378a8995
James Bottomley authored Aug 13, 2002

378a8995

This version of sg for the lk 2.5 series re-adds direct IO · 2e0376aa

Douglas Gilbert authored Aug 13, 2002

support using work done by Kai Makisara (on st driver, posted
2002/7/29).

Changelog:
    Changes since 3.5.26 (20020708)
        - re-add direct IO using Kai Makisara's work
        - re-tab to 8, start using C99-isms
        - simplify memory management

Like Kai's patch, this one needs kernel/ksyms.c altered
to export get_user_pages(). Kai's worker routines
st_map_user_pages() and st_unmap_user_pages() are duplicated
as is. Hopefully these routines will find a home in
a library soon.

The re-tabbing makes the patches rather large so here
are 2 urls:
This tarball contains sg.h and sg.c 
	http://www.torque.net/sg/p/sg3527.tgz
This gzipped patch is against lk 2.5.31 and touches
kernel/ksyms.c as well
	http://www.torque.net/sg/p/sg_3527_lk2531.diff.gz

Testing is ongoing, everything works apart from "zero
copy" copy. That uses mmap-ed IO on the read side and
direct IO on the write side. Not too many people would
be using that I suspect.

Doug Gilbert

2e0376aa

[PATCH] lk 2.5.31 scsi interface documentation · 9017032b

Douglas Gilbert authored Aug 13, 2002

Linus,
Below is a patch to a file that documents the interface
between the scsi mid level and lower level (HBA) drivers.

The main change is documenting "autosense". bios_param()'s
interface has changed.

Doug Gilbert

9017032b

Here is an update for scsi_debug that utilizes driverfs · bb70b680

Douglas Gilbert authored Aug 13, 2002

support for per driver parameters added in lk 2.5.31

1.62 changes:
  - driverfs support for these options (more to come):
    /driverfs/bus/scsi/drivers/scsi_debug/delay [rw]
    /driverfs/bus/scsi/drivers/scsi_debug/num_devs [r]
    /driverfs/bus/scsi/drivers/scsi_debug/opts [rw]
  - start using some C99
  - fdisk requires EINVAL from unsupported ioctls
    (scsi_debug previously used ENOTTY)

1.61 changes:
  - simulate delayed responses, controlled by
    'scsi_debug_delay'
  - support REPORT LUNS
  - support more MODE SENSE pages
  - [following Doug Ledford's suggestion] do autosense
    (i.e. set Scsi_Cmnd::sense_buffer array appropriately
     when a status of CHECK CONDITION is set)
  - minor driverfs support
  - start adding error injection logic, see
    "scsi_debug_every_nth"

Doug Gilbert

bb70b680

Remove extra '#include <linux/err.h>' in drivers/base/core.c · 073df01f
Patrick Mochel authored Aug 13, 2002

073df01f

Remove device_root device; replace with global_device_list. · 70f7d2ec

Patrick Mochel authored Aug 13, 2002

The device_root device was only a placeholder device that provided a head
for the global device list, and a parent directory for root bridge devices.

This removes the device and replaces it with an explicit global_device_list
and a separate root directory. We never used any of the other fields in
device_root, and we special cased it. So, it's better off dead.

70f7d2ec

Make sure we do to_dev(node) in device_suspend(). · dc4c65da
Patrick Mochel authored Aug 13, 2002

dc4c65da
Use C99 initializers in driver model core · 3a245026
Patrick Mochel authored Aug 13, 2002

3a245026
[PATCH] per-disk gendisks in ataraid · d572f1a5
Alexander Viro authored Aug 13, 2002

d572f1a5
[PATCH] NUMA-Q disable irqbalance · 59a00f8c
Martin J. Bligh authored Aug 13, 2002
```
This just adds an if switch to irq_balance which the compiler optimises
away anyway.
```
59a00f8c

13 Aug, 2002 1 commit

[PATCH] add FP exception mode prctl · fcc6fcc6

Paul Mackerras authored Aug 13, 2002

This patch adds a prctl so that processes can set their
floating-point exception mode on PPC and on PPC64.  We need this
because the FP exception mode is controlled by bits in the machine
state register, which can only be accessed by the kernel, and because
the exception mode setting interacts with the lazy FPU save/restore
that the kernel does.
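
From userspace, such a prctl might be used as sketched below.  The
constants PR_GET_FPEXC, PR_SET_FPEXC and PR_FP_EXC_PRECISE are the names
found in current <linux/prctl.h>; that this patch introduced them under
exactly those names is an assumption.  The wrapper names get_fpexc_mode
and set_fpexc_precise are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <sys/prctl.h>

/* Read the current FP exception mode into *mode.
 * Returns 0 on success, -1 with errno set otherwise (for example
 * EINVAL on architectures that do not implement this prctl). */
static int get_fpexc_mode(unsigned int *mode)
{
	return prctl(PR_GET_FPEXC, (unsigned long)mode, 0, 0, 0);
}

/* Ask for precise FP exceptions.  Same return convention as above. */
static int set_fpexc_precise(void)
{
	return prctl(PR_SET_FPEXC, PR_FP_EXC_PRECISE, 0, 0, 0);
}
```

On anything other than PPC and PPC64 both calls are expected to fail, so
callers should check the return value rather than assume the mode was
changed.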

fcc6fcc6