Commits · 03844e4b25f5993847fea8f2936eee540167cd41 · nexedi / linux

03 Oct, 2002 20 commits

[PATCH] shmem: avoid metadata leakiness · 03844e4b

Hugh Dickins authored Oct 02, 2002

akpm and wli each discovered unfortunate behaviour of dbench on tmpfs:
after tmpfs has reached its data memory limit, dbench continues to
lseek and write, and tmpfs carries on allocating unlimited metadata
blocks to accommodate the data it then refuses.  That particular
behaviour could be simply fixed by checking earlier; but I think tmpfs
metablocks should be subject to the memory limit, and included in df
and du accounting.  Also manipulate inode->i_blocks under lock, which
was missed before.

03844e4b

[PATCH] consolidate shmem_getpage and shmem_getpage_locked · 7aa8800b

Hugh Dickins authored Oct 02, 2002

The distinction between shmem_getpage and shmem_getpage_locked is not
helpful, particularly now info->sem is gone; and shmem_getpage is
confusingly tailored to shmem_nopage's expectations.  Put the code of
shmem_getpage_locked into the frame of shmem_getpage, leaving its
callers to unlock_page afterwards.

7aa8800b

[PATCH] shmem: remove info->sem · cd7fef3d

Hugh Dickins authored Oct 02, 2002

Between inode->i_sem and info->lock comes info->sem; but it doesn't
guard thoroughly against the difficult races (truncate during read),
and serializes reads from tmpfs unlike other filesystems.  I'd prefer
to work with just i_sem and info->lock, backtracking when necessary
(when another task allocates block or metablock at the same time).

(I am not satisfied with the locked setting of next_index at the start
of shmem_getpage_locked: it's one lock hold too many, and it doesn't
really fix races against truncate better than before: another patch in
a later batch will resolve that.)

cd7fef3d

[PATCH] shmem truncate race fix · 91abc449

Hugh Dickins authored Oct 02, 2002

The earlier partial truncation fix in shmem_truncate admits it is racy,
and I've now seen that (though perhaps more likely when
mpage_writepages was writing pages it shouldn't).  A cleaner fix is,
not to repeat the memclear in shmem_truncate, but to hold the partial
page in memory throughout truncation, by shmem_holdpage from
shmem_notify_change.

91abc449

[PATCH] add shmem_vm_writeback() · 3e884b46

Hugh Dickins authored Oct 02, 2002

Give tmpfs its own shmem_vm_writeback (and empty shmem_writepages):
going through the default mpage_writepages is very wrong for tmpfs,
since that may write nearby pages while still mapped into mms, but
"writing" converts pages from tmpfs file identity to swap backing
identity: doing so while mapped breaks assumptions throughout, e.g. the
shared file is liable to disintegrate into private instances.

3e884b46

[PATCH] tmpfs: minor fixes · 83c69b86

Hugh Dickins authored Oct 02, 2002

tmpfs contributes to the AltSysRqM swapcache add and delete statistics,
but not to its find statistics: use lookup_swap_cache wrapper to
find_get_page, to contribute to those statistics too.  Elsewhere, use
existing info pointer and NAME_MAX definition.  (I'll be sending 2.4
version to Marcelo shortly.)

83c69b86

[PATCH] tmpfs: fake a non-zero size for directories · a76da73c

Hugh Dickins authored Oct 02, 2002

Apparently some applications are confused by tmpfs's practice of
returning zero for the size of directories.  In 2.4.20-pre6 Peter Anvin
submitted a change to make tmpfs directories always have a size of "1".

In the same spirit, this patch arranges for tmpfs directories to show
up as having 20 * number_of_entries, including "." and "..".

Apparently counting up the size of all the entries isn't worth the
hassle.
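
The sizing rule above amounts to a single multiplication. A minimal sketch, assuming a hypothetical helper name and constant (neither is taken from the patch itself):

```c
#include <stddef.h>

/* Hypothetical constant: nominal bytes charged per directory entry. */
#define FAKE_DIRENT_SIZE 20

/*
 * Size reported for a directory holding `children` real entries.
 * "." and ".." are counted too, so an empty directory reports 40.
 */
static size_t fake_dir_size(size_t children)
{
    return (children + 2) * FAKE_DIRENT_SIZE;
}
```

Under these assumptions a directory with three files would report 100 bytes.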

a76da73c

[PATCH] shmem_rename() fixes · 39d21233

Hugh Dickins authored Oct 02, 2002

shmem_rename still didn't get parent directory link count quite right,
in the case where you rename a directory in place of an empty directory
(with rename syscall: doesn't happen like that with mv command); and it
forgot to update new directory's ctime and mtime.  (I'll be sending 2.4
version to Marcelo shortly.)

39d21233

[PATCH] cleanup of page->flags manipulations · 6b5dbcf2

Hugh Dickins authored Oct 02, 2002

I've had this patch hanging around for a couple of months (you liked an
earlier version, but I never found time to resubmit it).  It removes
some unnecessary PageDirty and PageUptodate manipulations.

add_to_page_cache can only receive a dirty page in the add_to_swap
case, so deal with it there.  add_to_swap is better off using
add_to_page_cache directly than add_to_swap_cache.  Keep move_to_ and
_from_swap_cache simple, and don't fiddle with flags without reason.
It's a little less efficient to correct clean->dirty list as an
afterthought, but cuts unusual code from slow path.

6b5dbcf2

[PATCH] tmpfs swapoff deadlock · a2495207

Hugh Dickins authored Oct 02, 2002

tmpfs 1/5 swapoff deadlock: my igrab/iput around the yield in
shmem_unuse_inode was rubbish, seems my testing never really hit the
case until last week, when truncation of course deadlocked on the page
held locked across the iput (at least I had the foresight to say "ugh!"
there). Don't yield here, switch over to the simple backoff I'd been
using for months in the loopable tmpfs patch (yes, it could loop
indefinitely for memory, that's already an issue to be dealt with
later). The return convention from shmem_unuse to try_to_unuse is
inelegant (commented at both ends), but effective.

a2495207

[PATCH] convert direct-io to use bio_add_page() · c21c3ad0
Andrew Morton authored Oct 02, 2002
```
From Badari Pulavarty.

Use bio_add_page() in direct-io.c.
```
c21c3ad0

[PATCH] "io wait" process accounting · 7b88e5e0

Andrew Morton authored Oct 02, 2002

Patch from Rik adds "I/O wait" statistics to /proc/stat.

This allows us to determine how much system time is being spent
awaiting IO completion.  This is an important statistic, as it tends to
directly subtract from job completion time.

procps-2.0.9 is OK with this, but doesn't report it.

7b88e5e0

[PATCH] add kswapd success accounting to /proc/vmstat · 7e96bae1

Andrew Morton authored Oct 02, 2002

Tells us how many pages were reclaimed by kswapd.

The `pgsteal' statistic tells us how many pages were reclaimed
altogether.  So

	pgsteal - kswapd_steal

is the number of pages which were directly reclaimed by page allocating
processes.
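
Because pgsteal is the total and kswapd_steal is kswapd's share of it, the direct-reclaim figure is the total minus that share. A minimal sketch, with a hypothetical helper name and the two counters passed in as plain integers:

```c
/*
 * pgsteal counts every reclaimed page; kswapd_steal counts only those
 * reclaimed by kswapd. The remainder was reclaimed directly by
 * page-allocating processes.
 */
static unsigned long direct_reclaim(unsigned long pgsteal,
                                    unsigned long kswapd_steal)
{
    return pgsteal - kswapd_steal;
}
```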


Also, the `pgscan' data is currently counting the number of pages
scanned in shrink_cache() plus the number of pages scanned in
refill_inactive_zone().  These are rather separate concepts, so I
created the new `pgrefill' counter for refill_inactive_zone().
`pgscan' is now just the number of pages scanned in shrink_cache().

7e96bae1

[PATCH] add /proc/vmstat (start of /proc/stat cleanup) · 15e19695

Andrew Morton authored Oct 02, 2002

Moves the VM accounting out of /proc/stat and into /proc/vmstat.

The VM accounting is now per-cpu.

It also moves kstat.pgpgin and kstat.pgpgout into /proc/vmstat.
Which is a bit of a duplication of /proc/diskstats (SARD), but it's
easy, super-cheap and makes life a lot easier for all the system
monitoring applications which we just broke.

We now require procps 2.0.9.

Updated versions of top and vmstat are available at http://surriel.com
and the Cygnus CVS is up to date for these changes.  (Rik has the CVS
info at the above site).

This tidies up kernel_stat quite a lot - it now only contains CPU
things (interrupts and CPU loads) and disk things.  So we now have:

/proc/stat:	CPU things and disk things
/proc/vmstat:	VM things	(plus pgpgin, pgpgout)

The SARD patch removes the disk things from /proc/stat as well.
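
For monitoring tools that need to follow the counters to their new home, a lookup might look like the sketch below. It assumes the file holds one `name value` pair per line, which this message does not spell out; the function name is hypothetical, and the caller is expected to have read the file into a buffer already.

```c
#include <stdio.h>
#include <string.h>

/*
 * Look up one counter in a buffer of "name value" lines.
 * Returns 0 and stores the value on success, -1 if the name is absent.
 */
static int vmstat_lookup(const char *buf, const char *name,
                         unsigned long *out)
{
    size_t len = strlen(name);
    const char *line = buf;

    while (line && *line) {
        /* Match the whole name, followed by the separating space. */
        if (strncmp(line, name, len) == 0 && line[len] == ' ')
            return sscanf(line + len, "%lu", out) == 1 ? 0 : -1;
        line = strchr(line, '\n');
        if (line)
            line++;   /* step past the newline to the next line */
    }
    return -1;
}
```

Requiring the space after the name keeps a short name such as `pgpg` from matching `pgpgin`.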

15e19695

[PATCH] truncate/invalidate_inode_pages rewrite · 735a2573

Andrew Morton authored Oct 02, 2002

Rewrite these functions to use gang lookup.

- This probably has similar performance to the old code in the common case.

- It will be vastly quicker than current code for the worst case
  (single-page truncate).

- invalidate_inode_pages() has been changed.  It used to use
  page_count(page) as the "is it mapped into pagetables" heuristic.  It
  now uses the (page->pte.direct != 0) heuristic.

- Removes the worst cause of scheduling latency in the kernel.

- It's a big code cleanup.

- invalidate_inode_pages() has been changed to take an address_space
  *, not an inode *.

- the maximum hold times for mapping->page_lock are enormously reduced,
  making it quite feasible to turn this into an irq-safe lock.  Which, it
  seems, is a requirement for sane AIO<->direct-io integration, as well
  as possibly other AIO things.

(Thanks Hugh for fixing a bug in this one as well).

(Christoph added some stuff too)

735a2573

[PATCH] radix tree gang lookup · 55b40732

Andrew Morton authored Oct 02, 2002

Adds a gang lookup facility to radix trees.  It provides an efficient
means of locating a bunch of pages starting at a particular offset.

The implementation is a bit dumb, but is efficient enough.  And it is
amenable to the `tagged lookup' extension which is proving tricky to
write, but which will allow the dirty pages within a mapping to be
located in pgoff_t order.

Thanks are due to Hugh Dickins for finding and fixing an unpleasant bug
in here.
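
The idea of a gang lookup can be illustrated without a radix tree at all. The sketch below uses a flat array of slots as a toy stand-in; the names and the fixed size are assumptions for illustration and say nothing about how the patch walks the tree.

```c
#include <stddef.h>

#define NSLOTS 64   /* hypothetical size of the toy index space */

/*
 * Toy stand-in for a radix tree: slots[i] is the item at index i, or
 * NULL. Copies up to max_items non-NULL items with index >= first into
 * results, in ascending index order, and returns how many were found.
 */
static unsigned int gang_lookup(void *slots[NSLOTS], void **results,
                                unsigned long first,
                                unsigned int max_items)
{
    unsigned int found = 0;
    unsigned long i;

    for (i = first; i < NSLOTS && found < max_items; i++)
        if (slots[i])
            results[found++] = slots[i];
    return found;
}
```

A caller processes the returned batch, then repeats the lookup from just past the last index it handled until the function returns zero.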

55b40732

[PATCH] remove bogus BUG in page_remove_rmap() · 803f57a8

Andrew Morton authored Oct 02, 2002

Pages with no reverse mapping can be present in page tables as a result
of a driver performing remap_page_range().  Don't go BUG over them.

803f57a8

[PATCH] mprotect bugfix · 9c96b76d

Andrew Morton authored Oct 02, 2002

Patch from Hugh Dickins

Our earlier fix for mprotect_fixup was broken - passing an
already-freed VMA to change_protection().

9c96b76d

[PATCH] sys_ioperm atomicity fix · f9a4baef

Andrew Morton authored Oct 02, 2002

sys_ioperm() is calling kmalloc(GFP_KERNEL) inside get_cpu().  That's
wrong, because the memory allocation could schedule away and return on
a different CPU.

So change it to perform the memory allocation outside the atomic region.
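
The reordering can be shown in userspace with stand-ins. In the sketch below every name is hypothetical: `enter_atomic`/`leave_atomic` model a region that must not sleep, and `alloc_may_sleep` models an allocation that can. It illustrates the pattern only and is not the sys_ioperm code.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for a region that must not sleep. */
static int in_atomic_region;
static int slept_while_atomic;

static void enter_atomic(void) { in_atomic_region = 1; }
static void leave_atomic(void) { in_atomic_region = 0; }

/* Stand-in for an allocation that may sleep. */
static void *alloc_may_sleep(size_t size)
{
    if (in_atomic_region)
        slept_while_atomic = 1;   /* the bug this pattern avoids */
    return malloc(size);
}

/* Allocate first, then enter the region that must not sleep. */
static int setup_bitmap(void **slot, size_t size)
{
    void *buf = NULL;

    if (!*slot) {
        buf = alloc_may_sleep(size);
        if (!buf)
            return -1;
    }

    enter_atomic();
    if (buf)
        *slot = buf;              /* install the preallocated buffer */
    leave_atomic();
    return 0;
}
```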

f9a4baef

[PATCH] misc (mainly documentation) · 3a9ed298

Andrew Morton authored Oct 02, 2002

- hugetlb Documentation update

- Add /proc/buddyinfo documentation

- nano-cleanup in __remove_from_page_cache.

3a9ed298

30 Sep, 2002 13 commits

Linux v2.5.40 · 7570df54
Linus Torvalds authored Sep 30, 2002

7570df54
Merge http://linux-scsi.bkbits.net/scsi-for-linus-2.5 · 2b9fa51a
Linus Torvalds authored Sep 30, 2002
```
into home.transmeta.com:/home/torvalds/v2.5/linux
```
2b9fa51a
Merge mulgrave.(none):/home/jejb/BK/linux-2.5 · fd0a1c61
James Bottomley authored Sep 30, 2002
```
into mulgrave.(none):/home/jejb/BK/scsi-for-linus-2.5
```
fd0a1c61
Error handler general clean up · 9b46c836
Mike Anderson authored Sep 30, 2002

9b46c836

[PATCH] sg.c and USER_HZ, kernel 2.5.37 · 8885e375

Rolf Fokkens authored Sep 30, 2002

Hi!

Since the introduction of USER_HZ the SG_[GS]ET_TIMEOUT ioctls may have
a serious BUG as userspace uses a different HZ from the HZ in kernelspace.

In x86 HZ=1000 and USER_HZ=100, resulting in confusing timeouts as the
kernel measures time 10 times as fast as userspace.

This patch is an attempt to fix this by transforming USER_HZ based timing to
HZ based timing before storing it in timeout. To make sure that SG_GET_TIMEOUT
and SG_SET_TIMEOUT behave consistently a field timeout_user is added which
stores the exact value that's passed by SG_SET_TIMEOUT and it's returned on
SG_GET_TIMEOUT.

Rolf Fokkens
fokkensr@fokkensr.vertis.nl

P.S. this is the second post of this patch
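
A minimal sketch of the conversion described above, using the x86 tick rates quoted in the message. The struct and function names are hypothetical, and the sketch assumes HZ is a whole multiple of USER_HZ and ignores overflow for very large timeouts.

```c
/* Tick rates quoted in the message above for x86. */
#define HZ      1000
#define USER_HZ 100

/* Hypothetical per-device state; field names follow the message. */
struct sg_timeouts {
    int timeout;       /* kernel ticks (HZ), used for timing */
    int timeout_user;  /* exact value passed in by userspace (USER_HZ) */
};

/* SG_SET_TIMEOUT side: remember the user value, store a HZ-based one. */
static void sg_set_timeout(struct sg_timeouts *t, int user_ticks)
{
    t->timeout_user = user_ticks;
    t->timeout = user_ticks * (HZ / USER_HZ);
}

/* SG_GET_TIMEOUT side: hand back exactly what userspace set. */
static int sg_get_timeout(const struct sg_timeouts *t)
{
    return t->timeout_user;
}
```

Keeping the user value separately means a set followed by a get round-trips exactly, with no rounding from converting back.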

8885e375

[SCSI 53c700] flag as able to do I/O from highmem · dfa944ae
James Bottomley authored Sep 30, 2002

dfa944ae

scsi_initialise_merge_fn() will only set highio if ->type == TYPE_DISK. · 2b562242

Andrew Morton authored Sep 30, 2002

But it's called from scsi_add_lun()->scsi_alloc_sdev() before the type
is known.  The type is -1 all the time in scsi_initialise_merge_fn()
and scsi always bounces.

This patch makes it do the right thing - just enable block-highmem for
all scsi devices.

Jens had this to say:

"I guess that block-highmem has been around long enough, that I can
 use the term 'historically' at least in the kernel sense :-)

 This extra check was added for IDE because each device type driver
 (ide-disk, ide-cd, etc) needed to be updated to not assume virtual
 mappings of request data was valid.  I only did that for ide-disk,
 since this is the only one where bounce buffering really hurt
 performance wise.  So while ide-cd and ide-tape etc could have been
 updated, I deemed it uninteresting and not worthwhile.

 Now, this was just carried straight into the scsi counterparts,
 conveniently, because of laziness.  A quick glance at sr shows that it
 too can avoid bouncing easily (no changes needed).  st may need some
 changes, though.  So again, for scsi it was a matter of not impacting
 existing code in 2.4 too much.

 So TYPE_DISK check can be killed in 2.5 if someone does the work of
 checking that it is safe.  I'm not so sure it will make eg your SCSI
 CD-ROM that much faster :-)"

2b562242

[PATCH] Squash warning in fs/devfs/base.c · 5dd17103
David Gibson authored Sep 30, 2002
```
This removes an unused label in fs/devfs/base.c
```
5dd17103
Merge kroah.com:/home/greg/linux/BK/bleeding_edge-2.5 · 1a008d0e
Greg Kroah-Hartman authored Sep 30, 2002
```
into kroah.com:/home/greg/linux/BK/gregkh-2.5
```
1a008d0e

[PATCH] hc_sl811 build and memory leak · 5c1c6931

Randy Dunlap authored Sep 30, 2002

It needs s/malloc.h/slab.h/ .
It also forgets to free some memory on an error exit path.
Patch for 2.5.39 follows.

5c1c6931

[PATCH] usb_sg_{init,wait,cancel}() · 1e4fece8

David Brownell authored Sep 30, 2002

Here are the scatterlist primitives there's been mail about before.
Now the code has passed basic sanity testing, and is ready to merge
into Linus' tree to start getting wider use.  Greg, please merge!

To recap, the routines are a utility layer packaging several usb
core facilities to improve system performance.  It's synchronous.
The code uses functionality that drivers could use already, but
generally haven't:

    - Request queueing.  This is a big performance win.  It lets
      device drivers help the hcds avoid wasted i/o bandwidth, by
      eliminating irq and scheduling latencies between requests.  It
      can make a huge difference at high speed, when the latencies
      often exceed the time to handle each i/o request!

    - The new usb_map_sg() primitives, leveraging IOMMU hardware
      if it's there (better than entry-at-a-time mapping).

    - URB_NO_INTERRUPT transfer flag, a hint to hcds that they
      can avoid a 'success irq' for this urb.  Only the urb for
      the last scatterlist entry really needs an IRQ, the others
      can be eliminated or delayed.  (OHCI uses this today, and
      any HCD can safely ignore it.)
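
The interrupt hint in the last item can be sketched as follows. The request type and flag value are hypothetical stand-ins, not the usb core's definitions; the point is only that every request but the final one carries the hint.

```c
#include <stddef.h>

/* Hypothetical flag value and request type, for illustration only. */
#define NO_INTERRUPT_HINT 0x1u

struct request {
    unsigned int flags;
};

/*
 * Mark every request except the last as not needing a completion
 * interrupt; the last one still signals that the whole batch is done.
 */
static void hint_interrupts(struct request *reqs, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (i + 1 < n)
            reqs[i].flags |= NO_INTERRUPT_HINT;
        else
            reqs[i].flags &= ~NO_INTERRUPT_HINT;
    }
}
```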

The particular functionality in these APIs seemed to meet Matt's
requirements for usb-storage, so I'd hope the 2.5 usb-storage
code will start to use these routines in a while.  (And maybe
those two scanner drivers: hpusbscsi, microtek.)

Brief summary of testing:  this code seems correct for normal
reads and writes, but the fault paths (including cancelation)
haven't been tested yet.  Both EHCI and OHCI seem to be mostly
OK with these more aggressive queued loads, but may need small
updates (like the two I sent yesterday).  Unfortunately I have
to report that UHCI and urb queueing will sometimes lock up my
hardware (PIIX4), so while we're lots better than 2.4 this is
still a bit of a trouble spot for now.

I'll be making some testing software available shortly, which
will help track down remaining HCD level problems by giving the
queuing APIs (and some others!) a more strenuous workout than
most drivers will, in their day-to-day usage.

- Dave

1e4fece8

[PATCH] USB-storage: problem clearing halts · 2eea1938

Matthew Dharm authored Sep 30, 2002

Greg, attached is a patch designed for diagnostic purposes.  Please apply
to the 2.5 tree -- yes, we'll be removing this at some point in the future.

It appears that we have a problem clearing halts.  This patch causes a very
clear message to be printed whenever a usb_stor_clear_halt() manages to
work.  So far, I haven't seen such a thing happen.  And I've seen _lots_ of
STALL conditions.

This problem has likely been around for a while... however, it hasn't been
noticed before because usb-storage was difficult to use because of other
bugs.  Heck, the most recent 'bk pull' is the first one for me in _months_
which let me boot all the way into X11.

I'm going to hold my patch queue until this is resolved.  On my test setup,
it's easy to see this failing.  I've tried with 4 different devices, with
both UHCI and EHCI drivers.  I don't want to confuse this problem with
other patches...

'result' in this function always seems to be -32.  Which is odd, because
control endpoints shouldn't do that.

I'm open to suggestions as to where to look for this bug, but my instincts
are telling me that this is a core or HCD issue, not a usb-storage issue.

On a positive note, this means that the error-recovery system gets a good
workout.

2eea1938

Merge bk://bk.arm.linux.org.uk · 2fbc109c
Linus Torvalds authored Sep 30, 2002
```
into penguin.transmeta.com:/home/penguin/torvalds/repositories/kernel/linux
```
2fbc109c

01 Oct, 2002 4 commits

Merge flint.arm.linux.org.uk:/usr/src/linux-bk-2.5/linux-2.5 · 93d84590
Russell King authored Oct 01, 2002
```
into flint.arm.linux.org.uk:/usr/src/linux-bk-2.5/linux-2.5-rmk
```
93d84590

[ARM] iPAQ updates from Jamey Hicks · 7cfccad5
Russell King authored Oct 01, 2002

7cfccad5

[ARM] General cleanups/missed bits in previous csets · e9174866
Russell King authored Oct 01, 2002
```
This corrects spelling mistakes, adds missed configuration for
cpufreq, corrects free_irq comment, etc.
```
e9174866

[ARM] Prevent namespace clash with IRQ numbering · 1859d7e2
Russell King authored Oct 01, 2002
```
Add "IRQ_" prefix to these sa1111 irq numbers.
```
1859d7e2

30 Sep, 2002 3 commits

[ARM] Fix sa1111 IRQ handling · 99afe913

Russell King authored Oct 01, 2002

We must clear down all currently pending IRQs before servicing any
IRQ on the chip.  This prevents immediate recursion into the
interrupt handling paths when we service the first IRQ.

99afe913

[ARM] Update cpufreq related sa1100 related drivers and CPU code · 3496bea8
Russell King authored Oct 01, 2002
```
This cset updates sa1100 code for the now merged cpufreq next-gen.
```
3496bea8

[ARM] sa1100fb updates · a58cdfc6

Russell King authored Oct 01, 2002

Update sa1100fb for recent fbcon changes, and move stork LCD power
handling into machine specific file.

a58cdfc6