Commits · 292c8c14cace19c94c6abe25506310239daf949e · nexedi / linux

25 Jan, 2008 30 commits

[GFS2] patch to check for recursive lock requests in gfs2_rename code path · 292c8c14

Abhijith Das authored Nov 29, 2007

A certain scenario in the rename code path triggers a kernel BUG()
because it accidentally does recursive locking The first lock is
requested to unlink an already existing inode (replacing a file) and the
second lock is requested when the destination directory needs to alloc
some space. It is rare that these two
events happen during the same rename call, and even more rare that these
two instances try to lock the same rgrp. It is, however, possible.
https://bugzilla.redhat.com/show_bug.cgi?id=404711Signed-off-by: Abhijith Das <adas@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

292c8c14

[GFS2] Remove lock methods for lock_nolock protocol · c97bfe43

Wendy Cheng authored Nov 29, 2007

GFS2 supports two modes of locking - lock_nolock for single node filesystem
and lock_dlm for cluster mode locking. The gfs2 lock methods are removed from
file operation table for lock_nolock protocol. This would allow VFS to handle
posix lock and flock logics just like other in-tree filesystems without
duplication.
Signed-off-by: S. Wendy Cheng <wcheng@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

c97bfe43

[GFS2] Remove unrequired code · bcd40559

Fabio M. Di Nitto authored Nov 28, 2007

Signed-off-by: Fabio M. Di Nitto <fabbione@ubuntu.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

bcd40559

[GFS2] Fix build warnings · 6a69a23f

Fabio Massimo Di Nitto authored Nov 27, 2007

Hi Steven,

Steven Whitehouse wrote:
> Hi,
>
> Now in the -nmw git tree. Thanks,
>
> Steve.
>
> On Wed, 2007-11-21 at 11:54 -0600, Ryan O'Hara wrote:

this patch introduces a bunch of build warnings by leaving around

struct inode *inode = &ip->i_inode;

The patch in attachment cleans them up. Please apply.
Signed-off-by: Fabio Massimo Di Nitto <fabbione@ubuntu.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

6a69a23f

[GFS2] remove unnecessary permission checks · 002ef1dc

Ryan O'Hara authored Nov 21, 2007

Remove read/write permission() checks from xattr operations.
VFS layer is already handling permission for xattrs via the
xattr_permission() call, so there is no need for gfs2 to
check permissions. Futhermore, using permission() for SELinux
xattrs ops is incorrect.
Signed-off-by: Ryan O'Hara <rohara@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

002ef1dc

[GFS2] Fix runtime issue with UP kernels · 1a2781cf

Fabio Massimo Di Nitto authored Nov 16, 2007

The issue is indeed UP vs SMP and it is totally random.

spin_is_locked() is a bad assertion because there is no correct answer on UP.
on UP spin_is_locked() has to return either one value or another, always.

This means that in my setup I am lucky enough to trigger the issue and your you
are lucky enough not to.

the patch in attachment removes the bogus calls to BUG_ON and according to David
(in CC and thanks for the long explanation on the problem) we can rely upon
things like lockdep to find problem that might be trying to catch.
Signed-off-by: Fabio M. Di Nitto <fabbione@ubuntu.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

1a2781cf

[GFS2] tidy up error message · 00c13475

David Teigland authored Nov 15, 2007

Print error with log_error() to be consistent with others.
Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Fabio M. Di Nitto <fabbione@ubuntu.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

00c13475

[GFS2] Check for installation of mount helpers for DLM mounts · 0b7580c7

Fabio Massimo Di Nitto authored Nov 15, 2007

The patch is a fix to abort mount if the mount.gfs* and possible
umount.* are missing from /sbin.

While we do what we can to guarantee that they are installed properly in
userland (CVS HEAD), we want to make sure that mount still aborts properly.

The only sign of missing helpers is that lock_dlm will receive no mount options
at all. According to David the problem does not exist for lock_nolock as the
helpers are not required.

The patch has been tested for both gfs and gfs2 and it works as expected. The
lack of mount.gfs* will generate an error that is propagated to mount:

oot@node1:~# mount -t  gfs2 /dev/nbd2 /mnt/
mount: wrong fs type, bad option, bad superblock on /dev/nbd2,
       missing codepage or helper program, or other error
       In some cases useful info is found in syslog - try
       dmesg | tail  or so

[ 3513.303346] GFS2: fsid=: Trying to join cluster "lock_dlm", "gutsy:gfs2"
[ 3513.304546] DLM/GFS2/GFS ERROR: (u)mount helpers are not installed properly!
[ 3513.306290] GFS2: fsid=: can't mount proto=lock_dlm, table=gutsy:gfs2, hostdata=

You might want to notice that it will also avoid mount to hang or fail silently
or with strange errors that will require the cluster to reboot/restart before
you can actually mount the filesystem again.
Signed-off-by: Fabio M. Di Nitto <fabbione@ubuntu.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

0b7580c7

[GFS2] Don't periodically update the jindex · e35b9211

Steven Whitehouse authored Nov 09, 2007

We only care about the content of the jindex in two cases,
one is when we mount the fs and the other is when we need
to recover another journal. In both cases we have to update
the jindex anyway, so there is no point in updating it
periodically between times, so this removes it to simplify
gfs2_logd.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

e35b9211

[GFS2] Move gfs2_logd into log.c · ec69b188

Steven Whitehouse authored Nov 09, 2007

This means that we can mark gfs2_ail1_empty static and prepares
the way for further changes.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

ec69b188

[GFS2] Use atomic_t for journal free blocks counter · fd041f0b

Steven Whitehouse authored Nov 08, 2007

This patch changes the counter which keeps track of the free
blocks in the journal to an atomic_t in preparation for the
following patch which will update the log reservation code.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

fd041f0b

[GFS2] Don't add glocks to the journal · 2bcd610d

Steven Whitehouse authored Nov 08, 2007

The only reason for adding glocks to the journal was to keep track
of which locks required a log flush prior to release. We add a
flag to the glock to allow this check to be made in a simpler way.

This reduces the size of a glock (by 12 bytes on i386, 24 on x86_64)
and means that we can avoid extra work during the journal flush.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

2bcd610d

[GFS2] check kthread_should_stop when waiting · 8cbc4342

David Teigland authored Nov 07, 2007

Use wait_event_interruptible() in the lock_dlm thread instead
of an open coded equivalent, and include a kthread_should_stop()
check in the wait test so we don't miss a kthread_stop().
Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

8cbc4342

[GFS2] Given device ID rather than s_id in "id" sysfs file · c7227e46

Bob Peterson authored Nov 02, 2007

This patch changes the /sys/fs/gfs2/<s_id>/id file to give the device
id "major:minor" rather than the s_id. That enables gfs2_tool to
match devices properly (by id, not name) when locating the tuning files.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

c7227e46

[GFS2] Remove flags no longer required · e589665e

Steven Whitehouse authored Nov 02, 2007

The HIF_MUTEX and HIF_PROMOTE flags were set on the glock holders
depending upon which of the two waiters lists they were going to
be queued upon. They were then tested when the holders were taken
off the lists to ensure that the right type of holder was being
dequeued.

Since we are already using separate lists, there doesn't seem a
lot of point having these flags as well, and since setting them
and testing them is in the fast path for locking and unlocking
glock, this patch removes them.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

e589665e

[GFS2] Reorder writeback for glock sync · 3042a2cc

Steven Whitehouse authored Nov 02, 2007

Previously we were doing (write data, wait for data, write metadata, wait
for metadata). After this patch we so (write metadata, write data, wait for
data, wait for metadata) which should be more efficient.

Also I noticed that the drop_bh and xmote_bh functions were almost
identical. In fact the only difference was a single test, and that
test is such that in the drop_bh case, it would always evaluate to
the correct result. As such we can use the xmote_bh functions in
all the places where we were using the drop_bh function and remove
the drop_bh functions.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

3042a2cc

[GFS2] Add sync_page to metadata address space operations · 52d4c74b

Steven Whitehouse authored Nov 01, 2007

This set of address space operations was missing a sync_page
operation.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

52d4c74b

[GFS2] Remove "reclaim limit" · c2932e03

Steven Whitehouse authored Nov 01, 2007

This call to reclaim glocks is not needed, and in particular we don't want it
in the fast path for locking glocks. The limit was entirely arbitrary anyway
and we can't expect users to adjust things like this, the remaining code will
do the right thing on its own.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

c2932e03

[GFS2] Remove unused variables · 60b0d087

Steven Whitehouse authored Oct 31, 2007

These haven't been used for some time, remove them.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

60b0d087

[GFS2] Use correct include file in ops_address.c · 47e83b50

Steven Whitehouse authored Oct 18, 2007

Something changed in the upstream kernel, and it needs this
one-liner to allow ops_address.c to build.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

47e83b50

[GFS2] Don't hold page lock when starting transaction · c41d4f09

Steven Whitehouse authored Oct 17, 2007

This is an addendum to the new AOPs work which moves the point
at which we take the page lock so that we don't get it until
the last possible moment. This resolves a conflict between
starting transactions and the page lock.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

c41d4f09

[GFS2] Add writepages for GFS2 jdata · b8e7cbb6

Steven Whitehouse authored Oct 17, 2007

This patch resolves a lock ordering issue where we had been getting
a transaction lock in the wrong order with respect to the page lock.
By using writepages rather than just writepage, it is then possible
to start a transaction before locking the page, and thus matching the
locking order elsewhere in the code.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

b8e7cbb6

[GFS2] Split gfs2_writepage into three cases · 9ff8ec32

Steven Whitehouse authored Sep 28, 2007

This patch splits gfs2_writepage into separate functions for each of
the three cases: writeback, ordered and journalled. As a result
it becomes a lot easier to see what each one is doing. The common
code is moved into gfs2_writepage_common.

This fixes a performance bug where we were doing more work than
strictly required in the ordered write case.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

9ff8ec32

[GFS2] Introduce gfs2_set_aops() · 5561093e

Steven Whitehouse authored Oct 17, 2007

Just like ext3 we now have three sets of address space operations
to cover the cases of writeback, ordered and journalled data
writes. This means that the individual operations can now become
less complicated as we are able to remove some of the tests for
file data mode from the code.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

5561093e

[GFS2] Add gfs2_is_writeback() · bf36a713

Steven Whitehouse authored Oct 17, 2007

This adds a function "gfs2_is_writeback()" along the lines of the
existing "gfs2_is_jdata()" in order to clean up the code and make
the various tests for the inode mode more obvious. It also fixes
the PageChecked() logic where we were resetting the flag too early
in the case of an error path.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

bf36a713

[GFS2] Remove unused field in struct gfs2_inode · e7e36f14
Steven Whitehouse authored Oct 16, 2007
```
Removes a field that is not used.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
```
e7e36f14

[GFS2] Remove useless i_cache from inodes · f91a0d3e

Steven Whitehouse authored Oct 15, 2007

The i_cache was designed to keep references to the indirect blocks
used during block mapping so that they didn't have to be looked
up continually. The idea failed because there are too many places
where the i_cache needs to be freed, and this has in the past been
the cause of many bugs.

In addition there was no performance benefit being gained since the
disk blocks in question were cached anyway. So this patch removes
it in order to simplify the code to prepare for other changes which
would otherwise have had to add further support for this feature.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

f91a0d3e

[GFS2] Use ->page_mkwrite() for mmap() · 3cc3f710

Steven Whitehouse authored Oct 15, 2007

This cleans up the mmap() code path for GFS2 by implementing the
page_mkwrite function for GFS2. We are thus able to use the
generic filemap_fault function for our ->fault() implementation.

This now means that shared writable mappings will be much more
efficiently shared across the cluster if there is a reasonable
proportion of read activity (the greater proportion, the better).

As a side effect, it also reduces the size of the code, removes
special cases from readpage and readpages, and makes the code
path easier to follow.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

3cc3f710

[GFS2] Clean up internal read function · 51ff87bd

Steven Whitehouse authored Oct 15, 2007

As requested by Christoph, this patch cleans up GFS2's internal
read function so that it no longer uses the do_generic_mapping_read
function. This function is obsolete and GFS2 is the last user of it.

As a side effect the internal read code gets smaller and easier
to read and gfs2_readpage is split into two. One function has the locking
and the other function has the rest of the logic.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>

51ff87bd

[GFS2] Handle multiple glock demote requests · cc7e79b1

Wendy Cheng authored Oct 05, 2007

Fix a race condition where multiple glock demote requests are sent to
a node back-to-back. This patch does a check inside handle_callback()
to see whether a demote request is in progress. If true, it sets a flag
to make sure run_queue() will loop again to handle the new request,
instead of erronously setting gl_demote_state to a different state.
Signed-off-by: S. Wendy Cheng <wcheng@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

cc7e79b1

24 Jan, 2008 10 commits

Linux 2.6.24 · 49914084
Linus Torvalds authored Jan 24, 2008

49914084

spi: omap2_mcspi PIO RX fix · feed9bab

Kalle Valo authored Jan 24, 2008

Before transmission of the last word in PIO RX_ONLY mode rx+tx mode
is enabled:

	/* prevent last RX_ONLY read from triggering
	 * more word i/o: switch to rx+tx
	 */
	if (c == 0 && tx == NULL)
		mcspi_write_cs_reg(spi,
				OMAP2_MCSPI_CHCONF0, l);

But because c is decremented after the test, c will never be zero and
rx+tx will not be enabled. This breaks RX_ONLY mode PIO transfers.

Fix it by decrementing c in the beginning of the various I/O loops.
Signed-off-by: Kalle Valo <kalle.valo@nokia.com>
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

feed9bab

Revert "mac80211: warn when receiving frames with unaligned data" · dbcc2ec6

Linus Torvalds authored Jan 24, 2008

This reverts commit 81100eb8 for the
release, to avoid the unnecessary warning noise that is only really
relevant to wireless driver developers.

The warning will probably go right back in after I cut the release, but
at least we won't unnecessarily worry users.
Acked-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

dbcc2ec6

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6 · 901720af

Linus Torvalds authored Jan 24, 2008

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6:
  [SPARC64]: Partially revert "Constify function pointer tables."

901720af

Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6 · 668ebab4

Linus Torvalds authored Jan 24, 2008

* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6:
  Revert "ACPI: Fan: Drop force_power_state acpi_device option"
  ACPI: EC: "DEBUG" needs to be defined earlier
  ACPI: EC: add leading zeros to debug messages
  ACPI: EC: fix dmesg spam regression
  ACPI: DMI blacklist to reduce console warnings on OSI(Linux) systems.
  ACPI: Add ThinkPad R61, ThinkPad T61 to OSI(Linux) white-list
  ACPI: make _OSI(Linux) console messages smarter
  ACPI: Delete Intel Customer Reference Board (CRB) from OSI(Linux) DMI list
  ACPI: on OSI(Linux), print needed DMI rather than requesting dmidecode output
  ACPI: create acpi_dmi_dump()
  DMI: create dmi_get_slot()
  DMI: move dmi_available declaration to linux/dmi.h
  ACPI: processor: Fix null pointer dereference in throttling

668ebab4

slab: partially revert list3 changes · 9c09a95c

Mel Gorman authored Jan 24, 2008

Partial revert the changes made by 04231b30
to the kmem_list3 management. On a machine with a memoryless node, this
BUG_ON was triggering

	static void *____cache_alloc_node(struct kmem_cache *cachep, gfp_t flags, int nodeid)
	{
		struct list_head *entry;
		struct slab *slabp;
		struct kmem_list3 *l3;
		void *obj;
		int x;

		l3 = cachep->nodelists[nodeid];
		BUG_ON(!l3);
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Acked-by: Christoph Lameter <clameter@sgi.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Nishanth Aravamudan <nacc@us.ibm.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9c09a95c

fix hugepages leak due to pagetable page sharing · c5c99429

Larry Woodman authored Jan 24, 2008

The shared page table code for hugetlb memory on x86 and x86_64
is causing a leak.  When a user of hugepages exits using this code
the system leaks some of the hugepages.

-------------------------------------------------------
Part of /proc/meminfo just before database startup:
HugePages_Total:  5500
HugePages_Free:   5500
HugePages_Rsvd:      0
Hugepagesize:     2048 kB

Just before shutdown:
HugePages_Total:  5500
HugePages_Free:   4475
HugePages_Rsvd:      0
Hugepagesize:     2048 kB

After shutdown:
HugePages_Total:  5500
HugePages_Free:   4988
HugePages_Rsvd:
0 Hugepagesize:     2048 kB
----------------------------------------------------------

The problem occurs durring a fork, in copy_hugetlb_page_range().  It
locates the dst_pte using huge_pte_alloc().  Since huge_pte_alloc() calls
huge_pmd_share() it will share the pmd page if can, yet the main loop in
copy_hugetlb_page_range() does a get_page() on every hugepage.  This is a
violation of the shared hugepmd pagetable protocol and creates additional
referenced to the hugepages causing a leak when the unmap of the VMA
occurs.  We can skip the entire replication of the ptes when the hugepage
pagetables are shared.  The attached patch skips copying the ptes and the
get_page() calls if the hugetlbpage pagetable is shared.

[akpm@linux-foundation.org: coding-style cleanups]
Signed-off-by: Larry Woodman <lwoodman@redhat.com>
Signed-off-by: Adam Litke <agl@us.ibm.com>
Cc: Badari Pulavarty <pbadari@us.ibm.com>
Cc: Ken Chen <kenchen@google.com>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: William Lee Irwin III <wli@holomorphy.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

c5c99429

sysctl: kill binary sysctl KERN_PPC_L2CR · c2f3dabe

Eric W. Biederman authored Jan 24, 2008

: Stefan Roese <sr@denx.de> said:
> ppc: 4xx: sysctl table check failed: /kernel/l2cr .1.31 Missing strategy
>
> I'm seeing this error message when booting an recent arch/ppc kernel on
> 4xx platforms (tested on Ocotea and other 4xx platforms). Booting NFS
> rootfs still works fine, but this message kind of makes me "nervous".
> This is not seen on 4xx arch/powerpc platforms. Here the bootlog:

Because the data field was never filled and a binary sysctl handler was
never written this sysctl has never been usable through the sys_sysctl
interface.  So just remove the binary sysctl number.  Making the kernel
sanity checks happy.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Reported-by: Stefan Roese <sr@denx.de>
Cc: Josh Boyer <jwboyer@gmail.com>
Cc: Wolfgang Denk <wd@denx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

c2f3dabe

lockdep: fix kernel crash on module unload · fabe874a

Arjan van de Ven authored Jan 24, 2008

Michael Wu noticed in his lkml post at

    http://marc.info/?l=linux-kernel&m=119396182726091&w=2

that certain wireless drivers ended up having their name in module
memory, which would then crash the kernel on module unload.

The patch he proposed was a bit clumsy in that it increased the size of
a lockdep entry significantly; the patch below tries another approach,
it checks, on module teardown, if the name of a class is in module space
and then zaps the class.  This is very similar to what we already do
with keys that are in module space.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

fabe874a

[SPARC64]: Partially revert "Constify function pointer tables." · de195fd0

David S. Miller authored Jan 23, 2008

This partially reverts 872e2be7
(Constify function pointer tables.)

The solaris/socksys.c transformation wasn't valid:

arch/sparc64/solaris/socksys.c:192: error: assignment of read-only variable ‘socksys_file_ops’
arch/sparc64/solaris/socksys.c:195: error: assignment of read-only variable ‘socksys_file_ops’
arch/sparc64/solaris/socksys.c:196: error: assignment of read-only variable ‘socksys_file_ops’
Signed-off-by: David S. Miller <davem@davemloft.net>

de195fd0