Commits · 65b6f3403431cd43ef7b0dab679a50f770124a65 · Kirill Smelkov / linux

26 Feb, 2010 20 commits

ocfs2_dlmfs: Use poll() to signify BASTs. · 65b6f340

Joel Becker authored Jan 26, 2010

o2dlm's userspace filesystem is an easy way to use the DLM from
userspace.  It is intentionally simple: for example, it does not allow
asynchronous behavior or lock conversion.

Because there is no asynchronous notification, there is no way for a
process holding a lock to know another node needs the lock.  This is the
number one complaint of ocfs2_dlmfs users.  Turns out, we can solve this
very easily.  We add poll() support to ocfs2_dlmfs.  When a BAST is
received, the lock's file descriptor will receive POLLIN.

This is trivial to implement.  Userdlm already has an appropriate
waitqueue, and the lock knows when it is blocked.

We add the "bast" capability to tell userspace this is available.
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Acked-by: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>

65b6f340
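
From userspace, waiting for a BAST could look roughly like the sketch below. It only uses poll(2); the helper name wait_for_bast() is hypothetical, and lock_fd is assumed to be a descriptor for a lock file that the caller already opened on a dlmfs mount.

```c
#include <poll.h>

/*
 * Hypothetical helper: wait until the descriptor reports POLLIN, which
 * the commit above describes as the signal that a BAST was received.
 * Returns 1 if POLLIN was reported, 0 on timeout, -1 on poll() error.
 */
int wait_for_bast(int lock_fd, int timeout_ms)
{
	struct pollfd pfd = { .fd = lock_fd, .events = POLLIN, .revents = 0 };
	int rc = poll(&pfd, 1, timeout_ms);

	if (rc < 0)
		return -1;	/* poll() failed; errno holds the reason */
	if (rc == 0)
		return 0;	/* timed out, nobody asked for the lock */
	return (pfd.revents & POLLIN) ? 1 : 0;
}
```

A holder would typically call this in its event loop and release the lock when it returns 1. On a dlmfs without poll support the descriptor would always look readable, so checking the "bast" capability first is the safer order.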

ocfs2_dlmfs: Add capabilities parameter. · 14a437c2

Joel Becker authored Feb 04, 2010

Over time, dlmfs has added some features that were not part of the
initial ABI.  Unfortunately, some of these features are not detectable
via standard usage.  For example, Linux's default poll always returns
POLLIN, so there is no way for a caller of poll(2) to know when dlmfs
added poll support.  Instead, we provide this list of new capabilities.

Capabilities is a read-only attribute.  We do it as a module parameter
so we can discover it whether dlmfs is built in, loaded, or even not
loaded (via modinfo).

The ABI features are local to this machine's dlmfs mount.  This is
distinct from the locking protocol, which is concerned with inter-node
interaction.
Signed-off-by: Joel Becker <joel.becker@oracle.com>

14a437c2
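
A caller that wants to rely on a feature such as "bast" could check the capabilities string before using it. The sketch below assumes the string is a whitespace-separated list of names and that the caller has already read it (for example from the module parameter); has_capability() is a hypothetical helper, not part of dlmfs.

```c
#include <ctype.h>
#include <string.h>

/*
 * Hypothetical helper: return 1 if `name` appears as a whole word in the
 * whitespace-separated capability list `caps`, 0 otherwise.
 */
int has_capability(const char *caps, const char *name)
{
	size_t want = strlen(name);
	const char *p = caps;

	while (*p) {
		const char *start;

		/* Skip separators, then measure the next word. */
		while (*p && isspace((unsigned char)*p))
			p++;
		start = p;
		while (*p && !isspace((unsigned char)*p))
			p++;

		if (want > 0 && (size_t)(p - start) == want &&
		    memcmp(start, name, want) == 0)
			return 1;
	}
	return 0;
}
```

Matching whole words rather than substrings keeps a future capability whose name merely contains "bast" from being mistaken for it.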

ocfs2: Handle errors while setting external xattr values. · 399ff3a7

Joel Becker authored Sep 01, 2009

ocfs2 can store extended attribute values as large as a single file.  It
does this using a standard ocfs2 btree for the large value.  However,
the previous code did not handle all error cases cleanly.

Several things can go wrong.

1) We have trouble allocating space for a new xattr.  This leaves us
   with an empty xattr.
2) We overwrote an existing local xattr with a value root, and now we
   have an error allocating the storage.  This leaves us with an empty
   xattr where there used to be a value.  The value is lost.
3) We have trouble truncating a reused value.  This leaves us with the
   original entry pointing to the truncated original value.  The value
   is lost.
4) We have trouble extending the storage on a reused value.  This leaves
   us with the original value safely in place, but with more storage
   allocated than needed.

This doesn't consider storing local xattrs (values that don't require a
btree).  Those only fail when the journal fails.

Case (1) is easy.  We just remove the xattr we added.  We leak the
storage because we can't safely remove it, but otherwise everything is
happy.  We'll print a warning about the leak.

Case (4) is easy.  We still have the original value in place.  We can
just leave the extra storage attached to this xattr.  We return the
error, but the old value is untouched.  We print a warning about the
storage.

Case (2) and (3) are hard because we've lost the original values.  In
the old code, we ended up with values that could be partially read.
That's not good.  Instead, we just wipe the xattr entry and leak the
storage.  It stinks that the original value is lost, but now there isn't
a partial value to be read.  We'll print a big fat warning.
Signed-off-by: Joel Becker <joel.becker@oracle.com>

399ff3a7
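
The four failure cases and their recovery policy can be summarized as a small decision table. This is only an illustrative sketch of the policy described in the message above; the enum, struct, and function names are hypothetical and do not come from the ocfs2 source.

```c
/* Hypothetical labels for the four numbered failure cases. */
enum xa_failure {
	XA_FAIL_ALLOC_NEW = 1,		/* (1) allocating space for a new xattr */
	XA_FAIL_ALLOC_OVERWRITE,	/* (2) allocating after overwriting a local value */
	XA_FAIL_TRUNCATE_REUSED,	/* (3) truncating a reused value */
	XA_FAIL_EXTEND_REUSED,		/* (4) extending a reused value */
};

struct xa_recovery {
	int wipe_entry;		/* remove the xattr entry entirely */
	int value_lost;		/* a previously stored value is gone */
	int warn_storage;	/* leaked or extra storage: print a warning */
};

/* Map a failure case to the recovery actions described in the message. */
struct xa_recovery xa_recover(enum xa_failure f)
{
	struct xa_recovery r = { 0, 0, 1 };	/* every case warns about storage */

	switch (f) {
	case XA_FAIL_ALLOC_NEW:
		r.wipe_entry = 1;	/* nothing existed before, nothing lost */
		break;
	case XA_FAIL_ALLOC_OVERWRITE:
	case XA_FAIL_TRUNCATE_REUSED:
		r.wipe_entry = 1;	/* no partial value may stay readable */
		r.value_lost = 1;
		break;
	case XA_FAIL_EXTEND_REUSED:
		break;			/* old value untouched, extra storage stays */
	}
	return r;
}
```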

ocfs2: Set inline xattr entries with ocfs2_xa_set() · 139fface

Joel Becker authored Aug 19, 2009

ocfs2_xattr_ibody_set() is the only remaining user of
ocfs2_xattr_set_entry().  ocfs2_xattr_set_entry() actually does two
things: it calls ocfs2_xa_set(), and it initializes the inline xattrs.
Initializing the inline space really belongs in its own call.

We lift the initialization to ocfs2_xattr_ibody_init(), called from
ocfs2_xattr_ibody_set() only when necessary.  Now
ocfs2_xattr_ibody_set() can call ocfs2_xa_set() directly.
ocfs2_xattr_set_entry() goes away.

Another nice fact is that ocfs2_init_dinode_xa_loc() can trust
i_xattr_inline_size.
Signed-off-by: Joel Becker <joel.becker@oracle.com>

139fface

ocfs2: Set xattr block entries with ocfs2_xa_set() · d3981544

Joel Becker authored Aug 19, 2009

ocfs2_xattr_block_set() calls into ocfs2_xattr_set_entry() with just the
HAS_XATTR flag. Most of the machinery of ocfs2_xattr_set_entry() is
skipped. All that really happens other than the call to ocfs2_xa_set()
is making sure the HAS_XATTR flag is set on the inode.

But HAS_XATTR should be set when we also set di->i_xattr_loc. And
that's done in ocfs2_create_xattr_block(). So let's move it there, and
then ocfs2_xattr_block_set() can just call ocfs2_xa_set().

While we're there, ocfs2_create_xattr_block() can take the set_ctxt for
a smaller argument list. It also learns to set HAS_XATTR_FL, because it
knows for sure. ocfs2_create_empty_xattr_block() in the reflink path
fakes a set_ctxt to call ocfs2_create_xattr_block().
Signed-off-by: Joel Becker <joel.becker@oracle.com>

d3981544

ocfs2: Let ocfs2_xa_prepare_entry() do space checks. · c5d95df5

Joel Becker authored Aug 18, 2009

ocfs2_xattr_set_in_bucket() doesn't need to do its own hacky space
checking.  Let's let ocfs2_xa_prepare_entry() (via ocfs2_xa_set()) do
the more accurate work.  Whenever it doesn't have space,
ocfs2_xattr_set_in_bucket() can try to get more space.
Signed-off-by: Joel Becker <joel.becker@oracle.com>

c5d95df5

ocfs2: Gell into ocfs2_xa_set() · bca5e9bd

Joel Becker authored Aug 18, 2009

ocfs2_xa_set() wraps the ocfs2_xa_prepare_entry()/ocfs2_xa_store_value()
logic.  Both callers can now use the same routine.  ocfs2_xa_remove()
moves directly into ocfs2_xa_set().
Signed-off-by: Joel Becker <joel.becker@oracle.com>

bca5e9bd

ocfs2: Allocation in ocfs2_xa_prepare_entry(), values in ocfs2_xa_store_value() · 73857ee0

Joel Becker authored Aug 18, 2009

ocfs2_xa_prepare_entry() gets all the logic to add, remove, or modify
external value trees.  Now, when it exits, the entry is ready to receive
a value of any size.

ocfs2_xa_remove() is added to handle the complete removal of an entry.
It truncates the external value tree before calling
ocfs2_xa_remove_entry().

ocfs2_xa_store_inline_value() becomes ocfs2_xa_store_value().  It can
store any value.

ocfs2_xattr_set_entry() loses all the allocation logic and just uses
these functions.  ocfs2_xattr_set_value_outside() disappears.

ocfs2_xattr_set_in_bucket() uses these functions and makes
ocfs2_xattr_set_entry_in_bucket() obsolete.  That goes away, as does
ocfs2_xattr_bucket_set_value_outside() and
ocfs2_xattr_bucket_value_truncate().
Signed-off-by: Joel Becker <joel.becker@oracle.com>

73857ee0

ocfs2: Teach ocfs2_xa_loc how to do its own journal work · cf2bc809

Joel Becker authored Aug 18, 2009

We're going to want to make sure our buffers get accessed and dirtied
correctly.  So have the xa_loc do the work.  This includes storing the
inode on ocfs2_xa_loc.
Signed-off-by: Joel Becker <joel.becker@oracle.com>

cf2bc809

ocfs2: Provide ocfs2_xa_fill_value_buf() for external value processing · 3fc12afa

Joel Becker authored Aug 18, 2009

We use the ocfs2_xattr_value_buf structure to manage external values.
It lets the value tree code do its work regardless of the containing
storage.  ocfs2_xa_fill_value_buf() initializes a value buf from an
ocfs2_xa_loc entry.
Signed-off-by: Joel Becker <joel.becker@oracle.com>

3fc12afa

ocfs2: Handle value tree roots in ocfs2_xa_set_inline_value() · 9dc47400

Joel Becker authored Aug 17, 2009

Previously the xattr code would send in a fake value, containing a tree
root, to the function that installed name+value pairs. Instead, we pass
the real value to ocfs2_xa_set_inline_value(), and it notices that the
value cannot fit. Thus, it installs a tree root.
Signed-off-by: Joel Becker <joel.becker@oracle.com>

9dc47400

ocfs2: Set the xattr name+value pair in one place · 69a3e539

Joel Becker authored Aug 17, 2009

We create two new functions on ocfs2_xa_loc, ocfs2_xa_prepare_entry()
and ocfs2_xa_store_inline_value().

ocfs2_xa_prepare_entry() makes sure that the xl_entry field of
ocfs2_xa_loc is ready to receive an xattr.  The entry will point to an
appropriately sized name+value region in storage.  If an existing entry
can be reused, it will be.  If no entry already exists, it will be
allocated.  If there isn't space to allocate it, -ENOSPC will be
returned.

ocfs2_xa_store_inline_value() stores the data that goes into the 'value'
part of the name+value pair.  For values that don't fit directly, this
stores the value tree root.

A number of operations are added to ocfs2_xa_loc_operations to support
these functions.  This reflects the disparate behaviors of xattr blocks
and buckets.

With these functions, the overlapping ocfs2_xattr_set_entry_local() and
ocfs2_xattr_set_entry_normal() can be replaced with a single call
scheme.
Signed-off-by: Joel Becker <joel.becker@oracle.com>

69a3e539

ocfs2: Wrap calculation of name+value pair size. · 199799a3

Joel Becker authored Aug 14, 2009

An ocfs2 xattr entry stores the text name and value as a pair in the
storage area.  Obviously names and values can be variable-sized.  If a
value is too large for the entry storage, a tree root is stored instead.
The name+value pair is also padded.

Because of this, there are a million places in the code that do:

	if (needs_external_tree(value_size))
		namevalue_size = pad(name_size) + tree_root_size;
	else
		namevalue_size = pad(name_size) + pad(value_size);

Let's create some convenience functions to make the code more readable.
There are three forms.  The first takes the raw sizes.  The second takes
an ocfs2_xattr_info structure.  The third takes an existing
ocfs2_xattr_entry.
Signed-off-by: Joel Becker <joel.becker@oracle.com>

199799a3
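
A standalone version of the first two forms might look like the sketch below. The padding granularity, the inline-value limit, and the tree root size are placeholder constants chosen for illustration, not the real ocfs2 values, and all names are hypothetical.

```c
#include <stddef.h>

/* Placeholder constants; the real on-disk values are not given here. */
#define SKETCH_PAD		4u	/* padding granularity in bytes */
#define SKETCH_INLINE_MAX	80u	/* largest value stored in the entry */
#define SKETCH_ROOT_SIZE	32u	/* size of a stored tree root */

/* Round n up to the next multiple of SKETCH_PAD. */
static size_t pad(size_t n)
{
	return (n + SKETCH_PAD - 1) & ~(size_t)(SKETCH_PAD - 1);
}

static int needs_external_tree(size_t value_size)
{
	return value_size > SKETCH_INLINE_MAX;
}

/* Form 1: compute the name+value size from raw sizes. */
size_t namevalue_size(size_t name_size, size_t value_size)
{
	if (needs_external_tree(value_size))
		return pad(name_size) + SKETCH_ROOT_SIZE;
	return pad(name_size) + pad(value_size);
}

/* Form 2: the same calculation from a (hypothetical) info structure. */
struct xattr_info_sketch {
	size_t name_len;
	size_t value_len;
};

size_t namevalue_size_xi(const struct xattr_info_sketch *xi)
{
	return namevalue_size(xi->name_len, xi->value_len);
}
```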

ocfs2: Add a name_len field to ocfs2_xattr_info. · 18853b95

Joel Becker authored Aug 14, 2009

Rather than calculating strlen all over the place, let's store the
name length directly on ocfs2_xattr_info.
Signed-off-by: Joel Becker <joel.becker@oracle.com>

18853b95

ocfs2: Prefix the member fields of struct ocfs2_xattr_info. · 6b240ff6

Joel Becker authored Aug 14, 2009

struct ocfs2_xattr_info is a useful structure describing an xattr
you'd like to set.  Let's put prefixes on the member fields so it's
easier to read and use.
Signed-off-by: Joel Becker <joel.becker@oracle.com>

6b240ff6

ocfs2: Remove xattrs via ocfs2_xa_loc · bde1e540

Joel Becker authored Aug 14, 2009

Add ocfs2_xa_remove_entry(), which will remove an xattr entry from its
storage via the ocfs2_xa_loc descriptor.
Signed-off-by: Joel Becker <joel.becker@oracle.com>

bde1e540

ocfs2: Introduce ocfs2_xa_loc · 11179f2c

Joel Becker authored Aug 14, 2009

The ocfs2 extended attribute (xattr) code is very flexible.  It can
store xattrs in the inode itself, in an external block, or in a tree of
data structures.  This allows the number of xattrs to be bounded by the
filesystem size.

However, the code that manages each possible storage location is
different.  Maintaining the ocfs2 xattr code requires changing each hunk
separately.

This patch is the start of a series introducing the ocfs2_xa_loc
structure.  This structure wraps the on-disk details of an xattr
entry.  The goal is that the generic xattr routines can use
ocfs2_xa_loc without knowing the underlying storage location.

This first pass merely implements the basic structure, initializing it,
and wiping the name+value pair of the entry.
Signed-off-by: Joel Becker <joel.becker@oracle.com>

11179f2c
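
The idea of hiding the storage location behind one structure can be illustrated with an operations table. The sketch below is a userspace model with hypothetical names and a toy block size; it assumes a name+value region never spans two blocks, and it only covers the "wipe the name+value pair" step mentioned above.

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_BLOCK_SIZE 16	/* toy block size for the bucket case */

struct xa_loc_sketch;

/* Per-storage operations; generic code never looks behind these. */
struct xa_loc_ops_sketch {
	void *(*offset_pointer)(struct xa_loc_sketch *loc, size_t offset);
};

struct xa_loc_sketch {
	const struct xa_loc_ops_sketch *ops;
	void *storage;		/* one block, or an array of block pointers */
	size_t nv_offset;	/* where the name+value pair starts */
	size_t nv_size;		/* how many bytes it occupies */
};

/* Single contiguous block: the offset is a plain byte offset. */
static void *block_offset_pointer(struct xa_loc_sketch *loc, size_t offset)
{
	return (char *)loc->storage + offset;
}

/* Bucket of several blocks: the offset selects a block, then a byte. */
static void *bucket_offset_pointer(struct xa_loc_sketch *loc, size_t offset)
{
	char **blocks = loc->storage;

	return blocks[offset / SKETCH_BLOCK_SIZE] + offset % SKETCH_BLOCK_SIZE;
}

const struct xa_loc_ops_sketch xa_block_ops = { block_offset_pointer };
const struct xa_loc_ops_sketch xa_bucket_ops = { bucket_offset_pointer };

/* Generic routine: wipes the pair without knowing the storage layout. */
void xa_wipe_namevalue(struct xa_loc_sketch *loc)
{
	memset(loc->ops->offset_pointer(loc, loc->nv_offset), 0, loc->nv_size);
}
```

Supporting another storage type in this model means adding one more operations table; xa_wipe_namevalue() stays unchanged.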

ocfs2: Add current->comm in trace output · 8545e03d

Sunil Mushran authored Feb 12, 2010

Add current->comm to the standard mlog() output to help with debugging.
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>

8545e03d

ocfs2: Clean up the checks for CoW and direct I/O. · 96a1cc73

Wengang Wang authored Feb 09, 2010

When ocfs2 has to do CoW for refcounted extents, we disable direct I/O
and go through the buffered I/O path.  This makes the combined check
easier to read.
Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>

96a1cc73

ocfs2: add extent block stealing for ocfs2 v5 · b89c5428

Tiger Yang authored Jan 25, 2010

This patch adds an extent block (metadata) stealing mechanism for
extent allocation. The mechanism is the same as inode stealing:
if there is no room in the slot-specific extent_alloc, we try to
allocate an extent block from the next slot.
Signed-off-by: Tiger Yang <tiger.yang@oracle.com>
Acked-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>

b89c5428
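
The slot selection can be modeled in a few lines. This sketch is an assumption-heavy illustration: the function name is hypothetical, the per-slot free counts are passed in as a plain array, and it keeps walking round-robin past the next slot, which the message above does not spell out.

```c
/*
 * Hypothetical model: free_blocks[i] is the number of free extent blocks
 * in slot i's allocator. Prefer our own slot, then try the following
 * slots in order. Returns the chosen slot, or -1 if every slot is full.
 */
int pick_extent_alloc_slot(const unsigned *free_blocks, int nslots,
			   int own_slot)
{
	for (int i = 0; i < nslots; i++) {
		int slot = (own_slot + i) % nslots;

		if (free_blocks[slot] > 0)
			return slot;	/* i == 0: own slot; i > 0: stolen */
	}
	return -1;
}
```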

09 Feb, 2010 1 commit

Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2 · a5f28ae4

Linus Torvalds authored Feb 08, 2010

* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2:
  ocfs2/cluster: Make o2net connect messages KERN_NOTICE
  ocfs2/dlm: Fix printing of lockname
  ocfs2: Fix contiguousness check in ocfs2_try_to_merge_extent_map()
  ocfs2/dlm: Remove BUG_ON in dlm recovery when freeing locks of a dead node
  ocfs2: Plugs race between the dc thread and an unlock ast message
  ocfs2: Remove overzealous BUG_ON during blocked lock processing
  ocfs2: Do not downconvert if the lock level is already compatible
  ocfs2: Prevent a livelock in dlmglue
  ocfs2: Fix setting of OCFS2_LOCK_BLOCKED during bast
  ocfs2: Use compat_ptr in reflink_arguments.
  ocfs2/dlm: Handle EAGAIN for compatibility - v2
  ocfs2: Add parenthesis to wrap the check for O_DIRECT.
  ocfs2: Only bug out when page size is larger than cluster size.
  ocfs2: Fix memory overflow in cow_by_page.
  ocfs2/dlm: Print more messages during lock migration
  ocfs2/dlm: Ignore LVBs of locks in the Blocked list
  ocfs2/trivial: Remove trailing whitespaces
  ocfs2: fix a misleading variable name
  ocfs2: Sync max_inline_data_with_xattr from tools.
  ocfs2: Fix refcnt leak on ocfs2_fast_follow_link() error path

a5f28ae4

08 Feb, 2010 16 commits

Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq · 8defcaa6

Linus Torvalds authored Feb 08, 2010

* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq:
  [CPUFREQ] Fix ondemand to not request targets outside policy limits
  [CPUFREQ] Fix use after free of struct powernow_k8_data
  [CPUFREQ] fix default value for ondemand governor

8defcaa6

ocfs2/cluster: Make o2net connect messages KERN_NOTICE · 6efd8066

Sunil Mushran authored Feb 05, 2010

Connect and disconnect messages are more than informational as they are required
during root cause analysis for failures. This patch changes them from KERN_INFO
to KERN_NOTICE.
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Acked-by: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>

6efd8066

ocfs2/dlm: Fix printing of lockname · 86a06aba

Sunil Mushran authored Feb 05, 2010

The debug call printing the name of the lock resource was chopping
off the last character. This patch fixes the problem.
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Acked-by: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>

86a06aba

Merge branch 'v4l_for_linus' of git://linuxtv.org/fixes · 08c4f1b0

Linus Torvalds authored Feb 08, 2010

* 'v4l_for_linus' of git://linuxtv.org/fixes:
  V4L/DVB: dvb-core: fix initialization of feeds list in demux filter
  V4L/DVB: dvb_demux: Don't use vmalloc at dvb_dmx_swfilter_packet
  V4L/DVB: Fix the risk of an oops at dvb_dmx_release

08c4f1b0

Merge branch 'for-linus' of git://git.monstr.eu/linux-2.6-microblaze · 2b1f5c3a

Linus Torvalds authored Feb 08, 2010

* 'for-linus' of git://git.monstr.eu/linux-2.6-microblaze:
  microblaze: Invalidate dcache before enabling it

2b1f5c3a

Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc · 9d2bc1a4

Linus Torvalds authored Feb 08, 2010

* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
  powerpc/pseries: Fix kexec regression caused by CPPR tracking

9d2bc1a4

Merge branch 'sh/for-2.6.33' of git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6 · 8bd73803

Linus Torvalds authored Feb 08, 2010

* 'sh/for-2.6.33' of git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6:
  sh: Remove superfluous setup_frame_reg call
  sh: Don't continue unwinding across interrupts
  sh: Setup frame pointer in handle_exception path
  sh: Correct the offset of the return address in ret_from_exception
  usb: r8a66597-hcd: Fix up spinlock recursion in root hub polling.
  usb: r8a66597-hcd: Flush the D-cache for the pipe-in transfer buffers.

8bd73803

V4L/DVB: dvb-core: fix initialization of feeds list in demux filter · 691c9ae0

Francesco Lavra authored Feb 07, 2010

A DVB demultiplexer device can be used to set up either a PES filter or
a section filter. In the former case, the ts field of the feed union of
struct dmxdev_filter is used, in the latter case the sec field of the
same union is used.
The ts field is a struct list_head, and is currently initialized in the
open() method of the demux device. When for a given demuxer a section
filter is set up, the sec field is played with, thus if a PES filter
needs to be set up after that the ts field will be corrupted, causing a
kernel oops.
This fix moves the list head initialization to
dvb_dmxdev_pes_filter_set(), so that the ts field is properly
initialized every time a PES filter is set up.
Signed-off-by: Francesco Lavra <francescolavra@interfree.it>
Cc: stable <stable@kernel.org>
Reviewed-by: Andy Walls <awalls@radix.net>
Tested-by: hermann pitton <hermann-pitton@arcor.de>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

691c9ae0
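
The hazard is easy to reproduce in a userspace model: a list head and a pointer share a union, so using one member invalidates the other. The sketch below uses simplified, hypothetical stand-ins for the kernel types and helpers and shows the fixed ordering, where the list head is initialized each time a PES filter is set up.

```c
#include <string.h>

/* Minimal stand-in for the kernel's doubly linked list head. */
struct list_head_sketch {
	struct list_head_sketch *next, *prev;
};

static void init_list_head(struct list_head_sketch *h)
{
	h->next = h;
	h->prev = h;
}

static int list_is_empty(const struct list_head_sketch *h)
{
	return h->next == h;
}

struct filter_sketch {
	union {
		struct list_head_sketch ts;	/* used by PES filters */
		void *sec;			/* used by section filters */
	} feed;
};

/* open(): no list initialization here any more. */
void filter_open(struct filter_sketch *f)
{
	memset(f, 0, sizeof(*f));
}

/* Setting a section filter overwrites the bytes shared with `ts`. */
void section_filter_set(struct filter_sketch *f, void *sec)
{
	f->feed.sec = sec;
}

/* The fix: (re)initialize the list head whenever a PES filter is set. */
void pes_filter_set(struct filter_sketch *f)
{
	init_list_head(&f->feed.ts);
}
```

With initialization only in filter_open(), the sequence section filter then PES filter would leave ts.next pointing at whatever the section pointer held.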

V4L/DVB: dvb_demux: Don't use vmalloc at dvb_dmx_swfilter_packet · bc081cc8

Mauro Carvalho Chehab authored Feb 01, 2010

As dvb_dmx_swfilter_packet() is protected by a spinlock, it shouldn't sleep.
However, vmalloc() may sleep. So, move the initialization of
dvb_demux::cnt_storage field to a better place.
Reviewed-by: Andy Walls <awalls@radix.net>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

bc081cc8

V4L/DVB: Fix the risk of an oops at dvb_dmx_release · adefdcee

Mauro Carvalho Chehab authored Feb 01, 2010

dvb_dmx_init tries to allocate virtual memory for 2 pointers: filter and feed.

If the second vmalloc fails, filter is freed, but the pointer keeps pointing
to the old place. Later, when dvb_dmx_release() is called, it will try to
free already-freed memory, causing an OOPS.
Reviewed-by: Andy Walls <awalls@radix.net>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

adefdcee
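
The usual remedy for this kind of double free is to clear the pointer right after freeing it, so a later release becomes a no-op. Below is a userspace sketch of that pattern using malloc()/free() in place of vmalloc()/vfree(); the struct, the function names, and the injectable allocator are hypothetical and exist only to make the failure path easy to exercise.

```c
#include <stdlib.h>

struct demux_sketch {
	void *filter;
	void *feed;
};

typedef void *(*alloc_fn)(size_t size);

/* Allocate both buffers; on failure leave no dangling pointer behind. */
int demux_init(struct demux_sketch *d, alloc_fn alloc)
{
	d->feed = NULL;
	d->filter = alloc(64);
	if (!d->filter)
		return -1;

	d->feed = alloc(64);
	if (!d->feed) {
		free(d->filter);
		d->filter = NULL;	/* the important line: no stale pointer */
		return -1;
	}
	return 0;
}

/* Safe to call after a failed init because free(NULL) does nothing. */
void demux_release(struct demux_sketch *d)
{
	free(d->feed);
	free(d->filter);
	d->feed = NULL;
	d->filter = NULL;
}

/* Demonstration allocator: succeeds once, then fails on the second call. */
void *alloc_fail_on_second(size_t size)
{
	static int calls;

	return (++calls == 2) ? NULL : malloc(size);
}
```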

microblaze: Invalidate dcache before enabling it · a6013411

Michal Simek authored Feb 01, 2010

We found that on a write-through kernel it is necessary to do this
invalidation. On WB it is possible to use invalidation too.
Signed-off-by: Michal Simek <monstr@monstr.eu>

a6013411

powerpc/pseries: Fix kexec regression caused by CPPR tracking · 36350e00

Mark Nelson authored Feb 07, 2010

The code to track the CPPR values added by commit
49bd3647 ("powerpc/pseries: Track previous
CPPR values to correctly EOI interrupts") broke kexec on pseries because
the kexec code in xics.c calls xics_set_cpu_priority() before the IPI has
been EOI'ed. This wasn't a problem previously but it now triggers a BUG_ON
in xics_set_cpu_priority() because os_cppr->index isn't 0.

Fix this problem by setting the index on the CPPR stack to 0 before calling
xics_set_cpu_priority() in xics_teardown_cpu().

Also make it clear that we only want to set the priority when there's just
one CPPR value in the stack, and enforce it by updating the value of
os_cppr->stack[0] rather than os_cppr->stack[os_cppr->index].

While we're at it change the BUG_ON to a WARN_ON.
Reported-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Mark Nelson <markn@au1.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

36350e00

sh: Remove superfluous setup_frame_reg call · 1af0b2fc

Matt Fleming authored Jan 30, 2010

There's no need to set up the frame pointer again in
call_handle_tlbmiss. The frame pointer will already have been set up in
handle_interrupt.
Signed-off-by: Matt Fleming <matt@console-pimps.org>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>

1af0b2fc

sh: Don't continue unwinding across interrupts · 944a3438

Matt Fleming authored Jan 30, 2010

Unfortunately, due to poor DWARF info in current toolchains, unwinding
through interrupts cannot be done reliably. The problem is that the
DWARF info for function epilogues is wrong.

Take this standard epilogue sequence,

80003cc4:       e3 6f           mov     r14,r15
80003cc6:       26 4f           lds.l   @r15+,pr
80003cc8:       f6 6e           mov.l   @r15+,r14
						<---- interrupt here
80003cca:       f6 6b           mov.l   @r15+,r11
80003ccc:       f6 6a           mov.l   @r15+,r10
80003cce:       f6 69           mov.l   @r15+,r9
80003cd0:       0b 00           rts

If we take an interrupt at the highlighted point, the DWARF info will
bogusly claim that the return address can be found at some offset from
the frame pointer, even though the frame pointer was just restored. The
worst part is if the unwinder finds a text address at the bogus stack
address - unwinding will continue, for a bit, until it finally comes
across an unexpected address on the stack and blows up.

The only solution is to stop unwinding once we've calculated the
function that was executing when the interrupt occurred. This PC can be
easily calculated from pt_regs->pc.
Signed-off-by: Matt Fleming <matt@console-pimps.org>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>

944a3438

sh: Setup frame pointer in handle_exception path · 1dca56f1

Matt Fleming authored Jan 27, 2010

In order to allow the DWARF unwinder to unwind through exceptions we
need to set up the frame pointer register (r14).
Signed-off-by: Matt Fleming <matt@console-pimps.org>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>

1dca56f1

sh: Correct the offset of the return address in ret_from_exception · 14269828

Matt Fleming authored Jan 27, 2010

The address that ret_from_exception and ret_from_irq will return to is
found in the stack slot for SPC, not PR. This error was causing the
DWARF unwinder to pick up the wrong return address on the stack and then
unwind using the unwind tables for the wrong function.

While I'm here I might as well add CFI annotations for the other
registers since they could be useful when unwinding.
Signed-off-by: Matt Fleming <matt@console-pimps.org>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>

14269828

07 Feb, 2010 3 commits

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 · 6339204e

Linus Torvalds authored Feb 07, 2010

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
  Take ima_file_free() to proper place.
  ima: rename PATH_CHECK to FILE_CHECK
  ima: rename ima_path_check to ima_file_check
  ima: initialize ima before inodes can be allocated
  fix ima breakage
  Take ima_path_check() in nfsd past dentry_open() in nfsd_open()
  freeze_bdev: don't deactivate successfully frozen MS_RDONLY sb
  befs: fix leak

6339204e

Fix race in tty_fasync() properly · 80e1e823

Linus Torvalds authored Feb 07, 2010

This reverts commit 70362511 ("tty: fix race in tty_fasync") and
commit b04da8bf ("fnctl: f_modown should call write_lock_irqsave/
restore") that tried to fix up some of the fallout but was incomplete.

It turns out that we really cannot hold 'tty->ctrl_lock' over calling
__f_setown, because not only did that cause problems with interrupt
disables (which the second commit fixed), it also causes a potential
ABBA deadlock due to lock ordering.

Thanks to Tetsuo Handa for following up on the issue, and running
lockdep to show the problem.  It goes roughly like this:

 - f_getown gets filp->f_owner.lock for reading without interrupts
   disabled, so an interrupt that happens while that lock is held can
   cause a lockdep chain from f_owner.lock -> sighand->siglock.

 - at the same time, the tty->ctrl_lock -> f_owner.lock chain that
   commit 70362511 introduced, together with the pre-existing
   sighand->siglock -> tty->ctrl_lock chain means that we have a lock
   dependency the other way too.

So instead of extending tty->ctrl_lock over the whole __f_setown() call,
we now just take a reference to the 'pid' structure while holding the
lock, and then release it after having done the __f_setown.  That still
guarantees that 'struct pid' won't go away from under us, which is all
we really ever needed.
Reported-and-tested-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Acked-by: Américo Wang <xiyou.wangcong@gmail.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

80e1e823
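
The shape of the fix, pinning the object under the lock and doing the call after dropping it, can be shown with a single-threaded model. Everything below is a hypothetical sketch: a flag stands in for tty->ctrl_lock, a counter stands in for the 'struct pid' reference count, and the callback stands in for __f_setown().

```c
struct pid_sketch {
	int refs;		/* stand-in for the pid reference count */
};

struct tty_sketch {
	int ctrl_lock_held;	/* stand-in for tty->ctrl_lock */
	struct pid_sketch *pgrp;
};

typedef int (*setown_fn)(struct tty_sketch *tty, struct pid_sketch *pid);

/*
 * Take a reference while "holding" the lock, drop the lock, call the
 * owner-setting routine, then drop the reference.
 */
int setown_outside_lock(struct tty_sketch *tty, setown_fn setown)
{
	struct pid_sketch *pid;
	int rc;

	tty->ctrl_lock_held = 1;	/* lock */
	pid = tty->pgrp;
	pid->refs++;			/* pin: pid cannot go away now */
	tty->ctrl_lock_held = 0;	/* unlock before calling out */

	rc = setown(tty, pid);		/* runs without ctrl_lock held */

	pid->refs--;			/* release our temporary reference */
	return rc;
}

/* Demonstration callback: succeeds only if called unlocked and pinned. */
int setown_checks_lock(struct tty_sketch *tty, struct pid_sketch *pid)
{
	return (!tty->ctrl_lock_held && pid->refs >= 2) ? 0 : -1;
}
```

Because the callback never runs with the lock held, the tty->ctrl_lock -> f_owner.lock dependency from the reverted commit does not arise in this ordering.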

Take ima_file_free() to proper place. · 89068c57

Al Viro authored Feb 07, 2010

Hooks: Just Say No.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

89068c57