Commits · 29963437a48475036353b95ab142bf199adb909e · Kirill Smelkov / linux

15 Mar, 2011 2 commits

IB/cm: Bump reference count on cm_id before invoking callback · 29963437

Sean Hefty authored Feb 23, 2011

When processing a SIDR REQ, the ib_cm allocates a new cm_id.  The
refcount of the cm_id is initialized to 1.  However, cm_process_work
will decrement the refcount after invoking all callbacks.  The result
is that the cm_id will end up with refcount set to 0 by the end of the
sidr req handler.

If a user tries to destroy the cm_id, the destruction will proceed,
under the incorrect assumption that no other threads are referencing
the cm_id.  This can lead to a crash when the cm callback thread tries
to access the cm_id.
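
The counting problem described above can be modelled in a few lines of C. This is only a toy sketch, not the ib_cm code: `struct toy_id` and `toy_handle_request` are hypothetical names, and the single decrement stands in for the drop that cm_process_work is described as doing after the callbacks.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a cm_id; the real structure is far larger. */
struct toy_id {
    int refcount;
};

/*
 * Models one pass through the request handler. With bump_first set, the
 * handler takes its own reference before the callback, so the drop at the
 * end leaves the creator's reference intact.
 */
int toy_handle_request(struct toy_id *id, bool bump_first)
{
    id->refcount = 1;        /* a freshly allocated id starts at 1 */
    if (bump_first)
        id->refcount++;      /* the extra reference the fix adds */

    /* ... the user callback would run here ... */

    assert(id->refcount > 0);
    id->refcount--;          /* drop performed after all callbacks */
    return id->refcount;
}
```

Without the bump the count ends at 0 while the id is still in use, so a concurrent destroy sees nothing to wait for. With the bump the count ends at 1.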

This problem was noticed as part of a larger investigation with kernel
crashes in the rdma_cm when running on a real time OS.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Acked-by: Doug Ledford <dledford@redhat.com>
Cc: <stable@kernel.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>

29963437

RDMA/cma: Fix crash in request handlers · 25ae21a1

Sean Hefty authored Feb 23, 2011

Doug Ledford and Red Hat reported a crash when running the rdma_cm on
a real-time OS.  The crash has the following call trace:

    cm_process_work
       cma_req_handler
          cma_disable_callback
          rdma_create_id
             kzalloc
             init_completion
          cma_get_net_info
          cma_save_net_info
          cma_any_addr
             cma_zero_addr
          rdma_translate_ip
             rdma_copy_addr
          cma_acquire_dev
             rdma_addr_get_sgid
             ib_find_cached_gid
             cma_attach_to_dev
          ucma_event_handler
             kzalloc
             ib_copy_ah_attr_to_user
          cma_comp

[ preempted ]

    cma_write
        copy_from_user
        ucma_destroy_id
           copy_from_user
           _ucma_find_context
           ucma_put_ctx
           ucma_free_ctx
              rdma_destroy_id
                 cma_exch
                 cma_cancel_operation
                 rdma_node_get_transport

        rt_mutex_slowunlock
        bad_area_nosemaphore
        oops_enter

They were able to reproduce the crash multiple times with the
following details:

    Crash seems to always happen on the:
            mutex_unlock(&conn_id->handler_mutex);
    as conn_id looks to have been freed during this code path.

An examination of the code shows that a race exists in the request
handlers.  When a new connection request is received, the rdma_cm
allocates a new connection identifier.  This identifier has a single
reference count on it.  If a user calls rdma_destroy_id() from another
thread after receiving a callback, rdma_destroy_id will proceed to
destroy the id and free the associated memory.  However, the request
handlers may still be in the process of running.  When control returns
to the request handlers, they can attempt to access the newly created
identifiers.

Fix this by holding a reference on the newly created rdma_cm_id until
the request handler is through accessing it.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Acked-by: Doug Ledford <dledford@redhat.com>
Cc: <stable@kernel.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>

25ae21a1

18 Feb, 2011 8 commits

Merge branch 'for-linus/bugfixes' of git://xenbits.xen.org/people/ianc/linux-2.6 · a5bbef0b
Linus Torvalds authored Feb 18, 2011
```
* 'for-linus/bugfixes' of git://xenbits.xen.org/people/ianc/linux-2.6:
  xen: suspend and resume system devices when running PVHVM
```
a5bbef0b

Merge branch 'fixes-2.6.38' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq · bc3adfc6

Linus Torvalds authored Feb 18, 2011

* 'fixes-2.6.38' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
  workqueue: make sure MAYDAY_INITIAL_TIMEOUT is at least 2 jiffies long
  workqueue, freezer: unify spelling of 'freeze' + 'able' to 'freezable'
  workqueue: wake up a worker when a rescuer is leaving a gcwq

bc3adfc6

Expand CONFIG_DEBUG_LIST to several other list operations · 3c18d4de

Linus Torvalds authored Feb 18, 2011

When list debugging is enabled, we aim to readably show list corruption
errors, and the basic list_add/list_del operations end up having extra
debugging code in them to do some basic validation of the list entries.

However, "list_del_init()" and "list_move[_tail]()" ended up avoiding
the debug code due to how they were written. This fixes that.
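
As a rough illustration of the idea, the sketch below routes a "delete and reinitialise" helper through the same validated unlink as a plain delete. It is a userspace toy under assumed names (`struct node`, `node_del_checked`, `node_del_init_checked`), not the kernel's list.h, and it reports corruption through a return value rather than a warning.

```c
#include <stdbool.h>

/* Minimal circular doubly linked list node (hypothetical, not list_head). */
struct node {
    struct node *next, *prev;
};

void node_init(struct node *n)
{
    n->next = n;
    n->prev = n;
}

/* Insert n right after head. */
void node_add(struct node *n, struct node *head)
{
    n->next = head->next;
    n->prev = head;
    head->next->prev = n;
    head->next = n;
}

/* Unlink n only if both neighbours still point back at it. */
bool node_del_checked(struct node *n)
{
    if (n->prev->next != n || n->next->prev != n)
        return false;        /* stale or corrupted entry detected */
    n->prev->next = n->next;
    n->next->prev = n->prev;
    return true;
}

/* Reuses the checked unlink, so this path gets the validation too. */
bool node_del_init_checked(struct node *n)
{
    if (!node_del_checked(n))
        return false;
    node_init(n);
    return true;
}
```

Building the composite operation on top of the checked primitive means no call path can bypass the validation.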

So the _next_ time we have list_move() problems with stale list entries,
we'll hopefully have an easier time finding them..
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

3c18d4de

Merge branch 'pm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6 · 2a324ce7

Linus Torvalds authored Feb 17, 2011

* 'pm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6:
  PM / Hibernate: Return error code when alloc_image_page() fails

2a324ce7

Merge branch 'drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6 · c8b392e9

Linus Torvalds authored Feb 17, 2011

* 'drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6:
  drm/radeon/kms: add missing frac fb div flag for dce4+
  drm/radeon/kms: do not reject X16 and Y16X16 floating-point formats on r300
  drm/nouveau: fix suspend/resume on GPUs that don't have PM support
  drm/nouveau: flips/flipd need to always set 'evict' for move_accel_cleanup()
  drm/nv40: fix tiling-related setup for a number of chipsets
  drm/nouveau: fix non-EDIDful native mode selection
  drm/nouveau: Fix detection of DDC-based LVDS on DCB15 boards.
  drm/nv04-nv40: Fix NULL dereference when we fail to find an LVDS native mode.
  drm/nv10: Fix crash when allocating a BO larger than half the available VRAM.

c8b392e9

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband · dd8240bd

Linus Torvalds authored Feb 17, 2011

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
  IB/qib: Prevent double completions after a timeout or RNR error
  IB/qib: Fix double add_timer()
  RDMA/nes: Don't generate async events for unregistered devices

dd8240bd

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6 · a0aeccdc

Linus Torvalds authored Feb 17, 2011

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6:
  sparc64: Fix NMI startup bug which also breaks perf.
  sparc: fix size argument to find_next_zero_bit()
  sparc: use bitmap_set()
  sparc32: unaligned memory access (MNA) trap handler bug

a0aeccdc

fs/partitions: Validate map_count in Mac partition tables · fa7ea87a

Timo Warns authored Feb 17, 2011

Validate number of blocks in map and remove redundant variable.
Signed-off-by: Timo Warns <warns@pre-sense.de>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

fa7ea87a

17 Feb, 2011 10 commits

Merge branches 'nes' and 'qib' into for-next · 814b0a61
Roland Dreier authored Feb 17, 2011

814b0a61

IB/qib: Prevent double completions after a timeout or RNR error · c0af2c05

Mike Marciniszyn authored Feb 16, 2011

There is a double completion associated with error handling for RC QPs.

The sequence is:

 - The do_rc_ack() routine fields an RNR nack and there are 0
   rnr_retries configured on the QP.
 - qib_error_qp() stops the pending timer
 - qib_rc_send_complete() is called from sdma_complete()
 - qib_rc_send_complete() starts the timer because the msb of the psn
   just completed says an ack is needed.
 - a bunch of flushes occur as ipoib posts WQEs to an error'ed QP
 - rc_timeout() calls qib_restart_rc()
 - qib_restart_rc() calls qib_send_complete() with a
   IB_WC_RETRY_EXC_ERR on a wqe that has already been completed in the
   past

The fix avoids starting the timer since another packet will never
arrive.
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>

c0af2c05

xen: suspend and resume system devices when running PVHVM · 8dd38383

Ian Campbell authored Feb 17, 2011

Otherwise we fail to properly suspend/resume all of the emulated devices.

Something between 2.6.38-rc2 and rc3 appears to have exposed this
issue, but it's always been wrong not to do this.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Jeremy Fitzhardinge <jeremy@goop.org>

8dd38383

Merge branch 'for-2.6.38' of git://linux-nfs.org/~bfields/linux · ee715087

Linus Torvalds authored Feb 16, 2011

* 'for-2.6.38' of git://linux-nfs.org/~bfields/linux:
  nfsd: correctly handle return value from nfsd_map_name_to_*

ee715087

Merge remote branch 'nouveau/drm-nouveau-next' of /ssd/git/drm-nouveau-next into drm-fixes · e138018e

Dave Airlie authored Feb 17, 2011

* 'nouveau/drm-nouveau-next' of /ssd/git/drm-nouveau-next:
  drm/nouveau: fix suspend/resume on GPUs that don't have PM support
  drm/nouveau: flips/flipd need to always set 'evict' for move_accel_cleanup()
  drm/nv40: fix tiling-related setup for a number of chipsets
  drm/nouveau: fix non-EDIDful native mode selection
  drm/nouveau: Fix detection of DDC-based LVDS on DCB15 boards.
  drm/nv04-nv40: Fix NULL dereference when we fail to find an LVDS native mode.
  drm/nv10: Fix crash when allocating a BO larger than half the available VRAM.

e138018e

drm/radeon/kms: add missing frac fb div flag for dce4+ · 9f4283f4

Alex Deucher authored Feb 16, 2011

The fixed ref/post dividers are set by the AdjustPll table
rather than the ss info table on dce4+.  Make sure we enable
the fractional feedback dividers when using a fixed post
or ref divider on them as well.

Fixes:
https://bugzilla.kernel.org/show_bug.cgi?id=29272

Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
Cc: stable@kernel.org
Signed-off-by: Dave Airlie <airlied@redhat.com>

9f4283f4

drm/radeon/kms: do not reject X16 and Y16X16 floating-point formats on r300 · 16e4b8a6
Marek Olšák authored Feb 16, 2011
```
Signed-off-by: Marek Olšák <maraeo@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
```
16e4b8a6

drm/nouveau: fix suspend/resume on GPUs that don't have PM support · 317495b2

Ben Skeggs authored Feb 17, 2011

This has been broken since 2.6.37, and fixes resume on a couple of fermi
boards I have access to.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>

317495b2

Merge branch 'spi/merge' of git://git.secretlab.ca/git/linux-2.6 · 68ac01a2
Linus Torvalds authored Feb 16, 2011
```
* 'spi/merge' of git://git.secretlab.ca/git/linux-2.6:
  spi/pxa2xx pci: fix the release - remove race
```
68ac01a2

block: revert block_dev read-only check · e51900f7

Chuck Ebbert authored Feb 16, 2011

This reverts commit 75f1dc0d ("block: check bdev_read_only() from
blkdev_get()").  That commit added stricter checking to make sure
devices that were being used read-only were actually opened in that
mode.

It turns out that the change breaks a bunch of kernel code that opens
block devices.  Affected systems include dm, md, and the loop device.
Because strict checking for read-only opens of block devices was not
done before this, the code that opens the devices was opening them
read-write even if they were being used read-only.  Auditing all that
code will take time, and new userspace packages for dm, mdadm, etc.
will also be required.
Signed-off-by: Chuck Ebbert <cebbert@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

e51900f7

16 Feb, 2011 15 commits

drm/nouveau: flips/flipd need to always set 'evict' for move_accel_cleanup() · b8884da6

Ben Skeggs authored Feb 14, 2011

We free the temporary binding before leaving this function, so we also have
to wait for the move to actually complete.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>

b8884da6

drm/nv40: fix tiling-related setup for a number of chipsets · 1dc32671

Ben Skeggs authored Feb 07, 2011

Due to the default case handling the older chipsets, a bunch of the newer
ones ended up having the wrong tiling regs used.  This commit switches the
default case to handle the newest chipsets.

This also makes nv4e touch the "extra" tiling regs.  "nv" doesn't touch
them for C51 but traces of the NVIDIA binary driver show it being done
there.

I couldn't find NV41/NV45 traces to confirm the behaviour there, but an
educated guess was taken at each of them.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>

1dc32671

drm/nouveau: fix non-EDIDful native mode selection · 0d9b6193

Ben Skeggs authored Feb 07, 2011

The DRM core fills this value, but at too late a stage for this to work,
possibly resulting in an undesirable mode being selected.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>

0d9b6193

drm/nouveau: Fix detection of DDC-based LVDS on DCB15 boards. · 77b1d5dc
Francisco Jerez authored Feb 03, 2011
```
Signed-off-by: Francisco Jerez <currojerez@riseup.net>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
```
77b1d5dc

drm/nv04-nv40: Fix NULL dereference when we fail to find an LVDS native mode. · 87886221

Francisco Jerez authored Feb 03, 2011

Reported-by: Alex Buell <alex.buell@munted.org.uk>
Signed-off-by: Francisco Jerez <currojerez@riseup.net>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>

87886221

drm/nv10: Fix crash when allocating a BO larger than half the available VRAM. · 812f219a

Francisco Jerez authored Feb 03, 2011

Reported-by: Alex Buell <alex.buell@munted.org.uk>
Signed-off-by: Francisco Jerez <currojerez@riseup.net>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>

812f219a

nfsd: correctly handle return value from nfsd_map_name_to_* · 47c85291

NeilBrown authored Feb 16, 2011

These functions return an nfs status, not a host_err.  So don't
try to convert before returning.

This is a regression introduced by
3c726023; I fixed up two of the callers,
but missed these two.

Cc: stable@kernel.org
Reported-by: Herbert Poetzl <herbert@13thfloor.at>
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

47c85291

PM / Hibernate: Return error code when alloc_image_page() fails · 2e725a06

Stanislaw Gruszka authored Feb 12, 2011

Currently we return 0 in swsusp_alloc() when alloc_image_page() fails.
Fix that.  Also remove unneeded "error" variable since the only
useful value of error is -ENOMEM.
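
The pattern being fixed can be sketched in userspace C as follows. The names `toy_alloc_page` and `toy_alloc_image` are hypothetical and the `budget` parameter exists only to force a failure; this is not the swsusp code itself.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical allocator that fails once `budget` pages are handed out. */
void *toy_alloc_page(int *budget)
{
    if (*budget <= 0)
        return NULL;
    (*budget)--;
    return malloc(64);
}

/*
 * Allocate nr_pages pages. On failure, release what was obtained and
 * return -ENOMEM directly; no separate "error" variable is needed
 * because that is the only failure value.
 */
int toy_alloc_image(void **pages, int nr_pages, int budget)
{
    assert(nr_pages >= 0);
    for (int i = 0; i < nr_pages; i++) {
        pages[i] = toy_alloc_page(&budget);
        if (!pages[i]) {
            while (i-- > 0)
                free(pages[i]);
            return -ENOMEM;
        }
    }
    return 0;
}
```

A caller that tests the return value now sees the failure instead of treating a partially allocated image as a success.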

[rjw: Fixed up the changelog and changed subject.]
Signed-off-by: Stanislaw Gruszka <stf_xl@wp.pl>
Cc: stable@kernel.org
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>

2e725a06

workqueue: make sure MAYDAY_INITIAL_TIMEOUT is at least 2 jiffies long · 3233cdbd

Tejun Heo authored Feb 16, 2011

MAYDAY_INITIAL_TIMEOUT is defined as HZ / 100 and depending on
configuration may end up 0 or 1.  Even when it's 1, depending on when
the mayday timer is added in the current jiffy interval, it may expire
way before a jiffy has passed.

Make sure MAYDAY_INITIAL_TIMEOUT is at least two to guarantee that at
least a full jiffy has passed before calling rescuers.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Ray Jui <rjui@broadcom.com>
Cc: stable@kernel.org
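
The arithmetic in the message above is easy to check with a small sketch. The function name `mayday_initial_timeout` and its `hz` parameter are illustrative stand-ins for the kernel's compile-time constants, and the clamp shown is one way to express "at least two".

```c
/* hz stands in for the kernel's HZ (timer ticks per second). */
unsigned long mayday_initial_timeout(unsigned long hz)
{
    unsigned long t = hz / 100;   /* integer division: 0 or 1 for small hz */

    return t >= 2 ? t : 2;        /* never fewer than two jiffies */
}
```

With `hz` set to 100 the unclamped value is 1, which can expire almost at once if the timer is armed late in the current jiffy. The clamp raises it to 2 and leaves larger values, such as 10 for an `hz` of 1000, unchanged.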

3233cdbd

Merge git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6 · a2640111

Linus Torvalds authored Feb 16, 2011

* git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6:
  [SCSI] qla2xxx: Return DID_NO_CONNECT when FC device is lost.
  [SCSI] mptfusion: Bump version 03.04.18
  [SCSI] mptfusion: Fix Incorrect return value in mptscsih_dev_reset
  [SCSI] mptfusion: mptctl_release is required in mptctl.c
  [SCSI] target: fix use after free detected by SLUB poison
  [SCSI] target: Remove procfs based target_core_mib.c code
  [SCSI] target: Fix SCF_SCSI_CONTROL_SG_IO_CDB breakage
  [SCSI] target: Fix top-level configfs_subsystem default_group shutdown breakage
  [SCSI] target: fixed missing lock drop in error path
  [SCSI] target: Fix demo-mode MappedLUN shutdown UA/PR breakage
  [SCSI] target/iblock: Fix failed bd claim NULL pointer dereference
  [SCSI] target: iblock/pscsi claim checking for NULL instead of IS_ERR
  [SCSI] scsi_debug: Fix 32-bit overflow in do_device_access causing memory corruption
  [SCSI] qla2xxx: Change from irq to irqsave with host_lock
  [SCSI] qla2xxx: Fix race that could hang kthread_stop()

a2640111

Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 · 0d6e82e7

Linus Torvalds authored Feb 16, 2011

* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: sha-s390 - Reset index after processing partial block

0d6e82e7

vfs: fix BUG_ON() in fs/namei.c:1461 · 3abb17e8

Linus Torvalds authored Feb 16, 2011

When Al moved the nameidata_dentry_drop_rcu_maybe() call into the
do_follow_link function in commit 844a3917 ("nothing in
do_follow_link() is going to see RCU"), he mistakenly left the

	BUG_ON(inode != path->dentry->d_inode);

behind.  Which would otherwise be ok, but that BUG_ON() really needs to
be _after_ dropping RCU, since the dentry isn't necessarily stable
otherwise.

So complete the code movement in that commit, and move the BUG_ON() into
do_follow_link() too.  This means that we need to pass in 'inode' as an
argument (just for this one use), but that's a small thing.  And
eventually we may be confident enough in our path lookup that we can
just remove the BUG_ON() and the unnecessary inode argument.
Reported-and-tested-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

3abb17e8

workqueue, freezer: unify spelling of 'freeze' + 'able' to 'freezable' · 58a69cb4

Tejun Heo authored Feb 16, 2011

There are two spellings in use for 'freeze' + 'able' - 'freezable' and
'freezeable'.  The former is the more prominent one.  The latter is
mostly used by workqueue and in a few other odd places.  Unify the
spelling to 'freezable'.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Alan Stern <stern@rowland.harvard.edu>
Acked-by: "Rafael J. Wysocki" <rjw@sisk.pl>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Acked-by: Dmitry Torokhov <dtor@mail.ru>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Alex Dubov <oakad@yahoo.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Steven Whitehouse <swhiteho@redhat.com>

58a69cb4

Linux 2.6.38-rc5 · 85e2efbb
Linus Torvalds authored Feb 15, 2011

85e2efbb

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu · 048f039f

Linus Torvalds authored Feb 15, 2011

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu:
  m68knommu: set flow handler for secondary interrupt controller of 5249
  m68knommu: remove use of IRQ_FLG_LOCK from 68360 platform support
  m68knommu: fix dereference of port.tty
  m68knommu: add missing linker __modver section
  m68knommu: fix mis-named variable int set_irq_chip loop
  m68knommu: add optimize memmove() function
  m68k: remove arch specific non-optimized memcmp()
  m68knommu: fix use of un-defined _TIF_WORK_MASK
  m68knommu: Rename m548x_wdt.c to m54xx_wdt.c
  m68knommu: fix m548x_wdt.c compilation after headers renaming
  m68knommu: Remove dependencies on nonexistent M68KNOMMU

048f039f

15 Feb, 2011 5 commits

m68knommu: set flow handler for secondary interrupt controller of 5249 · 86d306c9

Greg Ungerer authored Feb 09, 2011

The secondary interrupt controller of the ColdFire 5249 code is not
setting the edge triggered flow handler. Set it.
Signed-off-by: Greg Ungerer <gerg@uclinux.org>

86d306c9

m68knommu: remove use of IRQ_FLG_LOCK from 68360 platform support · 4531dab4

Greg Ungerer authored Feb 08, 2011

The m68knommu arch does not define or use IRQ_FLG_LOCK in its irq
subsystem. Remove obsolete use of it.
Signed-off-by: Greg Ungerer <gerg@uclinux.org>

4531dab4

m68knommu: fix dereference of port.tty · bc0c36d3

Greg Ungerer authored Feb 08, 2011

The tty_struct associated with a port is now a direct pointer
from within the local private driver info struct. So fix all uses
of it.
Signed-off-by: Greg Ungerer <gerg@uclinux.org>

bc0c36d3

m68knommu: add missing linker __modver section · 81174262

Greg Ungerer authored Feb 08, 2011

Add missing linker section __modver to fix:

  LD      vmlinux
/usr/local/bin/../m68k-uclinux/bin/ld.real: error: no memory region specified for loadable section `__modver'
Signed-off-by: Greg Ungerer <gerg@uclinux.org>

81174262

m68knommu: fix mis-named variable int set_irq_chip loop · b14769d9

Greg Ungerer authored Feb 08, 2011

Compiling for 68360 targets gives:

CC arch/m68knommu/platform/68360/ints.o
arch/m68knommu/platform/68360/ints.c: In function ‘init_IRQ’:
arch/m68knommu/platform/68360/ints.c:135:16: error: ‘irq’ undeclared (first use in this function)
arch/m68knommu/platform/68360/ints.c:135:16: note: each undeclared identifier is reported only once for each function it appears in

Fix variable name used.
Signed-off-by: Greg Ungerer <gerg@uclinux.org>

b14769d9