Commits · cb7ac02a7453fb64b555e26660eecac8977c04a1 · Kirill Smelkov / linux

27 Jan, 2015 40 commits

xhci: Check if slot is already in default state before moving it there · cb7ac02a

Mathias Nyman authored Jan 09, 2015

commit f161ead7 upstream.

Solves xhci error cases with debug messages:
xhci_hcd 0000:00:14.0: Setup ERROR: setup context command for slot 1.
usb 1-6: hub failed to enable device, error -22

xhci will give a context state error if we try to set a slot in default
state to the same default state with a special address device command.

Turns out this happends in several cases:
- retry reading the device rescriptor in hub_port_init()
- usb_reset_device() is called for a slot in default state
- in resume path, usb_port_resume() calls hub_port_init()

The default state is usually reached from most states with a reset device
command without any context state errors, but using the address device
command with BSA bit set (block set address) only works from the enabled
state and will otherwise cause context error.

solve this by checking if we are already in the default state before issuing
a address device BSA=1 command.

Fixes: 48fc7dbd ("usb: xhci: change enumeration scheme to 'new scheme'")
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

cb7ac02a

cxl: Unmap MMIO regions when detaching a context · 00ba0215

Ian Munsie authored Dec 08, 2014

commit b123429e upstream.

If we need to force detach a context (e.g. due to EEH or simply force
unbinding the driver) we should prevent the userspace contexts from
being able to access the Problem State Area MMIO region further, which
they may have mapped with mmap().

This patch unmaps any mapped MMIO regions when detaching a userspace
context.
Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

00ba0215

cxl: Add timeout to process element commands · 3db65dba

Ian Munsie authored Dec 08, 2014

commit a98e6e9f upstream.

In the event that something goes wrong in the hardware and it is unable
to complete a process element comment we would end up polling forever,
effectively making the associated process unkillable.

This patch adds a timeout to the process element command code path, so
that we will give up if the hardware does not respond in a reasonable
time.
Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

3db65dba

cxl: Change contexts_lock to a mutex to fix sleep while atomic bug · d4a26d11

Ian Munsie authored Dec 08, 2014

commit ee41d11d upstream.

We had a known sleep while atomic bug if a CXL device was forcefully
unbound while it was in use. This could occur as a result of EEH, or
manually induced with something like this while the device was in use:

echo 0000:01:00.0 > /sys/bus/pci/drivers/cxl-pci/unbind

The issue was that in this code path we iterated over each context and
forcefully detached it with the contexts_lock spin lock held, however
the detach also needed to take the spu_mutex, and call schedule.

This patch changes the contexts_lock to a mutex so that we are not in
atomic context while doing the detach, thereby avoiding the sleep while
atomic.

Also delete the related TODO comment, which suggested an alternate
solution which turned out to not be workable.
Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

d4a26d11

ARC: [nsimosci] move peripherals to match model to FPGA · 70819eaa

Vineet Gupta authored Oct 01, 2014

commit e8ef060b upstream.

This allows the sdplite/Zebu images to run on OSCI simulation platform
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

70819eaa

drm/irq: BUG_ON() -> WARN_ON() · a7545203

Rob Clark authored Nov 08, 2014

commit 7f907bf2 upstream.

Let's make things a bit easier to debug when things go bad (potentially
under console_lock).
Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Cc: Anand Moon <moon.linux@yahoo.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

a7545203

drm/i915: Don't call intel_prepare_page_flip() multiple times on gen2-4 · 00a80b75

Ville Syrjälä authored Dec 17, 2014

commit 7d47559e upstream.

The flip stall detector kicks in when pending>=INTEL_FLIP_COMPLETE. That
means if we first call intel_prepare_page_flip() but don't call
intel_finish_page_flip(), the next stall check will erroneosly think
the page flip was somehow stuck.

With enough debug spew emitted from the interrupt handler my 830 hangs
when this happens. My theory is that the previous vblank interrupt gets
sufficiently delayed that the handler will see the pending bit set in
IIR, but ISR still has the bit set as well (ie. the flip was processed
by CS but didn't complete yet). In this case the handler will proceed
to call intel_check_page_flip() immediately after
intel_prepare_page_flip(). It then tries to print a backtrace for the
stuck flip WARN, which apparetly results in way too much debug spew
delaying interrupt processing further. That then seems to cause an
endless loop in the interrupt handler, and the machine is dead until
the watchdog kicks in and reboots. At least limiting the number of
iterations of the loop in the interrupt handler also prevented the
hang.

So it seems better to not call intel_prepare_page_flip() without
immediately calling intel_finish_page_flip(). The IIR/ISR trickery
avoids races here so this is a perfectly safe thing to do.

v2: Fix typo in commit message (checkpatch)

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=88381
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=85888Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

00a80b75

drm/i915: Disable PSMI sleep messages on all rings around context switches · bb0edc80

Chris Wilson authored Dec 16, 2014

commit 2c550183 upstream.

There exists a current workaround to prevent a hang on context switch
should the ring go to sleep in the middle of the restore,
WaProgramMiArbOnOffAroundMiSetContext (applicable to all gen7+). In
spite of disabling arbitration (which prevents the ring from powering
down during the critical section) we were still hitting hangs that had
the hallmarks of the known erratum. That is we are still seeing hangs
"on the last instruction in the context restore". By comparing -nightly
(broken) with requests (working), we were able to deduce that it was the
semaphore LRI cross-talk that reproduced the original failure. The key
was that requests implemented deferred semaphore signalling, and
disabling that, i.e. emitting the semaphore signal to every other ring
after every batch restored the frequent hang.  Explicitly disabling PSMI
sleep on the RCS ring was insufficient, all the rings had to be awake to
prevent the hangs. Fortunately, we can reduce the wakelock to the
MI_SET_CONTEXT operation itself, and so should be able to limit the extra
power implications.

Since the MI_ARB_ON_OFF workaround is listed for all gen7 and above
products, we should apply this extra hammer for all of the same
platforms despite so far that we have only been able to reproduce the
hang on certain ivb and hsw models. The last question is whether we want
to always use the extra hammer or only when we know semaphores are in
operation. At the moment, we only use LRI on non-RCS rings for
semaphores, but that may change in the future with the possibility of
reintroducing this bug under subtle conditions.

v2: Make it explicit that the PSMI LRI are an extension to the original
workaround for the other rings.
v3: Bikeshedding variable names and whitespacing

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=80660
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=83677
Cc: Simon Farnsworth <simon@farnz.org.uk>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Tested-by: Peter Frühberger <fritsch@xbmc.org>
Reviewed-by: Daniel Vetter <daniel@ffwll.ch>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

bb0edc80

drm/i915: Force the CS stall for invalidate flushes · 3ea5944a

Chris Wilson authored Dec 16, 2014

commit add284a3 upstream.

In order to act as a full command barrier by itself, we need to tell the
pipecontrol to actually stall the command streamer while the flush runs.
We require the full command barrier before operations like
MI_SET_CONTEXT, which currently rely on a prior invalidate flush.

References: https://bugs.freedesktop.org/show_bug.cgi?id=83677
Cc: Simon Farnsworth <simon@farnz.org.uk>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

3ea5944a

drm/i915: Invalidate media caches on gen7 · 1c61a711

Chris Wilson authored Dec 16, 2014

commit 148b83d0 upstream.

In the gen7 pipe control there is an extra bit to flush the media
caches, so let's set it during cache invalidation flushes.

v2: Rename to MEDIA_STATE_CLEAR to be more inline with spec.

Cc: Simon Farnsworth <simon@farnz.org.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

1c61a711

drm/nv4c/mc: disable msi · 715f7c82

Ilia Mirkin authored Dec 16, 2014

commit 4761703b upstream.

Several users have, over time, reported issues with MSI on these IGPs.
They're old, rarely available, and MSI doesn't provide such huge
advantages on them. Just disable.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=87361
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=74492
Fixes: fa8c9ac7 ("drm/nv4c/mc: nv4x igp's have a different msi rearm register")
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

715f7c82

drm/i915: save/restore GMBUS freq across suspend/resume on gen4 · 6ba4636a

Jesse Barnes authored Dec 10, 2014

commit 9f49c376 upstream.

Should probably just init this in the GMbus code all the time, based on
the cdclk and HPLL like we do on newer platforms.  Ville has code for
that in a rework branch, but until then we can fix this bug fairly
easily.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=76301Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Tested-by: Nikolay <mar.kolya@gmail.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

6ba4636a

drm/i915: resume MST after reading back hw state · ddf6f9a4

Dave Airlie authored Dec 08, 2014

commit e7d6f7d7 upstream.

Otherwise the MST resume paths can hit DPMS paths
which hit state checker paths, which hit WARN_ON,
because the state checker is inconsistent with the
hw.

This fixes a bunch of WARN_ON's on resume after
undocking.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ddf6f9a4

drm/i915: Only warn the first time we attempt to mmio whilst suspended · a5872ca0

Chris Wilson authored Nov 24, 2014

commit 2b387059 upstream.

In all likelihood we will do a few hundred errnoneous register
operations if we do a single invalid register access whilst the device
is suspended. As each instance causes a WARN, this floods the system
logs and can make the system unresponsive.

The warning was first introduced in
commit b2ec142c
Author: Paulo Zanoni <paulo.r.zanoni@intel.com>
Date:   Fri Feb 21 13:52:25 2014 -0300

    drm/i915: call assert_device_not_suspended at gen6_force_wake_work

and despite the claims the WARN is still encountered in the wild today.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
Cc: Imre Deak <imre.deak@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

a5872ca0

drm/i915: Disallow pin ioctl completely for kms drivers · 06b7ae70

Daniel Vetter authored Nov 24, 2014

commit d472fcc8 upstream.

The problem here is that SNA pins batchbuffers to etch out a bit more
performance. Iirc it started out as a w/a for i830M (which we've
implemented in the kernel since a long time already). The problem is
that the pin ioctl wasn't added in

commit d23db88c
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Fri May 23 08:48:08 2014 +0200

    drm/i915: Prevent negative relocation deltas from wrapping

Fix this by simply disallowing pinning from userspace so that the
kernel is in full control of batch placement again. Especially since
distros are moving towards running X as non-root, so most users won't
even be able to see any benefits.

UMS support is dead now, but we need this minimal patch for
backporting. Follow-up patch will remove the pin ioctl code
completely.

Note to backporters: You must have both

commit b45305fc
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Mon Dec 17 16:21:27 2012 +0100

    drm/i915: Implement workaround for broken CS tlb on i830/845

which laned in 3.8 and

commit c4d69da1
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Mon Sep 8 14:25:41 2014 +0100

    drm/i915: Evict CS TLBs between batches

which is also marked cc: stable. Otherwise this could introduce a
regression by disabling the userspace w/a without the kernel w/a being
fully functional on i830/45.

References: https://bugs.freedesktop.org/show_bug.cgi?id=76554#c116
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

06b7ae70

drm/i915: Don't complain about stolen conflicts on gen3 · 09764573

Daniel Vetter authored Apr 11, 2014

commit 0b6d24c0 upstream.

Apparently stuff works that way on those machines.

I agree with Chris' concern that this is a bit risky but imo worth a
shot in -next just for fun. Afaics all these machines have the pci
resources allocated like that by the BIOS, so I suspect that it's all
ok.

This regression goes back to

commit eaba1b8f
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Thu Jul 4 12:28:35 2013 +0100

    drm/i915: Verify that our stolen memory doesn't conflict

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=76983
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=71031Tested-by: lu hua <huax.lu@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Tested-by: Paul Menzel <paulepanter@users.sourceforge.net>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

09764573

drm/radeon: properly filter DP1.2 4k modes on non-DP1.2 hw · e8e547f6

Alex Deucher authored Dec 10, 2014

commit 410cce2a upstream.

The check was already in place in the dp mode_valid check, but
radeon_dp_get_dp_link_clock() never returned the high clock
mode_valid was checking for because that function clipped the
clock based on the hw capabilities.  Add an explicit check
in the mode_valid function.

bug:
https://bugs.freedesktop.org/show_bug.cgi?id=87172Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

e8e547f6

drm/radeon: adjust default bapm settings for KV · 2710eb29

Alex Deucher authored Dec 15, 2014

commit 02ae7af5 upstream.

Enabling bapm seems to cause clocking problems on some
KV configurations.  Disable it by default for now.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

2710eb29

drm/radeon: fix sad_count check for dce3 · ca01c8f0

Alex Deucher authored Dec 09, 2014

commit 5665c3eb upstream.

Make it consistent with the sad code for other asics to deal
with monitors that don't report sads.

bug:
https://bugzilla.kernel.org/show_bug.cgi?id=89461Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ca01c8f0

drm/radeon: KV has three PPLLs (v2) · 310a18b4

Alex Deucher authored Dec 05, 2014

commit fbedf1c3 upstream.

Enable all three in the driver.  Early documentation
indicated the 3rd one was used for something else, but
that is not the case.

v2: handle disable as well
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

310a18b4

drm/radeon: check the right ring in radeon_evict_flags() · 65c48e05

Alex Deucher authored Dec 03, 2014

commit 5e5c21ca upstream.

Check the that ring we are using for copies is functional
rather than the GFX ring.  On newer asics we use the DMA
ring for bo moves.
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

65c48e05

drm/radeon: work around a hw bug in MGCG on CIK · b29b3ee3

Alex Deucher authored Nov 17, 2014

commit 4bb62c95 upstream.

Always need to set bit 0 of RLC_CGTT_MGCG_OVERRIDE
to avoid unreliable doorbell updates in some cases.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

b29b3ee3

drm/radeon: fix typo in CI dpm disable · 0d052146

Alex Deucher authored Nov 07, 2014

commit 129acb7c upstream.

Need to disable DS, not enable it when disabling dpm.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

0d052146

drm/dp-mst: Remove branches before dropping the reference · c3c16208

Daniel Vetter authored Dec 08, 2014

commit 0391359d upstream.

When we unplug a dp mst branch we unreference the entire tree from
the root towards the leaves. Which is ok, since that's the way the
pointers and so also the refcounts go.

But when we drop the reference we must make sure that we remove the
branches/ports from the lists/pointers before dropping the reference.
Otherwise the get_validated functions will still return it instead
of returning NULL (which indicates a potentially on-going unplug).

The mst branch destroy gets this right for ports: First it deletes
the port from the ports list, then it unrefs. But the ports destroy
function gets it wrong: First it unrefs, then it drops the ref. Which
means a zombie mst branch can still be validate with get_validated_mstb_ref
when it shouldn't.

Fix this.

This should address a backtrace Dave dug out somewhere on unplug:

 [<ffffffffa00cc262>] drm_dp_mst_get_validated_mstb_ref_locked+0x92/0xa0 [drm_kms_helper]
 [<ffffffffa00cc211>] drm_dp_mst_get_validated_mstb_ref_locked+0x41/0xa0 [drm_kms_helper]
 [<ffffffffa00cc2aa>] drm_dp_get_validated_mstb_ref+0x3a/0x60 [drm_kms_helper]
 [<ffffffffa00cc2fb>] drm_dp_payload_send_msg.isra.14+0x2b/0x100 [drm_kms_helper]
 [<ffffffffa00cc547>] drm_dp_update_payload_part1+0x177/0x360 [drm_kms_helper]
 [<ffffffffa015c52e>] intel_mst_disable_dp+0x3e/0x80 [i915]
 [<ffffffffa013d60b>] haswell_crtc_disable+0x1cb/0x340 [i915]
 [<ffffffffa0136739>] intel_crtc_control+0x49/0x100 [i915]
 [<ffffffffa0136857>] intel_crtc_update_dpms+0x67/0x80 [i915]
 [<ffffffffa013fa59>] intel_connector_dpms+0x59/0x70 [i915]
 [<ffffffffa015c752>] intel_dp_destroy_mst_connector+0x32/0xc0 [i915]
 [<ffffffffa00cb44b>] drm_dp_destroy_port+0x6b/0xa0 [drm_kms_helper]
 [<ffffffffa00cb588>] drm_dp_destroy_mst_branch_device+0x108/0x130 [drm_kms_helper]
 [<ffffffffa00cb3cd>] drm_dp_port_teardown_pdt+0x3d/0x50 [drm_kms_helper]
 [<ffffffffa00cdb79>] drm_dp_mst_handle_up_req+0x499/0x540 [drm_kms_helper]
 [<ffffffff810d9ead>] ? trace_hardirqs_on_caller+0x15d/0x200 [<ffffffffa00cdc73>]
 drm_dp_mst_hpd_irq+0x53/0xa00 [drm_kms_helper] [<ffffffffa00c7dfb>]
 ? drm_dp_dpcd_read+0x1b/0x20 [drm_kms_helper] [<ffffffffa0153ed8>]
 ? intel_dp_dpcd_read_wake+0x38/0x70 [i915] [<ffffffffa015a225>]
 intel_dp_check_mst_status+0xb5/0x250 [i915] [<ffffffffa015ac71>]
 intel_dp_hpd_pulse+0x181/0x210 [i915] [<ffffffffa01104f6>]
 i915_digport_work_func+0x96/0x120 [i915]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

c3c16208

drm/dp: retry AUX transactions 32 times (v1.1) · 1cc635aa

Dave Airlie authored Nov 26, 2014

commit 19a93f04 upstream.

At least on two MST devices I've tested with, when
they are link training downstream, they are totally
unable to handle aux ch msgs, so they defer like nuts.
I tried 16, it wasn't enough, 32 seems better.

This fixes one Dell 4k monitor and one of the
MST hubs.

v1.1: fixup comment (Tom).
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

1cc635aa

drm/fb_helper: move deferred fb checking into restore mode (v2) · f5f6dd6d

Dave Airlie authored Nov 26, 2014

commit e2809c7d upstream.

On MST systems the monitors don't appear when we set the fb up,
but plymouth opens the drm device and holds it open while they
come up, when plymouth finishes and lastclose gets called we
don't do the delayed fb probe, so the monitor never appears on the
console.

Fix this by moving the delayed checking into the mode restore.

v2: Daniel suggested that ->delayed_hotplug is set under
the mode_config mutex, so we should check it under that as
well, while we are in the area.
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

f5f6dd6d

drm/ttm: Avoid memory allocation from shrinker functions. · 33084139

Tetsuo Handa authored Nov 13, 2014

commit 881fdaa5 upstream.

Andrew Morton wrote:
> On Wed, 12 Nov 2014 13:08:55 +0900 Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> wrote:
>
> > Andrew Morton wrote:
> > > Poor ttm guys - this is a bit of a trap we set for them.
> >
> > Commit a91576d7 ("drm/ttm: Pass GFP flags in order to avoid deadlock.")
> > changed to use sc->gfp_mask rather than GFP_KERNEL.
> >
> > -       pages_to_free = kmalloc(npages_to_free * sizeof(struct page *),
> > -                       GFP_KERNEL);
> > +       pages_to_free = kmalloc(npages_to_free * sizeof(struct page *), gfp);
> >
> > But this bug is caused by sc->gfp_mask containing some flags which are not
> > in GFP_KERNEL, right? Then, I think
> >
> > -       pages_to_free = kmalloc(npages_to_free * sizeof(struct page *), gfp);
> > +       pages_to_free = kmalloc(npages_to_free * sizeof(struct page *), gfp & GFP_KERNEL);
> >
> > would hide this bug.
> >
> > But I think we should use GFP_ATOMIC (or drop __GFP_WAIT flag)
>
> Well no - ttm_page_pool_free() should stop calling kmalloc altogether.
> Just do
>
> 	struct page *pages_to_free[16];
>
> and rework the code to free 16 pages at a time.  Easy.

Well, ttm code wants to process 512 pages at a time for performance.
Memory footprint increased by 512 * sizeof(struct page *) buffer is
only 4096 bytes. What about using static buffer like below?
----------
>From d3cb5393c9c8099d6b37e769f78c31af1541fe8c Mon Sep 17 00:00:00 2001
From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Date: Thu, 13 Nov 2014 22:21:54 +0900
Subject: drm/ttm: Avoid memory allocation from shrinker functions.

Commit a91576d7 ("drm/ttm: Pass GFP flags in order to avoid
deadlock.") caused BUG_ON() due to sc->gfp_mask containing flags
which are not in GFP_KERNEL.

  https://bugzilla.kernel.org/show_bug.cgi?id=87891

Changing from sc->gfp_mask to (sc->gfp_mask & GFP_KERNEL) would
avoid the BUG_ON(), but avoiding memory allocation from shrinker
function is better and reliable fix.

Shrinker function is already serialized by global lock, and
clean up function is called after shrinker function is unregistered.
Thus, we can use static buffer when called from shrinker function
and clean up function.
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

33084139

drm/vmwgfx: Fix fence event code · a869935d

Thomas Hellstrom authored Dec 02, 2014

commit 89669e7a upstream.

The commit "vmwgfx: Rework fence event action" introduced a number of bugs
that are fixed with this commit:

a) A forgotten return stateemnt.
b) An if statement with identical branches.
Reported-by: Rob Clark <robdclark@gmail.com>
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com>
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

a869935d

drm/vmwgfx: Fix error printout on signals pending · ad2b9bb4

Thomas Hellstrom authored Nov 25, 2014

commit e338c4c2 upstream.

The function vmw_master_check() might return -ERESTARTSYS if there is a
signal pending, indicating that the IOCTL should be rerun, potentially from
user-space. At that point we shouldn't print out an error message since that
is not an error condition. In short, avoid bloating the kernel log when a
process refuses to die on SIGTERM.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ad2b9bb4

drm/vmwgfx: Don't use memory accounting for kernel-side fence objects · 28c54a26

Thomas Hellstrom authored Dec 02, 2014

commit 1f563a6a upstream.

Kernel side fence objects are used when unbinding resources and may thus be
created as part of a memory reclaim operation. This might trigger recursive
memory reclaims and result in the kernel running out of stack space.

So a simple way out is to avoid accounting of these fence objects.
In principle this is OK since while user-space can trigger the creation of
such objects, it can't really hold on to them. However, their lifetime is
quite long, so some form of accounting should perhaps be implemented in the
future.

Fixes kernel crashes when running, for example viewperf11 ensight-04 test 3
with low system memory settings.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com>
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

28c54a26

enic: fix rx skb checksum · 4c056b89

Govindarajulu Varadarajan authored Dec 18, 2014

[ Upstream commit 17e96834 ]

Hardware always provides compliment of IP pseudo checksum. Stack expects
whole packet checksum without pseudo checksum if CHECKSUM_COMPLETE is set.

This causes checksum error in nf & ovs.

kernel: qg-19546f09-f2: hw csum failure
kernel: CPU: 9 PID: 0 Comm: swapper/9 Tainted: GF          O--------------   3.10.0-123.8.1.el7.x86_64 #1
kernel: Hardware name: Cisco Systems Inc UCSB-B200-M3/UCSB-B200-M3, BIOS B200M3.2.2.3.0.080820141339 08/08/2014
kernel: ffff881218f40000 df68243feb35e3a8 ffff881237a43ab8 ffffffff815e237b
kernel: ffff881237a43ad0 ffffffff814cd4ca ffff8829ec71eb00 ffff881237a43af0
kernel: ffffffff814c6232 0000000000000286 ffff8829ec71eb00 ffff881237a43b00
kernel: Call Trace:
kernel: <IRQ>  [<ffffffff815e237b>] dump_stack+0x19/0x1b
kernel: [<ffffffff814cd4ca>] netdev_rx_csum_fault+0x3a/0x40
kernel: [<ffffffff814c6232>] __skb_checksum_complete_head+0x62/0x70
kernel: [<ffffffff814c6251>] __skb_checksum_complete+0x11/0x20
kernel: [<ffffffff8155a20c>] nf_ip_checksum+0xcc/0x100
kernel: [<ffffffffa049edc7>] icmp_error+0x1f7/0x35c [nf_conntrack_ipv4]
kernel: [<ffffffff814cf419>] ? netif_rx+0xb9/0x1d0
kernel: [<ffffffffa040eb7b>] ? internal_dev_recv+0xdb/0x130 [openvswitch]
kernel: [<ffffffffa04c8330>] nf_conntrack_in+0xf0/0xa80 [nf_conntrack]
kernel: [<ffffffff81509380>] ? inet_del_offload+0x40/0x40
kernel: [<ffffffffa049e302>] ipv4_conntrack_in+0x22/0x30 [nf_conntrack_ipv4]
kernel: [<ffffffff815005ca>] nf_iterate+0xaa/0xc0
kernel: [<ffffffff81509380>] ? inet_del_offload+0x40/0x40
kernel: [<ffffffff81500664>] nf_hook_slow+0x84/0x140
kernel: [<ffffffff81509380>] ? inet_del_offload+0x40/0x40
kernel: [<ffffffff81509dd4>] ip_rcv+0x344/0x380

Hardware verifies IP & tcp/udp header checksum but does not provide payload
checksum, use CHECKSUM_UNNECESSARY. Set it only if its valid IP tcp/udp packet.

Cc: Jiri Benc <jbenc@redhat.com>
Cc: Stefan Assmann <sassmann@redhat.com>
Reported-by: Sunil Choudhary <schoudha@redhat.com>
Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
Reviewed-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

4c056b89

team: avoid possible underflow of count_pending value for notify_peers and mcast_rejoin · 58f2f9c8

Jiri Pirko authored Jan 14, 2015

[ Upstream commit b0d11b42 ]

This patch is fixing a race condition that may cause setting
count_pending to -1, which results in unwanted big bulk of arp messages
(in case of "notify peers").

Consider following scenario:

count_pending == 2
   CPU0                                           CPU1
					team_notify_peers_work
					  atomic_dec_and_test (dec count_pending to 1)
					  schedule_delayed_work
 team_notify_peers
   atomic_add (adding 1 to count_pending)
					team_notify_peers_work
					  atomic_dec_and_test (dec count_pending to 1)
					  schedule_delayed_work
					team_notify_peers_work
					  atomic_dec_and_test (dec count_pending to 0)
   schedule_delayed_work
					team_notify_peers_work
					  atomic_dec_and_test (dec count_pending to -1)

Fix this race by using atomic_dec_if_positive - that will prevent
count_pending running under 0.

Fixes: fc423ff0 ("team: add peer notification")
Fixes: 492b200e  ("team: add support for sending multicast rejoins")
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

58f2f9c8

alx: fix alx_poll() · ff9df482

Eric Dumazet authored Jan 11, 2015

[ Upstream commit 7a05dc64 ]

Commit d75b1ade ("net: less interrupt masking in NAPI") uncovered
wrong alx_poll() behavior.

A NAPI poll() handler is supposed to return exactly the budget when/if
napi_complete() has not been called.

It is also supposed to return number of frames that were received, so
that netdev_budget can have a meaning.

Also, in case of TX pressure, we still have to dequeue received
packets : alx_clean_rx_irq() has to be called even if
alx_clean_tx_irq(alx) returns false, otherwise device is half duplex.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Fixes: d75b1ade ("net: less interrupt masking in NAPI")
Reported-by: Oded Gabbay <oded.gabbay@amd.com>
Bisected-by: Oded Gabbay <oded.gabbay@amd.com>
Tested-by: Oded Gabbay <oded.gabbay@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ff9df482

xen-netback: fixing the propagation of the transmit shaper timeout · 2aab9536

Palik, Imre authored Jan 06, 2015

[ Upstream commit 07ff890d ]

Since e9ce7cb6 ("xen-netback: Factor queue-specific data into queue struct"),
the transimt shaper timeout is always set to 0.  The value the user sets via
xenbus is never propagated to the transmit shaper.

This patch fixes the issue.

Cc: Anthony Liguori <aliguori@amazon.com>
Signed-off-by: Imre Palik <imrep@amazon.de>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

2aab9536

tcp: Do not apply TSO segment limit to non-TSO packets · 04e5427d

Herbert Xu authored Jan 01, 2015

[ Upstream commit 843925f3 ]

Thomas Jarosch reported IPsec TCP stalls when a PMTU event occurs.

In fact the problem was completely unrelated to IPsec.  The bug is
also reproducible if you just disable TSO/GSO.

The problem is that when the MSS goes down, existing queued packet
on the TX queue that have not been transmitted yet all look like
TSO packets and get treated as such.

This then triggers a bug where tcp_mss_split_point tells us to
generate a zero-sized packet on the TX queue.  Once that happens
we're screwed because the zero-sized packet can never be removed
by ACKs.

Fixes: 1485348d ("tcp: Apply device TSO segment limit earlier")
Reported-by: Thomas Jarosch <thomas.jarosch@intra2net.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

Cheers,
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

04e5427d

net/mlx4_core: Correcly update the mtt's offset in the MR re-reg flow · f3721ea3

Maor Gottlieb authored Dec 30, 2014

[ Upstream commit a51e0df4 ]

Previously, mlx4_mt_rereg_write filled the MPT's entity_size with the
old MTT's page shift, which could result in using an incorrect offset.
Fix the initialization to be after we calculate the new MTT offset.

In addition, assign mtt order to -1 after calling mlx4_mtt_cleanup. This
is necessary in order to mark the MTT as invalid and avoid freeing it later.

Fixes: e630664c ('mlx4_core: Add helper functions to support MR re-registration')
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

f3721ea3

net: Generalize ndo_gso_check to ndo_features_check · 12d5e0bb

Jesse Gross authored Dec 23, 2014

[ Upstream commit 5f35227e ]

GSO isn't the only offload feature with restrictions that
potentially can't be expressed with the current features mechanism.
Checksum is another although it's a general issue that could in
theory apply to anything. Even if it may be possible to
implement these restrictions in other ways, it can result in
duplicate code or inefficient per-packet behavior.

This generalizes ndo_gso_check so that drivers can remove any
features that don't make sense for a given packet, similar to
netif_skb_features(). It also converts existing driver
restrictions to the new format, completing the work that was
done to support tunnel protocols since the issues apply to
checksums as well.

By actually removing features from the set that are used to do
offloading, it solves another problem with the existing
interface. In these cases, GSO would run with the original set
of features and not do anything because it appears that
segmentation is not required.

CC: Tom Herbert <therbert@google.com>
CC: Joe Stringer <joestringer@nicira.com>
CC: Eric Dumazet <edumazet@google.com>
CC: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Tom Herbert <therbert@google.com>
Fixes: 04ffcb25 ("net: Add ndo_gso_check")
Tested-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

12d5e0bb

net/core: Handle csum for CHECKSUM_COMPLETE VXLAN forwarding · 3890e428

Jay Vosburgh authored Dec 19, 2014

[ Upstream commit 2c26d34b ]

When using VXLAN tunnels and a sky2 device, I have experienced
checksum failures of the following type:

[ 4297.761899] eth0: hw csum failure
[...]
[ 4297.765223] Call Trace:
[ 4297.765224]  <IRQ>  [<ffffffff8172f026>] dump_stack+0x46/0x58
[ 4297.765235]  [<ffffffff8162ba52>] netdev_rx_csum_fault+0x42/0x50
[ 4297.765238]  [<ffffffff8161c1a0>] ? skb_push+0x40/0x40
[ 4297.765240]  [<ffffffff8162325c>] __skb_checksum_complete+0xbc/0xd0
[ 4297.765243]  [<ffffffff8168c602>] tcp_v4_rcv+0x2e2/0x950
[ 4297.765246]  [<ffffffff81666ca0>] ? ip_rcv_finish+0x360/0x360

	These are reliably reproduced in a network topology of:

container:eth0 == host(OVS VXLAN on VLAN) == bond0 == eth0 (sky2) -> switch

	When VXLAN encapsulated traffic is received from a similarly
configured peer, the above warning is generated in the receive
processing of the encapsulated packet.  Note that the warning is
associated with the container eth0.

        The skbs from sky2 have ip_summed set to CHECKSUM_COMPLETE, and
because the packet is an encapsulated Ethernet frame, the checksum
generated by the hardware includes the inner protocol and Ethernet
headers.

	The receive code is careful to update the skb->csum, except in
__dev_forward_skb, as called by dev_forward_skb.  __dev_forward_skb
calls eth_type_trans, which in turn calls skb_pull_inline(skb, ETH_HLEN)
to skip over the Ethernet header, but does not update skb->csum when
doing so.

	This patch resolves the problem by adding a call to
skb_postpull_rcsum to update the skb->csum after the call to
eth_type_trans.
Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

3890e428

net: Reset secmark when scrubbing packet · 253ab524

Thomas Graf authored Dec 23, 2014

[ Upstream commit b8fb4e06 ]

skb_scrub_packet() is called when a packet switches between a context
such as between underlay and overlay, between namespaces, or between
L3 subnets.

While we already scrub the packet mark, connection tracking entry,
and cached destination, the security mark/context is left intact.

It seems wrong to inherit the security context of a packet when going
from overlay to underlay or across forwarding paths.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

253ab524

net: Fix stacked vlan offload features computation · 29862482

Toshiaki Makita authored Dec 22, 2014

[ Upstream commit 796f2da8 ]

When vlan tags are stacked, it is very likely that the outer tag is stored
in skb->vlan_tci and skb->protocol shows the inner tag's vlan_proto.
Currently netif_skb_features() first looks at skb->protocol even if there
is the outer tag in vlan_tci, thus it incorrectly retrieves the protocol
encapsulated by the inner vlan instead of the inner vlan protocol.
This allows GSO packets to be passed to HW and they end up being
corrupted.

Fixes: 58e998c6 ("offloading: Force software GSO for multiple vlan tags.")
Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

29862482