Commits · a0840275e3ebddd5d1349cd4908777e11ba50311 · Kirill Smelkov / linux

07 Aug, 2017 40 commits

pstore: Correctly initialize spinlock and flags · a0840275

Kees Cook authored Feb 09, 2017

commit 76d5692a upstream.

The ram backend wasn't always initializing its spinlock correctly. Since
it was coming from kzalloc memory, though, it was harmless on
architectures that initialize unlocked spinlocks to 0 (at least x86 and
ARM). This also fixes a possibly ignored flag setting too.

When running under CONFIG_DEBUG_SPINLOCK, the following Oops was visible:

[    0.760836] persistent_ram: found existing buffer, size 29988, start 29988
[    0.765112] persistent_ram: found existing buffer, size 30105, start 30105
[    0.769435] persistent_ram: found existing buffer, size 118542, start 118542
[    0.785960] persistent_ram: found existing buffer, size 0, start 0
[    0.786098] persistent_ram: found existing buffer, size 0, start 0
[    0.786131] pstore: using zlib compression
[    0.790716] BUG: spinlock bad magic on CPU#0, swapper/0/1
[    0.790729]  lock: 0xffffffc0d1ca9bb0, .magic: 00000000, .owner: <none>/-1, .owner_cpu: 0
[    0.790742] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.10.0-rc2+ #913
[    0.790747] Hardware name: Google Kevin (DT)
[    0.790750] Call trace:
[    0.790768] [<ffffff900808ae88>] dump_backtrace+0x0/0x2bc
[    0.790780] [<ffffff900808b164>] show_stack+0x20/0x28
[    0.790794] [<ffffff9008460ee0>] dump_stack+0xa4/0xcc
[    0.790809] [<ffffff9008113cfc>] spin_dump+0xe0/0xf0
[    0.790821] [<ffffff9008113d3c>] spin_bug+0x30/0x3c
[    0.790834] [<ffffff9008113e28>] do_raw_spin_lock+0x50/0x1b8
[    0.790846] [<ffffff9008a2d2ec>] _raw_spin_lock_irqsave+0x54/0x6c
[    0.790862] [<ffffff90083ac3b4>] buffer_size_add+0x48/0xcc
[    0.790875] [<ffffff90083acb34>] persistent_ram_write+0x60/0x11c
[    0.790888] [<ffffff90083aab1c>] ramoops_pstore_write_buf+0xd4/0x2a4
[    0.790900] [<ffffff90083a9d3c>] pstore_console_write+0xf0/0x134
[    0.790912] [<ffffff900811c304>] console_unlock+0x48c/0x5e8
[    0.790923] [<ffffff900811da18>] register_console+0x3b0/0x4d4
[    0.790935] [<ffffff90083aa7d0>] pstore_register+0x1a8/0x234
[    0.790947] [<ffffff90083ac250>] ramoops_probe+0x6b8/0x7d4
[    0.790961] [<ffffff90085ca548>] platform_drv_probe+0x7c/0xd0
[    0.790972] [<ffffff90085c76ac>] driver_probe_device+0x1b4/0x3bc
[    0.790982] [<ffffff90085c7ac8>] __device_attach_driver+0xc8/0xf4
[    0.790996] [<ffffff90085c4bfc>] bus_for_each_drv+0xb4/0xe4
[    0.791006] [<ffffff90085c7414>] __device_attach+0xd0/0x158
[    0.791016] [<ffffff90085c7b18>] device_initial_probe+0x24/0x30
[    0.791026] [<ffffff90085c648c>] bus_probe_device+0x50/0xe4
[    0.791038] [<ffffff90085c35b8>] device_add+0x3a4/0x76c
[    0.791051] [<ffffff90087d0e84>] of_device_add+0x74/0x84
[    0.791062] [<ffffff90087d19b8>] of_platform_device_create_pdata+0xc0/0x100
[    0.791073] [<ffffff90087d1a2c>] of_platform_device_create+0x34/0x40
[    0.791086] [<ffffff900903c910>] of_platform_default_populate_init+0x58/0x78
[    0.791097] [<ffffff90080831fc>] do_one_initcall+0x88/0x160
[    0.791109] [<ffffff90090010ac>] kernel_init_freeable+0x264/0x31c
[    0.791123] [<ffffff9008a25bd0>] kernel_init+0x18/0x11c
[    0.791133] [<ffffff9008082ec0>] ret_from_fork+0x10/0x50
[    0.793717] console [pstore-1] enabled
[    0.797845] pstore: Registered ramoops as persistent store backend
[    0.804647] ramoops: attached 0x100000@0xf7edc000, ecc: 0/0

Fixes: 663deb47 ("pstore: Allow prz to control need for locking")
Fixes: 10970449 ("pstore: Make spinlock per zone instead of global")
Reported-by: Brian Norris <briannorris@chromium.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

a0840275

pstore: Allow prz to control need for locking · 46930803

Joel Fernandes authored Oct 20, 2016

commit 663deb47 upstream.

In preparation of not locking at all for certain buffers depending on if
there's contention, make locking optional depending on the initialization
of the prz.
Signed-off-by: Joel Fernandes <joelaf@google.com>
[kees: moved locking flag into prz instead of via caller arguments]
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

46930803

v4l: s5c73m3: fix negation operator · 5463a3dc

Andrzej Hajda authored Jan 05, 2017

commit a2370ba2 upstream.

Bool values should be negated using logical operators. Using bitwise operators
results in unexpected and possibly incorrect results.
Reported-by: David Binderman <dcb314@hotmail.com>
Signed-off-by: Andrzej Hajda <a.hajda@samsung.com>
Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

5463a3dc

dentry name snapshots · ad25f11e

Al Viro authored Jul 07, 2017

commit 49d31c2f upstream.

take_dentry_name_snapshot() takes a safe snapshot of dentry name;
if the name is a short one, it gets copied into caller-supplied
structure, otherwise an extra reference to external name is grabbed
(those are never modified).  In either case the pointer to stable
string is stored into the same structure.

dentry must be held by the caller of take_dentry_name_snapshot(),
but may be freely dropped afterwards - the snapshot will stay
until destroyed by release_dentry_name_snapshot().

Intended use:
	struct name_snapshot s;

	take_dentry_name_snapshot(&s, dentry);
	...
	access s.name
	...
	release_dentry_name_snapshot(&s);

Replaces fsnotify_oldname_...(), gets used in fsnotify to obtain the name
to pass down with event.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ad25f11e

ipmi/watchdog: fix watchdog timeout set on reboot · d933777b

Valentin Vidic authored May 05, 2017

commit 860f01e9 upstream.

systemd by default starts watchdog on reboot and sets the timer to
ShutdownWatchdogSec=10min.  Reboot handler in ipmi_watchdog than reduces
the timer to 120s which is not enough time to boot a Xen machine with
a lot of RAM.  As a result the machine is rebooted the second time
during the long run of (XEN) Scrubbing Free RAM.....

Fix this by setting the timer to 120s only if it was previously
set to a low value.
Signed-off-by: Valentin Vidic <Valentin.Vidic@CARNet.hr>
Signed-off-by: Corey Minyard <cminyard@mvista.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

d933777b

RDMA/uverbs: Fix the check for port number · 19655366

Ismail, Mustafa authored Jul 14, 2017

commit 5a7a88f1 upstream.

The port number is only valid if IB_QP_PORT is set in the mask.
So only check port number if it is valid to prevent modify_qp from
failing due to an invalid port number.

Fixes: 5ecce4c9("Check port number supplied by user verbs cmds")
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Tested-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

19655366

sched/cgroup: Move sched_online_group() back into css_online() to fix crash · 62b5776c

Konstantin Khlebnikov authored Feb 08, 2017

commit 96b77745 upstream.

Commit:

  2f5177f0 ("sched/cgroup: Fix/cleanup cgroup teardown/init")

.. moved sched_online_group() from css_online() to css_alloc().
It exposes half-baked task group into global lists before initializing
generic cgroup stuff.

LTP testcase (third in cgroup_regression_test) written for testing
similar race in kernels 2.6.26-2.6.28 easily triggers this oops:

  BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
  IP: kernfs_path_from_node_locked+0x260/0x320
  CPU: 1 PID: 30346 Comm: cat Not tainted 4.10.0-rc5-test #4
  Call Trace:
  ? kernfs_path_from_node+0x4f/0x60
  kernfs_path_from_node+0x3e/0x60
  print_rt_rq+0x44/0x2b0
  print_rt_stats+0x7a/0xd0
  print_cpu+0x2fc/0xe80
  ? __might_sleep+0x4a/0x80
  sched_debug_show+0x17/0x30
  seq_read+0xf2/0x3b0
  proc_reg_read+0x42/0x70
  __vfs_read+0x28/0x130
  ? security_file_permission+0x9b/0xc0
  ? rw_verify_area+0x4e/0xb0
  vfs_read+0xa5/0x170
  SyS_read+0x46/0xa0
  entry_SYSCALL_64_fastpath+0x1e/0xad

Here the task group is already linked into the global RCU-protected 'task_groups'
list, but the css->cgroup pointer is still NULL.

This patch reverts this chunk and moves online back to css_online().
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 2f5177f0 ("sched/cgroup: Fix/cleanup cgroup teardown/init")
Link: http://lkml.kernel.org/r/148655324740.424917.5302984537258726349.stgit@buzzSigned-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

62b5776c

mailbox: handle empty message in tx_tick · 016a638a

Sudeep Holla authored Mar 21, 2017

commit cb710ab1 upstream.

We already check if the message is empty before calling the client
tx_done callback. Calling completion on a wait event is also invalid
if the message is empty.

This patch moves the existing empty message check earlier.

Fixes: 2b6d83e2 ("mailbox: Introduce framework for mailbox")
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Jassi Brar <jaswinder.singh@linaro.org>
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

016a638a

mailbox: skip complete wait event if timer expired · abe9090a

Sudeep Holla authored Mar 21, 2017

commit cc6eeaa3 upstream.

If a wait_for_completion_timeout() call returns due to a timeout,
complete() can get called after returning from the wait which is
incorrect and can cause subsequent transmissions on a channel to fail.
Since the wait_for_completion_timeout() sees the completion variable
is non-zero caused by the erroneous/spurious complete() call, and
it immediately returns without waiting for the time as expected by the
client.

This patch fixes the issue by skipping complete() call for the timer
expiry.

Fixes: 2b6d83e2 ("mailbox: Introduce framework for mailbox")
Reported-by: Alexey Klimov <alexey.klimov@arm.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Jassi Brar <jaswinder.singh@linaro.org>
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

abe9090a

mailbox: always wait in mbox_send_message for blocking Tx mode · a23fba81

Sudeep Holla authored Mar 21, 2017

commit c61b781e upstream.

There exists a race when msg_submit return immediately as there was an
active request being processed which may have completed just before it's
checked again in mbox_send_message. This will result in return to the
caller without waiting in mbox_send_message even when it's blocking Tx.

This patch fixes the issue by waiting for the completion always if Tx
is in blocking mode.

Fixes: 2b6d83e2 ("mailbox: Introduce framework for mailbox")
Reported-by: Alexey Klimov <alexey.klimov@arm.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Reviewed-by: Alexey Klimov <alexey.klimov@arm.com>
Signed-off-by: Jassi Brar <jaswinder.singh@linaro.org>
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

a23fba81

wil6210: fix deadlock when using fw_no_recovery option · 2f16bcd4

Lior David authored Nov 23, 2016

commit dfb5b098 upstream.

When FW crashes with no_fw_recovery option, driver
waits for manual recovery with wil->mutex held, this
can easily create deadlocks.
Fix the problem by moving the wait outside the lock.
Signed-off-by: Lior David <qca_liord@qca.qualcomm.com>
Signed-off-by: Maya Erez <qca_merez@qca.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

2f16bcd4

ath10k: fix null deref on wmi-tlv when trying spectral scan · 59153e65

Michal Kazior authored Nov 14, 2016

commit 18ae68ff upstream.

WMI ops wrappers did not properly check for null
function pointers for spectral scan. This caused
null dereference crash with WMI-TLV based firmware
which doesn't implement spectral scan.

The crash could be triggered with:

  ip link set dev wlan0 up
  echo background > /sys/kernel/debug/ieee80211/phy0/ath10k/spectral_scan_ctl

The crash looked like this:

  [  168.031989] BUG: unable to handle kernel NULL pointer dereference at           (null)
  [  168.037406] IP: [<          (null)>]           (null)
  [  168.040395] PGD cdd4067 PUD fa0f067 PMD 0
  [  168.043303] Oops: 0010 [#1] SMP
  [  168.045377] Modules linked in: ath10k_pci(O) ath10k_core(O) ath mac80211 cfg80211 [last unloaded: cfg80211]
  [  168.051560] CPU: 1 PID: 1380 Comm: bash Tainted: G        W  O    4.8.0 #78
  [  168.054336] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
  [  168.059183] task: ffff88000c460c00 task.stack: ffff88000d4bc000
  [  168.061736] RIP: 0010:[<0000000000000000>]  [<          (null)>]           (null)
  ...
  [  168.100620] Call Trace:
  [  168.101910]  [<ffffffffa03b9566>] ? ath10k_spectral_scan_config+0x96/0x200 [ath10k_core]
  [  168.104871]  [<ffffffff811386e2>] ? filemap_fault+0xb2/0x4a0
  [  168.106696]  [<ffffffffa03b97e6>] write_file_spec_scan_ctl+0x116/0x280 [ath10k_core]
  [  168.109618]  [<ffffffff812da3a1>] full_proxy_write+0x51/0x80
  [  168.111443]  [<ffffffff811957b8>] __vfs_write+0x28/0x120
  [  168.113090]  [<ffffffff812f1a2d>] ? security_file_permission+0x3d/0xc0
  [  168.114932]  [<ffffffff8109b912>] ? percpu_down_read+0x12/0x60
  [  168.116680]  [<ffffffff811965f8>] vfs_write+0xb8/0x1a0
  [  168.118293]  [<ffffffff81197966>] SyS_write+0x46/0xa0
  [  168.119912]  [<ffffffff818f2972>] entry_SYSCALL_64_fastpath+0x1a/0xa4
  [  168.121737] Code:  Bad RIP value.
  [  168.123318] RIP  [<          (null)>]           (null)
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

59153e65

isdn/i4l: fix buffer overflow · 7b3a6673

Annie Cherkaev authored Jul 15, 2017

commit 9f5af546 upstream.

This fixes a potential buffer overflow in isdn_net.c caused by an
unbounded strcpy.

[ ISDN seems to be effectively unmaintained, and the I4L driver in
  particular is long deprecated, but in case somebody uses this..
    - Linus ]
Signed-off-by: Jiten Thakkar <jitenmt@gmail.com>
Signed-off-by: Annie Cherkaev <annie.cherk@gmail.com>
Cc: Karsten Keil <isdn@linux-pingi.de>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

7b3a6673

isdn: Fix a sleep-in-atomic bug · b7568624

Jia-Ju Bai authored May 31, 2017

commit e8f4ae85 upstream.

The driver may sleep under a spin lock, the function call path is:
isdn_ppp_mp_receive (acquire the lock)
  isdn_ppp_mp_reassembly
    isdn_ppp_push_higher
      isdn_ppp_decompress
        isdn_ppp_ccp_reset_trans
          isdn_ppp_ccp_reset_alloc_state
            kzalloc(GFP_KERNEL) --> may sleep

To fixed it, the "GFP_KERNEL" is replaced with "GFP_ATOMIC".
Signed-off-by: Jia-Ju Bai <baijiaju1990@163.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

b7568624

net: phy: Do not perform software reset for Generic PHY · 6c78197e

Florian Fainelli authored Mar 05, 2017

commit 0878fff1 upstream.

The Generic PHY driver is a catch-all PHY driver and it should preserve
whatever prior initialization has been done by boot loader or firmware
agents. For specific PHY device configuration it is expected that a
specialized PHY driver would take over that role.

Resetting the generic PHY was a bad idea that has lead to several
complaints and downstream workarounds e.g: in OpenWrt/LEDE so restore
the behavior prior to 87aa9f9c ("net: phy: consolidate PHY
reset in phy_init_hw()").
Reported-by: Felix Fietkau <nbd@nbd.name>
Fixes: 87aa9f9c ("net: phy: consolidate PHY reset in phy_init_hw()")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

6c78197e

nfc: fdp: fix NULL pointer dereference · 57154f03

Sudip Mukherjee authored Dec 20, 2016

commit b6355fb3 upstream.

We are checking phy after dereferencing it. We can print the debug
information after checking it. If phy is NULL then we will get a good
stack trace to tell us that we are in this irq handler.
Signed-off-by: Sudip Mukherjee <sudip.mukherjee@codethink.co.uk>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

57154f03

nfc: Fix hangup of RC-S380* in port100_send_ack() · 35bdf9a6

OGAWA Hirofumi authored Feb 04, 2017

commit 24971281 upstream.

If port100_send_ack() was called twice or more, it has race to hangup.

  port100_send_ack()          port100_send_ack()
    init_completion()
    [...]
    dev->cmd_cancel = true
                                /* this removes previous from completion */
                                init_completion()
				[...]
                                dev->cmd_cancel = true
                                wait_for_completion()
    /* never be waked up */
    wait_for_completion()

Like above race, this code is not assuming port100_send_ack() is
called twice or more.

To fix, this checks dev->cmd_cancel to know if prior cancel is
in-flight or not. And never be remove prior task from completion by
using reinit_completion(), so this guarantees to be waked up properly
soon or later.
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

35bdf9a6

smp/hotplug: Replace BUG_ON and react useful · 6b3d13fe

Thomas Gleixner authored Jul 11, 2017

commit dea1d0f5 upstream.

The move of the unpark functions to the control thread moved the BUG_ON()
there as well. While it made some sense in the idle thread of the upcoming
CPU, it's bogus to crash the control thread on the already online CPU,
especially as the function has a return value and the callsite is prepared
to handle an error return.

Replace it with a WARN_ON_ONCE() and return a proper error code.

Fixes: 9cd4f1a4 ("smp/hotplug: Move unparking of percpu threads to the control CPU")
Rightfully-ranted-at-by: Linux Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

6b3d13fe

smp/hotplug: Move unparking of percpu threads to the control CPU · 7b4e4b18

Thomas Gleixner authored Jul 04, 2017

commit 9cd4f1a4 upstream.

Vikram reported the following backtrace:

   BUG: scheduling while atomic: swapper/7/0/0x00000002
   CPU: 7 PID: 0 Comm: swapper/7 Not tainted 4.9.32-perf+ #680
   schedule
   schedule_hrtimeout_range_clock
   schedule_hrtimeout
   wait_task_inactive
   __kthread_bind_mask
   __kthread_bind
   __kthread_unpark
   kthread_unpark
   cpuhp_online_idle
   cpu_startup_entry
   secondary_start_kernel

He analyzed correctly that a parked cpu hotplug thread of an offlined CPU
was still on the runqueue when the CPU came back online and tried to unpark
it. This causes the thread which invoked kthread_unpark() to call
wait_task_inactive() and subsequently schedule() with preemption disabled.
His proposed workaround was to "make sure" that a parked thread has
scheduled out when the CPU goes offline, so the situation cannot happen.

But that's still wrong because the root cause is not the fact that the
percpu thread is still on the runqueue and neither that preemption is
disabled, which could be simply solved by enabling preemption before
calling kthread_unpark().

The real issue is that the calling thread is the idle task of the upcoming
CPU, which is not supposed to call anything which might sleep.  The moron,
who wrote that code, missed completely that kthread_unpark() might end up
in schedule().

The solution is simpler than expected. The thread which controls the
hotplug operation is waiting for the CPU to call complete() on the hotplug
state completion. So the idle task of the upcoming CPU can set its state to
CPUHP_AP_ONLINE_IDLE and invoke complete(). This in turn wakes the control
task on a different CPU, which then can safely do the unpark and kick the
now unparked hotplug thread of the upcoming CPU to complete the bringup to
the final target state.

Control CPU                     AP

bringup_cpu();
  __cpu_up()  ------------>
				bringup_ap();
  bringup_wait_for_ap()
    wait_for_completion();
                                cpuhp_online_idle();
                <------------    complete();
    unpark(AP->stopper);
    unpark(AP->hotplugthread);
                                while(1)
                                  do_idle();
    kick(AP->hotplugthread);
    wait_for_completion();	hotplug_thread()
				  run_online_callbacks();
				  complete();

Fixes: 8df3e07e ("cpu/hotplug: Let upcoming cpu bring itself fully up")
Reported-by: Vikram Mulukutla <markivx@codeaurora.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Sewior <bigeasy@linutronix.de>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Tejun Heo <tj@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1707042218020.2131@nanosSigned-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

7b4e4b18

drm: rcar-du: Simplify and fix probe error handling · 755f6550

Laurent Pinchart authored Oct 19, 2016

commit 4f7b0d26 upstream.

It isn't safe to call drm_dev_unregister() without first initializing
mode setting with drm_mode_config_init(). This leads to a crash if
either IO memory can't be remapped or vblank initialization fails.

Fix this by reordering the initialization sequence. Move vblank
initialization after the drm_mode_config_init() call, and move IO
remapping before drm_dev_alloc() to avoid the need to perform clean up
in case of failure.

While at it remove the explicit drm_vblank_cleanup() call from
rcar_du_remove() as the drm_dev_unregister() function already cleans up
vblank.
Signed-off-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
Signed-off-by: thongsyho <thong.ho.px@rvc.renesas.com>
Signed-off-by: Nhan Nguyen <nhan.nguyen.yb@renesas.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

755f6550

Staging: comedi: comedi_fops: Avoid orphaned proc entry · 9bf0d78b

Cheah Kok Cheong authored Dec 30, 2016

commit bf279ece upstream.

Move comedi_proc_init to the end to avoid orphaned proc entry
if module loading failed.
Signed-off-by: Cheah Kok Cheong <thrust73@gmail.com>
Reviewed-by: Ian Abbott <abbotti@mev.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

9bf0d78b

Revert "powerpc/numa: Fix percpu allocations to be NUMA aware" · 0f316915

Greg Kroah-Hartman authored Aug 03, 2017

This reverts commit b4624ff9 which is
commit ba4a648f upstream.

Michal Hocko writes:

JFYI. We have encountered a regression after applying this patch on a
large ppc machine. While the patch is the right thing to do it doesn't
work well with the current vmalloc area size on ppc and large machines
where NUMA nodes are very far from each other. Just for the reference
the boot fails on such a machine with bunch of warning preceeding it.
See http://lkml.kernel.org/r/20170724134240.GL25221@dhcp22.suse.cz

It seems the right thing to do is to enlarge the vmalloc space on ppc
but this is not the case in the upstream kernel yet AFAIK. It is also
questionable whether that is a stable material but I will decision on
you here.

We have reverted this patch from our 4.4 based kernel.

Newer kernels do not have enlarged vmalloc space yet AFAIK so they won't
work properly eiter. This bug is quite rare though because you need a
specific HW configuration to trigger the issue - namely NUMA nodes have
to be far away from each other in the physical memory space.

Cc: Michal Hocko <mhocko@kernel.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

0f316915

KVM: PPC: Book3S HV: Save/restore host values of debug registers · c39c3aeb

Paul Mackerras authored Jun 16, 2017

commit 7ceaa6dc upstream.

At present, HV KVM on POWER8 and POWER9 machines loses any instruction
or data breakpoint set in the host whenever a guest is run.
Instruction breakpoints are currently only used by xmon, but ptrace
and the perf_event subsystem can set data breakpoints as well as xmon.

To fix this, we save the host values of the debug registers (CIABR,
DAWR and DAWRX) before entering the guest and restore them on exit.
To provide space to save them in the stack frame, we expand the stack
frame allocated by kvmppc_hv_entry() from 112 to 144 bytes.

[paulus@ozlabs.org - Adjusted stack offsets since we aren't saving
 POWER9-specific registers.]

Fixes: b005255e ("KVM: PPC: Book3S HV: Context-switch new POWER8 SPRs", 2014-01-08)
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

c39c3aeb

KVM: PPC: Book3S HV: Restore critical SPRs to host values on guest exit · e5cd34d1

Paul Mackerras authored Jun 15, 2017

commit 4c3bb4cc upstream.

This restores several special-purpose registers (SPRs) to sane values
on guest exit that were missed before.

TAR and VRSAVE are readable and writable by userspace, and we need to
save and restore them to prevent the guest from potentially affecting
userspace execution (not that TAR or VRSAVE are used by any known
program that run uses the KVM_RUN ioctl).  We save/restore these
in kvmppc_vcpu_run_hv() rather than on every guest entry/exit.

FSCR affects userspace execution in that it can prohibit access to
certain facilities by userspace.  We restore it to the normal value
for the task on exit from the KVM_RUN ioctl.

IAMR is normally 0, and is restored to 0 on guest exit.  However,
with a radix host on POWER9, it is set to a value that prevents the
kernel from executing user-accessible memory.  On POWER9, we save
IAMR on guest entry and restore it on guest exit to the saved value
rather than 0.  On POWER8 we continue to set it to 0 on guest exit.

PSPB is normally 0.  We restore it to 0 on guest exit to prevent
userspace taking advantage of the guest having set it non-zero
(which would allow userspace to set its SMT priority to high).

UAMOR is normally 0.  We restore it to 0 on guest exit to prevent
the AMR from being used as a covert channel between userspace
processes, since the AMR is not context-switched at present.

[paulus@ozlabs.org - removed IAMR bits that are only needed on POWER9]

Fixes: b005255e ("KVM: PPC: Book3S HV: Context-switch new POWER8 SPRs", 2014-01-08)
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

e5cd34d1

drm/nouveau/bar/gf100: fix access to upper half of BAR2 · ae8faca6

Ben Skeggs authored Jul 25, 2017

commit 38bcb208 upstream.

Bit 30 being set causes the upper half of BAR2 to stay in physical mode,
mapped over the end of VRAM, even when the rest of the BAR has been set
to virtual mode.

We inherited our initial value from RM, but I'm not aware of any reason
we need to keep it that way.

This fixes severe GPU hang/lockup issues revealed by Wayland on F26.

Shout-out to NVIDIA for the quick response with the potential cause!
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ae8faca6

drm/nouveau/disp/nv50-: bump max chans to 21 · 34da5f74

Ilia Mirkin authored Jun 28, 2017

commit a90e049c upstream.

GP102's cursors go from chan 17..20. Increase the array size to hold
their data properly.

Fixes: e50fcff1 ("drm/nouveau/disp/gp102: fix cursor/overlay immediate channel indices")
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

34da5f74

drm/vmwgfx: Fix gcc-7.1.1 warning · e4177988

Sinclair Yeh authored Jul 17, 2017

commit fcfffdd8 upstream.

The current code does not look correct, and the reason for it is
probably lost.  Since this now generates a compiler warning,
fix it to what makes sense.
Reported-by: Arnd Bergmann <arnd@arndb.de>
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sinclair Yeh <syeh@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

e4177988

md/raid5: add thread_group worker async_tx_issue_pending_all · fabc7dff

Ofer Heifetz authored Jul 24, 2017

commit 7e96d559 upstream.

Since thread_group worker and raid5d kthread are not in sync, if
worker writes stripe before raid5d then requests will be waiting
for issue_pendig.

Issue observed when building raid5 with ext4, in some build runs
jbd2 would get hung and requests were waiting in the HW engine
waiting to be issued.

Fix this by adding a call to async_tx_issue_pending_all in the
raid5_do_work.
Signed-off-by: Ofer Heifetz <oferh@marvell.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

fabc7dff

KVM: PPC: Book3S HV: Enable TM before accessing TM registers · d745f0f6

Paul Mackerras authored Jul 21, 2017

commit e4705715 upstream.

Commit 46a704f8 ("KVM: PPC: Book3S HV: Preserve userspace HTM state
properly", 2017-06-15) added code to read transactional memory (TM)
registers but forgot to enable TM before doing so.  The result is
that if userspace does have live values in the TM registers, a KVM_RUN
ioctl will cause a host kernel crash like this:

[  181.328511] Unrecoverable TM Unavailable Exception f60 at d00000001e7d9980
[  181.328605] Oops: Unrecoverable TM Unavailable Exception, sig: 6 [#1]
[  181.328613] SMP NR_CPUS=2048
[  181.328613] NUMA
[  181.328618] PowerNV
[  181.328646] Modules linked in: vhost_net vhost tap nfs_layout_nfsv41_files rpcsec_gss_krb5 nfsv4 dns_resolver nfs
+fscache xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat
+nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 tun ebtable_filter ebtables
+ip6table_filter ip6_tables iptable_filter bridge stp llc kvm_hv kvm nfsd ses enclosure scsi_transport_sas ghash_generic
+auth_rpcgss gf128mul xts sg ctr nfs_acl lockd vmx_crypto shpchp ipmi_powernv i2c_opal grace ipmi_devintf i2c_core
+powernv_rng sunrpc ipmi_msghandler ibmpowernv uio_pdrv_genirq uio leds_powernv powernv_op_panel ip_tables xfs sd_mod
+lpfc ipr bnx2x libata mdio ptp pps_core scsi_transport_fc libcrc32c dm_mirror dm_region_hash dm_log dm_mod
[  181.329278] CPU: 40 PID: 9926 Comm: CPU 0/KVM Not tainted 4.12.0+ #1
[  181.329337] task: c000003fc6980000 task.stack: c000003fe4d80000
[  181.329396] NIP: d00000001e7d9980 LR: d00000001e77381c CTR: d00000001e7d98f0
[  181.329465] REGS: c000003fe4d837e0 TRAP: 0f60   Not tainted  (4.12.0+)
[  181.329523] MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE>
[  181.329527]   CR: 24022448  XER: 00000000
[  181.329608] CFAR: d00000001e773818 SOFTE: 1
[  181.329608] GPR00: d00000001e77381c c000003fe4d83a60 d00000001e7ef410 c000003fdcfe0000
[  181.329608] GPR04: c000003fe4f00000 0000000000000000 0000000000000000 c000003fd7954800
[  181.329608] GPR08: 0000000000000001 c000003fc6980000 0000000000000000 d00000001e7e2880
[  181.329608] GPR12: d00000001e7d98f0 c000000007b19000 00000001295220e0 00007fffc0ce2090
[  181.329608] GPR16: 0000010011886608 00007fff8c89f260 0000000000000001 00007fff8c080028
[  181.329608] GPR20: 0000000000000000 00000100118500a6 0000010011850000 0000010011850000
[  181.329608] GPR24: 00007fffc0ce1b48 0000010011850000 00000000d673b901 0000000000000000
[  181.329608] GPR28: 0000000000000000 c000003fdcfe0000 c000003fdcfe0000 c000003fe4f00000
[  181.330199] NIP [d00000001e7d9980] kvmppc_vcpu_run_hv+0x90/0x6b0 [kvm_hv]
[  181.330264] LR [d00000001e77381c] kvmppc_vcpu_run+0x2c/0x40 [kvm]
[  181.330322] Call Trace:
[  181.330351] [c000003fe4d83a60] [d00000001e773478] kvmppc_set_one_reg+0x48/0x340 [kvm] (unreliable)
[  181.330437] [c000003fe4d83b30] [d00000001e77381c] kvmppc_vcpu_run+0x2c/0x40 [kvm]
[  181.330513] [c000003fe4d83b50] [d00000001e7700b4] kvm_arch_vcpu_ioctl_run+0x114/0x2a0 [kvm]
[  181.330586] [c000003fe4d83bd0] [d00000001e7642f8] kvm_vcpu_ioctl+0x598/0x7a0 [kvm]
[  181.330658] [c000003fe4d83d40] [c0000000003451b8] do_vfs_ioctl+0xc8/0x8b0
[  181.330717] [c000003fe4d83de0] [c000000000345a64] SyS_ioctl+0xc4/0x120
[  181.330776] [c000003fe4d83e30] [c00000000000b004] system_call+0x58/0x6c
[  181.330833] Instruction dump:
[  181.330869] e92d0260 e9290b50 e9290108 792807e3 41820058 e92d0260 e9290b50 e9290108
[  181.330941] 792ae8a4 794a1f87 408204f4 e92d0260 <7d4022a6> f9490ff0 e92d0260 7d4122a6
[  181.331013] ---[ end trace 6f6ddeb4bfe92a92 ]---

The fix is just to turn on the TM bit in the MSR before accessing the
registers.

Fixes: 46a704f8 ("KVM: PPC: Book3S HV: Preserve userspace HTM state properly")
Reported-by: Jan Stancek <jstancek@redhat.com>
Tested-by: Jan Stancek <jstancek@redhat.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

d745f0f6

crypto: authencesn - Fix digest_null crash · 9eb088e5

Herbert Xu authored Jul 17, 2017

commit 41cdf7a4 upstream.

When authencesn is used together with digest_null a crash will
occur on the decrypt path.  This is because normally we perform
a special setup to preserve the ESN, but this is skipped if there
is no authentication.  However, on the post-authentication path
it always expects the preservation to be in place, thus causing
a crash when digest_null is used.

This patch fixes this by also skipping the post-processing when
there is no authentication.

Fixes: 104880a6 ("crypto: authencesn - Convert to new AEAD...")
Reported-by: Jan Tluka <jtluka@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

9eb088e5

NFSv4.1: Fix a race where CB_NOTIFY_LOCK fails to wake a waiter · 7d2a3548

Benjamin Coddington authored Jul 28, 2017

commit b7dbcc0e upstream.

nfs4_retry_setlk() sets the task's state to TASK_INTERRUPTIBLE within the
same region protected by the wait_queue's lock after checking for a
notification from CB_NOTIFY_LOCK callback.  However, after releasing that
lock, a wakeup for that task may race in before the call to
freezable_schedule_timeout_interruptible() and set TASK_WAKING, then
freezable_schedule_timeout_interruptible() will set the state back to
TASK_INTERRUPTIBLE before the task will sleep.  The result is that the task
will sleep for the entire duration of the timeout.

Since we've already set TASK_INTERRUPTIBLE in the locked section, just use
freezable_schedule_timout() instead.

Fixes: a1d617d8 ("nfs: allow blocking locks to be awoken by lock callbacks")
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

7d2a3548

NFS: invalidate file size when taking a lock. · b087b8b1

NeilBrown authored Jul 24, 2017

commit 442ce049 upstream.

Prior to commit ca0daa27 ("NFS: Cache aggressively when file is open
for writing"), NFS would revalidate, or invalidate, the file size when
taking a lock.  Since that commit it only invalidates the file content.

If the file size is changed on the server while wait for the lock, the
client will have an incorrect understanding of the file size and could
corrupt data.  This particularly happens when writing beyond the
(supposed) end of file and can be easily be demonstrated with
posix_fallocate().

If an application opens an empty file, waits for a write lock, and then
calls posix_fallocate(), glibc will determine that the underlying
filesystem doesn't support fallocate (assuming version 4.1 or earlier)
and will write out a '0' byte at the end of each 4K page in the region
being fallocated that is after the end of the file.
NFS will (usually) detect that these writes are beyond EOF and will
expand them to cover the whole page, and then will merge the pages.
Consequently, NFS will write out large blocks of zeroes beyond where it
thought EOF was.  If EOF had moved, the pre-existing part of the file
will be over-written.  Locking should have protected against this,
but it doesn't.

This patch restores the use of nfs_zap_caches() which invalidated the
cached attributes.  When posix_fallocate() asks for the file size, the
request will go to the server and get a correct answer.

Fixes: ca0daa27 ("NFS: Cache aggressively when file is open for writing")
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

b087b8b1

powerpc/pseries: Fix of_node_put() underflow during reconfig remove · 6d3d93ca

Laurent Vivier authored Jul 21, 2017

commit 4fd1bd44 upstream.

As for commit 68baf692 ("powerpc/pseries: Fix of_node_put()
underflow during DLPAR remove"), the call to of_node_put() must be
removed from pSeries_reconfig_remove_node().

dlpar_detach_node() and pSeries_reconfig_remove_node() both call
of_detach_node(), and thus the node should not be released in both
cases.

Fixes: 0829f6d1 ("of: device_node kobject lifecycle fixes")
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

6d3d93ca

parisc: Suspend lockup detectors before system halt · fa2aa76e

Helge Deller authored Jul 25, 2017

commit 56188832 upstream.

Some machines can't power off the machine, so disable the lockup detectors to
avoid this watchdog BUG to show up every few seconds:
watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [systemd-shutdow:1]
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

fa2aa76e

parisc: Extend disabled preemption in copy_user_page · f0d23fa6

John David Anglin authored Jul 25, 2017

commit 56008c04 upstream.

It's always bothered me that we only disable preemption in
copy_user_page around the call to flush_dcache_page_asm.
This patch extends this to after the copy.
Signed-off-by: John David Anglin <dave.anglin@bell.net>
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

f0d23fa6

parisc: Prevent TLB speculation on flushed pages on CPUs that only support equivalent aliases · afe9fc01

John David Anglin authored Jul 25, 2017

commit ae7a609c upstream.

Helge noticed that we flush the TLB page in flush_cache_page but not in
flush_cache_range or flush_cache_mm.

For a long time, we have had random segmentation faults building
packages on machines with PA8800/8900 processors.  These machines only
support equivalent aliases.  We don't see these faults on machines that
don't require strict coherency.  So, it appears TLB speculation
sometimes leads to cache corruption on machines that require coherency.

This patch adds TLB flushes to flush_cache_range and flush_cache_mm when
coherency is required.  We only flush the TLB in flush_cache_page when
coherency is required.

The patch also optimizes flush_cache_range.  It turns out we always have
the right context to use flush_user_dcache_range_asm and
flush_user_icache_range_asm.

The patch has been tested for some time on rp3440, rp3410 and A500-44.
It's been boot tested on c8000.  No random segmentation faults were
observed during testing.
Signed-off-by: John David Anglin <dave.anglin@bell.net>
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

afe9fc01

ALSA: hda - Add missing NVIDIA GPU codec IDs to patch table · 5f8bdd5e

Daniel Dadap authored Jul 13, 2017

commit 74ec1181 upstream.

Add codec IDs for several recently released, pending, and historical
NVIDIA GPU audio controllers to the patch table, to allow the correct
patch functions to be selected for them.
Signed-off-by: Daniel Dadap <ddadap@nvidia.com>
Reviewed-by: Andy Ritger <aritger@nvidia.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

5f8bdd5e

ALSA: fm801: Initialize chip after IRQ handler is registered · 3d955095

Andy Shevchenko authored Jul 16, 2017

commit 610e1ae9 upstream.

The commit b56fa687 ("ALSA: fm801: detect FM-only card earlier")
rearranged initialization calls, i.e. it makes snd_fm801_chip_init() to
be called before we register interrupt handler and set PCI bus
mastering.

Somehow it prevents FM801-AU to work properly. Thus, partially revert
initialization order changed by commit mentioned above.

Fixes: b56fa687 ("ALSA: fm801: detect FM-only card earlier")
Reported-by: Émeric MASCHINO <emeric.maschino@gmail.com>
Tested-by: Émeric MASCHINO <emeric.maschino@gmail.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

3d955095

jfs: Don't clear SGID when inheriting ACLs · 3a79e1c8

Jan Kara authored Jun 22, 2017

commit 9bcf66c7 upstream.

When new directory 'DIR1' is created in a directory 'DIR0' with SGID bit
set, DIR1 is expected to have SGID bit set (and owning group equal to
the owning group of 'DIR0'). However when 'DIR0' also has some default
ACLs that 'DIR1' inherits, setting these ACLs will result in SGID bit on
'DIR1' to get cleared if user is not member of the owning group.

Fix the problem by moving posix_acl_update_mode() out of
__jfs_set_acl() into jfs_set_acl(). That way the function will not be
called when inheriting ACLs which is what we want as it prevents SGID
bit clearing and the mode has been properly set by posix_acl_create()
anyway.

Fixes: 07393101
CC: jfs-discussion@lists.sourceforge.net
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

3a79e1c8

net: reduce skb_warn_bad_offload() noise · 26d01aa8

Eric Dumazet authored Jan 31, 2017

commit b2504a5d upstream.

Dmitry reported warnings occurring in __skb_gso_segment() [1]

All SKB_GSO_DODGY producers can allow user space to feed
packets that trigger the current check.

We could prevent them from doing so, rejecting packets, but
this might add regressions to existing programs.

It turns out our SKB_GSO_DODGY handlers properly set up checksum
information that is needed anyway when packets needs to be segmented.

By checking again skb_needs_check() after skb_mac_gso_segment(),
we should remove these pesky warnings, at a very minor cost.

With help from Willem de Bruijn

[1]
WARNING: CPU: 1 PID: 6768 at net/core/dev.c:2439 skb_warn_bad_offload+0x2af/0x390 net/core/dev.c:2434
lo: caps=(0x000000a2803b7c69, 0x0000000000000000) len=138 data_len=0 gso_size=15883 gso_type=4 ip_summed=0
Kernel panic - not syncing: panic_on_warn set ...

CPU: 1 PID: 6768 Comm: syz-executor1 Not tainted 4.9.0 #5
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
 ffff8801c063ecd8 ffffffff82346bdf ffffffff00000001 1ffff100380c7d2e
 ffffed00380c7d26 0000000041b58ab3 ffffffff84b37e38 ffffffff823468f1
 ffffffff84820740 ffffffff84f289c0 dffffc0000000000 ffff8801c063ee20
Call Trace:
 [<ffffffff82346bdf>] __dump_stack lib/dump_stack.c:15 [inline]
 [<ffffffff82346bdf>] dump_stack+0x2ee/0x3ef lib/dump_stack.c:51
 [<ffffffff81827e34>] panic+0x1fb/0x412 kernel/panic.c:179
 [<ffffffff8141f704>] __warn+0x1c4/0x1e0 kernel/panic.c:542
 [<ffffffff8141f7e5>] warn_slowpath_fmt+0xc5/0x100 kernel/panic.c:565
 [<ffffffff8356cbaf>] skb_warn_bad_offload+0x2af/0x390 net/core/dev.c:2434
 [<ffffffff83585cd2>] __skb_gso_segment+0x482/0x780 net/core/dev.c:2706
 [<ffffffff83586f19>] skb_gso_segment include/linux/netdevice.h:3985 [inline]
 [<ffffffff83586f19>] validate_xmit_skb+0x5c9/0xc20 net/core/dev.c:2969
 [<ffffffff835892bb>] __dev_queue_xmit+0xe6b/0x1e70 net/core/dev.c:3383
 [<ffffffff8358a2d7>] dev_queue_xmit+0x17/0x20 net/core/dev.c:3424
 [<ffffffff83ad161d>] packet_snd net/packet/af_packet.c:2930 [inline]
 [<ffffffff83ad161d>] packet_sendmsg+0x32ed/0x4d30 net/packet/af_packet.c:2955
 [<ffffffff834f0aaa>] sock_sendmsg_nosec net/socket.c:621 [inline]
 [<ffffffff834f0aaa>] sock_sendmsg+0xca/0x110 net/socket.c:631
 [<ffffffff834f329a>] ___sys_sendmsg+0x8fa/0x9f0 net/socket.c:1954
 [<ffffffff834f5e58>] __sys_sendmsg+0x138/0x300 net/socket.c:1988
 [<ffffffff834f604d>] SYSC_sendmsg net/socket.c:1999 [inline]
 [<ffffffff834f604d>] SyS_sendmsg+0x2d/0x50 net/socket.c:1995
 [<ffffffff84371941>] entry_SYSCALL_64_fastpath+0x1f/0xc2
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Dmitry Vyukov  <dvyukov@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Mark Salyzyn <salyzyn@android.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

26d01aa8