1. 07 Jun, 2017 6 commits
    • Steve Rutherford's avatar
      KVM: x86: Introduce segmented_write_std · 27bf4b6e
      Steve Rutherford authored
      commit 129a72a0 upstream.
      
      Introduces segemented_write_std.
      
      Switches from emulated reads/writes to standard read/writes in fxsave,
      fxrstor, sgdt, and sidt.  This fixes CVE-2017-2584, a longstanding
      kernel memory leak.
      
      Since commit 283c95d0 ("KVM: x86: emulate FXSAVE and FXRSTOR",
      2016-11-09), which is luckily not yet in any final release, this would
      also be an exploitable kernel memory *write*!
      Reported-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Fixes: 96051572
      Fixes: 283c95d0Suggested-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarSteve Rutherford <srutherford@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      27bf4b6e
    • Paolo Bonzini's avatar
      KVM: x86: fix emulation of "MOV SS, null selector" · 2dd7d7e4
      Paolo Bonzini authored
      commit 33ab9110 upstream.
      
      This is CVE-2017-2583.  On Intel this causes a failed vmentry because
      SS's type is neither 3 nor 7 (even though the manual says this check is
      only done for usable SS, and the dmesg splat says that SS is unusable!).
      On AMD it's worse: svm.c is confused and sets CPL to 0 in the vmcb.
      
      The fix fabricates a data segment descriptor when SS is set to a null
      selector, so that CPL and SS.DPL are set correctly in the VMCS/vmcb.
      Furthermore, only allow setting SS to a NULL selector if SS.RPL < 3;
      this in turn ensures CPL < 3 because RPL must be equal to CPL.
      
      Thanks to Andy Lutomirski and Willy Tarreau for help in analyzing
      the bug and deciphering the manuals.
      
      [js] backport to 3.12
      Reported-by: default avatarXiaohan Zhang <zhangxiaohan1@huawei.com>
      Fixes: 79d5b4c3Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      2dd7d7e4
    • Ilya Dryomov's avatar
      libceph: don't set weight to IN when OSD is destroyed · dd7d4be1
      Ilya Dryomov authored
      commit b581a585 upstream.
      
      Since ceph.git commit 4e28f9e63644 ("osd/OSDMap: clear osd_info,
      osd_xinfo on osd deletion"), weight is set to IN when OSD is deleted.
      This changes the result of applying an incremental for clients, not
      just OSDs.  Because CRUSH computations are obviously affected,
      pre-4e28f9e63644 servers disagree with post-4e28f9e63644 clients on
      object placement, resulting in misdirected requests.
      
      Mirrors ceph.git commit a6009d1039a55e2c77f431662b3d6cc5a8e8e63f.
      
      Fixes: 930c5328 ("libceph: apply new_state before new_up_client on incrementals")
      Link: http://tracker.ceph.com/issues/19122Signed-off-by: default avatarIlya Dryomov <idryomov@gmail.com>
      Reviewed-by: default avatarSage Weil <sage@redhat.com>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      dd7d4be1
    • Ryan Ware's avatar
      EVM: Use crypto_memneq() for digest comparisons · 37510515
      Ryan Ware authored
      commit 613317bd upstream.
      
      This patch fixes vulnerability CVE-2016-2085.  The problem exists
      because the vm_verify_hmac() function includes a use of memcmp().
      Unfortunately, this allows timing side channel attacks; specifically
      a MAC forgery complexity drop from 2^128 to 2^12.  This patch changes
      the memcmp() to the cryptographically safe crypto_memneq().
      Reported-by: default avatarXiaofei Rex Guo <xiaofei.rex.guo@intel.com>
      Signed-off-by: default avatarRyan Ware <ware@linux.intel.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarMimi Zohar <zohar@linux.vnet.ibm.com>
      Signed-off-by: default avatarJames Morris <james.l.morris@oracle.com>
      Cc: Jason A. Donenfeld <Jason@zx2c4.com>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      37510515
    • James Yonan's avatar
      crypto: crypto_memneq - add equality testing of memory regions w/o timing leaks · 5fd53819
      James Yonan authored
      commit 6bf37e5a upstream.
      
      When comparing MAC hashes, AEAD authentication tags, or other hash
      values in the context of authentication or integrity checking, it
      is important not to leak timing information to a potential attacker,
      i.e. when communication happens over a network.
      
      Bytewise memory comparisons (such as memcmp) are usually optimized so
      that they return a nonzero value as soon as a mismatch is found. E.g,
      on x86_64/i5 for 512 bytes this can be ~50 cyc for a full mismatch
      and up to ~850 cyc for a full match (cold). This early-return behavior
      can leak timing information as a side channel, allowing an attacker to
      iteratively guess the correct result.
      
      This patch adds a new method crypto_memneq ("memory not equal to each
      other") to the crypto API that compares memory areas of the same length
      in roughly "constant time" (cache misses could change the timing, but
      since they don't reveal information about the content of the strings
      being compared, they are effectively benign). Iow, best and worst case
      behaviour take the same amount of time to complete (in contrast to
      memcmp).
      
      Note that crypto_memneq (unlike memcmp) can only be used to test for
      equality or inequality, NOT for lexicographical order. This, however,
      is not an issue for its use-cases within the crypto API.
      
      We tried to locate all of the places in the crypto API where memcmp was
      being used for authentication or integrity checking, and convert them
      over to crypto_memneq.
      
      crypto_memneq is declared noinline, placed in its own source file,
      and compiled with optimizations that might increase code size disabled
      ("Os") because a smart compiler (or LTO) might notice that the return
      value is always compared against zero/nonzero, and might then
      reintroduce the same early-return optimization that we are trying to
      avoid.
      
      Using #pragma or __attribute__ optimization annotations of the code
      for disabling optimization was avoided as it seems to be considered
      broken or unmaintained for long time in GCC [1]. Therefore, we work
      around that by specifying the compile flag for memneq.o directly in
      the Makefile. We found that this seems to be most appropriate.
      
      As we use ("Os"), this patch also provides a loop-free "fast-path" for
      frequently used 16 byte digests. Similarly to kernel library string
      functions, leave an option for future even further optimized architecture
      specific assembler implementations.
      
      This was a joint work of James Yonan and Daniel Borkmann. Also thanks
      for feedback from Florian Weimer on this and earlier proposals [2].
      
        [1] http://gcc.gnu.org/ml/gcc/2012-07/msg00211.html
        [2] https://lkml.org/lkml/2013/2/10/131Signed-off-by: default avatarJames Yonan <james@openvpn.net>
      Signed-off-by: default avatarDaniel Borkmann <dborkman@redhat.com>
      Cc: Florian Weimer <fw@deneb.enyo.de>
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Cc: Jason A. Donenfeld <Jason@zx2c4.com>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      5fd53819
    • Philip Pettersson's avatar
      packet: fix race condition in packet_set_ring · 4bfb6dd7
      Philip Pettersson authored
      commit 84ac7260 upstream.
      
      When packet_set_ring creates a ring buffer it will initialize a
      struct timer_list if the packet version is TPACKET_V3. This value
      can then be raced by a different thread calling setsockopt to
      set the version to TPACKET_V1 before packet_set_ring has finished.
      
      This leads to a use-after-free on a function pointer in the
      struct timer_list when the socket is closed as the previously
      initialized timer will not be deleted.
      
      The bug is fixed by taking lock_sock(sk) in packet_setsockopt when
      changing the packet version while also taking the lock at the start
      of packet_set_ring.
      
      Fixes: f6fb8f10 ("af-packet: TPACKET_V3 flexible buffer implementation.")
      Signed-off-by: default avatarPhilip Pettersson <philip.pettersson@gmail.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Cc: gregkh@linuxfoundation.org
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      4bfb6dd7
  2. 10 Feb, 2017 34 commits
    • Willy Tarreau's avatar
      Linux 3.10.105 · ec55e7c2
      Willy Tarreau authored
      ec55e7c2
    • Guenter Roeck's avatar
      metag: Only define atomic_dec_if_positive conditionally · 63d7f335
      Guenter Roeck authored
      commit 35d04077 upstream.
      
      The definition of atomic_dec_if_positive() assumes that
      atomic_sub_if_positive() exists, which is only the case if
      metag specific atomics are used. This results in the following
      build error when trying to build metag1_defconfig.
      
      kernel/ucount.c: In function 'dec_ucount':
      kernel/ucount.c:211: error:
      	implicit declaration of function 'atomic_sub_if_positive'
      
      Moving the definition of atomic_dec_if_positive() into the metag
      conditional code fixes the problem.
      
      Fixes: 6006c0d8 ("metag: Atomics, locks and bitops")
      Signed-off-by: default avatarGuenter Roeck <linux@roeck-us.net>
      Signed-off-by: default avatarJames Hogan <james.hogan@imgtec.com>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      63d7f335
    • Max Staudt's avatar
      fbdev/efifb: Fix 16 color palette entry calculation · 16654a56
      Max Staudt authored
      commit d50b3f43 upstream.
      
      When using efifb with a 16-bit (5:6:5) visual, fbcon's text is rendered
      in the wrong colors - e.g. text gray (#aaaaaa) is rendered as green
      (#50bc50) and neighboring pixels have slightly different values
      (such as #50bc78).
      
      The reason is that fbcon loads its 16 color palette through
      efifb_setcolreg(), which in turn calculates a 32-bit value to write
      into memory for each palette index.
      Until now, this code could only handle 8-bit visuals and didn't mask
      overlapping values when ORing them.
      
      With this patch, fbcon displays the correct colors when a qemu VM is
      booted in 16-bit mode (in GRUB: "set gfxpayload=800x600x16").
      
      Fixes: 7c83172b ("x86_64 EFI boot support: EFI frame buffer driver")  # v2.6.24+
      Signed-off-by: default avatarMax Staudt <mstaudt@suse.de>
      Acked-By: default avatarPeter Jones <pjones@redhat.com>
      Signed-off-by: default avatarTomi Valkeinen <tomi.valkeinen@ti.com>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      16654a56
    • Bart Van Assche's avatar
      dm: mark request_queue dead before destroying the DM device · fa88cd79
      Bart Van Assche authored
      commit 3b785fbc upstream.
      
      This avoids that new requests are queued while __dm_destroy() is in
      progress.
      
      [js] use md->queue instead of non-present helper
      Signed-off-by: default avatarBart Van Assche <bart.vanassche@sandisk.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      fa88cd79
    • Jan Remmet's avatar
      regulator: tps65910: Work around silicon erratum SWCZ010 · 7fbbad13
      Jan Remmet authored
      commit 8f9165c9 upstream.
      
      http://www.ti.com/lit/pdf/SWCZ010:
        DCDC o/p voltage can go higher than programmed value
      
      Impact:
      VDDI, VDD2, and VIO output programmed voltage level can go higher than
      expected or crash, when coming out of PFM to PWM mode or using DVFS.
      
      Description:
      When DCDC CLK SYNC bits are 11/01:
      * VIO 3-MHz oscillator is the source clock of the digital core and input
        clock of VDD1 and VDD2
      * Turn-on of VDD1 and VDD2 HSD PFETis synchronized or at a constant
        phase shift
      * Current pulled though VCC1+VCC2 is Iload(VDD1) + Iload(VDD2)
      * The 3 HSD PFET will be turned-on at the same time, causing the highest
        possible switching noise on the application. This noise level depends
        on the layout, the VBAT level, and the load current. The noise level
        increases with improper layout.
      
      When DCDC CLK SYNC bits are 00:
      * VIO 3-MHz oscillator is the source clock of digital core
      * VDD1 and VDD2 are running on their own 3-MHz oscillator
      * Current pulled though VCC1+VCC2 average of Iload(VDD1) + Iload(VDD2)
      * The switching noise of the 3 SMPS will be randomly spread over time,
        causing lower overall switching noise.
      
      Workaround:
      Set DCDCCTRL_REG[1:0]= 00.
      Signed-off-by: default avatarJan Remmet <j.remmet@phytec.de>
      Signed-off-by: default avatarMark Brown <broonie@kernel.org>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      7fbbad13
    • Peter Ujfalusi's avatar
      ASoC: omap-mcpdm: Fix irq resource handling · bc90bffd
      Peter Ujfalusi authored
      commit a8719670 upstream.
      
      Fixes: ddd17531 ("ASoC: omap-mcpdm: Clean up with devm_* function")
      
      Managed irq request will not doing any good in ASoC probe level as it is
      not going to free up the irq when the driver is unbound from the sound
      card.
      Signed-off-by: default avatarPeter Ujfalusi <peter.ujfalusi@ti.com>
      Reported-by: default avatarRussell King <linux@armlinux.org.uk>
      Signed-off-by: default avatarMark Brown <broonie@kernel.org>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      bc90bffd
    • Dan Carpenter's avatar
      mfd: 88pm80x: Double shifting bug in suspend/resume · a28d8358
      Dan Carpenter authored
      commit 9a6dc644 upstream.
      
      set_bit() and clear_bit() take the bit number so this code is really
      doing "1 << (1 << irq)" which is a double shift bug.  It's done
      consistently so it won't cause a problem unless "irq" is more than 4.
      
      Fixes: 70c6cce0 ('mfd: Support 88pm80x in 80x driver')
      Signed-off-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: default avatarLee Jones <lee.jones@linaro.org>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      a28d8358
    • Andrey Ryabinin's avatar
      mpi: Fix NULL ptr dereference in mpi_powm() [ver #3] · 940d1175
      Andrey Ryabinin authored
      commit f5527fff upstream.
      
      This fixes CVE-2016-8650.
      
      If mpi_powm() is given a zero exponent, it wants to immediately return
      either 1 or 0, depending on the modulus.  However, if the result was
      initalised with zero limb space, no limbs space is allocated and a
      NULL-pointer exception ensues.
      
      Fix this by allocating a minimal amount of limb space for the result when
      the 0-exponent case when the result is 1 and not touching the limb space
      when the result is 0.
      
      This affects the use of RSA keys and X.509 certificates that carry them.
      
      BUG: unable to handle kernel NULL pointer dereference at           (null)
      IP: [<ffffffff8138ce5d>] mpi_powm+0x32/0x7e6
      PGD 0
      Oops: 0002 [#1] SMP
      Modules linked in:
      CPU: 3 PID: 3014 Comm: keyctl Not tainted 4.9.0-rc6-fscache+ #278
      Hardware name: ASUS All Series/H97-PLUS, BIOS 2306 10/09/2014
      task: ffff8804011944c0 task.stack: ffff880401294000
      RIP: 0010:[<ffffffff8138ce5d>]  [<ffffffff8138ce5d>] mpi_powm+0x32/0x7e6
      RSP: 0018:ffff880401297ad8  EFLAGS: 00010212
      RAX: 0000000000000000 RBX: ffff88040868bec0 RCX: ffff88040868bba0
      RDX: ffff88040868b260 RSI: ffff88040868bec0 RDI: ffff88040868bee0
      RBP: ffff880401297ba8 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000047 R11: ffffffff8183b210 R12: 0000000000000000
      R13: ffff8804087c7600 R14: 000000000000001f R15: ffff880401297c50
      FS:  00007f7a7918c700(0000) GS:ffff88041fb80000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000000000000 CR3: 0000000401250000 CR4: 00000000001406e0
      Stack:
       ffff88040868bec0 0000000000000020 ffff880401297b00 ffffffff81376cd4
       0000000000000100 ffff880401297b10 ffffffff81376d12 ffff880401297b30
       ffffffff81376f37 0000000000000100 0000000000000000 ffff880401297ba8
      Call Trace:
       [<ffffffff81376cd4>] ? __sg_page_iter_next+0x43/0x66
       [<ffffffff81376d12>] ? sg_miter_get_next_page+0x1b/0x5d
       [<ffffffff81376f37>] ? sg_miter_next+0x17/0xbd
       [<ffffffff8138ba3a>] ? mpi_read_raw_from_sgl+0xf2/0x146
       [<ffffffff8132a95c>] rsa_verify+0x9d/0xee
       [<ffffffff8132acca>] ? pkcs1pad_sg_set_buf+0x2e/0xbb
       [<ffffffff8132af40>] pkcs1pad_verify+0xc0/0xe1
       [<ffffffff8133cb5e>] public_key_verify_signature+0x1b0/0x228
       [<ffffffff8133d974>] x509_check_for_self_signed+0xa1/0xc4
       [<ffffffff8133cdde>] x509_cert_parse+0x167/0x1a1
       [<ffffffff8133d609>] x509_key_preparse+0x21/0x1a1
       [<ffffffff8133c3d7>] asymmetric_key_preparse+0x34/0x61
       [<ffffffff812fc9f3>] key_create_or_update+0x145/0x399
       [<ffffffff812fe227>] SyS_add_key+0x154/0x19e
       [<ffffffff81001c2b>] do_syscall_64+0x80/0x191
       [<ffffffff816825e4>] entry_SYSCALL64_slow_path+0x25/0x25
      Code: 56 41 55 41 54 53 48 81 ec a8 00 00 00 44 8b 71 04 8b 42 04 4c 8b 67 18 45 85 f6 89 45 80 0f 84 b4 06 00 00 85 c0 75 2f 41 ff ce <49> c7 04 24 01 00 00 00 b0 01 75 0b 48 8b 41 18 48 83 38 01 0f
      RIP  [<ffffffff8138ce5d>] mpi_powm+0x32/0x7e6
       RSP <ffff880401297ad8>
      CR2: 0000000000000000
      ---[ end trace d82015255d4a5d8d ]---
      
      Basically, this is a backport of a libgcrypt patch:
      
      	http://git.gnupg.org/cgi-bin/gitweb.cgi?p=libgcrypt.git;a=patch;h=6e1adb05d290aeeb1c230c763970695f4a538526
      
      Fixes: cdec9cb5 ("crypto: GnuPG based MPI lib - source files (part 1)")
      Signed-off-by: default avatarAndrey Ryabinin <aryabinin@virtuozzo.com>
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      cc: Dmitry Kasatkin <dmitry.kasatkin@gmail.com>
      cc: linux-ima-devel@lists.sourceforge.net
      Signed-off-by: default avatarJames Morris <james.l.morris@oracle.com>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      940d1175
    • Michael Walle's avatar
      hwmon: (adt7411) set bit 3 in CFG1 register · 2c8c5e67
      Michael Walle authored
      commit b53893aa upstream.
      
      According to the datasheet you should only write 1 to this bit. If it is
      not set, at least AIN3 will return bad values on newer silicon revisions.
      
      Fixes: d84ca5b3 ("hwmon: Add driver for ADT7411 voltage and temperature sensor")
      Signed-off-by: default avatarMichael Walle <michael@walle.cc>
      Signed-off-by: default avatarGuenter Roeck <linux@roeck-us.net>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      2c8c5e67
    • Sergei Miroshnichenko's avatar
      can: dev: fix deadlock reported after bus-off · ebee2e2f
      Sergei Miroshnichenko authored
      commit 9abefcb1 upstream.
      
      A timer was used to restart after the bus-off state, leading to a
      relatively large can_restart() executed in an interrupt context,
      which in turn sets up pinctrl. When this happens during system boot,
      there is a high probability of grabbing the pinctrl_list_mutex,
      which is locked already by the probe() of other device, making the
      kernel suspect a deadlock condition [1].
      
      To resolve this issue, the restart_timer is replaced by a delayed
      work.
      
      [1] https://github.com/victronenergy/venus/issues/24Signed-off-by: default avatarSergei Miroshnichenko <sergeimir@emcraft.com>
      Signed-off-by: default avatarMarc Kleine-Budde <mkl@pengutronix.de>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      ebee2e2f
    • zhong jiang's avatar
      mm,ksm: fix endless looping in allocating memory when ksm enable · 3419dc51
      zhong jiang authored
      commit 5b398e41 upstream.
      
      I hit the following hung task when runing a OOM LTP test case with 4.1
      kernel.
      
      Call trace:
      [<ffffffc000086a88>] __switch_to+0x74/0x8c
      [<ffffffc000a1bae0>] __schedule+0x23c/0x7bc
      [<ffffffc000a1c09c>] schedule+0x3c/0x94
      [<ffffffc000a1eb84>] rwsem_down_write_failed+0x214/0x350
      [<ffffffc000a1e32c>] down_write+0x64/0x80
      [<ffffffc00021f794>] __ksm_exit+0x90/0x19c
      [<ffffffc0000be650>] mmput+0x118/0x11c
      [<ffffffc0000c3ec4>] do_exit+0x2dc/0xa74
      [<ffffffc0000c46f8>] do_group_exit+0x4c/0xe4
      [<ffffffc0000d0f34>] get_signal+0x444/0x5e0
      [<ffffffc000089fcc>] do_signal+0x1d8/0x450
      [<ffffffc00008a35c>] do_notify_resume+0x70/0x78
      
      The oom victim cannot terminate because it needs to take mmap_sem for
      write while the lock is held by ksmd for read which loops in the page
      allocator
      
      ksm_do_scan
      	scan_get_next_rmap_item
      		down_read
      		get_next_rmap_item
      			alloc_rmap_item   #ksmd will loop permanently.
      
      There is no way forward because the oom victim cannot release any memory
      in 4.1 based kernel.  Since 4.6 we have the oom reaper which would solve
      this problem because it would release the memory asynchronously.
      Nevertheless we can relax alloc_rmap_item requirements and use
      __GFP_NORETRY because the allocation failure is acceptable as ksm_do_scan
      would just retry later after the lock got dropped.
      
      Such a patch would be also easy to backport to older stable kernels which
      do not have oom_reaper.
      
      While we are at it add GFP_NOWARN so the admin doesn't have to be alarmed
      by the allocation failure.
      
      Link: http://lkml.kernel.org/r/1474165570-44398-1-git-send-email-zhongjiang@huawei.comSigned-off-by: default avatarzhong jiang <zhongjiang@huawei.com>
      Suggested-by: default avatarHugh Dickins <hughd@google.com>
      Suggested-by: default avatarMichal Hocko <mhocko@suse.cz>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Acked-by: default avatarHugh Dickins <hughd@google.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      3419dc51
    • Mike Snitzer's avatar
      dm flakey: fix reads to be issued if drop_writes configured · 46049e5c
      Mike Snitzer authored
      commit 299f6230 upstream.
      
      v4.8-rc3 commit 99f3c90d ("dm flakey: error READ bios during the
      down_interval") overlooked the 'drop_writes' feature, which is meant to
      allow reads to be issued rather than errored, during the down_interval.
      
      Fixes: 99f3c90d ("dm flakey: error READ bios during the down_interval")
      Reported-by: default avatarQu Wenruo <quwenruo@cn.fujitsu.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      46049e5c
    • Chris Metcalf's avatar
      tile: avoid using clocksource_cyc2ns with absolute cycle count · 04be31f5
      Chris Metcalf authored
      commit e658a6f1 upstream.
      
      For large values of "mult" and long uptimes, the intermediate
      result of "cycles * mult" can overflow 64 bits.  For example,
      the tile platform calls clocksource_cyc2ns with a 1.2 GHz clock;
      we have mult = 853, and after 208.5 days, we overflow 64 bits.
      
      Since clocksource_cyc2ns() is intended to be used for relative
      cycle counts, not absolute cycle counts, performance is more
      importance than accepting a wider range of cycle values.  So,
      just use mult_frac() directly in tile's sched_clock().
      
      Commit 4cecf6d4 ("sched, x86: Avoid unnecessary overflow
      in sched_clock") by Salman Qazi results in essentially the same
      generated code for x86 as this change does for tile.  In fact,
      a follow-on change by Salman introduced mult_frac() and switched
      to using it, so the C code was largely identical at that point too.
      
      Peter Zijlstra then added mul_u64_u32_shr() and switched x86
      to use it.  This is, in principle, better; by optimizing the
      64x64->64 multiplies to be 32x32->64 multiplies we can potentially
      save some time.  However, the compiler piplines the 64x64->64
      multiplies pretty well, and the conditional branch in the generic
      mul_u64_u32_shr() causes some bubbles in execution, with the
      result that it's pretty much a wash.  If tilegx provided its own
      implementation of mul_u64_u32_shr() without the conditional branch,
      we could potentially save 3 cycles, but that seems like small gain
      for a fair amount of additional build scaffolding; no other platform
      currently provides a mul_u64_u32_shr() override, and tile doesn't
      currently have an <asm/div64.h> header to put the override in.
      
      Additionally, gcc currently has an optimization bug that prevents
      it from recognizing the opportunity to use a 32x32->64 multiply,
      and so the result would be no better than the existing mult_frac()
      until such time as the compiler is fixed.
      
      For now, just using mult_frac() seems like the right answer.
      Signed-off-by: default avatarChris Metcalf <cmetcalf@mellanox.com>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      04be31f5
    • Myron Stowe's avatar
      PCI: Handle read-only BARs on AMD CS553x devices · 569d62a3
      Myron Stowe authored
      commit 06cf35f9 upstream.
      
      Some AMD CS553x devices have read-only BARs because of a firmware or
      hardware defect.  There's a workaround in quirk_cs5536_vsa(), but it no
      longer works after 36e81648 ("PCI: Restore detection of read-only
      BARs").  Prior to 36e81648, we filled in res->start; afterwards we
      leave it zeroed out.  The quirk only updated the size, so the driver tried
      to use a region starting at zero, which didn't work.
      
      Expand quirk_cs5536_vsa() to read the base addresses from the BARs and
      hard-code the sizes.
      
      On Nix's system BAR 2's read-only value is 0x6200.  Prior to 36e81648,
      we interpret that as a 512-byte BAR based on the lowest-order bit set.  Per
      datasheet sec 5.6.1, that BAR (MFGPT) requires only 64 bytes; use that to
      avoid clearing any address bits if a platform uses only 64-byte alignment.
      
      [js] pcibios_bus_to_resource takes pdev, not bus in 3.12
      
      [bhelgaas: changelog, reduce BAR 2 size to 64]
      Fixes: 36e81648 ("PCI: Restore detection of read-only BARs")
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=85991#c4
      Link: http://support.amd.com/TechDocs/31506_cs5535_databook.pdf
      Link: http://support.amd.com/TechDocs/33238G_cs5536_db.pdfReported-and-tested-by: default avatarNix <nix@esperi.org.uk>
      Signed-off-by: default avatarMyron Stowe <myron.stowe@redhat.com>
      Signed-off-by: default avatarBjorn Helgaas <bhelgaas@google.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      569d62a3
    • Punit Agrawal's avatar
      ACPI / APEI: Fix incorrect return value of ghes_proc() · bdb29c7b
      Punit Agrawal authored
      commit 806487a8 upstream.
      
      Although ghes_proc() tests for errors while reading the error status,
      it always return success (0). Fix this by propagating the return
      value.
      
      Fixes: d334a491 (ACPI, APEI, Generic Hardware Error Source memory error support)
      Signed-of-by: default avatarPunit Agrawal <punit.agrawa.@arm.com>
      Tested-by: default avatarTyler Baicar <tbaicar@codeaurora.org>
      Reviewed-by: default avatarBorislav Petkov <bp@suse.de>
      [ rjw: Subject ]
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      bdb29c7b
    • Alexander Usyskin's avatar
      mei: bus: fix received data size check in NFC fixup · 2ef72292
      Alexander Usyskin authored
      commit 582ab27a upstream.
      
      NFC version reply size checked against only header size, not against
      full message size. That may lead potentially to uninitialized memory access
      in version data.
      
      That leads to warnings when version data is accessed:
      drivers/misc/mei/bus-fixup.c: warning: '*((void *)&ver+11)' may be used uninitialized in this function [-Wuninitialized]:  => 212:2
      
      Reported in
      Build regressions/improvements in v4.9-rc3
      https://lkml.org/lkml/2016/10/30/57
      
      [js] the check is in 3.12 only once
      
      Fixes: 59fcd7c6 (mei: nfc: Initial nfc implementation)
      Signed-off-by: default avatarAlexander Usyskin <alexander.usyskin@intel.com>
      Signed-off-by: default avatarTomas Winkler <tomas.winkler@intel.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      2ef72292
    • Arnd Bergmann's avatar
      staging: iio: ad5933: avoid uninitialized variable in error case · 0b0d4fc3
      Arnd Bergmann authored
      commit 34eee70a upstream.
      
      The ad5933_i2c_read function returns an error code to indicate
      whether it could read data or not. However ad5933_work() ignores
      this return code and just accesses the data unconditionally,
      which gets detected by gcc as a possible bug:
      
      drivers/staging/iio/impedance-analyzer/ad5933.c: In function 'ad5933_work':
      drivers/staging/iio/impedance-analyzer/ad5933.c:649:16: warning: 'status' may be used uninitialized in this function [-Wmaybe-uninitialized]
      
      This adds minimal error handling so we only evaluate the
      data if it was correctly read.
      
      Link: https://patchwork.kernel.org/patch/8110281/Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Acked-by: default avatarLars-Peter Clausen <lars@metafoo.de>
      Signed-off-by: default avatarJonathan Cameron <jic23@kernel.org>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      0b0d4fc3
    • Long Li's avatar
      hv: do not lose pending heartbeat vmbus packets · c670aaed
      Long Li authored
      commit 407a3aee upstream.
      
      The host keeps sending heartbeat packets independent of the
      guest responding to them.  Even though we respond to the heartbeat messages at
      interrupt level, we can have situations where there maybe multiple heartbeat
      messages pending that have not been responded to. For instance this occurs when the
      VM is paused and the host continues to send the heartbeat messages.
      Address this issue by draining and responding to all
      the heartbeat messages that maybe pending.
      Signed-off-by: default avatarLong Li <longli@microsoft.com>
      Signed-off-by: default avatarK. Y. Srinivasan <kys@microsoft.com>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      c670aaed
    • David Howells's avatar
      KEYS: Fix short sprintf buffer in /proc/keys show function · b9eabdf0
      David Howells authored
      commit 03dab869 upstream.
      
      This fixes CVE-2016-7042.
      
      Fix a short sprintf buffer in proc_keys_show().  If the gcc stack protector
      is turned on, this can cause a panic due to stack corruption.
      
      The problem is that xbuf[] is not big enough to hold a 64-bit timeout
      rendered as weeks:
      
      	(gdb) p 0xffffffffffffffffULL/(60*60*24*7)
      	$2 = 30500568904943
      
      That's 14 chars plus NUL, not 11 chars plus NUL.
      
      Expand the buffer to 16 chars.
      
      I think the unpatched code apparently works if the stack-protector is not
      enabled because on a 32-bit machine the buffer won't be overflowed and on a
      64-bit machine there's a 64-bit aligned pointer at one side and an int that
      isn't checked again on the other side.
      
      The panic incurred looks something like:
      
      Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: ffffffff81352ebe
      CPU: 0 PID: 1692 Comm: reproducer Not tainted 4.7.2-201.fc24.x86_64 #1
      Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
       0000000000000086 00000000fbbd2679 ffff8800a044bc00 ffffffff813d941f
       ffffffff81a28d58 ffff8800a044bc98 ffff8800a044bc88 ffffffff811b2cb6
       ffff880000000010 ffff8800a044bc98 ffff8800a044bc30 00000000fbbd2679
      Call Trace:
       [<ffffffff813d941f>] dump_stack+0x63/0x84
       [<ffffffff811b2cb6>] panic+0xde/0x22a
       [<ffffffff81352ebe>] ? proc_keys_show+0x3ce/0x3d0
       [<ffffffff8109f7f9>] __stack_chk_fail+0x19/0x30
       [<ffffffff81352ebe>] proc_keys_show+0x3ce/0x3d0
       [<ffffffff81350410>] ? key_validate+0x50/0x50
       [<ffffffff8134db30>] ? key_default_cmp+0x20/0x20
       [<ffffffff8126b31c>] seq_read+0x2cc/0x390
       [<ffffffff812b6b12>] proc_reg_read+0x42/0x70
       [<ffffffff81244fc7>] __vfs_read+0x37/0x150
       [<ffffffff81357020>] ? security_file_permission+0xa0/0xc0
       [<ffffffff81246156>] vfs_read+0x96/0x130
       [<ffffffff81247635>] SyS_read+0x55/0xc0
       [<ffffffff817eb872>] entry_SYSCALL_64_fastpath+0x1a/0xa4
      Reported-by: default avatarOndrej Kozina <okozina@redhat.com>
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Tested-by: default avatarOndrej Kozina <okozina@redhat.com>
      Signed-off-by: default avatarJames Morris <james.l.morris@oracle.com>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      b9eabdf0
    • Jan Viktorin's avatar
      uio: fix dmem_region_start computation · a6e9fafc
      Jan Viktorin authored
      commit 4d31a258 upstream.
      
      The variable i contains a total number of resources (including
      IORESOURCE_IRQ). However, we want the dmem_region_start to point
      after the last resource of type IORESOURCE_MEM. The original behaviour
      leads (very likely) to skipping several UIO mapping regions and makes
      them useless. Fix this by computing dmem_region_start from the uiomem
      which points to the last used UIO mapping.
      
      Fixes: 0a0c3b5a ("Add new uio device for dynamic memory allocation")
      Signed-off-by: default avatarJan Viktorin <viktorin@rehivetech.com>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      a6e9fafc
    • Liu Gang's avatar
      gpio: mpc8xxx: Correct irq handler function · 4499fcbd
      Liu Gang authored
      commit d71cf15b upstream.
      
      From the beginning of the gpio-mpc8xxx.c, the "handle_level_irq"
      has being used to handle GPIO interrupts in the PowerPC/Layerscape
      platforms. But actually, almost all PowerPC/Layerscape platforms
      assert an interrupt request upon either a high-to-low change or
      any change on the state of the signal.
      
      So the "handle_level_irq" is not reasonable for PowerPC/Layerscape
      GPIO interrupt, it should be "handle_edge_irq". Otherwise the system
      may lost some interrupts from the PIN's state changes.
      Signed-off-by: default avatarLiu Gang <Gang.Liu@nxp.com>
      Signed-off-by: default avatarLinus Walleij <linus.walleij@linaro.org>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      4499fcbd
    • Mauro Carvalho Chehab's avatar
      cx231xx: fix GPIOs for Pixelview SBTVD hybrid · dd228932
      Mauro Carvalho Chehab authored
      commit 24b923f0 upstream.
      
      This device uses GPIOs: 28 to switch between analog and
      digital modes: on digital mode, it should be set to 1.
      
      The code that sets it on analog mode is OK, but it misses
      the logic that sets it on digital mode.
      Signed-off-by: default avatarMauro Carvalho Chehab <mchehab@s-opensource.com>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      dd228932
    • Mauro Carvalho Chehab's avatar
      cx231xx: don't return error on success · 9865f089
      Mauro Carvalho Chehab authored
      commit 1871d718 upstream.
      
      The cx231xx_set_agc_analog_digital_mux_select() callers
      expect it to return 0 or an error. Returning a positive value
      makes the first attempt to switch between analog/digital to fail.
      Signed-off-by: default avatarMauro Carvalho Chehab <mchehab@s-opensource.com>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      9865f089
    • Mauro Carvalho Chehab's avatar
      mb86a20s: fix demod settings · e3ba79c7
      Mauro Carvalho Chehab authored
      commit 505a0ea7 upstream.
      
      With the current settings, only one channel locks properly.
      That's likely because, when this driver was written, Brazil
      were still using experimental transmissions.
      
      Change it to reproduce the settings used by the newer drivers.
      That makes it lock on other channels.
      
      Tested with both PixelView SBTVD Hybrid (cx231xx-based) and
      C3Tech Digital Duo HDTV/SDTV (em28xx-based) devices.
      Signed-off-by: default avatarMauro Carvalho Chehab <mchehab@s-opensource.com>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      e3ba79c7
    • Mauro Carvalho Chehab's avatar
      mb86a20s: fix the locking logic · e80cf277
      Mauro Carvalho Chehab authored
      commit dafb65fb upstream.
      
      On this frontend, it takes a while to start output normal
      TS data. That only happens on state S9. On S8, the TS output
      is enabled, but it is not reliable enough.
      
      However, the zigzag loop is too fast to let it sync.
      
      As, on practical tests, the zigzag software loop doesn't
      seem to be helping, but just slowing down the tuning, let's
      switch to hardware algorithm, as the tuners used on such
      devices are capable of work with frequency drifts without
      any help from software.
      Signed-off-by: default avatarMauro Carvalho Chehab <mchehab@s-opensource.com>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      e80cf277
    • Andrew Bresticker's avatar
      pstore/ram: Use memcpy_fromio() to save old buffer · 8996ef00
      Andrew Bresticker authored
      commit d771fdf9 upstream.
      
      The ramoops buffer may be mapped as either I/O memory or uncached
      memory.  On ARM64, this results in a device-type (strongly-ordered)
      mapping.  Since unnaligned accesses to device-type memory will
      generate an alignment fault (regardless of whether or not strict
      alignment checking is enabled), it is not safe to use memcpy().
      memcpy_fromio() is guaranteed to only use aligned accesses, so use
      that instead.
      Signed-off-by: default avatarAndrew Bresticker <abrestic@chromium.org>
      Signed-off-by: default avatarEnric Balletbo Serra <enric.balletbo@collabora.com>
      Reviewed-by: default avatarPuneet Kumar <puneetster@chromium.org>
      Signed-off-by: default avatarKees Cook <keescook@chromium.org>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      8996ef00
    • Furquan Shaikh's avatar
      pstore/ram: Use memcpy_toio instead of memcpy · 0d29b98a
      Furquan Shaikh authored
      commit 7e75678d upstream.
      
      persistent_ram_update uses vmap / iomap based on whether the buffer is in
      memory region or reserved region. However, both map it as non-cacheable
      memory. For armv8 specifically, non-cacheable mapping requests use a
      memory type that has to be accessed aligned to the request size. memcpy()
      doesn't guarantee that.
      Signed-off-by: default avatarFurquan Shaikh <furquan@google.com>
      Signed-off-by: default avatarEnric Balletbo Serra <enric.balletbo@collabora.com>
      Reviewed-by: default avatarAaron Durbin <adurbin@chromium.org>
      Reviewed-by: default avatarOlof Johansson <olofj@chromium.org>
      Tested-by: default avatarFurquan Shaikh <furquan@chromium.org>
      Signed-off-by: default avatarKees Cook <keescook@chromium.org>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      0d29b98a
    • Sebastian Andrzej Siewior's avatar
      pstore/core: drop cmpxchg based updates · 90efe9d4
      Sebastian Andrzej Siewior authored
      commit d5a9bf0b upstream.
      
      I have here a FPGA behind PCIe which exports SRAM which I use for
      pstore. Now it seems that the FPGA no longer supports cmpxchg based
      updates and writes back 0xff…ff and returns the same.  This leads to
      crash during crash rendering pstore useless.
      Since I doubt that there is much benefit from using cmpxchg() here, I am
      dropping this atomic access and use the spinlock based version.
      
      Cc: Anton Vorontsov <anton@enomsg.org>
      Cc: Colin Cross <ccross@android.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Rabin Vincent <rabinv@axis.com>
      Tested-by: default avatarRabin Vincent <rabinv@axis.com>
      Signed-off-by: default avatarSebastian Andrzej Siewior <bigeasy@linutronix.de>
      Reviewed-by: default avatarGuenter Roeck <linux@roeck-us.net>
      [kees: remove "_locked" suffix since it's the only option now]
      Signed-off-by: default avatarKees Cook <keescook@chromium.org>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      90efe9d4
    • Daniel Glöckner's avatar
      mmc: block: don't use CMD23 with very old MMC cards · 98af342b
      Daniel Glöckner authored
      commit 0ed50abb upstream.
      
      CMD23 aka SET_BLOCK_COUNT was introduced with MMC v3.1.
      Older versions of the specification allowed to terminate
      multi-block transfers only with CMD12.
      
      The patch fixes the following problem:
      
        mmc0: new MMC card at address 0001
        mmcblk0: mmc0:0001 SDMB-16 15.3 MiB
        mmcblk0: timed out sending SET_BLOCK_COUNT command, card status 0x400900
        ...
        blk_update_request: I/O error, dev mmcblk0, sector 0
        Buffer I/O error on dev mmcblk0, logical block 0, async page read
         mmcblk0: unable to read partition table
      Signed-off-by: default avatarDaniel Glöckner <dg@emlix.com>
      Signed-off-by: default avatarUlf Hansson <ulf.hansson@linaro.org>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      98af342b
    • Fabio Estevam's avatar
      mmc: mxs: Initialize the spinlock prior to using it · 0d9dddc0
      Fabio Estevam authored
      commit f91346e8 upstream.
      
      An interrupt may occur right after devm_request_irq() is called and
      prior to the spinlock initialization, leading to a kernel oops,
      as the interrupt handler uses the spinlock.
      
      In order to prevent this problem, move the spinlock initialization
      prior to requesting the interrupts.
      
      Fixes: e4243f13 (mmc: mxs-mmc: add mmc host driver for i.MX23/28)
      Signed-off-by: default avatarFabio Estevam <fabio.estevam@nxp.com>
      Reviewed-by: default avatarMarek Vasut <marex@denx.de>
      Signed-off-by: default avatarUlf Hansson <ulf.hansson@linaro.org>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      0d9dddc0
    • Johan Hovold's avatar
      PM / sleep: fix device reference leak in test_suspend · 6d97f02a
      Johan Hovold authored
      commit ceb75787 upstream.
      
      Make sure to drop the reference taken by class_find_device() after
      opening the RTC device.
      
      Fixes: 77437fd4 (pm: boot time suspend selftest)
      Signed-off-by: default avatarJohan Hovold <johan@kernel.org>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      6d97f02a
    • Johan Hovold's avatar
      mfd: core: Fix device reference leak in mfd_clone_cell · 26b5a4a5
      Johan Hovold authored
      commit 722f1910 upstream.
      
      Make sure to drop the reference taken by bus_find_device_by_name()
      before returning from mfd_clone_cell().
      
      Fixes: a9bbba99 ("mfd: add platform_device sharing support for mfd")
      Signed-off-by: default avatarJohan Hovold <johan@kernel.org>
      Signed-off-by: default avatarLee Jones <lee.jones@linaro.org>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      26b5a4a5
    • Jaewon Kim's avatar
      ratelimit: fix bug in time interval by resetting right begin time · 479ce88b
      Jaewon Kim authored
      commit c2594bc3 upstream.
      
      rs->begin in ratelimit is set in two cases.
       1) when rs->begin was not initialized
       2) when rs->interval was passed
      
      For case #2, current ratelimit sets the begin to 0.  This incurrs
      improper suppression.  The begin value will be set in the next ratelimit
      call by 1).  Then the time interval check will be always false, and
      rs->printed will not be initialized.  Although enough time passed,
      ratelimit may return 0 if rs->printed is not less than rs->burst.  To
      reset interval properly, begin should be jiffies rather than 0.
      
      For an example code below:
      
          static DEFINE_RATELIMIT_STATE(mylimit, 1, 1);
          for (i = 1; i <= 10; i++) {
              if (__ratelimit(&mylimit))
                  printk("ratelimit test count %d\n", i);
              msleep(3000);
          }
      
      test result in the current code shows suppression even there is 3 seconds sleep.
      
        [  78.391148] ratelimit test count 1
        [  81.295988] ratelimit test count 2
        [  87.315981] ratelimit test count 4
        [  93.336267] ratelimit test count 6
        [  99.356031] ratelimit test count 8
        [ 105.376367] ratelimit test count 10
      Signed-off-by: default avatarJaewon Kim <jaewon31.kim@samsung.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      479ce88b
    • Ding Tianhong's avatar
      rcu: Fix soft lockup for rcu_nocb_kthread · fe5080ab
      Ding Tianhong authored
      commit bedc1969 upstream.
      
      Carrying out the following steps results in a softlockup in the
      RCU callback-offload (rcuo) kthreads:
      
      1. Connect to ixgbevf, and set the speed to 10Gb/s.
      2. Use ifconfig to bring the nic up and down repeatedly.
      
      [  317.005148] IPv6: ADDRCONF(NETDEV_CHANGE): eth2: link becomes ready
      [  368.106005] BUG: soft lockup - CPU#1 stuck for 22s! [rcuos/1:15]
      [  368.106005] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
      [  368.106005] task: ffff88057dd8a220 ti: ffff88057dd9c000 task.ti: ffff88057dd9c000
      [  368.106005] RIP: 0010:[<ffffffff81579e04>]  [<ffffffff81579e04>] fib_table_lookup+0x14/0x390
      [  368.106005] RSP: 0018:ffff88061fc83ce8  EFLAGS: 00000286
      [  368.106005] RAX: 0000000000000001 RBX: 00000000020155c0 RCX: 0000000000000001
      [  368.106005] RDX: ffff88061fc83d50 RSI: ffff88061fc83d70 RDI: ffff880036d11a00
      [  368.106005] RBP: ffff88061fc83d08 R08: 0000000000000001 R09: 0000000000000000
      [  368.106005] R10: ffff880036d11a00 R11: ffffffff819e0900 R12: ffff88061fc83c58
      [  368.106005] R13: ffffffff816154dd R14: ffff88061fc83d08 R15: 00000000020155c0
      [  368.106005] FS:  0000000000000000(0000) GS:ffff88061fc80000(0000) knlGS:0000000000000000
      [  368.106005] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  368.106005] CR2: 00007f8c2aee9c40 CR3: 000000057b222000 CR4: 00000000000407e0
      [  368.106005] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [  368.106005] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      [  368.106005] Stack:
      [  368.106005]  00000000010000c0 ffff88057b766000 ffff8802e380b000 ffff88057af03e00
      [  368.106005]  ffff88061fc83dc0 ffffffff815349a6 ffff88061fc83d40 ffffffff814ee146
      [  368.106005]  ffff8802e380af00 00000000e380af00 ffffffff819e0900 020155c0010000c0
      [  368.106005] Call Trace:
      [  368.106005]  <IRQ>
      [  368.106005]
      [  368.106005]  [<ffffffff815349a6>] ip_route_input_noref+0x516/0xbd0
      [  368.106005]  [<ffffffff814ee146>] ? skb_release_data+0xd6/0x110
      [  368.106005]  [<ffffffff814ee20a>] ? kfree_skb+0x3a/0xa0
      [  368.106005]  [<ffffffff8153698f>] ip_rcv_finish+0x29f/0x350
      [  368.106005]  [<ffffffff81537034>] ip_rcv+0x234/0x380
      [  368.106005]  [<ffffffff814fd656>] __netif_receive_skb_core+0x676/0x870
      [  368.106005]  [<ffffffff814fd868>] __netif_receive_skb+0x18/0x60
      [  368.106005]  [<ffffffff814fe4de>] process_backlog+0xae/0x180
      [  368.106005]  [<ffffffff814fdcb2>] net_rx_action+0x152/0x240
      [  368.106005]  [<ffffffff81077b3f>] __do_softirq+0xef/0x280
      [  368.106005]  [<ffffffff8161619c>] call_softirq+0x1c/0x30
      [  368.106005]  <EOI>
      [  368.106005]
      [  368.106005]  [<ffffffff81015d95>] do_softirq+0x65/0xa0
      [  368.106005]  [<ffffffff81077174>] local_bh_enable+0x94/0xa0
      [  368.106005]  [<ffffffff81114922>] rcu_nocb_kthread+0x232/0x370
      [  368.106005]  [<ffffffff81098250>] ? wake_up_bit+0x30/0x30
      [  368.106005]  [<ffffffff811146f0>] ? rcu_start_gp+0x40/0x40
      [  368.106005]  [<ffffffff8109728f>] kthread+0xcf/0xe0
      [  368.106005]  [<ffffffff810971c0>] ? kthread_create_on_node+0x140/0x140
      [  368.106005]  [<ffffffff816147d8>] ret_from_fork+0x58/0x90
      [  368.106005]  [<ffffffff810971c0>] ? kthread_create_on_node+0x140/0x140
      
      ==================================cut here==============================
      
      It turns out that the rcuos callback-offload kthread is busy processing
      a very large quantity of RCU callbacks, and it is not reliquishing the
      CPU while doing so.  This commit therefore adds an cond_resched_rcu_qs()
      within the loop to allow other tasks to run.
      
      [js] use onlu cond_resched() in 3.12
      Signed-off-by: default avatarDing Tianhong <dingtianhong@huawei.com>
      [ paulmck: Substituted cond_resched_rcu_qs for cond_resched. ]
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Dhaval Giani <dhaval.giani@oracle.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      fe5080ab