1. 30 Apr, 2017 17 commits
    • Peter Zijlstra's avatar
      perf/core: Fix concurrent sys_perf_event_open() vs. 'move_group' race · 416bd4a3
      Peter Zijlstra authored
      commit 321027c1 upstream.
      
      Di Shen reported a race between two concurrent sys_perf_event_open()
      calls where both try and move the same pre-existing software group
      into a hardware context.
      
      The problem is exactly that described in commit:
      
        f63a8daa ("perf: Fix event->ctx locking")
      
      ... where, while we wait for a ctx->mutex acquisition, the event->ctx
      relation can have changed under us.
      
      That very same commit failed to recognise sys_perf_event_context() as an
      external access vector to the events and thereby didn't apply the
      established locking rules correctly.
      
      So while one sys_perf_event_open() call is stuck waiting on
      mutex_lock_double(), the other (which owns said locks) moves the group
      about. So by the time the former sys_perf_event_open() acquires the
      locks, the context we've acquired is stale (and possibly dead).
      
      Apply the established locking rules as per perf_event_ctx_lock_nested()
      to the mutex_lock_double() for the 'move_group' case. This obviously means
      we need to validate state after we acquire the locks.
      
      Reported-by: Di Shen (Keen Lab)
      Tested-by: default avatarJohn Dias <joaodias@google.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Min Chong <mchong@google.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Fixes: f63a8daa ("perf: Fix event->ctx locking")
      Link: http://lkml.kernel.org/r/20170106131444.GZ3174@twins.programming.kicks-ass.netSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      [bwh: Backported to 4.4:
       - Test perf_event::group_flags instead of group_caps
       - Adjust context]
      Signed-off-by: default avatarBen Hutchings <ben.hutchings@codethink.co.uk>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      416bd4a3
    • Eric Dumazet's avatar
      ping: implement proper locking · b7f47c79
      Eric Dumazet authored
      commit 43a66845 upstream.
      
      We got a report of yet another bug in ping
      
      http://www.openwall.com/lists/oss-security/2017/03/24/6
      
      ->disconnect() is not called with socket lock held.
      
      Fix this by acquiring ping rwlock earlier.
      
      Thanks to Daniel, Alexander and Andrey for letting us know this problem.
      
      Fixes: c319b4d7 ("net: ipv4: add IPPROTO_ICMP socket kind")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarDaniel Jiang <danieljiang0415@gmail.com>
      Reported-by: default avatarSolar Designer <solar@openwall.com>
      Reported-by: default avatarAndrey Konovalov <andreyknvl@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Cc: Ben Hutchings <ben.hutchings@codethink.co.uk>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b7f47c79
    • EunTaik Lee's avatar
      staging/android/ion : fix a race condition in the ion driver · a7544fdd
      EunTaik Lee authored
      commit 9590232b upstream.
      
      There is a use-after-free problem in the ion driver.
      This is caused by a race condition in the ion_ioctl()
      function.
      
      A handle has ref count of 1 and two tasks on different
      cpus calls ION_IOC_FREE simultaneously.
      
      cpu 0                                   cpu 1
      -------------------------------------------------------
      ion_handle_get_by_id()
      (ref == 2)
                                  ion_handle_get_by_id()
                                  (ref == 3)
      
      ion_free()
      (ref == 2)
      
      ion_handle_put()
      (ref == 1)
      
                                  ion_free()
                                  (ref == 0 so ion_handle_destroy() is
                                  called
                                  and the handle is freed.)
      
                                  ion_handle_put() is called and it
                                  decreases the slub's next free pointer
      
      The problem is detected as an unaligned access in the
      spin lock functions since it uses load exclusive
       instruction. In some cases it corrupts the slub's
      free pointer which causes a mis-aligned access to the
      next free pointer.(kmalloc returns a pointer like
      ffffc0745b4580aa). And it causes lots of other
      hard-to-debug problems.
      
      This symptom is caused since the first member in the
      ion_handle structure is the reference count and the
      ion driver decrements the reference after it has been
      freed.
      
      To fix this problem client->lock mutex is extended
      to protect all the codes that uses the handle.
      Signed-off-by: default avatarEun Taik Lee <eun.taik.lee@samsung.com>
      Reviewed-by: default avatarLaura Abbott <labbott@redhat.com>
      Cc: Ben Hutchings <ben.hutchings@codethink.co.uk>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      
      index 7ff2a7ec871f..33b390e7ea31
      a7544fdd
    • Vlad Tsyrklevich's avatar
      vfio/pci: Fix integer overflows, bitmask check · d23ef85b
      Vlad Tsyrklevich authored
      commit 05692d70 upstream.
      
      The VFIO_DEVICE_SET_IRQS ioctl did not sufficiently sanitize
      user-supplied integers, potentially allowing memory corruption. This
      patch adds appropriate integer overflow checks, checks the range bounds
      for VFIO_IRQ_SET_DATA_NONE, and also verifies that only single element
      in the VFIO_IRQ_SET_DATA_TYPE_MASK bitmask is set.
      VFIO_IRQ_SET_ACTION_TYPE_MASK is already correctly checked later in
      vfio_pci_set_irqs_ioctl().
      
      Furthermore, a kzalloc is changed to a kcalloc because the use of a
      kzalloc with an integer multiplication allowed an integer overflow
      condition to be reached without this patch. kcalloc checks for overflow
      and should prevent a similar occurrence.
      Signed-off-by: default avatarVlad Tsyrklevich <vlad@tsyrklevich.net>
      Signed-off-by: default avatarAlex Williamson <alex.williamson@redhat.com>
      Cc: Ben Hutchings <ben.hutchings@codethink.co.uk>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d23ef85b
    • Michal Kubeček's avatar
      tipc: check minimum bearer MTU · 65d30f75
      Michal Kubeček authored
      commit 3de81b75 upstream.
      
      Qian Zhang (张谦) reported a potential socket buffer overflow in
      tipc_msg_build() which is also known as CVE-2016-8632: due to
      insufficient checks, a buffer overflow can occur if MTU is too short for
      even tipc headers. As anyone can set device MTU in a user/net namespace,
      this issue can be abused by a regular user.
      
      As agreed in the discussion on Ben Hutchings' original patch, we should
      check the MTU at the moment a bearer is attached rather than for each
      processed packet. We also need to repeat the check when bearer MTU is
      adjusted to new device MTU. UDP case also needs a check to avoid
      overflow when calculating bearer MTU.
      
      Fixes: b97bf3fd ("[TIPC] Initial merge")
      Signed-off-by: default avatarMichal Kubecek <mkubecek@suse.cz>
      Reported-by: default avatarQian Zhang (张谦) <zhangqian-c@360.cn>
      Acked-by: default avatarYing Xue <ying.xue@windriver.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      [bwh: Backported to 4.4:
       - Adjust context
       - NETDEV_GOING_DOWN and NETDEV_CHANGEMTU cases in net notifier were combined]
      Signed-off-by: default avatarBen Hutchings <ben.hutchings@codethink.co.uk>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      65d30f75
    • Phil Turnbull's avatar
      netfilter: nfnetlink: correctly validate length of batch messages · 9540baad
      Phil Turnbull authored
      commit c58d6c93 upstream.
      
      If nlh->nlmsg_len is zero then an infinite loop is triggered because
      'skb_pull(skb, msglen);' pulls zero bytes.
      
      The calculation in nlmsg_len() underflows if 'nlh->nlmsg_len <
      NLMSG_HDRLEN' which bypasses the length validation and will later
      trigger an out-of-bound read.
      
      If the length validation does fail then the malformed batch message is
      copied back to userspace. However, we cannot do this because the
      nlh->nlmsg_len can be invalid. This leads to an out-of-bounds read in
      netlink_ack:
      
          [   41.455421] ==================================================================
          [   41.456431] BUG: KASAN: slab-out-of-bounds in memcpy+0x1d/0x40 at addr ffff880119e79340
          [   41.456431] Read of size 4294967280 by task a.out/987
          [   41.456431] =============================================================================
          [   41.456431] BUG kmalloc-512 (Not tainted): kasan: bad access detected
          [   41.456431] -----------------------------------------------------------------------------
          ...
          [   41.456431] Bytes b4 ffff880119e79310: 00 00 00 00 d5 03 00 00 b0 fb fe ff 00 00 00 00  ................
          [   41.456431] Object ffff880119e79320: 20 00 00 00 10 00 05 00 00 00 00 00 00 00 00 00   ...............
          [   41.456431] Object ffff880119e79330: 14 00 0a 00 01 03 fc 40 45 56 11 22 33 10 00 05  .......@EV."3...
          [   41.456431] Object ffff880119e79340: f0 ff ff ff 88 99 aa bb 00 14 00 0a 00 06 fe fb  ................
                                                  ^^ start of batch nlmsg with
                                                     nlmsg_len=4294967280
          ...
          [   41.456431] Memory state around the buggy address:
          [   41.456431]  ffff880119e79400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
          [   41.456431]  ffff880119e79480: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
          [   41.456431] >ffff880119e79500: 00 00 00 00 fc fc fc fc fc fc fc fc fc fc fc fc
          [   41.456431]                                ^
          [   41.456431]  ffff880119e79580: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
          [   41.456431]  ffff880119e79600: fc fc fc fc fc fc fc fc fc fc fb fb fb fb fb fb
          [   41.456431] ==================================================================
      
      Fix this with better validation of nlh->nlmsg_len and by setting
      NFNL_BATCH_FAILURE if any batch message fails length validation.
      
      CAP_NET_ADMIN is required to trigger the bugs.
      
      Fixes: 9ea2aa8b ("netfilter: nfnetlink: validate nfnetlink header from batch")
      Signed-off-by: default avatarPhil Turnbull <phil.turnbull@oracle.com>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      Cc: Ben Hutchings <ben.hutchings@codethink.co.uk>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9540baad
    • Mauro Carvalho Chehab's avatar
      xc2028: avoid use after free · 0d9dac5d
      Mauro Carvalho Chehab authored
      commit 8dfbcc43 upstream.
      
      If struct xc2028_config is passed without a firmware name,
      the following trouble may happen:
      
      [11009.907205] xc2028 5-0061: type set to XCeive xc2028/xc3028 tuner
      [11009.907491] ==================================================================
      [11009.907750] BUG: KASAN: use-after-free in strcmp+0x96/0xb0 at addr ffff8803bd78ab40
      [11009.907992] Read of size 1 by task modprobe/28992
      [11009.907994] =============================================================================
      [11009.907997] BUG kmalloc-16 (Tainted: G        W      ): kasan: bad access detected
      [11009.907999] -----------------------------------------------------------------------------
      
      [11009.908008] INFO: Allocated in xhci_urb_enqueue+0x214/0x14c0 [xhci_hcd] age=0 cpu=3 pid=28992
      [11009.908012] 	___slab_alloc+0x581/0x5b0
      [11009.908014] 	__slab_alloc+0x51/0x90
      [11009.908017] 	__kmalloc+0x27b/0x350
      [11009.908022] 	xhci_urb_enqueue+0x214/0x14c0 [xhci_hcd]
      [11009.908026] 	usb_hcd_submit_urb+0x1e8/0x1c60
      [11009.908029] 	usb_submit_urb+0xb0e/0x1200
      [11009.908032] 	usb_serial_generic_write_start+0xb6/0x4c0
      [11009.908035] 	usb_serial_generic_write+0x92/0xc0
      [11009.908039] 	usb_console_write+0x38a/0x560
      [11009.908045] 	call_console_drivers.constprop.14+0x1ee/0x2c0
      [11009.908051] 	console_unlock+0x40d/0x900
      [11009.908056] 	vprintk_emit+0x4b4/0x830
      [11009.908061] 	vprintk_default+0x1f/0x30
      [11009.908064] 	printk+0x99/0xb5
      [11009.908067] 	kasan_report_error+0x10a/0x550
      [11009.908070] 	__asan_report_load1_noabort+0x43/0x50
      [11009.908074] INFO: Freed in xc2028_set_config+0x90/0x630 [tuner_xc2028] age=1 cpu=3 pid=28992
      [11009.908077] 	__slab_free+0x2ec/0x460
      [11009.908080] 	kfree+0x266/0x280
      [11009.908083] 	xc2028_set_config+0x90/0x630 [tuner_xc2028]
      [11009.908086] 	xc2028_attach+0x310/0x8a0 [tuner_xc2028]
      [11009.908090] 	em28xx_attach_xc3028.constprop.7+0x1f9/0x30d [em28xx_dvb]
      [11009.908094] 	em28xx_dvb_init.part.3+0x8e4/0x5cf4 [em28xx_dvb]
      [11009.908098] 	em28xx_dvb_init+0x81/0x8a [em28xx_dvb]
      [11009.908101] 	em28xx_register_extension+0xd9/0x190 [em28xx]
      [11009.908105] 	em28xx_dvb_register+0x10/0x1000 [em28xx_dvb]
      [11009.908108] 	do_one_initcall+0x141/0x300
      [11009.908111] 	do_init_module+0x1d0/0x5ad
      [11009.908114] 	load_module+0x6666/0x9ba0
      [11009.908117] 	SyS_finit_module+0x108/0x130
      [11009.908120] 	entry_SYSCALL_64_fastpath+0x16/0x76
      [11009.908123] INFO: Slab 0xffffea000ef5e280 objects=25 used=25 fp=0x          (null) flags=0x2ffff8000004080
      [11009.908126] INFO: Object 0xffff8803bd78ab40 @offset=2880 fp=0x0000000000000001
      
      [11009.908130] Bytes b4 ffff8803bd78ab30: 01 00 00 00 2a 07 00 00 9d 28 00 00 01 00 00 00  ....*....(......
      [11009.908133] Object ffff8803bd78ab40: 01 00 00 00 00 00 00 00 b0 1d c3 6a 00 88 ff ff  ...........j....
      [11009.908137] CPU: 3 PID: 28992 Comm: modprobe Tainted: G    B   W       4.5.0-rc1+ #43
      [11009.908140] Hardware name:                  /NUC5i7RYB, BIOS RYBDWi35.86A.0350.2015.0812.1722 08/12/2015
      [11009.908142]  ffff8803bd78a000 ffff8802c273f1b8 ffffffff81932007 ffff8803c6407a80
      [11009.908148]  ffff8802c273f1e8 ffffffff81556759 ffff8803c6407a80 ffffea000ef5e280
      [11009.908153]  ffff8803bd78ab40 dffffc0000000000 ffff8802c273f210 ffffffff8155ccb4
      [11009.908158] Call Trace:
      [11009.908162]  [<ffffffff81932007>] dump_stack+0x4b/0x64
      [11009.908165]  [<ffffffff81556759>] print_trailer+0xf9/0x150
      [11009.908168]  [<ffffffff8155ccb4>] object_err+0x34/0x40
      [11009.908171]  [<ffffffff8155f260>] kasan_report_error+0x230/0x550
      [11009.908175]  [<ffffffff81237d71>] ? trace_hardirqs_off_caller+0x21/0x290
      [11009.908179]  [<ffffffff8155e926>] ? kasan_unpoison_shadow+0x36/0x50
      [11009.908182]  [<ffffffff8155f5c3>] __asan_report_load1_noabort+0x43/0x50
      [11009.908185]  [<ffffffff8155ea00>] ? __asan_register_globals+0x50/0xa0
      [11009.908189]  [<ffffffff8194cea6>] ? strcmp+0x96/0xb0
      [11009.908192]  [<ffffffff8194cea6>] strcmp+0x96/0xb0
      [11009.908196]  [<ffffffffa13ba4ac>] xc2028_set_config+0x15c/0x630 [tuner_xc2028]
      [11009.908200]  [<ffffffffa13bac90>] xc2028_attach+0x310/0x8a0 [tuner_xc2028]
      [11009.908203]  [<ffffffff8155ea78>] ? memset+0x28/0x30
      [11009.908206]  [<ffffffffa13ba980>] ? xc2028_set_config+0x630/0x630 [tuner_xc2028]
      [11009.908211]  [<ffffffffa157a59a>] em28xx_attach_xc3028.constprop.7+0x1f9/0x30d [em28xx_dvb]
      [11009.908215]  [<ffffffffa157aa2a>] ? em28xx_dvb_init.part.3+0x37c/0x5cf4 [em28xx_dvb]
      [11009.908219]  [<ffffffffa157a3a1>] ? hauppauge_hvr930c_init+0x487/0x487 [em28xx_dvb]
      [11009.908222]  [<ffffffffa01795ac>] ? lgdt330x_attach+0x1cc/0x370 [lgdt330x]
      [11009.908226]  [<ffffffffa01793e0>] ? i2c_read_demod_bytes.isra.2+0x210/0x210 [lgdt330x]
      [11009.908230]  [<ffffffff812e87d0>] ? ref_module.part.15+0x10/0x10
      [11009.908233]  [<ffffffff812e56e0>] ? module_assert_mutex_or_preempt+0x80/0x80
      [11009.908238]  [<ffffffffa157af92>] em28xx_dvb_init.part.3+0x8e4/0x5cf4 [em28xx_dvb]
      [11009.908242]  [<ffffffffa157a6ae>] ? em28xx_attach_xc3028.constprop.7+0x30d/0x30d [em28xx_dvb]
      [11009.908245]  [<ffffffff8195222d>] ? string+0x14d/0x1f0
      [11009.908249]  [<ffffffff8195381f>] ? symbol_string+0xff/0x1a0
      [11009.908253]  [<ffffffff81953720>] ? uuid_string+0x6f0/0x6f0
      [11009.908257]  [<ffffffff811a775e>] ? __kernel_text_address+0x7e/0xa0
      [11009.908260]  [<ffffffff8104b02f>] ? print_context_stack+0x7f/0xf0
      [11009.908264]  [<ffffffff812e9846>] ? __module_address+0xb6/0x360
      [11009.908268]  [<ffffffff8137fdc9>] ? is_ftrace_trampoline+0x99/0xe0
      [11009.908271]  [<ffffffff811a775e>] ? __kernel_text_address+0x7e/0xa0
      [11009.908275]  [<ffffffff81240a70>] ? debug_check_no_locks_freed+0x290/0x290
      [11009.908278]  [<ffffffff8104a24b>] ? dump_trace+0x11b/0x300
      [11009.908282]  [<ffffffffa13e8143>] ? em28xx_register_extension+0x23/0x190 [em28xx]
      [11009.908285]  [<ffffffff81237d71>] ? trace_hardirqs_off_caller+0x21/0x290
      [11009.908289]  [<ffffffff8123ff56>] ? trace_hardirqs_on_caller+0x16/0x590
      [11009.908292]  [<ffffffff812404dd>] ? trace_hardirqs_on+0xd/0x10
      [11009.908296]  [<ffffffffa13e8143>] ? em28xx_register_extension+0x23/0x190 [em28xx]
      [11009.908299]  [<ffffffff822dcbb0>] ? mutex_trylock+0x400/0x400
      [11009.908302]  [<ffffffff810021a1>] ? do_one_initcall+0x131/0x300
      [11009.908306]  [<ffffffff81296dc7>] ? call_rcu_sched+0x17/0x20
      [11009.908309]  [<ffffffff8159e708>] ? put_object+0x48/0x70
      [11009.908314]  [<ffffffffa1579f11>] em28xx_dvb_init+0x81/0x8a [em28xx_dvb]
      [11009.908317]  [<ffffffffa13e81f9>] em28xx_register_extension+0xd9/0x190 [em28xx]
      [11009.908320]  [<ffffffffa0150000>] ? 0xffffffffa0150000
      [11009.908324]  [<ffffffffa0150010>] em28xx_dvb_register+0x10/0x1000 [em28xx_dvb]
      [11009.908327]  [<ffffffff810021b1>] do_one_initcall+0x141/0x300
      [11009.908330]  [<ffffffff81002070>] ? try_to_run_init_process+0x40/0x40
      [11009.908333]  [<ffffffff8123ff56>] ? trace_hardirqs_on_caller+0x16/0x590
      [11009.908337]  [<ffffffff8155e926>] ? kasan_unpoison_shadow+0x36/0x50
      [11009.908340]  [<ffffffff8155e926>] ? kasan_unpoison_shadow+0x36/0x50
      [11009.908343]  [<ffffffff8155e926>] ? kasan_unpoison_shadow+0x36/0x50
      [11009.908346]  [<ffffffff8155ea37>] ? __asan_register_globals+0x87/0xa0
      [11009.908350]  [<ffffffff8144da7b>] do_init_module+0x1d0/0x5ad
      [11009.908353]  [<ffffffff812f2626>] load_module+0x6666/0x9ba0
      [11009.908356]  [<ffffffff812e9c90>] ? symbol_put_addr+0x50/0x50
      [11009.908361]  [<ffffffffa1580037>] ? em28xx_dvb_init.part.3+0x5989/0x5cf4 [em28xx_dvb]
      [11009.908366]  [<ffffffff812ebfc0>] ? module_frob_arch_sections+0x20/0x20
      [11009.908369]  [<ffffffff815bc940>] ? open_exec+0x50/0x50
      [11009.908374]  [<ffffffff811671bb>] ? ns_capable+0x5b/0xd0
      [11009.908377]  [<ffffffff812f5e58>] SyS_finit_module+0x108/0x130
      [11009.908379]  [<ffffffff812f5d50>] ? SyS_init_module+0x1f0/0x1f0
      [11009.908383]  [<ffffffff81004044>] ? lockdep_sys_exit_thunk+0x12/0x14
      [11009.908394]  [<ffffffff822e6936>] entry_SYSCALL_64_fastpath+0x16/0x76
      [11009.908396] Memory state around the buggy address:
      [11009.908398]  ffff8803bd78aa00: 00 00 fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      [11009.908401]  ffff8803bd78aa80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      [11009.908403] >ffff8803bd78ab00: fc fc fc fc fc fc fc fc 00 00 fc fc fc fc fc fc
      [11009.908405]                                            ^
      [11009.908407]  ffff8803bd78ab80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      [11009.908409]  ffff8803bd78ac00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      [11009.908411] ==================================================================
      
      In order to avoid it, let's set the cached value of the firmware
      name to NULL after freeing it. While here, return an error if
      the memory allocation fails.
      Signed-off-by: default avatarMauro Carvalho Chehab <mchehab@osg.samsung.com>
      Cc: Ben Hutchings <ben.hutchings@codethink.co.uk>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0d9dac5d
    • Eric W. Biederman's avatar
      mnt: Add a per mount namespace limit on the number of mounts · c50fd34e
      Eric W. Biederman authored
      commit d2921684 upstream.
      
      CAI Qian <caiqian@redhat.com> pointed out that the semantics
      of shared subtrees make it possible to create an exponentially
      increasing number of mounts in a mount namespace.
      
          mkdir /tmp/1 /tmp/2
          mount --make-rshared /
          for i in $(seq 1 20) ; do mount --bind /tmp/1 /tmp/2 ; done
      
      Will create create 2^20 or 1048576 mounts, which is a practical problem
      as some people have managed to hit this by accident.
      
      As such CVE-2016-6213 was assigned.
      
      Ian Kent <raven@themaw.net> described the situation for autofs users
      as follows:
      
      > The number of mounts for direct mount maps is usually not very large because of
      > the way they are implemented, large direct mount maps can have performance
      > problems. There can be anywhere from a few (likely case a few hundred) to less
      > than 10000, plus mounts that have been triggered and not yet expired.
      >
      > Indirect mounts have one autofs mount at the root plus the number of mounts that
      > have been triggered and not yet expired.
      >
      > The number of autofs indirect map entries can range from a few to the common
      > case of several thousand and in rare cases up to between 30000 and 50000. I've
      > not heard of people with maps larger than 50000 entries.
      >
      > The larger the number of map entries the greater the possibility for a large
      > number of active mounts so it's not hard to expect cases of a 1000 or somewhat
      > more active mounts.
      
      So I am setting the default number of mounts allowed per mount
      namespace at 100,000.  This is more than enough for any use case I
      know of, but small enough to quickly stop an exponential increase
      in mounts.  Which should be perfect to catch misconfigurations and
      malfunctioning programs.
      
      For anyone who needs a higher limit this can be changed by writing
      to the new /proc/sys/fs/mount-max sysctl.
      Tested-by: default avatarCAI Qian <caiqian@redhat.com>
      Signed-off-by: default avatar"Eric W. Biederman" <ebiederm@xmission.com>
      [bwh: Backported to 4.4: adjust context]
      Signed-off-by: default avatarBen Hutchings <ben.hutchings@codethink.co.uk>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c50fd34e
    • Jon Paul Maloy's avatar
      tipc: fix socket timer deadlock · 59e0cd11
      Jon Paul Maloy authored
      commit f1d048f2 upstream.
      
      We sometimes observe a 'deadly embrace' type deadlock occurring
      between mutually connected sockets on the same node. This happens
      when the one-hour peer supervision timers happen to expire
      simultaneously in both sockets.
      
      The scenario is as follows:
      
      CPU 1:                          CPU 2:
      --------                        --------
      tipc_sk_timeout(sk1)            tipc_sk_timeout(sk2)
        lock(sk1.slock)                 lock(sk2.slock)
        msg_create(probe)               msg_create(probe)
        unlock(sk1.slock)               unlock(sk2.slock)
        tipc_node_xmit_skb()            tipc_node_xmit_skb()
          tipc_node_xmit()                tipc_node_xmit()
            tipc_sk_rcv(sk2)                tipc_sk_rcv(sk1)
              lock(sk2.slock)                 lock((sk1.slock)
              filter_rcv()                    filter_rcv()
                tipc_sk_proto_rcv()             tipc_sk_proto_rcv()
                  msg_create(probe_rsp)           msg_create(probe_rsp)
                  tipc_sk_respond()               tipc_sk_respond()
                    tipc_node_xmit_skb()            tipc_node_xmit_skb()
                      tipc_node_xmit()                tipc_node_xmit()
                        tipc_sk_rcv(sk1)                tipc_sk_rcv(sk2)
                          lock((sk1.slock)                lock((sk2.slock)
                          ===> DEADLOCK                   ===> DEADLOCK
      
      Further analysis reveals that there are three different locations in the
      socket code where tipc_sk_respond() is called within the context of the
      socket lock, with ensuing risk of similar deadlocks.
      
      We now solve this by passing a buffer queue along with all upcalls where
      sk_lock.slock may potentially be held. Response or rejected message
      buffers are accumulated into this queue instead of being sent out
      directly, and only sent once we know we are safely outside the slock
      context.
      Reported-by: default avatarGUNA <gbalasun@gmail.com>
      Acked-by: default avatarYing Xue <ying.xue@windriver.com>
      Signed-off-by: default avatarJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      59e0cd11
    • Parthasarathy Bhuvaragan's avatar
      tipc: fix random link resets while adding a second bearer · abc025d1
      Parthasarathy Bhuvaragan authored
      commit d2f394dc upstream.
      
      In a dual bearer configuration, if the second tipc link becomes
      active while the first link still has pending nametable "bulk"
      updates, it randomly leads to reset of the second link.
      
      When a link is established, the function named_distribute(),
      fills the skb based on node mtu (allows room for TUNNEL_PROTOCOL)
      with NAME_DISTRIBUTOR message for each PUBLICATION.
      However, the function named_distribute() allocates the buffer by
      increasing the node mtu by INT_H_SIZE (to insert NAME_DISTRIBUTOR).
      This consumes the space allocated for TUNNEL_PROTOCOL.
      
      When establishing the second link, the link shall tunnel all the
      messages in the first link queue including the "bulk" update.
      As size of the NAME_DISTRIBUTOR messages while tunnelling, exceeds
      the link mtu the transmission fails (-EMSGSIZE).
      
      Thus, the synch point based on the message count of the tunnel
      packets is never reached leading to link timeout.
      
      In this commit, we adjust the size of name distributor message so that
      they can be tunnelled.
      Reviewed-by: default avatarJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: default avatarParthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      abc025d1
    • Arnd Bergmann's avatar
      gfs2: avoid uninitialized variable warning · d39cb4a5
      Arnd Bergmann authored
      commit 67893f12 upstream.
      
      We get a bogus warning about a potential uninitialized variable
      use in gfs2, because the compiler does not figure out that we
      never use the leaf number if get_leaf_nr() returns an error:
      
      fs/gfs2/dir.c: In function 'get_first_leaf':
      fs/gfs2/dir.c:802:9: warning: 'leaf_no' may be used uninitialized in this function [-Wmaybe-uninitialized]
      fs/gfs2/dir.c: In function 'dir_split_leaf':
      fs/gfs2/dir.c:1021:8: warning: 'leaf_no' may be used uninitialized in this function [-Wmaybe-uninitialized]
      
      Changing the 'if (!error)' to 'if (!IS_ERR_VALUE(error))' is
      sufficient to let gcc understand that this is exactly the same
      condition as in IS_ERR() so it can optimize the code path enough
      to understand it.
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Signed-off-by: default avatarBob Peterson <rpeterso@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d39cb4a5
    • Arnd Bergmann's avatar
      hostap: avoid uninitialized variable use in hfa384x_get_rid · 9a35bc2a
      Arnd Bergmann authored
      commit 48dc5fb3 upstream.
      
      The driver reads a value from hfa384x_from_bap(), which may fail,
      and then assigns the value to a local variable. gcc detects that
      in in the failure case, the 'rlen' variable now contains
      uninitialized data:
      
      In file included from ../drivers/net/wireless/intersil/hostap/hostap_pci.c:220:0:
      drivers/net/wireless/intersil/hostap/hostap_hw.c: In function 'hfa384x_get_rid':
      drivers/net/wireless/intersil/hostap/hostap_hw.c:842:5: warning: 'rec' may be used uninitialized in this function [-Wmaybe-uninitialized]
        if (le16_to_cpu(rec.len) == 0) {
      
      This restructures the function as suggested by Russell King, to
      make it more readable and get more reliable error handling, by
      handling each failure mode using a goto.
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Signed-off-by: default avatarKalle Valo <kvalo@codeaurora.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9a35bc2a
    • Arnd Bergmann's avatar
      tty: nozomi: avoid a harmless gcc warning · 58f80ccf
      Arnd Bergmann authored
      commit a4f642a8 upstream.
      
      The nozomi wireless data driver has its own helper function to
      transfer data from a FIFO, doing an extra byte swap on big-endian
      architectures, presumably to bring the data back into byte-serial
      order after readw() or readl() perform their implicit byteswap.
      
      This helper function is used in the receive_data() function to
      first read the length into a 32-bit variable, which causes
      a compile-time warning:
      
      drivers/tty/nozomi.c: In function 'receive_data':
      drivers/tty/nozomi.c:857:9: warning: 'size' may be used uninitialized in this function [-Wmaybe-uninitialized]
      
      The problem is that gcc is unsure whether the data was actually
      read or not. We know that it is at this point, so we can replace
      it with a single readl() to shut up that warning.
      
      I am leaving the byteswap in there, to preserve the existing
      behavior, even though this seems fishy: Reading the length of
      the data into a cpu-endian variable should normally not use
      a second byteswap on big-endian systems, unless the hardware
      is aware of the CPU endianess.
      
      There appears to be a lot more confusion about endianess in this
      driver, so it probably has not worked on big-endian systems in
      a long time, if ever, and I have no way to test it. It's well
      possible that this driver has not been used by anyone in a while,
      the last patch that looks like it was tested on the hardware is
      from 2008.
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      58f80ccf
    • Jon Paul Maloy's avatar
      tipc: correct error in node fsm · 2847736f
      Jon Paul Maloy authored
      commit c4282ca7 upstream.
      
      commit 88e8ac70 ("tipc: reduce transmission rate of reset messages
      when link is down") revealed a flaw in the node FSM, as defined in
      the log of commit 66996b6c ("tipc: extend node FSM").
      
      We see the following scenario:
      1: Node B receives a RESET message from node A before its link endpoint
         is fully up, i.e., the node FSM is in state SELF_UP_PEER_COMING. This
         event will not change the node FSM state, but the (distinct) link FSM
         will move to state RESETTING.
      2: As an effect of the previous event, the local endpoint on B will
         declare node A lost, and post the event SELF_DOWN to the its node
         FSM. This moves the FSM state to SELF_DOWN_PEER_LEAVING, meaning
         that no messages will be accepted from A until it receives another
         RESET message that confirms that A's endpoint has been reset. This
         is  wasteful, since we know this as a fact already from the first
         received RESET, but worse is that the link instance's FSM has not
         wasted this information, but instead moved on to state ESTABLISHING,
         meaning that it repeatedly sends out ACTIVATE messages to the reset
         peer A.
      3: Node A will receive one of the ACTIVATE messages, move its link FSM
         to state ESTABLISHED, and start repeatedly sending out STATE messages
         to node B.
      4: Node B will consistently drop these messages, since it can only accept
         accept a RESET according to its node FSM.
      5: After four lost STATE messages node A will reset its link and start
         repeatedly sending out RESET messages to B.
      6: Because of the reduced send rate for RESET messages, it is very
         likely that A will receive an ACTIVATE (which is sent out at a much
         higher frequency) before it gets the chance to send a RESET, and A
         may hence quickly move back to state ESTABLISHED and continue sending
         out STATE messages, which will again be dropped by B.
      7: GOTO 5.
      8: After having repeated the cycle 5-7 a number of times, node A will
         by chance get in between with sending a RESET, and the situation is
         resolved.
      
      Unfortunately, we have seen that it may take a substantial amount of
      time before this vicious loop is broken, sometimes in the order of
      minutes.
      
      We correct this by making a small correction to the node FSM: When a
      node in state SELF_UP_PEER_COMING receives a SELF_DOWN event, it now
      moves directly back to state SELF_DOWN_PEER_DOWN, instead of as now
      SELF_DOWN_PEER_LEAVING. This is logically consistent, since we don't
      need to wait for RESET confirmation from of an endpoint that we alread
      know has been reset. It also means that node B in the scenario above
      will not be dropping incoming STATE messages, and the link can come up
      immediately.
      
      Finally, a symmetry comparison reveals that the  FSM has a similar
      error when receiving the event PEER_DOWN in state PEER_UP_SELF_COMING.
      Instead of moving to PERR_DOWN_SELF_LEAVING, it should move directly
      to SELF_DOWN_PEER_DOWN. Although we have never seen any negative effect
      of this logical error, we choose fix this one, too.
      
      The node FSM looks as follows after those changes:
      
                                 +----------------------------------------+
                                 |                           PEER_DOWN_EVT|
                                 |                                        |
        +------------------------+----------------+                       |
        |SELF_DOWN_EVT           |                |                       |
        |                        |                |                       |
        |              +-----------+          +-----------+               |
        |              |NODE_      |          |NODE_      |               |
        |   +----------|FAILINGOVER|<---------|SYNCHING   |-----------+   |
        |   |SELF_     +-----------+ FAILOVER_+-----------+   PEER_   |   |
        |   |DOWN_EVT   |          A BEGIN_EVT  A         |   DOWN_EVT|   |
        |   |           |          |            |         |           |   |
        |   |           |          |            |         |           |   |
        |   |           |FAILOVER_ |FAILOVER_   |SYNCH_   |SYNCH_     |   |
        |   |           |END_EVT   |BEGIN_EVT   |BEGIN_EVT|END_EVT    |   |
        |   |           |          |            |         |           |   |
        |   |           |          |            |         |           |   |
        |   |           |         +--------------+        |           |   |
        |   |           +-------->|   SELF_UP_   |<-------+           |   |
        |   |   +-----------------|   PEER_UP    |----------------+   |   |
        |   |   |SELF_DOWN_EVT    +--------------+   PEER_DOWN_EVT|   |   |
        |   |   |                    A        A                   |   |   |
        |   |   |                    |        |                   |   |   |
        |   |   |         PEER_UP_EVT|        |SELF_UP_EVT        |   |   |
        |   |   |                    |        |                   |   |   |
        V   V   V                    |        |                   V   V   V
      +------------+       +-----------+    +-----------+       +------------+
      |SELF_DOWN_  |       |SELF_UP_   |    |PEER_UP_   |       |PEER_DOWN   |
      |PEER_LEAVING|       |PEER_COMING|    |SELF_COMING|       |SELF_LEAVING|
      +------------+       +-----------+    +-----------+       +------------+
             |               |       A        A       |                |
             |               |       |        |       |                |
             |       SELF_   |       |SELF_   |PEER_  |PEER_           |
             |       DOWN_EVT|       |UP_EVT  |UP_EVT |DOWN_EVT        |
             |               |       |        |       |                |
             |               |       |        |       |                |
             |               |    +--------------+    |                |
             |PEER_DOWN_EVT  +--->|  SELF_DOWN_  |<---+   SELF_DOWN_EVT|
             +------------------->|  PEER_DOWN   |<--------------------+
                                  +--------------+
      Acked-by: default avatarYing Xue <ying.xue@windriver.com>
      Signed-off-by: default avatarJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2847736f
    • Jon Paul Maloy's avatar
      tipc: re-enable compensation for socket receive buffer double counting · 76ca3053
      Jon Paul Maloy authored
      commit 7c8bcfb1 upstream.
      
      In the refactoring commit d570d864 ("tipc: enqueue arrived buffers
      in socket in separate function") we did by accident replace the test
      
      if (sk->sk_backlog.len == 0)
           atomic_set(&tsk->dupl_rcvcnt, 0);
      
      with
      
      if (sk->sk_backlog.len)
           atomic_set(&tsk->dupl_rcvcnt, 0);
      
      This effectively disables the compensation we have for the double
      receive buffer accounting that occurs temporarily when buffers are
      moved from the backlog to the socket receive queue. Until now, this
      has gone unnoticed because of the large receive buffer limits we are
      applying, but becomes indispensable when we reduce this buffer limit
      later in this series.
      
      We now fix this by inverting the mentioned condition.
      Acked-by: default avatarYing Xue <ying.xue@windriver.com>
      Signed-off-by: default avatarJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      76ca3053
    • Erik Hugne's avatar
      tipc: make dist queue pernet · 3f315590
      Erik Hugne authored
      commit 541726ab upstream.
      
      Nametable updates received from the network that cannot be applied
      immediately are placed on a defer queue. This queue is global to the
      TIPC module, which might cause problems when using TIPC in containers.
      To prevent nametable updates from escaping into the wrong namespace,
      we make the queue pernet instead.
      Signed-off-by: default avatarErik Hugne <erik.hugne@gmail.com>
      Signed-off-by: default avatarJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3f315590
    • Richard Alpe's avatar
      tipc: make sure IPv6 header fits in skb headroom · 44b3b7e0
      Richard Alpe authored
      commit 9bd160bf upstream.
      
      Expand headroom further in order to be able to fit the larger IPv6
      header. Prior to this patch this caused a skb under panic for certain
      tipc packets when using IPv6 UDP bearer(s).
      Signed-off-by: default avatarRichard Alpe <richard.alpe@ericsson.com>
      Acked-by: default avatarJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      44b3b7e0
  2. 27 Apr, 2017 23 commits