1. 21 Nov, 2017 8 commits
    • apparmor: fix locking when creating a new complain profile. · 5d7c44ef
      John Johansen authored
      Break the per cpu buffer atomic section when creating a new null
      complain profile. In learning mode this won't matter and we can
      safely re-acquire the buffer.
      
      This fixes the following lockdep BUG trace
         nov. 14 14:09:09 cyclope audit[7152]: AVC apparmor="ALLOWED" operation="exec" profile="/usr/sbin/sssd" name="/usr/sbin/adcli" pid=7152 comm="sssd_be" requested_mask="x" denied_mask="x" fsuid=0 ouid=0 target="/usr/sbin/sssd//null-/usr/sbin/adcli"
          nov. 14 14:09:09 cyclope kernel: BUG: sleeping function called from invalid context at kernel/locking/mutex.c:747
          nov. 14 14:09:09 cyclope kernel: in_atomic(): 1, irqs_disabled(): 0, pid: 7152, name: sssd_be
          nov. 14 14:09:09 cyclope kernel: 1 lock held by sssd_be/7152:
          nov. 14 14:09:09 cyclope kernel:  #0:  (&sig->cred_guard_mutex){....}, at: [<ffffffff8182d53e>] prepare_bprm_creds+0x4e/0x100
          nov. 14 14:09:09 cyclope kernel: CPU: 3 PID: 7152 Comm: sssd_be Not tainted 4.14.0prahal+intel #150
          nov. 14 14:09:09 cyclope kernel: Hardware name: LENOVO 20CDCTO1WW/20CDCTO1WW, BIOS GQET53WW (1.33 ) 09/15/2017
          nov. 14 14:09:09 cyclope kernel: Call Trace:
          nov. 14 14:09:09 cyclope kernel:  dump_stack+0xb0/0x135
          nov. 14 14:09:09 cyclope kernel:  ? _atomic_dec_and_lock+0x15b/0x15b
          nov. 14 14:09:09 cyclope kernel:  ? lockdep_print_held_locks+0xc4/0x130
          nov. 14 14:09:09 cyclope kernel:  ___might_sleep+0x29c/0x320
          nov. 14 14:09:09 cyclope kernel:  ? rq_clock+0xf0/0xf0
          nov. 14 14:09:09 cyclope kernel:  ? __kernel_text_address+0xd/0x40
          nov. 14 14:09:09 cyclope kernel:  __might_sleep+0x95/0x190
          nov. 14 14:09:09 cyclope kernel:  ? aa_new_null_profile+0x50a/0x960
          nov. 14 14:09:09 cyclope kernel:  __mutex_lock+0x13e/0x1a20
          nov. 14 14:09:09 cyclope kernel:  ? aa_new_null_profile+0x50a/0x960
          nov. 14 14:09:09 cyclope kernel:  ? save_stack+0x43/0xd0
          nov. 14 14:09:09 cyclope kernel:  ? kmem_cache_alloc_trace+0x13f/0x290
          nov. 14 14:09:09 cyclope kernel:  ? mutex_lock_io_nested+0x1880/0x1880
          nov. 14 14:09:09 cyclope kernel:  ? profile_transition+0x932/0x2d40
          nov. 14 14:09:09 cyclope kernel:  ? apparmor_bprm_set_creds+0x1479/0x1f70
          nov. 14 14:09:09 cyclope kernel:  ? security_bprm_set_creds+0x5a/0x80
          nov. 14 14:09:09 cyclope kernel:  ? prepare_binprm+0x366/0x980
          nov. 14 14:09:09 cyclope kernel:  ? do_execveat_common.isra.30+0x12a9/0x2350
          nov. 14 14:09:09 cyclope kernel:  ? SyS_execve+0x2c/0x40
          nov. 14 14:09:09 cyclope kernel:  ? do_syscall_64+0x228/0x650
          nov. 14 14:09:09 cyclope kernel:  ? entry_SYSCALL64_slow_path+0x25/0x25
          nov. 14 14:09:09 cyclope kernel:  ? deactivate_slab.isra.62+0x49d/0x5e0
          nov. 14 14:09:09 cyclope kernel:  ? save_stack_trace+0x16/0x20
          nov. 14 14:09:09 cyclope kernel:  ? init_object+0x88/0x90
          nov. 14 14:09:09 cyclope kernel:  ? ___slab_alloc+0x520/0x590
          nov. 14 14:09:09 cyclope kernel:  ? ___slab_alloc+0x520/0x590
          nov. 14 14:09:09 cyclope kernel:  ? aa_alloc_proxy+0xab/0x200
          nov. 14 14:09:09 cyclope kernel:  ? lock_downgrade+0x7e0/0x7e0
          nov. 14 14:09:09 cyclope kernel:  ? memcg_kmem_get_cache+0x970/0x970
          nov. 14 14:09:09 cyclope kernel:  ? kasan_unpoison_shadow+0x35/0x50
          nov. 14 14:09:09 cyclope kernel:  ? kasan_unpoison_shadow+0x35/0x50
          nov. 14 14:09:09 cyclope kernel:  ? kasan_kmalloc+0xad/0xe0
          nov. 14 14:09:09 cyclope kernel:  ? aa_alloc_proxy+0xab/0x200
          nov. 14 14:09:09 cyclope kernel:  ? kmem_cache_alloc_trace+0x13f/0x290
          nov. 14 14:09:09 cyclope kernel:  ? aa_alloc_proxy+0xab/0x200
          nov. 14 14:09:09 cyclope kernel:  ? aa_alloc_proxy+0xab/0x200
          nov. 14 14:09:09 cyclope kernel:  ? _raw_spin_unlock+0x22/0x30
          nov. 14 14:09:09 cyclope kernel:  ? vec_find+0xa0/0xa0
          nov. 14 14:09:09 cyclope kernel:  ? aa_label_init+0x6f/0x230
          nov. 14 14:09:09 cyclope kernel:  ? __label_insert+0x3e0/0x3e0
          nov. 14 14:09:09 cyclope kernel:  ? kmem_cache_alloc_trace+0x13f/0x290
          nov. 14 14:09:09 cyclope kernel:  ? aa_alloc_profile+0x58/0x200
          nov. 14 14:09:09 cyclope kernel:  mutex_lock_nested+0x16/0x20
          nov. 14 14:09:09 cyclope kernel:  ? mutex_lock_nested+0x16/0x20
          nov. 14 14:09:09 cyclope kernel:  aa_new_null_profile+0x50a/0x960
          nov. 14 14:09:09 cyclope kernel:  ? aa_fqlookupn_profile+0xdc0/0xdc0
          nov. 14 14:09:09 cyclope kernel:  ? aa_compute_fperms+0x4b5/0x640
          nov. 14 14:09:09 cyclope kernel:  ? disconnect.isra.2+0x1b0/0x1b0
          nov. 14 14:09:09 cyclope kernel:  ? aa_str_perms+0x8d/0xe0
          nov. 14 14:09:09 cyclope kernel:  profile_transition+0x932/0x2d40
          nov. 14 14:09:09 cyclope kernel:  ? up_read+0x1a/0x40
          nov. 14 14:09:09 cyclope kernel:  ? ext4_xattr_get+0x15c/0xaf0 [ext4]
          nov. 14 14:09:09 cyclope kernel:  ? x_table_lookup+0x190/0x190
          nov. 14 14:09:09 cyclope kernel:  ? ext4_xattr_ibody_get+0x590/0x590 [ext4]
          nov. 14 14:09:09 cyclope kernel:  ? sched_clock+0x9/0x10
          nov. 14 14:09:09 cyclope kernel:  ? sched_clock+0x9/0x10
          nov. 14 14:09:09 cyclope kernel:  ? ext4_xattr_security_get+0x1a/0x20 [ext4]
          nov. 14 14:09:09 cyclope kernel:  ? __vfs_getxattr+0x6d/0xa0
          nov. 14 14:09:09 cyclope kernel:  ? get_vfs_caps_from_disk+0x114/0x720
          nov. 14 14:09:09 cyclope kernel:  ? sched_clock+0x9/0x10
          nov. 14 14:09:09 cyclope kernel:  ? sched_clock+0x9/0x10
          nov. 14 14:09:09 cyclope kernel:  ? tsc_resume+0x10/0x10
          nov. 14 14:09:09 cyclope kernel:  ? get_vfs_caps_from_disk+0x720/0x720
          nov. 14 14:09:09 cyclope kernel:  ? native_sched_clock_from_tsc+0x201/0x2b0
          nov. 14 14:09:09 cyclope kernel:  ? sched_clock+0x9/0x10
          nov. 14 14:09:09 cyclope kernel:  ? sched_clock_cpu+0x1b/0x170
          nov. 14 14:09:09 cyclope kernel:  ? find_held_lock+0x3c/0x1e0
          nov. 14 14:09:09 cyclope kernel:  ? rb_insert_color_cached+0x1660/0x1660
          nov. 14 14:09:09 cyclope kernel:  apparmor_bprm_set_creds+0x1479/0x1f70
          nov. 14 14:09:09 cyclope kernel:  ? sched_clock+0x9/0x10
          nov. 14 14:09:09 cyclope kernel:  ? handle_onexec+0x31d0/0x31d0
          nov. 14 14:09:09 cyclope kernel:  ? tsc_resume+0x10/0x10
          nov. 14 14:09:09 cyclope kernel:  ? graph_lock+0xd0/0xd0
          nov. 14 14:09:09 cyclope kernel:  ? tsc_resume+0x10/0x10
          nov. 14 14:09:09 cyclope kernel:  ? sched_clock_cpu+0x1b/0x170
          nov. 14 14:09:09 cyclope kernel:  ? sched_clock+0x9/0x10
          nov. 14 14:09:09 cyclope kernel:  ? sched_clock+0x9/0x10
          nov. 14 14:09:09 cyclope kernel:  ? sched_clock_cpu+0x1b/0x170
          nov. 14 14:09:09 cyclope kernel:  ? find_held_lock+0x3c/0x1e0
          nov. 14 14:09:09 cyclope kernel:  security_bprm_set_creds+0x5a/0x80
          nov. 14 14:09:09 cyclope kernel:  prepare_binprm+0x366/0x980
          nov. 14 14:09:09 cyclope kernel:  ? install_exec_creds+0x150/0x150
          nov. 14 14:09:09 cyclope kernel:  ? __might_fault+0x89/0xb0
          nov. 14 14:09:09 cyclope kernel:  ? up_read+0x40/0x40
          nov. 14 14:09:09 cyclope kernel:  ? get_user_arg_ptr.isra.18+0x2c/0x70
          nov. 14 14:09:09 cyclope kernel:  ? count.isra.20.constprop.32+0x7c/0xf0
          nov. 14 14:09:09 cyclope kernel:  do_execveat_common.isra.30+0x12a9/0x2350
          nov. 14 14:09:09 cyclope kernel:  ? prepare_bprm_creds+0x100/0x100
          nov. 14 14:09:09 cyclope kernel:  ? _raw_spin_unlock+0x22/0x30
          nov. 14 14:09:09 cyclope kernel:  ? deactivate_slab.isra.62+0x49d/0x5e0
          nov. 14 14:09:09 cyclope kernel:  ? save_stack_trace+0x16/0x20
          nov. 14 14:09:09 cyclope kernel:  ? init_object+0x88/0x90
          nov. 14 14:09:09 cyclope kernel:  ? ___slab_alloc+0x520/0x590
          nov. 14 14:09:09 cyclope kernel:  ? ___slab_alloc+0x520/0x590
          nov. 14 14:09:09 cyclope kernel:  ? kasan_check_write+0x14/0x20
          nov. 14 14:09:09 cyclope kernel:  ? memcg_kmem_get_cache+0x970/0x970
          nov. 14 14:09:09 cyclope kernel:  ? kasan_unpoison_shadow+0x35/0x50
          nov. 14 14:09:09 cyclope kernel:  ? glob_match+0x730/0x730
          nov. 14 14:09:09 cyclope kernel:  ? kmem_cache_alloc+0x225/0x280
          nov. 14 14:09:09 cyclope kernel:  ? getname_flags+0xb8/0x510
          nov. 14 14:09:09 cyclope kernel:  ? mm_fault_error+0x2e0/0x2e0
          nov. 14 14:09:09 cyclope kernel:  ? getname_flags+0xf6/0x510
          nov. 14 14:09:09 cyclope kernel:  ? ptregs_sys_vfork+0x10/0x10
          nov. 14 14:09:09 cyclope kernel:  SyS_execve+0x2c/0x40
          nov. 14 14:09:09 cyclope kernel:  do_syscall_64+0x228/0x650
          nov. 14 14:09:09 cyclope kernel:  ? syscall_return_slowpath+0x2f0/0x2f0
          nov. 14 14:09:09 cyclope kernel:  ? syscall_return_slowpath+0x167/0x2f0
          nov. 14 14:09:09 cyclope kernel:  ? prepare_exit_to_usermode+0x220/0x220
          nov. 14 14:09:09 cyclope kernel:  ? prepare_exit_to_usermode+0xda/0x220
          nov. 14 14:09:09 cyclope kernel:  ? perf_trace_sys_enter+0x1060/0x1060
          nov. 14 14:09:09 cyclope kernel:  ? __put_user_4+0x1c/0x30
          nov. 14 14:09:09 cyclope kernel:  entry_SYSCALL64_slow_path+0x25/0x25
          nov. 14 14:09:09 cyclope kernel: RIP: 0033:0x7f9320f23637
          nov. 14 14:09:09 cyclope kernel: RSP: 002b:00007fff783be338 EFLAGS: 00000202 ORIG_RAX: 000000000000003b
          nov. 14 14:09:09 cyclope kernel: RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f9320f23637
          nov. 14 14:09:09 cyclope kernel: RDX: 0000558c35002a70 RSI: 0000558c3505bd10 RDI: 0000558c35018b90
          nov. 14 14:09:09 cyclope kernel: RBP: 0000558c34b63ae8 R08: 0000558c3505bd10 R09: 0000000000000080
          nov. 14 14:09:09 cyclope kernel: R10: 0000000000000095 R11: 0000000000000202 R12: 0000000000000001
          nov. 14 14:09:09 cyclope kernel: R13: 0000558c35018b90 R14: 0000558c3505bd18 R15: 0000558c3505bd10
      
      Fixes: 4227c333 ("apparmor: Move path lookup to using preallocated buffers")
      BugLink: http://bugs.launchpad.net/bugs/173228
      Reported-by: Alban Browaeys <prahal@yahoo.com>
      Signed-off-by: John Johansen <john.johansen@canonical.com>
    • apparmor: fix profile attachment for special unconfined profiles · 06d426d1
      John Johansen authored
      It used to be that unconfined would never attach. However, that is no
      longer the case: some special profiles that are not the namespace's
      unconfined profile can be marked as unconfined and may have an
      attachment.
      
      Fixes: f1bd9041 ("apparmor: add the base fns() for domain labels")
      Signed-off-by: John Johansen <john.johansen@canonical.com>
    • apparmor: ensure that undecidable profile attachments fail · 844b8292
      John Johansen authored
      Profiles that have an undecidable overlap in their attachments are
      being incorrectly handled. Instead of failing to attach, the first one
      encountered is being used.
      
      e.g.
        profile A /** { .. }
        profile B /*foo { .. }
      
      have an unresolvable longest-left attachment: they both have an exact
      match on / and then an overlapping expression that has no clear
      winner.
      
      Currently the winner is the profile that is loaded first, which
      can result in non-deterministic behavior. Instead, in this situation
      the exec should fail.
      
      Fixes: 898127c3 ("AppArmor: functions for domain transitions")
      Signed-off-by: John Johansen <john.johansen@canonical.com>
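
      The attachment rule from the commit message above can be sketched in a
      simplified model. The kernel matches attachments with a DFA; the sketch
      below instead assumes a toy glob syntax ("**" matches anything, "*"
      matches anything except "/") and scores a match by the length of the
      literal prefix before the first wildcard. The names find_attachment,
      glob_to_regex and AmbiguousAttachment are hypothetical.

```python
import re


class AmbiguousAttachment(Exception):
    """Two or more profiles match equally well, so the exec should fail."""


def glob_to_regex(pattern):
    """Translate the toy glob syntax into a compiled regular expression."""
    out = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            out.append(".*")          # "**" crosses directory separators
            i += 2
        elif pattern[i] == "*":
            out.append("[^/]*")       # "*" stays within one path component
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(out))


def literal_prefix_len(pattern):
    """Length of the exact-match part before the first wildcard."""
    idx = pattern.find("*")
    return len(pattern) if idx < 0 else idx


def find_attachment(profiles, exec_name):
    """Return the winning profile name, None if nothing matches.

    Raises AmbiguousAttachment when the best score is shared, instead of
    silently picking whichever profile was listed (loaded) first.
    """
    best_len = -1
    winners = []
    for name, pattern in profiles:
        if not glob_to_regex(pattern).fullmatch(exec_name):
            continue
        score = literal_prefix_len(pattern)
        if score > best_len:
            best_len, winners = score, [name]
        elif score == best_len:
            winners.append(name)
    if len(winners) > 1:
        raise AmbiguousAttachment(", ".join(winners))
    return winners[0] if winners else None
```

      With profiles A (/**) and B (/*foo), executing /foo matches both with
      the same one-character literal prefix, so the lookup raises rather
      than depending on load order.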
    • apparmor: fix leak of null profile name if profile allocation fails · 4633307e
      John Johansen authored
      Fixes: d07881d2 ("apparmor: move new_null_profile to after profile lookup fns()")
      Reported-by: Seth Arnold <seth.arnold@canonical.com>
      Signed-off-by: John Johansen <john.johansen@canonical.com>
    • apparmor: remove unused redundant variable stop · e3bcfc14
      Colin Ian King authored
      The boolean variable 'stop' is being set but never read. This
      is a redundant variable and can be removed.
      
      Cleans up clang warning: Value stored to 'stop' is never read
      Signed-off-by: Colin Ian King <colin.king@canonical.com>
      Signed-off-by: John Johansen <john.johansen@canonical.com>
    • apparmor: Fix bool initialization/comparison · 954317fe
      Thomas Meyer authored
      Bool initializations should use true and false. Bool tests don't need
      comparisons.
      Signed-off-by: Thomas Meyer <thomas@m3y3r.de>
      Signed-off-by: John Johansen <john.johansen@canonical.com>
    • apparmor: initialized returned struct aa_perms · 7bba39ae
      Arnd Bergmann authored
      gcc-4.4 points out suspicious code in compute_mnt_perms, where
      the aa_perms structure is only partially initialized before getting
      returned:
      
      security/apparmor/mount.c: In function 'compute_mnt_perms':
      security/apparmor/mount.c:227: error: 'perms.prompt' is used uninitialized in this function
      security/apparmor/mount.c:227: error: 'perms.hide' is used uninitialized in this function
      security/apparmor/mount.c:227: error: 'perms.cond' is used uninitialized in this function
      security/apparmor/mount.c:227: error: 'perms.complain' is used uninitialized in this function
      security/apparmor/mount.c:227: error: 'perms.stop' is used uninitialized in this function
      security/apparmor/mount.c:227: error: 'perms.deny' is used uninitialized in this function
      
      Returning or assigning partially initialized structures is a bit tricky,
      in particular it is explicitly allowed in c99 to assign a partially
      initialized structure to another, as long as only members are read that
      have been initialized earlier. Looking at what various compilers do here,
      the version that produced the warning copied uninitialized stack data,
      while newer versions (and also clang) either set the other members to
      zero or don't update the parts of the return buffer that are not modified
      in the temporary structure, but they never warn about this.
      
      In case of apparmor, it seems better to be a little safer and always
      initialize the aa_perms structure. Most users already do that, this
      changes the remaining ones, including the one instance that I got the
      warning for.
      
      Fixes: fa488437d0f9 ("apparmor: add mount mediation")
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Reviewed-by: Seth Arnold <seth.arnold@canonical.com>
      Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
      Signed-off-by: John Johansen <john.johansen@canonical.com>
    • apparmor: fix spelling mistake: "resoure" -> "resource" · 5933a627
      Colin Ian King authored
      Trivial fix to spelling mistake in comment and also with text in
      audit_resource call.
      Signed-off-by: Colin Ian King <colin.king@canonical.com>
      Signed-off-by: John Johansen <john.johansen@canonical.com>
  2. 29 Oct, 2017 29 commits
    • Linux 4.14-rc7 · 0b07194b
      Linus Torvalds authored
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 19e12196
      Linus Torvalds authored
      Pull networking fixes from David Miller:
      
       1) Fix route leak in xfrm_bundle_create().
      
       2) In mac80211, validate user rate mask before configuring it. From
          Johannes Berg.
      
       3) Properly enforce memory limits in fair queueing code, from Toke
          Hoiland-Jorgensen.
      
       4) Fix lockdep splat in inet_csk_route_req(), from Eric Dumazet.
      
       5) Fix TSO header allocation and management in mvpp2 driver, from Yan
          Markman.
      
       6) Don't take socket lock in BH handler in strparser code, from Tom
          Herbert.
      
       7) Don't show sockets from other namespaces in AF_UNIX code, from
          Andrei Vagin.
      
       8) Fix double free in error path of tap_open(), from Girish Moodalbail.
      
       9) Fix TX map failure path in igb and ixgbe, from Jean-Philippe Brucker
          and Alexander Duyck.
      
      10) Fix DCB mode programming in stmmac driver, from Jose Abreu.
      
      11) Fix err_count handling in various tunnels (ipip, ip6_gre). From Xin
          Long.
      
      12) Properly align SKB head before building SKB in tuntap, from Jason
          Wang.
      
      13) Avoid matching qdiscs with a zero handle during lookups, from Cong
          Wang.
      
      14) Fix various endianness bugs in sctp, from Xin Long.
      
      15) Fix tc filter callback races and add selftests which trigger the
          problem, from Cong Wang.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (73 commits)
        selftests: Introduce a new test case to tc testsuite
        selftests: Introduce a new script to generate tc batch file
        net_sched: fix call_rcu() race on act_sample module removal
        net_sched: add rtnl assertion to tcf_exts_destroy()
        net_sched: use tcf_queue_work() in tcindex filter
        net_sched: use tcf_queue_work() in rsvp filter
        net_sched: use tcf_queue_work() in route filter
        net_sched: use tcf_queue_work() in u32 filter
        net_sched: use tcf_queue_work() in matchall filter
        net_sched: use tcf_queue_work() in fw filter
        net_sched: use tcf_queue_work() in flower filter
        net_sched: use tcf_queue_work() in flow filter
        net_sched: use tcf_queue_work() in cgroup filter
        net_sched: use tcf_queue_work() in bpf filter
        net_sched: use tcf_queue_work() in basic filter
        net_sched: introduce a workqueue for RCU callbacks of tc filter
        sctp: fix some type cast warnings introduced since very beginning
        sctp: fix a type cast warnings that causes a_rwnd gets the wrong value
        sctp: fix some type cast warnings introduced by transport rhashtable
        sctp: fix some type cast warnings introduced by stream reconf
        ...
    • Merge branch 'net_sched-fix-races-with-RCU-callbacks' · 6c325f4e
      David S. Miller authored
      Cong Wang says:
      
      ====================
      net_sched: fix races with RCU callbacks
      
      Recently, the RCU callbacks used in TC filters and TC actions keep
      drawing my attention; they have introduced at least 4 race condition bugs:
      
      1. A simple one fixed by Daniel:
      
      commit c78e1746
      Author: Daniel Borkmann <daniel@iogearbox.net>
      Date:   Wed May 20 17:13:33 2015 +0200
      
          net: sched: fix call_rcu() race on classifier module unloads
      
      2. A very nasty one fixed by me:
      
      commit 1697c4bb
      Author: Cong Wang <xiyou.wangcong@gmail.com>
      Date:   Mon Sep 11 16:33:32 2017 -0700
      
          net_sched: carefully handle tcf_block_put()
      
      3. Two more bugs found by Chris:
      https://patchwork.ozlabs.org/patch/826696/
      https://patchwork.ozlabs.org/patch/826695/
      
      Usually RCU callbacks are simple. For TC filters and actions, however,
      they are complex because at least TC actions could be destroyed
      together with the TC filter in one callback. RCU callbacks are also
      invoked in BH context, and without locking they run in parallel. All of
      these contribute to the cause of these nasty bugs.
      
      Alternatively, we could also:
      
      a) Introduce a spinlock to serialize these RCU callbacks. But as I
      said in commit 1697c4bb ("net_sched: carefully handle
      tcf_block_put()"), it is very hard to do because of tcf_chain_dump().
      Potentially we need to do a lot of work to make it possible (if not
      impossible).
      
      b) Just get rid of these RCU callbacks, because they are not
      necessary at all, callers of these call_rcu() are all on slow paths
      and holding RTNL lock, so blocking is allowed in their contexts.
      However, David and Eric dislike adding synchronize_rcu() here.
      
      As suggested by Paul, we could defer the work to a workqueue and
      regain permission to hold RTNL without any performance
      impact. However, in tcf_block_put() we could deadlock when
      flushing the workqueue while holding the RTNL lock. The trick here is to
      defer the work itself to the workqueue and queue it after all
      other works, so that we keep the same ordering and avoid any
      use-after-free. Please see the first patch for details.
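
      The ordering trick in the paragraph above can be sketched in user space.
      This is a minimal model under stated assumptions, not the kernel
      implementation: a threading.Lock stands in for RTNL, a single worker
      thread draining a FIFO queue stands in for an ordered workqueue, and
      the works are queued directly rather than from RCU callbacks. The names
      run_demo and block_put are hypothetical.

```python
import queue
import threading


def run_demo(filters):
    """Return the order in which the deferred works ran."""
    rtnl = threading.Lock()      # stand-in for the RTNL lock
    works = queue.Queue()        # FIFO with one worker, so works run in queue order
    log = []

    def worker():
        while True:
            work = works.get()
            if work is None:     # sentinel: stop the worker
                return
            with rtnl:           # process context, so taking the lock is allowed
                work()

    def block_put():
        # Called with rtnl held. Flushing the queue here would deadlock,
        # because every queued work needs rtnl as well.
        for name in filters:
            works.put(lambda name=name: log.append("destroy " + name))
        # Defer the final teardown too, so it runs after all earlier works.
        works.put(lambda: log.append("free block"))

    thread = threading.Thread(target=worker)
    thread.start()
    with rtnl:
        block_put()
    works.put(None)
    thread.join()
    return log
```

      Because the final teardown is queued behind the per-filter works on the
      same ordered queue, it cannot run before them, which is the property
      that avoids the use-after-free in this model.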
      
      Patch 1 introduces the infrastructure, patch 2~12 move each
      tc filter to the new tc filter workqueue, patch 13 adds
      an assertion to catch potential bugs like this, patch 14
      closes another rcu callback race, patch 15 and patch 16 add
      new test cases.
      ====================
      Reported-by: Chris Mi <chrism@mellanox.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests: Introduce a new test case to tc testsuite · 31c2611b
      Chris Mi authored
      This patchset fixes a tc bug. This patch adds the test case
      that reproduces the bug. To run this test case, the user should specify
      an existing NIC device:
        # sudo ./tdc.py -d enp4s0f0
      
      This test case belongs to the category "flower". If the user doesn't
      specify a NIC device, the test cases belonging to "flower" will not be run.
      
      In this test case, we create 1M filters and all filters share the same
      action. When destroying all filters, the kernel should not panic. It takes
      about 18s to run.
      Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Acked-by: Lucas Bates <lucasb@mojatatu.com>
      Signed-off-by: Chris Mi <chrism@mellanox.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests: Introduce a new script to generate tc batch file · 7f071998
      Chris Mi authored
        # ./tdc_batch.py -h
        usage: tdc_batch.py [-h] [-n NUMBER] [-o] [-s] [-p] device file
      
        TC batch file generator
      
        positional arguments:
          device                device name
          file                  batch file name
      
        optional arguments:
          -h, --help            show this help message and exit
          -n NUMBER, --number NUMBER
                                how many lines in batch file
          -o, --skip_sw         skip_sw (offload), by default skip_hw
          -s, --share_action    all filters share the same action
          -p, --prio            all filters have different prio
      Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Acked-by: Lucas Bates <lucasb@mojatatu.com>
      Signed-off-by: Chris Mi <chrism@mellanox.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
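
      The help output of tdc_batch.py quoted above corresponds to a
      command-line interface that could be declared roughly as follows. This
      is a reconstruction from the help text only, not the script's actual
      source: build_parser is a hypothetical name, and treating NUMBER as an
      integer with no default is an assumption.

```python
import argparse


def build_parser():
    """Build a parser mirroring the options listed in the help output."""
    parser = argparse.ArgumentParser(
        prog="tdc_batch.py", description="TC batch file generator")
    parser.add_argument("device", help="device name")
    parser.add_argument("file", help="batch file name")
    # Assumption: NUMBER is an integer; the help text gives no default.
    parser.add_argument("-n", "--number", type=int,
                        help="how many lines in batch file")
    parser.add_argument("-o", "--skip_sw", action="store_true",
                        help="skip_sw (offload), by default skip_hw")
    parser.add_argument("-s", "--share_action", action="store_true",
                        help="all filters share the same action")
    parser.add_argument("-p", "--prio", action="store_true",
                        help="all filters have different prio")
    return parser
```

      For example, parsing ["-n", "1000000", "-s", "enp4s0f0", "batch.txt"]
      with this parser yields number=1000000 and share_action=True, which
      lines up with the 1M shared-action filters described in the test case
      commit.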
    • net_sched: fix call_rcu() race on act_sample module removal · 46e235c1
      Cong Wang authored
      Similar to commit c78e1746
      ("net: sched: fix call_rcu() race on classifier module unloads"),
      we need to wait for flying RCU callback tcf_sample_cleanup_rcu().
      
      Cc: Yotam Gigi <yotamg@mellanox.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net_sched: add rtnl assertion to tcf_exts_destroy() · 2d132eba
      Cong Wang authored
      After previous patches, it is now safe to claim that
      tcf_exts_destroy() is always called with RTNL lock.
      
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net_sched: use tcf_queue_work() in tcindex filter · 27ce4f05
      Cong Wang authored
      Defer the tcf_exts_destroy() in RCU callback to
      tc filter workqueue and get RTNL lock.
      Reported-by: Chris Mi <chrism@mellanox.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net_sched: use tcf_queue_work() in rsvp filter · d4f84a41
      Cong Wang authored
      Defer the tcf_exts_destroy() in RCU callback to
      tc filter workqueue and get RTNL lock.
      Reported-by: Chris Mi <chrism@mellanox.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net_sched: use tcf_queue_work() in route filter · c2f3f31d
      Cong Wang authored
      Defer the tcf_exts_destroy() in RCU callback to
      tc filter workqueue and get RTNL lock.
      Reported-by: Chris Mi <chrism@mellanox.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net_sched: use tcf_queue_work() in u32 filter · c0d378ef
      Cong Wang authored
      Defer the tcf_exts_destroy() in RCU callback to
      tc filter workqueue and get RTNL lock.
      Reported-by: Chris Mi <chrism@mellanox.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net_sched: use tcf_queue_work() in matchall filter · df2735ee
      Cong Wang authored
      Defer the tcf_exts_destroy() in RCU callback to
      tc filter workqueue and get RTNL lock.
      Reported-by: Chris Mi <chrism@mellanox.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net_sched: use tcf_queue_work() in fw filter · e071dff2
      Cong Wang authored
      Defer the tcf_exts_destroy() in RCU callback to
      tc filter workqueue and get RTNL lock.
      Reported-by: Chris Mi <chrism@mellanox.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net_sched: use tcf_queue_work() in flower filter · 0552c8af
      Cong Wang authored
      Defer the tcf_exts_destroy() in RCU callback to
      tc filter workqueue and get RTNL lock.
      Reported-by: Chris Mi <chrism@mellanox.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net_sched: use tcf_queue_work() in flow filter · 94cdb475
      Cong Wang authored
      Defer the tcf_exts_destroy() in RCU callback to
      tc filter workqueue and get RTNL lock.
      Reported-by: Chris Mi <chrism@mellanox.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net_sched: use tcf_queue_work() in cgroup filter · b1b5b04f
      Cong Wang authored
      Defer the tcf_exts_destroy() in RCU callback to
      tc filter workqueue and get RTNL lock.
      Reported-by: Chris Mi <chrism@mellanox.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Signed-off-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      b1b5b04f
    • net_sched: use tcf_queue_work() in bpf filter · e910af67
      Cong Wang authored
      Defer the tcf_exts_destroy() in RCU callback to
      tc filter workqueue and get RTNL lock.
      Reported-by: Chris Mi <chrism@mellanox.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e910af67
    • net_sched: use tcf_queue_work() in basic filter · c96a4838
      Cong Wang authored
      Defer the tcf_exts_destroy() in RCU callback to
      tc filter workqueue and get RTNL lock.
      Reported-by: Chris Mi <chrism@mellanox.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c96a4838
    • net_sched: introduce a workqueue for RCU callbacks of tc filter · 7aa0045d
      Cong Wang authored
      This patch introduces a dedicated workqueue for tc filters
      so that each tc filter's RCU callback can defer its
      action destroy work to this workqueue. The helper
      tcf_queue_work() is introduced for them to use.
      
      Because we hold the RTNL lock when calling tcf_block_put(), we
      cannot simply flush pending works inside it. Instead, we have to
      defer it again to this workqueue and make sure all in-flight RCU
      callbacks have already queued their work before this one. In
      other words, we ensure this is the last one to execute, to
      prevent any use-after-free.
      
      On the other hand, this makes tcf_block_put() ugly and
      harder to understand. Since David and Eric strongly dislike
      adding synchronize_rcu(), this is probably the only
      solution that could make everyone happy.
      
      Please also see the code comments below.
      Reported-by: Chris Mi <chrism@mellanox.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      7aa0045d
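      The ordering argument in the commit above can be illustrated with a
      small userspace sketch. It assumes a single FIFO queue stands in for
      the tc filter workqueue; every name in it (work_queue, queue_work_fifo,
      toy_block and so on) is hypothetical and not a kernel API. It only
      shows why queuing the block teardown after all filter teardown works
      avoids a use-after-free.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_WORK 8

/* Hypothetical work item: a callback plus its argument. */
struct work {
    void (*fn)(void *arg);
    void *arg;
};

/* Hypothetical single-threaded FIFO queue modelling an ordered workqueue. */
struct work_queue {
    struct work items[MAX_WORK];
    size_t head, tail;
};

/* Append a work item; items run later in the order they were queued. */
static void queue_work_fifo(struct work_queue *q, void (*fn)(void *), void *arg)
{
    assert(q->tail < MAX_WORK);
    q->items[q->tail].fn = fn;
    q->items[q->tail].arg = arg;
    q->tail++;
}

/* Run every queued item, oldest first. */
static void flush_work_fifo(struct work_queue *q)
{
    while (q->head < q->tail) {
        struct work *w = &q->items[q->head++];
        w->fn(w->arg);
    }
}

/* Toy block: tracks live filters and whether it was touched after free. */
struct toy_block {
    int live_filters;
    int freed;
    int use_after_free;
};

/* Deferred filter teardown: it touches the block, so it must run first. */
static void filter_destroy_work(void *arg)
{
    struct toy_block *b = arg;

    if (b->freed)
        b->use_after_free = 1;
    b->live_filters--;
}

/* Deferred block teardown: queued last, so it runs after all filter works. */
static void block_free_work(void *arg)
{
    struct toy_block *b = arg;

    b->freed = 1;
}

/* Queue n filter teardowns, then the block teardown, then run them all. */
static struct toy_block simulate_block_put(int n_filters)
{
    struct work_queue q = { .head = 0, .tail = 0 };
    struct toy_block b = { .live_filters = n_filters };

    for (int i = 0; i < n_filters; i++)
        queue_work_fifo(&q, filter_destroy_work, &b);
    queue_work_fifo(&q, block_free_work, &b);
    flush_work_fifo(&q);
    return b;
}
```

      Calling simulate_block_put(3) leaves no live filters and never sets
      use_after_free, because the block teardown is the last item in the
      queue. Queuing block_free_work first would trip the flag.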
    • Merge branch 'sctp-endianness-fixes' · 8c83c885
      David S. Miller authored
      Xin Long says:
      
      ====================
      sctp: a bunch of fixes for some sparse warnings
      
      As Eric noticed, when running 'make C=2 M=net/sctp/', plenty of
      warnings and errors reported by sparse appear. They are all
      endianness and type cast problems.
      
      Most of them are harmless warnings, but some might be
      real bugs.
      
      This patchset fixes them with four patches, grouped roughly by
      how they were introduced.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      8c83c885
    • sctp: fix some type cast warnings introduced since very beginning · 978aa047
      Xin Long authored
      These warnings were found by running 'make C=2 M=net/sctp/'.
      They have been there since the very beginning.
      
      Note that after this patch, there is still one warning left in
      sctp_outq_flush():
        sctp_chunk_fail(chunk, SCTP_ERROR_INV_STRM)
      
      Since it has been moved to sctp_stream_outq_migrate on net-next,
      I will post the fix for it after the merge is done, to avoid
      extra work when merging net-next to net.
      Reported-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      978aa047
    • sctp: fix a type cast warnings that causes a_rwnd gets the wrong value · f6fc6bc0
      Xin Long authored
      These warnings were found by running 'make C=2 M=net/sctp/'.
      
      Commit d4d6fb57 ("sctp: Try not to change a_rwnd when faking a
      SACK from SHUTDOWN.") expected to use the peer's old rwnd and add
      our flight size to the a_rwnd. But with the wrong endianness, it
      may not work as expected.
      
      So fix it by converting to the right value.
      
      Fixes: d4d6fb57 ("sctp: Try not to change a_rwnd when faking a SACK from SHUTDOWN.")
      Reported-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f6fc6bc0
    • sctp: fix some type cast warnings introduced by transport rhashtable · 8d32503e
      Xin Long authored
      These warnings were found by running 'make C=2 M=net/sctp/'.
      
      They were introduced by not being aware of the endianness of the
      port when coding the transport rhashtable patches.
      
      Fixes: 7fda702f ("sctp: use new rhlist interface on sctp transport rhashtable")
      Reported-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      8d32503e
    • sctp: fix some type cast warnings introduced by stream reconf · 1da4fc97
      Xin Long authored
      These warnings were found by running 'make C=2 M=net/sctp/'.
      
      They were introduced by not being aware of endianness when coding
      the stream reconf patches.
      
      Since commit c0d8bab6 ("sctp: add get and set sockopt for
      reconf_enable") enabled the stream reconf feature for users, the
      Fixes tag below refers to it.
      
      Fixes: c0d8bab6 ("sctp: add get and set sockopt for reconf_enable")
      Reported-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1da4fc97
    • net_sched: avoid matching qdisc with zero handle · 50317fce
      Cong Wang authored
      Davide found the following script triggers a NULL pointer
      dereference:
      
      ip l a name eth0 type dummy
      tc q a dev eth0 parent :1 handle 1: htb
      
      This is because a freshly created netdevice has noop_qdisc
      attached, and when 'parent :1' is passed, the kernel actually
      tries to match the major handle, which is 0. noop_qdisc also
      has handle 0, so it is matched by mistake. Commit 69012ae4
      tried to fix a similar bug but still missed this case.
      
      Handle 0 is not a valid handle and should simply be skipped. In
      fact, the kernel uses it as TC_H_UNSPEC.
      
      Fixes: 69012ae4 ("net: sched: fix handling of singleton qdiscs with qdisc_hash")
      Fixes: 59cc1f61 ("net: sched:convert qdisc linked list to hashtable")
      Reported-by: Davide Caratti <dcaratti@redhat.com>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      50317fce
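      The handle arithmetic described above can be sketched in userspace C.
      The TC_H_* macros below are redefined locally to mirror the ones in
      linux/pkt_sched.h; the toy_qdisc table and toy_lookup() are
      hypothetical stand-ins for the kernel's qdisc lookup, shown only to
      illustrate why a zero handle has to be skipped.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local copies mirroring the definitions in linux/pkt_sched.h. */
#define TC_H_MAJ_MASK 0xFFFF0000U
#define TC_H_MAJ(h)   ((h) & TC_H_MAJ_MASK)
#define TC_H_UNSPEC   0U

/* Hypothetical stand-in for a qdisc entry keyed by its handle. */
struct toy_qdisc {
    uint32_t handle;
    const char *name;
};

/*
 * Look up a qdisc by handle. A zero handle (TC_H_UNSPEC) never matches,
 * so a placeholder qdisc whose handle is 0 cannot be found by accident.
 */
static const struct toy_qdisc *toy_lookup(const struct toy_qdisc *tbl,
                                          size_t n, uint32_t handle)
{
    assert(tbl != NULL || n == 0);
    if (handle == TC_H_UNSPEC)
        return NULL;
    for (size_t i = 0; i < n; i++) {
        if (tbl[i].handle == handle)
            return &tbl[i];
    }
    return NULL;
}
```

      With 'parent :1' the parent value is 0x00000001, so TC_H_MAJ() gives
      0. Without the early return, a table entry with handle 0 (the role
      noop_qdisc plays in the report) would be returned for that lookup.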
    • sctp: reset owner sk for data chunks on out queues when migrating a sock · d04adf1b
      Xin Long authored
      Currently, when migrating a sock to another one in sctp_sock_migrate(),
      it only resets the owner sk for the data in receive queues, not for
      the chunks on out queues.
      
      As a result, the data chunks' length on the sock is not consistent
      with the sk's sk_wmem_alloc. When closing the sock or freeing these
      chunks, the old sk would never be freed, and the new sock may crash
      due to the sk_wmem_alloc overflow.
      
      syzbot found this issue with this series:
      
        r0 = socket$inet_sctp()
        sendto$inet(r0)
        listen(r0)
        accept4(r0)
        close(r0)
      
      Although listen() should have returned an error while a TCP-style
      socket is connecting (I may fix this in another patch), the issue
      can also be reproduced by peeling off an assoc.
      
      This issue has been there since the very beginning.
      
      This patch resets the owner sk for the chunks on out queues so that
      sk_wmem_alloc has the correct value after accepting a sock or
      peeling off an assoc to a sock.
      
      Note that when resetting the owner sk for chunks on the outqueue, the
      chunks must first go through sctp_clear_owner_w/skb_orphan before
      assoc->base.sk is changed, and then through sctp_set_owner_w after it
      is changed, because sctp_wfree and its callees use assoc->base.sk.
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d04adf1b
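      The accounting order described in the note above can be modelled with
      a toy sketch. All types and functions here (toy_sock, toy_assoc,
      toy_chunk, toy_set_owner_w, toy_clear_owner_w, toy_migrate) are
      hypothetical simplifications, not the SCTP code; the only property
      carried over is that charging and uncharging go through the
      association's current sock.

```c
#include <assert.h>
#include <stddef.h>

/* Toy socket: only tracks bytes charged for queued outgoing chunks. */
struct toy_sock {
    long wmem_alloc;
};

/* Toy association: points at the sock that currently owns it. */
struct toy_assoc {
    struct toy_sock *sk;
};

/* Toy outgoing chunk: charged to whatever sock its association points at. */
struct toy_chunk {
    struct toy_assoc *asoc;
    long len;
};

/* Charge the chunk to the association's current sock. */
static void toy_set_owner_w(struct toy_chunk *c)
{
    c->asoc->sk->wmem_alloc += c->len;
}

/* Uncharge the chunk from the association's current sock. */
static void toy_clear_owner_w(struct toy_chunk *c)
{
    c->asoc->sk->wmem_alloc -= c->len;
}

/*
 * Move the association to a new sock. Chunks are uncharged while the
 * association still points at the old sock, and charged again only after
 * it points at the new one, so both counters stay consistent.
 */
static void toy_migrate(struct toy_assoc *asoc, struct toy_chunk *chunks,
                        size_t n, struct toy_sock *newsk)
{
    for (size_t i = 0; i < n; i++) {
        assert(chunks[i].asoc == asoc);
        toy_clear_owner_w(&chunks[i]);
    }
    asoc->sk = newsk;
    for (size_t i = 0; i < n; i++)
        toy_set_owner_w(&chunks[i]);
}
```

      If the sock pointer were switched without the two loops, later
      uncharging would subtract from the new sock bytes that were only ever
      added to the old one, which is the inconsistency the commit describes.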
    • Merge branch 'sockmap-fixes' · 151516fa
      David S. Miller authored
      John Fastabend says:
      
      ====================
      net: sockmap fixes
      
      Last two fixes (as far as I know) for sockmap code this round.
      
      First, we are using the qdisc cb structure when making the data end
      calculation. This is really just wrong, so store it with the other
      metadata in the correct tcp_skb_cb struct to avoid breaking things.
      
      Next, with recent work to attach multiple programs to a cgroup, a
      specific enumeration of return codes was agreed upon. However,
      I wrote the sk_skb program types before seeing this work and used
      a different convention. Patch 2 in the series aligns the return
      codes to avoid breaking with this infrastructure and also aligns
      with other programming conventions to avoid being the odd duck out,
      forcing programs to remember SK_SKB programs are different. Pushing
      to net because it's a user visible change. With this, SK_SKB program
      return codes are the same as other cgroup program types.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      151516fa
    • bpf: rename sk_actions to align with bpf infrastructure · bfa64075
      John Fastabend authored
      Recent additions to support multiple programs in cgroups impose
      a strict requirement, "all yes is yes, any no is no". To enforce
      this the infrastructure requires the 'no' return code, SK_DROP in
      this case, to be 0.
      
      To apply these rules to SK_SKB program types the sk_actions return
      codes need to be adjusted.
      
      This fix adds SK_PASS and makes 'SK_DROP = 0'. Finally, remove
      SK_ABORTED to remove any chance that the API may allow aborted
      program flows to be passed up the stack. This would be incorrect
      behavior and allow programs to break existing policies.
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      bfa64075
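      The "all yes is yes, any no is no" rule can be sketched as follows.
      The enum and helper are hypothetical local names that only mirror the
      convention described above (the drop code is 0); they are not the
      kernel definitions. With drop equal to 0, combining verdicts reduces
      to a logical AND.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical verdicts mirroring the convention: the 'no' code is 0. */
enum toy_sk_action {
    TOY_SK_DROP = 0,
    TOY_SK_PASS = 1,
};

/*
 * Combine the verdicts of several programs: the result is PASS only if
 * every program said PASS, and DROP as soon as any program said DROP.
 * An empty list is treated as PASS.
 */
static enum toy_sk_action toy_combine(const enum toy_sk_action *verdicts,
                                      size_t n)
{
    int allow = 1;

    assert(verdicts != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        allow = allow && (verdicts[i] != TOY_SK_DROP);
    return allow ? TOY_SK_PASS : TOY_SK_DROP;
}
```

      Because a zeroed or default return value also reads as drop under
      this convention, a program that bails out early fails closed rather
      than letting the packet through.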
    • bpf: bpf_compute_data uses incorrect cb structure · 8108a775
      John Fastabend authored
      SK_SKB program types use bpf_compute_data to store the end of the
      packet data. However, bpf_compute_data assumes the cb is stored in the
      qdisc layer format. But for SK_SKB program types this is the wrong
      layer of the stack.
      
      It happens to work (sort of!) because in most cases nothing happens
      to be overwritten today. This is very fragile and error prone.
      Fortunately, we have another hole in tcp_skb_cb we can use, so let's
      put the data_end value there.
      
      Note that SK_SKB program types do not use data_meta; such accesses
      are rejected by sk_skb_is_valid_access().
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      8108a775
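      Why using the wrong layer's cb layout is fragile can be shown with a
      toy model. It assumes a 48-byte scratch area shared by two layers;
      both struct layouts and all function names are hypothetical and much
      simpler than the real qdisc and TCP control blocks. Storing data_end
      through the qdisc-style layout overwrites a field that the TCP-style
      layout keeps at the same offset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy packet buffer with a shared per-layer scratch area. */
struct toy_skb {
    unsigned char cb[48];
};

/* Hypothetical qdisc-layer view of cb: data_end sits at offset 8. */
struct toy_qdisc_cb {
    uint64_t qdisc_private;
    uint64_t data_end;
};

/* Hypothetical TCP-layer view of cb: data_end sits at offset 16. */
struct toy_tcp_cb {
    uint32_t seq;
    uint32_t end_seq;
    uint32_t ack_seq;
    uint32_t flags;
    uint64_t data_end;
};

_Static_assert(sizeof(struct toy_qdisc_cb) <= 48, "qdisc view must fit in cb");
_Static_assert(sizeof(struct toy_tcp_cb) <= 48, "tcp view must fit in cb");

/* Store data_end where the qdisc-layer view expects it. */
static void store_end_qdisc_layout(struct toy_skb *skb, uint64_t end)
{
    memcpy(skb->cb + offsetof(struct toy_qdisc_cb, data_end), &end, sizeof(end));
}

/* Store data_end where the TCP-layer view expects it. */
static void store_end_tcp_layout(struct toy_skb *skb, uint64_t end)
{
    memcpy(skb->cb + offsetof(struct toy_tcp_cb, data_end), &end, sizeof(end));
}

/* Write ack_seq through the TCP-layer view. */
static void store_tcp_ack_seq(struct toy_skb *skb, uint32_t ack)
{
    memcpy(skb->cb + offsetof(struct toy_tcp_cb, ack_seq), &ack, sizeof(ack));
}

/* Read ack_seq back through the TCP-layer view. */
static uint32_t load_tcp_ack_seq(const struct toy_skb *skb)
{
    uint32_t ack;

    memcpy(&ack, skb->cb + offsetof(struct toy_tcp_cb, ack_seq), sizeof(ack));
    return ack;
}
```

      In this model the store through the TCP-style layout leaves ack_seq
      intact, while the store through the qdisc-style layout clobbers it.
      Whether anything is clobbered in practice depends on which fields
      happen to overlap, which matches the "happens to work" remark above.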
  3. 28 Oct, 2017 3 commits