1. 15 Jan, 2020 7 commits
    • bpf: Sockmap/tls, tls_sw can create a plaintext buf > encrypt buf · d468e477
      John Fastabend authored
      It is possible to build a plaintext buffer using the push helper that is
      larger than the allocated encrypt buffer. When this record is pushed to the
      crypto layers this can result in a NULL pointer dereference because the
      crypto API expects the encrypt buffer to be large enough to fit the
      plaintext buffer. Kernel splat below.
      
      To resolve this, catch the cases where this can happen and split the buffer
      into two records to send individually. Unfortunately, there is still one case
      to handle where the split creates a zero sized buffer. In this case we merge
      the buffers and unmark the split. This happens when apply is zero and the user
      pushed data beyond the encrypt buffer. This fixes the original case as well
      because the split allocated an encrypt buffer larger than the plaintext
      buffer and the merge simply moves the pointers around so we now have
      a reference to the new (larger) encrypt buffer.
      
      Perhaps it's not ideal but it seems the best solution for a fixes branch
      and avoids handling these two cases, (a) apply that needs split and (b)
      non apply case. These are edge cases anyway, so optimizing them seems
      unnecessary unless someone wants to do so later in next branches.
      
      [  306.719107] BUG: kernel NULL pointer dereference, address: 0000000000000008
      [...]
      [  306.747260] RIP: 0010:scatterwalk_copychunks+0x12f/0x1b0
      [...]
      [  306.770350] Call Trace:
      [  306.770956]  scatterwalk_map_and_copy+0x6c/0x80
      [  306.772026]  gcm_enc_copy_hash+0x4b/0x50
      [  306.772925]  gcm_hash_crypt_remain_continue+0xef/0x110
      [  306.774138]  gcm_hash_crypt_continue+0xa1/0xb0
      [  306.775103]  ? gcm_hash_crypt_continue+0xa1/0xb0
      [  306.776103]  gcm_hash_assoc_remain_continue+0x94/0xa0
      [  306.777170]  gcm_hash_assoc_continue+0x9d/0xb0
      [  306.778239]  gcm_hash_init_continue+0x8f/0xa0
      [  306.779121]  gcm_hash+0x73/0x80
      [  306.779762]  gcm_encrypt_continue+0x6d/0x80
      [  306.780582]  crypto_gcm_encrypt+0xcb/0xe0
      [  306.781474]  crypto_aead_encrypt+0x1f/0x30
      [  306.782353]  tls_push_record+0x3b9/0xb20 [tls]
      [  306.783314]  ? sk_psock_msg_verdict+0x199/0x300
      [  306.784287]  bpf_exec_tx_verdict+0x3f2/0x680 [tls]
      [  306.785357]  tls_sw_sendmsg+0x4a3/0x6a0 [tls]
      
      test_sockmap test signature to trigger bug,
      
      [TEST]: (1, 1, 1, sendmsg, pass,redir,start 1,end 2,pop (1,2),ktls,):
      
      Fixes: d3b18ad3 ("tls: add bpf support to sk_msg handling")
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
      Cc: stable@vger.kernel.org
      Link: https://lore.kernel.org/bpf/20200111061206.8028-7-john.fastabend@gmail.com
      d468e477
    • bpf: Sockmap/tls, msg_push_data may leave end mark in place · cf21e9ba
      John Fastabend authored
      Leaving an incorrect end mark in place when passing to the crypto
      layer will cause the crypto layer to stop processing data before
      all data is encrypted. To fix this, clear the end mark on push
      data instead of expecting users of the helper to clear the
      mark value after the fact.
      
      This happens when we push data into the middle of a skmsg and
      have room for it so we don't do a set of copies that already
      clear the end flag.
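
The effect of a stale end mark can be sketched with a toy model. This is only an illustration under stated assumptions, not the kernel code: `struct elem_model`, `walk_bytes_model()` and `push_split_model()` are hypothetical names, and a scatterlist entry is reduced to a length plus an end flag. The consumer stops at the first marked entry, so a mark left on the front half of a split entry hides everything behind it.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one scatterlist entry: a length plus an end mark. */
struct elem_model {
    unsigned len;
    bool end;
};

/* Consumer that trusts the end mark: sums lengths up to and including
 * the first marked entry, then stops. */
unsigned walk_bytes_model(const struct elem_model *e, size_t n)
{
    unsigned total = 0;

    for (size_t i = 0; i < n; i++) {
        total += e[i].len;
        if (e[i].end)
            break;
    }
    return total;
}

/*
 * Insert `ins` bytes at offset `at` inside entry `idx` by splitting it into
 * front, new and back entries. Requires 0 < at < in[idx].len and room for
 * n + 2 entries in `out`. Returns the new entry count.
 */
size_t push_split_model(const struct elem_model *in, size_t n, size_t idx,
                        unsigned at, unsigned ins, bool clear_mark,
                        struct elem_model *out)
{
    size_t o = 0;

    for (size_t i = 0; i < idx; i++)
        out[o++] = in[i];

    struct elem_model front = in[idx];  /* the copy carries the end mark along */
    front.len = at;
    if (clear_mark)
        front.end = false;              /* modelled fix: unmark on push */
    out[o++] = front;

    out[o++] = (struct elem_model){ .len = ins, .end = false };
    out[o++] = (struct elem_model){ .len = in[idx].len - at, .end = in[idx].end };

    for (size_t i = idx + 1; i < n; i++)
        out[o++] = in[i];
    return o;
}
```

With two 8-byte entries and 16 bytes pushed into the middle of the last one, the walker sees all 32 bytes when the mark is cleared and only 12 bytes when it is left in place.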
      
      Fixes: 6fff607e ("bpf: sk_msg program helper bpf_msg_push_data")
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Song Liu <songliubraving@fb.com>
      Cc: stable@vger.kernel.org
      Link: https://lore.kernel.org/bpf/20200111061206.8028-6-john.fastabend@gmail.com
      cf21e9ba
    • bpf: Sockmap, skmsg helper overestimates push, pull, and pop bounds · 6562e29c
      John Fastabend authored
      In the push, pull, and pop helpers operating on skmsg objects to make
      data writable or insert/remove data we use this bounds check to ensure
      specified data is valid,
      
       /* Bounds checks: start and pop must be inside message */
       if (start >= offset + l || last >= msg->sg.size)
           return -EINVAL;
      
      The problem here is that offset has already included the length of the
      current element, the 'l' above. So start could be past the end of
      the scatterlist element in the case where start also points into an
      offset on the last skmsg element.
      
      To fix this, do the accounting slightly differently by adding the length
      of the previous entry to offset at the start of the iteration. And
      ensure it's initialized to zero so that the first iteration does
      nothing.
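
The accounting change can be sketched on a plain array of element lengths. This is a simplified model, not the helper code itself: `find_elem_buggy_model()` is a reconstruction of the overestimating accounting from the description above, `find_elem_fixed_model()` follows the described fix, and both names are hypothetical.

```c
#include <stddef.h>

/* Reconstructed pre-fix accounting: by the time the loop runs off the end,
 * offset already includes the last element's length, so the final check
 * counts that length twice. Returns the element index or -1. */
int find_elem_buggy_model(const unsigned *lens, size_t n, unsigned start)
{
    unsigned offset = 0, len = 0;
    size_t i = 0;

    if (n == 0)
        return -1;
    do {
        len = lens[i];
        if (start < offset + len)
            break;
        offset += len;
        i++;
    } while (i != n);

    if (start >= offset + len)
        return -1;
    return (int)i;
}

/* Fixed accounting: add the previous element's length at the top of each
 * iteration; len starts at zero so the first iteration adds nothing. */
int find_elem_fixed_model(const unsigned *lens, size_t n, unsigned start)
{
    unsigned offset = 0, len = 0;
    size_t i = 0;

    if (n == 0)
        return -1;
    do {
        offset += len;
        len = lens[i];
        if (start < offset + len)
            break;
        i++;
    } while (i != n);

    if (start >= offset + len)
        return -1;
    return (int)i;
}
```

For two 4-byte elements (8 bytes total), a start of 9 is accepted by the buggy variant with an index one past the last element, while the fixed variant rejects it; a start of 5 resolves to element 1 in both.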
      
      Fixes: 604326b4 ("bpf, sockmap: convert to generic sk_msg interface")
      Fixes: 6fff607e ("bpf: sk_msg program helper bpf_msg_push_data")
      Fixes: 7246d8ed ("bpf: helper to pop data from messages")
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Song Liu <songliubraving@fb.com>
      Cc: stable@vger.kernel.org
      Link: https://lore.kernel.org/bpf/20200111061206.8028-5-john.fastabend@gmail.com
      6562e29c
    • bpf: Sockmap/tls, push write_space updates through ulp updates · 33bfe20d
      John Fastabend authored
      When a sockmap sock with TLS enabled is removed we clean up bpf/psock state
      and call tcp_update_ulp() to push updates to the TLS ULP on top. However, we
      don't push the write_space callback up and instead simply overwrite the
      op with the previous op stored in the psock. This may or may not be correct,
      so to ensure we don't overwrite the TLS write space hook, pass this field to
      the ULP and have it fix up the ctx.
      
      This completes a previous fix that pushed the ops through to the ULP
      but at the time missed doing this for write_space, presumably because
      write_space TLS hook was added around the same time.
      
      Fixes: 95fa1454 ("bpf: sockmap/tls, close can race with map free")
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
      Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
      Cc: stable@vger.kernel.org
      Link: https://lore.kernel.org/bpf/20200111061206.8028-4-john.fastabend@gmail.com
      33bfe20d
    • bpf: Sockmap, ensure sock lock held during tear down · 7e81a353
      John Fastabend authored
      The sock_map_free() and sock_hash_free() paths used to delete sockmap
      and sockhash maps walk the maps and destroy psock and bpf state associated
      with the socks in the map. When done the socks no longer have BPF programs
      attached and will function normally. This can happen while the socks in
      the map are still "live" meaning data may be sent/received during the walk.
      
      Currently, though, we don't take the sock_lock when the psock and bpf state
      is removed through this path. Specifically, this means we can be writing
      into the ops structure pointers such as sendmsg, sendpage, recvmsg, etc.
      while they are also being called from the networking side. This is not
      safe; we never used proper READ_ONCE/WRITE_ONCE semantics here if we
      believed it was safe. Further, it's not clear to me it's even a good idea
      to try and do this on "live" sockets while the networking side might also
      be using the socket. Instead of trying to reason about using the socks
      from both sides, let's realize that every use case I'm aware of rarely
      deletes maps; in fact the kubernetes/Cilium case builds the map at init and
      never tears it down except on errors. So let's do the simple fix and
      grab the sock lock.
      
      This patch wraps sock deletes from maps in the sock lock and adds some
      annotations so we catch any other cases more easily.
      
      Fixes: 604326b4 ("bpf, sockmap: convert to generic sk_msg interface")
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Song Liu <songliubraving@fb.com>
      Cc: stable@vger.kernel.org
      Link: https://lore.kernel.org/bpf/20200111061206.8028-3-john.fastabend@gmail.com
      7e81a353
    • bpf: Sockmap/tls, during free we may call tcp_bpf_unhash() in loop · 4da6a196
      John Fastabend authored
      When a sockmap is free'd and a socket in the map is enabled with tls
      we tear down the bpf context on the socket, the psock struct and state,
      and then call tcp_update_ulp(). The tcp_update_ulp() call is to inform
      the tls stack it needs to update its saved sock ops so that when the tls
      socket is later destroyed it doesn't try to call the now destroyed psock
      hooks.
      
      This is about keeping stacked ULPs in good shape so they always have
      the right set of stacked ops.
      
      However, recently the unhash() hook was removed from the TLS side. But the
      sockmap/bpf side is not doing any extra work to update the unhash op
      when it is torn down, instead expecting the TLS side to manage it. So both
      TLS and sockmap believe the other side is managing the op, and instead
      no one updates the hook so it continues to point at tcp_bpf_unhash().
      When the unhash hook is called we call tcp_bpf_unhash(), which detects the
      psock has already been destroyed and calls sk->sk_prot_unhash(), which
      calls tcp_bpf_unhash() yet again and so on, looping and hanging the core.
      
      To fix this, have the sockmap tear-down logic fix up the stale pointer.
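
The self-referencing hook can be sketched with a small function-pointer model. Everything here is hypothetical (`struct sock_model`, `bpf_unhash_model()`, `teardown_model()`), and a call counter with an artificial cap stands in for the real hang so the demonstration terminates.

```c
#define MAX_CALLS_MODEL 1000  /* demo guard only; the described loop never ends */

struct sock_model;
typedef void (*unhash_fn_model)(struct sock_model *sk);

struct sock_model {
    unhash_fn_model unhash;  /* op currently installed on the socket */
    int has_psock;           /* non-zero while the psock state exists */
    int calls;               /* number of unhash invocations so far */
};

/* Stand-in for the base protocol's unhash. */
void base_unhash_model(struct sock_model *sk)
{
    sk->calls++;
}

/* Stand-in for the BPF-side unhash hook. */
void bpf_unhash_model(struct sock_model *sk)
{
    sk->calls++;
    if (!sk->has_psock) {
        /* psock is gone: defer to whatever op the socket points at now.
         * If that is still this function, we recurse. */
        if (sk->calls < MAX_CALLS_MODEL && sk->unhash)
            sk->unhash(sk);
        return;
    }
    /* With a live psock, the saved original op would be called here. */
}

/* Tear down the psock state; optionally repair the stale hook pointer. */
void teardown_model(struct sock_model *sk, int fix_up_pointer)
{
    sk->has_psock = 0;
    if (fix_up_pointer)
        sk->unhash = base_unhash_model;
}
```

Without the fix-up, a single unhash call after teardown keeps re-entering `bpf_unhash_model()` until the guard trips; with the fix-up it runs the base op exactly once.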
      
      Fixes: 5d92e631 ("net/tls: partially revert fix transition through disconnect with close")
      Reported-by: syzbot+83979935eb6304f8cd46@syzkaller.appspotmail.com
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
      Acked-by: Song Liu <songliubraving@fb.com>
      Cc: stable@vger.kernel.org
      Link: https://lore.kernel.org/bpf/20200111061206.8028-2-john.fastabend@gmail.com
      4da6a196
    • bpf: Fix incorrect verifier simulation of ARSH under ALU32 · 0af2ffc9
      Daniel Borkmann authored
      Anatoly has been fuzzing with kBdysch harness and reported a hang in one
      of the outcomes:
      
        0: R1=ctx(id=0,off=0,imm=0) R10=fp0
        0: (85) call bpf_get_socket_cookie#46
        1: R0_w=invP(id=0) R10=fp0
        1: (57) r0 &= 808464432
        2: R0_w=invP(id=0,umax_value=808464432,var_off=(0x0; 0x30303030)) R10=fp0
        2: (14) w0 -= 810299440
        3: R0_w=invP(id=0,umax_value=4294967295,var_off=(0xcf800000; 0x3077fff0)) R10=fp0
        3: (c4) w0 s>>= 1
        4: R0_w=invP(id=0,umin_value=1740636160,umax_value=2147221496,var_off=(0x67c00000; 0x183bfff8)) R10=fp0
        4: (76) if w0 s>= 0x30303030 goto pc+216
        221: R0_w=invP(id=0,umin_value=1740636160,umax_value=2147221496,var_off=(0x67c00000; 0x183bfff8)) R10=fp0
        221: (95) exit
        processed 6 insns (limit 1000000) [...]
      
      Taking a closer look, the program was xlated as follows:
      
        # ./bpftool p d x i 12
        0: (85) call bpf_get_socket_cookie#7800896
        1: (bf) r6 = r0
        2: (57) r6 &= 808464432
        3: (14) w6 -= 810299440
        4: (c4) w6 s>>= 1
        5: (76) if w6 s>= 0x30303030 goto pc+216
        6: (05) goto pc-1
        7: (05) goto pc-1
        8: (05) goto pc-1
        [...]
        220: (05) goto pc-1
        221: (05) goto pc-1
        222: (95) exit
      
      Meaning, the visible effect is very similar to f54c7898 ("bpf: Fix
      precision tracking for unbounded scalars"), that is, the fall-through
      branch in the instruction 5 is considered to be never taken given the
      conclusion from the min/max bounds tracking in w6, and therefore the
      dead-code sanitation rewrites it as goto pc-1. However, real-life input
      disagrees with verification analysis since a soft-lockup was observed.
      
      The bug sits in the analysis of the ARSH. The definition is that we shift
      the target register value right by K bits through shifting in copies of
      its sign bit. In adjust_scalar_min_max_vals(), we do first coerce the
      register into 32 bit mode, same happens after simulating the operation.
      However, for the case of simulating the actual ARSH, we don't take the
      mode into account and act as if it's always 64 bit, but location of sign
      bit is different:
      
        dst_reg->smin_value >>= umin_val;
        dst_reg->smax_value >>= umin_val;
        dst_reg->var_off = tnum_arshift(dst_reg->var_off, umin_val);
      
      Consider an unknown R0 where bpf_get_socket_cookie() (or others) would
      for example return 0xffff. With the above ARSH simulation, we'd see the
      following results:
      
        [...]
        1: R1=ctx(id=0,off=0,imm=0) R2_w=invP65535 R10=fp0
        1: (85) call bpf_get_socket_cookie#46
        2: R0_w=invP(id=0) R10=fp0
        2: (57) r0 &= 808464432
          -> R0_runtime = 0x3030
        3: R0_w=invP(id=0,umax_value=808464432,var_off=(0x0; 0x30303030)) R10=fp0
        3: (14) w0 -= 810299440
          -> R0_runtime = 0xcfb40000
        4: R0_w=invP(id=0,umax_value=4294967295,var_off=(0xcf800000; 0x3077fff0)) R10=fp0
                                    (0xffffffff)
        4: (c4) w0 s>>= 1
          -> R0_runtime = 0xe7da0000
        5: R0_w=invP(id=0,umin_value=1740636160,umax_value=2147221496,var_off=(0x67c00000; 0x183bfff8)) R10=fp0
                                    (0x67c00000)           (0x7ffbfff8)
        [...]
      
      In insn 3, we have a runtime value of 0xcfb40000, which is '1100 1111 1011
      0100 0000 0000 0000 0000'. The result after the shift is 0xe7da0000, that
      is, '1110 0111 1101 1010 0000 0000 0000 0000', where the sign bit is correctly
      retained in 32 bit mode. In insn 4, the umax was 0xffffffff, and changed into
      0x7ffbfff8 after the shift, that is, '0111 1111 1111 1011 1111 1111 1111 1000',
      which means that the simulation didn't retain the sign bit. With the above
      logic, the updates happen on the 64 bit min/max bounds and given we coerced
      the register, the sign bits of the bounds are cleared as well, meaning we
      need to force the simulation into s32 space for 32 bit alu mode.
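
The difference in sign bit position can be checked with a small standalone sketch. It is not the verifier code: `arsh32_model()` and `arsh64_model()` are hypothetical helpers that implement arithmetic shift right on unsigned types by hand, since right-shifting negative signed values is implementation-defined in C. Shift counts are assumed to be below the operand width.

```c
#include <stdint.h>

/* Arithmetic shift right on a 32-bit value: shift in copies of bit 31.
 * Assumes n < 32. */
uint32_t arsh32_model(uint32_t v, unsigned n)
{
    uint32_t fill = (v >> 31) ? (uint32_t)~(UINT32_MAX >> n) : 0;

    return (v >> n) | fill;
}

/* Arithmetic shift right on a 64-bit value: shift in copies of bit 63.
 * Assumes n < 64. */
uint64_t arsh64_model(uint64_t v, unsigned n)
{
    uint64_t fill = (v >> 63) ? ~(UINT64_MAX >> n) : 0;

    return (v >> n) | fill;
}
```

For the runtime value from the trace, `arsh32_model(0xcfb40000, 1)` gives 0xe7da0000, while treating the same zero-extended value as 64 bit, `arsh64_model(0xcfb40000, 1)`, gives 0x67da0000 because bit 63 is clear. The same holds for a bound of 0xffffffff, which stays 0xffffffff in 32 bit mode but becomes 0x7fffffff in 64 bit mode.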
      
      Verification after the fix below. We're first analyzing the fall-through branch
      on 32 bit signed >= test eventually leading to rejection of the program in this
      specific case:
      
        0: R1=ctx(id=0,off=0,imm=0) R10=fp0
        0: (b7) r2 = 808464432
        1: R1=ctx(id=0,off=0,imm=0) R2_w=invP808464432 R10=fp0
        1: (85) call bpf_get_socket_cookie#46
        2: R0_w=invP(id=0) R10=fp0
        2: (bf) r6 = r0
        3: R0_w=invP(id=0) R6_w=invP(id=0) R10=fp0
        3: (57) r6 &= 808464432
        4: R0_w=invP(id=0) R6_w=invP(id=0,umax_value=808464432,var_off=(0x0; 0x30303030)) R10=fp0
        4: (14) w6 -= 810299440
        5: R0_w=invP(id=0) R6_w=invP(id=0,umax_value=4294967295,var_off=(0xcf800000; 0x3077fff0)) R10=fp0
        5: (c4) w6 s>>= 1
        6: R0_w=invP(id=0) R6_w=invP(id=0,umin_value=3888119808,umax_value=4294705144,var_off=(0xe7c00000; 0x183bfff8)) R10=fp0
                                                    (0xe7c00000)          (0xfffbfff8)
        6: (76) if w6 s>= 0x30303030 goto pc+216
        7: R0_w=invP(id=0) R6_w=invP(id=0,umin_value=3888119808,umax_value=4294705144,var_off=(0xe7c00000; 0x183bfff8)) R10=fp0
        7: (30) r0 = *(u8 *)skb[808464432]
        BPF_LD_[ABS|IND] uses reserved fields
        processed 8 insns (limit 1000000) [...]
      
      Fixes: 9cbe1f5a ("bpf/verifier: improve register value range tracking with ARSH")
      Reported-by: Anatoly Trosinenko <anatoly.trosinenko@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20200115204733.16648-1-daniel@iogearbox.net
      0af2ffc9
  2. 12 Jan, 2020 1 commit
  3. 10 Jan, 2020 1 commit
  4. 09 Jan, 2020 1 commit
  5. 07 Jan, 2020 7 commits
    • net: stmmac: Fixed link does not need MDIO Bus · da29f2d8
      Jose Abreu authored
      When using fixed link we don't need the MDIO bus support.
      Reported-by: Heiko Stuebner <heiko@sntech.de>
      Reported-by: kernelci.org bot <bot@kernelci.org>
      Fixes: d3e014ec ("net: stmmac: platform: Fix MDIO init for platforms without PHY")
      Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com>
      Acked-by: Sriram Dash <Sriram.dash@samsung.com>
      Tested-by: Patrice Chotard <patrice.chotard@st.com>
      Tested-by: Heiko Stuebner <heiko@sntech.de>
      Acked-by: Neil Armstrong <narmstrong@baylibre.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Tested-by: Florian Fainelli <f.fainelli@gmail> # Lamobo R1 (fixed-link + MDIO sub node for roboswitch).
      Signed-off-by: David S. Miller <davem@davemloft.net>
      da29f2d8
    • Merge branch 'vlan-rtnetlink-newlink-fixes' · b57e1fff
      David S. Miller authored
      Eric Dumazet says:
      
      ====================
      vlan: rtnetlink newlink fixes
      
      First patch fixes a potential memory leak found by syzbot
      
      Second patch makes vlan_changelink() aware of errors
      and report them to user.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b57e1fff
    • vlan: vlan_changelink() should propagate errors · eb8ef2a3
      Eric Dumazet authored
      Both vlan_dev_change_flags() and vlan_dev_set_egress_priority()
      can return an error. vlan_changelink() should not ignore them.
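
The pattern of not ignoring helper return values can be sketched as follows. This is a generic illustration, not the vlan code: `change_flags_model()`, `set_egress_priority_model()` and `changelink_model()` are hypothetical, and the error codes are placeholders chosen for the example.

```c
#include <errno.h>

/* Hypothetical helpers that can fail; the error codes are placeholders. */
static int change_flags_model(int fail)
{
    return fail ? -EINVAL : 0;
}

static int set_egress_priority_model(int fail)
{
    return fail ? -ENOMEM : 0;
}

/* Return the first error to the caller instead of discarding it. */
int changelink_model(int fail_flags, int fail_prio)
{
    int err;

    err = change_flags_model(fail_flags);
    if (err)
        return err;

    err = set_egress_priority_model(fail_prio);
    if (err)
        return err;

    return 0;
}
```

The caller then sees 0 only when both steps succeeded, and otherwise the error code of the step that failed first.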
      
      Fixes: 07b5b17e ("[VLAN]: Use rtnl_link API")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      eb8ef2a3
    • vlan: fix memory leak in vlan_dev_set_egress_priority · 9bbd917e
      Eric Dumazet authored
      There are a few cases where the ndo_uninit() handler might not be
      called if an error happens while the device is initialized.
      
      Since vlan_newlink() calls vlan_changelink() before
      trying to register the netdevice, we need to make sure
      vlan_dev_uninit() has been called at least once,
      or we might leak allocated memory.
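
The ordering problem can be sketched with a toy device that allocates during its changelink step. This is an assumption-laden model, not the vlan code: `struct vdev_model`, `live_allocs_model` and the `*_model()` functions are hypothetical, and a flag selects whether the error path calls the uninit step.

```c
#include <stdlib.h>

/* Hypothetical device state: one allocation made before registration. */
struct vdev_model {
    void *prio_map;
};

int live_allocs_model;  /* allocations not yet freed */

/* Allocates state before the device is registered. */
int changelink_model2(struct vdev_model *d)
{
    d->prio_map = malloc(32);
    if (!d->prio_map)
        return -1;
    live_allocs_model++;
    return 0;
}

/* Releases whatever changelink allocated; safe to call more than once. */
void uninit_model(struct vdev_model *d)
{
    if (d->prio_map) {
        free(d->prio_map);
        d->prio_map = NULL;
        live_allocs_model--;
    }
}

/* newlink: changelink first, then a registration step that may fail.
 * When call_uninit_on_error is zero the allocation is leaked on failure. */
int newlink_model(struct vdev_model *d, int register_fails,
                  int call_uninit_on_error)
{
    int err = changelink_model2(d);

    if (!err && register_fails)
        err = -1;
    if (err && call_uninit_on_error)
        uninit_model(d);
    return err;
}
```

With the uninit call on the error path, a failed registration leaves no live allocation; without it, one allocation remains outstanding.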
      
      BUG: memory leak
      unreferenced object 0xffff888122a206c0 (size 32):
        comm "syz-executor511", pid 7124, jiffies 4294950399 (age 32.240s)
        hex dump (first 32 bytes):
          00 00 00 00 00 00 61 73 00 00 00 00 00 00 00 00  ......as........
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        backtrace:
          [<000000000eb3bb85>] kmemleak_alloc_recursive include/linux/kmemleak.h:43 [inline]
          [<000000000eb3bb85>] slab_post_alloc_hook mm/slab.h:586 [inline]
          [<000000000eb3bb85>] slab_alloc mm/slab.c:3320 [inline]
          [<000000000eb3bb85>] kmem_cache_alloc_trace+0x145/0x2c0 mm/slab.c:3549
          [<000000007b99f620>] kmalloc include/linux/slab.h:556 [inline]
          [<000000007b99f620>] vlan_dev_set_egress_priority+0xcc/0x150 net/8021q/vlan_dev.c:194
          [<000000007b0cb745>] vlan_changelink+0xd6/0x140 net/8021q/vlan_netlink.c:126
          [<0000000065aba83a>] vlan_newlink+0x135/0x200 net/8021q/vlan_netlink.c:181
          [<00000000fb5dd7a2>] __rtnl_newlink+0x89a/0xb80 net/core/rtnetlink.c:3305
          [<00000000ae4273a1>] rtnl_newlink+0x4e/0x80 net/core/rtnetlink.c:3363
          [<00000000decab39f>] rtnetlink_rcv_msg+0x178/0x4b0 net/core/rtnetlink.c:5424
          [<00000000accba4ee>] netlink_rcv_skb+0x61/0x170 net/netlink/af_netlink.c:2477
          [<00000000319fe20f>] rtnetlink_rcv+0x1d/0x30 net/core/rtnetlink.c:5442
          [<00000000d51938dc>] netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline]
          [<00000000d51938dc>] netlink_unicast+0x223/0x310 net/netlink/af_netlink.c:1328
          [<00000000e539ac79>] netlink_sendmsg+0x2c0/0x570 net/netlink/af_netlink.c:1917
          [<000000006250c27e>] sock_sendmsg_nosec net/socket.c:639 [inline]
          [<000000006250c27e>] sock_sendmsg+0x54/0x70 net/socket.c:659
          [<00000000e2a156d1>] ____sys_sendmsg+0x2d0/0x300 net/socket.c:2330
          [<000000008c87466e>] ___sys_sendmsg+0x8a/0xd0 net/socket.c:2384
          [<00000000110e3054>] __sys_sendmsg+0x80/0xf0 net/socket.c:2417
          [<00000000d71077c8>] __do_sys_sendmsg net/socket.c:2426 [inline]
          [<00000000d71077c8>] __se_sys_sendmsg net/socket.c:2424 [inline]
          [<00000000d71077c8>] __x64_sys_sendmsg+0x23/0x30 net/socket.c:2424
      
      Fixes: 07b5b17e ("[VLAN]: Use rtnl_link API")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9bbd917e
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · 96b11e93
      David S. Miller authored
      Daniel Borkmann says:
      
      ====================
      pull-request: bpf 2020-01-07
      
      The following pull-request contains BPF updates for your *net* tree.
      
      We've added 2 non-merge commits during the last 1 day(s) which contain
      a total of 2 files changed, 16 insertions(+), 4 deletions(-).
      
      The main changes are:
      
      1) Fix a use-after-free in cgroup BPF due to auto-detachment, from Roman Gushchin.
      
      2) Fix skb out-of-bounds access in ld_abs/ind instruction, from Daniel Borkmann.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      96b11e93
    • stmmac: debugfs entry name is not be changed when udev rename device name. · 481a7d15
      Jiping Ma authored
      Add a notifier to handle udev changing the net device name.
      Fixes: b6601323ef9e ("net: stmmac: debugfs entry name is not be changed when udev rename")
      Signed-off-by: Jiping Ma <jiping.ma2@windriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      481a7d15
    • Merge tag 'mlx5-fixes-2020-01-06' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux · c101fffc
      David S. Miller authored
      Saeed Mahameed says:
      
      ====================
      Mellanox, mlx5 fixes 2020-01-06
      
      This series introduces some fixes to mlx5 driver.
      
      Please pull and let me know if there is any problem.
      
      For -stable v5.3
       ('net/mlx5: Move devlink registration before interfaces load')
      
      For -stable v5.4
       ('net/mlx5e: Fix hairpin RSS table size')
       ('net/mlx5: DR, Init lists that are used in rule's member')
       ('net/mlx5e: Always print health reporter message to dmesg')
       ('net/mlx5: DR, No need for atomic refcount for internal SW steering resources')
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c101fffc
  6. 06 Jan, 2020 19 commits
    • net/mlx5: DR, Init lists that are used in rule's member · df55c558
      Erez Shitrit authored
      Whenever adding a new member of a rule object we attach it to 2 lists.
      These 2 lists should be initialized first.
      
      Fixes: 41d07074 ("net/mlx5: DR, Expose steering rule functionality")
      Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      df55c558
    • net/mlx5e: Fix hairpin RSS table size · 6412bb39
      Eli Cohen authored
      Set the hairpin table size to the correct size, based on the groups that
      would be created in it. Groups are laid out on the table such that a
      group occupies a range of entries in the table. This implies that the
      group ranges should correspond to the table they are laid upon.
      
      The patch cited below made group 1's size grow, hence causing an
      overflow of the group range laid on the table.
      
      Fixes: a795d8db ("net/mlx5e: Support RSS for IP-in-IP and IPv6 tunneled packets")
      Signed-off-by: Eli Cohen <eli@mellanox.com>
      Signed-off-by: Mark Bloch <markb@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      6412bb39
    • net/mlx5: DR, No need for atomic refcount for internal SW steering resources · 4ce380ca
      Yevgeny Kliteynik authored
      No need for an atomic refcounter for the STE and hashtables.
      These are internal SW steering resources and they are always
      under domain mutex.
      
      This also fixes the following refcount error:
        refcount_t: addition on 0; use-after-free.
        WARNING: CPU: 9 PID: 3527 at lib/refcount.c:25 refcount_warn_saturate+0x81/0xe0
        Call Trace:
         dr_table_init_nic+0x10d/0x110 [mlx5_core]
         mlx5dr_table_create+0xb4/0x230 [mlx5_core]
         mlx5_cmd_dr_create_flow_table+0x39/0x120 [mlx5_core]
         __mlx5_create_flow_table+0x221/0x5f0 [mlx5_core]
         esw_create_offloads_fdb_tables+0x180/0x5a0 [mlx5_core]
         ...
      
      Fixes: 26d688e3 ("net/mlx5: DR, Add Steering entry (STE) utilities")
      Signed-off-by: Yevgeny Kliteynik <kliteyn@mellanox.com>
      Reviewed-by: Alex Vesker <valex@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      4ce380ca
    • Revert "net/mlx5: Support lockless FTE read lookups" · 1f0593e7
      Parav Pandit authored
      This reverts commit 7dee607e.
      
      During the cleanup path, the FTE's parent node group is removed, yet it
      is still referenced by the FTE while freeing the FTE.
      Hence the FTE's lockless read lookup optimization done in the cited commit
      is not possible at the moment.
      
      Hence, revert the commit.
      
      This avoids the KASAN call trace below.
      
      [  110.390896] BUG: KASAN: use-after-free in find_root.isra.14+0x56/0x60 [mlx5_core]
      [  110.391048] Read of size 4 at addr ffff888c19e6d220 by task swapper/12/0
      
      [  110.391219] CPU: 12 PID: 0 Comm: swapper/12 Not tainted 5.5.0-rc1+
      [  110.391222] Hardware name: HP ProLiant DL380p Gen8, BIOS P70 08/02/2014
      [  110.391225] Call Trace:
      [  110.391229]  <IRQ>
      [  110.391246]  dump_stack+0x95/0xd5
      [  110.391307]  ? find_root.isra.14+0x56/0x60 [mlx5_core]
      [  110.391320]  print_address_description.constprop.5+0x20/0x320
      [  110.391379]  ? find_root.isra.14+0x56/0x60 [mlx5_core]
      [  110.391435]  ? find_root.isra.14+0x56/0x60 [mlx5_core]
      [  110.391441]  __kasan_report+0x149/0x18c
      [  110.391499]  ? find_root.isra.14+0x56/0x60 [mlx5_core]
      [  110.391504]  kasan_report+0x12/0x20
      [  110.391511]  __asan_report_load4_noabort+0x14/0x20
      [  110.391567]  find_root.isra.14+0x56/0x60 [mlx5_core]
      [  110.391625]  del_sw_fte_rcu+0x4a/0x100 [mlx5_core]
      [  110.391633]  rcu_core+0x404/0x1950
      [  110.391640]  ? rcu_accelerate_cbs_unlocked+0x100/0x100
      [  110.391649]  ? run_rebalance_domains+0x201/0x280
      [  110.391654]  rcu_core_si+0xe/0x10
      [  110.391661]  __do_softirq+0x181/0x66c
      [  110.391670]  irq_exit+0x12c/0x150
      [  110.391675]  smp_apic_timer_interrupt+0xf0/0x370
      [  110.391681]  apic_timer_interrupt+0xf/0x20
      [  110.391684]  </IRQ>
      [  110.391695] RIP: 0010:cpuidle_enter_state+0xfa/0xba0
      [  110.391703] Code: 3d c3 9b b5 50 e8 56 75 6e fe 48 89 45 c8 0f 1f 44 00 00 31 ff e8 a6 94 6e fe 45 84 ff 0f 85 f6 02 00 00 fb 66 0f 1f 44 00 00 <45> 85 f6 0f 88 db 06 00 00 4d 63 fe 4b 8d 04 7f 49 8d 04 87 49 8d
      [  110.391706] RSP: 0018:ffff888c23a6fce8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
      [  110.391712] RAX: dffffc0000000000 RBX: ffffe8ffff7002f8 RCX: 000000000000001f
      [  110.391715] RDX: 1ffff11184ee6cb5 RSI: 0000000040277d83 RDI: ffff888c277365a8
      [  110.391718] RBP: ffff888c23a6fd40 R08: 0000000000000002 R09: 0000000000035280
      [  110.391721] R10: ffff888c23a6fc80 R11: ffffed11847485d0 R12: ffffffffb1017740
      [  110.391723] R13: 0000000000000003 R14: 0000000000000003 R15: 0000000000000000
      [  110.391732]  ? cpuidle_enter_state+0xea/0xba0
      [  110.391738]  cpuidle_enter+0x4f/0xa0
      [  110.391747]  call_cpuidle+0x6d/0xc0
      [  110.391752]  do_idle+0x360/0x430
      [  110.391758]  ? arch_cpu_idle_exit+0x40/0x40
      [  110.391765]  ? complete+0x67/0x80
      [  110.391771]  cpu_startup_entry+0x1d/0x20
      [  110.391779]  start_secondary+0x2f3/0x3c0
      [  110.391784]  ? set_cpu_sibling_map+0x2500/0x2500
      [  110.391795]  secondary_startup_64+0xa4/0xb0
      
      [  110.391841] Allocated by task 290:
      [  110.391917]  save_stack+0x21/0x90
      [  110.391921]  __kasan_kmalloc.constprop.8+0xa7/0xd0
      [  110.391925]  kasan_kmalloc+0x9/0x10
      [  110.391929]  kmem_cache_alloc_trace+0xf6/0x270
      [  110.391987]  create_root_ns.isra.36+0x58/0x260 [mlx5_core]
      [  110.392044]  mlx5_init_fs+0x5fd/0x1ee0 [mlx5_core]
      [  110.392092]  mlx5_load_one+0xc7a/0x3860 [mlx5_core]
      [  110.392139]  init_one+0x6ff/0xf90 [mlx5_core]
      [  110.392145]  local_pci_probe+0xde/0x190
      [  110.392150]  work_for_cpu_fn+0x56/0xa0
      [  110.392153]  process_one_work+0x678/0x1140
      [  110.392157]  worker_thread+0x573/0xba0
      [  110.392162]  kthread+0x341/0x400
      [  110.392166]  ret_from_fork+0x1f/0x40
      
      [  110.392218] Freed by task 2742:
      [  110.392288]  save_stack+0x21/0x90
      [  110.392292]  __kasan_slab_free+0x137/0x190
      [  110.392296]  kasan_slab_free+0xe/0x10
      [  110.392299]  kfree+0x94/0x250
      [  110.392357]  tree_put_node+0x257/0x360 [mlx5_core]
      [  110.392413]  tree_remove_node+0x63/0xb0 [mlx5_core]
      [  110.392469]  clean_tree+0x199/0x240 [mlx5_core]
      [  110.392525]  mlx5_cleanup_fs+0x76/0x580 [mlx5_core]
      [  110.392572]  mlx5_unload+0x22/0xc0 [mlx5_core]
      [  110.392619]  mlx5_unload_one+0x99/0x260 [mlx5_core]
      [  110.392666]  remove_one+0x61/0x160 [mlx5_core]
      [  110.392671]  pci_device_remove+0x10b/0x2c0
      [  110.392677]  device_release_driver_internal+0x1e4/0x490
      [  110.392681]  device_driver_detach+0x36/0x40
      [  110.392685]  unbind_store+0x147/0x200
      [  110.392688]  drv_attr_store+0x6f/0xb0
      [  110.392693]  sysfs_kf_write+0x127/0x1d0
      [  110.392697]  kernfs_fop_write+0x296/0x420
      [  110.392702]  __vfs_write+0x66/0x110
      [  110.392707]  vfs_write+0x1a0/0x500
      [  110.392711]  ksys_write+0x164/0x250
      [  110.392715]  __x64_sys_write+0x73/0xb0
      [  110.392720]  do_syscall_64+0x9f/0x3a0
      [  110.392725]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Fixes: 7dee607e ("net/mlx5: Support lockless FTE read lookups")
      Signed-off-by: Parav Pandit <parav@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      1f0593e7
    • net/mlx5: Move devlink registration before interfaces load · a6f3b623
      Michael Guralnik authored
      Register devlink before interfaces are added.
      This allows interfaces to use devlink while initializing, for example
      to call mlx5_is_roce_enabled.
      
      Fixes: aba25279 ("net/mlx5e: Add TX reporter support")
      Signed-off-by: Michael Guralnik <michaelgur@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      a6f3b623
    • net/mlx5e: Always print health reporter message to dmesg · 99cda454
      Eran Ben Elisha authored
      When a reporter exists, error messages are logged only to the devlink
      tracer. The devlink tracer is a visibility utility only, which the user can
      choose not to monitor.
      After the cited patch, third-party monitoring tools that track these error
      messages no longer find them in dmesg, which is a regression.
      
      With this patch, error messages are also logged to dmesg.
      
      Fixes: c50de4af ("net/mlx5e: Generalize tx reporter's functionality")
      Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      99cda454
    • net/mlx5e: Avoid duplicating rule destinations · 554fe75c
      Dmytro Linkin authored
      The following scenario easily breaks the driver logic and crashes the kernel:
      1. Add a rule with mirred actions to the same device.
      2. Delete this rule.
      In this scenario the rule is not added to the database, so on deletion
      the driver accesses an invalid entry.
      Example:
      
       $ tc filter add dev ens1f0_0 ingress protocol ip prio 1 \
             flower skip_sw \
             action mirred egress mirror dev ens1f0_1 pipe \
             action mirred egress redirect dev ens1f0_1
       $ tc filter del dev ens1f0_0 ingress protocol ip prio 1
      
      Dmesg output:
      
      [  376.634396] mlx5_core 0000:82:00.0: mlx5_cmd_check:756:(pid 3439): DESTROY_FLOW_GROUP(0x934) op_mod(0x0) failed, status bad resource state(0x9), syndrome (0x563e2f)
      [  376.654983] mlx5_core 0000:82:00.0: del_hw_flow_group:567:(pid 3439): flow steering can't destroy fg 89 of ft 3145728
      [  376.673433] kasan: CONFIG_KASAN_INLINE enabled
      [  376.683769] kasan: GPF could be caused by NULL-ptr deref or user memory access
      [  376.695229] general protection fault: 0000 [#1] PREEMPT SMP KASAN PTI
      [  376.705069] CPU: 7 PID: 3439 Comm: tc Not tainted 5.4.0-rc5+ #76
      [  376.714959] Hardware name: Supermicro SYS-2028TP-DECTR/X10DRT-PT, BIOS 2.0a 08/12/2016
      [  376.726371] RIP: 0010:mlx5_del_flow_rules+0x105/0x960 [mlx5_core]
      [  376.735817] Code: 01 00 00 00 48 83 eb 08 e8 28 d9 ff ff 4c 39 e3 75 d8 4c 8d bd c0 02 00 00 48 b8 00 00 00 00 00 fc ff df 4c 89 fa 48 c1 ea 03 <0f> b6 04 02 84 c0 74 08 3c 03 0f 8e 84 04 00 00 48 8d 7d 28 8b 9 d
      [  376.761261] RSP: 0018:ffff888847c56db8 EFLAGS: 00010202
      [  376.770054] RAX: dffffc0000000000 RBX: ffff8888582a6da0 RCX: ffff888847c56d60
      [  376.780743] RDX: 0000000000000058 RSI: 0000000000000008 RDI: 0000000000000282
      [  376.791328] RBP: 0000000000000000 R08: fffffbfff0c60ea6 R09: fffffbfff0c60ea6
      [  376.802050] R10: fffffbfff0c60ea5 R11: ffffffff8630752f R12: ffff8888582a6da0
      [  376.812798] R13: dffffc0000000000 R14: ffff8888582a6da0 R15: 00000000000002c0
      [  376.823445] FS:  00007f675f9a8840(0000) GS:ffff88886d200000(0000) knlGS:0000000000000000
      [  376.834971] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  376.844179] CR2: 00000000007d9640 CR3: 00000007d3f26003 CR4: 00000000001606e0
      [  376.854843] Call Trace:
      [  376.868542]  __mlx5_eswitch_del_rule+0x49/0x300 [mlx5_core]
      [  376.877735]  mlx5e_tc_del_fdb_flow+0x6ec/0x9e0 [mlx5_core]
      [  376.921549]  mlx5e_flow_put+0x2b/0x50 [mlx5_core]
      [  376.929813]  mlx5e_delete_flower+0x5b6/0xbd0 [mlx5_core]
      [  376.973030]  tc_setup_cb_reoffload+0x29/0xc0
      [  376.980619]  fl_reoffload+0x50a/0x770 [cls_flower]
      [  377.015087]  tcf_block_playback_offloads+0xbd/0x250
      [  377.033400]  tcf_block_setup+0x1b2/0xc60
      [  377.057247]  tcf_block_offload_cmd+0x195/0x240
      [  377.098826]  tcf_block_offload_unbind+0xe7/0x180
      [  377.107056]  __tcf_block_put+0xe5/0x400
      [  377.114528]  ingress_destroy+0x3d/0x60 [sch_ingress]
      [  377.122894]  qdisc_destroy+0xf1/0x5a0
      [  377.129993]  qdisc_graft+0xa3d/0xe50
      [  377.151227]  tc_get_qdisc+0x48e/0xa20
      [  377.165167]  rtnetlink_rcv_msg+0x35d/0x8d0
      [  377.199528]  netlink_rcv_skb+0x11e/0x340
      [  377.219638]  netlink_unicast+0x408/0x5b0
      [  377.239913]  netlink_sendmsg+0x71b/0xb30
      [  377.267505]  sock_sendmsg+0xb1/0xf0
      [  377.273801]  ___sys_sendmsg+0x635/0x900
      [  377.312784]  __sys_sendmsg+0xd3/0x170
      [  377.338693]  do_syscall_64+0x95/0x460
      [  377.344833]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [  377.352321] RIP: 0033:0x7f675e58e090
      
      To avoid this, check for every mirred action whether its output device was
      already processed. If so, drop the rule with an EOPNOTSUPP error.
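
      The duplicate check described above can be sketched in plain C as follows.
      This is only an illustration, not the driver code: the function name and the
      use of an ifindex array are assumptions made for the example.

```c
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical helper: out_ifindex[] holds the output device index of each
 * mirred action in a rule, in order. Returns 0 when every output device is
 * distinct, or -EOPNOTSUPP as soon as a device is seen a second time.
 */
static int check_unique_destinations(const int *out_ifindex, size_t n)
{
	for (size_t i = 0; i < n; i++)
		for (size_t j = 0; j < i; j++)
			if (out_ifindex[i] == out_ifindex[j])
				return -EOPNOTSUPP;
	return 0;
}
```

      Rejecting the rule at add time means it never reaches the hardware in a
      half-installed state, so the later delete has nothing invalid to look up.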
      Signed-off-by: Dmytro Linkin <dmitrolin@mellanox.com>
      Reviewed-by: Roi Dayan <roid@mellanox.com>
      Reviewed-by: Vlad Buslov <vladbu@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      554fe75c
    • bpf: Fix passing modified ctx to ld/abs/ind instruction · 6d4f151a
      Daniel Borkmann authored
      Anatoly has been fuzzing with kBdysch harness and reported a KASAN
      slab oob in one of the outcomes:
      
        [...]
        [   77.359642] BUG: KASAN: slab-out-of-bounds in bpf_skb_load_helper_8_no_cache+0x71/0x130
        [   77.360463] Read of size 4 at addr ffff8880679bac68 by task bpf/406
        [   77.361119]
        [   77.361289] CPU: 2 PID: 406 Comm: bpf Not tainted 5.5.0-rc2-xfstests-00157-g2187f215 #1
        [   77.362134] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
        [   77.362984] Call Trace:
        [   77.363249]  dump_stack+0x97/0xe0
        [   77.363603]  print_address_description.constprop.0+0x1d/0x220
        [   77.364251]  ? bpf_skb_load_helper_8_no_cache+0x71/0x130
        [   77.365030]  ? bpf_skb_load_helper_8_no_cache+0x71/0x130
        [   77.365860]  __kasan_report.cold+0x37/0x7b
        [   77.366365]  ? bpf_skb_load_helper_8_no_cache+0x71/0x130
        [   77.366940]  kasan_report+0xe/0x20
        [   77.367295]  bpf_skb_load_helper_8_no_cache+0x71/0x130
        [   77.367821]  ? bpf_skb_load_helper_8+0xf0/0xf0
        [   77.368278]  ? mark_lock+0xa3/0x9b0
        [   77.368641]  ? kvm_sched_clock_read+0x14/0x30
        [   77.369096]  ? sched_clock+0x5/0x10
        [   77.369460]  ? sched_clock_cpu+0x18/0x110
        [   77.369876]  ? bpf_skb_load_helper_8+0xf0/0xf0
        [   77.370330]  ___bpf_prog_run+0x16c0/0x28f0
        [   77.370755]  __bpf_prog_run32+0x83/0xc0
        [   77.371153]  ? __bpf_prog_run64+0xc0/0xc0
        [   77.371568]  ? match_held_lock+0x1b/0x230
        [   77.371984]  ? rcu_read_lock_held+0xa1/0xb0
        [   77.372416]  ? rcu_is_watching+0x34/0x50
        [   77.372826]  sk_filter_trim_cap+0x17c/0x4d0
        [   77.373259]  ? sock_kzfree_s+0x40/0x40
        [   77.373648]  ? __get_filter+0x150/0x150
        [   77.374059]  ? skb_copy_datagram_from_iter+0x80/0x280
        [   77.374581]  ? do_raw_spin_unlock+0xa5/0x140
        [   77.375025]  unix_dgram_sendmsg+0x33a/0xa70
        [   77.375459]  ? do_raw_spin_lock+0x1d0/0x1d0
        [   77.375893]  ? unix_peer_get+0xa0/0xa0
        [   77.376287]  ? __fget_light+0xa4/0xf0
        [   77.376670]  __sys_sendto+0x265/0x280
        [   77.377056]  ? __ia32_sys_getpeername+0x50/0x50
        [   77.377523]  ? lock_downgrade+0x350/0x350
        [   77.377940]  ? __sys_setsockopt+0x2a6/0x2c0
        [   77.378374]  ? sock_read_iter+0x240/0x240
        [   77.378789]  ? __sys_socketpair+0x22a/0x300
        [   77.379221]  ? __ia32_sys_socket+0x50/0x50
        [   77.379649]  ? mark_held_locks+0x1d/0x90
        [   77.380059]  ? trace_hardirqs_on_thunk+0x1a/0x1c
        [   77.380536]  __x64_sys_sendto+0x74/0x90
        [   77.380938]  do_syscall_64+0x68/0x2a0
        [   77.381324]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
        [   77.381878] RIP: 0033:0x44c070
        [...]
      
      After further debugging, it turns out that while we disallow passing a
      modified ctx to other helper functions, the special case of the ld/abs/ind
      instructions, which have similar semantics (except r6 being the ctx argument),
      is missing such a check. A modified ctx cannot work here, as
      bpf_skb_load_helper_8_no_cache() and others expect skb fields in their
      original position; hence, add check_ctx_reg() to reject any modified ctx.
      The issue was first introduced back in f1174f77 ("bpf/verifier: rework value
      tracking").
      
      Fixes: f1174f77 ("bpf/verifier: rework value tracking")
      Reported-by: Anatoly Trosinenko <anatoly.trosinenko@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20200106215157.3553-1-daniel@iogearbox.net
      6d4f151a
    • Merge branch 'atlantic-bugfixes' · d76063c5
      David S. Miller authored
      Igor Russkikh says:
      
      ====================
      Aquantia/Marvell atlantic bugfixes 2020/01
      
      Here is a set of recently discovered bugfixes.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d76063c5
    • net: atlantic: remove duplicate entries · b585f860
      Igor Russkikh authored
      Function entries were accidentally duplicated; remove the duplicates.
      
      Fixes: ea4b4d7f ("net: atlantic: loopback tests via private flags")
      Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b585f860
    • net: atlantic: loopback configuration in improper place · 883daa18
      Igor Russkikh authored
      Initial loopback configuration should be called earlier, before
      starting traffic on HW blocks. Otherwise depending on race conditions
      it could be kept disabled.
      
      Fixes: ea4b4d7f ("net: atlantic: loopback tests via private flags")
      Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      883daa18
    • net: atlantic: broken link status on old fw · ac70957e
      Igor Russkikh authored
      The last code/checkpatch cleanup introduced a copy-paste error where code from
      the firmware 3 API logic was moved to the firmware 1 logic.
      
      As a result, FW1.x users would never see the link state as active.
      
      Fixes: 7b0c342f ("net: atlantic: code style cleanup")
      Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ac70957e
    • bpf: cgroup: prevent out-of-order release of cgroup bpf · e10360f8
      Roman Gushchin authored
      Before commit 4bfc0bb2 ("bpf: decouple the lifetime of cgroup_bpf from cgroup itself")
      cgroup bpf structures were released with
      corresponding cgroup structures. It guaranteed the hierarchical order
      of destruction: children were always first. It preserved attached
      programs from being released before their propagated copies.
      
      But with cgroup auto-detachment there are no such guarantees anymore:
      cgroup bpf is released as soon as the cgroup is offline and there are
      no live associated sockets. It means that an attached program can be
      detached and released, while its propagated copy is still living
      in the cgroup subtree. This will obviously lead to a use-after-free
      bug.
      
      To reproduce the issue the following script can be used:
      
        #!/bin/bash
      
        CGROOT=/sys/fs/cgroup
      
        mkdir -p ${CGROOT}/A ${CGROOT}/B ${CGROOT}/A/C
        sleep 1
      
        ./test_cgrp2_attach ${CGROOT}/A egress &
        A_PID=$!
        ./test_cgrp2_attach ${CGROOT}/B egress &
        B_PID=$!
      
        echo $$ > ${CGROOT}/A/C/cgroup.procs
        iperf -s &
        S_PID=$!
        iperf -c localhost -t 100 &
        C_PID=$!
      
        sleep 1
      
        echo $$ > ${CGROOT}/B/cgroup.procs
        echo ${S_PID} > ${CGROOT}/B/cgroup.procs
        echo ${C_PID} > ${CGROOT}/B/cgroup.procs
      
        sleep 1
      
        rmdir ${CGROOT}/A/C
        rmdir ${CGROOT}/A
      
        sleep 1
      
        kill -9 ${S_PID} ${C_PID} ${A_PID} ${B_PID}
      
      On the unpatched kernel the following stacktrace can be obtained:
      
      [   33.619799] BUG: unable to handle page fault for address: ffffbdb4801ab002
      [   33.620677] #PF: supervisor read access in kernel mode
      [   33.621293] #PF: error_code(0x0000) - not-present page
      [   33.622754] Oops: 0000 [#1] SMP NOPTI
      [   33.623202] CPU: 0 PID: 601 Comm: iperf Not tainted 5.5.0-rc2+ #23
      [   33.625545] RIP: 0010:__cgroup_bpf_run_filter_skb+0x29f/0x3d0
      [   33.635809] Call Trace:
      [   33.636118]  ? __cgroup_bpf_run_filter_skb+0x2bf/0x3d0
      [   33.636728]  ? __switch_to_asm+0x40/0x70
      [   33.637196]  ip_finish_output+0x68/0xa0
      [   33.637654]  ip_output+0x76/0xf0
      [   33.638046]  ? __ip_finish_output+0x1c0/0x1c0
      [   33.638576]  __ip_queue_xmit+0x157/0x410
      [   33.639049]  __tcp_transmit_skb+0x535/0xaf0
      [   33.639557]  tcp_write_xmit+0x378/0x1190
      [   33.640049]  ? _copy_from_iter_full+0x8d/0x260
      [   33.640592]  tcp_sendmsg_locked+0x2a2/0xdc0
      [   33.641098]  ? sock_has_perm+0x10/0xa0
      [   33.641574]  tcp_sendmsg+0x28/0x40
      [   33.641985]  sock_sendmsg+0x57/0x60
      [   33.642411]  sock_write_iter+0x97/0x100
      [   33.642876]  new_sync_write+0x1b6/0x1d0
      [   33.643339]  vfs_write+0xb6/0x1a0
      [   33.643752]  ksys_write+0xa7/0xe0
      [   33.644156]  do_syscall_64+0x5b/0x1b0
      [   33.644605]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Fix this by grabbing a reference to the bpf structure of each ancestor
      on the initialization of the cgroup bpf structure, and dropping the
      reference at the end of releasing the cgroup bpf structure.
      
      This will restore the hierarchical order of cgroup bpf releasing,
      without adding any operations on hot paths.
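
      The reference scheme described above can be illustrated with a small
      user-space model. It is a sketch under stated assumptions, not the kernel
      implementation: struct cg, cg_init() and cg_put() are hypothetical names, and
      a plain int stands in for the kernel's reference counter.

```c
#include <stddef.h>

/* Toy model of a cgroup's bpf state. */
struct cg {
	struct cg *parent;
	int refcnt;   /* references held on this node's bpf state */
	int released; /* set once the last reference is dropped */
};

/* Drop one reference; on the last one, release and unpin all ancestors. */
static void cg_put(struct cg *c)
{
	if (--c->refcnt == 0) {
		c->released = 1;
		/* The release path ends by dropping the ancestor references. */
		for (struct cg *p = c->parent; p; p = p->parent)
			cg_put(p);
	}
}

/* Initialize a node and pin every ancestor for as long as it exists. */
static void cg_init(struct cg *c, struct cg *parent)
{
	c->parent = parent;
	c->refcnt = 1;
	c->released = 0;
	for (struct cg *p = parent; p; p = p->parent)
		p->refcnt++;
}
```

      In this model, dropping the parent's own reference while a child still exists
      leaves the parent unreleased; it is released only after the child has gone,
      which is the children-first order the message describes.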
      
      Thanks to Josef Bacik for the debugging and the initial analysis of
      the problem.
      
      Fixes: 4bfc0bb2 ("bpf: decouple the lifetime of cgroup_bpf from cgroup itself")
      Reported-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Roman Gushchin <guro@fb.com>
      Acked-by: Song Liu <songliubraving@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      e10360f8
    • firmware: tee_bnxt: Fix multiple call to tee_client_close_context · 4012a6f2
      Vikas Gupta authored
      Fix tee_client_close_context being called multiple times when shm
      allocation fails.
      
      Fixes: 24688095 ("firmware: broadcom: add OP-TEE based BNXT f/w manager")
      Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4012a6f2
    • net: dsa: mv88e6xxx: Preserve priority when setting CPU port. · d8dc2c96
      Andrew Lunn authored
      The 6390 family uses an extended register to set the port connected to
      the CPU. The lower 5 bits indicate the port, the upper three bits are
      the priority of the frames as they pass through the switch, what
      egress queue they should use, etc. Since frames being sent to the CPU
      are typically management frames (BPDU, IGMP, ARP, etc.), set the priority
      to 7, the reset default, and the highest.
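
      A minimal sketch of the bit layout described above, assuming an 8-bit field
      with the port in the lower 5 bits and the priority in the upper 3 bits. The
      macro and function names are hypothetical and are not taken from the driver.

```c
#include <stdint.h>

#define CPU_DEST_PORT_MASK 0x1fu /* lower 5 bits: port number */
#define CPU_DEST_PRI_SHIFT 5     /* upper 3 bits: frame priority */
#define CPU_DEST_PRI_MAX   7u    /* highest priority, also the reset default */

/* Build the field value for a CPU port while keeping the priority at 7. */
static uint8_t cpu_dest_value(unsigned int port)
{
	return (uint8_t)((CPU_DEST_PRI_MAX << CPU_DEST_PRI_SHIFT) |
			 (port & CPU_DEST_PORT_MASK));
}
```

      Writing only the port number would clear the upper three bits and drop the
      priority to 0, which is the situation the message describes avoiding.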
      
      Fixes: 33641994 ("net: dsa: mv88e6xxx: Monitor and Management tables")
      Signed-off-by: Andrew Lunn <andrew@lunn.ch>
      Tested-by: Chris Healy <cphealy@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d8dc2c96
    • net: ethernet: sxgbe: Rename Samsung to lowercase · 5adcb8b1
      Krzysztof Kozlowski authored
      Fix up inconsistent usage of upper and lowercase letters in "Samsung"
      name.
      
      "SAMSUNG" is not an abbreviation but a regular trademarked name.
      Therefore it should be written in lowercase letters, starting with a
      capital letter.
      
      Although advertisement materials usually use uppercase "SAMSUNG", the
      lowercase version is used in all legal aspects (e.g. on Wikipedia and in
      privacy/legal statements on
      https://www.samsung.com/semiconductor/privacy-global/).
      Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      5adcb8b1
    • net: wan: sdla: Fix cast from pointer to integer of different size · 00c0688c
      Krzysztof Kozlowski authored
      Since net_device.mem_start is unsigned long, it should not be cast to
      int right before casting to pointer.  This fixes warning (compile
      testing on alpha architecture):
      
          drivers/net/wan/sdla.c: In function ‘sdla_transmit’:
          drivers/net/wan/sdla.c:711:13: warning:
              cast to pointer from integer of different size [-Wint-to-pointer-cast]
      Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      00c0688c
    • sctp: free cmd->obj.chunk for the unprocessed SCTP_CMD_REPLY · be7a7729
      Xin Long authored
      This patch fixes a memleak caused by cmd->obj.chunk never being freed
      for an unprocessed SCTP_CMD_REPLY. The issue occurs when processing a cmd
      fails while there are still SCTP_CMD_REPLY cmds on the cmd seq
      with an allocated chunk in cmd->obj.chunk.
      
      So fix it by freeing cmd->obj.chunk for each SCTP_CMD_REPLY cmd left on
      the cmd seq when any cmd returns error. While at it, also remove 'nomem'
      label.
      
      Reported-by: syzbot+107c4aff5f392bf1517f@syzkaller.appspotmail.com
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      be7a7729
    • tipc: eliminate KMSAN: uninit-value in __tipc_nl_compat_dumpit error · a7869e5f
      Ying Xue authored
      syzbot found the following crash on:
      =====================================================
      BUG: KMSAN: uninit-value in __nlmsg_parse include/net/netlink.h:661 [inline]
      BUG: KMSAN: uninit-value in nlmsg_parse_deprecated
      include/net/netlink.h:706 [inline]
      BUG: KMSAN: uninit-value in __tipc_nl_compat_dumpit+0x553/0x11e0
      net/tipc/netlink_compat.c:215
      CPU: 0 PID: 12425 Comm: syz-executor062 Not tainted 5.5.0-rc1-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
      Google 01/01/2011
      Call Trace:
        __dump_stack lib/dump_stack.c:77 [inline]
        dump_stack+0x1c9/0x220 lib/dump_stack.c:118
        kmsan_report+0x128/0x220 mm/kmsan/kmsan_report.c:108
        __msan_warning+0x57/0xa0 mm/kmsan/kmsan_instr.c:245
        __nlmsg_parse include/net/netlink.h:661 [inline]
        nlmsg_parse_deprecated include/net/netlink.h:706 [inline]
        __tipc_nl_compat_dumpit+0x553/0x11e0 net/tipc/netlink_compat.c:215
        tipc_nl_compat_dumpit+0x761/0x910 net/tipc/netlink_compat.c:308
        tipc_nl_compat_handle net/tipc/netlink_compat.c:1252 [inline]
        tipc_nl_compat_recv+0x12e9/0x2870 net/tipc/netlink_compat.c:1311
        genl_family_rcv_msg_doit net/netlink/genetlink.c:672 [inline]
        genl_family_rcv_msg net/netlink/genetlink.c:717 [inline]
        genl_rcv_msg+0x1dd0/0x23a0 net/netlink/genetlink.c:734
        netlink_rcv_skb+0x431/0x620 net/netlink/af_netlink.c:2477
        genl_rcv+0x63/0x80 net/netlink/genetlink.c:745
        netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline]
        netlink_unicast+0xfa0/0x1100 net/netlink/af_netlink.c:1328
        netlink_sendmsg+0x11f0/0x1480 net/netlink/af_netlink.c:1917
        sock_sendmsg_nosec net/socket.c:639 [inline]
        sock_sendmsg net/socket.c:659 [inline]
        ____sys_sendmsg+0x1362/0x13f0 net/socket.c:2330
        ___sys_sendmsg net/socket.c:2384 [inline]
        __sys_sendmsg+0x4f0/0x5e0 net/socket.c:2417
        __do_sys_sendmsg net/socket.c:2426 [inline]
        __se_sys_sendmsg+0x97/0xb0 net/socket.c:2424
        __x64_sys_sendmsg+0x4a/0x70 net/socket.c:2424
        do_syscall_64+0xb6/0x160 arch/x86/entry/common.c:295
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
      RIP: 0033:0x444179
      Code: 18 89 d0 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 48 89 f8 48 89 f7
      48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff
      ff 0f 83 1b d8 fb ff c3 66 2e 0f 1f 84 00 00 00 00
      RSP: 002b:00007ffd2d6409c8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
      RAX: ffffffffffffffda RBX: 00000000004002e0 RCX: 0000000000444179
      RDX: 0000000000000000 RSI: 0000000020000140 RDI: 0000000000000003
      RBP: 00000000006ce018 R08: 0000000000000000 R09: 00000000004002e0
      R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000401e20
      R13: 0000000000401eb0 R14: 0000000000000000 R15: 0000000000000000
      
      Uninit was created at:
        kmsan_save_stack_with_flags mm/kmsan/kmsan.c:149 [inline]
        kmsan_internal_poison_shadow+0x5c/0x110 mm/kmsan/kmsan.c:132
        kmsan_slab_alloc+0x8a/0xe0 mm/kmsan/kmsan_hooks.c:86
        slab_alloc_node mm/slub.c:2774 [inline]
        __kmalloc_node_track_caller+0xe47/0x11f0 mm/slub.c:4382
        __kmalloc_reserve net/core/skbuff.c:141 [inline]
        __alloc_skb+0x309/0xa50 net/core/skbuff.c:209
        alloc_skb include/linux/skbuff.h:1049 [inline]
        nlmsg_new include/net/netlink.h:888 [inline]
        tipc_nl_compat_dumpit+0x6e4/0x910 net/tipc/netlink_compat.c:301
        tipc_nl_compat_handle net/tipc/netlink_compat.c:1252 [inline]
        tipc_nl_compat_recv+0x12e9/0x2870 net/tipc/netlink_compat.c:1311
        genl_family_rcv_msg_doit net/netlink/genetlink.c:672 [inline]
        genl_family_rcv_msg net/netlink/genetlink.c:717 [inline]
        genl_rcv_msg+0x1dd0/0x23a0 net/netlink/genetlink.c:734
        netlink_rcv_skb+0x431/0x620 net/netlink/af_netlink.c:2477
        genl_rcv+0x63/0x80 net/netlink/genetlink.c:745
        netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline]
        netlink_unicast+0xfa0/0x1100 net/netlink/af_netlink.c:1328
        netlink_sendmsg+0x11f0/0x1480 net/netlink/af_netlink.c:1917
        sock_sendmsg_nosec net/socket.c:639 [inline]
        sock_sendmsg net/socket.c:659 [inline]
        ____sys_sendmsg+0x1362/0x13f0 net/socket.c:2330
        ___sys_sendmsg net/socket.c:2384 [inline]
        __sys_sendmsg+0x4f0/0x5e0 net/socket.c:2417
        __do_sys_sendmsg net/socket.c:2426 [inline]
        __se_sys_sendmsg+0x97/0xb0 net/socket.c:2424
        __x64_sys_sendmsg+0x4a/0x70 net/socket.c:2424
        do_syscall_64+0xb6/0x160 arch/x86/entry/common.c:295
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
      =====================================================
      
      The complaint above occurred because the memory region pointed to by the
      attrbuf variable was not initialized. To eliminate this warning, we use kcalloc()
      rather than kmalloc_array() to allocate memory for attrbuf.
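
      As a user-space analogue of this change, calloc() plays the role of kcalloc()
      and returns zeroed memory, whereas malloc() (like kmalloc_array()) leaves the
      contents indeterminate. The helper name below is hypothetical and the table of
      pointers only stands in for attrbuf.

```c
#include <stdlib.h>

/*
 * Allocate a table of nslots pointers, all initially NULL. Slots that a
 * parser never fills then read as NULL instead of uninitialized garbage.
 * The caller releases the table with free().
 */
static void **alloc_attr_table(size_t nslots)
{
	return calloc(nslots, sizeof(void *));
}
```

      The cost is one extra memset-style pass at allocation time, which is usually
      negligible for a small attribute table.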
      
      Reported-by: syzbot+b1fd2bf2c89d8407e15f@syzkaller.appspotmail.com
      Signed-off-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a7869e5f
  7. 05 Jan, 2020 4 commits
    • macb: Don't unregister clks unconditionally · d89091a4
      Stephen Boyd authored
      The only clk init function in this driver that registers a clk is
      fu540_c000_clk_init(), and thus we need to unregister the clk when this
      driver is removed on that platform. Other init functions, for example
      macb_clk_init(), don't register clks and therefore we shouldn't
      unregister the clks when this driver is removed. Convert this
      registration path to devm so it gets auto-unregistered when this driver
      is removed and drop the clk_unregister() calls in driver remove (and
      error paths) so that we don't erroneously remove a clk from the system
      that isn't registered by this driver.
      
      Otherwise we get strange crashes with a use-after-free when the
      devm_clk_get() call in macb_clk_init() calls clk_put() on a clk pointer
      that has become invalid because it is freed in clk_unregister().
      
      Cc: Nicolas Ferre <nicolas.ferre@microchip.com>
      Cc: Yash Shah <yash.shah@sifive.com>
      Reported-by: Guenter Roeck <linux@roeck-us.net>
      Fixes: c218ad55 ("macb: Add support for SiFive FU540-C000")
      Signed-off-by: Stephen Boyd <sboyd@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d89091a4
    • MAINTAINERS: Drop obsolete entries from Samsung sxgbe ethernet driver · 15a821f0
      Krzysztof Kozlowski authored
      The emails to ks.giri@samsung.com and vipul.pandya@samsung.com bounce
      with 550 error code:
      
          host mailin.samsung.com[203.254.224.12] said: 550
          5.1.1 Recipient address rejected: User unknown (in reply to RCPT TO
          command)
      
      Drop Girish K S and Vipul Pandya from sxgbe maintainers entry.
      
      Cc: Byungho An <bh74.an@samsung.com>
      Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      15a821f0
    • net: qrtr: fix len of skb_put_padto in qrtr_node_enqueue · ce57785b
      Carl Huang authored
      The len used for skb_put_padto is wrong; it needs to include the len of hdr.
      
      In qrtr_node_enqueue, the local variable size_t len is assigned
      skb->len, then skb_push(skb, sizeof(*hdr)) increases skb->len by
      sizeof(*hdr), so len is no longer the same as skb->len
      after skb_push(skb, sizeof(*hdr)).
      
      The purpose of skb_put_padto(skb, ALIGN(len, 4)) is to add
      padding to the end of the skb's data if skb->len is not aligned to 4, but
      it uses len instead of skb->len. At this line, skb->len
      is 32 bytes (sizeof(*hdr)) more than len. For example, if len is 3 bytes,
      then skb->len is 35 bytes (3 + 32) and ALIGN(len, 4) is 4 bytes, so
      __skb_put_padto does nothing because the check size(35) < len(4) fails. The
      correct value is 36 (sizeof(*hdr) + ALIGN(len, 4) = 32 + 4);
      then __skb_put_padto passes the check size(35) < len(36) and adds 1 byte
      to the end of the skb's data, which is the intended behavior.
      
      function of skb_push:
      void *skb_push(struct sk_buff *skb, unsigned int len)
      {
      	skb->data -= len;
      	skb->len  += len;
      	if (unlikely(skb->data < skb->head))
      		skb_under_panic(skb, len, __builtin_return_address(0));
      	return skb->data;
      }
      
      function of skb_put_padto
      static inline int skb_put_padto(struct sk_buff *skb, unsigned int len)
      {
      	return __skb_put_padto(skb, len, true);
      }
      
      function of __skb_put_padto
      static inline int __skb_put_padto(struct sk_buff *skb, unsigned int len,
      				  bool free_on_error)
      {
      	unsigned int size = skb->len;
      
      	if (unlikely(size < len)) {
      		len -= size;
      		if (__skb_pad(skb, len, free_on_error))
      			return -ENOMEM;
      		__skb_put(skb, len);
      	}
      	return 0;
      }
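
      The arithmetic in this message can be checked with a small user-space
      sketch. It does not touch sk_buff: padded_len() is a hypothetical stand-in
      for the size check in __skb_put_padto(), ALIGN_UP mimics the kernel's ALIGN(),
      and HDR_SIZE is assumed to be the 32 bytes quoted above.

```c
#include <stddef.h>

/* Round x up to a multiple of a (a must be a power of two). */
#define ALIGN_UP(x, a) (((x) + ((size_t)(a) - 1)) & ~((size_t)(a) - 1))

/* sizeof(*hdr) as quoted in the commit message; an assumption here. */
#define HDR_SIZE ((size_t)32)

/* Stand-in for __skb_put_padto(): pad only when the skb is shorter. */
static size_t padded_len(size_t skb_len, size_t target)
{
	return skb_len < target ? target : skb_len;
}

/* Old behavior: the target ignores the header already pushed. */
static size_t enqueue_len_buggy(size_t payload)
{
	size_t skb_len = payload + HDR_SIZE; /* after skb_push() */

	return padded_len(skb_len, ALIGN_UP(payload, 4));
}

/* Fixed behavior: the target is header size plus aligned payload. */
static size_t enqueue_len_fixed(size_t payload)
{
	size_t skb_len = payload + HDR_SIZE; /* after skb_push() */

	return padded_len(skb_len, HDR_SIZE + ALIGN_UP(payload, 4));
}
```

      With a 3-byte payload the first variant leaves the length at 35, while the
      second pads it to 36, matching the example in the message.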
      Signed-off-by: Carl Huang <cjhuang@codeaurora.org>
      Signed-off-by: Wen Gong <wgong@codeaurora.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ce57785b
    • drivers/net/b44: Change to non-atomic bit operations on pwol_mask · f11421ba
      Fenghua Yu authored
      Atomic operations that span cache lines are super-expensive on x86
      (not just to the current processor, but also to other processes as all
      memory operations are blocked until the operation completes). Upcoming
      x86 processors have a switch to cause such operations to generate a #AC
      trap. It is expected that some real time systems will enable this mode
      in BIOS.
      
      In preparation for this, it is necessary to fix code that may execute
      atomic instructions with operands that cross cachelines because the #AC
      trap will crash the kernel.
      
      Since "pwol_mask" is local and never exposed to concurrency, there is
      no need to set bits in pwol_mask using atomic operations.
      
      Directly operate on the byte which contains the bit instead of using
      __set_bit() to avoid any big endian concern due to type cast to
      unsigned long in __set_bit().
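
      A byte-wise bit set of the kind described above might look like the sketch
      below. The function name is hypothetical and this is not the driver's code;
      it only shows why indexing by byte sidesteps the unsigned long cast.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Set bit nr in a byte-addressed mask, non-atomically. Bit nr lives in byte
 * nr / 8 at position nr % 8, so the memory layout is the same on little-
 * and big-endian machines and no access spans more than one byte.
 */
static void mask_set_bit(uint8_t *mask, size_t nr)
{
	mask[nr / 8] |= (uint8_t)(1u << (nr % 8));
}
```

      Because each update touches a single byte, it cannot straddle a cache line,
      and no locked instruction is involved.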
      Suggested-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
      Signed-off-by: Tony Luck <tony.luck@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f11421ba