1. 30 Aug, 2017 25 commits
    • Daniel Borkmann's avatar
      bpf: fix mixed signed/unsigned derived min/max value bounds · bf5b91b7
      Daniel Borkmann authored
      
      [ Upstream commit 4cabc5b1 ]
      
      Edward reported that there's an issue in min/max value bounds
      tracking when signed and unsigned compares both provide hints
      on limits when having unknown variables. E.g. a program such
      as the following should have been rejected:
      
         0: (7a) *(u64 *)(r10 -8) = 0
         1: (bf) r2 = r10
         2: (07) r2 += -8
         3: (18) r1 = 0xffff8a94cda93400
         5: (85) call bpf_map_lookup_elem#1
         6: (15) if r0 == 0x0 goto pc+7
        R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R10=fp
         7: (7a) *(u64 *)(r10 -16) = -8
         8: (79) r1 = *(u64 *)(r10 -16)
         9: (b7) r2 = -1
        10: (2d) if r1 > r2 goto pc+3
        R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R1=inv,min_value=0
        R2=imm-1,max_value=18446744073709551615,min_align=1 R10=fp
        11: (65) if r1 s> 0x1 goto pc+2
        R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R1=inv,min_value=0,max_value=1
        R2=imm-1,max_value=18446744073709551615,min_align=1 R10=fp
        12: (0f) r0 += r1
        13: (72) *(u8 *)(r0 +0) = 0
        R0=map_value_adj(ks=8,vs=8,id=0),min_value=0,max_value=1 R1=inv,min_value=0,max_value=1
        R2=imm-1,max_value=18446744073709551615,min_align=1 R10=fp
        14: (b7) r0 = 0
        15: (95) exit
      
      What happens is that in the first part ...
      
         8: (79) r1 = *(u64 *)(r10 -16)
         9: (b7) r2 = -1
        10: (2d) if r1 > r2 goto pc+3
      
      ... r1 carries an unsigned value, and is compared as unsigned
      against a register carrying an immediate. Verifier deduces in
      reg_set_min_max() that since the compare is unsigned and operation
      is greater than (>), that in the fall-through/false case, r1's
      minimum bound must be 0 and maximum bound must be r2. Latter is
      larger than the bound and thus max value is reset back to being
      'invalid' aka BPF_REGISTER_MAX_RANGE. Thus, r1 state is now
      'R1=inv,min_value=0'. The subsequent test ...
      
        11: (65) if r1 s> 0x1 goto pc+2
      
      ... is a signed compare of r1 with immediate value 1. Here,
      verifier deduces in reg_set_min_max() that since the compare
      is signed this time and operation is greater than (>), that
      in the fall-through/false case, we can deduce that r1's maximum
      bound must be 1, meaning with prior test, we result in r1 having
      the following state: R1=inv,min_value=0,max_value=1. Given that
      the actual value this holds is -8, the bounds are wrongly deduced.
      When this is being added to r0 which holds the map_value(_adj)
      type, then subsequent store access in above case will go through
      check_mem_access() which invokes check_map_access_adj(), that
      will then probe whether the map memory is in bounds based
      on the min_value and max_value as well as access size since
      the actual unknown value is min_value <= x <= max_value; commit
      fce366a9 ("bpf, verifier: fix alu ops against map_value{,
      _adj} register types") provides some more explanation on the
      semantics.
      
      It's worth to note in this context that in the current code,
      min_value and max_value tracking are used for two things, i)
      dynamic map value access via check_map_access_adj() and since
      commit 06c1c049 ("bpf: allow helpers access to variable memory")
      ii) also enforced at check_helper_mem_access() when passing a
      memory address (pointer to packet, map value, stack) and length
      pair to a helper and the length in this case is an unknown value
      defining an access range through min_value/max_value in that
      case. The min_value/max_value tracking is /not/ used in the
      direct packet access case to track ranges. However, the issue
      also affects case ii), for example, the following crafted program
      based on the same principle must be rejected as well:
      
         0: (b7) r2 = 0
         1: (bf) r3 = r10
         2: (07) r3 += -512
         3: (7a) *(u64 *)(r10 -16) = -8
         4: (79) r4 = *(u64 *)(r10 -16)
         5: (b7) r6 = -1
         6: (2d) if r4 > r6 goto pc+5
        R1=ctx R2=imm0,min_value=0,max_value=0,min_align=2147483648 R3=fp-512
        R4=inv,min_value=0 R6=imm-1,max_value=18446744073709551615,min_align=1 R10=fp
         7: (65) if r4 s> 0x1 goto pc+4
        R1=ctx R2=imm0,min_value=0,max_value=0,min_align=2147483648 R3=fp-512
        R4=inv,min_value=0,max_value=1 R6=imm-1,max_value=18446744073709551615,min_align=1
        R10=fp
         8: (07) r4 += 1
         9: (b7) r5 = 0
        10: (6a) *(u16 *)(r10 -512) = 0
        11: (85) call bpf_skb_load_bytes#26
        12: (b7) r0 = 0
        13: (95) exit
      
      Meaning, while we initialize the max_value stack slot that the
      verifier thinks we access in the [1,2] range, in reality we
      pass -7 as length which is interpreted as u32 in the helper.
      Thus, this issue is relevant also for the case of helper ranges.
      Resetting both bounds in check_reg_overflow() in case only one
      of them exceeds limits is also not enough as similar test can be
      created that uses values which are within range, thus also here
      learned min value in r1 is incorrect when mixed with later signed
      test to create a range:
      
         0: (7a) *(u64 *)(r10 -8) = 0
         1: (bf) r2 = r10
         2: (07) r2 += -8
         3: (18) r1 = 0xffff880ad081fa00
         5: (85) call bpf_map_lookup_elem#1
         6: (15) if r0 == 0x0 goto pc+7
        R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R10=fp
         7: (7a) *(u64 *)(r10 -16) = -8
         8: (79) r1 = *(u64 *)(r10 -16)
         9: (b7) r2 = 2
        10: (3d) if r2 >= r1 goto pc+3
        R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R1=inv,min_value=3
        R2=imm2,min_value=2,max_value=2,min_align=2 R10=fp
        11: (65) if r1 s> 0x4 goto pc+2
        R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0
        R1=inv,min_value=3,max_value=4 R2=imm2,min_value=2,max_value=2,min_align=2 R10=fp
        12: (0f) r0 += r1
        13: (72) *(u8 *)(r0 +0) = 0
        R0=map_value_adj(ks=8,vs=8,id=0),min_value=3,max_value=4
        R1=inv,min_value=3,max_value=4 R2=imm2,min_value=2,max_value=2,min_align=2 R10=fp
        14: (b7) r0 = 0
        15: (95) exit
      
      This leaves us with two options for fixing this: i) to invalidate
      all prior learned information once we switch signed context, ii)
      to track min/max signed and unsigned boundaries separately as
      done in [0]. (Given latter introduces major changes throughout
      the whole verifier, it's rather net-next material, thus this
      patch follows option i), meaning we can derive bounds either
      from only signed tests or only unsigned tests.) There is still the
      case of adjust_reg_min_max_vals(), where we adjust bounds on ALU
      operations, meaning programs like the following where boundaries
      on the reg get mixed in context later on when bounds are merged
      on the dst reg must get rejected, too:
      
         0: (7a) *(u64 *)(r10 -8) = 0
         1: (bf) r2 = r10
         2: (07) r2 += -8
         3: (18) r1 = 0xffff89b2bf87ce00
         5: (85) call bpf_map_lookup_elem#1
         6: (15) if r0 == 0x0 goto pc+6
        R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R10=fp
         7: (7a) *(u64 *)(r10 -16) = -8
         8: (79) r1 = *(u64 *)(r10 -16)
         9: (b7) r2 = 2
        10: (3d) if r2 >= r1 goto pc+2
        R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R1=inv,min_value=3
        R2=imm2,min_value=2,max_value=2,min_align=2 R10=fp
        11: (b7) r7 = 1
        12: (65) if r7 s> 0x0 goto pc+2
        R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R1=inv,min_value=3
        R2=imm2,min_value=2,max_value=2,min_align=2 R7=imm1,max_value=0 R10=fp
        13: (b7) r0 = 0
        14: (95) exit
      
        from 12 to 15: R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0
        R1=inv,min_value=3 R2=imm2,min_value=2,max_value=2,min_align=2 R7=imm1,min_value=1 R10=fp
        15: (0f) r7 += r1
        16: (65) if r7 s> 0x4 goto pc+2
        R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R1=inv,min_value=3
        R2=imm2,min_value=2,max_value=2,min_align=2 R7=inv,min_value=4,max_value=4 R10=fp
        17: (0f) r0 += r7
        18: (72) *(u8 *)(r0 +0) = 0
        R0=map_value_adj(ks=8,vs=8,id=0),min_value=4,max_value=4 R1=inv,min_value=3
        R2=imm2,min_value=2,max_value=2,min_align=2 R7=inv,min_value=4,max_value=4 R10=fp
        19: (b7) r0 = 0
        20: (95) exit
      
      Meaning, in adjust_reg_min_max_vals() we must also reset range
      values on the dst when src/dst registers have mixed signed/
      unsigned derived min/max value bounds with one unbounded value
      as otherwise they can be added together deducing false boundaries.
      Once both boundaries are established from either ALU ops or
      compare operations w/o mixing signed/unsigned insns, then they
      can safely be added to other regs also having both boundaries
      established. Adding regs with one unbounded side to a map value
      where the bounded side has been learned w/o mixing ops is
      possible, but the resulting map value won't recover from that,
      meaning such op is considered invalid on the time of actual
      access. Invalid bounds are set on the dst reg in case i) src reg,
      or ii) in case dst reg already had them. The only way to recover
      would be to perform i) ALU ops but only 'add' is allowed on map
      value types or ii) comparisons, but these are disallowed on
      pointers in case they span a range. This is fine as only BPF_JEQ
      and BPF_JNE may be performed on PTR_TO_MAP_VALUE_OR_NULL registers
      which potentially turn them into PTR_TO_MAP_VALUE type depending
      on the branch, so only here min/max value cannot be invalidated
      for them.
      
      In terms of state pruning, value_from_signed is considered
      as well in states_equal() when dealing with adjusted map values.
      With regards to breaking existing programs, there is a small
      risk, but use-cases are rather quite narrow where this could
      occur and mixing compares probably unlikely.
      
      Joint work with Josef and Edward.
      
        [0] https://lists.iovisor.org/pipermail/iovisor-dev/2017-June/000822.html
      
      Fixes: 48461135 ("bpf: allow access into map value arrays")
      Reported-by: default avatarEdward Cree <ecree@solarflare.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarEdward Cree <ecree@solarflare.com>
      Signed-off-by: default avatarJosef Bacik <jbacik@fb.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      bf5b91b7
    • Daniel Borkmann's avatar
      bpf, verifier: fix alu ops against map_value{, _adj} register types · 8d674bee
      Daniel Borkmann authored
      
      [ Upstream commit fce366a9 ]
      
      While looking into map_value_adj, I noticed that alu operations
      directly on the map_value() resp. map_value_adj() register (any
      alu operation on a map_value() register will turn it into a
      map_value_adj() typed register) are not sufficiently protected
      against some of the operations. Two non-exhaustive examples are
      provided that the verifier needs to reject:
      
       i) BPF_AND on r0 (map_value_adj):
      
        0: (bf) r2 = r10
        1: (07) r2 += -8
        2: (7a) *(u64 *)(r2 +0) = 0
        3: (18) r1 = 0xbf842a00
        5: (85) call bpf_map_lookup_elem#1
        6: (15) if r0 == 0x0 goto pc+2
         R0=map_value(ks=8,vs=48,id=0),min_value=0,max_value=0 R10=fp
        7: (57) r0 &= 8
        8: (7a) *(u64 *)(r0 +0) = 22
         R0=map_value_adj(ks=8,vs=48,id=0),min_value=0,max_value=8 R10=fp
        9: (95) exit
      
        from 6 to 9: R0=inv,min_value=0,max_value=0 R10=fp
        9: (95) exit
        processed 10 insns
      
      ii) BPF_ADD in 32 bit mode on r0 (map_value_adj):
      
        0: (bf) r2 = r10
        1: (07) r2 += -8
        2: (7a) *(u64 *)(r2 +0) = 0
        3: (18) r1 = 0xc24eee00
        5: (85) call bpf_map_lookup_elem#1
        6: (15) if r0 == 0x0 goto pc+2
         R0=map_value(ks=8,vs=48,id=0),min_value=0,max_value=0 R10=fp
        7: (04) (u32) r0 += (u32) 0
        8: (7a) *(u64 *)(r0 +0) = 22
         R0=map_value_adj(ks=8,vs=48,id=0),min_value=0,max_value=0 R10=fp
        9: (95) exit
      
        from 6 to 9: R0=inv,min_value=0,max_value=0 R10=fp
        9: (95) exit
        processed 10 insns
      
      Issue is, while min_value / max_value boundaries for the access
      are adjusted appropriately, we change the pointer value in a way
      that cannot be sufficiently tracked anymore from its origin.
      Operations like BPF_{AND,OR,DIV,MUL,etc} on a destination register
      that is PTR_TO_MAP_VALUE{,_ADJ} was probably unintended, in fact,
      all the test cases coming with 48461135 ("bpf: allow access
      into map value arrays") perform BPF_ADD only on the destination
      register that is PTR_TO_MAP_VALUE_ADJ.
      
      Only for UNKNOWN_VALUE register types such operations make sense,
      f.e. with unknown memory content fetched initially from a constant
      offset from the map value memory into a register. That register is
      then later tested against lower / upper bounds, so that the verifier
      can then do the tracking of min_value / max_value, and properly
      check once that UNKNOWN_VALUE register is added to the destination
      register with type PTR_TO_MAP_VALUE{,_ADJ}. This is also what the
      original use-case is solving. Note, tracking on what is being
      added is done through adjust_reg_min_max_vals() and later access
      to the map value enforced with these boundaries and the given offset
      from the insn through check_map_access_adj().
      
      Tests will fail for non-root environment due to prohibited pointer
      arithmetic, in particular in check_alu_op(), we bail out on the
      is_pointer_value() check on the dst_reg (which is false in root
      case as we allow for pointer arithmetic via env->allow_ptr_leaks).
      
      Similarly to PTR_TO_PACKET, one way to fix it is to restrict the
      allowed operations on PTR_TO_MAP_VALUE{,_ADJ} registers to 64 bit
      mode BPF_ADD. The test_verifier suite runs fine after the patch
      and it also rejects mentioned test cases.
      
      Fixes: 48461135 ("bpf: allow access into map value arrays")
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Reviewed-by: default avatarJosef Bacik <jbacik@fb.com>
      Acked-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8d674bee
    • Daniel Borkmann's avatar
      bpf: adjust verifier heuristics · 577aa83b
      Daniel Borkmann authored
      
      [ Upstream commit 3c2ce60b ]
      
      Current limits with regards to processing program paths do not
      really reflect today's needs anymore due to programs becoming
      more complex and verifier smarter, keeping track of more data
      such as const ALU operations, alignment tracking, spilling of
      PTR_TO_MAP_VALUE_ADJ registers, and other features allowing for
      smarter matching of what LLVM generates.
      
      This also comes with the side-effect that we result in fewer
      opportunities to prune search states and thus often need to do
      more work to prove safety than in the past due to different
      register states and stack layout where we mismatch. Generally,
      it's quite hard to determine what caused a sudden increase in
      complexity, it could be caused by something as trivial as a
      single branch somewhere at the beginning of the program where
      LLVM assigned a stack slot that is marked differently throughout
      other branches and thus causing a mismatch, where verifier
      then needs to prove safety for the whole rest of the program.
      Subsequently, programs with even less than half the insn size
      limit can get rejected. We noticed that while some programs
      load fine under pre 4.11, they get rejected due to hitting
      limits on more recent kernels. We saw that in the vast majority
      of cases (90+%) pruning failed due to register mismatches. In
      case of stack mismatches, majority of cases failed due to
      different stack slot types (invalid, spill, misc) rather than
      differences in spilled registers.
      
      This patch makes pruning more aggressive by also adding markers
      that sit at conditional jumps as well. Currently, we only mark
      jump targets for pruning. For example in direct packet access,
      these are usually error paths where we bail out. We found that
      adding these markers, it can reduce number of processed insns
      by up to 30%. Another option is to ignore reg->id in probing
      PTR_TO_MAP_VALUE_OR_NULL registers, which can help pruning
      slightly as well by up to 7% observed complexity reduction as
      stand-alone. Meaning, if a previous path with register type
      PTR_TO_MAP_VALUE_OR_NULL for map X was found to be safe, then
      in the current state a PTR_TO_MAP_VALUE_OR_NULL register for
      the same map X must be safe as well. Last but not least the
      patch also adds a scheduling point and bumps the current limit
      for instructions to be processed to a more adequate value.
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      577aa83b
    • John Fastabend's avatar
      bpf, verifier: add additional patterns to evaluate_reg_imm_alu · e37bdeee
      John Fastabend authored
      
      [ Upstream commit 43188702 ]
      
      Currently the verifier does not track imm across alu operations when
      the source register is of unknown type. This adds additional pattern
      matching to catch this and track imm. We've seen LLVM generating this
      pattern while working on cilium.
      Signed-off-by: default avatarJohn Fastabend <john.fastabend@gmail.com>
      Acked-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e37bdeee
    • Konstantin Khlebnikov's avatar
      net_sched: fix order of queue length updates in qdisc_replace() · 7fa2fdf9
      Konstantin Khlebnikov authored
      
      [ Upstream commit 68a66d14 ]
      
      This important to call qdisc_tree_reduce_backlog() after changing queue
      length. Parent qdisc should deactivate class in ->qlen_notify() called from
      qdisc_tree_reduce_backlog() but this happens only if qdisc->q.qlen in zero.
      
      Missed class deactivations leads to crashes/warnings at picking packets
      from empty qdisc and corrupting state at reactivating this class in future.
      Signed-off-by: default avatarKonstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Fixes: 86a7996c ("net_sched: introduce qdisc_replace() helper")
      Acked-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7fa2fdf9
    • Xin Long's avatar
      net: sched: fix NULL pointer dereference when action calls some targets · 3e00bf91
      Xin Long authored
      
      [ Upstream commit 4f8a881a ]
      
      As we know in some target's checkentry it may dereference par.entryinfo
      to check entry stuff inside. But when sched action calls xt_check_target,
      par.entryinfo is set with NULL. It would cause kernel panic when calling
      some targets.
      
      It can be reproduce with:
        # tc qd add dev eth1 ingress handle ffff:
        # tc filter add dev eth1 parent ffff: u32 match u32 0 0 action xt \
          -j ECN --ecn-tcp-remove
      
      It could also crash kernel when using target CLUSTERIP or TPROXY.
      
      By now there's no proper value for par.entryinfo in ipt_init_target,
      but it can not be set with NULL. This patch is to void all these
      panics by setting it with an ipt_entry obj with all members = 0.
      
      Note that this issue has been there since the very beginning.
      Signed-off-by: default avatarXin Long <lucien.xin@gmail.com>
      Acked-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3e00bf91
    • Colin Ian King's avatar
      irda: do not leak initialized list.dev to userspace · f3f5bf27
      Colin Ian King authored
      
      [ Upstream commit b024d949 ]
      
      list.dev has not been initialized and so the copy_to_user is copying
      data from the stack back to user space which is a potential
      information leak. Fix this ensuring all of list is initialized to
      zero.
      
      Detected by CoverityScan, CID#1357894 ("Uninitialized scalar variable")
      Signed-off-by: default avatarColin Ian King <colin.king@canonical.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f3f5bf27
    • Huy Nguyen's avatar
      net/mlx4_core: Enable 4K UAR if SRIOV module parameter is not enabled · 19f433a9
      Huy Nguyen authored
      
      [ Upstream commit ca3d89a3 ]
      
      enable_4k_uar module parameter was added in patch cited below to
      address the backward compatibility issue in SRIOV when the VM has
      system's PAGE_SIZE uar implementation and the Hypervisor has 4k uar
      implementation.
      
      The above compatibility issue does not exist in the non SRIOV case.
      In this patch, we always enable 4k uar implementation if SRIOV
      is not enabled on mlx4's supported cards.
      
      Fixes: 76e39ccf ("net/mlx4_core: Fix backward compatibility on VFs")
      Signed-off-by: default avatarHuy Nguyen <huyn@mellanox.com>
      Reviewed-by: default avatarDaniel Jurgens <danielj@mellanox.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      19f433a9
    • Neal Cardwell's avatar
      tcp: when rearming RTO, if RTO time is in past then fire RTO ASAP · aadbe1fe
      Neal Cardwell authored
      
      [ Upstream commit cdbeb633 ]
      
      In some situations tcp_send_loss_probe() can realize that it's unable
      to send a loss probe (TLP), and falls back to calling tcp_rearm_rto()
      to schedule an RTO timer. In such cases, sometimes tcp_rearm_rto()
      realizes that the RTO was eligible to fire immediately or at some
      point in the past (delta_us <= 0). Previously in such cases
      tcp_rearm_rto() was scheduling such "overdue" RTOs to happen at now +
      icsk_rto, which caused needless delays of hundreds of milliseconds
      (and non-linear behavior that made reproducible testing
      difficult). This commit changes the logic to schedule "overdue" RTOs
      ASAP, rather than at now + icsk_rto.
      
      Fixes: 6ba8a3b1 ("tcp: Tail loss probe (TLP)")
      Suggested-by: default avatarYuchung Cheng <ycheng@google.com>
      Signed-off-by: default avatarNeal Cardwell <ncardwell@google.com>
      Signed-off-by: default avatarYuchung Cheng <ycheng@google.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      aadbe1fe
    • Wei Wang's avatar
      ipv6: repair fib6 tree in failure case · 1c18f936
      Wei Wang authored
      
      [ Upstream commit 348a4002 ]
      
      In fib6_add(), it is possible that fib6_add_1() picks an intermediate
      node and sets the node's fn->leaf to NULL in order to add this new
      route. However, if fib6_add_rt2node() fails to add the new
      route for some reason, fn->leaf will be left as NULL and could
      potentially cause crash when fn->leaf is accessed in fib6_locate().
      This patch makes sure fib6_repair_tree() is called to properly repair
      fn->leaf in the above failure case.
      
      Here is the syzkaller reported general protection fault in fib6_locate:
      kasan: CONFIG_KASAN_INLINE enabled
      kasan: GPF could be caused by NULL-ptr deref or user memory access
      general protection fault: 0000 [#1] SMP KASAN
      Modules linked in:
      CPU: 0 PID: 40937 Comm: syz-executor3 Not tainted
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      task: ffff8801d7d64100 ti: ffff8801d01a0000 task.ti: ffff8801d01a0000
      RIP: 0010:[<ffffffff82a3e0e1>]  [<ffffffff82a3e0e1>] __ipv6_prefix_equal64_half include/net/ipv6.h:475 [inline]
      RIP: 0010:[<ffffffff82a3e0e1>]  [<ffffffff82a3e0e1>] ipv6_prefix_equal include/net/ipv6.h:492 [inline]
      RIP: 0010:[<ffffffff82a3e0e1>]  [<ffffffff82a3e0e1>] fib6_locate_1 net/ipv6/ip6_fib.c:1210 [inline]
      RIP: 0010:[<ffffffff82a3e0e1>]  [<ffffffff82a3e0e1>] fib6_locate+0x281/0x3c0 net/ipv6/ip6_fib.c:1233
      RSP: 0018:ffff8801d01a36a8  EFLAGS: 00010202
      RAX: 0000000000000020 RBX: ffff8801bc790e00 RCX: ffffc90002983000
      RDX: 0000000000001219 RSI: ffff8801d01a37a0 RDI: 0000000000000100
      RBP: ffff8801d01a36f0 R08: 00000000000000ff R09: 0000000000000000
      R10: 0000000000000003 R11: 0000000000000000 R12: 0000000000000001
      R13: dffffc0000000000 R14: ffff8801d01a37a0 R15: 0000000000000000
      FS:  00007f6afd68c700(0000) GS:ffff8801db400000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00000000004c6340 CR3: 00000000ba41f000 CR4: 00000000001426f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Stack:
       ffff8801d01a37a8 ffff8801d01a3780 ffffed003a0346f5 0000000c82a23ea0
       ffff8800b7bd7700 ffff8801d01a3780 ffff8800b6a1c940 ffffffff82a23ea0
       ffff8801d01a3920 ffff8801d01a3748 ffffffff82a223d6 ffff8801d7d64988
      Call Trace:
       [<ffffffff82a223d6>] ip6_route_del+0x106/0x570 net/ipv6/route.c:2109
       [<ffffffff82a23f9d>] inet6_rtm_delroute+0xfd/0x100 net/ipv6/route.c:3075
       [<ffffffff82621359>] rtnetlink_rcv_msg+0x549/0x7a0 net/core/rtnetlink.c:3450
       [<ffffffff8274c1d1>] netlink_rcv_skb+0x141/0x370 net/netlink/af_netlink.c:2281
       [<ffffffff82613ddf>] rtnetlink_rcv+0x2f/0x40 net/core/rtnetlink.c:3456
       [<ffffffff8274ad38>] netlink_unicast_kernel net/netlink/af_netlink.c:1206 [inline]
       [<ffffffff8274ad38>] netlink_unicast+0x518/0x750 net/netlink/af_netlink.c:1232
       [<ffffffff8274b83e>] netlink_sendmsg+0x8ce/0xc30 net/netlink/af_netlink.c:1778
       [<ffffffff82564aff>] sock_sendmsg_nosec net/socket.c:609 [inline]
       [<ffffffff82564aff>] sock_sendmsg+0xcf/0x110 net/socket.c:619
       [<ffffffff82564d62>] sock_write_iter+0x222/0x3a0 net/socket.c:834
       [<ffffffff8178523d>] new_sync_write+0x1dd/0x2b0 fs/read_write.c:478
       [<ffffffff817853f4>] __vfs_write+0xe4/0x110 fs/read_write.c:491
       [<ffffffff81786c38>] vfs_write+0x178/0x4b0 fs/read_write.c:538
       [<ffffffff817892a9>] SYSC_write fs/read_write.c:585 [inline]
       [<ffffffff817892a9>] SyS_write+0xd9/0x1b0 fs/read_write.c:577
       [<ffffffff82c71e32>] entry_SYSCALL_64_fastpath+0x12/0x17
      
      Note: there is no "Fixes" tag as this seems to be a bug introduced
      very early.
      Signed-off-by: default avatarWei Wang <weiwan@google.com>
      Acked-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1c18f936
    • Wei Wang's avatar
      ipv6: reset fn->rr_ptr when replacing route · 62e9a28b
      Wei Wang authored
      
      [ Upstream commit 383143f3 ]
      
      syzcaller reported the following use-after-free issue in rt6_select():
      BUG: KASAN: use-after-free in rt6_select net/ipv6/route.c:755 [inline] at addr ffff8800bc6994e8
      BUG: KASAN: use-after-free in ip6_pol_route.isra.46+0x1429/0x1470 net/ipv6/route.c:1084 at addr ffff8800bc6994e8
      Read of size 4 by task syz-executor1/439628
      CPU: 0 PID: 439628 Comm: syz-executor1 Not tainted 4.3.5+ #8
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
       0000000000000000 ffff88018fe435b0 ffffffff81ca384d ffff8801d3588c00
       ffff8800bc699380 ffff8800bc699500 dffffc0000000000 ffff8801d40a47c0
       ffff88018fe435d8 ffffffff81735751 ffff88018fe43660 ffff8800bc699380
      Call Trace:
       [<ffffffff81ca384d>] __dump_stack lib/dump_stack.c:15 [inline]
       [<ffffffff81ca384d>] dump_stack+0xc1/0x124 lib/dump_stack.c:51
      sctp: [Deprecated]: syz-executor0 (pid 439615) Use of struct sctp_assoc_value in delayed_ack socket option.
      Use struct sctp_sack_info instead
       [<ffffffff81735751>] kasan_object_err+0x21/0x70 mm/kasan/report.c:158
       [<ffffffff817359c4>] print_address_description mm/kasan/report.c:196 [inline]
       [<ffffffff817359c4>] kasan_report_error+0x1b4/0x4a0 mm/kasan/report.c:285
       [<ffffffff81735d93>] kasan_report mm/kasan/report.c:305 [inline]
       [<ffffffff81735d93>] __asan_report_load4_noabort+0x43/0x50 mm/kasan/report.c:325
       [<ffffffff82a28e39>] rt6_select net/ipv6/route.c:755 [inline]
       [<ffffffff82a28e39>] ip6_pol_route.isra.46+0x1429/0x1470 net/ipv6/route.c:1084
       [<ffffffff82a28fb1>] ip6_pol_route_output+0x81/0xb0 net/ipv6/route.c:1203
       [<ffffffff82ab0a50>] fib6_rule_action+0x1f0/0x680 net/ipv6/fib6_rules.c:95
       [<ffffffff8265cbb6>] fib_rules_lookup+0x2a6/0x7a0 net/core/fib_rules.c:223
       [<ffffffff82ab1430>] fib6_rule_lookup+0xd0/0x250 net/ipv6/fib6_rules.c:41
       [<ffffffff82a22006>] ip6_route_output+0x1d6/0x2c0 net/ipv6/route.c:1224
       [<ffffffff829e83d2>] ip6_dst_lookup_tail+0x4d2/0x890 net/ipv6/ip6_output.c:943
       [<ffffffff829e889a>] ip6_dst_lookup_flow+0x9a/0x250 net/ipv6/ip6_output.c:1079
       [<ffffffff82a9f7d8>] ip6_datagram_dst_update+0x538/0xd40 net/ipv6/datagram.c:91
       [<ffffffff82aa0978>] __ip6_datagram_connect net/ipv6/datagram.c:251 [inline]
       [<ffffffff82aa0978>] ip6_datagram_connect+0x518/0xe50 net/ipv6/datagram.c:272
       [<ffffffff82aa1313>] ip6_datagram_connect_v6_only+0x63/0x90 net/ipv6/datagram.c:284
       [<ffffffff8292f790>] inet_dgram_connect+0x170/0x1f0 net/ipv4/af_inet.c:564
       [<ffffffff82565547>] SYSC_connect+0x1a7/0x2f0 net/socket.c:1582
       [<ffffffff8256a649>] SyS_connect+0x29/0x30 net/socket.c:1563
       [<ffffffff82c72032>] entry_SYSCALL_64_fastpath+0x12/0x17
      Object at ffff8800bc699380, in cache ip6_dst_cache size: 384
      
      The root cause of it is that in fib6_add_rt2node(), when it replaces an
      existing route with the new one, it does not update fn->rr_ptr.
      This commit resets fn->rr_ptr to NULL when it points to a route which is
      replaced in fib6_add_rt2node().
      
      Fixes: 27596472 ("ipv6: fix ECMP route replacement")
      Signed-off-by: default avatarWei Wang <weiwan@google.com>
      Acked-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      62e9a28b
    • Eric Dumazet's avatar
      tipc: fix use-after-free · 7ad5fb95
      Eric Dumazet authored
      
      [ Upstream commit 5bfd37b4 ]
      
      syszkaller reported use-after-free in tipc [1]
      
      When msg->rep skb is freed, set the pointer to NULL,
      so that caller does not free it again.
      
      [1]
      
      ==================================================================
      BUG: KASAN: use-after-free in skb_push+0xd4/0xe0 net/core/skbuff.c:1466
      Read of size 8 at addr ffff8801c6e71e90 by task syz-executor5/4115
      
      CPU: 1 PID: 4115 Comm: syz-executor5 Not tainted 4.13.0-rc4+ #32
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:16 [inline]
       dump_stack+0x194/0x257 lib/dump_stack.c:52
       print_address_description+0x73/0x250 mm/kasan/report.c:252
       kasan_report_error mm/kasan/report.c:351 [inline]
       kasan_report+0x24e/0x340 mm/kasan/report.c:409
       __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:430
       skb_push+0xd4/0xe0 net/core/skbuff.c:1466
       tipc_nl_compat_recv+0x833/0x18f0 net/tipc/netlink_compat.c:1209
       genl_family_rcv_msg+0x7b7/0xfb0 net/netlink/genetlink.c:598
       genl_rcv_msg+0xb2/0x140 net/netlink/genetlink.c:623
       netlink_rcv_skb+0x216/0x440 net/netlink/af_netlink.c:2397
       genl_rcv+0x28/0x40 net/netlink/genetlink.c:634
       netlink_unicast_kernel net/netlink/af_netlink.c:1265 [inline]
       netlink_unicast+0x4e8/0x6f0 net/netlink/af_netlink.c:1291
       netlink_sendmsg+0xa4a/0xe60 net/netlink/af_netlink.c:1854
       sock_sendmsg_nosec net/socket.c:633 [inline]
       sock_sendmsg+0xca/0x110 net/socket.c:643
       sock_write_iter+0x31a/0x5d0 net/socket.c:898
       call_write_iter include/linux/fs.h:1743 [inline]
       new_sync_write fs/read_write.c:457 [inline]
       __vfs_write+0x684/0x970 fs/read_write.c:470
       vfs_write+0x189/0x510 fs/read_write.c:518
       SYSC_write fs/read_write.c:565 [inline]
       SyS_write+0xef/0x220 fs/read_write.c:557
       entry_SYSCALL_64_fastpath+0x1f/0xbe
      RIP: 0033:0x4512e9
      RSP: 002b:00007f3bc8184c08 EFLAGS: 00000216 ORIG_RAX: 0000000000000001
      RAX: ffffffffffffffda RBX: 0000000000718000 RCX: 00000000004512e9
      RDX: 0000000000000020 RSI: 0000000020fdb000 RDI: 0000000000000006
      RBP: 0000000000000086 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000216 R12: 00000000004b5e76
      R13: 00007f3bc8184b48 R14: 00000000004b5e86 R15: 0000000000000000
      
      Allocated by task 4115:
       save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
       save_stack+0x43/0xd0 mm/kasan/kasan.c:447
       set_track mm/kasan/kasan.c:459 [inline]
       kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:551
       kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:489
       kmem_cache_alloc_node+0x13d/0x750 mm/slab.c:3651
       __alloc_skb+0xf1/0x740 net/core/skbuff.c:219
       alloc_skb include/linux/skbuff.h:903 [inline]
       tipc_tlv_alloc+0x26/0xb0 net/tipc/netlink_compat.c:148
       tipc_nl_compat_dumpit+0xf2/0x3c0 net/tipc/netlink_compat.c:248
       tipc_nl_compat_handle net/tipc/netlink_compat.c:1130 [inline]
       tipc_nl_compat_recv+0x756/0x18f0 net/tipc/netlink_compat.c:1199
       genl_family_rcv_msg+0x7b7/0xfb0 net/netlink/genetlink.c:598
       genl_rcv_msg+0xb2/0x140 net/netlink/genetlink.c:623
       netlink_rcv_skb+0x216/0x440 net/netlink/af_netlink.c:2397
       genl_rcv+0x28/0x40 net/netlink/genetlink.c:634
       netlink_unicast_kernel net/netlink/af_netlink.c:1265 [inline]
       netlink_unicast+0x4e8/0x6f0 net/netlink/af_netlink.c:1291
       netlink_sendmsg+0xa4a/0xe60 net/netlink/af_netlink.c:1854
       sock_sendmsg_nosec net/socket.c:633 [inline]
       sock_sendmsg+0xca/0x110 net/socket.c:643
       sock_write_iter+0x31a/0x5d0 net/socket.c:898
       call_write_iter include/linux/fs.h:1743 [inline]
       new_sync_write fs/read_write.c:457 [inline]
       __vfs_write+0x684/0x970 fs/read_write.c:470
       vfs_write+0x189/0x510 fs/read_write.c:518
       SYSC_write fs/read_write.c:565 [inline]
       SyS_write+0xef/0x220 fs/read_write.c:557
       entry_SYSCALL_64_fastpath+0x1f/0xbe
      
      Freed by task 4115:
       save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
       save_stack+0x43/0xd0 mm/kasan/kasan.c:447
       set_track mm/kasan/kasan.c:459 [inline]
       kasan_slab_free+0x71/0xc0 mm/kasan/kasan.c:524
       __cache_free mm/slab.c:3503 [inline]
       kmem_cache_free+0x77/0x280 mm/slab.c:3763
       kfree_skbmem+0x1a1/0x1d0 net/core/skbuff.c:622
       __kfree_skb net/core/skbuff.c:682 [inline]
       kfree_skb+0x165/0x4c0 net/core/skbuff.c:699
       tipc_nl_compat_dumpit+0x36a/0x3c0 net/tipc/netlink_compat.c:260
       tipc_nl_compat_handle net/tipc/netlink_compat.c:1130 [inline]
       tipc_nl_compat_recv+0x756/0x18f0 net/tipc/netlink_compat.c:1199
       genl_family_rcv_msg+0x7b7/0xfb0 net/netlink/genetlink.c:598
       genl_rcv_msg+0xb2/0x140 net/netlink/genetlink.c:623
       netlink_rcv_skb+0x216/0x440 net/netlink/af_netlink.c:2397
       genl_rcv+0x28/0x40 net/netlink/genetlink.c:634
       netlink_unicast_kernel net/netlink/af_netlink.c:1265 [inline]
       netlink_unicast+0x4e8/0x6f0 net/netlink/af_netlink.c:1291
       netlink_sendmsg+0xa4a/0xe60 net/netlink/af_netlink.c:1854
       sock_sendmsg_nosec net/socket.c:633 [inline]
       sock_sendmsg+0xca/0x110 net/socket.c:643
       sock_write_iter+0x31a/0x5d0 net/socket.c:898
       call_write_iter include/linux/fs.h:1743 [inline]
       new_sync_write fs/read_write.c:457 [inline]
       __vfs_write+0x684/0x970 fs/read_write.c:470
       vfs_write+0x189/0x510 fs/read_write.c:518
       SYSC_write fs/read_write.c:565 [inline]
       SyS_write+0xef/0x220 fs/read_write.c:557
       entry_SYSCALL_64_fastpath+0x1f/0xbe
      
      The buggy address belongs to the object at ffff8801c6e71dc0
       which belongs to the cache skbuff_head_cache of size 224
      The buggy address is located 208 bytes inside of
       224-byte region [ffff8801c6e71dc0, ffff8801c6e71ea0)
      The buggy address belongs to the page:
      page:ffffea00071b9c40 count:1 mapcount:0 mapping:ffff8801c6e71000 index:0x0
      flags: 0x200000000000100(slab)
      raw: 0200000000000100 ffff8801c6e71000 0000000000000000 000000010000000c
      raw: ffffea0007224a20 ffff8801d98caf48 ffff8801d9e79040 0000000000000000
      page dumped because: kasan: bad access detected
      
      Memory state around the buggy address:
       ffff8801c6e71d80: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
       ffff8801c6e71e00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      >ffff8801c6e71e80: fb fb fb fb fc fc fc fc fc fc fc fc fc fc fc fc
                               ^
       ffff8801c6e71f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
       ffff8801c6e71f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      ==================================================================
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarDmitry Vyukov  <dvyukov@google.com>
      Cc: Jon Maloy <jon.maloy@ericsson.com>
      Cc: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7ad5fb95
    • Alexander Potapenko's avatar
      sctp: fully initialize the IPv6 address in sctp_v6_to_addr() · 0f5ecc79
      Alexander Potapenko authored
      
      [ Upstream commit 15339e44 ]
      
      KMSAN reported use of uninitialized sctp_addr->v4.sin_addr.s_addr and
      sctp_addr->v6.sin6_scope_id in sctp_v6_cmp_addr() (see below).
      Make sure all fields of an IPv6 address are initialized, which
      guarantees that the IPv4 fields are also initialized.
      
      ==================================================================
       BUG: KMSAN: use of uninitialized memory in sctp_v6_cmp_addr+0x8d4/0x9f0
       net/sctp/ipv6.c:517
       CPU: 2 PID: 31056 Comm: syz-executor1 Not tainted 4.11.0-rc5+ #2944
       Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs
       01/01/2011
       Call Trace:
        dump_stack+0x172/0x1c0 lib/dump_stack.c:42
        is_logbuf_locked mm/kmsan/kmsan.c:59 [inline]
        kmsan_report+0x12a/0x180 mm/kmsan/kmsan.c:938
        native_save_fl arch/x86/include/asm/irqflags.h:18 [inline]
        arch_local_save_flags arch/x86/include/asm/irqflags.h:72 [inline]
        arch_local_irq_save arch/x86/include/asm/irqflags.h:113 [inline]
        __msan_warning_32+0x61/0xb0 mm/kmsan/kmsan_instr.c:467
        sctp_v6_cmp_addr+0x8d4/0x9f0 net/sctp/ipv6.c:517
        sctp_v6_get_dst+0x8c7/0x1630 net/sctp/ipv6.c:290
        sctp_transport_route+0x101/0x570 net/sctp/transport.c:292
        sctp_assoc_add_peer+0x66d/0x16f0 net/sctp/associola.c:651
        sctp_sendmsg+0x35a5/0x4f90 net/sctp/socket.c:1871
        inet_sendmsg+0x498/0x670 net/ipv4/af_inet.c:762
        sock_sendmsg_nosec net/socket.c:633 [inline]
        sock_sendmsg net/socket.c:643 [inline]
        SYSC_sendto+0x608/0x710 net/socket.c:1696
        SyS_sendto+0x8a/0xb0 net/socket.c:1664
        entry_SYSCALL_64_fastpath+0x13/0x94
       RIP: 0033:0x44b479
       RSP: 002b:00007f6213f21c08 EFLAGS: 00000286 ORIG_RAX: 000000000000002c
       RAX: ffffffffffffffda RBX: 0000000020000000 RCX: 000000000044b479
       RDX: 0000000000000041 RSI: 0000000020edd000 RDI: 0000000000000006
       RBP: 00000000007080a8 R08: 0000000020b85fe4 R09: 000000000000001c
       R10: 0000000000040005 R11: 0000000000000286 R12: 00000000ffffffff
       R13: 0000000000003760 R14: 00000000006e5820 R15: 0000000000ff8000
       origin description: ----dst_saddr@sctp_v6_get_dst
       local variable created at:
        sk_fullsock include/net/sock.h:2321 [inline]
        inet6_sk include/linux/ipv6.h:309 [inline]
        sctp_v6_get_dst+0x91/0x1630 net/sctp/ipv6.c:241
        sctp_transport_route+0x101/0x570 net/sctp/transport.c:292
      ==================================================================
       BUG: KMSAN: use of uninitialized memory in sctp_v6_cmp_addr+0x8d4/0x9f0
       net/sctp/ipv6.c:517
       CPU: 2 PID: 31056 Comm: syz-executor1 Not tainted 4.11.0-rc5+ #2944
       Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs
       01/01/2011
       Call Trace:
        dump_stack+0x172/0x1c0 lib/dump_stack.c:42
        is_logbuf_locked mm/kmsan/kmsan.c:59 [inline]
        kmsan_report+0x12a/0x180 mm/kmsan/kmsan.c:938
        native_save_fl arch/x86/include/asm/irqflags.h:18 [inline]
        arch_local_save_flags arch/x86/include/asm/irqflags.h:72 [inline]
        arch_local_irq_save arch/x86/include/asm/irqflags.h:113 [inline]
        __msan_warning_32+0x61/0xb0 mm/kmsan/kmsan_instr.c:467
        sctp_v6_cmp_addr+0x8d4/0x9f0 net/sctp/ipv6.c:517
        sctp_v6_get_dst+0x8c7/0x1630 net/sctp/ipv6.c:290
        sctp_transport_route+0x101/0x570 net/sctp/transport.c:292
        sctp_assoc_add_peer+0x66d/0x16f0 net/sctp/associola.c:651
        sctp_sendmsg+0x35a5/0x4f90 net/sctp/socket.c:1871
        inet_sendmsg+0x498/0x670 net/ipv4/af_inet.c:762
        sock_sendmsg_nosec net/socket.c:633 [inline]
        sock_sendmsg net/socket.c:643 [inline]
        SYSC_sendto+0x608/0x710 net/socket.c:1696
        SyS_sendto+0x8a/0xb0 net/socket.c:1664
        entry_SYSCALL_64_fastpath+0x13/0x94
       RIP: 0033:0x44b479
       RSP: 002b:00007f6213f21c08 EFLAGS: 00000286 ORIG_RAX: 000000000000002c
       RAX: ffffffffffffffda RBX: 0000000020000000 RCX: 000000000044b479
       RDX: 0000000000000041 RSI: 0000000020edd000 RDI: 0000000000000006
       RBP: 00000000007080a8 R08: 0000000020b85fe4 R09: 000000000000001c
       R10: 0000000000040005 R11: 0000000000000286 R12: 00000000ffffffff
       R13: 0000000000003760 R14: 00000000006e5820 R15: 0000000000ff8000
       origin description: ----dst_saddr@sctp_v6_get_dst
       local variable created at:
        sk_fullsock include/net/sock.h:2321 [inline]
        inet6_sk include/linux/ipv6.h:309 [inline]
        sctp_v6_get_dst+0x91/0x1630 net/sctp/ipv6.c:241
        sctp_transport_route+0x101/0x570 net/sctp/transport.c:292
      ==================================================================
      Signed-off-by: default avatarAlexander Potapenko <glider@google.com>
      Reviewed-by: default avatarXin Long <lucien.xin@gmail.com>
      Acked-by: default avatarMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0f5ecc79
    • Colin Ian King's avatar
      nfp: fix infinite loop on umapping cleanup · 57406e73
      Colin Ian King authored
      
      [ Upstream commit eac2c68d ]
      
      The while loop that performs the dma page unmapping never decrements
      index counter f and hence loops forever. Fix this with a pre-decrement
      on f.
      
      Detected by CoverityScan, CID#1357309 ("Infinite loop")
      
      Fixes: 4c352362 ("net: add driver for Netronome NFP4000/NFP6000 NIC VFs")
      Signed-off-by: default avatarColin Ian King <colin.king@canonical.com>
      Acked-by: default avatarJakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      57406e73
    • Eric Dumazet's avatar
      ipv4: better IP_MAX_MTU enforcement · f29c9f46
      Eric Dumazet authored
      
      [ Upstream commit c780a049 ]
      
      While working on yet another syzkaller report, I found
      that our IP_MAX_MTU enforcements were not properly done.
      
      gcc seems to reload dev->mtu for min(dev->mtu, IP_MAX_MTU), and
      final result can be bigger than IP_MAX_MTU :/
      
      This is a problem because device mtu can be changed on other cpus or
      threads.
      
      While this patch does not fix the issue I am working on, it is
      probably worth addressing it.
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f29c9f46
    • Eric Dumazet's avatar
      ptr_ring: use kmalloc_array() · 59af5b87
      Eric Dumazet authored
      
      [ Upstream commit 81fbfe8a ]
      
      As found by syzkaller, malicious users can set whatever tx_queue_len
      on a tun device and eventually crash the kernel.
      
      Lets remove the ALIGN(XXX, SMP_CACHE_BYTES) thing since a small
      ring buffer is not fast anyway.
      
      Fixes: 2e0ab8ca ("ptr_ring: array based FIFO for pointers")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Cc: Jason Wang <jasowang@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      59af5b87
    • Liping Zhang's avatar
      openvswitch: fix skb_panic due to the incorrect actions attrlen · 3c7af814
      Liping Zhang authored
      
      [ Upstream commit 494bea39 ]
      
      For sw_flow_actions, the actions_len only represents the kernel part's
      size, and when we dump the actions to the userspace, we will do the
      convertions, so it's true size may become bigger than the actions_len.
      
      But unfortunately, for OVS_PACKET_ATTR_ACTIONS, we use the actions_len
      to alloc the skbuff, so the user_skb's size may become insufficient and
      oops will happen like this:
        skbuff: skb_over_panic: text:ffffffff8148fabf len:1749 put:157 head:
        ffff881300f39000 data:ffff881300f39000 tail:0x6d5 end:0x6c0 dev:<NULL>
        ------------[ cut here ]------------
        kernel BUG at net/core/skbuff.c:129!
        [...]
        Call Trace:
         <IRQ>
         [<ffffffff8148be82>] skb_put+0x43/0x44
         [<ffffffff8148fabf>] skb_zerocopy+0x6c/0x1f4
         [<ffffffffa0290d36>] queue_userspace_packet+0x3a3/0x448 [openvswitch]
         [<ffffffffa0292023>] ovs_dp_upcall+0x30/0x5c [openvswitch]
         [<ffffffffa028d435>] output_userspace+0x132/0x158 [openvswitch]
         [<ffffffffa01e6890>] ? ip6_rcv_finish+0x74/0x77 [ipv6]
         [<ffffffffa028e277>] do_execute_actions+0xcc1/0xdc8 [openvswitch]
         [<ffffffffa028e3f2>] ovs_execute_actions+0x74/0x106 [openvswitch]
         [<ffffffffa0292130>] ovs_dp_process_packet+0xe1/0xfd [openvswitch]
         [<ffffffffa0292b77>] ? key_extract+0x63c/0x8d5 [openvswitch]
         [<ffffffffa029848b>] ovs_vport_receive+0xa1/0xc3 [openvswitch]
        [...]
      
      Also we can find that the actions_len is much little than the orig_len:
        crash> struct sw_flow_actions 0xffff8812f539d000
        struct sw_flow_actions {
          rcu = {
            next = 0xffff8812f5398800,
            func = 0xffffe3b00035db32
          },
          orig_len = 1384,
          actions_len = 592,
          actions = 0xffff8812f539d01c
        }
      
      So as a quick fix, use the orig_len instead of the actions_len to alloc
      the user_skb.
      
      Last, this oops happened on our system running a relative old kernel, but
      the same risk still exists on the mainline, since we use the wrong
      actions_len from the beginning.
      
      Fixes: ccea7445 ("openvswitch: include datapath actions with sampled-packet upcall to userspace")
      Cc: Neil McKee <neil.mckee@inmon.com>
      Signed-off-by: default avatarLiping Zhang <zlpnobody@gmail.com>
      Acked-by: default avatarPravin B Shelar <pshelar@ovn.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3c7af814
    • Daniel Borkmann's avatar
      bpf: fix bpf_trace_printk on 32 bit archs · d6a6b6b4
      Daniel Borkmann authored
      
      [ Upstream commit 88a5c690 ]
      
      James reported that on MIPS32 bpf_trace_printk() is currently
      broken while MIPS64 works fine:
      
        bpf_trace_printk() uses conditional operators to attempt to
        pass different types to __trace_printk() depending on the
        format operators. This doesn't work as intended on 32-bit
        architectures where u32 and long are passed differently to
        u64, since the result of C conditional operators follows the
        "usual arithmetic conversions" rules, such that the values
        passed to __trace_printk() will always be u64 [causing issues
        later in the va_list handling for vscnprintf()].
      
        For example the samples/bpf/tracex5 test printed lines like
        below on MIPS32, where the fd and buf have come from the u64
        fd argument, and the size from the buf argument:
      
          [...] 1180.941542: 0x00000001: write(fd=1, buf=  (null), size=6258688)
      
        Instead of this:
      
          [...] 1625.616026: 0x00000001: write(fd=1, buf=009e4000, size=512)
      
      One way to get it working is to expand various combinations
      of argument types into 8 different combinations for 32 bit
      and 64 bit kernels. Fix tested by James on MIPS32 and MIPS64
      as well that it resolves the issue.
      
      Fixes: 9c959c86 ("tracing: Allow BPF programs to call bpf_trace_printk()")
      Reported-by: default avatarJames Hogan <james.hogan@imgtec.com>
      Tested-by: default avatarJames Hogan <james.hogan@imgtec.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d6a6b6b4
    • Konstantin Khlebnikov's avatar
      net_sched: remove warning from qdisc_hash_add · 792c0707
      Konstantin Khlebnikov authored
      
      [ Upstream commit c90e9514 ]
      
      It was added in commit e57a784d ("pkt_sched: set root qdisc
      before change() in attach_default_qdiscs()") to hide duplicates
      from "tc qdisc show" for incative deivices.
      
      After 59cc1f61 ("net: sched: convert qdisc linked list to hashtable")
      it triggered when classful qdisc is added to inactive device because
      default qdiscs are added before switching root qdisc.
      
      Anyway after commit ea327469 ("net: sched: avoid duplicates in
      qdisc dump") duplicates are filtered right in dumper.
      Signed-off-by: default avatarKonstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      792c0707
    • Konstantin Khlebnikov's avatar
      net_sched/sfq: update hierarchical backlog when drop packet · 38530f6e
      Konstantin Khlebnikov authored
      
      [ Upstream commit 325d5dc3 ]
      
      When sfq_enqueue() drops head packet or packet from another queue it
      have to update backlog at upper qdiscs too.
      
      Fixes: 2ccccf5f ("net_sched: update hierarchical backlog too")
      Signed-off-by: default avatarKonstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Acked-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      38530f6e
    • Eric Dumazet's avatar
      ipv4: fix NULL dereference in free_fib_info_rcu() · 71501d9b
      Eric Dumazet authored
      
      [ Upstream commit 187e5b3a ]
      
      If fi->fib_metrics could not be allocated in fib_create_info()
      we attempt to dereference a NULL pointer in free_fib_info_rcu() :
      
          m = fi->fib_metrics;
          if (m != &dst_default_metrics && atomic_dec_and_test(&m->refcnt))
                  kfree(m);
      
      Before my recent patch, we used to call kfree(NULL) and nothing wrong
      happened.
      
      Instead of using RCU to defer freeing while we are under memory stress,
      it seems better to take immediate action.
      
      This was reported by syzkaller team.
      
      Fixes: 3fb07daf ("ipv4: add reference counting to metrics")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      71501d9b
    • Eric Dumazet's avatar
      dccp: defer ccid_hc_tx_delete() at dismantle time · 236b0d93
      Eric Dumazet authored
      
      [ Upstream commit 120e9dab ]
      
      syszkaller team reported another problem in DCCP [1]
      
      Problem here is that the structure holding RTO timer
      (ccid2_hc_tx_rto_expire() handler) is freed too soon.
      
      We can not use del_timer_sync() to cancel the timer
      since this timer wants to grab socket lock (that would risk a dead lock)
      
      Solution is to defer the freeing of memory when all references to
      the socket were released. Socket timers do own a reference, so this
      should fix the issue.
      
      [1]
      
      ==================================================================
      BUG: KASAN: use-after-free in ccid2_hc_tx_rto_expire+0x51c/0x5c0 net/dccp/ccids/ccid2.c:144
      Read of size 4 at addr ffff8801d2660540 by task kworker/u4:7/3365
      
      CPU: 1 PID: 3365 Comm: kworker/u4:7 Not tainted 4.13.0-rc4+ #3
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Workqueue: events_unbound call_usermodehelper_exec_work
      Call Trace:
       <IRQ>
       __dump_stack lib/dump_stack.c:16 [inline]
       dump_stack+0x194/0x257 lib/dump_stack.c:52
       print_address_description+0x73/0x250 mm/kasan/report.c:252
       kasan_report_error mm/kasan/report.c:351 [inline]
       kasan_report+0x24e/0x340 mm/kasan/report.c:409
       __asan_report_load4_noabort+0x14/0x20 mm/kasan/report.c:429
       ccid2_hc_tx_rto_expire+0x51c/0x5c0 net/dccp/ccids/ccid2.c:144
       call_timer_fn+0x233/0x830 kernel/time/timer.c:1268
       expire_timers kernel/time/timer.c:1307 [inline]
       __run_timers+0x7fd/0xb90 kernel/time/timer.c:1601
       run_timer_softirq+0x21/0x80 kernel/time/timer.c:1614
       __do_softirq+0x2f5/0xba3 kernel/softirq.c:284
       invoke_softirq kernel/softirq.c:364 [inline]
       irq_exit+0x1cc/0x200 kernel/softirq.c:405
       exiting_irq arch/x86/include/asm/apic.h:638 [inline]
       smp_apic_timer_interrupt+0x76/0xa0 arch/x86/kernel/apic/apic.c:1044
       apic_timer_interrupt+0x93/0xa0 arch/x86/entry/entry_64.S:702
      RIP: 0010:arch_local_irq_enable arch/x86/include/asm/paravirt.h:824 [inline]
      RIP: 0010:__raw_write_unlock_irq include/linux/rwlock_api_smp.h:267 [inline]
      RIP: 0010:_raw_write_unlock_irq+0x56/0x70 kernel/locking/spinlock.c:343
      RSP: 0018:ffff8801cd50eaa8 EFLAGS: 00000286 ORIG_RAX: ffffffffffffff10
      RAX: dffffc0000000000 RBX: ffffffff85a090c0 RCX: 0000000000000006
      RDX: 1ffffffff0b595f3 RSI: 1ffff1003962f989 RDI: ffffffff85acaf98
      RBP: ffff8801cd50eab0 R08: 0000000000000001 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000000 R12: ffff8801cc96ea60
      R13: dffffc0000000000 R14: ffff8801cc96e4c0 R15: ffff8801cc96e4c0
       </IRQ>
       release_task+0xe9e/0x1a40 kernel/exit.c:220
       wait_task_zombie kernel/exit.c:1162 [inline]
       wait_consider_task+0x29b8/0x33c0 kernel/exit.c:1389
       do_wait_thread kernel/exit.c:1452 [inline]
       do_wait+0x441/0xa90 kernel/exit.c:1523
       kernel_wait4+0x1f5/0x370 kernel/exit.c:1665
       SYSC_wait4+0x134/0x140 kernel/exit.c:1677
       SyS_wait4+0x2c/0x40 kernel/exit.c:1673
       call_usermodehelper_exec_sync kernel/kmod.c:286 [inline]
       call_usermodehelper_exec_work+0x1a0/0x2c0 kernel/kmod.c:323
       process_one_work+0xbf3/0x1bc0 kernel/workqueue.c:2097
       worker_thread+0x223/0x1860 kernel/workqueue.c:2231
       kthread+0x35e/0x430 kernel/kthread.c:231
       ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:425
      
      Allocated by task 21267:
       save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
       save_stack+0x43/0xd0 mm/kasan/kasan.c:447
       set_track mm/kasan/kasan.c:459 [inline]
       kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:551
       kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:489
       kmem_cache_alloc+0x127/0x750 mm/slab.c:3561
       ccid_new+0x20e/0x390 net/dccp/ccid.c:151
       dccp_hdlr_ccid+0x27/0x140 net/dccp/feat.c:44
       __dccp_feat_activate+0x142/0x2a0 net/dccp/feat.c:344
       dccp_feat_activate_values+0x34e/0xa90 net/dccp/feat.c:1538
       dccp_rcv_request_sent_state_process net/dccp/input.c:472 [inline]
       dccp_rcv_state_process+0xed1/0x1620 net/dccp/input.c:677
       dccp_v4_do_rcv+0xeb/0x160 net/dccp/ipv4.c:679
       sk_backlog_rcv include/net/sock.h:911 [inline]
       __release_sock+0x124/0x360 net/core/sock.c:2269
       release_sock+0xa4/0x2a0 net/core/sock.c:2784
       inet_wait_for_connect net/ipv4/af_inet.c:557 [inline]
       __inet_stream_connect+0x671/0xf00 net/ipv4/af_inet.c:643
       inet_stream_connect+0x58/0xa0 net/ipv4/af_inet.c:682
       SYSC_connect+0x204/0x470 net/socket.c:1642
       SyS_connect+0x24/0x30 net/socket.c:1623
       entry_SYSCALL_64_fastpath+0x1f/0xbe
      
      Freed by task 3049:
       save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
       save_stack+0x43/0xd0 mm/kasan/kasan.c:447
       set_track mm/kasan/kasan.c:459 [inline]
       kasan_slab_free+0x71/0xc0 mm/kasan/kasan.c:524
       __cache_free mm/slab.c:3503 [inline]
       kmem_cache_free+0x77/0x280 mm/slab.c:3763
       ccid_hc_tx_delete+0xc5/0x100 net/dccp/ccid.c:190
       dccp_destroy_sock+0x1d1/0x2b0 net/dccp/proto.c:225
       inet_csk_destroy_sock+0x166/0x3f0 net/ipv4/inet_connection_sock.c:833
       dccp_done+0xb7/0xd0 net/dccp/proto.c:145
       dccp_time_wait+0x13d/0x300 net/dccp/minisocks.c:72
       dccp_rcv_reset+0x1d1/0x5b0 net/dccp/input.c:160
       dccp_rcv_state_process+0x8fc/0x1620 net/dccp/input.c:663
       dccp_v4_do_rcv+0xeb/0x160 net/dccp/ipv4.c:679
       sk_backlog_rcv include/net/sock.h:911 [inline]
       __sk_receive_skb+0x33e/0xc00 net/core/sock.c:521
       dccp_v4_rcv+0xef1/0x1c00 net/dccp/ipv4.c:871
       ip_local_deliver_finish+0x2e2/0xba0 net/ipv4/ip_input.c:216
       NF_HOOK include/linux/netfilter.h:248 [inline]
       ip_local_deliver+0x1ce/0x6d0 net/ipv4/ip_input.c:257
       dst_input include/net/dst.h:477 [inline]
       ip_rcv_finish+0x8db/0x19c0 net/ipv4/ip_input.c:397
       NF_HOOK include/linux/netfilter.h:248 [inline]
       ip_rcv+0xc3f/0x17d0 net/ipv4/ip_input.c:488
       __netif_receive_skb_core+0x19af/0x33d0 net/core/dev.c:4417
       __netif_receive_skb+0x2c/0x1b0 net/core/dev.c:4455
       process_backlog+0x203/0x740 net/core/dev.c:5130
       napi_poll net/core/dev.c:5527 [inline]
       net_rx_action+0x792/0x1910 net/core/dev.c:5593
       __do_softirq+0x2f5/0xba3 kernel/softirq.c:284
      
      The buggy address belongs to the object at ffff8801d2660100
       which belongs to the cache ccid2_hc_tx_sock of size 1240
      The buggy address is located 1088 bytes inside of
       1240-byte region [ffff8801d2660100, ffff8801d26605d8)
      The buggy address belongs to the page:
      page:ffffea0007499800 count:1 mapcount:0 mapping:ffff8801d2660100 index:0x0 compound_mapcount: 0
      flags: 0x200000000008100(slab|head)
      raw: 0200000000008100 ffff8801d2660100 0000000000000000 0000000100000005
      raw: ffffea00075271a0 ffffea0007538820 ffff8801d3aef9c0 0000000000000000
      page dumped because: kasan: bad access detected
      
      Memory state around the buggy address:
       ffff8801d2660400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
       ffff8801d2660480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      >ffff8801d2660500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                                 ^
       ffff8801d2660580: fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc fc
       ffff8801d2660600: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      ==================================================================
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Cc: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      236b0d93
    • Eric Dumazet's avatar
      dccp: purge write queue in dccp_destroy_sock() · b31cbe2c
      Eric Dumazet authored
      
      [ Upstream commit 7749d4ff ]
      
      syzkaller reported that DCCP could have a non empty
      write queue at dismantle time.
      
      WARNING: CPU: 1 PID: 2953 at net/core/stream.c:199 sk_stream_kill_queues+0x3ce/0x520 net/core/stream.c:199
      Kernel panic - not syncing: panic_on_warn set ...
      
      CPU: 1 PID: 2953 Comm: syz-executor0 Not tainted 4.13.0-rc4+ #2
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:16 [inline]
       dump_stack+0x194/0x257 lib/dump_stack.c:52
       panic+0x1e4/0x417 kernel/panic.c:180
       __warn+0x1c4/0x1d9 kernel/panic.c:541
       report_bug+0x211/0x2d0 lib/bug.c:183
       fixup_bug+0x40/0x90 arch/x86/kernel/traps.c:190
       do_trap_no_signal arch/x86/kernel/traps.c:224 [inline]
       do_trap+0x260/0x390 arch/x86/kernel/traps.c:273
       do_error_trap+0x120/0x390 arch/x86/kernel/traps.c:310
       do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:323
       invalid_op+0x1e/0x30 arch/x86/entry/entry_64.S:846
      RIP: 0010:sk_stream_kill_queues+0x3ce/0x520 net/core/stream.c:199
      RSP: 0018:ffff8801d182f108 EFLAGS: 00010297
      RAX: ffff8801d1144140 RBX: ffff8801d13cb280 RCX: 0000000000000000
      RDX: 0000000000000000 RSI: ffffffff85137b00 RDI: ffff8801d13cb280
      RBP: ffff8801d182f148 R08: 0000000000000001 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000000 R12: ffff8801d13cb4d0
      R13: ffff8801d13cb3b8 R14: ffff8801d13cb300 R15: ffff8801d13cb3b8
       inet_csk_destroy_sock+0x175/0x3f0 net/ipv4/inet_connection_sock.c:835
       dccp_close+0x84d/0xc10 net/dccp/proto.c:1067
       inet_release+0xed/0x1c0 net/ipv4/af_inet.c:425
       sock_release+0x8d/0x1e0 net/socket.c:597
       sock_close+0x16/0x20 net/socket.c:1126
       __fput+0x327/0x7e0 fs/file_table.c:210
       ____fput+0x15/0x20 fs/file_table.c:246
       task_work_run+0x18a/0x260 kernel/task_work.c:116
       exit_task_work include/linux/task_work.h:21 [inline]
       do_exit+0xa32/0x1b10 kernel/exit.c:865
       do_group_exit+0x149/0x400 kernel/exit.c:969
       get_signal+0x7e8/0x17e0 kernel/signal.c:2330
       do_signal+0x94/0x1ee0 arch/x86/kernel/signal.c:808
       exit_to_usermode_loop+0x21c/0x2d0 arch/x86/entry/common.c:157
       prepare_exit_to_usermode arch/x86/entry/common.c:194 [inline]
       syscall_return_slowpath+0x3a7/0x450 arch/x86/entry/common.c:263
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b31cbe2c
    • Eric Dumazet's avatar
      af_key: do not use GFP_KERNEL in atomic contexts · 2e3f17f8
      Eric Dumazet authored
      
      [ Upstream commit 36f41f8f ]
      
      pfkey_broadcast() might be called from non process contexts,
      we can not use GFP_KERNEL in these cases [1].
      
      This patch partially reverts commit ba51b6be ("net: Fix RCU splat in
      af_key"), only keeping the GFP_ATOMIC forcing under rcu_read_lock()
      section.
      
      [1] : syzkaller reported :
      
      in_atomic(): 1, irqs_disabled(): 0, pid: 2932, name: syzkaller183439
      3 locks held by syzkaller183439/2932:
       #0:  (&net->xfrm.xfrm_cfg_mutex){+.+.+.}, at: [<ffffffff83b43888>] pfkey_sendmsg+0x4c8/0x9f0 net/key/af_key.c:3649
       #1:  (&pfk->dump_lock){+.+.+.}, at: [<ffffffff83b467f6>] pfkey_do_dump+0x76/0x3f0 net/key/af_key.c:293
       #2:  (&(&net->xfrm.xfrm_policy_lock)->rlock){+...+.}, at: [<ffffffff83957632>] spin_lock_bh include/linux/spinlock.h:304 [inline]
       #2:  (&(&net->xfrm.xfrm_policy_lock)->rlock){+...+.}, at: [<ffffffff83957632>] xfrm_policy_walk+0x192/0xa30 net/xfrm/xfrm_policy.c:1028
      CPU: 0 PID: 2932 Comm: syzkaller183439 Not tainted 4.13.0-rc4+ #24
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:16 [inline]
       dump_stack+0x194/0x257 lib/dump_stack.c:52
       ___might_sleep+0x2b2/0x470 kernel/sched/core.c:5994
       __might_sleep+0x95/0x190 kernel/sched/core.c:5947
       slab_pre_alloc_hook mm/slab.h:416 [inline]
       slab_alloc mm/slab.c:3383 [inline]
       kmem_cache_alloc+0x24b/0x6e0 mm/slab.c:3559
       skb_clone+0x1a0/0x400 net/core/skbuff.c:1037
       pfkey_broadcast_one+0x4b2/0x6f0 net/key/af_key.c:207
       pfkey_broadcast+0x4ba/0x770 net/key/af_key.c:281
       dump_sp+0x3d6/0x500 net/key/af_key.c:2685
       xfrm_policy_walk+0x2f1/0xa30 net/xfrm/xfrm_policy.c:1042
       pfkey_dump_sp+0x42/0x50 net/key/af_key.c:2695
       pfkey_do_dump+0xaa/0x3f0 net/key/af_key.c:299
       pfkey_spddump+0x1a0/0x210 net/key/af_key.c:2722
       pfkey_process+0x606/0x710 net/key/af_key.c:2814
       pfkey_sendmsg+0x4d6/0x9f0 net/key/af_key.c:3650
      sock_sendmsg_nosec net/socket.c:633 [inline]
       sock_sendmsg+0xca/0x110 net/socket.c:643
       ___sys_sendmsg+0x755/0x890 net/socket.c:2035
       __sys_sendmsg+0xe5/0x210 net/socket.c:2069
       SYSC_sendmsg net/socket.c:2080 [inline]
       SyS_sendmsg+0x2d/0x50 net/socket.c:2076
       entry_SYSCALL_64_fastpath+0x1f/0xbe
      RIP: 0033:0x445d79
      RSP: 002b:00007f32447c1dc8 EFLAGS: 00000202 ORIG_RAX: 000000000000002e
      RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000000000445d79
      RDX: 0000000000000000 RSI: 000000002023dfc8 RDI: 0000000000000008
      RBP: 0000000000000086 R08: 00007f32447c2700 R09: 00007f32447c2700
      R10: 00007f32447c2700 R11: 0000000000000202 R12: 0000000000000000
      R13: 00007ffe33edec4f R14: 00007f32447c29c0 R15: 0000000000000000
      
      Fixes: ba51b6be ("net: Fix RCU splat in af_key")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Cc: David Ahern <dsa@cumulusnetworks.com>
      Acked-by: default avatarDavid Ahern <dsahern@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2e3f17f8
    • Tushar Dave's avatar
      sparc64: remove unnecessary log message · d0526eef
      Tushar Dave authored
      [ Upstream commit 6170a506 ]
      
      There is no need to log message if ATU hvapi couldn't get register.
      Unlike PCI hvapi, ATU hvapi registration failure is not hard error.
      Even if ATU hvapi registration fails (on system with ATU or without
      ATU) system continues with legacy IOMMU. So only log message when
      ATU hvapi successfully get registered.
      Signed-off-by: default avatarTushar Dave <tushar.n.dave@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d0526eef
  2. 25 Aug, 2017 15 commits