1. 30 Aug, 2017 36 commits
    • Paolo Bonzini's avatar
      KVM: x86: block guest protection keys unless the host has them enabled · 2f76f62a
      Paolo Bonzini authored
      commit c469268c upstream.
      
      If the host has protection keys disabled, we cannot read and write the
      guest PKRU---RDPKRU and WRPKRU fail with #GP(0) if CR4.PKE=0.  Block
      the PKU cpuid bit in that case.
      
      This ensures that guest_CR4.PKE=1 implies host_CR4.PKE=1.
      
      Fixes: 1be0e61cReviewed-by: default avatarDavid Hildenbrand <david@redhat.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2f76f62a
    • Paolo Bonzini's avatar
      KVM, pkeys: do not use PKRU value in vcpu->arch.guest_fpu.state · 3c498d4b
      Paolo Bonzini authored
      commit 38cfd5e3 upstream.
      
      The host pkru is restored right after vcpu exit (commit 1be0e61c), so
      KVM_GET_XSAVE will return the host PKRU value instead.  Fix this by
      using the guest PKRU explicitly in fill_xsave and load_xsave.  This
      part is based on a patch by Junkang Fu.
      
      The host PKRU data may also not match the value in vcpu->arch.guest_fpu.state,
      because it could have been changed by userspace since the last time
      it was saved, so skip loading it in kvm_load_guest_fpu.
      Reported-by: default avatarJunkang Fu <junkang.fjk@alibaba-inc.com>
      Cc: Yang Zhang <zy107165@alibaba-inc.com>
      Fixes: 1be0e61cSigned-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3c498d4b
    • Paolo Bonzini's avatar
      KVM: x86: simplify handling of PKRU · d0e52c82
      Paolo Bonzini authored
      commit b9dd21e1 upstream.
      
      Move it to struct kvm_arch_vcpu, replacing guest_pkru_valid with a
      simple comparison against the host value of the register.  The write of
      PKRU in addition can be skipped if the guest has not enabled the feature.
      Once we do this, we need not test OSPKE in the host anymore, because
      guest_CR4.PKE=1 implies host_CR4.PKE=1.
      
      The static PKU test is kept to elide the code on older CPUs.
      Suggested-by: default avatarYang Zhang <zy107165@alibaba-inc.com>
      Fixes: 1be0e61cReviewed-by: default avatarDavid Hildenbrand <david@redhat.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d0e52c82
    • Heiko Carstens's avatar
      KVM: s390: sthyi: fix specification exception detection · 6dc06cd6
      Heiko Carstens authored
      commit 857b8de9 upstream.
      
      sthyi should only generate a specification exception if the function
      code is zero and the response buffer is not on a 4k boundary.
      
      The current code would also test for unknown function codes if the
      response buffer, that is currently only defined for function code 0,
      is not on a 4k boundary and incorrectly inject a specification
      exception instead of returning with condition code 3 and return code 4
      (unsupported function code).
      
      Fix this by moving the boundary check.
      
      Fixes: 95ca2cb5 ("KVM: s390: Add sthyi emulation")
      Reviewed-by: default avatarJanosch Frank <frankja@linux.vnet.ibm.com>
      Signed-off-by: default avatarHeiko Carstens <heiko.carstens@de.ibm.com>
      Reviewed-by: default avatarDavid Hildenbrand <david@redhat.com>
      Reviewed-by: default avatarCornelia Huck <cohuck@redhat.com>
      Signed-off-by: default avatarChristian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6dc06cd6
    • Heiko Carstens's avatar
      KVM: s390: sthyi: fix sthyi inline assembly · e516834a
      Heiko Carstens authored
      commit 4a4eefcd upstream.
      
      The sthyi inline assembly misses register r3 within the clobber
      list. The sthyi instruction will always write a return code to
      register "R2+1", which in this case would be r3. Due to that we may
      have register corruption and see host crashes or data corruption
      depending on how gcc decided to allocate and use registers during
      compile time.
      
      Fixes: 95ca2cb5 ("KVM: s390: Add sthyi emulation")
      Reviewed-by: default avatarJanosch Frank <frankja@linux.vnet.ibm.com>
      Signed-off-by: default avatarHeiko Carstens <heiko.carstens@de.ibm.com>
      Reviewed-by: default avatarDavid Hildenbrand <david@redhat.com>
      Reviewed-by: default avatarCornelia Huck <cohuck@redhat.com>
      Signed-off-by: default avatarChristian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e516834a
    • Masaki Ota's avatar
      Input: ALPS - fix two-finger scroll breakage in right side on ALPS touchpad · ddae9e6e
      Masaki Ota authored
      commit 4a646580 upstream.
      
      Fixed the issue that two finger scroll does not work correctly
      on V8 protocol. The cause is that V8 protocol X-coordinate decode
      is wrong at SS4 PLUS device. I added SS4 PLUS X decode definition.
      
      Mote notes:
      the problem manifests itself by the commit e7348396 ("Input: ALPS
      - fix V8+ protocol handling (73 03 28)"), where a fix for the V8+
      protocol was applied.  Although the culprit must have been present
      beforehand, the two-finger scroll worked casually even with the
      wrongly reported values by some reason.  It got broken by the commit
      above just because it changed x_max value, and this made libinput
      correctly figuring the MT events.  Since the X coord is reported as
      falsely doubled, the events on the right-half side go outside the
      boundary, thus they are no longer handled.  This resulted as a broken
      two-finger scroll.
      
      One finger event is decoded differently, and it didn't suffer from
      this problem.  The problem was only about MT events. --tiwai
      
      Fixes: e7348396 ("Input: ALPS - fix V8+ protocol handling (73 03 28)")
      Signed-off-by: default avatarMasaki Ota <masaki.ota@jp.alps.com>
      Tested-by: default avatarTakashi Iwai <tiwai@suse.de>
      Tested-by: default avatarPaul Donohue <linux-kernel@PaulSD.com>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarDmitry Torokhov <dmitry.torokhov@gmail.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ddae9e6e
    • KT Liao's avatar
      Input: elan_i2c - add ELAN0602 ACPI ID to support Lenovo Yoga310 · 8dcee8e8
      KT Liao authored
      commit 1d2226e4 upstream.
      
      Add ELAN0602 to the list of known ACPI IDs to enable support for ELAN
      touchpads found in Lenovo Yoga310.
      Signed-off-by: default avatarKT Liao <kt.liao@emc.com.tw>
      Signed-off-by: default avatarDmitry Torokhov <dmitry.torokhov@gmail.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8dcee8e8
    • Aaron Ma's avatar
      Input: trackpoint - add new trackpoint firmware ID · 38c36f9d
      Aaron Ma authored
      commit ec667683 upstream.
      
      Synaptics add new TP firmware ID: 0x2 and 0x3, for now both lower 2 bits
      are indicated as TP. Change the constant to bitwise values.
      
      This makes trackpoint to be recognized on Lenovo Carbon X1 Gen5 instead
      of it being identified as "PS/2 Generic Mouse".
      Signed-off-by: default avatarAaron Ma <aaron.ma@canonical.com>
      Signed-off-by: default avatarDmitry Torokhov <dmitry.torokhov@gmail.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      38c36f9d
    • Edward Cree's avatar
      bpf/verifier: fix min/max handling in BPF_SUB · c9c682f3
      Edward Cree authored
      
      [ Upstream commit 9305706c ]
      
      We have to subtract the src max from the dst min, and vice-versa, since
       (e.g.) the smallest result comes from the largest subtrahend.
      
      Fixes: 48461135 ("bpf: allow access into map value arrays")
      Signed-off-by: default avatarEdward Cree <ecree@solarflare.com>
      Acked-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c9c682f3
    • Daniel Borkmann's avatar
      bpf: fix mixed signed/unsigned derived min/max value bounds · eb6cf01c
      Daniel Borkmann authored
      
      [ Upstream commit 4cabc5b1 ]
      
      Edward reported that there's an issue in min/max value bounds
      tracking when signed and unsigned compares both provide hints
      on limits when having unknown variables. E.g. a program such
      as the following should have been rejected:
      
         0: (7a) *(u64 *)(r10 -8) = 0
         1: (bf) r2 = r10
         2: (07) r2 += -8
         3: (18) r1 = 0xffff8a94cda93400
         5: (85) call bpf_map_lookup_elem#1
         6: (15) if r0 == 0x0 goto pc+7
        R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R10=fp
         7: (7a) *(u64 *)(r10 -16) = -8
         8: (79) r1 = *(u64 *)(r10 -16)
         9: (b7) r2 = -1
        10: (2d) if r1 > r2 goto pc+3
        R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R1=inv,min_value=0
        R2=imm-1,max_value=18446744073709551615,min_align=1 R10=fp
        11: (65) if r1 s> 0x1 goto pc+2
        R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R1=inv,min_value=0,max_value=1
        R2=imm-1,max_value=18446744073709551615,min_align=1 R10=fp
        12: (0f) r0 += r1
        13: (72) *(u8 *)(r0 +0) = 0
        R0=map_value_adj(ks=8,vs=8,id=0),min_value=0,max_value=1 R1=inv,min_value=0,max_value=1
        R2=imm-1,max_value=18446744073709551615,min_align=1 R10=fp
        14: (b7) r0 = 0
        15: (95) exit
      
      What happens is that in the first part ...
      
         8: (79) r1 = *(u64 *)(r10 -16)
         9: (b7) r2 = -1
        10: (2d) if r1 > r2 goto pc+3
      
      ... r1 carries an unsigned value, and is compared as unsigned
      against a register carrying an immediate. Verifier deduces in
      reg_set_min_max() that since the compare is unsigned and operation
      is greater than (>), that in the fall-through/false case, r1's
      minimum bound must be 0 and maximum bound must be r2. Latter is
      larger than the bound and thus max value is reset back to being
      'invalid' aka BPF_REGISTER_MAX_RANGE. Thus, r1 state is now
      'R1=inv,min_value=0'. The subsequent test ...
      
        11: (65) if r1 s> 0x1 goto pc+2
      
      ... is a signed compare of r1 with immediate value 1. Here,
      verifier deduces in reg_set_min_max() that since the compare
      is signed this time and operation is greater than (>), that
      in the fall-through/false case, we can deduce that r1's maximum
      bound must be 1, meaning with prior test, we result in r1 having
      the following state: R1=inv,min_value=0,max_value=1. Given that
      the actual value this holds is -8, the bounds are wrongly deduced.
      When this is being added to r0 which holds the map_value(_adj)
      type, then subsequent store access in above case will go through
      check_mem_access() which invokes check_map_access_adj(), that
      will then probe whether the map memory is in bounds based
      on the min_value and max_value as well as access size since
      the actual unknown value is min_value <= x <= max_value; commit
      fce366a9 ("bpf, verifier: fix alu ops against map_value{,
      _adj} register types") provides some more explanation on the
      semantics.
      
      It's worth to note in this context that in the current code,
      min_value and max_value tracking are used for two things, i)
      dynamic map value access via check_map_access_adj() and since
      commit 06c1c049 ("bpf: allow helpers access to variable memory")
      ii) also enforced at check_helper_mem_access() when passing a
      memory address (pointer to packet, map value, stack) and length
      pair to a helper and the length in this case is an unknown value
      defining an access range through min_value/max_value in that
      case. The min_value/max_value tracking is /not/ used in the
      direct packet access case to track ranges. However, the issue
      also affects case ii), for example, the following crafted program
      based on the same principle must be rejected as well:
      
         0: (b7) r2 = 0
         1: (bf) r3 = r10
         2: (07) r3 += -512
         3: (7a) *(u64 *)(r10 -16) = -8
         4: (79) r4 = *(u64 *)(r10 -16)
         5: (b7) r6 = -1
         6: (2d) if r4 > r6 goto pc+5
        R1=ctx R2=imm0,min_value=0,max_value=0,min_align=2147483648 R3=fp-512
        R4=inv,min_value=0 R6=imm-1,max_value=18446744073709551615,min_align=1 R10=fp
         7: (65) if r4 s> 0x1 goto pc+4
        R1=ctx R2=imm0,min_value=0,max_value=0,min_align=2147483648 R3=fp-512
        R4=inv,min_value=0,max_value=1 R6=imm-1,max_value=18446744073709551615,min_align=1
        R10=fp
         8: (07) r4 += 1
         9: (b7) r5 = 0
        10: (6a) *(u16 *)(r10 -512) = 0
        11: (85) call bpf_skb_load_bytes#26
        12: (b7) r0 = 0
        13: (95) exit
      
      Meaning, while we initialize the max_value stack slot that the
      verifier thinks we access in the [1,2] range, in reality we
      pass -7 as length which is interpreted as u32 in the helper.
      Thus, this issue is relevant also for the case of helper ranges.
      Resetting both bounds in check_reg_overflow() in case only one
      of them exceeds limits is also not enough as similar test can be
      created that uses values which are within range, thus also here
      learned min value in r1 is incorrect when mixed with later signed
      test to create a range:
      
         0: (7a) *(u64 *)(r10 -8) = 0
         1: (bf) r2 = r10
         2: (07) r2 += -8
         3: (18) r1 = 0xffff880ad081fa00
         5: (85) call bpf_map_lookup_elem#1
         6: (15) if r0 == 0x0 goto pc+7
        R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R10=fp
         7: (7a) *(u64 *)(r10 -16) = -8
         8: (79) r1 = *(u64 *)(r10 -16)
         9: (b7) r2 = 2
        10: (3d) if r2 >= r1 goto pc+3
        R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R1=inv,min_value=3
        R2=imm2,min_value=2,max_value=2,min_align=2 R10=fp
        11: (65) if r1 s> 0x4 goto pc+2
        R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0
        R1=inv,min_value=3,max_value=4 R2=imm2,min_value=2,max_value=2,min_align=2 R10=fp
        12: (0f) r0 += r1
        13: (72) *(u8 *)(r0 +0) = 0
        R0=map_value_adj(ks=8,vs=8,id=0),min_value=3,max_value=4
        R1=inv,min_value=3,max_value=4 R2=imm2,min_value=2,max_value=2,min_align=2 R10=fp
        14: (b7) r0 = 0
        15: (95) exit
      
      This leaves us with two options for fixing this: i) to invalidate
      all prior learned information once we switch signed context, ii)
      to track min/max signed and unsigned boundaries separately as
      done in [0]. (Given latter introduces major changes throughout
      the whole verifier, it's rather net-next material, thus this
      patch follows option i), meaning we can derive bounds either
      from only signed tests or only unsigned tests.) There is still the
      case of adjust_reg_min_max_vals(), where we adjust bounds on ALU
      operations, meaning programs like the following where boundaries
      on the reg get mixed in context later on when bounds are merged
      on the dst reg must get rejected, too:
      
         0: (7a) *(u64 *)(r10 -8) = 0
         1: (bf) r2 = r10
         2: (07) r2 += -8
         3: (18) r1 = 0xffff89b2bf87ce00
         5: (85) call bpf_map_lookup_elem#1
         6: (15) if r0 == 0x0 goto pc+6
        R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R10=fp
         7: (7a) *(u64 *)(r10 -16) = -8
         8: (79) r1 = *(u64 *)(r10 -16)
         9: (b7) r2 = 2
        10: (3d) if r2 >= r1 goto pc+2
        R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R1=inv,min_value=3
        R2=imm2,min_value=2,max_value=2,min_align=2 R10=fp
        11: (b7) r7 = 1
        12: (65) if r7 s> 0x0 goto pc+2
        R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R1=inv,min_value=3
        R2=imm2,min_value=2,max_value=2,min_align=2 R7=imm1,max_value=0 R10=fp
        13: (b7) r0 = 0
        14: (95) exit
      
        from 12 to 15: R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0
        R1=inv,min_value=3 R2=imm2,min_value=2,max_value=2,min_align=2 R7=imm1,min_value=1 R10=fp
        15: (0f) r7 += r1
        16: (65) if r7 s> 0x4 goto pc+2
        R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R1=inv,min_value=3
        R2=imm2,min_value=2,max_value=2,min_align=2 R7=inv,min_value=4,max_value=4 R10=fp
        17: (0f) r0 += r7
        18: (72) *(u8 *)(r0 +0) = 0
        R0=map_value_adj(ks=8,vs=8,id=0),min_value=4,max_value=4 R1=inv,min_value=3
        R2=imm2,min_value=2,max_value=2,min_align=2 R7=inv,min_value=4,max_value=4 R10=fp
        19: (b7) r0 = 0
        20: (95) exit
      
      Meaning, in adjust_reg_min_max_vals() we must also reset range
      values on the dst when src/dst registers have mixed signed/
      unsigned derived min/max value bounds with one unbounded value
      as otherwise they can be added together deducing false boundaries.
      Once both boundaries are established from either ALU ops or
      compare operations w/o mixing signed/unsigned insns, then they
      can safely be added to other regs also having both boundaries
      established. Adding regs with one unbounded side to a map value
      where the bounded side has been learned w/o mixing ops is
      possible, but the resulting map value won't recover from that,
      meaning such op is considered invalid on the time of actual
      access. Invalid bounds are set on the dst reg in case i) src reg,
      or ii) in case dst reg already had them. The only way to recover
      would be to perform i) ALU ops but only 'add' is allowed on map
      value types or ii) comparisons, but these are disallowed on
      pointers in case they span a range. This is fine as only BPF_JEQ
      and BPF_JNE may be performed on PTR_TO_MAP_VALUE_OR_NULL registers
      which potentially turn them into PTR_TO_MAP_VALUE type depending
      on the branch, so only here min/max value cannot be invalidated
      for them.
      
      In terms of state pruning, value_from_signed is considered
      as well in states_equal() when dealing with adjusted map values.
      With regards to breaking existing programs, there is a small
      risk, but use-cases are rather quite narrow where this could
      occur and mixing compares probably unlikely.
      
      Joint work with Josef and Edward.
      
        [0] https://lists.iovisor.org/pipermail/iovisor-dev/2017-June/000822.html
      
      Fixes: 48461135 ("bpf: allow access into map value arrays")
      Reported-by: default avatarEdward Cree <ecree@solarflare.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarEdward Cree <ecree@solarflare.com>
      Signed-off-by: default avatarJosef Bacik <jbacik@fb.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      eb6cf01c
    • John Fastabend's avatar
      bpf, verifier: add additional patterns to evaluate_reg_imm_alu · 659ee968
      John Fastabend authored
      
      [ Upstream commit 43188702 ]
      
      Currently the verifier does not track imm across alu operations when
      the source register is of unknown type. This adds additional pattern
      matching to catch this and track imm. We've seen LLVM generating this
      pattern while working on cilium.
      Signed-off-by: default avatarJohn Fastabend <john.fastabend@gmail.com>
      Acked-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      659ee968
    • Konstantin Khlebnikov's avatar
      net_sched: fix order of queue length updates in qdisc_replace() · d8a4ae09
      Konstantin Khlebnikov authored
      
      [ Upstream commit 68a66d14 ]
      
      This important to call qdisc_tree_reduce_backlog() after changing queue
      length. Parent qdisc should deactivate class in ->qlen_notify() called from
      qdisc_tree_reduce_backlog() but this happens only if qdisc->q.qlen in zero.
      
      Missed class deactivations leads to crashes/warnings at picking packets
      from empty qdisc and corrupting state at reactivating this class in future.
      Signed-off-by: default avatarKonstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Fixes: 86a7996c ("net_sched: introduce qdisc_replace() helper")
      Acked-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d8a4ae09
    • Xin Long's avatar
      net: sched: fix NULL pointer dereference when action calls some targets · 09e1d36d
      Xin Long authored
      
      [ Upstream commit 4f8a881a ]
      
      As we know in some target's checkentry it may dereference par.entryinfo
      to check entry stuff inside. But when sched action calls xt_check_target,
      par.entryinfo is set with NULL. It would cause kernel panic when calling
      some targets.
      
      It can be reproduce with:
        # tc qd add dev eth1 ingress handle ffff:
        # tc filter add dev eth1 parent ffff: u32 match u32 0 0 action xt \
          -j ECN --ecn-tcp-remove
      
      It could also crash kernel when using target CLUSTERIP or TPROXY.
      
      By now there's no proper value for par.entryinfo in ipt_init_target,
      but it can not be set with NULL. This patch is to void all these
      panics by setting it with an ipt_entry obj with all members = 0.
      
      Note that this issue has been there since the very beginning.
      Signed-off-by: default avatarXin Long <lucien.xin@gmail.com>
      Acked-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      09e1d36d
    • Colin Ian King's avatar
      irda: do not leak initialized list.dev to userspace · f4e4a296
      Colin Ian King authored
      
      [ Upstream commit b024d949 ]
      
      list.dev has not been initialized and so the copy_to_user is copying
      data from the stack back to user space which is a potential
      information leak. Fix this ensuring all of list is initialized to
      zero.
      
      Detected by CoverityScan, CID#1357894 ("Uninitialized scalar variable")
      Signed-off-by: default avatarColin Ian King <colin.king@canonical.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f4e4a296
    • Huy Nguyen's avatar
      net/mlx4_core: Enable 4K UAR if SRIOV module parameter is not enabled · 754df4da
      Huy Nguyen authored
      
      [ Upstream commit ca3d89a3 ]
      
      enable_4k_uar module parameter was added in patch cited below to
      address the backward compatibility issue in SRIOV when the VM has
      system's PAGE_SIZE uar implementation and the Hypervisor has 4k uar
      implementation.
      
      The above compatibility issue does not exist in the non SRIOV case.
      In this patch, we always enable 4k uar implementation if SRIOV
      is not enabled on mlx4's supported cards.
      
      Fixes: 76e39ccf ("net/mlx4_core: Fix backward compatibility on VFs")
      Signed-off-by: default avatarHuy Nguyen <huyn@mellanox.com>
      Reviewed-by: default avatarDaniel Jurgens <danielj@mellanox.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      754df4da
    • Neal Cardwell's avatar
      tcp: when rearming RTO, if RTO time is in past then fire RTO ASAP · 2d093adf
      Neal Cardwell authored
      
      [ Upstream commit cdbeb633 ]
      
      In some situations tcp_send_loss_probe() can realize that it's unable
      to send a loss probe (TLP), and falls back to calling tcp_rearm_rto()
      to schedule an RTO timer. In such cases, sometimes tcp_rearm_rto()
      realizes that the RTO was eligible to fire immediately or at some
      point in the past (delta_us <= 0). Previously in such cases
      tcp_rearm_rto() was scheduling such "overdue" RTOs to happen at now +
      icsk_rto, which caused needless delays of hundreds of milliseconds
      (and non-linear behavior that made reproducible testing
      difficult). This commit changes the logic to schedule "overdue" RTOs
      ASAP, rather than at now + icsk_rto.
      
      Fixes: 6ba8a3b1 ("tcp: Tail loss probe (TLP)")
      Suggested-by: default avatarYuchung Cheng <ycheng@google.com>
      Signed-off-by: default avatarNeal Cardwell <ncardwell@google.com>
      Signed-off-by: default avatarYuchung Cheng <ycheng@google.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2d093adf
    • Wei Wang's avatar
      ipv6: repair fib6 tree in failure case · 7bbc60d9
      Wei Wang authored
      
      [ Upstream commit 348a4002 ]
      
      In fib6_add(), it is possible that fib6_add_1() picks an intermediate
      node and sets the node's fn->leaf to NULL in order to add this new
      route. However, if fib6_add_rt2node() fails to add the new
      route for some reason, fn->leaf will be left as NULL and could
      potentially cause crash when fn->leaf is accessed in fib6_locate().
      This patch makes sure fib6_repair_tree() is called to properly repair
      fn->leaf in the above failure case.
      
      Here is the syzkaller reported general protection fault in fib6_locate:
      kasan: CONFIG_KASAN_INLINE enabled
      kasan: GPF could be caused by NULL-ptr deref or user memory access
      general protection fault: 0000 [#1] SMP KASAN
      Modules linked in:
      CPU: 0 PID: 40937 Comm: syz-executor3 Not tainted
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      task: ffff8801d7d64100 ti: ffff8801d01a0000 task.ti: ffff8801d01a0000
      RIP: 0010:[<ffffffff82a3e0e1>]  [<ffffffff82a3e0e1>] __ipv6_prefix_equal64_half include/net/ipv6.h:475 [inline]
      RIP: 0010:[<ffffffff82a3e0e1>]  [<ffffffff82a3e0e1>] ipv6_prefix_equal include/net/ipv6.h:492 [inline]
      RIP: 0010:[<ffffffff82a3e0e1>]  [<ffffffff82a3e0e1>] fib6_locate_1 net/ipv6/ip6_fib.c:1210 [inline]
      RIP: 0010:[<ffffffff82a3e0e1>]  [<ffffffff82a3e0e1>] fib6_locate+0x281/0x3c0 net/ipv6/ip6_fib.c:1233
      RSP: 0018:ffff8801d01a36a8  EFLAGS: 00010202
      RAX: 0000000000000020 RBX: ffff8801bc790e00 RCX: ffffc90002983000
      RDX: 0000000000001219 RSI: ffff8801d01a37a0 RDI: 0000000000000100
      RBP: ffff8801d01a36f0 R08: 00000000000000ff R09: 0000000000000000
      R10: 0000000000000003 R11: 0000000000000000 R12: 0000000000000001
      R13: dffffc0000000000 R14: ffff8801d01a37a0 R15: 0000000000000000
      FS:  00007f6afd68c700(0000) GS:ffff8801db400000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00000000004c6340 CR3: 00000000ba41f000 CR4: 00000000001426f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Stack:
       ffff8801d01a37a8 ffff8801d01a3780 ffffed003a0346f5 0000000c82a23ea0
       ffff8800b7bd7700 ffff8801d01a3780 ffff8800b6a1c940 ffffffff82a23ea0
       ffff8801d01a3920 ffff8801d01a3748 ffffffff82a223d6 ffff8801d7d64988
      Call Trace:
       [<ffffffff82a223d6>] ip6_route_del+0x106/0x570 net/ipv6/route.c:2109
       [<ffffffff82a23f9d>] inet6_rtm_delroute+0xfd/0x100 net/ipv6/route.c:3075
       [<ffffffff82621359>] rtnetlink_rcv_msg+0x549/0x7a0 net/core/rtnetlink.c:3450
       [<ffffffff8274c1d1>] netlink_rcv_skb+0x141/0x370 net/netlink/af_netlink.c:2281
       [<ffffffff82613ddf>] rtnetlink_rcv+0x2f/0x40 net/core/rtnetlink.c:3456
       [<ffffffff8274ad38>] netlink_unicast_kernel net/netlink/af_netlink.c:1206 [inline]
       [<ffffffff8274ad38>] netlink_unicast+0x518/0x750 net/netlink/af_netlink.c:1232
       [<ffffffff8274b83e>] netlink_sendmsg+0x8ce/0xc30 net/netlink/af_netlink.c:1778
       [<ffffffff82564aff>] sock_sendmsg_nosec net/socket.c:609 [inline]
       [<ffffffff82564aff>] sock_sendmsg+0xcf/0x110 net/socket.c:619
       [<ffffffff82564d62>] sock_write_iter+0x222/0x3a0 net/socket.c:834
       [<ffffffff8178523d>] new_sync_write+0x1dd/0x2b0 fs/read_write.c:478
       [<ffffffff817853f4>] __vfs_write+0xe4/0x110 fs/read_write.c:491
       [<ffffffff81786c38>] vfs_write+0x178/0x4b0 fs/read_write.c:538
       [<ffffffff817892a9>] SYSC_write fs/read_write.c:585 [inline]
       [<ffffffff817892a9>] SyS_write+0xd9/0x1b0 fs/read_write.c:577
       [<ffffffff82c71e32>] entry_SYSCALL_64_fastpath+0x12/0x17
      
      Note: there is no "Fixes" tag as this seems to be a bug introduced
      very early.
      Signed-off-by: default avatarWei Wang <weiwan@google.com>
      Acked-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7bbc60d9
    • Wei Wang's avatar
      ipv6: reset fn->rr_ptr when replacing route · 368129fe
      Wei Wang authored
      
      [ Upstream commit 383143f3 ]
      
      syzcaller reported the following use-after-free issue in rt6_select():
      BUG: KASAN: use-after-free in rt6_select net/ipv6/route.c:755 [inline] at addr ffff8800bc6994e8
      BUG: KASAN: use-after-free in ip6_pol_route.isra.46+0x1429/0x1470 net/ipv6/route.c:1084 at addr ffff8800bc6994e8
      Read of size 4 by task syz-executor1/439628
      CPU: 0 PID: 439628 Comm: syz-executor1 Not tainted 4.3.5+ #8
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
       0000000000000000 ffff88018fe435b0 ffffffff81ca384d ffff8801d3588c00
       ffff8800bc699380 ffff8800bc699500 dffffc0000000000 ffff8801d40a47c0
       ffff88018fe435d8 ffffffff81735751 ffff88018fe43660 ffff8800bc699380
      Call Trace:
       [<ffffffff81ca384d>] __dump_stack lib/dump_stack.c:15 [inline]
       [<ffffffff81ca384d>] dump_stack+0xc1/0x124 lib/dump_stack.c:51
      sctp: [Deprecated]: syz-executor0 (pid 439615) Use of struct sctp_assoc_value in delayed_ack socket option.
      Use struct sctp_sack_info instead
       [<ffffffff81735751>] kasan_object_err+0x21/0x70 mm/kasan/report.c:158
       [<ffffffff817359c4>] print_address_description mm/kasan/report.c:196 [inline]
       [<ffffffff817359c4>] kasan_report_error+0x1b4/0x4a0 mm/kasan/report.c:285
       [<ffffffff81735d93>] kasan_report mm/kasan/report.c:305 [inline]
       [<ffffffff81735d93>] __asan_report_load4_noabort+0x43/0x50 mm/kasan/report.c:325
       [<ffffffff82a28e39>] rt6_select net/ipv6/route.c:755 [inline]
       [<ffffffff82a28e39>] ip6_pol_route.isra.46+0x1429/0x1470 net/ipv6/route.c:1084
       [<ffffffff82a28fb1>] ip6_pol_route_output+0x81/0xb0 net/ipv6/route.c:1203
       [<ffffffff82ab0a50>] fib6_rule_action+0x1f0/0x680 net/ipv6/fib6_rules.c:95
       [<ffffffff8265cbb6>] fib_rules_lookup+0x2a6/0x7a0 net/core/fib_rules.c:223
       [<ffffffff82ab1430>] fib6_rule_lookup+0xd0/0x250 net/ipv6/fib6_rules.c:41
       [<ffffffff82a22006>] ip6_route_output+0x1d6/0x2c0 net/ipv6/route.c:1224
       [<ffffffff829e83d2>] ip6_dst_lookup_tail+0x4d2/0x890 net/ipv6/ip6_output.c:943
       [<ffffffff829e889a>] ip6_dst_lookup_flow+0x9a/0x250 net/ipv6/ip6_output.c:1079
       [<ffffffff82a9f7d8>] ip6_datagram_dst_update+0x538/0xd40 net/ipv6/datagram.c:91
       [<ffffffff82aa0978>] __ip6_datagram_connect net/ipv6/datagram.c:251 [inline]
       [<ffffffff82aa0978>] ip6_datagram_connect+0x518/0xe50 net/ipv6/datagram.c:272
       [<ffffffff82aa1313>] ip6_datagram_connect_v6_only+0x63/0x90 net/ipv6/datagram.c:284
       [<ffffffff8292f790>] inet_dgram_connect+0x170/0x1f0 net/ipv4/af_inet.c:564
       [<ffffffff82565547>] SYSC_connect+0x1a7/0x2f0 net/socket.c:1582
       [<ffffffff8256a649>] SyS_connect+0x29/0x30 net/socket.c:1563
       [<ffffffff82c72032>] entry_SYSCALL_64_fastpath+0x12/0x17
      Object at ffff8800bc699380, in cache ip6_dst_cache size: 384
      
      The root cause of it is that in fib6_add_rt2node(), when it replaces an
      existing route with the new one, it does not update fn->rr_ptr.
      This commit resets fn->rr_ptr to NULL when it points to a route which is
      replaced in fib6_add_rt2node().
      
      Fixes: 27596472 ("ipv6: fix ECMP route replacement")
      Signed-off-by: default avatarWei Wang <weiwan@google.com>
      Acked-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      368129fe
    • Eric Dumazet's avatar
      tipc: fix use-after-free · c549de48
      Eric Dumazet authored
      
      [ Upstream commit 5bfd37b4 ]
      
      syszkaller reported use-after-free in tipc [1]
      
      When msg->rep skb is freed, set the pointer to NULL,
      so that caller does not free it again.
      
      [1]
      
      ==================================================================
      BUG: KASAN: use-after-free in skb_push+0xd4/0xe0 net/core/skbuff.c:1466
      Read of size 8 at addr ffff8801c6e71e90 by task syz-executor5/4115
      
      CPU: 1 PID: 4115 Comm: syz-executor5 Not tainted 4.13.0-rc4+ #32
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:16 [inline]
       dump_stack+0x194/0x257 lib/dump_stack.c:52
       print_address_description+0x73/0x250 mm/kasan/report.c:252
       kasan_report_error mm/kasan/report.c:351 [inline]
       kasan_report+0x24e/0x340 mm/kasan/report.c:409
       __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:430
       skb_push+0xd4/0xe0 net/core/skbuff.c:1466
       tipc_nl_compat_recv+0x833/0x18f0 net/tipc/netlink_compat.c:1209
       genl_family_rcv_msg+0x7b7/0xfb0 net/netlink/genetlink.c:598
       genl_rcv_msg+0xb2/0x140 net/netlink/genetlink.c:623
       netlink_rcv_skb+0x216/0x440 net/netlink/af_netlink.c:2397
       genl_rcv+0x28/0x40 net/netlink/genetlink.c:634
       netlink_unicast_kernel net/netlink/af_netlink.c:1265 [inline]
       netlink_unicast+0x4e8/0x6f0 net/netlink/af_netlink.c:1291
       netlink_sendmsg+0xa4a/0xe60 net/netlink/af_netlink.c:1854
       sock_sendmsg_nosec net/socket.c:633 [inline]
       sock_sendmsg+0xca/0x110 net/socket.c:643
       sock_write_iter+0x31a/0x5d0 net/socket.c:898
       call_write_iter include/linux/fs.h:1743 [inline]
       new_sync_write fs/read_write.c:457 [inline]
       __vfs_write+0x684/0x970 fs/read_write.c:470
       vfs_write+0x189/0x510 fs/read_write.c:518
       SYSC_write fs/read_write.c:565 [inline]
       SyS_write+0xef/0x220 fs/read_write.c:557
       entry_SYSCALL_64_fastpath+0x1f/0xbe
      RIP: 0033:0x4512e9
      RSP: 002b:00007f3bc8184c08 EFLAGS: 00000216 ORIG_RAX: 0000000000000001
      RAX: ffffffffffffffda RBX: 0000000000718000 RCX: 00000000004512e9
      RDX: 0000000000000020 RSI: 0000000020fdb000 RDI: 0000000000000006
      RBP: 0000000000000086 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000216 R12: 00000000004b5e76
      R13: 00007f3bc8184b48 R14: 00000000004b5e86 R15: 0000000000000000
      
      Allocated by task 4115:
       save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
       save_stack+0x43/0xd0 mm/kasan/kasan.c:447
       set_track mm/kasan/kasan.c:459 [inline]
       kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:551
       kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:489
       kmem_cache_alloc_node+0x13d/0x750 mm/slab.c:3651
       __alloc_skb+0xf1/0x740 net/core/skbuff.c:219
       alloc_skb include/linux/skbuff.h:903 [inline]
       tipc_tlv_alloc+0x26/0xb0 net/tipc/netlink_compat.c:148
       tipc_nl_compat_dumpit+0xf2/0x3c0 net/tipc/netlink_compat.c:248
       tipc_nl_compat_handle net/tipc/netlink_compat.c:1130 [inline]
       tipc_nl_compat_recv+0x756/0x18f0 net/tipc/netlink_compat.c:1199
       genl_family_rcv_msg+0x7b7/0xfb0 net/netlink/genetlink.c:598
       genl_rcv_msg+0xb2/0x140 net/netlink/genetlink.c:623
       netlink_rcv_skb+0x216/0x440 net/netlink/af_netlink.c:2397
       genl_rcv+0x28/0x40 net/netlink/genetlink.c:634
       netlink_unicast_kernel net/netlink/af_netlink.c:1265 [inline]
       netlink_unicast+0x4e8/0x6f0 net/netlink/af_netlink.c:1291
       netlink_sendmsg+0xa4a/0xe60 net/netlink/af_netlink.c:1854
       sock_sendmsg_nosec net/socket.c:633 [inline]
       sock_sendmsg+0xca/0x110 net/socket.c:643
       sock_write_iter+0x31a/0x5d0 net/socket.c:898
       call_write_iter include/linux/fs.h:1743 [inline]
       new_sync_write fs/read_write.c:457 [inline]
       __vfs_write+0x684/0x970 fs/read_write.c:470
       vfs_write+0x189/0x510 fs/read_write.c:518
       SYSC_write fs/read_write.c:565 [inline]
       SyS_write+0xef/0x220 fs/read_write.c:557
       entry_SYSCALL_64_fastpath+0x1f/0xbe
      
      Freed by task 4115:
       save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
       save_stack+0x43/0xd0 mm/kasan/kasan.c:447
       set_track mm/kasan/kasan.c:459 [inline]
       kasan_slab_free+0x71/0xc0 mm/kasan/kasan.c:524
       __cache_free mm/slab.c:3503 [inline]
       kmem_cache_free+0x77/0x280 mm/slab.c:3763
       kfree_skbmem+0x1a1/0x1d0 net/core/skbuff.c:622
       __kfree_skb net/core/skbuff.c:682 [inline]
       kfree_skb+0x165/0x4c0 net/core/skbuff.c:699
       tipc_nl_compat_dumpit+0x36a/0x3c0 net/tipc/netlink_compat.c:260
       tipc_nl_compat_handle net/tipc/netlink_compat.c:1130 [inline]
       tipc_nl_compat_recv+0x756/0x18f0 net/tipc/netlink_compat.c:1199
       genl_family_rcv_msg+0x7b7/0xfb0 net/netlink/genetlink.c:598
       genl_rcv_msg+0xb2/0x140 net/netlink/genetlink.c:623
       netlink_rcv_skb+0x216/0x440 net/netlink/af_netlink.c:2397
       genl_rcv+0x28/0x40 net/netlink/genetlink.c:634
       netlink_unicast_kernel net/netlink/af_netlink.c:1265 [inline]
       netlink_unicast+0x4e8/0x6f0 net/netlink/af_netlink.c:1291
       netlink_sendmsg+0xa4a/0xe60 net/netlink/af_netlink.c:1854
       sock_sendmsg_nosec net/socket.c:633 [inline]
       sock_sendmsg+0xca/0x110 net/socket.c:643
       sock_write_iter+0x31a/0x5d0 net/socket.c:898
       call_write_iter include/linux/fs.h:1743 [inline]
       new_sync_write fs/read_write.c:457 [inline]
       __vfs_write+0x684/0x970 fs/read_write.c:470
       vfs_write+0x189/0x510 fs/read_write.c:518
       SYSC_write fs/read_write.c:565 [inline]
       SyS_write+0xef/0x220 fs/read_write.c:557
       entry_SYSCALL_64_fastpath+0x1f/0xbe
      
      The buggy address belongs to the object at ffff8801c6e71dc0
       which belongs to the cache skbuff_head_cache of size 224
      The buggy address is located 208 bytes inside of
       224-byte region [ffff8801c6e71dc0, ffff8801c6e71ea0)
      The buggy address belongs to the page:
      page:ffffea00071b9c40 count:1 mapcount:0 mapping:ffff8801c6e71000 index:0x0
      flags: 0x200000000000100(slab)
      raw: 0200000000000100 ffff8801c6e71000 0000000000000000 000000010000000c
      raw: ffffea0007224a20 ffff8801d98caf48 ffff8801d9e79040 0000000000000000
      page dumped because: kasan: bad access detected
      
      Memory state around the buggy address:
       ffff8801c6e71d80: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
       ffff8801c6e71e00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      >ffff8801c6e71e80: fb fb fb fb fc fc fc fc fc fc fc fc fc fc fc fc
                               ^
       ffff8801c6e71f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
       ffff8801c6e71f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      ==================================================================
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarDmitry Vyukov  <dvyukov@google.com>
      Cc: Jon Maloy <jon.maloy@ericsson.com>
      Cc: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c549de48
    • Alexander Potapenko's avatar
      sctp: fully initialize the IPv6 address in sctp_v6_to_addr() · 62b3580f
      Alexander Potapenko authored
      
      [ Upstream commit 15339e44 ]
      
      KMSAN reported use of uninitialized sctp_addr->v4.sin_addr.s_addr and
      sctp_addr->v6.sin6_scope_id in sctp_v6_cmp_addr() (see below).
      Make sure all fields of an IPv6 address are initialized, which
      guarantees that the IPv4 fields are also initialized.
      
      ==================================================================
       BUG: KMSAN: use of uninitialized memory in sctp_v6_cmp_addr+0x8d4/0x9f0
       net/sctp/ipv6.c:517
       CPU: 2 PID: 31056 Comm: syz-executor1 Not tainted 4.11.0-rc5+ #2944
       Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs
       01/01/2011
       Call Trace:
        dump_stack+0x172/0x1c0 lib/dump_stack.c:42
        is_logbuf_locked mm/kmsan/kmsan.c:59 [inline]
        kmsan_report+0x12a/0x180 mm/kmsan/kmsan.c:938
        native_save_fl arch/x86/include/asm/irqflags.h:18 [inline]
        arch_local_save_flags arch/x86/include/asm/irqflags.h:72 [inline]
        arch_local_irq_save arch/x86/include/asm/irqflags.h:113 [inline]
        __msan_warning_32+0x61/0xb0 mm/kmsan/kmsan_instr.c:467
        sctp_v6_cmp_addr+0x8d4/0x9f0 net/sctp/ipv6.c:517
        sctp_v6_get_dst+0x8c7/0x1630 net/sctp/ipv6.c:290
        sctp_transport_route+0x101/0x570 net/sctp/transport.c:292
        sctp_assoc_add_peer+0x66d/0x16f0 net/sctp/associola.c:651
        sctp_sendmsg+0x35a5/0x4f90 net/sctp/socket.c:1871
        inet_sendmsg+0x498/0x670 net/ipv4/af_inet.c:762
        sock_sendmsg_nosec net/socket.c:633 [inline]
        sock_sendmsg net/socket.c:643 [inline]
        SYSC_sendto+0x608/0x710 net/socket.c:1696
        SyS_sendto+0x8a/0xb0 net/socket.c:1664
        entry_SYSCALL_64_fastpath+0x13/0x94
       RIP: 0033:0x44b479
       RSP: 002b:00007f6213f21c08 EFLAGS: 00000286 ORIG_RAX: 000000000000002c
       RAX: ffffffffffffffda RBX: 0000000020000000 RCX: 000000000044b479
       RDX: 0000000000000041 RSI: 0000000020edd000 RDI: 0000000000000006
       RBP: 00000000007080a8 R08: 0000000020b85fe4 R09: 000000000000001c
       R10: 0000000000040005 R11: 0000000000000286 R12: 00000000ffffffff
       R13: 0000000000003760 R14: 00000000006e5820 R15: 0000000000ff8000
       origin description: ----dst_saddr@sctp_v6_get_dst
       local variable created at:
        sk_fullsock include/net/sock.h:2321 [inline]
        inet6_sk include/linux/ipv6.h:309 [inline]
        sctp_v6_get_dst+0x91/0x1630 net/sctp/ipv6.c:241
        sctp_transport_route+0x101/0x570 net/sctp/transport.c:292
      ==================================================================
       BUG: KMSAN: use of uninitialized memory in sctp_v6_cmp_addr+0x8d4/0x9f0
       net/sctp/ipv6.c:517
       CPU: 2 PID: 31056 Comm: syz-executor1 Not tainted 4.11.0-rc5+ #2944
       Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs
       01/01/2011
       Call Trace:
        dump_stack+0x172/0x1c0 lib/dump_stack.c:42
        is_logbuf_locked mm/kmsan/kmsan.c:59 [inline]
        kmsan_report+0x12a/0x180 mm/kmsan/kmsan.c:938
        native_save_fl arch/x86/include/asm/irqflags.h:18 [inline]
        arch_local_save_flags arch/x86/include/asm/irqflags.h:72 [inline]
        arch_local_irq_save arch/x86/include/asm/irqflags.h:113 [inline]
        __msan_warning_32+0x61/0xb0 mm/kmsan/kmsan_instr.c:467
        sctp_v6_cmp_addr+0x8d4/0x9f0 net/sctp/ipv6.c:517
        sctp_v6_get_dst+0x8c7/0x1630 net/sctp/ipv6.c:290
        sctp_transport_route+0x101/0x570 net/sctp/transport.c:292
        sctp_assoc_add_peer+0x66d/0x16f0 net/sctp/associola.c:651
        sctp_sendmsg+0x35a5/0x4f90 net/sctp/socket.c:1871
        inet_sendmsg+0x498/0x670 net/ipv4/af_inet.c:762
        sock_sendmsg_nosec net/socket.c:633 [inline]
        sock_sendmsg net/socket.c:643 [inline]
        SYSC_sendto+0x608/0x710 net/socket.c:1696
        SyS_sendto+0x8a/0xb0 net/socket.c:1664
        entry_SYSCALL_64_fastpath+0x13/0x94
       RIP: 0033:0x44b479
       RSP: 002b:00007f6213f21c08 EFLAGS: 00000286 ORIG_RAX: 000000000000002c
       RAX: ffffffffffffffda RBX: 0000000020000000 RCX: 000000000044b479
       RDX: 0000000000000041 RSI: 0000000020edd000 RDI: 0000000000000006
       RBP: 00000000007080a8 R08: 0000000020b85fe4 R09: 000000000000001c
       R10: 0000000000040005 R11: 0000000000000286 R12: 00000000ffffffff
       R13: 0000000000003760 R14: 00000000006e5820 R15: 0000000000ff8000
       origin description: ----dst_saddr@sctp_v6_get_dst
       local variable created at:
        sk_fullsock include/net/sock.h:2321 [inline]
        inet6_sk include/linux/ipv6.h:309 [inline]
        sctp_v6_get_dst+0x91/0x1630 net/sctp/ipv6.c:241
        sctp_transport_route+0x101/0x570 net/sctp/transport.c:292
      ==================================================================
      Signed-off-by: default avatarAlexander Potapenko <glider@google.com>
      Reviewed-by: default avatarXin Long <lucien.xin@gmail.com>
      Acked-by: default avatarMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      62b3580f
    • Eric Dumazet's avatar
      tun: handle register_netdevice() failures properly · dda84477
      Eric Dumazet authored
      
      [ Upstream commit ff244c6b ]
      
      syzkaller reported a double free [1], caused by the fact
      that tun driver was not updated properly when priv_destructor
      was added.
      
      When/if register_netdevice() fails, priv_destructor() must have been
      called already.
      
      [1]
      BUG: KASAN: double-free or invalid-free in selinux_tun_dev_free_security+0x15/0x20 security/selinux/hooks.c:5023
      
      CPU: 0 PID: 2919 Comm: syzkaller227220 Not tainted 4.13.0-rc4+ #23
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:16 [inline]
       dump_stack+0x194/0x257 lib/dump_stack.c:52
       print_address_description+0x7f/0x260 mm/kasan/report.c:252
       kasan_report_double_free+0x55/0x80 mm/kasan/report.c:333
       kasan_slab_free+0xa0/0xc0 mm/kasan/kasan.c:514
       __cache_free mm/slab.c:3503 [inline]
       kfree+0xd3/0x260 mm/slab.c:3820
       selinux_tun_dev_free_security+0x15/0x20 security/selinux/hooks.c:5023
       security_tun_dev_free_security+0x48/0x80 security/security.c:1512
       tun_set_iff drivers/net/tun.c:1884 [inline]
       __tun_chr_ioctl+0x2ce6/0x3d50 drivers/net/tun.c:2064
       tun_chr_ioctl+0x2a/0x40 drivers/net/tun.c:2309
       vfs_ioctl fs/ioctl.c:45 [inline]
       do_vfs_ioctl+0x1b1/0x1520 fs/ioctl.c:685
       SYSC_ioctl fs/ioctl.c:700 [inline]
       SyS_ioctl+0x8f/0xc0 fs/ioctl.c:691
       entry_SYSCALL_64_fastpath+0x1f/0xbe
      RIP: 0033:0x443ff9
      RSP: 002b:00007ffc34271f68 EFLAGS: 00000217 ORIG_RAX: 0000000000000010
      RAX: ffffffffffffffda RBX: 00000000004002e0 RCX: 0000000000443ff9
      RDX: 0000000020533000 RSI: 00000000400454ca RDI: 0000000000000003
      RBP: 0000000000000086 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000217 R12: 0000000000401ce0
      R13: 0000000000401d70 R14: 0000000000000000 R15: 0000000000000000
      
      Allocated by task 2919:
       save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
       save_stack+0x43/0xd0 mm/kasan/kasan.c:447
       set_track mm/kasan/kasan.c:459 [inline]
       kasan_kmalloc+0xaa/0xd0 mm/kasan/kasan.c:551
       kmem_cache_alloc_trace+0x101/0x6f0 mm/slab.c:3627
       kmalloc include/linux/slab.h:493 [inline]
       kzalloc include/linux/slab.h:666 [inline]
       selinux_tun_dev_alloc_security+0x49/0x170 security/selinux/hooks.c:5012
       security_tun_dev_alloc_security+0x6d/0xa0 security/security.c:1506
       tun_set_iff drivers/net/tun.c:1839 [inline]
       __tun_chr_ioctl+0x1730/0x3d50 drivers/net/tun.c:2064
       tun_chr_ioctl+0x2a/0x40 drivers/net/tun.c:2309
       vfs_ioctl fs/ioctl.c:45 [inline]
       do_vfs_ioctl+0x1b1/0x1520 fs/ioctl.c:685
       SYSC_ioctl fs/ioctl.c:700 [inline]
       SyS_ioctl+0x8f/0xc0 fs/ioctl.c:691
       entry_SYSCALL_64_fastpath+0x1f/0xbe
      
      Freed by task 2919:
       save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
       save_stack+0x43/0xd0 mm/kasan/kasan.c:447
       set_track mm/kasan/kasan.c:459 [inline]
       kasan_slab_free+0x6e/0xc0 mm/kasan/kasan.c:524
       __cache_free mm/slab.c:3503 [inline]
       kfree+0xd3/0x260 mm/slab.c:3820
       selinux_tun_dev_free_security+0x15/0x20 security/selinux/hooks.c:5023
       security_tun_dev_free_security+0x48/0x80 security/security.c:1512
       tun_free_netdev+0x13b/0x1b0 drivers/net/tun.c:1563
       register_netdevice+0x8d0/0xee0 net/core/dev.c:7605
       tun_set_iff drivers/net/tun.c:1859 [inline]
       __tun_chr_ioctl+0x1caf/0x3d50 drivers/net/tun.c:2064
       tun_chr_ioctl+0x2a/0x40 drivers/net/tun.c:2309
       vfs_ioctl fs/ioctl.c:45 [inline]
       do_vfs_ioctl+0x1b1/0x1520 fs/ioctl.c:685
       SYSC_ioctl fs/ioctl.c:700 [inline]
       SyS_ioctl+0x8f/0xc0 fs/ioctl.c:691
       entry_SYSCALL_64_fastpath+0x1f/0xbe
      
      The buggy address belongs to the object at ffff8801d2843b40
       which belongs to the cache kmalloc-32 of size 32
      The buggy address is located 0 bytes inside of
       32-byte region [ffff8801d2843b40, ffff8801d2843b60)
      The buggy address belongs to the page:
      page:ffffea000660cea8 count:1 mapcount:0 mapping:ffff8801d2843000 index:0xffff8801d2843fc1
      flags: 0x200000000000100(slab)
      raw: 0200000000000100 ffff8801d2843000 ffff8801d2843fc1 000000010000003f
      raw: ffffea0006626a40 ffffea00066141a0 ffff8801dbc00100
      page dumped because: kasan: bad access detected
      
      Memory state around the buggy address:
       ffff8801d2843a00: fb fb fb fb fc fc fc fc fb fb fb fb fc fc fc fc
       ffff8801d2843a80: 00 00 00 fc fc fc fc fc fb fb fb fb fc fc fc fc
      >ffff8801d2843b00: 00 00 00 00 fc fc fc fc fb fb fb fb fc fc fc fc
                                                 ^
       ffff8801d2843b80: fb fb fb fb fc fc fc fc fb fb fb fb fc fc fc fc
       ffff8801d2843c00: fb fb fb fb fc fc fc fc fb fb fb fb fc fc fc fc
      
      ==================================================================
      
      Fixes: cf124db5 ("net: Fix inconsistent teardown and release of private netdev state.")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      dda84477
    • Colin Ian King's avatar
      nfp: fix infinite loop on umapping cleanup · 3c3181e1
      Colin Ian King authored
      
      [ Upstream commit eac2c68d ]
      
      The while loop that performs the dma page unmapping never decrements
      index counter f and hence loops forever. Fix this with a pre-decrement
      on f.
      
      Detected by CoverityScan, CID#1357309 ("Infinite loop")
      
      Fixes: 4c352362 ("net: add driver for Netronome NFP4000/NFP6000 NIC VFs")
      Signed-off-by: default avatarColin Ian King <colin.king@canonical.com>
      Acked-by: default avatarJakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3c3181e1
    • Eric Dumazet's avatar
      ipv4: better IP_MAX_MTU enforcement · 9c579acf
      Eric Dumazet authored
      
      [ Upstream commit c780a049 ]
      
      While working on yet another syzkaller report, I found
      that our IP_MAX_MTU enforcements were not properly done.
      
      gcc seems to reload dev->mtu for min(dev->mtu, IP_MAX_MTU), and
      final result can be bigger than IP_MAX_MTU :/
      
      This is a problem because device mtu can be changed on other cpus or
      threads.
      
      While this patch does not fix the issue I am working on, it is
      probably worth addressing it.
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9c579acf
    • Eric Dumazet's avatar
      ptr_ring: use kmalloc_array() · 12ee6d75
      Eric Dumazet authored
      
      [ Upstream commit 81fbfe8a ]
      
      As found by syzkaller, malicious users can set whatever tx_queue_len
      on a tun device and eventually crash the kernel.
      
      Lets remove the ALIGN(XXX, SMP_CACHE_BYTES) thing since a small
      ring buffer is not fast anyway.
      
      Fixes: 2e0ab8ca ("ptr_ring: array based FIFO for pointers")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Cc: Jason Wang <jasowang@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      12ee6d75
    • Liping Zhang's avatar
      openvswitch: fix skb_panic due to the incorrect actions attrlen · cb445bfc
      Liping Zhang authored
      
      [ Upstream commit 494bea39 ]
      
      For sw_flow_actions, the actions_len only represents the kernel part's
      size, and when we dump the actions to the userspace, we will do the
      convertions, so it's true size may become bigger than the actions_len.
      
      But unfortunately, for OVS_PACKET_ATTR_ACTIONS, we use the actions_len
      to alloc the skbuff, so the user_skb's size may become insufficient and
      oops will happen like this:
        skbuff: skb_over_panic: text:ffffffff8148fabf len:1749 put:157 head:
        ffff881300f39000 data:ffff881300f39000 tail:0x6d5 end:0x6c0 dev:<NULL>
        ------------[ cut here ]------------
        kernel BUG at net/core/skbuff.c:129!
        [...]
        Call Trace:
         <IRQ>
         [<ffffffff8148be82>] skb_put+0x43/0x44
         [<ffffffff8148fabf>] skb_zerocopy+0x6c/0x1f4
         [<ffffffffa0290d36>] queue_userspace_packet+0x3a3/0x448 [openvswitch]
         [<ffffffffa0292023>] ovs_dp_upcall+0x30/0x5c [openvswitch]
         [<ffffffffa028d435>] output_userspace+0x132/0x158 [openvswitch]
         [<ffffffffa01e6890>] ? ip6_rcv_finish+0x74/0x77 [ipv6]
         [<ffffffffa028e277>] do_execute_actions+0xcc1/0xdc8 [openvswitch]
         [<ffffffffa028e3f2>] ovs_execute_actions+0x74/0x106 [openvswitch]
         [<ffffffffa0292130>] ovs_dp_process_packet+0xe1/0xfd [openvswitch]
         [<ffffffffa0292b77>] ? key_extract+0x63c/0x8d5 [openvswitch]
         [<ffffffffa029848b>] ovs_vport_receive+0xa1/0xc3 [openvswitch]
        [...]
      
      Also we can find that the actions_len is much little than the orig_len:
        crash> struct sw_flow_actions 0xffff8812f539d000
        struct sw_flow_actions {
          rcu = {
            next = 0xffff8812f5398800,
            func = 0xffffe3b00035db32
          },
          orig_len = 1384,
          actions_len = 592,
          actions = 0xffff8812f539d01c
        }
      
      So as a quick fix, use the orig_len instead of the actions_len to alloc
      the user_skb.
      
      Last, this oops happened on our system running a relative old kernel, but
      the same risk still exists on the mainline, since we use the wrong
      actions_len from the beginning.
      
      Fixes: ccea7445 ("openvswitch: include datapath actions with sampled-packet upcall to userspace")
      Cc: Neil McKee <neil.mckee@inmon.com>
      Signed-off-by: default avatarLiping Zhang <zlpnobody@gmail.com>
      Acked-by: default avatarPravin B Shelar <pshelar@ovn.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      cb445bfc
    • David Ahern's avatar
      net: igmp: Use ingress interface rather than vrf device · c6fc7b98
      David Ahern authored
      
      [ Upstream commit c7b725be ]
      
      Anuradha reported that statically added groups for interfaces enslaved
      to a VRF device were not persisting. The problem is that igmp queries
      and reports need to use the data in the in_dev for the real ingress
      device rather than the VRF device. Update igmp_rcv accordingly.
      
      Fixes: e58e4159 ("net: Enable support for VRF with ipv4 multicast")
      Reported-by: default avatarAnuradha Karuppiah <anuradhak@cumulusnetworks.com>
      Signed-off-by: default avatarDavid Ahern <dsahern@gmail.com>
      Reviewed-by: default avatarNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c6fc7b98
    • Daniel Borkmann's avatar
      bpf: fix bpf_trace_printk on 32 bit archs · 921739a9
      Daniel Borkmann authored
      
      [ Upstream commit 88a5c690 ]
      
      James reported that on MIPS32 bpf_trace_printk() is currently
      broken while MIPS64 works fine:
      
        bpf_trace_printk() uses conditional operators to attempt to
        pass different types to __trace_printk() depending on the
        format operators. This doesn't work as intended on 32-bit
        architectures where u32 and long are passed differently to
        u64, since the result of C conditional operators follows the
        "usual arithmetic conversions" rules, such that the values
        passed to __trace_printk() will always be u64 [causing issues
        later in the va_list handling for vscnprintf()].
      
        For example the samples/bpf/tracex5 test printed lines like
        below on MIPS32, where the fd and buf have come from the u64
        fd argument, and the size from the buf argument:
      
          [...] 1180.941542: 0x00000001: write(fd=1, buf=  (null), size=6258688)
      
        Instead of this:
      
          [...] 1625.616026: 0x00000001: write(fd=1, buf=009e4000, size=512)
      
      One way to get it working is to expand various combinations
      of argument types into 8 different combinations for 32 bit
      and 64 bit kernels. Fix tested by James on MIPS32 and MIPS64
      as well that it resolves the issue.
      
      Fixes: 9c959c86 ("tracing: Allow BPF programs to call bpf_trace_printk()")
      Reported-by: default avatarJames Hogan <james.hogan@imgtec.com>
      Tested-by: default avatarJames Hogan <james.hogan@imgtec.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      921739a9
    • Konstantin Khlebnikov's avatar
      net_sched: remove warning from qdisc_hash_add · 99f635d1
      Konstantin Khlebnikov authored
      
      [ Upstream commit c90e9514 ]
      
      It was added in commit e57a784d ("pkt_sched: set root qdisc
      before change() in attach_default_qdiscs()") to hide duplicates
      from "tc qdisc show" for incative deivices.
      
      After 59cc1f61 ("net: sched: convert qdisc linked list to hashtable")
      it triggered when classful qdisc is added to inactive device because
      default qdiscs are added before switching root qdisc.
      
      Anyway after commit ea327469 ("net: sched: avoid duplicates in
      qdisc dump") duplicates are filtered right in dumper.
      Signed-off-by: default avatarKonstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      99f635d1
    • Konstantin Khlebnikov's avatar
      net_sched/sfq: update hierarchical backlog when drop packet · cf665a60
      Konstantin Khlebnikov authored
      
      [ Upstream commit 325d5dc3 ]
      
      When sfq_enqueue() drops head packet or packet from another queue it
      have to update backlog at upper qdiscs too.
      
      Fixes: 2ccccf5f ("net_sched: update hierarchical backlog too")
      Signed-off-by: default avatarKonstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Acked-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      cf665a60
    • Eric Dumazet's avatar
      ipv4: fix NULL dereference in free_fib_info_rcu() · 163db2c6
      Eric Dumazet authored
      
      [ Upstream commit 187e5b3a ]
      
      If fi->fib_metrics could not be allocated in fib_create_info()
      we attempt to dereference a NULL pointer in free_fib_info_rcu() :
      
          m = fi->fib_metrics;
          if (m != &dst_default_metrics && atomic_dec_and_test(&m->refcnt))
                  kfree(m);
      
      Before my recent patch, we used to call kfree(NULL) and nothing wrong
      happened.
      
      Instead of using RCU to defer freeing while we are under memory stress,
      it seems better to take immediate action.
      
      This was reported by syzkaller team.
      
      Fixes: 3fb07daf ("ipv4: add reference counting to metrics")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      163db2c6
    • Eric Dumazet's avatar
      dccp: defer ccid_hc_tx_delete() at dismantle time · f1d05546
      Eric Dumazet authored
      
      [ Upstream commit 120e9dab ]
      
      syszkaller team reported another problem in DCCP [1]
      
      Problem here is that the structure holding RTO timer
      (ccid2_hc_tx_rto_expire() handler) is freed too soon.
      
      We can not use del_timer_sync() to cancel the timer
      since this timer wants to grab socket lock (that would risk a dead lock)
      
      Solution is to defer the freeing of memory when all references to
      the socket were released. Socket timers do own a reference, so this
      should fix the issue.
      
      [1]
      
      ==================================================================
      BUG: KASAN: use-after-free in ccid2_hc_tx_rto_expire+0x51c/0x5c0 net/dccp/ccids/ccid2.c:144
      Read of size 4 at addr ffff8801d2660540 by task kworker/u4:7/3365
      
      CPU: 1 PID: 3365 Comm: kworker/u4:7 Not tainted 4.13.0-rc4+ #3
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Workqueue: events_unbound call_usermodehelper_exec_work
      Call Trace:
       <IRQ>
       __dump_stack lib/dump_stack.c:16 [inline]
       dump_stack+0x194/0x257 lib/dump_stack.c:52
       print_address_description+0x73/0x250 mm/kasan/report.c:252
       kasan_report_error mm/kasan/report.c:351 [inline]
       kasan_report+0x24e/0x340 mm/kasan/report.c:409
       __asan_report_load4_noabort+0x14/0x20 mm/kasan/report.c:429
       ccid2_hc_tx_rto_expire+0x51c/0x5c0 net/dccp/ccids/ccid2.c:144
       call_timer_fn+0x233/0x830 kernel/time/timer.c:1268
       expire_timers kernel/time/timer.c:1307 [inline]
       __run_timers+0x7fd/0xb90 kernel/time/timer.c:1601
       run_timer_softirq+0x21/0x80 kernel/time/timer.c:1614
       __do_softirq+0x2f5/0xba3 kernel/softirq.c:284
       invoke_softirq kernel/softirq.c:364 [inline]
       irq_exit+0x1cc/0x200 kernel/softirq.c:405
       exiting_irq arch/x86/include/asm/apic.h:638 [inline]
       smp_apic_timer_interrupt+0x76/0xa0 arch/x86/kernel/apic/apic.c:1044
       apic_timer_interrupt+0x93/0xa0 arch/x86/entry/entry_64.S:702
      RIP: 0010:arch_local_irq_enable arch/x86/include/asm/paravirt.h:824 [inline]
      RIP: 0010:__raw_write_unlock_irq include/linux/rwlock_api_smp.h:267 [inline]
      RIP: 0010:_raw_write_unlock_irq+0x56/0x70 kernel/locking/spinlock.c:343
      RSP: 0018:ffff8801cd50eaa8 EFLAGS: 00000286 ORIG_RAX: ffffffffffffff10
      RAX: dffffc0000000000 RBX: ffffffff85a090c0 RCX: 0000000000000006
      RDX: 1ffffffff0b595f3 RSI: 1ffff1003962f989 RDI: ffffffff85acaf98
      RBP: ffff8801cd50eab0 R08: 0000000000000001 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000000 R12: ffff8801cc96ea60
      R13: dffffc0000000000 R14: ffff8801cc96e4c0 R15: ffff8801cc96e4c0
       </IRQ>
       release_task+0xe9e/0x1a40 kernel/exit.c:220
       wait_task_zombie kernel/exit.c:1162 [inline]
       wait_consider_task+0x29b8/0x33c0 kernel/exit.c:1389
       do_wait_thread kernel/exit.c:1452 [inline]
       do_wait+0x441/0xa90 kernel/exit.c:1523
       kernel_wait4+0x1f5/0x370 kernel/exit.c:1665
       SYSC_wait4+0x134/0x140 kernel/exit.c:1677
       SyS_wait4+0x2c/0x40 kernel/exit.c:1673
       call_usermodehelper_exec_sync kernel/kmod.c:286 [inline]
       call_usermodehelper_exec_work+0x1a0/0x2c0 kernel/kmod.c:323
       process_one_work+0xbf3/0x1bc0 kernel/workqueue.c:2097
       worker_thread+0x223/0x1860 kernel/workqueue.c:2231
       kthread+0x35e/0x430 kernel/kthread.c:231
       ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:425
      
      Allocated by task 21267:
       save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
       save_stack+0x43/0xd0 mm/kasan/kasan.c:447
       set_track mm/kasan/kasan.c:459 [inline]
       kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:551
       kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:489
       kmem_cache_alloc+0x127/0x750 mm/slab.c:3561
       ccid_new+0x20e/0x390 net/dccp/ccid.c:151
       dccp_hdlr_ccid+0x27/0x140 net/dccp/feat.c:44
       __dccp_feat_activate+0x142/0x2a0 net/dccp/feat.c:344
       dccp_feat_activate_values+0x34e/0xa90 net/dccp/feat.c:1538
       dccp_rcv_request_sent_state_process net/dccp/input.c:472 [inline]
       dccp_rcv_state_process+0xed1/0x1620 net/dccp/input.c:677
       dccp_v4_do_rcv+0xeb/0x160 net/dccp/ipv4.c:679
       sk_backlog_rcv include/net/sock.h:911 [inline]
       __release_sock+0x124/0x360 net/core/sock.c:2269
       release_sock+0xa4/0x2a0 net/core/sock.c:2784
       inet_wait_for_connect net/ipv4/af_inet.c:557 [inline]
       __inet_stream_connect+0x671/0xf00 net/ipv4/af_inet.c:643
       inet_stream_connect+0x58/0xa0 net/ipv4/af_inet.c:682
       SYSC_connect+0x204/0x470 net/socket.c:1642
       SyS_connect+0x24/0x30 net/socket.c:1623
       entry_SYSCALL_64_fastpath+0x1f/0xbe
      
      Freed by task 3049:
       save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
       save_stack+0x43/0xd0 mm/kasan/kasan.c:447
       set_track mm/kasan/kasan.c:459 [inline]
       kasan_slab_free+0x71/0xc0 mm/kasan/kasan.c:524
       __cache_free mm/slab.c:3503 [inline]
       kmem_cache_free+0x77/0x280 mm/slab.c:3763
       ccid_hc_tx_delete+0xc5/0x100 net/dccp/ccid.c:190
       dccp_destroy_sock+0x1d1/0x2b0 net/dccp/proto.c:225
       inet_csk_destroy_sock+0x166/0x3f0 net/ipv4/inet_connection_sock.c:833
       dccp_done+0xb7/0xd0 net/dccp/proto.c:145
       dccp_time_wait+0x13d/0x300 net/dccp/minisocks.c:72
       dccp_rcv_reset+0x1d1/0x5b0 net/dccp/input.c:160
       dccp_rcv_state_process+0x8fc/0x1620 net/dccp/input.c:663
       dccp_v4_do_rcv+0xeb/0x160 net/dccp/ipv4.c:679
       sk_backlog_rcv include/net/sock.h:911 [inline]
       __sk_receive_skb+0x33e/0xc00 net/core/sock.c:521
       dccp_v4_rcv+0xef1/0x1c00 net/dccp/ipv4.c:871
       ip_local_deliver_finish+0x2e2/0xba0 net/ipv4/ip_input.c:216
       NF_HOOK include/linux/netfilter.h:248 [inline]
       ip_local_deliver+0x1ce/0x6d0 net/ipv4/ip_input.c:257
       dst_input include/net/dst.h:477 [inline]
       ip_rcv_finish+0x8db/0x19c0 net/ipv4/ip_input.c:397
       NF_HOOK include/linux/netfilter.h:248 [inline]
       ip_rcv+0xc3f/0x17d0 net/ipv4/ip_input.c:488
       __netif_receive_skb_core+0x19af/0x33d0 net/core/dev.c:4417
       __netif_receive_skb+0x2c/0x1b0 net/core/dev.c:4455
       process_backlog+0x203/0x740 net/core/dev.c:5130
       napi_poll net/core/dev.c:5527 [inline]
       net_rx_action+0x792/0x1910 net/core/dev.c:5593
       __do_softirq+0x2f5/0xba3 kernel/softirq.c:284
      
      The buggy address belongs to the object at ffff8801d2660100
       which belongs to the cache ccid2_hc_tx_sock of size 1240
      The buggy address is located 1088 bytes inside of
       1240-byte region [ffff8801d2660100, ffff8801d26605d8)
      The buggy address belongs to the page:
      page:ffffea0007499800 count:1 mapcount:0 mapping:ffff8801d2660100 index:0x0 compound_mapcount: 0
      flags: 0x200000000008100(slab|head)
      raw: 0200000000008100 ffff8801d2660100 0000000000000000 0000000100000005
      raw: ffffea00075271a0 ffffea0007538820 ffff8801d3aef9c0 0000000000000000
      page dumped because: kasan: bad access detected
      
      Memory state around the buggy address:
       ffff8801d2660400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
       ffff8801d2660480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      >ffff8801d2660500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                                 ^
       ffff8801d2660580: fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc fc
       ffff8801d2660600: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      ==================================================================
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Cc: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f1d05546
    • Eric Dumazet's avatar
      dccp: purge write queue in dccp_destroy_sock() · a8de69b9
      Eric Dumazet authored
      
      [ Upstream commit 7749d4ff ]
      
      syzkaller reported that DCCP could have a non empty
      write queue at dismantle time.
      
      WARNING: CPU: 1 PID: 2953 at net/core/stream.c:199 sk_stream_kill_queues+0x3ce/0x520 net/core/stream.c:199
      Kernel panic - not syncing: panic_on_warn set ...
      
      CPU: 1 PID: 2953 Comm: syz-executor0 Not tainted 4.13.0-rc4+ #2
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:16 [inline]
       dump_stack+0x194/0x257 lib/dump_stack.c:52
       panic+0x1e4/0x417 kernel/panic.c:180
       __warn+0x1c4/0x1d9 kernel/panic.c:541
       report_bug+0x211/0x2d0 lib/bug.c:183
       fixup_bug+0x40/0x90 arch/x86/kernel/traps.c:190
       do_trap_no_signal arch/x86/kernel/traps.c:224 [inline]
       do_trap+0x260/0x390 arch/x86/kernel/traps.c:273
       do_error_trap+0x120/0x390 arch/x86/kernel/traps.c:310
       do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:323
       invalid_op+0x1e/0x30 arch/x86/entry/entry_64.S:846
      RIP: 0010:sk_stream_kill_queues+0x3ce/0x520 net/core/stream.c:199
      RSP: 0018:ffff8801d182f108 EFLAGS: 00010297
      RAX: ffff8801d1144140 RBX: ffff8801d13cb280 RCX: 0000000000000000
      RDX: 0000000000000000 RSI: ffffffff85137b00 RDI: ffff8801d13cb280
      RBP: ffff8801d182f148 R08: 0000000000000001 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000000 R12: ffff8801d13cb4d0
      R13: ffff8801d13cb3b8 R14: ffff8801d13cb300 R15: ffff8801d13cb3b8
       inet_csk_destroy_sock+0x175/0x3f0 net/ipv4/inet_connection_sock.c:835
       dccp_close+0x84d/0xc10 net/dccp/proto.c:1067
       inet_release+0xed/0x1c0 net/ipv4/af_inet.c:425
       sock_release+0x8d/0x1e0 net/socket.c:597
       sock_close+0x16/0x20 net/socket.c:1126
       __fput+0x327/0x7e0 fs/file_table.c:210
       ____fput+0x15/0x20 fs/file_table.c:246
       task_work_run+0x18a/0x260 kernel/task_work.c:116
       exit_task_work include/linux/task_work.h:21 [inline]
       do_exit+0xa32/0x1b10 kernel/exit.c:865
       do_group_exit+0x149/0x400 kernel/exit.c:969
       get_signal+0x7e8/0x17e0 kernel/signal.c:2330
       do_signal+0x94/0x1ee0 arch/x86/kernel/signal.c:808
       exit_to_usermode_loop+0x21c/0x2d0 arch/x86/entry/common.c:157
       prepare_exit_to_usermode arch/x86/entry/common.c:194 [inline]
       syscall_return_slowpath+0x3a7/0x450 arch/x86/entry/common.c:263
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a8de69b9
    • Eric Dumazet's avatar
      af_key: do not use GFP_KERNEL in atomic contexts · 94fd3556
      Eric Dumazet authored
      
      [ Upstream commit 36f41f8f ]
      
      pfkey_broadcast() might be called from non process contexts,
      we can not use GFP_KERNEL in these cases [1].
      
      This patch partially reverts commit ba51b6be ("net: Fix RCU splat in
      af_key"), only keeping the GFP_ATOMIC forcing under rcu_read_lock()
      section.
      
      [1] : syzkaller reported :
      
      in_atomic(): 1, irqs_disabled(): 0, pid: 2932, name: syzkaller183439
      3 locks held by syzkaller183439/2932:
       #0:  (&net->xfrm.xfrm_cfg_mutex){+.+.+.}, at: [<ffffffff83b43888>] pfkey_sendmsg+0x4c8/0x9f0 net/key/af_key.c:3649
       #1:  (&pfk->dump_lock){+.+.+.}, at: [<ffffffff83b467f6>] pfkey_do_dump+0x76/0x3f0 net/key/af_key.c:293
       #2:  (&(&net->xfrm.xfrm_policy_lock)->rlock){+...+.}, at: [<ffffffff83957632>] spin_lock_bh include/linux/spinlock.h:304 [inline]
       #2:  (&(&net->xfrm.xfrm_policy_lock)->rlock){+...+.}, at: [<ffffffff83957632>] xfrm_policy_walk+0x192/0xa30 net/xfrm/xfrm_policy.c:1028
      CPU: 0 PID: 2932 Comm: syzkaller183439 Not tainted 4.13.0-rc4+ #24
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:16 [inline]
       dump_stack+0x194/0x257 lib/dump_stack.c:52
       ___might_sleep+0x2b2/0x470 kernel/sched/core.c:5994
       __might_sleep+0x95/0x190 kernel/sched/core.c:5947
       slab_pre_alloc_hook mm/slab.h:416 [inline]
       slab_alloc mm/slab.c:3383 [inline]
       kmem_cache_alloc+0x24b/0x6e0 mm/slab.c:3559
       skb_clone+0x1a0/0x400 net/core/skbuff.c:1037
       pfkey_broadcast_one+0x4b2/0x6f0 net/key/af_key.c:207
       pfkey_broadcast+0x4ba/0x770 net/key/af_key.c:281
       dump_sp+0x3d6/0x500 net/key/af_key.c:2685
       xfrm_policy_walk+0x2f1/0xa30 net/xfrm/xfrm_policy.c:1042
       pfkey_dump_sp+0x42/0x50 net/key/af_key.c:2695
       pfkey_do_dump+0xaa/0x3f0 net/key/af_key.c:299
       pfkey_spddump+0x1a0/0x210 net/key/af_key.c:2722
       pfkey_process+0x606/0x710 net/key/af_key.c:2814
       pfkey_sendmsg+0x4d6/0x9f0 net/key/af_key.c:3650
      sock_sendmsg_nosec net/socket.c:633 [inline]
       sock_sendmsg+0xca/0x110 net/socket.c:643
       ___sys_sendmsg+0x755/0x890 net/socket.c:2035
       __sys_sendmsg+0xe5/0x210 net/socket.c:2069
       SYSC_sendmsg net/socket.c:2080 [inline]
       SyS_sendmsg+0x2d/0x50 net/socket.c:2076
       entry_SYSCALL_64_fastpath+0x1f/0xbe
      RIP: 0033:0x445d79
      RSP: 002b:00007f32447c1dc8 EFLAGS: 00000202 ORIG_RAX: 000000000000002e
      RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000000000445d79
      RDX: 0000000000000000 RSI: 000000002023dfc8 RDI: 0000000000000008
      RBP: 0000000000000086 R08: 00007f32447c2700 R09: 00007f32447c2700
      R10: 00007f32447c2700 R11: 0000000000000202 R12: 0000000000000000
      R13: 00007ffe33edec4f R14: 00007f32447c29c0 R15: 0000000000000000
      
      Fixes: ba51b6be ("net: Fix RCU splat in af_key")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Cc: David Ahern <dsa@cumulusnetworks.com>
      Acked-by: default avatarDavid Ahern <dsahern@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      94fd3556
    • Andreas Born's avatar
      bonding: ratelimit failed speed/duplex update warning · 72942014
      Andreas Born authored
      
      [ Upstream commit 11e9d782 ]
      
      bond_miimon_commit() handles the UP transition for each slave of a bond
      in the case of MII. It is triggered 10 times per second for the default
      MII Polling interval of 100ms. For device drivers that do not implement
      __ethtool_get_link_ksettings() the call to bond_update_speed_duplex()
      fails persistently while the MII status could remain UP. That is, in
      this and other cases where the speed/duplex update keeps failing over a
      longer period of time while the MII state is UP, a warning is printed
      every MII polling interval.
      
      To address these excessive warnings net_ratelimit() should be used.
      Printing a warning once would not be sufficient since the call to
      bond_update_speed_duplex() could recover to succeed and fail again
      later. In that case there would be no new indication what went wrong.
      
      Fixes: b5bf0f5b (bonding: correctly update link status during mii-commit phase)
      Signed-off-by: default avatarAndreas Born <futur.andy@googlemail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      72942014
    • Andreas Born's avatar
      bonding: require speed/duplex only for 802.3ad, alb and tlb · b39ae1c8
      Andreas Born authored
      
      [ Upstream commit ad729bc9 ]
      
      The patch c4adfc82 ("bonding: make speed, duplex setting consistent
      with link state") puts the link state to down if
      bond_update_speed_duplex() cannot retrieve speed and duplex settings.
      Assumably the patch was written with 802.3ad mode in mind which relies
      on link speed/duplex settings. For other modes like active-backup these
      settings are not required. Thus, only for these other modes, this patch
      reintroduces support for slaves that do not support reporting speed or
      duplex such as wireless devices. This fixes the regression reported in
      bug 196547 (https://bugzilla.kernel.org/show_bug.cgi?id=196547).
      
      Fixes: c4adfc82 ("bonding: make speed, duplex setting consistent
      with link state")
      Signed-off-by: default avatarAndreas Born <futur.andy@googlemail.com>
      Acked-by: default avatarMahesh Bandewar <maheshb@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b39ae1c8
    • Tushar Dave's avatar
      sparc64: remove unnecessary log message · 16caf8df
      Tushar Dave authored
      [ Upstream commit 6170a506 ]
      
      There is no need to log message if ATU hvapi couldn't get register.
      Unlike PCI hvapi, ATU hvapi registration failure is not hard error.
      Even if ATU hvapi registration fails (on system with ATU or without
      ATU) system continues with legacy IOMMU. So only log message when
      ATU hvapi successfully get registered.
      Signed-off-by: default avatarTushar Dave <tushar.n.dave@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      16caf8df
  2. 25 Aug, 2017 4 commits