1. 29 Apr, 2018 40 commits
    • Martin Schwidefsky's avatar
      s390: move nobp parameter functions to nospec-branch.c · ea1bbd53
      Martin Schwidefsky authored
      [ Upstream commit b2e2f43a ]
      
      Keep the code for the nobp parameter handling with the code for
      expolines. Both are related to the spectre v2 mitigation.
      Signed-off-by: default avatarMartin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ea1bbd53
    • Christian Borntraeger's avatar
      s390/entry.S: fix spurious zeroing of r0 · 6cdc4b21
      Christian Borntraeger authored
      [ Upstream commit d3f46896 ]
      
      when a system call is interrupted we might call the critical section
      cleanup handler that re-does some of the operations. When we are between
      .Lsysc_vtime and .Lsysc_do_svc we might also redo the saving of the
      problem state registers r0-r7:
      
      .Lcleanup_system_call:
      [...]
      0:      # update accounting time stamp
              mvc     __LC_LAST_UPDATE_TIMER(8),__LC_SYNC_ENTER_TIMER
              # set up saved register r11
              lg      %r15,__LC_KERNEL_STACK
              la      %r9,STACK_FRAME_OVERHEAD(%r15)
              stg     %r9,24(%r11)            # r11 pt_regs pointer
              # fill pt_regs
              mvc     __PT_R8(64,%r9),__LC_SAVE_AREA_SYNC
      --->    stmg    %r0,%r7,__PT_R0(%r9)
      
      The problem is now, that we might have already zeroed out r0.
      The fix is to move the zeroing of r0 after sysc_do_svc.
      Reported-by: default avatarFarhan Ali <alifm@linux.vnet.ibm.com>
      Fixes: 7041d281 ("s390: scrub registers on kernel entry and KVM exit")
      Signed-off-by: default avatarChristian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: default avatarMartin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: default avatarMartin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: default avatarMartin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6cdc4b21
    • Martin Schwidefsky's avatar
      s390: do not bypass BPENTER for interrupt system calls · 74a93ae5
      Martin Schwidefsky authored
      [ Upstream commit d5feec04 ]
      
      The system call path can be interrupted before the switch back to the
      standard branch prediction with BPENTER has been done. The critical
      section cleanup code skips forward to .Lsysc_do_svc and bypasses the
      BPENTER. In this case the kernel and all subsequent code will run with
      the limited branch prediction.
      
      Fixes: eacf67eb9b32 ("s390: run user space and KVM guests with modified branch prediction")
      Signed-off-by: default avatarMartin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      74a93ae5
    • Eugeniu Rosca's avatar
      s390: Replace IS_ENABLED(EXPOLINE_*) with IS_ENABLED(CONFIG_EXPOLINE_*) · 6288e169
      Eugeniu Rosca authored
      [ Upstream commit 2cb370d6 ]
      
      I've accidentally stumbled upon the IS_ENABLED(EXPOLINE_*) lines, which
      obviously always evaluate to false. Fix this.
      
      Fixes: f19fbd5e ("s390: introduce execute-trampolines for branches")
      Signed-off-by: default avatarEugeniu Rosca <erosca@de.adit-jv.com>
      Signed-off-by: default avatarMartin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6288e169
    • Christian Borntraeger's avatar
      KVM: s390: force bp isolation for VSIE · 1d966a6a
      Christian Borntraeger authored
      [ Upstream commit f315104a ]
      
      If the guest runs with bp isolation when doing a SIE instruction,
      we must also run the nested guest with bp isolation when emulating
      that SIE instruction.
      This is done by activating BPBC in the lpar, which acts as an override
      for lower level guests.
      Signed-off-by: default avatarChristian Borntraeger <borntraeger@de.ibm.com>
      Reviewed-by: default avatarJanosch Frank <frankja@linux.vnet.ibm.com>
      Reviewed-by: default avatarDavid Hildenbrand <david@redhat.com>
      Signed-off-by: default avatarChristian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: default avatarMartin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1d966a6a
    • Martin Schwidefsky's avatar
      s390: introduce execute-trampolines for branches · b609eb65
      Martin Schwidefsky authored
      [ Upstream commit f19fbd5e ]
      
      Add CONFIG_EXPOLINE to enable the use of the new -mindirect-branch= and
      -mfunction_return= compiler options to create a kernel fortified against
      the specte v2 attack.
      
      With CONFIG_EXPOLINE=y all indirect branches will be issued with an
      execute type instruction. For z10 or newer the EXRL instruction will
      be used, for older machines the EX instruction. The typical indirect
      call
      
      	basr	%r14,%r1
      
      is replaced with a PC relative call to a new thunk
      
      	brasl	%r14,__s390x_indirect_jump_r1
      
      The thunk contains the EXRL/EX instruction to the indirect branch
      
      __s390x_indirect_jump_r1:
      	exrl	0,0f
      	j	.
      0:	br	%r1
      
      The detour via the execute type instruction has a performance impact.
      To get rid of the detour the new kernel parameter "nospectre_v2" and
      "spectre_v2=[on,off,auto]" can be used. If the parameter is specified
      the kernel and module code will be patched at runtime.
      Signed-off-by: default avatarMartin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b609eb65
    • Martin Schwidefsky's avatar
      s390: run user space and KVM guests with modified branch prediction · 0bd4c47c
      Martin Schwidefsky authored
      [ Upstream commit 6b73044b ]
      
      Define TIF_ISOLATE_BP and TIF_ISOLATE_BP_GUEST and add the necessary
      plumbing in entry.S to be able to run user space and KVM guests with
      limited branch prediction.
      
      To switch a user space process to limited branch prediction the
      s390_isolate_bp() function has to be call, and to run a vCPU of a KVM
      guest associated with the current task with limited branch prediction
      call s390_isolate_bp_guest().
      Signed-off-by: default avatarMartin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0bd4c47c
    • Martin Schwidefsky's avatar
      s390: add options to change branch prediction behaviour for the kernel · 43cccd87
      Martin Schwidefsky authored
      [ Upstream commit d768bd89 ]
      
      Add the PPA instruction to the system entry and exit path to switch
      the kernel to a different branch prediction behaviour. The instructions
      are added via CPU alternatives and can be disabled with the "nospec"
      or the "nobp=0" kernel parameter. If the default behaviour selected
      with CONFIG_KERNEL_NOBP is set to "n" then the "nobp=1" parameter can be
      used to enable the changed kernel branch prediction.
      Acked-by: default avatarCornelia Huck <cohuck@redhat.com>
      Signed-off-by: default avatarMartin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      43cccd87
    • Martin Schwidefsky's avatar
      s390/alternative: use a copy of the facility bit mask · c257f81b
      Martin Schwidefsky authored
      [ Upstream commit cf148998 ]
      
      To be able to switch off specific CPU alternatives with kernel parameters
      make a copy of the facility bit mask provided by STFLE and use the copy
      for the decision to apply an alternative.
      Reviewed-by: default avatarDavid Hildenbrand <david@redhat.com>
      Reviewed-by: default avatarCornelia Huck <cohuck@redhat.com>
      Signed-off-by: default avatarMartin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c257f81b
    • Martin Schwidefsky's avatar
      s390: add optimized array_index_mask_nospec · 2ae89b86
      Martin Schwidefsky authored
      [ Upstream commit e2dd8333 ]
      
      Add an optimized version of the array_index_mask_nospec function for
      s390 based on a compare and a subtract with borrow.
      Signed-off-by: default avatarMartin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2ae89b86
    • Martin Schwidefsky's avatar
      s390: scrub registers on kernel entry and KVM exit · 2ae8b683
      Martin Schwidefsky authored
      [ Upstream commit 7041d281 ]
      
      Clear all user space registers on entry to the kernel and all KVM guest
      registers on KVM guest exit if the register does not contain either a
      parameter or a result value.
      Reviewed-by: default avatarChristian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: default avatarMartin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2ae8b683
    • Christian Borntraeger's avatar
      KVM: s390: wire up bpb feature · ea5566fe
      Christian Borntraeger authored
      [ Upstream commit 35b3fde6 ]
      
      The new firmware interfaces for branch prediction behaviour changes
      are transparently available for the guest. Nevertheless, there is
      new state attached that should be migrated and properly resetted.
      Provide a mechanism for handling reset, migration and VSIE.
      Signed-off-by: default avatarChristian Borntraeger <borntraeger@de.ibm.com>
      Reviewed-by: default avatarDavid Hildenbrand <david@redhat.com>
      Reviewed-by: default avatarCornelia Huck <cohuck@redhat.com>
      [Changed capability number to 152. - Radim]
      Signed-off-by: default avatarRadim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: default avatarMartin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ea5566fe
    • Heiko Carstens's avatar
      s390: enable CPU alternatives unconditionally · 37e79747
      Heiko Carstens authored
      [ Upstream commit 049a2c2d ]
      
      Remove the CPU_ALTERNATIVES config option and enable the code
      unconditionally. The config option was only added to avoid a conflict
      with the named saved segment support. Since that code is gone there is
      no reason to keep the CPU_ALTERNATIVES config option.
      
      Just enable it unconditionally to also reduce the number of config
      options and make it less likely that something breaks.
      Signed-off-by: default avatarHeiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: default avatarMartin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      37e79747
    • Vasily Gorbik's avatar
      s390: introduce CPU alternatives · b44533a0
      Vasily Gorbik authored
      [ Upstream commit 686140a1 ]
      
      Implement CPU alternatives, which allows to optionally patch newer
      instructions at runtime, based on CPU facilities availability.
      
      A new kernel boot parameter "noaltinstr" disables patching.
      
      Current implementation is derived from x86 alternatives. Although
      ideal instructions padding (when altinstr is longer then oldinstr)
      is added at compile time, and no oldinstr nops optimization has to be
      done at runtime. Also couple of compile time sanity checks are done:
      1. oldinstr and altinstr must be <= 254 bytes long,
      2. oldinstr and altinstr must not have an odd length.
      
      alternative(oldinstr, altinstr, facility);
      alternative_2(oldinstr, altinstr1, facility1, altinstr2, facility2);
      
      Both compile time and runtime padding consists of either 6/4/2 bytes nop
      or a jump (brcl) + 2 bytes nop filler if padding is longer then 6 bytes.
      
      .altinstructions and .altinstr_replacement sections are part of
      __init_begin : __init_end region and are freed after initialization.
      Signed-off-by: default avatarVasily Gorbik <gor@linux.vnet.ibm.com>
      Signed-off-by: default avatarMartin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b44533a0
    • Michael S. Tsirkin's avatar
      virtio_net: fix adding vids on big-endian · 55c80adf
      Michael S. Tsirkin authored
      
      [ Upstream commit d7fad4c8 ]
      
      Programming vids (adding or removing them) still passes
      guest-endian values in the DMA buffer. That's wrong
      if guest is big-endian and when virtio 1 is enabled.
      
      Note: this is on top of a previous patch:
      	virtio_net: split out ctrl buffer
      
      Fixes: 9465a7a6 ("virtio_net: enable v1.0 support")
      Signed-off-by: default avatarMichael S. Tsirkin <mst@redhat.com>
      Acked-by: default avatarJason Wang <jasowang@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      55c80adf
    • Michael S. Tsirkin's avatar
      virtio_net: split out ctrl buffer · d86aacaa
      Michael S. Tsirkin authored
      
      [ Upstream commit 12e57169 ]
      
      When sending control commands, virtio net sets up several buffers for
      DMA. The buffers are all part of the net device which means it's
      actually allocated by kvmalloc so it's in theory (on extreme memory
      pressure) possible to get a vmalloc'ed buffer which on some platforms
      means we can't DMA there.
      
      Fix up by moving the DMA buffers into a separate structure.
      Reported-by: default avatarMikulas Patocka <mpatocka@redhat.com>
      Suggested-by: default avatarEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: default avatarMichael S. Tsirkin <mst@redhat.com>
      Acked-by: default avatarJason Wang <jasowang@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d86aacaa
    • Ivan Khoronzhuk's avatar
      net: ethernet: ti: cpsw: fix tx vlan priority mapping · 16c36a2c
      Ivan Khoronzhuk authored
      
      [ Upstream commit 5e391dc5 ]
      
      The CPDMA_TX_PRIORITY_MAP in real is vlan pcp field priority mapping
      register and basically replaces vlan pcp field for tagged packets.
      So, set it to be 1:1 mapping. Otherwise, it will cause unexpected
      change of egress vlan tagged packets, like prio 2 -> prio 5.
      
      Fixes: e05107e6 ("net: ethernet: ti: cpsw: add multi queue support")
      Reviewed-by: default avatarGrygorii Strashko <grygorii.strashko@ti.com>
      Signed-off-by: default avatarIvan Khoronzhuk <ivan.khoronzhuk@linaro.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      16c36a2c
    • Cong Wang's avatar
      llc: fix NULL pointer deref for SOCK_ZAPPED · 7814c479
      Cong Wang authored
      
      [ Upstream commit 3a04ce71 ]
      
      For SOCK_ZAPPED socket, we don't need to care about llc->sap,
      so we should just skip these refcount functions in this case.
      
      Fixes: f7e43672 ("llc: hold llc_sap before release_sock()")
      Reported-by: default avatarkernel test robot <lkp@intel.com>
      Signed-off-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7814c479
    • Cong Wang's avatar
      llc: hold llc_sap before release_sock() · 543a6011
      Cong Wang authored
      
      [ Upstream commit f7e43672 ]
      
      syzbot reported we still access llc->sap in llc_backlog_rcv()
      after it is freed in llc_sap_remove_socket():
      
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x1b9/0x294 lib/dump_stack.c:113
       print_address_description+0x6c/0x20b mm/kasan/report.c:256
       kasan_report_error mm/kasan/report.c:354 [inline]
       kasan_report.cold.7+0x242/0x2fe mm/kasan/report.c:412
       __asan_report_load1_noabort+0x14/0x20 mm/kasan/report.c:430
       llc_conn_ac_send_sabme_cmd_p_set_x+0x3a8/0x460 net/llc/llc_c_ac.c:785
       llc_exec_conn_trans_actions net/llc/llc_conn.c:475 [inline]
       llc_conn_service net/llc/llc_conn.c:400 [inline]
       llc_conn_state_process+0x4e1/0x13a0 net/llc/llc_conn.c:75
       llc_backlog_rcv+0x195/0x1e0 net/llc/llc_conn.c:891
       sk_backlog_rcv include/net/sock.h:909 [inline]
       __release_sock+0x12f/0x3a0 net/core/sock.c:2335
       release_sock+0xa4/0x2b0 net/core/sock.c:2850
       llc_ui_release+0xc8/0x220 net/llc/af_llc.c:204
      
      llc->sap is refcount'ed and llc_sap_remove_socket() is paired
      with llc_sap_add_socket(). This can be amended by holding its refcount
      before llc_sap_remove_socket() and releasing it after release_sock().
      
      Reported-by: <syzbot+6e181fc95081c2cf9051@syzkaller.appspotmail.com>
      Signed-off-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      543a6011
    • Alexander Aring's avatar
      net: sched: ife: check on metadata length · 4c2c574c
      Alexander Aring authored
      
      [ Upstream commit d57493d6 ]
      
      This patch checks if sk buffer is available to dererence ife header. If
      not then NULL will returned to signal an malformed ife packet. This
      avoids to crashing the kernel from outside.
      Signed-off-by: default avatarAlexander Aring <aring@mojatatu.com>
      Reviewed-by: default avatarYotam Gigi <yotam.gi@gmail.com>
      Acked-by: default avatarJamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4c2c574c
    • Alexander Aring's avatar
      net: sched: ife: handle malformed tlv length · 388f3d97
      Alexander Aring authored
      
      [ Upstream commit cc74eddd ]
      
      There is currently no handling to check on a invalid tlv length. This
      patch adds such handling to avoid killing the kernel with a malformed
      ife packet.
      Signed-off-by: default avatarAlexander Aring <aring@mojatatu.com>
      Reviewed-by: default avatarYotam Gigi <yotam.gi@gmail.com>
      Acked-by: default avatarJamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      388f3d97
    • Soheil Hassas Yeganeh's avatar
      tcp: clear tp->packets_out when purging write queue · 75020d63
      Soheil Hassas Yeganeh authored
      
      Clear tp->packets_out when purging the write queue, otherwise
      tcp_rearm_rto() mistakenly assumes TCP write queue is not empty.
      This results in NULL pointer dereference.
      
      Also, remove the redundant `tp->packets_out = 0` from
      tcp_disconnect(), since tcp_disconnect() calls
      tcp_write_queue_purge().
      
      Fixes: a27fd7a8 (tcp: purge write queue upon RST)
      Reported-by: default avatarSubash Abhinov Kasiviswanathan <subashab@codeaurora.org>
      Reported-by: default avatarSami Farin <hvtaifwkbgefbaei@gmail.com>
      Tested-by: default avatarSami Farin <hvtaifwkbgefbaei@gmail.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarSoheil Hassas Yeganeh <soheil@google.com>
      Acked-by: default avatarYuchung Cheng <ycheng@google.com>
      Acked-by: default avatarNeal Cardwell <ncardwell@google.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      75020d63
    • Alexander Aring's avatar
      net: sched: ife: signal not finding metaid · da499024
      Alexander Aring authored
      
      [ Upstream commit f6cd1453 ]
      
      We need to record stats for received metadata that we dont know how
      to process. Have find_decode_metaid() return -ENOENT to capture this.
      Signed-off-by: default avatarAlexander Aring <aring@mojatatu.com>
      Reviewed-by: default avatarYotam Gigi <yotam.gi@gmail.com>
      Acked-by: default avatarJamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      da499024
    • Doron Roberts-Kedes's avatar
      strparser: Fix incorrect strp->need_bytes value. · 2f781ebf
      Doron Roberts-Kedes authored
      
      [ Upstream commit 9d0c75bf ]
      
      strp_data_ready resets strp->need_bytes to 0 if strp_peek_len indicates
      that the remainder of the message has been received. However,
      do_strp_work does not reset strp->need_bytes to 0. If do_strp_work
      completes a partial message, the value of strp->need_bytes will continue
      to reflect the needed bytes of the previous message, causing
      future invocations of strp_data_ready to return early if
      strp->need_bytes is less than strp_peek_len. Resetting strp->need_bytes
      to 0 in __strp_recv on handing a full message to the upper layer solves
      this problem.
      
      __strp_recv also calculates strp->need_bytes using stm->accum_len before
      stm->accum_len has been incremented by cand_len. This can cause
      strp->need_bytes to be equal to the full length of the message instead
      of the full length minus the accumulated length. This, in turn, causes
      strp_data_ready to return early, even when there is sufficient data to
      complete the partial message. Incrementing stm->accum_len before using
      it to calculate strp->need_bytes solves this problem.
      
      Found while testing net/tls_sw recv path.
      
      Fixes: 43a0c675 ("strparser: Stream parser for messages")
      Signed-off-by: default avatarDoron Roberts-Kedes <doronrk@fb.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2f781ebf
    • Tom Lendacky's avatar
      amd-xgbe: Only use the SFP supported transceiver signals · 109feb04
      Tom Lendacky authored
      
      [ Upstream commit 117df655 ]
      
      The SFP eeprom indicates the transceiver signals (Rx LOS, Tx Fault, etc.)
      that it supports.  Update the driver to include checking the eeprom data
      when deciding whether to use a transceiver signal.
      Signed-off-by: default avatarTom Lendacky <thomas.lendacky@amd.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      109feb04
    • Doron Roberts-Kedes's avatar
      strparser: Do not call mod_delayed_work with a timeout of LONG_MAX · 9a661231
      Doron Roberts-Kedes authored
      
      [ Upstream commit 7c5aba21 ]
      
      struct sock's sk_rcvtimeo is initialized to
      LONG_MAX/MAX_SCHEDULE_TIMEOUT in sock_init_data. Calling
      mod_delayed_work with a timeout of LONG_MAX causes spurious execution of
      the work function. timer->expires is set equal to jiffies + LONG_MAX.
      When timer_base->clk falls behind the current value of jiffies,
      the delta between timer_base->clk and jiffies + LONG_MAX causes the
      expiration to be in the past. Returning early from strp_start_timer if
      timeo == LONG_MAX solves this problem.
      
      Found while testing net/tls_sw recv path.
      
      Fixes: 43a0c675 ("strparser: Stream parser for messages")
      Reviewed-by: default avatarTejun Heo <tj@kernel.org>
      Signed-off-by: default avatarDoron Roberts-Kedes <doronrk@fb.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9a661231
    • Tom Lendacky's avatar
      amd-xgbe: Improve KR auto-negotiation and training · 346ba2f2
      Tom Lendacky authored
      
      [ Upstream commit 96f4d430 ]
      
      Update xgbe-phy-v2.c to make use of the auto-negotiation (AN) phy hooks
      to improve the ability to successfully complete Clause 73 AN when running
      at 10gbps.  Hardware can sometimes have issues with CDR lock when the
      AN DME page exchange is being performed.
      
      The AN and KR training hooks are used as follows:
      - The pre AN hook is used to disable CDR tracking in the PHY so that the
        DME page exchange can be successfully and consistently completed.
      - The post KR training hook is used to re-enable the CDR tracking so that
        KR training can successfully complete.
      - The post AN hook is used to check for an unsuccessful AN which will
        increase a CDR tracking enablement delay (up to a maximum value).
      
      Add two debugfs entries to allow control over use of the CDR tracking
      workaround.  The debugfs entries allow the CDR tracking workaround to
      be disabled and determine whether to re-enable CDR tracking before or
      after link training has been initiated.
      
      Also, with these changes the receiver reset cycle that is performed during
      the link status check can be performed less often.
      Signed-off-by: default avatarTom Lendacky <thomas.lendacky@amd.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      346ba2f2
    • Xin Long's avatar
      sctp: do not check port in sctp_inet6_cmp_addr · 29b623b6
      Xin Long authored
      
      [ Upstream commit 1071ec9d ]
      
      pf->cmp_addr() is called before binding a v6 address to the sock. It
      should not check ports, like in sctp_inet_cmp_addr.
      
      But sctp_inet6_cmp_addr checks the addr by invoking af(6)->cmp_addr,
      sctp_v6_cmp_addr where it also compares the ports.
      
      This would cause that setsockopt(SCTP_SOCKOPT_BINDX_ADD) could bind
      multiple duplicated IPv6 addresses after Commit 40b4f0fd ("sctp:
      lack the check for ports in sctp_v6_cmp_addr").
      
      This patch is to remove af->cmp_addr called in sctp_inet6_cmp_addr,
      but do the proper check for both v6 addrs and v4mapped addrs.
      
      v1->v2:
        - define __sctp_v6_cmp_addr to do the common address comparison
          used for both pf and af v6 cmp_addr.
      
      Fixes: 40b4f0fd ("sctp: lack the check for ports in sctp_v6_cmp_addr")
      Reported-by: default avatarJianwen Ji <jiji@redhat.com>
      Signed-off-by: default avatarXin Long <lucien.xin@gmail.com>
      Acked-by: default avatarNeil Horman <nhorman@tuxdriver.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      29b623b6
    • Tom Lendacky's avatar
      amd-xgbe: Add pre/post auto-negotiation phy hooks · f42036e8
      Tom Lendacky authored
      
      [ Upstream commit 4d945663 ]
      
      Add hooks to the driver auto-negotiation (AN) flow to allow the different
      phy implementations to perform any steps necessary to improve AN.
      Signed-off-by: default avatarTom Lendacky <thomas.lendacky@amd.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f42036e8
    • Toshiaki Makita's avatar
      vlan: Fix reading memory beyond skb->tail in skb_vlan_tagged_multi · dd997151
      Toshiaki Makita authored
      
      [ Upstream commit 7ce23672 ]
      
      Syzkaller spotted an old bug which leads to reading skb beyond tail by 4
      bytes on vlan tagged packets.
      This is caused because skb_vlan_tagged_multi() did not check
      skb_headlen.
      
      BUG: KMSAN: uninit-value in eth_type_vlan include/linux/if_vlan.h:283 [inline]
      BUG: KMSAN: uninit-value in skb_vlan_tagged_multi include/linux/if_vlan.h:656 [inline]
      BUG: KMSAN: uninit-value in vlan_features_check include/linux/if_vlan.h:672 [inline]
      BUG: KMSAN: uninit-value in dflt_features_check net/core/dev.c:2949 [inline]
      BUG: KMSAN: uninit-value in netif_skb_features+0xd1b/0xdc0 net/core/dev.c:3009
      CPU: 1 PID: 3582 Comm: syzkaller435149 Not tainted 4.16.0+ #82
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
        __dump_stack lib/dump_stack.c:17 [inline]
        dump_stack+0x185/0x1d0 lib/dump_stack.c:53
        kmsan_report+0x142/0x240 mm/kmsan/kmsan.c:1067
        __msan_warning_32+0x6c/0xb0 mm/kmsan/kmsan_instr.c:676
        eth_type_vlan include/linux/if_vlan.h:283 [inline]
        skb_vlan_tagged_multi include/linux/if_vlan.h:656 [inline]
        vlan_features_check include/linux/if_vlan.h:672 [inline]
        dflt_features_check net/core/dev.c:2949 [inline]
        netif_skb_features+0xd1b/0xdc0 net/core/dev.c:3009
        validate_xmit_skb+0x89/0x1320 net/core/dev.c:3084
        __dev_queue_xmit+0x1cb2/0x2b60 net/core/dev.c:3549
        dev_queue_xmit+0x4b/0x60 net/core/dev.c:3590
        packet_snd net/packet/af_packet.c:2944 [inline]
        packet_sendmsg+0x7c57/0x8a10 net/packet/af_packet.c:2969
        sock_sendmsg_nosec net/socket.c:630 [inline]
        sock_sendmsg net/socket.c:640 [inline]
        sock_write_iter+0x3b9/0x470 net/socket.c:909
        do_iter_readv_writev+0x7bb/0x970 include/linux/fs.h:1776
        do_iter_write+0x30d/0xd40 fs/read_write.c:932
        vfs_writev fs/read_write.c:977 [inline]
        do_writev+0x3c9/0x830 fs/read_write.c:1012
        SYSC_writev+0x9b/0xb0 fs/read_write.c:1085
        SyS_writev+0x56/0x80 fs/read_write.c:1082
        do_syscall_64+0x309/0x430 arch/x86/entry/common.c:287
        entry_SYSCALL_64_after_hwframe+0x3d/0xa2
      RIP: 0033:0x43ffa9
      RSP: 002b:00007fff2cff3948 EFLAGS: 00000217 ORIG_RAX: 0000000000000014
      RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 000000000043ffa9
      RDX: 0000000000000001 RSI: 0000000020000080 RDI: 0000000000000003
      RBP: 00000000006cb018 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000217 R12: 00000000004018d0
      R13: 0000000000401960 R14: 0000000000000000 R15: 0000000000000000
      
      Uninit was created at:
        kmsan_save_stack_with_flags mm/kmsan/kmsan.c:278 [inline]
        kmsan_internal_poison_shadow+0xb8/0x1b0 mm/kmsan/kmsan.c:188
        kmsan_kmalloc+0x94/0x100 mm/kmsan/kmsan.c:314
        kmsan_slab_alloc+0x11/0x20 mm/kmsan/kmsan.c:321
        slab_post_alloc_hook mm/slab.h:445 [inline]
        slab_alloc_node mm/slub.c:2737 [inline]
        __kmalloc_node_track_caller+0xaed/0x11c0 mm/slub.c:4369
        __kmalloc_reserve net/core/skbuff.c:138 [inline]
        __alloc_skb+0x2cf/0x9f0 net/core/skbuff.c:206
        alloc_skb include/linux/skbuff.h:984 [inline]
        alloc_skb_with_frags+0x1d4/0xb20 net/core/skbuff.c:5234
        sock_alloc_send_pskb+0xb56/0x1190 net/core/sock.c:2085
        packet_alloc_skb net/packet/af_packet.c:2803 [inline]
        packet_snd net/packet/af_packet.c:2894 [inline]
        packet_sendmsg+0x6444/0x8a10 net/packet/af_packet.c:2969
        sock_sendmsg_nosec net/socket.c:630 [inline]
        sock_sendmsg net/socket.c:640 [inline]
        sock_write_iter+0x3b9/0x470 net/socket.c:909
        do_iter_readv_writev+0x7bb/0x970 include/linux/fs.h:1776
        do_iter_write+0x30d/0xd40 fs/read_write.c:932
        vfs_writev fs/read_write.c:977 [inline]
        do_writev+0x3c9/0x830 fs/read_write.c:1012
        SYSC_writev+0x9b/0xb0 fs/read_write.c:1085
        SyS_writev+0x56/0x80 fs/read_write.c:1082
        do_syscall_64+0x309/0x430 arch/x86/entry/common.c:287
        entry_SYSCALL_64_after_hwframe+0x3d/0xa2
      
      Fixes: 58e998c6 ("offloading: Force software GSO for multiple vlan tags.")
      Reported-and-tested-by: syzbot+0bbe42c764feafa82c5a@syzkaller.appspotmail.com
      Signed-off-by: default avatarToshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      dd997151
    • Guillaume Nault's avatar
      pppoe: check sockaddr length in pppoe_connect() · 88b7895e
      Guillaume Nault authored
      
      [ Upstream commit a49e2f5d ]
      
      We must validate sockaddr_len, otherwise userspace can pass fewer data
      than we expect and we end up accessing invalid data.
      
      Fixes: 224cf5ad ("ppp: Move the PPP drivers")
      Reported-by: syzbot+4f03bdf92fdf9ef5ddab@syzkaller.appspotmail.com
      Signed-off-by: default avatarGuillaume Nault <g.nault@alphalink.fr>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      88b7895e
    • Eric Dumazet's avatar
      tipc: add policy for TIPC_NLA_NET_ADDR · ed2ba25f
      Eric Dumazet authored
      
      [ Upstream commit ec518f21 ]
      
      Before syzbot/KMSAN bites, add the missing policy for TIPC_NLA_NET_ADDR
      
      Fixes: 27c21416 ("tipc: add net set to new netlink api")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Jon Maloy <jon.maloy@ericsson.com>
      Cc: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ed2ba25f
    • Willem de Bruijn's avatar
      packet: fix bitfield update race · 6da813d7
      Willem de Bruijn authored
      
      [ Upstream commit a6361f0c ]
      
      Updates to the bitfields in struct packet_sock are not atomic.
      Serialize these read-modify-write cycles.
      
      Move po->running into a separate variable. Its writes are protected by
      po->bind_lock (except for one startup case at packet_create). Also
      replace a textual precondition warning with lockdep annotation.
      
      All others are set only in packet_setsockopt. Serialize these
      updates by holding the socket lock. Analogous to other field updates,
      also hold the lock when testing whether a ring is active (pg_vec).
      
      Fixes: 8dc41944 ("[PACKET]: Add optional checksum computation for recvmsg")
      Reported-by: default avatarDaeRyong Jeong <threeearcat@gmail.com>
      Reported-by: default avatarByoungyoung Lee <byoungyoung@purdue.edu>
      Signed-off-by: default avatarWillem de Bruijn <willemb@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6da813d7
    • Xin Long's avatar
      team: fix netconsole setup over team · 70a615d7
      Xin Long authored
      
      [ Upstream commit 9cf2f437 ]
      
      The same fix in Commit dbe17307 ("bridge: fix netconsole
      setup over bridge") is also needed for team driver.
      
      While at it, remove the unnecessary parameter *team from
      team_port_enable_netpoll().
      
      v1->v2:
        - fix it in a better way, as does bridge.
      
      Fixes: 0fb52a27 ("team: cleanup netpoll clode")
      Reported-by: default avatarJoão Avelino Bellomo Filho <jbellomo@redhat.com>
      Signed-off-by: default avatarXin Long <lucien.xin@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      70a615d7
    • Ursula Braun's avatar
      net/smc: fix shutdown in state SMC_LISTEN · 07cb0b54
      Ursula Braun authored
      
      [ Upstream commit 1255fcb2 ]
      
      Calling shutdown with SHUT_RD and SHUT_RDWR for a listening SMC socket
      crashes, because
         commit 127f4970 ("net/smc: release clcsock from tcp_listen_worker")
      releases the internal clcsock in smc_close_active() and sets smc->clcsock
      to NULL.
      For SHUT_RD the smc_close_active() call is removed.
      For SHUT_RDWR the kernel_sock_shutdown() call is omitted, since the
      clcsock is already released.
      
      Fixes: 127f4970 ("net/smc: release clcsock from tcp_listen_worker")
      Signed-off-by: default avatarUrsula Braun <ubraun@linux.vnet.ibm.com>
      Reported-by: default avatarStephen Hemminger <stephen@networkplumber.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      07cb0b54
    • Paolo Abeni's avatar
      team: avoid adding twice the same option to the event list · 7b4f4d75
      Paolo Abeni authored
      
      [ Upstream commit 4fb0534f ]
      
      When parsing the options provided by the user space,
      team_nl_cmd_options_set() insert them in a temporary list to send
      multiple events with a single message.
      While each option's attribute is correctly validated, the code does
      not check for duplicate entries before inserting into the event
      list.
      
      Exploiting the above, the syzbot was able to trigger the following
      splat:
      
      kernel BUG at lib/list_debug.c:31!
      invalid opcode: 0000 [#1] SMP KASAN
      Dumping ftrace buffer:
          (ftrace buffer empty)
      Modules linked in:
      CPU: 0 PID: 4466 Comm: syzkaller556835 Not tainted 4.16.0+ #17
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
      Google 01/01/2011
      RIP: 0010:__list_add_valid+0xaa/0xb0 lib/list_debug.c:29
      RSP: 0018:ffff8801b04bf248 EFLAGS: 00010286
      RAX: 0000000000000058 RBX: ffff8801c8fc7a90 RCX: 0000000000000000
      RDX: 0000000000000058 RSI: ffffffff815fbf41 RDI: ffffed0036097e3f
      RBP: ffff8801b04bf260 R08: ffff8801b0b2a700 R09: ffffed003b604f90
      R10: ffffed003b604f90 R11: ffff8801db027c87 R12: ffff8801c8fc7a90
      R13: ffff8801c8fc7a90 R14: dffffc0000000000 R15: 0000000000000000
      FS:  0000000000b98880(0000) GS:ffff8801db000000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 000000000043fc30 CR3: 00000001afe8e000 CR4: 00000000001406f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
        __list_add include/linux/list.h:60 [inline]
        list_add include/linux/list.h:79 [inline]
        team_nl_cmd_options_set+0x9ff/0x12b0 drivers/net/team/team.c:2571
        genl_family_rcv_msg+0x889/0x1120 net/netlink/genetlink.c:599
        genl_rcv_msg+0xc6/0x170 net/netlink/genetlink.c:624
        netlink_rcv_skb+0x172/0x440 net/netlink/af_netlink.c:2448
        genl_rcv+0x28/0x40 net/netlink/genetlink.c:635
        netlink_unicast_kernel net/netlink/af_netlink.c:1310 [inline]
        netlink_unicast+0x58b/0x740 net/netlink/af_netlink.c:1336
        netlink_sendmsg+0x9f0/0xfa0 net/netlink/af_netlink.c:1901
        sock_sendmsg_nosec net/socket.c:629 [inline]
        sock_sendmsg+0xd5/0x120 net/socket.c:639
        ___sys_sendmsg+0x805/0x940 net/socket.c:2117
        __sys_sendmsg+0x115/0x270 net/socket.c:2155
        SYSC_sendmsg net/socket.c:2164 [inline]
        SyS_sendmsg+0x29/0x30 net/socket.c:2162
        do_syscall_64+0x29e/0x9d0 arch/x86/entry/common.c:287
        entry_SYSCALL_64_after_hwframe+0x42/0xb7
      RIP: 0033:0x4458b9
      RSP: 002b:00007ffd1d4a7278 EFLAGS: 00000213 ORIG_RAX: 000000000000002e
      RAX: ffffffffffffffda RBX: 000000000000001b RCX: 00000000004458b9
      RDX: 0000000000000010 RSI: 0000000020000d00 RDI: 0000000000000004
      RBP: 00000000004a74ed R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000213 R12: 00007ffd1d4a7348
      R13: 0000000000402a60 R14: 0000000000000000 R15: 0000000000000000
      Code: 75 e8 eb a9 48 89 f7 48 89 75 e8 e8 d1 85 7b fe 48 8b 75 e8 eb bb 48
      89 f2 48 89 d9 4c 89 e6 48 c7 c7 a0 84 d8 87 e8 ea 67 28 fe <0f> 0b 0f 1f
      40 00 48 b8 00 00 00 00 00 fc ff df 55 48 89 e5 41
      RIP: __list_add_valid+0xaa/0xb0 lib/list_debug.c:29 RSP: ffff8801b04bf248
      
      This changeset addresses the avoiding list_add() if the current
      option is already present in the event list.
      
      Reported-and-tested-by: syzbot+4d4af685432dc0e56c91@syzkaller.appspotmail.com
      Signed-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      Fixes: 2fcdb2c9 ("team: allow to send multiple set events in one message")
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7b4f4d75
    • Wolfgang Bumiller's avatar
      net: fix deadlock while clearing neighbor proxy table · 012e5e5b
      Wolfgang Bumiller authored
      
      [ Upstream commit 53b76cdf ]
      
      When coming from ndisc_netdev_event() in net/ipv6/ndisc.c,
      neigh_ifdown() is called with &nd_tbl, locking this while
      clearing the proxy neighbor entries when eg. deleting an
      interface. Calling the table's pndisc_destructor() with the
      lock still held, however, can cause a deadlock: When a
      multicast listener is available an IGMP packet of type
      ICMPV6_MGM_REDUCTION may be sent out. When reaching
      ip6_finish_output2(), if no neighbor entry for the target
      address is found, __neigh_create() is called with &nd_tbl,
      which it'll want to lock.
      
      Move the elements into their own list, then unlock the table
      and perform the destruction.
      
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=199289
      Fixes: 6fd6ce20 ("ipv6: Do not depend on rt->n in ip6_finish_output2().")
      Signed-off-by: default avatarWolfgang Bumiller <w.bumiller@proxmox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      012e5e5b
    • Eric Dumazet's avatar
      tcp: md5: reject TCP_MD5SIG or TCP_MD5SIG_EXT on established sockets · d5387e66
      Eric Dumazet authored
      
      [ Upstream commit 72123032 ]
      
      syzbot/KMSAN reported an uninit-value in tcp_parse_options() [1]
      
      I believe this was caused by a TCP_MD5SIG being set on live
      flow.
      
      This is highly unexpected, since TCP option space is limited.
      
      For instance, presence of TCP MD5 option automatically disables
      TCP TimeStamp option at SYN/SYNACK time, which we can not do
      once flow has been established.
      
      Really, adding/deleting an MD5 key only makes sense on sockets
      in CLOSE or LISTEN state.
      
      [1]
      BUG: KMSAN: uninit-value in tcp_parse_options+0xd74/0x1a30 net/ipv4/tcp_input.c:3720
      CPU: 1 PID: 6177 Comm: syzkaller192004 Not tainted 4.16.0+ #83
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:17 [inline]
       dump_stack+0x185/0x1d0 lib/dump_stack.c:53
       kmsan_report+0x142/0x240 mm/kmsan/kmsan.c:1067
       __msan_warning_32+0x6c/0xb0 mm/kmsan/kmsan_instr.c:676
       tcp_parse_options+0xd74/0x1a30 net/ipv4/tcp_input.c:3720
       tcp_fast_parse_options net/ipv4/tcp_input.c:3858 [inline]
       tcp_validate_incoming+0x4f1/0x2790 net/ipv4/tcp_input.c:5184
       tcp_rcv_established+0xf60/0x2bb0 net/ipv4/tcp_input.c:5453
       tcp_v4_do_rcv+0x6cd/0xd90 net/ipv4/tcp_ipv4.c:1469
       sk_backlog_rcv include/net/sock.h:908 [inline]
       __release_sock+0x2d6/0x680 net/core/sock.c:2271
       release_sock+0x97/0x2a0 net/core/sock.c:2786
       tcp_sendmsg+0xd6/0x100 net/ipv4/tcp.c:1464
       inet_sendmsg+0x48d/0x740 net/ipv4/af_inet.c:764
       sock_sendmsg_nosec net/socket.c:630 [inline]
       sock_sendmsg net/socket.c:640 [inline]
       SYSC_sendto+0x6c3/0x7e0 net/socket.c:1747
       SyS_sendto+0x8a/0xb0 net/socket.c:1715
       do_syscall_64+0x309/0x430 arch/x86/entry/common.c:287
       entry_SYSCALL_64_after_hwframe+0x3d/0xa2
      RIP: 0033:0x448fe9
      RSP: 002b:00007fd472c64d38 EFLAGS: 00000216 ORIG_RAX: 000000000000002c
      RAX: ffffffffffffffda RBX: 00000000006e5a30 RCX: 0000000000448fe9
      RDX: 000000000000029f RSI: 0000000020a88f88 RDI: 0000000000000004
      RBP: 00000000006e5a34 R08: 0000000020e68000 R09: 0000000000000010
      R10: 00000000200007fd R11: 0000000000000216 R12: 0000000000000000
      R13: 00007fff074899ef R14: 00007fd472c659c0 R15: 0000000000000009
      
      Uninit was created at:
       kmsan_save_stack_with_flags mm/kmsan/kmsan.c:278 [inline]
       kmsan_internal_poison_shadow+0xb8/0x1b0 mm/kmsan/kmsan.c:188
       kmsan_kmalloc+0x94/0x100 mm/kmsan/kmsan.c:314
       kmsan_slab_alloc+0x11/0x20 mm/kmsan/kmsan.c:321
       slab_post_alloc_hook mm/slab.h:445 [inline]
       slab_alloc_node mm/slub.c:2737 [inline]
       __kmalloc_node_track_caller+0xaed/0x11c0 mm/slub.c:4369
       __kmalloc_reserve net/core/skbuff.c:138 [inline]
       __alloc_skb+0x2cf/0x9f0 net/core/skbuff.c:206
       alloc_skb include/linux/skbuff.h:984 [inline]
       tcp_send_ack+0x18c/0x910 net/ipv4/tcp_output.c:3624
       __tcp_ack_snd_check net/ipv4/tcp_input.c:5040 [inline]
       tcp_ack_snd_check net/ipv4/tcp_input.c:5053 [inline]
       tcp_rcv_established+0x2103/0x2bb0 net/ipv4/tcp_input.c:5469
       tcp_v4_do_rcv+0x6cd/0xd90 net/ipv4/tcp_ipv4.c:1469
       sk_backlog_rcv include/net/sock.h:908 [inline]
       __release_sock+0x2d6/0x680 net/core/sock.c:2271
       release_sock+0x97/0x2a0 net/core/sock.c:2786
       tcp_sendmsg+0xd6/0x100 net/ipv4/tcp.c:1464
       inet_sendmsg+0x48d/0x740 net/ipv4/af_inet.c:764
       sock_sendmsg_nosec net/socket.c:630 [inline]
       sock_sendmsg net/socket.c:640 [inline]
       SYSC_sendto+0x6c3/0x7e0 net/socket.c:1747
       SyS_sendto+0x8a/0xb0 net/socket.c:1715
       do_syscall_64+0x309/0x430 arch/x86/entry/common.c:287
       entry_SYSCALL_64_after_hwframe+0x3d/0xa2
      
      Fixes: cfb6eeb4 ("[TCP]: MD5 Signature Option (RFC2385) support.")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Acked-by: default avatarYuchung Cheng <ycheng@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d5387e66
    • Eric Dumazet's avatar
      net: af_packet: fix race in PACKET_{R|T}X_RING · 7c235252
      Eric Dumazet authored
      
      [ Upstream commit 5171b37d ]
      
      In order to remove the race caught by syzbot [1], we need
      to lock the socket before using po->tp_version as this could
      change under us otherwise.
      
      This means lock_sock() and release_sock() must be done by
      packet_set_ring() callers.
      
      [1] :
      BUG: KMSAN: uninit-value in packet_set_ring+0x1254/0x3870 net/packet/af_packet.c:4249
      CPU: 0 PID: 20195 Comm: syzkaller707632 Not tainted 4.16.0+ #83
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:17 [inline]
       dump_stack+0x185/0x1d0 lib/dump_stack.c:53
       kmsan_report+0x142/0x240 mm/kmsan/kmsan.c:1067
       __msan_warning_32+0x6c/0xb0 mm/kmsan/kmsan_instr.c:676
       packet_set_ring+0x1254/0x3870 net/packet/af_packet.c:4249
       packet_setsockopt+0x12c6/0x5a90 net/packet/af_packet.c:3662
       SYSC_setsockopt+0x4b8/0x570 net/socket.c:1849
       SyS_setsockopt+0x76/0xa0 net/socket.c:1828
       do_syscall_64+0x309/0x430 arch/x86/entry/common.c:287
       entry_SYSCALL_64_after_hwframe+0x3d/0xa2
      RIP: 0033:0x449099
      RSP: 002b:00007f42b5307ce8 EFLAGS: 00000246 ORIG_RAX: 0000000000000036
      RAX: ffffffffffffffda RBX: 000000000070003c RCX: 0000000000449099
      RDX: 0000000000000005 RSI: 0000000000000107 RDI: 0000000000000003
      RBP: 0000000000700038 R08: 000000000000001c R09: 0000000000000000
      R10: 00000000200000c0 R11: 0000000000000246 R12: 0000000000000000
      R13: 000000000080eecf R14: 00007f42b53089c0 R15: 0000000000000001
      
      Local variable description: ----req_u@packet_setsockopt
      Variable was created at:
       packet_setsockopt+0x13f/0x5a90 net/packet/af_packet.c:3612
       SYSC_setsockopt+0x4b8/0x570 net/socket.c:1849
      
      Fixes: f6fb8f10 ("af-packet: TPACKET_V3 flexible buffer implementation.")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7c235252
    • Jann Horn's avatar
      tcp: don't read out-of-bounds opsize · b76d3f33
      Jann Horn authored
      
      [ Upstream commit 7e5a206a ]
      
      The old code reads the "opsize" variable from out-of-bounds memory (first
      byte behind the segment) if a broken TCP segment ends directly after an
      opcode that is neither EOL nor NOP.
      
      The result of the read isn't used for anything, so the worst thing that
      could theoretically happen is a pagefault; and since the physmap is usually
      mostly contiguous, even that seems pretty unlikely.
      
      The following C reproducer triggers the uninitialized read - however, you
      can't actually see anything happen unless you put something like a
      pr_warn() in tcp_parse_md5sig_option() to print the opsize.
      
      ====================================
      #define _GNU_SOURCE
      #include <arpa/inet.h>
      #include <stdlib.h>
      #include <errno.h>
      #include <stdarg.h>
      #include <net/if.h>
      #include <linux/if.h>
      #include <linux/ip.h>
      #include <linux/tcp.h>
      #include <linux/in.h>
      #include <linux/if_tun.h>
      #include <err.h>
      #include <sys/types.h>
      #include <sys/stat.h>
      #include <fcntl.h>
      #include <string.h>
      #include <stdio.h>
      #include <unistd.h>
      #include <sys/ioctl.h>
      #include <assert.h>
      
      void systemf(const char *command, ...) {
        char *full_command;
        va_list ap;
        va_start(ap, command);
        if (vasprintf(&full_command, command, ap) == -1)
          err(1, "vasprintf");
        va_end(ap);
        printf("systemf: <<<%s>>>\n", full_command);
        system(full_command);
      }
      
      char *devname;
      
      int tun_alloc(char *name) {
        int fd = open("/dev/net/tun", O_RDWR);
        if (fd == -1)
          err(1, "open tun dev");
        static struct ifreq req = { .ifr_flags = IFF_TUN|IFF_NO_PI };
        strcpy(req.ifr_name, name);
        if (ioctl(fd, TUNSETIFF, &req))
          err(1, "TUNSETIFF");
        devname = req.ifr_name;
        printf("device name: %s\n", devname);
        return fd;
      }
      
      #define IPADDR(a,b,c,d) (((a)<<0)+((b)<<8)+((c)<<16)+((d)<<24))
      
      void sum_accumulate(unsigned int *sum, void *data, int len) {
        assert((len&2)==0);
        for (int i=0; i<len/2; i++) {
          *sum += ntohs(((unsigned short *)data)[i]);
        }
      }
      
      unsigned short sum_final(unsigned int sum) {
        sum = (sum >> 16) + (sum & 0xffff);
        sum = (sum >> 16) + (sum & 0xffff);
        return htons(~sum);
      }
      
      void fix_ip_sum(struct iphdr *ip) {
        unsigned int sum = 0;
        sum_accumulate(&sum, ip, sizeof(*ip));
        ip->check = sum_final(sum);
      }
      
      void fix_tcp_sum(struct iphdr *ip, struct tcphdr *tcp) {
        unsigned int sum = 0;
        struct {
          unsigned int saddr;
          unsigned int daddr;
          unsigned char pad;
          unsigned char proto_num;
          unsigned short tcp_len;
        } fakehdr = {
          .saddr = ip->saddr,
          .daddr = ip->daddr,
          .proto_num = ip->protocol,
          .tcp_len = htons(ntohs(ip->tot_len) - ip->ihl*4)
        };
        sum_accumulate(&sum, &fakehdr, sizeof(fakehdr));
        sum_accumulate(&sum, tcp, tcp->doff*4);
        tcp->check = sum_final(sum);
      }
      
      int main(void) {
        int tun_fd = tun_alloc("inject_dev%d");
        systemf("ip link set %s up", devname);
        systemf("ip addr add 192.168.42.1/24 dev %s", devname);
      
        struct {
          struct iphdr ip;
          struct tcphdr tcp;
          unsigned char tcp_opts[20];
        } __attribute__((packed)) syn_packet = {
          .ip = {
            .ihl = sizeof(struct iphdr)/4,
            .version = 4,
            .tot_len = htons(sizeof(syn_packet)),
            .ttl = 30,
            .protocol = IPPROTO_TCP,
            /* FIXUP check */
            .saddr = IPADDR(192,168,42,2),
            .daddr = IPADDR(192,168,42,1)
          },
          .tcp = {
            .source = htons(1),
            .dest = htons(1337),
            .seq = 0x12345678,
            .doff = (sizeof(syn_packet.tcp)+sizeof(syn_packet.tcp_opts))/4,
            .syn = 1,
            .window = htons(64),
            .check = 0 /*FIXUP*/
          },
          .tcp_opts = {
            /* INVALID: trailing MD5SIG opcode after NOPs */
            1, 1, 1, 1, 1,
            1, 1, 1, 1, 1,
            1, 1, 1, 1, 1,
            1, 1, 1, 1, 19
          }
        };
        fix_ip_sum(&syn_packet.ip);
        fix_tcp_sum(&syn_packet.ip, &syn_packet.tcp);
        while (1) {
          int write_res = write(tun_fd, &syn_packet, sizeof(syn_packet));
          if (write_res != sizeof(syn_packet))
            err(1, "packet write failed");
        }
      }
      ====================================
      
      Fixes: cfb6eeb4 ("[TCP]: MD5 Signature Option (RFC2385) support.")
      Signed-off-by: default avatarJann Horn <jannh@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b76d3f33