1. 03 May, 2017 40 commits
    • macsec: avoid heap overflow in skb_to_sgvec · 43a35e67
      Jason A. Donenfeld authored
      commit 4d6fa57b upstream.
      
      While this may appear as a humdrum one line change, it's actually quite
      important. An sk_buff stores data in three places:
      
      1. A linear chunk of allocated memory in skb->data. This is the easiest
         one to work with, but it precludes using scatterdata since the memory
         must be linear.
      2. The array skb_shinfo(skb)->frags, which is of maximum length
         MAX_SKB_FRAGS. This is nice for scattergather, since these fragments
         can point to different pages.
      3. skb_shinfo(skb)->frag_list, which is a pointer to another sk_buff,
         which in turn can have data in either (1) or (2).
      
      The first two are rather easy to deal with, since they're of a fixed
      maximum length, while the third one is not, since there can be
      potentially limitless chains of fragments. Fortunately dealing with
      frag_list is opt-in for drivers, so drivers don't actually have to deal
      with this mess. For whatever reason, macsec decided it wanted pain, and
      so it explicitly specified NETIF_F_FRAGLIST.
      
      Because dealing with (1), (2), and (3) is insane, most users of sk_buff
      doing any sort of crypto or paging operation call a convenient function
      called skb_to_sgvec (which happens to be recursive if (3) is in use!).
      This takes a sk_buff as input, and writes into its output pointer an
      array of scattergather list items. Sometimes people like to declare a
      fixed size scattergather list on the stack; other times people like to
      allocate a fixed size scattergather list on the heap. However, if you're
      doing it in a fixed-size fashion, you really shouldn't be using
      NETIF_F_FRAGLIST too (unless you're also ensuring the sk_buff and its
      frag_list children aren't shared and then you check the number of
      fragments in total required).
      
      Macsec specifically does this:
      
              size += sizeof(struct scatterlist) * (MAX_SKB_FRAGS + 1);
              tmp = kmalloc(size, GFP_ATOMIC);
              *sg = (struct scatterlist *)(tmp + sg_offset);
      	...
              sg_init_table(sg, MAX_SKB_FRAGS + 1);
              skb_to_sgvec(skb, sg, 0, skb->len);
      
      Specifying MAX_SKB_FRAGS + 1 is the right answer usually, but not if you're
      using NETIF_F_FRAGLIST, in which case the call to skb_to_sgvec will
      overflow the heap, and disaster ensues.
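
The arithmetic behind the overflow can be sketched in user space. This is a simplified model, not kernel code: `toy_skb`, `sg_entries_needed` and `fits_fixed_table` are hypothetical names, the value of MAX_SKB_FRAGS is only illustrative, and the count assumes one scatterlist entry for the linear area plus one per page fragment, recursing into frag_list.

```c
#include <stddef.h>

#define MAX_SKB_FRAGS 17 /* illustrative; the real value depends on the page size */

/* Hypothetical, heavily simplified stand-in for struct sk_buff. */
struct toy_skb {
    size_t linear_len;          /* (1) bytes in the linear area */
    unsigned int nr_frags;      /* (2) page fragments, at most MAX_SKB_FRAGS */
    struct toy_skb *frag_list;  /* (3) first chained skb, or NULL */
    struct toy_skb *next;       /* next sibling inside a frag_list chain */
};

/* Count the scatterlist entries a full walk of the skb would write. */
static unsigned int sg_entries_needed(const struct toy_skb *skb)
{
    unsigned int n = (skb->linear_len ? 1 : 0) + skb->nr_frags;
    const struct toy_skb *child;

    /* Every skb on the frag_list contributes its own (1) and (2). */
    for (child = skb->frag_list; child; child = child->next)
        n += sg_entries_needed(child);
    return n;
}

/* Nonzero if a table of MAX_SKB_FRAGS + 1 entries is large enough. */
static int fits_fixed_table(const struct toy_skb *skb)
{
    return sg_entries_needed(skb) <= MAX_SKB_FRAGS + 1;
}
```

In this model a single skb always fits, while one full child on the frag_list already doubles the number of entries needed.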
      
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ceph: fix recursion between ceph_set_acl() and __ceph_setattr() · e4720b00
      Yan, Zheng authored
      commit 8179a101 upstream.
      
      ceph_set_acl() calls __ceph_setattr() if the setacl operation needs
      to modify inode's i_mode. __ceph_setattr() updates inode's i_mode,
      then calls posix_acl_chmod().
      
      The problem is that __ceph_setattr() calls posix_acl_chmod() before
      sending the setattr request. The get_acl() call in posix_acl_chmod()
      can trigger a getxattr request. The reply of the getxattr request
      can restore inode's i_mode to its old value. The set_acl() call in
      posix_acl_chmod() sees old value of inode's i_mode, so it calls
      __ceph_setattr() again.
      
      Link: http://tracker.ceph.com/issues/19688
      Reported-by: Jerry Lee <leisurelysw24@gmail.com>
      Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
      Reviewed-by: Jeff Layton <jlayton@redhat.com>
      Tested-by: Luis Henriques <lhenriques@suse.com>
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • nfsd: stricter decoding of write-like NFSv2/v3 ops · 43e36037
      J. Bruce Fields authored
      commit 13bf9fbf upstream.
      
      The NFSv2/v3 code does not systematically check whether we decode past
      the end of the buffer.  This generally appears to be harmless, but there
      are a few places where we do arithmetic on the pointers involved and
      don't account for the possibility that a length could be negative.  Add
      checks to catch these.
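
As an illustration of the kind of check described above, here is a minimal user-space sketch of decoding a length-prefixed XDR opaque field without trusting the length. It is not the nfsd code: `xdr_word` and `decode_opaque` are hypothetical helpers, and the comparison is done in unsigned arithmetic so an oversized length cannot turn into a negative value.

```c
#include <stddef.h>
#include <stdint.h>

/* Read a big-endian 32-bit XDR word. */
static uint32_t xdr_word(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Hypothetical helper: decode a length-prefixed opaque field.
 * Returns a pointer just past the padded field, or NULL if the length
 * word or the data it describes would run past `end`.
 */
static const unsigned char *decode_opaque(const unsigned char *p,
                                          const unsigned char *end,
                                          const unsigned char **data,
                                          uint32_t *len)
{
    size_t avail, padded;

    if (p > end || (size_t)(end - p) < 4)
        return NULL;
    *len = xdr_word(p);
    p += 4;
    avail = (size_t)(end - p);

    /* Unsigned comparison: a huge length can never look negative. */
    if (*len > avail)
        return NULL;
    /* XDR pads opaque data to a multiple of four bytes. */
    padded = ((size_t)*len + 3) & ~(size_t)3;
    if (padded > avail)
        return NULL;

    *data = p;
    return p + padded;
}
```

A caller would treat a NULL return as a garbled request and stop decoding.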
      
      Reported-by: Tuomas Haanpää <thaan@synopsys.com>
      Reported-by: Ari Kauppi <ari@synopsys.com>
      Reviewed-by: NeilBrown <neilb@suse.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • nfsd4: minor NFSv2/v3 write decoding cleanup · 144180dc
      J. Bruce Fields authored
      commit db44bac4 upstream.
      
      Use a couple shortcuts that will simplify a following bugfix.
      
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • nfsd: check for oversized NFSv2/v3 arguments · 86eb1d0a
      J. Bruce Fields authored
      commit e6838a29 upstream.
      
      A client can append random data to the end of an NFSv2 or NFSv3 RPC call
      without our complaining; we'll just stop parsing at the end of the
      expected data and ignore the rest.
      
      Encoded arguments and replies are stored together in an array of pages,
      and if a call is too large it could leave inadequate space for the
      reply.  This is normally OK because NFS RPCs typically have either
      short arguments and long replies (like READ) or long arguments and short
      replies (like WRITE).  But a client that sends an incorrectly long call
      can violate those assumptions.  This was observed to cause crashes.
      
      Also, several operations increment rq_next_page in the decode routine
      before checking the argument size, which can leave rq_next_page pointing
      well past the end of the page array, causing trouble later in
      svc_free_pages.
      
      So, following a suggestion from Neil Brown, add a central check to
      enforce our expectation that no NFSv2/v3 call has both a large call and
      a large reply.
      
      As followup we may also want to rewrite the encoding routines to check
      more carefully that they aren't running off the end of the page array.
      
      We may also consider rejecting calls that have any extra garbage
      appended.  That would be safer, and within our rights by spec, but given
      the age of our server and the NFS protocol, and the fact that we've
      never enforced this before, we may need to balance that against the
      possibility of breaking some oddball client.
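
The "no call has both a large call and a large reply" rule can be sketched as a single predicate. This is an assumption-laden illustration, not the nfsd implementation: `toy_proc`, `request_too_big` and the one-page threshold `TOY_PAGE_SIZE` are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_PAGE_SIZE 4096u /* assumed boundary between "small" and "large" */

/* Hypothetical per-procedure description. */
struct toy_proc {
    size_t max_reply_bytes; /* upper bound on the encoded reply; 0 = unknown */
};

/* Central check: reject a call only if both call and reply may be large. */
static bool request_too_big(size_t arg_len, const struct toy_proc *proc)
{
    /* Known-small reply: a long call (like WRITE) is fine. */
    if (proc->max_reply_bytes > 0 && proc->max_reply_bytes < TOY_PAGE_SIZE)
        return false;
    /* The reply may be large (like READ), so the call itself must be short. */
    return arg_len > TOY_PAGE_SIZE;
}
```

Placing the check in one central spot means the individual decode routines do not each have to know how much room the reply will need.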
      
      Reported-by: Tuomas Haanpää <thaan@synopsys.com>
      Reported-by: Ari Kauppi <ari@synopsys.com>
      Reviewed-by: NeilBrown <neilb@suse.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Input: i8042 - add Clevo P650RS to the i8042 reset list · b98d12a1
      Dmitry Torokhov authored
      commit 7c5bb4ac upstream.
      
      Clevo P650RS and other similar devices require i8042 to be reset in order
      to detect Synaptics touchpad.
      
      Reported-by: Paweł Bylica <chfast@gmail.com>
      Tested-by: Ed Bordin <edbordin@gmail.com>
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=190301
      Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ASoC: intel: Fix PM and non-atomic crash in bytcr drivers · 2f680d46
      Takashi Iwai authored
      commit 6e4cac23 upstream.
      
      The FE setups of the Intel SST bytcr_rt5640 and bytcr_rt5651 drivers carry
      the ignore_suspend flag, and this prevents suspend/resume from working
      properly while a stream is running, since the SST core code checks
      for running streams and returns -EBUSY.  Drop these
      superfluous flags to fix the behavior.
      
      Also, the bytcr_rt5640 driver lacks the nonatomic flag in some FE
      definitions, which leads to a kernel Oops at suspend/resume like:
      
        BUG: scheduling while atomic: systemd-sleep/3144/0x00000003
        Call Trace:
         dump_stack+0x5c/0x7a
         __schedule_bug+0x55/0x70
         __schedule+0x63c/0x8c0
         schedule+0x3d/0x90
         schedule_timeout+0x16b/0x320
         ? del_timer_sync+0x50/0x50
         ? sst_wait_timeout+0xa9/0x170 [snd_intel_sst_core]
         ? sst_wait_timeout+0xa9/0x170 [snd_intel_sst_core]
         ? remove_wait_queue+0x60/0x60
         ? sst_prepare_and_post_msg+0x275/0x960 [snd_intel_sst_core]
         ? sst_pause_stream+0x9b/0x110 [snd_intel_sst_core]
         ....
      
      This patch addresses these appropriately, too.
      
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      Acked-by: Vinod Koul <vinod.koul@intel.com>
      Signed-off-by: Mark Brown <broonie@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • p9_client_readdir() fix · bec07492
      Al Viro authored
      commit 71d6ad08 upstream.
      
      Don't assume that server is sane and won't return more data than
      asked for.
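
The defensive pattern can be sketched as a clamp applied before any copy. This is only an illustration under stated assumptions: `copy_reply` is a hypothetical helper, not the 9p client code, and `requested` stands for the size the client asked for.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy a server reply into the caller's buffer
 * without trusting the count the server reports.
 */
static size_t copy_reply(char *dst, size_t requested,
                         const char *reply, size_t reported)
{
    /* Never copy more than was asked for, whatever the server claims. */
    size_t n = reported > requested ? requested : reported;

    memcpy(dst, reply, n);
    return n;
}
```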
      
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • MIPS: Avoid BUG warning in arch_check_elf · 67355b67
      James Cowgill authored
      commit c46f59e9 upstream.
      
      arch_check_elf contains a usage of current_cpu_data that will call
      smp_processor_id() with preemption enabled and therefore triggers a
      "BUG: using smp_processor_id() in preemptible" warning when an fpxx
      executable is loaded.
      
      As a follow-up to commit b244614a ("MIPS: Avoid a BUG warning during
      prctl(PR_SET_FP_MODE, ...)"), apply the same fix to arch_check_elf by
      using raw_current_cpu_data instead. The rationale quoted from the previous
      commit:
      
      "It is assumed throughout the kernel that if any CPU has an FPU, then
      all CPUs would have an FPU as well, so it is safe to perform the check
      with preemption enabled - change the code to use raw_ variant of the
      check to avoid the warning."
      
      Fixes: 46490b57 ("MIPS: kernel: elf: Improve the overall ABI and FPU mode checks")
      Signed-off-by: James Cowgill <James.Cowgill@imgtec.com>
      Cc: linux-mips@linux-mips.org
      Patchwork: https://patchwork.linux-mips.org/patch/15951/
      Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • MIPS: cevt-r4k: Fix out-of-bounds array access · 7cb5877d
      James Hogan authored
      commit 9d7f29cd upstream.
      
      calculate_min_delta() may incorrectly access a 4th element of buf2[]
      which only has 3 elements. This may trigger undefined behaviour and has
      been reported to cause strange crashes in start_kernel() sometime after
      timer initialization when built with GCC 5.3, possibly due to
      register/stack corruption:
      
      sched_clock: 32 bits at 200MHz, resolution 5ns, wraps every 10737418237ns
      CPU 0 Unable to handle kernel paging request at virtual address ffffb0aa, epc == 8067daa8, ra == 8067da84
      Oops[#1]:
      CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.9.18 #51
      task: 8065e3e0 task.stack: 80644000
      $ 0   : 00000000 00000001 00000000 00000000
      $ 4   : 8065b4d0 00000000 805d0000 00000010
      $ 8   : 00000010 80321400 fffff000 812de408
      $12   : 00000000 00000000 00000000 ffffffff
      $16   : 00000002 ffffffff 80660000 806a666c
      $20   : 806c0000 00000000 00000000 00000000
      $24   : 00000000 00000010
      $28   : 80644000 80645ed0 00000000 8067da84
      Hi    : 00000000
      Lo    : 00000000
      epc   : 8067daa8 start_kernel+0x33c/0x500
      ra    : 8067da84 start_kernel+0x318/0x500
      Status: 11000402 KERNEL EXL
      Cause : 4080040c (ExcCode 03)
      BadVA : ffffb0aa
      PrId  : 0501992c (MIPS 1004Kc)
      Modules linked in:
      Process swapper/0 (pid: 0, threadinfo=80644000, task=8065e3e0, tls=00000000)
      Call Trace:
      [<8067daa8>] start_kernel+0x33c/0x500
      Code: 24050240  0c0131f9  24849c64 <a200b0a8> 41606020  000000c0  0c1a45e6 00000000  0c1a5f44
      
      UBSAN also detects the same issue:
      
      ================================================================
      UBSAN: Undefined behaviour in arch/mips/kernel/cevt-r4k.c:85:41
      load of address 80647e4c with insufficient space
      for an object of type 'unsigned int'
      CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.9.18 #47
      Call Trace:
      [<80028f70>] show_stack+0x88/0xa4
      [<80312654>] dump_stack+0x84/0xc0
      [<8034163c>] ubsan_epilogue+0x14/0x50
      [<803417d8>] __ubsan_handle_type_mismatch+0x160/0x168
      [<8002dab0>] r4k_clockevent_init+0x544/0x764
      [<80684d34>] time_init+0x18/0x90
      [<8067fa5c>] start_kernel+0x2f0/0x500
      =================================================================
      
      buf2[] is intentionally only 3 elements so that the last element is the
      median once 5 samples have been inserted, so explicitly prevent the
      possibility of comparing against the 4th element rather than extending
      the array.
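
The median trick described above can be sketched in isolation. This is a simplified user-space illustration rather than the cevt-r4k code: `median_of_five` and `KEEP` are hypothetical names. Only the three smallest of five samples are kept in sorted order, so the last slot ends up holding the median, and the search index is bounded by the buffer size as well as by the number of samples inserted so far.

```c
#include <stddef.h>

#define KEEP 3 /* three smallest of five samples; the last kept slot is the median */

static unsigned int median_of_five(const unsigned int samples[5])
{
    unsigned int buf[KEEP];
    size_t i, k, l;

    for (i = 0; i < 5; ++i) {
        /* Bound k by the slots filled so far AND by the buffer size. */
        for (k = 0; k < i && k < KEEP; ++k) {
            if (samples[i] < buf[k]) {
                /* Shift larger entries up, dropping the largest if full. */
                l = i < KEEP - 1 ? i : KEEP - 1;
                for (; l > k; --l)
                    buf[l] = buf[l - 1];
                break;
            }
        }
        /* Values larger than everything kept are discarded once full. */
        if (k < KEEP)
            buf[k] = samples[i];
    }
    return buf[KEEP - 1];
}
```

Without the `k < KEEP` bound in the inner loop, the fourth and fifth samples would be compared against a nonexistent `buf[3]`.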
      
      Fixes: 1fa40555 ("MIPS: cevt-r4k: Dynamically calculate min_delta_ns")
      Reported-by: Rabin Vincent <rabinv@axis.com>
      Signed-off-by: James Hogan <james.hogan@imgtec.com>
      Tested-by: Rabin Vincent <rabinv@axis.com>
      Cc: linux-mips@linux-mips.org
      Patchwork: https://patchwork.linux-mips.org/patch/15892/
      Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • MIPS: KGDB: Use kernel context for sleeping threads · 09c953f7
      James Hogan authored
      commit 162b270c upstream.
      
      KGDB is a kernel debug stub and it can't be used to debug userland as it
      can only safely access kernel memory.
      
      On MIPS however KGDB has always got the register state of sleeping
      processes from the userland register context at the beginning of the
      kernel stack. This is meaningless for kernel threads (which never enter
      userland), and for user threads it prevents the user seeing what it is
      doing while in the kernel:
      
      (gdb) info threads
        Id   Target Id         Frame
        ...
        3    Thread 2 (kthreadd) 0x0000000000000000 in ?? ()
        2    Thread 1 (init)   0x000000007705c4b4 in ?? ()
        1    Thread -2 (shadowCPU0) 0xffffffff8012524c in arch_kgdb_breakpoint () at arch/mips/kernel/kgdb.c:201
      
      Get the register state instead from the (partial) kernel register
      context stored in the task's thread_struct for resume() to restore. All
      threads now correctly appear to be in context_switch():
      
      (gdb) info threads
        Id   Target Id         Frame
        ...
        3    Thread 2 (kthreadd) context_switch (rq=<optimized out>, cookie=..., next=<optimized out>, prev=0x0) at kernel/sched/core.c:2903
        2    Thread 1 (init)   context_switch (rq=<optimized out>, cookie=..., next=<optimized out>, prev=0x0) at kernel/sched/core.c:2903
        1    Thread -2 (shadowCPU0) 0xffffffff8012524c in arch_kgdb_breakpoint () at arch/mips/kernel/kgdb.c:201
      
      Call-clobbered registers which aren't saved and exception registers
      (BadVAddr & Cause) which can't be easily determined without stack
      unwinding are reported as 0. The PC is taken from the return address,
      such that the state presented matches that found immediately after
      returning from resume().
      
      Fixes: 88547001 ("[MIPS] kgdb: add arch support for the kernel's kgdb core")
      Signed-off-by: James Hogan <james.hogan@imgtec.com>
      Cc: Jason Wessel <jason.wessel@windriver.com>
      Cc: linux-mips@linux-mips.org
      Patchwork: https://patchwork.linux-mips.org/patch/15829/
      Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ARC: [plat-eznps] Fix build error · 4a71345e
      Noam Camus authored
      commit 6492f09e upstream.
      
      Make ATOMIC_INIT available for all ARC platforms (including plat-eznps)
      
      Signed-off-by: Noam Camus <noamca@mellanox.com>
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • scsi: return correct blkprep status code in case scsi_init_io() fails. · 47dbabb8
      Johannes Thumshirn authored
      commit e7661a8e upstream.
      
      When instrumenting the SCSI layer to run into the
      !blk_rq_nr_phys_segments(rq) case, the following warning is emitted from
      the block layer:
      
      blk_peek_request: bad return=-22
      
      This happens because since commit fd3fc0b4 ("scsi: don't BUG_ON()
      empty DMA transfers") we return the wrong error value from
      scsi_prep_fn() back to the block layer.
      
      [mkp: silenced checkpatch]
      Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
      Fixes: fd3fc0b4 scsi: don't BUG_ON() empty DMA transfers
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Hannes Reinecke <hare@suse.com>
      Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ALSA: seq: Don't break snd_use_lock_sync() loop by timeout · dcb730f7
      Takashi Iwai authored
      commit 4e7655fd upstream.
      
      snd_use_lock_sync() (and thus its implementation,
      snd_use_lock_sync_helper()) has a 5-second timeout to break out of
      the sync loop.  It has been there from the beginning, just to be
      "safer" against stupid bugs.
      
      However, as Ben Hutchings suggested, this timeout instead introduces a
      potential leak or use-after-free that was apparently fixed by
      commit 2d7d5400 ("ALSA: seq: Fix race during FIFO resize"):
      for example, snd_seq_fifo_event_in() -> snd_seq_event_dup() ->
      copy_from_user() could block for a long time, so snd_use_lock_sync()
      times out and the pool is released while the cell is still in use.
      
      To fix this, remove the break on timeout while still keeping the
      warning.
      
      Suggested-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ALSA: firewire-lib: fix inappropriate assignment between signed/unsigned type · 7b2b791c
      Takashi Sakamoto authored
      commit dfb00a56 upstream.
      
      An abstraction of asynchronous transaction for transmission of MIDI
      messages was introduced in Linux v4.4. Each driver can utilize this
      abstraction to transfer MIDI messages via fixed-length payload of
      transaction to a certain unit address. Filling payload of the transaction
      is done by callback. In this callback, each driver can return negative
      error code, however current implementation assigns the return value to
      unsigned variable.
      
      This commit changes type of the variable to fix the bug.
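
The effect of the type mismatch can be shown with a small user-space sketch. The names `fill_payload`, `len_as_unsigned` and `len_as_signed` are hypothetical and do not come from the driver; the point is only that a negative error code stored in an unsigned variable looks like a very large length.

```c
#include <errno.h>

/* Hypothetical payload-filling callback: bytes written, or -errno on failure. */
static int fill_payload(int fail)
{
    return fail ? -EIO : 8;
}

/* Buggy pattern: the result is kept in an unsigned variable. */
static unsigned int len_as_unsigned(int fail)
{
    unsigned int len = (unsigned int)fill_payload(fail);

    /* -EIO has wrapped to a huge value; a "len < 0" test can never fire. */
    return len;
}

/* Fixed pattern: a signed variable keeps the error code visible. */
static int len_as_signed(int fail)
{
    int len = fill_payload(fail);

    return len; /* callers can still test len < 0 */
}
```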
      
      Reported-by: Julia Lawall <Julia.Lawall@lip6.fr>
      Fixes: 585d7cba ("ALSA: firewire-lib: add helper functions for asynchronous transactions to transfer MIDI messages")
      Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ALSA: oxfw: fix regression to handle Stanton SCS.1m/1d · a33e886d
      Takashi Sakamoto authored
      commit 3d016d57 upstream.
      
      Since commit 6c29230e ("ALSA: oxfw: delayed registration of sound
      card"), the ALSA oxfw driver fails to handle SCS.1m/1d, due to -EBUSY from
      a call of snd_card_register(). The cause is that the driver tries to
      register two rawmidi instances with the same device number 0. This is a
      regression introduced in kernel 4.7.
      
      This commit fixes the regression, by fixing up device property after
      discovering stream formats.
      
      Fixes: 6c29230e ("ALSA: oxfw: delayed registration of sound card")
      Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ipv6: check raw payload size correctly in ioctl · f62c4586
      Jamie Bainbridge authored
      [ Upstream commit 105f5528 ]
      
      In situations where an skb is paged, the transport header pointer and
      tail pointer can be the same because the skb contents are in frags.
      
      This results in ioctl(SIOCINQ/FIONREAD) incorrectly returning a
      length of 0 when the length to receive is actually greater than zero.
      
      skb->len is already correctly set in ip6_input_finish() with
      pskb_pull(), so use skb->len as it always returns the correct result
      for both linear and paged data.
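
A minimal model of the two computations, assuming a hypothetical `toy_raw_skb` that keeps only the fields relevant here (this is not the real struct sk_buff layout):

```c
#include <stddef.h>

/* Hypothetical, simplified skb: `len` covers linear plus paged bytes. */
struct toy_raw_skb {
    const unsigned char *transport_header; /* start of payload in the linear area */
    const unsigned char *tail;             /* end of the linear data */
    size_t len;                            /* total payload bytes, linear and paged */
};

/* Pointer-based computation: only sees bytes stored in the linear area. */
static size_t inq_from_pointers(const struct toy_raw_skb *skb)
{
    return (size_t)(skb->tail - skb->transport_header);
}

/* Length-based computation: correct for both linear and paged data. */
static size_t inq_from_len(const struct toy_raw_skb *skb)
{
    return skb->len;
}
```

For a fully paged skb the two pointers coincide, so the pointer-based value is 0 even though `len` is not.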
      
      Signed-off-by: Jamie Bainbridge <jbainbri@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • tcp: memset ca_priv data to 0 properly · 466dfcd1
      Wei Wang authored
      [ Upstream commit c1201444 ]
      
      Always zero out ca_priv data in tcp_assign_congestion_control() so that
      ca_priv data is cleared out during socket creation.
      Also always zero out ca_priv data in tcp_reinit_congestion_control() so
      that when cc algorithm is changed, ca_priv data is cleared out as well.
      We should still zero out ca_priv data even in TCP_CLOSE state because
      user could call connect() on AF_UNSPEC to disconnect the socket and
      leave it in TCP_CLOSE state and later call setsockopt() to switch cc
      algorithm on this socket.
      
      Fixes: 2b0a8c9e ("tcp: add CDG congestion control")
      Reported-by: Andrey Konovalov <andreyknvl@google.com>
      Signed-off-by: Wei Wang <weiwan@google.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Yuchung Cheng <ycheng@google.com>
      Acked-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ipv6: check skb->protocol before lookup for nexthop · 04630e2e
      WANG Cong authored
      [ Upstream commit 199ab00f ]
      
      Andrey reported an out-of-bounds access in ip6_tnl_xmit(). This
      happens because we use an IPv4 dst in ip6_tnl_xmit() and cast an IPv4
      neigh key as an IPv6 address:
      
              neigh = dst_neigh_lookup(skb_dst(skb),
                                       &ipv6_hdr(skb)->daddr);
              if (!neigh)
                      goto tx_err_link_failure;
      
              addr6 = (struct in6_addr *)&neigh->primary_key; // <=== HERE
              addr_type = ipv6_addr_type(addr6);
      
              if (addr_type == IPV6_ADDR_ANY)
                      addr6 = &ipv6_hdr(skb)->daddr;
      
              memcpy(&fl6->daddr, addr6, sizeof(fl6->daddr));
      
      Also, the network header of the skb at this point should still be IPv4
      for 4in6 tunnels, so we should not just use it as an IPv6 header.
      
      This patch fixes it by checking if skb->protocol is ETH_P_IPV6: if it
      is, we are safe to do the nexthop lookup using skb_dst() and
      ipv6_hdr(skb)->daddr; if not (aka IPv4), we have no clue about which
      dest address we can pick here, so we have to rely on callers to fill it
      from tunnel config and just fall through to ip6_route_output() to make
      the decision.
      
      Fixes: ea3dc960 ("ip6_tunnel: Add support for wildcard tunnel endpoints.")
      Reported-by: Andrey Konovalov <andreyknvl@google.com>
      Tested-by: Andrey Konovalov <andreyknvl@google.com>
      Cc: Steffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • net: phy: fix auto-negotiation stall due to unavailable interrupt · 683f8d60
      Alexander Kochetkov authored
      [ Upstream commit f555f34f ]
      
      The Ethernet link on an interrupt-driven PHY was not coming up if the
      Ethernet cable was plugged in before the Ethernet interface was brought up.
      
      The patch triggers the PHY state machine to update the link state if the PHY
      was requested to do auto-negotiation and the auto-negotiation complete flag
      is already set.
      
      During the power-up cycle the PHY does auto-negotiation, generates an
      interrupt and sets the auto-negotiation complete flag. The interrupt is
      handled by the PHY state machine, but it doesn't update the link state
      because the PHY is in the PHY_READY state. After some time the MAC is
      brought up, starts, and requests the PHY to do auto-negotiation. If there
      are no new settings to advertise, genphy_config_aneg() doesn't start PHY
      auto-negotiation. The PHY stays in the auto-negotiation complete state and
      doesn't fire an interrupt. Meanwhile the PHY state machine expects that the
      PHY started auto-negotiation and waits for an interrupt that it will never
      get.
      
      Fixes: 321beec5 ("net: phy: Use interrupts when available in NOLINK state")
      Signed-off-by: Alexander Kochetkov <al.kochet@gmail.com>
      Cc: stable <stable@vger.kernel.org> # v4.9+
      Tested-by: Roger Quadros <rogerq@ti.com>
      Tested-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • net: ipv6: regenerate host route if moved to gc list · f9a8970e
      David Ahern authored
      [ Upstream commit 8048ced9 ]
      
      Taking down the loopback device wreaks havoc on IPv6 routing. By
      extension, taking down a VRF device wreaks havoc on its table.
      
      Dmitry and Andrey both reported heap out-of-bounds accesses in the IPv6
      FIB code while running the syzkaller fuzzer. The root cause is that a dead
      dst on the garbage list gets reinserted into the IPv6 FIB. While on
      the gc list (or perhaps when it gets added to the gc list) the dst->next is
      set to an IPv4 dst. A subsequent walk of the ipv6 tables causes the
      out-of-bounds access.
      
      Andrey's reproducer was the key to getting to the bottom of this.
      
      With IPv6, host routes for an address have the dst->dev set to the
      loopback device. When the 'lo' device is taken down, rt6_ifdown initiates
      a walk of the fib evicting routes with the 'lo' device which means all
      host routes are removed. That process moves the dst which is attached to
      an inet6_ifaddr to the gc list and marks it as dead.
      
      The recent change to keep global IPv6 addresses added a new function,
      fixup_permanent_addr, that is called on admin up. That function restarts
      dad for an inet6_ifaddr and when it completes the host route attached
      to it is inserted into the fib. Since the route was marked dead and
      moved to the gc list, re-inserting the route causes the reported
      out-of-bounds accesses. If the device with the address is taken down
      or the address is removed, the WARN_ON in fib6_del is triggered.
      
      All of those faults are fixed by regenerating the host route if the
      existing one has been moved to the gc list, something that can be
      determined by checking if the rt6i_ref counter is 0.
      
      Fixes: f1705ec1 ("net: ipv6: Make address flushing on ifdown optional")
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Reported-by: Andrey Konovalov <andreyknvl@google.com>
      Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • macvlan: Fix device ref leak when purging bc_queue · e2ae7173
      Herbert Xu authored
      [ Upstream commit f6478218 ]
      
      When a parent macvlan device is destroyed we end up purging its
      broadcast queue without dropping the device reference count on
      the packet source device.  This causes the source device to linger.
      
      This patch drops that reference count.
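
The leak pattern can be sketched with a toy queue in user space. All names here (`toy_dev`, `toy_pkt`, `enqueue`, `purge`) are hypothetical and the reference count is a plain integer; the sketch only shows that whoever purges the queue must also drop the reference each queued packet holds on its source device.

```c
#include <stdlib.h>

struct toy_dev {
    int refcnt;
};

struct toy_pkt {
    struct toy_dev *src;   /* may be NULL; holds one reference when set */
    struct toy_pkt *next;
};

/* Queue a packet, taking a reference on its source device. */
static struct toy_pkt *enqueue(struct toy_pkt *head, struct toy_dev *src)
{
    struct toy_pkt *p = malloc(sizeof(*p));

    if (!p)
        return head;
    if (src)
        src->refcnt++;
    p->src = src;
    p->next = head;
    return p;
}

/* Purge the queue, dropping each packet's device reference before freeing it. */
static void purge(struct toy_pkt *head)
{
    while (head) {
        struct toy_pkt *next = head->next;

        if (head->src)
            head->src->refcnt--;
        free(head);
        head = next;
    }
}
```

If `purge` only freed the packets, the count would never return to its starting value and the source device would linger.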
      
      Fixes: 260916df ("macvlan: Fix potential use-after free for...")
      Reported-by: Joe Ghalam <Joe.Ghalam@dell.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • tcp: mark skbs with SCM_TIMESTAMPING_OPT_STATS · b073c2c3
      Soheil Hassas Yeganeh authored
      [ Upstream commit 4ef1b286 ]
      
      SOF_TIMESTAMPING_OPT_STATS can be enabled and disabled
      while packets are collected on the error queue.
      So, checking SOF_TIMESTAMPING_OPT_STATS in sk->sk_tsflags
      is not enough to safely assume that the skb contains
      OPT_STATS data.
      
      Add a bit in sock_exterr_skb to indicate whether the
      skb contains opt_stats data.
      
      Fixes: 1c885808 ("tcp: SOF_TIMESTAMPING_OPT_STATS option for SO_TIMESTAMPING")
      Reported-by: JongHwan Kim <zzoru007@gmail.com>
      Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • tcp: fix SCM_TIMESTAMPING_OPT_STATS for normal skbs · cdaf15b4
      Soheil Hassas Yeganeh authored
      [ Upstream commit 8605330a ]
      
      __sock_recv_timestamp can be called for both normal skbs (for
      receive timestamps) and for skbs on the error queue (for transmit
      timestamps).
      
      Commit 1c885808
      (tcp: SOF_TIMESTAMPING_OPT_STATS option for SO_TIMESTAMPING)
      assumes any skb passed to __sock_recv_timestamp are from
      the error queue, containing OPT_STATS in the content of the skb.
      This results in accessing invalid memory or generating junk
      data.
      
      To fix this, set skb->pkt_type to PACKET_OUTGOING for packets
      on the error queue. This is safe because on the receive path
      on local sockets skb->pkt_type is never set to PACKET_OUTGOING.
      With that, copy OPT_STATS from a packet, only if its pkt_type
      is PACKET_OUTGOING.
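
      The check described above might look roughly like this. It is a
      userspace sketch only: the toy_ enum and struct are hypothetical, and
      the enum values are placeholders rather than the kernel's PACKET_*
      constants.

```c
#include <stdbool.h>

/* Placeholder packet types for illustration. */
enum toy_pkt_type {
    TOY_PACKET_HOST,
    TOY_PACKET_OUTGOING,
};

struct toy_skb {
    enum toy_pkt_type pkt_type;
};

/* Mark an skb as it is placed on the error queue. */
static void toy_queue_on_errqueue(struct toy_skb *skb)
{
    skb->pkt_type = TOY_PACKET_OUTGOING;
}

/* Only error-queue skbs may be interpreted as carrying OPT_STATS. */
static bool toy_may_copy_opt_stats(const struct toy_skb *skb,
                                   bool opt_stats_requested)
{
    return opt_stats_requested && skb->pkt_type == TOY_PACKET_OUTGOING;
}
```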
      
      Fixes: 1c885808 ("tcp: SOF_TIMESTAMPING_OPT_STATS option for SO_TIMESTAMPING")
      Reported-by: JongHwan Kim <zzoru007@gmail.com>
      Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      cdaf15b4
    • Ilan Tayari's avatar
      net/mlx5e: Fix ETHTOOL_GRXCLSRLALL handling · df4c4820
      Ilan Tayari authored
      [ Upstream commit 5e82c9e4 ]
      
      Handler for ETHTOOL_GRXCLSRLALL must set info->data to the size
      of the table, regardless of the amount of entries in it.
      Existing code does not do that, and this breaks all usage of ethtool -N
      or -n without explicit location, with this error:
      rmgr: Invalid RX class rules table size: Success
      
      Set info->data to the table size.
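
      A minimal model of the intended handler behaviour is sketched below.
      The toy_ types, the table size of 1024 and the -1 error return are
      assumptions for illustration; only the data and rule_cnt field names
      mirror struct ethtool_rxnfc.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_TABLE_SIZE 1024u    /* assumed capacity for this sketch */

/* Hypothetical cut-down version of the ethtool request structure. */
struct toy_rxnfc {
    uint64_t data;      /* out: size of the rule table */
    uint32_t rule_cnt;  /* out: number of rules actually present */
};

struct toy_table {
    uint8_t used[TOY_TABLE_SIZE];
};

/* Report all rule locations; data is the table size, not the entry count. */
static int toy_get_all_rules(const struct toy_table *t, struct toy_rxnfc *info,
                             uint32_t *locs, uint32_t max_locs)
{
    uint32_t n = 0;

    info->data = TOY_TABLE_SIZE;    /* set regardless of how many rules exist */
    for (uint32_t i = 0; i < TOY_TABLE_SIZE; i++) {
        if (!t->used[i])
            continue;
        if (n == max_locs)
            return -1;              /* caller's buffer is too small */
        locs[n++] = i;
    }
    info->rule_cnt = n;
    return 0;
}
```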
      
      Tested:
      ethtool -n ens8
      ethtool -N ens8 flow-type ip4 src-ip 1.1.1.1 dst-ip 2.2.2.2 action 1
      ethtool -N ens8 flow-type ip4 src-ip 1.1.1.1 dst-ip 2.2.2.2 action 1 loc 55
      ethtool -n ens8
      ethtool -N ens8 delete 1023
      ethtool -N ens8 delete 55
      
      Fixes: f913a72a ("net/mlx5e: Add support to get ethtool flow rules")
      Signed-off-by: Ilan Tayari <ilant@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      df4c4820
    • Eugenia Emantayev's avatar
      net/mlx5e: Fix small packet threshold · cce19108
      Eugenia Emantayev authored
      [ Upstream commit cbad8cdd ]
      
      RX packet headers are meant to be contained in the SKB linear part,
      and a threshold of 128 bytes was chosen for this.
      It turns out this is not enough, e.g. for an IPv6 packet over VxLAN.
      In this case, UDP/IPv4 needs 42 bytes, the GENEVE header is 8 bytes,
      and TCP/IPv6 needs 86 bytes, for a total of 136 bytes, which is more
      than the current 128 bytes. In this case the expand header flow is reached.
      The warning in skb_try_coalesce() caused by a wrong truesize
      was already fixed here:
      commit 158f323b ("net: adjust skb->truesize in pskb_expand_head()").
      Still, we prefer to totally avoid the expand header flow for performance reasons.
      Tested regular TCP_STREAM with iperf for 1 and 8 streams, no degradation was found.
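
      The header arithmetic above can be checked with a few lines of C.
      The constant and function names are made up for this sketch; the
      byte counts are the ones quoted in this message.

```c
/* Byte counts as quoted in the commit message. */
enum {
    TOY_OUTER_UDP_IPV4 = 42,
    TOY_TUNNEL_HDR     = 8,
    TOY_INNER_TCP_IPV6 = 86,
    TOY_OLD_THRESHOLD  = 128,
};

/* Total header bytes that should fit in the linear part of the SKB. */
static int toy_total_headers(void)
{
    return TOY_OUTER_UDP_IPV4 + TOY_TUNNEL_HDR + TOY_INNER_TCP_IPV6;
}

/* Non-zero when the headers overflow the linear part (expand flow needed). */
static int toy_needs_expand(int header_len, int linear_threshold)
{
    return header_len > linear_threshold;
}
```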
      
      Fixes: 461017cb ("net/mlx5e: Support RX multi-packet WQE (Striding RQ)")
      Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      cce19108
    • Or Gerlitz's avatar
      net/mlx5: E-Switch, Correctly deal with inline mode on ConnectX-5 · 3faae16b
      Or Gerlitz authored
      [ Upstream commit c415f704 ]
      
      On ConnectX5 the wqe inline mode is "none" and hence the FW
      reports MLX5_CAP_INLINE_MODE_NOT_REQUIRED.
      
      Fix our devlink callbacks to deal with that on get and set.
      
      Also fix the tc flow parsing code not to fail anything when
      inline isn't required.
      
      Fixes: bffaa916 ('net/mlx5: E-Switch, Add control for inline mode')
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Reviewed-by: Roi Dayan <roid@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      3faae16b
    • Mohamad Haj Yahia's avatar
      net/mlx5: Fix driver load bad flow when having fw initializing timeout · 82aa6b2c
      Mohamad Haj Yahia authored
      [ Upstream commit 55378a23 ]
      
      If FW is stuck in initializing state we will skip the driver load, but
      current error handling flow doesn't clean previously allocated command
      interface resources.
      
      Fixes: e3297246 ('net/mlx5_core: Wait for FW readiness on startup')
      Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      82aa6b2c
    • Nikolay Aleksandrov's avatar
      ip6mr: fix notification device destruction · ff247bdf
      Nikolay Aleksandrov authored
      [ Upstream commit 723b929c ]
      
      Andrey Konovalov reported a BUG in the ip6mr code, triggered because
      we call unregister_netdevice_many for a device that is already
      being destroyed. In IPv4's ipmr this was resolved a long time ago by
      two commits that introduced the "notify" parameter to the delete
      function and avoided the unregister when called from a notifier, so
      let's do the same for ip6mr.
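
      The idea behind the "notify" parameter can be sketched as follows.
      This is a toy model: toy_vif and toy_mif_delete() are hypothetical
      names, and the counter merely records how often an unregister would
      have been requested.

```c
#include <stdbool.h>

/* Hypothetical model of a multicast interface backed by a net device. */
struct toy_vif {
    bool registered;        /* device still registered with the core */
    int unregister_calls;   /* how many times we asked for unregistration */
};

/*
 * Delete the interface. notify == true means we were called from the
 * netdevice notifier, i.e. the core is already destroying the device,
 * so we must not try to unregister it a second time.
 */
static void toy_mif_delete(struct toy_vif *v, bool notify)
{
    /* ... release multicast routing state for this interface ... */

    if (!notify && v->registered) {
        v->registered = false;
        v->unregister_calls++;
    }
}
```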
      
      The trace from Andrey:
      ------------[ cut here ]------------
      kernel BUG at net/core/dev.c:6813!
      invalid opcode: 0000 [#1] SMP KASAN
      Modules linked in:
      CPU: 1 PID: 1165 Comm: kworker/u4:3 Not tainted 4.11.0-rc7+ #251
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
      Workqueue: netns cleanup_net
      task: ffff880069208000 task.stack: ffff8800692d8000
      RIP: 0010:rollback_registered_many+0x348/0xeb0 net/core/dev.c:6813
      RSP: 0018:ffff8800692de7f0 EFLAGS: 00010297
      RAX: ffff880069208000 RBX: 0000000000000002 RCX: 0000000000000001
      RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88006af90569
      RBP: ffff8800692de9f0 R08: ffff8800692dec60 R09: 0000000000000000
      R10: 0000000000000006 R11: 0000000000000000 R12: ffff88006af90070
      R13: ffff8800692debf0 R14: dffffc0000000000 R15: ffff88006af90000
      FS:  0000000000000000(0000) GS:ffff88006cb00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007fe7e897d870 CR3: 00000000657e7000 CR4: 00000000000006e0
      Call Trace:
       unregister_netdevice_many.part.105+0x87/0x440 net/core/dev.c:7881
       unregister_netdevice_many+0xc8/0x120 net/core/dev.c:7880
       ip6mr_device_event+0x362/0x3f0 net/ipv6/ip6mr.c:1346
       notifier_call_chain+0x145/0x2f0 kernel/notifier.c:93
       __raw_notifier_call_chain kernel/notifier.c:394
       raw_notifier_call_chain+0x2d/0x40 kernel/notifier.c:401
       call_netdevice_notifiers_info+0x51/0x90 net/core/dev.c:1647
       call_netdevice_notifiers net/core/dev.c:1663
       rollback_registered_many+0x919/0xeb0 net/core/dev.c:6841
       unregister_netdevice_many.part.105+0x87/0x440 net/core/dev.c:7881
       unregister_netdevice_many net/core/dev.c:7880
       default_device_exit_batch+0x4fa/0x640 net/core/dev.c:8333
       ops_exit_list.isra.4+0x100/0x150 net/core/net_namespace.c:144
       cleanup_net+0x5a8/0xb40 net/core/net_namespace.c:463
       process_one_work+0xc04/0x1c10 kernel/workqueue.c:2097
       worker_thread+0x223/0x19c0 kernel/workqueue.c:2231
       kthread+0x35e/0x430 kernel/kthread.c:231
       ret_from_fork+0x31/0x40 arch/x86/entry/entry_64.S:430
      Code: 3c 32 00 0f 85 70 0b 00 00 48 b8 00 02 00 00 00 00 ad de 49 89
      47 78 e9 93 fe ff ff 49 8d 57 70 49 8d 5f 78 eb 9e e8 88 7a 14 fe <0f>
      0b 48 8b 9d 28 fe ff ff e8 7a 7a 14 fe 48 b8 00 00 00 00 00
      RIP: rollback_registered_many+0x348/0xeb0 RSP: ffff8800692de7f0
      ---[ end trace e0b29c57e9b3292c ]---
      Reported-by: Andrey Konovalov <andreyknvl@google.com>
      Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Tested-by: Andrey Konovalov <andreyknvl@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      ff247bdf
    • Tushar Dave's avatar
      netpoll: Check for skb->queue_mapping · 9db670f7
      Tushar Dave authored
      [ Upstream commit c70b17b7 ]
      
      Reducing real_num_tx_queues needs to be in sync with skb queue_mapping
      otherwise skbs with queue_mapping greater than real_num_tx_queues
      can be sent to the underlying driver and can result in kernel panic.
      
      One such event is running netconsole and enabling VF on the same
      device. Or running netconsole and changing number of tx queues via
      ethtool on same device.
      
      e.g.
      Unable to handle kernel NULL pointer dereference
      tsk->{mm,active_mm}->context = 0000000000001525
      tsk->{mm,active_mm}->pgd = fff800130ff9a000
                    \|/ ____ \|/
                    "@'/ .. \`@"
                    /_| \__/ |_\
                       \__U_/
      kworker/48:1(475): Oops [#1]
      CPU: 48 PID: 475 Comm: kworker/48:1 Tainted: G           OE 4.11.0-rc3-davem-net+ #7
      Workqueue: events queue_process
      task: fff80013113299c0 task.stack: fff800131132c000
      TSTATE: 0000004480e01600 TPC: 00000000103f9e3c TNPC: 00000000103f9e40 Y: 00000000    Tainted: G           OE
      TPC: <ixgbe_xmit_frame_ring+0x7c/0x6c0 [ixgbe]>
      g0: 0000000000000000 g1: 0000000000003fff g2: 0000000000000000 g3: 0000000000000001
      g4: fff80013113299c0 g5: fff8001fa6808000 g6: fff800131132c000 g7: 00000000000000c0
      o0: fff8001fa760c460 o1: fff8001311329a50 o2: fff8001fa7607504 o3: 0000000000000003
      o4: fff8001f96e63a40 o5: fff8001311d77ec0 sp: fff800131132f0e1 ret_pc: 000000000049ed94
      RPC: <set_next_entity+0x34/0xb80>
      l0: 0000000000000000 l1: 0000000000000800 l2: 0000000000000000 l3: 0000000000000000
      l4: 000b2aa30e34b10d l5: 0000000000000000 l6: 0000000000000000 l7: fff8001fa7605028
      i0: fff80013111a8a00 i1: fff80013155a0780 i2: 0000000000000000 i3: 0000000000000000
      i4: 0000000000000000 i5: 0000000000100000 i6: fff800131132f1a1 i7: 00000000103fa4b0
      I7: <ixgbe_xmit_frame+0x30/0xa0 [ixgbe]>
      Call Trace:
       [00000000103fa4b0] ixgbe_xmit_frame+0x30/0xa0 [ixgbe]
       [0000000000998c74] netpoll_start_xmit+0xf4/0x200
       [0000000000998e10] queue_process+0x90/0x160
       [0000000000485fa8] process_one_work+0x188/0x480
       [0000000000486410] worker_thread+0x170/0x4c0
       [000000000048c6b8] kthread+0xd8/0x120
       [0000000000406064] ret_from_fork+0x1c/0x2c
       [0000000000000000]           (null)
      Disabling lock debugging due to kernel taint
      Caller[00000000103fa4b0]: ixgbe_xmit_frame+0x30/0xa0 [ixgbe]
      Caller[0000000000998c74]: netpoll_start_xmit+0xf4/0x200
      Caller[0000000000998e10]: queue_process+0x90/0x160
      Caller[0000000000485fa8]: process_one_work+0x188/0x480
      Caller[0000000000486410]: worker_thread+0x170/0x4c0
      Caller[000000000048c6b8]: kthread+0xd8/0x120
      Caller[0000000000406064]: ret_from_fork+0x1c/0x2c
      Caller[0000000000000000]:           (null)
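
      One way to keep queue_mapping in range is sketched below. This is a
      userspace model under assumptions: the toy_ structs are hypothetical,
      real_num_tx_queues is assumed to be at least 1, and folding with a
      modulo is just one possible policy for an out-of-range index.

```c
/* Hypothetical stand-ins for struct net_device and struct sk_buff. */
struct toy_netdev {
    unsigned int real_num_tx_queues;    /* assumed to be >= 1 */
};

struct toy_skb {
    unsigned short queue_mapping;
};

/* Return a TX queue index that is valid for the device's current setup. */
static unsigned int toy_pick_tx_queue(const struct toy_netdev *dev,
                                      struct toy_skb *skb)
{
    unsigned int q = skb->queue_mapping;

    if (q >= dev->real_num_tx_queues) {
        /* The queue count shrank after the skb was mapped: fold it back. */
        q %= dev->real_num_tx_queues;
        skb->queue_mapping = (unsigned short)q;
    }
    return q;
}
```
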
      Signed-off-by: Tushar Dave <tushar.n.dave@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      9db670f7
    • David Ahern's avatar
      net: ipv6: RTF_PCPU should not be settable from userspace · 5e54291e
      David Ahern authored
      [ Upstream commit 557c44be ]
      
      Andrey reported a fault in the IPv6 route code:
      
      kasan: GPF could be caused by NULL-ptr deref or user memory access
      general protection fault: 0000 [#1] SMP KASAN
      Modules linked in:
      CPU: 1 PID: 4035 Comm: a.out Not tainted 4.11.0-rc7+ #250
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
      task: ffff880069809600 task.stack: ffff880062dc8000
      RIP: 0010:ip6_rt_cache_alloc+0xa6/0x560 net/ipv6/route.c:975
      RSP: 0018:ffff880062dced30 EFLAGS: 00010206
      RAX: dffffc0000000000 RBX: ffff8800670561c0 RCX: 0000000000000006
      RDX: 0000000000000003 RSI: ffff880062dcfb28 RDI: 0000000000000018
      RBP: ffff880062dced68 R08: 0000000000000001 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
      R13: ffff880062dcfb28 R14: dffffc0000000000 R15: 0000000000000000
      FS:  00007feebe37e7c0(0000) GS:ffff88006cb00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00000000205a0fe4 CR3: 000000006b5c9000 CR4: 00000000000006e0
      Call Trace:
       ip6_pol_route+0x1512/0x1f20 net/ipv6/route.c:1128
       ip6_pol_route_output+0x4c/0x60 net/ipv6/route.c:1212
      ...
      
      Andrey's syzkaller program passes rtmsg.rtmsg_flags with the RTF_PCPU bit
      set. Flags passed to the kernel are blindly copied to the allocated
      rt6_info by ip6_route_info_create making a newly inserted route appear
      as though it is a per-cpu route. ip6_rt_cache_alloc sees the flag set
      and expects rt->dst.from to be set - which it is not since it is not
      really a per-cpu copy. The subsequent call to __ip6_dst_alloc then
      generates the fault.
      
      Fix by checking for the flag and failing with EINVAL.
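
      In outline, the check is a simple flag test on the user-supplied
      value. The sketch below is a userspace model: toy_route_info_create()
      is a hypothetical name and TOY_RTF_PCPU is a placeholder bit, not
      necessarily the kernel's RTF_PCPU value.

```c
#include <errno.h>
#include <stdint.h>

#define TOY_RTF_PCPU 0x40000000u    /* placeholder bit for this sketch */

/* Reject kernel-internal flags before copying them into a new route. */
static int toy_route_info_create(uint32_t user_flags)
{
    if (user_flags & TOY_RTF_PCPU)
        return -EINVAL;     /* internal-only flag set by userspace */

    /* ... allocate and populate the route with user_flags ... */
    return 0;
}
```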
      
      Fixes: d52d3997 ("ipv6: Create percpu rt6_info")
      Reported-by: Andrey Konovalov <andreyknvl@google.com>
      Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Tested-by: Andrey Konovalov <andreyknvl@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      5e54291e
    • Ilan Tayari's avatar
      gso: Validate assumption of frag_list segementation · ee1f368e
      Ilan Tayari authored
      [ Upstream commit 43170c4e ]
      
      Commit 07b26c94 ("gso: Support partial splitting at the frag_list
      pointer") assumes that all SKBs in a frag_list (except maybe the last
      one) contain the same amount of GSO payload.
      
      This assumption is not always correct, resulting in the following
      warning message in the log:
          skb_segment: too many frags
      
      For example, mlx5 driver in Striding RQ mode creates some RX SKBs with
      one frag, and some with 2 frags.
      After GRO, the frag_list SKBs end up having different amounts of payload.
      If this frag_list SKB is then forwarded, the aforementioned assumption
      is violated.
      
      Validate the assumption, and fall back to software GSO if it is not true.
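
      The validation amounts to walking the frag_list and comparing
      payload sizes. Below is a simplified sketch, assuming a hypothetical
      toy_frag_skb list in which each element records its payload length;
      it does not reproduce the actual skb_segment() logic.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical element of a frag_list chain. */
struct toy_frag_skb {
    size_t payload_len;
    struct toy_frag_skb *next;
};

/*
 * True when every skb in the list, except possibly the last one,
 * carries exactly mss bytes of payload. Only then is it safe to
 * split at the frag_list pointer; otherwise fall back to software GSO.
 */
static bool toy_frag_list_is_uniform(const struct toy_frag_skb *list,
                                     size_t mss)
{
    for (const struct toy_frag_skb *it = list; it && it->next; it = it->next) {
        if (it->payload_len != mss)
            return false;
    }
    return true;
}
```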
      
      Change-Id: Ia03983f4a47b6534dd987d7a2aad96d54d46d212
      Fixes: 07b26c94 ("gso: Support partial splitting at the frag_list pointer")
      Signed-off-by: Ilan Tayari <ilant@mellanox.com>
      Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      ee1f368e
    • Sabrina Dubroca's avatar
      ipv6: fix source routing · 03940f08
      Sabrina Dubroca authored
      [ Upstream commit ec9c4215 ]
      
      Commit a149e7c7 ("ipv6: sr: add support for SRH injection through
      setsockopt") introduced handling of IPV6_SRCRT_TYPE_4, but at the same
      time restricted it to only IPV6_SRCRT_TYPE_0 and
      IPV6_SRCRT_TYPE_4. Previously, ipv6_push_exthdr() and fl6_update_dst()
      would also handle other values (ie STRICT and TYPE_2).
      
      Restore previous source routing behavior, by handling IPV6_SRCRT_STRICT
      and IPV6_SRCRT_TYPE_2 the same way as IPV6_SRCRT_TYPE_0 in
      ipv6_push_exthdr() and fl6_update_dst().
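
      The restored dispatch can be pictured as a switch over the routing
      header type. This is a sketch only: the TOY_ constants are assumed to
      mirror the IPV6_SRCRT_* values, and toy_classify_rthdr() is a
      hypothetical helper, not a kernel function.

```c
/* Assumed to mirror the IPV6_SRCRT_* constants; illustration only. */
#define TOY_SRCRT_TYPE_0 0
#define TOY_SRCRT_STRICT 1
#define TOY_SRCRT_TYPE_2 2
#define TOY_SRCRT_TYPE_4 4

enum toy_rthdr_action {
    TOY_RTHDR_UNHANDLED,
    TOY_RTHDR_CLASSIC,  /* handled the same way as type 0 */
    TOY_RTHDR_SRH,      /* segment routing header */
};

/* Decide how a routing header of the given type is handled. */
static enum toy_rthdr_action toy_classify_rthdr(int type)
{
    switch (type) {
    case TOY_SRCRT_TYPE_0:
    case TOY_SRCRT_STRICT:
    case TOY_SRCRT_TYPE_2:
        return TOY_RTHDR_CLASSIC;
    case TOY_SRCRT_TYPE_4:
        return TOY_RTHDR_SRH;
    default:
        return TOY_RTHDR_UNHANDLED;
    }
}
```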
      
      Fixes: a149e7c7 ("ipv6: sr: add support for SRH injection through setsockopt")
      Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
      Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      03940f08
    • David Lebrun's avatar
      ipv6: sr: fix double free of skb after handling invalid SRH · c52ac068
      David Lebrun authored
      [ Upstream commit 95b9b88d ]
      
      The icmpv6_param_prob() function already does a kfree_skb(),
      this patch removes the duplicate one.
      
      Fixes: 1ababeba ("ipv6: implement dataplane support for rthdr type 4 (Segment Routing Header)")
      Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
      Cc: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: David Lebrun <david.lebrun@uclouvain.be>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      c52ac068
    • Dan Carpenter's avatar
      dp83640: don't recieve time stamps twice · 3b600a30
      Dan Carpenter authored
      [ Upstream commit 9d386cd9 ]
      
      This patch is prompted by a static checker warning about a potential
      use after free.  The concern is that netif_rx_ni() can free "skb" and we
      call it twice.
      
      When I look at the commit that added this, it looks like some stray
      lines were added accidentally.  It doesn't make sense to me that we
      would receive the same data two times.  I asked the author but never
      received a response.
      
      I can't test this code, but I'm pretty sure my patch is correct.
      
      Fixes: 4b063258 ("dp83640: Delay scheduled work.")
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Acked-by: Stefan Sørensen <stefan.sorensen@spectralink.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      3b600a30
    • David Lebrun's avatar
      ipv6: sr: fix out-of-bounds access in SRH validation · a0240747
      David Lebrun authored
      [ Upstream commit 2f3bb642 ]
      
      This patch fixes an out-of-bounds access in seg6_validate_srh() when the
      trailing data is less than sizeof(struct sr6_tlv).
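
      A bounds-checked TLV walk of the kind described might look like the
      sketch below. The two-byte toy_tlv header (type, len) and the
      function name are assumptions for illustration, not the kernel's
      struct sr6_tlv or seg6_validate_srh().

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical TLV header: one byte of type, one byte of value length. */
struct toy_tlv {
    uint8_t type;
    uint8_t len;
};

/* Validate that the trailing bytes form a sequence of complete TLVs. */
static bool toy_validate_tlvs(const uint8_t *buf, size_t trailing)
{
    size_t off = 0;

    while (off < trailing) {
        size_t tlv_len;

        /* Not even room for a TLV header: reading it would go out of bounds. */
        if (trailing - off < sizeof(struct toy_tlv))
            return false;

        tlv_len = sizeof(struct toy_tlv) + buf[off + 1];
        if (tlv_len > trailing - off)
            return false;   /* value would run past the end */

        off += tlv_len;
    }
    return true;
}
```
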
      Reported-by: Andrey Konovalov <andreyknvl@google.com>
      Cc: Andrey Konovalov <andreyknvl@google.com>
      Signed-off-by: David Lebrun <david.lebrun@uclouvain.be>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      a0240747
    • Sergei Shtylyov's avatar
      sh_eth: unmap DMA buffers when freeing rings · 7e793ce3
      Sergei Shtylyov authored
      [ Upstream commit 1debdc8f ]
      
      The DMA API debugging (when enabled) causes:
      
      WARNING: CPU: 0 PID: 1445 at lib/dma-debug.c:519 add_dma_entry+0xe0/0x12c
      DMA-API: exceeded 7 overlapping mappings of cacheline 0x01b2974d
      
      to be printed after repeated initialization of the Ether device, e.g.
      suspend/resume or 'ifconfig' up/down. This is because DMA buffers mapped
      using dma_map_single() in sh_eth_ring_format() and sh_eth_start_xmit() are
      never unmapped. Resolve this problem by unmapping the buffers when freeing
      the descriptor rings; in order to do it right, we'd have to add an extra
      parameter to sh_eth_txfree() (we rename this function to sh_eth_tx_free(),
      while at it).
      
      Based on the commit a47b70ea ("ravb: unmap descriptors when freeing
      rings").
      Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      7e793ce3
    • David Ahern's avatar
      net: vrf: Fix setting NLM_F_EXCL flag when adding l3mdev rule · c526d086
      David Ahern authored
      [ Upstream commit 426c87ca ]
      
      Only need 1 l3mdev FIB rule. Fix setting NLM_F_EXCL in the nlmsghdr.
      
      Fixes: 1aa6c4f6 ("net: vrf: Add l3mdev rules on first device create")
      Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      c526d086
    • Willem de Bruijn's avatar
      net-timestamp: avoid use-after-free in ip_recv_error · 9ca5d7e4
      Willem de Bruijn authored
      [ Upstream commit 1862d620 ]
      
      Syzkaller reported a use-after-free in ip_recv_error at line
      
          info->ipi_ifindex = skb->dev->ifindex;
      
      This function is called on dequeue from the error queue, at which
      point the device pointer may no longer be valid.
      
      Save ifindex on enqueue in __skb_complete_tx_timestamp, when the
      pointer is valid or NULL. Store it in temporary storage skb->cb.
      
      It is safe to reference skb->dev here, as called from device drivers
      or dev_queue_xmit. The exception is when called from tcp_ack_tstamp;
      in that case it is NULL and ifindex is set to 0 (invalid).
      
      Do not return a pktinfo cmsg if ifindex is 0. This maintains the
      current behavior of not returning a cmsg if skb->dev was NULL.
      
      On dequeue, the ipv4 path will cast from sock_exterr_skb to
      in_pktinfo. Both have ifindex as their first element, so no explicit
      conversion is needed. This is by design, introduced in commit
      0b922b7a ("net: original ingress device index in PKTINFO"). For
      ipv6 ip6_datagram_support_cmsg converts to in6_pktinfo.
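
      The enqueue/dequeue split can be modelled in a few lines. This is a
      sketch under assumptions: toy_dev and toy_skb are hypothetical, and
      cb_ifindex stands in for the value stored in skb->cb.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct net_device and struct sk_buff. */
struct toy_dev {
    int ifindex;
};

struct toy_skb {
    struct toy_dev *dev;    /* may become invalid after enqueue */
    int cb_ifindex;         /* copy saved while dev is still valid */
};

/* Enqueue side: dev is valid or NULL here, so save the index now. */
static void toy_complete_tx_timestamp(struct toy_skb *skb)
{
    skb->cb_ifindex = skb->dev ? skb->dev->ifindex : 0;
}

/* Dequeue side: never touch skb->dev; 0 means "no pktinfo cmsg". */
static bool toy_recv_error_pktinfo(const struct toy_skb *skb, int *ifindex_out)
{
    if (skb->cb_ifindex == 0)
        return false;
    *ifindex_out = skb->cb_ifindex;
    return true;
}
```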
      
      Fixes: 829ae9d6 ("net-timestamp: allow reading recv cmsg on errqueue with origin tstamp")
      Reported-by: Andrey Konovalov <andreyknvl@google.com>
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      9ca5d7e4
    • Rabin Vincent's avatar
      ipv6: Fix idev->addr_list corruption · 0d8ef98c
      Rabin Vincent authored
      [ Upstream commit a2d6cbb0 ]
      
      addrconf_ifdown() removes elements from the idev->addr_list without
      holding the idev->lock.
      
      If this happens while the loop in __ipv6_dev_get_saddr() is handling the
      same element, that function ends up in an infinite loop:
      
        NMI watchdog: BUG: soft lockup - CPU#1 stuck for 23s! [test:1719]
        Call Trace:
         ipv6_get_saddr_eval+0x13c/0x3a0
         __ipv6_dev_get_saddr+0xe4/0x1f0
         ipv6_dev_get_saddr+0x1b4/0x204
         ip6_dst_lookup_tail+0xcc/0x27c
         ip6_dst_lookup_flow+0x38/0x80
         udpv6_sendmsg+0x708/0xba8
         sock_sendmsg+0x18/0x30
         SyS_sendto+0xb8/0xf8
         syscall_common+0x34/0x58
      
      Fixes: 6a923934 (Revert "ipv6: Revert optional address flusing on ifdown.")
      Signed-off-by: Rabin Vincent <rabinv@axis.com>
      Acked-by: David Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      0d8ef98c