1. 23 Dec, 2016 38 commits
    • ipvlan: fix various issues in ipvlan_process_multicast() · b1227d01
      Eric Dumazet authored
      1) netif_rx() / dev_forward_skb() should not be called from process
      context.
      
      2) ipvlan_count_rx() should be called with preemption disabled.
      
      3) We should check that ipvlan->dev is up before feeding packets
      to netif_rx().
      
      4) We need to prevent the device from disappearing while some packets
      are still in the multicast backlog.
      
      5) One kfree_skb() should eventually be a consume_skb().
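      A minimal userspace sketch of points 3) and 4), not the ipvlan code:
      it assumes a hypothetical model_dev whose refcnt field stands in for
      dev_hold()/dev_put() and whose up flag stands in for the device's up
      state. The receive path pins the device when it queues a packet, and
      the deferred worker delivers only to an up device before dropping
      that reference.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device model: refcnt stands in for dev_hold()/dev_put(),
 * up for the device's up state. */
struct model_dev {
    int refcnt;
    bool up;
    int delivered;
    int dropped;
};

/* One deferred multicast backlog entry. */
struct backlog_entry {
    struct model_dev *dev;
};

/* Receive path: pin the device while the packet waits in the backlog. */
static void backlog_enqueue(struct backlog_entry *e, struct model_dev *dev)
{
    dev->refcnt++;
    e->dev = dev;
}

/* Deferred worker: deliver only to an up device, then drop the reference. */
static void backlog_process(struct backlog_entry *e)
{
    struct model_dev *dev = e->dev;

    assert(dev != NULL && dev->refcnt > 0);
    if (dev->up)
        dev->delivered++;
    else
        dev->dropped++;
    dev->refcnt--;
    e->dev = NULL;
}
```

      Because the reference is taken at enqueue time, the device cannot go
      away while a packet for it is still sitting in the backlog.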
      
      Fixes: ba35f858 ("ipvlan: Defer multicast / broadcast processing to
      a work-queue")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Mahesh Bandewar <maheshb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 50b17cfb
      Linus Torvalds authored
      Pull networking fixes from David Miller:
      
       1) We have to be careful to not try and place a checksum after the end
          of a rawv6 packet, fix from Dave Jones with help from Hannes
          Frederic Sowa.
      
       2) Missing memory barriers in tcp_tasklet_func() lead to crashes, from
          Eric Dumazet.
      
       3) Several bug fixes for the new XDP support in virtio_net, from Jason
          Wang.
      
       4) Increase headroom in RX skbs in the be2net driver to accommodate
          encapsulations such as Geneve. From Kalesh A P.
      
       5) Fix SKB frag unmapping on TX in mvpp2, from Thomas Petazzoni.
      
       6) Pre-pulling UDP headers created a regression in RECVORIGDSTADDR
          socket option support, from Willem de Bruijn.
      
       7) UID based routing added a potential OOPS in ip_do_redirect() when we
          see an SKB without a socket attached. We just need it for the
          network namespace which we can get from skb->dev instead. Fix from
          Lorenzo Colitti.
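      A simplified model of item 7, using hypothetical cut-down structs
      rather than the real kernel types: reading the network namespace
      through skb->sk faults when no socket is attached, while going
      through skb->dev works for every received packet.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, cut-down stand-ins for the kernel structures. */
struct net { int id; };
struct net_device { struct net *nd_net; };
struct sock { struct net *sk_net; };
struct sk_buff { struct sock *sk; struct net_device *dev; };

/* Old approach: derive the namespace from the socket.
 * Dereferences NULL if skb->sk is not set. */
static struct net *redirect_net_from_sock(const struct sk_buff *skb)
{
    return skb->sk->sk_net;
}

/* Fixed approach: the receiving device always belongs to a namespace. */
static struct net *redirect_net_from_dev(const struct sk_buff *skb)
{
    return skb->dev->nd_net;
}
```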
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (30 commits)
        sctp: fix recovering from 0 win with small data chunks
        sctp: do not loose window information if in rwnd_over
        virtio-net: XDP support for small buffers
        virtio-net: remove big packet XDP codes
        virtio-net: forbid XDP when VIRTIO_NET_F_GUEST_UFO is support
        virtio-net: make rx buf size estimation works for XDP
        virtio-net: unbreak csumed packets for XDP_PASS
        virtio-net: correctly handle XDP_PASS for linearized packets
        virtio-net: fix page miscount during XDP linearizing
        virtio-net: correctly xmit linearized page on XDP_TX
        virtio-net: remove the warning before XDP linearizing
        mlxsw: spectrum_router: Correctly remove nexthop groups
        mlxsw: spectrum_router: Don't reflect dead neighs
        neigh: Send netevent after marking neigh as dead
        ipv6: handle -EFAULT from skb_copy_bits
        inet: fix IP(V6)_RECVORIGDSTADDR for udp sockets
        net/sched: cls_flower: Mandate mask when matching on flags
        net/sched: act_tunnel_key: Fix setting UDP dst port in metadata under IPv6
        stmmac: CSR clock configuration fix
        net: ipv4: Don't crash if passing a null sk to ip_do_redirect.
        ...
    • sctp: fix recovering from 0 win with small data chunks · 1636098c
      Marcelo Ricardo Leitner authored
      Currently, if SCTP closes the receive window under window pressure
      (mostly caused by a poor payload-to-skb-overhead ratio), it closes the
      window abruptly and saves the delta in rwnd_press. It then starts
      recovering rwnd as the application consumes chunks, but rwnd_press is
      only recovered once rwnd reaches the value of rwnd_press, mostly to
      prevent silly window syndrome.
      
      The problem is that this is very inefficient with small data chunks:
      rwnd never grows back to that value, so the association never recovers
      from the pressure. As a result, we do not issue window updates when
      recovering from a 0 window and instead rely on a sender retransmission
      to notice it.
      
      The fix is to remove this threshold, as no value is good enough: it
      depends on the (average) chunk sizes being used.
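      The effect of dropping the threshold can be seen in a simplified
      model. This is a sketch only: assoc_model is hypothetical, the amount
      of pressure released per step is assumed to be capped at the path
      MTU, and rwnd_over handling is omitted. With 1-byte chunks and 1000
      bytes of pressure, the old rule never releases anything, while the
      new rule releases it on the first consumed chunk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified association state; field names follow the
 * commit message, not the kernel's struct layout. */
struct assoc_model {
    uint32_t rwnd;       /* window currently advertised to the peer */
    uint32_t rwnd_press; /* window withheld because of buffer pressure */
    uint32_t pathmtu;    /* assumed cap on pressure released per step */
};

static uint32_t min_u32(uint32_t a, uint32_t b)
{
    return a < b ? a : b;
}

/* The application consumed len bytes. With use_threshold (old behaviour),
 * pressure is released only once rwnd has grown back to rwnd_press;
 * without it (the fix), it is released on every call while any remains. */
static void rwnd_increase(struct assoc_model *a, uint32_t len, bool use_threshold)
{
    assert(a);
    a->rwnd += len;
    if (a->rwnd_press && (!use_threshold || a->rwnd >= a->rwnd_press)) {
        uint32_t change = min_u32(a->pathmtu, a->rwnd_press);

        a->rwnd += change;
        a->rwnd_press -= change;
    }
}
```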
      
      Test with netperf -t SCTP_STREAM -- -m 1, and trigger 0 window by
      sending SIGSTOP to netserver, sleep 1.2, and SIGCONT.
      Rate limited to 845kbps, for visibility. Capture done at netserver side.
      
      Previously:
      01.500751 IP B.48277 > A.36925: sctp (1) [SACK] [cum ack 632372996] [a_rwnd 99153] [
      01.500752 IP A.36925 > B.48277: sctp (1) [DATA] (B)(E) [TSN: 632372997] [SID: 0] [SS
      01.517471 IP A.36925 > B.48277: sctp (1) [DATA] (B)(E) [TSN: 632373010] [SID: 0] [SS
      01.517483 IP B.48277 > A.36925: sctp (1) [SACK] [cum ack 632373009] [a_rwnd 0] [#gap
      01.517485 IP A.36925 > B.48277: sctp (1) [DATA] (B)(E) [TSN: 632373083] [SID: 0] [SS
      01.517488 IP B.48277 > A.36925: sctp (1) [SACK] [cum ack 632373009] [a_rwnd 0] [#gap
      01.534168 IP A.36925 > B.48277: sctp (1) [DATA] (B)(E) [TSN: 632373096] [SID: 0] [SS
      01.534180 IP B.48277 > A.36925: sctp (1) [SACK] [cum ack 632373009] [a_rwnd 0] [#gap
      01.534181 IP A.36925 > B.48277: sctp (1) [DATA] (B)(E) [TSN: 632373169] [SID: 0] [SS
      01.534185 IP B.48277 > A.36925: sctp (1) [SACK] [cum ack 632373009] [a_rwnd 0] [#gap
      02.525978 IP A.36925 > B.48277: sctp (1) [DATA] (B)(E) [TSN: 632373010] [SID: 0] [SS
      02.526021 IP B.48277 > A.36925: sctp (1) [SACK] [cum ack 632373009] [a_rwnd 0] [#gap
        (window update missed)
      04.573807 IP A.36925 > B.48277: sctp (1) [DATA] (B)(E) [TSN: 632373010] [SID: 0] [SS
      04.779370 IP B.48277 > A.36925: sctp (1) [SACK] [cum ack 632373082] [a_rwnd 859] [#g
      04.789162 IP A.36925 > B.48277: sctp (1) [DATA] (B)(E) [TSN: 632373083] [SID: 0] [SS
      04.789323 IP A.36925 > B.48277: sctp (1) [DATA] (B)(E) [TSN: 632373156] [SID: 0] [SS
      04.789372 IP B.48277 > A.36925: sctp (1) [SACK] [cum ack 632373228] [a_rwnd 786] [#g
      
      After:
      02.568957 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098728] [a_rwnd 99153]
      02.568961 IP A.55173 > B.50536: sctp (1) [DATA] (B)(E) [TSN: 2490098729] [SID: 0] [S
      02.585631 IP A.55173 > B.50536: sctp (1) [DATA] (B)(E) [TSN: 2490098742] [SID: 0] [S
      02.585666 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098741] [a_rwnd 0] [#ga
      02.585671 IP A.55173 > B.50536: sctp (1) [DATA] (B)(E) [TSN: 2490098815] [SID: 0] [S
      02.585683 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098741] [a_rwnd 0] [#ga
      02.602330 IP A.55173 > B.50536: sctp (1) [DATA] (B)(E) [TSN: 2490098828] [SID: 0] [S
      02.602359 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098741] [a_rwnd 0] [#ga
      02.602363 IP A.55173 > B.50536: sctp (1) [DATA] (B)(E) [TSN: 2490098901] [SID: 0] [S
      02.602372 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098741] [a_rwnd 0] [#ga
      03.600788 IP A.55173 > B.50536: sctp (1) [DATA] (B)(E) [TSN: 2490098742] [SID: 0] [S
      03.600830 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098741] [a_rwnd 0] [#ga
      03.619455 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098741] [a_rwnd 13508]
      03.619479 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098741] [a_rwnd 27017]
      03.619497 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098741] [a_rwnd 40526]
      03.619516 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098741] [a_rwnd 54035]
      03.619533 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098741] [a_rwnd 67544]
      03.619552 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098741] [a_rwnd 81053]
      03.619570 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098741] [a_rwnd 94562]
        (following data transmission triggered by window updates above)
      03.633504 IP A.55173 > B.50536: sctp (1) [DATA] (B)(E) [TSN: 2490098742] [SID: 0] [S
      03.836445 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098814] [a_rwnd 100000]
      03.843125 IP A.55173 > B.50536: sctp (1) [DATA] (B)(E) [TSN: 2490098815] [SID: 0] [S
      03.843285 IP A.55173 > B.50536: sctp (1) [DATA] (B)(E) [TSN: 2490098888] [SID: 0] [S
      03.843345 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098960] [a_rwnd 99894]
      03.856546 IP A.55173 > B.50536: sctp (1) [DATA] (B)(E) [TSN: 2490098961] [SID: 0] [S
      03.866450 IP A.55173 > B.50536: sctp (1) [DATA] (B)(E) [TSN: 2490099011] [SID: 0] [S
      Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sctp: do not loose window information if in rwnd_over · 58b94d88
      Marcelo Ricardo Leitner authored
      It's possible to receive a packet that is larger than the current
      window. If it is the first such packet, it increases rwnd_over. Then,
      if we receive another data chunk (especially as SCTP allows one data
      chunk in flight even during a 0 window), rwnd_over is overwritten
      instead of added to.
      
      In the long run, this can cause the window to grow bigger than its
      initial size: rwnd_over is charged only for the last received data
      chunk, while the code tries to open the window for all received
      packets, including those whose rwnd_over contribution was
      overwritten. This, in turn, can worsen the payload/buffer ratio and
      make rwnd_press kick in more often.
      
      The fix is to sum it too, as is already done for rwnd_press, so that
      if we receive 3 chunks after closing the window, we still have to
      release that same amount before reopening it.
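      A hypothetical model of the accounting difference, assuming a window
      that is already closed (rwnd == 0) and three 1-byte chunks arriving;
      the struct and function names are illustrative, not the kernel's.
      Overwriting leaves rwnd_over at 1, so consuming the 3 bytes reopens
      the window by 2; summing leaves it at 3, so the window stays closed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: only the fields needed to show the difference. */
struct over_model {
    uint32_t rwnd;      /* advertised receive window */
    uint32_t rwnd_over; /* bytes received beyond the window */
};

/* A chunk of len bytes was received. accumulate=false models the old
 * overwrite; accumulate=true models the fix (sum, as for rwnd_press). */
static void over_decrease(struct over_model *a, uint32_t len, bool accumulate)
{
    assert(a);
    if (a->rwnd >= len) {
        a->rwnd -= len;
    } else {
        uint32_t over = len - a->rwnd;

        a->rwnd_over = accumulate ? a->rwnd_over + over : over;
        a->rwnd = 0;
    }
}

/* The application consumed len bytes: pay back rwnd_over first,
 * then reopen the window with whatever is left. */
static void over_increase(struct over_model *a, uint32_t len)
{
    assert(a);
    if (a->rwnd_over >= len) {
        a->rwnd_over -= len;
    } else {
        a->rwnd += len - a->rwnd_over;
        a->rwnd_over = 0;
    }
}
```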
      
      Log snippet from sctp_test exhibiting the issue:
      [  146.209232] sctp: sctp_assoc_rwnd_decrease: asoc:ffff88013928e000
      rwnd decreased by 1 to (0, 1, 114221)
      [  146.209232] sctp: sctp_assoc_rwnd_decrease:
      association:ffff88013928e000 has asoc->rwnd:0, asoc->rwnd_over:1!
      [  146.209232] sctp: sctp_assoc_rwnd_decrease: asoc:ffff88013928e000
      rwnd decreased by 1 to (0, 1, 114221)
      [  146.209232] sctp: sctp_assoc_rwnd_decrease:
      association:ffff88013928e000 has asoc->rwnd:0, asoc->rwnd_over:1!
      [  146.209232] sctp: sctp_assoc_rwnd_decrease: asoc:ffff88013928e000
      rwnd decreased by 1 to (0, 1, 114221)
      [  146.209232] sctp: sctp_assoc_rwnd_decrease:
      association:ffff88013928e000 has asoc->rwnd:0, asoc->rwnd_over:1!
      [  146.209232] sctp: sctp_assoc_rwnd_decrease: asoc:ffff88013928e000
      rwnd decreased by 1 to (0, 1, 114221)
      Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · a307d0a0
      Linus Torvalds authored
      Pull final vfs updates from Al Viro:
       "Assorted cleanups and fixes all over the place"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        sg_write()/bsg_write() is not fit to be called under KERNEL_DS
        ufs: fix function declaration for ufs_truncate_blocks
        fs: exec: apply CLOEXEC before changing dumpable task flags
        seq_file: reset iterator to first record for zero offset
        vfs: fix isize/pos/len checks for reflink & dedupe
        [iov_iter] fix iterate_all_kinds() on empty iterators
        move aio compat to fs/aio.c
        reorganize do_make_slave()
        clone_private_mount() doesn't need to touch namespace_sem
        remove a bogus claim about namespace_sem being held by callers of mnt_alloc_id()
    • Merge branch 'virtio-net-xdp-fixes' · e57cbe48
      David S. Miller authored
      Jason Wang says:
      
      ====================
      several fixups for virtio-net XDP
      
      Merry Xmas and a Happy New year to all:
      
      This series tries to fix several issues in virtio-net XDP, which fall
      into the following categories:
      
      - fix several issues during XDP linearizing
      - allow checksummed packets to work for XDP_PASS
      - make EWMA rx buffer size estimation work for XDP
      - forbid XDP when GUEST_UFO is supported
      - remove big packet XDP support
      - add XDP support for small buffers
      
      Please see individual patches for details.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • virtio-net: XDP support for small buffers · bb91accf
      Jason Wang authored
      Commit f600b690 ("virtio_net: Add XDP support") leaves the case of
      small receive buffers untouched. This confuses users who want to set
      up XDP but use small buffers. Rather than forbidding XDP in small
      buffer mode, let's make it work. XDP can then only operate at
      skb->data, since virtio-net creates skbs during refill; this is
      suboptimal and could be optimized in the future.
      
      Cc: John Fastabend <john.r.fastabend@intel.com>
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      Acked-by: John Fastabend <john.r.fastabend@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • virtio-net: remove big packet XDP codes · c47a43d3
      Jason Wang authored
      Since we in fact no longer allow XDP for big packets, remove the code for it.
      
      Cc: John Fastabend <john.r.fastabend@intel.com>
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • virtio-net: forbid XDP when VIRTIO_NET_F_GUEST_UFO is support · 92502fe8
      Jason Wang authored
      When VIRTIO_NET_F_GUEST_UFO is negotiated, the host could still send
      UFO packets that exceed a single page, which XDP cannot handle
      correctly. So this patch forbids setting XDP when GUEST_UFO is
      supported. While at it, also forbid XDP for ECN (which only comes
      from GRO) to prevent misconfiguration by the user.
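      A sketch of the kind of feature check described above, not the
      driver's code: the bit numbers for GUEST_ECN (9) and GUEST_UFO (10)
      follow the virtio specification's feature list, while the
      xdp_set_allowed() helper and its -EOPNOTSUPP return value are
      assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Feature bit numbers from the virtio specification. */
enum {
    F_GUEST_ECN = 9,
    F_GUEST_UFO = 10,
};

static int has_feature(uint64_t features, unsigned int bit)
{
    return (int)((features >> bit) & 1u);
}

/* Hypothetical check run before attaching an XDP program: refuse if the
 * host may deliver UFO packets larger than a page, or ECN packets
 * (which, per the commit message, only come from GRO). */
static int xdp_set_allowed(uint64_t negotiated)
{
    if (has_feature(negotiated, F_GUEST_UFO) ||
        has_feature(negotiated, F_GUEST_ECN))
        return -EOPNOTSUPP;
    return 0;
}
```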
      
      Cc: John Fastabend <john.r.fastabend@intel.com>
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      Acked-by: John Fastabend <john.r.fastabend@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • virtio-net: make rx buf size estimation works for XDP · 5c33474d
      Jason Wang authored
      We don't update the EWMA rx buffer size in the XDP case. This leads to
      underestimating the rx buffer size, which causes the host to use more
      than one buffer per packet and greatly increases the likelihood of
      XDP page linearization.
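      An exponentially weighted moving average (EWMA) is easy to sketch in
      integer arithmetic. The weight of 1/4 below is an assumption chosen
      for readability; the driver's actual weight and fixed-point scaling
      may differ. If a receive path never calls the update function, the
      average stays wherever it was last left, which is how the estimate
      can fall behind the real packet size.

```c
#include <assert.h>

/* Hypothetical integer EWMA with weight 1/4 (an assumption; the driver's
 * weight and scaling may differ). */
struct ewma_model {
    unsigned long avg;
};

/* avg += (sample - avg) / 4, written to avoid unsigned underflow. */
static void ewma_add(struct ewma_model *e, unsigned long sample)
{
    if (sample >= e->avg)
        e->avg += (sample - e->avg) / 4;
    else
        e->avg -= (e->avg - sample) / 4;
}
```

      In this model, feeding two 1500-byte samples into an average that
      starts at 0 yields 375 and then 656; skipping those updates would
      leave the estimate at 0.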
      
      Cc: John Fastabend <john.r.fastabend@intel.com>
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      Acked-by: John Fastabend <john.r.fastabend@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • virtio-net: unbreak csumed packets for XDP_PASS · b00f70b0
      Jason Wang authored
      We drop checksummed packets when doing XDP. This breaks XDP_PASS when
      GUEST_CSUM is supported. Fix this by allowing the csum flag to be set.
      With this patch, simple TCP works for XDP_PASS.
      
      Cc: John Fastabend <john.r.fastabend@intel.com>
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      Acked-by: John Fastabend <john.r.fastabend@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • virtio-net: correctly handle XDP_PASS for linearized packets · 1830f893
      Jason Wang authored
      When XDP_PASS is returned for linearized packets, we try to get new
      buffers from the virtqueue and build skbs from them. This is wrong: we
      should create skbs from the existing buffers instead. Fix this by
      creating the skb from xdp_page.
      
      With this patch "ping 192.168.100.4 -s 3900 -M do" works for XDP_PASS.
      
      Cc: John Fastabend <john.r.fastabend@intel.com>
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      Acked-by: John Fastabend <john.r.fastabend@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • virtio-net: fix page miscount during XDP linearizing · 56a86f84
      Jason Wang authored
      We don't put the page during linearizing, which causes a leak when the
      packet is transmitted through XDP_TX or exceeds PAGE_SIZE. Fix this by
      putting the page accordingly. Also decrease the number of buffers
      during linearizing so that the caller can free the remaining buffers
      correctly when the packet exceeds PAGE_SIZE. With this patch, we no
      longer hit OOM after linearizing a huge number of packets.
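      A simplified sketch of linearizing with correct reference accounting,
      assuming 4 KiB pages and a hypothetical rx_buf type whose refcnt
      stands in for page references; it is not the virtio-net code. Each
      source buffer is released as soon as it is copied, and the remaining
      count is decremented as it goes, so that on overflow the caller knows
      exactly how many buffers it still has to free.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MODEL_PAGE_SIZE 4096 /* assumption: 4 KiB pages */

struct rx_buf {
    const unsigned char *data;
    size_t len;
    int refcnt; /* 1 while the receive path still owns the buffer */
};

/* Copy *remaining buffers into page. Returns the number of bytes copied,
 * or -1 if the packet does not fit; in that case *remaining buffers,
 * starting at the first uncopied one, are still owned by the caller. */
static long linearize(struct rx_buf *bufs, int *remaining, unsigned char *page)
{
    size_t off = 0;
    struct rx_buf *b = bufs;

    while (*remaining > 0) {
        if (off + b->len > MODEL_PAGE_SIZE)
            return -1;
        memcpy(page + off, b->data, b->len);
        off += b->len;
        b->refcnt--;    /* "put" the source page once it is copied */
        (*remaining)--; /* keep the count accurate for error paths */
        b++;
    }
    return (long)off;
}
```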
      
      Cc: John Fastabend <john.r.fastabend@intel.com>
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      Acked-by: John Fastabend <john.r.fastabend@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • virtio-net: correctly xmit linearized page on XDP_TX · 275be061
      Jason Wang authored
      After linearizing, we should transmit the linearized page instead of
      the page of the first buffer, which may lead to unexpected results.
      With this patch, the correct packet is seen during XDP_TX.
      
      Cc: John Fastabend <john.r.fastabend@intel.com>
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      Acked-by: John Fastabend <john.r.fastabend@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • virtio-net: remove the warning before XDP linearizing · 73b62bd0
      Jason Wang authored
      Since we use an EWMA to estimate the rx buffer size, it is normal for
      a packet to span more than one buffer when that size is
      underestimated. As this is not a bug, remove the warning and correct
      the comment before XDP linearizing.
      
      Cc: John Fastabend <john.r.fastabend@intel.com>
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge tag 'befs-v4.10-rc1' of git://github.com/luisbg/linux-befs · fc26901b
      Linus Torvalds authored
      Pull befs updates from Luis de Bethencourt:
       "A series of small fixes and adding NFS export support"
      
      * tag 'befs-v4.10-rc1' of git://github.com/luisbg/linux-befs:
        befs: add NFS export support
        befs: remove trailing whitespaces
        befs: remove signatures from comments
        befs: fix style issues in header files
        befs: fix style issues in linuxvfs.c
        befs: fix typos in linuxvfs.c
        befs: fix style issues in io.c
        befs: fix style issues in inode.c
        befs: fix style issues in debug.c
    • Merge tag 'drm-fixes-for-4.10-rc1' of git://people.freedesktop.org/~airlied/linux · 01302aac
      Linus Torvalds authored
      Pull drm fixes from Dave Airlie:
       "Some fixes came in while I was out, mostly intel and amdgpu ones, with
        one ast fix"
      
      Daniel Vetter says:
       "This should also shut up the WARN_ON(!intel_dp->lane_count) noise"
      
      * tag 'drm-fixes-for-4.10-rc1' of git://people.freedesktop.org/~airlied/linux: (35 commits)
        drm/amdgpu: update tile table for oland/hainan
        drm/amdgpu: update tile table for verde
        drm/amdgpu: update rev id for verde
        drm/amdgpu: update golden setting for verde
        drm/amdgpu: update rev id for oland
        drm/amdgpu: update golden setting for oland
        drm/amdgpu: update rev id for hainan
        drm/amdgpu: update golden setting for hainan
        drm/amdgpu: update rev id for pitcairn
        drm/amdgpu: update golden setting for pitcairn
        drm/amdgpu: update golden setting/tiling table of tahiti
        drm/i915: skip the first 4k of stolen memory on everything >= gen8
        drm/i915: Fallback to single PAGE_SIZE segments for DMA remapping
        drm/i915: Fix use after free in logical_render_ring_init
        drm/i915: disable PSR by default on HSW/BDW
        drm/i915: Fix setting of boost freq tunable
        drm/i915: tune down the fast link training vs boot fail
        drm/i915: Reorder phys backing storage release
        drm/i915/gen9: Fix PCODE polling during SAGV disabling
        drm/i915/gen9: Fix PCODE polling during CDCLK change notification
        ...
    • Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma · 29691591
      Linus Torvalds authored
      Pull rdma fixes from Doug Ledford:
       "First round of -rc fixes for 4.10 kernel:
      
         - a series of qedr fixes
         - a series of rxe fixes
         - one i40iw fix
         - one cma fix
         - one cxgb4 fix"
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma:
        IB/rxe: Don't check for null ptr in send()
        IB/rxe: Drop future atomic/read packets rather than retrying
        IB/rxe: Use BTH_PSN_MASK when ACKing duplicate sends
        qedr: Always notify the verb consumer of flushed CQEs
        qedr: clear the vendor error field in the work completion
        qedr: post_send/recv according to QP state
        qedr: ignore inline flag in read verbs
        qedr: modify QP state to error when destroying it
        qedr: return correct value on modify qp
        qedr: return error if destroy CQ failed
        qedr: configure the number of CQEs on CQ creation
        i40iw: Set 128B as the only supported RQ WQE size
        IB/cma: Fix a race condition in iboe_addr_get_sgid()
        IB/rxe: Fix a memory leak in rxe_qp_cleanup()
        iw_cxgb4: set correct FetchBurstMax for QPs
    • Merge tag 'scsi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · f290cbac
      Linus Torvalds authored
      Pull late SCSI updates from James Bottomley:
       "This is mostly stuff which missed the initial pull.
      
        There's a new driver: qedi, and some ufs, ibmvscsis and ncr5380
        updates plus some assorted driver fixes and also a fix for the bug
        where if a device goes into a blocked state between configuration and
        sysfs device add (which can be a long time under async probing) it
        would become permanently blocked"
      
      * tag 'scsi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (30 commits)
        scsi: avoid a permanent stop of the scsi device's request queue
        scsi: mpt3sas: Recognize and act on iopriority info
        scsi: qla2xxx: Fix Target mode handling with Multiqueue changes.
        scsi: qla2xxx: Add Block Multi Queue functionality.
        scsi: qla2xxx: Add multiple queue pair functionality.
        scsi: qla2xxx: Utilize pci_alloc_irq_vectors/pci_free_irq_vectors calls.
        scsi: qla2xxx: Only allow operational MBX to proceed during RESET.
        scsi: hpsa: remove memory allocate failure message
        scsi: Update 3ware driver email addresses
        scsi: zfcp: fix rport unblock race with LUN recovery
        scsi: zfcp: do not trace pure benign residual HBA responses at default level
        scsi: zfcp: fix use-after-"free" in FC ingress path after TMF
        scsi: libcxgbi: return error if interface is not up
        scsi: cxgb4i: libcxgbi: add missing module_put()
        scsi: cxgb4i: libcxgbi: cxgb4: add T6 iSCSI completion feature
        scsi: cxgb4i: libcxgbi: add active open cmd for T6 adapters
        scsi: cxgb4i: use cxgb4_tp_smt_idx() to get smt_idx
        scsi: qedi: Add QLogic FastLinQ offload iSCSI driver framework.
        scsi: aacraid: remove wildcard for series 9 controllers
        scsi: ibmvscsi: add write memory barrier to CRQ processing
        ...
    • Merge tag 'arc-4.10-rc1-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc · 42e0372c
      Linus Torvalds authored
      Pull more ARC updates from Vineet Gupta:
      
       - Fix for aliasing VIPT dcache in old ARC700 cores
      
       - micro-optimization in ARC700 ProtV handler
      
       - Enable SG_CHAIN  [Vladimir]
      
       - ARC HS38 core intc default to prio 1
      
      * tag 'arc-4.10-rc1-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
        ARC: mm: arc700: Don't assume 2 colours for aliasing VIPT dcache
        ARC: mm: No need to save cache version in @cpuinfo
        ARC: enable SG chaining
        ARCv2: intc: default all interrupts to priority 1
        ARCv2: entry: document intr disable in hard isr
        ARC: ARCompact entry: elide re-reading ECR in ProtV handler
    • Merge branch 'mlxsw-router-fixes' · d3a51d6c
      David S. Miller authored
      Jiri Pirko says:
      
      ====================
      mlxsw: Router fixes
      
      Ido says:
      
      First two patches ensure we remove from the device's table neighbours
      that are considered to be dead by the neighbour core.
      
      The last patch removes nexthop groups from the device when they are no
      longer valid.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum_router: Correctly remove nexthop groups · 58312125
      Ido Schimmel authored
      At the end of the nexthop initialization process we determine whether
      the nexthop should be offloaded or not based on the NUD state of the
      neighbour representing it. After all the nexthops were initialized we
      refresh the nexthop group and potentially offload it to the device, in
      case some of the nexthops were resolved.
      
      Make the destruction of a nexthop group symmetric with its creation by
      marking all nexthops as invalid and then refresh the nexthop group to
      make sure it was removed from the device's tables.
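      A minimal model of the symmetric create/destroy pattern, using
      hypothetical nexthop and nh_group types rather than the mlxsw
      structures. Both paths end in the same refresh step, so destroying a
      group reliably removes it from the (modeled) device table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical nexthop: should_offload mirrors "neighbour is resolved". */
struct nexthop {
    bool should_offload;
};

struct nh_group {
    struct nexthop *nhs;
    size_t count;
    bool in_hw; /* whether the group is programmed in the device */
};

/* Offload the group if at least one nexthop is resolved, else remove it. */
static void nh_group_refresh(struct nh_group *g)
{
    bool any = false;

    for (size_t i = 0; i < g->count; i++)
        any = any || g->nhs[i].should_offload;
    g->in_hw = any;
}

static void nh_group_create(struct nh_group *g, struct nexthop *nhs, size_t count)
{
    g->nhs = nhs;
    g->count = count;
    g->in_hw = false;
    nh_group_refresh(g);
}

/* Destruction mirrors creation: invalidate every nexthop, then refresh. */
static void nh_group_destroy(struct nh_group *g)
{
    for (size_t i = 0; i < g->count; i++)
        g->nhs[i].should_offload = false;
    nh_group_refresh(g);
}
```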
      
      Fixes: b2157149 ("mlxsw: spectrum_router: Add the nexthop neigh activity update")
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum_router: Don't reflect dead neighs · 93a87e5e
      Ido Schimmel authored
      When a neighbour is considered to be dead, we should remove it from the
      device's table regardless of its NUD state.
      
      Without this patch, after setting a port to be administratively down we
      get the following errors when we periodically try to update the kernel
      about neighbours activity:
      
      [  461.947268] mlxsw_spectrum 0000:03:00.0 sw1p3: Failed to find
      matching neighbour for IP=192.168.100.2
      
      Fixes: a6bf9e93 ("mlxsw: spectrum_router: Offload neighbours based on NUD state change")
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • neigh: Send netevent after marking neigh as dead · 53f800e3
      Ido Schimmel authored
      neigh_cleanup_and_release() is always called after marking a neighbour
      as dead, but it only notifies user space and not in-kernel listeners of
      the netevent notification chain.
      
      This can cause multiple problems. In my specific use case, it causes the
      listener (a switch driver capable of L3 offloads) to believe a neighbour
      entry is still valid, and is thus erroneously kept in the device's
      table.
      
      Fix that by sending a netevent after marking the neighbour as dead.
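      A hypothetical model of this fix together with the "Don't reflect
      dead neighs" one, with a single listener standing in for the netevent
      notifier chain and nud_valid standing in for a valid NUD state. The
      listener removes an entry whenever the neighbour is dead, regardless
      of NUD state, but it can only do so if marking the neighbour dead
      actually notifies it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical neighbour model. */
struct neigh_model {
    bool dead;
    bool nud_valid; /* stands in for a valid NUD state */
};

/* Hypothetical offloading listener (e.g. a switch driver). */
struct offload_listener {
    bool in_hw; /* whether the neighbour is programmed in the device */
};

/* Listener callback: reflect only live, resolved neighbours. */
static void listener_neigh_update(struct offload_listener *l,
                                  const struct neigh_model *n)
{
    l->in_hw = !n->dead && n->nud_valid;
}

/* Mark the neighbour dead. notify_kernel=true models the fix (in-kernel
 * listeners are told); false models the old user-space-only notify. */
static void neigh_mark_dead(struct neigh_model *n, struct offload_listener *l,
                            bool notify_kernel)
{
    n->dead = true;
    if (notify_kernel)
        listener_neigh_update(l, n);
}
```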
      
      Fixes: a6bf9e93 ("mlxsw: spectrum_router: Offload neighbours based on NUD state change")
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6: handle -EFAULT from skb_copy_bits · a98f9175
      Dave Jones authored
      By setting certain socket options on ipv6 raw sockets, we can confuse the
      length calculation in rawv6_push_pending_frames triggering a BUG_ON.
      
      RIP: 0010:[<ffffffff817c6390>] [<ffffffff817c6390>] rawv6_sendmsg+0xc30/0xc40
      RSP: 0018:ffff881f6c4a7c18  EFLAGS: 00010282
      RAX: 00000000fffffff2 RBX: ffff881f6c681680 RCX: 0000000000000002
      RDX: ffff881f6c4a7cf8 RSI: 0000000000000030 RDI: ffff881fed0f6a00
      RBP: ffff881f6c4a7da8 R08: 0000000000000000 R09: 0000000000000009
      R10: ffff881fed0f6a00 R11: 0000000000000009 R12: 0000000000000030
      R13: ffff881fed0f6a00 R14: ffff881fee39ba00 R15: ffff881fefa93a80
      
      Call Trace:
       [<ffffffff8118ba23>] ? unmap_page_range+0x693/0x830
       [<ffffffff81772697>] inet_sendmsg+0x67/0xa0
       [<ffffffff816d93f8>] sock_sendmsg+0x38/0x50
       [<ffffffff816d982f>] SYSC_sendto+0xef/0x170
       [<ffffffff816da27e>] SyS_sendto+0xe/0x10
       [<ffffffff81002910>] do_syscall_64+0x50/0xa0
       [<ffffffff817f7cbc>] entry_SYSCALL64_slow_path+0x25/0x25
      
      Handle this by jumping to the failure path if skb_copy_bits() returns -EFAULT.
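      A userspace sketch of the defensive pattern, not the kernel's code:
      copy_bits() is a hypothetical stand-in for skb_copy_bits() that fails
      with -EFAULT when the requested range lies outside the data, and the
      caller propagates that error instead of asserting. The 1-byte case
      mirrors the reproducer's payload with a checksum requested at
      offset 0.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for skb_copy_bits(): copy len bytes starting at
 * offset, or fail with -EFAULT if that range is not inside the buffer. */
static int copy_bits(const uint8_t *buf, size_t buf_len, size_t offset,
                     void *to, size_t len)
{
    if (offset > buf_len || len > buf_len - offset)
        return -EFAULT;
    memcpy(to, buf + offset, len);
    return 0;
}

/* Read the 16-bit checksum field requested via IPV6_CHECKSUM. */
static int read_csum_field(const uint8_t *payload, size_t payload_len,
                           size_t csum_offset, uint16_t *csum)
{
    int err = copy_bits(payload, payload_len, csum_offset, csum, sizeof(*csum));

    if (err < 0)
        return err; /* take the failure path instead of BUG_ON() */
    return 0;
}
```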
      
      Reproducer:
      
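      #include <stdio.h>
      #include <stdlib.h>
      #include <string.h>
      #include <unistd.h>
      #include <sys/types.h>
      #include <sys/socket.h>
      #include <netinet/in.h>
      
      #define LEN 504
      
      int main(int argc, char* argv[])
      {
      	int fd;
      	int zero = 0;
      	char buf[LEN];
      
      	memset(buf, 0, LEN);
      
      	/* Raw IPv6 socket (needs CAP_NET_RAW); protocol 7 is arbitrary. */
      	fd = socket(AF_INET6, SOCK_RAW, 7);
      	if (fd < 0) {
      		perror("socket");
      		return EXIT_FAILURE;
      	}
      
      	/* Ask the kernel to fill in a checksum at offset 0 of the payload. */
      	setsockopt(fd, SOL_IPV6, IPV6_CHECKSUM, &zero, 4);
      	/* Attach a large, all-zero destination options header. */
      	setsockopt(fd, SOL_IPV6, IPV6_DSTOPTS, &buf, LEN);
      
      	/* Send a 1-byte payload; per the commit message, this sequence
      	 * triggers the BUG_ON shown above on unpatched kernels. */
      	sendto(fd, buf, 1, 0, (struct sockaddr *) buf, 110);
      
      	close(fd);
      	return 0;
      }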
      Signed-off-by: Dave Jones <davej@codemonkey.org.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • inet: fix IP(V6)_RECVORIGDSTADDR for udp sockets · 39b2dd76
      Willem de Bruijn authored
      The IP(V6)_RECVORIGDSTADDR socket cmsg checks that the port range lies
      within the packet. For sockets that have their transport headers
      pulled, the transport offset can be negative. Use a signed comparison
      to avoid overflow.
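      The pitfall can be shown in isolation. In C, comparing an int with an
      unsigned int converts the int to unsigned, so a negative offset wraps
      to a huge value and the range check wrongly reports the ports as out
      of range. The function names are illustrative; only the comparison
      mirrors the fix.

```c
#include <assert.h>
#include <stdbool.h>

/* Do the 4 bytes of UDP source/destination ports lie outside the packet?
 * transport_offset may be negative once the UDP header has been pulled. */
static bool ports_out_of_range_unsigned(int transport_offset, unsigned int len)
{
    /* transport_offset + 4 is converted to unsigned: -4 becomes huge. */
    return transport_offset + 4 > len;
}

static bool ports_out_of_range_signed(int transport_offset, unsigned int len)
{
    /* Compare as signed integers, as the fix does. */
    return transport_offset + 4 > (int)len;
}
```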
      
      Fixes: e6afc8ac ("udp: remove headers from UDP packets before queueing")
      Reported-by: Nisar Jagabar <njagabar@cloudmark.com>
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'cls_flower-act_tunnel_key-fixes' · 9aa340a5
      David S. Miller authored
      Or Gerlitz says:
      
      ====================
      net/sched fixes for cls_flower and act_tunnel_key
      
      This small series contains a fix to the flags matching support in
      flower and a fix to the tunnel key action's metadata setup for IPv6.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9aa340a5
    • Or Gerlitz's avatar
      net/sched: cls_flower: Mandate mask when matching on flags · d9724772
      Or Gerlitz authored
      When matching on flags, we should require the user to provide the
      mask instead of using an all-ones mask. Otherwise, a flags match
      given without a mask also requires every flag the user did not set
      to be unset, which may not be what the user wanted.
      
      Fixes: faa3ffce ('net/sched: cls_flower: Add support for matching on flags')
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Reported-by: Paul Blakey <paulb@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d9724772
    • Or Gerlitz's avatar
      net/sched: act_tunnel_key: Fix setting UDP dst port in metadata under IPv6 · dc594ecd
      Or Gerlitz authored
      The UDP dst port was passed to the helper function that sets the
      IPv6 IP tunnel metadata in the wrong parameter position; fix that.
      
      Fixes: 75bfbca0 ('net/sched: act_tunnel_key: Add UDP dst port option')
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Reviewed-by: Hadar Hen Zion <hadarh@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      dc594ecd
    • jpinto's avatar
      stmmac: CSR clock configuration fix · 567be786
      jpinto authored
      When testing stmmac with my QoS reference design, I found a problem in the
      CSR clock configuration that prevented PHY discovery, since every read
      operation returned 0x0000ffff. This patch fixes the issue.
      
      Signed-off-by: Joao Pinto <jpinto@synopsys.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      567be786
    • Al Viro's avatar
      Merge branch 'work.namespace' into for-linus · faf0dceb
      Al Viro authored
      faf0dceb
    • Al Viro's avatar
      sg_write()/bsg_write() is not fit to be called under KERNEL_DS · 128394ef
      Al Viro authored
      Both damn things interpret userland pointers embedded into the payload;
      worse, they are actually traversing those.  Leaving aside the bad
      API design, this is very much _not_ safe to call with KERNEL_DS.
      Bail out early if that happens.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      128394ef
    • Jeff Layton's avatar
      ufs: fix function declaration for ufs_truncate_blocks · f698cccb
      Jeff Layton authored
      sparse says:
      
          fs/ufs/inode.c:1195:6: warning: symbol 'ufs_truncate_blocks' was not declared. Should it be static?
      
      Note that the forward declaration in the file is already marked static.
      
      Signed-off-by: Jeff Layton <jlayton@redhat.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      f698cccb
    • Aleksa Sarai's avatar
      fs: exec: apply CLOEXEC before changing dumpable task flags · 613cc2b6
      Aleksa Sarai authored
      If you have a process that has set itself to be non-dumpable, and it
      then undergoes exec(2), any CLOEXEC file descriptors it has open are
      "exposed" during a race window between the dumpable flags of the process
      being reset for exec(2) and CLOEXEC being applied to the file
      descriptors. This can be exploited by a process by attempting to access
      /proc/<pid>/fd/... during this window, without requiring CAP_SYS_PTRACE.
      
      The race in question occurs after set_dumpable has been called (for
      get_link, though the trace is basically the same for readlink):
      
      [vfs]
      -> proc_pid_link_inode_operations.get_link
         -> proc_pid_get_link
            -> proc_fd_access_allowed
               -> ptrace_may_access(task, PTRACE_MODE_READ_FSCREDS);
      
      This will return 0 during the race window, and CLOEXEC file descriptors
      will still be open during this window because do_close_on_exec has not
      been called yet. As a result, the ordering of these calls should be
      reversed to avoid this race window.
      
      This is of particular concern to container runtimes, where joining a
      PID namespace with file descriptors referring to the host filesystem
      can result in security issues (since PR_SET_DUMPABLE doesn't protect
      against access to CLOEXEC file descriptors -- file descriptors which may
      reference filesystem objects the container shouldn't have access to).
      
      Cc: dev@opencontainers.org
      Cc: <stable@vger.kernel.org> # v3.2+
      Reported-by: Michael Crosby <crosbymichael@gmail.com>
      Signed-off-by: Aleksa Sarai <asarai@suse.de>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      613cc2b6
    • Tomasz Majchrzak's avatar
      seq_file: reset iterator to first record for zero offset · e522751d
      Tomasz Majchrzak authored
      If a kernfs file is empty on the first read, successive read operations
      using the same file descriptor will return no data, even when data is
      available. The default kernfs 'seq_next' implementation advances the
      iterator position even when the next object is not there, so kernfs
      'seq_start' for subsequent requests will not return an iterator, because
      the position is already on the second object.
      
      This defect makes it impossible to monitor the badblocks sysfs files of
      MD RAID: they are initially empty, and if data appears at some stage,
      userspace is not able to read it.
      
      Signed-off-by: Tomasz Majchrzak <tomasz.majchrzak@intel.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      e522751d
    • Darrick J. Wong's avatar
      vfs: fix isize/pos/len checks for reflink & dedupe · 22725ce4
      Darrick J. Wong authored
      Strengthen the checking of pos/len vs. i_size, clarify the return values
      for the clone prep function, and remove pointless code.
      
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      22725ce4
    • Al Viro's avatar
      [iov_iter] fix iterate_all_kinds() on empty iterators · 33844e66
      Al Viro authored
      Problem similar to ones dealt with in "fold checks into iterate_and_advance()"
      and followups, except that in this case we really want to do nothing when
      asked for zero-length operation - unlike zero-length iterate_and_advance(),
      zero-length iterate_all_kinds() has no side effects, and callers are simpler
      that way.
      
      This was exposed when tipc started using copy_from_iter_full(): tipc
      builds an msghdr with zero payload and (now) feeds it to a primitive
      based on iterate_all_kinds() instead of iterate_and_advance().
      
      Reported-by: Jon Maloy <jon.maloy@ericsson.com>
      Tested-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      33844e66
    • Al Viro's avatar
      move aio compat to fs/aio.c · c00d2c7e
      Al Viro authored
      ... and fix the minor buglet in compat io_submit(): the native one
      kills the ioctx as cleanup when put_user() fails.  Get rid of the
      bogus compat_... in the !CONFIG_AIO case, while we are at it - they
      should simply fail with ENOSYS, same as their native counterparts.
      
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      c00d2c7e
  2. 22 Dec, 2016 2 commits
    • James Bottomley's avatar
      Merge branch 'misc' into for-linus · 3eff4c78
      James Bottomley authored
      3eff4c78
    • Dave Airlie's avatar
      Merge tag 'drm-intel-next-fixes-2016-12-22' of... · 4a401cee
      Dave Airlie authored
      Merge tag 'drm-intel-next-fixes-2016-12-22' of git://anongit.freedesktop.org/git/drm-intel into drm-fixes
      
      First set of i915 fixes for code in next.
      
      * tag 'drm-intel-next-fixes-2016-12-22' of git://anongit.freedesktop.org/git/drm-intel:
        drm/i915: skip the first 4k of stolen memory on everything >= gen8
        drm/i915: Fallback to single PAGE_SIZE segments for DMA remapping
        drm/i915: Fix use after free in logical_render_ring_init
        drm/i915: disable PSR by default on HSW/BDW
        drm/i915: Fix setting of boost freq tunable
        drm/i915: tune down the fast link training vs boot fail
        drm/i915: Reorder phys backing storage release
        drm/i915/gen9: Fix PCODE polling during SAGV disabling
        drm/i915/gen9: Fix PCODE polling during CDCLK change notification
        drm/i915/dsi: Fix chv_exec_gpio disabling the GPIOs it is setting
        drm/i915/dsi: Fix swapping of MIPI_SEQ_DEASSERT_RESET / MIPI_SEQ_ASSERT_RESET
        drm/i915/dsi: Do not clear DPOUNIT_CLOCK_GATE_DISABLE from vlv_init_display_clock_gating
        drm/i915: drop the struct_mutex when wedged or trying to reset
      4a401cee