1. 25 Apr, 2017 14 commits
    • svcrdma: Use rdma_rw API in RPC reply path · 9a6a180b
      Chuck Lever authored
      The current svcrdma sendto code path posts one RDMA Write WR at a
      time. Each of these Writes typically carries a small number of pages
      (for instance, up to 30 pages for mlx4 devices). That means a 1MB
      NFS READ reply requires 9 ib_post_send() calls for the Write WRs,
      and one for the Send WR carrying the actual RPC Reply message.
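
The call count above follows from ceiling arithmetic. A minimal sketch, assuming 4 KiB pages (the page size is not stated in this message) and the 30-pages-per-Write figure quoted for mlx4; `post_send_calls` is a hypothetical helper, not a kernel function:

```python
import math

PAGE_SIZE = 4096       # assumption: 4 KiB pages
PAGES_PER_WRITE = 30   # figure quoted above for mlx4 devices


def post_send_calls(reply_bytes, pages_per_write=PAGES_PER_WRITE):
    """Count ib_post_send() calls when every Write WR is posted on its own."""
    pages = math.ceil(reply_bytes / PAGE_SIZE)
    write_posts = math.ceil(pages / pages_per_write)
    send_posts = 1  # the Send WR carrying the RPC Reply message
    return write_posts, send_posts


writes, sends = post_send_calls(1024 * 1024)
print(writes, sends)
```

A 1MB reply is 256 such pages, which need nine Writes of at most 30 pages each.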
      
      Instead, use the new rdma_rw API. The details of Write WR chain
      construction and memory registration are taken care of in the RDMA
      core. svcrdma can focus on the details of the RPC-over-RDMA
      protocol. This gives three main benefits:
      
      1. All Write WRs for one RDMA segment are posted in a single chain,
      so as few as one ib_post_send() is needed for each Write chunk.
      
      2. The Write path can now use FRWR to register the Write buffers.
      If the device's maximum page list depth is large, this means a
      single Write WR is needed for each RPC's Write chunk data.
      
      3. The new code introduces support for RPCs that carry both a Write
      list and a Reply chunk. This combination can be used for an NFSv4
      READ where the data payload is large, and thus is removed from the
      Payload Stream, but the Payload Stream is still larger than the
      inline threshold.
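
The combination described in point 3 can be modelled roughly as follows. This is a simplified sketch, not the svcrdma code: `reply_plan` and its parameters are hypothetical names, and the error case is an assumption about what happens when no Reply chunk was provided.

```python
def reply_plan(payload_stream_len, data_len, inline_threshold,
               has_write_list, has_reply_chunk):
    """Pick the mechanisms used to return one RPC reply (simplified model)."""
    plan = []
    remaining = payload_stream_len
    if has_write_list and data_len > 0:
        # The large data payload is removed from the Payload Stream.
        plan.append("write_list")
        remaining -= data_len
    if remaining > inline_threshold:
        if not has_reply_chunk:
            raise ValueError("reply too large and no Reply chunk provided")
        plan.append("reply_chunk")
    else:
        plan.append("inline_send")
    return plan


# NFSv4 READ: 1MB of data plus a remainder larger than the threshold.
print(reply_plan(1_100_000, 1_048_576, 4096, True, True))
```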
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
    • svcrdma: Introduce local rdma_rw API helpers · f13193f5
      Chuck Lever authored
      The plan is to replace the local bespoke code that constructs and
      posts RDMA Read and Write Work Requests with calls to the rdma_rw
      API. This shares code with other RDMA-enabled ULPs that manages the
      gory details of buffer registration and posting Work Requests.
      
      Some design notes:
      
       o The structure of RPC-over-RDMA transport headers is flexible,
         allowing multiple segments per Reply with arbitrary alignment,
         each with a unique R_key. Write and Send WRs continue to be
         built and posted in separate code paths. However, one whole
         chunk (with one or more RDMA segments apiece) gets exactly
         one ib_post_send and one work completion.
      
       o svc_xprt reference counting is modified, since a chain of
         rdma_rw_ctx structs generates one completion, no matter how
         many Write WRs are posted.
      
       o The current code builds the transport header as it is
         constructing Write WRs. I've replaced that with marshaling of transport
         header data items in a separate step. This is because the exact
         structure of client-provided segments may not align with the
         components of the server's reply xdr_buf, or the pages in the
         page list. Thus parts of each client-provided segment may be
         written at different points in the send path.
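
To illustrate the last design note, here is a small model of how reply components can straddle client-provided segments. It is only a sketch: `map_reply_to_segments` is a hypothetical helper, and the component names merely mirror the head/pages/tail parts of an xdr_buf.

```python
def map_reply_to_segments(components, segments):
    """Split reply components across segments.

    components: list of (name, length) in send order.
    segments:   list of segment capacities in bytes.
    Returns a list of (segment_index, component_name, byte_count).
    """
    writes = []
    seg_idx = 0
    seg_left = segments[0] if segments else 0
    for name, length in components:
        while length > 0:
            if seg_left == 0:
                # Current segment is full: move on to the next one.
                seg_idx += 1
                if seg_idx >= len(segments):
                    raise ValueError("reply does not fit in the segments")
                seg_left = segments[seg_idx]
                continue
            n = min(length, seg_left)
            writes.append((seg_idx, name, n))
            length -= n
            seg_left -= n
    return writes


for entry in map_reply_to_segments(
        [("head", 100), ("pages", 8192), ("tail", 4)],
        [4096, 4096, 4096]):
    print(entry)
```

In this example the "pages" component lands in three different segments, which is why the segment boundaries cannot be assumed to line up with the xdr_buf components.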
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
    • svcrdma: Clean up svc_rdma_get_inv_rkey() · c238c4c0
      Chuck Lever authored
      Replace C structure-based XDR decoding with more portable code that
      instead uses pointer arithmetic.
      
      This is a refactoring change only.
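
As an illustration of offset-based XDR decoding (as opposed to overlaying a structure on the buffer), here is a minimal sketch. It is not the body of svc_rdma_get_inv_rkey(); `decode_segments` is a hypothetical helper, and the handle/length/offset layout is assumed from the RPC-over-RDMA version 1 segment definition.

```python
import struct

# Assumed segment layout: handle (u32), length (u32), offset (u64), big-endian.
SEGMENT_FMT = ">IIQ"
SEGMENT_SIZE = struct.calcsize(SEGMENT_FMT)


def decode_segments(buf, offset, count):
    """Walk `count` segments starting at `offset`, advancing by hand."""
    segments = []
    for _ in range(count):
        handle, length, seg_offset = struct.unpack_from(SEGMENT_FMT, buf, offset)
        segments.append((handle, length, seg_offset))
        offset += SEGMENT_SIZE
    return segments, offset


buf = (struct.pack(SEGMENT_FMT, 0x1234, 4096, 0x1000)
       + struct.pack(SEGMENT_FMT, 0x5678, 512, 0x2000))
segments, end = decode_segments(buf, 0, 2)
print(segments, end)
```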
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
    • svcrdma: Add helper to save pages under I/O · c55ab070
      Chuck Lever authored
      Clean up: extract the logic to save pages under I/O into a helper to
      add a big documenting comment without adding clutter in the send
      path.
      
      This is a refactoring change only.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
    • svcrdma: Eliminate RPCRDMA_SQ_DEPTH_MULT · b623589d
      Chuck Lever authored
      The Send Queue depth is temporarily reduced to 1 SQE per credit. The
      new rdma_rw API does an internal computation, during QP creation, to
      increase the depth of the Send Queue to handle RDMA Read and Write
      operations.
      
      This change has to come before the NFSD code paths are updated to
      use the rdma_rw API. Without this patch, rdma_rw_init_qp() increases
      the size of the SQ too much, resulting in memory allocation failures
      during QP creation.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
    • svcrdma: Add svc_rdma_map_reply_hdr() · 6e6092ca
      Chuck Lever authored
      Introduce a helper to DMA-map a reply's transport header before
      sending it. This will in part replace the map vector cache.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
    • svcrdma: Move send_wr to svc_rdma_op_ctxt · 17f5f7f5
      Chuck Lever authored
      Clean up: Move the ib_send_wr off the stack, and move common code
      to post a Send Work Request into a helper.
      
      This is a refactoring change only.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
    • NFS: don't try to cross a mountpount when there isn't one there. · 99bbf6ec
      NeilBrown authored
      Consider the sequence of commands:
       mkdir -p /import/nfs /import/bind /import/etc
       mount --bind / /import/bind
       mount --make-private /import/bind
       mount --bind /import/etc /import/bind/etc
      
       exportfs -o rw,no_root_squash,crossmnt,async,no_subtree_check localhost:/
       mount -o vers=4 localhost:/ /import/nfs
       ls -l /import/nfs/etc
      
      You would not expect this to report a stale file handle.
      Yet it does.
      
      The manipulations under /import/bind cause the dentry for
      /etc to get the DCACHE_MOUNTED flag set, even though nothing
      is mounted on /etc.  This causes nfsd to call
      nfsd_cross_mnt() even though there is no mountpoint.  So an
      upcall to mountd for "/etc" is performed.
      
      The 'crossmnt' flag on the export of / causes mountd to
      report that /etc is exported as it is a descendant of /.  It
      assumes the kernel wouldn't ask about something that wasn't
      a mountpoint.  The filehandle returned identifies the
      filesystem and the inode number of /etc.
      
      When this filehandle is presented to rpc.mountd, via
      "nfsd.fh", the inode cannot be found associated with any
      name in /etc/exports, or with any mountpoint listed by
      getmntent().  So rpc.mountd says the filehandle doesn't
      exist. Hence ESTALE.
      
      This is fixed by teaching nfsd not to trust DCACHE_MOUNTED
      too much.  It is just a hint, not a guarantee.
      Change nfsd_mountpoint() to return '1' for a certain mountpoint,
      '2' for a possible mountpoint, and 0 otherwise.
      
      Then change nfsd_cross_mnt() to check if follow_down()
      actually found a mountpoint and, if not, to avoid performing
      a lookup if the location is not known to certainly require
      an export-point.
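
The resulting decision can be modelled as below. This is a sketch of the logic described above, not the nfsd code: the function names and the `known_export_point` input are hypothetical stand-ins for whatever makes the kernel certain that a mountpoint exists.

```python
NOT_A_MOUNTPOINT = 0
CERTAIN_MOUNTPOINT = 1
POSSIBLE_MOUNTPOINT = 2   # DCACHE_MOUNTED is set, but that is only a hint


def mountpoint_status(known_export_point, dcache_mounted):
    """Mirror the 0/1/2 return convention described for nfsd_mountpoint()."""
    if known_export_point:
        return CERTAIN_MOUNTPOINT
    if dcache_mounted:
        return POSSIBLE_MOUNTPOINT
    return NOT_A_MOUNTPOINT


def should_look_up_export(status, follow_down_found_mount):
    """Decide whether to perform the export lookup (the mountd upcall)."""
    if status == NOT_A_MOUNTPOINT:
        return False
    if follow_down_found_mount:
        return True
    # Nothing was actually mounted: only look up if we are certain.
    return status == CERTAIN_MOUNTPOINT


# The /etc case above: the flag is set, but nothing is mounted there.
status = mountpoint_status(known_export_point=False, dcache_mounted=True)
print(status, should_look_up_export(status, follow_down_found_mount=False))
```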
      Signed-off-by: NeilBrown <neilb@suse.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
    • nfsd4: remove pointless strdup_if_nonnull · 2f10fdcb
      NeilBrown authored
      kstrdup() already checks for NULL.
      
      (Brought to our attention by Jason Yan noticing (from sparse output)
      that it should have been declared static.)
      Signed-off-by: NeilBrown <neilb@suse.com>
      Reported-by: Jason Yan <yanaijie@huawei.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
    • uapi: fix linux/nfsd/cld.h userspace compilation errors · 16719199
      Dmitry V. Levin authored
      Include <linux/types.h> and consistently use types it provides
      to fix the following linux/nfsd/cld.h userspace compilation errors:
      
      /usr/include/linux/nfsd/cld.h:40:2: error: unknown type name 'uint16_t'
        uint16_t cn_len;    /* length of cm_id */
      /usr/include/linux/nfsd/cld.h:46:2: error: unknown type name 'uint8_t'
        uint8_t  cm_vers;  /* upcall version */
      /usr/include/linux/nfsd/cld.h:47:2: error: unknown type name 'uint8_t'
        uint8_t  cm_cmd;   /* upcall command */
      /usr/include/linux/nfsd/cld.h:48:2: error: unknown type name 'int16_t'
        int16_t  cm_status;  /* return code */
      /usr/include/linux/nfsd/cld.h:49:2: error: unknown type name 'uint32_t'
        uint32_t cm_xid;   /* transaction id */
      /usr/include/linux/nfsd/cld.h:51:3: error: unknown type name 'int64_t'
         int64_t  cm_gracetime; /* grace period start time */
      Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
    • nfsd: check for oversized NFSv2/v3 arguments · 51f56777
      J. Bruce Fields authored
      A client can append random data to the end of an NFSv2 or NFSv3 RPC call
      without our complaining; we'll just stop parsing at the end of the
      expected data and ignore the rest.
      
      Encoded arguments and replies are stored together in an array of pages,
      and if a call is too large it could leave inadequate space for the
      reply.  This is normally OK because NFS RPC's typically have either
      short arguments and long replies (like READ) or long arguments and short
      replies (like WRITE).  But a client that sends an incorrectly long call
      can violate those assumptions.  This was observed to cause crashes.
      
      So, insist that the argument not be any longer than we expect.
      
      Also, several operations increment rq_next_page in the decode routine
      before checking the argument size, which can leave rq_next_page pointing
      well past the end of the page array, causing trouble later in
      svc_free_pages.
      
      As followup we may also want to rewrite the encoding routines to check
      more carefully that they aren't running off the end of the page array.
      Reported-by: Tuomas Haanpää <thaan@synopsys.com>
      Reported-by: Ari Kauppi <ari@synopsys.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
    • nfsd: stricter decoding of write-like NFSv2/v3 ops · 13bf9fbf
      J. Bruce Fields authored
      The NFSv2/v3 code does not systematically check whether we decode past
      the end of the buffer.  This generally appears to be harmless, but there
      are a few places where we do arithmetic on the pointers involved and
      don't account for the possibility that a length could be negative.  Add
      checks to catch these.
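
The kind of check meant here can be sketched as follows. This is an illustration only, not the nfsd decoder: `decode_opaque_len` is a hypothetical helper that reads a 32-bit length as a signed value and refuses lengths that are negative or run past the end of the buffer.

```python
import struct


def decode_opaque_len(buf, offset):
    """Read a big-endian 32-bit length at `offset` and validate it."""
    (length,) = struct.unpack_from(">i", buf, offset)  # signed on purpose
    remaining = len(buf) - (offset + 4)
    if length < 0 or length > remaining:
        raise ValueError("length field is negative or exceeds the buffer")
    return length


good = struct.pack(">i", 8) + b"x" * 8
print(decode_opaque_len(good, 0))
```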
      Reported-by: Tuomas Haanpää <thaan@synopsys.com>
      Reported-by: Ari Kauppi <ari@synopsys.com>
      Reviewed-by: NeilBrown <neilb@suse.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
    • nfsd4: minor NFSv2/v3 write decoding cleanup · db44bac4
      J. Bruce Fields authored
      Use a couple shortcuts that will simplify a following bugfix.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
    • nfsd: check for oversized NFSv2/v3 arguments · e6838a29
      J. Bruce Fields authored
      A client can append random data to the end of an NFSv2 or NFSv3 RPC call
      without our complaining; we'll just stop parsing at the end of the
      expected data and ignore the rest.
      
      Encoded arguments and replies are stored together in an array of pages,
      and if a call is too large it could leave inadequate space for the
      reply.  This is normally OK because NFS RPC's typically have either
      short arguments and long replies (like READ) or long arguments and short
      replies (like WRITE).  But a client that sends an incorrectly long call
      can violate those assumptions.  This was observed to cause crashes.
      
      Also, several operations increment rq_next_page in the decode routine
      before checking the argument size, which can leave rq_next_page pointing
      well past the end of the page array, causing trouble later in
      svc_free_pages.
      
      So, following a suggestion from Neil Brown, add a central check to
      enforce our expectation that no NFSv2/v3 call has both a large call and
      a large reply.
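
A rough model of such a central check is shown below. It is a sketch under stated assumptions, not the patch itself: `call_too_big` is a hypothetical name, and using one page as the boundary between "small" and "large" is an assumption made for the example.

```python
PAGE_SIZE = 4096  # assumption: one page separates "small" from "large"


def call_too_big(nfs_version, arg_len, max_reply_len):
    """Reject NFSv2/v3 calls that pair large arguments with a large reply."""
    if nfs_version >= 4:
        return False          # the check targets NFSv2/v3 only
    if max_reply_len < PAGE_SIZE:
        return False          # short reply: long arguments are fine (WRITE)
    return arg_len > PAGE_SIZE  # long reply: arguments must stay short (READ)


print(call_too_big(3, arg_len=200, max_reply_len=1 << 20))     # READ-like
print(call_too_big(3, arg_len=1 << 20, max_reply_len=200))     # WRITE-like
print(call_too_big(3, arg_len=1 << 20, max_reply_len=1 << 20)) # padded READ
```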
      
      As followup we may also want to rewrite the encoding routines to check
      more carefully that they aren't running off the end of the page array.
      
      We may also consider rejecting calls that have any extra garbage
      appended.  That would be safer, and within our rights by spec, but given
      the age of our server and the NFS protocol, and the fact that we've
      never enforced this before, we may need to balance that against the
      possibility of breaking some oddball client.
      Reported-by: Tuomas Haanpää <thaan@synopsys.com>
      Reported-by: Ari Kauppi <ari@synopsys.com>
      Cc: stable@vger.kernel.org
      Reviewed-by: NeilBrown <neilb@suse.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
  2. 23 Apr, 2017 4 commits
  3. 21 Apr, 2017 22 commits
    • Merge tag 'nfsd-4.11-2' of git://linux-nfs.org/~bfields/linux · 94836ecf
      Linus Torvalds authored
      Pull nfsd bugfix from Bruce Fields:
       "Fix a 4.11 regression that triggers a BUG() on an attempt to use an
        unsupported NFSv4 compound op"
      
      * tag 'nfsd-4.11-2' of git://linux-nfs.org/~bfields/linux:
        nfsd: fix oops on unsupported operation
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 057a650b
      Linus Torvalds authored
      Pull networking fixes from David Miller:
      
       1) Don't race in IPSEC dumps, from Yuejie Shi.
      
       2) Verify lengths properly in IPSEC requests, from Herbert Xu.
      
       3) Fix out of bounds access in ipv6 segment routing code, from David
          Lebrun.
      
       4) Don't write into the header of cloned SKBs in smsc95xx driver, from
          James Hughes.
      
       5) Several other drivers have this bug too, fix them. From Eric
          Dumazet.
      
       6) Fix access to uninitialized data in TC action cookie code, from
          Wolfgang Bumiller.
      
       7) Fix double free in IPV6 segment routing, again from David Lebrun.
      
       8) Don't let userspace set the RTF_PCPU flag, oops. From David Ahern.
      
       9) Fix use after free in qrtr code, from Dan Carpenter.
      
      10) Don't double-destroy devices in ip6mr code, from Nikolay
          Aleksandrov.
      
      11) Don't pass out-of-range TX queue indices into drivers, from Tushar
          Dave.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (30 commits)
        netpoll: Check for skb->queue_mapping
        ip6mr: fix notification device destruction
        bpf, doc: update bpf maintainers entry
        net: qrtr: potential use after free in qrtr_sendmsg()
        bpf: Fix values type used in test_maps
        net: ipv6: RTF_PCPU should not be settable from userspace
        gso: Validate assumption of frag_list segementation
        kaweth: use skb_cow_head() to deal with cloned skbs
        ch9200: use skb_cow_head() to deal with cloned skbs
        lan78xx: use skb_cow_head() to deal with cloned skbs
        sr9700: use skb_cow_head() to deal with cloned skbs
        cx82310_eth: use skb_cow_head() to deal with cloned skbs
        smsc75xx: use skb_cow_head() to deal with cloned skbs
        ipv6: sr: fix double free of skb after handling invalid SRH
        MAINTAINERS: Add "B:" field for networking.
        net sched actions: allocate act cookie early
        qed: Fix issue in populating the PFC config paramters.
        qed: Fix possible system hang in the dcbnl-getdcbx() path.
        qed: Fix sending an invalid PFC error mask to MFW.
        qed: Fix possible error in populating max_tc field.
        ...
    • netpoll: Check for skb->queue_mapping · c70b17b7
      Tushar Dave authored
      Reducing real_num_tx_queues needs to be in sync with skb queue_mapping;
      otherwise skbs with queue_mapping greater than real_num_tx_queues
      can be sent to the underlying driver and can result in a kernel panic.

      One such event is running netconsole and enabling VF on the same
      device, or running netconsole and changing the number of tx queues via
      ethtool on the same device.
      
      e.g.
      Unable to handle kernel NULL pointer dereference
      tsk->{mm,active_mm}->context = 0000000000001525
      tsk->{mm,active_mm}->pgd = fff800130ff9a000
                    \|/ ____ \|/
                    "@'/ .. \`@"
                    /_| \__/ |_\
                       \__U_/
      kworker/48:1(475): Oops [#1]
      CPU: 48 PID: 475 Comm: kworker/48:1 Tainted: G           OE
      4.11.0-rc3-davem-net+ #7
      Workqueue: events queue_process
      task: fff80013113299c0 task.stack: fff800131132c000
      TSTATE: 0000004480e01600 TPC: 00000000103f9e3c TNPC: 00000000103f9e40 Y:
      00000000    Tainted: G           OE
      TPC: <ixgbe_xmit_frame_ring+0x7c/0x6c0 [ixgbe]>
      g0: 0000000000000000 g1: 0000000000003fff g2: 0000000000000000 g3:
      0000000000000001
      g4: fff80013113299c0 g5: fff8001fa6808000 g6: fff800131132c000 g7:
      00000000000000c0
      o0: fff8001fa760c460 o1: fff8001311329a50 o2: fff8001fa7607504 o3:
      0000000000000003
      o4: fff8001f96e63a40 o5: fff8001311d77ec0 sp: fff800131132f0e1 ret_pc:
      000000000049ed94
      RPC: <set_next_entity+0x34/0xb80>
      l0: 0000000000000000 l1: 0000000000000800 l2: 0000000000000000 l3:
      0000000000000000
      l4: 000b2aa30e34b10d l5: 0000000000000000 l6: 0000000000000000 l7:
      fff8001fa7605028
      i0: fff80013111a8a00 i1: fff80013155a0780 i2: 0000000000000000 i3:
      0000000000000000
      i4: 0000000000000000 i5: 0000000000100000 i6: fff800131132f1a1 i7:
      00000000103fa4b0
      I7: <ixgbe_xmit_frame+0x30/0xa0 [ixgbe]>
      Call Trace:
       [00000000103fa4b0] ixgbe_xmit_frame+0x30/0xa0 [ixgbe]
       [0000000000998c74] netpoll_start_xmit+0xf4/0x200
       [0000000000998e10] queue_process+0x90/0x160
       [0000000000485fa8] process_one_work+0x188/0x480
       [0000000000486410] worker_thread+0x170/0x4c0
       [000000000048c6b8] kthread+0xd8/0x120
       [0000000000406064] ret_from_fork+0x1c/0x2c
       [0000000000000000]           (null)
      Disabling lock debugging due to kernel taint
      Caller[00000000103fa4b0]: ixgbe_xmit_frame+0x30/0xa0 [ixgbe]
      Caller[0000000000998c74]: netpoll_start_xmit+0xf4/0x200
      Caller[0000000000998e10]: queue_process+0x90/0x160
      Caller[0000000000485fa8]: process_one_work+0x188/0x480
      Caller[0000000000486410]: worker_thread+0x170/0x4c0
      Caller[000000000048c6b8]: kthread+0xd8/0x120
      Caller[0000000000406064]: ret_from_fork+0x1c/0x2c
      Caller[0000000000000000]:           (null)
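
The check this message calls for can be modelled as below. It is only a sketch: `clamp_queue_index` is a hypothetical name, and wrapping the index with a modulo is an assumed remedy, since the text above only says the two values must be kept in sync.

```python
def clamp_queue_index(queue_index, real_num_tx_queues):
    """Keep a stale queue index inside the device's current TX queue range."""
    if real_num_tx_queues <= 0:
        raise ValueError("device has no TX queues")
    if queue_index >= real_num_tx_queues:
        # The queue count shrank after the skb was queued: wrap the index.
        return queue_index % real_num_tx_queues
    return queue_index


# An skb mapped to queue 48 after the device dropped to 8 queues.
print(clamp_queue_index(48, 8))
```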
      Signed-off-by: Tushar Dave <tushar.n.dave@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ip6mr: fix notification device destruction · 723b929c
      Nikolay Aleksandrov authored
      Andrey Konovalov reported a BUG in the ip6mr code, triggered because
      we call unregister_netdevice_many for a device that is already
      being destroyed. In IPv4's ipmr this was resolved a long time ago by
      two commits that introduced the "notify" parameter to the delete
      function and avoided the unregister when called from a notifier, so
      let's do the same for ip6mr.
      
      The trace from Andrey:
      ------------[ cut here ]------------
      kernel BUG at net/core/dev.c:6813!
      invalid opcode: 0000 [#1] SMP KASAN
      Modules linked in:
      CPU: 1 PID: 1165 Comm: kworker/u4:3 Not tainted 4.11.0-rc7+ #251
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs
      01/01/2011
      Workqueue: netns cleanup_net
      task: ffff880069208000 task.stack: ffff8800692d8000
      RIP: 0010:rollback_registered_many+0x348/0xeb0 net/core/dev.c:6813
      RSP: 0018:ffff8800692de7f0 EFLAGS: 00010297
      RAX: ffff880069208000 RBX: 0000000000000002 RCX: 0000000000000001
      RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88006af90569
      RBP: ffff8800692de9f0 R08: ffff8800692dec60 R09: 0000000000000000
      R10: 0000000000000006 R11: 0000000000000000 R12: ffff88006af90070
      R13: ffff8800692debf0 R14: dffffc0000000000 R15: ffff88006af90000
      FS:  0000000000000000(0000) GS:ffff88006cb00000(0000)
      knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007fe7e897d870 CR3: 00000000657e7000 CR4: 00000000000006e0
      Call Trace:
       unregister_netdevice_many.part.105+0x87/0x440 net/core/dev.c:7881
       unregister_netdevice_many+0xc8/0x120 net/core/dev.c:7880
       ip6mr_device_event+0x362/0x3f0 net/ipv6/ip6mr.c:1346
       notifier_call_chain+0x145/0x2f0 kernel/notifier.c:93
       __raw_notifier_call_chain kernel/notifier.c:394
       raw_notifier_call_chain+0x2d/0x40 kernel/notifier.c:401
       call_netdevice_notifiers_info+0x51/0x90 net/core/dev.c:1647
       call_netdevice_notifiers net/core/dev.c:1663
       rollback_registered_many+0x919/0xeb0 net/core/dev.c:6841
       unregister_netdevice_many.part.105+0x87/0x440 net/core/dev.c:7881
       unregister_netdevice_many net/core/dev.c:7880
       default_device_exit_batch+0x4fa/0x640 net/core/dev.c:8333
       ops_exit_list.isra.4+0x100/0x150 net/core/net_namespace.c:144
       cleanup_net+0x5a8/0xb40 net/core/net_namespace.c:463
       process_one_work+0xc04/0x1c10 kernel/workqueue.c:2097
       worker_thread+0x223/0x19c0 kernel/workqueue.c:2231
       kthread+0x35e/0x430 kernel/kthread.c:231
       ret_from_fork+0x31/0x40 arch/x86/entry/entry_64.S:430
      Code: 3c 32 00 0f 85 70 0b 00 00 48 b8 00 02 00 00 00 00 ad de 49 89
      47 78 e9 93 fe ff ff 49 8d 57 70 49 8d 5f 78 eb 9e e8 88 7a 14 fe <0f>
      0b 48 8b 9d 28 fe ff ff e8 7a 7a 14 fe 48 b8 00 00 00 00 00
      RIP: rollback_registered_many+0x348/0xeb0 RSP: ffff8800692de7f0
      ---[ end trace e0b29c57e9b3292c ]---
      Reported-by: Andrey Konovalov <andreyknvl@google.com>
      Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Tested-by: Andrey Konovalov <andreyknvl@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bpf, doc: update bpf maintainers entry · cdb90499
      Daniel Borkmann authored
      Add various related files that have been missing under
      BPF entry covering essential parts of its infrastructure
      and also add myself as co-maintainer.
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: qrtr: potential use after free in qrtr_sendmsg() · 6f60f438
      Dan Carpenter authored
      If skb_pad() fails then it frees the skb so we should check for errors.
      
      Fixes: bdabad3e ("net: Add Qualcomm IPC router")
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bpf: Fix values type used in test_maps · 89087c45
      David Miller authored
      Maps of per-cpu type have their value element size adjusted to 8 if it
      is specified smaller during various map operations.
      
      This makes test_maps fail when built as a 32-bit binary; in fact, the
      kernel writes past the end of the value's array on the user's stack.
      
      To be quite honest, I think the kernel should reject creation of a
      per-cpu map that doesn't have a value size of at least 8 if that's
      what the kernel is going to silently adjust to later.
      
      If the user passed something smaller, it is a sizeof() calculation
      based upon the type they will actually use (just like in this testcase
      code) in later calls to the map operations.
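
The size mismatch can be worked through with a small sketch. The rounding to a multiple of 8 per CPU is taken from the description above; the helper names and the four-CPU example are assumptions for illustration.

```python
def round_up(value, multiple):
    """Round `value` up to the next multiple of `multiple`."""
    return -(-value // multiple) * multiple


def percpu_bytes_written(value_size, num_cpus):
    """Bytes the kernel fills in for one per-cpu lookup after adjustment."""
    return round_up(value_size, 8) * num_cpus


value_size = 4   # e.g. a 32-bit value type in the test
num_cpus = 4     # assumption for the example
user_buffer = value_size * num_cpus
print(user_buffer, percpu_bytes_written(value_size, num_cpus))
```

With these numbers the caller reserves 16 bytes but 32 are written, which is the overrun described above.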
      
      Fixes: df570f57 ("samples/bpf: unit test for BPF_MAP_TYPE_PERCPU_ARRAY")
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
    • net: ipv6: RTF_PCPU should not be settable from userspace · 557c44be
      David Ahern authored
      Andrey reported a fault in the IPv6 route code:
      
      kasan: GPF could be caused by NULL-ptr deref or user memory access
      general protection fault: 0000 [#1] SMP KASAN
      Modules linked in:
      CPU: 1 PID: 4035 Comm: a.out Not tainted 4.11.0-rc7+ #250
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
      task: ffff880069809600 task.stack: ffff880062dc8000
      RIP: 0010:ip6_rt_cache_alloc+0xa6/0x560 net/ipv6/route.c:975
      RSP: 0018:ffff880062dced30 EFLAGS: 00010206
      RAX: dffffc0000000000 RBX: ffff8800670561c0 RCX: 0000000000000006
      RDX: 0000000000000003 RSI: ffff880062dcfb28 RDI: 0000000000000018
      RBP: ffff880062dced68 R08: 0000000000000001 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
      R13: ffff880062dcfb28 R14: dffffc0000000000 R15: 0000000000000000
      FS:  00007feebe37e7c0(0000) GS:ffff88006cb00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00000000205a0fe4 CR3: 000000006b5c9000 CR4: 00000000000006e0
      Call Trace:
       ip6_pol_route+0x1512/0x1f20 net/ipv6/route.c:1128
       ip6_pol_route_output+0x4c/0x60 net/ipv6/route.c:1212
      ...
      
      Andrey's syzkaller program passes rtmsg.rtmsg_flags with the RTF_PCPU bit
      set. Flags passed to the kernel are blindly copied to the allocated
      rt6_info by ip6_route_info_create making a newly inserted route appear
      as though it is a per-cpu route. ip6_rt_cache_alloc sees the flag set
      and expects rt->dst.from to be set - which it is not since it is not
      really a per-cpu copy. The subsequent call to __ip6_dst_alloc then
      generates the fault.
      
      Fix by checking for the flag and failing with EINVAL.
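
The shape of the fix can be sketched as follows. This is a model rather than the route.c change: `validate_route_flags` is a hypothetical name, and the numeric value of RTF_PCPU is an assumption based on the uapi header.

```python
import errno

RTF_PCPU = 0x40000000  # assumed value; the flag is meant to be kernel-internal


def validate_route_flags(flags):
    """Return 0 if the flags are acceptable from userspace, else -EINVAL."""
    if flags & RTF_PCPU:
        return -errno.EINVAL
    return 0


print(validate_route_flags(0x0001))
print(validate_route_flags(0x0001 | RTF_PCPU))
```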
      
      Fixes: d52d3997 ("ipv6: Create percpu rt6_info")
      Reported-by: Andrey Konovalov <andreyknvl@google.com>
      Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Tested-by: Andrey Konovalov <andreyknvl@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • gso: Validate assumption of frag_list segementation · 43170c4e
      Ilan Tayari authored
      Commit 07b26c94 ("gso: Support partial splitting at the frag_list
      pointer") assumes that all SKBs in a frag_list (except maybe the last
      one) contain the same amount of GSO payload.
      
      This assumption is not always correct, resulting in the following
      warning message in the log:
          skb_segment: too many frags
      
      For example, mlx5 driver in Striding RQ mode creates some RX SKBs with
      one frag, and some with 2 frags.
      After GRO, the frag_list SKBs end up having different amounts of payload.
      If this frag_list SKB is then forwarded, the aforementioned assumption
      is violated.
      
      Validate the assumption, and fall back to software GSO if it is not true.
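
The validation can be modelled as below. It is a sketch of the stated assumption only, not the skb_segment() change; the function names and the returned strategy labels are hypothetical.

```python
def frag_list_is_uniform(payload_lens):
    """True if every frag_list SKB except possibly the last has equal payload."""
    if len(payload_lens) <= 1:
        return True
    first = payload_lens[0]
    return all(n == first for n in payload_lens[:-1])


def choose_gso_strategy(payload_lens):
    """Use partial splitting only when the assumption holds."""
    if frag_list_is_uniform(payload_lens):
        return "partial_split"
    return "software_gso"


print(choose_gso_strategy([1448, 1448, 1448, 600]))
print(choose_gso_strategy([1448, 2896, 1448, 600]))
```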
      
      Change-Id: Ia03983f4a47b6534dd987d7a2aad96d54d46d212
      Fixes: 07b26c94 ("gso: Support partial splitting at the frag_list pointer")
      Signed-off-by: Ilan Tayari <ilant@mellanox.com>
      Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'skb_cow_head' · 918b7024
      David S. Miller authored
      Eric Dumazet says:
      
      ====================
      net: use skb_cow_head() to deal with cloned skbs
      
      James Hughes found an issue with smsc95xx driver. Same problematic code
      is found in other drivers.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • kaweth: use skb_cow_head() to deal with cloned skbs · 39fba783
      Eric Dumazet authored
      We can use skb_cow_head() to properly deal with clones,
      especially those coming from the TCP stack, which allow their head
      to be modified. This avoids a copy.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: James Hughes <james.hughes@raspberrypi.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ch9200: use skb_cow_head() to deal with cloned skbs · 6bc6895b
      Eric Dumazet authored
      We need to ensure there is enough headroom to push extra header,
      but we also need to check if we are allowed to change headers.
      
      skb_cow_head() is the proper helper to deal with this.
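
The two conditions named above can be captured in a tiny model. It is not the kernel's skb_cow_head(); `must_reallocate_head` and its parameters are hypothetical and only mirror the reasoning in this message.

```python
def must_reallocate_head(headroom, needed_headroom, header_shared):
    """Decide whether the header area needs a private, larger copy.

    headroom:        bytes currently free in front of the packet data.
    needed_headroom: bytes the driver wants to push as an extra header.
    header_shared:   True if a clone may still rely on the current header.
    """
    return headroom < needed_headroom or header_shared


print(must_reallocate_head(16, 8, header_shared=False))  # room, private
print(must_reallocate_head(16, 8, header_shared=True))   # room, but shared
print(must_reallocate_head(4, 8, header_shared=False))   # not enough room
```

Checking headroom alone misses the second case, which is the bug these driver fixes address.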
      
      Fixes: 4a476bd6 ("usbnet: New driver for QinHeng CH9200 devices")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: James Hughes <james.hughes@raspberrypi.org>
      Cc: Matthew Garrett <mjg59@srcf.ucam.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • lan78xx: use skb_cow_head() to deal with cloned skbs · d4ca7359
      Eric Dumazet authored
      We need to ensure there is enough headroom to push extra header,
      but we also need to check if we are allowed to change headers.
      
      skb_cow_head() is the proper helper to deal with this.
      
      Fixes: 55d7de9d ("Microchip's LAN7800 family USB 2/3 to 10/100/1000 Ethernet device driver")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: James Hughes <james.hughes@raspberrypi.org>
      Cc: Woojung Huh <woojung.huh@microchip.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sr9700: use skb_cow_head() to deal with cloned skbs · d532c108
      Eric Dumazet authored
      We need to ensure there is enough headroom to push extra header,
      but we also need to check if we are allowed to change headers.
      
      skb_cow_head() is the proper helper to deal with this.
      
      Fixes: c9b37458 ("USB2NET : SR9700 : One chip USB 1.1 USB2NET SR9700Device Driver Support")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: James Hughes <james.hughes@raspberrypi.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d532c108
    • Eric Dumazet's avatar
      cx82310_eth: use skb_cow_head() to deal with cloned skbs · a9e840a2
      Eric Dumazet authored
      We need to ensure there is enough headroom to push extra header,
      but we also need to check if we are allowed to change headers.
      
      skb_cow_head() is the proper helper to deal with this.
      
      Fixes: cc28a20e ("introduce cx82310_eth: Conexant CX82310-based ADSL router USB ethernet driver")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: James Hughes <james.hughes@raspberrypi.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a9e840a2
    • Eric Dumazet's avatar
      smsc75xx: use skb_cow_head() to deal with cloned skbs · b7c6d267
      Eric Dumazet authored
      We need to ensure there is enough headroom to push extra header,
      but we also need to check if we are allowed to change headers.
      
      skb_cow_head() is the proper helper to deal with this.
      
      Fixes: d0cad871 ("smsc75xx: SMSC LAN75xx USB gigabit ethernet adapter driver")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: James Hughes <james.hughes@raspberrypi.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b7c6d267
    • David Lebrun's avatar
      ipv6: sr: fix double free of skb after handling invalid SRH · 95b9b88d
      David Lebrun authored
      The icmpv6_param_prob() function already does a kfree_skb(),
      this patch removes the duplicate one.
      
      Fixes: 1ababeba ("ipv6: implement dataplane support for rthdr type 4 (Segment Routing Header)")
      Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
      Cc: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: David Lebrun <david.lebrun@uclouvain.be>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      95b9b88d
    • Linus Torvalds's avatar
      Merge tag 'powerpc-4.11-8' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux · 92b4fc75
      Linus Torvalds authored
      Pull powerpc fixes from Michael Ellerman:
       "Just two fixes.
      
        The first fixes kprobing a stdu, and is marked for stable as it's been
        broken for ~ever. In hindsight this could have gone in next.
      
        The other is a fix for a change we merged this cycle, where if we take
        a certain exception when the kernel is running relocated (currently
        only used for kdump), we checkstop the box.
      
        Thanks to Ravi Bangoria"
      
      * tag 'powerpc-4.11-8' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
        powerpc/64: Fix HMI exception on LE with CONFIG_RELOCATABLE=y
        powerpc/kprobe: Fix oops when kprobed on 'stdu' instruction
      92b4fc75
    • Linus Torvalds's avatar
      Merge tag 'pci-v4.11-fixes-5' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci · fe7ba289
      Linus Torvalds authored
      Pull PCI fix from Bjorn Helgaas:
       "Sorry this is so late. It's been in -next for over a week, but I
        forgot to send it on until now.
      
        A single fix to the DT binding of the HiSilicon PCIe host support"
      
      * tag 'pci-v4.11-fixes-5' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
        PCI: hisi: Fix DT binding (hisi-pcie-almost-ecam)
      fe7ba289
    • Linus Torvalds's avatar
      Merge branch 'for-linus' of git://git.kernel.dk/linux-block · a9aa1908
      Linus Torvalds authored
      Pull block layer fixes from Jens Axboe:
       "A couple of last minute fixes for regressions in this cycle. More
        specifically:
      
         - Two patches from Andy, adjusting the NVMe APST quirks to avoid some
           issues specific to one Toshiba drive, and some variant of Samsung
           on two specific Dell laptops.
      
         - A fix for mtip32xx, turning off mq scheduling on that device. We
           have a real fix for this, but it's too late in the cycle.
           Thankfully we already have a NO_SCHED flag we can apply here. A
           prep patch for this is ensuring that we honor the NO_SCHED flag
           when attempting to online switch schedulers; previously we only did
           so for drive load time. From Ming.
      
         - Fixing an oops in blk-mq polling with scheduling attached. This one
           is easily reproducible, it would be a shame to release 4.11 with
           that issue. From me.
      
        I'd prefer not having to send in patches at this point in time, but
        the above are all things that have regressed in this cycle and the
        fixes are relatively straightforward"
      
      * 'for-linus' of git://git.kernel.dk/linux-block:
        blk-mq: fix potential oops with polling and blk-mq scheduler
        nvme: Quirk APST off on "THNSF5256GPUK TOSHIBA"
        nvme: Adjust the Samsung APST quirk
        mtip32xx: pass BLK_MQ_F_NO_SCHED
        block: respect BLK_MQ_F_NO_SCHED
      a9aa1908
    • Linus Torvalds's avatar
      Merge tag 'acpi-4.11-final' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · 4664e322
      Linus Torvalds authored
      Pull ACPI build fix from Rafael Wysocki:
       "This avoids a false-positive build warning from the compiler.
      
        Specifics:
      
         - Avoid a false-positive warning regarding a variable that may not be
           initialized that started to trigger after a previous general build
           fix (Arnd Bergmann)"
      
      * tag 'acpi-4.11-final' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        ACPI / power: Avoid maybe-uninitialized warning
      4664e322
    • Linus Torvalds's avatar
      Merge tag 'mmc-v4.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc · 11b211ed
      Linus Torvalds authored
      Pull MMC fixes from Ulf Hansson:
       "MMC core:
      
         - kmalloc sdio scratch buffer to make it DMA-friendly
      
        MMC host:
      
         - dw_mmc: Fix behaviour for SDIO IRQs when runtime PM is used
      
         - sdhci-esdhc-imx: Correct pad I/O drive strength for UHS-DDR50
           cards"
      
      * tag 'mmc-v4.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
        mmc: sdhci-esdhc-imx: increase the pad I/O drive strength for DDR50 card
        mmc: dw_mmc: Don't allow Runtime PM for SDIO cards
        mmc: sdio: fix alignment issue in struct sdio_func
      11b211ed