1. 11 Feb, 2022 5 commits
    • Felix Maurer's avatar
      bpf: Do not try bpf_msg_push_data with len 0 · 4a11678f
      Felix Maurer authored
      If bpf_msg_push_data() is called with len 0 (as it happens during
      selftests/bpf/test_sockmap), we do not need to do anything and can
      return early.
      
      Calling bpf_msg_push_data() with len 0 previously lead to a wrong ENOMEM
      error: we later called get_order(copy + len); if len was 0, copy + len
      was also often 0 and get_order() returned some undefined value (at the
      moment 52). alloc_pages() caught that and failed, but then bpf_msg_push_data()
      returned ENOMEM. This was wrong because we are most probably not out of
      memory and actually do not need any additional memory.
      
      Fixes: 6fff607e ("bpf: sk_msg program helper bpf_msg_push_data")
      Signed-off-by: default avatarFelix Maurer <fmaurer@redhat.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarYonghong Song <yhs@fb.com>
      Acked-by: default avatarJohn Fastabend <john.fastabend@gmail.com>
      Link: https://lore.kernel.org/bpf/df69012695c7094ccb1943ca02b4920db3537466.1644421921.git.fmaurer@redhat.com
      4a11678f
    • David S. Miller's avatar
      Merge ra.kernel.org:/pub/scm/linux/kernel/git/netfilter/nf · 525de9a7
      David S. Miller authored
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter fixes for net
      
      The following patchset contains Netfilter fixes for net:
      
      1) Add selftest for nft_synproxy, from Florian Westphal.
      
      2) xt_socket destroy path incorrectly disables IPv4 defrag for
         IPv6 traffic (typo), from Eric Dumazet.
      
      3) Fix exit value selftest nft_concat_range.sh, from Hangbin Liu.
      
      4) nft_synproxy disables the IPv4 hooks if the IPv6 hooks fail
         to be registered.
      
      5) disable rp_filter on router in selftest nft_fib.sh, also
         from Hangbin Liu.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      525de9a7
    • Eric Dumazet's avatar
      drop_monitor: fix data-race in dropmon_net_event / trace_napi_poll_hit · dcd54265
      Eric Dumazet authored
      trace_napi_poll_hit() is reading stat->dev while another thread can write
      on it from dropmon_net_event()
      
      Use READ_ONCE()/WRITE_ONCE() here, RCU rules are properly enforced already,
      we only have to take care of load/store tearing.
      
      BUG: KCSAN: data-race in dropmon_net_event / trace_napi_poll_hit
      
      write to 0xffff88816f3ab9c0 of 8 bytes by task 20260 on cpu 1:
       dropmon_net_event+0xb8/0x2b0 net/core/drop_monitor.c:1579
       notifier_call_chain kernel/notifier.c:84 [inline]
       raw_notifier_call_chain+0x53/0xb0 kernel/notifier.c:392
       call_netdevice_notifiers_info net/core/dev.c:1919 [inline]
       call_netdevice_notifiers_extack net/core/dev.c:1931 [inline]
       call_netdevice_notifiers net/core/dev.c:1945 [inline]
       unregister_netdevice_many+0x867/0xfb0 net/core/dev.c:10415
       ip_tunnel_delete_nets+0x24a/0x280 net/ipv4/ip_tunnel.c:1123
       vti_exit_batch_net+0x2a/0x30 net/ipv4/ip_vti.c:515
       ops_exit_list net/core/net_namespace.c:173 [inline]
       cleanup_net+0x4dc/0x8d0 net/core/net_namespace.c:597
       process_one_work+0x3f6/0x960 kernel/workqueue.c:2307
       worker_thread+0x616/0xa70 kernel/workqueue.c:2454
       kthread+0x1bf/0x1e0 kernel/kthread.c:377
       ret_from_fork+0x1f/0x30
      
      read to 0xffff88816f3ab9c0 of 8 bytes by interrupt on cpu 0:
       trace_napi_poll_hit+0x89/0x1c0 net/core/drop_monitor.c:292
       trace_napi_poll include/trace/events/napi.h:14 [inline]
       __napi_poll+0x36b/0x3f0 net/core/dev.c:6366
       napi_poll net/core/dev.c:6432 [inline]
       net_rx_action+0x29e/0x650 net/core/dev.c:6519
       __do_softirq+0x158/0x2de kernel/softirq.c:558
       do_softirq+0xb1/0xf0 kernel/softirq.c:459
       __local_bh_enable_ip+0x68/0x70 kernel/softirq.c:383
       __raw_spin_unlock_bh include/linux/spinlock_api_smp.h:167 [inline]
       _raw_spin_unlock_bh+0x33/0x40 kernel/locking/spinlock.c:210
       spin_unlock_bh include/linux/spinlock.h:394 [inline]
       ptr_ring_consume_bh include/linux/ptr_ring.h:367 [inline]
       wg_packet_decrypt_worker+0x73c/0x780 drivers/net/wireguard/receive.c:506
       process_one_work+0x3f6/0x960 kernel/workqueue.c:2307
       worker_thread+0x616/0xa70 kernel/workqueue.c:2454
       kthread+0x1bf/0x1e0 kernel/kthread.c:377
       ret_from_fork+0x1f/0x30
      
      value changed: 0xffff88815883e000 -> 0x0000000000000000
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 0 PID: 26435 Comm: kworker/0:1 Not tainted 5.17.0-rc1-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Workqueue: wg-crypt-wg2 wg_packet_decrypt_worker
      
      Fixes: 4ea7e386 ("dropmon: add ability to detect when hardware dropsrxpackets")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Neil Horman <nhorman@tuxdriver.com>
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      dcd54265
    • Wen Gu's avatar
      net/smc: Avoid overwriting the copies of clcsock callback functions · 1de9770d
      Wen Gu authored
      The callback functions of clcsock will be saved and replaced during
      the fallback. But if the fallback happens more than once, then the
      copies of these callback functions will be overwritten incorrectly,
      resulting in a loop call issue:
      
      clcsk->sk_error_report
       |- smc_fback_error_report() <------------------------------|
           |- smc_fback_forward_wakeup()                          | (loop)
               |- clcsock_callback()  (incorrectly overwritten)   |
                   |- smc->clcsk_error_report() ------------------|
      
      So this patch fixes the issue by saving these function pointers only
      once in the fallback and avoiding overwriting.
      
      Reported-by: syzbot+4de3c0e8a263e1e499bc@syzkaller.appspotmail.com
      Fixes: 341adeec ("net/smc: Forward wakeup to smc socket waitqueue after fallback")
      Link: https://lore.kernel.org/r/0000000000006d045e05d78776f6@google.comSigned-off-by: default avatarWen Gu <guwen@linux.alibaba.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      1de9770d
    • Linus Torvalds's avatar
      Merge tag 'net-5.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · f1baf68e
      Linus Torvalds authored
      Pull networking fixes from Jakub Kicinski:
       "Including fixes from netfilter and can.
      
      Current release - new code bugs:
      
         - sparx5: fix get_stat64 out-of-bound access and crash
      
         - smc: fix netdev ref tracker misuse
      
        Previous releases - regressions:
      
         - eth: ixgbevf: require large buffers for build_skb on 82599VF, avoid
           overflows
      
         - eth: ocelot: fix all IP traffic getting trapped to CPU with PTP
           over IP
      
         - bonding: fix rare link activation misses in 802.3ad mode
      
        Previous releases - always broken:
      
         - tcp: fix tcp sock mem accounting in zero-copy corner cases
      
         - remove the cached dst when uncloning an skb dst and its metadata,
           since we only have one ref it'd lead to an UaF
      
         - netfilter:
            - conntrack: don't refresh sctp entries in closed state
            - conntrack: re-init state for retransmitted syn-ack, avoid
              connection establishment getting stuck with strange stacks
            - ctnetlink: disable helper autoassign, avoid it getting lost
            - nft_payload: don't allow transport header access for fragments
      
         - dsa: fix use of devres for mdio throughout drivers
      
         - eth: amd-xgbe: disable interrupts during pci removal
      
         - eth: dpaa2-eth: unregister netdev before disconnecting the PHY
      
         - eth: ice: fix IPIP and SIT TSO offload"
      
      * tag 'net-5.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (53 commits)
        net: dsa: mv88e6xxx: fix use-after-free in mv88e6xxx_mdios_unregister
        net: mscc: ocelot: fix mutex lock error during ethtool stats read
        ice: Avoid RTNL lock when re-creating auxiliary device
        ice: Fix KASAN error in LAG NETDEV_UNREGISTER handler
        ice: fix IPIP and SIT TSO offload
        ice: fix an error code in ice_cfg_phy_fec()
        net: mpls: Fix GCC 12 warning
        dpaa2-eth: unregister the netdev before disconnecting from the PHY
        skbuff: cleanup double word in comment
        net: macb: Align the dma and coherent dma masks
        mptcp: netlink: process IPv6 addrs in creating listening sockets
        selftests: mptcp: add missing join check
        net: usb: qmi_wwan: Add support for Dell DW5829e
        vlan: move dev_put into vlan_dev_uninit
        vlan: introduce vlan_dev_free_egress_priority
        ax25: fix UAF bugs of net_device caused by rebinding operation
        net: dsa: fix panic when DSA master device unbinds on shutdown
        net: amd-xgbe: disable interrupts during pci removal
        tipc: rate limit warning for received illegal binding update
        net: mdio: aspeed: Add missing MODULE_DEVICE_TABLE
        ...
      f1baf68e
  2. 10 Feb, 2022 23 commits
  3. 09 Feb, 2022 12 commits
    • Paul Moore's avatar
      audit: don't deref the syscall args when checking the openat2 open_how::flags · 7a82f89d
      Paul Moore authored
      As reported by Jeff, dereferencing the openat2 syscall argument in
      audit_match_perm() to obtain the open_how::flags can result in an
      oops/page-fault.  This patch fixes this by using the open_how struct
      that we store in the audit_context with audit_openat2_how().
      
      Independent of this patch, Richard Guy Briggs posted a similar patch
      to the audit mailing list roughly 40 minutes after this patch was
      posted.
      
      Cc: stable@vger.kernel.org
      Fixes: 1c30e3af ("audit: add support for the openat2 syscall")
      Reported-by: default avatarJeff Mahoney <jeffm@suse.com>
      Signed-off-by: default avatarPaul Moore <paul@paul-moore.com>
      7a82f89d
    • Linus Torvalds's avatar
      Merge tag 'nfsd-5.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux · f4bc5bbb
      Linus Torvalds authored
      Pull more nfsd fixes from Chuck Lever:
       "Ensure that NFS clients cannot send file size or offset values that
        can cause the NFS server to crash or to return incorrect or surprising
        results.
      
        In particular, fix how the NFS server handles values larger than
        OFFSET_MAX"
      
      * tag 'nfsd-5.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
        NFSD: Deprecate NFS_OFFSET_MAX
        NFSD: Fix offset type in I/O trace points
        NFSD: COMMIT operations must not return NFS?ERR_INVAL
        NFSD: Clamp WRITE offsets
        NFSD: Fix NFSv3 SETATTR/CREATE's handling of large file sizes
        NFSD: Fix ia_size underflow
        NFSD: Fix the behavior of READ near OFFSET_MAX
      f4bc5bbb
    • Linus Torvalds's avatar
      Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 · f9f94c9d
      Linus Torvalds authored
      Pull crypto fixes from Herbert Xu:
       "Fix two regressions:
      
         - Potential boot failure due to missing cryptomgr on initramfs
      
         - Stack overflow in octeontx2"
      
      * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
        crypto: api - Move cryptomgr soft dependency into algapi
        crypto: octeontx2 - Avoid stack variable overflow
      f9f94c9d
    • Domenico Andreoli's avatar
      Fix regression due to "fs: move binfmt_misc sysctl to its own file" · b42bc9a3
      Domenico Andreoli authored
      Commit 3ba442d5 ("fs: move binfmt_misc sysctl to its own file") did
      not go unnoticed, binfmt-support stopped to work on my Debian system
      since v5.17-rc2 (did not check with -rc1).
      
      The existance of the /proc/sys/fs/binfmt_misc is a precondition for
      attempting to mount the binfmt_misc fs, which in turn triggers the
      autoload of the binfmt_misc module.  Without it, no module is loaded and
      no binfmt is available at boot.
      
      Building as built-in or manually loading the module and mounting the fs
      works fine, it's therefore only a matter of interaction with user-space.
      I could try to improve the Debian systemd configuration but I can't say
      anything about the other distributions.
      
      This patch restores a working system right after boot.
      
      Fixes: 3ba442d5 ("fs: move binfmt_misc sysctl to its own file")
      Signed-off-by: default avatarDomenico Andreoli <domenico.andreoli@linux.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Luis Chamberlain <mcgrof@kernel.org>
      Reviewed-by: default avatarTong Zhang <ztong0001@gmail.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      b42bc9a3
    • Linus Torvalds's avatar
      Merge tag 'kvm-s390-kernel-access' from emailed bundle · 09a93c1d
      Linus Torvalds authored
      Pull s390 kvm fix from Christian Borntraeger:
       "Add missing check for the MEMOP ioctl
      
        The SIDA MEMOPs must only be used for secure guests, otherwise
        userspace can do unwanted memory accesses"
      
      * tag 'kvm-s390-kernel-access' from emailed bundle:
        KVM: s390: Return error on SIDA memop on normal guest
      09a93c1d
    • Chuck Lever's avatar
      NFSD: Deprecate NFS_OFFSET_MAX · c306d737
      Chuck Lever authored
      NFS_OFFSET_MAX was introduced way back in Linux v2.3.y before there
      was a kernel-wide OFFSET_MAX value. As a clean up, replace the last
      few uses of it with its generic equivalent, and get rid of it.
      Signed-off-by: default avatarChuck Lever <chuck.lever@oracle.com>
      c306d737
    • Chuck Lever's avatar
      NFSD: Fix offset type in I/O trace points · 6a4d333d
      Chuck Lever authored
      NFSv3 and NFSv4 use u64 offset values on the wire. Record these values
      verbatim without the implicit type case to loff_t.
      Signed-off-by: default avatarChuck Lever <chuck.lever@oracle.com>
      6a4d333d
    • Chuck Lever's avatar
      NFSD: COMMIT operations must not return NFS?ERR_INVAL · 3f965021
      Chuck Lever authored
      Since, well, forever, the Linux NFS server's nfsd_commit() function
      has returned nfserr_inval when the passed-in byte range arguments
      were non-sensical.
      
      However, according to RFC 1813 section 3.3.21, NFSv3 COMMIT requests
      are permitted to return only the following non-zero status codes:
      
            NFS3ERR_IO
            NFS3ERR_STALE
            NFS3ERR_BADHANDLE
            NFS3ERR_SERVERFAULT
      
      NFS3ERR_INVAL is not included in that list. Likewise, NFS4ERR_INVAL
      is not listed in the COMMIT row of Table 6 in RFC 8881.
      
      RFC 7530 does permit COMMIT to return NFS4ERR_INVAL, but does not
      specify when it can or should be used.
      
      Instead of dropping or failing a COMMIT request in a byte range that
      is not supported, turn it into a valid request by treating one or
      both arguments as zero. Offset zero means start-of-file, count zero
      means until-end-of-file, so we only ever extend the commit range.
      NFS servers are always allowed to commit more and sooner than
      requested.
      
      The range check is no longer bounded by NFS_OFFSET_MAX, but rather
      by the value that is returned in the maxfilesize field of the NFSv3
      FSINFO procedure or the NFSv4 maxfilesize file attribute.
      
      Note that this change results in a new pynfs failure:
      
      CMT4     st_commit.testCommitOverflow                             : RUNNING
      CMT4     st_commit.testCommitOverflow                             : FAILURE
                 COMMIT with offset + count overflow should return
                 NFS4ERR_INVAL, instead got NFS4_OK
      
      IMO the test is not correct as written: RFC 8881 does not allow the
      COMMIT operation to return NFS4ERR_INVAL.
      Reported-by: default avatarDan Aloni <dan.aloni@vastdata.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarChuck Lever <chuck.lever@oracle.com>
      Reviewed-by: default avatarBruce Fields <bfields@fieldses.org>
      3f965021
    • Chuck Lever's avatar
      NFSD: Clamp WRITE offsets · 6260d9a5
      Chuck Lever authored
      Ensure that a client cannot specify a WRITE range that falls in a
      byte range outside what the kernel's internal types (such as loff_t,
      which is signed) can represent. The kiocb iterators, invoked in
      nfsd_vfs_write(), should properly limit write operations to within
      the underlying file system's s_maxbytes.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarChuck Lever <chuck.lever@oracle.com>
      6260d9a5
    • Chuck Lever's avatar
      NFSD: Fix NFSv3 SETATTR/CREATE's handling of large file sizes · a648fdeb
      Chuck Lever authored
      iattr::ia_size is a loff_t, so these NFSv3 procedures must be
      careful to deal with incoming client size values that are larger
      than s64_max without corrupting the value.
      
      Silently capping the value results in storing a different value
      than the client passed in which is unexpected behavior, so remove
      the min_t() check in decode_sattr3().
      
      Note that RFC 1813 permits only the WRITE procedure to return
      NFS3ERR_FBIG. We believe that NFSv3 reference implementations
      also return NFS3ERR_FBIG when ia_size is too large.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarChuck Lever <chuck.lever@oracle.com>
      a648fdeb
    • Chuck Lever's avatar
      NFSD: Fix ia_size underflow · e6faac3f
      Chuck Lever authored
      iattr::ia_size is a loff_t, which is a signed 64-bit type. NFSv3 and
      NFSv4 both define file size as an unsigned 64-bit type. Thus there
      is a range of valid file size values an NFS client can send that is
      already larger than Linux can handle.
      
      Currently decode_fattr4() dumps a full u64 value into ia_size. If
      that value happens to be larger than S64_MAX, then ia_size
      underflows. I'm about to fix up the NFSv3 behavior as well, so let's
      catch the underflow in the common code path: nfsd_setattr().
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarChuck Lever <chuck.lever@oracle.com>
      e6faac3f
    • Chuck Lever's avatar
      NFSD: Fix the behavior of READ near OFFSET_MAX · 0cb4d23a
      Chuck Lever authored
      Dan Aloni reports:
      > Due to commit 8cfb9015 ("NFS: Always provide aligned buffers to
      > the RPC read layers") on the client, a read of 0xfff is aligned up
      > to server rsize of 0x1000.
      >
      > As a result, in a test where the server has a file of size
      > 0x7fffffffffffffff, and the client tries to read from the offset
      > 0x7ffffffffffff000, the read causes loff_t overflow in the server
      > and it returns an NFS code of EINVAL to the client. The client as
      > a result indefinitely retries the request.
      
      The Linux NFS client does not handle NFS?ERR_INVAL, even though all
      NFS specifications permit servers to return that status code for a
      READ.
      
      Instead of NFS?ERR_INVAL, have out-of-range READ requests succeed
      and return a short result. Set the EOF flag in the result to prevent
      the client from retrying the READ request. This behavior appears to
      be consistent with Solaris NFS servers.
      
      Note that NFSv3 and NFSv4 use u64 offset values on the wire. These
      must be converted to loff_t internally before use -- an implicit
      type cast is not adequate for this purpose. Otherwise VFS checks
      against sb->s_maxbytes do not work properly.
      Reported-by: default avatarDan Aloni <dan.aloni@vastdata.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarChuck Lever <chuck.lever@oracle.com>
      0cb4d23a