1. 18 Mar, 2022 7 commits
    • Guillaume Nault's avatar
      ipv4: Fix route lookups when handling ICMP redirects and PMTU updates · 544b4dd5
      Guillaume Nault authored
      The PMTU update and ICMP redirect helper functions initialise their fl4
      variable with either __build_flow_key() or build_sk_flow_key(). These
      initialisation functions always set ->flowi4_scope with
      RT_SCOPE_UNIVERSE and might set the ECN bits of ->flowi4_tos. This is
      not a problem when the route lookup is later done via
      ip_route_output_key_hash(), which properly clears the ECN bits from
      ->flowi4_tos and initialises ->flowi4_scope based on the RTO_ONLINK
      flag. However, some helpers call fib_lookup() directly, without
      sanitising the tos and scope fields, so the route lookup can fail and,
      as a result, the ICMP redirect or PMTU update aren't taken into
      account.
      
      Fix this by extracting the ->flowi4_tos and ->flowi4_scope sanitisation
      code into ip_rt_fix_tos(), then use this function in handlers that call
      fib_lookup() directly.
      
      Note 1: We can't sanitise ->flowi4_tos and ->flowi4_scope in a central
      place (like __build_flow_key() or flowi4_init_output()), because
      ip_route_output_key_hash() expects non-sanitised values. When called
      with sanitised values, it can erroneously overwrite RT_SCOPE_LINK with
      RT_SCOPE_UNIVERSE in ->flowi4_scope. Therefore we have to be careful to
      sanitise the values only for those paths that don't call
      ip_route_output_key_hash().
      
      Note 2: The problem is mostly about sanitising ->flowi4_tos. Having
      ->flowi4_scope initialised with RT_SCOPE_UNIVERSE instead of
      RT_SCOPE_LINK probably wasn't really a problem: sockets with the
      SOCK_LOCALROUTE flag set (those that'd result in RTO_ONLINK being set)
      normally shouldn't receive ICMP redirects or PMTU updates.
      
      Fixes: 4895c771 ("ipv4: Add FIB nexthop exceptions.")
      Signed-off-by: default avatarGuillaume Nault <gnault@redhat.com>
      Reviewed-by: default avatarDavid Ahern <dsahern@kernel.org>
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      544b4dd5
    • Jakub Kicinski's avatar
      Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · 6bd0c76b
      Jakub Kicinski authored
      Daniel Borkmann says:
      
      ====================
      pull-request: bpf 2022-03-18
      
      We've added 2 non-merge commits during the last 18 day(s) which contain
      a total of 2 files changed, 50 insertions(+), 20 deletions(-).
      
      The main changes are:
      
      1) Fix a race in XSK socket teardown code that can lead to a NULL pointer
         dereference, from Magnus.
      
      2) Small MAINTAINERS doc update to remove Lorenz from sockmap, from Lorenz.
      
      * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
        xsk: Fix race at socket teardown
        bpf: Remove Lorenz Bauer from L7 BPF maintainers
      ====================
      
      Link: https://lore.kernel.org/r/20220318152418.28638-1-daniel@iogearbox.netSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      6bd0c76b
    • David S. Miller's avatar
      Merge branch 'af_unix-OOB-fixes' · 9905eed4
      David S. Miller authored
      Kuniyuki Iwashima says:
      
      ====================
      af_unix: Fix some OOB implementation.
      
      This series fixes some data-races and adds a missing feature around the
      commit 314001f0 ("af_unix: Add OOB support").
      
      Changelog:
        - v3:
          - Add the first patch
      
        - v2: https://lore.kernel.org/netdev/20220315054801.72035-1-kuniyu@amazon.co.jp/
          - Add READ_ONCE() to avoid a race reported by KCSAN (Eric)
          - Add IS_ENABLED(CONFIG_AF_UNIX_OOB) (Shoaib)
      
        - v1: https://lore.kernel.org/netdev/20220314052110.53634-1-kuniyu@amazon.co.jp/
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      9905eed4
    • Kuniyuki Iwashima's avatar
      af_unix: Support POLLPRI for OOB. · d9a232d4
      Kuniyuki Iwashima authored
      The commit 314001f0 ("af_unix: Add OOB support") introduced OOB for
      AF_UNIX, but it lacks some changes for POLLPRI.  Let's add the missing
      piece.
      
      In the selftest, normal datagrams are sent followed by OOB data, so this
      commit replaces `POLLIN | POLLPRI` with just `POLLPRI` in the first test
      case.
      
      Fixes: 314001f0 ("af_unix: Add OOB support")
      Signed-off-by: default avatarKuniyuki Iwashima <kuniyu@amazon.co.jp>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      d9a232d4
    • Kuniyuki Iwashima's avatar
      af_unix: Fix some data-races around unix_sk(sk)->oob_skb. · e82025c6
      Kuniyuki Iwashima authored
      Out-of-band data automatically places a "mark" showing wherein the
      sequence the out-of-band data would have been.  If the out-of-band data
      implies cancelling everything sent so far, the "mark" is helpful to flush
      them.  When the socket's read pointer reaches the "mark", the ioctl() below
      sets a non zero value to the arg `atmark`:
      
      The out-of-band data is queued in sk->sk_receive_queue as well as ordinary
      data and also saved in unix_sk(sk)->oob_skb.  It can be used to test if the
      head of the receive queue is the out-of-band data meaning the socket is at
      the "mark".
      
      While testing that, unix_ioctl() reads unix_sk(sk)->oob_skb locklessly.
      Thus, all accesses to oob_skb need some basic protection to avoid
      load/store tearing which KCSAN detects when these are called concurrently:
      
        - ioctl(fd_a, SIOCATMARK, &atmark, sizeof(atmark))
        - send(fd_b_connected_to_a, buf, sizeof(buf), MSG_OOB)
      
      BUG: KCSAN: data-race in unix_ioctl / unix_stream_sendmsg
      
      write to 0xffff888003d9cff0 of 8 bytes by task 175 on cpu 1:
       unix_stream_sendmsg (net/unix/af_unix.c:2087 net/unix/af_unix.c:2191)
       sock_sendmsg (net/socket.c:705 net/socket.c:725)
       __sys_sendto (net/socket.c:2040)
       __x64_sys_sendto (net/socket.c:2048)
       do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80)
       entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:113)
      
      read to 0xffff888003d9cff0 of 8 bytes by task 176 on cpu 0:
       unix_ioctl (net/unix/af_unix.c:3101 (discriminator 1))
       sock_do_ioctl (net/socket.c:1128)
       sock_ioctl (net/socket.c:1242)
       __x64_sys_ioctl (fs/ioctl.c:52 fs/ioctl.c:874 fs/ioctl.c:860 fs/ioctl.c:860)
       do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80)
       entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:113)
      
      value changed: 0xffff888003da0c00 -> 0xffff888003da0d00
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 0 PID: 176 Comm: unix_race_oob_i Not tainted 5.17.0-rc5-59529-g83dc4c2a #12
      Hardware name: Red Hat KVM, BIOS 1.11.0-2.amzn2 04/01/2014
      
      Fixes: 314001f0 ("af_unix: Add OOB support")
      Signed-off-by: default avatarKuniyuki Iwashima <kuniyu@amazon.co.jp>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e82025c6
    • Sukadev Bhattiprolu's avatar
      ibmvnic: fix race between xmit and reset · 4219196d
      Sukadev Bhattiprolu authored
      There is a race between reset and the transmit paths that can lead to
      ibmvnic_xmit() accessing an scrq after it has been freed in the reset
      path. It can result in a crash like:
      
      	Kernel attempted to read user page (0) - exploit attempt? (uid: 0)
      	BUG: Kernel NULL pointer dereference on read at 0x00000000
      	Faulting instruction address: 0xc0080000016189f8
      	Oops: Kernel access of bad area, sig: 11 [#1]
      	...
      	NIP [c0080000016189f8] ibmvnic_xmit+0x60/0xb60 [ibmvnic]
      	LR [c000000000c0046c] dev_hard_start_xmit+0x11c/0x280
      	Call Trace:
      	[c008000001618f08] ibmvnic_xmit+0x570/0xb60 [ibmvnic] (unreliable)
      	[c000000000c0046c] dev_hard_start_xmit+0x11c/0x280
      	[c000000000c9cfcc] sch_direct_xmit+0xec/0x330
      	[c000000000bfe640] __dev_xmit_skb+0x3a0/0x9d0
      	[c000000000c00ad4] __dev_queue_xmit+0x394/0x730
      	[c008000002db813c] __bond_start_xmit+0x254/0x450 [bonding]
      	[c008000002db8378] bond_start_xmit+0x40/0xc0 [bonding]
      	[c000000000c0046c] dev_hard_start_xmit+0x11c/0x280
      	[c000000000c00ca4] __dev_queue_xmit+0x564/0x730
      	[c000000000cf97e0] neigh_hh_output+0xd0/0x180
      	[c000000000cfa69c] ip_finish_output2+0x31c/0x5c0
      	[c000000000cfd244] __ip_queue_xmit+0x194/0x4f0
      	[c000000000d2a3c4] __tcp_transmit_skb+0x434/0x9b0
      	[c000000000d2d1e0] __tcp_retransmit_skb+0x1d0/0x6a0
      	[c000000000d2d984] tcp_retransmit_skb+0x34/0x130
      	[c000000000d310e8] tcp_retransmit_timer+0x388/0x6d0
      	[c000000000d315ec] tcp_write_timer_handler+0x1bc/0x330
      	[c000000000d317bc] tcp_write_timer+0x5c/0x200
      	[c000000000243270] call_timer_fn+0x50/0x1c0
      	[c000000000243704] __run_timers.part.0+0x324/0x460
      	[c000000000243894] run_timer_softirq+0x54/0xa0
      	[c000000000ea713c] __do_softirq+0x15c/0x3e0
      	[c000000000166258] __irq_exit_rcu+0x158/0x190
      	[c000000000166420] irq_exit+0x20/0x40
      	[c00000000002853c] timer_interrupt+0x14c/0x2b0
      	[c000000000009a00] decrementer_common_virt+0x210/0x220
      	--- interrupt: 900 at plpar_hcall_norets_notrace+0x18/0x2c
      
      The immediate cause of the crash is the access of tx_scrq in the following
      snippet during a reset, where the tx_scrq can be either NULL or an address
      that will soon be invalid:
      
      	ibmvnic_xmit()
      	{
      		...
      		tx_scrq = adapter->tx_scrq[queue_num];
      		txq = netdev_get_tx_queue(netdev, queue_num);
      		ind_bufp = &tx_scrq->ind_buf;
      
      		if (test_bit(0, &adapter->resetting)) {
      		...
      	}
      
      But beyond that, the call to ibmvnic_xmit() itself is not safe during a
      reset and the reset path attempts to avoid this by stopping the queue in
      ibmvnic_cleanup(). However just after the queue was stopped, an in-flight
      ibmvnic_complete_tx() could have restarted the queue even as the reset is
      progressing.
      
      Since the queue was restarted we could get a call to ibmvnic_xmit() which
      can then access the bad tx_scrq (or other fields).
      
      We cannot however simply have ibmvnic_complete_tx() check the ->resetting
      bit and skip starting the queue. This can race at the "back-end" of a good
      reset which just restarted the queue but has not cleared the ->resetting
      bit yet. If we skip restarting the queue due to ->resetting being true,
      the queue would remain stopped indefinitely potentially leading to transmit
      timeouts.
      
      IOW ->resetting is too broad for this purpose. Instead use a new flag
      that indicates whether or not the queues are active. Only the open/
      reset paths control when the queues are active. ibmvnic_complete_tx()
      and others wake up the queue only if the queue is marked active.
      
      So we will have:
      	A. reset/open thread in ibmvnic_cleanup() and __ibmvnic_open()
      
      		->resetting = true
      		->tx_queues_active = false
      		disable tx queues
      		...
      		->tx_queues_active = true
      		start tx queues
      
      	B. Tx interrupt in ibmvnic_complete_tx():
      
      		if (->tx_queues_active)
      			netif_wake_subqueue();
      
      To ensure that ->tx_queues_active and state of the queues are consistent,
      we need a lock which:
      
      	- must also be taken in the interrupt path (ibmvnic_complete_tx())
      	- shared across the multiple queues in the adapter (so they don't
      	  become serialized)
      
      Use rcu_read_lock() and have the reset thread synchronize_rcu() after
      updating the ->tx_queues_active state.
      
      While here, consolidate a few boolean fields in ibmvnic_adapter for
      better alignment.
      
      Based on discussions with Brian King and Dany Madden.
      
      Fixes: 7ed5b31f ("net/ibmvnic: prevent more than one thread from running in reset")
      Reported-by: default avatarVaishnavi Bhat <vaish123@in.ibm.com>
      Signed-off-by: default avatarSukadev Bhattiprolu <sukadev@linux.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      4219196d
    • David S. Miller's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf · 4fa331b4
      David S. Miller authored
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter fixes for net
      
      The following patchset contains Netfilter fixes for net:
      
      1) Fix PPPoE and QinQ with flowtable inet family.
      
      2) Missing register validation in nf_tables.
      
      3) Initialize registers to avoid stack memleak to userspace.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      4fa331b4
  2. 17 Mar, 2022 16 commits
  3. 16 Mar, 2022 9 commits
  4. 15 Mar, 2022 8 commits