1. 20 Feb, 2024 8 commits
    • ice: fix dpll periodic work data updates on PF reset · 9a8385fe
      Arkadiusz Kubalewski authored
      Do not allow dpll periodic work function to acquire data from firmware
      if PF reset is in progress. Acquiring data will cause dmesg errors as the
      firmware cannot respond or process the request properly during the reset
      time.
      
      Test by looping execution of below step until dmesg error appears:
      - perform PF reset
      $ echo 1 > /sys/class/net/<ice PF>/device/reset
      
      Fixes: d7999f5e ("ice: implement dpll interface to control cgu")
      Reviewed-by: Igor Bagnucki <igor.bagnucki@intel.com>
      Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
      Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • ice: fix dpll and dpll_pin data access on PF reset · fc7fd1a1
      Arkadiusz Kubalewski authored
      Do not allow acquiring data or altering the configuration of dpll and
      pins through firmware if PF reset is in progress. Doing so would cause
      confusing netlink extack errors, as the firmware cannot respond or
      process the request properly during the reset time.
      
      Return (-EBUSY) and an extack error to a user who tries to access/modify
      the config of dpll/pin through firmware during the reset time.
      
      The PF reset and kernel access to dpll data are both asynchronous. It is
      not possible to guard all the possible reset paths with any deterministic
      approach. I.e., it is possible that reset starts after reset check is
      performed (or if the reset would be checked after mutex is locked), but at
      the same time it is not possible to wait for dpll mutex unlock in the
      reset flow.
      This is a best-effort solution to at least give a clue to the user
      what is happening in most of the cases, knowing that there are possible
      race conditions where the user could see a different error received
      from firmware due to reset unexpectedly starting.
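
The best-effort guard described above can be modelled in a few lines. This is a toy Python sketch, not the ice driver code: `Pf`, `pin_get` and `fw_query` are hypothetical names, and the `-EIO` returned by the modelled firmware path is an assumption. Only the `-EBUSY` return and the extack message idea come from the commit message.

```python
import errno


class Pf:
    """Hypothetical stand-in for a PF with a reset flag and a firmware path."""

    def __init__(self):
        self.reset_in_progress = False

    def fw_query(self):
        # Assumption: a query issued during reset fails with some other error.
        if self.reset_in_progress:
            return -errno.EIO
        return 0


def pin_get(pf, extack):
    # Best-effort check: refuse early, with a clear message, if a reset is
    # visible right now.
    if pf.reset_in_progress:
        extack.append("PF reset in progress")
        return -errno.EBUSY
    # A reset may still begin between the check above and this call, in which
    # case the caller sees whatever error the firmware path produces.
    return pf.fw_query()
```

The check narrows the window in which a user sees a confusing firmware error, but it does not close it, which matches the race the message describes.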
      
      Test by looping execution of below steps until netlink error appears:
      - perform PF reset
      $ echo 1 > /sys/class/net/<ice PF>/device/reset
      - e.g. try to alter/read dpll/pin config:
      $ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/dpll.yaml \
      	--dump pin-get
      
      Fixes: d7999f5e ("ice: implement dpll interface to control cgu")
      Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
      Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
      Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
      Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • ice: fix dpll input pin phase_adjust value updates · 3b14430c
      Arkadiusz Kubalewski authored
      The value of phase_adjust for input pin shall be updated in
      ice_dpll_pin_state_update(..). Fix by adding proper argument to the
      firmware query function call - a pin's struct field pointer where the
      phase_adjust value during driver runtime is stored.
      
      Previously the phase_adjust used to misinform user about actual
      phase_adjust value. I.e., if phase_adjust was set to a non zero value and
      if driver was reloaded, the user would see the value equal 0, which is
      not correct - the actual value is equal to value set before driver reload.
      
      Fixes: 90e1c907 ("ice: dpll: implement phase related callbacks")
      Reviewed-by: Alan Brady <alan.brady@intel.com>
      Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
      Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • ice: fix connection state of DPLL and out pin · e8335ef5
      Yochai Hagvi authored
      Fix the connection state between source DPLL and output pin, updating the
      attribute 'state' of 'parent_device'. Previously, the connection state
      was broken, and didn't reflect the correct state.
      
      When 'state_on_dpll_set' is called with the value
      'DPLL_PIN_STATE_CONNECTED' (1), the output pin will switch to the given
      DPLL, and the state of the given DPLL will be set to connected.
      E.g.:
      	--do pin-set --json '{"id":2, "parent-device":{"parent-id":1,
      						       "state": 1 }}'
      This command will connect DPLL device with id 1 to output pin with id 2.
      
      When 'state_on_dpll_set' is called with the value
      'DPLL_PIN_STATE_DISCONNECTED' (2) and the given DPLL is currently
      connected, then the output pin will be disabled.
      E.g.:
      	--do pin-set --json '{"id":2, "parent-device":{"parent-id":1,
      						       "state": 2 }}'
      This command will disable output pin with id 2 if DPLL device with ID 1 is
      connected to it; otherwise, the command is ignored.
      
      Fixes: d7999f5e ("ice: implement dpll interface to control cgu")
      Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
      Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
      Signed-off-by: Yochai Hagvi <yochai.hagvi@intel.com>
      Tested-by: Sunitha Mekala <sunithax.d.mekala@intel.com> (A Contingent worker at Intel)
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • docs: netdev: update the link to the CI repo · 23f9c2c0
      Jakub Kicinski authored
      Netronome graciously transferred the original NIPA repo
      to our new netdev umbrella org. Link to that instead of
      my private fork.
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Link: https://lore.kernel.org/r/20240216161945.2208842-1-kuba@kernel.org
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • arp: Prevent overflow in arp_req_get(). · a7d60277
      Kuniyuki Iwashima authored
      syzkaller reported an overflown write in arp_req_get(). [0]
      
      When ioctl(SIOCGARP) is issued, arp_req_get() looks up a neighbour
      entry and copies neigh->ha to struct arpreq.arp_ha.sa_data.
      
      The arp_ha here is struct sockaddr, not struct sockaddr_storage, so
      the sa_data buffer is just 14 bytes.
      
      In the splat below, 2 bytes are overflown to the next int field,
      arp_flags.  We initialise the field just after the memcpy(), so it's
      not a problem.
      
      However, when dev->addr_len is greater than 22 (e.g. MAX_ADDR_LEN),
      arp_netmask is overwritten, which could be set as htonl(0xFFFFFFFFUL)
      in arp_ioctl() before calling arp_req_get().
      
      To avoid the overflow, let's limit the max length of memcpy().
      
      Note that commit b5f0de6d ("net: dev: Convert sa_data to flexible
      array in struct sockaddr") just silenced syzkaller.
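
The layout argument above can be checked with a small model. The following is a Python `ctypes` sketch, not the kernel patch: `SockAddr`, `ArpReq`, `copy_ha` and `netmask_bytes` are illustrative names, and the field order is assumed to mirror `struct arpreq` as the message describes it. It contrasts a copy clamped to the size of `sa_data` with an unclamped one.

```python
import ctypes


class SockAddr(ctypes.Structure):
    # 2-byte address family followed by 14 bytes of address data.
    _fields_ = [("sa_family", ctypes.c_ushort),
                ("sa_data", ctypes.c_ubyte * 14)]


class ArpReq(ctypes.Structure):
    # Field order assumed to mirror struct arpreq.
    _fields_ = [("arp_pa", SockAddr),
                ("arp_ha", SockAddr),
                ("arp_flags", ctypes.c_int),
                ("arp_netmask", SockAddr),
                ("arp_dev", ctypes.c_char * 16)]


SA_DATA_OFF = ArpReq.arp_ha.offset + SockAddr.sa_data.offset
SA_DATA_LEN = SockAddr.sa_data.size


def copy_ha(req, ha, bounded=True):
    """Copy a hardware address into arp_ha.sa_data; return bytes written."""
    n = min(len(ha), SA_DATA_LEN) if bounded else len(ha)
    # Keep the demo memory-safe: never write past the end of the struct.
    n = min(n, ctypes.sizeof(req) - SA_DATA_OFF)
    ctypes.memmove(ctypes.addressof(req) + SA_DATA_OFF, bytes(ha), n)
    return n


def netmask_bytes(req):
    """Return the raw sa_data bytes of arp_netmask."""
    return bytes(req.arp_netmask.sa_data)
```

With this layout `sa_data` starts at byte 18 and spans 14 bytes, `arp_flags` sits at byte 32 and `arp_netmask` at byte 36, so an unclamped 32-byte copy spills over both fields while the clamped copy leaves them untouched.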
      
      [0]:
      memcpy: detected field-spanning write (size 16) of single field "r->arp_ha.sa_data" at net/ipv4/arp.c:1128 (size 14)
      WARNING: CPU: 0 PID: 144638 at net/ipv4/arp.c:1128 arp_req_get+0x411/0x4a0 net/ipv4/arp.c:1128
      Modules linked in:
      CPU: 0 PID: 144638 Comm: syz-executor.4 Not tainted 6.1.74 #31
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-debian-1.16.0-5 04/01/2014
      RIP: 0010:arp_req_get+0x411/0x4a0 net/ipv4/arp.c:1128
      Code: fd ff ff e8 41 42 de fb b9 0e 00 00 00 4c 89 fe 48 c7 c2 20 6d ab 87 48 c7 c7 80 6d ab 87 c6 05 25 af 72 04 01 e8 5f 8d ad fb <0f> 0b e9 6c fd ff ff e8 13 42 de fb be 03 00 00 00 4c 89 e7 e8 a6
      RSP: 0018:ffffc900050b7998 EFLAGS: 00010286
      RAX: 0000000000000000 RBX: ffff88803a815000 RCX: 0000000000000000
      RDX: 0000000000000000 RSI: ffffffff8641a44a RDI: 0000000000000001
      RBP: ffffc900050b7a98 R08: 0000000000000001 R09: 0000000000000000
      R10: 0000000000000000 R11: 203a7970636d656d R12: ffff888039c54000
      R13: 1ffff92000a16f37 R14: ffff88803a815084 R15: 0000000000000010
      FS:  00007f172bf306c0(0000) GS:ffff88805aa00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007f172b3569f0 CR3: 0000000057f12005 CR4: 0000000000770ef0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      PKRU: 55555554
      Call Trace:
       <TASK>
       arp_ioctl+0x33f/0x4b0 net/ipv4/arp.c:1261
       inet_ioctl+0x314/0x3a0 net/ipv4/af_inet.c:981
       sock_do_ioctl+0xdf/0x260 net/socket.c:1204
       sock_ioctl+0x3ef/0x650 net/socket.c:1321
       vfs_ioctl fs/ioctl.c:51 [inline]
       __do_sys_ioctl fs/ioctl.c:870 [inline]
       __se_sys_ioctl fs/ioctl.c:856 [inline]
       __x64_sys_ioctl+0x18e/0x220 fs/ioctl.c:856
       do_syscall_x64 arch/x86/entry/common.c:51 [inline]
       do_syscall_64+0x37/0x90 arch/x86/entry/common.c:81
       entry_SYSCALL_64_after_hwframe+0x64/0xce
      RIP: 0033:0x7f172b262b8d
      Code: 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
      RSP: 002b:00007f172bf300b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
      RAX: ffffffffffffffda RBX: 00007f172b3abf80 RCX: 00007f172b262b8d
      RDX: 0000000020000000 RSI: 0000000000008954 RDI: 0000000000000003
      RBP: 00007f172b2d3493 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
      R13: 000000000000000b R14: 00007f172b3abf80 R15: 00007f172bf10000
       </TASK>
      Reported-by: syzkaller <syzkaller@googlegroups.com>
      Reported-by: Bjoern Doebel <doebel@amazon.de>
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Link: https://lore.kernel.org/r/20240215230516.31330-1-kuniyu@amazon.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • devlink: fix possible use-after-free and memory leaks in devlink_init() · def689fc
      Vasiliy Kovalev authored
      The pernet operations structure for the subsystem must be registered
      before registering the generic netlink family.
      
      Unregister in case of unsuccessful registration.
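
The ordering rule and the unwind step can be illustrated with a toy model. This is a hedged Python sketch of the general pattern only: `subsystem_init` and its three callbacks are hypothetical and do not correspond to the real devlink_init() signature. The likely rationale is that once the generic netlink family is registered, user requests can arrive and may touch per-netns state, so that state should already exist.

```python
def subsystem_init(register_pernet, register_genl, unregister_pernet):
    """Register per-netns state first, then the genl family; undo on failure.

    Each register callback returns 0 on success or a negative error code.
    """
    err = register_pernet()
    if err:
        return err
    err = register_genl()
    if err:
        # Roll back the step that already succeeded so nothing is leaked.
        unregister_pernet()
    return err
```

Passing the steps in as callbacks keeps the sketch independent of any particular subsystem; a real init function would call its registration helpers directly.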
      
      Fixes: 687125b5 ("devlink: split out core code")
      Signed-off-by: Vasiliy Kovalev <kovalev@altlinux.org>
      Link: https://lore.kernel.org/r/20240215203400.29976-1-kovalev@altlinux.org
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • ipv6: sr: fix possible use-after-free and null-ptr-deref · 5559cea2
      Vasiliy Kovalev authored
      The pernet operations structure for the subsystem must be registered
      before registering the generic netlink family.
      
      Fixes: 915d7e5e ("ipv6: sr: add code base for control plane support of SR-IPv6")
      Signed-off-by: Vasiliy Kovalev <kovalev@altlinux.org>
      Link: https://lore.kernel.org/r/20240215202717.29815-1-kovalev@altlinux.org
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
  2. 19 Feb, 2024 3 commits
    • enic: Avoid false positive under FORTIFY_SOURCE · 40b9385d
      Kees Cook authored
      FORTIFY_SOURCE has been ignoring 0-sized destinations while the kernel
      code base has been converted to flexible arrays. In order to enforce
      the 0-sized destinations (e.g. with __counted_by), the remaining 0-sized
      destinations need to be handled. Unfortunately, struct vic_provinfo
      resists full conversion, as it contains a flexible array of flexible
      arrays, which is only possible with the 0-sized fake flexible array.
      
      Use unsafe_memcpy() to avoid future false positives under
      CONFIG_FORTIFY_SOURCE.
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ionic: use pci_is_enabled not open code · 121e4dcb
      Shannon Nelson authored
      Since there is a utility available for this, use
      the API rather than open code.
      
      Fixes: 13943d6c ("ionic: prevent pci disable of already disabled device")
      Reviewed-by: Brett Creeley <brett.creeley@amd.com>
      Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests: bonding: set active slave to primary eth1 specifically · cd65c48d
      Hangbin Liu authored
      In bond priority testing, we set the primary interface to eth1 and add
      eth0, eth1 and eth2 to the bond one after another. This is normally
      fine. But on a debug kernel, the bridge ports that eth0, eth1 and eth2
      are connected to start slowly (entering blocking, then forwarding
      state), which causes the primary interface to go down for a while after
      enslaving, so the active slave changes.
      Here is a test log from Jakub's debug test[1].
      
       [  400.399070][   T50] br0: port 1(s0) entered disabled state
       [  400.400168][   T50] br0: port 4(s2) entered disabled state
       [  400.941504][ T2791] bond0: (slave eth0): making interface the new active one
       [  400.942603][ T2791] bond0: (slave eth0): Enslaving as an active interface with an up link
       [  400.943633][ T2766] br0: port 1(s0) entered blocking state
       [  400.944119][ T2766] br0: port 1(s0) entered forwarding state
       [  401.128792][ T2792] bond0: (slave eth1): making interface the new active one
       [  401.130771][ T2792] bond0: (slave eth1): Enslaving as an active interface with an up link
       [  401.131643][   T69] br0: port 2(s1) entered blocking state
       [  401.132067][   T69] br0: port 2(s1) entered forwarding state
       [  401.346201][ T2793] bond0: (slave eth2): Enslaving as a backup interface with an up link
       [  401.348414][   T50] br0: port 4(s2) entered blocking state
       [  401.348857][   T50] br0: port 4(s2) entered forwarding state
       [  401.519669][  T250] bond0: (slave eth0): link status definitely down, disabling slave
       [  401.526522][  T250] bond0: (slave eth1): link status definitely down, disabling slave
       [  401.526986][  T250] bond0: (slave eth2): making interface the new active one
       [  401.629470][  T250] bond0: (slave eth0): link status definitely up
       [  401.630089][  T250] bond0: (slave eth1): link status definitely up
       [...]
       # TEST: prio (active-backup ns_ip6_target primary_reselect 1)         [FAIL]
       # Current active slave is eth2 but not eth1
      
      Fix it by setting active slave to primary slave specifically before
      testing.
      
      [1] https://netdev-3.bots.linux.dev/vmksft-bonding-dbg/results/464301/1-bond-options-sh/stdout
      
      Fixes: 481b56e0 ("selftests: bonding: re-format bond option tests")
      Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 18 Feb, 2024 20 commits
  4. 17 Feb, 2024 1 commit
  5. 16 Feb, 2024 8 commits
    • net/sched: act_mirred: don't override retval if we already lost the skb · 166c2c8a
      Jakub Kicinski authored
      If we're redirecting the skb and haven't called tcf_mirred_forward()
      yet, we need to tell the core to drop the skb by setting the retcode
      to SHOT. If we have called tcf_mirred_forward(), however, the skb
      is out of our hands and returning SHOT will lead to UaF.
      
      Move the retval override to the error path which actually needs it.
      Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
      Fixes: e5cf1baf ("act_mirred: use TC_ACT_REINSERT when possible")
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/sched: act_mirred: use the backlog for mirred ingress · 52f671db
      Jakub Kicinski authored
      The test Davide added in commit ca22da2f ("act_mirred: use the backlog
      for nested calls to mirred ingress") hangs our testing VMs every 10 or so
      runs, with the familiar tcp_v4_rcv -> tcp_v4_rcv deadlock reported by
      lockdep.
      
      The problem as previously described by Davide (see Link) is that
      if we reverse flow of traffic with the redirect (egress -> ingress)
      we may reach the same socket which generated the packet. And we may
      still be holding its socket lock. The common solution to such deadlocks
      is to put the packet in the Rx backlog, rather than run the Rx path
      inline. Do that for all egress -> ingress reversals, not just once
      we started to nest mirred calls.
      
      In the past there was a concern that the backlog indirection will
      lead to loss of error reporting / less accurate stats. But the current
      workaround does not seem to address the issue.
      
      Fixes: 53592b36 ("net/sched: act_mirred: Implement ingress actions")
      Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Suggested-by: Davide Caratti <dcaratti@redhat.com>
      Link: https://lore.kernel.org/netdev/33dc43f587ec1388ba456b4915c75f02a8aae226.1663945716.git.dcaratti@redhat.com/
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ethernet: adi: requires PHYLIB support · a9f80df4
      Randy Dunlap authored
      This driver uses functions that are supplied by the Kconfig symbol
      PHYLIB, so select it to ensure that they are built as needed.
      
      When CONFIG_ADIN1110=y and CONFIG_PHYLIB=m, there are multiple build
      (linker) errors that are resolved by this Kconfig change:
      
         ld: drivers/net/ethernet/adi/adin1110.o: in function `adin1110_net_open':
         drivers/net/ethernet/adi/adin1110.c:933: undefined reference to `phy_start'
         ld: drivers/net/ethernet/adi/adin1110.o: in function `adin1110_probe_netdevs':
         drivers/net/ethernet/adi/adin1110.c:1603: undefined reference to `get_phy_device'
         ld: drivers/net/ethernet/adi/adin1110.c:1609: undefined reference to `phy_connect'
         ld: drivers/net/ethernet/adi/adin1110.o: in function `adin1110_disconnect_phy':
         drivers/net/ethernet/adi/adin1110.c:1226: undefined reference to `phy_disconnect'
         ld: drivers/net/ethernet/adi/adin1110.o: in function `devm_mdiobus_alloc':
         include/linux/phy.h:455: undefined reference to `devm_mdiobus_alloc_size'
         ld: drivers/net/ethernet/adi/adin1110.o: in function `adin1110_register_mdiobus':
         drivers/net/ethernet/adi/adin1110.c:529: undefined reference to `__devm_mdiobus_register'
         ld: drivers/net/ethernet/adi/adin1110.o: in function `adin1110_net_stop':
         drivers/net/ethernet/adi/adin1110.c:958: undefined reference to `phy_stop'
         ld: drivers/net/ethernet/adi/adin1110.o: in function `adin1110_disconnect_phy':
         drivers/net/ethernet/adi/adin1110.c:1226: undefined reference to `phy_disconnect'
         ld: drivers/net/ethernet/adi/adin1110.o: in function `adin1110_adjust_link':
         drivers/net/ethernet/adi/adin1110.c:1077: undefined reference to `phy_print_status'
         ld: drivers/net/ethernet/adi/adin1110.o: in function `adin1110_ioctl':
         drivers/net/ethernet/adi/adin1110.c:790: undefined reference to `phy_do_ioctl'
         ld: drivers/net/ethernet/adi/adin1110.o:(.rodata+0xf60): undefined reference to `phy_ethtool_get_link_ksettings'
         ld: drivers/net/ethernet/adi/adin1110.o:(.rodata+0xf68): undefined reference to `phy_ethtool_set_link_ksettings'
      
      Fixes: bc93e19d ("net: ethernet: adi: Add ADIN1110 support")
      Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
      Reported-by: kernel test robot <lkp@intel.com>
      Closes: https://lore.kernel.org/oe-kbuild-all/202402070626.eZsfVHG5-lkp@intel.com/
      Cc: Lennart Franzen <lennart@lfdomain.com>
      Cc: Alexandru Tachici <alexandru.tachici@analog.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Jakub Kicinski <kuba@kernel.org>
      Cc: Paolo Abeni <pabeni@redhat.com>
      Cc: netdev@vger.kernel.org
      Reviewed-by: Nuno Sa <nuno.sa@analog.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • dccp/tcp: Unhash sk from ehash for tb2 alloc failure after check_estalblished(). · 66b60b0c
      Kuniyuki Iwashima authored
      syzkaller reported a warning [0] in inet_csk_destroy_sock() with no
      repro.
      
        WARN_ON(inet_sk(sk)->inet_num && !inet_csk(sk)->icsk_bind_hash);
      
      However, the syzkaller's log hinted that connect() failed just before
      the warning due to FAULT_INJECTION.  [1]
      
      When connect() is called for an unbound socket, we search for an
      available ephemeral port.  If a bhash bucket exists for the port, we
      call __inet_check_established() or __inet6_check_established() to check
      if the bucket is reusable.
      
      If reusable, we add the socket into ehash and set inet_sk(sk)->inet_num.
      
      Later, we look up the corresponding bhash2 bucket and try to allocate
      it if it does not exist.
      
      Although it rarely occurs in real use, if the allocation fails, we must
      revert the changes by check_established().  Otherwise, an unconnected
      socket could illegally occupy an ehash entry.
      
      Note that we do not put tw back into ehash because sk might have
      already responded to a packet for tw and it would be better to free
      tw earlier under such memory pressure.
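
The revert-on-failure step described above can be sketched as a toy model. This is illustrative Python, not the kernel change: `Sock`, `hash_connect` and `alloc_bucket` are hypothetical, the hash tables are plain Python containers, and returning `-ENOMEM` is an assumption.

```python
import errno


class Sock:
    """Hypothetical socket carrying only the field the text mentions."""

    def __init__(self):
        self.inet_num = 0


def hash_connect(sk, port, ehash, bhash2, alloc_bucket):
    # Stand-in for check_established(): publish sk in ehash, record the port.
    ehash.add(sk)
    sk.inet_num = port
    if port not in bhash2:
        bucket = alloc_bucket()
        if bucket is None:
            # Allocation failed: revert both changes so an unconnected
            # socket does not keep occupying an ehash entry.
            ehash.discard(sk)
            sk.inet_num = 0
            return -errno.ENOMEM
        bhash2[port] = bucket
    bhash2[port].append(sk)
    return 0
```

Without the two revert lines, the failed socket would stay in `ehash` with `inet_num` set but no bind bucket, which is the state the WARN_ON in the report checks for.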
      
      [0]:
      WARNING: CPU: 0 PID: 350830 at net/ipv4/inet_connection_sock.c:1193 inet_csk_destroy_sock (net/ipv4/inet_connection_sock.c:1193)
      Modules linked in:
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
      RIP: 0010:inet_csk_destroy_sock (net/ipv4/inet_connection_sock.c:1193)
      Code: 41 5c 41 5d 41 5e e9 2d 4a 3d fd e8 28 4a 3d fd 48 89 ef e8 f0 cd 7d ff 5b 5d 41 5c 41 5d 41 5e e9 13 4a 3d fd e8 0e 4a 3d fd <0f> 0b e9 61 fe ff ff e8 02 4a 3d fd 4c 89 e7 be 03 00 00 00 e8 05
      RSP: 0018:ffffc9000b21fd38 EFLAGS: 00010293
      RAX: 0000000000000000 RBX: 0000000000009e78 RCX: ffffffff840bae40
      RDX: ffff88806e46c600 RSI: ffffffff840bb012 RDI: ffff88811755cca8
      RBP: ffff88811755c880 R08: 0000000000000003 R09: 0000000000000000
      R10: 0000000000009e78 R11: 0000000000000000 R12: ffff88811755c8e0
      R13: ffff88811755c892 R14: ffff88811755c918 R15: 0000000000000000
      FS:  00007f03e5243800(0000) GS:ffff88811ae00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000001b32f21000 CR3: 0000000112ffe001 CR4: 0000000000770ef0
      PKRU: 55555554
      Call Trace:
       <TASK>
       ? inet_csk_destroy_sock (net/ipv4/inet_connection_sock.c:1193)
       dccp_close (net/dccp/proto.c:1078)
       inet_release (net/ipv4/af_inet.c:434)
       __sock_release (net/socket.c:660)
       sock_close (net/socket.c:1423)
       __fput (fs/file_table.c:377)
       __fput_sync (fs/file_table.c:462)
       __x64_sys_close (fs/open.c:1557 fs/open.c:1539 fs/open.c:1539)
       do_syscall_64 (arch/x86/entry/common.c:52 arch/x86/entry/common.c:83)
       entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:129)
      RIP: 0033:0x7f03e53852bb
      Code: 03 00 00 00 0f 05 48 3d 00 f0 ff ff 77 41 c3 48 83 ec 18 89 7c 24 0c e8 43 c9 f5 ff 8b 7c 24 0c 41 89 c0 b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 35 44 89 c7 89 44 24 0c e8 a1 c9 f5 ff 8b 44
      RSP: 002b:00000000005dfba0 EFLAGS: 00000293 ORIG_RAX: 0000000000000003
      RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007f03e53852bb
      RDX: 0000000000000002 RSI: 0000000000000002 RDI: 0000000000000003
      RBP: 0000000000000000 R08: 0000000000000000 R09: 000000000000167c
      R10: 0000000008a79680 R11: 0000000000000293 R12: 00007f03e4e43000
      R13: 00007f03e4e43170 R14: 00007f03e4e43178 R15: 00007f03e4e43170
       </TASK>
      
      [1]:
      FAULT_INJECTION: forcing a failure.
      name failslab, interval 1, probability 0, space 0, times 0
      CPU: 0 PID: 350833 Comm: syz-executor.1 Not tainted 6.7.0-12272-g2121c43f #9
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
      Call Trace:
       <TASK>
       dump_stack_lvl (lib/dump_stack.c:107 (discriminator 1))
       should_fail_ex (lib/fault-inject.c:52 lib/fault-inject.c:153)
       should_failslab (mm/slub.c:3748)
       kmem_cache_alloc (mm/slub.c:3763 mm/slub.c:3842 mm/slub.c:3867)
       inet_bind2_bucket_create (net/ipv4/inet_hashtables.c:135)
       __inet_hash_connect (net/ipv4/inet_hashtables.c:1100)
       dccp_v4_connect (net/dccp/ipv4.c:116)
       __inet_stream_connect (net/ipv4/af_inet.c:676)
       inet_stream_connect (net/ipv4/af_inet.c:747)
       __sys_connect_file (net/socket.c:2048 (discriminator 2))
       __sys_connect (net/socket.c:2065)
       __x64_sys_connect (net/socket.c:2072)
       do_syscall_64 (arch/x86/entry/common.c:52 arch/x86/entry/common.c:83)
       entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:129)
      RIP: 0033:0x7f03e5284e5d
      Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 73 9f 1b 00 f7 d8 64 89 01 48
      RSP: 002b:00007f03e4641cc8 EFLAGS: 00000246 ORIG_RAX: 000000000000002a
      RAX: ffffffffffffffda RBX: 00000000004bbf80 RCX: 00007f03e5284e5d
      RDX: 0000000000000010 RSI: 0000000020000000 RDI: 0000000000000003
      RBP: 00000000004bbf80 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000001
      R13: 000000000000000b R14: 00007f03e52e5530 R15: 0000000000000000
       </TASK>
      Reported-by: syzkaller <syzkaller@googlegroups.com>
      Fixes: 28044fc1 ("net: Add a bhash2 table hashed by port and address")
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'bridge-mdb-events' · 82a678e2
      David S. Miller authored
      Tobias Waldekranz says:
      
      ====================
      net: bridge: switchdev: Ensure MDB events are delivered exactly once
      
      When a device is attached to a bridge, drivers will request a replay
      of objects that were created before the device joined the bridge, that
      are still of interest to the joining port. Typical examples include
      FDB entries and MDB memberships on other ports ("foreign interfaces")
      or on the bridge itself.
      
      Conversely when a device is detached, the bridge will synthesize
      deletion events for all those objects that are still live, but no
      longer applicable to the device in question.
      
      This series eliminates two races related to the synching and
      unsynching phases of a bridge's MDB with a joining or leaving device,
      that would cause notifications of such objects to be either delivered
      twice (1/2), or not at all (2/2).
      
      A similar race to the one solved by 1/2 still remains for the
      FDB. This is much harder to solve, due to the lockless operation of
      the FDB's rhashtable, and is therefore knowingly left out of this
      series.
      
      v1 -> v2:
      - Squash the previously separate addition of
        switchdev_port_obj_act_is_deferred into first consumer.
      - Use ether_addr_equal to compare MAC addresses.
      - Document switchdev_port_obj_act_is_deferred (renamed from
        switchdev_port_obj_is_deferred in v1, to indicate that we also match
        on the action).
      - Delay allocations of MDB objects until we know they're needed.
      - Use non-RCU version of the hash list iterator, now that the MDB is
        not scanned while holding the RCU read lock.
      - Add Fixes tag to commit message
      
      v2 -> v3:
      - Fix unlocking in error paths
      - Access RCU protected port list via mlock_dereference, since MDB is
        guaranteed to remain constant for the duration of the scan.
      
      v3 -> v4:
      - Limit the search for existing deferred events in 1/2 to only apply to
        additions, since the problem does not exist in the deletion case.
      - Add 2/2, to plug a related race when unoffloading an indirectly
        associated device.
      
      v4 -> v5:
      - Fix grammatical errors in kerneldoc of
        switchdev_port_obj_act_is_deferred
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: bridge: switchdev: Ensure deferred event delivery on unoffload · f7a70d65
      Tobias Waldekranz authored
      When unoffloading a device, it is important to ensure that all
      relevant deferred events are delivered to it before it disassociates
      itself from the bridge.
      
      Before this change, this was true for the normal case when a device
      maps 1:1 to a net_bridge_port, i.e.
      
         br0
         /
      swp0
      
      When swp0 leaves br0, the call to switchdev_deferred_process() in
      del_nbp() makes sure to process any outstanding events while the
      device is still associated with the bridge.
      
      In the case when the association is indirect though, i.e. when the
      device is attached to the bridge via an intermediate device, like a
      LAG...
      
          br0
          /
        lag0
        /
      swp0
      
      ...then detaching swp0 from lag0 does not cause any net_bridge_port to
      be deleted, so there was no guarantee that all events had been
      processed before the device disassociated itself from the bridge.
      
      Fix this by always synchronously processing all deferred events before
      signaling completion of unoffloading back to the driver.
      
      Fixes: 4e51bf44 ("net: bridge: move the switchdev object replay helpers to "push" mode")
      Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
      Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: bridge: switchdev: Skip MDB replays of deferred events on offload · dc489f86
      Tobias Waldekranz authored
      Before this change, generation of the list of MDB events to replay
      would race against the creation of new group memberships, either from
      the IGMP/MLD snooping logic or from user configuration.
      
      While new memberships are immediately visible to walkers of
      br->mdb_list, the notification of their existence to switchdev event
      subscribers is deferred until a later point in time. So if a replay
      list was generated during a time that overlapped with such a window,
      it would also contain a replay of the not-yet-delivered event.
      
      The driver would thus receive two copies of what the bridge internally
      considered to be one single event. On destruction of the bridge, only
      a single membership deletion event was therefore sent. As a
      consequence of this, drivers which reference count memberships (at
      least DSA), would be left with orphan groups in their hardware
      database when the bridge was destroyed.
      
      This is only an issue when replaying additions. While deletion events
      may still be pending on the deferred queue, they will already have
      been removed from br->mdb_list, so no duplicates can be generated in
      that scenario.
      
      To a user this meant that old group memberships, from a bridge to
      which a port was previously attached, could be reanimated (in
      hardware) when the port joined a new bridge, without the new bridge's
      knowledge.
      
      For example, on an mv88e6xxx system, create a snooping bridge and
      immediately add a port to it:
      
          root@infix-06-0b-00:~$ ip link add dev br0 up type bridge mcast_snooping 1 && \
          > ip link set dev x3 up master br0
      
      And then destroy the bridge:
      
          root@infix-06-0b-00:~$ ip link del dev br0
          root@infix-06-0b-00:~$ mvls atu
          ADDRESS             FID  STATE      Q  F  0  1  2  3  4  5  6  7  8  9  a
          DEV:0 Marvell 88E6393X
          33:33:00:00:00:6a     1  static     -  -  0  .  .  .  .  .  .  .  .  .  .
          33:33:ff:87:e4:3f     1  static     -  -  0  .  .  .  .  .  .  .  .  .  .
          ff:ff:ff:ff:ff:ff     1  static     -  -  0  1  2  3  4  5  6  7  8  9  a
          root@infix-06-0b-00:~$
      
      The two IPv6 groups remain in the hardware database because the
      port (x3) is notified of the host's membership twice: once via the
      original event and once via a replay. Since only a single delete
      notification is sent, the count remains at 1 when the bridge is
      destroyed.
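
The reference-count arithmetic can be replayed in a toy model. This is a minimal sketch in Python under the assumption that a driver keeps one counter per group address; `RefcountingDriver` and its methods are hypothetical names and do not correspond to DSA or mv88e6xxx functions.

```python
from collections import Counter


class RefcountingDriver:
    """Hypothetical driver that reference counts group memberships."""

    def __init__(self):
        self.refs = Counter()

    def mdb_add(self, group):
        self.refs[group] += 1

    def mdb_del(self, group):
        self.refs[group] -= 1
        # The hardware entry is removed only when the count drops to zero.
        if self.refs[group] <= 0:
            del self.refs[group]

    def hardware_entries(self):
        return sorted(self.refs)


driver = RefcountingDriver()
group = "33:33:00:00:00:6a"
driver.mdb_add(group)  # original (deferred) event
driver.mdb_add(group)  # duplicate delivered via replay
driver.mdb_del(group)  # the single deletion sent on bridge destruction
```

After this sequence the count for the group is still 1, so the entry stays in `hardware_entries()`, which matches the leftover entries in the `mvls atu` output above.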
      
      Then add the same port (or another port belonging to the same hardware
      domain) to a new bridge, this time with snooping disabled:
      
          root@infix-06-0b-00:~$ ip link add dev br1 up type bridge mcast_snooping 0 && \
          > ip link set dev x3 up master br1
      
      All multicast, including the two IPv6 groups from br0, should now be
      flooded, according to the policy of br1. But instead the old
      memberships are still active in the hardware database, causing the
      switch to only forward traffic to those groups towards the CPU (port
      0).
      
      Eliminate the race in two steps:
      
      1. Grab the write-side lock of the MDB while generating the replay
         list.
      
      This prevents new memberships from showing up while we are generating
      the replay list. But it leaves the scenario in which a deferred event
      was already generated, but not delivered, before we grabbed the
      lock. Therefore:
      
      2. Make sure that no deferred version of a replay event is already
         enqueued to the switchdev deferred queue, before adding it to the
         replay list, when replaying additions.
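
The two steps can be sketched together in a toy model. This is an illustration in Python, not the kernel implementation: `build_replay_list`, the list-based `deferred_queue` and the `(op, group)` tuples are assumptions made for the example.

```python
import threading


def build_replay_list(mdb_list, deferred_queue, mdb_lock, adding):
    """Return the (op, group) events to replay to a driver.

    mdb_list:       groups currently known to the bridge
    deferred_queue: (op, group) events queued but not yet delivered
    mdb_lock:       lock guarding mdb_list (step 1)
    adding:         True when replaying additions, False for deletions
    """
    op = "add" if adding else "del"
    replay = []
    # Step 1: hold the lock so that no new membership can appear
    # while the list is being generated.
    with mdb_lock:
        for group in mdb_list:
            # Step 2: when replaying additions, skip any group whose
            # addition is still pending on the deferred queue, since
            # the driver will receive that event anyway.
            if adding and ("add", group) in deferred_queue:
                continue
            replay.append((op, group))
    return replay
```

For example, with `mdb_list = ["g1", "g2"]` and a pending `("add", "g2")`, an addition replay contains only `g1`, so the driver sees each membership exactly once.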
      
      Fixes: 4f2673b3 ("net: bridge: add helper to replay port and host-joined mdb entries")
      Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
      Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/iucv: fix the allocation size of iucv_path_table array · b4ea9b6a
      Alexander Gordeev authored
      iucv_path_table is a dynamically allocated array of pointers to
      struct iucv_path items. Yet, its size is calculated as if it were
      an array of struct iucv_path items.
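
The size difference can be made concrete with `ctypes`. This is a rough sketch in Python: the `IucvPath` layout below is a hypothetical stand-in, not the real `struct iucv_path`, and `max_paths` is an arbitrary illustrative value.

```python
import ctypes


class IucvPath(ctypes.Structure):
    # Hypothetical fields; the real struct iucv_path differs.
    _fields_ = [
        ("pathid", ctypes.c_uint16),
        ("msglim", ctypes.c_uint16),
        ("flags", ctypes.c_uint8),
        ("private", ctypes.c_void_p),
        ("handler", ctypes.c_void_p),
    ]


max_paths = 4  # illustrative value only

# An array of pointers needs one pointer-sized slot per entry.
needed = max_paths * ctypes.sizeof(ctypes.POINTER(IucvPath))

# Sizing the array by the struct itself requests more than that
# whenever the struct is larger than a pointer, as it is here.
sized_by_struct = max_paths * ctypes.sizeof(IucvPath)
```

In this sketch `sized_by_struct` exceeds `needed`; the exact numbers depend on the platform's pointer size and on the struct layout.
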
      Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
      Reviewed-by: Alexandra Winter <wintera@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>