1. 15 Apr, 2022 2 commits
  2. 14 Apr, 2022 12 commits
    • Merge tag 'net-5.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · d20339fa
      Linus Torvalds authored
      Pull networking fixes from Paolo Abeni:
       "Including fixes from wireless and netfilter.
      
        Current release - regressions:
      
         - smc: fix af_ops of child socket pointing to released memory
      
         - wifi: ath9k: fix usage of driver-private space in tx_info
      
        Previous releases - regressions:
      
         - ipv6: fix panic when forwarding a pkt with no in6 dev
      
         - sctp: use the correct skb for security_sctp_assoc_request
      
         - smc: fix NULL pointer dereference in smc_pnet_find_ib()
      
         - sched: fix initialization order when updating chain 0 head
      
         - phy: don't defer probe forever if PHY IRQ provider is missing
      
         - dsa: revert "net: dsa: setup master before ports"
      
         - dsa: felix: fix tagging protocol changes with multiple CPU ports
      
         - eth: ice:
            - fix use-after-free when freeing @rx_cpu_rmap
            - revert "iavf: fix deadlock occurrence during resetting VF
              interface"
      
         - eth: lan966x: stop processing the MAC entry if port is wrong
      
        Previous releases - always broken:
      
         - sched:
            - flower: fix parsing of ethertype following VLAN header
            - taprio: check if socket flags are valid
      
         - nfc: add flush_workqueue to prevent uaf
      
         - veth: ensure eth header is in skb's linear part
      
         - eth: stmmac: fix altr_tse_pcs function when using a fixed-link
      
         - eth: macb: restart tx only if queue pointer is lagging
      
         - eth: macvlan: fix leaking skb in source mode with nodst option"
      
      * tag 'net-5.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (52 commits)
        net: bcmgenet: Revert "Use stronger register read/writes to assure ordering"
        rtnetlink: Fix handling of disabled L3 stats in RTM_GETSTATS replies
        net: dsa: felix: fix tagging protocol changes with multiple CPU ports
        tun: annotate access to queue->trans_start
        nfc: nci: add flush_workqueue to prevent uaf
        net: dsa: realtek: don't parse compatible string for RTL8366S
        net: dsa: realtek: fix Kconfig to assure consistent driver linkage
        net: ftgmac100: access hardware register after clock ready
        Revert "net: dsa: setup master before ports"
        macvlan: Fix leaking skb in source mode with nodst option
        netfilter: nf_tables: nft_parse_register can return a negative value
        net: lan966x: Stop processing the MAC entry is port is wrong.
        net: lan966x: Fix when a port's upper is changed.
        net: lan966x: Fix IGMP snooping when frames have vlan tag
        net: lan966x: Update lan966x_ptp_get_nominal_value
        sctp: Initialize daddr on peeled off socket
        net/smc: Fix af_ops of child socket pointing to released memory
        net/smc: Fix NULL pointer dereference in smc_pnet_find_ib()
        net/smc: use memcpy instead of snprintf to avoid out of bounds read
        net: macb: Restart tx only if queue pointer is lagging
        ...
    • Merge tag 'sound-5.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound · b9b4c79e
      Linus Torvalds authored
      Pull sound fixes from Takashi Iwai:
       "This became an unexpectedly large pull request due to various
        regression fixes in the previous kernels.
      
        The majority of fixes are a series of patches to address the
        regression at probe errors in devres'ed drivers, while there are yet
        more fixes for the x86 SG allocations and for USB-audio buffer
        management. In addition, a few HD-audio quirks and other small fixes
        are found"
      
      * tag 'sound-5.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (52 commits)
        ALSA: usb-audio: Limit max buffer and period sizes per time
        ALSA: memalloc: Add fallback SG-buffer allocations for x86
        ALSA: nm256: Don't call card private_free at probe error path
        ALSA: mtpav: Don't call card private_free at probe error path
        ALSA: rme9652: Fix the missing snd_card_free() call at probe error
        ALSA: hdspm: Fix the missing snd_card_free() call at probe error
        ALSA: hdsp: Fix the missing snd_card_free() call at probe error
        ALSA: oxygen: Fix the missing snd_card_free() call at probe error
        ALSA: lx6464es: Fix the missing snd_card_free() call at probe error
        ALSA: cmipci: Fix the missing snd_card_free() call at probe error
        ALSA: aw2: Fix the missing snd_card_free() call at probe error
        ALSA: als300: Fix the missing snd_card_free() call at probe error
        ALSA: lola: Fix the missing snd_card_free() call at probe error
        ALSA: bt87x: Fix the missing snd_card_free() call at probe error
        ALSA: sis7019: Fix the missing error handling
        ALSA: intel_hdmi: Fix the missing snd_card_free() call at probe error
        ALSA: via82xx: Fix the missing snd_card_free() call at probe error
        ALSA: sonicvibes: Fix the missing snd_card_free() call at probe error
        ALSA: rme96: Fix the missing snd_card_free() call at probe error
        ALSA: rme32: Fix the missing snd_card_free() call at probe error
        ...
    • Merge tag 'for-5.18-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux · 722985e2
      Linus Torvalds authored
      Pull btrfs fixes from David Sterba:
       "A few more code and warning fixes.
      
        There's one feature ioctl removal patch slated for 5.18 that did not
        make it to the main pull request. It's just a one-liner and the ioctl
        has had a v2 in use for a long time, so there's no point in postponing
        it to 5.19.
      
        Late update:
      
         - remove balance v1 ioctl, superseded by v2 in 2012
      
        Fixes:
      
         - add back cgroup attribution for compressed writes
      
         - add super block write start/end annotations to asynchronous balance
      
         - fix root reference count on an error handling path
      
         - in zoned mode, activate zone at the chunk allocation time to avoid
           ENOSPC due to timing issues
      
         - fix delayed allocation accounting for direct IO
      
        Warning fixes:
      
         - simplify assertion condition in zoned check
      
         - remove an unused variable"
      
      * tag 'for-5.18-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
        btrfs: fix btrfs_submit_compressed_write cgroup attribution
        btrfs: fix root ref counts in error handling in btrfs_get_root_ref
        btrfs: zoned: activate block group only for extent allocation
        btrfs: return allocated block group from do_chunk_alloc()
        btrfs: mark resumed async balance as writing
        btrfs: remove support of balance v1 ioctl
        btrfs: release correct delalloc amount in direct IO write path
        btrfs: remove unused variable in btrfs_{start,write}_dirty_block_groups()
        btrfs: zoned: remove redundant condition in btrfs_run_delalloc_range
    • Merge tag 'fscache-fixes-20220413' of... · ec9c57a7
      Linus Torvalds authored
      Merge tag 'fscache-fixes-20220413' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs
      
      Pull fscache fixes from David Howells:
       "Here's a collection of fscache and cachefiles fixes and misc small
        cleanups. The two main fixes are:
      
         - Add a missing unmark of the inode in-use mark in an error path.
      
         - Fix a KASAN slab-out-of-bounds error when setting the xattr on a
           cachefiles volume due to the wrong length being given to memcpy().
      
        In addition, there's the removal of an unused parameter, removal of an
        unused Kconfig option, conditionalising a bit of procfs-related stuff
        and some doc fixes"
      
      * tag 'fscache-fixes-20220413' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
        fscache: remove FSCACHE_OLD_API Kconfig option
        fscache: Use wrapper fscache_set_cache_state() directly when relinquishing
        fscache: Move fscache_cookies_seq_ops specific code under CONFIG_PROC_FS
        fscache: Remove the cookie parameter from fscache_clear_page_bits()
        docs: filesystems: caching/backend-api.rst: fix an object withdrawn API
        docs: filesystems: caching/backend-api.rst: correct two relinquish APIs use
        cachefiles: Fix KASAN slab-out-of-bounds in cachefiles_set_volume_xattr
        cachefiles: unmark inode in use in error path
    • Merge branch 'rndis_host-handle-bogus-mac-addresses-in-zte-rndis-devices' · caf968b4
      Paolo Abeni authored
      Lech Perczak says:
      
      ====================
      rndis_host: handle bogus MAC addresses in ZTE RNDIS devices
      
      When porting support for the ZTE MF286R to OpenWrt [1], it was discovered
      that its built-in LTE modem fails to adjust its target MAC address when a
      random MAC address is assigned to the interface due to detection of the
      "locally-administered address" bit. This leads to ingress traffic being
      dropped at the host. The modem uses RNDIS as its primary interface, with
      some variants exposing both RNDIS and CDC-ECM simultaneously.

      Then it was discovered that the cdc_ether driver contains a fixup for that
      exact issue, which also appears on CDC-ECM interfaces [2].
      I discussed how to proceed with Bjørn Mork on the OpenWrt forum [3]. The
      first approach would be to trust the locally-administered MAC again and
      add a quirk for the problematic ZTE devices, as suggested by Kristian
      Evensen before [4], but reusing the fixup from cdc_ether looks like a
      safer and more generic solution.

      Finally, following Bjørn's suggestion, limit the scope of bogus MAC
      address detection to ZTE devices, the same way as it is done in cdc_ether,
      as this trait wasn't really observed outside of ZTE devices.
      Do that for both flavours of RNDIS devices, with interface classes
      02/02/ff and e0/01/03, as both types are reported by different modems.
      
      [1] https://git.openwrt.org/?p=openwrt/openwrt.git;a=commit;h=7ac8da00609f42b8aba74b7efc6b0d055b7cef3e
      [2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=bfe9b9d2df669a57a95d641ed46eb018e204c6ce
      [3] https://forum.openwrt.org/t/problem-with-modem-in-zte-mf286r/120988
      [4] https://lore.kernel.org/all/CAKfDRXhDp3heiD75Lat7cr1JmY-kaJ-MS0tt7QXX=s8RFjbpUQ@mail.gmail.com/T/
      
      Cc: Bjørn Mork <bjorn@mork.no>
      Cc: Kristian Evensen <kristian.evensen@gmail.com>
      Cc: Oliver Neukum <oliver@neukum.org>
      
      v3: Fixed wrong identifier commit description and whitespace in patch 2.
      
      v2: ensure that MAC fixup is applied to all Ethernet frames in RNDIS
      batch, by introducing a driver flag, and integrating the fixup inside
      rndis_rx_fixup().
      ====================
      
      Link: https://lore.kernel.org/r/20220413014416.2306843-1-lech.perczak@gmail.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • rndis_host: limit scope of bogus MAC address detection to ZTE devices · 171cfae6
      Lech Perczak authored
      Reporting of bogus MAC addresses and ignoring configuration of new
      destination address wasn't observed outside of a range of ZTE devices,
      among which this seems to be the common bug. Align rndis_host driver
      with implementation found in cdc_ether, which also limits this workaround
      to ZTE devices.
      Suggested-by: Bjørn Mork <bjorn@mork.no>
      Cc: Kristian Evensen <kristian.evensen@gmail.com>
      Cc: Oliver Neukum <oliver@neukum.org>
      Signed-off-by: Lech Perczak <lech.perczak@gmail.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • rndis_host: enable the bogus MAC fixup for ZTE devices from cdc_ether · 36e74797
      Lech Perczak authored
      Certain ZTE modems, namely: MF823, MF831, MF910, built-in modem from
      MF286R, expose both CDC-ECM and RNDIS network interfaces.
      They have a trait of ignoring the locally-administered MAC address
      configured on the interface both in CDC-ECM and RNDIS part,
      and this leads to dropping of incoming traffic by the host.
      However, the workaround was only present in CDC-ECM, and MF286R
      explicitly requires it in RNDIS mode.
      
      Re-use the workaround in rndis_host as well, to fix operation of MF286R
      module, some versions of which expose only the RNDIS interface. Do so by
      introducing new flag, RNDIS_DRIVER_DATA_DST_MAC_FIXUP, and testing for it
      in rndis_rx_fixup. This is required, as RNDIS uses frame batching, and all
      of the packets inside the batch need the fixup. This might introduce a
      performance penalty, because test is done for every returned Ethernet
      frame.
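A minimal userspace sketch of the per-frame fixup described above, assuming the workaround keys on the locally-administered bit (0x02 in the first octet) of the destination MAC and rewrites that address to the host interface address. The names `dst_mac_fixup_sketch`, `fixup_batch_sketch` and `struct frame_sketch` are hypothetical and are not the kernel's API; the batch is modelled as a plain array of frames rather than an RNDIS transfer.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN 6   /* bytes in a MAC address */
#define ETH_HLEN 14  /* destination MAC + source MAC + EtherType */

/*
 * Hypothetical stand-in for the per-frame fixup: if the destination MAC has
 * the locally-administered bit set, overwrite it with the host interface
 * address. Returns 1 if the frame was rewritten, 0 otherwise.
 */
static int dst_mac_fixup_sketch(uint8_t *frame, size_t len,
                                const uint8_t dev_addr[ETH_ALEN])
{
    if (len < ETH_HLEN || !(frame[0] & 0x02))
        return 0;
    memcpy(frame, dev_addr, ETH_ALEN);
    return 1;
}

/* One received frame inside a batch (assumption: modelled as ptr + length). */
struct frame_sketch {
    uint8_t *data;
    size_t len;
};

/*
 * Because several Ethernet frames arrive in one batch, the check has to run
 * once per frame. Returns the number of frames that were rewritten.
 */
static int fixup_batch_sketch(struct frame_sketch *frames, size_t n,
                              const uint8_t dev_addr[ETH_ALEN])
{
    int fixed = 0;

    for (size_t i = 0; i < n; i++)
        fixed += dst_mac_fixup_sketch(frames[i].data, frames[i].len, dev_addr);
    return fixed;
}
```

The per-frame loop is where the possible performance penalty mentioned in the commit message would come from: the test runs for every frame, even when no rewrite is needed.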
      
      Apply the workaround to both "flavors" of RNDIS interfaces, as older ZTE
      modems, like MF823 found in the wild, report the USB_CLASS_COMM class
      interfaces, while MF286R reports USB_CLASS_WIRELESS_CONTROLLER.
      Suggested-by: Bjørn Mork <bjorn@mork.no>
      Cc: Kristian Evensen <kristian.evensen@gmail.com>
      Cc: Oliver Neukum <oliver@neukum.org>
      Signed-off-by: Lech Perczak <lech.perczak@gmail.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • cdc_ether: export usbnet_cdc_zte_rx_fixup · 64b97df9
      Lech Perczak authored
      Commit bfe9b9d2 ("cdc_ether: Improve ZTE MF823/831/910 handling")
      introduces a workaround for certain ZTE modems reporting invalid MAC
      addresses over CDC-ECM.
      The same issue was present on their RNDIS interface, which was fixed in
      commit a5a18bdf ("rndis_host: Set valid random MAC on buggy devices").
      
      However, internal modem of ZTE MF286R router, on its RNDIS interface, also
      exhibits a second issue fixed already in CDC-ECM, of the device not
      respecting configured random MAC address. In order to share the fixup for
      this with rndis_host driver, export the workaround function, which will
      be re-used in the following commit in rndis_host.
      
      Cc: Kristian Evensen <kristian.evensen@gmail.com>
      Cc: Bjørn Mork <bjorn@mork.no>
      Cc: Oliver Neukum <oliver@neukum.org>
      Signed-off-by: Lech Perczak <lech.perczak@gmail.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • net: bcmgenet: Revert "Use stronger register read/writes to assure ordering" · 2df3fc4a
      Jeremy Linton authored
      It turns out after digging deeper into this bug, that it was being
      triggered by GCC12 failing to call the bcmgenet_enable_dma()
      routine. Given that a gcc12 fix has been merged [1] and the genet
      driver now works properly when built with gcc12, this commit should
      be reverted.
      
      [1]
      https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105160
      https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=aabb9a261ef060cf24fd626713f1d7d9df81aa57
      
      Fixes: 8d3ea3d4 ("net: bcmgenet: Use stronger register read/writes to assure ordering")
      Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
      Acked-by: Florian Fainelli <f.fainelli@gmail.com>
      Link: https://lore.kernel.org/r/20220412210420.1129430-1-jeremy.linton@arm.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • rtnetlink: Fix handling of disabled L3 stats in RTM_GETSTATS replies · 23cfe941
      Petr Machata authored
      When L3 stats are disabled, rtnl_offload_xstats_get_size_stats() returns
      size of 0, which is supposed to be an indication that the corresponding
      attribute should not be emitted. However, instead, the current code
      reserves a 0-byte attribute.
      
      The reason this does not show up as a citation on a kasan kernel is that
      netdev_offload_xstats_get(), which is supposed to fill in the data, never
      ends up getting called, because rtnl_offload_xstats_get_stats() notices
      that the stats are not actually used and skips the call.
      
      Thus a zero-length IFLA_OFFLOAD_XSTATS_L3_STATS attribute ends up in a
      response, confusing the userspace.
      
      Fix by skipping the L3-stats related block in rtnl_offload_xstats_fill().
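The intent of the fix can be illustrated with a small userspace model, assuming a size of 0 means "do not emit the attribute". Everything here is hypothetical: `struct reply_sketch`, the helper names and the placeholder size of 64 bytes are invented for illustration and do not mirror the rtnetlink code.

```c
#include <stddef.h>

/* Hypothetical model of a reply being built: attribute count and payload. */
struct reply_sketch {
    size_t nattrs;
    size_t payload;
};

/* Stand-in for the size helper: 0 signals that L3 stats are disabled. */
static size_t l3_stats_size_sketch(int l3_enabled)
{
    return l3_enabled ? 64 : 0;  /* 64 is an arbitrary placeholder size */
}

/* Reserve one attribute of 'size' bytes in the reply. */
static void reserve_attr_sketch(struct reply_sketch *r, size_t size)
{
    r->nattrs++;
    r->payload += size;
}

/*
 * Fixed behaviour: when the size is 0, skip the whole L3 block instead of
 * reserving a zero-length attribute that would confuse userspace.
 */
static void fill_l3_stats_sketch(struct reply_sketch *r, int l3_enabled)
{
    size_t size = l3_stats_size_sketch(l3_enabled);

    if (size == 0)
        return;
    reserve_attr_sketch(r, size);
}
```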
      
      Fixes: 0e7788fd ("net: rtnetlink: Add UAPI for obtaining L3 offload xstats")
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Reviewed-by: Ido Schimmel <idosch@nvidia.com>
      Link: https://lore.kernel.org/r/591b58e7623edc3eb66dd1fcfa8c8f133d090974.1649794741.git.petrm@nvidia.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • net: dsa: felix: fix tagging protocol changes with multiple CPU ports · 00fa91bc
      Vladimir Oltean authored
      When the device tree has 2 CPU ports defined, a single one is active
      (has any dp->cpu_dp pointers point to it). Yet the second one is still a
      CPU port, and DSA still calls ->change_tag_protocol on it.
      
      On the NXP LS1028A, the CPU ports are ports 4 and 5. Port 4 is the
      active CPU port and port 5 is inactive.
      
      After the following commands:
      
       # Initial setting
       cat /sys/class/net/eno2/dsa/tagging
       ocelot
       echo ocelot-8021q > /sys/class/net/eno2/dsa/tagging
       echo ocelot > /sys/class/net/eno2/dsa/tagging
      
      traffic is now broken, because the driver has moved the NPI port from
      port 4 to port 5, unbeknown to DSA.
      
      The problem can be avoided by detecting that the second CPU port is
      unused, and not doing anything for it. Further rework will be needed
      when proper support for multiple CPU ports is added.
      
      Treat this as a bug and prepare current kernels to work in single-CPU
      mode with multiple-CPU DT blobs.
      
      Fixes: adb3dccf ("net: dsa: felix: convert to the new .change_tag_protocol DSA API")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Link: https://lore.kernel.org/r/20220412172209.2531865-1-vladimir.oltean@nxp.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • tun: annotate access to queue->trans_start · 968a1a5d
      Antoine Tenart authored
      Commit 5337824f ("net: annotate accesses to queue->trans_start")
      introduced a new helper, txq_trans_cond_update, to update
      queue->trans_start using WRITE_ONCE. One snippet in drivers/net/tun.c
      was missed, as it was introduced roughly at the same time.
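A userspace sketch of what such an annotated update might look like, assuming the helper only stores when the value changes. The `READ_ONCE`/`WRITE_ONCE` macros below are simplified approximations built on volatile accesses (using the GNU `__typeof__` extension), and `struct txq_sketch` plus the explicit `now` parameter are hypothetical stand-ins for the kernel's queue structure and `jiffies`.

```c
/*
 * Simplified approximations of READ_ONCE()/WRITE_ONCE(): a volatile access
 * keeps the compiler from tearing, fusing or re-reading the load or store.
 */
#define READ_ONCE(x)       (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

/* Hypothetical stand-in for the TX queue: only the field discussed here. */
struct txq_sketch {
    unsigned long trans_start;
};

/*
 * Update trans_start only when it differs from 'now', so a hot path does not
 * dirty the cache line on every packet. Returns 1 if a store was performed.
 */
static int txq_trans_cond_update_sketch(struct txq_sketch *txq,
                                        unsigned long now)
{
    if (READ_ONCE(txq->trans_start) != now) {
        WRITE_ONCE(txq->trans_start, now);
        return 1;
    }
    return 0;
}
```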
      
      Fixes: 5337824f ("net: annotate accesses to queue->trans_start")
      Cc: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Antoine Tenart <atenart@kernel.org>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Link: https://lore.kernel.org/r/20220412135852.466386-1-atenart@kernel.org
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
  3. 13 Apr, 2022 26 commits
    • nfc: nci: add flush_workqueue to prevent uaf · ef27324e
      Lin Ma authored
      Our detector found a concurrent use-after-free bug when detaching an
      NCI device. The root cause is unexpected scheduling between the two
      deferred mechanisms in use (timer and workqueue).
      
      The race can be demonstrated below:
      
      Thread-1                           Thread-2
                                       | nci_dev_up()
                                       |   nci_open_device()
                                       |     __nci_request(nci_reset_req)
                                       |       nci_send_cmd
                                       |         queue_work(cmd_work)
      nci_unregister_device()          |
        nci_close_device()             | ...
          del_timer_sync(cmd_timer)[1] |
      ...                              | Worker
      nci_free_device()                | nci_cmd_work()
        kfree(ndev)[3]                 |   mod_timer(cmd_timer)[2]
      
      In short, the cleanup routine assumes that cmd_timer has already been
      detached by [1], but mod_timer can re-attach the timer [2] even though
      the device is already released [3], resulting in a UAF.

      This UAF is easy to trigger; the crash trace from the PoC is shown below:
      
      [   66.703713] ==================================================================
      [   66.703974] BUG: KASAN: use-after-free in enqueue_timer+0x448/0x490
      [   66.703974] Write of size 8 at addr ffff888009fb7058 by task kworker/u4:1/33
      [   66.703974]
      [   66.703974] CPU: 1 PID: 33 Comm: kworker/u4:1 Not tainted 5.18.0-rc2 #5
      [   66.703974] Workqueue: nfc2_nci_cmd_wq nci_cmd_work
      [   66.703974] Call Trace:
      [   66.703974]  <TASK>
      [   66.703974]  dump_stack_lvl+0x57/0x7d
      [   66.703974]  print_report.cold+0x5e/0x5db
      [   66.703974]  ? enqueue_timer+0x448/0x490
      [   66.703974]  kasan_report+0xbe/0x1c0
      [   66.703974]  ? enqueue_timer+0x448/0x490
      [   66.703974]  enqueue_timer+0x448/0x490
      [   66.703974]  __mod_timer+0x5e6/0xb80
      [   66.703974]  ? mark_held_locks+0x9e/0xe0
      [   66.703974]  ? try_to_del_timer_sync+0xf0/0xf0
      [   66.703974]  ? lockdep_hardirqs_on_prepare+0x17b/0x410
      [   66.703974]  ? queue_work_on+0x61/0x80
      [   66.703974]  ? lockdep_hardirqs_on+0xbf/0x130
      [   66.703974]  process_one_work+0x8bb/0x1510
      [   66.703974]  ? lockdep_hardirqs_on_prepare+0x410/0x410
      [   66.703974]  ? pwq_dec_nr_in_flight+0x230/0x230
      [   66.703974]  ? rwlock_bug.part.0+0x90/0x90
      [   66.703974]  ? _raw_spin_lock_irq+0x41/0x50
      [   66.703974]  worker_thread+0x575/0x1190
      [   66.703974]  ? process_one_work+0x1510/0x1510
      [   66.703974]  kthread+0x2a0/0x340
      [   66.703974]  ? kthread_complete_and_exit+0x20/0x20
      [   66.703974]  ret_from_fork+0x22/0x30
      [   66.703974]  </TASK>
      [   66.703974]
      [   66.703974] Allocated by task 267:
      [   66.703974]  kasan_save_stack+0x1e/0x40
      [   66.703974]  __kasan_kmalloc+0x81/0xa0
      [   66.703974]  nci_allocate_device+0xd3/0x390
      [   66.703974]  nfcmrvl_nci_register_dev+0x183/0x2c0
      [   66.703974]  nfcmrvl_nci_uart_open+0xf2/0x1dd
      [   66.703974]  nci_uart_tty_ioctl+0x2c3/0x4a0
      [   66.703974]  tty_ioctl+0x764/0x1310
      [   66.703974]  __x64_sys_ioctl+0x122/0x190
      [   66.703974]  do_syscall_64+0x3b/0x90
      [   66.703974]  entry_SYSCALL_64_after_hwframe+0x44/0xae
      [   66.703974]
      [   66.703974] Freed by task 406:
      [   66.703974]  kasan_save_stack+0x1e/0x40
      [   66.703974]  kasan_set_track+0x21/0x30
      [   66.703974]  kasan_set_free_info+0x20/0x30
      [   66.703974]  __kasan_slab_free+0x108/0x170
      [   66.703974]  kfree+0xb0/0x330
      [   66.703974]  nfcmrvl_nci_unregister_dev+0x90/0xd0
      [   66.703974]  nci_uart_tty_close+0xdf/0x180
      [   66.703974]  tty_ldisc_kill+0x73/0x110
      [   66.703974]  tty_ldisc_hangup+0x281/0x5b0
      [   66.703974]  __tty_hangup.part.0+0x431/0x890
      [   66.703974]  tty_release+0x3a8/0xc80
      [   66.703974]  __fput+0x1f0/0x8c0
      [   66.703974]  task_work_run+0xc9/0x170
      [   66.703974]  exit_to_user_mode_prepare+0x194/0x1a0
      [   66.703974]  syscall_exit_to_user_mode+0x19/0x50
      [   66.703974]  do_syscall_64+0x48/0x90
      [   66.703974]  entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      To fix the UAF, this patch adds flush_workqueue() to ensure that
      nci_cmd_work has finished before the following del_timer_sync().
      This combination guarantees that the timer is actually detached.
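The ordering argument can be checked with a deterministic, single-threaded toy model. This is only a sketch under stated assumptions: the real bug is a race between threads, while here "pending work" and "armed timer" are plain flags, and all names (`struct nci_model`, `close_with_flush` and so on) are hypothetical rather than NCI core functions.

```c
#include <stdbool.h>

/* Toy model of the device state that matters for the race. */
struct nci_model {
    bool work_pending;  /* cmd_work is queued but has not run yet */
    bool timer_armed;   /* cmd_timer is pending */
};

/* Models the command worker: sending a command re-arms the timer. */
static void run_cmd_work(struct nci_model *m)
{
    if (m->work_pending) {
        m->work_pending = false;
        m->timer_armed = true;
    }
}

/* Models flushing the workqueue: queued work runs to completion. */
static void flush_work_model(struct nci_model *m)
{
    run_cmd_work(m);
}

/* Models a synchronous timer deletion. */
static void del_timer_model(struct nci_model *m)
{
    m->timer_armed = false;
}

/*
 * Old order: delete the timer while work may still be queued. Returns true
 * if something could still touch the device after close, which is the
 * precondition for the use-after-free.
 */
static bool close_without_flush(struct nci_model *m)
{
    del_timer_model(m);
    return m->work_pending || m->timer_armed;
}

/* Fixed order: flush the work first, then delete the timer. */
static bool close_with_flush(struct nci_model *m)
{
    flush_work_model(m);
    del_timer_model(m);
    return m->work_pending || m->timer_armed;
}
```

In the model, only the flush-then-delete order leaves nothing behind that could re-arm the timer once the device memory is freed.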
      
      Fixes: 6a2968aa ("NFC: basic NCI protocol implementation")
      Signed-off-by: Lin Ma <linma@zju.edu.cn>
      Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: realtek: don't parse compatible string for RTL8366S · 8e925de6
      Alvin Šipraga authored
      This switch is not even supported, but if someone were to actually put
      this compatible string "realtek,rtl8366s" in their device tree, they
      would be greeted with a kernel panic because the probe function would
      dereference NULL. So let's just remove it.
      
      Link: https://lore.kernel.org/all/CACRpkdYdKZs0WExXc3=0yPNOwP+oOV60HRz7SRoGjZvYHaT=1g@mail.gmail.com/
      Signed-off-by: Alvin Šipraga <alsi@bang-olufsen.dk>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: realtek: fix Kconfig to assure consistent driver linkage · 2511e0c8
      Alvin Šipraga authored
      The kernel test robot reported a build failure:
      
      or1k-linux-ld: drivers/net/dsa/realtek/realtek-smi.o:(.rodata+0x16c): undefined reference to `rtl8366rb_variant'
      
      ... with the following build configuration:
      
      CONFIG_NET_DSA_REALTEK=y
      CONFIG_NET_DSA_REALTEK_SMI=y
      CONFIG_NET_DSA_REALTEK_RTL8365MB=y
      CONFIG_NET_DSA_REALTEK_RTL8366RB=m
      
      The problem here is that the realtek-smi interface driver gets built-in,
      while the rtl8366rb switch subdriver gets built as a module, hence the
      symbol rtl8366rb_variant is not reachable when defining the OF device
      table in the interface driver.
      
      The Kconfig dependencies don't help in this scenario because they just
      say that the subdriver(s) depend on at least one interface driver. In
      fact, the subdrivers don't depend on the interface drivers at all, and
      can be built even in their absence. Somewhat strangely, the
      interface drivers can also be built in the absence of any subdriver,
      BUT, if a subdriver IS enabled, then it must be reachable according to
      the linkage of the interface driver: effectively what the IS_REACHABLE()
      macro achieves. If it is not reachable, the above kind of linker error
      will be observed.
      
      Rather than papering over the above build error by simply using
      IS_REACHABLE(), we can do a little better and admit that it is actually
      the interface drivers that have a dependency on the subdrivers. So this
      patch does exactly that. Specifically, we ensure that:
      
      1. The interface drivers' Kconfig symbols must have a value no greater
         than the value of any subdriver Kconfig symbols.
      
      2. The subdrivers should by default enable both interface drivers, since
         most users probably want at least one of them; those interface
         drivers can be explicitly disabled however.
      
      What this doesn't do is prevent a user from building only a subdriver,
      without any interface driver. To that end, add an additional line of
      help in the menu to guide users in the right direction.
      
      Link: https://lore.kernel.org/all/202204110757.XIafvVnj-lkp@intel.com/
      Reported-by: kernel test robot <lkp@intel.com>
      Fixes: aac94001 ("net: dsa: realtek: add new mdio interface for drivers")
      Signed-off-by: Alvin Šipraga <alsi@bang-olufsen.dk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • nfp: update nfp_X logging definitions · 9386ebcc
      Dylan Muller authored
      Previously it was not possible to determine which code path was responsible
      for generating a certain message after a call to the nfp_X messaging
      definitions for cases of duplicate strings. We therefore modify nfp_err,
      nfp_warn, nfp_info, nfp_dbg and nfp_printk to print the corresponding file
      and line number where the nfp_X definition is used.
      Signed-off-by: Dylan Muller <dylan.muller@corigine.com>
      Signed-off-by: Simon Horman <simon.horman@corigine.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge tag 'wireless-2022-04-13' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless · dad32cfe
      David S. Miller authored
      Kalle Valo says:
      
      ====================
      wireless fixes for v5.18
      
      First set of fixes for v5.18. Maintainers file updates, two
      compilation warning fixes, one revert for ath11k and smaller fixes to
      drivers and stack. All the usual stuff.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'ip-ingress-skb-reason' · 735cb16b
      David S. Miller authored
      Menglong Dong says:
      
      ====================
      net: ip: add skb drop reasons to ip ingress
      
      In the series "net: use kfree_skb_reason() for ip/udp packet receive",
      skb drop reasons are added to the basic ingress path of IPv4. And in
      the series "net: use kfree_skb_reason() for ip/neighbour", the egress
      paths of IPv4 and IPv6 are handled. Related links:
      
      https://lore.kernel.org/netdev/20220205074739.543606-1-imagedong@tencent.com/
      https://lore.kernel.org/netdev/20220226041831.2058437-1-imagedong@tencent.com/
      
      It seems we still have a lot of work to do in the IP layer, including the
      IPv6 basic ingress path, IPv4/IPv6 forwarding, IPv6 exthdrs, fragment and
      defrag, etc.

      In this series, skb drop reasons are added to the basic ingress path of
      the IPv6 protocol and to IPv4/IPv6 packet forwarding. The following
      functions, which are used for IPv6 packet receiving, are handled:
      
        ip6_pkt_drop()
        ip6_rcv_core()
        ip6_protocol_deliver_rcu()
      
      And the following functions, which are used for IPv6 TLV parsing, are handled:
      
        ip6_parse_tlv()
        ipv6_hop_ra()
        ipv6_hop_ioam()
        ipv6_hop_jumbo()
        ipv6_hop_calipso()
        ipv6_dest_hao()
      
      Besides, ip_forward() and ip6_forward(), which are used for IPv4/IPv6
      forwarding, are also handled. The following new drop reasons are added:
      
        /* host unreachable, corresponding to IPSTATS_MIB_INADDRERRORS */
        SKB_DROP_REASON_IP_INADDRERRORS
        /* network unreachable, corresponding to IPSTATS_MIB_INNOROUTES */
        SKB_DROP_REASON_IP_INNOROUTES
        /* packet size is too big, corresponding to
         * IPSTATS_MIB_INTOOBIGERRORS
         */
        SKB_DROP_REASON_PKT_TOO_BIG
      
      In order to simplify the definition and assignment for
      'enum skb_drop_reason', some helper functions are introduced in the 1st
      patch. I'm not sure if this is necessary, but it makes the code simpler.
      For example, we can replace the code:
      
        if (reason == SKB_DROP_REASON_NOT_SPECIFIED)
                reason = SKB_DROP_REASON_IP_INHDR;
      
      with:
      
        SKB_DR_OR(reason, IP_INHDR);
      
      In the 6th patch, the statistics for skb in ipv6_hop_jumbo() are removed,
      as I think they are redundant. There are two call chains for
      ipv6_hop_jumbo(). The first one is:
      
        ipv6_destopt_rcv() -> ip6_parse_tlv() -> ipv6_hop_jumbo()
      
      On this call chain, the drop statistics will be done in
      ipv6_destopt_rcv() with 'IPSTATS_MIB_INHDRERRORS' if ipv6_hop_jumbo()
      returns false.
      
      The second call chain is:
      
        ip6_rcv_core() -> ipv6_parse_hopopts() -> ip6_parse_tlv()
          -> ipv6_hop_jumbo()
      
      And the drop statistics will also be done in ip6_rcv_core() with
      'IPSTATS_MIB_INHDRERRORS' if ipv6_hop_jumbo() returns false.
      
      Therefore, the statistics in ipv6_hop_jumbo() are redundant, which
      means the drop is counted twice. The statistics in ipv6_hop_jumbo()
      are almost the same as those done by the callers, except for
      'IPSTATS_MIB_INTRUNCATEDPKTS', which it seems we have to ignore.
      
      ======================================================================
      
      Here is a basic test of an IPv6 forwarding packet drop, monitored with
      the 'dropwatch' tool:
      
        drop at: ip6_forward+0x81a/0xb70 (0xffffffff86c73f8a)
        origin: software
        input port ifindex: 7
        timestamp: Wed Apr 13 11:51:06 2022 130010176 nsec
        protocol: 0x86dd
        length: 94
        original length: 94
        drop reason: IP_INADDRERRORS
      
      The root cause in this case is that IPv6 does not forward packets
      with a link-local source address, which results in the
      'IP_INADDRERRORS' drop reason.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      735cb16b
    • Menglong Dong's avatar
      net: ipv6: add skb drop reasons to ip6_protocol_deliver_rcu() · eeab7e7f
      Menglong Dong authored
      Replace kfree_skb() used in ip6_protocol_deliver_rcu() with
      kfree_skb_reason().
      
      No new reasons are added.
      
      Some paths are ignored, as they are not common, such as encapsulation
      on non-final protocol.
      Signed-off-by: Menglong Dong <imagedong@tencent.com>
      Reviewed-by: Jiang Biao <benbjiang@tencent.com>
      Reviewed-by: Hao Peng <flyingpeng@tencent.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      eeab7e7f
    • Menglong Dong's avatar
      net: ipv6: add skb drop reasons to ip6_rcv_core() · 4daf841a
      Menglong Dong authored
      Replace kfree_skb() used in ip6_rcv_core() with kfree_skb_reason().
      No new drop reasons are added.
      
      It seems we now use 'SKB_DROP_REASON_IP_INHDR' for too many cases during
      IPv6 header parsing and checking, just like 'IPSTATS_MIB_INHDRERRORS'
      does. Will it be too general to tell what happened?
      Signed-off-by: Menglong Dong <imagedong@tencent.com>
      Reviewed-by: Jiang Biao <benbjiang@tencent.com>
      Reviewed-by: Hao Peng <flyingpeng@tencent.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4daf841a
    • Menglong Dong's avatar
      net: ipv6: add skb drop reasons to TLV parse · 7d9dbdfb
      Menglong Dong authored
      Replace kfree_skb() used in TLV encoded option header parsing with
      kfree_skb_reason(). The following functions are involved:
      
      ip6_parse_tlv()
      ipv6_hop_ra()
      ipv6_hop_ioam()
      ipv6_hop_jumbo()
      ipv6_hop_calipso()
      ipv6_dest_hao()
      
      Most skb drops during this process are regarded as 'InHdrErrors',
      as 'IPSTATS_MIB_INHDRERRORS' is used when ip6_parse_tlv() fails,
      so we use 'SKB_DROP_REASON_IP_INHDR' correspondingly.
      
      However, 'IP_INHDR' is a relatively general reason. Therefore, we
      can use other reasons with higher priority in some cases. For example,
      'SKB_DROP_REASON_UNHANDLED_PROTO' is used for unknown TLV options.
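
      The choice between the general and the more specific reason can be
      sketched in user-space C as below. This is only an illustration, not
      the kernel's ip6_parse_tlv(): parse_tlv_sketch(), OPT_PAD1 and OPT_PADN
      are hypothetical names, the enum is a stand-in that lists only the
      values used here, and every known option type other than padding is
      left out. An unknown option whose two highest type bits are non-zero
      is treated as a discard request.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in enum: only the reasons this sketch needs. */
enum skb_drop_reason {
	SKB_DROP_REASON_NOT_SPECIFIED,
	SKB_DROP_REASON_IP_INHDR,
	SKB_DROP_REASON_UNHANDLED_PROTO,
};

/* Hypothetical names for the two padding option types. */
#define OPT_PAD1 0
#define OPT_PADN 1

/*
 * Walk a TLV option area. Returns true if the packet may continue;
 * otherwise returns false and stores the drop reason in *reason.
 */
bool parse_tlv_sketch(const uint8_t *opt, size_t len,
		      enum skb_drop_reason *reason)
{
	size_t off = 0;

	while (off < len) {
		uint8_t type = opt[off];
		size_t optlen;

		if (type == OPT_PAD1) {	/* single byte, no length field */
			off++;
			continue;
		}
		if (len - off < 2)	/* no room for the length byte */
			goto bad_header;
		optlen = (size_t)opt[off + 1] + 2;	/* type + len + data */
		if (optlen > len - off)	/* option runs past the header */
			goto bad_header;

		if (type != OPT_PADN && (type >> 6) != 0) {
			/* Unknown option that asks for the packet to be discarded. */
			*reason = SKB_DROP_REASON_UNHANDLED_PROTO;
			return false;
		}
		/* PadN, or an unknown option marked "skip": move on. */
		off += optlen;
	}
	return true;

bad_header:
	/* Malformed option area: fall back to the general reason. */
	*reason = SKB_DROP_REASON_IP_INHDR;
	return false;
}
```

      A truncated option therefore reports the general 'IP_INHDR', while an
      unknown option that requests a discard reports 'UNHANDLED_PROTO'.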
      Signed-off-by: Menglong Dong <imagedong@tencent.com>
      Reviewed-by: Jiang Biao <benbjiang@tencent.com>
      Reviewed-by: Hao Peng <flyingpeng@tencent.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      7d9dbdfb
    • Menglong Dong's avatar
      net: ipv6: remove redundant statistics in ipv6_hop_jumbo() · bba98083
      Menglong Dong authored
      There are two call chains for ipv6_hop_jumbo(). The first one is:
      
      ipv6_destopt_rcv() -> ip6_parse_tlv() -> ipv6_hop_jumbo()
      
      On this call chain, the drop statistics will be done in
      ipv6_destopt_rcv() with 'IPSTATS_MIB_INHDRERRORS' if ipv6_hop_jumbo()
      returns false.
      
      The second call chain is:
      
      ip6_rcv_core() -> ipv6_parse_hopopts() -> ip6_parse_tlv()
        -> ipv6_hop_jumbo()
      
      And the drop statistics will also be done in ip6_rcv_core() with
      'IPSTATS_MIB_INHDRERRORS' if ipv6_hop_jumbo() returns false.
      
      Therefore, the statistics in ipv6_hop_jumbo() are redundant, which
      means the drop is counted twice. The statistics in ipv6_hop_jumbo()
      are almost the same as those done by the callers, except for
      'IPSTATS_MIB_INTRUNCATEDPKTS', which it seems we have to ignore.
      Signed-off-by: Menglong Dong <imagedong@tencent.com>
      Reviewed-by: Jiang Biao <benbjiang@tencent.com>
      Reviewed-by: Hao Peng <flyingpeng@tencent.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      bba98083
    • Menglong Dong's avatar
      net: icmp: introduce function icmpv6_param_prob_reason() · 1ad6d548
      Menglong Dong authored
      In order to add the skb drop reasons support to icmpv6_param_prob(),
      introduce the function icmpv6_param_prob_reason() and make
      icmpv6_param_prob() an inline call to it. This new function will be
      used in the following patches.
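
      The wrapper pattern described above might look roughly like the
      following user-space sketch. The struct sk_buff here is a minimal
      stand-in that only records why it was dropped, the enum lists only the
      values used, and the body of icmpv6_param_prob_reason() is a mock (the
      real function also sends an ICMPv6 Parameter Problem message); the
      exact kernel prototypes are an assumption.

```c
/* Stand-in enum: only the reasons this sketch needs. */
enum skb_drop_reason {
	SKB_DROP_REASON_NOT_SPECIFIED,
	SKB_DROP_REASON_IP_INHDR,
};

/* Minimal stand-in for struct sk_buff: remembers why it was dropped. */
struct sk_buff {
	int dropped;
	enum skb_drop_reason reason;
};

/* Mock of the new function: records the drop together with its reason. */
void icmpv6_param_prob_reason(struct sk_buff *skb, unsigned char code,
			      int pos, enum skb_drop_reason reason)
{
	(void)code;
	(void)pos;
	skb->dropped = 1;
	skb->reason = reason;
}

/* Existing callers keep their signature and pass no specific reason. */
static inline void icmpv6_param_prob(struct sk_buff *skb, unsigned char code,
				     int pos)
{
	icmpv6_param_prob_reason(skb, code, pos,
				 SKB_DROP_REASON_NOT_SPECIFIED);
}
```

      Callers that know why the packet is bad can switch to the _reason
      variant one at a time, while untouched callers behave as before.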
      Signed-off-by: Menglong Dong <imagedong@tencent.com>
      Reviewed-by: Jiang Biao <benbjiang@tencent.com>
      Reviewed-by: Hao Peng <flyingpeng@tencent.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1ad6d548
    • Menglong Dong's avatar
      net: ip: add skb drop reasons to ip forwarding · 2edc1a38
      Menglong Dong authored
      Replace kfree_skb(), which is used in ip6_forward() and ip_forward(),
      with kfree_skb_reason().
      
      The new drop reason 'SKB_DROP_REASON_PKT_TOO_BIG' is introduced for
      the case where the length of the packet exceeds the MTU and the packet
      can't be fragmented.
      Signed-off-by: Menglong Dong <imagedong@tencent.com>
      Reviewed-by: Jiang Biao <benbjiang@tencent.com>
      Reviewed-by: Hao Peng <flyingpeng@tencent.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2edc1a38
    • Menglong Dong's avatar
      net: ipv6: add skb drop reasons to ip6_pkt_drop() · 3ae42cc8
      Menglong Dong authored
      Replace kfree_skb() used in ip6_pkt_drop() with kfree_skb_reason().
      No new reason is added.
      Signed-off-by: Menglong Dong <imagedong@tencent.com>
      Reviewed-by: Jiang Biao <benbjiang@tencent.com>
      Reviewed-by: Hao Peng <flyingpeng@tencent.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3ae42cc8
    • Menglong Dong's avatar
      net: ipv4: add skb drop reasons to ip_error() · c4eb6641
      Menglong Dong authored
      Eventually, I found the handler function for input route lookup
      failures: ip_error().
      
      The drop reasons used in ip_error() mostly correspond to
      IPSTATS_MIB_*, and the following new reasons are introduced:
      
      SKB_DROP_REASON_IP_INADDRERRORS
      SKB_DROP_REASON_IP_INNOROUTES
      
      Wouldn't the names SKB_DROP_REASON_IP_HOSTUNREACH and
      SKB_DROP_REASON_IP_NETUNREACH be more accurate? To keep them
      corresponding to IPSTATS_MIB_*, we keep the current names.
      Signed-off-by: Menglong Dong <imagedong@tencent.com>
      Reviewed-by: Jiang Biao <benbjiang@tencent.com>
      Reviewed-by: Hao Peng <flyingpeng@tencent.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c4eb6641
    • Menglong Dong's avatar
      skb: add some helpers for skb drop reasons · d6d3146c
      Menglong Dong authored
      In order to simplify the definition and assignment for
      'enum skb_drop_reason', introduce some helpers.
      
      SKB_DR() is used to define a variable of type 'enum skb_drop_reason'
      with the 'SKB_DROP_REASON_NOT_SPECIFIED' initial value.
      
      SKB_DR_SET() is used to set the value of the variable. Seems it is
      a little useless? But it makes the code shorter.
      
      SKB_DR_OR() is used to set the value of the variable if it is not set
      yet, which means its value is SKB_DROP_REASON_NOT_SPECIFIED.
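
      A minimal user-space sketch of how the three helpers could be written
      is given below. The macro bodies are an assumption based on the
      descriptions above rather than a copy of the kernel header, the enum is
      a stand-in listing only the values used, and classify() is a
      hypothetical caller added for illustration.

```c
/* Stand-in enum: only the reasons this sketch needs. */
enum skb_drop_reason {
	SKB_DROP_REASON_NOT_SPECIFIED,
	SKB_DROP_REASON_IP_INHDR,
	SKB_DROP_REASON_PKT_TOO_BIG,
};

/* Declare a reason variable initialised to NOT_SPECIFIED. */
#define SKB_DR(name) \
	enum skb_drop_reason name = SKB_DROP_REASON_NOT_SPECIFIED

/* Unconditionally assign SKB_DROP_REASON_<reason>. */
#define SKB_DR_SET(name, reason) \
	((name) = SKB_DROP_REASON_##reason)

/* Assign only if no reason has been recorded yet. */
#define SKB_DR_OR(name, reason)					\
	do {							\
		if ((name) == SKB_DROP_REASON_NOT_SPECIFIED)	\
			SKB_DR_SET(name, reason);		\
	} while (0)

/* Hypothetical caller: a specific reason wins over the general one. */
enum skb_drop_reason classify(int too_big, int bad_header)
{
	SKB_DR(reason);

	if (too_big)
		SKB_DR_SET(reason, PKT_TOO_BIG);
	if (bad_header)
		SKB_DR_OR(reason, IP_INHDR);	/* no-op if already set */
	return reason;
}
```

      The token pasting lets callers write the short suffix (IP_INHDR)
      instead of the full enumerator name.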
      Signed-off-by: Menglong Dong <imagedong@tencent.com>
      Reviewed-by: Jiang Biao <benbjiang@tencent.com>
      Reviewed-by: Hao Peng <flyingpeng@tencent.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d6d3146c
    • David S. Miller's avatar
      Merge branch 'octeon_ep-driver' · dba47afd
      David S. Miller authored
      Veerasenareddy Burru says:
      
      ====================
      Add octeon_ep driver
      
      This driver implements networking functionality of Marvell's Octeon
      PCI Endpoint NIC.
      
      This driver supports the following devices:
       * Network controller: Cavium, Inc. Device b200
      
      V4 -> V5:
         - Fix warnings reported by clang.
         - Address comments from community reviews.
      
      V3 -> V4:
         - Fix warnings and errors reported by "make W=1 C=1".
      
      V2 -> V3:
         - Fix warnings and errors reported by kernel test robot:
           "Reported-by: kernel test robot <lkp@intel.com>"
      
      V1 -> V2:
          - Address review comments on original patch series.
          - Divide PATCH 1/4 from the original series into 4 patches in
            v2 patch series: PATCH 1/7 to PATCH 4/7.
          - Fix clang build errors.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      dba47afd
    • Veerasenareddy Burru's avatar
      octeon_ep: add ethtool support for Octeon PCI Endpoint NIC · 5cc256e7
      Veerasenareddy Burru authored
      Add support for the following ethtool commands:
      
      ethtool -i|--driver devname
      ethtool devname
      ethtool -s devname [speed N] [autoneg on|off] [advertise N]
      ethtool -S|--statistics devname
      Signed-off-by: Veerasenareddy Burru <vburru@marvell.com>
      Signed-off-by: Abhijit Ayarekar <aayarekar@marvell.com>
      Signed-off-by: Satananda Burla <sburla@marvell.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      5cc256e7
    • Veerasenareddy Burru's avatar
      octeon_ep: add Tx/Rx processing and interrupt support · 37d79d05
      Veerasenareddy Burru authored
      Add support to enable MSI-x and register interrupts.
      Add support to process Tx and Rx traffic. Includes processing
      Tx completions and Rx refill.
      Signed-off-by: Veerasenareddy Burru <vburru@marvell.com>
      Signed-off-by: Abhijit Ayarekar <aayarekar@marvell.com>
      Signed-off-by: Satananda Burla <sburla@marvell.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      37d79d05
    • Veerasenareddy Burru's avatar
      octeon_ep: add support for ndo ops · 6a610a46
      Veerasenareddy Burru authored
      Add support for ndo ops to set MAC address, change MTU, get stats.
      Add control path support to set MAC address, change MTU, get stats,
      set speed, get and set link mode.
      Signed-off-by: Veerasenareddy Burru <vburru@marvell.com>
      Signed-off-by: Abhijit Ayarekar <aayarekar@marvell.com>
      Signed-off-by: Satananda Burla <sburla@marvell.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6a610a46
    • Veerasenareddy Burru's avatar
      octeon_ep: add Tx/Rx ring resource setup and cleanup · 397dfb57
      Veerasenareddy Burru authored
      Implement Tx/Rx ring resource allocation and cleanup.
      Signed-off-by: Veerasenareddy Burru <vburru@marvell.com>
      Signed-off-by: Abhijit Ayarekar <aayarekar@marvell.com>
      Signed-off-by: Satananda Burla <sburla@marvell.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      397dfb57
    • Veerasenareddy Burru's avatar
      octeon_ep: Add mailbox for control commands · 4ca2fbdd
      Veerasenareddy Burru authored
      Add mailbox between host and NIC to send control commands from host to
      NIC and receive responses and notifications from NIC to host driver,
      like link status update.
      Signed-off-by: Veerasenareddy Burru <vburru@marvell.com>
      Signed-off-by: Abhijit Ayarekar <aayarekar@marvell.com>
      Signed-off-by: Satananda Burla <sburla@marvell.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4ca2fbdd
    • Veerasenareddy Burru's avatar
      octeon_ep: add hardware configuration APIs · 1f2c2d0c
      Veerasenareddy Burru authored
      Implement hardware resource init and shutdown helper APIs.
      This includes hardware Tx/Rx queue init/enable/disable/reset,
      non queue interrupt handler that decodes non-queue interrupt type.
      Signed-off-by: Veerasenareddy Burru <vburru@marvell.com>
      Signed-off-by: Abhijit Ayarekar <aayarekar@marvell.com>
      Signed-off-by: Satananda Burla <sburla@marvell.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1f2c2d0c
    • Veerasenareddy Burru's avatar
      octeon_ep: Add driver framework and device initialization · 862cd659
      Veerasenareddy Burru authored
      Add driver framework and device setup and initialization for Octeon
      PCI Endpoint NIC.
      
      Add implementation to load the module, initialize and register the
      network device, clean up, and unload the module.
      Signed-off-by: Veerasenareddy Burru <vburru@marvell.com>
      Signed-off-by: Abhijit Ayarekar <aayarekar@marvell.com>
      Signed-off-by: Satananda Burla <sburla@marvell.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      862cd659
    • David S. Miller's avatar
      Merge branch 'br-flush-filtering' · 92716869
      David S. Miller authored
      Nikolay Aleksandrov says:
      
      ====================
      net: bridge: add flush filtering support
      
      This patch-set adds support to specify filtering conditions for a bulk
      delete (flush) operation. This version uses a new nlmsghdr delete flag
      called NLM_F_BULK in combination with a new ndo_fdb_del_bulk op which is
      used to signal that the driver supports bulk deletes (that avoids
      pushing common mac address checks to ndo_fdb_del implementations and
      also has a different prototype and parsed attribute expectations, more
      info in patch 03). The new delete flag can be used for any RTM_DEL*
      type, implementations just need to be careful with older kernels which
      are doing non-strict attribute parses. A new rtnl flag
      (RTNL_FLAG_BULK_DEL_SUPPORTED) is used to show that the delete supports
      NLM_F_BULK. A proper error is returned if bulk delete is not supported.
      For old kernels I use the fact that mac address attribute (lladdr) is
      mandatory in the classic fdb del case, but it's not allowed if bulk
      deleting so older kernels will error out.
      
      Patch 01 and 02 are minor rtnetlink cleanups to make the code easier to
      read. They remove hardcoded values and use names instead. Patch 03 uses
      BIT() for rtnl flags.
      Patch 04 adds the new NLM_F_BULK delete request modifier, patch 05 adds
      the new bulk delete flag and checks for it if the delete requests have
      NLM_F_BULK set, it also warns if rtnl register is called with a non-delete
      kind and the bulk delete flag is set.
      Patch 06 adds the new ndo_fdb_del_bulk call. Patch 07 adds NLM_F_BULK
      support to rtnl_fdb_del, on such request strict parsing is used only for
      the supported attributes, and if the ndo is implemented it's called, the
      NTF_SELF/MASTER rules are the same as for the standard rtnl_fdb_del.
      Patch 08 implements bridge-specific minimal ndo_fdb_del_bulk call which
      uses the current br_fdb_flush to delete all entries. Patch 09 adds
      filtering support to the new bridge flush op which supports target
      ifindex (port or bridge), vlan id and flags/state mask. Patch 10 adds
      ndm state and flags mask attributes which will be used for filtering.
      Patch 11 converts ndm state/flags and their masks to bridge-private flags
      and fills them in the filter descriptor for matching. Finally patch 12
      fills in the target ifindex (after validating it) and vlan id (already
      validated by rtnl_fdb_del) for matching. Flush filtering is needed
      because user-space applications need a quick way to delete only a
      specific set of entries, e.g. mlag implementations need a way to flush only
      dynamic entries excluding externally learned ones or only externally
      learned ones without static entries etc. Also apps usually want to target
      only a specific vlan or port/vlan combination. The current 2 flush
      operations (per port and bridge-wide) are not extensible and cannot
      provide such filtering.
      
      I decided against embedding new attrs into the old flush attributes for
      multiple reasons - proper error handling on unsupported attributes,
      older kernels silently flushing all, need for a second mechanism to
      signal that the attribute should be parsed (e.g. using boolopts),
      special treatment for permanent entries.
      
      Examples:
      $ bridge fdb flush dev bridge vlan 100 static
      < flush all static entries on vlan 100 >
      $ bridge fdb flush dev bridge vlan 1 dynamic
      < flush all dynamic entries on vlan 1 >
      $ bridge fdb flush dev bridge port ens16 vlan 1 dynamic
      < flush all dynamic entries on port ens16 and vlan 1 >
      $ bridge fdb flush dev ens16 vlan 1 dynamic master
      < as above: flush all dynamic entries on port ens16 and vlan 1 >
      $ bridge fdb flush dev bridge nooffloaded nopermanent self
      < flush all non-offloaded and non-permanent entries >
      $ bridge fdb flush dev bridge static noextern_learn
      < flush all static entries which are not externally learned >
      $ bridge fdb flush dev bridge permanent
      < flush all permanent entries >
      $ bridge fdb flush dev bridge port bridge permanent
      < flush all permanent entries pointing to the bridge itself >
      
      Example of a flush call with unsupported netlink attribute (NDA_DST):
      $ bridge fdb flush dev bridge vlan 100 dynamic dst
      Error: Unsupported attribute.
      
      Example of a flush call on an older kernel:
      $ bridge fdb flush dev bridge dynamic
      Error: invalid address.
      
      Example of calling PF_UNSPEC RTM_DELNEIGH which doesn't support bulk delete
      with NLM_F_BULK set (ip neigh is changed to add the flag):
      $ ip n del 192.168.122.5 lladdr 00:11:22:33:44:55 dev ens3
      Error: Bulk delete is not supported.
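
      The capability check behind that last error can be sketched in
      user-space C as follows. This is an illustration under assumptions, not
      the rtnetlink code: check_bulk_delete() and struct handler_sketch are
      hypothetical, and the numeric values given to NLM_F_BULK and
      RTNL_FLAG_BULK_DEL_SUPPORTED are placeholders chosen for the sketch.

```c
#include <errno.h>
#include <stddef.h>

/* Placeholder values for this sketch only. */
#define NLM_F_BULK			0x200u
#define RTNL_FLAG_BULK_DEL_SUPPORTED	(1u << 1)

/* Hypothetical view of a registered delete handler. */
struct handler_sketch {
	unsigned int flags;	/* flags given at registration time */
};

/*
 * Reject a bulk delete request unless the handler declared support.
 * On failure *errmsg points at a message for the caller to report.
 */
int check_bulk_delete(unsigned int nlmsg_flags,
		      const struct handler_sketch *h, const char **errmsg)
{
	if ((nlmsg_flags & NLM_F_BULK) &&
	    !(h->flags & RTNL_FLAG_BULK_DEL_SUPPORTED)) {
		*errmsg = "Bulk delete is not supported";
		return -EOPNOTSUPP;
	}
	return 0;	/* plain delete, or bulk delete with support */
}
```

      Doing the check once in the common path means individual handlers
      never see NLM_F_BULK unless they opted in.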
      
      Note that all flags have their negated version (static vs nostatic etc)
      and there are some tricky cases to handle like "static" which in flag
      terms means fdbs that have NUD_NOARP but *not* NUD_PERMANENT, so the
      mask matches on both but we need only NUD_NOARP to be set. That's
      because permanent entries have both set so we can't just match on
      NUD_NOARP. Also note that this flush operation doesn't treat permanent
      entries in a special way (fdb_delete vs fdb_delete_local), it will
      delete them regardless if any port is using them. We can extend the api
      with a flag to do that if needed in the future.
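
      The "static" case can be illustrated with a small mask/value match in
      user-space C. The NUD_* values are the ones from the uapi header
      linux/neighbour.h; struct state_filter, state_matches() and the two
      filter constants are hypothetical names for this sketch, and the
      bridge itself matches on its private flags rather than on raw ndm
      state.

```c
#include <stdbool.h>
#include <stdint.h>

/* Neighbour state bits, as in the uapi header linux/neighbour.h. */
#define NUD_NOARP	0x40
#define NUD_PERMANENT	0x80

/* Hypothetical filter: an entry matches if (state & mask) == value. */
struct state_filter {
	uint16_t mask;
	uint16_t value;
};

bool state_matches(const struct state_filter *f, uint16_t state)
{
	return (state & f->mask) == f->value;
}

/* "static": NUD_NOARP must be set and NUD_PERMANENT must be clear. */
const struct state_filter filter_static = {
	.mask  = NUD_NOARP | NUD_PERMANENT,
	.value = NUD_NOARP,
};

/* "permanent": such entries carry both bits, so test NUD_PERMANENT only. */
const struct state_filter filter_permanent = {
	.mask  = NUD_PERMANENT,
	.value = NUD_PERMANENT,
};
```

      Including NUD_PERMANENT in the mask but not in the value is what keeps
      permanent entries out of a "static" flush.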
      
      Patch-sets (in order):
       - Initial bulk del infra and fdb flush filtering (this set)
       - iproute2 support
       - selftests
      
      v4: Add and check for rtnl del bulk supported flag when using
          NLM_F_BULK (new patch 05), patches 01 - 03 are also new minor cleanups
          to remove use of raw values and make code easier to read, don't
          rename br_fdb_flush in patch 08, set port ifindex as flush target if
          NDA_IFINDEX is missing and flush was called with port netdev and
          NTF_MASTER (patch 12).
      
      v3: Add NLM_F_BULK delete modifier and ndo_fdb_del_bulk callback,
          patches 01 - 03 and 06 are new. Patch 04 is changed to implement
          bulk_del instead of flush, patches 05, 07 and 08 are adjusted to
          use NDA_ attributes
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      92716869
    • Nikolay Aleksandrov's avatar
      net: bridge: fdb: add support for flush filtering based on ifindex and vlan · 0dbe886a
      Nikolay Aleksandrov authored
      Add support for fdb flush filtering based on destination ifindex and
      vlan id. The ifindex must either match a port's device ifindex or the
      bridge's. The vlan support is trivial since it's already validated by
      rtnl_fdb_del, we just need to fill it in.
      Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0dbe886a
    • Nikolay Aleksandrov's avatar
      net: bridge: fdb: add support for flush filtering based on ndm flags and state · 564445fb
      Nikolay Aleksandrov authored
      Add support for fdb flush filtering based on ndm flags and state. NDM
      state and flags are mapped to bridge-specific flags and matched
      according to the specified masks. NTF_USE is used to represent
      added_by_user flag since it sets it on fdb add and we don't have a 1:1
      mapping for it. Only allowed bits can be set, NTF_SELF and NTF_MASTER are
      ignored.
      Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      564445fb