1. 22 Apr, 2021 5 commits
  2. 21 Apr, 2021 15 commits
    • neighbour: Prevent Race condition in neighbour subsystem · eefb45ee
      Chinmay Agarwal authored
      The following race condition was detected:
      
      <CPU A, t0> - Executing: __netif_receive_skb() -> __netif_receive_skb_core()
      -> arp_rcv() -> arp_process(). arp_process() calls __neigh_lookup(), which
      takes a reference on neighbour entry 'n'.
      Further along, arp_process() calls neigh_update() ->
      __neigh_update(). The neighbour entry is unlocked just before the call to
      neigh_update_gc_list().
      
      This unlocking opens a window for another thread to take a reference on
      the same entry, mark it dead, and remove it from gc_list.
      
      <CPU B, t1> - neigh_flush_dev() is executing and calls
      neigh_mark_dead(n), marking the neighbour entry 'n' as dead. 'n' is also
      removed from gc_list.
      Further along, neigh_flush_dev() calls
      neigh_cleanup_and_release(n), but since the reference count was increased in t1,
      'n' cannot be destroyed.
      
      <CPU A, t3> - The code reaches neigh_update_gc_list() with the neighbour
      entry marked as dead.
      
      <CPU A, t4> - arp_process() finally calls neigh_release(n), destroying
      the neighbour entry, so a destroyed entry is still part of gc_list.
      
      Fixes: eb4e8fac ("neighbour: Prevent a dead entry from updating gc_list")
      Signed-off-by: Chinmay Agarwal <chinagar@codeaurora.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bonding: 3ad: Fix the conflict between bond_update_slave_arr and the state machine · 83d686a6
      jinyiting authored
      With the bond in mode 4 (802.3ad), performing down/up operations on a
      bond that has negotiated normally can, with some probability, leave
      bond->slave_arr set to NULL.
      
      Test commands:
         ifconfig bond1 down
         ifconfig bond1 up
      
      The conflict occurs in the following process:
      
      __dev_open (CPU A)
      --bond_open
        --queue_delayed_work(bond->wq,&bond->ad_work,0);
        --bond_update_slave_arr
          --bond_3ad_get_active_agg_info
      
      ad_work(CPU B)
      --bond_3ad_state_machine_handler
        --ad_agg_selection_logic
      
      ad_work runs on CPU B. In ad_agg_selection_logic, every
      agg->is_active is cleared. If bond_3ad_get_active_agg_info runs on
      CPU A before the new active aggregator is selected on CPU B, it fails
      and bond->slave_arr is set to NULL. Because the best aggregator in
      ad_agg_selection_logic has not changed, the state machine sees no need
      to update slave_arr.
      
      The conflict arises because ad_agg_selection_logic clears
      agg->is_active under mode_lock, while bond_open -> bond_update_slave_arr
      inspects agg->is_active outside the lock.
      
      Also, bond_update_slave_arr may legitimately sleep when
      allocating memory, so replace the WARN_ON with a call to might_sleep.
      Signed-off-by: jinyiting <jinyiting@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: qrtr: Avoid potential use after free in MHI send · 47a017f3
      Bjorn Andersson authored
      It is possible that the MHI ul_callback will be invoked immediately
      following the queueing of the skb for transmission, leading to the
      callback decrementing the refcount of the associated sk and freeing the
      skb.
      
      As such, the dereference of skb and the increment of the sk refcount must
      happen before the skb is queued, to avoid the skb being used after free
      and the sk potentially dropping its last refcount.
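
      The ordering problem can be sketched with a toy model. This is only an
      illustration under stated assumptions: the toy_* types and functions
      below are hypothetical stand-ins, not the qrtr or MHI API, and the
      "transport" completes every transfer immediately to model the worst
      case described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for a socket and a packet buffer (not kernel types). */
struct toy_sock {
	int refcnt;
	bool hit_zero;   /* set if the refcount ever dropped to zero */
};

struct toy_skb {
	struct toy_sock *sk;
	bool freed;      /* the buffer is only flagged, never really freed */
};

/* Completion callback: drops the socket reference and "frees" the buffer. */
static void toy_tx_done(struct toy_skb *skb)
{
	if (skb->sk && --skb->sk->refcnt == 0)
		skb->sk->hit_zero = true;
	skb->freed = true;
}

/* Worst case: the transfer completes before the queueing call returns. */
static int toy_queue(struct toy_skb *skb)
{
	toy_tx_done(skb);
	return 0;
}

/* Racy order: queue first, then read skb->sk and take the reference.
 * In a real driver the read of skb->sk would be a use after free. */
static int toy_send_racy(struct toy_skb *skb)
{
	int rc = toy_queue(skb);

	if (skb->sk)
		skb->sk->refcnt++;   /* too late: the callback already ran */
	return rc;
}

/* Fixed order: take the reference first, then queue. */
static int toy_send_fixed(struct toy_skb *skb)
{
	assert(!skb->freed);
	if (skb->sk)
		skb->sk->refcnt++;   /* hold the socket before queueing */
	return toy_queue(skb);       /* skb must not be touched after this */
}
```

      With a socket that starts at one reference, the racy order lets the
      count touch zero before the late increment, while the fixed order
      never does.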
      
      Fixes: 6e728f32 ("net: qrtr: Add MHI transport layer")
      Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: phy: intel-xway: enable integrated led functions · 357a07c2
      Martin Schiller authored
      The Intel xway PHYs offer the possibility to deactivate the integrated
      LED function and to control the LEDs manually.
      If this was set by the bootloader, it must be ensured that the
      integrated LED function is enabled for all LEDs when loading the driver.
      
      Before commit 6e2d85ec ("net: phy: Stop with excessive soft reset")
      the LEDs were enabled by a soft-reset of the PHY (using
      genphy_soft_reset). Initialize the XWAY_MDIO_LED with its default
      value (which is applied during a soft reset) instead of adding back
      the soft reset. This brings back the default LED configuration while
      still preventing an excessive number of soft resets.
      
      Fixes: 6e2d85ec ("net: phy: Stop with excessive soft reset")
      Signed-off-by: Martin Schiller <ms@dev.tdt.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: renesas: ravb: Fix a stuck issue when a lot of frames are received · 5718458b
      Yoshihiro Shimoda authored
      When many frames were received in a short period, reception in the
      driver stalled until a new frame arrived. For example,
      the following command from another device could trigger this issue.
      
          $ sudo ping -f -l 1000 -c 1000 <this driver's ipaddress>
      
      The previous code always cleared the RX interrupt flag but checked
      the interrupt flags in ravb_poll(). So, if ravb_rx() returned true,
      ravb_poll() could not call ravb_rx() again until a new RX frame was
      received. To fix the issue, always call ravb_rx() regardless of the
      state of the interrupt flags.
      
      Fixes: c156633f ("Renesas Ethernet AVB driver proper")
      Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: stmmac: fix TSO and TBS feature enabling during driver open · 5e6038b8
      Ong Boon Leong authored
      TSO and TBS cannot co-exist, and the current implementation requires two
      fixes:
      
       1) stmmac_open() does not need to call stmmac_enable_tbs() because
          the MAC is reset in stmmac_init_dma_engine() anyway.
       2) Inside stmmac_hw_setup(), we should call stmmac_enable_tso() for
          TX Q that is _not_ configured for TBS.
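
      Point 2 can be sketched as a loop that skips TBS queues. This is a
      minimal standalone sketch, not the driver code: struct toy_txq, its
      fields, and toy_enable_tso() are hypothetical names introduced only
      for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy per-queue state; field names are illustrative, not the driver's. */
struct toy_txq {
	bool tbs_configured;
	bool tso_enabled;
};

/*
 * Enable TSO on every TX queue that is not configured for TBS.
 * Returns the number of queues on which TSO was enabled.
 */
static size_t toy_enable_tso(struct toy_txq *q, size_t n)
{
	size_t enabled = 0;

	assert(q != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (q[i].tbs_configured)
			continue;   /* TSO and TBS cannot co-exist */
		q[i].tso_enabled = true;
		enabled++;
	}
	return enabled;
}
```

      For three queues where only the middle one is configured for TBS,
      the sketch enables TSO on the first and last queue and leaves the
      middle one untouched.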
      
      Fixes: 579a25a8 ("net: stmmac: Initial support for TBS")
      Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • nfp: devlink: initialize the devlink port attribute "lanes" · 90b669d6
      Yinjun Zhang authored
      The number of lanes of devlink port should be correctly initialized
      when registering the port, so that the input check when running
      "devlink port split <port> count <N>" can pass.
      
      Fixes: a21cf0a8 ("devlink: Add a new devlink port lanes attribute and pass to netlink")
      Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com>
      Signed-off-by: Louis Peens <louis.peens@corigine.com>
      Signed-off-by: Simon Horman <simon.horman@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge tag 'wireless-drivers-2021-04-21' of... · 542c4095
      David S. Miller authored
      Merge tag 'wireless-drivers-2021-04-21' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers
      
      Kalle Valo says:
      
      ====================
      wireless-drivers fixes for v5.12
      
      As there was an -rc8 release, one more important fix for v5.12.
      
      iwlwifi
      
      * fix spinlock warning in gen2 devices
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: davinci_emac: Fix incorrect masking of tx and rx error channel · d83b8aa5
      Colin Ian King authored
      The bit-masks used for the TXERRCH and RXERRCH (tx and rx error channels)
      are incorrect and always lead to a zero result. The mask values are
      currently the incorrect post-right-shift values; fix this by setting
      them to the correct values.
      
      (I double checked these against the TMS320TCI6482 data sheet, section
      5.30, page 127 to ensure I had the correct mask values for the TXERRCH
      and RXERRCH fields in the MACSTATUS register).
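
      Why a post-shift mask always yields zero can be shown with a minimal
      standalone sketch. The field position used here (a 3-bit field at
      bits 18:16) and every TOY_* / toy_* name are assumptions chosen for
      illustration; they are not taken from the driver or the data sheet.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: a 3-bit channel field at bits 18:16 of a status word. */
#define TOY_ERRCH_SHIFT      16
#define TOY_ERRCH_MASK_BAD   0x7u                        /* already shifted down: wrong */
#define TOY_ERRCH_MASK_GOOD  (0x7u << TOY_ERRCH_SHIFT)   /* in-place mask */

static_assert(TOY_ERRCH_MASK_GOOD == 0x70000u, "in-place mask covers bits 18:16");

/* Masks bits 2:0, then shifts them out entirely: the result is always 0. */
static uint32_t toy_errch_bad(uint32_t status)
{
	return (status & TOY_ERRCH_MASK_BAD) >> TOY_ERRCH_SHIFT;
}

/* Masks the field where it lives, then shifts it down to bit 0. */
static uint32_t toy_errch_good(uint32_t status)
{
	return (status & TOY_ERRCH_MASK_GOOD) >> TOY_ERRCH_SHIFT;
}
```

      With the field set to 5 (status word 0x50000), the in-place mask
      recovers 5, whereas the post-shift mask returns 0 for any input.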
      
      Addresses-Coverity: ("Operands don't affect result")
      Fixes: a6286ee6 ("net: Add TI DaVinci EMAC driver")
      Signed-off-by: Colin Ian King <colin.king@canonical.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: marvell: prestera: fix port event handling on init · 33398048
      Vadym Kochan authored
      A crash may occur during port creation if port events are handled
      at the same time, because the firmware may send an initial
      port event with down state.
      
      The crash points to cancel_delayed_work(), which is called when the port
      goes down. I have not yet found the real cause of the issue, so
      this fix cancels the port stats work only if the port's previous state
      was up and running.
      
      The following is the crash which can be triggered:
      
      [   28.311104] Unable to handle kernel paging request at virtual address
      000071775f776600
      [   28.319097] Mem abort info:
      [   28.321914]   ESR = 0x96000004
      [   28.324996]   EC = 0x25: DABT (current EL), IL = 32 bits
      [   28.330350]   SET = 0, FnV = 0
      [   28.333430]   EA = 0, S1PTW = 0
      [   28.336597] Data abort info:
      [   28.339499]   ISV = 0, ISS = 0x00000004
      [   28.343362]   CM = 0, WnR = 0
      [   28.346354] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000100bf7000
      [   28.352842] [000071775f776600] pgd=0000000000000000,
      p4d=0000000000000000
      [   28.359695] Internal error: Oops: 96000004 [#1] PREEMPT SMP
      [   28.365310] Modules linked in: prestera_pci(+) prestera
      uio_pdrv_genirq
      [   28.372005] CPU: 0 PID: 1291 Comm: kworker/0:1H Not tainted
      5.11.0-rc4 #1
      [   28.378846] Hardware name: DNI AmazonGo1 A7040 board (DT)
      [   28.384283] Workqueue: prestera_fw_wq prestera_fw_evt_work_fn
      [prestera_pci]
      [   28.391413] pstate: 60000085 (nZCv daIf -PAN -UAO -TCO BTYPE=--)
      [   28.397468] pc : get_work_pool+0x48/0x60
      [   28.401442] lr : try_to_grab_pending+0x6c/0x1b0
      [   28.406018] sp : ffff80001391bc60
      [   28.409358] x29: ffff80001391bc60 x28: 0000000000000000
      [   28.414725] x27: ffff000104fc8b40 x26: ffff80001127de88
      [   28.420089] x25: 0000000000000000 x24: ffff000106119760
      [   28.425452] x23: ffff00010775dd60 x22: ffff00010567e000
      [   28.430814] x21: 0000000000000000 x20: ffff80001391bcb0
      [   28.436175] x19: ffff00010775deb8 x18: 00000000000000c0
      [   28.441537] x17: 0000000000000000 x16: 000000008d9b0e88
      [   28.446898] x15: 0000000000000001 x14: 00000000000002ba
      [   28.452261] x13: 80a3002c00000002 x12: 00000000000005f4
      [   28.457622] x11: 0000000000000030 x10: 000000000000000c
      [   28.462985] x9 : 000000000000000c x8 : 0000000000000030
      [   28.468346] x7 : ffff800014400000 x6 : ffff000106119758
      [   28.473708] x5 : 0000000000000003 x4 : ffff00010775dc60
      [   28.479068] x3 : 0000000000000000 x2 : 0000000000000060
      [   28.484429] x1 : 000071775f776600 x0 : ffff00010775deb8
      [   28.489791] Call trace:
      [   28.492259]  get_work_pool+0x48/0x60
      [   28.495874]  cancel_delayed_work+0x38/0xb0
      [   28.500011]  prestera_port_handle_event+0x90/0xa0 [prestera]
      [   28.505743]  prestera_evt_recv+0x98/0xe0 [prestera]
      [   28.510683]  prestera_fw_evt_work_fn+0x180/0x228 [prestera_pci]
      [   28.516660]  process_one_work+0x1e8/0x360
      [   28.520710]  worker_thread+0x44/0x480
      [   28.524412]  kthread+0x154/0x160
      [   28.527670]  ret_from_fork+0x10/0x38
      [   28.531290] Code: a8c17bfd d50323bf d65f03c0 9278dc21 (f9400020)
      [   28.537429] ---[ end trace 5eced933df3a080b ]---
      
      Fixes: 501ef306 ("net: marvell: prestera: Add driver for Prestera family ASIC devices")
      Signed-off-by: Vadym Kochan <vkochan@marvell.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • vsock/virtio: free queued packets when closing socket · 8432b811
      Stefano Garzarella authored
      As reported by syzbot [1], there is a memory leak while closing the
      socket. We partially solved this issue with commit ac03046e
      ("vsock/virtio: free packets during the socket release"), but we
      forgot to drain the RX queue when the socket is definitely closed by
      the scheduled work.
      
      To avoid future issues, let's use the new virtio_transport_remove_sock()
      to drain the RX queue before removing the socket from the af_vsock lists
      by calling vsock_remove_sock().
      
      [1] https://syzkaller.appspot.com/bug?extid=24452624fc4c571eedd9
      
      Fixes: ac03046e ("vsock/virtio: free packets during the socket release")
      Reported-and-tested-by: syzbot+24452624fc4c571eedd9@syzkaller.appspotmail.com
      Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'sfc-txq-lookups' · eeddfd8e
      David S. Miller authored
      Edward Cree says:
      
      ====================
      sfc: fix TXQ lookups
      
      The TXQ handling changes in 12804793 ("sfc: decouple TXQ type from label")
       which were made as part of the support for encap offloads on EF10 caused some
       breakage on Siena (5000- and 6000-series) NICs, which caused null-dereference
       kernel panics.
      This series fixes those issues, and also a similarly incorrect code-path on
       EF10 which worked by chance.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sfc: ef10: fix TX queue lookup in TX event handling · 172e269e
      Edward Cree authored
      We're starting from a TXQ label, not a TXQ type, so
       efx_channel_get_tx_queue() is inappropriate.  This worked by chance,
       because labels and types currently match on EF10, but we shouldn't
       rely on that.
      
      Fixes: 12804793 ("sfc: decouple TXQ type from label")
      Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sfc: farch: fix TX queue lookup in TX event handling · 83b09a18
      Edward Cree authored
      We're starting from a TXQ label, not a TXQ type, so
       efx_channel_get_tx_queue() is inappropriate (and could return NULL,
       leading to panics).
      
      Fixes: 12804793 ("sfc: decouple TXQ type from label")
      Cc: stable@vger.kernel.org
      Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sfc: farch: fix TX queue lookup in TX flush done handling · 5b1faa92
      Edward Cree authored
      We're starting from a TXQ instance number ('qid'), not a TXQ type, so
       efx_get_tx_queue() is inappropriate (and could return NULL, leading
       to panics).
      
      Fixes: 12804793 ("sfc: decouple TXQ type from label")
      Reported-by: Trevor Hemsley <themsley@voiceflex.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 19 Apr, 2021 7 commits
  4. 17 Apr, 2021 6 commits
    • Merge tag 'net-5.12-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 88a5af94
      Linus Torvalds authored
      Pull networking fixes from Jakub Kicinski:
       "Networking fixes for 5.12-rc8, including fixes from netfilter, and
        bpf. BPF verifier changes stand out, otherwise things have slowed
        down.
      
        Current release - regressions:
      
         - gro: ensure frag0 meets IP header alignment
      
         - Revert "net: stmmac: re-init rx buffers when mac resume back"
      
         - ethernet: macb: fix the restore of cmp registers
      
        Previous releases - regressions:
      
         - ixgbe: Fix NULL pointer dereference in ethtool loopback test
      
         - ixgbe: fix unbalanced device enable/disable in suspend/resume
      
         - phy: marvell: fix detection of PHY on Topaz switches
      
         - make tcp_allowed_congestion_control readonly in non-init netns
      
         - xen-netback: Check for hotplug-status existence before watching
      
        Previous releases - always broken:
      
         - bpf: mitigate a speculative oob read of up to map value size by
           tightening the masking window
      
         - sctp: fix race condition in sctp_destroy_sock
      
         - sit, ip6_tunnel: Unregister catch-all devices
      
         - netfilter: nftables: clone set element expression template
      
         - netfilter: flowtable: fix NAT IPv6 offload mangling
      
         - net: geneve: check skb is large enough for IPv4/IPv6 header
      
         - netlink: don't call ->netlink_bind with table lock held"
      
      * tag 'net-5.12-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (52 commits)
        netlink: don't call ->netlink_bind with table lock held
        MAINTAINERS: update my email
        bpf: Update selftests to reflect new error states
        bpf: Tighten speculative pointer arithmetic mask
        bpf: Move sanitize_val_alu out of op switch
        bpf: Refactor and streamline bounds check into helper
        bpf: Improve verifier error messages for users
        bpf: Rework ptr_limit into alu_limit and add common error path
        bpf: Ensure off_reg has no mixed signed bounds for all types
        bpf: Move off_reg into sanitize_ptr_alu
        bpf: Use correct permission flag for mixed signed bounds arithmetic
        ch_ktls: do not send snd_una update to TCB in middle
        ch_ktls: tcb close causes tls connection failure
        ch_ktls: fix device connection close
        ch_ktls: Fix kernel panic
        i40e: fix the panic when running bpf in xdpdrv mode
        net/mlx5e: fix ingress_ifindex check in mlx5e_flower_parse_meta
        net/mlx5e: Fix setting of RS FEC mode
        net/mlx5: Fix setting of devlink traps in switchdev mode
        Revert "net: stmmac: re-init rx buffers when mac resume back"
        ...
    • Merge tag 'libnvdimm-fixes-for-5.12-rc8' of... · bdfd99e6
      Linus Torvalds authored
      Merge tag 'libnvdimm-fixes-for-5.12-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
      
      Pull libnvdimm fixes from Dan Williams:
       "The largest change is for a regression that landed during -rc1 for
        block-device read-only handling. Vaibhav found a new use for the
        ability (originally introduced by virtio_pmem) to call back to the
        platform to flush data, but also found an original bug in that
        implementation. Lastly, Arnd cleans up some compile warnings in dax.
      
        This has all appeared in -next with no reported issues.
      
        Summary:
      
         - Fix a regression of read-only handling in the pmem driver
      
         - Fix a compile warning
      
         - Fix support for platform cache flush commands on powerpc/papr"
      
      * tag 'libnvdimm-fixes-for-5.12-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
        libnvdimm/region: Fix nvdimm_has_flush() to handle ND_REGION_ASYNC
        libnvdimm: Notify disk drivers to revalidate region read-only
        dax: avoid -Wempty-body warnings
    • Merge tag 'cxl-fixes-for-5.12-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl · 7c226774
      Linus Torvalds authored
      Pull CXL memory class fixes from Dan Williams:
       "A collection of fixes for the CXL memory class driver introduced in
        this release cycle.
      
        The driver was primarily developed on a work-in-progress QEMU
        emulation of the interface and we have since found a couple places
        where it hid spec compliance bugs in the driver, or had a spec
        implementation bug itself.
      
        The biggest change here is replacing a percpu_ref with an rwsem to
        cleanup a couple bugs in the error unwind path during ioctl device
        init. Lastly there were some minor cleanups to not export the
        power-management sysfs-ABI for the ioctl device, use the proper sysfs
        helper for emitting values, and prevent subtle bugs as new
        administration commands are added to the supported list.
      
        The bulk of it has appeared in -next save for the top commit which was
        found today and validated on a fixed-up QEMU model.
      
        Summary:
      
         - Fix support for CXL memory devices with registers offset from the
           BAR base.
      
         - Fix the reporting of device capacity.
      
         - Fix the driver commands list definition to be disconnected from the
           UAPI command list.
      
         - Replace percpu_ref with rwsem to fix initialization error path.
      
         - Fix leaks in the driver initialization error path.
      
         - Drop the power/ directory from CXL device sysfs.
      
         - Use the recommended sysfs helper for attribute 'show'
           implementations"
      
      * tag 'cxl-fixes-for-5.12-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl:
        cxl/mem: Fix memory device capacity probing
        cxl/mem: Fix register block offset calculation
        cxl/mem: Force array size of mem_commands[] to CXL_MEM_COMMAND_ID_MAX
        cxl/mem: Disable cxl device power management
        cxl/mem: Do not rely on device_add() side effects for dev_set_name() failures
        cxl/mem: Fix synchronization mechanism for device removal vs ioctl operations
        cxl/mem: Use sysfs_emit() for attribute show routines
    • Merge branch 'akpm' (patches from Andrew) · fdb5d6ca
      Linus Torvalds authored
      Merge misc fixes from Andrew Morton:
       "12 patches.
      
        Subsystems affected by this patch series: mm (documentation, kasan,
        and pagemap), csky, ia64, gcov, and lib"
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>:
        lib: remove "expecting prototype" kernel-doc warnings
        gcov: clang: fix clang-11+ build
        mm: ptdump: fix build failure
        mm/mapping_dirty_helpers: guard hugepage pud's usage
        ia64: tools: remove duplicate definition of ia64_mf() on ia64
        ia64: tools: remove inclusion of ia64-specific version of errno.h header
        ia64: fix discontig.c section mismatches
        ia64: remove duplicate entries in generic_defconfig
        csky: change a Kconfig symbol name to fix e1000 build error
        kasan: remove redundant config option
        kasan: fix hwasan build for gcc
        mm: eliminate "expecting prototype" kernel-doc warnings
    • cxl/mem: Fix memory device capacity probing · fae8817a
      Dan Williams authored
      The CXL Identify Memory Device output payload emits capacity in 256MB
      units. The driver is treating the capacity field as bytes. This was
      missed because QEMU reports bytes when it should report bytes / 256MB.
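
      The unit conversion described above amounts to one multiplication.
      The following is a minimal sketch, assuming "256MB" means 256 MiB
      (2^28 bytes); TOY_CAPACITY_MULTIPLIER and toy_capacity_bytes() are
      hypothetical names, not the driver's.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed unit size: 256 MiB per reported capacity unit. */
#define TOY_CAPACITY_MULTIPLIER (256ULL * 1024 * 1024)

/* Convert a capacity field reported in 256 MiB units into bytes. */
static uint64_t toy_capacity_bytes(uint64_t units)
{
	/* Guard against overflow of the 64-bit byte count. */
	assert(units <= UINT64_MAX / TOY_CAPACITY_MULTIPLIER);
	return units * TOY_CAPACITY_MULTIPLIER;
}
```

      A device reporting 4 units would therefore have a capacity of
      1 GiB, not 4 bytes.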
      
      Fixes: 8adaf747 ("cxl/mem: Find device capabilities")
      Reviewed-by: Vishal Verma <vishal.l.verma@intel.com>
      Cc: Ben Widawsky <ben.widawsky@intel.com>
      Link: https://lore.kernel.org/r/161862021044.3259705.7008520073059739760.stgit@dwillia2-desk3.amr.corp.intel.com
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
    • netlink: don't call ->netlink_bind with table lock held · f2764bd4
      Florian Westphal authored
      When I added support to allow generic netlink multicast groups to be
      restricted to subscribers with CAP_NET_ADMIN I was unaware that a
      genl_bind implementation already existed in the past.
      
      It was reverted due to ABBA deadlock:
      
      1. ->netlink_bind gets called with the table lock held.
      2. genetlink bind callback is invoked, it grabs the genl lock.
      
      But when a new genl subsystem is (un)registered, these two locks are
      taken in reverse order.
      
      One solution would be to revert again and add a comment in genl
      referring to 1e82a62f ("genetlink: remove genl_bind").
      
      This would need a second change in mptcp to not expose the raw token
      value anymore, e.g.  by hashing the token with a secret key so userspace
      can still associate subflow events with the correct mptcp connection.
      
      However, Paolo Abeni reminded me to double-check why the netlink table is
      locked in the first place.
      
      I can't find one.  netlink_bind() is already called without this lock
      when userspace joins a group via NETLINK_ADD_MEMBERSHIP setsockopt.
      Same holds for the netlink_unbind operation.
      
      Digging through the history, commit f7736080
      ("netlink: access nlk groups safely in netlink bind and getname")
      expanded the lock scope.
      
      commit 3a20773b ("net: netlink: cap max groups which will be considered in netlink_bind()")
      ... removed the nlk->ngroups access that the lock scope
      extension was all about.
      
      Reduce the lock scope again and always call ->netlink_bind without
      the table lock.
      
      The Fixes tag should be vs. the patch mentioned in the link below,
      but that one got squash-merged into the patch that came earlier in the
      series.
      
      Fixes: 4d54cc32 ("mptcp: avoid lock_fast usage in accept path")
      Link: https://lore.kernel.org/mptcp/20210213000001.379332-8-mathew.j.martineau@linux.intel.com/T/#u
      Cc: Cong Wang <xiyou.wangcong@gmail.com>
      Cc: Xin Long <lucien.xin@gmail.com>
      Cc: Johannes Berg <johannes.berg@intel.com>
      Cc: Sean Tranchetti <stranche@codeaurora.org>
      Cc: Paolo Abeni <pabeni@redhat.com>
      Cc: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  5. 16 Apr, 2021 7 commits