1. 16 May, 2020 16 commits
  2. 15 May, 2020 24 commits
    • Merge tag 'mlx5-updates-2020-05-15' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux · ea6119aa
      David S. Miller authored
      Saeed Mahameed says:
      
      ====================
      mlx5-updates-2020-05-15
      
      mlx5 core and mlx5e (netdev) updates:
      
      1) Two fixes for release all FW pages support.
      2) Improvement in calculating the send queue stop room on tx
      3) Flow steering auto-groups creation improvements
      4) TC offload fix for Connection tracking with NAT action
      5) IPoIB support for self loopback to allow communication between ipoib
      pkey child interfaces on the same host.
      6) DCBNL cleanup to avoid #ifdef DCBNL all over the main mlx5e code
      7) Small and trivial code cleanup
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ea6119aa
    • ethernet: ti: am65-cpts: Add missing inline qualifier to stub functions · 2ea46dc6
      Nathan Chancellor authored
      When building with Clang:
      
      In file included from drivers/net/ethernet/ti/am65-cpsw-ethtool.c:15:
      drivers/net/ethernet/ti/am65-cpts.h:58:12: warning: unused function
      'am65_cpts_ns_gettime' [-Wunused-function]
      static s64 am65_cpts_ns_gettime(struct am65_cpts *cpts)
                 ^
      drivers/net/ethernet/ti/am65-cpts.h:63:12: warning: unused function
      'am65_cpts_estf_enable' [-Wunused-function]
      static int am65_cpts_estf_enable(struct am65_cpts *cpts,
                 ^
      drivers/net/ethernet/ti/am65-cpts.h:69:13: warning: unused function
      'am65_cpts_estf_disable' [-Wunused-function]
      static void am65_cpts_estf_disable(struct am65_cpts *cpts, int idx)
                  ^
      3 warnings generated.
      
      These functions need to be marked as inline, which adds __maybe_unused,
      to avoid these warnings, which is the pattern for stub functions.
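
      The stub pattern described above can be sketched as follows. This is a
      minimal illustration only: CONFIG_EXAMPLE_CPTS, struct example_cpts and
      example_cpts_ns_gettime() are hypothetical names, not the driver's code.

```c
#include <stdint.h>

struct example_cpts;	/* hypothetical opaque device type */

#ifdef CONFIG_EXAMPLE_CPTS
/* Feature enabled: the real implementation lives in a .c file. */
int64_t example_cpts_ns_gettime(struct example_cpts *cpts);
#else
/*
 * Feature disabled: provide a do-nothing stub in the header.
 * 'static inline' is the usual form for such stubs; per the commit
 * message, the kernel's 'inline' also implies __maybe_unused, so files
 * that include the header without calling the stub do not warn.
 */
static inline int64_t example_cpts_ns_gettime(struct example_cpts *cpts)
{
	(void)cpts;	/* unused in the stub */
	return 0;
}
#endif
```

      Callers can then use the function unconditionally, without wrapping
      each call site in an #ifdef.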
      
      Fixes: ec008fa2 ("ethernet: ti: am65-cpts: add routines to support taprio offload")
      Link: https://github.com/ClangBuiltLinux/linux/issues/1026
      Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2ea46dc6
    • net/mlx5e: Take DCBNL-related definitions into dedicated files · 3f3ab178
      Tariq Toukan authored
      Take DCBNL-related definitions out of the common en.h header and
      use a dedicated header file for exposing them.
      Some need not be exposed; use them locally in the .c file.
      Use stubs to eliminate use of CONFIG_MLX5_CORE_EN_DCB in the
      generic control flows.
      Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      3f3ab178
    • net/mlx5e: Calculate SQ stop room in a robust way · 5ffb4d85
      Maxim Mikityanskiy authored
      Currently, different formulas are used to estimate the space that may be
      taken by WQEs in the SQ during a single packet transmit. This space is
      called stop room, and it's checked at the end of packet transmit to find
      out if the next packet could overflow the SQ. If it could, the driver
      tells the kernel to stop sending further packets.
      
      Many factors affect the stop room:
      
      1. Padding with NOPs to avoid WQEs spanning over page boundaries.
      
      2. Enabled and disabled offloads (TLS, upcoming MPWQE).
      
      3. The maximum size of a WQE.
      
      The padding is performed before every WQE if it doesn't fit the current
      page.
      
      The current formula assumes that only one padding will be required per
      packet, and it doesn't take into account that the WQEs posted during the
      transmission of a single packet might exceed the page size in very rare
      circumstances. For example, to hit this condition with 4096-byte pages,
      TLS offload will have to interrupt an almost-full MPWQE session, be in
      the resync flow and try to transmit a near-maximum amount of data.
      
      To avoid SQ overflows in such rare cases after MPWQE is added, this
      patch introduces a more robust formula to estimate the stop room. The
      new formula uses the fact that a WQE of size X will not require more
      than X-1 WQEBBs of padding. More exact estimations are possible, but
      they result in much more complex and error-prone code for little gain.
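
      The bound stated above (a WQE of size X needs at most X-1 WQEBBs of
      padding) can be sketched as follows. This is an illustration under
      assumptions, not the driver's code: the ex_ names and the value of
      EX_WQEBBS_PER_PAGE are hypothetical, and a WQE is assumed to be at
      least 1 WQEBB and no larger than one page.

```c
#include <stdint.h>

/* Hypothetical value: number of WQEBBs that fit in one page. */
#define EX_WQEBBS_PER_PAGE 64u

/*
 * Padding (in WQEBBs) inserted before a WQE of 'wqe_size' WQEBBs when the
 * producer index 'pi' leaves too little contiguous room in the current page.
 */
static inline uint16_t ex_padding_before_wqe(uint16_t pi, uint16_t wqe_size)
{
	uint16_t room = EX_WQEBBS_PER_PAGE - (pi % EX_WQEBBS_PER_PAGE);

	/* Pad only when the WQE does not fit, so room <= wqe_size - 1. */
	return wqe_size > room ? room : 0;
}

/* Worst case for one WQE: its own size plus at most (size - 1) of padding. */
static inline uint16_t ex_stop_room_for_wqe(uint16_t wqe_size)
{
	return wqe_size * 2 - 1;
}
```

      Summing ex_stop_room_for_wqe() over every WQE that a single packet
      may post gives a conservative stop room under these assumptions.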
      
      Before this patch, the TLS stop room included space for both INNOVA and
      ConnectX TLS offloads that couldn't run at the same time anyway, so this
      patch accounts only for the active one.
      Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
      Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      5ffb4d85
    • net/mlx5e: IPoIB, Drop multicast packets that this interface sent · 8b46d424
      Erez Shitrit authored
      After enabling loopback packets for IPoIB, we need to drop the packets
      that this HCA has replicated and that came back to the same interface
      that sent them.
      
      Fixes: 4c6c615e ("net/mlx5e: IPoIB, Add PKEY child interface nic profile")
      Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
      Reviewed-by: Alex Vesker <valex@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      8b46d424
    • net/mlx5e: IPoIB, Enable loopback packets for IPoIB interfaces · 80639b19
      Erez Shitrit authored
      Enable loopback of unicast and multicast traffic for IPoIB enhanced
      mode.
      This will allow interfaces with the same pkey to communicate with each
      other, e.g. cloned interfaces that are located in different namespaces.
      Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
      Reviewed-by: Alex Vesker <valex@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      80639b19
    • net/mlx5e: CT: Fix offload with CT action after CT NAT action · 9102d836
      Roi Dayan authored
      A chain of rules may perform the CT action again after a CT NAT action.
      Before this fix, matching would break because we enter the CT table
      after the NAT changes, and not the CT NAT table.
      Fix this by adding pre ct and pre ct nat tables to skip the ct/ct_nat
      tables and go straight to the post_ct table if ct/nat was already done.
      Signed-off-by: Roi Dayan <roid@mellanox.com>
      Reviewed-by: Paul Blakey <paulb@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      9102d836
    • net/mlx5: Move internal timer read function to clock library · 90bf1c8d
      Eran Ben Elisha authored
      Move mlx5_read_internal_timer() into lib/clock.c file as it is being
      used there. As such, make this function a static one.
      
      In addition, rearrange the header includes to support the function move.
      Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
      Reviewed-by: Aya Levin <ayal@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      90bf1c8d
    • net/mlx5: Wait for inactive autogroups · 49c0355d
      Paul Blakey authored
      Currently, if one thread tries to add an entry to an autogrouped table
      with no free matching group, while another thread is in the process of
      creating a new matching autogroup, it doesn't wait for the new group
      creation, and creates an unnecessary new autogroup.
      
      Instead of skipping inactive groups, wait on the write lock of those groups.
      Signed-off-by: Paul Blakey <paulb@mellanox.com>
      Reviewed-by: Roi Dayan <roid@mellanox.com>
      Reviewed-by: Mark Bloch <markb@mellanox.com>
      Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      49c0355d
    • net/mlx5: Drain wq first during PCI device removal · 41798df9
      Parav Pandit authored
      mlx5_unload_one() is done with cleanup = true only once.
      
      So instead of draining the health wq inside the if(), do it directly
      during PCI device removal.
      Signed-off-by: Parav Pandit <parav@mellanox.com>
      Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      41798df9
    • net/mlx5: Have single error unwinding path · 4162f58b
      Parav Pandit authored
      Having multiple error unwinding paths is error prone.
      Let's have just one error unwinding path.
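
      A single unwinding path is commonly written with goto labels that
      release resources in reverse order of acquisition. The sketch below
      is a generic illustration, not the mlx5 code: struct ex_dev,
      ex_init() and the fail_step test hook are hypothetical.

```c
#include <stdlib.h>

struct ex_dev {
	void *a;
	void *b;
	void *c;
};

/*
 * Acquire three resources in order. 'fail_step' (1..3) simulates a failure
 * at that step; 0 means no simulated failure.
 */
static int ex_init(struct ex_dev *dev, int fail_step)
{
	int err = -1;	/* hypothetical error code */

	dev->a = (fail_step == 1) ? NULL : malloc(16);
	if (!dev->a)
		goto err_out;

	dev->b = (fail_step == 2) ? NULL : malloc(16);
	if (!dev->b)
		goto err_free_a;

	dev->c = (fail_step == 3) ? NULL : malloc(16);
	if (!dev->c)
		goto err_free_b;

	return 0;

	/* One unwinding path: each label undoes the steps completed before it. */
err_free_b:
	free(dev->b);
	dev->b = NULL;
err_free_a:
	free(dev->a);
	dev->a = NULL;
err_out:
	return err;
}
```

      Adding a fourth resource then needs only one new label, rather than
      another copy of the cleanup sequence.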
      Signed-off-by: Parav Pandit <parav@mellanox.com>
      Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      4162f58b
    • net/mlx5: Fix a bug of releasing wrong chunks on > 4K page size systems · e7f860e2
      Eran Ben Elisha authored
      On systems with page size larger than 4K, a fwp object has a few 4K chunks.
      Fix a bug in fwp free flow where the chunk address was dropped and
      fwp->addr was used instead (first chunk address). This caused a wrong
      update of fwp->bitmask which later can cause errors in re-alloc fwp
      chunk flow.
      
      In order to fix this, refactor the release flow:
      - Free 4k: Releases a specific 4k chunk inside the fwp, defined by
        starting address.
      - Free fwp: Unconditionally release the whole fwp and its resources.
      Free addr will call free fwp if all chunks were released, in order to do
      code sharing.
      
      In addition, fix npages to account for all released chunks correctly.
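
      The difference between using the chunk address and the first chunk
      address (fwp->addr) can be sketched as follows. This is a simplified
      model, not the driver's allocator: struct ex_fw_page, ex_free_4k()
      and the 16-chunk page size are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define EX_CHUNK_SIZE		4096u
#define EX_CHUNKS_PER_PAGE	16u	/* hypothetical: a 64K page holds 16 chunks */

struct ex_fw_page {
	uint64_t addr;		/* address of the first 4K chunk */
	uint32_t bitmask;	/* bit n set => chunk n is free */
	uint32_t free_count;	/* number of free chunks */
};

/*
 * Release one 4K chunk identified by its own address. The chunk index must
 * be derived from 'chunk_addr'; deriving it from fwp->addr would always
 * yield index 0 and mark the wrong chunk as free.
 * Returns true when every chunk of the page is free again.
 */
static bool ex_free_4k(struct ex_fw_page *fwp, uint64_t chunk_addr)
{
	uint32_t n = (uint32_t)((chunk_addr - fwp->addr) / EX_CHUNK_SIZE);

	fwp->bitmask |= 1u << n;
	fwp->free_count++;
	return fwp->free_count == EX_CHUNKS_PER_PAGE;
}
```

      In this model, a true return value is the point at which the caller
      would go on to release the whole page.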
      
      Fixes: c6168161 ("net/mlx5: Add support for release all pages event")
      Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      e7f860e2
    • net/mlx5: Dedicate fw page to the requesting function · 2726cd4a
      Eran Ben Elisha authored
      The cited patch assumes that all chunks in a fw page belong to the same
      function, thus the driver must dedicate a fw page to the requesting
      function, which is actually what was intended in the original fw pages
      allocator design, hence the fwp->func_id!
      
      Up until the cited patch everything worked OK, but now "release all pages"
      is broken on systems with page_size > 4k.
      
      Fix this by dedicating fw page to the requesting function id via adding a
      func_id parameter to alloc_4k() function.
      
      Fixes: c6168161 ("net/mlx5: Add support for release all pages event")
      Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      2726cd4a
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · da07f52d
      David S. Miller authored
      Move the bpf verifier trace check into the new switch statement in
      HEAD.
      
      Resolve the overlapping changes in hinic, where bug fixes overlap
      the addition of VF support.
      Signed-off-by: David S. Miller <davem@davemloft.net>
      da07f52d
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · f85c1598
      Linus Torvalds authored
      Pull networking fixes from David Miller:
      
       1) Fix sk_psock reference count leak on receive, from Xiyu Yang.
      
       2) CONFIG_HNS should be invisible, from Geert Uytterhoeven.
      
       3) Don't allow locking route MTUs in ipv6, RFCs actually forbid this,
          from Maciej Żenczykowski.
      
       4) ipv4 route redirect backoff wasn't actually enforced, from Paolo
          Abeni.
      
       5) Fix netprio cgroup v2 leak, from Zefan Li.
      
       6) Fix infinite loop on rmmod in conntrack, from Florian Westphal.
      
       7) Fix tcp SO_RCVLOWAT hangs, from Eric Dumazet.
      
       8) Various bpf probe handling fixes, from Daniel Borkmann.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (68 commits)
        selftests: mptcp: pm: rm the right tmp file
        dpaa2-eth: properly handle buffer size restrictions
        bpf: Restrict bpf_trace_printk()'s %s usage and add %pks, %pus specifier
        bpf: Add bpf_probe_read_{user, kernel}_str() to do_refine_retval_range
        bpf: Restrict bpf_probe_read{, str}() only to archs where they work
        MAINTAINERS: Mark networking drivers as Maintained.
        ipmr: Add lockdep expression to ipmr_for_each_table macro
        ipmr: Fix RCU list debugging warning
        drivers: net: hamradio: Fix suspicious RCU usage warning in bpqether.c
        net: phy: broadcom: fix BCM54XX_SHD_SCR3_TRDDAPD value for BCM54810
        tcp: fix error recovery in tcp_zerocopy_receive()
        MAINTAINERS: Add Jakub to networking drivers.
        MAINTAINERS: another add of Karsten Graul for S390 networking
        drivers: ipa: fix typos for ipa_smp2p structure doc
        pppoe: only process PADT targeted at local interfaces
        selftests/bpf: Enforce returning 0 for fentry/fexit programs
        bpf: Enforce returning 0 for fentry/fexit progs
        net: stmmac: fix num_por initialization
        security: Fix the default value of secid_to_secctx hook
        libbpf: Fix register naming in PT_REGS s390 macros
        ...
      f85c1598
    • Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma · d5dfe4f1
      Linus Torvalds authored
      Pull rdma fixes from Jason Gunthorpe:
       "A few minor bug fixes for user visible defects, and one regression:
      
         - Various bugs from static checkers and syzkaller
      
         - Add missing error checking in mlx4
      
         - Prevent RTNL lock recursion in i40iw
      
         - Fix segfault in cxgb4 in peer abort cases
      
         - Fix a regression added in 5.7 where the IB_EVENT_DEVICE_FATAL could
           be lost, and wasn't delivered to all the FDs"
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
        RDMA/uverbs: Move IB_EVENT_DEVICE_FATAL to destroy_uobj
        RDMA/uverbs: Do not discard the IB_EVENT_DEVICE_FATAL event
        RDMA/iw_cxgb4: Fix incorrect function parameters
        RDMA/core: Fix double put of resource
        IB/core: Fix potential NULL pointer dereference in pkey cache
        IB/hfi1: Fix another case where pq is left on waitlist
        IB/i40iw: Remove bogus call to netdev_master_upper_dev_get()
        IB/mlx4: Test return value of calls to ib_get_cached_pkey
        RDMA/rxe: Always return ERR_PTR from rxe_create_mmap_info()
        i40iw: Fix error handling in i40iw_manage_arp_cache()
      d5dfe4f1
    • Merge tag 'linux-kselftest-5.7-rc6' of... · ce247296
      Linus Torvalds authored
      Merge tag 'linux-kselftest-5.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
      
      Pull kselftest fixes from Shuah Khan:
      
       - lkdtm runner fixes to prevent dmesg clearing and shellcheck errors
      
       - ftrace test handling when test module doesn't exist
      
       - nsfs test fix to replace zero-length array with flexible-array
      
       - dmabuf-heaps test fix to return clear error value
      
      * tag 'linux-kselftest-5.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
        selftests/lkdtm: Use grep -E instead of egrep
        selftests/lkdtm: Don't clear dmesg when running tests
        selftests/ftrace: mark irqsoff_tracer.tc test as unresolved if the test module does not exist
        tools/testing: Replace zero-length array with flexible-array
        kselftests: dmabuf-heaps: Fix confused return value on expected error testing
      ce247296
    • Merge tag 'riscv-for-linus-5.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux · 67e45621
      Linus Torvalds authored
      Pull RISC-V fixes from Palmer Dabbelt:
       "A handful of build fixes, all found by Huawei's autobuilder.
      
        None of these patches should have any functional impact on kernels
        that build, and they're mostly related to various features
        intermingling with !MMU.
      
        While some of these might be better hoisted to generic code, it seems
        better to have the simple fixes in the meanwhile.
      
        As far as I know these are the only outstanding patches for 5.7"
      
      * tag 'riscv-for-linus-5.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
        riscv: mmiowb: Fix implicit declaration of function 'smp_processor_id'
        riscv: pgtable: Fix __kernel_map_pages build error if NOMMU
        riscv: Make SYS_SUPPORTS_HUGETLBFS depends on MMU
        riscv: Disable ARCH_HAS_DEBUG_VIRTUAL if NOMMU
        riscv: Add pgprot_writecombine/device and PAGE_SHARED defination if NOMMU
        riscv: stacktrace: Fix undefined reference to `walk_stackframe'
        riscv: Fix unmet direct dependencies built based on SOC_VIRT
        riscv: perf: RISCV_BASE_PMU should be independent
        riscv: perf_event: Make some funciton static
      67e45621
    • Merge branch 'mptcp-fix-MP_JOIN-failure-handling' · 93d43e58
      David S. Miller authored
      Paolo Abeni says:
      
      ====================
      mptcp: fix MP_JOIN failure handling
      
      Currently if we hit an MP_JOIN failure on the third ack, the child socket is
      closed with reset, but the request socket is not deleted, causing weird
      behaviors.
      
      The main problem is that MPTCP's MP_JOIN code needs to plug its own
      'valid 3rd ack' checks and the current TCP callbacks do not allow that.
      
      This series tries to address the above shortcoming by introducing a new
      MPTCP-specific bit in a 'struct tcp_request_sock' hole, and leveraging that
      to allow tcp_check_req() to release the request socket when needed.
      
      The above also allows cleaning up the current MPTCP hooking in tcp_check_req() a bit.
      
      An alternative solution, possibly cleaner but more invasive, would be
      changing the 'bool *own_req' syn_recv_sock() argument into 'int *req_status'
      and let MPTCP set it to 'REQ_DROP'.
      
      v1 -> v2:
       - be more conservative about drop_req initialization
      
      RFC -> v1:
       - move the drop_req bit inside tcp_request_sock (Eric)
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      93d43e58
    • mptcp: cope better with MP_JOIN failure · 729cd643
      Paolo Abeni authored
      Currently, on MP_JOIN failure we reset the child
      socket, but leave the request socket untouched.
      
      tcp_check_req will deal with it according to the
      'tcp_abort_on_overflow' sysctl value - by default the
      req socket will stay alive.
      
      The above leads to inconsistent behavior on MP JOIN
      failure, and bad listener overflow accounting.
      
      This patch addresses the issue leveraging the infrastructure
      just introduced to ask the TCP stack to drop the req on
      failure.
      
      The child socket is not freed anymore by subflow_syn_recv_sock(),
      instead it's moved to a dead state and will be disposed by the
      next sock_put done by the TCP stack, so that listener overflow
      accounting is not affected by MP JOIN failure.
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Reviewed-by: Christoph Paasch <cpaasch@apple.com>
      Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      729cd643
    • inet_connection_sock: factor out destroy helper. · 2f8a397d
      Paolo Abeni authored
      Move the steps to prepare an inet_connection_sock for
      forced disposal inside a separate helper. No functional
      changes intended, this will just simplify the next patch.
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Reviewed-by: Christoph Paasch <cpaasch@apple.com>
      Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2f8a397d
    • mptcp: add new sock flag to deal with join subflows · 90bf4513
      Paolo Abeni authored
      MP_JOIN subflows must not land into the accept queue.
      Currently tcp_check_req() calls an MPTCP-specific helper
      to detect such a scenario.
      
      That helper leverages the subflow context to check for
      MP_JOIN subflows. We also need to deal with MP JOIN
      failures, even when the subflow context is not available
      due to allocation failure.
      
      A possible solution would be changing the syn_recv_sock()
      signature to allow returning a more descriptive action/
      error code and deal with that in tcp_check_req().
      
      Since the above need is MPTCP-specific, this patch instead
      uses a TCP request socket hole to add an MPTCP-specific flag.
      This flag is used by the MPTCP syn_recv_sock() to tell
      tcp_check_req() how to deal with the request socket.
      
      This change is a no-op for !MPTCP build, and makes the
      MPTCP code simpler. It allows also the next patch to deal
      correctly with MP JOIN failure.
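
      The idea of placing a flag in a structure hole and letting the
      generic code act on it can be sketched as follows. The struct layouts
      and names below are hypothetical and much smaller than the real
      'struct tcp_request_sock'; the size equality assumes a common ABI
      with 1-byte bool and 4-byte aligned uint32_t.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout: two bools before a 4-byte field leave a 2-byte hole. */
struct ex_req_before {
	uint32_t snt_isn;
	bool	 tfo_listener;
	bool	 is_mptcp;
	uint32_t txhash;
};

/* Same layout with the new flag placed in the former padding. */
struct ex_req_after {
	uint32_t snt_isn;
	bool	 tfo_listener;
	bool	 is_mptcp;
	bool	 drop_req;	/* set by the protocol-specific syn_recv_sock() */
	uint32_t txhash;
};

enum ex_action { EX_QUEUE_CHILD, EX_DROP_REQ };

/* Generic side: decide what to do with the request based on the flag. */
static enum ex_action ex_check_req(const struct ex_req_after *req)
{
	return req->drop_req ? EX_DROP_REQ : EX_QUEUE_CHILD;
}
```

      Because the flag occupies padding, the structure size stays the same
      in this sketch, which is why a hole is an inexpensive place for it.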
      
      v1 -> v2:
       - be more conservative on drop_req initialization (Mat)
      
      RFC -> v1:
       - move the drop_req bit inside tcp_request_sock (Eric)
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Reviewed-by: Christoph Paasch <cpaasch@apple.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      90bf4513
    • Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux · 01d8a748
      Linus Torvalds authored
      Pull arm64 fix from Catalin Marinas:
       "Fix flush_icache_range() second argument in machine_kexec() to be an
        address rather than size"
      
      * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
        arm64: fix the flush_icache_range arguments in machine_kexec
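
      The class of bug fixed here, passing a size where an end address is
      expected, can be sketched with a stand-in function. ex_flush_range()
      below is hypothetical and only reports how many bytes a [start, end)
      range would cover; it performs no cache maintenance.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a range-based routine taking [start, end). */
static size_t ex_flush_range(uintptr_t start, uintptr_t end)
{
	return end > start ? (size_t)(end - start) : 0;
}

/* Buggy call: the size is passed where an end address is expected. */
static size_t ex_flush_buggy(uintptr_t buf, size_t size)
{
	return ex_flush_range(buf, size);
}

/* Fixed call: the second argument is the end address of the buffer. */
static size_t ex_flush_fixed(uintptr_t buf, size_t size)
{
	return ex_flush_range(buf, buf + size);
}
```

      With a buffer at 0x1000 of 0x200 bytes, the fixed call covers the
      whole buffer, while the buggy call describes an empty range because
      0x200 lies below the start address.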
      01d8a748
    • net: phy: tja11xx: execute cable test on link up · ca1c933b
      Oleksij Rempel authored
      A typical 100Base-T1 link should always be connected. If the link is in
      a short or open state, it is a failure. In most cases, we won't be able
      to automatically handle this issue, but we need to log it or notify the
      user (if possible).
      
      With this patch, the cable will be tested on an "ip l s dev .. up" attempt
      and an ethnl notification will be sent to user space.
      
      This patch was tested with TJA1102 PHY and "ethtool --monitor" command.
      Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ca1c933b