1. 03 Jun, 2021 14 commits
  2. 02 Jun, 2021 15 commits
    • Wong Vee Khee's avatar
      net: stmmac: fix issue where clk is being unprepared twice · ab00f3e0
      Wong Vee Khee authored
      In the case of MDIO bus registration failure due to no external PHY
      devices is connected to the MAC, clk_disable_unprepare() is called in
      stmmac_bus_clk_config() and intel_eth_pci_probe() respectively.
      
      The second call in intel_eth_pci_probe() will caused the following:-
      
      [   16.578605] intel-eth-pci 0000:00:1e.5: No PHY found
      [   16.583778] intel-eth-pci 0000:00:1e.5: stmmac_dvr_probe: MDIO bus (id: 2) registration failed
      [   16.680181] ------------[ cut here ]------------
      [   16.684861] stmmac-0000:00:1e.5 already disabled
      [   16.689547] WARNING: CPU: 13 PID: 2053 at drivers/clk/clk.c:952 clk_core_disable+0x96/0x1b0
      [   16.697963] Modules linked in: dwc3 iTCO_wdt mei_hdcp iTCO_vendor_support udc_core x86_pkg_temp_thermal kvm_intel marvell10g kvm sch_fq_codel nfsd irqbypass dwmac_intel(+) stmmac uio ax88179_178a pcs_xpcs phylink uhid spi_pxa2xx_platform usbnet mei_me pcspkr tpm_crb mii i2c_i801 dw_dmac dwc3_pci thermal dw_dmac_core intel_rapl_msr libphy i2c_smbus mei tpm_tis intel_th_gth tpm_tis_core tpm intel_th_acpi intel_pmc_core intel_th i915 fuse configfs snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec snd_hda_core snd_pcm snd_timer snd soundcore
      [   16.746785] CPU: 13 PID: 2053 Comm: systemd-udevd Tainted: G     U            5.13.0-rc3-intel-lts #76
      [   16.756134] Hardware name: Intel Corporation Alder Lake Client Platform/AlderLake-S ADP-S DRR4 CRB, BIOS ADLIFSI1.R00.1494.B00.2012031421 12/03/2020
      [   16.769465] RIP: 0010:clk_core_disable+0x96/0x1b0
      [   16.774222] Code: 00 8b 05 45 96 17 01 85 c0 7f 24 48 8b 5b 30 48 85 db 74 a5 8b 43 7c 85 c0 75 93 48 8b 33 48 c7 c7 6e 32 cc b7 e8 b2 5d 52 00 <0f> 0b 5b 5d c3 65 8b 05 76 31 18 49 89 c0 48 0f a3 05 bc 92 1a 01
      [   16.793016] RSP: 0018:ffffa44580523aa0 EFLAGS: 00010086
      [   16.798287] RAX: 0000000000000000 RBX: ffff8d7d0eb70a00 RCX: 0000000000000000
      [   16.805435] RDX: 0000000000000002 RSI: ffffffffb7c62d5f RDI: 00000000ffffffff
      [   16.812610] RBP: 0000000000000287 R08: 0000000000000000 R09: ffffa445805238d0
      [   16.819759] R10: 0000000000000001 R11: 0000000000000001 R12: ffff8d7d0eb70a00
      [   16.826904] R13: ffff8d7d027370c8 R14: 0000000000000006 R15: ffffa44580523ad0
      [   16.834047] FS:  00007f9882fa2600(0000) GS:ffff8d80a0940000(0000) knlGS:0000000000000000
      [   16.842177] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   16.847966] CR2: 00007f9882bea3d8 CR3: 000000010b126001 CR4: 0000000000370ee0
      [   16.855144] Call Trace:
      [   16.857614]  clk_core_disable_lock+0x1b/0x30
      [   16.861941]  intel_eth_pci_probe.cold+0x11d/0x136 [dwmac_intel]
      [   16.867913]  pci_device_probe+0xcf/0x150
      [   16.871890]  really_probe+0xf5/0x3e0
      [   16.875526]  driver_probe_device+0x64/0x150
      [   16.879763]  device_driver_attach+0x53/0x60
      [   16.883998]  __driver_attach+0x9f/0x150
      [   16.887883]  ? device_driver_attach+0x60/0x60
      [   16.892288]  ? device_driver_attach+0x60/0x60
      [   16.896698]  bus_for_each_dev+0x77/0xc0
      [   16.900583]  bus_add_driver+0x184/0x1f0
      [   16.904469]  driver_register+0x6c/0xc0
      [   16.908268]  ? 0xffffffffc07ae000
      [   16.911598]  do_one_initcall+0x4a/0x210
      [   16.915489]  ? kmem_cache_alloc_trace+0x305/0x4e0
      [   16.920247]  do_init_module+0x5c/0x230
      [   16.924057]  load_module+0x2894/0x2b70
      [   16.927857]  ? __do_sys_finit_module+0xb5/0x120
      [   16.932441]  __do_sys_finit_module+0xb5/0x120
      [   16.936845]  do_syscall_64+0x42/0x80
      [   16.940476]  entry_SYSCALL_64_after_hwframe+0x44/0xae
      [   16.945586] RIP: 0033:0x7f98830e5ccd
      [   16.949177] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 93 31 0c 00 f7 d8 64 89 01 48
      [   16.967970] RSP: 002b:00007ffc66b60168 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
      [   16.975583] RAX: ffffffffffffffda RBX: 000055885de35ef0 RCX: 00007f98830e5ccd
      [   16.982725] RDX: 0000000000000000 RSI: 00007f98832541e3 RDI: 0000000000000012
      [   16.989868] RBP: 0000000000020000 R08: 0000000000000000 R09: 0000000000000000
      [   16.997042] R10: 0000000000000012 R11: 0000000000000246 R12: 00007f98832541e3
      [   17.004222] R13: 0000000000000000 R14: 0000000000000000 R15: 00007ffc66b60328
      [   17.011369] ---[ end trace df06a3dab26b988c ]---
      [   17.016062] ------------[ cut here ]------------
      [   17.020701] stmmac-0000:00:1e.5 already unprepared
      
      Removing the stmmac_bus_clks_config() call in stmmac_dvr_probe and let
      dwmac-intel to handle the unprepare and disable of the clk device.
      
      Fixes: 5ec55823 ("net: stmmac: add clocks management for gmac driver")
      Cc: Joakim Zhang <qiangqing.zhang@nxp.com>
      Signed-off-by: default avatarWong Vee Khee <vee.khee.wong@linux.intel.com>
      Reviewed-by: default avatarJoakim Zhang <qiangqing.zhang@nxp.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      ab00f3e0
    • Josh Triplett's avatar
      net: ipconfig: Don't override command-line hostnames or domains · b508d5fb
      Josh Triplett authored
      If the user specifies a hostname or domain name as part of the ip=
      command-line option, preserve it and don't overwrite it with one
      supplied by DHCP/BOOTP.
      
      For instance, ip=::::myhostname::dhcp will use "myhostname" rather than
      ignoring and overwriting it.
      
      Fix the comment on ic_bootp_string that suggests it only copies a string
      "if not already set"; it doesn't have any such logic.
      Signed-off-by: default avatarJosh Triplett <josh@joshtriplett.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      b508d5fb
    • David S. Miller's avatar
      Merge tag 'mlx5-fixes-2021-06-01' of git://git.kernel.org/pub/scm/linu · dd627662
      David S. Miller authored
      x/kernel/git/saeed/linux
      
      Saeed Mahameed says:
      
      ====================
      mlx5 fixes 2021-06-01
      
      This series introduces some fixes to mlx5 driver.
      Please pull and let me know if there is any problem.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      dd627662
    • Daniel Borkmann's avatar
      bpf, lockdown, audit: Fix buggy SELinux lockdown permission checks · ff40e510
      Daniel Borkmann authored
      Commit 59438b46 ("security,lockdown,selinux: implement SELinux lockdown")
      added an implementation of the locked_down LSM hook to SELinux, with the aim
      to restrict which domains are allowed to perform operations that would breach
      lockdown. This is indirectly also getting audit subsystem involved to report
      events. The latter is problematic, as reported by Ondrej and Serhei, since it
      can bring down the whole system via audit:
      
        1) The audit events that are triggered due to calls to security_locked_down()
           can OOM kill a machine, see below details [0].
      
        2) It also seems to be causing a deadlock via avc_has_perm()/slow_avc_audit()
           when trying to wake up kauditd, for example, when using trace_sched_switch()
           tracepoint, see details in [1]. Triggering this was not via some hypothetical
           corner case, but with existing tools like runqlat & runqslower from bcc, for
           example, which make use of this tracepoint. Rough call sequence goes like:
      
           rq_lock(rq) -> -------------------------+
             trace_sched_switch() ->               |
               bpf_prog_xyz() ->                   +-> deadlock
                 selinux_lockdown() ->             |
                   audit_log_end() ->              |
                     wake_up_interruptible() ->    |
                       try_to_wake_up() ->         |
                         rq_lock(rq) --------------+
      
      What's worse is that the intention of 59438b46 to further restrict lockdown
      settings for specific applications in respect to the global lockdown policy is
      completely broken for BPF. The SELinux policy rule for the current lockdown check
      looks something like this:
      
        allow <who> <who> : lockdown { <reason> };
      
      However, this doesn't match with the 'current' task where the security_locked_down()
      is executed, example: httpd does a syscall. There is a tracing program attached
      to the syscall which triggers a BPF program to run, which ends up doing a
      bpf_probe_read_kernel{,_str}() helper call. The selinux_lockdown() hook does
      the permission check against 'current', that is, httpd in this example. httpd
      has literally zero relation to this tracing program, and it would be nonsensical
      having to write an SELinux policy rule against httpd to let the tracing helper
      pass. The policy in this case needs to be against the entity that is installing
      the BPF program. For example, if bpftrace would generate a histogram of syscall
      counts by user space application:
      
        bpftrace -e 'tracepoint:raw_syscalls:sys_enter { @[comm] = count(); }'
      
      bpftrace would then go and generate a BPF program from this internally. One way
      of doing it [for the sake of the example] could be to call bpf_get_current_task()
      helper and then access current->comm via one of bpf_probe_read_kernel{,_str}()
      helpers. So the program itself has nothing to do with httpd or any other random
      app doing a syscall here. The BPF program _explicitly initiated_ the lockdown
      check. The allow/deny policy belongs in the context of bpftrace: meaning, you
      want to grant bpftrace access to use these helpers, but other tracers on the
      system like my_random_tracer _not_.
      
      Therefore fix all three issues at the same time by taking a completely different
      approach for the security_locked_down() hook, that is, move the check into the
      program verification phase where we actually retrieve the BPF func proto. This
      also reliably gets the task (current) that is trying to install the BPF tracing
      program, e.g. bpftrace/bcc/perf/systemtap/etc, and it also fixes the OOM since
      we're moving this out of the BPF helper's fast-path which can be called several
      millions of times per second.
      
      The check is then also in line with other security_locked_down() hooks in the
      system where the enforcement is performed at open/load time, for example,
      open_kcore() for /proc/kcore access or module_sig_check() for module signatures
      just to pick few random ones. What's out of scope in the fix as well as in
      other security_locked_down() hook locations /outside/ of BPF subsystem is that
      if the lockdown policy changes on the fly there is no retrospective action.
      This requires a different discussion, potentially complex infrastructure, and
      it's also not clear whether this can be solved generically. Either way, it is
      out of scope for a suitable stable fix which this one is targeting. Note that
      the breakage is specifically on 59438b46 where it started to rely on 'current'
      as UAPI behavior, and _not_ earlier infrastructure such as 9d1f8be5 ("bpf:
      Restrict bpf when kernel lockdown is in confidentiality mode").
      
      [0] https://bugzilla.redhat.com/show_bug.cgi?id=1955585, Jakub Hrozek says:
      
        I starting seeing this with F-34. When I run a container that is traced with
        BPF to record the syscalls it is doing, auditd is flooded with messages like:
      
        type=AVC msg=audit(1619784520.593:282387): avc:  denied  { confidentiality }
          for pid=476 comm="auditd" lockdown_reason="use of bpf to read kernel RAM"
            scontext=system_u:system_r:auditd_t:s0 tcontext=system_u:system_r:auditd_t:s0
              tclass=lockdown permissive=0
      
        This seems to be leading to auditd running out of space in the backlog buffer
        and eventually OOMs the machine.
      
        [...]
        auditd running at 99% CPU presumably processing all the messages, eventually I get:
        Apr 30 12:20:42 fedora kernel: audit: backlog limit exceeded
        Apr 30 12:20:42 fedora kernel: audit: backlog limit exceeded
        Apr 30 12:20:42 fedora kernel: audit: audit_backlog=2152579 > audit_backlog_limit=64
        Apr 30 12:20:42 fedora kernel: audit: audit_backlog=2152626 > audit_backlog_limit=64
        Apr 30 12:20:42 fedora kernel: audit: audit_backlog=2152694 > audit_backlog_limit=64
        Apr 30 12:20:42 fedora kernel: audit: audit_lost=6878426 audit_rate_limit=0 audit_backlog_limit=64
        Apr 30 12:20:45 fedora kernel: oci-seccomp-bpf invoked oom-killer: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=-1000
        Apr 30 12:20:45 fedora kernel: CPU: 0 PID: 13284 Comm: oci-seccomp-bpf Not tainted 5.11.12-300.fc34.x86_64 #1
        Apr 30 12:20:45 fedora kernel: Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-2.fc32 04/01/2014
        [...]
      
      [1] https://lore.kernel.org/linux-audit/CANYvDQN7H5tVp47fbYcRasv4XF07eUbsDwT_eDCHXJUj43J7jQ@mail.gmail.com/,
          Serhei Makarov says:
      
        Upstream kernel 5.11.0-rc7 and later was found to deadlock during a
        bpf_probe_read_compat() call within a sched_switch tracepoint. The problem
        is reproducible with the reg_alloc3 testcase from SystemTap's BPF backend
        testsuite on x86_64 as well as the runqlat, runqslower tools from bcc on
        ppc64le. Example stack trace:
      
        [...]
        [  730.868702] stack backtrace:
        [  730.869590] CPU: 1 PID: 701 Comm: in:imjournal Not tainted, 5.12.0-0.rc2.20210309git144c79ef.166.fc35.x86_64 #1
        [  730.871605] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014
        [  730.873278] Call Trace:
        [  730.873770]  dump_stack+0x7f/0xa1
        [  730.874433]  check_noncircular+0xdf/0x100
        [  730.875232]  __lock_acquire+0x1202/0x1e10
        [  730.876031]  ? __lock_acquire+0xfc0/0x1e10
        [  730.876844]  lock_acquire+0xc2/0x3a0
        [  730.877551]  ? __wake_up_common_lock+0x52/0x90
        [  730.878434]  ? lock_acquire+0xc2/0x3a0
        [  730.879186]  ? lock_is_held_type+0xa7/0x120
        [  730.880044]  ? skb_queue_tail+0x1b/0x50
        [  730.880800]  _raw_spin_lock_irqsave+0x4d/0x90
        [  730.881656]  ? __wake_up_common_lock+0x52/0x90
        [  730.882532]  __wake_up_common_lock+0x52/0x90
        [  730.883375]  audit_log_end+0x5b/0x100
        [  730.884104]  slow_avc_audit+0x69/0x90
        [  730.884836]  avc_has_perm+0x8b/0xb0
        [  730.885532]  selinux_lockdown+0xa5/0xd0
        [  730.886297]  security_locked_down+0x20/0x40
        [  730.887133]  bpf_probe_read_compat+0x66/0xd0
        [  730.887983]  bpf_prog_250599c5469ac7b5+0x10f/0x820
        [  730.888917]  trace_call_bpf+0xe9/0x240
        [  730.889672]  perf_trace_run_bpf_submit+0x4d/0xc0
        [  730.890579]  perf_trace_sched_switch+0x142/0x180
        [  730.891485]  ? __schedule+0x6d8/0xb20
        [  730.892209]  __schedule+0x6d8/0xb20
        [  730.892899]  schedule+0x5b/0xc0
        [  730.893522]  exit_to_user_mode_prepare+0x11d/0x240
        [  730.894457]  syscall_exit_to_user_mode+0x27/0x70
        [  730.895361]  entry_SYSCALL_64_after_hwframe+0x44/0xae
        [...]
      
      Fixes: 59438b46 ("security,lockdown,selinux: implement SELinux lockdown")
      Reported-by: default avatarOndrej Mosnacek <omosnace@redhat.com>
      Reported-by: default avatarJakub Hrozek <jhrozek@redhat.com>
      Reported-by: default avatarSerhei Makarov <smakarov@redhat.com>
      Reported-by: default avatarJiri Olsa <jolsa@redhat.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Tested-by: default avatarJiri Olsa <jolsa@redhat.com>
      Cc: Paul Moore <paul@paul-moore.com>
      Cc: James Morris <jamorris@linux.microsoft.com>
      Cc: Jerome Marchand <jmarchan@redhat.com>
      Cc: Frank Eigler <fche@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: https://lore.kernel.org/bpf/01135120-8bf7-df2e-cff0-1d73f1f841c3@iogearbox.net
      ff40e510
    • Pablo Neira Ayuso's avatar
      netfilter: nfnetlink_cthelper: hit EBUSY on updates if size mismatches · 8971ee8b
      Pablo Neira Ayuso authored
      The private helper data size cannot be updated. However, updates that
      contain NFCTH_PRIV_DATA_LEN might bogusly hit EBUSY even if the size is
      the same.
      
      Fixes: 12f7a505 ("netfilter: add user-space connection tracking helper infrastructure")
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      8971ee8b
    • Pablo Neira Ayuso's avatar
      netfilter: nft_ct: skip expectations for confirmed conntrack · 1710eb91
      Pablo Neira Ayuso authored
      nft_ct_expect_obj_eval() calls nf_ct_ext_add() for a confirmed
      conntrack entry. However, nf_ct_ext_add() can only be called for
      !nf_ct_is_confirmed().
      
      [ 1825.349056] WARNING: CPU: 0 PID: 1279 at net/netfilter/nf_conntrack_extend.c:48 nf_ct_xt_add+0x18e/0x1a0 [nf_conntrack]
      [ 1825.351391] RIP: 0010:nf_ct_ext_add+0x18e/0x1a0 [nf_conntrack]
      [ 1825.351493] Code: 41 5c 41 5d 41 5e 41 5f c3 41 bc 0a 00 00 00 e9 15 ff ff ff ba 09 00 00 00 31 f6 4c 89 ff e8 69 6c 3d e9 eb 96 45 31 ed eb cd <0f> 0b e9 b1 fe ff ff e8 86 79 14 e9 eb bf 0f 1f 40 00 0f 1f 44 00
      [ 1825.351721] RSP: 0018:ffffc90002e1f1e8 EFLAGS: 00010202
      [ 1825.351790] RAX: 000000000000000e RBX: ffff88814f5783c0 RCX: ffffffffc0e4f887
      [ 1825.351881] RDX: dffffc0000000000 RSI: 0000000000000008 RDI: ffff88814f578440
      [ 1825.351971] RBP: 0000000000000000 R08: 0000000000000000 R09: ffff88814f578447
      [ 1825.352060] R10: ffffed1029eaf088 R11: 0000000000000001 R12: ffff88814f578440
      [ 1825.352150] R13: ffff8882053f3a00 R14: 0000000000000000 R15: 0000000000000a20
      [ 1825.352240] FS:  00007f992261c900(0000) GS:ffff889faec00000(0000) knlGS:0000000000000000
      [ 1825.352343] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 1825.352417] CR2: 000056070a4d1158 CR3: 000000015efe0000 CR4: 0000000000350ee0
      [ 1825.352508] Call Trace:
      [ 1825.352544]  nf_ct_helper_ext_add+0x10/0x60 [nf_conntrack]
      [ 1825.352641]  nft_ct_expect_obj_eval+0x1b8/0x1e0 [nft_ct]
      [ 1825.352716]  nft_do_chain+0x232/0x850 [nf_tables]
      
      Add the ct helper extension only for unconfirmed conntrack. Skip rule
      evaluation if the ct helper extension does not exist. Thus, you can
      only create expectations from the first packet.
      
      It should be possible to remove this limitation by adding a new action
      to attach a generic ct helper to the first packet. Then, use this ct
      helper extension from follow up packets to create the ct expectation.
      
      While at it, add a missing check to skip the template conntrack too
      and remove check for IPCT_UNTRACK which is implicit to !ct.
      
      Fixes: 857b4602 ("netfilter: nft_ct: add ct expectations support")
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      1710eb91
    • Yevgeny Kliteynik's avatar
      net/mlx5: DR, Create multi-destination flow table with level less than 64 · 216214c6
      Yevgeny Kliteynik authored
      Flow table that contains flow pointing to multiple flow tables or multiple
      TIRs must have a level lower than 64. In our case it applies to muli-
      destination flow table.
      Fix the level of the created table to comply with HW Spec definitions, and
      still make sure that its level lower than SW-owned tables, so that it
      would be possible to point from the multi-destination FW table to SW
      tables.
      
      Fixes: 34583bee ("net/mlx5: DR, Create multi-destination table for SW-steering use")
      Signed-off-by: default avatarYevgeny Kliteynik <kliteyn@nvidia.com>
      Reviewed-by: default avatarAlex Vesker <valex@nvidia.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@nvidia.com>
      216214c6
    • Aya Levin's avatar
      net/mlx5e: Fix conflict with HW TS and CQE compression · 5349cbba
      Aya Levin authored
      When a driver's profile doesn't support a dedicated PTP-RQ,
      configuration of CQE compression while HW TS is configured should fail.
      
      Fixes: 885b8cfb ("net/mlx5e: Update ethtool setting of CQE compression")
      Signed-off-by: default avatarAya Levin <ayal@nvidia.com>
      Reviewed-by: default avatarMoshe Shemesh <moshe@nvidia.com>
      Reviewed-by: default avatarTariq Toukan <tariqt@nvidia.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@nvidia.com>
      5349cbba
    • Aya Levin's avatar
      net/mlx5e: Fix HW TS with CQE compression according to profile · 256f79d1
      Aya Levin authored
      When the driver's profile doesn't support a dedicated PTP-RQ, the PTP
      accuracy of HW TS is affected by the CQE compression. In this case,
      turn off CQE compression. Otherwise, the driver crashes:
      
      BUG: kernel NULL pointer dereference, address:0000000000000018
      ...
      ...
      RIP: 0010:mlx5e_ptp_rx_set_fs+0x25/0x1a0 [mlx5_core]
      ...
      ...
      Call Trace:
       mlx5e_ptp_activate_channel+0xb2/0xf0 [mlx5_core]
       mlx5e_activate_priv_channels+0x3b9/0x8c0 [mlx5_core]
       ? __mutex_unlock_slowpath+0x45/0x2a0
       ? mlx5e_refresh_tirs+0x151/0x1e0 [mlx5_core]
       mlx5e_switch_priv_channels+0x1cd/0x2d0 [mlx5_core]
       ? mlx5e_xdp_allowed+0x150/0x150 [mlx5_core]
       mlx5e_safe_switch_params+0x118/0x3c0 [mlx5_core]
       ? __mutex_lock+0x6e/0x8e0
       ? mlx5e_hwstamp_set+0xa9/0x300 [mlx5_core]
       mlx5e_hwstamp_set+0x194/0x300 [mlx5_core]
       ? dev_ioctl+0x9b/0x3d0
       mlx5i_ioctl+0x37/0x60 [mlx5_core]
       mlx5i_pkey_ioctl+0x12/0x20 [mlx5_core]
       dev_ioctl+0xa9/0x3d0
       sock_ioctl+0x268/0x420
       __x64_sys_ioctl+0x3d8/0x790
       ? lockdep_hardirqs_on_prepare+0xe4/0x190
       do_syscall_64+0x2d/0x40
      entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      Fixes: 960fbfe2 ("net/mlx5e: Allow coexistence of CQE compression and HW TS PTP")
      Signed-off-by: default avatarAya Levin <ayal@nvidia.com>
      Reviewed-by: default avatarMoshe Shemesh <moshe@nvidia.com>
      Reviewed-by: default avatarTariq Toukan <tariqt@nvidia.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@nvidia.com>
      256f79d1
    • Roi Dayan's avatar
      net/mlx5e: Fix adding encap rules to slow path · 2a2c84fa
      Roi Dayan authored
      On some devices the ignore flow level cap is not supported and we
      shouldn't use it. Setting the dest ft with mlx5_chains_get_tc_end_ft()
      already gives the correct end ft if ignore flow level cap is supported
      or not.
      
      Fixes: 39ac237c ("net/mlx5: E-Switch, Refactor chains and priorities")
      Signed-off-by: default avatarRoi Dayan <roid@nvidia.com>
      Reviewed-by: default avatarPaul Blakey <paulb@nvidia.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@nvidia.com>
      2a2c84fa
    • Roi Dayan's avatar
      net/mlx5e: Check for needed capability for cvlan matching · afe93f71
      Roi Dayan authored
      If not supported show an error and return instead of trying to offload
      to the hardware and fail.
      
      Fixes: 699e96dd ("net/mlx5e: Support offloading tc double vlan headers match")
      Reported-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: default avatarRoi Dayan <roid@nvidia.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@nvidia.com>
      afe93f71
    • Moshe Shemesh's avatar
      net/mlx5: Check firmware sync reset requested is set before trying to abort it · 5940e642
      Moshe Shemesh authored
      In case driver sent NACK to firmware on sync reset request, it will get
      sync reset abort event while it didn't set sync reset requested mode.
      Thus, on abort sync reset event handler, driver should check reset
      requested is set before trying to stop sync reset poll.
      
      Fixes: 7dd6df32 ("net/mlx5: Handle sync reset abort event")
      Signed-off-by: default avatarMoshe Shemesh <moshe@nvidia.com>
      Reviewed-by: default avatarTariq Toukan <tariqt@nvidia.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@nvidia.com>
      5940e642
    • Roi Dayan's avatar
      net/mlx5e: Disable TLS offload for uplink representor · b38742e4
      Roi Dayan authored
      TLS offload is not supported in switchdev mode.
      
      Fixes: 7a9fb35e ("net/mlx5e: Do not reload ethernet ports when changing eswitch mode")
      Signed-off-by: default avatarRoi Dayan <roid@nvidia.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@nvidia.com>
      b38742e4
    • Aya Levin's avatar
      net/mlx5e: Fix incompatible casting · d8ec9200
      Aya Levin authored
      Device supports setting of a single fec mode at a time, enforce this
      by bitmap_weight == 1. Input from fec command is in u32, avoid cast to
      unsigned long and use bitmap_from_arr32 to populate bitmap safely.
      
      Fixes: 4bd9d507 ("net/mlx5e: Enforce setting of a single FEC mode")
      Signed-off-by: default avatarAya Levin <ayal@nvidia.com>
      Reviewed-by: default avatarTariq Toukan <tariqt@nvidia.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@nvidia.com>
      d8ec9200
    • Joe Perches's avatar
      MAINTAINERS: nfc mailing lists are subscribers-only · b0003726
      Joe Perches authored
      It looks as if the MAINTAINERS entries for the nfc mailing list
      should be updated as I just got a "rejected" bounce from the nfc list.
      
      -------
      Your message to the Linux-nfc mailing-list was rejected for the following
      reasons:
      
      The message is not from a list member
      -------
      Signed-off-by: default avatarJoe Perches <joe@perches.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      b0003726
  3. 01 Jun, 2021 10 commits
    • David S. Miller's avatar
      Merge branch 'ktls-use-after-free' · 7c0aee30
      David S. Miller authored
      Maxim Mikityanskiy says:
      
      ====================
      Fix use-after-free after the TLS device goes down and up
      
      This small series fixes a use-after-free bug in the TLS offload code.
      The first patch is a preparation for the second one, and the second is
      the fix itself.
      
      v2 changes:
      
      Remove unneeded EXPORT_SYMBOL_GPL.
      ====================
      Acked-by: default avatarJakub Kicinski <kuba@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      7c0aee30
    • Maxim Mikityanskiy's avatar
      net/tls: Fix use-after-free after the TLS device goes down and up · c55dcdd4
      Maxim Mikityanskiy authored
      When a netdev with active TLS offload goes down, tls_device_down is
      called to stop the offload and tear down the TLS context. However, the
      socket stays alive, and it still points to the TLS context, which is now
      deallocated. If a netdev goes up, while the connection is still active,
      and the data flow resumes after a number of TCP retransmissions, it will
      lead to a use-after-free of the TLS context.
      
      This commit addresses this bug by keeping the context alive until its
      normal destruction, and implements the necessary fallbacks, so that the
      connection can resume in software (non-offloaded) kTLS mode.
      
      On the TX side tls_sw_fallback is used to encrypt all packets. The RX
      side already has all the necessary fallbacks, because receiving
      non-decrypted packets is supported. The thing needed on the RX side is
      to block resync requests, which are normally produced after receiving
      non-decrypted packets.
      
      The necessary synchronization is implemented for a graceful teardown:
      first the fallbacks are deployed, then the driver resources are released
      (it used to be possible to have a tls_dev_resync after tls_dev_del).
      
      A new flag called TLS_RX_DEV_DEGRADED is added to indicate the fallback
      mode. It's used to skip the RX resync logic completely, as it becomes
      useless, and some objects may be released (for example, resync_async,
      which is allocated and freed by the driver).
      
      Fixes: e8f69799 ("net/tls: Add generic NIC offload infrastructure")
      Signed-off-by: default avatarMaxim Mikityanskiy <maximmi@nvidia.com>
      Reviewed-by: default avatarTariq Toukan <tariqt@nvidia.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c55dcdd4
    • Maxim Mikityanskiy's avatar
      net/tls: Replace TLS_RX_SYNC_RUNNING with RCU · 05fc8b6c
      Maxim Mikityanskiy authored
      RCU synchronization is guaranteed to finish in finite time, unlike a
      busy loop that polls a flag. This patch is a preparation for the bugfix
      in the next patch, where the same synchronize_net() call will also be
      used to sync with the TX datapath.
      Signed-off-by: default avatarMaxim Mikityanskiy <maximmi@nvidia.com>
      Reviewed-by: default avatarTariq Toukan <tariqt@nvidia.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      05fc8b6c
    • Jiapeng Chong's avatar
      ethernet: myri10ge: Fix missing error code in myri10ge_probe() · f336d0b9
      Jiapeng Chong authored
      The error code is missing in this code scenario, add the error code
      '-EINVAL' to the return value 'status'.
      
      Eliminate the follow smatch warning:
      
      drivers/net/ethernet/myricom/myri10ge/myri10ge.c:3818 myri10ge_probe()
      warn: missing error code 'status'.
      Reported-by: default avatarAbaci Robot <abaci@linux.alibaba.com>
      Signed-off-by: default avatarJiapeng Chong <jiapeng.chong@linux.alibaba.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      f336d0b9
    • David S. Miller's avatar
      Merge branch 'virtio_net-build_skb-fixes' · 53d5fa9b
      David S. Miller authored
      Xuan Zhuo says:
      
      ====================
      virtio-net: fix for build_skb()
      
      The logic of this piece is really messy. Fortunately, my refactored patch can be
      completed with a small amount of testing.
      ====================
      Acked-by: default avatarMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      53d5fa9b
    • Xuan Zhuo's avatar
      virtio_net: get build_skb() buf by data ptr · 8fb7da9e
      Xuan Zhuo authored
      In the case of merge, the page passed into page_to_skb() may be a head
      page, not the page where the current data is located. So when trying to
      get the buf where the data is located, we should get buf based on
      headroom instead of offset.
      
      This patch solves this problem. But if you don't use this patch, the
      original code can also run, because if the page is not the page of the
      current data, the calculated tailroom will be less than 0, and will not
      enter the logic of build_skb() . The significance of this patch is to
      modify this logical problem, allowing more situations to use
      build_skb().
      Signed-off-by: default avatarXuan Zhuo <xuanzhuo@linux.alibaba.com>
      Acked-by: default avatarJason Wang <jasowang@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      8fb7da9e
    • Xuan Zhuo's avatar
      virtio-net: fix for unable to handle page fault for address · 5c37711d
      Xuan Zhuo authored
      In merge mode, when xdp is enabled, if the headroom of buf is smaller
      than virtnet_get_headroom(), xdp_linearize_page() will be called but the
      variable of "headroom" is still 0, which leads to wrong logic after
      entering page_to_skb().
      
      [   16.600944] BUG: unable to handle page fault for address: ffffecbfff7b43c8[   16.602175] #PF: supervisor read access in kernel mode
      [   16.603350] #PF: error_code(0x0000) - not-present page
      [   16.604200] PGD 0 P4D 0
      [   16.604686] Oops: 0000 [#1] SMP PTI
      [   16.605306] CPU: 4 PID: 715 Comm: sh Tainted: G    B             5.12.0+ #312
      [   16.606429] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/04
      [   16.608217] RIP: 0010:unmap_page_range+0x947/0xde0
      [   16.609014] Code: 00 00 08 00 48 83 f8 01 45 19 e4 41 f7 d4 41 83 e4 03 e9 a4 fd ff ff e8 b7 63 ed ff 4c 89 e0 48 c1 e0 065
      [   16.611863] RSP: 0018:ffffc90002503c58 EFLAGS: 00010286
      [   16.612720] RAX: ffffecbfff7b43c0 RBX: 00007f19f7203000 RCX: ffffffff812ff359
      [   16.613853] RDX: ffff888107778000 RSI: 0000000000000000 RDI: 0000000000000005
      [   16.614976] RBP: ffffea000425e000 R08: 0000000000000000 R09: 3030303030303030
      [   16.616124] R10: ffffffff82ed7d94 R11: 6637303030302052 R12: 7c00000afffded0f
      [   16.617276] R13: 0000000000000001 R14: ffff888119ee7010 R15: 00007f19f7202000
      [   16.618423] FS:  0000000000000000(0000) GS:ffff88842fd00000(0000) knlGS:0000000000000000
      [   16.619738] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   16.620670] CR2: ffffecbfff7b43c8 CR3: 0000000103220005 CR4: 0000000000370ee0
      [   16.621792] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [   16.622920] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [   16.624047] Call Trace:
      [   16.624525]  ? release_pages+0x24d/0x730
      [   16.625209]  unmap_single_vma+0xa9/0x130
      [   16.625885]  unmap_vmas+0x76/0xf0
      [   16.626480]  exit_mmap+0xa0/0x210
      [   16.627129]  mmput+0x67/0x180
      [   16.627673]  do_exit+0x3d1/0xf10
      [   16.628259]  ? do_user_addr_fault+0x231/0x840
      [   16.629000]  do_group_exit+0x53/0xd0
      [   16.629631]  __x64_sys_exit_group+0x1d/0x20
      [   16.630354]  do_syscall_64+0x3c/0x80
      [   16.630988]  entry_SYSCALL_64_after_hwframe+0x44/0xae
      [   16.631828] RIP: 0033:0x7f1a043d0191
      [   16.632464] Code: Unable to access opcode bytes at RIP 0x7f1a043d0167.
      [   16.633502] RSP: 002b:00007ffe3d993308 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
      [   16.634737] RAX: ffffffffffffffda RBX: 00007f1a044c9490 RCX: 00007f1a043d0191
      [   16.635857] RDX: 000000000000003c RSI: 00000000000000e7 RDI: 0000000000000000
      [   16.636986] RBP: 0000000000000000 R08: ffffffffffffff88 R09: 0000000000000001
      [   16.638120] R10: 0000000000000008 R11: 0000000000000246 R12: 00007f1a044c9490
      [   16.639245] R13: 0000000000000001 R14: 00007f1a044c9968 R15: 0000000000000000
      [   16.640408] Modules linked in:
      [   16.640958] CR2: ffffecbfff7b43c8
      [   16.641557] ---[ end trace bc4891c6ce46354c ]---
      [   16.642335] RIP: 0010:unmap_page_range+0x947/0xde0
      [   16.643135] Code: 00 00 08 00 48 83 f8 01 45 19 e4 41 f7 d4 41 83 e4 03 e9 a4 fd ff ff e8 b7 63 ed ff 4c 89 e0 48 c1 e0 065
      [   16.645983] RSP: 0018:ffffc90002503c58 EFLAGS: 00010286
      [   16.646845] RAX: ffffecbfff7b43c0 RBX: 00007f19f7203000 RCX: ffffffff812ff359
      [   16.647970] RDX: ffff888107778000 RSI: 0000000000000000 RDI: 0000000000000005
      [   16.649091] RBP: ffffea000425e000 R08: 0000000000000000 R09: 3030303030303030
      [   16.650250] R10: ffffffff82ed7d94 R11: 6637303030302052 R12: 7c00000afffded0f
      [   16.651394] R13: 0000000000000001 R14: ffff888119ee7010 R15: 00007f19f7202000
      [   16.652529] FS:  0000000000000000(0000) GS:ffff88842fd00000(0000) knlGS:0000000000000000
      [   16.653887] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   16.654841] CR2: ffffecbfff7b43c8 CR3: 0000000103220005 CR4: 0000000000370ee0
      [   16.655992] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [   16.657150] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [   16.658290] Kernel panic - not syncing: Fatal exception
      [   16.659613] Kernel Offset: disabled
      [   16.660234] ---[ end Kernel panic - not syncing: Fatal exception ]---
      
      Fixes: fb32856b ("virtio-net: page_to_skb() use build_skb when there's sufficient tailroom")
      Signed-off-by: default avatarXuan Zhuo <xuanzhuo@linux.alibaba.com>
      Acked-by: default avatarJason Wang <jasowang@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      5c37711d
    • Alexander Aring's avatar
      net: sock: fix in-kernel mark setting · dd9082f4
      Alexander Aring authored
      This patch fixes the in-kernel mark setting by doing an additional
      sk_dst_reset() which was introduced by commit 50254256 ("sock: Reset
      dst when changing sk_mark via setsockopt"). The code is now shared to
      avoid any further suprises when changing the socket mark value.
      
      Fixes: 84d1c617 ("net: sock: add sock_set_mark")
      Reported-by: default avatarMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: default avatarAlexander Aring <aahringo@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      dd9082f4
    • Vladimir Oltean's avatar
      net: dsa: tag_8021q: fix the VLAN IDs used for encoding sub-VLANs · 4ef8d857
      Vladimir Oltean authored
      When using sub-VLANs in the range of 1-7, the resulting value from:
      
      	rx_vid = dsa_8021q_rx_vid_subvlan(ds, port, subvlan);
      
      is wrong according to the description from tag_8021q.c:
      
       | 11  | 10  |  9  |  8  |  7  |  6  |  5  |  4  |  3  |  2  |  1  |  0  |
       +-----------+-----+-----------------+-----------+-----------------------+
       |    DIR    | SVL |    SWITCH_ID    |  SUBVLAN  |          PORT         |
       +-----------+-----+-----------------+-----------+-----------------------+
      
      For example, when ds->index == 0, port == 3 and subvlan == 1,
      dsa_8021q_rx_vid_subvlan() returns 1027, same as it returns for
      subvlan == 0, but it should have returned 1043.
      
      This is because the low portion of the subvlan bits are not masked
      properly when writing into the 12-bit VLAN value. They are masked into
      bits 4:3, but they should be masked into bits 5:4.
      
      Fixes: 3eaae1d0 ("net: dsa: tag_8021q: support up to 8 VLANs per port using sub-VLANs")
      Signed-off-by: default avatarVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      4ef8d857
    • Krzysztof Kozlowski's avatar
      nfc: fix NULL ptr dereference in llcp_sock_getname() after failed connect · 4ac06a1e
      Krzysztof Kozlowski authored
      It's possible to trigger NULL pointer dereference by local unprivileged
      user, when calling getsockname() after failed bind() (e.g. the bind
      fails because LLCP_SAP_MAX used as SAP):
      
        BUG: kernel NULL pointer dereference, address: 0000000000000000
        CPU: 1 PID: 426 Comm: llcp_sock_getna Not tainted 5.13.0-rc2-next-20210521+ #9
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-1 04/01/2014
        Call Trace:
         llcp_sock_getname+0xb1/0xe0
         __sys_getpeername+0x95/0xc0
         ? lockdep_hardirqs_on_prepare+0xd5/0x180
         ? syscall_enter_from_user_mode+0x1c/0x40
         __x64_sys_getpeername+0x11/0x20
         do_syscall_64+0x36/0x70
         entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      This can be reproduced with Syzkaller C repro (bind followed by
      getpeername):
      https://syzkaller.appspot.com/x/repro.c?x=14def446e00000
      
      Cc: <stable@vger.kernel.org>
      Fixes: d646960f ("NFC: Initial LLCP support")
      Reported-by: syzbot+80fb126e7f7d8b1a5914@syzkaller.appspotmail.com
      Reported-by: default avatarbutt3rflyh4ck <butterflyhuangxx@gmail.com>
      Signed-off-by: default avatarKrzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
      Link: https://lore.kernel.org/r/20210531072138.5219-1-krzysztof.kozlowski@canonical.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      4ac06a1e
  4. 30 May, 2021 1 commit
    • Sriranjani P's avatar
      net: stmmac: fix kernel panic due to NULL pointer dereference of mdio_bus_data · 593f555f
      Sriranjani P authored
      Fixed link does not need mdio bus and in that case mdio_bus_data will
      not be allocated. Before using mdio_bus_data we should check for NULL.
      
      This patch fix the kernel panic due to NULL pointer dereference of
      mdio_bus_data when it is not allocated.
      
      Without this patch we do see following kernel crash caused due to kernel
      NULL pointer dereference.
      
      Call trace:
      stmmac_dvr_probe+0x3c/0x10b0
      dwc_eth_dwmac_probe+0x224/0x378
      platform_probe+0x68/0xe0
      really_probe+0x130/0x3d8
      driver_probe_device+0x68/0xd0
      device_driver_attach+0x74/0x80
      __driver_attach+0x58/0xf8
      bus_for_each_dev+0x7c/0xd8
      driver_attach+0x24/0x30
      bus_add_driver+0x148/0x1f0
      driver_register+0x64/0x120
      __platform_driver_register+0x28/0x38
      dwc_eth_dwmac_driver_init+0x1c/0x28
      do_one_initcall+0x78/0x158
      kernel_init_freeable+0x1f0/0x244
      kernel_init+0x14/0x118
      ret_from_fork+0x10/0x30
      Code: f9002bfb 9113e2d9 910e6273 aa0003f7 (f9405c78)
      ---[ end trace 32d9d41562ddc081 ]---
      
      Fixes: e5e5b771 ("net: stmmac: make in-band AN mode parsing is supported for non-DT")
      Signed-off-by: default avatarSriranjani P <sriranjani.p@samsung.com>
      Signed-off-by: default avatarPankaj Dubey <pankaj.dubey@samsung.com>
      Link: https://lore.kernel.org/r/20210528071056.35252-1-sriranjani.p@samsung.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      593f555f