1. 02 Aug, 2021 8 commits
    • net: dsa: mt7530: drop paranoid checks in .get_tag_protocol() · 244f8a80
      Vladimir Oltean authored
      It is desirable to reduce the surface of DSA_TAG_PROTO_NONE as much as
      we can, because we now have options for switches without hardware
      support for DSA tagging, and the occurrence in the mt7530 driver is in
      fact quite gratuitous and easy to remove. Since ds->ops->get_tag_protocol()
      is only called for CPU ports, the checks for a CPU port in
      mtk_get_tag_protocol() are redundant and can be removed.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Acked-by: DENG Qingfang <dqfext@gmail.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'octeon-drr-config' · a3280efd
      David S. Miller authored
      Sunil Goutham says:
      
      ====================
      cn10k: DWRR MTU and weights configuration
      
      On OcteonTx2, the DWRR quantum is configured directly in each of
      the transmit scheduler queues, and PF/VF drivers were free to
      configure any value up to 2^24.

      On CN10K the HW is modified: the quantum configuration at scheduler
      queues is in terms of weight, and SW needs to set up a base DWRR MTU
      at NIX_AF_DWRR_RPM_MTU / NIX_AF_DWRR_SDP_MTU. HW will do
      'DWRR MTU * weight' to get the quantum.

      This patch series addresses this HW change on CN10K silicon;
      both the admin function and PF/VF drivers are modified.
      
      Also added support to program DWRR MTU via devlink params.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • octeontx2-pf: cn10k: Config DWRR weight based on MTU · c39830a4
      Sunil Goutham authored
      Program SQ, MDQ, TL4 to TL2 transmit scheduler queues' DWRR
      weight based on DWRR MTU programmed at NIX_AF_DWRR_RPM_MTU.
      The DWRR MTU from admin function is retrieved via mbox.
      
      On OcteonTx2 silicon, admin function driver responds with DWRR
      MTU as '1'. This helps to avoid silicon specific transmit
      scheduler DWRR quantum/weight configuration logic.
      Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • octeontx2-af: cn10k: DWRR MTU configuration · 76660df2
      Sunil Goutham authored
      On OcteonTx2, the DWRR quantum is configured directly in each of
      the transmit scheduler queues, and PF/VF drivers were free to
      configure any value up to 2^24.

      On CN10K the HW is modified: the quantum configuration at scheduler
      queues is in terms of weight, and SW needs to set up a base DWRR MTU
      at NIX_AF_DWRR_RPM_MTU / NIX_AF_DWRR_SDP_MTU. HW will do
      'DWRR MTU * weight' to get the quantum. For LBK traffic, the value
      programmed into the NIX_AF_DWRR_RPM_MTU register is considered the
      DWRR MTU.
      
      This patch programs a default DWRR MTU of 8192 into HW and also
      provides a way to change this via devlink params.
      Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
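The 'DWRR MTU * weight' relationship described in the two octeontx2 commits can be illustrated with a small arithmetic sketch. It is not driver code: the helper names and the round-up choice are hypothetical, and only the product formula, the default DWRR MTU of 8192 and the MTU of '1' reported on OcteonTx2 come from the commit messages.

```python
DEFAULT_DWRR_MTU = 8192   # default programmed into HW on CN10K, per the commit message
OCTEONTX2_DWRR_MTU = 1    # value the admin function reports on OcteonTx2


def quantum_from_weight(dwrr_mtu: int, weight: int) -> int:
    """Quantum the hardware derives: 'DWRR MTU * weight'."""
    return dwrr_mtu * weight


def weight_for_quantum(dwrr_mtu: int, wanted_quantum: int) -> int:
    """Hypothetical helper: smallest weight whose quantum covers wanted_quantum.

    Rounding up is an assumption made for this sketch only.
    """
    if dwrr_mtu <= 0 or wanted_quantum <= 0:
        raise ValueError("dwrr_mtu and wanted_quantum must be positive")
    return -(-wanted_quantum // dwrr_mtu)  # ceiling division


if __name__ == "__main__":
    # CN10K-style: weight is expressed in units of the base DWRR MTU.
    w = weight_for_quantum(DEFAULT_DWRR_MTU, 9000)
    print(w, quantum_from_weight(DEFAULT_DWRR_MTU, w))
    # OcteonTx2-style: with an MTU of 1 the weight equals the quantum,
    # so the same code path serves both silicon generations.
    print(weight_for_quantum(OCTEONTX2_DWRR_MTU, 1500))
```

With a DWRR MTU of 1 the weight and the quantum are the same number, which is why reporting '1' on OcteonTx2 avoids silicon-specific logic in the PF/VF driver.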
    • selftests/net: remove min gso test in packet_snd · cfba3fb6
      Dust Li authored
      This patch removed the 'raw gso min size - 1' test which
      always fails now:
      ./in_netns.sh ./psock_snd -v -c -g -l "${mss}"
        raw gso min size - 1 (expected to fail)
        tx: 1524
        rx: 1472
        OK
      
      After commit 7c6d2ecb ("net: be more gentle about silly
      gso requests coming from user"), we relaxed the min gso_size
      check in virtio_net_hdr_to_skb().
      So when a packet is smaller than the gso_size, GSO will not
      be set for this packet, and the packet will be sent and
      received successfully.
      Signed-off-by: Dust Li <dust.li@linux.alibaba.com>
      Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bonding: 3ad: fix the concurrency between __bond_release_one() and bond_3ad_state_machine_handler() · 220ade77
      Yufeng Mo authored
      Some time ago, I reported a calltrace issue,
      "did not find a suitable aggregator"; please see [1].
      After a period of analysis and reproduction, I found
      that this problem is caused by concurrency.
      
      Before the problem occurs, the bond structure is as follows:
      
      bond0 - slaver0(eth0) - agg0.lag_ports -> port0 - port1
                            \
                              port0
            \
              slaver1(eth1) - agg1.lag_ports -> NULL
                            \
                              port1
      
      If we run 'ifenslave bond0 -d eth1', the process is as follows:

      executing __bond_release_one()
      |
      bond_upper_dev_unlink()[step1]
      |                       |                       |
      |                       |                       bond_3ad_lacpdu_recv()
      |                       |                       ->bond_3ad_rx_indication()
      |                       |                       spin_lock_bh()
      |                       |                       ->ad_rx_machine()
      |                       |                       ->__record_pdu()[step2]
      |                       |                       spin_unlock_bh()
      |                       |                       |
      |                       bond_3ad_state_machine_handler()
      |                       spin_lock_bh()
      |                       ->ad_port_selection_logic()
      |                       ->try to find free aggregator[step3]
      |                       ->try to find suitable aggregator[step4]
      |                       ->did not find a suitable aggregator[step5]
      |                       spin_unlock_bh()
      |                       |
      |                       |
      bond_3ad_unbind_slave() |
      spin_lock_bh()
      spin_unlock_bh()
      
      step1: already removed slaver1(eth1) from list, but port1 remains
      step2: receive a lacpdu and update port0
      step3: port0 will be removed from agg0.lag_ports. The struct is
             "agg0.lag_ports -> port1" now, and agg0 is not free. At the
             same time, slaver1/agg1 has been removed from the list by step1.
             So we can't find a free aggregator now.
      step4: can't find suitable aggregator because of step2
      step5: cause a calltrace since port->aggregator is NULL
      
      To solve this concurrency problem, put bond_upper_dev_unlink()
      after bond_3ad_unbind_slave(). In this way, we can invalidate the port
      first and skip this port in bond_3ad_state_machine_handler(). This
      eliminates the situation that the slaver has been removed from the
      list but the port is still valid.
      
      [1] https://lore.kernel.org/netdev/10374.1611947473@famine/

      Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
      Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
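The ordering argument in the bonding fix can be restated as a toy state model. This is not kernel code and involves no real locking: the two booleans and both function names are hypothetical, standing in for "slave is still on the bond's list" and "the 802.3ad port is still valid". The sketch only checks whether a given teardown order exposes the state the commit message calls problematic (slave removed, port still valid).

```python
def teardown_states(order):
    """Return the state observed after each teardown step, in order."""
    state = {"slave_in_list": True, "port_valid": True}
    seen = []
    for step in order:
        if step == "unlink":      # models bond_upper_dev_unlink()
            state["slave_in_list"] = False
        elif step == "unbind":    # models bond_3ad_unbind_slave()
            state["port_valid"] = False
        else:
            raise ValueError(f"unknown step: {step}")
        seen.append(dict(state))
    return seen


def has_dangerous_window(order):
    """True if some intermediate state has a valid port whose slave is gone."""
    return any(s["port_valid"] and not s["slave_in_list"]
               for s in teardown_states(order))


if __name__ == "__main__":
    print(has_dangerous_window(["unlink", "unbind"]))  # order before the fix
    print(has_dangerous_window(["unbind", "unlink"]))  # order after the fix
```

In the reordered sequence the only intermediate state is "port invalid, slave still listed", which the state machine handler can skip.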
    • net_sched: refactor TC action init API · 695176bf
      Cong Wang authored
      The TC action ->init() API has 10 parameters, which makes it
      hard to read. Some of them are just booleans and can be replaced
      by flags. The same applies to the internal APIs tcf_action_init()
      and tcf_exts_validate().

      This patch converts them to flags and folds them into
      the upper 16 bits of "flags", whose lower 16 bits are still
      reserved for user-space. More specifically, the following
      kernel flags are introduced:
      
      TCA_ACT_FLAGS_POLICE replaces 'name' in a few contexts, to
      distinguish whether it is compatible with policer.
      
      TCA_ACT_FLAGS_BIND replaces 'bind', to indicate whether
      this action is bound to a filter.
      
      TCA_ACT_FLAGS_REPLACE replaces 'ovr' in most contexts, and
      means we are replacing an existing action.
      
      TCA_ACT_FLAGS_NO_RTNL replaces 'rtnl_held' but has the
      opposite meaning, because we still hold RTNL in most
      cases.
      
      The only user-space flag TCA_ACT_FLAGS_NO_PERCPU_STATS is
      untouched and still stored as before.
      
      I have tested this patch with tdc and I do not see any
      failure related to this patch.
      Tested-by: Vlad Buslov <vladbu@nvidia.com>
      Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: Cong Wang <cong.wang@bytedance.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
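The flag layout described in this commit (user-space flags in the lower 16 bits, kernel-internal flags in the upper 16 bits of one word) can be sketched as follows. The flag names are taken from the commit message, but the exact bit positions, the value of the user-space flag and the helper names are assumptions made for illustration.

```python
USER_BITS = 16                       # lower 16 bits reserved for user-space
USER_MASK = (1 << USER_BITS) - 1

# User-space flag (value assumed for this sketch).
TCA_ACT_FLAGS_NO_PERCPU_STATS = 1 << 0

# Kernel-internal flags in the upper 16 bits (bit positions assumed).
TCA_ACT_FLAGS_POLICE = 1 << (USER_BITS + 0)
TCA_ACT_FLAGS_BIND = 1 << (USER_BITS + 1)
TCA_ACT_FLAGS_REPLACE = 1 << (USER_BITS + 2)
TCA_ACT_FLAGS_NO_RTNL = 1 << (USER_BITS + 3)


def split_flags(flags: int):
    """Return (user_part, kernel_part) of a combined flags word."""
    return flags & USER_MASK, flags & ~USER_MASK


def rtnl_held(flags: int) -> bool:
    """NO_RTNL has the opposite meaning of the old 'rtnl_held' boolean."""
    return not (flags & TCA_ACT_FLAGS_NO_RTNL)


if __name__ == "__main__":
    flags = (TCA_ACT_FLAGS_NO_PERCPU_STATS
             | TCA_ACT_FLAGS_BIND
             | TCA_ACT_FLAGS_REPLACE)
    user, kernel = split_flags(flags)
    print(hex(user), hex(kernel), rtnl_held(flags))
```

Because RTNL is held in most cases, the inverted flag leaves the common case as "no bit set".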
    • niu: read property length only if we use it · 451395f7
      Martin Kaiser authored
      In three places, the driver calls of_get_property and reads the property
      length although the length is not used. Update the calls to not request
      the length.
      Signed-off-by: Martin Kaiser <martin@kaiser.cx>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 31 Jul, 2021 2 commits
    • Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next · d39e8b92
      Jakub Kicinski authored
      Andrii Nakryiko says:
      
      ====================
      bpf-next 2021-07-30
      
      We've added 64 non-merge commits during the last 15 day(s) which contain
      a total of 83 files changed, 5027 insertions(+), 1808 deletions(-).
      
      The main changes are:
      
      1) BTF-guided binary data dumping libbpf API, from Alan.
      
      2) Internal factoring out of libbpf CO-RE relocation logic, from Alexei.
      
      3) Ambient BPF run context and cgroup storage cleanup, from Andrii.
      
      4) Few small API additions for libbpf 1.0 effort, from Evgeniy and Hengqi.
      
      5) bpf_program__attach_kprobe_opts() fixes in libbpf, from Jiri.
      
      6) bpf_{get,set}sockopt() support in BPF iterators, from Martin.
      
      7) BPF map pinning improvements in libbpf, from Martynas.
      
      8) Improved module BTF support in libbpf and bpftool, from Quentin.
      
      9) Bpftool cleanups and documentation improvements, from Quentin.
      
      10) Libbpf improvements for supporting CO-RE on old kernels, from Shuyi.
      
      11) Increased maximum cgroup storage size, from Stanislav.
      
      12) Small fixes and improvements to BPF tests and samples, from various folks.
      
      * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (64 commits)
        tools: bpftool: Complete metrics list in "bpftool prog profile" doc
        tools: bpftool: Document and add bash completion for -L, -B options
        selftests/bpf: Update bpftool's consistency script for checking options
        tools: bpftool: Update and synchronise option list in doc and help msg
        tools: bpftool: Complete and synchronise attach or map types
        selftests/bpf: Check consistency between bpftool source, doc, completion
        tools: bpftool: Slightly ease bash completion updates
        unix_bpf: Fix a potential deadlock in unix_dgram_bpf_recvmsg()
        libbpf: Add btf__load_vmlinux_btf/btf__load_module_btf
        tools: bpftool: Support dumping split BTF by id
        libbpf: Add split BTF support for btf__load_from_kernel_by_id()
        tools: Replace btf__get_from_id() with btf__load_from_kernel_by_id()
        tools: Free BTF objects at various locations
        libbpf: Rename btf__get_from_id() as btf__load_from_kernel_by_id()
        libbpf: Rename btf__load() as btf__load_into_kernel()
        libbpf: Return non-null error on failures in libbpf_find_prog_btf_id()
        bpf: Emit better log message if bpf_iter ctx arg btf_id == 0
        tools/resolve_btfids: Emit warnings and patch zero id for missing symbols
        bpf: Increase supported cgroup storage value size
        libbpf: Fix race when pinning maps in parallel
        ...
      ====================
      
      Link: https://lore.kernel.org/r/20210730225606.1897330-1-andrii@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · d2e11fd2
      Jakub Kicinski authored
      Conflicting commits, all resolutions pretty trivial:
      
      drivers/bus/mhi/pci_generic.c
        5c2c8531 ("bus: mhi: pci-generic: configurable network interface MRU")
        56f6f4c4 ("bus: mhi: pci_generic: Apply no-op for wake using sideband wake boolean")
      
      drivers/nfc/s3fwrn5/firmware.c
        a0302ff5 ("nfc: s3fwrn5: remove unnecessary label")
        46573e3a ("nfc: s3fwrn5: fix undefined parameter values in dev_err()")
        801e541c ("nfc: s3fwrn5: fix undefined parameter values in dev_err()")
      
      MAINTAINERS
        7d901a1e ("net: phy: add Maxlinear GPY115/21x/24x driver")
        8a7b46fa ("MAINTAINERS: add Yasushi SHOJI as reviewer for the Microchip CAN BUS Analyzer Tool driver")
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  3. 30 Jul, 2021 30 commits
    • Merge tag 'net-5.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · c7d10223
      Linus Torvalds authored
      Pull networking fixes from Jakub Kicinski:
       "Networking fixes for 5.14-rc4, including fixes from bpf, can, WiFi
        (mac80211) and netfilter trees.
      
        Current release - regressions:
      
         - mac80211: fix starting aggregation sessions on mesh interfaces
      
        Current release - new code bugs:
      
         - sctp: send pmtu probe only if packet loss in Search Complete state
      
         - bnxt_en: add missing periodic PHC overflow check
      
         - devlink: fix phys_port_name of virtual port and merge error
      
         - hns3: change the method of obtaining default ptp cycle
      
         - can: mcba_usb_start(): add missing urb->transfer_dma initialization
      
        Previous releases - regressions:
      
         - set true network header for ECN decapsulation
      
         - mlx5e: RX, avoid possible data corruption w/ relaxed ordering and
           LRO
      
         - phy: re-add check for PHY_BRCM_DIS_TXCRXC_NOENRGY on the BCM54811
           PHY
      
         - sctp: fix return value check in __sctp_rcv_asconf_lookup
      
        Previous releases - always broken:
      
         - bpf:
             - more spectre corner case fixes, introduce a BPF nospec
               instruction for mitigating Spectre v4
             - fix OOB read when printing XDP link fdinfo
             - sockmap: fix cleanup related races
      
         - mac80211: fix enabling 4-address mode on a sta vif after assoc
      
         - can:
             - raw: raw_setsockopt(): fix raw_rcv panic for sock UAF
             - j1939: j1939_session_deactivate(): clarify lifetime of session
               object, avoid UAF
             - fix number of identical memory leaks in USB drivers
      
         - tipc:
             - do not blindly write skb_shinfo frags when doing decryption
             - fix sleeping in tipc accept routine"
      
      * tag 'net-5.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (91 commits)
        gve: Update MAINTAINERS list
        can: esd_usb2: fix memory leak
        can: ems_usb: fix memory leak
        can: usb_8dev: fix memory leak
        can: mcba_usb_start(): add missing urb->transfer_dma initialization
        can: hi311x: fix a signedness bug in hi3110_cmd()
        MAINTAINERS: add Yasushi SHOJI as reviewer for the Microchip CAN BUS Analyzer Tool driver
        bpf: Fix leakage due to insufficient speculative store bypass mitigation
        bpf: Introduce BPF nospec instruction for mitigating Spectre v4
        sis900: Fix missing pci_disable_device() in probe and remove
        net: let flow have same hash in two directions
        nfc: nfcsim: fix use after free during module unload
        tulip: windbond-840: Fix missing pci_disable_device() in probe and remove
        sctp: fix return value check in __sctp_rcv_asconf_lookup
        nfc: s3fwrn5: fix undefined parameter values in dev_err()
        net/mlx5: Fix mlx5_vport_tbl_attr chain from u16 to u32
        net/mlx5e: Fix nullptr in mlx5e_hairpin_get_mdev()
        net/mlx5: Unload device upon firmware fatal error
        net/mlx5e: Fix page allocation failure for ptp-RQ over SF
        net/mlx5e: Fix page allocation failure for trap-RQ over SF
        ...
    • Merge tag 'acpi-5.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · e1dab4c0
      Linus Torvalds authored
      Pull ACPI fixes from Rafael Wysocki:
       "These revert a recent IRQ resources handling modification that turned
        out to be problematic, fix suspend-to-idle handling on AMD platforms
        to take upcoming systems into account properly and fix the retrieval
        of the DPTF attributes of the PCH FIVR.
      
        Specifics:
      
         - Revert recent change of the ACPI IRQ resources handling that
           attempted to improve the ACPI IRQ override selection logic, but
           introduced serious regressions on some systems (Hui Wang).
      
         - Fix up quirks for AMD platforms in the suspend-to-idle support code
           so as to take upcoming systems using uPEP HID AMDI007 into account
           as appropriate (Mario Limonciello).
      
         - Fix the code retrieving DPTF attributes of the PCH FIVR so that it
           agrees on the return data type with the ACPI control method
           evaluated for this purpose (Srinivas Pandruvada)"
      
      * tag 'acpi-5.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        ACPI: DPTF: Fix reading of attributes
        Revert "ACPI: resources: Add checks for ACPI IRQ override"
        ACPI: PM: Add support for upcoming AMD uPEP HID AMDI007
    • pipe: make pipe writes always wake up readers · 3a34b13a
      Linus Torvalds authored
      Since commit 1b6b26ae ("pipe: fix and clarify pipe write wakeup
      logic") we have sanitized the pipe write logic, and would only try to
      wake up readers if they needed it.
      
      In particular, if the pipe already had data in it before the write,
      there was no point in trying to wake up a reader, since any existing
      readers must have been aware of the pre-existing data already.  Doing
      extraneous wakeups will only cause potential thundering herd problems.
      
      However, it turns out that some Android libraries have misused the EPOLL
      interface, and expected "edge triggered" to be "any new write will
      trigger it".  Even if there was no edge in sight.
      
      Quoting Sandeep Patil:
       "The commit 1b6b26ae ('pipe: fix and clarify pipe write wakeup
        logic') changed pipe write logic to wakeup readers only if the pipe
        was empty at the time of write. However, there are libraries that
        relied upon the older behavior for notification scheme similar to
        what's described in [1]
      
        One such library 'realm-core'[2] is used by numerous Android
        applications. The library uses a similar notification mechanism as GNU
        Make but it never drains the pipe until it is full. When Android moved
        to v5.10 kernel, all applications using this library stopped working.
      
        The library has since been fixed[3] but it will be a while before all
        applications incorporate the updated library"
      
      Our regression rule for the kernel is that if applications break from
      new behavior, it's a regression, even if it was because the application
      did something patently wrong.  Also note the original report [4] by
      Michael Kerrisk about a test for this epoll behavior - but at that point
      we didn't know of any actual broken use case.
      
      So add the extraneous wakeup, to approximate the old behavior.
      
      [ I say "approximate", because the exact old behavior was to do a wakeup
        not for each write(), but for each pipe buffer chunk that was filled
        in. The behavior introduced by this change is not that - this is just
        "every write will cause a wakeup, whether necessary or not", which
        seems to be sufficient for the broken library use. ]
      
      It's worth noting that this adds the extraneous wakeup only for the
      write side, while the read side still considers the "edge" to be purely
      about reading enough from the pipe to allow further writes.
      
      See commit f467a6a6 ("pipe: fix and clarify pipe read wakeup logic")
      for the pipe read case, which remains that "only wake up if the pipe was
      full, and we read something from it".
      
      Link: https://lore.kernel.org/lkml/CAHk-=wjeG0q1vgzu4iJhW5juPkTsjTYmiqiMUYAebWW+0bam6w@mail.gmail.com/ [1]
      Link: https://github.com/realm/realm-core [2]
      Link: https://github.com/realm/realm-core/issues/4666 [3]
      Link: https://lore.kernel.org/lkml/CAKgNAkjMBGeAwF=2MKK758BhxvW58wYTgYKB2V-gY1PwXxrH+Q@mail.gmail.com/ [4]
      Link: https://lore.kernel.org/lkml/20210729222635.2937453-1-sspatil@android.com/
      Reported-by: Sandeep Patil <sspatil@android.com>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
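The difference between the two write-side wakeup policies discussed in this commit can be shown with a toy model. This is not kernel code and does not touch a real pipe: the function name and the policy labels are hypothetical. It assumes a reader that never drains the pipe between writes, which is the pattern attributed to the affected library, and it counts wakeups per write() as the commit describes for the new behavior.

```python
def count_reader_wakeups(write_sizes, policy):
    """Count reader wakeups for a sequence of writes to an undrained pipe.

    policy "only_if_empty": wake readers only when the pipe was empty
                            before the write (behavior since 1b6b26ae).
    policy "always":        wake readers on every write (this commit).
    """
    if policy not in ("only_if_empty", "always"):
        raise ValueError(f"unknown policy: {policy}")
    buffered = 0
    wakeups = 0
    for size in write_sizes:
        was_empty = buffered == 0
        buffered += size          # the reader never drains in this model
        if policy == "always" or was_empty:
            wakeups += 1
    return wakeups


if __name__ == "__main__":
    writes = [1, 1, 1]
    print(count_reader_wakeups(writes, "only_if_empty"))
    print(count_reader_wakeups(writes, "always"))
```

An edge-triggered waiter that expects one notification per write only keeps working under the second policy, which is the behavior the broken applications relied on.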
    • Merge branch 'tools: bpftool: update, synchronise and validate types and options' · ab0720ce
      Andrii Nakryiko authored
      Quentin Monnet says:
      
      ====================
      
      To work with the different program types, map types, attach types etc.
      supported by eBPF, bpftool needs occasional updates to learn about the new
      features supported by the kernel. When such types translate into new
      keywords for the command line, updates are expected in several locations:
      typically, the help message displayed from bpftool itself, the manual page,
      and the bash completion file should be updated. The options used by the
      different commands for bpftool should also remain synchronised at those
      locations.
      
      Several omissions have occurred in the past, and a number of types are
      still missing today. This set is an attempt to improve the situation. It
      brings up-to-date the lists of types or options in bpftool, and also adds a
      Python script to the BPF selftests to automatically check that most of
      these lists remain synchronised.
      
      v2:
      - Reformat some lines in the bash completion file.
      - Do not reformat attach types, to preserve git-blame history.
      - Do not call Python script from tools/testing/selftests/bpf/Makefile.
      ====================
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    • tools: bpftool: Complete metrics list in "bpftool prog profile" doc · 475a23c2
      Quentin Monnet authored
      Profiling programs with bpftool was extended some time ago to support
      two new metrics, namely itlb_misses and dtlb_misses (misses for the
      instruction/data translation lookaside buffer). Update the manual page
      and bash completion accordingly.
      
      Fixes: 450d060e ("bpftool: Add {i,d}tlb_misses support for bpftool profile")
      Signed-off-by: Quentin Monnet <quentin@isovalent.com>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20210730215435.7095-8-quentin@isovalent.com
    • tools: bpftool: Document and add bash completion for -L, -B options · 8cc8c635
      Quentin Monnet authored
      The -L|--use-loader option for using loader programs when loading, or
      when generating a skeleton, did not have any documentation or bash
      completion. Same thing goes for -B|--base-btf, used to pass a path to a
      base BTF object for split BTF such as BTF for kernel modules.
      
      This patch documents and adds bash completion for those options.
      
      Fixes: 75fa1777 ("tools/bpftool: Add bpftool support for split BTF")
      Fixes: d510296d ("bpftool: Use syscall/loader program in "prog load" and "gen skeleton" command.")
      Signed-off-by: Quentin Monnet <quentin@isovalent.com>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20210730215435.7095-7-quentin@isovalent.com
    • selftests/bpf: Update bpftool's consistency script for checking options · da87772f
      Quentin Monnet authored
      Update the script responsible for checking that the different types used
      at various places in bpftool are synchronised, and extend it to check
      the consistency of options between the help messages in the source code
      and the manual pages.
      Signed-off-by: Quentin Monnet <quentin@isovalent.com>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20210730215435.7095-6-quentin@isovalent.com
    • tools: bpftool: Update and synchronise option list in doc and help msg · c07ba629
      Quentin Monnet authored
      All bpftool commands support the options for JSON output and debug from
      libbpf. In addition, some commands support additional options
      corresponding to specific use cases.
      
      The lists of options described in the man pages for the different
      commands are not always accurate. The messages for interactive help are
      mostly limited to HELP_SPEC_OPTIONS, and are even less representative of
      the actual set of options supported for the commands.
      
      Let's update the lists:
      
      - HELP_SPEC_OPTIONS is modified to contain the "default" options (JSON
        and debug), and to be extensible (no ending curly bracket).
      - All commands use HELP_SPEC_OPTIONS in their help message, and then
        complete the list with their specific options.
      - The lists of options in the man pages are updated.
      - The formatting of the list for bpftool.rst is adjusted to match
        formatting for the other man pages. This is for consistency, and also
        because it will be helpful in a future patch to automatically check
        that the files are synchronised.
      Signed-off-by: Quentin Monnet <quentin@isovalent.com>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20210730215435.7095-5-quentin@isovalent.com
    • tools: bpftool: Complete and synchronise attach or map types · b544342e
      Quentin Monnet authored
      Update bpftool's list of attach type names to tell it about the latest
      attach types, or the "ringbuf" map. Also update the documentation, help
      messages, and bash completion when relevant.
      
      These missing items were reported by the newly added Python script used
      to help maintain consistency in bpftool.
      Signed-off-by: Quentin Monnet <quentin@isovalent.com>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20210730215435.7095-4-quentin@isovalent.com
    • selftests/bpf: Check consistency between bpftool source, doc, completion · a2b5944f
      Quentin Monnet authored
      Whenever the eBPF subsystem gains new elements, such as new program or
      map types, it is necessary to update bpftool if we want it to be able to
      handle the new items.
      
      In addition to the main arrays containing the names of these elements in
      the source code, there are also multiple locations to update:
      
      - The help message in the do_help() functions in bpftool's source code.
      - The RST documentation files.
      - The bash completion file.
      
      This has led to omissions multiple times in the past. This patch
      attempts to address this issue by adding consistency checks for all
      these different locations. It also verifies that the bpf_prog_type,
      bpf_map_type and bpf_attach_type enums from the UAPI BPF header have all
      their members present in bpftool.
      
      The script requires no argument to run, it reads and parses the
      different files to check, and prints the mismatches, if any. It
      currently reports a number of missing elements, which will be fixed in a
      later patch:
      
        $ ./test_bpftool_synctypes.py
        Comparing [...]/linux/tools/bpf/bpftool/map.c (map_type_name) and [...]/linux/tools/bpf/bpftool/bash-completion/bpftool (BPFTOOL_MAP_CREATE_TYPES): {'ringbuf'}
        Comparing BPF header (enum bpf_attach_type) and [...]/linux/tools/bpf/bpftool/common.c (attach_type_name): {'BPF_TRACE_ITER', 'BPF_XDP_DEVMAP', 'BPF_XDP', 'BPF_SK_REUSEPORT_SELECT', 'BPF_XDP_CPUMAP', 'BPF_SK_REUSEPORT_SELECT_OR_MIGRATE'}
        Comparing [...]/linux/tools/bpf/bpftool/prog.c (attach_type_strings) and [...]/linux/tools/bpf/bpftool/prog.c (do_help() ATTACH_TYPE): {'skb_verdict'}
        Comparing [...]/linux/tools/bpf/bpftool/prog.c (attach_type_strings) and [...]/linux/tools/bpf/bpftool/Documentation/bpftool-prog.rst (ATTACH_TYPE): {'skb_verdict'}
        Comparing [...]/linux/tools/bpf/bpftool/prog.c (attach_type_strings) and [...]/linux/tools/bpf/bpftool/bash-completion/bpftool (BPFTOOL_PROG_ATTACH_TYPES): {'skb_verdict'}
      
      Note that the script does NOT check for consistency between the list of
      program types that bpftool claims it accepts and the actual list of
      keywords that can be used. This is because bpftool does not "see" them,
      they are ELF section names parsed by libbpf. It is not hard to parse the
      section_defs[] array in libbpf, but some section names are associated
      with program types that bpftool cannot load at the moment. For example,
      some programs require a BTF target and an attach target that bpftool
      cannot handle. The script may be extended to parse the array and check
      only relevant values in the future.
      
      The script is not added to the selftests' Makefile, because doing so
      would require all patches with BPF UAPI change to also update bpftool.
      Instead it is to be added to the CI.
      Signed-off-by: Quentin Monnet <quentin@isovalent.com>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20210730215435.7095-3-quentin@isovalent.com
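The kind of check this commit describes, comparing the names found in one location against those found in another and printing what is missing, can be sketched with plain set operations. This is not the test_bpftool_synctypes.py script: the function name, the labels and the sample lists are hypothetical, and reporting only the names present in the first list but absent from the second is an assumption about the comparison direction.

```python
def report_missing(reference, reference_label, other, other_label):
    """Return a report line for names in `reference` missing from `other`.

    Returns None when the two locations are consistent.
    """
    missing = set(reference) - set(other)
    if not missing:
        return None
    return (f"Comparing {reference_label} and {other_label}: "
            f"{sorted(missing)}")


if __name__ == "__main__":
    # Sample data only; real lists would be parsed from the files.
    source_map_types = ["hash", "array", "ringbuf"]
    completion_map_types = ["hash", "array"]
    line = report_missing(source_map_types, "map.c (map_type_name)",
                          completion_map_types,
                          "bash-completion (BPFTOOL_MAP_CREATE_TYPES)")
    if line:
        print(line)
```

Running the comparison once per pair of locations (source arrays, help messages, RST files, bash completion) would give one report line per mismatch, in the spirit of the sample output quoted in the commit message.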
    • tools: bpftool: Slightly ease bash completion updates · 510b4d4c
      Quentin Monnet authored
      Bash completion for bpftool gets two minor improvements in this patch.
      
      Move the detection of attach types for "bpftool cgroup attach" outside
      of the "case/esac" block, where we cannot reuse our variable holding the
      list of supported attach types as a pattern list. After the change, we
      have only one list of cgroup attach types to update when new types are
      added, instead of the former two lists.
      
      Also rename the variables holding lists of names for program types, map
      types, and attach types, to make them more unique. This can make it
      slightly easier to point people to the relevant variables to update, but
      the main objective here is to help run a script to check that bash
      completion is up-to-date with bpftool's source code.
      Signed-off-by: Quentin Monnet <quentin@isovalent.com>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20210730215435.7095-2-quentin@isovalent.com
      510b4d4c
    • Jakub Kicinski's avatar
      Merge branch 'clean-devlink-net-namespace-operations' · aae950b1
      Jakub Kicinski authored
      Leon Romanovsky says:
      
      ====================
      Clean devlink net namespace operations
      
      This short series continues my work on devlink core code to make devlink
      reload less prone to errors and harden it from API abuse.
      
      Although the first patch is a clear fix, I would ask you to apply it to
      net-next anyway, because the patch it fixes is old and applying it here
      will help us avoid merge conflicts with the following patches, or even
      with the second one.
      ====================
      
      Link: https://lore.kernel.org/r/cover.1627578998.git.leonro@nvidia.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      aae950b1
    • Leon Romanovsky's avatar
      devlink: Allocate devlink directly in requested net namespace · 26713455
      Leon Romanovsky authored
      There is no need for the extra call indirection and the check for an
      impossible flow in which someone tries to set the namespace without a
      prior call to devlink_alloc().
      
      Instead of this extra logic and an additional EXPORT_SYMBOL, use a
      specialized devlink allocation function that receives the net namespace
      as an argument.
      
      Such a specialized API makes it clear when devlink is initialized in the
      wrong net namespace, and it keeps kernel users from changing the devlink
      namespace under the hood.
      Reviewed-by: Jiri Pirko <jiri@nvidia.com>
      Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      26713455
    • Leon Romanovsky's avatar
      devlink: Break parameter notification sequence to be before/after unload/load driver · 05a7f4a8
      Leon Romanovsky authored
      Changing namespaces during devlink reload unloads the driver before
      the devlink parameters are accessed. The commands below cause a
      use-after-free bug when trying to get the flow steering mode.
      
       * ip netns add n1
       * devlink dev reload pci/0000:00:09.0 netns n1
      
       ==================================================================
       BUG: KASAN: use-after-free in mlx5_devlink_fs_mode_get+0x96/0xa0 [mlx5_core]
       Read of size 4 at addr ffff888009d04308 by task devlink/275
      
       CPU: 6 PID: 275 Comm: devlink Not tainted 5.12.0-rc2+ #2853
       Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
       Call Trace:
        dump_stack+0x93/0xc2
        print_address_description.constprop.0+0x18/0x140
        ? mlx5_devlink_fs_mode_get+0x96/0xa0 [mlx5_core]
        ? mlx5_devlink_fs_mode_get+0x96/0xa0 [mlx5_core]
        kasan_report.cold+0x7c/0xd8
        ? mlx5_devlink_fs_mode_get+0x96/0xa0 [mlx5_core]
        mlx5_devlink_fs_mode_get+0x96/0xa0 [mlx5_core]
        devlink_nl_param_fill+0x1c8/0xe80
        ? __free_pages_ok+0x37a/0x8a0
        ? devlink_flash_update_timeout_notify+0xd0/0xd0
        ? lock_acquire+0x1a9/0x6d0
        ? fs_reclaim_acquire+0xb7/0x160
        ? lock_is_held_type+0x98/0x110
        ? 0xffffffff81000000
        ? lock_release+0x1f9/0x6c0
        ? fs_reclaim_release+0xa1/0xf0
        ? lock_downgrade+0x6d0/0x6d0
        ? lock_is_held_type+0x98/0x110
        ? lock_is_held_type+0x98/0x110
        ? memset+0x20/0x40
        ? __build_skb_around+0x1f8/0x2b0
        devlink_param_notify+0x6d/0x180
        devlink_reload+0x1c3/0x520
        ? devlink_remote_reload_actions_performed+0x30/0x30
        ? mutex_trylock+0x24b/0x2d0
        ? devlink_nl_cmd_reload+0x62b/0x1070
        devlink_nl_cmd_reload+0x66d/0x1070
        ? devlink_reload+0x520/0x520
        ? devlink_get_from_attrs+0x1bc/0x260
        ? devlink_nl_pre_doit+0x64/0x4d0
        genl_family_rcv_msg_doit+0x1e9/0x2f0
        ? mutex_lock_io_nested+0x1130/0x1130
        ? genl_family_rcv_msg_attrs_parse.constprop.0+0x240/0x240
        ? security_capable+0x51/0x90
        genl_rcv_msg+0x27f/0x4a0
        ? genl_get_cmd+0x3c0/0x3c0
        ? lock_acquire+0x1a9/0x6d0
        ? devlink_reload+0x520/0x520
        ? lock_release+0x6c0/0x6c0
        netlink_rcv_skb+0x11d/0x340
        ? genl_get_cmd+0x3c0/0x3c0
        ? netlink_ack+0x9f0/0x9f0
        ? lock_release+0x1f9/0x6c0
        genl_rcv+0x24/0x40
        netlink_unicast+0x433/0x700
        ? netlink_attachskb+0x730/0x730
        ? _copy_from_iter_full+0x178/0x650
        ? __alloc_skb+0x113/0x2b0
        netlink_sendmsg+0x6f1/0xbd0
        ? netlink_unicast+0x700/0x700
        ? lock_is_held_type+0x98/0x110
        ? netlink_unicast+0x700/0x700
        sock_sendmsg+0xb0/0xe0
        __sys_sendto+0x193/0x240
        ? __x64_sys_getpeername+0xb0/0xb0
        ? do_sys_openat2+0x10b/0x370
        ? __up_read+0x1a1/0x7b0
        ? do_user_addr_fault+0x219/0xdc0
        ? __x64_sys_openat+0x120/0x1d0
        ? __x64_sys_open+0x1a0/0x1a0
        __x64_sys_sendto+0xdd/0x1b0
        ? syscall_enter_from_user_mode+0x1d/0x50
        do_syscall_64+0x2d/0x40
        entry_SYSCALL_64_after_hwframe+0x44/0xae
       RIP: 0033:0x7fc69d0af14a
       Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 76 c3 0f 1f 44 00 00 55 48 83 ec 30 44 89 4c
       RSP: 002b:00007ffc1d8292f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
       RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 00007fc69d0af14a
       RDX: 0000000000000038 RSI: 0000555f57c56440 RDI: 0000000000000003
       RBP: 0000555f57c56410 R08: 00007fc69d17b200 R09: 000000000000000c
       R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
       R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
      
       Allocated by task 146:
        kasan_save_stack+0x1b/0x40
        __kasan_kmalloc+0x99/0xc0
        mlx5_init_fs+0xf0/0x1c50 [mlx5_core]
        mlx5_load+0xd2/0x180 [mlx5_core]
        mlx5_init_one+0x2f6/0x450 [mlx5_core]
        probe_one+0x47d/0x6e0 [mlx5_core]
        pci_device_probe+0x2a0/0x4a0
        really_probe+0x20a/0xc90
        driver_probe_device+0xd8/0x380
        device_driver_attach+0x1df/0x250
        __driver_attach+0xff/0x240
        bus_for_each_dev+0x11e/0x1a0
        bus_add_driver+0x309/0x570
        driver_register+0x1ee/0x380
        0xffffffffa06b8062
        do_one_initcall+0xd5/0x410
        do_init_module+0x1c8/0x760
        load_module+0x6d8b/0x9650
        __do_sys_finit_module+0x118/0x1b0
        do_syscall_64+0x2d/0x40
        entry_SYSCALL_64_after_hwframe+0x44/0xae
      
       Freed by task 275:
        kasan_save_stack+0x1b/0x40
        kasan_set_track+0x1c/0x30
        kasan_set_free_info+0x20/0x30
        __kasan_slab_free+0x102/0x140
        slab_free_freelist_hook+0x74/0x1b0
        kfree+0xd7/0x2a0
        mlx5_unload+0x16/0xb0 [mlx5_core]
        mlx5_unload_one+0xae/0x120 [mlx5_core]
        mlx5_devlink_reload_down+0x1bc/0x380 [mlx5_core]
        devlink_reload+0x141/0x520
        devlink_nl_cmd_reload+0x66d/0x1070
        genl_family_rcv_msg_doit+0x1e9/0x2f0
        genl_rcv_msg+0x27f/0x4a0
        netlink_rcv_skb+0x11d/0x340
        genl_rcv+0x24/0x40
        netlink_unicast+0x433/0x700
        netlink_sendmsg+0x6f1/0xbd0
        sock_sendmsg+0xb0/0xe0
        __sys_sendto+0x193/0x240
        __x64_sys_sendto+0xdd/0x1b0
        do_syscall_64+0x2d/0x40
        entry_SYSCALL_64_after_hwframe+0x44/0xae
      
       The buggy address belongs to the object at ffff888009d04300
        which belongs to the cache kmalloc-128 of size 128
       The buggy address is located 8 bytes inside of
        128-byte region [ffff888009d04300, ffff888009d04380)
       The buggy address belongs to the page:
       page:0000000086a64ecc refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff888009d04000 pfn:0x9d04
       head:0000000086a64ecc order:1 compound_mapcount:0
       flags: 0x4000000000010200(slab|head)
       raw: 4000000000010200 ffffea0000203980 0000000200000002 ffff8880050428c0
       raw: ffff888009d04000 000000008020001d 00000001ffffffff 0000000000000000
       page dumped because: kasan: bad access detected
      
       Memory state around the buggy address:
        ffff888009d04200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
        ffff888009d04280: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
       >ffff888009d04300: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                             ^
        ffff888009d04380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
        ffff888009d04400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
       ==================================================================
      
      The right solution for devlink reload is to notify about the deletion of
      parameters, unload the driver, change net namespaces, load the driver,
      and notify about the addition of parameters.
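      
      The ordering above can be illustrated with a small userspace model. This
      is only a sketch: struct toy_driver and every toy_* function are
      hypothetical names invented for illustration, not devlink or mlx5 code,
      and the "buggy" order is merely what the KASAN trace suggests (parameter
      notification after the driver was unloaded).

```c
#include <stdbool.h>

/* Hypothetical stand-in for a driver whose parameters live in driver-owned memory. */
struct toy_driver {
	bool loaded;    /* driver state exists only while loaded */
	int netns;      /* net namespace the instance belongs to */
	int bad_reads;  /* reads that would have touched freed memory */
};

/* A parameter notification reads driver state through a getter. */
static void toy_notify_params(struct toy_driver *d)
{
	if (!d->loaded)
		d->bad_reads++;  /* in the real bug: use-after-free */
}

static void toy_unload(struct toy_driver *d) { d->loaded = false; }
static void toy_load(struct toy_driver *d)   { d->loaded = true; }

/* Order described in the commit message: notify, unload, move, load, notify. */
static int toy_reload_fixed(struct toy_driver *d, int new_netns)
{
	toy_notify_params(d);  /* deletion notice while the driver is still loaded */
	toy_unload(d);
	d->netns = new_netns;
	toy_load(d);
	toy_notify_params(d);  /* addition notice after the driver is loaded again */
	return d->bad_reads;
}

/* Assumed problematic order: the notification happens after the unload. */
static int toy_reload_buggy(struct toy_driver *d, int new_netns)
{
	toy_unload(d);
	toy_notify_params(d);  /* reads parameters of an unloaded driver */
	d->netns = new_netns;
	toy_load(d);
	toy_notify_params(d);
	return d->bad_reads;
}
```

      In this model, toy_reload_fixed() never reads parameters while the
      driver is unloaded, whereas toy_reload_buggy() records one such read.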
      
      Fixes: 070c63f2 ("net: devlink: allow to change namespaces during reload")
      Reviewed-by: Parav Pandit <parav@nvidia.com>
      Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      05a7f4a8
    • Cong Wang's avatar
      unix_bpf: Fix a potential deadlock in unix_dgram_bpf_recvmsg() · 0b846445
      Cong Wang authored
      As Eric noticed, __unix_dgram_recvmsg() may acquire u->iolock
      too, so we have to release it before calling this function.
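      
      A minimal userspace sketch of the locking pattern, under stated
      assumptions: struct toy_mutex and the toy_* functions are hypothetical,
      the lock merely reports where a real non-recursive mutex would block
      forever, and the control flow of unix_dgram_bpf_recvmsg() is simplified
      to a single "fall back to the plain receive path" branch.

```c
#include <stdbool.h>

/* Hypothetical non-recursive lock: locking it twice would deadlock for real. */
struct toy_mutex {
	bool held;
};

/* Returns false where a real mutex_lock() would block forever. */
static bool toy_lock(struct toy_mutex *m)
{
	if (m->held)
		return false;
	m->held = true;
	return true;
}

static void toy_unlock(struct toy_mutex *m)
{
	m->held = false;
}

/* Stand-in for the plain receive path, which may take the I/O lock itself. */
static bool toy_plain_recvmsg(struct toy_mutex *iolock)
{
	if (!toy_lock(iolock))
		return false;
	toy_unlock(iolock);
	return true;
}

/* Stand-in for the BPF receive path falling back to the plain path. */
static bool toy_bpf_recvmsg(struct toy_mutex *iolock, bool release_first)
{
	bool ok;

	if (!toy_lock(iolock))
		return false;
	if (release_first) {
		/* Fixed pattern: drop the lock before calling into the plain path. */
		toy_unlock(iolock);
		return toy_plain_recvmsg(iolock);
	}
	/* Problematic pattern: the callee tries to take the lock we still hold. */
	ok = toy_plain_recvmsg(iolock);
	toy_unlock(iolock);
	return ok;
}
```

      With release_first set, the fallback succeeds and leaves the lock
      free; without it, the model reports the self-deadlock.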
      
      Fixes: 9825d866 ("af_unix: Implement unix_dgram_bpf_recvmsg()")
      Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: Cong Wang <cong.wang@bytedance.com>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Acked-by: Jakub Sitnicki <jakub@cloudflare.com>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      0b846445
    • Hengqi Chen's avatar
      libbpf: Add btf__load_vmlinux_btf/btf__load_module_btf · a710eed3
      Hengqi Chen authored
      Add two new APIs: btf__load_vmlinux_btf and btf__load_module_btf.
      btf__load_vmlinux_btf is just an alias for the existing API named
      libbpf_find_kernel_btf, renamed to be more precise and consistent
      with existing BTF APIs. btf__load_module_btf can be used to load
      module BTF and is added for completeness. These two APIs are useful for
      implementing tracing tools and introspection tools. This is part
      of the effort towards libbpf 1.0 ([0]).
      
        [0] Closes: https://github.com/libbpf/libbpf/issues/280
      Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20210730114012.494408-1-hengqi.chen@gmail.com
      a710eed3
    • Paolo Abeni's avatar
      sk_buff: avoid potentially clearing 'slow_gro' field · a432934a
      Paolo Abeni authored
      If skb_dst_set_noref() is invoked with a NULL dst, the 'slow_gro'
      field is cleared, too. That could lead to wrong behavior if
      the skb later enters the GRO stage.
      
      Fix the potential issue by preserving a non-zero value of
      the 'slow_gro' field.
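      
      The difference between overwriting and preserving the flag can be
      sketched as follows. struct toy_skb and both setters are hypothetical
      userspace stand-ins, not the kernel's struct sk_buff or
      skb_dst_set_noref(); the OR-based update is one plausible way to
      preserve a non-zero value.

```c
#include <stddef.h>

/* Hypothetical, heavily reduced stand-in for an skb. */
struct toy_skb {
	const void *dst;
	unsigned int slow_gro : 1;
};

/* Overwriting variant: a NULL dst also clears a previously set flag. */
static void toy_dst_set_noref_overwrite(struct toy_skb *skb, const void *dst)
{
	skb->slow_gro = (dst != NULL);
	skb->dst = dst;
}

/* Preserving variant: the flag is only ever raised here, never cleared. */
static void toy_dst_set_noref_preserve(struct toy_skb *skb, const void *dst)
{
	skb->slow_gro |= (dst != NULL);
	skb->dst = dst;
}
```

      If slow_gro was already set for another reason, only the preserving
      variant keeps it set after a call with a NULL dst.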
      
      Additionally, fix a comment typo.
      Reported-by: Sabrina Dubroca <sd@queasysnail.net>
      Reported-by: Jakub Kicinski <kuba@kernel.org>
      Fixes: 8a886b14 ("sk_buff: track dst status in slow_gro")
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Link: https://lore.kernel.org/r/aa42529252dc8bb02bd42e8629427040d1058537.1627662501.git.pabeni@redhat.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      a432934a
    • Rafael J. Wysocki's avatar
      Merge branches 'acpi-resources' and 'acpi-dptf' · e83f54ea
      Rafael J. Wysocki authored
      * acpi-resources:
        Revert "ACPI: resources: Add checks for ACPI IRQ override"
      
      * acpi-dptf:
        ACPI: DPTF: Fix reading of attributes
      e83f54ea
    • Linus Torvalds's avatar
      Merge tag 'block-5.14-2021-07-30' of git://git.kernel.dk/linux-block · 4669e13c
      Linus Torvalds authored
      Pull block fixes from Jens Axboe:
      
       - gendisk freeing fix (Christoph)
      
       - blk-iocost wake ordering fix (Tejun)
      
       - tag allocation error handling fix (John)
      
       - loop locking fix. While this isn't the prettiest fix in the world,
         nobody has any good alternatives for 5.14. Something to likely
         revisit for 5.15. (Tetsuo)
      
      * tag 'block-5.14-2021-07-30' of git://git.kernel.dk/linux-block:
        block: delay freeing the gendisk
        blk-iocost: fix operation ordering in iocg_wake_fn()
        blk-mq-sched: Fix blk_mq_sched_alloc_tags() error handling
        loop: reintroduce global lock for safe loop_validate_file() traversal
      4669e13c
    • Linus Torvalds's avatar
      Merge tag 'io_uring-5.14-2021-07-30' of git://git.kernel.dk/linux-block · 27eb687b
      Linus Torvalds authored
      Pull io_uring fixes from Jens Axboe:
      
       - A fix for block backed reissue (me)
      
       - Reissue context hardening (me)
      
       - Async link locking fix (Pavel)
      
      * tag 'io_uring-5.14-2021-07-30' of git://git.kernel.dk/linux-block:
        io_uring: fix poll requests leaking second poll entries
        io_uring: don't block level reissue off completion path
        io_uring: always reissue from task_work context
        io_uring: fix race in unified task_work running
        io_uring: fix io_prep_async_link locking
      27eb687b
    • Linus Torvalds's avatar
      Merge tag 'libata-5.14-2021-07-30' of git://git.kernel.dk/linux-block · f6c5971b
      Linus Torvalds authored
      Pull libata fixlets from Jens Axboe:
      
       - A fix for PIO highmem (Christoph)
      
       - Kill HAVE_IDE as it's now unused (Lukas)
      
      * tag 'libata-5.14-2021-07-30' of git://git.kernel.dk/linux-block:
        arch: Kconfig: clean up obsolete use of HAVE_IDE
        libata: fix ata_pio_sector for CONFIG_HIGHMEM
      f6c5971b
    • Linus Torvalds's avatar
      Merge tag 'for-5.14-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux · 051df241
      Linus Torvalds authored
      Pull btrfs fixes from David Sterba:
      
       - fix -Warray-bounds warning, to help external patchset to make it
         default treewide
      
       - fix writeable device accounting (syzbot report)
      
       - fix fsync and log replay after a rename and inode eviction
      
       - fix potentially lost error code when submitting multiple bios for
         compressed range
      
      * tag 'for-5.14-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
        btrfs: calculate number of eb pages properly in csum_tree_block
        btrfs: fix rw device counting in __btrfs_free_extra_devids
        btrfs: fix lost inode on log replay after mix of fsync, rename and inode eviction
        btrfs: mark compressed range uptodate only if all bio succeed
      051df241
    • Linus Torvalds's avatar
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid · 8723bc8f
      Linus Torvalds authored
      Pull HID fixes from Jiri Kosina:
      
       - resume timing fix for intel-ish driver (Ye Xiang)
      
       - fix for using incorrect MMIO register in amd_sfh driver (Dylan
         MacKenzie)
      
       - Cintiq 24HDT / 27QHDT regression fix and touch processing fix for
         Wacom driver (Jason Gerecke)
      
       - device removal bugfix for ft260 driver (Michael Zaidman)
      
       - other small assorted fixes
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid:
        HID: ft260: fix device removal due to USB disconnect
        HID: wacom: Skip processing of touches with negative slot values
        HID: wacom: Re-enable touch by default for Cintiq 24HDT / 27QHDT
        HID: Kconfig: Fix spelling mistake "Uninterruptable" -> "Uninterruptible"
        HID: apple: Add support for Keychron K1 wireless keyboard
        HID: fix typo in Kconfig
        HID: ft260: fix format type warning in ft260_word_show()
        HID: amd_sfh: Use correct MMIO register for DMA address
        HID: asus: Remove check for same LED brightness on set
        HID: intel-ish-hid: use async resume function
      8723bc8f
    • Linus Torvalds's avatar
      Merge branch 'akpm' (patches from Andrew) · ad6ec09d
      Linus Torvalds authored
      Merge misc fixes from Andrew Morton:
       "7 patches.
      
        Subsystems affected by this patch series: lib, ocfs2, and mm (slub,
        migration, and memcg)"
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>:
        mm/memcg: fix NULL pointer dereference in memcg_slab_free_hook()
        slub: fix unreclaimable slab stat for bulk free
        mm/migrate: fix NR_ISOLATED corruption on 64-bit
        mm: memcontrol: fix blocking rstat function called from atomic cgroup1 thresholding code
        ocfs2: issue zeroout to EOF blocks
        ocfs2: fix zero out valid data
        lib/test_string.c: move string selftest in the Runtime Testing menu
      ad6ec09d
    • Jakub Kicinski's avatar
      Merge tag 'linux-can-fixes-for-5.14-20210730' of... · 8d670412
      Jakub Kicinski authored
      Merge tag 'linux-can-fixes-for-5.14-20210730' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can
      
      Marc Kleine-Budde says:
      
      ====================
      pull-request: can 2021-07-30
      
      The first patch is by me and adds Yasushi SHOJI as a reviewer for the
      Microchip CAN BUS Analyzer Tool driver.
      
      Dan Carpenter's patch fixes a signedness bug in the hi311x driver.
      
      Pavel Skripkin provides 4 patches, the first targets the mcba_usb
      driver by adding the missing urb->transfer_dma initialization, which
      was broken in a previous commit. The last 3 patches fix a memory leak
      in the usb_8dev, ems_usb and esd_usb2 drivers.
      
      * tag 'linux-can-fixes-for-5.14-20210730' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can:
        can: esd_usb2: fix memory leak
        can: ems_usb: fix memory leak
        can: usb_8dev: fix memory leak
        can: mcba_usb_start(): add missing urb->transfer_dma initialization
        can: hi311x: fix a signedness bug in hi3110_cmd()
        MAINTAINERS: add Yasushi SHOJI as reviewer for the Microchip CAN BUS Analyzer Tool driver
      ====================
      
      Link: https://lore.kernel.org/r/20210730070526.1699867-1-mkl@pengutronix.de
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      8d670412
    • Wang Hai's avatar
      mm/memcg: fix NULL pointer dereference in memcg_slab_free_hook() · 121dffe2
      Wang Hai authored
      When I use kfree_rcu() to free a large memory region allocated by
      kmalloc_node(), the following dump occurs.
      
        BUG: kernel NULL pointer dereference, address: 0000000000000020
        [...]
        Oops: 0000 [#1] SMP
        [...]
        Workqueue: events kfree_rcu_work
        RIP: 0010:__obj_to_index include/linux/slub_def.h:182 [inline]
        RIP: 0010:obj_to_index include/linux/slub_def.h:191 [inline]
        RIP: 0010:memcg_slab_free_hook+0x120/0x260 mm/slab.h:363
        [...]
        Call Trace:
          kmem_cache_free_bulk+0x58/0x630 mm/slub.c:3293
          kfree_bulk include/linux/slab.h:413 [inline]
          kfree_rcu_work+0x1ab/0x200 kernel/rcu/tree.c:3300
          process_one_work+0x207/0x530 kernel/workqueue.c:2276
          worker_thread+0x320/0x610 kernel/workqueue.c:2422
          kthread+0x13d/0x160 kernel/kthread.c:313
          ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294
      
      When kmalloc_node() is asked for a large amount of memory, a page is
      allocated rather than a slab object, so when this memory is freed via
      kfree_rcu() it should not be processed by memcg_slab_free_hook(),
      because memcg_slab_free_hook() is meant for slab objects only.
      
      Use page_objcgs_check() instead of page_objcgs() in
      memcg_slab_free_hook() to fix this bug.
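      
      A rough userspace sketch of why a checked accessor helps. It assumes,
      as the helper names suggest, that the checked variant returns NULL when
      the page's memcg data is not tagged as an object-cgroup vector. The
      flag values and the toy_* functions are hypothetical and do not
      reproduce the kernel's definitions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tag bits stored in the low bits of a per-page data word. */
#define TOY_DATA_OBJCGS 0x1UL  /* word points to an object-cgroup vector */
#define TOY_DATA_KMEM   0x2UL  /* word describes a non-slab kernel page */
#define TOY_FLAGS_MASK  0x3UL

/* Unchecked accessor: strips the tag bits and trusts the caller. */
static void *toy_objcgs(uintptr_t memcg_data)
{
	return (void *)(memcg_data & ~TOY_FLAGS_MASK);
}

/* Checked accessor: only returns a vector when the word is tagged as one. */
static void *toy_objcgs_check(uintptr_t memcg_data)
{
	if (!(memcg_data & TOY_DATA_OBJCGS))
		return NULL;
	return (void *)(memcg_data & ~TOY_FLAGS_MASK);
}
```

      In this model, a free hook that skips pages for which the accessor
      returns NULL would only skip large non-slab allocations when it uses
      the checked variant.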
      
      Link: https://lkml.kernel.org/r/20210728145655.274476-1-wanghai38@huawei.com
      Fixes: 270c6a71 ("mm: memcontrol/slab: Use helpers to access slab page's memcg_data")
      Signed-off-by: Wang Hai <wanghai38@huawei.com>
      Reviewed-by: Shakeel Butt <shakeelb@google.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Roman Gushchin <guro@fb.com>
      Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
      Reviewed-by: Muchun Song <songmuchun@bytedance.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      121dffe2
    • Shakeel Butt's avatar
      slub: fix unreclaimable slab stat for bulk free · f227f0fa
      Shakeel Butt authored
      SLUB uses the page allocator for higher-order allocations and updates the
      unreclaimable slab stat for such allocations.  At the moment, the bulk
      free path for SLUB does not share code with the normal free path for
      these allocations and so misses the stat update.  Fix the stat
      update by using common code.  The user-visible impact of the bug is a
      potentially inconsistent unreclaimable slab stat visible through
      meminfo and vmstat.
      
      Link: https://lkml.kernel.org/r/20210728155354.3440560-1-shakeelb@google.com
      Fixes: 6a486c0a ("mm, sl[ou]b: improve memory accounting")
      Signed-off-by: Shakeel Butt <shakeelb@google.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Roman Gushchin <guro@fb.com>
      Reviewed-by: Muchun Song <songmuchun@bytedance.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      f227f0fa
    • Aneesh Kumar K.V's avatar
      mm/migrate: fix NR_ISOLATED corruption on 64-bit · b5916c02
      Aneesh Kumar K.V authored
      Similar to commit 2da9f630 ("mm/vmscan: fix NR_ISOLATED_FILE
      corruption on 64-bit"), avoid using unsigned int for nr_pages.  With
      an unsigned int type, the negated page count wraps to a large unsigned
      int, which then converts to a large positive signed long.
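      
      The conversion can be reproduced in plain C. This sketch assumes a
      32-bit unsigned int and uses int64_t in place of the kernel's long on
      64-bit; the toy_* helpers are hypothetical and only mimic a counter
      that takes a signed delta.

```c
#include <stdint.h>

/* Hypothetical counter update taking a signed 64-bit delta. */
static int64_t toy_counter_add(int64_t counter, int64_t delta)
{
	return counter + delta;
}

/*
 * With an unsigned count, -nr_pages is evaluated in unsigned int, wraps to
 * 2^32 - nr_pages, and then widens to a large positive 64-bit value.
 */
static int64_t toy_unisolate_unsigned(int64_t counter, unsigned int nr_pages)
{
	return toy_counter_add(counter, -nr_pages);
}

/* With a signed count, -nr_pages is an ordinary negative number. */
static int64_t toy_unisolate_signed(int64_t counter, int nr_pages)
{
	return toy_counter_add(counter, -nr_pages);
}
```

      Starting from a counter of 5 and removing 5 pages, the signed variant
      returns 0 while the unsigned variant returns 4294967296, so the
      counter grows instead of dropping back to zero.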
      
      Symptoms include CMA allocations hanging forever due to
      alloc_contig_range->...->isolate_migratepages_block waiting forever in
      "while (unlikely(too_many_isolated(pgdat)))".
      
      Link: https://lkml.kernel.org/r/20210728042531.359409-1-aneesh.kumar@linux.ibm.com
      Fixes: c5fc5c3a ("mm: migrate: account THP NUMA migration counters correctly")
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Reported-by: Michael Ellerman <mpe@ellerman.id.au>
      Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Reviewed-by: Yang Shi <shy828301@gmail.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: David Hildenbrand <david@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      b5916c02
    • Johannes Weiner's avatar
      mm: memcontrol: fix blocking rstat function called from atomic cgroup1 thresholding code · 30def935
      Johannes Weiner authored
      Dan Carpenter reports:
      
          The patch 2d146aa3: "mm: memcontrol: switch to rstat" from Apr
          29, 2021, leads to the following static checker warning:
      
      	    kernel/cgroup/rstat.c:200 cgroup_rstat_flush()
      	    warn: sleeping in atomic context
      
          mm/memcontrol.c
            3572  static unsigned long mem_cgroup_usage(struct mem_cgroup *memcg, bool swap)
            3573  {
            3574          unsigned long val;
            3575
            3576          if (mem_cgroup_is_root(memcg)) {
            3577                  cgroup_rstat_flush(memcg->css.cgroup);
      			    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      
          This is from static analysis and potentially a false positive.  The
          problem is that mem_cgroup_usage() is called from __mem_cgroup_threshold()
          which holds an rcu_read_lock().  And the cgroup_rstat_flush() function
          can sleep.
      
            3578                  val = memcg_page_state(memcg, NR_FILE_PAGES) +
            3579                          memcg_page_state(memcg, NR_ANON_MAPPED);
            3580                  if (swap)
            3581                          val += memcg_page_state(memcg, MEMCG_SWAP);
            3582          } else {
            3583                  if (!swap)
            3584                          val = page_counter_read(&memcg->memory);
            3585                  else
            3586                          val = page_counter_read(&memcg->memsw);
            3587          }
            3588          return val;
            3589  }
      
      __mem_cgroup_threshold() indeed holds the rcu lock.  In addition, the
      thresholding code is invoked during stat changes, and those contexts
      have irqs disabled as well.  If the lock breaking occurs inside the
      flush function, it will result in a sleep from an atomic context.
      
      Use the irqsafe flushing variant in mem_cgroup_usage() to fix this.
      
      Link: https://lkml.kernel.org/r/20210726150019.251820-1-hannes@cmpxchg.org
      Fixes: 2d146aa3 ("mm: memcontrol: switch to rstat")
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
      Acked-by: Chris Down <chris@chrisdown.name>
      Reviewed-by: Rik van Riel <riel@surriel.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Reviewed-by: Shakeel Butt <shakeelb@google.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      30def935
    • Junxiao Bi's avatar
      ocfs2: issue zeroout to EOF blocks · 9449ad33
      Junxiao Bi authored
      When punching holes in EOF blocks, fallocate used buffered writes to zero
      the EOF blocks in the last cluster.  But since ->writepage ignores EOF
      pages, those zeros are never flushed.
      
      This "looks" ok, as commit 6bba4471 ("ocfs2: fix data corruption by
      fallocate") zeroes the EOF blocks when extending the file size, but it
      isn't.  The problem lies in those EOF pages: before writeback, the
      pages had the DIRTY flag set and all buffer_heads in them also had the
      DIRTY flag set.  When writeback was run by write_cache_pages(), the DIRTY
      flag on the page was cleared, but the DIRTY flag on the buffer_heads was
      not.
      
      When the next write hit those EOF pages, the buffer_heads already
      had the DIRTY flag set, so the page was not marked DIRTY again.  That made
      writeback ignore them forever, which causes data corruption.  Even a
      direct I/O write can't work, because it fails when trying to drop the
      page cache before the direct I/O: it finds that the buffer_heads for
      those pages still have the DIRTY flag set and falls back to buffered I/O
      mode.
      
      To summarize the issue: as writeback ignores EOF pages, once any
      EOF page is generated, any write to it only goes to the page cache and
      is never flushed to disk, even after the file size is extended and the
      page is no longer an EOF page.  The fix is to avoid zeroing EOF blocks
      with buffered writes.
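      
      The flag interaction described above can be modeled in a few lines of
      userspace C. This is only an illustrative sketch: struct toy_page and
      the toy_* functions are hypothetical, a single buffer stands in for all
      buffer_heads of a page, and the rules encoded here are just the ones
      stated in this commit message.

```c
#include <stdbool.h>

/* Hypothetical model of one page with a single buffer attached. */
struct toy_page {
	bool page_dirty;    /* DIRTY flag on the page */
	bool buffer_dirty;  /* DIRTY flag on the buffer_head */
	bool beyond_eof;    /* page currently lies past end of file */
	bool on_disk;       /* latest contents reached the disk */
};

/* Buffered write: the page is re-dirtied only if the buffer was clean. */
static void toy_buffered_write(struct toy_page *p)
{
	if (!p->buffer_dirty) {
		p->buffer_dirty = true;
		p->page_dirty = true;
	}
	p->on_disk = false;
}

/* Writeback: clears the page flag, but skips EOF pages without cleaning the buffer. */
static void toy_writeback(struct toy_page *p)
{
	if (!p->page_dirty)
		return;
	p->page_dirty = false;
	if (p->beyond_eof)
		return;  /* EOF pages are ignored: nothing is written */
	p->buffer_dirty = false;
	p->on_disk = true;
}
```

      After one buffered write and one writeback on an EOF page, the page
      is clean but its buffer stays dirty. Later writes no longer re-dirty
      the page, so in this model the data never reaches the disk even once
      the file has been extended.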
      
      The following code snippet from qemu-img could trigger the corruption.
      
        656   open("6b3711ae-3306-4bdd-823c-cf1c0060a095.conv.2", O_RDWR|O_DIRECT|O_CLOEXEC) = 11
        ...
        660   fallocate(11, FALLOC_FL_KEEP_SIZE|FALLOC_FL_PUNCH_HOLE, 2275868672, 327680 <unfinished ...>
        660   fallocate(11, 0, 2275868672, 327680) = 0
        658   pwrite64(11, "
      
      Link: https://lkml.kernel.org/r/20210722054923.24389-2-junxiao.bi@oracle.com
      Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
      Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
      Cc: Mark Fasheh <mark@fasheh.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Changwei Ge <gechangwei@live.cn>
      Cc: Gang He <ghe@suse.com>
      Cc: Jun Piao <piaojun@huawei.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      9449ad33