1. 14 Oct, 2022 10 commits
    • Rob Herring's avatar
      perf: Skip and warn on unknown format 'configN' attrs · e552b7be
      Rob Herring authored
      If the kernel exposes a new perf_event_attr field in a format attr, perf
      will return an error stating the specified PMU can't be found. For
      example, a format attr with 'config3:0-63' causes an error as config3 is
      unknown to perf. This causes a compatibility issue between a newer
      kernel with older perf tool.
      
      Before this change with a kernel adding 'config3' I get:
      
        $ perf record -e arm_spe// -- true
        event syntax error: 'arm_spe//'
                             \___ Cannot find PMU `arm_spe'. Missing kernel support?
        Run 'perf list' for a list of valid events
      
         Usage: perf record [<options>] [<command>]
            or: perf record [<options>] -- <command> [<options>]
      
            -e, --event <event>   event selector. use 'perf list' to list
        available events
      
      After this change, I get:
      
        $ perf record -e arm_spe// -- true
        WARNING: 'arm_spe_0' format 'inv_event_filter' requires 'perf_event_attr::config3' which is not supported by this version of perf!
        [ perf record: Woken up 2 times to write data ]
        [ perf record: Captured and wrote 0.091 MB perf.data ]
      
      To support unknown configN formats, rework the YACC implementation to
      pass any config[0-9]+ format to perf_pmu__new_format() to handle with a
      warning.
      Reviewed-by: default avatarNamhyung Kim <namhyung@kernel.org>
      Signed-off-by: default avatarRob Herring <robh@kernel.org>
      Tested-by: default avatarLeo Yan <leo.yan@linaro.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: stable@vger.kernel.org
      Link: https://lore.kernel.org/r/20220914-arm-perf-tool-spe1-2-v2-v4-1-83c098e6212e@kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      e552b7be
    • Andi Kleen's avatar
      perf list: Fix metricgroups title message · 0cef141e
      Andi Kleen authored
        $ perf list metricgroups
      
      gives
      
        List of pre-defined events (to be used in -e):
      
        Metric Groups:
      
        Backend
        Bad
        BadSpec
      
      But that's incorrect of course because metric groups or metrics can only
      be specified with -M. So fix the message to say -e or -M
      Signed-off-by: default avatarAndi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Link: https://lore.kernel.org/r/20221004192634.998984-1-ak@linux.intel.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      0cef141e
    • Namhyung Kim's avatar
      perf mem: Fix -C option behavior for perf mem record · 7d60fa2c
      Namhyung Kim authored
      The -C/--cpu option was maily for report but it also affected record as
      it ate the option.  So users needed to use "--" after perf mem record to
      pass the info to the perf record properly.
      
      Check if this option is set for record, and pass it to the actual perf
      record.
      
      Before)
        $ sudo perf --debug perf-event-open mem record -C 0 2>&1 | grep -a sys_perf_event_open
        ...
        sys_perf_event_open: pid -1  cpu 0  group_fd -1  flags 0x8 = 4
        sys_perf_event_open: pid -1  cpu 1  group_fd -1  flags 0x8 = 5
        sys_perf_event_open: pid -1  cpu 2  group_fd -1  flags 0x8 = 6
        sys_perf_event_open: pid -1  cpu 3  group_fd -1  flags 0x8 = 7
        sys_perf_event_open: pid -1  cpu 0  group_fd -1  flags 0x8 = 8
        sys_perf_event_open: pid -1  cpu 1  group_fd -1  flags 0x8 = 9
        sys_perf_event_open: pid -1  cpu 2  group_fd -1  flags 0x8 = 10
        sys_perf_event_open: pid -1  cpu 3  group_fd -1  flags 0x8 = 11
        ...
      
      After)
        $ sudo perf --debug perf-event-open mem record -C 0 2>&1 | grep -a sys_perf_event_open
        ...
        sys_perf_event_open: pid -1  cpu 0  group_fd -1  flags 0x8 = 4
        sys_perf_event_open: pid -1  cpu 0  group_fd -1  flags 0x8 = 5
        sys_perf_event_open: pid -1  cpu 0  group_fd -1  flags 0x8 = 6
        sys_perf_event_open: pid -1  cpu 0  group_fd -1  flags 0x8 = 7
      Reported-by: default avatarRavi Bangoria <ravi.bangoria@amd.com>
      Reviewed-by: default avatarLeo Yan <leo.yan@linaro.org>
      Signed-off-by: default avatarNamhyung Kim <namhyung@kernel.org>
      Tested-by: default avatarLeo Yan <leo.yan@linaro.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20221004200211.1444521-1-namhyung@kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      7d60fa2c
    • Namhyung Kim's avatar
      perf annotate: Add missing condition flags for arm64 · 531778b1
      Namhyung Kim authored
      According to the document [1], it can also have 'hs', 'lo', 'vc', 'vs' as a
      condition code.  Let's add them too.
      
      [1] https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/condition-codes-1-condition-flags-and-codesReported-by: default avatarKevin Nomura <nomurak@google.com>
      Signed-off-by: default avatarNamhyung Kim <namhyung@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Will Deacon <will@kernel.org>
      Cc: linux-arm-kernel@lists.infradead.org
      Link: https://lore.kernel.org/r/20221006222232.266416-1-namhyung@kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      531778b1
    • Arnaldo Carvalho de Melo's avatar
      libperf: Do not include non-UAPI linux/compiler.h header · d5e57375
      Arnaldo Carvalho de Melo authored
      Its just for that __packed define, so use it expanded as __attribute__((packed)),
      like the other files in /usr/include do.
      
      This was problem was preventing building the libperf examples on ALT
      Linux and Fedora 35, fix it.
      Reported-by: default avatarVitaly Chikunov <vt@altlinux.org>
      Acked-by: default avatarIan Rogers <irogers@google.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Dmitry Levin <ldv@altlinux.org
      Cc: Ian Rogers <irogers@google.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: http://lore.kernel.org/lkml/Y0lnpl2Ix7VljVDc@kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      d5e57375
    • James Clark's avatar
      perf test: Fix test_arm_coresight.sh failures on Juno · fe180a52
      James Clark authored
      This test commonly fails on Arm Juno because the instruction interval
      is large enough to miss generating any samples for Perf in system-wide
      mode.
      
      Fix this by lowering the interval until a comfortable number of Perf
      instructions are generated. The test is still quick to run because only
      a small amount of trace is gathered.
      
      Before:
      
        sudo ./perf test coresight -vvv
        ...
        Recording trace with system wide mode
        Looking at perf.data file for dumping branch samples:
        Looking at perf.data file for reporting branch samples:
        Looking at perf.data file for instruction samples:
        CoreSight system wide testing: FAIL
        ...
      
      After:
      
        sudo ./perf test coresight -vvv
        ...
        Recording trace with system wide mode
        Looking at perf.data file for dumping branch samples:
        Looking at perf.data file for reporting branch samples:
        Looking at perf.data file for instruction samples:
        CoreSight system wide testing: PASS
        ...
      Reviewed-by: default avatarLeo Yan <leo.yan@linaro.org>
      Signed-off-by: default avatarJames Clark <james.clark@arm.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
      Cc: coresight@lists.linaro.org
      Link: https://lore.kernel.org/r/20221005140508.1537277-1-james.clark@arm.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      fe180a52
    • Namhyung Kim's avatar
      perf stat: Support old kernels for bperf cgroup counting · f21cb520
      Namhyung Kim authored
      The recent change in the cgroup will break the backward compatiblity in
      the BPF program.  It should support both old and new kernels using BPF
      CO-RE technique.
      
      Like the task_struct->__state handling in the offcpu analysis, we can
      check the field name in the cgroup struct.
      Signed-off-by: default avatarNamhyung Kim <namhyung@kernel.org>
      Acked-by: default avatarAndrii Nakryiko <andrii@kernel.org>
      Acked-by: default avatarJiri Olsa <jolsa@kernel.org>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Song Liu <songliubraving@fb.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: bpf@vger.kernel.org
      Cc: cgroups@vger.kernel.org
      Cc: zefan li <lizefan.x@bytedance.com>
      Link: http://lore.kernel.org/lkml/20221011052808.282394-1-namhyung@kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      f21cb520
    • Linus Torvalds's avatar
      Merge tag 'drm-next-2022-10-14' of git://anongit.freedesktop.org/drm/drm · 9c9155a3
      Linus Torvalds authored
      Pull more drm updates from Dave Airlie:
       "Round of fixes for the merge window stuff, bunch of amdgpu and i915
        changes, this should have the gcc11 warning fix, amongst other
        changes.
      
        amdgpu:
         - DC mutex fix
         - DC SubVP fixes
         - DCN 3.2.x fixes
         - DCN 3.1.x fixes
         - SDMA 6.x fixes
         - Enable DPIA for 3.1.4
         - VRR fixes
         - VRAM BO swapping fix
         - Revert dirty fb helper change
         - SR-IOV suspend/resume fixes
         - Work around GCC array bounds check fail warning
         - UMC 8.10 fixes
         - Misc fixes and cleanups
      
        i915:
         - Round to closest in g4x+ HDMI clock readout
         - Update MOCS table for EHL
         - Fix PSR_IMR/IIR field handling
         - Fix watermark calculations for gen12+/DG2 modifiers
         - Reject excessive dotclocks early
         - Fix revocation of non-persistent contexts
         - Handle migration for dpt
         - Fix display problems after resume
         - Allow control over the flags when migrating
         - Consider DG2_RC_CCS_CC when migrating buffers"
      
      * tag 'drm-next-2022-10-14' of git://anongit.freedesktop.org/drm/drm: (110 commits)
        drm/amd/display: Add HUBP surface flip interrupt handler
        drm/i915/display: consider DG2_RC_CCS_CC when migrating buffers
        drm/i915: allow control over the flags when migrating
        drm/amd/display: Simplify bool conversion
        drm/amd/display: fix transfer function passed to build_coefficients()
        drm/amd/display: add a license to cursor_reg_cache.h
        drm/amd/display: make virtual_disable_link_output static
        drm/amd/display: fix indentation in dc.c
        drm/amd/display: make dcn32_split_stream_for_mpc_or_odm static
        drm/amd/display: fix build error on arm64
        drm/amd/display: 3.2.207
        drm/amd/display: Clean some DCN32 macros
        drm/amdgpu: Add poison mode query for umc v8_10_0
        drm/amdgpu: Update umc v8_10_0 headers
        drm/amdgpu: fix coding style issue for mca notifier
        drm/amdgpu: define convert_error_address for umc v8.7
        drm/amdgpu: define RAS convert_error_address API
        drm/amdgpu: remove check for CE in RAS error address query
        drm/i915: Fix display problems after resume
        drm/amd/display: fix array-bounds error in dc_stream_remove_writeback() [take 2]
        ...
      9c9155a3
    • Linus Torvalds's avatar
      Merge tag 'block-6.1-2022-10-13' of git://git.kernel.dk/linux · a521fc3c
      Linus Torvalds authored
      Pull more block updates from Jens Axboe:
       "Fixes that ended up landing later than the initial block pull request.
        Nothing really major in here:
      
         - NVMe pull request via Christoph:
              - add NVME_QUIRK_BOGUS_NID for Lexar NM760 (Abhijit)
              - add NVME_QUIRK_NO_DEEPEST_PS to avoid the deepest sleep state
                on ZHITAI TiPro5000 SSDs (Xi Ruoyao)
              - fix possible hang caused during ctrl deletion (Sagi Grimberg)
              - fix possible hang in live ns resize with ANA access (Sagi
                Grimberg)
      
         - Proactively avoid a sign extension issue with the queue flags
           (Brian)
      
         - Regression fix for hidden disks (Christoph)
      
         - Update OPAL maintainers entry (Jonathan)
      
         - blk-wbt regression initialization fix (Yu)"
      
      * tag 'block-6.1-2022-10-13' of git://git.kernel.dk/linux:
        nvme-multipath: fix possible hang in live ns resize with ANA access
        nvme-pci: avoid the deepest sleep state on ZHITAI TiPro5000 SSDs
        nvme-pci: add NVME_QUIRK_BOGUS_NID for Lexar NM760
        nvme-tcp: fix possible hang caused during ctrl deletion
        nvme-rdma: fix possible hang caused during ctrl deletion
        block: fix leaking minors of hidden disks
        block: avoid sign extend problem with default queue flags mask
        blk-wbt: fix that 'rwb->wc' is always set to 1 in wbt_init()
        block: Remove the repeat word 'can'
        MAINTAINERS: Update SED-Opal Maintainers
      a521fc3c
    • Linus Torvalds's avatar
      Merge tag 'io_uring-6.1-2022-10-13' of git://git.kernel.dk/linux · c98c70ed
      Linus Torvalds authored
      Pull more io_uring updates from Jens Axboe:
       "A collection of fixes that ended up either being later than the
        initial pull, or dependent on multiple branches (6.0-late being one of
        them) and hence deferred purposely. This contains:
      
         - Cleanup fixes for the single submitter late 6.0 change, which we
           pushed to 6.1 to keep the 6.0 changes small (Dylan, Pavel)
      
         - Fix for IORING_OP_CONNECT not handling -EINPROGRESS correctly (me)
      
         - Ensure that the zc sendmsg variant gets audited correctly (me)
      
         - Regression fix from this merge window where kiocb_end_write()
           doesn't always gets called, which can cause issues with fs freezing
           (me)
      
         - Registered files SCM handling fix (Pavel)
      
         - Regression fix for big sqe dumping in fdinfo (Pavel)
      
         - Registered buffers accounting fix (Pavel)
      
         - Remove leftover notification structures, we killed them off late in
           6.0 (Pavel)
      
         - Minor optimizations (Pavel)
      
         - Cosmetic variable shadowing fix (Stefan)"
      
      * tag 'io_uring-6.1-2022-10-13' of git://git.kernel.dk/linux:
        io_uring/rw: ensure kiocb_end_write() is always called
        io_uring: fix fdinfo sqe offsets calculation
        io_uring: local variable rw shadows outer variable in io_write
        io_uring/opdef: remove 'audit_skip' from SENDMSG_ZC
        io_uring: optimise locking for local tw with submit_wait
        io_uring: remove redundant memory barrier in io_req_local_work_add
        io_uring/net: handle -EINPROGRESS correct for IORING_OP_CONNECT
        io_uring: remove notif leftovers
        io_uring: correct pinned_vm accounting
        io_uring/af_unix: defer registered files gc to io_uring release
        io_uring: limit registration w/ SINGLE_ISSUER
        io_uring: remove io_register_submitter
        io_uring: simplify __io_uring_add_tctx_node
      c98c70ed
  2. 13 Oct, 2022 30 commits
    • Dave Airlie's avatar
      Merge tag 'amd-drm-fixes-6.1-2022-10-12' of... · fc3523a8
      Dave Airlie authored
      Merge tag 'amd-drm-fixes-6.1-2022-10-12' of https://gitlab.freedesktop.org/agd5f/linux into drm-next
      
      amd-drm-fixes-6.1-2022-10-12:
      
      amdgpu:
      - DC mutex fix
      - DC SubVP fixes
      - DCN 3.2.x fixes
      - DCN 3.1.x fixes
      - SDMA 6.x fixes
      - Enable DPIA for 3.1.4
      - VRR fixes
      - VRAM BO swapping fix
      - Revert dirty fb helper change
      - SR-IOV suspend/resume fixes
      - Work around GCC array bounds check fail warning
      - UMC 8.10 fixes
      - Misc fixes and cleanups
      Signed-off-by: default avatarDave Airlie <airlied@redhat.com>
      From: Alex Deucher <alexander.deucher@amd.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20221012162650.8810-1-alexander.deucher@amd.com
      fc3523a8
    • Dave Airlie's avatar
      Merge tag 'drm-intel-next-fixes-2022-10-13' of... · e55978a4
      Dave Airlie authored
      Merge tag 'drm-intel-next-fixes-2022-10-13' of git://anongit.freedesktop.org/drm/drm-intel into drm-next
      
      - Fix revocation of non-persistent contexts (Tvrtko Ursulin)
      - Handle migration for dpt (Matthew Auld)
      - Fix display problems after resume (Thomas Hellström)
      - Allow control over the flags when migrating (Matthew Auld)
      - Consider DG2_RC_CCS_CC when migrating buffers (Matthew Auld)
      Signed-off-by: default avatarDave Airlie <airlied@redhat.com>
      From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/Y0gK9QmCmktLLzqp@tursulin-desk
      e55978a4
    • Linus Torvalds's avatar
      Merge tag 'devicetree-fixes-for-6.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux · 6d84c258
      Linus Torvalds authored
      Pull devicetree fixes from Rob Herring:
      
       - Fixes for Mediatek MT6370 binding
      
       - Merge the DT overlay maintainer entry to the main entry as Pantelis
         is not active and Frank is taking a step back
      
      * tag 'devicetree-fixes-for-6.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
        MAINTAINERS: of: collapse overlay entry into main device tree entry
        dt-bindings: mfd: mt6370: fix the interrupt order of the charger in the example
        dt-bindings: leds: mt6370: Fix MT6370 LED indicator DT warning
      6d84c258
    • Linus Torvalds's avatar
      Merge tag 'mmc-v6.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc · 09817941
      Linus Torvalds authored
      Pull MMC fixes from Ulf Hansson:
       "MMC core:
         - Add SD card quirk for broken discard
      
        MMC host:
         - renesas_sdhi: Fix clock rounding errors
         - sdhci-sprd: Fix minimum clock limit to detect cards
         - sdhci-tegra: Use actual clock rate for SW tuning correction"
      
      * tag 'mmc-v6.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
        mmc: sdhci-sprd: Fix minimum clock limit
        mmc: sdhci-tegra: Use actual clock rate for SW tuning correction
        mmc: renesas_sdhi: Fix rounding errors
        mmc: core: Add SD card quirk for broken discard
      09817941
    • Linus Torvalds's avatar
      Merge tag 'docs-6.1-2' of git://git.lwn.net/linux · f2b220ef
      Linus Torvalds authored
      Pull documentation fixes from Jonathan Corbet:
       "A handful of relatively simple documentation fixes, plus a set of
        patches catching the Chinese translation up with the front-page
        rework"
      
      * tag 'docs-6.1-2' of git://git.lwn.net/linux:
        Documentation: rtla: Correct command line example
        docs/zh_CN: add a man-pages link to zh_CN/index.rst
        docs/zh_CN: Rewrite the Chinese translation front page
        docs/zh_CN: add zh_CN/arch.rst
        docs/zh_CN: promote the title of zh_CN/process/index.rst
        docs/zh_CN: Update the translation of page_owner to 6.0-rc7
        docs/zh_CN: Update the translation of ksm to 6.0-rc7
        docs/howto: Replace abundoned URL of gmane.org
        Documentation: ubifs: Fix compression idiom
        Documentation/mm/page_owner.rst: delete frequently changing experimental data
        docs/zh_CN: Fix build warning
        docs: ftrace: Correct access mode
      f2b220ef
    • Linus Torvalds's avatar
      Merge tag 'net-6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 66ae0436
      Linus Torvalds authored
      Pull networking fixes from Jakub Kicinski:
       "Including fixes from netfilter, and wifi.
      
      Current release - regressions:
      
         - Revert "net/sched: taprio: make qdisc_leaf() see the
           per-netdev-queue pfifo child qdiscs", it may cause crashes when the
           qdisc is reconfigured
      
         - inet: ping: fix splat due to packet allocation refactoring in inet
      
         - tcp: clean up kernel listener's reqsk in inet_twsk_purge(), fix UAF
           due to races when per-netns hash table is used
      
        Current release - new code bugs:
      
         - eth: adin1110: check in netdev_event that netdev belongs to driver
      
         - fixes for PTR_ERR() vs NULL bugs in driver code, from Dan and co.
      
        Previous releases - regressions:
      
         - ipv4: handle attempt to delete multipath route when fib_info
           contains an nh reference, avoid oob access
      
         - wifi: fix handful of bugs in the new Multi-BSSID code
      
         - wifi: mt76: fix rate reporting / throughput regression on mt7915
           and newer, fix checksum offload
      
         - wifi: iwlwifi: mvm: fix double list_add at
           iwl_mvm_mac_wake_tx_queue (other cases)
      
         - wifi: mac80211: do not drop packets smaller than the LLC-SNAP
           header on fast-rx
      
        Previous releases - always broken:
      
         - ieee802154: don't warn zero-sized raw_sendmsg()
      
         - ipv6: ping: fix wrong checksum for large frames
      
         - mctp: prevent double key removal and unref
      
         - tcp/udp: fix memory leaks and races around IPV6_ADDRFORM
      
         - hv_netvsc: fix race between VF offering and VF association message
      
        Misc:
      
         - remove -Warray-bounds silencing in the drivers, compilers fixed"
      
      * tag 'net-6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (73 commits)
        sunhme: fix an IS_ERR() vs NULL check in probe
        net: marvell: prestera: fix a couple NULL vs IS_ERR() checks
        kcm: avoid potential race in kcm_tx_work
        tcp: Clean up kernel listener's reqsk in inet_twsk_purge()
        net: phy: micrel: Fixes FIELD_GET assertion
        openvswitch: add nf_ct_is_confirmed check before assigning the helper
        tcp: Fix data races around icsk->icsk_af_ops.
        ipv6: Fix data races around sk->sk_prot.
        tcp/udp: Call inet6_destroy_sock() in IPv6 sk->sk_destruct().
        udp: Call inet6_destroy_sock() in setsockopt(IPV6_ADDRFORM).
        tcp/udp: Fix memory leak in ipv6_renew_options().
        mctp: prevent double key removal and unref
        selftests: netfilter: Fix nft_fib.sh for all.rp_filter=1
        netfilter: rpfilter/fib: Populate flowic_l3mdev field
        selftests: netfilter: Test reverse path filtering
        net/mlx5: Make ASO poll CQ usable in atomic context
        tcp: cdg: allow tcp_cdg_release() to be called multiple times
        inet: ping: fix recent breakage
        ipv6: ping: fix wrong checksum for large frames
        net: ethernet: ti: am65-cpsw: set correct devlink flavour for unused ports
        ...
      66ae0436
    • Linus Torvalds's avatar
      Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost · d6f04f26
      Linus Torvalds authored
      Pull virtio fixes from Michael Tsirkin:
      
       - Fix a regression in virtio pci on power
      
       - Add a reviewer for ifcvf
      
      * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
        vdpa/ifcvf: add reviewer
        virtio_pci: use irq to detect interrupt support
      d6f04f26
    • Linus Torvalds's avatar
      Merge tag 'trace-v6.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace · aa41478a
      Linus Torvalds authored
      Pull tracing fixes from Steven Rostedt:
      
       - Found that the synthetic events were using strlen/strscpy() on values
         that could have come from userspace, and that is bad.
      
         Consolidate the string logic of kprobe and eprobe and extend it to
         the synthetic events to safely process string addresses.
      
       - Clean up content of text dump in ftrace_bug() where the output does
         not make char reads into signed and sign extending the byte output.
      
       - Fix some kernel docs in the ring buffer code.
      
      * tag 'trace-v6.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
        tracing: Fix reading strings from synthetic events
        tracing: Add "(fault)" name injection to kernel probes
        tracing: Move duplicate code of trace_kprobe/eprobe.c into header
        ring-buffer: Fix kernel-doc
        ftrace: Fix char print issue in print_ip_ins()
      aa41478a
    • Linus Torvalds's avatar
      Merge tag 'linux-watchdog-6.1-rc1' of git://www.linux-watchdog.org/linux-watchdog · 3d33e6dd
      Linus Torvalds authored
      Pull watchdog updates from Wim Van Sebroeck:
      
       - new driver for Exar/MaxLinear XR28V38x
      
       - support for exynosautov9 SoC
      
       - support for Renesas R-Car V5H (R8A779G0) and RZ/V2M (r9a09g011) SoC
      
       - support for imx93
      
       - several other fixes and improvements
      
      * tag 'linux-watchdog-6.1-rc1' of git://www.linux-watchdog.org/linux-watchdog: (36 commits)
        watchdog: twl4030_wdt: add missing mod_devicetable.h include
        dt-bindings: watchdog: migrate mt7621 text bindings to YAML
        watchdog: sp5100_tco: Add "action" module parameter
        watchdog: imx93: add watchdog timer on imx93
        watchdog: imx7ulp_wdt: init wdog when it was active
        watchdog: imx7ulp_wdt: Handle wdog reconfigure failure
        watchdog: imx7ulp_wdt: Fix RCS timeout issue
        watchdog: imx7ulp_wdt: Check CMD32EN in wdog init
        watchdog: imx7ulp: Add explict memory barrier for unlock sequence
        watchdog: imx7ulp: Move suspend/resume to noirq phase
        watchdog: rti-wdt:using the pm_runtime_resume_and_get to simplify the code
        dt-bindings: watchdog: rockchip: add rockchip,rk3128-wdt
        watchdog: s3c2410_wdt: support exynosautov9 watchdog
        dt-bindings: watchdog: add exynosautov9 compatible
        watchdog: npcm: Enable clock if provided
        watchdog: meson: keep running if already active
        watchdog: dt-bindings: atmel,at91sam9-wdt: convert to json-schema
        watchdog: armada_37xx_wdt: Fix .set_timeout callback
        watchdog: sa1100: make variable sa1100dog_driver static
        watchdog: w83977f_wdt: Fix comment typo
        ...
      3d33e6dd
    • Linus Torvalds's avatar
      Merge tag 'ceph-for-6.1-rc1' of https://github.com/ceph/ceph-client · 524d0c68
      Linus Torvalds authored
      Pull ceph updates from Ilya Dryomov:
       "A quiet round this time: several assorted filesystem fixes, the most
        noteworthy one being some additional wakeups in cap handling code, and
        a messenger cleanup"
      
      * tag 'ceph-for-6.1-rc1' of https://github.com/ceph/ceph-client:
        ceph: remove Sage's git tree from documentation
        ceph: fix incorrectly showing the .snap size for stat
        ceph: fail the open_by_handle_at() if the dentry is being unlinked
        ceph: increment i_version when doing a setattr with caps
        ceph: Use kcalloc for allocating multiple elements
        ceph: no need to wait for transition RDCACHE|RD -> RD
        ceph: fail the request if the peer MDS doesn't support getvxattr op
        ceph: wake up the waiters if any new caps comes
        libceph: drop last_piece flag from ceph_msg_data_cursor
      524d0c68
    • Linus Torvalds's avatar
      Merge tag 'nfs-for-6.1-1' of git://git.linux-nfs.org/projects/anna/linux-nfs · 66b83455
      Linus Torvalds authored
      Pull NFS client updates from Anna Schumaker:
       "New Features:
         - Add NFSv4.2 xattr tracepoints
         - Replace xprtiod WQ in rpcrdma
         - Flexfiles cancels I/O on layout recall or revoke
      
        Bugfixes and Cleanups:
         - Directly use ida_alloc() / ida_free()
         - Don't open-code max_t()
         - Prefer using strscpy over strlcpy
         - Remove unused forward declarations
         - Always return layout states on flexfiles layout return
         - Have LISTXATTR treat NFS4ERR_NOXATTR as an empty reply instead of
           error
         - Allow more xprtrdma memory allocations to fail without triggering a
           reclaim
         - Various other xprtrdma clean ups
         - Fix rpc_killall_tasks() races"
      
      * tag 'nfs-for-6.1-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (27 commits)
        NFSv4/flexfiles: Cancel I/O if the layout is recalled or revoked
        SUNRPC: Add API to force the client to disconnect
        SUNRPC: Add a helper to allow pNFS drivers to selectively cancel RPC calls
        SUNRPC: Fix races with rpc_killall_tasks()
        xprtrdma: Fix uninitialized variable
        xprtrdma: Prevent memory allocations from driving a reclaim
        xprtrdma: Memory allocation should be allowed to fail during connect
        xprtrdma: MR-related memory allocation should be allowed to fail
        xprtrdma: Clean up synopsis of rpcrdma_regbuf_alloc()
        xprtrdma: Clean up synopsis of rpcrdma_req_create()
        svcrdma: Clean up RPCRDMA_DEF_GFP
        SUNRPC: Replace the use of the xprtiod WQ in rpcrdma
        NFSv4.2: Add a tracepoint for listxattr
        NFSv4.2: Add tracepoints for getxattr, setxattr, and removexattr
        NFSv4.2: Move TRACE_DEFINE_ENUM(NFS4_CONTENT_*) under CONFIG_NFS_V4_2
        NFSv4.2: Add special handling for LISTXATTR receiving NFS4ERR_NOXATTR
        nfs: remove nfs_wait_atomic_killable() and nfs_write_prepare() declaration
        NFSv4: remove nfs4_renewd_prepare_shutdown() declaration
        fs/nfs/pnfs_nfs.c: fix spelling typo and syntax error in comment
        NFSv4/pNFS: Always return layout stats on layout return for flexfiles
        ...
      66b83455
    • Linus Torvalds's avatar
      Merge tag 'for-linus-6.1-ofs1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux · 531d3b5f
      Linus Torvalds authored
      Pull orangefs update from Mike Marshall:
       "Change iterate to iterate_shared"
      
      * tag 'for-linus-6.1-ofs1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux:
        Orangefs: change iterate to iterate_shared
      531d3b5f
    • Pierre Gondois's avatar
      Documentation: rtla: Correct command line example · 877d95dc
      Pierre Gondois authored
      The '-t/-T' parameters seem to have been swapped:
      -t/--trace[=file]: save the stopped trace
      to [file|timerlat_trace.txt]
      -T/--thread us: stop trace if the thread latency
      is higher than the argument in us
      
      Swap them back.
      Signed-off-by: default avatarPierre Gondois <pierre.gondois@arm.com>
      Acked-by: default avatarDaniel Bristot de Oliveira <bristot@kernel.org>
      Link: https://lore.kernel.org/r/20221006084409.3882542-1-pierre.gondois@arm.comSigned-off-by: default avatarJonathan Corbet <corbet@lwn.net>
      877d95dc
    • Dan Carpenter's avatar
      sunhme: fix an IS_ERR() vs NULL check in probe · 99df45c9
      Dan Carpenter authored
      The devm_request_region() function does not return error pointers, it
      returns NULL on error.
      
      Fixes: 914d9b27 ("sunhme: switch to devres")
      Signed-off-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Reviewed-by: default avatarSean Anderson <seanga2@gmail.com>
      Reviewed-by: default avatarRolf Eike Beer <eike-kernel@sf-tec.de>
      Link: https://lore.kernel.org/r/Y0bWzJL8JknX8MUf@kiliSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      99df45c9
    • Dan Carpenter's avatar
      net: marvell: prestera: fix a couple NULL vs IS_ERR() checks · 30e9672a
      Dan Carpenter authored
      The __prestera_nexthop_group_create() function returns NULL on error
      and the prestera_nexthop_group_get() returns error pointers.  Fix these
      two checks.
      
      Fixes: 0a23ae23 ("net: marvell: prestera: Add router nexthops ABI")
      Signed-off-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Link: https://lore.kernel.org/r/Y0bWq+7DoKK465z8@kiliSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      30e9672a
    • Eric Dumazet's avatar
      kcm: avoid potential race in kcm_tx_work · ec7eede3
      Eric Dumazet authored
      syzbot found that kcm_tx_work() could crash [1] in:
      
      	/* Primarily for SOCK_SEQPACKET sockets */
      	if (likely(sk->sk_socket) &&
      	    test_bit(SOCK_NOSPACE, &sk->sk_socket->flags)) {
      <<*>>	clear_bit(SOCK_NOSPACE, &sk->sk_socket->flags);
      		sk->sk_write_space(sk);
      	}
      
      I think the reason is that another thread might concurrently
      run in kcm_release() and call sock_orphan(sk) while sk is not
      locked. kcm_tx_work() find sk->sk_socket being NULL.
      
      [1]
      BUG: KASAN: null-ptr-deref in instrument_atomic_write include/linux/instrumented.h:86 [inline]
      BUG: KASAN: null-ptr-deref in clear_bit include/asm-generic/bitops/instrumented-atomic.h:41 [inline]
      BUG: KASAN: null-ptr-deref in kcm_tx_work+0xff/0x160 net/kcm/kcmsock.c:742
      Write of size 8 at addr 0000000000000008 by task kworker/u4:3/53
      
      CPU: 0 PID: 53 Comm: kworker/u4:3 Not tainted 5.19.0-rc3-next-20220621-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Workqueue: kkcmd kcm_tx_work
      Call Trace:
      <TASK>
      __dump_stack lib/dump_stack.c:88 [inline]
      dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
      kasan_report+0xbe/0x1f0 mm/kasan/report.c:495
      check_region_inline mm/kasan/generic.c:183 [inline]
      kasan_check_range+0x13d/0x180 mm/kasan/generic.c:189
      instrument_atomic_write include/linux/instrumented.h:86 [inline]
      clear_bit include/asm-generic/bitops/instrumented-atomic.h:41 [inline]
      kcm_tx_work+0xff/0x160 net/kcm/kcmsock.c:742
      process_one_work+0x996/0x1610 kernel/workqueue.c:2289
      worker_thread+0x665/0x1080 kernel/workqueue.c:2436
      kthread+0x2e9/0x3a0 kernel/kthread.c:376
      ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:302
      </TASK>
      
      Fixes: ab7ac4eb ("kcm: Kernel Connection Multiplexor module")
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Tom Herbert <tom@herbertland.com>
      Link: https://lore.kernel.org/r/20221012133412.519394-1-edumazet@google.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      ec7eede3
    • Kuniyuki Iwashima's avatar
      tcp: Clean up kernel listener's reqsk in inet_twsk_purge() · 740ea3c4
      Kuniyuki Iwashima authored
      Eric Dumazet reported a use-after-free related to the per-netns ehash
      series. [0]
      
      When we create a TCP socket from userspace, the socket always holds a
      refcnt of the netns.  This guarantees that a reqsk timer is always fired
      before netns dismantle.  Each reqsk has a refcnt of its listener, so the
      listener is not freed before the reqsk, and the net is not freed before
      the listener as well.
      
      OTOH, when in-kernel users create a TCP socket, it might not hold a refcnt
      of its netns.  Thus, a reqsk timer can be fired after the netns dismantle
      and access freed per-netns ehash.
      
      To avoid the use-after-free, we need to clean up TCP_NEW_SYN_RECV sockets
      in inet_twsk_purge() if the netns uses a per-netns ehash.
      
      [0]: https://lore.kernel.org/netdev/CANn89iLXMup0dRD_Ov79Xt8N9FM0XdhCHEN05sf3eLwxKweM6w@mail.gmail.com/
      
      BUG: KASAN: use-after-free in tcp_or_dccp_get_hashinfo
      include/net/inet_hashtables.h:181 [inline]
      BUG: KASAN: use-after-free in reqsk_queue_unlink+0x320/0x350
      net/ipv4/inet_connection_sock.c:913
      Read of size 8 at addr ffff88807545bd80 by task syz-executor.2/8301
      
      CPU: 1 PID: 8301 Comm: syz-executor.2 Not tainted
      6.0.0-syzkaller-02757-gaf7d23f9 #0
      Hardware name: Google Google Compute Engine/Google Compute Engine,
      BIOS Google 09/22/2022
      Call Trace:
      <IRQ>
      __dump_stack lib/dump_stack.c:88 [inline]
      dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
      print_address_description mm/kasan/report.c:317 [inline]
      print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
      kasan_report+0xb1/0x1e0 mm/kasan/report.c:495
      tcp_or_dccp_get_hashinfo include/net/inet_hashtables.h:181 [inline]
      reqsk_queue_unlink+0x320/0x350 net/ipv4/inet_connection_sock.c:913
      inet_csk_reqsk_queue_drop net/ipv4/inet_connection_sock.c:927 [inline]
      inet_csk_reqsk_queue_drop_and_put net/ipv4/inet_connection_sock.c:939 [inline]
      reqsk_timer_handler+0x724/0x1160 net/ipv4/inet_connection_sock.c:1053
      call_timer_fn+0x1a0/0x6b0 kernel/time/timer.c:1474
      expire_timers kernel/time/timer.c:1519 [inline]
      __run_timers.part.0+0x674/0xa80 kernel/time/timer.c:1790
      __run_timers kernel/time/timer.c:1768 [inline]
      run_timer_softirq+0xb3/0x1d0 kernel/time/timer.c:1803
      __do_softirq+0x1d0/0x9c8 kernel/softirq.c:571
      invoke_softirq kernel/softirq.c:445 [inline]
      __irq_exit_rcu+0x123/0x180 kernel/softirq.c:650
      irq_exit_rcu+0x5/0x20 kernel/softirq.c:662
      sysvec_apic_timer_interrupt+0x93/0xc0 arch/x86/kernel/apic/apic.c:1107
      </IRQ>
      
      Fixes: d1e5e640 ("tcp: Introduce optional per-netns ehash.")
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Reported-by: default avatarEric Dumazet <edumazet@google.com>
      Suggested-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarKuniyuki Iwashima <kuniyu@amazon.com>
      Reviewed-by: default avatarEric Dumazet <edumazet@google.com>
      Link: https://lore.kernel.org/r/20221012145036.74960-1-kuniyu@amazon.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      740ea3c4
    • Frank Rowand's avatar
      MAINTAINERS: of: collapse overlay entry into main device tree entry · 917c362b
      Frank Rowand authored
      Pantelis has not been active in recent years so no need to maintain
      a separate entry for device tree overlays.
      Signed-off-by: default avatarFrank Rowand <frank.rowand@sony.com>
      Link: https://lore.kernel.org/r/20221012220548.4163865-1-frowand.list@gmail.comSigned-off-by: default avatarRob Herring <robh@kernel.org>
      917c362b
    • Michael S. Tsirkin's avatar
      vdpa/ifcvf: add reviewer · be8ddea9
      Michael S. Tsirkin authored
      Zhu Lingshan has been writing and reviewing ifcvf patches for
      a while now, add as reviewer.
      Signed-off-by: default avatarMichael S. Tsirkin <mst@redhat.com>
      Acked-by: default avatarZhu Lingshan <lingshan.zhu@intel.com>
      Acked-by: default avatarJason Wang <jasowang@redhat.com>
      be8ddea9
    • Michael S. Tsirkin's avatar
      virtio_pci: use irq to detect interrupt support · 2145ab51
      Michael S. Tsirkin authored
      commit 71491c54 ("virtio_pci: don't try to use intxif pin is zero")
      breaks virtio_pci on powerpc, when running as a qemu guest.
      
      vp_find_vqs() bails out because pci_dev->pin == 0.
      
      But pci_dev->irq is populated correctly, so vp_find_vqs_intx() would
      succeed if we called it - which is what the code used to do.
      
      This seems to happen because pci_dev->pin is not populated in
      pci_assign_irq(). A PCI core bug? Maybe.
      
      However Linus said:
      	I really think that that is basically the only time you should use
      	that 'pci_dev->pin' thing: it basically exists not for "does this
      	device have an IRQ", but for "what is the routing of this irq on this
      	device".
      
      and
      	The correct way to check for "no irq" doesn't use NO_IRQ at all, it just does
      		if (dev->irq) ...
      
      so let's just check irq and be done with it.
      Suggested-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Reported-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Fixes: 71491c54 ("virtio_pci: don't try to use intxif pin is zero")
      Cc: "Angus Chen" <angus.chen@jaguarmicro.com>
      Signed-off-by: default avatarMichael S. Tsirkin <mst@redhat.com>
      Tested-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Acked-by: default avatarJason Wang <jasowang@redhat.com>
      Message-Id: <20221012220312.308522-1-mst@redhat.com>
      2145ab51
    • Paolo Abeni's avatar
      Merge tag 'wireless-2022-10-13' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless · ac85bc71
      Paolo Abeni authored
      Johannes Berg says:
      
      ====================
      More wireless fixes for 6.1
      
      This has only the fixes for the scan parsing issues.
      
      * tag 'wireless-2022-10-13' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
        wifi: cfg80211: update hidden BSSes to avoid WARN_ON
        wifi: mac80211: fix crash in beacon protection for P2P-device
        wifi: mac80211_hwsim: avoid mac80211 warning on bad rate
        wifi: cfg80211: avoid nontransmitted BSS list corruption
        wifi: cfg80211: fix BSS refcounting bugs
        wifi: cfg80211: ensure length byte is present before access
        wifi: mac80211: fix MBSSID parsing use-after-free
        wifi: cfg80211/mac80211: reject bad MBSSID elements
        wifi: cfg80211: fix u8 overflow in cfg80211_update_notlisted_nontrans()
      ====================
      
      Link: https://lore.kernel.org/r/20221013100522.46346-1-johannes@sipsolutions.netSigned-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      ac85bc71
    • Johannes Berg's avatar
      Merge branch 'cve-fixes-2022-10-13' · e7ad651c
      Johannes Berg authored
      Pull in the fixes for various scan parsing bugs found by
      Sönke Huster by fuzzing.
      e7ad651c
    • Divya Koppera's avatar
      net: phy: micrel: Fixes FIELD_GET assertion · fa182ea2
      Divya Koppera authored
      FIELD_GET() must only be used with a mask that is a compile-time
      constant. Mark the functions as __always_inline to avoid the problem.
      
      Fixes: 21b688da ("net: phy: micrel: Cable Diag feature for lan8814 phy")
      Reported-by: default avatarkernel test robot <lkp@intel.com>
      Signed-off-by: default avatarDivya Koppera <Divya.Koppera@microchip.com>
      Link: https://lore.kernel.org/r/20221011095437.12580-1-Divya.Koppera@microchip.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      fa182ea2
    • Xin Long's avatar
      openvswitch: add nf_ct_is_confirmed check before assigning the helper · 3c186054
      Xin Long authored
      A WARN_ON call trace would be triggered when 'ct(commit, alg=helper)'
      applies on a confirmed connection:
      
        WARNING: CPU: 0 PID: 1251 at net/netfilter/nf_conntrack_extend.c:98
        RIP: 0010:nf_ct_ext_add+0x12d/0x150 [nf_conntrack]
        Call Trace:
         <TASK>
         nf_ct_helper_ext_add+0x12/0x60 [nf_conntrack]
         __nf_ct_try_assign_helper+0xc4/0x160 [nf_conntrack]
         __ovs_ct_lookup+0x72e/0x780 [openvswitch]
         ovs_ct_execute+0x1d8/0x920 [openvswitch]
         do_execute_actions+0x4e6/0xb60 [openvswitch]
         ovs_execute_actions+0x60/0x140 [openvswitch]
         ovs_packet_cmd_execute+0x2ad/0x310 [openvswitch]
         genl_family_rcv_msg_doit.isra.15+0x113/0x150
         genl_rcv_msg+0xef/0x1f0
      
      which can be reproduced with these OVS flows:
      
        table=0, in_port=veth1,tcp,tcp_dst=2121,ct_state=-trk
        actions=ct(commit, table=1)
        table=1, in_port=veth1,tcp,tcp_dst=2121,ct_state=+trk+new
        actions=ct(commit, alg=ftp),normal
      
      The issue was introduced by commit 248d45f1 ("openvswitch: Allow
      attaching helper in later commit") where it somehow removed the check
      of nf_ct_is_confirmed before asigning the helper. This patch is to fix
      it by bringing it back.
      
      Fixes: 248d45f1 ("openvswitch: Allow attaching helper in later commit")
      Reported-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: default avatarXin Long <lucien.xin@gmail.com>
      Acked-by: default avatarAaron Conole <aconole@redhat.com>
      Tested-by: default avatarAaron Conole <aconole@redhat.com>
      Link: https://lore.kernel.org/r/c5c9092a22a2194650222bffaf786902613deb16.1665085502.git.lucien.xin@gmail.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      3c186054
    • Jakub Kicinski's avatar
      Merge branch 'tcp-udp-fix-memory-leaks-and-data-races-around-ipv6_addrform' · 4f0f2121
      Jakub Kicinski authored
      Kuniyuki Iwashima says:
      
      ====================
      tcp/udp: Fix memory leaks and data races around IPV6_ADDRFORM.
      
      This series fixes some memory leaks and data races caused in the
      same scenario where one thread converts an IPv6 socket into IPv4
      with IPV6_ADDRFORM and another accesses the socket concurrently.
      
        v4: https://lore.kernel.org/netdev/20221004171802.40968-1-kuniyu@amazon.com/
        v3 (Resend): https://lore.kernel.org/netdev/20221003154425.49458-1-kuniyu@amazon.com/
        v3: https://lore.kernel.org/netdev/20220929012542.55424-1-kuniyu@amazon.com/
        v2: https://lore.kernel.org/netdev/20220928002741.64237-1-kuniyu@amazon.com/
        v1: https://lore.kernel.org/netdev/20220927161209.32939-1-kuniyu@amazon.com/
      ====================
      
      Link: https://lore.kernel.org/r/20221006185349.74777-1-kuniyu@amazon.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      4f0f2121
    • Kuniyuki Iwashima's avatar
      tcp: Fix data races around icsk->icsk_af_ops. · f49cd2f4
      Kuniyuki Iwashima authored
      setsockopt(IPV6_ADDRFORM) and tcp_v6_connect() change icsk->icsk_af_ops
      under lock_sock(), but tcp_(get|set)sockopt() read it locklessly.  To
      avoid load/store tearing, we need to add READ_ONCE() and WRITE_ONCE()
      for the reads and writes.
      
      Thanks to Eric Dumazet for providing the syzbot report:
      
      BUG: KCSAN: data-race in tcp_setsockopt / tcp_v6_connect
      
      write to 0xffff88813c624518 of 8 bytes by task 23936 on cpu 0:
      tcp_v6_connect+0x5b3/0xce0 net/ipv6/tcp_ipv6.c:240
      __inet_stream_connect+0x159/0x6d0 net/ipv4/af_inet.c:660
      inet_stream_connect+0x44/0x70 net/ipv4/af_inet.c:724
      __sys_connect_file net/socket.c:1976 [inline]
      __sys_connect+0x197/0x1b0 net/socket.c:1993
      __do_sys_connect net/socket.c:2003 [inline]
      __se_sys_connect net/socket.c:2000 [inline]
      __x64_sys_connect+0x3d/0x50 net/socket.c:2000
      do_syscall_x64 arch/x86/entry/common.c:50 [inline]
      do_syscall_64+0x2b/0x70 arch/x86/entry/common.c:80
      entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      read to 0xffff88813c624518 of 8 bytes by task 23937 on cpu 1:
      tcp_setsockopt+0x147/0x1c80 net/ipv4/tcp.c:3789
      sock_common_setsockopt+0x5d/0x70 net/core/sock.c:3585
      __sys_setsockopt+0x212/0x2b0 net/socket.c:2252
      __do_sys_setsockopt net/socket.c:2263 [inline]
      __se_sys_setsockopt net/socket.c:2260 [inline]
      __x64_sys_setsockopt+0x62/0x70 net/socket.c:2260
      do_syscall_x64 arch/x86/entry/common.c:50 [inline]
      do_syscall_64+0x2b/0x70 arch/x86/entry/common.c:80
      entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      value changed: 0xffffffff8539af68 -> 0xffffffff8539aff8
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 1 PID: 23937 Comm: syz-executor.5 Not tainted
      6.0.0-rc4-syzkaller-00331-g4ed9c1e9-dirty #0
      
      Hardware name: Google Google Compute Engine/Google Compute Engine,
      BIOS Google 08/26/2022
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Reported-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarKuniyuki Iwashima <kuniyu@amazon.com>
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      f49cd2f4
    • Kuniyuki Iwashima's avatar
      ipv6: Fix data races around sk->sk_prot. · 364f997b
      Kuniyuki Iwashima authored
      Commit 086d4905 ("ipv6: annotate some data-races around sk->sk_prot")
      fixed some data-races around sk->sk_prot but it was not enough.
      
      Some functions in inet6_(stream|dgram)_ops still access sk->sk_prot
      without lock_sock() or rtnl_lock(), so they need READ_ONCE() to avoid
      load tearing.
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: default avatarKuniyuki Iwashima <kuniyu@amazon.com>
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      364f997b
    • Kuniyuki Iwashima's avatar
      tcp/udp: Call inet6_destroy_sock() in IPv6 sk->sk_destruct(). · d38afeec
      Kuniyuki Iwashima authored
      Originally, inet6_sk(sk)->XXX were changed under lock_sock(), so we were
      able to clean them up by calling inet6_destroy_sock() during the IPv6 ->
      IPv4 conversion by IPV6_ADDRFORM.  However, commit 03485f2a ("udpv6:
      Add lockless sendmsg() support") added a lockless memory allocation path,
      which could cause a memory leak:
      
      setsockopt(IPV6_ADDRFORM)                 sendmsg()
      +-----------------------+                 +-------+
      - do_ipv6_setsockopt(sk, ...)             - udpv6_sendmsg(sk, ...)
        - sockopt_lock_sock(sk)                   ^._ called via udpv6_prot
          - lock_sock(sk)                             before WRITE_ONCE()
        - WRITE_ONCE(sk->sk_prot, &tcp_prot)
        - inet6_destroy_sock()                    - if (!corkreq)
        - sockopt_release_sock(sk)                  - ip6_make_skb(sk, ...)
          - release_sock(sk)                          ^._ lockless fast path for
                                                          the non-corking case
      
                                                      - __ip6_append_data(sk, ...)
                                                        - ipv6_local_rxpmtu(sk, ...)
                                                          - xchg(&np->rxpmtu, skb)
                                                            ^._ rxpmtu is never freed.
      
                                                      - goto out_no_dst;
      
                                                  - lock_sock(sk)
      
      For now, rxpmtu is only the case, but not to miss the future change
      and a similar bug fixed in commit e2732600 ("net: ping6: Fix
      memleak in ipv6_renew_options()."), let's set a new function to IPv6
      sk->sk_destruct() and call inet6_cleanup_sock() there.  Since the
      conversion does not change sk->sk_destruct(), we can guarantee that
      we can clean up IPv6 resources finally.
      
      We can now remove all inet6_destroy_sock() calls from IPv6 protocol
      specific ->destroy() functions, but such changes are invasive to
      backport.  So they can be posted as a follow-up later for net-next.
      
      Fixes: 03485f2a ("udpv6: Add lockless sendmsg() support")
      Signed-off-by: default avatarKuniyuki Iwashima <kuniyu@amazon.com>
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      d38afeec
    • Kuniyuki Iwashima's avatar
      udp: Call inet6_destroy_sock() in setsockopt(IPV6_ADDRFORM). · 21985f43
      Kuniyuki Iwashima authored
      Commit 4b340ae2 ("IPv6: Complete IPV6_DONTFRAG support") forgot
      to add a change to free inet6_sk(sk)->rxpmtu while converting an IPv6
      socket into IPv4 with IPV6_ADDRFORM.  After conversion, sk_prot is
      changed to udp_prot and ->destroy() never cleans it up, resulting in
      a memory leak.
      
      This is due to the discrepancy between inet6_destroy_sock() and
      IPV6_ADDRFORM, so let's call inet6_destroy_sock() from IPV6_ADDRFORM
      to remove the difference.
      
      However, this is not enough for now because rxpmtu can be changed
      without lock_sock() after commit 03485f2a ("udpv6: Add lockless
      sendmsg() support").  We will fix this case in the following patch.
      
      Note we will rename inet6_destroy_sock() to inet6_cleanup_sock() and
      remove unnecessary inet6_destroy_sock() calls in sk_prot->destroy()
      in the future.
      
      Fixes: 4b340ae2 ("IPv6: Complete IPV6_DONTFRAG support")
      Signed-off-by: default avatarKuniyuki Iwashima <kuniyu@amazon.com>
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      21985f43
    • Kuniyuki Iwashima's avatar
      tcp/udp: Fix memory leak in ipv6_renew_options(). · 3c52c6bb
      Kuniyuki Iwashima authored
      syzbot reported a memory leak [0] related to IPV6_ADDRFORM.
      
      The scenario is that while one thread is converting an IPv6 socket into
      IPv4 with IPV6_ADDRFORM, another thread calls do_ipv6_setsockopt() and
      allocates memory to inet6_sk(sk)->XXX after conversion.
      
      Then, the converted sk with (tcp|udp)_prot never frees the IPv6 resources,
      which inet6_destroy_sock() should have cleaned up.
      
      setsockopt(IPV6_ADDRFORM)                 setsockopt(IPV6_DSTOPTS)
      +-----------------------+                 +----------------------+
      - do_ipv6_setsockopt(sk, ...)
        - sockopt_lock_sock(sk)                 - do_ipv6_setsockopt(sk, ...)
          - lock_sock(sk)                         ^._ called via tcpv6_prot
        - WRITE_ONCE(sk->sk_prot, &tcp_prot)          before WRITE_ONCE()
        - xchg(&np->opt, NULL)
        - txopt_put(opt)
        - sockopt_release_sock(sk)
          - release_sock(sk)                      - sockopt_lock_sock(sk)
                                                    - lock_sock(sk)
                                                  - ipv6_set_opt_hdr(sk, ...)
                                                    - ipv6_update_options(sk, opt)
                                                      - xchg(&inet6_sk(sk)->opt, opt)
                                                        ^._ opt is never freed.
      
                                                  - sockopt_release_sock(sk)
                                                    - release_sock(sk)
      
      Since IPV6_DSTOPTS allocates options under lock_sock(), we can avoid this
      memory leak by testing whether sk_family is changed by IPV6_ADDRFORM after
      acquiring the lock.
      
      This issue exists from the initial commit between IPV6_ADDRFORM and
      IPV6_PKTOPTIONS.
      
      [0]:
      BUG: memory leak
      unreferenced object 0xffff888009ab9f80 (size 96):
        comm "syz-executor583", pid 328, jiffies 4294916198 (age 13.034s)
        hex dump (first 32 bytes):
          01 00 00 00 48 00 00 00 08 00 00 00 00 00 00 00  ....H...........
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        backtrace:
          [<000000002ee98ae1>] kmalloc include/linux/slab.h:605 [inline]
          [<000000002ee98ae1>] sock_kmalloc+0xb3/0x100 net/core/sock.c:2566
          [<0000000065d7b698>] ipv6_renew_options+0x21e/0x10b0 net/ipv6/exthdrs.c:1318
          [<00000000a8c756d7>] ipv6_set_opt_hdr net/ipv6/ipv6_sockglue.c:354 [inline]
          [<00000000a8c756d7>] do_ipv6_setsockopt.constprop.0+0x28b7/0x4350 net/ipv6/ipv6_sockglue.c:668
          [<000000002854d204>] ipv6_setsockopt+0xdf/0x190 net/ipv6/ipv6_sockglue.c:1021
          [<00000000e69fdcf8>] tcp_setsockopt+0x13b/0x2620 net/ipv4/tcp.c:3789
          [<0000000090da4b9b>] __sys_setsockopt+0x239/0x620 net/socket.c:2252
          [<00000000b10d192f>] __do_sys_setsockopt net/socket.c:2263 [inline]
          [<00000000b10d192f>] __se_sys_setsockopt net/socket.c:2260 [inline]
          [<00000000b10d192f>] __x64_sys_setsockopt+0xbe/0x160 net/socket.c:2260
          [<000000000a80d7aa>] do_syscall_x64 arch/x86/entry/common.c:50 [inline]
          [<000000000a80d7aa>] do_syscall_64+0x38/0x90 arch/x86/entry/common.c:80
          [<000000004562b5c6>] entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: default avatarKuniyuki Iwashima <kuniyu@amazon.com>
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      3c52c6bb