1. 26 May, 2023 4 commits
    • perf ftrace latency: Remove unnecessary "--" from --use-nsec option · 8d73259e
      Namhyung Kim authored
      The long option name should not include the dashes, because the help
      output adds them itself. The current version shows four dashes for
      the option.
      
        $ perf ftrace latency -h
      
         Usage: perf ftrace [<options>] [<command>]
            or: perf ftrace [<options>] -- [<command>] [<options>]
            or: perf ftrace {trace|latency} [<options>] [<command>]
            or: perf ftrace {trace|latency} [<options>] -- [<command>] [<options>]
      
            -b, --use-bpf         Use BPF to measure function latency
            -n, ----use-nsec      Use nano-second histogram
            -T, --trace-funcs <func>
                                  Show latency of given function
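A minimal user-space sketch of why the help text shows four dashes. It assumes, as the output above suggests, that the usage printer prepends "--" to the stored long name. The struct and function names are hypothetical and are not perf's own parse-options types.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified option-table entry (not perf's struct option). */
struct model_option {
    char short_name;
    const char *long_name;   /* bare name: "--" is added when printing */
    const char *help;
};

/* Formats the "-n, --long-name" column of a help line by prepending
 * "--" to long_name, as the usage output above implies. */
static void format_option(const struct model_option *o, char *buf, size_t len)
{
    snprintf(buf, len, "-%c, --%s", o->short_name, o->long_name);
}

/* A cheap sanity check that would have caught this bug: long names
 * must not start with a dash. */
static int long_name_is_bare(const struct model_option *o)
{
    return o->long_name[0] != '-';
}
```

With the long name stored as "--use-nsec", the column prints as "-n, ----use-nsec". With "use-nsec" it prints as "-n, --use-nsec". In perf, the fix is the same idea: drop the dashes from the long-name string in the option definition.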
      
      Fixes: 84005bb6 ("perf ftrace latency: Add -n/--use-nsec option")
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Changbin Du <changbin.du@huawei.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20230525212038.3535851-1-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • Merge tag '6.4-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6 · 0d85b27b
      Linus Torvalds authored
      Pull smb directory moves and client fixes from Steve French:
       "Four smb3 client fixes (three of which marked for stable) and three
        patches to move fs/cifs and fs/ksmbd to a new common "fs/smb"
        parent directory
      
         - Move the client and server source directories to a common parent
           directory:
      
             fs/cifs -> fs/smb/client
             fs/ksmbd -> fs/smb/server
             fs/smbfs_common -> fs/smb/common
      
         - important readahead fix
      
         - important fix for SMB1 regression
      
         - fix for missing mount option ("mapchars") in mount API conversion
      
         - minor debugging improvement"
      
      * tag '6.4-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
        smb3: move Documentation/filesystems/cifs to Documentation/filesystems/smb
        cifs: correct references in Documentation to old fs/cifs path
        smb: move client and server files to common directory fs/smb
        cifs: mapchars mount option ignored
        smb3: display debug information better for encryption
        cifs: fix smb1 mount regression
        cifs: Fix cifs_limit_bvec_subset() to correctly check the maxmimum size
    • Merge tag 'parisc-for-6.4-3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux · 192fe71c
      Linus Torvalds authored
      Pull parisc architecture fixes from Helge Deller:
       "Quite a bunch of real bugfixes in here and most of them are tagged for
        backporting: A fix for cache flushing from irq context, a kprobes &
        kgdb breakpoint handling fix, and a fix in the alternative code
        patching function to take care of CPU hotplugging.
      
        parisc now provides LOCKDEP support and comes with a lightweight
        spinlock check. Both features helped me to find the cache flush bug.
      
        Additionally writing the AGP gatt has been fixed, the machine allows
        the user to reboot after a system halt and arch_sync_dma_for_cpu() has
        been optimized for PCXL CPUs.
      
        Summary:
      
         - Fix flush_dcache_page() for usage from irq context
      
         - Handle kprobes breakpoints only in kernel context
      
         - Handle kgdb breakpoints only in kernel context
      
         - Use num_present_cpus() in alternative patching code
      
         - Enable LOCKDEP support
      
         - Add lightweight spinlock checks
      
         - Flush AGP gatt writes and adjust gatt mask in parisc_agp_mask_memory()
      
         - Allow to reboot machine after system halt
      
         - Improve cache flushing for PCXL in arch_sync_dma_for_cpu()"
      
      * tag 'parisc-for-6.4-3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
        parisc: Fix flush_dcache_page() for usage from irq context
        parisc: Handle kgdb breakpoints only in kernel context
        parisc: Handle kprobes breakpoints only in kernel context
        parisc: Allow to reboot machine after system halt
        parisc: Enable LOCKDEP support
        parisc: Add lightweight spinlock checks
        parisc: Use num_present_cpus() in alternative patching code
        parisc: Flush gatt writes and adjust gatt mask in parisc_agp_mask_memory()
        parisc: Improve cache flushing for PCXL in arch_sync_dma_for_cpu()
    • module: error out early on concurrent load of the same module file · 9828ed3f
      Linus Torvalds authored
      It turns out that udev under certain circumstances will concurrently try
      to load the same modules over-and-over excessively.  This isn't a kernel
      bug, but it ends up affecting the kernel, to the point that under
      certain circumstances we can fail to boot, because the kernel uses a lot
      of memory to read all the module data all at once.
      
      Note that it isn't a memory leak, it's just basically a thundering herd
      problem happening at bootup with a lot of CPUs, with the worst cases
      then being pretty bad.
      
      Admittedly the worst situations are somewhat contrived: lots and lots of
      CPUs, not a lot of memory, and KASAN enabled to make it all slower and
      as such (unintentionally) exacerbate the problem.
      
      Luis explains: [1]
      
       "My best assessment of the situation is that each CPU in udev ends up
        triggering a load of duplicate set of modules, not just one, but *a
        lot*. Not sure what heuristics udev uses to load a set of modules per
        CPU."
      
      Petr Pavlu chimes in: [2]
      
       "My understanding is that udev workers are forked. An initial kmod
        context is created by the main udevd process but no sharing happens
        after the fork. It means that the mentioned memory pool logic doesn't
        really kick in.
      
        Multiple parallel load requests come from multiple udev workers, for
        instance, each handling an udev event for one CPU device and making
        the exactly same requests as all others are doing at the same time.
      
        The optimization idea would be to recognize these duplicate requests
        at the udevd/kmod level and converge them"
      
      Note that module loading has tried to mitigate this issue before, see
      for example commit 064f4536 ("module: avoid allocation if module is
      already present and ready"), which has a few ASCII graphs on memory use
      due to this same issue.
      
      However, while that noticed that the module was already loaded, and
      exited with an error early before spending any more time on setting up
      the module, it didn't handle the case of multiple concurrent module
      loads all being active - but not complete - at the same time.
      
      Yes, one of them will eventually win the race and finalize its copy, and
      the others will then notice that the module already exists and error
      out, but while this all happens, we have tons of unnecessary concurrent
      work being done.
      
      Again, the real fix is for udev to not do that (maybe it should use
      threads instead of fork, and have actual shared data structures and not
      cause duplicate work). That real fix is apparently not trivial.
      
      But it turns out that the kernel already has a pretty good model for
      dealing with concurrent access to the same file: the i_writecount of the
      inode.
      
      In fact, module loading already uses 'i_writecount' indirectly,
      because 'kernel_read_file()' does
      
      	ret = deny_write_access(file);
      	if (ret)
      		return ret;
      	...
      	allow_write_access(file);
      
      around the read of the file data.  We do not allow concurrent writes to
      the file, and return -ETXTBSY if the file was open for writing at the
      same time as the module data is loaded from it.
      
      And the solution to the reader concurrency problem is to simply extend
      this "no concurrent writers" logic to be "exclusive access".
      
      Note that "exclusive" in this context isn't really some absolute thing:
      it's only exclusion from writers and from other "special readers" that
      do this writer denial.  So we simply introduce a variation of that
      "deny_write_access()" logic that not only denies write access, but also
      requires that this is the _only_ such access that denies write access.
      
      Which means that you can't start loading a module that is already being
      loaded as a module by somebody else, or you will get the same -ETXTBSY
      error that you would get if there were writers around.
      
      [ It also means that you can't try to load a currently executing
        executable as a module, for the same reason: executables do that same
        "deny_write_access()" thing, and that's obviously where the whole
        ETXTBSY logic traditionally came from.
      
        This is not a problem for kernel modules, since the set of normal
        executable files and kernel module files is entirely disjoint. ]
      
      This new function is called "exclusive_deny_write_access()", and the
      implementation is trivial, in that it's just an atomic decrement of
      i_writecount if it was 0 before.
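A user-space sketch of the i_writecount convention described above, written with C11 atomics. The struct is hypothetical and the semantics are modeled on the description, not copied from the kernel. Writers increment the count. Write-deniers decrement it and may stack. The exclusive variant succeeds only on a single 0 -> -1 transition.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>

/* Hypothetical model of an inode's write counter:
 *   > 0   number of writers
 *   == 0  nobody holds it
 *   < 0   number of "special readers" that deny writes */
struct model_inode {
    atomic_int i_writecount;
};

/* Writer: succeeds unless someone denies writes (count < 0). */
static int get_write_access(struct model_inode *ino)
{
    int c = atomic_load(&ino->i_writecount);
    do {
        if (c < 0)
            return -ETXTBSY;
    } while (!atomic_compare_exchange_weak(&ino->i_writecount, &c, c + 1));
    return 0;
}

/* Classic deny: succeeds unless there is a writer (count > 0);
 * several deniers can stack up (-1, -2, ...). */
static int deny_write_access(struct model_inode *ino)
{
    int c = atomic_load(&ino->i_writecount);
    do {
        if (c > 0)
            return -ETXTBSY;
    } while (!atomic_compare_exchange_weak(&ino->i_writecount, &c, c - 1));
    return 0;
}

/* Exclusive deny: only succeeds if nobody holds the inode at all,
 * i.e. a single 0 -> -1 compare-and-swap. */
static int exclusive_deny_write_access(struct model_inode *ino)
{
    int expected = 0;
    return atomic_compare_exchange_strong(&ino->i_writecount, &expected, -1)
        ? 0 : -ETXTBSY;
}

/* Undo either kind of deny. */
static void allow_write_access(struct model_inode *ino)
{
    int old = atomic_fetch_add(&ino->i_writecount, 1);
    assert(old < 0); /* model sanity check: must have been denied */
}
```

In this model, exclusivity is checked only by the exclusive caller, at the moment it starts. While the count is -1, writers and a second exclusive caller are refused with -ETXTBSY.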
      
      To use that new exclusivity check, all we then do is wrap the module
      loading with that exclusive_deny_write_access() / allow_write_access()
      pair.  The actual patch is a bit bigger than that, because we want to
      surround not just the "load file data" part, but the whole module setup,
      to get maximum exclusion.
      
      So this ends up splitting up "finit_module()" into a few helper
      functions to make it all very clear and legible.
      
      In Luis' test case (bringing up 255 vCPUs in a virtual machine [3]),
      the "wasted vmalloc" space (i.e. module data read into a vmalloc'ed area
      in order to be loaded as a module, but then discarded because somebody
      else loaded the same module instead) dropped from 1.8GiB to 474kB.  Yes,
      that's gigabytes to kilobytes.
      
      It doesn't drop completely to zero, because even with this change, you
      can still end up having completely serial pointless module loads, where
      one udev process has loaded a module fully (and thus the kernel has
      released that exclusive lock on the module file), and then another udev
      process tries to load the same module again.
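A deterministic, single-threaded sketch of the two timelines described above, under a simplified model. All names are hypothetical. A load holds the exclusive write-deny for its whole duration, and a second complete load of the same module fails with -EEXIST, as before this change.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of one module file plus "is it loaded yet?". */
struct model_module_file {
    atomic_int i_writecount;
    bool loaded;
};

static int exclusive_deny_write_access(struct model_module_file *f)
{
    int expected = 0;
    return atomic_compare_exchange_strong(&f->i_writecount, &expected, -1)
        ? 0 : -ETXTBSY;
}

static void allow_write_access(struct model_module_file *f)
{
    atomic_fetch_add(&f->i_writecount, 1);
}

/* Start of a load: take the exclusive hold. Reading the file data and
 * setting up the module would happen while it is held. */
static int begin_load(struct model_module_file *f)
{
    return exclusive_deny_write_access(f);
}

/* End of a load: a duplicate that got this far fails late with -EEXIST. */
static int finish_load(struct model_module_file *f)
{
    int ret = f->loaded ? -EEXIST : 0;
    f->loaded = true;
    allow_write_access(f);
    return ret;
}
```

Two overlapping loaders: the second is turned away at begin_load() with -ETXTBSY before doing any work. A later, serial loader still passes begin_load() and only fails at finish_load(). That is why the wasted space drops sharply but not to zero.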
      
      So while we cannot fully get rid of the fundamental bug in user space,
      we _can_ get rid of the excessive concurrent thundering herd effect.
      
      A couple of final side notes on this all:
      
       - This tweak only affects the "finit_module()" system call, which gives
         the kernel a file descriptor with the module data.
      
         You can also just feed the module data as raw data from user space
         with "init_module()" (note the lack of 'f' at the beginning), and
         obviously for that case we do _not_ have any "exclusive read" logic.
      
         So if you absolutely want to do things wrong in user space, and try
         to load the same module multiple times, and error out only later when
         the kernel ends up saying "you can't load the same module name
         twice", you can still do that.
      
         And in fact, some distros will do exactly that, because they will
         uncompress the kernel module data in user space before feeding it to
         the kernel (mainly because they haven't started using the new kernel
         side decompression yet).
      
         So this is not some absolute "you can't do concurrent loads of the
         same module". It's literally just a very simple heuristic that will
         catch it early in case you try to load the exact same module file at
         the same time, and in that case avoid a potentially nasty situation.
      
       - There is another user of "deny_write_access()": the verity code that
         enables fs-verity on a file (the FS_IOC_ENABLE_VERITY ioctl).
      
         If you use fs-verity and you care about verifying the kernel modules
         (which does make sense), you should do it *before* loading said
         kernel module. That may sound obvious, but now the implementation
         basically requires it. Because if you try to do it concurrently, the
         kernel may refuse to load the module file that is being set up by the
         fs-verity code.
      
       - This all will obviously mean that if you insist on loading the same
         module in parallel, only one module load will succeed, and the others
         will return with an error.
      
         That was true before too, but what is different is that the -ETXTBSY
         error can be returned *before* the success case of another process
         fully loading and instantiating the module.
      
         Again, that might sound obvious, and it is indeed the whole point of
         the whole change: we are much quicker to notice the whole "you're
         already in the process of loading this module".
      
         So it's very much intentional, but it does mean that if you just
         spray the kernel with "finit_module()", and expect that the module is
         immediately loaded afterwards without checking the return value, you
         are doing something horribly horribly wrong.
      
         I'd like to say that that would never happen, but the whole _reason_
         for this commit is that udev is currently doing something horribly
         horribly wrong, so ...
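The notes above come down to "check the return value of finit_module()". A minimal sketch of a caller that does so, assuming Linux headers that define SYS_finit_module; the syscall is invoked through syscall(2). The helper names are hypothetical. With this change, ETXTBSY can mean another load of the same file is still in progress, while EEXIST means the module is already loaded.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: interpret the result of a finit_module() attempt. */
static const char *describe_load_error(int err)
{
    switch (err) {
    case 0:       return "loaded";
    case EEXIST:  return "already loaded";
    case ETXTBSY: return "being loaded by someone else (or open for writing)";
    default:      return strerror(err);
    }
}

/* Hypothetical helper: load a module file via finit_module(2).
 * Returns 0 or an errno value; needs CAP_SYS_MODULE to succeed. */
static int load_module_file(const char *path)
{
#ifdef SYS_finit_module
    int fd = open(path, O_RDONLY | O_CLOEXEC);
    if (fd < 0)
        return errno;
    long ret = syscall(SYS_finit_module, fd, "", 0);
    int err = ret == 0 ? 0 : errno;   /* capture errno before close() */
    close(fd);
    return err;
#else
    (void)path;
    return ENOSYS;
#endif
}
```

A caller that gets ETXTBSY should wait for, or re-check, the module's state rather than assume it is ready.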
      
      Link: https://lore.kernel.org/all/ZEGopJ8VAYnE7LQ2@bombadil.infradead.org/ [1]
      Link: https://lore.kernel.org/all/23bd0ce6-ef78-1cd8-1f21-0e706a00424a@suse.com/ [2]
      Link: https://lore.kernel.org/lkml/ZG%2Fa+nrt4%2FAAUi5z@bombadil.infradead.org/ [3]
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Lucas De Marchi <lucas.demarchi@intel.com>
      Cc: Petr Pavlu <petr.pavlu@suse.com>
      Tested-by: Luis Chamberlain <mcgrof@kernel.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  2. 25 May, 2023 20 commits
    • Merge tag 'vfs/v6.4-rc3/misc.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs · 9db89859
      Linus Torvalds authored
      Pull vfs fixes from Christian Brauner:
      
       - During the ACL rework merged this cycle, the generic_listxattr()
         helper had to be modified in a way that, in principle, would allow
         POSIX ACLs to be reported. At least that was the impression we had
         initially, because before the ACL rework POSIX ACLs would be
         reported if the filesystem had POSIX ACL xattr handlers in
         sb->s_xattr. That logic changed, and now we can simply check whether
         the superblock has SB_POSIXACL set and, if the inode has
         inode->i_{default_}acl set, report the appropriate POSIX ACL name.
      
         However, we didn't realize that generic_listxattr() was only ever
         used by two filesystems. Neither of them supports POSIX ACLs via
         sb->s_xattr handlers, so they never reported POSIX ACLs via
         generic_listxattr(), even if they raised SB_POSIXACL and contained
         inodes with ACLs set. The example here is nfs4.
      
         As a result, generic_listxattr() suddenly started reporting POSIX
         ACLs when it wouldn't have before. Since SB_POSIXACL implies that the
         umask isn't stripped in the VFS, nfs4 can't just drop SB_POSIXACL
         from the superblock, as that would also alter its umask handling.
      
         So just have generic_listxattr() not report POSIX ACLs as it never
         did anyway. It's documented as such.
      
       - Our SB_* flags currently use a signed integer, and shifting into the
         last (sign) bit causes UBSAN to complain about undefined behavior.
         Switch to using unsigned. While the original patch used an explicit
         unsigned bitshift, it's now common to rely on the BIT() macro in a
         lot of headers, so the patch has been adjusted to use that.
      
       - Add Namjae as ntfs reviewer. They're already active this cycle, so
         let's make it explicit now.
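A minimal illustration of the SB_NOUSER shift fix described in the second item above. BIT() is defined locally as a stand-in for the kernel helper, which shifts an unsigned long. The old signed form appears only in a comment, because evaluating 1 << 31 on a 32-bit int is undefined behavior in C.

```c
#include <assert.h>

/* Local stand-in for the kernel's BIT() helper (unsigned long shift). */
#define BIT(nr) (1UL << (nr))

/* Old, signed form (not evaluated here, since it is UB for 32-bit int):
 *     #define SB_NOUSER (1 << 31)
 * New, unsigned form: */
#define SB_NOUSER BIT(31)

_Static_assert(SB_NOUSER == 0x80000000UL, "SB_NOUSER must be bit 31");

/* s_flags is modeled as unsigned long, so masking stays unsigned. */
static int has_nouser(unsigned long s_flags)
{
    return (s_flags & SB_NOUSER) != 0;
}
```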
      
      * tag 'vfs/v6.4-rc3/misc.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
        ntfs: Add myself as a reviewer
        fs: don't call posix_acl_listxattr in generic_listxattr
        fs: fix undefined behavior in bit shift for SB_NOUSER
    • Merge tag 'net-6.4-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 50fb587e
      Linus Torvalds authored
      Pull networking fixes from Paolo Abeni:
       "Including fixes from bluetooth and bpf.
      
        Current release - regressions:
      
         - net: fix skb leak in __skb_tstamp_tx()
      
         - eth: mtk_eth_soc: fix QoS on DSA MAC on non MTK_NETSYS_V2 SoCs
      
        Current release - new code bugs:
      
         - handshake:
            - fix sock->file allocation
            - fix handshake_dup() ref counting
      
         - bluetooth:
            - fix potential double free caused by hci_conn_unlink
            - fix UAF in hci_conn_hash_flush
      
        Previous releases - regressions:
      
         - core: fix stack overflow when LRO is disabled for virtual
           interfaces
      
         - tls: fix strparser rx issues
      
         - bpf:
            - fix many sockmap/TCP related issues
            - fix a memory leak in the LRU and LRU_PERCPU hash maps
            - init the offload table earlier
      
         - eth: mlx5e:
            - do as little as possible in napi poll when budget is 0
            - fix using eswitch mapping in nic mode
            - fix deadlock in tc route query code
      
        Previous releases - always broken:
      
         - udplite: fix NULL pointer dereference in __sk_mem_raise_allocated()
      
         - raw: fix output xfrm lookup wrt protocol
      
         - smc: reset connection when trying to use SMCRv2 fails
      
         - phy: mscc: enable VSC8501/2 RGMII RX clock
      
         - eth: octeontx2-pf: fix TSOv6 offload
      
         - eth: cdc_ncm: deal with too low values of dwNtbOutMaxSize"
      
      * tag 'net-6.4-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (79 commits)
        udplite: Fix NULL pointer dereference in __sk_mem_raise_allocated().
        net: phy: mscc: enable VSC8501/2 RGMII RX clock
        net: phy: mscc: remove unnecessary phydev locking
        net: phy: mscc: add support for VSC8501
        net: phy: mscc: add VSC8502 to MODULE_DEVICE_TABLE
        net/handshake: Enable the SNI extension to work properly
        net/handshake: Unpin sock->file if a handshake is cancelled
        net/handshake: handshake_genl_notify() shouldn't ignore @flags
        net/handshake: Fix uninitialized local variable
        net/handshake: Fix handshake_dup() ref counting
        net/handshake: Remove unneeded check from handshake_dup()
        ipv6: Fix out-of-bounds access in ipv6_find_tlv()
        net: ethernet: mtk_eth_soc: fix QoS on DSA MAC on non MTK_NETSYS_V2 SoCs
        docs: netdev: document the existence of the mail bot
        net: fix skb leak in __skb_tstamp_tx()
        r8169: Use a raw_spinlock_t for the register locks.
        page_pool: fix inconsistency for page_pool_ring_[un]lock()
        bpf, sockmap: Test progs verifier error with latest clang
        bpf, sockmap: Test FIONREAD returns correct bytes in rx buffer with drops
        bpf, sockmap: Test FIONREAD returns correct bytes in rx buffer
        ...
    • Merge tag 'for-v6.4-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply · eb03e318
      Linus Torvalds authored
      Pull power supply fixes from Sebastian Reichel:
      
       - Fix power_supply_get_battery_info for devices without parent devices
         resulting in NULL pointer dereference
      
       - Fix desktop systems reporting that they run on battery once a
         power-supply device with device scope appears (e.g. a HID keyboard
         with a battery)
      
       - Ratelimit debug print about driver not providing data
      
       - Fix race condition related to external_power_changed in multiple
         drivers (ab8500, axp288, bq25890, sc27xx, bq27xxx)
      
       - Fix LED trigger switching from blinking to solid-on when charging
         finishes
      
       - Fix multiple races in bq27xxx battery driver
      
       - mt6360: handle potential ENOMEM from devm_work_autocancel
      
       - sbs-charger: Fix SBS_CHARGER_STATUS_CHARGE_INHIBITED bit
      
       - rt9467: avoid passing 0 to dev_err_probe
      
      * tag 'for-v6.4-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply: (21 commits)
        power: supply: Fix logic checking if system is running from battery
        power: supply: mt6360: add a check of devm_work_autocancel in mt6360_charger_probe
        power: supply: sbs-charger: Fix INHIBITED bit for Status reg
        power: supply: rt9467: Fix passing zero to 'dev_err_probe'
        power: supply: Ratelimit no data debug output
        power: supply: Fix power_supply_get_battery_info() if parent is NULL
        power: supply: bq24190: Call power_supply_changed() after updating input current
        power: supply: bq25890: Call power_supply_changed() after updating input current or voltage
        power: supply: bq27xxx: Use mod_delayed_work() instead of cancel() + schedule()
        power: supply: bq27xxx: After charger plug in/out wait 0.5s for things to stabilize
        power: supply: bq27xxx: Ensure power_supply_changed() is called on current sign changes
        power: supply: bq27xxx: Move bq27xxx_battery_update() down
        power: supply: bq27xxx: Add cache parameter to bq27xxx_battery_current_and_status()
        power: supply: bq27xxx: Fix poll_interval handling and races on remove
        power: supply: bq27xxx: Fix I2C IRQ race on remove
        power: supply: bq27xxx: Fix bq27xxx_battery_update() race condition
        power: supply: leds: Fix blink to LED on transition
        power: supply: sc27xx: Fix external_power_changed race
        power: supply: bq25890: Fix external_power_changed race
        power: supply: axp288_fuel_gauge: Fix external_power_changed race
        ...
    • Merge tag 'sound-6.4-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound · 029c77f8
      Linus Torvalds authored
      Pull sound fixes from Takashi Iwai:
       "A collection of small fixes:
      
         - HD-audio runtime PM bug fix
      
         - A couple of HD-audio quirks
      
         - A series of fixes for the ASoC Intel AVS drivers
      
         - ASoC DPCM fix for a bug found on new Intel systems
      
         - A few other ASoC device-specific small fixes"
      
      * tag 'sound-6.4-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
        ALSA: hda/realtek: Enable headset onLenovo M70/M90
        ASoC: dwc: move DMA init to snd_soc_dai_driver probe()
        ASoC: cs35l41: Fix default regmap values for some registers
        ALSA: hda: Fix unhandled register update during auto-suspend period
        ASoC: dt-bindings: tlv320aic32x4: Fix supply names
        ASoC: Intel: avs: Add missing checks on FE startup
        ASoC: Intel: avs: Fix avs_path_module::instance_id size
        ASoC: Intel: avs: Account for UID of ACPI device
        ASoC: Intel: avs: Fix declaration of enum avs_channel_config
        ASoC: Intel: Skylake: Fix declaration of enum skl_ch_cfg
        ASoC: Intel: avs: Access path components under lock
        ASoC: Intel: avs: Fix module lookup
        ALSA: hda/ca0132: add quirk for EVGA X299 DARK
        ASoC: soc-pcm: test if a BE can be prepared
        ASoC: rt5682: Disable jack detection interrupt during suspend
        ASoC: lpass: Fix for KASAN use_after_free out of bounds
    • Merge tag 'platform-drivers-x86-v6.4-3' of... · ecea3ba2
      Linus Torvalds authored
      Merge tag 'platform-drivers-x86-v6.4-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86
      
      Pull x86 platform driver fixes from Hans de Goede:
       "Nothing special to report just a few small fixes"
      
      * tag 'platform-drivers-x86-v6.4-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
        platform/x86/intel/ifs: Annotate work queue on stack so object debug does not complain
        platform/x86: ISST: Remove 8 socket limit
        platform/mellanox: mlxbf-pmc: fix sscanf() error checking
        platform/x86/amd/pmf: Fix CnQF and auto-mode after resume
        platform/x86: asus-wmi: Ignore WMI events with codes 0x7B, 0xC0
    • Merge tag 'm68k-for-v6.4-tag2' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k · 5566051f
      Linus Torvalds authored
      Pull m68k fix from Geert Uytterhoeven:
      
       - Fix signal frame issue causing user-space crashes on 68020/68030
      
      * tag 'm68k-for-v6.4-tag2' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k:
        m68k: Move signal frame following exception on 68020/030
    • udplite: Fix NULL pointer dereference in __sk_mem_raise_allocated(). · ad42a35b
      Kuniyuki Iwashima authored
      syzbot reported [0] a null-ptr-deref in sk_get_rmem0() while using
      IPPROTO_UDPLITE (0x88):
      
        14:25:52 executing program 1:
        r0 = socket$inet6(0xa, 0x80002, 0x88)
      
      We had a similar report [1], probably for sk_memory_allocated_add()
      in __sk_mem_raise_allocated(), and commit c915fe13 ("udplite: fix
      NULL pointer dereference") fixed it by setting .memory_allocated for
      udplite_prot and udplitev6_prot.
      
      To fix the variant, we need to set either .sysctl_wmem_offset or
      .sysctl_rmem.
      
      Now UDP and UDPLITE share the same value for .memory_allocated, so we
      use the same .sysctl_wmem_offset for UDP and UDPLITE.
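A user-space model of the crash site. It assumes sk_get_rmem0() takes the receive-buffer minimum from a per-netns field located through an offset stored in struct proto and, if no offset is set, falls back to proto->sysctl_rmem[0]. A proto with neither field set therefore dereferences NULL. Every struct and function name below is hypothetical. Where the kernel would crash, the model returns -1.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct net and struct proto. */
struct model_net {
    int other_sysctl;   /* keeps the UDP field off offset 0 (0 = unset) */
    int udp_rmem_min;
};

struct model_proto {
    const char *name;
    const int *sysctl_rmem;           /* global array, may be NULL */
    unsigned int sysctl_rmem_offset;  /* per-netns field; 0 means unset */
};

/* Lookup order assumed above: per-netns offset first, then the global
 * array. -1 marks the case where the kernel would dereference NULL. */
static int model_get_rmem0(const struct model_net *net,
                           const struct model_proto *p)
{
    if (p->sysctl_rmem_offset)
        return *(const int *)((const char *)net + p->sysctl_rmem_offset);
    if (p->sysctl_rmem)
        return p->sysctl_rmem[0];
    return -1;
}
```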
      
      [0]:
      general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] PREEMPT SMP KASAN
      KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
      CPU: 0 PID: 6829 Comm: syz-executor.1 Not tainted 6.4.0-rc2-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 04/28/2023
      RIP: 0010:sk_get_rmem0 include/net/sock.h:2907 [inline]
      RIP: 0010:__sk_mem_raise_allocated+0x806/0x17a0 net/core/sock.c:3006
      Code: c1 ea 03 80 3c 02 00 0f 85 23 0f 00 00 48 8b 44 24 08 48 8b 98 38 01 00 00 48 b8 00 00 00 00 00 fc ff df 48 89 da 48 c1 ea 03 <0f> b6 14 02 48 89 d8 83 e0 07 83 c0 03 38 d0 0f 8d 6f 0a 00 00 8b
      RSP: 0018:ffffc90005d7f450 EFLAGS: 00010246
      RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffc90004d92000
      RDX: 0000000000000000 RSI: ffffffff88066482 RDI: ffffffff8e2ccbb8
      RBP: ffff8880173f7000 R08: 0000000000000005 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000030000
      R13: 0000000000000001 R14: 0000000000000340 R15: 0000000000000001
      FS:  0000000000000000(0000) GS:ffff8880b9800000(0063) knlGS:00000000f7f1cb40
      CS:  0010 DS: 002b ES: 002b CR0: 0000000080050033
      CR2: 000000002e82f000 CR3: 0000000034ff0000 CR4: 00000000003506f0
      Call Trace:
       <TASK>
       __sk_mem_schedule+0x6c/0xe0 net/core/sock.c:3077
       udp_rmem_schedule net/ipv4/udp.c:1539 [inline]
       __udp_enqueue_schedule_skb+0x776/0xb30 net/ipv4/udp.c:1581
       __udpv6_queue_rcv_skb net/ipv6/udp.c:666 [inline]
       udpv6_queue_rcv_one_skb+0xc39/0x16c0 net/ipv6/udp.c:775
       udpv6_queue_rcv_skb+0x194/0xa10 net/ipv6/udp.c:793
       __udp6_lib_mcast_deliver net/ipv6/udp.c:906 [inline]
       __udp6_lib_rcv+0x1bda/0x2bd0 net/ipv6/udp.c:1013
       ip6_protocol_deliver_rcu+0x2e7/0x1250 net/ipv6/ip6_input.c:437
       ip6_input_finish+0x150/0x2f0 net/ipv6/ip6_input.c:482
       NF_HOOK include/linux/netfilter.h:303 [inline]
       NF_HOOK include/linux/netfilter.h:297 [inline]
       ip6_input+0xa0/0xd0 net/ipv6/ip6_input.c:491
       ip6_mc_input+0x40b/0xf50 net/ipv6/ip6_input.c:585
       dst_input include/net/dst.h:468 [inline]
       ip6_rcv_finish net/ipv6/ip6_input.c:79 [inline]
       NF_HOOK include/linux/netfilter.h:303 [inline]
       NF_HOOK include/linux/netfilter.h:297 [inline]
       ipv6_rcv+0x250/0x380 net/ipv6/ip6_input.c:309
       __netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5491
       __netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5605
       netif_receive_skb_internal net/core/dev.c:5691 [inline]
       netif_receive_skb+0x133/0x7a0 net/core/dev.c:5750
       tun_rx_batched+0x4b3/0x7a0 drivers/net/tun.c:1553
       tun_get_user+0x2452/0x39c0 drivers/net/tun.c:1989
       tun_chr_write_iter+0xdf/0x200 drivers/net/tun.c:2035
       call_write_iter include/linux/fs.h:1868 [inline]
       new_sync_write fs/read_write.c:491 [inline]
       vfs_write+0x945/0xd50 fs/read_write.c:584
       ksys_write+0x12b/0x250 fs/read_write.c:637
       do_syscall_32_irqs_on arch/x86/entry/common.c:112 [inline]
       __do_fast_syscall_32+0x65/0xf0 arch/x86/entry/common.c:178
       do_fast_syscall_32+0x33/0x70 arch/x86/entry/common.c:203
       entry_SYSENTER_compat_after_hwframe+0x70/0x82
      RIP: 0023:0xf7f21579
      Code: b8 01 10 06 03 74 b4 01 10 07 03 74 b0 01 10 08 03 74 d8 01 00 00 00 00 00 00 00 00 00 00 00 00 00 51 52 55 89 e5 0f 34 cd 80 <5d> 5a 59 c3 90 90 90 90 8d b4 26 00 00 00 00 8d b4 26 00 00 00 00
      RSP: 002b:00000000f7f1c590 EFLAGS: 00000282 ORIG_RAX: 0000000000000004
      RAX: ffffffffffffffda RBX: 00000000000000c8 RCX: 0000000020000040
      RDX: 0000000000000083 RSI: 00000000f734e000 RDI: 0000000000000000
      RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000296 R12: 0000000000000000
      R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
       </TASK>
      Modules linked in:
      
      Link: https://lore.kernel.org/netdev/CANaxB-yCk8hhP68L4Q2nFOJht8sqgXGGQO2AftpHs0u1xyGG5A@mail.gmail.com/ [1]
      Fixes: 850cbadd ("udp: use it's own memory accounting schema")
      Reported-by: syzbot+444ca0907e96f7c5e48b@syzkaller.appspotmail.com
      Closes: https://syzkaller.appspot.com/bug?extid=444ca0907e96f7c5e48b
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Link: https://lore.kernel.org/r/20230523163305.66466-1-kuniyu@amazon.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • Merge branch 'net-phy-mscc-support-vsc8501' · aa015a20
      Jakub Kicinski authored
      David Epping says:
      
      ====================
      net: phy: mscc: support VSC8501
      
      This updated series of patches adds support for the VSC8501 Ethernet
      PHY and fixes support for the VSC8502 PHY in cases where no other
      software (like U-Boot) has initialized the PHY after power up.
      
      The first patch simply adds the VSC8502 to the MODULE_DEVICE_TABLE,
      where I guess it was unintentionally missing. I have no hardware to
      test my change.
      
      The second patch adds the VSC8501 PHY with exactly the same driver
      implementation as the existing VSC8502.
      
      The (new) third patch removes phydev locking from
      vsc85xx_rgmii_set_skews(), as discussed for v2 of the patch set.
      
      The (now) fourth patch fixes the initialization for VSC8501 and VSC8502.
      I have tested this patch with VSC8501 on hardware in RGMII mode only.
      The relevant setting is bit 11 in Table 4-42, "RGMII CONTROL, ADDRESS
      20E2 (0X14)", of each datasheet:
      https://ww1.microchip.com/downloads/aemDocuments/documents/UNG/ProductDocuments/DataSheets/VSC8501-03_Datasheet_60001741A.PDF
      https://ww1.microchip.com/downloads/aemDocuments/documents/UNG/ProductDocuments/DataSheets/VSC8502-03_Datasheet_60001742B.pdf
      By default, RX_CLK is disabled on these PHYs. Unless other software,
      such as U-Boot, has already enabled the clock, no received packets
      are handed to the MAC. The patch enables this clock output.
      According to Microchip support (case number 01268776) this applies
      to all modes (RGMII, GMII, and MII).
      
      Other PHYs sharing the same register map and code, such as
      VSC8530/31/40/41, have the clock enabled, and bit 11 is reserved and
      read-only on them. As per the previous discussion, the patch clears
      the bit on these PHYs too, which may make it easier to support future
      PHYs that implement this functionality.
      
      The VSC8572 family of PHYs has a different register map, so no such
      changes are applied to it.
      ====================
      
      Link: https://lore.kernel.org/r/20230523153108.18548-1-david.epping@missinglinkelectronics.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: phy: mscc: enable VSC8501/2 RGMII RX clock · 71460c9e
      David Epping authored
      By default, the VSC8501 and VSC8502 RGMII/GMII/MII RX_CLK output is
      disabled. It needs to be enabled for received packets to be forwarded
      to the MAC.
      
      For other PHYs supported by this driver the clock output is enabled
      by default.
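
      A minimal sketch of the register update this implies, modelled in plain C rather than against the kernel PHY API. It assumes, from the series cover letter, that the control is bit 11 of the RGMII CONTROL register (register 20, 0x14, on extended page 2, which is how we read the datasheet label "20E2") and that a set bit disables RX_CLK. RGMII_RX_CLK_DISABLE, modify_reg() and enable_rx_clk() are hypothetical names; modify_reg() only mirrors the clear-mask-then-set semantics of phylib's phy_modify().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; the bit position comes from the cover letter (bit 11). */
#define BIT(n)               (1U << (n))
#define RGMII_RX_CLK_DISABLE BIT(11)   /* assumed: set = RX_CLK output disabled */

/*
 * Read-modify-write with phy_modify()-style semantics:
 * clear every bit in @mask, then set every bit in @set.
 */
static uint16_t modify_reg(uint16_t old, uint16_t mask, uint16_t set)
{
	return (uint16_t)((old & ~mask) | set);
}

/* Enable RX_CLK by clearing the disable bit; all other bits are preserved. */
static uint16_t enable_rx_clk(uint16_t rgmii_cntl)
{
	return modify_reg(rgmii_cntl, RGMII_RX_CLK_DISABLE, 0);
}
```

      On the VSC8530/31/40/41, where the cover letter says bit 11 is reserved and read-only, the same clear request would leave the hardware register unchanged.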
      
      Fixes: d3169863 ("net: phy: mscc: add support for VSC8502")
      Signed-off-by: David Epping <david.epping@missinglinkelectronics.com>
      Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
      Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: phy: mscc: remove unnecessary phydev locking · 7df0b33d
      David Epping authored
      Holding the struct phy_device (phydev) lock is unnecessary when
      accessing phydev->interface in the PHY driver .config_init method,
      which is the only place that vsc85xx_rgmii_set_skews() is called from.
      
      The phy_modify_paged() function implements the required MDIO bus level
      locking, which cannot be achieved with a phydev lock.
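
      A minimal user-space model of why the bus lock is the one that matters, assuming (as in phylib, where phy_read() goes through mdiobus_read()) that every register accessor serializes on a per-bus lock rather than on phydev->lock. Holding that lock across the select-page, modify and restore-page steps keeps any other access from running while the temporary page is selected; a lock that the other accessors never take could not guarantee this. All types and functions here (struct sim_bus, paged_modify() and so on) are hypothetical and only mimic the shape of phy_modify_paged().

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

#define NPAGES 4
#define NREGS  32

/* Hypothetical model of one PHY with paged registers. */
struct sim_phy {
	uint16_t regs[NPAGES][NREGS];
	int page;                     /* currently selected register page */
};

/* Hypothetical model of an MDIO bus: one lock shared by every accessor. */
struct sim_bus {
	pthread_mutex_t lock;
	struct sim_phy phy;
};

/* Plain read on whatever page is selected; takes the bus lock. */
static uint16_t bus_read(struct sim_bus *bus, int reg)
{
	assert(reg >= 0 && reg < NREGS);
	pthread_mutex_lock(&bus->lock);
	uint16_t v = bus->phy.regs[bus->phy.page][reg];
	pthread_mutex_unlock(&bus->lock);
	return v;
}

/*
 * Paged read-modify-write. The bus lock is held across the whole
 * select-page / modify / restore-page sequence, so no other bus access
 * can run while the temporary page is selected.
 */
static void paged_modify(struct sim_bus *bus, int page, int reg,
			 uint16_t mask, uint16_t set)
{
	assert(page >= 0 && page < NPAGES && reg >= 0 && reg < NREGS);
	pthread_mutex_lock(&bus->lock);
	int saved = bus->phy.page;
	bus->phy.page = page;
	uint16_t *r = &bus->phy.regs[page][reg];
	*r = (uint16_t)((*r & ~mask) | set);
	bus->phy.page = saved;
	pthread_mutex_unlock(&bus->lock);
}
```

      The guarantee only holds because bus_read() goes through the same lock; that shared lock, not a per-device one, is what the commit relies on.
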
      Signed-off-by: David Epping <david.epping@missinglinkelectronics.com>
      Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: phy: mscc: add support for VSC8501 · fb055ce4
      David Epping authored
      The VSC8501 PHY can use the same driver implementation as the VSC8502.
      Adding the PHY ID and copying the handler functions of VSC8502 is
      sufficient to operate it.
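
      A sketch of how a PHY driver table can list two IDs that share one set of handlers, modelled in plain C. It assumes the usual phylib matching rule, (id & mask) == (drv_id & mask), with a mask that ignores the 4-bit revision field. The ID values are placeholders, not the real VSC8501/VSC8502 IDs, and struct sim_phy_driver and match_driver() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder IDs for illustration; they are NOT the real VSC8501/VSC8502 IDs. */
#define SIM_ID_VSC8501 0x00001230u
#define SIM_ID_VSC8502 0x00001240u
#define SIM_ID_MASK    0xfffffff0u   /* ignore the 4-bit revision field */

struct sim_phy_driver {
	uint32_t phy_id;
	uint32_t phy_id_mask;
	const char *name;
	int (*config_init)(void);
};

/* One implementation shared by both entries, as the commit describes. */
static int vsc8502_like_config_init(void)
{
	return 0;
}

static const struct sim_phy_driver sim_drivers[] = {
	{ SIM_ID_VSC8501, SIM_ID_MASK, "VSC8501", vsc8502_like_config_init },
	{ SIM_ID_VSC8502, SIM_ID_MASK, "VSC8502", vsc8502_like_config_init },
};

/* Return the first driver whose masked ID matches, or NULL. */
static const struct sim_phy_driver *match_driver(uint32_t id)
{
	for (size_t i = 0; i < sizeof(sim_drivers) / sizeof(sim_drivers[0]); i++) {
		const struct sim_phy_driver *d = &sim_drivers[i];

		if ((id & d->phy_id_mask) == (d->phy_id & d->phy_id_mask))
			return d;
	}
	return NULL;
}
```
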
      Signed-off-by: David Epping <david.epping@missinglinkelectronics.com>
      Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: phy: mscc: add VSC8502 to MODULE_DEVICE_TABLE · 57fb54ab
      David Epping authored
      The mscc driver implements support for VSC8502, so its ID should be in
      the MODULE_DEVICE_TABLE for automatic loading.
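
      A sketch of the module alias that a MODULE_DEVICE_TABLE(mdio, ...) entry turns into, as we understand the format generated by scripts/mod/file2alias.c: "mdio:" followed by the 32 ID bits, most significant first, with '?' for bits the mask ignores. Without an entry for the VSC8502, no alias covers its ID, so nothing triggers automatic loading of the mscc module for it. mdio_alias() is a hypothetical helper and the IDs in the tests are placeholders.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Build "mdio:" plus a 32-character bit pattern for (id, mask), MSB first:
 * '1' or '0' where the mask bit is set, '?' where it is clear.
 * @out must hold at least 5 + 32 + 1 bytes.
 */
static void mdio_alias(uint32_t id, uint32_t mask, char *out)
{
	strcpy(out, "mdio:");
	char *p = out + 5;

	for (int i = 31; i >= 0; i--) {
		if (!((mask >> i) & 1u))
			*p++ = '?';
		else
			*p++ = ((id >> i) & 1u) ? '1' : '0';
	}
	*p = '\0';
}
```
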
      Signed-off-by: David Epping <david.epping@missinglinkelectronics.com>
      Fixes: d3169863 ("net: phy: mscc: add support for VSC8502")
      Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge branch 'bug-fixes-for-net-handshake' · 1de5900c
      Jakub Kicinski authored
      Chuck Lever says:
      
      ====================
      Bug fixes for net/handshake
      
      Paolo observed that there is a possible leak of sock->file. I
      haven't looked into that yet, but it seems to be separate from
      the fixes in this series, so there is no need to hold these up.
      ====================
      
      The submission mentions net-next, but it means netdev (perhaps a
      leftover from the merge window, when the trees are converged). In any
      case, it should have gone into net, but was instead applied to net-next
      as commit deb2e484 ("Merge branch 'net-handshake-fixes'").
      These are fixes, though, and Chuck needs them to make progress with
      the client, so they are being double-merged into net... it is what it is :(
      
      Link: https://lore.kernel.org/r/168381978252.84244.1933636428135211300.stgit@91.116.238.104.host.secureserver.net
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net/handshake: Enable the SNI extension to work properly · 26fb5480
      Chuck Lever authored
      Enable the upper layer protocol to specify the SNI peername. This
      avoids the need for tlshd to use a DNS lookup, which can return a
      hostname that doesn't match the incoming certificate's SubjectName.
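
      A sketch of the name-selection policy described here, assuming the upper layer's chosen name travels with the handshake request and that the user-space agent (tlshd) falls back to a DNS lookup of the peer address only when no name is supplied. struct sim_tls_args, choose_sni() and fake_reverse_lookup() are hypothetical stand-ins, not the kernel's net/handshake API or tlshd's code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical request arguments handed from an in-kernel consumer to user space. */
struct sim_tls_args {
	const char *peername;   /* SNI name chosen by the upper-layer protocol, or NULL */
};

typedef const char *(*lookup_fn)(const char *peer_addr);

/* Hypothetical DNS lookup whose answer need not match the server certificate. */
static const char *fake_reverse_lookup(const char *peer_addr)
{
	(void)peer_addr;
	return "vm42.internal.example";
}

/* Prefer the upper layer's name; fall back to a lookup only if none was given. */
static const char *choose_sni(const struct sim_tls_args *args,
			      const char *peer_addr, lookup_fn lookup)
{
	if (args->peername != NULL && args->peername[0] != '\0')
		return args->peername;
	return lookup(peer_addr);
}
```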
      
      Fixes: 2fd55320 ("net/handshake: Add a kernel API for requesting a TLSv1.3 handshake")
      Reviewed-by: Simon Horman <simon.horman@corigine.com>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net/handshake: Unpin sock->file if a handshake is cancelled · 1ce77c99
      Chuck Lever authored
      If user space never calls DONE, sock->file's reference count remains
      elevated. Enable sock->file to be freed eventually in this case.
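
      A minimal model of the reference counting involved, assuming submission takes one reference on the file and that exactly one of DONE or cancellation should drop it. A "completed" flag decides which path owns the put; in the kernel that decision has to be atomic because DONE and cancel can race, which this single-threaded sketch does not show. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

struct sim_file { int refcount; };

struct sim_req {
	struct sim_file *file;
	bool completed;   /* set by whichever of DONE/cancel runs first */
};

static void sim_get_file(struct sim_file *f) { f->refcount++; }

static void sim_fput(struct sim_file *f)
{
	assert(f->refcount > 0);   /* catches puts that would underflow the count */
	f->refcount--;
}

/* Submission pins the file for the lifetime of the request. */
static void req_submit(struct sim_req *req, struct sim_file *f)
{
	req->file = f;
	req->completed = false;
	sim_get_file(f);
}

/* Returns true if this caller is the first to finish the request. */
static bool req_mark_completed(struct sim_req *req)
{
	bool was = req->completed;

	req->completed = true;
	return !was;
}

/* Normal completion: user space called DONE. */
static void req_done(struct sim_req *req)
{
	if (req_mark_completed(req))
		sim_fput(req->file);
}

/* Cancellation must also unpin, or the reference leaks when DONE never comes. */
static bool req_cancel(struct sim_req *req)
{
	if (!req_mark_completed(req))
		return false;   /* DONE already ran and dropped the reference */
	sim_fput(req->file);
	return true;
}
```
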
      Reported-by: Jakub Kicinski <kuba@kernel.org>
      Fixes: 3b3009ea ("net/handshake: Create a NETLINK service for handling handshake requests")
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net/handshake: handshake_genl_notify() shouldn't ignore @flags · fc490880
      Chuck Lever authored
      Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
      Fixes: 3b3009ea ("net/handshake: Create a NETLINK service for handling handshake requests")
      Reviewed-by: Simon Horman <simon.horman@corigine.com>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net/handshake: Fix uninitialized local variable · 7afc6d0a
      Chuck Lever authored
      trace_handshake_cmd_done_err() simply records the pointer in @req,
      so initializing it to NULL is sufficient and safe.
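
      A sketch of the pattern, assuming an error path that can jump to the trace call before @req has been looked up. Because the tracepoint only stores the pointer value and never dereferences it, a NULL initializer gives it a defined value to record. The handler, the request table and the tracepoint stand-in below are all hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct sim_req { int fd; };

/* Hypothetical tracepoint stand-in: records the pointer, never dereferences it. */
static const struct sim_req *last_traced_req;
static int last_traced_err;

static void trace_cmd_done_err(const struct sim_req *req, int err)
{
	last_traced_req = req;
	last_traced_err = err;
}

static struct sim_req pending = { 7 };

/* Hypothetical DONE handler whose first failure happens before @req is found. */
static int cmd_done(int fd)
{
	struct sim_req *req = NULL;   /* the fix: the error path now records a defined value */
	int err;

	if (fd < 0) {
		err = -EBADF;
		goto out_status;
	}
	req = &pending;
	if (req->fd != fd) {
		err = -ENOENT;
		goto out_status;
	}
	return 0;

out_status:
	trace_cmd_done_err(req, err);
	return err;
}
```
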
      Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
      Fixes: 3b3009ea ("net/handshake: Create a NETLINK service for handling handshake requests")
      Reviewed-by: Simon Horman <simon.horman@corigine.com>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net/handshake: Fix handshake_dup() ref counting · 7ea9c1ec
      Chuck Lever authored
      If get_unused_fd_flags() fails, we end up calling fput(sock->file)
      twice.
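
      A sketch of the corrected ownership rule, modelled with a plain counter: the helper that takes the extra file reference is the only one that drops it when no descriptor is available, and its caller must not drop it again. sim_dup(), sim_get_unused_fd() and the free-slot counter are hypothetical stand-ins for handshake_dup(), get_unused_fd_flags() and the file descriptor table.

```c
#include <assert.h>
#include <errno.h>

struct sim_file { int refcount; };

static int sim_free_fd_slots;   /* hypothetical: how many descriptors are available */

static void sim_get_file(struct sim_file *f) { f->refcount++; }

static void sim_fput(struct sim_file *f)
{
	assert(f->refcount > 0);   /* catches puts that would underflow the count */
	f->refcount--;
}

static int sim_get_unused_fd(void)
{
	if (sim_free_fd_slots <= 0)
		return -EMFILE;
	sim_free_fd_slots--;
	return 3;   /* any valid descriptor number */
}

/*
 * Take a reference for the new descriptor. If no descriptor is available,
 * drop that reference here and return the error; the caller must not
 * call fput() again on this path.
 */
static int sim_dup(struct sim_file *f)
{
	int fd;

	sim_get_file(f);
	fd = sim_get_unused_fd();
	if (fd < 0) {
		sim_fput(f);
		return fd;
	}
	/* In the kernel, fd_install() would hand the reference to the fd table here. */
	return fd;
}
```
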
      Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
      Suggested-by: Paolo Abeni <pabeni@redhat.com>
      Fixes: 3b3009ea ("net/handshake: Create a NETLINK service for handling handshake requests")
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net/handshake: Remove unneeded check from handshake_dup() · a095326e
      Chuck Lever authored
      handshake_req_submit() now verifies that the socket has a file.
      
      Fixes: 3b3009ea ("net/handshake: Create a NETLINK service for handling handshake requests")
      Reviewed-by: Simon Horman <simon.horman@corigine.com>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · 0c615f1c
      Jakub Kicinski authored
      Daniel Borkmann says:
      
      ====================
      pull-request: bpf 2023-05-24
      
      We've added 19 non-merge commits during the last 10 day(s) which contain
      a total of 20 files changed, 738 insertions(+), 448 deletions(-).
      
      The main changes are:
      
      1) Batch of BPF sockmap fixes found when running against NGINX TCP tests,
         from John Fastabend.
      
      2) Fix a memleak in the LRU{,_PERCPU} hash map when bucket locking fails,
         from Anton Protopopov.
      
      3) Init the BPF offload table earlier than just late_initcall,
         from Jakub Kicinski.
      
      4) Fix ctx access mask generation for 32-bit narrow loads of 64-bit fields,
         from Will Deacon.
      
      5) Remove a now unsupported __fallthrough in BPF samples,
         from Andrii Nakryiko.
      
      6) Fix a typo in pkg-config call for building sign-file,
         from Jeremy Sowden.
      
      * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
        bpf, sockmap: Test progs verifier error with latest clang
        bpf, sockmap: Test FIONREAD returns correct bytes in rx buffer with drops
        bpf, sockmap: Test FIONREAD returns correct bytes in rx buffer
        bpf, sockmap: Test shutdown() correctly exits epoll and recv()=0
        bpf, sockmap: Build helper to create connected socket pair
        bpf, sockmap: Pull socket helpers out of listen test for general use
        bpf, sockmap: Incorrectly handling copied_seq
        bpf, sockmap: Wake up polling after data copy
        bpf, sockmap: TCP data stall on recv before accept
        bpf, sockmap: Handle fin correctly
        bpf, sockmap: Improved check for empty queue
        bpf, sockmap: Reschedule is now done through backlog
        bpf, sockmap: Convert schedule_work into delayed_work
        bpf, sockmap: Pass skb ownership through read_skb
        bpf: fix a memory leak in the LRU and LRU_PERCPU hash maps
        bpf: Fix mask generation for 32-bit narrow loads of 64-bit fields
        samples/bpf: Drop unnecessary fallthrough
        bpf: netdev: init the offload table earlier
        selftests/bpf: Fix pkg-config call building sign-file
      ====================
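
      Item 4 above can be illustrated with a small model of the eBPF ALU semantics involved. As we read the upstream fix (which this pull request only summarizes), a 32-bit narrow load of a 64-bit context field was masked with a 64-bit AND whose 32-bit immediate 0xffffffff is sign-extended to all ones, so the upper half of the field survived; a 32-bit AND, which leaves the upper half of the destination zero, gives the intended result. The helpers below are hypothetical and only emulate the two instruction forms in C.

```c
#include <assert.h>
#include <stdint.h>

/* ALU64 AND with an immediate: the 32-bit immediate is sign-extended to 64 bits. */
static uint64_t alu64_and_imm(uint64_t dst, uint32_t imm)
{
	uint64_t ext = (imm & 0x80000000u) ? (0xffffffff00000000ull | imm) : imm;

	return dst & ext;
}

/* ALU32 AND: operate on the low 32 bits; the upper 32 bits of dst end up zero. */
static uint64_t alu32_and_imm(uint64_t dst, uint32_t imm)
{
	return (uint64_t)((uint32_t)dst & imm);
}

/* Mask for a narrow load of @size bytes (1, 2 or 4), as a 32-bit immediate. */
static uint32_t narrow_mask(unsigned int size)
{
	assert(size == 1 || size == 2 || size == 4);
	return (uint32_t)((UINT64_C(1) << (size * 8)) - 1);
}
```

      With a 4-byte load the 64-bit form leaves the value unchanged, while the 32-bit form keeps only the low half; the masks for 1- and 2-byte loads have bit 31 clear, so both forms agree for them.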
      
      Link: https://lore.kernel.org/r/20230524170839.13905-1-daniel@iogearbox.net
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  3. 24 May, 2023 16 commits