1. 30 Jun, 2024 4 commits
    • ata: ahci: Clean up sysfs file on error · eeb25a09
      Niklas Cassel authored
      .probe() (ahci_init_one()) calls sysfs_add_file_to_group(), however,
      if probe() fails after this call, we currently never call
      sysfs_remove_file_from_group().
      
      (The sysfs_remove_file_from_group() call in .remove() (ahci_remove_one())
      does not help, as .remove() is not called on .probe() error.)
      
      Thus, if probe() fails after the sysfs_add_file_to_group() call, the next
      time we insmod the module we will get:
      
      sysfs: cannot create duplicate filename '/devices/pci0000:00/0000:00:04.0/remapped_nvme'
      CPU: 11 PID: 954 Comm: modprobe Not tainted 6.10.0-rc5 #43
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-2.fc40 04/01/2014
      Call Trace:
       <TASK>
       dump_stack_lvl+0x5d/0x80
       sysfs_warn_dup.cold+0x17/0x23
       sysfs_add_file_mode_ns+0x11a/0x130
       sysfs_add_file_to_group+0x7e/0xc0
       ahci_init_one+0x31f/0xd40 [ahci]
      
      Fixes: 894fba7f ("ata: ahci: Add sysfs attribute to show remapped NVMe device count")
      Cc: stable@vger.kernel.org
      Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
      Reviewed-by: Hannes Reinecke <hare@suse.de>
      Link: https://lore.kernel.org/r/20240629124210.181537-10-cassel@kernel.org
      Signed-off-by: Niklas Cassel <cassel@kernel.org>
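The cleanup rule described in this commit (undo in the probe error path whatever probe itself added, because remove is not called after a failed probe) can be sketched in plain userspace C. This is only an illustration under stated assumptions: `fake_dev`, `fake_add_attr()`, `fake_remove_attr()` and `fake_probe()` are hypothetical stand-ins, not the kernel's sysfs or AHCI API.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a device that can own one sysfs attribute. */
struct fake_dev {
    bool attr_present;  /* models the "remapped_nvme" file */
};

/* Models adding the attribute: fails if the file already exists. */
static int fake_add_attr(struct fake_dev *dev)
{
    if (dev->attr_present)
        return -EEXIST;  /* "cannot create duplicate filename" */
    dev->attr_present = true;
    return 0;
}

/* Models removing the attribute again. */
static void fake_remove_attr(struct fake_dev *dev)
{
    dev->attr_present = false;
}

/*
 * Models a probe() that adds the attribute and then runs a later step
 * that may fail. On failure it unwinds what it added, because remove()
 * is never called for a failed probe().
 */
int fake_probe(struct fake_dev *dev, bool later_step_fails)
{
    int rc;

    assert(dev != NULL);

    rc = fake_add_attr(dev);
    if (rc)
        return rc;

    if (later_step_fails) {
        rc = -EIO;  /* arbitrary error chosen for the sketch */
        goto err_remove_attr;
    }
    return 0;

err_remove_attr:
    fake_remove_attr(dev);
    return rc;
}
```

With the unwind label in place, a failed `fake_probe()` followed by a second attempt succeeds instead of returning `-EEXIST`.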
    • ata: libata-core: Fix double free on error · ab9e0c52
      Niklas Cassel authored
      If e.g. the ata_port_alloc() call in ata_host_alloc() fails, we will jump
      to the err_out label, which will call devres_release_group().
      devres_release_group() will trigger a call to ata_host_release().
      ata_host_release() calls kfree(host), so executing the kfree(host) in
      ata_host_alloc() will lead to a double free:
      
      kernel BUG at mm/slub.c:553!
      Oops: invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
      CPU: 11 PID: 599 Comm: (udev-worker) Not tainted 6.10.0-rc5 #47
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-2.fc40 04/01/2014
      RIP: 0010:kfree+0x2cf/0x2f0
      Code: 5d 41 5e 41 5f 5d e9 80 d6 ff ff 4d 89 f1 41 b8 01 00 00 00 48 89 d9 48 89 da
      RSP: 0018:ffffc90000f377f0 EFLAGS: 00010246
      RAX: ffff888112b1f2c0 RBX: ffff888112b1f2c0 RCX: ffff888112b1f320
      RDX: 000000000000400b RSI: ffffffffc02c9de5 RDI: ffff888112b1f2c0
      RBP: ffffc90000f37830 R08: 0000000000000000 R09: 0000000000000000
      R10: ffffc90000f37610 R11: 617461203a736b6e R12: ffffea00044ac780
      R13: ffff888100046400 R14: ffffffffc02c9de5 R15: 0000000000000006
      FS:  00007f2f1cabe980(0000) GS:ffff88813b380000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007f2f1c3acf75 CR3: 0000000111724000 CR4: 0000000000750ef0
      PKRU: 55555554
      Call Trace:
       <TASK>
       ? __die_body.cold+0x19/0x27
       ? die+0x2e/0x50
       ? do_trap+0xca/0x110
       ? do_error_trap+0x6a/0x90
       ? kfree+0x2cf/0x2f0
       ? exc_invalid_op+0x50/0x70
       ? kfree+0x2cf/0x2f0
       ? asm_exc_invalid_op+0x1a/0x20
       ? ata_host_alloc+0xf5/0x120 [libata]
       ? ata_host_alloc+0xf5/0x120 [libata]
       ? kfree+0x2cf/0x2f0
       ata_host_alloc+0xf5/0x120 [libata]
       ata_host_alloc_pinfo+0x14/0xa0 [libata]
       ahci_init_one+0x6c9/0xd20 [ahci]
      
      Ensure that we will not call kfree(host) twice, by performing the kfree()
      only if the devres_open_group() call failed.
      
      Fixes: dafd6c49 ("libata: ensure host is free'd on error exit paths")
      Cc: stable@vger.kernel.org
      Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
      Reviewed-by: Hannes Reinecke <hare@suse.de>
      Link: https://lore.kernel.org/r/20240629124210.181537-9-cassel@kernel.org
      Signed-off-by: Niklas Cassel <cassel@kernel.org>
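The ownership rule behind this fix can be sketched in userspace C: once a managed resource group owns the host, releasing the group frees it, so a direct free is only valid when opening the group failed. All names here (`fake_host`, `fake_release_group()`, `fake_host_alloc()`, `free_count`) are hypothetical and only model the behaviour described in the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct fake_host {
    int n_ports;
};

/* Counts frees so that a double free would be observable in a test. */
int free_count;

static void fake_host_free(struct fake_host *host)
{
    assert(host != NULL);
    free_count++;
    free(host);
}

/* Models releasing the resource group: its release callback frees the host. */
static void fake_release_group(struct fake_host *host)
{
    fake_host_free(host);
}

/*
 * Models the allocation path. The two flags simulate the two failure
 * points discussed in the commit message.
 */
struct fake_host *fake_host_alloc(bool group_open_fails, bool port_alloc_fails)
{
    struct fake_host *host = calloc(1, sizeof(*host));

    if (!host)
        return NULL;

    if (group_open_fails) {
        /* Nothing is managed yet, so free the host directly. */
        fake_host_free(host);
        return NULL;
    }

    if (port_alloc_fails) {
        /* The group owns the host: release it, and do not free again. */
        fake_release_group(host);
        return NULL;
    }

    return host;
}
```

Each failure path frees the host exactly once, which is the property the commit restores.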
    • ata,scsi: libata-core: Do not leak memory for ata_port struct members · f6549f53
      Niklas Cassel authored
      libsas is currently not freeing all the struct ata_port struct members,
      e.g. ncq_sense_buf for a driver supporting Command Duration Limits (CDL).
      
      Add a function, ata_port_free(), that is used to free an ata_port,
      including its struct members. It makes sense to keep the code related to
      freeing an ata_port in its own function, which will also free all the
      struct members of struct ata_port.
      
      Fixes: 18bd7718 ("scsi: ata: libata: Handle completion of CDL commands using policy 0xD")
      Reviewed-by: John Garry <john.g.garry@oracle.com>
      Link: https://lore.kernel.org/r/20240629124210.181537-8-cassel@kernel.org
      Signed-off-by: Niklas Cassel <cassel@kernel.org>
    • ata: libata-core: Fix null pointer dereference on error · 5d92c7c5
      Niklas Cassel authored
      If the ata_port_alloc() call in ata_host_alloc() fails,
      ata_host_release() will get called.
      
      However, the code in ata_host_release() tries to free ata_port struct
      members unconditionally, which can lead to the following:
      
      BUG: unable to handle page fault for address: 0000000000003990
      PGD 0 P4D 0
      Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI
      CPU: 10 PID: 594 Comm: (udev-worker) Not tainted 6.10.0-rc5 #44
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-2.fc40 04/01/2014
      RIP: 0010:ata_host_release.cold+0x2f/0x6e [libata]
      Code: e4 4d 63 f4 44 89 e2 48 c7 c6 90 ad 32 c0 48 c7 c7 d0 70 33 c0 49 83 c6 0e 41
      RSP: 0018:ffffc90000ebb968 EFLAGS: 00010246
      RAX: 0000000000000041 RBX: ffff88810fb52e78 RCX: 0000000000000000
      RDX: 0000000000000000 RSI: ffff88813b3218c0 RDI: ffff88813b3218c0
      RBP: ffff88810fb52e40 R08: 0000000000000000 R09: 6c65725f74736f68
      R10: ffffc90000ebb738 R11: 73692033203a746e R12: 0000000000000004
      R13: 0000000000000000 R14: 0000000000000011 R15: 0000000000000006
      FS:  00007f6cc55b9980(0000) GS:ffff88813b300000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000000003990 CR3: 00000001122a2000 CR4: 0000000000750ef0
      PKRU: 55555554
      Call Trace:
       <TASK>
       ? __die_body.cold+0x19/0x27
       ? page_fault_oops+0x15a/0x2f0
       ? exc_page_fault+0x7e/0x180
       ? asm_exc_page_fault+0x26/0x30
       ? ata_host_release.cold+0x2f/0x6e [libata]
       ? ata_host_release.cold+0x2f/0x6e [libata]
       release_nodes+0x35/0xb0
       devres_release_group+0x113/0x140
       ata_host_alloc+0xed/0x120 [libata]
       ata_host_alloc_pinfo+0x14/0xa0 [libata]
       ahci_init_one+0x6c9/0xd20 [ahci]
      
      Do not access ata_port struct members unconditionally.
      
      Fixes: 633273a3 ("libata-pmp: hook PMP support and enable it")
      Cc: stable@vger.kernel.org
      Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
      Reviewed-by: Hannes Reinecke <hare@suse.de>
      Reviewed-by: John Garry <john.g.garry@oracle.com>
      Link: https://lore.kernel.org/r/20240629124210.181537-7-cassel@kernel.org
      Signed-off-by: Niklas Cassel <cassel@kernel.org>
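A release routine that tolerates a partially constructed host might look like the following userspace sketch. It assumes a host with a fixed-size array of port pointers, some of which were never allocated. `fake_port`, `fake_host` and `fake_host_release()` are hypothetical names, not the libata structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define FAKE_MAX_PORTS 4

struct fake_port {
    char *sense_buf;  /* models an optional per-port buffer */
};

struct fake_host {
    int n_ports;
    struct fake_port *ports[FAKE_MAX_PORTS];
};

/*
 * Frees every port that was actually allocated and returns how many
 * were freed. Slots that are still NULL (allocation failed part-way)
 * are skipped instead of being dereferenced.
 */
int fake_host_release(struct fake_host *host)
{
    int freed = 0;

    assert(host != NULL && host->n_ports <= FAKE_MAX_PORTS);

    for (int i = 0; i < host->n_ports; i++) {
        struct fake_port *ap = host->ports[i];

        if (!ap)
            continue;  /* port was never allocated */

        free(ap->sense_buf);  /* free(NULL) is a no-op */
        free(ap);
        host->ports[i] = NULL;
        freed++;
    }
    return freed;
}
```

Clearing each slot after freeing it also makes a second call to `fake_host_release()` harmless.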
  2. 28 Jun, 2024 1 commit
  3. 19 Jun, 2024 1 commit
    • ata: ahci: Do not enable LPM if no LPM states are supported by the HBA · fa997b05
      Niklas Cassel authored
      LPM consists of HIPM (host initiated power management) and DIPM
      (device initiated power management).
      
      ata_eh_set_lpm() will only enable HIPM if both the HBA and the device
      support it.
      
      However, DIPM will be enabled as long as the device supports it.
      The HBA will later reject the device's request to enter a power state
      that it does not support (Slumber/Partial/DevSleep) (DevSleep is never
      initiated by the device).
      
      For a HBA that doesn't support any LPM states, simply don't set a LPM
      policy such that all the HIPM/DIPM probing/enabling will be skipped.
      
      Not enabling HIPM or DIPM in the first place is safer than relying on
      the device following the AHCI specification and respecting the NAK.
      (There are comments in the code that some devices misbehave when
      receiving a NAK.)
      
      Performing this check in ahci_update_initial_lpm_policy() also has the
      advantage that a HBA that doesn't support any LPM states will take the
      exact same code paths as a port that is external/hot plug capable.
      
      Side note: the port in ata_port_dbg() has not been given a unique id yet,
      but this is not overly important as the debug print is disabled unless
      explicitly enabled using dynamic debug. A follow-up series will make sure
      that the unique id assignment will be done earlier. For now, the important
      thing is that the function returns before setting the LPM policy.
      
      Fixes: 7627a0ed ("ata: ahci: Drop low power policy board type")
      Cc: stable@vger.kernel.org
      Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
      Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
      Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
      Link: https://lore.kernel.org/r/20240618152828.2686771-2-cassel@kernel.org
      Signed-off-by: Niklas Cassel <cassel@kernel.org>
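The decision described above (leave the LPM policy unset when the port is external or when the HBA supports no LPM state at all) can be sketched as a small pure function. The capability flags, the policy enum and `fake_initial_lpm_policy()` are hypothetical and chosen for this illustration only; they are not the AHCI driver's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical capability bits for the three states named in the text. */
#define FAKE_CAP_PARTIAL  (1u << 0)
#define FAKE_CAP_SLUMBER  (1u << 1)
#define FAKE_CAP_DEVSLEEP (1u << 2)

enum fake_lpm_policy {
    FAKE_LPM_NOT_SET = 0,  /* no policy: HIPM/DIPM probing is skipped */
    FAKE_LPM_MAX_POWER,
    FAKE_LPM_MED_POWER_WITH_DIPM,
};

/*
 * Returns the policy to start with. External or hot-plug capable ports
 * and HBAs without any LPM state take the same path: no policy is set.
 */
enum fake_lpm_policy fake_initial_lpm_policy(uint32_t hba_caps,
                                             bool port_external,
                                             enum fake_lpm_policy wanted)
{
    const uint32_t any_lpm =
        FAKE_CAP_PARTIAL | FAKE_CAP_SLUMBER | FAKE_CAP_DEVSLEEP;

    if (port_external)
        return FAKE_LPM_NOT_SET;

    if (!(hba_caps & any_lpm))
        return FAKE_LPM_NOT_SET;

    return wanted;
}
```

Returning early, before any policy is stored, is what keeps DIPM from being enabled on the strength of the device's capabilities alone.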
  4. 14 Jun, 2024 1 commit
    • ata: libata-scsi: Set the RMB bit only for removable media devices · a6a75edc
      Damien Le Moal authored
      The SCSI Removable Media Bit (RMB) should only be set for removable media,
      where the device stays and the media changes, e.g. CD-ROM or floppy.
      
      The ATA removable media device bit is obsoleted since ATA-8 ACS (2006),
      but before that it was used to indicate that the device can have its media
      removed (while the device stays).
      
      Commit 8a3e33cf ("ata: ahci: find eSATA ports and flag them as
      removable") introduced a change to set the RMB bit if the port has either
      the eSATA bit or the hot-plug capable bit set. The reasoning was that the
      author wanted his eSATA ports to get treated like a USB stick.
      
      This is however wrong. See "20-082r23SPC-6: Removable Medium Bit
      Expectations" which has since been integrated to SPC, which states that:
      
      """
      Reports have been received that some USB Memory Stick device servers set
      the removable medium (RMB) bit to one. The rub comes when the medium is
      actually removed, because... The device server is removed concurrently
      with the medium removal. If there is no device server, then there is no
      device server that is waiting to have removable medium inserted.
      
      Sufficient numbers of SCSI analysts see such a device:
      - not as a device that supports removable medium;
      but
      - as a removable, hot pluggable device.
      """
      
      The definition of the RMB bit in the SPC specification has since been
      clarified to match this.
      
      Thus, a USB stick should not have the RMB bit set (and neither shall an
      eSATA nor a hot-plug capable port).
      
      Commit dc8b4afc ("ata: ahci: don't mark HotPlugCapable Ports as
      external/removable") then changed this so that the RMB bit is only set for
      the eSATA bit (and not for the hot-plug capable bit), because of many bug
      reports of SATA devices being automounted by udisks. However,
      treating eSATA and hot-plug capable ports differently is not correct.
      
      From the AHCI 1.3.1 spec:
      Hot Plug Capable Port (HPCP): When set to '1', indicates that this port's
      signal and power connectors are externally accessible via a joint signal
      and power connector for blindmate device hot plug.
      
      So a hot-plug capable port is an external port, just like commit
      45b96d65 ("ata: ahci: a hotplug capable port is an external port")
      claims.
      
      In order to not violate the SPC specification, modify the SCSI INQUIRY
      data to only set the RMB bit if the ATA device can have its media removed.
      
      This fixes a reported problem where GNOME/udisks was automounting devices
      connected to hot-plug capable ports.
      
      Fixes: 45b96d65 ("ata: ahci: a hotplug capable port is an external port")
      Cc: stable@vger.kernel.org
      Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
      Reviewed-by: Thomas Weißschuh <linux@weissschuh.net>
      Tested-by: Thomas Weißschuh <linux@weissschuh.net>
      Reported-by: Thomas Weißschuh <linux@weissschuh.net>
      Closes: https://lore.kernel.org/linux-ide/c0de8262-dc4b-4c22-9fac-33432e5bddd3@t-8ch.de/
      Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
      [cassel: wrote commit message]
      Signed-off-by: Niklas Cassel <cassel@kernel.org>
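The rule this commit settles on (derive RMB from the device's own removable-media flag, not from port properties) can be sketched as follows. The sketch assumes the removable-media flag is bit 7 of IDENTIFY DEVICE word 0 and that RMB is bit 7 of byte 1 of the standard INQUIRY data. The macro and function names are hypothetical, not the libata-scsi implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed bit positions, see the lead-in above. */
#define FAKE_ATA_ID_REMOVABLE (1u << 7)  /* IDENTIFY DEVICE word 0 */
#define FAKE_SCSI_INQ_RMB     (1u << 7)  /* INQUIRY data byte 1 */

/*
 * Sets or clears RMB in the INQUIRY buffer based only on whether the
 * ATA device reports removable media. Port attributes such as eSATA
 * or hot-plug capability play no role here.
 */
void fake_fill_inquiry_rmb(const uint16_t *ata_id, uint8_t *inq)
{
    assert(ata_id != NULL && inq != NULL);

    inq[1] = (uint8_t)(inq[1] & ~FAKE_SCSI_INQ_RMB);
    if (ata_id[0] & FAKE_ATA_ID_REMOVABLE)
        inq[1] = (uint8_t)(inq[1] | FAKE_SCSI_INQ_RMB);
}
```

A fixed disk behind a hot-plug capable port therefore reports RMB = 0, which is what keeps it from being treated as removable media by userspace.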
  5. 06 Jun, 2024 1 commit
    • ata: pata_macio: Fix max_segment_size with PAGE_SIZE == 64K · 09fe2bfa
      Michael Ellerman authored
      The pata_macio driver advertises a max_segment_size of 0xff00, because
      the hardware doesn't cope with requests >= 64K.
      
      However the SCSI core requires max_segment_size to be at least
      PAGE_SIZE, which is a problem for pata_macio when the kernel is built
      with 64K pages.
      
      In older kernels the SCSI core would just increase the segment size to
      be equal to PAGE_SIZE, however since the commit tagged below it causes a
      warning and the device fails to probe:
      
        WARNING: CPU: 0 PID: 26 at block/blk-settings.c:202 .blk_validate_limits+0x2f8/0x35c
        CPU: 0 PID: 26 Comm: kworker/u4:1 Not tainted 6.10.0-rc1 #1
        Hardware name: PowerMac7,2 PPC970 0x390202 PowerMac
        ...
        NIP .blk_validate_limits+0x2f8/0x35c
        LR  .blk_alloc_queue+0xc0/0x2f8
        Call Trace:
          .blk_alloc_queue+0xc0/0x2f8
          .blk_mq_alloc_queue+0x60/0xf8
          .scsi_alloc_sdev+0x208/0x3c0
          .scsi_probe_and_add_lun+0x314/0x52c
          .__scsi_add_device+0x170/0x1a4
          .ata_scsi_scan_host+0x2bc/0x3e4
          .async_port_probe+0x6c/0xa0
          .async_run_entry_fn+0x60/0x1bc
          .process_one_work+0x228/0x510
          .worker_thread+0x360/0x530
          .kthread+0x134/0x13c
          .start_kernel_thread+0x10/0x14
        ...
        scsi_alloc_sdev: Allocation failure during SCSI scanning, some SCSI devices might not be configured
      
      Although the hardware can't cope with a 64K segment, the driver
      already deals with that internally by splitting large requests in
      pata_macio_qc_prep(). That is how the driver has managed to function
      until now on 64K kernels.
      
      So fix the driver to advertise a max_segment_size of 64K, which avoids
      the warning and keeps the SCSI core happy.
      
      Fixes: afd53a3d ("scsi: core: Initialize scsi midlayer limits before allocating the queue")
      Reported-by: Guenter Roeck <linux@roeck-us.net>
      Closes: https://lore.kernel.org/all/ce2bf6af-4382-4fe1-b392-cc6829f5ceb2@roeck-us.net/
      Reported-by: Doru Iorgulescu <doru.iorgulescu1@gmail.com>
      Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218858
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
      Reviewed-by: John Garry <john.g.garry@oracle.com>
      Signed-off-by: Niklas Cassel <cassel@kernel.org>
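The internal splitting that the message refers to can be illustrated with a small sketch: a segment of up to 64K is cut into chunks no larger than 0xff00 bytes, so the hardware never sees a request of 64K or more. `FAKE_MAX_HW_CHUNK` and `fake_split_segment()` are hypothetical names. This is not the code of pata_macio_qc_prep(), only the idea it is described as implementing.

```c
#include <assert.h>
#include <stddef.h>

/* Largest chunk the hardware is said to handle (0xff00 bytes). */
#define FAKE_MAX_HW_CHUNK 0xff00u

/*
 * Splits a segment of len bytes into chunks of at most
 * FAKE_MAX_HW_CHUNK bytes and stores the chunk lengths in chunks[].
 * Returns the number of chunks written, or 0 if the segment is empty
 * or does not fit into max_chunks entries.
 */
size_t fake_split_segment(size_t len, size_t *chunks, size_t max_chunks)
{
    size_t n = 0;

    assert(chunks != NULL);

    while (len > 0 && n < max_chunks) {
        size_t this_len = len < FAKE_MAX_HW_CHUNK ? len : FAKE_MAX_HW_CHUNK;

        chunks[n++] = this_len;
        len -= this_len;
    }
    return len == 0 ? n : 0;
}
```

A full 64K segment (65536 bytes) becomes one chunk of 0xff00 bytes followed by one of 0x100 bytes, which is why advertising a 64K max_segment_size is safe for this driver.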
  6. 31 May, 2024 3 commits
    • ata: libata-core: Add ATA_HORKAGE_NOLPM for Apacer AS340 · 3cb648c4
      Niklas Cassel authored
      Commit 7627a0ed ("ata: ahci: Drop low power policy board type")
      dropped the board_ahci_low_power board type, and instead enables LPM if:
      -The AHCI controller reports that it supports LPM (Partial/Slumber), and
      -CONFIG_SATA_MOBILE_LPM_POLICY != 0, and
      -The port is not defined as external in the per port PxCMD register, and
      -The port is not defined as hotplug capable in the per port PxCMD
       register.
      
      Partial and Slumber LPM states can either be initiated by HIPM or DIPM.
      
      For HIPM (host initiated power management) to get enabled, both the AHCI
      controller and the drive have to report that they support HIPM.
      
      For DIPM (device initiated power management) to get enabled, only the
      drive has to report that it supports DIPM. However, the HBA will reject
      device requests to enter LPM states which the HBA does not support.
      
      The problem is that Apacer AS340 drives do not handle low power modes
      correctly. The problem was most likely not seen before because no one
      had used this drive with an AHCI controller with LPM enabled.
      
      Add a quirk so that we do not enable LPM for this drive, since we see
      command timeouts if we do (even though the drive claims to support DIPM).
      
      Fixes: 7627a0ed ("ata: ahci: Drop low power policy board type")
      Cc: stable@vger.kernel.org
      Reported-by: Tim Teichmann <teichmanntim@outlook.de>
      Closes: https://lore.kernel.org/linux-ide/87bk4pbve8.ffs@tglx/
      Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
      Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
      Signed-off-by: Niklas Cassel <cassel@kernel.org>
    • ata: libata-core: Add ATA_HORKAGE_NOLPM for AMD Radeon S3 SSD · 47388036
      Niklas Cassel authored
      Commit 7627a0ed ("ata: ahci: Drop low power policy board type")
      dropped the board_ahci_low_power board type, and instead enables LPM if:
      -The AHCI controller reports that it supports LPM (Partial/Slumber), and
      -CONFIG_SATA_MOBILE_LPM_POLICY != 0, and
      -The port is not defined as external in the per port PxCMD register, and
      -The port is not defined as hotplug capable in the per port PxCMD
       register.
      
      Partial and Slumber LPM states can either be initiated by HIPM or DIPM.
      
      For HIPM (host initiated power management) to get enabled, both the AHCI
      controller and the drive have to report that they support HIPM.
      
      For DIPM (device initiated power management) to get enabled, only the
      drive has to report that it supports DIPM. However, the HBA will reject
      device requests to enter LPM states which the HBA does not support.
      
      The problem is that AMD Radeon S3 SSD drives do not handle low power modes
      correctly. The problem was most likely not seen before because no one
      had used this drive with an AHCI controller with LPM enabled.
      
      Add a quirk so that we do not enable LPM for this drive, since we see
      command timeouts if we do (even though the drive claims to support both
      HIPM and DIPM).
      
      Fixes: 7627a0ed ("ata: ahci: Drop low power policy board type")
      Cc: stable@vger.kernel.org
      Reported-by: Doru Iorgulescu <doru.iorgulescu1@gmail.com>
      Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218832
      Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
      Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
      Signed-off-by: Niklas Cassel <cassel@kernel.org>
    • ata: libata-core: Add ATA_HORKAGE_NOLPM for Crucial CT240BX500SSD1 · 86aaa7e9
      Niklas Cassel authored
      Commit 7627a0ed ("ata: ahci: Drop low power policy board type")
      dropped the board_ahci_low_power board type, and instead enables LPM if:
      -The AHCI controller reports that it supports LPM (Partial/Slumber), and
      -CONFIG_SATA_MOBILE_LPM_POLICY != 0, and
      -The port is not defined as external in the per port PxCMD register, and
      -The port is not defined as hotplug capable in the per port PxCMD
       register.
      
      Partial and Slumber LPM states can either be initiated by HIPM or DIPM.
      
      For HIPM (host initiated power management) to get enabled, both the AHCI
      controller and the drive have to report that they support HIPM.
      
      For DIPM (device initiated power management) to get enabled, only the
      drive has to report that it supports DIPM. However, the HBA will reject
      device requests to enter LPM states which the HBA does not support.
      
      The problem is that Crucial CT240BX500SSD1 drives do not handle low power
      modes correctly. The problem was most likely not seen before because no
      one had used this drive with an AHCI controller with LPM enabled.
      
      Add a quirk so that we do not enable LPM for this drive, since we see
      command timeouts if we do (even though the drive claims to support DIPM).
      
      Fixes: 7627a0ed ("ata: ahci: Drop low power policy board type")
      Cc: stable@vger.kernel.org
      Reported-by: Aarrayy <lp610mh@gmail.com>
      Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218832
      Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
      Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
      Signed-off-by: Niklas Cassel <cassel@kernel.org>
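The three quirk commits above share one mechanism: a per-model table entry that disables LPM for a drive that misbehaves. A minimal userspace sketch of such a lookup is shown here. The table layout, the prefix matching, the flag name and the model strings (taken from the commit titles, not from real IDENTIFY data) are all assumptions made for illustration and do not reproduce libata's table.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define FAKE_QUIRK_NOLPM (1u << 0)  /* hypothetical "do not enable LPM" flag */

struct fake_quirk_entry {
    const char *model_prefix;  /* matched against the start of the model */
    unsigned int quirks;
};

/* Illustrative entries only; the strings come from the commit titles. */
static const struct fake_quirk_entry fake_quirk_table[] = {
    { "Apacer AS340",           FAKE_QUIRK_NOLPM },
    { "AMD Radeon S3 SSD",      FAKE_QUIRK_NOLPM },
    { "Crucial CT240BX500SSD1", FAKE_QUIRK_NOLPM },
};

/* Returns the quirk flags for a model string, or 0 if none match. */
unsigned int fake_lookup_quirks(const char *model)
{
    size_t n = sizeof(fake_quirk_table) / sizeof(fake_quirk_table[0]);

    assert(model != NULL);

    for (size_t i = 0; i < n; i++) {
        const char *prefix = fake_quirk_table[i].model_prefix;

        if (strncmp(model, prefix, strlen(prefix)) == 0)
            return fake_quirk_table[i].quirks;
    }
    return 0;
}

/* LPM is allowed unless the drive carries the NOLPM quirk. */
int fake_lpm_allowed(const char *model)
{
    return !(fake_lookup_quirks(model) & FAKE_QUIRK_NOLPM);
}
```

Keeping the decision in a table means that a newly reported drive needs one more entry and no change to the control flow.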
  7. 27 May, 2024 1 commit
    • ata: ahci: Do not apply Intel PCS quirk on Intel Alder Lake · 9e2f46cd
      Jason Nader authored
      Commit b8b8b4e0 ("ata: ahci: Add Intel Alder Lake-P AHCI controller
      to low power chipsets list") added Intel Alder Lake to the ahci_pci_tbl.
      
      Because of the way that the Intel PCS quirk was implemented, having
      an explicit entry in the ahci_pci_tbl caused the Intel PCS quirk to
      be applied. (The quirk was not being applied if there was no explicit
      entry.)
      
      Thus, entries that were added to the ahci_pci_tbl also got the Intel
      PCS quirk applied.
      
      The quirk was cleaned up in commit 7edbb605 ("ahci: clean up
      intel_pcs_quirk"), such that it is clear which entries actually
      apply the Intel PCS quirk.
      
      Newer Intel AHCI controllers do not need the Intel PCS quirk,
      and applying it when not needed actually breaks some platforms.
      
      Do not apply the Intel PCS quirk for Intel Alder Lake.
      This is in line with how things worked before commit b8b8b4e0 ("ata:
      ahci: Add Intel Alder Lake-P AHCI controller to low power chipsets list"),
      such that certain platforms using Intel Alder Lake will work once again.
      
      Cc: stable@vger.kernel.org # 6.7
      Fixes: b8b8b4e0 ("ata: ahci: Add Intel Alder Lake-P AHCI controller to low power chipsets list")
      Signed-off-by: Jason Nader <dev@kayoway.com>
      Signed-off-by: Niklas Cassel <cassel@kernel.org>
  8. 26 May, 2024 5 commits
  9. 25 May, 2024 12 commits
    • Merge tag 'mm-hotfixes-stable-2024-05-25-09-13' of... · 9b62e02e
      Linus Torvalds authored
      Merge tag 'mm-hotfixes-stable-2024-05-25-09-13' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
      
      Pull misc fixes from Andrew Morton:
       "16 hotfixes, 11 of which are cc:stable.
      
        A few nilfs2 fixes, the remainder are for MM: a couple of selftests
        fixes, various singletons fixing various issues in various parts"
      
      * tag 'mm-hotfixes-stable-2024-05-25-09-13' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
        mm/ksm: fix possible UAF of stable_node
        mm/memory-failure: fix handling of dissolved but not taken off from buddy pages
        mm: /proc/pid/smaps_rollup: avoid skipping vma after getting mmap_lock again
        nilfs2: fix potential hang in nilfs_detach_log_writer()
        nilfs2: fix unexpected freezing of nilfs_segctor_sync()
        nilfs2: fix use-after-free of timer for log writer thread
        selftests/mm: fix build warnings on ppc64
        arm64: patching: fix handling of execmem addresses
        selftests/mm: compaction_test: fix bogus test success and reduce probability of OOM-killer invocation
        selftests/mm: compaction_test: fix incorrect write of zero to nr_hugepages
        selftests/mm: compaction_test: fix bogus test success on Aarch64
        mailmap: update email address for Satya Priya
        mm/huge_memory: don't unpoison huge_zero_folio
        kasan, fortify: properly rename memintrinsics
        lib: add version into /proc/allocinfo output
        mm/vmalloc: fix vmalloc which may return null if called with __GFP_NOFAIL
    • Merge tag 'irq-urgent-2024-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · a0db36ed
      Linus Torvalds authored
      Pull irq fixes from Ingo Molnar:
      
       - Fix x86 IRQ vector leak caused by a CPU offlining race
      
       - Fix build failure in the riscv-imsic irqchip driver
         caused by an API-change semantic conflict
      
       - Fix use-after-free in irq_find_at_or_after()
      
      * tag 'irq-urgent-2024-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        genirq/irqdesc: Prevent use-after-free in irq_find_at_or_after()
        genirq/cpuhotplug, x86/vector: Prevent vector leak during CPU offline
        irqchip/riscv-imsic: Fixup riscv_ipi_set_virq_range() conflict
    • Merge tag 'x86-urgent-2024-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 3a390f24
      Linus Torvalds authored
      Pull x86 fixes from Ingo Molnar:
      
       - Fix regressions of the new x86 CPU VFM (vendor/family/model)
         enumeration/matching code
      
       - Fix crash kernel detection on buggy firmware with
         non-compliant ACPI MADT tables
      
       - Address Kconfig warning
      
      * tag 'x86-urgent-2024-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/cpu: Fix x86_match_cpu() to match just X86_VENDOR_INTEL
        crypto: x86/aes-xts - switch to new Intel CPU model defines
        x86/topology: Handle bogus ACPI tables correctly
        x86/kconfig: Select ARCH_WANT_FRAME_POINTERS again when UNWINDER_FRAME_POINTER=y
    • Merge tag 'for-linus-6.10-1' of https://github.com/cminyard/linux-ipmi · 56676c4c
      Linus Torvalds authored
      Pull ipmi updates from Corey Minyard:
       "Mostly updates for deprecated interfaces, platform.remove and
        converting from a tasklet to a BH workqueue.
      
        Also use HAS_IOPORT for disabling inb()/outb()"
      
      * tag 'for-linus-6.10-1' of https://github.com/cminyard/linux-ipmi:
        ipmi: kcs_bmc_npcm7xx: Convert to platform remove callback returning void
        ipmi: kcs_bmc_aspeed: Convert to platform remove callback returning void
        ipmi: ipmi_ssif: Convert to platform remove callback returning void
        ipmi: ipmi_si_platform: Convert to platform remove callback returning void
        ipmi: ipmi_powernv: Convert to platform remove callback returning void
        ipmi: bt-bmc: Convert to platform remove callback returning void
        char: ipmi: handle HAS_IOPORT dependencies
        ipmi: Convert from tasklet to BH workqueue
    • Merge tag 'ceph-for-6.10-rc1' of https://github.com/ceph/ceph-client · 74eca356
      Linus Torvalds authored
      Pull ceph updates from Ilya Dryomov:
       "A series from Xiubo that adds support for additional access checks
        based on MDS auth caps which were recently made available to clients.
      
        This is needed to prevent scenarios where the MDS quietly discards
        updates that a UID-restricted client previously (wrongfully) acked to
        the user.
      
        Other than that, just a documentation fixup"
      
      * tag 'ceph-for-6.10-rc1' of https://github.com/ceph/ceph-client:
        doc: ceph: update userspace command to get CephFS metadata
        ceph: add CEPHFS_FEATURE_MDS_AUTH_CAPS_CHECK feature bit
        ceph: check the cephx mds auth access for async dirop
        ceph: check the cephx mds auth access for open
        ceph: check the cephx mds auth access for setattr
        ceph: add ceph_mds_check_access() helper
        ceph: save cap_auths in MDS client when session is opened
    • Merge tag 'ntfs3_for_6.10' of https://github.com/Paragon-Software-Group/linux-ntfs3 · 89b61ca4
      Linus Torvalds authored
      Pull ntfs3 updates from Konstantin Komarov:
       "Fixes:
         - reusing of the file index (could cause the file to be trimmed)
         - infinite dir enumeration
         - taking DOS names into account during link counting
         - le32_to_cpu conversion, 32 bit overflow, NULL check
         - some code was refactored
      
        Changes:
         - removed max link count info display during driver init
      
        Remove:
         - atomic_open has been removed for lack of use"
      
      * tag 'ntfs3_for_6.10' of https://github.com/Paragon-Software-Group/linux-ntfs3:
        fs/ntfs3: Break dir enumeration if directory contents error
        fs/ntfs3: Fix case when index is reused during tree transformation
        fs/ntfs3: Mark volume as dirty if xattr is broken
        fs/ntfs3: Always make file nonresident on fallocate call
        fs/ntfs3: Redesign ntfs_create_inode to return error code instead of inode
        fs/ntfs3: Use variable length array instead of fixed size
        fs/ntfs3: Use 64 bit variable to avoid 32 bit overflow
        fs/ntfs3: Check 'folio' pointer for NULL
        fs/ntfs3: Missed le32_to_cpu conversion
        fs/ntfs3: Remove max link count info display during driver init
        fs/ntfs3: Taking DOS names into account during link counting
        fs/ntfs3: remove atomic_open
        fs/ntfs3: use kcalloc() instead of kzalloc()
    • Merge tag '6.10-rc-ksmbd-server-fixes' of git://git.samba.org/ksmbd · 6c8b1a2d
      Linus Torvalds authored
      Pull smb server fixes from Steve French:
       "Two ksmbd server fixes, both for stable"
      
      * tag '6.10-rc-ksmbd-server-fixes' of git://git.samba.org/ksmbd:
        ksmbd: ignore trailing slashes in share paths
        ksmbd: avoid to send duplicate oplock break notifications
    • Merge tag 'rtc-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux · 54f71b03
      Linus Torvalds authored
      Pull RTC updates from Alexandre Belloni:
       "There is one new driver and then most of the changes are the device
        tree bindings conversions to yaml.
      
        New driver:
         - Epson RX8111
      
        Drivers:
         - Many Device Tree bindings conversions to dtschema
         - pcf8563: wakeup-source support"
      
      * tag 'rtc-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux:
        pcf8563: add wakeup-source support
        rtc: rx8111: handle VLOW flag
        rtc: rx8111: demote warnings to debug level
        rtc: rx6110: Constify struct regmap_config
        dt-bindings: rtc: convert trivial devices into dtschema
        dt-bindings: rtc: stmp3xxx-rtc: convert to dtschema
        dt-bindings: rtc: pxa-rtc: convert to dtschema
        rtc: Add driver for Epson RX8111
        dt-bindings: rtc: Add Epson RX8111
        rtc: mcp795: drop unneeded MODULE_ALIAS
        rtc: nuvoton: Modify part number value
        rtc: test: Split rtc unit test into slow and normal speed test
        dt-bindings: rtc: nxp,lpc1788-rtc: convert to dtschema
        dt-bindings: rtc: digicolor-rtc: move to trivial-rtc
        dt-bindings: rtc: alphascale,asm9260-rtc: convert to dtschema
        dt-bindings: rtc: armada-380-rtc: convert to dtschema
        rtc: cros-ec: provide ID table for avoiding fallback match
      54f71b03
    • Merge tag 'i3c/for-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/i3c/linux · 4286e1fc
      Linus Torvalds authored
      Pull i3c updates from Alexandre Belloni:
       "Runtime PM (power management) is improved and hot-join support has
        been added to the dw controller driver.
      
        Core:
         - Allow device driver to trigger controller runtime PM
      
        Drivers:
         - dw: hot-join support
         - svc: better IBI handling"
      
      * tag 'i3c/for-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/i3c/linux:
        i3c: dw: Add hot-join support.
        i3c: master: Enable runtime PM for master controller
        i3c: master: svc: fix invalidate IBI type and miss call client IBI handler
        i3c: master: svc: change ENXIO to EAGAIN when IBI occurs during start frame
        i3c: Add comment for -EAGAIN in i3c_device_do_priv_xfers()
      4286e1fc
    • Merge tag 'jffs2-for-linus-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs · 6951abe8
      Linus Torvalds authored
      Pull jffs2 updates from Richard Weinberger:
      
       - Fix illegal memory access in jffs2_free_inode()
      
       - Kernel-doc fixes
      
       - print symbolic error names
      
      * tag 'jffs2-for-linus-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs:
        jffs2: Fix potential illegal address access in jffs2_free_inode
        jffs2: Simplify the allocation of slab caches
        jffs2: nodemgmt: fix kernel-doc comments
        jffs2: print symbolic error name instead of error code
      6951abe8
    • Merge tag 'uml-for-linus-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux · 2313022e
      Linus Torvalds authored
      Pull UML updates from Richard Weinberger:
      
       - Fixes for -Wmissing-prototypes warnings and further cleanup
      
       - Remove callback returning void from rtc and virtio drivers
      
       - Fix bash location
      
      * tag 'uml-for-linus-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux: (26 commits)
        um: virtio_uml: Convert to platform remove callback returning void
        um: rtc: Convert to platform remove callback returning void
        um: Remove unused do_get_thread_area function
        um: Fix -Wmissing-prototypes warnings for __vdso_*
        um: Add an internal header shared among the user code
        um: Fix the declaration of kasan_map_memory
        um: Fix the -Wmissing-prototypes warning for get_thread_reg
        um: Fix the -Wmissing-prototypes warning for __switch_mm
        um: Fix -Wmissing-prototypes warnings for (rt_)sigreturn
        um: Stop tracking host PID in cpu_tasks
        um: process: remove unused 'n' variable
        um: vector: remove unused len variable/calculation
        um: vector: fix bpfflash parameter evaluation
        um: slirp: remove set but unused variable 'pid'
        um: signal: move pid variable where needed
        um: Makefile: use bash from the environment
        um: Add winch to winch_handlers before registering winch IRQ
        um: Fix -Wmissing-prototypes warnings for __warp_* and foo
        um: Fix -Wmissing-prototypes warnings for text_poke*
        um: Move declarations to proper headers
        ...
      2313022e
    • Merge tag 'drm-next-2024-05-25' of https://gitlab.freedesktop.org/drm/kernel · 56fb6f92
      Linus Torvalds authored
      Pull drm fixes from Dave Airlie:
       "Some fixes for the end of the merge window, mostly amdgpu and panthor,
        with one nouveau uAPI change that fixes a bad decision we made a few
        months back.
      
        nouveau:
         - fix bo metadata uAPI for vm bind
      
        panthor:
         - Fixes for panthor's heap logical block.
         - Reset on unrecoverable fault
         - Fix VM references.
         - Reset fix.
      
        xlnx:
         - xlnx compile and doc fixes.
      
        amdgpu:
         - Handle vbios table integrated info v2.3
      
        amdkfd:
         - Handle duplicate BOs in reserve_bo_and_cond_vms
         - Handle memory limitations on small APUs
      
        dp/mst:
         - MST null deref fix.
      
        bridge:
         - Don't let next bridge create connector in adv7511 to make probe
           work"
      
      * tag 'drm-next-2024-05-25' of https://gitlab.freedesktop.org/drm/kernel:
        drm/amdgpu/atomfirmware: add intergrated info v2.3 table
        drm/mst: Fix NULL pointer dereference at drm_dp_add_payload_part2
        drm/amdkfd: Let VRAM allocations go to GTT domain on small APUs
        drm/amdkfd: handle duplicate BOs in reserve_bo_and_cond_vms
        drm/bridge: adv7511: Attach next bridge without creating connector
        drm/buddy: Fix the warn on's during force merge
        drm/nouveau: use tile_mode and pte_kind for VM_BIND bo allocations
        drm/panthor: Call panthor_sched_post_reset() even if the reset failed
        drm/panthor: Reset the FW VM to NULL on unplug
        drm/panthor: Keep a ref to the VM at the panthor_kernel_bo level
        drm/panthor: Force an immediate reset on unrecoverable faults
        drm/panthor: Document drm_panthor_tiler_heap_destroy::handle validity constraints
        drm/panthor: Fix an off-by-one in the heap context retrieval logic
        drm/panthor: Relax the constraints on the tiler chunk size
        drm/panthor: Make sure the tiler initial/max chunks are consistent
        drm/panthor: Fix tiler OOM handling to allow incremental rendering
        drm: xlnx: zynqmp_dpsub: Fix compilation error
        drm: xlnx: zynqmp_dpsub: Fix few function comments
      56fb6f92
  10. 24 May, 2024 11 commits
    • cifs: Fix missing set of remote_i_size · 93a43155
      David Howells authored
      Occasionally, the generic/001 xfstest will fail indicating corruption in
      one of the copy chains when run on cifs against a server that supports
      FSCTL_DUPLICATE_EXTENTS_TO_FILE (e.g. Samba with a share on btrfs).  The
      problem is that the remote_i_size value isn't updated by cifs_setsize()
      when called by smb2_duplicate_extents(), but i_size *is*.
      
      This may cause cifs_remap_file_range() to then skip the bit after calling
      ->duplicate_extents() that sets sizes.
      
      Fix this by calling netfs_resize_file() in smb2_duplicate_extents() before
      calling cifs_setsize() to set i_size.
      
      This means we don't then need to call netfs_resize_file() upon return from
      ->duplicate_extents(), but we also fix the test to compare against the pre-dup
      inode size.
      
      [Note that this goes back before the addition of remote_i_size with the
      netfs_inode struct.  It should probably have been setting cifsi->server_eof
      previously.]
      
      Fixes: cfc63fc8 ("smb3: fix cached file size problems in duplicate extents (reflink)")
      Signed-off-by: David Howells <dhowells@redhat.com>
      cc: Steve French <sfrench@samba.org>
      cc: Paulo Alcantara <pc@manguebit.com>
      cc: Shyam Prasad N <nspmangalore@gmail.com>
      cc: Rohith Surabattula <rohiths.msft@gmail.com>
      cc: Jeff Layton <jlayton@kernel.org>
      cc: linux-cifs@vger.kernel.org
      cc: netfs@lists.linux.dev
      Signed-off-by: Steve French <stfrench@microsoft.com>
      93a43155
    • cifs: Fix smb3_insert_range() to move the zero_point · 8a160723
      David Howells authored
      Fix smb3_insert_range() to move the zero_point over to the new EOF.
      Without this, generic/147 fails as reads of data beyond the old EOF point
      return zeroes.
      
      Fixes: 3ee1a1fc ("cifs: Cut over to using netfslib")
      Signed-off-by: David Howells <dhowells@redhat.com>
      cc: Shyam Prasad N <nspmangalore@gmail.com>
      cc: Rohith Surabattula <rohiths.msft@gmail.com>
      cc: Jeff Layton <jlayton@kernel.org>
      cc: linux-cifs@vger.kernel.org
      cc: netfs@lists.linux.dev
      Signed-off-by: Steve French <stfrench@microsoft.com>
      8a160723
    • Merge tag 'mm-stable-2024-05-24-11-49' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm · 0b32d436
      Linus Torvalds authored
      Pull more mm updates from Andrew Morton:
       "Jeff Xu's implementation of the mseal() syscall"
      
      * tag 'mm-stable-2024-05-24-11-49' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
        selftest mm/mseal read-only elf memory segment
        mseal: add documentation
        selftest mm/mseal memory sealing
        mseal: add mseal syscall
        mseal: wire up mseal syscall
      0b32d436
    • mm/ksm: fix possible UAF of stable_node · 90e82349
      Chengming Zhou authored
      Commit 2c653d0e ("ksm: introduce ksm_max_page_sharing per page
      deduplication limit") introduced a possible failure case in
      stable_tree_insert(), where we may free the newly allocated
      stable_node_dup if we fail to prepare the missing chain node.
      
      The kfolio is then returned and unlocked with a freed stable_node set,
      and any MM activity can then access kfolio->mapping, resulting in a UAF.
      
      Fix it by moving folio_set_stable_node() to the end after stable_node
      is inserted successfully.
      
      Link: https://lkml.kernel.org/r/20240513-b4-ksm-stable-node-uaf-v1-1-f687de76f452@linux.dev
      Fixes: 2c653d0e ("ksm: introduce ksm_max_page_sharing per page deduplication limit")
      Signed-off-by: Chengming Zhou <chengming.zhou@linux.dev>
      Acked-by: David Hildenbrand <david@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Stefan Roesch <shr@devkernel.io>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      90e82349
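The ordering change described in the ksm fix above can be illustrated with a small userspace model. This is only a sketch under stated assumptions: `folio_model`, `stable_node_model`, `prepare_chain()` and both insert functions are hypothetical stand-ins, not the kernel's KSM code. It shows why the pointer should be published only after the step that can fail has succeeded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the kernel objects; not the real KSM types. */
struct stable_node_model { int id; };
struct folio_model { struct stable_node_model *stable_node; };

/* Stand-in for the step that can fail (preparing the missing chain node). */
bool prepare_chain(bool should_fail)
{
	return !should_fail;
}

/* Problematic ordering: the folio points at the node before success is known. */
bool insert_publish_early(struct folio_model *folio, bool should_fail)
{
	struct stable_node_model *node = malloc(sizeof(*node));

	if (!node)
		return false;
	folio->stable_node = node;	/* published too early */
	if (!prepare_chain(should_fail)) {
		free(node);		/* folio->stable_node now dangles */
		return false;
	}
	return true;
}

/* Fixed ordering: publish the node only once the insert has succeeded. */
bool insert_publish_late(struct folio_model *folio, bool should_fail)
{
	struct stable_node_model *node = malloc(sizeof(*node));

	if (!node)
		return false;
	if (!prepare_chain(should_fail)) {
		free(node);		/* nothing refers to the node yet */
		return false;
	}
	folio->stable_node = node;	/* publish last */
	return true;
}
```

With `insert_publish_late()`, a failed insert leaves `folio->stable_node` untouched, so no other code path can reach the freed node.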
    • mm/memory-failure: fix handling of dissolved but not taken off from buddy pages · 8cf360b9
      Miaohe Lin authored
      While running memory failure tests recently, the following panic occurred:
      
      page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x8cee00
      flags: 0x6fffe0000000000(node=1|zone=2|lastcpupid=0x7fff)
      raw: 06fffe0000000000 dead000000000100 dead000000000122 0000000000000000
      raw: 0000000000000000 0000000000000009 00000000ffffffff 0000000000000000
      page dumped because: VM_BUG_ON_PAGE(!PageBuddy(page))
      ------------[ cut here ]------------
      kernel BUG at include/linux/page-flags.h:1009!
      invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
      RIP: 0010:__del_page_from_free_list+0x151/0x180
      RSP: 0018:ffffa49c90437998 EFLAGS: 00000046
      RAX: 0000000000000035 RBX: 0000000000000009 RCX: ffff8dd8dfd1c9c8
      RDX: 0000000000000000 RSI: 0000000000000027 RDI: ffff8dd8dfd1c9c0
      RBP: ffffd901233b8000 R08: ffffffffab5511f8 R09: 0000000000008c69
      R10: 0000000000003c15 R11: ffffffffab5511f8 R12: ffff8dd8fffc0c80
      R13: 0000000000000001 R14: ffff8dd8fffc0c80 R15: 0000000000000009
      FS:  00007ff916304740(0000) GS:ffff8dd8dfd00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 000055eae50124c8 CR3: 00000008479e0000 CR4: 00000000000006f0
      Call Trace:
       <TASK>
       __rmqueue_pcplist+0x23b/0x520
       get_page_from_freelist+0x26b/0xe40
       __alloc_pages_noprof+0x113/0x1120
       __folio_alloc_noprof+0x11/0xb0
       alloc_buddy_hugetlb_folio.isra.0+0x5a/0x130
       __alloc_fresh_hugetlb_folio+0xe7/0x140
       alloc_pool_huge_folio+0x68/0x100
       set_max_huge_pages+0x13d/0x340
       hugetlb_sysctl_handler_common+0xe8/0x110
       proc_sys_call_handler+0x194/0x280
       vfs_write+0x387/0x550
       ksys_write+0x64/0xe0
       do_syscall_64+0xc2/0x1d0
       entry_SYSCALL_64_after_hwframe+0x77/0x7f
      RIP: 0033:0x7ff916114887
      RSP: 002b:00007ffec8a2fd78 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
      RAX: ffffffffffffffda RBX: 000055eae500e350 RCX: 00007ff916114887
      RDX: 0000000000000004 RSI: 000055eae500e390 RDI: 0000000000000003
      RBP: 000055eae50104c0 R08: 0000000000000000 R09: 000055eae50104c0
      R10: 0000000000000077 R11: 0000000000000246 R12: 0000000000000004
      R13: 0000000000000004 R14: 00007ff916216b80 R15: 00007ff916216a00
       </TASK>
      Modules linked in: mce_inject hwpoison_inject
      ---[ end trace 0000000000000000 ]---
      
      Before the panic, there was a warning about bad page state:
      
      BUG: Bad page state in process page-types  pfn:8cee00
      page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x8cee00
      flags: 0x6fffe0000000000(node=1|zone=2|lastcpupid=0x7fff)
      page_type: 0xffffff7f(buddy)
      raw: 06fffe0000000000 ffffd901241c0008 ffffd901240f8008 0000000000000000
      raw: 0000000000000000 0000000000000009 00000000ffffff7f 0000000000000000
      page dumped because: nonzero mapcount
      Modules linked in: mce_inject hwpoison_inject
      CPU: 8 PID: 154211 Comm: page-types Not tainted 6.9.0-rc4-00499-g5544ec3178e2-dirty #22
      Call Trace:
       <TASK>
       dump_stack_lvl+0x83/0xa0
       bad_page+0x63/0xf0
       free_unref_page+0x36e/0x5c0
       unpoison_memory+0x50b/0x630
       simple_attr_write_xsigned.constprop.0.isra.0+0xb3/0x110
       debugfs_attr_write+0x42/0x60
       full_proxy_write+0x5b/0x80
       vfs_write+0xcd/0x550
       ksys_write+0x64/0xe0
       do_syscall_64+0xc2/0x1d0
       entry_SYSCALL_64_after_hwframe+0x77/0x7f
      RIP: 0033:0x7f189a514887
      RSP: 002b:00007ffdcd899718 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
      RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f189a514887
      RDX: 0000000000000009 RSI: 00007ffdcd899730 RDI: 0000000000000003
      RBP: 00007ffdcd8997a0 R08: 0000000000000000 R09: 00007ffdcd8994b2
      R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffdcda199a8
      R13: 0000000000404af1 R14: 000000000040ad78 R15: 00007f189a7a5040
       </TASK>
      
      The root cause appears to be the following race:
      
       memory_failure
        try_memory_failure_hugetlb
         me_huge_page
          __page_handle_poison
           dissolve_free_hugetlb_folio
           drain_all_pages -- Buddy page can be isolated e.g. for compaction.
           take_page_off_buddy -- Failed as page is not in the buddy list.
      	     -- Page can be putback into buddy after compaction.
          page_ref_inc -- Leads to buddy page with refcnt = 1.
      
      Then unpoison_memory() can unpoison the page and send the buddy page back
      into the buddy list again, leading to the above bad page state warning.
      bad_page() will then call page_mapcount_reset(), which removes PageBuddy
      from the buddy page and leads to the later VM_BUG_ON_PAGE(!PageBuddy(page))
      when trying to allocate this page.
      
      Fix this issue by only treating __page_handle_poison() as successful when
      it returns 1.
      
      Link: https://lkml.kernel.org/r/20240523071217.1696196-1-linmiaohe@huawei.com
      Fixes: ceaf8fbe ("mm, hwpoison: skip raw hwpoison page in freeing 1GB hugepage")
      Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      8cf360b9
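The memory-failure fix above amounts to tightening a success check. The sketch below models it in plain C. The three-way return convention and the "before" comparison are assumptions inferred from the commit text (which only says that success now means a return value of 1), and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Assumed return convention for the poison-handling helper:
 *   < 0  dissolving the hugetlb folio failed
 *     0  dissolved, but the page could not be taken off the buddy list
 *     1  dissolved and taken off the buddy list
 */
enum {
	POISON_DISSOLVE_FAILED = -1,
	POISON_NOT_TAKEN_OFF = 0,
	POISON_TAKEN_OFF = 1,
};

/* Assumed earlier check: any non-negative value counted as success. */
bool handled_before_fix(int ret)
{
	return ret >= 0;
}

/* Check as described in the commit: only a return value of 1 is success. */
bool handled_after_fix(int ret)
{
	return ret > 0;
}
```

The two predicates differ only for the middle case, which is exactly the "dissolved but not taken off from buddy" situation named in the commit title.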
    • mm: /proc/pid/smaps_rollup: avoid skipping vma after getting mmap_lock again · 6d065f50
      Yuanyuan Zhong authored
      After switching smaps_rollup to use VMA iterator, searching for next entry
      is part of the condition expression of the do-while loop.  So the current
      VMA needs to be addressed before the continue statement.
      
      Otherwise, with some VMAs skipped, the memory consumption that userspace
      observes from /proc/pid/smaps_rollup will be smaller than the sum of
      the corresponding fields from /proc/pid/smaps.
      
      Link: https://lkml.kernel.org/r/20240523183531.2535436-1-yzhong@purestorage.com
      Fixes: c4c84f06 ("fs/proc/task_mmu: stop using linked list and highest_vm_end")
      Signed-off-by: Yuanyuan Zhong <yzhong@purestorage.com>
      Reviewed-by: Mohamed Khalfella <mkhalfella@purestorage.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      6d065f50
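The smaps_rollup bug above follows from how `continue` behaves in a do-while loop: it jumps to the loop condition, and here the condition is what advances the iterator. A minimal sketch of that behaviour, using a hypothetical array iterator in place of the VMA iterator and a `relock_at` index standing in for the point where the lock is dropped and retaken:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical iterator over an array of per-entry sizes. */
struct iter {
	const int *items;
	size_t count;
	size_t pos;
};

/* Returns the next entry, or NULL when the array is exhausted. */
static const int *iter_next(struct iter *it)
{
	return it->pos < it->count ? &it->items[it->pos++] : NULL;
}

/* Problematic walk: "continue" re-evaluates the condition, skipping cur. */
int sum_skipping(const int *items, size_t count, size_t relock_at)
{
	struct iter it = { items, count, 0 };
	const int *cur = iter_next(&it);
	int total = 0;

	if (!cur)
		return 0;
	do {
		if ((size_t)(cur - items) == relock_at) {
			relock_at = count;	/* pretend to relock only once */
			continue;		/* cur is never accounted */
		}
		total += *cur;
	} while ((cur = iter_next(&it)) != NULL);
	return total;
}

/* Fixed walk: account for the current entry before continuing. */
int sum_fixed(const int *items, size_t count, size_t relock_at)
{
	struct iter it = { items, count, 0 };
	const int *cur = iter_next(&it);
	int total = 0;

	if (!cur)
		return 0;
	do {
		total += *cur;			/* handle cur first */
		if ((size_t)(cur - items) == relock_at) {
			relock_at = count;
			continue;
		}
	} while ((cur = iter_next(&it)) != NULL);
	return total;
}
```

For the entries {10, 20, 30} with a relock at index 1, `sum_skipping()` drops the middle entry while `sum_fixed()` counts all three, which mirrors the under-reporting described in the commit.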
    • nilfs2: fix potential hang in nilfs_detach_log_writer() · eb85dace
      Ryusuke Konishi authored
      Syzbot has reported a potential hang in nilfs_detach_log_writer() called
      during nilfs2 unmount.
      
      Analysis revealed that this is because nilfs_segctor_sync(), which
      synchronizes with the log writer thread, can be called after
      nilfs_segctor_destroy() terminates that thread, as shown in the call trace
      below:
      
      nilfs_detach_log_writer
        nilfs_segctor_destroy
          nilfs_segctor_kill_thread  --> Shut down log writer thread
          flush_work
            nilfs_iput_work_func
              nilfs_dispose_list
                iput
                  nilfs_evict_inode
                    nilfs_transaction_commit
                      nilfs_construct_segment (if inode needs sync)
                        nilfs_segctor_sync  --> Attempt to synchronize with
                                                log writer thread
                                 *** DEADLOCK ***
      
      Fix this issue by changing nilfs_segctor_sync() so that the log writer
      thread returns normally without synchronizing after it terminates, and by
      forcing tasks that are already waiting to complete once after the thread
      terminates.
      
      The skipped inode metadata flushout will then be processed together in the
      subsequent cleanup work in nilfs_segctor_destroy().
      
      Link: https://lkml.kernel.org/r/20240520132621.4054-4-konishi.ryusuke@gmail.com
      Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
      Reported-by: syzbot+e3973c409251e136fdd0@syzkaller.appspotmail.com
      Closes: https://syzkaller.appspot.com/bug?extid=e3973c409251e136fdd0
      Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
      Cc: <stable@vger.kernel.org>
      Cc: "Bai, Shuangpeng" <sjb7183@psu.edu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      eb85dace
    • nilfs2: fix unexpected freezing of nilfs_segctor_sync() · 936184ea
      Ryusuke Konishi authored
      A potential and reproducible race issue has been identified where
      nilfs_segctor_sync() would block even after the log writer thread writes a
      checkpoint, unless there is an interrupt or other trigger to resume log
      writing.
      
      This turned out to be because, depending on the execution timing of the
      log writer thread running in parallel, the log writer thread may skip
      responding to nilfs_segctor_sync(), which causes a call to schedule()
      waiting for completion within nilfs_segctor_sync() to lose the opportunity
      to wake up.
      
      The reason why waking up the task waiting in nilfs_segctor_sync() may be
      skipped is that updating the request generation issued using a shared
      sequence counter and adding a wait queue entry to the log writer's
      request wait queue are not done atomically.  There is a possibility that
      log writing and request completion notification by nilfs_segctor_wakeup()
      may occur between the two operations, and in that case, the wait queue
      entry is not yet visible to nilfs_segctor_wakeup() and the wake-up of
      nilfs_segctor_sync() will be carried over until the next request occurs.
      
      Fix this issue by performing these two operations simultaneously within
      the lock section of sc_state_lock.  Also, following the memory barrier
      guidelines for event waiting loops, move the call to set_current_state()
      in the same location into the event waiting loop to ensure that a memory
      barrier is inserted just before the event condition determination.
      
      Link: https://lkml.kernel.org/r/20240520132621.4054-3-konishi.ryusuke@gmail.com
      Fixes: 9ff05123 ("nilfs2: segment constructor")
      Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
      Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
      Cc: <stable@vger.kernel.org>
      Cc: "Bai, Shuangpeng" <sjb7183@psu.edu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      936184ea
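The lost wake-up described in the nilfs2 freeze fix above can be modelled without threads by placing the writer's completion step at the worst possible point. This is a simplified, single-threaded sketch: `sync_model` and all functions are hypothetical, and the "lock" exists only as comments. It is not the nilfs2 implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the request state; not the nilfs2 structures. */
struct sync_model {
	unsigned int seq_request;	/* generation of the newest request */
	unsigned int seq_done;		/* generation completed by the writer */
	bool waiter_queued;		/* is a waiter on the wait queue? */
	bool waiter_woken;		/* has the waiter been woken? */
};

/* Writer side: complete all issued requests and wake a queued waiter. */
void writer_complete(struct sync_model *m)
{
	m->seq_done = m->seq_request;
	if (m->waiter_queued) {
		m->waiter_woken = true;
		m->waiter_queued = false;
	}
}

/* Racy ordering: the writer runs between the two non-atomic steps. */
bool sync_racy(struct sync_model *m)
{
	m->seq_request++;		/* step 1: issue the request */
	writer_complete(m);		/* interleaving: the queue is still empty */
	m->waiter_queued = true;	/* step 2: queued too late, wake-up lost */
	return m->waiter_woken;
}

/* Fixed ordering: both steps form one critical section. */
bool sync_atomic(struct sync_model *m)
{
	/* lock taken here in a real implementation */
	m->seq_request++;
	m->waiter_queued = true;
	/* lock released here; the writer can only run before or after */
	writer_complete(m);
	return m->waiter_woken;
}
```

In the racy variant the waiter ends up queued but never woken, which corresponds to the wake-up being "carried over until the next request occurs".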
    • nilfs2: fix use-after-free of timer for log writer thread · f5d4e046
      Ryusuke Konishi authored
      Patch series "nilfs2: fix log writer related issues".
      
      This bug fix series covers three nilfs2 log writer-related issues,
      including a timer use-after-free issue and potential deadlock issue on
      unmount, and a potential freeze issue in event synchronization found
      during their analysis.  Details are described in each commit log.
      
      
      This patch (of 3):
      
      A use-after-free issue has been reported regarding the timer sc_timer on
      the nilfs_sc_info structure.
      
      The problem is that even though it is used to wake up a sleeping log
      writer thread, sc_timer is not shut down until the nilfs_sc_info structure
      is about to be freed, and is used regardless of the thread's lifetime.
      
      Fix this issue by limiting the use of sc_timer only while the log writer
      thread is alive.
      
      Link: https://lkml.kernel.org/r/20240520132621.4054-1-konishi.ryusuke@gmail.com
      Link: https://lkml.kernel.org/r/20240520132621.4054-2-konishi.ryusuke@gmail.com
      Fixes: fdce895e ("nilfs2: change sc_timer from a pointer to an embedded one in struct nilfs_sc_info")
      Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
      Reported-by: "Bai, Shuangpeng" <sjb7183@psu.edu>
      Closes: https://groups.google.com/g/syzkaller/c/MK_LYqtt8ko/m/8rgdWeseAwAJ
      Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      f5d4e046
    • selftests/mm: fix build warnings on ppc64 · 1901472f
      Michael Ellerman authored
      Fix warnings like:
      
        In file included from uffd-unit-tests.c:8:
        uffd-unit-tests.c: In function `uffd_poison_handle_fault':
        uffd-common.h:45:33: warning: format `%llu' expects argument of type
        `long long unsigned int', but argument 3 has type `__u64' {aka `long
        unsigned int'} [-Wformat=]
      
      By switching to unsigned long long for u64 for ppc64 builds.
      
      Link: https://lkml.kernel.org/r/20240521030219.57439-1-mpe@ellerman.id.au
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Shuah Khan <skhan@linuxfoundation.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      1901472f
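The warning in the ppc64 selftest fix above appears because `%llu` expects `unsigned long long`, while a 64-bit type may be defined as `unsigned long` on some platforms. The commit addresses this by changing which type is used for u64 on ppc64 builds. The sketch below shows a different, generic way to avoid the same warning in ordinary userspace code, by casting explicitly; `format_u64()` is a hypothetical helper, not part of the selftests.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Format a 64-bit value with %llu regardless of whether the platform
 * defines its 64-bit type as unsigned long or unsigned long long.
 * The explicit cast makes the argument type match the format.
 */
int format_u64(char *buf, size_t len, uint64_t value)
{
	return snprintf(buf, len, "%llu", (unsigned long long)value);
}
```

The `PRIu64` macro from `<inttypes.h>` is another common option when the value is a `uint64_t`.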
    • arm64: patching: fix handling of execmem addresses · b1480ed2
      Will Deacon authored
      Klara Modin reported warnings for a kernel configured with BPF_JIT but
      without MODULES:
      
      [   44.131296] Trying to vfree() bad address (000000004a17c299)
      [   44.138024] WARNING: CPU: 1 PID: 193 at mm/vmalloc.c:3189 remove_vm_area (mm/vmalloc.c:3189 (discriminator 1))
      [   44.146675] CPU: 1 PID: 193 Comm: kworker/1:2 Tainted: G      D W          6.9.0-01786-g2c9e5d4a #25
      [   44.158229] Hardware name: Raspberry Pi 3 Model B (DT)
      [   44.164433] Workqueue: events bpf_prog_free_deferred
      [   44.170492] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
      [   44.178601] pc : remove_vm_area (mm/vmalloc.c:3189 (discriminator 1))
      [   44.183705] lr : remove_vm_area (mm/vmalloc.c:3189 (discriminator 1))
      [   44.188772] sp : ffff800082a13c70
      [   44.193112] x29: ffff800082a13c70 x28: 0000000000000000 x27: 0000000000000000
      [   44.201384] x26: 0000000000000000 x25: ffff00003a44efa0 x24: 00000000d4202000
      [   44.209658] x23: ffff800081223dd0 x22: ffff00003a198a40 x21: ffff8000814dd880
      [   44.217924] x20: 00000000d4202000 x19: ffff8000814dd880 x18: 0000000000000006
      [   44.226206] x17: 0000000000000000 x16: 0000000000000020 x15: 0000000000000002
      [   44.234460] x14: ffff8000811a6370 x13: 0000000020000000 x12: 0000000000000000
      [   44.242710] x11: ffff8000811a6370 x10: 0000000000000144 x9 : ffff8000811fe370
      [   44.250959] x8 : 0000000000017fe8 x7 : 00000000fffff000 x6 : ffff8000811fe370
      [   44.259206] x5 : 0000000000000000 x4 : 0000000000000000 x3 : 0000000000000000
      [   44.267457] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff000002203240
      [   44.275703] Call trace:
      [   44.279158] remove_vm_area (mm/vmalloc.c:3189 (discriminator 1))
      [   44.283858] vfree (mm/vmalloc.c:3322)
      [   44.287835] execmem_free (mm/execmem.c:70)
      [   44.292347] bpf_jit_free_exec+0x10/0x1c
      [   44.297283] bpf_prog_pack_free (kernel/bpf/core.c:1006)
      [   44.302457] bpf_jit_binary_pack_free (kernel/bpf/core.c:1195)
      [   44.307951] bpf_jit_free (include/linux/filter.h:1083 arch/arm64/net/bpf_jit_comp.c:2474)
      [   44.312342] bpf_prog_free_deferred (kernel/bpf/core.c:2785)
      [   44.317785] process_one_work (kernel/workqueue.c:3273)
      [   44.322684] worker_thread (kernel/workqueue.c:3342 (discriminator 2) kernel/workqueue.c:3429 (discriminator 2))
      [   44.327292] kthread (kernel/kthread.c:388)
      [   44.331342] ret_from_fork (arch/arm64/kernel/entry.S:861)
      
      The problem is that bpf_arch_text_copy() silently fails to write to the
      read-only area as a result of patch_map() faulting and the resulting
      -EFAULT being discarded.
      
      Update patch_map() to use CONFIG_EXECMEM instead of
      CONFIG_STRICT_MODULE_RWX to check for vmalloc addresses.
      
      Link: https://lkml.kernel.org/r/20240521213813.703309-1-rppt@kernel.org
      Fixes: 2c9e5d4a ("bpf: remove CONFIG_BPF_JIT dependency on CONFIG_MODULES of")
      Signed-off-by: Will Deacon <will@kernel.org>
      Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
      Reported-by: Klara Modin <klarasmodin@gmail.com>
      Closes: https://lore.kernel.org/all/7983fbbf-0127-457c-9394-8d6e4299c685@gmail.com
      Tested-by: Klara Modin <klarasmodin@gmail.com>
      Cc: Björn Töpel <bjorn@kernel.org>
      Cc: Luis Chamberlain <mcgrof@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      b1480ed2