1. 05 Apr, 2019 40 commits
    • Håkon Bugge's avatar
      IB/mlx4: Increase the timeout for CM cache · d3ec442d
      Håkon Bugge authored
      [ Upstream commit 2612d723 ]
      
      Using CX-3 virtual functions, either from a bare-metal machine or
      pass-through from a VM, MAD packets are proxied through the PF driver.
      
      Since the VF drivers have separate name spaces for MAD Transaction Ids
      (TIDs), the PF driver has to re-map the TIDs and keep the book keeping
      in a cache.
      
      Following the RDMA Connection Manager (CM) protocol, it is clear when
      an entry has to evicted form the cache. But life is not perfect,
      remote peers may die or be rebooted. Hence, it's a timeout to wipe out
      a cache entry, when the PF driver assumes the remote peer has gone.
      
      During workloads where a high number of QPs are destroyed concurrently,
      excessive amount of CM DREQ retries has been observed
      
      The problem can be demonstrated in a bare-metal environment, where two
      nodes have instantiated 8 VFs each. This using dual ported HCAs, so we
      have 16 vPorts per physical server.
      
      64 processes are associated with each vPort and creates and destroys
      one QP for each of the remote 64 processes. That is, 1024 QPs per
      vPort, all in all 16K QPs. The QPs are created/destroyed using the
      CM.
      
      When tearing down these 16K QPs, excessive CM DREQ retries (and
      duplicates) are observed. With some cat/paste/awk wizardry on the
      infiniband_cm sysfs, we observe as sum of the 16 vPorts on one of the
      nodes:
      
      cm_rx_duplicates:
            dreq  2102
      cm_rx_msgs:
            drep  1989
            dreq  6195
             rep  3968
             req  4224
             rtu  4224
      cm_tx_msgs:
            drep  4093
            dreq 27568
             rep  4224
             req  3968
             rtu  3968
      cm_tx_retries:
            dreq 23469
      
      Note that the active/passive side is equally distributed between the
      two nodes.
      
      Enabling pr_debug in cm.c gives tons of:
      
      [171778.814239] <mlx4_ib> mlx4_ib_multiplex_cm_handler: id{slave:
      1,sl_cm_id: 0xd393089f} is NULL!
      
      By increasing the CM_CLEANUP_CACHE_TIMEOUT from 5 to 30 seconds, the
      tear-down phase of the application is reduced from approximately 90 to
      50 seconds. Retries/duplicates are also significantly reduced:
      
      cm_rx_duplicates:
            dreq  2460
      []
      cm_tx_retries:
            dreq  3010
             req    47
      
      Increasing the timeout further didn't help, as these duplicates and
      retries stems from a too short CMA timeout, which was 20 (~4 seconds)
      on the systems. By increasing the CMA timeout to 22 (~17 seconds), the
      numbers fell down to about 10 for both of them.
      
      Adjustment of the CMA timeout is not part of this commit.
      Signed-off-by: default avatarHåkon Bugge <haakon.bugge@oracle.com>
      Acked-by: default avatarJack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: default avatarJason Gunthorpe <jgg@mellanox.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      d3ec442d
    • Dongli Zhang's avatar
      loop: set GENHD_FL_NO_PART_SCAN after blkdev_reread_part() · 61584032
      Dongli Zhang authored
      [ Upstream commit 758a58d0 ]
      
      Commit 0da03cab
      ("loop: Fix deadlock when calling blkdev_reread_part()") moves
      blkdev_reread_part() out of the loop_ctl_mutex. However,
      GENHD_FL_NO_PART_SCAN is set before __blkdev_reread_part(). As a result,
      __blkdev_reread_part() will fail the check of GENHD_FL_NO_PART_SCAN and
      will not rescan the loop device to delete all partitions.
      
      Below are steps to reproduce the issue:
      
      step1 # dd if=/dev/zero of=tmp.raw bs=1M count=100
      step2 # losetup -P /dev/loop0 tmp.raw
      step3 # parted /dev/loop0 mklabel gpt
      step4 # parted -a none -s /dev/loop0 mkpart primary 64s 1
      step5 # losetup -d /dev/loop0
      
      Step5 will not be able to delete /dev/loop0p1 (introduced by step4) and
      there is below kernel warning message:
      
      [  464.414043] __loop_clr_fd: partition scan of loop0 failed (rc=-22)
      
      This patch sets GENHD_FL_NO_PART_SCAN after blkdev_reread_part().
      
      Fixes: 0da03cab ("loop: Fix deadlock when calling blkdev_reread_part()")
      Signed-off-by: default avatarDongli Zhang <dongli.zhang@oracle.com>
      Reviewed-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      61584032
    • Vadim Pasternak's avatar
      platform/mellanox: mlxreg-hotplug: Fix KASAN warning · 07a31820
      Vadim Pasternak authored
      [ Upstream commit e4c275f7 ]
      
      Fix the following KASAN warning produced when booting a 64-bit kernel:
      [   13.334750] BUG: KASAN: stack-out-of-bounds in find_first_bit+0x19/0x70
      [   13.342166] Read of size 8 at addr ffff880235067178 by task kworker/2:1/42
      [   13.342176] CPU: 2 PID: 42 Comm: kworker/2:1 Not tainted 4.20.0-rc1+ #106
      [   13.342179] Hardware name: Mellanox Technologies Ltd. MSN2740/Mellanox x86 SFF board, BIOS 5.6.5 06/07/2016
      [   13.342190] Workqueue: events deferred_probe_work_func
      [   13.342194] Call Trace:
      [   13.342206]  dump_stack+0xc7/0x15b
      [   13.342214]  ? show_regs_print_info+0x5/0x5
      [   13.342220]  ? kmsg_dump_rewind_nolock+0x59/0x59
      [   13.342234]  ? _raw_write_lock_irqsave+0x100/0x100
      [   13.351593]  print_address_description+0x73/0x260
      [   13.351603]  kasan_report+0x260/0x380
      [   13.351611]  ? find_first_bit+0x19/0x70
      [   13.351619]  find_first_bit+0x19/0x70
      [   13.351630]  mlxreg_hotplug_work_handler+0x73c/0x920 [mlxreg_hotplug]
      [   13.351639]  ? __lock_text_start+0x8/0x8
      [   13.351646]  ? _raw_write_lock_irqsave+0x80/0x100
      [   13.351656]  ? mlxreg_hotplug_remove+0x1e0/0x1e0 [mlxreg_hotplug]
      [   13.351663]  ? regmap_volatile+0x40/0xb0
      [   13.351668]  ? regcache_write+0x4c/0x90
      [   13.351676]  ? mlxplat_mlxcpld_reg_write+0x24/0x30 [mlx_platform]
      [   13.351681]  ? _regmap_write+0xea/0x220
      [   13.351688]  ? __mutex_lock_slowpath+0x10/0x10
      [   13.351696]  ? devm_add_action+0x70/0x70
      [   13.351701]  ? mutex_unlock+0x1d/0x40
      [   13.351710]  mlxreg_hotplug_probe+0x82e/0x989 [mlxreg_hotplug]
      [   13.351723]  ? mlxreg_hotplug_work_handler+0x920/0x920 [mlxreg_hotplug]
      [   13.351731]  ? sysfs_do_create_link_sd.isra.2+0xf4/0x190
      [   13.351737]  ? sysfs_rename_link_ns+0xf0/0xf0
      [   13.351743]  ? devres_close_group+0x2b0/0x2b0
      [   13.351749]  ? pinctrl_put+0x20/0x20
      [   13.351755]  ? acpi_dev_pm_attach+0x2c/0xd0
      [   13.351763]  platform_drv_probe+0x70/0xd0
      [   13.351771]  really_probe+0x480/0x6e0
      [   13.351778]  ? device_attach+0x10/0x10
      [   13.351784]  ? __lock_text_start+0x8/0x8
      [   13.351790]  ? _raw_write_lock_irqsave+0x80/0x100
      [   13.351797]  ? _raw_write_lock_irqsave+0x80/0x100
      [   13.351806]  ? __driver_attach+0x190/0x190
      [   13.351812]  driver_probe_device+0x17d/0x1a0
      [   13.351819]  ? __driver_attach+0x190/0x190
      [   13.351825]  bus_for_each_drv+0xd6/0x130
      [   13.351831]  ? bus_rescan_devices+0x20/0x20
      [   13.351837]  ? __mutex_lock_slowpath+0x10/0x10
      [   13.351845]  __device_attach+0x18c/0x230
      [   13.351852]  ? device_bind_driver+0x70/0x70
      [   13.351859]  ? __mutex_lock_slowpath+0x10/0x10
      [   13.351866]  bus_probe_device+0xea/0x110
      [   13.351874]  deferred_probe_work_func+0x1c9/0x290
      [   13.351882]  ? driver_deferred_probe_add+0x1d0/0x1d0
      [   13.351889]  ? preempt_notifier_dec+0x20/0x20
      [   13.351897]  ? read_word_at_a_time+0xe/0x20
      [   13.351904]  ? strscpy+0x151/0x290
      [   13.351912]  ? set_work_pool_and_clear_pending+0x9c/0xf0
      [   13.351918]  ? __switch_to_asm+0x34/0x70
      [   13.351924]  ? __switch_to_asm+0x40/0x70
      [   13.351929]  ? __switch_to_asm+0x34/0x70
      [   13.351935]  ? __switch_to_asm+0x40/0x70
      [   13.351942]  process_one_work+0x5cc/0xa00
      [   13.351952]  ? pwq_dec_nr_in_flight+0x1e0/0x1e0
      [   13.351960]  ? pci_mmcfg_check_reserved+0x80/0xb8
      [   13.351967]  ? run_rebalance_domains+0x250/0x250
      [   13.351980]  ? stack_access_ok+0x35/0x80
      [   13.351986]  ? deref_stack_reg+0xa1/0xe0
      [   13.351994]  ? schedule+0xcd/0x250
      [   13.352000]  ? worker_enter_idle+0x2d6/0x330
      [   13.352006]  ? __schedule+0xeb0/0xeb0
      [   13.352014]  ? fork_usermode_blob+0x130/0x130
      [   13.352019]  ? mutex_lock+0xa7/0x100
      [   13.352026]  ? _raw_spin_lock_irq+0x98/0xf0
      [   13.352032]  ? _raw_read_unlock_irqrestore+0x30/0x30
      [   13.352037] i2c i2c-2: Added multiplexed i2c bus 11
      [   13.352043]  worker_thread+0x181/0xa80
      [   13.352052]  ? __switch_to_asm+0x34/0x70
      [   13.352058]  ? __switch_to_asm+0x40/0x70
      [   13.352064]  ? process_one_work+0xa00/0xa00
      [   13.352070]  ? __switch_to_asm+0x34/0x70
      [   13.352076]  ? __switch_to_asm+0x40/0x70
      [   13.352081]  ? __switch_to_asm+0x34/0x70
      [   13.352086]  ? __switch_to_asm+0x40/0x70
      [   13.352092]  ? __switch_to_asm+0x34/0x70
      [   13.352097]  ? __switch_to_asm+0x40/0x70
      [   13.352105]  ? __schedule+0x3d6/0xeb0
      [   13.352112]  ? migrate_swap_stop+0x470/0x470
      [   13.352119]  ? save_stack+0x89/0xb0
      [   13.352127]  ? kmem_cache_alloc_trace+0xe5/0x570
      [   13.352132]  ? kthread+0x59/0x1d0
      [   13.352138]  ? ret_from_fork+0x35/0x40
      [   13.352154]  ? __schedule+0xeb0/0xeb0
      [   13.352161]  ? remove_wait_queue+0x150/0x150
      [   13.352169]  ? _raw_write_lock_irqsave+0x80/0x100
      [   13.352175]  ? __lock_text_start+0x8/0x8
      [   13.352183]  ? process_one_work+0xa00/0xa00
      [   13.352188]  kthread+0x1a4/0x1d0
      [   13.352195]  ? kthread_create_worker_on_cpu+0xc0/0xc0
      [   13.352202]  ret_from_fork+0x35/0x40
      
      [   13.353879] The buggy address belongs to the page:
      [   13.353885] page:ffffea0008d419c0 count:0 mapcount:0 mapping:0000000000000000 index:0x0
      [   13.353890] flags: 0x2ffff8000000000()
      [   13.353897] raw: 02ffff8000000000 ffffea0008d419c8 ffffea0008d419c8 0000000000000000
      [   13.353903] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
      [   13.353905] page dumped because: kasan: bad access detected
      
      [   13.353908] Memory state around the buggy address:
      [   13.353912]  ffff880235067000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      [   13.353917]  ffff880235067080: 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1 04
      [   13.353921] >ffff880235067100: f2 f2 f2 f2 f2 f2 f2 04 f2 f2 f2 f2 f2 f2 f2 04
      [   13.353923]                                                                 ^
      [   13.353927]  ffff880235067180: f2 f2 f2 f2 f2 f2 f2 04 f2 f2 f2 00 00 00 00 00
      [   13.353931]  ffff880235067200: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      [   13.353933] ==================================================================
      
      The warning is caused by the below loop:
      	for_each_set_bit(bit, (unsigned long *)&asserted, 8) {
      while "asserted" is declared as 'unsigned'.
      
      The casting of 32-bit unsigned integer pointer to a 64-bit unsigned long
      pointer. There are two problems here.
      It causes the access of four extra byte, which can corrupt memory
      The 32-bit pointer address may not be 64-bit aligned.
      
      The fix changes variable "asserted" to "unsigned long".
      
      Fixes: 1f976f69 ("platform/x86: Move Mellanox platform hotplug driver to platform/mellanox")
      Signed-off-by: default avatarVadim Pasternak <vadimp@mellanox.com>
      Signed-off-by: default avatarDarren Hart (VMware) <dvhart@infradead.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      07a31820
    • Yang Fan's avatar
      platform/x86: ideapad-laptop: Fix no_hw_rfkill_list for Lenovo RESCUER R720-15IKBN · 0bacfb4a
      Yang Fan authored
      [ Upstream commit 4d9b2864 ]
      
      Commit ae7c8cba ("platform/x86: ideapad-laptop: add lenovo RESCUER
      R720-15IKBN to no_hw_rfkill_list") added
          DMI_MATCH(DMI_BOARD_NAME, "80WW")
      for Lenovo RESCUER R720-15IKBN.
      
      But DMI_BOARD_NAME does not match 80WW on Lenovo RESCUER R720-15IKBN,
      thus cause Wireless LAN still be hard blocked.
      
      On Lenovo RESCUER R720-15IKBN:
          ~$ cat /sys/class/dmi/id/sys_vendor
          LENOVO
          ~$ cat /sys/class/dmi/id/board_name
          Provence-5R3
          ~$ cat /sys/class/dmi/id/product_name
          80WW
          ~$ cat /sys/class/dmi/id/product_version
          Lenovo R720-15IKBN
      
      So on Lenovo RESCUER R720-15IKBN:
          DMI_SYS_VENDOR should match "LENOVO",
          DMI_BOARD_NAME should match "Provence-5R3",
          DMI_PRODUCT_NAME should match "80WW",
          DMI_PRODUCT_VERSION should match "Lenovo R720-15IKBN".
      
      Fix it, and in according with other entries in no_hw_rfkill_list,
      use DMI_PRODUCT_VERSION instead of DMI_BOARD_NAME.
      
      Fixes: ae7c8cba ("platform/x86: ideapad-laptop: add lenovo RESCUER R720-15IKBN to no_hw_rfkill_list")
      Signed-off-by: default avatarYang Fan <nullptr.cpp@gmail.com>
      Signed-off-by: default avatarDarren Hart (VMware) <dvhart@infradead.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      0bacfb4a
    • Florian Fainelli's avatar
      mlxsw: spectrum: Avoid -Wformat-truncation warnings · a64ffbaf
      Florian Fainelli authored
      [ Upstream commit ab2c4e25 ]
      
      Give precision identifiers to the two snprintf() formatting the priority
      and TC strings to avoid producing these two warnings:
      
      drivers/net/ethernet/mellanox/mlxsw/spectrum.c: In function
      'mlxsw_sp_port_get_prio_strings':
      drivers/net/ethernet/mellanox/mlxsw/spectrum.c:2132:37: warning: '%d'
      directive output may be truncated writing between 1 and 3 bytes into a
      region of size between 0 and 31 [-Wformat-truncation=]
         snprintf(*p, ETH_GSTRING_LEN, "%s_%d",
                                           ^~
      drivers/net/ethernet/mellanox/mlxsw/spectrum.c:2132:3: note: 'snprintf'
      output between 3 and 36 bytes into a destination of size 32
         snprintf(*p, ETH_GSTRING_LEN, "%s_%d",
         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           mlxsw_sp_port_hw_prio_stats[i].str, prio);
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      drivers/net/ethernet/mellanox/mlxsw/spectrum.c: In function
      'mlxsw_sp_port_get_tc_strings':
      drivers/net/ethernet/mellanox/mlxsw/spectrum.c:2143:37: warning: '%d'
      directive output may be truncated writing between 1 and 11 bytes into a
      region of size between 0 and 31 [-Wformat-truncation=]
         snprintf(*p, ETH_GSTRING_LEN, "%s_%d",
                                           ^~
      drivers/net/ethernet/mellanox/mlxsw/spectrum.c:2143:3: note: 'snprintf'
      output between 3 and 44 bytes into a destination of size 32
         snprintf(*p, ETH_GSTRING_LEN, "%s_%d",
         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           mlxsw_sp_port_hw_tc_stats[i].str, tc);
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      Signed-off-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Reviewed-by: default avatarIdo Schimmel <idosch@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      a64ffbaf
    • Florian Fainelli's avatar
      e1000e: Fix -Wformat-truncation warnings · 49dd86f0
      Florian Fainelli authored
      [ Upstream commit 135e7245 ]
      
      Provide precision hints to snprintf() since we know the destination
      buffer size of the RX/TX ring names are IFNAMSIZ + 5 - 1. This fixes the
      following warnings:
      
      drivers/net/ethernet/intel/e1000e/netdev.c: In function
      'e1000_request_msix':
      drivers/net/ethernet/intel/e1000e/netdev.c:2109:13: warning: 'snprintf'
      output may be truncated before the last format character
      [-Wformat-truncation=]
           "%s-rx-0", netdev->name);
                   ^
      drivers/net/ethernet/intel/e1000e/netdev.c:2107:3: note: 'snprintf'
      output between 6 and 21 bytes into a destination of size 20
         snprintf(adapter->rx_ring->name,
         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           sizeof(adapter->rx_ring->name) - 1,
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           "%s-rx-0", netdev->name);
           ~~~~~~~~~~~~~~~~~~~~~~~~
      drivers/net/ethernet/intel/e1000e/netdev.c:2125:13: warning: 'snprintf'
      output may be truncated before the last format character
      [-Wformat-truncation=]
           "%s-tx-0", netdev->name);
                   ^
      drivers/net/ethernet/intel/e1000e/netdev.c:2123:3: note: 'snprintf'
      output between 6 and 21 bytes into a destination of size 20
         snprintf(adapter->tx_ring->name,
         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           sizeof(adapter->tx_ring->name) - 1,
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           "%s-tx-0", netdev->name);
           ~~~~~~~~~~~~~~~~~~~~~~~~
      Signed-off-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      49dd86f0
    • Andrew Lunn's avatar
      net: dsa: mv88e6xxx: Add lockdep classes to fix false positive splat · c6fb45d8
      Andrew Lunn authored
      [ Upstream commit f6d9758b ]
      
      The following false positive lockdep splat has been observed.
      
      ======================================================
      WARNING: possible circular locking dependency detected
      4.20.0+ #302 Not tainted
      ------------------------------------------------------
      systemd-udevd/160 is trying to acquire lock:
      edea6080 (&chip->reg_lock){+.+.}, at: __setup_irq+0x640/0x704
      
      but task is already holding lock:
      edff0340 (&desc->request_mutex){+.+.}, at: __setup_irq+0xa0/0x704
      
      which lock already depends on the new lock.
      
      the existing dependency chain (in reverse order) is:
      
      -> #1 (&desc->request_mutex){+.+.}:
             mutex_lock_nested+0x1c/0x24
             __setup_irq+0xa0/0x704
             request_threaded_irq+0xd0/0x150
             mv88e6xxx_probe+0x41c/0x694 [mv88e6xxx]
             mdio_probe+0x2c/0x54
             really_probe+0x200/0x2c4
             driver_probe_device+0x5c/0x174
             __driver_attach+0xd8/0xdc
             bus_for_each_dev+0x58/0x7c
             bus_add_driver+0xe4/0x1f0
             driver_register+0x7c/0x110
             mdio_driver_register+0x24/0x58
             do_one_initcall+0x74/0x2e8
             do_init_module+0x60/0x1d0
             load_module+0x1968/0x1ff4
             sys_finit_module+0x8c/0x98
             ret_fast_syscall+0x0/0x28
             0xbedf2ae8
      
      -> #0 (&chip->reg_lock){+.+.}:
             __mutex_lock+0x50/0x8b8
             mutex_lock_nested+0x1c/0x24
             __setup_irq+0x640/0x704
             request_threaded_irq+0xd0/0x150
             mv88e6xxx_g2_irq_setup+0xcc/0x1b4 [mv88e6xxx]
             mv88e6xxx_probe+0x44c/0x694 [mv88e6xxx]
             mdio_probe+0x2c/0x54
             really_probe+0x200/0x2c4
             driver_probe_device+0x5c/0x174
             __driver_attach+0xd8/0xdc
             bus_for_each_dev+0x58/0x7c
             bus_add_driver+0xe4/0x1f0
             driver_register+0x7c/0x110
             mdio_driver_register+0x24/0x58
             do_one_initcall+0x74/0x2e8
             do_init_module+0x60/0x1d0
             load_module+0x1968/0x1ff4
             sys_finit_module+0x8c/0x98
             ret_fast_syscall+0x0/0x28
             0xbedf2ae8
      
      other info that might help us debug this:
      
       Possible unsafe locking scenario:
      
             CPU0                    CPU1
             ----                    ----
        lock(&desc->request_mutex);
                                     lock(&chip->reg_lock);
                                     lock(&desc->request_mutex);
        lock(&chip->reg_lock);
      
      &desc->request_mutex refer to two different mutex. #1 is the GPIO for
      the chip interrupt. #2 is the chained interrupt between global 1 and
      global 2.
      
      Add lockdep classes to the GPIO interrupt to avoid this.
      Reported-by: default avatarRussell King <rmk+kernel@armlinux.org.uk>
      Signed-off-by: default avatarAndrew Lunn <andrew@lunn.ch>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      c6fb45d8
    • Aaro Koskinen's avatar
      mmc: omap: fix the maximum timeout setting · 194b888a
      Aaro Koskinen authored
      [ Upstream commit a6327b5e ]
      
      When running OMAP1 kernel on QEMU, MMC access is annoyingly noisy:
      
      	MMC: CTO of 0xff and 0xfe cannot be used!
      	MMC: CTO of 0xff and 0xfe cannot be used!
      	MMC: CTO of 0xff and 0xfe cannot be used!
      	[ad inf.]
      
      Emulator warnings appear to be valid. The TI document SPRU680 [1]
      ("OMAP5910 Dual-Core Processor MultiMedia Card/Secure Data Memory Card
      (MMC/SD) Reference Guide") page 36 states that the maximum timeout is 253
      cycles and "0xff and 0xfe cannot be used".
      
      Fix by using 0xfd as the maximum timeout.
      
      Tested using QEMU 2.5 (Siemens SX1 machine, OMAP310), and also checked on
      real hardware using Palm TE (OMAP310), Nokia 770 (OMAP1710) and Nokia N810
      (OMAP2420) that MMC works as before.
      
      [1] http://www.ti.com/lit/ug/spru680/spru680.pdf
      
      Fixes: 730c9b7e ("[MMC] Add OMAP MMC host driver")
      Signed-off-by: default avatarAaro Koskinen <aaro.koskinen@iki.fi>
      Signed-off-by: default avatarUlf Hansson <ulf.hansson@linaro.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      194b888a
    • Qu Wenruo's avatar
      btrfs: qgroup: Make qgroup async transaction commit more aggressive · dcedd379
      Qu Wenruo authored
      [ Upstream commit f5fef459 ]
      
      [BUG]
      Btrfs qgroup will still hit EDQUOT under the following case:
      
        $ dev=/dev/test/test
        $ mnt=/mnt/btrfs
        $ umount $mnt &> /dev/null
        $ umount $dev &> /dev/null
      
        $ mkfs.btrfs -f $dev
        $ mount $dev $mnt -o nospace_cache
      
        $ btrfs subv create $mnt/subv
        $ btrfs quota enable $mnt
        $ btrfs quota rescan -w $mnt
        $ btrfs qgroup limit -e 1G $mnt/subv
      
        $ fallocate -l 900M $mnt/subv/padding
        $ sync
      
        $ rm $mnt/subv/padding
      
        # Hit EDQUOT
        $ xfs_io -f -c "pwrite 0 512M" $mnt/subv/real_file
      
      [CAUSE]
      Since commit a514d638 ("btrfs: qgroup: Commit transaction in advance
      to reduce early EDQUOT"), btrfs is not forced to commit transaction to
      reclaim more quota space.
      
      Instead, we just check pertrans metadata reservation against some
      threshold and try to do asynchronously transaction commit.
      
      However in above case, the pertrans metadata reservation is pretty small
      thus it will never trigger asynchronous transaction commit.
      
      [FIX]
      Instead of only accounting pertrans metadata reservation, we calculate
      how much free space we have, and if there isn't much free space left,
      commit transaction asynchronously to try to free some space.
      
      This may slow down the fs when we have less than 32M free qgroup space,
      but should reduce a lot of false EDQUOT, so the cost should be
      acceptable.
      Signed-off-by: default avatarQu Wenruo <wqu@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      dcedd379
    • Aneesh Kumar K.V's avatar
      powerpc/hugetlb: Handle mmap_min_addr correctly in get_unmapped_area callback · 6cf5f631
      Aneesh Kumar K.V authored
      [ Upstream commit 5330367f ]
      
      After we ALIGN up the address we need to make sure we didn't overflow
      and resulted in zero address. In that case, we need to make sure that
      the returned address is greater than mmap_min_addr.
      
      This fixes selftest va_128TBswitch --run-hugetlb reporting failures when
      run as non root user for
      
      mmap(-1, MAP_HUGETLB)
      
      The bug is that a non-root user requesting address -1 will be given address 0
      which will then fail, whereas they should have been given something else that
      would have succeeded.
      
      We also avoid the first mmap(-1, MAP_HUGETLB) returning NULL address as mmap address
      with this change. So we think this is not a security issue, because it only affects
      whether we choose an address below mmap_min_addr, not whether we
      actually allow that address to be mapped. ie. there are existing capability
      checks to prevent a user mapping below mmap_min_addr and those will still be
      honoured even without this fix.
      
      Fixes: 48483760 ("powerpc/mm: Add radix support for hugetlb")
      Reviewed-by: default avatarLaurent Dufour <ldufour@linux.vnet.ibm.com>
      Signed-off-by: default avatarAneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      6cf5f631
    • Nicolas Boichat's avatar
      iommu/io-pgtable-arm-v7s: Only kmemleak_ignore L2 tables · fc96b44c
      Nicolas Boichat authored
      [ Upstream commit 032ebd85 ]
      
      L1 tables are allocated with __get_dma_pages, and therefore already
      ignored by kmemleak.
      
      Without this, the kernel would print this error message on boot,
      when the first L1 table is allocated:
      
      [    2.810533] kmemleak: Trying to color unknown object at 0xffffffd652388000 as Black
      [    2.818190] CPU: 5 PID: 39 Comm: kworker/5:0 Tainted: G S                4.19.16 #8
      [    2.831227] Workqueue: events deferred_probe_work_func
      [    2.836353] Call trace:
      ...
      [    2.852532]  paint_ptr+0xa0/0xa8
      [    2.855750]  kmemleak_ignore+0x38/0x6c
      [    2.859490]  __arm_v7s_alloc_table+0x168/0x1f4
      [    2.863922]  arm_v7s_alloc_pgtable+0x114/0x17c
      [    2.868354]  alloc_io_pgtable_ops+0x3c/0x78
      ...
      
      Fixes: e5fc9753 ("iommu/io-pgtable: Add ARMv7 short descriptor support")
      Signed-off-by: default avatarNicolas Boichat <drinkcat@chromium.org>
      Acked-by: default avatarWill Deacon <will.deacon@arm.com>
      Signed-off-by: default avatarJoerg Roedel <jroedel@suse.de>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      fc96b44c
    • Sebastian Andrzej Siewior's avatar
      ARM: 8840/1: use a raw_spinlock_t in unwind · d81bdb3c
      Sebastian Andrzej Siewior authored
      [ Upstream commit 74ffe79a ]
      
      Mostly unwind is done with irqs enabled however SLUB may call it with
      irqs disabled while creating a new SLUB cache.
      
      I had system freeze while loading a module which called
      kmem_cache_create() on init. That means SLUB's __slab_alloc() disabled
      interrupts and then
      
      ->new_slab_objects()
       ->new_slab()
        ->setup_object()
         ->setup_object_debug()
          ->init_tracking()
           ->set_track()
            ->save_stack_trace()
             ->save_stack_trace_tsk()
              ->walk_stackframe()
               ->unwind_frame()
                ->unwind_find_idx()
                 =>spin_lock_irqsave(&unwind_lock);
      Signed-off-by: default avatarSebastian Andrzej Siewior <bigeasy@linutronix.de>
      Signed-off-by: default avatarRussell King <rmk+kernel@armlinux.org.uk>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      d81bdb3c
    • Lubomir Rintel's avatar
      serial: 8250_pxa: honor the port number from devicetree · 95130717
      Lubomir Rintel authored
      [ Upstream commit fe9ed6d2 ]
      
      Like the other OF-enabled drivers, use the port number from the firmware if
      the devicetree specifies an alias:
      
        aliases {
            ...
            serial2 = &uart2; /* Should be ttyS2 */
        }
      
      This is how the deprecated pxa.c driver behaved, switching to 8250_pxa
      messes up the numbering.
      Signed-off-by: default avatarLubomir Rintel <lkundrak@v3.sk>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      95130717
    • Sai Prakash Ranjan's avatar
      coresight: etm4x: Add support to enable ETMv4.2 · 2636ccec
      Sai Prakash Ranjan authored
      [ Upstream commit 5666dfd1 ]
      
      SDM845 has ETMv4.2 and can use the existing etm4x driver.
      But the current etm driver checks only for ETMv4.0 and
      errors out for other etm4x versions. This patch adds this
      missing support to enable SoC's with ETMv4x to use same
      driver by checking only the ETM architecture major version
      number.
      
      Without this change, we get below error during etm probe:
      
      / # dmesg | grep etm
      [    6.660093] coresight-etm4x: probe of 7040000.etm failed with error -22
      [    6.666902] coresight-etm4x: probe of 7140000.etm failed with error -22
      [    6.673708] coresight-etm4x: probe of 7240000.etm failed with error -22
      [    6.680511] coresight-etm4x: probe of 7340000.etm failed with error -22
      [    6.687313] coresight-etm4x: probe of 7440000.etm failed with error -22
      [    6.694113] coresight-etm4x: probe of 7540000.etm failed with error -22
      [    6.700914] coresight-etm4x: probe of 7640000.etm failed with error -22
      [    6.707717] coresight-etm4x: probe of 7740000.etm failed with error -22
      
      With this change, etm probe is successful:
      
      / # dmesg | grep etm
      [    6.659198] coresight-etm4x 7040000.etm: CPU0: ETM v4.2 initialized
      [    6.665848] coresight-etm4x 7140000.etm: CPU1: ETM v4.2 initialized
      [    6.672493] coresight-etm4x 7240000.etm: CPU2: ETM v4.2 initialized
      [    6.679129] coresight-etm4x 7340000.etm: CPU3: ETM v4.2 initialized
      [    6.685770] coresight-etm4x 7440000.etm: CPU4: ETM v4.2 initialized
      [    6.692403] coresight-etm4x 7540000.etm: CPU5: ETM v4.2 initialized
      [    6.699024] coresight-etm4x 7640000.etm: CPU6: ETM v4.2 initialized
      [    6.705646] coresight-etm4x 7740000.etm: CPU7: ETM v4.2 initialized
      Signed-off-by: default avatarSai Prakash Ranjan <saiprakash.ranjan@codeaurora.org>
      Reviewed-by: default avatarSuzuki K Poulose <suzuki.poulose@arm.com>
      Signed-off-by: default avatarMathieu Poirier <mathieu.poirier@linaro.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      2636ccec
    • Nathan Chancellor's avatar
      powerpc/xmon: Fix opcode being uninitialized in print_insn_powerpc · c70214d5
      Nathan Chancellor authored
      [ Upstream commit e7140639 ]
      
      When building with -Wsometimes-uninitialized, Clang warns:
      
        arch/powerpc/xmon/ppc-dis.c:157:7: warning: variable 'opcode' is used
        uninitialized whenever 'if' condition is false
        [-Wsometimes-uninitialized]
          if (cpu_has_feature(CPU_FTRS_POWER9))
              ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        arch/powerpc/xmon/ppc-dis.c:167:7: note: uninitialized use occurs here
          if (opcode == NULL)
              ^~~~~~
        arch/powerpc/xmon/ppc-dis.c:157:3: note: remove the 'if' if its
        condition is always true
          if (cpu_has_feature(CPU_FTRS_POWER9))
          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        arch/powerpc/xmon/ppc-dis.c:132:38: note: initialize the variable
        'opcode' to silence this warning
          const struct powerpc_opcode *opcode;
                                             ^
                                              = NULL
        1 warning generated.
      
      This warning seems to make no sense on the surface because opcode is set
      to NULL right below this statement. However, there is a comma instead of
      semicolon to end the dialect assignment, meaning that the opcode
      assignment only happens in the if statement. Properly terminate that
      line so that Clang no longer warns.
      
      Fixes: 5b102782 ("powerpc/xmon: Enable disassembly files (compilation changes)")
      Signed-off-by: default avatarNathan Chancellor <natechancellor@gmail.com>
      Reviewed-by: default avatarNick Desaulniers <ndesaulniers@google.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      c70214d5
    • Masahiro Yamada's avatar
      kbuild: invoke syncconfig if include/config/auto.conf.cmd is missing · 638ecaf5
      Masahiro Yamada authored
      [ Upstream commit 9390dff6 ]
      
      If include/config/auto.conf.cmd is lost for some reasons, it is not
      self-healing, so the top Makefile misses to run syncconfig.
      Move include/config/auto.conf.cmd to the target side.
      
      I used a pattern rule instead of a normal rule here although it is
      a bit gross.
      
      If the rule were written with a normal rule like this,
      
        include/config/auto.conf \
        include/config/auto.conf.cmd \
        include/config/tristate.conf: $(KCONFIG_CONFIG)
                $(Q)$(MAKE) -f $(srctree)/Makefile syncconfig
      
      ... syncconfig would be executed per target.
      
      Using a pattern rule makes sure that syncconfig is executed just once
      because Make assumes the recipe will create all of the targets.
      
      Here is a quote from the GNU Make manual [1]:
      
      "Pattern rules may have more than one target. Unlike normal rules,
      this does not act as many different rules with the same prerequisites
      and recipe. If a pattern rule has multiple targets, make knows that
      the rule's recipe is responsible for making all of the targets. The
      recipe is executed only once to make all the targets. When searching
      for a pattern rule to match a target, the target patterns of a rule
      other than the one that matches the target in need of a rule are
      incidental: make worries only about giving a recipe and prerequisites
      to the file presently in question. However, when this file's recipe is
      run, the other targets are marked as having been updated themselves."
      
      [1]: https://www.gnu.org/software/make/manual/html_node/Pattern-Intro.htmlSigned-off-by: default avatarMasahiro Yamada <yamada.masahiro@socionext.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      638ecaf5
    • Benjamin Block's avatar
      scsi: core: replace GFP_ATOMIC with GFP_KERNEL in scsi_scan.c · 5db10748
      Benjamin Block authored
      [ Upstream commit 1749ef00 ]
      
      We had a test-report where, under memory pressure, adding LUNs to the
      systems would fail (the tests add LUNs strictly in sequence):
      
      [ 5525.853432] scsi 0:0:1:1088045124: Direct-Access     IBM      2107900          .148 PQ: 0 ANSI: 5
      [ 5525.853826] scsi 0:0:1:1088045124: alua: supports implicit TPGS
      [ 5525.853830] scsi 0:0:1:1088045124: alua: device naa.6005076303ffd32700000000000044da port group 0 rel port 43
      [ 5525.853931] sd 0:0:1:1088045124: Attached scsi generic sg10 type 0
      [ 5525.854075] sd 0:0:1:1088045124: [sdk] Disabling DIF Type 1 protection
      [ 5525.855495] sd 0:0:1:1088045124: [sdk] 2097152 512-byte logical blocks: (1.07 GB/1.00 GiB)
      [ 5525.855606] sd 0:0:1:1088045124: [sdk] Write Protect is off
      [ 5525.855609] sd 0:0:1:1088045124: [sdk] Mode Sense: ed 00 00 08
      [ 5525.855795] sd 0:0:1:1088045124: [sdk] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
      [ 5525.857838]  sdk: sdk1
      [ 5525.859468] sd 0:0:1:1088045124: [sdk] Attached SCSI disk
      [ 5525.865073] sd 0:0:1:1088045124: alua: transition timeout set to 60 seconds
      [ 5525.865078] sd 0:0:1:1088045124: alua: port group 00 state A preferred supports tolusnA
      [ 5526.015070] sd 0:0:1:1088045124: alua: port group 00 state A preferred supports tolusnA
      [ 5526.015213] sd 0:0:1:1088045124: alua: port group 00 state A preferred supports tolusnA
      [ 5526.587439] scsi_alloc_sdev: Allocation failure during SCSI scanning, some SCSI devices might not be configured
      [ 5526.588562] scsi_alloc_sdev: Allocation failure during SCSI scanning, some SCSI devices might not be configured
      
      Looking at the code of scsi_alloc_sdev(), and all the calling contexts,
      there seems to be no reason to use GFP_ATMOIC here. All the different
      call-contexts use a mutex at some point, and nothing in between that
      requires no sleeping, as far as I could see. Additionally, the code that
      later allocates the block queue for the device (scsi_mq_alloc_queue())
      already uses GFP_KERNEL.
      
      There are similar allocations in two other functions:
      scsi_probe_and_add_lun(), and scsi_add_lun(),; that can also be done with
      GFP_KERNEL.
      
      Here is the contexts for the three functions so far:
      
          scsi_alloc_sdev()
              scsi_probe_and_add_lun()
                  scsi_sequential_lun_scan()
                      __scsi_scan_target()
                          scsi_scan_target()
                              mutex_lock()
                          scsi_scan_channel()
                              scsi_scan_host_selected()
                                  mutex_lock()
                  scsi_report_lun_scan()
                      __scsi_scan_target()
          	            ...
                  __scsi_add_device()
                      mutex_lock()
                  __scsi_scan_target()
                      ...
              scsi_report_lun_scan()
                  ...
              scsi_get_host_dev()
                  mutex_lock()
      
          scsi_probe_and_add_lun()
              ...
      
          scsi_add_lun()
              scsi_probe_and_add_lun()
                  ...
      
      So replace all these, and give them a bit of a better chance to succeed,
      with more chances of reclaim.
      Signed-off-by: default avatarBenjamin Block <bblock@linux.ibm.com>
      Reviewed-by: default avatarBart Van Assche <bvanassche@acm.org>
      Signed-off-by: default avatarMartin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      5db10748
    • Alexey Kardashevskiy's avatar
      powerpc/powernv/ioda: Fix locked_vm counting for memory used by IOMMU tables · 4acf7974
      Alexey Kardashevskiy authored
      [ Upstream commit 11f5acce ]
      
      We store 2 multilevel tables in iommu_table - one for the hardware and
      one with the corresponding userspace addresses. Before allocating
      the tables, the iommu_table_group_ops::get_table_size() hook returns
      the combined size of the two and VFIO SPAPR TCE IOMMU driver adjusts
      the locked_vm counter correctly. When the table is actually allocated,
      the amount of allocated memory is stored in iommu_table::it_allocated_size
      and used to decrement the locked_vm counter when we release the memory
      used by the table; .get_table_size() and .create_table() calculate it
      independently but the result is expected to be the same.
      
      However the allocator does not add the userspace table size to
      .it_allocated_size so when we destroy the table because of VFIO PCI
      unplug (i.e. VFIO container is gone but the userspace keeps running),
      we decrement locked_vm by just a half of size of memory we are
      releasing.
      
      To make things worse, since we enabled on-demand allocation of
      indirect levels, it_allocated_size contains only the amount of memory
      actually allocated at the table creation time which can just be a
      fraction. It is not a problem with incrementing locked_vm (as
      get_table_size() value is used) but it is with decrementing.
      
      As the result, we leak locked_vm and may not be able to allocate more
      IOMMU tables after few iterations of hotplug/unplug.
      
      This sets it_allocated_size in the pnv_pci_ioda2_ops::create_table()
      hook to what pnv_pci_ioda2_get_table_size() returns so from now on we
      have a single place which calculates the maximum memory a table can
      occupy. The original meaning of it_allocated_size is somewhat lost now
      though.
      
      We do not ditch it_allocated_size whatsoever here and we do not call
      get_table_size() from vfio_iommu_spapr_tce.c when decrementing
      locked_vm as we may have multiple IOMMU groups per container and even
      though they all are supposed to have the same get_table_size()
      implementation, there is a small chance for failure or confusion.
      
      Fixes: 090bad39 ("powerpc/powernv: Add indirect levels to it_userspace")
      Signed-off-by: default avatarAlexey Kardashevskiy <aik@ozlabs.ru>
      Reviewed-by: default avatarDavid Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      4acf7974
    • Paul Kocialkowski's avatar
      usb: chipidea: Grab the (legacy) USB PHY by phandle first · 6030bcc0
      Paul Kocialkowski authored
      [ Upstream commit 68ef2362 ]
      
      According to the chipidea driver bindings, the USB PHY is specified via
      the "phys" phandle node. However, this only takes effect for USB PHYs
      that use the common PHY framework. For legacy USB PHYs, a simple lookup
      based on the USB PHY type is done instead.
      
      This does not play out well when more than one USB PHY is registered,
      since the first registered PHY matching the type will always be
      returned regardless of what the driver was bound to.
      
      Fix this by looking up the PHY based on the "phys" phandle node.
      Although generic PHYs are rather matched by their "phys-name" and not
      the "phys" phandle directly, there is no helper for similar lookup on
      legacy PHYs and it's probably not worth the effort to add it.
      
      When no legacy USB PHY is found by phandle, fallback to grabbing any
      registered USB2 PHY. This ensures backward compatibility if some users
      were actually relying on this mechanism.
      Signed-off-by: default avatarPaul Kocialkowski <paul.kocialkowski@bootlin.com>
      Signed-off-by: default avatarPeter Chen <peter.chen@nxp.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      6030bcc0
    • Eric Biggers's avatar
      crypto: cavium/zip - fix collision with generic cra_driver_name · b142c797
      Eric Biggers authored
      [ Upstream commit 41798036 ]
      
      The cavium/zip implementation of the deflate compression algorithm is
      incorrectly being registered under the generic driver name, which
      prevents the generic implementation from being registered with the
      crypto API when CONFIG_CRYPTO_DEV_CAVIUM_ZIP=y.  Similarly the lzs
      algorithm (which does not currently have a generic implementation...)
      is incorrectly being registered as lzs-generic.
      
      Fix the naming collision by adding a suffix "-cavium" to the
      cra_driver_name of the cavium/zip algorithms.
      
      Fixes: 640035a2 ("crypto: zip - Add ThunderX ZIP driver core")
      Cc: Mahipal Challa <mahipalreddy2006@gmail.com>
      Cc: Jan Glauber <jglauber@cavium.com>
      Signed-off-by: default avatarEric Biggers <ebiggers@google.com>
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      b142c797
    • Julia Lawall's avatar
      crypto: crypto4xx - add missing of_node_put after of_device_is_available · d401d121
      Julia Lawall authored
      [ Upstream commit 8c2b43d2 ]
      
      Add an of_node_put when a tested device node is not available.
      
      The semantic patch that fixes this problem is as follows
      (http://coccinelle.lip6.fr):
      
      // <smpl>
      @@
      identifier f;
      local idexpression e;
      expression x;
      @@
      
      e = f(...);
      ... when != of_node_put(e)
          when != x = e
          when != e = x
          when any
      if (<+...of_device_is_available(e)...+>) {
        ... when != of_node_put(e)
      (
        return e;
      |
      + of_node_put(e);
        return ...;
      )
      }
      // </smpl>
      
      Fixes: 5343e674 ("crypto4xx: integrate ppc4xx-rng into crypto4xx")
      Signed-off-by: default avatarJulia Lawall <Julia.Lawall@lip6.fr>
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      d401d121
    • Wen Yang's avatar
      mt76: fix a leaked reference by adding a missing of_node_put · 241ebd2e
      Wen Yang authored
      [ Upstream commit 34e022d8 ]
      
      The call to of_find_node_by_phandle returns a node pointer with refcount
      incremented thus it must be explicitly decremented after the last
      usage.
      
      Detected by coccinelle with the following warnings:
      ./drivers/net/wireless/mediatek/mt76/eeprom.c:58:2-8: ERROR: missing of_node_put; acquired a node pointer with refcount incremented on line 48, but without a corresponding object release within this function.
      ./drivers/net/wireless/mediatek/mt76/eeprom.c:61:2-8: ERROR: missing of_node_put; acquired a node pointer with refcount incremented on line 48, but without a corresponding object release within this function.
      ./drivers/net/wireless/mediatek/mt76/eeprom.c:67:2-8: ERROR: missing of_node_put; acquired a node pointer with refcount incremented on line 48, but without a corresponding object release within this function.
      ./drivers/net/wireless/mediatek/mt76/eeprom.c:70:2-8: ERROR: missing of_node_put; acquired a node pointer with refcount incremented on line 48, but without a corresponding object release within this function.
      ./drivers/net/wireless/mediatek/mt76/eeprom.c:72:1-7: ERROR: missing of_node_put; acquired a node pointer with refcount incremented on line 48, but without a corresponding object release within this function.
      Signed-off-by: default avatarWen Yang <wen.yang99@zte.com.cn>
      Cc: Felix Fietkau <nbd@nbd.name>
      Cc: Lorenzo Bianconi <lorenzo.bianconi83@gmail.com>
      Cc: Kalle Valo <kvalo@codeaurora.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Matthias Brugger <matthias.bgg@gmail.com>
      Cc: linux-wireless@vger.kernel.org
      Cc: netdev@vger.kernel.org
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: linux-mediatek@lists.infradead.org
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: default avatarKalle Valo <kvalo@codeaurora.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      241ebd2e
    • Alexei Avshalom Lazar's avatar
      wil6210: check null pointer in _wil_cfg80211_merge_extra_ies · 6115055b
      Alexei Avshalom Lazar authored
      [ Upstream commit de77a53c ]
      
      ies1 or ies2 might be null when code inside
      _wil_cfg80211_merge_extra_ies access them.
      Add explicit check for null and make sure ies1/ies2 are not
      accessed in such a case.
      
      spos might be null and be accessed inside
      _wil_cfg80211_merge_extra_ies.
      Add explicit check for null in the while condition statement
      and make sure spos is not accessed in such a case.
      Signed-off-by: default avatarAlexei Avshalom Lazar <ailizaro@codeaurora.org>
      Signed-off-by: default avatarMaya Erez <merez@codeaurora.org>
      Signed-off-by: default avatarKalle Valo <kvalo@codeaurora.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      6115055b
    • Rafael J. Wysocki's avatar
      PCI/PME: Fix hotplug/sysfs remove deadlock in pcie_pme_remove() · 9546c366
      Rafael J. Wysocki authored
      [ Upstream commit 95c80bc6 ]
      
      Dongdong reported a deadlock triggered by a hotplug event during a sysfs
      "remove" operation:
      
        pciehp 0000:00:0c.0:pcie004: Slot(0-1): Link Up
        # echo 1 > 0000:00:0c.0/remove
      
        PME and hotplug share an MSI/MSI-X vector.  The sysfs "remove" side is:
      
          remove_store
             pci_stop_and_remove_bus_device_locked
      	 pci_lock_rescan_remove
      	 pci_stop_and_remove_bus_device
      	   ...
      	   pcie_pme_remove
      	     pcie_pme_suspend
      	       synchronize_irq        # wait for hotplug IRQ handler
      	 pci_unlock_rescan_remove
      
        The hotplug side is:
      
          pciehp_ist
             pciehp_handle_presence_or_link_change
      	 pciehp_configure_device
      	   pci_lock_rescan_remove     # wait for pci_unlock_rescan_remove()
      
        INFO: task bash:10913 blocked for more than 120 seconds.
      
        # ps -ax |grep D
         PID TTY      STAT   TIME COMMAND
        10913 ttyAMA0  Ds+    0:00 -bash
        14022 ?        D      0:00 [irq/745-pciehp]
      
        # cat /proc/14022/stack
        __switch_to+0x94/0xd8
        pci_lock_rescan_remove+0x20/0x28
        pciehp_configure_device+0x30/0x140
        pciehp_handle_presence_or_link_change+0x324/0x458
        pciehp_ist+0x1dc/0x1e0
      
        # cat /proc/10913/stack
        __switch_to+0x94/0xd8
        synchronize_irq+0x8c/0xc0
        pcie_pme_suspend+0xa4/0x118
        pcie_pme_remove+0x20/0x40
        pcie_port_remove_service+0x3c/0x58
        ...
        pcie_port_device_remove+0x2c/0x48
        pcie_portdrv_remove+0x68/0x78
        pci_device_remove+0x48/0x120
        ...
        pci_stop_bus_device+0x84/0xc0
        pci_stop_and_remove_bus_device_locked+0x24/0x40
        remove_store+0xa4/0xb8
        dev_attr_store+0x44/0x60
        sysfs_kf_write+0x58/0x80
      
      It is incorrect to call pcie_pme_suspend() from pcie_pme_remove() for two
      reasons.
      
      First, pcie_pme_suspend() calls synchronize_irq(), which will wait for the
      native hotplug interrupt handler as well as for the PME one, because they
      share one IRQ (as per the spec).  That may deadlock if hotplug is signaled
      while pcie_pme_remove() is running and the latter calls
      pci_lock_rescan_remove() before the former.
      
      Second, if pcie_pme_suspend() figures out that wakeup needs to be enabled
      for the port, it will return without disabling the interrupt as expected by
      pcie_pme_remove() which was overlooked by commit c7b5a4e6 ("PCI / PM:
      Fix native PME handling during system suspend/resume").
      
      To fix that, rework pcie_pme_remove() to disable the PME interrupt, clear
      its status and prevent the PME worker function from re-enabling it before
      calling free_irq() on it, which should be sufficient.
      
      Fixes: c7b5a4e6 ("PCI / PM: Fix native PME handling during system suspend/resume")
      Link: https://lore.kernel.org/linux-pci/c7697e7c-e1af-13e4-8491-0a3996e6ab5d@huawei.comReported-by: default avatarDongdong Liu <liudongdong3@huawei.com>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      [bhelgaas: add URL and deadlock details from Dongdong]
      Signed-off-by: default avatarBjorn Helgaas <bhelgaas@google.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      9546c366
    • Tony Jones's avatar
      tools lib traceevent: Fix buffer overflow in arg_eval · 224c996e
      Tony Jones authored
      [ Upstream commit 7c5b019e ]
      
      Fix buffer overflow observed when running perf test.
      
      The overflow is when trying to evaluate "1ULL << (64 - 1)" which is
      resulting in -9223372036854775808 which overflows the 20 character
      buffer.
      
      If is possible this bug has been reported before but I still don't see
      any fix checked in:
      
      See: https://www.spinics.net/lists/linux-perf-users/msg07714.htmlReported-by: default avatarMichael Sartain <mikesart@fastmail.com>
      Reported-by: default avatarMathias Krause <minipli@googlemail.com>
      Signed-off-by: default avatarTony Jones <tonyj@suse.de>
      Acked-by: default avatarSteven Rostedt (VMware) <rostedt@goodmis.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Fixes: f7d82350 ("tools/events: Add files to create libtraceevent.a")
      Link: http://lkml.kernel.org/r/20190228015532.8941-1-tonyj@suse.deSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      224c996e
    • Carlos Maiolino's avatar
      fs: fix guard_bio_eod to check for real EOD errors · 83c39533
      Carlos Maiolino authored
      [ Upstream commit dce30ca9 ]
      
      guard_bio_eod() can truncate a segment in bio to allow it to do IO on
      odd last sectors of a device.
      
      It already checks if the IO starts past EOD, but it does not consider
      the possibility of an IO request starting within device boundaries can
      contain more than one segment past EOD.
      
      In such cases, truncated_bytes can be bigger than PAGE_SIZE, and will
      underflow bvec->bv_len.
      
      Fix this by checking if truncated_bytes is lower than PAGE_SIZE.
      
      This situation has been found on filesystems such as isofs and vfat,
      which doesn't check the device size before mount, if the device is
      smaller than the filesystem itself, a readahead on such filesystem,
      which spans EOD, can trigger this situation, leading a call to
      zero_user() with a wrong size possibly corrupting memory.
      
      I didn't see any crash, or didn't let the system run long enough to
      check if memory corruption will be hit somewhere, but adding
      instrumentation to guard_bio_end() to check truncated_bytes size, was
      enough to see the error.
      
      The following script can trigger the error.
      
      MNT=/mnt
      IMG=./DISK.img
      DEV=/dev/loop0
      
      mkfs.vfat $IMG
      mount $IMG $MNT
      cp -R /etc $MNT &> /dev/null
      umount $MNT
      
      losetup -D
      
      losetup --find --show --sizelimit 16247280 $IMG
      mount $DEV $MNT
      
      find $MNT -type f -exec cat {} + >/dev/null
      
      Kudos to Eric Sandeen for coming up with the reproducer above
      Reviewed-by: default avatarMing Lei <ming.lei@redhat.com>
      Signed-off-by: default avatarCarlos Maiolino <cmaiolino@redhat.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      83c39533
    • luojiajun's avatar
      jbd2: fix invalid descriptor block checksum · 6a817a7a
      luojiajun authored
      [ Upstream commit 6e876c3d ]
      
      In jbd2_journal_commit_transaction(), if we are in abort mode,
      we may flush the buffer without setting descriptor block checksum
      by goto start_journal_io. Then fs is mounted,
      jbd2_descriptor_block_csum_verify() failed.
      
      [  271.379811] EXT4-fs (vdd): shut down requested (2)
      [  271.381827] Aborting journal on device vdd-8.
      [  271.597136] JBD2: Invalid checksum recovering block 22199 in log
      [  271.598023] JBD2: recovery failed
      [  271.598484] EXT4-fs (vdd): error loading journal
      
      Fix this problem by keep setting descriptor block checksum if the
      descriptor buffer is not NULL.
      
      This checksum problem can be reproduced by xfstests generic/388.
      Signed-off-by: default avatarluojiajun <luojiajun3@huawei.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Reviewed-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      6a817a7a
    • Florian Westphal's avatar
      netfilter: conntrack: tcp: only close if RST matches exact sequence · ca66f667
      Florian Westphal authored
      [ Upstream commit be0502a3 ]
      
      TCP resets cause instant transition from established to closed state
      provided the reset is in-window.  Endpoints that implement RFC 5961
      require resets to match the next expected sequence number.
      RST segments that are in-window (but that do not match RCV.NXT) are
      ignored, and a "challenge ACK" is sent back.
      
      Main problem for conntrack is that its a middlebox, i.e.  whereas an end
      host might have ACK'd SEQ (and would thus accept an RST with this
      sequence number), conntrack might not have seen this ACK (yet).
      
      Therefore we can't simply flag RSTs with non-exact match as invalid.
      
      This updates RST processing as follows:
      
      1. If the connection is in a state other than ESTABLISHED, nothing is
         changed, RST is subject to normal in-window check.
      
      2. If the RSTs sequence number either matches exactly RCV.NXT,
         connection state moves to CLOSE.
      
      3. The same applies if the RST sequence number aligns with a previous
         packet in the same direction.
      
      In all other cases, the connection remains in ESTABLISHED state.
      If the normal-in-window check passes, the timeout will be lowered
      to that of CLOSE.
      
      If the peer sends a challenge ack, connection timeout will be reset.
      
      If the challenge ACK triggers another RST (RST was valid after all),
      this 2nd RST will match expected sequence and conntrack state changes to
      CLOSE.
      
      If no challenge ACK is received, the connection will time out after
      CLOSE seconds (10 seconds by default), just like without this patch.
      
      Packetdrill test case:
      
      0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
      0.000 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
      0.000 bind(3, ..., ...) = 0
      0.000 listen(3, 1) = 0
      
      0.100 < S 0:0(0) win 32792 <mss 1460,sackOK,nop,nop,nop,wscale 7>
      0.100 > S. 0:0(0) ack 1 win 64240 <mss 1460,nop,nop,sackOK,nop,wscale 7>
      0.200 < . 1:1(0) ack 1 win 257
      0.200 accept(3, ..., ...) = 4
      
      // Receive a segment.
      0.210 < P. 1:1001(1000) ack 1 win 46
      0.210 > . 1:1(0) ack 1001
      
      // Application writes 1000 bytes.
      0.250 write(4, ..., 1000) = 1000
      0.250 > P. 1:1001(1000) ack 1001
      
      // First reset, old sequence. Conntrack (correctly) considers this
      // invalid due to failed window validation (regardless of this patch).
      0.260 < R  2:2(0) ack 1001 win 260
      
      // 2nd reset, but too far ahead sequence.  Same: correctly handled
      // as invalid.
      0.270 < R 99990001:99990001(0) ack 1001 win 260
      
      // in-window, but not exact sequence.
      // Current Linux kernels might reply with a challenge ack, and do not
      // remove connection.
      // Without this patch, conntrack state moves to CLOSE.
      // With patch, timeout is lowered like CLOSE, but connection stays
      // in ESTABLISHED state.
      0.280 < R 1010:1010(0) ack 1001 win 260
      
      // Expect challenge ACK
      0.281 > . 1001:1001(0) ack 1001 win 501
      
      // With or without this patch, RST will cause connection
      // to move to CLOSE (sequence number matches)
      // 0.282 < R 1001:1001(0) ack 1001 win 260
      
      // ACK
      0.300 < . 1001:1001(0) ack 1001 win 257
      
      // more data could be exchanged here, connection
      // is still established
      
      // Client closes the connection.
      0.610 < F. 1001:1001(0) ack 1001 win 260
      0.650 > . 1001:1001(0) ack 1002
      
      // Close the connection without reading outstanding data
      0.700 close(4) = 0
      
      // so one more reset.  Will be deemed acceptable with patch as well:
      // connection is already closing.
      0.701 > R. 1001:1001(0) ack 1002 win 501
      // End packetdrill test case.
      
      With patch, this generates following conntrack events:
         [NEW] 120 SYN_SENT src=10.0.2.1 dst=10.0.0.1 sport=5437 dport=80 [UNREPLIED]
      [UPDATE] 60 SYN_RECV src=10.0.2.1 dst=10.0.0.1 sport=5437 dport=80
      [UPDATE] 432000 ESTABLISHED src=10.0.2.1 dst=10.0.0.1 sport=5437 dport=80 [ASSURED]
      [UPDATE] 120 FIN_WAIT src=10.0.2.1 dst=10.0.0.1 sport=5437 dport=80 [ASSURED]
      [UPDATE] 60 CLOSE_WAIT src=10.0.2.1 dst=10.0.0.1 sport=5437 dport=80 [ASSURED]
      [UPDATE] 10 CLOSE src=10.0.2.1 dst=10.0.0.1 sport=5437 dport=80 [ASSURED]
      
      Without patch, first RST moves connection to close, whereas socket state
      does not change until FIN is received.
         [NEW] 120 SYN_SENT src=10.0.2.1 dst=10.0.0.1 sport=5141 dport=80 [UNREPLIED]
      [UPDATE] 60 SYN_RECV src=10.0.2.1 dst=10.0.0.1 sport=5141 dport=80
      [UPDATE] 432000 ESTABLISHED src=10.0.2.1 dst=10.0.0.1 sport=5141 dport=80 [ASSURED]
      [UPDATE] 10 CLOSE src=10.0.2.1 dst=10.0.0.1 sport=5141 dport=80 [ASSURED]
      
      Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      ca66f667
    • Li RongQing's avatar
      netfilter: nf_tables: check the result of dereferencing base_chain->stats · 709aaa09
      Li RongQing authored
      [ Upstream commit a9f5e78c ]
      
      Check the result of dereferencing base_chain->stats, instead of result
      of this_cpu_ptr with NULL.
      
      base_chain->stats maybe be changed to NULL when a chain is updated and a
      new NULL counter can be attached.
      
      And we do not need to check returning of this_cpu_ptr since
      base_chain->stats is from percpu allocator if it is non-NULL,
      this_cpu_ptr returns a valid value.
      
      And fix two sparse error by replacing rcu_access_pointer and
      rcu_dereference with READ_ONCE under rcu_read_lock.
      
      Thanks for Eric's help to finish this patch.
      
      Fixes: 00924094 ("netfilter: nf_tables: don't assume chain stats are set when jumplabel is set")
      Signed-off-by: default avatarEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: default avatarZhang Yu <zhangyu31@baidu.com>
      Signed-off-by: default avatarLi RongQing <lirongqing@baidu.com>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      709aaa09
    • Yao Liu's avatar
      cifs: Fix NULL pointer dereference of devname · 36a3219e
      Yao Liu authored
      [ Upstream commit 68e2672f ]
      
      There is a NULL pointer dereference of devname in strspn()
      
      The oops looks something like:
      
        CIFS: Attempting to mount (null)
        BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
        ...
        RIP: 0010:strspn+0x0/0x50
        ...
        Call Trace:
         ? cifs_parse_mount_options+0x222/0x1710 [cifs]
         ? cifs_get_volume_info+0x2f/0x80 [cifs]
         cifs_setup_volume_info+0x20/0x190 [cifs]
         cifs_get_volume_info+0x50/0x80 [cifs]
         cifs_smb3_do_mount+0x59/0x630 [cifs]
         ? ida_alloc_range+0x34b/0x3d0
         cifs_do_mount+0x11/0x20 [cifs]
         mount_fs+0x52/0x170
         vfs_kern_mount+0x6b/0x170
         do_mount+0x216/0xdc0
         ksys_mount+0x83/0xd0
         __x64_sys_mount+0x25/0x30
         do_syscall_64+0x65/0x220
         entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Fix this by adding a NULL check on devname in cifs_parse_devname()
      Signed-off-by: default avatarYao Liu <yotta.liu@ucloud.cn>
      Signed-off-by: default avatarSteve French <stfrench@microsoft.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      36a3219e
    • Namjae Jeon's avatar
      cifs: Accept validate negotiate if server return NT_STATUS_NOT_SUPPORTED · d579b4ea
      Namjae Jeon authored
      [ Upstream commit 969ae8e8 ]
      
      Old windows version or Netapp SMB server will return
      NT_STATUS_NOT_SUPPORTED since they do not allow or implement
      FSCTL_VALIDATE_NEGOTIATE_INFO. The client should accept the response
      provided it's properly signed.
      
      See
      https://blogs.msdn.microsoft.com/openspecification/2012/06/28/smb3-secure-dialect-negotiation/
      
      and
      
      MS-SMB2 validate negotiate response processing:
      https://msdn.microsoft.com/en-us/library/hh880630.aspx
      
      Samba client had already handled it.
      https://bugzilla.samba.org/attachment.cgi?id=13285&action=editSigned-off-by: default avatarNamjae Jeon <linkinjeon@gmail.com>
      Signed-off-by: default avatarSteve French <stfrench@microsoft.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      d579b4ea
    • Chao Yu's avatar
      f2fs: fix to check inline_xattr_size boundary correctly · 4ab78f4d
      Chao Yu authored
      [ Upstream commit 500e0b28 ]
      
      We use below condition to check inline_xattr_size boundary:
      
      	if (!F2FS_OPTION(sbi).inline_xattr_size ||
      		F2FS_OPTION(sbi).inline_xattr_size >=
      				DEF_ADDRS_PER_INODE -
      				F2FS_TOTAL_EXTRA_ATTR_SIZE -
      				DEF_INLINE_RESERVED_SIZE -
      				DEF_MIN_INLINE_SIZE)
      
      There is there problems in that check:
      - we should allow inline_xattr_size equaling to min size of inline
      {data,dentry} area.
      - F2FS_TOTAL_EXTRA_ATTR_SIZE and inline_xattr_size are based on
      different size unit, previous one is 4 bytes, latter one is 1 bytes.
      - DEF_MIN_INLINE_SIZE only indicate min size of inline data area,
      however, we need to consider min size of inline dentry area as well,
      minimal inline dentry should at least contain two entries: '.' and
      '..', so that min inline_dentry size is 40 bytes.
      
      .bitmap		1 * 1 = 1
      .reserved	1 * 1 = 1
      .dentry		11 * 2 = 22
      .filename	8 * 2 = 16
      total		40
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      4ab78f4d
    • Jason Cai (Xiang Feng)'s avatar
      dm thin: add sanity checks to thin-pool and external snapshot creation · 8c81fcd3
      Jason Cai (Xiang Feng) authored
      [ Upstream commit 70de2cbd ]
      
      Invoking dm_get_device() twice on the same device path with different
      modes is dangerous.  Because in that case, upgrade_mode() will alloc a
      new 'dm_dev' and free the old one, which may be referenced by a previous
      caller.  Dereferencing the dangling pointer will trigger kernel NULL
      pointer dereference.
      
      The following two cases can reproduce this issue.  Actually, they are
      invalid setups that must be disallowed, e.g.:
      
      1. Creating a thin-pool with read_only mode, and the same device as
      both metadata and data.
      
      dmsetup create thinp --table \
          "0 41943040 thin-pool /dev/vdb /dev/vdb 128 0 1 read_only"
      
      BUG: unable to handle kernel NULL pointer dereference at 0000000000000080
      ...
      Call Trace:
       new_read+0xfb/0x110 [dm_bufio]
       dm_bm_read_lock+0x43/0x190 [dm_persistent_data]
       ? kmem_cache_alloc_trace+0x15c/0x1e0
       __create_persistent_data_objects+0x65/0x3e0 [dm_thin_pool]
       dm_pool_metadata_open+0x8c/0xf0 [dm_thin_pool]
       pool_ctr.cold.79+0x213/0x913 [dm_thin_pool]
       ? realloc_argv+0x50/0x70 [dm_mod]
       dm_table_add_target+0x14e/0x330 [dm_mod]
       table_load+0x122/0x2e0 [dm_mod]
       ? dev_status+0x40/0x40 [dm_mod]
       ctl_ioctl+0x1aa/0x3e0 [dm_mod]
       dm_ctl_ioctl+0xa/0x10 [dm_mod]
       do_vfs_ioctl+0xa2/0x600
       ? handle_mm_fault+0xda/0x200
       ? __do_page_fault+0x26c/0x4f0
       ksys_ioctl+0x60/0x90
       __x64_sys_ioctl+0x16/0x20
       do_syscall_64+0x55/0x150
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      2. Creating a external snapshot using the same thin-pool device.
      
      dmsetup create thinp --table \
          "0 41943040 thin-pool /dev/vdc /dev/vdb 128 0 2 ignore_discard"
      dmsetup message /dev/mapper/thinp 0 "create_thin 0"
      dmsetup create snap --table \
                  "0 204800 thin /dev/mapper/thinp 0 /dev/mapper/thinp"
      
      BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
      ...
      Call Trace:
      ? __alloc_pages_nodemask+0x13c/0x2e0
      retrieve_status+0xa5/0x1f0 [dm_mod]
      ? dm_get_live_or_inactive_table.isra.7+0x20/0x20 [dm_mod]
       table_status+0x61/0xa0 [dm_mod]
       ctl_ioctl+0x1aa/0x3e0 [dm_mod]
       dm_ctl_ioctl+0xa/0x10 [dm_mod]
       do_vfs_ioctl+0xa2/0x600
       ksys_ioctl+0x60/0x90
       ? ksys_write+0x4f/0xb0
       __x64_sys_ioctl+0x16/0x20
       do_syscall_64+0x55/0x150
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      Signed-off-by: default avatarJason Cai (Xiang Feng) <jason.cai@linux.alibaba.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      8c81fcd3
    • Louis Taylor's avatar
      cifs: use correct format characters · 626d98bb
      Louis Taylor authored
      [ Upstream commit 259594be ]
      
      When compiling with -Wformat, clang emits the following warnings:
      
      fs/cifs/smb1ops.c:312:20: warning: format specifies type 'unsigned
      short' but the argument has type 'unsigned int' [-Wformat]
                               tgt_total_cnt, total_in_tgt);
                                              ^~~~~~~~~~~~
      
      fs/cifs/cifs_dfs_ref.c:289:4: warning: format specifies type 'short'
      but the argument has type 'int' [-Wformat]
                       ref->flags, ref->server_type);
                       ^~~~~~~~~~
      
      fs/cifs/cifs_dfs_ref.c:289:16: warning: format specifies type 'short'
      but the argument has type 'int' [-Wformat]
                       ref->flags, ref->server_type);
                                   ^~~~~~~~~~~~~~~~
      
      fs/cifs/cifs_dfs_ref.c:291:4: warning: format specifies type 'short'
      but the argument has type 'int' [-Wformat]
                       ref->ref_flag, ref->path_consumed);
                       ^~~~~~~~~~~~~
      
      fs/cifs/cifs_dfs_ref.c:291:19: warning: format specifies type 'short'
      but the argument has type 'int' [-Wformat]
                       ref->ref_flag, ref->path_consumed);
                                      ^~~~~~~~~~~~~~~~~~
      The types of these arguments are unconditionally defined, so this patch
      updates the format character to the correct ones for ints and unsigned
      ints.
      
      Link: https://github.com/ClangBuiltLinux/linux/issues/378Signed-off-by: default avatarLouis Taylor <louis@kragniz.eu>
      Signed-off-by: default avatarSteve French <stfrench@microsoft.com>
      Reviewed-by: default avatarNick Desaulniers <ndesaulniers@google.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      626d98bb
    • Qian Cai's avatar
      page_poison: play nicely with KASAN · a6c56bf6
      Qian Cai authored
      [ Upstream commit 4117992d ]
      
      KASAN does not play well with the page poisoning (CONFIG_PAGE_POISONING).
      It triggers false positives in the allocation path:
      
        BUG: KASAN: use-after-free in memchr_inv+0x2ea/0x330
        Read of size 8 at addr ffff88881f800000 by task swapper/0
        CPU: 0 PID: 0 Comm: swapper Not tainted 5.0.0-rc1+ #54
        Call Trace:
         dump_stack+0xe0/0x19a
         print_address_description.cold.2+0x9/0x28b
         kasan_report.cold.3+0x7a/0xb5
         __asan_report_load8_noabort+0x19/0x20
         memchr_inv+0x2ea/0x330
         kernel_poison_pages+0x103/0x3d5
         get_page_from_freelist+0x15e7/0x4d90
      
      because KASAN has not yet unpoisoned the shadow page for allocation
      before it checks memchr_inv() but only found a stale poison pattern.
      
      Also, false positives in free path,
      
        BUG: KASAN: slab-out-of-bounds in kernel_poison_pages+0x29e/0x3d5
        Write of size 4096 at addr ffff8888112cc000 by task swapper/0/1
        CPU: 5 PID: 1 Comm: swapper/0 Not tainted 5.0.0-rc1+ #55
        Call Trace:
         dump_stack+0xe0/0x19a
         print_address_description.cold.2+0x9/0x28b
         kasan_report.cold.3+0x7a/0xb5
         check_memory_region+0x22d/0x250
         memset+0x28/0x40
         kernel_poison_pages+0x29e/0x3d5
         __free_pages_ok+0x75f/0x13e0
      
      due to KASAN adds poisoned redzones around slab objects, but the page
      poisoning needs to poison the whole page.
      
      Link: http://lkml.kernel.org/r/20190114233405.67843-1-cai@lca.pwSigned-off-by: default avatarQian Cai <cai@lca.pw>
      Acked-by: default avatarAndrey Ryabinin <aryabinin@virtuozzo.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      a6c56bf6
    • Shuriyc Chu's avatar
      fs/file.c: initialize init_files.resize_wait · d609ecd8
      Shuriyc Chu authored
      [ Upstream commit 5704a068 ]
      
      (Taken from https://bugzilla.kernel.org/show_bug.cgi?id=200647)
      
      'get_unused_fd_flags' in kthread cause kernel crash.  It works fine on
      4.1, but causes crash after get 64 fds.  It also cause crash on
      ubuntu1404/1604/1804, centos7.5, and the crash messages are almost the
      same.
      
      The crash message on centos7.5 shows below:
      
        start fd 61
        start fd 62
        start fd 63
        BUG: unable to handle kernel NULL pointer dereference at           (null)
        IP: __wake_up_common+0x2e/0x90
        PGD 0
        Oops: 0000 [#1] SMP
        Modules linked in: test(OE) xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 tun bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter devlink sunrpc kvm_intel kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd sg ppdev pcspkr virtio_balloon parport_pc parport i2c_piix4 joydev ip_tables xfs libcrc32c sr_mod cdrom sd_mod crc_t10dif crct10dif_generic ata_generic pata_acpi virtio_scsi virtio_console virtio_net cirrus drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm crct10dif_pclmul crct10dif_common crc32c_intel drm ata_piix serio_raw libata virtio_pci virtio_ring i2c_core
         virtio floppy dm_mirror dm_region_hash dm_log dm_mod
        CPU: 2 PID: 1820 Comm: test_fd Kdump: loaded Tainted: G           OE  ------------   3.10.0-862.3.3.el7.x86_64 #1
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014
        task: ffff8e92b9431fa0 ti: ffff8e94247a0000 task.ti: ffff8e94247a0000
        RIP: 0010:__wake_up_common+0x2e/0x90
        RSP: 0018:ffff8e94247a2d18  EFLAGS: 00010086
        RAX: 0000000000000000 RBX: ffffffff9d09daa0 RCX: 0000000000000000
        RDX: 0000000000000000 RSI: 0000000000000003 RDI: ffffffff9d09daa0
        RBP: ffff8e94247a2d50 R08: 0000000000000000 R09: ffff8e92b95dfda8
        R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff9d09daa8
        R13: 0000000000000003 R14: 0000000000000000 R15: 0000000000000003
        FS:  0000000000000000(0000) GS:ffff8e9434e80000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 0000000000000000 CR3: 000000017c686000 CR4: 00000000000207e0
        Call Trace:
          __wake_up+0x39/0x50
          expand_files+0x131/0x250
          __alloc_fd+0x47/0x170
          get_unused_fd_flags+0x30/0x40
          test_fd+0x12a/0x1c0 [test]
          kthread+0xd1/0xe0
          ret_from_fork_nospec_begin+0x21/0x21
        Code: 66 90 55 48 89 e5 41 57 41 89 f7 41 56 41 89 ce 41 55 41 54 49 89 fc 49 83 c4 08 53 48 83 ec 10 48 8b 47 08 89 55 cc 4c 89 45 d0 <48> 8b 08 49 39 c4 48 8d 78 e8 4c 8d 69 e8 75 08 eb 3b 4c 89 ef
        RIP   __wake_up_common+0x2e/0x90
         RSP <ffff8e94247a2d18>
        CR2: 0000000000000000
      
      This issue exists since CentOS 7.5 3.10.0-862 and CentOS 7.4
      (3.10.0-693.21.1 ) is ok.  Root cause: the item 'resize_wait' is not
      initialized before being used.
      Reported-by: default avatarRichard Zhang <zhang.zijian@h3c.com>
      Reviewed-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      d609ecd8
    • Sahitya Tummala's avatar
      f2fs: do not use mutex lock in atomic context · 9b4f2766
      Sahitya Tummala authored
      [ Upstream commit 9083977d ]
      
      Fix below warning coming because of using mutex lock in atomic context.
      
      BUG: sleeping function called from invalid context at kernel/locking/mutex.c:98
      in_atomic(): 1, irqs_disabled(): 0, pid: 585, name: sh
      Preemption disabled at: __radix_tree_preload+0x28/0x130
      Call trace:
       dump_backtrace+0x0/0x2b4
       show_stack+0x20/0x28
       dump_stack+0xa8/0xe0
       ___might_sleep+0x144/0x194
       __might_sleep+0x58/0x8c
       mutex_lock+0x2c/0x48
       f2fs_trace_pid+0x88/0x14c
       f2fs_set_node_page_dirty+0xd0/0x184
      
      Do not use f2fs_radix_tree_insert() to avoid doing cond_resched() with
      spin_lock() acquired.
      Signed-off-by: default avatarSahitya Tummala <stummala@codeaurora.org>
      Reviewed-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      9b4f2766
    • Jia Guo's avatar
      ocfs2: fix a panic problem caused by o2cb_ctl · 20141feb
      Jia Guo authored
      [ Upstream commit cc725ef3 ]
      
      In the process of creating a node, it will cause NULL pointer
      dereference in kernel if o2cb_ctl failed in the interval (mkdir,
      o2cb_set_node_attribute(node_num)] in function o2cb_add_node.
      
      The node num is initialized to 0 in function o2nm_node_group_make_item,
      o2nm_node_group_drop_item will mistake the node number 0 for a valid
      node number when we delete the node before the node number is set
      correctly.  If the local node number of the current host happens to be
      0, cluster->cl_local_node will be set to O2NM_INVALID_NODE_NUM while
      o2hb_thread still running.  The panic stack is generated as follows:
      
        o2hb_thread
            \-o2hb_do_disk_heartbeat
                \-o2hb_check_own_slot
                    |-slot = &reg->hr_slots[o2nm_this_node()];
                    //o2nm_this_node() return O2NM_INVALID_NODE_NUM
      
      We need to check whether the node number is set when we delete the node.
      
      Link: http://lkml.kernel.org/r/133d8045-72cc-863e-8eae-5013f9f6bc51@huawei.comSigned-off-by: default avatarJia Guo <guojia12@huawei.com>
      Reviewed-by: default avatarJoseph Qi <jiangqi903@gmail.com>
      Acked-by: default avatarJun Piao <piaojun@huawei.com>
      Cc: Mark Fasheh <mark@fasheh.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Junxiao Bi <junxiao.bi@oracle.com>
      Cc: Changwei Ge <ge.changwei@h3c.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      20141feb
    • Qian Cai's avatar
      mm/slab.c: kmemleak no scan alien caches · f09c424c
      Qian Cai authored
      [ Upstream commit 92d1d07d ]
      
      Kmemleak throws endless warnings during boot due to in
      __alloc_alien_cache(),
      
          alc = kmalloc_node(memsize, gfp, node);
          init_arraycache(&alc->ac, entries, batch);
          kmemleak_no_scan(ac);
      
      Kmemleak does not track the array cache (alc->ac) but the alien cache
      (alc) instead, so let it track the latter by lifting kmemleak_no_scan()
      out of init_arraycache().
      
      There is another place that calls init_arraycache(), but
      alloc_kmem_cache_cpus() uses the percpu allocation where will never be
      considered as a leak.
      
        kmemleak: Found object by alias at 0xffff8007b9aa7e38
        CPU: 190 PID: 1 Comm: swapper/0 Not tainted 5.0.0-rc2+ #2
        Call trace:
         dump_backtrace+0x0/0x168
         show_stack+0x24/0x30
         dump_stack+0x88/0xb0
         lookup_object+0x84/0xac
         find_and_get_object+0x84/0xe4
         kmemleak_no_scan+0x74/0xf4
         setup_kmem_cache_node+0x2b4/0x35c
         __do_tune_cpucache+0x250/0x2d4
         do_tune_cpucache+0x4c/0xe4
         enable_cpucache+0xc8/0x110
         setup_cpu_cache+0x40/0x1b8
         __kmem_cache_create+0x240/0x358
         create_cache+0xc0/0x198
         kmem_cache_create_usercopy+0x158/0x20c
         kmem_cache_create+0x50/0x64
         fsnotify_init+0x58/0x6c
         do_one_initcall+0x194/0x388
         kernel_init_freeable+0x668/0x688
         kernel_init+0x18/0x124
         ret_from_fork+0x10/0x18
        kmemleak: Object 0xffff8007b9aa7e00 (size 256):
        kmemleak:   comm "swapper/0", pid 1, jiffies 4294697137
        kmemleak:   min_count = 1
        kmemleak:   count = 0
        kmemleak:   flags = 0x1
        kmemleak:   checksum = 0
        kmemleak:   backtrace:
             kmemleak_alloc+0x84/0xb8
             kmem_cache_alloc_node_trace+0x31c/0x3a0
             __kmalloc_node+0x58/0x78
             setup_kmem_cache_node+0x26c/0x35c
             __do_tune_cpucache+0x250/0x2d4
             do_tune_cpucache+0x4c/0xe4
             enable_cpucache+0xc8/0x110
             setup_cpu_cache+0x40/0x1b8
             __kmem_cache_create+0x240/0x358
             create_cache+0xc0/0x198
             kmem_cache_create_usercopy+0x158/0x20c
             kmem_cache_create+0x50/0x64
             fsnotify_init+0x58/0x6c
             do_one_initcall+0x194/0x388
             kernel_init_freeable+0x668/0x688
             kernel_init+0x18/0x124
        kmemleak: Not scanning unknown object at 0xffff8007b9aa7e38
        CPU: 190 PID: 1 Comm: swapper/0 Not tainted 5.0.0-rc2+ #2
        Call trace:
         dump_backtrace+0x0/0x168
         show_stack+0x24/0x30
         dump_stack+0x88/0xb0
         kmemleak_no_scan+0x90/0xf4
         setup_kmem_cache_node+0x2b4/0x35c
         __do_tune_cpucache+0x250/0x2d4
         do_tune_cpucache+0x4c/0xe4
         enable_cpucache+0xc8/0x110
         setup_cpu_cache+0x40/0x1b8
         __kmem_cache_create+0x240/0x358
         create_cache+0xc0/0x198
         kmem_cache_create_usercopy+0x158/0x20c
         kmem_cache_create+0x50/0x64
         fsnotify_init+0x58/0x6c
         do_one_initcall+0x194/0x388
         kernel_init_freeable+0x668/0x688
         kernel_init+0x18/0x124
         ret_from_fork+0x10/0x18
      
      Link: http://lkml.kernel.org/r/20190129184518.39808-1-cai@lca.pw
      Fixes: 1fe00d50 ("slab: factor out initialization of array cache")
      Signed-off-by: default avatarQian Cai <cai@lca.pw>
      Reviewed-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      f09c424c
    • Uladzislau Rezki (Sony)'s avatar
      mm/vmalloc.c: fix kernel BUG at mm/vmalloc.c:512! · 8a0fc62e
      Uladzislau Rezki (Sony) authored
      [ Upstream commit afd07389 ]
      
      One of the vmalloc stress test case triggers the kernel BUG():
      
        <snip>
        [60.562151] ------------[ cut here ]------------
        [60.562154] kernel BUG at mm/vmalloc.c:512!
        [60.562206] invalid opcode: 0000 [#1] PREEMPT SMP PTI
        [60.562247] CPU: 0 PID: 430 Comm: vmalloc_test/0 Not tainted 4.20.0+ #161
        [60.562293] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
        [60.562351] RIP: 0010:alloc_vmap_area+0x36f/0x390
        <snip>
      
      it can happen due to big align request resulting in overflowing of
      calculated address, i.e.  it becomes 0 after ALIGN()'s fixup.
      
      Fix it by checking if calculated address is within vstart/vend range.
      
      Link: http://lkml.kernel.org/r/20190124115648.9433-2-urezki@gmail.comSigned-off-by: default avatarUladzislau Rezki (Sony) <urezki@gmail.com>
      Reviewed-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Joel Fernandes <joelaf@google.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Oleksiy Avramchenko <oleksiy.avramchenko@sonymobile.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Thomas Garnier <thgarnie@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      8a0fc62e