1. 15 Jun, 2019 40 commits
    • Lu Baolu's avatar
      iommu/vt-d: Don't request page request irq under dmar_global_lock · 2a5fd4fa
      Lu Baolu authored
      [ Upstream commit a7755c3c ]
      
      Requesting page reqest irq under dmar_global_lock could cause
      potential lock race condition (caught by lockdep).
      
      [    4.100055] ======================================================
      [    4.100063] WARNING: possible circular locking dependency detected
      [    4.100072] 5.1.0-rc4+ #2169 Not tainted
      [    4.100078] ------------------------------------------------------
      [    4.100086] swapper/0/1 is trying to acquire lock:
      [    4.100094] 000000007dcbe3c3 (dmar_lock){+.+.}, at: dmar_alloc_hwirq+0x35/0x140
      [    4.100112] but task is already holding lock:
      [    4.100120] 0000000060bbe946 (dmar_global_lock){++++}, at: intel_iommu_init+0x191/0x1438
      [    4.100136] which lock already depends on the new lock.
      [    4.100146] the existing dependency chain (in reverse order) is:
      [    4.100155]
                     -> #2 (dmar_global_lock){++++}:
      [    4.100169]        down_read+0x44/0xa0
      [    4.100178]        intel_irq_remapping_alloc+0xb2/0x7b0
      [    4.100186]        mp_irqdomain_alloc+0x9e/0x2e0
      [    4.100195]        __irq_domain_alloc_irqs+0x131/0x330
      [    4.100203]        alloc_isa_irq_from_domain.isra.4+0x9a/0xd0
      [    4.100212]        mp_map_pin_to_irq+0x244/0x310
      [    4.100221]        setup_IO_APIC+0x757/0x7ed
      [    4.100229]        x86_late_time_init+0x17/0x1c
      [    4.100238]        start_kernel+0x425/0x4e3
      [    4.100247]        secondary_startup_64+0xa4/0xb0
      [    4.100254]
                     -> #1 (irq_domain_mutex){+.+.}:
      [    4.100265]        __mutex_lock+0x7f/0x9d0
      [    4.100273]        __irq_domain_add+0x195/0x2b0
      [    4.100280]        irq_domain_create_hierarchy+0x3d/0x40
      [    4.100289]        msi_create_irq_domain+0x32/0x110
      [    4.100297]        dmar_alloc_hwirq+0x111/0x140
      [    4.100305]        dmar_set_interrupt.part.14+0x1a/0x70
      [    4.100314]        enable_drhd_fault_handling+0x2c/0x6c
      [    4.100323]        apic_bsp_setup+0x75/0x7a
      [    4.100330]        x86_late_time_init+0x17/0x1c
      [    4.100338]        start_kernel+0x425/0x4e3
      [    4.100346]        secondary_startup_64+0xa4/0xb0
      [    4.100352]
                     -> #0 (dmar_lock){+.+.}:
      [    4.100364]        lock_acquire+0xb4/0x1c0
      [    4.100372]        __mutex_lock+0x7f/0x9d0
      [    4.100379]        dmar_alloc_hwirq+0x35/0x140
      [    4.100389]        intel_svm_enable_prq+0x61/0x180
      [    4.100397]        intel_iommu_init+0x1128/0x1438
      [    4.100406]        pci_iommu_init+0x16/0x3f
      [    4.100414]        do_one_initcall+0x5d/0x2be
      [    4.100422]        kernel_init_freeable+0x1f0/0x27c
      [    4.100431]        kernel_init+0xa/0x110
      [    4.100438]        ret_from_fork+0x3a/0x50
      [    4.100444]
                     other info that might help us debug this:
      
      [    4.100454] Chain exists of:
                       dmar_lock --> irq_domain_mutex --> dmar_global_lock
      [    4.100469]  Possible unsafe locking scenario:
      
      [    4.100476]        CPU0                    CPU1
      [    4.100483]        ----                    ----
      [    4.100488]   lock(dmar_global_lock);
      [    4.100495]                                lock(irq_domain_mutex);
      [    4.100503]                                lock(dmar_global_lock);
      [    4.100512]   lock(dmar_lock);
      [    4.100518]
                      *** DEADLOCK ***
      
      Cc: Ashok Raj <ashok.raj@intel.com>
      Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
      Cc: Kevin Tian <kevin.tian@intel.com>
      Reported-by: default avatarDave Jiang <dave.jiang@intel.com>
      Fixes: a222a7f0 ("iommu/vt-d: Implement page request handling")
      Signed-off-by: default avatarLu Baolu <baolu.lu@linux.intel.com>
      Signed-off-by: default avatarJoerg Roedel <jroedel@suse.de>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      2a5fd4fa
    • Valentin Schneider's avatar
      arm64: defconfig: Update UFSHCD for Hi3660 soc · 3b837ba0
      Valentin Schneider authored
      [ Upstream commit 7b3320e6 ]
      
      Commit 7ee7ef24 ("scsi: arm64: defconfig: enable configs for Hisilicon ufs")
      set 'CONFIG_SCSI_UFS_HISI=y', but the configs it depends
      on
      
        (CONFIG_SCSI_HFSHCD_PLATFORM && CONFIG_SCSI_UFSHCD)
      
      were left to being built as modules.
      
      Commit 1f4fa50d ("arm64: defconfig: Regenerate for v4.20") "fixed"
      that by reverting to 'CONFIG_SCSI_UFS_HISI=m'.
      
      Thing is, if the rootfs is stored in the on-board flash (which
      is the "canonical" way of doing things), we either need these drivers
      to be built-in, or we need to fiddle with an initramfs to access that
      flash and eventually load the modules installed over there.
      
      The former is the easiest, do that.
      Signed-off-by: default avatarValentin Schneider <valentin.schneider@arm.com>
      Reviewed-by: default avatarLeo Yan <leo.yan@linaro.org>
      Signed-off-by: default avatarOlof Johansson <olof@lixom.net>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      3b837ba0
    • Nathan Fontenot's avatar
      powerpc/pseries: Track LMB nid instead of using device tree · 7bd4b91a
      Nathan Fontenot authored
      [ Upstream commit b2d3b5ee ]
      
      When removing memory we need to remove the memory from the node
      it was added to instead of looking up the node it should be in
      in the device tree.
      
      During testing we have seen scenarios where the affinity for a
      LMB changes due to a partition migration or PRRN event. In these
      cases the node the LMB exists in may not match the node the device
      tree indicates it belongs in. This can lead to a system crash
      when trying to DLPAR remove the LMB after a migration or PRRN
      event. The current code looks up the node in the device tree to
      remove the LMB from, the crash occurs when we try to offline this
      node and it does not have any data, i.e. node_data[nid] == NULL.
      
      36:mon> e
      cpu 0x36: Vector: 300 (Data Access) at [c0000001828b7810]
          pc: c00000000036d08c: try_offline_node+0x2c/0x1b0
          lr: c0000000003a14ec: remove_memory+0xbc/0x110
          sp: c0000001828b7a90
         msr: 800000000280b033
         dar: 9a28
       dsisr: 40000000
        current = 0xc0000006329c4c80
        paca    = 0xc000000007a55200   softe: 0        irq_happened: 0x01
          pid   = 76926, comm = kworker/u320:3
      
      36:mon> t
      [link register   ] c0000000003a14ec remove_memory+0xbc/0x110
      [c0000001828b7a90] c00000000006a1cc arch_remove_memory+0x9c/0xd0 (unreliable)
      [c0000001828b7ad0] c0000000003a14e0 remove_memory+0xb0/0x110
      [c0000001828b7b20] c0000000000c7db4 dlpar_remove_lmb+0x94/0x160
      [c0000001828b7b60] c0000000000c8ef8 dlpar_memory+0x7e8/0xd10
      [c0000001828b7bf0] c0000000000bf828 handle_dlpar_errorlog+0xf8/0x160
      [c0000001828b7c60] c0000000000bf8cc pseries_hp_work_fn+0x3c/0xa0
      [c0000001828b7c90] c000000000128cd8 process_one_work+0x298/0x5a0
      [c0000001828b7d20] c000000000129068 worker_thread+0x88/0x620
      [c0000001828b7dc0] c00000000013223c kthread+0x1ac/0x1c0
      [c0000001828b7e30] c00000000000b45c ret_from_kernel_thread+0x5c/0x80
      
      To resolve this we need to track the node a LMB belongs to when
      it is added to the system so we can remove it from that node instead
      of the node that the device tree indicates it should belong to.
      Signed-off-by: default avatarNathan Fontenot <nfont@linux.vnet.ibm.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      7bd4b91a
    • Takashi Iwai's avatar
      ALSA: hda - Register irq handler after the chip initialization · 996fffea
      Takashi Iwai authored
      [ Upstream commit f495222e ]
      
      Currently the IRQ handler in HD-audio controller driver is registered
      before the chip initialization.  That is, we have some window opened
      between the azx_acquire_irq() call and the CORB/RIRB setup.  If an
      interrupt is triggered in this small window, the IRQ handler may
      access to the uninitialized RIRB buffer, which leads to a NULL
      dereference Oops.
      
      This is usually no big problem since most of Intel chips do register
      the IRQ via MSI, and we've already fixed the order of the IRQ
      enablement and the CORB/RIRB setup in the former commit b61749a8
      ("sound: enable interrupt after dma buffer initialization"), hence the
      IRQ won't be triggered in that room.  However, some platforms use a
      shared IRQ, and this may allow the IRQ trigger by another source.
      
      Another possibility is the kdump environment: a stale interrupt might
      be present in there, the IRQ handler can be falsely triggered as well.
      
      For covering this small race, let's move the azx_acquire_irq() call
      after hda_intel_init_chip() call.  Although this is a bit radical
      change, it can cover more widely than checking the CORB/RIRB setup
      locally in the callee side.
      Reported-by: default avatarLiwei Song <liwei.song@windriver.com>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      996fffea
    • Taehee Yoo's avatar
      netfilter: nf_flow_table: fix netdev refcnt leak · 11ae693f
      Taehee Yoo authored
      [ Upstream commit 26a302af ]
      
      flow_offload_alloc() calls nf_route() to get a dst_entry. Internally,
      nf_route() calls ip_route_output_key() that allocates a dst_entry and
      holds it. So, a dst_entry should be released by dst_release() if
      nf_route() is successful.
      
      Otherwise, netns exit routine cannot be finished and the following
      message is printed:
      
      [  257.490952] unregister_netdevice: waiting for lo to become free. Usage count = 1
      
      Fixes: ac2a6666 ("netfilter: add generic flow table infrastructure")
      Signed-off-by: default avatarTaehee Yoo <ap420073@gmail.com>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      11ae693f
    • Taehee Yoo's avatar
      netfilter: nf_flow_table: check ttl value in flow offload data path · c155f374
      Taehee Yoo authored
      [ Upstream commit 33cc3c0c ]
      
      nf_flow_offload_ip_hook() and nf_flow_offload_ipv6_hook() do not check
      ttl value. So, ttl value overflow may occur.
      
      Fixes: 97add9f0 ("netfilter: flow table support for IPv4")
      Fixes: 09952107 ("netfilter: flow table support for IPv6")
      Signed-off-by: default avatarTaehee Yoo <ap420073@gmail.com>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      c155f374
    • Keith Busch's avatar
      nvme-pci: shutdown on timeout during deletion · c6508f86
      Keith Busch authored
      [ Upstream commit 9dc1a38e ]
      
      We do not restart a controller in a deleting state for timeout errors.
      When in this state, unblock potential request dispatchers with failed
      completions by shutting down the controller on timeout detection.
      Reported-by: default avatarYufen Yu <yuyufen@huawei.com>
      Signed-off-by: default avatarKeith Busch <keith.busch@intel.com>
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      c6508f86
    • Keith Busch's avatar
      nvme-pci: unquiesce admin queue on shutdown · ca6bcc03
      Keith Busch authored
      [ Upstream commit c8e9e9b7 ]
      
      Just like IO queues, the admin queue also will not be restarted after a
      controller shutdown. Unquiesce this queue so that we do not block
      request dispatch on a permanently disabled controller.
      Reported-by: default avatarYufen Yu <yuyufen@huawei.com>
      Signed-off-by: default avatarKeith Busch <keith.busch@intel.com>
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      ca6bcc03
    • Kishon Vijay Abraham I's avatar
      PCI: designware-ep: Use aligned ATU window for raising MSI interrupts · d6f90fae
      Kishon Vijay Abraham I authored
      [ Upstream commit 6b733030 ]
      
      Certain platforms like K2G reguires the outbound ATU window to be
      aligned. The alignment size is already present in mem->page_size.
      Use the alignment size present in mem->page_size to configure an
      aligned ATU window. In order to raise an interrupt, CPU has to write
      to address offset from the start of the window unlike before where
      writes were always to the beginning of the ATU window.
      Signed-off-by: default avatarKishon Vijay Abraham I <kishon@ti.com>
      Signed-off-by: default avatarLorenzo Pieralisi <lorenzo.pieralisi@arm.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      d6f90fae
    • Kishon Vijay Abraham I's avatar
      misc: pci_endpoint_test: Fix test_reg_bar to be updated in pci_endpoint_test · 449b9fd4
      Kishon Vijay Abraham I authored
      [ Upstream commit 8f220664 ]
      
      commit 834b9051 ("misc: pci_endpoint_test: Add support for
      PCI_ENDPOINT_TEST regs to be mapped to any BAR") while adding
      test_reg_bar in order to map PCI_ENDPOINT_TEST regs to be mapped to any
      BAR failed to update test_reg_bar in pci_endpoint_test, resulting in
      test_reg_bar having invalid value when used outside probe.
      
      Fix it.
      
      Fixes: 834b9051 ("misc: pci_endpoint_test: Add support for PCI_ENDPOINT_TEST regs to be mapped to any BAR")
      Signed-off-by: default avatarKishon Vijay Abraham I <kishon@ti.com>
      Signed-off-by: default avatarLorenzo Pieralisi <lorenzo.pieralisi@arm.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      449b9fd4
    • Greg Kurz's avatar
      vfio-pci/nvlink2: Fix potential VMA leak · dcaa5e20
      Greg Kurz authored
      [ Upstream commit 2c85f2bd ]
      
      If vfio_pci_register_dev_region() fails then we should rollback
      previous changes, ie. unmap the ATSD registers.
      
      Fixes: 7f928917 ("vfio_pci: Add NVIDIA GV100GL [Tesla V100 SXM2] subdriver")
      Signed-off-by: default avatarGreg Kurz <groug@kaod.org>
      Reviewed-by: default avatarAlexey Kardashevskiy <aik@ozlabs.ru>
      Signed-off-by: default avatarAlex Williamson <alex.williamson@redhat.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      dcaa5e20
    • Lu Baolu's avatar
      iommu/vt-d: Set intel_iommu_gfx_mapped correctly · 47e80978
      Lu Baolu authored
      [ Upstream commit cf1ec453 ]
      
      The intel_iommu_gfx_mapped flag is exported by the Intel
      IOMMU driver to indicate whether an IOMMU is used for the
      graphic device. In a virtualized IOMMU environment (e.g.
      QEMU), an include-all IOMMU is used for graphic device.
      This flag is found to be clear even the IOMMU is used.
      
      Cc: Ashok Raj <ashok.raj@intel.com>
      Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
      Cc: Kevin Tian <kevin.tian@intel.com>
      Reported-by: default avatarZhenyu Wang <zhenyuw@linux.intel.com>
      Fixes: c0771df8 ("intel-iommu: Export a flag indicating that the IOMMU is used for iGFX.")
      Suggested-by: default avatarKevin Tian <kevin.tian@intel.com>
      Signed-off-by: default avatarLu Baolu <baolu.lu@linux.intel.com>
      Signed-off-by: default avatarJoerg Roedel <jroedel@suse.de>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      47e80978
    • Ming Lei's avatar
      blk-mq: move cancel of requeue_work into blk_mq_release · 9e2c1b3d
      Ming Lei authored
      [ Upstream commit fbc2a15e ]
      
      With holding queue's kobject refcount, it is safe for driver
      to schedule requeue. However, blk_mq_kick_requeue_list() may
      be called after blk_sync_queue() is done because of concurrent
      requeue activities, then requeue work may not be completed when
      freeing queue, and kernel oops is triggered.
      
      So moving the cancel of requeue_work into blk_mq_release() for
      avoiding race between requeue and freeing queue.
      
      Cc: Dongli Zhang <dongli.zhang@oracle.com>
      Cc: James Smart <james.smart@broadcom.com>
      Cc: Bart Van Assche <bart.vanassche@wdc.com>
      Cc: linux-scsi@vger.kernel.org,
      Cc: Martin K . Petersen <martin.petersen@oracle.com>,
      Cc: Christoph Hellwig <hch@lst.de>,
      Cc: James E . J . Bottomley <jejb@linux.vnet.ibm.com>,
      Reviewed-by: default avatarBart Van Assche <bvanassche@acm.org>
      Reviewed-by: default avatarJohannes Thumshirn <jthumshirn@suse.de>
      Reviewed-by: default avatarHannes Reinecke <hare@suse.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Tested-by: default avatarJames Smart <james.smart@broadcom.com>
      Signed-off-by: default avatarMing Lei <ming.lei@redhat.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      9e2c1b3d
    • Vladimir Zapolskiy's avatar
      watchdog: fix compile time error of pretimeout governors · 1b6349c8
      Vladimir Zapolskiy authored
      [ Upstream commit a223770b ]
      
      CONFIG_WATCHDOG_PRETIMEOUT_GOV build symbol adds watchdog_pretimeout.o
      object to watchdog.o, the latter is compiled only if CONFIG_WATCHDOG_CORE
      is selected, so it rightfully makes sense to add it as a dependency.
      
      The change fixes the next compilation errors, if CONFIG_WATCHDOG_CORE=n
      and CONFIG_WATCHDOG_PRETIMEOUT_GOV=y are selected:
      
        drivers/watchdog/pretimeout_noop.o: In function `watchdog_gov_noop_register':
        drivers/watchdog/pretimeout_noop.c:35: undefined reference to `watchdog_register_governor'
        drivers/watchdog/pretimeout_noop.o: In function `watchdog_gov_noop_unregister':
        drivers/watchdog/pretimeout_noop.c:40: undefined reference to `watchdog_unregister_governor'
      
        drivers/watchdog/pretimeout_panic.o: In function `watchdog_gov_panic_register':
        drivers/watchdog/pretimeout_panic.c:35: undefined reference to `watchdog_register_governor'
        drivers/watchdog/pretimeout_panic.o: In function `watchdog_gov_panic_unregister':
        drivers/watchdog/pretimeout_panic.c:40: undefined reference to `watchdog_unregister_governor'
      Reported-by: default avatarKuo, Hsuan-Chi <hckuo2@illinois.edu>
      Fixes: ff84136c ("watchdog: add watchdog pretimeout governor framework")
      Signed-off-by: default avatarVladimir Zapolskiy <vz@mleia.com>
      Reviewed-by: default avatarGuenter Roeck <linux@roeck-us.net>
      Signed-off-by: default avatarGuenter Roeck <linux@roeck-us.net>
      Signed-off-by: default avatarWim Van Sebroeck <wim@linux-watchdog.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      1b6349c8
    • Georg Hofmann's avatar
      watchdog: imx2_wdt: Fix set_timeout for big timeout values · 81a0eaaa
      Georg Hofmann authored
      [ Upstream commit b07e228e ]
      
      The documentated behavior is: if max_hw_heartbeat_ms is implemented, the
      minimum of the set_timeout argument and max_hw_heartbeat_ms should be used.
      This patch implements this behavior.
      Previously only the first 7bits were used and the input argument was
      returned.
      Signed-off-by: default avatarGeorg Hofmann <georg@hofmannsweb.com>
      Reviewed-by: default avatarGuenter Roeck <linux@roeck-us.net>
      Signed-off-by: default avatarGuenter Roeck <linux@roeck-us.net>
      Signed-off-by: default avatarWim Van Sebroeck <wim@linux-watchdog.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      81a0eaaa
    • Florian Westphal's avatar
      netfilter: nf_tables: fix base chain stat rcu_dereference usage · 1f468942
      Florian Westphal authored
      [ Upstream commit edbd82c5 ]
      
      Following splat gets triggered when nfnetlink monitor is running while
      xtables-nft selftests are running:
      
      net/netfilter/nf_tables_api.c:1272 suspicious rcu_dereference_check() usage!
      other info that might help us debug this:
      
      1 lock held by xtables-nft-mul/27006:
       #0: 00000000e0f85be9 (&net->nft.commit_mutex){+.+.}, at: nf_tables_valid_genid+0x1a/0x50
      Call Trace:
       nf_tables_fill_chain_info.isra.45+0x6cc/0x6e0
       nf_tables_chain_notify+0xf8/0x1a0
       nf_tables_commit+0x165c/0x1740
      
      nf_tables_fill_chain_info() can be called both from dumps (rcu read locked)
      or from the transaction path if a userspace process subscribed to nftables
      notifications.
      
      In the 'table dump' case, rcu_access_pointer() cannot be used: We do not
      hold transaction mutex so the pointer can be NULLed right after the check.
      Just unconditionally fetch the value, then have the helper return
      immediately if its NULL.
      
      In the notification case we don't hold the rcu read lock, but updates are
      prevented due to transaction mutex. Use rcu_dereference_check() to make lockdep
      aware of this.
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      1f468942
    • Serge Semin's avatar
      mips: Make sure dt memory regions are valid · b0e7ae14
      Serge Semin authored
      [ Upstream commit 93fa5b28 ]
      
      There are situations when memory regions coming from dts may be
      too big for the platform physical address space. This especially
      concerns XPA-capable systems. Bootloader may determine more than 4GB
      memory available and pass it to the kernel over dts memory node, while
      kernel is built without XPA/64BIT support. In this case the region
      may either simply be truncated by add_memory_region() method
      or by u64->phys_addr_t type casting. But in worst case the method
      can even drop the memory region if it exceeds PHYS_ADDR_MAX size.
      So lets make sure the retrieved from dts memory regions are valid,
      and if some of them aren't, just manually truncate them with a warning
      printed out.
      Signed-off-by: default avatarSerge Semin <fancer.lancer@gmail.com>
      Signed-off-by: default avatarPaul Burton <paul.burton@mips.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: James Hogan <jhogan@kernel.org>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Thomas Bogendoerfer <tbogendoerfer@suse.de>
      Cc: Huacai Chen <chenhc@lemote.com>
      Cc: Stefan Agner <stefan@agner.ch>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Alexandre Belloni <alexandre.belloni@bootlin.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Serge Semin <Sergey.Semin@t-platforms.ru>
      Cc: linux-mips@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      b0e7ae14
    • Jakub Jankowski's avatar
      netfilter: nf_conntrack_h323: restore boundary check correctness · 58c8298c
      Jakub Jankowski authored
      [ Upstream commit f5e85ce8 ]
      
      Since commit bc7d811a ("netfilter: nf_ct_h323: Convert
      CHECK_BOUND macro to function"), NAT traversal for H.323
      doesn't work, failing to parse H323-UserInformation.
      nf_h323_error_boundary() compares contents of the bitstring,
      not the addresses, preventing valid H.323 packets from being
      conntrack'd.
      
      This looks like an oversight from when CHECK_BOUND macro was
      converted to a function.
      
      To fix it, stop dereferencing bs->cur and bs->end.
      
      Fixes: bc7d811a ("netfilter: nf_ct_h323: Convert CHECK_BOUND macro to function")
      Signed-off-by: default avatarJakub Jankowski <shasta@toxcorp.com>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      58c8298c
    • Taehee Yoo's avatar
      netfilter: nf_flow_table: fix missing error check for rhashtable_insert_fast · d34b034d
      Taehee Yoo authored
      [ Upstream commit 43c8f131 ]
      
      rhashtable_insert_fast() may return an error value when memory
      allocation fails, but flow_offload_add() does not check for errors.
      This patch just adds missing error checking.
      
      Fixes: ac2a6666 ("netfilter: add generic flow table infrastructure")
      Signed-off-by: default avatarTaehee Yoo <ap420073@gmail.com>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      d34b034d
    • Ludovic Barre's avatar
      mmc: mmci: Prevent polling for busy detection in IRQ context · db8ca7fd
      Ludovic Barre authored
      [ Upstream commit 8520ce1e ]
      
      The IRQ handler, mmci_irq(), loops until all status bits have been cleared.
      However, the status bit signaling busy in variant->busy_detect_flag, may be
      set even if busy detection isn't monitored for the current request.
      
      This may be the case for the CMD11 when switching the I/O voltage, which
      leads to that mmci_irq() busy loops in IRQ context. Fix this problem, by
      clearing the status bit for busy, before continuing to validate the
      condition for the loop. This is safe, because the busy status detection has
      already been taken care of by mmci_cmd_irq().
      Signed-off-by: default avatarLudovic Barre <ludovic.barre@st.com>
      Signed-off-by: default avatarUlf Hansson <ulf.hansson@linaro.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      db8ca7fd
    • Amir Goldstein's avatar
      ovl: do not generate duplicate fsnotify events for "fake" path · f2ab446f
      Amir Goldstein authored
      [ Upstream commit d9899030 ]
      
      Overlayfs "fake" path is used for stacked file operations on underlying
      files.  Operations on files with "fake" path must not generate fsnotify
      events with path data, because those events have already been generated at
      overlayfs layer and because the reported event->fd for fanotify marks on
      underlying inode/filesystem will have the wrong path (the overlayfs path).
      
      Link: https://lore.kernel.org/linux-fsdevel/20190423065024.12695-1-jencce.kernel@gmail.com/Reported-by: default avatarMurphy Zhou <jencce.kernel@gmail.com>
      Fixes: d1d04ef8 ("ovl: stack file ops")
      Signed-off-by: default avatarAmir Goldstein <amir73il@gmail.com>
      Signed-off-by: default avatarMiklos Szeredi <mszeredi@redhat.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      f2ab446f
    • Andreas Schwab's avatar
      fbcon: Don't reset logo_shown when logo is currently shown · 6aaaa535
      Andreas Schwab authored
      [ Upstream commit 3c5a1b11 ]
      
      When the logo is currently drawn on a virtual console, and the console
      loglevel is reduced to quiet, logo_shown must be left alone, so that it
      the scrolling region on that virtual console is properly reset.
      
      Fixes: 10993504 ("fbcon: Silence fbcon logo on 'quiet' boots")
      Signed-off-by: default avatarAndreas Schwab <schwab@linux-m68k.org>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Yisheng Xie <ysxie@foxmail.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Marko Myllynen <myllynen@redhat.com>
      Cc: Hans de Goede <hdegoede@redhat.com>
      Cc: Thierry Reding <treding@nvidia.com>
      Signed-off-by: default avatarBartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      6aaaa535
    • Jisheng Zhang's avatar
      PCI: dwc: Free MSI IRQ page in dw_pcie_free_msi() · 62729638
      Jisheng Zhang authored
      [ Upstream commit dc69a3d5 ]
      
      To avoid a memory leak, free the page allocated for MSI IRQ in
      dw_pcie_free_msi().
      Signed-off-by: default avatarJisheng Zhang <Jisheng.Zhang@synaptics.com>
      Signed-off-by: default avatarLorenzo Pieralisi <lorenzo.pieralisi@arm.com>
      Signed-off-by: default avatarBjorn Helgaas <bhelgaas@google.com>
      Acked-by: default avatarGustavo Pimentel <gustavo.pimentel@synopsys.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      62729638
    • Jisheng Zhang's avatar
      PCI: dwc: Free MSI in dw_pcie_host_init() error path · 2507c7c9
      Jisheng Zhang authored
      [ Upstream commit 9e2b5de5 ]
      
      If we ever did MSI-related initializations, we need to call
      dw_pcie_free_msi() in the error code path.
      
      Remove the IS_ENABLED(CONFIG_PCI_MSI) check for MSI init because
      pci_msi_enabled() already has a stub for !CONFIG_PCI_MSI.
      Signed-off-by: default avatarJisheng Zhang <Jisheng.Zhang@synaptics.com>
      Signed-off-by: default avatarLorenzo Pieralisi <lorenzo.pieralisi@arm.com>
      Signed-off-by: default avatarBjorn Helgaas <bhelgaas@google.com>
      Acked-by: default avatarGustavo Pimentel <gustavo.pimentel@synopsys.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      2507c7c9
    • Maciej Żenczykowski's avatar
      uml: fix a boot splat wrt use of cpu_all_mask · 84e5ca83
      Maciej Żenczykowski authored
      [ Upstream commit 689a5860 ]
      
      Memory: 509108K/542612K available (3835K kernel code, 919K rwdata, 1028K rodata, 129K init, 211K bss, 33504K reserved, 0K cma-reserved)
      NR_IRQS: 15
      clocksource: timer: mask: 0xffffffffffffffff max_cycles: 0x1cd42e205, max_idle_ns: 881590404426 ns
      ------------[ cut here ]------------
      WARNING: CPU: 0 PID: 0 at kernel/time/clockevents.c:458 clockevents_register_device+0x72/0x140
      posix-timer cpumask == cpu_all_mask, using cpu_possible_mask instead
      Modules linked in:
      CPU: 0 PID: 0 Comm: swapper Not tainted 5.1.0-rc4-00048-ged79cc87 #4
      Stack:
       604ebda0 603c5370 604ebe20 6046fd17
       00000000 6006fcbb 604ebdb0 603c53b5
       604ebe10 6003bfc4 604ebdd0 9000001ca
      Call Trace:
       [<6006fcbb>] ? printk+0x0/0x94
       [<60083160>] ? clockevents_register_device+0x72/0x140
       [<6001f16e>] show_stack+0x13b/0x155
       [<603c5370>] ? dump_stack_print_info+0xe2/0xeb
       [<6006fcbb>] ? printk+0x0/0x94
       [<603c53b5>] dump_stack+0x2a/0x2c
       [<6003bfc4>] __warn+0x10e/0x13e
       [<60070320>] ? vprintk_func+0xc8/0xcf
       [<60030fd6>] ? block_signals+0x0/0x16
       [<6006fcbb>] ? printk+0x0/0x94
       [<6003c08b>] warn_slowpath_fmt+0x97/0x99
       [<600311a1>] ? set_signals+0x0/0x3f
       [<6003bff4>] ? warn_slowpath_fmt+0x0/0x99
       [<600842cb>] ? tick_oneshot_mode_active+0x44/0x4f
       [<60030fd6>] ? block_signals+0x0/0x16
       [<6006fcbb>] ? printk+0x0/0x94
       [<6007d2d5>] ? __clocksource_select+0x20/0x1b1
       [<60030fd6>] ? block_signals+0x0/0x16
       [<6006fcbb>] ? printk+0x0/0x94
       [<60083160>] clockevents_register_device+0x72/0x140
       [<60031192>] ? get_signals+0x0/0xf
       [<60030fd6>] ? block_signals+0x0/0x16
       [<6006fcbb>] ? printk+0x0/0x94
       [<60002eec>] um_timer_setup+0xc8/0xca
       [<60001b59>] start_kernel+0x47f/0x57e
       [<600035bc>] start_kernel_proc+0x49/0x4d
       [<6006c483>] ? kmsg_dump_register+0x82/0x8a
       [<6001de62>] new_thread_handler+0x81/0xb2
       [<60003571>] ? kmsg_dumper_stdout_init+0x1a/0x1c
       [<60020c75>] uml_finishsetup+0x54/0x59
      
      random: get_random_bytes called from init_oops_id+0x27/0x34 with crng_init=0
      ---[ end trace 00173d0117a88acb ]---
      Calibrating delay loop... 6941.90 BogoMIPS (lpj=34709504)
      Signed-off-by: default avatarMaciej Żenczykowski <maze@google.com>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
      Cc: linux-um@lists.infradead.org
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: default avatarRichard Weinberger <richard@nod.at>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      84e5ca83
    • YueHaibing's avatar
      configfs: fix possible use-after-free in configfs_register_group · 93e0a666
      YueHaibing authored
      [ Upstream commit 35399f87 ]
      
      In configfs_register_group(), if create_default_group() failed, we
      forget to unlink the group. It will left a invalid item in the parent list,
      which may trigger the use-after-free issue seen below:
      
      BUG: KASAN: use-after-free in __list_add_valid+0xd4/0xe0 lib/list_debug.c:26
      Read of size 8 at addr ffff8881ef61ae20 by task syz-executor.0/5996
      
      CPU: 1 PID: 5996 Comm: syz-executor.0 Tainted: G         C        5.0.0+ #5
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0xa9/0x10e lib/dump_stack.c:113
       print_address_description+0x65/0x270 mm/kasan/report.c:187
       kasan_report+0x149/0x18d mm/kasan/report.c:317
       __list_add_valid+0xd4/0xe0 lib/list_debug.c:26
       __list_add include/linux/list.h:60 [inline]
       list_add_tail include/linux/list.h:93 [inline]
       link_obj+0xb0/0x190 fs/configfs/dir.c:759
       link_group+0x1c/0x130 fs/configfs/dir.c:784
       configfs_register_group+0x56/0x1e0 fs/configfs/dir.c:1751
       configfs_register_default_group+0x72/0xc0 fs/configfs/dir.c:1834
       ? 0xffffffffc1be0000
       iio_sw_trigger_init+0x23/0x1000 [industrialio_sw_trigger]
       do_one_initcall+0xbc/0x47d init/main.c:887
       do_init_module+0x1b5/0x547 kernel/module.c:3456
       load_module+0x6405/0x8c10 kernel/module.c:3804
       __do_sys_finit_module+0x162/0x190 kernel/module.c:3898
       do_syscall_64+0x9f/0x450 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x462e99
      Code: f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
      RSP: 002b:00007f494ecbcc58 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
      RAX: ffffffffffffffda RBX: 000000000073bf00 RCX: 0000000000462e99
      RDX: 0000000000000000 RSI: 0000000020000180 RDI: 0000000000000003
      RBP: 00007f494ecbcc70 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 00007f494ecbd6bc
      R13: 00000000004bcefa R14: 00000000006f6fb0 R15: 0000000000000004
      
      Allocated by task 5987:
       set_track mm/kasan/common.c:87 [inline]
       __kasan_kmalloc.constprop.3+0xa0/0xd0 mm/kasan/common.c:497
       kmalloc include/linux/slab.h:545 [inline]
       kzalloc include/linux/slab.h:740 [inline]
       configfs_register_default_group+0x4c/0xc0 fs/configfs/dir.c:1829
       0xffffffffc1bd0023
       do_one_initcall+0xbc/0x47d init/main.c:887
       do_init_module+0x1b5/0x547 kernel/module.c:3456
       load_module+0x6405/0x8c10 kernel/module.c:3804
       __do_sys_finit_module+0x162/0x190 kernel/module.c:3898
       do_syscall_64+0x9f/0x450 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Freed by task 5987:
       set_track mm/kasan/common.c:87 [inline]
       __kasan_slab_free+0x130/0x180 mm/kasan/common.c:459
       slab_free_hook mm/slub.c:1429 [inline]
       slab_free_freelist_hook mm/slub.c:1456 [inline]
       slab_free mm/slub.c:3003 [inline]
       kfree+0xe1/0x270 mm/slub.c:3955
       configfs_register_default_group+0x9a/0xc0 fs/configfs/dir.c:1836
       0xffffffffc1bd0023
       do_one_initcall+0xbc/0x47d init/main.c:887
       do_init_module+0x1b5/0x547 kernel/module.c:3456
       load_module+0x6405/0x8c10 kernel/module.c:3804
       __do_sys_finit_module+0x162/0x190 kernel/module.c:3898
       do_syscall_64+0x9f/0x450 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      The buggy address belongs to the object at ffff8881ef61ae00
       which belongs to the cache kmalloc-192 of size 192
      The buggy address is located 32 bytes inside of
       192-byte region [ffff8881ef61ae00, ffff8881ef61aec0)
      The buggy address belongs to the page:
      page:ffffea0007bd8680 count:1 mapcount:0 mapping:ffff8881f6c03000 index:0xffff8881ef61a700
      flags: 0x2fffc0000000200(slab)
      raw: 02fffc0000000200 ffffea0007ca4740 0000000500000005 ffff8881f6c03000
      raw: ffff8881ef61a700 000000008010000c 00000001ffffffff 0000000000000000
      page dumped because: kasan: bad access detected
      
      Memory state around the buggy address:
       ffff8881ef61ad00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
       ffff8881ef61ad80: 00 00 00 00 00 00 00 00 fc fc fc fc fc fc fc fc
      >ffff8881ef61ae00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                     ^
       ffff8881ef61ae80: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
       ffff8881ef61af00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      
      Fixes: 5cf6a51e ("configfs: allow dynamic group creation")
      Reported-by: default avatarHulk Robot <hulkci@huawei.com>
      Signed-off-by: default avatarYueHaibing <yuehaibing@huawei.com>
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      93e0a666
    • John Sperbeck's avatar
      percpu: remove spurious lock dependency between percpu and sched · da0abba5
      John Sperbeck authored
      [ Upstream commit 198790d9 ]
      
      In free_percpu() we sometimes call pcpu_schedule_balance_work() to
      queue a work item (which does a wakeup) while holding pcpu_lock.
      This creates an unnecessary lock dependency between pcpu_lock and
      the scheduler's pi_lock.  There are other places where we call
      pcpu_schedule_balance_work() without hold pcpu_lock, and this case
      doesn't need to be different.
      
      Moving the call outside the lock prevents the following lockdep splat
      when running tools/testing/selftests/bpf/{test_maps,test_progs} in
      sequence with lockdep enabled:
      
      ======================================================
      WARNING: possible circular locking dependency detected
      5.1.0-dbg-DEV #1 Not tainted
      ------------------------------------------------------
      kworker/23:255/18872 is trying to acquire lock:
      000000000bc79290 (&(&pool->lock)->rlock){-.-.}, at: __queue_work+0xb2/0x520
      
      but task is already holding lock:
      00000000e3e7a6aa (pcpu_lock){..-.}, at: free_percpu+0x36/0x260
      
      which lock already depends on the new lock.
      
      the existing dependency chain (in reverse order) is:
      
      -> #4 (pcpu_lock){..-.}:
             lock_acquire+0x9e/0x180
             _raw_spin_lock_irqsave+0x3a/0x50
             pcpu_alloc+0xfa/0x780
             __alloc_percpu_gfp+0x12/0x20
             alloc_htab_elem+0x184/0x2b0
             __htab_percpu_map_update_elem+0x252/0x290
             bpf_percpu_hash_update+0x7c/0x130
             __do_sys_bpf+0x1912/0x1be0
             __x64_sys_bpf+0x1a/0x20
             do_syscall_64+0x59/0x400
             entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      -> #3 (&htab->buckets[i].lock){....}:
             lock_acquire+0x9e/0x180
             _raw_spin_lock_irqsave+0x3a/0x50
             htab_map_update_elem+0x1af/0x3a0
      
      -> #2 (&rq->lock){-.-.}:
             lock_acquire+0x9e/0x180
             _raw_spin_lock+0x2f/0x40
             task_fork_fair+0x37/0x160
             sched_fork+0x211/0x310
             copy_process.part.43+0x7b1/0x2160
             _do_fork+0xda/0x6b0
             kernel_thread+0x29/0x30
             rest_init+0x22/0x260
             arch_call_rest_init+0xe/0x10
             start_kernel+0x4fd/0x520
             x86_64_start_reservations+0x24/0x26
             x86_64_start_kernel+0x6f/0x72
             secondary_startup_64+0xa4/0xb0
      
      -> #1 (&p->pi_lock){-.-.}:
             lock_acquire+0x9e/0x180
             _raw_spin_lock_irqsave+0x3a/0x50
             try_to_wake_up+0x41/0x600
             wake_up_process+0x15/0x20
             create_worker+0x16b/0x1e0
             workqueue_init+0x279/0x2ee
             kernel_init_freeable+0xf7/0x288
             kernel_init+0xf/0x180
             ret_from_fork+0x24/0x30
      
      -> #0 (&(&pool->lock)->rlock){-.-.}:
             __lock_acquire+0x101f/0x12a0
             lock_acquire+0x9e/0x180
             _raw_spin_lock+0x2f/0x40
             __queue_work+0xb2/0x520
             queue_work_on+0x38/0x80
             free_percpu+0x221/0x260
             pcpu_freelist_destroy+0x11/0x20
             stack_map_free+0x2a/0x40
             bpf_map_free_deferred+0x3c/0x50
             process_one_work+0x1f7/0x580
             worker_thread+0x54/0x410
             kthread+0x10f/0x150
             ret_from_fork+0x24/0x30
      
      other info that might help us debug this:
      
      Chain exists of:
        &(&pool->lock)->rlock --> &htab->buckets[i].lock --> pcpu_lock
      
       Possible unsafe locking scenario:
      
             CPU0                    CPU1
             ----                    ----
        lock(pcpu_lock);
                                     lock(&htab->buckets[i].lock);
                                     lock(pcpu_lock);
        lock(&(&pool->lock)->rlock);
      
       *** DEADLOCK ***
      
      3 locks held by kworker/23:255/18872:
       #0: 00000000b36a6e16 ((wq_completion)events){+.+.},
           at: process_one_work+0x17a/0x580
       #1: 00000000dfd966f0 ((work_completion)(&map->work)){+.+.},
           at: process_one_work+0x17a/0x580
       #2: 00000000e3e7a6aa (pcpu_lock){..-.},
           at: free_percpu+0x36/0x260
      
      stack backtrace:
      CPU: 23 PID: 18872 Comm: kworker/23:255 Not tainted 5.1.0-dbg-DEV #1
      Hardware name: ...
      Workqueue: events bpf_map_free_deferred
      Call Trace:
       dump_stack+0x67/0x95
       print_circular_bug.isra.38+0x1c6/0x220
       check_prev_add.constprop.50+0x9f6/0xd20
       __lock_acquire+0x101f/0x12a0
       lock_acquire+0x9e/0x180
       _raw_spin_lock+0x2f/0x40
       __queue_work+0xb2/0x520
       queue_work_on+0x38/0x80
       free_percpu+0x221/0x260
       pcpu_freelist_destroy+0x11/0x20
       stack_map_free+0x2a/0x40
       bpf_map_free_deferred+0x3c/0x50
       process_one_work+0x1f7/0x580
       worker_thread+0x54/0x410
       kthread+0x10f/0x150
       ret_from_fork+0x24/0x30
      Signed-off-by: default avatarJohn Sperbeck <jsperbeck@google.com>
      Signed-off-by: default avatarDennis Zhou <dennis@kernel.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      da0abba5
    • Eugen Hristev's avatar
      media: atmel: atmel-isc: fix asd memory allocation · 6f9ca872
      Eugen Hristev authored
      [ Upstream commit 1e4e25c4 ]
      
      The subsystem will free the asd memory on notifier cleanup, if the asd is
      added to the notifier.
      However the memory is freed using kfree.
      Thus, we cannot allocate the asd using devm_*
      This can lead to crashes and problems.
      To test this issue, just return an error at probe, but cleanup the
      notifier beforehand.
      
      Fixes: 10626744 ("[media] atmel-isc: add the Image Sensor Controller code")
      Signed-off-by: default avatarEugen Hristev <eugen.hristev@microchip.com>
      Signed-off-by: default avatarHans Verkuil <hverkuil-cisco@xs4all.nl>
      Signed-off-by: default avatarMauro Carvalho Chehab <mchehab+samsung@kernel.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      6f9ca872
    • Chao Yu's avatar
      f2fs: fix to do checksum even if inode page is uptodate · b0395364
      Chao Yu authored
      [ Upstream commit b42b179b ]
      
      As Jungyeon reported in bugzilla:
      
      https://bugzilla.kernel.org/show_bug.cgi?id=203221
      
      - Overview
      When mounting the attached crafted image and running program, this error is reported.
      
      The image is intentionally fuzzed from a normal f2fs image for testing and I enabled option CONFIG_F2FS_CHECK_FS on.
      
      - Reproduces
      cc poc_07.c
      mkdir test
      mount -t f2fs tmp.img test
      cp a.out test
      cd test
      sudo ./a.out
      
      - Messages
       kernel BUG at fs/f2fs/node.c:1279!
       RIP: 0010:read_node_page+0xcf/0xf0
       Call Trace:
        __get_node_page+0x6b/0x2f0
        f2fs_iget+0x8f/0xdf0
        f2fs_lookup+0x136/0x320
        __lookup_slow+0x92/0x140
        lookup_slow+0x30/0x50
        walk_component+0x1c1/0x350
        path_lookupat+0x62/0x200
        filename_lookup+0xb3/0x1a0
        do_fchmodat+0x3e/0xa0
        __x64_sys_chmod+0x12/0x20
        do_syscall_64+0x43/0xf0
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      On below paths, we can have opportunity to readahead inode page
      - gc_node_segment -> f2fs_ra_node_page
      - gc_data_segment -> f2fs_ra_node_page
      - f2fs_fill_dentries -> f2fs_ra_node_page
      
      Unlike synchronized read, on readahead path, we can set page uptodate
      before verifying page's checksum, then read_node_page() will trigger
      kernel panic once it encounters a uptodated page w/ incorrect checksum.
      
      So considering readahead scenario, we have to do checksum each time
      when loading inode page even if it is uptodated.
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      b0395364
    • Chao Yu's avatar
      f2fs: fix to retrieve inline xattr space · 0fd306ae
      Chao Yu authored
      [ Upstream commit 45a74688 ]
      
      With below mkfs and mount option, generic/339 of fstest will report that
      scratch image becomes corrupted.
      
      MKFS_OPTIONS  -- -O extra_attr -O project_quota -O inode_checksum -O flexible_inline_xattr -O inode_crtime -f /dev/zram1
      MOUNT_OPTIONS -- -o acl,user_xattr -o discard,noinline_xattr /dev/zram1 /mnt/scratch_f2fs
      
      [ASSERT] (f2fs_check_dirent_position:1315)  --> Wrong position of dirent pino:1970, name: (...)
      level:8, dir_level:0, pgofs:951, correct range:[900, 901]
      
      In old kernel, inline data and directory always reserved 200 bytes in
      inode layout, even if inline_xattr is disabled, then new kernel tries
      to retrieve that space for non-inline xattr inode, but for inline dentry,
      its layout size should be fixed, so we just keep that reserved space.
      
      But the problem here is that, after inline dentry conversion, inline
      dentry layout no longer exists, if we still reserve inline xattr space,
      after dents updates, there will be a hole in inline xattr space, which
      can break hierarchy hash directory structure.
      
      This patch fixes this issue by retrieving inline xattr space after
      inline dentry conversion.
      
      Fixes: 6afc662e ("f2fs: support flexible inline xattr size")
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      0fd306ae
    • Chao Yu's avatar
      f2fs: fix to avoid deadloop in foreground GC · e2877ff9
      Chao Yu authored
      [ Upstream commit 793ab1c8 ]
      
      As Jungyeon reported in bugzilla:
      
      https://bugzilla.kernel.org/show_bug.cgi?id=203211
      
      - Overview
      When mounting the attached crafted image and making a new file, I got this error and the error messages keep repeating.
      
      The image is intentionally fuzzed from a normal f2fs image for testing and I run with option CONFIG_F2FS_CHECK_FS on.
      
      - Reproduces
      mkdir test
      mount -t f2fs tmp.img test
      cd test
      touch t
      
      - Messages
      [   58.820451] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
      [   58.821485] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
      [   58.822530] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
      [   58.823571] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
      [   58.824616] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
      [   58.825640] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
      [   58.826663] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
      [   58.827698] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
      [   58.828719] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
      [   58.829759] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
      [   58.830783] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
      [   58.831828] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
      [   58.832869] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
      [   58.833888] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
      [   58.834945] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
      [   58.835996] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
      [   58.837028] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
      [   58.838051] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
      [   58.839072] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
      [   58.840100] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
      [   58.841147] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
      [   58.842186] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
      [   58.843214] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
      [   58.844267] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
      [   58.845282] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
      [   58.846305] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
      [   58.847341] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
      ... (repeating)
      
      During GC, if segment type stored in SSA and SIT is inconsistent, we just
      skip migrating current segment directly, since we need to know the exact
      type to decide the migration function we use.
      
      So in foreground GC, we will easily run into a infinite loop as we may
      select the same victim segment which has inconsistent type due to greedy
      policy. In order to end up this, we choose to shutdown filesystem. For
      backgrond GC, we need to do that as well, so that we can avoid latter
      potential infinite looped foreground GC.
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      e2877ff9
    • Chao Yu's avatar
      f2fs: fix to do sanity check on valid block count of segment · 53275b02
      Chao Yu authored
      [ Upstream commit e95bcdb2 ]
      
      As Jungyeon reported in bugzilla:
      
      https://bugzilla.kernel.org/show_bug.cgi?id=203233
      
      - Overview
      When mounting the attached crafted image and running program, following errors are reported.
      Additionally, it hangs on sync after running program.
      
      The image is intentionally fuzzed from a normal f2fs image for testing.
      Compile options for F2FS are as follows.
      CONFIG_F2FS_FS=y
      CONFIG_F2FS_STAT_FS=y
      CONFIG_F2FS_FS_XATTR=y
      CONFIG_F2FS_FS_POSIX_ACL=y
      CONFIG_F2FS_CHECK_FS=y
      
      - Reproduces
      cc poc_13.c
      mkdir test
      mount -t f2fs tmp.img test
      cp a.out test
      cd test
      sudo ./a.out
      sync
      
      - Kernel messages
       F2FS-fs (sdb): Bitmap was wrongly set, blk:4608
       kernel BUG at fs/f2fs/segment.c:2102!
       RIP: 0010:update_sit_entry+0x394/0x410
       Call Trace:
        f2fs_allocate_data_block+0x16f/0x660
        do_write_page+0x62/0x170
        f2fs_do_write_node_page+0x33/0xa0
        __write_node_page+0x270/0x4e0
        f2fs_sync_node_pages+0x5df/0x670
        f2fs_write_checkpoint+0x372/0x1400
        f2fs_sync_fs+0xa3/0x130
        f2fs_do_sync_file+0x1a6/0x810
        do_fsync+0x33/0x60
        __x64_sys_fsync+0xb/0x10
        do_syscall_64+0x43/0xf0
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      sit.vblocks and sum valid block count in sit.valid_map may be
      inconsistent, segment w/ zero vblocks will be treated as free
      segment, while allocating in free segment, we may allocate a
      free block, if its bitmap is valid previously, it can cause
      kernel crash due to bitmap verification failure.
      
      Anyway, to avoid further serious metadata inconsistence and
      corruption, it is necessary and worth to detect SIT
      inconsistence. So let's enable check_block_count() to verify
      vblocks and valid_map all the time rather than do it only
      CONFIG_F2FS_CHECK_FS is enabled.
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      53275b02
    • Chao Yu's avatar
      f2fs: fix to avoid panic in dec_valid_node_count() · 076d876b
      Chao Yu authored
      [ Upstream commit ea6d7e72 ]
      
      As Jungyeon reported in bugzilla:
      
      https://bugzilla.kernel.org/show_bug.cgi?id=203213
      
      - Overview
      When mounting the attached crafted image and running program, I got this error.
      Additionally, it hangs on sync after running the this script.
      
      The image is intentionally fuzzed from a normal f2fs image for testing and I enabled option CONFIG_F2FS_CHECK_FS on.
      
      - Reproduces
      mkdir test
      mount -t f2fs tmp.img test
      cp a.out test
      cd test
      sudo ./a.out
      sync
      
       kernel BUG at fs/f2fs/f2fs.h:2012!
       RIP: 0010:truncate_node+0x2c9/0x2e0
       Call Trace:
        f2fs_truncate_xattr_node+0xa1/0x130
        f2fs_remove_inode_page+0x82/0x2d0
        f2fs_evict_inode+0x2a3/0x3a0
        evict+0xba/0x180
        __dentry_kill+0xbe/0x160
        dentry_kill+0x46/0x180
        dput+0xbb/0x100
        do_renameat2+0x3c9/0x550
        __x64_sys_rename+0x17/0x20
        do_syscall_64+0x43/0xf0
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      The reason is dec_valid_node_count() will trigger kernel panic due to
      inconsistent count in between inode.i_blocks and actual block.
      
      To avoid panic, let's just print debug message and set SBI_NEED_FSCK to
      give a hint to fsck for latter repairing.
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      [Jaegeuk Kim: fix build warning and add unlikely]
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      076d876b
    • Chao Yu's avatar
      f2fs: fix to use inline space only if inline_xattr is enable · 7cc9fda7
      Chao Yu authored
      [ Upstream commit 622927f3 ]
      
      With below mkfs and mount option:
      
      MKFS_OPTIONS  -- -O extra_attr -O project_quota -O inode_checksum -O flexible_inline_xattr -O inode_crtime -f
      MOUNT_OPTIONS -- -o noinline_xattr
      
      We may miss xattr data with below testcase:
      - mkdir dir
      - setfattr -n "user.name" -v 0 dir
      - for ((i = 0; i < 190; i++)) do touch dir/$i; done
      - umount
      - mount
      - getfattr -n "user.name" dir
      
      user.name: No such attribute
      
      The root cause is that we persist xattr data into reserved inline xattr
      space, even if inline_xattr is not enable in inline directory inode, after
      inline dentry conversion, reserved space no longer exists, so that xattr
      data missed.
      
      Let's use inline xattr space only if inline_xattr flag is set on inode
      to fix this iusse.
      
      Fixes: 6afc662e ("f2fs: support flexible inline xattr size")
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      7cc9fda7
    • Chao Yu's avatar
      f2fs: fix to avoid panic in dec_valid_block_count() · 3f4a094e
      Chao Yu authored
      [ Upstream commit 5e159cd3 ]
      
      As Jungyeon reported in bugzilla:
      
      https://bugzilla.kernel.org/show_bug.cgi?id=203209
      
      - Overview
      When mounting the attached crafted image and running program, I got this error.
      Additionally, it hangs on sync after the this script.
      
      The image is intentionally fuzzed from a normal f2fs image for testing and I enabled option CONFIG_F2FS_CHECK_FS on.
      
      - Reproduces
      cc poc_01.c
      ./run.sh f2fs
      sync
      
       kernel BUG at fs/f2fs/f2fs.h:1788!
       RIP: 0010:f2fs_truncate_data_blocks_range+0x342/0x350
       Call Trace:
        f2fs_truncate_blocks+0x36d/0x3c0
        f2fs_truncate+0x88/0x110
        f2fs_setattr+0x3e1/0x460
        notify_change+0x2da/0x400
        do_truncate+0x6d/0xb0
        do_sys_ftruncate+0xf1/0x160
        do_syscall_64+0x43/0xf0
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      The reason is dec_valid_block_count() will trigger kernel panic due to
      inconsistent count in between inode.i_blocks and actual block.
      
      To avoid panic, let's just print debug message and set SBI_NEED_FSCK to
      give a hint to fsck for latter repairing.
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      [Jaegeuk Kim: fix build warning and add unlikely]
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      3f4a094e
    • Chao Yu's avatar
      f2fs: fix to clear dirty inode in error path of f2fs_iget() · 83c46592
      Chao Yu authored
      [ Upstream commit 546d22f0 ]
      
      As Jungyeon reported in bugzilla:
      
      https://bugzilla.kernel.org/show_bug.cgi?id=203217
      
      - Overview
      When mounting the attached crafted image and running program, I got this error.
      Additionally, it hangs on sync after running the program.
      
      The image is intentionally fuzzed from a normal f2fs image for testing and I enabled option CONFIG_F2FS_CHECK_FS on.
      
      - Reproduces
      cc poc_test_05.c
      mkdir test
      mount -t f2fs tmp.img test
      sudo ./a.out
      sync
      
      - Messages
       kernel BUG at fs/f2fs/inode.c:707!
       RIP: 0010:f2fs_evict_inode+0x33f/0x3a0
       Call Trace:
        evict+0xba/0x180
        f2fs_iget+0x598/0xdf0
        f2fs_lookup+0x136/0x320
        __lookup_slow+0x92/0x140
        lookup_slow+0x30/0x50
        walk_component+0x1c1/0x350
        path_lookupat+0x62/0x200
        filename_lookup+0xb3/0x1a0
        do_readlinkat+0x56/0x110
        __x64_sys_readlink+0x16/0x20
        do_syscall_64+0x43/0xf0
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      During inode loading, __recover_inline_status() can recovery inode status
      and set inode dirty, once we failed in following process, it will fail
      the check in f2fs_evict_inode, result in trigger BUG_ON().
      
      Let's clear dirty inode in error path of f2fs_iget() to avoid panic.
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      83c46592
    • Chao Yu's avatar
      f2fs: fix to do sanity check on free nid · c5722a10
      Chao Yu authored
      [ Upstream commit 626bcf2b ]
      
      As Jungyeon reported in bugzilla:
      
      https://bugzilla.kernel.org/show_bug.cgi?id=203225
      
      - Overview
      When mounting the attached crafted image and unmounting it, following errors are reported.
      Additionally, it hangs on sync after unmounting.
      
      The image is intentionally fuzzed from a normal f2fs image for testing.
      Compile options for F2FS are as follows.
      CONFIG_F2FS_FS=y
      CONFIG_F2FS_STAT_FS=y
      CONFIG_F2FS_FS_XATTR=y
      CONFIG_F2FS_FS_POSIX_ACL=y
      CONFIG_F2FS_CHECK_FS=y
      
      - Reproduces
      mkdir test
      mount -t f2fs tmp.img test
      touch test/t
      umount test
      sync
      
      - Messages
       kernel BUG at fs/f2fs/node.c:3073!
       RIP: 0010:f2fs_destroy_node_manager+0x2f0/0x300
       Call Trace:
        f2fs_put_super+0xf4/0x270
        generic_shutdown_super+0x62/0x110
        kill_block_super+0x1c/0x50
        kill_f2fs_super+0xad/0xd0
        deactivate_locked_super+0x35/0x60
        cleanup_mnt+0x36/0x70
        task_work_run+0x75/0x90
        exit_to_usermode_loop+0x93/0xa0
        do_syscall_64+0xba/0xf0
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
       RIP: 0010:f2fs_destroy_node_manager+0x2f0/0x300
      
      NAT table is corrupted, so reserved meta/node inode ids were added into
      free list incorrectly, during file creation, since reserved id has cached
      in inode hash, so it fails the creation and preallocated nid can not be
      released later, result in kernel panic.
      
      To fix this issue, let's do nid boundary check during free nid loading.
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      c5722a10
    • Chao Yu's avatar
      f2fs: fix to avoid panic in f2fs_remove_inode_page() · 54916125
      Chao Yu authored
      [ Upstream commit 8b6810f8 ]
      
      As Jungyeon reported in bugzilla:
      
      https://bugzilla.kernel.org/show_bug.cgi?id=203219
      
      - Overview
      When mounting the attached crafted image and running program, I got this error.
      Additionally, it hangs on sync after running the program.
      
      The image is intentionally fuzzed from a normal f2fs image for testing and I enabled option CONFIG_F2FS_CHECK_FS on.
      
      - Reproduces
      cc poc_06.c
      mkdir test
      mount -t f2fs tmp.img test
      cp a.out test
      cd test
      sudo ./a.out
      sync
      
      - Messages
       kernel BUG at fs/f2fs/node.c:1183!
       RIP: 0010:f2fs_remove_inode_page+0x294/0x2d0
       Call Trace:
        f2fs_evict_inode+0x2a3/0x3a0
        evict+0xba/0x180
        __dentry_kill+0xbe/0x160
        dentry_kill+0x46/0x180
        dput+0xbb/0x100
        do_renameat2+0x3c9/0x550
        __x64_sys_rename+0x17/0x20
        do_syscall_64+0x43/0xf0
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      The reason is f2fs_remove_inode_page() will trigger kernel panic due to
      inconsistent i_blocks value of inode.
      
      To avoid panic, let's just print debug message and set SBI_NEED_FSCK to
      give a hint to fsck for latter repairing of potential image corruption.
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      [Jaegeuk Kim: fix build warning and add unlikely]
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      54916125
    • Chao Yu's avatar
      f2fs: fix error path of recovery · d6fa38f7
      Chao Yu authored
      [ Upstream commit 98838579 ]
      
      There are some places in where we missed to unlock page or unlock page
      incorrectly, fix them.
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      d6fa38f7
    • Chao Yu's avatar
      f2fs: fix to avoid panic in f2fs_inplace_write_data() · da06b18b
      Chao Yu authored
      [ Upstream commit 05573d6c ]
      
      As Jungyeon reported in bugzilla:
      
      https://bugzilla.kernel.org/show_bug.cgi?id=203239
      
      - Overview
      When mounting the attached crafted image and running program, following errors are reported.
      Additionally, it hangs on sync after running program.
      
      The image is intentionally fuzzed from a normal f2fs image for testing.
      Compile options for F2FS are as follows.
      CONFIG_F2FS_FS=y
      CONFIG_F2FS_STAT_FS=y
      CONFIG_F2FS_FS_XATTR=y
      CONFIG_F2FS_FS_POSIX_ACL=y
      CONFIG_F2FS_CHECK_FS=y
      
      - Reproduces
      cc poc_15.c
      ./run.sh f2fs
      sync
      
      - Kernel messages
       ------------[ cut here ]------------
       kernel BUG at fs/f2fs/segment.c:3162!
       RIP: 0010:f2fs_inplace_write_data+0x12d/0x160
       Call Trace:
        f2fs_do_write_data_page+0x3c1/0x820
        __write_data_page+0x156/0x720
        f2fs_write_cache_pages+0x20d/0x460
        f2fs_write_data_pages+0x1b4/0x300
        do_writepages+0x15/0x60
        __filemap_fdatawrite_range+0x7c/0xb0
        file_write_and_wait_range+0x2c/0x80
        f2fs_do_sync_file+0x102/0x810
        do_fsync+0x33/0x60
        __x64_sys_fsync+0xb/0x10
        do_syscall_64+0x43/0xf0
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      The reason is f2fs_inplace_write_data() will trigger kernel panic due
      to data block locates in node type segment.
      
      To avoid panic, let's just return error code and set SBI_NEED_FSCK to
      give a hint to fsck for latter repairing.
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      da06b18b