1. 26 Jul, 2019 40 commits
    • Coly Li's avatar
      bcache: fix potential deadlock in cached_def_free() · 5f5eb171
      Coly Li authored
      [ Upstream commit 7e865eba ]
      
      When enable lockdep and reboot system with a writeback mode bcache
      device, the following potential deadlock warning is reported by lockdep
      engine.
      
      [  101.536569][  T401] kworker/2:2/401 is trying to acquire lock:
      [  101.538575][  T401] 00000000bbf6e6c7 ((wq_completion)bcache_writeback_wq){+.+.}, at: flush_workqueue+0x87/0x4c0
      [  101.542054][  T401]
      [  101.542054][  T401] but task is already holding lock:
      [  101.544587][  T401] 00000000f5f305b3 ((work_completion)(&cl->work)#2){+.+.}, at: process_one_work+0x21e/0x640
      [  101.548386][  T401]
      [  101.548386][  T401] which lock already depends on the new lock.
      [  101.548386][  T401]
      [  101.551874][  T401]
      [  101.551874][  T401] the existing dependency chain (in reverse order) is:
      [  101.555000][  T401]
      [  101.555000][  T401] -> #1 ((work_completion)(&cl->work)#2){+.+.}:
      [  101.557860][  T401]        process_one_work+0x277/0x640
      [  101.559661][  T401]        worker_thread+0x39/0x3f0
      [  101.561340][  T401]        kthread+0x125/0x140
      [  101.562963][  T401]        ret_from_fork+0x3a/0x50
      [  101.564718][  T401]
      [  101.564718][  T401] -> #0 ((wq_completion)bcache_writeback_wq){+.+.}:
      [  101.567701][  T401]        lock_acquire+0xb4/0x1c0
      [  101.569651][  T401]        flush_workqueue+0xae/0x4c0
      [  101.571494][  T401]        drain_workqueue+0xa9/0x180
      [  101.573234][  T401]        destroy_workqueue+0x17/0x250
      [  101.575109][  T401]        cached_dev_free+0x44/0x120 [bcache]
      [  101.577304][  T401]        process_one_work+0x2a4/0x640
      [  101.579357][  T401]        worker_thread+0x39/0x3f0
      [  101.581055][  T401]        kthread+0x125/0x140
      [  101.582709][  T401]        ret_from_fork+0x3a/0x50
      [  101.584592][  T401]
      [  101.584592][  T401] other info that might help us debug this:
      [  101.584592][  T401]
      [  101.588355][  T401]  Possible unsafe locking scenario:
      [  101.588355][  T401]
      [  101.590974][  T401]        CPU0                    CPU1
      [  101.592889][  T401]        ----                    ----
      [  101.594743][  T401]   lock((work_completion)(&cl->work)#2);
      [  101.596785][  T401]                                lock((wq_completion)bcache_writeback_wq);
      [  101.600072][  T401]                                lock((work_completion)(&cl->work)#2);
      [  101.602971][  T401]   lock((wq_completion)bcache_writeback_wq);
      [  101.605255][  T401]
      [  101.605255][  T401]  *** DEADLOCK ***
      [  101.605255][  T401]
      [  101.608310][  T401] 2 locks held by kworker/2:2/401:
      [  101.610208][  T401]  #0: 00000000cf2c7d17 ((wq_completion)events){+.+.}, at: process_one_work+0x21e/0x640
      [  101.613709][  T401]  #1: 00000000f5f305b3 ((work_completion)(&cl->work)#2){+.+.}, at: process_one_work+0x21e/0x640
      [  101.617480][  T401]
      [  101.617480][  T401] stack backtrace:
      [  101.619539][  T401] CPU: 2 PID: 401 Comm: kworker/2:2 Tainted: G        W         5.2.0-rc4-lp151.20-default+ #1
      [  101.623225][  T401] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/13/2018
      [  101.627210][  T401] Workqueue: events cached_dev_free [bcache]
      [  101.629239][  T401] Call Trace:
      [  101.630360][  T401]  dump_stack+0x85/0xcb
      [  101.631777][  T401]  print_circular_bug+0x19a/0x1f0
      [  101.633485][  T401]  __lock_acquire+0x16cd/0x1850
      [  101.635184][  T401]  ? __lock_acquire+0x6a8/0x1850
      [  101.636863][  T401]  ? lock_acquire+0xb4/0x1c0
      [  101.638421][  T401]  ? find_held_lock+0x34/0xa0
      [  101.640015][  T401]  lock_acquire+0xb4/0x1c0
      [  101.641513][  T401]  ? flush_workqueue+0x87/0x4c0
      [  101.643248][  T401]  flush_workqueue+0xae/0x4c0
      [  101.644832][  T401]  ? flush_workqueue+0x87/0x4c0
      [  101.646476][  T401]  ? drain_workqueue+0xa9/0x180
      [  101.648303][  T401]  drain_workqueue+0xa9/0x180
      [  101.649867][  T401]  destroy_workqueue+0x17/0x250
      [  101.651503][  T401]  cached_dev_free+0x44/0x120 [bcache]
      [  101.653328][  T401]  process_one_work+0x2a4/0x640
      [  101.655029][  T401]  worker_thread+0x39/0x3f0
      [  101.656693][  T401]  ? process_one_work+0x640/0x640
      [  101.658501][  T401]  kthread+0x125/0x140
      [  101.660012][  T401]  ? kthread_create_worker_on_cpu+0x70/0x70
      [  101.661985][  T401]  ret_from_fork+0x3a/0x50
      [  101.691318][  T401] bcache: bcache_device_free() bcache0 stopped
      
      Here is how the above potential deadlock may happen in reboot/shutdown
      code path,
      1) bcache_reboot() is called firstly in the reboot/shutdown code path,
         then in bcache_reboot(), bcache_device_stop() is called.
      2) bcache_device_stop() sets BCACHE_DEV_CLOSING on d->falgs, then call
         closure_queue(&d->cl) to invoke cached_dev_flush(). And in turn
         cached_dev_flush() calls cached_dev_free() via closure_at()
      3) In cached_dev_free(), after stopped writebach kthread
         dc->writeback_thread, the kwork dc->writeback_write_wq is stopping by
         destroy_workqueue().
      4) Inside destroy_workqueue(), drain_workqueue() is called. Inside
         drain_workqueue(), flush_workqueue() is called. Then wq->lockdep_map
         is acquired by lock_map_acquire() in flush_workqueue(). After the
         lock acquired the rest part of flush_workqueue() just wait for the
         workqueue to complete.
      5) Now we look back at writeback thread routine bch_writeback_thread(),
         in the main while-loop, write_dirty() is called via continue_at() in
         read_dirty_submit(), which is called via continue_at() in while-loop
         level called function read_dirty(). Inside write_dirty() it may be
         re-called on workqueeu dc->writeback_write_wq via continue_at().
         It means when the writeback kthread is stopped in cached_dev_free()
         there might be still one kworker queued on dc->writeback_write_wq
         to execute write_dirty() again.
      6) Now this kworker is scheduled on dc->writeback_write_wq to run by
         process_one_work() (which is called by worker_thread()). Before
         calling the kwork routine, wq->lockdep_map is acquired.
      7) But wq->lockdep_map is acquired already in step 4), so a A-A lock
         (lockdep terminology) scenario happens.
      
      Indeed on multiple cores syatem, the above deadlock is very rare to
      happen, just as the code comments in process_one_work() says,
      2263     * AFAICT there is no possible deadlock scenario between the
      2264     * flush_work() and complete() primitives (except for
      	   single-threaded
      2265     * workqueues), so hiding them isn't a problem.
      
      But it is still good to fix such lockdep warning, even no one running
      bcache on single core system.
      
      The fix is simple. This patch solves the above potential deadlock by,
      - Do not destroy workqueue dc->writeback_write_wq in cached_dev_free().
      - Flush and destroy dc->writeback_write_wq in writebach kthread routine
        bch_writeback_thread(), where after quit the thread main while-loop
        and before cached_dev_put() is called.
      
      By this fix, dc->writeback_write_wq will be stopped and destroy before
      the writeback kthread stopped, so the chance for a A-A locking on
      wq->lockdep_map is disappeared, such A-A deadlock won't happen
      any more.
      Signed-off-by: default avatarColy Li <colyli@suse.de>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      5f5eb171
    • Coly Li's avatar
      bcache: check c->gc_thread by IS_ERR_OR_NULL in cache_set_flush() · 2d1169fe
      Coly Li authored
      [ Upstream commit b387e9b5 ]
      
      When system memory is in heavy pressure, bch_gc_thread_start() from
      run_cache_set() may fail due to out of memory. In such condition,
      c->gc_thread is assigned to -ENOMEM, not NULL pointer. Then in following
      failure code path bch_cache_set_error(), when cache_set_flush() gets
      called, the code piece to stop c->gc_thread is broken,
               if (!IS_ERR_OR_NULL(c->gc_thread))
                       kthread_stop(c->gc_thread);
      
      And KASAN catches such NULL pointer deference problem, with the warning
      information:
      
      [  561.207881] ==================================================================
      [  561.207900] BUG: KASAN: null-ptr-deref in kthread_stop+0x3b/0x440
      [  561.207904] Write of size 4 at addr 000000000000001c by task kworker/15:1/313
      
      [  561.207913] CPU: 15 PID: 313 Comm: kworker/15:1 Tainted: G        W         5.0.0-vanilla+ #3
      [  561.207916] Hardware name: Lenovo ThinkSystem SR650 -[7X05CTO1WW]-/-[7X05CTO1WW]-, BIOS -[IVE136T-2.10]- 03/22/2019
      [  561.207935] Workqueue: events cache_set_flush [bcache]
      [  561.207940] Call Trace:
      [  561.207948]  dump_stack+0x9a/0xeb
      [  561.207955]  ? kthread_stop+0x3b/0x440
      [  561.207960]  ? kthread_stop+0x3b/0x440
      [  561.207965]  kasan_report+0x176/0x192
      [  561.207973]  ? kthread_stop+0x3b/0x440
      [  561.207981]  kthread_stop+0x3b/0x440
      [  561.207995]  cache_set_flush+0xd4/0x6d0 [bcache]
      [  561.208008]  process_one_work+0x856/0x1620
      [  561.208015]  ? find_held_lock+0x39/0x1d0
      [  561.208028]  ? drain_workqueue+0x380/0x380
      [  561.208048]  worker_thread+0x87/0xb80
      [  561.208058]  ? __kthread_parkme+0xb6/0x180
      [  561.208067]  ? process_one_work+0x1620/0x1620
      [  561.208072]  kthread+0x326/0x3e0
      [  561.208079]  ? kthread_create_worker_on_cpu+0xc0/0xc0
      [  561.208090]  ret_from_fork+0x3a/0x50
      [  561.208110] ==================================================================
      [  561.208113] Disabling lock debugging due to kernel taint
      [  561.208115] irq event stamp: 11800231
      [  561.208126] hardirqs last  enabled at (11800231): [<ffffffff83008538>] do_syscall_64+0x18/0x410
      [  561.208127] BUG: unable to handle kernel NULL pointer dereference at 000000000000001c
      [  561.208129] #PF error: [WRITE]
      [  561.312253] hardirqs last disabled at (11800230): [<ffffffff830052ff>] trace_hardirqs_off_thunk+0x1a/0x1c
      [  561.312259] softirqs last  enabled at (11799832): [<ffffffff850005c7>] __do_softirq+0x5c7/0x8c3
      [  561.405975] PGD 0 P4D 0
      [  561.442494] softirqs last disabled at (11799821): [<ffffffff831add2c>] irq_exit+0x1ac/0x1e0
      [  561.791359] Oops: 0002 [#1] SMP KASAN NOPTI
      [  561.791362] CPU: 15 PID: 313 Comm: kworker/15:1 Tainted: G    B   W         5.0.0-vanilla+ #3
      [  561.791363] Hardware name: Lenovo ThinkSystem SR650 -[7X05CTO1WW]-/-[7X05CTO1WW]-, BIOS -[IVE136T-2.10]- 03/22/2019
      [  561.791371] Workqueue: events cache_set_flush [bcache]
      [  561.791374] RIP: 0010:kthread_stop+0x3b/0x440
      [  561.791376] Code: 00 00 65 8b 05 26 d5 e0 7c 89 c0 48 0f a3 05 ec aa df 02 0f 82 dc 02 00 00 4c 8d 63 20 be 04 00 00 00 4c 89 e7 e8 65 c5 53 00 <f0> ff 43 20 48 8d 7b 24 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48
      [  561.791377] RSP: 0018:ffff88872fc8fd10 EFLAGS: 00010286
      [  561.838895] bcache: bch_count_io_errors() nvme0n1: IO error on writing btree.
      [  561.838916] bcache: bch_count_io_errors() nvme0n1: IO error on writing btree.
      [  561.838934] bcache: bch_count_io_errors() nvme0n1: IO error on writing btree.
      [  561.838948] bcache: bch_count_io_errors() nvme0n1: IO error on writing btree.
      [  561.838966] bcache: bch_count_io_errors() nvme0n1: IO error on writing btree.
      [  561.838979] bcache: bch_count_io_errors() nvme0n1: IO error on writing btree.
      [  561.838996] bcache: bch_count_io_errors() nvme0n1: IO error on writing btree.
      [  563.067028] RAX: 0000000000000000 RBX: fffffffffffffffc RCX: ffffffff832dd314
      [  563.067030] RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000297
      [  563.067032] RBP: ffff88872fc8fe88 R08: fffffbfff0b8213d R09: fffffbfff0b8213d
      [  563.067034] R10: 0000000000000001 R11: fffffbfff0b8213c R12: 000000000000001c
      [  563.408618] R13: ffff88dc61cc0f68 R14: ffff888102b94900 R15: ffff88dc61cc0f68
      [  563.408620] FS:  0000000000000000(0000) GS:ffff888f7dc00000(0000) knlGS:0000000000000000
      [  563.408622] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  563.408623] CR2: 000000000000001c CR3: 0000000f48a1a004 CR4: 00000000007606e0
      [  563.408625] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [  563.408627] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [  563.904795] bcache: bch_count_io_errors() nvme0n1: IO error on writing btree.
      [  563.915796] PKRU: 55555554
      [  563.915797] Call Trace:
      [  563.915807]  cache_set_flush+0xd4/0x6d0 [bcache]
      [  563.915812]  process_one_work+0x856/0x1620
      [  564.001226] bcache: bch_count_io_errors() nvme0n1: IO error on writing btree.
      [  564.033563]  ? find_held_lock+0x39/0x1d0
      [  564.033567]  ? drain_workqueue+0x380/0x380
      [  564.033574]  worker_thread+0x87/0xb80
      [  564.062823] bcache: bch_count_io_errors() nvme0n1: IO error on writing btree.
      [  564.118042]  ? __kthread_parkme+0xb6/0x180
      [  564.118046]  ? process_one_work+0x1620/0x1620
      [  564.118048]  kthread+0x326/0x3e0
      [  564.118050]  ? kthread_create_worker_on_cpu+0xc0/0xc0
      [  564.167066] bcache: bch_count_io_errors() nvme0n1: IO error on writing btree.
      [  564.252441]  ret_from_fork+0x3a/0x50
      [  564.252447] Modules linked in: msr rpcrdma sunrpc rdma_ucm ib_iser ib_umad rdma_cm ib_ipoib i40iw configfs iw_cm ib_cm libiscsi scsi_transport_iscsi mlx4_ib ib_uverbs mlx4_en ib_core nls_iso8859_1 nls_cp437 vfat fat intel_rapl skx_edac x86_pkg_temp_thermal coretemp iTCO_wdt iTCO_vendor_support crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel ses raid0 aesni_intel cdc_ether enclosure usbnet ipmi_ssif joydev aes_x86_64 i40e scsi_transport_sas mii bcache md_mod crypto_simd mei_me ioatdma crc64 ptp cryptd pcspkr i2c_i801 mlx4_core glue_helper pps_core mei lpc_ich dca wmi ipmi_si ipmi_devintf nd_pmem dax_pmem nd_btt ipmi_msghandler device_dax pcc_cpufreq button hid_generic usbhid mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect xhci_pci sysimgblt fb_sys_fops xhci_hcd ttm megaraid_sas drm usbcore nfit libnvdimm sg dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua efivarfs
      [  564.299390] bcache: bch_count_io_errors() nvme0n1: IO error on writing btree.
      [  564.348360] CR2: 000000000000001c
      [  564.348362] ---[ end trace b7f0e5cc7b2103b0 ]---
      
      Therefore, it is not enough to only check whether c->gc_thread is NULL,
      we should use IS_ERR_OR_NULL() to check both NULL pointer and error
      value.
      
      This patch changes the above buggy code piece in this way,
               if (!IS_ERR_OR_NULL(c->gc_thread))
                       kthread_stop(c->gc_thread);
      Signed-off-by: default avatarColy Li <colyli@suse.de>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      2d1169fe
    • Coly Li's avatar
      bcache: acquire bch_register_lock later in cached_dev_free() · 02d4c4ba
      Coly Li authored
      [ Upstream commit 80265d8d ]
      
      When enable lockdep engine, a lockdep warning can be observed when
      reboot or shutdown system,
      
      [ 3142.764557][    T1] bcache: bcache_reboot() Stopping all devices:
      [ 3142.776265][ T2649]
      [ 3142.777159][ T2649] ======================================================
      [ 3142.780039][ T2649] WARNING: possible circular locking dependency detected
      [ 3142.782869][ T2649] 5.2.0-rc4-lp151.20-default+ #1 Tainted: G        W
      [ 3142.785684][ T2649] ------------------------------------------------------
      [ 3142.788479][ T2649] kworker/3:67/2649 is trying to acquire lock:
      [ 3142.790738][ T2649] 00000000aaf02291 ((wq_completion)bcache_writeback_wq){+.+.}, at: flush_workqueue+0x87/0x4c0
      [ 3142.794678][ T2649]
      [ 3142.794678][ T2649] but task is already holding lock:
      [ 3142.797402][ T2649] 000000004fcf89c5 (&bch_register_lock){+.+.}, at: cached_dev_free+0x17/0x120 [bcache]
      [ 3142.801462][ T2649]
      [ 3142.801462][ T2649] which lock already depends on the new lock.
      [ 3142.801462][ T2649]
      [ 3142.805277][ T2649]
      [ 3142.805277][ T2649] the existing dependency chain (in reverse order) is:
      [ 3142.808902][ T2649]
      [ 3142.808902][ T2649] -> #2 (&bch_register_lock){+.+.}:
      [ 3142.812396][ T2649]        __mutex_lock+0x7a/0x9d0
      [ 3142.814184][ T2649]        cached_dev_free+0x17/0x120 [bcache]
      [ 3142.816415][ T2649]        process_one_work+0x2a4/0x640
      [ 3142.818413][ T2649]        worker_thread+0x39/0x3f0
      [ 3142.820276][ T2649]        kthread+0x125/0x140
      [ 3142.822061][ T2649]        ret_from_fork+0x3a/0x50
      [ 3142.823965][ T2649]
      [ 3142.823965][ T2649] -> #1 ((work_completion)(&cl->work)#2){+.+.}:
      [ 3142.827244][ T2649]        process_one_work+0x277/0x640
      [ 3142.829160][ T2649]        worker_thread+0x39/0x3f0
      [ 3142.830958][ T2649]        kthread+0x125/0x140
      [ 3142.832674][ T2649]        ret_from_fork+0x3a/0x50
      [ 3142.834915][ T2649]
      [ 3142.834915][ T2649] -> #0 ((wq_completion)bcache_writeback_wq){+.+.}:
      [ 3142.838121][ T2649]        lock_acquire+0xb4/0x1c0
      [ 3142.840025][ T2649]        flush_workqueue+0xae/0x4c0
      [ 3142.842035][ T2649]        drain_workqueue+0xa9/0x180
      [ 3142.844042][ T2649]        destroy_workqueue+0x17/0x250
      [ 3142.846142][ T2649]        cached_dev_free+0x52/0x120 [bcache]
      [ 3142.848530][ T2649]        process_one_work+0x2a4/0x640
      [ 3142.850663][ T2649]        worker_thread+0x39/0x3f0
      [ 3142.852464][ T2649]        kthread+0x125/0x140
      [ 3142.854106][ T2649]        ret_from_fork+0x3a/0x50
      [ 3142.855880][ T2649]
      [ 3142.855880][ T2649] other info that might help us debug this:
      [ 3142.855880][ T2649]
      [ 3142.859663][ T2649] Chain exists of:
      [ 3142.859663][ T2649]   (wq_completion)bcache_writeback_wq --> (work_completion)(&cl->work)#2 --> &bch_register_lock
      [ 3142.859663][ T2649]
      [ 3142.865424][ T2649]  Possible unsafe locking scenario:
      [ 3142.865424][ T2649]
      [ 3142.868022][ T2649]        CPU0                    CPU1
      [ 3142.869885][ T2649]        ----                    ----
      [ 3142.871751][ T2649]   lock(&bch_register_lock);
      [ 3142.873379][ T2649]                                lock((work_completion)(&cl->work)#2);
      [ 3142.876399][ T2649]                                lock(&bch_register_lock);
      [ 3142.879727][ T2649]   lock((wq_completion)bcache_writeback_wq);
      [ 3142.882064][ T2649]
      [ 3142.882064][ T2649]  *** DEADLOCK ***
      [ 3142.882064][ T2649]
      [ 3142.885060][ T2649] 3 locks held by kworker/3:67/2649:
      [ 3142.887245][ T2649]  #0: 00000000e774cdd0 ((wq_completion)events){+.+.}, at: process_one_work+0x21e/0x640
      [ 3142.890815][ T2649]  #1: 00000000f7df89da ((work_completion)(&cl->work)#2){+.+.}, at: process_one_work+0x21e/0x640
      [ 3142.894884][ T2649]  #2: 000000004fcf89c5 (&bch_register_lock){+.+.}, at: cached_dev_free+0x17/0x120 [bcache]
      [ 3142.898797][ T2649]
      [ 3142.898797][ T2649] stack backtrace:
      [ 3142.900961][ T2649] CPU: 3 PID: 2649 Comm: kworker/3:67 Tainted: G        W         5.2.0-rc4-lp151.20-default+ #1
      [ 3142.904789][ T2649] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/13/2018
      [ 3142.909168][ T2649] Workqueue: events cached_dev_free [bcache]
      [ 3142.911422][ T2649] Call Trace:
      [ 3142.912656][ T2649]  dump_stack+0x85/0xcb
      [ 3142.914181][ T2649]  print_circular_bug+0x19a/0x1f0
      [ 3142.916193][ T2649]  __lock_acquire+0x16cd/0x1850
      [ 3142.917936][ T2649]  ? __lock_acquire+0x6a8/0x1850
      [ 3142.919704][ T2649]  ? lock_acquire+0xb4/0x1c0
      [ 3142.921335][ T2649]  ? find_held_lock+0x34/0xa0
      [ 3142.923052][ T2649]  lock_acquire+0xb4/0x1c0
      [ 3142.924635][ T2649]  ? flush_workqueue+0x87/0x4c0
      [ 3142.926375][ T2649]  flush_workqueue+0xae/0x4c0
      [ 3142.928047][ T2649]  ? flush_workqueue+0x87/0x4c0
      [ 3142.929824][ T2649]  ? drain_workqueue+0xa9/0x180
      [ 3142.931686][ T2649]  drain_workqueue+0xa9/0x180
      [ 3142.933534][ T2649]  destroy_workqueue+0x17/0x250
      [ 3142.935787][ T2649]  cached_dev_free+0x52/0x120 [bcache]
      [ 3142.937795][ T2649]  process_one_work+0x2a4/0x640
      [ 3142.939803][ T2649]  worker_thread+0x39/0x3f0
      [ 3142.941487][ T2649]  ? process_one_work+0x640/0x640
      [ 3142.943389][ T2649]  kthread+0x125/0x140
      [ 3142.944894][ T2649]  ? kthread_create_worker_on_cpu+0x70/0x70
      [ 3142.947744][ T2649]  ret_from_fork+0x3a/0x50
      [ 3142.970358][ T2649] bcache: bcache_device_free() bcache0 stopped
      
      Here is how the deadlock happens.
      1) bcache_reboot() calls bcache_device_stop(), then inside
         bcache_device_stop() BCACHE_DEV_CLOSING bit is set on d->flags.
         Then closure_queue(&d->cl) is called to invoke cached_dev_flush().
      2) In cached_dev_flush(), cached_dev_free() is called by continu_at().
      3) In cached_dev_free(), when stopping the writeback kthread of the
         cached device by kthread_stop(), dc->writeback_thread will be waken
         up to quite the kthread while-loop, then cached_dev_put() is called
         in bch_writeback_thread().
      4) Calling cached_dev_put() in writeback kthread may drop dc->count to
         0, then dc->detach kworker is scheduled, which is initialized as
         cached_dev_detach_finish().
      5) Inside cached_dev_detach_finish(), the last line of code is to call
         closure_put(&dc->disk.cl), which drops the last reference counter of
         closrure dc->disk.cl, then the callback cached_dev_flush() gets
         called.
      Now cached_dev_flush() is called for second time in the code path, the
      first time is in step 2). And again bch_register_lock will be acquired
      again, and a A-A lock (lockdep terminology) is happening.
      
      The root cause of the above A-A lock is in cached_dev_free(), mutex
      bch_register_lock is held before stopping writeback kthread and other
      kworkers. Fortunately now we have variable 'bcache_is_reboot', which may
      prevent device registration or unregistration during reboot/shutdown
      time, so it is unncessary to hold bch_register_lock such early now.
      
      This is how this patch fixes the reboot/shutdown time A-A lock issue:
      After moving mutex_lock(&bch_register_lock) to a later location where
      before atomic_read(&dc->running) in cached_dev_free(), such A-A lock
      problem can be solved without any reboot time registration race.
      Signed-off-by: default avatarColy Li <colyli@suse.de>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      02d4c4ba
    • Coly Li's avatar
      bcache: check CACHE_SET_IO_DISABLE bit in bch_journal() · 7afcee10
      Coly Li authored
      [ Upstream commit 383ff218 ]
      
      When too many I/O errors happen on cache set and CACHE_SET_IO_DISABLE
      bit is set, bch_journal() may continue to work because the journaling
      bkey might be still in write set yet. The caller of bch_journal() may
      believe the journal still work but the truth is in-memory journal write
      set won't be written into cache device any more. This behavior may
      introduce potential inconsistent metadata status.
      
      This patch checks CACHE_SET_IO_DISABLE bit at the head of bch_journal(),
      if the bit is set, bch_journal() returns NULL immediately to notice
      caller to know journal does not work.
      Signed-off-by: default avatarColy Li <colyli@suse.de>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      7afcee10
    • Coly Li's avatar
      bcache: check CACHE_SET_IO_DISABLE in allocator code · 2e99386f
      Coly Li authored
      [ Upstream commit e775339e ]
      
      If CACHE_SET_IO_DISABLE of a cache set flag is set by too many I/O
      errors, currently allocator routines can still continue allocate
      space which may introduce inconsistent metadata state.
      
      This patch checkes CACHE_SET_IO_DISABLE bit in following allocator
      routines,
      - bch_bucket_alloc()
      - __bch_bucket_alloc_set()
      Once CACHE_SET_IO_DISABLE is set on cache set, the allocator routines
      may reject allocation request earlier to avoid potential inconsistent
      metadata.
      Signed-off-by: default avatarColy Li <colyli@suse.de>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      2e99386f
    • Eiichi Tsukata's avatar
      EDAC: Fix global-out-of-bounds write when setting edac_mc_poll_msec · 6a512abe
      Eiichi Tsukata authored
      [ Upstream commit d8655e76 ]
      
      Commit 9da21b15 ("EDAC: Poll timeout cannot be zero, p2") assumes
      edac_mc_poll_msec to be unsigned long, but the type of the variable still
      remained as int. Setting edac_mc_poll_msec can trigger out-of-bounds
      write.
      
      Reproducer:
      
        # echo 1001 > /sys/module/edac_core/parameters/edac_mc_poll_msec
      
      KASAN report:
      
        BUG: KASAN: global-out-of-bounds in edac_set_poll_msec+0x140/0x150
        Write of size 8 at addr ffffffffb91b2d00 by task bash/1996
      
        CPU: 1 PID: 1996 Comm: bash Not tainted 5.2.0-rc6+ #23
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-2.fc30 04/01/2014
        Call Trace:
         dump_stack+0xca/0x13e
         print_address_description.cold+0x5/0x246
         __kasan_report.cold+0x75/0x9a
         ? edac_set_poll_msec+0x140/0x150
         kasan_report+0xe/0x20
         edac_set_poll_msec+0x140/0x150
         ? dimmdev_location_show+0x30/0x30
         ? vfs_lock_file+0xe0/0xe0
         ? _raw_spin_lock+0x87/0xe0
         param_attr_store+0x1b5/0x310
         ? param_array_set+0x4f0/0x4f0
         module_attr_store+0x58/0x80
         ? module_attr_show+0x80/0x80
         sysfs_kf_write+0x13d/0x1a0
         kernfs_fop_write+0x2bc/0x460
         ? sysfs_kf_bin_read+0x270/0x270
         ? kernfs_notify+0x1f0/0x1f0
         __vfs_write+0x81/0x100
         vfs_write+0x1e1/0x560
         ksys_write+0x126/0x250
         ? __ia32_sys_read+0xb0/0xb0
         ? do_syscall_64+0x1f/0x390
         do_syscall_64+0xc1/0x390
         entry_SYSCALL_64_after_hwframe+0x49/0xbe
        RIP: 0033:0x7fa7caa5e970
        Code: 73 01 c3 48 8b 0d 28 d5 2b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 0f 1f 44 00 00 83 3d 99 2d 2c 00 00 75 10 b8 01 00 00 00 04
        RSP: 002b:00007fff6acfdfe8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
        RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 00007fa7caa5e970
        RDX: 0000000000000005 RSI: 0000000000e95c08 RDI: 0000000000000001
        RBP: 0000000000e95c08 R08: 00007fa7cad1e760 R09: 00007fa7cb36a700
        R10: 0000000000000073 R11: 0000000000000246 R12: 0000000000000005
        R13: 0000000000000001 R14: 00007fa7cad1d600 R15: 0000000000000005
      
        The buggy address belongs to the variable:
         edac_mc_poll_msec+0x0/0x40
      
        Memory state around the buggy address:
         ffffffffb91b2c00: 00 00 00 00 fa fa fa fa 00 00 00 00 fa fa fa fa
         ffffffffb91b2c80: 00 00 00 00 fa fa fa fa 00 00 00 00 fa fa fa fa
        >ffffffffb91b2d00: 04 fa fa fa fa fa fa fa 04 fa fa fa fa fa fa fa
                           ^
         ffffffffb91b2d80: 04 fa fa fa fa fa fa fa 00 00 00 00 00 00 00 00
         ffffffffb91b2e00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      
      Fix it by changing the type of edac_mc_poll_msec to unsigned int.
      The reason why this patch adopts unsigned int rather than unsigned long
      is msecs_to_jiffies() assumes arg to be unsigned int. We can avoid
      integer conversion bugs and unsigned int will be large enough for
      edac_mc_poll_msec.
      Reviewed-by: default avatarJames Morse <james.morse@arm.com>
      Fixes: 9da21b15 ("EDAC: Poll timeout cannot be zero, p2")
      Signed-off-by: default avatarEiichi Tsukata <devel@etsukata.com>
      Signed-off-by: default avatarTony Luck <tony.luck@intel.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      6a512abe
    • Ahmad Masri's avatar
      wil6210: drop old event after wmi_call timeout · 12058bfa
      Ahmad Masri authored
      [ Upstream commit 1a276003 ]
      
      This change fixes a rare race condition of handling WMI events after
      wmi_call expires.
      
      wmi_recv_cmd immediately handles an event when reply_buf is defined and
      a wmi_call is waiting for the event.
      However, in case the wmi_call has already timed-out, there will be no
      waiting/running wmi_call and the event will be queued in WMI queue and
      will be handled later in wmi_event_handle.
      Meanwhile, a new similar wmi_call for the same command and event may
      be issued. In this case, when handling the queued event we got WARN_ON
      printed.
      
      Fixing this case as a valid timeout and drop the unexpected event.
      Signed-off-by: default avatarAhmad Masri <amasri@codeaurora.org>
      Signed-off-by: default avatarMaya Erez <merez@codeaurora.org>
      Signed-off-by: default avatarKalle Valo <kvalo@codeaurora.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      12058bfa
    • Zefir Kurtisi's avatar
      ath9k: correctly handle short radar pulses · 9ef7f897
      Zefir Kurtisi authored
      [ Upstream commit df5c4150 ]
      
      In commit 3c0efb74 ("ath9k: discard undersized packets")
      the lower bound of RX packets was set to 10 (min ACK size) to
      filter those that would otherwise be treated as invalid at
      mac80211.
      
      Alas, short radar pulses are reported as PHY_ERROR frames
      with length set to 3. Therefore their detection stopped
      working after that commit.
      
      NOTE: ath9k drivers built thereafter will not pass DFS
      certification.
      
      This extends the criteria for short packets to explicitly
      handle PHY_ERROR frames.
      
      Fixes: 3c0efb74 ("ath9k: discard undersized packets")
      Signed-off-by: default avatarZefir Kurtisi <zefir.kurtisi@neratec.com>
      Signed-off-by: default avatarKalle Valo <kvalo@codeaurora.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      9ef7f897
    • Arnd Bergmann's avatar
      crypto: asymmetric_keys - select CRYPTO_HASH where needed · 58093426
      Arnd Bergmann authored
      [ Upstream commit 90acc065 ]
      
      Build testing with some core crypto options disabled revealed
      a few modules that are missing CRYPTO_HASH:
      
      crypto/asymmetric_keys/x509_public_key.o: In function `x509_get_sig_params':
      x509_public_key.c:(.text+0x4c7): undefined reference to `crypto_alloc_shash'
      x509_public_key.c:(.text+0x5e5): undefined reference to `crypto_shash_digest'
      crypto/asymmetric_keys/pkcs7_verify.o: In function `pkcs7_digest.isra.0':
      pkcs7_verify.c:(.text+0xab): undefined reference to `crypto_alloc_shash'
      pkcs7_verify.c:(.text+0x1b2): undefined reference to `crypto_shash_digest'
      pkcs7_verify.c:(.text+0x3c1): undefined reference to `crypto_shash_update'
      pkcs7_verify.c:(.text+0x411): undefined reference to `crypto_shash_finup'
      
      This normally doesn't show up in randconfig tests because there is
      a large number of other options that select CRYPTO_HASH.
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      58093426
    • Arnd Bergmann's avatar
      crypto: serpent - mark __serpent_setkey_sbox noinline · 34bfba9d
      Arnd Bergmann authored
      [ Upstream commit 47397118 ]
      
      The same bug that gcc hit in the past is apparently now showing
      up with clang, which decides to inline __serpent_setkey_sbox:
      
      crypto/serpent_generic.c:268:5: error: stack frame size of 2112 bytes in function '__serpent_setkey' [-Werror,-Wframe-larger-than=]
      
      Marking it 'noinline' reduces the stack usage from 2112 bytes to
      192 and 96 bytes, respectively, and seems to generate more
      useful object code.
      
      Fixes: c871c10e ("crypto: serpent - improve __serpent_setkey with UBSAN")
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Reviewed-by: default avatarEric Biggers <ebiggers@kernel.org>
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      34bfba9d
    • Mauro S. M. Rodrigues's avatar
      ixgbe: Check DDM existence in transceiver before access · 53b3ba90
      Mauro S. M. Rodrigues authored
      [ Upstream commit 655c9141 ]
      
      Some transceivers may comply with SFF-8472 but not implement the Digital
      Diagnostic Monitoring (DDM) interface described in it. The existence of
      such area is specified by bit 6 of byte 92, set to 1 if implemented.
      
      Currently, due to not checking this bit ixgbe fails trying to read SFP
      module's eeprom with the follow message:
      
      ethtool -m enP51p1s0f0
      Cannot get Module EEPROM data: Input/output error
      
      Because it fails to read the additional 256 bytes in which it was assumed
      to exist the DDM data.
      
      This issue was noticed using a Mellanox Passive DAC PN 01FT738. The eeprom
      data was confirmed by Mellanox as correct and present in other Passive
      DACs in from other manufacturers.
      Signed-off-by: default avatar"Mauro S. M. Rodrigues" <maurosr@linux.vnet.ibm.com>
      Reviewed-by: default avatarJesse Brandeburg <jesse.brandeburg@intel.com>
      Tested-by: default avatarAndrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: default avatarJeff Kirsher <jeffrey.t.kirsher@intel.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      53b3ba90
    • Jianbo Liu's avatar
      net/mlx5: Get vport ACL namespace by vport index · e00660b6
      Jianbo Liu authored
      [ Upstream commit f53297d6 ]
      
      The ingress and egress ACL root namespaces are created per vport and
      stored into arrays. However, the vport number is not the same as the
      index. Passing the array index, instead of vport number, to get the
      correct ingress and egress acl namespace.
      
      Fixes: 9b93ab98 ("net/mlx5: Separate ingress/egress namespaces for each vport")
      Signed-off-by: default avatarJianbo Liu <jianbol@mellanox.com>
      Reviewed-by: default avatarOz Shlomo <ozsh@mellanox.com>
      Reviewed-by: default avatarEli Britstein <elibr@mellanox.com>
      Reviewed-by: default avatarRoi Dayan <roid@mellanox.com>
      Reviewed-by: default avatarMark Bloch <markb@mellanox.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      e00660b6
    • Waibel Georg's avatar
      gpio: Fix return value mismatch of function gpiod_get_from_of_node() · c0cb5eab
      Waibel Georg authored
      [ Upstream commit 025bf377 ]
      
      In case the requested gpio property is not found in the device tree, some
      callers of gpiod_get_from_of_node() expect a return value of NULL, others
      expect -ENOENT.
      In particular devm_fwnode_get_index_gpiod_from_child() expects -ENOENT.
      Currently it gets a NULL, which breaks the loop that tries all
      gpio_suffixes. The result is that a gpio property is not found, even
      though it is there.
      
      This patch changes gpiod_get_from_of_node() to return -ENOENT instead
      of NULL when the requested gpio property is not found in the device
      tree. Additionally it modifies all calling functions to properly
      evaluate the return value.
      
      Another approach would be to leave the return value of
      gpiod_get_from_of_node() as is and fix the bug in
      devm_fwnode_get_index_gpiod_from_child(). Other callers would still need
      to be reworked. The effort would be the same as with the chosen solution.
      Signed-off-by: default avatarGeorg Waibel <georg.waibel@sensor-technik.de>
      Reviewed-by: default avatarKrzysztof Kozlowski <krzk@kernel.org>
      Reviewed-by: default avatarLinus Walleij <linus.walleij@linaro.org>
      Signed-off-by: default avatarMark Brown <broonie@kernel.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      c0cb5eab
    • Ferdinand Blomqvist's avatar
      rslib: Fix handling of of caller provided syndrome · 8c1afd1c
      Ferdinand Blomqvist authored
      [ Upstream commit ef4d6a85 ]
      
      Check if the syndrome provided by the caller is zero, and act
      accordingly.
      Signed-off-by: default avatarFerdinand Blomqvist <ferdinand.blomqvist@gmail.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Link: https://lkml.kernel.org/r/20190620141039.9874-6-ferdinand.blomqvist@gmail.comSigned-off-by: default avatarSasha Levin <sashal@kernel.org>
      8c1afd1c
    • Jiong Wang's avatar
      bpf: fix BPF_ALU32 | BPF_ARSH on BE arches · 66397927
      Jiong Wang authored
      [ Upstream commit 75672dda ]
      
      Yauheni reported the following code do not work correctly on BE arches:
      
             ALU_ARSH_X:
                     DST = (u64) (u32) ((*(s32 *) &DST) >> SRC);
                     CONT;
             ALU_ARSH_K:
                     DST = (u64) (u32) ((*(s32 *) &DST) >> IMM);
                     CONT;
      
      and are causing failure of test_verifier test 'arsh32 on imm 2' on BE
      arches.
      
      The code is taking address and interpreting memory directly, so is not
      endianness neutral. We should instead perform standard C type casting on
      the variable. A u64 to s32 conversion will drop the high 32-bit and reserve
      the low 32-bit as signed integer, this is all we want.
      
      Fixes: 2dc6b100 ("bpf: interpreter support BPF_ALU | BPF_ARSH")
      Reported-by: default avatarYauheni Kaliuta <yauheni.kaliuta@redhat.com>
      Reviewed-by: default avatarJakub Kicinski <jakub.kicinski@netronome.com>
      Reviewed-by: default avatarQuentin Monnet <quentin.monnet@netronome.com>
      Signed-off-by: default avatarJiong Wang <jiong.wang@netronome.com>
      Acked-by: default avatarSong Liu <songliubraving@fb.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      66397927
    • Ferdinand Blomqvist's avatar
      rslib: Fix decoding of shortened codes · 27cb2288
      Ferdinand Blomqvist authored
      [ Upstream commit 2034a42d ]
      
      The decoding of shortenend codes is broken. It only works as expected if
      there are no erasures.
      
      When decoding with erasures, Lambda (the error and erasure locator
      polynomial) is initialized from the given erasure positions. The pad
      parameter is not accounted for by the initialisation code, and hence
      Lambda is initialized from incorrect erasure positions.
      
      The fix is to adjust the erasure positions by the supplied pad.
      Signed-off-by: default avatarFerdinand Blomqvist <ferdinand.blomqvist@gmail.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Link: https://lkml.kernel.org/r/20190620141039.9874-3-ferdinand.blomqvist@gmail.comSigned-off-by: default avatarSasha Levin <sashal@kernel.org>
      27cb2288
    • Nathan Chancellor's avatar
      xsk: Properly terminate assignment in xskq_produce_flush_desc · eeb491bd
      Nathan Chancellor authored
      [ Upstream commit f7019b7b ]
      
      Clang warns:
      
      In file included from net/xdp/xsk_queue.c:10:
      net/xdp/xsk_queue.h:292:2: warning: expression result unused
      [-Wunused-value]
              WRITE_ONCE(q->ring->producer, q->prod_tail);
              ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      include/linux/compiler.h:284:6: note: expanded from macro 'WRITE_ONCE'
              __u.__val;                                      \
              ~~~ ^~~~~
      1 warning generated.
      
      The q->prod_tail assignment has a comma at the end, not a semi-colon.
      Fix that so clang no longer warns and everything works as expected.
      
      Fixes: c497176c ("xsk: add Rx receive functions and poll support")
      Link: https://github.com/ClangBuiltLinux/linux/issues/544Signed-off-by: default avatarNathan Chancellor <natechancellor@gmail.com>
      Acked-by: default avatarNick Desaulniers <ndesaulniers@google.com>
      Acked-by: default avatarJonathan Lemon <jonathan.lemon@gmail.com>
      Acked-by: default avatarBjörn Töpel <bjorn.topel@intel.com>
      Acked-by: default avatarSong Liu <songliubraving@fb.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      eeb491bd
    • Felix Kaechele's avatar
      netfilter: ctnetlink: Fix regression in conntrack entry deletion · c0e3f909
      Felix Kaechele authored
      [ Upstream commit e7600865 ]
      
      Commit f8e60898 ("netfilter: ctnetlink: Resolve conntrack
      L3-protocol flush regression") introduced a regression in which deletion
      of conntrack entries would fail because the L3 protocol information
      is replaced by AF_UNSPEC. As a result the search for the entry to be
      deleted would turn up empty due to the tuple used to perform the search
      is now different from the tuple used to initially set up the entry.
      
      For flushing the conntrack table we do however want to keep the option
      for nfgenmsg->version to have a non-zero value to allow for newer
      user-space tools to request treatment under the new behavior. With that
      it is possible to independently flush tables for a defined L3 protocol.
      This was introduced with the enhancements in in commit 59c08c69
      ("netfilter: ctnetlink: Support L3 protocol-filter on flush").
      
      Older user-space tools will retain the behavior of flushing all tables
      regardless of defined L3 protocol.
      
      Fixes: f8e60898 ("netfilter: ctnetlink: Resolve conntrack L3-protocol flush regression")
      Suggested-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: default avatarFelix Kaechele <felix@kaechele.ca>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      c0e3f909
    • Marek Szyprowski's avatar
      clocksource/drivers/exynos_mct: Increase priority over ARM arch timer · f7542bae
      Marek Szyprowski authored
      [ Upstream commit 6282edb7 ]
      
      Exynos SoCs based on CA7/CA15 have 2 timer interfaces: custom Exynos MCT
      (Multi Core Timer) and standard ARM Architected Timers.
      
      There are use cases, where both timer interfaces are used simultanously.
      One of such examples is using Exynos MCT for the main system timer and
      ARM Architected Timers for the KVM and virtualized guests (KVM requires
      arch timers).
      
      Exynos Multi-Core Timer driver (exynos_mct) must be however started
      before ARM Architected Timers (arch_timer), because they both share some
      common hardware blocks (global system counter) and turning on MCT is
      needed to get ARM Architected Timer working properly.
      
      To ensure selecting Exynos MCT as the main system timer, increase MCT
      timer rating. To ensure proper starting order of both timers during
      suspend/resume cycle, increase MCT hotplug priority over ARM Archictected
      Timers.
      Signed-off-by: default avatarMarek Szyprowski <m.szyprowski@samsung.com>
      Reviewed-by: default avatarKrzysztof Kozlowski <krzk@kernel.org>
      Reviewed-by: default avatarChanwoo Choi <cw00.choi@samsung.com>
      Signed-off-by: default avatarDaniel Lezcano <daniel.lezcano@linaro.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      f7542bae
    • Dmitry Osipenko's avatar
      clocksource/drivers/tegra: Restore base address before cleanup · 34316ace
      Dmitry Osipenko authored
      [ Upstream commit fc9babc2 ]
      
      We're adjusting the timer's base for each per-CPU timer to point to the
      actual start of the timer since device-tree defines a compound registers
      range that includes all of the timers. In this case the original base
      need to be restore before calling iounmap to unmap the proper address.
      Signed-off-by: default avatarDmitry Osipenko <digetx@gmail.com>
      Acked-by: default avatarJon Hunter <jonathanh@nvidia.com>
      Acked-by: default avatarThierry Reding <treding@nvidia.com>
      Signed-off-by: default avatarDaniel Lezcano <daniel.lezcano@linaro.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      34316ace
    • Tejun Heo's avatar
      libata: don't request sense data on !ZAC ATA devices · c6edf12f
      Tejun Heo authored
      [ Upstream commit ca156e00 ]
      
      ZAC support added sense data requesting on error for both ZAC and ATA
      devices. This seems to cause erratic error handling behaviors on some
      SSDs where the device reports sense data availability and then
      delivers the wrong content making EH take the wrong actions.  The
      failure mode was sporadic on a LITE-ON ssd and couldn't be reliably
      reproduced.
      
      There is no value in requesting sense data from non-ZAC ATA devices
      while there's a significant risk of introducing EH misbehaviors which
      are difficult to reproduce and fix.  Let's do the sense data dancing
      only for ZAC devices.
      Reviewed-by: default avatarHannes Reinecke <hare@suse.com>
      Tested-by: default avatarMasato Suzuki <masato.suzuki@wdc.com>
      Reviewed-by: default avatarDamien Le Moal <damien.lemoal@wdc.com>
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      c6edf12f
    • Dmitry Osipenko's avatar
      clocksource/drivers/tegra: Release all IRQ's on request_irq() error · 9e94e997
      Dmitry Osipenko authored
      [ Upstream commit 7a391670 ]
      
      Release all requested IRQ's on the request error to properly clean up
      allocated resources.
      Signed-off-by: default avatarDmitry Osipenko <digetx@gmail.com>
      Acked-By: default avatarPeter De Schrijver <pdeschrijver@nvidia.com>
      Signed-off-by: default avatarDaniel Lezcano <daniel.lezcano@linaro.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      9e94e997
    • Amadeusz Sławiński's avatar
      ASoC: Intel: hdac_hdmi: Set ops to NULL on remove · 0e47a834
      Amadeusz Sławiński authored
      [ Upstream commit 0f6ff785 ]
      
      When we unload Skylake driver we may end up calling
      hdac_component_master_unbind(), it uses acomp->audio_ops, which we set
      in hdmi_codec_probe(), so we need to set it to NULL in hdmi_codec_remove(),
      otherwise we will dereference no longer existing pointer.
      Signed-off-by: default avatarAmadeusz Sławiński <amadeuszx.slawinski@linux.intel.com>
      Reviewed-by: default avatarPierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
      Signed-off-by: default avatarMark Brown <broonie@kernel.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      0e47a834
    • Kyle Meyer's avatar
      perf tools: Increase MAX_NR_CPUS and MAX_CACHES · ad1f4986
      Kyle Meyer authored
      [ Upstream commit 9f94c7f9 ]
      
      Attempting to profile 1024 or more CPUs with perf causes two errors:
      
        perf record -a
        [ perf record: Woken up X times to write data ]
        way too many cpu caches..
        [ perf record: Captured and wrote X MB perf.data (X samples) ]
      
        perf report -C 1024
        Error: failed to set  cpu bitmap
        Requested CPU 1024 too large. Consider raising MAX_NR_CPUS
      
        Increasing MAX_NR_CPUS from 1024 to 2048 and redefining MAX_CACHES as
        MAX_NR_CPUS * 4 returns normal functionality to perf:
      
        perf record -a
        [ perf record: Woken up X times to write data ]
        [ perf record: Captured and wrote X MB perf.data (X samples) ]
      
        perf report -C 1024
        ...
      Signed-off-by: default avatarKyle Meyer <kyle.meyer@hpe.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20190620193630.154025-1-meyerk@stormcage.eag.rdlabs.hpecorp.netSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      ad1f4986
    • Miaoqing Pan's avatar
      ath10k: fix PCIE device wake up failed · 81124baa
      Miaoqing Pan authored
      [ Upstream commit 011d4111 ]
      
      Observed PCIE device wake up failed after ~120 iterations of
      soft-reboot test. The error message is
      "ath10k_pci 0000:01:00.0: failed to wake up device : -110"
      
      The call trace as below:
      ath10k_pci_probe -> ath10k_pci_force_wake -> ath10k_pci_wake_wait ->
      ath10k_pci_is_awake
      
      Once trigger the device to wake up, we will continuously check the RTC
      state until it returns RTC_STATE_V_ON or timeout.
      
      But for QCA99x0 chips, we use wrong value for RTC_STATE_V_ON.
      Occasionally, we get 0x7 on the fist read, we thought as a failure
      case, but actually is the right value, also verified with the spec.
      So fix the issue by changing RTC_STATE_V_ON from 0x5 to 0x7, passed
      ~2000 iterations.
      
      Tested HW: QCA9984
      Signed-off-by: default avatarMiaoqing Pan <miaoqing@codeaurora.org>
      Signed-off-by: default avatarKalle Valo <kvalo@codeaurora.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      81124baa
    • Miaoqing Pan's avatar
      ath10k: fix fw crash by moving chip reset after napi disabled · 080c204b
      Miaoqing Pan authored
      [ Upstream commit 08d80e4c ]
      
      On SMP platform, when continuously running wifi up/down, the napi
      poll can be scheduled during chip reset, which will call
      ath10k_pci_has_fw_crashed() to check the fw status. But in the reset
      period, the value from FW_INDICATOR_ADDRESS register will return
      0xdeadbeef, which also be treated as fw crash. Fix the issue by
      moving chip reset after napi disabled.
      
      ath10k_pci 0000:01:00.0: firmware crashed! (guid 73b30611-5b1e-4bdd-90b4-64c81eb947b6)
      ath10k_pci 0000:01:00.0: qca9984/qca9994 hw1.0 target 0x01000000 chip_id 0x00000000 sub 168c:cafe
      ath10k_pci 0000:01:00.0: htt-ver 2.2 wmi-op 6 htt-op 4 cal otp max-sta 512 raw 0 hwcrypto 1
      ath10k_pci 0000:01:00.0: failed to get memcpy hi address for firmware address 4: -16
      ath10k_pci 0000:01:00.0: failed to read firmware dump area: -16
      ath10k_pci 0000:01:00.0: Copy Engine register dump:
      ath10k_pci 0000:01:00.0: [00]: 0x0004a000   0   0   0   0
      ath10k_pci 0000:01:00.0: [01]: 0x0004a400   0   0   0   0
      ath10k_pci 0000:01:00.0: [02]: 0x0004a800   0   0   0   0
      ath10k_pci 0000:01:00.0: [03]: 0x0004ac00   0   0   0   0
      ath10k_pci 0000:01:00.0: [04]: 0x0004b000   0   0   0   0
      ath10k_pci 0000:01:00.0: [05]: 0x0004b400   0   0   0   0
      ath10k_pci 0000:01:00.0: [06]: 0x0004b800   0   0   0   0
      ath10k_pci 0000:01:00.0: [07]: 0x0004bc00   1   0   1   0
      ath10k_pci 0000:01:00.0: [08]: 0x0004c000   0   0   0   0
      ath10k_pci 0000:01:00.0: [09]: 0x0004c400   0   0   0   0
      ath10k_pci 0000:01:00.0: [10]: 0x0004c800   0   0   0   0
      ath10k_pci 0000:01:00.0: [11]: 0x0004cc00   0   0   0   0
      
      Tested HW: QCA9984,QCA9887,WCN3990
      Signed-off-by: default avatarMiaoqing Pan <miaoqing@codeaurora.org>
      Signed-off-by: default avatarKalle Valo <kvalo@codeaurora.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      080c204b
    • Claire Chang's avatar
      ath10k: add missing error handling · 4e59c7c6
      Claire Chang authored
      [ Upstream commit 4b553f3c ]
      
      In function ath10k_sdio_mbox_rx_alloc() [sdio.c],
      ath10k_sdio_mbox_alloc_rx_pkt() is called without handling the error cases.
      This will make the driver think the allocation for skb is successful and
      try to access the skb. If we enable failslab, system will easily crash with
      NULL pointer dereferencing.
      
      Call trace of CONFIG_FAILSLAB:
      ath10k_sdio_irq_handler+0x570/0xa88 [ath10k_sdio]
      process_sdio_pending_irqs+0x4c/0x174
      sdio_run_irqs+0x3c/0x64
      sdio_irq_work+0x1c/0x28
      
      Fixes: d96db25d ("ath10k: add initial SDIO support")
      Signed-off-by: default avatarClaire Chang <tientzu@chromium.org>
      Reviewed-by: default avatarBrian Norris <briannorris@chromium.org>
      Signed-off-by: default avatarKalle Valo <kvalo@codeaurora.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      4e59c7c6
    • Julian Anastasov's avatar
      ipvs: fix tinfo memory leak in start_sync_thread · 477c2749
      Julian Anastasov authored
      [ Upstream commit 5db7c8b9 ]
      
      syzkaller reports for memory leak in start_sync_thread [1]
      
      As Eric points out, kthread may start and stop before the
      threadfn function is called, so there is no chance the
      data (tinfo in our case) to be released in thread.
      
      Fix this by releasing tinfo in the controlling code instead.
      
      [1]
      BUG: memory leak
      unreferenced object 0xffff8881206bf700 (size 32):
       comm "syz-executor761", pid 7268, jiffies 4294943441 (age 20.470s)
       hex dump (first 32 bytes):
         00 40 7c 09 81 88 ff ff 80 45 b8 21 81 88 ff ff  .@|......E.!....
         00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
       backtrace:
         [<0000000057619e23>] kmemleak_alloc_recursive include/linux/kmemleak.h:55 [inline]
         [<0000000057619e23>] slab_post_alloc_hook mm/slab.h:439 [inline]
         [<0000000057619e23>] slab_alloc mm/slab.c:3326 [inline]
         [<0000000057619e23>] kmem_cache_alloc_trace+0x13d/0x280 mm/slab.c:3553
         [<0000000086ce5479>] kmalloc include/linux/slab.h:547 [inline]
         [<0000000086ce5479>] start_sync_thread+0x5d2/0xe10 net/netfilter/ipvs/ip_vs_sync.c:1862
         [<000000001a9229cc>] do_ip_vs_set_ctl+0x4c5/0x780 net/netfilter/ipvs/ip_vs_ctl.c:2402
         [<00000000ece457c8>] nf_sockopt net/netfilter/nf_sockopt.c:106 [inline]
         [<00000000ece457c8>] nf_setsockopt+0x4c/0x80 net/netfilter/nf_sockopt.c:115
         [<00000000942f62d4>] ip_setsockopt net/ipv4/ip_sockglue.c:1258 [inline]
         [<00000000942f62d4>] ip_setsockopt+0x9b/0xb0 net/ipv4/ip_sockglue.c:1238
         [<00000000a56a8ffd>] udp_setsockopt+0x4e/0x90 net/ipv4/udp.c:2616
         [<00000000fa895401>] sock_common_setsockopt+0x38/0x50 net/core/sock.c:3130
         [<0000000095eef4cf>] __sys_setsockopt+0x98/0x120 net/socket.c:2078
         [<000000009747cf88>] __do_sys_setsockopt net/socket.c:2089 [inline]
         [<000000009747cf88>] __se_sys_setsockopt net/socket.c:2086 [inline]
         [<000000009747cf88>] __x64_sys_setsockopt+0x26/0x30 net/socket.c:2086
         [<00000000ded8ba80>] do_syscall_64+0x76/0x1a0 arch/x86/entry/common.c:301
         [<00000000893b4ac8>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Reported-by: syzbot+7e2e50c8adfccd2e5041@syzkaller.appspotmail.com
      Suggested-by: default avatarEric Biggers <ebiggers@kernel.org>
      Fixes: 998e7a76 ("ipvs: Use kthread_run() instead of doing a double-fork via kernel_thread()")
      Signed-off-by: default avatarJulian Anastasov <ja@ssi.bg>
      Acked-by: default avatarSimon Horman <horms@verge.net.au>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      477c2749
    • Lorenzo Bianconi's avatar
      mt7601u: fix possible memory leak when the device is disconnected · 5d099ded
      Lorenzo Bianconi authored
      [ Upstream commit 23377c20 ]
      
      When the device is disconnected while passing traffic it is possible
      to receive out of order urbs causing a memory leak since the skb linked
      to the current tx urb is not removed. Fix the issue deallocating the skb
      cleaning up the tx ring. Moreover this patch fixes the following kernel
      warning
      
      [   57.480771] usb 1-1: USB disconnect, device number 2
      [   57.483451] ------------[ cut here ]------------
      [   57.483462] TX urb mismatch
      [   57.483481] WARNING: CPU: 1 PID: 32 at drivers/net/wireless/mediatek/mt7601u/dma.c:245 mt7601u_complete_tx+0x165/00
      [   57.483483] Modules linked in:
      [   57.483496] CPU: 1 PID: 32 Comm: kworker/1:1 Not tainted 5.2.0-rc1+ #72
      [   57.483498] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.12.0-2.fc30 04/01/2014
      [   57.483502] Workqueue: usb_hub_wq hub_event
      [   57.483507] RIP: 0010:mt7601u_complete_tx+0x165/0x1e0
      [   57.483510] Code: 8b b5 10 04 00 00 8b 8d 14 04 00 00 eb 8b 80 3d b1 cb e1 00 00 75 9e 48 c7 c7 a4 ea 05 82 c6 05 f
      [   57.483513] RSP: 0000:ffffc900000a0d28 EFLAGS: 00010092
      [   57.483516] RAX: 000000000000000f RBX: ffff88802c0a62c0 RCX: ffffc900000a0c2c
      [   57.483518] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff810a8371
      [   57.483520] RBP: ffff88803ced6858 R08: 0000000000000000 R09: 0000000000000001
      [   57.483540] R10: 0000000000000002 R11: 0000000000000000 R12: 0000000000000046
      [   57.483542] R13: ffff88802c0a6c88 R14: ffff88803baab540 R15: ffff88803a0cc078
      [   57.483548] FS:  0000000000000000(0000) GS:ffff88803eb00000(0000) knlGS:0000000000000000
      [   57.483550] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   57.483552] CR2: 000055e7f6780100 CR3: 0000000028c86000 CR4: 00000000000006a0
      [   57.483554] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [   57.483556] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [   57.483559] Call Trace:
      [   57.483561]  <IRQ>
      [   57.483565]  __usb_hcd_giveback_urb+0x77/0xe0
      [   57.483570]  xhci_giveback_urb_in_irq.isra.0+0x8b/0x140
      [   57.483574]  handle_cmd_completion+0xf5b/0x12c0
      [   57.483577]  xhci_irq+0x1f6/0x1810
      [   57.483581]  ? lockdep_hardirqs_on+0x9e/0x180
      [   57.483584]  ? _raw_spin_unlock_irq+0x24/0x30
      [   57.483588]  __handle_irq_event_percpu+0x3a/0x260
      [   57.483592]  handle_irq_event_percpu+0x1c/0x60
      [   57.483595]  handle_irq_event+0x2f/0x4c
      [   57.483599]  handle_edge_irq+0x7e/0x1a0
      [   57.483603]  handle_irq+0x17/0x20
      [   57.483607]  do_IRQ+0x54/0x110
      [   57.483610]  common_interrupt+0xf/0xf
      [   57.483612]  </IRQ>
      Acked-by: default avatarJakub Kicinski <kubakici@wp.pl>
      Signed-off-by: default avatarLorenzo Bianconi <lorenzo@kernel.org>
      Signed-off-by: default avatarKalle Valo <kvalo@codeaurora.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      5d099ded
    • Masahiro Yamada's avatar
      x86/build: Add 'set -e' to mkcapflags.sh to delete broken capflags.c · 79e5cd36
      Masahiro Yamada authored
      [ Upstream commit bc53d3d7 ]
      
      Without 'set -e', shell scripts continue running even after any
      error occurs. The missed 'set -e' is a typical bug in shell scripting.
      
      For example, when a disk space shortage occurs while this script is
      running, it actually ends up with generating a truncated capflags.c.
      
      Yet, mkcapflags.sh continues running and exits with 0. So, the build
      system assumes it has succeeded.
      
      It will not be re-generated in the next invocation of Make since its
      timestamp is newer than that of any of the source files.
      
      Add 'set -e' so that any error in this script is caught and propagated
      to the build system.
      
      Since 9c2af1c7 ("kbuild: add .DELETE_ON_ERROR special target"),
      make automatically deletes the target on any failure. So, the broken
      capflags.c will be deleted automatically.
      Signed-off-by: default avatarMasahiro Yamada <yamada.masahiro@socionext.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Link: https://lkml.kernel.org/r/20190625072622.17679-1-yamada.masahiro@socionext.comSigned-off-by: default avatarSasha Levin <sashal@kernel.org>
      79e5cd36
    • Lorenzo Bianconi's avatar
      mt7601u: do not schedule rx_tasklet when the device has been disconnected · ce48f658
      Lorenzo Bianconi authored
      [ Upstream commit 4079e8cc ]
      
      Do not schedule rx_tasklet when the usb dongle is disconnected.
      Moreover do not grub rx_lock in mt7601u_kill_rx since usb_poison_urb
      can run concurrently with urb completion and we can unlink urbs from rx
      ring in any order.
      This patch fixes the common kernel warning reported when
      the device is removed.
      
      [   24.921354] usb 3-14: USB disconnect, device number 7
      [   24.921593] ------------[ cut here ]------------
      [   24.921594] RX urb mismatch
      [   24.921675] WARNING: CPU: 4 PID: 163 at drivers/net/wireless/mediatek/mt7601u/dma.c:200 mt7601u_complete_rx+0xcb/0xd0 [mt7601u]
      [   24.921769] CPU: 4 PID: 163 Comm: kworker/4:2 Tainted: G           OE     4.19.31-041931-generic #201903231635
      [   24.921770] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./Z97 Extreme4, BIOS P1.30 05/23/2014
      [   24.921782] Workqueue: usb_hub_wq hub_event
      [   24.921797] RIP: 0010:mt7601u_complete_rx+0xcb/0xd0 [mt7601u]
      [   24.921800] RSP: 0018:ffff9bd9cfd03d08 EFLAGS: 00010086
      [   24.921802] RAX: 0000000000000000 RBX: ffff9bd9bf043540 RCX: 0000000000000006
      [   24.921803] RDX: 0000000000000007 RSI: 0000000000000096 RDI: ffff9bd9cfd16420
      [   24.921804] RBP: ffff9bd9cfd03d28 R08: 0000000000000002 R09: 00000000000003a8
      [   24.921805] R10: 0000002f485fca34 R11: 0000000000000000 R12: ffff9bd9bf043c1c
      [   24.921806] R13: ffff9bd9c62fa3c0 R14: 0000000000000082 R15: 0000000000000000
      [   24.921807] FS:  0000000000000000(0000) GS:ffff9bd9cfd00000(0000) knlGS:0000000000000000
      [   24.921808] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   24.921808] CR2: 00007fb2648b0000 CR3: 0000000142c0a004 CR4: 00000000001606e0
      [   24.921809] Call Trace:
      [   24.921812]  <IRQ>
      [   24.921819]  __usb_hcd_giveback_urb+0x8b/0x140
      [   24.921821]  usb_hcd_giveback_urb+0xca/0xe0
      [   24.921828]  xhci_giveback_urb_in_irq.isra.42+0x82/0xf0
      [   24.921834]  handle_cmd_completion+0xe02/0x10d0
      [   24.921837]  xhci_irq+0x274/0x4a0
      [   24.921838]  xhci_msi_irq+0x11/0x20
      [   24.921851]  __handle_irq_event_percpu+0x44/0x190
      [   24.921856]  handle_irq_event_percpu+0x32/0x80
      [   24.921861]  handle_irq_event+0x3b/0x5a
      [   24.921867]  handle_edge_irq+0x80/0x190
      [   24.921874]  handle_irq+0x20/0x30
      [   24.921889]  do_IRQ+0x4e/0xe0
      [   24.921891]  common_interrupt+0xf/0xf
      [   24.921892]  </IRQ>
      [   24.921900] RIP: 0010:usb_hcd_flush_endpoint+0x78/0x180
      [   24.921354] usb 3-14: USB disconnect, device number 7
      Signed-off-by: default avatarLorenzo Bianconi <lorenzo@kernel.org>
      Signed-off-by: default avatarKalle Valo <kvalo@codeaurora.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      ce48f658
    • Ping-Ke Shih's avatar
      rtlwifi: rtl8192cu: fix error handle when usb probe failed · e6506579
      Ping-Ke Shih authored
      [ Upstream commit 6c0ed66f ]
      
      rtl_usb_probe() must do error handle rtl_deinit_core() only if
      rtl_init_core() is done, otherwise goto error_out2.
      
      | usb 1-1: New USB device strings: Mfr=0, Product=0, SerialNumber=0
      | rtl_usb: reg 0xf0, usbctrl_vendorreq TimeOut! status:0xffffffb9 value=0x0
      | rtl8192cu: Chip version 0x10
      | rtl_usb: reg 0xa, usbctrl_vendorreq TimeOut! status:0xffffffb9 value=0x0
      | rtl_usb: Too few input end points found
      | INFO: trying to register non-static key.
      | the code is fine but needs lockdep annotation.
      | turning off the locking correctness validator.
      | CPU: 0 PID: 12 Comm: kworker/0:1 Not tainted 5.1.0-rc4-319354-g9a33b36 #3
      | Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
      | Google 01/01/2011
      | Workqueue: usb_hub_wq hub_event
      | Call Trace:
      |   __dump_stack lib/dump_stack.c:77 [inline]
      |   dump_stack+0xe8/0x16e lib/dump_stack.c:113
      |   assign_lock_key kernel/locking/lockdep.c:786 [inline]
      |   register_lock_class+0x11b8/0x1250 kernel/locking/lockdep.c:1095
      |   __lock_acquire+0xfb/0x37c0 kernel/locking/lockdep.c:3582
      |   lock_acquire+0x10d/0x2f0 kernel/locking/lockdep.c:4211
      |   __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
      |   _raw_spin_lock_irqsave+0x44/0x60 kernel/locking/spinlock.c:152
      |   rtl_c2hcmd_launcher+0xd1/0x390
      | drivers/net/wireless/realtek/rtlwifi/base.c:2344
      |   rtl_deinit_core+0x25/0x2d0 drivers/net/wireless/realtek/rtlwifi/base.c:574
      |   rtl_usb_probe.cold+0x861/0xa70
      | drivers/net/wireless/realtek/rtlwifi/usb.c:1093
      |   usb_probe_interface+0x31d/0x820 drivers/usb/core/driver.c:361
      |   really_probe+0x2da/0xb10 drivers/base/dd.c:509
      |   driver_probe_device+0x21d/0x350 drivers/base/dd.c:671
      |   __device_attach_driver+0x1d8/0x290 drivers/base/dd.c:778
      |   bus_for_each_drv+0x163/0x1e0 drivers/base/bus.c:454
      |   __device_attach+0x223/0x3a0 drivers/base/dd.c:844
      |   bus_probe_device+0x1f1/0x2a0 drivers/base/bus.c:514
      |   device_add+0xad2/0x16e0 drivers/base/core.c:2106
      |   usb_set_configuration+0xdf7/0x1740 drivers/usb/core/message.c:2021
      |   generic_probe+0xa2/0xda drivers/usb/core/generic.c:210
      |   usb_probe_device+0xc0/0x150 drivers/usb/core/driver.c:266
      |   really_probe+0x2da/0xb10 drivers/base/dd.c:509
      |   driver_probe_device+0x21d/0x350 drivers/base/dd.c:671
      |   __device_attach_driver+0x1d8/0x290 drivers/base/dd.c:778
      |   bus_for_each_drv+0x163/0x1e0 drivers/base/bus.c:454
      |   __device_attach+0x223/0x3a0 drivers/base/dd.c:844
      |   bus_probe_device+0x1f1/0x2a0 drivers/base/bus.c:514
      |   device_add+0xad2/0x16e0 drivers/base/core.c:2106
      |   usb_new_device.cold+0x537/0xccf drivers/usb/core/hub.c:2534
      |   hub_port_connect drivers/usb/core/hub.c:5089 [inline]
      |   hub_port_connect_change drivers/usb/core/hub.c:5204 [inline]
      |   port_event drivers/usb/core/hub.c:5350 [inline]
      |   hub_event+0x138e/0x3b00 drivers/usb/core/hub.c:5432
      |   process_one_work+0x90f/0x1580 kernel/workqueue.c:2269
      |   worker_thread+0x9b/0xe20 kernel/workqueue.c:2415
      |   kthread+0x313/0x420 kernel/kthread.c:253
      |   ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:352
      
      Reported-by: syzbot+1fcc5ef45175fc774231@syzkaller.appspotmail.com
      Signed-off-by: default avatarPing-Ke Shih <pkshih@realtek.com>
      Acked-by: default avatarLarry Finger <Larry.Finger@lwfinger.net>
      Signed-off-by: default avatarKalle Valo <kvalo@codeaurora.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      e6506579
    • Icenowy Zheng's avatar
      net: stmmac: sun8i: force select external PHY when no internal one · dc654e84
      Icenowy Zheng authored
      [ Upstream commit 0fec7e72 ]
      
      The PHY selection bit also exists on SoCs without an internal PHY; if it's
      set to 1 (internal PHY, default value) then the MAC will not make use of
      any PHY on such SoCs.
      
      This problem appears when adapting for H6, which has no real internal PHY
      (the "internal PHY" on H6 is not on-die, but on a co-packaged AC200 chip,
      connected via RMII interface at GPIO bank A).
      
      Force the PHY selection bit to 0 when the SOC doesn't have an internal PHY,
      to address the problem of a wrong default value.
      Signed-off-by: default avatarIcenowy Zheng <icenowy@aosc.io>
      Signed-off-by: default avatarOndrej Jirman <megous@megous.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      dc654e84
    • Hans Verkuil's avatar
      media: hdpvr: fix locking and a missing msleep · 93ba4c72
      Hans Verkuil authored
      [ Upstream commit 6bc5a4a1 ]
      
      This driver has three locking issues:
      
      - The wait_event_interruptible() condition calls hdpvr_get_next_buffer(dev)
        which uses a mutex, which is not allowed. Rewrite with list_empty_careful()
        that doesn't need locking.
      
      - In hdpvr_read() the call to hdpvr_stop_streaming() didn't lock io_mutex,
        but it should have since stop_streaming expects that.
      
      - In hdpvr_device_release() io_mutex was locked when calling flush_work(),
        but there it shouldn't take that mutex since the work done by flush_work()
        also wants to lock that mutex.
      
      There are also two other changes (suggested by Keith):
      
      - msecs_to_jiffies(4000); (a NOP) should have been msleep(4000).
      - Change v4l2_dbg to v4l2_info to always log if streaming had to be restarted.
      Reported-by: default avatarKeith Pyle <kpyle@austin.rr.com>
      Suggested-by: default avatarKeith Pyle <kpyle@austin.rr.com>
      Signed-off-by: default avatarHans Verkuil <hverkuil-cisco@xs4all.nl>
      Signed-off-by: default avatarMauro Carvalho Chehab <mchehab+samsung@kernel.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      93ba4c72
    • André Almeida's avatar
      media: vimc: cap: check v4l2_fill_pixfmt return value · e116077a
      André Almeida authored
      [ Upstream commit 77ae46e1 ]
      
      v4l2_fill_pixfmt() returns -EINVAL if the pixelformat used as parameter is
      invalid or if the user is trying to use a multiplanar format with the
      singleplanar API. Currently, the vimc_cap_try_fmt_vid_cap() returns such
      value, but vimc_cap_s_fmt_vid_cap() is ignoring it. Fix that and returns
      an error value if vimc_cap_try_fmt_vid_cap() has failed.
      Signed-off-by: default avatarAndré Almeida <andrealmeid@collabora.com>
      Suggested-by: default avatarHelen Koike <helen.koike@collabora.com>
      Signed-off-by: default avatarHans Verkuil <hverkuil-cisco@xs4all.nl>
      Signed-off-by: default avatarMauro Carvalho Chehab <mchehab+samsung@kernel.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      e116077a
    • Philipp Zabel's avatar
      media: coda: increment sequence offset for the last returned frame · d0e10dee
      Philipp Zabel authored
      [ Upstream commit b3b7d968 ]
      
      If no more frames are decoded in bitstream end mode, and a previously
      decoded frame has been returned, the firmware still increments the frame
      number. To avoid a sequence number mismatch after decoder restart,
      increment the sequence_offset correction parameter.
      Signed-off-by: default avatarPhilipp Zabel <p.zabel@pengutronix.de>
      Signed-off-by: default avatarHans Verkuil <hverkuil-cisco@xs4all.nl>
      Signed-off-by: default avatarMauro Carvalho Chehab <mchehab+samsung@kernel.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      d0e10dee
    • Marco Felsch's avatar
      media: coda: fix last buffer handling in V4L2_ENC_CMD_STOP · cc1c3011
      Marco Felsch authored
      [ Upstream commit f3775f89 ]
      
      coda_encoder_cmd() is racy, as the last scheduled picture run worker can
      still be in-flight while the ENC_CMD_STOP command is issued. Depending
      on the exact timing the sequence numbers might already be changed, but
      the last buffer might not have been put on the destination queue yet.
      
      In this case the current implementation would prematurely wake the
      destination queue with last_buffer_dequeued=true, causing userspace to
      call streamoff before the last buffer is handled.
      
      Close this race window by synchronizing with the pic_run_worker before
      doing the sequence check.
      Signed-off-by: default avatarMarco Felsch <m.felsch@pengutronix.de>
      [l.stach@pengutronix.de: switch to flush_work, reword commit message]
      Signed-off-by: default avatarLucas Stach <l.stach@pengutronix.de>
      Signed-off-by: default avatarPhilipp Zabel <p.zabel@pengutronix.de>
      Signed-off-by: default avatarHans Verkuil <hverkuil-cisco@xs4all.nl>
      Signed-off-by: default avatarMauro Carvalho Chehab <mchehab+samsung@kernel.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      cc1c3011
    • Philipp Zabel's avatar
      media: coda: fix mpeg2 sequence number handling · b6a853dd
      Philipp Zabel authored
      [ Upstream commit 56d159a4 ]
      
      Sequence number handling assumed that the BIT processor frame number
      starts counting at 1, but this is not true for the MPEG-2 decoder,
      which starts at 0. Fix the sequence counter offset detection to handle
      this.
      Signed-off-by: default avatarPhilipp Zabel <p.zabel@pengutronix.de>
      Signed-off-by: default avatarHans Verkuil <hverkuil-cisco@xs4all.nl>
      Signed-off-by: default avatarMauro Carvalho Chehab <mchehab+samsung@kernel.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      b6a853dd
    • Ard Biesheuvel's avatar
      acpi/arm64: ignore 5.1 FADTs that are reported as 5.0 · 67d73cc8
      Ard Biesheuvel authored
      [ Upstream commit 2af22f3e ]
      
      Some Qualcomm Snapdragon based laptops built to run Microsoft Windows
      are clearly ACPI 5.1 based, given that that is the first ACPI revision
      that supports ARM, and introduced the FADT 'arm_boot_flags' field,
      which has a non-zero field on those systems.
      
      So in these cases, infer from the ARM boot flags that the FADT must be
      5.1 or later, and treat it as 5.1.
      Acked-by: default avatarSudeep Holla <sudeep.holla@arm.com>
      Tested-by: default avatarLee Jones <lee.jones@linaro.org>
      Reviewed-by: default avatarGraeme Gregory <graeme.gregory@linaro.org>
      Acked-by: default avatarLorenzo Pieralisi <lorenzo.pieralisi@arm.com>
      Acked-by: default avatarHanjun Guo <guohanjun@huawei.com>
      Signed-off-by: default avatarArd Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: default avatarCatalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      67d73cc8
    • Kuninori Morimoto's avatar
      ASoC: soc-core: call snd_soc_unbind_card() under mutex_lock; · 95783489
      Kuninori Morimoto authored
      [ Upstream commit b545542a ]
      
      commit 34ac3c3e ("ASoC: core: lock client_mutex while removing
      link components") added mutex_lock() at soc_remove_link_components().
      
      Is is called from snd_soc_unbind_card()
      
      	snd_soc_unbind_card()
      =>		soc_remove_link_components()
      		soc_cleanup_card_resources()
      			soc_remove_dai_links()
      =>				soc_remove_link_components()
      
      And, there are 2 way to call it.
      
      (1)
      	snd_soc_unregister_component()
      **		mutex_lock()
      			snd_soc_component_del_unlocked()
      =>				snd_soc_unbind_card()
      **		mutex_unlock()
      
      (2)
      	snd_soc_unregister_card()
      =>		snd_soc_unbind_card()
      
      (1) case is already using mutex_lock() when it calles
      snd_soc_unbind_card(), thus, we will get lockdep warning.
      
      commit 495f926c ("ASoC: core: Fix deadlock in
      snd_soc_instantiate_card()") tried to fixup it, but still not
      enough. We still have lockdep warning when we try unbind/bind.
      
      We need mutex_lock() under snd_soc_unregister_card()
      instead of snd_remove_link_components()/snd_soc_unbind_card().
      
      Fixes: 34ac3c3e ("ASoC: core: lock client_mutex while removing link components")
      Fixes: 495f926c ("ASoC: core: Fix deadlock in snd_soc_instantiate_card()")
      Signed-off-by: default avatarKuninori Morimoto <kuninori.morimoto.gx@renesas.com>
      Signed-off-by: default avatarMark Brown <broonie@kernel.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      95783489