1. 26 Jan, 2019 33 commits
    • Kevin Barnett's avatar
      scsi: smartpqi: correct lun reset issues · 1da52f24
      Kevin Barnett authored
      [ Upstream commit 2ba55c98 ]
      
      Problem:
      The Linux kernel takes a logical volume offline after a LUN reset.  This is
      generally accompanied by this message in the dmesg output:
      
      Device offlined - not ready after error recovery
      
      Root Cause:
      The root cause is a "quirk" in the timeout handling in the Linux SCSI
      layer. The Linux kernel places a 30-second timeout on most media access
      commands (reads and writes) that it send to device drivers.  When a media
      access command times out, the Linux kernel goes into error recovery mode
      for the LUN that was the target of the command that timed out. Every
      command that timed out is kept on a list inside of the Linux kernel to be
      retried later. The kernel attempts to recover the command(s) that timed out
      by issuing a LUN reset followed by a TEST UNIT READY. If the LUN reset and
      TEST UNIT READY commands are successful, the kernel retries the command(s)
      that timed out.
      
      Each SCSI command issued by the kernel has a result field associated with
      it. This field indicates the final result of the command (success or
      error). When a command times out, the kernel places a value in this result
      field indicating that the command timed out.
      
      The "quirk" is that after the LUN reset and TEST UNIT READY commands are
      completed, the kernel checks each command on the timed-out command list
      before retrying it. If the result field is still "timed out", the kernel
      treats that command as not having been successfully recovered for a
      retry. If the number of commands that are in this state are greater than
      two, the kernel takes the LUN offline.
      
      Fix:
      When our RAIDStack receives a LUN reset, it simply waits until all
      outstanding commands complete. Generally, all of these outstanding commands
      complete successfully. Therefore, the fix in the smartpqi driver is to
      always set the command result field to indicate success when a request
      completes successfully. This normally isn’t necessary because the result
      field is always initialized to success when the command is submitted to the
      driver. So when the command completes successfully, the result field is
      left untouched. But in this case, the kernel changes the result field
      behind the driver’s back and then expects the field to be changed by the
      driver as the commands that timed-out complete.
      Reviewed-by: default avatarDave Carroll <david.carroll@microsemi.com>
      Reviewed-by: default avatarScott Teel <scott.teel@microsemi.com>
      Signed-off-by: default avatarKevin Barnett <kevin.barnett@microsemi.com>
      Signed-off-by: default avatarDon Brace <don.brace@microsemi.com>
      Signed-off-by: default avatarMartin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      1da52f24
    • Daniel Vetter's avatar
      sysfs: Disable lockdep for driver bind/unbind files · 54e6f64e
      Daniel Vetter authored
      [ Upstream commit 4f4b3743 ]
      
      This is the much more correct fix for my earlier attempt at:
      
      https://lkml.org/lkml/2018/12/10/118
      
      Short recap:
      
      - There's not actually a locking issue, it's just lockdep being a bit
        too eager to complain about a possible deadlock.
      
      - Contrary to what I claimed the real problem is recursion on
        kn->count. Greg pointed me at sysfs_break_active_protection(), used
        by the scsi subsystem to allow a sysfs file to unbind itself. That
        would be a real deadlock, which isn't what's happening here. Also,
        breaking the active protection means we'd need to manually handle
        all the lifetime fun.
      
      - With Rafael we discussed the task_work approach, which kinda works,
        but has two downsides: It's a functional change for a lockdep
        annotation issue, and it won't work for the bind file (which needs
        to get the errno from the driver load function back to userspace).
      
      - Greg also asked why this never showed up: To hit this you need to
        unregister a 2nd driver from the unload code of your first driver. I
        guess only gpus do that. The bug has always been there, but only
        with a recent patch series did we add more locks so that lockdep
        built a chain from unbinding the snd-hda driver to the
        acpi_video_unregister call.
      
      Full lockdep splat:
      
      [12301.898799] ============================================
      [12301.898805] WARNING: possible recursive locking detected
      [12301.898811] 4.20.0-rc7+ #84 Not tainted
      [12301.898815] --------------------------------------------
      [12301.898821] bash/5297 is trying to acquire lock:
      [12301.898826] 00000000f61c6093 (kn->count#39){++++}, at: kernfs_remove_by_name_ns+0x3b/0x80
      [12301.898841] but task is already holding lock:
      [12301.898847] 000000005f634021 (kn->count#39){++++}, at: kernfs_fop_write+0xdc/0x190
      [12301.898856] other info that might help us debug this:
      [12301.898862]  Possible unsafe locking scenario:
      [12301.898867]        CPU0
      [12301.898870]        ----
      [12301.898874]   lock(kn->count#39);
      [12301.898879]   lock(kn->count#39);
      [12301.898883] *** DEADLOCK ***
      [12301.898891]  May be due to missing lock nesting notation
      [12301.898899] 5 locks held by bash/5297:
      [12301.898903]  #0: 00000000cd800e54 (sb_writers#4){.+.+}, at: vfs_write+0x17f/0x1b0
      [12301.898915]  #1: 000000000465e7c2 (&of->mutex){+.+.}, at: kernfs_fop_write+0xd3/0x190
      [12301.898925]  #2: 000000005f634021 (kn->count#39){++++}, at: kernfs_fop_write+0xdc/0x190
      [12301.898936]  #3: 00000000414ef7ac (&dev->mutex){....}, at: device_release_driver_internal+0x34/0x240
      [12301.898950]  #4: 000000003218fbdf (register_count_mutex){+.+.}, at: acpi_video_unregister+0xe/0x40
      [12301.898960] stack backtrace:
      [12301.898968] CPU: 1 PID: 5297 Comm: bash Not tainted 4.20.0-rc7+ #84
      [12301.898974] Hardware name: Hewlett-Packard HP EliteBook 8460p/161C, BIOS 68SCF Ver. F.01 03/11/2011
      [12301.898982] Call Trace:
      [12301.898989]  dump_stack+0x67/0x9b
      [12301.898997]  __lock_acquire+0x6ad/0x1410
      [12301.899003]  ? kernfs_remove_by_name_ns+0x3b/0x80
      [12301.899010]  ? find_held_lock+0x2d/0x90
      [12301.899017]  ? mutex_spin_on_owner+0xe4/0x150
      [12301.899023]  ? find_held_lock+0x2d/0x90
      [12301.899030]  ? lock_acquire+0x90/0x180
      [12301.899036]  lock_acquire+0x90/0x180
      [12301.899042]  ? kernfs_remove_by_name_ns+0x3b/0x80
      [12301.899049]  __kernfs_remove+0x296/0x310
      [12301.899055]  ? kernfs_remove_by_name_ns+0x3b/0x80
      [12301.899060]  ? kernfs_name_hash+0xd/0x80
      [12301.899066]  ? kernfs_find_ns+0x6c/0x100
      [12301.899073]  kernfs_remove_by_name_ns+0x3b/0x80
      [12301.899080]  bus_remove_driver+0x92/0xa0
      [12301.899085]  acpi_video_unregister+0x24/0x40
      [12301.899127]  i915_driver_unload+0x42/0x130 [i915]
      [12301.899160]  i915_pci_remove+0x19/0x30 [i915]
      [12301.899169]  pci_device_remove+0x36/0xb0
      [12301.899176]  device_release_driver_internal+0x185/0x240
      [12301.899183]  unbind_store+0xaf/0x180
      [12301.899189]  kernfs_fop_write+0x104/0x190
      [12301.899195]  __vfs_write+0x31/0x180
      [12301.899203]  ? rcu_read_lock_sched_held+0x6f/0x80
      [12301.899209]  ? rcu_sync_lockdep_assert+0x29/0x50
      [12301.899216]  ? __sb_start_write+0x13c/0x1a0
      [12301.899221]  ? vfs_write+0x17f/0x1b0
      [12301.899227]  vfs_write+0xb9/0x1b0
      [12301.899233]  ksys_write+0x50/0xc0
      [12301.899239]  do_syscall_64+0x4b/0x180
      [12301.899247]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [12301.899253] RIP: 0033:0x7f452ac7f7a4
      [12301.899259] Code: 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 80 00 00 00 00 8b 05 aa f0 2c 00 48 63 ff 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 f3 c3 66 90 55 53 48 89 d5 48 89 f3 48 83
      [12301.899273] RSP: 002b:00007ffceafa6918 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
      [12301.899282] RAX: ffffffffffffffda RBX: 000000000000000d RCX: 00007f452ac7f7a4
      [12301.899288] RDX: 000000000000000d RSI: 00005612a1abf7c0 RDI: 0000000000000001
      [12301.899295] RBP: 00005612a1abf7c0 R08: 000000000000000a R09: 00005612a1c46730
      [12301.899301] R10: 000000000000000a R11: 0000000000000246 R12: 000000000000000d
      [12301.899308] R13: 0000000000000001 R14: 00007f452af4a740 R15: 000000000000000d
      
      Looking around I've noticed that usb and i2c already handle similar
      recursion problems, where a sysfs file can unbind the same type of
      sysfs somewhere else in the hierarchy. Relevant commits are:
      
      commit 356c05d5
      Author: Alan Stern <stern@rowland.harvard.edu>
      Date:   Mon May 14 13:30:03 2012 -0400
      
          sysfs: get rid of some lockdep false positives
      
      commit e9b526fe
      Author: Alexander Sverdlin <alexander.sverdlin@nsn.com>
      Date:   Fri May 17 14:56:35 2013 +0200
      
          i2c: suppress lockdep warning on delete_device
      
      Implement the same trick for driver bind/unbind.
      
      v2: Put the macro into bus.c (Greg).
      Reviewed-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Cc: Ramalingam C <ramalingam.c@intel.com>
      Cc: Arend van Spriel <aspriel@gmail.com>
      Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Cc: Geert Uytterhoeven <geert+renesas@glider.be>
      Cc: Bartosz Golaszewski <brgl@bgdev.pl>
      Cc: Heikki Krogerus <heikki.krogerus@linux.intel.com>
      Cc: Vivek Gautam <vivek.gautam@codeaurora.org>
      Cc: Joe Perches <joe@perches.com>
      Signed-off-by: default avatarDaniel Vetter <daniel.vetter@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      54e6f64e
    • Takashi Sakamoto's avatar
      ALSA: bebob: fix model-id of unit for Apogee Ensemble · d9c8a99f
      Takashi Sakamoto authored
      [ Upstream commit 644b2e97 ]
      
      This commit fixes hard-coded model-id for an unit of Apogee Ensemble with
      a correct value. This unit uses DM1500 ASIC produced ArchWave AG (formerly
      known as BridgeCo AG).
      
      I note that this model supports three modes in the number of data channels
      in tx/rx streams; 8 ch pairs, 10 ch pairs, 18 ch pairs. The mode is
      switched by Vendor-dependent AV/C command, like:
      
      $ cd linux-firewire-utils
      $ ./firewire-request /dev/fw1 fcp 0x00ff000003dbeb0600000000 (8ch pairs)
      $ ./firewire-request /dev/fw1 fcp 0x00ff000003dbeb0601000000 (10ch pairs)
      $ ./firewire-request /dev/fw1 fcp 0x00ff000003dbeb0602000000 (18ch pairs)
      
      When switching between different mode, the unit disappears from IEEE 1394
      bus, then appears on the bus with different combination of stream formats.
      In a mode of 18 ch pairs, available sampling rate is up to 96.0 kHz, else
      up to 192.0 kHz.
      
      $ ./hinawa-config-rom-printer /dev/fw1
      { 'bus-info': { 'adj': False,
                      'bmc': True,
                      'chip_ID': 21474898341,
                      'cmc': True,
                      'cyc_clk_acc': 100,
                      'generation': 2,
                      'imc': True,
                      'isc': True,
                      'link_spd': 2,
                      'max_ROM': 1,
                      'max_rec': 512,
                      'name': '1394',
                      'node_vendor_ID': 987,
                      'pmc': False},
        'root-directory': [ ['HARDWARE_VERSION', 19],
                            [ 'NODE_CAPABILITIES',
                              { 'addressing': {'64': True, 'fix': True, 'prv': False},
                                'misc': {'int': False, 'ms': False, 'spt': True},
                                'state': { 'atn': False,
                                           'ded': False,
                                           'drq': True,
                                           'elo': False,
                                           'init': False,
                                           'lst': True,
                                           'off': False},
                                'testing': {'bas': False, 'ext': False}}],
                            ['VENDOR', 987],
                            ['DESCRIPTOR', 'Apogee Electronics'],
                            ['MODEL', 126702],
                            ['DESCRIPTOR', 'Ensemble'],
                            ['VERSION', 5297],
                            [ 'UNIT',
                              [ ['SPECIFIER_ID', 41005],
                                ['VERSION', 65537],
                                ['MODEL', 126702],
                                ['DESCRIPTOR', 'Ensemble']]],
                            [ 'DEPENDENT_INFO',
                              [ ['SPECIFIER_ID', 2037],
                                ['VERSION', 1],
                                [(58, 'IMMEDIATE'), 16777159],
                                [(59, 'IMMEDIATE'), 1048576],
                                [(60, 'IMMEDIATE'), 16777159],
                                [(61, 'IMMEDIATE'), 6291456]]]]}
      Signed-off-by: default avatarTakashi Sakamoto <o-takashi@sakamocchi.jp>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      d9c8a99f
    • Nikos Tsironis's avatar
      dm snapshot: Fix excessive memory usage and workqueue stalls · 69855b55
      Nikos Tsironis authored
      [ Upstream commit 721b1d98 ]
      
      kcopyd has no upper limit to the number of jobs one can allocate and
      issue. Under certain workloads this can lead to excessive memory usage
      and workqueue stalls. For example, when creating multiple dm-snapshot
      targets with a 4K chunk size and then writing to the origin through the
      page cache. Syncing the page cache causes a large number of BIOs to be
      issued to the dm-snapshot origin target, which itself issues an even
      larger (because of the BIO splitting taking place) number of kcopyd
      jobs.
      
      Running the following test, from the device mapper test suite [1],
      
        dmtest run --suite snapshot -n many_snapshots_of_same_volume_N
      
      , with 8 active snapshots, results in the kcopyd job slab cache growing
      to 10G. Depending on the available system RAM this can lead to the OOM
      killer killing user processes:
      
      [463.492878] kthreadd invoked oom-killer: gfp_mask=0x6040c0(GFP_KERNEL|__GFP_COMP),
                    nodemask=(null), order=1, oom_score_adj=0
      [463.492894] kthreadd cpuset=/ mems_allowed=0
      [463.492948] CPU: 7 PID: 2 Comm: kthreadd Not tainted 4.19.0-rc7 #3
      [463.492950] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
      [463.492952] Call Trace:
      [463.492964]  dump_stack+0x7d/0xbb
      [463.492973]  dump_header+0x6b/0x2fc
      [463.492987]  ? lockdep_hardirqs_on+0xee/0x190
      [463.493012]  oom_kill_process+0x302/0x370
      [463.493021]  out_of_memory+0x113/0x560
      [463.493030]  __alloc_pages_slowpath+0xf40/0x1020
      [463.493055]  __alloc_pages_nodemask+0x348/0x3c0
      [463.493067]  cache_grow_begin+0x81/0x8b0
      [463.493072]  ? cache_grow_begin+0x874/0x8b0
      [463.493078]  fallback_alloc+0x1e4/0x280
      [463.493092]  kmem_cache_alloc_node+0xd6/0x370
      [463.493098]  ? copy_process.part.31+0x1c5/0x20d0
      [463.493105]  copy_process.part.31+0x1c5/0x20d0
      [463.493115]  ? __lock_acquire+0x3cc/0x1550
      [463.493121]  ? __switch_to_asm+0x34/0x70
      [463.493129]  ? kthread_create_worker_on_cpu+0x70/0x70
      [463.493135]  ? finish_task_switch+0x90/0x280
      [463.493165]  _do_fork+0xe0/0x6d0
      [463.493191]  ? kthreadd+0x19f/0x220
      [463.493233]  kernel_thread+0x25/0x30
      [463.493235]  kthreadd+0x1bf/0x220
      [463.493242]  ? kthread_create_on_cpu+0x90/0x90
      [463.493248]  ret_from_fork+0x3a/0x50
      [463.493279] Mem-Info:
      [463.493285] active_anon:20631 inactive_anon:4831 isolated_anon:0
      [463.493285]  active_file:80216 inactive_file:80107 isolated_file:435
      [463.493285]  unevictable:0 dirty:51266 writeback:109372 unstable:0
      [463.493285]  slab_reclaimable:31191 slab_unreclaimable:3483521
      [463.493285]  mapped:526 shmem:4903 pagetables:1759 bounce:0
      [463.493285]  free:33623 free_pcp:2392 free_cma:0
      ...
      [463.493489] Unreclaimable slab info:
      [463.493513] Name                      Used          Total
      [463.493522] bio-6                   1028KB       1028KB
      [463.493525] bio-5                   1028KB       1028KB
      [463.493528] dm_snap_pending_exception     236783KB     243789KB
      [463.493531] dm_exception              41KB         42KB
      [463.493534] bio-4                   1216KB       1216KB
      [463.493537] bio-3                 439396KB     439396KB
      [463.493539] kcopyd_job           6973427KB    6973427KB
      ...
      [463.494340] Out of memory: Kill process 1298 (ruby2.3) score 1 or sacrifice child
      [463.494673] Killed process 1298 (ruby2.3) total-vm:435740kB, anon-rss:20180kB, file-rss:4kB, shmem-rss:0kB
      [463.506437] oom_reaper: reaped process 1298 (ruby2.3), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
      
      Moreover, issuing a large number of kcopyd jobs results in kcopyd
      hogging the CPU, while processing them. As a result, processing of work
      items, queued for execution on the same CPU as the currently running
      kcopyd thread, is stalled for long periods of time, hurting performance.
      Running the aforementioned test we get, in dmesg, messages like the
      following:
      
      [67501.194592] BUG: workqueue lockup - pool cpus=4 node=0 flags=0x0 nice=0 stuck for 27s!
      [67501.195586] Showing busy workqueues and worker pools:
      [67501.195591] workqueue events: flags=0x0
      [67501.195597]   pwq 8: cpus=4 node=0 flags=0x0 nice=0 active=1/256
      [67501.195611]     pending: cache_reap
      [67501.195641] workqueue mm_percpu_wq: flags=0x8
      [67501.195645]   pwq 8: cpus=4 node=0 flags=0x0 nice=0 active=1/256
      [67501.195656]     pending: vmstat_update
      [67501.195682] workqueue kblockd: flags=0x18
      [67501.195687]   pwq 5: cpus=2 node=0 flags=0x0 nice=-20 active=1/256
      [67501.195698]     pending: blk_timeout_work
      [67501.195753] workqueue kcopyd: flags=0x8
      [67501.195757]   pwq 8: cpus=4 node=0 flags=0x0 nice=0 active=1/256
      [67501.195768]     pending: do_work [dm_mod]
      [67501.195802] workqueue kcopyd: flags=0x8
      [67501.195806]   pwq 8: cpus=4 node=0 flags=0x0 nice=0 active=1/256
      [67501.195817]     pending: do_work [dm_mod]
      [67501.195834] workqueue kcopyd: flags=0x8
      [67501.195838]   pwq 8: cpus=4 node=0 flags=0x0 nice=0 active=1/256
      [67501.195848]     pending: do_work [dm_mod]
      [67501.195881] workqueue kcopyd: flags=0x8
      [67501.195885]   pwq 8: cpus=4 node=0 flags=0x0 nice=0 active=1/256
      [67501.195896]     pending: do_work [dm_mod]
      [67501.195920] workqueue kcopyd: flags=0x8
      [67501.195924]   pwq 8: cpus=4 node=0 flags=0x0 nice=0 active=2/256
      [67501.195935]     in-flight: 67:do_work [dm_mod]
      [67501.195945]     pending: do_work [dm_mod]
      [67501.195961] pool 8: cpus=4 node=0 flags=0x0 nice=0 hung=27s workers=3 idle: 129 23765
      
      The root cause for these issues is the way dm-snapshot uses kcopyd. In
      particular, the lack of an explicit or implicit limit to the maximum
      number of in-flight COW jobs. The merging path is not affected because
      it implicitly limits the in-flight kcopyd jobs to one.
      
      Fix these issues by using a semaphore to limit the maximum number of
      in-flight kcopyd jobs. We grab the semaphore before allocating a new
      kcopyd job in start_copy() and start_full_bio() and release it after the
      job finishes in copy_callback().
      
      The initial semaphore value is configurable through a module parameter,
      to allow fine tuning the maximum number of in-flight COW jobs. Setting
      this parameter to zero initializes the semaphore to INT_MAX.
      
      A default value of 2048 maximum in-flight kcopyd jobs was chosen. This
      value was decided experimentally as a trade-off between memory
      consumption, stalling the kernel's workqueues and maintaining a high
      enough throughput.
      
      Re-running the aforementioned test:
      
        * Workqueue stalls are eliminated
        * kcopyd's job slab cache uses a maximum of 130MB
        * The time taken by the test to write to the snapshot-origin target is
          reduced from 05m20.48s to 03m26.38s
      
      [1] https://github.com/jthornber/device-mapper-test-suiteSigned-off-by: default avatarNikos Tsironis <ntsironis@arrikto.com>
      Signed-off-by: default avatarIlias Tsitsimpis <iliastsi@arrikto.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      69855b55
    • Arnaldo Carvalho de Melo's avatar
      tools lib subcmd: Don't add the kernel sources to the include path · 7a6da629
      Arnaldo Carvalho de Melo authored
      [ Upstream commit ece98049 ]
      
      At some point we decided not to directly include kernel sources files
      when building tools/perf/, but when tools/lib/subcmd/ was forked from
      tools/perf it somehow ended up adding it via these two lines in its
      Makefile:
      
        CFLAGS += -I$(srctree)/include/uapi
        CFLAGS += -I$(srctree)/include
      
      As $(srctree) points to the kernel sources.
      
      Removing those lines and keeping just:
      
        CFLAGS += -I$(srctree)/tools/include/
      
      Is enough to build tools/perf and tools/objtool.
      
      This fixes the build when building from the sources in environments such
      as the Android NDK crossbuilding from a fedora:26 system:
      
        subcmd-util.h:11:15: error: expected ',' or ';' before 'void'
         static inline void report(const char *prefix, const char *err, va_list params)
                       ^
        In file included from /git/perf/include/uapi/linux/stddef.h:2:0,
                         from /git/perf/include/uapi/linux/posix_types.h:5,
                         from /opt/android-ndk-r12b/platforms/android-24/arch-arm/usr/include/sys/types.h:36,
                         from /opt/android-ndk-r12b/platforms/android-24/arch-arm/usr/include/unistd.h:33,
                         from run-command.c:2:
        subcmd-util.h:18:17: error: '__no_instrument_function__' attribute applies only to functions
      
      The /opt/android-ndk-r12b/platforms/android-24/arch-arm/usr/include/sys/types.h
      file that includes linux/posix_types.h ends up getting the one in the kernel
      sources causing the breakage. Fix it.
      
      Test built tools/objtool/ too.
      Reported-by: default avatarJiri Olsa <jolsa@kernel.org>
      Tested-by: default avatarJiri Olsa <jolsa@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Fixes: 4b6ab94e ("perf subcmd: Create subcmd library")
      Link: https://lkml.kernel.org/n/tip-5lhaoecrj12t0bqwvpiu14sm@git.kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      7a6da629
    • Nikos Tsironis's avatar
      dm kcopyd: Fix bug causing workqueue stalls · e9566e99
      Nikos Tsironis authored
      [ Upstream commit d7e6b8df ]
      
      When using kcopyd to run callbacks through dm_kcopyd_do_callback() or
      submitting copy jobs with a source size of 0, the jobs are pushed
      directly to the complete_jobs list, which could be under processing by
      the kcopyd thread. As a result, the kcopyd thread can continue running
      completed jobs indefinitely, without releasing the CPU, as long as
      someone keeps submitting new completed jobs through the aforementioned
      paths. Processing of work items, queued for execution on the same CPU as
      the currently running kcopyd thread, is thus stalled for excessive
      amounts of time, hurting performance.
      
      Running the following test, from the device mapper test suite [1],
      
        dmtest run --suite snapshot -n parallel_io_to_many_snaps_N
      
      , with 8 active snapshots, we get, in dmesg, messages like the
      following:
      
      [68899.948523] BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 95s!
      [68899.949282] Showing busy workqueues and worker pools:
      [68899.949288] workqueue events: flags=0x0
      [68899.949295]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=2/256
      [68899.949306]     pending: vmstat_shepherd, cache_reap
      [68899.949331] workqueue mm_percpu_wq: flags=0x8
      [68899.949337]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256
      [68899.949345]     pending: vmstat_update
      [68899.949387] workqueue dm_bufio_cache: flags=0x8
      [68899.949392]   pwq 4: cpus=2 node=0 flags=0x0 nice=0 active=1/256
      [68899.949400]     pending: work_fn [dm_bufio]
      [68899.949423] workqueue kcopyd: flags=0x8
      [68899.949429]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256
      [68899.949437]     pending: do_work [dm_mod]
      [68899.949452] workqueue kcopyd: flags=0x8
      [68899.949458]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=2/256
      [68899.949466]     in-flight: 13:do_work [dm_mod]
      [68899.949474]     pending: do_work [dm_mod]
      [68899.949487] workqueue kcopyd: flags=0x8
      [68899.949493]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256
      [68899.949501]     pending: do_work [dm_mod]
      [68899.949515] workqueue kcopyd: flags=0x8
      [68899.949521]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256
      [68899.949529]     pending: do_work [dm_mod]
      [68899.949541] workqueue kcopyd: flags=0x8
      [68899.949547]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256
      [68899.949555]     pending: do_work [dm_mod]
      [68899.949568] pool 0: cpus=0 node=0 flags=0x0 nice=0 hung=95s workers=4 idle: 27130 27223 1084
      
      Fix this by splitting the complete_jobs list into two parts: A user
      facing part, named callback_jobs, and one used internally by kcopyd,
      retaining the name complete_jobs. dm_kcopyd_do_callback() and
      dispatch_job() now push their jobs to the callback_jobs list, which is
      spliced to the complete_jobs list once, every time the kcopyd thread
      wakes up. This prevents kcopyd from hogging the CPU indefinitely and
      causing workqueue stalls.
      
      Re-running the aforementioned test:
      
        * Workqueue stalls are eliminated
        * The maximum writing time among all targets is reduced from 09m37.10s
          to 06m04.85s and the total run time of the test is reduced from
          10m43.591s to 7m19.199s
      
      [1] https://github.com/jthornber/device-mapper-test-suiteSigned-off-by: default avatarNikos Tsironis <ntsironis@arrikto.com>
      Signed-off-by: default avatarIlias Tsitsimpis <iliastsi@arrikto.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      e9566e99
    • Arnaldo Carvalho de Melo's avatar
      perf parse-events: Fix unchecked usage of strncpy() · 8691aa52
      Arnaldo Carvalho de Melo authored
      [ Upstream commit bd8d57fb ]
      
      The strncpy() function may leave the destination string buffer
      unterminated, better use strlcpy() that we have a __weak fallback
      implementation for systems without it.
      
      This fixes this warning on an Alpine Linux Edge system with gcc 8.2:
      
        util/parse-events.c: In function 'print_symbol_events':
        util/parse-events.c:2465:4: error: 'strncpy' specified bound 100 equals destination size [-Werror=stringop-truncation]
            strncpy(name, syms->symbol, MAX_NAME_LEN);
            ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        In function 'print_symbol_events.constprop',
            inlined from 'print_events' at util/parse-events.c:2508:2:
        util/parse-events.c:2465:4: error: 'strncpy' specified bound 100 equals destination size [-Werror=stringop-truncation]
            strncpy(name, syms->symbol, MAX_NAME_LEN);
            ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        In function 'print_symbol_events.constprop',
            inlined from 'print_events' at util/parse-events.c:2511:2:
        util/parse-events.c:2465:4: error: 'strncpy' specified bound 100 equals destination size [-Werror=stringop-truncation]
            strncpy(name, syms->symbol, MAX_NAME_LEN);
            ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        cc1: all warnings being treated as errors
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Fixes: 947b4ad1 ("perf list: Fix max event string size")
      Link: https://lkml.kernel.org/n/tip-b663e33bm6x8hrkie4uxh7u2@git.kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      8691aa52
    • Arnaldo Carvalho de Melo's avatar
      perf svghelper: Fix unchecked usage of strncpy() · f9e281df
      Arnaldo Carvalho de Melo authored
      [ Upstream commit 2f530253 ]
      
      The strncpy() function may leave the destination string buffer
      unterminated, better use strlcpy() that we have a __weak fallback
      implementation for systems without it.
      
      In this specific case this would only happen if fgets() was buggy, as
      its man page states that it should read one less byte than the size of
      the destination buffer, so that it can put the nul byte at the end of
      it, so it would never copy 255 non-nul chars, as fgets reads into the
      orig buffer at most 254 non-nul chars and terminates it. But lets just
      switch to strlcpy to keep the original intent and silence the gcc 8.2
      warning.
      
      This fixes this warning on an Alpine Linux Edge system with gcc 8.2:
      
        In function 'cpu_model',
            inlined from 'svg_cpu_box' at util/svghelper.c:378:2:
        util/svghelper.c:337:5: error: 'strncpy' output may be truncated copying 255 bytes from a string of length 255 [-Werror=stringop-truncation]
             strncpy(cpu_m, &buf[13], 255);
             ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Fixes: f48d55ce ("perf: Add a SVG helper library file")
      Link: https://lkml.kernel.org/n/tip-xzkoo0gyr56gej39ltivuh9g@git.kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      f9e281df
    • Adrian Hunter's avatar
      perf intel-pt: Fix error with config term "pt=0" · 9da7dfdb
      Adrian Hunter authored
      [ Upstream commit 1c6f709b ]
      
      Users should never use 'pt=0', but if they do it may give a meaningless
      error:
      
      	$ perf record -e intel_pt/pt=0/u uname
      	Error:
      	The sys_perf_event_open() syscall returned with 22 (Invalid argument) for
      	event (intel_pt/pt=0/u).
      
      Fix that by forcing 'pt=1'.
      
      Committer testing:
      
        # perf record -e intel_pt/pt=0/u uname
        Error:
        The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (intel_pt/pt=0/u).
        /bin/dmesg | grep -i perf may provide additional information.
      
        # perf record -e intel_pt/pt=0/u uname
        pt=0 doesn't make sense, forcing pt=1
        Linux
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.020 MB perf.data ]
        #
      Signed-off-by: default avatarAdrian Hunter <adrian.hunter@intel.com>
      Tested-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Link: http://lkml.kernel.org/r/b7c5b4e5-9497-10e5-fd43-5f3e4a0fe51d@intel.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      9da7dfdb
    • Sergey Senozhatsky's avatar
      tty/serial: do not free trasnmit buffer page under port lock · 80c8e528
      Sergey Senozhatsky authored
      [ Upstream commit d7240214 ]
      
      LKP has hit yet another circular locking dependency between uart
      console drivers and debugobjects [1]:
      
           CPU0                                    CPU1
      
                                                  rhltable_init()
                                                   __init_work()
                                                    debug_object_init
           uart_shutdown()                          /* db->lock */
            /* uart_port->lock */                    debug_print_object()
             free_page()                              printk()
                                                       call_console_drivers()
              debug_check_no_obj_freed()                /* uart_port->lock */
               /* db->lock */
                debug_print_object()
      
      So there are two dependency chains:
      	uart_port->lock -> db->lock
      And
      	db->lock -> uart_port->lock
      
      This particular circular locking dependency can be addressed in several
      ways:
      
      a) One way would be to move debug_print_object() out of db->lock scope
         and, thus, break the db->lock -> uart_port->lock chain.
      b) Another one would be to free() transmit buffer page out of db->lock
         in UART code; which is what this patch does.
      
      It makes sense to apply a) and b) independently: there are too many things
      going on behind free(), none of which depend on uart_port->lock.
      
      The patch fixes transmit buffer page free() in uart_shutdown() and,
      additionally, in uart_port_startup() (as was suggested by Dmitry Safonov).
      
      [1] https://lore.kernel.org/lkml/20181211091154.GL23332@shao2-debian/T/#uSigned-off-by: default avatarSergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Reviewed-by: default avatarPetr Mladek <pmladek@suse.com>
      Acked-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Jiri Slaby <jslaby@suse.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Waiman Long <longman@redhat.com>
      Cc: Dmitry Safonov <dima@arista.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      80c8e528
    • Jonas Danielsson's avatar
      mmc: atmel-mci: do not assume idle after atmci_request_end · 20f33f37
      Jonas Danielsson authored
      [ Upstream commit ae460c11 ]
      
      On our AT91SAM9260 board we use the same sdio bus for wifi and for the
      sd card slot. This caused the atmel-mci to give the following splat on
      the serial console:
      
        ------------[ cut here ]------------
        WARNING: CPU: 0 PID: 538 at drivers/mmc/host/atmel-mci.c:859 atmci_send_command+0x24/0x44
        Modules linked in:
        CPU: 0 PID: 538 Comm: mmcqd/0 Not tainted 4.14.76 #14
        Hardware name: Atmel AT91SAM9
        [<c000fccc>] (unwind_backtrace) from [<c000d3dc>] (show_stack+0x10/0x14)
        [<c000d3dc>] (show_stack) from [<c0017644>] (__warn+0xd8/0xf4)
        [<c0017644>] (__warn) from [<c0017704>] (warn_slowpath_null+0x1c/0x24)
        [<c0017704>] (warn_slowpath_null) from [<c033bb9c>] (atmci_send_command+0x24/0x44)
        [<c033bb9c>] (atmci_send_command) from [<c033e984>] (atmci_start_request+0x1f4/0x2dc)
        [<c033e984>] (atmci_start_request) from [<c033f3b4>] (atmci_request+0xf0/0x164)
        [<c033f3b4>] (atmci_request) from [<c0327108>] (mmc_start_request+0x280/0x2d0)
        [<c0327108>] (mmc_start_request) from [<c032800c>] (mmc_start_areq+0x230/0x330)
        [<c032800c>] (mmc_start_areq) from [<c03366f8>] (mmc_blk_issue_rw_rq+0xc4/0x310)
        [<c03366f8>] (mmc_blk_issue_rw_rq) from [<c03372c4>] (mmc_blk_issue_rq+0x118/0x5ac)
        [<c03372c4>] (mmc_blk_issue_rq) from [<c033781c>] (mmc_queue_thread+0xc4/0x118)
        [<c033781c>] (mmc_queue_thread) from [<c002daf8>] (kthread+0x100/0x118)
        [<c002daf8>] (kthread) from [<c000a580>] (ret_from_fork+0x14/0x34)
        ---[ end trace 594371ddfa284bd6 ]---
      
      This is:
        WARN_ON(host->cmd);
      
      This was fixed on our board by letting atmci_request_end determine what
      state we are in. Instead of unconditionally setting it to STATE_IDLE on
      STATE_END_REQUEST.
      Signed-off-by: default avatarJonas Danielsson <jonas@orbital-systems.com>
      Signed-off-by: default avatarUlf Hansson <ulf.hansson@linaro.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      20f33f37
    • Masahiro Yamada's avatar
      kconfig: fix memory leak when EOF is encountered in quotation · 4f67ca09
      Masahiro Yamada authored
      [ Upstream commit fbac5977 ]
      
      An unterminated string literal followed by new line is passed to the
      parser (with "multi-line strings not supported" warning shown), then
      handled properly there.
      
      On the other hand, an unterminated string literal at end of file is
      never passed to the parser, then results in memory leak.
      
      [Test Code]
      
        ----------(Kconfig begin)----------
        source "Kconfig.inc"
      
        config A
                bool "a"
        -----------(Kconfig end)-----------
      
        --------(Kconfig.inc begin)--------
        config B
                bool "b\No new line at end of file
        ---------(Kconfig.inc end)---------
      
      [Summary from Valgrind]
      
        Before the fix:
      
          LEAK SUMMARY:
             definitely lost: 16 bytes in 1 blocks
             ...
      
        After the fix:
      
          LEAK SUMMARY:
             definitely lost: 0 bytes in 0 blocks
             ...
      
      Eliminate the memory leak path by handling this case. Of course, such
      a Kconfig file is wrong already, so I will add an error message later.
      Signed-off-by: default avatarMasahiro Yamada <yamada.masahiro@socionext.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      4f67ca09
    • Masahiro Yamada's avatar
      kconfig: fix file name and line number of warn_ignored_character() · 7ff335ee
      Masahiro Yamada authored
      [ Upstream commit 77c1c0fa ]
      
      Currently, warn_ignore_character() displays invalid file name and
      line number.
      
      The lexer should use current_file->name and yylineno, while the parser
      should use zconf_curname() and zconf_lineno().
      
      This difference comes from that the lexer is always going ahead
      of the parser. The parser needs to look ahead one token to make a
      shift/reduce decision, so the lexer is requested to scan more text
      from the input file.
      
      This commit fixes the warning message from warn_ignored_character().
      
      [Test Code]
      
        ----(Kconfig begin)----
        /
        -----(Kconfig end)-----
      
      [Output]
      
        Before the fix:
      
        <none>:0:warning: ignoring unsupported character '/'
      
        After the fix:
      
        Kconfig:1:warning: ignoring unsupported character '/'
      Signed-off-by: default avatarMasahiro Yamada <yamada.masahiro@socionext.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      7ff335ee
    • Lucas Stach's avatar
      clk: imx6q: reset exclusive gates on init · 73319e8d
      Lucas Stach authored
      [ Upstream commit f7542d81 ]
      
      The exclusive gates may be set up in the wrong way by software running
      before the clock driver comes up. In that case the exclusive setup is
      locked in its initial state, as the complementary function can't be
      activated without disabling the initial setup first.
      
      To avoid this lock situation, reset the exclusive gates to the off
      state and allow the kernel to provide the proper setup.
      Signed-off-by: default avatarLucas Stach <l.stach@pengutronix.de>
      Reviewed-by: default avatarDong Aisheng <Aisheng.dong@nxp.com>
      Signed-off-by: default avatarStephen Boyd <sboyd@kernel.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      73319e8d
    • David Disseldorp's avatar
      scsi: target: use consistent left-aligned ASCII INQUIRY data · e5620572
      David Disseldorp authored
      [ Upstream commit 0de26357 ]
      
      spc5r17.pdf specifies:
      
        4.3.1 ASCII data field requirements
        ASCII data fields shall contain only ASCII printable characters (i.e.,
        code values 20h to 7Eh) and may be terminated with one or more ASCII null
        (00h) characters.  ASCII data fields described as being left-aligned
        shall have any unused bytes at the end of the field (i.e., highest
        offset) and the unused bytes shall be filled with ASCII space characters
        (20h).
      
      LIO currently space-pads the T10 VENDOR IDENTIFICATION and PRODUCT
      IDENTIFICATION fields in the standard INQUIRY data. However, the PRODUCT
      REVISION LEVEL field in the standard INQUIRY data as well as the T10 VENDOR
      IDENTIFICATION field in the INQUIRY Device Identification VPD Page are
      zero-terminated/zero-padded.
      
      Fix this inconsistency by using space-padding for all of the above fields.
      Signed-off-by: default avatarDavid Disseldorp <ddiss@suse.de>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Reviewed-by: default avatarBryant G. Ly <bly@catalogicsoftware.com>
      Reviewed-by: default avatarLee Duncan <lduncan@suse.com>
      Reviewed-by: default avatarHannes Reinecke <hare@suse.com>
      Reviewed-by: default avatarRoman Bolshakov <r.bolshakov@yadro.com>
      Signed-off-by: default avatarMartin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      e5620572
    • yupeng's avatar
      net: call sk_dst_reset when set SO_DONTROUTE · 6d814b14
      yupeng authored
      [ Upstream commit 0fbe82e6 ]
      
      after set SO_DONTROUTE to 1, the IP layer should not route packets if
      the dest IP address is not in link scope. But if the socket has cached
      the dst_entry, such packets would be routed until the sk_dst_cache
      expires. So we should clean the sk_dst_cache when a user set
      SO_DONTROUTE option. Below are server/client python scripts which
      could reprodue this issue:
      
      server side code:
      
      ==========================================================================
      import socket
      import struct
      import time
      
      s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
      s.bind(('0.0.0.0', 9000))
      s.listen(1)
      sock, addr = s.accept()
      sock.setsockopt(socket.SOL_SOCKET, socket.SO_DONTROUTE, struct.pack('i', 1))
      while True:
          sock.send(b'foo')
          time.sleep(1)
      ==========================================================================
      
      client side code:
      ==========================================================================
      import socket
      import time
      
      s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
      s.connect(('server_address', 9000))
      while True:
          data = s.recv(1024)
          print(data)
      ==========================================================================
      Signed-off-by: default avataryupeng <yupeng0921@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      6d814b14
    • Nathan Chancellor's avatar
      media: firewire: Fix app_info parameter type in avc_ca{,_app}_info · bb457a47
      Nathan Chancellor authored
      [ Upstream commit b2e9a4ed ]
      
      Clang warns:
      
      drivers/media/firewire/firedtv-avc.c:999:45: warning: implicit
      conversion from 'int' to 'char' changes value from 159 to -97
      [-Wconstant-conversion]
              app_info[0] = (EN50221_TAG_APP_INFO >> 16) & 0xff;
                          ~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~
      drivers/media/firewire/firedtv-avc.c:1000:45: warning: implicit
      conversion from 'int' to 'char' changes value from 128 to -128
      [-Wconstant-conversion]
              app_info[1] = (EN50221_TAG_APP_INFO >>  8) & 0xff;
                          ~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~
      drivers/media/firewire/firedtv-avc.c:1040:44: warning: implicit
      conversion from 'int' to 'char' changes value from 159 to -97
      [-Wconstant-conversion]
              app_info[0] = (EN50221_TAG_CA_INFO >> 16) & 0xff;
                          ~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~
      drivers/media/firewire/firedtv-avc.c:1041:44: warning: implicit
      conversion from 'int' to 'char' changes value from 128 to -128
      [-Wconstant-conversion]
              app_info[1] = (EN50221_TAG_CA_INFO >>  8) & 0xff;
                          ~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~
      4 warnings generated.
      
      Change app_info's type to unsigned char to match the type of the
      member msg in struct ca_msg, which is the only thing passed into the
      app_info parameter in this function.
      
      Link: https://github.com/ClangBuiltLinux/linux/issues/105Signed-off-by: default avatarNathan Chancellor <natechancellor@gmail.com>
      Signed-off-by: default avatarMauro Carvalho Chehab <mchehab+samsung@kernel.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      bb457a47
    • Breno Leitao's avatar
      powerpc/pseries/cpuidle: Fix preempt warning · dc21489d
      Breno Leitao authored
      [ Upstream commit 2b038cbc ]
      
      When booting a pseries kernel with PREEMPT enabled, it dumps the
      following warning:
      
         BUG: using smp_processor_id() in preemptible [00000000] code: swapper/0/1
         caller is pseries_processor_idle_init+0x5c/0x22c
         CPU: 13 PID: 1 Comm: swapper/0 Not tainted 4.20.0-rc3-00090-g12201a0128bc-dirty #828
         Call Trace:
         [c000000429437ab0] [c0000000009c8878] dump_stack+0xec/0x164 (unreliable)
         [c000000429437b00] [c0000000005f2f24] check_preemption_disabled+0x154/0x160
         [c000000429437b90] [c000000000cab8e8] pseries_processor_idle_init+0x5c/0x22c
         [c000000429437c10] [c000000000010ed4] do_one_initcall+0x64/0x300
         [c000000429437ce0] [c000000000c54500] kernel_init_freeable+0x3f0/0x500
         [c000000429437db0] [c0000000000112dc] kernel_init+0x2c/0x160
         [c000000429437e20] [c00000000000c1d0] ret_from_kernel_thread+0x5c/0x6c
      
      This happens because the code calls get_lppaca() which calls
      get_paca() and it checks if preemption is disabled through
      check_preemption_disabled().
      
      Preemption should be disabled because the per CPU variable may make no
      sense if there is a preemption (and a CPU switch) after it reads the
      per CPU data and when it is used.
      
      In this device driver specifically, it is not a problem, because this
      code just needs to have access to one lppaca struct, and it does not
      matter if it is the current per CPU lppaca struct or not (i.e. when
      there is a preemption and a CPU migration).
      
      That said, the most appropriate fix seems to be related to avoiding
      the debug_smp_processor_id() call at get_paca(), instead of calling
      preempt_disable() before get_paca().
      Signed-off-by: default avatarBreno Leitao <leitao@debian.org>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      dc21489d
    • Breno Leitao's avatar
      powerpc/xmon: Fix invocation inside lock region · 8117508a
      Breno Leitao authored
      [ Upstream commit 8d4a8622 ]
      
      Currently xmon needs to get devtree_lock (through rtas_token()) during its
      invocation (at crash time). If there is a crash while devtree_lock is being
      held, then xmon tries to get the lock but spins forever and never get into
      the interactive debugger, as in the following case:
      
      	int *ptr = NULL;
      	raw_spin_lock_irqsave(&devtree_lock, flags);
      	*ptr = 0xdeadbeef;
      
      This patch avoids calling rtas_token(), thus trying to get the same lock,
      at crash time. This new mechanism proposes getting the token at
      initialization time (xmon_init()) and just consuming it at crash time.
      
      This would allow xmon to be possible invoked independent of devtree_lock
      being held or not.
      Signed-off-by: default avatarBreno Leitao <leitao@debian.org>
      Reviewed-by: default avatarThiago Jung Bauermann <bauerman@linux.ibm.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      8117508a
    • Joel Fernandes (Google)'s avatar
      pstore/ram: Do not treat empty buffers as valid · fffdbf58
      Joel Fernandes (Google) authored
      [ Upstream commit 30696378 ]
      
      The ramoops backend currently calls persistent_ram_save_old() even
      if a buffer is empty. While this appears to work, it is does not seem
      like the right thing to do and could lead to future bugs so lets avoid
      that. It also prevents misleading prints in the logs which claim the
      buffer is valid.
      
      I got something like:
      
      	found existing buffer, size 0, start 0
      
      When I was expecting:
      
      	no valid data in buffer (sig = ...)
      
      This bails out early (and reports with pr_debug()), since it's an
      acceptable state.
      Signed-off-by: default avatarJoel Fernandes (Google) <joel@joelfernandes.org>
      Co-developed-by: default avatarKees Cook <keescook@chromium.org>
      Signed-off-by: default avatarKees Cook <keescook@chromium.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      fffdbf58
    • Daniel Santos's avatar
      jffs2: Fix use of uninitialized delayed_work, lockdep breakage · 93611460
      Daniel Santos authored
      [ Upstream commit a788c527 ]
      
      jffs2_sync_fs makes the assumption that if CONFIG_JFFS2_FS_WRITEBUFFER
      is defined then a write buffer is available and has been initialized.
      However, this does is not the case when the mtd device has no
      out-of-band buffer:
      
      int jffs2_nand_flash_setup(struct jffs2_sb_info *c)
      {
              if (!c->mtd->oobsize)
                      return 0;
      ...
      
      The resulting call to cancel_delayed_work_sync passing a uninitialized
      (but zeroed) delayed_work struct forces lockdep to become disabled.
      
      [   90.050639] overlayfs: upper fs does not support tmpfile.
      [   90.652264] INFO: trying to register non-static key.
      [   90.662171] the code is fine but needs lockdep annotation.
      [   90.673090] turning off the locking correctness validator.
      [   90.684021] CPU: 0 PID: 1762 Comm: mount_root Not tainted 4.14.63 #0
      [   90.696672] Stack : 00000000 00000000 80d8f6a2 00000038 805f0000 80444600 8fe364f4 805dfbe7
      [   90.713349]         80563a30 000006e2 8068370c 00000001 00000000 00000001 8e2fdc48 ffffffff
      [   90.730020]         00000000 00000000 80d90000 00000000 00000106 00000000 6465746e 312e3420
      [   90.746690]         6b636f6c 03bf0000 f8000000 20676e69 00000000 80000000 00000000 8e2c2a90
      [   90.763362]         80d90000 00000001 00000000 8e2c2a90 00000003 80260dc0 08052098 80680000
      [   90.780033]         ...
      [   90.784902] Call Trace:
      [   90.789793] [<8000f0d8>] show_stack+0xb8/0x148
      [   90.798659] [<8005a000>] register_lock_class+0x270/0x55c
      [   90.809247] [<8005cb64>] __lock_acquire+0x13c/0xf7c
      [   90.818964] [<8005e314>] lock_acquire+0x194/0x1dc
      [   90.828345] [<8003f27c>] flush_work+0x200/0x24c
      [   90.837374] [<80041dfc>] __cancel_work_timer+0x158/0x210
      [   90.847958] [<801a8770>] jffs2_sync_fs+0x20/0x54
      [   90.857173] [<80125cf4>] iterate_supers+0xf4/0x120
      [   90.866729] [<80158fc4>] sys_sync+0x44/0x9c
      [   90.875067] [<80014424>] syscall_common+0x34/0x58
      Signed-off-by: default avatarDaniel Santos <daniel.santos@pobox.com>
      Reviewed-by: default avatarHou Tao <houtao1@huawei.com>
      Signed-off-by: default avatarBoris Brezillon <boris.brezillon@bootlin.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      93611460
    • Chuck Lever's avatar
      rxe: IB_WR_REG_MR does not capture MR's iova field · 6b325359
      Chuck Lever authored
      [ Upstream commit b024dd0e ]
      
      FRWR memory registration is done with a series of calls and WRs.
      1. ULP invokes ib_dma_map_sg()
      2. ULP invokes ib_map_mr_sg()
      3. ULP posts an IB_WR_REG_MR on the Send queue
      
      Step 2 generates an iova. It is permissible for ULPs to change this
      iova (with certain restrictions) between steps 2 and 3.
      
      rxe_map_mr_sg captures the MR's iova but later when rxe processes the
      REG_MR WR, it ignores the MR's iova field. If a ULP alters the MR's iova
      after step 2 but before step 3, rxe never captures that change.
      
      When the remote sends an RDMA Read targeting that MR, rxe looks up the
      R_key, but the altered iova does not match the iova stored in the MR,
      causing the RDMA Read request to fail.
      Reported-by: default avatarAnna Schumaker <schumaker.anna@gmail.com>
      Signed-off-by: default avatarChuck Lever <chuck.lever@oracle.com>
      Reviewed-by: default avatarSagi Grimberg <sagi@grimberg.me>
      Signed-off-by: default avatarJason Gunthorpe <jgg@mellanox.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      6b325359
    • Ondrej Mosnacek's avatar
      selinux: always allow mounting submounts · 62044cba
      Ondrej Mosnacek authored
      [ Upstream commit 2cbdcb88 ]
      
      If a superblock has the MS_SUBMOUNT flag set, we should always allow
      mounting it. These mounts are done automatically by the kernel either as
      part of mounting some parent mount (e.g. debugfs always mounts tracefs
      under "tracing" for compatibility) or they are mounted automatically as
      needed on subdirectory accesses (e.g. NFS crossmnt mounts). Since such
      automounts are either an implicit consequence of the parent mount (which
      is already checked) or they can happen during regular accesses (where it
      doesn't make sense to check against the current task's context), the
      mount permission check should be skipped for them.
      
      Without this patch, attempts to access contents of an automounted
      directory can cause unexpected SELinux denials.
      
      In the current kernel tree, the MS_SUBMOUNT flag is set only via
      vfs_submount(), which is called only from the following places:
       - AFS, when automounting special "symlinks" referencing other cells
       - CIFS, when automounting "referrals"
       - NFS, when automounting subtrees
       - debugfs, when automounting tracefs
      
      In all cases the submounts are meant to be transparent to the user and
      it makes sense that if mounting the master is allowed, then so should be
      the automounts. Note that CAP_SYS_ADMIN capability checking is already
      skipped for (SB_KERNMOUNT|SB_SUBMOUNT) in:
       - sget_userns() in fs/super.c:
      	if (!(flags & (SB_KERNMOUNT|SB_SUBMOUNT)) &&
      	    !(type->fs_flags & FS_USERNS_MOUNT) &&
      	    !capable(CAP_SYS_ADMIN))
      		return ERR_PTR(-EPERM);
       - sget() in fs/super.c:
              /* Ensure the requestor has permissions over the target filesystem */
              if (!(flags & (SB_KERNMOUNT|SB_SUBMOUNT)) && !ns_capable(user_ns, CAP_SYS_ADMIN))
                      return ERR_PTR(-EPERM);
      
      Verified internally on patched RHEL 7.6 with a reproducer using
      NFS+httpd and selinux-tesuite.
      
      Fixes: 93faccbb ("fs: Better permission checking for submounts")
      Signed-off-by: default avatarOndrej Mosnacek <omosnace@redhat.com>
      Signed-off-by: default avatarPaul Moore <paul@paul-moore.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      62044cba
    • Anders Roxell's avatar
      arm64: perf: set suppress_bind_attrs flag to true · bd37f21e
      Anders Roxell authored
      [ Upstream commit 81e9fa8b ]
      
      The armv8_pmuv3 driver doesn't have a remove function, and when the test
      'CONFIG_DEBUG_TEST_DRIVER_REMOVE=y' is enabled, the following Call trace
      can be seen.
      
      [    1.424287] Failed to register pmu: armv8_pmuv3, reason -17
      [    1.424870] WARNING: CPU: 0 PID: 1 at ../kernel/events/core.c:11771 perf_event_sysfs_init+0x98/0xdc
      [    1.425220] Modules linked in:
      [    1.425531] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G        W         4.19.0-rc7-next-20181012-00003-ge7a97b1ad77b-dirty #35
      [    1.425951] Hardware name: linux,dummy-virt (DT)
      [    1.426212] pstate: 80000005 (Nzcv daif -PAN -UAO)
      [    1.426458] pc : perf_event_sysfs_init+0x98/0xdc
      [    1.426720] lr : perf_event_sysfs_init+0x98/0xdc
      [    1.426908] sp : ffff00000804bd50
      [    1.427077] x29: ffff00000804bd50 x28: ffff00000934e078
      [    1.427429] x27: ffff000009546000 x26: 0000000000000007
      [    1.427757] x25: ffff000009280710 x24: 00000000ffffffef
      [    1.428086] x23: ffff000009408000 x22: 0000000000000000
      [    1.428415] x21: ffff000009136008 x20: ffff000009408730
      [    1.428744] x19: ffff80007b20b400 x18: 000000000000000a
      [    1.429075] x17: 0000000000000000 x16: 0000000000000000
      [    1.429418] x15: 0000000000000400 x14: 2e79726f74636572
      [    1.429748] x13: 696420656d617320 x12: 656874206e692065
      [    1.430060] x11: 6d616e20656d6173 x10: 2065687420687469
      [    1.430335] x9 : ffff00000804bd50 x8 : 206e6f7361657220
      [    1.430610] x7 : 2c3376756d705f38 x6 : ffff00000954d7ce
      [    1.430880] x5 : 0000000000000000 x4 : 0000000000000000
      [    1.431226] x3 : 0000000000000000 x2 : ffffffffffffffff
      [    1.431554] x1 : 4d151327adc50b00 x0 : 0000000000000000
      [    1.431868] Call trace:
      [    1.432102]  perf_event_sysfs_init+0x98/0xdc
      [    1.432382]  do_one_initcall+0x6c/0x1a8
      [    1.432637]  kernel_init_freeable+0x1bc/0x280
      [    1.432905]  kernel_init+0x18/0x160
      [    1.433115]  ret_from_fork+0x10/0x18
      [    1.433297] ---[ end trace 27fd415390eb9883 ]---
      
      Rework to set suppress_bind_attrs flag to avoid removing the device when
      CONFIG_DEBUG_TEST_DRIVER_REMOVE=y, since there's no real reason to
      remove the armv8_pmuv3 driver.
      
      Cc: Arnd Bergmann <arnd@arndb.de>
      Co-developed-by: default avatarArnd Bergmann <arnd@arndb.de>
      Signed-off-by: default avatarAnders Roxell <anders.roxell@linaro.org>
      Signed-off-by: default avatarWill Deacon <will.deacon@arm.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      bd37f21e
    • Maciej W. Rozycki's avatar
      MIPS: SiByte: Enable swiotlb for SWARM, LittleSur and BigSur · 8ac4ad06
      Maciej W. Rozycki authored
      [ Upstream commit e4849aff ]
      
      The Broadcom SiByte BCM1250, BCM1125, and BCM1125H SOCs have an onchip
      DRAM controller that supports memory amounts of up to 16GiB, and due to
      how the address decoder has been wired in the SOC any memory beyond 1GiB
      is actually mapped starting from 4GiB physical up, that is beyond the
      32-bit addressable limit[1].  Consequently if the maximum amount of
      memory has been installed, then it will span up to 19GiB.
      
      Many of the evaluation boards we support that are based on one of these
      SOCs have their memory soldered and the amount present fits in the
      32-bit address range.  The BCM91250A SWARM board however has actual DIMM
      slots and accepts, depending on the peripherals revision of the SOC, up
      to 4GiB or 8GiB of memory in commercially available JEDEC modules[2].
      I believe this is also the case with the BCM91250C2 LittleSur board.
      This means that up to either 3GiB or 7GiB of memory requires 64-bit
      addressing to access.
      
      I believe the BCM91480B BigSur board, which has the BCM1480 SOC instead,
      accepts at least as much memory, although I have no documentation or
      actual hardware available to verify that.
      
      Both systems have PCI slots installed for use by any PCI option boards,
      including ones that only support 32-bit addressing (additionally the
      32-bit PCI host bridge of the BCM1250, BCM1125, and BCM1125H SOCs limits
      addressing to 32-bits), and there is no IOMMU available.  Therefore for
      PCI DMA to work in the presence of memory beyond enable swiotlb for the
      affected systems.
      
      All the other SOC onchip DMA devices use 40-bit addressing and therefore
      can address the whole memory, so only enable swiotlb if PCI support and
      support for DMA beyond 4GiB have been both enabled in the configuration
      of the kernel.
      
      This shows up as follows:
      
      Broadcom SiByte BCM1250 B2 @ 800 MHz (SB1 rev 2)
      Board type: SiByte BCM91250A (SWARM)
      Determined physical RAM map:
       memory: 000000000fe7fe00 @ 0000000000000000 (usable)
       memory: 000000001ffffe00 @ 0000000080000000 (usable)
       memory: 000000000ffffe00 @ 00000000c0000000 (usable)
       memory: 0000000087fffe00 @ 0000000100000000 (usable)
      software IO TLB: mapped [mem 0xcbffc000-0xcfffc000] (64MB)
      
      in the bootstrap log and removes failures like these:
      
      defxx 0000:02:00.0: dma_direct_map_page: overflow 0x0000000185bc6080+4608 of device mask ffffffff bus mask 0
      fddi0: Receive buffer allocation failed
      fddi0: Adapter open failed!
      IP-Config: Failed to open fddi0
      defxx 0000:09:08.0: dma_direct_map_page: overflow 0x0000000185bc6080+4608 of device mask ffffffff bus mask 0
      fddi1: Receive buffer allocation failed
      fddi1: Adapter open failed!
      IP-Config: Failed to open fddi1
      
      when memory beyond 4GiB is handed out to devices that can only do 32-bit
      addressing.
      
      This updates commit cce335ae ("[MIPS] 64-bit Sibyte kernels need
      DMA32.").
      
      References:
      
      [1] "BCM1250/BCM1125/BCM1125H User Manual", Revision 1250_1125-UM100-R,
          Broadcom Corporation, 21 Oct 2002, Section 3: "System Overview",
          "Memory Map", pp. 34-38
      
      [2] "BCM91250A User Manual", Revision 91250A-UM100-R, Broadcom
          Corporation, 18 May 2004, Section 3: "Physical Description",
          "Supported DRAM", p. 23
      Signed-off-by: default avatarMaciej W. Rozycki <macro@linux-mips.org>
      [paul.burton@mips.com: Remove GPL text from dma.c; SPDX tag covers it]
      Signed-off-by: default avatarPaul Burton <paul.burton@mips.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Patchwork: https://patchwork.linux-mips.org/patch/21108/
      References: cce335ae ("[MIPS] 64-bit Sibyte kernels need DMA32.")
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: linux-mips@linux-mips.org
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      8ac4ad06
    • Takashi Sakamoto's avatar
      ALSA: oxfw: add support for APOGEE duet FireWire · 1871002d
      Takashi Sakamoto authored
      [ Upstream commit fba43f45 ]
      
      This commit adds support for APOGEE duet FireWire, launched 2007, already
      discontinued. This model uses Oxford Semiconductor FW971 as its
      communication engine. Below is information on Configuration ROM of this
      unit. The unit supports some AV/C commands defined by Audio subunit
      specification and vendor dependent commands.
      
      $ ./hinawa-config-rom-printer /dev/fw1
      { 'bus-info': { 'adj': False,
                      'bmc': False,
                      'chip_ID': 42949742248,
                      'cmc': False,
                      'cyc_clk_acc': 255,
                      'generation': 0,
                      'imc': False,
                      'isc': True,
                      'link_spd': 3,
                      'max_ROM': 0,
                      'max_rec': 64,
                      'name': '1394',
                      'node_vendor_ID': 987,
                      'pmc': False},
        'root-directory': [ ['VENDOR', 987],
                            ['DESCRIPTOR', 'Apogee Electronics'],
                            ['MODEL', 122333],
                            ['DESCRIPTOR', 'Duet'],
                            [ 'NODE_CAPABILITIES',
                              { 'addressing': {'64': True, 'fix': True, 'prv': False},
                                'misc': {'int': False, 'ms': False, 'spt': True},
                                'state': { 'atn': False,
                                           'ded': False,
                                           'drq': True,
                                           'elo': False,
                                           'init': False,
                                           'lst': True,
                                           'off': False},
                                'testing': {'bas': False, 'ext': False}}],
                            [ 'UNIT',
                              [ ['SPECIFIER_ID', 41005],
                                ['VERSION', 65537],
                                ['MODEL', 122333],
                                ['DESCRIPTOR', 'Duet']]]]}
      Signed-off-by: default avatarTakashi Sakamoto <o-takashi@sakamocchi.jp>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      1871002d
    • Anders Roxell's avatar
      serial: set suppress_bind_attrs flag only if builtin · e9770160
      Anders Roxell authored
      [ Upstream commit 64609794 ]
      
      When the test 'CONFIG_DEBUG_TEST_DRIVER_REMOVE=y' is enabled,
      arch_initcall(pl011_init) came before subsys_initcall(default_bdi_init).
      devtmpfs gets killed because we try to remove a file and decrement the
      wb reference count before the noop_backing_device_info gets initialized.
      
      [    0.332075] Serial: AMBA PL011 UART driver
      [    0.485276] 9000000.pl011: ttyAMA0 at MMIO 0x9000000 (irq = 39, base_baud = 0) is a PL011 rev1
      [    0.502382] console [ttyAMA0] enabled
      [    0.515710] Unable to handle kernel paging request at virtual address 0000800074c12000
      [    0.516053] Mem abort info:
      [    0.516222]   ESR = 0x96000004
      [    0.516417]   Exception class = DABT (current EL), IL = 32 bits
      [    0.516641]   SET = 0, FnV = 0
      [    0.516826]   EA = 0, S1PTW = 0
      [    0.516984] Data abort info:
      [    0.517149]   ISV = 0, ISS = 0x00000004
      [    0.517339]   CM = 0, WnR = 0
      [    0.517553] [0000800074c12000] user address but active_mm is swapper
      [    0.517928] Internal error: Oops: 96000004 [#1] PREEMPT SMP
      [    0.518305] Modules linked in:
      [    0.518839] CPU: 0 PID: 13 Comm: kdevtmpfs Not tainted 4.19.0-rc5-next-20180928-00002-g2ba39ab0cd01-dirty #82
      [    0.519307] Hardware name: linux,dummy-virt (DT)
      [    0.519681] pstate: 80000005 (Nzcv daif -PAN -UAO)
      [    0.519959] pc : __destroy_inode+0x94/0x2a8
      [    0.520212] lr : __destroy_inode+0x78/0x2a8
      [    0.520401] sp : ffff0000098c3b20
      [    0.520590] x29: ffff0000098c3b20 x28: 00000000087a3714
      [    0.520904] x27: 0000000000002000 x26: 0000000000002000
      [    0.521179] x25: ffff000009583000 x24: 0000000000000000
      [    0.521467] x23: ffff80007bb52000 x22: ffff80007bbaa7c0
      [    0.521737] x21: ffff0000093f9338 x20: 0000000000000000
      [    0.522033] x19: ffff80007bbb05d8 x18: 0000000000000400
      [    0.522376] x17: 0000000000000000 x16: 0000000000000000
      [    0.522727] x15: 0000000000000400 x14: 0000000000000400
      [    0.523068] x13: 0000000000000001 x12: 0000000000000001
      [    0.523421] x11: 0000000000000000 x10: 0000000000000970
      [    0.523749] x9 : ffff0000098c3a60 x8 : ffff80007bbab190
      [    0.524017] x7 : ffff80007bbaa880 x6 : 0000000000000c88
      [    0.524305] x5 : ffff0000093d96c8 x4 : 61c8864680b583eb
      [    0.524567] x3 : ffff0000093d6180 x2 : ffffffffffffffff
      [    0.524872] x1 : 0000800074c12000 x0 : 0000800074c12000
      [    0.525207] Process kdevtmpfs (pid: 13, stack limit = 0x(____ptrval____))
      [    0.525529] Call trace:
      [    0.525806]  __destroy_inode+0x94/0x2a8
      [    0.526108]  destroy_inode+0x34/0x88
      [    0.526370]  evict+0x144/0x1c8
      [    0.526636]  iput+0x184/0x230
      [    0.526871]  dentry_unlink_inode+0x118/0x130
      [    0.527152]  d_delete+0xd8/0xe0
      [    0.527420]  vfs_unlink+0x240/0x270
      [    0.527665]  handle_remove+0x1d8/0x330
      [    0.527875]  devtmpfsd+0x138/0x1c8
      [    0.528085]  kthread+0x14c/0x158
      [    0.528291]  ret_from_fork+0x10/0x18
      [    0.528720] Code: 92800002 aa1403e0 d538d081 8b010000 (c85f7c04)
      [    0.529367] ---[ end trace 5a3dee47727f877c ]---
      
      Rework to set suppress_bind_attrs flag to avoid removing the device when
      CONFIG_DEBUG_TEST_DRIVER_REMOVE=y. This applies for pic32_uart and
      xilinx_uartps as well.
      Co-developed-by: default avatarArnd Bergmann <arnd@arndb.de>
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Signed-off-by: default avatarAnders Roxell <anders.roxell@linaro.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      e9770160
    • Anders Roxell's avatar
      writeback: don't decrement wb->refcnt if !wb->bdi · db16eb55
      Anders Roxell authored
      [ Upstream commit 347a28b5 ]
      
      This happened while running in qemu-system-aarch64, the AMBA PL011 UART
      driver when enabling CONFIG_DEBUG_TEST_DRIVER_REMOVE.
      arch_initcall(pl011_init) came before subsys_initcall(default_bdi_init),
      devtmpfs' handle_remove() crashes because the reference count is a NULL
      pointer only because wb->bdi hasn't been initialized yet.
      
      Rework so that wb_put have an extra check if wb->bdi before decrement
      wb->refcnt and also add a WARN_ON_ONCE to get a warning if it happens again
      in other drivers.
      
      Fixes: 52ebea74 ("writeback: make backing_dev_info host cgroup-specific bdi_writebacks")
      Co-developed-by: default avatarArnd Bergmann <arnd@arndb.de>
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Signed-off-by: default avatarAnders Roxell <anders.roxell@linaro.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      db16eb55
    • Miroslav Lichvar's avatar
      e1000e: allow non-monotonic SYSTIM readings · de390c26
      Miroslav Lichvar authored
      [ Upstream commit e1f65b0d ]
      
      It seems with some NICs supported by the e1000e driver a SYSTIM reading
      may occasionally be few microseconds before the previous reading and if
      enabled also pass e1000e_sanitize_systim() without reaching the maximum
      number of rereads, even if the function is modified to check three
      consecutive readings (i.e. it doesn't look like a double read error).
      This causes an underflow in the timecounter and the PHC time jumps hours
      ahead.
      
      This was observed on 82574, I217 and I219. The fastest way to reproduce
      it is to run a program that continuously calls the PTP_SYS_OFFSET ioctl
      on the PHC.
      
      Modify e1000e_phc_gettime() to use timecounter_cyc2time() instead of
      timecounter_read() in order to allow non-monotonic SYSTIM readings and
      prevent the PHC from jumping.
      
      Cc: Richard Cochran <richardcochran@gmail.com>
      Signed-off-by: default avatarMiroslav Lichvar <mlichvar@redhat.com>
      Acked-by: default avatarJacob Keller <jacob.e.keller@intel.com>
      Tested-by: default avatarAaron Brown <aaron.f.brown@intel.com>
      Signed-off-by: default avatarJeff Kirsher <jeffrey.t.kirsher@intel.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      de390c26
    • João Paulo Rechi Vita's avatar
      platform/x86: asus-wmi: Tell the EC the OS will handle the display off hotkey · dfc711a4
      João Paulo Rechi Vita authored
      [ Upstream commit 78f3ac76 ]
      
      In the past, Asus firmwares would change the panel backlight directly
      through the EC when the display off hotkey (Fn+F7) was pressed, and
      only notify the OS of such change, with 0x33 when the LCD was ON and
      0x34 when the LCD was OFF. These are currently mapped to
      KEY_DISPLAYTOGGLE and KEY_DISPLAY_OFF, respectively.
      
      Most recently the EC on Asus most machines lost ability to toggle the
      LCD backlight directly, but unless the OS informs the firmware it is
      going to handle the display toggle hotkey events, the firmware still
      tries change the brightness through the EC, to no effect. The end result
      is a long list (at Endless we counted 11) of Asus laptop models where
      the display toggle hotkey does not perform any action. Our firmware
      engineers contacts at Asus were surprised that there were still machines
      out there with the old behavior.
      
      Calling WMNB(ASUS_WMI_DEVID_BACKLIGHT==0x00050011, 2) on the _WDG device
      tells the firmware that it should let the OS handle the display toggle
      event, in which case it will simply notify the OS of a key press with
      0x35, as shown by the DSDT excerpts bellow.
      
       Scope (_SB)
       {
           (...)
      
           Device (ATKD)
           {
               (...)
      
               Name (_WDG, Buffer (0x28)
               {
                   /* 0000 */  0xD0, 0x5E, 0x84, 0x97, 0x6D, 0x4E, 0xDE, 0x11,
                   /* 0008 */  0x8A, 0x39, 0x08, 0x00, 0x20, 0x0C, 0x9A, 0x66,
                   /* 0010 */  0x4E, 0x42, 0x01, 0x02, 0x35, 0xBB, 0x3C, 0x0B,
                   /* 0018 */  0xC2, 0xE3, 0xED, 0x45, 0x91, 0xC2, 0x4C, 0x5A,
                   /* 0020 */  0x6D, 0x19, 0x5D, 0x1C, 0xFF, 0x00, 0x01, 0x08
               })
               Method (WMNB, 3, Serialized)
               {
                   CreateDWordField (Arg2, Zero, IIA0)
                   CreateDWordField (Arg2, 0x04, IIA1)
                   Local0 = (Arg1 & 0xFFFFFFFF)
      
                   (...)
      
                   If ((Local0 == 0x53564544))
                   {
                       (...)
      
                       If ((IIA0 == 0x00050011))
                       {
                           If ((IIA1 == 0x02))
                           {
                               ^^PCI0.SBRG.EC0.SPIN (0x72, One)
                               ^^PCI0.SBRG.EC0.BLCT = One
                           }
      
                           Return (One)
                       }
                   }
                   (...)
               }
               (...)
           }
           (...)
       }
       (...)
      
       Scope (_SB.PCI0.SBRG.EC0)
       {
           (...)
      
           Name (BLCT, Zero)
      
           (...)
      
           Method (_Q10, 0, NotSerialized)  // _Qxx: EC Query
           {
               If ((BLCT == Zero))
               {
                   Local0 = One
                   Local0 = RPIN (0x72)
                   Local0 ^= One
                   SPIN (0x72, Local0)
                   If (ATKP)
                   {
                       Local0 = (0x34 - Local0)
                       ^^^^ATKD.IANE (Local0)
                   }
               }
               ElseIf ((BLCT == One))
               {
                   If (ATKP)
                   {
                       ^^^^ATKD.IANE (0x35)
                   }
               }
           }
           (...)
       }
      Signed-off-by: default avatarJoão Paulo Rechi Vita <jprvita@endlessm.com>
      Signed-off-by: default avatarAndy Shevchenko <andriy.shevchenko@linux.intel.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      dfc711a4
    • David Ahern's avatar
      ipv6: Take rcu_read_lock in __inet6_bind for mapped addresses · 2af7450e
      David Ahern authored
      [ Upstream commit d4a7e9bb ]
      
      I realized the last patch calls dev_get_by_index_rcu in a branch not
      holding the rcu lock. Add the calls to rcu_read_lock and rcu_read_unlock.
      
      Fixes: ec90ad33 ("ipv6: Consider sk_bound_dev_if when binding a socket to a v4 mapped address")
      Signed-off-by: default avatarDavid Ahern <dsahern@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2af7450e
    • David Ahern's avatar
      ipv6: Consider sk_bound_dev_if when binding a socket to a v4 mapped address · dbbbd01e
      David Ahern authored
      [ Upstream commit ec90ad33 ]
      
      Similar to c5ee0663 ("ipv6: Consider sk_bound_dev_if when binding a
      socket to an address"), binding a socket to v4 mapped addresses needs to
      consider if the socket is bound to a device.
      
      This problem also exists from the beginning of git history.
      Signed-off-by: default avatarDavid Ahern <dsahern@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      dbbbd01e
    • Kai-Heng Feng's avatar
      r8169: Add support for new Realtek Ethernet · eb0e073f
      Kai-Heng Feng authored
      [ Upstream commit 36352991 ]
      
      There are two new Realtek Ethernet devices which are re-branded r8168h.
      Add the IDs to to support them.
      Signed-off-by: default avatarKai-Heng Feng <kai.heng.feng@canonical.com>
      Reviewed-by: default avatarHeiner Kallweit <hkallweit1@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      eb0e073f
  2. 23 Jan, 2019 7 commits
    • Greg Kroah-Hartman's avatar
      Linux 4.9.152 · ef50e305
      Greg Kroah-Hartman authored
      ef50e305
    • Jan Kara's avatar
      nbd: Use set_blocksize() to set device blocksize · f3fc8899
      Jan Kara authored
      commit c8a83a6b upstream.
      
      NBD can update block device block size implicitely through
      bd_set_size(). Make it explicitely set blocksize with set_blocksize() as
      this behavior of bd_set_size() is going away.
      
      CC: Josef Bacik <jbacik@fb.com>
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f3fc8899
    • Josef Bacik's avatar
      nbd: set the logical and physical blocksize properly · eb108751
      Josef Bacik authored
      commit e544541b upstream.
      
      We noticed when trying to do O_DIRECT to an export on the server side
      that we were getting requests smaller than the 4k sectorsize of the
      device.  This is because the client isn't setting the logical and
      physical blocksizes properly for the underlying device.  Fix this up by
      setting the queue blocksizes and then calling bd_set_size.
      Signed-off-by: default avatarJosef Bacik <jbacik@fb.com>
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      eb108751
    • Mauro Carvalho Chehab's avatar
      media: vb2: be sure to unlock mutex on errors · ef32aca7
      Mauro Carvalho Chehab authored
      commit c06ef2e9 upstream.
      
      As reported by smatch:
      drivers/media/common/videobuf2/videobuf2-core.c: drivers/media/common/videobuf2/videobuf2-core.c:2159 vb2_mmap() warn: inconsistent returns 'mutex:&q->mmap_lock'.
        Locked on:   line 2148
        Unlocked on: line 2100
                     line 2108
                     line 2113
                     line 2118
                     line 2156
                     line 2159
      
      There is one error condition that doesn't unlock a mutex.
      
      Fixes: cd26d1c4 ("media: vb2: vb2_mmap: move lock up")
      Reviewed-by: default avatarHans Verkuil <hverkuil-cisco@xs4all.nl>
      Signed-off-by: default avatarMauro Carvalho Chehab <mchehab+samsung@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ef32aca7
    • Michal Hocko's avatar
      mm, memcg: fix reclaim deadlock with writeback · 5cf3e5ff
      Michal Hocko authored
      commit 63f3655f upstream.
      
      Liu Bo has experienced a deadlock between memcg (legacy) reclaim and the
      ext4 writeback
      
        task1:
          wait_on_page_bit+0x82/0xa0
          shrink_page_list+0x907/0x960
          shrink_inactive_list+0x2c7/0x680
          shrink_node_memcg+0x404/0x830
          shrink_node+0xd8/0x300
          do_try_to_free_pages+0x10d/0x330
          try_to_free_mem_cgroup_pages+0xd5/0x1b0
          try_charge+0x14d/0x720
          memcg_kmem_charge_memcg+0x3c/0xa0
          memcg_kmem_charge+0x7e/0xd0
          __alloc_pages_nodemask+0x178/0x260
          alloc_pages_current+0x95/0x140
          pte_alloc_one+0x17/0x40
          __pte_alloc+0x1e/0x110
          alloc_set_pte+0x5fe/0xc20
          do_fault+0x103/0x970
          handle_mm_fault+0x61e/0xd10
          __do_page_fault+0x252/0x4d0
          do_page_fault+0x30/0x80
          page_fault+0x28/0x30
      
        task2:
          __lock_page+0x86/0xa0
          mpage_prepare_extent_to_map+0x2e7/0x310 [ext4]
          ext4_writepages+0x479/0xd60
          do_writepages+0x1e/0x30
          __writeback_single_inode+0x45/0x320
          writeback_sb_inodes+0x272/0x600
          __writeback_inodes_wb+0x92/0xc0
          wb_writeback+0x268/0x300
          wb_workfn+0xb4/0x390
          process_one_work+0x189/0x420
          worker_thread+0x4e/0x4b0
          kthread+0xe6/0x100
          ret_from_fork+0x41/0x50
      
      He adds
       "task1 is waiting for the PageWriteback bit of the page that task2 has
        collected in mpd->io_submit->io_bio, and tasks2 is waiting for the
        LOCKED bit the page which tasks1 has locked"
      
      More precisely task1 is handling a page fault and it has a page locked
      while it charges a new page table to a memcg.  That in turn hits a
      memory limit reclaim and the memcg reclaim for legacy controller is
      waiting on the writeback but that is never going to finish because the
      writeback itself is waiting for the page locked in the #PF path.  So
      this is essentially ABBA deadlock:
      
                                              lock_page(A)
                                              SetPageWriteback(A)
                                              unlock_page(A)
        lock_page(B)
                                              lock_page(B)
        pte_alloc_pne
          shrink_page_list
            wait_on_page_writeback(A)
                                              SetPageWriteback(B)
                                              unlock_page(B)
      
                                              # flush A, B to clear the writeback
      
      This accumulating of more pages to flush is used by several filesystems
      to generate a more optimal IO patterns.
      
      Waiting for the writeback in legacy memcg controller is a workaround for
      pre-mature OOM killer invocations because there is no dirty IO
      throttling available for the controller.  There is no easy way around
      that unfortunately.  Therefore fix this specific issue by pre-allocating
      the page table outside of the page lock.  We have that handy
      infrastructure for that already so simply reuse the fault-around pattern
      which already does this.
      
      There are probably other hidden __GFP_ACCOUNT | GFP_KERNEL allocations
      from under a fs page locked but they should be really rare.  I am not
      aware of a better solution unfortunately.
      
      [akpm@linux-foundation.org: fix mm/memory.c:__do_fault()]
      [akpm@linux-foundation.org: coding-style fixes]
      [mhocko@kernel.org: enhance comment, per Johannes]
        Link: http://lkml.kernel.org/r/20181214084948.GA5624@dhcp22.suse.cz
      Link: http://lkml.kernel.org/r/20181213092221.27270-1-mhocko@kernel.org
      Fixes: c3b94f44 ("memcg: further prevent OOM with too many dirty pages")
      Signed-off-by: default avatarMichal Hocko <mhocko@suse.com>
      Reported-by: default avatarLiu Bo <bo.liu@linux.alibaba.com>
      Debugged-by: default avatarLiu Bo <bo.liu@linux.alibaba.com>
      Acked-by: default avatarKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: default avatarJohannes Weiner <hannes@cmpxchg.org>
      Reviewed-by: default avatarLiu Bo <bo.liu@linux.alibaba.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
      Cc: Shakeel Butt <shakeelb@google.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5cf3e5ff
    • Ivan Mironov's avatar
      drm/fb-helper: Ignore the value of fb_var_screeninfo.pixclock · a5a0bcbc
      Ivan Mironov authored
      commit 66a8d5bf upstream.
      
      Strict requirement of pixclock to be zero breaks support of SDL 1.2
      which contains hardcoded table of supported video modes with non-zero
      pixclock values[1].
      
      To better understand which pixclock values are considered valid and how
      driver should handle these values, I briefly examined few existing fbdev
      drivers and documentation in Documentation/fb/. And it looks like there
      are no strict rules on that and actual behaviour varies:
      
      	* some drivers treat (pixclock == 0) as "use defaults" (uvesafb.c);
      	* some treat (pixclock == 0) as invalid value which leads to
      	  -EINVAL (clps711x-fb.c);
      	* some pass converted pixclock value to hardware (uvesafb.c);
      	* some are trying to find nearest value from predefined table
                (vga16fb.c, video_gx.c).
      
      Given this, I believe that it should be safe to just ignore this value if
      changing is not supported. It seems that any portable fbdev application
      which was not written only for one specific device working under one
      specific kernel version should not rely on any particular behaviour of
      pixclock anyway.
      
      However, while enabling SDL1 applications to work out of the box when
      there is no /etc/fb.modes with valid settings, this change affects the
      video mode choosing logic in SDL. Depending on current screen
      resolution, contents of /etc/fb.modes and resolution requested by
      application, this may lead to user-visible difference (not always):
      image will be displayed in a right way, but it will be aligned to the
      left instead of center. There is no "right behaviour" here as well, as
      emulated fbdev, opposing to old fbdev drivers, simply ignores any
      requsts of video mode changes with resolutions smaller than current.
      
      The easiest way to reproduce this problem is to install sdl-sopwith[2],
      remove /etc/fb.modes file if it exists, and then try to run sopwith
      from console without X. At least in Fedora 29, sopwith may be simply
      installed from standard repositories.
      
      [1] SDL 1.2.15 source code, src/video/fbcon/SDL_fbvideo.c, vesa_timings
      [2] http://sdl-sopwith.sourceforge.net/Signed-off-by: default avatarIvan Mironov <mironov.ivan@gmail.com>
      Cc: stable@vger.kernel.org
      Fixes: 79e53945 ("DRM: i915: add mode setting support")
      Fixes: 771fe6b9 ("drm/radeon: introduce kernel modesetting for radeon hardware")
      Fixes: 785b93ef ("drm/kms: move driver specific fb common code to helper functions (v2)")
      Signed-off-by: default avatarDaniel Vetter <daniel.vetter@ffwll.ch>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190108072353.28078-3-mironov.ivan@gmail.comSigned-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      
      a5a0bcbc
    • Tetsuo Handa's avatar
      loop: Fix double mutex_unlock(&loop_ctl_mutex) in loop_control_ioctl() · 5d3cf501
      Tetsuo Handa authored
      commit 628bd859 upstream.
      
      Commit 0a42e99b ("loop: Get rid of loop_index_mutex") forgot to
      remove mutex_unlock(&loop_ctl_mutex) from loop_control_ioctl() when
      replacing loop_index_mutex with loop_ctl_mutex.
      
      Fixes: 0a42e99b ("loop: Get rid of loop_index_mutex")
      Reported-by: default avatarsyzbot <syzbot+c0138741c2290fc5e63f@syzkaller.appspotmail.com>
      Reviewed-by: default avatarMing Lei <ming.lei@redhat.com>
      Reviewed-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarTetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5d3cf501