1. 20 Aug, 2020 2 commits
    • io_uring: comment on kfree(iovec) checks · f261c168
      Pavel Begunkov authored
      kfree() handles NULL pointers well, but io_{read,write}() checks it
      because of performance reasons. Leave a comment there for those who are
      tempted to patch it.
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
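A userspace sketch of the pattern this commit documents, with free() standing in for kfree(). The helper names (counted_free, alloc_iovecs, release_checked, release_unchecked) are hypothetical and are not taken from the io_uring source; the call counter exists only to make the saved call visible.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/uio.h>

/* Counts calls into the free path (illustration only). */
static int free_calls;

static void counted_free(void *p)
{
	free_calls++;
	free(p);	/* free(NULL) is a no-op, as kfree(NULL) is */
}

/* Hypothetical helper: allocate an array of n iovecs. */
static struct iovec *alloc_iovecs(size_t n)
{
	assert(n > 0);
	return calloc(n, sizeof(struct iovec));
}

/* Correct, but always pays for the function call. */
static void release_unchecked(struct iovec *iov)
{
	counted_free(iov);
}

/*
 * Equally correct. The NULL test is redundant for safety and is kept
 * only to skip the call when nothing was allocated.
 */
static void release_checked(struct iovec *iov)
{
	if (iov)
		counted_free(iov);
}
```

Both variants behave identically for the caller; the checked one simply avoids entering the free path when the pointer is NULL, which is presumably the case the commit's comment is protecting.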
    • io_uring: fix racy req->flags modification · bb175342
      Pavel Begunkov authored
      Setting and clearing REQ_F_OVERFLOW in io_uring_cancel_files() and
      io_cqring_overflow_flush() are racy, because they might be called
      asynchronously.
      
      REQ_F_OVERFLOW flag is only needed for files cancellation, so if it can
      be guaranteed that requests _currently_ marked inflight can't be
      overflown, the problem will be solved with removing the flag
      altogether.
      
      That's how the patch works, it removes inflight status of a request
      in io_cqring_fill_event() whenever it should be thrown into CQ-overflow
      list. That's Ok to do, because no opcode specific handling can be done
      after io_cqring_fill_event(), the same assumption as with "struct
      io_completion" patches.
      And there is already a good place for such cleanups, which is
      io_clean_op(). A nice side effect of this is removing this inflight
      check from the hot path.
      
      note on synchronisation: now __io_cqring_fill_event() may be taking two
      spinlocks simultaneously, completion_lock and inflight_lock. It's fine,
      because we never do that in reverse order, and CQ-overflow of inflight
      requests shouldn't happen often.
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
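The synchronisation note in this commit relies on a fixed lock order. The sketch below illustrates that idea in userspace with pthread mutexes in place of kernel spinlocks. Only the lock names completion_lock and inflight_lock come from the commit message; the counters and function names are assumptions made for illustration.

```c
#include <assert.h>
#include <pthread.h>

static pthread_mutex_t completion_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t inflight_lock = PTHREAD_MUTEX_INITIALIZER;

static int inflight_count = 1;	/* requests currently marked inflight */
static int overflow_count;	/* requests parked on the overflow list */

/* Overflow path: completion_lock is the outer lock, inflight_lock the inner. */
static void fill_event_overflow(void)
{
	pthread_mutex_lock(&completion_lock);
	overflow_count++;

	pthread_mutex_lock(&inflight_lock);
	assert(inflight_count > 0);
	inflight_count--;	/* drop inflight status before parking */
	pthread_mutex_unlock(&inflight_lock);

	pthread_mutex_unlock(&completion_lock);
}

/* Any other path takes inflight_lock on its own, never around completion_lock. */
static int read_inflight(void)
{
	int n;

	pthread_mutex_lock(&inflight_lock);
	n = inflight_count;
	pthread_mutex_unlock(&inflight_lock);
	return n;
}
```

Because no path acquires completion_lock while already holding inflight_lock, the two locks cannot deadlock against each other.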
  2. 19 Aug, 2020 1 commit
    • io_uring: use system_unbound_wq for ring exit work · fc666777
      Jens Axboe authored
      We currently use system_wq, which is unbounded in terms of number of
      workers. This means that if we're exiting tons of rings at the same
      time, then we'll briefly spawn tons of event kworkers just for a very
      short blocking time as the rings exit.
      
      Use system_unbound_wq instead, which has a sane cap on the concurrency
      level.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  3. 18 Aug, 2020 1 commit
    • io_uring: cleanup io_import_iovec() of pre-mapped request · 8452fd0c
      Jens Axboe authored
      io_rw_prep_async() goes through a dance of clearing req->io, calling
      the iovec import, then re-setting req->io. Provide an internal helper
      that does the right thing without needing state tweaked to get there.
      
      This enables further cleanups in io_read, io_write, and
      io_resubmit_prep(), but that's left for another time.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  4. 16 Aug, 2020 8 commits
    • io_uring: get rid of kiocb_wait_page_queue_init() · 3b2a4439
      Jens Axboe authored
      The 5.9 merge moved this function into io_uring, which means that we don't
      need to retain the generic nature of it. Clean up this part by removing
      redundant checks, and just inlining the small remainder in
      io_rw_should_retry().
      
      No functional changes in this patch.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • io_uring: find and cancel head link async work on files exit · b711d4ea
      Jens Axboe authored
      Commit f254ac04 ("io_uring: enable lookup of links holding inflight files")
      only handled 2 out of the three head link cases we have, we also need to
      lookup and cancel work that is blocked in io-wq if that work has a link
      that's holding a reference to the files structure.
      
      Put the "cancel head links that hold this request pending" logic into
      io_attempt_cancel(), which will go through the motions of finding and
      canceling head links that hold the current inflight files stable request
      pending.
      
      Cc: stable@vger.kernel.org
      Reported-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • Linux 5.9-rc1 · 9123e3a7
      Linus Torvalds authored
    • Merge tag 'io_uring-5.9-2020-08-15' of git://git.kernel.dk/linux-block · 2cc3c4b3
      Linus Torvalds authored
      Pull io_uring fixes from Jens Axboe:
       "A few different things in here.
      
        Seems like syzbot got some more io_uring bits wired up, and we got a
        handful of reports and the associated fixes are in here.
      
        General fixes too, and a lot of them marked for stable.
      
        Lastly, a bit of fallout from the async buffered reads, where we now
        more easily trigger short reads. Some applications don't really like
        that, so the io_read() code now handles short reads internally, and
        got a cleanup along the way so that it's now easier to read (and
        documented). We're now passing tests that failed before"
      
      * tag 'io_uring-5.9-2020-08-15' of git://git.kernel.dk/linux-block:
        io_uring: short circuit -EAGAIN for blocking read attempt
        io_uring: sanitize double poll handling
        io_uring: internally retry short reads
        io_uring: retain iov_iter state over io_read/io_write calls
        task_work: only grab task signal lock when needed
        io_uring: enable lookup of links holding inflight files
        io_uring: fail poll arm on queue proc failure
        io_uring: hold 'ctx' reference around task_work queue + execute
        fs: RWF_NOWAIT should imply IOCB_NOIO
        io_uring: defer file table grabbing request cleanup for locked requests
        io_uring: add missing REQ_F_COMP_LOCKED for nested requests
        io_uring: fix recursive completion locking on oveflow flush
        io_uring: use TWA_SIGNAL for task_work uncondtionally
        io_uring: account locked memory before potential error case
        io_uring: set ctx sq/cq entry count earlier
        io_uring: Fix NULL pointer dereference in loop_rw_iter()
        io_uring: add comments on how the async buffered read retry works
        io_uring: io_async_buf_func() need not test page bit
    • parisc: fix PMD pages allocation by restoring pmd_alloc_one() · 6f6aea7e
      Mike Rapoport authored
      Commit 1355c31e ("asm-generic: pgalloc: provide generic pmd_alloc_one()
      and pmd_free_one()") converted parisc to use generic version of
      pmd_alloc_one() but it missed the fact that parisc uses order-1 pages for
      PMD.
      
      Restore the original version of pmd_alloc_one() for parisc, just use
      GFP_PGTABLE_KERNEL that implies __GFP_ZERO instead of GFP_KERNEL and
      memset.
      
      Fixes: 1355c31e ("asm-generic: pgalloc: provide generic pmd_alloc_one() and pmd_free_one()")
      Reported-by: Meelis Roos <mroos@linux.ee>
      Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
      Tested-by: Meelis Roos <mroos@linux.ee>
      Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Link: https://lkml.kernel.org/r/9f2b5ebd-e4a4-0fa1-6cd3-4b9f6892d1ad@linux.ee
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Merge tag 'block-5.9-2020-08-14' of git://git.kernel.dk/linux-block · 4b6c093e
      Linus Torvalds authored
      Pull block fixes from Jens Axboe:
       "A few fixes on the block side of things:
      
         - Discard granularity fix (Coly)
      
         - rnbd cleanups (Guoqing)
      
         - md error handling fix (Dan)
      
         - md sysfs fix (Junxiao)
      
         - Fix flush request accounting, which caused an IO slowdown for some
           configurations (Ming)
      
         - Properly propagate loop flag for partition scanning (Lennart)"
      
      * tag 'block-5.9-2020-08-14' of git://git.kernel.dk/linux-block:
        block: fix double account of flush request's driver tag
        loop: unset GENHD_FL_NO_PART_SCAN on LOOP_CONFIGURE
        rnbd: no need to set bi_end_io in rnbd_bio_map_kern
        rnbd: remove rnbd_dev_submit_io
        md-cluster: Fix potential error pointer dereference in resize_bitmaps()
        block: check queue's limits.discard_granularity in __blkdev_issue_discard()
        md: get sysfs entry after redundancy attr group create
    • Merge tag 'riscv-for-linus-5.9-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux · d84835b1
      Linus Torvalds authored
      Pull RISC-V fix from Palmer Dabbelt:
       "I collected a single fix during the merge window: we managed to break
        the early trap setup on !MMU, this fixes it"
      
      * tag 'riscv-for-linus-5.9-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
        riscv: Setup exception vector for nommu platform
    • Merge tag 'sh-for-5.9' of git://git.libc.org/linux-sh · 5bbec3cf
      Linus Torvalds authored
      Pull arch/sh updates from Rich Felker:
       "Cleanup, SECCOMP_FILTER support, message printing fixes, and other
        changes to arch/sh"
      
      * tag 'sh-for-5.9' of git://git.libc.org/linux-sh: (34 commits)
        sh: landisk: Add missing initialization of sh_io_port_base
        sh: bring syscall_set_return_value in line with other architectures
        sh: Add SECCOMP_FILTER
        sh: Rearrange blocks in entry-common.S
        sh: switch to copy_thread_tls()
        sh: use the generic dma coherent remap allocator
        sh: don't allow non-coherent DMA for NOMMU
        dma-mapping: consolidate the NO_DMA definition in kernel/dma/Kconfig
        sh: unexport register_trapped_io and match_trapped_io_handler
        sh: don't include <asm/io_trapped.h> in <asm/io.h>
        sh: move the ioremap implementation out of line
        sh: move ioremap_fixed details out of <asm/io.h>
        sh: remove __KERNEL__ ifdefs from non-UAPI headers
        sh: sort the selects for SUPERH alphabetically
        sh: remove -Werror from Makefiles
        sh: Replace HTTP links with HTTPS ones
        arch/sh/configs: remove obsolete CONFIG_SOC_CAMERA*
        sh: stacktrace: Remove stacktrace_ops.stack()
        sh: machvec: Modernize printing of kernel messages
        sh: pci: Modernize printing of kernel messages
        ...
  5. 15 Aug, 2020 28 commits
    • io_uring: short circuit -EAGAIN for blocking read attempt · f91daf56
      Jens Axboe authored
      One case was missed in the short IO retry handling, and that's hitting
      -EAGAIN on a blocking attempt read (eg from io-wq context). This is a
      problem on sockets that are marked as non-blocking when created, they
      don't carry any REQ_F_NOWAIT information to help us terminate them
      instead of perpetually retrying.
      
      Fixes: 227c0c96 ("io_uring: internally retry short reads")
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
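The situation this commit describes can be reproduced in userspace: a socket marked non-blocking returns EAGAIN even when the caller meant the read to block, so a loop that retries on EAGAIN never terminates on its own. The sketch below is an illustration only, not the kernel fix; the function names are hypothetical. It retries only on EINTR and hands -EAGAIN back to the caller.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Create a connected socket pair whose read end (fds[0]) is non-blocking. */
static int make_pair_nonblocking_reader(int fds[2])
{
	int flags;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) < 0)
		return -errno;

	flags = fcntl(fds[0], F_GETFL);
	if (flags < 0 || fcntl(fds[0], F_SETFL, flags | O_NONBLOCK) < 0) {
		int err = errno;

		close(fds[0]);
		close(fds[1]);
		return -err;
	}
	return 0;
}

/*
 * One read attempt. Interrupted calls are restarted, but -EAGAIN is
 * returned to the caller instead of being retried forever.
 */
static ssize_t read_no_retry(int fd, void *buf, size_t len)
{
	ssize_t ret;

	assert(buf != NULL);
	do {
		ret = read(fd, buf, len);
	} while (ret < 0 && errno == EINTR);

	return ret < 0 ? -errno : ret;
}
```

Calling read_no_retry() on the empty non-blocking end yields -EAGAIN; once the peer has written data, the same call returns the byte count.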
    • io_uring: sanitize double poll handling · d4e7cd36
      Jens Axboe authored
      There's a bit of confusion on the matching pairs of poll vs double poll,
      depending on if the request is a pure poll (IORING_OP_POLL_ADD) or
      poll driven retry.
      
      Add io_poll_get_double() that returns the double poll waitqueue, if any,
      and io_poll_get_single() that returns the original poll waitqueue. With
      that, remove the argument to io_poll_remove_double().
      
      Finally ensure that wait->private is cleared once the double poll handler
      has run, so that remove knows it's already been seen.
      
      Cc: stable@vger.kernel.org # v5.8
      Reported-by: syzbot+7f617d4a9369028b8a2c@syzkaller.appspotmail.com
      Fixes: 18bceab1 ("io_uring: allow POLL_ADD with double poll_wait() users")
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • Merge tag 'perf-tools-2020-08-14' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux · 713eee84
      Linus Torvalds authored
      Pull more perf tools updates from Arnaldo Carvalho de Melo:
       "Fixes:
         - Fixes for 'perf bench numa'.
      
         - Always memset source before memcpy in 'perf bench mem'.
      
         - Quote CC and CXX for their arguments to fix build in environments
           using those variables to pass more than just the compiler names.
      
         - Fix module symbol processing, addressing regression detected via
           "perf test".
      
         - Allow multiple probes in record+script_probe_vfs_getname.sh 'perf
           test' entry.
      
        Improvements:
         - Add script to autogenerate socket family name id->string table from
           copy of kernel header, used so far in 'perf trace'.
      
         - 'perf ftrace' improvements to provide similar options for this
           utility so that one can go from 'perf record', 'perf trace', etc to
           'perf ftrace' just by changing the name of the subcommand.
      
         - Prefer new "sched:sched_waking" trace event when it exists in 'perf
           sched' post processing.
      
         - Update POWER9 metrics to utilize other metrics.
      
         - Fall back to querying debuginfod if debuginfo not found locally.
      
        Miscellaneous:
         - Sync various kvm headers with kernel sources"
      
      * tag 'perf-tools-2020-08-14' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (40 commits)
        perf ftrace: Make option description initials all capital letters
        perf build-ids: Fall back to debuginfod query if debuginfo not found
        perf bench numa: Remove dead code in parse_nodes_opt()
        perf stat: Update POWER9 metrics to utilize other metrics
        perf ftrace: Add change log
        perf: ftrace: Add set_tracing_options() to set all trace options
        perf ftrace: Add option --tid to filter by thread id
        perf ftrace: Add option -D/--delay to delay tracing
        perf: ftrace: Allow set graph depth by '--graph-opts'
        perf ftrace: Add support for trace option tracing_thresh
        perf ftrace: Add option 'verbose' to show more info for graph tracer
        perf ftrace: Add support for tracing option 'irq-info'
        perf ftrace: Add support for trace option funcgraph-irqs
        perf ftrace: Add support for trace option sleep-time
        perf ftrace: Add support for tracing option 'func_stack_trace'
        perf tools: Add general function to parse sublevel options
        perf ftrace: Add option '--inherit' to trace children processes
        perf ftrace: Show trace column header
        perf ftrace: Add option '-m/--buffer-size' to set per-cpu buffer size
        perf ftrace: Factor out function write_tracing_file_int()
        ...
    • Merge tag 'x86-urgent-2020-08-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 50f6c7db
      Linus Torvalds authored
      Pull x86 fixes from Ingo Molnar:
       "Misc fixes and small updates all around the place:
      
         - Fix mitigation state sysfs output
      
         - Fix an FPU xstate/xsave code assumption bug triggered by
           Architectural LBR support
      
         - Fix Lightning Mountain SoC TSC frequency enumeration bug
      
         - Fix kexec debug output
      
         - Fix kexec memory range assumption bug
      
         - Fix a boundary condition in the crash kernel code
      
         - Optimize purgatory.ro generation a bit
      
         - Enable ACRN guests to use X2APIC mode
      
         - Reduce a __text_poke() IRQs-off critical section for the benefit of
           PREEMPT_RT"
      
      * tag 'x86-urgent-2020-08-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/alternatives: Acquire pte lock with interrupts enabled
        x86/bugs/multihit: Fix mitigation reporting when VMX is not in use
        x86/fpu/xstate: Fix an xstate size check warning with architectural LBRs
        x86/purgatory: Don't generate debug info for purgatory.ro
        x86/tsr: Fix tsc frequency enumeration bug on Lightning Mountain SoC
        kexec_file: Correctly output debugging information for the PT_LOAD ELF header
        kexec: Improve & fix crash_exclude_mem_range() to handle overlapping ranges
        x86/crash: Correct the address boundary of function parameters
        x86/acrn: Remove redundant chars from ACRN signature
        x86/acrn: Allow ACRN guest to use X2APIC mode
    • Merge tag 'sched-urgent-2020-08-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 1195d58f
      Linus Torvalds authored
      Pull scheduler fixes from Ingo Molnar:
       "Two fixes: fix a new tracepoint's output value, and fix the formatting
        of show-state syslog printouts"
      
      * tag 'sched-urgent-2020-08-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        sched/debug: Fix the alignment of the show-state debug output
        sched: Fix use of count for nr_running tracepoint
    • Merge tag 'perf-urgent-2020-08-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 7f5faaaa
      Linus Torvalds authored
      Pull perf fixes from Ingo Molnar:
       "Misc fixes, an expansion of perf syscall access to CAP_PERFMON
        privileged tools, plus a RAPL HW-enablement for Intel SPR platforms"
      
      * tag 'perf-urgent-2020-08-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        perf/x86/rapl: Add support for Intel SPR platform
        perf/x86/rapl: Support multiple RAPL unit quirks
        perf/x86/rapl: Fix missing psys sysfs attributes
        hw_breakpoint: Remove unused __register_perf_hw_breakpoint() declaration
        kprobes: Remove show_registers() function prototype
        perf/core: Take over CAP_SYS_PTRACE creds to CAP_PERFMON capability
    • Merge tag 'locking-urgent-2020-08-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · eb1319af
      Linus Torvalds authored
      Pull locking fixlets from Ingo Molnar:
       "A documentation fix and a 'fallthrough' macro update"
      
      * tag 'locking-urgent-2020-08-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        futex: Convert to use the preferred 'fallthrough' macro
        Documentation/locking/locktypes: Fix a typo
    • Merge tag '9p-for-5.9-rc1' of git://github.com/martinetd/linux · 410520d0
      Linus Torvalds authored
      Pull 9p updates from Dominique Martinet:
      
       - some code cleanup
      
       - a couple of static analysis fixes
      
       - setattr: try to pick a fid associated with the file rather than the
         dentry, which might sometimes matter
      
      * tag '9p-for-5.9-rc1' of git://github.com/martinetd/linux:
        9p: Remove unneeded cast from memory allocation
        9p: remove unused code in 9p
        net/9p: Fix sparse endian warning in trans_fd.c
        9p: Fix memory leak in v9fs_mount
        9p: retrieve fid from file when file instance exist.
    • Merge tag '5.9-rc-smb3-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6 · f6513bd3
      Linus Torvalds authored
      Pull cifs fixes from Steve French:
       "Three small cifs/smb3 fixes, one for stable fixing mkdir path with
        the 'idsfromsid' mount option"
      
      * tag '5.9-rc-smb3-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6:
        SMB3: Fix mkdir when idsfromsid configured on mount
        cifs: Convert to use the fallthrough macro
        cifs: Fix an error pointer dereference in cifs_mount()
    • Merge tag 'nfs-for-5.9-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs · 37711e5e
      Linus Torvalds authored
      Pull NFS client updates from Trond Myklebust:
       "Stable fixes:
         - pNFS: Don't return layout segments that are being used for I/O
         - pNFS: Don't move layout segments off the active list when being used for I/O
      
        Features:
         - NFS: Add support for user xattrs through the NFSv4.2 protocol
         - NFS: Allow applications to speed up readdir+statx() using AT_STATX_DONT_SYNC
         - NFSv4.0 allow nconnect for v4.0
      
        Bugfixes and cleanups:
         - nfs: ensure correct writeback errors are returned on close()
         - nfs: nfs_file_write() should check for writeback errors
         - nfs: Fix getxattr kernel panic and memory overflow
         - NFS: Fix the pNFS/flexfiles mirrored read failover code
         - SUNRPC: dont update timeout value on connection reset
         - freezer: Add unsafe versions of freezable_schedule_timeout_interruptible for NFS
         - sunrpc: destroy rpc_inode_cachep after unregister_filesystem"
      
      * tag 'nfs-for-5.9-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (32 commits)
        NFS: Fix flexfiles read failover
        fs: nfs: delete repeated words in comments
        rpc_pipefs: convert comma to semicolon
        nfs: Fix getxattr kernel panic and memory overflow
        NFS: Don't return layout segments that are in use
        NFS: Don't move layouts to plh_return_segs list while in use
        NFS: Add layout segment info to pnfs read/write/commit tracepoints
        NFS: Add tracepoints for layouterror and layoutstats.
        NFS: Report the stateid + status in trace_nfs4_layoutreturn_on_close()
        SUNRPC dont update timeout value on connection reset
        nfs: nfs_file_write() should check for writeback errors
        nfs: ensure correct writeback errors are returned on close()
        NFSv4.2: xattr cache: get rid of cache discard work queue
        NFS: remove redundant initialization of variable result
        NFSv4.0 allow nconnect for v4.0
        freezer: Add unsafe versions of freezable_schedule_timeout_interruptible for NFS
        sunrpc: destroy rpc_inode_cachep after unregister_filesystem
        NFSv4.2: add client side xattr caching.
        NFSv4.2: hook in the user extended attribute handlers
        NFSv4.2: add the extended attribute proc functions.
        ...
    • Merge tag 'edac_updates_for_5.9_pt2' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras · 6ffdcde4
      Linus Torvalds authored
      Pull edac fix from Tony Luck:
       "Fix for the ie31200 driver that missed the first pull"
      
      * tag 'edac_updates_for_5.9_pt2' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
        EDAC/ie31200: Fallback if host bridge device is already initialized
    • Merge tag 'devicetree-fixes-for-5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux · b07175dc
      Linus Torvalds authored
      Pull devicetree fixes from Rob Herring:
       "Another round of 'allOf' removals and whitespace clean-ups of schemas"
      
      * tag 'devicetree-fixes-for-5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
        dt-bindings: Remove more cases of 'allOf' containing a '$ref'
        dt-bindings: Whitespace clean-ups in schema files
    • Merge tag 'acpi-5.9-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · 341323fa
      Linus Torvalds authored
      Pull more ACPI updates from Rafael Wysocki:
       "Add new hardware support to the ACPI driver for AMD SoCs, the x86 clk
        driver and the Designware i2c driver (changes from Akshu Agrawal and
        Pu Wen)"
      
      * tag 'acpi-5.9-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        clk: x86: Support RV architecture
        ACPI: APD: Add a fmw property is_raven
        clk: x86: Change name from ST to FCH
        ACPI: APD: Change name from ST to FCH
        i2c: designware: Add device HID for Hygon I2C controller
    • Merge tag 'pm-5.9-rc1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · 1a5d9dbb
      Linus Torvalds authored
      Pull one more power management update from Rafael Wysocki:
       "Modify the intel_pstate driver to allow it to work in the passive mode
        with hardware-managed P-states (HWP) enabled"
      
      * tag 'pm-5.9-rc1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        cpufreq: intel_pstate: Implement passive mode with HWP enabled
    • Merge tag 'mfd-next-5.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd · 884e0d3d
      Linus Torvalds authored
      Pull MFD updates from Lee Jones:
       "Core Frameworks
         - Make better attempt at matching device with the correct OF node
         - Allow batch removal of hierarchical sub-devices
      
        New Drivers
         - Add STM32 Clocksource driver
         - Add support for Khadas System Control Microcontroller
      
        Driver Removal
         - Remove unused driver for TI's SMSC ECE1099
      
        New Device Support
         - Add support for Intel Emmitsburg PCH to Intel LPSS PCI
         - Add support for Intel Tiger Lake PCH-H to Intel LPSS PCI
         - Add support for Dialog DA revision to Dialog DA9063
      
        New Functionality
         - Add support for AXP803 to be probed by I2C
      
        Fix-ups
         - Numerous W=1 warning fixes
         - Device Tree changes (stm32-lptimer, gateworks-gsc, khadas,mcu, stmfx, cros-ec, j721e-system-controller)
         - Enabled Regmap 'fast I/O' in stm32-lptimer
         - Change BUG_ON to WARN_ON in arizona-core
         - Remove superfluous code/initialisation (madera, max14577)
         - Trivial formatting/spelling issues (madera-core, madera-i2c, da9055, max77693-private)
         - Switch to of_platform_populate() in sprd-sc27xx-spi
         - Expand out set/get brightness/pwm macros in lm3533-ctrlbank
         - Disable IRQs on suspend in motorola-cpcap
         - Clean-up error handling in intel_soc_pmic_mrfld
         - Ensure correct removal order of sub-devices in madera
         - Many s/HTTP/HTTPS/ link changes
         - Ensure name used with Regmap is unique in syscon
      
        Bug Fixes
         - Properly 'put' clock on unbind and error in arizona-core
         - Fix revision handling in da9063
         - Fix 'assignment of read-only location' error in kempld-core
         - Avoid using the Regmap API when atomic in rn5t618
         - Redefine volatile register description in rn5t618
         - Use locking to protect event handler in dln2"
      
      * tag 'mfd-next-5.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd: (76 commits)
        mfd: syscon: Use a unique name with regmap_config
        mfd: Replace HTTP links with HTTPS ones
        mfd: dln2: Run event handler loop under spinlock
        mfd: madera: Improve handling of regulator unbinding
        mfd: mfd-core: Add mechanism for removal of a subset of children
        mfd: intel_soc_pmic_mrfld: Simplify the return expression of intel_scu_ipc_dev_iowrite8()
        mfd: max14577: Remove redundant initialization of variable current_bits
        mfd: rn5t618: Fix caching of battery related registers
        mfd: max77693-private: Drop a duplicated word
        mfd: da9055: pdata.h: Drop a duplicated word
        mfd: rn5t618: Make restart handler atomic safe
        mfd: kempld-core: Fix 'assignment of read-only location' error
        mfd: axp20x: Allow the AXP803 to be probed by I2C
        mfd: da9063: Add support for latest DA silicon revision
        mfd: da9063: Fix revision handling to correctly select reg tables
        dt-bindings: mfd: st,stmfx: Remove I2C unit name
        dt-bindings: mfd: ti,j721e-system-controller.yaml: Add J721e system controller
        mfd: motorola-cpcap: Disable interrupt for suspend
        mfd: smsc-ece1099: Remove driver
        mfd: core: Add OF_MFD_CELL_REG() helper
        ...
    • Merge branch 'akpm' (patches from Andrew) · 18737f42
      Linus Torvalds authored
      Merge more updates from Andrew Morton:
       "Subsystems affected by this patch series: mm/hotfixes, lz4, exec,
        mailmap, mm/thp, autofs, sysctl, mm/kmemleak, mm/misc and lib"
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (35 commits)
        virtio: pci: constify ioreadX() iomem argument (as in generic implementation)
        ntb: intel: constify ioreadX() iomem argument (as in generic implementation)
        rtl818x: constify ioreadX() iomem argument (as in generic implementation)
        iomap: constify ioreadX() iomem argument (as in generic implementation)
        sh: use generic strncpy()
        sh: clkfwk: remove r8/r16/r32
        include/asm-generic/vmlinux.lds.h: align ro_after_init
        mm: annotate a data race in page_zonenum()
        mm/swap.c: annotate data races for lru_rotate_pvecs
        mm/rmap: annotate a data race at tlb_flush_batched
        mm/mempool: fix a data race in mempool_free()
        mm/list_lru: fix a data race in list_lru_count_one
        mm/memcontrol: fix a data race in scan count
        mm/page_counter: fix various data races at memsw
        mm/swapfile: fix and annotate various data races
        mm/filemap.c: fix a data race in filemap_fault()
        mm/swap_state: mark various intentional data races
        mm/page_io: mark various intentional data races
        mm/frontswap: mark various intentional data races
        mm/kmemleak: silence KCSAN splats in checksum
        ...
    • virtio: pci: constify ioreadX() iomem argument (as in generic implementation) · fe0580ac
      Krzysztof Kozlowski authored
      The ioreadX() helpers have inconsistent interface.  On some architectures
      void *__iomem address argument is a pointer to const, on some not.
      
      Implementations of ioreadX() do not modify the memory under the address so
      they can be converted to a "const" version for const-safety and
      consistency among architectures.
      Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
      Cc: Allen Hubbe <allenbh@gmail.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Dave Jiang <dave.jiang@intel.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Jakub Kicinski <kuba@kernel.org>
      Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
      Cc: Jason Wang <jasowang@redhat.com>
      Cc: Jon Mason <jdmason@kudzu.us>
      Cc: Kalle Valo <kvalo@codeaurora.org>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Link: http://lkml.kernel.org/r/20200709072837.5869-5-krzk@kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      fe0580ac
    • Krzysztof Kozlowski's avatar
      ntb: intel: constify ioreadX() iomem argument (as in generic implementation) · 58184e95
      Krzysztof Kozlowski authored
      The ioreadX() helpers have an inconsistent interface.  On some architectures
      the void __iomem * address argument is a pointer to const, on others it is not.
      
      Implementations of ioreadX() do not modify the memory under the address so
      they can be converted to a "const" version for const-safety and
      consistency among architectures.
      Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
      Acked-by: Dave Jiang <dave.jiang@intel.com>
      Cc: Allen Hubbe <allenbh@gmail.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Jakub Kicinski <kuba@kernel.org>
      Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
      Cc: Jason Wang <jasowang@redhat.com>
      Cc: Jon Mason <jdmason@kudzu.us>
      Cc: Kalle Valo <kvalo@codeaurora.org>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Link: http://lkml.kernel.org/r/20200709072837.5869-4-krzk@kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      58184e95
    • Krzysztof Kozlowski's avatar
      rtl818x: constify ioreadX() iomem argument (as in generic implementation) · 5ca6ad7d
      Krzysztof Kozlowski authored
      The ioreadX() helpers have an inconsistent interface.  On some architectures
      the void __iomem * address argument is a pointer to const, on others it is not.
      
      Implementations of ioreadX() do not modify the memory under the address so
      they can be converted to a "const" version for const-safety and
      consistency among architectures.
      Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
      Acked-by: Kalle Valo <kvalo@codeaurora.org>
      Cc: Allen Hubbe <allenbh@gmail.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Dave Jiang <dave.jiang@intel.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Jakub Kicinski <kuba@kernel.org>
      Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
      Cc: Jason Wang <jasowang@redhat.com>
      Cc: Jon Mason <jdmason@kudzu.us>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Link: http://lkml.kernel.org/r/20200709072837.5869-3-krzk@kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      5ca6ad7d
    • Krzysztof Kozlowski's avatar
      iomap: constify ioreadX() iomem argument (as in generic implementation) · 8f28ca6b
      Krzysztof Kozlowski authored
      Patch series "iomap: Constify ioreadX() iomem argument", v3.
      
      The ioread8/16/32() helpers and others have an inconsistent interface
      among the architectures: some take the address as const, some do not.
      
      It seems there is nothing really stopping all of them from taking a
      pointer to const.
      
      This patch (of 4):
      
      The ioreadX() and ioreadX_rep() helpers have an inconsistent interface.  On
      some architectures the void __iomem * address argument is a pointer to
      const, on others it is not.
      
      Implementations of ioreadX() do not modify the memory under the address so
      they can be converted to a "const" version for const-safety and
      consistency among architectures.
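
As a rough userspace illustration of why a const-qualified address argument helps callers, the sketch below uses a stand-in read helper. The names sketch_ioread32() and read_status() are hypothetical and are not kernel functions; the real helpers also take an __iomem-annotated pointer, which is omitted here.

```c
#include <stdint.h>

/*
 * Userspace stand-in for an ioread32()-style helper (hypothetical name).
 * The pointer is const-qualified because the helper only reads through it.
 */
static inline uint32_t sketch_ioread32(const volatile void *addr)
{
	return *(const volatile uint32_t *)addr;
}

/*
 * A caller that only holds a pointer to const can call the helper
 * directly. With a non-const parameter this call would need a cast or
 * would trigger a -Wdiscarded-qualifiers warning.
 */
static inline uint32_t read_status(const volatile uint32_t *regs)
{
	return sketch_ioread32(&regs[1]);
}
```

Because the helper never writes through the pointer, accepting a pointer to const costs nothing and lets read-only callers stay const-correct.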
      
      [krzk@kernel.org: sh: clk: fix assignment from incompatible pointer type for ioreadX()]
        Link: http://lkml.kernel.org/r/20200723082017.24053-1-krzk@kernel.org
      [akpm@linux-foundation.org: fix drivers/mailbox/bcm-pdc-mailbox.c]
        Link: http://lkml.kernel.org/r/202007132209.Rxmv4QyS%25lkp@intel.com
      Suggested-by: Geert Uytterhoeven <geert@linux-m68k.org>
      Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
      Reviewed-by: Arnd Bergmann <arnd@arndb.de>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Kalle Valo <kvalo@codeaurora.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Jakub Kicinski <kuba@kernel.org>
      Cc: Dave Jiang <dave.jiang@intel.com>
      Cc: Jon Mason <jdmason@kudzu.us>
      Cc: Allen Hubbe <allenbh@gmail.com>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Jason Wang <jasowang@redhat.com>
      Link: http://lkml.kernel.org/r/20200709072837.5869-1-krzk@kernel.org
      Link: http://lkml.kernel.org/r/20200709072837.5869-2-krzk@kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      8f28ca6b
    • Kuninori Morimoto's avatar
      sh: use generic strncpy() · f9e7ff9c
      Kuninori Morimoto authored
      The current SH code triggers the warning below at strncpy():
      
      In file included from ${LINUX}/arch/sh/include/asm/string.h:3,
                       from ${LINUX}/include/linux/string.h:20,
                       from ${LINUX}/include/linux/bitmap.h:9,
                       from ${LINUX}/include/linux/nodemask.h:95,
                       from ${LINUX}/include/linux/mmzone.h:17,
                       from ${LINUX}/include/linux/gfp.h:6,
                       from ${LINUX}/include/linux/slab.h:15,
                       from ${LINUX}/linux/drivers/mmc/host/vub300.c:38:
      ${LINUX}/drivers/mmc/host/vub300.c: In function 'new_system_port_status':
      ${LINUX}/arch/sh/include/asm/string_32.h:51:42: warning: array subscript\
        80 is above array bounds of 'char[26]' [-Warray-bounds]
         : "0" (__dest), "1" (__src), "r" (__src+__n)
                                           ~~~~~^~~~
      
      In general, strncpy() should behave as follows.
      
      	char dest[10];
      	char *src = "12345";
      
      	strncpy(dest, src, 10);
      	/* dest = {'1', '2', '3', '4', '5',
      	 *         '\0','\0','\0','\0','\0'} */
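
The padding behaviour described above can be checked with a small userspace sketch. It relies only on the standard C strncpy(); count_nul_padding() is a hypothetical helper introduced for illustration and is not part of the patch.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy src into a dest_len-byte buffer with strncpy() and return how many
 * trailing bytes of dest ended up as '\0' padding.
 * count_nul_padding() is a hypothetical helper for illustration only.
 */
static size_t count_nul_padding(char *dest, size_t dest_len, const char *src)
{
	size_t i, pad = 0;

	/* Copies at most dest_len bytes; pads with '\0' if src is shorter. */
	strncpy(dest, src, dest_len);

	for (i = dest_len; i > 0 && dest[i - 1] == '\0'; i--)
		pad++;
	return pad;
}
```

A conforming strncpy() stops reading src at its terminating NUL, so it has no reason to touch src + n when the source string is shorter than n.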
      
      However, the current SH strncpy() has two issues.
      First, it accesses memory out of bounds (= src + 10).
      Second, fixing that would need a large fixup, and maintaining the
      __asm__ code is difficult.
      
      To solve these issues, this patch simply uses the generic strncpy()
      instead of the architecture-specific one.
      Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Alan Modra <amodra@gmail.com>
      Cc: Bin Meng <bin.meng@windriver.com>
      Cc: Chen Zhou <chenzhou10@huawei.com>
      Cc: Geert Uytterhoeven <geert+renesas@glider.be>
      Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
      Cc: Krzysztof Kozlowski <krzk@kernel.org>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Romain Naour <romain.naour@gmail.com>
      Cc: Sam Ravnborg <sam@ravnborg.org>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Link: https://marc.info/?l=linux-renesas-soc&m=157664657013309
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      f9e7ff9c
    • Kuninori Morimoto's avatar
      sh: clkfwk: remove r8/r16/r32 · a8e3943b
      Kuninori Morimoto authored
      SH triggers the warning below:
      
      ${LINUX}/drivers/sh/clk/cpg.c: In function 'r8':
      ${LINUX}/drivers/sh/clk/cpg.c:41:17: warning: passing argument 1 of 'ioread8'
       discards 'const' qualifier from pointer target type [-Wdiscarded-qualifiers]
        return ioread8(addr);
                       ^~~~
      In file included from ${LINUX}/arch/sh/include/asm/io.h:21,
                       from ${LINUX}/include/linux/io.h:13,
                       from ${LINUX}/drivers/sh/clk/cpg.c:14:
      ${LINUX}/include/asm-generic/iomap.h:29:29: note: expected 'void *' but
      argument is of type 'const void *'
       extern unsigned int ioread8(void __iomem *);
                                   ^~~~~~~~~~~~~~
      
      We don't need "const" for r8/r16/r32.  And we don't need r8/r16/r32
      themselves.  This patch removes them.
      Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Alan Modra <amodra@gmail.com>
      Cc: Bin Meng <bin.meng@windriver.com>
      Cc: Chen Zhou <chenzhou10@huawei.com>
      Cc: Geert Uytterhoeven <geert+renesas@glider.be>
      Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
      Cc: Krzysztof Kozlowski <krzk@kernel.org>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Romain Naour <romain.naour@gmail.com>
      Cc: Sam Ravnborg <sam@ravnborg.org>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      X-MARC-Message: https://marc.info/?l=linux-renesas-soc&m=157852973916903
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a8e3943b
    • Romain Naour's avatar
      include/asm-generic/vmlinux.lds.h: align ro_after_init · 7f897acb
      Romain Naour authored
      Since the patch [1], a kernel built using a toolchain based on
      binutils 2.33.1 fails to boot on a sh4 system under Qemu.  Apply the patch
      provided by Alan Modra [2] that fixes the alignment of rodata.
      
      [1] https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=ebd2263ba9a9124d93bbc0ece63d7e0fae89b40e
      [2] https://www.sourceware.org/ml/binutils/2019-12/msg00112.html
      Signed-off-by: Romain Naour <romain.naour@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Alan Modra <amodra@gmail.com>
      Cc: Bin Meng <bin.meng@windriver.com>
      Cc: Chen Zhou <chenzhou10@huawei.com>
      Cc: Geert Uytterhoeven <geert+renesas@glider.be>
      Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
      Cc: Krzysztof Kozlowski <krzk@kernel.org>
      Cc: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Sam Ravnborg <sam@ravnborg.org>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: <stable@vger.kernel.org>
      Link: https://marc.info/?l=linux-sh&m=158429470221261
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      7f897acb
    • Qian Cai's avatar
      mm: annotate a data race in page_zonenum() · c403f6a3
      Qian Cai authored
       BUG: KCSAN: data-race in page_cpupid_xchg_last / put_page
      
       write (marked) to 0xfffffc0d48ec1a00 of 8 bytes by task 91442 on cpu 3:
        page_cpupid_xchg_last+0x51/0x80
        page_cpupid_xchg_last at mm/mmzone.c:109 (discriminator 11)
        wp_page_reuse+0x3e/0xc0
        wp_page_reuse at mm/memory.c:2453
        do_wp_page+0x472/0x7b0
        do_wp_page at mm/memory.c:2798
        __handle_mm_fault+0xcb0/0xd00
        handle_pte_fault at mm/memory.c:4049
        (inlined by) __handle_mm_fault at mm/memory.c:4163
        handle_mm_fault+0xfc/0x2f0
        handle_mm_fault at mm/memory.c:4200
        do_page_fault+0x263/0x6f9
        do_user_addr_fault at arch/x86/mm/fault.c:1465
        (inlined by) do_page_fault at arch/x86/mm/fault.c:1539
        page_fault+0x34/0x40
      
       read to 0xfffffc0d48ec1a00 of 8 bytes by task 94817 on cpu 69:
        put_page+0x15a/0x1f0
        page_zonenum at include/linux/mm.h:923
        (inlined by) is_zone_device_page at include/linux/mm.h:929
        (inlined by) page_is_devmap_managed at include/linux/mm.h:948
        (inlined by) put_page at include/linux/mm.h:1023
        wp_page_copy+0x571/0x930
        wp_page_copy at mm/memory.c:2615
        do_wp_page+0x107/0x7b0
        __handle_mm_fault+0xcb0/0xd00
        handle_mm_fault+0xfc/0x2f0
        do_page_fault+0x263/0x6f9
        page_fault+0x34/0x40
      
       Reported by Kernel Concurrency Sanitizer on:
       CPU: 69 PID: 94817 Comm: systemd-udevd Tainted: G        W  O L 5.5.0-next-20200204+ #6
       Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019
      
      A page never changes its zone number. The zone number happens to be
      stored in the same word as other bits which are modified, but the zone
      number bits will never be modified by any other write, so it can accept
      a reload of the zone bits after an intervening write and it doesn't need
      to use READ_ONCE(). Thus, annotate this data race using
      ASSERT_EXCLUSIVE_BITS() to also assert that there are no concurrent
      writes to it.
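
A minimal userspace sketch of the pattern described above: the zone bits share a word with other bits, only the zone bits are extracted, and the annotation asserts that nobody writes to those bits concurrently. The macro here is a no-op stand-in so the sketch builds outside the kernel, and the SKETCH_* names, the bit layout, and struct sketch_page are assumptions made for illustration; they do not reflect the real struct page layout.

```c
#include <stdint.h>

/*
 * No-op stand-in so the sketch compiles in userspace. In the kernel the
 * ASSERT_EXCLUSIVE_BITS() annotation is provided by KCSAN; this version
 * only marks where the annotation would go.
 */
#define ASSERT_EXCLUSIVE_BITS(var, mask) \
	do { (void)(var); (void)(mask); } while (0)

/* Hypothetical layout: a 2-bit zone field in the top bits of the word. */
#define SKETCH_ZONES_SHIFT 62
#define SKETCH_ZONES_MASK  0x3ULL

struct sketch_page {
	uint64_t flags;	/* zone bits plus other, frequently modified bits */
};

static inline unsigned int sketch_page_zonenum(const struct sketch_page *page)
{
	/* Other bits in flags may change; the zone bits must not. */
	ASSERT_EXCLUSIVE_BITS(page->flags,
			      SKETCH_ZONES_MASK << SKETCH_ZONES_SHIFT);
	return (unsigned int)((page->flags >> SKETCH_ZONES_SHIFT) &
			      SKETCH_ZONES_MASK);
}
```

Writes to the low bits leave the result unchanged, which is why a plain read is acceptable as long as the zone bits themselves are never rewritten.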
      Suggested-by: Marco Elver <elver@google.com>
      Signed-off-by: Qian Cai <cai@lca.pw>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@kernel.org>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: Ira Weiny <ira.weiny@intel.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Link: http://lkml.kernel.org/r/1581619089-14472-1-git-send-email-cai@lca.pw
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      c403f6a3
    • Qian Cai's avatar
      mm/swap.c: annotate data races for lru_rotate_pvecs · 7e0cc01e
      Qian Cai authored
      A read of lru_add_pvec->nr could be interrupted by a write to the same
      variable.  The write has local interrupts disabled, but the plain reads
      result in data races.  However, it is unlikely that compilers could do much
      damage here, given that lru_add_pvec->nr is an "unsigned char" and there is
      an existing compiler barrier.  Thus, annotate the reads using the
      data_race() macro.  The data races were reported by KCSAN:
      
       BUG: KCSAN: data-race in lru_add_drain_cpu / rotate_reclaimable_page
      
       write to 0xffff9291ebcb8a40 of 1 bytes by interrupt on cpu 23:
        rotate_reclaimable_page+0x2df/0x490
        pagevec_add at include/linux/pagevec.h:81
        (inlined by) rotate_reclaimable_page at mm/swap.c:259
        end_page_writeback+0x1b5/0x2b0
        end_swap_bio_write+0x1d0/0x280
        bio_endio+0x297/0x560
        dec_pending+0x218/0x430 [dm_mod]
        clone_endio+0xe4/0x2c0 [dm_mod]
        bio_endio+0x297/0x560
        blk_update_request+0x201/0x920
        scsi_end_request+0x6b/0x4a0
        scsi_io_completion+0xb7/0x7e0
        scsi_finish_command+0x1ed/0x2a0
        scsi_softirq_done+0x1c9/0x1d0
        blk_done_softirq+0x181/0x1d0
        __do_softirq+0xd9/0x57c
        irq_exit+0xa2/0xc0
        do_IRQ+0x8b/0x190
        ret_from_intr+0x0/0x42
        delay_tsc+0x46/0x80
        __const_udelay+0x3c/0x40
        __udelay+0x10/0x20
        kcsan_setup_watchpoint+0x202/0x3a0
        __tsan_read1+0xc2/0x100
        lru_add_drain_cpu+0xb8/0x3f0
        lru_add_drain+0x25/0x40
        shrink_active_list+0xe1/0xc80
        shrink_lruvec+0x766/0xb70
        shrink_node+0x2d6/0xca0
        do_try_to_free_pages+0x1f7/0x9a0
        try_to_free_pages+0x252/0x5b0
        __alloc_pages_slowpath+0x458/0x1290
        __alloc_pages_nodemask+0x3bb/0x450
        alloc_pages_vma+0x8a/0x2c0
        do_anonymous_page+0x16e/0x6f0
        __handle_mm_fault+0xcd5/0xd40
        handle_mm_fault+0xfc/0x2f0
        do_page_fault+0x263/0x6f9
        page_fault+0x34/0x40
      
       read to 0xffff9291ebcb8a40 of 1 bytes by task 37761 on cpu 23:
        lru_add_drain_cpu+0xb8/0x3f0
        lru_add_drain_cpu at mm/swap.c:602
        lru_add_drain+0x25/0x40
        shrink_active_list+0xe1/0xc80
        shrink_lruvec+0x766/0xb70
        shrink_node+0x2d6/0xca0
        do_try_to_free_pages+0x1f7/0x9a0
        try_to_free_pages+0x252/0x5b0
        __alloc_pages_slowpath+0x458/0x1290
        __alloc_pages_nodemask+0x3bb/0x450
        alloc_pages_vma+0x8a/0x2c0
        do_anonymous_page+0x16e/0x6f0
        __handle_mm_fault+0xcd5/0xd40
        handle_mm_fault+0xfc/0x2f0
        do_page_fault+0x263/0x6f9
        page_fault+0x34/0x40
      
       2 locks held by oom02/37761:
        #0: ffff9281e5928808 (&mm->mmap_sem#2){++++}, at: do_page_fault
        #1: ffffffffb3ade380 (fs_reclaim){+.+.}, at: fs_reclaim_acquire.part
       irq event stamp: 1949217
       trace_hardirqs_on_thunk+0x1a/0x1c
       __do_softirq+0x2e7/0x57c
       __do_softirq+0x34c/0x57c
       irq_exit+0xa2/0xc0
      
       Reported by Kernel Concurrency Sanitizer on:
       CPU: 23 PID: 37761 Comm: oom02 Not tainted 5.6.0-rc3-next-20200226+ #6
       Hardware name: HP ProLiant BL660c Gen9, BIOS I38 10/17/2018
      Signed-off-by: Qian Cai <cai@lca.pw>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Acked-by: Marco Elver <elver@google.com>
      Link: http://lkml.kernel.org/r/20200228044018.1263-1-cai@lca.pw
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      7e0cc01e
    • Qian Cai's avatar
      mm/rmap: annotate a data race at tlb_flush_batched · 9c1177b6
      Qian Cai authored
      mm->tlb_flush_batched could be accessed concurrently as noticed by
      KCSAN,
      
       BUG: KCSAN: data-race in flush_tlb_batched_pending / try_to_unmap_one
      
       write to 0xffff93f754880bd0 of 1 bytes by task 822 on cpu 6:
        try_to_unmap_one+0x59a/0x1ab0
        set_tlb_ubc_flush_pending at mm/rmap.c:635
        (inlined by) try_to_unmap_one at mm/rmap.c:1538
        rmap_walk_anon+0x296/0x650
        rmap_walk+0xdf/0x100
        try_to_unmap+0x18a/0x2f0
        shrink_page_list+0xef6/0x2870
        shrink_inactive_list+0x316/0x880
        shrink_lruvec+0x8dc/0x1380
        shrink_node+0x317/0xd80
        balance_pgdat+0x652/0xd90
        kswapd+0x396/0x8d0
        kthread+0x1e0/0x200
        ret_from_fork+0x27/0x50
      
       read to 0xffff93f754880bd0 of 1 bytes by task 6364 on cpu 4:
        flush_tlb_batched_pending+0x29/0x90
        flush_tlb_batched_pending at mm/rmap.c:682
        change_p4d_range+0x5dd/0x1030
        change_pte_range at mm/mprotect.c:44
        (inlined by) change_pmd_range at mm/mprotect.c:212
        (inlined by) change_pud_range at mm/mprotect.c:240
        (inlined by) change_p4d_range at mm/mprotect.c:260
        change_protection+0x222/0x310
        change_prot_numa+0x3e/0x60
        task_numa_work+0x219/0x350
        task_work_run+0xed/0x140
        prepare_exit_to_usermode+0x2cc/0x2e0
        ret_from_intr+0x32/0x42
      
       Reported by Kernel Concurrency Sanitizer on:
       CPU: 4 PID: 6364 Comm: mtest01 Tainted: G        W    L 5.5.0-next-20200210+ #5
       Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019
      
      flush_tlb_batched_pending() runs under the PTL while the write does not.
      However, mm->tlb_flush_batched is only a bool, so the value is unlikely to
      be shattered.  Thus, mark it as an intentional data race by using the
      data_race() macro.
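
A minimal userspace sketch of marking such an access. The data_race() definition below is a stand-in that simply evaluates its argument; in the kernel the macro additionally tells KCSAN that the race is intentional. struct sketch_mm and sketch_flush_pending() are hypothetical names, and marking the read side is an assumption of this sketch.

```c
#include <stdbool.h>

/*
 * Stand-in for the kernel's data_race(): here it only evaluates the
 * expression. The kernel version also suppresses the KCSAN report for
 * the marked access.
 */
#define data_race(expr) (expr)

struct sketch_mm {
	bool tlb_flush_batched;	/* written without the lock the reader holds */
};

/* Read the flag, documenting that a concurrent write is tolerated. */
static inline bool sketch_flush_pending(const struct sketch_mm *mm)
{
	return data_race(mm->tlb_flush_batched);
}
```

The annotation changes no generated code in this sketch; its value is documentary, recording that the lockless access was reviewed and judged harmless.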
      Signed-off-by: Qian Cai <cai@lca.pw>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Marco Elver <elver@google.com>
      Link: http://lkml.kernel.org/r/1581450783-8262-1-git-send-email-cai@lca.pw
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      9c1177b6
    • Qian Cai's avatar
      mm/mempool: fix a data race in mempool_free() · abe1de42
      Qian Cai authored
      mempool_t pool.curr_nr could be accessed concurrently as noticed by
      KCSAN,
      
       BUG: KCSAN: data-race in mempool_free / remove_element
      
       write to 0xffffffffa937638c of 4 bytes by task 6359 on cpu 113:
        remove_element+0x4a/0x1c0
        remove_element at mm/mempool.c:132
        mempool_alloc+0x102/0x210
        (inlined by) mempool_alloc at mm/mempool.c:399
        bio_alloc_bioset+0x106/0x2c0
        get_swap_bio+0x49/0x230
        __swap_writepage+0x680/0xc30
        swap_writepage+0x9c/0xf0
        pageout+0x33e/0xae0
        shrink_page_list+0x1f57/0x2870
        shrink_inactive_list+0x316/0x880
        shrink_lruvec+0x8dc/0x1380
        shrink_node+0x317/0xd80
        do_try_to_free_pages+0x1f7/0xa10
        try_to_free_pages+0x26c/0x5e0
        __alloc_pages_slowpath+0x458/0x1290
        <snip>
      
       read to 0xffffffffa937638c of 4 bytes by interrupt on cpu 64:
        mempool_free+0x3e/0x150
        mempool_free at mm/mempool.c:492
        bio_free+0x192/0x280
        bio_put+0x91/0xd0
        end_swap_bio_write+0x1d8/0x280
        bio_endio+0x2c2/0x5b0
        dec_pending+0x22b/0x440 [dm_mod]
        clone_endio+0xe4/0x2c0 [dm_mod]
        bio_endio+0x2c2/0x5b0
        blk_update_request+0x217/0x940
        scsi_end_request+0x6b/0x4d0
        scsi_io_completion+0xb7/0x7e0
        scsi_finish_command+0x223/0x310
        scsi_softirq_done+0x1d5/0x210
        blk_mq_complete_request+0x224/0x250
        scsi_mq_done+0xc2/0x250
        pqi_raid_io_complete+0x5a/0x70 [smartpqi]
        pqi_irq_handler+0x150/0x1410 [smartpqi]
        __handle_irq_event_percpu+0x90/0x540
        handle_irq_event_percpu+0x49/0xd0
        handle_irq_event+0x85/0xca
        handle_edge_irq+0x13f/0x3e0
        do_IRQ+0x86/0x190
        <snip>
      
      The write is done under pool->lock, but the read is lockless.
      Even though the commit 5b990546 ("mempool: fix and document
      synchronization and memory barrier usage") introduced the smp_wmb() and
      smp_rmb() pair to improve the situation, it is not adequate to protect it
      from data races which could lead to a logic bug, so fix it by adding
      READ_ONCE() for the read.
      Signed-off-by: Qian Cai <cai@lca.pw>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Marco Elver <elver@google.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Link: http://lkml.kernel.org/r/1581446384-2131-1-git-send-email-cai@lca.pw
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      abe1de42
    • Qian Cai's avatar
      mm/list_lru: fix a data race in list_lru_count_one · a1f45935
      Qian Cai authored
      struct list_lru_one l.nr_items could be accessed concurrently as noticed
      by KCSAN,
      
       BUG: KCSAN: data-race in list_lru_count_one / list_lru_isolate_move
      
       write to 0xffffa102789c4510 of 8 bytes by task 823 on cpu 39:
        list_lru_isolate_move+0xf9/0x130
        list_lru_isolate_move at mm/list_lru.c:180
        inode_lru_isolate+0x12b/0x2a0
        __list_lru_walk_one+0x122/0x3d0
        list_lru_walk_one+0x75/0xa0
        prune_icache_sb+0x8b/0xc0
        super_cache_scan+0x1b8/0x250
        do_shrink_slab+0x256/0x6d0
        shrink_slab+0x41b/0x4a0
        shrink_node+0x35c/0xd80
        balance_pgdat+0x652/0xd90
        kswapd+0x396/0x8d0
        kthread+0x1e0/0x200
        ret_from_fork+0x27/0x50
      
       read to 0xffffa102789c4510 of 8 bytes by task 6345 on cpu 56:
        list_lru_count_one+0x116/0x2f0
        list_lru_count_one at mm/list_lru.c:193
        super_cache_count+0xe8/0x170
        do_shrink_slab+0x95/0x6d0
        shrink_slab+0x41b/0x4a0
        shrink_node+0x35c/0xd80
        do_try_to_free_pages+0x1f7/0xa10
        try_to_free_pages+0x26c/0x5e0
        __alloc_pages_slowpath+0x458/0x1290
        __alloc_pages_nodemask+0x3bb/0x450
        alloc_pages_vma+0x8a/0x2c0
        do_anonymous_page+0x170/0x700
        __handle_mm_fault+0xc9f/0xd00
        handle_mm_fault+0xfc/0x2f0
        do_page_fault+0x263/0x6f9
        page_fault+0x34/0x40
      
       Reported by Kernel Concurrency Sanitizer on:
       CPU: 56 PID: 6345 Comm: oom01 Tainted: G        W    L 5.5.0-next-20200205+ #4
       Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019
      
      A shattered l.nr_items could affect the shrinker behaviour due to a data
      race. Fix it by adding READ_ONCE() for the read. Since the writes are
      aligned and up to word-size, assume those are safe from data races to
      avoid readability issues of writing WRITE_ONCE(var, var + val).
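
The asymmetry described above (a marked lockless read, plain writes) might look roughly like this in a userspace sketch. SKETCH_READ_ONCE() is a simplified stand-in for the kernel's READ_ONCE(), built on a volatile access and the GNU __typeof__ extension; the struct and function names are hypothetical.

```c
/*
 * Simplified stand-in for READ_ONCE(): forces a single volatile load so
 * the compiler cannot tear, refetch, or fuse the read.
 * Relies on the GNU C __typeof__ extension (gcc, clang).
 */
#define SKETCH_READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))

struct sketch_list_lru_one {
	long nr_items;
};

/* Lockless reader: marked access, as in the fix described above. */
static inline long sketch_count_one(struct sketch_list_lru_one *l)
{
	return SKETCH_READ_ONCE(l->nr_items);
}

/*
 * Writer, assumed to run with the list lock held. The write stays plain:
 * it is aligned and word-sized, and a marked form such as
 * WRITE_ONCE(var, var + val) would be harder to read.
 */
static inline void sketch_add_items(struct sketch_list_lru_one *l, long val)
{
	l->nr_items += val;
}
```

Only the reader needs the marked access here, because it is the side that runs without the lock.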
      Signed-off-by: Qian Cai <cai@lca.pw>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Marco Elver <elver@google.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Link: http://lkml.kernel.org/r/1581114679-5488-1-git-send-email-cai@lca.pw
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a1f45935