1. 27 Feb, 2020 7 commits
    • gfs2: Do log_flush in gfs2_ail_empty_gl even if ail list is empty · 9ff78289
      Bob Peterson authored
      Before this patch, if gfs2_ail_empty_gl saw there was nothing on
      the ail list, it would return and not flush the log. The problem
      is that there could still be a revoke for the rgrp sitting on the
      sd_log_le_revoke list that's been recently taken off the ail list.
      But that revoke still needs to be written, and the rgrp_go_inval
      still needs to call log_flush_wait to ensure the revokes are all
      properly written to the journal before we relinquish control of
      the glock to another node. If we give the glock to another node
      before we have this knowledge, this node might crash and have its
      journal replayed, in which case the missing revoke would allow the
      journal replay to write the old rgrp over the rgrp we already gave
      to another node, thus overwriting its changes and corrupting the
      file system.
      
      This patch makes gfs2_ail_empty_gl still call gfs2_log_flush rather
      than returning.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
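The control-flow change in 9ff78289 can be illustrated with a small Python model. This is only a sketch, not the kernel code: the `Glock` and `Journal` classes and both function names are hypothetical, and `pending_revokes` loosely stands in for the sd_log_le_revoke list.

```python
class Glock:
    """Toy stand-in for a glock; not the kernel structure."""
    def __init__(self):
        self.ail_list = []           # buffers still on the glock's ail list


class Journal:
    """Toy journal with a queue of revokes waiting to be written."""
    def __init__(self):
        self.pending_revokes = []    # loosely models sd_log_le_revoke
        self.written_revokes = []

    def flush(self):
        # Write out every queued revoke, then empty the queue.
        self.written_revokes.extend(self.pending_revokes)
        self.pending_revokes.clear()


def ail_empty_gl_old(gl, journal):
    if not gl.ail_list:
        return                       # early return: queued revokes stay unwritten
    journal.pending_revokes.extend(gl.ail_list)
    gl.ail_list.clear()
    journal.flush()


def ail_empty_gl_new(gl, journal):
    if gl.ail_list:
        journal.pending_revokes.extend(gl.ail_list)
        gl.ail_list.clear()
    journal.flush()                  # always flush, even if the ail list was empty
```

With a revoke already queued and an empty ail list, the old variant leaves the revoke unwritten while the new one writes it before the glock can be handed over.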
    • gfs2: Check for log write errors before telling dlm to unlock · d93ae386
      Bob Peterson authored
      Before this patch, function do_xmote just assumed all the writes
      submitted to the journal were finished and successful, and it
      called the go_unlock function to release the dlm lock. But if
      they're not, and a revoke failed to make its way to the journal,
      a journal replay on another node will cause corruption if we
      let the go_inval function continue and tell dlm to release the
      glock to another node. This patch adds a couple checks for errors
      in do_xmote after the calls to go_sync and go_inval. If an error
      is found, we cannot withdraw yet, because the withdraw itself
      uses glocks to make the file system read-only. Instead, we flag
      the error. Later, asserts should cause another node to replay
      the journal before continuing, thus protecting rgrp and dinode
      glocks and maintaining the integrity of the metadata. Note that
      we only need to do this for journaled glocks. System glocks
      should be able to progress even under withdrawn conditions.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
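A rough Python model of the error checks that d93ae386 describes for do_xmote. Everything here is an assumption made for illustration: the `Superblock` class, the `withdraw_pending` flag name, and the way sync and invalidate errors are passed in as arguments rather than produced by real go_sync and go_inval calls.

```python
class Superblock:
    """Hypothetical stand-in for the per-filesystem state."""
    def __init__(self):
        self.withdraw_pending = False  # error flagged, withdraw deferred
        self.released = []             # glocks handed back to the lock manager


def do_xmote(sb, name, journaled, sync_err=0, inval_err=0):
    """Return True if the glock was released to the lock manager."""
    # Check the result of the sync step, then of the invalidate step.
    for err in (sync_err, inval_err):
        if err and journaled:
            # Withdrawing here is not possible (the withdraw itself uses
            # glocks), so only flag the error and keep the lock.
            sb.withdraw_pending = True
            return False
    sb.released.append(name)
    return True
```

In this model a journaled glock with a failed sync is never released, while a non-journaled (system) glock still makes progress, matching the distinction drawn in the commit message.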
    • gfs2: Prepare to withdraw as soon as an IO error occurs in log write · f05b86db
      Bob Peterson authored
      Before this patch, function gfs2_end_log_write would detect any IO
      errors writing to the journal and put out an appropriate message,
      but it never set a withdrawing condition. Eventually, the log daemon
      would see the error and determine it was time to withdraw, but in
      the meantime, other processes could continue running as if nothing
      bad ever happened. The biggest consequence is that __gfs2_glock_put
      would BUG() when it saw that there were still unwritten items.
      
      This patch sets the WITHDRAWING status as soon as an IO error is
      detected, and that way, the BUG will be avoided so the file system
      can be properly withdrawn and unmounted.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
    • gfs2: Issue revokes more intelligently · 5e4c7632
      Bob Peterson authored
      Before this patch, function gfs2_write_revokes would call
      gfs2_ail1_empty, then traverse the sd_ail1_list looking for
      transactions that had bds which were no longer queued to a glock.
      And if it found some, it would try to issue revokes for them, up to
      a predetermined maximum. There were two problems with how it did
      this. First was the fact that gfs2_ail1_empty moves transactions
      which have nothing remaining on the ail1 list from the sd_ail1_list
      to the sd_ail2_list, thus making its traversal of sd_ail1_list
      miss them completely, and therefore, never issue revokes for them.
      Second was the fact that there were three traversals (or partial
      traversals) of the sd_ail1_list, each of which took and then
      released the sd_ail_lock lock: First inside gfs2_ail1_empty,
      second to determine if there are any revokes to be issued, and
      third to actually issue them. All this taking and releasing of the
      sd_ail_lock meant other processes could modify the lists and the
      conditions in which we're working.
      
      This patch simplifies the whole process by adding a new parameter
      to function gfs2_ail1_empty, max_revokes. For normal calls, this
      is passed in as 0, meaning we don't want to issue any revokes.
      For function gfs2_write_revokes, we pass in the maximum number
      of revokes we can, thus allowing gfs2_ail1_empty to add the
      revokes where needed. This simplifies the code, allows for a single
      holding of the sd_ail_lock, and allows gfs2_ail1_empty to add
      revokes for all the necessary bd items without missing any.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
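The first problem described in 5e4c7632 (revokes missed because finished transactions had already been moved to the ail2 list) can be reproduced in a toy Python model. The `Trans` class and its `in_flight` and `written` fields are hypothetical, and locking is not modeled at all, so this shows only the traversal order, not the sd_ail_lock behaviour.

```python
class Trans:
    """Hypothetical transaction: buffers in flight and buffers already written."""
    def __init__(self, in_flight, written):
        self.in_flight = list(in_flight)   # buffers still being written
        self.written = list(written)       # written buffers that may need a revoke


def ail1_empty(ail1, ail2, max_revokes=0):
    """Single pass over ail1; returns the list of revokes issued."""
    revokes = []
    for tr in list(ail1):
        # Issue revokes while budget remains (0 means "issue none").
        while tr.written and len(revokes) < max_revokes:
            revokes.append(tr.written.pop(0))
        if not tr.in_flight:
            ail1.remove(tr)                # nothing left in flight: retire to ail2
            ail2.append(tr)
    return revokes


def write_revokes_old(ail1, ail2, max_revokes):
    """Old ordering: retire finished transactions first, then look for revokes."""
    ail1_empty(ail1, ail2)                 # first pass moves finished transactions
    revokes = []
    for tr in ail1:                        # second pass no longer sees them
        while tr.written and len(revokes) < max_revokes:
            revokes.append(tr.written.pop(0))
    return revokes
```

Running both variants on the same two transactions shows the old ordering issuing a revoke only for the transaction that is still on ail1, while the single pass with a non-zero `max_revokes` covers both.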
    • gfs2: Add verbose option to check_journal_clean · 7d9f9249
      Bob Peterson authored
      Before this patch, function check_journal_clean would give messages
      related to journal recovery. That's fine for mount time, but when a
      node withdraws and forces replay that way, we don't want all those
      distracting and misleading messages. This patch adds a new parameter
      to make those messages optional.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
    • gfs2: fix infinite loop when checking ail item count before go_inval · 33dbd1e4
      Bob Peterson authored
      Before this patch, the rgrp_go_inval and inode_go_inval functions each
      checked if there were any items left on the ail list (by way of a
      count), and if so, did a withdraw. But the withdraw code now uses
      glocks when changing the file system to read-only status. So we can
      not have glock functions withdrawing or a hang will likely result:
      The glocks can't be serviced by the work_func if the work_func is
      busy doing its own withdraw.
      
      This patch removes the checks from the go_inval functions and adds
      a centralized check in do_xmote to warn about the problem and not
      withdraw, but flag the error so it's caught when the logd daemon
      eventually runs.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
    • gfs2: Force withdraw to replay journals and wait for it to finish · 601ef0d5
      Bob Peterson authored
      When a node withdraws from a file system, it often leaves its journal
      in an incomplete state. This is especially true when the withdraw is
      caused by io errors writing to the journal. Before this patch, a
      withdraw would try to write a "shutdown" record to the journal and
      tell dlm it's done with the file system, and none of the other nodes
      would know about the problem. Later, when the problem is fixed and the
      withdrawn node is rebooted, it would then discover that its own
      journal was incomplete, and replay it. However, replaying it at this
      point is almost guaranteed to introduce corruption because the other
      nodes are likely to have used affected resource groups that appeared
      in the journal since the time of the withdraw. Replaying the journal
      later will overwrite any changes made, and not through any fault of
      dlm, which was instructed during the withdraw to release those
      resources.
      
      This patch makes file system withdraws seen by the entire cluster.
      Withdrawing nodes dequeue their journal glock to allow recovery.
      
      The remaining nodes check all the journals to see if they are
      clean or in need of replay. They try to replay dirty journals, but
      only the journals of withdrawn nodes will be "not busy" and
      therefore available for replay.
      
      Until the journal replay is complete, no i/o related glocks may be
      given out, to ensure that the replay does not cause the
      aforementioned corruption: We cannot allow any journal replay to
      overwrite blocks associated with a glock once it is held.
      
      The "live" glock is now used to signal when a withdraw occurs.
      When a withdraw occurs, the node signals its withdraw by
      dequeueing the "live" glock and trying to enqueue it in EX mode,
      thus forcing the other nodes to all see a demote request, by way
      of a "1CB" (one callback) try lock. The "live" glock is not
      granted in EX; the callback is only used to indicate that a
      withdraw has occurred.
      
      Note that all nodes in the cluster must wait for the recovering
      node to finish replaying the withdrawing node's journal before
      continuing. To this end, it checks that the journals are clean
      multiple times in a retry loop.
      
      Also note that the withdraw function may be called from a wide
      variety of situations, and therefore, we need to take extra
      precautions to make sure pointers are valid before using them in
      many circumstances.
      
      We also need to take care when glocks decide to withdraw, since
      the withdraw code now uses glocks.
      
      Also, before this patch, if a process encountered an error and
      decided to withdraw, if another process was already withdrawing,
      the second withdraw would be silently ignored, which set it free
      to unlock its glocks. That's correct behavior if the original
      withdrawer encounters further errors down the road. But if
      secondary waiters don't wait for the journal replay, unlocking
      glocks will allow other nodes to use them, despite the fact that
      the journal containing those blocks is being replayed. The
      replay needs to finish before our glocks are released to other
      nodes. IOW, secondary withdraws need to wait for the first
      withdraw to finish.
      
      For example, if an rgrp glock is unlocked by a process that didn't
      wait for the first withdraw, a journal replay could introduce file
      system corruption by replaying a rgrp block that has already been
      granted to a different cluster node.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
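Two ideas from 601ef0d5 lend themselves to a short Python sketch: remaining nodes can only replay journals whose lock is "not busy", and everyone re-checks journal cleanliness in a retry loop. The function names, the `try_lock` callback and the string journal states are all hypothetical simplifications, not the kernel interfaces.

```python
def replay_dirty_journals(journals, try_lock):
    """journals maps jid -> 'clean' or 'dirty'; try_lock(jid) returns a bool."""
    replayed = []
    for jid, state in journals.items():
        if state != "dirty":
            continue
        if not try_lock(jid):        # journal still held by a live node: "busy"
            continue
        journals[jid] = "clean"      # stands in for the actual replay
        replayed.append(jid)
    return replayed


def wait_for_clean(is_clean, retries=5):
    """Re-check cleanliness up to `retries` times; True once it is clean."""
    for _ in range(retries):
        if is_clean():
            return True
    return False
```

In this model a dirty journal that is still locked by its owner is skipped, so only the withdrawn node's journal is replayed, and callers that need the replay to be finished poll until the check succeeds or the retry budget runs out.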
  2. 20 Feb, 2020 1 commit
    • gfs2: Allow some glocks to be used during withdraw · a72d2401
      Bob Peterson authored
      We need to allow some glocks to be enqueued, dequeued, promoted, and demoted
      when we're withdrawn. For example, to maintain metadata integrity, we should
      disallow the use of inode and rgrp glocks when withdrawn. Other glocks, like
      iopen or the transaction glocks may be safely used because none of their
      metadata goes through the journal. So in general, we should disallow all
      glocks with an address space, and allow all the others. One exception is:
      we need to allow our active journal to be demoted so others may recover it.
      
      Allowing glocks after withdraw gives us the ability to take appropriate
      action (in a following patch) to have our journal properly replayed by
      another node rather than just abandoning the current transactions and
      pretending nothing bad happened, leaving the other nodes free to modify
      the blocks we had in our journal, which may result in file system
      corruption.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
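The rule stated in a72d2401 (disallow glocks that have an address space, allow the rest, but still let the active journal be demoted) can be written as a small predicate. The `Glock` dataclass, its field names and the string operation names are hypothetical; the kernel expresses this differently.

```python
from dataclasses import dataclass


@dataclass
class Glock:
    """Hypothetical description of a glock for this sketch."""
    kind: str
    has_address_space: bool
    is_active_journal: bool = False


def usable_when_withdrawn(gl, op):
    """Return True if operation `op` on `gl` is allowed after a withdraw."""
    if not gl.has_address_space:
        return True                  # e.g. iopen or transaction glocks
    # Glocks with an address space are blocked, with one exception:
    # our active journal may be demoted so another node can recover it.
    return gl.is_active_journal and op == "demote"
```

For example, an inode or rgrp glock is refused for every operation, an iopen glock is always allowed, and the active journal is allowed only for a demote.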
  3. 10 Feb, 2020 12 commits
    • gfs2: move check_journal_clean to util.c for future use · 0d91061a
      Bob Peterson authored
      Before this patch, function check_journal_clean was in ops_fstype.c.
      This patch moves it to util.c so we can make use of it elsewhere
      in a future patch.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
    • gfs2: Ignore dlm recovery requests if gfs2 is withdrawn · 03678a99
      Bob Peterson authored
      When a node fails, user space informs dlm of the node failure,
      and dlm instructs gfs2 on the surviving nodes to perform journal
      recovery. It does this by calling various callback functions in
      lock_dlm.c. To mark its progress, it keeps generation numbers
      and recover bits in a dlm "control" lock lvb, which is seen by
      all nodes to determine which journals need to be replayed.
      
      The gfs2 instances on all nodes get the same recovery requests from dlm,
      so they all try to do the recovery, but only one will be
      granted the exclusive lock on the journal. The others fail
      with a "Busy" message on their "try lock."
      
      However, when a node is withdrawn, it cannot safely do any
      recovery or replay any journals. To make matters worse,
      gfs2 might withdraw as a result of attempting recovery. For
      example, this might happen if the device goes offline, or if
      an hba fails. But in today's gfs2 code, it doesn't check for
      being withdrawn at any step in the recovery process. What's
      worse is that these callbacks from dlm have no return code,
      so there is no way to indicate failure back to dlm. We can
      send a "Recovery failed" uevent eventually, but that tells
      user space what happened, not dlm's kernel code.
      
      Before this patch, lock_dlm would perform its recovery steps but
      ignore the result, and eventually it would still update its
      generation number in the lvb, despite the fact that it may have
      withdrawn or encountered an error. The other nodes would then
      see the newer generation number in the lvb and conclude that
      they don't need to do recovery because the generation number
      is newer than the last one they saw. They think a different
      node has already recovered the journal.
      
      This patch adds checks to several of the callbacks used by dlm
      in its recovery state machine so that the functions are ignored
      and skipped if an io error has occurred or if the file system
      is withdrawn. That prevents the lvb bits from being updated, and
      therefore dlm and user space still see the need for recovery to
      take place.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
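The generation-number problem in 03678a99 can be shown with a heavily simplified Python model. The `Node` class, the dictionary used as the lvb, and both functions are hypothetical; the real lvb also carries recover bits and the real state machine has several callbacks, none of which are modeled here.

```python
class Node:
    """Hypothetical cluster node for this sketch."""
    def __init__(self, name, seen_generation=0, withdrawn=False):
        self.name = name
        self.seen_generation = seen_generation  # last lvb generation observed
        self.withdrawn = withdrawn


def recover_done(node, lvb, check_withdrawn):
    """Models the end of a recovery callback that bumps the lvb generation."""
    if check_withdrawn and node.withdrawn:
        return                       # skip: leave the lvb untouched
    lvb["generation"] += 1


def still_needs_recovery(node, lvb):
    """A node assumes recovery was done once the generation has moved on."""
    return lvb["generation"] <= node.seen_generation
```

Without the check, a withdrawn node bumps the generation and its peers wrongly conclude that the journal has been recovered. With the check, the generation stays put and the need for recovery remains visible.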
    • gfs2: Only complain the first time an io error occurs in quota or log · f34a6135
      Bob Peterson authored
      Before this patch, all io errors received by the quota daemon or the
      logd daemon would cause a complaint message to be issued, such as:
      
         gfs2: fsid=dm-13.0: Error 10 writing to journal, jid=0
      
      This patch changes it so that the error message is only issued the
      first time the error is encountered.
      
      Also, before this patch, function gfs2_end_log_write did not set the
      sd_log_error value, so log errors would not cause the file system to
      be withdrawn. This patch sets the error code so the file system is
      properly withdrawn if an io error is encountered writing to the journal.
      
      WARNING: This change in behavior breaks xfstests check generic/441
      and causes it to fail: io errors writing to the log should cause a
      file system to be withdrawn, and no further operations are tolerated.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
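A minimal Python sketch of the "complain once" behaviour from f34a6135, combined with recording the error code so a later check can trigger the withdraw. The `Superblock` class, its `log_error` and `messages` fields and the shortened message text are assumptions for illustration; keeping only the first error code follows the description in the "log error reform" commit listed below.

```python
class Superblock:
    """Hypothetical per-filesystem state for this sketch."""
    def __init__(self):
        self.log_error = 0           # first journal write error seen, 0 if none
        self.messages = []           # stands in for the kernel log


def end_log_write(sb, error):
    """Record a journal write result; complain only about the first error."""
    if not error:
        return
    if sb.log_error == 0:
        sb.log_error = error
        sb.messages.append(f"Error {error} writing to journal")
```

Repeated failures leave a single message and the original error code in place, and any code that later inspects `log_error` can decide to withdraw.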
    • gfs2: log error reform · 036330c9
      Bob Peterson authored
      Before this patch, gfs2 kept track of journal io errors in two
      places: sd_log_error and the SDF_AIL1_IO_ERROR flag in sd_flags.
      This patch consolidates the two into sd_log_error so that it
      reflects the first error encountered writing to the journal.
      In future patches, we will take advantage of this by checking
      this value rather than having to check both when reacting to
      io errors.
      
      In addition, this fixes a tight loop in unmount: If buffers
      get on the ail1 list and an io error occurs elsewhere, the
      ail1 list would never be cleared because they were always busy.
      So unmount would hang, waiting for the ail1 list to empty.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
    • gfs2: Rework how rgrp buffer_heads are managed · b3422cac
      Bob Peterson authored
      Before this patch, the rgrp code had a serious problem related to
      how it managed buffer_heads for resource groups. The problem caused
      file system corruption, especially in cases of journal replay.
      
      When an rgrp glock was demoted to transfer ownership to a
      different cluster node, do_xmote() first calls rgrp_go_sync and then
      rgrp_go_inval, as expected. When it calls rgrp_go_sync, that called
      gfs2_rgrp_brelse() that dropped the buffer_head reference count.
      In most cases, the reference count went to zero, which is right.
      However, there were other places where the buffers are handled
      differently.
      
      After rgrp_go_sync, do_xmote called rgrp_go_inval which called
      gfs2_rgrp_brelse a second time, then rgrp_go_inval's call to
      truncate_inode_pages_range would get rid of the pages in memory,
      but only if the reference count drops to 0.
      
      Unfortunately, gfs2_rgrp_brelse was setting bi->bi_bh = NULL.
      So when rgrp_go_sync called gfs2_rgrp_brelse, it lost the pointer
      to the buffer_heads in cases where the reference count was still 1.
      Therefore, when rgrp_go_inval called gfs2_rgrp_brelse a second time,
      it failed the check for "if (bi->bi_bh)" and thus failed to call
      brelse a second time. Because of that, the reference count on those
      buffers sometimes failed to drop from 1 to 0. And that caused
      function truncate_inode_pages_range to keep the pages in page cache
      rather than freeing them.
      
      The next time the rgrp glock was acquired, the metadata read of
      the rgrp buffers re-used the pages in memory, which were now
      wrong because they were likely modified by the other node who
      acquired the glock in EX (which is why we demoted the glock).
      This re-use of the page cache caused corruption because changes
      made by the other nodes were never seen, so the bitmaps were
      inaccurate.
      
      For some reason, the problem became most apparent when journal
      replay forced the replay of rgrps in memory, which caused newer
      rgrp data to be overwritten by the older in-core pages.
      
      A big part of the problem was that the rgrp buffers were released
      in multiple places: The go_unlock function would release them when
      the glock was released rather than when the glock is demoted,
      which is clearly wrong because our intent was to cache them until
      the glock is demoted from SH or EX.
      
      This patch attempts to clean up the mess and make one consistent
      and centralized mechanism for managing the rgrp buffer_heads by
      implementing several changes:
      
      1. It eliminates the call to gfs2_rgrp_brelse() from rgrp_go_sync.
         We don't want to release the buffers or zero the pointers when
         syncing for the reasons stated above. It only makes sense to
         release them when the glock is actually invalidated (go_inval).
         And when we do, then we set the bh pointers to NULL.
      2. The go_unlock function (which was only used for rgrps) is
         eliminated, as we've talked about doing many times before.
         The go_unlock function was called too early in the glock dq
         process, and should not happen until the glock is invalidated.
      3. It also eliminates the call to rgrp_brelse in gfs2_clear_rgrpd.
         That will now happen automatically when the rgrp glocks are
         demoted, and shouldn't happen any sooner or later than that.
         Instead, function gfs2_clear_rgrpd has been modified to demote
         the rgrp glocks, and therefore, free those pages, before the
         remaining glocks are culled by gfs2_gl_hash_clear. This
         prevents the gl_object from hanging around when the glocks are
         culled.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
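The lost-pointer failure described in b3422cac can be reproduced with a toy reference-count model in Python. It models only the pre-patch bug, not the fix; `BufferHead`, `Bitmap`, `rgrp_brelse` and `demote_before_patch` are hypothetical stand-ins for the kernel structures and call chain.

```python
class BufferHead:
    """Toy buffer with a reference count."""
    def __init__(self, count):
        self.count = count


class Bitmap:
    """Toy bitmap descriptor holding a pointer to its buffer."""
    def __init__(self, bh):
        self.bh = bh


def rgrp_brelse(bi):
    """Drop one reference and forget the pointer, as the old code did."""
    if bi.bh is not None:
        bi.bh.count -= 1
        bi.bh = None


def demote_before_patch(bi):
    """Return the reference count left after the old sync + invalidate path."""
    bh = bi.bh
    rgrp_brelse(bi)    # called via the sync step: drops one reference
    rgrp_brelse(bi)    # called via the invalidate step: pointer is None, no-op
    return bh.count
```

When the buffer starts with two references, the second release is silently skipped and one reference survives, which is the condition under which the page stays in the page cache and is later re-used with stale contents.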
    • gfs2: clear ail1 list when gfs2 withdraws · 30fe70a8
      Bob Peterson authored
      This patch fixes a bug in which function gfs2_log_flush can get into
      an infinite loop when a gfs2 file system is withdrawn. The problem
      is the infinite loop "for (;;)" in gfs2_log_flush which would never
      finish because the io error and subsequent withdraw prevented the
      items from being taken off the ail list.
      
      This patch tries to clean up the mess by allowing withdraw situations
      to move not-in-flight buffer_heads to the ail2 list, where they will
      be dealt with later.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
    • gfs2: Introduce concept of a pending withdraw · 69511080
      Bob Peterson authored
      File system withdraws can be delayed when inconsistencies are
      discovered at a point where we cannot withdraw immediately, for
      example, when critical spin_locks are held. But delaying the
      withdraw can cause gfs2 to ignore the error and keep running for a
      short period of time.
      For example, an rgrp glock may be dequeued and demoted while there
      are still buffers that haven't been properly revoked, due to io
      errors writing to the journal.
      
      This patch introduces a new concept of a pending withdraw, which
      means an inconsistency has been discovered and we need to withdraw
      at the earliest possible opportunity. In these cases, we aren't
      quite withdrawn yet, but we still need to not dequeue glocks and
      other critical things. If we dequeue the glocks and the withdraw
      results in our journal being replayed, the replay could overwrite
      data that's been modified by a different node that acquired the
      glock in the meantime.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
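The pending-withdraw idea in 69511080 amounts to a small state machine, sketched here in Python. The class, the flag names and all three functions are hypothetical; in particular, `can_withdraw_now` stands in for conditions such as "no critical spin lock is held".

```python
class Superblock:
    """Hypothetical per-filesystem withdraw state."""
    def __init__(self):
        self.withdraw_pending = False
        self.withdrawn = False


def report_inconsistency(sb, can_withdraw_now):
    """Withdraw at once if possible, otherwise remember that we must."""
    if can_withdraw_now:
        sb.withdrawn = True
    else:
        sb.withdraw_pending = True   # defer to the earliest opportunity


def may_dequeue_glocks(sb):
    """Glocks must be held back both while pending and once withdrawn."""
    return not (sb.withdraw_pending or sb.withdrawn)


def complete_pending_withdraw(sb):
    """Called later, from a context where withdrawing is safe."""
    if sb.withdraw_pending:
        sb.withdraw_pending = False
        sb.withdrawn = True
```

The point of the intermediate state is that glocks are already held back during the window between detecting the inconsistency and actually withdrawing.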
    • gfs2: Return bool from gfs2_assert functions · 8e28ef1f
      Andreas Gruenbacher authored
      The gfs2_assert functions only print messages when the filesystem hasn't been
      withdrawn yet, and they indicate whether or not they've printed something in
      their return value.  However, none of the callers use that information, so
      simply return whether or not the assert has failed.
      
      (The gfs2_assert functions are still backwards; they return false when an
      assertion is true.)
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
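The return convention described in 8e28ef1f (the result says whether the assertion failed, regardless of whether a message was printed) is easy to get backwards, so here is a small Python illustration. The function name, the dictionary used as filesystem state and the message text are hypothetical.

```python
def assert_warn(sb, condition, messages):
    """Return True when the assertion FAILED (the 'backwards' convention)."""
    if condition:
        return False                 # assertion holds: nothing to report
    if not sb["withdrawn"]:
        messages.append("assertion failed")  # only print before a withdraw
    return True                      # failed, whether or not we printed
```

A caller can therefore write `if assert_warn(...): handle_failure()` and get the same answer before and after a withdraw, even though the message is suppressed in the latter case.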
    • gfs2: Turn gfs2_consist into void functions · a5ca2f1c
      Andreas Gruenbacher authored
      Change the various gfs2_consist functions to return void.
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
    • gfs2: Remove usused cluster_wide arguments of gfs2_consist functions · d7e7ab3f
      Andreas Gruenbacher authored
      These arguments are always passed as 0, and they are never evaluated.
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
    • gfs2: Report errors before withdraw · 8dc88ac6
      Andreas Gruenbacher authored
      In gfs2_rgrp_verify and compute_bitstructs, make sure to report errors before
      withdrawing the filesystem: otherwise, when we withdraw first and withdraw is
      configured to panic, we'll never get to the error reporting.
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
    • gfs2: Split gfs2_lm_withdraw into two functions · badb55ec
      Andreas Gruenbacher authored
      Split gfs2_lm_withdraw into a function that prints an error message and a
      function that withdraws the filesystem.
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
  4. 06 Feb, 2020 3 commits
  5. 31 Jan, 2020 17 commits
    • Merge tag 'gfs2-for-5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 · a62aa6f7
      Linus Torvalds authored
      Pull gfs2 updates from Andreas Gruenbacher:
      
       - Fix some corner cases on filesystems with a block size < page size.
      
       - Fix a corner case that could expose incorrect access times over nfs.
      
       - Revert an otherwise sensible revoke accounting cleanup that causes
         assertion failures. The revoke accounting is whacky and needs to be
         fixed properly before we can add back this cleanup.
      
       - Various other minor cleanups.
      
      In addition, please expect to see another pull request from Bob Peterson
      about his gfs2 recovery patch queue shortly.
      
      * tag 'gfs2-for-5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
        Revert "gfs2: eliminate tr_num_revoke_rm"
        gfs2: remove unused LBIT macros
        fs/gfs2: remove unused IS_DINODE and IS_LEAF macros
        gfs2: Remove GFS2_MIN_LVB_SIZE define
        gfs2: Fix incorrect variable name
        gfs2: Avoid access time thrashing in gfs2_inode_lookup
        gfs2: minor cleanup: remove unneeded variable ret in gfs2_jdata_writepage
        gfs2: eliminate ssize parameter from gfs2_struct2blk
        gfs2: Another gfs2_find_jhead fix
    • Merge tag 'iomap-5.6-merge-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux · 677b60dc
      Linus Torvalds authored
      Pull iomap fix from Darrick Wong:
       "A single patch fixing an off-by-one error when we're checking to see
        how far we're gotten into an EOF page"
      
      * tag 'iomap-5.6-merge-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
        fs: Fix page_mkwrite off-by-one errors
    • Merge branch 'akpm' (patches from Andrew) · 7eec11d3
      Linus Torvalds authored
      Pull updates from Andrew Morton:
       "Most of -mm and quite a number of other subsystems: hotfixes, scripts,
        ocfs2, misc, lib, binfmt, init, reiserfs, exec, dma-mapping, kcov.
      
        MM is fairly quiet this time.  Holidays, I assume"
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (118 commits)
        kcov: ignore fault-inject and stacktrace
        include/linux/io-mapping.h-mapping: use PHYS_PFN() macro in io_mapping_map_atomic_wc()
        execve: warn if process starts with executable stack
        reiserfs: prevent NULL pointer dereference in reiserfs_insert_item()
        init/main.c: fix misleading "This architecture does not have kernel memory protection" message
        init/main.c: fix quoted value handling in unknown_bootoption
        init/main.c: remove unnecessary repair_env_string in do_initcall_level
        init/main.c: log arguments and environment passed to init
        fs/binfmt_elf.c: coredump: allow process with empty address space to coredump
        fs/binfmt_elf.c: coredump: delete duplicated overflow check
        fs/binfmt_elf.c: coredump: allocate core ELF header on stack
        fs/binfmt_elf.c: make BAD_ADDR() unlikely
        fs/binfmt_elf.c: better codegen around current->mm
        fs/binfmt_elf.c: don't copy ELF header around
        fs/binfmt_elf.c: fix ->start_code calculation
        fs/binfmt_elf.c: smaller code generation around auxv vector fill
        lib/find_bit.c: uninline helper _find_next_bit()
        lib/find_bit.c: join _find_next_bit{_le}
        uapi: rename ext2_swab() to swab() and share globally in swab.h
        lib/scatterlist.c: adjust indentation in __sg_alloc_table
        ...
    • Merge tag 'modules-for-v5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux · ddaefe89
      Linus Torvalds authored
      Pull module updates from Jessica Yu:
       "Summary of modules changes for the 5.6 merge window:
      
         - Add "MS" (SHF_MERGE|SHF_STRINGS) section flags to __ksymtab_strings
           to indicate to the linker that it can perform string deduplication
           (i.e., duplicate strings are reduced to a single copy in the string
           table). This means any repeated namespace string would be merged to
           just one entry in __ksymtab_strings.
      
         - Various code cleanups and small fixes (fix small memleak in error
           path, improve moduleparam docs, silence rcu warnings, improve error
           logging)"
      
      * tag 'modules-for-v5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux:
        module.h: Annotate mod_kallsyms with __rcu
        module: avoid setting info->name early in case we can fall back to info->mod->name
        modsign: print module name along with error message
        kernel/module: Fix memleak in module_add_modinfo_attrs()
        export.h: reduce __ksymtab_strings string duplication by using "MS" section flags
        moduleparam: fix kerneldoc
        modules: lockdep: Suppress suspicious RCU usage warning
    • Merge tag 'mips_5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux · c5951e7c
      Linus Torvalds authored
      Pull MIPS changes from Paul Burton:
       "Nothing too big or scary in here:
      
         - Support mremap() for the VDSO, primarily to allow CRIU to restore
           the VDSO to its checkpointed location.
      
         - Restore the MIPS32 cBPF JIT, after having reverted the enablement
           of the eBPF JIT for MIPS32 systems in the 5.5 cycle.
      
         - Improve cop0 counter synchronization behaviour whilst onlining CPUs
           by running with interrupts disabled.
      
         - Better match FPU behaviour when emulating multiply-accumulate
           instructions on pre-r6 systems that implement IEEE754-2008 style
           MACs.
      
         - Loongson64 kernels now build using the MIPS64r2 ISA, allowing them
           to take advantage of instructions introduced by r2.
      
         - Support for the Ingenic X1000 SoC & the really nice little CU Neo
           development board that's using it.
      
         - Support for WMAC on GARDENA Smart Gateway devices.
      
         - Lots of cleanup & refactoring of SGI IP27 (Origin 2*) support in
           preparation for introducing IP35 (Origin 3*) support.
      
         - Various Kconfig & Makefile cleanups"
      
      * tag 'mips_5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux: (60 commits)
        MIPS: PCI: Add detection of IOC3 on IO7, IO8, IO9 and Fuel
        MIPS: Loongson64: Disable exec hazard
        MIPS: Loongson64: Bump ISA level to MIPSR2
        MIPS: Make DIEI support as a config option
        MIPS: OCTEON: octeon-irq: fix spelling mistake "to" -> "too"
        MIPS: asm: local: add barriers for Loongson
        MIPS: Loongson64: Select mac2008 only feature
        MIPS: Add MAC2008 Support
        Revert "MIPS: Add custom serial.h with BASE_BAUD override for generic kernel"
        MIPS: sort MIPS and MIPS_GENERIC Kconfig selects alphabetically (again)
        MIPS: make CPU_HAS_LOAD_STORE_LR opt-out
        MIPS: generic: don't unconditionally select PINCTRL
        MIPS: don't explicitly select LIBFDT in Kconfig
        MIPS: sync-r4k: do slave counter synchronization with disabled HW interrupts
        MIPS: SGI-IP30: Check for valid pointer before using it
        MIPS: syscalls: fix indentation of the 'SYSNR' message
        MIPS: boot: fix typo in 'vmlinux.lzma.its' target
        MIPS: fix indentation of the 'RELOCS' message
        dt-bindings: Document loongson vendor-prefix
        MIPS: CU1000-Neo: Refresh defconfig to support HWMON and WiFi.
        ...
      c5951e7c
    • Linus Torvalds's avatar
      Merge tag 'arc-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc · b7e573bb
      Linus Torvalds authored
      Pull ARC updates from Vineet Gupta:
      
       - Wire up clone3 syscall
      
       - ARCv2 FPU state save/restore across context switch
      
       - AXS10x platform and misc fixes
      
      * tag 'arc-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
        ARCv2: fpu: preserve userspace fpu state
        ARC: fpu: declutter code, move bits out into fpu.h
        ARC: wireup clone3 syscall
        ARC: [plat-axs10x]: Add missing multicast filter number to GMAC node
        ARC: update feature support for jump-labels
      b7e573bb
    • Linus Torvalds's avatar
      Merge tag 'riscv-for-linus-5.6-mw0' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux · a1084542
      Linus Torvalds authored
      Pull RISC-V updates from Palmer Dabbelt:
       "This contains a handful of patches for this merge window:
      
         - Support for kasan
      
         - 32-bit physical addresses on rv32i-based systems
      
         - Support for CONFIG_DEBUG_VIRTUAL
      
         - DT entry for the FU540 GPIO controller, which has recently had a
           device driver merged
      
        These boot a buildroot-based system on QEMU's virt board for me"
      
      * tag 'riscv-for-linus-5.6-mw0' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
        riscv: dts: Add DT support for SiFive FU540 GPIO driver
        riscv: mm: add support for CONFIG_DEBUG_VIRTUAL
        riscv: keep 32-bit kernel to 32-bit phys_addr_t
        kasan: Add riscv to KASAN documentation.
        riscv: Add KASAN support
        kasan: No KASAN's memmove check if archs don't have it.
      a1084542
    • Linus Torvalds's avatar
      Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · b70a2d6b
      Linus Torvalds authored
      Pull x86 fixes from Ingo Molnar:
       "Misc fixes:
      
         - three fixes and a cleanup for the resctrl code
      
         - a HyperV fix
      
         - a fix to /proc/kcore contents in live debugging sessions
      
         - a fix for the x86 decoder opcode map"
      
      * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/decoder: Add TEST opcode to Group3-2
        x86/resctrl: Clean up unused function parameter in mkdir path
        x86/resctrl: Fix a deadlock due to inaccurate reference
        x86/resctrl: Fix use-after-free due to inaccurate refcount of rdtgroup
        x86/resctrl: Fix use-after-free when deleting resource groups
        x86/hyper-v: Add "polling" bit to hv_synic_sint
        x86/crash: Define arch_crash_save_vmcoreinfo() if CONFIG_CRASH_CORE=y
      b70a2d6b
    • Dmitry Vyukov's avatar
      kcov: ignore fault-inject and stacktrace · 43e76af8
      Dmitry Vyukov authored
      Don't instrument 3 more files that contain debugging facilities and
      produce large amounts of uninteresting coverage for every syscall.
      
      The following snippets are sprinkled all over the place in kcov traces
      in a debugging kernel.  We already try to disable instrumentation of
      stack unwinding code and of most debug facilities.  I guess we did not
      use fault-inject.c at the time, and stacktrace.c was somehow missed (or
      something has changed in kernel/configs).  This change both speeds up
      kcov (kernel doesn't need to store these PCs, user-space doesn't need to
      process them) and frees trace buffer capacity for more useful coverage.
      
        should_fail
        lib/fault-inject.c:149
        fail_dump
        lib/fault-inject.c:45
      
        stack_trace_save
        kernel/stacktrace.c:124
        stack_trace_consume_entry
        kernel/stacktrace.c:86
        stack_trace_consume_entry
        kernel/stacktrace.c:89
        ... a hundred frames skipped ...
        stack_trace_consume_entry
        kernel/stacktrace.c:93
        stack_trace_consume_entry
        kernel/stacktrace.c:86
      
      Link: http://lkml.kernel.org/r/20200116111449.217744-1-dvyukov@gmail.com
      Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
      Reviewed-by: Andrey Konovalov <andreyknvl@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      43e76af8
    • Alexey Dobriyan's avatar
      execve: warn if process starts with executable stack · 47a2ebb7
      Alexey Dobriyan authored
      There have been a few episodes of silent downgrade to an executable stack
      over the years:
      
      1) linking an innocent-looking assembly file will silently add an
         executable stack if the proper linker options are not given as well:
      
      	$ cat f.S
      	.intel_syntax noprefix
      	.text
      	.globl f
      	f:
      	        ret
      
      	$ cat main.c
      	void f(void);
      	int main(void)
      	{
      	        f();
      	        return 0;
      	}
      
      	$ gcc main.c f.S
      	$ readelf -l ./a.out
      	  GNU_STACK      0x0000000000000000 0x0000000000000000 0x0000000000000000
      	                 0x0000000000000000 0x0000000000000000  RWE    0x10
      	                                                        ^^^
      
      2) converting a GNU C nested function into a closure
         https://nullprogram.com/blog/2019/11/15/
      
      	void intsort2(int *base, size_t nmemb, _Bool invert)
      	{
      	    int cmp(const void *a, const void *b)
      	    {
      	        int r = *(int *)a - *(int *)b;
      	        return invert ? -r : r;
      	    }
      	    qsort(base, nmemb, sizeof(*base), cmp);
      	}
      
      will silently require stack trampolines, while the non-closure version
      will not.
      
      Without doubt this behaviour is documented somewhere; add a warning so
      that developers and users can at least notice.  After so many years of
      x86_64 having proper executable stack support it should not cause too
      many problems.
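One way to observe the condition this warning is about from userspace is to look at the permissions of the `[stack]` mapping in `/proc/self/maps`. The sketch below does that; the function names are invented for this example, and it assumes a Linux system with procfs mounted. It is not the kernel-side check added by the patch.

```c
#include <stdio.h>
#include <string.h>

/*
 * Classify one line of /proc/<pid>/maps.
 * Returns 1 for an executable [stack] mapping, 0 for a non-executable
 * [stack] mapping, and -1 if the line is not the stack mapping.
 */
int stack_line_is_executable(const char *line)
{
    char perms[5];

    if (strstr(line, "[stack]") == NULL)
        return -1;
    /* Line format: start-end perms offset dev inode path */
    if (sscanf(line, "%*s %4s", perms) != 1)
        return -1;
    return strchr(perms, 'x') != NULL;
}

/* Scan our own mappings; returns 1, 0, or -1 if no stack line was found. */
int own_stack_is_executable(void)
{
    char line[512];
    int result = -1;
    FILE *f = fopen("/proc/self/maps", "r");

    if (f == NULL)
        return -1;
    while (fgets(line, sizeof line, f) != NULL) {
        int r = stack_line_is_executable(line);
        if (r >= 0) {
            result = r;
            break;
        }
    }
    fclose(f);
    return result;
}
```

Calling `own_stack_is_executable()` from a binary linked like the `f.S` example above would be expected to return 1, and 0 for a binary whose GNU_STACK segment is marked RW.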
      
      Link: http://lkml.kernel.org/r/20191208171918.GC19716@avx2
      Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Dan Carpenter <dan.carpenter@oracle.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      47a2ebb7
    • Yunfeng Ye's avatar
      reiserfs: prevent NULL pointer dereference in reiserfs_insert_item() · aacee544
      Yunfeng Ye authored
      The variable inode may be NULL in reiserfs_insert_item(), but there is
      no check before accessing a member of inode.
      
      Fix this by adding a NULL pointer check before calling reiserfs_debug().
      
      Link: http://lkml.kernel.org/r/79c5135d-ff25-1cc9-4e99-9f572b88cc00@huawei.com
      Signed-off-by: Yunfeng Ye <yeyunfeng@huawei.com>
      Cc: zhengbin <zhengbin13@huawei.com>
      Cc: Hu Shiyuan <hushiyuan@huawei.com>
      Cc: Feilong Lin <linfeilong@huawei.com>
      Cc: Jan Kara <jack@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      aacee544
    • Christophe Leroy's avatar
      init/main.c: fix misleading "This architecture does not have kernel memory protection" message · f596ded1
      Christophe Leroy authored
      This message leads to thinking that memory protection is not implemented
      for the said architecture, whereas absence of CONFIG_STRICT_KERNEL_RWX
      only means that memory protection has not been selected at compile time.
      
      Don't print this message when CONFIG_ARCH_HAS_STRICT_KERNEL_RWX is
      selected by the architecture.  Instead, print "Kernel memory protection
      not selected by kernel config."
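The message selection described above can be sketched as a small decision function. This is only an illustration: the function name and the boolean parameters are hypothetical stand-ins for the compile-time options CONFIG_ARCH_HAS_STRICT_KERNEL_RWX and CONFIG_STRICT_KERNEL_RWX, which the kernel evaluates in the preprocessor rather than at run time.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Returns the warning to print at boot, or NULL when protection was
 * selected and no warning is needed.
 */
const char *rodata_boot_message(bool arch_has_strict_rwx,
                                bool strict_rwx_selected)
{
    if (strict_rwx_selected)
        return NULL;
    if (arch_has_strict_rwx)
        /* Supported by the architecture, but switched off in the config. */
        return "Kernel memory protection not selected by kernel config.";
    /* The architecture offers no such protection at all. */
    return "This architecture does not have kernel memory protection.";
}
```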
      
      Link: http://lkml.kernel.org/r/62477e446d9685459d4f27d193af6ff1bd69d55f.1578557581.git.christophe.leroy@c-s.fr
      Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
      Acked-by: Kees Cook <keescook@chromium.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      f596ded1
    • Arvind Sankar's avatar
      init/main.c: fix quoted value handling in unknown_bootoption · 283900e8
      Arvind Sankar authored
      Patch series "init/main.c: minor cleanup/bugfix of envvar handling", v2.
      
      unknown_bootoption passes unrecognized command line arguments to init as
      either environment variables or arguments.  Some of the logic in the
      function is broken for quoted command line arguments.
      
      When an argument of the form param="value" is processed by parse_args
      and passed to unknown_bootoption, the command line has
      
        param\0"value\0
      
      with val pointing to the beginning of value.  The helper function
      repair_env_string is then used to restore the '=' character that was
      removed by parse_args, and strip the quotes off fully.  This results in
      
        param=value\0\0
      
      and val ends up pointing to the 'a' instead of the 'v' in value.  This
      bug was introduced when repair_env_string was refactored into a separate
      function, and the decrement of val in repair_env_string became dead
      code.
      
      This causes two problems in unknown_bootoption in the two places where
      the val pointer is used as a substitute for the length of param:
      
      1. An argument of the form param=".value" is misinterpreted as a
         potential module parameter, with the result that it will not be
         placed in init's environment.
      
      2. An argument of the form param="value" is checked to see if param is
         an existing environment variable that should be overwritten, but the
         comparison is off-by-one and compares 'param=v' instead of 'param='
         against the existing environment. So passing, for example,
         TERM="vt100" on the command line results in init being passed both
         TERM=linux and TERM=vt100 in its environment.
      
      Patch 1 adds logging for the arguments and environment passed to init
      and is independent of the rest: it can be dropped if this is
      unnecessarily verbose.
      
      Patch 2 removes repair_env_string from initcall parameter parsing in
      do_initcall_level, as that uses a separate copy of the command line now
      and the repairing is no longer necessary.
      
      Patch 3 fixes the bug in unknown_bootoption by recording the length of
      param explicitly instead of implying it from val-param.
      
      This patch (of 3):
      
      Commit a99cd112 ("init: fix bug where environment vars can't be
      passed via boot args") introduced two minor bugs in unknown_bootoption
      by factoring out the quoted value handling into a separate function.
      
      When value is quoted, repair_env_string will move the value up 1 byte to
      strip the quotes, so val in unknown_bootoption no longer points to the
      actual location of the value.
      
      The result is that an argument of the form param=".value" is mistakenly
      treated as a potential module parameter and is not placed in init's
      environment, and an argument of the form param="value" can result in a
      duplicate environment variable: e.g. TERM="vt100" on the command line will
      result in both TERM=linux and TERM=vt100 being placed into init's
      environment.
      
      Fix this by recording the length of the param before calling
      repair_env_string instead of relying on val.
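The off-by-one can be reproduced in userspace with a simplified stand-in for the helper. The sketch below is an assumption-laden model, not the kernel code: `repair_env_string_sketch`, `compare_prefix_lengths` and `struct prefix_lens` are names invented here. It contrasts the prefix length implied by `val - param` after the repair with the length recorded beforehand, which is the approach the fix takes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Simplified model of the helper: restore the '=' that the parser
 * replaced with NUL and, for a quoted value, shift the value down one
 * byte to drop the opening quote. The caller's val is not updated.
 */
void repair_env_string_sketch(char *param, char *val)
{
    size_t plen = strlen(param);

    if (val == param + plen + 1) {
        val[-1] = '=';                      /* param\0value  -> param=value */
    } else {
        assert(val == param + plen + 2);    /* param\0"value */
        val[-2] = '=';
        memmove(val - 1, val, strlen(val) + 1);
    }
}

struct prefix_lens {
    size_t from_val;    /* length implied by val - param after the repair */
    size_t recorded;    /* strlen(param) + 1, taken before the repair */
};

struct prefix_lens compare_prefix_lengths(char *param, char *val)
{
    struct prefix_lens r;

    r.recorded = strlen(param) + 1;         /* covers "param=" */
    repair_env_string_sketch(param, val);
    r.from_val = (size_t)(val - param);     /* one too long if quoted */
    return r;
}
```

With a buffer holding `TERM\0"vt100` and val pointing at the `v`, the recorded length is 5 (matching `TERM=`), while `val - param` is 6 (matching `TERM=v`). A comparison against an existing `TERM=linux` entry therefore succeeds only with the recorded length.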
      
      Link: http://lkml.kernel.org/r/20191212180023.24339-4-nivedita@alum.mit.edu
      Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Krzysztof Mazur <krzysiek@podlesie.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      283900e8
    • Arvind Sankar's avatar
      init/main.c: remove unnecessary repair_env_string in do_initcall_level · 7e2762e1
      Arvind Sankar authored
      Since commit 08746a65 ("init: fix in-place parameter modification
      regression"), parse_args in do_initcall_level is called on a copy of
      saved_command_line.  It is unnecessary to call repair_env_string during
      this parsing, as this copy is not used for anything later.
      
      Remove the now unnecessary arguments from repair_env_string as well.
      
      Link: http://lkml.kernel.org/r/20191212180023.24339-3-nivedita@alum.mit.edu
      Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
      Cc: Krzysztof Mazur <krzysiek@podlesie.net>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      7e2762e1
    • Arvind Sankar's avatar
      init/main.c: log arguments and environment passed to init · b88c50ac
      Arvind Sankar authored
      Extend logging in `run_init_process` to also show the arguments and
      environment that we are passing to init.
      
      Link: http://lkml.kernel.org/r/20191212180023.24339-2-nivedita@alum.mit.edu
      Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Krzysztof Mazur <krzysiek@podlesie.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      b88c50ac
    • Alexey Dobriyan's avatar
      fs/binfmt_elf.c: coredump: allow process with empty address space to coredump · 1fbede6e
      Alexey Dobriyan authored
      Unmapping whole address space at once with
      
      	munmap(0, (1ULL<<47) - 4096)
      
      or equivalent will create an empty coredump.
      
      It is a silly way to exit; however, the register contents may still be
      useful.
      
      The right to coredump is a fundamental right of a process!
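A contained way to try the scenario is to perform the unmapping in a forked child and have the parent report how the child ended. The sketch below assumes 64-bit Linux; the function name is invented for this example, and the child disables core files so that running it leaves nothing behind. Drop the `setrlimit` call to inspect the resulting coredump instead.

```c
#include <signal.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a child that unmaps the range from the commit message.
 * Returns the signal that terminated the child, 0 if the child exited
 * normally (for example because its code lies outside the range on
 * this architecture), or -1 if fork or waitpid failed.
 */
int unmap_everything_in_child(void)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        struct rlimit no_core = { 0, 0 };

        /* Assumption for the demo only: suppress core files. */
        setrlimit(RLIMIT_CORE, &no_core);
        munmap((void *)0, (size_t)((1ULL << 47) - 4096));
        /* Reached only if the program text is still mapped. */
        _exit(0);
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

On x86_64 the child would be expected to die from SIGSEGV as soon as the system call returns, because the instruction after it is no longer mapped.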
      
      Link: http://lkml.kernel.org/r/20191222150137.GA1277@avx2
      Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      1fbede6e