1. 11 Mar, 2014 40 commits
    • Konrad Rzeszutek Wilk's avatar
      xen/boot: Disable BIOS SMP MP table search. · 79b6a2d6
      Konrad Rzeszutek Wilk authored
      commit bd49940a upstream.
      
      As the initial domain we are able to search/map certain regions
      of memory to harvest configuration data. For all low-level we
      use ACPI tables - for interrupts we use exclusively ACPI _PRT
      (so DSDT) and MADT for INT_SRC_OVR.
      
      The SMP MP table is not used at all. As a matter of fact we do
      not even support machines that only have SMP MP but no ACPI tables.
      
      Lets follow how Moorestown does it and just disable searching
      for BIOS SMP tables.
      
      This also fixes an issue on HP Proliant BL680c G5 and DL380 G6:
      
      9f->100 for 1:1 PTE
      Freeing 9f-100 pfn range: 97 pages freed
      1-1 mapping on 9f->100
      .. snip..
      e820: BIOS-provided physical RAM map:
      Xen: [mem 0x0000000000000000-0x000000000009efff] usable
      Xen: [mem 0x000000000009f400-0x00000000000fffff] reserved
      Xen: [mem 0x0000000000100000-0x00000000cfd1dfff] usable
      .. snip..
      Scan for SMP in [mem 0x00000000-0x000003ff]
      Scan for SMP in [mem 0x0009fc00-0x0009ffff]
      Scan for SMP in [mem 0x000f0000-0x000fffff]
      found SMP MP-table at [mem 0x000f4fa0-0x000f4faf] mapped at [ffff8800000f4fa0]
      (XEN) mm.c:908:d0 Error getting mfn 100 (pfn 5555555555555555) from L1 entry 0000000000100461 for l1e_owner=0, pg_owner=0
      (XEN) mm.c:4995:d0 ptwr_emulate: could not get_page_from_l1e()
      BUG: unable to handle kernel NULL pointer dereference at           (null)
      IP: [<ffffffff81ac07e2>] xen_set_pte_init+0x66/0x71
      . snip..
      Pid: 0, comm: swapper Not tainted 3.6.0-rc6upstream-00188-gb6fb969-dirty #2 HP ProLiant BL680c G5
      .. snip..
      Call Trace:
       [<ffffffff81ad31c6>] __early_ioremap+0x18a/0x248
       [<ffffffff81624731>] ? printk+0x48/0x4a
       [<ffffffff81ad32ac>] early_ioremap+0x13/0x15
       [<ffffffff81acc140>] get_mpc_size+0x2f/0x67
       [<ffffffff81acc284>] smp_scan_config+0x10c/0x136
       [<ffffffff81acc2e4>] default_find_smp_config+0x36/0x5a
       [<ffffffff81ac3085>] setup_arch+0x5b3/0xb5b
       [<ffffffff81624731>] ? printk+0x48/0x4a
       [<ffffffff81abca7f>] start_kernel+0x90/0x390
       [<ffffffff81abc356>] x86_64_start_reservations+0x131/0x136
       [<ffffffff81abfa83>] xen_start_kernel+0x65f/0x661
      (XEN) Domain 0 crashed: 'noreboot' set - not rebooting.
      
      which is that ioremap would end up mapping 0xff using _PAGE_IOMAP
      (which is what early_ioremap sticks as a flag) - which meant
      we would get MFN 0xFF (pte ff461, which is OK), and then it would
      also map 0x100 (b/c ioremap tries to get page aligned request, and
      it was trying to map 0xf4fa0 + PAGE_SIZE - so it mapped the next page)
      as _PAGE_IOMAP. Since 0x100 is actually a RAM page, and the _PAGE_IOMAP
      bypasses the P2M lookup we would happily set the PTE to 1000461.
      Xen would deny the request since we do not have access to the
      Machine Frame Number (MFN) of 0x100. The P2M[0x100] is for example
      0x80140.
      
      Fixes-Oracle-Bugzilla: https://bugzilla.oracle.com/bugzilla/show_bug.cgi?id=13665Acked-by: default avatarJan Beulich <jbeulich@suse.com>
      Signed-off-by: default avatarKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      [bwh: Backported to 3.2: adjust context]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      Cc: Yijing Wang <wangyijing@huawei.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      79b6a2d6
    • Takashi Iwai's avatar
      saa7134: Fix unlocked snd_pcm_stop() call · d54ecc0f
      Takashi Iwai authored
      commit e6355ad7 upstream.
      
      snd_pcm_stop() must be called in the PCM substream lock context.
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      [wml: Backported to 3.4: Adjust filename]
      Signed-off-by: default avatarWeng Meiling <wengmeiling.weng@huawei.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d54ecc0f
    • Theodore Ts'o's avatar
      ext4: return ENOMEM if sb_getblk() fails · c28414d3
      Theodore Ts'o authored
      commit 860d21e2 upstream.
      
      The only reason for sb_getblk() failing is if it can't allocate the
      buffer_head.  So ENOMEM is more appropriate than EIO.  In addition,
      make sure that the file system is marked as being inconsistent if
      sb_getblk() fails.
      Signed-off-by: default avatar"Theodore Ts'o" <tytso@mit.edu>
      [xr: Backported to 3.4:
       - Drop change to inline.c
       - Call to ext4_ext_check() from ext4_ext_find_extent() is conditional]
      Signed-off-by: default avatarRui Xiang <rui.xiang@huawei.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c28414d3
    • Roland Dreier's avatar
      block: Don't access request after it might be freed · 0ef4c881
      Roland Dreier authored
      commit 893d290f upstream.
      
      After we've done __elv_add_request() and __blk_run_queue() in
      blk_execute_rq_nowait(), the request might finish and be freed
      immediately.  Therefore checking if the type is REQ_TYPE_PM_RESUME
      isn't safe afterwards, because if it isn't, rq might be gone.
      Instead, check beforehand and stash the result in a temporary.
      
      This fixes crashes in blk_execute_rq_nowait() I get occasionally when
      running with lots of memory debugging options enabled -- I think this
      race is usually harmless because the window for rq to be reallocated
      is so small.
      Signed-off-by: default avatarRoland Dreier <roland@purestorage.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      [xr: Backported to 3.4: adjust context]
      Signed-off-by: default avatarRui Xiang <rui.xiang@huawei.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0ef4c881
    • Paul Clements's avatar
      nbd: correct disconnect behavior · 50e97121
      Paul Clements authored
      commit c378f70a upstream.
      
      Currently, when a disconnect is requested by the user (via NBD_DISCONNECT
      ioctl) the return from NBD_DO_IT is undefined (it is usually one of
      several error codes).  This means that nbd-client does not know if a
      manual disconnect was performed or whether a network error occurred.
      Because of this, nbd-client's persist mode (which tries to reconnect after
      error, but not after manual disconnect) does not always work correctly.
      
      This change fixes this by causing NBD_DO_IT to always return 0 if a user
      requests a disconnect.  This means that nbd-client can correctly either
      persist the connection (if an error occurred) or disconnect (if the user
      requested it).
      Signed-off-by: default avatarPaul Clements <paul.clements@steeleye.com>
      Acked-by: default avatarRob Landley <rob@landley.net>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      [xr: Backported to 3.4: adjust context]
      Signed-off-by: default avatarRui Xiang <rui.xiang@huawei.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      50e97121
    • Jeff Layton's avatar
      cifs: adjust sequence number downward after signing NT_CANCEL request · b0f9634d
      Jeff Layton authored
      commit 31efee60 upstream.
      
      When a call goes out, the signing code adjusts the sequence number
      upward by two to account for the request and the response. An NT_CANCEL
      however doesn't get a response of its own, it just hurries the server
      along to get it to respond to the original request more quickly.
      Therefore, we must adjust the sequence number back down by one after
      signing a NT_CANCEL request.
      Reported-by: default avatarTim Perry <tdparmor-sambabugs@yahoo.com>
      Signed-off-by: default avatarJeff Layton <jlayton@redhat.com>
      Signed-off-by: default avatarSteve French <smfrench@gmail.com>
      [bwh: Backported to 3.2: adjust filename]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      Cc: Rui Xiang <rui.xiang@huawei.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b0f9634d
    • Jan Kara's avatar
      ext4: fix possible use-after-free with AIO · b54e3acc
      Jan Kara authored
      commit 091e26df upstream.
      
      Running AIO is pinning inode in memory using file reference. Once AIO
      is completed using aio_complete(), file reference is put and inode can
      be freed from memory. So we have to be sure that calling aio_complete()
      is the last thing we do with the inode.
      Reviewed-by: default avatarCarlos Maiolino <cmaiolino@redhat.com>
      Acked-by: default avatarJeff Moyer <jmoyer@redhat.com>
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatar"Theodore Ts'o" <tytso@mit.edu>
      [bwh: Backported to 3.2: adjust context]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      Cc: Rui Xiang <rui.xiang@huawei.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b54e3acc
    • Adam Thomas's avatar
      UBIFS: fix double free of ubifs_orphan objects · 8a4188e2
      Adam Thomas authored
      commit 8afd500c upstream.
      
      The last orphan in the dnext list has its dnext set to NULL. Because
      of that, ubifs_delete_orphan assumes that it is not on the dnext list
      and frees it immediately instead ignoring it as a second delete. The
      orphan is later freed again by erase_deleted.
      
      This change adds an explicit flag to ubifs_orphan indicating whether
      it is pending delete.
      Signed-off-by: default avatarAdam Thomas <adamthomas1111@gmail.com>
      Signed-off-by: default avatarArtem Bityutskiy <artem.bityutskiy@linux.intel.com>
      [bwh: Backported to 3.2: adjust context]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      Cc: Rui Xiang <rui.xiang@huawei.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8a4188e2
    • Theodore Ts'o's avatar
      ext4/jbd2: don't wait (forever) for stale tid caused by wraparound · ebdc12a0
      Theodore Ts'o authored
      commit d76a3a77 upstream.
      
      In the case where an inode has a very stale transaction id (tid) in
      i_datasync_tid or i_sync_tid, it's possible that after a very large
      (2**31) number of transactions, that the tid number space might wrap,
      causing tid_geq()'s calculations to fail.
      
      Commit deeeaf13 "jbd2: fix fsync() tid wraparound bug", later modified
      by commit e7b04ac0 "jbd2: don't wake kjournald unnecessarily",
      attempted to fix this problem, but it only avoided kjournald spinning
      forever by fixing the logic in jbd2_log_start_commit().
      
      Unfortunately, in the codepaths in fs/ext4/fsync.c and fs/ext4/inode.c
      that might call jbd2_log_start_commit() with a stale tid, those
      functions will subsequently call jbd2_log_wait_commit() with the same
      stale tid, and then wait for a very long time.  To fix this, we
      replace the calls to jbd2_log_start_commit() and
      jbd2_log_wait_commit() with a call to a new function,
      jbd2_complete_transaction(), which will correctly handle stale tid's.
      
      As a bonus, jbd2_complete_transaction() will avoid locking
      j_state_lock for writing unless a commit needs to be started.  This
      should have a small (but probably not measurable) improvement for
      ext4's scalability.
      Signed-off-by: default avatar"Theodore Ts'o" <tytso@mit.edu>
      Reported-by: default avatarBen Hutchings <ben@decadent.org.uk>
      Reported-by: default avatarGeorge Barnett <gbarnett@atlassian.com>
      [bwh: Backported to 3.2: adjust context]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      Cc: Rui Xiang <rui.xiang@huawei.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ebdc12a0
    • Dave Chiluk's avatar
      ncpfs: fix rmdir returns Device or resource busy · d7d659d6
      Dave Chiluk authored
      commit 698b8223 upstream.
      
      1d2ef590 caused a regression in ncpfs such that
      directories could no longer be removed.  This was because ncp_rmdir checked
      to see if a dentry could be unhashed before allowing it to be removed. Since
      1d2ef590 introduced a change that incremented
      dentry->d_count causing it to always be greater than 1 unhash would always
      fail.  Thus causing the error path in ncp_rmdir to always be taken.  Removing
      this error path is safe as unhashing is still accomplished by calls to dput
      from vfs_rmdir.
      Signed-off-by: default avatarDave Chiluk <chiluk@canonical.com>
      Signed-off-by: default avatarPetr Vandrovec <petr@vandrovec.name>
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      Cc: Rui Xiang <rui.xiang@huawei.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d7d659d6
    • Jeff Layton's avatar
      cifs: don't instantiate new dentries in readdir for inodes that need to be revalidated immediately · 3d956c8a
      Jeff Layton authored
      commit 757c4f62 upstream.
      
      David reported that commit c2b93e06 (cifs: only set ops for inodes in
      I_NEW state) caused a regression with mfsymlinks. Prior to that patch,
      if a mfsymlink dentry was instantiated at readdir time, the inode would
      get a new set of ops when it was revalidated. After that patch, this
      did not occur.
      
      This patch addresses this by simply skipping instantiating dentries in
      the readdir codepath when we know that they will need to be immediately
      revalidated. The next attempt to use that dentry will cause a new lookup
      to occur (which is basically what we want to happen anyway).
      
      Cc: "Stefan (metze) Metzmacher" <metze@samba.org>
      Cc: Sachin Prabhu <sprabhu@redhat.com>
      Reported-and-Tested-by: default avatarDavid McBride <dwm37@cam.ac.uk>
      Signed-off-by: default avatarJeff Layton <jlayton@redhat.com>
      Signed-off-by: default avatarSteve French <smfrench@gmail.com>
      [bwh: Backported to 3.2: need to return NULL]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      Cc: Rui Xiang <rui.xiang@huawei.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3d956c8a
    • majianpeng's avatar
      libceph: unregister request in __map_request failed and nofail == false · b3f19e7f
      majianpeng authored
      commit 73d9f7ee upstream.
      
      For nofail == false request, if __map_request failed, the caller does
      cleanup work, like releasing the relative pages.  It doesn't make any sense
      to retry this request.
      Signed-off-by: default avatarJianpeng Ma <majianpeng@gmail.com>
      Reviewed-by: default avatarSage Weil <sage@inktank.com>
      [bwh: Backported to 3.2: adjust indentation]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      Cc: Rui Xiang <rui.xiang@huawei.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b3f19e7f
    • Maxim Patlasov's avatar
      fuse: hotfix truncate_pagecache() issue · fe17c202
      Maxim Patlasov authored
      commit 06a7c3c2 upstream.
      
      The way how fuse calls truncate_pagecache() from fuse_change_attributes()
      is completely wrong. Because, w/o i_mutex held, we never sure whether
      'oldsize' and 'attr->size' are valid by the time of execution of
      truncate_pagecache(inode, oldsize, attr->size). In fact, as soon as we
      released fc->lock in the middle of fuse_change_attributes(), we completely
      loose control of actions which may happen with given inode until we reach
      truncate_pagecache. The list of potentially dangerous actions includes
      mmap-ed reads and writes, ftruncate(2) and write(2) extending file size.
      
      The typical outcome of doing truncate_pagecache() with outdated arguments
      is data corruption from user point of view. This is (in some sense)
      acceptable in cases when the issue is triggered by a change of the file on
      the server (i.e. externally wrt fuse operation), but it is absolutely
      intolerable in scenarios when a single fuse client modifies a file without
      any external intervention. A real life case I discovered by fsx-linux
      looked like this:
      
      1. Shrinking ftruncate(2) comes to fuse_do_setattr(). The latter sends
      FUSE_SETATTR to the server synchronously, but before getting fc->lock ...
      2. fuse_dentry_revalidate() is asynchronously called. It sends FUSE_LOOKUP
      to the server synchronously, then calls fuse_change_attributes(). The
      latter updates i_size, releases fc->lock, but before comparing oldsize vs
      attr->size..
      3. fuse_do_setattr() from the first step proceeds by acquiring fc->lock and
      updating attributes and i_size, but now oldsize is equal to
      outarg.attr.size because i_size has just been updated (step 2). Hence,
      fuse_do_setattr() returns w/o calling truncate_pagecache().
      4. As soon as ftruncate(2) completes, the user extends file size by
      write(2) making a hole in the middle of file, then reads data from the hole
      either by read(2) or mmap-ed read. The user expects to get zero data from
      the hole, but gets stale data because truncate_pagecache() is not executed
      yet.
      
      The scenario above illustrates one side of the problem: not truncating the
      page cache even though we should. Another side corresponds to truncating
      page cache too late, when the state of inode changed significantly.
      Theoretically, the following is possible:
      
      1. As in the previous scenario fuse_dentry_revalidate() discovered that
      i_size changed (due to our own fuse_do_setattr()) and is going to call
      truncate_pagecache() for some 'new_size' it believes valid right now. But
      by the time that particular truncate_pagecache() is called ...
      2. fuse_do_setattr() returns (either having called truncate_pagecache() or
      not -- it doesn't matter).
      3. The file is extended either by write(2) or ftruncate(2) or fallocate(2).
      4. mmap-ed write makes a page in the extended region dirty.
      
      The result will be the lost of data user wrote on the fourth step.
      
      The patch is a hotfix resolving the issue in a simplistic way: let's skip
      dangerous i_size update and truncate_pagecache if an operation changing
      file size is in progress. This simplistic approach looks correct for the
      cases w/o external changes. And to handle them properly, more sophisticated
      and intrusive techniques (e.g. NFS-like one) would be required. I'd like to
      postpone it until the issue is well discussed on the mailing list(s).
      
      Changed in v2:
       - improved patch description to cover both sides of the issue.
      Signed-off-by: default avatarMaxim Patlasov <mpatlasov@parallels.com>
      Signed-off-by: default avatarMiklos Szeredi <mszeredi@suse.cz>
      [bwh: Backported to 3.2: add the fuse_inode::state field which we didn't have]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      Cc: Rui Xiang <rui.xiang@huawei.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      fe17c202
    • Miklos Szeredi's avatar
      fuse: readdir: check for slash in names · f4a69e06
      Miklos Szeredi authored
      commit efeb9e60 upstream.
      
      Userspace can add names containing a slash character to the directory
      listing.  Don't allow this as it could cause all sorts of trouble.
      Signed-off-by: default avatarMiklos Szeredi <mszeredi@suse.cz>
      [bwh: Backported to 3.2: drop changes to parse_dirplusfile() which we
       don't have]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      Cc: Rui Xiang <rui.xiang@huawei.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f4a69e06
    • Vyacheslav Dubeyko's avatar
      nilfs2: fix issue with race condition of competition between segments for dirty blocks · 831c8764
      Vyacheslav Dubeyko authored
      commit 7f42ec39 upstream.
      
      Many NILFS2 users were reported about strange file system corruption
      (for example):
      
         NILFS: bad btree node (blocknr=185027): level = 0, flags = 0x0, nchildren = 768
         NILFS error (device sda4): nilfs_bmap_last_key: broken bmap (inode number=11540)
      
      But such error messages are consequence of file system's issue that takes
      place more earlier.  Fortunately, Jerome Poulin <jeromepoulin@gmail.com>
      and Anton Eliasson <devel@antoneliasson.se> were reported about another
      issue not so recently.  These reports describe the issue with segctor
      thread's crash:
      
        BUG: unable to handle kernel paging request at 0000000000004c83
        IP: nilfs_end_page_io+0x12/0xd0 [nilfs2]
      
        Call Trace:
         nilfs_segctor_do_construct+0xf25/0x1b20 [nilfs2]
         nilfs_segctor_construct+0x17b/0x290 [nilfs2]
         nilfs_segctor_thread+0x122/0x3b0 [nilfs2]
         kthread+0xc0/0xd0
         ret_from_fork+0x7c/0xb0
      
      These two issues have one reason.  This reason can raise third issue
      too.  Third issue results in hanging of segctor thread with eating of
      100% CPU.
      
      REPRODUCING PATH:
      
      One of the possible way or the issue reproducing was described by
      Jermoe me Poulin <jeromepoulin@gmail.com>:
      
      1. init S to get to single user mode.
      2. sysrq+E to make sure only my shell is running
      3. start network-manager to get my wifi connection up
      4. login as root and launch "screen"
      5. cd /boot/log/nilfs which is a ext3 mount point and can log when NILFS dies.
      6. lscp | xz -9e > lscp.txt.xz
      7. mount my snapshot using mount -o cp=3360839,ro /dev/vgUbuntu/root /mnt/nilfs
      8. start a screen to dump /proc/kmsg to text file since rsyslog is killed
      9. start a screen and launch strace -f -o find-cat.log -t find
      /mnt/nilfs -type f -exec cat {} > /dev/null \;
      10. start a screen and launch strace -f -o apt-get.log -t apt-get update
      11. launch the last command again as it did not crash the first time
      12. apt-get crashes
      13. ps aux > ps-aux-crashed.log
      13. sysrq+W
      14. sysrq+E  wait for everything to terminate
      15. sysrq+SUSB
      
      Simplified way of the issue reproducing is starting kernel compilation
      task and "apt-get update" in parallel.
      
      REPRODUCIBILITY:
      
      The issue is reproduced not stable [60% - 80%].  It is very important to
      have proper environment for the issue reproducing.  The critical
      conditions for successful reproducing:
      
      (1) It should have big modified file by mmap() way.
      
      (2) This file should have the count of dirty blocks are greater that
          several segments in size (for example, two or three) from time to time
          during processing.
      
      (3) It should be intensive background activity of files modification
          in another thread.
      
      INVESTIGATION:
      
      First of all, it is possible to see that the reason of crash is not valid
      page address:
      
        NILFS [nilfs_segctor_complete_write]:2100 bh->b_count 0, bh->b_blocknr 13895680, bh->b_size 13897727, bh->b_page 0000000000001a82
        NILFS [nilfs_segctor_complete_write]:2101 segbuf->sb_segnum 6783
      
      Moreover, value of b_page (0x1a82) is 6786.  This value looks like segment
      number.  And b_blocknr with b_size values look like block numbers.  So,
      buffer_head's pointer points on not proper address value.
      
      Detailed investigation of the issue is discovered such picture:
      
        [-----------------------------SEGMENT 6783-------------------------------]
        NILFS [nilfs_segctor_do_construct]:2310 nilfs_segctor_begin_construction
        NILFS [nilfs_segctor_do_construct]:2321 nilfs_segctor_collect
        NILFS [nilfs_segctor_do_construct]:2336 nilfs_segctor_assign
        NILFS [nilfs_segctor_do_construct]:2367 nilfs_segctor_update_segusage
        NILFS [nilfs_segctor_do_construct]:2371 nilfs_segctor_prepare_write
        NILFS [nilfs_segctor_do_construct]:2376 nilfs_add_checksums_on_logs
        NILFS [nilfs_segctor_do_construct]:2381 nilfs_segctor_write
        NILFS [nilfs_segbuf_submit_bio]:464 bio->bi_sector 111149024, segbuf->sb_segnum 6783
      
        [-----------------------------SEGMENT 6784-------------------------------]
        NILFS [nilfs_segctor_do_construct]:2310 nilfs_segctor_begin_construction
        NILFS [nilfs_segctor_do_construct]:2321 nilfs_segctor_collect
        NILFS [nilfs_lookup_dirty_data_buffers]:782 bh->b_count 1, bh->b_page ffffea000709b000, page->index 0, i_ino 1033103, i_size 25165824
        NILFS [nilfs_lookup_dirty_data_buffers]:783 bh->b_assoc_buffers.next ffff8802174a6798, bh->b_assoc_buffers.prev ffff880221cffee8
        NILFS [nilfs_segctor_do_construct]:2336 nilfs_segctor_assign
        NILFS [nilfs_segctor_do_construct]:2367 nilfs_segctor_update_segusage
        NILFS [nilfs_segctor_do_construct]:2371 nilfs_segctor_prepare_write
        NILFS [nilfs_segctor_do_construct]:2376 nilfs_add_checksums_on_logs
        NILFS [nilfs_segctor_do_construct]:2381 nilfs_segctor_write
        NILFS [nilfs_segbuf_submit_bh]:575 bh->b_count 1, bh->b_page ffffea000709b000, page->index 0, i_ino 1033103, i_size 25165824
        NILFS [nilfs_segbuf_submit_bh]:576 segbuf->sb_segnum 6784
        NILFS [nilfs_segbuf_submit_bh]:577 bh->b_assoc_buffers.next ffff880218a0d5f8, bh->b_assoc_buffers.prev ffff880218bcdf50
        NILFS [nilfs_segbuf_submit_bio]:464 bio->bi_sector 111150080, segbuf->sb_segnum 6784, segbuf->sb_nbio 0
        [----------] ditto
        NILFS [nilfs_segbuf_submit_bio]:464 bio->bi_sector 111164416, segbuf->sb_segnum 6784, segbuf->sb_nbio 15
      
        [-----------------------------SEGMENT 6785-------------------------------]
        NILFS [nilfs_segctor_do_construct]:2310 nilfs_segctor_begin_construction
        NILFS [nilfs_segctor_do_construct]:2321 nilfs_segctor_collect
        NILFS [nilfs_lookup_dirty_data_buffers]:782 bh->b_count 2, bh->b_page ffffea000709b000, page->index 0, i_ino 1033103, i_size 25165824
        NILFS [nilfs_lookup_dirty_data_buffers]:783 bh->b_assoc_buffers.next ffff880219277e80, bh->b_assoc_buffers.prev ffff880221cffc88
        NILFS [nilfs_segctor_do_construct]:2367 nilfs_segctor_update_segusage
        NILFS [nilfs_segctor_do_construct]:2371 nilfs_segctor_prepare_write
        NILFS [nilfs_segctor_do_construct]:2376 nilfs_add_checksums_on_logs
        NILFS [nilfs_segctor_do_construct]:2381 nilfs_segctor_write
        NILFS [nilfs_segbuf_submit_bh]:575 bh->b_count 2, bh->b_page ffffea000709b000, page->index 0, i_ino 1033103, i_size 25165824
        NILFS [nilfs_segbuf_submit_bh]:576 segbuf->sb_segnum 6785
        NILFS [nilfs_segbuf_submit_bh]:577 bh->b_assoc_buffers.next ffff880218a0d5f8, bh->b_assoc_buffers.prev ffff880222cc7ee8
        NILFS [nilfs_segbuf_submit_bio]:464 bio->bi_sector 111165440, segbuf->sb_segnum 6785, segbuf->sb_nbio 0
        [----------] ditto
        NILFS [nilfs_segbuf_submit_bio]:464 bio->bi_sector 111177728, segbuf->sb_segnum 6785, segbuf->sb_nbio 12
      
        NILFS [nilfs_segctor_do_construct]:2399 nilfs_segctor_wait
        NILFS [nilfs_segbuf_wait]:676 segbuf->sb_segnum 6783
        NILFS [nilfs_segbuf_wait]:676 segbuf->sb_segnum 6784
        NILFS [nilfs_segbuf_wait]:676 segbuf->sb_segnum 6785
      
        NILFS [nilfs_segctor_complete_write]:2100 bh->b_count 0, bh->b_blocknr 13895680, bh->b_size 13897727, bh->b_page 0000000000001a82
      
        BUG: unable to handle kernel paging request at 0000000000001a82
        IP: [<ffffffffa024d0f2>] nilfs_end_page_io+0x12/0xd0 [nilfs2]
      
      Usually, for every segment we collect dirty files in list.  Then, dirty
      blocks are gathered for every dirty file, prepared for write and
      submitted by means of nilfs_segbuf_submit_bh() call.  Finally, it takes
      place complete write phase after calling nilfs_end_bio_write() on the
      block layer.  Buffers/pages are marked as not dirty on final phase and
      processed files removed from the list of dirty files.
      
      It is possible to see that we had three prepare_write and submit_bio
      phases before segbuf_wait and complete_write phase.  Moreover, segments
      compete between each other for dirty blocks because on every iteration
      of segments processing dirty buffer_heads are added in several lists of
      payload_buffers:
      
        [SEGMENT 6784]: bh->b_assoc_buffers.next ffff880218a0d5f8, bh->b_assoc_buffers.prev ffff880218bcdf50
        [SEGMENT 6785]: bh->b_assoc_buffers.next ffff880218a0d5f8, bh->b_assoc_buffers.prev ffff880222cc7ee8
      
      The next pointer is the same but prev pointer has changed.  It means
      that buffer_head has next pointer from one list but prev pointer from
      another.  Such modification can be made several times.  And, finally, it
      can be resulted in various issues: (1) segctor hanging, (2) segctor
      crashing, (3) file system metadata corruption.
      
      FIX:
      This patch adds:
      
      (1) setting of BH_Async_Write flag in nilfs_segctor_prepare_write()
          for every proccessed dirty block;
      
      (2) checking of BH_Async_Write flag in
          nilfs_lookup_dirty_data_buffers() and
          nilfs_lookup_dirty_node_buffers();
      
      (3) clearing of BH_Async_Write flag in nilfs_segctor_complete_write(),
          nilfs_abort_logs(), nilfs_forget_buffer(), nilfs_clear_dirty_page().
      Reported-by: default avatarJerome Poulin <jeromepoulin@gmail.com>
      Reported-by: default avatarAnton Eliasson <devel@antoneliasson.se>
      Cc: Paul Fertser <fercerpav@gmail.com>
      Cc: ARAI Shun-ichi <hermes@ceres.dti.ne.jp>
      Cc: Piotr Szymaniak <szarpaj@grubelek.pl>
      Cc: Juan Barry Manuel Canham <Linux@riotingpacifist.net>
      Cc: Zahid Chowdhury <zahid.chowdhury@starsolutions.com>
      Cc: Elmer Zhang <freeboy6716@gmail.com>
      Cc: Kenneth Langga <klangga@gmail.com>
      Signed-off-by: default avatarVyacheslav Dubeyko <slava@dubeyko.com>
      Acked-by: default avatarRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      [bwh: Backported to 3.2: nilfs_clear_dirty_page() has not been separated
       from nilfs_clear_dirty_pages()]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      Cc: Rui Xiang <rui.xiang@huawei.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      831c8764
    • Jiri Olsa's avatar
      perf tools: Fix cache event name generation · 73a11828
      Jiri Olsa authored
      commit 275ef387 upstream.
      
      If the event name is specified with all 3 components, the last one
      overwrites the previous one during the name composing within the
      parse_events_add_cache function.
      
      Fixing this by properly adjusting the string index.
      Reported-by: default avatarJoel Uckelman <joel@lightboxtechnologies.com>
      Signed-off-by: default avatarJiri Olsa <jolsa@redhat.com>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Joel Uckelman <joel@lightboxtechnologies.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LPU-Reference: 20120905175133.GA18352@krava.brq.redhat.com
      [ committer note: Remove the newline fix, done already in 42e1fb77 ]
      Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Vinson Lee <vlee@twopensource.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      73a11828
    • Arnaldo Carvalho de Melo's avatar
      perf tools: Remove extraneous newline when parsing hardware cache events · f25c118b
      Arnaldo Carvalho de Melo authored
      commit 42e1fb77 upstream.
      
      Noticed while developing a 'perf test' entry to verify that
      perf_evsel__name works.
      
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Namhyung Kim <namhyung@gmail.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lkml.kernel.org/n/tip-xz6zgh38mp3cjnd2udh38z8f@git.kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Vinson Lee <vlee@twopensource.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f25c118b
    • Jiang Liu's avatar
      mm/hotplug: correctly add new zone to all other nodes' zone lists · 446327d6
      Jiang Liu authored
      commit 08dff7b7 upstream.
      
      When online_pages() is called to add new memory to an empty zone, it
      rebuilds all zone lists by calling build_all_zonelists().  But there's a
      bug which prevents the new zone to be added to other nodes' zone lists.
      
      online_pages() {
      	build_all_zonelists()
      	.....
      	node_set_state(zone_to_nid(zone), N_HIGH_MEMORY)
      }
      
      Here the node of the zone is put into N_HIGH_MEMORY state after calling
      build_all_zonelists(), but build_all_zonelists() only adds zones from
      nodes in N_HIGH_MEMORY state to the fallback zone lists.
      build_all_zonelists()
      
          ->__build_all_zonelists()
      	->build_zonelists()
      	    ->find_next_best_node()
      		->for_each_node_state(n, N_HIGH_MEMORY)
      
      So memory in the new zone will never be used by other nodes, and it may
      cause strange behavor when system is under memory pressure.  So put node
      into N_HIGH_MEMORY state before calling build_all_zonelists().
      Signed-off-by: default avatarJianguo Wu <wujianguo@huawei.com>
      Signed-off-by: default avatarJiang Liu <liuj97@gmail.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Keping Chen <chenkeping@huawei.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      [bwh: Backported to 3.2: adjust context]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      Cc: Qiang Huang <h.huangqiang@huawei.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      446327d6
    • Tejun Heo's avatar
      cgroup: fix RCU accesses to task->cgroups · 71a20685
      Tejun Heo authored
      commit 14611e51 upstream.
      
      task->cgroups is a RCU pointer pointing to struct css_set.  A task
      switches to a different css_set on cgroup migration but a css_set
      doesn't change once created and its pointers to cgroup_subsys_states
      aren't RCU protected.
      
      task_subsys_state[_check]() is the macro to acquire css given a task
      and subsys_id pair.  It RCU-dereferences task->cgroups->subsys[] not
      task->cgroups, so the RCU pointer task->cgroups ends up being
      dereferenced without read_barrier_depends() after it.  It's broken.
      
      Fix it by introducing task_css_set[_check]() which does
      RCU-dereference on task->cgroups.  task_subsys_state[_check]() is
      reimplemented to directly dereference ->subsys[] of the css_set
      returned from task_css_set[_check]().
      
      This removes some of sparse RCU warnings in cgroup.
      
      v2: Fixed unbalanced parenthsis and there's no need to use
          rcu_dereference_raw() when !CONFIG_PROVE_RCU.  Both spotted by Li.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Reported-by: default avatarFengguang Wu <fengguang.wu@intel.com>
      Acked-by: default avatarLi Zefan <lizefan@huawei.com>
      [bwh: Backported to 3.2:
       - Adjust context
       - Remove CONFIG_PROVE_RCU condition
       - s/lockdep_is_held(&cgroup_mutex)/cgroup_lock_is_held()/]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      Cc: Qiang Huang <h.huangqiang@huawei.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      71a20685
    • Kees Cook's avatar
      proc connector: reject unprivileged listener bumps · 2f590c47
      Kees Cook authored
      commit e70ab977 upstream.
      
      While PROC_CN_MCAST_LISTEN/IGNORE is entirely advisory, it was possible
      for an unprivileged user to turn off notifications for all listeners by
      sending PROC_CN_MCAST_IGNORE. Instead, require the same privileges as
      required for a multicast bind.
      Signed-off-by: default avatarKees Cook <keescook@chromium.org>
      Cc: Evgeniy Polyakov <zbr@ioremap.net>
      Cc: Matt Helsley <matthltc@us.ibm.com>
      Acked-by: default avatarEvgeniy Polyakov <zbr@ioremap.net>
      Acked-by: default avatarMatt Helsley <matthltc@us.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      [bwh: Backported to 3.2: adjust context]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      Cc: Qiang Huang <h.huangqiang@huawei.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2f590c47
    • Greg Edwards's avatar
      KVM: IOMMU: hva align mapping page size · f7741e3b
      Greg Edwards authored
      commit 27ef63c7 upstream.
      
      When determining the page size we could use to map with the IOMMU, the
      page size should also be aligned with the hva, not just the gfn.  The
      gfn may not reflect the real alignment within the hugetlbfs file.
      
      Most of the time, this works fine.  However, if the hugetlbfs file is
      backed by non-contiguous huge pages, a multi-huge page memslot starts at
      an unaligned offset within the hugetlbfs file, and the gfn is aligned
      with respect to the huge page size, kvm_host_page_size() will return the
      huge page size and we will use that to map with the IOMMU.
      
      When we later unpin that same memslot, the IOMMU returns the unmap size
      as the huge page size, and we happily unpin that many pfns in
      monotonically increasing order, not realizing we are spanning
      non-contiguous huge pages and partially unpin the wrong huge page.
      
      Ensure the IOMMU mapping page size is aligned with the hva corresponding
      to the gfn, which does reflect the alignment within the hugetlbfs file.
      Reviewed-by: default avatarMarcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: default avatarGreg Edwards <gedwards@ddn.com>
      Signed-off-by: default avatarGleb Natapov <gleb@redhat.com>
      [bwh: Backported to 3.2: s/__gfn_to_hva_memslot/gfn_to_hva_memslot/]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      Cc: Qiang Huang <h.huangqiang@huawei.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f7741e3b
    • Alexander Graf's avatar
      KVM: PPC: Emulate dcbf · 67bc20f7
      Alexander Graf authored
      commit d3286144 upstream.
      
      Guests can trigger MMIO exits using dcbf. Since we don't emulate cache
      incoherent MMIO, just do nothing and move on.
      Reported-by: default avatarBen Collins <ben.c@servergy.com>
      Signed-off-by: default avatarAlexander Graf <agraf@suse.de>
      Tested-by: default avatarBen Collins <ben.c@servergy.com>
      [bwh: Backported to 3.2: adjust context]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      Cc: Qiang Huang <h.huangqiang@huawei.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      67bc20f7
    • Christian Borntraeger's avatar
      s390/kvm: dont announce RRBM support · 115f5063
      Christian Borntraeger authored
      commit 87cac8f8 upstream.
      
      Newer kernels (linux-next with the transparent huge page patches)
      use rrbm if the feature is announced via feature bit 66.
      RRBM will cause intercepts, so KVM does not handle it right now,
      causing an illegal instruction in the guest.
      The  easy solution is to disable the feature bit for the guest.
      
      This fixes bugs like:
      Kernel BUG at 0000000000124c2a [verbose debug info unavailable]
      illegal operation: 0001 [#1] SMP
      Modules linked in: virtio_balloon virtio_net ipv6 autofs4
      CPU: 0 Not tainted 3.5.4 #1
      Process fmempig (pid: 659, task: 000000007b712fd0, ksp: 000000007bed3670)
      Krnl PSW : 0704d00180000000 0000000000124c2a (pmdp_clear_flush_young+0x5e/0x80)
           R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 EA:3
           00000000003cc000 0000000000000004 0000000000000000 0000000079800000
           0000000000040000 0000000000000000 000000007bed3918 000000007cf40000
           0000000000000001 000003fff7f00000 000003d281a94000 000000007bed383c
           000000007bed3918 00000000005ecbf8 00000000002314a6 000000007bed36e0
       Krnl Code:>0000000000124c2a: b9810025          ogr     %r2,%r5
                 0000000000124c2e: 41343000           la      %r3,0(%r4,%r3)
                 0000000000124c32: a716fffa           brct    %r1,124c26
                 0000000000124c36: b9010022           lngr    %r2,%r2
                 0000000000124c3a: e3d0f0800004       lg      %r13,128(%r15)
                 0000000000124c40: eb22003f000c       srlg    %r2,%r2,63
      [ 2150.713198] Call Trace:
      [ 2150.713223] ([<00000000002312c4>] page_referenced_one+0x6c/0x27c)
      [ 2150.713749]  [<0000000000233812>] page_referenced+0x32a/0x410
      [...]
      
      CC: Alex Graf <agraf@suse.de>
      Signed-off-by: default avatarMartin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: default avatarChristian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: default avatarMarcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      Cc: Qiang Huang <h.huangqiang@huawei.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      115f5063
    • Dominik Dingel's avatar
      KVM: s390: move kvm_guest_enter,exit closer to sie · bf5597c6
      Dominik Dingel authored
      commit 2b29a9fd upstream.
      
      Any uaccess between guest_enter and guest_exit could trigger a page fault,
      the page fault handler would handle it as a guest fault and translate a
      user address as guest address.
      Signed-off-by: default avatarDominik Dingel <dingel@linux.vnet.ibm.com>
      Signed-off-by: default avatarChristian Borntraeger <borntraeger@de.ibm.com>
      CC: stable@vger.kernel.org
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      [hq: Backported to 3.4: adjust context]
      Signed-off-by: default avatarQiang Huang <h.huangqiang@huawei.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      bf5597c6
    • Tejun Heo's avatar
      cgroup: cgroup_subsys->fork() should be called after the task is added to css_set · 30ec268b
      Tejun Heo authored
      commit 5edee61e upstream.
      
      cgroup core has a bug which violates a basic rule about event
      notifications - when a new entity needs to be added, you add that to
      the notification list first and then make the new entity conform to
      the current state.  If done in the reverse order, an event happening
      inbetween will be lost.
      
      cgroup_subsys->fork() is invoked way before the new task is added to
      the css_set.  Currently, cgroup_freezer is the only user of ->fork()
      and uses it to make new tasks conform to the current state of the
      freezer.  If FROZEN state is requested while fork is in progress
      between cgroup_fork_callbacks() and cgroup_post_fork(), the child
      could escape freezing - the cgroup isn't frozen when ->fork() is
      called and the freezer couldn't see the new task on the css_set.
      
      This patch moves cgroup_subsys->fork() invocation to
      cgroup_post_fork() after the new task is added to the css_set.
      cgroup_fork_callbacks() is removed.
      
      Because now a task may be migrated during cgroup_subsys->fork(),
      freezer_fork() is updated so that it adheres to the usual RCU locking
      and the rather pointless comment on why locking can be different there
      is removed (if it doesn't make anything simpler, why even bother?).
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Rafael J. Wysocki <rjw@sisk.pl>
      [hq: Backported to 3.4:
       - Adjust context
       - Iterate over first CGROUP_BUILTIN_SUBSYS_COUNT elements of subsys]
      Signed-off-by: default avatarQiang Huang <h.huangqiang@huawei.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      30ec268b
    • Johannes Weiner's avatar
      mm: vmscan: fix endless loop in kswapd balancing · f47929fd
      Johannes Weiner authored
      commit 60cefed4 upstream.
      
      Kswapd does not in all places have the same criteria for a balanced
      zone.  Zones are only being reclaimed when their high watermark is
      breached, but compaction checks loop over the zonelist again when the
      zone does not meet the low watermark plus two times the size of the
      allocation.  This gets kswapd stuck in an endless loop over a small
      zone, like the DMA zone, where the high watermark is smaller than the
      compaction requirement.
      
      Add a function, zone_balanced(), that checks the watermark, and, for
      higher order allocations, if compaction has enough free memory.  Then
      use it uniformly to check for balanced zones.
      
      This makes sure that when the compaction watermark is not met, at least
      reclaim happens and progress is made - or the zone is declared
      unreclaimable at some point and skipped entirely.
      Signed-off-by: default avatarJohannes Weiner <hannes@cmpxchg.org>
      Reported-by: default avatarGeorge Spelvin <linux@horizon.com>
      Reported-by: default avatarJohannes Hirte <johannes.hirte@fem.tu-ilmenau.de>
      Reported-by: default avatarTomas Racek <tracek@redhat.com>
      Tested-by: default avatarJohannes Hirte <johannes.hirte@fem.tu-ilmenau.de>
      Reviewed-by: default avatarRik van Riel <riel@redhat.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      [hq: Backported to 3.4: adjust context]
      Signed-off-by: default avatarQiang Huang <h.huangqiang@huawei.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      
      f47929fd
    • Hannes Reinecke's avatar
      dm mpath: fix stalls when handling invalid ioctls · 8371cffe
      Hannes Reinecke authored
      commit a1989b33 upstream.
      
      An invalid ioctl will never be valid, irrespective of whether multipath
      has active paths or not.  So for invalid ioctls we do not have to wait
      for multipath to activate any paths, but can rather return an error
      code immediately.  This fix resolves numerous instances of:
      
       udevd[]: worker [] unexpectedly returned with status 0x0100
      
      that have been seen during testing.
      Signed-off-by: default avatarHannes Reinecke <hare@suse.de>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8371cffe
    • Linus Walleij's avatar
      dma: ste_dma40: don't dereference free:d descriptor · 8d8e4839
      Linus Walleij authored
      commit e9baa9d9 upstream.
      
      It appears that in the DMA40 driver the DMA tasklet will very
      often dereference memory for a descriptor just free:d from the
      DMA40 slab. Nothing happens because no other part of the driver
      has yet had a chance to claim this memory, but it's really
      nasty to dereference free:d memory, so let's check the flag
      before the descriptor is free and store it in a bool variable.
      Reported-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: default avatarLinus Walleij <linus.walleij@linaro.org>
      Signed-off-by: default avatarVinod Koul <vinod.koul@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8d8e4839
    • Jan Kara's avatar
      quota: Fix race between dqput() and dquot_scan_active() · 54499e71
      Jan Kara authored
      commit 1362f4ea upstream.
      
      Currently last dqput() can race with dquot_scan_active() causing it to
      call callback for an already deactivated dquot. The race is as follows:
      
      CPU1					CPU2
        dqput()
          spin_lock(&dq_list_lock);
          if (atomic_read(&dquot->dq_count) > 1) {
           - not taken
          if (test_bit(DQ_ACTIVE_B, &dquot->dq_flags)) {
            spin_unlock(&dq_list_lock);
            ->release_dquot(dquot);
              if (atomic_read(&dquot->dq_count) > 1)
               - not taken
      					  dquot_scan_active()
      					    spin_lock(&dq_list_lock);
      					    if (!test_bit(DQ_ACTIVE_B, &dquot->dq_flags))
      					     - not taken
      					    atomic_inc(&dquot->dq_count);
      					    spin_unlock(&dq_list_lock);
              - proceeds to release dquot
      					    ret = fn(dquot, priv);
      					     - called for inactive dquot
      
      Fix the problem by making sure possible ->release_dquot() is finished by
      the time we call the callback and new calls to it will notice reference
      dquot_scan_active() has taken and bail out.
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      54499e71
    • Eric Paris's avatar
      SELinux: bigendian problems with filename trans rules · 186ef238
      Eric Paris authored
      commit 9085a642 upstream.
      
      When writing policy via /sys/fs/selinux/policy I wrote the type and class
      of filename trans rules in CPU endian instead of little endian.  On
      x86_64 this works just fine, but it means that on big endian arch's like
      ppc64 and s390 userspace reads the policy and converts it from
      le32_to_cpu.  So the values are all screwed up.  Write the values in le
      format like it should have been to start.
      Signed-off-by: default avatarEric Paris <eparis@redhat.com>
      Acked-by: default avatarStephen Smalley <sds@tycho.nsa.gov>
      Signed-off-by: default avatarPaul Moore <pmoore@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      186ef238
    • Peter Zijlstra's avatar
      perf: Fix hotplug splat · f80747a4
      Peter Zijlstra authored
      commit e3703f8c upstream.
      
      Drew Richardson reported that he could make the kernel go *boom* when hotplugging
      while having perf events active.
      
      It turned out that when you have a group event, the code in
      __perf_event_exit_context() fails to remove the group siblings from
      the context.
      
      We then proceed with destroying and freeing the event, and when you
      re-plug the CPU and try and add another event to that CPU, things go
      *boom* because you've still got dead entries there.
      Reported-by: default avatarDrew Richardson <drew.richardson@arm.com>
      Signed-off-by: default avatarPeter Zijlstra <peterz@infradead.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Link: http://lkml.kernel.org/n/tip-k6v5wundvusvcseqj1si0oz0@git.kernel.orgSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f80747a4
    • Lai Jiangshan's avatar
      workqueue: ensure @task is valid across kthread_stop() · 23f0913c
      Lai Jiangshan authored
      commit 5bdfff96 upstream.
      
      When a kworker should die, the kworkre is notified through WORKER_DIE
      flag instead of kthread_should_stop().  This, IIRC, is primarily to
      keep the test synchronized inside worker_pool lock.  WORKER_DIE is
      first set while holding pool->lock, the lock is dropped and
      kthread_stop() is called.
      
      Unfortunately, this means that there's a slight chance that the target
      kworker may see WORKER_DIE before kthread_stop() finishes and exits
      and frees the target task before or during kthread_stop().
      
      Fix it by pinning the target task before setting WORKER_DIE and
      putting it after kthread_stop() is done.
      
      tj: Improved patch description and comment.  Moved pinning above
          WORKER_DIE for better signify what it's protecting.
      Signed-off-by: default avatarLai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      23f0913c
    • Guenter Roeck's avatar
      hwmon: (max1668) Fix writing the minimum temperature · 8d362994
      Guenter Roeck authored
      commit 500a9157 upstream.
      
      When trying to set the minimum temperature, the driver was erroneously
      writing the maximum temperature into the chip.
      Signed-off-by: default avatarGuenter Roeck <linux@roeck-us.net>
      Reviewed-by: default avatarJean Delvare <jdelvare@suse.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8d362994
    • Joerg Dorchain's avatar
      USB: ftdi_sio: add Cressi Leonardo PID · 564a4d6c
      Joerg Dorchain authored
      commit 6dbd46c8 upstream.
      
      Hello,
      
      the following patch adds an entry for the PID of a Cressi Leonardo
      diving computer interface to kernel 3.13.0.
      It is detected as FT232RL.
      Works with subsurface.
      Signed-off-by: default avatarJoerg Dorchain <joerg@dorchain.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      564a4d6c
    • Aleksander Morgado's avatar
      USB: serial: option: blacklist interface 4 for Cinterion PHS8 and PXS8 · f922754e
      Aleksander Morgado authored
      commit 12df84d4 upstream.
      
      This interface is to be handled by the qmi_wwan driver.
      
      CC: Hans-Christoph Schemmel <hans-christoph.schemmel@gemalto.com>
      CC: Christian Schmiedl <christian.schmiedl@gemalto.com>
      CC: Nicolaus Colberg <nicolaus.colberg@gemalto.com>
      CC: David McCullough <david.mccullough@accelecon.com>
      Signed-off-by: default avatarAleksander Morgado <aleksander@aleksander.es>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f922754e
    • Lan Tianyu's avatar
      ACPI / processor: Rework processor throttling with work_on_cpu() · 1d023027
      Lan Tianyu authored
      commit f3ca4164 upstream.
      
      acpi_processor_set_throttling() uses set_cpus_allowed_ptr() to make
      sure that the (struct acpi_processor)->acpi_processor_set_throttling()
      callback will run on the right CPU.  However, the function may be
      called from a worker thread already bound to a different CPU in which
      case that won't work.
      
      Make acpi_processor_set_throttling() use work_on_cpu() as appropriate
      instead of abusing set_cpus_allowed_ptr().
      Reported-and-tested-by: default avatarJiri Olsa <jolsa@redhat.com>
      Signed-off-by: default avatarLan Tianyu <tianyu.lan@intel.com>
      [rjw: Changelog]
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1d023027
    • Hans de Goede's avatar
      ACPI / video: Filter the _BCL table for duplicate brightness values · b0b0c264
      Hans de Goede authored
      commit bd8ba205 upstream.
      
      Some devices have duplicate entries in there brightness levels table, ie
      on my Dell Latitude E6430 the table looks like this:
      
      [    3.686060] acpi backlight index   0, val 80
      [    3.686095] acpi backlight index   1, val 50
      [    3.686122] acpi backlight index   2, val 5
      [    3.686147] acpi backlight index   3, val 5
      [    3.686172] acpi backlight index   4, val 5
      [    3.686197] acpi backlight index   5, val 5
      [    3.686223] acpi backlight index   6, val 5
      [    3.686248] acpi backlight index   7, val 5
      [    3.686273] acpi backlight index   8, val 6
      [    3.686332] acpi backlight index   9, val 7
      [    3.686356] acpi backlight index  10, val 8
      [    3.686380] acpi backlight index  11, val 9
      etc.
      
      Notice that brightness values 0-5 are all mapped to 5. This means that
      if userspace writes any value between 0 and 5 to the brightness sysfs attribute
      and then reads it, it will always return 0, which is somewhat unexpected.
      
      This is a problem for ie gnome-settings-daemon, which uses read-modify-write
      logic when the users presses the brightness up or down keys. This is done
      this way to take brightness changes from other sources into account.
      
      On this specific laptop what happens once the brightness has been set to 0,
      is that gsd reads 0, adds 5, writes 5, and on the next brightness up key press
      again reads 0, so things get stuck at the lowest brightness setting.
      
      Filtering out the duplicate table entries, makes any write to brightness
      read back as the written value as one would expect, fixing this.
      Signed-off-by: default avatarHans de Goede <hdegoede@redhat.com>
      Reviewed-by: default avatarAaron Lu <aaron.lu@intel.com>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b0b0c264
    • Jean Delvare's avatar
      i7core_edac: Fix PCI device reference count · 9f84cff0
      Jean Delvare authored
      commit c0f5eeed upstream.
      
      The reference count changes done by pci_get_device can be a little
      misleading when the usage diverges from the most common scheme. The
      reference count of the device passed as the last parameter is always
      decreased, even if the function returns no new device. So if we are
      going to try alternative device IDs, we must manually increment the
      device reference count before each retry. If we don't, we end up
      decreasing the reference count, and after a few modprobe/rmmod cycles
      the PCI devices will vanish.
      
      In other words and as Alan put it: without this fix the EDAC code
      corrupts the PCI device list.
      
      This fixes kernel bug #50491:
      https://bugzilla.kernel.org/show_bug.cgi?id=50491Signed-off-by: default avatarJean Delvare <jdelvare@suse.de>
      Link: http://lkml.kernel.org/r/20140224093927.7659dd9d@endymion.delvareReviewed-by: default avatarAlan Cox <alan@linux.intel.com>
      Cc: Mauro Carvalho Chehab <m.chehab@samsung.com>
      Cc: Doug Thompson <dougthompson@xmission.com>
      Signed-off-by: default avatarBorislav Petkov <bp@suse.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9f84cff0
    • Tejun Heo's avatar
      sata_sil: apply MOD15WRITE quirk to TOSHIBA MK2561GSYN · 31d183aa
      Tejun Heo authored
      commit 9f9c47f0 upstream.
      
      It's a bit odd to see a newer device showing mod15write; however, the
      reported behavior is highly consistent and other factors which could
      contribute seem to have been verified well enough.  Also, both
      sata_sil itself and the drive are fairly outdated at this point making
      the risk of this change fairly low.  It is possible, probably likely,
      that other drive models in the same family have the same problem;
      however, for now, let's just add the specific model which was tested.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Reported-by: default avatarmatson <lists-matsonpa@luxsci.me>
      References: http://lkml.kernel.org/g/201401211912.s0LJCk7F015058@rs103.luxsci.comSigned-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      31d183aa
    • Denis V. Lunev's avatar
      ata: enable quirk from jmicron JMB350 for JMB394 · 13e60b22
      Denis V. Lunev authored
      commit efb9e0f4 upstream.
      
      Without the patch the kernel generates the following error.
      
       ata11.15: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
       ata11.15: Port Multiplier vendor mismatch '0x197b' != '0x123'
       ata11.15: PMP revalidation failed (errno=-19)
       ata11.15: failed to recover PMP after 5 tries, giving up
      
      This patch helps to bypass this error and the device becomes
      functional.
      Signed-off-by: default avatarDenis V. Lunev <den@openvz.org>
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Cc: <linux-ide@vger.kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      13e60b22