1. 06 Mar, 2013 40 commits
    • ocfs2: ac->ac_allow_chain_relink=0 won't disable group relink · 0ebab908
      Xiaowei.Hu authored
      commit 309a85b6 upstream.
      
      ocfs2_block_group_alloc_discontig() disables chain relink by setting
      ac->ac_allow_chain_relink = 0 because it grabs clusters from multiple
      cluster groups.
      
      It doesn't reserve credits for every chain relink, but
      ocfs2_claim_suballoc_bits() overrides the setting in this call trace:
      ocfs2_block_group_claim_bits()->ocfs2_claim_clusters()->
      __ocfs2_claim_clusters()->ocfs2_claim_suballoc_bits()
      ocfs2_claim_suballoc_bits() sets ac->ac_allow_chain_relink = 1, calls
      ocfs2_search_chain() once, and then disables it again, so we run out
      of credits.
      
      The fix is to allow relink by default and disable it in
      ocfs2_block_group_alloc_discontig().

      Without this patch, end users will hit a crash caused by running out
      of credits, with a backtrace like this:
      
        RIP: 0010:[<ffffffffa0808b14>]  [<ffffffffa0808b14>]
        jbd2_journal_dirty_metadata+0x164/0x170 [jbd2]
        RSP: 0018:ffff8801b919b5b8  EFLAGS: 00010246
        RAX: 0000000000000000 RBX: ffff88022139ddc0 RCX: ffff880159f652d0
        RDX: ffff880178aa3000 RSI: ffff880159f652d0 RDI: ffff880087f09bf8
        RBP: ffff8801b919b5e8 R08: 0000000000000000 R09: 0000000000000000
        R10: 0000000000001e00 R11: 00000000000150b0 R12: ffff880159f652d0
        R13: ffff8801a0cae908 R14: ffff880087f09bf8 R15: ffff88018d177800
        FS:  00007fc9b0b6b6e0(0000) GS:ffff88022fd40000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
        CR2: 000000000040819c CR3: 0000000184017000 CR4: 00000000000006e0
        DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
        Process dd (pid: 9945, threadinfo ffff8801b919a000, task ffff880149a264c0)
        Call Trace:
          ocfs2_journal_dirty+0x2f/0x70 [ocfs2]
          ocfs2_relink_block_group+0x111/0x480 [ocfs2]
          ocfs2_search_chain+0x455/0x9a0 [ocfs2]
          ...

      Signed-off-by: Xiaowei.Hu <xiaowei.hu@oracle.com>
      Reviewed-by: Srinivas Eeda <srinivas.eeda@oracle.com>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • ocfs2: fix ocfs2_init_security_and_acl() to initialize acl correctly · 98e4a531
      Jeff Liu authored
      commit 32918dd9 upstream.
      
      We need to re-initialize the security for a new reflinked inode with its
      parent dirs if it isn't specified to be preserved for ocfs2_reflink().
      However, the code logic is broken at ocfs2_init_security_and_acl()
      even when ocfs2_init_security_get() succeeds.  As a result,
      ocfs2_acl_init() is not invoked and therefore the default ACL of the
      parent dir is missing on the new inode.
      
      Note this was introduced by 9d8f13ba ("security: new
      security_inode_init_security API adds function callback")
      
      To reproduce:
      
          Set the default ACL for the parent dir (ocfs2 in this case):
          $ setfacl -m default:user:jeff:rwx ../ocfs2/
          $ getfacl ../ocfs2/
          # file: ../ocfs2/
          # owner: jeff
          # group: jeff
          user::rwx
          group::r-x
          other::r-x
          default:user::rwx
          default:user:jeff:rwx
          default:group::r-x
          default:mask::rwx
          default:other::r-x
      
          $ touch a
          $ getfacl a
          # file: a
          # owner: jeff
          # group: jeff
          user::rw-
          group::rw-
          other::r--
      
      Before patching, create reflink file b from a; the user
      default ACL entry (user:jeff:rwx) is missing:
      
          $ ./ocfs2_reflink a b
          $ getfacl b
          # file: b
          # owner: jeff
          # group: jeff
          user::rw-
          group::rw-
          other::r--
      
      In this case, the end user can also observe an error message in syslog:
      
        (ocfs2_reflink,3229,2):ocfs2_init_security_and_acl:7193 ERROR: status = 0
      
      After applying this patch, create reflink file c from a:
      
          $ ./ocfs2_reflink a c
          $ getfacl c
          # file: c
          # owner: jeff
          # group: jeff
          user::rw-
          user:jeff:rwx			#effective:rw-
          group::r-x			#effective:r--
          mask::rw-
          other::r--
      
      Test program:
      /* Usage: reflink <source> <dest> */
      #include <stdio.h>
      #include <stdint.h>
      #include <stdbool.h>
      #include <string.h>
      #include <errno.h>
      #include <sys/types.h>
      #include <sys/stat.h>
      #include <fcntl.h>
      #include <sys/ioctl.h>
      
      static int
      reflink_file(char const *src_name, char const *dst_name,
      	     bool preserve_attrs)
      {
      	int fd;
      
      #ifndef REFLINK_ATTR_NONE
      #  define REFLINK_ATTR_NONE 0
      #endif
      #ifndef REFLINK_ATTR_PRESERVE
      #  define REFLINK_ATTR_PRESERVE 1
      #endif
      #ifndef OCFS2_IOC_REFLINK
      	struct reflink_arguments {
      		uint64_t old_path;
      		uint64_t new_path;
      		uint64_t preserve;
      	};
      
      #  define OCFS2_IOC_REFLINK _IOW ('o', 4, struct reflink_arguments)
      #endif
      	struct reflink_arguments args = {
      		.old_path = (unsigned long) src_name,
      		.new_path = (unsigned long) dst_name,
      		.preserve = preserve_attrs ? REFLINK_ATTR_PRESERVE :
      					     REFLINK_ATTR_NONE,
      	};
      
      	fd = open(src_name, O_RDONLY);
      	if (fd < 0) {
      		fprintf(stderr, "Failed to open %s: %s\n",
      			src_name, strerror(errno));
      		return -1;
      	}
      
      	if (ioctl(fd, OCFS2_IOC_REFLINK, &args) < 0) {
      		fprintf(stderr, "Failed to reflink %s to %s: %s\n",
      			src_name, dst_name, strerror(errno));
      		return -1;
      	}

      	return 0;
      }
      
      int
      main(int argc, char *argv[])
      {
      	if (argc != 3) {
      		fprintf(stdout, "Usage: %s source dest\n", argv[0]);
      		return 1;
      	}
      
      	return reflink_file(argv[1], argv[2], 0);
      }

      Signed-off-by: Jie Liu <jeff.liu@oracle.com>
      Reviewed-by: Tao Ma <boyu.mt@taobao.com>
      Cc: Mimi Zohar <zohar@linux.vnet.ibm.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • x86: Make sure we can boot in the case the BDA contains pure garbage · c2581d3d
      H. Peter Anvin authored
      commit 7c100936 upstream.
      
      On non-BIOS platforms it is possible that the BIOS data area contains
      garbage instead of being zeroed or something equivalent (firmware
      people: we are talking of 1.5K here, so please do the sane thing.)
      
      We need on the order of 20-30K of low memory in order to boot, which
      may grow up to < 64K in the future.  We probably want to avoid the
      lowest of the low memory.  At the same time, it seems extremely
      unlikely that a legitimate EBDA would ever reach down to the 128K
      mark (which would require it to be over half a megabyte in size.)  Thus,
      pick 128K as the cutoff for "this is insane, ignore."  We may still
      end up reserving a bunch of extra memory on the low megabyte, but that
      is not really a major issue these days.  In the worst case we lose
      512K of RAM.
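
      The cutoff described above can be sketched in user-space C.  This is a
      minimal sketch, not the patch itself: the function name
      pick_lowmem_top() and the constants INSANE_CUTOFF (128K) and LOWMEM_CAP
      (636K, just under the 640K legacy limit) are assumptions made for
      illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for illustration: 128K sanity cutoff, 636K upper cap. */
#define INSANE_CUTOFF 0x20000u
#define LOWMEM_CAP    0x9f000u

/*
 * Return the lowest address to treat as reserved, given the low-memory
 * size (in KiB) and the EBDA address read from the BIOS data area.
 * Values below the cutoff are treated as garbage and ignored.
 */
static uint32_t pick_lowmem_top(uint16_t bda_lowmem_kb, uint32_t ebda_addr)
{
	uint32_t lowmem = (uint32_t)bda_lowmem_kb << 10;

	/* Trust the EBDA pointer only if it is sane and below lowmem. */
	if (ebda_addr >= INSANE_CUTOFF && ebda_addr < lowmem)
		lowmem = ebda_addr;

	/* A garbage or oversized lowmem value falls back to the cap. */
	if (lowmem < INSANE_CUTOFF || lowmem > LOWMEM_CAP)
		lowmem = LOWMEM_CAP;

	/* Whatever the firmware reported, the result stays in range. */
	assert(lowmem >= INSANE_CUTOFF && lowmem <= LOWMEM_CAP);
	return lowmem;
}
```

      With these assumed constants, a zeroed or garbage BDA yields the cap,
      while an EBDA pointer between 128K and the reported low-memory size is
      used as is.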
      
      This code really should be merged with trim_bios_range() in
      arch/x86/kernel/setup.c, but that is a bigger patch for a later merge
      window.

      Reported-by: Darren Hart <dvhart@linux.intel.com>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
      Cc: Matt Fleming <matt.fleming@intel.com>
      Link: http://lkml.kernel.org/n/tip-oebml055yyfm8yxmria09rja@git.kernel.org
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • ocfs2: fix possible use-after-free with AIO · d7763498
      Jan Kara authored
      commit 9b171e0c upstream.
      
      A running AIO pins the inode in memory through a file reference. Once
      the AIO is completed using aio_complete(), the file reference is put and
      the inode can be freed from memory. So we have to be sure that calling
      aio_complete() is the last thing we do with the inode.

      Signed-off-by: Jan Kara <jack@suse.cz>
      Acked-by: Jeff Moyer <jmoyer@redhat.com>
      Acked-by: Joel Becker <jlbec@evilplan.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • doc, kernel-parameters: Document 'console=hvc<n>' · 7479d5dd
      Konrad Rzeszutek Wilk authored
      commit a2fd6419 upstream.
      
      Both the PowerPC hypervisor and Xen hypervisor can utilize the
      hvc driver.
      
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Link: http://lkml.kernel.org/r/1361825650-14031-3-git-send-email-konrad.wilk@oracle.com
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • doc, xen: Mention 'earlyprintk=xen' in the documentation. · 56070188
      Konrad Rzeszutek Wilk authored
      commit 2482a92e upstream.
      
      The earlyprintk for Xen PV guests uses a simple hypercall
      (console_io) to send output to the Xen emergency console.
      
      Note that the Xen hypervisor should be booted with 'loglevel=all'
      to output said information.

      Reported-by: H. Peter Anvin <hpa@zytor.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Link: http://lkml.kernel.org/r/1361825650-14031-2-git-send-email-konrad.wilk@oracle.com
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • mmc: sdhci-esdhc-imx: fix host version read · e3efb0af
      Shawn Guo authored
      commit ef4d0888 upstream.
      
      When commit 95a2482a (mmc: sdhci-esdhc-imx: add basic imx6q usdhc
      support) worked around the host version issue on imx6q, it lost the
      register address fixup "reg ^= 2" for imx25/35/51/53 esdhc.
      Thus, the controller version on these SoCs is wrongly identified
      as v1 while it's actually v2.
      
      Add the address fixup back and take a different approach to correct
      imx6q host version, so that the host version read gets back to work
      for all SoCs.
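
      The "reg ^= 2" fixup can be shown in isolation.  This is a hedged
      sketch, assuming the 16-bit host version register sits at offset 0xFE
      as in the standard SDHCI layout and that the eSDHC exposes it in the
      other half of the same 32-bit word.  The helper name and the constant
      name are hypothetical, not taken from the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed offset of the 16-bit host version register. */
#define HOST_VERSION_REG 0xFEu

/*
 * Return the register offset to use for a 16-bit read.  Only the host
 * version register is redirected; XOR with 2 selects the other 16-bit
 * half of the same 32-bit word (0xFE becomes 0xFC).
 */
static uint32_t esdhc_fixup_readw_reg(uint32_t reg)
{
	assert((reg & 1u) == 0);	/* 16-bit accesses are 2-byte aligned */

	if (reg == HOST_VERSION_REG)
		reg ^= 2;

	return reg;
}
```

      Every other offset passes through unchanged, which is why dropping the
      fixup only affects the version read.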

      Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
      Signed-off-by: Chris Ball <cjb@laptop.org>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • tmpfs: fix use-after-free of mempolicy object · 2b82b58d
      Greg Thelen authored
      commit 5f00110f upstream.
      
      The tmpfs remount logic preserves filesystem mempolicy if the mpol=M
      option is not specified in the remount request.  A new policy can be
      specified if mpol=M is given.
      
      Before this patch remounting an mpol bound tmpfs without specifying
      mpol= mount option in the remount request would set the filesystem's
      mempolicy object to a freed mempolicy object.
      
      To reproduce the problem boot a DEBUG_PAGEALLOC kernel and run:
          # mkdir /tmp/x
      
          # mount -t tmpfs -o size=100M,mpol=interleave nodev /tmp/x
      
          # grep /tmp/x /proc/mounts
          nodev /tmp/x tmpfs rw,relatime,size=102400k,mpol=interleave:0-3 0 0
      
          # mount -o remount,size=200M nodev /tmp/x
      
          # grep /tmp/x /proc/mounts
          nodev /tmp/x tmpfs rw,relatime,size=204800k,mpol=??? 0 0
              # note ? garbage in mpol=... output above
      
          # dd if=/dev/zero of=/tmp/x/f count=1
              # panic here
      
      Panic:
          BUG: unable to handle kernel NULL pointer dereference at           (null)
          IP: [<          (null)>]           (null)
          [...]
          Oops: 0010 [#1] SMP DEBUG_PAGEALLOC
          Call Trace:
            mpol_shared_policy_init+0xa5/0x160
            shmem_get_inode+0x209/0x270
            shmem_mknod+0x3e/0xf0
            shmem_create+0x18/0x20
            vfs_create+0xb5/0x130
            do_last+0x9a1/0xea0
            path_openat+0xb3/0x4d0
            do_filp_open+0x42/0xa0
            do_sys_open+0xfe/0x1e0
            compat_sys_open+0x1b/0x20
            cstar_dispatch+0x7/0x1f
      
      Non-debug kernels will not crash immediately because referencing the
      dangling mpol will not cause a fault.  Instead the filesystem will
      reference a freed mempolicy object, which will cause unpredictable
      behavior.
      
      The problem boils down to a dropped mpol reference below if
      shmem_parse_options() does not allocate a new mpol:
      
          config = *sbinfo
          shmem_parse_options(data, &config, true)
          mpol_put(sbinfo->mpol)
          sbinfo->mpol = config.mpol  /* BUG: saves unreferenced mpol */
      
      This patch avoids the crash by not releasing the mempolicy if
      shmem_parse_options() doesn't create a new mpol.
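
      The dropped reference can be modelled with a toy refcount.  This is a
      sketch under stated assumptions, not the shmem code: struct toy_mpol,
      struct toy_sbinfo and the toy_* functions are hypothetical, and a
      "freed" flag stands in for actually freeing the object so the outcome
      can be inspected safely.

```c
#include <assert.h>
#include <stddef.h>

/* Toy mempolicy: "freed" is set instead of really freeing the object. */
struct toy_mpol {
	int refcnt;
	int freed;
};

struct toy_sbinfo {
	struct toy_mpol *mpol;
};

static void toy_mpol_put(struct toy_mpol *pol)
{
	if (pol == NULL)
		return;
	assert(pol->refcnt > 0);
	if (--pol->refcnt == 0)
		pol->freed = 1;
}

/* parsed is NULL when the remount request carries no mpol= option. */
static void toy_remount_buggy(struct toy_sbinfo *sbinfo,
			      struct toy_mpol *parsed)
{
	struct toy_sbinfo config = *sbinfo;	/* copies the pointer only */

	if (parsed != NULL)
		config.mpol = parsed;
	toy_mpol_put(sbinfo->mpol);		/* drops the last reference */
	sbinfo->mpol = config.mpol;		/* may point at a freed object */
}

/* Only release the old policy when a new one replaces it. */
static void toy_remount_fixed(struct toy_sbinfo *sbinfo,
			      struct toy_mpol *parsed)
{
	if (parsed != NULL) {
		toy_mpol_put(sbinfo->mpol);
		sbinfo->mpol = parsed;
	}
}
```

      In the buggy variant a remount without mpol= leaves sbinfo->mpol
      pointing at an object whose last reference was just dropped; the fixed
      variant leaves the existing policy untouched.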
      
      How far back does this issue go? I see it in both 2.6.36 and 3.3.  I did
      not look back further.

      Signed-off-by: Greg Thelen <gthelen@google.com>
      Acked-by: Hugh Dickins <hughd@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • mm/fadvise.c: drain all pagevecs if POSIX_FADV_DONTNEED fails to discard all pages · 9fb42c9a
      Mel Gorman authored
      commit 67d46b29 upstream.
      
      Rob van der Heij reported the following (paraphrased) in private mail.
      
      	The scenario is that I want to avoid backups filling up the page
      	cache and purging stuff that is more likely to be used again (this is
      	with s390x Linux on z/VM, so I don't give it so much memory that
      	we don't care anymore). So I have something with LD_PRELOAD that
      	intercepts the close() call (from tar, in this case) and issues
      	a posix_fadvise() just before closing the file.
      
      	This mostly works, except for small files (less than 14 pages)
      	that remain in page cache after the fact.
      
      Unfortunately Rob has not had a chance to test this exact patch but the
      test program below should be reproducing the problem he described.
      
      The issue is the per-cpu pagevecs for LRU additions.  If the pages are
      added by one CPU but fadvise() is called on another then the pages
      remain resident as the invalidate_mapping_pages() only drains the local
      pagevecs via its call to pagevec_release().  The user-visible effect is
      that a program that uses fadvise() properly is not obeyed.
      
      A possible fix for this is to put the necessary smarts into
      invalidate_mapping_pages() to globally drain the LRU pagevecs if a
      pagevec page could not be discarded.  The downside with this is that an
      inode cache shrink would send a global IPI and memory pressure
      potentially causing global IPI storms is very undesirable.
      
      Instead, this patch adds a check during fadvise(POSIX_FADV_DONTNEED) to
      check if invalidate_mapping_pages() discarded all the requested pages.
      If only a subset of the pages is discarded, it drains the LRU pagevecs and tries
      again.  If the second attempt fails, it assumes it is due to the pages
      being mapped, locked or dirty and does not care.  With this patch, an
      application using fadvise() correctly will be obeyed but there is a
      downside that a malicious application can force the kernel to send
      global IPIs and increase overhead.
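
      The retry logic in the previous paragraph can be modelled with a toy
      page cache.  This is a sketch under stated assumptions, not the kernel
      code: struct toy_cache and the toy_* helpers are hypothetical stand-ins
      for the mapping, invalidate_mapping_pages() and a global drain of the
      LRU pagevecs.

```c
#include <assert.h>

struct toy_cache {
	unsigned long on_lru;		/* pages that can be invalidated now */
	unsigned long in_pagevec;	/* pages parked in another CPU's pagevec */
};

/* Drop every page currently on the LRU and report how many went away. */
static unsigned long toy_invalidate(struct toy_cache *c)
{
	unsigned long dropped = c->on_lru;

	c->on_lru = 0;
	return dropped;
}

/* Move pages from all per-CPU pagevecs onto the LRU. */
static void toy_drain_all(struct toy_cache *c)
{
	c->on_lru += c->in_pagevec;
	c->in_pagevec = 0;
}

/* Returns the number of pages still resident after the advice. */
static unsigned long toy_fadvise_dontneed(struct toy_cache *c,
					  unsigned long nr_requested)
{
	unsigned long count;

	/* The toy file never holds more pages than were requested. */
	assert(c->on_lru + c->in_pagevec <= nr_requested);

	count = toy_invalidate(c);
	if (count < nr_requested) {
		/* Something was missed: drain once and try again. */
		toy_drain_all(c);
		toy_invalidate(c);
	}
	return c->on_lru + c->in_pagevec;
}
```

      The drain is only paid for when the first pass comes up short, which
      mirrors the trade-off described above.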
      
      If accepted, I would like this to be considered as a -stable candidate.
      It's not an urgent issue but it's a system call that is not working as
      advertised which is weak.
      
      The following test program demonstrates the problem.  It should never
      report that pages are still resident but will without this patch.  It
      assumes that CPU 0 and 1 exist.
      
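      #define _GNU_SOURCE	/* for cpu_set_t and sched_setaffinity() */
      #include <fcntl.h>
      #include <sched.h>
      #include <stdio.h>
      #include <stdlib.h>
      #include <string.h>
      #include <unistd.h>
      #include <sys/mman.h>
      #include <sys/types.h>

      /*
       * Assumed value: the size did not survive in this message.  The report
       * above concerns files smaller than 14 pages.
       */
      #define FILESIZE_PAGES 4

      int main() {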
      	int fd;
      	int pagesize = getpagesize();
      	ssize_t written = 0, expected;
      	char *buf;
      	unsigned char *vec;
      	int resident, i;
      	cpu_set_t set;
      
      	/* Prepare a buffer for writing */
      	expected = FILESIZE_PAGES * pagesize;
      	buf = malloc(expected + 1);
      	if (buf == NULL) {
      		printf("ENOMEM\n");
      		exit(EXIT_FAILURE);
      	}
      	buf[expected] = 0;
      	memset(buf, 'a', expected);
      
      	/* Prepare the mincore vec */
      	vec = malloc(FILESIZE_PAGES);
      	if (vec == NULL) {
      		printf("ENOMEM\n");
      		exit(EXIT_FAILURE);
      	}
      
      	/* Bind ourselves to CPU 0 */
      	CPU_ZERO(&set);
      	CPU_SET(0, &set);
      	if (sched_setaffinity(getpid(), sizeof(set), &set) == -1) {
      		perror("sched_setaffinity");
      		exit(EXIT_FAILURE);
      	}
      
      	/* open file, unlink and write buffer */
      	fd = open("fadvise-test-file", O_CREAT|O_EXCL|O_RDWR, 0600);
      	if (fd == -1) {
      		perror("open");
      		exit(EXIT_FAILURE);
      	}
      	unlink("fadvise-test-file");
      	while (written < expected) {
      		ssize_t this_write;
      		this_write = write(fd, buf + written, expected - written);
      
      		if (this_write == -1) {
      			perror("write");
      			exit(EXIT_FAILURE);
      		}
      
      		written += this_write;
      	}
      	free(buf);
      
      	/*
      	 * Force ourselves to another CPU. If fadvise only flushes the local
      	 * CPUs pagevecs then the fadvise will fail to discard all file pages
      	 */
      	CPU_ZERO(&set);
      	CPU_SET(1, &set);
      	if (sched_setaffinity(getpid(), sizeof(set), &set) == -1) {
      		perror("sched_setaffinity");
      		exit(EXIT_FAILURE);
      	}
      
      	/* sync and fadvise to discard the page cache */
      	fsync(fd);
      	if (posix_fadvise(fd, 0, expected, POSIX_FADV_DONTNEED) == -1) {
      		perror("posix_fadvise");
      		exit(EXIT_FAILURE);
      	}
      
      	/* map the file and use mincore to see which parts of it are resident */
      	buf = mmap(NULL, expected, PROT_READ, MAP_SHARED, fd, 0);
      	if (buf == MAP_FAILED) {
      		perror("mmap");
      		exit(EXIT_FAILURE);
      	}
      	if (mincore(buf, expected, vec) == -1) {
      		perror("mincore");
      		exit(EXIT_FAILURE);
      	}
      
      	/* Check residency */
      	for (i = 0, resident = 0; i < FILESIZE_PAGES; i++) {
      		if (vec[i])
      			resident++;
      	}
      	if (resident != 0) {
      		printf("Nr unexpected pages resident: %d\n", resident);
      		exit(EXIT_FAILURE);
      	}
      
      	munmap(buf, expected);
      	close(fd);
      	free(vec);
      	exit(EXIT_SUCCESS);
      }

      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Reported-by: Rob van der Heij <rvdheij@gmail.com>
      Tested-by: Rob van der Heij <rvdheij@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • mmu_notifier_unregister NULL Pointer deref and multiple ->release() callouts · 09454abb
      Robin Holt authored
      commit 751efd86 upstream.
      
      There is a race condition between mmu_notifier_unregister() and
      __mmu_notifier_release().
      
      Assume two tasks, one calling mmu_notifier_unregister() as a result of a
      filp_close() ->flush() callout (task A), and the other calling
      mmu_notifier_release() from an mmput() (task B).
      
                      A                               B
      t1                                              srcu_read_lock()
      t2              if (!hlist_unhashed())
      t3                                              srcu_read_unlock()
      t4              srcu_read_lock()
      t5                                              hlist_del_init_rcu()
      t6                                              synchronize_srcu()
      t7              srcu_read_unlock()
      t8              hlist_del_rcu()  <--- NULL pointer deref.
      
      Additionally, the list traversal in __mmu_notifier_release() is not
      protected by the mmu_notifier_mm->hlist_lock, which can result in
      callouts to the ->release() notifier from both mmu_notifier_unregister()
      and __mmu_notifier_release().
      
      -stable suggestions:
      
      The stable trees prior to 3.7.y need commits 21a92735 and
      70400303 cherry-picked in that order prior to cherry-picking this
      commit.  The 3.7.y tree already has those two commits.

      Signed-off-by: Robin Holt <holt@sgi.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com>
      Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
      Cc: Avi Kivity <avi@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Sagi Grimberg <sagig@mellanox.co.il>
      Cc: Haggai Eran <haggaie@mellanox.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • mm: mmu_notifier: make the mmu_notifier srcu static · 3d2b066c
      Andrea Arcangeli authored
      commit 70400303 upstream.
      
      The variable must be static especially given the variable name.
      
      s/RCU/SRCU/ over a few comments.

      Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
      Cc: Sagi Grimberg <sagig@mellanox.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Haggai Eran <haggaie@mellanox.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • mm: mmu_notifier: have mmu_notifiers use a global SRCU so they may safely schedule · 80da2799
      Sagi Grimberg authored
      commit 21a92735 upstream.
      
      With an RCU based mmu_notifier implementation, any callout to
      mmu_notifier_invalidate_range_{start,end}() or
      mmu_notifier_invalidate_page() would not be allowed to call schedule()
      as that could potentially allow a modification to the mmu_notifier
      structure while it is currently being used.
      
      Since srcu allocs 4 machine words per instance per cpu, we may end up
      with memory exhaustion if we use srcu per mm.  So all mms share a global
      srcu.  Note that during large mmu_notifier activity exit & unregister
      paths might hang for longer periods, but it is tolerable for current
      mmu_notifier clients.

      Signed-off-by: Sagi Grimberg <sagig@mellanox.co.il>
      Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Haggai Eran <haggaie@mellanox.com>
      Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • powerpc/kexec: Disable hard IRQ before kexec · 819a56e8
      Phileas Fogg authored
      commit 8520e443 upstream.
      
      Disable hard IRQs before kexec'ing a new kernel image.
      Not doing so can result in corrupted data in the memory segments
      reserved for the new kernel.

      Signed-off-by: Phileas Fogg <phileas-fogg@mail.ru>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • fs: Fix possible use-after-free with AIO · 53edcd23
      Jan Kara authored
      commit 54c807e7 upstream.
      
      A running AIO pins the inode in memory through a file reference. Once
      the AIO is completed using aio_complete(), the file reference is put and
      the inode can be freed from memory. So we have to be sure that calling
      aio_complete() is the last thing we do with the inode.
      
      CC: Christoph Hellwig <hch@infradead.org>
      CC: Jens Axboe <axboe@kernel.dk>
      CC: Jeff Moyer <jmoyer@redhat.com>
      Acked-by: Jeff Moyer <jmoyer@redhat.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • ext4: fix free clusters calculation in bigalloc filesystem · 668f6e6a
      Lukas Czerner authored
      commit 304e220f upstream.
      
      ext4_has_free_clusters() should tell us whether there are enough free
      clusters to allocate.  However, the number of free clusters in the file
      system is converted to blocks using EXT4_C2B(), which is not only a wrong
      use of the macro (we should have used EXT4_NUM_B2C) but also a completely
      wrong concept, since everything else is in cluster units.
      
      Moreover, when calculating the number of root clusters we should be using
      the macro EXT4_NUM_B2C() instead of EXT4_B2C(), otherwise the result might
      be off by one. However, r_blocks_count should always be a multiple of the
      cluster ratio, so doing a plain bit shift should be enough here. We
      avoid using EXT4_B2C() because it's confusing.
      
      As a result of the first problem, the number of free clusters is much
      bigger than it should be and ext4_has_free_clusters() would return 1 even
      if there are really not enough free clusters available.
      
      Fix this by removing the EXT4_C2B() conversion of free clusters and
      using a bit shift when calculating the number of root clusters. This bug
      affects a number of xfstests tests covering file system ENOSPC
      handling. With this patch most of the ENOSPC problems with bigalloc file
      systems disappear, especially the errors caused by delayed allocation not
      having enough space when the actual allocation is finally requested.
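
      The unit mix-up can be reproduced with small stand-in helpers.  This is
      a minimal sketch, assuming a cluster is 2^bits blocks: c2b(), b2c() and
      num_b2c() are hypothetical stand-ins for EXT4_C2B(), EXT4_B2C() and
      EXT4_NUM_B2C(), and the two has_free_clusters_*() functions are
      simplified models, not the ext4 code.

```c
#include <assert.h>
#include <stdint.h>

/* Clusters to blocks. */
static uint64_t c2b(uint64_t clusters, unsigned int bits)
{
	assert(bits < 32);
	return clusters << bits;
}

/* Blocks to clusters, rounding down. */
static uint64_t b2c(uint64_t blocks, unsigned int bits)
{
	assert(bits < 32);
	return blocks >> bits;
}

/* Number of clusters needed to hold the blocks, rounding up. */
static uint64_t num_b2c(uint64_t blocks, unsigned int bits)
{
	assert(bits < 32);
	return (blocks + ((UINT64_C(1) << bits) - 1)) >> bits;
}

/* Buggy shape: free space in blocks compared against cluster counts. */
static int has_free_clusters_buggy(uint64_t free_clusters,
				   uint64_t root_clusters,
				   uint64_t wanted, unsigned int bits)
{
	return c2b(free_clusters, bits) >= root_clusters + wanted;
}

/* Fixed shape: every operand is in cluster units. */
static int has_free_clusters_fixed(uint64_t free_clusters,
				   uint64_t root_clusters, uint64_t wanted)
{
	return free_clusters >= root_clusters + wanted;
}
```

      With 16 blocks per cluster (bits = 4), 10 free clusters look like 160
      to the buggy check, so a request for 20 clusters is wrongly accepted.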

      Signed-off-by: Lukas Czerner <lczerner@redhat.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • drivers/video/backlight/adp88?0_bl.c: fix resume · 8ca4e7b9
      Lars-Peter Clausen authored
      commit 5eb02c01 upstream.
      
      Clearing the NSTBY bit in the control register also automatically clears
      the BLEN bit.  So we need to make sure to set it again during resume,
      otherwise the backlight will stay off.

      Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
      Acked-by: Michael Hennerich <michael.hennerich@analog.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • ocfs2: unlock super lock if lockres refresh failed · 9ba3d1e2
      Junxiao Bi authored
      commit 3278bb74 upstream.
      
      If the lockres refresh fails, the super lock is never released, which
      causes some processes on other cluster nodes to hang forever.

      Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • fs/block_dev.c: page cache wrongly left invalidated after revalidate_disk() · 65a79a3b
      MITSUNARI Shigeo authored
      commit 7630b661 upstream.
      
      We found that bdev->bd_invalidated was left set once revalidate_disk()
      is called, which results in a page cache flush every time the device is
      opened.
      
      Specifically, we found this problem with an MD block device.  Once we
      resize an MD device, mdadm --monitor flushes all page cache for that
      device every 60 or 1000 seconds when it opens the device.
      
      This bug has been present since at least 3.2.0 up to the latest kernel
      (3.6.2).  Patch is attached.
      
      The following steps will reproduce the problem.
      
      1. prepare a block device (eg /dev/sdb).
      
      2. create two partitions:
      
         sudo parted /dev/sdb
         mklabel gpt
         mkpart primary 0% 50%
         mkpart primary 50% 100%
      
      3. create a md device.
      
         sudo mdadm -C /dev/md/hoge -l 1 -n 2 -e 1.2 --assume-clean --auto=md --symlink=no /dev/sdb1 /dev/sdb2
      
      4. create file system and mount it
      
         sudo mkfs.ext3 /dev/md/hoge
         sudo mkdir /mnt/test
         sudo mount /dev/md/hoge /mnt/test
      
      5. try to resize the device
      
         sudo mdadm -G /dev/md/hoge --size=max
      
      6. create a file to fill file cache.
      
        sudo dd if=/dev/urandom of=/mnt/test/data bs=1M count=10
      
      and verify the current status of file by free command.
      
      7. mdadm monitor will open the md device every 1000 seconds and you
         will find that all file cache on the device is cleared.
      
      The timing can be reduced by the following steps.
      
      a) kill mdadm and restart it with --delay option
      
         /sbin/mdadm --monitor --delay=30 --pid-file /var/run/mdadm/monitor.pid --daemonise --scan --syslog
      
      or open the md device directly.
      
         sudo dd if=/dev/md/hoge of=/dev/null bs=4096 count=1

      Signed-off-by: MITSUNARI Shigeo <herumi@nifty.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • inotify: remove broken mask checks causing unmount to be EINVAL · c8164cb5
      Jim Somerville authored
      commit 676a0675 upstream.
      
      Running the command:
      
      	inotifywait -e unmount /mnt/disk
      
      immediately aborts with a -EINVAL return code.  This is however a valid
      parameter.  This abort occurs only if unmount is the sole event
      parameter.  If other event parameters are supplied, then the unmount
      event wait will work.
      
      The problem was introduced by commit 44b350fc ("inotify: Fix mask
      checks").  That commit states:
      
      	The mask checks in inotify_update_existing_watch() and
      	inotify_new_watch() are useless because inotify_arg_to_mask()
      	sets FS_IN_IGNORED and FS_EVENT_ON_CHILD bits anyway.
      
      But instead of removing the useless checks, it did this:
      
      	        mask = inotify_arg_to_mask(arg);
      	-       if (unlikely(!mask))
      	+       if (unlikely(!(mask & IN_ALL_EVENTS)))
      	                return -EINVAL;
      
      The problem is that IN_ALL_EVENTS doesn't include IN_UNMOUNT, and other
      parts of the code keep IN_UNMOUNT separate from IN_ALL_EVENTS.  So the
      check should be:
      
      	if (unlikely(!(mask & (IN_ALL_EVENTS | IN_UNMOUNT))))
      
      But inotify_arg_to_mask(arg) always sets the IN_UNMOUNT bit in the mask
      anyway, so the check is always going to pass and thus should simply be
      removed.  Also note that inotify_arg_to_mask completely controls what
      mask bits get set from arg, there's no way for invalid bits to get
      enabled there.
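      
      As an illustration only (not part of the upstream patch), the bit test
      discussed above can be tried in user space with the constants from
      <sys/inotify.h>.  The helper names below are hypothetical, and the
      sketch models just the mask comparison, not inotify_arg_to_mask().

```c
#include <assert.h>
#include <stdint.h>
#include <sys/inotify.h>

/* The check added by the earlier commit: a mask is rejected when it has
 * no bit in common with IN_ALL_EVENTS.  Returns 1 for "rejected". */
static inline int broken_check_rejects(uint32_t mask)
{
	return !(mask & IN_ALL_EVENTS);
}

/* The corrected form quoted in the commit message, which also accepts
 * IN_UNMOUNT.  Returns 1 for "rejected". */
static inline int corrected_check_rejects(uint32_t mask)
{
	return !(mask & (IN_ALL_EVENTS | IN_UNMOUNT));
}
```

      With these helpers, a mask holding only IN_UNMOUNT is rejected by the
      broken check, while IN_UNMOUNT combined with any ordinary event passes,
      which matches the behaviour reported for inotifywait.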
      
      Let's fix it by simply removing the useless broken checks.
      Signed-off-by: Jim Somerville <Jim.Somerville@windriver.com>
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Jerome Marchand <jmarchan@redhat.com>
      Cc: John McCutchan <john@johnmccutchan.com>
      Cc: Robert Love <rlove@rlove.org>
      Cc: Eric Paris <eparis@parisplace.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      c8164cb5
    • Tejun Heo's avatar
      posix-timer: Don't call idr_find() with out-of-range ID · 4fa1b6ed
      Tejun Heo authored
      commit e182bb38 upstream.
      
      When idr_find() was fed a negative ID, it used to look up the ID
      ignoring the sign bit before the recent ("idr: remove MAX_IDR_MASK and
      move left MAX_IDR_* into idr.c") patch. Now a negative ID triggers
      a WARN_ON_ONCE().
      
      __lock_timer() feeds timer_id from userland directly to idr_find()
      without sanitizing it which can trigger the above malfunctions.  Add a
      range check on @timer_id before invoking idr_find() in __lock_timer().
      
      While timer_t is defined as int by all archs at the moment, Andrew
      worries that it may be defined as a larger type later on.  Make the
      test cover larger integers too so that it at least is guaranteed to
      not return the wrong timer.
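      
      A minimal user-space sketch of such a range check follows.  The helper
      name and the use of long long as a stand-in for a possibly wider
      timer_t are assumptions made for illustration; this is not the kernel
      code itself.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: returns 1 when timer_id may safely be handed to a
 * lookup that only accepts non-negative int keys, 0 otherwise. */
static inline int timer_id_in_range(long long timer_id)
{
	/* Converting to unsigned long long turns every negative value into
	 * a very large one, so a single comparison rejects both negative
	 * IDs and IDs above INT_MAX. */
	return (unsigned long long)timer_id <= INT_MAX;
}
```

      The single unsigned comparison is what lets the test cover integer
      types wider than int without a separate check for negative values.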
      
      Note that WARN_ON_ONCE() in idr_find() on id < 0 is a transitional
      precaution while moving away from ignoring MSB.  Once it's gone we can
      remove the guard as long as timer_t isn't larger than int.
      
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-by: Sasha Levin <sasha.levin@oracle.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Link: http://lkml.kernel.org/r/20130220232412.GL3570@htj.dyndns.org
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      4fa1b6ed
    • Pawel Moll's avatar
      ALSA: usb: Fix Processing Unit Descriptor parsers · 7d3cbb43
      Pawel Moll authored
      commit b531f81b upstream.
      
      Commit 99fc8645 "ALSA: usb-mixer:
      parse descriptors with structs" introduced a set of useful parsers
      for descriptors. Unfortunately the parsers for the Processing Unit
      Descriptor came with a very subtle bug...
      
      Functions uac_processing_unit_iProcessing() and
      uac_processing_unit_specific() were indexing the baSourceID array
      forgetting the fields before the iProcessing and process-specific
      descriptors.
      
      The problem was observed with Sound Blaster Extigy mixer,
      where nNrModes in Up/Down-mix Processing Unit Descriptor
      was accessed at offset 10 of the descriptor (value 0)
      instead of offset 15 (value 7). As a result, the
      control had interesting limit values:
      
      Simple mixer control 'Channel Routing Mode Select',0
        Capabilities: volume volume-joined penum
        Playback channels: Mono
        Capture channels: Mono
        Limits: 0 - -1
        Mono: -1 [100%]
      
      Fixed by starting from the bmControls, which was calculated
      correctly, instead of baSourceID.
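      
      The offset arithmetic can be sketched as below.  This is an
      illustration under assumptions, not the patched code: the field layout
      is the Processing Unit Descriptor layout as understood from the USB
      Audio Class 1.0 specification, the helper names are hypothetical, and
      the example values (one input pin, one control byte) are chosen only
      because they reproduce the offsets 10 and 15 mentioned above.

```c
#include <assert.h>
#include <stddef.h>

/* Offset of baSourceID[0] in the descriptor (assumed UAC1 layout:
 * bLength, bDescriptorType, bDescriptorSubtype, bUnitID, wProcessType (2),
 * bNrInPins, then baSourceID[]). */
#define PU_BASOURCEID_OFFSET 7

/* Buggy arithmetic: start at baSourceID and skip only the source IDs,
 * bmControls and iProcessing, forgetting the fields in between. */
static inline size_t buggy_specific_offset(size_t nr_in_pins, size_t control_size)
{
	return PU_BASOURCEID_OFFSET + nr_in_pins + control_size + 1;
}

/* Offset of bmControls[0]: after baSourceID[] come bNrChannels (1),
 * wChannelConfig (2), iChannelNames (1) and bControlSize (1). */
static inline size_t bmcontrols_offset(size_t nr_in_pins)
{
	return PU_BASOURCEID_OFFSET + nr_in_pins + 1 + 2 + 1 + 1;
}

/* Fixed arithmetic: start at bmControls, skip it and iProcessing (1). */
static inline size_t fixed_specific_offset(size_t nr_in_pins, size_t control_size)
{
	return bmcontrols_offset(nr_in_pins) + control_size + 1;
}
```

      The two results differ by exactly the five bytes of the skipped
      fields, whatever the number of pins or control bytes.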
      
      Now the mentioned control is fine:
      
      Simple mixer control 'Channel Routing Mode Select',0
        Capabilities: volume volume-joined penum
        Playback channels: Mono
        Capture channels: Mono
        Limits: 0 - 6
        Mono: 0 [0%]
      Signed-off-by: Pawel Moll <mail@pawelmoll.com>
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      [bwh: Backported to 3.2: adjust filename]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      7d3cbb43
    • Matt Fleming's avatar
      x86, efi: Make "noefi" really disable EFI runtime serivces · 1a116473
      Matt Fleming authored
      commit fb834c7a upstream.
      
      commit 1de63d60 ("efi: Clear EFI_RUNTIME_SERVICES rather than
      EFI_BOOT by "noefi" boot parameter") attempted to make "noefi" true to
      its documentation and disable EFI runtime services to prevent the
      bricking bug described in commit e0094244 ("samsung-laptop:
      Disable on EFI hardware"). However, it's not possible to clear
      EFI_RUNTIME_SERVICES from an early param function because
      EFI_RUNTIME_SERVICES is set in efi_init() *after* parse_early_param().
      
      This resulted in "noefi" effectively becoming a no-op and no longer
      providing users with a way to disable EFI, which is bad for those
      users that have buggy machines.
      Reported-by: Walt Nelson Jr <walt0924@gmail.com>
      Cc: Satoru Takeuchi <takeuchi_satoru@jp.fujitsu.com>
      Signed-off-by: Matt Fleming <matt.fleming@intel.com>
      Link: http://lkml.kernel.org/r/1361392572-25657-1-git-send-email-matt@console-pimps.org
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
      [bwh: Backported to 3.2: efi_runtime_init() is not a separate function,
       so put a whole set of statements in an if (!disable_runtime) block]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      1a116473
    • Stefan Bader's avatar
      xen: Send spinlock IPI to all waiters · 7afb6c33
      Stefan Bader authored
      commit 76eaca03 upstream.
      
      There is a loophole between Xen's current implementation of
      pv-spinlocks and the scheduler. This was triggerable through
      a testcase until v3.6 changed the TLB flushing code. The
      problem potentially is still there just not observable in the
      same way.
      
      What could happen was (is):
      
      1. CPU n tries to schedule task x away and goes into a slow
         wait for the runq lock of CPU n-# (must be one with a lower
         number).
      2. CPU n-#, while processing softirqs, tries to balance domains
         and goes into a slow wait for its own runq lock (for updating
         some records). Since this is a spin_lock_irqsave in softirq
         context, interrupts will be re-enabled for the duration of
         the poll_irq hypercall used by Xen.
      3. Before the runq lock of CPU n-# is unlocked, CPU n-1 receives
         an interrupt (e.g. endio) and when processing the interrupt,
         tries to wake up task x. But that is in schedule and still
         on_cpu, so try_to_wake_up goes into a tight loop.
      4. The runq lock of CPU n-# gets unlocked, but the message only
         gets sent to the first waiter, which is CPU n-# and that is
         busily stuck.
      5. CPU n-# never returns from the nested interruption to take and
         release the lock because the scheduler uses a busy wait.
         And CPU n never finishes the task migration because the unlock
         notification only went to CPU n-#.
      
      To avoid this and since the unlocking code has no real sense of
      which waiter is best suited to grab the lock, just send the IPI
      to all of them. This causes the waiters to return from the hyper-
      call (those not interrupted at least) and do active spinlocking.
      
      BugLink: http://bugs.launchpad.net/bugs/1011792
      Acked-by: Jan Beulich <JBeulich@suse.com>
      Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      7afb6c33
    • Wei Liu's avatar
      xen: close evtchn port if binding to irq fails · c03cebd1
      Wei Liu authored
      commit e7e44e44 upstream.
      Signed-off-by: Wei Liu <wei.liu2@citrix.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      [bwh: Backported to 3.2: adjust context]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      c03cebd1
    • Daniel Vetter's avatar
      intel/iommu: force writebuffer-flush quirk on Gen 4 Chipsets · cf65e1c8
      Daniel Vetter authored
      commit 210561ff upstream.
      
      We already have the quirk entry for the mobile platform, but there are
      also reports on some desktop versions. So be paranoid and set it
      everywhere.
      
      References: http://www.mail-archive.com/dri-devel@lists.freedesktop.org/msg33138.html
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: "Sankaran, Rajesh" <rajesh.sankaran@intel.com>
      Reported-and-tested-by: Mihai Moldovan <ionic@ionic.de>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      cf65e1c8
    • Patrik Jakobsson's avatar
      drm/i915: Set i9xx sdvo clock limits according to specifications · 9ee8bc68
      Patrik Jakobsson authored
      commit 4f7dfb67 upstream.
      
      The Intel PRM says the M1 and M2 divisors must be in the range of 10-20 and 5-9.
      Since we do all calculations based on them being register values (which are
      subtracted by 2) we need to specify them accordingly.
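      
      As a small worked example of that conversion (an illustration with
      hypothetical names, not the driver's own limit tables), the register
      encoding is the actual divisor minus 2, so the PRM ranges 10-20 and 5-9
      correspond to register ranges 8-18 and 3-7:

```c
#include <assert.h>

/* Inclusive range of divisor values. */
struct divisor_range {
	int min;
	int max;
};

/* Hypothetical helper: convert a range of actual divisor values into the
 * register encoding, which stores the value minus 2. */
static inline struct divisor_range to_register_range(struct divisor_range actual)
{
	struct divisor_range reg = { actual.min - 2, actual.max - 2 };
	return reg;
}
```
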
      Signed-off-by: Patrik Jakobsson <patrik.r.jakobsson@gmail.com>
      Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
      Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=56359
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      9ee8bc68
    • Jani Nikula's avatar
      drm/i915: add missing \n to UTS_RELEASE in the error_state · 59bebf6c
      Jani Nikula authored
      commit fdfa175d upstream.
      
      Amending
      commit 4518f611
      Author: Daniel Vetter <daniel.vetter@ffwll.ch>
      Date:   Wed Jan 23 16:16:35 2013 +0100
      
          drm/i915: dump UTS_RELEASE into the error_state
      Signed-off-by: Jani Nikula <jani.nikula@intel.com>
      Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      59bebf6c
    • Mika Kuoppala's avatar
      drm/i915: disable shared panel fitter for pipe · db0a8a87
      Mika Kuoppala authored
      commit 24a1f16d upstream.
      
      If encoder is switched off by BIOS, but the panel fitter is left on,
      we never try to turn off the panel fitter and leave it still attached
      to the pipe - which can cause blurry output elsewhere.
      
      Based on work by Chris Wilson <chris@chris-wilson.co.uk>
      
      Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=58867
      Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
      Tested-by: Andreas Sturmlechner <andreas.sturmlechner@gmail.com>
      [danvet: Remove the redundant HAS_PCH_SPLIT check and add a tiny
      comment.]
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      [bwh: Backported to 3.2: adjust context]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      db0a8a87
    • Paulo Zanoni's avatar
      drm: don't add inferred modes for monitors that don't support them · 7a8e149e
      Paulo Zanoni authored
      commit 196e077d upstream.
      
      If bit 0 of the features byte (0x18) is set to 0, then, according to
      the EDID spec, "the display is non-continuous frequency (multi-mode)
      and is only specified to accept the video timing formats that are
      listed in Base EDID and certain Extension Blocks".
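      
      The test on that bit can be sketched as follows.  The macro and
      function names are hypothetical and the buffer is assumed to hold a
      base EDID block; this is an illustration, not the code the patch adds.

```c
#include <assert.h>
#include <stdint.h>

#define EDID_FEATURES_OFFSET 0x18 /* "Feature Support" byte */
#define EDID_CONTINUOUS_FREQ 0x01 /* bit 0 of that byte (hypothetical name) */

/* Returns 1 when the display declares continuous frequency support, so
 * inferred modes may be added; returns 0 for a multi-mode display that
 * only accepts the timings it lists explicitly. */
static inline int edid_allows_inferred_modes(const uint8_t *edid)
{
	return (edid[EDID_FEATURES_OFFSET] & EDID_CONTINUOUS_FREQ) != 0;
}
```

      Only bit 0 is consulted, so the other feature bits in byte 0x18 do not
      influence the result.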
      
      For more information, please see the EDID spec, check the notes of the
      table that explains the "Feature Support" byte (18h) and also the
      notes on the tables of the section that explains "Display Range Limits
      & Additional Timing Description Definition (tag #FDh)".
      
      Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=45729
      Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
      Reviewed-by: Adam Jackson <ajax@redhat.com>
      Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
      Signed-off-by: Dave Airlie <airlied@redhat.com>
      [bwh: Backported to 3.2: adjust context]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      7a8e149e
    • Jan Beulich's avatar
      xen-blkback: do not leak mode property · 0b9662bc
      Jan Beulich authored
      commit 9d092603 upstream.
      
      "be->mode" is obtained from xenbus_read(), which does a kmalloc() for
      the message body. The short string is never released, so do it along
      with freeing "be" itself, and make sure the string isn't kept when
      backend_changed() doesn't complete successfully (which made it
      desirable to slightly re-structure that function, so that the error
      cleanup can be done in one place).
      Reported-by: Olaf Hering <olaf@aepfle.de>
      Signed-off-by: Jan Beulich <jbeulich@suse.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      0b9662bc
    • David Henningsson's avatar
      ALSA: hda - hdmi: ELD shouldn't be valid after unplug · dd250c18
      David Henningsson authored
      commit bbfd8a19 upstream.
      
      Currently, eld_valid is never set to false, except at kernel module
      load time. This patch makes sure that eld is no longer valid when
      the cable is (hot-)unplugged.
      Signed-off-by: David Henningsson <david.henningsson@canonical.com>
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      dd250c18
    • Trond Myklebust's avatar
      NLM: Ensure that we resend all pending blocking locks after a reclaim · c0bb151f
      Trond Myklebust authored
      commit 666b3d80 upstream.
      
      Currently, nlmclnt_lock will break out of the for(;;) loop when
      the reclaimer wakes up the blocking lock thread by setting
      nlm_lck_denied_grace_period. This causes the lock request to fail
      with an ENOLCK error.
      The intention was always to ensure that we resend the lock request
      after the grace period has expired.
      Reported-by: Wangyuan Zhang <Wangyuan.Zhang@netapp.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      c0bb151f
    • Larry Finger's avatar
      b43: Increase number of RX DMA slots · c6f8eab7
      Larry Finger authored
      commit ccae0e50 upstream.
      
      Bastian Bittorf reported that some of the silent freezes on a Linksys WRT54G
      were due to overflow of the RX DMA ring buffer, which was created with 64
      slots. That finding reminded me that I was seeing similar crashes on a netbook,
      which also has a relatively slow processor. After increasing the number of
      slots to 128, runs on the netbook that previously failed now worked; however,
      I found that 109 slots had been used in one test. For that reason, the number
      of slots is being increased to 256.
      Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
      Cc: Bastian Bittorf <bittorf@bluebottle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      c6f8eab7
    • Steven Rostedt (Red Hat)'s avatar
      ftrace: Call ftrace cleanup module notifier after all other notifiers · a638e6bb
      Steven Rostedt (Red Hat) authored
      commit 8c189ea6 upstream.
      
      Commit: c1bf08ac "ftrace: Be first to run code modification on modules"
      
      changed ftrace module notifier's priority to INT_MAX in order to
      process the ftrace nops before anything else could touch them
      (namely kprobes). This was the correct thing to do.
      
      Unfortunately, the ftrace module notifier also contains the ftrace
      clean up code. As opposed to the set up code, this code should be
      run *after* all the module notifiers have run in case a module is doing
      correct clean-up and unregisters its ftrace hooks. Basically, ftrace
      needs to do clean up on module removal, as it needs to know about code
      being removed so that it doesn't try to modify that code. But after it
      removes the module from its records, if a ftrace user tries to remove
      a probe, that removal will fail, as the record of that code segment
      no longer exists.
      
      Nothing really bad happens if the probe removal is called after ftrace
      did the clean up, but the ftrace removal function will return an error.
      Correct code (such as kprobes) will produce a WARN_ON() if it fails
      to remove the probe. As people get annoyed by frivolous warnings, it's
      best to do the ftrace clean up after everything else.
      
      By splitting the ftrace_module_notifier into two notifiers, one that
      does the module load setup that is run at high priority, and the other
      that is called for module clean up that is run at low priority, the
      problem is solved.
      Reported-by: Frank Ch. Eigler <fche@redhat.com>
      Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      a638e6bb
    • Nicholas Bellinger's avatar
      target: Add missing mapped_lun bounds checking during make_mappedlun setup · 5f5db37d
      Nicholas Bellinger authored
      commit fbbf8555 upstream.
      
      This patch adds missing bounds checking for the configfs provided
      mapped_lun value during target_fabric_make_mappedlun() setup ahead
      of se_lun_acl initialization.
      
      This addresses a potential OOPs when using a mapped_lun value that
      exceeds the hardcoded TRANSPORT_MAX_LUNS_PER_TPG-1 value within
      se_node_acl->device_list[].
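      
      A bounds check of this kind can be sketched as below.  The constant
      value, the helper name and the -EINVAL return are assumptions made for
      the illustration and are not taken from the patch.

```c
#include <assert.h>
#include <errno.h>

/* Assumed stand-in for TRANSPORT_MAX_LUNS_PER_TPG; the value is only an
 * example. */
#define MAX_LUNS_PER_TPG 256

/* Hypothetical helper: reject a mapped_lun that would index past the end
 * of a device_list[] array holding MAX_LUNS_PER_TPG entries. */
static inline int check_mapped_lun(unsigned long mapped_lun)
{
	if (mapped_lun > MAX_LUNS_PER_TPG - 1)
		return -EINVAL;
	return 0;
}
```

      Doing the check before any per-LUN structure is initialised means an
      out-of-range value never reaches the array index.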
      Reported-by: Jan Engelhardt <jengelh@inai.de>
      Cc: Jan Engelhardt <jengelh@inai.de>
      Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      5f5db37d
    • Nicholas Bellinger's avatar
      target: Fix lookup of dynamic NodeACLs during cached demo-mode operation · f310bed9
      Nicholas Bellinger authored
      commit fcf29481 upstream.
      
      This patch fixes a bug in core_tpg_check_initiator_node_acl() ->
      core_tpg_get_initiator_node_acl() where a dynamically created
      se_node_acl generated during session login would be skipped during
      subsequent lookup due to the '!acl->dynamic_node_acl' check, causing
      a new se_node_acl to be created with a duplicate ->initiatorname.
      
      This would occur when a fabric endpoint was configured with
      TFO->tpg_check_demo_mode()=1 + TFO->tpg_check_demo_mode_cache()=1
      preventing the release of an existing se_node_acl during se_session
      shutdown.
      
      Also, drop the unnecessary usage of core_tpg_get_initiator_node_acl()
      within core_dev_init_initiator_node_lun_acl() that originally
      required the extra '!acl->dynamic_node_acl' check, and just pass
      the configfs provided se_node_acl pointer instead.
      Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
      [bwh: Backported to 3.2: adjust context, filename of header]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      f310bed9
    • Jussi Kivilinna's avatar
      rtlwifi: usb: allocate URB control message setup_packet and data buffer separately · 9e2282d4
      Jussi Kivilinna authored
      commit bc6b8923 upstream.
      
      rtlwifi allocates both setup_packet and data buffer of control message urb,
      using shared kmalloc in _usbctrl_vendorreq_async_write. Structure used for
      allocating is:
      	struct {
      		u8 data[254];
      		struct usb_ctrlrequest dr;
      	};
      
      Because 'struct usb_ctrlrequest' is __packed, setup packet is unaligned and
      DMA mapping of both 'data' and 'dr' confuses ARM/sunxi, leading to memory
      corruptions and freezes.
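      
      The misalignment can be reproduced in user space with a sketch like the
      following.  The struct names are hypothetical stand-ins (the request
      struct mirrors the 8-byte packed struct usb_ctrlrequest), and the
      packed attribute assumes GCC or Clang.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the 8-byte, packed struct usb_ctrlrequest. */
struct ctrlrequest_sketch {
	uint8_t  bRequestType;
	uint8_t  bRequest;
	uint16_t wValue;
	uint16_t wIndex;
	uint16_t wLength;
} __attribute__((packed));

/* The shared allocation described in the commit message. */
struct shared_buf_sketch {
	uint8_t data[254];
	struct ctrlrequest_sketch dr;
};

/* Byte offset of the setup packet inside the shared allocation. */
static inline size_t setup_packet_offset(void)
{
	return offsetof(struct shared_buf_sketch, dr);
}
```

      Because a packed struct has an alignment of 1, no padding is inserted
      after data[], so the setup packet starts at byte 254, which is not a
      multiple of 4.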
      
      Patch changes setup packet to be allocated separately.
      
      [v2]:
       - Use WARN_ON_ONCE instead of WARN_ON
      Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
      Signed-off-by: John W. Linville <linville@tuxdriver.com>
      [bwh: Backported to 3.2: adjust context]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      9e2282d4
    • Linus Torvalds's avatar
      mm: fix pageblock bitmap allocation · 0d15de8e
      Linus Torvalds authored
      commit 7c45512d upstream.
      
      Commit c060f943 ("mm: use aligned zone start for pfn_to_bitidx
      calculation") fixed our calculation of the index into the pageblock
      bitmap when a !SPARSEMEM zone was not aligned to pageblock_nr_pages.
      
      However, the _allocation_ of that bitmap had never taken this alignment
      requirement into account, so depending on the exact size and alignment of
      the zone, the use of that index could then access past the allocation,
      resulting in some very subtle memory corruption.
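      
      The sizing problem can be illustrated with a simplified sketch.  The
      helper names and the pageblock size of 1024 pages are assumptions for
      the example, and the kernel's real bitmap sizing involves further steps
      that are omitted here.

```c
#include <assert.h>

/* Assumed example value; must be a power of two. */
#define PAGEBLOCK_NR_PAGES 1024UL

/* Number of pageblocks counted when only the zone size is rounded up,
 * ignoring where the zone starts. */
static inline unsigned long blocks_unaligned(unsigned long zone_size)
{
	return (zone_size + PAGEBLOCK_NR_PAGES - 1) / PAGEBLOCK_NR_PAGES;
}

/* Number of pageblocks counted from the aligned zone start: the offset of
 * the first pfn within its pageblock is added before rounding up. */
static inline unsigned long blocks_aligned(unsigned long zone_start_pfn,
					   unsigned long zone_size)
{
	zone_size += zone_start_pfn & (PAGEBLOCK_NR_PAGES - 1);
	return blocks_unaligned(zone_size);
}
```

      For a zone of 1024 pages starting at pfn 512, the unaligned count is
      one pageblock, but the zone actually touches two, so an index computed
      from the aligned start can point past a bitmap sized the old way.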
      
      This was reported (and bisected) by Ingo Molnar: one of his random
      config builds would hang with certain very specific kernel command line
      options.
      
      In the meantime, commit c060f943 has been marked for stable, so this
      fix needs to be back-ported to the stable kernels that backported the
      commit to use the right alignment.
      Bisected-and-tested-by: Ingo Molnar <mingo@kernel.org>
      Acked-by: Mel Gorman <mgorman@suse.de>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      0d15de8e
    • Lukas Czerner's avatar
      ext4: fix xattr block allocation/release with bigalloc · f861f64d
      Lukas Czerner authored
      commit 1231b3a1 upstream.
      
      Currently, when a new xattr block is created or released, we call
      dquot_alloc_block() or dquot_free_block() respectively, which among
      other things increments or decrements the number of blocks assigned to
      the inode by one block.
      
      This however does not work for a bigalloc file system, because we always
      allocate/free the whole cluster, so we have to account for that in
      dquot_free_block() and dquot_alloc_block() as well.
      
      Use the clusters-to-blocks conversion EXT4_C2B() when passing number of
      blocks to the dquot_alloc/free functions to fix the problem.
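      
      The conversion itself is a shift, sketched here with a hypothetical
      helper rather than the EXT4_C2B() macro, and assuming a cluster holds
      2^cluster_bits blocks:

```c
#include <assert.h>

/* Hypothetical stand-in for a clusters-to-blocks conversion: each cluster
 * holds 2^cluster_bits blocks. */
static inline unsigned long clusters_to_blocks(unsigned long clusters,
					       unsigned int cluster_bits)
{
	return clusters << cluster_bits;
}
```

      With cluster_bits = 4, one xattr cluster accounts for 16 blocks instead
      of 1, which is the difference the quota calls were missing.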
      
      The problem has been revealed by xfstests #117 (and possibly others).
      Signed-off-by: Lukas Czerner <lczerner@redhat.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Reviewed-by: Eric Sandeen <sandeen@redhat.com>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      f861f64d
    • Li Zefan's avatar
      cpuset: fix cpuset_print_task_mems_allowed() vs rename() race · 59222bdd
      Li Zefan authored
      commit 63f43f55 upstream.
      
      rename() will change dentry->d_name. The result of this race can
      be worse than seeing a partially rewritten name: we might access
      a stale pointer, because rename() will re-allocate memory to hold
      a longer name.
      
      Accessing the name is safe under the protection of dentry->d_lock.
      
      v2: check NULL dentry before acquiring dentry lock.
      Signed-off-by: Li Zefan <lizefan@huawei.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      59222bdd